Algorithm Design and Analysis in Wireless Networks
Lin Chen
Habilitation Thesis
Committee
M. Tamer BASAR Reviewer
M. Pierre FRAIGNIAUD Reviewer
M. Bruno GAUJAL Reviewer
M. Eitan ALTMAN Examiner
M. Joffroy BEAUQUIER Examiner
Mme. Johanne COHEN Examiner
M. Mérouane DEBBAH Examiner (president)
M. Fabio MARTIGNON Examiner
Abstract
Algorithms are perhaps the most fundamental and fascinating elements in computer science as
a whole. Networks and networked systems are no exception. This habilitation thesis summarizes
my research during the last eight years on some algorithmic problems of both fundamental and
practical importance in modern networks and networked systems, more specifically, wireless
networks. Wireless networks generally share a number of features that form the common
ground on which algorithms for wireless networks are designed. These features include the lack
of network-wide coordination, a large number of nodes, limited energy and computation resources,
and unreliable wireless links. These constraints and considerations make the algorithmic study
of wireless networks an emerging research field requiring new tools and methodologies, some of
which cannot be drawn from existing state-of-the-art research in either the algorithms or the
networking community.
Motivated by this observation, we aim to take a small yet systematic step forward in the
design and analysis of algorithms that scale elegantly and act efficiently in terms of computation
and communication, while keeping operations as local and distributed as possible. Specifically, we
present our work on a number of algorithmic problems in emerging wireless networks that are
simple to state and intuitively understandable, yet of both fundamental and practical importance,
and that require non-trivial effort to solve. These problems include (1) channel rendezvous and
neighbor discovery, (2) opportunistic channel access, (3) distributed learning, (4) path optimization
and scheduling, and (5) algorithm design and analysis in radio-frequency identification systems.
Methodologically, most of our analysis is structured as follows.
• Theoretical performance bound. After formulating the target problem, we analytically
characterize the performance of the optimal solution and, in some cases, of some natural and
intuitive algorithms. These results usually give us pertinent insights into the structural
properties of the problem, including the theoretical limit and the performance gap between
the limit and any algorithm that is not carefully devised.
• Optimum or approximation algorithm design. Guided by the theoretical results established
in the first step, we then direct our efforts to the design and analysis of efficient
algorithms for the target problem. By efficient we mean that our algorithms produce either
the optimum solution or, in cases where the problem is NP-hard, constant-factor or
logarithmic-factor approximations in polynomial or quasi-polynomial time.
• Further extension and generalization. Once we have established a complete framework
solving or approximately solving the problem, we further analyze the lessons that can be
learnt from the analysis process and demonstrate how our framework can be extended or
adapted to address a generic class of problems in a wider range of applications presenting
similar structural properties.
Contents
Abstract
1 Introduction
1.1 Background and Motivation
1.2 Thesis Methodology
1.3 Overview of Major Results and Thesis Organization
Bibliography
Publications
Chapter 1
Introduction
Algorithms are perhaps the most fundamental and fascinating elements in computer science
as a whole. Networks and networked systems are no exception. This habilitation thesis summa-
rizes my research during the last eight years on some algorithmic problems of both fundamental
and practical importance in modern networks and networked systems, more specifically, wireless
networks.
The last two decades have witnessed an unprecedented success of wireless networks due to the
proliferation of inexpensive, widely available wireless devices. Examples of classic and emerging
wireless networks include cellular networks, wireless local area networks (WLAN), wireless sensor
networks (WSN), disruption tolerant networks (DTN), cognitive radio networks (CRN), etc. More
formally, a wireless network can be regarded as an interconnection of wireless devices aiming at
providing ubiquitous communication and computing services regardless of location, mobility and
other properties of individual devices.
• Limited computing resource. The quest for compact size and low energy consumption
significantly limits the computing and processing capability of individual nodes in many
wireless networks. Specifically, many wireless nodes have only a small CPU and little memory.
• Unreliable wireless links. Wireless links are notoriously unreliable and error-prone. Hence,
algorithms should be robust in the sense that they do not rely on the long-term reliability of
wireless links and individual nodes.
The above constraints and considerations make the algorithmic study of wireless networks an
emerging research field requiring new tools and methodologies, some of which cannot be drawn
from existing state-of-the-art research in either the algorithms or the networking community.
Motivated by this observation, with the present habilitation thesis we hope to take a small yet
systematic step forward in the design and analysis of algorithms that scale elegantly and act
efficiently in terms of computation and communication, while keeping operations as local and
distributed as possible.
placed each in a room with N telephones connecting the two rooms. The players do not know how
the telephones are interconnected. In each round, each player picks up a phone and says hello
until they hear each other. The problem is to devise an algorithm minimizing the delay to
establish communication. We investigate a generalized version where, among the N telephones, only a
subset can establish communication between the two players. We devise a deterministic algorithm
achieving bounded and order-minimum worst-case rendezvous delay. The algorithm we develop
can be applied to solve the heterogeneous rendezvous problem by regarding telephone lines as
channels. We then proceed to study neighbor discovery, where a pair of neighbor nodes needs to
meet on the same channel in order to discover each other. The analogy between channel rendezvous
and neighbor discovery is evident. However, neighbor discovery has an additional constraint: each
node has a duty cycle which limits the fraction of time during which the node is awake. Two nodes
can discover each other only if they both wake up on the same channel at the same time. We provide
a complete treatment of the neighbor discovery problem, derive the performance limit for any
neighbor discovery algorithm and develop an order-optimum algorithm achieving the limit. Our
results on this topic are published in [40, 41, 43, 45, 49, 50, 52].
distributed channel access algorithms based on imitation, a behavior rule widely applied in human
societies that consists of imitating successful behavior. We establish the convergence of the proposed
algorithms to an imitation-stable equilibrium, which is also the ε-optimum of the system. Simple,
natural and incentive-compatible, the proposed spectrum access algorithms can be implemented
in a distributed manner based solely on local interactions. We then consider the case where, instead
of imitating other nodes, a node imitates only its own past behavior that has brought it a higher
payoff. Such self-imitation is more robust in cases where imitating others is not possible
or reliable. Technically, we develop and analyze a framework of retrospective spectrum access
based on stochastic learning that has two features: (1) an entirely distributed implementation
requiring only local observations and (2) guaranteed statistical convergence to the equilibrium
state within a bounded delay. Part of the work of this chapter is the topic of the thesis of my
former Ph.D. student Stefano Iellamo (co-advised with Prof. Marceau Coupechoux), who is currently a
Marie-Curie research fellow at ICS-FORTH. The ongoing thesis of Mira Morcos (co-advised with Prof.
Tijani Chahed) is also related to this topic. Our results on this topic are published in [23–26, 70,
71, 94–96].
scales. We start by studying the stability of the Frame Slotted Aloha (FSA) protocol, the de facto
MAC layer standard in tag identification. Very limited work has been done on the stability of FSA.
To bridge this gap, we investigate the stability properties of FSA by focusing on two physical layer
models of practical importance, the models with single packet reception (SPR) and multipacket
reception (MPR) capabilities. By employing drift analysis, we obtain closed-form conditions for
the stability of FSA and for the maximization of the stability region. Furthermore, to characterize
system behavior in the instability region, we mathematically demonstrate the transience
of the backlog Markov chain. We then proceed to study the problem of tag population estimation
by developing a generic framework of stable and accurate tag population estimation schemes for
both static and dynamic RFID systems. By generic, we mean that our framework does not require
any prior knowledge of the tag arrival and departure patterns. By performing Lyapunov drift
analysis, we mathematically prove the efficiency and stability of our framework. We complete our
work with a comprehensive analysis of the detection of missing tags, one of the most important
RFID applications. Specifically, we develop a suite of three missing tag detection algorithms, each
decreasing the execution time compared to its predecessor. By sequentially analysing the developed
algorithms, we gradually arrive at an optimum detection algorithm that works in practice. The work
of this chapter is in collaboration with my former Ph.D. student Jihong Yu, who has just defended
his thesis and started his post-doc at Simon Fraser University in Canada. Our results on this topic
are published in [227–230].
Chapter 7 concludes the habilitation thesis by summarizing our other research work related
to the thesis and pointing out some perspectives for future research. In addition to a summary of
the future directions given in the concluding section of each chapter, focusing on the immediate
extensions of the corresponding models and analysis, in this concluding chapter we take a broader
view and consider more general directions in algorithm design and analysis in emerging networked
systems.
Chapter 2
Channel Rendezvous and Neighbor Discovery
2.1 Introduction
The operating frequencies of today’s mobile devices typically span a swath of spectrum sub-
divided into multiple orthogonal channels. In such multi-channel wireless networks, establishing
communication sessions requires the communicating pairs to meet each other on a common chan-
nel via a rendezvous process to exchange control information before initiating data communications.
The use of a single common control channel for rendezvous simplifies the rendezvous process but
it creates a single point of failure as the common control channel may become temporarily un-
available, leading to the rendezvous failure problem. Therefore, channel hopping (CH) approaches
have been widely used to create multiple rendezvous channels. Specifically, each node starts a
channel hopping process according to its own CH sequence and local clock. The CH sequences are
carefully chosen to spread out rendezvous over multiple pairwise rendezvous channels.
There are three major challenges in the design of distributed CH-based rendezvous algorithms.
• Lack of clock synchronization. It is difficult to require two independent nodes to have
synchronized clocks without any handshake between them.
• Asymmetrical perceptions of channel availability. Nodes may have different sets of accessible
channels, thus increasing the difficulty of finding a common rendezvous channel.
• No universal channel set or common channel index system. Nodes may not have the same
knowledge of the universal channel set or a common way of mapping channel indices to
frequencies. For example, given three channels with center frequencies $f_a$, $f_b$ and $f_c$, two
nodes using the same CH sequence 0, 1, 2 fail to achieve rendezvous when one node maps
indices 0, 1, 2 to frequencies $f_a$, $f_b$, $f_c$, while the other maps indices 0, 1, 2 to $f_b$, $f_c$, $f_a$.
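The index-mapping mismatch in the last challenge can be checked with a minimal sketch. It assumes, purely for illustration, that the two nodes hop in lockstep (synchronized slots), which the text does not require. The names `first_common_slot`, `map_a` and `map_b` are hypothetical.

```python
from typing import Optional, Sequence

def first_common_slot(seq: Sequence[int],
                      map_a: Sequence[str],
                      map_b: Sequence[str],
                      horizon: int) -> Optional[int]:
    """Return the first slot in which both nodes tune to the same frequency.

    Both nodes follow the same channel hopping sequence `seq` of channel
    indices, but translate indices to frequencies with their own maps.
    Assumption (not from the text): the nodes hop in lockstep.
    """
    for t in range(horizon):
        idx = seq[t % len(seq)]
        if map_a[idx] == map_b[idx]:
            return t
    return None

ch_sequence = [0, 1, 2]
map_a = ["fa", "fb", "fc"]   # one node:  0 -> fa, 1 -> fb, 2 -> fc
map_b = ["fb", "fc", "fa"]   # the other: 0 -> fb, 1 -> fc, 2 -> fa

mismatch = first_common_slot(ch_sequence, map_a, map_b, horizon=30)
agreed = first_common_slot(ch_sequence, map_a, map_a, horizon=30)
```

With the two different maps the nodes never share a frequency within the horizon, whereas with a common map they meet in the very first slot.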
We emphasize that it is the holistic combination of the above three challenges that makes the
design of distributed rendezvous algorithms non-trivial. Formally, we coin the term heterogeneous
channel rendezvous problem to denote the following problem.
Problem 2.1 (Heterogeneous channel rendezvous). How can two nodes, given asynchronous local
clocks, asymmetrical channel perceptions, no universal channel sets, and heterogeneous channel index
systems, rendezvous with each other within a minimum upper-bounded rendezvous delay?
Chapter 2. Channel Rendezvous and Neighbor Discovery
The heterogeneous rendezvous problem we define is the most generic form of the channel
rendezvous problem, in which we remove all the assumptions that may not be realistic in some
wireless networking scenarios, although they render the problem significantly more tractable. To
tackle the problem, we cast it as the telephone coordination game, a problem of fundamental
importance in distributed algorithm design, defined below.
Problem 2.2 (Telephone coordination game). Two players wishing to communicate are placed each
in a room with N telephones connecting the two rooms. The players do not know how the telephones
are interconnected. In each round, each player picks up a phone and says hello until they hear
each other. The problem is to devise an algorithm minimizing the delay to establish communication.
In our work, we investigate a generalized version where, among the N telephones, only a subset can
establish communication between the two players. We devise a deterministic algorithm achieving
bounded and order-minimum worst-case rendezvous delay. The algorithm we develop can be
applied to solve the heterogeneous rendezvous problem by regarding telephone lines as channels.
After solving the channel rendezvous problem, we proceed to study a related problem, neighbor
discovery, where a pair of neighbor nodes needs to meet on the same channel in order to discover
each other. The analogy between the two problems is evident. However, neighbor discovery has
a distinctive constraint in addition to the design challenges of channel rendezvous: each node has
a duty cycle which limits the fraction of time during which the node is awake. Two nodes can discover
each other only if they both wake up on the same channel at the same time. Compared to channel
rendezvous, neighbor discovery thus has an additional dimension, the duty cycle, which makes the
problem harder and calls for a dedicated investigation. Formally, we define the following
heterogeneous neighbor discovery problem.
Problem 2.3 (Heterogeneous neighbor discovery). How can neighbor nodes with heterogeneous
duty cycles, operating on different channels, without clock synchronization, discover each other over
every common channel within a bounded delay?
The rest of this chapter is structured as follows. Section 2.2 develops our work on the telephone
problem and channel rendezvous. Section 2.3 presents our work on neighbor discovery. Section 2.4
concludes the chapter by briefly summarizing our other work related to this topic. More
details of our work on this topic, including proofs and numerical analysis, can be found in our
publications [40, 41, 43, 45, 49, 50, 52].
2.2.1 Introduction
The Telephone Coordination Game, also referred to as the Telephone Problem, was first formally
introduced by Alpern in 1976 [9] as follows: Two players wishing to communicate with each other
are placed each in a distinct room with N telephone lines connecting the two rooms. The players
do not know how the telephones are interconnected. The game is played in discrete steps termed
as rounds. In each round t = 0, 1, 2, ..., each player picks up a phone and says “hello” until
they hear each other. The common aim of the two players is to minimize the time until they can
hear each other. The major difficulty that makes the Telephone Problem non-trivial is the lack of
any form of coordination between the two players, as summarized in the following:
• No common phone labeling: The N telephones are identical and not labeled or ordered in any
way. In other words, if each player puts a label on each phone for the purpose of identification,
the two players cannot share any common labeling of the phones.
• No time synchronization: Each player is unaware of the moment when the other starts the
game. That is, there does not exist any external signal coordinating the searching process
such as starting or stopping signals.
• No pre-assigned roles: The players do not have pre-assigned roles as a caller or a callee
because such role assignment requires some form of coordination. Even if such coordination
is possible, a player may wish to be a caller to initiate a communication session and a callee
at the same time to accept incoming communication sessions from other players. In this case,
it is impossible to assign a role to each player.
The Telephone Problem reflects a typical paradigm of distributed algorithm design without any
prior coordination among agents. Despite (or thanks to) its simple and generic formulation, a
number of engineering problems can be cast into it, ranging from communication link establishment
and meeting scheduling to web crawling.
Mathematically, the Telephone Problem belongs to the field of Rendezvous Search Games [10,
11]. Due to its application in many engineering problems, the Telephone Problem and its variants
have attracted extensive research attention, but the original Telephone Problem still remains open,
even though it is simple to state and intuitively understandable. To date, very few results
have been reported on the structure of the optimum algorithm; the most important
progress toward characterizing the structure of the optimum probabilistic algorithm was made by
Anderson and Weber [15].
Anderson and Weber studied the Telephone Problem using another formulation as a symmet-
rical rendezvous search game on N locations between two players [15]. Each player can switch
across different locations freely from one to any other location and the delay required to pass from
one location to another is negligible. The objective is to find the optimum algorithm of visiting a lo-
cation each round to minimize the expected rendezvous delay. By regarding locations as telephones,
the rendezvous search game considered in [15] can be cast into the Telephone Problem.
Aiming at minimizing the expected rendezvous delay, Anderson and Weber developed a
probabilistic algorithm, referred to as the Anderson and Weber algorithm, in which each player waits
where it is for N − 1 periods with probability θ and, with probability 1 − θ, visits the remaining
N − 1 locations in a random order. The algorithm is repeated every N − 1 periods until rendezvous.
By calculating the value of θ, they demonstrated the optimality of the proposed algorithm in
minimizing the expected rendezvous delay for the cases N = 2 and 3. However, for other values of
N, even small ones, the Telephone Problem remains unsolved.
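The wait-or-tour rule described above can be simulated with the following minimal sketch. It is an illustration under stated assumptions, not the authors' formulation: the two players are assumed to start at distinct locations, both use the same θ, and the value θ = 0.5 in the usage lines is only an illustrative choice. The function names are hypothetical.

```python
import random

def block_plan(n, here, theta, rng):
    """Locations visited by one player during the next n - 1 rounds."""
    if rng.random() < theta:
        return [here] * (n - 1)            # wait at the current location
    others = [x for x in range(n) if x != here]
    rng.shuffle(others)                    # tour the other n - 1 locations
    return others

def rendezvous_delay(n, theta, rng):
    """Rounds elapsed until both players are at the same location.

    Assumption (for illustration): the players start at distinct locations.
    """
    pos_a, pos_b = rng.sample(range(n), 2)
    t = 0
    while True:
        plan_a = block_plan(n, pos_a, theta, rng)
        plan_b = block_plan(n, pos_b, theta, rng)
        for pos_a, pos_b in zip(plan_a, plan_b):
            t += 1
            if pos_a == pos_b:
                return t

rng = random.Random(0)
trials = [rendezvous_delay(2, 0.5, rng) for _ in range(20000)]
mean_delay = sum(trials) / len(trials)
```

Under these assumptions with N = 2, the players meet in a given round exactly when one of them moves and the other waits, which happens with probability 2θ(1 − θ). For θ = 0.5 the empirical mean delay should therefore be close to 2 rounds, while individual runs can be arbitrarily long, which illustrates the unbounded tail of probabilistic strategies.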
We revisit the classical Telephone Problem in its generic form. Specifically, we investigate a
generalized version of the Telephone Problem by studying the situation where, among the N telephone
lines, only a subset, unknown to the players, can establish communication between the
two players. The others are connected to telephones in rooms other than those of the players. This
generalization of the original Telephone Problem, termed the Generalized Telephone Problem, can
capture a number of engineering and system constraints in practical settings: in the channel
establishment problem, a channel may be temporarily occupied by other users and thus cannot be
accessed; in the problem of robot rendezvous, some places may be inaccessible to some robots
due to resource constraints or security reasons. Compared to the original Telephone Problem, the
Generalized Telephone Problem we consider has one more difficulty, summarized as follows:
• Partial telephone connection: Only a subset of the N telephones, not known to the players,
can establish communication between the two players.
In the Generalized Telephone Problem, we are interested in finding the optimum deterministic
algorithm that achieves bounded rendezvous delay and minimizes the worst-case
rendezvous delay. Our work constitutes a research thrust complementary to that pioneered
by Anderson and Weber in [15], which seeks probabilistic strategies minimizing the expected
meeting delay. Our focus on deterministic algorithms is motivated by the long-tail effect of
probabilistic strategies, under which two players may experience extremely long and unbounded
delay before they meet, and consequently by the need for rendezvous strategies that
can satisfy engineering applications requiring bounded rendezvous delay.
Particularly, we investigate the following natural yet fundamental questions:
• What is the structure of a deterministic algorithm achieving bounded rendezvous delay?
• What is the worst-case rendezvous delay bound for any deterministic algorithm?
• How can we design a deterministic algorithm approaching the worst-case delay bound?
Our analysis answers the above questions as follows.
• D-bounded phone pick sequence: We characterize the structure of the phone pick sequences of
the deterministic strategies (termed D-bounded phone pick sequences) that can guarantee
rendezvous without any coordination;
• Worst-case rendezvous delay bound: We prove that the lower bound of the worst-case
rendezvous delay of any deterministic algorithm is $O(N^2)$;
• Zero-knowledge Rendezvous: We devise a deterministic algorithm, called Zero-knowledge
Rendezvous, that (1) guarantees rendezvous between the players regardless of their telephone
labeling functions and their relative time difference and (2) approaches the performance
limit without any prior knowledge or coordination.
The Telephone Problem and the related discovery and rendezvous problems belong to the field
of Rendezvous Search Games, which is extensively surveyed in [10]. In the following, we briefly
summarize the related work.
Original Telephone Problem: Despite significant research efforts, the original Telephone Prob-
lem still remains open today, even though it is simple to state and intuitively understandable. The
optimum probabilistic algorithm minimizing the expected rendezvous delay has only been derived
for the cases N = 2 and 3 [15], [211]. For other values of N (even small values), characterizing
the optimum probabilistic algorithm remains unsolved.
Rendezvous games on graphs: Recently, motivated by the rendezvous problems between
robots, the rendezvous games on graphs and their different variants have attracted significant
research attention, both from probabilistic and deterministic perspectives (cf. [61, 63, 64, 186]
and the references therein). Particularly, concerning the deterministic strategies which are more
related to our work, although a number of solutions have been proposed to achieve rendezvous on
graphs, the majority of them are focused on specific graphs and develop strategies with polynomial
complexity, leaving the optimality of the propositions unaddressed. Motivated by this observation,
we focus on establishing the theoretical performance bound for any deterministic algorithm and
devising strategies that can approach the performance bound. The problem we address can be
regarded as a specific version of rendezvous game on graphs where players can switch freely from
one vertex to any other vertex.
Channel rendezvous problem: Our work is also related and applicable to the channel
rendezvous problem in multi-channel networks, for which a number of distributed channel rendezvous
solutions have recently been proposed in the literature [19, 30, 31, 121, 190, 214, 236]. However,
none of them addresses all three challenges arising from the lack of coordination among players, i.e.,
the absence of a common telephone labeling function, of synchronization, and of preassigned player
roles. We would like to point out that it is the holistic combination of the three challenges that makes
the design of rendezvous strategies far from trivial. Particularly, most, if not all, of the existing work
implicitly assumes identical channel labeling for the rendezvous pair. However, establishing reliable
channel labeling requires coordination between the rendezvous pair, which cannot be achieved before
the rendezvous itself. Some rendezvous schemes use the quality of a channel as its label.
This approach is not reliable either, because the perceived quality of the same channel may vary
significantly at different nodes due to their locations. Moreover, due to their application-oriented
approach, none of the existing work offers a theoretical study as complete as ours of
the channel rendezvous problem, which can be cast into the Telephone Problem.
We first formulate the Generalized Telephone Problem and the deterministic rendezvous algo-
rithm.
Consider two players, Alice ($a$) and Bob ($b$), who wish to communicate with each other, each
placed in a distinct room. In the room of Alice (Bob), there are $N_a$ ($N_b$) telephones, among which
$N_c \le \min\{N_a, N_b\}$ telephones can be used to connect the two rooms. Other telephones may be
connected to those in places other than the rooms of Alice and Bob. In our analysis, we focus on
the extreme case by setting $N_c = 1$, i.e., there is only one telephone connecting the two rooms.
Our focus on the extreme case allows us to concentrate on the essential properties of the problem
and the resulting algorithm. The extension to the general case where $N_c > 1$ is straightforward.
The two players do not know how the telephones are interconnected.
Time is divided into rounds. In each round $t = 0, 1, 2, \cdots$, each player picks up a phone and
says “hello” until they hear each other, an event termed a pairwise rendezvous. The common aim
of the two players is to minimize the worst-case delay until they can hear each other. As in the
original Telephone Problem, no prior coordination is possible between Alice and Bob, meaning that
they do not have (1) common phone labeling, (2) time synchronization, or (3) pre-assigned roles.
We now introduce the telephone labeling function to formalize the first constraint.
Definition 2.1 (Telephone Labeling Function). Denote by $C$ the index set of all telephone lines, where
each index $c \in C$ denotes the pair of telephones connected by line $c$. For each player $i$ ($i = a, b$)
with a set $\mathcal{N}_i \subseteq C$ of $N_i$ telephones in its room, we define the telephone labeling function $f_i$ as a
bijective mapping
$f_i : \mathcal{N}_i \to \{0, 1, \cdots, N_i - 1\}$,
where $\forall c_1, c_2 \in \mathcal{N}_i$, $c_1 \ne c_2$ implies $f_i(c_1) \ne f_i(c_2)$. We define $f_i^{-1} : \{0, 1, \cdots, N_i - 1\} \to \mathcal{N}_i$ as the
inverse mapping of $f_i$.
Remark. Definition 2.1 basically states that each player $i$ has its own labeling of the $N_i$ telephones in
its room, which may differ from the global indices of these telephones in $\mathcal{N}_i$. Using Definition 2.1, we can
formally express the constraint that there is only one telephone line connecting the two rooms
($|\mathcal{N}_a \cap \mathcal{N}_b| = 1$) as follows: there exist a unique telephone line $c^* \in C$ and labels $0 \le h_a \le N_a - 1$ and
$0 \le h_b \le N_b - 1$ such that $f_a(c^*) = h_a$ and $f_b(c^*) = h_b$, or equivalently, $f_a^{-1}(h_a) = f_b^{-1}(h_b) = c^*$.
Example 1. Consider a system with $C = \{c_0, c_1, c_2, c_3\}$. There are 2 and 3 telephones in the rooms of
Alice and Bob, respectively, with $\mathcal{N}_a = \{c_0, c_1\}$ and $\mathcal{N}_b = \{c_1, c_2, c_3\}$. The telephone labeling functions of
Alice and Bob are $f_a(c_0) = 0$, $f_a(c_1) = 1$ and $f_b(c_1) = 2$, $f_b(c_2) = 1$, $f_b(c_3) = 0$. It can be noted that
Alice and Bob can communicate only via telephone line $c_1$. Mathematically, $f_a^{-1}(1) = f_b^{-1}(2) = c_1$.
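Example 1 can be encoded directly as data. The sketch below is only an illustration: representing the labeling functions as dictionaries and the helper name `inverse` are choices made here, not notation from the thesis.

```python
# Example 1 as dictionaries: telephone line -> label used by the player.
f_a = {"c0": 0, "c1": 1}
f_b = {"c1": 2, "c2": 1, "c3": 0}

def inverse(labeling):
    """Inverse mapping: label -> telephone line (requires distinct labels)."""
    inv = {label: line for line, label in labeling.items()}
    assert len(inv) == len(labeling), "labels must be distinct"
    return inv

f_a_inv = inverse(f_a)
f_b_inv = inverse(f_b)

# Lines present in both rooms: the only candidates for rendezvous.
common_lines = set(f_a) & set(f_b)
```

Here `common_lines` contains only the line c1, which Alice knows as label 1 and Bob knows as label 2.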
As discussed previously, probabilistic algorithms cannot guarantee a bounded rendezvous delay
and suffer from the long-tail rendezvous latency problem, in which Alice and Bob may experience
an extremely long delay before they can rendezvous. Motivated by this observation, we consider
deterministic rendezvous strategies in which each player picks up a phone in each round based on a
specific sequence so as to rendezvous with its peer. We call such a sequence the phone pick sequence
and give its formal definition in the following.
Definition 2.2 (Phone Pick Sequence). The phone pick sequence of a player is defined as a sequence
$u \triangleq \{u_t\}_{0 \le t \le T_u - 1}$, where $u_t$ is the index of the telephone picked by the player in round $t$ based on its
own labeling and $T_u$ is the period of the sequence.¹
Given the phone pick sequences of Alice and Bob, denoted as $u$ and $v$, whose periods are denoted
as $T_a$ and $T_b$, if there exist $t \in [0, T_a T_b - 1]$ and $c \in \mathcal{N}_a \cap \mathcal{N}_b$ such that $f_a^{-1}(u_t) = f_b^{-1}(v_t) = c$,
Alice and Bob can rendezvous in round $t$ on telephone line $c$. We call $t$ the rendezvous round and
$c$ the rendezvous telephone.
Example 2. Consider the setting of Example 1 with the following phone pick sequences for Alice and
Bob: $u = \{0, 1\}$ with $T_a = 2$ and $v = \{0, 1, 2\}$ with $T_b = 3$. It can be noted that they can rendezvous
in round 5 on telephone $c_1$. However, if Alice and Bob operate on the following phone pick sequences:
$u = \{0, 1\}$ with $T_a = 2$ and $v = \{2, 1, 0, 1\}$ with $T_b = 4$, they can never rendezvous. The phone pick
sequences and the rendezvous process are illustrated in Fig. 2.1.
Left panel:
Round:  0       1       2       3       4       5       ...
Alice:  0 (c0)  1 (c1)  0 (c0)  1 (c1)  0 (c0)  1 (c1)  ...
Bob:    0 (c3)  1 (c2)  2 (c1)  0 (c3)  1 (c2)  2 (c1)  ...

Right panel:
Round:  0       1       2       3       ...
Alice:  0 (c0)  1 (c1)  0 (c0)  1 (c1)  ...
Bob:    2 (c1)  1 (c2)  0 (c3)  1 (c2)  ...

Figure 2.1: Example of phone pick sequences. Left: u = {0, 1}, v = {0, 1, 2}; right: u = {0, 1},
v = {2, 1, 0, 1}.
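The two cases of Example 2 can be replayed with a short simulation. This is a minimal sketch assuming both players start in round 0. The function name `first_rendezvous` and the dictionary encoding of the inverse labeling functions are illustrative choices. The search horizon of T_a T_b rounds follows the range used in the text after Definition 2.2.

```python
def first_rendezvous(u, v, f_a_inv, f_b_inv):
    """Return (round, line) of the first rendezvous, or None if there is none.

    u, v: phone pick sequences (lists of labels), repeated periodically.
    f_a_inv, f_b_inv: label -> telephone line for Alice and Bob.
    Assumption: both players start in round 0 (no cyclic rotation).
    """
    horizon = len(u) * len(v)          # the joint pattern repeats after this
    for t in range(horizon):
        line_a = f_a_inv[u[t % len(u)]]
        line_b = f_b_inv[v[t % len(v)]]
        if line_a == line_b:
            return t, line_a
    return None

# Labelings of Example 1, inverted: label -> line.
f_a_inv = {0: "c0", 1: "c1"}
f_b_inv = {0: "c3", 1: "c2", 2: "c1"}

left = first_rendezvous([0, 1], [0, 1, 2], f_a_inv, f_b_inv)
right = first_rendezvous([0, 1], [2, 1, 0, 1], f_a_inv, f_b_inv)
```

In the left case the players meet in round 5 on line c1. In the right case Alice picks c1 only in odd rounds while Bob picks it only in rounds that are multiples of 4, so no rendezvous occurs.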
To model the situation where the players are not synchronized and may thus start their
search at different time instants, we apply the concept of cyclic rotation to the phone pick
sequences. Specifically, given a phone pick sequence $w$, we denote by $w(k)$ the cyclic rotation of $w$ by $k$
rounds, where $k$ is referred to as the cyclic rotation phase. For example, for $u = \{0, 1, 2\}$
with $T_u = 3$, we have $u(2) = \{2, 0, 1\}$.
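A cyclic rotation is straightforward to express in code. The sketch below assumes a left rotation, i.e., element t of w(k) is element (t + k) mod T of w, which is the convention consistent with the example u(2) = {2, 0, 1}. The function name is hypothetical.

```python
def cyclic_rotation(w, k):
    """Rotate the phone pick sequence w to the left by k rounds."""
    k %= len(w)                      # rotating by the period changes nothing
    return list(w[k:]) + list(w[:k])

u = [0, 1, 2]
u_rot2 = cyclic_rotation(u, 2)       # the example from the text
u_rot3 = cyclic_rotation(u, 3)       # a full period gives back u
```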
We define in the following the D-bounded rendezvous system consisting of the D-bounded
phone pick sequences, any pair of which can rendezvous within D rounds regardless of the cyclic
rotation phases and the telephone labeling functions of the players.
Definition 2.3 (D-bounded Rendezvous System). A D-bounded rendezvous system is defined as a
set of phone pick sequences such that any two distinct sequences $u$ and $v$ satisfy the following property:
$\exists\, 0 \le t < D$ such that $f^{-1}[u_t(t')] = f'^{-1}[v_t(t'')]$, $\forall f, f' \in \mathcal{F}$, $\forall t', t''$.
¹ A probabilistic rendezvous algorithm can be regarded as a special case where $T_u \to \infty$.
Definition 2.4 (D-bounded Phone Pick Sequence). The phone pick sequences in a D-bounded
rendezvous system are called D-bounded phone pick sequences.
Armed with the above definitions and the mathematical notation introduced in this section, we
can formalize the Generalized Telephone Problem with deterministic algorithm as follows.
Generalized Telephone Problem with deterministic algorithm. The Generalized Telephone
Problem with deterministic algorithm consists of devising D-bounded phone pick sequences u and
v for Alice and Bob to minimize the worst-case rendezvous delay bound D. In other words, we
seek an algorithm to construct phone pick sequences to achieve bounded and minimum worst-case
rendezvous delay among all possible telephone labeling functions and all cyclic rotation phases of
the two players.
We next establish the worst-case rendezvous delay bound for any deterministic algorithm. We
also analyse the structure of the phone pick sequence to guarantee rendezvous between Alice and
Bob regardless of their telephone labeling functions and cyclic rotation phases. The results derived
in this subsection serve as design guidelines for the order-optimum deterministic algorithm devised
later in Sec. 2.2.5 that approaches the performance bound. The proofs, detailed in [40], consist
of constructing contradictions and applying the definition of the generalized telephone problem.
Lemma 2.1 (Structural property of D-bounded phone pick sequence). If Alice and Bob can rendezvous
within D rounds by using the D-bounded phone pick sequences u and v, then for any cyclic rotation
phases $t_a^0$ and $t_b^0$ and any telephone label pair $(h_a, h_b)$ where $0 \le h_a \le N_a - 1$
and $0 \le h_b \le N_b - 1$, there exists $t < D$ such that $u_t(t_a^0) = h_a$ and $v_t(t_b^0) = h_b$.
Lemma 2.1 shows that given any cyclic rotation phases $t_a^0$ and $t_b^0$, to ensure rendezvous within
D rounds, the pair $(u_t(t_a^0), v_t(t_b^0))$ ($0 \le t < D$) must cover all the possible telephone label
couples $(h_a, h_b)$ where $0 \le h_a \le N_a - 1$ and $0 \le h_b \le N_b - 1$. In other words, the two
D-bounded phone pick sequences of Alice and Bob should cover all couples in
$[0, N_a - 1] \times [0, N_b - 1]$. The following result follows straightforwardly.
Theorem 2.1 (Worst-case rendezvous delay lower bound). For any pair of D-bounded phone pick
sequences, the worst-case rendezvous delay among all possible telephone labeling functions and all
cyclic rotation phases cannot be lower than Na Nb , i.e., D ≥ Na Nb .
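The covering argument behind Lemma 2.1 and Theorem 2.1 can be checked by brute force on a toy instance. The sketch below is illustrative only: the helper `covers_all_pairs` and the two short sequences are assumptions chosen for the example, not sequences taken from the text.

```python
from itertools import product

def covers_all_pairs(u, v, D, Na, Nb):
    """Check that, for every pair of cyclic rotation phases, the first D
    rounds contain every label pair (ha, hb)."""
    needed = set(product(range(Na), range(Nb)))
    for ka, kb in product(range(len(u)), range(len(v))):
        seen = {(u[(t + ka) % len(u)], v[(t + kb) % len(v)]) for t in range(D)}
        if not needed <= seen:
            return False
    return True

u = [0, 1]       # Na = 2 labels
v = [0, 1, 2]    # Nb = 3 labels
print(covers_all_pairs(u, v, 6, 2, 3))  # True
print(covers_all_pairs(u, v, 5, 2, 3))  # False
```

With $N_a N_b = 6$ label pairs to cover and one pair visited per round, no choice of sequences can succeed in fewer than 6 rounds, which is the counting step behind the bound.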
Having established the worst-case rendezvous delay bound of any deterministic rendezvous
algorithm, we now investigate the structure of the D-bounded phone pick sequences that can
guarantee rendezvous between Alice and Bob regardless of their telephone labeling functions and
their cyclic rotation phases. Without loss of generality, we investigate the structure of the phone
pick sequence of Alice u by focusing on its period Tu . The following theorem holds symmetrically
for Bob, whose phone pick sequence is v of period Tv .
Theorem 2.2 (Lower-bound of $T_u$). If the rendezvous can be guaranteed between Alice and Bob
regardless of their cyclic rotation phases and $N_a$, $N_b$, then it holds that $T_u \ge N_a^2$.
The phone pick sequence of each player is constructed based on its ID, which is globally unique.
Examples of such globally unique IDs include one’s passport number, biometric identities such
as a DNA sequence and, in the case of a communication device, its address. Mathematically, we
define a player’s ID as a binary sequence of length l, where each bit takes a value of either
0 or 1.
Remark. Using globally unique IDs in the design of Zero-knowledge Rendezvous is a way of breaking
the symmetry between the two players and is realistic in many practical engineering problems. A
natural question that arises is whether it is possible to devise a deterministic algorithm with bounded
rendezvous latency without breaking any form of symmetry between the players, i.e., when the
players are treated as indistinguishable entities. The answer is unfortunately no. We next give a
counterexample which is simple yet sufficient to demonstrate the impossibility of devising a rendezvous
algorithm without any form of symmetry breaking. Consider the case where Alice and Bob are perfectly
synchronized and $N_a = N_b$. Without symmetry breaking, it holds that u = v. If the unique telephone
connecting them, $c_0$, is indexed differently by Alice and Bob, i.e., $f_a^{-1}(c_0) \neq f_b^{-1}(c_0)$,
the rendezvous can never be achieved.
Algorithm 1 Construct a regular sequence $o^i$
Input: Padded ID sequence $e^i$ of $l + 2l_0$ bits
Output: Regular sequence $o^i$
1: for t = 0 to $l + 2l_0 - 1$ do
2:   switch $e^i_t$
3:     case 1: expand $e^i_t$ into four bits 0101
4:     case 0: expand $e^i_t$ into four bits 0011
5: end for
6: $o^i$ ← the expanded sequence of $e^i$

Algorithm 2 Construct the phone pick sequence for Alice: the overall algorithm
Input: ID sequence α of l bits
Output: Phone pick sequence u
1: Form the padded ID sequence a = 0||α||1
2: Construct the regular sequence $o^a$ (Algorithm 1)
3: Choose two prime numbers $p_a^0$ and $p_a^1$ larger than $N_a$ and coprime to $L_o = 4\left(l + \lfloor \frac{l}{2} \rfloor\right)$
4: Construct the phone pick sequence u based on (2.1)
that $o^i_{t_1}(t_i^0) = o^j_{t_1}(t_j^0)$ and $o^i_{t_2}(t_i^0) \neq o^j_{t_2}(t_j^0)$ for any cyclic rotation
phases $t_i^0$ and $t_j^0$. We denote such sequences $e^i$ as regular sequences. In Algorithm 1 we develop
an algorithm to generate regular sequences.
where $[x]_y$ denotes x mod y, and $\mathrm{rand}(N_i - 1)$ denotes a random integer in $[0, N_i - 1]$. It can
be noted that the period of the phone pick sequence u is $L_o p_i^0 p_i^1$ without taking into account the
randomly chosen telephones. Fig. 2.2 provides an example of the phone pick sequence in Zero-knowledge
Rendezvous.
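The construction can be sketched in a few lines. The bit expansion follows Algorithm 1 directly. Equation (2.1) is not reproduced in this excerpt, so `phone_pick` below only assumes a form consistent with the surrounding description: the current bit of the regular sequence selects one of the two primes, the label $[t]_p$ is picked when it is a valid label, and a random label is picked otherwise. The function names, the example padded ID and the primes 5 and 7 are illustrative assumptions.

```python
import random

def regular_sequence(padded_id):
    """Algorithm 1: expand bit 1 into 0101 and bit 0 into 0011."""
    expansion = {1: [0, 1, 0, 1], 0: [0, 0, 1, 1]}
    out = []
    for bit in padded_id:
        out.extend(expansion[bit])
    return out

def phone_pick(t, o, p0, p1, n_phones, rng=random):
    """ASSUMED form of (2.1): the bit o[t mod L_o] chooses the prime; the
    label t mod p is used if valid, otherwise a random label is drawn."""
    p = p1 if o[t % len(o)] == 1 else p0
    r = t % p
    return r if r < n_phones else rng.randint(0, n_phones - 1)

# Hypothetical padded ID and primes (both larger than N = 3 and coprime to len(o) = 12).
o = regular_sequence([0, 1, 1])
print(o)  # [0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1]
print([phone_pick(t, o, 5, 7, 3) for t in range(12)])
```

Rounds in which the chosen residue is at least N are the "random" rounds; all other rounds are deterministic.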
Figure 2.2: A phone pick sequence example for Alice (r denotes a randomly chosen telephone).
In the following theorem, whose proof (detailed in [40]) applies the Chinese Remainder Theorem,
we establish the correctness of Zero-knowledge Rendezvous in guaranteeing rendezvous and derive
the worst-case rendezvous delay bound.
Theorem 2.3 (Correctness and worst-case rendezvous delay bound). Zero-knowledge Rendezvous
can ensure rendezvous between Alice and Bob regardless of their telephone labeling functions and cyclic
rotation phases. Let $p_a^1$ and $p_b^1$ denote the larger prime numbers chosen by Alice and Bob. The
worst-case rendezvous delay is upper-bounded by $L_o p_a^1 p_b^1$.
Remark. Asymptotically, it follows from Theorem 2.3 that the worst-case rendezvous delay of
Zero-knowledge Rendezvous approaches $L_o N_a N_b$, or $O(N^2)$ if $N_a \simeq N_b \simeq N$, which
approaches the rendezvous delay lower-bound established in Theorem 2.1 in Sec. 2.3.4.
Zero-knowledge Rendezvous can be adapted and simplified if players have pre-assigned roles,
such as in half-duplex communication. Specifically, each player is either a caller or a callee, and a
rendezvous is required between a caller and a callee. In such a context, we can attribute a one-bit ID
a = 0 to any caller and a one-bit ID b = 1 to any callee. Following the same analysis, we can
show that the worst-case rendezvous delay of Zero-knowledge Rendezvous using the two one-bit
IDs is upper-bounded by $p_a^1 p_b^1$, a factor of $L_o$ shorter than in the case without pre-assigned roles.
Zero-knowledge Rendezvous can also be adapted and simplified if players are synchronized,
i.e., they start the rendezvous search at the same time. In this case, we do not need to pad the ID
sequences of the two players to ensure that they are cyclic rotationally distinct from each other.
The sequences $o^i$ can be generated directly from the ID sequences. In terms of rendezvous delay,
the worst-case delay is upper-bounded by $4l\, p_a^1 p_b^1$ in the synchronized case, which is 50% shorter
than in the unsynchronized case.
To complete our study, we also show in [40] that the average rendezvous delay of Zero-knowledge
Rendezvous is upper-bounded by $\frac{L_o p_a^1 p_b^1}{2}$, and asymptotically when
$p_i^0 \simeq p_i^1 \simeq N_i \simeq N$, it can be bounded by $\frac{L_o N^2}{2}$.
We start by investigating the simplest probabilistic strategy, the purely random strategy, where
each player i picks each telephone $h \in \mathcal{N}_i$ with equal probability $\mu_i(h)$, i.e., $\mu_i(h) = \frac{1}{N_i}$. This
process is repeated each round. Let $c^*$ denote the only telephone via which the two players can
rendezvous. The probability that both Alice and Bob pick it in a given round is $\frac{1}{N_a N_b}$. The
expected rendezvous delay, denoted as d, is thus
\[ d = \sum_{t=1}^{\infty} t \left(1 - \frac{1}{N_a N_b}\right)^{t-1} \frac{1}{N_a N_b} = N_a N_b. \]
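The closed form is the mean of a geometric distribution with success probability $\frac{1}{N_a N_b}$, and a truncated sum confirms it numerically. This is a small sanity check rather than part of the analysis; the function name and the truncation length are arbitrary choices.

```python
def expected_delay_partial_sum(Na, Nb, terms=10_000):
    """Truncated version of sum_{t>=1} t (1 - q)^(t-1) q with q = 1/(Na*Nb)."""
    q = 1.0 / (Na * Nb)
    return sum(t * (1 - q) ** (t - 1) * q for t in range(1, terms + 1))

approx = expected_delay_partial_sum(3, 4)
print(round(approx, 6))  # 12.0
```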
We next analyse the Anderson-Weber strategy developed in [15] addressing the original version
of the Telephone Problem. To recap, under the Anderson-Weber strategy, each player i sticks
to a telephone for Ni rounds with probability θ and hops across a random permutation of the Ni
telephones with probability 1 − θ. The process is repeated every Ni rounds until rendezvous. To
keep the analysis of the expected rendezvous delay d tractable, as in their paper [15], we consider
the case where Na = Nb = N and both players begin the search simultaneously. The reasons why
we focus on this simplified case are two-fold:
• Dropping either assumption makes the analytical characterization of the expected rendezvous
delay intractable;
• The simplified case is sufficient to provide an order-of-magnitude performance bound on this
strategy.
To derive the expected rendezvous delay d of the Anderson-Weber strategy, we consider the
following three cases:
Case 1: With probability $\theta^2$, both players stick to a random telephone for N rounds. In this case,
with probability $\frac{1}{N}$, a player picks the right telephone $c^*$ that can make them rendezvous. Hence
with probability $\frac{1}{N^2}$, rendezvous can be achieved with a delay of 1, and with probability
$1 - \frac{1}{N^2}$, rendezvous cannot be achieved within N rounds, thus resulting in an expected rendezvous
delay $N + d$. The expected rendezvous delay in this case is thus
$\frac{1}{N^2} + \left(1 - \frac{1}{N^2}\right)(N + d)$.
Case 2: With probability $2\theta(1 - \theta)$, one player sticks to a telephone for N rounds and the other
player hops across the N telephones in his room. In this case, with probability $\frac{1}{N}$, the player sticking
to a random telephone chooses the telephone $c^*$ that makes them rendezvous, resulting in an
expected rendezvous delay $\frac{N}{2}$. With the complementary probability $1 - \frac{1}{N}$, rendezvous cannot
be achieved within N rounds, resulting in an expected rendezvous delay $N + d$. The expected
rendezvous delay in this case is thus $\frac{1}{N} \cdot \frac{N}{2} + \left(1 - \frac{1}{N}\right)(N + d)$.
Case 3: With probability $(1 - \theta)^2$, both players hop across the N telephones in their rooms in a random
permutation. In this case, it can be calculated that the probability that they can rendezvous within
N rounds is
\[ \frac{\binom{N}{1}\,[(N-1)!]^2}{(N!)^2} = \frac{1}{N} \]
with an average rendezvous delay $\frac{N}{2}$. With the complementary probability $1 - \frac{1}{N}$, rendezvous
cannot be achieved within N rounds, thus resulting in an expected rendezvous delay $N + d$. The
expected rendezvous delay in this case is thus $\frac{1}{N} \cdot \frac{N}{2} + \left(1 - \frac{1}{N}\right)(N + d)$.
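The three cases can be combined into a single linear equation in d by weighting each conditional delay by the probability of its case. The resulting expression is not shown in this excerpt, so the sketch below is an assumption about how the cases are combined; it uses exact rational arithmetic, and the function name is illustrative.

```python
from fractions import Fraction

def anderson_weber_delay(N, theta):
    """Solve d = const + coeff * d, obtained by weighting the three cases."""
    theta = Fraction(theta)
    stick = theta ** 2      # Case 1: both players stick
    explore = 1 - stick     # Cases 2 and 3 share the same conditional delay
    inv_n, inv_n2 = Fraction(1, N), Fraction(1, N * N)
    const = stick * (inv_n2 + (1 - inv_n2) * N) \
        + explore * (Fraction(1, 2) + (1 - inv_n) * N)
    coeff = stick * (1 - inv_n2) + explore * (1 - inv_n)
    return const / (1 - coeff)

print(anderson_weber_delay(10, 0))               # 95
print(anderson_weber_delay(10, 1))               # 991
print(anderson_weber_delay(10, Fraction(1, 2)))
```

Under this reading, always exploring (θ = 0) gives $N^2 - \frac{N}{2}$, whereas always sticking (θ = 1) gives a delay of order $N^3$.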
The results on the delay of the Anderson-Weber strategy in the generalized telephone problem
demonstrate that it is better to explore than to stick to one telephone. Based on this observation,
we investigate a natural strategy aiming at further decreasing the rendezvous delay by repeatedly
exploring a subset of αN (0 ≤ α ≤ 1) telephones. We term this strategy the impatient Markovian
strategy, where α characterizes the degree of patience of the players. When α = 1, the impatient
Markovian strategy becomes a patient strategy and degenerates to the Anderson-Weber strategy
with θ = 0.
Specifically, the impatient Markovian strategy works in an epoch-based way in which each
epoch is composed of αN rounds. In each epoch, each player randomly picks a subset of αN
telephones and visits them sequentially following a random permutation. This operation is
repeated until the rendezvous is achieved.
We now derive the expected rendezvous delay for the above impatient Markovian strategy. As
in the previous subsection, we consider the case where $N_a = N_b = N$ and both players begin the
search simultaneously. Let π denote the probability that the rendezvous can be achieved in an
epoch. We can compute π as
\[ \pi = \frac{C(\alpha N, 1)\,[P(N-1, \alpha N-1)]^2}{[P(N, \alpha N)]^2} = \frac{\alpha}{N}, \]
where $P(n_1, n_2) \triangleq \frac{n_1!}{(n_1 - n_2)!}$ denotes the number of permutations of $n_2$ elements
from a set of $n_1$.
We further consider the following two cases:
• Case 1: With probability π, the rendezvous can be achieved with an expected delay of $\frac{\alpha N}{2}$.
• Case 2: With probability 1 − π, the rendezvous cannot be achieved within the current epoch,
thus resulting in an expected rendezvous delay $\alpha N + d$.
It follows that
\[ d = \pi \cdot \frac{\alpha N}{2} + (1 - \pi)(d + \alpha N). \]
We can thus solve d as $d = N\left(N - \frac{\alpha}{2}\right)$, which is minimized at α = 1 with the
minimum $N^2 - \frac{N}{2}$. In this case, patience actually leads to minimum rendezvous latency.
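Both the counting formula for π and the closed form of d can be verified with exact arithmetic. In the sketch below the epoch length k stands for αN; the function names are illustrative, and `math.comb` and `math.perm` require Python 3.8 or later.

```python
from fractions import Fraction
from math import comb, perm

def epoch_success_probability(N, k):
    """pi for an epoch of k = alpha*N rounds, from the counting formula."""
    return Fraction(comb(k, 1) * perm(N - 1, k - 1) ** 2, perm(N, k) ** 2)

def impatient_delay(N, k):
    """Solve d = pi*k/2 + (1 - pi)*(d + k) for d."""
    pi = epoch_success_probability(N, k)
    return Fraction(k, 2) + (1 - pi) * k / pi

N = 10
print(epoch_success_probability(N, 5))   # 1/20, i.e. alpha/N with alpha = 1/2
print(impatient_delay(N, 5))             # 195/2, i.e. N*(N - alpha/2)
best_k = min(range(1, N + 1), key=lambda k: impatient_delay(N, k))
print(best_k)                            # 10, i.e. alpha = 1
```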
2.2.6.4 Discussion
Conjecture 2.1. In the generalized telephone problem, the Markovian strategy of picking all the
telephones sequentially by following a random permutation of them minimizes the expected rendezvous
delay.
2
Note that even in the original Telephone Problem, it remains an open problem to find the optimum probabilistic
strategy that minimizes the expected rendezvous delay.
The Generalized Telephone Problem and the rendezvous algorithm can generate many related
searching and rendezvous problems that are interesting and applicable in many engineering do-
mains. In this subsection we review and investigate a number of them.
It can be easily calculated that the purely random probabilistic rendezvous algorithm achieves
an average rendezvous delay of $N_a N_b$. However, the worst-case rendezvous delay of a probabilistic
algorithm cannot be bounded. On the other hand, the deterministic algorithm Zero-knowledge
Rendezvous has a larger average rendezvous delay of $\frac{L_o p_a^1 p_b^1}{2}$, as analysed
in Sec. 2.2.5, with the worst-case rendezvous delay bounded by $L_o p_a^1 p_b^1$. A natural question is how
to improve the average performance of Zero-knowledge Rendezvous while still ensuring a bounded
rendezvous delay.
We next investigate how a desired tradeoff between the worst-case and the average rendezvous
delay can be achieved by properly choosing the two prime numbers $p_i^0$ and $p_i^1$. To make our analysis
tractable, we focus on a synchronized case where $N_a = N_b = N$ and Alice and Bob choose the
same prime numbers. However, the idea presented via this example also holds in the general case.
Recalling the phone pick sequence of Zero-knowledge Rendezvous (equation (2.1)), we note that for
rounds $N_i \le [t]_{p_i^0} \le p_i^0$ (when $o_t^i = 0$) and $N_i \le [t]_{p_i^1} \le p_i^1$ (when $o_t^i = 1$), each
player randomly picks a telephone. We can configure the duration of such “random periods” via $p_i^0$
and $p_i^1$ so as to improve the average performance while still ensuring the bounded rendezvous delay
by the operations in the remaining “deterministic rounds”. Specifically, choosing larger $p_i^0$ and $p_i^1$
results in longer “random periods”, thus improving the average performance at the price of increasing
the worst-case delay. By choosing proper $p_i^0$ and $p_i^1$, we can trade off the worst-case and the average
rendezvous delay.
We next provide an approximate quantitative analysis of the above tradeoff. Consider the
case where N is sufficiently large and $p_i^0 \simeq p_i^1 \simeq p$. Approximately, within each p rounds, there
are $p - N$ “random rounds” where player i randomly picks a telephone. We call such $p - N$ “random
rounds” a random frame. The probability that a rendezvous can be achieved within one random
frame can be calculated as
\[ q = 1 - \left(1 - \frac{1}{N^2}\right)^{p-N}. \]
Recalling that the worst-case delay is bounded by $L_o p^2$ rounds, i.e., $L_o p$ random frames, we can then
calculate the upper-bound of the average rendezvous delay d as follows:
Given a target expected delay bound $\bar{d}$, p can be chosen based on the above inequality. To get
more insight, we consider the case where we set p sufficiently larger than N but linear in N, i.e.,
$p = (1 + \lambda)N$ with sufficiently large λ. We have $q \simeq \frac{\lambda}{N}$. After some algebraic operations,
the average delay is bounded by
\[ d < \sum_{k=0}^{\infty} (1 - q)^k q \cdot (k + 1)p = \left(1 + \frac{1}{\lambda}\right) N^2, \]
with the worst-case rendezvous delay $L_o (1 + \lambda)^2 N^2$. We can clearly see the possibility of trading off
worst-case and average rendezvous delay in Zero-knowledge Rendezvous by configuring λ.
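The tradeoff can be tabulated numerically. The sketch below evaluates q from its definition, uses the fact that the geometric sum $\sum_k (1-q)^k q (k+1)p$ equals $p/q$, and compares the result with the approximation $(1 + \frac{1}{\lambda})N^2$. The values N = 100 and $L_o = 12$ are arbitrary illustrative choices, and the function name is an assumption.

```python
def tradeoff(N, lam, L_o):
    """Return (q, average-delay bound p/q, worst-case bound L_o * p^2)."""
    p = (1 + lam) * N
    q = 1 - (1 - 1 / N**2) ** (p - N)   # success probability of one random frame
    avg_bound = p / q                    # closed form of the geometric sum
    worst_case = L_o * p**2
    return q, avg_bound, worst_case

N, L_o = 100, 12
for lam in (1, 2, 4, 8):
    q, avg_bound, worst_case = tradeoff(N, lam, L_o)
    approx = (1 + 1 / lam) * N**2
    print(lam, round(avg_bound), round(approx), worst_case)
```

Increasing λ lowers the average-delay bound towards $N^2$ while the worst-case bound grows quadratically in $(1 + \lambda)$.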
Throughout our analysis, we implicitly assume that each player can pick only one telephone
at a time. We now relax this constraint by considering the case where each player can pick two
telephones simultaneously. The analysis can be generalized to the case where each player can pick
an arbitrary number of telephones. In a broader sense of a rendezvous or searching game, relaxing
the constraint of picking only one telephone at a time makes it possible to study the case where each
player can search more than one place simultaneously, or where the game is played between two
groups of players and communication is possible among the players in the same group.
We first establish the lower-bound of the worst-case rendezvous delay. To this end, by performing
a similar analysis as that in the proof of Theorem 2.1 [40], we can show that the worst-case
rendezvous delay, among all possible telephone labeling functions and all cyclic rotation phases,
cannot be lower than $\frac{N_a N_b}{4}$, and more generically $\frac{N_a N_b}{n^2}$ if each player is
allowed to pick 2n telephones simultaneously.
We then devise a deterministic rendezvous algorithm achieving $O(N_a N_b)$ worst-case rendezvous
delay. Specifically, we take Alice as an example to derive the phone pick sequence. Since
she can pick two telephones each time, we denote by $u_l = \{u_{t,l}\}$ and $u_r = \{u_{t,r}\}$ the sequences of
telephones picked by her left and right hands, respectively. We devise $u_l = \{u_{t,l}\}$ as follows:
\[ u_{t,l} = \begin{cases} [t]_{p_a^l} & [t]_{p_a^l} < N_a, \\ \mathrm{rand}(N_a - 1) & \text{otherwise}, \end{cases} \]
where $p_a^l$ is a prime number not smaller than $N_a$. The sequence $u_r = \{u_{t,r}\}$ can be devised
symmetrically using a different prime number $p_a^r$. To establish that the rendezvous is guaranteed
between Alice and Bob, it suffices to note that at least one of $p_a^l$ and $p_a^r$ is co-prime to at least one
of $p_b^l$ and $p_b^r$, the two prime numbers of the phone pick sequences of the left and right hands of Bob.
The worst-case rendezvous delay can be proved to be upper-bounded by $p_a^r p_b^r$ if we specify that the
prime number of the right hand is larger than that of the left hand.
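The co-primality argument can be checked exhaustively on a small instance. The sketch below counts only the deterministic rounds (random picks are ignored, so it is a worst-case view) and searches over all cyclic rotation phases and all label pairs. The instance N = 3 with primes (3, 5) for both players is an illustrative assumption, chosen so that both players use identical primes.

```python
from itertools import product

def worst_case_rounds(primes_a, primes_b, Na, Nb):
    """Largest first-rendezvous round over all label pairs and rotation
    phases, using deterministic rounds only; None if some case never meets."""
    period_a = primes_a[0] * primes_a[1]
    period_b = primes_b[0] * primes_b[1]
    worst = 0
    for ha, hb, ka, kb in product(range(Na), range(Nb),
                                  range(period_a), range(period_b)):
        for t in range(period_a * period_b):
            # A player reaches the target label if either hand picks it.
            hit_a = any((t + ka) % p == ha for p in primes_a)
            hit_b = any((t + kb) % p == hb for p in primes_b)
            if hit_a and hit_b:
                break
        else:
            return None
        worst = max(worst, t)
    return worst

worst = worst_case_rounds((3, 5), (3, 5), 3, 3)
print(worst, "<", 5 * 5)
```

Even though both players use the same pair of primes, Alice's left-hand prime is co-prime to Bob's right-hand prime, which is enough to guarantee a meeting in this instance.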
When each player can pick two telephones simultaneously, a symmetry breaking technique is
no longer necessary because the phone pick sequence of the left hand can be designed differently
from that of the right hand, thus naturally breaking the symmetry and resulting in a rendezvous
between the left-hand and the right-hand sequences. Consequently, a shorter worst-case rendezvous
delay, which is a factor $L_o$ shorter than in the baseline case, can be achieved.
The supporting primitive that discovers all the neighbors in a device’s communication range
is referred to as neighbor discovery, which is one of the bootstrapping primitives supporting many
basic network functionalities, such as topology control, clustering, medium access control, etc.
Ideally, nodes should discover their neighbors as quickly as possible for other algorithms to quickly
start their execution.
Compared to the channel rendezvous problem, designing efficient neighbor discovery algorithms
is more challenging in energy-constrained wireless networks because the technique of duty
cycling is used to reduce the energy consumption when these devices are in the idle state. Under
duty cycling, each device alternates between active and sleeping modes by turning its radio on
only periodically to achieve synchronization and save energy. Duty cycle refers to the fraction of
time a device is in the active mode [14, 89]. For example, a device whose duty cycle is 1% is active
during one time slot every 100 slots. Despite its effectiveness in energy conservation, the duty
cycling technique significantly challenges the neighbor discovery algorithm design in the quest of
limiting discovery latency with low power consumption. Specifically, the two important design
objectives, saving energy through a duty-cycle based scheduling and limiting the neighbor discovery
latency, are at odds with each other. The problem is more complex if the operating frequencies of
wireless nodes span multiple channels.
We investigate the neighbor discovery problem in the multi-channel duty-cycled wireless net-
works and develop a neighbor discovery algorithm MCD with the following properties.
• Fine-grained control of duty cycle: In contrast to existing solutions using prime numbers or
power-multiples, MCD can support more than 95% of duty cycles in practical settings, thus
providing much more fine-grained control of energy conservation levels.
• Bounded worst-case discovery delay: MCD achieves bounded discovery delay even between
nodes with heterogeneous duty cycles.
• Full discovery diversity: MCD guarantees discovery over each channel, thus minimizing the
probability of discovery failures caused by various channel problems.
• Robustness against asymmetrical channel perceptions: MCD achieves the same discovery perfor-
mance even if nodes have asymmetrical channel perceptions, either on the accessible channel
set or on the channel index.
• Robustness against clock drift: MCD achieves the same performance even if clocks of any two
nodes drift away from each other by an arbitrary amount of time.
The neighbor discovery algorithms for duty-cycled networks in the literature can be categorized
into probabilistic and deterministic algorithms. We give a high-level overview of these two types of
approaches and briefly analyse the pros and cons of each.
Probabilistic algorithms (cf. major work in this category [107, 139, 183, 197, 232]) adopt
probabilistic strategies at each node. Specifically, each node remains active or asleep with different
probabilities. A representative one is the birthday algorithm [139] where nodes transmit/receive or
sleep with different probabilities. Probabilistic algorithms have the advantages of being memoryless
and stationary and thus are especially robust and suitable in decentralized environments where no
prior knowledge or coordination is available. Moreover, they usually perform well in the average
case by limiting the expected discovery delay. Their main drawback is the lack of performance
guarantee in terms of discovery delay. This problem is referred to as the long-tail discovery latency
problem, in which two neighbor nodes may experience extremely long delay before discovering
each other.
Deterministic algorithms, on the other hand, can provide strict upper-bound on the worst-case
discovery delay (cf. major work in this category [20, 67, 103, 106, 114, 140, 194, 210, 238]).
In deterministic algorithms, each node wakes up according to its neighbor discovery schedule
carefully tuned to ensure that each pair of two wake-up schedules overlap in at least one active
slot. The key element in the deterministic algorithm design is how to devise the neighbor discovery
schedule to ensure discovery and minimize the worst-case discovery delay, regardless of the duty
cycle asymmetry and the relative clock drift. Compared to probabilistic approaches, which work well
in the average case but fail to bound the worst-case discovery delay, deterministic algorithms
have good worst-case performance but usually have longer expected discovery delay.
More specifically, based on the design of wake-up schedule, major existing deterministic algo-
rithms can further be divided into three classes as briefly reviewed in the following.
• The first class, based on Quorum [114, 194], constructs the wake-up schedule by
assigning a column and a row of an m × m array to each node such that no matter which
row and column are selected, any two nodes have at least two overlapping awake slots. The
main drawback of the Quorum-based approaches is that they support only symmetrical duty
cycles [194]. Although enhanced solutions have been proposed to support asymmetric duty
cycles, only two different duty cycles can be supported [114].
• The second class of deterministic algorithms overcomes this limitation by using prime numbers
to guarantee bounded discovery delay even for asymmetrical duty cycles. A typical one in
this class is Disco [67], in which each node selects two prime numbers, based on which its
wake-up schedule is configured. A more recent solution, U-Connect [106], uses a single prime
number per node and has a shorter discovery delay, given the same duty cycle.
• The third class, Searchlight, proposed in [20] and a number of follow-up schemes [184, 210],
employs two kinds of wake-up slots, termed as anchor and probe slots, to achieve both lower
worst-case and average discovery delay.
Despite extensive research efforts devoted to neighbor discovery, none of the existing algorithms
can solve the multi-channel neighbor discovery problem by achieving bounded discovery delay for
nodes operating on heterogeneous duty cycles. In contrast, the solution we develop achieves bounded
discovery delay for nodes operating on heterogeneous duty cycles in multi-channel environments.
Definition 2.5 (Neighbor discovery schedule). The neighbor discovery schedule of a node u is defined
Definition 2.6 (Duty cycle). The duty cycle of a node u, denoted by $\delta_u$, is defined as the percentage
of slots per period of the neighbor discovery schedule $x_u$ where u is active. Formally, $\delta_u$ is defined as
\[ \delta_u \triangleq \frac{|\{t \in [0, T_u - 1] : x_u^t = 1\}|}{T_u}. \]
The reciprocal of $\delta_u$ is denoted by $d_u$.
Consider two nodes a and b with their neighbor discovery schedules being $x_a$ and $x_b$ whose
periods are $T_a$ and $T_b$. Given the periodicity of $x_a$ and $x_b$, it suffices to consider $T_a T_b$ consecutive
slots, i.e., $1 \le t \le T_a T_b$. If $\exists t \in [1, T_a T_b]$ and $h \in \mathcal{N}$ such that $x_a^t = x_b^t = h$, we say
that a and b can discover each other in slot t on channel h. Slot t is called the discovery slot and
channel h is called the discovery channel between a and b. Example 3 illustrates the above definition.
Example 3. Consider a network of two channels and two nodes a, b whose neighbor discovery schedules
are $x_a = \{0, 0, 1\}$ and $x_b = \{0, 1, 0, 2\}$ with $T_a = 3$ and $T_b = 4$. The duty cycles of a and b are
$\delta_a = \frac{1}{3}$ and $\delta_b = \frac{1}{2}$, or $d_a = 3$, $d_b = 2$. The neighbor discovery schedules of a and b
are repeated every 12 slots, as illustrated in Fig. 2.3 for one period. We can observe that a and b can
discover each other in slot 6 on channel 1.
Node a : 0 0 1 0 0 1 0 0 1 0 0 1 ...
Node b : 0 1 0 2 0 1 0 2 0 1 0 2 ...
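Example 3 can be reproduced with a few lines of code. The sketch assumes, as the example suggests, that the value 0 in a schedule means the node is asleep and that a positive value is the index of the channel it listens on; slots are numbered from 1 as in the text. The function names are illustrative.

```python
from fractions import Fraction

def discovery_slots(xa, xb):
    """(slot, channel) pairs within Ta*Tb slots where both nodes are awake
    on the same channel. Assumes 0 means asleep."""
    found = []
    for t in range(1, len(xa) * len(xb) + 1):
        ca = xa[(t - 1) % len(xa)]
        cb = xb[(t - 1) % len(xb)]
        if ca != 0 and ca == cb:
            found.append((t, ca))
    return found

def duty_cycle(x):
    """Fraction of awake slots per period (non-zero entries)."""
    return Fraction(sum(1 for c in x if c != 0), len(x))

xa, xb = [0, 0, 1], [0, 1, 0, 2]
print(discovery_slots(xa, xb))           # [(6, 1)]
print(duty_cycle(xa), duty_cycle(xb))    # 1/3 1/2
```

The only discovery in one joint period is on channel 1, so no discovery ever happens on channel 2 with these schedules.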
As in the previous section, to model the situation where the clocks of different nodes are not
synchronized4 , we apply the concept of cyclic rotation to neighbor discovery schedules. Specifically,
given a neighbor discovery schedule xa , we denote xa (k) a cyclic rotation of xa by k slots where k
is called the cyclic rotation phase. In Example 3, we have xa (1) = {1, 0, 0} and xb (2) = {0, 2, 0, 1}.
Performance Metric 2: Discovery Diversity. The second metric, particularly pertinent for
the multi-channel environment, is the discovery diversity, which characterizes the capability of a
neighbor discovery algorithm of discovering a neighbor regardless of its operational channel. We
say that a neighbor discovery algorithm achieves full discovery diversity if the discovery of any pair
of nodes is guaranteed on every common channel they can access. The neighbor discovery schedule
in Example 3 cannot achieve full discovery diversity as a and b can never discover each other on
channel 2.
Performance Metric 3: Maximum Time to Discovery with Full Diversity. When full discovery
diversity can be achieved, we further define the third metric maximum time to discovery with full
diversity (MTTD-FD) as the worst-case delay to achieve full discovery diversity. The MTTD-FD can
be regarded as a generalization of the MTTD in multi-channel networks. The MTTD-FD degenerates
to the MTTD in single-channel networks. Throughout our analysis, we analyse the MTTD in single-
channel case and the MTTD-FD in multi-channel case.
We next formulate the optimum multi-channel neighbor discovery problem.
Problem 2.4 (Multi-channel neighbor discovery). The optimum multi-channel neighbor discovery
problem is defined as follows:
minimize T,
subject to $\forall t_a^0 \in [1, T_a]$, $t_b^0 \in [1, T_b]$, $\forall \delta_a, \delta_b$, $\exists t \le T$ such that
$x_a^t(t_a^0) = x_b^t(t_b^0) = h$, $\forall h \in \mathcal{N}_a \cap \mathcal{N}_b$.
That is, devising neighbor discovery schedules to minimize T, the worst-case discovery delay, while
achieving full discovery diversity between any pair of nodes a and b for any duty cycle pair
$(\delta_a, \delta_b)$, any initial time offset $t_a^0$ and $t_b^0$, and any channel perception $\mathcal{N}_a$ and $\mathcal{N}_b$.
In what follows, we first establish a theoretical performance bound of any neighbor discovery
algorithm. We then present the baseline design and optimization of MCD in the single-channel case,
before proceeding to the multi-channel case with symmetrical channel perception (i.e., Na = Nb ).
We complete our analysis by addressing the generic case with asymmetrical channel perceptions
and arbitrary clock drift to iron out a version of MCD that works in practice.
Armed with the theoretical framework established previously, the following theorem derives
the performance bound of any multi-channel neighbor discovery algorithm achieving full discovery
diversity. This result thus establishes the lower-bound of the solution of Problem 2.4.
Theorem 2.4. (Algorithm-independent Bound of MTTD-FD) For any neighbor discovery algorithm
achieving full discovery channel diversity, the MTTD-FD between any pair of nodes a and b, denoted
by L, is lower-bounded by $N^2 d_a d_b$, where $d_a$ and $d_b$ denote the reciprocals of the duty cycles of a and
b, i.e., $d_a = \frac{1}{\delta_a}$ and $d_b = \frac{1}{\delta_b}$. Asymptotically, when $d_a \simeq d_b \simeq d$, $L \simeq N^2 d^2$.
In the single-channel case, the neighbor discovery schedule $x_u$ for each node u degenerates to
a binary sequence where
\[ x_u^t = \begin{cases} 1 & u \text{ wakes up in slot } t, \\ 0 & u \text{ sleeps in slot } t. \end{cases} \]
Each node wakes up periodically to discover its neighbors. The wake-up period is determined
by its duty cycle. Specifically, we consider two neighboring nodes a and b with duty cycles
$\delta_a = \frac{1}{d_a}$ and $\delta_b = \frac{1}{d_b}$. To discover each other, nodes a and b wake up every $d_a$ and
$d_b$ slots, i.e., $x_a(t) = 1$ for $t = k d_a$ and $x_b(t) = 1$ for $t = k d_b + \delta_{ab}$, where $\delta_{ab}$ is the clock
offset between a and b, $k = 1, 2, \cdots$. It follows from the Chinese Remainder Theorem [152] that if
$d_a$ and $d_b$ are co-prime to each other, the two nodes are ensured to discover each other regardless
of $\delta_{ab}$, i.e., there exists $t_d$ such that $x_a^{t_d} = x_b^{t_d}(\delta_{ab}) = 1$, $\forall \delta_{ab}$.
However, assigning co-prime numbers to each node in a distributed way is far from trivial. A
commonly adopted solution is to use only prime numbers because two distinct prime numbers are
by definition co-prime to each other, as in Disco [67] and U-Connect [106]. However, limiting the
choices to prime numbers fails to support all the duty cycles due to the limited number of prime
numbers. Note that among natural numbers smaller than 1000, only 168 are prime numbers.
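Both observations can be checked directly: co-prime wake-up periods meet for every clock offset, periods sharing a factor can miss each other forever, and the primes below 1000 can be counted with a sieve. This is a minimal sketch; the function names are illustrative and the periods 3, 4 and 6 are arbitrary examples.

```python
def worst_discovery_slot(da, db):
    """Latest first common wake-up slot over all clock offsets, or None if
    some offset never leads to a common wake-up slot."""
    worst = 0
    for offset in range(db):
        for t in range(1, da * db + 1):
            if t % da == 0 and (t - offset) % db == 0:
                worst = max(worst, t)
                break
        else:
            return None
    return worst

def count_primes_below(n):
    """Sieve of Eratosthenes."""
    sieve = [True] * n
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n, i):
                sieve[j] = False
    return sum(sieve)

print(worst_discovery_slot(3, 4))   # 12: co-prime periods always meet
print(worst_discovery_slot(4, 6))   # None: some offsets never meet
print(count_primes_below(1000))     # 168
```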
Motivated by the above analysis, we devise the following neighbor discovery schedule in MCD.
For each node u with duty cycle $\delta_u = \frac{1}{d_u}$,
\[ x_u(t) = \begin{cases} 1 & t \text{ is divisible by either } 2d_u - 1 \text{ or } 2d_u + 1, \\ 0 & \text{otherwise}. \end{cases} \]
Example 4. Consider two nodes a and b with duty cycles $\delta_a = \frac{1}{3}$, $\delta_b = \frac{1}{5}$ with a clock
offset $\delta_{ab} = 1$. Under MCD, using the time of a as reference, a wakes up in slots 5k and 7k, i.e.,
5, 7, 10, 14, 15, 20, 21, · · · , b wakes up in slots 9k + 1 and 11k + 1, i.e., 10, 12, 19, 23, · · · , as
illustrated in Fig. 2.4. The discovery happens in slot 10.
Node a : 0 0 0 0 1 0 1 0 0 1 0 0 0 1 1 0 0 0 0 1 1 0 0 0 1 ...
Node b : 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 ...
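The wake-up patterns of Example 4 can be reproduced with a few lines of Python. This is a minimal sketch rather than code from the thesis: the function names `mcd_awake` and `first_discovery` are hypothetical, and the clock offset is assumed to simply shift b's pattern, so that b is awake in slot t (in a's time) whenever t − δab is divisible by 2db − 1 or 2db + 1.

```python
def mcd_awake(t, d, offset=0):
    """Return True if a node with duty-cycle reciprocal d is awake in slot t.

    Assumption: a clock offset only shifts the pattern, so the node is awake
    when (t - offset) is divisible by 2d - 1 or 2d + 1.
    """
    s = t - offset
    return s % (2 * d - 1) == 0 or s % (2 * d + 1) == 0


def first_discovery(d_a, d_b, offset, horizon=10_000):
    """First slot t >= 1 (in a's time) in which both nodes are awake, or None."""
    for t in range(1, horizon + 1):
        if mcd_awake(t, d_a) and mcd_awake(t, d_b, offset):
            return t
    return None


# Example 4: d_a = 3, d_b = 5, clock offset 1, first 25 slots.
slots_a = [t for t in range(1, 26) if mcd_awake(t, 3)]
slots_b = [t for t in range(1, 26) if mcd_awake(t, 5, offset=1)]
print(slots_a)                   # [5, 7, 10, 14, 15, 20, 21, 25]
print(slots_b)                   # [1, 10, 12, 19, 23]
print(first_discovery(3, 5, 1))  # 10
```

The two printed lists match the rows "Node a" and "Node b" above, and the first common active slot is slot 10.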
The period of $x_u$ in MCD is $(2d_u - 1)(2d_u + 1)$, in which there are $4d_u - 1$ active slots$^5$. Hence, the
actual average duty cycle, denoted as $\hat{\delta}_u$, is $\frac{4d_u - 1}{(2d_u - 1)(2d_u + 1)}$, which approaches the required duty
cycle $\delta_u = \frac{1}{d_u}$ when $d_u$ is large. Generally, the relative error between $\hat{\delta}_u$ and $\delta_u$ is upper-bounded
by $\frac{1}{4d_u}$, as established in the following lemma.
Lemma 2.4. The relative error between the duty cycle of the neighbor discovery schedule $\hat{\delta}_u$ and the
required duty cycle $\delta_u$ is upper-bounded by $\frac{1}{4d_u}$.
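The bound of Lemma 2.4 can be checked numerically for a few values of du. The sketch below is illustrative only: the helper names are hypothetical, the active slots are counted by brute force over one period, and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction


def effective_duty_cycle(d):
    """Exact fraction of awake slots over one period (2d - 1)(2d + 1)."""
    p, q = 2 * d - 1, 2 * d + 1
    period = p * q
    active = sum(1 for t in range(1, period + 1) if t % p == 0 or t % q == 0)
    return Fraction(active, period)


def relative_error(d):
    """Relative gap between the effective duty cycle and the required 1/d."""
    required = Fraction(1, d)
    return abs(effective_duty_cycle(d) - required) / required


for d in (3, 5, 17, 100):
    # Lemma 2.4: the relative error does not exceed 1 / (4d).
    assert relative_error(d) <= Fraction(1, 4 * d)
    print(d, effective_duty_cycle(d), float(relative_error(d)))
```

For d = 3, for instance, the period is 35 slots with 11 active slots, so the effective duty cycle is 11/35 against a required 1/3.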
5
Note that (2du − 1)(2du + 1) is divisible by both 2du − 1 and 2du + 1.
Following the Chinese Remainder Theorem, the mutual discovery of two neighbor nodes a and
b in MCD, regardless of their clock drift, requires at least one of $2d_a \pm 1$ to be co-prime with at least
one of $2d_b \pm 1$. In the vast majority of cases, this requirement can be satisfied. To illustrate this, if
we allow the maximum duty cycle reciprocal D to be 100, then all duty cycles $\frac{1}{d}$ except d = 17 and
38 can be supported by MCD; if we allow D to be 1000, only 43 duty cycles cannot be supported,
i.e., MCD can support nearly 96% of all duty cycles.
In this subsection, we conduct a formal analysis on the design idea of MCD. We start by
formulating the definition of regular duty cycles that are natively supported by MCD.
Definition 2.7 (Regular Duty Cycle). Given the duty cycle reciprocal upper-bound D, we call a duty
cycle $\delta = \frac{1}{d}$ ($d \le D$) a regular duty cycle if for any $2 \le d' \le D$, at least one number from $2d \pm 1$ is
co-prime with at least one number from $2d' \pm 1$.
For two nodes a and b, if at least one of their duty cycles is regular, a and b can discover each
other. Reconsidering Example 4 with D = 1000, it follows from Definition 2.7 that the duty cycles
$\delta_a = 1/3$ and $\delta_b = 1/5$ are both regular. Evidently the two nodes can discover each other, as illustrated
in Fig. 2.4.
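Definition 2.7 lends itself to a direct exhaustive check. The following is a minimal sketch under the assumption that d and d' both range over 2, ..., D; the function names `compatible` and `non_regular` are hypothetical and not part of MCD.

```python
from math import gcd


def compatible(d1, d2):
    """True if at least one of 2*d1 +/- 1 is co-prime with at least one of 2*d2 +/- 1."""
    return any(
        gcd(x, y) == 1
        for x in (2 * d1 - 1, 2 * d1 + 1)
        for y in (2 * d2 - 1, 2 * d2 + 1)
    )


def non_regular(D):
    """Duty cycle reciprocals d in [2, D] that fail Definition 2.7."""
    return [
        d for d in range(2, D + 1)
        if not all(compatible(d, d2) for d2 in range(2, D + 1))
    ]


print(non_regular(100))  # [17, 38]
```

For D = 100 the search returns d = 17 and d = 38: the pair 33, 35 and the pair 75, 77 share a factor in every combination (3, 11, 5 and 7 respectively), so these two duty cycles block each other.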
We conclude this subsection by stating the following properties of regular duty cycles:
• The vast majority of duty cycles are regular. As illustrated in Tab. 2.1, more than 97% of duty
cycles are regular when D varies from 100 to 900. In contrast, in existing solutions based
on prime numbers, only a small portion of duty cycles can be supported due to the limited
choice of prime numbers.
• There are no three consecutive non-regular duty cycle reciprocals. In other words, if $\frac{1}{d}$ is
non-regular, at least one of $\frac{1}{d \pm 1}$ is regular. This implies that if the required duty cycle $\frac{1}{d}$ happens
to be non-regular, the node can operate on $\frac{1}{d+1}$ or $\frac{1}{d-1}$. If we take the case of using $\frac{1}{d-1}$, the
effective duty cycle is $\frac{4(d-1)-1}{[2(d-1)-1][2(d-1)+1]}$. The relative error to the required duty cycle can be
computed as
$$\left( \frac{4(d-1)-1}{[2(d-1)-1][2(d-1)+1]} - \frac{1}{d} \right) \Big/ \frac{1}{d} = \frac{4(d-1)+1}{4(d-1)^2-1},$$
which is decreasing in d. In the case of D = 1000, with the smallest non-regular duty cycle
reciprocal being 17, the relative error is upper-bounded by 4.5% for all non-regular duty cycles.
The analysis in this subsection demonstrates that MCD can support most duty cycles and even
in the cases where the duty cycles cannot be directly supported, MCD can use neighboring duty
cycles with almost negligible errors. Concerning the implementation of MCD, we note that the
regular duty cycles can be pre-calculated off-line by exhaustive search and stored in a look-up
table.
In the single-channel case, only the first performance metric (maximum time to discovery,
MTTD) is applicable. In Theorem 2.5, we derive the MTTD of MCD between two nodes a and b if
at least one of $\delta_a$ and $\delta_b$ is regular.$^6$ The proof consists of applying the Chinese Remainder Theorem
and the properties of the regular duty cycles [50].
Theorem 2.5 (Discovery Delay Upper-bound). Given any two nodes a and b, if at least one of their
duty cycles $\frac{1}{d_a}$ and $\frac{1}{d_b}$ is regular, they are ensured to discover each other within at most $(2d_a + 1)(2d_b + 1)$
slots.
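The bound of Theorem 2.5 can be illustrated by brute force on the pair of Example 4. The sketch below makes several assumptions that are not stated in the thesis: slots are aligned, b's schedule is a's construction shifted by an integer offset, and the delay is measured as the index of the first common active slot counted from slot 1 in a's time. The function names are hypothetical.

```python
def first_meeting(d_a, d_b, offset):
    """First slot t >= 1 (a's time) where both nodes are awake, or None.

    Assumption: b follows the same single-channel MCD rule as a, shifted by
    `offset` slots; one joint period is searched exhaustively.
    """
    pa, qa = 2 * d_a - 1, 2 * d_a + 1
    pb, qb = 2 * d_b - 1, 2 * d_b + 1
    for t in range(1, pa * qa * pb * qb + 1):
        a_awake = t % pa == 0 or t % qa == 0
        b_awake = (t - offset) % pb == 0 or (t - offset) % qb == 0
        if a_awake and b_awake:
            return t
    return None


def worst_case_delay(d_a, d_b):
    """Largest first-meeting slot over every offset in one period of b."""
    period_b = (2 * d_b - 1) * (2 * d_b + 1)
    return max(first_meeting(d_a, d_b, off) for off in range(period_b))


d_a, d_b = 3, 5
bound = (2 * d_a + 1) * (2 * d_b + 1)  # (2 d_a + 1)(2 d_b + 1) = 77
assert worst_case_delay(d_a, d_b) <= bound
```

Enumerating the offsets over one period of b is sufficient here because b's pattern repeats with that period.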
The neighbor discovery schedule of MCD in the multi-channel case is constructed based on each node's
globally unique ID, such as its MAC address, which can be mathematically expressed as a binary
sequence of length l. The neighbor discovery schedule construction process is composed of three
steps.
• Step 1: Each node u independently generates a padded binary sequence $o_u$ based on its ID
such that the padded binary sequences of any two nodes are cyclic rotationally distinct from
each other;
• Step 2: Each node u independently generates a sequence $s_u$ based on $o_u$ such that for any
two nodes a, b and any initial time offsets $t_a^0$ and $t_b^0$, there always exist four time slots $l_{ij}$
($i, j \in \{0, 1\}$) such that $s_a^{l_{ij}}(t_a^0) = i$ and $s_b^{l_{ij}}(t_b^0) = j$.
• Step 3: Each node u generates its neighbor discovery schedule based on $s_u$.
The first two steps follow a procedure similar to that of the generalized telephone problem
analyzed in the previous section. We now discuss the third step.
In the third step, the neighbor discovery schedule is constructed as follows. Each node u hops
across different channels $h \in \mathcal{N}$ and wakes up based on the following schedule$^7$:
$$x_u^t = \begin{cases} h & \text{$t - h d_u$ is divisible by $2N d_u \pm 1$,} \\ 0 & \text{otherwise,} \end{cases}$$
where $x_u^t = h$ signifies that u wakes up on channel h in slot t while $x_u^t = 0$ indicates that u sleeps
in the slot, and $\frac{1}{N d_u}$ is chosen from the regular duty cycle set.
The above construction of $x_u$ does not take into account the case where there exist two different
channels $h_c$ ($c = 0, 1$) such that $t - h_0 d_u$ is divisible by $2N d_u - 1$ and $t - h_1 d_u$ by $2N d_u + 1$. To
resolve such a conflict, let $t' = t \bmod L_s$; u operates on channel $h_c$ if $s_u^{t'} = c$. We refer to the slots
where u operates on channel $h_c$ in case of conflict as type-c slots.
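To make the multi-channel rule and the conflict case concrete, the sketch below lists, for a given slot, every channel on which the divisibility rule fires. It is only an illustration: the function name `candidate_channels` is hypothetical, channels are assumed to be indexed 1, ..., N (since 0 encodes a sleeping slot), and the tie-breaking by the sequence su is deliberately left out.

```python
def candidate_channels(t, d, n_channels):
    """Channels h for which t - h*d is divisible by 2*N*d - 1 or 2*N*d + 1.

    Assumption: channels are indexed 1..N. An empty set means the node sleeps;
    more than one element is the conflict case that MCD resolves with s_u.
    """
    p, q = 2 * n_channels * d - 1, 2 * n_channels * d + 1
    return {
        h for h in range(1, n_channels + 1)
        if (t - h * d) % p == 0 or (t - h * d) % q == 0
    }


# Toy setting: N = 2 channels, d_u = 3, so the moduli are 11 and 13.
print(candidate_channels(14, 3, 2))  # {1}
conflicts = [t for t in range(1, 201) if len(candidate_channels(t, 3, 2)) > 1]
print(conflicts)                     # [58, 94]
```

In slot 58, for example, 58 − 3 = 55 is divisible by 11 and 58 − 6 = 52 is divisible by 13, so both channels are candidates and the node needs the type-c rule described above to pick one.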
6
Throughout our analysis, we focus on the pair-wise discovery between any pair of neighbor nodes a and b. The
obtained results can be readily generalized to the network level, where each node should discover all its neighbor nodes,
in the same way as in Theorem 2.4.
7
To make the notation concise, we adopt the convention that "$t - h d_u$ is divisible by $2N d_u \pm 1$" means that $t - h d_u$ is
divisible by $2N d_u - 1$ or $2N d_u + 1$ or both.
To intuitively see that the discovery is ensured between any pair of nodes a, b, note that if $\frac{1}{N d_u}$
belongs to the usable duty cycle set derived previously, i.e., at least one of $2N d_a \pm 1$ is co-prime
with at least one of $2N d_b \pm 1$, discovery can be guaranteed for any initial time offsets $t_a^0$ and $t_b^0$
because there always exist four time slots $l_{ij}$ ($i, j \in \{0, 1\}$) such that $s_a^{l_{ij}}(t_a^0) = i$ and $s_b^{l_{ij}}(t_b^0) = j$
following the second step of construction.
Theorem 2.6. If $\frac{1}{N d_a}$ and $\frac{1}{N d_b}$ are regular duty cycles, the MTTD-FD between two nodes a and b is
$O(L_s N^2 \max\{d_a^2, d_b^2\})$, where $L_s$ denotes the length of the sequences generated in Step 2.
In the previous analysis, we implicitly assume that a and b have the same channel perception, i.e.,
they have symmetrical knowledge of $\mathcal{N}$. In this subsection, we relax this assumption to investigate
the scenario where each node u has its own perception of $\mathcal{N}$, denoted by $\mathcal{N}_u$, which is a subset of
$\mathcal{N}$. In this context, the neighbor discovery schedule in MCD becomes
$$x_u^t = \begin{cases} h & \text{$t - h d_u$ is divisible by $2N_u d_u \pm 1$,} \\ 0 & \text{otherwise.} \end{cases}$$
Specifically, the channel perception asymmetry between a and b can be characterized at two
levels:
• Asymmetry on accessible channel set: They have asymmetrical perceptions of the global channel
set $\mathcal{N}$, i.e., $\mathcal{N}_a \neq \mathcal{N}_b$ and $\mathcal{N}_a \cap \mathcal{N}_b \neq \emptyset$;
• Asymmetry on channel index: They have asymmetrical perceptions of the channel index, i.e.,
channel $h \in \mathcal{N}$ is indexed $h_a$ by a and $h_b$ by b where $h_a \in \mathcal{N}_a$ and $h_b \in \mathcal{N}_b$ but $h_a \neq h_b$.
The following theorem establishes the performance of MCD in such a context.
Theorem 2.7. MCD under asymmetrical channel perceptions achieves the same MTTD-FD as under
symmetrical channel perceptions, i.e., within at most $O(L_s \max\{N_a^2 d_a^2, N_b^2 d_b^2\})$ (specifically, $O(L_s N^2 d^2)$
if $d_a \simeq d_b \simeq O(d)$ and $N_a \simeq N_b \simeq O(N)$) slots; the discovery between a and b occurs on each channel
$h \in \mathcal{N}_a \cap \mathcal{N}_b$.
Theorem 2.7 shows that MCD is robust against asymmetrical channel perceptions, either on
the channel set or index.
We then study the effect of slot non-alignment caused by relative clock drift between the
neighbor nodes.
We first briefly introduce the clock model. Each node is equipped with a local clock, which is a
time measurement device composed of a hardware oscillator and an accumulator. Mathematically,
considering two nodes a and b, we can express the local time at b, denoted as $t_b$, as a function of the
local time of a, denoted as $t_a$, by the following formula
$$t_b(t_a) = \int_{t_0}^{t_a} \rho_{ab}(\tau)\, d\tau + t_b(t_0),$$
where $\rho_{ab}(\tau)$ denotes the relative frequency drift rate of the oscillator of b as a function of a at time
$\tau$, and $t_b(t_0)$ is the initial clock offset between them.
If a and b are ideally synchronized, it holds that $\rho_{ab}(\tau) = 1$ and $t_b(t_0) = 0$. In practice, the two clocks
may drift away from each other, i.e., $\rho_{ab}(\tau)$ may deviate from 1 by up to $\Delta\rho_{\max}$,
where $\Delta\rho_{\max}$ is bounded by $10^{-6}$ in practice. Hence we can regard $\rho_{ab}(\tau)$ as a constant $\rho_{ab}$ during
the discovery process. Without loss of generality, we assume that the clock of b advances no slower
than that of a, i.e., $\rho_{ab} \le 1$.
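As a small worked step, under the constant-drift approximation just stated, the integral in the clock model collapses to an affine map between the two local times. The derivation below only substitutes the constant $\rho_{ab}$ into the formula above and introduces no new assumption.

```latex
t_b(t_a) = \int_{t_0}^{t_a} \rho_{ab}\, d\tau + t_b(t_0)
         = \rho_{ab}\,(t_a - t_0) + t_b(t_0).
```

With $\rho_{ab} = 1$ this reduces to a fixed clock difference $t_b(t_0)$, which is the setting examined first in what follows.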
When ρab = 1, i.e., the clock difference between a and b remains tb (t0 ), we distinguish the
following two cases (to facilitate presentation, we normalize the slot duration of a):
• Case 1: tb (t0 ) = k ∈ Z: this is the case with aligned slots addressed in previous analysis;
• Case 2: tb (t0 ) = k + δ with k ∈ Z and δ ∈ (−1/2, 1/2]: the previous analysis can be directly
adapted to this case, the difference being that instead of ensuring entire overlap, a discovery
in this case is a partial overlap of time 1 − |δ|.
We now investigate the case where ρab < 1, meaning that if we regard the slot duration of
a as unit time, the slot duration of b is ρab < 1. The following theorem establishes the discovery
performance of MCD with arbitrary clock drift with ρab < 1.
Theorem 2.8. Regarding the slot of a as unit time, a and b can discover each other on each channel h
within at most $O(\rho_{ab} L_s N^2 \max\{d_a^2, d_b^2\})$ time.
The results obtained in this subsection, particularly Theorem 2.8, demonstrate that the discovery
performance established in previous analysis holds even when the clocks of a and b drift away from
each other for an arbitrary amount of time. In other words, MCD is robust against clock drift and
slot non-alignment.
With the rapid development of wireless communication technology and the significantly
decreasing prices of radios, it is nowadays feasible to equip a wireless device with multiple radios,
each operating on a separated spectrum channel. Equipping all or some nodes with multiple radios
can significantly boost the network capacity by enabling simultaneous operations over multiple
channels and mitigating interferences through proper channel assignment.
Motivated by the above observation, we propose an order-optimum multi-radio channel rendezvous
algorithm that exploits the rendezvous diversity created by multiple radios to minimize
the rendezvous delay and increase the rendezvous robustness. Our
solution is a unified channel rendezvous algorithm that can operate in both the homogeneous case,
where both rendezvous nodes are equipped with only one radio or with multiple radios, and
the heterogeneous case, where one of the rendezvous nodes has a single radio and the other has multiple
radios.
In this work, we focus on neighbor discovery for wireless nodes with directional antennas.
Compared to the traditional omni-directional antenna paradigm, neighbor discovery with directional
antennas is intuitively more challenging as directional antennas can only cover a fraction of the
azimuth. Hence, neighbor discovery algorithms need to be carefully designed in order to guarantee
that any pair of neighbor nodes can eventually steer their antennas toward each other at a certain
time instant. Moreover, nodes may not be synchronized and their antennas can be heterogeneous
in terms of beam-width. Neighbor discovery algorithms should be able to guarantee discovery in
this challenging environment in a fully decentralized manner without any prior coordination.
We coin the term oblivious neighbor discovery problem to denote the following problem: How can
neighbor nodes with heterogeneous antenna beam-width and without clock synchronization discover
each other within a bounded delay in a fully decentralized manner without any prior coordination?
Particularly, the following requirements should be satisfied:
• Bounded (and minimum) worst-case discovery delay;
• Discovery oblivity, i.e., the capability of guaranteeing discovery regardless of the antenna
beam-width and the relative positions of nodes. This requirement is specific to neighbor
discovery with directional antennas.
We conduct a comprehensive investigation on the oblivious neighbor discovery problem, sum-
marized as follows:
• Theoretical framework. We establish a theoretical framework on oblivious neighbor discovery
and establish the performance bound of any oblivious neighbor discovery algorithm. Our
theoretical results not only shed light on the structure of the problem, but also serve as
design guidelines for oblivious neighbor discovery protocols.
• Algorithm design. Guided by the theoretical results, we further design an oblivious neighbor
discovery algorithm and prove that it achieves guaranteed oblivious discovery with order-
minimum worst-case discovery delay in the asynchronous and heterogeneous environment.
We further demonstrate how the protocol can be configured to achieve a desired trade-off
between average and worst-case performance.
the other hand, have good worst-case performance at the price of worse average performance. A
promising research direction is to combine the advantages of both while limiting their side effects. In
our work, we interleave the probabilistic slots, where the operation of the algorithm is randomized,
with the deterministic ones to trade off the worst-case performance against the average performance.
Another very recent result on the rendezvous problem demonstrates how to beat the quadratic
theoretical barrier on the worst-case rendezvous delay by utilizing a public source of randomness
in conjunction with a Markovian hitter [54]. Generally speaking, how to systematically orchestrate
and synergize randomness with determinism is an important research axis for us.
From a networking and system perspective, our work and most neighbor discovery protocols in
the literature do not make specific assumptions about the mobility patterns to achieve neighbor
discovery. In other words, they can achieve discovery without exploiting knowledge concerning
mobility patterns. This is sometimes an advantage as they can be applied to either static or mobile
devices and are robust to mobility as their performance does not depend on mobility. However, there
are many practical scenarios where neighbor discovery can be facilitated by exploiting the mobility
patterns of devices. Hence, how to design mobility-aware neighbor discovery protocols that exploit
device mobility to facilitate neighbor discovery, either limiting discovery delay or increasing energy
efficiency, is another pertinent research direction. For example, a possible idea is to adopt a more
flexible duty cycle mode and to allocate more active slots when mobility is significant to ensure
more agile neighbor discovery.
Chapter 3
3.1 Introduction
In this chapter, we address the problem of opportunistic channel access in a multi-channel
communication system. Specifically, we consider a communication system in which a sender has
access to multiple channels, but is limited to sense and transmit only on one or a limited number
of channels in each slot. We explore how a smart sender should exploit past observations and
the knowledge of the stochastic properties of these channels to maximize its transmission rate by
switching opportunistically across channels.
Formally, we provide a generic analysis on the opportunistic spectrum access problem by casting
the problem into the restless multi-armed bandit (RMAB) problem, one of the most well-known
generalizations of the classic multi-armed bandit (MAB) problem, which is of fundamental
importance in stochastic decision theory. Despite the significant research efforts in the field, the RMAB
problem in its generic form still remains open. To date, very few results have been reported on the
structure of the optimum policy$^1$. Obtaining the optimum policy for a general RMAB problem is
often intractable due to the exponential computation complexity. Hence, a natural alternative is to
seek a simple myopic policy maximizing the short-term reward.
Motivated by the above analysis, we study the following natural yet fundamentally important
question: under what conditions is the myopic policy guaranteed to be optimum? We answer the
question by performing an axiomatic study. More specifically, we develop three axioms characterizing
a family of functions which we refer to as regular functions, which are generic and practically
important. We then establish the optimality of the myopic policy when the reward function can be
expressed as a regular function and when the discount factor is bounded by a closed-form threshold
determined by the reward function. We also illustrate how the derived results, generic in nature,
are applied to analyze a class of RMAB problems arising from multi-channel opportunistic access.
Compared with the existing literature addressing the optimality of the myopic policy of the
1
A policy is in fact an algorithm that tells the user which arm (channel in our case) to activate. We use the term
policy to be consistent with the notation convention in the MAB context.
Chapter 3. Opportunistic Channel Access: A Restless Multi-Armed Bandit Perspective
In the first part of this chapter, we study the optimality of the myopic sensing policy in the case
where the user is allowed to sense k out of N channels. In the second part, we further investigate
a more challenging problem where the user has to decide the number of channels to sense in
order to maximize its utility. This optimization problem hinges on the following tradeoff between
exploitation and exploration: sensing more channels can help learn and predict the future channel
state, thus maximizing the long-term reward, but at the price of sacrificing the reward at current slot
as sensing more channels reduces the time for data transmission, thus decreasing the throughput
in the current slot. Therefore, finding the optimum number of channels to sense consists of striking
a balance between the above exploitation and exploration. After showing the NP-hardness of the
problem, we develop a heuristic ν-step lookahead policy which consists of sensing channels in a
myopic way and stopping sensing when the expected aggregated utility from the current slot t to
slot t + ν begins to decrease. In the developed policy, the parameter ν makes it possible to achieve a desired
tradeoff between social efficiency and computation complexity.
From the system perspective, our analysis provides insight on the following design tradeoff in
opportunistic spectrum access:
• Gaining immediate access versus gaining information for future use: Due to hardware limita-
tions and the energy cost of spectrum monitoring, a user may not be able to sense all the
channels in the spectrum simultaneously. A sensing strategy is thus needed for intelligent
channel selection to track the channel evolution. The purpose of a sensing strategy is twofold:
to find good channels for immediate access and to gain statistical information on the channels
for future transmissions. The optimum sensing strategy should thus strike a balance between
these two often conflicting objectives.
• Aggressive versus conservative: Based on the imperfect sensing outcomes given by the spectrum
sensor, the user needs to decide whether to access. An aggressive access strategy may lead to
excessive transmission failures while a conservative one may result in throughput degradation
due to overlooked opportunities.
Despite our focus on opportunistic communication, the problem formulation is applicable
in many other engineering fields such as communication jamming, scheduling and object tracking.
Hence the results presented in this thesis are generically applicable in a large range of domains
beyond the scope of opportunistic channel access.
The rest of this chapter is structured as follows. Section 3.2 provides an overview of the RMAB
problem and its application. Section 3.3 develops our work on the myopic policy for the RMAB
problem in the context of opportunistic channel access. Section 3.4 presents our work on the ν-step
look-ahead strategy. Section 3.5 concludes the chapter by briefly summarizing our other
work related to this topic. The work of this chapter is in collaboration with my former Ph.D. student
Kehao Wang (co-advised with Pr. Kaldoun Al Agha) who is now a visiting research associate at
MIT. More details of our work on this topic including proofs and numerical analysis can be found
in our publications [42, 200–203, 205–207].
The multi-armed bandit problem, first posed in 1933, has become a classical problem in stochastic
optimization with a wide range of engineering applications, including but not limited to multi-agent
systems, web search and Internet advertising, social networks, and queueing systems. Recently, it
has found new applications in communication networks and dynamic systems.
Consider a dynamic system consisting of a player and N independent arms. In each slot t
($t = 1, 2, \cdots$), the state of arm k is denoted by $s_k(t)$ and is completely observable to the player.
At slot t, the player selects one arm, say arm k, to activate based on the system state $S(t) =
[s_1(t), s_2(t), \cdots, s_N(t)]$ and accrues reward $R(s_k(t))$ determined by the state $s_k(t)$ of arm k.
Meanwhile, the state of arm k transitions to another state in the next slot according to certain transition
probabilities, i.e., $p^k_{i,j} = P(s_k(t+1) = j \mid s_k(t) = i)$, $i, j \in \Omega_k$, where $\Omega_k$ denotes the state space of
arm k. The states of other arms which are not activated will remain frozen, i.e., $s_n(t+1) = s_n(t)$
$\forall n \neq k$.
The player's selection policy $\pi = \{\pi(1), \pi(2), \cdots\}$ is a series of mappings from the system state
$S(t)$ to the action $a(t)$ indicating which arm is activated, i.e., $\pi(t) : S(t) \to a(t)$. The objective of
the player is to obtain the optimum policy $\pi^*$ maximizing the expected total discounted reward,
i.e.,
$$\pi^* = \arg\max_{\pi} E\Big[ \lim_{T \to \infty} \sum_{t=1}^{T} \beta^{t-1} R(s_{a(t)}(t)) \Big],$$
Since the size of the system states grows exponentially with the number of arms, the above
problem, called the classic MAB problem, has an exponential complexity for its general numerical
solutions.
This sequential decision problem was first proposed by Thompson in 1933 [191], but the
theoretical structure of the optimum solution for the classic MAB was not obtained until
Gittins's seminal work [83] in 1974. Gittins showed that an index policy, later called the Gittins
index policy, is optimum, and thus reduced the complexity of the problem from exponential to linear in the
number N of arms.
Theorem 3.1. The optimum policy has an index form. Specifically, for all $1 \le k \le N$, there exists an
index function $G_k(\cdot)$ that maps the state $i \in \Omega_k$ of arm k to a real number. At each time, the optimum
action is to activate the arm with the largest index.
Gittins also gave a specific form of the index function $G_k(\cdot)$, referred to as the Gittins index, as given
in the following definition.
Definition 3.1 (Gittins Index). For any state $i \in \Omega_k$ of arm k,
$$G_k(i) = \limsup_{\sigma \ge 1} \frac{E\big[\sum_{t=1}^{\sigma} \beta^{t-1} R(s_k(t)) \mid s_k(1) = i\big]}{E\big[\sum_{t=1}^{\sigma} \beta^{t-1} \mid s_k(1) = i\big]},$$
Basically, Gittins index measures the maximum reward rate that can be achieved by focusing
on activating one arm starting from its current state. Therefore, by Gittins index, the player can
accrue reward as quickly as possible and thus maximize the total discounted reward.
Whittle [212] extended the MAB to a more general model where a set of K (K > 1) arms,
denoted as $\mathcal{K}(t)$, can be activated simultaneously and change their states in each slot, while
the passive arms are also allowed to offer reward and change state, which makes it different
from the classic MAB. If arm k is activated, its state transits according to a transition rule
$P_k^1$ and yields the immediate reward $g_k^1(s_k(t))$, while it transits by another rule $P_k^2$ and yields
the immediate reward $g_k^2(s_k(t))$ when arm k is not activated. A policy $\pi = \{\pi(t)\}_{t=1}^{\infty}$ is a series of
mappings where $\pi(t)$ maps the system state $S(t)$ to the set of K arms $\mathcal{K}(t)$ to be activated in slot t.
In [212], Whittle considered the above problem of maximizing the average reward over an
infinite horizon$^2$, which can be formulated as follows:
$$\pi^* = \arg\max_{\pi} E\Big\{ \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \underbrace{\Big[ \sum_{i \in \mathcal{K}(t)} g_i^1(s_i(t)) + \sum_{j=1,\, j \notin \mathcal{K}(t)}^{N} g_j^2(s_j(t)) \Big]}_{R(t)} \Big\}.$$
We now introduce Whittle’s index by referring to the problem of opportunistic channel access3 .
To this end, we consider a Markovian single-armed bandit process, i.e., a single channel. In each
2
The case of discounted reward can be treated similarly.
3
We refer readers to [212] for a detailed presentation of Whittle’s index and the related indexability result.
slot, the user chooses one of two possible actions a ∈ {0 (passive), 1 (active)} to make the arm
passive or active. An expected reward of ω is obtained when the arm is activated at belief state ω.$^4$
The objective is to decide whether to activate the arm in each slot to maximize the total discounted
or average reward. The optimum policy is essentially given by an optimum partition of the state
space [0, 1] into a passive set {ω : a∗ (ω) = 0} and an active set {ω : a∗ (ω) = 1}, where a∗ (ω)
denotes the optimum action under belief state ω.
Whittle’s index measures how attractive it is to activate an arm based on the concept of subsidy
for passivity. Specifically, we construct a single-armed bandit process that is identical to the above
specified bandit process except that a constant subsidy m is obtained whenever the arm is made
passive. Obviously, this subsidy m will change the optimum partition of the passive and active sets,
and states that remain in the active set under a larger subsidy m are more attractive to the user.
The minimum subsidy m that is needed to move a state from the active set to the passive set under
the optimum partition thus measures how attractive this state is.
Whittle index and Whittle index policy are formally defined as below. Whittle also conjectured
the optimality of the Whittle index policy.
Definition 3.2 (Whittle index and Whittle index policy [212]). If an arm is indexable, its Whittle
index W(ω) at state ω is the infimum subsidy m such that it is optimum to make the arm passive
at ω. The Whittle index policy consists of activating the K arms with the largest indices in each slot.
Conjecture 3.1 (Whittle Conjecture [212]). If all arms are indexable, the Whittle index policy is
optimum in terms of average reward per arm in the limit.
The research efforts on analyzing the RMAB problem arising in various applications, espe-
cially communication and networking and dynamic systems, usually fall into the following three
categories.
• The first one is to seek sufficient conditions for simple and robust policies (e.g., myopic policy,
greedy policy) under which the optimality of such policies is guaranteed [6, 116, 125, 148,
237].
• The second one is to construct particular policies whose performance loss to the system
optimum is bounded [84, 85].
• The third one is to calculate the Whittle index and derive policies based on the Whittle
index [68, 101, 105, 118, 126, 162].
Another extension of MAB, different from the Bayesian formulation in our problem, is the
so-called non-Bayesian MAB where the channels’ availability statistics are not correlated in time
as Markov chains and are initially unknown to the users and need to be estimated via learning.
This leads to a tradeoff between exploration, by activating new arms to obtain more statistical
information, and exploitation, by activating optimum arms based on current knowledge. The
central problem in this formulation is to optimize the asymptotic performance by minimizing the
4
Please refer to Sec. 3.3.1 for a formal description of the system model.
regret of the developed policy, given that the system regret under a policy π is defined as the
accumulative expected reward loss up to time T under the policy π compared to the genie-aided
policy where the stochastic properties of arms are known. Readers can refer to the literature [4,
12, 13, 18, 115, 122, 123, 125, 189] for details.
where
$p_{01}^{(i)}$ = prob(channel i is good in the current slot given being bad in the previous slot),
$p_{11}^{(i)}$ = prob(channel i is good in the current slot given being good in the previous slot).
We assume that channels go through state transition at the beginning of each slot t. The system
operates in a slotted fashion with slots indexed by t (t = 1, 2, · · · , T ), where T is the time horizon
of interest. Due to hardware constraints and energy cost, the user is allowed to sense only k
(1 ≤ k ≤ N ) of the N channels at each slot t. We assume that the user makes the channel selection
decision at the beginning of each slot after the channel state transition. Once a channel is chosen,
the user detects the channel state Si (t), which can be considered as a binary hypothesis test:
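The performance of channel i state detection is characterized by the probability of false alarm $\epsilon_i$
and the probability of miss detection $\zeta_i$:
$$\epsilon_i \triangleq \Pr\{\text{decide } H_1 \mid H_0 \text{ is true}\},$$
$$\zeta_i \triangleq \Pr\{\text{decide } H_0 \mid H_1 \text{ is true}\}.$$
We denote the set of channels chosen by the user at slot t by $\mathcal{A}(t)$ where $\mathcal{A}(t) \subseteq \mathcal{N}$ and $|\mathcal{A}(t)| = k$.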
Based on the imperfect sensing results {Oi (t) ∈ {0, 1} : i ∈ A(t)} in slot t, the user determines
which channels to access for transmission.
Obviously, by imperfectly sensing only k out of N channels, the user cannot observe the state
information of the whole system. Hence, the user has to infer the channel states from its past
decision and observation history so as to make its future decision. To this end, we define the channel
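state belief vector (hereinafter referred to as belief vector for brevity) $\Omega(t) \triangleq \{\omega_i(t), i \in \mathcal{N}\}$, where
$0 \le \omega_i(t) \le 1$ is the conditional probability that channel i is in good state (i.e., $S_i(t) = 1$). Given
the sensing action $\mathcal{A}(t)$ and the observations $\{O_i(t) \in \{0, 1\} : i \in \mathcal{A}(t)\}$, the belief vector in slot t + 1
can be updated recursively using Bayes' rule as shown in (3.1):
$$\omega_i(t+1) = \begin{cases} p_{11}^{(i)}, & i \in \mathcal{A}(t),\ O_i(t) = 1 \\ T_i(\varphi_i(\omega_i(t))), & i \in \mathcal{A}(t),\ O_i(t) = 0 \\ T_i(\omega_i(t)), & i \notin \mathcal{A}(t) \end{cases} \qquad (3.1)$$
where
$$T_i(\omega_i(t)) \triangleq \omega_i(t)\, p_{11}^{(i)} + (1 - \omega_i(t))\, p_{01}^{(i)}, \qquad (3.2)$$
$$\varphi_i(\omega_i(t)) \triangleq \frac{\epsilon_i\, \omega_i(t)}{1 - (1 - \epsilon_i)\, \omega_i(t)}. \qquad (3.3)$$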
Remark. We would like to emphasize that the sensing error introduces further complications in the
system dynamics (i.e., ϕi (ωi (t)) is non-linear with ωi (t)) compared with the perfect sensing case.
Therefore, those results (e.g., [6]) obtained without sensing error cannot be trivially extended to the
scenario with sensing error.
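The update rule (3.1)-(3.3) can be sketched for a single channel as follows. This is a minimal illustration rather than the implementation used in the cited works; the function names and the boolean `sensed` flag are assumptions introduced here, and the sketch assumes that the sensed-bad case has non-zero probability (i.e., 1 − (1 − ε)ω > 0).

```python
def transition(omega, p11, p01):
    """One-step Markov prediction T(omega) of eq. (3.2)."""
    return omega * p11 + (1.0 - omega) * p01


def posterior_after_bad(omega, eps):
    """phi(omega) of eq. (3.3): belief that the channel is good
    given that it was sensed bad, with false alarm probability eps."""
    return eps * omega / (1.0 - (1.0 - eps) * omega)


def update_belief(omega, sensed, observation, p11, p01, eps):
    """Belief of one channel in slot t+1 following eq. (3.1).

    sensed      -- True if the channel belongs to A(t) (hypothetical flag)
    observation -- O_i(t): 1 (sensed good) or 0 (sensed bad); ignored if not sensed
    """
    if not sensed:
        return transition(omega, p11, p01)
    if observation == 1:
        return p11
    return transition(posterior_after_bad(omega, eps), p11, p01)
```

For example, with p11 = 0.8, p01 = 0.2 and ε = 0.05, a channel with belief 0.5 that is sensed bad is mapped to a belief below 0.5 but above p01, since the false alarm leaves some residual probability that the channel was in fact good.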
Chapter 3. Opportunistic Channel Access: A Restless Multi-Armed Bandit Perspective
where R(πt (Ω(t))) is the reward collected in slot t under the sensing policy πt with the initial belief
vector Ω(1)⁵, and 0 ≤ β ≤ 1 is the discount factor reflecting that future rewards
are less valuable than the immediate reward. By treating the belief value of each channel as the
state of each arm of a bandit, the user’s optimization problem can be cast into a RMAB problem.
In order to get more insight on the structure of the optimization problem formulated in (3.4)
and the complexity to solve it, we derive the dynamic programming formulation of (3.4) as follows:
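$$V_T(\Omega(T)) = \max_{A(T)} \mathbb{E}\big[R(\pi_t(\Omega(T)))\big],$$
$$V_t(\Omega(t)) = \max_{A(t)} \mathbb{E}\Big[R(\pi_t(\Omega(t))) + \beta \sum_{\mathcal{E} \subseteq A(t)} \prod_{i \in \mathcal{E}} (1 - \epsilon_i)\,\omega_i(t) \prod_{j \in A(t) \setminus \mathcal{E}} \big[1 - (1 - \epsilon_j)\,\omega_j(t)\big]\, V_{t+1}(\Omega(t+1))\Big].$$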
In the above Bellman equations, Vt (Ω(t)) is the value function corresponding to the maximum
expected reward from slot t to T (1 ≤ t ≤ T ), with the belief vector Ω(t + 1) following the
evolution described in (3.1) given that the channels in the subset E are sensed in good state and
the channels in A(t)\E are sensed in bad state.
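To make the cost of this recursion concrete, the Bellman equations can be evaluated by exhaustive enumeration on a toy instance. The sketch below assumes homogeneous channels (a single p11, p01 and ε) and a caller-supplied expected slot reward `reward`; the function name and argument layout are illustrative assumptions. Each slot branches over all k-subsets and all 2^k sensing outcomes, so the running time grows exponentially with the horizon.

```python
from itertools import combinations, product


def bellman_value(t, omega, horizon, k, p11, p01, eps, beta, reward):
    """Maximum expected discounted reward from slot t to `horizon`,
    computed by brute-force recursion over actions and observations."""
    n = len(omega)
    best = float("-inf")
    for action in combinations(range(n), k):
        value = reward([omega[i] for i in action])
        if t < horizon:
            for obs in product((1, 0), repeat=k):
                prob = 1.0
                # default update for channels that are not sensed
                nxt = [w * p11 + (1.0 - w) * p01 for w in omega]
                for i, o in zip(action, obs):
                    p_good = (1.0 - eps) * omega[i]  # prob. of "sensed good"
                    if o == 1:
                        prob *= p_good
                        nxt[i] = p11
                    else:
                        prob *= 1.0 - p_good
                        if prob == 0.0:
                            break  # impossible outcome, skip it
                        phi = eps * omega[i] / (1.0 - p_good)
                        nxt[i] = phi * p11 + (1.0 - phi) * p01
                if prob > 0.0:
                    value += beta * prob * bellman_value(
                        t + 1, nxt, horizon, k, p11, p01, eps, beta, reward)
        best = max(best, value)
    return best
```

With two channels of beliefs (0.6, 0.4), k = 1, p11 = 0.8, p01 = 0.2, no sensing error, β = 1 and the reward `sum`, the two-slot value is attained by sensing the channel with the larger belief first.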
Solving (3.4) using the above recursion is computationally heavy. Hence, a
natural alternative is to seek a simple myopic sensing policy that is easy to compute and implement
and that maximizes the immediate reward, formally defined as follows.
Definition 3.3 (Myopic Policy). Let the expected reward function F (Ω(t)) ≜ E[R(πt (Ω(t)))] denote
the expected immediate reward obtained in slot t under the sensing policy πt . The myopic sensing policy
consists of sensing the k channels that maximize F (Ω(t)) for each slot t.
Despite (or due to) its simple and robust structure, the optimality of the myopic sensing policy is
not guaranteed. More specifically, when the channels are stochastically identical (i.e., all channels
follow the same Markovian dynamics P(i) = P, ∀i ∈ N ) and positively correlated (i.e., p11 > p01 ),
the myopic sensing policy is shown to be optimum when the user is limited to sensing one channel
each slot (k = 1) and obtains one unit of reward when the sensed channel is good [237]. The
analysis in [6] and our work [209] further extend the study to the generic case where k ≥ 1. However,
the authors of [6] show that the myopic sensing policy is optimum if the user gets one unit of
5 If no information on the initial system state is available, each entry of Ω(1) can be set to the stationary distribution
$\omega_0^{(i)} = \frac{p_{01}^{(i)}}{1 + p_{01}^{(i)} - p_{11}^{(i)}}$, 1 ≤ i ≤ N.
reward for each channel sensed to be good⁶, while our work [209] shows via counter-examples
that the myopic sensing policy is not guaranteed to be optimal when the user’s objective is to find
at least one good channel⁷. Given that such a nuance in the reward function leads to contrary
results, a natural yet fundamentally important question arises: how does the expected slot reward
function F (Ω(t)) impact the optimality of the myopic sensing policy? Or more specifically, under
what conditions on F (Ω(t)) is the myopic sensing policy guaranteed to be optimum?
In the subsequent analysis, by performing an axiomatic study, we answer the questions posed
above and study some important engineering implications behind the myopic sensing
policy. To make the presentation more streamlined, we present our results for the homogeneous
case where P(i) = P and εi = ε, ∀i ∈ N . We thus drop the channel index for notational simplicity.
Readers are referred to [201] for the heterogeneous case. We also assume that the channels are
positively correlated (p01 < p11 ), which corresponds to the realistic scenarios where the channel
states are observed to evolve gradually over time.
To conclude this subsection, we state some structural properties of T (ωi (t)) and ϕ(ωi (t)) that
are useful in the subsequent analysis.
Lemma 3.1. For positively correlated channel i, it holds that
• T (ωi (t)) is monotonically increasing in ωi (t);
• p01 ≤ T (ωi (t)) ≤ p11 , ∀ 0 ≤ ωi (t) ≤ 1.
Lemma 3.2. If $0 \le \epsilon \le \frac{(1-p_{11})\,p_{01}}{p_{11}(1-p_{01})}$ and p01 < p11 , it holds that
• ϕ(ωi (t)) increases monotonically in ωi (t) with ϕ(0) = 0 and ϕ(1) = 1;
• ϕ(ωi (t)) ≤ p01 , ∀p01 ≤ ωi (t) ≤ p11 .
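The two lemmas can be checked numerically on a grid, which also shows why the bound on ε matters. The following sketch is only a sanity check under stated assumptions (homogeneous channels, ε > 0 so that ϕ(1) is well defined); the helper name `check_lemmas` and the grid resolution are illustrative choices, not part of the original analysis.

```python
def check_lemmas(p11, p01, eps, steps=200):
    """Return (lemma_3_1_holds, lemma_3_2_holds) on a uniform grid of beliefs."""
    tol = 1e-12
    grid = [i / steps for i in range(steps + 1)]

    def T(w):
        return w * p11 + (1.0 - w) * p01

    def phi(w):
        return eps * w / (1.0 - (1.0 - eps) * w)

    t_vals = [T(w) for w in grid]
    # Lemma 3.1: T is non-decreasing and stays within [p01, p11]
    lemma_3_1 = (all(b >= a - tol for a, b in zip(t_vals, t_vals[1:]))
                 and all(p01 - tol <= v <= p11 + tol for v in t_vals))

    phi_vals = [phi(w) for w in grid]
    # Lemma 3.2: phi is non-decreasing and phi(w) <= p01 on [p01, p11]
    lemma_3_2 = (all(b >= a - tol for a, b in zip(phi_vals, phi_vals[1:]))
                 and all(phi(w) <= p01 + tol for w in grid if p01 <= w <= p11))
    return lemma_3_1, lemma_3_2
```

With p11 = 0.8 and p01 = 0.2, the bound in Lemma 3.2 equals 0.0625: the check passes for ε = 0.05, while for ε = 0.3 the second property fails because ϕ(p11) exceeds p01.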
3.3.2 Axioms
This subsection introduces a set of three axioms characterizing a family of generic and practically
important functions, to which we refer as regular functions. For presentation convenience,
we sort the elements of the belief vector Ω(t) = [ω1 (t), · · · , ωN (t)] for each slot t such that A =
{1, · · · , k} (i.e., the user senses channel 1 to channel k) and let ΩA ≜ {ωi : i ∈ A} = {ω1 , · · · , ωk }⁸.
The three axioms derived in the following characterize a generic function f defined on ΩA .
Axiom 3.1 (Symmetry). A function f (ΩA ) : [0, 1]k → R is symmetrical if ∀i, j ∈ A it holds that
f (ω1 , · · · , ωi , · · · , ωj , · · · , ωk ) = f (ω1 , · · · , ωj , · · · , ωi , · · · , ωk ).
Axioms 3.1 and 3.2 are intuitive. Axiom 3.3 on the decomposability states that f (ΩA ) can
always be decomposed into two terms that replace ωi by 0 and 1, respectively. The three axioms
are consistent and non-redundant. Moreover, they can be used to characterize a family of generic
functions, referred to as regular functions, defined as follows.
Definition 3.4 (Regular Function). A function is called regular if it satisfies all the three axioms.
The three axioms developed above characterize a set of generic functions widely used in practical
applications. We give two examples to get more insight: (1) The user gets one unit of reward for
each channel that is sensed good and is indeed good. In this example, the expected slot reward
function is $F(\Omega_A) = \sum_{i=1}^{k} (1-\epsilon)\,\omega_i$;
(2) The user gets one unit of reward if at least one channel is sensed good. In this example, the
expected reward function is $F(\Omega_A) = 1 - \prod_{i=1}^{k} [1 - (1-\epsilon)\,\omega_i]$. It can be verified that in both
examples, F is regular, i.e., it satisfies the three axioms.
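The regularity of the two example functions can be probed numerically. The sketch below is a randomized check, not a proof. Axiom 3.2 is not reproduced in this section, so the sketch assumes it is monotonicity in each argument (consistent with Lemma 3.5), and it takes decomposability in the form f(ω) = ωi f(1, ω−i) + (1 − ωi) f(0, ω−i) (consistent with Lemma 3.4). The names `reward_sum`, `reward_any` and `looks_regular` are illustrative.

```python
import math
import random


def reward_sum(omega, eps):
    """Example (1): one unit of reward per channel sensed good and indeed good."""
    return sum((1.0 - eps) * w for w in omega)


def reward_any(omega, eps):
    """Example (2): one unit of reward if at least one channel is sensed good."""
    return 1.0 - math.prod(1.0 - (1.0 - eps) * w for w in omega)


def looks_regular(f, k, trials=200, tol=1e-9, seed=0):
    """Randomized test of symmetry, monotonicity (assumed Axiom 3.2)
    and decomposability for a function f on [0, 1]^k, with k >= 2."""
    rng = random.Random(seed)
    for _ in range(trials):
        w = [rng.random() for _ in range(k)]
        base = f(w)
        i, j = rng.sample(range(k), 2)
        swapped = list(w)
        swapped[i], swapped[j] = swapped[j], swapped[i]
        if abs(f(swapped) - base) > tol:            # symmetry
            return False
        hi = w[:i] + [1.0] + w[i + 1:]
        lo = w[:i] + [0.0] + w[i + 1:]
        if f(hi) < f(lo) - tol:                     # monotonicity
            return False
        if abs(w[i] * f(hi) + (1.0 - w[i]) * f(lo) - base) > tol:
            return False                            # decomposability
    return True
```

Both example functions pass the check, whereas a function such as `max`, which is symmetric and monotone but not affine in each argument, fails the decomposability test.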
The following definition studies the structure of the myopic sensing policy if the expected
reward function is regular.
Definition 3.5 (Structure of Myopic Sensing Policy). Sort the elements of the belief vector in descending
order such that ω1 ≥ · · · ≥ ωN . If the expected reward function F is regular, then the myopic
sensing policy, where the user can sense k channels, consists of sensing channel 1 to channel k.
We now establish closed-form conditions under which the myopic sensing policy achieves
the system optimum under imperfect sensing. To this end, we set up by defining an auxiliary
function and studying the structural properties of the auxiliary function, which serve as a basis
in the study of the optimality of the myopic sensing policy. We then establish the main result on
the optimality followed by illustrating how the obtained result can be applied via two concrete
application examples.
For the convenience of discussion, we first introduce some notation before presenting the analysis:
• The belief vector Ω(t) is sorted to [ω1 (t), · · · , ωN (t)] at each slot t such that A = {1, 2, · · · , k};
• N (m) ≜ {1, · · · , m} (m ≤ N ) denotes the first m channels in N ;
• Given E ⊆ M ⊆ N , $\Pr(\mathcal{M}, \mathcal{E}) \triangleq \prod_{i \in \mathcal{E}} (1-\epsilon)\,\omega_i(t) \prod_{j \in \mathcal{M} \setminus \mathcal{E}} [1 - (1-\epsilon)\,\omega_j(t)]$
denotes the probability that the channels in E are sensed in good state, while the
channels in M \ E are sensed in bad state, given that the channels in M are sensed;
• $P_{11}^{\mathcal{E}}$ denotes the vector of length |E| with each element being p11 ;
• Φ(l, m) ≜ [T (ωi (t)), l ≤ i ≤ m], where the components are sorted by channel index. Φ(l, m)
characterizes the updated belief values of the channels between l and m if they are not
sensed;
• Given E ⊆ M ⊆ N , $Q^{\mathcal{M},\mathcal{E}} \triangleq [T(\varphi(\omega_i(t))),\ i \in \mathcal{M} \setminus \mathcal{E}]$, where the components are sorted by
channel index. $Q^{\mathcal{M},\mathcal{E}}$ characterizes the updated belief values of the channels in M \ E if they
are sensed in the bad state.
$$\Delta_{\min} \triangleq \min_{i \in \mathcal{N},\ \omega_{-i} \in [0,1]^{k-1}} \{F(1, \omega_{-i}) - F(0, \omega_{-i})\}.$$
Observing the form of the value function Vt (Ω(t)), we first define the auxiliary value function
with imperfect sensing and then derive several fundamental properties of the auxiliary value
function, which are crucial in the study on the optimality of the myopic sensing policy.
Definition 3.6 (Auxiliary Value Function under Imperfect Sensing). The auxiliary value function,
denoted as Wt (Ω) (t = 1, 2, · · · , T ) is recursively defined as follows:
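$$W_T(\Omega(T)) = F(\omega_1(T), \cdots, \omega_k(T));$$
$$W_t(\Omega(t)) = F(\omega_1(t), \cdots, \omega_k(t)) + \beta \sum_{\mathcal{E} \subseteq \mathcal{N}(k)} \Pr(\mathcal{N}(k), \mathcal{E})\, W_{t+1}(\Omega_{\mathcal{E}}(t+1)), \tag{3.5}$$
where $\Omega_{\mathcal{E}}(t+1) \triangleq (P_{11}^{\mathcal{E}}, \Phi(k+1, N), Q^{\mathcal{N}(k),\mathcal{E}})$ denotes the belief vector generated by Ω(t) based
on (3.1).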
The above recursively defined auxiliary value function gives the expected cumulative reward
of the following sensing policy: in slot t, sense the first k channels; if channel i is correctly sensed
good, then put it on the top of the list to be sensed in the next slot, otherwise drop it to the bottom
of the list. Recalling Lemma 3.1 and Lemma 3.2, under the condition $0 \le \epsilon \le \frac{(1-p_{11})\,p_{01}}{p_{11}(1-p_{01})}$, if the belief
vector Ω(t) is ordered decreasingly in slot t, the above sensing policy is the myopic sensing policy
with Wt (Ω(t)) being the total reward from slot t to T .
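The recursion (3.5) translates almost literally into code. The sketch below assumes homogeneous channels and a belief list that is already ordered so that the first k entries are the sensed channels; the function name `auxiliary_value` and its argument layout are illustrative assumptions. The next belief vector is assembled in the order used in the definition: sensed-good channels (value p11), then unsensed channels, then sensed-bad channels.

```python
from itertools import combinations


def auxiliary_value(t, omega, horizon, k, p11, p01, eps, beta, F):
    """Auxiliary value function W_t of eq. (3.5) for an ordered belief list."""
    head, tail = list(omega[:k]), list(omega[k:])
    value = F(head)
    if t == horizon:
        return value
    unsensed = [w * p11 + (1.0 - w) * p01 for w in tail]   # Phi(k+1, N)
    for m in range(k + 1):
        for good in combinations(range(k), m):             # subset E sensed good
            prob = 1.0
            sensed_bad = []                                 # Q^{N(k), E}
            for i, w in enumerate(head):
                p_good = (1.0 - eps) * w
                if i in good:
                    prob *= p_good
                elif p_good < 1.0:
                    prob *= 1.0 - p_good
                    phi = eps * w / (1.0 - p_good)
                    sensed_bad.append(phi * p11 + (1.0 - phi) * p01)
                else:
                    prob = 0.0                              # impossible outcome
            if prob > 0.0:
                nxt = [p11] * m + unsensed + sensed_bad
                value += beta * prob * auxiliary_value(
                    t + 1, nxt, horizon, k, p11, p01, eps, beta, F)
    return value
```

For beliefs (0.6, 0.4), k = 1, p11 = 0.8, p01 = 0.2, ε = 0, β = 1 and F = `sum`, the two-slot value is 0.6 + 0.6 · 0.8 + 0.4 · 0.44 = 1.256.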
We prove some structural properties of the auxiliary value function.
Lemma 3.3 (Symmetry). If the expected reward function F is regular, the corresponding auxiliary
value function Wt (Ω) is symmetrical in any two channels i, j ≤ k for all t = 1, 2, · · · , T , i.e.,
Wt (ω1 , · · · , ωi , · · · , ωj , · · · , ωN ) = Wt (ω1 , · · · , ωj , · · · , ωi , · · · , ωN ), ∀i, j ≤ k.
Lemma 3.4 (Decomposability). If the expected reward function F is regular, then the corresponding
auxiliary value function Wt (Ω(t)) is decomposable for all t = 1, 2, · · · , T , i.e.,
Wt (ω1 , · · · , ωi , · · · , ωN ) = ωi Wt (ω1 , · · · , 1, · · · , ωN ) + (1 − ωi )Wt (ω1 , · · · , 0, · · · , ωN ), ∀i ∈ N .
Lemma 3.4 can be applied one step further to prove the following corollary.
Corollary 3.1. If the expected reward function F is regular, then for any l, m ∈ N it holds that for
t = 1, 2, · · · , T
$$W_t(\omega_1, \cdots, \omega_l, \cdots, \omega_m, \cdots, \omega_N) - W_t(\omega_1, \cdots, \omega_m, \cdots, \omega_l, \cdots, \omega_N) = (\omega_l - \omega_m)\big[W_t(\omega_1, \cdots, 1, \cdots, 0, \cdots, \omega_N) - W_t(\omega_1, \cdots, 0, \cdots, 1, \cdots, \omega_N)\big].$$
Lemma 3.5 (Monotonicity). If the expected reward function F is regular, the corresponding auxiliary
value function Wt (Ω) is monotonically non-decreasing in ωl , ∀l ∈ N , i.e.,
$$\omega_l' \ge \omega_l \implies W_t(\omega_1, \cdots, \omega_l', \cdots, \omega_N) \ge W_t(\omega_1, \cdots, \omega_l, \cdots, \omega_N).$$
To study the optimality of the myopic sensing policy, we first develop two auxiliary lemmas
(Lemma 3.6 and Lemma 3.7) and then establish the sufficient condition under which the optimality
of the myopic sensing policy is guaranteed.
Lemma 3.6. Given that (1) $\epsilon < \frac{p_{01}(1-p_{11})}{p_{11}(1-p_{01})}$, (2) $\beta \le \frac{\Delta_{\min}/\Delta_{\max}}{(1-\epsilon)(1-p_{01}) + \frac{\epsilon(p_{11}-p_{01})}{1-(1-\epsilon)(p_{11}-p_{01})}}$, and (3) F is regular,
if p11 ≥ ωl ≥ ωm ≥ p01 where l < m, then it holds that
Wt (ω1 , · · · , ωl , · · · , ωm , · · · , ωN ) ≥ Wt (ω1 , · · · , ωm , · · · , ωl , · · · , ωN ), t = 1, · · · , T.
Lemma 3.7. Given that (1) $\epsilon < \frac{p_{01}(1-p_{11})}{p_{11}(1-p_{01})}$, (2) $\beta \le \frac{\Delta_{\min}/\Delta_{\max}}{(1-\epsilon)(1-p_{01}) + \frac{\epsilon(p_{11}-p_{01})}{1-(1-\epsilon)(p_{11}-p_{01})}}$, and (3) F is regular,
if p11 ≥ ω1 ≥ · · · ≥ ωN ≥ p01 , for any 1 ≤ t ≤ T , it holds that
The proof of the above lemmas consists of applying the structural properties of the auxiliary
value function and backward induction, as detailed in [200, 206]. Lemma 3.6 states that by swap-
ping two elements in Ω with the former larger than the latter, the user does not increase the total
expected reward. Lemma 3.7, on the other hand, gives the upper bound on the difference of the to-
tal reward of the two swapping operations, swapping ωN and ωk (k = N − 1, · · · , 1) and swapping
ω1 and ωN , respectively.
We now state the main result on the optimality of myopic sensing policy in the following
theorem. The proof [200, 206] consists of recursively applying Lemma 3.6 and Lemma 3.7.
Theorem 3.2 (Optimality of myopic sensing policy). If p01 ≤ ωi (1) ≤ p11 , 1 ≤ i ≤ N , the myopic
sensing policy is optimum if the following conditions hold: (1) F (Ω) is regular; (2) $\epsilon < \frac{p_{01}(1-p_{11})}{p_{11}(1-p_{01})}$;
(3) $\beta \le \frac{\Delta_{\min}/\Delta_{\max}}{(1-\epsilon)(1-p_{01}) + \frac{\epsilon(p_{11}-p_{01})}{1-(1-\epsilon)(p_{11}-p_{01})}}$.
As noted in [127], when the initial belief ωi is set to $\frac{p_{01}}{p_{01}+1-p_{11}}$, as is commonly the case in practical
systems, it can be checked that p01 ≤ ωi (1) ≤ p11 holds. Moreover, even if the initial belief does not
fall in [p01 , p11 ], all the belief values are bounded in this interval from the second slot onwards, following
Lemma 3.1. Hence our results can be extended by treating the first slot separately from the future
slots.
3.3.4 Discussion
We illustrate the application of the result obtained above in two concrete scenarios and compare
our work with the existing results.
Consider the channel access problem in which the user is limited to sensing k channels and
gets one unit of reward if the sensed channel is in the good state, i.e., the utility function can be
formulated as $F(\Omega_A) = (1-\epsilon) \sum_{i \in A} \omega_i$. Note that the optimality of the myopic sensing policy
under this model is studied in [127] for a subset of scenarios where k = 1, N = 2. We now
study the generic case with k, N ≥ 2. To that end, we apply Theorem 3.2. Notice that in this example,
we have ∆min = ∆max = 1 − ε. We can then verify that when $\epsilon < \frac{p_{01}(1-p_{11})}{p_{11}(1-p_{01})}$, it holds that
$$\frac{\Delta_{\min}}{\Delta_{\max}\Big[(1-\epsilon)(1-p_{01}) + \frac{\epsilon(p_{11}-p_{01})}{1-(1-\epsilon)(p_{11}-p_{01})}\Big]} > 1.$$
Therefore, when conditions 1 and 2 hold, the myopic
sensing policy is optimum for 0 ≤ β ≤ 1. This result significantly generalizes the
results in [127], where the optimality of the myopic policy is proved for the case of two channels and
only conjectured for general cases.
Next consider another scenario where the user can sense k channels but can only choose one of
them to transmit its packets. Under this model, the user wants to maximize its expected throughput.
More specifically, the slot utility function is $F(\Omega_A) = 1 - \prod_{i \in A} [1 - (1-\epsilon)\,\omega_i]$, which is regular.
In this context, we have $\Delta_{\max} = (1-\epsilon)^{k-1} p_{11}^{k-1}$ and $\Delta_{\min} = (1-\epsilon)^{k-1} p_{01}^{k-1}$. The third condition
on β for the myopic policy to be optimal becomes
$$\beta \le \frac{p_{01}^{k-1}}{p_{11}^{k-1}\Big[(1-\epsilon)(1-p_{01}) + \frac{\epsilon(p_{11}-p_{01})}{1-(1-\epsilon)(p_{11}-p_{01})}\Big]}.$$
Particularly, when ε = 0, $\beta \le \frac{p_{01}^{k-1}}{p_{11}^{k-1}(1-p_{01})}$. It can be noted that even when there is no sensing error, the myopic
policy is not ensured to be optimum, which confirms our findings in [128, 209] on perfect sensing
scenarios.
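The sufficient conditions of Theorem 3.2 are easy to evaluate numerically for the two scenarios above. The sketch below takes the ratio ∆min/∆max as an input (using the values stated in the text) and does not verify regularity of F, which is assumed; the function names are illustrative.

```python
def epsilon_bound(p11, p01):
    """Upper bound on the false alarm probability in condition (2)."""
    return p01 * (1.0 - p11) / (p11 * (1.0 - p01))


def beta_bound(delta_ratio, p11, p01, eps):
    """Upper bound on the discount factor in condition (3);
    delta_ratio is Delta_min / Delta_max."""
    d = p11 - p01
    denom = (1.0 - eps) * (1.0 - p01) + eps * d / (1.0 - (1.0 - eps) * d)
    return delta_ratio / denom


def myopic_sufficient(delta_ratio, p11, p01, eps, beta):
    """True if conditions (2) and (3) hold (regularity of F is assumed)."""
    return eps < epsilon_bound(p11, p01) and beta <= beta_bound(
        delta_ratio, p11, p01, eps)


# First scenario: Delta_min / Delta_max = 1.
bound_sum = beta_bound(1.0, 0.8, 0.2, 0.05)
# Second scenario with k = 2 and no sensing error:
# Delta_min / Delta_max = (p01 / p11) ** (k - 1).
bound_any = beta_bound((0.2 / 0.8) ** (2 - 1), 0.8, 0.2, 0.0)
```

With p11 = 0.8 and p01 = 0.2, the first bound exceeds 1, so every β in [0, 1] is admissible, while the second bound equals 0.25/0.8 = 0.3125, illustrating that the myopic policy is only guaranteed to be optimal for sufficiently small discount factors in the second scenario.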
We consider the same scenario as the previous section in which a user tries to access a multi-channel
opportunistic communication system consisting of a set N of N channels, each given by a
two-state Markov chain with transition probabilities $\{p_{ij}^{(k)}\}_{i,j=0,1}$ (1 ≤ k ≤ N ). The system operates
in a slotted fashion where the slots are indexed by t (1 ≤ t ≤ T ), where T is the time horizon of
interest. We assume that channels go through state transition at the beginning of a slot. The length
of each slot is denoted as ∆, which is further divided into two parts: the sensing phase and the
transmission phase. Let δ = a∆ (a ≤ 1) denote the time needed to sense one channel. The sensing
phase lasts na∆ if the user senses n channels, and the transmission phase consists of the rest of the
time (1 − an)∆.
The user’s objective is to maximize its throughput by choosing the appropriate set of channels
to sense. Let A(t) denote the set of channels sensed by the user at slot t and OA (t) =
{Oi (t) ∈ {1, 0}, i ∈ A(t)} the corresponding set of sensing results. The user can sense at most M
(1 ≤ M < N and aM ≤ 1) channels due to hardware limits and sensing constraints. If at least one of
the sensed channels is in the good state, the user can successfully transmit one packet.⁹ In our study,
we also take into consideration imperfect sensing, which is characterized by the missed detection rate,
denoted as ζ (the channel is sensed good but is in fact bad), and the false alarm rate, denoted as ε
(the channel is sensed bad but is in fact good).
Obviously, by imperfectly sensing only |A(t)| out of N channels at each slot t, the user cannot
observe the state information of the whole system. Hence, the user has to infer the channel states
from its past decision and observation history so as to make its future decision. Moreover, the
current sensing outcome further serves as statistics for future decision. To this end, as in the
previous section we define the channel state belief vector (hereinafter referred to as belief vector for
brevity) Ω(t) ≜ {ωi (t), i ∈ N }, where 0 ≤ ωi (t) ≤ 1 is the conditional probability that channel i
is in good state. Given the sensing set A(t) and the detection outcomes {Oi (t) ∈ {0, 1} : i ∈ A(t)},
the belief vector in slot t + 1 can be updated recursively using Bayes' rule as shown in (3.6):
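$$\omega_i(t+1) = \begin{cases} p_{11}^{(i)}, & i \in A(t),\ O_i(t) = 1 \\ T_i(\varphi(\omega_i(t))), & i \in A(t),\ O_i(t) = 0 \\ T_i(\omega_i(t)), & i \notin A(t), \end{cases} \tag{3.6}$$
where
$$T_i(\omega_i(t)) \triangleq \omega_i(t)\, p_{11}^{(i)} + (1 - \omega_i(t))\, p_{01}^{(i)}, \tag{3.7}$$
$$\varphi(\omega_i(t)) \triangleq \frac{\epsilon\, \omega_i(t)}{1 - (1 - \epsilon)\, \omega_i(t)}. \tag{3.8}$$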
We are interested in the user’s optimization problem of finding a channel sensing policy π ∗ that
maximizes the expected total discounted reward over a finite horizon. Mathematically, a sensing
policy πt is defined as a mapping from the belief vector Ω(t) to A(t) in slot t:
where the slot reward function R(πt (Ω(t)), OA (t)) = R(A(t), OA (t)) is the user throughput in slot
t under the sensing policy πt with the initial belief vector Ω(1), and 0 ≤ β ≤ 1 is the discount factor.
9
Our work can be extended to the case where the user is equipped with more than one radio and can access multiple
channels at a time.
Mathematically, P can be cast into a class of RMAB problems with an unknown number of arms
to be activated. It is worth noting that the RMAB problem is proved to be PSPACE-hard. Hence,
a natural alternative to tackle P is to seek a myopic sensing policy that maximizes the immediate
reward. The motivation for focusing on the myopic sensing policy is two-fold:
• As demonstrated in our previous work, under certain conditions, the myopic sensing policy
is ensured to be optimum.
• The myopic sensing policy has a simple and robust structure that makes it easy to implement
in practice.
Existing studies on the myopic policy in the RMAB problem implicitly assume that the number
of arms to activate (in the context of our work, the number of channels to sense) is fixed. A natural
yet crucial research problem is how many channels to sense at each slot so as to maximize the
expected total reward, which is the focus of our work presented in this chapter. We sort Ω(t) for
each slot t in descending order such that ω1 (t) ≥ ω2 (t) ≥ · · · ≥ ωN (t) and thus form a channel
list $l^0(t)$ = (1, 2, · · · , N )¹⁰. The optimization problem on the number of channels to sense at each
slot is formalized as follows:
$$\mathbf{P1}: \quad n_t^* = \operatorname*{argmax}_{|A(t)|}\ \mathbb{E}\left[\sum_{t=1}^{T} \beta^{t-1} R(A(t), O_A(t)) \,\Big|\, \Omega(1)\right], \tag{3.10}$$
where, in slot t, the first |A(t)| channels are sensed, i.e., A(t) = {1, · · · , |A(t)|}.
It is insightful to note that the optimization problem P1 on the number of channels to sense
hinges on the following tradeoff between exploitation and exploration: sensing more channels can
help learn and predict the future channel state, thus increasing the long-term reward, but at the
price of sacrificing the reward at current slot as sensing more channels reduces the time for data
transmission, thus decreasing the throughput in the current slot.
Despite our focus on the opportunistic access problem of multi-channel communication systems,
the model formulation and the consequent analysis to solve the optimization problem can be
generalized in the context of the RMAB problem and are readily applicable in a variety of engineering
fields such as object tracking, communication jamming and opportunistic packet scheduling.
Therefore, the following description and the use of terms in the context of opportunistic spectrum
access should be understood generically. Moreover, the slot reward function R(A(t), OA (t)) that
we adopt can be generically expressed in the normalized form as follows:
$$R(A(t), O_A(t)) = \begin{cases} 1 - C(|A(t)|), & \text{if } \prod_{i \in A(t)} (1 - O_i(t)) = 0 \\ 0, & \text{otherwise,} \end{cases} \tag{3.11}$$
where C(|A(t)|) is the cost function monotonically increasing in |A(t)|, representing the time
associated with channel sensing and frequency switching. The first line of the right hand side of (3.11)
indicates that by sensing the channels in A(t), of which at least one is sensed good, the
user obtains a payoff 1 − C(|A(t)|). The second line covers the case where none of the channels
in A(t) is sensed good, in which the user obtains 0 as payoff. In the channel access model depicted in
Sec. 3.4.1.1, by normalizing ∆ = 1, we have C(|A(t)|) = a|A(t)|.
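The normalized slot reward (3.11) with the linear cost C(|A(t)|) = a|A(t)| can be written in a few lines. This is a minimal sketch; the function name and the representation of the sensing results as a list of 0/1 values are assumptions made for illustration.

```python
def slot_reward(observations, a):
    """Slot reward of eq. (3.11) with C(n) = a * n (slot length normalized to 1).

    observations -- list of sensing results O_i(t) in {0, 1} for i in A(t)
    a            -- fraction of the slot needed to sense one channel
    """
    cost = a * len(observations)
    # prod(1 - O_i) == 0 holds exactly when at least one channel is sensed good
    if any(o == 1 for o in observations):
        return 1.0 - cost
    return 0.0
```

For instance, sensing three channels with a = 0.1 and finding one of them good yields a reward of 0.7, which makes the tradeoff explicit: each additional sensed channel costs a fraction a of the transmission time.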
10
The initial order of list is determined by the initial availability probability of each channel: ω1 (1) ≥ ω2 (1) ≥ · · · ≥
ωN (1) ⇒ l0 (1) = (1, 2, · · · , N ).
It can be noticed that given a policy {n(t), 1 ≤ t ≤ T } (i.e., the number of channels to sense
at each slot, given the myopic sensing order), the belief vectors {Ω(t), 1 ≤ t ≤ T } form a Markov
process with an uncountable state space, which makes the optimization problem P1 intractable.
Therefore, we turn to the following heuristic policy referred to as ν-step lookahead policy: at each
slot t, the user senses the channels in the decreasing order of Ω(t) and estimates the expected
accumulated payoff from slot t + 1 to slot t + ν (t + ν ≤ T ), assuming that in slots t + 1, · · · , t + ν,
the user stops exploring new channels once an available one is found (or the maximum number of
channels to be sensed, M , is reached); the user stops sensing new channels when the sum of the
reward in the current slot plus that from slot t + 1 to t + ν decreases.
We now give the mathematical description of the ν-step lookahead policy. Let $l^k(t)$ and $\Omega^k(t)$
(k ≤ M ) denote the channel list and belief vector formed in the descending order of ωi (t) (1 ≤ i ≤
N ) after sensing the first k best channels in slot t, and $l_j^i(t)$ denote the jth channel in $l^i(t)$.
To streamline our presentation, we introduce the pseudo cost function defined as follows:
$$q(A(t), O_A(t)) \triangleq 1 - R(A(t), O_A(t)) = \begin{cases} C(|A(t)|) = a|A(t)|, & \text{if } \prod_{i \in A(t)} (1 - O_i(t)) = 0 \\ C_0 = 1, & \text{otherwise.} \end{cases} \tag{3.12}$$
Given the initial belief vector $\Omega^0(t+1)$ at the beginning of slot t + 1 (with the corresponding
channel list $l^0(t+1)$), let $Q_{t+1}^{t+\nu}(\Omega^0(t+1))$ denote the expected accumulative pseudo cost accrued
from slot t + 1 to slot t + ν, given that the user stops sensing once a channel is sensed good or M
is reached, i.e.,
$$Q_{t+1}^{t+\nu}(\Omega^0(t+1)) \triangleq \sum_{i=1}^{M} \underbrace{\Big[\omega_{l_i^0(t+1)}(t+1) \prod_{j=1}^{i-1} \big(1 - \omega_{l_j^0(t+1)}(t+1)\big)\big[C(i) + \beta \cdot Q_{t+2}^{t+\nu}(\mathbf{T}(\Omega_1^i(t+1)))\big]\Big]}_{A} + \underbrace{\prod_{j=1}^{M} \big(1 - \omega_{l_j^0(t+1)}(t+1)\big)\big[C_0 + \beta \cdot Q_{t+2}^{t+\nu}(\mathbf{T}(\Omega_0^M(t+1)))\big]}_{B},$$
where term A denotes the pseudo cost when channel $l_i^0(t+1)$ is sensed good while channels
$l_1^0(t+1), \cdots, l_{i-1}^0(t+1)$ are sensed bad; term B denotes the pseudo cost when the first M channels
of $l^0(t+1)$ are sensed bad; $\Omega_1^i(t+1)$ and $\Omega_0^i(t+1)$ denote the belief vectors where the channel
$l_i^0(t+1)$ is sensed good and bad, respectively; $\mathbf{T}$ denotes the mapping from $\Omega^k(t)$ to $\Omega^0(t+1)$
according to (3.6) at the beginning of slot t + 1, i.e., $\mathbf{T}: \Omega^k(t) \to \Omega^0(t+1)$; $Q_{t_1+1}^{t_2}(\mathbf{T}(\Omega^i(t_1)))$
denotes the expected accumulative cost from slot $t_1 + 1$ to $t_2$ when i channels are sensed at slot $t_1$.
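The base case of this recursion, in which the continuation terms vanish (e.g., the last slot of the lookahead window), can be computed in a single pass over the sorted belief list. The sketch below covers only this one-slot case; it follows the displayed expression in using the beliefs ω directly as the probabilities of finding a good channel, and assumes C(i) = a·i and C0 = 1 as in (3.12). The function name is illustrative.

```python
def one_slot_pseudo_cost(omega_sorted, M, a, c0=1.0):
    """Expected one-slot pseudo cost under the rule "stop at the first good
    channel or after M channels" (terms A and B without continuation).

    omega_sorted -- beliefs in descending order (the list l^0)
    """
    expected = 0.0
    all_bad = 1.0  # probability that every channel examined so far was bad
    for i, w in enumerate(omega_sorted[:M], start=1):
        expected += all_bad * w * (a * i)   # term A: first good channel is the i-th
        all_bad *= 1.0 - w
    return expected + all_bad * c0          # term B: the first M channels are bad
```

For two channels with beliefs 0.5 each, M = 2 and a = 0.1, the expected pseudo cost is 0.5 · 0.1 + 0.25 · 0.2 + 0.25 · 1 = 0.35.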
At each slot t, the ν-step lookahead policy can be implemented in a heuristic approach by
transforming the problem into an optimum stopping problem, i.e., the user stops sensing new
channels when the sum of the reward in the current slot plus that from slot t + 1 to t + ν decreases.
Mathematically, the number of channels to sense in the ν-step lookahead policy, denoted as n(t),
is as follows:
$$n(t) = \inf\Big\{ |A(t)| : C(A(t), O_A(t)) + \beta Q_{t+1}^{t+\nu}(\mathbf{T}(\Omega^{|A(t)|}(t))) < C(A'(t), O_{A'}(t)) + \beta Q_{t+1}^{t+\nu}(\Omega^{|A(t)|}(t)),\ 1 \le |A(t)| \le M \Big\}, \tag{3.14}$$
where $Q_{t+1}^{t+\nu}(\mathbf{T}(\Omega^{|A(t)|}(t)))$ is the expected accumulative pseudo cost from slot t + 1 to t + ν when the best |A(t)| channels of
$l^0(t)$ are sensed, and $Q_{t+1}^{t+\nu}(\Omega^{|A(t)|}(t))$ denotes the expected accumulative pseudo cost from slot t + 1
to t + ν when the (|A(t)| + 1)th channel of $l^0(t)$ is sensed good with probability $(1-\epsilon)\,\omega_{l^0_{|A(t)|+1}(t)}(t)$
and bad with probability $1 - (1-\epsilon)\,\omega_{l^0_{|A(t)|+1}(t)}(t)$, i.e.,
$$Q_{t+1}^{t+\nu}(\Omega^{|A(t)|}(t)) \triangleq (1-\epsilon)\,\omega_{l^0_{|A(t)|+1}(t)}(t)\, Q_{t+1}^{t+\nu}(\mathbf{T}(\Omega_1^{|A(t)|+1}(t))) + \big(1 - (1-\epsilon)\,\omega_{l^0_{|A(t)|+1}(t)}(t)\big)\, Q_{t+1}^{t+\nu}(\mathbf{T}(\Omega_0^{|A(t)|+1}(t))). \tag{3.15}$$
The following theorem further studies the structure of the ν-step lookahead policy by developing
an optimum stopping algorithm to implement it.
Theorem 3.3. The ν-step lookahead policy can be implemented by Algorithm 3. In each iteration,
• the user continues to sense new channels if all the sensed channels are bad (exploitation);
• if at least one channel is sensed good, the user stops sensing new channels if the expected pseudo
cost increases by sensing a new channel (exploration).
then
Terminate the algorithm by outputting A(t)
end if
end while
Remark. The ν-step lookahead policy can be decomposed into two steps:
• Exploitation: the user exploits the current available information Ω(t) in a greedy way so as to
find a good channel;
• Exploration: once a good channel is secured, the user proceeds to explore the system state space for
long term gain.
The second step (exploration) may be absent if all the M best channels are sensed bad or if exploring
does not increase gain in the long term (i.e., the condition in Algorithm 3 does not hold even once).
The complexity of the algorithm implementing the ν-step lookahead policy lies in the compu-
tation of (3.16), whose complexity is exponential with ν. On the other hand, a larger ν leads to
better performance of the lookahead policy. Hence, the user can tune the parameter ν to achieve a
desired tradeoff between complexity and efficiency.
Having derived the algorithm implementing the proposed ν-step lookahead policy, we now
focus on the case of i.i.d. channels and provide a mathematical analysis of the case where ν = 1,
i.e., the one-step lookahead policy. Our motivation for investigating this particular policy is two-fold:
• the study on the one-step lookahead policy can provide structural insights on the computation
of the expected pseudo cost, which is the foundation of the ν-step lookahead policy. The
general case ν > 1 can be extended iteratively from the case ν = 1;
• through numerical experiments (please refer to our publication [207]), we observe that
the benefit of the ν-step lookahead policy is most important in the case of ν = 1 and then
decreases gradually with the increase of ν; this observation, combined with the fact that
the complexity of the ν-step lookahead policy increases exponentially with ν, motivates a
more focused analysis on the one-step lookahead policy, which seems to be the most practical
strategy in many scenarios.
Given the system model presented in Subsection 3.4.1.1, assume that the user has sensed k
channels with at least one of them in the good state. Recalling Algorithm 3, the condition to decide
whether to sense channel k + 1 in the channel list can be written as:
$$a > \beta\Big[ Q_{t+1}^{t+\nu}(\mathbf{T}(\Omega^k(t))) - Q_{t+1}^{t+\nu}(\Omega^k(t)) \Big]. \tag{3.17}$$
We next show how to compute $Q_{t+1}^{t+1}(\mathbf{T}(\Omega^k(t)))$ and $Q_{t+1}^{t+1}(\Omega^k(t))$ in an efficient way. We present
the major steps and refer readers to our publication [207] for detailed analysis and algebraic
derivations. We first show the following lemma on how the channel list should be updated
when a new channel is sensed.
Lemma 3.8. For a system with positively correlated homogeneous i.i.d. channels, if $0 \le \epsilon \le \frac{(1-p_{11})\,p_{01}}{p_{11}(1-p_{01})}$,
the channel sensed good (bad) should be moved to the head (tail) of the old channel list to form the
new channel list.
Assume that the channel list at the beginning of slot t before sensing any channels is $l^0(t)$ =
(1, 2, · · · , N ), sorted in the decreasing order of the belief values. Assume that among the k sensed
channels {1, · · · , k}, m (m ≥ 1) channels are sensed good while k − m are bad. It follows
from Lemma 3.8 that m channels are moved to the head of the channel list and others to the
tail, thus forming the new channel list $l^k(t)$. We now show how to compute $Q_{t+1}^{t+1}(\mathbf{T}(\Omega^k(t)))$,
$Q_{t+1}^{t+1}(\mathbf{T}(\Omega_1^{k+1}(t)))$ and $Q_{t+1}^{t+1}(\mathbf{T}(\Omega_0^{k+1}(t)))$ in the case of m ≥ 1 so as to decide whether to sense
channel k + 1.
The following lemma establishes a structural property of $X(\mathbf{T}(\Omega^k(t)), m)$ by showing that
$X(\mathbf{T}(\Omega^{k+1}(t)), m+1)$ can be recursively derived based on $X(\mathbf{T}(\Omega^k(t)), m)$ in both cases where
channel k + 1 is sensed good and bad, respectively.
Lemma 3.9. The following recursive update on the auxiliary vector holds:
• If channel k + 1 is sensed good, $X(\mathbf{T}(\Omega_1^{k+1}(t)), m+1) = H_1 \cdot X(\mathbf{T}(\Omega^k(t)), m)$,
• If channel k + 1 is sensed bad, $X(\mathbf{T}(\Omega_0^{k+1}(t)), m+1) = H_2 \cdot X(\mathbf{T}(\Omega^k(t)), m)$,
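where
$$H_1 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1-(1-\epsilon)p_{11} & 0 & 0 & 0 \\ 1 & 0 & 1-(1-\epsilon)p_{11} & 0 & 0 \\ 0 & 0 & 0 & \frac{1}{1-\omega_{l^k_{m+2}(t)}(t+1)} & 0 \\ -1 & 0 & 0 & 0 & \frac{1}{1-\omega_{l^k_{m+2}(t)}(t+1)} \end{bmatrix},$$
$$H_2 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & \frac{1-\omega_{l^k_{M+1}(t)}(t+1)}{1-\omega_{l^k_{m+2}(t)}(t+1)} & 0 \\ -1 & 0 & 0 & \frac{1-\omega_{l^k_{M+1}(t)}(t+1)}{1-\omega_{l^k_{m+2}(t)}(t+1)} & \frac{1}{1-\omega_{l^k_{m+2}(t)}(t+1)} \end{bmatrix}.$$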
Theorem 3.4 further shows that $Q_{t+1}^{t+1}(\mathbf{T}(\Omega^k(t)))$, $Q_{t+1}^{t+1}(\mathbf{T}(\Omega_1^{k+1}(t)))$ and $Q_{t+1}^{t+1}(\mathbf{T}(\Omega_0^{k+1}(t)))$ can
be easily computed by using the auxiliary vector. Consequently, the one-step lookahead policy
can be implemented in an efficient fashion by using the auxiliary vector, which can be updated
recursively.
where,
Recalling Algorithm 3 and (3.18) – (3.20), it can be verified that the one-step lookahead policy
has a linear computational complexity O(M ).
We further demonstrate some of the theoretical results derived above and gain more insight on
the developed ν-step lookahead policy as well as the performance tradeoff via a set of numerical
experiments. Readers are referred to [207] for detailed numerical analysis and demonstration.
Our first extension is to consider the Markov channel model with m states, where the case m = 2 has
been extensively investigated. Specifically, we consider a communication system composed of N
independent channels, each of which is modeled as a time-nonhomogeneous m-state Markov chain
with known matrix of transition probabilities. At each time period a user selects a number of
channels to access and uses them to transmit information. A reward depending on the states of those
selected channels is obtained for each transmission. The objective is to design a channel access
policy that maximizes the expected accumulated discounted reward (respectively, the expected
accumulated reward) collected over a finite (respectively, infinite) time horizon.
Our work along this research axis is the construction of a set of conditions to guarantee the
optimality of the myopic policy. In particular, we show that the structure of the myopic policy is a
simple queue determined by the availability probability vector of channels, provided that a certain
condition is satisfied by the transition matrix of multi-state channels. Further, we obtain a set of
conditions under which the myopic policy is proved to be optimum. Our derivation demonstrates
the advantage of branch-and-bound and the directed comparison based optimization approach.
These results constitute a generic complement to the state of the art of the RMAB theory, although
the structure of the optimum policy in the generic case is still unsolved.
so as to learn the primary receiver’s channel condition and the interference tolerance level, then
chooses appropriate power to transmit its data. In such a context, the user cannot probe all the
channels because of its limited number of receiving antennas. A crucial optimization problem faced
by the user is thus which channel(s) to probe in order to maximize the long-term throughput given
the past probing history.
We tackle this problem by casting it into an RMAB problem. Given the specific and practical
constraints posed by the problem, we analyze the myopic probing policy, which consists of probing
the best channels based on the past observations. We perform an analytical study on the optimality
of the developed myopic probing policy. Technically, we divide the belief vector in this work into
the value belief vector and the policy belief vector, which reflects the essence of decomposability.
The optimality condition derived in this work degenerates to those obtained in the literature when
the corresponding constraints are relaxed.
In this work, we study a channel access scenario with multiple users where switching among
channels incurs an additional cost. We cast the problem into a non-Bayesian MAB problem. Our
objective is to develop a distributed channel access algorithm with logarithmic regret, that is, one
whose performance degradation compared to the system optimum is a logarithmic function of time. The
major difficulty in our problem is that the channel switching cost adds a new element to the regret.
In order to design asymptotically efficient channel access policies with logarithmic regret, we
need to limit the frequency of channel switching at the users. Following this line of design, we develop
the block-based channel access policy. The proposed channel access policy is inspired by the block
allocation scheme in [5] for the single-player MAB problem with switching cost and adapted to our
multi-user context. The main idea can be summarized as follows: we group slots in blocks; at the
beginning of each block, the users choose which channel to sense and stick to that channel for the
whole block if no collision is experienced during the block; otherwise, in case of collision,
indicating more than one user on the same channel, a channel randomization is performed such
that each user experiencing the collision switches randomly to another channel. The block structure
is carefully constructed such that the total cost of channel switching and the loss due to collisions
are both controlled to O(log t), resulting in a global O(log t) regret.
is to model the situation as a non-cooperative game among users and to see how the results
obtained in this chapter can further be tailored in the new context.
Another practical extension is to consider correlated channels, i.e., the case where the Markov
chains of different channels are correlated. This problem can be cast into the RMAB problem (the
Markovian formulation) with correlated arms. The introduction of correlation among arms
makes the tradeoff between exploration and exploitation more sophisticated, as sensing a channel
not only reveals the state of the sensed channel, but also provides information on other channels
since they are not entirely independent. How to characterize the tradeoff in this new context and how
to design efficient channel access policies are pertinent research topics in this direction.
Chapter 4
4.1 Introduction
In this chapter, we investigate the following problem arising from emerging wireless networks:
How to design distributed algorithms that allow users to gradually converge to a stable and desirable
system state based on purely local information and interactions? We tackle this problem
by using game theory (more precisely, non-cooperative game theory) as a systematic framework
of modeling and analysis. The motivation for investigating the problem from the (non-cooperative)
game theoretic perspective is three-fold.
• Game theory is a powerful tool to model the interactions of decision makers with mutu-
ally conflicting objectives, e.g., the interaction among rational and selfish nodes in wireless
networks.
• Non-cooperative game theory can model the features or constraints of wireless networks such
as lack of coordination and network feedback. In fact, in such environments, non-cooperative
behavior is much more robust and scalable than any centralized cooperative control, which
is very expensive or even impossible to implement.
• Game theory can serve as a validation tool to evaluate and benchmark the proposed algo-
rithms.
Under the non-cooperative game-theoretical framework, we model the network as a site where
each rational node adjusts its strategy under the non-cooperative paradigm to maximize its own payoff.
In such game theoretic studies, the central issue is to derive and characterize the resulting Nash
equilibria (NE), where no one has an incentive to deviate unilaterally. We then develop distributed
learning algorithms for nodes to adjust their strategies to converge to a system equilibrium based
on only observable local information and interactions.
For concreteness, we instantiate our study by focusing on the distributed channel access in cog-
nitive radio networks and devise a suite of distributed channel access algorithms with guaranteed
convergence to the system equilibrium. However, we note that the algorithms developed in our
work are generically applicable in a wide range of networking and system problems such as load
balancing, selfish routing and resource allocation. In this sense, the model description and the use
of terms in this chapter (such as “channels”) should be understood generically.
Chapter 4. Distributed Learning in Wireless Networks: A Game-theoretical Perspective
We first consider a generic model of cognitive networks consisting of multiple frequency channels,
each characterized by a channel availability probability determined by the activity of primary
users (PUs) on it. In such a model, from the perspective of a secondary user (SU), a challenging
problem is to compete (or coordinate) with other SUs in order to opportunistically access the unused
spectrum of PUs to maximize its own payoff (e.g., throughput); at the system level, a crucial
research issue is to design efficient spectrum access algorithms achieving optimum spectrum usage.
We formulate the spectrum access problem as a non-cooperative game and develop distributed
spectrum access algorithms based on imitation, a behavior rule widely applied in human societies
consisting of imitating successful behavior. We establish the convergence of the proposed policies
to an imitation-stable equilibrium which is also the ε-optimum of the system. Simple, natural and
incentive-compatible, the proposed spectrum access algorithms can be implemented in a distributed
fashion based solely on local interactions and thus are especially suited to decentralized adaptive
learning environments such as cognitive radio networks.
In our analysis, we start by developing the imitation-based spectrum access policies where an
SU can imitate any other SU. More specifically, we develop two spectrum access policies based on
the following two imitation rules:
• the Proportional Imitation (PI) rule where an SU can sample one other SU;
• the more advanced adjusted proportional imitation rule with double sampling (Double Imitation,
DI) where an SU can sample two other SUs.
Under both imitation rules, each SU strives to improve its individual payoff by imitating other
SUs with higher payoff. We then adapt the proposed spectrum access policies to a more practical
scenario where an SU can only imitate the other SUs operating on the same channel. A systematic
theoretical analysis is presented for both scenarios on the induced imitation dynamics and the
convergence properties of the proposed policies to an imitation-stable equilibrium, which is also
the ε-optimum of the system.
Distributed spectrum access has been widely addressed in the literature. As discussed in the
last chapter, the problem can be cast into the RMAB problem, where spectrum access policies are
devised under the assumption that the number of SUs is known or estimated by each SU [12]. Another
important thrust consists of applying game theory to model the competition and cooperation among SUs
and the interaction between SUs and PUs. Particularly, a number of works (e.g. [55]) model the
spectrum access as a potential game and study the system dynamics under asymptotic assumptions.
More related to our work, several works focus on learning via imitation. Alós-Ferrer et al. study
an imitation-based model of evolution with noise, where players remember their past
payoffs and adopt the Imitate If Best Rule (see [7] for a review on imitation rules) among the
strategies associated with their recalled payoffs and those of another random player [8]. Ackermann et
al. investigated the concurrent imitation dynamics in the context of finite population symmetric
congestion games by focusing on the convergence properties [2]. Berenbrink et al. applied the
Proportional Imitation Rule to load-balance system resources by focusing on the convergence
speed [29]. Ganesh et al. applied the Imitate If Better rule in order to load-balance the service rate
of parallel server systems [80].
Compared to the state of the art, the key contribution of our work lies in the systematic
application of the natural imitation behavior to address the spectrum access problem with specific
constraints (e.g., learning is restricted to nodes on the same channel), the design of a distributed
imitation-based channel access algorithm, and the theoretical analysis of the induced imitation
dynamic and the convergence to an efficient and stable system equilibrium.
We then consider the case where, instead of imitating other nodes, a node imitates only its own
past behavior that has brought it a higher payoff. Such “self-imitation” demonstrates more
robustness in the case where imitating others is not possible or reliable.
Technically, we develop and analyze a framework of retrospective spectrum access protocols
based on stochastic learning that can orient the network towards a socially efficient and fair
equilibrium state. Our developed retrospective spectrum access protocol has two features: (1)
the entirely distributed implementation requiring only local observations and (2) the guaranteed
statistical convergence to the equilibrium state within a bounded delay.
The developed retrospective protocol follows a natural design philosophy: in each decision
period, an SU j explores a new channel with probability ε and migrates to the channel that gives
the best utility within the last Hj periods with probability (1−ε)(1−ρ) (ρ is called endogenous inertia).
We note that the protocol can model the particular feature that each SU is equipped with bounded
memory and should make its decision based on only local observations. The protocol also models
a natural human decision making behavior of striking a balance between exploring a new choice
and retrospectively exploiting past successful choices. To analyze the performance of the developed
protocol, we apply the mistake model introduced in [77, 143, 144, 225] and establish the statistical
convergence of the dynamics to the system equilibrium within a bounded latency O(1/ε).
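The decision rule described above can be sketched in a few lines of Python. This is only an illustrative reading of the protocol, not its reference implementation: the function name `retrospective_choice`, the uniform exploration draw, and the choice to stay on the current channel with the remaining probability (1−ε)ρ are assumptions made for the example.

```python
import random
from collections import deque


def retrospective_choice(current, memory, n_channels, eps, rho, rng):
    """Pick the channel for the next decision period (illustrative sketch).

    memory holds (channel, utility) pairs observed in the last H_j periods.
    """
    u = rng.random()
    if u < eps:
        # Explore: assumed here to be a uniform draw over all channels.
        return rng.randrange(n_channels)
    if u < eps + (1 - eps) * (1 - rho) and memory:
        # Retrospect: migrate to the channel with the best remembered utility.
        best_channel, _ = max(memory, key=lambda pair: pair[1])
        return best_channel
    # Inertia (assumed): stay put with the remaining probability (1 - eps) * rho.
    return current


# Bounded memory of H_j = 3 periods: the oldest observation is dropped automatically.
memory = deque(maxlen=3)
for observation in [(0, 0.9), (1, 0.2), (2, 0.5), (1, 0.4)]:
    memory.append(observation)
```

With the memory above, the high utility once seen on channel 0 has already been forgotten, so a purely retrospective SU (ε = 0, ρ = 0) would move to channel 2. This illustrates how the bounded memory Hj shapes the decision.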
While our model is presented in the specific context of channel access, we intend it more
generically as a contribution to the literature on bounded rationality and learning in the presence
of noise, which thus far has been mostly explored in biology and economics. Relying on the
classical works [72, 77, 143, 144, 225], Dieckmann analyzed the evolution of conventions in a
society with local interactions and mobile players [192]. Mertikopoulos et al. studied exponential
learning in the presence of noise and the induced stochastic replicator dynamics [141]. Friedman
and Mezzetti investigated mistake models that induce better and best reply dynamics [79]. Young
et al. proposed a series of completely uncoupled rules¹ (e.g., [78, 136, 137, 158, 226]) possessing
several appealing properties. The version of Trial and Error presented in [158], for instance, is
able to converge to the PNE maximizing the social welfare. Nevertheless, its complexity is high and
its convergence speed is very slow, as shown in [96].
The retrospective learning protocol that we propose has a similar architecture to the learning
procedures developed in [136] and [240]. Our main contributions with respect to the existing
literature are the introduction of nontrivial memories and inertia, as well as the results on
convergence time. We would like to emphasize that despite our focus on channel access, the developed
stochastic learning protocol and the analysis methodology in this work also provide some insights
on the design of decentralized load balancing algorithms that respect locality constraints (e.g.,
bounded memory, bounded rationality) while converging to the balanced state within a bounded
delay.
¹ An individual’s learning rule is completely uncoupled if it does not depend directly on the actions or payoffs of anyone else.
The rest of this chapter is structured as follows. Section 4.2 develops our work on the imitation-based
spectrum access. Section 4.3 presents our work on the retrospective spectrum access. Section 4.4
concludes the chapter by briefly summarizing our other work related to this topic.
Part of the work of this chapter is the topic of the thesis of my former Ph.D. student Stefano Iellamo
(co-advised with Pr. Marceau Coupechoux), who is currently a Marie-Curie research fellow at
ICS-FORTH. The ongoing thesis of Mira Morcos (co-advised with Pr. Tijani Chahed) is also related
to this topic. More details of our work on this topic, including proofs and numerical analysis, can be
found in our publications [23–26, 70, 71, 94–96].
We consider a primary network consisting of a set C of C frequency channels, each with bandwidth
B.² The users in the primary network are operated in a synchronous time-slotted fashion.
A set N of N SUs tries to opportunistically access the channels when they are left free by PUs.
Let Zi(k) be the random variable equal to 1 when channel i is unoccupied by any PU at slot k
and 0 otherwise. We assume that the process {Zi(k)} is stationary and independent for each i and
k. We also assume that at each time slot, channel i is free with probability µi, i.e., E[Zi(k)] = µi.
The channel availability probabilities µ ≜ {µi} are a priori not known by SUs. We assume perfect
sensing at the SUs, i.e., any transmission of any PU on a channel is perfectly sensed by SUs sensing
that channel and thus no collision occurs between PUs and SUs.
In our work, each SU j is modelled as a rational decision maker, striving to maximize the
throughput it can achieve, denoted as Tj, which can be expressed as a function of µsj and nsj,
where sj denotes the channel which j chooses and nsj denotes the number of SUs on channel sj. More
formally, the expected value of Tj can be written as:
In order to perform a closed-form analysis, we focus on the scenario where the channel capacity is
evenly shared among all SUs on the channel when it is free, i.e.,
It should be noted that f(µsj, nsj) depends on the MAC protocol implemented at the cognitive users.
Besides the evenly shared model considered here, several other models are also widely applied in
practice, such as the CSMA-based random access model. Our work can be adapted to those cases
by defining an appropriate function f.
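As a concrete reading of the evenly shared model, the sketch below assumes that the expected throughput takes the form B·µ/n for an SU on a channel with availability µ shared by n SUs; this form is our assumption, chosen to agree with the normalized payoff πi = µi/(xi N) used later in the chapter. The function names are hypothetical. The Monte Carlo helper draws the Bernoulli availability Zi(k) slot by slot and can be compared against the closed form.

```python
import random


def expected_throughput(mu_i, n_i, bandwidth=1.0):
    """Expected per-SU throughput under even sharing (assumed form B * mu / n)."""
    if n_i < 1:
        raise ValueError("n_i must count at least the SU itself")
    return bandwidth * mu_i / n_i


def simulated_throughput(mu_i, n_i, slots, bandwidth=1.0, seed=0):
    """Empirical per-SU throughput: the channel is free in a slot with probability mu_i."""
    rng = random.Random(seed)
    free_slots = sum(1 for _ in range(slots) if rng.random() < mu_i)
    return bandwidth * free_slots / (n_i * slots)
```

A different MAC protocol, such as CSMA-based random access, would be modelled by swapping in another function in place of `expected_throughput`.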
To study the interactions among autonomous selfish SUs and to derive distributed channel
access policies, we formulate the channel selection problem as a spectrum access game where the
players are the SUs. Each player j stays on a channel i to opportunistically exploit the unused
spectrum of PUs to maximize its expected throughput. The game is defined formally as follows:
² Our analysis can be extended to study the heterogeneous case with different channel capacities.
Definition 4.1. The spectrum access game G is a 3-tuple (N , C, {Uj }), where N is the player set, C
is the strategy set of each player. Each player j chooses its strategy sj ∈ C to maximize its normalized
utility function Uj defined as
The solution of the spectrum access game G is characterized by a Nash Equilibrium (NE) [149],
a strategy profile from which no player has incentive to deviate unilaterally. Using the related
theory on congestion games, we can establish the existence and the uniqueness of the NE in the
spectrum access game G for the asymptotic case (N → ∞) in the following theorem.
Theorem 4.1. In the asymptotic case, G admits a unique NE. At the NE, there are $x_i^* N$ SUs staying
with channel i, where $x_i^* = \frac{\mu_i}{\sum_{l \in \mathcal{C}} \mu_l}$.
We can observe two desirable properties of the unique NE derived in Theorem 4.1:
• the NE is optimum from the system perspective as the total throughput of the network
achieves its optimum at the NE;
• the NE ensures that the spectrum resource is shared fairly among SUs.
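The fairness property can be checked numerically. The following sketch computes the NE proportions of Theorem 4.1 and the resulting per-SU payoff µi/(xi N); the helper names and the example values of µ and N are illustrative assumptions.

```python
def nash_proportions(mu):
    """NE proportions x_i* = mu_i / sum_l mu_l from Theorem 4.1."""
    total = sum(mu)
    return [m / total for m in mu]


def per_user_payoff(mu, x, n_users):
    """Normalized payoff mu_i / (x_i * N) of an SU on each channel."""
    return [m / (xi * n_users) for m, xi in zip(mu, x)]


mu = [0.3, 0.5, 0.8]          # hypothetical channel availability probabilities
x_star = nash_proportions(mu)
payoffs = per_user_payoff(mu, x_star, n_users=100)
```

At the NE every SU obtains the same payoff, equal to the sum of the µi divided by N, regardless of the channel it stays on.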
One critical challenge in the analyzed spectrum access game is the design of distributed spectrum
access strategies for rational SUs to converge to the NE without the a priori knowledge of µ.
In response to this challenge, we develop an efficient spectrum access policy. Our proposed policy
can be implemented in a distributed fashion based solely on local interactions without any knowledge
of the channel statistics and thus is especially suited to decentralized adaptive learning environments
such as cognitive radio networks. In terms of performance, we demonstrate both analytically and
numerically that the proposed channel access policy converges to the ε-NE³ of G, which is also the
ε-optimum of the system.
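To make the ε-NE notion concrete for a finite population, the sketch below computes the largest payoff gain any single SU could obtain by unilaterally moving to another channel, assuming the evenly shared payoff µi/ni. The function names are hypothetical, and the check is a plain enumeration rather than anything taken from the cited analysis.

```python
def max_unilateral_gain(mu, counts):
    """Largest payoff improvement available to any single deviating SU."""
    best_gain = 0.0
    for i, n_i in enumerate(counts):
        if n_i == 0:
            continue  # nobody on channel i, so nobody can deviate from it
        current = mu[i] / n_i
        for k, n_k in enumerate(counts):
            if k == i:
                continue
            # After moving, the deviator shares channel k with n_k + 1 SUs.
            deviated = mu[k] / (n_k + 1)
            best_gain = max(best_gain, deviated - current)
    return best_gain


def is_epsilon_ne(mu, counts, eps):
    """True if no SU can gain more than eps by deviating unilaterally."""
    return max_unilateral_gain(mu, counts) <= eps
```

For example, with µ = (0.2, 0.8) and ten SUs, the split (2, 8) leaves no profitable deviation, whereas the even split (5, 5) lets an SU on the first channel gain about 0.093 by moving.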
The spectrum access algorithm we develop is based on imitation. As a behavior rule widely
observed in human societies, imitation captures the behavior of a rational player that mimics the
actions of other players with higher payoff in order to improve its own payoff. The induced imitation
dynamic models the spreading of successful strategies under imitation [174]. In this section, we
focus on the scenario where an SU can imitate any other SU and develop two spectrum access
policies based on the proportional imitation rule and the double imitation rule. We analyze the
induced dynamic of the imitation process and show the convergence of the proposed policy to the
ε-NE of G. In the next section, we extend our efforts to a more practical scenario where an SU can
only imitate the other SUs operating on the same channel and develop an adapted imitation-based
spectrum access policy in the new context.
Algorithm 4 presents our proposed spectrum access policy based on the proportional imitation
rule, termed PISAP. The core idea is as follows: at each iteration, each SU randomly selects
another SU in the network; if the payoff of the selected SU is higher than its own payoff, the SU
imitates the strategy of the selected SU at the next iteration with a probability proportional to the
payoff difference, scaled by the imitation factor σ.⁴
³ A strategy profile is an ε-NE if no player can gain more than ε in payoff by unilaterally deviating from his strategy.
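We first study the dynamic induced by PISAP by setting U = 0. It is shown in [170] that in the
asymptotic case, the proportional imitation rule in Algorithm 4 generates a population dynamic
described by the following set of differential equations (the replicator dynamic):
$$\dot{x}_i = \sigma x_i (\pi_i - \bar{\pi}), \quad \forall i \in \mathcal{C}, \qquad (4.1)$$
where $\pi_i = \mu_i/(x_i N)$ is the payoff of an SU on channel i and $\bar{\pi} = \sum_{l \in \mathcal{C}} \mu_l / N$ is the average payoff. Its solution is
$$x_i(t) = K_i e^{-\left(\frac{\sum_{l \in \mathcal{C}} \mu_l}{N}\right)\sigma t} + \frac{\mu_i}{\sum_{l \in \mathcal{C}} \mu_l}, \qquad (4.2)$$
where the constant $K_i = x_i(0) - \frac{\mu_i}{\sum_{l \in \mathcal{C}} \mu_l}$.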
As the first result of this section, the following theorem states the convergence of the dynamic
to the NE of the spectrum access game G.
Theorem 4.2. The imitation dynamic induced by PISAP converges exponentially to the NE of G.
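The exponential convergence can be illustrated numerically. The sketch below evaluates the closed form (4.2) and, as a cross-check, integrates the replicator dynamic with a forward Euler scheme, assuming it takes the standard form ẋi = σ xi (πi − π̄) with πi = µi/(xi N). The function names, the step size and the parameter values are assumptions made for the example.

```python
import math


def closed_form(mu, x0, n_users, sigma, t):
    """Evaluate (4.2): exponential relaxation towards mu_i / sum_l mu_l."""
    total = sum(mu)
    rate = sigma * total / n_users
    return [(x0_i - m / total) * math.exp(-rate * t) + m / total
            for m, x0_i in zip(mu, x0)]


def euler_replicator(mu, x0, n_users, sigma, t_end, dt=1e-3):
    """Forward Euler integration of the assumed replicator dynamic."""
    x = list(x0)
    # Average payoff sum_i x_i * pi_i = sum_i mu_i / N, constant over time.
    pi_bar = sum(mu) / n_users
    for _ in range(int(round(t_end / dt))):
        # x_i * pi_i = mu_i / N, so no division by x_i is needed.
        x = [xi + dt * sigma * (m / n_users - xi * pi_bar) for m, xi in zip(mu, x)]
    return x


mu = [0.3, 0.5, 0.8]      # hypothetical availability probabilities
x0 = [0.6, 0.3, 0.1]      # hypothetical initial proportions
analytic = closed_form(mu, x0, n_users=10, sigma=1.0, t=20.0)
numeric = euler_replicator(mu, x0, n_users=10, sigma=1.0, t_end=20.0)
```

The decay rate σ Σl µl / N shrinks as the population grows, which matches the intuition that each SU's payoff, and hence its imitation probability, scales as 1/N in this normalized model.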
We then study the convergence of PISAP in the general case with U > 0. Specifically, we define
the imitation-stable equilibrium as a state where no further imitations can be conducted based on
the imitation policy [2]. The following theorem analyzes the convergence of PISAP with respect to
this concept.
Theorem 4.3. PISAP converges to an imitation-stable equilibrium in expected $O\left(\frac{N^2}{\mu_{\min} \sigma U}\right)$ iterations,
where $\mu_{\min} \triangleq \min_{i \in \mathcal{C}} \mu_i$. The converged equilibrium is an ε-NE of G with ε = 2U.
Note that the convergence delay $O\left(\frac{N^2}{\mu_{\min} \sigma U}\right)$ derived in Theorem 4.3 is an upper
bound; in the simulations we conduct, we observe that the convergence is achieved within a
much shorter delay.
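A finite-population simulation helps to see the imitation-stable equilibrium in action. The sketch below is not Algorithm 4 itself; it is a minimal reading of the proportional imitation idea under stated assumptions: payoffs are the expected values µi/ni rather than realized throughputs, all SUs update synchronously, and an SU imitates a sampled SU with probability σ times the payoff gain whenever that gain exceeds the threshold U. All function names are hypothetical.

```python
import random


def payoffs_of(channels, mu):
    """Expected payoff mu_i / n_i of every SU, given the channel of each SU."""
    counts = [0] * len(mu)
    for c in channels:
        counts[c] += 1
    return [mu[c] / counts[c] for c in channels]


def pisap_step(channels, mu, sigma, threshold, rng):
    """One synchronous proportional-imitation iteration (illustrative)."""
    n_users = len(channels)
    payoff = payoffs_of(channels, mu)
    updated = list(channels)
    for j in range(n_users):
        other = rng.randrange(n_users - 1)
        if other >= j:
            other += 1  # uniform draw over the SUs other than j
        gain = payoff[other] - payoff[j]
        if gain > threshold and rng.random() < sigma * gain:
            updated[j] = channels[other]
    return updated


def run_pisap(mu, n_users, sigma, threshold, max_iters, seed=0):
    """Iterate until no payoff gap exceeds the threshold, or max_iters is hit."""
    rng = random.Random(seed)
    channels = [j % len(mu) for j in range(n_users)]  # near-uniform start
    for _ in range(max_iters):
        payoff = payoffs_of(channels, mu)
        if max(payoff) - min(payoff) <= threshold:
            break  # imitation-stable: no SU would imitate any other
        channels = pisap_step(channels, mu, sigma, threshold, rng)
    return channels


mu = [0.2, 0.3, 0.5]  # hypothetical availability probabilities
final = run_pisap(mu, n_users=100, sigma=1.0, threshold=0.001, max_iters=5000, seed=1)
proportions = [final.count(c) / len(final) for c in range(len(mu))]
```

In this setting the final proportions should land close to the NE proportions (0.2, 0.3, 0.5), with a residual gap governed by the threshold, in line with the ε-NE guarantee of Theorem 4.3.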
⁴ One way of setting σ is to set σ = 1/(ω − α), where ω and α are two exogenous parameters such that Uj ∈ [α, ω], ∀j ∈ C.
We next turn to a more advanced imitation rule, the double imitation (DI) rule [173], and
propose the DI-based spectrum access policy, termed DISAP. Under DISAP, each SU randomly
samples two SUs and imitates them with a certain probability determined by the utility difference.
The spectrum access policy based on the double imitation is detailed in Algorithm 5, in which each
SU randomly samples two other SUs j1 and j2 (without loss of generality, assume that j1 and j2
operate on channels i1 and i2 respectively, with corresponding utilities Uj1 ≤ Uj2) and updates the
probabilities of switching to channels i1 and i2, denoted as pj1 and pj2 respectively.
The double imitation rule generates an aggregate monotone dynamic [169, 173], which is
defined as follows:
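$$\dot{x}_i = \frac{x_i}{\omega - \alpha}\left[1 + \frac{\omega - \bar{\pi}}{\omega - \alpha}\right](\pi_i - \bar{\pi}), \quad \forall i \in \mathcal{C} \qquad (4.3)$$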
Injecting πi = µi /(xi N ) into the differential equations, we have:
σπ ω−π σπ ω−π
ẋi = 1+ − 1+ xi ,
ω−α ω−α ω−α ω−α
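whose solution is
$$x_i(t) = K e^{-\frac{\sigma \bar{\pi}}{\omega - \alpha}\left(1 + \frac{\omega - \bar{\pi}}{\omega - \alpha}\right)t} + \frac{\mu_i}{\sum_{l \in \mathcal{C}} \mu_l}, \qquad (4.4)$$
where $\bar{\pi} = \sum_{l \in \mathcal{C}} \mu_l / N$ and $K = x_i(0) - \frac{\mu_i}{\sum_{l \in \mathcal{C}} \mu_l}$. In the studied scenario, α and ω are the lower and
upper bounds of the SUs’ utility, which are 0 and 1, respectively.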
The following theorem stating the major result in this subsection follows immediately.
Theorem 4.4. DISAP converges exponentially to the NE of the spectrum access game G.
Compared with the proportional imitation rule, which produces the replicator dynamic (Eq. (4.1)),
the adjusted proportional imitation rule induces the aggregate monotone dynamic (Eq. (4.3)) that
converges to the NE at a higher rate.
We then study the convergence to an imitation-stable equilibrium of DISAP in the general case
with U > 0 in the following theorem.
Theorem 4.5. DISAP converges to an imitation-stable equilibrium in expected $O\left(\frac{N^2}{\mu_{\min} \sigma U}\right)$ iterations,
where $\mu_{\min} \triangleq \min_{i \in \mathcal{C}} \mu_i$. The converged equilibrium is an ε-NE of G with ε = 2U.
4.2.2.3 Discussion
As desirable properties, the proposed imitation-based spectrum access policies (both PISAP and
DISAP) are stateless, incentive-compatible for selfish autonomous SUs and require no central
computational unit. The spectrum assignment is achieved by local interactions among autonomous SUs,
and the ε-optimum of the system is reached when the algorithm converges, which happens in
polynomial time. The autonomous behavior and decentralized implementation make the proposed
policies especially suitable for large scale cognitive radio networks. The imitation factor σ controls
the tradeoff between the convergence speed and the channel switching frequency: a larger σ
represents more aggressiveness in imitation and thus leads to faster convergence, at the price of more
frequent channel switching for the SUs, which may constitute a significant cost for today’s wireless
devices in terms of delay, packet loss and protocol overhead. The imitation threshold U, on the
other hand, can be tuned to balance the convergence speed against the optimality of the
converged equilibrium.
Up to now, we have studied the imitation-based channel access policy where an SU can imitate
any other SU whatever the channel the latter stays on. This approach implicitly assumes that an
SU can interact with SUs on different channels, which may not be realistic in some cases or may pose
additional system overhead (e.g., sensing a different channel). In this subsection, we focus on a
more practical scenario, where an SU only imitates the SUs on the same channel and the imitation
is based on the payoff difference of the preceding iteration. In the considered scenario, an SU only
needs to locally interact with the SUs on the same channel (e.g., exchange the payoff of the preceding
iteration, which can be piggybacked with the data packets transmitted on the channel).
In the following analysis, we study the induced imitation dynamic and the convergence of the
proposed spectrum access policies PISAP and DISAP subject to the channel constraint on imitation.
We first derive in Theorem 4.6 the dynamic for a generic imitation rule F with large population.
We then derive in Lemma 4.1, Theorem 4.7 and Theorem 4.8 the dynamic of the proposed
proportional imitation policy PISAP and its convergence under the channel constraint. The
counterpart analysis for the double imitation policy DISAP is explored in Lemma 4.2, Theorem 4.9 and
Theorem 4.10.
We start by introducing the notations used in our analysis. At an iteration, we label all SUs
performing strategy i (channel i in our case) as SUs of type i and we refer to the SUs on $s_j$ as
neighbors of SU j. We denote by $n_i^l(t)$ the number of SUs on channel i at iteration t and operating
on channel l at t − 1. It holds that $\sum_{l \in \mathcal{C}} n_i^l(t) = n_i(t)$ and $\sum_{i \in \mathcal{C}} n_i^l(t) = n_l(t-1)$. For a given state
$s(t) \triangleq \{s_j(t), j \in \mathcal{C}\}$ at iteration t and a finite population of size N, we denote by $p_i(t) \triangleq n_i(t)/N$ the
proportion of SUs of type i and by $p_i^l(t) \triangleq n_i^l(t)/N$ the proportion of SUs migrating from channel l to
i. We use x instead of p to denote these proportions in the asymptotic case. It holds that p → x when
N → +∞.
In our study, a generic imitation rule under the channel constraint is termed F. In the case of
the proportional imitation rule (PISAP), F is characterized by the probability set $\{F_{j,k}^i\}$, where $F_{j,k}^i$
denotes the probability that an SU choosing strategy j at the preceding iteration imitates another
SU choosing strategy k at the preceding iteration and then switches to channel i at the next iteration
after imitation. Instead, by applying the double imitation rule (DISAP), we can characterize F by
the probability set $\{F_{j,\{k,l\}}^i\}$, where $F_{j,\{k,l\}}^i$ denotes the probability that an SU choosing strategy j
at the preceding iteration imitates two neighbors choosing respectively strategy k and strategy l
at the preceding iteration and then switches to channel i at the next iteration after imitation. In both
cases the only way to switch to a channel i is to imitate an SU that was on channel i. That means
$F_{j,k}^i = 0, \forall k \neq i$ (PISAP) and $F_{j,\{k,l\}}^i = 0, \forall k, l \neq i$ (DISAP).
At the initialization phase (iteration 0 and 1), each SU randomly chooses its strategy. After that,
the system state at iteration t + 1, denoted as p(t + 1) (x(t + 1) in the asymptotic case), depends
on the states at iteration t and t − 1.
Theorem 4.6. For any imitation rule F, if the imitation among SUs of the same type occurs randomly
and independently, then $\forall \delta > 0$, $\epsilon > 0$ and any initial state $\{\tilde{x}_i(0)\}$, $\{\tilde{x}_i(1)\}$, there exists $N_0 \in \mathbb{N}$
such that if $N > N_0$, $\forall i \in \mathcal{C}$, the event $|p_i(t) - x_i(t)| > \delta$ occurs with probability less than $\epsilon$, where
$p_i(0) = x_i(0) = \tilde{x}_i(0)$, $p_i(1) = x_i(1) = \tilde{x}_i(1)$. In the case of proportional imitation policy it holds that
The proof of Theorem 4.6 [96] consists of first showing the theorem holds for iteration t = 2
and then proving the case t ≥ 3 by induction. Theorem 4.6 is an important result on the short
run adjustments of large populations under any generic imitation rule F : the probability that the
behavior of a large population differs from the one of an infinite population is arbitrarily small
when N is sufficiently large. In what follows, we study the convergence of PISAP and DISAP under
the channel constraint.
(1) Spectrum access policy PISAP under channel constraint
We now focus on PISAP under channel constraint and derive the induced imitation dynamic by
setting U = 0 in the following analysis.
Lemma 4.1. On the proportional imitation policy PISAP under channel constraint, it holds that
Theorem 4.7. The proportional imitation policy PISAP under channel constraint generates the fol-
lowing dynamic in the asymptotic case:
We observe via extensive numerical experiments that (4.6) always converges to the equilibrium.
To gain more in-depth insight into the dynamic (4.6), we notice that under the following
approximation:
$$\sum_{l \in \mathcal{C}} \frac{x_j^l(t)}{x_j(t)} \pi_l(t-1) \approx \bar{\pi}(t-1), \qquad (4.7)$$
where $\bar{\pi}(t-1)$ is the average individual payoff for the whole system at iteration t − 1, noticing
$\sum_j x_j^i(t) = x_i(t-1)$, (4.6) can be written as:
Note that the approximation (4.7) states that in any channel j at iteration t, the proportions of
SUs coming from any channel l are representative of the whole population.
Under the approximation (4.7), given the initial state {xi(0)}, {xi(1)}, we can decompose (4.8)
into the following two independent discrete-time replicator dynamics:
$$\begin{cases} x_i(u) = x_i(u-1) + \sigma x_i(u-1)[\pi_i(u-1) - \bar{\pi}(u-1)] \\ x_i(v) = x_i(v-1) + \sigma x_i(v-1)[\pi_i(v-1) - \bar{\pi}(v-1)] \end{cases} \qquad (4.9)$$
where u = 2t, v = 2t + 1. The two equations in (4.9) illustrate the system dynamic underlying
the proportional imitation policy under channel constraint, given the approximation (4.7):
it can be decomposed into two independent delayed replicator dynamics that occur alternately
at the odd and even iterations, respectively. The following theorem establishes the
convergence of (4.9) to a unique fixed point which is also the NE of the spectrum access game G.
Theorem 4.8. Starting from any initial point, the system described by (4.9) converges to a unique
fixed point which is also the NE of the spectrum access game G.
Furthermore, performing the same analysis as that of Theorem 4.3, we can establish the same
convergence property on the imitation algorithm under channel constraint under the approxima-
tion (4.7) for the general case with U ≥ 0.
(2) Spectrum access policy DISAP under channel constraint
We then focus on DISAP under channel constraint and derive the induced imitation dynamic.
Lemma 4.2. On the double imitation policy DISAP under channel constraint, it holds that
Theorem 4.9. The double imitation policy DISAP under channel constraint generates the following
dynamic in the asymptotic case
$$x_i(t+1) = x_i(t-1) + 2x_i(t-1)\pi_i(t-1) + \sum_j \left\{ x_j^i(t)\left[\sum_k \frac{x_j^k(t)}{x_j(t)} \pi_k(t-1)\right]^2 - 2x_j^i(t) \sum_k \frac{x_j^k(t)}{x_j(t)} \pi_k(t-1) - x_j^i(t)\pi_i(t-1) \sum_k \frac{x_j^k(t)}{x_j(t)} \pi_k(t-1) \right\} \qquad (4.11)$$
Under the approximation (4.7), given the initial state {xi (0)}, {xi (1)}, we can decompose (4.12)
into the following two independent discrete-time aggregate monotone dynamics:
$$
\begin{cases}
x_i(u) = x_i(u-1) + x_i(u-1)\,[2 - \bar{\pi}(u-1)] \cdot [\pi_i(u-1) - \bar{\pi}(u-1)] \\
x_i(v) = x_i(v-1) + x_i(v-1)\,[2 - \bar{\pi}(v-1)] \cdot [\pi_i(v-1) - \bar{\pi}(v-1)]
\end{cases}
\tag{4.13}
$$
where u = 2t, v = 2t + 1. The above two equations illustrate the underlying system dynamic hidden
behind the double imitation policy under channel constraint under the approximation (4.7): it can
be decomposed into two independent delayed aggregate monotone dynamics that occur alternately
at the even and odd iterations, respectively. The following theorem establishes the convergence
of (4.13) to a unique fixed point which is also the NE of the spectrum access game G.
Theorem 4.10. Starting from any initial point, the system described by (4.13) converges to a unique
fixed point which is also the NE of the spectrum access game G.
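The step from the double imitation dynamic (4.11) to the aggregate monotone form (4.13) can be sketched as follows. This is a worked reading rather than the original derivation: it assumes that approximation (4.7) amounts to $x_j^k(t)/x_j(t) \approx x_k(t-1)$ (the SUs found on channel $j$ are representative of the whole population) and that the three sums over $k$ in (4.11) each carry the weight $x_j^k(t)/x_j(t)$.

```latex
% Assumed reading of approximation (4.7): x_j^k(t) / x_j(t) \approx x_k(t-1).
\begin{align*}
\sum_k \frac{x_j^k(t)}{x_j(t)}\,\pi_k(t-1)
  &\approx \sum_k x_k(t-1)\,\pi_k(t-1) = \bar{\pi}(t-1), \\
\sum_j x_j^i(t) &= x_i(t-1), \\
x_i(t+1)
  &\approx x_i(t-1) + x_i(t-1)\left[ 2\pi_i(t-1) + \bar{\pi}(t-1)^2
     - 2\bar{\pi}(t-1) - \pi_i(t-1)\,\bar{\pi}(t-1) \right] \\
  &= x_i(t-1) + x_i(t-1)\left[ 2 - \bar{\pi}(t-1) \right]
     \left[ \pi_i(t-1) - \bar{\pi}(t-1) \right].
\end{align*}
```

The last line links iteration t + 1 to iteration t − 1 only, which is why the even and odd iterations evolve as two separate sequences.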
Based on the theoretic results derived previously, we develop a fully distributed channel access
policy for the general case with finite population based on the imitation rule among SUs on the
same channel (i.e. neighbors). The proposed policy, detailed in Algorithm 6, is suitable both for
proportional and double imitation. Run at each SU j and at each iteration, it consists of:
• sampling randomly one (proportional imitation) or two (double imitation) neighbors;
• comparing the payoff achieved at the previous iteration t − 1 with that of the neighbor(s)
selected for imitation;
• performing channel migration with the probability dictated by the applied imitation rule.
Algorithm 6 is evaluated by extensive simulations detailed in [96].
Algorithm 6 Imitation-based Spectrum Access Policy under Channel Constraint: executed at each
SU j
1: Initialization: set the imitation factor σ, the imitation threshold U and the learning rate ε(t)
2: Randomly choose a channel for the first two iterations t = 0, 1
3: for each iteration t ≥ 2 do
4:   With probability 1 − ε(t):
5:     Perform imitation in PISAP or DISAP on the same channel
6:   With probability ε(t):
7:     Switch to a random channel
8:   t ← t + 1
9: end for
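The per-iteration decision of Algorithm 6 can be sketched for the proportional imitation case as below. The exact PISAP rule is not restated in this section, so the sketch assumes a common form: migrate with probability σ times the payoff gap when the gap exceeds the threshold U, moving to the channel the sampled neighbor used at iteration t − 1. The function name and argument layout are hypothetical.

```python
import random


def imitation_step(my_channel, my_prev_payoff, neighbors, sigma, threshold,
                   eps, num_channels, rng=random):
    """Return the channel SU j uses at the next iteration.

    neighbors: list of (prev_channel, prev_payoff) pairs describing, for each
    SU currently sharing my_channel, its channel and payoff at iteration t - 1.
    """
    # Exploration: with probability eps, jump to a uniformly random channel.
    if rng.random() < eps:
        return rng.randrange(num_channels)
    if not neighbors:
        return my_channel
    # Proportional imitation (assumed form): sample one neighbor and migrate
    # with probability sigma * gap when the gap exceeds the threshold.
    prev_channel, prev_payoff = rng.choice(neighbors)
    gap = prev_payoff - my_prev_payoff
    if gap > threshold and rng.random() < min(1.0, sigma * gap):
        return prev_channel
    return my_channel
```

Double imitation would sample two neighbors instead of one and apply the DISAP migration probability, which this sketch does not cover.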
We consider the downlink of a primary network, with SUs trying to opportunistically access the
free spectrum (Fig. 4.1). The primary spectrum consists of a set C of C frequency channels, each
with bandwidth B. The users in the primary network operate in a synchronous time-slotted
fashion. A set N of N SUs tries to opportunistically access the channels when they are left free by
PUs.
Each SU j has a finite memory containing the history (strategies and payoffs) relative to the
Hj past iterations. Let Hj be the set of iterations recalled by SU j. Let ξi (k) be the random variable
equal to 1 when channel i is unoccupied by the PU at slot k and 0 otherwise. We assume that the
process {ξi (k)} is stationary and independent for each i and k. We also assume that at each time
slot, channel i is free with probability µi , i.e., E[ξi (k)] = µi . We define an iteration t as a block of
PU-slots of fixed duration T during which the SUs do not change their strategy (see Fig. 4.2). At the
end of each iteration, SUs obtain a payoff which corresponds to the achieved throughput.
[Figure 4.2: PU activity on channel i over time, with iterations shown as blocks t and t + 1.]
In our work, each SU j is modeled as a rational decision maker, aiming at load-balancing the
total system throughput. The instantaneous throughput it can achieve in terms of packets per
second, denoted as Tj , can be expressed as a function of µsj and nsj , where sj denotes the channel
which j chooses, and nsj denotes the number of SUs choosing channel sj . The expected value of
Tj , which is to be understood as the long-term throughput when T is very large, can be written as:
In our work, SUs implement a generic random access protocol to avoid collisions. This yields:
where p(nsj ) is a decreasing function denoting the successful transmission probability with nsj
SUs interfering with SU j on channel sj . B is a constant standing for the available bandwidth per
channel. Without loss of generality, we will now assume that B = 1.
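To make the throughput model concrete, the sketch below compares a closed-form expected throughput with a slot-by-slot simulation of one iteration. It assumes the expected throughput takes the product form B · µ · p(n), which is consistent with the description above but not quoted from it, and it uses a hypothetical success probability p(n) = 1/n.

```python
import random


def success_probability(n):
    """Hypothetical decreasing p(n): each of the n contending SUs wins equally often."""
    return 1.0 / n


def expected_throughput(mu, n, bandwidth=1.0, p=success_probability):
    """Assumed long-term throughput of one SU: B * mu * p(n)."""
    return bandwidth * mu * p(n)


def simulated_throughput(mu, n, slots, rng, p=success_probability):
    """Average packets per slot delivered by one SU over `slots` PU-slots."""
    delivered = 0
    for _ in range(slots):
        channel_free = rng.random() < mu  # xi_i(k) = 1 with probability mu
        if channel_free and rng.random() < p(n):
            delivered += 1
    return delivered / slots
```

As the iteration length grows, the simulated value concentrates around the closed form, which is the sense in which the expectation represents the long-term throughput.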
We next formulate the channel selection problem as a spectrum access game where the players
are the SUs. The game is defined formally as follows:
Definition 4.2 (Spectrum access game). The spectrum access game G is a 3-tuple (N , C, {Uj (s)}),
where N is the player set and C is the strategy set of each player. Let s−j = {s1 , · · · , sj−1 , sj+1 , · · · , sN }
be the channels chosen by all users except user j. When a player j chooses strategy sj ∈ C, its player-
specific utility function Uj (sj , s−j ) is defined as
Each user seeks to maximize its utility function, and a commonly accepted solution
for the game is a Pure Nash Equilibrium (PNE), which in our case can be thought of as a mutually
acceptable channel selection. More formally:
Definition 4.3 (Pure Nash Equilibrium). A Pure Nash Equilibrium is a point s∗ in the action profiles
space, from which no user has incentive to deviate unilaterally. Thus
We can recognize that G is a congestion game with player-specific payoff functions. It then
follows from [145] and [96] that G possesses at least one PNE in the general case.
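The existence of a PNE can be checked by brute force on a small instance. The sketch below enumerates all strategy profiles and keeps those from which no player gains by a unilateral deviation. The utility used here (channel availability shared equally among the users of the channel) and the values in `MU` are illustrative assumptions, not the utility defined in the thesis.

```python
from itertools import product

MU = [0.9, 0.6]  # hypothetical channel availabilities


def utility(j, profile):
    """Assumed utility: mu of the chosen channel shared equally among its users."""
    channel = profile[j]
    load = sum(1 for s in profile if s == channel)
    return MU[channel] / load


def is_pure_nash(profile):
    """True if no player can strictly improve by changing only its own channel."""
    for j in range(len(profile)):
        current = utility(j, profile)
        for c in range(len(MU)):
            deviation = profile[:j] + (c,) + profile[j + 1:]
            if utility(j, deviation) > current + 1e-12:
                return False
    return True


def pure_nash_equilibria(num_players):
    profiles = product(range(len(MU)), repeat=num_players)
    return [s for s in profiles if is_pure_nash(s)]


equilibria = pure_nash_equilibria(3)
```

With three players, the equilibria of this toy game are the profiles that place two SUs on the better channel and one on the other. Exhaustive enumeration grows exponentially in the number of players, so it is only a sanity check for small games.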
We now develop a distributed retrospective spectrum access protocol (RSAP) that achieves a
PNE of the spectrum access game. We first provide some definitions needed in the subsequent
analysis.
Define the state z(t) of the system at iteration t by z(t) ≜ {U_j(t − h)}_{j∈N, h∈H_j}. Let λ_j =
argmax_{h∈H_j} U_j(t − h) be the number of iterations elapsed since SU j's highest remembered payoff.
Furthermore, let ρj denote the inertia, which is defined as a positive probability that SU j is unable
to adjust its strategy at each iteration. Note that the concept of inertia has already been included
in models of evolution with noise (see, e.g. [8]). In those cases however, inertia was defined as an
exogenous parameter, meaning that the probability of inertia could be taken equal to zero. We will
show in Proposition 4.5 that RSAP converges 1) in the general case if ρj > 0 for all j and 2) in the
particular case where Hj = 1 and ρj = 0 for all j.
We now introduce the RSAP, as detailed in Algorithm 7. At each iteration t, each user j applies
the following revision scheme. With probability (1 − ε(t))(1 − ρj ), SU j switches to channel sj (t − λj )
if Uj (t − λj ) > Uj (t), and with probability ε(t) it selects a channel uniformly at random for the
next iteration.
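The revision scheme just described can be sketched as a single decision function. The representation of the memory as a list of (channel, payoff) pairs, and the function name, are illustrative assumptions; Algorithm 7 itself is not reproduced here.

```python
import random


def rsap_step(history, eps, rho, num_channels, rng=random):
    """Return the channel SU j selects for the next iteration.

    history[h] = (channel, payoff) at iteration t - h, with h = 0 the current one.
    """
    current_channel, current_payoff = history[0]
    # Perturbation: with probability eps, pick a channel uniformly at random.
    if rng.random() < eps:
        return rng.randrange(num_channels)
    # Inertia: with probability rho the SU is unable to adjust its strategy.
    if rng.random() < rho:
        return current_channel
    # Retrospection: go back to the channel of the highest remembered payoff
    # if that payoff is strictly better than the current one.
    best_channel, best_payoff = max(history, key=lambda entry: entry[1])
    if best_payoff > current_payoff:
        return best_channel
    return current_channel
```

The three branches occur with probabilities ε, (1 − ε)ρ and (1 − ε)(1 − ρ) respectively, matching the scheme above. When no remembered payoff beats the current one for any SU, no migration occurs without a perturbation.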
Definition 4.4 (Migration-stable state). A migration-stable state ω is a state where no more migra-
tion is possible, i.e., Uj (t) ≥ Uj (t − h), ∀h ∈ Hj , ∀j ∈ N .
Foster and Young, with their pioneering work dated 1990 [77], were the first to argue that
the Evolutionarily Stable Strategy (ESS) does not capture the notion of long-run stability when the
system is subjected to continual (rather than isolated) stochastic perturbations. In this new context,
it is possible to identify a set of stochastically stable equilibria which consists of the states attained
almost surely by a dynamical system when the noise level approaches zero. The identification
of such system states is particularly useful in games with multiple equilibria (e.g., coordination
games) as it makes it possible to determine whether some outcomes are much more likely than others when
the noise vanishes. Our protocol is characterized by stochastic perturbations and we study the small
noise limit by making use of the tools provided in [72]. For the sake of a self-contained exposition,
we include here some definitions and results we shall need.
Definition 4.5 (Model of evolution with noise [72]). A model of evolution with noise or mistakes
model is a triple (Z, P, P(ε)) where:
3. P(ε) = (p_{zz′}(ε))_{(z,z′)∈Z²} is a family of Markov transition matrices on Z with ε ∈ [0, ε̄) such
that:
Definition 4.6 (Unperturbed and perturbed Markov chain). In a model of evolution with noise
(Z, P, P(ε)), (Z, P ) is called the unperturbed Markov chain and, for any ε, (Z, P(ε)) is a perturbed
Markov chain. The family of perturbed Markov chains indexed by ε is called a regular perturbation.
Remark. The fact that P(ε) is ergodic ensures that from any state z ∈ Z, we can reach any state
z′ ∈ Z in a finite number of steps with positive probability. The unperturbed Markov chain is however
not necessarily ergodic. If not, the Markov chain (Z, P ) has one or more limit sets.
Definition 4.7 (Limit set). A limit set or recurrent class L of a Markov chain X = (Z, P ) is a set
of states of X such that ∀z ∈ L, P[X_{t+1} ∈ L | X_t = z] = 1 and ∀z, z′ ∈ L, there exists τ > 0 s.t.
P[X_{t+τ} = z′ | X_t = z] > 0.
The unperturbed Markov chain can be interpreted as the evolution of the system when players
follow a predefined rule of evolution like Best Response. Noise can be interpreted as a probability
that players do not follow the rule of the dynamics. For example, if the rule is Best Response, players
choose the best response strategy at the next iteration step with probability 1 − ε and choose any
other strategy at random with probability ε. When a player does not follow the predefined rule, we
say that there is a mutation, by analogy with what happens in species evolution.
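Definition 4.8 (State transition cost). The cost or resistance c_{zz′} of the transition z → z′ is the rate
at which the transition probability p_{zz′}(ε) tends to zero as ε vanishes:
$$
c_{zz'} =
\begin{cases}
0 & \text{if } p_{zz'}(0) > 0 \\
k & \text{if } p_{zz'}(\epsilon) = (a_{zz'} + o(1))\,\epsilon^k \\
\infty & \text{if } p_{zz'}(\epsilon) = 0, \ \forall \epsilon \in [0, \bar{\epsilon}]
\end{cases}
$$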
Let µ(ε) be the stationary probability distribution of the perturbed Markov chain (Z, P(ε)).
Lemma 4.3 (Existence of limit distribution [225]). There exists a limit distribution µ∗ = lim_{ε→0} µ(ε).
Lemma 4.4 ([72]). The set of stochastically stable states is included in the limit sets of the unperturbed
Markov chain (Z, P ).
Definition 4.9 (Long-run stochastically stable set). A state z ∈ Z is said to be long-run stochastically
stable if and only if µ∗z > 0.
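[Figure 4.3: Basin of attraction D(Ω), radius R(Ω) and modified coradius CR∗(Ω) of Ω, with a path from a state x through limit sets L1, ..., Lr−1 to Ω.]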
Let Ω be a union of one or more limit sets of (Z, P ). We now want to study the conditions for Ω
to be stochastically stable. We also want to know the speed at which Ω is reached. For this purpose,
[72] defines W (x, Ω, ε) to be the expected time until set Ω is reached knowing that we start in state
x and that the system follows the perturbed Markov chain (Z, P(ε)). The goal is to characterize
max_{x∈Z} W (x, Ω, ε).
We start with some definitions of concepts illustrated in Fig. 4.3 before giving the main theorem.
Define a path (z1 , z2 , · · · , zτ ) as a sequence of states.
Definition 4.10 (Basin of attraction). Let Ω be a union of one or more limit sets of (Z, P ) and let
(z1 , z2 , · · · , zτ ) be a sequence of states. The basin of attraction D(Ω) of Ω is the set of initial states
from which the unperturbed Markov chain converges to Ω with probability 1, i.e.:
Definition 4.11 (Path cost). For two sets X and Y , a path in Z is a sequence of states (z1 , z2 , · · · , zτ )
with z1 , z2 , · · · ∈ X and zτ ∈ Y . The cost of the path is the sum:
$$
c(z_1, z_2, \ldots, z_\tau) = \sum_{i=1}^{\tau-1} c_{z_i z_{i+1}}.
$$
Let C(X, Y ), the minimum of c(z1 , z2 , · · · , zτ ) over all paths from X to Y , be the set-to-set cost
between X and Y . The radius of the basin of attraction of Ω is defined as the
minimum number of mutations needed to leave D(Ω) given that we start in Ω.
Definition 4.12 (Radius). The radius R(Ω) of Ω is the minimum cost of any path from Ω out of
D(Ω), i.e.:
R(Ω) = C(Ω, Z − D(Ω)).
Definition 4.13 (Coradius). The coradius CR(Ω) of Ω is defined by:
In other words, the coradius is the maximum number of mutations needed to reach Ω.
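Radius and coradius are shortest-path quantities on the graph whose edge weights are the transition costs, so they can be computed with Dijkstra's algorithm. The sketch below assumes the usual reading of the two definitions: R(Ω) is the cheapest path from Ω out of D(Ω), and CR(Ω) is the largest, over starting states outside Ω, of the cheapest path into Ω. The four-state chain, its costs and its basin are made up for illustration.

```python
import heapq


def set_to_set_cost(costs, sources, targets):
    """Minimum path cost from any state in `sources` to any state in `targets`.

    costs maps a transition (z, z_next) to its resistance.
    """
    targets = set(targets)
    best = {z: 0 for z in sources}
    heap = [(0, z) for z in sources]
    heapq.heapify(heap)
    while heap:
        dist, z = heapq.heappop(heap)
        if dist > best.get(z, float("inf")):
            continue  # stale heap entry
        if z in targets:
            return dist
        for (a, b), c in costs.items():
            if a == z and dist + c < best.get(b, float("inf")):
                best[b] = dist + c
                heapq.heappush(heap, (dist + c, b))
    return float("inf")


def radius(costs, states, omega, basin):
    """Cheapest path from omega to a state outside its basin of attraction."""
    return set_to_set_cost(costs, omega, set(states) - set(basin))


def coradius(costs, states, omega):
    """Largest, over states outside omega, of the cheapest path into omega."""
    outside = set(states) - set(omega)
    return max(set_to_set_cost(costs, [x], omega) for x in outside)


# Toy chain: states 0 and 3 are limit sets; 1 drifts to 0 and 2 drifts to 3.
STATES = [0, 1, 2, 3]
COSTS = {(1, 0): 0, (0, 1): 1, (1, 2): 2, (2, 1): 1, (2, 3): 0, (3, 2): 1}
OMEGA = {0}
BASIN = {0, 1}
```

In this toy chain, leaving the basin of Ω = {0} costs three mutations while returning to Ω never costs more than two, so Ω is harder to leave than to reach.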
Consider now (z1 , · · · , zτ ) a path from x to Ω. Let L1 , · · · , Lr be a set of consecutive limit sets
with Lr ⊂ Ω and Li ⊄ Ω for all i < r, through which the path passes. We define the modified cost
function by subtracting from the initial cost function the intermediate radii of the limit sets Li :
$$
c^*(z_1, \ldots, z_\tau) = c(z_1, \ldots, z_\tau) - \sum_{i=2}^{r} R(L_i). \tag{4.18}
$$
Definition 4.14 (Modified coradius). The modified coradius of the basin of attraction of Ω is defined
as:
The theorem proposed by Ellison in [72] is a sufficient condition to identify a long-run stochastically
stable set of the system. It also gives a lower bound on the convergence rate.
Theorem 4.11 (Convergence to long-run stochastically stable set with modified cost [72]). Let
(Z, P, P(ε)) be a model of evolution with noise, and suppose that for some set Ω which is a union of
limit sets R(Ω) > CR∗ (Ω). Then:
In other words, if it is more difficult to leave Ω and its basin of attraction than to come back to
it, the long-run stochastically stable set is contained in Ω.
We now establish the convergence of the retrospective spectrum access algorithm. We start by
stating the following definitions required in the study of convergence.
Definition 4.15 (Single player improvement [79]). A strategy profile s′ is a single player improvement
over the strategy profile s if it coincides with s in every coordinate except one, say coordinate j,
and the payoff of player j is higher under s′ than under s.
Definition 4.16 (Weak finite improvement property [79]). A game G has the weak finite improve-
ment property (weak-FIP) if from each strategy profile s there exists a finite sequence of single-player
improvements that ends in a pure NE.
Theorem 4.12 ([145]). Given a congestion game G with player-specific decreasing payoff functions,
the weak-FIP holds and G admits at least one pure NE.
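A sequence of single-player improvements can be traced on a toy instance as follows. The utility function and the availabilities `MU` are illustrative assumptions. Note that the weak-FIP only guarantees that some improvement sequence reaches a pure NE, so the greedy choice below is not guaranteed to terminate in general; the loop is therefore bounded by a hypothetical `max_steps` parameter.

```python
MU = [0.9, 0.6]  # hypothetical channel availabilities


def utility(j, profile):
    """Assumed utility: mu of the chosen channel shared equally among its users."""
    channel = profile[j]
    load = sum(1 for s in profile if s == channel)
    return MU[channel] / load


def single_player_improvement(profile):
    """Return a profile that differs in one coordinate and is better for that player, or None."""
    for j in range(len(profile)):
        for c in range(len(MU)):
            candidate = profile[:j] + (c,) + profile[j + 1:]
            if utility(j, candidate) > utility(j, profile) + 1e-12:
                return candidate
    return None


def improvement_path(profile, max_steps=1000):
    """Follow single-player improvements until none exists (a pure NE)."""
    path = [profile]
    for _ in range(max_steps):
        nxt = single_player_improvement(path[-1])
        if nxt is None:
            return path
        path.append(nxt)
    raise RuntimeError("no pure NE reached within max_steps")


path = improvement_path((1, 1, 1))
```

Starting from all three SUs on the worse channel, two improvement steps suffice here to reach a profile from which no SU wants to deviate.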
We next analyze structural properties of the spectrum access game dynamics and evolution
under the retrospective spectrum access protocol.
Lemma 4.5. Under the RSAP, there is a one-to-one mapping between the set of migration-stable states
and the limit sets in the following cases:
• in the general case with endogenous inertia ρj > 0,
• in the particular case Hj = 1 and ρj = 0 for all j ∈ N .
Let Ω∗ denote the union of all limit sets in pure NEs. We can establish the following property
of Ω∗.
Lemma 4.6. It holds that R(ω) = 1, ∀ω ∉ Ω∗.
We can then establish the convergence of the retrospective spectrum access protocol by
applying Theorem 4.11.
Theorem 4.13 (Convergence of RSAP and convergence rate). If all SUs 1) adopt the RSAP and 2)
adopt a random strategy at each iteration with probability ε → 0, then the system dynamics converge
a.s. to Ω∗, i.e. to a PNE of the game. The expected delay until a state in Ω∗ is reached, given that the
play in the ε-perturbed model begins in any state not in Ω∗, is O(ε^{−1}) as ε → 0.
Our study can be extended to other games possessing the weak-FIP. These include dominance
solvable games, quasi-acyclic games (similar to acyclic games as defined by Young [225]), power
set graphical congestion games5 and games with the finite improvement property (as defined by
Monderer and Shapley [146]).
We consider a cognitive radio scenario which consists of primary and secondary networks, as
well as a large set of cognitive users, and we focus on a fundamental issue concerning such systems,
i.e. whether it is better for a cognitive user to act as a primary user, paying the primary operator
for costlier, dedicated network resources with Quality of Service guarantees, or act as a secondary
user (paying the secondary operator), sharing the spectrum holes left available by licensed users
and facing lower costs with degraded performance guarantees. At the same time, we consider the
pricing problem of both primary and secondary operators, who compete with each other, setting
access prices to maximize their revenues.
5 Power set graphical congestion games are extended graphical congestion games where players can use any subset
of resources.
The joint pricing and cognitive radio network selection problem is modeled as a Stackelberg
game, where first the primary and secondary operators set their access prices in order to maximize
their revenues. In this regard, we study both practical cases where (1) the primary and secondary
operators fix access prices at the same time, and (2) the primary operator exploits his dominant
position by playing first, anticipating the choices of the secondary operator. Then, network users
react to the prices set by the operators, choosing which network they should connect to, therefore
acting either like primary or secondary users.
The solution provides insight into how rational users will distribute among existing access
solutions (higher-price primary networks vs. lower-price secondary networks), i.e., the proportion
of players who choose different strategies.
We then adopt a fluid queue approximation approach to study the steady-state performance of
these users, focusing on delay as QoS metric. Besides considering static traffic equilibrium settings,
we further formulate the network selection process of cognitive radio users as a population game,
which provides a powerful framework for characterizing the strategic interactions among large
numbers of agents, whose behavior is modeled as a dynamic adjustment process. More specifically,
we study the cognitive users’ behavior according to replicator dynamics, since such users adapt
their choices and strategies based on the observed network state. We provide equilibrium and
convergence properties of the proposed game, and derive optimum stable price and network
selection settings.
We also extend our methodology to study load balancing in the smart grid. Specifically,
we propose a fully distributed Demand-Side Management (DSM) system for smart grid infrastructures,
especially tailored to reduce the peak demand of residential users. In particular, we
use a dynamic pricing strategy, where energy tariffs are a function of the overall power demand of
customers.
We model our system using a game theoretical approach, considering two practical cases where
(1) each appliance decides autonomously its scheduling in a fully distributed fashion (Single-
Appliance DSM), and (2) each user must schedule all his home appliances (Multiple-Appliance
DSM). The proposed approach automatically ensures the reduction of the electricity demand at
peak hours due to dynamic pricing.
We numerically compare these two cases, showing that the first is characterized by only a
negligible performance degradation in all the considered grid scenarios. Nevertheless, while both
mechanisms achieve almost the same performance level, the Multiple-Appliance DSM system requires
a more complex architecture with a central server for each house that collects the information of
all appliances and plays on behalf of the householder. Such an approach would increase the installation
and operating costs due to the higher system complexity. On the contrary, in the Single-Appliance
DSM system, one can use the processing and communication capabilities of devices that
can autonomously optimize their usage, thus greatly simplifying the architecture design and system
configuration.
We demonstrate that our game is a generalized ordinal potential game under some simple and
very general conditions (viz., the regularity of the pricing function). Such a feature guarantees some
nice properties, such as the existence of at least one pure NE where no player has an incentive to
deviate unilaterally from the scheduling pattern he decided upon. Furthermore, we show that any
sequence of asynchronous improvement steps is finite and always converges to a pure NE.
Chapter 5
5.1 Introduction
In this chapter, we focus on path optimization and the related scheduling problems arising from
data harvesting and mobile charging. We present a generic analysis on these problems and design
polynomial or quasi-polynomial time algorithms achieving constant or logarithmic approximation
to the optimum.
We first consider the problem of data harvesting in wireless sensor networks. The task of data
harvesting is traditionally accomplished by multi-hop forwarding, which is known to suffer from
high energy consumption of forwarding nodes, especially those near the sink. Recently, as an
efficient alternative, data harvesting using mobile devices, also termed as data mules [176] or
data ferries [223], has been proposed and implemented in several applications such as underwater
environmental monitoring [196]. The core idea can be summarized as follows: a data ferry (e.g.,
robot, vehicle) travels across the sensor field and harvests data from sensor nodes while they are
within each other’s communication range, and later transfers the harvested data to the sink. The
use of data ferries in data harvesting can significantly reduce energy consumption at sensor nodes
and thus increase network lifetime. However, as the data ferry can harvest data only when it travels
close to the target node, it usually incurs longer data delivery latency, during which some delay-
sensitive data may become obsolete. Therefore, optimizing the trajectory of the data ferry to limit
or minimize data delivery latency is a primary concern for this approach to be effective in practice.
We investigate the trajectory optimisation problem in data collection applications for wireless
sensor networks. This problem seeks an optimal data harvesting path to collect as much data as
possible within a given time duration. We call it the time-constrained data harvesting problem.
Specifically, our problem formulates the situation where delay-sensitive data needs to be reported
to the sink within a certain amount of time before it becomes obsolete. To make the analysis
theoretically complete and generic, we investigate the generic m-dimensional context, of which
the cases of m = 1, 2, 3 are particularly pertinent.
Our main results are naturally articulated as follows.
Chapter 5. Data Harvesting and Charging Path Optimization
In many emerging applications such as first response, infrastructure monitoring, and scientific
exploration, battery-powered mobile agents (e.g., sensors [220], robots [17], drones [65], and
vehicles [161]) usually have specified tasks and mobility patterns [60]. To supply energy to these
agents, mobile chargers are dispatched to visit them, which can significantly prolong the
lifetime of mobile nodes. However, as the mobile charger only delivers energy to a target node when
it encounters the node (or is close to the node if wireless charging is used), inefficient path planning
may incur long latency in pursuing target nodes, resulting in dead nodes or even task failures. Therefore,
optimizing the trajectory of the mobile charger is a primary concern for maintaining the operation
of these systems.
Motivated by the above observation, we focus on the path optimization and charger schedul-
ing problems with mobile nodes in mobile charging applications. To make our analysis generic
and widely applicable, we do not impose any constraint on the mobility patterns of nodes, i.e.,
the trajectory of any node can be a curve of any form. We formulate a class of generic path opti-
mization problems and concentrate on the problem of maximizing the number of nodes charged
within a fixed time horizon. Our framework allows a variety of path optimization problems to
be formulated with realistic constraints, such as limited time and energy budget. We prove that
these problems are either NP-hard or APX-hard. We design a quasi-polynomial time algorithm that
achieves logarithmic approximation to the optimum charging path. We also demonstrate how our
approximation algorithm can be adapted and extended to solve other charging path optimization
and scheduling problems.
Again, despite our focus on mobile charging, the generic formulation of the problem makes our
analysis methodology and obtained results applicable to a wide range of related problems such as
data mule scheduling, package delivery, target monitoring, and security patrolling. These problems
have a common generic objective of designing an optimum path and the corresponding scheduling
that maximize the number of encountered targets under a given budget in terms of time or energy,
or minimize the cost of encountering a given minimum number of targets.
The rest of this chapter is structured as follows. Section 5.2 develops our work on the time-
constrained data harvesting. Section 5.3 presents our work on the charging path optimization
and scheduling. Section 5.4 concludes the chapter by briefly summarizing our other work
related to this topic. More details of our work on this topic, including proofs and numerical analysis,
can be found in our publications [46, 48, 51, 93].
The problem we address and our methodology are related to the following research fields.
Data ferry Assisted Data Harvesting. There is a large body of existing work on data ferry
assisted data harvesting [59, 135, 182, 222, 231] (cf. [69] for a comprehensive survey). The
problem we address is the optimisation of data harvesting trajectory of the data ferry, which is a
hard problem in general, since we are constrained in both space (communication range between
the data ferry and sensors) and time domain (limiting data harvesting latency). Existing solutions
circumvent this difficulty by either using simple mobility and communication models [59, 135, 182,
222, 231] or assuming that the trajectory is already given [182].
The authors in [108, 223] address a similar problem of designing data harvesting path for data
ferries to minimize the data harvesting latency under the constraint that all sensors are visited. The
algorithms they propose are based on the well-known traveling salesman problem (TSP) [16] and its
variant TSP with neighbors (TSPN) [66]. However, our problem is different because TSP requires
the path to pass through all sensors while we seek the most profitable path to harvest maximum data given
the time constraint. Our problem formulation complements the TSP formulation and is particularly
pertinent when the network is large and it is impossible for the data ferry to traverse every node.
Mobile Charger Scheduling. Another similar problem is the mobile agent scheduling problem
where a mobile charger needs to travel within the charging range of each sensor node to recharge
them under the constraint of the battery life of sensor nodes, which is similar to the time constraint
in our data harvesting problem (cf. [218, 221, 235] and references therein). However, they rely
on additional assumptions or simplifications to make the problem tractable. For example, the
authors of [221] find a near-optimal traveling path to recharge all sensor nodes using linear
programming, assuming an infinite traveling speed, and then remove this assumption and
derive a bound of performance degradation. However, their algorithm implicitly assumes the
travelling is fast enough. In our work, we remove these assumptions and analytically establish
the performance properties of the proposed data harvesting algorithm.
Related Theoretical Problems. From a theoretical point of view, the problem we address is
related to several fundamental problems in theoretical computer science, particularly the Orienteering
problem [33] and the weight-constrained minimum spanning tree problem [3]. In the
Orienteering problem, each node of a given graph has a certain quantity of reward. The problem is
to find a path that maximizes the reward collected, subject to a constraint on the path length. In
the weight-constrained minimum spanning tree problem, each edge has a cost and a weight. The
problem is to find a spanning tree with minimum total cost subject to an upper bound on the total
weight. Both problems are NP-hard and have constant-factor approximation algorithms. However,
these approximation algorithms cannot be directly applied to our problem as they are focused on
topological paths.
Example 5. Consider the two-dimensional network illustrated in Fig. 5.5 composed of three sensor
nodes v1 , v2 , v3 with the circles around them indicating their neighborhoods D1 , D2 , D3 . The path P
covers both v1 and v2 , but not v3 . When moving along P , s can harvest data generated at both v1 and
v2 . We thus have Λ(P ) = 2. d(P ) is the Euclidean length of P .
1 The case where nodes generate multiple data messages can be tackled by dividing a node generating k unit data
messages into k virtual nodes at the same position, each generating a unit data message.
[Figure 5.5: Two-dimensional example network with sensor nodes v1, v2, v3 and a data harvesting path P.]
We consider the data harvesting problem faced by s, which seeks an optimum path to harvest as
much data as possible within a time duration T . The problem we address models the situation
where delay-sensitive data should be reported to the sink within a certain time in order to be further
analysed. To make the notation concise, we let s move at unit speed and thus T is the maximum
path length s can traverse before depositing the harvested data. The results obtained can be easily
scaled to arbitrary speed by scaling the time duration T . Throughout our analysis, we are interested
in the non-trivial case where r ≪ T and T r^{m−1} ≪ D^m, i.e., the maximum path length is much
longer than the communication range, while the space covered by a path of length T is much smaller
than the network space. The time-constrained data harvesting problem is formalized below.
Problem 5.1 (Time-constrained Data Harvesting Problem). The time-constrained data harvesting
problem is as follows:
maximize Λ(P ), subject to d(P ) ≤ T .
That is, s seeks the optimal path P ∗ ∈ P of Euclidean length d(P ∗ ) ≤ T , along which it can harvest
the maximum quantity of data. When there is more than one maximizer, the optimal path P ∗ is the
one with minimum Euclidean length.
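The two quantities in Problem 5.1 can be evaluated for a candidate path as sketched below. The sketch assumes the path is a polyline given by its vertices, that every sensor holds one unit of data, and that a sensor is harvested when it lies within distance r of some point of the path. Function names are illustrative.

```python
import math


def path_length(path):
    """Euclidean length d(P) of a polyline given as a list of points."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def point_segment_distance(p, a, b):
    """Distance from point p to the segment [a, b], in any dimension."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    denom = sum(c * c for c in ab)
    if denom == 0:
        t = 0.0  # degenerate segment: a and b coincide
    else:
        t = max(0.0, min(1.0, sum(u * v for u, v in zip(ap, ab)) / denom))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.dist(p, closest)


def harvested(path, sensors, r):
    """Lambda(P): number of sensors within range r of some segment of the path."""
    segments = list(zip(path, path[1:]))
    return sum(
        1 for v in sensors
        if any(point_segment_distance(v, a, b) <= r for a, b in segments)
    )


def is_feasible(path, T):
    """Check the time constraint d(P) <= T for a data ferry moving at unit speed."""
    return path_length(path) <= T
```

For instance, with `path = [(0, 0), (4, 0), (4, 3)]`, sensors at `(1, 0.5)`, `(4, 1.5)` and `(10, 10)`, and `r = 1`, the path has length 7 and covers the first two sensors only, mirroring the situation of Example 5.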
It is worth noting that the time-constrained data harvesting problem has a number of important
variants. In some applications, we require the data harvesting path to be a cycle or to have
predefined starting and end points; it is sometimes required to differentiate sensor nodes by giving
weights to them (e.g., giving higher weights to sensors at key positions) and seek the path maximizing
the weighted sum of harvested data; furthermore, we may deploy multiple data ferries
for data harvesting. Many of these variants can be addressed using the framework established in
this section to design and optimize data harvesting paths.
A simple data harvesting algorithm is to randomly choose a data harvesting path of length T .
We call this algorithm the random data harvesting algorithm, or the random algorithm for short.
Our motivation of starting with the random algorithm is two-fold:
• It is a natural strategy and easy to implement;
• It provides a reference for performance comparison with more sophisticated algorithms as
well as the optimal one.
The following theorem states the main result.
Theorem 5.1 (Performance of Random Data Harvesting Algorithm). Consider the random data
harvesting algorithm where s randomly chooses a path P of length T . It holds that
• E[Λ(P )] = O(λ r^{m−1} T );
• Pr{Λ(P ) ≥ n^ε E[Λ(P )]} → 0 when n → ∞, ∀ε > 0, that is, Pr{Λ(P ) = Θ(n^ε)} → 0.
Having derived the performance of the random algorithm, we proceed to investigate the per-
formance of the optimal data harvesting algorithm, as stated in Theorem 5.2.
Theorem 5.2 (Performance of Optimum Algorithm). Let P ∗ denote the path of the optimal data
harvesting algorithm, it holds that
• $E[\Lambda(P^*)] = \Theta\left(\frac{\log n}{\log\log n}\right)$;
• $\Pr\left\{\Lambda(P^*) = \Theta\left(\frac{\log n}{\log\log n}\right)\right\} \to 1$, when $n \to \infty$.
The intuition behind Theorem 5.2 (cf. [46]) is that, by the balls-and-bins problem, on average
we can find a region in the network that contains at least $\Theta\left(\frac{\log n}{\log\log n}\right)$ nodes,
all of which can be covered by a data harvesting path of length T . We have also shown in
the proof that this bound is tight.
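The balls-and-bins intuition can be checked numerically. The following is only a minimal sketch and not the proof of [46]: it assumes that the network area is cut into n equal cells and that n nodes are dropped uniformly at random; the function names and the seed are illustrative choices. It prints the occupancy of the fullest cell next to log n / log log n.

```python
import math
import random
from collections import Counter


def max_load(n: int, seed: int = 0) -> int:
    """Throw n balls into n bins uniformly at random; return the load of the fullest bin."""
    rng = random.Random(seed)
    counts = Counter(rng.randrange(n) for _ in range(n))
    return max(counts.values())


def reference_scale(n: int) -> float:
    """log n / log log n, the order of the maximum load for n balls in n bins."""
    return math.log(n) / math.log(math.log(n))


if __name__ == "__main__":
    for n in (10**3, 10**4, 10**5):
        print(n, max_load(n), round(reference_scale(n), 2))
```

The average load per cell is 1 in this setting, so the printed maximum shows how much better a well-chosen region can be than a typical one.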
5.2.3.3 Discussion
Comparing the performance of the optimal and random data harvesting algorithms, we observe
that as the network scales, especially when n → ∞, the optimal algorithm significantly
outperforms the random one. Even though the gap grows only logarithmically, rather than
polynomially or exponentially, it can still be significant in large networks. In other words, a data
harvesting algorithm that is not carefully chosen, such as randomly choosing a harvesting path, can
be very inefficient. This motivates the second part of our work on the following fundamental question:
How to design efficient data harvesting algorithms that approach the solution of Problem 5.1?
Remark. Theorem 5.2 establishes the performance of the optimal algorithm. However, it does not
specify how the optimal path can be constructed for a given network instance. Choosing the path as
indicated in the first step of the proof of Theorem 5.2 only performs well in the average sense, over
a large number of instances; it cannot give the optimal path for a given network
instance. In fact, as we will show in the next subsection by Theorem 5.3, the problem of constructing
the optimal path as formulated in Problem 5.1 is NP-hard.
We first show that Problem 5.1 is NP-hard. We then design constant-factor approximation data
harvesting algorithms with polynomial-time complexity.
Theorem 5.3 (NP-hardness of Time-constrained Data Harvesting Problem). Problem 5.1 is NP-hard.
The proof, detailed in [46], consists of relating Problem 5.1 to the TSP, which is NP-hard.
Given the complexity of the time-constrained data harvesting problem, we first investigate a
specific scenario where the neighborhoods of any two distinct nodes do not overlap (i.e., Di ∩ Dj = ∅,
∀vi , vj ∈ V with i ≠ j) and develop an approximation algorithm for Problem 5.1. We start with the
following definition of topological path.
Definition 5.1 (Topological path). A path Pt is called a topological path in a graph if Pt is composed
exclusively of edges of the graph.
Figure 5.2: A topological path Pt and a geometrical path Pg , both covering 3 nodes.
Generically, we call a path a geometrical path, denoted by Pg for presentation clarity, to emphasize
that Pg is not necessarily a topological path, as Pg may contain curves and may start and end at
any point. Of course, a topological path is also a geometrical one, i.e., letting Pg and Pt denote the
sets of geometrical and topological paths, it holds that Pt ⊂ Pg . Fig. 5.2 illustrates the notions of
topological and geometrical paths.
The key element towards designing approximation algorithms for Problem 5.1 is to establish
the relationship between geometrical and topological paths in terms of path length and number of
covered nodes. This relationship is established in two steps:
• Step 1: We show that any geometrical path Pg can be approximated by a topological path Pt
such that
d(Pt ) = O(d(Pg )), and Λ(Pt ) = Λ(Pg ).
• Step 2: We show that any topological path Pt can be approximated by a geometrical path Pg
via a geometrisation procedure that we develop such that
Lemma 5.1. Given an ordered set of nodes Vg , ∀Pg with Λ(Pg ) = Vg , let Pt = Vg ; it holds that d(Pt ) =
O(d(Pg )). Particularly, let $P_g^* = \arg\min_{\Lambda(P_g) = V_g} d(P_g)$; it holds that d(Pt ) = Θ(d(Pg∗ )).
We then proceed to the second step to approximate a topological path Pt by a geometrical path
Pg by introducing geometrisation, formally defined in the following.
Definition 5.2 (Geometrisation). Given a topological path Pt , the geometrisation procedure finds a
geometrical path Pg that approximates Pt . By approximation we require that
Algorithm 8 details the proposed geometrisation procedure, whose core part is further
illustrated in Fig. 5.3. It is straightforward to see that d(Pg ) < d(Pt ). One technical point worth
commenting on is how to find Mi on Di such that |Mi−1 Mi | + |Mi vi+1 | is minimized (line 6). Mi can
be found efficiently using the following technique: consider the outside border of Di as a mirror;
let a light beam be emitted from Mi−1 and then be reflected by Di to reach vi+1 ; it follows from
the theory of optics that light always travels along the shortest path; hence Mi corresponds to the
reflection point of the light beam on Di and can be found geometrically by equalizing the angle of
incidence and the angle of reflection.
Algorithm 8 Geometrisation
Input: Topological path Pt passing nodes in Vt
Output: Geometrized path Pg
1: Denote the intersection point of v1 v2 and D1 by M1 ;
2: for i = 2 to |Vt | − 1 do
3:     if Mi−1 vi+1 covers Di then
4:         Denote the first intersection point between Mi−1 vi+1 and Di by Mi ; // See Fig. 5.3 (left);
5:     else
6:         Find a point Mi on Di such that |Mi−1 Mi | + |Mi vi+1 | is minimized; // See Fig. 5.3 (right);
7:     end if
8: end for
9: Denote the intersection point of M|Vt |−1 v|Vt | and D|Vt | by M|Vt | ;
10: Return Pg = {M1 M2 , · · · , M|Vt |−1 M|Vt | };
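[Figure 5.3: core part of the geometrisation procedure. Left: Mi−1 vi+1 covers Di (line 4 of Algorithm 8). Right: Mi−1 vi+1 does not cover Di (line 6 of Algorithm 8).]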
It is worth mentioning that the for loop in Algorithm 8 can be repeated so as to further
improve geometrisation effectiveness (i.e., decrease d(Pg )). To make this clearer, let
$P_g^{j-1} = \{M_1^{j-1} M_2^{j-1}, \cdots, M_{|V_t|-1}^{j-1} M_{|V_t|}^{j-1}\}$
denote the output of Algorithm 8 at iteration j − 1; for iteration
j, it suffices to set $P_t = P_g^{j-1}$ by letting $v_k = M_k^{j-1}$ (2 ≤ k ≤ |Vt | − 1) in the algorithm. We observe
via simulation that the additional improvement becomes insignificant, or even negligible, once
Algorithm 8 has been executed more than a handful of times.
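One pass of the geometrisation procedure can be sketched in a few lines. The code below is a minimal illustration under stated assumptions and is not the author's implementation: every neighborhood Di is taken to be a disc of the same radius r centered at vi, neighborhoods do not overlap, and the reflection point of line 6 is found numerically (a coarse scan over the circle followed by a ternary search) instead of by the geometric construction. All function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def first_hit(a: Point, b: Point, c: Point, r: float) -> Optional[Point]:
    """First point of segment a->b on the circle (c, r), or None if the segment misses it."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    fx, fy = a[0] - c[0], a[1] - c[1]
    qa = dx * dx + dy * dy
    qb = 2 * (fx * dx + fy * dy)
    qc = fx * fx + fy * fy - r * r
    disc = qb * qb - 4 * qa * qc
    if qa == 0 or disc < 0:
        return None
    root = math.sqrt(disc)
    for t in sorted(((-qb - root) / (2 * qa), (-qb + root) / (2 * qa))):
        if 0 <= t <= 1:
            return (a[0] + t * dx, a[1] + t * dy)
    return None


def reflect_on_circle(a: Point, b: Point, c: Point, r: float, samples: int = 3600) -> Point:
    """Point M on the circle (c, r) minimizing |aM| + |Mb| (numerical search)."""
    def point(theta: float) -> Point:
        return (c[0] + r * math.cos(theta), c[1] + r * math.sin(theta))

    def cost(theta: float) -> float:
        m = point(theta)
        return math.dist(a, m) + math.dist(m, b)

    step = 2 * math.pi / samples
    k = min(range(samples), key=lambda j: cost(j * step))  # coarse scan
    lo, hi = (k - 1) * step, (k + 1) * step
    for _ in range(60):                                    # ternary refinement
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if cost(m1) < cost(m2):
            hi = m2
        else:
            lo = m1
    return point((lo + hi) / 2)


def geometrise(nodes: List[Point], r: float) -> List[Point]:
    """One pass of the procedure: returns M_1, ..., M_|Vt| for a path with at least 2 nodes."""
    ms = [first_hit(nodes[0], nodes[1], nodes[0], r)]          # line 1
    for i in range(1, len(nodes) - 1):                         # lines 2-8
        prev, nxt = ms[-1], nodes[i + 1]
        hit = first_hit(prev, nxt, nodes[i], r)
        ms.append(hit if hit is not None else reflect_on_circle(prev, nxt, nodes[i], r))
    ms.append(first_hit(ms[-1], nodes[-1], nodes[-1], r))      # line 9
    return ms


def polyline_length(points: List[Point]) -> float:
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))
```

For three collinear nodes at (0, 0), (4, 0) and (8, 0) with r = 1, the topological path has length 8 while the geometrized path runs from (1, 0) through (3, 0) to (7, 0) and has length 6.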
After establishing the relationship between geometrical and topological paths, we are now
ready to present the global algorithm for Problem 5.1, as detailed in Algorithm 9.
[Figure 5.4: example of an MIS composed of nodes v1 and v3 , with a backbone path Pb .]
The core idea of Algorithm 9 is as follows: for each node pair (vi , vj ), we find the topological path
Πt (i, j) passing the maximum number of nodes in V whose geometrized path Πg (i, j) satisfies
d(Πg (i, j)) ≤ T ; we then return $\Pi^* = \arg\max_{\Pi_g(i,j),\, \forall v_i, v_j \in V} \Lambda(\Pi_g(i,j))$.
The two building blocks of Algorithm 9 are the geometrisation algorithm (Algorithm 8) and the
max-prize path algorithm in [33].
Given a graph in which each node has a certain amount of prize, the max-prize algorithm finds in
polynomial time a path collecting the maximum quantity of prize whose length is bounded by a
constant, given as an input parameter. The following theorem formally establishes the performance
of Algorithm 9.
Theorem 5.4 (Performance of Algorithm 9). Algorithm 9 returns Π∗ within polynomial time. It holds
that Λ(Π∗ ) = Θ(Λ(P ∗ )), where P ∗ denotes the optimal data harvesting path under time constraint T .
We next extend our efforts to study the generic case with overlapping neighborhoods.
We first construct a graph G′ whose node set is V and in which there is an edge between vi and vj if
|vi vj | ≤ 2r. We then construct a maximal independent set (MIS) of G′ using a coloring algorithm
similar to that presented in [108, 199], detailed in Algorithm 10 for completeness. Fig. 5.4 illustrates
an example of an MIS composed of nodes v1 and v3 .
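The construction of G′ and of an MIS can be illustrated with a short sketch. The code below is a plain greedy scan, swapped in for the coloring algorithm of Algorithm 10, which is not reproduced in this section; it only assumes that nodes are points in the plane and that two nodes are adjacent when their distance is at most 2r. The name `greedy_mis` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def greedy_mis(nodes: Sequence[Point], r: float) -> List[int]:
    """Indices of a maximal independent set of G' (edge iff distance <= 2r)."""
    chosen: List[int] = []
    for i, p in enumerate(nodes):
        # Keep node i only if it is not adjacent to any node chosen so far.
        if all(math.dist(p, nodes[j]) > 2 * r for j in chosen):
            chosen.append(i)
    return chosen
```

With four nodes placed on a line at abscissas 0, 1.5, 3 and 4.5 and r = 1, the scan keeps the first and third nodes, in the same spirit as the MIS {v1 , v3 } of Fig. 5.4. The set is maximal because every rejected node is adjacent to a chosen one.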
We then define backbone topological paths, which can be regarded as topological paths using
nodes in the MIS U.
Definition 5.3 (Backbone Topological path). A path Pb is called a backbone topological path, or
backbone path for short, in a graph if Pb is composed exclusively of edges whose endpoints are in the
MIS of the graph, except for the source and the destination nodes.
Let Pg , Pt and Pb denote the sets of geometrical, topological and backbone paths; it holds that
Pb ⊂ Pt ⊂ Pg . As an example, the path Pb in Fig. 5.4 is a backbone path.
We apply the same analysis and design methodology as in the non-overlapping neighborhood
case and adapt it to the overlapping neighborhood case. A point M is said to be touched by path
P if the minimum distance between any point of P and M is larger than r but smaller than or equal
to 2r. The key element in designing an approximation algorithm for Problem 5.1 with overlapping
neighborhoods is to establish the relationship among geometrical, backbone, and geometrized
backbone paths in terms of path length and number of touched and covered nodes. Specifically, we
establish the relationship in two steps:
• Step 1: We show that any geometrical path Pg can be approximated by a backbone path Pb
such that d(Pb ) = O(d(Pg )) and ∀vi covered by Pg , vi is either covered or touched by Pb ;
• Step 2: We show that any geometrical path Pg can be approximated by another geometrical
path Pg′ geometrized from a backbone path Pb via a backbone geometrisation procedure such
that
d(Pg′ ) = O(d(Pg )), and Λ(Pg′ ) ≥ Λ(Pg ).
We start with the first step by showing the following lemma. The proof uses a reasoning
technique similar to that of the proof of Lemma 5.1, detailed in [46].
Lemma 5.2. Given any geometrical path Pg , there exists a backbone path Pb such that d(Pb ) =
O(d(Pg )) and ∀vi covered by Pg , vi is either covered or touched by Pb . Particularly, let
$P_g^* = \arg\min_{\Lambda(P_g) = V_g} d(P_g)$; it holds that d(Pb ) = Θ(d(Pg∗ )).
Definition 5.4 (Backbone Geometrisation). Given a backbone path Pb , the backbone geometrisation
procedure finds a geometrical path Pg that approximates Pb . By approximation we require that
d(Pb ) = Θ(d(Pg )), and Λ(Pb ) ≤ Λ(Pg ).
Lemma 5.3. Given any geometrical path Pg , there exists a path Pg′ geometrized from a backbone path
Pb such that
d(Pg′ ) = O(d(Pg )), and Λ(Pg′ ) ≥ Λ(Pg ).
After establishing the relationship among geometrical, backbone and geometrized backbone
paths, we now present the design of the global approximation algorithm for Problem 5.1 for the
overlapping neighborhood case, as detailed in Algorithm 11.
The core idea of Algorithm 11 is as follows: for each node pair (vi , vj ), we find the backbone
path Πb (i, j) passing the maximum number of nodes in V whose geometrized path Πg (i, j) satisfies
d(Πg (i, j)) ≤ T ; we then return $\Pi^* = \arg\max_{\Pi_g(i,j),\, \forall v_i, v_j \in V} \Lambda(\Pi_g(i,j))$.
The two building blocks of Algorithm 11 are the backbone geometrisation algorithm [108] and the
max-prize path algorithm [33].
When running the max-prize path algorithm, we set the prize of each node vi to be the number
of nodes covered or touched by Di , which allows us to achieve constant-factor approximation. The
following theorem establishes the performance of Algorithm 11.
Theorem 5.5 (Performance of Algorithm 11). Algorithm 11 returns Π∗ within polynomial time. It
holds that Λ(Π∗ ) = Θ(Λ(P ∗ )), where P ∗ denotes the optimal data harvesting path.
The time complexity of Algorithm 9 and Algorithm 11 is O(n^5 ), given that the complexity of
the max-prize path algorithm is O(n^3 ). The approximation ratio can be derived from the approximation
ratio of the prize-collecting problem (approximately 2), the geometrisation in the non-overlapping
case (2), and the backbone geometrisation in the overlapping case (1 + 20/π for the algorithm
in [108]). The overall approximation ratio is thus 4 and 2(1 + 20/π) in the non-overlapping and
overlapping cases, respectively.
Path Optimization in Vehicular Routing and Data Ferry Assisted Data Harvesting. There
has been a significant amount of research on the Vehicular Routing Problem (VRP) [193].
In general, the VRP seeks to optimize the routing decisions of a single vehicle or a fleet of vehicles
to deliver goods to specified locations according to demand requirements and other specified constraints.
The VRP has many variants based on the application scenarios and constraints. Some recent
papers deal with optimizations with dynamism [87, 151], uncertainty [142], and many real-life
constraints [224, 233]. These works provide a rich foundation for related algorithmic research.
However, it is commonly assumed in these works that the targets are stationary and the vehicle
travels through a set of fixed locations.
Another related problem is path optimization in data ferry assisted data harvesting [53, 59,
182, 222] (cf. [69] for a comprehensive survey), where a data ferry (e.g., robot, vehicle) travels
across the sensor field and harvests data from sensor nodes. Again, existing works focus on finding
the optimum path of minimum length in the setting where sensor nodes are stationary.
Mobile Charger Scheduling. The mobile charger scheduling problem [92, 164] seeks paths
for mobile chargers (e.g., mobile robots) to replenish batteries for sensor networks. Many variants
are studied by introducing different optimization goals, application scenarios, and constraints. For
example, the authors of [235] consider collaborative mobile charger scheduling, where different
mobile chargers can recharge each other so that the chargers can cover a larger area. In [157], the
authors assume that the charging time for sensor nodes is much longer than the mobile charger’s
traveling time. Their goal is to maximize the timespan that all sensor nodes are alive. In [221], the
authors assume that the charger can recharge all the sensor nodes lying within a distance through
wireless power transfer. Their goal is to find a charging path and stopping points to minimize the
charging time. In a recent paper [90], the authors consider the power heterogeneity of sensor
nodes. They divide sensor nodes into groups, and apply the TSP algorithms to recharge nodes
within each group. In [62], the problem of ensuring monitoring quality for stochastic events is
studied. When the traveling time of the mobile charger is ignored, this problem can be reformulated
as a sub-modular problem. A polynomial-time algorithm with constant approximation ratio is
then developed.
Existing works on mobile charger scheduling all assume that the nodes are fixed in locations.
In contrast, we study a more generic and practical scenario of moving target nodes, and develop a
quasi-polynomial time scheduling algorithm that achieves logarithmic approximation.
Traveling Salesman Problem and Orienteering Problem. The Traveling Salesman Problem
(TSP) is a class of combinatorial optimization problems which have been extensively studied. Many
approximation algorithms have been proposed [16, 117]. A relevant extension to the TSP is the
deadline-TSP or the TSP with time window, where each node can be visited only within a time
interval [21, 22, 33, 38].
In [33] and [38], the authors study the Max-Prize Path problem, also referred to as the
Orienteering problem, where the goal is to visit as many nodes as possible, but only before a hard
time deadline D. The APX-hardness of the Orienteering problem can be shown via reduction from
the TSP on bounded metrics, which is APX-hard [33]. For the Orienteering problem on undirected
graphs, the best approximation ratio in the literature is 2 + ε [38]. For the Orienteering problem
on directed graphs, the best approximation ratio is O(log² OPT) [38].
Our work can be regarded as a non-trivial extension of the Orienteering problem to a more
generic case where nodes are mobile. In this regard, only [91] has considered the TSP with
moving nodes by designing a (1 + α)-approximation algorithm, where α denotes the approximation
ratio of the TSP heuristic. However, the algorithm developed in [91] only works when the number
of moving targets is sufficiently small. In contrast, we address the generic case even when the
network scales. We would like to emphasize that the analysis method of the classic TSP and
the (directed) Orienteering problems, where the heuristic path is formed by joining several trees,
cannot be applied to our problem, as the resulting path may not be feasible when nodes are mobile.
Therefore, an original study is called for, which cannot draw on existing results.
[Figure 5.5: example network with nodes v1 , v2 and v3 , the positions of v2 and v3 at times 0 to 3, and a charging path P from s to t.]
We consider a network composed of n mobile nodes (e.g., robots, sensors, drones), denoted
by the set $V \triangleq \{v_i\}_{i=1}^n$, deployed over a 2-D Euclidean plane. Nodes are battery-powered and thus
need to be recharged periodically. To perform the charging task, a mobile wireless charging vehicle,
referred to as charger for short, travels from a starting point, denoted by a virtual node s, then
visits a number of nodes to charge them before returning to a terminal point, denoted by t. If the
charger needs to return to the starting point, t coincides with s, and the charging path becomes a
tour. A node remains stationary when being charged.
The novel challenge we address, w.r.t. the state-of-the-art works, is that both the charger and the
nodes are mobile. We denote by vi (t) (1 ≤ i ≤ n) the position of node vi at time t. We assume that
the mobility pattern of the nodes (i.e., their moving trajectories vi (t)) is known to the charger
and leave the unknown or partially known mobility case for future research. We denote by ri the
upper bound of the moving speed of vi and by rs the moving speed of the charger. We assume that
rs > ri , ∀i ∈ [1, n], which holds in typical mobile sensing applications. We denote by P ∈ 𝒫 the
charging path followed by the charger, where 𝒫 denotes the set of all possible paths. To make our
analysis generic and widely applicable, we do not impose any constraint on the mobility patterns of
nodes, i.e., the trajectory of any node can be a curve of any form. Table 6.2 lists the major notations
used in this section. The example in Figure 5.5 further illustrates our network model.
Example 6. Consider the network illustrated in Figure 5.5 composed of three nodes v1 , v2 , v3 . v1 is
stationary. v2 and v3 are mobile with their trajectories illustrated in the figure. Charging is immediate
at any node. By following the path P , the charger can charge v1 , v2 and v3 at times 1, 2 and 3, respectively.
We consider a class of charging path optimization and scheduling problems including (but not
limited to) the following:
• The charger has a fixed time budget for the charging journey and aims at charging the
maximum number of nodes in one journey;
• The charger has a battery reservoir and aims at charging the maximum number of nodes
before returning to its service station to replenish itself;
• The charger has a fixed number of M ≤ n nodes to charge and aims at minimizing the total
charging time;
• The charger needs to charge all nodes within a charging journey and seeks the charging path
minimizing the energy consumption or the total time.
The above problems can be classified into two categories:
• The charger has a certain budget (e.g., in terms of time, energy) and it seeks a path to
maximize the number of nodes it can charge within the given budget;
• The charger has a number of nodes to charge and it seeks a path of minimum cost (e.g. in
terms of time, energy consumption) to accomplish the charging task.
Our work establishes a generic framework on the charging path optimization and scheduling
problems. To instantiate our work, we focus on the first problem of maximizing the number of
nodes charged within a fixed time horizon, as formulated below. We discuss in Section 5.3.4 how
our framework can be applied to address other problems formulated above.
We start by modeling the charging process. Let x ∈ [0, 1] denote the battery level (in percentage)
of a node during the charging process. For each node vi , x can be expressed as a function of the
charging time t and the initial battery level x0 : x = fi (t, x0 ). Figure 5.6 traces an example of
charging curve for x0 = 0. Throughout our analysis, we assume the battery level of any node
remains constant during the charging journey unless it is charged by the charger. This assumption
is reasonable as the duration of one charging journey is typically negligibly small compared to the
lifetime of a node. This assumption also implies that each node is charged at most once during a
charging journey. Under this assumption, fi can be expressed as a function of t from the charger’s
perspective. Generically, the following properties hold from elementary electrical circuit analysis:
• fi (t) is continuous, differentiable and monotonically increasing in t;
• fi (t) is concave in t, meaning that the marginal charging utility decreases in t.
Define $g_i(x) \triangleq f_i^{-1}(x)$. The following properties of gi (x) directly follow from the properties of
fi (t):
• gi (x) exists and is the time required to charge vi to x;
• gi (x) is continuous and convex in x;
• The derivative of gi (x), gi′ (x), is increasing in x.
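These properties can be checked on a concrete curve. The sketch below assumes, purely for illustration, an exponential-saturation charging curve f(t) = 1 − exp(−t/τ) with a hypothetical time constant τ; the analysis itself only requires the generic properties listed above, not this particular form.

```python
import math

TAU = 30.0  # hypothetical time constant (e.g., in minutes); an assumption, not from the text


def f(t: float, tau: float = TAU) -> float:
    """Assumed charging curve: battery level reached after charging for time t from x0 = 0."""
    return 1.0 - math.exp(-t / tau)


def g(x: float, tau: float = TAU) -> float:
    """Inverse of f: time required to charge a node to level x, for 0 <= x < 1."""
    if not 0.0 <= x < 1.0:
        raise ValueError("x must lie in [0, 1)")
    return -tau * math.log(1.0 - x)


def g_prime(x: float, tau: float = TAU) -> float:
    """Derivative of g; it increases with x, i.e., the last percent is the slowest to charge."""
    return tau / (1.0 - x)
```

Under this assumed curve, charging to α = 0.9 takes g(0.9) = τ ln 10, roughly 2.3 τ, and the convexity of g reflects the decreasing marginal charging utility.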
[Figure 5.6: example of a charging curve f (t) for x0 = 0.]
Given a charging path P whose Euclidean length is denoted by d(P ), let Λ(P ) ⊆ V denote the
set of nodes that the charger charges to battery level α ∈ [0, 1] (e.g., 90%) while traveling along P .³
For any point p ∈ P , let ts (p) denote the sojourn time during which s stays at p without charging
any node. The entire charging time along P , denoted by Γ(P ), can be established as follows:
$$\Gamma(P) = \frac{d(P)}{r_s} + \sum_{v_i \in \Lambda(P)} g_i(\alpha) + \sum_{p \in P} t_s(p),$$
where $\frac{d(P)}{r_s}$ is the time required for the charger to travel distance d(P ),
$\sum_{v_i \in \Lambda(P)} g_i(\alpha)$ is the time to charge all nodes in Λ(P ) to α, and
$\sum_{p \in P} t_s(p)$ is the total sojourn time.
³ To make our analysis clear, we assume nodes need to be charged to the same battery level α. The extension to the
generic case with different charging levels is straightforward.
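The timespan formula translates directly into code. The sketch below assumes that the charging path is given as a polyline of waypoints (the points where the charger turns or meets a node) and that the per-node charging times gi (α) and the sojourn times are supplied as plain lists; the function name and argument layout are illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def timespan(waypoints: Sequence[Point], speed: float,
             charge_times: Sequence[float], sojourn_times: Sequence[float] = ()) -> float:
    """Gamma(P) = d(P) / r_s + sum of charging times + sum of sojourn times."""
    travelled = sum(math.dist(p, q) for p, q in zip(waypoints, waypoints[1:]))
    return travelled / speed + sum(charge_times) + sum(sojourn_times)
```

For instance, a straight path of length 3 travelled at unit speed with immediate charging has a timespan of 3; adding two charging stops of 0.5 each and a wait of 0.25 raises it to 4.25.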
We refer to the total charging time along a path P as the timespan of P . Without introducing
ambiguity, a charging path P also denotes the corresponding charging schedule.
Problem 5.2 (Charging Path Optimization). The charging path optimization problem is as follows:
maximize |Λ(P )|, subject to Γ(P ) ≤ B.
That is, given the required charging level α and the maximum timespan B, the charger seeks an
optimum path that maximizes the number of charged nodes. The solution P ∗ is termed an optimum
charging path.
In many practical scenarios, it is acceptable to have a small margin on the charging performance,
which motivates the following definition.
Definition 5.5 (ε-optimum Charging Path). A charging path P is called an ε-optimum charging
path (0 ≤ ε ≤ 1) if the following conditions are satisfied:
• Γ(P ) ≤ B, i.e., the timespan of P does not exceed B;
• By following P , the charger can charge all the nodes in Λ(P ∗ ) to at least (1 − ε)α.
The proof [51] consists of relating Problem 5.2 to the Orienteering problem which has been
proved APX-hard. In fact, even the static version of Problem 5.2 is APX-hard. It becomes much
more complex with nodes being mobile.
We divide time into time instances {tk , k = 0, 1, · · · } with stepsize ∆t ≜ tk − tk−1 .⁴ The
trajectory of vi can then be discretized into a vector [vi (t0 ), vi (t1 ), · · · ] where vi (tk ) is the position
of vi at time tk .
⁴ To make the analysis concise, we set the same stepsize for all the nodes. Nevertheless, our analysis can be slightly
adapted to the case with different per-node stepsizes ∆ti .
Example 7. Figure 5.7 illustrates an example of Gd and a feasible charging path P . There are two
nodes to be charged with v1 being stationary and v2 being mobile. The coordinates of nodes and the
trajectory of v2 are shown in the left subfigure. ∆t = 1. B = 3. The speeds of the charger and v2 are
rs = 1 unit length per unit time and r2 = 0.5, respectively. Charging is immediate at any node. The corresponding
discretized graph Gd is shown in the right subfigure with a feasible path P depicted in both subfigures.
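The discretization and the feasibility of the path in Example 7 can be reproduced with a short sketch. The coordinates below are read from the labels of Figure 5.7 and should be treated as assumptions: s = (0, 0), v1 = (1, 0), t = (3, 0), and v2 starting at (2, −1) and moving upward at speed r2 = 0.5. The helper names are hypothetical, and the check covers only travel-time constraints since charging is immediate in this example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def discretize(trajectory: Callable[[float], Point], dt: float, steps: int) -> List[Point]:
    """Sample a trajectory at t_k = k * dt for k = 0, ..., steps."""
    return [trajectory(k * dt) for k in range(steps + 1)]


def v2_trajectory(t: float) -> Point:
    # Assumed from Figure 5.7: v2 starts at (2, -1) and moves up at speed 0.5.
    return (2.0, -1.0 + 0.5 * t)


def is_feasible(schedule: Sequence[Tuple[float, Point]], speed: float, budget: float) -> bool:
    """schedule lists (time, position) pairs in visiting order, starting at s and ending at t."""
    for (t0, p0), (t1, p1) in zip(schedule, schedule[1:]):
        # The charger must be able to cover each leg within the elapsed time.
        if t1 < t0 or math.dist(p0, p1) > speed * (t1 - t0) + 1e-12:
            return False
    return schedule[-1][0] - schedule[0][0] <= budget
```

With ∆t = 1, v2 is sampled at (2, −1), (2, −0.5), (2, 0) and (2, 0.5). The schedule that leaves s at time 0, meets v1 at time 1, meets v2 at its time-2 position and reaches t at time 3 passes the check with rs = 1 and B = 3.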
⁵ How to set Ki will be analysed later.
[Figure 5.7: left: coordinates of the nodes and trajectory of v2 ; right: the corresponding discretized graph Gd . A feasible path P is depicted in both subfigures.]
Theorem 5.7. Any feasible path passing m cliques corresponds to a charging path of maximum
timespan B that can charge m nodes to (1 − ε)α.
We next show that an optimum charging path can be approximated arbitrarily closely by a feasible
path in terms of the number of charged nodes.
Theorem 5.8. Given any ε > 0, under the condition that
$\Delta_t \le \frac{\epsilon\alpha}{3} \min_{1 \le i \le n} g_i'[(1-\epsilon)\alpha]$, there exists
a feasible path in Gd which is also an ε-optimum charging path.
Theorem 5.8 demonstrates that the performance loss due to discretization can be made
arbitrarily small, at the price of increased computational complexity in terms of the size of Gd .
Given a tolerance level ε, Theorem 5.8 also quantifies the bound on the discretization granularity
∆t needed to meet the performance requirement.
It follows from Theorem 5.7 and Theorem 5.8 that the problem of finding an ε-optimum charging
path can be transformed into the problem of finding the optimum feasible path, which is however
APX-hard. The proof follows from the same reduction to the Orienteering problem as that in the
proof of Theorem 5.6 [51]. Given its APX-hardness, we focus on approximation algorithm design
in the next subsection.
This subsection presents our design of a quasi-polynomial time algorithm that achieves loga-
rithmic approximation to the optimum feasible path. We first state the following property of Gd
which is useful in later analysis.
Lemma 5.4. For any pair of nodes vs , vt ∈ V (Gd ), if there exists a path from vs to vt , then there must
exist an edge from vs to vt , i.e., c(vs , vt ) ≠ ∞. Equivalently, if c(vs , vt ) = ∞, then there does not exist
a path from vs to vt .
To develop our algorithm, we assume that the edge costs of Gd are integers. If not, we can
round them to integers by scaling each edge cost by a factor λ and rounding the scaled cost up to the
nearest integer. The relative error incurred by the rounding process, denoted by ε(λ), can be
upper-bounded as follows:
$$\varepsilon(\lambda) = \frac{\lceil c_d(e)\lambda \rceil - c_d(e)\lambda}{c_d(e)\lambda} \le \frac{1}{c_d(e)\lambda} = O\left(\frac{1}{\lambda}\right).$$
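The effect of the scaling factor can be seen on a few numbers. The sketch below uses made-up edge costs and hypothetical function names; it scales each cost by λ, rounds up, and reports the relative error, which stays below 1/(cd (e)λ) as in the bound above.

```python
import math
from typing import List, Sequence


def round_costs(costs: Sequence[float], lam: float) -> List[int]:
    """Scale each edge cost by lam and round up to the nearest integer."""
    return [math.ceil(c * lam) for c in costs]


def relative_errors(costs: Sequence[float], lam: float) -> List[float]:
    """Relative rounding error (ceil(c*lam) - c*lam) / (c*lam) for each edge cost c."""
    return [(math.ceil(c * lam) - c * lam) / (c * lam) for c in costs]


if __name__ == "__main__":
    example_costs = [1.3, 2.7, 0.45]  # illustrative values only
    for lam in (10, 100, 1000):
        print(lam, round_costs(example_costs, lam), max(relative_errors(example_costs, lam)))
```

A larger λ tightens the approximation but also enlarges the integer budget B, which enters the running time of the algorithm that follows.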
The core idea of our algorithm, inspired by the idea of recursion in [36, 37], is summarized
below:
• For each m = [1..nd ] (nd ≜ |V (Gd )|) and each node v ∈ V (Gd ) − {s, t}:
– Recursively search a path P1 from s to v of minimum timespan that charges m nodes,
denote the timespan of P1 by b1 ;
– Recursively search another path P2 from v to t of timespan at most B − b1 that charges
the maximum number of nodes;
• Output the concatenated path P = (P1 , P2 ) that charges the maximum number of nodes;
In the recursion process, we need to carefully choose P1 and P2 such that the resulting concatenated
path does not visit any clique more than once.
Formally, the pseudo-code of the algorithm is given in Algorithm 12. The core part of
Algorithm 12 is the recursive procedure OFP, which has the following inputs:
• Gd : the discretized graph;
• vs , vt : the starting and terminating nodes;
• V : the set of nodes to be charged; initially V = V (Gd );
• l: the recursion level, upper-bounded by L;
• b: the timespan budget of the charging journey.
OFP returns the optimum feasible path starting from vs and ending at vt that charges the maximum
number of nodes in V , whose timespan is upper-bounded by b, by invoking l recursions.
• OFP first checks the timespan budget b and returns P = ∅ if the budget is infeasible. OFP
also returns ∅ if cd (vs , vt ) = ∞, as it follows from Lemma 5.4 that cd (vs , vt ) = ∞ implies that there
does not exist a path from vs to vt . Otherwise P is initialized to the direct path (vs , vt ).
• If l = 0, meaning that the current instance is the last recursion, then OFP returns P .
• Otherwise, OFP iterates over each node v ∈ V (Gd ) − C(vs ) − C(vt ) and m = [1..nd ] to
recursively find: (1) P1 with minimum timespan (denoted by b1 ) starting from vs and ending at
v that charges m nodes, each in a distinct clique, and (2) P2 which starts from v and ends at
vt with timespan at most b − b1 and charges the maximum number of nodes, each in a distinct clique
different from the cliques in P1 . To find P1 , it follows from Lemma 5.4 that only nodes in
Vv− need to be searched. Symmetrically, only nodes in Vv+ − C(P1 ) need to be searched to
find P2 .
• The output is the concatenation of P1 and P2 that charges the maximum number of nodes.
To find P1 , OFP calls the procedure BMIN, essentially a binary search function that returns the
path with minimum timespan starting from vs and ending at v that charges m nodes, each in a distinct
clique.
In the following three lemmas, we show that OFP indeed returns a feasible path (Lemma 5.5)
and establish its time complexity (Lemma 5.6) and approximation ratio (Lemma 5.7). The core
part of the proofs, especially that of Lemma 5.7, consists of decomposing Pf∗ into two sub-paths
charging ⌊m∗/2⌋ and ⌈m∗/2⌉ nodes and then proving the results by induction on L.
Lemma 5.5 (Correctness of OFP). If B ≥ cd (s, t), OFP returns a feasible path for any L, otherwise
it returns P = ∅.
procedure OFP(Gd , vs , vt , V , l, b)
    if b < cd (vs , vt ) or cd (vs , vt ) = ∞ then
        return P := ∅
    else
        P := (vs , vt )
    end if
    if l = 0 then
        return P
    end if
    for all v ∈ V − C(vs ) − C(vt ) do
        for m := 1 to nd do
            (P1 , b1 ) := BMIN(Gd , vs , v, Vv− , l − 1, b, m)
            if P1 = ∅ then
                Break
            end if
            P2 := OFP(Gd , v, vt , Vv+ − C(P1 ), l − 1, b − b1 )
            if P2 = ∅ then
                Break
            end if
            if |Λ(P1 )| + |Λ(P2 )| > |Λ(P )| then
                P := (P1 , P2 )
            end if
        end for
    end for
    return P
end procedure
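The recursion can be made concrete with a simplified sketch. The code below is not Algorithm 12 itself: it replaces the loop over m and the BMIN binary search with a plain loop over every integer split of the budget between the two halves, which is simpler to state but slower. The graph encoding (a nested dictionary of integer edge costs and a map from each graph node to its clique) and all names are assumptions made for illustration, and the brute-force loops are only practical on very small instances.

```python
import math
from typing import Dict, Hashable, List, Optional

Node = Hashable
Costs = Dict[Node, Dict[Node, int]]  # cost[u][v]; a missing entry means "no edge"


def path_cost(cost: Costs, path: List[Node]) -> int:
    """Total edge cost (timespan) of a path."""
    return sum(cost[u][v] for u, v in zip(path, path[1:]))


def ofp(cost: Costs, clique: Dict[Node, Hashable], vs: Node, vt: Node,
        allowed: List[Node], level: int, budget: int) -> Optional[List[Node]]:
    """Best path found from vs to vt with cost <= budget, or None if there is none."""
    direct = cost.get(vs, {}).get(vt, math.inf)
    if direct > budget:  # also covers the "no edge" case, cf. Lemma 5.4
        return None
    best = [vs, vt]
    if level == 0:
        return best
    # Intermediate nodes may not reuse the cliques of the two endpoints.
    inner = [u for u in allowed if clique[u] not in (clique[vs], clique[vt])]
    for v in inner:
        for b1 in range(budget + 1):  # budget granted to the first half
            p1 = ofp(cost, clique, vs, v, inner, level - 1, b1)
            if p1 is None:
                continue
            used = {clique[u] for u in p1}
            rest = [u for u in inner if clique[u] not in used]
            p2 = ofp(cost, clique, v, vt, rest, level - 1, budget - b1)
            if p2 is None:
                continue
            if len(p1) + len(p2) - 1 > len(best):  # more nodes charged
                best = p1 + p2[1:]
    return best
```

On a toy graph with edges s→a (1), a→b (1), b→t (1), s→b (2), a→t (2) and s→t (3), a call with two recursion levels and budget 3 returns the path s, a, b, t, whereas a budget of 2 returns None because even the direct edge is too expensive, in line with Lemma 5.5.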
Lemma 5.6 (Time complexity of OFP). OFP terminates in $O\left((n_d \min(n_d, B) \log B)^L\right)$ time.
Lemma 5.7 (Approximation ratio of OFP). Let Pf∗ denote the optimum feasible path and let m∗ =
|Λ(Pf∗ )|. Under the condition L ≥ ⌈log m∗ ⌉ + 1,⁶ it holds that
$$|\Lambda(P)| \ge \frac{|\Lambda(P_f^*)|}{1 + \lceil \log m^* \rceil}, \quad \text{i.e.,} \quad |\Lambda(P)| = \Omega\left(\frac{|\Lambda(P_f^*)|}{\log m^*}\right).$$
Lemma 5.5, Lemma 5.6 and Lemma 5.7 together lead to the main theorem below on the
performance of OFP.
Theorem 5.9. By setting L = 1 + ⌈log m∗ ⌉, OFP finds an O(log m∗ )-approximate optimum feasible
path in quasi-polynomial time.
We now discuss how our approximation algorithm can be adapted and extended to solve other
charging path optimization and scheduling problems formulated in Section 5.3.2.
For the problem where the charger has a battery reservoir and aims at charging the maximum
number of nodes before returning to its service station to replenish itself, we can set the edge
cost between vi and vj (vi , vj ∈ V (Gd )) to the energy required to charge vi to (1 − ε)α plus the
energy consumption to move from vi to vj . Our algorithm OFP can then be invoked to find an
O(log m∗ )-approximate solution. In a broader sense, our approach can be adopted to solve the first
category of the charging path optimization and scheduling problems formulated in Section 5.3.2.
We next focus on the second class of charging path optimization problems where the charger
has a number of nodes M to charge and it seeks a path with minimum cost (e.g. in terms of time,
energy consumption) to accomplish the charging task. This class of problems is NP-hard because,
when nodes are stationary, charging is immediate and M = n, the problems degenerate to the
classical TSP, which is NP-hard. To solve these problems, we devise a recursive algorithm
similar to OFP, summarized below:
• For each m = [1..M ] and each node v ∈ V (Gd ) − {s, t}:
– Recursively search a path P1 from s to v of minimum timespan (or any form of budget)
charging m nodes, denote the timespan of P1 by b1 ;
– Recursively search another path P2 from v to t of minimum timespan charging M − m
nodes;
• Output the concatenated path P = (P1 , P2 ) of minimum timespan;
Footnote 6: All logarithms in our analysis are to base 2. In the case where m∗ = 0, meaning that the optimum feasible path
cannot pass any node other than s and t, it holds that |Λ(Pf∗ )| = 0 and Lemma 5.7 holds trivially for any L.
Using an analysis similar to that of OFP, we can establish the logarithmic approximation ratio of the
above algorithm.
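The recursive scheme above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are ours, not the thesis's: nodes are stationary, charging is immediate, the cost of a path is the sum of its edge costs, and a hypothetical `depth` parameter bounds the recursion in the spirit of the recursion depth L of OFP. The function name `min_cost_path` and the `used` set (nodes already charged by earlier sub-paths) are also illustrative.

```python
import math

def min_cost_path(cost, s, t, need, depth, used=frozenset()):
    """Return (total_cost, path) of an s-t path charging `need` intermediate nodes.

    cost[u][v] is the edge cost, `depth` bounds the recursion (assumed
    parameter), and `used` holds nodes that may no longer be charged.
    """
    if need == 0:
        return cost[s][t], [s, t]            # direct hop, nothing left to charge
    if depth == 0:
        return math.inf, None                # recursion budget exhausted
    used = frozenset(used) | {s, t}
    best = (math.inf, None)
    for v in range(len(cost)):               # v is the middle node and is charged itself
        if v in used:
            continue
        for m in range(need):                # m nodes before v, need - 1 - m after v
            c1, p1 = min_cost_path(cost, s, v, m, depth - 1, used | {v})
            if p1 is None:
                continue
            # The second half must avoid every node already charged on the first half.
            c2, p2 = min_cost_path(cost, v, t, need - 1 - m, depth - 1, used | set(p1))
            if p2 is not None and c1 + c2 < best[0]:
                best = (c1 + c2, p1 + p2[1:])
    return best

# Toy instance: nodes on a line, edge cost = distance; node 0 is s, node 4 is t.
positions = [0, 1, 2, 5, 3]
cost = [[abs(a - b) for b in positions] for a in positions]
total, path = min_cost_path(cost, s=0, t=4, need=2, depth=2)
print(total, path)
```

Because the first half-path is fixed before the second half is searched, the result is a greedy approximation rather than an exact optimum, which is consistent with the logarithmic approximation ratio stated in the text.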
In many practical scenarios, some nodes are more critical than others. Instead of seeking
$P^* = \arg\max_{P \in \mathcal{P}} |\Lambda(P)|$, it makes more sense to solve $P^* = \arg\max_{P \in \mathcal{P}} \sum_{v_i \in \Lambda(P)} w_i$, where $w_i$ is
a weight for node $v_i$. Our algorithm OFP can be readily applied to this weighted version
by attributing a reward $w_i$ to $v_i$ and by adjusting the objective to finding the path maximizing the
collected reward.
Our algorithm can also be adapted to solve the charging path optimization with time windows
where node vi needs to be charged within a time window. Specifically, this can be done by redefining
the feasible charging path such that only nodes charged within their time windows are counted in Λ(P ).
Chapter 6
6.1 Introduction
In this chapter, we focus on algorithm design and analysis in RFID (radio-frequency identification) systems, which are becoming ubiquitous today in many domains ranging from
warehouse management and object tracking to inventory control. Our focus is tag counting and monitoring, one of the most fundamental functionalities in RFID systems, particularly when the system
scales. In this context, the major performance metric in the algorithm design is time efficiency,
as an algorithm (e.g., tag population estimation, missing tag monitoring and detection) is likely to
be executed frequently.
We start by studying the stability of the Frame Slotted Aloha (FSA) protocol. This is an im-
portant problem as FSA has been widely applied in RFID systems as the de facto standard in tag
identification. However, very limited work has been done on the stability of FSA despite its funda-
mental importance both on the theoretical characterisation of FSA performance and its effective
operation in practical systems. In order to bridge this gap, we devote our first analysis to investigat-
ing the stability properties of FSA by focusing on two physical layer models of practical importance,
the models with single packet reception (SPR) and multipacket reception (MPR) capabilities.
Technically, we model the FSA system backlog as a Markov chain with its states being backlog
size at the beginning of each frame. The objective is to analyze the ergodicity of the Markov
chain and demonstrate its properties in different regions, particularly the instability region. By
employing drift analysis, we obtain the closed-form conditions for the stability of FSA and show
that the stability region is maximized when the frame length equals the backlog size in the SPR
model and the upper bound of stability region is maximized when the backlog size equals the
maximum multipacket reception capacity in the MPR model. Furthermore, to characterise system
behavior in the instability region, we mathematically demonstrate the existence of transience of
the backlog Markov chain.
Chapter 6. Algorithm Design and Analysis in RFID Systems
We then proceed to study the problem of tag population estimation. Quickly and accurately
estimating the number of tagged objects is crucial in establishing inventory reports for large retailers
such as Wal-Mart [168]. Due to the paramount practical importance of tag population estimation,
a large body of studies [110, 120, 160, 177, 239] have been devoted to the design of efficient
estimation algorithms. Most of them, as reviewed in Sec. 6.3.1, are focused on the static scenario
where the tag population is constant during the estimation process. However, many practical
RFID applications, such as logistics control, are dynamic in the sense that tags may be activated or
terminated as specified in the C1G2 standard [73], or the tagged objects may enter and/or leave
the reader’s coverage area frequently, thus resulting in tag population variation. In such dynamic
applications, a fundamental research question is how to design efficient algorithms to dynamically
trace the tag population quickly and accurately.
We develop a generic framework of stable and accurate tag population estimation schemes for
both static and dynamic RFID systems. By generic, we mean that our framework does not require
any prior knowledge on the tag arrival and departure patterns. Our design is based on the extended
Kalman filter (EKF) [180], a powerful tool in optimal estimation and system control. By performing
Lyapunov drift analysis, we mathematically prove the efficiency and stability of our framework.
We complete our work with a comprehensive analysis on the detection of missing tags, one
of the most important RFID applications which has attracted extensive research attention (cf.
Section 6.4.1 on related work on missing tag detection). We investigate a new problem motivated
by the following practical settings.
• Multiple groups of tags. Tags are usually attached to objects belonging to different groups, e.g.,
different brands of goods, with high-end brands being orders of magnitude more valuable
than their low-end peers. Therefore, the missing tag events are characterized by asymmetrical
threshold and reliability requirements across groups.
• Multiple interrogation regions. Tags may be unevenly located in multiple interrogation regions:
e.g., tags may be located in several rooms or different corners or regions of a large warehouse.
Hence, a reader may need to move several times to cover all monitored tags and complete
the missing tag detection process.
The problem we consider is to devise a missing tag detection algorithm with minimum execution
time while guaranteeing the detection reliability requirement for each group of tags. We deliver a
comprehensive analysis of the missing tag detection problem in the above multiple-group multiple-region environment and investigate how to devise optimum missing tag detection algorithms. Note
that when there is only one group and all tags are within one interrogation region, our problem
degenerates to the classical missing tag detection problem studied in the literature.
To design missing tag detection algorithms in the multiple-region multiple-group case, we
leverage a powerful technique called Bloom filter which is a space-efficient probabilistic data
structure for representing a set and supporting set membership queries [32] to detect a missing
event. Specifically, we develop a suite of three missing tag detection algorithms, each decreasing
the execution time compared to its predecessor by incorporating an improved version of the
Bloom filter design and parameter tuning. By sequentially analysing the developed algorithms, we
The rest of this chapter is structured as follows. Section 6.2 presents our work on the stability of
FSA. Section 6.3 develops our work on tag population estimation. Section 6.4 focuses on our work
on missing tag detection. The work of this chapter is in collaboration with my former Ph.D. student
Jihong YU who has just defended his thesis and started his post-doc at Simon Fraser University
in Canada. More details of our work on this topic including proofs and numerical analysis can be
found in our publications [227 – 230].
6.2.1 Introduction
Since the introduction of Aloha protocol in 1970 [1], a variety of such protocols have been
proposed to improve its performance, such as Slotted Aloha (SA) [167] and Frame Slotted Aloha
(FSA) [154]. SA is a well-known random access scheme where the time is divided into identical
slots of duration equal to the packet transmission time and the users contend to access the medium
with a predefined access probability. As a variant of SA, FSA divides slots into frames and a user is
allowed to transmit only a single packet per frame in a randomly chosen slot.
Due to their effectiveness in tackling collisions in wireless networks, SA- and FSA-based protocols have been applied extensively to various networked systems, ranging from satellite networks [153] and wireless LANs [197, 232] to the emerging Machine-to-Machine (M2M) networks [198,
215]. Specifically, in RFID systems, FSA plays a fundamental role in the identification of tags [131,
241] and is standardized in the EPCGlobal Class-1 Generation-2 (C1G2) RFID standard [73]. In
FSA-based protocols, each user with a packet transmits in its selected slot of the frame,
but only packets experiencing no collisions are successful, while the other packets are retransmitted
in the subsequent frames.
Given the paramount importance of the stability for systems operating on top of Aloha-based
protocols, a large body of studies have been devoted to stability analysis in a slotted collision
channel [34, 74, 112] where a transmission is successful if and only if just a single user transmits in
the selected slot, referred to as single packet reception (SPR). In contrast to SPR, the emerging
multipacket reception (MPR) technologies in wireless networks, such as Code Division Multiple
Access (CDMA) and Multiple-Input and Multiple-Output (MIMO), make it possible to receive
multiple packets in a slot simultaneously, which remarkably boosts system performance at the cost
of increased system complexity.
More recently, the application of FSA in RFID systems and M2M networks has received con-
siderable research attention. However, very limited work has been done on the stability of FSA
despite its fundamental importance both on the theoretical characterisation of FSA performance
and its effective operation in practical systems. Motivated by the above observation, we argue
that a systematic study on the stability properties of FSA incorporating the MPR capability is
called for in order to lay the theoretical foundations for the design and optimization of FSA-based
communication systems.
Motivated by the above analysis, we investigate the stability properties of FSA with SPR and
MPR capabilities. The main contributions are articulated as follows. We model the packet transmis-
sion process in a frame as the bins and balls problem [104] and derive the number of successfully
received packets under both SPR and MPR models. We formulate a homogeneous Markov chain to
characterize the number of the backlogged packets and derive the one-step transition probability.
By employing drift analysis, we obtain the closed-form conditions for the stability of FSA and derive
conditions maximising the stability regions for both SPR and MPR models. To characterise system
behavior in the instability region, we mathematically demonstrate the existence of transience of
the backlog Markov chain.
Aloha-based protocols are basic schemes for random medium access and are applied extensively
in many communication systems. As a central property, the stability of Aloha protocols has received
a lot of research attention, which we briefly review here.
Stability of slotted Aloha. Tsybakov and Mikhailov [195] initiated the stability analysis of
finite-user slotted Aloha. They established sufficient conditions for stability of the queues in the
system using the principle of stochastic dominance and derived the stability region for two users
explicitly. For the case of more than two users, the inner bounds to the stability region were shown
in [163]. Subsequently, Szpankowski [185] established necessary and sufficient (but not closed-form) conditions for the stability under a fixed transmission probability vector for the three-user case.
In [34], an approximate stability region was derived for an arbitrary number of users based on
mean-field asymptotics. The sufficient condition for the stability was further derived to be linear in
arrival rates without the requirement on the knowledge of the stationary joint statistics of queue
lengths in [112]. Recently, the stability region of SA with K-exponential backoff was derived in [74]
by modeling the network as inter-related quasi-birth-death processes. We would like to point out
that all the above stability analysis results were derived for the SPR model.
Stability of slotted Aloha with MPR. The first attempt of analyzing stability properties of SA
with MPR was made by Ghez et al. in [81, 82] in an infinite-user single-buffer model. They demon-
strated that the system could be stabilized under the symmetrical MPR model with a non-zero
probability that all packets were transmitted successfully. Later, Sant and Sharma [171] studied
a special case of the symmetrical MPR model for a finite number of users with an infinite buffer. They derived
sufficient conditions on the arrival rate for stability under a stationary ergodic arrival process. Subsequently, the effect of MPR on stability and delay was investigated in [150] and it was shown that
the stability region undergoes a phase transition and then reaches its maximum. More recently,
Jeon and Ephremides [102] characterized the exact stability region of SA with stochastic energy
harvesting and MPR for a pair of bursty users. Most, if not all, of these works focus on the
baseline SA, while our focus is FSA with both SPR and MPR.
Performance analysis of FSA. There exist several studies on the performance of FSA. Wieselthier and Anthony [213] introduced a combinatorial technique to analyse the performance of FSA-MPR
for the case of finite users. Schoute [175] investigated dynamic FSA and obtained the expected
number of slots needed until the backlog becomes zero. Recently, the optimal frame setting for
dynamic FSA was proved mathematically in [159] and [27]. However, these works did not address
the stability of FSA, which is of fundamental importance.
In summary, only very limited work has been done on the stability of FSA despite its fundamental importance for both the theoretical characterisation of FSA performance and its effective
operation in practical systems. In order to bridge this gap, we devote our work to investigating the
stability properties of FSA under both SPR and MPR models.
We consider a system with an infinite number of identical users operating on one frequency channel. In one slot,
a user can complete a packet transmission. We investigate two physical layer models, the SPR and
MPR models.
• Under SPR, a packet suffers a collision if more than one packet is transmitted in the same
slot.
• Under MPR, up to M (M > 1) concurrently transmitted packets can be received successfully
with non-zero probabilities as specified by a stochastic matrix Ξ defined as follows:
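$$\Xi \triangleq \begin{pmatrix}
\hat{\xi}_{10} & \hat{\xi}_{11} & & & & \\
\hat{\xi}_{20} & \hat{\xi}_{21} & \hat{\xi}_{22} & & 0 & \\
\vdots & \vdots & & \ddots & & \\
\hat{\xi}_{x_0 0} & \hat{\xi}_{x_0 1} & \cdots & \hat{\xi}_{x_0 x_0} & & \\
\vdots & \vdots & & & \ddots & \\
\hat{\xi}_{M0} & \hat{\xi}_{M1} & \cdots & \cdots & \cdots & \hat{\xi}_{MM}
\end{pmatrix}. \tag{6.1}$$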
0
··· ··· ···
1 0
0
1 0
Ξ = . . .
.. ..
1 0
The random access process we consider is as follows: FSA organises time slots in frames, each
containing a number of consecutive slots. Each user is allowed to randomly and independently
choose a slot to send his packet at most once per frame. More specifically, denote the length of
frame t by Lt ; at the beginning of frame t, each user generates a random number r and selects the
(r mod Lt )-th slot in frame t to transmit his packet. Note that unsuccessful packets in the current
frame are retransmitted in the next frame with the constant persistence probability p, while newly
generated packets are transmitted in the next frame following their arrivals with probability one (see footnote 1).
Footnote 1: To make our presentation concise, we focus on the case p = 1 in this chapter. The case with general p follows similar
analysis and is detailed in [227].
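To make the random access process concrete, the following is a minimal simulation sketch of one FSA frame under the SPR model with p = 1. The function name `simulate_fsa_frame`, the 32-bit random number and the chosen backlog and frame length are illustrative assumptions, not part of the original protocol description.

```python
import random
from collections import Counter

def simulate_fsa_frame(num_users, frame_length, rng):
    """Return the number of successful packets in one FSA-SPR frame (p = 1)."""
    # Each backlogged user draws a random number r and transmits in slot r mod L_t.
    slots = Counter(rng.getrandbits(32) % frame_length for _ in range(num_users))
    # Under SPR, a slot is successful only if exactly one user selected it.
    return sum(1 for count in slots.values() if count == 1)

rng = random.Random(2024)
backlog, frame_length = 100, 100
successes = [simulate_fsa_frame(backlog, frame_length, rng) for _ in range(1000)]
print(sum(successes) / len(successes))   # empirical mean number of successes per frame
```

With n users and L slots, the expected number of singleton slots is n(1 − 1/L)^(n−1), so the printed average should be of that order for the values used here.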
For notational convenience, we use FSA-SPR and FSA-MPR to denote the FSA system operating
on the SPR and MPR models, respectively.
Let $N_t$ denote the number of packets that arrive during frame $t$ and denote by $A_{tl}$ the number of
packets that arrive in slot $l$ of frame $t$, where $l = 1, 2, \cdots, L_t$. Assume that $\{A_{tl}\}$ are independent and
identically distributed Poisson random variables with probability distribution
$$P\{A_{tl} = u\} = \Lambda_u \quad (u \ge 0). \tag{6.2}$$
Aiming at studying the stability of FSA, we decompose our global objective into the following
three questions:
• Q1: Under what condition(s) is FSA stable?
• Q2: When is the stability region maximized?
• Q3: How does FSA behave in the instability region?
Before answering the questions, we first introduce the formal definition of stability employed
by Ghez et al. in [82]. Denote by the random variable Xt the backlog size in the system at the beginning
of frame t. The discrete-time process (Xt )t≥0 can be seen as a homogeneous Markov chain.
Definition 6.1. An FSA system is stable if (Xt )t≥0 is ergodic and unstable otherwise.
By Definition 6.1, we can transform the study of stability of FSA into investigating the ergodicity
of the backlog Markov chain. The rationale of this transformation is two-fold. The first interpreta-
tion is the property of ergodicity: there exists a unique stationary distribution of a Markov chain if
it is ergodic. The second follows the nature of ergodicity: each state of the Markov chain can recur
in finite time with probability 1 asymptotically. From an engineering perspective, if an FSA system
is stable, then asymptotically the backlog size will not explode.
We next establish the following results characterizing the stability region and demonstrating
the behavior of the Markov chain in non-ergodic regions under both SPR and MPR. The proof and
analysis are detailed in [227], where the core technique used is drift analysis with the drift defined
as $D_i = E[X_{t+1} - X_t \mid X_t = i]$.
Let $\hat{h}$ denote the backlog size at the beginning of frame $t$ and define $\alpha \triangleq \hat{h}/L_t$, where $L_t$ is the
size of frame $t$. We have the following theorem.
Theorem 6.1 (Stability of FSA-SPR). Under FSA-SPR, the following results hold asymptotically.
1. The system is stable if $\Lambda < \alpha e^{-\alpha}$ and $L_t = \Theta(\hat{h})$. In particular, $\alpha = 1$ maximizes the stability
region (see footnote 2) and the stable throughput.
2. The system is unstable under each of the following three conditions: (1) $L_t = o(\hat{h})$; (2) $\hat{h} = o(L_t)$; (3) $L_t = \Theta(\hat{h})$ and $\Lambda > \alpha e^{-\alpha}$.
Remark. Theorem 6.1 answers the first two questions and can be interpreted as follows:
• When Lt = o(ĥ), the number of packets sent during frame t is far larger than the frame length; a
packet experiences collision w.h.p., thus increasing the backlog size and destabilising the system.
• When ĥ = o(Lt ), the number of packets sent is far smaller than the frame length; a packet
is transmitted successfully w.h.p.; however, the expected number of successful packets, which is
o(Lt ), is still order-of-magnitude less than that of new arrivals in the frame, which is O(Lt )
asymptotically; the system is thus unstable.
• When Lt = Θ(ĥ), the number of packets sent is of the same order as the frame length; the system
is stable when the expected arrival rate is less than the effective throughput.
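The interpretation above can be checked numerically with a short sketch. It assumes, as our own reading rather than a formula quoted from [227], that Λ denotes the mean number of arrivals per slot, that p = 1, and that the drift over one frame is the expected number of arrivals minus the expected number of singleton slots; the function names are illustrative.

```python
import math

def spr_throughput(alpha):
    """Asymptotic number of successful packets per slot when backlog / frame length = alpha."""
    return alpha * math.exp(-alpha)

def spr_drift(backlog, frame_length, arrival_rate):
    """Expected backlog change over one frame: arrivals minus singleton slots."""
    expected_arrivals = arrival_rate * frame_length
    expected_successes = backlog * (1.0 - 1.0 / frame_length) ** (backlog - 1)
    return expected_arrivals - expected_successes

# Scan alpha on a grid: the throughput alpha * exp(-alpha) peaks at alpha = 1.
grid = [k / 100 for k in range(1, 501)]
best_alpha = max(grid, key=spr_throughput)
print(best_alpha, spr_throughput(best_alpha))

# With frame length equal to backlog (alpha = 1), the drift changes sign near 1/e.
for rate in (0.30, 0.40):
    print(rate, spr_drift(1000, 1000, rate))
```

A negative drift for an arrival rate below e^{-1} and a positive drift above it matches the condition Λ < αe^{−α} at α = 1 in Theorem 6.1.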
It is well known that an irreducible aperiodic Markov chain falls into one of three mutually
exclusive classes: positive recurrent, null recurrent and transient. So, our next step after deriving
the stability conditions is to show whether the backlog Markov chain in the instability region is
transient or recurrent, which answers the third question.
Theorem 6.2 (Behavior of FSA-SPR in instability region). $(X_t)_{t \ge 0}$ is always transient in the instability region under each of the following three conditions: (1) $L_t = o(\hat{h})$; (2) $L_t = \Theta(\hat{h})$ and $\Lambda > \alpha e^{-\alpha}$;
(3) $\hat{h} = o(L_t)$.
Remark. If a state of a Markov chain is transient, then the probability of returning to itself for the
first time in a finite time is less than 1 asymptotically. Hence, Theorem 6.2 implies that once out of
the stability region, the system is not guaranteed to return to a stable state in finite time, that is, the
backlog size may increase persistently.
Theorem 6.3 (Stability of FSA-MPR). Under FSA-MPR, the following results hold asymptotically.
1. The system is always stable if $L_t = \Theta(\hat{h})$ and $\Lambda < \sum_{x_0=1}^{M} e^{-\alpha} \frac{\alpha^{x_0}}{x_0!} \sum_{k_0=1}^{x_0} k_0 \hat{\xi}_{x_0 k_0}$. In particular,
let $\alpha^*$ denote the value of $\alpha$ that maximizes the upper bound of the stability region; it holds that
$\alpha^* = \Theta(M)$.
2. The system is unstable under each of the following conditions: (1) $L_t = o(\hat{h}^{1-\epsilon_1})$ for any
$0 < \epsilon_1 \le 1$; (2) $\hat{h} = o(L_t)$; (3) $\Lambda > \alpha$ and $L_t = \Theta(\hat{h})$.
Comparing Theorem 6.3 to Theorem 6.1, we can quantify the performance gap between FSA-SPR and FSA-MPR in terms of stability. For example, when $\alpha = 1$, the stability region is maximized in FSA-SPR with $\Lambda < e^{-1}$, while the upper bound of the stability region in FSA-MPR is
$e^{-1} \sum_{x_0=1}^{M} \frac{1}{(x_0-1)!}$. Note that for $M > 2$, it holds that
$$1 + 1 + \frac{1}{2} \le \sum_{x_0=1}^{M} \frac{1}{(x_0-1)!} < 1 + 1 + \sum_{x_0=1}^{M} \frac{1}{x_0(x_0+1)} = 2 + \sum_{x_0=1}^{M} \left(\frac{1}{x_0} - \frac{1}{x_0+1}\right) = 3 - \frac{1}{M+1}.$$
The upper bound of the stability region of FSA-MPR when $\alpha = 1$ is thus between 2.5 and 3 times
the maximum stability region of FSA-SPR. Hence the maximum upper bound of the stability region
of FSA-MPR, achieved when $\alpha^* = \Theta(M)$, is far larger than that of FSA-SPR.
Footnote 2: The ergodicity region of a Markov chain is referred to as the stability region.
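The comparison above is easy to reproduce numerically. The sketch below evaluates the bound of Theorem 6.3 at α = 1 and divides it by e^{-1}. The function name `mpr_stability_bound` and the `expected_received` argument, which stands for the inner sum over k0 of k0 times the reception probability, are illustrative. The default corresponds to the ideal case in which every transmitted packet is decoded, which yields the upper bound discussed in the text.

```python
import math

def mpr_stability_bound(alpha, M, expected_received=lambda x0: x0):
    """Evaluate sum_{x0=1}^{M} e^{-alpha} alpha^{x0} / x0! * E[packets received | x0 sent]."""
    return sum(
        math.exp(-alpha) * alpha ** x0 / math.factorial(x0) * expected_received(x0)
        for x0 in range(1, M + 1)
    )

spr_max = math.exp(-1)                      # maximum stability region of FSA-SPR
for M in (3, 5, 10):
    # Ratio of the FSA-MPR upper bound at alpha = 1 to the FSA-SPR maximum.
    print(M, mpr_stability_bound(1.0, M) / spr_max)
```

For every M > 2 the printed ratio should fall between 2.5 and 3, in line with the inequality chain above.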
Theorem 6.4 (Behavior of FSA-MPR in instability region). With the same notations as in Theorem 6.3, $(X_t)_{t \ge 0}$ is transient under each of the following three conditions: (1) $L_t = o(\hat{h}^{1-\epsilon_1})$; (2)
$\hat{h} = o(L_t)$; (3) $\Lambda > \alpha$ and $L_t = \Theta(\hat{h})$.
Theorem 6.4 demonstrates that despite the gain in the stability region size of FSA-MPR over
FSA-SPR, their behavior in the unstable region is essentially identical.
Due to its fundamental importance, tag population estimation has received significant research
attention, which we briefly review here.
Most existing works focus on the static scenario where the tag population is constant
during the estimation process. The central question there is to design efficient algorithms that quickly
and accurately estimate the static tag population. Kodialam et al. designed an estimator called
PZE which uses the probabilistic properties of empty and collision slots to estimate the tag pop-
ulation size [109]. The authors then enhanced PZE by taking the average of the probability of
idle slots in multiple frames as an estimator in order to eliminate the constant additive bias [110].
Han et al. exploited the average number of idle slots before the first non-empty slots to estimate
the tag population size [88]. Later, Qian et al. developed the Lottery-Frame scheme by employing
a geometrically distributed hash function such that the $j$th slot is chosen with probability $1/2^{j+1}$ [160].
As a result, the index of the first idle slot is around the logarithm of the tag population and the frame
size can be reduced to the logarithm of the tag population, thus reducing the estimation time.
Subsequently, a new estimation scheme called ART was proposed in [177] based on the average
length of consecutive non-empty slots. The design rationale of ART is that the average length of
consecutive non-empty slots is correlated to the tag population. ART admits smaller variance than
prior schemes. More recently, Zheng et al. proposed another estimation algorithm, ZOE, where
each frame has just a single slot and the random variable indicating whether a slot is idle follows
a Bernoulli distribution [239]. The average of multiple individual observations is used to estimate
the tag population.
We note that the above research works do not consider the estimation problem for dynamic
RFID systems and thus may fail to monitor the system dynamics in real time. Specifically, in typical
static tag population estimation schemes, the final estimation result is the average of the outputs of
multi-round executions. When applied to dynamic tag population estimation, additional estimation
error occurs due to the variation of the tag population size during the estimation process.
Only a few proposals have tackled the dynamic scenario. The works in [172] and [219]
considered specific tag mobility patterns with tags moving along a conveyor at constant speed.
Xiao et al. developed an estimation algorithm, ZDE, in dynamic RFID systems to estimate the
number of arriving and removed tags [216]. More recently, they further generalized ZDE by taking
into account the snapshots of variable frame sizes [217]. Though the algorithms in [216] and [217]
can monitor the dynamic RFID systems, they may fail to estimate the tag population size accurately,
because they use the same hash seed in the whole monitoring process. Using the same seed is
required in tracing tag departure and arrival. However, it may significantly limit the estimation
accuracy, even in the static case.
Besides the limitations above, prior works do not provide formal analysis on the stability and
the convergence rate. Motivated by the above argument, we develop a generic framework for
tag population estimation in dynamic RFID systems. By generic, we mean that our framework
can estimate the number of tags accurately without any prior knowledge on the tag arrival and
departure patterns. Moreover, the efficiency and stability of our framework are mathematically
established.
We briefly introduce the extended Kalman filter and some fundamental concepts and results in
stochastic process which are useful in the subsequent analysis. The main notations used are listed
in Table 6.1.
The extended Kalman filter is a powerful tool to estimate system state in nonlinear discrete-time
systems. Formally, a nonlinear discrete-time system can be described as follows:
where $z_{k+1} \in \mathbb{R}^n$ denotes the state of the system, $x_k \in \mathbb{R}^d$ is the controlled input and $y_k \in \mathbb{R}^m$
stands for the measurement observed from the system. The uncorrelated stochastic variables $w_k^* \in \mathbb{R}^n$ and $u_k^* \in \mathbb{R}^m$ denote the process noise and the measurement noise, respectively. The functions
$f$ and $h$ are assumed to be continuously differentiable.
For the above system, we introduce an EKF-based state estimator in Definition 6.2.
Definition 6.2 (Extended Kalman filter [180]). A two-step discrete-time extended Kalman filter
consists of state prediction and measurement update, defined as follows:
where
Remark. In the above definition of extended Kalman filter, the parameters can be interpreted in our
context as follows:
• $\hat{z}_{k+1|k}$ is the prediction of $z_{k+1}$ at the beginning of frame $k+1$ given by the previous state
estimate, while $\hat{z}_{k+1|k+1}$ is the estimate of $z_{k+1}$ after the adjustment based on the measurement at the
end of frame $k+1$.
• $v_{k+1}$, called the innovation, is the measurement residual in frame $k+1$. It represents the estimated
error of the measurement.
• $K_{k+1}$ is the Kalman gain. In (6.7), it weighs the innovation $v_{k+1}$ w.r.t. $f(\hat{z}_{k+1|k}, x_k)$.
• $P_{k+1|k}$ and $P_{k+1|k+1}$, in contrast to the linear case, are not equal to the covariance of the estimation
error of the system state. Here, we call them pseudo-covariances.
• $Q_k$ and $R_k$ are two tunable parameters which play the same role as the covariances of the
process and measurement noises in linear stochastic systems to achieve optimal filtering in the
maximum likelihood sense. They also play an important role in improving the stability and
convergence of our EKF-based estimators.
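As an illustration of the two-step structure (state prediction followed by measurement update), here is a minimal scalar sketch in the usual textbook form of the extended Kalman filter. It is not a transcription of the equations of Definition 6.2. The measurement function is an assumption chosen for the example: with z tags each picking one of L slots uniformly, a slot stays idle with probability (1 − 1/L)^z, so the expected number of idle slots is L(1 − 1/L)^z. The measurements are fed noise-free to keep the run repeatable, and the values of q, r, the frame size and the initial guess are arbitrary.

```python
import math

def ekf_step(z_est, p, y, f, f_prime, h, h_prime, q, r):
    """One predict/update cycle of a scalar extended Kalman filter."""
    # State prediction and propagation of the pseudo-covariance.
    z_pred = f(z_est)
    p_pred = f_prime(z_est) ** 2 * p + q
    # Measurement update.
    c = h_prime(z_pred)                        # slope of the linearised measurement
    innovation = y - h(z_pred)                 # measurement residual
    gain = p_pred * c / (c * c * p_pred + r)   # Kalman gain
    z_new = z_pred + gain * innovation
    p_new = (1.0 - gain * c) * p_pred
    return z_new, p_new

L = 1000                                        # frame size (assumed)
h = lambda z: L * (1.0 - 1.0 / L) ** z          # expected idle slots for z tags
h_prime = lambda z: h(z) * math.log(1.0 - 1.0 / L)
f = lambda z: z                                 # static population: z_{k+1} = z_k
f_prime = lambda z: 1.0

z_true, z_est, p = 1000.0, 500.0, 1e4
for _ in range(40):
    y = h(z_true)                               # noise-free measurement (assumption)
    z_est, p = ekf_step(z_est, p, y, f, f_prime, h, h_prime, q=1.0, r=1.0)
print(round(z_est, 3))
```

In this toy run the estimate approaches the true population from below, since each update is a damped Newton step on a convex, decreasing measurement function. With noisy idle-slot counts, the choice of q and r would govern the trade-off between tracking speed and smoothing.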
In order to analyse the stability of an estimation algorithm, we need to check the boundedness
of the estimation error defined as follows:
We further introduce the following two mathematical definitions [147] [188] on the bounded-
ness of stochastic process.
Definition 6.3 (Boundedness of Random Variable). The stochastic process of the estimation error
$e_{k|k-1}$ is said to be bounded with probability one (w.p.o.) if there exists $X > 0$ such that
Definition 6.4 (Boundedness in Mean Square). The stochastic process $e_{k|k-1}$ is said to be exponentially bounded in the mean square with exponent $\zeta$ if there exist real numbers $\psi_1, \psi_2 > 0$ and
$0 < \zeta < 1$ such that
$$E[e_{k|k-1}^2] \le \psi_1 e_{1|0}^2 \zeta^{k-1} + \psi_2. \tag{6.14}$$
To investigate the boundedness defined in Definitions 6.3 and 6.4, we present the following
lemma [165].
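Lemma 6.1. Given a stochastic process $V_k(e_{k|k-1})$ and real numbers $\underline{\beta}, \overline{\beta}, \tau > 0$ and $0 < \alpha \le 1$ with the
following properties:
$$\underline{\beta} e_{k|k-1}^2 \le V_k(e_{k|k-1}) \le \overline{\beta} e_{k|k-1}^2, \tag{6.15}$$
$$E[V_{k+1}(e_{k+1|k}) \mid e_{k|k-1}] - V_k(e_{k|k-1}) \le -\alpha V_k(e_{k|k-1}) + \tau, \tag{6.16}$$
then for any $k \ge 1$ it holds that
• the stochastic process $e_{k|k-1}$ is exponentially bounded in the mean square, i.e.,
$$E[e_{k|k-1}^2] \le \frac{\overline{\beta}}{\underline{\beta}} E[e_{1|0}^2](1-\alpha)^{k-1} + \frac{\tau}{\underline{\beta}} \sum_{j=1}^{k-2} (1-\alpha)^j \le \frac{\overline{\beta}}{\underline{\beta}} E[e_{1|0}^2](1-\alpha)^{k-1} + \frac{\tau}{\underline{\beta}\alpha}, \tag{6.17}$$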
From (6.15) and (6.16), if we can construct a function $V_k(e_{k|k-1})$ such that both its drift and
$V_k(e_{k|k-1})/e_{k|k-1}^2$ are bounded, then $e_{k|k-1}$ is also bounded. Note, however, that Lemma 6.1 can
only be implemented offline. To address this limitation, we adapt Lemma 6.1 to an online version with
time-varying parameters, which can be proven by the same method as in [166, 188].
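Lemma 6.2. If there exist a stochastic process $V_k(e_{k|k-1})$ and parameters $\beta^*, \beta_k, \tau_k > 0$ and $0 < \alpha_k^* \le 1$
with the following properties:
$$V_1(e_{1|0}) \le \beta^* e_{1|0}^2, \tag{6.18}$$
$$E[e_{k|k-1}^2] \le \frac{\beta^*}{\beta_k} E[e_{1|0}^2] \prod_{i=1}^{k-1} (1 - \alpha_i^*) + \frac{1}{\beta_k} \sum_{i=1}^{k-2} \tau_{k-i-1} \prod_{j=1}^{i} (1 - \alpha_{k-j}^*), \tag{6.21}$$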
Remark. The conditions in Lemma 6.2 can be interpreted as follows: to establish the boundedness of
$e_{k|k-1}$, it suffices to construct a function $V_k(e_{k|k-1})$ such that both its drift, i.e., (6.20), and $V_k(e_{k|k-1})/e_{k|k-1}^2$,
i.e., (6.18) and (6.19), are bounded.
Consider an RFID system consisting of a reader and a mass of tags operating on one frequency
channel. The number of tags is unknown a priori and can be constant or dynamic (time-varying),
which we refer to as static and dynamic systems, respectively. The MAC protocol for the RFID system
is the standard FSA protocol analyzed in Section 6.2, where the standard Listen-before-Talk mechanism is
employed by the tags to respond to the reader’s interrogation [76].
Specifically, the reader initiates a series of frames indexed by k ∈ Z+ . Each frame, referred to
as a round, consists of a number of slots. The reader starts frame k by broadcasting a begin-round
command with frame size Lk , persistence probability rk and a random seed Rsk . When a tag
receives a begin-round command, it uses a hash function h(·), Lk , Rsk , and its ID to generate a
uniformly distributed random number i ∈ [0, Lk − 1] and reply in slot i of frame k with probability
rk .
Since every tag picks its own response slot independently, there may be zero, one, or more
than one tag transmitting in a slot; such slots are referred to as idle, singleton, and collision slots,
respectively. The reader is not assumed to be able to distinguish between a singleton and a collision
slot, but it can detect an idle slot. We term both singleton and collision slots occupied slots.
By collecting all responses in a frame, the reader can generate a binary sequence Bk , where ‘0’
indicates an idle slot, and ‘1’ stands for an occupied one.
The reader then terminates the current frame by sending an end-round command. Based on
the number of idle slots, i.e., the number of ‘0’s in Bk , the reader runs the estimation algorithm,
detailed in the following analysis, to estimate and trace the tag population.
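The frame structure described above can be mimicked with a small deterministic sketch. The use of SHA-256 as the hash function h(·), the way the persistence decision is derived from the same digest, and the names `tag_reply_slot` and `run_frame` are assumptions made for illustration only; real tags would rely on a much lighter hash.

```python
import hashlib

def tag_reply_slot(tag_id, frame_size, persistence, seed):
    """Return the slot in which a tag replies, or None if it stays silent in this frame."""
    digest = hashlib.sha256(f"{seed}:{tag_id}".encode()).digest()
    slot = int.from_bytes(digest[:8], "big") % frame_size        # slot index in [0, L_k - 1]
    coin = int.from_bytes(digest[8:16], "big") / 2 ** 64         # pseudo-random value in [0, 1)
    return slot if coin < persistence else None

def run_frame(tag_ids, frame_size, persistence, seed):
    """Build the reader's bit string B_k: 0 marks an idle slot, 1 an occupied slot."""
    bits = [0] * frame_size
    for tag_id in tag_ids:
        slot = tag_reply_slot(tag_id, frame_size, persistence, seed)
        if slot is not None:
            bits[slot] = 1       # singleton and collision slots are indistinguishable
    return bits

tags = range(500)
B_k = run_frame(tags, frame_size=256, persistence=0.5, seed=7)
idle = B_k.count(0)
print(idle, len(B_k) - idle)     # number of idle and occupied slots
```

Only the count of idle slots is passed on to the estimator, which is why the sketch does not record how many tags share an occupied slot.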
Our objective is to design a stable and accurate tag population estimation algorithm for both
static and dynamic systems. By stable and accurate we mean that
• the estimation error of our algorithm is bounded in mean square in the sense of Definitions 6.3
and 6.4 and the relative estimation error tends to zero;
• the estimated population size converges exponentially to the real value.
Mathematically, we consider a large-scale RFID system of a reader and a set of tags of
unknown size $z_k$ in frame $k$, which can be static or dynamic. Denote by $\hat{z}_{k|k-1}$ the prior estimate
of $z_k$ at the beginning of frame $k$. At the end of frame $k$, the reader updates the estimate $\hat{z}_{k|k-1}$ to
$\hat{z}_{k|k}$ by running the estimation algorithm. Our estimation scheme needs to guarantee the
following properties:
• $\lim_{z_k \to \infty} \dfrac{\hat{z}_{k|k-1} - z_k}{z_k} = 0$;
• the convergence rate is exponential.
We begin with the baseline scenario of static systems where the tag population is constant
during the estimation process. We first establish the discrete-time model for the system dynamics
and the measurement model using the bit string Bk observed during frame k. We then present our
EKF-based estimation algorithm.
In static RFID systems, where the tag population stays constant, the system state
evolves as
\[ z_{k+1} = z_k, \tag{6.22} \]
meaning that the number of tags $z_{k+1}$ in the system in frame $k+1$ equals that in frame $k$.
In order to estimate zk , we leverage the measurement on the number of idle slots during a
frame. To start, we study the stochastic characteristics of the number of idle slots.
Assume that the initial tag population $z_0$ falls in the interval $[\underline{z}_0, \overline{z}_0]$, yet the exact value of
$z_0$ is unknown and should be estimated. The range $[\underline{z}_0, \overline{z}_0]$ can be a very coarse estimate
obtained by any existing population estimation method. Recall from the system model that in frame
$k$, the reader probes the tags with frame size $L_k$. Denote by $N_k$ the number of idle
slots in frame $k$, that is, the number of ‘0’s in $B_k$. We have the following results on $N_k$ according
to [109, 111].
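Lemma 6.3. If each tag replies in a random slot among the $L_k$ slots with probability $r_k$, then it holds
that $N_k \sim \mathcal{N}[\mu, \sigma^2]$ for large $L_k$ and $z_k$, where
\[ \mu = L_k\Big(1 - \frac{r_k}{L_k}\Big)^{z_k}, \qquad \sigma^2 = L_k(L_k - 1)\Big(1 - \frac{2r_k}{L_k}\Big)^{z_k} + L_k\Big(1 - \frac{r_k}{L_k}\Big)^{z_k} - L_k^2\Big(1 - \frac{r_k}{L_k}\Big)^{2z_k}. \]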
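Lemma 6.4. For any $\epsilon^* > 0$, there exists some $M > 0$, such that if $z_k \ge M$ or $L_k = \hat{z}_{k|k-1} \ge M$,
then it holds that
\[ \big|\mu - L_k e^{-r_k\rho}\big| \le \epsilon^*, \tag{6.23} \]
\[ \big|\sigma^2 - L_k\big(e^{-r_k\rho} - (1 + r_k^2\rho)e^{-2r_k\rho}\big)\big| \le \epsilon^*, \tag{6.24} \]
where $\rho = \frac{z_k}{L_k}$ is referred to as the reader load factor.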
Lemmas 6.3 and 6.4 imply that in large-scale RFID systems, we can use $L_k e^{-r_k\rho}$ and
$L_k\big(e^{-r_k\rho} - (1 + r_k^2\rho)e^{-2r_k\rho}\big)$ to approximate $\mu$ and $\sigma^2$.
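As a numerical illustration of Lemmas 6.3 and 6.4, the following sketch compares the exact moments of the idle-slot count with their exponential approximations. The function names and the parameter values (frame size, tag count, persistence probability) are hypothetical and chosen only to represent a large system.

```python
import math


def idle_slot_moments(L, z, r):
    """Exact mean and variance of the number of idle slots (Lemma 6.3)."""
    a = (1 - r / L) ** z        # probability that a given slot is idle
    b = (1 - 2 * r / L) ** z    # probability that two given slots are both idle
    mu = L * a
    var = L * (L - 1) * b + L * a - (L * a) ** 2
    return mu, var


def idle_slot_moments_approx(L, z, r):
    """Large-system approximations of the mean and variance (Lemma 6.4)."""
    rho = z / L                 # reader load factor
    mu = L * math.exp(-r * rho)
    var = L * (math.exp(-r * rho) - (1 + r * r * rho) * math.exp(-2 * r * rho))
    return mu, var


if __name__ == "__main__":
    print(idle_slot_moments(4096, 8192, 0.5))
    print(idle_slot_moments_approx(4096, 8192, 0.5))
```

For these illustrative values the two pairs agree closely, which is the practical content of the approximation used in the rest of the section.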
At the end of each frame $k$, the reader obtains a measurement $y_k$ of the idle slot frequency, defined as
\[ y_k = \frac{N_k}{L_k}. \tag{6.25} \]
Recalling Lemma 6.3, $y_k$ is a normally distributed random variable with
$E[y_k] = e^{-r_k\rho}$ and $Var[y_k] = \frac{1}{L_k}\big(e^{-r_k\rho} - (1 + r_k^2\rho)e^{-2r_k\rho}\big)$. Since each of the $z_k$ tags replies in frame $k$
with probability $r_k$, the probability that a slot is idle, denoted by $p(z_k)$, can be calculated as
\[ p(z_k) = \Big(1 - \frac{r_k}{L_k}\Big)^{z_k} \approx e^{-\frac{r_k z_k}{L_k}}. \tag{6.26} \]
Notice that for large $z_k$, $p(z_k)$ can be regarded as a continuously differentiable function of $z_k$.
Using the notation of the Kalman filter, we can write $y_k$ as follows:
\[ y_k = p(z_k) + u_k, \tag{6.27} \]
where, based on the statistical characteristics of $y_k$, $u_k$ is a Gaussian random variable with zero mean
and variance
\[ Var[u_k] = \frac{1}{L_k}\big(e^{-r_k\rho} - (1 + r_k^2\rho)e^{-2r_k\rho}\big). \tag{6.28} \]
We note that $u_k$ measures the uncertainty of $y_k$.
To summarize, the discrete-time model for static RFID systems is characterized by (6.22)
and (6.27). We conclude this subsection by stating the following auxiliary lemma which is useful
in our later analysis.
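Lemma 6.5. Denote the function
\[ \Lambda(r_k) \triangleq Var[u_k] = \frac{1}{L_k}\big(e^{-r_k\rho} - (1 + r_k^2\rho)e^{-2r_k\rho}\big), \quad \rho > 0. \]
It holds that $\Lambda(r_k)$ has a unique minimizer $r_k^* = 1.59/\rho$ where $\rho \ge 1.59$, and
$\Lambda(r_k^*) \le \frac{e^{1.59} - 1}{e^{3.18} L_k}$.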
Since an estimation algorithm with a smaller variance $Var[u_k]$ is more accurate, we
set the persistence probability $r_k$ following Lemma 6.5.
Since the system characterized by (6.22) and (6.27) is a discrete-time nonlinear
system, we leverage the two-step EKF described in Definition 6.2 to estimate the system state.
In (6.9), the Kalman gain $K_k$ increases with $Q_k$ and decreases with $R_k$. As a result, $Q_k$ and
$R_k$ can be used to tune the EKF: increasing $Q_k$ and/or decreasing $R_k$ accelerates
convergence but leads to a larger estimation error. In our design, we set $Q_k$ to a constant $q > 0$
and introduce a parameter $\phi_k$ to replace $R_k$ as follows:
\[ R_k = \phi_k P_{k|k-1} C_k^2. \tag{6.29} \]
It can be noted from (6.9) and (6.29) that $K_k$ is monotonically decreasing in $\phi_k$, i.e., a small
$\phi_k$ leads to quick convergence at the price of a relatively high estimation error. Hence, choosing
the appropriate value of $\phi_k$ consists of striking a balance between the convergence rate and the
estimation error. In our work, we take a dynamic approach: we set $\phi_k$ to a small value $\underline{\phi}$ in the
first few rounds ($J$ rounds) of estimation to allow the system to react quickly, since the estimate in
the beginning phase can be very coarse. After that, we set $\phi_k$ to a relatively high value $\overline{\phi}$ to achieve
high estimation accuracy.
Now, we are ready to present our tag population estimation algorithm as illustrated in Algo-
rithm 13. The major procedures of our estimation algorithm can be summarized as:
1. In the beginning of frame k: prediction (line 3). The reader predicts the number of tags based
on the estimation at the end of frame k − 1. The predicted value is defined as ẑk|k−1 . Then
the reader sets the persistence probability rk following Lemma 6.5 where zk is set to ẑk|k−1 .
2. Lines 4-5. The reader launches the Listen-before-Talk protocol introduced in Section 6.3.3.1 in order
to receive feedback from the tags.
3. At the end of frame k: correction (lines 6-14). The reader computes $N_k$ based on $B_k$ and further
calculates $y_k$ and $v_k$ from $N_k$. It then updates the prediction with the corrected estimate $\hat{z}_{k|k}$
following (6.7).
We will theoretically establish the stability and accuracy of the estimation algorithm in Sec. 6.3.6.
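The three steps above can be sketched as a small simulation. This is a hedged illustration, not a reproduction of Algorithm 13: Definition 6.2 and the algorithm listing are not shown in this section, so the gain and covariance updates below use the textbook scalar EKF form, which is an assumption. A fixed frame size `L`, the clipping of $r_k$ to 1, the floor on the estimate, and all parameter values (`J`, `phi_lo`, `phi_hi`, `q`, `p0`) are likewise illustrative choices.

```python
import math
import random

LOAD = 1.59  # target value of r_k * z / L_k suggested by Lemma 6.5


def simulate_idle_fraction(z, L, r, rng):
    """One frame: each of z tags replies w.p. r in a uniform slot; return y_k = N_k / L."""
    occupied = [False] * L
    for _ in range(z):
        if rng.random() < r:
            occupied[rng.randrange(L)] = True
    return occupied.count(False) / L


def ekf_static(z_true, z0, L=256, frames=40, J=5, phi_lo=0.25, phi_hi=9.0,
               q=1.0, p0=1.0, seed=0):
    """Return the sequence of corrected estimates for a static tag population."""
    rng = random.Random(seed)
    z_hat, P = float(z0), p0            # prior estimate and pseudo-covariance
    history = []
    for k in range(frames):
        # Prediction: the static model z_{k+1} = z_k keeps the prior estimate.
        r = min(1.0, LOAD * L / z_hat)  # persistence probability (clipped, assumption)
        y = simulate_idle_fraction(z_true, L, r, rng)
        p_hat = math.exp(-r * z_hat / L)     # p(z_hat) from (6.26)
        C = -(r / L) * p_hat                 # derivative of p at z_hat
        phi = phi_lo if k < J else phi_hi    # small phi first, larger phi afterwards
        R = phi * P * C * C                  # (6.29)
        K = P * C / (P * C * C + R)          # textbook scalar gain (assumed form)
        v = y - p_hat                        # innovation
        z_hat = max(1.0, z_hat + K * v)      # correction, floored as a safeguard
        P = (1.0 - K * C) * P + q            # covariance update plus process noise q
        history.append(z_hat)
    return history


if __name__ == "__main__":
    estimates = ekf_static(z_true=5000, z0=2000.0, seed=1)
    print("final estimate:", round(estimates[-1]))
```

With $R_k$ chosen as in (6.29), the gain in this sketch reduces to $1/(C_k(1+\phi_k))$, which makes the role of $\phi_k$ as a step-size control explicit.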
We now tackle the dynamic case where the tag population may vary during the estimation
process. The objective for dynamic systems is to promptly detect a global tag population
change and accurately estimate the magnitude of this change. To that end, we first establish the
system model and then present our estimation algorithm.
In the dynamic case, the system state evolves as
\[ z_{k+1} = z_k + w_k, \tag{6.30} \]
where the tag population zk+1 in frame k+1 consists of two parts: i) the tag population in frame k
and ii) a random variable wk which accounts for the stochastic variation of tag population resulting
from the tag arrival/departure during frame k. Notice that wk is referred to as process noise in
Kalman filters and the appropriate characterisation of wk is crucial in the design of stable Kalman
filters, which will be investigated in detail later. Besides, the measurement model is the same as
the static case. Hence, the discrete-time model for dynamic RFID systems can be characterized
by (6.30) and (6.27).
In the dynamic case, we leverage the two-step EKF to estimate the system state combined with
the CUSUM test to further trace the tag population fluctuation.
Our main estimation algorithm is illustrated in Algorithm 14. The difference from the
static scenario is that tag population variation needs to be detected by the CUSUM test, presented
in Algorithm 15 in the next subsection. The output of Algorithm 15 acts as feedback to $\phi_k$:
because of the tag population variation, $\phi_k$ is no longer constant after the $J$th round as in the
static case. The overall structure of the estimation algorithm is illustrated in Fig. 6.1. We note that
in the case where $z_k$ is constant, Algorithm 14 degenerates to Algorithm 13.
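[Figure 6.1: block diagram with the blocks Prediction, $K_k$, $\phi_k$ and CUSUM, and the signals $y_k$, $-p(\hat{z}_{k|k-1})$, $v_k$, $\hat{z}_{k|k-1}$, $\hat{z}_{k|k}$ and Alarm.]
Figure 6.1: Estimation algorithm diagram: Dashed box indicates the EKF.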
The CUSUM Detection Framework. We leverage the CUSUM test to detect changes in the tag
population and to further adjust $\phi_k$. The CUSUM test is a sequential analysis technique typically used
for change detection [86]. It is shown to be asymptotically optimal in the sense of minimum
detection time subject to a fixed worst-case expected false alarm rate [35].
In the context of dynamic tag population detection, the reader monitors the innovation process
$v_k = y_k - p(\hat{z}_{k|k-1})$. If the tag population is constant, $v_k$ equals $u_k$, which is a
Gaussian process with zero mean. In contrast, when the system state changes, i.e., the tag population
changes, $v_k$ drifts away from zero mean. In our design, we use $\Phi_k$ as a normalized input to the
CUSUM test, obtained by normalising $v_k$ with its estimated standard deviation:
\[ \Phi_k = \frac{v_k}{\sqrt{(P_{k|k-1} + Q_{k-1})C_k^2 + Var[u_k]\big|_{z_k = \hat{z}_{k|k-1}}}}. \tag{6.31} \]
The reader further updates the CUSUM statistics $g_k^+$ and $g_k^-$ as follows:
\[ g_k^+ = \max\{0,\ g_{k-1}^+ + \Phi_k - \Upsilon\}, \tag{6.32} \]
\[ g_k^- = \min\{0,\ g_{k-1}^- + \Phi_k + \Upsilon\}, \tag{6.33} \]
\[ g_k^+ = g_k^- = 0, \quad \text{if } \delta = 1, \tag{6.34} \]
where $g_0^+ = 0$ and $g_0^- = 0$, and $\Upsilon \ge 0$, referred to as the reference value, is a filter design parameter
indicating the sensitivity of the CUSUM test to the fluctuation of $\Phi_k$. Moreover, $\delta$ is an
indicator flag signalling a tag population change:
\[ \delta = \begin{cases} 1 & \text{if } g_k^+ > \theta \text{ or } g_k^- < -\theta, \\ 0 & \text{otherwise}. \end{cases} \tag{6.35} \]
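The update rules (6.32) to (6.35) translate almost line by line into code. The sketch below is a minimal illustration; the function names `cusum_step` and `detect_changes` and the example values of the reference value and the threshold are hypothetical, and the normalized input $\Phi_k$ is assumed to be computed elsewhere.

```python
def cusum_step(g_pos, g_neg, phi_in, upsilon, theta):
    """One two-sided CUSUM update; returns the new statistics and the alarm flag."""
    g_pos = max(0.0, g_pos + phi_in - upsilon)   # (6.32): accumulates upward drift
    g_neg = min(0.0, g_neg + phi_in + upsilon)   # (6.33): accumulates downward drift
    alarm = g_pos > theta or g_neg < -theta      # (6.35): indicator flag delta
    if alarm:
        g_pos = g_neg = 0.0                      # (6.34): restart after an alarm
    return g_pos, g_neg, alarm


def detect_changes(inputs, upsilon, theta):
    """Return the frame indices at which the test raises an alarm."""
    g_pos = g_neg = 0.0
    alarms = []
    for k, phi_in in enumerate(inputs):
        g_pos, g_neg, alarm = cusum_step(g_pos, g_neg, phi_in, upsilon, theta)
        if alarm:
            alarms.append(k)
    return alarms


if __name__ == "__main__":
    # Ten frames without change, then a sustained positive shift of the input.
    print(detect_changes([0.0] * 10 + [2.0] * 3, upsilon=0.5, theta=4.0))
```

Inputs smaller than the reference value in magnitude never accumulate, which is how the reference value controls the sensitivity of the test.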
Parameter tuning in CUSUM test. The choice of the threshold $\theta$ and the drift parameter $\Upsilon$ has
a direct impact on the performance of the CUSUM test in terms of detection delay and false alarm
rate. Formally, the average run length (ARL) $L(\mu^*)$ is used to denote the duration between
two actions [28]. For a large $\theta$, $L(\mu^*)$ can be approximated as (see footnote 3)
\[ L(\mu^*) = \begin{cases} \Theta(\theta), & \text{if } \mu^* \neq 0, \\ \Theta(\theta^2), & \text{if } \mu^* = 0, \end{cases} \tag{6.36} \]
\[ \theta = 4\sigma^*, \tag{6.37} \]
\[ \Upsilon = \mu^* + 0.5\sigma^*. \tag{6.38} \]
The rationale is that once a change in the tag population is detected in frame $k$, $\phi_k$ is set to $\underline{\phi}$ to
quickly react to the change, while $\phi_k$ stays at $\overline{\phi}$ when no system change is detected.
Footnote 3: For two variables $X$, $Y$, the asymptotic notation $X = \Theta(Y)$ implies that there exist positive constants $c_1$, $c_2$ and $x_0$ such that
for all $X > x_0$, it follows that $c_1 X \le Y \le c_2 X$.
We now establish the stability and the accuracy of our estimation algorithms for both static and
dynamic cases.
Our analysis is composed of two steps. We first derive the estimation error and then establish
the stability and the accuracy of Algorithm 13 in terms of the boundedness of estimation error.
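Computing Estimation Error. We first approximate the non-linear discrete system by a linear
one. To that end, as the function $p(z_k)$ is continuously differentiable at $z_k = \hat{z}_{k|k-1}$, using the Taylor
expansion and the fact that $r_k = 1.59L_k/\hat{z}_{k|k-1}$, we have
\[ p(z_k) = p(\hat{z}_{k|k-1}) + C_k(z_k - \hat{z}_{k|k-1}) + \chi(z_k, \hat{z}_{k|k-1}), \]
where
\[ C_k = -\frac{1.59}{e^{1.59}\,\hat{z}_{k|k-1}}, \tag{6.41} \]
\[ \chi(z_k, \hat{z}_{k|k-1}) = \frac{1}{e^{1.59}}\sum_{j=2}^{\infty}\frac{1}{j!}\Big(1.59 - \frac{1.59 z_k}{\hat{z}_{k|k-1}}\Big)^{j}. \tag{6.42} \]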
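We can obtain the following bound on the residual for the case $0.61z_k < \hat{z}_{k|k-1} < 2.7z_k$:
\[ \begin{aligned}
|\chi(z_k, \hat{z}_{k|k-1})| &= \frac{1.59^2(\hat{z}_{k|k-1} - z_k)^2}{e^{1.59}\,\hat{z}_{k|k-1}^2}\,\Big|\sum_{j=0}^{\infty}\frac{1.59^j}{(j+2)!}\Big(1 - \frac{z_k}{\hat{z}_{k|k-1}}\Big)^{j}\Big| \\
&\le \frac{1.59^2(\hat{z}_{k|k-1} - z_k)^2}{2e^{1.59}\,\hat{z}_{k|k-1}^2}\sum_{j=0}^{\infty}1.59^j\Big|1 - \frac{z_k}{\hat{z}_{k|k-1}}\Big|^{j} \\
&\le \frac{1.59^2(\hat{z}_{k|k-1} - z_k)^2}{2e^{1.59}\,\hat{z}_{k|k-1}^2\big[1 - 1.59\big|1 - \frac{z_k}{\hat{z}_{k|k-1}}\big|\big]} \le \frac{1.59^2(\hat{z}_{k|k-1} - z_k)^2}{2e^{1.59}\,a_k\,\hat{z}_{k|k-1}^2},
\end{aligned} \tag{6.44} \]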
where
Recalling the definition of the estimation error in (6.12) and using (6.22), (6.5) and (6.7), we
can derive the estimation error $e_{k+1|k}$ as follows:
\[ \begin{aligned}
e_{k+1|k} &= z_{k+1} - \hat{z}_{k+1|k} = z_k - \hat{z}_{k|k} \\
&= z_k - \hat{z}_{k|k-1} - K_k\big[C_k(z_k - \hat{z}_{k|k-1}) + \chi(z_k, \hat{z}_{k|k-1}) + u_k\big] \\
&= (1 - K_kC_k)e_{k|k-1} + s_k + m_k,
\end{aligned} \tag{6.46} \]
where
\[ s_k = -K_k u_k, \tag{6.47} \]
\[ m_k = -K_k\,\chi(z_k, \hat{z}_{k|k-1}). \tag{6.48} \]
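The linearization behind this error recursion can be checked numerically. The sketch below assumes, as in the text, that $r_k = 1.59L_k/\hat{z}_{k|k-1}$, so that $p(z) = e^{-1.59 z/\hat{z}}$; it computes the residual $\chi$ directly as the gap between $p$ and its first-order expansion and compares it with the geometric-series bound in terms of $1 - 1.59|1 - z/\hat{z}|$. The function names and the test points are hypothetical.

```python
import math

A = 1.59  # value of r_k * z_hat / L_k fixed by the choice of r_k


def idle_probability(z, z_hat):
    """p(z) when r_k = 1.59 * L_k / z_hat, using the exponential form of (6.26)."""
    return math.exp(-A * z / z_hat)


def linearization(z, z_hat):
    """Return (C_k, residual chi, geometric-series bound on |chi|)."""
    t = A * abs(1.0 - z / z_hat)
    if t >= 1.0:
        raise ValueError("bound requires roughly 0.61*z < z_hat < 2.7*z")
    C = -A / (math.exp(A) * z_hat)                                    # (6.41)
    chi = idle_probability(z, z_hat) - idle_probability(z_hat, z_hat) - C * (z - z_hat)
    bound = A**2 * (z_hat - z) ** 2 / (2.0 * math.exp(A) * z_hat**2 * (1.0 - t))
    return C, chi, bound


if __name__ == "__main__":
    for z_hat in (800.0, 1200.0, 2000.0):
        C, chi, bound = linearization(1000.0, z_hat)
        print(z_hat, chi, bound)
```

The residual shrinks quadratically as the prior estimate approaches the true population, which is why the linear term dominates the error dynamics near convergence.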
Boundedness of Estimation Error. Having derived the dynamics of the estimation error, we
now state the main result on the stochastic stability and accuracy of Algorithm 13. The core
technique in the proof, detailed in our publication [229], consists of setting an appropriate Lyapunov
function, $V_k(e_{k|k-1}) \triangleq \frac{e_{k|k-1}^2}{P_{k|k-1}}$ in our problem, and then employing Lyapunov drift analysis to prove
that the conditions in Lemma 6.2 are satisfied.
Theorem 6.5. Consider the discrete-time stochastic system given by (6.22) and (6.27) and
Algorithm 13. The estimation error $e_{k|k-1}$ defined by (6.12) is exponentially bounded in mean square and
bounded with probability one (w.p.o.), if the following conditions hold:
\[ \underline{q} \le Q_k \le \overline{q}, \tag{6.49} \]
\[ \underline{\phi} \le \phi_k \le \overline{\phi}, \tag{6.50} \]
1. The inequalities (6.49) and (6.50) can be satisfied by configuring the corresponding
parameters in Algorithm 13, which guarantees the boundedness of the pseudo-covariance $P_{k|k-1}$.
3. As a sufficient condition for stability, the upper bound may be too stringent. As shown in the
simulation results in [229], stability is still ensured even with a relatively large .
Our analysis on the stability of Algorithm 14 for the dynamic case is also composed of two
steps. First, we derive the estimation error. Second, we establish the stability and the accuracy of
Algorithm 14 in terms of the boundedness of estimation error.
which differs from the static case (6.46) in $s_k$. In the dynamic case, we have
\[ s_k = w_k - K_k u_k. \tag{6.54} \]
Theorem 6.6. Under the conditions of Theorem 6.5, consider the discrete-time stochastic system
given by (6.30) and (6.27) and Algorithm 14. If there exist time-varying positive real numbers $\lambda_k$,
$\sigma_k > 0$ such that $E[w_k] \le \lambda_k$ and $E[w_k^2] \le \sigma_k$, then the estimation error $e_{k|k-1}$ defined by (6.12) is
exponentially bounded in mean square and bounded w.p.o.
The closed-form formulas of $\lambda_k$ and $\sigma_k$ are detailed in [229]. As in the static case, the conditions
are sufficient but may be overly stringent: the results can still hold even if the conditions are not
satisfied, as illustrated in the simulations.
Detecting missing tags is one of the most important RFID applications. According to the statistics
presented in [75], inventory shrinkage, a combination of shoplifting, internal theft, administrative
and paperwork error, and vendor fraud, resulted in 44 billion dollars in loss for retailers in 2014.
Hence, missing tag detection has attracted extensive research attention. Existing missing tag de-
tection algorithms can be classified into probabilistic algorithms and deterministic algorithms,
summarized as below.
Probabilistic algorithms detect a missing tag event with a predefined probability. Tan et al.
initiated the study of probabilistic detection and proposed a solution called Trusted Reader Protocol
(TRP) in [187]. TRP detects a missing tag event by comparing the pre-computed slots with those
picked by the tags in the population. Follow-up works [133, 134] employ multiple seeds to
increase the probability of singleton slots. The latest probabilistic algorithm, called RUN, is
proposed in [178]. Different from previous works, RUN considers the influence of unexpected tags
and can work in environments with unexpected tags.
Deterministic algorithms, on the other hand, are able to exactly identify which tags are absent.
Li et al. developed a series of deterministic algorithms in [119] to reduce radio collisions and
enable tag identification at the bit level. Subsequently, Zhang et al. proposed another suite of
deterministic algorithms in [234] that store the bitmap of tag responses in all rounds and compare
them to determine the present and absent tags. However, how to configure the algorithm parameters is
not theoretically analyzed. More recently, Liu et al. [130] enhanced the work by reconciling both
2-collision and 3-collision slots and filtering the empty slots to improve time efficiency.
In a broader context, tag identification and tag population estimation algorithms sometimes
can also be used to detect missing tags. Specifically, tag identification algorithms (e.g., [113,
179]) identify all tags in the interrogation region. To detect missing tags, they can be executed to
obtain the IDs of the tags present in the population and then the missing tags can be found out
by comparing the collected IDs with those recorded in the database. However, tag identification
algorithms are usually time-consuming [119] as they are designed to identify all tags. Moreover,
they fail to work when it is not allowed to read the IDs of tags due to privacy concern. Tag
estimation algorithms (e.g., [39, 177, 239]), on the other hand, estimate the number of tags in the
interrogation region. If more than a certain number of tags are absent in RFID systems, a missing
tag event can be detected by comparing the estimation and the number of expected tags stored in
the database. However, estimation error may be misinterpreted as missing tags and cause detection
error, especially when there are only a few missing tags.
Compared to the state-of-the-art development, we formulate the missing tag detection problem
in the multiple-group multiple-region scenario, which has not been addressed before. We provide
a comprehensive analysis on this new problem and investigate how to devise optimum missing tag
detection algorithms.
We consider a grouped RFID system composed of a mobile reader and $G$ groups of tags
distributed in $R$ ($R \ge 1$) interrogation regions (e.g., $R$ rooms), concisely referred to as regions. If
a tag is physically located in two regions, i.e., regions overlap with one another,
the tag responds only to reader queries for the region in which it is first interrogated.
In this sense, we can treat the regions as non-overlapping.
We use E to denote the set of the tags which are expected to be present and we denote its
cardinality (i.e., the number of expected tags) by |E|. The reader knows the IDs of all tags in E
but does not know the set of tags in each region. For presentation conciseness, we set the ID of
group g (1 ≤ g ≤ G) to its index g. We assume every tag knows its group ID through a grouping
algorithm, e.g. [124]. We also assume the reader knows the approximate number of tags of each
group g actually present in each region r (1 ≤ r ≤ R), denoted by ngr . The estimation of ngr can
be achieved by the reader by deactivating all tags not belonging to group g (using the ID of group
g) and then using any state-of-the-art tag population estimation algorithm.
To make our analysis generic, we do not impose any physical constraints on tags, which can be
either battery-powered active tags or lightweight passive ones energized by radio waves emitted by
the reader. We follow the standard Listen-before-talk communication algorithm [88] between the
reader and tags: the reader initiates communication first by sending commands and broadcasting
the parameters to tags, such as the frame size, random seeds, and then each tag responds in its
chosen time slot. For an arbitrary time slot, if no tag replies in this slot, it is called an empty
slot; otherwise, it is called a nonempty slot. Only one bit is needed to distinguish an empty slot
from a nonempty slot: 0 for an empty slot and 1 for a nonempty slot. During the communication,
the tag-to-reader transmission rate and the reader-to-tag transmission rate may differ from each
other. In practice, the former is either 40-640 kb/s in the FM0 encoding format or 5-320 kb/s in
the modulated subcarrier encoding format, while the latter is normally 26.7-128 kb/s [73].
Table 6.2 summarizes the main notations.
We are interested in detecting a missing tag event for each group $g$. Let $m_g$ denote the number of
missing tags in group $g$, which is of course not known by the reader. Let $M_g$ denote the threshold
of group $g$. A missing event of group $g$ denotes the event where at least $M_g$ tags of group
$g$ are missing in the system. Let $P_{dg}$ denote the probability that the reader can detect a missing event
of group $g$. We formulate the optimum missing tag detection problem as follows.
Definition 6.5 (Optimum missing tag detection problem). The optimum missing tag detection
problem is to devise an algorithm of minimum execution time which can detect a missing event for
each group g with probability Pdg ≥ αg if mg ≥ Mg , where αg is the requirement on the detection
reliability for group g. When there is only one group in the system, the problem degenerates to the
classical missing event detection problem.
of whether item $e$ is in set $S$, the Bloom filter returns true if all corresponding $k$ bits are 1 (i.e., it
returns $\bigwedge_{i=1}^{k} B[h_i(e) \bmod m]$). Bloom filters admit no false negatives but may have false positives.
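The membership test just described can be illustrated with a minimal Bloom filter. This is a sketch under stated assumptions: the $k$ hash functions $h_i$ are not specified in the text, so they are derived here from SHA-256 with an index prefix, and the class name and sizes are hypothetical.

```python
import hashlib


class BloomFilter:
    """m-bit Bloom filter with k hash functions derived from SHA-256 (illustrative choice)."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _positions(self, item):
        # Position h_i(item) mod m for i = 0, ..., k - 1.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True only if all k corresponding bits are 1.
        return all(self.bits[pos] for pos in self._positions(item))


if __name__ == "__main__":
    bf = BloomFilter(m=1024, k=4)
    for tag in ("tag-1", "tag-2", "tag-3"):
        bf.add(tag)
    print("tag-2" in bf, "tag-99" in bf)
```

An inserted item always passes the test, while an item that was never inserted may still pass it when its positions happen to be set by other items.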
In our design, we explore the following three natural ideas, each corresponding to a proposed
missing tag detection algorithm detailed in the next three subsections.
Baseline approach. To enable missing tag detection in the multiple-region multiple-group case,
we let the reader use the same Bloom filter parameters in each region for each group of tags and
construct the Bloom filter based on the responses from the tags to perform missing event detection.
This approach, termed as B-detect, is a direct application of Bloom filter to solve our problem.
Adaptive approach. In the baseline approach B-detect, the reader uses the same parameters
in each region, which may not be optimum in the case when tags are not evenly distributed across
regions. Motivated by this observation, we develop an adaptive approach, named AB-detect, which
enables the reader to use different parameters based on the number of tags in the interrogation
region the reader queries. Specifically, for each region r, the reader executes one query, to which
tags of all the groups in the region respond. The reader constructs a Bloom filter Br for each region
containing the response and aggregates Br (1 ≤ r ≤ R) to form a virtual Bloom filter B AB , based
on which it detects missing event for each group.
Group-wise approach. We further develop a group-wise approach, referred to as GAB-detect.
In GAB-detect, the reader executes G group-wise queries for each region r. Only tags of group g
(1 ≤ g ≤ G) in the interrogation region respond to the g-th query. The reader then constructs a
Bloom filter Bgr for each group g and aggregates Bgr (1 ≤ r ≤ R) to form a virtual Bloom filter
Bg∗ using the technique in AB-detect, based on which it detects missing event for group g.
By sequentially analysing the above three approaches and mathematically comparing their
performance, we gradually iron out an optimum detection algorithm that works in practice.
In the B-detect design to enable missing tag detection in the multiple-region case, we let
the reader use the same parameters in each region and construct the Bloom filter based on the
responses from the tags to perform missing event detection. Specifically, B-detect consists of two
phases, detailed as below.
Phase 1: Query and feedback collection. The reader performs a query in each region r with
the same parameter setting (f, k, s), where f is the length of the Bloom filter vector, k is the
number of independent hash functions used to construct the Bloom filter vector, and s is the seed
of the hash functions which is identical for all groups and regions. How their values are chosen is
analysed in Sec. 6.4.3.2 on parameter optimisation. Upon receiving the request, each tag in region
$r$, regardless of the group to which it belongs, selects $k$ slots ($h_v(ID) \bmod f$) ($1 \le v \le k$) in the
frame of $f$ slots and replies in these slots. The reader then constructs a Bloom filter vector $B_r$ with
the responses from the tags in each region $r$ as follows. Note that there are two types of slots: empty
slots and nonempty slots. According to the responses from tags, if slot $i$ ($1 \le i \le f$) is empty, the
reader sets $B_r(i) = 0$; otherwise it sets $B_r(i) = 1$.
Phase 2: Virtual Bloom filter construction and missing event detection. After interrogating
all $R$ regions, the reader combines the Bloom filter vectors $B_r$ ($1 \le r \le R$) into a virtual Bloom
filter $B$ by ORing each bit of them, i.e., $B(i) = B_1(i) \vee \cdots \vee B_R(i)$, so that a bit of $B$ is zero only if
it is zero in every $B_r$. The reader then performs the membership test. For each tag in $E$, the reader
maps its ID into $k$ bits at positions ($h_v(ID) \bmod f$) ($1 \le v \le k$). If all the corresponding bits in $B$
are 1, then the tag is regarded as present. Otherwise, the tag is considered to be missing. The reader
reports a missing event in group $g$ if the number of tags of group $g$ found missing is at least $M_g$,
and no missing event otherwise.
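The two phases of B-detect can be sketched as follows. This is an illustration under stated assumptions: SHA-256 stands in for the unspecified hash functions $h_v$; the region filters are combined with a bitwise OR, which matches the zero-bit probability used in the analysis and keeps the test free of false negatives; and the function names, the dictionary-based inputs and the example sizes are hypothetical.

```python
import hashlib


def slots(tag_id, k, f, seed):
    """The k slot indices h_v(ID) mod f of a tag, for v = 0, ..., k - 1."""
    return [
        int.from_bytes(hashlib.sha256(f"{seed}:{v}:{tag_id}".encode()).digest()[:8], "big") % f
        for v in range(k)
    ]


def region_filter(present_ids, k, f, seed):
    """Phase 1: the f-bit vector B_r built from the replies of the tags present in one region."""
    bits = [0] * f
    for tag_id in present_ids:
        for s in slots(tag_id, k, f, seed):
            bits[s] = 1
    return bits


def b_detect(expected, group_of, thresholds, regions, k, f, seed):
    """Phase 2: combine the region filters and report a missing event per group."""
    filters = [region_filter(ids, k, f, seed) for ids in regions]
    virtual = [int(any(column)) for column in zip(*filters)]   # bitwise OR over regions
    missing = {g: 0 for g in thresholds}
    for tag_id in expected:
        if not all(virtual[s] for s in slots(tag_id, k, f, seed)):
            missing[group_of[tag_id]] += 1
    return {g: missing[g] >= thresholds[g] for g in thresholds}


if __name__ == "__main__":
    expected = list(range(200))
    group_of = {t: t % 2 for t in expected}          # two groups, by parity
    regions = [expected[10:100], expected[100:]]     # tags 0..9 are absent
    print(b_detect(expected, group_of, {0: 3, 1: 3}, regions, k=3, f=4096, seed=1))
```

When every expected tag is present, no tag can fail the membership test, so no missing event is reported; false positives can only hide missing tags, never create them.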
where $m = \sum_{g=1}^{G} m_g$ denotes the total number of missing tags in all groups.
By rearranging (6.56), we can express the Bloom filter size as
\[ f = \frac{-(|E| - m)k}{\ln\big(1 - P_{fp}^{1/k}\big)}. \tag{6.57} \]
The following theorem derives the optimal values of f and k in the sense of minimising the
execution time.
Theorem 6.7. The optimum size of the Bloom filter and the optimum number of hash functions in
B-detect, denoted by $f^*$ and $k^*$ respectively, that minimize the execution time while satisfying the
detection reliability requirement for each group $g$ regardless of $m_g$, are as follows:
\[ f^* = (|E| - M)\cdot\frac{k^*}{-\ln\big(1 - X_{g^*}^{1/k^*}\big)}, \qquad k^* = \frac{\ln\big(1 - \alpha_{g^*}^{1/M_{g^*}}\big)}{\ln\frac{1}{2}}, \tag{6.58} \]
where $M = \sum_{g=1}^{G} M_g$, $X_g \triangleq 1 - \alpha_g^{1/M_g}$, and $g^* = \arg\min_g X_g$.
Given the practical meaning of $k^*$ and $f^*$, both of them should be further rounded up to the
smallest integers not smaller than themselves.
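As a worked illustration of (6.58), the sketch below evaluates $X_g$, $g^*$, $k^*$ and $f^*$ for given thresholds and reliability requirements. The function name and the example numbers are hypothetical. Note that with the unrounded $k^*$ one has $X_{g^*}^{1/k^*} = 1/2$, so $f^*$ reduces to $(|E| - M)k^*/\ln 2$.

```python
import math


def b_detect_params(num_expected, thresholds, alphas):
    """Evaluate (6.58); returns (g*, k*, f*) before rounding (g* is a 0-based index)."""
    X = [1.0 - a ** (1.0 / M) for a, M in zip(alphas, thresholds)]   # X_g
    g_star = min(range(len(X)), key=X.__getitem__)                   # most demanding group
    k_star = math.log(X[g_star]) / math.log(0.5)
    n = num_expected - sum(thresholds)                               # |E| - M
    f_star = n * k_star / (-math.log(1.0 - X[g_star] ** (1.0 / k_star)))
    return g_star, k_star, f_star


if __name__ == "__main__":
    g_star, k_star, f_star = b_detect_params(10000, thresholds=[10, 20], alphas=[0.9, 0.95])
    # Rounding up as required by the practical meaning of k* and f*.
    print(g_star, math.ceil(k_star), math.ceil(f_star))
```

The group with the smallest $X_g$ dictates both parameters, since it tolerates the lowest false positive rate.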
In B-detect, the reader uses the same parameters in each region, particularly the length of the
Bloom filter, which may not be optimum in the case when the tags are not evenly distributed across
interrogation regions. Motivated by this observation, we develop another missing tag detection
algorithm, named AB-detect, which enables the reader to use different parameters based on the
number of tags in the region the reader queries.
Phase 1: Query and feedback collection. The reader performs a query in each region $r$ with
the parameters $(f_r, \{k_g\}_{g=1}^{G}, s)$, where $f_r$ is the length of the Bloom filter vector used in region $r$, $k_g$
is the number of hash functions used by tags in group $g$, and $s$ is the hash seed, which is identical for
all groups and regions. There are two differences compared to the baseline approach. First, $f_r$ may
differ across regions but is identical across groups. Second, $k_g$ may differ
across groups but is identical across regions. We require $f_r$ to be a power of
two, i.e., $f_r = 2^{b_r}$ ($b_r \in \mathbb{N}$). As in B-detect, the reader constructs an $f_r$-bit Bloom filter vector
$B_r$ with the responses from the tags in each region $r$. Without loss of generality, we assume that
$f_1 \le f_2 \le \cdots \le f_R$.
Phase 2: Virtual Bloom filter construction and missing event detection. After interrogating
all $R$ regions, the reader first expands $B_r$ to an $f_R$-bit padded Bloom filter by repeating $B_r$ $\frac{f_R}{f_r}$ times.
Denote the padded Bloom filter by $PB_r$. The reader then combines $PB_r$ ($1 \le r \le R-1$) and $B_R$ into a
virtual Bloom filter $B^{AB}$ by ORing each bit of them, i.e., $B^{AB}(i) = PB_1(i) \vee \cdots \vee PB_{R-1}(i) \vee B_R(i)$
($1 \le i \le f_R$), as illustrated in Fig. 6.2. The reader then performs the membership test. For each tag
in group $g$, the reader maps its ID into $k_g$ bits at positions ($h_v(ID) \bmod f_R$) ($1 \le v \le k_g$). If all
the corresponding bits in $B^{AB}$ are 1, then the tag is regarded as present. Otherwise, the tag is
considered to be missing. The reader reports a missing event for group $g$ if the number of missing
tags in group $g$ is at least $M_g$, and no missing event otherwise.
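[Figure 6.2: example of the virtual Bloom filter construction. $B_1 = 01$ is padded to $PB_1 = 01010101$, $B_r = 1100$ is padded to $PB_r = 11001100$, and $B_R = 10010101$; combining them bit by bit yields $B^{AB} = 11011101$.]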
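The expansion and combination step can be reproduced with the bit patterns of the example in Fig. 6.2. In this sketch the padded filters are combined with a bitwise OR, which is the operation that reproduces the example bits; the function names are hypothetical. The sketch also checks the property that makes padding valid: when $f_r$ divides $f_R$, reducing a position modulo $f_R$ and then modulo $f_r$ gives the same slot as reducing it modulo $f_r$ directly.

```python
def pad(bits, target_len):
    """Repeat a filter f_R / f_r times so that it has target_len bits."""
    reps, rem = divmod(target_len, len(bits))
    if rem:
        raise ValueError("target length must be a multiple of the filter length")
    return bits * reps


def virtual_filter(filters):
    """Pad every region filter to the largest length and OR them bit by bit."""
    f_max = max(len(b) for b in filters)
    padded = [pad(b, f_max) for b in filters]
    return [int(any(column)) for column in zip(*padded)]


if __name__ == "__main__":
    B1 = [0, 1]
    Br = [1, 1, 0, 0]
    BR = [1, 0, 0, 1, 0, 1, 0, 1]
    print(virtual_filter([B1, Br, BR]))
```

Because a tag's bit in the padded filter sits at every position congruent to its original slot, the membership test at positions modulo $f_R$ still finds it.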
We investigate how to tune the parameters in AB-detect to minimize the execution time while
ensuring the reliability requirement of each group. We first formulate the false positive rate for
each group $g$, denoted by $P_{fp,g}$. Recalling the construction of $B^{AB}$ in AB-detect, the probability that
any bit in $B^{AB}$ is zero is $\prod_{r=1}^{R}\big(1 - \frac{1}{f_r}\big)^{\sum_{g=1}^{G} k_g n_{gr}}$. The false positive rate for group $g$ can then be
derived as
\[ P_{fp,g} = \Big[1 - \prod_{r=1}^{R}\Big(1 - \frac{1}{f_r}\Big)^{\sum_{g=1}^{G} k_g n_{gr}}\Big]^{k_g} \approx \Big(1 - e^{-\sum_{r=1}^{R}\sum_{g=1}^{G}\frac{k_g n_{gr}}{f_r}}\Big)^{k_g}. \tag{6.59} \]
The following theorem derives the optimal values of fr and kg that minimize the execution
time while ensuring the group-wise reliability requirement.
Theorem 6.8. The optimum Bloom filter vector size for region $r$ and the optimum number of hash functions
for group $g$, denoted by $f_r^*$ and $k_g^*$, that minimize the execution time while satisfying the detection
reliability requirement for each group $g$ regardless of $m_g$, are as follows:
\[ f_r^* = \frac{\sqrt{\sum_{g=1}^{G} k_g^* n_{gr}}\cdot\sum_{r=1}^{R}\sqrt{\sum_{g=1}^{G} k_g^* n_{gr}}}{\min_g Y_g^*}, \qquad k_g^* = \frac{\ln\big(1 - \alpha_g^{1/M_g}\big)}{\ln\frac{1}{2}}, \tag{6.60} \]
where $Y_g^* \triangleq -\ln\big[1 - \big(1 - \alpha_g^{1/M_g}\big)^{1/k_g^*}\big]$. The minimum execution time under the above setting, denoted by
$T_{AB}^*$, is:
\[ T_{AB}^* = \frac{1}{\min_{1 \le g \le G} Y_g^*}\Big(\sum_{r=1}^{R}\sqrt{\sum_{g=1}^{G} k_g^* n_{gr}}\Big)^{2}. \tag{6.61} \]
As $k_g^*$ needs to be an integer and $f_r$ a power of two, they need to be rounded up to the
smallest integer and the smallest power of two not smaller than themselves, respectively.
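The quantities in (6.60) and (6.61) can be evaluated with a short helper. The sketch below is illustrative only: the function names, the nested-list layout `n[g][r]` for the estimated tag counts, and the example numbers are hypothetical. It returns the unrounded optimum and provides a separate helper for rounding a filter size up to a power of two. Summing $f_r^*$ over the regions gives $T_{AB}^*$ algebraically, which the usage lines make visible.

```python
import math


def next_power_of_two(x):
    """Smallest power of two that is not smaller than x."""
    p = 1
    while p < x:
        p *= 2
    return p


def ab_detect_params(n, thresholds, alphas):
    """Evaluate (6.60) and (6.61); n[g][r] is the tag count of group g in region r."""
    G, R = len(n), len(n[0])
    X = [1.0 - a ** (1.0 / M) for a, M in zip(alphas, thresholds)]
    k = [math.log(x) / math.log(0.5) for x in X]                      # k_g*
    Y = [-math.log(1.0 - x ** (1.0 / kg)) for x, kg in zip(X, k)]     # Y_g*
    load = [sum(k[g] * n[g][r] for g in range(G)) for r in range(R)]  # sum_g k_g* n_gr
    root_sum = sum(math.sqrt(a) for a in load)
    f = [math.sqrt(a) * root_sum / min(Y) for a in load]              # f_r*
    T = root_sum ** 2 / min(Y)                                        # T*_AB
    return k, f, T


if __name__ == "__main__":
    n = [[100, 400], [300, 50]]   # two groups, two regions
    k, f, T = ab_detect_params(n, thresholds=[10, 20], alphas=[0.9, 0.95])
    print([math.ceil(kg) for kg in k], [next_power_of_two(fr) for fr in f], T, sum(f))
```

Regions with a heavier hash load receive longer filters, but only in proportion to the square root of the load.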
Theorem 6.9. Given the optimum parameters in B-detect and AB-detect, the following relationship
between the minimum execution time of B-detect, $T_B^*$, and that of AB-detect, $T_{AB}^*$, holds: $\frac{1}{R} \le \frac{T_{AB}^*}{T_B^*} \le 2$.
In AB-detect, the reader constructs one Bloom filter that contains the response bits of tags of
all groups in the interrogation region. Mixing responses from tags of different groups may cause
"interference" among groups and thus may increase the detection time for certain groups. Motivated
by this observation, we develop a group-wise approach, termed GAB-detect, in which the reader
queries one group at a time and constructs group-wise Bloom filters to eliminate the inter-group
interference.
Phase 1: Query and feedback collection. The reader performs $G$ queries in each region $r$.
In the $g$-th query ($1 \le g \le G$), the reader broadcasts a tetrad $(g, k_g, f_{gr}, s)$, where $g$ is the group
ID, $k_g$ is the number of hash functions used by group-$g$ tags, $f_{gr}$ is the Bloom filter
size used in region $r$ for group $g$, and $s$ is the hash seed, which is identical for all regions and groups.
Again, we require $f_{gr}$ to be a power of two. Without loss of generality, we assume that
$f_{g1} \le f_{g2} \le \cdots \le f_{gR}$. When receiving the query, each tag compares its group ID with $g$. If the tag
does not belong to the group being queried, it keeps silent and waits for the next query. Otherwise,
the tag selects $k_g$ positions ($h_v(ID) \bmod f_{gr}$) ($1 \le v \le k_g$) in the frame of $f_{gr}$ slots and transmits
a short response in each of the $k_g$ slots. The reader then constructs a Bloom filter for each group $g$
and each region $r$, denoted by $B_{gr}^{GAB}$.
Phase 2: Virtual Bloom filter construction and missing event detection. After interrogating
all $R$ regions, the reader combines $B_{gr}^{GAB}$ ($1 \le r \le R$) into a virtual Bloom filter $B_{g*}^{GAB}$ for each
group $g$ by using the expansion and combination technique of AB-detect. The reader then performs
the membership test for each group $g$ by using $B_{g*}^{GAB}$.
The probability that any bit in $B_{g*}^{GAB}$ is zero is $\prod_{r=1}^{R}\big(1 - \frac{1}{f_{gr}}\big)^{k_g n_{gr}}$. Hence, the false positive rate for group $g$ can
be derived as
\[ P_{fp,g} = \Big[1 - \prod_{r=1}^{R}\Big(1 - \frac{1}{f_{gr}}\Big)^{k_g n_{gr}}\Big]^{k_g} \approx \Big(1 - e^{-\sum_{r=1}^{R}\frac{k_g n_{gr}}{f_{gr}}}\Big)^{k_g}. \tag{6.62} \]
The following theorem derives the optimal values of fgr and kg that minimize the execution
time while ensuring the group-wise reliability requirement.
Theorem 6.10. The optimum Bloom filter vector size and number of hash functions for group g
in region r, denoted as $f_{gr}^*$ and $k_g^*$, that minimize the execution time while satisfying the detection
reliability requirement for each group g regardless of $m_g$, are:
$$f_{gr}^* = \frac{\sqrt{n_{gr}} \cdot \sum_{r=1}^{R} \sqrt{n_{gr}}}{Z_g^*}, \qquad k_g^* = \frac{\ln\left(1 - \alpha_g^{\frac{1}{M_g}}\right)}{\ln \frac{1}{2}}, \qquad (6.63)$$
The minimum execution time under the above setting, defined as $T_{GAB}^*$, is:
$$T_{GAB}^* = \sum_{g=1}^{G} \frac{\left(\sum_{r=1}^{R} \sqrt{n_{gr}}\right)^2}{Z_g^*}, \qquad (6.64)$$
where $Z_g^* \triangleq \frac{\ln\left[1 - \left(1 - \alpha_g^{\frac{1}{M_g}}\right)^{\frac{1}{k_g^*}}\right]}{-k_g^*}$.
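The structure of Theorem 6.10 can be explored numerically with the sketch below. It assumes that the reliability requirement of a group has already been translated into a target false positive rate, passed as the hypothetical argument `p_target`, and it uses the approximate form of (6.62): k = ln p / ln(1/2), Z = −ln(1 − p^(1/k)) / k, and frame sizes proportional to the square root of the region populations. The values are real-valued; rounding k to an integer and each frame size to a power of two is left out.

```python
import math


def optimal_group_parameters(n: list[int], p_target: float) -> tuple[list[float], float, float]:
    """Return (frame sizes f_r, hash count k, Z) for one group.

    n        -- tag counts n_r of the group in each region
    p_target -- false positive rate allowed for the group (assumed given)
    """
    k = math.log(p_target) / math.log(0.5)
    z = -math.log(1.0 - p_target ** (1.0 / k)) / k
    root_sum = sum(math.sqrt(n_r) for n_r in n)
    f = [math.sqrt(n_r) * root_sum / z for n_r in n]
    return f, k, z


# Hypothetical group: 100 and 400 tags in two regions, 1% false positives.
f, k, z = optimal_group_parameters([100, 400], 0.01)
total_time = sum(f)  # equals (sum_r sqrt(n_r))^2 / Z for this group
```

With these choices the constraint on the sum of n_r / f_r is met with equality, and since p^(1/k) = 1/2 the product Z·k equals ln 2, the familiar Bloom filter operating point.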
6.4.6 Discussion
We discuss some implementation issues of our proposed missing tag detection algorithms.
In our algorithms, the reader needs to estimate the number of tags $n_{gr}$ in each region and
for each group. This may lead to extra overhead prior to missing tag detection. However, this
overhead can be limited as the estimation can be achieved in $O(\log n_{gr})$ time using state-of-the-art
estimation approaches. Specifically, we can apply two types of methods to estimate $n_{gr}$: the single-group
estimator and the multi-group estimator. In the single-group estimator, when staying at region r
the reader queries with the group ID g and only the tags from g respond. The system then operates like a
single-group system, and $n_{gr}$ can be estimated by the methods in [39]. On the other hand, the multi-group
estimator estimates multiple group sizes simultaneously by employing the maximum likelihood
estimation method as in [132], which is time-efficient.
Despite the extra overhead due to the estimation of $n_{gr}$, this estimation phase enables the pre-detection
of missing tags if the number of missing tags is large (e.g., due to unexpected loss
or accidents). More specifically, the reader can achieve pre-detection by comparing the bitmap
constructed from the tag feedback with the bitmap computed a priori by the reader. If a bit is 1 in the
bitmap pre-calculated by the reader but turns out to be 0 in the bitmap of the feedback, the reader can
identify the absence of the tags mapped to this slot. If the number of missing tags for a given group
exceeds the threshold, a missing event is reported for the group. Consequently, the reader may not
need to execute the fine-grained detection algorithms since missing tag events have already been
detected in the estimation phase, thus reducing the time cost.
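The bitmap comparison used for pre-detection can be sketched in a few lines. The names `missing_slots`, `missing_event` and `slot_to_tags` are hypothetical, and the sketch assumes the reader knows which expected tags map to each slot.

```python
def missing_slots(expected: list[int], observed: list[int]) -> list[int]:
    """Slots that should be busy (bit 1) but were observed idle (bit 0)."""
    return [i for i, (e, o) in enumerate(zip(expected, observed)) if e == 1 and o == 0]


def missing_event(expected: list[int], observed: list[int],
                  slot_to_tags: dict[int, list[str]], threshold: int) -> bool:
    """Report a missing event when the identified absent tags exceed the threshold."""
    absent = set()
    for slot in missing_slots(expected, observed):
        # Every tag mapped to an unexpectedly idle slot must be absent.
        absent.update(slot_to_tags.get(slot, []))
    return len(absent) > threshold


# Hypothetical 4-slot frame: tags b and c should have answered in slot 2.
expected = [1, 0, 1, 1]
observed = [1, 0, 0, 1]
slot_to_tags = {0: ["a"], 2: ["b", "c"], 3: ["d"]}
```

Here slot 2 is idle although two tags were expected in it, so both are known to be absent; whether this triggers a missing event depends on the group threshold.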
Unknown and unexpected tags can be interpreted as the tags that have not been identified
by the reader [129], such as newly arrived products, on which the reader does not have any
knowledge. During the interrogation, the unknown tags respond together with the known tags,
which interferes with the detection of missing known tags and thus degrades the
performance [178, 228].
Fortunately, two of our proposed algorithms, AB-detect and GAB-detect, are resistant to the
interference caused by unknown tags. The reason is as follows. The unknown tags have not been
identified by the reader, so they have no individual group IDs [124], and hence no group
ID in the interrogation messages matches theirs. Therefore, unknown tags stay silent during
the whole detection process.
Chapter 7
This habilitation thesis presents our works on some algorithmic problems of both fundamental
and practical importance in wireless networks. Chapter 2 investigates channel rendezvous and
neighbor discovery and presents a series of distributed channel rendezvous and neighbor discovery
algorithms enabling users to meet and discover each other within bounded and order-optimum
delay. Chapter 3 addresses the problem of opportunistic channel access by providing a generic
analysis to cast the problem into the RMAB problem and conducting a systematic analysis on a class
of myopic policies of both theoretical and practical importance. Chapter 4 considers distributed
learning in wireless networks and presents the design of distributed algorithms allowing users to
gradually converge to a stable and desirable system state based on purely local information and
interactions. Chapter 5 studies a class of path optimization and the related scheduling problems
arising from data harvesting and mobile charging and presents a generic analysis on these problems
and designs polynomial or quasi-polynomial time algorithms achieving constant or poly-logarithmic
approximation to the optimum. Chapter 6 focuses on algorithm design and analysis in RFID systems
by designing a number of basic algorithms including tag population estimation and missing tag
detection.
Despite the variety of problems we address, our work is driven by the common and fundamental
quest for algorithms that can scale elegantly, act efficiently in terms of computation and commu-
nication, while keeping operations as local and distributed as possible. Each of the chapters has
stated the corresponding optimization problem and established either a methodology for finding
its exact solution or, in some cases, efficient algorithms for its approximation. This is accompanied
by an analysis of the properties of the resulting system optimum, including, in particular, the
establishment of analytic and asymptotic bounds on its performance, as well as a study of the
underlying mathematical problems and their generalization.
In this concluding chapter, we first provide a brief description of other works we have done that
are related to the central thread of the thesis. We then take a broader and further view to discuss
more general perspectives and directions for future research.
Chapter 7. Conclusion and Perspectives
Due to their perceived fairness and allocation efficiency, auctions are among the best-known
market-based mechanisms to allocate resources. However, conventional auctions cannot be directly
applied in wireless networks because radio resources are by nature different from conventional goods
due to radio interference and reuse. Radio resource auction is essentially a problem of interference-constrained
resource allocation. Motivated by this analysis, we have studied a number of problems
arising from radio resource auctions, summarized below.
• Spectrum auction [44, 47]. We propose an auction framework for cognitive radio networks to
allow unlicensed secondary users (SU) to share the available spectrum of licensed primary
users (PU) fairly and efficiently, subject to the interference temperature constraint at each PU.
To study the competition among SUs, we formulate a non-cooperative multiple-PU multiple-SU
auction game and study the structure of the resulting equilibrium by solving a non-continuous
two-dimensional optimization problem, including the existence and uniqueness of
the equilibrium and the convergence to the equilibrium in the two auctions. A distributed
algorithm is developed in which each SU updates its strategy based on local information
to converge to the equilibrium. We also analyze the revenue allocation among PUs and
propose an algorithm to set the prices under the guideline that the revenue of each PU
should be proportional to its resource. We then extend the proposed auction framework to
the more challenging scenario with free spectrum bands. We develop an algorithm based on
the no-regret learning to reach a correlated equilibrium of the auction game. The proposed
algorithm, which can be implemented in a distributed manner based on local observation, is especially
suited to decentralized adaptive learning environments such as cognitive radio networks.
• Truthful auction in mesh community networks [138]. Nowadays, the maintenance costs of
wireless devices represent one of the main limitations to the deployment of Wireless Mesh
Networks (WMN) as a means to provide Internet access in urban and rural areas. A promising
solution to this issue is to let the WMN operator lease its available bandwidth to a subset
of customers, forming a Wireless Mesh Community Network, in order to increase network
coverage and the number of residential users it can serve. Motivated by the above analysis, we
design an economically efficient and resilient auction-based bandwidth allocation in WMNs.
Our particular emphasis is on the resilience of the proposed mechanism against any actions
of selfish customers that manipulate the bandwidth marketplace of the network scenario
described above to obtain extra benefit. To tackle this problem, we design an optimum
truthful auction that forces each customer interested in leasing the available bandwidth to
bid its real valuation of the required bandwidth demand. More specifically, the approach
consists of finding the optimum set of customers to be accepted by the operator (auction
winners), whose traffic demands can be routed through the WMN, and the corresponding
prices they have to pay for the leased service, which constitute the operator revenue. The
optimum allocation and the pricing together ensure the truthfulness of the proposed auction
scheme.
• Auction-based mobile data offloading [155, 156, 204, 208]. The opportunistic utilization of
third party WiFi access devices to offload customer traffic from the mobile network has re-
cently gained momentum as a promising approach to increase the network capacity and
simultaneously reduce the energy consumption of the radio access network (RAN) infrastruc-
ture. To foster the opportunistic utilization of unexploited Internet connections, we propose
a new and open market where a mobile operator can lease the bandwidth made available by
third parties (residential users or private companies) through their access points to increase
dynamically (and adaptively) the network capacity. Specifically, we propose and analyze a
combinatorial reverse auction to implement a market both for selecting the cheapest third
party access devices and offloading the maximum amount of data traffic from the radio access
networks. We show that a payment rule that considers only the variation of the objective func-
tion solving the problem with and without the winner does not always ensure the individual
rationality of the participants for the analyzed mobile data offloading problem. We present a
novel payment rule based on the Vickrey-Clarke-Groves (VCG) scheme and demonstrate that
it guarantees both individual rationality and truthfulness. Since the optimum reverse auction
problem is NP-hard, we further propose three greedy algorithms that solve the allocation
problem in polynomial time while preserving the truthfulness property.
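To make the VCG principle behind the reverse auction concrete, here is a textbook sketch for a much simpler setting than the one studied in the thesis: the operator needs k identical access points and each third party offers exactly one. In that setting the VCG payment of every winner is the (k+1)-th lowest bid. The function name and the sample bids are hypothetical, and this is not the payment rule proposed in the cited works.

```python
def procurement_vcg(bids: dict[str, float], k: int) -> dict[str, float]:
    """Select the k cheapest sellers and pay each the (k+1)-th lowest bid."""
    if k < 1 or len(bids) <= k:
        raise ValueError("need at least k + 1 bidders to price the winners")
    ranked = sorted(bids.items(), key=lambda item: item[1])
    # Removing a winner forces the buyer to accept the first losing bid instead,
    # so that bid is the externality-based (VCG) payment of every winner.
    clearing_price = ranked[k][1]
    return {seller: clearing_price for seller, _ in ranked[:k]}


# Hypothetical asking prices of four access point owners.
payments = procurement_vcg({"ap1": 3.0, "ap2": 5.0, "ap3": 4.0, "ap4": 9.0}, k=2)
```

Each winner is paid at least its own bid (individual rationality), and a winner's payment does not depend on its own bid, which is what makes truthful bidding a dominant strategy in this simplified case.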
In a typical resource allocation algorithm, users submit their demands (e.g., bids in auctions) to
the resource manager who allocates the resource to users based on their demands. In such a context,
revealing one’s demand naturally opens the door for many security vulnerabilities. For example,
a malicious auctioneer can exploit such information for future auctions, as historical data can be
used to evaluate the willingness to pay of users. Hence, a critical research issue is how to design
privacy-preserving resource allocation algorithms that do not leak any information to any entity
other than the outcome of the allocation. To address this problem, we develop a generic framework
based on garbled circuits and secret sharing and apply the framework in the following context.
• Privacy-preserving spectrum auction [56, 58]. We design an information-theoretically secure
framework, termed as ITSEC, for truthful spectrum auctions, which ensures the privacy of
bidders’ bid information against any adversary with arbitrary computation power. We
put the emphasis on cryptographic security, where a protocol is said to be secure if
no participating party can learn any information beyond the output of the protocol. Under this
formal security criterion, existing approaches indeed reveal certain information that cannot be
inferred from the output. Moreover, ITSEC brings almost no extra computation overhead to
the underlying spectrum auction and incurs only limited communication overhead. Technically,
ITSEC introduces two separate entities, a seller agent and a buyer agent, to cooperate
with the auctioneer to securely run the auction. Note that none of the three parties needs
to be a trusted party, but any two of them are assumed not to collude. As a distinguishing
feature, ITSEC reveals nothing about the bids to any adversary with unbounded computation
power, except for the auction result. We also design circuits for spectrum auction mechanisms
implementing ITSEC, and optimize the circuits to further improve performance.
• Privacy-preserving cloud auction [57]. We develop a privacy-preserving cloud auction frame-
work that runs without disclosing any bid information. Specifically, we focus on a truthful
cloud auction scenario, where a cloud provider provides various computing resources to a
large number of heterogeneous users. Being a multi-unit combinatorial auction problem,
finding its optimum is NP-hard. A greedy resource allocation scheme was then proposed to
achieve reasonable economic and computational efficiency. Our work is to design a privacy-preserving
mechanism that prevents the auction from disclosing any information on user bids.
Compared with other combinatorial auctions, privacy-preserving design in cloud auctions has
the following specific challenges. The first one is the heterogeneity in both computing re-
sources (VM instances) and user preferences. It is not clear how privacy-preserving auction
can be performed under such heterogeneous environment. Secondly, cloud users may not
stay online during the entire auction duration. In other words, a user may switch offline after
having submitted its bid. Last but not least, privacy-preserving auctions should be efficient,
and scale nicely in both computation and communication so as to support auctions with a
large number of users. The challenges in cloud auction privacy require our privacy-preserving
framework to be data-oblivious, meaning that its execution path should not depend on the
input. We thus develop a privacy-preserving algorithm composed solely of data-oblivious
operation blocks. We then leverage tools from garbled circuits and homomorphic encryption
to further preserve the privacy of the whole auction algorithm. The complexity of our privacy-preserving
cloud auction is $O(n \log^2 n)$, which is within a logarithmic factor of the complexity
$O(n \log n)$ of the original auction algorithm.
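To illustrate what a data-oblivious operation block looks like, the sketch below implements bitonic sort, a classical sorting network whose sequence of compare-exchange positions depends only on the input length and which uses O(n log² n) such operations. It is offered as a well-known example of the notion; whether the cited framework uses this particular network is an assumption not confirmed by the text.

```python
def compare_exchange(a: list, i: int, j: int, ascending: bool) -> None:
    """Order a[i], a[j]; the positions touched never depend on the data."""
    lo, hi = min(a[i], a[j]), max(a[i], a[j])
    a[i], a[j] = (lo, hi) if ascending else (hi, lo)


def bitonic_sort(values: list) -> list:
    """Sort a list whose length is a power of two with a fixed comparison schedule."""
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    size = 2
    while size <= n:          # size of the bitonic sequences being merged
        stride = size // 2
        while stride >= 1:    # distance between compared positions
            for i in range(n):
                partner = i ^ stride
                if partner > i:
                    compare_exchange(a, i, partner, ascending=(i & size) == 0)
            stride //= 2
        size *= 2
    return a


# Hypothetical bids to be ranked without data-dependent branching on positions.
ranked = bitonic_sort([5, 1, 4, 2, 8, 7, 3, 6])
```

Because the schedule of index pairs is fixed in advance, each compare-exchange can in principle be replaced by a secure two-party gadget without the access pattern leaking anything about the bids.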
Another research thrust that I have been developing is the application of game theory in the
security analysis and defense strategy design in smart grids. In this regard, we employ game
theoretical techniques to optimize the deployment of defense resources by focusing in particular
on the impact of attacks on equipment. By analyzing the interactions between the attacker and
the defender, we find the optimum choice of security modes to enable on each equipment in
the Advanced Metering Infrastructure (AMI) to protect the confidentiality of customers’ data.
In addition, we find the minimum defense resources needed to thwart any attack attempt to
compromise customers’ data in the AMI. In the smart grid, the interdependency between the
communication and the electric infrastructures also renders the management of the overall security
risk a challenging task. We address this issue by presenting an analytical model for identifying
and hardening the most critical communication equipment used in the power system. Using non-
cooperative game theory, we model the interactions between an attacker and a defender, and
derive the minimum defense resources required and the optimum strategy of the defender that
minimizes the risk on the power system. We validate our model via a case study based on the
Polish electric power transmission system. These works (cf. [97 – 100] for details) were conducted
in collaboration with Ziad Ismail (former Ph.D. student co-supervised with Pr. Jean Leneutre at
Telecom ParisTech and Dr. David Bateman at EDF). We are now extending our research to generic
critical infrastructures.
It is well known that randomized algorithms are often conceptually simple to implement, produce
good average results, and have better time complexity than deterministic algorithms. Randomness
is even indispensable for many algorithms to produce good results, such as the distributed
learning algorithms in load balancing. However, randomized algorithms fail to bound the worst-case
performance. Take the neighbor discovery problem we address as an example: probabilistic
algorithms perform well in the average case by limiting the expected discovery delay, but their main
drawback is the lack of a performance guarantee on the worst-case discovery delay. Deterministic
algorithms, on the other hand, have good worst-case performance but usually have a longer
expected discovery delay.
A natural question is how to combine the advantage of both while limiting their side-effects.
In our work, we propose an approach to interleave the probabilistic slots, where the operation of
the algorithm is randomized, with the deterministic ones. As a result, we can trade off the worst-case
performance against the average performance. Another example is the very recent result on the
rendezvous problem where the authors beat the quadratic theoretical barrier on the worst-case
rendezvous delay by utilizing a public source of randomness in conjunction with a Markovian
hitter [54]. Generally speaking, how to systematically orchestrate randomness with determinism
is an important research direction, especially in emerging networks where we often need to trade off
worst-case and average performance.
In wireless networks and other distributed systems, it is usually desirable to keep operations as
local and distributed as possible. However, sharing information, even a small
quantity of it, may sometimes bring a significant performance gain. Our work on the imitation-based
spectrum access actually exploits a very simple form of information sharing, pair-wise imitation.
In many distributed systems like wireless networks, information sharing requires communication,
which may be expensive. It is therefore natural to quantify the benefits brought by information
sharing and how much information to share in a distributed setting.
To illustrate this point, we can take the RMAB problem as an example. In the classical
RMAB setting, a user chooses an action at each time from a set of actions. Since the rewards
from each action are unknown before actually activating it, each player needs to balance between
exploiting the action that yields the best payoffs so far and exploring new actions that may give
even higher payoffs later. In this setting, sharing information among users can greatly expedite
the exploration component in a distributed system. Each user can benefit from the information
shared by other users to infer statistical properties of the actions to make better decisions. However,
information sharing raises the issue of cooperative exploration among players. In addition, it also
incurs a communication cost, which needs to be managed at an acceptable level. Therefore, how
to integrate the information sharing and quantify the related tradeoffs deserve a comprehensive
treatment.
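The effect of shared observations on exploration can be illustrated with the UCB1 index of [18], in which the exploration bonus of an action shrinks as its number of recorded plays grows. The sketch below makes a strong simplification: rewards are treated as stationary, which is not the restless setting discussed in this thesis. The names `ucb1_index` and `pool` are hypothetical.

```python
import math


def ucb1_index(reward_sum: float, pulls: int, t: int) -> float:
    """UCB1 index: empirical mean plus the bonus sqrt(2 ln t / pulls)."""
    if pulls == 0:
        return math.inf  # untried actions are explored first
    return reward_sum / pulls + math.sqrt(2.0 * math.log(t) / pulls)


def pool(observations: list[tuple[float, int]]) -> tuple[float, int]:
    """Merge the (reward_sum, pulls) pairs reported by several users for one action."""
    return (sum(s for s, _ in observations), sum(p for _, p in observations))


# One user alone versus three users sharing their statistics (illustrative numbers).
local_index = ucb1_index(3.0, 5, 20)
total, pulls = pool([(3.0, 5), (6.5, 10), (2.5, 5)])
shared_index = ucb1_index(total, pulls, 20)
```

Pooling quadruples the number of samples behind the estimate and halves the exploration bonus in this example, at the price of the communication needed to exchange the statistics.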
A central objective in algorithm design is optimality, i.e., the algorithm should produce good
results, ideally the optimum ones. However, many problems we encounter in practice are NP-hard
and thus do not admit efficient algorithms finding the optimum solution. By efficient we mean
that the complexity is polynomial or quasi-polynomial in both time and space. Examples of these
problems encountered in this thesis include the RMAB problem studied in Chapter 3, which is
PSPACE-hard, and the path optimization problems studied in Chapter 5, which are either NP-hard
or APX-hard. Given the impossibility of finding efficient algorithms in many cases, we argue that it
is natural to take the following alternatives which we have used in our works.
• The first one is to seek sufficient conditions for simple and robust algorithms under which
the optimality of such algorithms is guaranteed. We have used this methodology in Chapter 3
where we study myopic policies and derive the optimality conditions.
• The second one is to develop approximation algorithms with bounded efficiency loss compared
to the system optimum. We have adopted this methodology in Chapter 5 in solving the path
optimization problems.
The above methods should also be carefully tuned by taking into account specific constraints
posed by the problems we address. For example, in some cases, even algorithms with polynomial
complexity are too expensive in terms of practical operations and are thus impractical to implement.
On the other hand, simple and implementable algorithms often imply significant efficiency
degradation. Therefore, a balance needs to be carefully struck. As there is no universal recipe for
solving all problems, a reasonable approach is to design algorithms with a good efficiency-implementability
trade-off for a class of problems that is sufficiently generic and extensible to
solve a variety of other problems.
One important omission in this thesis is perhaps the design of online algorithms to respond to
each new input upon arrival. The key challenge is that previous decisions of the online algorithms
cannot be revoked. We are currently extending our work on the charging path optimization to
the online case where the charging requests are not known a priori. Instead, they are sent to the
charger sequentially in an on-demand fashion. The key problem we are facing is to design an online
charging scheduling algorithm with good performance in the face of uncertainty, since the future
is unknown to the algorithm. Formally, we seek to design an algorithm with bounded and constant
competitive ratio, which compares the performance of an online algorithm to that of an offline
algorithm which is given the whole input sequence beforehand. We plan to model the situation as
a two-person game between an online algorithm and a malicious adversary. The online algorithm’s
strategy is to minimize its cost (in terms of time or energy), while the adversary’s strategy is to
construct the worst possible input for the algorithm in terms of the cost. This formulation allows
us to use the rich body of results in game theory to design efficient online algorithms, randomized or
deterministic.
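For reference, the standard notion of competitive ratio for a cost-minimization problem can be written as follows, where the symbols $\mathrm{ALG}(\sigma)$, $\mathrm{OPT}(\sigma)$, $c$ and $b$ are introduced here for illustration and do not appear in the original text. An online algorithm is $c$-competitive if there exists a constant $b$ such that, for every input sequence $\sigma$,

$$\mathrm{ALG}(\sigma) \le c \cdot \mathrm{OPT}(\sigma) + b,$$

where $\mathrm{ALG}(\sigma)$ is the cost incurred by the online algorithm and $\mathrm{OPT}(\sigma)$ is the cost of an optimal offline algorithm that knows $\sigma$ in advance. In the game formulation above, the adversary chooses $\sigma$ to make the ratio as large as possible.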
We hope that by developing and analyzing a set of online algorithms and comparing their efficiency
with that of offline algorithms, we can gain more insight into the structural properties of the
problem and derive guidelines for designing efficient algorithms.
In the coming era of big data and massively connected heterogeneous devices, it is evident that
we need efficient algorithms more than ever, algorithms that can operate rapidly and adaptively,
while generating good results. There is still a long way to go towards this ambitious objective. This
thesis is only a starting point of a long and fascinating journey.
Bibliography
[1] N. Abramson. “The Aloha System: Another Alternative for Computer Communications”.
Proc. AFIPS Fall Joint Computer Conference. 1970 (see p. 108)
[2] H. Ackermann, P. Berenbrink, S. Fischer, and M. Hoefer. “Concurrent Imitation Dynamics
in Congestion Games”. Proc. PODC. 2009 (see pp. 63, 67)
[3] V. Aggarwal, Y. Aneja, and K. Nair. Minimal spanning tree subject to a side constraint.
Computers and Operations Research, 9:4 (1982), 287 – 296 (see p. 85)
[4] R. Agrawal. Sample Mean Based Index Policies with O(log n) Regret for the Multi-Armed
Bandit Problem. Advances in Applied Probability, 27:4 (1995), 1054 – 1078 (see p. 45)
[5] R. Agrawal, D. Teneketzis, and V. Anantharam. Asymptotically Efficient Adaptive Allocation
Rules for the Multiarmed Bandit Problem with Switching. IEEE Transactions on
Automatic Control, 33:10 (1988), 899 – 906 (see p. 60)
[6] S. Ahmad, M. Liu, T. Javidi, Q. Zhao, and B. Krishnamachari. Optimality of Myopic
Sensing in Multichannel Opportunistic Access. IEEE Transactions on Information Theory,
55:9 (2009), 4040 – 4050 (see pp. 41, 44, 46 – 48)
[7] C. Alos-Ferrer and K. H. Schlag. The Handbook of Rational and Social Choice. Oxford
University Press, 2009. Chap. Imitation and Learning (see p. 63)
[8] C. Alos-Ferrer and F. Shi. Imitation with asymmetric memory. Economic Theory, 49:1
(2012), 193 – 215 (see pp. 63, 74)
[9] S. Alpern. Hide and Seek Games. Seminar, Institut fur Hohere Studien, Wien, (1976) (see
p. 14)
[10] S. Alpern and S. Gal. The Theory of Search Games and Rendezvous and Discrimination
for Resource Allocation in Shared Computer Systems. International Series in Operations
Research and Management Science, (2002) (see pp. 14, 16)
[11] S. Alpern, R. Fokkink, L. Gasieniec, R. Lindelauf, and V. S. Subrahmanian. Search
Theory: A Game Theoretic Perspective. Springer, 2013 (see p. 14)
[12] A. Anandkumar, N. Michael, and A. Tang. “Index-based sampling policies for tracking
dynamic networks under sampling constraints”. Proc. INFOCOM. 2010 (see pp. 45, 63)
[13] V. Anantharam, P. Varaiya, and J. Walrand. Asymptotically Efficient Adaptive Allocation
Rules for the Multiarmed Bandit Problem with Switching. IEEE Transactions on
Automatic Control, 32:11 (1987), 968 – 976 (see p. 45)
[14] G. Anastasi, M. Conti, M. Di Francesco, and A. Passarella. Energy conservation in
wireless sensor networks: A survey. Ad Hoc Networks, 7:3 (2009), 537 – 568 (see p. 28)
[15] E. J. Anderson and R. R. Weber. The rendezvous problem on discrete locations. Journal
Applied Probability, 27:4 (1990), 839 – 851 (see pp. 14 – 16, 23)
[16] D. L. Applegate, R. E. Bixby, V. Chvatal, and W. J. Cook. The Traveling Salesman
Problem: A Computational Study. Princeton University Press, 2011
(see pp. 84, 94)
[17] N. Atanasov, J. Le Ny, and G. J. Pappas. Distributed Algorithms for Stochastic Source
Seeking with Mobile Robot Networks. ASME Journal of Dynamic Systems, Measurement,
Control, 137:3 (2014), 1521 – 1533 (see p. 83)
[18] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time Analysis of the Multiarmed Bandit
Problem. Machine Learning, 47:2 (2002), 235 – 256 (see p. 45)
[19] Y. Azar, O. Gurel-Gurevich, E. Lubetzky, and T. Moscibroda. “Optimal discovery
strategies in white space networks”. Proc. ESA. 2011 (see p. 16)
[20] M. Bakht, M. Trower, and R. H. Kravets. “Searchlight: won’t you be my neighbor”.
Proc. Mobicom. 2012 (see p. 29)
[21] N. Bansal, A. Blum, S. Chawla, and A. Meyerson. “Approximation algorithms for
deadline-TSP and vehicle routing with time-windows”. Proc. STOC. 2004 (see p. 94)
[22] R. Bar-Yehuda, G. Even, and S. M. Shahar. On approximating a geometric prize-collecting
traveling salesman problem with time windows. Journal of Algorithms, 55:1
(2005), 76 – 92 (see p. 94)
[23] A. Barbato, A. Capone, L. Chen, F. Martignon, and S. Paris. “A Power Scheduling
Game for Reducing the Peak Demand of Residential Users”. Proc. GreenComm. 2013 (see
pp. 9, 10, 65)
[24] A. Barbato, A. Capone, L. Chen, F. Martignon, and S. Paris. “Distributed Demand-Side
Management in Smart Grid: How Imitation improves power scheduling”. Proc. ICC. 2015
(see pp. 9, 10, 65)
[25] A. Barbato, A. Capone, L. Chen, F. Martignon, and S. Paris. “Distributed Learning
Algorithms for Scheduling Games in the Future Smart Grid (Extended abstract)”. Proc.
NetGCoop. 2014 (see pp. 9, 10, 65)
[26] A. Barbato, A. Capone, L. Chen, F. Martignon, and S. Paris. A distributed demand-side
management framework for the smart grid. Computer Communications, 57:2 (2015),
13 – 24 (see pp. 9, 10, 65)
[27] L. Barletta, F. Borgonovo, and M. Cesana. A formal proof of the optimal frame setting
for Dynamic-Frame Aloha with known population size. IEEE Transactions on Information
Theory, 60:11 (2014), 7221 – 7230 (see p. 109)
[28] M. Basseville and I. V. Nikiforov. Detection of abrupt changes: theory and application.
Prentice Hall Englewood Cliffs, 1993 (see p. 123)
[29] P. Berenbrink, T. Friedetzky, L. Goldberg, and P. Goldberg. “Distributed Selfish Load
Balancing”. Proc. SODA. 2011 (see p. 63)
[30] K. Bian, J.-M. Park, and R. Chen. “A quorum-based framework for establishing control
channels in dynamic spectrum access networks”. Proc. Mobicom. 2009 (see p. 16)
[31] K. Bian, J.-M. Park, and R. Chen. “Asynchronous channel hopping for establishing rendezvous
in CRNs”. Proc. INFOCOM. 2011 (see p. 16)
[32] B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications
of the ACM, 13:7 (1970), 422 – 426 (see pp. 107, 128, 130)
[33] A. Blum, S. Chawla, D. Karger, T. Lane, A. Meyerson, and M. Minkoff. Approximation
Algorithms for Orienteering and Discounted-Reward TSP. SIAM Journal of Computing, 37:2
(2007), 653 – 670 (see pp. 85, 90, 91, 93, 94)
[34] C. Bordenave, D. McDonald, and A. Proutiere. Asymptotic stability region of slotted
Aloha. IEEE Transactions on Information Theory, 58:9 (2012), 5841 – 5855 (see pp. 108,
109)
[35] E. Brodsky and B. S. Darkhovsky. Nonparametric methods in change point problems.
Springer Science & Business Media, 1993 (see p. 122)
[36] C. Chekuri and M. Pal. “A recursive greedy algorithm for walks in directed graphs”. Proc.
FOCS. 2005 (see p. 101)
[37] C. Chekuri, G. Even, and G. Kortsarz. A greedy approximation algorithm for the group
steiner problem. Discrete Applied Mathematics, 154:1 (2006), 15 – 34 (see p. 101)
[38] C. Chekuri, N. Korula, and M. Pál. Improved algorithms for orienteering and related
problems. ACM Transactions on Algorithms, 8:3 (2012), 1 – 27 (see p. 94)
[39] B. Chen, Z. Zhou, and H. Yu. “Understanding RFID counting protocols”. Proc. Mobicom.
2013 (see pp. 127, 134)
[40] L. Chen and K. Bian. The Telephone Coordination Game Revisited: From Random to
Deterministic Algorithms. IEEE Transactions on Computers, 64:10 (2015), 2968 – 2980 (see
pp. 8, 9, 14, 19, 22, 27)
[41] L. Chen, K. Bian, and M. Zheng. Never Live Without Neighbors: From Single- to Multi-Channel
Neighbor Discovery for Mobile Sensing Applications. IEEE/ACM Transactions on
Networking, 24:5 (2016), 3148 – 3161 (see pp. 8, 9, 14)
[42] L. Chen, S. Iellamo, and M. Coupechoux. “Opportunistic Spectrum Access with Channel
Switching Cost for CRNs”. Proc. ICC. 2011 (see pp. 9, 42)
[43] L. Chen, Y. Li, and A. V. Vasilakos. “Oblivious neighbor discovery for wireless devices
with directional antennas”. Proc. INFOCOM (extended version to appear in IEEE/ACM
Transactions on Networking). 2016 (see pp. 8, 9, 14)
[44] L. Chen, S. Iellamo, M. Coupechoux, and P. Godlewski. “An Auction Framework for
Spectrum Allocation with Interference Constraint in CRNs”. Proc. INFOCOM. 2010 (see
p. 137)
[45] L. Chen, R. Fan, K. Bian, L. Chen, M. Gerla, T. Wang, and X. Li. “On heterogeneous
neighbor discovery in WSNs”. Proc. INFOCOM. 2015 (see pp. 8, 9, 14)
[46] L. Chen, W. Wang, H. Huang, and S. Lin. On Time-Constrained Data Harvesting in WSNs:
Approximation Algorithm Design. IEEE/ACM Transactions on Networking, 24:5 (2016),
3123 – 3135 (see pp. 10, 84, 88, 89, 92)
[47] L. Chen, S. Iellamo, M. Coupechoux, and P. Godlewski. Spectrum auction with interference
constraint for CRNs with multiple primary and secondary users. Wireless Networks,
17:5 (2011), 1355 – 1371 (see p. 137)
[48] L. Chen, W. Wang, H. Huang, and S. Lin. “Time-constrained data harvesting in WSNs:
Theoretical foundation and algorithm design”. Proc. INFOCOM. 2015 (see pp. 10, 84)
[49] L. Chen and K. Bian. Neighbor Discovery in Mobile Sensing Applications. Ad Hoc Networks,
48:9 (2016), 38 – 52 (see pp. 8, 9, 14)
[50] L. Chen, K. Bian, and M. Zheng. “Heterogeneous Multi-channel Neighbor Discovery
for Mobile Sensing Applications: Theoretical Foundation and Protocol Design”. Proc.
ACM MobiHoc. 2014 (see pp. 8, 9, 14, 34)
[51] L. Chen, S. Lin, and H. Huang. “Charge Me if You Can: Charging Path Optimization and
Scheduling in Mobile Networks”. Proc. MobiHoc. 2016 (see pp. 10, 84, 98, 100)
[52] L. Chen, K. Bian, L. Chen, C. Liu, J.-M. Park, and X. Li. “A Group-theoretic Framework
for Rendezvous in Heterogeneous CRNs”. Proc. ACM MobiHoc. 2014 (see pp. 8, 9, 14)
[53] L. Chen, W. Wang, H. Huang, and S. Lin. “Time-constrained data harvesting in WSNs:
Theoretical foundation and algorithm design”. Proc. INFOCOM. 2015 (see p. 94)
[54] S. Chen, M. Dippel, A. Russell, A. Samanta, and R. Sundaram. “Markovian Hitters and
the Complexity of Blind Rendezvous”. Proc. SODA. 2016 (see pp. 39, 140)
[55] X. Chen and J. Huang. “Spatial spectrum access game: Nash equilibria and distributed
learning”. Proc. MobiHoc. 2012 (see p. 63)
[56] Z. Chen, L. Huang, and L. Chen. “ITSEC: An information-theoretically secure framework
for truthful spectrum auctions”. Proc. INFOCOM. 2015 (see p. 138)
[57] Z. Chen, L. Chen, L. Huang, and H. Zhong. “On Privacy-Preserving Cloud Auction”. Proc.
SRDS. 2016 (see p. 138)
[58] Z. Chen, L. Chen, L. Huang, and H. Zhong. “Towards Secure Spectrum Auction: Both
Bids and Bidder Locations Matter (Extended abstract)”. Proc. MobiHoc. 2016 (see p. 138)
[59] D. Ciullo, G. Celik, and E. Modiano. “Minimizing transmission energy in sensor
networks via trajectory control”. Proc. WiOpt. 2010 (see pp. 84, 94)
[60] B. Coltin and M. Veloso. “Mobile robot task allocation in hybrid WSNs”. Proc. IROS.
2010 (see p. 83)
[61] J. Czyzowicz, A. Labourel, and A. Pelc. “How to Meet Asynchronously (Almost)
Everywhere”. Proc. SODA. 2010 (see p. 16)
[62] H. Dai, G. Chen, C. Wang, S. Wang, X. Wu, and F. Wu. Quality of Energy Provisioning
for Wireless Power Transfer. IEEE Transactions on Parallel and Distributed Systems, 26:2
(2015), 527 – 537 (see p. 94)
[63] G. De Marco, L. Gargano, E. Kranakis, D. Krizanc, A. Pelc, and U. Vaccaro.
Asynchronous deterministic rendezvous in graphs. Theoretical Computer Science, 355:3 (2006),
315 – 326 (see p. 16)
[64] A. Dessmark, P. Fraigniaud, D. R. Kowalski, and A. Pelc. Deterministic Rendezvous in
Graphs. Algorithmica, 46:1 (2006), 69 – 96 (see p. 16)
[65] L. Di Puglia Pugliese, F. Guerriero, D. Zorbas, and T. M. Razafindralambo.
Modelling the mobile target covering problem using flying drones. Optimization Letters, 10:5
(2016), 1021 – 1052 (see p. 83)
[66] A. Dumitrescu and J. Mitchell. “Approximation algorithms for TSP with neighborhoods
in the plane”. Proc. SODA. 2001 (see p. 84)
[67] P. Dutta and D. Culler. “Practical asynchronous neighbor discovery and rendezvous for
mobile sensing applications”. Proc. SenSys. 2008 (see pp. 29, 32)
[68] N. Ehsan and M. Liu. “On the optimality of an index policy for bandwidth allocation with
delayed state observation and differentiated services”. Proc. INFOCOM. 2004 (see p. 44)
[69] E. Ekici, Y. Gu, and D. Bozdag. Mobility-based communication in WSNs. IEEE
Communications Magazine, 44:7 (2006), 56 – 62 (see pp. 84, 94)
[70] J. Elias, F. Martignon, L. Chen, and M. Krunz. Distributed Spectrum Management in
TV White Space Networks. IEEE Transactions on Vehicular Technology (to appear), (2016)
(see pp. 9, 10, 65)
[71] J. Elias, F. Martignon, L. Chen, and E. Altman. Joint Operator Pricing and Network
Selection Game in CRNs: Equilibrium, System Dynamics and Price of Anarchy. IEEE
Transactions on Vehicular Technology, 62:9 (2013), 4576 – 4589 (see pp. 9, 10, 65)
[72] G. Ellison. Basins of Attraction, Long-Run Stochastic Stability, and the Speed of
Step-by-Step Evolution. Review of Economic Studies, 67:1 (2000), 17 – 45 (see pp. 64, 75 – 78)
[73] EPCglobal Inc. Radio-Frequency Identity Protocols Class-1 Generation-2 UHF RFID
Protocol for Communications at 860 MHz - 960 MHz Version 1.0.9 (2005) (see pp. 107, 108,
127)
[74] F. Farhadi and F. Ashtiani. Stability Region of a Slotted Aloha Network with K-Exponential
Backoff. arXiv preprint:1406.4448, (2014) (see pp. 108, 109)
[75] National Retail Federation. National Retail Security Survey. 2015 (see p. 126)
[76] K. Finkenzeller. RFID handbook: Radio frequency identification fundamentals and
applications. John Wiley & Sons, 2000 (see p. 117)
[77] D. Foster and P. Young. Stochastic evolutionary game dynamics. Theoretical Population
Biology, 38:2 (Oct. 1990), 219 – 232 (see pp. 64, 75)
[78] D. P. Foster and H. P. Young. Regret testing: learning to play Nash equilibrium without
knowing you have an opponent. Theoretical Economics, 1:3 (2006), 341 – 367 (see p. 64)
[79] J. Friedman and C. Mezzetti. Learning in Games by Random Sampling. Journal of
Economic Theory, 98:1 (May 2001), 55 – 84 (see pp. 64, 78)
[80] A. Ganesh, S. Lilienthal, D. Manjunath, A. Proutiere, and F. Simatos. “Load
Balancing via Random Local Search in Closed and Open Systems”. Proc. SIGMETRICS. 2010 (see
p. 63)
[81] S. Ghez, S. Verdu, and S. C. Schwartz. Optimal decentralized control in the random
access multipacket channel. IEEE Transactions on Automatic Control, 34:11 (1989), 1153 –
1163 (see p. 109)
[82] S. Ghez, S. Verdu, and S. C. Schwartz. Stability properties of slotted Aloha with
multipacket reception capability. IEEE Transactions on Automatic Control, 33:7 (1988), 640 – 649
(see pp. 109, 111)
[83] J. C. Gittins and D. Jones. A Dynamic Allocation Index For the Sequential Design of
Experiments. Progress in Statistics, (1974), 241 – 266 (see p. 43)
[84] S. Guha and K. Munagala. “Approximation algorithms for partial-information based
stochastic control with markovian rewards”. Proc. FOCS. 2007 (see p. 44)
[85] S. Guha and K. Munagala. “Approximation algorithms for restless bandit problems”. Proc.
SODA. 2009 (see p. 44)
[86] F. Gustafsson. Adaptive filtering and change detection. Wiley New York, 2000 (see p. 122)
[87] A. Haghani and S. Jung. A Dynamic Vehicle Routing Problem with Time-dependent Travel
Times. Computers and Operations Research, 32:11 (2005), 2959 – 2986 (see p. 94)
[88] H. Han, B. Sheng, C. C. Tan, Q. Li, W. Mao, and S. Lu. “Counting RFID tags efficiently
and anonymously”. Proc. INFOCOM. 2010 (see pp. 113, 127)
[89] K. Han, J. Luo, Y. Liu, and A. Vasilakos. Algorithm design for data communications
in duty-cycled wireless sensor networks: a survey. IEEE Communications Magazine, 51:7
(2013), 107 – 113 (see p. 28)
[90] L. He, L. Fu, L. Zheng, Y. Gu, P. Cheng, J. Chen, and J. Pan. “Esync: An energy
synchronized charging protocol for rechargeable WSNs”. Proc. MobiHoc. 2014 (see p. 94)
[91] C. Helvig, G. Robins, and A. Zelikovsky. The moving-target traveling salesman problem.
Journal of Algorithms, 49:1 (2003), 153 – 174 (see p. 95)
[92] Y. Hu, X. Wang, and X. Gan. “Critical sensing range for mobile heterogeneous camera
sensor networks”. Proc. INFOCOM. 2014 (see p. 94)
[93] H. Huang, S. Lin, L. Chen, J. Gao, A. Mamat, and J. Wu. “Dynamic Mobile Charger
Scheduling in Heterogeneous WSNs”. Proc. MASS. 2015 (see pp. 10, 84)
[94] S. Iellamo, L. Chen, and M. Coupechoux. “Retrospective spectrum access protocol: A
payoff-based learning algorithm for CRNs”. Proc. ICC. 2014 (see pp. 9, 10, 65)
[95] S. Iellamo, L. Chen, M. Coupechoux, and A. V. Vasilakos. “Imitation-based spectrum
access policy for CRNs”. Proc. ISWCS. 2012 (see pp. 9, 10, 65)
[96] S. Iellamo, L. Chen, and M. Coupechoux. Proportional and double imitation rules for
spectrum access in CRNs. Computer Networks, 57:8 (2013), 1863 – 1879 (see pp. 9, 10, 64,
65, 70, 72, 74)
[97] Z. Ismail, J. Leneutre, D. Bateman, and L. Chen. A Game Theoretical Analysis of Data
Confidentiality Attacks on Smart-Grid AMI. IEEE Journal on Selected Areas in
Communications, 32:7 (2014), 1486 – 1499 (see p. 139)
[98] Z. Ismail, J. Leneutre, D. Bateman, and L. Chen. “A Game-Theoretical Model for
Security Risk Management of Interdependent ICT and Electrical Infrastructures”. Proc. HASE.
2015 (see p. 139)
[99] Z. Ismail, J. Leneutre, D. Bateman, and L. Chen. “A Methodology to Apply a Game
Theoretic Model of Security Risks Interdependencies Between ICT and Electric Infrastructures”.
Proc. GameSec. 2016 (see p. 139)
[100] Z. Ismail, C. Kiennert, J. Leneutre, and L. Chen. Auditing a Cloud Provider’s
Compliance With Data Backup Requirements: a Game Theoretical Analysis. IEEE Transactions on
Information Forensics and Security, 11:8 (2016), 1685 – 1699 (see p. 139)
[101] P. Jacko. Value of information in optimal flow-level scheduling of users with Markovian
time-varying channels. Performance Evaluation, 68:11 (2011), 1022 – 1036 (see p. 44)
[102] J. Jeon and A. Ephremides. On the Stability of Random Multiple Access With Stochastic
Energy Harvesting. IEEE Journal on Selected Areas in Communications, 33:3 (2015), 571 –
584 (see p. 109)
[103] J.-R. Jiang. Expected quorum overlap sizes of quorum systems for asynchronous
power-saving in mobile ad hoc networks. Computer Networks, 52:17 (2008), 3296 – 3306 (see
p. 29)
[104] N. Johnson and S. Kotz. Urn models and their application: an approach to modern discrete
probability theory. Wiley, 1977 (see p. 109)
[105] O. Jonathan. A Continuous-Time Markov Decision Process for Infrastructure Surveillance.
Proc. Operations Research, (2010), 327 – 332 (see p. 44)
[106] A. Kandhalu, K. Lakshmanan, and R. Rajkumar. “U-connect: a low-latency
energy-efficient asynchronous neighbor discovery protocol”. Proc. IPSN. 2010 (see pp. 29, 32)
[107] N. Karowski, A. C. Viana, and A. Wolisz. Optimized Asynchronous Multi-channel
Discovery of IEEE 802.15.4-based Wireless Personal Area Networks. IEEE Transactions on
Mobile Computing, 12:10 (2013), 1972 – 1985 (see p. 28)
[108] D. Kim, B. Abay, R. N. Uma, W. Wu, W. Wang, and A. Tokuta. “Minimizing data
collection latency in WSNs with multiple mobile elements”. Proc. INFOCOM. 2012 (see pp. 84,
91 – 93)
[109] M. Kodialam and T. Nandagopal. “Fast and reliable estimation schemes in RFID systems”.
Proc. MobiCom. 2006 (see pp. 113, 118)
[110] M. Kodialam, T. Nandagopal, and W. C. Lau. “Anonymous tracking using RFID tags”.
Proc. INFOCOM. 2007 (see pp. 107, 113)
[111] V. F. Kolchin, B. A. Sevastyanov, and V. P. Chistyakov. Random allocation. Wiley New
York, 1978 (see p. 118)
[112] S. C. Kompalli and R. R. Mazumdar. On the stability of finite queue slotted Aloha
protocol. IEEE Transactions on Information Theory, 59:10 (2013), 6357 – 6366 (see pp. 108,
109)
[113] T. F. La Porta, G. Maselli, and C. Petrioli. Anticollision protocols for single-reader
RFID systems: temporal analysis and optimization. IEEE Transactions on Mobile Computing,
10:2 (2011), 267 – 279 (see p. 126)
[114] S. Lai. Heterogenous Quorum-based Wakeup Scheduling for Duty-Cycled Wireless Sensor
Networks. PhD thesis. Virginia Polytechnic Institute and State University, 2009 (see p. 29)
[115] T. L. Lai and H. Robbins. Asymptotically Efficient Adaptive Allocation Rules. Advances in
Applied Mathematics, 6:1 (1985), 4 – 22 (see p. 45)
[116] F. E. Lapiccirella, K. Liu, and Z. Ding. “Multi-Channel Opportunistic Access Based on
Primary ARQ Messages Overhearing”. Proc. ICC. 2011 (see pp. 41, 44)
[117] G. Laporte. The traveling salesman problem: An overview of exact and approximate
algorithms. European Journal of Operational Research, 59:2 (1992), 231 – 247 (see p. 94)
[118] J. Le Ny, M. Dahleh, and E. Feron. “Multi-UAV dynamic routing with partial observations
using restless bandit allocation indices”. Proc. ACC. 2008 (see p. 44)
[119] T. Li, S. Chen, and Y. Ling. “Identifying the missing tags in a large RFID system”. Proc.
MobiHoc. 2010 (see pp. 126, 127)
[120] T. Li, S. Wu, S. Chen, and M. Yang. “Energy efficient algorithms for the RFID estimation
problem”. Proc. INFOCOM. 2010 (see p. 107)
[121] Z. Y. Lin, H. Liu, X. W. Chu, and Y.-W. Leung. “Jump-stay based channel hopping
algorithm with guaranteed rendezvous for CRNs”. Proc. INFOCOM. 2011 (see p. 16)
[122] H. Liu, K. Liu, and Q. Zhao. “Learning and Sharing in A Changing World: Non-Bayesian
Restless Bandit with Multiple Players”. Proc. ITA. 2011 (see p. 45)
[123] H. Liu, K. Liu, and Q. Zhao. “Logarithmic Weak Regret of Non-Bayesian Restless
Multi-Armed Bandit”. Proc. ICASSP. 2011 (see p. 45)
[124] J. Liu, B. Xiao, S. Chen, F. Zhu, and L. Chen. “Fast RFID grouping protocols”. Proc.
INFOCOM. 2015 (see pp. 127, 134)
[125] K. Liu and Q. Zhao. Distributed learning in multi-armed bandit with multiple players.
IEEE Transactions on Signal Processing, 58:11 (2010), 5667 – 5681 (see pp. 41, 44,
45)
[126] K. Liu and Q. Zhao. Indexability of Restless Bandit Problems and Optimality of Whittle
Index for Dynamic Multichannel Access. IEEE Transactions on Information Theory, 56:11
(2010), 5547 – 5567 (see p. 44)
[127] K. Liu, Q. Zhao, and B. Krishnamachari. Dynamic Multichannel Access With Imperfect
Channel State Detection. IEEE Transactions on Signal Processing, 58:5 (2010), 2795 – 2807
(see pp. 51, 52)
[128] Q. Liu, K. Wang, and L. Chen. On Optimality of Greedy Policy for a Class of Standard
Reward Function of Restless Multi-armed Bandit Problem. IET Transactions on Signal
Processing, 6:6 (2012), 584 – 593 (see p. 52)
[129] X. Liu, B. Xiao, S. Zhang, and K. Bu. Unknown tag identification in large RFID systems:
An efficient and complete solution. IEEE Transactions on Parallel and Distributed Systems,
26:6 (2015), 1775 – 1788 (see p. 134)
[130] X. Liu, K. Li, G. Min, Y. Shen, A. X. Liu, and W. Qu. Completely pinpointing the missing
RFID Tags in a time-efficient way. IEEE Transactions on Computers, 64:1 (2015), 87 – 96
(see p. 126)
[131] X. Liu, K. Li, G. Min, K. Lin, B. Xiao, Y. Shen, and W. Qu. Efficient unknown tag
identification protocols in large-scale RFID systems. IEEE Transactions on Parallel and Distributed
Systems, 25:12 (2014), 3145 – 3155 (see p. 108)
[132] W. Luo, Y. Qiao, and S. Chen. “An efficient protocol for RFID multigroup threshold-based
classification”. Proc. INFOCOM. 2013 (see p. 134)
[133] W. Luo, S. Chen, Y. Qiao, and T. Li. Missing-Tag Detection and Energy-Time Tradeoff in
Large-Scale RFID Systems With Unreliable Channels. IEEE/ACM Transactions on
Networking, 22:4 (2014), 1079 – 1091 (see p. 126)
[134] W. Luo, S. Chen, T. Li, and Y. Qiao. “Probabilistic missing-tag detection and energy-time
tradeoff in large-scale RFID systems”. Proc. MobiHoc. 2012 (see p. 126)
[135] M. Ma and Y. Yang. SenCar: An Energy-Efficient Data Gathering Mechanism for
Large-Scale Multihop Sensor Networks. IEEE Transactions on Parallel and Distributed Systems,
18:10 (2007), 1476 – 1488 (see p. 84)
[136] J. R. Marden, H. P. Young, G. Arslan, and J. S. Shamma. Payoff-Based Dynamics for
Multiplayer Weakly Acyclic Games. SIAM Journal on Control and Optimization, 48:1 (2009),
373 – 396 (see p. 64)
[155] S. Paris, F. Martignon, I. Filippini, and L. Chen. “A bandwidth trading marketplace for
mobile data offloading”. Proc. INFOCOM. 2013 (see p. 137)
[156] S. Paris, F. Martignon, I. Filippini, and L. Chen. An Efficient Auction-based Mechanism
for Mobile Data Offloading. IEEE Transactions on Mobile Computing, 14:8 (2015), 1573 –
1586 (see p. 137)
[157] Y. Peng, Z. Li, W. Zhang, and D. Qiao. “Prolonging sensor network lifetime through
wireless charging”. Proc. RTSS. 2010 (see p. 94)
[158] B. S. Pradelski and H. P. Young. Learning efficient Nash equilibria in distributed systems.
Games and Economic Behavior, 75:2 (2012), 882 – 897 (see p. 64)
[159] Z. G. Prodanoff. Optimal frame size analysis for framed slotted Aloha based RFID networks.
Computer Communications, 33:5 (2010), 648 – 653 (see p. 109)
[160] C. Qian, H. Ngan, Y. Liu, and L. M. Ni. Cardinality estimation for large-scale RFID
systems. IEEE Transactions on Parallel and Distributed Systems, 22:9 (2011), 1441 – 1454
(see pp. 107, 113)
[161] H. Qin and W. Zhang. “Charging Scheduling with Minimal Waiting in a Network of
Electric Vehicles and Charging Stations”. Proc. VANET. 2011 (see p. 83)
[162] V. Raghunathan, V. Borkar, C. Min, and P. Kumar. “Index Policies for Real-Time
Multicast Scheduling for Wireless Broadcast Systems”. Proc. INFOCOM. 2008 (see p. 44)
[163] R. R. Rao and A. Ephremides. On the stability of interacting queues in a multiple-access
system. IEEE Transactions on Information Theory, 34:5 (1988), 918 – 930 (see p. 109)
[164] J. Reich, V. Misra, D. Rubenstein, and G. Zussman. Connectivity maintenance in mobile
wireless networks via constrained mobility. IEEE Journal on Selected Areas in
Communications, 30:5 (2012), 935 – 950 (see p. 94)
[165] K. Reif, S. Günther, E. Yaz Sr., and R. Unbehauen. Stochastic stability of the
discrete-time extended Kalman filter. IEEE Transactions on Automatic Control, 44:4 (1999), 714 –
728 (see p. 116)
[166] M. B. Rhudy and Y. Gu. Online Stochastic Convergence Analysis of the Kalman Filter.
International Journal of Stochastic Analysis, (2013) (see p. 117)
[167] L. G. Roberts. Aloha packet system with and without slots and capture. ACM SIGCOMM
Computer Communication Review, 5:2 (1975), 28 – 42 (see p. 108)
[168] M. Roberti. Wal-Mart begins RFID process changes. RFID Journal, (2005) (see p. 107)
[169] L. Samuelson and J. Zhang. Evolutionary Stability in Asymmetric Games. Journal of
Economic Theory, 57:2 (1992), 363 – 391 (see p. 68)
[170] W. H. Sandholm. Local Stability under Evolutionary Game Dynamics. Theoretical
Economics, 5:1 (2010), 27 – 50 (see p. 67)
[171] J. Sant and V. Sharma. Performance analysis of a slotted-Aloha protocol on a capture
channel with fading. Queueing Systems, 34:1 (2000), 1 – 35 (see p. 109)
[172] V. Sarangan, M. Devarapalli, and S. Radhakrishnan. A framework for fast RFID tag
reading in static and mobile environments. Computer Networks, 52:5 (2008), 1058 – 1073
(see p. 114)
[173] K. H. Schlag. Which One Should I Imitate? Journal of Mathematical Economics, 31:4
(1999), 493 – 522 (see p. 68)
[174] K. H. Schlag. Why Imitate, and if so, How? A Boundedly Rational Approach to
Multi-Armed Bandits. Journal of Economic Theory, 78:1 (1998), 130 – 156 (see p. 66)
[175] F. C. Schoute. Dynamic frame length Aloha. IEEE Transactions on Communications, 31:4
(1983), 565 – 568 (see p. 109)
[176] R. Shah, S. Roy, S. Jain, and W. Brunette. “Data MULEs: Modeling a Three-tier
Architecture for Sparse Sensor Networks”. Proc. SNPA Workshop. 2003 (see p. 82)
[177] M. Shahzad and A. X. Liu. “Every bit counts: fast and scalable RFID estimation”. Proc.
MobiCom. 2012 (see pp. 107, 113, 127)
[178] M. Shahzad and A. X. Liu. “Expecting the unexpected: Fast and reliable detection of
missing RFID tags in the wild”. Proc. INFOCOM. 2015 (see pp. 126, 134)
[179] M. Shahzad and A. X. Liu. “Probabilistic optimal tree hopping for RFID identification”.
Proc. SIGMETRICS. 2013 (see p. 126)
[180] Y. Song and J. W. Grizzle. “The extended Kalman filter as a local asymptotic observer
for nonlinear discrete-time systems”. Proc. ACC. 1992 (see pp. 107, 114)
[181] F. Spiring. Introduction to Statistical Quality Control. Technometrics, 49:1 (2007), 108 –
109 (see p. 123)
[182] R. Sugihara and R. K. Gupta. Speed Control and Scheduling of Data Mules in Sensor
Networks. ACM Transactions on Sensor Networks, 7:1 (2010), 1 – 29 (see pp. 84, 94)
[183] G. Sun, F. Wu, and G. Chen. “Neighbor Discovery in Low-Duty-Cycle Wireless Sensor
Networks with Multipacket Reception”. Proc. ICPADS. 2012 (see p. 28)
[184] W. Sun, Z. Yang, K. Wang, and Y. Liu. “Hello: A generic flexible protocol for neighbor
discovery”. Proc. INFOCOM. 2014 (see p. 29)
[185] W. Szpankowski. Stability conditions for some distributed systems: Buffered random
access systems. Advances in Applied Probability, 26:2 (1994), 498 – 515 (see p. 109)
[186] A. Ta-Shma and U. Zwick. “Deterministic rendezvous, treasure hunts and strongly
universal exploration sequences”. Proc. SODA. 2007 (see p. 16)
[187] C. C. Tan, B. Sheng, and Q. Li. “How to monitor for missing RFID tags”. Proc. ICDCS.
2008 (see p. 126)
[188] T.-J. Tarn and Y. Rasis. Observers for nonlinear stochastic systems. IEEE Transactions on
Automatic Control, 21:4 (1976), 441 – 448 (see pp. 116, 117)
[189] C. Tekin and M. Liu. “Online learning in opportunistic spectrum access: a restless bandit
approach”. Proc. INFOCOM. 2011 (see p. 45)
[190] N. Theis, R. Thomas, and L. Da Silva. Rendezvous for cognitive radios. IEEE Transactions
on Mobile Computing, 10:2 (2011), 216 – 227 (see p. 16)
[191] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in
View of the Evidence of Two Samples. Biometrika, 25 (1933), 275 – 294 (see p. 43)
[192] D. Tone. The evolution of conventions with mobile players. Journal of Economic Behavior
& Organization, 38:1 (1999), 93 – 111 (see p. 64)
[193] P. Toth and D. Vigo, eds. The Vehicle Routing Problem. MOS/SIAM Series on Optimization,
2001 (see p. 94)
[194] Y.-C. Tseng, C.-S. Hsu, and T.-Y. Hsieh. “Power-saving protocols for IEEE 802.11-based
multi-hop ad hoc networks”. Proc. INFOCOM. 2002 (see p. 29)
[195] B. S. Tsybakov and V. A. Mikhailov. Ergodicity of a slotted Aloha system. Problemy
Peredachi Informatsii, 15:4 (1979), 73 – 87 (see p. 109)
[196] I. Vasilescu, K. Kotay, D. Rus, M. Dunbabin, and P. Corke. “Data Collection, Storage,
and Retrieval with an Underwater Sensor Network”. Proc. SenSys. 2005 (see p. 82)
[197] S. Vasudevan, D. Towsley, D. Goeckel, and R. Khalili. “Neighbor discovery in wireless
networks and the coupon collector’s problem”. Proc. MobiCom. 2009 (see pp. 28, 108)
[198] F. Vázquez Gallego, J. Alonso-Zarate, and L. Alonso. “Energy and delay analysis of
contention resolution mechanisms for machine-to-machine networks based on low-power
WiFi”. Proc. ICC. 2013 (see p. 108)
[199] P.-J. Wan, K. Alzoubi, and O. Frieder. “Distributed construction of connected dominating
set in wireless ad hoc networks”. Proc. INFOCOM. 2002 (see p. 91)
[200] K. Wang and L. Chen. On Optimality of Myopic Policy for Restless Multi-Armed Bandit
Problem: An Axiomatic Approach. IEEE Transactions on Signal Processing, 60:1 (2012),
300 – 309 (see pp. 9, 42, 51)
[201] K. Wang, L. Chen, and Q. Liu. On Optimality of Myopic Policy for Opportunistic
Access With Nonidentical Channels and Imperfect Sensing. IEEE Transactions on Vehicular
Technology, 63:5 (2014), 2478 – 2483 (see pp. 9, 42, 48)
[202] K. Wang, L. Chen, and Q. Liu. Opportunistic Spectrum Access by Exploiting Primary User
Feedbacks in Underlay Cognitive Radio Systems: An Optimality Analysis. IEEE Journal of
Selected Topics in Signal Processing, 7:5 (2013), 869 – 882 (see pp. 9, 42)
[203] K. Wang, L. Chen, and J. Yu. “On optimality of myopic policy in multi-channel
opportunistic access”. Proc. ICC. 2016 (see pp. 9, 42)
[204] K. Wang, F. C. M. Lau, L. Chen, and R. Schober. “A distributed market framework for
mobile data offloading”. Proc. ICC. 2015 (see p. 137)
[205] K. Wang, L. Chen, Q. Liu, J. Yu, Q. Fan, and Q. Ai. On Optimality of Myopic Policy
in Multi-channel Opportunistic Access. IEEE Transactions on Communications (to appear),
(2016) (see pp. 9, 42)
[206] K. Wang, L. Chen, Q. Liu, and K. A. Agha. On Optimality of Myopic Sensing Policy with
Imperfect Sensing in Multi-Channel Opportunistic Access. IEEE Transactions on
Communications, 61:9 (2013), 3854 – 3862 (see pp. 9, 42, 51)
[207] K. Wang, L. Chen, Q. Liu, W. Wang, and F. Li. One Step Beyond Myopic Probing Policy:
A Heuristic Lookahead Policy for Multi-Channel Opportunistic Access. IEEE Transactions on
Wireless Communications, 14:2 (2015), 759 – 769 (see pp. 9, 42, 57, 59)
[208] K. Wang, F. C. M. Lau, L. Chen, and R. Schober. Pricing Mobile Data Offloading: A
Distributed Market Framework. IEEE Transactions on Wireless Communications, 15:2 (2016),
913 – 927 (see p. 137)
[209] K. Wang, L. Chen, K. A. Agha, and Q. Liu. On Optimality of Myopic Policy in
Opportunistic Spectrum Access: The Case of Sensing Multiple Channels and Accessing One Channel.
IEEE Wireless Communications Letters, 1:5 (2012), 452 – 455 (see pp. 47, 48, 52)
[210] K. Wang, X. Mao, and Y. Liu. “BlindDate: A Neighbor Discovery Protocol”. Proc. ICPP.
2013 (see p. 29)
[211] R. Weber. Optimal Symmetric Rendezvous Search on Three Locations. Mathematics of
Operations Research, 37:1 (2012), 111 – 122 (see p. 16)
[212] P. Whittle. Restless bandits: activity allocation in a changing world. Journal of Applied
Probability, Special Vol. 25A (1988), 287 – 298 (see pp. 43, 44)
[213] J. E. Wieselthier, A. Ephremides, and L. A. Michaels. An exact analysis and performance
evaluation of framed Aloha with capture. IEEE Transactions on Communications, 37:2 (1989),
125 – 137 (see p. 109)
[214] C. Wu and S. Wu. “On bridging the gap between homogeneous and heterogeneous
rendezvous schemes for cognitive radios”. Proc. MobiHoc. 2013 (see p. 16)
[215] H. Wu, C. Zhu, R. J. La, and X. Liu. FASA: Accelerated S-Aloha Using Access History for
Event-Driven M2M Communications. IEEE/ACM Transactions on Networking, 21:6 (2013),
1904 – 1917 (see p. 108)
[216] Q. Xiao, B. Xiao, and S. Chen. “Differential estimation in dynamic RFID systems”. Proc.
INFOCOM. 2013 (see p. 114)
[217] Q. Xiao, M. C. Zhou, S. Chen, and Y. Qiao. “Temporally or Spatially Dispersed Joint
RFID Estimation Using Snapshots of Variable Lengths”. Proc. ACM MobiHoc. 2015 (see p. 114)
[218] L. Xie, Y. Shi, Y. T. Hou, and H. D. Sherali. Making Sensor Networks Immortal: An
Energy-Renewal Approach With Wireless Power Transfer. IEEE/ACM Transactions on
Networking, 20:6 (2012), 1748 – 1761 (see p. 84)
[219] L. Xie, B. Sheng, C. C. Tan, H. Han, Q. Li, and D. Chen. “Efficient tag identification in
mobile RFID systems”. Proc. INFOCOM. 2010 (see p. 114)
[220] L. Xie, Y. Shi, Y. T. Hou, W. Lou, H. D. Sherali, H. Zhou, and S. F. Midkiff. A Mobile
Platform for Wireless Charging and Data Collection in Sensor Networks. IEEE Journal on
Selected Areas in Communications, 33:8 (2015), 1521 – 1533 (see p. 83)
[221] L. Xie, Y. Shi, Y. T. Hou, W. Lou, and H. D. Sherali. “On Traveling Path and Related
Problems for a Mobile Station in a Rechargeable Sensor Network”. Proc. MobiHoc. 2013
(see pp. 84, 94)
[222] G. Xing, T. Wang, W. Jia, and M. Li. “Rendezvous Design Algorithms for WSNs with a
Mobile Base Station”. Proc. MobiHoc. 2008 (see pp. 84, 94)
[223] L. Xue, D. Kim, Y. Zhu, D. Li, W. Wang, and A. O. Tokuta. “Multiple heterogeneous data
ferry trajectory planning in WSNs”. Proc. INFOCOM. 2014 (see pp. 82, 84)
[224] J. Yang, P. Jaillet, and H. Mahmassani. Real-Time Multivehicle Truckload Pickup and
Delivery Problems. Transportation Science, 38 (2004), 135 – 148 (see p. 94)
[225] H. P. Young. The Evolution of Conventions. Econometrica, 61:1 (1993), 57 – 84 (see pp. 64,
76, 79)
[226] H. P. Young. Learning by trial and error. Games and Economic Behavior, 65:2 (2009),
626 – 643 (see p. 64)
[227] J. Yu and L. Chen. “Stability Analysis of Frame Slotted Aloha Protocol”. Proc. IWQoS
(extended version to appear in IEEE Transactions on Mobile Computing). 2016 (see pp. 10,
11, 108, 110, 111)
[228] J. Yu, L. Chen, R. Zhang, and K. Wang. Finding Needles in a Haystack: Missing Tag
Detection in Large RFID Systems. IEEE Transactions on Communications (to appear), (2017)
(see pp. 10, 11, 108, 134)
[229] J. Yu, L. Chen, R. Zhang, and K. Wang. From Static to Dynamic Tag Population
Estimation: An Extended Kalman Filter Perspective. IEEE Transactions on Communications, 64:11
(2016), 4706 – 4719 (see pp. 10, 11, 108, 125, 126)
[230] J. Yu, L. Chen, R. Zhang, and K. Wang. On Missing Tag Detection in Multiple-group
Multiple-region RFID Systems. IEEE Transactions on Mobile Computing (to appear), (2016)
(see pp. 10, 11, 108)
[231] B. Yuan, M. Orlowska, and S. Sadiq. On the Optimal Robot Routing Problem in WSNs.
IEEE Transactions on Knowledge and Data Engineering, 19:9 (2007), 1252 – 1261 (see p. 84)
[232] W. Zeng, S. Vasudevan, X. Chen, B. Wang, A. Russell, and W. Wei. “Neighbor discovery
in wireless networks with multipacket reception”. Proc. MobiHoc. 2011 (see pp. 28, 108)
[233] D. Zhang, T. He, S. Lin, S. Munir, and J. Stankovic. Online Cruising Mile Reduction in
Large-Scale Taxicab Networks. IEEE Transactions on Parallel and Distributed Systems, 26:11
(2015), 3122 – 3135 (see p. 94)
[234] R. Zhang, Y. Liu, Y. Zhang, and J. Sun. “Fast identification of the missing tags in a large
RFID system”. Proc. SECON. 2011 (see p. 126)
[235] S. Zhang, J. Wu, and S. Lu. Collaborative Mobile Charging. IEEE Transactions on
Computers, 64:3 (2015), 654 – 667 (see pp. 84, 94)
[236] Y. Zhang, Q. Li, G. Yu, and B. Wang. “Etch: efficient channel hopping for communication
rendezvous in dynamic spectrum access networks”. Proc. INFOCOM. 2011 (see p. 16)
[237] Q. Zhao, B. Krishnamachari, and K. Liu. On Myopic Sensing for Multi-Channel
Opportunistic Access: Structure, Optimality, and Performance. IEEE Transactions on Wireless
Communications, 7:3 (2008), 5413 – 5440 (see pp. 41, 44, 47)
[238] R. Zheng, J. C. Hou, and L. Sha. “Asynchronous wakeup for ad hoc networks”. Proc.
MobiHoc. 2003 (see p. 29)
[239] Y. Zheng and M. Li. “ZOE: Fast cardinality estimation for large-scale RFID systems”. Proc.
INFOCOM. 2013 (see pp. 107, 113, 127)
[240] M. Zhu and S. Martínez. Distributed Coverage Games for Energy-Aware Mobile Sensor
Networks. SIAM Journal on Control and Optimization, 51:1 (2013), 1 – 27 (see p. 64)
[241] Y. Zhu, W. Jiang, Q. Zhang, and H. Guan. Energy-efficient identification in large-scale
RFID systems with handheld reader. IEEE Transactions on Parallel and Distributed Systems,
25:5 (2014), 1211 – 1222 (see p. 108)
Publications
Articles in Peer-reviewed Journals
1. D. Zhang, Q. Quan, L. Chen, W. Xu, K. Wang, On-demand Ecology-inspired Spectrum Alloca-
tion for Heterogeneous CRNs, accepted in Telecommunication Systems (TELS), 2017.
2. J. Yu, L. Chen, R. Zhang, K. Wang, Finding Needles in a Haystack: Missing Tag Detection in
Large RFID Systems, accepted in IEEE Transactions on Communications (TCOM), 2017.
3. L. Chen, Y. Li, A. V. Vasilakos, On Oblivious Neighbor Discovery in Distributed Wireless
Networks with Directional Antennas: Theoretical Foundation and Algorithm Design, accepted
in IEEE/ACM Transactions on Networking (ToN), 2017.
4. M. Zheng, L. Chen, W. Liang, H. Yu, J. Wu, Energy-efficiency Maximization for Cooperative
Spectrum Sensing in Cognitive Sensor Networks, accepted in IEEE Transactions on Green
Communications and Networking (TGCN), 2017.
5. M. Koseoglu, E. Karasan, L. Chen, Cross-layer Energy Minimization for Underwater Aloha
Networks, accepted in IEEE Systems Journal (ISJ), 2017.
6. D. Zhang, Q. Liu, L. Chen, K. Wang, Multi-layer Based Multi-path Routing Algorithm for
Maximizing Spectrum Availability, accepted in Springer Wireless Networks (WINET), 2017.
7. X. Zhou, W. Wang, Y. Wang, L. Chen, Z. Zhang, Moderate Incentive Design for Delay-constrained
D2D Relaying, accepted in ACM/Springer Mobile Networks and Applications (MONET), 2017.
8. J. Elias, F. Martignon, L. Chen, M. Krunz, Distributed Spectrum Management in TV White
Space Networks, accepted in IEEE Transactions on Vehicular Technology (TVT), 2017.
9. J. Yu, L. Chen, R. Zhang, K. Wang, On Missing Tag Detection in Multiple-group Multiple-
region RFID Systems, accepted in IEEE Transactions on Mobile Computing (TMC), 2017.
10. J. Yu, L. Chen, Stability Analysis of Frame Slotted Aloha Protocol, accepted in IEEE Transac-
tions on Mobile Computing (TMC), 2017.
11. C. Rottondi, A. Barbato, L. Chen, G. Verticale, Enabling Privacy in a Distributed
Game-theoretical Scheduling System for Domestic Appliances, IEEE Transactions on Smart Grid
(TSG), 8(3):1220 – 1230, May 2017.
12. K. Wang, L. Chen, J. Yu, On Optimality of Myopic Policy in Multi-channel Opportunistic
Access, IEEE Transactions on Communications (TCOM), 65(2):677 – 690, February 2017.
13. M. Zheng, C. Xu, W. Liang, H. Yu, L. Chen, Time-efficient Cooperative Spectrum Sensing via
Analog Computation over Multiple-access Channel, Elsevier Computer Networks (ComNet),
112:84 – 94, January 2017.
14. J. Yu, L. Chen, R. Zhang, K. Wang, From Static to Dynamic Tag Population Estimation: An
Extended Kalman Filter Perspective, IEEE Transactions on Communications (TCOM),
64(11):4706 – 4719, November 2016.
15. L. Chen, K. Bian, Neighbor Discovery in Mobile Sensing Applications: A Comprehensive
tic Lookahead Policy for Multi-channel Opportunistic Access, IEEE Transactions on Wireless
Communications (TWC), 14(2):759 – 769, February 2015.
32. Z. Ismail, J. Leneutre, D. Bateman, L. Chen, A Game Theoretical Analysis of Data
Confidentiality Attacks on Smart Grid AMI, IEEE Journal on Selected Areas in Communications
(JSAC), 32(7):1486 – 1499, August 2014.
33. K. Wang, L. Chen, Q. Liu, On Optimality of Myopic Policy for Opportunistic Access with Non-
identical Channels and Imperfect Sensing, IEEE Transactions on Vehicular Technology (TVT),
63(5):2478 – 2483, June 2014.
34. M. Youssef, M. Ibrahim, M. Abdelatif, L. Chen, A. V. Vasilakos, Routing Metrics of CRNs: A
Survey, IEEE Communications Surveys and Tutorials (CST), 14(1):92 – 109, February 2014,
ESI Highly Cited Paper: total number of citations 200+.
35. J. Elias, F. Martignon, L. Chen, E. Altman, Joint Operator Pricing and Network Selection
Game in CRNs: Equilibrium, System Dynamics and Price of Anarchy, IEEE Transactions on
Vehicular Technology (TVT), 62(9):1 – 14, November 2013.
36. K. Wang, L. Chen, Q. Liu, K. Al Agha, On Optimality of Myopic Sensing Policy with
Imperfect Sensing in Multi-channel Opportunistic Access, IEEE Transactions on Communications
(TCOM), 61(9):3854 – 3862, September 2013.
37. K. Wang, L. Chen, Q. Liu, Opportunistic Spectrum Access by Exploiting Primary User
Feedbacks in Underlay CRNs: An Optimality Analysis, IEEE Journal of Selected Topics in Signal
Processing (JSTSP), 7(5):869 – 882, October 2013.
38. S. Iellamo, L. Chen, M. Coupechoux, Proportional and Double Imitation Rules for
Spectrum Access in CRNs, Elsevier Computer Networks (ComNet), 57(8):1863 – 1879, June 2013.
39. L. Chen, W. Wang, A. S. Anpalagan, A. V. Vasilakos, K. Illanko, H. Wang, M. Naeem, Green
Cooperative Cognitive Communication and Networking: A New Paradigm for Wireless Networks,
ACM/Springer Mobile Networks and Applications (MONET), 18(4):524 – 534, May 2013.
40. K. Wang, Q. Liu, L. Chen, Hierarchical Reversible Data Hiding Based on Statistical
Information: Preventing Embedding Unbalance, Elsevier Signal Processing (SigPro),
92(12):2888 – 2900, December 2012.
41. K. Wang, L. Chen, K. Al Agha, Q. Liu, On Optimality of Myopic Policy in Opportunistic
Spectrum Access: The Case of Sensing Multiple Channels and Accessing One Channel, IEEE
Wireless Communications Letters (WCL), 1(5):452 – 455, October 2012.
42. K. Wang, Q. Liu, L. Chen, Optimality of Greedy Policy for a Class of Standard Reward Function
of Restless Multi-armed Bandit Problem, IET Signal Processing, 6(6):584 – 593, August 2012.
43. M. A. Awal, L. Boukhatem, L. Chen, An Integrated Cross-layer Framework of Adaptive
Feedback Resource Allocation and Prediction for OFDMA Systems, Elsevier Computer Networks
(ComNet), 56(7):1863 – 1875, May 2012.
44. K. Wang, L. Chen, On Optimality of Myopic Policy for Restless Multi-armed Bandit
Problem: An Axiomatic Approach, IEEE Transactions on Signal Processing (TSP), 60(1):300 – 309,
January 2012.
45. L. Chen, J. Leneutre, Fight Jamming with Jamming: A Game Theoretic Analysis of Jamming
Attack in Wireless Networks and Defense Strategy, Elsevier Computer Networks (ComNet),
55(9):2259 – 2270, June 2011.
46. L. Chen, S. Iellamo, M. Coupechoux, Ph. Godlewski, Spectrum Auction with Interference
Constraint for CRNs with Multiple Primary and Secondary Users, Springer Wireless Networks
(WINET), 17(5):1355 – 1371, May 2011.
47. L. Chen, L. Libman, J. Leneutre, Conflicts and Incentives in Wireless Cooperative Relaying: A
Distributed Market Pricing Framework, IEEE Transactions on Parallel and Distributed Systems
(TPDS), 22(5):758 – 772, May 2011.
48. L. Chen, J. Leneutre, A Game Theoretic Framework of Intrusion Detection in Heterogeneous
Networks, IEEE Transactions on Information Forensics and Security (TIFS), 4(2):165 – 178,
June 2009.
49. L. Chen, J. Leneutre, On Multipath Routing in Multihop Wireless Networks: Security,
Performance and Their Tradeoff, EURASIP Journal on Wireless Communications and Networking
(JWCN), Vol. 2009, Article ID 946493.
50. L. Chen, J. Leneutre, A Game Theoretic Framework of Distributed Power and Rate Control in
IEEE 802.11 WLANs, IEEE Journal on Selected Areas in Communications (JSAC), 26(7):1128 –
1137, September 2008.
51. L. Chen, J. Leneutre, Toward Secure and Scalable Time Synchronization in Ad Hoc Networks,
Elsevier Computer Communications (ComCom), 30(11 – 12):2453 – 2467, September 2007.
52. X. Xue, J. Leneutre, L. Chen, J. Ben-Othman, SWAN: A Secured Watchdog for Ad Hoc
Networks, International Journal of Computer Science and Network Security (IJCSNS),
6(2):209 – 218, June 2006.