Robust and Distributed Stochastic Localization in Sensor Networks: Theory and Experimental Results
We present a robust localization system allowing wireless sensor networks to determine the physical location of their nodes. The coverage area is partitioned into regions and we seek to identify the region of a sensor based on observations by stationary clusterheads. Observations (e.g., signal strength) are assumed random. We pose the localization problem as a composite multi-hypothesis testing problem, develop the requisite theory, and address the problem of optimally placing clusterheads. We show that localization decisions can be distributed by appropriate in-network processing. The approach is validated in a testbed, yielding promising results.
Categories and Subject Descriptors: C.2.1 [Network Architecture and Design]: Distributed
networks, Wireless communication.
General Terms: Algorithms, Experimentation, Theory.
Additional Key Words and Phrases: Sensor networks, localization, information theory, hypothesis
testing, optimal deployment, testbed.
1. INTRODUCTION
Localization is viewed as an important service in Wireless Sensor Networks (WSNETs) because it enables a number of innovative services, including asset and
personnel tracking and locating nodes that report a critical event. The Global Po-
sitioning System (GPS) provides an effective localization technology outdoors but
is expensive for many WSNET applications, unreliable in downtown urban areas,
and not operational indoors.
The localization literature is large but we will restrict our attention to systems
that only use RF signals from the sensors to localize. The motivation is that RF is
the common denominator of all WSNET platforms since all sensors have a radio to
Ioannis Ch. Paschalidis is with the Center for Information and Systems Engineering, the
Department of Electrical and Computer Engineering, and the Systems Engineering Division,
Boston University, 15 St. Mary’s St., Brookline, MA 02446, e-mail: [email protected], url: http://ionia.bu.edu/.
Dong Guo is with the Center for Information and Systems Engineering, Boston University, e-mail:
[email protected].
Research partially supported by the NSF under grants DMI-0330171, CNS-0435312, ECS-0426453,
EFRI-0735974, and by the DOE under grant DE-FG52-06NA27490.
A preliminary version of this work has appeared in Paschalidis and Guo [2007].
communicate with each other. Moreover, most existing WSNET nodes carry very
rudimentary hardware that only allows the computation of the signal strength seen
by a receiver for packets transmitted by some other node. Additional RF character-
istics, like time-of-flight or angle-of-arrival, are not commonly available and require
more sophisticated hardware. The key idea underlying RF-based localization is as
follows: when a packet is transmitted by a sensor, associated RF characteristics
observed by stationary sensors – the clusterheads – depend on the location of the
transmitting sensor. These observations are exploited to reveal that location. As
we will see, the method we develop can localize using just RF signal strength but
is general enough to accommodate additional RF characteristics should they be
available.
One class of RF-based localization systems relies on a “deterministic” pattern
matching approach as in Bahl and Padmanabhan [2000], Lorincz and Welsh [2006],
and Kaemarungsi and Krishnamurthy [2004]. They use the signal strength (mean)
values observed at a sensor for packets transmitted by a set of beacon nodes and
compare these values to a pre-computed signal-strength map of the coverage area.
RADAR (Bahl and Padmanabhan [2000]), for instance, one of the first localization
systems developed, computes a Euclidean distance between observed signal strength
values at a sensor and the corresponding values pre-recorded at a set of training
locations to determine the location closest to the sensor. Such an approach may
face challenges when the RF signal landscape is highly variable. This is the case
in indoor environments which are very dynamic (e.g., doors opening and closing,
people moving, etc.) and feature multipath and fading.
Another class of localization systems uses triangulation or stochastic triangula-
tion techniques as in Patwari et al. [2003] where signal strength measurements are
used to estimate distance and location. The approach in Madigan et al. [2005] seeks
to benefit from estimating multiple locations at the same time. These techniques
assume a model describing how signal strength diminishes with distance (path loss
formula) and the modeling error can lead to inaccuracies. In experimental results
we report in this paper our approach can reduce the mean error distance by a factor
of 3.6 compared to stochastic triangulation techniques. A different triangulation-
like approach that may be vulnerable to RF signal variability appeared in Yedavalli
et al. [2005] and relies on a monotonicity property of signal strength as a function
of distance to be satisfied most of the time.
The work closest to the approach we present is Ray et al. [2006], which developed
a stochastic localization system formulating the problem as a standard hypothesis
testing problem. Specifically, signal strength measurements from a number of lo-
cations spread throughout the coverage area are used to obtain probability density
functions (pdfs) of signal strength at every potential clusterhead position. To locate
a sensor somewhere in the coverage area the system tries to “match” measurements
for that sensor to these pdfs, hence, a hypothesis testing problem. A limitation of
this approach is that when the sensor we seek is not close to a location from which
we have measurements, then the observations may not match well with any of the
pdfs leading to errors. One can reduce these errors by obtaining measurements
from more points, but this is costly. This motivates the work in this paper.
The key idea underlying the present work is to partition the coverage area into
a set of regions. The problem is to determine the region where the sensor we seek
resides. To every region-clusterhead pair we associate a family of (signal strength)
pdfs. This is intended to provide robustness with respect to the position of the
sensor within a region. The pdf family can be constructed from measurements
taken from locations within the region and can better represent the region than a
single pdf. We still pose the localization problem as a hypothesis testing problem
but now we have to match signal strength measurements to a pdf family, resulting in a composite hypothesis testing problem. In this new framework we consider
the Generalized Likelihood Ratio Test (GLRT) decision rule and obtain a necessary
and sufficient condition under which it is optimal in a Generalized Neyman-Pearson
(GNP) sense, thus, generalizing earlier work in Zeitouni et al. [1992]. Another im-
portant problem we consider is that of optimally placing clusterheads – an optimal
deployment/WSNET design problem – to minimize the maximum probability of
error.
We further demonstrate that our system can localize in a distributed manner
by appropriate in-network processing: clusterheads make observations and take
local decisions which get processed as they propagate through the network of clus-
terheads. The final decision reaches the gateway and, as we show, there is no
performance cost compared to a centralized approach. We have implemented our
approach in a testbed installed at a Boston University building. 1 Our experimental
results establish that we can achieve accuracy that is, roughly, on the same order
of magnitude as the radius of our regions. Specifically, we have achieved a mean
error distance ranging from 8 feet down to 9 inches depending on the size of the regions
we define. The price to pay for greater accuracy is the amount of measurements
needed as smaller (thus, more) regions require more measurements to determine
the family of pdfs corresponding to every region-clusterhead pair.
Our contributions include:
—formulating the localization problem as a composite hypothesis testing problem
aiming at accommodating the stochastic nature of RF signals propagating in-
doors and providing robustness with respect to measurements based on which a
localization decision is made;
—generalizing the GLRT optimality conditions in Zeitouni et al. [1992] to the case
where both hypotheses correspond to a family of pdfs – a result which is of
independent interest;
—characterizing the performance of the localization system which enables
—solving the clusterhead placement problem building on the work in Ray et al.
[2006];
—devising a distributed algorithm for making the localization decision; and
—testing the proposed approach on an actual testbed.
The paper is organized as follows. In Sec. 2, we introduce our system model.
In Sec. 3, we study the composite binary hypothesis testing problem, establish an
optimality condition for GLRT, and obtain bounds on the error exponents which
allow us to optimize the GLRT threshold. We also consider the case where the
1 See http://pythagoras.bu.edu/bloc/index.html
GLRT optimality conditions are not satisfied. In Sec. 4, we consider the clusterhead
placement problem. In Sec. 5, we develop the distributed decision approach and
make comparisons to a centralized one. In Sec. 6, we provide results from a testbed
implementation of our approach. Finally, in Sec. 7, we draw conclusions.
2. PROBLEM FORMULATION
In this section we introduce our system model. Consider a WSNET deployed in
a site for localization purposes. We divide the site into N regions denoted by an
index set L = {L1 , . . . , LN }. There are M distinct positions B = {B1 , . . . , BM }
at which we can place clusterheads.
Let a sensor be located in region L ∈ L . A set of packets broadcasted by
the sensor is received by some of the clusterheads which observe certain physical
quantities associated with each packet. Often, the observed physical quantities
are just the received signal strength (RSSI) and, if technology allows it, one can
also observe the angle-of-arrival of the signal or other signal characteristics. Our
methodology is general enough to apply to any set of physical observations.
Let y(i) denote the vector of observations by a clusterhead at position Bi corre-
sponding to a packet broadcasted by the sensor. These observations are assumed
to be random. To simplify the analysis in the rest of the paper we will assume that
the observations take values from a finite alphabet Σ = {σ1 , . . . , σ|Σ| }, where |Σ|
denotes the cardinality of Σ.² A series of $n$ consecutive observations is denoted by $y_1^{(i)}, \ldots, y_n^{(i)}$ and is assumed independent and identically distributed (i.i.d.) conditioned on the region in which the sensor node resides. This assumption is justified when
the site is dynamic enough (e.g., doors opening or closing, people moving) so that
the lengths of various radio-paths between the receiver and the transmitter change
on the order of a wavelength between consecutive observations. For example, if a
sensor operates at the 2.4 GHz ISM band, the half-wavelength is only about 6cm,
and body movements of a user who carries the sensor may alone cause observations
separated in time by a few seconds to be i.i.d. Observations made by different clus-
terheads at about the same time need not be independent. We acknowledge that
when the site and the transmitter/receiver are fairly static, observations over such
short times may be correlated; a case we do not handle. The requisite theory could
be developed for that case as well but to “learn” models capturing such correlation
would probably require too many measurements for a practical system.
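As a concrete illustration of the finite-alphabet model, the following sketch (in Python, with hypothetical RSSI values) computes the empirical distribution, or type, of a block of $n$ quantized readings over Σ; this is the empirical type, denoted $L_{y^{(k),n}}$ later on, on which the tests of Sec. 3 operate.

```python
from collections import Counter

def empirical_type(observations, alphabet):
    """Empirical distribution (type) of a block of quantized observations.

    `observations` is a list of n readings, each assumed to take a value in
    `alphabet` (the finite set Sigma of quantized RSSI levels); returns a dict
    mapping each symbol of Sigma to its relative frequency in the block.
    """
    counts = Counter(observations)
    n = len(observations)
    return {sigma: counts.get(sigma, 0) / n for sigma in alphabet}

# Hypothetical example: RSSI quantized to integer dBm values.
alphabet = list(range(-100, -29))            # Sigma = {-100, ..., -30} dBm
rssi_block = [-72, -71, -72, -74, -72, -71]  # n = 6 observations y_1, ..., y_n
L_y = empirical_type(rssi_block, alphabet)
```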
With every clusterhead-region pair $(B_i, L_j)$ we associate a family of pdfs $p_{Y^{(i)}|\theta_j}(y)$, where $Y^{(i)}$ denotes the random variable corresponding to observations $y^{(i)}$ at clusterhead $B_i$ when the transmitting sensor is in some location within $L_j$. Here, $\theta_j \in \Omega_j$ is a vector in some space $\Omega_j$ parametrizing the pdf family.
tioned earlier, the use of a family of pdfs rather than a single pdf is intended to
provide robustness with respect to the exact position of the sensor within the region
Lj . As we will see later on, we will use measurements at a few locations (or even
a single one) within Lj but we will associate to these measurements a family of
pdfs parametrized by θj. For example, one could obtain an empirical pdf from the measurements and associate with Lj pdfs with the same shape as the empirical pdf but with means spread over an interval around the empirical mean (cf. Sec. 6).
2 This is indeed the case in practice since WSNET nodes report quantized RSSI measurements.
where $P_{\theta_j}[\cdot]$ (resp. $P_{\theta_i}[\cdot]$) is a probability evaluated assuming that $y^{(k),n}$ is drawn from $p_{Y^{(k)}|\theta_j}(\cdot)$ (resp. $p_{Y^{(k)}|\theta_i}(\cdot)$). We use a similar notation and write $\alpha^{S}_{ijk,n}(\theta_j)$ and $\beta^{S}_{ijk,n}(\theta_i)$ for the error probabilities of any other test that declares $L_i$ whenever $y^{(k),n}$ is in some set $S_{ijk,n}$. In the sequel, we will often consider the asymptotic rate at which these probabilities approach zero as $n \to \infty$. We will use the term exponent to refer to the quantity $-\lim_{n\to\infty} \frac{1}{n} \log P[\cdot]$ for some probability $P[\cdot]$; if the exponent is $d$ then the probability approaches zero as $e^{-nd}$.
Zeitouni et al. [1992] have established conditions for the optimality of the GLRT
in a Neyman-Pearson sense for general Markov sources. The analysis in Zeitouni
et al. [1992] is carried out for the case where one hypothesis corresponds to a single
pdf and the other to a pdf family. We provide a generalization (in an i.i.d. setting) to the case where both hypotheses correspond to pdf families.
3 We note that the pdf families are associated with a region-clusterhead pair. Thus, θj and θi depend on k as well, but we elect to suppress this dependence in the notation for simplicity. We will usually be referring to the triplet i, j, k; hence, it will be evident to which θj and θi we refer.
Definition 1 (Generalized Neyman-Pearson (GNP) Criterion). We will say that the decision rule $\{S_{ijk,n}\}$ is optimal if it satisfies
$$\limsup_{n\to\infty} \frac{1}{n} \log \alpha^{S}_{ijk,n}(\theta_j) < -\lambda, \quad \forall\, \theta_j \in \Omega_j, \qquad (1)$$
and maximizes $-\limsup_{n\to\infty} \frac{1}{n} \log \beta^{S}_{ijk,n}(\theta_i)$ uniformly for all $\theta_i \in \Omega_i$.
where Pθj denotes the probability law induced by pY(k) |θj (·). The following lemma
generalizes Hoeffding’s result and a similar result in Zeitouni et al. [1992]; the proof
is in Appendix A.
Lemma 3.1. The generalized Hoeffding test satisfies the GNP criterion.

Next, we will determine the exponent of $\beta^{S^*}_{ijk,n}(\theta_i)$. Define the set $A_{ijk} = \{Q \mid \inf_{\theta_j} D(Q\|P_{\theta_j}) < \lambda\}$. We have
$$\beta^{S^*}_{ijk,n}(\theta_i) = P_{\theta_i}\big[y^{(k),n} \notin S^*_{ijk,n}\big] = P_{\theta_i}\big[L_{y^{(k),n}} \in A_{ijk} \cap \mathcal{L}_n\big].$$
where $A^{o}_{ijk}$ denotes the interior of $A_{ijk}$. Since $A_{ijk}$ is an open set the upper and lower bounds match and $\inf_{Q \in A_{ijk}} D(Q\|P_{\theta_i})$ is the exponent of $\beta^{S^*}_{ijk,n}(\theta_i)$.
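The generalized Hoeffding test itself is introduced in a portion of the text not reproduced above; from the proof of Lemma 3.1 (Appendix A), its acceptance set $S^*_{ijk,n}$ declares $L_i$ exactly when the empirical type is at divergence at least $\lambda$ from every member of the $\theta_j$ family. A minimal sketch under that reading:

```python
import math

def kl_divergence(Q, P):
    """Relative entropy D(Q || P) for pmfs given as dicts over the same finite alphabet."""
    d = 0.0
    for sigma, q in Q.items():
        if q > 0.0:
            p = P.get(sigma, 0.0)
            if p == 0.0:
                return math.inf   # D(Q||P) is infinite if Q puts mass where P does not
            d += q * math.log(q / p)
    return d

def hoeffding_declares_Li(L_y, family_j, lam):
    """Generalized Hoeffding test (sketch): declare L_i iff the empirical type L_y
    satisfies inf over theta_j of D(L_y || P_theta_j) >= lambda."""
    return min(kl_divergence(L_y, P) for P in family_j) >= lam
```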
The following theorem establishes a necessary and sufficient condition for the
optimality of GLRT under the GNP criterion. The proof is in Appendix B.
Theorem 3.2. The GLRT with a threshold $\lambda$ is asymptotically optimal under the GNP criterion if and only if
$$\inf_{Q \in C_{ijk}} D(Q\|P_{\theta_i}) \geq \inf_{Q \in A_{ijk}} D(Q\|P_{\theta_i}), \qquad (3)$$
where
$$C_{ijk} = \Big\{Q \,\Big|\, \inf_{\theta_j} D(Q\|P_{\theta_j}) - \inf_{\theta_i} D(Q\|P_{\theta_i}) < \lambda \leq \inf_{\theta_j} D(Q\|P_{\theta_j})\Big\}.$$
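The GLRT statistic $X_{ijk}(y^{(k),n})$ is defined in a part of the text not reproduced here; the proof of Thm. 3.2 (Appendix B) compares $\frac{1}{n}\log\sup_{\theta_i} p_{Y^{(k)}|\theta_i}(y^{(k),n}) - \frac{1}{n}\log\sup_{\theta_j} p_{Y^{(k)}|\theta_j}(y^{(k),n})$ to $\lambda$, which for i.i.d. finite-alphabet observations equals $\inf_{\theta_j} D(L_{y^{(k),n}}\|P_{\theta_j}) - \inf_{\theta_i} D(L_{y^{(k),n}}\|P_{\theta_i})$. A sketch under that assumption:

```python
import math

def kl(Q, P):
    # D(Q || P) over a finite alphabet; pmfs as dicts, P assumed positive where Q is.
    return sum(q * math.log(q / P[s]) for s, q in Q.items() if q > 0)

def glrt_statistic(L_y, family_i, family_j):
    """Difference of the minimum divergences of the empirical type from the two pdf
    families; assumed equivalent to the log-likelihood-ratio form of the GLRT for
    i.i.d. finite-alphabet observations (see the lead-in above)."""
    return (min(kl(L_y, P) for P in family_j)
            - min(kl(L_y, P) for P in family_i))

def glrt_declares_Li(L_y, family_i, family_j, lam):
    # Declare L_i when the statistic reaches the threshold lambda.
    return glrt_statistic(L_y, family_i, family_j) >= lam
```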
Note that $Z_{ijk}(\lambda)$ is nonincreasing in $\lambda$, $Z_{ijk}(0) = \min_{\theta_i}\min_{\theta_j} D(P_{\theta_j}\|P_{\theta_i})$, and $\lim_{\lambda\to\infty} Z_{ijk}(\lambda) = 0$. Assuming that $Z_{ijk}(0) > 0$, there exists a $\lambda^*_{ijk} > 0$ such that
$Z_{ijk}(\lambda^*_{ijk}) = \lambda^*_{ijk}$. Furthermore, both error probability exponents in (4) and (5) are no smaller than $\lambda^*_{ijk}$.
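Since $Z_{ijk}(\lambda)$ is nonincreasing with $Z_{ijk}(0) > 0$ and $\lim_{\lambda\to\infty} Z_{ijk}(\lambda) = 0$, the threshold $\lambda^*_{ijk}$ solving $Z_{ijk}(\lambda) = \lambda$ can be located by bisection on the strictly decreasing function $Z_{ijk}(\lambda) - \lambda$. A sketch, with `Z` standing in for whichever of the exponent functions of this section one elects to evaluate:

```python
def fixed_point_threshold(Z, lam_hi=10.0, tol=1e-6):
    """Bisection for lambda* with Z(lambda*) = lambda*, where Z is a nonincreasing,
    nonnegative function of the threshold with Z(0) > 0 (sketch).

    `lam_hi` is grown until Z(lam_hi) <= lam_hi so that the root of
    g(lam) = Z(lam) - lam is bracketed by [0, lam_hi].
    """
    while Z(lam_hi) > lam_hi:
        lam_hi *= 2.0
    lo, hi = 0.0, lam_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if Z(mid) > mid:      # g(mid) > 0: the fixed point lies to the right of mid
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```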
Now consider the clusterhead at Bk observing y(k),n and seeking to distinguish
between Li and Lj . Assume that the GLRT using Xijk (y(k),n ) satisfies condition
(3) and, also, the GLRT using Xjik (y(k),n ) satisfies the symmetric condition. The
clusterhead has the option of using the GLRT by comparing Xijk (y(k),n ) to the
threshold λ∗ijk , or comparing Xjik (y(k),n ) to a threshold λ∗jik that can be obtained
in exactly the same way as λ∗ijk . Let
$$d_{ijk} = \max\{\lambda^*_{ijk}, \lambda^*_{jik}\}, \qquad (7)$$
and set $(\bar{i}, \bar{j}) = (i, j)$ if $\lambda^*_{ijk}$ is the maximizer above; otherwise set $(\bar{i}, \bar{j}) = (j, i)$. Define the maximum probability of error as
$$P^{(e)}_{ijk,n} \triangleq \max\Big\{\max_{\theta_{\bar{j}}} \alpha^{GLRT}_{\bar{i}\bar{j}k,n}(\theta_{\bar{j}}),\ \max_{\theta_{\bar{i}}} \beta^{GLRT}_{\bar{i}\bar{j}k,n}(\theta_{\bar{i}})\Big\}.$$
Proposition 3.3. Assume that the GLRT using $X_{ijk}(y^{(k),n})$ satisfies condition (3) and, also, the GLRT using $X_{jik}(y^{(k),n})$ satisfies the symmetric condition. Then, when the clusterhead at $B_k$ compares $X_{\bar{i}\bar{j}k}(y^{(k),n})$ to $d_{ijk}$, the maximum probability of error satisfies
$$\limsup_{n\to\infty} \frac{1}{n} \log P^{(e)}_{ijk,n} \leq -d_{ijk}.$$
One of the challenges in computing $d_{ijk}$ is that the problem in (6) is nonconvex. Specifically, the relative entropy in the constraint is convex in $Q$ but the minimization over $\theta_j$ yields a piecewise convex function. This may not be an issue when there are relatively few possible values of $\theta_j$ and $\theta_i$, but for large sets $\Omega_j$ and $\Omega_i$ computing $d_{ijk}$ becomes expensive. To address this issue, we will next develop a lower bound (through duality) to $Z_{ijk}(\lambda, \theta_i)$.

Let $\tilde{Z}_{ijk}(\lambda, \theta_i)$ be the optimal value of the dual to (6); by weak duality it follows that $Z_{ijk}(\lambda, \theta_i) \geq \tilde{Z}_{ijk}(\lambda, \theta_i)$. We have
$$\tilde{Z}_{ijk}(\lambda, \theta_i) = \max_{\mu \geq 0} \Big\{ \min_{\theta_j} \min_{Q} \big[ D(Q\|P_{\theta_i}) + \mu D(Q\|P_{\theta_j}) \big] - \mu\lambda \Big\}. \qquad (8)$$
Note that the optimization over Q is convex and the optimization over µ is concave,
thus, this problem can be solved efficiently. (In fact, the optimization over Q can be
solved analytically.) It can be seen that Z̃ijk (λ, θ i ) is convex and nonincreasing in
λ for all θ i . Furthermore, the exponent of the type II GLRT error probability is no
smaller than Z̃ijk (λ) = minθi Z̃ijk (λ, θ i ). Note that Z̃ijk (λ) is also nonincreasing
in λ, Z̃ijk (0) = minθi minθj D(Pθj kPθi ), and limλ→∞ Z̃ijk (λ) = 0. Assuming that
Z̃ijk (0) > 0, there exists a λ̃∗ijk > 0 such that Z̃ijk (λ̃∗ijk ) = λ̃∗ijk . Furthermore, both
error exponents in (4) and (5) are no smaller than λ̃∗ijk .
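For a fixed $\mu$ and $\theta_j$, the inner minimization over $Q$ in (8) admits the familiar Chernoff-type closed form: the minimizer is proportional to $(P_{\theta_i} P_{\theta_j}^{\mu})^{1/(1+\mu)}$ and the optimal value is $-(1+\mu)\log\sum_{\sigma} P_{\theta_i}(\sigma)^{1/(1+\mu)} P_{\theta_j}(\sigma)^{\mu/(1+\mu)}$ (the text only notes that the inner problem is analytically solvable, so this form is an assumption of the sketch). Combining it with a simple grid search over $\mu$:

```python
import math

def inner_min_value(P_i, P_j, mu):
    """min over Q of D(Q||P_i) + mu * D(Q||P_j); the minimizer is proportional to
    P_i^(1/(1+mu)) * P_j^(mu/(1+mu)) and the value is -(1+mu) times the log of the
    normalizing sum (pmfs as dicts over the same finite alphabet)."""
    s = sum((P_i[x] ** (1.0 / (1.0 + mu))) * (P_j[x] ** (mu / (1.0 + mu)))
            for x in P_i)
    return -(1.0 + mu) * math.log(s)

def Z_tilde(lam, P_i, family_j, mus=None):
    """Lower bound (8) for a fixed theta_i: max over mu >= 0 of the minimum over
    theta_j of the inner value, minus mu * lambda. The coarse grid over mu is a
    simplifying assumption; any one-dimensional concave maximization would do."""
    if mus is None:
        mus = [0.05 * k for k in range(401)]   # mu in [0, 20], hypothetical grid
    return max(min(inner_min_value(P_i, P_j, mu) for P_j in family_j) - mu * lam
               for mu in mus)
```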
Following the same line of development as before we set
$$\tilde{d}_{ijk} = \max\{\tilde{\lambda}^*_{ijk}, \tilde{\lambda}^*_{jik}\}, \qquad (9)$$
and define $\bar{i}$, $\bar{j}$, and $P^{(e)}_{ijk,n}$ in the same way as earlier. It can be seen that $\tilde{d}_{ijk} \leq d_{ijk}$.
We arrive at the following proposition which provides a weaker but more easily
computable probabilistic guarantee on the probability of error.
Proposition 3.4. Assume that the GLRT using $X_{ijk}(y^{(k),n})$ satisfies condition (3) and, also, the GLRT using $X_{jik}(y^{(k),n})$ satisfies the symmetric condition. Then, when the clusterhead at $B_k$ compares $X_{\bar{i}\bar{j}k}(y^{(k),n})$ to $\tilde{d}_{ijk}$, the maximum probability of error satisfies
$$\limsup_{n\to\infty} \frac{1}{n} \log P^{(e)}_{ijk,n} \leq -\tilde{d}_{ijk}.$$
Next, we tackle the case when the GLRT optimality condition (3) is not satisfied.
$$\limsup_{n\to\infty} \frac{1}{n} \log \beta^{GLRT}_{ijk,n}(\theta_i) \leq -\inf_{Q \in D_{ijk}} D(Q\|P_{\theta_i}), \quad \forall\, \theta_i \in \Omega_i. \qquad (11)$$
Using the same argument as in the proof of Thm. 3.2 we can show that (4) still holds. The exponent of the type II GLRT error probability (cf. (11)) is
$$\begin{aligned} \hat{Z}_{ijk}(\lambda, \theta_i) = \min_{Q}\ & D(Q\|P_{\theta_i}) \\ \text{s.t.}\ & \min_{\theta_j} D(Q\|P_{\theta_j}) - \min_{\theta_i} D(Q\|P_{\theta_i}) \leq \lambda, \end{aligned}$$
which is equivalent to
$$\begin{aligned} \hat{Z}_{ijk}(\lambda, \theta_i) = \min_{Q}\ & D(Q\|P_{\theta_i}) \\ \text{s.t.}\ & \min_{\theta_j} D(Q\|P_{\theta_j}) - D(Q\|P_{\theta_i}) \leq \lambda, \quad \forall\, \theta_i. \end{aligned} \qquad (12)$$
The worst case exponent over $\theta_i \in \Omega_i$ is given by
$$\hat{Z}_{ijk}(\lambda) = \min_{\theta_i} \hat{Z}_{ijk}(\lambda, \theta_i).$$
Ẑijk (λ) is nonincreasing in λ, and limλ→∞ Ẑijk (λ) = 0. Assuming that Ẑijk (0) > 0,
there exists a λ̂∗ijk > 0 such that Ẑijk (λ̂∗ijk ) = λ̂∗ijk . Furthermore, both error
probability exponents in (10) and (11) are no smaller than λ̂∗ijk .
Following the same argument as before we set
$$\hat{d}_{ijk} = \max\{\hat{\lambda}^*_{ijk}, \hat{\lambda}^*_{jik}\}, \qquad (13)$$
and define $\bar{i}$, $\bar{j}$, and $P^{(e)}_{ijk,n}$ in the same way as earlier. The discussion above leads
to the following proposition.
Proposition 3.5. Suppose that the clusterhead at $B_k$ uses the GLRT and compares $X_{\bar{i}\bar{j}k}(y^{(k),n})$ to $\hat{d}_{ijk}$. Then, the maximum probability of error satisfies
$$\limsup_{n\to\infty} \frac{1}{n} \log P^{(e)}_{ijk,n} \leq -\hat{d}_{ijk}.$$
The problem in (12) is nonconvex; we will again use dual relaxation to obtain a quantity that is easier to compute. Let $\bar{Z}_{ijk}(\lambda, \theta_i)$ be the optimal value of the dual of (12); by weak duality it follows that $\hat{Z}_{ijk}(\lambda, \theta_i) \geq \bar{Z}_{ijk}(\lambda, \theta_i)$. After some algebra,
$$\bar{Z}_{ijk}(\lambda, \theta_i) = \max_{\mu_{\theta_i} \geq 0} \Big[ \min_{\theta_j} \min_{Q} \sum_{r=1}^{|\Sigma|} Q(\sigma_r) \log\big(Q(\sigma_r) A(\sigma_r)\big) - \sum_{\theta_i} \mu_{\theta_i} \lambda \Big], \qquad (14)$$
where
$$A(\sigma_r) = \frac{1}{P_{Y^{(k)}|\theta_i}(\sigma_r)} \cdot \prod_{\theta_i} \left( \frac{P_{Y^{(k)}|\theta_i}(\sigma_r)}{P_{Y^{(k)}|\theta_j}(\sigma_r)} \right)^{\mu_{\theta_i}}.$$
Note that the optimization over $Q$ is convex and the optimization over the $\mu_{\theta_i}$ is concave, thus this problem can be solved efficiently. In fact, the optimization over $Q$ can be solved analytically, yielding
$$Q(\sigma_l) = \frac{1/A(\sigma_l)}{\sum_{r=1}^{|\Sigma|} 1/A(\sigma_r)}, \qquad l = 1, \ldots, |\Sigma|.$$
Z̄ijk (λ, θ i ) is convex and nonincreasing in λ for all θ i . Furthermore, the exponent of
the type II GLRT error probability is no smaller than Z̄ijk (λ) = minθi Z̄ijk (λ, θ i ).
Note that Z̄ijk (λ) is also nonincreasing in λ, and limλ→∞ Z̄ijk (λ) = 0. Assuming
that Z̄ijk (0) > 0, there exists a λ̄∗ijk > 0 such that Z̄ijk (λ̄∗ijk ) = λ̄∗ijk . Furthermore,
both error exponents in (10) and (11) are no smaller than λ̄∗ijk .
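For a given multiplier vector $\{\mu_{\theta_i}\}$ and a given $\theta_j$, the bracketed inner problem in (14) evaluates, using the closed-form $Q$ above, to $-\log\sum_{r} 1/A(\sigma_r)$; the remaining concave maximization over the multipliers can then be handled by any standard method. A sketch of this evaluation (pmfs assumed strictly positive so that $A$ is well defined):

```python
import math

def A_values(P_i0, family_i, P_j, mus):
    """A(sigma) from (14): (1 / P_i0(sigma)) times the product over the theta_i
    family of (P_i(sigma) / P_j(sigma)) ** mu_i, for a multiplier vector `mus`."""
    A = {}
    for s in P_i0:
        prod = 1.0
        for P_i, mu in zip(family_i, mus):
            prod *= (P_i[s] / P_j[s]) ** mu
        A[s] = prod / P_i0[s]
    return A

def inner_value(A):
    # min over Q of sum_r Q(sigma_r) log(Q(sigma_r) A(sigma_r)), attained at
    # Q(sigma) proportional to 1/A(sigma); optimal value is -log sum_r 1/A(sigma_r).
    return -math.log(sum(1.0 / a for a in A.values()))

def dual_value_for_mus(lam, P_i0, family_i, family_j, mus):
    """Objective of (14) for a fixed multiplier vector; maximizing this concave
    function over mus >= 0 (e.g., by coordinate ascent) gives Z_bar(lambda, theta_i)."""
    best_over_j = min(inner_value(A_values(P_i0, family_i, P_j, mus))
                      for P_j in family_j)
    return best_over_j - lam * sum(mus)
```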
Following the same approach as before, set
$$\bar{d}_{ijk} = \max\{\bar{\lambda}^*_{ijk}, \bar{\lambda}^*_{jik}\}, \qquad (15)$$
and define $\bar{i}$, $\bar{j}$, and $P^{(e)}_{ijk,n}$ in the same way as earlier. It can be seen that $\hat{d}_{ijk} \geq \bar{d}_{ijk}$.
We arrive at the following proposition which provides a weaker but more easily
computable probabilistic guarantee on the probability of error.
Proposition 3.6. Suppose that the clusterhead at $B_k$ uses the GLRT and compares $X_{\bar{i}\bar{j}k}(y^{(k),n})$ to $\bar{d}_{ijk}$. Then, the maximum probability of error satisfies
$$\limsup_{n\to\infty} \frac{1}{n} \log P^{(e)}_{ijk,n} \leq -\bar{d}_{ijk}.$$
4. LOCALIZATION AND CLUSTERHEAD PLACEMENT
In this section, we focus on how to place the K ≤ M clusterheads at positions in B
to facilitate localization. We start by considering the multiple composite hypothesis
testing problem of identifying the region L ∈ L in which the sensor we seek resides.
4.1 Multiple composite hypothesis testing
We assume, without loss of generality, that we have placed clusterheads in positions $B_1, \ldots, B_K$, each one making $n$ i.i.d. observations $y^{(k),n} = (y_1^{(k)}, \ldots, y_n^{(k)})$. Let $d_{ijk}$ be the GLRT threshold obtained in Sec. 3 for each region pair $(i, j)$, $i < j$, and
clusterhead k. (dijk is obtained from either (7), (9), (13), or (15), depending on
which optimization problem we elect to solve.)
We make N − 1 binary decisions with the GLRT rule to arrive at a final decision.
Specifically, we first compare L1 with L2 to accept one hypothesis, then compare
the accepted hypothesis with L3 , and so on and so forth. For each one of these
Li vs. Lj decisions we use a single clusterhead Bk as detailed in Sec. 3 and the
exponent of the corresponding maximum probability of error is bounded by dijk .
All in all we make N − 1 binary hypothesis decisions.
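The sequential elimination just described can be sketched as follows, with `decide(i, j, k)` standing in for the binary GLRT of Sec. 3 carried out by the clusterhead responsible for the pair (the function names and the pair-to-clusterhead map are illustrative):

```python
def localize(N, responsible, decide):
    """Sequential multi-hypothesis decision (sketch): N - 1 binary comparisons.

    `responsible[(i, j)]` is the clusterhead index assigned to region pair (i, j)
    with i < j, and `decide(i, j, k)` runs the binary GLRT at clusterhead k,
    returning i or j. Regions are indexed 1..N; the surviving hypothesis after
    the N - 1 tests is returned as the localization decision.
    """
    winner = 1
    for challenger in range(2, N + 1):
        i, j = min(winner, challenger), max(winner, challenger)
        winner = decide(i, j, responsible[(i, j)])
    return winner
```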
4.2 Clusterhead placement
Our objective is to minimize the worst case probability of error. To that end, for
every pair of regions Li and Lj we need to find a clusterhead that can discriminate
between them with a probability of error exponent larger than some ǫ and then
maximize ǫ. This is accomplished by the mixed integer linear programming problem
(MILP) formulation of Figure 1.
$$\begin{aligned}
\max\ & \epsilon && (16)\\
\text{s.t.}\ & \textstyle\sum_{k=1}^{M} x_k = K, && (17)\\
& \textstyle\sum_{k=1}^{M} y_{ijk} = 1, \quad i, j = 1, \ldots, N,\ i < j, && (18)\\
& y_{ijk} \leq x_k, \quad \forall i, j,\ i < j,\ k = 1, \ldots, M, && (19)\\
& \epsilon \leq \textstyle\sum_{k=1}^{M} d_{ijk} y_{ijk}, \quad \forall i, j,\ i < j, && (20)\\
& y_{ijk} \geq 0, \quad \forall i, j,\ i < j,\ \forall k, && (21)\\
& x_k \in \{0, 1\}, \quad \forall k. && (22)
\end{aligned}$$
We can interpret $\max_{k: x_k(Y)=1} d_{ijk}$ as the best exponent for the probability of error in distinguishing between regions $L_i$ and $L_j$ from some clusterhead in $Y$. Then $\epsilon(Y)$ is simply the worst pairwise exponent. The following result is from Ray et al.
[2006].
where at most one $y^*_{ijk}$ is set to 1 for a given $(i, j)$ pair.
To summarize, the positions where clusterheads are to be placed are in the set $Y^* \triangleq \{B_k : x^*_k = 1\}$. For every region pair $L_i$ and $L_j$, $\max_{k: x^*_k = 1} d_{ijk}$ is the best exponent for the probability of error in distinguishing between these regions, and the clusterhead that will be responsible for that decision is the one corresponding to $y^*_{ijk} = 1$; we will denote by $k^*_{ij}$ the index of this clusterhead.
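For small instances the placement objective can also be evaluated by direct enumeration: for a candidate set $Y$ of $K$ positions, the worst pairwise exponent is $\min_{i<j} \max_{k\in Y} d_{ijk}$, and the MILP of Fig. 1 selects the $Y$ maximizing it. A brute-force sketch (exponential in the number of $K$-subsets, so illustrative only; the MILP is what one would solve at scale):

```python
from itertools import combinations

def placement_exponent(Y, d, N):
    """Worst pairwise error exponent when clusterheads occupy the positions in Y;
    d[(i, j, k)] is the exponent d_ijk for region pair (i, j), i < j, position k."""
    return min(max(d[(i, j, k)] for k in Y)
               for i in range(1, N) for j in range(i + 1, N + 1))

def best_placement(M, K, d, N):
    """Enumerate all K-subsets of the M candidate positions, keep the best set Y*,
    and record the responsible clusterhead k*_ij for every region pair."""
    best_Y, best_eps = None, float("-inf")
    for Y in combinations(range(1, M + 1), K):
        eps = placement_exponent(Y, d, N)
        if eps > best_eps:
            best_Y, best_eps = Y, eps
    responsible = {(i, j): max(best_Y, key=lambda k: d[(i, j, k)])
                   for i in range(1, N) for j in range(i + 1, N + 1)}
    return best_Y, best_eps, responsible
```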
than $d_{ijk^*_{ij}}$. Now, for every $i$ and $j \neq i$ define $E_n(i, j)$ as the event that the GLRT employed by the clusterhead at $B_{k^*_{ij}}$ will decide $L_j$ under $P_{\theta_i}$. For all $\delta_n > 0$ and large enough $n$ we have
$$P_{\theta_i}[\text{error}] \leq P_{\theta_i}\big[\cup_{j\neq i} E_n(i, j)\big] \leq \sum_{j\neq i} e^{-n(d_{ijk^*_{ij}} + \delta_n)} \leq (N-1)\, e^{-n(\epsilon^* + \delta_n)}.$$
The second inequality above is due to Props. 3.3, 3.4, 3.5, or 3.6 and the last inequality above is due to (25). Since the bound above holds for all $i$ we obtain (27).
5. DISTRIBUTED LOCALIZATION
In this section we consider the implementation of the decision rule described in
Sec. 4. We assume that the WSNET has a single gateway. We seek to devise a
distributed localization algorithm in order to minimize the information that needs
to be exchanged between clusterheads and the gateway. The primary motivation
is that in WSNETs communication is, in general, more expensive than processing.
For the remainder of this section we will assume that the clusterheads and the
gateway form a connected network. Otherwise, one can simply add a sufficient
number of relays.
work to perform the GLRT, yielding an overall O(nN ) processing effort distributed
to the K clusterheads. In terms of communication cost, N − 1 messages get ex-
changed each consisting of O(log N ) bits needed to encode the decision. Each of
these messages can, in the worst case, be sent over O(K) hops if two distant clus-
terheads need to communicate, yielding an overall worst case communication cost
of O(KN log N ). However, one can sequence the regions in such a way that ge-
ographically close regions are close in the sequence. As a result, it will often be
the case that clusterheads responsible for region pairs close in the sequence will be
geographically close.
We note that this “locality” property is plausible since the signal landscape is
primarily influenced by the structure of the site. Hence, it is reasonable to expect
the best clusterheads for nearby regions to be geographically close. To see that,
consider a large deployment with a radius much larger than the range of the sensor
nodes. The clusterheads responsible for nearby regions should be able to listen
to sensors within these regions, which implies that they are geographically close
compared to the overall size of the deployment.
This results in messages between clusterheads traveling a few hops. It follows
that the overall communication cost will often be O(N log N ).
Based on the preceding analysis, Table I compares the centralized and distributed
approaches. A couple of remarks are in order. The total processing cost is the same
for both approaches but in the distributed case the work is distributed among the K
clusterheads. To compare the communication costs note that typically K = O(N )
to ensure reasonable performance (e.g., one clusterhead for a fixed number of re-
gions). Moreover, S1 is the message size for the raw measurements at a clusterhead
corresponding to a packet sent from the transmitting sensor, while n can be large
enough (e.g., 20-30) so that the probability of error becomes small enough. Fur-
thermore, based on the discussion earlier, we expect the worst case to be typical
in the centralized approach while the best case should be typical in the distributed
approach. It follows that the distributed approach leads to communication cost
(and energy) savings.
Note that both the centralized and the distributed approach guarantee the per-
formance of the system obtained in Prop. 4.2, i.e., the savings from the distributed
approach come with no performance loss.
6. EXPERIMENTAL RESULTS
Next, we provide experimental results from a localization testbed we have installed
at Boston University (BU).
(1) For each pair of positions $(B_k, B_j)$ estimate the pdf $p_{Y^{(k)}|B_j}(y)$ of RSSI at $B_k$ when the mote at $B_j$ is transmitting. Let $m_{jk}$ denote the corresponding mean.
(2) For each $(B_k, B_j)$ construct a pdf family $\{p_{Y^{(k)}|\theta_j}(y),\ \theta_j \in \Omega_j\}$ to characterize transmissions from positions within $L_j$.
(3) Compute the exponent $d_{ijk}$ as described in Sec. 3.1.
(4) Determine the clusterhead placement by the algorithm in Sec. 4.2.
(5) Determine the location of any mote in the coverage area by the decision rule of Sec. 4.1.
are sent over a long enough time interval to capture the environment in different
“states” and thus account for the variability in RSSI measurements. For Phase 2
we define an interval $[m_{jk} - \hat{m}_{jk}, m_{jk} + \hat{m}_{jk}]$ and select points $\theta_{j,1}, \ldots, \theta_{j,R}$ in this interval. We construct the family $\{p_{Y^{(k)}|\theta_j}(y),\ \theta_j \in \Omega_j\}$ so that the $l$th member has the same shape as $p_{Y^{(k)}|B_j}(y)$ but a mean equal to $\theta_{j,l}$, for $l = 1, \ldots, R$. $\hat{m}_{jk}$ is selected appropriately so that the union over $j, k$ of the intervals $[m_{jk} - \hat{m}_{jk}, m_{jk} + \hat{m}_{jk}]$ is maximized and there is no overlap. The value of $R$ determines how rich the pdf families are; in our experiments $\theta_{j,1}, \ldots, \theta_{j,R}$ were selected to include all integers in the interval $[m_{jk} - \hat{m}_{jk}, m_{jk} + \hat{m}_{jk}]$. It can be seen that to construct
the pdf families we only used measurements from a single point (the center) within
a region. Therefore, the measurement campaign is not necessarily more expensive
than the one required by the approach in Ray et al. [2006] which uses a single
(rather than a family) pdf per region. For Phase 3 we were not able to verify
the GLRT optimality condition (cf. Thm. 3.2), so we obtained dijk by computing
the type II exponent as in (15). The optimal placement obtained in Phase 4 is
shown in Fig. 2 where we used 12 clusterheads placed at the positions of the red
squares on the graph. The number of clusterheads was selected to achieve a small
enough probability of error (cf. Prop. 4.2). The training phase (Steps 1 and 2
of Fig. 3) takes about a day. Step 3 depends on the hardware used to solve the
corresponding optimization problems. It took about 2 days for our testbed. Step 4
takes just about half an hour. Note that these steps are performed once, assuming
that the environment does not change structurally in a very dramatic manner. The
detection phase (Step 5) takes on the order of 40 seconds. Finally, in terms of
storage requirements, the distributed algorithm needs to store about 2 Kbytes in
each of the clusterheads (pdf families, where to forward decisions, etc.).
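A sketch of the Phase 2 construction described above: the empirical pmf recorded at the region center is translated so that its mean lands (approximately) on each integer in $[m_{jk} - \hat{m}_{jk}, m_{jk} + \hat{m}_{jk}]$. Realizing the shift by translating the quantized support is a simplifying assumption of this sketch; the text only requires that each family member keep the shape of the empirical pdf with the prescribed mean.

```python
def pdf_family(empirical_pmf, m_hat):
    """Family of shifted copies of an empirical RSSI pmf (sketch).

    Each member keeps the shape of `empirical_pmf` (a dict of quantized RSSI
    value -> probability) but has its support translated by an integer offset so
    that the mean moves to each integer within +/- m_hat of the empirical mean.
    """
    mean = sum(v * p for v, p in empirical_pmf.items())
    family = []
    for target in range(int(round(mean - m_hat)), int(round(mean + m_hat)) + 1):
        offset = target - int(round(mean))
        family.append({v + offset: p for v, p in empirical_pmf.items()})
    return family

# Hypothetical empirical pmf of quantized RSSI (dBm) measured at a region center.
p_emp = {-75: 0.1, -74: 0.2, -73: 0.4, -72: 0.2, -71: 0.1}
family = pdf_family(p_emp, m_hat=3)   # members with means spanning roughly +/- 3 dBm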
We obtained results for three versions of the localization system. We made 100
localization tests in positions spread within the covered area. Each test used 20
packets (RSSI measurements) broadcasted by the mote to be located (5 over each
channel and power level pair for the 2F 2P cases described below). In Version 1 the
mote we wish to locate transmits packets over a single frequency (2.410 GHz) and
a single power level (0 dBm) and the system uses the GLRT (we write 1F 1P − G to
indicate Ver. 1 in Fig. 4) to determine the region of the mote. In Version 2 (denoted
by 2F 2P −G) RSSI observations are made for packets transmitted over two different
frequencies (2.410 GHz and 2.460 GHz) and two different power levels (0 dBm and
−10 dBm) and the GLRT is again used. Version 3 (denoted by 2F 2P − L) is
[Fig. 4. Histograms of the error distance (x-axis, in inches) versus the number of trials per interval (y-axis); panels (a)-(c) report the building-scale experiments and panels (d)-(f) the small-scale experiments.]

Version 3 (denoted by 2F2P-L) is
identical to Version 2 but the LRT rather than the GLRT is used where every region
is represented by just the pdf observed in Phase 1 (rather than a pdf family). For
each Version 1–3 results are reported in Fig. 4(a)–(c), respectively. In each of these
figures we plot the histogram of the error distance (in inches) based on 100 trials.
If the system identifies region Lj as the one where the transmitting mote is located
then the error distance is defined as the distance between the transmitting mote and
Bj . For each system we also report the corresponding mean error distance (D̄e ).
We stress that for each trial the location of the transmitting mote is randomly
selected and almost never one at which RSSI measurements have been made in
Phase 1.
The results show that the 2F 2P − G system, which exploits frequency and power
diversity, outperforms the 1F 1P − G system. Clearly, RSSI measurements at mul-
tiple power and frequency levels contain more information about the transmitter
location. Also, the 2F 2P − G system outperforms the 2F 2P − L system which uses
the standard LRT decision rule. This demonstrates that, as envisioned, the GLRT
provides robustness leading to better performance. The issue with the LRT is that
a single pdf cannot adequately represent a relatively large region. We also note that the total coverage area was 5258 feet², that is, about 87 feet² per region. With a mean error distance of $\bar{D}_e = 8$ feet the mean area of “confusion” was $8^2 = 64$ feet². From these results it is evident that we were able to achieve accuracy on the same order of magnitude as the mean area of a region. That is, the system was identifying the correct or a neighboring region most of the time. Put differently, we can say that the achieved mean error distance is about the same as the radius of a region, defined as radius $= \sqrt{\text{area}}$ (for our experiments $\sqrt{87} \approx 9.3$ feet, which is in fact larger than the mean error distance of 8 feet). We used a clusterhead density of 1 clusterhead per 5258/12 ≈ 438 feet². Note that our system is not localizing based on “proximity” to a clusterhead; one clusterhead corresponds to about 5 regions, thus resulting in cost savings compared to proximity-based systems that need a higher density of observers.
An interesting question is whether the pdf families constructed during the train-
ing phase remain valid after a long period of time or need very frequent updating
(which is costly). To answer this question, we performed another (smaller) set of
56 localization tests about one year after the time we derived our pdf families. This second set of tests yielded a mean error distance of 87.32 inches for the 2F2P-G system, quite similar to the earlier tests. During this year there were modest changes in the building, with labs and conference rooms being reorganized and several faculty moving to new offices.
For comparison purposes, we also used the same testbed and the exact same tests
with the stochastic triangulation method of Patwari et al. [2003]. Patwari et al.
[2003] assumes that the RSSI (in dB) at $B_k$ when the mote at $B_j$ is transmitting, say $Y^{(k)}|B_j$, is a random variable with a Gaussian distribution. The mean RSSI satisfies the path loss formula $\bar{Y}^{(k)}|B_j = Y_0 - 10\, n_p \log_{10}(\zeta_{kj}/\zeta_0)$, where $\zeta_{kj}$ is the distance between $B_k$ and $B_j$ and $\zeta_0$ is a normalizing constant. From prior measurements we obtained $n_p = 3.65$ and $Y_0 = -48.62$ dBm for $\zeta_0 = 3$ feet. The
location estimation is obtained by maximum likelihood estimation. Applying this
method and using our clusterheads in the exact same position as before resulted in
a mean error distance of 341.72 inches (29 feet) which is much larger (a factor of
3.6!) than the 8 feet obtained by our method.
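For reference, a sketch of the stochastic-triangulation estimate used in this comparison: under a Gaussian RSSI model with a common variance, maximum-likelihood estimation of the transmitter coordinates reduces to least squares between the averaged RSSI observed at each clusterhead and the path-loss prediction. The grid search and the clamping of distances below $\zeta_0$ are simplifying assumptions of this sketch, not details of Patwari et al. [2003].

```python
import math

def path_loss_rssi(dist_ft, Y0=-48.62, n_p=3.65, zeta0=3.0):
    """Predicted mean RSSI (dBm) at a given distance in feet, with the path-loss
    parameters calibrated in the text (Y0 at zeta0 = 3 ft, exponent n_p = 3.65);
    distances are clamped below zeta0 to keep the formula well behaved."""
    return Y0 - 10.0 * n_p * math.log10(max(dist_ft, zeta0) / zeta0)

def ml_location(clusterheads, mean_rssi, grid):
    """Grid-search ML estimate under a Gaussian model with common variance:
    pick the candidate point minimizing the sum of squared differences between
    the observed mean RSSI and the path-loss prediction.

    `clusterheads` maps k -> (x, y) position in feet, `mean_rssi` maps
    k -> averaged RSSI in dBm, `grid` is an iterable of candidate (x, y) points.
    """
    def cost(p):
        return sum((mean_rssi[k] - path_loss_rssi(math.dist(p, q))) ** 2
                   for k, q in clusterheads.items())
    return min(grid, key=cost)
```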
These results raised the question whether smaller regions can lead to better
accuracy. To that end, we placed 12 motes on a table (two rows of 6 motes each).
Two neighboring motes in one row (or in one column) were 6 inches apart. We defined a 36-square-inch region around each mote and followed the exact same procedure
as before. The results of this “small scale” localization experiment are in Fig. 4(d)–
(f). As before frequency and power diversity improve performance. Here, however,
the GLRT does not make a difference compared to LRT and this is because every
trial point in the coverage area is very close to a point we have measurements from.
With the LRT we can achieve a mean error distance of 9.26 inches, that is, we can
again achieve an accuracy on the same order of magnitude as the mean area of a
region.
7. CONCLUSIONS
In this paper, we presented a robust and distributed approach for locating the area
(region) where sensors of a WSNET reside. We posed the problem of localization
as a multiple composite hypothesis testing problem and proposed a GLRT-based
decision rule. We established a necessary and sufficient condition for the GLRT
to be optimal in a generalized Neyman-Pearson sense but also considered the case
where such optimality conditions are not met. Developing asymptotic results on
the type I and type II error exponents, we described how an optimal GLRT thresh-
old can be obtained. We then turned to the problem of optimally placing a given
number of clusterheads to minimize the probability of error. We devised a place-
ment algorithm that provides a probabilistic guarantee on the probability of error.
Furthermore, we proposed a distributed approach to implement the GLRT-based
decision rule and demonstrated that this can lead to savings in the communication
cost compared to a centralized approach.
We validated our approach using testbed implementations involving MICAz
motes manufactured by Crossbow. Our experimental results demonstrate that the
GLRT-based system provides significant robustness (and improved performance)
compared to an LRT-based system such as the one in Ray et al. [2006]. Fur-
thermore, our approach leads to significantly improved accuracy compared to a
stochastic triangulation technique like the one in Patwari et al. [2003] – by a factor
of 3.6 in our tests. We showed that we can achieve an accuracy on the same order
of magnitude as the mean area of a region. This represents the best possible ac-
curacy for a system which identifies the region of the mote rather than estimating
the exact location. Smaller regions (and more clusterheads) lead to better accuracy
but at the expense of more initial measurements (training) and higher equipment
cost. This provides a rule of thumb for practical systems: define regions as small as possible given a tolerable amount of initial measurements and cost.
Acknowledgments
We would like to thank Binbin Li for implementing the stochastic triangulation
approach which was compared to ours.
REFERENCES
Bahl, P. and Padmanabhan, V. 2000. RADAR: An in-building RF-based user location and
tracking system. In Proceedings of the IEEE INFOCOM Conference. IEEE, Tel-Aviv, Israel.
Dembo, A. and Zeitouni, O. 1998. Large Deviations Techniques and Applications, 2nd ed.
Springer-Verlag, NY.
Hoeffding, W. 1965. Asymptotically optimal tests for multinomial distributions. Ann. Math.
Statist. 36, 369–401.
$$= \sum_{\{L_{y^{(k),n}} \,\mid\, T_n(L_{y^{(k),n}}) \subseteq S^*_{ijk,n}\}} e^{-n D(L_{y^{(k),n}} \| P_{\theta_j})} \leq (n+1)^{|\Sigma|}\, e^{-n\lambda},$$
which establishes (1). For the first inequality above note that the size of the type class of $L_{y^{(k),n}}$ is upper bounded by $e^{n H(L_{y^{(k),n}})}$ and that the probability of a sequence can be written in terms of the entropy and the relative entropy of its type (see Dembo and Zeitouni [1998, Chap. 2]). In the last inequality above we used the definition of $S^*_{ijk,n}$ and the fact that the set of all possible types, $\mathcal{L}_n$, has cardinality upper bounded by $(n+1)^{|\Sigma|}$ (Dembo and Zeitouni [1998, Chap. 2]).
Let now $S_{ijk,n}$ be some other decision rule satisfying constraint (1); hence, for all $\epsilon > 0$ and all large enough $n$,
$$\alpha^{S}_{ijk,n}(\theta_j) \leq e^{-n(\lambda+\epsilon)}. \qquad (28)$$
Meanwhile for all $\epsilon > 0$, all large enough $n$, and any $y^{(k),n} \in S_{ijk,n}$,
$$\alpha^{S}_{ijk,n}(\theta_j) = \sum_{\{L_{y^{(k),n}} \,\mid\, T_n(L_{y^{(k),n}}) \subseteq S_{ijk,n}\}} |T_n(L_{y^{(k),n}})|\, p_{Y^{(k)}|\theta_j}(y^{(k),n}) \geq (n+1)^{-|\Sigma|} \sum_{\{L_{y^{(k),n}} \,\mid\, T_n(L_{y^{(k),n}}) \subseteq S_{ijk,n}\}} e^{-n D(L_{y^{(k),n}} \| P_{\theta_j})},$$
where the first inequality above uses Dembo and Zeitouni [1998, Lemma 2.1.8]. Comparing the above with (28) it follows that if $y^{(k),n} \in S_{ijk,n}$ then $D(L_{y^{(k),n}} \| P_{\theta_j}) \geq \lambda$ for all $\theta_j$; hence, $y^{(k),n} \in S^*_{ijk,n}$ and $S_{ijk,n} \subseteq S^*_{ijk,n}$. Consequently, for all $\theta_i$, $\beta^{S}_{ijk,n}(\theta_i) \geq \beta^{S^*}_{ijk,n}(\theta_i)$, which establishes that the generalized Hoeffding test maximizes the exponent of the type II error probability. We conclude that it satisfies the GNP criterion.
For any $y^{(k),n} \in S^{GLRT}_{ijk,n}$ we have
$$\lambda \leq \frac{1}{n} \log \sup_{\theta_i} p_{Y^{(k)}|\theta_i}(y^{(k),n}) - \frac{1}{n} \log \sup_{\theta_j} p_{Y^{(k)}|\theta_j}(y^{(k),n}),$$
which implies that $y^{(k),n} \in S^*_{ijk,n}$. It follows that $\alpha^{GLRT}_{ijk,n}(\theta_j) \leq \alpha^{S^*}_{ijk,n}(\theta_j)$, which establishes that the GLRT satisfies (1) and (4) due to Lemma 3.1.
For the type II error probability we have
$$\beta^{GLRT}_{ijk,n}(\theta_i) = P_{\theta_i}\big[y^{(k),n} \notin S^{GLRT}_{ijk,n}\big] = P_{\theta_i}\big[y^{(k),n} \notin S^*_{ijk,n}\big] + P_{\theta_i}\big[y^{(k),n} \in S^*_{ijk,n} \cap \overline{S^{GLRT}_{ijk,n}}\big]. \qquad (30)$$
For the second term, note that if
$$\lambda > \frac{1}{n} \log \sup_{\theta_i} p_{Y^{(k)}|\theta_i}(y^{(k),n}) - \frac{1}{n} \log \sup_{\theta_j} p_{Y^{(k)}|\theta_j}(y^{(k),n})$$
and $y^{(k),n} \in S^*_{ijk,n} \cap \overline{S^{GLRT}_{ijk,n}}$, then $L_{y^{(k),n}} \in C_{ijk}$. Sanov's theorem yields
$$-\limsup_{n\to\infty} \frac{1}{n} \log P_{\theta_i}\big[y^{(k),n} \in S^*_{ijk,n} \cap \overline{S^{GLRT}_{ijk,n}}\big] \geq \inf_{Q \in C_{ijk}} D(Q\|P_{\theta_i}) = \inf_{Q \in A_{ijk}} D(Q\|P_{\theta_i}), \qquad (31)$$
where the last equality holds under condition (3). Thus, the type II error prob-
ability of GLRT has the same exponent as the generalized Hoeffding test if and
only if condition (3) holds. Since the generalized Hoeffding test satisfies the GNP
optimality condition (Lemma 3.1) so does the GLRT under condition (3). This also
establishes (5).