
Robust and Distributed Stochastic Localization in Sensor Networks: Theory and Experimental Results

IOANNIS CH. PASCHALIDIS and DONG GUO
Boston University

We present a robust localization system allowing wireless sensor networks to determine the physical location of their nodes. The coverage area is partitioned into regions and we seek to identify the region of a sensor based on observations by stationary clusterheads. Observations (e.g., signal strength) are assumed random. We pose the localization problem as a composite multi-hypothesis testing problem, develop the requisite theory, and address the problem of optimally placing clusterheads. We show that localization decisions can be distributed by appropriate in-network processing. The approach is validated in a testbed, yielding promising results.

Categories and Subject Descriptors: C.2.1 [Network Architecture and Design]: Distributed networks, Wireless communication.

General Terms: Algorithms, Experimentation, Theory.

Additional Key Words and Phrases: Sensor networks, localization, information theory, hypothesis testing, optimal deployment, testbed.

1. INTRODUCTION
Localization is viewed as an important service in Wireless Sensor Networks (WSNETs) because it enables a number of innovative services, including asset and personnel tracking and locating nodes that report a critical event. The Global Positioning System (GPS) provides an effective localization technology outdoors but is expensive for many WSNET applications, unreliable in downtown urban areas, and not operational indoors.

The localization literature is large, but we will restrict our attention to systems that use only RF signals from the sensors to localize. The motivation is that RF is the common denominator of all WSNET platforms since all sensors have a radio to

Ioannis Ch. Paschalidis is with the Center for Information and Systems Engineering, the Department of Electrical and Computer Engineering, and the Systems Engineering Division, Boston University, 15 St. Mary's St., Brookline, MA 02446, e-mail: [email protected], url: http://ionia.bu.edu/.
Dong Guo is with the Center for Information and Systems Engineering, Boston University, e-mail: [email protected].
Research partially supported by the NSF under grants DMI-0330171, CNS-0435312, ECS-0426453, EFRI-0735974, and by the DOE under grant DE-FG52-06NA27490.
A preliminary version of this work has appeared in Paschalidis and Guo [2007].
Permission to make digital/hard copy of all or part of this material without fee for personal or classroom use is granted provided that the copies are not made or distributed for profit or commercial advantage, the ACM copyright/server notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists requires prior specific permission and/or a fee.
© 20YY ACM 0000-0000/20YY/0000-0001 $5.00

ACM Journal Name, Vol. XX, No. XX, MM 20YY, Pages 1–22.

communicate with each other. Moreover, most existing WSNET nodes carry very rudimentary hardware that only allows the computation of the signal strength seen by a receiver for packets transmitted by some other node. Additional RF characteristics, like time-of-flight or angle-of-arrival, are not commonly available and require more sophisticated hardware. The key idea underlying RF-based localization is as follows: when a packet is transmitted by a sensor, associated RF characteristics observed by stationary sensors – the clusterheads – depend on the location of the transmitting sensor. These observations are exploited to reveal that location. As we will see, the method we develop can localize using just RF signal strength but is general enough to accommodate additional RF characteristics should they be available.
One class of RF-based localization systems relies on a “deterministic” pattern
matching approach as in Bahl and Padmanabhan [2000], Lorincz and Welsh [2006],
and Kaemarungsi and Krishnamurthy [2004]. They use the signal strength (mean)
values observed at a sensor for packets transmitted by a set of beacon nodes and
compare these values to a pre-computed signal-strength map of the coverage area.
RADAR (Bahl and Padmanabhan [2000]), for instance, one of the first localization
systems developed, computes a Euclidean distance between observed signal strength
values at a sensor and the corresponding values pre-recorded at a set of training
locations to determine the location closest to the sensor. Such an approach may
face challenges when the RF signal landscape is highly variable. This is the case
in indoor environments which are very dynamic (e.g., doors opening and closing,
people moving, etc.) and feature multipath and fading.
Another class of localization systems uses triangulation or stochastic triangulation techniques, as in Patwari et al. [2003], where signal strength measurements are used to estimate distance and location. The approach in Madigan et al. [2005] seeks to benefit from estimating multiple locations at the same time. These techniques assume a model describing how signal strength diminishes with distance (a path loss formula), and the modeling error can lead to inaccuracies. In the experimental results we report in this paper, our approach reduces the mean error distance by a factor of 3.6 compared to stochastic triangulation techniques. A different triangulation-like approach that may be vulnerable to RF signal variability appeared in Yedavalli et al. [2005]; it relies on a monotonicity property of signal strength as a function of distance being satisfied most of the time.
The work closest to the approach we present is Ray et al. [2006], which developed a stochastic localization system formulating the problem as a standard hypothesis testing problem. Specifically, signal strength measurements from a number of locations spread throughout the coverage area are used to obtain probability density functions (pdfs) of signal strength at every potential clusterhead position. To locate a sensor somewhere in the coverage area, the system tries to "match" measurements for that sensor to these pdfs; hence, a hypothesis testing problem. A limitation of this approach is that when the sensor we seek is not close to a location from which we have measurements, the observations may not match well with any of the pdfs, leading to errors. One can reduce these errors by obtaining measurements from more points, but this is costly. This motivates the work in this paper.
The key idea underlying the present work is to partition the coverage area into a set of regions. The problem is to determine the region where the sensor we seek resides. To every region-clusterhead pair we associate a family of (signal strength) pdfs. This is intended to provide robustness with respect to the position of the sensor within a region. The pdf family can be constructed from measurements taken from locations within the region and can better represent the region than a single pdf. We still pose the localization problem as a hypothesis testing problem, but now we have to match signal strength measurements to a pdf family, resulting in a composite hypothesis testing problem. In this new framework we consider the Generalized Likelihood Ratio Test (GLRT) decision rule and obtain a necessary and sufficient condition under which it is optimal in a Generalized Neyman-Pearson (GNP) sense, thus generalizing earlier work in Zeitouni et al. [1992]. Another important problem we consider is that of optimally placing clusterheads – an optimal deployment/WSNET design problem – to minimize the maximum probability of error.
We further demonstrate that our system can localize in a distributed manner by appropriate in-network processing: clusterheads make observations and take local decisions which get processed as they propagate through the network of clusterheads. The final decision reaches the gateway and, as we show, there is no performance cost compared to a centralized approach. We have implemented our approach in a testbed installed at a Boston University building.¹ Our experimental results establish that we can achieve accuracy that is, roughly, on the same order of magnitude as the radius of our regions. Specifically, we have achieved a mean error distance ranging from 8 feet down to 9 inches, depending on the size of the regions we define. The price to pay for greater accuracy is the amount of measurements needed, as smaller (and thus more) regions require more measurements to determine the family of pdfs corresponding to every region-clusterhead pair.
Our contributions include:
—formulating the localization problem as a composite hypothesis testing problem, aiming to accommodate the stochastic nature of RF signals propagating indoors and to provide robustness with respect to the measurements on which a localization decision is based;
—generalizing the GLRT optimality conditions in Zeitouni et al. [1992] to the case where both hypotheses correspond to a family of pdfs – a result which is of independent interest;
—characterizing the performance of the localization system, which enables solving the clusterhead placement problem by building on the work in Ray et al. [2006];
—devising a distributed algorithm for making the localization decision; and
—testing the proposed approach on an actual testbed.
The paper is organized as follows. In Sec. 2, we introduce our system model. In Sec. 3, we study the composite binary hypothesis testing problem, establish an optimality condition for the GLRT, and obtain bounds on the error exponents which allow us to optimize the GLRT threshold. We also consider the case where the GLRT optimality conditions are not satisfied. In Sec. 4, we consider the clusterhead placement problem. In Sec. 5, we develop the distributed decision approach and make comparisons to a centralized one. In Sec. 6, we provide results from a testbed implementation of our approach. Finally, in Sec. 7, we draw conclusions.

¹ See http://pythagoras.bu.edu/bloc/index.html

2. PROBLEM FORMULATION
In this section we introduce our system model. Consider a WSNET deployed in a site for localization purposes. We divide the site into $N$ regions denoted by an index set $\mathcal{L} = \{L_1, \ldots, L_N\}$. There are $M$ distinct positions $\mathcal{B} = \{B_1, \ldots, B_M\}$ at which we can place clusterheads.

Let a sensor be located in region $L \in \mathcal{L}$. A set of packets broadcast by the sensor is received by some of the clusterheads, which observe certain physical quantities associated with each packet. Often, the observed physical quantities are just the received signal strength (RSSI) and, if technology allows it, one can also observe the angle-of-arrival of the signal or other signal characteristics. Our methodology is general enough to apply to any set of physical observations.
Let $\mathbf{y}^{(i)}$ denote the vector of observations by a clusterhead at position $B_i$ corresponding to a packet broadcast by the sensor. These observations are assumed to be random. To simplify the analysis, in the rest of the paper we assume that the observations take values in a finite alphabet $\Sigma = \{\sigma_1, \ldots, \sigma_{|\Sigma|}\}$, where $|\Sigma|$ denotes the cardinality of $\Sigma$.² A series of $n$ consecutive observations is denoted by $\mathbf{y}^{(i)}_1, \ldots, \mathbf{y}^{(i)}_n$ and is assumed independent and identically distributed (i.i.d.) conditioned on the region in which the sensor node resides. This assumption is justified when the site is dynamic enough (e.g., doors opening or closing, people moving) so that the lengths of the various radio paths between the receiver and the transmitter change on the order of a wavelength between consecutive observations. For example, if a sensor operates in the 2.4 GHz ISM band, the half-wavelength is only about 6 cm, and body movements of a user carrying the sensor may alone cause observations separated in time by a few seconds to be i.i.d. Observations made by different clusterheads at about the same time need not be independent. We acknowledge that when the site and the transmitter/receiver are fairly static, observations over such short times may be correlated; this is a case we do not handle. The requisite theory could be developed for that case as well, but "learning" models that capture such correlation would probably require too many measurements for a practical system.
With every clusterhead-region pair $(B_i, L_j)$ we associate a family of pdfs $p_{\mathbf{Y}^{(i)}|\theta_j}(\mathbf{y})$, where $\mathbf{Y}^{(i)}$ denotes the random variable corresponding to observations $\mathbf{y}^{(i)}$ at clusterhead $B_i$ when the transmitting sensor is at some location within $L_j$. Here, $\theta_j \in \Omega_j$ is a vector in some space $\Omega_j$ parametrizing the pdf family. As mentioned earlier, the use of a family of pdfs rather than a single pdf is intended to provide robustness with respect to the exact position of the sensor within the region $L_j$. As we will see later on, we will use measurements at a few locations (or even a single one) within $L_j$, but we will associate with these measurements a family of pdfs parametrized by $\theta_j$. For example, one could obtain an empirical pdf from the measurements and associate with $L_j$ pdfs with the same shape as the empirical pdf and a mean lying in some interval centered at the empirical mean.

² This is indeed the case in practice since WSNET nodes report quantized RSSI measurements.
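To make the family construction concrete, here is a minimal sketch (our own illustration, not the authors' exact procedure): it builds an empirical pmf from quantized RSSI samples and generates a family of pmfs with the same shape but shifted means. The alphabet, the shift range, and the clamping rule at the ends of the support are all assumptions.

```python
from collections import Counter

def empirical_pmf(samples, alphabet):
    """Empirical pmf of quantized RSSI samples over a fixed alphabet."""
    counts = Counter(samples)
    n = len(samples)
    return {a: counts.get(a, 0) / n for a in alphabet}

def shifted_family(pmf, alphabet, max_shift=2):
    """Family of pmfs sharing the shape of `pmf` but with the mean moved
    by -max_shift..+max_shift alphabet steps (mass clamped at the ends),
    mimicking 'same shape, mean in an interval around the empirical mean'."""
    vals = sorted(alphabet)
    family = []
    for s in range(-max_shift, max_shift + 1):
        shifted = {a: 0.0 for a in vals}
        for idx, a in enumerate(vals):
            # move the mass at position idx to position idx + s, clamped
            tgt = min(max(idx + s, 0), len(vals) - 1)
            shifted[vals[tgt]] += pmf[a]
        family.append(shifted)
    return family
```

Each member of the returned family plays the role of one $\theta_j \in \Omega_j$ for the region.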


Given a family of pdfs for every pair $(B_i, L_j)$, we are interested in placing $K \leq M$ clusterheads at positions in $\mathcal{B}$ and using their observations to determine the region in which a sensor node resides. To that end, we will (i) characterize the performance of the localization system in terms of the probability of error, (ii) develop an algorithm for placing clusterheads that provides guarantees on the probability of error, and (iii) develop approaches for determining the sensor location in a distributed manner by directing the clusterheads to perform appropriate processing of their observations and forward only minimal information to the gateway.

3. BINARY COMPOSITE HYPOTHESIS TESTING


We start our analysis by considering the simpler problem of having a single clusterhead at $B_k$ and two possible regions, $L_i$ and $L_j$, in which the sensor may reside. Each region has an associated pdf family, $p_{\mathbf{Y}^{(k)}|\theta_i}(\mathbf{y})$ and $p_{\mathbf{Y}^{(k)}|\theta_j}(\mathbf{y})$, respectively.³ The clusterhead makes $n$ i.i.d. observations $\mathbf{y}^{(k),n} = (\mathbf{y}^{(k)}_1, \ldots, \mathbf{y}^{(k)}_n)$, from which we need to decide between $L_i$ and $L_j$. We will use the notation $p_{\mathbf{Y}^{(k)}|\theta_i}(\mathbf{y}^{(k),n}) = \prod_{l=1}^{n} p_{\mathbf{Y}^{(k)}|\theta_i}(\mathbf{y}^{(k)}_l)$.
The problem at hand is a binary composite hypothesis testing problem, for which the so-called Generalized Likelihood Ratio Test (GLRT) is commonly used. The GLRT compares the normalized generalized log-likelihood ratio
$$X_{ijk}(\mathbf{y}^{(k),n}) = \frac{1}{n} \log \frac{\sup_{\theta_i \in \Omega_i} p_{\mathbf{Y}^{(k)}|\theta_i}(\mathbf{y}^{(k),n})}{\sup_{\theta_j \in \Omega_j} p_{\mathbf{Y}^{(k)}|\theta_j}(\mathbf{y}^{(k),n})}$$
to a threshold $\lambda$ and declares $L_i$ whenever
$$\mathbf{y}^{(k),n} \in S^{\mathrm{GLRT}}_{ijk,n} = \{\mathbf{y}^n \mid X_{ijk}(\mathbf{y}^n) \geq \lambda\},$$
and $L_j$ otherwise. There are two types of error (referred to as type I and type II, respectively) associated with a decision, with probabilities
$$\alpha^{\mathrm{GLRT}}_{ijk,n}(\theta_j) = \mathbf{P}_{\theta_j}[\mathbf{y}^{(k),n} \in S^{\mathrm{GLRT}}_{ijk,n}], \qquad \beta^{\mathrm{GLRT}}_{ijk,n}(\theta_i) = \mathbf{P}_{\theta_i}[\mathbf{y}^{(k),n} \notin S^{\mathrm{GLRT}}_{ijk,n}],$$
where $\mathbf{P}_{\theta_j}[\cdot]$ (resp. $\mathbf{P}_{\theta_i}[\cdot]$) is a probability evaluated assuming that $\mathbf{y}^{(k),n}$ is drawn from $p_{\mathbf{Y}^{(k)}|\theta_j}(\cdot)$ (resp. $p_{\mathbf{Y}^{(k)}|\theta_i}(\cdot)$). We use similar notation and write $\alpha^{S}_{ijk,n}(\theta_j)$ and $\beta^{S}_{ijk,n}(\theta_i)$ for the error probabilities of any other test that declares $L_i$ whenever $\mathbf{y}^{(k),n}$ is in some set $S_{ijk,n}$. In the sequel, we will often consider the asymptotic rate at which these probabilities approach zero as $n \to \infty$. We will use the term exponent to refer to the quantity $-\lim_{n\to\infty} \frac{1}{n} \log \mathbf{P}[\cdot]$ for some probability $\mathbf{P}[\cdot]$; if the exponent is $d$, then the probability approaches zero as $e^{-nd}$.
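For a finite alphabet and discrete parameter sets $\Omega_i$, $\Omega_j$, the GLRT statistic reduces to maximizing the per-sample log-likelihood over each family. A small illustrative sketch follows; the dictionaries standing in for the pmfs $P_{\theta}$ are hypothetical and assumed strictly positive on the observed symbols.

```python
import math

def avg_log_likelihood(pmf, obs):
    """(1/n) log p(obs) for i.i.d. observations under a discrete pmf
    (assumed strictly positive on every observed symbol)."""
    return sum(math.log(pmf[o]) for o in obs) / len(obs)

def glrt_statistic(obs, family_i, family_j):
    """Normalized generalized log-likelihood ratio X_ijk: the sup over
    each finite pdf family, hypothesis i in the numerator, j below."""
    return (max(avg_log_likelihood(p, obs) for p in family_i)
            - max(avg_log_likelihood(p, obs) for p in family_j))

def glrt_decide(obs, family_i, family_j, lam):
    """Declare region i iff X_ijk >= lambda (membership in S^GLRT)."""
    return 'i' if glrt_statistic(obs, family_i, family_j) >= lam else 'j'
```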
Zeitouni et al. [1992] have established conditions for the optimality of the GLRT in a Neyman-Pearson sense for general Markov sources. The analysis in Zeitouni et al. [1992] is carried out for the case where one hypothesis corresponds to a single pdf and the other to a pdf family. We provide a generalization (in an i.i.d. setting) to the situation of interest, where both hypotheses correspond to a family of pdfs. More specifically, we will establish a necessary and sufficient condition for the GLRT to satisfy the generalized Neyman-Pearson optimality criterion given below.

³ We note that the pdf families are associated with a region-clusterhead pair. Thus, $\theta_j$ and $\theta_i$ depend on $k$ as well, but we elect to suppress this dependence in the notation for simplicity. We will usually be referring to the triplet $i, j, k$; hence, it will be evident to which $\theta_j$ and $\theta_i$ we refer.

Definition 1 (Generalized Neyman-Pearson (GNP) Criterion) We will say that the decision rule $\{S_{ijk,n}\}$ is optimal if it satisfies
$$\limsup_{n\to\infty} \frac{1}{n} \log \alpha^{S}_{ijk,n}(\theta_j) < -\lambda, \quad \forall \theta_j \in \Omega_j, \qquad (1)$$
and maximizes $-\limsup_{n\to\infty} \frac{1}{n} \log \beta^{S}_{ijk,n}(\theta_i)$ uniformly for all $\theta_i \in \Omega_i$.

For any sequence of observations $\mathbf{y}^n = (y_1, \ldots, y_n)$, the empirical measure (or type) is given by $L_{\mathbf{y}^n} = (L_{\mathbf{y}^n}(\sigma_1), \ldots, L_{\mathbf{y}^n}(\sigma_{|\Sigma|}))$, where
$$L_{\mathbf{y}^n}(\sigma_i) = \frac{1}{n} \sum_{j=1}^{n} \mathbf{1}\{y_j = \sigma_i\}, \quad i = 1, \ldots, |\Sigma|,$$
and $\mathbf{1}\{\cdot\}$ denotes the indicator function. We will denote the set of all possible types of sequences of length $n$ by $\mathcal{L}_n = \{\nu \mid \nu = L_{\mathbf{y}^n} \text{ for some } \mathbf{y}^n\}$ and the type class of a probability law $\nu$ by $T_n(\nu) = \{\mathbf{y}^n \in \Sigma^n \mid L_{\mathbf{y}^n} = \nu\}$, where $\Sigma^n$ denotes the Cartesian product of $\Sigma$ with itself $n$ times. Let
$$H(\nu) = -\sum_{i=1}^{|\Sigma|} \nu(\sigma_i) \log \nu(\sigma_i)$$
be the entropy of the probability vector $\nu$ and
$$D(\nu \| \mu) = \sum_{i=1}^{|\Sigma|} \nu(\sigma_i) \log \frac{\nu(\sigma_i)}{\mu(\sigma_i)}$$
the relative entropy of $\nu$ with respect to another probability vector $\mu$.


Lemma 3.5.3 in Dembo and Zeitouni [1998] states that it suffices to consider functions of the empirical measure when trying to construct an optimal test (i.e., the empirical measure is a sufficient statistic). Considering hereafter tests that depend only on $L_{\mathbf{y}^n}$, the so-called generalized Hoeffding [1965] test is optimal according to the GNP criterion and accepts $L_i$ when $\mathbf{y}^{(k),n}$ is in the set
$$S^*_{ijk,n} = \{\mathbf{y}^n \mid \inf_{\theta_j} D(L_{\mathbf{y}^n} \| P_{\theta_j}) \geq \lambda\},$$
where $P_{\theta_j}$ denotes the probability law induced by $p_{\mathbf{Y}^{(k)}|\theta_j}(\cdot)$. The following lemma generalizes Hoeffding's result and a similar result in Zeitouni et al. [1992]; the proof is in Appendix A.

Lemma 3.1 The generalized Hoeffding test satisfies the GNP criterion.
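The generalized Hoeffding test is easy to state computationally for a finite alphabet: compute the type of the observation sequence and its relative-entropy distance to every pmf in the family for $L_j$. A sketch (the example families are hypothetical):

```python
import math
from collections import Counter

def type_of(ys, alphabet):
    """Empirical measure (type) L_{y^n} of the sequence over Sigma."""
    counts = Counter(ys)
    n = len(ys)
    return [counts.get(a, 0) / n for a in alphabet]

def kl(nu, mu):
    """Relative entropy D(nu || mu), with the convention 0 log 0 = 0."""
    return sum(p * math.log(p / q) for p, q in zip(nu, mu) if p > 0)

def hoeffding_accepts_i(ys, alphabet, family_j, lam):
    """Generalized Hoeffding test: accept L_i iff the empirical measure
    is at relative-entropy distance >= lambda from every pmf in the
    family associated with L_j."""
    L = type_of(ys, alphabet)
    return min(kl(L, P) for P in family_j) >= lam
```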

Next, we will determine the exponent of $\beta^{S^*}_{ijk,n}(\theta_i)$. Define the set $A_{ijk} = \{Q \mid \inf_{\theta_j} D(Q \| P_{\theta_j}) < \lambda\}$. We have
$$\beta^{S^*}_{ijk,n}(\theta_i) = \mathbf{P}_{\theta_i}[\mathbf{y}^{(k),n} \notin S^*_{ijk,n}] = \mathbf{P}_{\theta_i}[L_{\mathbf{y}^{(k),n}} \in A_{ijk} \cap \mathcal{L}_n].$$
Due to Sanov's theorem (Dembo and Zeitouni [1998, Chap. 2]),
$$\inf_{Q \in A_{ijk}} D(Q \| P_{\theta_i}) \leq -\limsup_{n\to\infty} \frac{1}{n} \log \beta^{S^*}_{ijk,n}(\theta_i) \leq -\liminf_{n\to\infty} \frac{1}{n} \log \beta^{S^*}_{ijk,n}(\theta_i) \leq \inf_{Q \in A^o_{ijk}} D(Q \| P_{\theta_i}), \qquad (2)$$
where $A^o_{ijk}$ denotes the interior of $A_{ijk}$. Since $A_{ijk}$ is an open set, the upper and lower bounds match, and $\inf_{Q \in A_{ijk}} D(Q \| P_{\theta_i})$ is the exponent of $\beta^{S^*}_{ijk,n}(\theta_i)$.
The following theorem establishes a necessary and sufficient condition for the optimality of the GLRT under the GNP criterion. The proof is in Appendix B.

Theorem 3.2 The GLRT with a threshold $\lambda$ is asymptotically optimal under the GNP criterion if and only if
$$\inf_{Q \in C_{ijk}} D(Q \| P_{\theta_i}) \geq \inf_{Q \in A_{ijk}} D(Q \| P_{\theta_i}), \qquad (3)$$
for all $\theta_i$, where
$$C_{ijk} = \{Q \mid \inf_{\theta_j} D(Q \| P_{\theta_j}) - \inf_{\theta_i} D(Q \| P_{\theta_i}) < \lambda \leq \inf_{\theta_j} D(Q \| P_{\theta_j})\}.$$
Furthermore, assuming that (3) is in effect,
$$\limsup_{n\to\infty} \frac{1}{n} \log \alpha^{\mathrm{GLRT}}_{ijk,n}(\theta_j) \leq -\lambda, \quad \forall \theta_j \in \Omega_j, \qquad (4)$$
$$\limsup_{n\to\infty} \frac{1}{n} \log \beta^{\mathrm{GLRT}}_{ijk,n}(\theta_i) \leq -\inf_{Q \in A_{ijk}} D(Q \| P_{\theta_i}), \quad \forall \theta_i \in \Omega_i. \qquad (5)$$

3.1 Determining the optimal threshold

Assuming that the condition of Thm. 3.2 is in effect, it can be seen from (4) and (5) that the exponent of the type I error probability is increasing in $\lambda$, while the exponent of the type II error probability is nonincreasing in $\lambda$. We have no preference on the type of error we make; thus, we wish to balance the two exponents and determine the value of $\lambda$ at which they become equal. In this subsection we detail how this can be done and obtain a $\lambda^*_{ijk}$ that bounds the worst-case (over $\Omega_j$ and $\Omega_i$) exponents of the type I and type II error probabilities. To simplify the exposition, we will assume that $\Omega_j$ and $\Omega_i$ are discrete sets; this is also the case in the experimental setup we describe later on.

Let us consider the exponent of the type II GLRT error probability (cf. (5)):
$$Z_{ijk}(\lambda, \theta_i) = \min_{Q} \; D(Q \| P_{\theta_i}) \quad \text{s.t.} \quad \min_{\theta_j} D(Q \| P_{\theta_j}) \leq \lambda. \qquad (6)$$
The worst-case exponent over $\theta_i \in \Omega_i$ is given by
$$Z_{ijk}(\lambda) = \min_{\theta_i} Z_{ijk}(\lambda, \theta_i).$$
Note that $Z_{ijk}(\lambda)$ is nonincreasing in $\lambda$, $Z_{ijk}(0) = \min_{\theta_i} \min_{\theta_j} D(P_{\theta_j} \| P_{\theta_i})$, and $\lim_{\lambda\to\infty} Z_{ijk}(\lambda) = 0$. Assuming that $Z_{ijk}(0) > 0$, there exists a $\lambda^*_{ijk} > 0$ such that $Z_{ijk}(\lambda^*_{ijk}) = \lambda^*_{ijk}$. Furthermore, both error probability exponents in (4) and (5) are no smaller than $\lambda^*_{ijk}$.
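Because $Z_{ijk}$ is nonincreasing while the identity is increasing, the fixed point $Z_{ijk}(\lambda^*_{ijk}) = \lambda^*_{ijk}$ can be located by bisection. Below is a sketch for a binary alphabet, approximating (6) by a grid search over $Q$; the grid resolution and the example families are our own choices, not the paper's computational method.

```python
import math

def kl(p, q):
    """Relative entropy D(p || q) for distributions on a binary alphabet."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def Z(lam, fam_i, fam_j, grid=1000):
    """Grid approximation of the worst-case type II exponent Z_ijk(lambda)
    of (6): minimize D(Q||P_i) over pmfs Q = (q, 1-q) satisfying
    min over the j-family of D(Q||P_j) <= lambda."""
    best = float("inf")
    for t in range(1, grid):
        q = (t / grid, 1.0 - t / grid)
        if min(kl(q, pj) for pj in fam_j) <= lam:
            best = min(best, min(kl(q, pi) for pi in fam_i))
    return best

def balanced_threshold(fam_i, fam_j, tol=1e-3):
    """Bisect for lambda* with Z(lambda*) = lambda*: Z is nonincreasing
    and the identity is increasing, so the crossing is unique."""
    lo, hi = 0.0, 1.0
    while Z(hi, fam_i, fam_j) > hi:      # ensure the crossing is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if Z(mid, fam_i, fam_j) > mid:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```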
Now consider the clusterhead at $B_k$ observing $\mathbf{y}^{(k),n}$ and seeking to distinguish between $L_i$ and $L_j$. Assume that the GLRT using $X_{ijk}(\mathbf{y}^{(k),n})$ satisfies condition (3) and, also, that the GLRT using $X_{jik}(\mathbf{y}^{(k),n})$ satisfies the symmetric condition. The clusterhead has the option of using the GLRT by comparing $X_{ijk}(\mathbf{y}^{(k),n})$ to the threshold $\lambda^*_{ijk}$, or comparing $X_{jik}(\mathbf{y}^{(k),n})$ to a threshold $\lambda^*_{jik}$ that can be obtained in exactly the same way as $\lambda^*_{ijk}$. Let
$$d_{ijk} = \max\{\lambda^*_{ijk}, \lambda^*_{jik}\}, \qquad (7)$$
and set $(\bar{i}, \bar{j}) = (i, j)$ if $\lambda^*_{ijk}$ is the maximizer above; otherwise set $(\bar{i}, \bar{j}) = (j, i)$. Define the maximum probability of error as
$$P^{(e)}_{ijk,n} \triangleq \max\Big\{\max_{\theta_{\bar{j}}} \alpha^{\mathrm{GLRT}}_{\bar{i}\bar{j}k,n}(\theta_{\bar{j}}), \; \max_{\theta_{\bar{i}}} \beta^{\mathrm{GLRT}}_{\bar{i}\bar{j}k,n}(\theta_{\bar{i}})\Big\}.$$
The discussion above leads to the following proposition.

Proposition 3.3 Assume that the GLRT using $X_{ijk}(\mathbf{y}^{(k),n})$ satisfies condition (3) and, also, that the GLRT using $X_{jik}(\mathbf{y}^{(k),n})$ satisfies the symmetric condition. Then, when the clusterhead at $B_k$ compares $X_{\bar{i}\bar{j}k}(\mathbf{y}^{(k),n})$ to $d_{ijk}$, the maximum probability of error satisfies
$$\limsup_{n\to\infty} \frac{1}{n} \log P^{(e)}_{ijk,n} \leq -d_{ijk}.$$

One of the challenges in computing $d_{ijk}$ is that the problem in (6) is nonconvex. Specifically, the relative entropy in the constraint is convex in $Q$, but minimization over $\theta_j$ yields a piecewise convex function. This may not be an issue when there are relatively few possible values of $\theta_j$ and $\theta_i$, but for large sets $\Omega_j$ and $\Omega_i$ computing $d_{ijk}$ becomes expensive. To address this issue, we next develop a lower bound (through duality) on $Z_{ijk}(\lambda, \theta_i)$.

Let $\tilde{Z}_{ijk}(\lambda, \theta_i)$ be the optimal value of the dual to (6); by weak duality it follows that $Z_{ijk}(\lambda, \theta_i) \geq \tilde{Z}_{ijk}(\lambda, \theta_i)$. We have
$$\tilde{Z}_{ijk}(\lambda, \theta_i) = \max_{\mu \geq 0} \Big\{ \min_{\theta_j} \min_{Q} \big[D(Q \| P_{\theta_i}) + \mu D(Q \| P_{\theta_j})\big] - \mu\lambda \Big\}. \qquad (8)$$
Note that the optimization over $Q$ is convex and the optimization over $\mu$ is concave; thus, this problem can be solved efficiently. (In fact, the optimization over $Q$ can be solved analytically.) It can be seen that $\tilde{Z}_{ijk}(\lambda, \theta_i)$ is convex and nonincreasing in $\lambda$ for all $\theta_i$. Furthermore, the exponent of the type II GLRT error probability is no smaller than $\tilde{Z}_{ijk}(\lambda) = \min_{\theta_i} \tilde{Z}_{ijk}(\lambda, \theta_i)$. Note that $\tilde{Z}_{ijk}(\lambda)$ is also nonincreasing in $\lambda$, $\tilde{Z}_{ijk}(0) = \min_{\theta_i} \min_{\theta_j} D(P_{\theta_j} \| P_{\theta_i})$, and $\lim_{\lambda\to\infty} \tilde{Z}_{ijk}(\lambda) = 0$. Assuming that $\tilde{Z}_{ijk}(0) > 0$, there exists a $\tilde{\lambda}^*_{ijk} > 0$ such that $\tilde{Z}_{ijk}(\tilde{\lambda}^*_{ijk}) = \tilde{\lambda}^*_{ijk}$. Furthermore, both error exponents in (4) and (5) are no smaller than $\tilde{\lambda}^*_{ijk}$.
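The inner minimization over $Q$ in (8), $\min_Q [D(Q\|P_{\theta_i}) + \mu D(Q\|P_{\theta_j})]$, is a standard exponential-family computation: the minimizer is the normalized geometric mixture $Q^* \propto (P_{\theta_i} P_{\theta_j}^{\mu})^{1/(1+\mu)}$, and the optimal value is $-(1+\mu)$ times the log of the normalizing constant. A sketch that combines this closed form with a crude grid search over $\mu$ (the paper does not prescribe this particular search; the grid and example pmfs are our own assumptions):

```python
import math

def inner_min(p_i, p_j, mu):
    """Closed-form value of min_Q [D(Q||p_i) + mu*D(Q||p_j)].  The
    minimizer is Q* proportional to (p_i * p_j**mu)**(1/(1+mu)); the
    optimal value equals -(1+mu)*log(normalizing constant)."""
    a = 1.0 / (1.0 + mu)
    z = sum((pi ** a) * (pj ** (mu * a)) for pi, pj in zip(p_i, p_j))
    return -(1.0 + mu) * math.log(z)

def dual_bound(lam, p_i, fam_j, mus=None):
    """Grid approximation of the dual lower bound (8): maximize over
    mu >= 0 the quantity  min_{theta_j} inner_min(p_i, p_j, mu) - mu*lam."""
    mus = mus if mus is not None else [0.05 * k for k in range(200)]
    return max(min(inner_min(p_i, pj, mu) for pj in fam_j) - mu * lam
               for mu in mus)
```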
Following the same line of development as before, we set
$$\tilde{d}_{ijk} = \max\{\tilde{\lambda}^*_{ijk}, \tilde{\lambda}^*_{jik}\}, \qquad (9)$$
and define $\bar{i}$, $\bar{j}$, and $P^{(e)}_{ijk,n}$ in the same way as earlier. It can be seen that $\tilde{d}_{ijk} \leq d_{ijk}$. We arrive at the following proposition, which provides a weaker but more easily computable guarantee on the probability of error.

Proposition 3.4 Assume that the GLRT using $X_{ijk}(\mathbf{y}^{(k),n})$ satisfies condition (3) and, also, that the GLRT using $X_{jik}(\mathbf{y}^{(k),n})$ satisfies the symmetric condition. Then, when the clusterhead at $B_k$ compares $X_{\bar{i}\bar{j}k}(\mathbf{y}^{(k),n})$ to $\tilde{d}_{ijk}$, the maximum probability of error satisfies
$$\limsup_{n\to\infty} \frac{1}{n} \log P^{(e)}_{ijk,n} \leq -\tilde{d}_{ijk}.$$
Next, we tackle the case when the GLRT optimality condition (3) is not satisfied.

3.2 When the GLRT is not optimal

Define the set $D_{ijk} = \{Q \mid \inf_{\theta_j} D(Q \| P_{\theta_j}) - \inf_{\theta_i} D(Q \| P_{\theta_i}) < \lambda\}$ and note that, by an argument similar to (29), $\mathbf{y}^{(k),n} \in S^{\mathrm{GLRT}}_{ijk,n}$ is equivalent to $L_{\mathbf{y}^{(k),n}} \notin D_{ijk}$. Hence, by Sanov's theorem,
$$\limsup_{n\to\infty} \frac{1}{n} \log \alpha^{\mathrm{GLRT}}_{ijk,n}(\theta_j) \leq -\inf_{Q \in D^c_{ijk}} D(Q \| P_{\theta_j}), \quad \forall \theta_j \in \Omega_j, \qquad (10)$$
$$\limsup_{n\to\infty} \frac{1}{n} \log \beta^{\mathrm{GLRT}}_{ijk,n}(\theta_i) \leq -\inf_{Q \in D_{ijk}} D(Q \| P_{\theta_i}), \quad \forall \theta_i \in \Omega_i, \qquad (11)$$
where $D^c_{ijk}$ denotes the complement of $D_{ijk}$. Using the same argument as in the proof of Thm. 3.2, we can show that (4) still holds. The exponent of the type II GLRT error probability (cf. (11)) is
$$\hat{Z}_{ijk}(\lambda, \theta_i) = \min_{Q} \; D(Q \| P_{\theta_i}) \quad \text{s.t.} \quad \min_{\theta_j} D(Q \| P_{\theta_j}) - \min_{\theta_i} D(Q \| P_{\theta_i}) \leq \lambda,$$
which is equivalent to
$$\hat{Z}_{ijk}(\lambda, \theta_i) = \min_{Q} \; D(Q \| P_{\theta_i}) \quad \text{s.t.} \quad \min_{\theta_j} D(Q \| P_{\theta_j}) - D(Q \| P_{\theta_i}) \leq \lambda, \; \forall \theta_i. \qquad (12)$$
The worst-case exponent over $\theta_i \in \Omega_i$ is given by
$$\hat{Z}_{ijk}(\lambda) = \min_{\theta_i} \hat{Z}_{ijk}(\lambda, \theta_i).$$
$\hat{Z}_{ijk}(\lambda)$ is nonincreasing in $\lambda$, and $\lim_{\lambda\to\infty} \hat{Z}_{ijk}(\lambda) = 0$. Assuming that $\hat{Z}_{ijk}(0) > 0$, there exists a $\hat{\lambda}^*_{ijk} > 0$ such that $\hat{Z}_{ijk}(\hat{\lambda}^*_{ijk}) = \hat{\lambda}^*_{ijk}$. Furthermore, both error probability exponents in (10) and (11) are no smaller than $\hat{\lambda}^*_{ijk}$.

Following the same argument as before, we set
$$\hat{d}_{ijk} = \max\{\hat{\lambda}^*_{ijk}, \hat{\lambda}^*_{jik}\}, \qquad (13)$$
and define $\bar{i}$, $\bar{j}$, and $P^{(e)}_{ijk,n}$ in the same way as earlier. The discussion above leads to the following proposition.

Proposition 3.5 Suppose that the clusterhead at $B_k$ uses the GLRT and compares $X_{\bar{i}\bar{j}k}(\mathbf{y}^{(k),n})$ to $\hat{d}_{ijk}$. Then, the maximum probability of error satisfies
$$\limsup_{n\to\infty} \frac{1}{n} \log P^{(e)}_{ijk,n} \leq -\hat{d}_{ijk}.$$
The problem in (12) is nonconvex; we will again use dual relaxation to obtain a quantity that is easier to compute. Let $\bar{Z}_{ijk}(\lambda, \theta_i)$ be the optimal value of the dual of (12); by weak duality it follows that $\hat{Z}_{ijk}(\lambda, \theta_i) \geq \bar{Z}_{ijk}(\lambda, \theta_i)$. After some algebra,
$$\bar{Z}_{ijk}(\lambda, \theta_i) = \max_{\mu_{\theta_i} \geq 0} \Big[ \min_{\theta_j} \min_{Q} \sum_{r=1}^{|\Sigma|} Q(\sigma_r) \log\big(Q(\sigma_r) A(\sigma_r)\big) - \sum_{\theta_i} \mu_{\theta_i} \lambda \Big], \qquad (14)$$
where
$$A(\sigma_r) = \frac{1}{P_{\mathbf{Y}^{(k)}|\theta_i}(\sigma_r)} \cdot \prod_{\theta_i} \left( \frac{P_{\mathbf{Y}^{(k)}|\theta_i}(\sigma_r)}{P_{\mathbf{Y}^{(k)}|\theta_j}(\sigma_r)} \right)^{\mu_{\theta_i}}.$$
Note that the optimization over $Q$ is convex and the optimization over $\mu_{\theta_i}$ is concave; thus, this problem can be solved efficiently. In fact, the optimization over $Q$ can be solved analytically, yielding
$$Q(\sigma_l) = \frac{1/A(\sigma_l)}{\sum_{r=1}^{|\Sigma|} 1/A(\sigma_r)}, \quad l = 1, \ldots, |\Sigma|.$$
$\bar{Z}_{ijk}(\lambda, \theta_i)$ is convex and nonincreasing in $\lambda$ for all $\theta_i$. Furthermore, the exponent of the type II GLRT error probability is no smaller than $\bar{Z}_{ijk}(\lambda) = \min_{\theta_i} \bar{Z}_{ijk}(\lambda, \theta_i)$. Note that $\bar{Z}_{ijk}(\lambda)$ is also nonincreasing in $\lambda$, and $\lim_{\lambda\to\infty} \bar{Z}_{ijk}(\lambda) = 0$. Assuming that $\bar{Z}_{ijk}(0) > 0$, there exists a $\bar{\lambda}^*_{ijk} > 0$ such that $\bar{Z}_{ijk}(\bar{\lambda}^*_{ijk}) = \bar{\lambda}^*_{ijk}$. Furthermore, both error exponents in (10) and (11) are no smaller than $\bar{\lambda}^*_{ijk}$.

Following the same approach as before, set
$$\bar{d}_{ijk} = \max\{\bar{\lambda}^*_{ijk}, \bar{\lambda}^*_{jik}\}, \qquad (15)$$
and define $\bar{i}$, $\bar{j}$, and $P^{(e)}_{ijk,n}$ in the same way as earlier. It can be seen that $\hat{d}_{ijk} \geq \bar{d}_{ijk}$. We arrive at the following proposition, which provides a weaker but more easily computable guarantee on the probability of error.

Proposition 3.6 Suppose that the clusterhead at $B_k$ uses the GLRT and compares $X_{\bar{i}\bar{j}k}(\mathbf{y}^{(k),n})$ to $\bar{d}_{ijk}$. Then, the maximum probability of error satisfies
$$\limsup_{n\to\infty} \frac{1}{n} \log P^{(e)}_{ijk,n} \leq -\bar{d}_{ijk}.$$
4. LOCALIZATION AND CLUSTERHEAD PLACEMENT
In this section, we focus on how to place the $K \leq M$ clusterheads at positions in $\mathcal{B}$ to facilitate localization. We start by considering the multiple composite hypothesis testing problem of identifying the region $L \in \mathcal{L}$ in which the sensor we seek resides.
4.1 Multiple composite hypothesis testing
We assume, without loss of generality, that we have placed clusterheads at positions $B_1, \ldots, B_K$, each one making $n$ i.i.d. observations $\mathbf{y}^{(k),n} = (\mathbf{y}^{(k)}_1, \ldots, \mathbf{y}^{(k)}_n)$. Let $d_{ijk}$ be the GLRT threshold obtained in Sec. 3 for each region pair $(i, j)$, $i < j$, and clusterhead $k$. ($d_{ijk}$ is obtained from either (7), (9), (13), or (15), depending on which optimization problem we elect to solve.)

We make $N - 1$ binary decisions with the GLRT rule to arrive at a final decision. Specifically, we first compare $L_1$ with $L_2$ to accept one hypothesis, then compare the accepted hypothesis with $L_3$, and so on. For each of these $L_i$ vs. $L_j$ decisions we use a single clusterhead $B_k$, as detailed in Sec. 3, and the exponent of the corresponding maximum probability of error is bounded by $d_{ijk}$. All in all, we make $N - 1$ binary hypothesis decisions.
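The sequential rule above can be sketched as a simple elimination tournament; `decide` is a hypothetical callable wrapping the pairwise GLRT of Sec. 3 (in the paper, executed at the clusterhead responsible for the pair):

```python
def localize(regions, observations, decide):
    """Sequential elimination with N-1 pairwise decisions: compare the
    current winner with the next region.  `decide(i, j, obs)` returns
    whichever of the two regions is accepted by the binary GLRT."""
    winner = regions[0]
    for challenger in regions[1:]:
        winner = decide(winner, challenger, observations)
    return winner
```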
4.2 Clusterhead placement
Our objective is to minimize the worst-case probability of error. To that end, for every pair of regions $L_i$ and $L_j$ we need to find a clusterhead that can discriminate between them with a probability-of-error exponent larger than some $\epsilon$, and then maximize $\epsilon$. This is accomplished by the mixed integer linear programming (MILP) formulation of Figure 1.

$$\begin{aligned}
\max \quad & \epsilon && (16)\\
\text{s.t.} \quad & \textstyle\sum_{k=1}^{M} x_k = K, && (17)\\
& \textstyle\sum_{k=1}^{M} y_{ijk} = 1, \quad i, j = 1, \ldots, N, \; i < j, && (18)\\
& y_{ijk} \leq x_k, \quad \forall i, j, \; i < j, \; k = 1, \ldots, M, && (19)\\
& \epsilon \leq \textstyle\sum_{k=1}^{M} d_{ijk} y_{ijk}, \quad \forall i, j, \; i < j, && (20)\\
& y_{ijk} \geq 0, \quad \forall i, j, \; i < j, \; \forall k, && (21)\\
& x_k \in \{0, 1\}, \quad \forall k. && (22)
\end{aligned}$$

Fig. 1. Clusterhead placement MILP formulation.

In this formulation, the decision variables are $x_k$, $y_{ijk}$, and $\epsilon$, where $k = 1, \ldots, M$ and $i, j = 1, \ldots, N$, $i < j$. $x_k$ is the indicator of a clusterhead being placed at position $B_k$. Equation (17) represents the constraint that $K$ clusterheads are to be placed. Constraint (20) enforces that for every region pair there exists a clusterhead $k$ with $d_{ijk}$ larger than $\epsilon$. Let $x^*_k$, $y^*_{ijk}$, and $\epsilon^*$ ($k = 1, \ldots, M$, $i, j = 1, \ldots, N$, $i < j$) be an optimal solution of this MILP. Although this problem is NP-hard, it can be solved efficiently for sites with more than 100 regions and potential clusterhead positions by using a special-purpose algorithm from Ray et al. [2006].
Consider an arbitrary placement of K clusterheads. More specifically, let Y be
any subset of the set of potential clusterhead positions B with cardinality K. Let
x(Y ) = (x1 (Y ), . . . , xM (Y )) where xk (Y ) is the indicator function of Bk being in
Y. Define:

ǫ(Y) = min_{i,j=1,…,N, i<j} max_{k: x_k(Y)=1} d_{ijk}.    (23)

We can interpret max_{k: x_k(Y)=1} d_{ijk} as the best exponent for the probability of error
in distinguishing between regions Li and Lj from some clusterhead in Y. Then
ǫ(Y) is simply the worst pairwise exponent. The following result is from Ray et al.
[2006].
ACM Journal Name, Vol. XX, No. XX, MM 20YY.
12 · Paschalidis and Guo

Proposition 4.1. For any clusterhead placement Y we have

ǫ^* ≥ ǫ(Y).    (24)

Moreover, the selected placement achieves equality; i.e.,

ǫ^* = min_{i,j=1,…,N, i<j} max_{k: x_k^*=1} d_{ijk},    (25)

and the optimal solution satisfies

y_{ijk}^* = 1 if k = arg max_{k: x_k^*=1} d_{ijk}, and y_{ijk}^* = 0 otherwise,    ∀ i, j, i < j, ∀ k,    (26)

where at most one y_{ijk}^* is set to 1 for a given (i, j) pair.

To summarize, the positions where clusterheads are to be placed are in the set
Y^* = {B_k : x_k^* = 1}. For every region pair Li and Lj, max_{k: x_k^*=1} d_{ijk} is the best
exponent for the probability of error in distinguishing between these regions, and
the clusterhead responsible for that decision is the one corresponding
to y_{ijk}^* = 1; we will denote by k_{ij}^* the index of this clusterhead.

4.3 Performance guarantee


We will use the decision rule outlined in Sec. 4.1 and for every region pair (i, j)
we will rely on the clusterhead at B_{k_{ij}^*} to make the corresponding decision. The
following theorem establishes a performance guarantee.

Proposition 4.2. Let x^*, y^* be an optimal solution of the MILP in Fig. 1 with
corresponding optimal value ǫ^*. Place clusterheads according to Y^* ≜ {B_k : x_k^* = 1}
and for every (i, j) select the clusterhead with index k_{ij}^* so that y_{ij k_{ij}^*}^* = 1. Then, the
worst case probability of error for the decision rule described in Sec. 4.1, P_n^{(e),opt},
satisfies

lim sup_{n→∞} (1/n) log P_n^{(e),opt} ≤ −ǫ^*.    (27)
Proof. Recall the results of Props. 3.3, 3.4, 3.5, and 3.6 for the case where d_{ijk}
is defined by (7), (9), (13), or (15), respectively. Define (ī, j̄) as in Sec. 3.

The clusterhead with index k_{ij}^* will use the GLRT, which compares X_{ī j̄ k_{ij}^*}^{(k_{ij}^*),n}(y^{(k_{ij}^*),n})
to d_{ij k_{ij}^*}, thus achieving a maximum probability of error with exponent no smaller
than d_{ij k_{ij}^*}. Now, for every i and j ≠ i define E_n(i, j) as the event that the GLRT
employed by the clusterhead at B_{k_{ij}^*} will decide L_j under P_{θ_i}. For all δ_n > 0 and
large enough n we have

P_{θ_i}[error] ≤ P_{θ_i}[∪_{j≠i} E_n(i, j)] ≤ ∑_{j≠i} e^{−n(d_{ij k_{ij}^*} − δ_n)} ≤ (N − 1) e^{−n(ǫ^* − δ_n)}.

The 2nd inequality above is due to Props. 3.3, 3.4, 3.5, or 3.6 and the last inequality
is due to (25). Since the bound above holds for all i, we obtain (27).
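Numerically, the bound of Prop. 4.2 decays geometrically in the number of observations n. A one-line sketch (ignoring the vanishing δ_n term; the parameter values used below are purely illustrative):

```python
import math

def worst_case_error_bound(eps_star, n, N):
    """Union bound from the proof of Prop. 4.2:
    P[error] <= (N - 1) * exp(-n * eps_star), up to the vanishing delta_n
    term. eps_star is the optimal MILP value, n the number of RSSI
    observations, N the number of regions."""
    return (N - 1) * math.exp(-n * eps_star)
```

For example, with N = 60 regions and ǫ* = 0.2, increasing n from 20 to 30 observations shrinks the bound by a factor of e² ≈ 7.4, which is why a few tens of packets per localization suffice in practice.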
Robust and Distributed Stochastic Localization in Sensor Networks · 13

5. DISTRIBUTED LOCALIZATION
In this section we consider the implementation of the decision rule described in
Sec. 4. We assume that the WSNET has a single gateway. We seek to devise a
distributed localization algorithm in order to minimize the information that needs
to be exchanged between clusterheads and the gateway. The primary motivation
is that in WSNETs communication is, in general, more expensive than processing.
For the remainder of this section we will assume that the clusterheads and the
gateway form a connected network. Otherwise, one can simply add a sufficient
number of relays.

5.1 Centralized approach


To set the stage and establish a benchmark against which we will compare other
approaches, we first describe a naive, centralized approach. According to this
approach, every clusterhead observes y^{(k),n} = (y_1^{(k)}, …, y_n^{(k)}) and transmits this
information to the gateway. The clusterheads do not need to store anything and
perform no processing; they are simple sensors that transmit their measurements.
Letting S_1 be the message size (in bits) needed to encode a measurement y_l^{(k)}, for
some l, the total amount of information that needs to be transported is O(S_1 nK)
bits. Each one of these bits has to be sent over multiple hops to reach the gate-
way; in the worst case over K hops. Thus, the worst case communication cost is
O(S1 nK 2 ) bits. In our setting, we are interested in deploying as few clusterheads
as possible, thus, the resulting clusterhead network is sparse and likely to have a
linear topology. This implies that the worst case communication cost of O(K) per
transmission may be typical. If the clusterhead network is more dense, a binary
tree may be a good assumption for its topology which implies a communication cost
of O(log K) per transmission. In that best case scenario the total communication
cost becomes O(S1 nK log K). Once this information is received, the gateway can
apply the decision rule discussed in Sec. 4 to identify the region at which the sensor
in question resides.

5.2 Distributed approach


Next we describe a distributed implementation for the decision rule.
We start with an arbitrary pair of regions, say L1 vs. L2. The clusterhead at
B_{k_{12}^*}, based on the observations y^{(k_{12}^*),n}, uses the GLRT to make the decision; let
L_{l_1} be the hypothesis accepted. The clusterhead at B_{k_{12}^*} sends the information that
l_1 is accepted to the clusterhead at B_{k_{l_1,3}^*}, which follows up with the decision L_{l_1}
vs. L3, and so on. Let now L_{l_i} denote the hypothesis accepted at
stage i of the algorithm, for i = 1, …, N − 1, where we set l_0 = 1. At the ith
stage, the clusterhead at B_{k_{l_{i−1},(i+1)}^*} makes the decision L_{l_{i−1}} vs. L_{i+1} and sends
the result to the clusterhead at B_{k_{l_i,(i+2)}^*}, where the clusterhead at B_{k_{l_{N−1},(N+1)}^*} is
taken to be the gateway. All in all this procedure takes N − 1 stages and L_{l_{N−1}} is the final
accepted hypothesis.
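The message-passing chain above can be sketched as follows (a sketch with hypothetical helper names: `responsible[(i, j)]` maps a region pair to its clusterhead k*_{ij}, and `decide(k, i, j)` stands in for the GLRT outcome at clusterhead k):

```python
def distributed_localization(num_regions, responsible, decide):
    """Distributed rule of Sec. 5.2: each pairwise test runs at its
    responsible clusterhead and only the index of the accepted hypothesis
    (O(log N) bits) is forwarded to the next clusterhead in the chain."""
    messages = 0
    accepted = 0  # region indices 0, ..., num_regions - 1
    for nxt in range(1, num_regions):
        i, j = min(accepted, nxt), max(accepted, nxt)
        accepted = decide(responsible[(i, j)], i, j)
        messages += 1  # forward the accepted index downstream
    return accepted, messages
```

Exactly N − 1 short messages are exchanged regardless of which hypothesis wins, in contrast to the centralized scheme where all raw measurements travel to the gateway.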
Each clusterhead is responsible for a set of region pairs and needs to store the
corresponding pdfs and thresholds dijk as well as the necessary information to de-
cide where to forward its decision. At every stage i = 1, . . . , N − 1 it takes O(n)

work to perform the GLRT, yielding an overall O(nN ) processing effort distributed
to the K clusterheads. In terms of communication cost, N − 1 messages get ex-
changed each consisting of O(log N ) bits needed to encode the decision. Each of
these messages can, in the worst case, be sent over O(K) hops if two distant clus-
terheads need to communicate, yielding an overall worst case communication cost
of O(KN log N ). However, one can sequence the regions in such a way that ge-
ographically close regions are close in the sequence. As a result, it will often be
the case that clusterheads responsible for region pairs close in the sequence will be
geographically close.
We note that this “locality” property is plausible since the signal landscape is
primarily influenced by the structure of the site. Hence, it is reasonable to expect
the best clusterheads for nearby regions to be geographically close. To see that,
consider a large deployment with a radius much larger than the range of the sensor
nodes. The clusterheads responsible for nearby regions should be able to listen
to sensors within these regions, which implies that they are geographically close
compared to the overall size of the deployment.
This results in messages between clusterheads traveling a few hops. It follows
that the overall communication cost will often be O(N log N ).
Based on the preceding analysis, Table I compares the centralized and distributed
approaches. A couple of remarks are in order. The total processing cost is the same

Table I. Comparing the centralized and distributed approaches. Typically K = O(N).

              | Communication cost (bits) | Processing cost
  Centralized | worst: O(S_1 n K^2)       | O(nN) at the gateway
              | best:  O(S_1 n K log K)   |
  Distributed | worst: O(K N log N)       | O(nN) at the K clusterheads
              | best:  O(N log N)         |

for both approaches but in the distributed case the work is distributed among the K
clusterheads. To compare the communication costs note that typically K = O(N )
to ensure reasonable performance (e.g., one clusterhead for a fixed number of re-
gions). Moreover, S1 is the message size for the raw measurements at a clusterhead
corresponding to a packet sent from the transmitting sensor, while n can be large
enough (e.g., 20-30) so that the probability of error becomes small enough. Fur-
thermore, based on the discussion earlier, we expect the worst case to be typical
in the centralized approach while the best case should be typical in the distributed
approach. It follows that the distributed approach leads to communication cost
(and energy) savings.
Note that both the centralized and the distributed approach guarantee the per-
formance of the system obtained in Prop. 4.2, i.e., the savings from the distributed
approach come with no performance loss.
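The asymptotic entries of Table I can be compared concretely. The following sketch drops constants and uses illustrative parameter values (S_1 bits per raw measurement, n packets, K clusterheads, N regions); it is a back-of-the-envelope model, not a protocol implementation.

```python
import math

def centralized_bits(S1, n, K, worst=True):
    """O(S1 n K) raw-measurement bits, each sent over K hops (linear
    clusterhead topology, worst case) or ~log2 K hops (tree topology,
    best case)."""
    hops = K if worst else max(1, math.ceil(math.log2(K)))
    return S1 * n * K * hops

def distributed_bits(N, K, worst=True):
    """N - 1 decision messages of ~log2 N bits; worst case each crosses
    O(K) hops, best case only a few hops thanks to locality."""
    hops = K if worst else 1
    return (N - 1) * max(1, math.ceil(math.log2(N))) * hops
```

With S_1 = 8, n = 20, K = 12, N = 60 (roughly our testbed's scale), even the worst-case distributed cost is well below the worst-case centralized cost, consistent with Table I.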

6. EXPERIMENTAL RESULTS
Next, we provide experimental results from a localization testbed we have installed
at Boston University (BU).

Fig. 2. Floor plan for the testbed.

The localization testbed was implemented in a BU building and covered the


areas shown on the floorplan of Fig. 2. The coverage area included typical faculty
and student offices, several large rooms used as labs, two conference rooms, and an
equipment bay with heavy machining equipment located on a basement level (lower
middle shaded section in Fig. 2) whereas the rest of the covered area is located on
the 1st floor.
The testbed uses MPR2400 (MICAz) motes from Crossbow Technology Inc. The
MPR2400 (2400–2483.5 MHz band) uses the Chipcon CC2420, IEEE 802.15.4 com-
pliant, ZigBee-ready radio frequency transceiver integrated with an Atmega128L
micro-controller. Its radio can be tuned within the IEEE 802.15.4 channels, num-
bered from 11 (2.405 GHz) to 26 (2.480 GHz), each separated by 5 MHz. The RF
transmission power is programmable from 0 dBm (1 mW) to −25 dBm. We cov-
ered 16 rooms and corridors and defined 60 regions. Within each region we placed
a mote; the centers of these positions are identified by either a green circle or a
red square on the floor plan. These 60 positions make up the set B of possible
clusterhead positions. Hence, in our testbed N = M = 60 and Bj can be thought
as the center of Lj . All 60 motes are connected to a base MICAz through a mesh
network. The base mote is docked on a programming board which is connected to
a laptop acting as a server.
The experimental validation of our localization approach can be divided into
the five phases outlined in Fig. 3. Phase 1 can be carried out automatically by
scheduling the motes so that when one is broadcasting the others are listening. We
construct our pdf databases by measuring 200 packets for each pair of motes sent
over two frequency channels and with two different power levels. These packets

(1) For each pair of positions (B_k, B_j) estimate the pdf p_{Y^{(k)}|B_j}(y) of RSSI at B_k when the
    mote at B_j is transmitting. Let m_{jk} denote the corresponding mean.
(2) For each (B_k, B_j) construct a pdf family {p_{Y^{(k)}|θ_j}(y), θ_j ∈ Ω_j} to characterize transmissions
    from positions within L_j.
(3) Compute the exponent d_{ijk} as described in Sec. 3.1.
(4) Determine the clusterhead placement by the algorithm in Sec. 4.2.
(5) Determine the location of any mote in the coverage area by the decision rule of Sec. 4.1.

Fig. 3. Phases of the experimental validation.

are sent over a long enough time interval to capture the environment in different
“states” and thus account for the variability in RSSI measurements. For Phase 2
we define an interval [m_{jk} − m̂_{jk}, m_{jk} + m̂_{jk}] and select points θ_{j,1}, …, θ_{j,R} in this
interval. We construct the family {p_{Y^{(k)}|θ_j}(y), θ_j ∈ Ω_j} so that the lth member
has the same shape as p_{Y^{(k)}|B_j}(y) but a mean equal to θ_{j,l}, for l = 1, …, R. m̂_{jk} is
selected appropriately so that the union over j, k of the intervals [m_{jk} − m̂_{jk}, m_{jk} +
m̂_{jk}] is maximized and there is no overlap. The value of R determines how rich
the pdf families are; in our experiments θ_{j,1}, …, θ_{j,R} were selected to include all
integers in the interval [m_{jk} − m̂_{jk}, m_{jk} + m̂_{jk}]. It can be seen that to construct
the pdf families we only used measurements from a single point (the center) within
a region. Therefore, the measurement campaign is not necessarily more expensive
than the one required by the approach in Ray et al. [2006] which uses a single
(rather than a family) pdf per region. For Phase 3 we were not able to verify
the GLRT optimality condition (cf. Thm. 3.2), so we obtained dijk by computing
the type II exponent as in (15). The optimal placement obtained in Phase 4 is
shown in Fig. 2 where we used 12 clusterheads placed at the positions of the red
squares on the graph. The number of clusterheads was selected to achieve a small
enough probability of error (cf. Prop. 4.2). The training phase (Steps 1 and 2
of Fig. 3) takes about a day. Step 3 depends on the hardware used to solve the
corresponding optimization problems. It took about 2 days for our testbed. Step 4
takes just about half an hour. Note that these steps are performed once, assuming
that the environment does not change structurally in a very dramatic manner. The
detection phase (Step 5) takes on the order of 40 seconds. Finally, in terms of
storage requirements, the distributed algorithm needs to store about 2 Kbytes in
each of the clusterheads (pdf families, where to forward decisions, etc.).
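The pdf-family construction of Phase 2 can be sketched as follows for a discrete (integer-dBm) RSSI pmf. This is a minimal sketch, assuming the empirical pmf is stored as a value-to-probability dictionary:

```python
def shifted_family(base_pmf, base_mean, theta_values):
    """Build the pdf family of Phase 2: each member keeps the shape of the
    empirical pmf measured at the region center but has mean theta_{j,l}.
    RSSI values are integer dBm, so shifting by an integer offset simply
    relabels the support without changing the shape."""
    family = []
    for theta in theta_values:
        offset = round(theta - base_mean)
        family.append({v + offset: p for v, p in base_pmf.items()})
    return family
```

Choosing `theta_values` as all integers in [m_{jk} − m̂_{jk}, m_{jk} + m̂_{jk}] reproduces the family richness used in our experiments.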
We obtained results for three versions of the localization system. We made 100
localization tests in positions spread within the covered area. Each test used 20
packets (RSSI measurements) broadcast by the mote to be located (5 over each
channel/power-level pair for the 2F2P cases described below). In Version 1 the
mote we wish to locate transmits packets over a single frequency (2.410 GHz) and
a single power level (0 dBm) and the system uses the GLRT to determine the
region of the mote (we write 1F1P−G to indicate Version 1 in Fig. 4). In Version 2
(denoted by 2F2P−G) RSSI observations are made for packets transmitted over two different
frequencies (2.410 GHz and 2.460 GHz) and two different power levels (0 dBm and
−10 dBm), and the GLRT is again used. Version 3 (denoted by 2F2P−L) is

Fig. 4. Histograms of the error distance (in inches) for the various versions of the system, at large scale (left column) and small scale (right column): (a) 1F1P−G, large scale, D̄e = 266.07 inches; (b) 2F2P−G, large scale, D̄e = 96.07 inches; (c) 2F2P−L, large scale, D̄e = 183.44 inches; (d) 1F1P−G, small scale, D̄e = 11.10 inches; (e) 2F2P−G, small scale, D̄e = 9.77 inches; (f) 2F2P−L, small scale, D̄e = 9.26 inches.

identical to Version 2 but the LRT rather than the GLRT is used where every region
is represented by just the pdf observed in Phase 1 (rather than a pdf family). For
each Version 1–3 results are reported in Fig. 4(a)–(c), respectively. In each of these
figures we plot the histogram of the error distance (in inches) based on 100 trials.
If the system identifies region Lj as the one where the transmitting mote is located
then the error distance is defined as the distance between the transmitting mote and

Bj . For each system we also report the corresponding mean error distance (D̄e ).
We stress that for each trial the location of the transmitting mote is randomly
selected and almost never one at which RSSI measurements have been made in
Phase 1.
The results show that the 2F 2P − G system, which exploits frequency and power
diversity, outperforms the 1F 1P − G system. Clearly, RSSI measurements at mul-
tiple power and frequency levels contain more information about the transmitter
location. Also, the 2F 2P − G system outperforms the 2F 2P − L system which uses
the standard LRT decision rule. This demonstrates that, as envisioned, the GLRT
provides robustness leading to better performance. The issue with the LRT is that
a single pdf cannot adequately represent a relatively large region. We also note
that the total coverage area was 5258 feet², that is, about 87 feet² per region. With
a mean error distance of D̄e = 8 feet, the mean area of "confusion" was 8² = 64
feet². From these results it is evident that we were able to achieve accuracy on the
same order of magnitude as the mean area of a region. That is, the system was
identifying the correct or a neighboring region most of the time. Put differently, we
can say that the achieved mean error distance is about the same as the radius of a
region, defined as radius = √area (for our experiments √87 = 9.3 feet, which is in
fact larger than the mean error distance of 8 feet). We used a clusterhead density of
1 clusterhead per 5258/12 ≈ 438 feet². Note that our system is not localizing based
on "proximity" to a clusterhead; one clusterhead corresponds to about 5 regions,
thus resulting in cost savings compared to proximity-based systems that need a
higher density of observers.
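The difference between the 2F2P−G and 2F2P−L systems comes down to maximizing the likelihood over a pdf family versus using a single pdf. A binary-decision sketch for discrete pmfs (illustrative only; the actual statistic and thresholds are those of Sec. 3):

```python
import math

def avg_loglik(pmf, obs):
    """Average log-likelihood of the observations under a discrete pmf."""
    return sum(math.log(pmf[o]) for o in obs) / len(obs)

def glrt_stat(family_i, family_j, obs):
    """GLRT statistic between two composite hypotheses, each given as a
    *family* of candidate pmfs; with a single pmf per family this reduces
    to the plain LRT. Positive values favor the first hypothesis."""
    return (max(avg_loglik(p, obs) for p in family_i)
            - max(avg_loglik(p, obs) for p in family_j))
```

Enlarging a family lets transmissions from anywhere in the region match some member well, which is precisely the robustness the single-pdf LRT lacks over large regions.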
An interesting question is whether the pdf families constructed during the train-
ing phase remain valid after a long period of time or need very frequent updating
(which is costly). To answer this question, we performed another (smaller) set of
56 localization tests after about one year from the time we derived our pdf fami-
lies. This second set of tests yielded a mean error distance of 87.32 inches for the
2F2P−G system, quite similar to the earlier tests. During this year there were
modest changes in the building, with labs and conference rooms being reorganized
and several faculty moving to new offices.
For comparison purposes, we also used the same testbed and the exact same tests
with the stochastic triangulation method of Patwari et al. [2003]. Patwari et al.
[2003] assumes that the RSSI (in dB) at B_k when the mote at B_j is transmitting,
say Y^{(k)}|B_j, is a random variable with a Gaussian distribution. The mean
RSSI satisfies the path loss formula Ȳ^{(k)}|B_j = Y_0 − 10 n_p log_10(ζ_{kj}/ζ_0), where ζ_{kj}
is the distance between B_k and B_j and ζ_0 is a normalizing constant. From prior
measurements we obtained n_p = 3.65 and Y_0 = −48.62 dBm for ζ_0 = 3 feet. The
location estimation is obtained by maximum likelihood estimation. Applying this
method and using our clusterheads in the exact same position as before resulted in
a mean error distance of 341.72 inches (29 feet) which is much larger (a factor of
3.6!) than the 8 feet obtained by our method.
These results raised the question whether smaller regions can lead to better
accuracy. To that end, we placed 12 motes on a table (two rows of 6 motes each).
Two neighboring motes in one row (or in one column) were 6 inches apart. We
defined a 36 square-inch region around each mote and followed the exact same procedure

as before. The results of this “small scale” localization experiment are in Fig. 4(d)–
(f). As before frequency and power diversity improve performance. Here, however,
the GLRT does not make a difference compared to LRT and this is because every
trial point in the coverage area is very close to a point we have measurements from.
With the LRT we can achieve a mean error distance of 9.26 inches, that is, we can
again achieve an accuracy on the same order of magnitude as the mean area of a
region.

7. CONCLUSIONS
In this paper, we presented a robust and distributed approach for locating the area
(region) where sensors of a WSNET reside. We posed the problem of localization
as a multiple composite hypothesis testing problem and proposed a GLRT-based
decision rule. We established a necessary and sufficient condition for the GLRT
to be optimal in a generalized Neyman-Pearson sense but also considered the case
where such optimality conditions are not met. Developing asymptotic results on
the type I and type II error exponents, we described how an optimal GLRT thresh-
old can be obtained. We then turned to the problem of optimally placing a given
number of clusterheads to minimize the probability of error. We devised a place-
ment algorithm that provides a probabilistic guarantee on the probability of error.
Furthermore, we proposed a distributed approach to implement the GLRT-based
decision rule and demonstrated that this can lead to savings in the communication
cost compared to a centralized approach.
We validated our approach using testbed implementations involving MICAz
motes manufactured by Crossbow. Our experimental results demonstrate that the
GLRT-based system provides significant robustness (and improved performance)
compared to an LRT-based system such as the one in Ray et al. [2006]. Fur-
thermore, our approach leads to significantly improved accuracy compared to a
stochastic triangulation technique like the one in Patwari et al. [2003] – by a factor
of 3.6 in our tests. We showed that we can achieve an accuracy on the same order
of magnitude as the mean area of a region. This represents the best possible ac-
curacy for a system which identifies the region of the mote rather than estimating
the exact location. Smaller regions (and more clusterheads) lead to better accuracy
but at the expense of more initial measurements (training) and higher equipment
cost. This provides a rule of thumb for practical systems: define regions as small
as possible given a tolerable amount of initial measurements and cost.

Acknowledgments
We would like to thank Binbin Li for implementing the stochastic triangulation
approach which was compared to ours.

REFERENCES
Bahl, P. and Padmanabhan, V. 2000. RADAR: An in-building RF-based user location and
tracking system. In Proceedings of the IEEE INFOCOM Conference. IEEE, Tel-Aviv, Israel.
Dembo, A. and Zeitouni, O. 1998. Large Deviations Techniques and Applications, 2nd ed.
Springer-Verlag, NY.
Hoeffding, W. 1965. Asymptotically optimal tests for multinomial distributions. Ann. Math.
Statist. 36, 369–401.

Kaemarungsi, K. and Krishnamurthy, P. 2004. Modeling of indoor positioning systems based


on location fingerprinting. In Proceedings of the IEEE INFOCOM Conference.
Lorincz, K. and Welsh, M. 2006. MoteTrack: A robust, decentralized approach to RF-based
location tracking. Springer Personal and Ubiquitous Computing, Special Issue on Location
and Context-Awareness.
Madigan, D., Elnahrawy, E., Martin, R. P., Ju, W.-H., Krishnan, P., and Krishnakumar,
A. S. 2005. Bayesian indoor positioning systems. In Proceedings of the IEEE Infocom Confer-
ence. Miami, Florida.
Paschalidis, I. C. and Guo, D. 2007. Robust and distributed localization in sensor networks. In
Proceedings of the 46th IEEE Conference on Decision and Control. New Orleans, Louisiana,
933–938.
Patwari, N., Hero, A. O., Perkins, M., Correal, N. S., and O'Dea, R. J. 2003. Relative
location estimation in wireless sensor networks. IEEE Transactions on Signal Processing 51, 8,
2137–2148.
Ray, S., Lai, W., and Paschalidis, I. C. 2006. Statistical location detection with sensor networks.
Joint special issue IEEE/ACM Trans. Networking and IEEE Trans. Information Theory 52, 6,
2670–2683.
Yedavalli, K., Krishnamachari, B., Ravula, S., and Srinivasan, B. 2005. Ecolocation: A
sequence based technique for RF-only localization in wireless sensor networks. In The Fourth
International Conference on Information Processing in Sensor Networks. Los Angeles, CA.
Zeitouni, O., Ziv, J., and Merhav, N. 1992. When is the generalized likelihood ratio test
optimal? IEEE Trans. Inform. Theory 38, 5, 1597–1602.

A. PROOF OF LEMMA 3.1

Proof. For all θ_j ∈ Ω_j we have, with all sums below taken over
{L_{y^{(k),n}} : T_n(L_{y^{(k),n}}) ⊆ S_{ijk,n}^*},

α_{ijk,n}^{S^*}(θ_j) = P_{θ_j}[y^{(k),n} ∈ S_{ijk,n}^*]
  = ∑ |T_n(L_{y^{(k),n}})| p_{Y^{(k)}|θ_j}(y^{(k),n})
  ≤ ∑ e^{n H(L_{y^{(k),n}})} e^{−n[H(L_{y^{(k),n}}) + D(L_{y^{(k),n}} ‖ P_{θ_j})]}
  = ∑ e^{−n D(L_{y^{(k),n}} ‖ P_{θ_j})}
  ≤ (n + 1)^{|Σ|} e^{−nλ},

which establishes (1). For the first inequality above, note that the size of the type
class of L_{y^{(k),n}} is upper bounded by e^{n H(L_{y^{(k),n}})} and that the probability of a
sequence can be written in terms of the entropy and the relative entropy of its
type (see Dembo and Zeitouni [1998, Chap. 2]). In the last inequality above we
used the definition of S_{ijk,n}^* and the fact that the set of all possible types, L_n, has
cardinality upper bounded by (n + 1)^{|Σ|} (Dembo and Zeitouni [1998, Chap. 2]).
Let now S_{ijk,n} be some other decision rule satisfying constraint (1); hence, for
all ǫ > 0 and all large enough n,

α_{ijk,n}^{S}(θ_j) ≤ e^{−n(λ+ǫ)}.    (28)

Meanwhile, for all ǫ > 0, all large enough n, and any y^{(k),n} ∈ S_{ijk,n}, with both
sums below taken over {L_{y^{(k),n}} : T_n(L_{y^{(k),n}}) ⊆ S_{ijk,n}},

α_{ijk,n}^{S}(θ_j) = ∑ |T_n(L_{y^{(k),n}})| p_{Y^{(k)}|θ_j}(y^{(k),n})
  ≥ (n + 1)^{−|Σ|} ∑ e^{−n D(L_{y^{(k),n}} ‖ P_{θ_j})}
  ≥ e^{−n[D(L_{y^{(k),n}} ‖ P_{θ_j}) + ǫ]},

where the first inequality above uses Dembo and Zeitouni [1998, Lemma 2.1.8].
Comparing the above with (28), it follows that if y^{(k),n} ∈ S_{ijk,n} then
D(L_{y^{(k),n}} ‖ P_{θ_j}) ≥ λ for all θ_j; hence, y^{(k),n} ∈ S_{ijk,n}^* and S_{ijk,n} ⊆ S_{ijk,n}^*. Consequently,
β_{ijk,n}^{S}(θ_i) ≥ β_{ijk,n}^{S^*}(θ_i) for all θ_i, which establishes that the generalized Hoeffding
test maximizes the exponent of the type II error probability. We conclude that it
satisfies the GNP criterion.

B. PROOF OF THEOREM 3.2

Proof. We first show that S_{ijk,n}^{GLRT} ⊆ S_{ijk,n}^*. For y^{(k),n} ∈ S_{ijk,n}^{GLRT},

λ ≤ (1/n) log sup_{θ_i} p_{Y^{(k)}|θ_i}(y^{(k),n}) − (1/n) log sup_{θ_j} p_{Y^{(k)}|θ_j}(y^{(k),n})
  = sup_{θ_i} [−H(L_{y^{(k),n}}) − D(L_{y^{(k),n}} ‖ P_{θ_i})] − sup_{θ_j} [−H(L_{y^{(k),n}}) − D(L_{y^{(k),n}} ‖ P_{θ_j})]
  = − inf_{θ_i} D(L_{y^{(k),n}} ‖ P_{θ_i}) + inf_{θ_j} D(L_{y^{(k),n}} ‖ P_{θ_j})    (29)
  ≤ inf_{θ_j} D(L_{y^{(k),n}} ‖ P_{θ_j}),

which implies that y^{(k),n} ∈ S_{ijk,n}^*. It follows that α_{ijk,n}^{GLRT}(θ_j) ≤ α_{ijk,n}^{S^*}(θ_j), which
establishes that the GLRT satisfies (1) and (4) due to Lemma 3.1.
For the type II error probability we have

β_{ijk,n}^{GLRT}(θ_i) = P_{θ_i}[y^{(k),n} ∉ S_{ijk,n}^{GLRT}]
  = P_{θ_i}[y^{(k),n} ∉ S_{ijk,n}^*] + P_{θ_i}[y^{(k),n} ∈ S_{ijk,n}^* ∩ S̄_{ijk,n}^{GLRT}],    (30)

where S̄_{ijk,n}^{GLRT} denotes the complement of S_{ijk,n}^{GLRT}.

Now, if y^{(k),n} ∈ S̄_{ijk,n}^{GLRT} then due to (29)

λ > (1/n) log sup_{θ_i} p_{Y^{(k)}|θ_i}(y^{(k),n}) − (1/n) log sup_{θ_j} p_{Y^{(k)}|θ_j}(y^{(k),n})
  = inf_{θ_j} D(L_{y^{(k),n}} ‖ P_{θ_j}) − inf_{θ_i} D(L_{y^{(k),n}} ‖ P_{θ_i}),

and if y^{(k),n} ∈ S_{ijk,n}^* then λ ≤ inf_{θ_j} D(L_{y^{(k),n}} ‖ P_{θ_j}), which implies that if y^{(k),n} ∈
S_{ijk,n}^* ∩ S̄_{ijk,n}^{GLRT} then L_{y^{(k),n}} ∈ C_{ijk}. Sanov's theorem yields

− lim sup_{n→∞} (1/n) log P_{θ_i}[y^{(k),n} ∈ S_{ijk,n}^* ∩ S̄_{ijk,n}^{GLRT}] ≥ inf_{Q ∈ C_{ijk}} D(Q ‖ P_{θ_i})    (31)

for all θ_i. Combining (2) and (31),

− lim sup_{n→∞} (1/n) log β_{ijk,n}^{GLRT}(θ_i)
  = min{ − lim sup_{n→∞} (1/n) log β_{ijk,n}^{S^*}(θ_i),
         − lim sup_{n→∞} (1/n) log P_{θ_i}[y^{(k),n} ∈ S_{ijk,n}^* ∩ S̄_{ijk,n}^{GLRT}] }
  ≥ min{ inf_{Q ∈ A_{ijk}} D(Q ‖ P_{θ_i}),  inf_{Q ∈ C_{ijk}} D(Q ‖ P_{θ_i}) }
  = inf_{Q ∈ A_{ijk}} D(Q ‖ P_{θ_i}),

where the last equality holds under condition (3). Thus, the type II error probability
of the GLRT has the same exponent as the generalized Hoeffding test if and
only if condition (3) holds. Since the generalized Hoeffding test satisfies the GNP
optimality condition (Lemma 3.1), so does the GLRT under condition (3). This also
establishes (5).
