03 Querying Sensor Networks
Sensor Networks ∗
Samuel Madden, Michael J. Franklin, and Joseph M. Hellerstein
{madden,franklin,jmh}@cs.berkeley.edu
UC Berkeley

Wei Hong
[email protected]
Intel Research, Berkeley
ABSTRACT
We discuss the design of an acquisitional query processor for data collection in sensor networks. Acquisitional issues are those that pertain to where, when, and how often data is physically acquired (sampled) and delivered to query processing operators. By focusing on the locations and costs of acquiring data, we are able to significantly reduce power consumption over traditional passive systems that assume the a priori existence of data. We discuss simple extensions to SQL for controlling data acquisition, and show how acquisitional issues influence query optimization, dissemination, and execution. We evaluate these issues in the context of TinyDB, a distributed query processor for smart sensor devices, and show how acquisitional techniques can provide significant reductions in power consumption on our sensor devices.

∗This work has been supported in part by the National Science Foundation under ITR/IIS grant 0086057, ITR/IIS grant 0208588, ITR/IIS grant 0205647, ITR/SI grant 0122599, and by ITR/IM grant 1187-26172, as well as research funds from IBM, Microsoft, and the UC MICRO program.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SIGMOD 2003, June 9-12, San Diego, CA
Copyright 2003 ACM 1-58113-634-X/03/06 ...$5.00.

1. INTRODUCTION
In the past few years, smart sensor devices have matured to the point that it is now feasible to deploy large, distributed networks of such sensors [42, 23, 37, 8]. Sensor networks are differentiated from other wireless, battery-powered environments in that they consist of tens or hundreds of autonomous nodes that operate without human interaction (e.g. configuration of network routes, recharging of batteries, or tuning of parameters) for weeks or months at a time. Furthermore, sensor networks are often embedded into some (possibly remote) physical environment from which they must monitor and collect data. The long-term, low-power nature of sensor networks, coupled with their proximity to physical phenomena, leads to a significantly altered view of software systems compared with that of more traditional mobile or distributed environments.

In this paper, we are concerned with query processing in sensor networks. Researchers have noted the benefits of a query processor-like interface to sensor networks and the need for sensitivity to limited power and computational resources [27, 33, 41, 48, 34]. Prior systems, however, tend to view query processing in sensor networks simply as a power-constrained version of traditional query processing: given some set of data, they strive to process that data as energy-efficiently as possible. Typical strategies include minimizing expensive communication by applying aggregation and filtering operations inside the sensor network, strategies that are familiar from push-down techniques in distributed query processing that emphasize moving queries to data.

In contrast, we present acquisitional query processing (ACQP), where we focus on the significant new query processing opportunity that arises in sensor networks: the fact that smart sensors have control over where, when, and how often data is physically acquired (i.e. sampled) and delivered to query processing operators. By focusing on the locations and costs of acquiring data, we are able to significantly reduce power consumption compared to traditional passive systems that assume the a priori existence of data. Acquisitional issues arise at all levels of query processing: in query optimization, due to the significant costs of sampling sensors; in query dissemination, due to the physical co-location of sampling and processing; and, most importantly, in query execution, where choices of when to sample and which samples to process are made. Of course, techniques proposed in other research on sensor and power-constrained query processing, such as pushing down predicates and minimizing communication, are also important alongside ACQP and fit comfortably within its model.

We have designed and implemented an ACQP engine, called TinyDB (for more information on TinyDB, see [35]), which is a distributed query processor that runs on each of the nodes in a sensor network. TinyDB runs on the Berkeley Mica mote platform, on top of the TinyOS [23] operating system. We chose this platform because the hardware is readily available from commercial sources [13] and the operating system is relatively mature. TinyDB has many of the features of a traditional query processor (e.g. the ability to select, join, project, and aggregate data), but, as we will discuss in this paper, also incorporates a number of other features designed to minimize power consumption via acquisitional techniques. These techniques, taken in aggregate, can lead to orders of magnitude improvement in power consumption and increased accuracy of query results over non-acquisitional systems that do not actively control when and where data is collected.

We address a number of ACQP-related questions, including:
1. When should samples for a particular query be taken?
2. Which sensor nodes have data relevant to a particular query?
3. In what order should samples for this query be taken, and how should sampling be interleaved with other operations?
4. Is it worth expending computational power or bandwidth to process and relay a particular sample?

Of these issues, question (1) is unique to ACQP. The remaining questions can be answered by adapting techniques that are similar to those found in traditional query processing. Notions of indexing and optimization, in particular, can be applied to answer questions (2) and (3), and question (4) bears some similarity to issues that arise in stream processing and approximate query answering. We will address each of these questions, noting the unusual kinds of indices, optimizations, and approximations that are required in ACQP under the specific constraints posed by sensor networks.

Figure 1 illustrates the basic architecture that we follow throughout this paper: queries are submitted at a powered PC (the base station),
parsed, optimized, and sent into the sensor network, where they are disseminated and processed, with results flowing back up the routing tree that was formed as the queries propagated. After a brief introduction to sensor networks in Section 2, the remainder of the paper discusses each of these phases of ACQP: Section 3 covers our query language, Section 4 highlights optimization issues in power-sensitive environments, Section 5 discusses query dissemination, and finally, Section 6 discusses our adaptive, power-sensitive model for query execution and result collection.

[Figure 1 diagram. Labels: PC, Mote; query: SELECT nodeid, light FROM SENSORS (FIELDS: nodeid, light; OPS: NULL); result tuples: (1, 28), (2, 55), (3, 48).]
Figure 1: A query and results propagating through the network.

2. SENSOR NETWORK OVERVIEW
We begin with an overview of some recent sensor network deployments, and then discuss properties of sensors and sensor networks in general, providing specific numbers from our experience with TinyOS motes when possible.

In the past several years, the sensor network research community has developed and engaged in real deployments of these devices, making it possible to understand the data collection needs specific to the sensor [...] scenarios, motes collect light, temperature, humidity, and other environmental properties. On Great Duck Island, off the coast of Maine, sensors have been placed in the burrows of Storm Petrels, a kind of endangered sea bird. Scientists plan to use them to monitor burrow occupancy and the conditions surrounding burrows that are correlated with birds com- [...]

[...]ual sensor nodes will deplete their energy supplies in only a few days. In contrast, if sensor nodes are very spartan about power consumption, months or years of lifetime are possible. Mica motes, for example, when operating at a 2% duty cycle (between active and sleep modes) can achieve lifetimes in the 6 month range on a pair of AA batteries. This duty cycle limits the active time to 1.2 seconds per minute.

Mica motes have a 4 MHz, 8-bit Atmel microprocessor. Their RFM TR1000 radios run at 40 kbits/second over a single shared CSMA channel. Radio messages are variable size. Typically about ten 48-byte messages (the default size in TinyDB) can be delivered per second. Power consumption tends to be dominated by radio communication. When powered on, radios consume about as much power as the processor. However, because communication is so slow, every bit of data transmitted by the radio costs as much energy as executing 1000 CPU instructions. As an additional feature, motes have an external 32 kHz clock that the TinyOS operating system can synchronize with neighboring motes to within +/- 1 ms, to ensure that neighbors will be powered up and listening when they wish to send a message [15].

Power consumption in sensors occurs in four phases, which we illustrate in Figure 2 via an annotated capture of an oscilloscope display showing current draw (which is proportional to power consumption) on a Mica mote running TinyDB. In “Snoozing” mode, where the node spends most of its time, the processor and radio are idle, waiting for a timer to expire or an external event to wake the device. When the device wakes it enters the “Processing” mode, which consumes an order of magnitude more power than snooze mode, and where query results are generated locally. The mote then switches to a “Processing and Receiving” mode, where results are collected from neighbors over the radio. Finally, in the “Transmitting” mode, results for the query are delivered by the local mote; the noisy signal during this period reflects switching as the receiver goes off and the transmitter comes on and then cycles back to a receiver-on, transmitter-off state.

[Figure 2: Time v. Current Draw In Different Phases of Query Processing. Plot of current draw against time (seconds), with the phases labeled Snoozing, Processing, Processing and Listening, and Transmitting.]
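The duty-cycle and radio figures quoted for the Mica mote can be checked with a few lines of arithmetic. The sketch below is only a sanity check on the numbers stated in the text (2% duty cycle, 40 kbit/s radio, ten 48-byte messages per second, 1000 instructions per transmitted bit); the function names are hypothetical and are not part of TinyDB or TinyOS.

```python
# Back-of-the-envelope checks for the Mica mote figures quoted in the text.
# All constants come from the text; the helper names are illustrative only.

DUTY_CYCLE = 0.02           # fraction of time the mote is active
RADIO_BITS_PER_S = 40_000   # raw radio rate (40 kbit/s)
MSG_BYTES = 48              # default TinyDB message size
MSGS_PER_S = 10             # messages typically delivered per second
INSTR_PER_BIT = 1000        # CPU instructions with the energy cost of one bit


def active_seconds_per_minute(duty_cycle: float) -> float:
    """Active time per minute implied by a duty cycle."""
    return duty_cycle * 60.0


def delivered_bits_per_second(msgs_per_s: int, msg_bytes: int) -> int:
    """Bits per second actually delivered in messages."""
    return msgs_per_s * msg_bytes * 8


def instruction_equivalent(msg_bytes: int, instr_per_bit: int) -> int:
    """CPU instructions costing as much energy as sending one message."""
    return msg_bytes * 8 * instr_per_bit


if __name__ == "__main__":
    delivered = delivered_bits_per_second(MSGS_PER_S, MSG_BYTES)
    print("active s/min:", active_seconds_per_minute(DUTY_CYCLE))
    print("delivered bit/s:", delivered)
    print("fraction of raw rate:", delivered / RADIO_BITS_PER_S)
    print("instructions per message:", instruction_equivalent(MSG_BYTES, INSTR_PER_BIT))
```

Under these figures, delivered message traffic is well below the raw radio rate, and one message costs as much energy as several hundred thousand instructions, which is consistent with the emphasis on avoiding unnecessary transmissions.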
[Figure 3 plots: current (mA) against time (s) in two panels, “Event Based Trigger” (top) and “Polling Based Trigger” (bottom).]
Figure 3: External interrupt driven event-based query (top) vs. Polling driven event-based query (bottom).

Events can also serve as stopping conditions for queries. Appending a clause of the form STOP ON EVENT(param) WHERE cond(param) will stop a continuous query when the specified event arrives and the condition holds.

In the current implementation of TinyDB, events are only signalled on the local node; we do not currently provide a fully distributed event propagation system. Note, however, that queries started in response to a local event may be disseminated to other nodes (as in the example above).

3.3 Lifetime-Based Queries
In lieu of an explicit SAMPLE INTERVAL clause, users may request a specific query lifetime via a QUERY LIFETIME <x> clause, where <x> is a duration in days, weeks, or months. Specifying lifetime is a much more intuitive way for users to reason about power consumption. Especially in environmental monitoring scenarios, scientific users are not particularly concerned with small adjustments to the sample rate, nor do they understand how such adjustments influence power consumption. Such users, however, are very concerned with the lifetime of the network executing the queries. Consider the query:

SELECT nodeid, accel
FROM sensors
LIFETIME 30 days

This query specifies that the network should run for at least 30 days, sampling light and acceleration sensors at a rate that is as quick as possible and still satisfies this goal.

To satisfy a lifetime clause, TinyDB performs lifetime estimation. The goal of lifetime estimation is to compute a sampling and transmission rate given a number of Joules of energy remaining. We begin by considering how a single node at the root of the sensor network can compute these rates, and then discuss how other nodes coordinate with the root to compute their delivery rates. For now, we also assume that sampling and delivery rates are the same. On a single node, these rates can be computed via a simple cost-based formula, taking into account the costs of accessing sensors, selectivities of operators, expected communication rates, and current battery voltage. We show below a lifetime computation for simple queries of the form:

[...]

To simplify the equations in this example, we present a query with a single selection predicate that is applied after attributes have been acquired. The ordering of multiple predicates and interleaving of sampling and selection are discussed in detail in Section 4. Table 1 shows the parameters we use in this computation (we do not show processor costs since they will be negligible for the simple selection predicates we support, and have been subsumed into costs of sampling and delivering results).

The first step is to determine the available power $p_h$ per hour:

$p_h = c_{rem} / l$

We then need to compute the energy to collect and transmit one sample, $e_s$, including the costs to forward data for our children:

$e_s = \left( \sum_{s=0}^{numSensors} E_s \right) + (E_{rcv} + E_{trans}) \times C + E_{trans} \times \sigma$

Finally, we can compute the maximum transmission rate, $T$ (in samples per hour), as:

$T = p_h / e_s$

To illustrate the effectiveness of this simple estimation, we inserted a lifetime-based query (SELECT voltage, light FROM sensors LIFETIME x) into a sensor (with a fresh pair of AA batteries) and asked it to run for 24 weeks, which resulted in a sample rate of 15.2 seconds per sample. We measured the remaining voltage on the device 9 times over 12 days. The first two readings were outside the range of the voltage detector on the mote (they read “1024”, the maximum value) so are not shown. Based on experiments with our test mote connected to a power supply, we expect it to stop functioning when its voltage reaches 350. Figure 4 shows the measured lifetime at each point in time, with a linear fit of the data, versus the “expected voltage” which was computed using the cost model above. The resulting linear fit of voltage is quite close to the expected voltage. The linear fit reaches V=350 about 5 days after the expected voltage line.

Given that it is possible to estimate lifetime on a single node, we now discuss coordinating the transmission rate across all nodes in the routing tree. Since sensors need to sleep between relaying of samples, it is important that senders and receivers synchronize their wake cycles. To do this, we allow nodes to transmit only when their parents in the routing tree are awake and listening (which is usually the same time they are transmitting). By transitivity, this limits the maximum rate of the entire network to the transmission rate of the root of the routing tree. If a node must transmit slower than the root to meet the lifetime clause, it may transmit at an integral divisor of the root’s rate (footnote 4). To propagate this rate through the network, each parent node (including the root) includes its transmission rate in queries that it forwards to its children.

The previous analysis left the user with no control over the sample rate, which could be a problem because some applications require the ability to monitor physical phenomena at a particular granularity. To remedy this, we allow an optional MIN SAMPLE RATE r clause to be supplied. If the computed sample rate for the specified lifetime is greater than this rate, sampling proceeds at the computed rate (since the alternative is expressible by replacing the LIFETIME clause with a SAMPLE INTERVAL clause). Otherwise, sampling is fixed at a rate of r and the

Footnote 4: One possible optimization, which we do not explore, would involve selecting or reassigning the root to maximize transmission rate.
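The single-node cost model of Section 3.3 (the budget p_h, the per-sample energy e_s, and the rate T) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not TinyDB's implementation: Table 1 is not reproduced here, so C is assumed to be the number of children whose data this node forwards and σ the selectivity of the single selection predicate; all function names and numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class NodeCosts:
    """Per-node energy parameters; every value used below is illustrative."""
    sensor_energy_j: Sequence[float]  # E_s: energy to sample each sensor (J)
    e_rcv_j: float                    # E_rcv: energy to receive one message (J)
    e_trans_j: float                  # E_trans: energy to transmit one message (J)
    children: int                     # C: assumed number of forwarding children
    selectivity: float                # sigma: assumed predicate selectivity


def energy_budget_per_hour(c_rem_j: float, lifetime_hours: float) -> float:
    """p_h = c_rem / l: energy that may be spent per hour."""
    if lifetime_hours <= 0:
        raise ValueError("lifetime must be positive")
    return c_rem_j / lifetime_hours


def energy_per_sample(costs: NodeCosts) -> float:
    """e_s = sum(E_s) + (E_rcv + E_trans) * C + E_trans * sigma."""
    return (sum(costs.sensor_energy_j)
            + (costs.e_rcv_j + costs.e_trans_j) * costs.children
            + costs.e_trans_j * costs.selectivity)


def max_samples_per_hour(c_rem_j: float, lifetime_hours: float,
                         costs: NodeCosts) -> float:
    """T = p_h / e_s: maximum transmission rate in samples per hour."""
    return energy_budget_per_hour(c_rem_j, lifetime_hours) / energy_per_sample(costs)


def effective_rate(computed_rate: float, min_rate: float) -> float:
    """MIN SAMPLE RATE r: use the computed rate if it exceeds r, else r."""
    return max(computed_rate, min_rate)


if __name__ == "__main__":
    costs = NodeCosts([0.001, 0.002], 0.0005, 0.001, children=2, selectivity=0.5)
    rate = max_samples_per_hour(7200.0, 30 * 24, costs)
    print(f"{rate:.1f} samples/hour, one sample every {3600 / rate:.2f} s")
```

The rate is returned in samples per hour to match the definition of T; dividing 3600 by it gives the sample interval in seconds, the unit in which the experiment above reports its result.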
[Figure 4: Predicted Voltage vs. Actual Voltage (Lifetime Goal = 24 Wks). Legend: Linear Fit (r = -0.92), Actual Data, Predicted Lifetime; annotation: (V=350).]

[...]tures for event-based processing and lifetime queries, we now turn to query processing issues. We begin with a discussion of optimization, and then cover query dissemination and execution.

Queries in TinyDB are parsed at the basestation and disseminated in a simple binary format into the sensor network, where they are instantiated and executed. Before queries are disseminated, the basestation performs a simple query optimization phase to choose the correct ordering of sampling, selections, and joins.

We use a simple cost-based optimizer to choose a query plan that will yield the lowest overall power consumption. Optimizing for power allows us to subsume issues of processing cost and radio communication, which both contribute to power consumption and so will be taken into account. One of the most interesting aspects of power-based optimization, and a key theme of acquisitional query processing, is that the cost of a particular plan is often dominated by the cost of sampling the physical sensors and transmitting query results rather than the cost of applying individual operators (which are, most frequently, very simple). We begin by looking at the types of metadata stored by the optimizer. Our optimizer focuses on ordering joins, selections, and sampling operations that run on individual nodes.

4.1 Metadata Management
Each node in TinyDB maintains a catalog of metadata that describes its local attributes, events, and user-defined functions. This metadata is periodically copied to the root of the network for use by the optimizer. Metadata is registered with the system via static linking done at compile time using the TinyOS C-like programming language. Events and attributes pertaining to various operating system and TinyDB components are made available to queries by declaring them in an interface file and providing a small handler function. For example, in order to expose network topology to the query processor, the TinyOS Network component defines the attribute parent of type integer and registers a handler that returns the id of the node’s parent in the current routing tree.

Event metadata consists of a name, a signature, and a frequency estimate that is used in query optimization (see Section 4.3 below). User-defined predicates also have a name and a signature, along with a selectivity estimate which is provided by the author of the function.

Metadata      Description
Power         Cost to sample this attribute (in J)
Sample Time   Time to sample this attribute (in s)

[...] consists of a triplet of functions that initialize, merge, and update the final value of partial aggregate records as they flow through the system. As in the TAG [34] paper, aggregate authors must provide information about functional properties. In TinyDB, we currently require two: whether the aggregate is monotonic and whether it is exemplary or summary. COUNT is a monotonic aggregate as its value can only get larger as more values are aggregated. MIN is an exemplary aggregate, as it returns a single value from the set of aggregate values, while AVERAGE is a summary aggregate because it computes some property over the entire set of values.

TinyDB also stores metadata information about the costs of processing and delivering data, which is used in query-lifetime estimation. The costs of these phases in TinyDB were shown in Figure 2; they range from 2 mA while sleeping, to over 20 mA while transmitting and processing. Note that actual costs vary from mote to mote: for example, with a small sample of 5 motes (using the same batteries), we found that the average current with processor active varied from 13.9 to 17.6 mA (with the average being 15.66 mA).

Footnote 5: Scientists are particularly interested in monitoring the micro-climates created by plants and their biological processes. See [14, 8]. An example of such a sensor is Figaro Inc’s H2S sensor [16].

4.2 Ordering of Sampling And Predicates
Having described the metadata maintained by TinyDB, we now describe how it is used in query optimization.

As Table 3 shows, sampling is often an expensive operation in terms of power. However, a sample from a sensor s must be taken to evaluate any predicate over the attribute sensors.s. If a predicate discards a tuple of the sensors table, then subsequent predicates need not examine the tuple, and hence the expense of sampling any attributes referenced in those subsequent predicates can be avoided. Thus these predicates are “expensive”, and need to be ordered carefully. The predicate ordering problem here is somewhat different from that in the earlier literature (e.g. [21]) because (a) an attribute may be referenced in multiple predicates, and (b) expensive predicates are only on a single table, sensors. The first point introduces some subtlety, as it is not clear which predicate
should be “charged” the cost of the sample.

To model this issue, we treat the sampling of a sensor t as a separate “job” $\tau$ to be scheduled along with the predicates. Hence a set of predicates $P = \{p_1, \ldots, p_m\}$ is rewritten as a set of operations $S = \{s_1, \ldots, s_n\}$, where $P \subset S$, and $S - P = \{\tau_1, \ldots, \tau_{n-m}\}$ contains one sampling operator for each distinct attribute referenced in $P$. The selectivity of sampling operators is always 1. The selectivity of selection operators is derived by assuming that attributes have a uniform distribution over their range (which is available in the catalog). Relaxing this assumption by, for example, storing histograms or time-dependent functions per-attribute remains an area of future work. The cost of an operator (predicate or sample) can be determined by consulting the metadata, as described in the previous section. In the cases we discuss here, selections and joins are essentially “free” compared to sampling, but this is not a requirement of our technique.

We also introduce a partial order on $S$, where $\tau_i$ must precede $p_j$ if $p_j$ references the attribute sampled by $\tau_i$. The combination of sampling operators and the dependency of predicates on samples captures the costs of sampling operators and the sharing of operators across predicates.

The partial order induced on $S$ forms a graph with edges from sampling operators to predicates. This is a simple series-parallel graph. An optimal ordering of jobs with series-parallel constraints is a topic treated in the Operations Research literature that inspired earlier optimization work [25, 30, 21]; Monma and Sidney present the Series-Parallel Algorithm Using Parallel Chains [38], which gives an optimal ordering of the jobs in $O(|S| \log |S|)$ time.

Due to space constraints, we have glossed over the details of handling the expensive nature of sampling in the SELECT, GROUP BY, and HAVING clauses. The basic idea is to add them to $S$ with appropriate selectivities, costs, and ordering constraints.

As an example of this process, consider the query:

SELECT accel,mag
FROM sensors
WHERE accel > c1
AND mag > c2
SAMPLE INTERVAL 1s

The order of magnitude difference in per-sample costs for the accelerometer and magnetometer suggests that the power costs of plans with different orders of sampling and selection will vary substantially. We consider three possible plans: in the first, the magnetometer and accelerometer are sampled before either selection is applied. In the second, the magnetometer is sampled and the selection over its reading (which we call $S_{mag}$) is applied before the accelerometer is sampled or filtered. In the third plan, the accelerometer is sampled first and its selection ($S_{accel}$) is applied before the magnetometer is sampled. We compared the cost of these three plans, and, as expected, found that the first was always more expensive than the other two. More interestingly, the second can be an order of magnitude more expensive than the third, when $S_{accel}$ is much more selective than $S_{mag}$. Conversely, when $S_{mag}$ is highly selective, it can be cheaper to sample the magnetometer first, although only by a small factor (.8). The order of magnitude difference in relative costs represents an absolute difference of 1320 uJ per sample, or 3.96 mW at a (slow) sample rate of one sample per second, putting the additional power consumption from sampling in the incorrect order on par with the power costs of running the radio or CPU for an entire second.

Similarly, we note that there are certain kinds of aggregate functions where the same kind of interleaving of sampling and processing can also lead to a performance savings. Consider the query:

SELECT MAX(light)
FROM sensors
WHERE mag > x
SAMPLE INTERVAL 8s

In this query, the maximum light reading will be computed over all the nodes in the network whose magnetometers read greater than x. Interestingly, it turns out that, unless the mag > x predicate is very selective, it will be cheaper to evaluate this query by checking to see if each new light reading is greater than the previous maximum and then applying the selection predicate over mag, rather than first sampling mag. This sort of reordering, which we call exemplary aggregate pushdown, can be applied to any exemplary aggregate (e.g. MIN, MAX). Unfortunately, the selectivities of exemplary aggregates are very hard to capture, especially for window aggregates. We reserve the problem of ordering exemplary aggregates in query optimization for future work.

4.3 Event Query Batching to Conserve Power
As a second example of the benefit of power-aware optimization, we consider the optimization of the query:

ON EVENT e(nodeid)
SELECT a1
FROM sensors AS s
WHERE s.nodeid = e.nodeid
SAMPLE INTERVAL d FOR k

This query will cause an instance of the internal query (SELECT ...) to be started every time the event e occurs. The internal query samples results every d seconds for a duration of k seconds, at which point it stops running.

Note that, by the semantics formulated above, it is possible for multiple instances of the internal query to be running at the same time. If enough such queries are running simultaneously, the benefit of event-based queries (e.g. not having to poll for results) will be outweighed by the fact that each instance of the query consumes significant energy sampling and delivering (independent) results. To alleviate the burden of running multiple copies of the same query, we employ a multi-query optimization technique based on rewriting. To do this, we convert external events (of type e) into a stream of events, and rewrite the entire set of independent internal queries as a sliding window join between events and sensors, with a window size of k seconds on the event stream, and no window on the sensor stream. For example:

SELECT s.a1
FROM sensors AS s, events AS e
WHERE s.nodeid = e.nodeid
AND e.type = e
AND s.time - e.time <= k AND s.time > e.time
SAMPLE INTERVAL d

We execute this query by treating it as a join between a materialization point of size k on events and the sensors stream. When an event tuple arrives, it is added to the buffer of events. When a sensor tuple s arrives, events older than k seconds are dropped from the buffer and s is joined with the remaining events.

The advantage of this approach is that only one query runs at a time no matter how frequently the events of type e are triggered. This offers a large potential savings in sampling and transmission cost. At first it might seem as though requiring the sensors to be sampled every d seconds irrespective of the contents of the event buffer would be prohibitively expensive. However, the check to see whether the event buffer is empty can be pushed before the sampling of the sensors, and can be done relatively quickly.

Figure 5 shows the power tradeoff for event-based queries that have and have not been rewritten. Rewritten queries are labeled as stream join and non-rewritten queries as asynch events. We measure the cost in mW of the two approaches using a numerical model of power costs for idling, sampling, and processing (including the cost to check if the event queue is non-empty in the event-join case), but excluding transmission costs to avoid complications of modeling differences in cardinalities between the two approaches. We expect that the asynchronous approach will generally transmit many more results. We varied the sample rate and duration of the inner query, and the frequency of events. We chose the specific parameters in this plot to demonstrate query optimization tradeoffs; for much faster or slower event rates, one approach tends to always be preferable.

For very low event rates (fewer than 1 per second), the asynchronous events approach is sometimes preferable due to the extra overhead of empty-checks on the event queue in the stream-join case. However, for faster event rates, the power cost of this approach increases rapidly as independent samples are acquired for each event every few seconds. Increasing the duration of the inner query increases the cost of the asynchronous approach as more queries will be running simultaneously. The maximum absolute difference (of about .8 mW) is roughly comparable to 1/4 the power cost of the CPU or radio.

[Figure 5 plot: Event Rate v. Power Consumption (8 Samples/S). x-axis: Events Per Second (0 to 5); y-axis: Power Consumption (mW) (0 to 1); series: Stream Join; Async Events, Event Dur = 1s; Async Events, Event Dur = 3s; Async Events, Event Dur = 5s.]
Figure 5: The cost of processing event-based queries as asynchronous events versus joins.

Finally, we note that there is a subtle semantic change introduced by this rewriting. The initial formulation of the query caused samples in each of the internal queries to be produced relative to the time that the event fired: for example, if event e1 fired at time t, samples would appear at time t + d, t + 2d, .... If a later event e2 fired at time t + i, it would produce a different set of samples at time t + i + d, t + i + 2d, .... Thus, unless i were equal to d (i.e. the events were in phase), samples for the two queries would be offset from each other by up to d seconds. In the rewritten version of the query, there is only one stream of sensor tuples which is shared by all events.

In many cases, users may not care that tuples are out of phase with events. In some situations, however, phase may be very important. In such situations, one way the system could improve the phase accuracy of samples while still rewriting multiple event queries into a single join is via oversampling, or acquiring some number of (additional) samples every d seconds. The increased phase accuracy of oversampling comes at an increased cost of acquiring additional samples (which may still be less than running multiple queries simultaneously). For now, we simply allow the user to specify that a query must be phase-aligned by specifying ON ALIGNED EVENT in the event clause.

Thus, we have shown that there are several interesting optimization [...]

[...]lenge is to determine when a node or its children need not participate in a particular query. One common situation arises with constant-valued attributes (e.g. nodeid or location in a fixed-location network) with a selection predicate that indicates the node need not participate. Similarly, if a node knows that none of its children will ever satisfy the value of some selection predicate, say because they have constant attribute values outside the predicate’s range, it need not forward the query down the routing tree. To maintain information about child attribute values, we propose the use of a semantic routing tree (SRT). We describe the properties of SRTs in the next section, and briefly outline how they are created and maintained.

5.1 Semantic Routing Trees
An SRT is a routing tree (similar to the tree discussed in Section 2.2 above) designed to allow each node to efficiently determine if any of the nodes below it will need to participate in a given query over some constant attribute A. Traditionally, in sensor networks, routing tree construction is done by having nodes pick a parent with the most reliable connection to the root (highest link quality). With SRTs, we argue that the choice of parent should include some consideration of semantic properties as well. In general, SRTs are most applicable in situations in which there are several parents of comparable link quality. A link-quality-based parent selection algorithm, such as the one described in [47], should be used in conjunction with the SRT to prefilter the set of parents made available to the SRT.

Conceptually, an SRT is an index over A that can be used to locate nodes that have data relevant to the query. Unlike traditional indices, however, the SRT is an overlay on the network. Each node stores a single unidimensional interval representing the range of A values beneath each of its children (footnote 6). When a query q with a predicate over A arrives at a node n, n checks to see if any child’s value of A overlaps the query range of A in q. If so, it prepares to receive results and forwards the query. If no child overlaps, the query is not forwarded. If the query also applies locally (whether or not it applies to any children), n begins executing the query itself. If the query does not apply at n or at any of its children, it is simply forgotten.

Building an SRT is a two phase process: first the SRT build request is flooded (re-transmitted by every mote until all motes have heard the request) down the network. This request includes the name of the attribute A over which the tree should be built. As a request floods down the net-
issues in ACQP systems; first, the system must properly order sampling, work, a node n may have several possible choices of parent, since, in
selection, and aggregation to be truly low power. Second, for frequent general, many nodes in radio range may be closer to the root. If n has
event-based queries, rewriting them as a join between an event stream children, it forwards the request on to them and waits until they reply.
and the sensors stream can significantly reduce the rate at which a If n has no children, it chooses a node p from available parents to be its
sensor must acquire samples. parent, and then reports the value of A to p in a parent selection mes-
sage. If n does have children, it records the value of A along with the
child’s id. When it has heard from all of its children, it chooses a parent
5. POWER SENSITIVE DISSEMINATION AND and sends a selection message indicating the range of values of A which
ROUTING it and its descendents cover. The parent records this interval with the
id of the child node and proceeds to choose its own parent in the same
After the query has been optimized, it is disseminated into the net-
manner, until the root has heard from all of its children.
work; dissemination begins with a broadcast of the query from the root of
Figure 6 shows an SRT over the latitude. The query arrives at the root,
the network. As each sensor hears the query, it must decide if the query
is forwarded down the tree, and then only the gray nodes are required to
applies locally and/or needs to be broadcast to its children in the routing
participate in the query (note that node 3 must forward results for node
tree. We say a query q applies to a node n if there is a non-zero probabil-
4, despite the fact that its own location precludes it from participation.)
ity that n will produce results for q. Deciding where a particular query
should run is an important ACQP-related decision. Although such deci- 5.2 Maintaining SRTs
sions occur in other distributed query processing environments, the costs
Even though SRTs are limited to constant attributes, some SRT main-
of incorrectly initiating queries in ACQP environments like TinyDB can
tenance must occur. In particular, new nodes can appear, link qualities
be unusually high, as we will show.
can change, and existing nodes can fail.
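
To make the forwarding decision of Section 5.1 concrete, here is a minimal Python sketch (not TinyDB code). It assumes each node keeps one closed interval per child and treats the query predicate as a closed range; the names `SRTNode`, `handle_query`, and `overlaps` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Interval = Tuple[float, float]  # closed interval [lo, hi]; an assumed representation


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


@dataclass
class SRTNode:
    """Hypothetical per-node SRT state: own value of A plus one interval per child."""
    value: float
    child_intervals: Dict[int, Interval] = field(default_factory=dict)

    def handle_query(self, query_range: Interval) -> Tuple[bool, bool]:
        """Return (run_locally, forward_to_children) for a range predicate over A."""
        # The query applies locally if this node's own value of A is in range.
        run_locally = query_range[0] <= self.value <= query_range[1]
        # The query is forwarded if any child's stored interval overlaps the range.
        forward = any(overlaps(iv, query_range)
                      for iv in self.child_intervals.values())
        return run_locally, forward
```

For example, a node with value 8 and child intervals `{4: (5, 5), 5: (10, 10)}` (values loosely modeled on Figure 6) that receives the range (3, 7) would not run the query itself but would still forward it, because the interval for child 4 overlaps the range. If both results are false, the node can simply forget the query.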
If a query does not apply at a particular node, and the node does not
have any children for which the query applies, then the entire subtree
rooted at that node can be excluded from the query, saving the costs
of disseminating, executing, and forwarding results for the query across
several nodes, significantly extending the node’s lifetime.

Given the potential benefits of limiting the scope of queries, the chal-

[footnote 6] A natural extension to SRTs would be to store multiple
intervals at each node.

Node appearance and link quality change can both require a node to
switch parents. To do this, it sends a parent selection message to its new
parent, n. If this message changes the range of n’s interval, it notifies its
parent; in this way, updates can propagate to the root of the tree.

[Figure 6 diagram: routing tree of nodes plotted by (X, Y) location, each
annotated with its SRT(x) child intervals; example query: SELECT light
WHERE x > 3 AND x < 7]
Figure 6: A semantic routing tree in use for a query. Gray arrows
indicate flow of the query down the tree; gray nodes must produce
or forward results in the query.

value was randomly and uniformly selected from the interval [0,1000].
In the geographic distribution, (one-dimensional) sensor values were
computed based on a function of sensor’s x and y position in the grid,
such that a sensor’s value tended to be highly correlated to the values of
its neighbors.

Figure 7 shows the number of nodes which participate in queries over
variably-sized query intervals (where the interval size is shown on the
X axis) of the attribute space in a 20x20 grid. The interval for queries
was randomly selected from the uniform distribution. Each point in the
graph was obtained by averaging over five trials for each of the three
parent selection policies in each of the sensor distributions (for a total of
30 experiments). In each experiment, an SRT was constructed according
to the appropriate policy and sensor value distribution. Then, for each
interval size, the average number of nodes participating in 100 randomly
constructed queries of the appropriate size was measured.

For both distributions, the clustered approach was superior to other
SRT algorithms, beating the random approach by about 25% and the
closest parent approach by about 10% on average. With the geographic
distribution, the performance of the clustered approach is close to
optimal: for most ranges, all of the nodes in the range tend to be co-located,
so few intermediate nodes are required to relay information for queries
in which they themselves are not participating. This simulation is
admittedly optimistic, since geography and topology are perfectly correlated
in our experiment. Real sensor network deployments show significant
but not perfect correlation [17].

It is a bit surprising that, even for a random distribution of sensor
values, the closest-parent and clustered approaches are substantially better
than the random-parent approach. The reason for this is that these
techniques reduce the spread of sensor values beneath parents, thereby
reducing the probability that a randomly selected range query will require
a particular parent to participate.

To handle the disappearance of a child node, parents associate an
active query id and last epoch with every child in the SRT (recall that an
epoch is the period of time between successive samples.) When a parent
p forwards a query q to a child c, it sets c’s active query id to the id of q
and sets its last epoch entry to 0. Every time p forwards or aggregates a
result for q from c, it updates c’s last epoch with the epoch on which the
result was received. If p does not hear from c for some number of epochs t,
it assumes c has moved away, and removes its SRT entry. Then, p sends
a request asking its remaining children to retransmit their ranges. It uses
this information to construct a new interval. If this new interval differs
in size from the previous interval, p sends a parent selection message up
the routing tree to reflect this change.

Finally, we note that, by using the maintenance rules proposed here, it is
possible to support SRTs over non-constant attributes, although if those
attributes change quickly, the cost of propagating changes in child
intervals could be prohibitive.
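
The child-timeout rule of Section 5.2 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not TinyDB's implementation: the class `ChildTable` and its method names are hypothetical, the parent reuses the intervals it already stores instead of asking children to retransmit them, and a change is detected by comparing interval endpoints.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[float, float]  # closed interval [lo, hi]; an assumed representation


class ChildTable:
    """Hypothetical parent-side bookkeeping for SRT children."""

    def __init__(self, own_value: float, timeout_epochs: int):
        self.own_value = own_value
        self.timeout_epochs = timeout_epochs  # the threshold t from the text
        self.intervals: Dict[int, Interval] = {}
        self.active_query: Dict[int, Optional[int]] = {}
        self.last_epoch: Dict[int, int] = {}

    def add_child(self, child: int, interval: Interval) -> None:
        self.intervals[child] = interval
        self.active_query[child] = None
        self.last_epoch[child] = 0

    def forward_query(self, child: int, query_id: int) -> None:
        # On forwarding q to c: record q's id and reset c's last epoch to 0.
        self.active_query[child] = query_id
        self.last_epoch[child] = 0

    def result_received(self, child: int, epoch: int) -> None:
        # On forwarding or aggregating a result from c: remember the epoch.
        self.last_epoch[child] = epoch

    def covered_interval(self) -> Interval:
        # Range of A covered by this node and all children it still tracks.
        lows = [self.own_value] + [iv[0] for iv in self.intervals.values()]
        highs = [self.own_value] + [iv[1] for iv in self.intervals.values()]
        return min(lows), max(highs)

    def expire(self, current_epoch: int) -> bool:
        """Drop silent children; return True if the covered interval changed,
        in which case a parent selection message would be sent up the tree."""
        before = self.covered_interval()
        silent = [c for c, q in self.active_query.items()
                  if q is not None
                  and current_epoch - self.last_epoch[c] >= self.timeout_epochs]
        for c in silent:
            del self.intervals[c], self.active_query[c], self.last_epoch[c]
        return self.covered_interval() != before
```

A caller would invoke `expire` once per epoch and, when it returns True, report the new `covered_interval()` to its own parent.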
5.3 Evaluation of Benefit of SRTs

The benefit that an SRT provides is dependent on the quality of the
clustering of children beneath parents. If the descendents of some node
n are clustered around the value of the index attribute at n, then a query
that applies to n will likely also apply to its descendents. This can be
expected for geographic attributes, for example, since network topology
is correlated with geography.

We study three policies for SRT parent selection. In the first, random
approach, each node picks a random parent from the nodes with which
it can communicate reliably. In the second, closest-parent approach,
each parent reports the value of its index attribute with the SRT-build
request, and children pick the parent whose attribute value is closest to
their own. In the clustered approach, nodes select a parent as in the
closest-parent approach, except, if a node hears a sibling node send a
parent selection message, it snoops on the message to determine its
sibling’s parent and value. It then picks its own parent (which could be the
same as one of its siblings) to minimize the spread of attribute values
underneath all of its available parents.

We studied these policies in a simple simulation environment: nodes
were arranged on an n x n grid and were asked to choose a constant
attribute value from some distribution (which we varied between
experiments.) We used a perfect (lossless) connectivity model where each
node could talk to its immediate neighbors in the grid (so routing trees
were n nodes deep), and each node had 8 neighbors (with 3 choices of
parent, on average.) We compared the total number of nodes involved in
range queries of different sizes for the three SRT parent selection
policies to the best-case approach and the no SRT approach. The best-case
approach would only result if exactly those nodes that overlapped the
range predicate were activated, which is not possible in our topologies
but provides a convenient lower bound. In the no SRT approach, all
nodes participate in each query.

We experimented with a number of sensor value distributions; we
report on two here. In the random distribution, each constant attribute

[Plot panels: Query Range v. Nodes in Query (Random Dist); Query Range v.
Nodes in Query (Geographic Dist)]

As the previous results show, the benefit of using an SRT can be
substantial. There are, however, maintenance and construction costs
associated with SRTs, as discussed above. Construction costs are comparable
to those in conventional sensor networks (which also have a routing tree),
but slightly higher due to the fact that parent selection messages are
explicitly sent, whereas parents do not always require confirmation from
their children in other sensor network environments.

5.4 SRT Summary

SRTs provide an efficient mechanism for disseminating queries and
collecting query results for queries over constant attributes. For
attributes that are highly correlated amongst neighbors in the routing tree
(e.g. location), SRTs can reduce the number of nodes that must
disseminate queries and forward the continuous stream of results from children
by nearly an order of magnitude.

6. PROCESSING QUERIES

Once queries have been disseminated and optimized, the query
processor begins executing them. Query execution is straightforward, so we
describe it only briefly. The remainder of the section is devoted to the
ACQP-related issues of prioritizing results and adapting sampling and
delivery rates. We present simple schemes for prioritizing data in
selection queries, briefly discuss prioritizing data in aggregation queries,
and then turn to adaptation. We discuss two situations in which
adaptation is necessary: when the radio is highly contended and when power
consumption is more rapid than expected.

6.1 Query Execution

Query execution consists of a simple sequence of operations at each
node during every epoch: first, nodes sleep for most of an epoch; then
they wake, sample sensors and apply operators to data generated locally
and received from neighbors, and then deliver results to their parent. We
(briefly) describe ACQP-relevant issues in each of these phases.

Nodes sleep for as much of each epoch as possible to minimize power
consumption. They wake up only to sample sensors and relay and
deliver results. Because nodes are time synchronized, they all sleep and
wake up at the same time, ensuring that results will not be lost as a
result of a parent sleeping when a child tries to propagate a message. The
amount of time, t_awake, that a sensor node must be awake to
successfully accomplish the latter three steps above is largely dependent on the
number of other nodes transmitting in the same radio cell, since only a
small number of messages per second can be transmitted over the single
shared radio channel.

TinyDB uses a simple algorithm to scale t_awake based on the
neighborhood size, the details of which we omit. Note, however, that there are
situations in which a node will be forced to drop or combine results as a
result of either t_awake or the sample interval being too short to
perform all needed computation and communication. We discuss policies
for choosing how to aggregate data and which results to drop in the next
subsection.

Once a node is awake, it begins sampling and filtering results
according to the plan provided by the optimizer. Samples are taken at the
appropriate (current) sample rate for the query, based on lifetime
computations and information about radio contention and power consumption
(see Section 6.3 for more information on how TinyDB adapts sampling
in response to variations during execution.) Filters are applied and
results are routed to join and aggregation operators further up the query
plan.

For aggregation queries across nodes, we adopt the approach of TAG
[34], although TAG does not support temporal aggregates but only
aggregates of values produced in the same epoch.

The basic approach used in both TAG and TinyDB is to compute a
partial state record at each intermediate node in the routing topology.
This record represents the partially evaluated aggregation of local
sensor values with sensor values received from child nodes as they flow up
the routing tree. The benefit of doing this is that a great deal less data
is transmitted than when all sensors’ values are sent to the root of the
network to be aggregated together.

Finally, we note that in event-based queries, the ON EVENT clause
must be handled specially. When an event fires on a node, that node
disseminates the query, specifying itself as the query root. This node
collects query results, and delivers them to the basestation or a local
materialization point.

6.2 Prioritizing Data Delivery

Once results have been sampled and all local operators have been
applied, they are enqueued onto a radio queue for delivery to the node’s
parent. This queue contains both tuples from the local node as well
as tuples that are being forwarded on behalf of other nodes in the
network. When network contention and data rates are low, this queue can be
drained faster than results arrive. However, because the number of
messages produced during a single epoch can vary dramatically, depending
on the number of queries running, the cardinality of joins, and the
number of groups and aggregates, there are situations when the queue will
overflow. In these situations, the system must decide if it should
discard the overflow tuple, discard some other tuple already in the queue,
or combine two tuples via some aggregation policy.

The ability to make runtime decisions about the value of an
individual data item is central to ACQP systems, because the cost of acquiring
and delivering data is high, and because of these situations where the
rate of data items arriving at a node will exceed the maximum delivery
rate. A simple conceptual approach for making such runtime decisions
is as follows: whenever the system is ready to deliver a tuple, send the
result that will most improve the “quality” of the answer that the user
sees. Clearly, the proper metric for quality will depend on the
application: for a raw signal, root-mean-square (RMS) error is a typical metric.
For aggregation queries, minimizing the confidence intervals of the
values of group records could be the goal [43]. In other applications, users
may be concerned with preserving frequencies, receiving statistical
summaries (average, variance, or histograms), or maintaining more tenuous
qualities such as signal “shape”.

Our goal is not to fully explore the spectrum of techniques available
in this space. Instead, we have implemented several policies in TinyDB
to illustrate that substantial quality improvements are possible given a
particular workload and quality metric. Generalizing concepts of
quality and implementing and exploring more sophisticated prioritization
schemes remains an area of future work.

There is a large body of related work on approximation and
compression schemes for streams in the database literature (e.g. [18, 9]),
although these approaches typically focus on the problem of building
histograms or summary structures over the streams rather than trying to
preserve the (in order) signal as best as possible, which is the goal we
tackle first. Algorithms from signal processing, such as Fourier
analysis and wavelets are likely applicable, although the extreme memory and
processor limitations of our devices and the online nature of our problem
(e.g. choosing which tuple in an overflowing queue to evict) make them
tricky to apply. We have begun to explore the use of wavelets in this
context; see [22] for more information on our initial efforts.

We begin with a comparison of three simple prioritization schemes,
naive, winavg, and delta for simple selection queries. In the naive
scheme no tuple is considered more valuable than any other, so the queue
is drained in a FIFO manner and tuples are dropped if they do not fit in
the queue.

The winavg scheme works similarly, except that instead of dropping
results when the queue fills, the two results at the head of the queue are
averaged to make room for new results. Since the head of the queue is
now an average of multiple records, we associate a count with it.

In the delta scheme, a tuple is assigned an initial score relative to its
difference from the most recent (in time) value successfully transmitted
from this node, and at each point in time, the tuple with the highest score
is delivered. The tuple with the lowest score is evicted when the queue
overflows. Out of order delivery (in time) is allowed. This scheme
relies on the intuition that the largest changes are probably interesting. It
works as follows: when a tuple t with timestamp T is initially enqueued
and scored, we mark it with the timestamp R of the most recently
delivered tuple r. Since tuples can be delivered out of order, it is possible
that a tuple with a timestamp between R and T could be delivered next
(indicating that r was delivered out of order), in which case the score
we computed for t as well as its R timestamp are now incorrect. Thus,
in general, we must rescore some enqueued tuples after every delivery.

most tens or hundreds), so they may be less valuable.

Thus, we have illustrated some examples where prioritization of
results can be used to improve the overall quality of the data that are
transmitted to the root when some results must be dropped or aggregated.
Choosing the proper policies to apply in general, and understanding
how various existing approximation and prioritization schemes map into
ACQP is an important future direction.
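
The scoring, eviction, and rescoring rules of the delta scheme can be sketched in Python as below. This is a simplified illustration, not TinyDB's code: the names `DeltaQueue` and `Reading` are hypothetical, a tuple with no earlier delivered value is assumed to receive an infinite score, the full history of delivered tuples is kept (which a real mote could not afford), and every queued tuple is rescored after each delivery where the text only requires rescoring some of them.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reading:
    timestamp: int
    value: float
    score: float = 0.0


class DeltaQueue:
    """Hypothetical bounded queue that prioritizes the largest changes."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items: List[Reading] = []
        self._sent_ts: List[int] = []     # timestamps of delivered tuples, sorted
        self._sent_val: List[float] = []  # values aligned with _sent_ts

    def _score(self, timestamp: int, value: float) -> float:
        # Reference is the delivered tuple that is most recent in time
        # before this timestamp (the tuple r with timestamp R in the text).
        i = bisect.bisect_left(self._sent_ts, timestamp)
        if i == 0:
            return float("inf")  # assumption: nothing earlier was delivered
        return abs(value - self._sent_val[i - 1])

    def enqueue(self, timestamp: int, value: float) -> None:
        self.items.append(Reading(timestamp, value, self._score(timestamp, value)))
        if len(self.items) > self.capacity:
            # Overflow: evict the lowest-scoring tuple (possibly the new one).
            self.items.remove(min(self.items, key=lambda r: r.score))

    def deliver(self) -> Optional[Reading]:
        if not self.items:
            return None
        best = max(self.items, key=lambda r: r.score)
        self.items.remove(best)
        i = bisect.bisect_left(self._sent_ts, best.timestamp)
        self._sent_ts.insert(i, best.timestamp)
        self._sent_val.insert(i, best.value)
        # Out-of-order delivery can invalidate earlier scores, so rescore.
        for r in self.items:
            r.score = self._score(r.timestamp, r.value)
        return best
```

With a capacity of 3 and readings 11, 50, 12 queued after a delivered value of 10, the reading 50 is delivered first; the later reading 12 is then rescored against 50 instead of 10, which is the rescoring situation described above.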
The delta scheme is similar to the value-deviation metric used in [18]
for minimizing deviation between a source and a cache, although
value-deviation does not include the possibility of out of order delivery.

We compared these three approaches on a single mote running
TinyDB. To measure their effect in a controlled setting, we set the sample
rate to be a fixed factor K faster than the maximum delivery rate (such
that 1 of every K tuples was delivered, on average) and compared their
performance against several predefined sets of sensor readings (stored
in the EEPROM of the device.) In this case, delta had a buffer of 5
tuples; we performed reordering of out of order tuples at the
basestation. To illustrate the effect of winavg and delta, Figure 8 shows how
delta and winavg approximate a high-periodicity trace of sensor
readings generated by a shaking accelerometer (we omit naive due to space
constraints.) Notice that delta is considerably closer in shape to the
original signal in this case, as it tends to emphasize extremes, whereas
average tends to dampen them.

[Figure 8 plots: Approximations of Acceleration Signal; panels:
Acceleration Signal, Delta, Avg; x-axis: # of Samples; y-axis: Sample Value]
Figure 8: An acceleration signal (top) approximated by a delta
(middle) and an average (bottom), K=4.

We also measured RMS error for this signal as well as two others: a
square wave-like signal from a light sensor being covered and
uncovered, and a slow sinusoidal signal generated by moving a magnet around
a magnetometer. The error for each of these signals and techniques is
shown in Table 4. Although delta appears to match the shape of the
acceleration signal better, its RMS value is about the same as average’s
(due to the few peaks that delta incorrectly merges together.) Delta
outperforms either other approach for the fast changing step-functions in
the light signal because it does not smooth edges as much as average.

6.3 Adapting Rates and Power Consumption

We saw in the previous sections how TinyDB can exploit query
semantics to transmit the most relevant results when limited bandwidth
or power is available. In this section, we discuss selecting and
adjusting sampling and transmission rates to limit the frequency of
network-related losses and fill rates of queues. This adaptation is the other half
of the runtime techniques in ACQP: because the system can adjust rates,
significant reductions can be made in the frequency with which data
prioritization decisions must be made. These techniques are simply not
available in non-acquisitional query processing systems.

When initially optimizing a query, TinyDB’s optimizer chooses a
transmission and sample rate based on current network load conditions,
and requested sample rates and lifetimes. However, static decisions
made at the start of query processing may not be valid after many days
running the same continuous query. Just as adaptive query processing
techniques like eddies [6], or those of Tukwila [28] dynamically reorder
operators as the execution environment changes, TinyDB must react to
changing conditions; however, unlike in previous adaptive query
processing systems, failure to adapt in TinyDB can bring the system to its
knees, reducing data flow to a trickle or causing the system to severely
miss power budget goals.

We study the need for adaptivity in two contexts: network contention
and power consumption. We first examine network contention. Rather
than simply assuming that a specific transmission rate will result in a
relatively uncontested network channel, TinyDB monitors channel
contention and adaptively reduces the number of packets transmitted as
contention rises. This backoff is very important: as the 4 motes line of Figure
9 shows, if several nodes try to transmit at high rates, the total number of
packets delivered is substantially less than if each of those nodes tries to
transmit at a lower rate. Compare this line with the performance of a
single node (where there is no contention): a single node does not exhibit
the same falling off because there is no contention (although the
percentage of successfully delivered packets does fall off.) Finally, the 4 motes
adaptive line does not show the same precipitous drop in performance
because it is able to monitor the network channel and adapt to contention.

Note that the performance of the adaptive approach is slightly less
than the non-adaptive approach at 4 and 8 samples per second as
backoff begins to throttle communication in this regime. However, when we
compared the percentage of successful transmission attempts at 8
packets per second, the adaptive scheme achieves twice the success rate of
the non-adaptive scheme, suggesting the adaptation is still effective in
reducing wasted communication effort, despite the lower utilization.
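
The text does not specify how TinyDB adjusts its transmission rate, so the following Python sketch swaps in a generic additive-increase, multiplicative-decrease rule purely to illustrate the idea of backing off as contention rises. The function name `adapt_rate`, the loss threshold, and the step sizes are all assumptions made for this example, not values taken from TinyDB.

```python
def adapt_rate(current_rate: float,
               requested_rate: float,
               attempts: int,
               failures: int,
               high_loss: float = 0.3,   # assumed contention threshold
               decrease: float = 0.5,    # assumed multiplicative backoff factor
               increase: float = 0.5,    # assumed additive recovery step (pkts/s)
               min_rate: float = 0.25) -> float:
    """Return the transmission rate (packets per second) for the next window.

    `attempts` and `failures` are counts of transmission attempts and failed
    attempts observed over the previous window, used as a contention signal.
    """
    if attempts == 0:
        return current_rate  # nothing observed, keep the current rate
    loss = failures / attempts
    if loss > high_loss:
        # Channel looks contended: back off, but never below min_rate.
        return max(min_rate, current_rate * decrease)
    # Channel looks clear: recover slowly toward the requested rate.
    return min(requested_rate, current_rate + increase)
```

For instance, a node sending at 8 packets per second that sees half of its attempts fail would drop to 4 packets per second under these assumed parameters, and would then creep back up in steps of 0.5 while the observed loss stays low.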
[Plot: Sample Rate vs. Delivery Rate; y-axis: Delivery Rate, Aggregate
over All Motes (packets per second); series: 4 motes; 1 mote; 4 motes,
adaptive]

          Accel   Light (Step)   Magnetometer (Sinusoid)
Winavg    64      129            54
Delta     63      81             48
Naive     77      143            63

We omit a discussion of prioritization policies for aggregation queries.
TAG [34] discusses several snooping-based techniques unique to sensor
[Figure 10 plots: Accelerometer Signal (left) and Magnetometer Signal
(right); x-axis: Sample #; y-axis: Sample Value; panel annotations:
ERMS=81, ERMS=87; No Adaptation: ERMS=112, ERMS=109]
Figure 10: Comparison of delivered values (bottom) versus actual
readings from two motes (left and right) sampling at 16 packets
per second and sending simultaneously. Four motes were
communicating simultaneously when this data was collected.

6.3.1 Measuring Power Consumption

We now turn to the problem of adapting tuple delivery rates to meet
specific lifetime requirements in response to incorrect sample rates
computed at query optimization time (see Section 3.3). We first note that,
using similar computations to those shown in Section 3.3, it is possible to
compute a predicted battery voltage for a time t seconds into processing
a query. We omit the calculation due to space constraints.

The system can then compare its current voltage to this predicted
voltage. By assuming that voltage decays linearly (see Figure 4 for empirical
evidence of this property), we can re-estimate the power consumption
characteristics of the device (e.g. the costs of sampling, transmitting,
and receiving) and then re-run our lifetime calculation. By re-estimating
these parameters, the system can ensure that this new lifetime calculation
tracks the actual lifetime more closely.

Although this calculation and re-optimization are straightforward,
they serve an important role by allowing sensors in TinyDB to satisfy
occasional ad-hoc queries and relay results for other sensors without
compromising the lifetime goals of long running monitoring queries.

Finally, we note that incorrect measurements of power consumption
may also be due to incorrect estimates of the cost of various phases of
query processing, or may be as a result of incorrect selectivity
estimation. We cover both by tuning sample rate. As future work, we intend
to explore adaptation of optimizer estimates and ordering decisions (in
the spirit of other adaptive work like Eddies [6]) and the effect of
frequency of re-estimation on lifetime (currently, in TinyDB, re-estimation
can only be triggered by an explicit request from the user.)

[27, 34, 41, 33, 48]. As mentioned above, these papers noted the
importance of power sensitivity. Their predominant focus to date has been
on in-network processing, that is, the pushing of operations, particularly
selections and aggregations, into the network to reduce communication.
We too endorse in-network processing, but believe that, for a sensor
network system to be truly power sensitive, acquisitional issues of when,
where, and in what order to sample and which samples to process must
be considered. To our knowledge, no prior work addresses these issues.

There is a small body of work related to query processing in mobile
environments [26, 2]. This work is concerned with laptop-like devices
that are carried with the user, can be readily recharged every few hours,
and, with the exception of a wireless network interface, basically have
the capabilities of a wired, powered PC. Lifetime-based queries, notions
of sampling and the associated costs, and runtime issues regarding rates
and contention are not considered. Many of the proposed techniques, as well
as more recent work on moving object databases (such as [46]) focus on
the highly mobile nature of devices, a situation we are not (yet) dealing
with, but which could certainly arise in sensor networks.

Power sensitive query optimization was proposed in [1], although, as
with the previous work, the focus is on optimizing costs in traditional
mobile devices (e.g. laptops and palmtops), so concerns about the cost
and ordering of sampling do not appear. Furthermore, laptop-style
devices typically do not offer the same degree of rapid power-cycling that
is available on embedded platforms like motes. Even if they did, their
interactive, user oriented nature makes it undesirable to turn off displays,
network interfaces, etc. because they are doing more than simply
collecting and processing data, so there are many fewer power optimizations
that can be applied.

Building an SRT is analogous to building an index in a conventional
database system. Due to the resource limitations of sensor networks, the
actual indexing implementations are quite different. See [29] for a
survey of relevant research on distributed indexing in conventional database
systems. There is also some similarity to indexing in peer-to-peer
systems [4]. However, peer-to-peer systems differ in that they are inexact
and not subject to the same paucity of communications or storage
infrastructure as sensor networks, so algorithms tend to be storage and
communication heavy. Similar indexing issues also appear in highly mobile
environments (like [46, 26]), but this work relies on centralized
location servers for tracking recent positions of objects.

The idea of interleaving the fetching of attributes and
application of operators also arises in the context of compressed databases
[12], as decompression effectively imposes a penalty for fetching an
individual attribute, so it is beneficial to apply selections and joins on
already decompressed or easy to decompress attributes.

There is a large body of work on event-based query processing in the
active database literature. Languages for event composition and
systems for evaluating composite events, such as [10], as well as systems
for efficiently determining when an event has fired, such as [20] could
(possibly) be useful in TinyDB. More recent work on continuous query
systems [32, 11] describes languages which provide for query processing
in response to events or at regular intervals over time. This earlier work,
as well as our own work on continuous query processing [36], inspired
the periodic and event-driven features of TinyDB.

Approximate and best effort caches [40], as well as systems for
online-aggregation [43] and stream query processing [39, 7] include some
notion of data quality. Most of this other work is focused on quality
with respect to summaries, aggregates, or staleness of individual objects,
whereas we focus on quality as a measure of fidelity to the underlying
continuous signal. Aurora [7] mentions a need for this kind of metric,
but proposes no specific approaches. Work on approximate query
processing [18] includes a scheme similar to our delta approach, as well
as a substantially more thorough evaluation of its merits, but does not
consider the possibility of out of order delivery.

[12] Z. Chen, J. Gehrke, and F. Korn. Query optimization in compressed database
systems. In ACM SIGMOD, 2001.
[13] I. Crossbow. Wireless sensor networks (mica motes).
http://www.xbow.com/Products/Wireless Sensor Networks.htm.
[14] K. A. Delin and S. P. Jackson. Sensor web for in situ exploration of gaseous
biosignatures. In IEEE Aerospace Conference, 2000.
[15] J. Elson, L. Girod, and D. Estrin. Fine-grained network time
synchronization using reference broadcasts. In OSDI, 2002.
[16] Figaro, Inc. TGS-825 - Special Sensor For Hydrogen Sulfide.
http://www.figarosensor.com.
[17] D. Ganesan, B. Krishnamachari, A. Woo, D. Culler, D. Estrin, and
S. Wickera. Complex behavior at scale: An experimental study of
low-power wireless sensor networks. Under submission. Available at:
http://lecs.cs.ucla.edu/ deepak/PAPERS/empirical.pdf, July 2002.
[18] M. Garofalakis and P. Gibbons. Approximate query processing: Taming the
terabytes! (tutorial). In VLDB, 2001.
[19] J. Gehrke, F. Korn, and D. Srivastava. On computing correlated aggregates
over continual data streams. In Proceedings of the ACM SIGMOD
Conference on Management of Data, Santa Barbara, CA, May 2001.
[20] E. N. Hanson. The design and implementation of the ariel active database
rule system. IEEE Transactions on Knowledge and Data Engineering,
8(1):157–172, February 1996.
[21] J. M. Hellerstein. Optimization techniques for queries with expensive
methods. TODS, 23(2):113–157, 1998.
[22] J. M. Hellerstein, W. Hong, S. Madden, and K. Stanek. Beyond Average:
9. CONCLUSIONS AND FUTURE WORK Towards Sophisticated Sensing with Queries. . In Workshop on Information
Processing In Sensor Networks (IPSN), 2003.
Acquisitional query processing provides a framework for addressing [23] J. Hill, R. Szewczyk, A. Woo, S. Hollar, and D. C. K. Pister. System archi-
issues of when, where, and how often data is sampled and which data tecture directions for networked sensors. In ASPLOS, November 2000.
is delivered in distributed, embedded sensing environments. Although [24] Honeywell, Inc. Magnetic Sensor Specs HMC1002.
http://www.ssec.honeywell.com/magnetic/spec sheets/specs 1002.html.
other research has identified the opportunities for query processing in [25] T. Ibaraki and T. Kameda. On the optimal nesting order for computing n-
sensor networks, this work is the first to discuss these fundamental issues relational joins. TODS, 9(3):482–502, 1984.
in an acquisitional framework. [26] T. Imielinski and B. Badrinath. Querying in highly mobile distributed envi-
ronments. In VLDB, Vancouver, Canada, 1992.
We identified several opportunities for future research. We are cur- [27] C. Intanagonwiwat, R. Govindan, and D. Estrin. Directed diffusion: A scal-
rently actively pursuing two of these: first, we are exploring how query able and robust communication paradigm for sensor networks. In Mobi-
COM, Boston, MA, August 2000.
optimizer statistics change in acquisitional environments and studying [28] Z. G. Ives, D. Florescu, M. Friedman, A. Levy, and D. S. Weld. An adaptive
the role of online re-optimization in sample rate and operator orderings query execution system for data integration. In Proceedings of the ACM
in response to bursts of data or unexpected power consumption. Second, SIGMOD, 1999.
[29] D. Kossman. The state of the art in distributed query processing. ACM
we are pursuing more sophisticated prioritization schemes, like wavelet Computing Surveys, 2000.
analysis, that can capture salient properties of signals other than large [30] R. Krishnamurthy, H. Boral, and C. Zaniolo. Optimization of nonrecursive
changes (as our delta mechanism does) as well as mechanisms to allow queries. In VLDB, pages 128–137, 1986.
[31] C. Lin, C. Federspiel, and D. Auslander. Multi-Sensor Single Actuator Con-
users to express their prioritization preferences. trol of HVAC Systems. 2002.
We believe that ACQP notions are of critical importance for preserv- [32] L. Liu, C. Pu, and W. Tang. Continual queries for internet-scale event-
ing the longevity and usefulness of any deployment of battery powered driven information delivery. IEEE Knowledge and Data Engineering, 1999.
Special Issue on Web Technology.
sensing devices, such as those that are now appearing in biological pre- [33] S. Madden and M. J. Franklin. Fjording the stream: An architechture for
serves, roads, businesses, and homes. Without appropriate query lan- queries over streaming sensor data. In ICDE, 2002.
guages, optimization models, and query dissemination and data delivery [34] S. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong. TAG: A Tiny
AGgregation Service for Ad-Hoc Sensor Networks. In OSDI, 2002.
schemes that are cognisant of semantics and the costs and capabilities of [35] S. Madden, W. Hong, J. Hellerstein, and M. Franklin. TinyDB web page.
the underlying hardware the success of such deployments will be limited. http://telegraph.cs.berkeley.edu/tinydb.
[36] S. Madden, M. Shah, J. M. Hellerstein, and V. Raman. Continuously adap-
tive continuous queries over streams. In SIGMOD, 2002.
References [37] A. Mainwaring, J. Polastre, R. Szewczyk, and D. Culler. Wireless sensor
networks for habitat monitoring. In ACM Workshop on Sensor Networks
[1] R. Alonso and S. Ganguly. Query optimization in mobile environments. In and Applications, 2002.
Workshop on Foundations of Models and Languages for Data and Objects, [38] C. L. Monma and J. Sidney. Sequencing with seriesparallel precedence
pages 1–17, September 1993. constraints. Mathematics of Operations Research, 1979.
[2] R. Alonso and H. F. Korth. Database system issues in nomadic computing. [39] R. Motwani, J. Window, A. Arasu, B. Babcock, S.Babu, M. Data, C. Olston,
In ACM SIGMOD, Washington DC, June 1993. J. Rosenstein, and R. Varma. Query processing, approximation and resource
[3] Analog Devices, Inc. ADXL202E: Low-Cost 2 g Dual-Axis Accelerometer. management in a data stream management system. In CIDR, 2003.
http://products.analog.com/products/info.asp?product=ADXL202. [40] C. Olston and J.Widom. Best effort cache sychronization with source coop-
[4] H. G. Arturo Crespo. Routing indices for peer-to-peer systems. In ICDCS, eration. SIGMOD, 2002.
July 2002. [41] P.Bonnet, J.Gehrke, and P.Seshadri. Towards sensor database systems. In
[5] Atmel Corporation. Atmel ATMega 128 Microcontroller Datasheet. Conference on Mobile Data Management, January 2001.
http://www.atmel.com/atmel/acrobat/doc2467.pdf. [42] G. Pottie and W. Kaiser. Wireless integrated network sensors. Communica-
[6] R. Avnur and J. M. Hellerstein. Eddies: Continuously adaptive query pro- tions of the ACM, 43(5):51 – 58, May 2000.
cessing. In Proceedings of the ACM SIGMOD, pages 261–272, Dallas, TX,
May 2000. [43] V. Raman, B. Raman, and J. Hellerstein. Online dynamic reordering. The
[7] D. Carney, U. Centiemel, M. Cherniak, C. Convey, S. Lee, G. Seidman, VLDB Journal, 9(3), 2002.
M. Stonebraker, N. Tatbul, and S. Zdonik. Monitoring streams - a new class [44] M. Stonebraker and G. Kemnitz. The POSTGRES Next-Generation
of data management applications. In VLDB, 2002. Database Management System. Communications of the ACM, 34(10):78–
[8] A. Cerpa, J. Elson, D.Estrin, L. Girod, M. Hamilton, , and J. Zhao. Habitat 92, 1991.
monitoring: Application driver for wireless communications technology. In [45] UC Berkeley. Smart buildings admit their faults. Web Page, November
ACM SIGCOMM Workshop on Data Communications in Latin America and 2001. Lab Notes: Research from the College of Engineering, UC Berkeley.
the Caribbean, 2001. http://coe.berkeley.edu/labnotes/1101.smartbuildings.html.
[9] K. Chakrabarti, M. Garofalakis, R. Rastogi, and K. Shim. Approximate [46] O. Wolfson, A. P. Sistla, B. Xu, J. Zhou, and S. Chamberlain. DOMINO:
query processing using wavelets. VLDB Journal, 10, 2001. Databases fOr MovINg Objects tracking. In ACM SIGMOD, Philadelphia,
PA, June 1999.
[10] S. Chakravarthy, V. Krishnaprasad, E. Anwar, and S. K. Kim. Composite [47] A. Woo and D. Culler. A transmission control scheme for media access in
events for active databases: Semantics, contexts and detection. In VLDB, sensor networks. In ACM Mobicom, July 2001.
1994.
[11] J. Chen, D. DeWitt, F. Tian, and Y. Wang. NiagaraCQ: A scalable con- [48] Y. Yao and J. Gehrke. The cougar approach to in-network query processing
tinuous query system for internet databases. In Proceedings of the ACM in sensor networks. In SIGMOD Record, September 2002.
SIGMOD, 2000.