Enabling Energy-Efficient and Lossy-Aware Data Compression in Wireless Sensor Networks by Multi-Objective Evolutionary Optimization

F. Marcelloni, M. Vecchio

Information Sciences 180 (2010) 1924–1941
Article history: Received 8 January 2009; Received in revised form 11 December 2009; Accepted 25 January 2010

Keywords: Wireless sensor networks; Data compression; Multi-objective evolutionary algorithms; Energy efficiency; Signal processing

Abstract

Nodes of wireless sensor networks (WSNs) are typically powered by batteries with a limited capacity. Thus, energy is a primary constraint in the design and deployment of WSNs. Since radio communication is in general the main cause of power consumption, the different techniques proposed in the literature to improve energy efficiency have mainly focused on limiting the transmission/reception of data, for instance, by adopting data compression and/or aggregation. The limited resources available in a sensor node demand, however, the development of specifically designed algorithms. To this aim, we propose an approach to perform lossy compression on a single node based on a differential pulse code modulation scheme with quantization of the differences between consecutive samples. Since different combinations of the quantization process parameters determine different trade-offs between compression performance and information loss, we exploit a multi-objective evolutionary algorithm to generate a set of combinations of these parameters corresponding to different optimal trade-offs. The user can therefore choose the combination with the most suitable trade-off for the specific application. We tested our lossy compression approach on three datasets collected by real WSNs. We show that our approach can achieve significant compression ratios with negligible reconstruction errors. Further, we discuss how our approach outperforms LTC, a lossy compression algorithm purposely designed to be embedded in sensor nodes, in terms of compression rate and complexity.

© 2010 Elsevier Inc. All rights reserved.
1. Introduction
A wireless sensor network (WSN) consists of a set of small autonomous systems, called sensor nodes, which cooperate to perform some common task such as environmental, habitat and structural monitoring, disaster management, equipment diagnostics, alarm detection, and target classification. Nodes are deployed on a large scale (from tens to thousands of nodes) so as to form ad hoc distributed sensing and data propagation networks. Each node is a small device able to collect information from the surrounding environment through one or more sensors, to process this information locally and to communicate it to a data collection center, called sink or base station, generally using node-to-node multi-hop data propagation [1,19].
To this aim, nodes are equipped with a processing unit with limited memory and computational power, a sensing unit for
data acquisition from the surrounding environment and a communication unit, usually a radio transceiver. In general, each
sensor produces a stream of data which has to flow from the sensor node itself to the sink. Further, nodes which act as relays
in a multi-hop propagation scheme have also to store data coming from other nodes and to forward them towards the sink.
This requires facing a major technological constraint: nodes are powered by small batteries which typically cannot be
changed or recharged. Since radio communication is in general the main cause of power consumption, transmission/recep-
tion of data should be limited as much as possible. Data compression appears to be a very appealing and effective tool to achieve this objective.
Compression techniques reduce the data size by exploiting the structure of the data. Data compression algorithms fall
into two broad classes: lossless and lossy algorithms. Lossless algorithms guarantee the integrity of data during the compres-
sion/decompression process. On the contrary, lossy algorithms may generate a loss of information, but generally ensure a
higher compression ratio. Since sensor nodes are typically equipped with a few kilobytes of memory and a 4–8 MHz micro-
processor, embedding classical data compression schemes in these tiny nodes is practically impossible [2,34,22]. In [28], we
have tackled the problem of developing a lossless compression scheme suitable for sensor networks. We have proposed a
simple algorithm based on a modified version of the exponential Golomb code. The algorithm requires very low computa-
tional power, compresses data on the fly and uses a very small dictionary whose size is determined by the resolution of the
analog-to-digital converter on-board sensor nodes. However, lossless compression can be inefficient for the sensors used in commercial nodes. Indeed, due to noise, these sensors produce different readings even when they are sampling an unchanging phenomenon. For this reason, sensor manufacturers specify not only the sensor operating range but also the sensor accuracy.
Datasheets express accuracy by providing a margin of error, but typically do not include a probability distribution for this
error. Thus, when a value is measured by a sensor, we are confident that the actual value is within the error margin, but can-
not know with what probability that value is some distance from the real value [36]. Signals produced by these sensors are
generally “cleaned” by employing some de-noising technique.
A de-noising technique can be applied directly on the sensor node or offline at the base station. In the first case, an energy-aware and low-complexity de-noising technique is required [49]. Further, before executing the de-noising process, quite a large number of samples has to be stored in the data memory, increasing the memory requirements of the sensor node. In the second case, the sensor node simply acquires, compresses and stores noisy samples; when a packet payload is filled, the sensor node sends the compressed packet towards the base station, where the data are uncompressed and later de-noised. Obviously, this approach can be adopted only if data do not have to be transferred in real-time. In applications of environmental
monitoring, for instance, this is the typical case. In this scenario, the use of a lossless compression algorithm lets us simply postpone the de-noising process so as to perform it on a machine with generally no hard constraints on energy, computational power and memory. We have to consider, however, that noise increases the entropy of the signal and therefore hinders the lossless compression algorithm from achieving considerable compression rates. Indeed, data which will be discarded by the de-noising process at the base station are transmitted by the sensor node anyway, affecting power consumption and consequently the sensor node lifetime.
The ideal solution would be to adopt on the sensor node a lossy compression algorithm in which the loss of information
would be just the noise. Thus, we could achieve high compression ratios without losing relevant information. To this aim, we
exploit the observation that data typically collected by WSNs are strongly correlated. Thus, differences between consecutive
samples should be regular and generally very small. If this does not occur, it is likely that samples are affected by noise. To
de-noise and simultaneously compress the samples, we quantize the differences between consecutive samples. Further, to
reduce the number of bits required to code these differences, we adopt a Differential Pulse Code Modulation (DPCM) scheme
[7]. Of course, different combinations of the quantization process parameters determine different trade-offs between com-
pression performance and information loss. To generate a set of optimal combinations of the quantization process parame-
ters, we adopt one of the most popular Multi-Objective Evolutionary Algorithms (MOEAs), namely NSGA–II [8].
MOEAs generate a family of equally valid solutions, where each solution tends to satisfy a criterion to a higher extent than another. Different solutions are compared with each other by using the notion of Pareto dominance. A solution x associated with a performance vector u dominates a solution y associated with a performance vector v if and only if, ∀i ∈ {1, ..., I}, with I the number of criteria, u_i performs better than, or equal to, v_i, and ∃i ∈ {1, ..., I} such that u_i performs strictly better than v_i, where u_i and v_i are the ith elements of vectors u and v, respectively. A solution is said to be Pareto optimal if it is not dominated by any other possible solution. The set of Pareto-optimal solutions is denoted as the Pareto front. Thus, the aim of an MOEA is to discover a family of solutions that are a good approximation of the Pareto front.
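To make the dominance test concrete, the following minimal C sketch checks whether a performance vector u dominates v. It assumes, for illustration only, that all I objectives have been rephrased so that lower values are better (a maximized objective such as the SNR can simply be negated beforehand); the function name is ours, not the paper's.

```c
#include <stdbool.h>

/* Minimal sketch of Pareto dominance under the convention that every
 * objective is to be minimized (assumption made for illustration only). */
bool dominates(const double *u, const double *v, int num_objectives)
{
    bool strictly_better_somewhere = false;
    for (int i = 0; i < num_objectives; i++) {
        if (u[i] > v[i])            /* worse on some objective: no dominance */
            return false;
        if (u[i] < v[i])            /* strictly better on at least one objective */
            strictly_better_somewhere = true;
    }
    return strictly_better_somewhere;
}
```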
To execute NSGA–II, we first collect a short sequence of samples from the sensor node. Then, we apply a popular de-nois-
ing technique to this sequence so as to obtain a sequence of de-noised samples. Each solution generated by NSGA–II is eval-
uated by quantizing the original samples and computing the information entropy of the quantized sequence as first
objective, the number of levels used in the quantization process as second objective and the signal-to-noise ratio (SNR) be-
tween the original de-noised samples and the quantized samples as third objective. The entropy and the number of levels
provide an indirect measure of the possible obtainable compression rates. The SNR quantifies the loss of information with
respect to the ideal (not affected by noise) signal. Each solution in the Pareto front represents, therefore, a quantizer with
an associated trade-off among information entropy, number of quantization levels and SNR between the original de-noised
and the quantized sequences of samples.
We show that the lossy compression scheme obtained by using the quantizers generated by the MOEAs in the DPCM
framework is characterized by low complexity and memory requirements for its execution. Further, it is able to compute
a compressed version of each value on the fly, thus reducing storage occupation.
We have tested our lossy compression approach on three datasets collected by real WSNs. We show that, though very simple, our approach can achieve significant compression rates with negligible reconstruction errors, that is, high SNRs between the original de-noised and the reconstructed signals. We have compared our approach with a lossy
compression algorithm, namely the Lightweight Temporal Compression (LTC) algorithm [36], specifically designed to be
embedded in sensor nodes. We show that our approach outperforms LTC in terms of compression rates (and consequently
number of messages sent by a generic sensor node to transmit measures to the sink), complexity (average number of instruc-
tions required to compress a sample) and reconstruction errors, thus representing a very interesting state-of-art solution to
the problem of compressing noisy data in WSNs.
2. Related work
Due to the limited resources available in sensor nodes, applying data compression in WSNs requires specifically designed algorithms. Two approaches have been followed in the literature: distributed compression, which exploits the spatial and temporal correlation among the data collected by neighboring nodes, and local compression, performed by each node independently of the other nodes.
The first approach is natural in cooperative and dense WSNs. Here, nodes can collaborate with each other so as to carry
out tasks they could not execute alone. Thanks to the particular topology of these WSNs, data measured by neighboring
nodes are correlated both in space and in time.
The simplest distributed approach in WSNs relies on fixing a model, either stochastic or deterministic, of the phenomenon
under monitoring and estimating the parameters of that model. An overview of stochastic methods can be found in [52].
Here, the phenomenon is modeled as a single scalar or a vector. Nodes quantize their noisy measurements, a central data
sink collects the measurements and estimates the scalar or the vector. While the technique is able to accommodate a broad
class of noise models, the constant signal model is very restrictive and unable to represent arbitrarily smooth phenomena. A
deterministic technique, which allows managing more complicated signal models, has been proposed in [15]. Here, first a set
of basis functions is used to locally fit the phenomenon in various regions of the network. Then, the resulting approximations
are tied together by kernel regression, implemented using inter-sensor message exchange. Unfortunately, the reliability of
the reconstruction depends on the appropriate choice of the basis functions. Since this choice is performed statically and the
set of functions does not vary dynamically, phenomena of arbitrary complexity cannot be adequately modeled. For example,
using a set of globally smooth functions, piecewise smooth phenomena separated by a jump discontinuity cannot be recon-
structed. Indeed, the basis representation will smooth out the discontinuity, arguably the most interesting feature of the sig-
nal [49].
Distributed transform coding bridges this gap by allowing the data itself to guide the representation rather than forcing it
to fit a set of pre-fixed models. This approach is based on theoretical principles of transform analysis performed on the
source bits in order to compact the signal energy so as to achieve a better SNR. Various kinds of transforms such as Discrete
Wavelet Transform (DWT) and Karhunen–Loève Transform (KLT) [13] have been proposed for different types of applications.
We recall that transforms are invertible functions which merely change the representation of the signal without altering the information contained in it. The motivating principle of such transforms is that simpler and more effective coding can be achieved in the transform domain than in the original signal space.
Distributed transform coding, however, requires inter-node communication: nodes need data from neighbors to compute
the transform coefficients. Extra power is therefore consumed to transmit data to neighbors and to receive data from neigh-
bors. Further, since multiple nodes may want to communicate to each other at the same time, a channel access method for
shared medium has to be adopted (TDMA scheduling or FDMA scheduling, for example). Due to the popularity of dense sen-
sor networks, several works have been proposed for distributed transform [12,11,4,45,50,5].
Finally, to overcome the problem of inter-node communication, Distributed Source Coding (DSC) [32,53], also known as
Slepian–Wolf coding (in the lossless case) [42] and Wyner–Ziv coding (in the lossy case) [51], has been proposed. Here, the
assumption is that sensor nodes are densely deployed. Thus, the readings of one sensor are highly correlated with those of its
neighbors. DSC refers to the compression of the outputs of multiple correlated sensors that do not communicate with each other. These sensors send their compressed outputs to a central point, e.g., the base station, for joint decoding of the encoded streams. DSC is currently a very active trend in distributed data compression and a large number of papers focus on this scheme: for the sake of brevity, we invite interested readers to refer to [33]. We observe that DSC cannot be used without proper synchronization among the nodes of the WSN, i.e., assumptions have to be made on the routing and scheduling algorithms and their connection to the DSC scheme [33].
To the best of our knowledge, only a few papers have discussed the second approach. Actually, to compress data locally
and independently of the other nodes may have some non-negligible advantages with respect to the distributed transform
approach. First, nodes do not need to communicate with each other. Thus, nodes save energy and do not compete for the
shared medium. On the other hand, in an attempt to decorrelate the information, each sensor can use only its local information, thus forgoing the promising results obtained by DSC. This can be problematic if either the compression ratios achieved by the compression algorithm executed independently on the sensor node are too low (the amount of compressed data is still large), or, though the compression ratios are high, they have been obtained by executing a high number of instructions, with consequent large power consumption. In fact, power saving can be achieved only if the execution of
the compression algorithm does not require an amount of energy larger than the one saved in reducing communication.
Indeed, after analyzing several families of classic compression algorithms, Barr and Asanović conclude that compression
prior to transmission in wireless battery-powered devices may actually cause an overall increase of power consumption,
if no energy awareness is introduced [2]. On the other hand, standard compression algorithms are aimed at saving storage
and not energy. Thus, appropriate strategies have to be adopted.
Second, enabling compression at the single node, independently of the others, may be particularly effective in a WSN topology which has been gaining popularity in recent years, that is, a topology consisting of a number of sparse static sensor
nodes and one or more data collectors, also called data-mules, which come into contact with the static nodes at approximately
regular intervals and collect values measured by them [39,43]. In this topology, sensor nodes may be quite far from each other.
Hence, nodes cannot collaborate with each other and data measured by neighboring nodes might not be correlated in space
and/or in time. Thus, distributed approaches might not reach satisfactory performances in terms of compression ratios and
consequently in terms of power saving. On the other hand, we have to consider that this topology is characterized by a number
of advantages in comparison with the traditional multi-hop-based WSNs [43]: the network lifetime is longer (the number of
exchanged packets decreases), the packet loss probability decreases (the number of hops decreases), the network capacity
increases and node synchronization error decreases (the number of hops is smaller than in the multi-hop approach). On
the other hand, the data latency and the costs of the network infrastructure might increase [43]. Typically, the additional cost
for data-mules can be kept very low by exploiting the mobility of external agents available in the environment. Further,
since the model with data-mules is typically applied in environmental monitoring applications, latency is not generally an
issue. On the other hand, the use of the compression algorithm makes sense only if the application allows collecting a reason-
able number of measures before transmitting packets to the base station. In monitoring applications, this scenario is quite
typical: we are not interested in obtaining data in real-time, but only in tracing the history of the measures (in [34], these
networks are denoted as delay tolerant networks). On the contrary, in alarm detection applications, data have to be transmitted in real-time and therefore compression performed locally on the sensor node may have little practical relevance.
Generally, data-mules may come into contact with the static sensor nodes rarely. Further, static sensor nodes (acting as
relays) can be requested to transmit to the data-mule not only the samples collected from the sensors on-board the node, but
also those received from other nodes which cannot directly come into contact with the data-mule [21,16,48]. Thus, static
sensor nodes have to exploit data compression both to reduce memory occupation and to diminish the transmission time.
Indeed, data transmission can be performed only during the time interval when data-mule and static sensor nodes are in
contact, and the amount of data to be transmitted can be considerable also due to the function of storage/forwarding points
played by the static sensor nodes. On the other hand, when the amount of stored data is not considerable because, for instance, of frequent contacts with the data-mule, data compression may still be useful to reduce power consumption. Indeed,
based on the amount of data, the node can decide whether the data have to be transmitted to the data-mule or whether
it is more convenient to wait for the next contact so as to fill completely the payload of the packet. This aspect is currently
an interesting research topic (see, for instance, [3]). Further, compression can help to decrease channel sampling frequency in
discovery protocols, that is, in the protocols used to promptly detect the arrival of the data-mule. Indeed, if the amount of
data to be transmitted is reduced, the contact time can be shortened and consequently the arrival of the data-mule can be
discovered with a lower precision. Obviously, the reduction of the channel sampling frequency implies the reduction of the
number of times radio is turned on, thus achieving a further power saving.
Examples of compression techniques applied to the single node adapt some existing dictionary-based compression algorithms to the constraints imposed by the limited resources available on the sensor nodes. For instance, the lossless compression algorithms proposed in [26,28,34] are purposely adapted versions of LZ77, the Exponential-Golomb code and LZW, respectively. The LTC algorithm proposed in [36] is an efficient and simple lossy compression technique for the context of habitat monitoring. LTC introduces a small amount of error into each reading, bounded by a control knob: the larger the bound on this error, the greater the saving achieved by compression. Basically, LTC is similar to Run Length Encoding (RLE) in the sense that it attempts to represent a long sequence of similar data with a single symbol [35]. The difference with RLE is that, while the latter searches for strings of a repeated symbol, LTC searches for linear trends. We will use LTC as a comparison since LTC is, to the best of our knowledge, the only lossy compression algorithm specifically designed to compress data on a single sensor node of a WSN.
Our compression scheme is a purposely adapted version of the DPCM scheme widely used for compressing signals, espe-
cially in speech and video coding [20]. DPCM is a member of the family of differential compression methods. These methods
exploit the high correlation that typically exists between neighboring samples of smooth digitized signals, achieving com-
pression by appropriately encoding the differences between these samples. The simplest differential encoder calculates and encodes the differences d_i = s_i − s_{i−1} between consecutive samples s_i. The first data sample, s_0, is either encoded separately or is written on the compressed stream in raw format; in both cases the decoder can reconstruct s_0 exactly. We adopt the second solution. The decoder simply reverses the encoding tasks in a symmetric manner, that is, it decodes the differences d_i and uses the decoded values to generate the reconstructed samples s_i (s_i = s_{i−1} + d_i).
In principle, any suitable method, lossy or lossless, can be used to encode the differences. In practice, scalar quantization
is often used, resulting in lossy compression. The quantity encoded is therefore not the difference d_i but its quantized version, denoted by d̂_i. The difference q_i between d_i and d̂_i is denoted as the quantization error. The design of the quantizer is a crucial point in the development of a DPCM compression scheme.

Let S = {S_1, ..., S_L} be a set of cells S_l, with l ∈ [1..L], which form a disjoint and exhaustive partition of the input domain D (the difference domain in our case). Let C = {y_1, ..., y_L} be a set of levels y_l ∈ S_l, with l ∈ [1..L]. The quantization process is performed by a quantization operator Q : D → C such that Q(d_i) = y_l ⟺ d_i ∈ S_l. Cells S_l are often expressed in the form of intervals S_l = (a_{l−1}, a_l], where the bounds a_l are called thresholds. The width of a cell is expressed by |a_l − a_{l−1}| [14]. The quantization rule can be expressed as Q(d_i) = g(⌊f(d_i)⌋), where ⌊f(·)⌋ returns the index l of the cell S_l which the difference d_i belongs to, and g(·) returns the quantized output d̂_i = y_l.
A quantizer is said to be uniform if the levels y_l are equispaced and the thresholds are midway between adjacent levels. Given a uniform quantizer with cell width Δ, the region of the input space within Δ/2 of some quantizer level is called the granular region or simply the support, and the region outside (where the quantizer error is unbounded) is called the overload or saturation region [14].
When a good rate-distortion performance is required of a quantizer, the width of the zero-cell is usually treated individually, even if the quantizer is uniform. Since each input within the zero-cell is quantized to 0, this cell is often called the dead zone: uniform quantizers with dead zone have been successfully employed, for example, in many international standards for image and video coding, such as JPEG and MPEG [46].
Unfortunately, the introduction of the quantization block in a DPCM scheme introduces a new problem, namely the accumulation of errors [35]. To overcome this problem, the encoder is modified so as to compute the differences d_i = s_i − ŝ_{i−1}, that is, to calculate the difference d_i by subtracting the most recent reconstructed value ŝ_{i−1} (which both encoder and decoder have) from the current original sample s_i. Thus, the decoder first decodes s_0. Then, when it receives the first quantized difference d̂_1, it computes ŝ_1 = s_0 + d̂_1 = s_0 + d_1 + q_1 = s_1 + q_1. When it receives the second quantized difference d̂_2, it computes ŝ_2 = ŝ_1 + d̂_2 = ŝ_1 + d_2 + q_2 = ŝ_1 + s_2 − ŝ_1 + q_2 = s_2 + q_2. The decoded value ŝ_2 contains just the single quantization error q_2. In general, the decoded value ŝ_i is equal to s_i + q_i, thus it contains only the quantization error q_i.
Typically, the DPCM scheme also takes advantage of the fact that the current sample depends on several of its near neighbors, and not only on the most recent one. Thus, to improve the prediction, we can use K of the previously seen neighbors to encode the current sample s_i by means of a prediction function of the form U(ŝ_{i−1}, ..., ŝ_{i−K}). Methods which use such a predictor are called differential pulse code modulations.

Usually, the d̂_i data sequence is further compressed by using a lossless compression algorithm which represents d̂_i more compactly by removing some redundancies. Typically, run-length encoding schemes, entropy coding schemes and arithmetic coding schemes are adopted to achieve this data compaction [35].
In the context of WSNs, both the original DPCM and adaptive versions (denoted ADPCM in the following) and distributed
variants have been already employed. For instance, in [55], the authors propose an algorithm based on DPCM to compress
data collected by vibration sensors and discuss the effects of signal distortion due to lossy data compression on structural
system identification. In their scheme, they use the least squares method to derive the linear predictor coefficients, a Jayant
quantizer for scalar quantization and an arithmetic coding as entropy encoder. Papers [27,54,31] show how DPCM tech-
niques can be used to enable audio signal compression over WSNs. In particular, in [27], the authors show a very interesting
experiment on how to implement a networking platform for supporting real-time voice streaming over a WSN in a coal
mine. In [54], the authors describe how to implement streaming services for supporting military surveillance applications.
The need to transmit understandable speech along WSNs with energy consumption constraints was common to both the
studies. To this aim, microphone sample rates were set lower than the normal 8kHz and ADPCM was used to encode the data
and thus reduce the transmission data rate. In [31], a distributed ADPCM scheme was proposed in order to solve the problem
of the low sample rates which affected the quality of the speech at the receiver in [27] and [54]. In [25], the authors introduce
a two-stage distributed DPCM coding scheme for WSNs, consisting of temporal and spatial stages that compress data by
making predictions based on samples from the past. The interesting feature of this approach is that, since it continuously
monitors the additional gain provided by samples collected from other sensors, it can be combined with data-centric routing
algorithms for joint compression/routing optimization.
The novelty introduced in this paper is not, therefore, the use of a DPCM compression scheme in WSNs, but rather an opti-
mization method which allows using a classical DPCM scheme for reducing the information entropy at the encoder, resulting
in a reduced noise after reconstruction at the decoder.
Fig. 1 shows the block diagrams of our compressor and uncompressor. As regards the compressor, the generic difference d_i is calculated by subtracting only the most recent reconstructed value ŝ_{i−1}, that is, there is only a delay block rather than a prediction block. The introduction of a prediction block, in fact, would have caused an overall increase in the complexity of the compression algorithm, without a tangible increase in compression performance (at least for the type of data typically collected by WSNs). The ⌊f(·)⌋ block returns the index l_i of the cell S_{l_i} which d_i belongs to. The index l_i is input to the g(·) block, which computes the quantized difference d̂_i, and to the encoding block ENC, which generates the codeword c_i [30]. In the uncompressor, the codeword c_i is analyzed by the decoding block DEC, which outputs the index l_i. This index is elaborated by the block g(·) to produce d̂_i, which is added to ŝ_{i−1} to output ŝ_i.
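As an illustration of the closed-loop scheme of Fig. 1, the following C sketch encodes and decodes a sequence of raw samples. Here quantize_index() and dequantize() stand for the ⌊f(·)⌋ and g(·) blocks, and emit_codeword()/read_index() stand for the ENC/DEC blocks; these function names are ours, introduced only for the sketch, and the handling of the first raw sample is indicated by a placeholder comment.

```c
#include <stdint.h>

/* Hypothetical hooks standing for the blocks of Fig. 1 (names are ours). */
extern int     quantize_index(int32_t d);      /* |f(.)| block */
extern int32_t dequantize(int idx);            /* g(.) block   */
extern void    emit_codeword(int idx);         /* ENC block    */
extern int     read_index(void);               /* DEC block    */

/* Closed-loop DPCM compressor: the difference is taken with respect to the
 * reconstructed sample, so quantization errors do not accumulate. */
void compress(const int32_t *s, int n)
{
    int32_t rec = s[0];                 /* s0 is written in raw format */
    /* send_raw(s[0]); -- hypothetical call emitting the first sample */
    for (int i = 1; i < n; i++) {
        int32_t d = s[i] - rec;         /* d_i = s_i - reconstructed s_{i-1} */
        int idx = quantize_index(d);
        emit_codeword(idx);
        rec += dequantize(idx);         /* reconstructed s_i = s_{i-1} + quantized d_i */
    }
}

/* Uncompressor: mirrors the encoder, so each decoded sample carries only the
 * quantization error of its own difference. */
void uncompress(int32_t s0, int32_t *out, int n)
{
    out[0] = s0;
    for (int i = 1; i < n; i++)
        out[i] = out[i - 1] + dequantize(read_index());
}
```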
Fig. 1. Block diagram of (a) the compressor and (b) the uncompressor.
As regards the block ENC, it is well-known in information theory that, when the quantization indexes have unequal prob-
abilities, assigning equal numbers of bits to all quantization indexes is wasteful [40]. Indeed, the number of bits produced by
the quantizer will be reduced if shorter binary codewords are assigned to more probable indexes. This observation is used in
the entropy encoders, which encode more probable indexes with lower number of bits. Further, the set of binary codewords
satisfies the prefix condition, that is, no member is a prefix of another member, in order to ensure unambiguous decodability.
In our case, the inputs d_i to the quantizer represent the differences between consecutive digitized environmental data samples s_i. In general, environmental signals are quite smooth and therefore small differences are more probable than large ones. Thus, we can use an entropy encoder in order to further compress the integer-valued quantization indexes.
Any scalar quantization operator Q can be adopted. In deciding the type of operator, we have to consider that a coarse
quantization generates a high data reduction, but also a high reconstruction error at the decoder. Further, the operator can-
not be computationally heavy, since it is executed on a battery-powered tiny device. Thus, a suitable trade-off among compression, reconstruction error and complexity has to be found.

In the next section, we propose to use an MOEA to determine a set of optimal operators with different trade-offs among information entropy H, quantization complexity C and the SNR between the quantized and the de-noised samples (we denote
this SNR as SNR*). Information entropy H provides an indirect measure of the possible obtainable compression rates and is defined as:

H = − Σ_{l=1}^{L} p_l log2(p_l),   (1)

where p_l is the probability of the lth quantization index. The SNR* between the de-noised and the reconstructed samples (the third objective) is defined as:

SNR* = 10 log10 ( Σ_{i=1}^{N} s̄_i^2 / Σ_{i=1}^{N} (s̄_i − ŝ_i)^2 ),   (3)

where N is the number of samples, and ŝ_i and s̄_i are, respectively, the reconstructed and the ideal de-noised samples. To de-noise the samples, we have adopted the wavelet shrinkage and thresholding method proposed in [9]. We would like to point out that any type of de-noising technique can actually be employed. We have used the Symmlet 8 wavelet, a decomposition level equal to 5 and the soft universal thresholding rule for thresholding the detail coefficients at each level. The de-noising process has been performed by using standard Matlab built-in functions. To provide a glimpse of the effectiveness of the de-noising approach, Fig. 2 shows a real temperature dataset collected by a sensor on-board a node of a WSN before and after applying the de-noising process. We can observe how the noise is practically completely removed.
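A minimal C sketch of how the first and third objectives can be evaluated on the training set is given below. It assumes that the occurrence counts of the quantization indexes and the de-noised/reconstructed sequences are already available as arrays, and that the SNR is the ratio, in dB, between the energy of the de-noised signal and the energy of the reconstruction error (our reading of the definition above); the helper names are ours.

```c
#include <math.h>

/* Entropy (Eq. (1)) of the sequence of quantization indexes, given the
 * number of occurrences count[l] of each of the L indexes. */
double entropy_bits(const unsigned long *count, int L, unsigned long n_samples)
{
    double h = 0.0;
    for (int l = 0; l < L; l++) {
        if (count[l] == 0) continue;
        double p = (double)count[l] / (double)n_samples;
        h -= p * log2(p);
    }
    return h;
}

/* SNR in dB between the de-noised reference s_bar and the reconstruction
 * s_hat (assumed definition: signal energy over error energy). */
double snr_db(const double *s_bar, const double *s_hat, int n)
{
    double sig = 0.0, err = 0.0;
    for (int i = 0; i < n; i++) {
        sig += s_bar[i] * s_bar[i];
        err += (s_bar[i] - s_hat[i]) * (s_bar[i] - s_hat[i]);
    }
    return 10.0 * log10(sig / err);
}
```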
MOEAs have been investigated by several authors in recent years [56]. Some of the most popular among these algorithms
are the Strength Pareto Evolutionary Algorithm (SPEA) [58] and its evolution (SPEA2) [57], the Niched Pareto Genetic Algo-
rithm (NPGA) [17], the different versions of the Pareto Archived Evolution Strategy (PAES) [23], and the Non-dominated Sort-
ing Genetic Algorithm (NSGA) [44] and its evolution (NSGA–II) [8]. Since NSGA–II is considered as one of the most effective
MOEAs, we used NSGA–II in the experiments. On the other hand, comparing the performance of different MOEAs is out of the scope of this paper: we only aim to show the effectiveness of an MOEA approach in determining a set of quantizers which allow achieving different trade-offs among information entropy, quantization complexity and SNR*. We use the jMetal [10] implementation of NSGA–II for our optimization.
The choice of the solution with the best trade-off among H, C and SNR* for the specific application can be made on the basis of the constraints which have to be satisfied at the moment.
In the following subsections, we describe the chromosome coding, the genetic operators and the NSGA–II algorithm.
Each chromosome codifies a different quantizer. The choice of the parameters which identify the quantizers is based on the
following considerations. The signals collected by sensors on-board nodes are affected by noise: these sensors produce differ-
ent readings even when they are sampling an unchanging phenomenon. To reduce this problem, the quantizer has to be char-
acterized by a dead zone. Further, to guarantee a higher flexibility than a uniform quantizer, but without complicating too
much the quantization rule, we split the granular region into two subregions. Then, we partition both the subregions uni-
formly with appropriate different cell widths. It follows that each quantizer is determined by the following five parameters:
– Every difference within the interval (−DW, DW) is quantized to zero (dead zone).
– The first granular subregions (−DW − FN·FW, −DW] and [DW, DW + FN·FW) are uniformly partitioned into FN cells of width FW each.
– The second granular subregions (−DW − FN·FW − SN·SW, −DW − FN·FW] and [DW + FN·FW, DW + FN·FW + SN·SW) are uniformly partitioned into SN cells of width SW each.
– The differences which fall in the two semi-infinite saturation regions (−∞, −DW − FN·FW − SN·SW] and [DW + FN·FW + SN·SW, +∞) are quantized to the midpoint of the adjacent cells of the second granular subregions.

It follows that Eq. (2) can be rewritten as:

C = 2 · (FN + SN) + 1.   (4)
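The quantization rule described by these five parameters requires only a handful of comparisons, which is what makes it attractive for a 4–8 MHz microcontroller. The following C sketch is our reading of the rule above, giving a concrete form to the quantize_index()/dequantize() hooks of the earlier sketch (here they take the parameter set explicitly); it maps a difference to a signed cell index (0 for the dead zone) and back to a level at the midpoint of the cell, and boundary conventions may differ slightly from the authors' implementation.

```c
#include <stdlib.h>   /* abs */

/* Quantizer parameters (DW, FW, FN, SW, SN in the text; DW is DZ in Table 3). */
typedef struct { int DW, FW, FN, SW, SN; } quant_params;

/* |f(.)|: map a difference to a signed index; 0 is the dead zone,
 * +/-1..FN the first granular subregion, +/-(FN+1)..(FN+SN) the second,
 * and differences beyond the support saturate to +/-(FN+SN). */
int quantize_index(int d, const quant_params *q)
{
    int a = abs(d), sign = (d < 0) ? -1 : 1;
    if (a < q->DW)                      return 0;
    a -= q->DW;
    if (a < q->FN * q->FW)              return sign * (1 + a / q->FW);
    a -= q->FN * q->FW;
    if (a < q->SN * q->SW)              return sign * (1 + q->FN + a / q->SW);
    return sign * (q->FN + q->SN);      /* saturation region */
}

/* g(.): reconstruction level at the midpoint of the indexed cell. */
int dequantize(int idx, const quant_params *q)
{
    int a = abs(idx), sign = (idx < 0) ? -1 : 1;
    if (a == 0)        return 0;
    if (a <= q->FN)    return sign * (q->DW + (a - 1) * q->FW + q->FW / 2);
    return sign * (q->DW + q->FN * q->FW + (a - q->FN - 1) * q->SW + q->SW / 2);
}
```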
The choice of these parameters originates several different types of quantizers. In the chromosome, each parameter is expressed as a positive integer in the range [1, MAX] and is codified by a Gray binary code. The value of MAX depends on the resolution of the ADC on-board the sensor node. On the other hand, constraining the upper bound of the range reduces the search space and allows a better exploration. In our experiments we set MAX = 64: it follows that each chromosome is represented by a string of 30 bits.

Each chromosome is associated with a vector of three elements, where each element expresses the fulfillment degree of one of the three objectives (Eqs. (1), (4) and (3)). To compute H, C and SNR*, we use a small set (training set) of samples collected by the sensor on-board the node.
We apply the classical one-point crossover operator and a gene mutation operator [29]. The one-point crossover operator cuts two chromosomes at some randomly chosen common point and swaps the resulting sub-chromosomes. The common point is chosen by randomly extracting a number in (1, 30).

In the mutation operator, a point is randomly selected and its value is flipped (0 becomes 1 and vice versa). The crossover operator is applied with probability P_X; the mutation operator is applied with probability P_M. In the experiments, we adopted P_X = 0.9 and P_M = 0.02.
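A minimal C sketch of the two mating operators on the 30-bit chromosomes follows; the chromosome is stored as an array of bits, rand() is used only for illustration (a real implementation would rely on the MOEA library's random generator, e.g., jMetal's), and the caller is assumed to invoke the operators with probabilities P_X and P_M.

```c
#include <stdlib.h>

#define CHROM_LEN 30   /* 5 parameters x 6 Gray-coded bits (MAX = 64) */

/* One-point crossover: cut both parents at a common random point in (1, 30)
 * and swap the resulting tails. */
void one_point_crossover(unsigned char a[CHROM_LEN], unsigned char b[CHROM_LEN])
{
    int cut = 1 + rand() % (CHROM_LEN - 1);       /* cut point in [1, 29] */
    for (int i = cut; i < CHROM_LEN; i++) {
        unsigned char tmp = a[i];
        a[i] = b[i];
        b[i] = tmp;
    }
}

/* Gene mutation (applied with probability P_M by the caller): pick one random
 * position and flip its bit. */
void mutate(unsigned char c[CHROM_LEN])
{
    int pos = rand() % CHROM_LEN;
    c[pos] ^= 1;
}
```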
In order to select the mating operators and probability values, we performed several experiments by comparing the dif-
ferent Pareto fronts obtained by applying NSGA–II with different crossover and mutation operators, and different probabil-
ities. We verified that the selected mating operators and probability values allow obtaining the widest Pareto fronts and the
best trade-offs among H, C and SNR*.
5.3. NSGA–II
The NSGA–II algorithm was introduced in [8] as an improved version of the Non-dominated Sorting Genetic Algorithm
[44]. It is a population-based genetic algorithm, which uses an ad-hoc density-estimation metric and a non-dominance rank
assignment. NSGA–II starts from an initial random population P_0 of N_pop individuals (100 in our experiments), sorted based on non-dominance. Each individual is associated with a rank equal to its non-dominance level (1 for the best level, 2 for the next-best level, and so on). To determine the non-dominance level (and consequently the rank), two entities are computed for each individual p: (i) the number n_p of individuals that dominate p and (ii) the set S_p of individuals dominated by p. All individuals with n_p = 0 belong to the best non-dominance level, associated with rank 1. Indeed, these individuals are dominated by no other individual. To determine the individuals associated with rank 2, for each solution p with rank 1, we visit each member q of the set S_p and decrease n_q by one. If n_q becomes zero, then q belongs to the non-dominance level associated with rank 2. The procedure is repeated for each solution with rank 2, rank 3 and so on, until all fronts are identified. At each iteration t, t = 0, ..., T_max, an offspring population Q_t of size N_pop is generated by selecting mating individuals through binary tournament selection, and by applying the crossover and mutation operators. The parent population P_t and the offspring population Q_t are combined so as to generate a new population P_ext = P_t ∪ Q_t. Then, a rank is assigned to each individual in P_ext as explained above. Based on these ranks, P_ext is split into different non-dominated fronts, one for each different rank. Within each front, a specific crowding measure, which represents the sum of the distances to the closest individuals along each objective, is used to define an ordering among individuals: in order to cover the overall objective space, individuals with large crowding distance are preferred to individuals with small crowding distance. The new parent population P_{t+1} is generated by selecting the best N_pop individuals (considering first the ordering among the fronts and then among the individuals) from P_ext. The algorithm terminates when the number of iterations reaches T_max (10,000 in our experiments).
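The rank-assignment procedure described above can be sketched as follows; dominates() is the Pareto-dominance test given earlier, obj[i] is the objective vector of individual i (with objectives rephrased for minimization), the population size 200 stands for P_ext = P_t ∪ Q_t with N_pop = 100, and the sketch omits the crowding-distance computation and the rest of the NSGA–II loop.

```c
#include <stdbool.h>
#include <string.h>

#define NPOP 200          /* size of P_ext = P_t U Q_t (2 x N_pop, assumed)  */
#define NOBJ 3            /* H, C and the (negated) SNR*                     */

extern bool dominates(const double *u, const double *v, int num_objectives);

/* Assign a non-dominance rank (1 = best front) to each individual. */
void assign_ranks(double obj[NPOP][NOBJ], int rank[NPOP])
{
    int n_dom[NPOP];                    /* n_p: how many individuals dominate p */
    memset(n_dom, 0, sizeof n_dom);
    for (int p = 0; p < NPOP; p++)
        for (int q = 0; q < NPOP; q++)
            if (p != q && dominates(obj[q], obj[p], NOBJ))
                n_dom[p]++;

    /* Peel off the fronts: rank 1 = individuals dominated by nobody, then
     * discount them and repeat (the n_q bookkeeping described in the text). */
    int assigned = 0, current = 1;
    memset(rank, 0, NPOP * sizeof(int));
    while (assigned < NPOP) {
        for (int p = 0; p < NPOP; p++)
            if (rank[p] == 0 && n_dom[p] == 0) {
                rank[p] = current;
                assigned++;
            }
        for (int p = 0; p < NPOP; p++)          /* discount the new front */
            if (rank[p] == current)
                for (int q = 0; q < NPOP; q++)
                    if (rank[q] == 0 && dominates(obj[p], obj[q], NOBJ))
                        n_dom[q]--;
        current++;
    }
}
```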
In this Section, we summarize the steps which have to be performed in order to design a lossy compression algorithm for
a specific sensor type:
(1) Collect a small set of samples s_i from the sensor (the number of these samples depends on the specific application: the higher the number of these samples, the higher the reliability of the training set);
(2) Select a de-noising algorithm;
(3) Apply this algorithm to the set of samples to generate a training set composed of pairs (s_i, s̄_i), where s_i and s̄_i are, respectively, the original and the de-noised samples;
(4) Apply the MOEA to determine the parameters of the quantizer used in the compression algorithm. Use the training set for ranking the solutions in terms of SNR*, H and C;
(5) Select one solution among the set of solutions of the Pareto front. This solution determines the parameters of the quantizer used in the lossy compression algorithm;
(6) Configure the lossy compression algorithm and transfer the code into the sensor nodes.
Steps (3) and (4) are performed off-line on a general-purpose machine with reasonable computational power. Once the
parameters of the Quantizer have been computed, the lossy compression algorithm is deployed on the sensor nodes (step
(6)). Whenever a sample is collected by a sensor, it is compressed on the fly on the sensor node by the lossy compression
algorithm.
One could observe that the parameters of the quantizer are computed off-line by using a training set and therefore the
compression algorithm is not adaptable to changes of the data model. Though this observation is correct, we would like to
point out that, in our experience, when monitoring the same phenomenon (for instance, air temperature) with the same sen-
sor type at the same frequency, the SNR achieved by the lossy compressor is quite stable in time, that is, the SNRs measured
on the training set are approximately equal to the ones measured on the test set, and for different deployments, that is, the
SNRs measured on the training set collected from a sensor are approximately equal to the ones measured on the samples
collected from another sensor of the same type. Actually, this behavior is shown in the experimental part of the paper, where, once the parameters of the quantizer used in the lossy compression algorithm have been determined on a subset of a dataset, we have used the compression algorithm on three different test sets, generated by measuring temperature at different times and in different places by means of the same type of sensor. Further, we have to consider that our algorithm
works on the differences between consecutive samples and the distribution of these differences tends to be stable in appli-
cations of environmental monitoring. Finally, we would like to remark that if the current data model is different from the
model studied in the training set, the lossy compression continues to work, though less efficiently.
7. Experimental results
In order to show the effectiveness and validity of our lossy compression approach, we tested it against some real-world
temperature datasets. In particular, we used temperature measurements from three SensorScope deployments [38]: FishNet
Deployment, Grand-St-Bernard Deployment and Le Gènèpi Deployment. We chose to adopt public domain datasets rather
than to generate data by ourselves to make the assessment as fair as possible. The WSNs adopted in the deployments employ
a TinyNode node type [47], which uses a TI MSP430 microcontroller, a Xemics XE1205 radio and a Sensirion SHT75 sensor
module [37]. This module is a single chip which includes a bandgap temperature sensor, coupled to a 14 bit ADC and a serial
interface circuit. The Sensirion SHT75 can sense air temperature in the [−20 °C, +60 °C] range. Each ADC output raw_t is represented with a resolution of 14 bits and normally converted into a measure t in Celsius degrees (°C) as described in [37]. The datasets corresponding to the three deployments contain the measures t. On the other hand, our algorithm works on raw_t data. Thus, before applying the algorithm, we extracted raw_t from t by using the inverted versions of the conversion functions in [37].

Table 1
Main characteristics of the four datasets.

Table 2
Statistical characteristics of the three datasets.
We built our datasets by extracting from the three SensorScope deployment databases the temperature measurements of a randomly extracted sensor node within a specific time interval. Table 1 summarizes the main characteristics of the datasets. In the following, we will refer to these datasets by using their symbolic names. Table 2 shows some statistical characteristics of these datasets. In particular, we have computed the mean s̄, the standard deviation σ_s and the information entropy H_s of the samples, and the mean d̄, the standard deviation σ_d and the information entropy H_d of the differences between consecutive samples.
We used the first N = 5040 samples of the FN101 temperature dataset to build the training set. Since this dataset is a collection of temperature samples collected with a frequency of 1 sample every 2 minutes, this is equivalent to considering a 7-day training set. The extracted portion of the original signal s is first de-noised and then converted back to raw data (we recall that our compression scheme works on raw data). The raw data corresponding to the de-noised signal are denoted as s̄. Fig. 2 shows the portion of the original signal used to build the training set and its de-noised version.
We applied NSGA–II to the training set. At the end of the optimization process, we obtained an archive of non-dominated solutions with respect to the three objectives. In particular, the objective C (quantization complexity) has allowed us to steer the search of the best solutions towards the ones with the minimum number of cells. Thus, during the evolution, the archive is preferably populated by quantizers which, for equal entropy and SNR*, are characterized by a lower number of cells, thus avoiding quantizers with a high number of unused cells. This allows simplifying the implementation of the quantizer and consequently of the encoder. Indeed, if the number of indexes is low, the encoder can use a small dictionary to encode the quantization indexes. This dictionary can be generated by using Huffman's algorithm [18], which provides a systematic method for designing binary codes with the smallest possible average length for a given set of symbol occurrence frequencies. Once the binary codeword representation of each quantization index has been computed and stored in the sensor node, the encoding phase reduces to a look-up table consultation.
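The node-side encoding step can then be reduced to a table lookup plus bit packing, along the lines of the following C sketch; the codeword table (bit pattern and length per quantization index) is assumed to have been precomputed offline with Huffman's algorithm, the table contents shown are placeholders rather than the actual codewords of Table 4, and the quantization index is assumed already mapped to a nonnegative table position.

```c
#include <stdint.h>

/* One entry per quantization index: the Huffman codeword (right-aligned in
 * 'bits') and its length in bits. The values below are placeholders only. */
typedef struct { uint16_t bits; uint8_t len; } codeword;

static const codeword table[] = {
    { 0x0, 1 },   /* most probable index (e.g., the dead zone)   */
    { 0x2, 2 },   /* "10"                                        */
    { 0x6, 3 },   /* "110"                                       */
    /* ... one entry per quantization index ...                  */
};

/* Append the codeword of table position 'idx' to the output buffer.
 * 'bitpos' tracks the number of bits already written. */
void encode_index(int idx, uint8_t *buf, uint32_t *bitpos)
{
    codeword c = table[idx];
    for (int i = c.len - 1; i >= 0; i--) {        /* MSB of codeword first */
        uint32_t byte = *bitpos >> 3;
        uint8_t  bit  = 7 - (*bitpos & 7);
        if ((c.bits >> i) & 1)
            buf[byte] |= (uint8_t)(1u << bit);
        else
            buf[byte] &= (uint8_t)~(1u << bit);
        (*bitpos)++;
    }
}
```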
The only critical point of this approach is that Huffman's algorithm requires knowing the probability with which the source produces each symbol in its alphabet. To determine an approximation of these probabilities, we can exploit again the training set: for the specific quantizer, we compute the probability with which each quantization index occurs when quantizing the differences between consecutive samples of the training set, and we build the optimal dictionary for that data source by applying Huffman's algorithm.
If we project the final archive onto the SNR*–H plane (see Fig. 3), we realize that almost all solutions maintain the non-dominance property with respect to the SNR* and H objectives: only 12 out of 100 solutions turn out to be dominated by one or more solutions in the archive. In the figure, dots and crosses represent, respectively, non-dominated and dominated solutions with respect to the SNR* and H objectives. Non-dominated solutions in the SNR*–H plane are actually the solutions of interest. We can observe that the front is wide and the solutions are characterized by a good trade-off between SNR* and H.
To perform an accurate analysis of some solutions, we selected from the front in Fig. 3 three significant quantizers: solutions (A) and (C), characterized by, respectively, the highest H and the lowest SNR*, and solution (B), characterized by a good trade-off between SNR* and H. Table 3 shows the values of the five parameters which characterize the three selected quantizers.

Table 3
Parameters of solutions (A), (B) and (C).

Solution  DZ  FW  FN  SW  SN
A          8   1   3   2   1
B         32  15   1   5   1
C         63  61   1  44   1
Solutions (A), (B) and (C) correspond, respectively, to the quantization rules represented in Fig. 4, where the black dots and the circles represent the differences d_i and the quantized differences d̂_i, respectively. Table 4 shows the quantization indexes, their probabilities and the codewords assigned by Huffman's algorithm when the selected quantizers are used in the proposed scheme for compressing the training set.
To assess the performance of the three compression algorithms generated by the quantizers corresponding to (A), (B) and (C), respectively, we use the compression rate (expressed in bits/sample), defined as:

cr = comSize / N,   (5)

where comSize and N represent the size of the compressed bitstream and the number of samples, respectively. Further, under the assumption that all samples have to be transmitted to the sink by using the lowest number of messages so as to save power, and supposing that each packet can contain at most 29 bytes of payload [6], we can evaluate the packet compression ratio, defined as:

PCR = 100 · (1 − comPkt / origPkt),   (6)

where comPkt and origPkt represent the number of packets necessary to deliver the compressed and the uncompressed bitstreams, respectively.
Finally, we have to note that, like all the differential compression algorithms, the proposed scheme suffers from the fol-
lowing problem. In order to reconstruct the original samples, the decoder must know the value of the first sample: if the first
sample has been lost or corrupted, all the other samples are not correctly decoded. In our case, the compressed bitstream is sent over a wireless link to the sink, which takes charge of the decompression process. Since the transmission can be unreliable, the first packet could be lost and thus also the first value, making correct reconstruction of the samples impossible. A number of solutions have been proposed to make a communication reliable. In general, these solutions involve protocols based on acknowledgments which act at the transport layer. Obviously, these protocols require a higher number of message exchanges between nodes, and this increases the power consumption. A review of these algorithms is out of the
scope of this paper. However, a solution to this problem can also be provided at the application layer without modifying the protocols of the underlying layers: the first sample inserted into the payload of a new packet is written in raw format, thus restarting the compression algorithm from scratch. Under this assumption, the decoding of each packet is independent of the reception of the previous packets: this is paid for with a slight decrease of the packet compression ratio. We denote the packet compression ratio obtained by using this expedient as PCR*.

Fig. 4. Quantization rules for the three solutions: (a) solution (A), (b) solution (B) and (c) solution (C).
Table 4
Codewords used in solutions (A), (B) and (C).

Table 5
Results obtained by solutions (A), (B) and (C) on the three datasets.

Table 5 shows the cr, PCR and PCR* obtained for the three temperature datasets and the three selected quantizers. Further, the table reports the SNR between the original noisy and the reconstructed de-noised samples, denoted as SNR_n, and the SNR between the original de-noised and the reconstructed samples, denoted as SNR_d. We can observe that all solutions achieve good trade-offs between compression rates and SNR. Further, there are no considerable differences between the results obtained on the training set and the ones achieved on the overall FN101 dataset and on the other two datasets. This result could be considered quite surprising. Indeed, we highlight that both the optimization and Huffman's algorithm were executed using only a portion of the FN101 dataset. Thus, the GSB10 and LG20 datasets are completely unknown to the compression scheme. We are conscious that the procedure adopted for FN101 could have been applied also to the other datasets, in order to find ad-hoc solutions for the particular deployment. On the other hand, the three temperature datasets were collected, though in different places and at different times, by the same sensor nodes with the same type of temperature sensor and the same sampling frequency: for this reason it could be unnecessary to perform the optimization on each dataset. To validate this assumption, for each selected solution we executed Huffman's algorithm on a portion of N = 5040 samples extracted respectively from the GSB10 and LG20 datasets and used the resulting dictionaries to compress the corresponding datasets. Table 6 shows the cr, PCR and PCR* obtained in this case. If we compare the results in Table 6 with those in Table 5, we can observe that the decreases in cr (or, equivalently, the increases in PCR and PCR*) are very small and almost negligible, thus confirming the possibility of adopting the same encoding for similar applications of the same sensor. Similar considerations (for the sake of brevity, these results are not shown) can also be made for the genetic optimization.
Solution (B), which was chosen at the knee of the Pareto front, is characterized by compression rates comparable to those achieved by solution (C) and by SNR_d values comparable to those obtained by solution (A). This solution therefore represents a good trade-off between compression rates and SNR_d. For this reason, we chose this solution to perform the comparisons discussed in the following subsections.
To assess the effectiveness of our approach, we compare it with the LTC algorithm proposed in [36]. LTC generates a set of line segments which form a piecewise continuous function. This function approximates the original dataset in such a way that no original sample is farther than a fixed error e from the closest line segment. Thus, before executing the LTC algorithm, we have to set the error e. We choose e as a percentage of the Sensor Manufacturer Error (SME). From the Sensirion SHT75 sensor data sheet [37], we have SME = ±0.3 °C for temperature. To analyze the trend of cr with respect to increasing values of e, we varied e from 10% to 230% of the SME with a step of 10%.
Fig. 5 shows, for all three datasets, the resulting behavior of LTC with respect to increasing values of e on the SNR_n–cr plane. In the same figure, we have also reported the performance achieved by solutions (A), (B) and (C) when applied to the datasets FN101 (A1, B1, C1), GSB10 (A2, B2, C2) and LG20 (A3, B3, C3), respectively.
Table 6
Compression rates and packet compression ratios achieved by the three solutions (A), (B) and (C) on the GSB10 and LG20 datasets when applying Huffman's algorithm to training sets extracted from the two datasets.
Fig. 5. Performance of the solutions (A), (B) and (C), and of LTC on the SNR_n–cr plane for the three datasets.
It is worth noting that, due to the particular approach based on linear approximation used in LTC, when using low values of e, LTC works at compression rates higher than 16 bits/sample and, therefore, the size of the compressed data is larger than that of the original data. We observe that solutions (A), (B) and (C) are characterized by low compression rates for all datasets. To assess the behavior of the different algorithms with respect to the de-noised datasets, we show in Fig. 6 the performance of solutions (A), (B) and (C) and of LTC on the SNR_d–cr plane (only the zone of interest is reported in the figure).
In order to fairly compare our solution (B) with LTC, we computed for each dataset the intervals of e which allow us to obtain values of cr similar to those of solution (B). Table 7 shows the correspondences between the compression rates achieved by our algorithm and the cr, e, SNR_n and SNR_d obtained by LTC on the three datasets. By comparing Table 7 with Table 5, we can observe that our algorithm obtains SNRs equal to, or higher than, LTC at lower compression rates (that is, higher compression ratios). For example, for the FN101 dataset and SNR_d = 43.88, the compression rate for LTC is 1.24, whereas for our algorithm it is 1.15.
7.5. Complexity
Compression rates and reconstruction errors are only two of the factors which determine the choice of a compression
algorithm suited to WSNs. Another fundamental factor is complexity. To assess the complexity of our algorithm and of LTC, we have performed a comparative analysis of the number of instructions required by each algorithm to compress data. To this aim, we have adopted the SimIt-ARM simulator [41]. SimIt-ARM is an instruction-set simulator that runs both system-level and user-level ARM programs. For LTC, we have set e to the left extremes of the e intervals in Table 7 (we recall that the left extremes are the most favourable cases for the LTC algorithm). Table 8 shows the number of instructions required to compress each dataset, the number of saved bits, the number of instructions per saved bit for each temperature dataset, and the corresponding average values.
We note that, for a given value of SNR (SNRn or SNRd), our algorithm achieves lower crs (i.e., higher compression ratios) than LTC while requiring a lower number of instructions: on average, our algorithm executes 5.92 instructions per saved bit against the 40.43 executed by LTC.
Fig. 6. Performance of the solutions (A), (B) and (C), and of LTC on the SNRd–cr plane for the three datasets.
Table 7
Correspondences between compression rates achieved by solution (B), and the values of e, compression rates and SNRs achieved by LTC on the three datasets.

Dataset   cr (B)   e (% of SME)   cr (LTC)       SNRn (LTC)       SNRd (LTC)
FN101     1.15     [120, 130]     [1.24, 1.12]   [37.82, 37.15]   [43.88, 42.92]
GSB10     1.32     [150, 160]     [1.46, 1.29]   [28.01, 27.51]   [33.89, 33.02]
LG20      1.63     [210, 220]     [1.68, 1.51]   [25.02, 24.72]   [31.58, 31.00]
Since we adopt the send and receive primitives of TinyOS for Tmote Sky (TelosB), we transmit or receive packets composed of 38 bytes. The energy consumed to transmit a packet can be computed as

E_pkt = 38 · E_byte,

where E_byte is the energy consumed to transmit a byte. Energy E_byte can be derived as follows:

E_byte = V · c_TXB · t_TXB,

where V is the supply voltage, and c_TXB and t_TXB are the current consumption and the time spent for transmitting one byte, respectively. If we consider, for instance, the CC2420 radio (the one on-board Tmote Sky motes from MoteIv), we have c_TXB = 0.0174 A and t_TXB = 0.032 ms, from which E_byte and, in turn, E_pkt follow directly.
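To give a feel for the magnitudes involved, the following minimal sketch evaluates these two formulas numerically; the 3 V supply voltage is an assumption made only for illustration, while the CC2420 current and per-byte transmission time are the figures quoted above.

/* Sketch of the per-packet transmission energy estimate E_pkt = 38 * E_byte,
 * with E_byte = V * c_TXB * t_TXB. The 3 V supply voltage is an assumption
 * made purely for illustration. */
#include <stdio.h>

int main(void)
{
    const double V = 3.0;            /* assumed supply voltage [V]          */
    const double c_txb = 0.0174;     /* current while transmitting [A]      */
    const double t_txb = 0.032e-3;   /* time to transmit one byte [s]       */
    const double e_byte = V * c_txb * t_txb;   /* energy per byte [J]       */
    const double e_pkt  = 38.0 * e_byte;       /* 38-byte TinyOS packet [J] */
    printf("E_byte = %.3e J, E_pkt = %.3e J\n", e_byte, e_pkt);
    return 0;
}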
Table 8
Complexity of our algorithm and LTC.
7.6. Latency
In order to provide a general evaluation of how compression rates can impact the network latency, let us consider that the sensor on-board the node collects one sample every t minutes and that each packet can contain at most B bytes. The maximum delay D introduced by the compression algorithm is then
D = (B · 8 / cr) · t.    (7)
The maximum delay corresponds to the scenario in which all samples have to be propagated using the lowest number of messages. Just to provide some values, let us consider the temperature dataset FN101 used in the experiments, which contains samples collected at a rate of one sample every 2 minutes. Here, each temperature sample is normally byte-aligned and is 16 bits long. For solution (B), for instance, we have cr = 1.15. If the packet contains at most 29 bytes of payload [6], the delay introduced by the compression on the first sample is 402 minutes. On the other hand, this also means that each packet contains 201 samples (corresponding to 402 minutes of the monitoring process). Thus, thanks to the compression, a single sensor can transfer its local 402-minute log to a requesting data-mule by sending a single packet.
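The figures in this example can be reproduced with the short check below; flooring the samples-per-packet count to an integer is an assumption on the rounding convention, since Eq. (7) is stated without it.

/* Quick check of the latency example around Eq. (7): a 29-byte payload at
 * cr = 1.15 bits/sample holds floor(29*8/1.15) = 201 samples; at one sample
 * every 2 minutes this gives the 402-minute delay quoted in the text. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double payload_bits = 29.0 * 8.0;  /* B = 29 bytes of payload     */
    const double cr = 1.15;                  /* bits per compressed sample  */
    const double t_min = 2.0;                /* sampling period [minutes]   */
    const int samples = (int)floor(payload_bits / cr);
    printf("samples per packet = %d, max delay = %.0f minutes\n",
           samples, samples * t_min);
    return 0;
}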
8. Conclusions
In this paper, we have proposed a lossy compression algorithm purposely designed for the limited resources available on-board sensor nodes. Compression allows reducing the amount of data transmitted/received by a sensor node, thus extending its lifetime. The algorithm is based on a differential pulse code modulation scheme where the differences between consecutive samples are quantized. The quantization process affects both the compression rate and the information loss. To generate different combinations of the quantization process parameters corresponding to different optimal trade-offs between compression performance and information loss, we have applied NSGA–II, a popular multi-objective evolutionary algorithm, to a subset of samples collected by the sensor. The user can therefore choose the combination with the most suitable trade-off for the specific application. We have tested our lossy compression approach on three datasets collected by real WSNs, obtaining high compression ratios at very high signal-to-noise ratios. Finally, we have shown how our approach outperforms LTC, a lossy compression algorithm purposely designed to be embedded in sensor nodes, in terms of both compression rate and complexity.
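For a concrete picture of the encoding scheme summarized above, the fragment below is a generic sketch of DPCM with uniform quantization of consecutive-sample differences. The fixed step size and the sample values are illustrative assumptions only; the paper's actual quantizer, whose parameters are selected by NSGA–II, and the subsequent entropy coding are not reproduced here.

/*
 * Generic sketch of DPCM with uniform quantization of the differences
 * between consecutive samples. The fixed step size q is an illustrative
 * assumption, not the parameterization optimized in the paper.
 */
#include <math.h>
#include <stdio.h>

#define N 6

int main(void)
{
    const double x[N] = { 21.0, 21.1, 21.3, 21.2, 21.6, 22.0 };
    const double q = 0.1;       /* assumed quantization step */
    int code[N];
    double rec[N];

    /* Encoder: quantize the difference between the current sample and the
     * previous *reconstructed* sample so encoder and decoder stay in sync.
     * The first sample is assumed to be transmitted uncompressed. */
    rec[0] = x[0];
    code[0] = 0;
    for (int i = 1; i < N; i++) {
        code[i] = (int)lround((x[i] - rec[i - 1]) / q);
        rec[i] = rec[i - 1] + code[i] * q;  /* decoder-side reconstruction */
    }

    for (int i = 0; i < N; i++)
        printf("x=%.2f code=%+d rec=%.2f err=%.3f\n",
               x[i], code[i], rec[i], x[i] - rec[i]);
    return 0;
}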
Acknowledgment
This work was supported by the Italian Ministry of University and Research (MIUR) under the FIRB project “Adaptive Infrastructure for Decentralized Organization (ArtDecO)”.
References
[1] T. Abdelzaher, S. Prabh, R. Kiran, On real-time capacity limits of multihop wireless sensor networks, in: Proceedings of the 25th IEEE International Real-
Time Systems Symposium, 2004.
[2] K.C. Barr, K. Asanović, Energy-aware lossless data compression, ACM Trans. Comput. Syst. 24 (3) (2006) 250–291.
[3] L. Bölöni, D. Turgut, Should I send now or send later? A decision-theoretic approach to transmission scheduling in sensor networks with mobile sinks, Wirel. Commun. Mob. Comput. 8 (3) (2008) 385–403.
[4] A. Ciancio, A. Ortega, A distributed wavelet compression algorithm for wireless multihop sensor networks using lifting, in: Proceedings of the IEEE
International Conference on Acoustics, Speech, and Signal Processing (ICASSP’05), vol. 4, 2005.
[5] A. Ciancio, S. Pattem, A. Ortega, B. Krishnamachari, Energy-efficient data representation and routing for wireless sensor networks based on a
distributed wavelet compression algorithm, in: Proceedings of the Fifth International Conference on Information Processing in Sensor Networks (IPSN
2006), 2006.
[6] S. Croce, F. Marcelloni, M. Vecchio, Reducing power consumption in wireless sensor networks using a novel approach to data aggregation, Comput. J. 51
(2) (2008) 227–239.
[7] C.C. Cutler, Differential quantization of communication signals, U.S. Patent 2,605,361, July 1952.
[8] K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA–II, IEEE Trans. Evol. Comput. 6 (2) (2002) 182–197.
[9] D.L. Donoho, I.M. Johnstone, Ideal spatial adaptation by wavelet shrinkage, Biometrika 81 (3) (1994) 425–455.
[10] J.J. Durillo, A.J. Nebro, F. Luna, B. Dorronsoro, E. Alba, jMetal: A Java Framework for Developing Multi-objective Optimization Metaheuristics, Tech. Rep.
ITI-2006-10, E.T.S.I. Informática, Campus de Teatinos, 2006.
[11] D. Ganesan, D. Estrin, J. Heidemann, DIMENSIONS: Why do we need a new data handling architecture for sensor networks?, in: Proceedings of the ACM
Workshop on Hot Topics in Networks, 2002.
[12] M. Gastpar, P.L. Dragotti, M. Vetterli, The distributed Karhunen–Loève Transform, IEEE Trans. Inf. Theory 52 (12) (2006) 5177–5196.
[13] V. Goyal, Theoretical foundations of transform coding, IEEE Signal Process. Mag. 18 (5) (2001) 9–21.
[14] R. Gray, D. Neuhoff, Quantization, IEEE Trans. Inf. Theory 44 (6) (1998) 2325–2383.
[15] C. Guestrin, P. Bodik, R. Thibaux, M. Paskin, S. Madden, Distributed regression: an efficient framework for modeling sensor network data, in:
Proceedings of the Third International Symposium on Information Processing in Sensor Networks, (IPSN’04), 2004.
[16] E.B. Hamida, G. Chelius, Strategies for data dissemination to mobile sinks in wireless sensor networks, IEEE Wirel. Commun. Mag. 15 (6) (2008) 31–37.
[17] J. Horn, N. Nafpliotis, D.E. Goldberg, A Niched Pareto genetic algorithm for multiobjective optimization, in: Proceedings of the First IEEE Conference on
Evolutionary Computation, vol. 1, 1994.
[18] D. Huffman, A method for the construction of minimum-redundancy codes, Proceedings of the IRE 40 (9) (1952) 1098–1101.
[19] C. Intanagonwiwat, R. Govindan, D. Estrin, J. Heidemann, F. Silva, Directed diffusion for wireless sensor networking, IEEE/ACM Trans. Netw. 11 (1)
(2003) 2–16.
[20] N.S. Jayant, P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video, Prentice Hall Professional Technical Reference, 1990.
[21] P. Juang, H. Oki, Y. Wang, M. Martonosi, L. Peh, D. Rubenstein, Energy-efficient computing for wildlife tracking: design tradeoffs and early experiences with ZebraNet, in: Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2002.
[22] N. Kimura, S. Latifi, A survey on data compression in wireless sensor networks, in: International Conference on Information Technology: Coding and
Computing (ITCC’05), vol. 2, 2005.
[23] J.D. Knowles, D.W. Corne, Approximating the nondominated front using the Pareto Archived Evolution Strategy, Evolutionary Computation 8 (2) (2000) 149–172.
[24] N. Lane, A. Campbel, The influence of microprocessor instructions on the energy consumption of wireless sensor networks, in: Proceedings of the Third
Workshop on Embedded Networked Sensors, 2006.
[25] H. Luo, Y.-C. Tong, G. Pottie, A two-stage DPCM scheme for wireless sensor networks, in: Proceedings of the IEEE International Conference on Acoustics,
Speech, and Signal Processing (ICASSP’05), vol. 3, 2005.
[26] LZO homepage, 2008. URL <http://www.oberhumer.com/opensource/lzo/>.
[27] R. Mangharam, A. Rowe, R. Rajkumar, R. Suzuki, Voice over sensor networks, in: RTSS’06: Proceedings of the 27th IEEE International Real-Time Systems
Symposium, IEEE Computer Society, Washington, DC, USA, 2006.
[28] F. Marcelloni, M. Vecchio, A simple algorithm for data compression in wireless sensor networks, IEEE Commun. Lett. 12 (6) (2008) 411–413.
[29] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, second ed., Springer-Verlag New York, Inc., New York, NY, USA, 1994.
[30] J. O’Neal, Differential pulse-code modulation (PCM) with entropy coding, IEEE Trans. Inf. Theory 22 (2) (1976) 169–174.
[31] A. Pazarloglou, R. Stoleru, R. Gutierrez-Osuna, High-resolution speech signal reconstruction in wireless sensor networks, in: Proceedings of the Sixth
IEEE Consumer Communications and Networking Conference (CCNC), 2009.
[32] S. Pradhan, J. Kusuma, K. Ramchandran, Distributed compression in a dense microsensor network, IEEE Signal Process. Mag. 19 (2) (2002) 51–60.
[33] D. Rebollo-Monedero, Quantization and Transforms for Distributed Source Coding, Ph.D. Thesis, Stanford University, Palo Alto, CA, December 2007.
[34] C.M. Sadler, M. Martonosi, Data compression algorithms for energy-constrained devices in delay tolerant networks, in: Proceedings of the Fourth
International Conference on Embedded networked sensor systems (SenSys’06), 2006.
[35] D. Salomon, Data Compression: The Complete Reference, fourth ed., Springer-Verlag, London, UK, 2007.
[36] T. Schoellhammer, B. Greenstein, E. Osterweil, M. Wimbrow, D. Estrin, Lightweight temporal compression of microclimate datasets, in: 29th Annual
IEEE International Conference on Local Computer Networks, 2004.
[37] Sensirion homepage, 2008. URL <www.sensirion.com>.
[38] SensorScope deployments homepage, 2008. URL <http://sensorscope.epfl.ch>.
[39] R. Shah, S. Roy, S. Jain, W. Brunette, Data MULEs: modeling a three-tier architecture for sparse sensor networks, in: Proceedings of the First IEEE
International Workshop on Sensor Network Protocols and Applications, 2003.
[40] C.E. Shannon, A mathematical theory of communication, Bell Syst. Tech. J. 27 (1948) 379–423, 623–656.
[41] SimIt-ARM homepage, 2008. URL <http://simit-arm.sourceforge.net/>.
[42] D. Slepian, J. Wolf, Noiseless coding of correlated information sources, IEEE Trans. Inf. Theory 19 (4) (1973) 471–480.
[43] A.A. Somasundara, A. Kansal, D.D. Jea, D. Estrin, M.B. Srivastava, Controllably mobile infrastructure for low energy embedded networks, IEEE Trans.
Mob. Comput. 5 (8) (2006) 958–973.
[44] N. Srinivas, K. Deb, Multiobjective optimization using nondominated sorting in genetic algorithms, Evolutionary Computation 2 (3) (1994) 221–248.
[45] C. Tang, C.S. Raghavendra, Compression techniques for wireless sensor networks, in: Wireless Sensor Networks, Kluwer Academic Publishers., Norwell,
MA, USA, 2004, pp. 207–231.
[46] B. Tao, On optimal entropy-constrained deadzone quantization, IEEE Trans. Circuits Syst. Video Technol. 11 (4) (2001) 560–563.
[47] TinyNode homepage, 2008. URL <http://www.tinynode.com>.
[48] A.C. Viana, A. Ziviani, R. Friedman, Decoupling data dissemination from mobile sink’s trajectory in wireless sensor networks, IEEE Commun. Lett. 13 (3)
(2009) 178–180.
[49] R. Wagner, V. Delouille, R. Baraniuk, Distributed wavelet de-noising for sensor networks, in: Proceedings of the 45th IEEE Conference on Decision and
Control, 2006.
[50] R.S. Wagner, R.G. Baraniuk, S. Du, D.B. Johnson, A. Cohen, An architecture for distributed wavelet analysis and processing in sensor networks, in:
Proceedings of the Fifth International Conference on Information Processing in Sensor Networks (IPSN’06), 2006.
[51] A. Wyner, J. Ziv, The rate-distortion function for source coding with side information at the decoder, IEEE Trans. Inf. Theory 22 (1) (1976) 1–10.
[52] J.-J. Xiao, A. Ribeiro, Z.-Q. Luo, G. Giannakis, Distributed compression-estimation using wireless sensor networks, IEEE Signal Process. Mag. 23 (4)
(2006) 27–41.
[53] Z. Xiong, A. Liveris, S. Cheng, Distributed source coding for sensor networks, IEEE Signal Process. Mag. 21 (5) (2004) 80–94.
[54] J. Zhang, G. Zhou, S.H. Son, J.A. Stankovic, Ears on the ground: an acoustic streaming service in wireless sensor networks, in: Fifth IEEE/ACM
International Conference on Information Processing in Sensor Networks, IEEE/ACM IPSN06, 2006.
[55] Y. Zhang, J. Li, DPCM-based vibration sensor data compression and its effect on structural system identification, J. Earthquake Eng. Eng. Vib. 4 (1) (2005) 153–163.
[56] E. Zitzler, K. Deb, L. Thiele, Comparison of multiobjective evolutionary algorithms: empirical results, Evolutionary Computation 8 (2) (2000) 173–195.
[57] E. Zitzler, M. Laumanns, L. Thiele, SPEA2: improving the strength Pareto evolutionary algorithm for multiobjective optimization, in: K. Giannakoglou
et al. (Eds.), Evolutionary Methods for Design, Optimisation and Control with Application to Industrial Problems (EUROGEN 2001), International Center
for Numerical Methods in Engineering (CIMNE), Barcelona, Spain, 2002, pp. 95–100.
[58] E. Zitzler, L. Thiele, Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach, IEEE Trans. Evol. Comput. 3 (4)
(1999) 257–271.