
Reliability Engineering and System Safety 243 (2024) 109803

Contents lists available at ScienceDirect

Reliability Engineering and System Safety


journal homepage: www.elsevier.com/locate/ress

Approximate dynamic programming for condition-based node deployment in a wireless sensor network✩
Nicholas T. Boardman a,∗, Kelly M. Sullivan b
a Department of Operational Sciences, Air Force Institute of Technology, Wright-Patterson AFB, United States of America
b Department of Industrial Engineering, University of Arkansas, Fayetteville, AR, United States of America

ARTICLE INFO

Keywords: Approximate dynamic programming; Selective maintenance; Network reliability; Wireless sensor networks

ABSTRACT

The flexibility of deployment strategies combined with the low cost of individual sensor nodes allow wireless sensor networks (WSNs) to be integrated into a variety of applications. Network operations degrade over time as sensors consume a finite power supply and begin to fail. In this work we address the selective maintenance of a WSN through a condition-based deployment policy (CBDP) in which sensors are deployed over a series of missions. The main contribution is a Markov decision process (MDP) model to maintain a reliable WSN with respect to region coverage. Due to the resulting high dimensional state and outcome space, we explore approximate dynamic programming (ADP) methodology in the search for high quality CBDPs. Our model is one of the first related to the selective maintenance of a large-scale WSN through the repeated deployment of new sensor nodes with a reliability objective, and one of the first ADP applications for the maintenance of a complex WSN. Additionally, our methodology incorporates a destruction spectrum reliability estimate which has received significant attention with respect to network reliability, but its value in a maintenance setting has not been widely explored. We conclude with a discussion on CBDPs in a range of test instances, and compare the performance to alternative deployment strategies.

1. Introduction

Through the cooperative effort of individual sensor nodes, a wireless sensor network (WSN) can be deployed to monitor and report data on an event of interest in a desired region. In environmental settings WSNs can be valuable to monitor a forest, providing early detection of forest fires, or to monitor a coastline and warn about potential flooding [1]. WSNs have additionally been deployed to observe animals and their behavior in a natural habitat over a period of time with minimal disruption [2]. In commercial applications, WSNs can be utilized to track inventory or for temperature/climate control in buildings and warehouses [3]. Sensors have also been integrated into military and healthcare applications [4], illustrating the flexibility WSNs offer.

While a single sensor can monitor only a relatively small area, sensor nodes are able to communicate with each other to route information through the network. By sufficiently distributing nodes throughout a region of interest, the WSN is able to monitor a much larger region. The deployment of sensor nodes is typically categorized as either deterministic, where nodes are located at specified locations, or random [4]. Random deployment strategies can be attractive due to their ease of constructing a network, and are further supported by the low infrastructure (e.g., wires, cables) required for operation [5]. Each individual sensor node contains the components necessary for sensing and sending/receiving data, as well as an individual battery that is drained over the course of network operation [6]. As an increasing number of sensor nodes fully consume their power supply and lose functionality, overall network coverage and connectivity begin to degrade. Sensor nodes may also fail for other reasons (e.g., malfunction, damage). While not the focus of this work, identifying faulty sensor nodes is an important problem as well to ensure data from the WSN is accurate [7]. In either case, node failures can have a significant impact on network capability and may have ripple effects in the network as the remaining sensor nodes are relied on more heavily, thereby increasing power consumption and the risk of further faults [8].

Methods to delay the impact of sensor node failures and extend network lifetime have received significant attention in the literature. A few areas include sleep/wake cycles [9–11] and power management techniques [12–14]. Battery and/or sensor node replacement policies are examined in [15,16], but replacement is not considered a viable strategy for a large network operating in an environment where it is not practical to

✩ This material is based upon work supported by the National Science Foundation, United States of America under Grant No. CMMI-1751191.
∗ Corresponding author.
E-mail addresses: nicholas.boardman@afit.edu (N.T. Boardman), ksulliv@uark.edu (K.M. Sullivan).

https://doi.org/10.1016/j.ress.2023.109803
Received 15 July 2021; Received in revised form 7 November 2023; Accepted 8 November 2023
Available online 24 November 2023
0951-8320/© 2023 Published by Elsevier Ltd.

access failed nodes individually [17]. In [18,19] WSN coverage and/or connectivity is restored by deploying a minimal number of relay nodes. A similar problem is addressed in [20,21], with an objective of providing a level of redundancy (i.e., $k$-connectivity objective) to ensure the next sensor node failure does not immediately require additional actions to restore the WSN.

The reliability of a WSN is an important metric as well, as it can be used to justify the design, deployment, and operational policies for individual sensor nodes. While initial WSN reliability (i.e., for a WSN constructed at a single point in time) has been considered, research focusing on WSN node redeployment has diverged from research focusing on WSN reliability evaluation. Specifically, existing research related to WSN node deployment and redeployment typically considers a deterministic coverage, connectivity, or lifetime measure (e.g., time to first node failure) instead of an explicit measure of network reliability. An additional limitation of existing deployment models is that they are concerned with the deployment of sensors at a single point in time, and do not address the need to deploy sensors in multiple stages to maintain a WSN over a longer horizon.

In this work we consider the problem of selectively redeploying sensor nodes into a WSN over a series of maintenance actions subject to budget limitations. By redeploying sensor nodes, we aim to maximize a multiple-mission measure of a WSN's reliability of covering an area. The main contributions of this work are as follows:

1. We formulate the first Markov Decision Process (MDP) model to redeploy nodes into a WSN to maximize the reliability of region coverage over time. This model also contributes to the selective maintenance literature by addressing a large, complex network with hundreds of components that cannot be represented using traditional series, parallel, or combinations of simple subsystems.
2. We propose an Approximate Dynamic Programming (ADP) algorithm to solve the MDP approximately. Noting that the reward function of the MDP entails evaluating network reliability, we customize the ADP using a destruction spectrum approach for estimating network reliability in the presence of maintenance actions.
3. We demonstrate the model's value and the solution procedure's efficacy through numerical examples and comparison to simpler node deployment policies.

The remainder of this paper proceeds as follows. Section 2 summarizes the relevant literature in the areas of WSN reliability evaluation and selective maintenance modeling and characterizes this work's relationships to other closely related research. Section 3 formally states the problem and underlying assumptions, formulates an MDP model for the sensor node deployment problem, and prescribes an ADP approach to identify node deployment policies. Section 4 presents numerical results for a range of test instances and compares the ADP policies to alternative deployment strategies. Finally, Section 5 concludes the article, and provides directions for future research.

Key Notation

$\mathcal{R}$    The set of all subregions.
$\mathcal{G}$    The Wireless Sensor Network.
$\alpha$    The coverage requirement.
$C(\mathcal{G}(t))$    The coverage of the network at time $t \ge 0$.
$M$    The number of missions.
$\delta$    The duration of a mission.
$N_{m,i,k}$    The number of functioning sensor nodes with age $k$ in subregion $i$ at the beginning of mission $m$.
$\bar{N}_m$    The total number of sensor nodes functioning in the network at the beginning of mission $m$.
$B_m$    The budget available for the remaining missions.
$\boldsymbol{S}_m$    The state variable, $\boldsymbol{S}_m = (\boldsymbol{N}_m, B_m)$.
$\mathcal{S}$    The set of all possible states.
$x_m$    The number of sensor nodes deployed at the beginning of mission $m$.
$\bar{x}_{mi}$    The total number of sensors deployed in subregion $i$ at the beginning of mission $m$.
$C_m(x_m)$    The cost of action $x_m$.
$R_m(\boldsymbol{S}_m, x_m)$    Network reliability given state $\boldsymbol{S}_m$ and action $x_m$.
$V_m(\boldsymbol{S}_m)$    Value function of state $\boldsymbol{S}_m$, defined as the maximum expected number of successful missions remaining among missions $m, m+1, \ldots, M-1$ if the network is in state $\boldsymbol{S}_m$ at the beginning of mission $m$.
$\boldsymbol{S}^x_m$    The post-decision state variable, the state variable immediately after a deployment action.
$V^x_m(\boldsymbol{S}^x_m)$    The value function of the post-decision state, defined as the maximum number of successful missions remaining among missions $m+1, m+2, \ldots, M-1$ given the post-decision state variable $\boldsymbol{S}^x_m$.
$s^{n}_{\alpha,i}$    The probability that the $i$th sensor node failure results in coverage falling below $\alpha$.
$b(x; n, p)$    The binomial pdf, $b(x; n, p) = \binom{n}{x} p^x (1-p)^{n-x}$.
$B(x; n, p)$    The binomial cdf, $B(x; n, p) = \sum_{i=0}^{x} \binom{n}{i} p^i (1-p)^{n-i}$.

2. Literature review

The following section overviews key literature in three main areas: WSN reliability, selective maintenance, and the destruction spectrum, along with other closely related research. The conclusion of this section highlights how the problem and proposed methodology in this work combine elements from each of these subjects, and addresses the distinguishing elements.

Failures in a WSN are frequently attributed to either link failures [22] or node failures [23–25]. The impact of component failures on reliability is reflected through WSN reliability measures that are commonly defined by traditional reliability definitions such as two-terminal, $k$-terminal, or all-terminal reliability, see [26–28], respectively. Recently, Xiang and Yang [24] introduced a generalized $k$-terminal measure to reflect the characteristic that a WSN can function as long as $k$ arbitrary sensor nodes are connected to the sink node. In addition to battery depletion, WSN reliability in the presence of sensor node malfunctions or software errors has been addressed [29]. WSN reliability considering common cause failures is introduced in [30] where all nodes located in a certain region may be impacted. Performance-based reliability measures, such as the amount of data that can be delivered to a desired sink node, are also mentioned in [31] and more recently in [14], while Wang et al. [32] define reliability in relation to the timeliness of information reaching the sink node. Since WSNs are frequently deployed with a purpose of monitoring a desired region, WSN reliability definitions have also addressed reliability of area coverage [23,29,33].

For complex networks such as WSNs, network reliability evaluation problems are typically #P-Complete [34] and therefore pose a significant computational challenge. An exact approach to WSN reliability, such as a path-set approach in [29] or a derivation of the reliability


polynomial as in [28] is limited to WSNs with only a few sensor nodes. Network reliability is further complicated when there are more than two states (e.g., operating and failed) for sensor nodes; however [35] recently discussed how a minimal path-set approach can still be used to estimate reliability with multi-state nodes. Fault-tree analysis [36] and reliability block diagram [25] techniques have also been utilized, but are not practical for randomly deployed networks with complex sensor node communication paths. For large-scale WSNs where exact methods become intractable, approximation methods such as a Monte Carlo simulation can be utilized [23,30].

Literature addressing the reliability of a WSN has primarily focused on the evaluation of reliability. When addressing a WSN design problem, a handful of works have considered an optimization objective involving reliability. In [28] one such problem is approached through the evaluation and reliability comparison for a small number of fixed network topologies. In [24] reliability is maximized by varying the transmission power based on the relationship between sensor node power consumption and sensor node lifetime. The extension of WSN reliability as an objective beyond initial network deployment, and in particular informing a maintenance policy to sustain operations for a large-scale WSN as new sensor nodes are deployed in the network, has not been addressed.

Our problem of maintaining a WSN over an extended period of time subject to limitations on the available maintenance actions (e.g., a budget) relates closely to the selective maintenance problem. A mathematical formulation of the selective maintenance problem in a series–parallel system is discussed in [37], where models are presented that maximize system reliability subject to constraints on cost and maintenance time available, or minimize cost (time) subject to a constraint on the time (cost) and minimum system reliability requirement. In [38] the model is expanded to consider multiple maintenance actions (e.g., minimally repair failed components, replace failed components, replace functioning components), and models the lifetime of an individual component with a Weibull failure distribution. In both [37,38] the maintenance decision is based on maximizing or minimizing the objective for the next mission (i.e., until the next maintenance action). Since the system is likely maintained over a series of missions, a maintenance policy can be improved by considering the impact of a decision on future missions as well. This problem is first explored in [39] through an MDP model for a small series–parallel system, and later in [40] by applying ADP methodology to solve for a maintenance policy in a system comprised of a larger number of subsystems and components. MDP models are also presented for multi-state components for a $K$-out-of-$N$:G system in [41] and a moderately-sized series–parallel system in [42].

Recent attention on the selective maintenance problem has focused on variations to a number of assumptions common in the previous works. In [43], the authors present a model addressing stochastic imperfect maintenance. In addition to a do-nothing and perfect maintenance action, the decision can be to perform imperfect maintenance but the exact outcome/improvement to the system is uncertain. In much of the selective maintenance literature, the time between maintenance actions is also assumed to be constant. The model in [44] introduces uncertainty in mission duration, resulting in an unknown time until the next maintenance action. Meanwhile, in [45] structural dependencies between components are introduced in which improving system performance might require maintenance to several components in a group instead of a single individual component.

Ahadi and Sullivan [40], Liu et al. [46], and Xu et al. [47] all propose MDP formulations for a selective maintenance problem, applying reinforcement learning or approximate dynamic programming to address the large size of the state and action space that are encountered. Notably, all of these works address a reliability objective, and both Liu et al. [46] and Xu et al. [47] use a fixed plus variable cost model similar to the one we introduce in Section 3. They are also similar in that the test instances consider several components operating together to form smaller subsystems, and ultimately a system that is a combination of series and/or parallel components.

Compared to the selective maintenance problems discussed in [37–40,46,47], WSNs typically lack the well-defined structure of a series–parallel system, which complicates the estimation of network reliability. A survey of selective maintenance problems is provided in [48], with mention of several works that address complex configurations. However, the definition of a complex system in the selective maintenance literature typically refers to a bridge system, a $K$-out-of-$N$:G system, or a system that is comprised of multiple structures (e.g., series–parallel) [48]. In this work a complex network refers to a network that cannot be represented by a combination of series, parallel, or other well-known configurations.

In extending the selective maintenance problem to a WSN, approximation methods such as a Monte Carlo simulation or an estimate for a reliability bound might be considered. However, relying on such an approach that requires repeated implementation to optimize a policy is not computationally tractable.

We address the complexity present in a reliability objective by incorporating the destruction spectrum (D-spectrum) to estimate network reliability [49]. In the presence of independent and identically distributed (i.i.d.) sensor failures the D-spectrum is only a function of the network structure, and does not depend on the failure distribution of sensors in the network [50]. While it is possible to compute the D-spectrum of a network exactly, it is more common to use an approximation method, particularly when applied to a large, complex system. A Monte Carlo estimation of the D-spectrum has been shown to be more efficient compared to a traditional Monte Carlo simulation that directly estimates network reliability [51]. The lower computational effort required in estimating the D-spectrum, algorithms of which are outlined in [52,53], becomes significant when reliability estimation is embedded in an optimization problem and may need to be repeated over a large number of replications. The D-spectrum has received significant attention in network reliability literature, but its application in a maintenance setting is still emerging. The D-spectrum is applied in [54] to develop a preventive maintenance policy for a network subject to external shocks causing node failures with equal probability. The D-spectrum is incorporated in an expected cost estimate dependent upon either a preventive maintenance action if components are repaired prior to network failure, or emergency repair if the network has failed. The resulting policy determines the number of failed components before a preventive maintenance action is necessary to minimize the long-run cost.

Research that is most closely related to this paper and addresses elements from each of the previous topics is found in [52,55]. In [52] a time-based deployment policy (TBDP) for a WSN is explored where the network is restored to a fixed size at periodic time intervals. Sensor nodes are randomly deployed in the network, and the D-spectrum is incorporated to estimate both the cost and WSN reliability over a wide range of deployment policies. Closely related to a TBDP is one in which a fixed number of sensors are deployed in the network at constant time intervals. This now results in a varying network size, but [55] addresses how the D-spectrum remains valuable in estimating WSN reliability. The myopic condition-based deployment policy in [55] deploys new sensors to maximize reliability for a single mission, without considering the impacts on future missions. To the best of our knowledge, [52,55] are the only sensor node redeployment policies in the literature to optimize WSN reliability over time. In this work we discuss how the D-spectrum can be adapted into a model to estimate WSN reliability in the presence of a condition-based deployment policy in which the decision also includes the number of new sensor nodes to deploy in the network, and complications that arise concerning a dynamic network topology, dynamic network size, and a dynamic age composition of sensors.

Our research extends the prior work [52,55] by formulating the node deployment problem as an MDP. Although MDPs have been


applied to a variety of WSN problems [56], the work of [16] appears to be the only previous MDP model focusing on the problem of replacing failed nodes over time. We note that the MDP of [16] makes a significant assumption that all failed nodes equally affect WSN performance, thereby disregarding network topology. By comparison, our work specifically considers network topology within the MDP.

While Liu et al. [46], Xu et al. [47], and Ahadi and Sullivan [40] formulate MDPs for the selective maintenance problem, the focus of this work is different in several key areas. First, the size of the network is on the scale of hundreds of sensor nodes connected in a non-standard series–parallel fashion. Second, new sensor nodes are deployed in the network, which changes the structure of the network over time and must be addressed when estimating network reliability. Finally, the duration of the planning horizon is much longer and requires the selective maintenance policy to address network performance over a larger number of consecutive missions.

3. Problem description and model

In this section we discuss a condition-based node deployment MDP model in which a limited budget is available to deploy additional sensors in the network. The WSN, represented by $\mathcal{G}$, is comprised of a collection of sensor nodes and a sink node deployed throughout a region of interest. Sensor nodes in the network are responsible for communicating with neighboring nodes to route information through the network, with a desired destination at the sink node. In addition to a communication capability, sensor nodes are tasked with monitoring the surrounding area and desired target locations in the region. We assume a unit disk graph model in which a pair of sensor nodes can communicate directly if their distance from each other is no more than $d_1$. Similarly, we assume a functioning sensor node can monitor any target within a distance of $d_2$.

For a target to be covered in the network it must not only be within the monitoring radius of a functioning sensor; there must also be a communication path through a sequence of functioning sensor nodes (that can communicate directly) from the monitoring sensor back to the sink node. The ability of sensors to communicate with one another declines over time as a result of sensor node failures, which also impacts the collection of targets covered. The lifetime of an individual sensor node is modeled by a survival function $\bar{F}(t) = 1 - F(t)$, where $F(t)$ represents the cumulative distribution function (cdf) of sensor node lifetime and is assumed to be identically distributed for all sensors. At time $t \ge 0$, the WSN $\mathcal{G}$ is represented by $\mathcal{G}(t)$ and consists of sensors that remain functioning at time $t$. The proportion of targets covered, or WSN coverage, is denoted $C(\mathcal{G}(t))$ and informs the condition of the network.

Note that WSN coverage is dependent upon the number of targets within range of a functioning sensor node (influenced by $d_2$), and the ability of a sensor node within range of a target to route information to the sink node, communicating through multiple hops if necessary to route information over a longer distance (influenced by $d_1$). The survival function $\bar{F}(t)$ is defined for each individual sensor node, and impacts $C(\mathcal{G}(t))$ as nodes fail over time and network communication/monitoring capabilities degrade. The timing and locations of newly deployed sensor nodes also impact $C(\mathcal{G}(t))$ as the new sensors may reestablish coverage over certain targets and/or restore connectivity with a group of sensor nodes that remain functioning from previous missions but were isolated from the sink node due to sensor node failures along a communication path. The redeployment decision must consider these factors in order to maximize WSN reliability at key points in time, where reliability is defined as $P[C(\mathcal{G}(t)) \ge \alpha]$ for a given coverage requirement $\alpha$.

An example of the WSN evolution over time is illustrated in Fig. 1. In Fig. 1(a) the WSN contains a large number of sensors and covers a significant portion of the region. Over time sensors fail and can dramatically impact network performance, as illustrated in Fig. 1(b). To prevent a further drop in coverage and restore network capability, new sensors are deployed in the network, demonstrated in Fig. 1(c). New sensors can be deployed in the network with an objective to improve the ability of sensors to communicate with one another, in addition to re-establishing coverage in portions of the network that were severely impacted by failures.

The aim of deploying new sensors in the WSN is to enable the region of interest to be monitored over a sequence of missions $\{0, 1, \ldots, M-1\}$. Each mission is of equal duration $\delta$, and mission $m$ corresponds to the duration of time between $m\delta$ and $(m+1)\delta$. Additionally, it is assumed that the initial network is provided (i.e., node positions at time $t = 0$ are known). The first redeployment action therefore corresponds to mission 1 at time $t = \delta$. At the beginning of mission 1, and each subsequent mission, the network is observed and a decision is then made on how many new sensors are deployed in the network. In our discussion throughout we adopt the convention that network observation and the deployment of any new sensors always occur at the beginning of a mission. Since the end of mission $m-1$ corresponds to the beginning of mission $m$, an equivalent statement is that the network is observed at the end of mission $m-1$, the deployment of new sensors occurs, and then mission $m$ starts. For consistency purposes and ease of state variable and decision variable definitions introduced later, we always refer to both actions occurring at the beginning of a mission.

3.1. Deployment actions and template structures

To avoid the computational effort in explicitly modeling the location of each newly deployed node, we instead allow the decision maker to specify a subregion into which each new node is deployed, and assume the new node is deployed randomly and uniformly within that subregion. Accordingly, the region of interest is partitioned into a number of subregions represented by the index set $\mathcal{R} = \{1, 2, \ldots, r\}$. As specified in Definition 1 and Assumption 1, we further restrict the redeployment decision by assuming new nodes are assigned to a given subregion based on the network size and the current number of sensor nodes in a subregion.

Definition 1. For a given network size, the template structure for a WSN specifies how many sensor nodes should be located in each subregion. That is, if there are $n$ sensor nodes in the network, a template structure, $h^n = (h^n(1), h^n(2), \ldots, h^n(r))$, specifies the number of nodes, $h^n(i)$, located in each subregion $i \in \mathcal{R}$.

Definition 1 introduces the idea of a template structure, which might be informed by desired performance goals (e.g., a highly reliable network). For example, sensor nodes located near the sink node contribute significantly in routing information from nodes farther away that cannot directly communicate with the sink node. Therefore, it might be desirable to deploy sensor nodes in a higher density near the sink to provide redundant communication paths, and prevent a single node failure from disconnecting a large portion of the WSN. In addition to influencing the initial deployment of sensors, template structures can be informative in a WSN maintenance policy that deploys new sensor nodes by advising the subregion new sensor nodes should be deployed in.

Assumption 1. When new sensor nodes are deployed in the WSN, the number of new nodes deployed in each subregion is determined by the template structure for the resulting network size. That is, suppose there are currently $n_i$ sensor nodes located in each subregion $i \in \mathcal{R}$. Let $n' = \sum_{i \in \mathcal{R}} n_i$ and suppose we wish to deploy $n''$ new nodes. Let $z_i$ denote the number of nodes deployed to subregion $i \in \mathcal{R}$. We choose $z_i$, $i \in \mathcal{R}$, to minimize $\max\{h^{n'+n''}(i) - n_i - z_i : i \in \mathcal{R}\}$. Note that the resulting number of nodes $n_i + z_i$ in each subregion $i \in \mathcal{R}$ will be equal to $h^{n'+n''}(i)$ unless $n_j > h^{n'+n''}(j)$ for some subregion $j \in \mathcal{R}$, in which case we ensure the resulting number of nodes in each subregion $i \in \mathcal{R}$ is not too much smaller than $h^{n'+n''}(i)$.
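As an illustration of the allocation rule in Assumption 1, the following Python sketch assigns new nodes one at a time to the subregion with the largest remaining shortfall relative to the template. The greedy rule, the function name `allocate_new_nodes`, and the tie-breaking toward the lowest-indexed subregion are assumptions made for this sketch and are not taken from the paper; it is one simple way to minimize the maximum shortfall, not necessarily the authors' procedure.

```python
from typing import List


def allocate_new_nodes(current: List[int], template: List[int], n_new: int) -> List[int]:
    """Return z, the number of new nodes sent to each subregion.

    current  : n_i, nodes currently functioning in subregion i
    template : h(i), template count for the resulting network size
    n_new    : n'', total number of new nodes to deploy

    Hypothetical helper: greedily minimizes max_i (h(i) - n_i - z_i).
    """
    r = len(current)
    z = [0] * r
    for _ in range(n_new):
        # Shortfall of each subregion relative to the template so far.
        shortfall = [template[i] - current[i] - z[i] for i in range(r)]
        # Send the next node to the subregion with the largest shortfall
        # (ties go to the lowest index, an arbitrary choice for this sketch).
        worst = max(range(r), key=lambda i: shortfall[i])
        z[worst] += 1
    return z


if __name__ == "__main__":
    z = allocate_new_nodes(current=[3, 4, 6], template=[6, 9, 5], n_new=7)
    print(z)
```

With current counts (3, 4, 6), template (6, 9, 5), and 7 new nodes, this sketch returns (3, 4, 0), which leaves a maximum shortfall of one node; a different tie-breaking rule could return (2, 5, 0) with the same maximum shortfall.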


Fig. 1. (a) Initial WSN with sink node (⋆) and functioning sensor nodes (∙) ; (b) WSN with failed sensors (◦) ; (c) WSN with newly deployed sensors (∙).
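The coverage notion pictured in Fig. 1 can be sketched in Python as follows: a target counts as covered only if it lies within the monitoring radius of a functioning sensor that has a multi-hop path of functioning sensors back to the sink. This is a minimal sketch under stated assumptions. The function and argument names are hypothetical, the sink is assumed to reach sensors within the same communication radius `d1` that applies between sensors, and the sink is assumed not to monitor targets itself.

```python
import math
from collections import deque
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def coverage(sink: Point, sensors: Sequence[Point], alive: Sequence[bool],
             targets: Sequence[Point], d1: float, d2: float) -> float:
    """Fraction of targets covered under a unit disk graph model."""
    functioning: List[int] = [i for i, ok in enumerate(alive) if ok]

    # Breadth-first search outward from the sink over functioning sensors.
    connected = set()
    queue = deque()
    for i in functioning:
        if math.dist(sink, sensors[i]) <= d1:  # assumed sink range
            connected.add(i)
            queue.append(i)
    while queue:
        u = queue.popleft()
        for v in functioning:
            if v not in connected and math.dist(sensors[u], sensors[v]) <= d1:
                connected.add(v)
                queue.append(v)

    # A target is covered if some sink-connected sensor is within d2 of it.
    covered = sum(
        1 for t in targets
        if any(math.dist(t, sensors[i]) <= d2 for i in connected)
    )
    return covered / len(targets)


if __name__ == "__main__":
    sensors = [(1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
    targets = [(1.0, 0.2), (2.0, 0.3), (5.0, 0.1), (9.0, 9.0)]
    print(coverage((0.0, 0.0), sensors, [True, True, True], targets, d1=1.0, d2=0.5))
```

In the small instance above, the third sensor is functioning but isolated from the sink, so the target next to it is not covered and the coverage is 0.5. Estimating the probability that coverage is at least $\alpha$ by simulation would require repeating a computation of this kind over many sampled failure patterns.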

To illustrate Assumption 1, suppose there are $r = 3$ subregions with $(n_1, n_2, n_3) = (3, 4, 6)$ nodes currently in each subregion. Note that $n' = n_1 + n_2 + n_3 = 13$ and suppose we wish to locate $n'' = 7$ new nodes. Suppose the template structure for size $n = n' + n'' = 20$ is $(h^{20}(1), h^{20}(2), h^{20}(3)) = (6, 9, 5)$. Then the new nodes will be assigned to subregions according to either $(z_1, z_2, z_3) = (2, 5, 0)$ or $(3, 4, 0)$. Note that in this example it is not possible to deploy new nodes in a manner that achieves the template structure for a 20-node network. However, either of these actions ensures that the resulting number of nodes $n_i + z_i$ in each subregion $i \in \{1, 2, 3\}$ is no more than 1 less than $h^{20}(i)$.

Assumption 1 leverages Definition 1 by using the template structure to inform the subregion new sensor nodes should be deployed in. This assumption simplifies the decision problem in that the primary decision must now address how many sensor nodes to deploy, and the resulting network size and template structure will influence the subregion a new node is deployed in.

Collectively, Definition 1 and Assumption 1 support the idea that there is some insight beforehand into how the network should be designed, and the decision should reflect that new nodes are deployed to achieve something close to this design over time. Even when exact sensor node placement is not feasible, introducing a number of subregions and defining a template structure allows new nodes to be deployed with varying density throughout the network, the advantages of which are explored in [57].

It is important to note that the deployment action (specifically the subregion new nodes are deployed in) is determined by the template structure; however, selecting the ‘‘best’’ or ‘‘optimal’’ template structure is not a trivial task. In fact, one of the implications of Assumption 1 is that the decision avoids the complexity present in an optimal network design problem as well, allowing our model to place a larger focus on the impact of WSN maintenance, specifically the timing and the number of nodes to deploy.

We recognize that there are a large number of strategies in defining subregions (both the number and size) as well as approaches to inform

3.2. MDP formulation

When new sensors are deployed in the WSN, a fixed cost $c_F$ is incurred if at least one sensor is deployed in addition to a variable cost $c_V$ for each sensor deployed. The fixed-plus-variable cost model relates to cost models discussed in [16,46], and is also used in a related work investigating time-based redeployment policies [52]. It is assumed that all sensors deployed in the network are homogeneous, in the sense that all sensor capabilities are identical and sensors follow an i.i.d. failure distribution, $F$.

Since new sensors are deployed in the network over a sequence of missions, the collection of sensors is heterogeneous in the sense that sensors have different ages, and therefore different residual life distributions. Let $k$ be the age of a sensor in the network, where sensors are deployed with initial age $k = 0$. The age of a sensor therefore corresponds to how many missions the sensor has survived. Define $\mathcal{K} = \{0, 1, \ldots, K\}$ as the set of all possible ages, where $K$ is some upper bound on the age of a sensor in the network.

The state space consists of two main components, the first of which is the observed distribution of sensors in the network and is defined as

$$\boldsymbol{N}_m = (N_{m,i,k})_{i \in \mathcal{R}, k \in \mathcal{K}} \equiv (N_{m,1,0}, N_{m,1,1}, \ldots, N_{m,1,K}, N_{m,2,0}, \ldots, N_{m,r,K}), \tag{1}$$

where $N_{m,i,k}$ denotes the number of functioning sensors with age $k \in \mathcal{K}$ in subregion $i \in \mathcal{R}$ (immediately prior to the deployment of any new sensors) at the beginning of mission $m$. The total number of functioning sensors in the network is denoted by $\bar{N}_m = \sum_{i \in \mathcal{R}} \sum_{k \in \mathcal{K}} N_{m,i,k}$. The second component of the state space is the budget available to deploy sensors during mission $m$ (and all future missions), denoted $B_m$. Combining these two components, the state of the system at the beginning of mission $m$ is defined by $\boldsymbol{S}_m = (\boldsymbol{N}_m, B_m) \in \mathcal{S}$, where $\mathcal{S}$ is the set of all possible states.

After observing the state of the network, a decision must be made on how many new sensors are deployed. Let $x_m$ denote the number of sensors deployed at the beginning of mission $m$. We define $\bar{x}_{mi}$ as
the template structure. Given the complexity present by addressing the number of sensors deployed in subregion 𝑖 ∈  at the beginning of
network reliability and a series of actions to maintain a WSN over mission 𝑚, and note that 𝑥̄ 𝑚𝑖 is determined based on Assumption 1. The
time, Definition 1 and Assumption 1 are present to avoid introducing resulting cost from implementing action 𝑥𝑚 is denoted 𝐶𝑚 (𝑥𝑚 ), where
additional levels of difficulty in the model and/or decision. However, {
they are designed to be flexible and allow the model to address relax- 𝑐𝐹 + 𝑐𝑉 𝑥𝑚 , if 𝑥𝑚 > 0,
𝐶𝑚 (𝑥𝑚 ) = (2)
ations of these assumptions in future work. In this manner, one of the 0, otherwise.
benefits of introducing subregions is that the model is flexible and able The transition probability functions can now be used to characterize
to address the scenario in which sensors are randomly deployed over how the system evolves from one state to another. First, note that
the entire region (i.e., 𝑟 = 1), as well as scenarios in which a more an individual sensor with age 𝑘 survives the current mission with
controlled deployment (i.e., 𝑟 > 1) is possible [57,58]. The number of probability
subregions can also be influenced by the application of the WSN, thus
the model is not dependent on a specific number of subregions. The 𝐹̄ ((𝑘 + 1)𝛿)
𝑝𝑘 = . (3)
focus of this work is therefore not on the number of subregions or how 𝐹̄ (𝑘𝛿)
to optimally partition the region, but rather allow the model to address Using the survival probability for an individual sensor, the transition
these different scenarios. probability for the number of sensors with age 𝑘 in subregion 𝑖 is


determined by

$$\Pr(N_{m+1,i,k} \mid N_{m,i,k-1}, x_m) = \begin{cases} b(N_{m+1,i,1};\, \bar{x}_{mi},\, p_{k-1}), & \text{if } k = 1 \text{ and } 0 \le N_{m+1,i,k} \le \bar{x}_{mi}, \\ b(N_{m+1,i,k};\, N_{m,i,k-1},\, p_{k-1}), & \text{if } k > 1 \text{ and } 0 \le N_{m+1,i,k} \le N_{m,i,k-1}, \end{cases} \tag{4}$$

where $b(n; x, p)$ is the binomial probability of $n$ successes in $x$ trials with probability of success $p$. The overall transition probability given maintenance action $x_m$ can now be determined by

$$\Pr(\boldsymbol{N}_{m+1} \mid \boldsymbol{N}_m, x_m) = \prod_{i \in \mathcal{R}} \prod_{k \in \mathcal{K}} \Pr(N_{m+1,i,k} \mid N_{m,i,k-1}, x_m). \tag{5}$$

The second component of the state variable is the budget, which transitions based on the corresponding cost of the action implemented,

$$B_{m+1} = B_m - C_m(x_m). \tag{6}$$

The state transition function is defined as $\boldsymbol{S}_{m+1} = S^M(\boldsymbol{S}_m, x_m, \boldsymbol{W}_{m+1})$, where $\boldsymbol{W}_{m+1}$ represents information on sensor failures that occur during mission $m$.

Given a starting budget $B_0$, the objective is to deploy sensors in the network to maximize the expected number of successful missions. For a given coverage requirement $\alpha$, an individual mission is successful if WSN coverage over the duration of the mission remains above this requirement. Network reliability is also defined with respect to $\alpha$, and is defined as the probability the coverage requirement is satisfied over the mission duration. From an observed network state $\boldsymbol{S}_m$ and implementing action $x_m$, the resulting network reliability is denoted $R_m(\boldsymbol{S}_m, x_m) = P[C(m\delta + \delta) \ge \alpha]$, where $m\delta + \delta$ refers to the period of time at the end of mission $m$ prior to the deployment of new sensors at the beginning of the subsequent mission. Let $X^{\pi}_m(\boldsymbol{S}_m)$ be a policy that determines the sensor deployment action (when and how many sensor nodes are deployed) for each state $\boldsymbol{S}_m \in \mathcal{S}$. For a given number of missions $M$, the objective is

$$\max_{\pi \in \Pi} \mathbb{E}^{\pi} \left\{ \sum_{m=0}^{M-1} R_m(\boldsymbol{S}_m, X^{\pi}_m(\boldsymbol{S}_m)) \right\}. \tag{7}$$

The decision in each mission is constrained first by the budget available, $B_m$, to deploy sensors in the network. Additionally, there may be some desired minimum reliability (i.e., probability of mission success), $\phi$, that each mission must achieve. This constraint is intended to prevent the scenario where network reliability is completely sacrificed (i.e., unacceptably low reliability and almost certain network failure) in one mission, while the reliability of a later mission is near one. Finally, there may exist an upper limit on the number of sensors allowed in the network, $n_{max}$, to prevent the region from becoming saturated with sensors at any given time. Overall the set of feasible actions, $\mathcal{X}_{\boldsymbol{S}_m}$, during mission $m$ is therefore defined by

$$\mathcal{X}_{\boldsymbol{S}_m} = \left\{ x_m : C_m(x_m) \le B_m,\; R_m(\boldsymbol{S}_m, x_m) \ge \phi,\; \bar{N}_m + x_m \le n_{max} \right\}. \tag{8}$$

A complicating aspect in determining the set of feasible actions is the reliability requirement an action must satisfy. Because network reliability problems commonly fall in the #P-Complete class of problems, determining the exact set of feasible actions as defined by (8) is not a trivial task. Section 3.3.1 addresses this difficulty by outlining an efficient method to estimate network reliability and instead apply the constraint to the estimated reliability of an action, $\hat{R}_m(\boldsymbol{S}_m, x_m)$. In doing so the set of feasible actions is now approximated as well, and it is possible our approximation includes actions that are not feasible to (8). That is, the estimated reliability of an action may satisfy the constraint and therefore appear in our approximated action set, but the true value might be below the requirement. However, this should only occur for a small number of actions, and the actions are feasible to the original problem with only a cost constraint.

The value function, $V_m(\boldsymbol{S}_m)$, is defined as the maximum number of successful missions remaining among missions $m, m+1, \ldots, M-1$ if the system is in state $\boldsymbol{S}_m$ at the beginning of mission $m$. To determine an optimal policy to (7) we must find a solution to Bellman's equation,

$$V_m(\boldsymbol{S}_m) = \max_{x_m \in \mathcal{X}_{\boldsymbol{S}_m}} \left\{ R_m(\boldsymbol{S}_m, x_m) + \mathbb{E}\left[ V_{m+1}(\boldsymbol{S}_{m+1}) \mid \boldsymbol{S}_m, x_m \right] \right\}. \tag{9}$$

3.3. ADP formulation

The previous section provides an initial MDP model for the condition-based sensor deployment problem over a sequence of $M$ missions. Common to many dynamic programming problems, this model suffers from the curses of dimensionality [59]. The large size of the state space can be illustrated by examining the distribution of sensor nodes in the network. For a network containing $i$ sensor nodes, these nodes can be allocated to different subregions of the network $\binom{r+i-1}{i}$ different ways. Due to node failures and the deployment of new sensor nodes, the total number of sensor nodes in the network also varies between 0 and $n_{max}$. As a result, the size of the state space considering only the distribution of sensor nodes in the network is $\sum_{i=0}^{n_{max}} \binom{r+i-1}{i}$. Note that this does not include any information about the age composition of nodes, which further complicates the size of the state space. The remaining budget is also a factor, and can be bounded between 0 and $B_0$. Assuming integer values of $c_F$ and $c_V$ then the budget for mission $m$ can also assume integer values between 0 and $B_0$, and the size of the state space can be bounded by $B_0 \sum_{i=0}^{n_{max}} \binom{r+i-1}{i}$ for a single mission. The large deployment action space and outcome space (i.e., observing sensor node failures) are additional components that limit exact algorithms to only small problem instances.

For large-scale WSNs of interest, ADP can be applied to the condition-based sensor deployment problem. First, the optimality equations can be reformulated around the post-decision state variable, $\boldsymbol{S}^x_m = (\boldsymbol{N}^x_m, B^x_m)$, which is the state at the beginning of mission $m$ immediately after new sensor nodes have been deployed. In the post-decision state, the number of sensors functioning in each subregion immediately after new nodes have been deployed and the total number of sensor nodes in the network are represented by $\boldsymbol{N}^x_m$ and $\bar{N}^x_m$, respectively. Analogous to Eq. (1), $\boldsymbol{N}^x_m$ is a vector with $rK$ components such that $N^x_{m,i,k}$ refers to the post-decision number of functioning sensor nodes of age $k \in \mathcal{K}$ in subregion $i \in \mathcal{R}$. Collectively, the post-decision state variable $\boldsymbol{S}^x_m = (\boldsymbol{N}^x_m, B^x_m)$ is defined by (10) and (11) below.

$$N^x_{m,i,k} = \begin{cases} \bar{x}_{mi} & \text{if } k = 0 \\ N_{m,i,k} & \text{if } k \in \{1, 2, \ldots, K\} \end{cases} \tag{10}$$

$$B^x_m = B_m - C_m(x_m). \tag{11}$$

Similarly, $B^x_m$ refers to the remaining budget after implementing the deployment action. Let $V^x_m(\boldsymbol{S}^x_m)$ denote the value of being in the post-decision state $\boldsymbol{S}^x_m$, defined as the maximum number of successful missions among missions $m+1, m+2, \ldots, M-1$ given the post-decision state variable $\boldsymbol{S}^x_m$. The relationship between $V^x_m$ and $V_m$ is given by

$$V^x_{m-1}(\boldsymbol{S}^x_{m-1}) = \mathbb{E}\left[ V_m(\boldsymbol{S}_m) \mid \boldsymbol{S}^x_{m-1} \right], \tag{12}$$

where

$$V_m(\boldsymbol{S}_m) = \max_{x_m \in \mathcal{X}_{\boldsymbol{S}_m}} \left\{ R_m(\boldsymbol{S}_m, x_m) + V^x_m(\boldsymbol{S}^x_m) \right\}. \tag{13}$$

Substituting (13) into (12) we obtain the optimality equations around the post-decision state variable

$$V^x_{m-1}(\boldsymbol{S}^x_{m-1}) = \mathbb{E}\left\{ \max_{x_m \in \mathcal{X}_{\boldsymbol{S}_m}} \left( R_m(\boldsymbol{S}_m, x_m) + V^x_m(\boldsymbol{S}^x_m) \right) \,\Big|\, \boldsymbol{S}^x_{m-1} \right\}. \tag{14}$$

One of the advantages of utilizing the post-decision state variable is that the expectation is now outside of the maximization problem.
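As a rough illustration of the dynamics in Eqs. (2)-(6) and the post-decision state in Eqs. (10)-(11), the following Python sketch simulates a single mission. It is only a sketch under stated assumptions: the function names, the nested-list layout `counts[i][k]` (subregion `i`, age `k`), and the Weibull survival function with shape 1.5 and scale 10 are illustrative choices and not part of the model above. Sensors that would exceed the maximum age $K$ are simply dropped, which is also an assumption made here for brevity.

```python
import math
import random


def weibull_sf(t, shape=1.5, scale=10.0):
    """Illustrative survival function F-bar(t); the parameters are assumptions."""
    return math.exp(-((t / scale) ** shape))


def deployment_cost(x, c_fixed, c_var):
    """Eq. (2): fixed-plus-variable cost of deploying x sensors."""
    return c_fixed + c_var * x if x > 0 else 0


def survival_prob(k, delta, sf=weibull_sf):
    """Eq. (3): probability that an age-k sensor survives one more mission."""
    return sf((k + 1) * delta) / sf(k * delta)


def post_decision(counts, budget, x_by_subregion, c_fixed, c_var):
    """Eqs. (10)-(11): place new age-0 sensors and pay the deployment cost."""
    new_counts = [row[:] for row in counts]
    for i, x_i in enumerate(x_by_subregion):
        new_counts[i][0] = x_i
    cost = deployment_cost(sum(x_by_subregion), c_fixed, c_var)
    return new_counts, budget - cost


def sample_next_state(post_counts, delta, rng, sf=weibull_sf):
    """Eq. (4): every age-(k-1) sensor survives independently w.p. p_{k-1}."""
    max_age = len(post_counts[0]) - 1
    next_counts = []
    for row in post_counts:
        new_row = [0] * (max_age + 1)  # no age-0 sensors before the next deployment
        for k in range(1, max_age + 1):
            p = survival_prob(k - 1, delta, sf)
            # Binomial draw written as a sum of Bernoulli trials.
            new_row[k] = sum(rng.random() < p for _ in range(row[k - 1]))
        next_counts.append(new_row)
    return next_counts
```

A call such as `post_decision([[0, 3, 2], [0, 1, 4]], 500, [2, 0], 100, 1)` followed by `sample_next_state(..., delta=4, rng=random.Random(0))` walks through one deployment and one round of failures. The split of `x_m` across subregions is passed in directly because the template rule of Assumption 1 is not reproduced in this sketch.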


The resulting maximization problem in (14) is less complicated than the original formulation in (9), but still requires an evaluation of network reliability. Due to Assumption 1 the network structure with respect to the post-decision state $\boldsymbol{S}^x_m$ will be similar to the template structure given by Definition 1. Based on this observation, the following section describes an approach for approximating $R_m(\boldsymbol{S}_m, x_m)$ based on estimating the D-spectrum of the template structure.

3.3.1. Destruction spectrum reliability estimation

The D-spectrum is an approach to evaluating reliability, $R_m(\boldsymbol{S}_m, x_m)$, which appears both in Eq. (8) and (14). The action of deploying new sensors in the network influences the network structure and the number of sensors functioning in each subregion of the network. From information available in the post-decision state variable we can apply the network D-spectrum to estimate reliability, but must first define a number of state aggregation functions. Let $\mathcal{S}^{(a)}$ be the state space at the $a$th level of aggregation, where the aggregation function $A^a$ maps the original state space $\mathcal{S}$ to $\mathcal{S}^{(a)}$. Define $A^1$ as the function that aggregates over the age composition of sensors in a subregion, resulting in the number of sensors in each subregion, $N^{(1)}_m = (N^{(1)}_{m1}, N^{(1)}_{m2}, \ldots, N^{(1)}_{mr})$. The second aggregation function, $A^2$, aggregates over the subregions in the network, resulting in the number of sensors with a given age, $N^{(2)}_m = (N^{(2)}_{m0}, N^{(2)}_{m1}, \ldots, N^{(2)}_{mK})$.

Applying the first aggregation function to the post-decision state variable, we can determine the number of sensors functioning in each subregion which informs the current WSN structure. Alternatively, as a result of Assumption 1 this closely matches a predefined template structure. This is of significance because we can estimate the D-spectrum for each of our template structures, and in turn WSN reliability from the post-decision state. The D-spectrum estimate with respect to template structure $h_n$ is denoted $\hat{s}^{h_n}_{\alpha,i}$, and is the probability the $i$th sensor failure results in network coverage falling below the requirement $\alpha$ in a network where there are $h_n(j)$ sensor nodes randomly and uniformly located in subregion $j \in \mathcal{R}$. A Monte Carlo simulation is implemented to estimate the D-spectrum, which further illustrates the value of Assumption 1. Instead of requiring a Monte Carlo simulation repeatedly throughout the MDP/ADP model, the D-spectrum is only required for the template structures which can be estimated prior to solving the model, avoiding the need to constantly estimate the D-spectrum within the model itself.

From the second aggregation function we can determine the probability of randomly selecting a sensor with age $k$ in the network by

$$\tilde{\rho}_k = \frac{N^{x,(2)}_{mk}}{\bar{N}^x_m}, \quad k \in \mathbb{Z}_{\ge 0}. \tag{15}$$

With (15), the residual life distribution for a sensor randomly selected in the network is now given by the cdf

$$\tilde{G}(t; \delta) = \sum_{k=0}^{\infty} \frac{F(k\delta + t) - F(k\delta)}{\bar{F}(k\delta)} \, \tilde{\rho}_k, \tag{16a}$$

$$\phantom{\tilde{G}(t; \delta)} = \sum_{k=0}^{\infty} \frac{F(k\delta + t) - F(k\delta)}{\bar{F}(k\delta)} \, \frac{N^{x,(2)}_{mk}}{\bar{N}^x_m}, \tag{16b}$$

and note that this follows a development similar to that in [52], based upon [60]. From the D-spectrum estimate and residual life distribution in (16b), network reliability over the next mission, given the observed state $\boldsymbol{S}_m$ and action $x_m$, can be estimated by

$$\hat{R}_m(\boldsymbol{S}_m, x_m) = \sum_{i=1}^{\bar{N}^x_m} \hat{s}^{h_n}_{\alpha,i} \, B(i-1;\, \bar{N}^x_m,\, \tilde{G}(\delta; \delta)), \tag{17}$$

where $B(i-1; \bar{N}^x_m, \tilde{G}(\delta; \delta))$ is the cumulative binomial probability distribution of no more than $i-1$ successes in $\bar{N}^x_m$ trials with probability of success $\tilde{G}(\delta; \delta)$ [50].

One of the limitations of the proposed approach to reliability estimation is that it uses the stable residual life distribution derived in (16b), which relies on a probability distribution of sensor ages aggregated over the entire network. Since we observe information on the age distribution of sensors within a subregion, it is reasonable to question why this level of detail is not retained and incorporated in our estimation method. That is, the residual life distribution can be subregion dependent and more accurately reflect the state of the network. A disadvantage of this approach is it now requires an application of the multi-dimensional D-spectrum [61] which is more complicated to estimate. To avoid introducing further complexity into the model, we leave an in-depth investigation of this consideration for future work.

3.3.2. Value function approximation

Due to the large state space, we approximate the value function through the use of the previously defined aggregation functions and lookup tables. This is based on the observation that the age composition of sensors in the network and the distribution of sensors contribute greatly to the size of the state space. The former is necessary to estimate the stable residual life distribution while the latter is necessary to estimate the destruction spectrum, both of which are required to estimate reliability of the current mission. It is reasonable to expect that while both of these components will impact future missions as well, the primary factor impacting future missions can be summarized by the number of nodes in the network. Therefore, we aggregate over the age composition and subregion distribution of sensors in approximating the value function. This is defined as the aggregation function $A^3 \equiv \sum_{i \in \mathcal{R}} \sum_{k \in \mathcal{K}} N_{m,i,k}$, which is equivalent to $\bar{N}_m$.

Additionally, the starting budget $B_0$ influences the size of the state space and impacts the ability to deploy new sensors in the network. Assuming the variable cost of deploying additional sensors is relatively small (particularly compared to the total cost), deploying one or two additional sensors has a minor impact on the budget remaining. It is also reasonable to assume that deploying one or two additional sensors yields only a minor increase in the overall value function, particularly when compared to the impact of deploying 15 to 20 additional sensors. As a result we can aggregate the budget into different intervals corresponding to a range of values that result in a similar state value. If the budget is aggregated into intervals of size $d$, there are now $\bar{B}_0 = \lceil B_0 / d \rceil$ different budget states.

The approximate value function for a given post-decision state $\boldsymbol{S}^x_m$ is denoted $\bar{V}_m(\boldsymbol{S}^x_m)$, and with an aggregated state space size of approximately $\bar{B}_0 \times n_{max}$ is significantly smaller than the original state space. We recognize that there are alternative methods to approximate the value function (e.g., using basis functions with a regression model, nonparametric models, etc.), and a look-up table significantly simplifies this step. For the results in Section 4 we will demonstrate that the resulting CBDP performs favorably in comparison to existing node deployment policies [52,55].

3.3.3. Determining an optimal action

The primary question that remains is how the maximization problem in (14) is solved for the optimal value and corresponding action. From the observed state $\boldsymbol{S}_m$, we can first determine an upper bound on the number of sensors that can be deployed by $\tilde{n} = n_{max} - \bar{N}_m$ (assuming the budget does not limit us first). This results in a range of $(0, \tilde{n})$ to search for the number of new sensor nodes to deploy. Since the D-spectrum is independent of the failure distribution of sensors in the network, the reliability for each post-decision state evaluated in this search can be quickly estimated without re-evaluating the D-spectrum. The only step that is required is to update the residual life distribution (16b), after which reliability can be estimated by applying (17). While estimating the D-spectrum is more efficient than a traditional Monte Carlo simulation to estimate reliability, repeatedly estimating the D-spectrum for different network structures becomes computationally burdensome.


With Assumption 1, we can estimate the D-spectrum for template structures over a range of network sizes (e.g., for a network with 300 to $n_{max}$ sensors) once at the very beginning of the problem and store the estimates for use later in the ADP model.

As a further enhancement, we revisit the discussion from Section 3.3.2 in which we noted that a single additional sensor has a minor impact on network reliability and the future number of successful missions. Based on this observation, we can search the range $(0, \tilde{n})$ in intervals of $d$ nodes, i.e., resulting in approximately $\tilde{n}/d$ reliability evaluations instead of the $\tilde{n}$ evaluations that would be required to do a complete search of $(0, \tilde{n})$.

3.3.4. Initializing the value function

A more simplistic policy considers the impact of deploying sensors on only the upcoming mission. This is a version of a myopic policy, and can be informative in our ADP formulation as well. Since a myopic policy is interested in reliability of a single mission, the policy will always deploy sensors until a constraint limits the action. That is, the myopic policy will never skip a deployment opportunity, and will deploy sensors every mission until a constraint is reached (e.g., budget no longer available or maximum network size reached). When considering a myopic policy it is therefore more appropriate to consider, or allocate, a small budget to each mission to ensure there is a budget available to missions near the end of the planning horizon as well. A myopic CBDP is explored in [55], and is of value to our ADP model in two ways. First, as discussed in [55], a myopic CBDP results in a relatively consistent network size. Applying Assumption 1 when a fixed number of sensor nodes are deployed across all missions results in a consistent template structure over time. Through a comparison with an ADP policy we can now highlight the significance of allowing greater control on the number of sensor nodes deployed (and therefore budget allocated to) each mission. Second, the resulting reliability estimate of a myopic CBDP can be of value in the ADP formulation to initialize the value function. In the ADP problem, if there is a budget $B_m$ remaining then one option is to evenly allocate this budget to the remaining $M - 1 - m$ missions. This essentially corresponds to a myopic policy with each mission receiving $\frac{B_m}{M-1-m}$ of the budget. The reliability of the myopic policy can then be used to estimate the number of successful missions in the remaining $M - 1 - m$ missions and initialize the value function.

3.4. Approximate value iteration algorithm

Algorithm 1 outlines an approximate value iteration (AVI) algorithm utilizing a value function approximation based on a lookup table representation on the aggregated state space, adapted from [59]. The AVI algorithm updates our value function approximation over a sequence of iterations $y = 1, 2, \ldots, Y$, which in turn updates the CBDP. $\boldsymbol{S}^y_m$ represents the observed state at the beginning of mission $m$ in iteration $y$, and $\boldsymbol{S}^{x,y}_m$ represents the post-decision state variable given action $x_m$. $\bar{V}^y_{m-1}(\boldsymbol{S}^{x,y}_{m-1})$ represents the value function approximation for the post-decision state variable $\boldsymbol{S}^{x,y}_{m-1}$ during iteration $y$, and is updated based on the step size parameter $\eta_y$. While Algorithm 1 outlines a relatively standard AVI algorithm, we hope to show that the resulting CBDP is a significant improvement over both a myopic condition-based deployment policy and a time-based deployment policy. As this is also one of the first ADP applications for the maintenance of a complex WSN with respect to a reliability evaluation, the performance of the AVI algorithm can identify components of the model to focus more on in future work.

Algorithm 1 AVI for Finite Horizon Problem Using the Post-Decision State
1: function AVI
2:   Initialization: approximation of the value function $\bar{V}^0_m(\boldsymbol{S}^x_m)$ for all post-decision states, and an initial state $\boldsymbol{S}^{x,1}_0$. Set $y = 1$.
3:   For $m = 0, 1, 2, \ldots, M-1$,
4:     Determine $\hat{v}^y_m$ by
         $$\hat{v}^y_m = \max_{x_m \in \mathcal{X}_{\boldsymbol{S}_m}} \left( R_m(\boldsymbol{S}^y_m, x_m) + \bar{V}^{y-1}_m(\boldsymbol{S}^{x,y}_m) \right)$$
       and let $x^y_m$ be the optimal action.
5:     Update $\bar{V}^{y-1}_{m-1}$ using
         $$\bar{V}^y_{m-1}(\boldsymbol{S}^{x,y}_{m-1}) = (1 - \eta_{y-1}) \bar{V}^{y-1}_{m-1}(\boldsymbol{S}^{x,y}_{m-1}) + \eta_{y-1} \hat{v}^y_m.$$
6:     Sample $\boldsymbol{W}^y_{m+1}$ and compute the next state $\boldsymbol{S}^y_{m+1} = S^M(\boldsymbol{S}^y_m, x^y_m, \boldsymbol{W}^y_{m+1})$.
7:   Increment $y$. If $y \le Y$ go to step 3.
8:   Return the value functions $(\bar{V}^Y_m)_{m=0}^{M-1}$.
9: end function

4. Numerical example

In this section we illustrate the performance of the ADP formulation and provide results for a number of test instances. The lifetime of each sensor node is distributed according to a Weibull distribution, which is also selected to model failures in [28,62], with a shape parameter $\beta = 1.5$ and a scale parameter $\lambda = 10$. Sensor capabilities are defined by a common communication radius $d_1 = 0.075$ and a monitoring radius of $d_2 = 0.075$. Values for the sensor node capabilities are selected to provide a notional instance with reasonable parameter values. An increasing failure rate (IFR) distribution (i.e., $\beta > 1$) is selected to reflect that the expected remaining life of a sensor node should decrease as the node consumes limited available energy. In practice, the scale parameter $\lambda$ would depend upon the hardware, application, and environment; however, since an equivalent problem results upon scaling $\lambda$ and $\delta$ by a constant factor the results are easily generalizable to other values of $\lambda$. Values for the communication and monitoring radius are selected to provide a balance between the capability of an individual node, and the number of sensor nodes required for overall network function.

The cost of deploying sensors in the network is determined by the variable cost $c_V = 1$, with a fixed cost $c_F = 100$ incurred each time one or more sensors are deployed. Fixed and variable costs are selected to balance the fixed cost of accessing the network, which may be large when the network is in a hostile environment, against the individual cost of a single sensor node. The region of interest is a $[0, 1] \times [0, 1]$ unit square that is partitioned into $r = 16$ equal sized subregions of size $0.25 \times 0.25$. This partitioning of subregions is selected to provide the model flexibility in focusing the deployment of new nodes either toward the middle of the region or toward the boundaries as needed. Additionally, 441 targets are uniformly spaced as a $21 \times 21$ grid representing target locations where the WSN must provide coverage. The number of targets and their distribution is selected to ensure coverage throughout the entire region is sufficiently captured.

In the results that follow the number of sensor nodes in a given subregion of the WSN, which defines the template structure, is based on a subregion weight and is inversely proportional to (1) the distance from the center of the subregion to the sink node, and (2) the probability that all sensor nodes in a subregion are connected (see [63] for details). Defining template structures in this manner is influenced by two factors. First, if the number of sensor nodes in each subregion is approximately equal, then it is desirable to deploy a new node near the sink and provide a level of redundancy in communication paths with the sink node. Second, if a subregion is farther away from the sink node but has a very small number of functioning sensor nodes then they are likely disconnected from one another and/or cover a small fraction of the subregion. Therefore we also desire to deploy a number of new sensor nodes in this subregion as well, and constructing template


structures in this manner is designed to balance these two competing objectives.

Defining template structures in this manner also accounts for the overall size of the WSN. Smaller sized networks require a more uniform distribution of sensor nodes to balance coverage in exterior regions and sensor nodes located near the sink to support network connectivity. Meanwhile, once a sufficient number of sensor nodes are deployed in the exterior regions a larger sized network will focus more on the subregion surrounding the sink node in an attempt to increase the redundancy in communication paths.

The step size influences the rate at which the value function approximation is updated and the convergence of the AVI algorithm. Since the value functions are initialized with a myopic CBM policy, the initial step size for updating the value function approximation is $\eta_0 = 0.7$, and the step size is updated according to

$$\eta_y = \eta_0 \frac{a}{a + y - 1}, \tag{18}$$

with $a = 20$. This step size rule allows the rate at which $\eta$ drops to zero to be influenced by the parameter $a$, with larger values slowing the rate at which $\eta$ decreases.

For the test instances, the inspection interval $\delta$ varies among $\{2, 3, 4\}$, and the number of missions is selected so that the total time horizon ($M \cdot \delta$) is approximately the same. The coverage requirement is set at $\alpha = 0.8$, meaning if the WSN covers less than 80% of target locations the network is in a 'failed' state. The maximum network size is also fixed at $n_{max} = 950$ sensors for every test instance, with an initial number of $\bar{N}_0 = 650$ sensors deployed in the region. Parameter values for each test instance, to include the starting budget, $B_0$, and reliability requirement, $\phi$, are provided in Table 1. To force exploration in the decision space, each mission there is a 5% chance a random non-optimal deployment action is implemented.

Table 1
Test instances and policy performance.

$\delta$   $M$    $B_0$    $\phi$   $V_0$    MC-PE
4          25     8700     0        24.97    24.95
4          25     8700     0.95     24.97    24.97
4          25     7600     0        23.99    23.85
4          25     7600     0.89     23.66    23.66
4          25     7400     0        23.13    22.69
4          25     7400     0.79     22.97    22.65
3          33     8050     0        31.89    31.71
3          33     8050     0.85     31.88    31.69
3          33     7650     0        29.45    28.14
3          33     7650     0.65     26.27    27.42
2          50     8700     0        49.95    49.89
2          50     8700     0.95     49.96    49.94
2          50     7600     0        48.54    47.54
2          50     7600     0.89     48.05    46.73
2          50     7400     0        47.19    45.55
2          50     7400     0.79     46.33    44.89

Table 1 also provides performance results of Algorithm 1 with $Y = 300$ replications, where column 5 (labeled $V_0$) reports the expected number of successful missions from the resulting ADP policy. The final column in the table, labeled Monte Carlo Policy Evaluation (MC-PE), reports the average number of successful missions observed when the optimal ADP policy is evaluated through a Monte Carlo simulation, assisting a later discussion on a comparison of the expected vs observed policy performance. Starting with $\delta = 4$ and the largest budget $B_0 = 8700$, the WSN is not overly strained and a sufficient number of new sensors can be deployed when needed to maintain the WSN at a high level. The budget is also large enough that enforcing a minimum reliability requirement on every mission has little impact on the performance of the optimal deployment policy. The next pair of test instances reduces the budget by 1100 which corresponds to a smaller number

of the budget. While the budget is more constraining in this instance, the expected number of successful missions of 23.66 (23.99 without a reliability requirement) is still relatively high. The following pair of test instances result in a similar decline in WSN performance, particularly when a reliability requirement is present. Compared to the previous group of test instances the budget has decreased slightly to 7400, while the decline in the expected number of successful missions is comparable to lowering the budget from 8700 to 7600. This pair of test instances also helps illustrate the value in providing a minimum reliability requirement for each mission. When no requirement is imposed and there is no penalty for WSN failure then network reliability for a given mission can be sacrificed to avoid the fixed cost. This allows a larger number of sensors to be deployed over the remaining missions. When the reliability requirement is set to $\phi = 0.79$ this ensures that the probability a single mission is successful is still relatively high and also has little impact on the expected number of successful missions over the planning horizon.

In the next grouping of test instances the inspection interval is lowered to $\delta = 3$, and for the total time horizon to remain approximately the same the planning horizon for the number of missions is increased to 33. The noticeable result from this grouping is again observed in the smallest budget instance with a reliability requirement in place. With a budget of 7650 and a minimum reliability requirement of $\phi = 0.65$, the expected number of successful missions is significantly smaller compared to the case when no requirement is in place. This is again a result of not penalizing WSN failure, and by sacrificing performance to avoid incurring the fixed cost the budget for the remaining missions is large enough to maintain a highly reliable network.

The last grouping of test instances contains the shortest inspection interval with $\delta = 2$ and the largest number of missions with 50, influencing the policy in a number of areas. With a smaller inspection interval the network is observed more frequently, and there is an opportunity to observe a network state that might fail during the next mission that would not be observed under a larger inspection interval. In this scenario, new sensors can be deployed to avoid the potential network failure, and the overall number of successful missions should increase. Alternatively, with a shorter time between inspections it might be more advantageous to avoid deploying sensors in the network if the reliability of the upcoming mission is already at a sufficient level. While this does not improve reliability for the next mission, the fixed cost is avoided and allows a larger number of sensors to be deployed in the network over the remaining missions. For the largest starting budget of 8700 the ADP policy again results in an expected number of successful missions that is near the total number. Even though the smaller inspection interval results in more frequent network observation and more flexibility in when sensors are deployed, the decline in the expected number of successful missions as the starting budget decreases remains noticeable.

4.1. Monte Carlo policy performance

The optimal CBDP identified by the ADP algorithm is also implemented in a Monte Carlo simulation to observe the average number of successful missions the policy achieves, and is reported in the "MC-PE" (Monte Carlo Policy Evaluation) column of Table 1. These results help demonstrate that the deployment policy, when evaluated in a simulated setting, obtains results close to the predicted values. In several of the test instances with a larger inspection interval the performance of the ADP policy matches the expected number of successful missions. The largest difference between the expected and observed number of successful missions occurs for the smallest budget and smallest inspection interval test instance. In this test instance, the observed number of successful missions is slightly smaller than the expected number. Observing the largest deviation in this test instance is somewhat expected since this
of sensors that can be deployed, and a larger emphasis on deploying corresponds to a more difficult scenario. A smaller 𝛿 results in more
sensors effectively to avoid the fixed cost consuming a large portion missions, which implies a larger number of decisions are made. This


Table 2
Observed ADP and myopic policy comparison.

𝛿   𝑀    𝐵0     𝜙      ADP MC-PE   Myopic policy
4   25   8700   0.95   24.97       23.96
4   25   7600   0.89   23.66       21.44
4   25   7600   0.8    23.71       21.36
4   25   7500   0.84   23.06       20.52
4   25   7400   0.79   22.65       19.34
3   33   8050   0.85   31.69       28.31
3   33   7900   0.79   30.57       23.3
3   33   7800   0.75   29.83       21.39
3   33   7650   0.65   27.42       18.98
2   50   8700   0.95   49.96       2.5

Table 3
Percent of budget dedicated to variable cost.

𝛿   𝑀    𝐵0     No reliability requirement   With reliability requirement
4   25   8700   71.26%                       71.41%
4   25   7600   68.42%                       67.25%
4   25   7400   68.52%                       67.38%
3   33   8050   61.00%                       61.02%
3   33   7650   62.51%                       61.24%
2   50   8700   70.75%                       70.61%
2   50   7600   69.53%                       69.71%
2   50   7400   69.94%                       70.21%
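
As a reading aid for Table 2, the following Python sketch recomputes two quantities used when comparing the policies: the number of sensors an even budget split (𝐵0∕𝑀 per mission) can buy once the fixed cost is paid, and the gap between the ADP and myopic results. The rows are copied from Table 2 and the fixed cost of 100 is the value quoted in Section 4.2. The cost of 1 per sensor is an assumption inferred from that section's "174 units ... around 74 sensor nodes" arithmetic, and the function names are illustrative only; this is not the authors' implementation.

```python
# Rows copied from Table 2: (delta, M, B0, phi, ADP MC-PE, myopic policy).
TABLE_2 = [
    (4, 25, 8700, 0.95, 24.97, 23.96),
    (4, 25, 7600, 0.89, 23.66, 21.44),
    (4, 25, 7600, 0.80, 23.71, 21.36),
    (4, 25, 7500, 0.84, 23.06, 20.52),
    (4, 25, 7400, 0.79, 22.65, 19.34),
    (3, 33, 8050, 0.85, 31.69, 28.31),
    (3, 33, 7900, 0.79, 30.57, 23.30),
    (3, 33, 7800, 0.75, 29.83, 21.39),
    (3, 33, 7650, 0.65, 27.42, 18.98),
    (2, 50, 8700, 0.95, 49.96, 2.50),
]

FIXED_COST = 100  # fixed cost per deployment, as quoted in Section 4.2
UNIT_COST = 1     # ASSUMPTION: cost per sensor, inferred from 174 - 100 = 74


def myopic_sensors_per_mission(budget, missions,
                               fixed_cost=FIXED_COST, unit_cost=UNIT_COST):
    """Sensors affordable each mission when the budget is split evenly."""
    per_mission = budget / missions
    return max(0, int((per_mission - fixed_cost) // unit_cost))


def adp_gain(row):
    """Return (extra successful missions under ADP, that gain as a share of M)."""
    _, missions, _, _, adp, myopic = row
    gap = adp - myopic
    return gap, gap / missions


if __name__ == "__main__":
    for row in TABLE_2:
        delta, missions, budget, phi, _, _ = row
        gap, share = adp_gain(row)
        sensors = myopic_sensors_per_mission(budget, missions)
        print(f"delta={delta} B0={budget} phi={phi}: "
              f"{sensors} sensors/mission under an even split, "
              f"ADP gain {gap:.2f} missions ({share:.1%} of M)")
```

For the 𝛿 = 2, 𝐵0 = 8700 row, `myopic_sensors_per_mission(8700, 50)` returns 74, which matches the figure quoted in Section 4.2.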

instance is also more resource constrained since it has the smallest budget. While the observed performance of the ADP policy does begin to deviate as the test instances become more difficult, the overall observed number of successful missions remains relatively high.

The observed MC-PE also provides a more appropriate comparison of the results for an inspection interval of 𝛿 = 4 with those for an inspection interval of 𝛿 = 2. For each test instance, the resulting ADP policy with an inspection interval of 𝛿 = 4 is also a feasible policy for the corresponding 𝛿 = 2 test instance. As a result, the observed number of successful missions in an optimal ADP policy for the 𝛿 = 2 instance should be at least twice that of the corresponding 𝛿 = 4 test instance. However, in a majority of the test instances the observed number of successful missions for the 𝛿 = 2 ADP policy is only approximately double that of the corresponding 𝛿 = 4 ADP policy, and is lower than expected in the 𝐵0 = 7400, 𝜙 = 0.79 test instance. This again highlights the difficulty of the test instance and the impact of reducing the time between network observations. When the network is inspected more frequently a larger number of deployment decisions must be made regarding when and how many sensors are deployed. The comparison in the observed performance of the ADP policy for different inspection intervals further demonstrates the complexity of a policy related to the repeated deployment of sensors in a WSN, and suggests there is an opportunity for future work to focus on improving a policy when the planning horizon increases.

4.2. ADP comparison to myopic policy

In addition to initializing the value function, the myopic deployment policy provides a good comparison to demonstrate the improvement of the ADP policy. For this purpose, the myopic CBDP [55] is also implemented in a Monte Carlo simulation with a budget of 𝐵0∕𝑀 available to deploy sensors per mission. One of the benefits of this comparison is that it provides insight into the ADP policy's ability to identify when it is more advantageous to forgo a deployment action (i.e., a ‘do nothing’ maintenance decision), and instead conserve budget to utilize in future actions. To the best of our knowledge this is the only prior work that focuses on WSN reliability with region-based node redeployment over time. The observed number of successful missions for the myopic policy is provided in Table 2, along with the previous ADP results.

In each of the test instances the ADP policy results in a larger number of successful missions, and the difference is more noticeable with a smaller budget. This result is somewhat expected since the ADP policy is allowed to deploy a variable number of sensors and reallocate the budget as necessary, saving when able and deploying a larger number of sensors when needed. However, the magnitude of this improvement is quite significant, particularly when the budget is more constraining, as clearly seen in the instance with 𝛿 = 3 and a budget of 𝐵0 = 7650. With the small budget available in this instance the myopic policy performs quite poorly and only 19 of the 33 missions are successful, compared to the ADP policy which is able to achieve over 27 successful missions. A similar outcome is observed with an inspection interval of 𝛿 = 4, in which the ADP policy again performs noticeably better than the myopic policy in each instance.

For the smallest value of 𝛿 = 2, a direct comparison with the myopic policy can be misleading, although it remains appropriate. When the total budget is 8700, roughly 8700/50 = 174 units of budget are available for the myopic policy to spend during each time period; however, because the fixed cost is 100, this means (only) around 74 sensor nodes are deployed in the network every 2 time units. In the long run, the network contains, on average, 350 functioning sensor nodes after new sensor nodes have been deployed (or roughly 275 prior to redeployment), but given the small communication and sensing capabilities the network struggles to maintain the 80% coverage requirement. Again, the substantial difference between the ADP and myopic policy in this instance highlights the significance of the ADP policy and its ability to identify that new sensor nodes do not (and should not) need to be deployed at every opportunity. The significant improvement of the ADP policy over a myopic policy illustrates the value of a deployment policy that considers the impact on network performance over a planning horizon, compared to traditional policies that focus on an immediate effect.

4.3. ADP policy investigation

We are also interested in investigating the impact the test instance parameters have on the resulting ADP policy. One observation is that the optimal policy is more likely to skip a deployment opportunity (i.e., deploy zero sensors at the start of a mission) as the starting budget 𝐵0 and/or the inspection interval 𝛿 decrease. For a large starting budget, it may be possible to incur the fixed cost every mission and still deploy a sufficient number of sensors to maintain a highly reliable network. As the budget decreases, the fixed cost of deploying sensors every mission consumes a larger proportion of the overall budget, which results in fewer sensors deployed each mission. Therefore, it becomes more desirable to skip a maintenance opportunity when allowed to avoid the fixed cost, providing a larger budget over the remaining missions and increasing the proportion of the budget consumed by the variable cost, which equates to new sensors in the network. Similarly, as the inspection interval decreases the amount of time the network must function until the next deployment window is also smaller. Compared to a larger inspection interval, it is likely that fewer sensors will fail in a shorter time interval and the network will more often be observed in a state providing the opportunity to skip sensor deployment while ensuring the upcoming mission remains successful with high probability.

The average percent of the budget consumed by the variable cost in each policy is reported in Table 3. For each test instance the column labeled “No Reliability Requirement” implies 𝜙 = 0, while the column “With Reliability Requirement” refers to the non-zero reliability requirement for the corresponding test instance defined in Table 1. When 𝛿 = 4, the significant drop in the starting budget between the first and second test instance impacts both the total number of missions in which sensors are deployed and the number of sensors deployed. However, given the longer time between inspection intervals it is more difficult to


skip a deployment opportunity and maintain a highly reliable network, which is observed by the decrease from 71.26% to 68.42% (71.41% to 67.25% with a reliability requirement) of the overall budget dedicated to variable cost. Meanwhile, the budget allocation appears to be impacted less for the smaller inspection intervals. For example, when 𝛿 = 3 the overall proportion of the budget consumed by the variable cost is approximately the same when the starting budget decreases from 8050 down to 7650. Additionally, for the inspection interval 𝛿 = 2 the decrease in the percent of budget allocated to the variable cost is not as significant compared to the larger interval of 𝛿 = 4. This result is somewhat expected since the network does not have to operate as long until the next deployment decision, and there is more flexibility for the ADP policy to control when sensors are deployed in the network, providing a better balance between the fixed and variable cost.

The discussion at the end of Section 4.1 also highlighted the difficulty encountered in the 𝛿 = 2, 𝐵0 = 7400, 𝜙 = 0.79 test instance. Compared to the corresponding test instance with 𝛿 = 4, a larger proportion of the overall budget is allocated to the variable cost under the smaller inspection interval of 𝛿 = 2. This suggests that, as expected, the ADP policy in the 𝛿 = 2 instance is skipping a deployment opportunity more often but, based on the observed policy performance compared to the 𝛿 = 4 policy, is struggling to do so in the most effective manner. The ADP policy can therefore potentially be improved by focusing more on the timing of when a deployment opportunity is skipped.

It is also interesting to note that for the smaller starting budgets and 𝛿 = 3 or 𝛿 = 4, the variable cost consumes a larger proportion of the budget when there is no reliability requirement present. The reason for this is that the ADP policy is actually more likely to skip a deployment opportunity when there is no minimum reliability to maintain. With no penalty for network coverage falling below the requirement and no minimum reliability the network must maintain, the ADP policy is freely able to sacrifice network performance. By avoiding the deployment costs for the current mission, there is a larger budget for the remaining missions, which likely contributes to an increase in the number of sensors deployed. When there is a minimum reliability requirement the policy must be more strategic in when a deployment opportunity is skipped to ensure reliability of every mission is sufficiently high. As a result, the opportunity to skip a deployment window likely arises by deploying a larger number of sensors at the beginning of a previous mission, and/or a favorable network observation in which only a small number of sensors failed during the prior mission. This contrasts with an instance with no reliability requirement, where an increase in the overall number of successful missions can be achieved by accepting low network performance over one or more missions.

4.4. Single region comparison

Finally, we explore the influence that specifying the subregion a sensor is deployed in has on the overall number of successful missions. A simpler strategy to implement might involve randomly deploying a sensor over the entire region of interest, which is one of the more common assumptions when deploying a WSN [57,64]. The previous model formulation can easily address a single region by setting 𝑟 = 1. It is interesting to note that since we previously defined a network structure by assigning weights to every subregion which determined how new sensors were deployed, a decision in the multiple subregion model is no more complex than the single subregion case. The only difference is that now sensors are randomly deployed over the entire region, whereas we previously used a rule-set to determine how sensors were allocated to each subregion.

Table 4 contains the expected number of successful missions from the optimal ADP policy when sensors are randomly deployed over the entire region. The final two columns of Table 4, under the ‘Subregion’ label, contain the results from the corresponding test instance with multiple subregions originally reported in Table 1. As expected, removing the ability to specify the subregion a sensor is deployed in lowers the expected number of successful missions compared to the original performance with multiple subregions. Even if the state variable definition remains the same (i.e., we are still able to observe the number and ages of sensors in various subregions in the network), there is now no guarantee that deploying new sensors based on observing a small number of sensors in one or more subregions at the beginning of a mission will improve the performance in the degraded areas of the WSN.

Table 4
Single region policy performance.

                       Single region      Subregion
𝛿   𝑀    𝐵0     𝜙      𝑉0      MC-PE      𝑉0      MC-PE
4   25   8700   0.95   24.91   24.89      24.97   24.97
4   25   7600   0.89   22.59   22.40      23.66   23.66
4   25   7400   0.79   20.79   21.12      22.97   22.65
3   33   8050   0.85   30.55   30.52      31.88   31.69
3   33   7650   0.65   24.53   25.35      26.27   27.42
2   50   8700   0.95   49.88   49.84      49.96   49.94
2   50   7600   0.89   45.73   44.03      48.05   46.73
2   50   7400   0.79   42.67   43.39      46.33   44.89

The decrease in expected number of successful missions resulting from randomly deploying sensors over the entire region compared to a smaller defined subregion is more noticeable for the smaller starting budgets. This can partially be attributed to the impact that influencing network topology has on the probability of mission success in a smaller sized network compared to the impact in a larger network. In terms of the budget available, a decrease to the budget results in a decrease in the total number of sensors that are deployed over the planning horizon, and as a result the overall size of the WSN is generally smaller as well. For smaller sized networks it is less likely that randomly deploying sensors over the entire region of interest will result in sensors sufficiently distributed throughout the region for coverage purposes, and within the communication radius of nearby sensors necessary to route information to the sink node. While randomly deploying a sensor within a smaller subregion does not entirely remove this problem, it does provide the ability to avoid the situation in which one portion of the WSN is overly dense with sensor nodes whereas another portion of the network is uncovered and individual sensors are isolated. Therefore, there is a larger benefit (e.g., improvement in probability of mission success) in a smaller network when the subregion a sensor is deployed in can be specified compared to the benefit present in a larger sized network. This is observed in several of the test instances, for example with 𝛿 = 4 and 𝐵0 = 7400 where the single region ADP policy achieves an expected number of successful missions of 20.79, while the previous results with 16 subregions achieve an optimal ADP policy with an expected 22.97 successful missions. Additionally, even if there is only a minor improvement for a single mission the cumulative impact over the entire planning horizon can be more substantial.

Exploring the performance in a single region model helps further illustrate the significance of the ADP policy and of considering the impact of an action on future missions as well. Notice that the observed performance of the single region ADP policy, reported in the ‘MC-PE’ column of Table 4, is still able to outperform the myopic condition-based policy. This highlights the advantage of deciding if and how many sensors are deployed each mission, allowing an appropriate allocation of the budget to each mission as necessary. Even if new sensors are randomly deployed over the entire region of interest, rather than more controlled through a subregion deployment policy, the decision on when and how many sensors are deployed has a significant impact on WSN performance over an extended period of time.

A single region scenario also enables a more straightforward comparison with the TBDPs considered in [52], where sensors are deployed in order to restore the network to a fixed network size at periodic time intervals. Instead of a direct comparison with a TBDP, we can


first note that there exists a close relationship between a TBDP and a corresponding myopic CBDP. In [52] an expression for the cost rate of an associated TBDP is derived based on the expected number of sensors that fail during a mission. The expected number of failed sensors informs the average cost of deploying sensors to reach a fixed network size, which can now be treated as a fixed budget available in a myopic CBDP. A TBDP differs from the myopic CBDPs in Section 4.2 in that sensors are randomly deployed over the entire region rather than a specified subregion. Since the myopic CBDP provides more control over how sensors are deployed, the performance of a myopic CBDP is at least as good as the related TBDP. With this similarity, and the previous discussion on the improvement of a single region ADP policy over a myopic CBDP, the ADP policy also improves upon a simpler time-based policy.

5. Conclusion

The coverage and communication capability of a WSN is made possible through the cooperative effort of a large number of sensor nodes. The flexibility with which WSNs can be established, randomly deploying sensors over a target region when exact placement is not feasible, enables their incorporation into a wide range of applications. It is important to consider not only the initial capability provided by a WSN, but also performance over a period of time and the impact of eventual sensor failures. As the number of failed sensors increases the decline in network capability becomes more significant and appropriate actions must be taken to restore WSN coverage and communication abilities. A large focus of research related to this problem has been on deploying a small number of new sensor nodes in the network at a single point in time. The selective maintenance problem for a WSN over a prolonged period of time in which sensors are repeatedly deployed in the network has received less attention.

In this work we have contributed an MDP model for the condition-based sensor deployment problem in which new sensors are deployed in the network over an extended period of time. While MDP models have been applied to a wide range of WSN related problems, our model is one of the few addressing maintenance through the repeated deployment of new sensor nodes, and one of the first ADP applications for the maintenance of a complex WSN. Whereas previous sensor deployment models have primarily been interested in extending a network lifetime metric, our work also addresses the complexity encountered by incorporating a reliability objective. A few of the difficulties that must be addressed in this problem include a variation in the age composition of sensors as well as a dynamic network topology as sensors fail and new sensors are deployed in the network. Our methodology has addressed both of these issues by the incorporation of the network D-spectrum. The D-spectrum has been widely researched in network reliability problems, but only a handful of works discuss the D-spectrum in a maintenance optimization model as well [52,54,55]. Finally, we discussed an ADP solution approach using value function approximations to determine optimal CBDPs, and presented results on a range of test instances.

The model also provides several directions for future work, focusing both on the modeling assumptions and ADP methodology discussed in Section 3. The reliability of a WSN is currently defined based on a given coverage requirement. The objective is to maximize reliability, but there is otherwise no detriment to not satisfying the coverage requirement over a mission. One possibility is to include a penalty based on the probability of network failure, which could also reflect the need for immediate maintenance to provide a functioning WSN at all times. With respect to sensor failures, the model classifies sensors into an operating or failed state. Similar to the development of selective maintenance models for series–parallel systems, future work might allow multiple sensor states in which a sensor is partially degraded but still able to contribute towards WSN functions.

The current model also assumes the WSN is observed every 𝛿 time units and does not explicitly incorporate any cost associated with observation. A more complex decision might include whether the WSN is inspected/observed or not, where there is a cost associated with observing the network. Similarly, network observation may be imperfect or there might be a time delay between our observation and deployment action. These directions begin to incorporate uncertainty in the true state of the network at the time sensors are deployed and might be better modeled as a partially observable MDP.

Our value function approximation was based on a combination of aggregation functions and lookup tables. Future work might consider the use of several basis functions and building a parametric model to approximate the value function. In this approach the previously defined aggregation functions may still be of use, but exploration is needed to define additional basis functions and an appropriate model representation (e.g., linear, nonlinear, etc.). A parametric model approximation of the value function is also of interest because it may provide additional opportunities to solve the optimality equation each stage, allowing the optimal action to be determined more efficiently. Another direction for future work is to implement alternative solution methodologies, such as a Deep Q-Learning algorithm, to help address the large state and action space, which would help provide another point of comparison along with the myopic and time-based deployment policies.

CRediT authorship contribution statement

Nicholas T. Boardman: Writing – original draft, Software, Methodology, Investigation, Formal analysis, Conceptualization. Kelly M. Sullivan: Writing – review & editing, Supervision, Methodology.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Disclaimer

The views expressed in this article are those of the authors and do not reflect the official policy or position of the United States Air Force, Department of Defense, or the U.S. Government.

Data availability

No data was used for the research described in the article.

References

[1] Dietrich I, Dressler F. On the lifetime of wireless sensor networks. ACM Trans Sensor Netw 2009;5(1):5:1–39.
[2] Arampatzis T, Lygeros J, Manesis S. A survey of applications of wireless sensors and wireless sensor networks. In: Proceedings of the 2005 IEEE international symposium on, Mediterrean conference on control and automation intelligent control, 2005. IEEE; 2005, p. 719–24.
[3] Akyildiz I, Su W, Sankarasubramaniam Y, Cayirci E. Wireless sensor networks: a survey. Comput Netw 2002;38(4):393–422.
[4] Yick J, Mukherjee B, Ghosal D. Wireless sensor network survey. Comput Netw 2008;52(12):2292–330.
[5] Tiwari A, Ballal P, Lewis FL. Energy-efficient wireless sensor network design and implementation for condition-based maintenance. ACM Trans Sensor Netw 2007;3(1):1–es.
[6] Thai MT, Wang F, Du DH, Jia X. Coverage problems in wireless sensor networks: designs and analysis. Int J Sens Netw 2008;3(3):191.
[7] Saeed U, Jan SU, Lee Y-D, Koo I. Fault diagnosis based on extremely randomized trees in wireless sensor networks. Reliab Eng Syst Saf 2021;205:107284.
[8] Fu X, Yang Y. Modeling and analysis of cascading node-link failures in multi-sink wireless sensor networks. Reliab Eng Syst Saf 2020;197:106815.
[9] Zhang H, Hou JC. Maintaining sensing coverage and connectivity in large sensor networks. Ad Hoc Sens Wireless Netw 2005;1:89–124.
[10] Wang X, Xing G, Zhang Y, Lu C, Pless R, Gill C. Integrated coverage and connectivity configuration in wireless sensor networks. In: Proceedings of the 1st international conference on embedded networked sensor systems. New York, NY: ACM; 2003, p. 28–39.


[11] Frye L, Cheng L, Du S, Bigrigg MW. Topology maintenance of wireless sensor networks in node failure-prone environments. In: 2006 IEEE international conference on networking, sensing and control. IEEE; 2006, p. 886–91.
[12] Li N, Hou JC. FLSS: a fault-tolerant topology control algorithm for wireless networks. In: Proceedings of the 10th annual international conference on mobile computing and networking. ACM; 2004, p. 275–86.
[13] Sen A, Shen BH, Zhou L, Hao B. Fault-tolerance in sensor networks: A new evaluation metric. In: INFOCOM 2006: 25th IEEE international conference on computer communications. 2006, 4146923.
[14] Zhang C, Yang J, Wang N. Timely reliability modeling and evaluation of wireless sensor networks with adaptive N-policy sleep scheduling. Reliab Eng Syst Saf 2023;235:109270.
[15] Parikh S, Vokkarane VM, Xing L, Kasilingam D. Node-replacement policies to maintain threshold-coverage in wireless sensor networks. In: 2007 16th international conference on computer communications and networks. IEEE; 2007, p. 760–5.
[16] Misra S, Mohan SR, Choudhuri R. A probabilistic approach to minimize the conjunctive costs of node replacement and performance loss in the management of wireless sensor networks. IEEE Trans Netw Serv Manag 2010;7(2):107–17.
[17] Jain A, Reddy B. Node centrality in wireless sensor networks: Importance, applications and advances. In: 2013 3rd IEEE international advance computing conference. IEEE; 2013, p. 127–31.
[18] Cheng X, Du D-Z, Wang L, Xu B. Relay sensor placement in wireless sensor networks. Wirel Netw 2008;14(3):347–55.
[19] Lloyd EL, Xue G. Relay node placement in wireless sensor networks. IEEE Trans Comput 2006;56(1):134–8.
[20] Bredin JL, Demaine ED, Hajiaghayi M, Rus D. Deploying sensor networks with guaranteed capacity and fault tolerance. In: Proceedings of the 6th ACM international symposium on mobile ad hoc networking and computing. 2005, p. 309–19.
[21] Almasaeid HM, Kamal AE. On the minimum k-connectivity repair in wireless sensor networks. In: 2009 IEEE international conference on communications. IEEE; 2009, p. 1–5.
[22] Egeland G, Engelstad PE. The availability and reliability of wireless multi-hop networks with stochastic link failures. IEEE J Sel Areas Commun 2009;27(7):1132–46.
[23] Chakraborty S, Goyal NK, Mahapatra S, Soh S. A Monte-Carlo Markov chain approach for coverage-area reliability of mobile wireless sensor networks with multistate nodes. Reliab Eng Syst Saf 2020;193:106662.
[24] Xiang S, Yang J. K-terminal reliability of ad hoc networks considering the impacts of node failures and interference. IEEE Trans Reliab 2019;69(2):725–39.
[25] Distefano S. Evaluating reliability of WSN with sleep/wake-up interfering nodes. Internat J Systems Sci 2013;44(10):1793–806.
[26] AboElFotoh HM, Colbourn CJ. Computing 2-terminal reliability for radio-broadcast networks. IEEE Trans Reliab 1989;38(5):538–55.
[27] Pullen KW. A random network model of message transmission. Networks 1986;16(4):397–409.
[28] Park J-H. Time-dependent reliability of wireless networks with dependent failures. Reliab Eng Syst Saf 2017;165:47–61.
[29] Deif D, Gadallah Y. A comprehensive wireless sensor network reliability metric for critical internet of things applications. EURASIP J Wireless Commun Networking 2017;2017(1):145.
[30] Chowdhury C, Aslam N, Ahmed G, Chattapadhyay S, Neogy S, Zhang L. Novel algorithms for reliability evaluation of remotely deployed wireless sensor networks. Wirel Pers Commun 2018;98(1):1331–60.
[31] AboElFotoh HM, ElMallah ES, Hassanein HS. On the reliability of wireless sensor networks. In: 2006 IEEE international conference on communications, Vol. 8. IEEE; 2006, p. 3455–60.
[32] Wang N, Xiao Y, Tian T, Yang J. The optimal 5G base station location of the wireless sensor network considering timely reliability. Reliab Eng Syst Saf 2023;236:109310.
[37] Cassady CR, Pohl EA, Murdock WP. Selective maintenance modeling for industrial systems. J Qual Maint Eng 2001;7(2):104–17.
[38] Cassady CR, Murdock Jr WP, Pohl EA. Selective maintenance for support equipment involving multiple maintenance actions. European J Oper Res 2001;129(2):252–8.
[39] Maillart LM, Cassady CR, Rainwater C, Schneider K. Selective maintenance decision-making over extended planning horizons. IEEE Trans Reliab 2009;58(3):462–9.
[40] Ahadi K, Sullivan KM. Approximate dynamic programming for selective maintenance in series–parallel systems. IEEE Trans Reliab 2019;69(3):1147–64.
[41] Xu J, Liang Z, Li Y-F, Wang K. Generalized condition-based maintenance optimization for multi-component systems considering stochastic dependency and imperfect maintenance. Reliab Eng Syst Saf 2021;211:107592.
[42] Zhou Y, Lin TR, Sun Y, Ma L. Maintenance optimisation of a parallel-series system with stochastic and economic dependence under limited maintenance capacity. Reliab Eng Syst Saf 2016;155:137–46.
[43] Shahraki AF, Yadav OP, Vogiatzis C. Selective maintenance optimization for multi-state systems considering stochastically dependent components and stochastic imperfect maintenance actions. Reliab Eng Syst Saf 2020;196:106738.
[44] Jiang T, Liu Y. Selective maintenance strategy for systems executing multiple consecutive missions with uncertainty. Reliab Eng Syst Saf 2020;193:106632.
[45] Dao CD, Zuo MJ. Selective maintenance of multi-state systems with structural dependence. Reliab Eng Syst Saf 2017;159:184–95.
[46] Liu Y, Chen Y, Jiang T. Dynamic selective maintenance optimization for multi-state systems over a finite horizon: A deep reinforcement learning approach. European J Oper Res 2020;283(1):166–81.
[47] Xu Y, Pi D, Wu Z, Chen J, Zio E. Hybrid discrete differential evolution and deep Q-network for multimission selective maintenance. IEEE Trans Reliab 2021;71(4):1501–12.
[48] Cao W, Jia X, Hu Q, Zhao J, Wu Y. A literature review on selective maintenance for multi-unit systems. Qual Reliab Eng Int 2018;34(5):824–45.
[49] Samaniego FJ. On closure of the IFR class under formation of coherent systems. IEEE Trans Reliab 1985;R-34(1):69–72.
[50] Navarro J, Samaniego FJ, Balakrishnan N, Bhattacharya D. On the application and extension of system signatures in engineering reliability. Nav Res Logist 2008;55(4):313–27.
[51] Shpungin Y. Networks with unreliable nodes and edges: Monte Carlo lifetime estimation. Appl Math Comput Sci 2007;27:168–73.
[52] Boardman NT, Sullivan KM. Time-based node deployment policies for reliable wireless sensor networks. IEEE Trans Reliab 2021;1–14.
[53] Gertsbakh IB, Shpungin Y. Models of network reliability: analysis, combinatorics, and Monte Carlo. Boca Raton, FL: CRC Press; 2016.
[54] Finkelstein M, Gertsbakh I. Time-free preventive maintenance of systems with structures described by signatures. Appl Stoch Models Bus Ind 2015;31(6):836–45.
[55] Boardman NT, Sullivan KM. Condition-based node deployment policies for reliable wireless sensor networks. Tech. rep., University of Arkansas; 2021.
[56] Alsheikh MA, Hoang DT, Niyato D, Tan H-P, Lin S. Markov decision processes with applications in wireless sensor networks: A survey. IEEE Commun Surv Tutor 2015;17(3):1239–67.
[57] Senouci MR, Mellouk A, Aissani A. Random deployment of wireless sensor networks: a survey and approach. Int J Ad Hoc Ubiquitous Comput 2014;15(1–3):133–46.
[58] Younis M, Akkaya K. Strategies and techniques for node placement in wireless sensor networks: A survey. Ad Hoc Netw 2008;6(4):621–55.
[59] Powell WB. Approximate dynamic programming: solving the curses of dimensionality, Vol. 703. John Wiley & Sons; 2007.
[60] Finkelstein M, Vaupel J. On random age and remaining lifetime for populations of items. Appl Stoch Models Bus Ind 2015;31(5):681–9.
[61] Navarro J, Samaniego FJ, Balakrishnan N. Signature-based representations for the reliability of systems with heterogeneous components. J Appl Probab
[33] Liu Q. Coverage reliability evaluation of wireless sensor network considering 2011;48(3):856–67.
common cause failures based on D–S evidence theory. IEEE Trans Reliab [62] Bistouni F, Jahanshahi M. Evaluating failure rate of fault-tolerant multistage
2020;70(1):331–45. interconnection networks using Weibull life distribution. Reliab Eng Syst Saf
[34] Provan JS, Ball MO. The complexity of counting cuts and of computing the 2015;144:128–46.
probability that a graph is connected. SIAM J Comput 1983;12(4):777–88. [63] Bettstetter C. On the minimum node degree and connectivity of a wireless
[35] Chakraborty S, Goyal NK, Mahapatra S, Soh S. Minimal path-based reliability multihop network. In: Proceedings of the 3rd ACM international symposium on
model for wireless sensor networks with multistate nodes. IEEE Trans Reliab mobile Ad Hoc networking & computing. 2002, p. 80–91.
2019;69(1):382–400. [64] Ishizuka M, Aida M. Performance study of node placement in sensor networks.
[36] Silva I, Guedes LA, Portugal P, Vasques F. Reliability and availability In: 24th international conference on distributed computing systems workshops,
evaluation of wireless sensor networks for industrial applications. Sensors 2004. Proceedings. IEEE; 2004, p. 598–603.
2012;12(1):806–38.
