Escape Sensing Games:
Detection-vs-Evasion in Security Applications
Abstract
Traditional game-theoretic research for security applications primarily focuses on the allocation of external protection resources to defend targets. This work puts forward the study of a new class of games centered around strategically arranging targets to protect them against a constrained adversary, with motivations from varied domains such as peacekeeping resource transit and cybersecurity. Specifically, we introduce Escape Sensing Games (ESGs). In ESGs, a blue player manages the order in which targets pass through a channel, while her opponent tries to capture the targets using a set of sensors that need some time to recharge after each activation. We present a thorough computational study of ESGs. Among other results, we show that it is NP-hard to compute best responses and equilibria. Nevertheless, we propose a variety of effective (heuristic) algorithms whose quality we demonstrate in extensive computational experiments.
1 Introduction
The past decade has witnessed an influential line of research in AI, particularly multiagent systems (MAS), that employs computational game theory to tackle critical challenges in security and public safety applications, ranging from protecting national ports [21] to combating smuggling [7] and illegal poaching [14] to defending our cyber systems [27]. At the core of almost all of these problems is to optimize the allocation of (often limited) external forces to protect critical targets. In this work, we adopt a similar computational game theory approach but address a fundamentally different type of security challenge that looks to improve security via optimizing the arrangement of targets in the face of adversaries. This research contributes a novel perspective to game-theoretic security strategies, emphasizing target arrangement as a defense mechanism against adversaries.
Specifically, we introduce and study Escape Sensing Games (ESGs). In these games, a blue player aims to securely navigate a set of targets through a channel, whereas her opponent, the red player, controls a set of sensors along the channel and tries to sense (and therefore “steal”) as many targets as possible. This model captures strategic interactions arising in various domains. One example is the transportation of peacekeeping resources using a convoy of ships or cars over a fixed route with malicious actors (e.g., pirates or hostile forces) trying to intercept them [25] (see Section 2 for details). In cybersecurity, the blue player could model a network administrator routing sensitive data packets through a network with an attacker trying to intercept them. Our model captures strategic interactions in these settings arising when security measures are either unavailable or have already been allocated and the blue player is only left with scheduling the targets to avoid detection by the attacker.
We study the optimal sequential play in ESGs, where the blue player first commits to an ordering of targets followed by the red player devising an optimal sensing plan. Herein, sensors’ capabilities are limited in two ways. First, each sensor is only capable of sensing certain targets, modeling that detection and interception technologies are not uniformly effective across different targets, due to differing characteristics such as size, speed, or defense mechanisms. Second, sensors need a certain time to recharge after sensing a target, modeling limits inherent in detection and interception systems, where permanent action is not feasible.
There are certain challenges integral to our model that make the computation of equilibria highly non-trivial. First, the action space of both players has an exponential size, rendering standard solution approaches such as support enumeration computationally infeasible. Second, even after the strategies of both players have been fixed, the game evolves in a complex, sequential fashion with targets moving one after another through the channel. Third, and connected to this, the red player’s best response problem of coming up with an optimal sensing plan given a target ordering is already NP-hard. Consequently, this paper also contributes to the algorithmic research on computationally challenging games, a fairly unexplored topic outside of combinatorial game theory [10, 20, 12].
1.1 Our Contribution
We contribute a new perspective to the rich literature on computational game theory for security applications through our study of the previously overlooked problem of target arrangement. Specifically, we introduce and analyze Escape Sensing Games with a focus on the target-controlling blue player. We demonstrate that solving this game is highly complex, as we prove that it is NP-hard for both players to compute their optimal strategies. To nevertheless be able to solve ESGs in practice, we devise algorithms for computing the red player’s strategy, which turn out to scale well in our experiments. Computing the blue player’s strategy and thereby the game’s Stackelberg equilibrium turns out to be a much more intricate task. Our experiments show that our formulation of the problem as a bilevel program is only capable of solving small instances of the game exactly. Motivated by this, we present a heuristic that effectively combines simulated annealing with a greedy heuristic and an Integer Linear Program (ILP) for computing the red player’s strategy. We demonstrate the quality of our heuristic through extensive experiments.
We further this investigation in Section 6 by studying a different variant where sensors are decentralized, so that each sensor acts independently according to a simple greedy strategy. We show that it remains NP-hard for the blue player to compute its optimal strategy. While in this setting blue’s problem admits an ILP formulation, we demonstrate in experiments that it can only solve up to medium-sized instances. Addressing this, we present heuristics that perform well in our experiments. We also demonstrate that while sensors usually have some gain from coordination, this gain depends decisively on the instance structure and is oftentimes rather small (below ).
Full proofs of all results and descriptions of additional experiments can be found in our full version [6].
2 Escape Sensing Games (ESGs): The Model
In an Escape Sensing Game (ESG) a blue player (henceforth Blue) tries to route a set of targets through a channel. They compete against a red player (henceforth Red) who controls a set of sensors and tries to sense (and therefore “steal”) as many of Blue’s targets as possible. (Footnote 2: Note that the terms “sensor” and “sensing” are only part of our terminology and do not limit the applications of our model. For instance, instead of “sensing” the targets, Red might also aim to intercept them.) Formally, an Escape Sensing Game is defined by
1. a set of targets, each target equipped with some utility value,
2. a set of sensors and a recharging time, which is the same for all sensors, and
3. a binary sensing matrix with one row per target and one column per sensor, where an entry of one means that the sensor is capable of sensing the target.
We assume that all of these parts are known to both players at any point in time. Note that in this paper, we consider the constant-sum utility structure and leave the general-sum version for future work. That is, Blue seeks to maximize the summed value of not-sensed targets, i.e., targets not sensed by any sensor. In contrast, Red seeks to minimize this value, or equivalently, maximize the summed value of targets that are sensed by some sensor.
The strategy of Blue is an ordering of the targets that assigns each target a unique position, i.e., a bijection between targets and positions. The targets move through the channel according to this ordering: the target in the first position moves first, the target in the second position moves second, and so on. In each time step, each target moves to the next sensor, leaves the channel (if it has passed all sensors), or enters the channel at the first sensor (if it is the next target in the ordering).
The strategy of Red is a sensing plan that maps each sensor to the subset of targets sensed by the sensor, where different sensors sense different targets, i.e., the target sets of any two distinct sensors are disjoint. Red cannot play arbitrary sensing plans but only those that are valid. A sensing plan is valid (with respect to a target ordering) if (i) a sensor only senses targets it has the capabilities to sense, i.e., the sensing matrix permits every sensor-target pair in the plan, and (ii) a sensor pauses for at least the recharging time after sensing a target, i.e., any two targets sensed by the same sensor are far enough apart in the ordering for the sensor to recharge in between. Given a sensing plan, we can immediately calculate the value of not-sensed targets as the summed utility value of all targets that are not sensed by any sensor, which quantifies Blue’s utility.
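The two validity conditions and Blue's utility can be checked mechanically. The following is a minimal sketch under stated assumptions, not the authors' implementation: targets and sensors are 0-indexed, `can_sense[target][sensor]` is a Boolean version of the sensing matrix, and the recharging rule is interpreted as requiring that two targets sensed by the same sensor lie more than `tau` positions apart in the ordering. All function and parameter names are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def is_valid_plan(order: Sequence[int],
                  can_sense: Sequence[Sequence[bool]],
                  tau: int,
                  plan: Dict[int, Set[int]]) -> bool:
    """Check a sensing plan against a target ordering.

    Assumption: a sensor that senses the target at position p is blocked
    for the next `tau` positions, so positions must differ by more than tau.
    """
    position = {target: p for p, target in enumerate(order)}
    already_sensed: Set[int] = set()
    for sensor, targets in plan.items():
        # Different sensors must sense different targets.
        if already_sensed & targets:
            return False
        already_sensed |= targets
        # (i) The sensor must be capable of sensing each assigned target.
        if any(not can_sense[t][sensor] for t in targets):
            return False
        # (ii) The sensor must have recharged between consecutive sensings.
        positions = sorted(position[t] for t in targets)
        if any(b - a <= tau for a, b in zip(positions, positions[1:])):
            return False
    return True


def blue_utility(values: List[float], plan: Dict[int, Set[int]]) -> float:
    """Summed value of all targets that no sensor senses."""
    sensed = set().union(*plan.values())
    return sum(v for t, v in enumerate(values) if t not in sensed)
```

For instance, with three targets of values 3, 1, and 2 passing in index order and `tau = 1`, a plan in which one sensor takes the first two targets is rejected, because the two positions are adjacent.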
Objectives and equilibrium
Due to the motivating applications of our interest, this work adopts Blue’s perspective and analyzes sequential play in this game by assuming Blue moves first. (Footnote 3: Note that this is already reflected in our game definition, since the validity of Red’s sensing plan depends on the strategy of Blue. Thus, Red cannot move before or simultaneously with Blue.) Our analysis consists of two parts. First, we will analyze the best response problem for Red called Best Red Response: Given an ordering of the targets, output a sensing plan that is valid with respect to the ordering and minimizes the value of not-sensed targets among such plans. Second, to compute the optimal strategy of Blue, we analyze the game’s Stackelberg equilibrium (Footnote 4: We assume that ties in the strategies are broken according to some predefined lexicographic ordering of the strategies.), which can be written as a bilevel optimization problem: Blue chooses the target ordering that maximizes the value of not-sensed targets, given that Red responds with a valid sensing plan that minimizes this value. We term the corresponding computational problem Blue Leader Stackelberg Equilibrium.
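To make the leader-follower structure concrete, the sketch below solves tiny instances by exhaustive search: the outer loop enumerates Blue's orderings and the inner recursion computes Red's best response. It is only an illustration and scales exponentially. The names are hypothetical, `can_sense[target][sensor]` is an assumed Boolean encoding of the sensing matrix, and the recharging rule is assumed to mean that a sensor needs more than `tau` positions between two sensed targets.

```python
from itertools import permutations
from typing import List, Optional, Sequence, Tuple


def red_best_response_value(order: Sequence[int], values: Sequence[float],
                            can_sense: Sequence[Sequence[bool]], tau: int) -> float:
    """Blue's utility if Red answers `order` optimally (exhaustive recursion)."""
    k = len(can_sense[0]) if can_sense else 0

    def max_sensed(p: int, last: Tuple[Optional[int], ...]) -> float:
        # `last[s]` is the position at which sensor s sensed most recently.
        if p == len(order):
            return 0
        target = order[p]
        best = max_sensed(p + 1, last)  # option: leave the target unsensed
        for s in range(k):
            recharged = last[s] is None or p - last[s] > tau
            if can_sense[target][s] and recharged:
                new_last = last[:s] + (p,) + last[s + 1:]
                best = max(best, values[target] + max_sensed(p + 1, new_last))
        return best

    return sum(values) - max_sensed(0, (None,) * k)


def blue_optimal_ordering(values: Sequence[float],
                          can_sense: Sequence[Sequence[bool]],
                          tau: int) -> Tuple[List[int], float]:
    """Outer level: try every ordering, keep the first one with maximum value."""
    best_order: List[int] = []
    best_value = float("-inf")
    for order in permutations(range(len(values))):
        value = red_best_response_value(order, values, can_sense, tau)
        if value > best_value:
            best_order, best_value = list(order), value
    return best_order, best_value
```

Keeping the first maximizer in lexicographic order is one simple way to mimic a predefined tie-breaking rule; the rule used in the paper may differ.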
A motivating application
One major motivation of our work is the secure transit of peacekeeping resources in the presence of adversarial actors such as pirates, which is of critical importance due to past incidents, e.g., involving the United Nations [25]. Citing the UN’s peacekeeping mission manual [26], “protecting shipping in transit ensures the safety and security of vessels as they pass through waters threatened by piracy on the high seas…” In these applications, the UN plays Blue’s role whereas pirates correspond to Red, who can observe the ordering of targets and then act second. The UN commands a fleet of ships (i.e., targets in our model) that often carry resources of different importance and that can be arranged strategically. Protecting shipping is overall a complex, multifaceted task, and our model captures one of the phases after potential (often scarce) security measures have already been allocated to the ships and the pirates look to identify targets to attack. According to Winn and Govern [28], pirates often use a set of boats (i.e., sensors in our model) to probe different passing targets, usually by following them to observe their speed, crew size, firearms, etc., and to judge from this whether they are capable of capturing the ship. Such probing takes time, which is modeled by the recharging time.
Sensor and target types
We develop some customized algorithms for instances with only a few different target or sensor models: We say that two sensors are of the same type if they are capable of sensing the same targets, i.e., their columns of the sensing matrix are identical. We say that two targets are of the same type if they have the same utility value and can be sensed by the same sensors, i.e., their values coincide and their rows of the sensing matrix are identical. We refer to the resulting sets as the set of target types and the set of sensor types, respectively. It is easy to see that the number of sensor types is at most two to the power of the number of target types, as a sensor’s sensing capabilities are defined by the set of target types it can sense. Similarly, assuming that all targets have the same value, the number of target types is at most two to the power of the number of sensor types.
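Types can be computed by grouping identical rows and columns of the sensing matrix. The sketch below assumes the matrix is given as a list of rows, one per target, under the hypothetical name `can_sense`; the function names are likewise illustrative.

```python
from typing import Dict, List, Sequence, Tuple


def target_types(values: Sequence[float],
                 can_sense: Sequence[Sequence[int]]) -> List[List[int]]:
    """Group targets that share both their value and their matrix row."""
    groups: Dict[Tuple, List[int]] = {}
    for target, row in enumerate(can_sense):
        groups.setdefault((values[target], tuple(row)), []).append(target)
    return list(groups.values())


def sensor_types(can_sense: Sequence[Sequence[int]]) -> List[List[int]]:
    """Group sensors whose matrix columns are identical."""
    num_sensors = len(can_sense[0]) if can_sense else 0
    groups: Dict[Tuple, List[int]] = {}
    for sensor in range(num_sensors):
        column = tuple(row[sensor] for row in can_sense)
        groups.setdefault(column, []).append(sensor)
    return list(groups.values())
```

In an instance with four targets and three sensors where the first two sensors have identical columns, `sensor_types` returns two groups, which is consistent with the exponential bound stated above.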
3 Related Work
While the escape sensing game model is new, it is closely related to a few lines of AI research, as detailed below.
Computational game theory for security
Conceptually, our work subscribes to the extensive MAS literature on computational game theory for tackling security challenges. The Stackelberg security game [24] is one widely studied example. Other game-theoretic models include the hide-and-seek game [8], Blotto games [4], auditing games [5] and catcher-evader games [19]. Most of these games study the optimal usage of security forces under different game structures. In contrast, our ESG model is motivated by detection-vs-evasion situations in which security forces have already been allocated.
Scheduling
On a formal level, our problem is to schedule/order targets in an adversarial environment, which shares similarities with the classic problem of scheduling that looks to assign tasks to different machines to optimize certain criteria [18]. There is a rich body of AI research on scheduling, ranging from solving varied problems using AI techniques such as satisfiability [11] and distributed constraint optimization [22], to developing new models of scheduling problems under uncertainty [3] or in multi-agent setups [31].
In fact, the Best Red Response problem can be formulated as the following slightly non-standard scheduling problem: There are machines (modeling sensors). In each step, a job (modeling a target) arrives. The job can be processed (modeling sensing) by a given subset of machines and if executed successfully generates a given reward value. The job has a processing time of and needs to be processed within the next steps. This implies that the job needs to be processed (i.e., sensed) either now or its reward is lost.
4 The Algorithmics of Escape Sensing Games
We analyze the computational complexity of ESGs starting with Red’s best response problem, followed by computing equilibria.
4.1 Computing Red’s Best Response Strategy
We analyze Red’s best response problem that Red needs to solve in each game after Blue has committed to a target ordering. This problem turns out to be NP-hard, even if Red is only interested in determining whether it can sense all targets. This intractability result is the first strong indicator of the intricate game dynamics in ESGs.
Theorem 1.
Best Red Response is NP-complete, even when asked to decide whether Red can sense all targets or not.
Proof.
We reduce from Hitting Set where we are given a universe , a collection of sets and an integer , and the question is whether there is a size- subset containing at least one element from each set in (we assume that and ).
In the construction, all targets have a value of and the question is whether Red can sense all targets. As the core of the construction we add element sensors , set targets , and selection targets . Each element sensor can sense all selection targets and all set targets corresponding to sets in which the element appears. Regarding the ordering of targets, it is easiest to think of the targets as being arranged in “rounds”. In each round , first the selection targets move through the channel followed by the set target . The idea is that the same element sensors sense the selection targets in every round, which correspond to the elements that are not part of the hitting set (we extend the construction in the following paragraph to ensure that this holds). Then, the remaining element sensors need to form a hitting set to be able to sense the set target in each round.
We extend the construction as follows. We add filling targets for all and , which all element sensors can sense. Moreover, we add dummy sensors for each and and dummy targets for each and . For each and , dummy sensor can sense dummy target . We set . Formally, the target ordering is constructed—in multiple “rounds”—as follows. In each round , we first move the selection targets through the channel, then the dummy targets , then the set target , then the filling targets and then the dummy targets (the ordering of targets in each of the groups is arbitrary).
Proof of correctness: forward direction
Assume that is a size- hitting set of . For each and , we let sense . We construct the sensing plan for the element sensors iteratively as follows. In each round , we let each of the element sensors sense exactly one of the selection targets . Now, let be an element from that is contained in (such an element needs to exist because is a hitting set). We let sense and we let each of the element sensors sense exactly one of the filling targets .
The constructed sensing plan senses all targets and clearly respects the sensing matrix. It remains to be argued that the recharging times of all element sensors are respected (dummy sensors only sense one target). For each , we have that senses one selection target in each round. Between two selection targets in two different rounds there are at least dummy targets and one set target, so recharging times are respected. For each , the sensor senses either a set or filling target in each round. There are dummy targets and selection targets between any two set or filling targets from different rounds, so recharging times are respected.
Proof of correctness: backward direction
Assume that is a valid sensing plan that senses all targets. Consequently, in each round, the element sensors need to sense selection, filling, and one set target. As is valid and there are only targets between the first selection and last filling target in each round, this means that each element sensor needs to sense exactly one of these targets in each round. Note that an element sensor that senses a non-selection (i.e., either a set or filling) target in round cannot sense a selection target in round , as there are only targets between the first non-selection target in round and the last selection target in round . Consequently, as each element sensor needs to sense one target in each round, it follows that there is a set of elements so that the corresponding element sensors sense a selection target in every round. Consequently, the remaining element sensors need to sense all set targets. As an element sensor is only capable of sensing a set target if the element appears in the set, it follows that is a size- hitting set of . ∎
Despite this intractability result, it is still possible to construct exact combinatorial algorithms for Best Red Response. In particular, we present a dynamic programming-based algorithm empowered by some structural observations on ESGs that runs in (recall that ). This algorithm in particular implies that the problem becomes polynomial-time solvable if the recharging time, which we expect to be rather small in comparison to the number of targets, is a constant.
Proposition 2.
There is a -time algorithm for Best Red Response.
Proof Sketch.
Our idea is to construct a valid sensing plan iteratively by going through the arriving targets one by one (we assume that the ordering of targets is ). For each target, we either decide that it will not be sensed or assign it to one of the sensors so that the resulting plan is still valid. Our key observation to bring down the time and space complexity of the dynamic program is that we do not need to store the full sensing plan to ensure the validity of the plan after updating it. Instead, it is sufficient to know for each sensor whether it has sensed a target in the last steps. More formally, given a valid sensing plan that has been constructed by iterating over the first targets, we only store the following information: (i) the value of all targets that have not been assigned to a sensor in , i.e., , (ii) the sensors the last targets have been assigned to. It is possible to store this information in a table of size where each cell can be computed in -time. To extend the algorithm to sensor types, we prove that we can collapse sensors of one type into a “meta” sensor, making it sufficient to bookmark the types of sensors that have sensed the last targets. ∎
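The core of the proof sketch, remembering only which sensors were used for the most recent targets, can be illustrated as follows. This is a simplified sketch rather than the algorithm of Proposition 2: it does not collapse sensors into types, so its table can be larger than the one described above. It assumes a Boolean matrix `can_sense[target][sensor]` and that a sensor is blocked for the `tau` targets following a sensing; all names are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

Window = Tuple[Optional[int], ...]


def best_red_response_value(order: Sequence[int], values: Sequence[float],
                            can_sense: Sequence[Sequence[bool]], tau: int) -> float:
    """Blue's utility (value of not-sensed targets) under Red's best response."""
    k = len(can_sense[0]) if can_sense else 0
    # State: sensors assigned to the last `tau` targets (None = not sensed).
    # Stored per state: the maximum total value Red has sensed so far.
    table: Dict[Window, float] = {(): 0}
    for target in order:
        next_table: Dict[Window, float] = {}
        for window, sensed_value in table.items():
            busy = set(window)  # sensors that are still recharging
            options = [None] + [s for s in range(k)
                                if can_sense[target][s] and s not in busy]
            for choice in options:
                # Slide the window; with tau == 0 nothing needs remembering.
                new_window = (window + (choice,))[-tau:] if tau > 0 else ()
                gained = values[target] if choice is not None else 0
                total = sensed_value + gained
                if total > next_table.get(new_window, float("-inf")):
                    next_table[new_window] = total
        table = next_table
    return sum(values) - max(table.values())
```

Each state is a tuple of at most `tau` entries, so the number of states depends exponentially on the recharging time only, which mirrors why the problem becomes tractable when the recharging time is a constant.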
We conclude by giving a clean ILP formulation of Best Red Response, which turns out to scale very favorably in our experiments allowing us to solve instances with up to targets within one minute.
Proposition 3.
Best Red Response admits an ILP formulation with binary variables and constraints.
Proof.
We model an instance of Best Red Response as an ILP as follows. We assume that the targets are ordered as . We create a binary variable for each and . Setting to one corresponds to letting sensor sense target if , and letting not be sensed by any sensor if .
To ensure that Red minimizes the value of not-sensed targets, the optimization criterion becomes: To ensure the validity of the sensing plan , for each , we enforce that: Moreover, to ensure that sensor capabilities are respected, we impose for each and that: Lastly, to enforce that recharging times are respected, for each and we add the constraint: ∎
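For orientation, one ILP that matches the verbal description above can be written as follows. The notation is an assumption made for this sketch and may differ from the paper's: targets are indexed $i \in [n]$ in the order in which they pass, sensors are indexed $j \in [k]$, $v_i$ is the value of target $i$, $D_{i,j}$ is the sensing matrix entry, $\tau$ is the recharging time, and the extra index $j = k+1$ encodes "not sensed". The recharging constraint assumes that a sensor is blocked for the $\tau$ targets following a sensing.

```latex
\begin{align*}
\min\ & \sum_{i=1}^{n} v_i \, x_{i,k+1} \\
\text{s.t.}\ & \sum_{j=1}^{k+1} x_{i,j} = 1
  && \forall i \in [n] \\
& x_{i,j} \le D_{i,j}
  && \forall i \in [n],\ j \in [k] \\
& \sum_{i'=i}^{\min(i+\tau,\,n)} x_{i',j} \le 1
  && \forall i \in [n],\ j \in [k] \\
& x_{i,j} \in \{0,1\}
  && \forall i \in [n],\ j \in [k+1]
\end{align*}
```

Under these assumptions the program has $n(k+1)$ binary variables, and the last constraint family states that each sensor senses at most one target within any window of $\tau + 1$ consecutive targets.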
4.2 Solving for the Stackelberg Equilibrium
We now study the problem of computing Blue’s optimal strategy, i.e., to solve Blue Leader Stackelberg Equilibrium. Theorem 1 already shows the NP-hardness of Best Red Response. While this does not imply the hardness of computing Stackelberg equilibria (Footnote 5: Note that the fact that it is NP-hard for Red to best respond to certain Blue strategies, as constructed in the reduction of Theorem 1, does not imply that it is also hard for Red to best respond to the particular Stackelberg equilibrium strategy of Blue, as these strategies might admit some structure that makes it easier to best respond.), a convincing intractability result for Blue’s optimal strategy shall ideally “disentangle” its complexity from Red’s best response problem. With this in mind, we prove the NP-hardness of Blue Leader Stackelberg Equilibrium even in situations where Red’s best response problem is linear-time solvable. This demonstrates that the complexity in our reduction does not come from finding Red’s strategy but from the problem of whether Blue can arrange the targets in an optimal way.
Theorem 4.
Blue Leader Stackelberg Equilibrium is NP-hard, even on instances where Best Red Response is linear-time computable and the recharging time is .
Note that the NP-hardness holds even if sensors’ recharging time is constant, a case in which Red’s best response problem is polynomial-time solvable (see Proposition 2). Our hardness result indicates that computing Blue’s optimal strategy is a generally much harder problem than computing Red’s optimal strategy. In fact, it remains open whether Blue Leader Stackelberg Equilibrium is contained in NP or whether it is complete for complexity classes beyond NP. We suspect the latter to hold.
4.2.1 Bilevel Optimization
In light of this, it is unclear whether (and from our perspective rather unlikely that) Blue Leader Stackelberg Equilibrium admits an ILP formulation. Naive brute-force approaches are also computationally infeasible, as we would need to enumerate all possible target orderings and solve the NP-hard Best Red Response problem as a subroutine for each of them.
Thus, we turn to a formulation as a bilevel optimization problem [9] as one way to solve the problem exactly. In such formulations, constraints are still linear, but there exist two connected levels of the problem, i.e., an outer and an inner level. The inner level controls certain variables that it sets to minimize an objective subject to linear constraints that also involve variables controlled by the outer level, while the outer level sets these variables to maximize the objective. In our problem, we can model Red’s best response problem as the inner level loosely following the ILP from Proposition 3. The outer-level models Blue’s problem. The key parts of the outer level are variables for each target that encode the position in which the target appears in the final ordering and that are used in the inner level to ensure the validity of the sensing plan.
Proposition 5.
Blue Leader Stackelberg Equilibrium admits a bilevel optimization formulation with binary variables, integer variables, and constraints.
4.2.2 Heuristic
We will see later that the running time for the bilevel formulation of the problem already becomes infeasible on small-sized instances. Therefore, we experimented with different heuristics to solve the problem. (Footnote 6: Note that the heuristic double-oracle approach that has been successfully employed for other large combinatorial games [1, 17] is not applicable to ESGs. Traditionally, the approach successively expands the strategy spaces of both players by letting them best respond to each other. However, in ESGs, we face a bilevel problem in which there is no best response of the leader to the follower. The approach also fails here because the valid strategies of the follower heavily depend on the strategy picked by the leader.) In the following, we present two variants of simulated annealing-based heuristics that performed best. For a target ordering, its neighbors are all orderings that arise from it by swapping the positions of any two different targets. The relaxed version of our simulated annealing (SA_Relax) is presented in Algorithm 1. The idea is to find an optimal ordering through repeated local rearrangements. We store the current ordering and compute its value for Blue by solving Red’s best response problem using Proposition 3. Then, we pick a random neighbor of the current ordering, compute its value, and update the ordering based on this according to standard simulated annealing rules.
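The swap-based local search can be sketched as below. This is not a transcription of Algorithm 1: the starting temperature, the geometric cooling schedule, and the iteration budget are assumptions chosen for illustration, and `evaluate` is a hypothetical callback standing in for "solve Red's best response problem and return Blue's utility".

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def sa_relax(num_targets: int,
             evaluate: Callable[[Sequence[int]], float],
             iterations: int = 200,
             start_temp: float = 1.0,
             cooling: float = 0.98,
             seed: int = 0) -> Tuple[List[int], float]:
    """Simulated annealing over target orderings with random swap moves."""
    rng = random.Random(seed)
    current = list(range(num_targets))
    rng.shuffle(current)  # random initial ordering
    current_value = evaluate(current)
    best, best_value = current[:], current_value
    temp = start_temp
    for _ in range(iterations):
        if num_targets < 2:
            break  # no neighbors exist
        # Random neighbor: swap the positions of two different targets.
        i, j = rng.sample(range(num_targets), 2)
        neighbor = current[:]
        neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
        value = evaluate(neighbor)
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature drops.
        accept = value >= current_value or \
            rng.random() < math.exp((value - current_value) / temp)
        if accept:
            current, current_value = neighbor, value
            if value > best_value:
                best, best_value = neighbor[:], value
        temp = max(temp * cooling, 1e-9)
    return best, best_value
```

Since every iteration calls `evaluate` once, the running time is dominated by the cost of the best-response solver that is plugged in.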
In the full version of our simulated annealing (SA), instead of picking a random neighbor in Line 3 of Algorithm 1, we first run a heuristic for Best Red Response on all neighboring orderings. (Footnote 7: In our (greedy) heuristic, we consider the targets in decreasing order of their value and construct the sensing plan iteratively. We collect the sensors for which the sensing plan remains valid after adding the current target to the sensor’s set. We let the target be sensed by a randomly selected sensor among them, or by no sensor if there is none. For a formal description, see our full version [6].) Then, on the fraction of neighbors with the highest returned value, we execute the ILP from Proposition 3 to compute the optimal sensing plan. Of the examined neighbors, we pick the one with the highest returned value as the next ordering. As a hyper-parameter tuning process, we tested the performance of our heuristic algorithm with respect to the choice of this fraction (see our full version [6] for results). It turns out that provides a good trade-off between the algorithm’s running time and Blue’s utility. Thus, we fix throughout the paper. For both heuristics, we always run the heuristic three times with three different initial randomly generated target orderings and return the best computed ordering.
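The greedy heuristic for Best Red Response described in the footnote can be sketched as follows. The formal description is in the full version, so details here are assumptions: the Boolean encoding `can_sense[target][sensor]`, the interpretation that two targets of one sensor must be more than `tau` positions apart, and all names.

```python
import random
from typing import Dict, List, Optional, Sequence, Set, Tuple


def greedy_red_response(order: Sequence[int], values: Sequence[float],
                        can_sense: Sequence[Sequence[bool]], tau: int,
                        rng: Optional[random.Random] = None
                        ) -> Tuple[Dict[int, Set[int]], float]:
    """Build a valid sensing plan greedily, most valuable targets first."""
    rng = rng or random.Random(0)
    k = len(can_sense[0]) if can_sense else 0
    position = {target: p for p, target in enumerate(order)}
    used: List[List[int]] = [[] for _ in range(k)]  # sensed positions per sensor
    plan: Dict[int, Set[int]] = {s: set() for s in range(k)}
    for target in sorted(order, key=lambda t: -values[t]):
        p = position[target]
        # Sensors that keep the plan valid if they additionally take `target`.
        feasible = [s for s in range(k)
                    if can_sense[target][s]
                    and all(abs(p - q) > tau for q in used[s])]
        if feasible:
            chosen = rng.choice(feasible)
            plan[chosen].add(target)
            used[chosen].append(p)
    sensed = set().union(*plan.values())
    blue_value = sum(v for t, v in enumerate(values) if t not in sensed)
    return plan, blue_value
```

The returned value is an upper bound on what Blue obtains against an optimal Red, since the greedy plan is valid but not necessarily optimal for Red.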
5 Experimental Evaluations
We analyze the quality and performance of our algorithms to compute the Stackelberg equilibrium. (Footnote 8: We use Gurobi [15] to solve the ILP from Proposition 3 and MIBS [23] to solve the bilevel program from Proposition 5. Both are among the most popular off-the-shelf tools for solving the respective problem.) We consider three simulated game settings for generating ESGs. For each setting, we determine the value of a target by drawing a number uniformly within (Footnote 9: In our full version [6], we analyze supplementary scenarios, reinforcing similar conclusions to those presented here.):
1. Default (Def): For each and , we set with probability .
2. Euclidean (Euc): Each target and each sensor are uniformly sampled points in . A sensor can sense a target (i.e., ) if the Euclidean distance between their points is below .
3. RandomLevel (Rand): Each target has a difficulty level uniformly sampled from , and each sensor has a skill level uniformly sampled from . For each and , we set with probability .
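The first two settings can be sketched as simple generators. The concrete parameters are not recoverable from the text above, so everything numeric here is an explicit assumption exposed as a parameter: the sensing probability `p`, the distance threshold `radius`, the unit square as sampling region, and the value range. The RandomLevel setting is omitted because its probability rule is not stated here.

```python
import math
import random
from typing import List, Tuple

Instance = Tuple[List[float], List[List[bool]]]


def default_instance(num_targets: int, num_sensors: int, p: float,
                     rng: random.Random,
                     value_range: Tuple[float, float] = (0.0, 1.0)) -> Instance:
    """Default setting: each matrix entry is set independently with probability p."""
    values = [rng.uniform(*value_range) for _ in range(num_targets)]
    can_sense = [[rng.random() < p for _ in range(num_sensors)]
                 for _ in range(num_targets)]
    return values, can_sense


def euclidean_instance(num_targets: int, num_sensors: int, radius: float,
                       rng: random.Random,
                       value_range: Tuple[float, float] = (0.0, 1.0)) -> Instance:
    """Euclidean setting: a sensor senses targets closer than `radius`.

    Assumption: points are sampled uniformly from the unit square.
    """
    values = [rng.uniform(*value_range) for _ in range(num_targets)]
    targets = [(rng.random(), rng.random()) for _ in range(num_targets)]
    sensors = [(rng.random(), rng.random()) for _ in range(num_sensors)]
    can_sense = [[math.dist(t, s) < radius for s in sensors] for t in targets]
    return values, can_sense
```

Passing a seeded `random.Random` makes the sampled instances reproducible, which is convenient when averaging results over many generated instances.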
In all our experiments, if not stated otherwise, we average over instances generated according to one of the models. We present our experimental results as tables where each entry contains Blue’s average utility (i.e., the summed value of not-sensed targets) from the computed target ordering assuming Red best responds and the average running time in seconds in italics, both followed by their respective standard deviations. Note that standard deviations are calculated across the different sampled instances, implying that independent of the solution method some non-trivial standard deviation is to be expected, as certain instances are more favorable for Blue than others.
We analyze the maximum size of instances that we can solve exactly using the bilevel program, which we denote as OPT. We present results for the Default game setting in Table 3 (results for other simulated game settings are similar). It turns out that while instances with targets can be solved within a second by OPT, instances with targets already take around hours to solve. This demonstrates that the bilevel program is only usable for quite small instances. Moreover, we observe the to-be-expected trend that Blue’s utility increases when Blue has more targets or Red has fewer sensors. However, we do not find any consistent trend regarding which of the two is more advantageous for Blue: more targets or fewer sensors.
Motivated by the high computational cost of the bilevel program, we now turn to analyzing the quality of our heuristics. We also include the Random method here as a baseline where Blue simply picks an arbitrary ordering of targets (and Red best responds to it). In addition, we compare our heuristics against a naive random strategy of comparable computational cost. For this, we include the Random2 method which generates random orderings for Table 3 and random orderings for Table 3. The sampled ordering that achieves the highest utility for Blue assuming that Red best responds is returned.
Targets \ Sensors | 2 | 3 | 5 |
5 | 1.79 ± 0.71, 0.61 ± 0.21 | 1.55 ± 0.65, 0.77 ± 0.02 | 1.02 ± 0.62, 0.98 ± 0.04 |
7 | 2.41 ± 0.78, 102 ± 36 | 2.29 ± 0.80, 116 ± 31 | 1.62 ± 0.72, 140 ± 29 |
8 | 2.96 ± 0.81, 1501 ± 354 | 2.33 ± 0.74, 1760 ± 38 | 1.7 ± 0.69, 1814 ± 23 |
9 | n/a, 31358 | n/a, 32541 | n/a, 35376 |

Method | Def | Euc | Rand |
OPT | 2.29 ± 0.80, 116 ± 31 | 1.952 ± 0.74, 126 ± 2.29 | 2.09 ± 0.87, 120 ± 25 |
SA | 2.29 ± 0.80, 4.78 ± 0.38 | 1.951 ± 0.75, 4.96 ± 0.47 | 2.09 ± 0.87, 5.03 ± 0.69 |
SA_Relax | 2.29 ± 0.80, 0.81 ± 0.03 | 1.952 ± 0.74, 0.84 ± 0.05 | 2.09 ± 0.87, 0.85 ± 0.08 |
Random | 2.16 ± 0.84, 0.001 | 1.71 ± 0.83, 0.001 | 1.93 ± 0.92, 0.001 |
Random2 | 2.29 ± 0.80, 5.13 ± 0.16 | 1.952 ± 0.74, 5.25 ± 0.33 | 2.09 ± 0.87, 5.3 ± 0.46 |

Method | Def | Euc | Rand |
SA | 15.96 ± 1.1, 28101 ± 563 | 18.4 ± 2.1, 27755 ± 928 | 17.3 ± 4.2, 27970 ± 1136 |
SA_Relax | 8.76 ± 0.9, 49.6 ± 2.57 | 10.53 ± 2.32, 47.5 ± 0.86 | 12.3 ± 3.6, 49.7 ± 1.6 |
Random | 6.19 ± 1.26, 0.001 | 6.86 ± 2.25, 0.001 | 9.68 ± 2.78, 0.001 |
Random2 | 8.27 ± 0.73, 25036 ± 311 | 9.54 ± 2.45, 24333 ± 211 | 11.68 ± 3.58, 26810 ± 295 |
In Table 3, we show the algorithms’ performance for small instances where we can still compute Blue’s maximum utility (OPT) via the bilevel program. In Table 3, we consider larger instances where the optimum value is unknown. Note that higher values correspond to a better performance of the algorithm, as we always report Blue’s utility for Red’s best response.
From the results in Table 3, we can see that all heuristics perform well on small instances. In particular, SA_Relax, SA, and Random2 find the optimal solution in all (but one) cases. However, SA_Relax proves advantageous because it only needs a sixth of the running time of the other two methods.
While our two heuristics SA and SA_Relax show a similar approximation quality for small instances, for larger instances (Table 3) SA clearly outperforms SA_Relax. For the Default game setting, using SA compared to SA_Relax even regularly leads to a doubled utility for Blue. While this is a strong argument for using SA, SA’s downside is its higher computational cost, needing over hours to solve instances with targets.
Finally, we observe that both methods clearly outperform the Random baseline, with SA consistently preserving an average of approximately 20 more targets for the larger instances. This highlights that the solution quality of the target ordering clearly increases throughout the simulated annealing. Considering Random2, we find that repeatedly sampling orders (instead of only once) leads to a noticeable utility increase. However, on the larger instances, Random2 performs even worse than SA_Relax while running as long as SA, thereby combining the disadvantages of SA and SA_Relax. Overall, our experiments highlight that Blue benefits from ordering the targets strategically instead of randomly.
6 Escape from Non-Coordinated Sensing
ESGs assume that the different sensors are controlled by a central authority that computes the sensing plan. We now investigate the situation where these sensors are non-coordinated and each one acts independently based on a natural greedy algorithm. This situation arises when sensors cannot easily exchange information and coordinate with each other. Another motivation is that sensors may be controlled by different adversaries, each serving only their own interests and being unlikely to coordinate their actions and share their reward. Both of these scenarios can occur in our motivating domain of piracy on the open seas, as coordination between different groups is likely to be challenging. Different pirate groups might even refuse to coordinate at all and instead directly compete with each other.
We model these situations by assuming that sensors have a predefined ordering as (as induced by fixed locations of the sensors); for each sensor , as soon as a not previously sensed target that can sense passes (i.e., has the capabilities and is currently not recharging), senses it, thereby greedily maximizing its number of sensed targets.
Formally, given a target ordering , we construct a sensing plan sequentially as follows. For each step , if target passes sensor in step , then we add to if the resulting sensing plan remains valid with respect to (formally, for and target passes sensor in step ). As the strategy of Red is fixed, the problem Best Blue Response Blue faces is to pick a target ordering so that gets maximized. In the following, we study the computational complexity of this problem and solve it in computational experiments. By comparing the answer of Best Blue Response to the value of the Stackelberg equilibrium in the corresponding ESG we can ultimately answer how much Red gains from being able to centrally control its sensors.
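The sequential construction of the greedy sensing plan can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the names `greedy_sensing`, `blue_utility`, and the toy instance are hypothetical, and the code assumes that a sensor that sensed the target at position p is ready again for the target at position p + `recharge` or later.

```python
from typing import Dict, List, Optional, Set


def greedy_sensing(
    ordering: List[str],
    sensors: List[str],
    can_sense: Dict[str, Set[str]],
    recharge: int,
) -> Dict[str, Optional[str]]:
    """Return, for every target, the sensor that senses it (or None)."""
    # sensor -> position (in the ordering) of the last target it sensed
    last_pos: Dict[str, int] = {}
    plan: Dict[str, Optional[str]] = {}
    for pos, target in enumerate(ordering):
        plan[target] = None
        # The target passes the sensors in their predefined order.
        for sensor in sensors:
            # Assumed recharging convention: ready again after `recharge` positions.
            ready = sensor not in last_pos or pos - last_pos[sensor] >= recharge
            if ready and target in can_sense[sensor]:
                plan[target] = sensor
                last_pos[sensor] = pos
                break  # a sensed target does not reach later sensors
    return plan


def blue_utility(plan: Dict[str, Optional[str]], value: Dict[str, int]) -> int:
    """Summed value of the targets that pass the channel unsensed."""
    return sum(value[t] for t, sensor in plan.items() if sensor is None)


# Hypothetical toy instance: two sensors, three unit-value targets.
SENSORS = ["s1", "s2"]
CAN_SENSE = {"s1": {"a", "b"}, "s2": {"b", "c"}}
VALUE = {"a": 1, "b": 1, "c": 1}
RECHARGE = 2
```

Because every sensor sees the targets in the order in which Blue sends them, processing the targets one by one and letting each try the sensors in their fixed order reproduces the step-wise construction. Evaluating `blue_utility` for different orderings of the same instance shows how much Blue's choice of ordering matters even against a fixed Red strategy.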
6.1 Algorithmic Analysis
Unfortunately, it turns out that computing Blue’s strategy is NP-hard, even in restricted cases where each sensor can only sense one target. Due to the sequential construction of Red’s sensing plan, this reduction is our most intricate one:
Theorem 6.
Best Blue Response is NP-complete, even if the recharging time is , i.e., each sensor can sense only one target, each target has value , and the sum of each row and column in the sensing matrix is at most four.
Proof Sketch.
We focus on the variant where each sensor can only sense one target. Interestingly, as discussed in more detail in our full version [6], this problem shares some similarities with the NP-hard Minimum Maximal Matching problem, as we can view the sensors and targets as two sides of a bipartite graph with sensor-target pairs where the sensor senses the target corresponding to maximal matchings in this graph. However, the ordering of the sensors makes only certain maximal matchings in these graphs realizable, which is why we instead show NP-hardness by reducing from a variant of 3-SAT where each variable appears exactly twice positively and exactly once negatively. The core idea of our construction is the following: We add a literal target for each literal. Moreover, for each clause, we add a clause sensor and a clause target. The clause sensor is capable of sensing the corresponding clause target as well as targets corresponding to the three literals appearing in the clause. We add further targets and sensors to the instance so that all clause targets need to make it unsensed through the channel. This implies that each clause sensor needs to sense a literal target as it will otherwise sense the corresponding clause target in passing, i.e., we need to “cover” each clause with a literal appearing in the clause. Now for each variable, we add a slightly intricate gadget that ensures that we can either use the targets corresponding to positive literals to cover clause sensors (which corresponds to setting the variable to true) or the one target corresponding to a negative literal (which corresponds to setting the variable to false). Because we need to “cover” each clause, the induced assignment is satisfying. ∎
We can adopt a similar view as in Proposition 2 to solve the problem via dynamic programming. However, this time the dynamic programming iteratively constructs the optimal target ordering and we need to keep track of the previously used targets together with the sensors used in the last timesteps. This results in a naive running time of , which can be improved to if we incorporate types:
Proposition 7.
Best Blue Response is solvable in , where is the number of targets of type .
6.1.1 ILP Formulation
Constructing an ILP for Best Blue Response turns out to be slightly more challenging, as we need to encode Red’s greedy sequential behavior:
Proposition 8.
There is an ILP formulation for Best Blue Response with binary variables, integer variables, and constraints.
Proof Sketch.
We introduce for each target an integer variable encoding the position in which the target appears. Moreover, similar to Proposition 3, for each and , we add a binary variable , which encodes whether is sensed by sensor or whether the target makes it unsensed through the channel (for ). We can add mostly straightforward constraints to ensure that respects recharging times. The main challenge is to encode the greedy behavior of the sensors (i.e., the ILP cannot have the freedom to pick the values arbitrarily to optimize Blue’s utility but they are set according to sensors’ greedy behavior). For this, for each and , we add a binary variable and add constraints so that is equal to one if target is sensed by sensor and because of this recharges when is passing, i.e., “covers” .
To encode sensors’ greedy behavior, we want to add a constraint that makes sure that in case , the target needs to be covered by other targets for all sensors that are capable of sensing it placed before . Note that this together with another constraint () in particular implies that each target is sensed by the first sensor it passes which is not recharging, thereby encoding the greedy behavior of sensors. Specifically, for each and , we add:
(1) | ||||
∎
6.1.2 Heuristic
Since it will turn out that the ILP formulation cannot quickly solve medium-to-large instances, we explore various simulated annealing-based heuristics, similar to the approach discussed in Section 5. We present the variant SA_Relax where a random neighbor is picked in Algorithm 2. The other variant SA computes Blue’s utility for all neighbors and picks the one with the highest utility.
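To illustrate the random-neighbor idea behind SA_Relax, the following is a minimal simulated annealing sketch over target orderings. It is not the paper's Algorithm 2: the swap neighborhood, the geometric cooling schedule, all parameter defaults, and the names `anneal_ordering` and `make_greedy_utility` are assumptions made for illustration, and the utility is evaluated against greedy sensors under the assumed convention that a sensor is ready again `recharge` positions after its last sensed target.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Set


def make_greedy_utility(
    sensors: List[str], can_sense: Dict[str, Set[str]], recharge: int
) -> Callable[[List[str]], int]:
    """Build a function that counts unsensed targets under greedy sensors."""

    def utility(ordering: List[str]) -> int:
        last_pos: Dict[str, int] = {}
        surviving = 0
        for pos, target in enumerate(ordering):
            for sensor in sensors:
                ready = sensor not in last_pos or pos - last_pos[sensor] >= recharge
                if ready and target in can_sense[sensor]:
                    last_pos[sensor] = pos
                    break
            else:  # no sensor sensed the target
                surviving += 1
        return surviving

    return utility


def anneal_ordering(
    targets: Sequence[str],
    utility: Callable[[List[str]], float],
    steps: int = 2000,
    start_temp: float = 1.0,
    cooling: float = 0.995,
    seed: int = 0,
) -> List[str]:
    """Random-neighbor simulated annealing; needs at least two targets."""
    rng = random.Random(seed)
    current = list(targets)
    rng.shuffle(current)
    current_val = utility(current)
    best, best_val = list(current), current_val
    temp = start_temp
    for _ in range(steps):
        # Neighbor: swap two randomly chosen positions (assumed neighborhood).
        i, j = rng.sample(range(len(current)), 2)
        neighbor = list(current)
        neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
        val = utility(neighbor)
        # Always accept improvements; accept worse neighbors with
        # a probability that shrinks as the temperature drops.
        if val >= current_val or rng.random() < math.exp((val - current_val) / temp):
            current, current_val = neighbor, val
            if current_val > best_val:
                best, best_val = list(current), current_val
        temp = max(temp * cooling, 1e-9)
    return best


# Hypothetical toy instance: two sensors, three unit-value targets.
toy_utility = make_greedy_utility(
    ["s1", "s2"], {"s1": {"a", "b"}, "s2": {"b", "c"}}, recharge=2
)
best_ordering = anneal_ordering(["a", "b", "c"], toy_utility)
```

The SA variant described above would instead evaluate every neighbor in each iteration and move to the best one, which explains its higher running time per step.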
6.2 Experiments
We reuse the general setup described in Section 5, but naturally now report Blue’s computed utility assuming that sensors act greedily. Here, we let the Random2 method generate random orderings in Table 6 and random orderings in Table 6.
First of all, we evaluate the scalability of our ILP for Best Blue Response (OPT) in Table 6. The ILP can solve the problem for medium-sized instances with up to targets in a few minutes. However, due to the complexity of the ILP modeling, already for targets as soon as the number of sensors reaches , instances can take more than hours to solve. This is why the last line of the table only reports the running time for one instance.
Targets \ Sensors | 2 | 3 | 5 |
5 | 2 0.68, 0.007 0.005 | 1.71 0.57, 0.009 0.005 | 1.32 0.52, 0.01 0.003 |
15 | 6.6 1, 0.14 0.35 | 6.29 0.9, 0.32 0.88 | 5.46 0.87, 6.85 22.9 |
20 | 9.05 1.13, 0.52 2.94 | 8.49 1.01, 1.58 4.4 | 7.56 1.01, 229 1261 |
25 | 11.38 1.26, 6.8 30.6 | 10.96 1.29, 283 1771 | n/a, 22537 |
Method | Def | Euc | Rand |
OPT | 3.1 0.86, 0.15 0.37 | 3.25 0.79, 0.25 0.71 | 3.04 0.78, 3.26 8.77 |
SA | 2.83 0.81, 0.7 0.02 | 2.92 0.81, 0.71 0.02 | 2.72 0.8, 0.71 0.018 |
SA_Relax | 3.06 0.88, 0.02 | 3.17 0.81, 0.02 | 2.89 0.82, 0.02 |
Random | 1.9 0.72, 0.001 | 2.15 0.9, 0.001 | 1.9 0.9, 0.001 |
Random2 | 2.91 0.87, 0.81 0.0004 | 3.11 0.79, 0.81 0.0004 | 2.89 0.78, 0.83 0.0004 |
Method | Def | Euc | Rand |
SA | 16.57 1.64, 485 20 | 17.2 2.4, 470 9.6 | 17.54 3.27, 503 20 |
SA_Relax | 12.3 1.58, 3.14 0.35 | 13.11 2.62, 2.84 0.2 | 14.47 2.96, 2.95 0.2 |
Random | 9.2 1.99, 0.001 | 10.02 2.77, 0.001 | 11.67 3.07, 0.001 |
Random2 | 12.75 1.03, 458 15 | 12.98 2.37, 437 13 | 14.67 3.18, 496 19 |
Next, we analyze the solution quality of our heuristic approaches. On small instances presented in Table 6, our best heuristic algorithm approximates the optimal solution quite well and the error is typically below with the SA_Relax method consistently outperforming SA. Both heuristics outperform Random, while Random2 performs better than SA (yet still worse than SA_Relax, while having a much longer running time). When moving to larger instances in Table 6, the picture flips, as SA now substantially outperforms SA_Relax. This shows a general trend that the solution quality of SA scales more favorably than that of SA_Relax (while the opposite is naturally true for the running time). The heuristics again clearly outperform Random, with SA preserving approximately more targets. Random2 performs similarly to the suboptimal heuristic SA_Relax, while being slower by a factor of more than .
Finally, we are interested in exploring the power of coordination for Red, i.e., the difference between the optimal utility Blue gets in the non-coordinated setting explored in this section compared to its utility in the Stackelberg equilibria from Section 5. We find that for the small instances where we can compute the Stackelberg equilibrium exactly, Red can reduce Blue’s utility by to through coordination. For larger instance sizes, we no longer know the optimal solutions, which is why we resort to comparing the results of the respective SA heuristics. We find that for larger instances, the gap decreases, with Red being only able to decrease Blue’s utility by through coordination in the instances from the Default setting underlying Table 3. In our full version [6], we show that when Red’s sensors are capable of sensing more targets, coordination is more important, sometimes halving Blue’s utility.
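The effect of coordination can be made concrete on a tiny hand-made instance. The sketch below is purely illustrative and does not reproduce the paper's experiments: it compares Blue's utility against greedy sensors with her utility against a coordinated Red whose best response is found by brute force. The instance, the function names, and the recharging convention (a sensor is ready again `recharge` positions after its last sensed target) are assumptions.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Set


def is_valid_plan(
    ordering: Sequence[str],
    assignment: Sequence[Optional[str]],
    can_sense: Dict[str, Set[str]],
    recharge: int,
) -> bool:
    """Check sensing capabilities and (assumed) recharging constraints."""
    last_pos: Dict[str, int] = {}
    for pos, (target, sensor) in enumerate(zip(ordering, assignment)):
        if sensor is None:
            continue
        if target not in can_sense[sensor]:
            return False
        if sensor in last_pos and pos - last_pos[sensor] < recharge:
            return False
        last_pos[sensor] = pos
    return True


def coordinated_blue_utility(ordering, sensors, can_sense, value, recharge) -> int:
    """Blue's utility if Red centrally picks the best valid plan (brute force)."""
    best = sum(value[t] for t in ordering)  # the empty plan senses nothing
    for assignment in product([None, *sensors], repeat=len(ordering)):
        if is_valid_plan(ordering, assignment, can_sense, recharge):
            surviving = sum(
                value[t] for t, s in zip(ordering, assignment) if s is None
            )
            best = min(best, surviving)
    return best


def greedy_blue_utility(ordering, sensors, can_sense, value, recharge) -> int:
    """Blue's utility if every sensor greedily senses the first target it can."""
    last_pos: Dict[str, int] = {}
    surviving = 0
    for pos, target in enumerate(ordering):
        for sensor in sensors:
            ready = sensor not in last_pos or pos - last_pos[sensor] >= recharge
            if ready and target in can_sense[sensor]:
                last_pos[sensor] = pos
                break
        else:  # no sensor sensed the target
            surviving += value[target]
    return surviving


# Hypothetical instance: s1 can sense both targets, s2 only target "a".
SENSORS: List[str] = ["s1", "s2"]
CAN_SENSE = {"s1": {"a", "b"}, "s2": {"a"}}
VALUE = {"a": 1, "b": 1}
RECHARGE = 2
```

In this instance, sending "a" first lets the greedy sensor s1 spend its activation on "a", so "b" passes unsensed. A coordinated Red would instead let s2 sense "a" and save s1 for "b", leaving Blue with nothing. Since the greedy plan is itself a valid sensing plan, the coordinated utility can never exceed the greedy one for the same ordering.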
7 Conclusion
By introducing Escape Sensing Games, we initiated the study of a new class of games concerned with target arrangement and motivated by security applications. We showed that while the worst-case computational complexity of ESGs is prohibitive, our presented algorithms still perform well in experiments.
There are multiple directions for future work emanating from our work. First, pinpointing the precise complexity of computing Stackelberg equilibria remains a concrete open question. Second, there are other variants of ESGs beyond those studied by us. For instance, it would be possible to merge the settings studied in Sections 4 and 6 into a game where sensors act greedily but Red can control the ordering of the sensors. In this game variant where both Red and Blue need to pick orders, it would also be possible to study simultaneous play or Stackelberg equilibria where Red moves first. Lastly, there are various other target arrangement problems to be studied. One example could be a game where Blue needs to place targets on a grid and Red cannot sense any two targets placed close to each other.
Acknowledgements
This work was supported by the Office of Naval Research (ONR) under Grant Number N00014-23-1-2802. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the Office of Naval Research or the U.S. Government.
References
- [1] Lukás Adam, Rostislav Horcík, Tomás Kasl, and Tomás Kroupa. Double oracle algorithm for computing equilibria in continuous games. In AAAI, pages 5070–5077, 2021.
- [2] Gemayqzel Bouza Allende and Georg Still. Solving bilevel programs with the KKT-approach. Math. Program., 138:309–332, 2013.
- [3] Evripidis Bampis, Konstantinos Dogeas, Alexander V Kononov, Giorgio Lucarelli, and Fanny Pascual. Scheduling with untrusted predictions. In IJCAI, pages 4581–4587, 2022.
- [4] Soheil Behnezhad, Avrim Blum, Mahsa Derakhshan, MohammadTaghi HajiAghayi, Mohammad Mahdian, Christos H Papadimitriou, Ronald L Rivest, Saeed Seddighin, and Philip B Stark. From battlefields to elections: Winning strategies of blotto and auditing games. In SODA, pages 2291–2310, 2018.
- [5] Jeremiah Blocki, Nicolas Christin, Anupam Datta, Ariel Procaccia, and Arunesh Sinha. Audit games with multiple defender resources. In AAAI, pages 791–797, 2015.
- [6] Niclas Boehmer, Minbiao Han, Haifeng Xu, and Milind Tambe. Escape sensing games: Detection-vs-evasion in security applications. CoRR, abs/XXXX, 2024.
- [7] Victor Bucarey, Carlos Casorrán, Óscar Figueroa, Karla Rosas, Hugo Navarrete, and Fernando Ordóñez. Building real Stackelberg security games for border patrols. In GameSec, pages 193–212, 2017.
- [8] Martin Chapman, Gareth Tyson, Peter McBurney, Michael Luck, and Simon Parsons. Playing hide-and-seek: an abstract game for cyber security. In ACySE, pages 1–8, 2014.
- [9] Benoît Colson, Patrice Marcotte, and Gilles Savard. An overview of bilevel optimization. Annals of operations research, 153:235–256, 2007.
- [10] John H. Conway. On numbers and games, Second Edition. Academic Press, 2001.
- [11] James M Crawford and Andrew B Baker. Experimental results on the application of satisfiability algorithms to scheduling problems. In AAAI, pages 1092–1097, 1994.
- [12] Erik D. Demaine. Playing games with algorithms: Algorithmic combinatorial game theory. In MFCS, pages 18–32, 2001.
- [13] Stephan Dempe and Alain Zemkoho. Bilevel optimization. In Springer optimization and its applications, volume 161. Springer, 2020.
- [14] Fei Fang, Thanh Nguyen, Benjamin Ford, Nicole Sintov, and Milind Tambe. Introduction to green security games. In IJCAI, 2015.
- [15] Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2022.
- [16] Venkatesan Guruswami, C. Pandu Rangan, Maw-Shang Chang, Gerard J. Chang, and C. K. Wong. The vertex-disjoint triangles problem. In WG, pages 26–37, 1998.
- [17] Manish Jain, Dmytro Korzhyk, Ondrej Vanek, Vincent Conitzer, Michal Pechoucek, and Milind Tambe. A double oracle algorithm for zero-sum security games on graphs. In AAMAS, pages 327–334, 2011.
- [18] Jan Karel Lenstra, AHG Rinnooy Kan, and Peter Brucker. Complexity of machine scheduling problems. In Annals of discrete mathematics, volume 1, pages 343–362. Elsevier, 1977.
- [19] Yuqian Li, Vincent Conitzer, and Dmytro Korzhyk. Catcher-evader games. In IJCAI, pages 329–337, 2016.
- [20] Christof Löding and Philipp Rohde. Solving the sabotage game is PSPACE-hard. In MFCS, pages 531–540, 2003.
- [21] Eric Shieh, Bo An, Rong Yang, Milind Tambe, Craig Baldwin, Joseph DiRenzo, Ben Maule, and Garrett Meyer. PROTECT: A deployed game theoretic system to protect the ports of the United States. In AAMAS, pages 13–20, 2012.
- [22] Evan Sultanik, Pragnesh Jay Modi, and William C Regli. On modeling multiagent task scheduling as a distributed constraint optimization problem. In IJCAI, pages 1531–1536, 2007.
- [23] Sahar Tahernejad, Ted K Ralphs, and Scott T DeNegre. A branch-and-cut algorithm for mixed integer bilevel linear optimization problems and its implementation. Math. Program. Comput., 12:529–568, 2020.
- [24] Milind Tambe. Security and game theory: algorithms, deployed systems, lessons learned. Cambridge university press, 2011.
- [25] UN-News. Somalia: Pirates attack UN aid ship, prompting call for action, June 2007. https://reliefweb.int/report/somalia/somalia-pirates-attack-un-aid-ship-prompting-call-action.
- [26] United-Nation. United nations peacekeeping missions military maritime task force manual, Sept 2015.
- [27] Ondřej Vaněk, Zhengyu Yin, Manish Jain, Branislav Bošanskỳ, Milind Tambe, and Michal Pěchouček. Game-theoretic resource allocation for malicious packet detection in computer networks. In AAMAS, pages 905–912, 2012.
- [28] John I Winn and Kevin H Govern. Maritime pirates, sea robbers, and terrorists: New approaches to emerging threats. Homeland Security Rev., 2:131, 2008.
- [29] Mihalis Yannakakis. Edge-deletion problems. SIAM J. Comput., 10(2):297–309, 1981.
- [30] Mihalis Yannakakis and Fanica Gavril. Edge dominating sets in graphs. SIAM J. Appl. Math., 38(3):364–372, 1980.
- [31] Chongjie Zhang and Julie A Shah. Co-optimizating multi-agent placement with task assignment and scheduling. In IJCAI, pages 3308–3314, 2016.
Appendix A Additional Material for Section 4.1
See Proposition 2.
Proof.
For our dynamic program, we create a table for each and . stores the minimum value of surviving targets from induced by a valid sensing plan that for each assigns target to sensor if and to no sensor if . In this case, we say that witnesses the table entry. If no such sensing plan exists, is . Clearly, the answer to the problem is .
For the initialization, we set an entry to if and to otherwise. Now, we update the table for increasing by filling as follows. We start by assuming that implying that is to be sensed by . In this case, we set the entry to if (i.e., the recharging period of would be violated) or if is not capable of sensing . In both cases, there is no sensing plan to witness the entry. Otherwise, we update the entry as:
We claim that in case an update of the entry happens here, it is correct. For this, let be the minimizer of the right hand side and the plan witnessing . Then, we can extend to a sensing plan by adding to . Plan is valid and a witness for the updated entry, since is valid and we ruled out above that is either not ready or capable of sensing .
Lastly, it remains to consider the case , where is not assigned to any sensor. In this case, we update the entry as:
Analogous to above, in case an update happens assume that is the minimizer of the right-hand side and the plan witnessing . Then is also a witness for the updated entry.
The algorithm runs in , as the table contains entries and for each entry we need to take the minimum over values.
For , let be the number of sensors of type . To extend the algorithm to sensor types, we need to prove the following lemma that allows us to collapse all sensors of the same type into a “meta”-sensor, which can sense many targets in each -time window:
Lemma 9.
There is a valid sensing plan of value if and only if there is a sensing plan of value where sensor capabilities are respected and for each at most targets from are assigned to sensors of type for .
Proof.
Let be a valid sensing plan of value . We claim that also fulfills the second condition. Assume for the sake of contradiction that there is a and some so that more than targets from are assigned to sensors of type . Then, by the pigeonhole principle, at least one sensor of type needs to sense two targets within steps, rendering the plan invalid.
For the reverse direction, assume is the sensing plan of value respecting the condition. For each sensor type , let be the targets sensed by sensors of type in . We construct a sensing plan as follows. For each sensor type , we iterate over the targets in according to their position in the target ordering and always assign a target to the sensor of type who has not sensed a target for the longest time. has the same value as and respects sensor capabilities, so it remains to argue that the recharging periods are respected. For the sake of contradiction assume that there is a sensor of type and some so that the sensor is assigned two targets from in . Then, by the construction of all other sensors of type are also assigned a target from . It follows that sensors of are assigned at least many targets from in and thereby also in , which contradicts our initial assumptions on . ∎
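The reverse direction of the lemma is constructive, and the assignment rule can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, positions are 0-based indices in the target ordering, and the code assumes that two targets sensed by the same sensor must be at least `recharge` positions apart.

```python
from typing import Dict, List


def window_condition_holds(
    positions: List[int], num_sensors: int, recharge: int
) -> bool:
    """True if every window of `recharge` consecutive positions contains
    at most `num_sensors` of the given positions."""
    pos = sorted(positions)
    for i, start in enumerate(pos):
        # It suffices to check windows that start at one of the positions.
        in_window = sum(1 for q in pos[i:] if q < start + recharge)
        if in_window > num_sensors:
            return False
    return True


def assign_least_recently_used(
    positions: List[int], num_sensors: int
) -> Dict[int, List[int]]:
    """Give each position to the sensor that has been idle for the longest time."""
    last_used = [float("-inf")] * num_sensors
    plan: Dict[int, List[int]] = {j: [] for j in range(num_sensors)}
    for p in sorted(positions):
        j = min(range(num_sensors), key=lambda k: last_used[k])
        plan[j].append(p)
        last_used[j] = p
    return plan


def respects_recharge(plan: Dict[int, List[int]], recharge: int) -> bool:
    """True if consecutive targets of every sensor are far enough apart."""
    return all(
        later - earlier >= recharge
        for assigned in plan.values()
        for earlier, later in zip(assigned, assigned[1:])
    )
```

For example, with two sensors of one type, recharging time 4, and sensed positions 0, 1, 4, 5, the window condition holds and the rule alternates between the two sensors, so each of them gets targets that are four positions apart. With positions 0, 1, 2 the window condition fails, and indeed no assignment to two sensors can respect the recharging time.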
Using this lemma, we can easily adjust the dynamic programming formulation: Instead of bookmarking the sensors that have sensed the last targets, we bookmark the types of these sensors. Now, we can set a table entry to (due to violated recharging time) if contains some sensor type more than times. The rest of the algorithm adapts in a straightforward manner with a resulting running time of ∎
Appendix B Additional Material for Section 4.2
See Theorem 4.
Proof.
Let be some target ordering. We will say that two targets are (placed) at distance (in ) if . Further, for a target we define its -surrounding to be the set of all targets whose distance from is at most .
We start the proof with two immediate claims about optimal sensing plans in response to some target ordering :
Claim 1.
Let be a sensor and be a target so that is the only sensor capable of sensing and has a higher value than all the other targets can sense combined. senses in an optimal sensing plan.
Proof.
If did not sense , we could arrive at a better valid sensing plan by letting sense only . ∎
Claim 2.
Let be a sensor with recharging time and be some target is capable of sensing. If there is no other target capable of sensing in the -surrounding of , then will be sensed by some sensor in the optimal sensing plan.
Proof.
Assume that is not sensed by any sensor. Then we can just arrive at a better valid sensing plan by letting sense : This will not violate the recharging constraint, as does not sense any other targets in the -surrounding of . ∎
We reduce from Independent Set on -regular triangle-free graphs [16, Theorem 3].
Construction
Let be a -regular triangle-free graph. For each vertex , we introduce a vertex target of value and for each a blocker vertex target of value and a vertex sensor which is capable of sensing and . Moreover, for each edge , we introduce an edge target of value and an edge sensor that can sense , , and . Additionally, we add a constraining edge target of value and a constraining edge sensor that is capable of sensing targets and . The recharging time is . We claim that there is an -sized independent set in if and only if the Stackelberg equilibrium in the constructed ESG has value at least .
Forward Direction
Assume that is an independent set of size . To construct the target ordering we go through the vertices from the independent set one by one: For , let be the three edges incident to . Then, we add to the ordering of targets the targets in this order. After we have processed all vertices from the independent set like this, we append all other targets in an arbitrary ordering. First observe that because of Claim 1, the constraining edge sensors will always sense the constraining edge targets. As a result of the structure of our target ordering, the constraining edge sensors for edges incident to cannot sense the corresponding edge targets. From this, following a reasoning analogous to Claim 1, we get that for each edge incident to a vertex from , the edge sensor senses the edge target . Moreover, we also get from Claim 1 that all three vertex sensors corresponding to a vertex sense their respective blocker vertex target. All in all, it follows that for each vertex all three corresponding vertex sensors and all edge sensors corresponding to incident edges sense a target within distance of . Thus, none of the sensors that are capable of sensing can do so without violating their recharging constraint. Hence, all vertex targets corresponding to vertices from make it unsensed through the channel.
Backward Direction
Assume that we have a target ordering so that targets of summed value at least make it unsensed through the channel under optimal play by Red. Let be the optimal sensing plan played in response by Red. From Claim 1, it is immediate that no blocker vertex target and no constraining edge target can make it unsensed through the channel. Moreover, it is also easy to see that no edge target can make it unsensed through the channel, as otherwise we could always improve the plan by letting the corresponding edge sensor sense the edge target (and delete all vertex targets the sensor senses instead). Let be the set of vertices so that the corresponding vertex targets make it unsensed through the channel. By the above observation, we have . We will now show that is an independent set.
For this, we make a series of observations: First, let be an unsensed vertex target. Note that can be sensed by different sensors (three vertex sensors and three edge sensors). By Claim 2, we have that for all of these sensors there needs to be another target that the sensor is capable of sensing in the -surrounding of . As there are targets in a -surrounding, it follows that the -surrounding of contains only targets that can be sensed by one of these six sensors (specifically one target for each of these six sensors).
Second, we show that for each unsensed vertex target there is no other vertex target in its -surrounding. Recall from the first observation that the only vertex targets that could be in the -surrounding of are vertex targets corresponding to neighbors of that are sensed by the corresponding edge sensor in . For the sake of contradiction assume that there is some so that and . Observe that the -surroundings of and overlap in at least targets. As is triangle-free and thus and do not share any common neighbors, this implies that the -surrounding of contains at most three targets that can be sensed by one of the six sensors that are capable of sensing (the other three spots will be filled with vertex blocker targets corresponding to and edge targets corresponding to edges incident to by the first observation and as makes it unsensed through the channel). Let be a sensor that is capable of sensing for which there is no other target that can sense in the -surrounding of . Next, note that by the first observation, is sensed by in . We alter the sensing plan as follows. We let instead of be sensed by and let be sensed by . As argued above, this does not violate the recharging time of the sensor . Moreover, it also does not violate the recharging time of , as there are no other targets (except ) that is capable of sensing in the -surrounding of (by the first observation). In the altered sensing plan, the value of surviving targets is strictly smaller, contradicting that is an optimal response by Red.
Third, we claim that any two unsensed vertex targets and are placed at a distance at least from each other, i.e., their -surroundings do not overlap. In order to show this, we need to examine the constraining edge sensors. Let be an edge incident to . Our first two observations told us that the edge target needs to be in the -surrounding of . However, this is not sufficient: We claim that the constraining edge target needs to be in the -surroundings of . Assume that this was not to hold. Then we could alter our sensing plan by making sense (it can do so without violating its recharging time because is not in the -surrounding of ) and making sense , which leads to a strictly better sensing plan. The claim follows. To prove the observation, for the sake of contradiction assume that the -surroundings of and did overlap. From the previous two observations, it follows that the only possibility for their -surrounding to overlap is that there is an edge and that constitutes the overlap of their -surrounding. However, in this case cannot be placed in the -surrounding of , as it can neither belong to the -surrounding of nor (by the first observation). This leads to a contradiction to our above claim.
Now, combining these observations it follows that for each , the -surrounding of contains edge targets with , , and being the edges incident to . Moreover, we have shown that the -surrounding of each target with is disjoint. This implies that no two vertices can be incident to the same edge , as otherwise, would be in the -surrounding of and , which leads to a contradiction as they are disjoint. It follows that is an independent set of size at least .
∎
See Proposition 5.
Proof.
We start by giving the formulation, where the inner-level program is similar to the ILP presented in Proposition 3. Note that the outer-level maximizes the value of non-sensed targets, while the inner-level minimizes this value:
(2) | ||||
s.t. | (3) | |||
(4) | ||||
(5) | ||||
(6) | ||||
s.t. | (7) | |||
(8) | ||||
(9) |
The outer-level program controls an integer variable for each target that encodes the position in which the target appears in the final ordering. Moreover, for each pair of targets , we have a binary variable capturing whether appears at least positions before in the ordering induced by the variables. Equation 3 ensures that each target is assigned a unique position, and Equations 5 and 6 ensure that is set to one if and only if (in Equations 5 and 6 we have if ; and if ). (Footnote 10: For each , to convert Equation 3 into a linear constraint, we have to introduce a new binary variable . We then add two linear constraints and that are satisfied if and only if .)
The inner-level program controls binary variables for each and , which encode the sensing plan as in Proposition 3, i.e., for and implies that is sensed by . The value of the encoded plan is again , which Blue wants to maximize (Equation 2) and Red wants to minimize (Equation 6). The validity of the sensing plan is ensured by Equations 7, 8 and 9, where Equation 9 imposes that a sensor can only sense two targets and if either is at least positions before in the ordering encoded in the variables (i.e., ) or the other way around (i.e., ). ∎
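The linearization of the uniqueness requirement behind Equation 3 follows a standard big-M pattern, which can be written out as below. The symbols are assumptions chosen for illustration and need not match the paper's notation: $p_t \in \{1,\dots,n\}$ denotes the position of target $t$, $n$ is the number of targets, and $b_{t,t'}$ is the auxiliary binary variable for the pair $t \neq t'$.

```latex
\begin{align*}
p_t - p_{t'} &\ge 1 - n \, b_{t,t'}, \\
p_{t'} - p_t &\ge 1 - n \, (1 - b_{t,t'}), \\
b_{t,t'} &\in \{0, 1\}.
\end{align*}
```

If $b_{t,t'} = 0$, the first inequality forces $p_t > p_{t'}$ and the second one is trivially satisfied because two positions differ by at most $n - 1$. If $b_{t,t'} = 1$, the roles are swapped. Hence both inequalities can be satisfied exactly when $p_t \neq p_{t'}$.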
We give our greedy approximation algorithm for Best Red Response in Algorithm 3.
Appendix C Additional Material for Section 6.1
C.1 Proof of Theorem 6
C.1.1 Connection Best Blue Response and Minimum Maximal Matching
On an intuitive level, solving Best Blue Response with infinite recharging time and uniform target values has some similarities to solving the classic NP-hard Minimum Maximal Matching problem [30]:
Minimum Maximal Matching
Input: A bipartite graph and an integer .
Question: Is there a maximal matching in containing at most edges?
We now discuss the intuitive connection as well as reasons why immediate reductions between the two problems are prohibited. Assume that we have a solution to our Best Blue Response instance where targets are ordered as . From this let us construct a bipartite graph with vertices on the left side and vertices on the right side. For the edge set , we add an edge between a target and a sensor if is capable of sensing . Let now be the set of sensor-target pairs with if senses when targets are sent according to . It needs to hold that is a maximal matching in : Otherwise, there is some . The existence of this edge implies that target made it through the channel and sensor did not sense any target, which leads to a contradiction. Moreover, the size of this matching, i.e., , corresponds to the number of lost targets. Thus, Blue wants to find a maximal matching of minimum size. This discussion suggests a close connection between Best Blue Response and Minimum Maximal Matching.
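The claim that greedy sensing with infinite recharging time always yields a maximal matching can be checked mechanically on small instances. The following sketch does so for a hypothetical instance; the function names and the instance are assumptions for illustration, and infinite recharging time is modeled by letting every sensor sense at most one target.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Set, Tuple


def greedy_matching(
    ordering: Sequence[str], sensors: List[str], can_sense: Dict[str, Set[str]]
) -> Set[Tuple[str, str]]:
    """Sensor-target pairs produced by greedy sensors that fire at most once."""
    used: Set[str] = set()
    matching: Set[Tuple[str, str]] = set()
    for target in ordering:
        for sensor in sensors:  # sensors in their predefined order
            if sensor not in used and target in can_sense[sensor]:
                used.add(sensor)
                matching.add((sensor, target))
                break
    return matching


def is_maximal_matching(
    matching: Set[Tuple[str, str]],
    sensors: List[str],
    targets: Sequence[str],
    can_sense: Dict[str, Set[str]],
) -> bool:
    """Check the matching property and that no edge can be added."""
    matched_sensors = {s for s, _ in matching}
    matched_targets = {t for _, t in matching}
    # Every sensor and every target appears in at most one pair.
    if len(matched_sensors) < len(matching) or len(matched_targets) < len(matching):
        return False
    # Every pair must be an edge of the capability graph.
    if any(t not in can_sense[s] for s, t in matching):
        return False
    # Maximality: no edge whose endpoints are both unmatched.
    for s in sensors:
        for t in targets:
            if t in can_sense[s] and s not in matched_sensors and t not in matched_targets:
                return False
    return True


# Hypothetical instance with three sensors and three targets.
SENSORS = ["s1", "s2", "s3"]
TARGETS = ["a", "b", "c"]
CAN_SENSE = {"s1": {"a", "b"}, "s2": {"b", "c"}, "s3": {"c"}}

all_maximal = all(
    is_maximal_matching(greedy_matching(order, SENSORS, CAN_SENSE), SENSORS, TARGETS, CAN_SENSE)
    for order in permutations(TARGETS)
)
```

The size of the returned matching equals the number of lost targets, so under these assumptions Blue is looking for an ordering whose induced matching is as small as possible.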
However, there are some crucial differences between the two problems which prohibit immediate reductions from one problem to the other: Most crucially, assume we were to model a bipartite graph as an instance of Best Blue Response by letting be the targets and the sensors (in some ordering ) and let be capable of sensing if . The problem with this construction is that we cannot model arbitrary matchings as solutions to the constructed Best Blue Response instance: Assume that contains some edge and there is some with , , and is not incident to any edges from . In this case, it is not possible to send the targets through the channel such that will be the sensed sensor-target pairs because will always be sensed by before it can be sensed by (which in turn implies that is still ready to sense other targets). As a consequence, intuitively speaking, in instances of Best Blue Response we are only interested in maximal matchings where each matched vertex from the left is matched to its “first” otherwise unmatched neighbor from the right side.
Because of this, we need to turn to a slightly more involved reduction that draws inspiration from the NP-hardness proof of Minimum Maximal Matching by Yannakakis and Gavril [30], yet requires some reworking of the construction and a different, more involved proof. We reduce from the following SAT variant.
C.1.2 Proof of Correctness
See 6
Proof.
In this proof, for a target , we let be the set of sensors that are capable of sensing .
We reduce from the following problem, which is NP-hard as proven by Yannakakis [29].
Restricted 3-Sat
Input: A propositional formula where each clause contains three literals and each variable appears in exactly two clauses positively and in exactly one clause negatively.
Question: Is there an assignment to variables in such that each clause from is satisfied?
Let be a given Restricted 3-Sat instance.
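To make the input restrictions concrete, the following sketch checks whether a formula has the Restricted 3-Sat shape and, for tiny instances, searches for a satisfying assignment by brute force. The encoding (clauses as tuples of signed integers, `+v` for a variable and `-v` for its negation) and the function names are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product

def is_restricted_3sat(clauses):
    """Check: three literals per clause; each variable occurs in exactly
    two clauses positively and in exactly one clause negatively."""
    if any(len(clause) != 3 for clause in clauses):
        return False
    pos, neg = Counter(), Counter()
    for clause in clauses:
        for lit in set(clause):  # count clauses, not repeated literals
            (pos if lit > 0 else neg)[abs(lit)] += 1
    variables = set(pos) | set(neg)
    return all(pos[v] == 2 and neg[v] == 1 for v in variables)

def brute_force_sat(clauses):
    """Return some satisfying assignment as a dict, or None (exponential time)."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    for bits in product([False, True], repeat=len(variables)):
        assign = dict(zip(variables, bits))
        if all(any(assign[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return assign
    return None
```

For example, the formula with clauses `(1, 2, -3)`, `(1, -2, 3)`, `(-1, 2, 3)` passes the shape check, since every variable appears twice positively and once negatively.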
Construction.
Each target has value one. For each clause , we add a clause target and a clause sensor with .
For each variable , we add a variable gadget. That is, we add variable targets , and together with dummy targets , , , and . Next, we add catch sensors , , and , which ensure that none of the variable targets can make it through the channel. Moreover, we add variable sensors , and together with dummy sensors and .
Let be three clauses such that appears positive in clauses and and negative in clause . The sensing matrix is defined through:
The recharging time is , i.e., each sensor can sense at most one target. The ordering of the sensors is as follows. First, come the clause sensors (in some arbitrary ordering), then the dummy sensors (in some arbitrary ordering), then the variable sensors (in some arbitrary ordering) and last the catch sensors (in some arbitrary ordering). We ask whether there is an ordering of targets so that at least targets are not sensed. It is easy to see that the construction satisfies the restrictions from the theorem statement.
Proof of Correctness: Forward Direction
Assume we are given an assignment of variables that fulfills . Let be the set of variables set to true in this assignment. From this, we construct a partition of the targets into four groups that determine the ordering in which the targets move through the channel; the first group comes first and so on; the ordering of targets within one group is arbitrary. The first group consists of targets and for each and target for each . The second group consists of targets and for each and and for each . The third group contains all remaining variable targets. And, finally, the fourth group contains all clause targets and the remaining dummy targets. It is sufficient to prove that all targets from the fourth group make it through the channel. We prove this via a series of three claims.
Claim 3.
1. Each clause sensor senses a target from the first group.
2. Each dummy sensor senses a target from the second group.
3. For each , senses a target from one of the first three groups, and for each , and sense a target from one of the first three groups.
Proof.
Proof of 1. This follows directly from the fact that is a satisfying assignment and that the clause sensors come first in the ordering of sensors.
Proof of 2. This follows directly from the fact that the dummy sensors come after the clause sensors in the sensor ordering and as the second group contains for each or (making sense a target) and or (making sense a target).
Proof of 3. Let us focus on one , and let be three clauses such that appears positive in clauses and and negative in clause . If , then is part of the third group. However, from Statement 1 it follows that already sensed a previous target. As the variable sensors are before the catch sensors in the sensor order, it follows that senses . Similarly, if , and are part of the third group. Both and have already sensed previous targets because of Statement 1. From this Statement 3 follows. ∎
The claim implies that all sensors that can sense a target from the fourth group have already sensed another target before it is the fourth group’s turn, implying that all targets from the fourth group will make it through the channel.
Proof of Correctness: Backward Direction
Assume that there is an ordering of the targets such that at least targets move unsensed through the channel, and let be the set of these targets.
We prove the backward direction in a series of claims:
Claim 4.
1. No variable target is part of .
2. For each , either or and either or is part of . All clause targets are part of .
3. For each , if is sensed by a clause sensor, then neither nor is sensed by a clause sensor.
Proof.
Proof of 1. This follows immediately from the existence of a designated catch sensor for each variable target which can only sense this target. As a consequence, no variable target can ever make it unsensed through the channel.
Proof of 2. Note that because of the sensor it is never possible that both and make it unsensed through the channel. Similarly, because of it is never possible that both and make it unsensed through the channel. Together with Statement 1, this implies that from each variable gadget at most targets can be part of . By recalling that and that there are only clause targets outside of variable gadgets, the statement follows.
Proof of 3. Let us focus on (the proof for is analogous). For the sake of contradiction assume that and are both sensed by clause sensors, then both and do not sense a variable target, respectively. Accordingly, at most one target out of can make it unsensed through the channel, contradicting Statement 2. ∎
Let be a truth assignment that sets to false if is sensed by a clause sensor and to true if or is sensed by a clause sensor. If neither of the two conditions holds, then we set to true. Note that the well-definedness of follows immediately from Statement 3 of Claim 4. Assume that is not satisfied by . This implies that does not sense a variable target corresponding to a literal appearing in (by the definition of ). This in turn implies that will sense , a contradiction to Statement 2 of Claim 4. ∎
C.2 Proof of Proposition 7
See 7
Proof.
Recall that we assume that sensors act greedily and that the sensor ordering is fixed and known. We iteratively construct the target ordering, always appending an additional target at the end of the ordering, while storing the types of already sent targets as well as the sensors that sensed the last targets. For our dynamic program, we create a table with for each and . For a table cell, let , i.e., is the total number of targets that have been sent. An entry of the table stores the maximum value of targets that can survive if Blue sends targets of type (for each ) through the channel in a way that the th last target sent for is sensed by sensor if and by no sensor if (the intuition is that is the sensor that sensed the most recent passing target (if existent), sensed the second most recent one, and so on). If the second constraint is not realizable, we set the table entry to . The answer to our problem is .
For the initialization, we set an entry to if and to otherwise. Now, we update the table for increasing by filling as follows. We start by assuming that implying that the next target to be sent needs to be sensed by . If appears among , we set the entry to . Otherwise, let be the target types so that is capable of sensing targets of this type and only sensors from appear before in the sensor ordering and are capable of sensing targets of this type. Less formally speaking, are all the target types so that if a target of this type is sent next over the channel would be the sensor sensing this target (as all other sensors placed before that can sense targets of this type are still recharging, i.e., they are part of ). If no such target type exists, we set to . Otherwise, for each , we check and let the entry be the maximum of these values.
Analogously, if , we let be the target types where contains all the sensors that are capable of sensing targets of this type. If no such target type exists, we set the entry to be . Otherwise, for each , we compute plus the utility of targets of type and let the entry be the maximum of these values.
The correctness follows from the fact that the initially stated invariant is preserved throughout the algorithm. Observing that computing each table entry takes time, the claimed running time of follows.
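The idea behind the dynamic program can be illustrated with a top-down, memoized variant. This is a sketch and not the table-filling procedure itself: it recurses forward over the state (remaining targets per type, sensors that sensed the most recent targets) rather than filling the table for increasing totals. The function name and the recharge convention (a sensor that sensed one of the last `tau` targets is still recharging) are assumptions.

```python
from functools import lru_cache

def best_blue_value(counts, values, can_sense, tau):
    """Maximum total value Blue can get through the channel.

    counts[ty]   : number of targets of type ty
    values[ty]   : value of one target of type ty
    can_sense[s] : set of types sensor s can sense (sensors in channel order)
    tau          : assumed recharge time, counted in passing targets
    """
    n_types = len(counts)

    @lru_cache(maxsize=None)
    def solve(remaining, recent):
        # recent[i] is the sensor that sensed the (i+1)-th last target, or None
        if sum(remaining) == 0:
            return 0
        best = float("-inf")
        for ty in range(n_types):
            if remaining[ty] == 0:
                continue
            # Greedy sensors: the first capable, non-recharging sensor senses.
            catcher = next((s for s, types in enumerate(can_sense)
                            if ty in types and s not in recent), None)
            gain = values[ty] if catcher is None else 0
            nxt = remaining[:ty] + (remaining[ty] - 1,) + remaining[ty + 1:]
            new_recent = ((catcher,) + recent)[:tau]
            best = max(best, gain + solve(nxt, new_recent))
        return best

    return solve(tuple(counts), (None,) * tau)
```

The number of memoized states is bounded by the number of count vectors times the number of length-`tau` sensor histories, which mirrors the table dimensions described in the proof.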
∎
C.3 Proof of Proposition 8
See 8
Proof.
We model an instance of Best Blue Response as follows. For each target , we add an integer variable that encodes the position in which the target appears in the final ordering. We add linear constraints so that and for all (see Footnote 10).
Next, similarly to Proposition 3, for each and , we add a binary variable . Setting to one means that is detected by sensor or in case that that the target makes it unsensed through the channel. Accordingly, the objective becomes:
For each target , we impose that:
Moreover, we impose for each and , enforcing the sensor capabilities.
To ensure that the recharging times of sensors are respected we add the following set of constraints. For each, and , we add
This ensures that if and , then and are placed far enough away from each other, while otherwise the condition is vacuous. To realize the absolute value from the above equation, we have to introduce another set of binary variables for and add the constraints: and .
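One standard way to linearize such an absolute-value condition is a big-M disjunction. The block below is a sketch in assumed notation, not necessarily the exact constraints used here: $x_t$ denotes the position of target $t$, $y_{t,s}$ is one if sensor $s$ senses $t$, $z_{t,t',s}$ is the auxiliary binary variable, $D$ is the minimum gap in positions required by the recharging time, and $M \ge n + D$ is a sufficiently large constant for $n$ targets.

```latex
% Assumed notation (not taken verbatim from the paper):
%   x_t position of target t, y_{t,s} = 1 iff sensor s senses t,
%   z_{t,t',s} auxiliary binary, D required gap, M >= n + D.
\begin{align*}
  |x_t - x_{t'}| &\ge D\,(y_{t,s} + y_{t',s} - 1)
    && \text{(nonlinear form)} \\
  x_t - x_{t'} &\ge D\,(y_{t,s} + y_{t',s} - 1) - M\, z_{t,t',s}
    && \text{(active if } z_{t,t',s} = 0\text{)} \\
  x_{t'} - x_t &\ge D\,(y_{t,s} + y_{t',s} - 1) - M\,(1 - z_{t,t',s})
    && \text{(active if } z_{t,t',s} = 1\text{)}
\end{align*}
```

If at most one of the two $y$ variables is one, the right-hand side is nonpositive and one of the two linear constraints can always be satisfied, so the condition is vacuous. If both are one, exactly one direction of the gap $D$ is enforced, depending on $z_{t,t',s}$.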
If a target is sensed by sensor , then due to the recharging time sensor will not be able to sense other targets for some time and thus “protects” some targets from being sensed by sensor . To capture this information, for each and , we add a binary variable that is equal to one if target is sensed by sensor and because of this cannot sense . To ensure this, first, for each and , we add:
(a target can only protect other targets if the target is sensed by the corresponding sensor). Moreover, for each and , we need to make sure that if , then (to exploit the recharging constraint). For this, we add constraints:
(10)
Lastly, we need to make sure that a target will survive until step if i.e., the target needs to be covered by other targets for all sensors that are capable of sensing it placed before . Note that this together with the first constraint () in particular implies that each target is sensed by the first sensor it passes which is not recharging, thereby successfully encoding the greedy behavior of the sensors. Specifically, we add the following set of constraints for each and :
(11)
∎
Appendix D Additional Experimental Results
To begin, Figure 2 illustrates the average utility for Blue across varying probabilities of . This visualization demonstrates the impact of the probability of on Blue’s utility under the Default game settings.
In Table 7, we show the effectiveness of our ILP solver. Notably, it solves large instances very fast, completing the task within a second. This speed has been valuable in developing heuristic algorithms for the bilevel optimization problem of finding the Stackelberg equilibrium. In Table 8, we also show that our ILP solver can solve extensive instances involving hundreds of targets within a single hour.
Utility, Time (s) | 2 | 5 | 10 |
5 | 1.76 ± 0.75, 0.001 | 1 ± 0.63, 0.002 ± 0.02 | 0.38 ± 0.43, 0.003 ± 0.002 |
25 | 8.73 ± 1.27, 0.004 ± 0.002 | 4.55 ± 1.3, 0.008 ± 0.002 | 1.39 ± 0.84, 0.01 ± 0.003 |
75 | 26 ± 2.8, 0.01 ± 0.007 | 13.9 ± 2.33, 0.02 ± 0.007 | 4.41 ± 1.57, 0.04 ± 0.007 |
Utility, Time (s) | 5 | 10 | 20 |
600 | 175 ± 6, 0.73 ± 0.01 | 77 ± 5, 1.55 ± 0.22 | 4.01 ± 1.53, 2.79 ± 0.04 |
800 | 234 ± 6.5, 1.02 ± 0.01 | 103 ± 6.1, 2.09 ± 0.27 | 5.28 ± 2.06, 3.82 ± 0.04 |
1000 | 292 ± 7.2, 1.23 ± 0.02 | 131 ± 6.7, 2.62 ± 0.35 | 6.4 ± 2, 5 ± 0.06 |
5000 | 1478 ± 20, 6.5 ± 0.05 | 662 ± 13.1, 13.66 ± 1.48 | 33.1 ± 3.98, 24.76 ± 0.051 |
10,000 | 2959 ± 24, 12.9 ± 0.18 | 1321 ± 20, 29.8 ± 2.6 | 66.3 ± 5.57, 62.59 ± 0.83 |
In the remaining parts of this section, we explore a new game setting (Append) for generating ESGs. The new method is similar to Default, with the distinction that each element with a 0.5 probability. In essence, this configuration increases the likelihood of each target being sensed compared to the Default setting, thereby resulting in a stronger Red sensing model.
D.1 Computing the Follower Strategy
As for the Default game setting, we show the scalability results for Append in Table 9.
Utility, Time (s) | 5 | 10 | 20 |
600 | 67.8 ± 4.3, 0.74 ± 0.06 | 13.82 ± 3.96, 4.68 ± 8.46 | 0.2 ± 1.38, 2.85 ± 0.04 |
800 | 91.1 ± 5.8, 1.02 ± 0.1 | 19.6 ± 7.75, 6.27 ± 10.98 | 0.13 ± 0.36, 3.66 ± 0.04 |
1000 | 114 ± 7.9, 1.29 ± 0.14 | 24.5 ± 5.9, 6.06 ± 4.7 | 0.25 ± 1.45, 4.77 ± 0.06 |
In this new setting, an intriguing observation emerges as the number of sensors increases significantly: the runtime of our ILP decreases, given that the abundant sensors can effectively sense all targets (e.g., when the number of sensors increases from 10 to 20).
D.2 Additional Results from Computing the Stackelberg Equilibrium
In this subsection, we begin with Figure 3, which illustrates the impact of the choice of ratio () on the SA algorithm discussed in Section 4.2. We also present and test three quadratic-time greedy heuristics that build the sensing plan iteratively by trying to sense the most valuable targets first. We consider the targets in decreasing order of their value. Let be the already processed targets and the target to consider. Moreover, let be the set of sensors so that remains a valid sensing plan after adding to , i.e., the sensors that are currently free to sense . If is empty, then we do not assign to any sensor, implying that it will be won by Blue. Otherwise, we apply one of three different methods to decide which sensor from to pick:
- random: Randomly select a sensor from .
- remaining_value: Pick the sensor from that has the lowest summed value of remaining targets that is capable of sensing, i.e., .
- harm: Pick the sensor from for which assigning does the least harm. The harm that does to is the summed value of the remaining targets that is capable of sensing but can no longer sense once is assigned to , i.e., .
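The three selection rules can be sketched as follows. This is an illustrative reading of the prose descriptions, not a reference implementation: targets are identified by their position in Blue's ordering, a sensing plan is assumed to be valid if any two targets assigned to the same sensor are more than `tau` positions apart, and the function name and the concrete scoring of `harm` are assumptions.

```python
import random

def greedy_sensing_plan(values, can_sense, tau, rule="harm", seed=0):
    """Build a sensing plan {position: sensor} for Red.

    values[p]    : value of the target at position p in Blue's ordering
    can_sense[s] : set of positions whose targets sensor s can sense
    tau          : assumed recharge time, counted in positions
    rule         : "random", "remaining_value", or "harm"
    """
    rng = random.Random(seed)
    plan = {}
    assigned = [[] for _ in can_sense]  # positions already given to each sensor
    order = sorted(range(len(values)), key=lambda p: -values[p])

    def free(s, p):
        # Sensor s can still take position p without violating recharging.
        return p in can_sense[s] and all(abs(p - q) > tau for q in assigned[s])

    for i, p in enumerate(order):
        remaining = order[i + 1:]  # targets not yet processed
        candidates = [s for s in range(len(can_sense)) if free(s, p)]
        if not candidates:
            continue  # no free sensor: the target is won by Blue
        if rule == "random":
            s = rng.choice(candidates)
        elif rule == "remaining_value":
            s = min(candidates, key=lambda c: sum(
                values[q] for q in remaining if q in can_sense[c]))
        else:  # "harm": value of remaining targets c would have to give up
            s = min(candidates, key=lambda c: sum(
                values[q] for q in remaining
                if free(c, q) and abs(p - q) <= tau))
        plan[p] = s
        assigned[s].append(p)
    return plan
```

For instance, with values `[5, 4]`, a first sensor capable of sensing both positions, a second sensor capable of sensing only position 0, and `tau = 1`, both `remaining_value` and `harm` give position 0 to the second sensor, which leaves the first sensor free for position 1.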
We also present the bilevel ILP’s scalability results under the new Append game setting in Table 10.
Utility, Time (s) | 2 | 3 | 5 |
5 | 0.72 ± 0.55, 0.73 ± 0.17 | 0.41 ± 0.49, 0.85 ± 0.12 | 0.14 ± 0.33, 1.02 ± 0.04 |
7 | 0.92 ± 0.43, 121 ± 13 | 0.57 ± 0.53, 134 ± 17 | 0.18 ± 0.33, 155 ± 16 |
8 | 0.86 ± 0.51, 1841 ± 356 | 0.41 ± 0.42, 2151 ± 253 | 0.13 ± 0.27, 2832 ± 549 |
9 | n/a, 31358 | n/a, 36181 | n/a, 41999 |
We also provide a comparison of heuristic algorithms under the new Append game setting in Table 11.
Utility, Time (s) | (7,3,2) | (75, 10, 5) |
OPT | 0.58 ± 0.53, 134 ± 17 | n/a |
SA | 0.58 ± 0.53, 6.13 ± 0.87 | 2.26 ± 1.13, 27335 ± 347 |
SA_Relax | 0.58 ± 0.53, 0.043 ± 0.01 | 0.56 ± 0.48, 46.87 ± 0.92 |
Random | 0.50 ± 0.54, 0.001 | 0.15 ± 0.27, 0.001 |
D.3 Additional Results from Non-Coordinated Sensing
In Table 12, we present the scalability results of the ILP that solves for optimal Blue responses in the non-coordinated Red sensing setting under the Append game setting.
Utility, Time (s) | 2 | 3 | 5 |
5 | 1.26 ± 0.46, 0.01 ± 0.04 | 0.96 ± 0.47, 0.02 ± 0.04 | 0.52 ± 0.42, 0.02 ± 0.04 |
10 | 2.34 ± 0.7, 4.07 ± 16.8 | 1.85 ± 0.89, 90.2 ± 294.6 | 1 ± 0.73, 293 ± 565 |
15 | 3.79 ± 0.87, 482 ± 2551 | 3.01 ± 1.16, 50.4 ± 18.9 | n/a, 40296 ± n/a |
20 | 4.61 ± 1, 889 ± 1530 | n/a, 9380 ± n/a | n/a, n/a |
For the non-coordinated sensing setting, we also provide a comparison of heuristic algorithms under the new scenario in Table 13.
Utility, Time (s) | (10,5,2) | (75, 10, 5) |
OPT | 1 ± 0.73, 293 ± 565 | n/a |
SA | 0.83 ± 0.67, 0.76 ± 0.02 | 4.57 ± 1.46, 523 ± 6.28 |
SA_Relax | 0.94 ± 0.71, 0.02 ± 0.003 | 3.47 ± 1.02, 3.13 ± 0.12 |
Random | 0.38 ± 0.5, 0.001 | 1.28 ± 1.05, 0.001 |
D.3.1 Power of Coordination
Finally, in Table 14, we present the power-of-coordination results under the Append game setting, in which Red has stronger sensing capabilities.
Utility | Greedy | Coordination |
(5, 2, 2) | 1.26 ± 0.46 | 0.72 ± 0.55 |
(5, 3, 2) | 0.96 ± 0.47 | 0.41 ± 0.49 |
(5, 5, 2) | 0.52 ± 0.42 | 0.14 ± 0.33 |
Moreover, in scenarios with large instance sizes, such as (75, 10, 5), where the optimal (bilevel) ILP is unsolvable, we can compare Blue’s approximately optimal utility under the best heuristic algorithms. Specifically, as shown in Table 11, Blue’s average utility under the SA algorithm is 2.26 for 9 instances. Given the same 9 instances, Blue’s average utility when escaping from non-coordinated sensing is 4.57 (Table 13), which is approximately twice the value observed in the coordinated sensing setting. Once again, due to the constant-sum nature of this game, the utility loss for Blue in transitioning from non-coordinated sensing to coordinated sensing essentially represents the utility gain for Red, highlighting the power of coordination.
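As a small worked example of this comparison, the snippet below computes Red's gain from coordination as the drop in Blue's mean utility, using the mean values reported in Table 14. The dictionary layout and function name are illustrative assumptions; standard deviations are ignored.

```python
# Mean Blue utilities from Table 14: instance -> (greedy Red, coordinated Red).
table14 = {
    (5, 2, 2): (1.26, 0.72),
    (5, 3, 2): (0.96, 0.41),
    (5, 5, 2): (0.52, 0.14),
}

def coordination_gain(table):
    """Red's gain from coordinating equals Blue's utility loss (constant-sum)."""
    return {inst: round(greedy - coord, 2)
            for inst, (greedy, coord) in table.items()}

gains = coordination_gain(table14)
```

Under these means, Blue loses roughly 0.4 to 0.55 utility per instance when Red coordinates its sensors.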