Escape Sensing Games:
Detection-vs-Evasion in Security Applications

Niclas Boehmer*
Harvard University

Minbiao Han*
University of Chicago

Haifeng Xu
University of Chicago

Milind Tambe
Harvard University

*Equal contribution.

Abstract

Traditional game-theoretic research for security applications primarily focuses on the allocation of external protection resources to defend targets. This work puts forward the study of a new class of games centered around strategically arranging targets to protect them against a constrained adversary, with motivations from varied domains such as peacekeeping resource transit and cybersecurity. Specifically, we introduce Escape Sensing Games (ESGs). In ESGs, a blue player manages the order in which targets pass through a channel, while her opponent tries to capture the targets using a set of sensors that need some time to recharge after each activation. We present a thorough computational study of ESGs. Among other results, we show that it is NP-hard to compute best responses and equilibria. Nevertheless, we propose a variety of effective (heuristic) algorithms whose quality we demonstrate in extensive computational experiments.

1 Introduction

The past decade has witnessed an influential line of research in AI, particularly multiagent systems (MAS), that employs computational game theory to tackle critical challenges in security and public safety applications, ranging from protecting national ports [21] to combating smuggling [7] and illegal poaching [14] to defending our cyber systems [27]. At the core of almost all of these problems lies the task of optimizing the allocation of (often limited) external forces to protect critical targets. In this work, we adopt a similar computational game theory approach but address a fundamentally different type of security challenge that looks to improve security via optimizing the arrangement of targets in the face of adversaries. This research contributes a novel perspective to game-theoretic security strategies, emphasizing target arrangement as a defense mechanism against adversaries.

Specifically, we introduce and study Escape Sensing Games (ESGs). In these games, a blue player aims to securely navigate a set of targets through a channel, whereas her opponent, the red player, controls a set of sensors along the channel and tries to sense (and therefore “steal”) as many targets as possible. This model captures strategic interactions arising in various domains. One example is the transportation of peacekeeping resources using a convoy of ships or cars over a fixed route with malicious actors (e.g., pirates or hostile forces) trying to intercept them [25] (see Section 2 for details). In cybersecurity, the blue player could model a network administrator routing sensitive data packets through a network with an attacker trying to intercept them. Our model captures strategic interactions in these settings arising when security measures are either unavailable or have already been allocated and the blue player is only left with scheduling the targets to avoid detection by the attacker.

We study the optimal sequential play in ESGs, where the blue player first commits to an ordering of targets followed by the red player devising an optimal sensing plan. Herein, sensors’ capabilities are limited in two ways. First, each sensor is only capable of sensing certain targets, modeling that detection and interception technologies are not uniformly effective across different targets, due to differing characteristics such as size, speed, or defense mechanisms. Second, sensors need a certain time to recharge after sensing a target, modeling limits inherent in detection and interception systems, where permanent action is not feasible.

There are certain challenges integral to our model that make the computation of equilibria highly non-trivial. First, the action space of both players has an exponential size, rendering standard solution approaches such as support enumeration computationally infeasible. Second, even after the strategies of both players have been fixed, the game evolves in a complex, sequential fashion with targets moving one after another through the channel. Connected to this, third, it turns out that the red player’s best response problem of coming up with an optimal sensing plan given a target ordering is already NP-hard. Consequently, this paper also contributes to the algorithmic research on computationally challenging games, a fairly unexplored topic outside of combinatorial game theory [10, 20, 12].

1.1 Our Contribution

We contribute a new perspective to the rich literature on computational game theory for security applications through our study of the previously overlooked problem of target arrangement. Specifically, we introduce and analyze Escape Sensing Games with a focus on the target-controlling blue player. We demonstrate that solving this game is highly complex, as we prove that it is NP-hard for both players to compute their optimal strategies. To nevertheless be able to solve ESGs in practice, we devise algorithms for computing the red player’s strategy, which turn out to scale well in our experiments. Computing the blue player’s strategy and thereby the game’s Stackelberg equilibrium turns out to be a much more intricate task. Our experiments show that our formulation of the problem as a bilevel program is only capable of solving small instances of the game exactly. Motivated by this, we present a heuristic that effectively combines simulated annealing with a greedy heuristic and an Integer Linear Program (ILP) for computing the red player’s strategy. We demonstrate the quality of our heuristic through extensive experiments.

We further this investigation in Section 6 by studying a different variant where sensors are decentralized, so that each sensor acts independently according to a simple greedy strategy. We show that it remains NP-hard for the blue player to compute its optimal strategy. While in this setting blue’s problem admits an ILP formulation, we demonstrate in experiments that it can only solve up to medium-sized instances. Addressing this, we present heuristics that perform well in our experiments. We also demonstrate that while sensors usually have some gain from coordination, this gain depends decisively on the instance structure and is oftentimes rather small (below $20\%$).

Full proofs of all results and descriptions of additional experiments can be found in our full version [6].

2 Escape Sensing Games (ESGs): The Model

Figure 1: A visual representation of an Escape Sensing Game.

In an Escape Sensing Game (ESG) a blue player (henceforth Blue) tries to route a set of targets through a channel. They compete against a red player (henceforth Red) who controls a set of sensors and tries to sense (and therefore “steal”) as many of Blue’s targets as possible. [Footnote 2: Note that the terms “sensor” and “sensing” are only part of our terminology and do not limit the applications of our model. For instance, instead of “sensing” the targets, Red might also aim to intercept them.] Formally, an Escape Sensing Game is defined by

1. a set of targets $T=\{t_{1},\dots,t_{n}\}$, each target $t_{i}\in T$ equipped with some utility value $v_{i}\in\mathbb{R}^{+}$,
2. a set of sensors $S=\{s_{1},\dots,s_{k}\}$ and a recharging time $\tau\in\mathbb{N}\cup\{\infty\}$ which is the same for all sensors, and
3. a sensing matrix $D\in\{0,1\}^{n\times k}$ where $D_{i,j}=1$ means that sensor $s_{j}$ is capable of sensing target $t_{i}$.

We assume that all of these parts are known to both players at any point in time. Note that in this paper, we consider the constant-sum utility structure and leave the general-sum version for future work. That is, Blue seeks to maximize the summed value of not-sensed targets, i.e., targets not sensed by any sensor. In contrast, Red seeks to minimize this value, or equivalently, maximize the summed value of targets that are sensed by some sensor.

The strategy of Blue is an ordering of the targets $\sigma:T\to[n]$ that assigns each target $t\in T$ a unique position $\sigma(t)$ , i.e., $\sigma$ is a bijection. The targets move through the channel according to $\sigma$ , i.e., the target on position $1$ moves first, on position $2$ second, and so on. In each time step each target moves to the next sensor, leaves the channel (in case it passed all sensors), or enters the channel at the first sensor (in case it is the next target in the ordering $\sigma$ ).

The strategy of Red is a sensing plan $\psi:S\to 2^{T}$ that maps each sensor to a subset of targets $T^{\prime}\subseteq T$ sensed by the sensor, where each sensor senses different targets, i.e., $\psi(s)\cap\psi(s^{\prime})=\emptyset$ for each $s\neq s^{\prime}\in S$. Red cannot play arbitrary sensing plans but only those which are valid. A sensing plan is valid (with respect to a target ordering $\sigma$) if (i) a sensor only senses targets it has the capabilities to sense, i.e., for each $s_{j}\in S$ and $t_{i}\in\psi(s_{j})$ we have $D_{i,j}=1$, and (ii) a sensor pauses for at least $\tau$ time steps after sensing a target, i.e., for each $s\in S$ and distinct $t,t^{\prime}\in\psi(s)$ we have $|\sigma(t)-\sigma(t^{\prime})|>\tau$. Given a sensing plan, we can immediately calculate the value of not-sensed targets as $v(\psi):=\sum_{t_{i}\in T\setminus\cup_{s\in S}\psi(s)}v_{i}$, which quantifies Blue’s utility.
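The two validity conditions and the utility $v(\psi)$ can be checked mechanically. The following is a minimal Python sketch under assumed data structures that are not part of the model itself: targets and sensors are 0-based indices, `sigma` maps a target to its position, `plan` maps a sensor to the set of targets it senses, and `D` is a list of rows. The function names are illustrative.

```python
import math

def is_valid_plan(sigma, plan, D, tau):
    """Check whether `plan` is a valid sensing plan with respect to `sigma`."""
    already_sensed = set()
    for j, targets in plan.items():
        # Different sensors must sense disjoint sets of targets.
        if already_sensed & targets:
            return False
        already_sensed |= targets
        # (i) A sensor only senses targets it is capable of sensing.
        if any(D[i][j] != 1 for i in targets):
            return False
        # (ii) Positions of targets sensed by one sensor differ by more than tau.
        positions = sorted(sigma[i] for i in targets)
        if any(b - a <= tau for a, b in zip(positions, positions[1:])):
            return False
    return True

def blue_value(plan, values):
    """v(psi): summed value of the targets that no sensor senses."""
    sensed = set().union(*plan.values())
    return sum(v for i, v in enumerate(values) if i not in sensed)

# Tiny example: three targets in order 0, 1, 2 and two sensors.
D = [[1, 0], [1, 1], [0, 1]]
values = [3, 2, 1]
sigma = {0: 1, 1: 2, 2: 3}
print(is_valid_plan(sigma, {0: {0}, 1: {1}}, D, tau=1))  # valid plan
print(blue_value({0: {0}, 1: {1}}, values))              # only target 2 escapes
```

Passing `math.inf` as `tau` models sensors that can be used at most once.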

Objectives and equilibrium

Due to the motivating applications of our interest, this work adopts Blue’s perspective and analyzes sequential play in this game by assuming Blue moves first. [Footnote 3: Note that this is already reflected in our game definition, since the validity of Red’s sensing plan depends on the strategy of Blue. Thus, Red cannot move before or simultaneously with Blue.] Our analysis consists of two parts. First, we will analyze the best response problem for Red called Best Red Response: Given an ordering $\sigma$ of the targets, output the sensing plan $\psi$ that is valid with respect to $\sigma$ and minimizes $v(\psi)$ among such plans. Second, to compute the optimal strategy of Blue, we analyze the game’s Stackelberg equilibrium, which can be written as the following bilevel optimization problem: $\max_{\sigma}\min_{\psi:\,\psi\text{ is valid wrt. }\sigma}v(\psi).$ [Footnote 4: We assume that ties in the strategies are broken according to some predefined lexicographic ordering of the strategies.] We term the corresponding computational problem Blue Leader Stackelberg Equilibrium.

A motivating application

One major motivation of our work is the secure transit of peacekeeping resources in the presence of adversarial actors such as pirates, which is of critical importance due to past incidents, e.g., involving the United Nations [25]. Citing the UN’s peacekeeping mission manual [26], “protecting shipping in transit ensures the safety and security of vessels as they pass through waters threatened by piracy on the high seas…” In these applications, the UN plays Blue’s role whereas pirates correspond to Red, who can observe the ordering of targets and then act second. The UN commands a fleet of ships (i.e., targets in our model) that often carry resources of different importance and that can be arranged strategically. Protecting shipping is overall a complex, multifaceted task, and our model captures one of the phases after potential (often scarce) security measures have already been allocated to the ships and the pirates look to identify targets to attack. According to Winn and Govern [28], pirates often use a set of boats (i.e., sensors in our model) to probe different passing targets, usually by following them to observe their speed, crew size, firearms, etc., and to judge based on this whether they are capable of capturing the ship. Such probing takes time, which is modeled by the recharging time $\tau$.

Sensor and target types

We develop some customized algorithms for instances with only a few different target or sensor models: We say that two sensors $s_{i},s_{j}\in S$ are of the same type if they are capable of sensing the same targets, i.e., the $i$ -th and $j$ -th column of the sensing matrix are identical. We say that two targets $t_{i},t_{j}\in T$ are of the same type if they have the same utility value and can be sensed by the same sensors, i.e., $v_{i}=v_{j}$ and the $i$ -th and $j$ -th row of the sensing matrix are identical. We denote as $\Gamma=\{\gamma_{1},\dots,\gamma_{n_{\chi}}\}$ and $\Theta=\{\theta_{1},\dots,\theta_{k_{\chi}}\}$ the set of target and sensor types, respectively. It is easy to see that $k_{\chi}\leq 2^{n_{\chi}}$ , as a sensor’s sensing capabilities are defined by the set of target types it can sense. Similarly, assuming that all targets have the same value, it holds that $n_{\chi}\leq 2^{k_{\chi}}$ .
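As an illustration of these definitions, sensor and target types can be read off the sensing matrix by grouping identical columns and identical (value, row) pairs. The sketch below assumes `D` is a non-empty list of rows and uses hypothetical function names.

```python
def sensor_types(D):
    """Group sensor indices whose columns of D are identical."""
    groups = {}
    for j in range(len(D[0])):
        column = tuple(row[j] for row in D)
        groups.setdefault(column, []).append(j)
    return list(groups.values())

def target_types(D, values):
    """Group target indices with equal value and identical rows of D."""
    groups = {}
    for i, row in enumerate(D):
        groups.setdefault((values[i], tuple(row)), []).append(i)
    return list(groups.values())

# Sensors 0 and 1 are interchangeable, and so are targets 0 and 1.
D = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
values = [1.0, 1.0, 2.0]
print(sensor_types(D))          # groups of sensor indices
print(target_types(D, values))  # groups of target indices
```

The number of returned groups corresponds to $k_{\chi}$ and $n_{\chi}$, respectively.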

3 Related Work

While the escape sensing game model is new, it is closely related to a few lines of AI research, as detailed below.

Computational game theory for security

Conceptually, our work subscribes to the extensive MAS literature on computational game theory for tackling security challenges. The Stackelberg security game [24] is one widely studied example. Other game-theoretic models include the hide-and-seek game [8], blotto games [4], auditing games [5] and catcher-evader games [19]. Most of these games study the optimal usage of security forces under different game structures. In contrast, our ESG model is motivated by detection-vs-evasion situations in which security forces have already been allocated.

Scheduling

On a formal level, our problem is to schedule/order targets in an adversarial environment, which shares similarities with the classic problem of scheduling that looks to assign tasks to different machines to optimize certain criteria [18]. There is a rich body of AI research on scheduling, ranging from solving varied problems using AI techniques such as satisfiability [11] and distributed constraint optimization [22], to developing new models of scheduling problems under uncertainty [3] or in multi-agent setups [31].

In fact, the Best Red Response problem can be formulated as the following slightly non-standard scheduling problem: There are $k$ machines (modeling sensors). In each step, a job (modeling a target) arrives. The job can be processed (modeling sensing) by a given subset of machines and if executed successfully generates a given reward value. The job has a processing time of $\tau$ and needs to be processed within the next $\tau$ steps. This implies that the job needs to be processed (i.e., sensed) either now or its reward is lost.

4 The Algorithmics of Escape Sensing Games

We analyze the computational complexity of ESGs starting with Red’s best response problem, followed by computing equilibria.

4.1 Computing Red’s Best Response Strategy

We analyze Red’s best response problem that Red needs to solve in each game after Blue has committed to a target ordering. This problem turns out to be NP-hard, even if Red is only interested in determining whether it can sense all targets. This intractability result is the first strong indicator of the intricate game dynamics in ESGs.

Theorem 1.

Best Red Response is NP-complete, even when asked to decide whether Red can sense all targets or not.

Proof.

We reduce from Hitting Set where we are given a universe $U$, a collection of sets $\mathcal{Z}=\{Z_{1},\dots,Z_{m}\}$ and an integer $t$, and the question is whether there is a size-$t$ subset $U^{\prime}\subseteq U$ containing at least one element from each set in $\mathcal{Z}$ (we assume that $t\geq 2$ and $|U|>t$).

In the construction, all targets have a value of $1$ and the question is whether Red can sense all targets. As the core of the construction we add element sensors $\{a_{u}\mid u\in U\}$, set targets $\{\alpha_{Z}\mid Z\in\mathcal{Z}\}$, and selection targets $\{\beta_{i,j}\mid i\in[|U|-t],j\in[m]\}$. Each element sensor can sense all selection targets and all set targets corresponding to sets in which the element appears. Regarding the ordering of targets, it is easiest to think of the targets as being arranged in “rounds”. In each round $j\in[m]$, first the selection targets $\{\beta_{i,j}\mid i\in[|U|-t]\}$ move through the channel followed by the set target $\alpha_{Z_{j}}$. The idea is that the same $|U|-t$ element sensors sense the selection targets in every round, which correspond to the elements that are not part of the hitting set (we extend the construction in the following paragraph to ensure that this holds). Then, the remaining $t$ element sensors need to form a hitting set to be able to sense the set target in each round.

We extend the construction as follows. We add filling targets $\gamma_{i,j}$ for all $i\in[t-1]$ and $j\in[m]$ , which all element sensors can sense. Moreover, we add dummy sensors $d_{i,j}$ for each $i\in[2|U|]$ and $j\in[m]$ and dummy targets $\delta_{i,j}$ for each $i\in[2|U|]$ and $j\in[m]$ . For each $i\in[2|U|]$ and $j\in[m]$ , dummy sensor $d_{i,j}$ can sense dummy target $\delta_{i,j}$ . We set $\tau:=2|U|+1$ . Formally, the target ordering $\sigma$ is constructed—in multiple “rounds”—as follows. In each round $j\in[m]$ , we first move the selection targets $\{\beta_{i,j}\mid i\in[|U|-t]\}$ through the channel, then the dummy targets $\{\delta_{i,j}\mid i\in[|U|]\}$ , then the set target $\alpha_{Z_{j}}$ , then the filling targets $\{\gamma_{i,j}\mid i\in[t-1]\}$ and then the dummy targets $\{\delta_{i,j}\mid i\in[|U|+1,2|U|]\}$ (the ordering of targets in each of the groups is arbitrary).

Proof of correctness: forward direction

Assume that $U^{\prime}\subseteq U$ is a size- $t$ hitting set of $\mathcal{Z}$ . For each $i\in[2|U|]$ and $j\in[m]$ , we let $d_{i,j}$ sense $\delta_{i,j}$ . We construct the sensing plan for the element sensors iteratively as follows. In each round $j\in[m]$ , we let each of the $|U|-t$ element sensors $\{a_{u}\mid u\in U\setminus U^{\prime}\}$ sense exactly one of the selection targets $\{\beta_{i,j}\mid i\in[|U|-t]\}$ . Now, let $u^{*}$ be an element from $U^{\prime}$ that is contained in $Z_{j}$ (such an element needs to exist because $U^{\prime}$ is a hitting set). We let $a_{u^{*}}$ sense $\alpha_{Z_{j}}$ and we let each of the $t-1$ element sensors $\{a_{u}\mid u\in U^{\prime}\setminus\{u^{*}\}\}$ sense exactly one of the filling targets $\{\gamma_{i,j}\mid i\in[t-1]\}$ .

The constructed sensing plan senses all targets and clearly respects the sensing matrix. It remains to be argued that the recharging times of all element sensors are respected (dummy sensors only sense one target). For each $u\in U\setminus U^{\prime}$, we have that $a_{u}$ senses one selection target in each round. Between two selection targets in two different rounds there are at least $2|U|$ dummy targets and one set target, so recharging times are respected. For each $u\in U^{\prime}$, the sensor $a_{u}$ senses either a set or filling target in each round. There are $2|U|$ dummy targets and $|U|-t\geq 1$ selection targets between any two set or filling targets from different rounds, so recharging times are respected.

Proof of correctness: backward direction

Assume that $\psi$ is a valid sensing plan that senses all targets. Consequently, in each round, the $|U|$ element sensors need to sense $|U|-t$ selection, $t-1$ filling, and one set target. As $\psi$ is valid and there are only $|U|-t-1+|U|+1+t-2=2|U|-2$ targets between the first selection and last filling target in each round, this means that each element sensor needs to sense exactly one of these targets in each round. Note that an element sensor that senses a non-selection (i.e., either a set or filling) target in round $j\in[m]$ cannot sense a selection target in round $j+1$ , as there are only $t-1+|U|+|U|-t-1=2|U|-2$ targets between the first non-selection target in round $j$ and the last selection target in round $j+1$ . Consequently, as each element sensor needs to sense one target in each round, it follows that there is a set $U^{\prime\prime}\subseteq U$ of $|U|-t$ elements so that the corresponding element sensors sense a selection target in every round. Consequently, the remaining $t$ element sensors need to sense all set targets. As an element sensor is only capable of sensing a set target if the element appears in the set, it follows that $U\setminus U^{\prime\prime}$ is a size- $t$ hitting set of $\mathcal{Z}$ . ∎

Despite this intractability result, it is still possible to construct exact combinatorial algorithms for Best Red Response. In particular, we present a dynamic programming-based algorithm empowered by some structural observations on ESGs that runs in $\mathcal{O}(n\cdot(k_{\chi}+1)^{\tau+2})$ time (recall that $k_{\chi}\leq k$). This algorithm in particular implies that the problem becomes polynomial-time solvable if the recharging time, which we expect to be rather small in comparison to the number of targets, is a constant.

Proposition 2.

There is an $\mathcal{O}(n\cdot(k_{\chi}+1)^{\tau+2})$-time algorithm for Best Red Response.

Proof Sketch.

Our idea is to construct a valid sensing plan iteratively by going through the arriving targets one by one (we assume that the ordering of targets is $t_{1},\dots,t_{n}$). For each target, we either decide that it will not be sensed or assign it to one of the sensors so that the resulting plan is still valid. Our key observation to bring down the time and space complexity of the dynamic program is that we do not need to store the full sensing plan to ensure the validity of the plan after updating it. Instead, it is sufficient to know for each sensor whether it has sensed a target in the last $\tau$ steps. More formally, given a valid sensing plan $\psi$ that has been constructed by iterating over the first $i\in[n]$ targets, we only store the following information: (i) the value of all targets $\{t_{1},\dots,t_{i}\}$ that have not been assigned to a sensor in $\psi$, i.e., $\sum_{t_{j}\in\{t_{1},\dots,t_{i}\}\setminus\cup_{s\in S}\psi(s)}v_{j}$, and (ii) the sensors the last $\tau+1$ targets have been assigned to. It is possible to store this information in a table of size $\mathcal{O}(n\cdot(k+1)^{\tau+1})$ where each cell can be computed in $\mathcal{O}(k)$ time. To extend the algorithm to sensor types, we prove that we can collapse sensors of one type into a “meta” sensor, making it sufficient to keep track of the types of sensors that have sensed the last $\tau+1$ targets. ∎
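To make the idea of the proof sketch concrete, here is a minimal Python sketch of such a dynamic program. It is a simplified variant written for illustration, not the authors' implementation: it does not collapse sensors into types, it remembers the assignments of the last $\tau$ targets (which suffices to check validity), and it returns only the optimal value rather than the plan. It assumes the targets arrive in index order, `D` is a non-empty list of rows, and `tau` is a non-negative integer or `math.inf`.

```python
import math

def best_red_response_dp(values, D, tau):
    """Minimum summed value of not-sensed targets over all valid sensing plans."""
    n, k = len(values), len(D[0])
    window = min(tau, n)  # only the last `window` assignments constrain Red
    # State: sensors (or None) assigned to the most recent targets
    #        -> least value of not-sensed targets seen so far.
    states = {(): 0.0}
    for i in range(n):
        next_states = {}
        for history, lost in states.items():
            # Option 1: target i is not sensed, so its value goes to Blue.
            options = [(None, lost + values[i])]
            # Option 2: a capable sensor that is not recharging senses it.
            for j in range(k):
                if D[i][j] == 1 and j not in history:
                    options.append((j, lost))
            for choice, value in options:
                key = (history + (choice,))[-window:] if window > 0 else ()
                if value < next_states.get(key, math.inf):
                    next_states[key] = value
        states = next_states
    return min(states.values())

# One sensor that can sense everything: with tau = 1 it senses targets 1 and 3.
print(best_red_response_dp([1, 1, 1], [[1], [1], [1]], tau=1))
```

The number of states per step is at most $(k+1)^{\tau}$ in this variant, which mirrors the exponential dependence on $\tau$ in the stated running time.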

We conclude by giving a clean ILP formulation of Best Red Response, which turns out to scale very favorably in our experiments allowing us to solve instances with up to $10000$ targets within one minute.

Proposition 3.

Best Red Response admits an ILP formulation with $\mathcal{O}(n\cdot k)$ binary variables and $\mathcal{O}(n\cdot k)$ constraints.

Proof.

We model an instance $\mathcal{I}$ of Best Red Response as an ILP as follows. We assume that the targets are ordered as $t_{1},\dots,t_{n}$ . We create a binary variable $x_{i,j}$ for each $i\in[n]$ and $j\in[k+1]$ . Setting $x_{i,j}$ to one corresponds to letting sensor $s_{j}$ sense target $t_{i}$ if $j\in[k]$ , and letting $t_{i}$ not be sensed by any sensor if $j=k+1$ .

To ensure that Red minimizes the value of not-sensed targets, the optimization criterion becomes: $\min\sum_{i\in[n]}v_{i}\cdot x_{i,k+1}.$ To ensure the validity of the sensing plan $\psi$ , for each $i\in[n]$ , we enforce that: $\sum_{j\in[k+1]}x_{i,j}=1.$ Moreover, to ensure that sensor capabilities are respected, we impose for each $i\in[n]$ and $j\in[k]$ that: $x_{i,j}\leq D_{i,j}.$ Lastly, to enforce that recharging times are respected, for each $j\in[k]$ and $i\in[n-\tau]$ we add the constraint: $\sum_{\ell=i}^{i+\tau}x_{\ell,j}\leq 1.$ ∎
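Collected in one place, the constraints from the proof read as follows (for a finite recharging time $\tau<n$; this is only a restatement of the formulas above, with the binary domain made explicit):

```latex
\begin{align*}
\min\quad & \sum_{i\in[n]} v_{i}\cdot x_{i,k+1} \\
\text{s.t.}\quad
 & \sum_{j\in[k+1]} x_{i,j} = 1 && \forall i\in[n], \\
 & x_{i,j} \leq D_{i,j} && \forall i\in[n],\ j\in[k], \\
 & \sum_{\ell=i}^{i+\tau} x_{\ell,j} \leq 1 && \forall j\in[k],\ i\in[n-\tau], \\
 & x_{i,j}\in\{0,1\} && \forall i\in[n],\ j\in[k+1].
\end{align*}
```

Counting confirms the bounds of Proposition 3: there are $n\cdot(k+1)$ binary variables and $n+n\cdot k+k\cdot(n-\tau)$ constraints, both in $\mathcal{O}(n\cdot k)$. Each recharging constraint covers a window of $\tau+1$ consecutive positions, which matches the validity condition $|\sigma(t)-\sigma(t^{\prime})|>\tau$.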

4.2 Solving for the Stackelberg Equilibrium

We now study the problem of computing Blue’s optimal strategy, i.e., to solve Blue Leader Stackelberg Equilibrium. Theorem 1 already shows the NP-hardness of Best Red Response. While this does not imply the hardness of computing Stackelberg equilibria, a convincing intractability result for Blue’s optimal strategy shall ideally “disentangle” its complexity from Red’s best response problem. [Footnote 5: Note that the fact that it is NP-hard for Red to best respond to certain Blue strategies (as constructed in the reduction of Theorem 1) does not imply that it is also hard for Red to best respond to the particular Stackelberg equilibrium strategy of Blue (as these strategies might admit some structure that makes it easier to best respond).] With this in mind, we prove the NP-hardness of Blue Leader Stackelberg Equilibrium even in situations where Red’s best response problem is linear-time solvable. This demonstrates that the complexity in our reduction does not come from finding Red’s strategy but from the problem of whether Blue can arrange the targets in an optimal way.

Theorem 4.

Blue Leader Stackelberg Equilibrium is NP-hard, even on instances where Best Red Response is linear-time computable and the recharging time is $3$ .

Note that the NP-hardness holds even if sensors’ recharging time is constant, a case in which Red’s best response problem is polynomial-time solvable (see Proposition 2). Our hardness result indicates that computing Blue’s optimal strategy is generally a much harder problem than computing Red’s optimal strategy. In fact, it remains open whether Blue Leader Stackelberg Equilibrium is contained in NP or whether it is complete for complexity classes beyond NP. We suspect the latter to hold.

4.2.1 Bilevel Optimization

In light of this, it is unclear whether Blue Leader Stackelberg Equilibrium admits an ILP formulation (and from our perspective this is rather unlikely). Naive brute-force approaches are also computationally infeasible, as we would need to enumerate all $n!$ possible target orderings and solve the NP-hard Best Red Response problem as a subroutine for each of them.
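For intuition, the naive brute-force approach can be sketched in a few lines of Python. This is purely illustrative and only usable for a handful of targets. The best-response subroutine is a small dynamic program over the sensors used for the last $\tau$ targets, written for this sketch; all function names and the data layout (0-based indices, `D` as a non-empty list of rows, `tau` a non-negative integer or `math.inf`) are assumptions.

```python
import math
from itertools import permutations

def red_best_response(order, values, D, tau):
    """Minimum value of not-sensed targets when targets pass in `order`."""
    k = len(D[0])
    window = min(tau, len(order))
    states = {(): 0.0}  # recent sensor assignments -> least value left to Blue
    for t in order:
        next_states = {}
        for history, lost in states.items():
            options = [(None, lost + values[t])]  # leave target t unsensed
            options += [(j, lost) for j in range(k)
                        if D[t][j] == 1 and j not in history]
            for choice, value in options:
                key = (history + (choice,))[-window:] if window > 0 else ()
                if value < next_states.get(key, math.inf):
                    next_states[key] = value
        states = next_states
    return min(states.values())

def blue_brute_force(values, D, tau):
    """Enumerate all n! orderings and keep one maximizing Blue's utility."""
    best_order, best_value = None, -math.inf
    for order in permutations(range(len(values))):
        value = red_best_response(order, values, D, tau)
        if value > best_value:  # the first maximizer in enumeration order wins
            best_order, best_value = order, value
    return best_order, best_value

# One sensor that can sense targets 0 and 1 but not target 2, with tau = 1:
# Blue should place targets 0 and 1 next to each other.
print(blue_brute_force([1, 1, 1], [[1], [1], [0]], tau=1))
```

In the example, separating targets 0 and 1 by target 2 would let the single sensor recharge in between and sense both.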

Thus, we turn to a formulation as a bilevel optimization problem [9] as one way to solve the problem exactly. In such formulations, constraints are still linear, but there exist two connected levels of the problem, i.e., an outer and an inner level. The inner level controls certain variables that it sets to minimize an objective subject to linear constraints that also involve variables controlled by the outer level, while the outer level sets these variables to maximize the objective. In our problem, we can model Red’s best response problem as the inner level loosely following the ILP from Proposition 3. The outer level models Blue’s problem. The key parts of the outer level are variables for each target that encode the position in which the target appears in the final ordering and that are used in the inner level to ensure the validity of the sensing plan.

Proposition 5.

Blue Leader Stackelberg Equilibrium admits a bilevel optimization formulation with $\mathcal{O}(n^{2}+n\cdot k)$ binary variables, $\mathcal{O}(n)$ integer variables, and $\mathcal{O}(n^{2}\cdot k)$ constraints.

Note that standard techniques to convert this bilevel program into an (integer) linear program, e.g., by exploiting KKT optimality conditions [2, 13], are not applicable in our setting, as we are solving an integer bilevel program within which the inner-level program is already non-convex.

4.2.2 Heuristic

We will see later that the running time for the bilevel formulation of the problem already becomes infeasible on small-sized instances. Therefore, we experimented with different heuristics to solve the problem. [Footnote 6: Note that the heuristic double-oracle approach that has been successfully employed for other large combinatorial games [1, 17] is not applicable to ESGs. Traditionally, the approach successively expands the strategy spaces of both players by letting them best respond to each other. However, in ESGs, we face a bilevel problem in which there is no best response of the leader to the follower. The approach also fails here because the valid strategies of the follower heavily depend on the strategy picked by the leader.] In the following, we present the two variants of simulated annealing-based heuristics that performed best. For a target ordering $\sigma$, we denote as $N(\sigma)$ its neighbors, i.e., all ${n\choose 2}$ orderings that arise from $\sigma$ by swapping the position of any two different targets. The relaxed version of our simulated annealing (SA_Relax) is presented in Algorithm 1. The idea is to find an optimal ordering through repeated local rearrangements. We store the current ordering as $\sigma$ and compute its value for Blue by solving Red’s best response problem using Proposition 3. Then, we pick a random neighbor of $\sigma$, compute its value, and update the ordering based on this according to standard simulated annealing rules.

Algorithm 1 SA_Relax

Input: Target ordering $\sigma$ and temperature $T=100$

1: Compute optimal sensing plan $\psi$ wrt. $\sigma$
2: while $T>0.00001$ do
3:     Select a random neighbor $\hat{\sigma}\in N(\sigma)$
4:     Compute optimal sensing plan $\hat{\psi}$ wrt. $\hat{\sigma}$
5:     if $e^{\frac{v(\hat{\psi})-v(\psi)}{T}}>$ random[0, 1] then
6:         $\sigma:=\hat{\sigma}$, $\psi:=\hat{\psi}$
7:     $T:=T\cdot 0.9$
8: Return $\sigma$
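Algorithm 1 could be rendered in Python roughly as follows. This is a sketch under assumptions rather than the authors' code: instead of the ILP from Proposition 3, it evaluates orderings with a small dynamic program written for this example, the function names and data layout are hypothetical, and at least two targets are assumed so that a swap exists.

```python
import math
import random

def red_best_response(order, values, D, tau):
    """Blue's utility under Red's optimal sensing plan for a fixed ordering."""
    k = len(D[0])
    window = min(tau, len(order))
    states = {(): 0.0}  # recent sensor assignments -> least value left to Blue
    for t in order:
        next_states = {}
        for history, lost in states.items():
            options = [(None, lost + values[t])]
            options += [(j, lost) for j in range(k)
                        if D[t][j] == 1 and j not in history]
            for choice, value in options:
                key = (history + (choice,))[-window:] if window > 0 else ()
                if value < next_states.get(key, math.inf):
                    next_states[key] = value
        states = next_states
    return min(states.values())

def sa_relax(sigma, values, D, tau, T=100.0, T_min=1e-5, cooling=0.9, rng=None):
    """Simulated annealing over target orderings using random swap moves."""
    rng = rng if rng is not None else random.Random()
    sigma = list(sigma)
    value = red_best_response(sigma, values, D, tau)              # line 1
    while T > T_min:                                              # line 2
        a, b = rng.sample(range(len(sigma)), 2)                   # line 3
        candidate = sigma[:]
        candidate[a], candidate[b] = candidate[b], candidate[a]
        cand_value = red_best_response(candidate, values, D, tau)  # line 4
        # Line 5: improvements are always accepted. The exponent is clamped
        # at 0 only to avoid floating-point overflow; this does not change
        # the acceptance rule.
        if math.exp(min(0.0, (cand_value - value) / T)) > rng.random():
            sigma, value = candidate, cand_value                  # line 6
        T *= cooling                                              # line 7
    return sigma, value                                           # line 8

values, D = [1, 1, 1], [[1], [1], [0]]
print(sa_relax([0, 2, 1], values, D, tau=1, rng=random.Random(0)))
```

With the default parameters the loop performs about 150 iterations, since $100\cdot 0.9^{m}$ drops below $10^{-5}$ after roughly $m=153$ steps.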

In the full version of our simulated annealing (SA), instead of picking a random neighbor $\hat{\sigma}$ from $N(\sigma)$ in Line 3 of Algorithm 1, we first run a heuristic for Best Red Response on all orderings from $N(\sigma)$. [Footnote 7: In our (greedy) heuristic, we consider the targets in decreasing order of their value and construct the sensing plan $\psi$ iteratively. Let $S^{\prime}\subseteq S$ be the sensors $s$ so that $\psi$ remains a valid sensing plan after adding the current target to $\psi(s)$. We let the target be sensed by a randomly selected sensor from $S^{\prime}$ (or by no sensor if $S^{\prime}$ is empty). For a formal description, see our full version [6].] Then, on the $\mu$ fraction of neighbors with the highest returned value, we execute the ILP from Proposition 3 to compute the optimal sensing plan. Of the examined neighbors, we pick the one with the highest returned value as $\hat{\sigma}$. As a hyper-parameter tuning process, we tested the performance of our heuristic algorithm with respect to the choice of $\mu$ (see our full version [6] for results). It turns out that $\mu=0.1$ provides a good trade-off between the algorithm’s running time and Blue’s utility. Thus, we fix $\mu=0.1$ throughout the paper. For both heuristics, we always run the heuristic three times with three different initial randomly generated target orderings and return the best computed ordering.
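The greedy heuristic for Best Red Response described in the footnote might be sketched as follows. The formal description is deferred to the full version, so the details here (data layout, function name, tie handling among equal-valued targets) are assumptions made for illustration.

```python
import random

def greedy_red_response(order, values, D, tau, rng=None):
    """Greedy sensing plan: returns (plan, value of not-sensed targets)."""
    rng = rng if rng is not None else random.Random()
    k = len(D[0])
    position = {t: p for p, t in enumerate(order)}
    plan = {j: set() for j in range(k)}
    # Consider targets in decreasing order of value.
    for t in sorted(order, key=lambda target: -values[target]):
        # Sensors that can take t while keeping the plan valid.
        feasible = [
            j for j in range(k)
            if D[t][j] == 1
            and all(abs(position[t] - position[u]) > tau for u in plan[j])
        ]
        if feasible:
            plan[rng.choice(feasible)].add(t)  # random feasible sensor
    sensed = set().union(*plan.values())
    return plan, sum(values[t] for t in order if t not in sensed)

# One sensor, tau = 1: the most valuable target is taken first, which blocks
# its direct neighbor in the ordering but not the target two positions later.
plan, blue_value = greedy_red_response([0, 1, 2], [3, 2, 1], [[1], [1], [1]], tau=1)
print(plan, blue_value)
```

Because the returned plan is valid but not necessarily optimal, the returned value can only overestimate Blue's utility under an optimal response, which is why the optimal sensing plan is still computed for the most promising neighbors.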

5 Experimental Evaluations

We analyze the quality and performance of our algorithms to compute the Stackelberg equilibrium. [Footnote 8: We use Gurobi [15] to solve the ILP from Proposition 3 and MIBS [23] to solve the bilevel program from Proposition 5. Both are among the most popular off-the-shelf tools for solving the respective problem.] We consider three simulated game settings for generating ESGs. For each setting, we determine the value of a target by drawing a number uniformly within $[0,1]$ [Footnote 9: In our full version [6], we analyze supplementary scenarios, reinforcing similar conclusions to those presented here.]:

1. Default (Def): For each $i\in[n]$ and $j\in[k]$, we set $D_{i,j}=1$ with probability $0.2$.
2. Euclidean (Euc): Each target $t_{i}\in T$ and each sensor $s_{j}\in S$ are uniformly sampled points in $[0,1]\times[0,1]$. A sensor can sense a target (i.e., $D_{i,j}=1$) if the Euclidean distance between their points is below $0.3$.
3. RandomLevel (Rand): Each target $t_{i}\in T$ has a difficulty level $d_{i}$ uniformly sampled from $[0,1]$, and each sensor $s_{j}\in S$ has a skill level $s_{j}$ uniformly sampled from $[0,1]$. For each $i\in[n]$ and $j\in[k]$, we set $D_{i,j}=1$ with probability $(1-d_{i})\cdot s_{j}$.
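The three settings can be sampled with a few lines of Python. This is a sketch with hypothetical function names, and it relies on `random.Random.random()`, which draws from $[0,1)$ rather than $[0,1]$, a difference that should not matter in practice.

```python
import math
import random

def gen_values(n, rng):
    """Target values drawn uniformly at random."""
    return [rng.random() for _ in range(n)]

def gen_default(n, k, rng, p=0.2):
    """Default (Def): each entry of D is 1 independently with probability p."""
    return [[1 if rng.random() < p else 0 for _ in range(k)] for _ in range(n)]

def gen_euclidean(n, k, rng, radius=0.3):
    """Euclidean (Euc): D[i][j] = 1 if target i and sensor j are close."""
    targets = [(rng.random(), rng.random()) for _ in range(n)]
    sensors = [(rng.random(), rng.random()) for _ in range(k)]
    return [[1 if math.dist(t, s) < radius else 0 for s in sensors]
            for t in targets]

def gen_random_level(n, k, rng):
    """RandomLevel (Rand): D[i][j] = 1 with probability (1 - d_i) * skill_j."""
    difficulty = [rng.random() for _ in range(n)]
    skill = [rng.random() for _ in range(k)]
    return [[1 if rng.random() < (1 - difficulty[i]) * skill[j] else 0
             for j in range(k)] for i in range(n)]

rng = random.Random(0)
values = gen_values(5, rng)
for generator in (gen_default, gen_euclidean, gen_random_level):
    print(generator.__name__, generator(5, 3, rng))
```

Passing a seeded `random.Random` instance keeps the sampled instances reproducible across runs.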

In all our experiments, if not stated otherwise, we average over $50$ instances generated according to one of the models. We present our experimental results as tables where each entry contains Blue’s average utility (i.e., the summed value of not-sensed targets) from the computed target ordering assuming Red best responds and the average running time in seconds in italics, both followed by their respective standard deviations. Note that standard deviations are calculated across the different sampled instances, implying that independent of the solution method some non-trivial standard deviation is to be expected, as certain instances are more favorable for Blue than others.

We first analyze the maximum size of instances that we can solve exactly using the bilevel program, which we denote as OPT. We present results for the Default game setting in Table 1 (results for the other simulated game settings are similar). While instances with $5$ targets can be solved within a second by OPT, instances with $9$ targets already take around $9$ hours to solve. This demonstrates that the bilevel program is only usable for quite small instances. Moreover, we observe the expected trend that Blue’s utility increases when Blue has more targets or Red has fewer sensors. However, we do not find a consistent trend regarding which of the two is more advantageous for Blue: having more targets or facing fewer sensors.

Motivated by the high computational cost of the bilevel program, we now turn to analyzing the quality of our heuristics. We also include the Random method as a baseline, where Blue simply picks an arbitrary ordering of targets (and Red best responds to it). In addition, we compare our heuristics against a naive random strategy of comparable computational cost. For this, we include the Random2 method, which generates $3000$ random orderings for Table 2 and $3\cdot 10^{7}$ random orderings for Table 3. The sampled ordering that achieves the highest utility for Blue, assuming that Red best responds, is returned.

$n$ \ $k$	2	3	5
5	1.79 $\pm$ 0.71, 0.61 $\pm$ 0.21	1.55 $\pm$ 0.65, 0.77 $\pm$ 0.02	1.02 $\pm$ 0.62, 0.98 $\pm$ 0.04
7	2.41 $\pm$ 0.78, 102 $\pm$ 36	2.29 $\pm$ 0.80, 116 $\pm$ 31	1.62 $\pm$ 0.72, 140 $\pm$ 29
8	2.96 $\pm$ 0.81, 1501 $\pm$ 354	2.33 $\pm$ 0.74, 1760 $\pm$ 38	1.7 $\pm$ 0.69, 1814 $\pm$ 23
9	n/a, 31358	n/a, 32541	n/a, 35376

Table 1: Scalability test of bilevel-program (OPT) for Default game setting with $\tau=2$ (rows: number of targets $n$; columns: number of sensors $k$). For $n=9$, we report running time for one instance. For all tables: each entry shows Blue’s average utility (first) and running time in seconds (second).

	Def	Euc	Rand
OPT	2.29 $\pm$ 0.80, 116 $\pm$ 31	1.952 $\pm$ 0.74, 126 $\pm$ 2.29	2.09 $\pm$ 0.87, 120 $\pm$ 25
SA	2.29 $\pm$ 0.80, 4.78 $\pm$ 0.38	1.951 $\pm$ 0.75, 4.96 $\pm$ 0.47	2.09 $\pm$ 0.87, 5.03 $\pm$ 0.69
SA_Relax	2.29 $\pm$ 0.80, 0.81 $\pm$ 0.03	1.952 $\pm$ 0.74, 0.84 $\pm$ 0.05	2.09 $\pm$ 0.87, 0.85 $\pm$ 0.08
Random	2.16 $\pm$ 0.84, 0.001	1.71 $\pm$ 0.83, 0.001	1.93 $\pm$ 0.92, 0.001
Random2	2.29 $\pm$ 0.80, 5.13 $\pm$ 0.16	1.952 $\pm$ 0.74, 5.25 $\pm$ 0.33	2.09 $\pm$ 0.87, 5.3 $\pm$ 0.46

Table 2: Comparison of algorithms to compute Blue’s utility for different simulated game settings, where $n=7$, $k=3$, and $\tau=2$.

	Def	Euc	Rand
SA	15.96 $\pm$ 1.1, 28101 $\pm$ 563	18.4 $\pm$ 2.1, 27755 $\pm$ 928	17.3 $\pm$ 4.2, 27970 $\pm$ 1136
SA_Relax	8.76 $\pm$ 0.9, 49.6 $\pm$ 2.57	10.53 $\pm$ 2.32, 47.5 $\pm$ 0.86	12.3 $\pm$ 3.6, 49.7 $\pm$ 1.6
Random	6.19 $\pm$ 1.26, 0.001	6.86 $\pm$ 2.25, 0.001	9.68 $\pm$ 2.78, 0.001
Random2	8.27 $\pm$ 0.73, 25036 $\pm$ 311	9.54 $\pm$ 2.45, 24333 $\pm$ 211	11.68 $\pm$ 3.58, 26810 $\pm$ 295

Table 3: Comparison of algorithms to compute Blue’s utility for different simulated game settings, where $n=75$, $k=10$, and $\tau=5$. We generate $9$ instances per method.

In Table 2, we show the algorithms’ performance for small instances where we can still compute Blue’s maximum utility (OPT) via the bilevel program. In Table 3, we consider larger instances where the optimum value is unknown. Note that higher values correspond to a better performance of the algorithm, as we always report Blue’s utility for Red’s best response.

From the results in Table 2, we can see that all heuristics perform well on small instances. In particular, SA_Relax, SA, and Random2 find the optimal solution in all (but one) cases. However, SA_Relax proves advantageous because it only needs about a sixth of the running time of the other two methods.

While our two heuristics SA and SA_Relax show a similar approximation quality for small instances, for larger instances (Table 3) SA clearly outperforms SA_Relax. For the Default game setting, using SA compared to SA_Relax even regularly leads to a doubled utility for Blue. While this is a strong argument for using SA, SA’s downside is its higher computational cost, needing over $7$ hours to solve instances with $75$ targets.

Finally, we observe that both methods clearly outperform the Random baseline, with SA consistently preserving an average of approximately 20 more targets for the larger instances. This highlights that the solution quality of the target ordering clearly increases throughout the simulated annealing. Considering Random2, we find that repeatedly sampling orders (instead of only once) leads to a noticeable utility increase. However, on the larger instances, Random2 performs even worse than SA_Relax while running as long as SA, thereby combining the disadvantages of SA and SA_Relax. Overall, our experiments highlight that Blue benefits from ordering the targets strategically instead of randomly.

6 Escape from Non-Coordinated Sensing

ESGs assume that the different sensors are controlled by a central authority that computes the sensing plan. We now investigate the situation where these sensors are non-coordinated and each one acts independently based on a natural greedy algorithm. This happens when sensors cannot easily exchange information and coordinate with each other. Another motivation is when sensors are controlled by different adversaries, each serving only their own interests and being unlikely to coordinate their actions and share their reward. Both of these scenarios can occur in our motivating domain of piracy at large open seas, as coordination between different groups is likely to be challenging. Different pirate groups might even refuse to coordinate at all and instead directly compete with each other.

We model these situations by assuming that sensors have a predefined ordering as $s_{1},\dots,s_{k}$ (as induced by fixed locations of the sensors); for each sensor $s\in S$ , as soon as a not previously sensed target that $s$ can sense passes $s$ (i.e., $s$ has the capabilities and is currently not recharging), $s$ senses it, thereby greedily maximizing its number of sensed targets.

Formally, given a target ordering $\sigma$, we construct a sensing plan $\psi_{\sigma}$ sequentially as follows. For each step $\ell\in[n+k-1]$, if target $t_{i}$ passes sensor $s_{j}$ in step $\ell$, then we add $t_{i}$ to $\psi_{\sigma}(s_{j})$ if the resulting sensing plan remains valid with respect to $\sigma$ (formally, for $i\in[n]$ and $j\in[k]$, target $\sigma^{-1}(i)$ passes sensor $s_{j}$ in step $i+j-1$). Since Red’s strategy is fixed, Blue faces the problem Best Blue Response: pick a target ordering $\sigma$ that maximizes $v(\psi_{\sigma})$. In the following, we study the computational complexity of this problem and solve it in computational experiments. By comparing the optimal value of Best Blue Response to the value of the Stackelberg equilibrium in the corresponding ESG, we can ultimately answer how much Red gains from being able to centrally control its sensors.
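The sequential construction of $\psi_{\sigma}$ can be sketched in Python as follows. The sketch rests on a few assumptions that are not spelled out here: the function name `greedy_sensing` is hypothetical, `D[i][j] == 1` means sensor `j` can sense target `i`, and a sensor that senses the target at position `p` is assumed to be unable to sense the targets at the next $\tau$ positions. Since the decision for a target at a sensor only depends on earlier targets at the same sensor and on earlier sensors for the same target, iterating over positions and then over sensors yields the same plan as iterating over the steps $\ell$.

```python
def greedy_sensing(order, values, D, tau):
    """Simulate non-coordinated greedy sensors for a fixed target ordering.

    order:  target indices in the order in which they pass the channel
    values: values[i] is the value of target i
    Returns the sensing plan and Blue's utility (summed value of unsensed targets).
    """
    k = len(D[0]) if D else 0
    last_sensed = [None] * k          # position of the last target each sensor sensed
    plan = {j: [] for j in range(k)}  # sensor -> sensed targets
    blue_utility = 0.0

    for pos, target in enumerate(order):
        sensed = False
        for j in range(k):  # the target passes the sensors in their fixed order
            ready = last_sensed[j] is None or pos - last_sensed[j] > tau
            if D[target][j] == 1 and ready:
                plan[j].append(target)  # first capable, non-recharging sensor senses it
                last_sensed[j] = pos
                sensed = True
                break
        if not sensed:
            blue_utility += values[target]
    return plan, blue_utility
```

Blue’s problem then amounts to searching over permutations `order` for one that maximizes the returned utility.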

6.1 Algorithmic Analysis

Unfortunately, it turns out that computing Blue’s strategy is NP-hard, even in restricted cases where each sensor can only sense one target. Due to the sequential construction of Red’s sensing plan, this reduction is our most intricate one:

Theorem 6.

Best Blue Response is NP-complete, even if the recharging time is $\infty$ , i.e., each sensor can sense only one target, each target has value $1$ , and the sum of each row and column in the sensing matrix is at most four.

Proof Sketch.

We focus on the variant where each sensor can only sense one target. Interestingly, as discussed in more detail in our full version [6], this problem shares some similarities with the NP-hard Minimum Maximal Matching problem: we can view the sensors and targets as the two sides of a bipartite graph, in which the sets of sensor-target pairs where the sensor senses the target correspond to maximal matchings. However, the ordering of the sensors makes only certain maximal matchings in these graphs realizable, which is why we instead show NP-hardness by reducing from a variant of 3-SAT where each variable appears only twice positively and once negatively. The core idea of our construction is the following: We add a literal target for each literal. Moreover, for each clause, we add a clause sensor and a clause target. The clause sensor is capable of sensing the corresponding clause target as well as the targets corresponding to the three literals appearing in the clause. We add further targets and sensors to the instance so that all clause targets need to make it unsensed through the channel. This implies that each clause sensor needs to sense a literal target, as it will otherwise sense the corresponding clause target in passing, i.e., we need to “cover” each clause with a literal appearing in the clause. Now, for each variable, we add a slightly intricate gadget that ensures that we can either use the targets corresponding to positive literals to cover clause sensors (which corresponds to setting the variable to true) or the one target corresponding to a negative literal (which corresponds to setting the variable to false). Because we need to “cover” each clause, the induced assignment is satisfying. ∎

We can adopt a similar view as in Proposition 2 to solve the problem via dynamic programming. However, this time the dynamic programming iteratively constructs the optimal target ordering and we need to keep track of the previously used targets together with the sensors used in the last $\tau+1$ timesteps. This results in a naive running time of $\mathcal{O}(n\cdot 2^{n}\cdot(k+1)^{\tau+2})$, which can be improved to $\mathcal{O}\left(n_{\chi}\cdot\left(\prod_{i=1}^{n_{\chi}}(\ell_{i}+1)\right)\cdot(k+1)^{\tau+2}\right)$ if we incorporate types:

Proposition 7.

Best Blue Response is solvable in $\mathcal{O}\left(n_{\chi}\cdot\left(\prod_{i=1}^{n_{\chi}}(\ell_{i}+1)\right)\cdot(k+1)^{\tau+2}\right)$, where $\ell_{i}$ is the number of targets of type $\gamma_{i}$.

6.1.1 ILP Formulation

Constructing an ILP for Best Blue Response turns out to be slightly more challenging, as we need to encode Red’s greedy sequential behavior:

Proposition 8.

There is an ILP formulation for Best Blue Response with $\mathcal{O}(n^{2}\cdot k)$ binary variables, $\mathcal{O}(n)$ integer variables, and $\mathcal{O}(n^{2}\cdot k)$ constraints.

Proof Sketch.

We introduce for each target $i\in[n]$ an integer variable $z_{i}$ encoding the position in which the target appears. Moreover, similar to Proposition 3, for each $i\in[n]$ and $j\in[k+1]$ , we add a binary variable $x_{i,j}$ , which encodes whether $t_{i}$ is sensed by sensor $s_{j}$ or whether the target makes it unsensed through the channel (for $j=k+1$ ). We can add mostly straightforward constraints to ensure that $x_{i,j}$ respects recharging times. The main challenge is to encode the greedy behavior of the sensors (i.e., the ILP cannot have the freedom to pick the $x_{i,j}$ values arbitrarily to optimize Blue’s utility but they are set according to sensors’ greedy behavior). For this, for each $i,i^{\prime}\in[n]$ and $j\in[k]$ , we add a binary variable $y_{i,i^{\prime},j}$ and add constraints so that $y_{i,i^{\prime},j}$ is equal to one if target $i$ is sensed by sensor $j$ and because of this $j$ recharges when $i^{\prime}$ is passing, i.e., $i$ “covers” $i^{\prime}$ .

To encode sensors’ greedy behavior, we want to add a constraint that makes sure that in case $x_{i,j}=1$ , the target needs to be covered by other targets for all sensors that are capable of sensing it placed before $j$ . Note that this together with another constraint ( $\sum_{j\in[k+1]}x_{i,j}=1$ ) in particular implies that each target is sensed by the first sensor it passes which is not recharging, thereby encoding the greedy behavior of sensors. Specifically, for each $i\in[n]$ and $j\in[k+1]$ , we add:

$$\sum_{t\in[j-1]:D_{i,t}=0}1+\sum_{t\in[j-1]:D_{i,t}=1}\sum_{i^{\prime}\in[n]}y_{i^{\prime},i,t}-(j-1)\geq-n\Big(1-\sum_{t=j}^{k+1}x_{i,t}\Big).\qquad(1)$$

∎

6.1.2 Heuristic

Since it will turn out that the ILP formulation cannot quickly solve medium-to-large instances, we explore various simulated annealing-based heuristics, similar to the approach discussed in Section 5. We present the variant SA_Relax where a random neighbor is picked in Algorithm 2. The other variant SA computes Blue’s utility $v(\psi_{\hat{\sigma}})$ for all neighbors and picks the one with the highest utility.

Algorithm 2 SA_Relax for Best Blue Response

Input: Initial target ordering $\sigma$ and temperature $T=100$

1: while $T>0.00001$ do
2:   Select a random neighbor $\hat{\sigma}\in N(\sigma)$
3:   if $e^{\frac{v(\psi_{\hat{\sigma}})-v(\psi_{\sigma})}{T}}>$ random[0, 1] then
4:     $\sigma:=\hat{\sigma}$
5:   $T:=T\cdot 0.9$
6: Return $\sigma$
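A minimal Python sketch of the SA_Relax loop is given below. Several details are assumptions made for illustration only: the function name `sa_relax` is hypothetical, the neighborhood $N(\sigma)$ is taken to consist of all orderings obtained by swapping two positions (the definition of $N(\sigma)$ is not repeated in this section), and Blue's utility $v(\psi_{\sigma})$ is supplied by the caller as a function `utility`.

```python
import math
import random


def sa_relax(order, utility, T=100.0, T_min=1e-5, cooling=0.9, rng=None):
    """Simulated annealing over target orderings (sketch of SA_Relax).

    utility(ordering) is assumed to return Blue's utility v(psi_sigma).
    """
    rng = rng or random.Random()
    current = list(order)
    if len(current) < 2:
        return current  # no neighbor to move to
    current_value = utility(current)

    while T > T_min:
        # Assumed neighborhood: swap the targets at two random positions.
        i, j = rng.sample(range(len(current)), 2)
        candidate = current[:]
        candidate[i], candidate[j] = candidate[j], candidate[i]
        candidate_value = utility(candidate)

        # Accept if e^{(v(candidate) - v(current)) / T} > random[0, 1].
        # Improvements are always accepted; checking delta >= 0 first also
        # avoids overflow in math.exp when T becomes very small.
        delta = candidate_value - current_value
        if delta >= 0 or math.exp(delta / T) > rng.random():
            current, current_value = candidate, candidate_value
        T *= cooling
    return current
```

As in Algorithm 2, the sketch returns the final ordering rather than the best ordering encountered; tracking the best ordering seen so far would be a small, optional extension.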

6.2 Experiments

We reuse the general setup described in Section 5, but naturally now report Blue’s computed utility assuming that sensors act greedily. Here, we let the Random2 method generate $1000$ random orderings in Table 5 and $5\cdot 10^{5}$ random orderings in Table 6.

First of all, we evaluate the scalability of our ILP for Best Blue Response (OPT) in Table 4. The ILP can solve most instances with up to $25$ targets within a few minutes. However, due to the complexity of the ILP modeling, for $25$ targets, as soon as the number of sensors reaches $5$, instances can take more than $5$ hours to solve. This is why, for $25$ targets and $5$ sensors, the table only reports the running time for one instance.

$n$ \ $k$	2	3	5
5	2 $\pm$ 0.68, 0.007 $\pm$ 0.005	1.71 $\pm$ 0.57, 0.009 $\pm$ 0.005	1.32 $\pm$ 0.52, 0.01 $\pm$ 0.003
15	6.6 $\pm$ 1, 0.14 $\pm$ 0.35	6.29 $\pm$ 0.9, 0.32 $\pm$ 0.88	5.46 $\pm$ 0.87, 6.85 $\pm$ 22.9
20	9.05 $\pm$ 1.13, 0.52 $\pm$ 2.94	8.49 $\pm$ 1.01, 1.58 $\pm$ 4.4	7.56 $\pm$ 1.01, 229 $\pm$ 1261
25	11.38 $\pm$ 1.26, 6.8 $\pm$ 30.6	10.96 $\pm$ 1.29, 283 $\pm$ 1771	n/a, 22537

Table 4: Scalability test of ILP (OPT) for Default game setting with $\tau=2$ (rows: number of targets $n$; columns: number of sensors $k$). For all tables: each entry shows Blue’s average utility (first) and running time in seconds (second).

	Def	Euc	Rand
OPT	3.1 $\pm$ 0.86, 0.15 $\pm$ 0.37	3.25 $\pm$ 0.79, 0.25 $\pm$ 0.71	3.04 $\pm$ 0.78, 3.26 $\pm$ 8.77
SA	2.83 $\pm$ 0.81, 0.7 $\pm$ 0.02	2.92 $\pm$ 0.81, 0.71 $\pm$ 0.02	2.72 $\pm$ 0.8, 0.71 $\pm$ 0.018
SA_Relax	3.06 $\pm$ 0.88, 0.02	3.17 $\pm$ 0.81, 0.02	2.89 $\pm$ 0.82, 0.02
Random	1.9 $\pm$ 0.72, 0.001	2.15 $\pm$ 0.9, 0.001	1.9 $\pm$ 0.9, 0.001
Random2	2.91 $\pm$ 0.87, 0.81 $\pm$ 0.0004	3.11 $\pm$ 0.79, 0.81 $\pm$ 0.0004	2.89 $\pm$ 0.78, 0.83 $\pm$ 0.0004

Table 5: Comparison of algorithms for Best Blue Response for different game settings, where $n=10$, $k=5$, and $\tau=2$.

	Def	Euc	Rand
SA	16.57 $\pm$ 1.64, 485 $\pm$ 20	17.2 $\pm$ 2.4, 470 $\pm$ 9.6	17.54 $\pm$ 3.27, 503 $\pm$ 20
SA_Relax	12.3 $\pm$ 1.58, 3.14 $\pm$ 0.35	13.11 $\pm$ 2.62, 2.84 $\pm$ 0.2	14.47 $\pm$ 2.96, 2.95 $\pm$ 0.2
Random	9.2 $\pm$ 1.99, 0.001	10.02 $\pm$ 2.77, 0.001	11.67 $\pm$ 3.07, 0.001
Random2	12.75 $\pm$ 1.03, 458 $\pm$ 15	12.98 $\pm$ 2.37, 437 $\pm$ 13	14.67 $\pm$ 3.18, 496 $\pm$ 19

Table 6: Comparison of algorithms for Best Blue Response for different game settings, where $n=75$, $k=10$, and $\tau=5$.

Next, we analyze the solution quality of our heuristic approaches. On the small instances presented in Table 5, our best heuristic algorithm approximates the optimal solution quite well: the error is typically below $10\%$, with the SA_Relax method consistently outperforming SA. Both heuristics outperform Random, while Random2 performs better than SA (yet still worse than SA_Relax, while having a much longer running time). When moving to the larger instances in Table 6, the picture flips, as SA now substantially outperforms SA_Relax. This shows a general trend that the solution quality of SA scales more favorably than that of SA_Relax (while the opposite is naturally true for the running time). The heuristics again clearly outperform Random, with SA preserving approximately $15$ more targets. Random2 performs similarly to the suboptimal heuristic SA_Relax, while being slower by a factor of more than $100$.

Finally, we are interested in exploring the power of coordination for Red, i.e., the difference between the optimal utility Blue gets in the non-coordinated setting explored in this section and its utility in the Stackelberg equilibria from Section 5. We find that for the small instances where we can compute the Stackelberg equilibrium exactly, Red can reduce Blue’s utility by $10\%$ to $20\%$ through coordination. For larger instances, we no longer know the optimal solutions, which is why we resort to comparing the results of the respective SA heuristics. We find that for larger instances the gap decreases, with Red only being able to decrease Blue’s utility by $5\%$ through coordination in the instances from the Default setting underlying Table 3. In our full version [6], we show that when Red’s sensors are capable of sensing more targets, coordination is more important, sometimes halving Blue’s utility.

7 Conclusion

By introducing Escape Sensing Games, we initiated the study of a new class of games concerned with target arrangement and motivated by security applications. We showed that while the worst-case computational complexity of ESGs is prohibitive, our presented algorithms still perform well in experiments.

There are multiple directions for future work emanating from our work. First, pinpointing the precise complexity of computing Stackelberg equilibria remains a concrete open question. Second, there are other variants of ESGs beyond those studied by us. For instance, it would be possible to merge the settings studied in Sections 4 and 6 into a game where sensors act greedily but Red can control the ordering of the sensors. In this game variant where both Red and Blue need to pick orders, it would also be possible to study simultaneous play or Stackelberg equilibria where Red moves first. Lastly, there are various other target arrangement problems to be studied. One example could be a game where Blue needs to place targets on a grid and Red cannot sense any two targets placed close to each other.

Acknowledgements

This work was supported by the Office of Naval Research (ONR) under Grant Number N00014-23-1-2802. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the Office of Naval Research or the U.S. Government.

References

[1] Lukás Adam, Rostislav Horcík, Tomás Kasl, and Tomás Kroupa. Double oracle algorithm for computing equilibria in continuous games. In AAAI, pages 5070–5077, 2021.
[2] Gemayqzel Bouza Allende and Georg Still. Solving bilevel programs with the kkt-approach. Math. Program, 138:309–332, 2013.
[3] Evripidis Bampis, Konstantinos Dogeas, Alexander V Kononov, Giorgio Lucarelli, and Fanny Pascual. Scheduling with untrusted predictions. In IJCAI, pages 4581–4587, 2022.
[4] Soheil Behnezhad, Avrim Blum, Mahsa Derakhshan, MohammadTaghi HajiAghayi, Mohammad Mahdian, Christos H Papadimitriou, Ronald L Rivest, Saeed Seddighin, and Philip B Stark. From battlefields to elections: Winning strategies of blotto and auditing games. In SODA, pages 2291–2310, 2018.
[5] Jeremiah Blocki, Nicolas Christin, Anupam Datta, Ariel Procaccia, and Arunesh Sinha. Audit games with multiple defender resources. In AAAI, pages 791–797, 2015.
[6] Niclas Boehmer, Minbiao Han, Haifeng Xu, and Milind Tambe. Escape sensing games: Detection-vs-evasion in security applications. CoRR, abs/XXXX, 2024.
[7] Victor Bucarey, Carlos Casorrán, Óscar Figueroa, Karla Rosas, Hugo Navarrete, and Fernando Ordóñez. Building real Stackelberg security games for border patrols. In GameSec, pages 193–212, 2017.
[8] Martin Chapman, Gareth Tyson, Peter McBurney, Michael Luck, and Simon Parsons. Playing hide-and-seek: an abstract game for cyber security. In ACySE, pages 1–8, 2014.
[9] Benoît Colson, Patrice Marcotte, and Gilles Savard. An overview of bilevel optimization. Annals of operations research, 153:235–256, 2007.
[10] John H. Conway. On numbers and games, Second Edition. Academic Press, 2001.
[11] James M Crawford and Andrew B Baker. Experimental results on the application of satisfiability algorithms to scheduling problems. In AAAI, pages 1092–1097, 1994.
[12] Erik D. Demaine. Playing games with algorithms: Algorithmic combinatorial game theory. In MFCS, pages 18–32, 2001.
[13] Stephan Dempe and Alain Zemkoho. Bilevel optimization. In Springer optimization and its applications, volume 161. Springer, 2020.
[14] Fei Fang, Thanh Nguyen, Benjamin Ford, Nicole Sintov, and Milind Tambe. Introduction to green security games. In IJCAI, 2015.
[15] Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2022.
[16] Venkatesan Guruswami, C. Pandu Rangan, Maw-Shang Chang, Gerard J. Chang, and C. K. Wong. The vertex-disjoint triangles problem. In WG, pages 26–37, 1998.
[17] Manish Jain, Dmytro Korzhyk, Ondrej Vanek, Vincent Conitzer, Michal Pechoucek, and Milind Tambe. A double oracle algorithm for zero-sum security games on graphs. In AAMAS, pages 327–334, 2011.
[18] Jan Karel Lenstra, AHG Rinnooy Kan, and Peter Brucker. Complexity of machine scheduling problems. In Annals of discrete mathematics, volume 1, pages 343–362. Elsevier, 1977.
[19] Yuqian Li, Vincent Conitzer, and Dmytro Korzhyk. Catcher-evader games. In IJCAI, pages 329–337, 2016.
[20] Christof Löding and Philipp Rohde. Solving the sabotage game is PSPACE-hard. In MFCS, pages 531–540, 2003.
[21] Eric Shieh, Bo An, Rong Yang, Milind Tambe, Craig Baldwin, Joseph DiRenzo, Ben Maule, and Garrett Meyer. Protect: A deployed game theoretic system to protect the ports of the united states. In AAMAS, pages 13–20, 2012.
[22] Evan Sultanik, Pragnesh Jay Modi, and William C Regli. On modeling multiagent task scheduling as a distributed constraint optimization problem. In IJCAI, pages 1531–1536, 2007.
[23] Sahar Tahernejad, Ted K Ralphs, and Scott T DeNegre. A branch-and-cut algorithm for mixed integer bilevel linear optimization problems and its implementation. Math. Program. Comput., 12:529–568, 2020.
[24] Milind Tambe. Security and game theory: algorithms, deployed systems, lessons learned. Cambridge university press, 2011.
[25] UN-News. Somalia: Pirates attack UN aid ship, prompting call for action, June 2007. https://reliefweb.int/report/somalia/somalia-pirates-attack-un-aid-ship-prompting-call-action.
[26] United-Nation. United nations peacekeeping missions military maritime task force manual, Sept 2015.
[27] Ondřej Vaněk, Zhengyu Yin, Manish Jain, Branislav Bošanskỳ, Milind Tambe, and Michal Pěchouček. Game-theoretic resource allocation for malicious packet detection in computer networks. In AAMAS, pages 905–912, 2012.
[28] John I Winn and Kevin H Govern. Maritime pirates, sea robbers, and terrorists: New approaches to emerging threats. Homeland Security Rev., 2:131, 2008.
[29] Mihalis Yannakakis. Edge-deletion problems. SIAM J. Comput., 10(2):297–309, 1981.
[30] Mihalis Yannakakis and Fanica Gavril. Edge dominating sets in graphs. SIAM J. Appl. Math., 38(3):364–372, 1980.
[31] Chongjie Zhang and Julie A Shah. Co-optimizating multi-agent placement with task assignment and scheduling. In IJCAI, pages 3308–3314, 2016.

Appendix A Additional Material for Section 4.1

See Proposition 2.

Proof.

For our dynamic program, we create a table $J[i,b_{1},\dots,b_{\tau+1}]\in\mathbb{N}$ for each $i\in\{0,1,\dots,n\}$ and $b_{1},\dots,b_{\tau+1}\in S\cup\{\emptyset\}$. $J[i,b_{1},\dots,b_{\tau+1}]$ stores the minimum value of surviving targets from $t_{1},\dots,t_{i}$ induced by a valid sensing plan $\psi$ that for each $j\in[\max(1,i-\tau),i]$ assigns target $t_{j}$ to sensor $b_{i-j+1}$ if $b_{i-j+1}\neq\emptyset$ and to no sensor if $b_{i-j+1}=\emptyset$. In this case, we say that $\psi$ witnesses the table entry. If no such sensing plan exists, $J[i,b_{1},\dots,b_{\tau+1}]$ is $\infty$. Clearly, the answer to the problem is $\min_{b_{1},\dots,b_{\tau+1}\in S\cup\{\emptyset\}}J[n,b_{1},\dots,b_{\tau+1}]$.

For the initialization, we set an entry $J[i,b_{1},\dots,b_{\tau+1}]$ to $0$ if $i=0$ and to $\infty$ otherwise. Now, we update the table for increasing $i=1,\dots,n$ by filling $J[i,b_{1},\dots,b_{\tau+1}]$ as follows. We start by assuming that $b_{1}\neq\emptyset$, implying that $t_{i}$ is to be sensed by $b_{1}$. In this case, we set the entry to $\infty$ if $\exists\ell\in[2,\tau+1]:b_{1}=b_{\ell}$ (i.e., the recharging period of $b_{1}$ would be violated) or if $b_{1}$ is not capable of sensing $t_{i}$. In both cases, there is no sensing plan to witness the entry. Otherwise, we update the entry as:

$$J[i,b_{1},\dots,b_{\tau+1}]=\min_{s\in S\cup\{\emptyset\}}J[i-1,b_{2},\dots,b_{\tau+1},s].$$

We claim that in case an update of the entry happens here, it is correct. For this, let $s^{*}\in S\cup\{\emptyset\}$ be the minimizer of the right-hand side and $\psi^{*}$ the plan witnessing $J[i-1,b_{2},\dots,b_{\tau+1},s^{*}]$. Then, we can extend $\psi^{*}$ to a sensing plan $\psi^{\prime}$ by adding $t_{i}$ to $\psi^{*}(b_{1})$. Plan $\psi^{\prime}$ is valid and a witness for the updated entry, since $\psi^{*}$ is valid and we ruled out above that $b_{1}$ is not ready or not capable of sensing $t_{i}$.

Lastly, it remains to consider the case $b_{1}=\emptyset$ , where $t_{i}$ is not assigned to any sensor. In this case, we update the entry as:

$$J[i,b_{1},\dots,b_{\tau+1}]=\min_{s\in S\cup\{\emptyset\}}J[i-1,b_{2},\dots,b_{\tau+1},s]+v_{i}.$$

Analogous to above, in case an update happens assume that $s^{*}\in S\cup\{\emptyset\}$ is the minimizer of the right-hand side and $\psi^{*}$ the plan witnessing $J[i-1,b_{2},\dots,b_{\tau+1},s^{*}]$ . Then $\psi^{*}$ is also a witness for the updated entry.

The algorithm runs in $\mathcal{O}(n\cdot(k+1)^{\tau+2})$ , as the table contains $\mathcal{O}(n\cdot(k+1)^{\tau+1})$ entries and for each entry we need to take the minimum over $k+1$ values.
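The dynamic program can be sketched compactly in Python. This is a simplified sketch and not a literal transcription of the table $J$: the function name `min_surviving_value` is hypothetical, only the sensors assigned to the most recent $\tau$ targets are kept as the state (a dictionary replaces the full table), `D[i][s] == 1` is assumed to mean that sensor `s` can sense the `i`-th target in the ordering, and a sensor is assumed to be blocked for the $\tau$ targets following one it sensed.

```python
def min_surviving_value(values, D, tau):
    """Minimum summed value of unsensed targets for a fixed target ordering.

    values[i] is the value of the i-th target in the ordering.
    """
    n = len(values)
    k = len(D[0]) if n else 0
    # State: sensors (or None) assigned to the most recent tau targets, newest first.
    states = {(None,) * tau: 0.0}

    for i in range(n):
        next_states = {}
        for recent, cost in states.items():
            # Option 1: target i is not sensed and its value survives.
            options = [(None, cost + values[i])]
            # Option 2: a capable sensor that is not recharging senses target i.
            for s in range(k):
                if D[i][s] == 1 and s not in recent:
                    options.append((s, cost))
            for choice, new_cost in options:
                key = ((choice,) + recent)[:tau]
                if new_cost < next_states.get(key, float("inf")):
                    next_states[key] = new_cost
        states = next_states
    return min(states.values())
```

For instance, with three targets of value $1$, a single sensor capable of sensing all of them, and $\tau=1$, the sensor can sense the first and third target, so the call `min_surviving_value([1, 1, 1], [[1], [1], [1]], 1)` evaluates to `1.0`.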

For $i\in[k_{\chi}]$ , let $\ell_{i}$ be the number of sensors of type $\theta_{i}$ . To extend the algorithm to sensor types, we need to prove the following lemma that allows us to collapse all sensors of the same type $\theta_{i}$ into a “meta”-sensor, which can sense $\ell_{i}$ many targets in each $\tau$ -time window:

Lemma 9.

There is a valid sensing plan of value $v$ if and only if there is a sensing plan of value $v$ where sensor capabilities are respected and for each $j\in[1,n-\tau]$ at most $\ell_{i}$ targets from $\{t_{j},\dots,t_{j+\tau}\}$ are assigned to sensors of type $\theta_{i}$ for $i\in[k_{\chi}]$.

Proof.

Let $\psi$ be a valid sensing plan of value $v$. We claim that $\psi$ also fulfills the second condition. Assume for the sake of contradiction that there is a $\theta_{i}$ and some $j\in[1,n-\tau]$ so that more than $\ell_{i}$ targets from $\{t_{j},\dots,t_{j+\tau}\}$ are assigned to sensors of type $\theta_{i}$. Then, by the pigeonhole principle, at least one sensor of type $\theta_{i}$ needs to sense two targets within $\tau$ steps, rendering the plan invalid.

For the reverse direction, assume $\psi$ is a sensing plan of value $v$ respecting the condition. For each sensor type $\theta_{i}$, let $T_{\psi}(\theta_{i})$ be the targets sensed by sensors of type $\theta_{i}$ in $\psi$. We construct a sensing plan $\psi^{\prime}$ as follows. For each sensor type $\theta_{i}$, we iterate over the targets in $T_{\psi}(\theta_{i})$ according to their position in the target ordering and always assign a target to the sensor of type $\theta_{i}$ that has not sensed a target for the longest time. $\psi^{\prime}$ has the same value as $\psi$ and respects sensor capabilities, so it remains to argue that the recharging periods are respected. For the sake of contradiction, assume that there is a sensor of type $\theta_{i}$ and some $j$ so that the sensor is assigned two targets from $\{t_{j},\dots,t_{j+\tau}\}$ in $\psi^{\prime}$. Then, by the construction of $\psi^{\prime}$, all other sensors of type $\theta_{i}$ are also assigned a target from $\{t_{j},\dots,t_{j+\tau}\}$. It follows that sensors of type $\theta_{i}$ are assigned at least $\ell_{i}+1$ many targets from $\{t_{j},\dots,t_{j+\tau}\}$ in $\psi^{\prime}$ and thereby also in $\psi$, which contradicts our initial assumptions on $\psi$. ∎
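The reassignment used in the reverse direction of Lemma 9 can be illustrated with a short Python sketch for a single sensor type. The function names `assign_within_type` and `respects_recharging` are hypothetical, targets are identified by their positions in the ordering, and a plan is assumed to violate the recharging period if one sensor senses two targets whose positions differ by at most $\tau$.

```python
def assign_within_type(positions, num_sensors):
    """Assign each position to the sensor that has been idle the longest."""
    last = [float("-inf")] * num_sensors  # position last sensed by each sensor
    assignment = {}
    for p in sorted(positions):
        s = min(range(num_sensors), key=lambda j: last[j])
        assignment[p] = s
        last[s] = p
    return assignment


def respects_recharging(assignment, tau):
    """Check that no sensor senses two targets at positions at most tau apart."""
    last = {}
    for p in sorted(assignment):
        s = assignment[p]
        if s in last and p - last[s] <= tau:
            return False
        last[s] = p
    return True
```

With two sensors of the same type, $\tau=2$, and sensed positions $1,2,4,5$, every window of $\tau+1$ consecutive positions contains at most two sensed targets, and the reassignment alternates between the two sensors, so the recharging periods are respected. For positions $1,2,3$, the window condition fails and so does the check.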

Using this lemma, we can easily adjust the dynamic programming formulation: Instead of bookmarking the sensors $b_{1},\dots,b_{\tau+1}\in S$ that have sensed the last $\tau+1$ targets, we instead bookmark the types $\theta^{\prime}_{1},\dots,\theta^{\prime}_{\tau+1}\in\Theta$ of these sensors. Now, we can set a table entry to $\infty$ (due to violated recharging time) if $\{\theta^{\prime}_{1},\dots,\theta^{\prime}_{\tau+1}\}$ contains some sensor type $\theta_{i}\in\Theta$ more than $\ell_{i}$ times. The rest of the algorithm adapts in a straightforward manner, with a resulting running time of $\mathcal{O}(n\cdot(k_{\chi}+1)^{\tau+2})$. ∎

Appendix B Additional Material for Section 4.2

See 4

Proof.

Let $\sigma:T\to[n]$ be some target ordering. We will say that two targets $t,t^{\prime}\in T$ are (placed) at distance $j$ (in $\sigma$ ) if $|\sigma(t)-\sigma(t^{\prime})|=j$ . Further, for a target $t$ we define its $i$ -surrounding to be the set of all targets whose distance from $t$ is at most $i$ .

We start the proof with two immediate claims about optimal sensing plans in response to some target ordering $\sigma$ :

Claim 1.

Let $s$ be a sensor and $t$ be a target so that $s$ is the only sensor capable of sensing $t$ and $t$ has a higher value than all the other targets $s$ can sense combined. Then $s$ senses $t$ in every optimal sensing plan.

Proof.

If $s$ did not sense $t$, we could arrive at a better valid sensing plan by letting $s$ sense only $t$. ∎

Claim 2.

Let $s$ be a sensor with recharging time $\tau$ and $t$ be some target $s$ is capable of sensing. If there is no other target that $s$ is capable of sensing in the $\tau$-surrounding of $t$, then $t$ will be sensed by some sensor in the optimal sensing plan.

Proof.

Assume that $t$ is not sensed by any sensor. Then we can just arrive at a better valid sensing plan by letting $s$ sense $t$ : This will not violate the recharging constraint, as $s$ does not sense any other targets in the $\tau$ -surrounding of $t$ . ∎

We reduce from Independent Set on $3$ -regular triangle-free graphs [16, Theorem 3].

Construction

Let $G=(V,E)$ be a $3$ -regular triangle-free graph. For each vertex $v\in V$ , we introduce a vertex target $\alpha_{v}$ of value $1$ and for each $i\in[3]$ a blocker vertex target $\beta^{i}_{v}$ of value $3$ and a vertex sensor $a^{i}_{v}$ which is capable of sensing $\alpha_{v}$ and $\beta^{i}_{v}$ . Moreover, for each edge $e=\{u,v\}\in E$ , we introduce an edge target $\gamma_{e}$ of value $3$ and an edge sensor $g_{e}$ that can sense $\gamma_{e}$ , $\alpha_{u}$ , and $\alpha_{v}$ . Additionally, we add a constraining edge target $\hat{\gamma}_{e}$ of value $4$ and a constraining edge sensor $\hat{g}_{e}$ that is capable of sensing targets $\gamma_{e}$ and $\hat{\gamma}_{e}$ . The recharging time is $\tau=3$ . We claim that there is an $\ell$ -sized independent set $X$ in $G$ if and only if the Stackelberg equilibrium in the constructed ESG has value at least $\ell$ .

Forward Direction

Assume that $X\subseteq V$ is an independent set of size $\ell$. To construct the target ordering, we go through the vertices of the independent set one by one: For $v\in X$, let $e_{1},e_{2},e_{3}$ be the three edges incident to $v$. Then, we add to the ordering the targets $\hat{\gamma}_{e_{1}},\beta^{1}_{v},\beta^{2}_{v},\gamma_{e_{1}},\alpha_{v},\gamma_{e_{2}},\gamma_{e_{3}},\beta^{3}_{v},\hat{\gamma}_{e_{2}},\hat{\gamma}_{e_{3}}$ in this order. After we have processed all vertices from the independent set like this, we append all other targets in an arbitrary ordering. First observe that because of Claim 1, the constraining edge sensors will always sense the constraining edge targets. As a result of the structure of our target ordering, the constraining edge sensors for edges incident to $v\in X$ cannot sense the corresponding edge targets. From this, following a reasoning analogous to Claim 1, we get that for each edge $e$ incident to a vertex from $X$, the edge sensor $g_{e}$ senses the edge target $\gamma_{e}$. Moreover, we also get from Claim 1 that all three vertex sensors corresponding to a vertex $v\in X$ sense their respective blocker vertex target. All in all, it follows that for each vertex $v\in X$ all three corresponding vertex sensors and all edge sensors corresponding to incident edges sense a target within distance $3$ of $\alpha_{v}$. Thus, none of the sensors that are capable of sensing $\alpha_{v}$ can do so without violating their recharging constraint. Consequently, all $\ell$ vertex targets corresponding to vertices from $X$ make it unsensed through the channel.
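The distance argument for a single gadget can be checked mechanically. The following Python sketch lists the ten targets of one block in the stated order and verifies that (i) every target that blocks a sensor capable of sensing $\alpha_{v}$ lies within distance $\tau=3$ of $\alpha_{v}$, and (ii) each constraining edge target lies within distance $3$ of its edge target. The short names (`gh1` for $\hat{\gamma}_{e_{1}}$, `b1` for $\beta^{1}_{v}$, `a1` for $a^{1}_{v}$, and so on) are illustrative abbreviations, not notation from the paper.

```python
TAU = 3

# One block of the ordering, in the order given in the text.
block = ["gh1", "b1", "b2", "g1", "a", "g2", "g3", "b3", "gh2", "gh3"]
pos = {name: p for p, name in enumerate(block, start=1)}

# Each sensor capable of sensing alpha_v ("a") and the target it senses instead.
blockers = {"a1": "b1", "a2": "b2", "a3": "b3",
            "ge1": "g1", "ge2": "g2", "ge3": "g3"}
# Each edge target and the constraining edge target sensed by the same
# constraining edge sensor.
constraining = {"g1": "gh1", "g2": "gh2", "g3": "gh3"}


def within_tau(x, y):
    """True if x and y are too close for one sensor to sense both."""
    return abs(pos[x] - pos[y]) <= TAU


alpha_blocked = all(within_tau("a", t) for t in blockers.values())
edge_targets_protected = all(within_tau(g, gh) for g, gh in constraining.items())
print(alpha_blocked, edge_targets_protected)
```

Both checks succeed for this block, which is exactly what the forward direction needs: every sensor that could sense $\alpha_{v}$ is still recharging when $\alpha_{v}$ passes.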

Backward Direction

Assume that we have a target ordering $\sigma$ so that targets of summed value at least $\ell$ make it unsensed through the channel under optimal play by Red. Let $\psi$ be the optimal sensing plan played in response by Red. From Claim 1, it is immediate that no blocker vertex target and no constraining edge target can make it unsensed through the channel. Moreover, it is also easy to see that no edge target can make it unsensed through the channel, as otherwise we could always improve the plan by letting the corresponding edge sensor sense the edge target (and delete all vertex targets the sensor senses instead). Let $X$ be the set of vertices so that the corresponding vertex targets make it unsensed through the channel. As every vertex target has value $1$, the above observations imply $|X|\geq\ell$. We will now show that $X$ is an independent set.

For this, we make a series of observations: First, let $\alpha_{v}$ be an unsensed vertex target. Note that $\alpha_{v}$ can be sensed by $6$ different sensors (three vertex sensors and three edge sensors). By Claim 2, we have that for all of these sensors there needs to be another target that the sensor is capable of sensing in the $3$-surrounding of $\alpha_{v}$. As there are $6$ targets in a $3$-surrounding, it follows that the $3$-surrounding of $\alpha_{v}$ contains only targets that can be sensed by one of these six sensors (specifically, one target for each of these six sensors).

Second, we show that for each unsensed vertex target $\alpha_{v}$ there is no other vertex target in its $3$-surrounding. Recall from the first observation that the only vertex targets that could be in the $3$-surrounding of $\alpha_{v}$ are vertex targets corresponding to neighbors of $v$ that are sensed by the corresponding edge sensor in $\psi$. For the sake of contradiction, assume that there is some $u\in V$ so that $e=\{u,v\}\in E$ and $|\sigma(\alpha_{v})-\sigma(\alpha_{u})|\leq 3$. Observe that the $3$-surroundings of $\alpha_{u}$ and $\alpha_{v}$ overlap in at least $3$ targets. As $G$ is triangle-free and thus $u$ and $v$ do not share any common neighbors, this implies that the $3$-surrounding of $\alpha_{u}$ contains at most three targets that can be sensed by one of the six sensors that are capable of sensing $\alpha_{u}$ (the other three spots are filled with blocker vertex targets corresponding to $v$ and edge targets corresponding to edges incident to $v$, by the first observation and as $\alpha_{v}$ makes it unsensed through the channel). Let $s$ be a sensor that is capable of sensing $\alpha_{u}$ and for which there is no other target that $s$ can sense in the $3$-surrounding of $\alpha_{u}$. Next, note that by the first observation, $\alpha_{u}$ is sensed by $g_{e}$ in $\psi$. We alter the sensing plan $\psi$ as follows. We let $\alpha_{v}$ instead of $\alpha_{u}$ be sensed by $g_{e}$ and let $\alpha_{u}$ be sensed by $s$. As argued above, this does not violate the recharging time of the sensor $s$. Moreover, it also does not violate the recharging time of $g_{e}$, as there are no other targets (except $\alpha_{u}$) that $g_{e}$ is capable of sensing in the $3$-surrounding of $\alpha_{v}$ (by the first observation). In the altered sensing plan, the value of surviving targets is strictly smaller, contradicting that $\psi$ is an optimal response by Red.

Third, we claim that any two unsensed vertex targets $\alpha_{v}$ and $\alpha_{v^{\prime}}$ are placed at a distance of at least $7$ from each other, i.e., their $3$-surroundings do not overlap. To show this, we need to examine the constraining edge sensors. Let $e$ be an edge incident to $v$. Our first two observations told us that the edge target $\gamma_{e}$ needs to be in the $3$-surrounding of $\alpha_{v}$. However, this is not sufficient: We claim that the constraining edge target $\hat{\gamma}_{e}$ needs to be in the $3$-surrounding of $\gamma_{e}$. Assume that this does not hold. Then we could alter our sensing plan $\psi$ by making $\hat{g}_{e}$ sense $\gamma_{e}$ (it can do so without violating its recharging time because $\hat{\gamma}_{e}$ is not in the $3$-surrounding of $\gamma_{e}$) and making $g_{e}$ sense $\alpha_{v}$, which leads to a strictly better sensing plan. The claim follows. To prove the observation, for the sake of contradiction assume that the $3$-surroundings of $\alpha_{v}$ and $\alpha_{v^{\prime}}$ did overlap. From the previous two observations, it follows that the only possibility for their $3$-surroundings to overlap is that there is an edge $e=\{v,v^{\prime}\}\in E$ and that $\gamma_{e}$ constitutes the overlap of their $3$-surroundings. However, in this case $\hat{\gamma}_{e}$ cannot be placed in the $3$-surrounding of $\gamma_{e}$, as it can belong to neither the $3$-surrounding of $\alpha_{v}$ nor that of $\alpha_{v^{\prime}}$ (by the first observation). This contradicts our above claim.

Now, combining these observations, it follows that for each $v\in X$, the $3$-surrounding of $\alpha_{v}$ contains the edge targets $\gamma_{e_{1}},\gamma_{e_{2}},\gamma_{e_{3}}$ with $e_{1}$, $e_{2}$, and $e_{3}$ being the edges incident to $v$. Moreover, we have shown that the $3$-surroundings of the targets $\alpha_{v}$ with $v\in X$ are pairwise disjoint. This implies that no two vertices $v,u\in X$ can be incident to the same edge $e$, as otherwise $\gamma_{e}$ would be in the $3$-surroundings of both $\alpha_{u}$ and $\alpha_{v}$, contradicting their disjointness. It follows that $X$ is an independent set of size at least $\ell$.

∎

Algorithm 3 Greedy_Sensing

Input: Blue team’s ordering $\sigma$ of targets
Output: Red team’s sensing plan $\psi$

1: Sort Blue’s targets according to their value
2: Label Red’s sensors with $\{1,\cdots,k\}$
3: for each target in the order of decreasing values do
4:     Find all sensors that are available to sense the target with index $i$ in the input ordering
5:     if Only one sensor $s\in[k]$ can sense then
6:         Set $\psi_{i,s}=1$
7:     else if Multiple sensors $S\subseteq[k]$ can sense then
8:         Choose one $s\in S$ randomly and set $\psi_{i,s}=1$
9: Return $\psi$
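Algorithm 3 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `greedy_sensing`, the 0-based positions, the dictionary encoding of $\psi$, and the seeded random generator are assumptions made for the example. A sensor counts as available if it is capable of sensing the target and has not been assigned another target within distance $\tau$.

```python
import random


def greedy_sensing(values, D, tau, rng=None):
    """values[i]: value of the target at position i in Blue's ordering.
    D[i][s] == 1 iff sensor s is capable of sensing that target.
    Returns psi as a dict {position: sensor} (unsensed positions are absent)."""
    rng = rng or random.Random(0)  # seeded only to make runs reproducible
    n, k = len(values), len(D[0])
    sensed_positions = [[] for _ in range(k)]  # positions assigned to each sensor
    psi = {}
    # Lines 1 and 3: consider targets in order of decreasing value.
    for i in sorted(range(n), key=lambda p: -values[p]):
        # Line 4: capable sensors that respect the recharging time.
        available = [
            s for s in range(k)
            if D[i][s] and all(abs(i - p) > tau for p in sensed_positions[s])
        ]
        if available:
            # Lines 5 to 8: a single candidate is chosen directly,
            # several candidates are broken uniformly at random.
            s = rng.choice(available)
            psi[i] = s
            sensed_positions[s].append(i)
    return psi


# Hypothetical instance: one sensor that can sense all three targets.
psi = greedy_sensing(values=[1, 5, 2], D=[[1], [1], [1]], tau=1)
print(psi)
```

In this instance the sensor is spent on the most valuable target at position 1, which blocks both neighbours, so targets of total value 3 survive.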

See 5

Proof.

We start by giving the formulation, where the inner-level program is similar to the ILP presented in Proposition 3. Note that the outer-level maximizes the value of non-sensed targets, while the inner-level minimizes this value:

$\displaystyle\max_{z,l}$	$\displaystyle\sum_{i\in[n]}v_{i}\cdot x_{i,k+1}$	(2)
s.t.	$\displaystyle z_{i}\neq z_{i^{\prime}},\quad\forall i\neq i^{\prime}\in[n]$	(3)
	$\displaystyle\tau+1-n\cdot(1-l_{i,j})\leq z_{j}-z_{i},\quad\forall i,j\in[n]$	(4)
	$\displaystyle z_{j}-z_{i}\leq n\cdot l_{i,j}+\tau,\quad\forall i,j\in[n]$	(5)
	$\displaystyle\mathbf{x}=\text{argmin}_{\hat{\mathbf{x}}}\sum_{i\in[n]}v_{i}\cdot\hat{x}_{i,k+1}$	(6)
s.t.	$\displaystyle\sum_{j\in[k+1]}\hat{x}_{i,j}=1,\quad\forall i\in[n]$	(7)
	$\displaystyle\hat{x}_{i,j}\leq D_{i,j},\quad\forall i\in[n],j\in[k]$	(8)
	$\displaystyle\hat{x}_{i,j}+\hat{x}_{i^{\prime},j}\leq 1+l_{i,i^{\prime}}+l_{i^{\prime},i},\quad\forall i\neq i^{\prime}\in[n],j\in[k]$	(9)

The outer-level program controls an integer variable $z_{i}$ for each target $i\in[n]$ that encodes the position in which the target appears in the final ordering. Moreover, for each pair of targets $i,j\in[n]$, we have a binary variable $l_{i,j}$ capturing whether $t_{i}$ appears at least $\tau+1$ positions before $t_{j}$ in the ordering induced by the $z$ variables. Equation 3 ensures that each target is assigned a unique position,¹⁰ and Equations 4 and 5 ensure that $l_{i,j}$ is set to one if and only if $z_{j}\geq z_{i}+\tau+1$ (in Equations 4 and 5 we have $\tau+1\leq z_{j}-z_{i}$ if $l_{i,j}=1$; and $z_{j}-z_{i}\leq\tau$ if $l_{i,j}=0$).

¹⁰ For each $i\neq i^{\prime}\in[n]$, to convert Equation 3 into a linear constraint, we have to introduce a new binary variable $\delta_{i,i^{\prime}}$. We then add two linear constraints $z_{i}\leq z_{i^{\prime}}-1+n\cdot\delta_{i,i^{\prime}}$ and $z_{i}\geq z_{i^{\prime}}+1-n\cdot(1-\delta_{i,i^{\prime}})$, which can be satisfied simultaneously if and only if $z_{i}\neq z_{i^{\prime}}$.
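The linearization of the disequality constraint can be sanity-checked by enumeration. The sketch below assumes the standard big-M pair $z_{i}\leq z_{i^{\prime}}-1+n\cdot\delta$ and $z_{i}\geq z_{i^{\prime}}+1-n\cdot(1-\delta)$ with positions in $[1,n]$, and checks that some binary $\delta$ satisfies both constraints exactly when the two positions differ. The function names are illustrative.

```python
from itertools import product


def distinct_via_big_m(z_i, z_j, n):
    """True iff some binary delta satisfies both big-M constraints."""
    return any(
        z_i <= z_j - 1 + n * delta and z_i >= z_j + 1 - n * (1 - delta)
        for delta in (0, 1)
    )


def linearization_is_exact(n):
    """Compare the big-M encoding with z_i != z_j on all position pairs in [1, n]."""
    return all(
        distinct_via_big_m(a, b, n) == (a != b)
        for a, b in product(range(1, n + 1), repeat=2)
    )


print(linearization_is_exact(6))
```

Choosing $\delta=0$ forces $z_{i}<z_{i^{\prime}}$ and $\delta=1$ forces $z_{i}>z_{i^{\prime}}$; in both cases the other constraint is slack because positions differ by at most $n-1$.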

The inner-level program controls binary variables $x_{i,j}$ for each $i\in[n]$ and $j\in[k+1]$, which encode the sensing plan as in Proposition 3, i.e., $x_{i,j}=1$ for $i\in[n]$ and $j\in[k]$ implies that $t_{i}$ is sensed by $s_{j}$. The value of the encoded plan is again $\sum_{i\in[n]}v_{i}\cdot x_{i,k+1}$, which Blue wants to maximize (Equation 2) and Red wants to minimize (Equation 6). The validity of the sensing plan is secured by Equations 7, 8 and 9, where Equation 9 imposes that a sensor $s_{j}$ can only sense two targets $t_{i}$ and $t_{i^{\prime}}$ if either $t_{i}$ is at least $\tau+1$ positions before $t_{i^{\prime}}$ in the ordering encoded in the $z$ variables (i.e., $l_{i,i^{\prime}}=1$) or the other way around (i.e., $l_{i^{\prime},i}=1$). ∎
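The max-min structure of the bilevel program can be illustrated on tiny instances by brute force. The following Python sketch is not the formulation above and does not scale; it enumerates every target ordering for Blue and, for each, every valid sensing plan for Red. The function names and the instance are hypothetical.

```python
from itertools import permutations, product


def red_best_response(order, values, D, tau):
    """Minimum surviving value over all valid sensing plans (inner level)."""
    n, k = len(order), len(D[0])
    best = sum(values)
    # plan[p] is the sensor sensing the target at position p, or k for "unsensed".
    for plan in product(range(k + 1), repeat=n):
        capable = all(plan[p] == k or D[order[p]][plan[p]] for p in range(n))
        recharged = all(
            plan[p] == k or plan[p] != plan[q] or q - p > tau
            for p in range(n) for q in range(p + 1, n)
        )
        if capable and recharged:
            surviving = sum(values[order[p]] for p in range(n) if plan[p] == k)
            best = min(best, surviving)
    return best


def stackelberg_value(values, D, tau):
    """Maximum over Blue's orderings of Red's best-response value (outer level)."""
    return max(
        red_best_response(order, values, D, tau)
        for order in permutations(range(len(values)))
    )


# Hypothetical instance: three targets, one sensor that can sense all of them.
print(stackelberg_value(values=[5, 1, 1], D=[[1], [1], [1]], tau=1))
```

With $\tau=1$, Blue does best by placing the valuable target in the middle: Red then either senses it alone or senses the two cheap targets around it.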

We give our greedy approximation algorithm for Best Red Response in Algorithm 3.

Appendix C Additional Material for Section 6.1

C.1 Proof of Theorem 6

C.1.1 Connection between Best Blue Response and Minimum Maximal Matching

On an intuitive level, solving Best Blue Response with infinite recharging time $\tau=\infty$ and uniform target values has some similarities to solving the classic NP-hard Minimum Maximal Matching problem (Yannakakis and Gavril [30]):

Minimum Maximal Matching
Input: A bipartite graph $G=(U\cup V,E)$ and an integer $\ell$ .

Question: Is there a maximal matching in $G$ containing at most $\ell$ edges?

We now discuss the intuitive connection as well as reasons why immediate reductions between the two problems are prohibited. Assume that we have a solution to our Best Blue Response instance where targets are ordered as $(t_{1},\dots,t_{n})$. From this, let us construct a bipartite graph $G$ with vertices $T$ on the left side and vertices $S$ on the right side. For the edge set $E$, we add an edge between a target $t$ and a sensor $s$ if $s$ is capable of sensing $t$. Let now $F\subseteq E$ be the set of sensor-target pairs with $\{t,s\}\in F$ if $s$ senses $t$ when targets are sent according to $(t_{1},\dots,t_{n})$. It needs to hold that $F$ is a maximal matching in $G$: Otherwise, there is some $\{t,s\}\in E\setminus F$ such that neither $t$ nor $s$ is incident to an edge from $F$. The existence of this edge implies that target $t$ made it through the channel and sensor $s$ did not sense any target, which leads to a contradiction. Moreover, the size of this matching, i.e., $|F|$, corresponds to the number of lost targets. Thus, Blue wants to find a maximal matching of minimum size. This discussion suggests a close connection between Best Blue Response and Minimum Maximal Matching.

However, there are some crucial differences between the two problems which prohibit immediate reductions from one problem to the other: Most crucially, assume we were to model a bipartite graph as an instance of Best Blue Response by letting $U$ be the targets and $V$ the sensors (in some ordering $(v_{1},\dots,v_{n})$ ) and let $v$ be capable of sensing $u$ if $\{u,v\}\in E$ . The problem with this construction is that we cannot model arbitrary matchings $E^{\prime}\subseteq E$ as solutions to the constructed Best Blue Response instance: Assume that $E^{\prime}$ contains some edge $\{u,v_{i}\}$ and there is some $v_{j}$ with $j<i$ , $\{u,v_{j}\}\in E$ , and $v_{j}$ is not incident to any edges from $E^{\prime}$ . In this case, it is not possible to send the targets through the channel such that $E^{\prime}$ will be the sensed sensor-target pairs because $u$ will always be sensed by $v_{j}$ before it can be sensed by $v_{i}$ (which in turn implies that $v_{i}$ is still ready to sense other targets). As a consequence, intuitively speaking, in instances of Best Blue Response we are only interested in maximal matchings where each matched vertex from the left is matched to its “first” otherwise unmatched neighbor from the right side.
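The obstacle described above can be reproduced with a small simulation of greedily acting sensors. The Python sketch below uses hypothetical names (`simulate_greedy`, `capable`, `sensor_order`); with the default $\tau=\infty$, every sensor senses at most one target, so the returned pairs form a matching.

```python
def simulate_greedy(order, capable, sensor_order, tau=float("inf")):
    """Send targets in `order`. Each target is sensed by the first sensor in
    `sensor_order` that is capable of sensing it and is not recharging.
    Returns {target: sensor} for all sensed targets."""
    last_used = {}  # sensor -> position of its most recent activation
    sensed = {}
    for position, target in enumerate(order):
        for sensor in sensor_order:
            ready = sensor not in last_used or position - last_used[sensor] > tau
            if sensor in capable[target] and ready:
                sensed[target] = sensor
                last_used[sensor] = position
                break
    return sensed


# A single target u that both v1 and v2 can sense: {u, v2} is a maximal
# matching, but it can never be realized because v1 comes first.
print(simulate_greedy(["u"], {"u": {"v1", "v2"}}, ["v1", "v2"]))

# The ordering of targets changes the size of the resulting matching.
capable = {"u1": {"v1", "v2"}, "u2": {"v1"}}
print(simulate_greedy(["u1", "u2"], capable, ["v1", "v2"]))
print(simulate_greedy(["u2", "u1"], capable, ["v1", "v2"]))
```

Sending `u1` first lets `v1` absorb it, after which `u2` passes unsensed; sending `u2` first costs Blue both targets.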

Because of this, we need to turn to a slightly more involved reduction that draws inspiration from the NP-hardness proof of Minimum Maximal Matching by Yannakakis and Gavril [30], yet requires some reworking of the construction and a different, more involved proof. We reduce from the SAT variant defined in the next subsection.

C.1.2 Proof of Correctness

See Theorem 6.

Proof.

In this proof, for a target $t$ , we let $D(t)$ be the set of sensors that are capable of sensing $t$ .

We reduce from the following problem, which is NP-hard as proven by Yannakakis [29].

Restricted 3-Sat
Input: A propositional formula $(X,C)$ where each clause contains three literals and each variable appears in exactly two clauses positively and in exactly one clause negatively.

Question: Is there an assignment to variables in $X$ such that each clause from $C$ is satisfied?

Let $(X,C)$ be a given Restricted 3-Sat instance.

Construction.

Each target has value one. For each clause $c\in C$ , we add a clause target $t_{c}$ and a clause sensor $s_{c}$ with $D({t_{c}})=\{s_{c}\}$ .

For each variable $x\in X$ , we add a variable gadget. That is, we add variable targets $t_{x,1}$ , $t_{x,2}$ and $t_{\bar{x}}$ together with dummy targets $t^{\mathrm{du}}_{x,1}$ , $t^{\mathrm{du}}_{x,2}$ , $t^{\mathrm{du}}_{x,3}$ , and $t^{\mathrm{du}}_{x,4}$ . Next, we add catch sensors $s^{\mathrm{ca}}_{x,1}$ , $s^{\mathrm{ca}}_{x,2}$ , and $s^{\mathrm{ca}}_{\bar{x}}$ , which ensure that none of the variable targets can make it through the channel. Moreover, we add variable sensors $s_{x,1}$ , $s_{x,2}$ and $s_{\bar{x}}$ together with dummy sensors $s^{\mathrm{du}}_{x,1}$ and $s^{\mathrm{du}}_{x,2}$ .

Let $c_{i},c_{j},c_{k}\in C$ be three clauses such that $x$ appears positive in clauses $c_{i}$ and $c_{j}$ and negative in clause $c_{k}$ . The sensing matrix is defined through:

	$\displaystyle D({t_{x,1}})=\{s_{c_{i}},s_{x,1},s^{\mathrm{ca}}_{x,1}\}$	$\displaystyle D({t_{x,1}^{\mathrm{du}}})=\{s_{x,1},s^{\mathrm{du}}_{x,1}\}$
	$\displaystyle D({t_{x,2}})=\{s_{c_{j}},s_{x,2},s^{\mathrm{ca}}_{x,2}\}$	$\displaystyle D({t_{x,2}^{\mathrm{du}}})=\{s_{\bar{x}},s^{\mathrm{du}}_{x,1}\}$
	$\displaystyle D({t_{\bar{x}}})=\{s_{c_{k}},s_{\bar{x}},s^{\mathrm{ca}}_{\bar{x}}\}$	$\displaystyle D({t_{x,3}^{\mathrm{du}}})=\{s_{\bar{x}},s^{\mathrm{du}}_{x,2}\}$
		$\displaystyle D({t_{x,4}^{\mathrm{du}}})=\{s_{x,2},s^{\mathrm{du}}_{x,2}\}$

The recharging time is $\infty$ , i.e., each sensor can sense at most one target. The ordering of the sensors is as follows. First, come the clause sensors (in some arbitrary ordering), then the dummy sensors (in some arbitrary ordering), then the variable sensors (in some arbitrary ordering) and last the catch sensors (in some arbitrary ordering). We ask whether there is an ordering of targets so that at least $\ell:=|C|+2|X|$ targets are not sensed. It is easy to see that the construction satisfies the restrictions from the theorem statement.

Proof of Correctness: Forward Direction

Assume we are given an assignment of variables $X$ that fulfills $C$ . Let $X^{*}\subseteq X$ be the set of variables set to true in this assignment. From this, we construct a partition of the targets into four groups that determine the ordering in which the targets move through the channel; the first group comes first and so on; the ordering of targets within one group is arbitrary. The first group consists of targets $t_{x,1}$ and $t_{x,2}$ for each $x\in X^{*}$ and target $t_{\bar{x}}$ for each $x\notin X^{*}$ . The second group consists of targets $t^{\mathrm{du}}_{x,1}$ and $t^{\mathrm{du}}_{x,4}$ for each $x\in X^{*}$ and $t^{\mathrm{du}}_{x,2}$ and $t^{\mathrm{du}}_{x,3}$ for each $x\notin X^{*}$ . The third group contains all remaining variable targets. And, finally, the fourth group contains all $|C|$ clause targets and the remaining $2|X|$ dummy targets. It is sufficient to prove that all $\ell$ targets from the fourth group make it through the channel. We prove this via a series of three claims.

Claim 3.

1. Each clause sensor senses a target from the first group.

2. Each dummy sensor senses a target from the second group.

3. For each $x\in X^{*}$, $s_{\bar{x}}$ senses a target from one of the first three groups, and for each $x\notin X^{*}$, $s_{x,1}$ and $s_{x,2}$ sense a target from one of the first three groups.

Proof.

Proof of 1. This follows directly from the fact that $X^{*}$ is a satisfying assignment and that the clause sensors come first in the ordering of sensors.

Proof of 2. This follows directly from the fact that the dummy sensors come after the clause sensors in the sensor ordering and that the second group contains, for each $x\in X$, either $t^{\mathrm{du}}_{x,1}$ or $t^{\mathrm{du}}_{x,2}$ (making $s^{\mathrm{du}}_{x,1}$ sense a target) and either $t^{\mathrm{du}}_{x,3}$ or $t^{\mathrm{du}}_{x,4}$ (making $s^{\mathrm{du}}_{x,2}$ sense a target).

Proof of 3. Let us focus on one $x\in X$ , and let $c_{i},c_{j},c_{k}\in C$ be three clauses such that $x$ appears positive in clauses $c_{i}$ and $c_{j}$ and negative in clause $c_{k}$ . If $x\in X^{*}$ , then $t_{\bar{x}}$ is part of the third group. However, from Statement 1 it follows that $s_{c_{k}}$ already sensed a previous target. As the variable sensors are before the catch sensors in the sensor order, it follows that $s_{\bar{x}}$ senses $t_{\bar{x}}$ . Similarly, if $x\notin X^{*}$ , $t_{x,1}$ and $t_{x,2}$ are part of the third group. Both $s_{c_{i}}$ and $s_{c_{j}}$ have already sensed previous targets because of Statement 1. From this Statement 3 follows. ∎

The claim implies that all sensors that can sense a target from the fourth group have already sensed another target before it is the fourth group’s turn, implying that all $\ell$ targets from the fourth group will make it through the channel.
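The forward direction can be replayed on a small formula. The Python sketch below builds the instance from the construction for a hypothetical Restricted 3-Sat formula with three variables and three clauses, orders the targets into the four groups, and simulates greedy sensors with $\tau=\infty$. All identifiers and the tuple encoding of targets and sensors are illustrative choices, not notation from the paper.

```python
def build_instance(variables, clauses):
    """Return (capable, sensor_order); capable[t] plays the role of D(t)."""
    capable = {("tc", c): {("sc", c)} for c in range(len(clauses))}
    clause_s = [("sc", c) for c in range(len(clauses))]
    dummy_s, var_s, catch_s = [], [], []
    for x in variables:
        ci, cj = [c for c, cl in enumerate(clauses) if x in cl]       # positive
        (ck,) = [c for c, cl in enumerate(clauses) if "-" + x in cl]  # negative
        for lit, c in ((1, ci), (2, cj), ("neg", ck)):
            capable[("t", x, lit)] = {("sc", c), ("s", x, lit), ("ca", x, lit)}
        capable[("du", x, 1)] = {("s", x, 1), ("sdu", x, 1)}
        capable[("du", x, 2)] = {("s", x, "neg"), ("sdu", x, 1)}
        capable[("du", x, 3)] = {("s", x, "neg"), ("sdu", x, 2)}
        capable[("du", x, 4)] = {("s", x, 2), ("sdu", x, 2)}
        dummy_s += [("sdu", x, 1), ("sdu", x, 2)]
        var_s += [("s", x, lit) for lit in (1, 2, "neg")]
        catch_s += [("ca", x, lit) for lit in (1, 2, "neg")]
    # Clause sensors first, then dummy, variable, and catch sensors.
    return capable, clause_s + dummy_s + var_s + catch_s


def forward_order(variables, clauses, true_vars):
    """Concatenate the four groups from the forward direction."""
    g1, g2, g3, g4 = [], [], [], []
    for x in variables:
        pos_t, neg_t = [("t", x, 1), ("t", x, 2)], [("t", x, "neg")]
        outer, inner = [("du", x, 1), ("du", x, 4)], [("du", x, 2), ("du", x, 3)]
        if x in true_vars:
            g1 += pos_t; g2 += outer; g3 += neg_t; g4 += inner
        else:
            g1 += neg_t; g2 += inner; g3 += pos_t; g4 += outer
    g4 += [("tc", c) for c in range(len(clauses))]
    return g1 + g2 + g3 + g4


def survivors(order, capable, sensor_order):
    """Greedy sensors with infinite recharging time: each senses at most once."""
    used, alive = set(), []
    for t in order:
        s = next((s for s in sensor_order if s in capable[t] and s not in used), None)
        if s is None:
            alive.append(t)
        else:
            used.add(s)
    return alive


variables = ["a", "b", "c"]
clauses = [["a", "b", "-c"], ["b", "c", "-a"], ["c", "a", "-b"]]
capable, sensor_order = build_instance(variables, clauses)
order = forward_order(variables, clauses, true_vars={"a", "b", "c"})
alive = survivors(order, capable, sensor_order)
print(len(alive), len(clauses) + 2 * len(variables))
```

For the satisfying assignment that sets all three variables to true, exactly the targets of the fourth group survive, matching $\ell=|C|+2|X|$.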

Proof of Correctness: Backward Direction

Assume that there is an ordering of the targets such that at least $\ell$ targets move unsensed through the channel, and let $P^{*}\subseteq T$ be the set of these targets.

We prove the backward direction in a series of claims:

Claim 4.

1. No variable target is part of $P^{*}$.

2. For each $x\in X$, either $t^{\mathrm{du}}_{x,1}$ or $t^{\mathrm{du}}_{x,2}$ and either $t^{\mathrm{du}}_{x,3}$ or $t^{\mathrm{du}}_{x,4}$ is part of $P^{*}$. All clause targets are part of $P^{*}$.

3. For each $x\in X$, if $t_{\bar{x}}$ is sensed by a clause sensor, then neither $t_{x,1}$ nor $t_{x,2}$ is sensed by a clause sensor.

Proof.

Proof of 1. This follows immediately from the existence of a designated catch sensor for each variable target which can only sense this target. As a consequence, no variable target can ever make it unsensed through the channel.

Proof of 2. Note that because of the sensor $s_{x,1}^{\mathrm{du}}$ it is never possible that both $t^{\mathrm{du}}_{x,1}$ and $t^{\mathrm{du}}_{x,2}$ make it unsensed through the channel. Similarly, because of $s_{x,2}^{\mathrm{du}}$ it is never possible that both $t^{\mathrm{du}}_{x,3}$ and $t^{\mathrm{du}}_{x,4}$ make it unsensed through the channel. Together with Statement 1, this implies that from each variable gadget at most $2$ targets can be part of $P^{*}$. By recalling that $\ell=|C|+2\cdot|X|$ and that there are only $|C|$ clause targets outside of variable gadgets, the statement follows.

Proof of 3. Let us focus on $t_{x,1}$ (the proof for $t_{x,2}$ is analogous). For the sake of contradiction, assume that $t_{x,1}$ and $t_{\bar{x}}$ are both sensed by clause sensors. Then neither $s_{x,1}$ nor $s_{\bar{x}}$ senses a variable target. Accordingly, at most one target out of $\{t^{\mathrm{du}}_{x,1},t^{\mathrm{du}}_{x,2},t^{\mathrm{du}}_{x,3},t^{\mathrm{du}}_{x,4}\}$ can make it unsensed through the channel, contradicting Statement 2. ∎

Let $\alpha$ be a truth assignment that sets $x\in X$ to false if $t_{\bar{x}}$ is sensed by a clause sensor and $x\in X$ to true if $t_{x,1}$ or $t_{x,2}$ is sensed by a clause sensor. If neither of the two conditions holds, then we set $x$ to true. Note that the well-definedness of $\alpha$ follows immediately from Statement 3 of Claim 4. For the sake of contradiction, assume that some $c\in C$ is not satisfied by $\alpha$. Then $s_{c}$ does not sense a variable target corresponding to a literal appearing in $c$ (by the definition of $\alpha$). This implies that $s_{c}$ will sense $t_{c}$, a contradiction to Statement 2 of Claim 4. ∎

C.2 Proof of Proposition 7

See Proposition 7.

Proof.

Recall that we assume that sensors act greedily and that the sensor ordering is fixed and known. We iteratively construct the target ordering, always appending an additional target at the end of the ordering, while storing the types of the already sent targets as well as the sensors that sensed the last $\tau+1$ targets. For our dynamic program, we create a table $J[i_{1},\dots,i_{n_{\chi}},b_{1},\dots,b_{\tau+1}]\in\mathbb{N}\cup\{-\infty\}$ with $i_{j}\in\{0,\dots,\ell_{j}\}$ for each $j\in[n_{\chi}]$ and $b_{1},\dots,b_{\tau+1}\in S\cup\{\emptyset\}$. For a table cell, let $i:=\sum_{j\in[n_{\chi}]}i_{j}$, i.e., $i$ is the total number of targets that have been sent. An entry of the table stores the maximum value of targets that can survive if Blue sends $i_{j}$ targets of type $\gamma_{j}$ (for each $j\in[n_{\chi}]$) through the channel in a way that, for each $t\in[\min(i,\tau+1)]$, the $t$th last target sent is sensed by sensor $b_{t}$ if $b_{t}\neq\emptyset$ and by no sensor if $b_{t}=\emptyset$ (the intuition is that $b_{1}$ is the sensor that sensed the most recent passing target (if existent), $b_{2}$ sensed the second most recent one, and so on). If this constraint is not realizable, we set the table entry to $-\infty$. As unrealizable entries are $-\infty$ and Blue maximizes the surviving value, the answer to our problem is $\max_{b_{1},\dots,b_{\tau+1}\in S\cup\{\emptyset\}}J[\ell_{1},\dots,\ell_{n_{\chi}},b_{1},\dots,b_{\tau+1}]$.

For the initialization, we set an entry $J[i_{1},\dots,i_{n_{\chi}},b_{1},\dots,b_{\tau+1}]$ to $0$ if $\sum_{j\in[n_{\chi}]}i_{j}=0$ and to $-\infty$ otherwise. Now, we update the table for increasing $\sum_{j\in[n_{\chi}]}i_{j}=1,\dots,n$ by filling $J[i_{1},\dots,i_{n_{\chi}},b_{1},\dots,b_{\tau+1}]$ as follows. We start by assuming that $b_{1}\neq\emptyset$, implying that the most recently sent target needs to be sensed by $b_{1}$. If $b_{1}$ appears among $b_{2},\dots,b_{\tau+1}$, we set the entry to $-\infty$. Otherwise, let $\gamma_{j_{1}},\dots,\gamma_{j_{z}}$ be the target types so that $b_{1}$ is capable of sensing targets of this type and only sensors from $\{b_{2},\dots,b_{\tau+1}\}$ appear before $b_{1}$ in the sensor ordering and are capable of sensing targets of this type. Less formally speaking, $\gamma_{j_{1}},\dots,\gamma_{j_{z}}$ are all the target types so that if a target of this type is sent next over the channel, $b_{1}$ would be the sensor sensing this target (as all other sensors placed before $b_{1}$ that can sense targets of this type are still recharging, i.e., they are part of $\{b_{2},\dots,b_{\tau+1}\}$). If no such target type exists, we set $J[i_{1},\dots,i_{n_{\chi}},b_{1},\dots,b_{\tau+1}]$ to $-\infty$. Otherwise, for each $t\in[z]$, we check $\max_{s\in S\cup\{\emptyset\}}J[i_{1},\dots,i_{j_{t}}-1,\dots,i_{n_{\chi}},b_{2},\dots,b_{\tau+1},s]$ and let the entry be the maximum of these values.

Analogously, if $b_{1}=\emptyset$, we let $\gamma_{j_{1}},\dots,\gamma_{j_{z}}$ be the target types where $\{b_{2},\dots,b_{\tau+1}\}$ contains all the sensors that are capable of sensing targets of this type. If no such target type exists, we set the entry to be $-\infty$. Otherwise, for each $t\in[z]$, we compute $\max_{s\in S\cup\{\emptyset\}}J[i_{1},\dots,i_{j_{t}}-1,\dots,i_{n_{\chi}},b_{2},\dots,b_{\tau+1},s]$ plus the value of a target of type $\gamma_{j_{t}}$ and let the entry be the maximum of these values.

The correctness follows from the fact that the initially stated invariant is preserved throughout the algorithm. Observing that computing each table entry takes $\mathcal{O}(n_{\chi}\cdot(k+1))$ time, the claimed running time of $\mathcal{O}\left(n_{\chi}\cdot\left(\prod_{i=1}^{n_{\chi}}(\ell_{i}+1)\right)\cdot(k+1)^{\tau+2}\right)$ follows.

∎
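The idea behind the dynamic program can be sketched with a memoized recursion in Python. This is a forward variant and not the table $J$ itself: its state consists of the numbers of targets of each type that still have to be sent and the outcomes of the last $\tau$ sensing steps. The function name, the argument encoding, and the convention that a smaller sensor index means an earlier position in the sensor ordering are assumptions made for the example.

```python
from functools import lru_cache


def best_blue_value(type_counts, type_values, type_sensors, tau):
    """type_counts[j]:  number of targets of type j.
    type_values[j]:  value of one target of type j.
    type_sensors[j]: sensors capable of sensing type j, in sensor order.
    Returns the maximum total value Blue can get past greedy sensors."""

    @lru_cache(maxsize=None)
    def solve(remaining, recent):
        # recent: sensors (or None) that acted on the last tau targets sent.
        if sum(remaining) == 0:
            return 0
        best = float("-inf")
        for j, count in enumerate(remaining):
            if count == 0:
                continue
            # Greedy behaviour: first capable sensor that is not recharging.
            sensor = next((s for s in type_sensors[j] if s not in recent), None)
            gain = type_values[j] if sensor is None else 0
            nxt_remaining = remaining[:j] + (count - 1,) + remaining[j + 1:]
            nxt_recent = (recent + (sensor,))[-tau:] if tau > 0 else ()
            best = max(best, gain + solve(nxt_remaining, nxt_recent))
        return best

    return solve(tuple(type_counts), ())


# Hypothetical instance: two cheap targets and one valuable target,
# all of which can be sensed by the single sensor 0.
print(best_blue_value((2, 1), (1, 5), ((0,), (0,)), tau=1))
print(best_blue_value((2, 1), (1, 5), ((0,), (0,)), tau=2))
```

As in the table-based formulation, the number of states is bounded by the product of the $(\ell_{j}+1)$ terms times the number of possible sensor histories, so the approach is only practical for few target types and small $\tau$.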

C.3 Proof of Proposition 8

See Proposition 8.

Proof.

We model an instance $\mathcal{I}$ of Best Blue Response as follows. For each target $i\in[n]$ , we add an integer variable $z_{i}$ that encodes the position in which the target appears in the final ordering. We add linear constraints so that $z_{i}\in[1,n]$ and $z_{i}\neq z_{i^{\prime}}$ for all $i\neq i^{\prime}\in[1,n]$ (see Footnote 10).

Next, similarly to Proposition 3, for each $i\in[n]$ and $j\in[k+1]$, we add a binary variable $x_{i,j}$. Setting $x_{i,j}$ to one means that $t_{i}$ is detected by sensor $s_{j}$ or, in case $j=k+1$, that the target makes it unsensed through the channel. Accordingly, the objective becomes:

\max\sum_{i\in[n]}v_{i}\cdot x_{i,k+1}.

For each target $i\in[n]$ , we impose that:

\sum_{j\in[k+1]}x_{i,j}=1.

Moreover, we impose $x_{i,j}\leq D_{i,j}$ for each $i\in[n]$ and $j\in[k]$ , enforcing the sensor capabilities.

To ensure that the recharging times of sensors are respected, we add the following set of constraints. For each $j\in[k]$ and $i\neq i^{\prime}\in[n]$, we add

|z_{i^{\prime}}-z_{i}|\geq-n(2-x_{i,j}-x_{i^{\prime},j})+\tau+1.

This ensures that if $x_{i,j}=1$ and $x_{i^{\prime},j}=1$, then $i$ and $i^{\prime}$ are placed at distance at least $\tau+1$ from each other (i.e., neither lies in the $\tau$-surrounding of the other), while otherwise the condition is vacuous. To realize the absolute value from the above equation, we have to introduce another set of binary variables $o_{i,i^{\prime}}$ for $i,i^{\prime}\in[n]$ and add the constraints: $z_{i^{\prime}}-z_{i}+n\cdot o_{i,i^{\prime}}\geq-n(2-x_{i,j}-x_{i^{\prime},j})+\tau+1$ and $z_{i}-z_{i^{\prime}}+n\cdot(1-o_{i,i^{\prime}})\geq-n(2-x_{i,j}-x_{i^{\prime},j})+\tau+1$.

If a target $i$ is sensed by sensor $j$, then due to the recharging time sensor $j$ will not be able to sense other targets that follow shortly after; thus $i$ “protects” some targets from being sensed by sensor $j$. To capture this information, for each $i,i^{\prime}\in[n]$ and $j\in[k]$, we add a binary variable $y_{i,i^{\prime},j}$ that is equal to one if target $i$ is sensed by sensor $j$ and because of this $j$ cannot sense $i^{\prime}$. To ensure this, first, for each $i\in[n]$ and $j\in[k]$, we add:

\sum_{i^{\prime}\in[n]}y_{i,i^{\prime},j}\leq n\cdot x_{i,j}

(a target $i$ can only protect other targets if the target is sensed by the corresponding sensor). Moreover, for each $i,i^{\prime}\in[n]$ and $j\in[k]$, we need to make sure that if $y_{i,i^{\prime},j}=1$, then $0\leq z_{i^{\prime}}-z_{i}\leq\tau$ (so that $i^{\prime}$ indeed passes while sensor $j$ is still recharging). For this, we add constraints:

$\displaystyle-n(1-y_{i,i^{\prime},j})\leq z_{i^{\prime}}-z_{i}\leq n(1-y_{i,i^{\prime},j})+\tau.$	(10)

Lastly, we need to make sure that a target will survive until step $j$ if $x_{i,j}=1$, i.e., the target needs to be covered by other targets for all sensors that are capable of sensing it and are placed before $j$. Note that this together with the first constraint ($\sum_{j\in[k+1]}x_{i,j}=1$) in particular implies that each target is sensed by the first sensor it passes which is not recharging, thereby successfully encoding the greedy behavior of the sensors. Specifically, we add the following set of constraints for each $i\in[n]$ and $j\in[k+1]$:

$\displaystyle\sum_{t\in[j-1]:D_{i,t}=0}1+\sum_{t\in[j-1]:D_{i,t}=1}\sum_{i^{\prime}\in[n]}y_{i^{\prime},i,t}-(j-1)\geq-n\Big(1-\sum_{t=j}^{k+1}x_{i,t}\Big).$	(11)

∎

Appendix D Additional Experimental Results

To begin, Figure 2 illustrates the average utility for Blue across varying probabilities of $D_{i,j}=1$. This visualization demonstrates the impact of the probability of $D_{i,j}=1$ on Blue’s utility under the Default game setting.

In Table 7, we show the effectiveness of our ILP solver. Notably, it solves all tested instances with up to 75 targets and 10 sensors in well under a second on average. This proficiency has been valuable in the development of heuristic algorithms for bilevel optimization when searching for the Stackelberg equilibrium. We also show in Table 8 that our ILP solver can solve extensive instances involving hundreds of targets within a single hour.

Targets \ Sensors (Utility, Time (s))	2	5	10
5	1.76 $\pm$ 0.75, 0.001	1 $\pm$ 0.63, 0.002 $\pm$ 0.02	0.38 $\pm$ 0.43, 0.003 $\pm$ 0.002
25	8.73 $\pm$ 1.27, 0.004 $\pm$ 0.002	4.55 $\pm$ 1.3, 0.008 $\pm$ 0.002	1.39 $\pm$ 0.84, 0.01 $\pm$ 0.003
75	26 $\pm$ 2.8, 0.01 $\pm$ 0.007	13.9 $\pm$ 2.33, 0.02 $\pm$ 0.007	4.41 $\pm$ 1.57, 0.04 $\pm$ 0.007

Table 7: Best Red Response ILP running time. Each row represents the number of targets, and each column represents the number of sensors; $\tau=2$ for every setting. Each entry gives Blue’s average utility (first value) and the average solving time in seconds (second value) over 50 randomly generated instances under the Default game setting.

Targets \ Sensors (Utility, Time (s))	5	10	20
600	175 $\pm$ 6, 0.73 $\pm$ 0.01	77 $\pm$ 5, 1.55 $\pm$ 0.22	4.01 $\pm$ 1.53, 2.79 $\pm$ 0.04
800	234 $\pm$ 6.5, 1.02 $\pm$ 0.01	103 $\pm$ 6.1, 2.09 $\pm$ 0.27	5.28 $\pm$ 2.06, 3.82 $\pm$ 0.04
1000	292 $\pm$ 7.2, 1.23 $\pm$ 0.02	131 $\pm$ 6.7, 2.62 $\pm$ 0.35	6.4 $\pm$ 2, 5 $\pm$ 0.06
5000	1478 $\pm$ 20, 6.5 $\pm$ 0.05	662 $\pm$ 13.1, 13.66 $\pm$ 1.48	33.1 $\pm$ 3.98, 24.76 $\pm$ 0.051
10,000	2959 $\pm$ 24, 12.9 $\pm$ 0.18	1321 $\pm$ 20, 29.8 $\pm$ 2.6	66.3 $\pm$ 5.57, 62.59 $\pm$ 0.83

Table 8: Each row represents the number of targets, and each column represents the number of sensors; $\tau=10$ for every setting. Each entry gives Blue’s average utility (first value) and the average solving time of the ILP in seconds (second value) over 50 randomly generated instances under the Default game setting.

In the remaining parts of this section, we explore a new game setting (Append) for generating ESGs. The new method is similar to Default, with the distinction that each entry $D_{i,j}$ is set to $1$ with probability 0.5. In essence, this configuration increases the likelihood of each target being sensed compared to the Default setting, thereby resulting in a stronger Red sensing model.

D.1 Computing the Follower Strategy

Similar to the Default game setting, we show the scalability results of Append in Table 9.

Targets \ Sensors (Utility, Time (s))	5	10	20
600	67.8 $\pm$ 4.3, 0.74 $\pm$ 0.06	13.82 $\pm$ 3.96, 4.68 $\pm$ 8.46	0.2 $\pm$ 1.38, 2.85 $\pm$ 0.04
800	91.1 $\pm$ 5.8, 1.02 $\pm$ 0.1	19.6 $\pm$ 7.75, 6.27 $\pm$ 10.98	0.13 $\pm$ 0.36, 3.66 $\pm$ 0.04
1000	114 $\pm$ 7.9, 1.29 $\pm$ 0.14	24.5 $\pm$ 5.9, 6.06 $\pm$ 4.7	0.25 $\pm$ 1.45, 4.77 $\pm$ 0.06

Table 9: Each row represents the number of targets, and each column represents the number of sensors. $\tau=10$ for every setting.

In this new setting, an intriguing observation emerges: the runtime of our ILP can decrease as the number of sensors grows (e.g., when the number of sensors increases from 10 to 20), given that the abundant sensors can effectively sense all targets.

D.2 Additional Results from Computing the Stackelberg Equilibrium

In this subsection, we begin by showing Figure 3, illustrating the impact of the choice of ratio ( $\mu$ ) on the SA algorithm discussed in Section 4.2. Specifically, we present and test three quadratic-time greedy heuristics that build the sensing plan $\psi$ iteratively by trying to sense the most valuable targets first. We consider the targets in decreasing order of their value. Let $T^{\prime}$ be the already processed targets and $t_{\ell}$ the target to consider. Moreover, let $S^{\prime}\subseteq S$ be the set of sensors $s$ such that $\psi$ remains a valid sensing plan after adding $t_{\ell}$ to $\psi(s)$ , i.e., the sensors that are currently free to sense $t_{\ell}$ . If $S^{\prime}$ is empty, then we do not assign $t_{\ell}$ to any sensor, implying that it will be won by Blue. Otherwise, we apply one of three different methods to decide which sensor from $S^{\prime}$ to pick:

random: Randomly select a sensor from $S^{\prime}$ .
remaining_value: Pick the sensor $s$ from $S^{\prime}$ that has the lowest summed value of remaining targets that $s$ is capable of sensing, i.e., $\operatorname*{arg\,min}_{s_{j}\in S^{\prime}}\sum_{t_{i}\in T\setminus T^{\prime}:D_{i,j}=1}v_{i}$ .
harm: Pick the sensor $s$ from $S^{\prime}$ for which assigning $t_{\ell}$ does the least harm. The harm that $t_{\ell}$ does to $s$ in $\psi$ is the summed value of the remaining targets that $s$ is capable of sensing but can no longer sense once $t_{\ell}$ is assigned to $s$ , i.e., $\operatorname*{arg\,min}_{s_{j}\in S^{\prime}}\sum_{t_{i}\in T\setminus T^{\prime}:D_{i,j}=1\text{ and }|i-\ell|\leq\tau}v_{i}$ .
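The three selection rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: targets are indexed by their 0-based position in Blue's ordering, a sensor $s_j$ is taken to be free for $t_{\ell}$ if $D_{\ell,j}=1$ and no target already in $\psi(s_j)$ lies within $\tau$ positions of $\ell$ (a validity condition inferred from the $|i-\ell|\leq\tau$ term in the harm rule), and ties are broken by the lowest sensor index. The names `greedy_sensing_plan` and `blue_utility` are hypothetical, and the sketch is not tuned to meet the quadratic running time.

```python
import random


def greedy_sensing_plan(values, D, tau, rule="harm", rng=None):
    """Greedily build a sensing plan psi as {sensor index: [target positions]}.

    values[i] is the value of the target at position i in Blue's ordering,
    and D[i][j] == 1 means sensor j is able to sense target i.
    Assumption: sensor j may take target l only if D[l][j] == 1 and every
    target already assigned to j is more than tau positions away from l.
    """
    rng = rng or random.Random(0)
    n = len(values)
    k = len(D[0]) if D else 0
    plan = {j: [] for j in range(k)}
    processed = set()  # T' in the text
    # Consider targets in decreasing order of value.
    for l in sorted(range(n), key=lambda i: -values[i]):
        # S' in the text: sensors that are free to sense target l.
        free = [j for j in range(k)
                if D[l][j] == 1 and all(abs(i - l) > tau for i in plan[j])]
        if free:
            remaining = [i for i in range(n) if i not in processed]  # T \ T'
            if rule == "random":
                j = rng.choice(free)
            elif rule == "remaining_value":
                j = min(free, key=lambda s: sum(
                    values[i] for i in remaining if D[i][s] == 1))
            elif rule == "harm":
                j = min(free, key=lambda s: sum(
                    values[i] for i in remaining
                    if D[i][s] == 1 and abs(i - l) <= tau))
            else:
                raise ValueError(f"unknown rule: {rule}")
            plan[j].append(l)
        # If no sensor is free, target l stays unassigned and is won by Blue.
        processed.add(l)
    return plan


def blue_utility(values, plan):
    """Summed value of the targets that no sensor senses under the plan."""
    sensed = {i for targets in plan.values() for i in targets}
    return sum(v for i, v in enumerate(values) if i not in sensed)
```

For instance, with `values = [5, 4]`, `D = [[1, 1], [1, 0]]`, and `tau = 1`, the harm rule assigns the first target to the second sensor, which leaves the first sensor free to sense the second target.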

We also present the bilevel ILP’s scalability results under the new Append game setting in Table 10.

Utility, Time (s)	2	3	5
5	0.72 $\pm$ 0.55, 0.73 $\pm$ 0.17	0.41 $\pm$ 0.49, 0.85 $\pm$ 0.12	0.14 $\pm$ 0.33, 1.02 $\pm$ 0.04
7	0.92 $\pm$ 0.43, 121 $\pm$ 13	0.57 $\pm$ 0.53, 134 $\pm$ 17	0.18 $\pm$ 0.33, 155 $\pm$ 16
8	0.86 $\pm$ 0.51, 1841 $\pm$ 356	0.41 $\pm$ 0.42, 2151 $\pm$ 253	0.13 $\pm$ 0.27, 2832 $\pm$ 549
9	n/a, 31358	n/a, 36181	n/a, 41999

Table 10: Each row represents the number of targets, and each column represents the number of sensors. $\tau=2$ for every setting. Each element (when $n=5,7,8$ ) represents the average Blue’s utility (first value) and the solving time (second value) of the bilevel ILP for 50 randomly generated instances under the new Append game setting. At $n=9$ , the computational time is prohibitively high. Therefore, we conduct a single run on a random instance and record the solving time. Since the utility of this individual instance is not comparable to the average utility derived from 50 random instances, we have omitted it from the table.

We also provide a comparison of heuristic algorithms under the new Append game setting in Table 11.

Utility, Time (s)	(7,3,2)	(75, 10, 5)
OPT	0.58 $\pm$ 0.53, 134 $\pm$ 17	n/a
SA	0.58 $\pm$ 0.53, 6.13 $\pm$ 0.87	2.26 $\pm$ 1.13, 27335 $\pm$ 347
SA_Relax	0.58 $\pm$ 0.53, 0.043 $\pm$ 0.01	0.56 $\pm$ 0.48, 46.87 $\pm$ 0.92
Random	0.50 $\pm$ 0.54, 0.001	0.15 $\pm$ 0.27, 0.001

Table 11: Comparison of the approximation quality of the different algorithms in terms of Blue’s utility and solving time. We generate 50 random instances and report the average value and the standard deviation under the Append game setting.

D.3 Additional Results from Non-Coordinated Sensing

In Table 12, we present the scalability results of the ILP that solves for optimal Blue responses in the non-coordinated Red sensing setting under the Append game setting.

Utility, Time (s)	2	3	5
5	1.26 $\pm$ 0.46, 0.01 $\pm$ 0.04	0.96 $\pm$ 0.47, 0.02 $\pm$ 0.04	0.52 $\pm$ 0.42, 0.02 $\pm$ 0.04
10	2.34 $\pm$ 0.7, 4.07 $\pm$ 16.8	1.85 $\pm$ 0.89, 90.2 $\pm$ 294.6	1 $\pm$ 0.73, 293 $\pm$ 565
15	3.79 $\pm$ 0.87, 482 $\pm$ 2551	3.01 $\pm$ 1.16, 50.4 $\pm$ 18.9	n/a, 40296 $\pm$ n/a
20	4.61 $\pm$ 1, 889 $\pm$ 1530	n/a, 9380 $\pm$ n/a	n/a, n/a

Table 12: Each row represents the number of targets, and each column represents the number of sensors. $\tau=2$ for every setting. Each element represents the average Blue’s utility (first value) and the solving time (second value) of Blue’s best response ILP for 50 randomly generated instances under the Append game setting. For settings where the computational time is prohibitively high, we conduct a single run on a random instance and record the single solving time. Thus, their standard deviation is recorded as “n/a”. Additionally, since the utility of this single instance is not comparable to the average utility derived from 50 random instances, we have omitted it from the table.

For the non-coordinated sensing setting, we also provide a comparison of heuristic algorithms under the new scenario in Table 13.

Utility, Time (s)	(10,5,2)	(75, 10, 5)
OPT	1 $\pm$ 0.73, 293 $\pm$ 565	n/a
SA	0.83 $\pm$ 0.67, 0.76 $\pm$ 0.02	4.57 $\pm$ 1.46, 523 $\pm$ 6.28
SA_Relax	0.94 $\pm$ 0.71, 0.02 $\pm$ 0.003	3.47 $\pm$ 1.02, 3.13 $\pm$ 0.12
Random	0.38 $\pm$ 0.5, 0.001	1.28 $\pm$ 1.05, 0.001

Table 13: Comparison of the approximation quality of the different algorithms in terms of Blue’s utility and solving time. We generate 50 random instances and report the average value and the standard deviation under the Append game setting.

D.3.1 Power of Coordination

Finally, in Table 14, we present the power of coordination results under the Append game setting, in which Red has stronger sensing capabilities.

	Greedy	Coordination
(5, 2, 2)	1.26 $\pm$ 0.46	0.72 $\pm$ 0.55
(5, 3, 2)	0.96 $\pm$ 0.47	0.41 $\pm$ 0.49
(5, 5, 2)	0.52 $\pm$ 0.42	0.14 $\pm$ 0.33

Table 14: Append setting: Each element represents the average Blue’s utility for 50 randomly generated instances. The game is a constant-sum game. Therefore, the decrease in Blue’s utility corresponds to an increase in Red’s utility, showing the power of coordination.

Moreover, in scenarios with large instance sizes, such as $n=75,k=10,\tau=5$ , where the optimal (bilevel) ILP is unsolvable, we can compare Blue’s approximately optimal utility under the best heuristic algorithms. Specifically, as shown in Table 11, the average Blue’s utility under the SA algorithm is $2.26\pm 1.13$ for 9 instances. Given the same 9 instances, the average Blue’s utility when escaping from non-coordinated sensing is $4.43\pm 1.41$ , which is approximately twice the value observed in the coordinated sensing setting. Once again, due to the constant-sum nature of this game, the utility loss for Blue in transitioning from non-coordinated sensing to coordinated sensing essentially represents the utility gain for Red, highlighting the power of coordination.
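As a quick check of the arithmetic behind these statements, the following uses only the averages reported above. For $n=75,k=10,\tau=5$ , both averages are taken over the same 9 instances, so the difference of the averages equals Red's average gain from coordination:

$$\frac{4.43}{2.26}\approx 1.96,\qquad 4.43-2.26=2.17.$$

For the three settings in Table 14, the corresponding differences between the Greedy and Coordination columns are

$$1.26-0.72=0.54,\qquad 0.96-0.41=0.55,\qquad 0.52-0.14=0.38.$$

These are differences of reported averages. Reading them as Red's average gain per instance assumes that both columns of Table 14 were computed on the same instances.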
