1. Introduction
The recent advances in automation technologies, 5G networks, and cloud services have accelerated the development
of cyber-physical systems (CPSs) by integrating computing and communication functionalities with components in
the physical world. Cyber integration increases the operational efficiency of the physical system, yet it also creates
additional security vulnerabilities. First, the increased connectivity and openness have expanded the attack surface
and enabled attackers to leverage vulnerabilities from multiple system components to launch a sequence of stealthy
attacks. Second, the component heterogeneity, the functionality complexity, and the dimensionality of cyber-physical
systems have created many zero-day vulnerabilities, which make the defense arduous and costly.
Advanced Persistent Threats (APTs) are a class of emerging threats for cyber-physical systems with the following
distinct features. Unlike opportunistic attackers who spray and pray, APTs have specific targets and sufficient knowl-
edge of the system architecture, valuable assets, and even defense strategies. Attackers can tailor their strategies and
invalidate cryptography, firewalls, and intrusion detection systems. Unlike myopic attackers who smash and grab,
APTs are stealthy and can disguise themselves as legitimate users for a long sojourn in the victim’s system.
A few security researchers and experts have proposed APT models in which the entire intrusion process is divided
into a sequence of phases, such as Lockheed-Martin’s Cyber Kill Chain (see Hutchins, Cloppert and Amin (2011)),
MITRE’s ATT&CK (see Corporation (2019)), the NSA/CSS technical cyber threat framework (see Department of
Homeland Security (2018)), and the ones surveyed in Messaoud, Guennoun, Wahbi and Sadik (2016). Fig. 1 illustrates
the multi-stage structure of APTs. During the reconnaissance phase, a threat actor collects open-source or internal
intelligence to identify valuable targets. After the attacker obtains a private key and establishes a foothold, he escalates
privilege, propagates laterally in the cyber network, and eventually either accesses confidential information or inflicts
physical damage. Static standalone defense on a physical system cannot deter attacks originating from a cyber network.
The multi-phase feature of APTs results in the concept of Defense in Depth (DiD), i.e., multi-stage cross-layer
defense policies. A system defender should adopt defensive countermeasures across the phases of APTs and holistically
consider interconnections and interdependencies among these layers. To formally describe the interaction between an
APT attacker and a defender with the defense-in-depth strategy, we map the sequential phases of APTs into a game of
multiple stages. Each stage describes a local interaction between the attacker and the defender where the outcome leads
[email protected] (L. Huang); [email protected] (Q. Zhu)
ORCID (s):
00000003-15918749 (L. Huang); 00000002-00082953 (Q. Zhu)
Figure 1: An illustrative example of the multi-stage structure of APTs. The multi-stage attack is composed of reconnaissance,
initial compromise, privilege escalation, lateral movement, and mission execution. An attack originating from the cyber
network at an early stage can lead to damage in a physical system.
to the next stage of interactions. The goal of the attacker is to stealthily reach the targeted physical or informational
assets while the defender aims to take defensive actions at multiple phases to thwart the attack or reduce its impact.
Detecting APTs in a timely fashion (i.e., before attackers have reached the final stage) and effectively (i.e., with a low rate of
false alarms and missed detections) remains an open problem due to their stealthy and deceptive characteristics. As
reported in LLC (2018), US companies in 2018 took an average of 197 days to detect and 69 days to contain a data
breach. Stuxnet-like APT attacks can conceal themselves in a critical industrial system for years
and inconspicuously increase the failure probability of physical components. Due to the insufficiency of timely and
effective detection systems for APTs, the defender remains uncertain about the user’s type, i.e., either legitimate or
adversarial, throughout the stages. To prepare for potential APT attacks, the defender needs to adopt precautions and
proactive defense measures, which may also impair the user experience and reduce the utility of a legitimate user.
Therefore, the defender needs to strategically balance the tradeoff between security and usability when the user’s type
remains private.
In this work, we model the private information of the user’s type as a random variable following the work of
Harsanyi (1967). Under the same defense action, the behavior and the utility of a user depend on whether his type is
legitimate or adversarial. To make secure and usable decisions under incomplete information, the defender forms a
belief on the user’s type and updates the belief via the Bayesian rule based on the information acquired at each stage.
For example, throughout the phases of an APT, detection systems can generate many alerts based on suspicious user
activities. Although these alerts do not directly reveal the user’s type, a defender can use them to reduce the uncertainty
on the user’s type and better determine her defense-in-depth strategies at multiple stages.
Defensive deception provides an alternative perspective to bring uncertainty to the attacker and tilt the information
asymmetry. We classify a defender into different levels of sophistication based on factors such as her level of security
awareness, the detection techniques she has adopted, and the completeness of her virus signature database. A sophisti-
cated defender has a higher success rate of detecting adversarial behaviors. Thus, the behavior of an attacker depends
on the type of defender that he interacts with. For example, the attacker may remain stealthy when he interacts with a
sophisticated defender but behave more aggressively when interacting with a primitive defender. As the attacker has
incomplete information regarding the defender's type, he needs to form a belief and continuously update it based on
his observation of the defender’s actions. In this way, the attacker can optimally decide whether, when, and to what
extent, to behave aggressively or conservatively.
To this end, we also use a random variable to characterize the private information of the defender’s type. As both
players have incomplete information regarding the other player’s type and they make sequential decisions across multi-
ple stages, we extend the classical static Bayesian game to a multi-stage nonzero-sum game with two-sided incomplete
information. Both players act strategically according to their beliefs to maximize their utilities. The Perfect Bayesian
Nash Equilibrium (PBNE) provides a useful prediction of their policies at every stage for each type since no player can
benefit from unilateral deviations at the equilibrium. Computing the PBNE is challenging due to the coupling between
the forward belief update and the backward policy computation. We first formulate a mathematical programming
problem to compute the equilibrium policy pair under a given belief for the one-stage Bayesian game. For multi-stage
Bayesian games, we compute the equilibrium policy pair under a given sequence of beliefs by constructing a sequence
of nested mathematical programming problems. Finally, we combine these programs with the Bayesian update and
propose an efficient algorithm to compute the PBNE.
The proposed modeling and computational methods are shown to be capable of hardening the security of a broad
class of supervisory control and data acquisition (SCADA) systems. This work leverages the Tennessee Eastman pro-
cess as a case study of proactive defense-in-depth strategies against APT attackers who can infiltrate the cyber
network through phishing emails, escalate privileges through process injection, tamper with sensor readings through
malicious encrypted communication, and eventually decrease the operational efficiency of the Tennessee Eastman
process without triggering the alarm. The dynamic game approach offers a quantitative way to assess the risks and
provides a systematic and computational mechanism to develop proactive and strategic defenses across multiple cyber
and physical stages. Based on the computation result of the case study, we obtain the following insights to guide the
design of practical defense systems.
• Defense at the final stage is usually too late to be effective when APTs are well-prepared and ready to
attack. We need to take precautions and proactive responses in the cyber stages while the attack remains "under
the radar" so that the attacker is less dominant upon reaching the final stage.
• The online learning capability of the defender plays an important role in detecting the adversarial deception and
tilting the information asymmetry. It increases the probability of identifying the hidden information from the
observable behaviors, deters the stealthy attacker into taking more conservative actions, and hence reduces the
attack loss.
• Defensive deception techniques are shown to be effective in introducing uncertainty to attackers, increasing
their learning costs, and hence reducing the probability of successful attacks. These techniques may negatively
affect legitimate users; however, a delicate balance between security and usability can be achieved
under proper designs.
Van Dijk, Juels, Oprea and Rivest (2013) proposed the FlipIt game to model the key leakage under APTs as a stealthy takeover between
the system operator and the attacker. Many works have integrated FlipIt with other components for APT defense,
such as a signaling game to defend cloud services (see Pawlick, Chen and Zhu (2018)), an additional player to model
insider threats (see Feng, Zheng, Hu, Cansever and Mohapatra (2015)), and a system of multiple nodes under limited
resources (see Zhang, Zheng and Shroff (2015)). FlipIt provides a high-level abstraction of the attacker's
behavior to understand the optimal timing of resource allocations. However, for our purpose of developing multi-stage
defense policies, we need to provide a finer-grained model that can capture the dynamic interactions between players of
different types across multiple stages. Our game framework models heterogeneous adversarial and defensive behaviors
at multiple stages, allowing the prediction of attack moves and the estimation of losses using the equilibrium analysis.
Other security game models, such as Zhu and Rass (2018); Yang, Li, Zhang, Yang, Xiang and Zhou (2018); Huang,
Chen and Zhu (2017), have provided dynamic risk management frameworks that allow the defender to respond and
repair effectively. In particular, to model the multi-stage structure of APTs, Zhu and Rass (2018) has developed a
sequence of heterogeneous game phases, i.e., a static Bayesian game for spear phishing, a nested game for penetration,
and a finite zero-sum game for the final stage of physical-layer infrastructure protection. However, most of these
security game frameworks have assumed complete information. Our framework explicitly models the incomplete
information across all phases of APTs and introduces belief updates based on multi-stage information for
making long-term strategic decisions.
Cyber deception is an emerging research area. Games of incomplete information are natural frameworks to model
the uncertainty and misinformation introduced by cyber deceptions. Previous works mainly focus on adversarial de-
ceptions where the deceiver is the attacker. For example, strategic attackers in Nguyen, Wang, Sinha and Wellman
manipulate the attack data to mislead the defender in finitely repeated security games. A defender, on the other hand,
can also initiate defensive deception techniques such as perturbations via external noises, obfuscations via revealing
useless information, or honeypot deployments as shown in Pawlick, Colbert and Zhu (2017). Horák, Zhu and Bošanskỳ
(2017) proposes a framework to engage with attackers strategically to deceive them against the attack goal without their
awareness. A honeypot which appears to contain valuable information can lure attackers into isolation and surveil-
lance. La, Quek, Lee, Jin and Zhu (2016) has used a Bayesian game to model deceptive attacks and defenses in a
honeypot-enabled network in the envisioned Internet of Things. Besides detection, a honeypot can also be used to ob-
tain high-level indicators of compromise under a proper engagement policy as shown in Huang and Zhu (2019a) where
several security metrics are investigated and the optimal engagement policy is learned by reinforcement learning. A
system can also disguise a real asset as a honeypot to evade attacks as shown in Rowe, Custy and Duong (2007). Our
work considers a dynamic Bayesian game with double-sided incomplete information to incorporate both adversarial
and defensive deceptions.
The preliminary versions of this work (see Huang and Zhu (2018, 2019b)) have considered a dynamic game with
one-sided incomplete information where attackers disguise themselves as legitimate users. This work extends the
framework to a two-sided incomplete information structure where primitive systems can also disguise themselves as
sophisticated systems. The new framework enables us to jointly investigate deceptions adopted by both attackers and
defenders, and to strategically design defensive deceptions to counter adversarial ones. We also develop new method-
ologies to address the challenge of the coupled belief update in a generalized setting without the previous assumption
of the beta-binomial conjugate pair. In the case study, we investigate heterogeneous actions and cyber stages, such as
web phishing and privilege escalation, whose utilities are no longer negligible. Moreover, we leverage the Tennessee
Eastman process with new performance metrics and attack models to validate the efficacy of the proposed proactive
defense-in-depth strategies, the Bayesian learning, and the defensive deception.
Table 1
Summary of notations, variables, and acronyms.
General Notation Meaning
𝐴 ∶= 𝐵 𝐴 is defined as 𝐵
Pr Probability
𝑓 ∶𝐴↦𝐵 A function or a mapping 𝑓 from domain 𝐴 to codomain 𝐵
𝔼𝑎∼𝐴 [𝑓 (𝑎)] Expectation of 𝑓 (𝑎) over random variable 𝑎 whose distribution is 𝐴
ℝ Set of real numbers
|𝐴| The cardinality of set 𝐴
𝑎∼𝐴 Random variable 𝑎 follows probability distribution 𝐴
𝟏{𝑥=𝑦} Indicator function which equals one when 𝑥 = 𝑦, and zero otherwise
{𝑎1 , ⋯ , 𝑎𝑛 } Set with 𝑛 elements 𝑎1 , ⋯ , 𝑎𝑛
𝐵⧵𝐴 Set of elements in 𝐵 but not in 𝐴
Variable Meaning
𝑖, 𝑗 ∈ {1, 2} Index for players in the game: 𝑖, 𝑗 = 1 for the defender and 𝑖, 𝑗 = 2 for the user
Θ𝑖 Set of all possible types of player 𝑖 ∈ {1, 2}
Δ(Θ𝑖 ) Space of probability distributions over type set Θ𝑖 of player 𝑖 ∈ {1, 2}
𝜃𝑖 ∈ Θ𝑖 Type of player 𝑖 ∈ {1, 2}
𝜃1𝐻 (resp. 𝜃1𝐿 ) The defender is sophisticated (resp. primitive)
𝜃2𝑏 (resp. 𝜃2𝑔 ) The user is adversarial (resp. legitimate)
𝐾 Total number of stages
𝑘 ∈ {0, 1, ⋯ , 𝐾} Stage index
𝑘0 ∈ {0, 1, ⋯ , 𝐾} Index for the initial stage
𝐴𝑘𝑖 Set of all possible actions of player 𝑖 ∈ {1, 2} at stage 𝑘 ∈ {0, 1, ⋯ , 𝐾}
Δ(𝐴𝑘𝑖 ) Space of probability distributions over the action set 𝐴𝑘𝑖
𝑎𝑘𝑖 ∈ 𝐴𝑘𝑖 Action of player 𝑖 ∈ {1, 2} at stage 𝑘 ∈ {0, 1, ⋯ , 𝐾}
ℎ𝑘 , 𝐻 𝑘 Action history and the set of all possible action histories at stage 𝑘 ∈ {0, 1, ⋯ , 𝐾}
𝑥𝑘 , 𝑋 𝑘 State and the set of all possible states at stage 𝑘 ∈ {0, 1, ⋯ , 𝐾}
𝑓𝑘 State transition function at stage 𝑘, i.e., 𝑥𝑘+1 = 𝑓 𝑘 (𝑥𝑘 , 𝑎𝑘1 , 𝑎𝑘2 )
𝑙𝑖𝑘 , 𝐿𝑘𝑖 Available Information and set of all available information for player 𝑖 at stage 𝑘
𝜎𝑖𝑘 , Σ𝑘𝑖 Behavioral strategy and the set of all behavioral strategies for player 𝑖 at stage 𝑘
𝜎𝑖𝑘 (𝑎𝑘𝑖 |𝑙𝑖𝑘 ) Probability of player 𝑖 taking action 𝑎𝑘𝑖 at stage 𝑘 based on the available information 𝑙𝑖𝑘
$\sigma_i^{k_0:K}$ Player $i$'s behavioral strategies from stage $k_0$ to $K$
$\sigma_i^{*,k_0:K}$ ($\sigma_i^{*,K} := \sigma_i^{*,K:K}$) Player $i$'s behavioral strategies from stage $k_0$ to $K$ at the equilibrium
$b_i^k : L_i^k \mapsto \Delta(\Theta_j)$ Player $i$'s belief on the other player $j$'s type at stage $k$ based on the available information
$b_i^k(\theta_j | l_i^k)$ Probability of player $j$ being type $\theta_j$ when player $i$ observes information $l_i^k$ at stage $k$
$\bar{J}_i^k(x^k, a_1^k, a_2^k, \theta_1, \theta_2, w_i^k)$ Player $i$'s stage utility received at stage $k$ when the state is $x^k$, player $i$ takes action $a_i^k$, player $i$'s type is $\theta_i$, and the noise is $w_i^k$
𝐽𝑖𝑘 (𝑥𝑘 , 𝑎𝑘1 , 𝑎𝑘2 , 𝜃1 , 𝜃2 ) Player 𝑖’s expected stage utility received at stage 𝑘 with the input of 𝑥𝑘 , 𝑎𝑘1 , 𝑎𝑘2 , 𝜃1 , 𝜃2
$U_i^{k_0:K}(\sigma_i^{k_0:K}, \sigma_j^{k_0:K}, x^{k_0}, \theta_i)$ Player $i$'s expected cumulative utility received from stage $k_0$ to $K$ when the initial state is $x^{k_0}$, his/her type is $\theta_i$, and the multi-stage strategies of player $i$ are $\sigma_i^{k_0:K}$
$V_i^k(x^k, \theta_i)$ Player $i$'s value function at state $x^k$ when his/her type is $\theta_i$
Acronym Meaning
APT(s) Advanced persistent threat(s)
SBNE Static Bayesian Nash equilibrium
DBNE Dynamic Bayesian Nash equilibrium
PBNE Perfect Bayesian Nash equilibrium
Figure 2: A block diagram of applying the defense-in-depth approach against multi-stage APT attacks. We denote the
user, the defender, and the system states in red, blue, and black, respectively. The defender interacts with the user from
stage 0 to stage 𝐾 in sequence where the output state of stage 𝑘 − 1 becomes the input state of stage 𝑘. At each stage
𝑘, the user observes the defender’s actions at previous stages, forms a belief on the defender’s type, and takes an action.
At the same time, the defender makes decisions based on the output of an imperfect detection system. The dotted line
means that the observation is not in real time, i.e., both players can only observe the previous-stage actions of the other
player.
In the case study of Sections 5 and 6, the user's type $\theta_2$ is either adversarial $\theta_2^b$ or legitimate $\theta_2^g$. The APT attacker, i.e., the adversarial user,
disguises himself as a legitimate user, so the defender does not know the type of the user. The user's type set
can also be non-binary and incorporate different APT groups whose attack tools and targeted assets differ
(see FireEye (2017)).
The defender can also be classified into different levels of sophistication based on various factors such as her level
of security awareness, the detection techniques she has adopted, and the completeness of her virus signature database. The
discrete type 𝜃1 distinguishes defenders of different sophistication levels and all the possible type values constitute the
defender’s type set Θ1 . For example, in our case study, the defender’s type 𝜃1 is either sophisticated 𝜃1𝐻 or primitive
𝜃1𝐿 . The defender can apply defensive deception techniques and keep her type private to the user. We assume that both
players’ type sets are commonly known. Each player knows his/her own type, yet not the other player’s type. Thus,
each player 𝑖 should treat the other player’s type as a random variable with an initial distribution 𝑏0𝑖 and update the
distribution to 𝑏𝑘𝑖 when obtaining new information at each stage 𝑘. We present the above belief update formally in
Section 2.3.
For example, code obfuscation can be used legitimately to prevent reverse engineering or illegally to conceal malicious
JavaScript code from being recognized by signature-based detectors or human analysts, as shown in Nissim et al. (2015).
We assume that the user can observe the defender’s stage-𝑘 action at stage 𝑘 + 1. The observation of the defender’s
action at a single stage also does not reveal the defender’s type.
In this paper, each player obtains a one-stage delayed observation of the other player's actions, i.e., at each stage
$k$, the action history available to both players is $h^k = \{a_1^0, \cdots, a_1^{k-1}, a_2^0, \cdots, a_2^{k-1}\} \in H^k := \prod_{i=1}^{2} \prod_{\bar{k}=0}^{k-1} A_i^{\bar{k}}$. Given
history ℎ𝑘 at the current stage 𝑘, players at stage 𝑘 + 1 obtain an updated history ℎ𝑘+1 = ℎ𝑘 ∪ {𝑎𝑘1 , 𝑎𝑘2 } after the
observation of both players’ actions at stage 𝑘. At each stage 𝑘, we further define a state 𝑥𝑘 ∈ 𝑋 𝑘 which summarizes
information about both players’ actions in previous stages so that the initial state 𝑥0 ∈ 𝑋 0 and the history at stage 𝑘
uniquely determine 𝑥𝑘 through a known state transition function 𝑓 𝑘 , i.e., 𝑥𝑘+1 = 𝑓 𝑘 (𝑥𝑘 , 𝑎𝑘1 , 𝑎𝑘2 ), ∀𝑘 ∈ {0, 1, ⋯ , 𝐾 −1}.
States at different stages can have different meanings such as the reconnaissance outcome, the user’s location, the
privilege level, and the sensor status.
Here, player $i$ updates the belief $b_i^k$ based on the observation of the actions $a_i^k, a_j^k$. When the denominator is 0, the history
$h^{k+1}$ is not reachable from $h^k$, and the Bayesian update does not apply. In this case, we retain the previous belief, i.e.,
$b_i^{k+1}(\theta_j \mid h^k \cup \{a_i^k, a_j^k\}, \theta_i) := b_i^k(\theta_j \mid h^k, \theta_i)$.
where 𝐴̄ 𝑘 ∶= {𝑎𝑘1 ∈ 𝐴𝑘1 , 𝑎𝑘2 ∈ 𝐴𝑘2 |𝑥𝑘+1 = 𝑓 𝑘 (𝑥𝑘 , 𝑎𝑘1 , 𝑎𝑘2 )} contains all the action pairs that change the system state
from 𝑥𝑘 to 𝑥𝑘+1 . Equation (3) shows that the Bayesian update in (2) can be obtained from (1) by clustering all the
action pairs in set 𝐴̄ 𝑘 . Thus, the Markov belief update (2) can also be regarded as an approximation of (1) using action
aggregations. Unlike the history set 𝐻 𝑘 , the dimension of the state set, |𝑋 𝑘 |, does not grow with the number of stages.
Hence, the Markov approximation significantly reduces the memory and computational complexity. The following
sections adopt the Markov belief update.
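To make the Markov belief update concrete, the following is a minimal Python sketch with binary types and actions. The strategies, the prior belief, and the transition function are illustrative placeholders rather than the case-study values, and the unreachable-state fallback retains the prior belief, as discussed above.

```python
import numpy as np

types_j = [0, 1]                              # e.g., adversarial / legitimate
actions = [0, 1]

def f(x, a1, a2):
    # hypothetical transition: the state advances only if the two actions differ
    return x + 1 if a1 != a2 else x

def markov_update(b_prev, x, x_next, sigma_i, sigma_j):
    """Markov belief update: cluster all action pairs in A_bar^k that move the
    state from x to x_next, weight them by both players' mixed strategies,
    and renormalize the prior belief b_prev over player j's types."""
    joint = np.zeros(len(types_j))
    for tj in types_j:
        for a1 in actions:
            for a2 in actions:
                if f(x, a1, a2) == x_next:    # (a1, a2) is in A_bar^k
                    joint[tj] += sigma_i[a1] * sigma_j[tj][a2] * b_prev[tj]
    if joint.sum() == 0:                      # x_next unreachable from x:
        return b_prev                         # the Bayesian update does not apply
    return joint / joint.sum()

b0 = np.array([0.5, 0.5])                     # uniform prior over theta_j
sigma_i = np.array([0.7, 0.3])                # player i's own mixed strategy
sigma_j = {0: np.array([0.1, 0.9]),           # type 0 favors action 1
           1: np.array([0.6, 0.4])}           # type 1 favors action 0
print(markov_update(b0, x=0, x_next=1, sigma_i=sigma_i, sigma_j=sigma_j))
```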
$$U_i^{k_0:K}(\sigma_i^{k_0:K}, \sigma_j^{k_0:K}, x^{k_0}, \theta_i) := \sum_{k=k_0}^{K} \mathbb{E}_{\theta_j \sim b_i^k,\, a_i^k \sim \sigma_i^k,\, a_j^k \sim \sigma_j^k} \left[ J_i^k(x^k, a_1^k, a_2^k, \theta_1, \theta_2) \right] \qquad (4)$$
$$= \sum_{k=k_0}^{K} \sum_{\theta_j \in \Theta_j} b_i^k(\theta_j | x^k, \theta_i) \sum_{a_i^k \in A_i^k} \sigma_i^k(a_i^k | x^k, \theta_i) \sum_{a_j^k \in A_j^k} \sigma_j^k(a_j^k | x^k, \theta_j)\, J_i^k(x^k, a_1^k, a_2^k, \theta_1, \theta_2), \quad j \neq i.$$
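As a quick numerical check of (4), the following Monte Carlo sketch estimates the expected cumulative utility; for brevity, the beliefs and strategies here are indexed by stage only rather than by state and type, and the transition function f and stage utility J are placeholders to be supplied.

```python
import numpy as np
rng = np.random.default_rng(1)

def cumulative_utility(k0, K, x0, theta_i, sigma_i, sigma_j, b, f, J, n=10_000):
    """Monte Carlo estimate of U_i^{k0:K}: sample theta_j ~ b[k] and stage
    actions from the mixed strategies, then accumulate the stage utilities
    J along the simulated state path."""
    total = 0.0
    for _ in range(n):
        x = x0
        for k in range(k0, K + 1):
            theta_j = rng.choice(len(b[k]), p=b[k])          # theta_j ~ b_i^k
            a_i = rng.choice(len(sigma_i[k]), p=sigma_i[k])  # a_i ~ sigma_i^k
            a_j = rng.choice(len(sigma_j[k][theta_j]), p=sigma_j[k][theta_j])
            total += J(k, x, a_i, a_j, theta_i, theta_j)     # stage utility
            x = f(k, x, a_i, a_j)                            # state transition
    return total / n
```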
Definition 1. Consider the two-person $K$-stage game with double-sided incomplete information (i.e., each player's
type is not known to the other player), a sequence of beliefs $b_i^k, \forall k \in \{0, \cdots, K\}$, an expected cumulative utility $U_i^{0:K}$
in (4), and a given scalar $\varepsilon \ge 0$. A sequence of strategies $\sigma_i^{*,0:K} \in \prod_{k=0}^{K} \Sigma_i^k$ is called an $\varepsilon$-dynamic Bayesian Nash
equilibrium for player $i$ if condition (C2) is satisfied. If condition (C1) is also satisfied, $\sigma_i^{*,0:K}$ is further called an $\varepsilon$-
perfect Bayesian Nash equilibrium.
(C1) Belief consistency: under strategy pair (𝜎1∗,0∶𝐾 , 𝜎2∗,0∶𝐾 ), each player’s belief 𝑏𝑘𝑖 at each stage 𝑘 = 0, ⋯ , 𝐾
satisfies (2).
(C2) Sequential rationality: for any given initial state $x^{k_0} \in X^{k_0}$ at every initial stage $k_0 \in \{0, \cdots, K\}$,
When 𝜀 = 0, the two 𝜀-equilibria are called Dynamic Bayesian Nash Equilibrium (DBNE) and Perfect Bayesian Nash
Equilibrium (PBNE), respectively.
The belief consistency emphasizes that when strategic players make long-term decisions, they have to consider
the impact of their actions on their opponent’s beliefs at future stages. The PBNE is a refinement of the DBNE
with the additional requirement of the belief consistency property. When the horizon $K = 0$, the multi-stage game
of incomplete information defined in Section 2 degenerates to a one-stage (static) Bayesian game with the one-stage
belief pair $(b_1^K, b_2^K)$, and the solution concept of the DBNE/PBNE degenerates to the Static Bayesian Nash Equilibrium
(SBNE) in Definition 2.
The sequential rationality property in (5) guarantees that a unilateral deviation from the equilibrium at any state does
not benefit the deviating player. Thus, the equilibrium strategy is a reasonable prediction of both players' multi-
stage behaviors. DBNE strategies have the property of strong time consistency because (5) holds for any possible
initial state, even for states off the equilibrium path, i.e., states that would not be visited under the DBNE
strategies. Strong time consistency makes the DBNE adaptive to unexpected changes. Solutions obtained
by dynamic programming naturally satisfy strong time consistency. Hence, in the following, we introduce algorithms
based on dynamic programming techniques.
Define the value function $V_i^{k_0}(x^{k_0}, \theta_i) := U_i^{k_0:K}(\sigma_1^{*,k_0:K}, \sigma_2^{*,k_0:K}, x^{k_0}, \theta_i)$ as the utility-to-go from any initial stage
$k_0 \in \{0, \cdots, K\}$ under the DBNE strategy pair $(\sigma_1^{*,k_0:K}, \sigma_2^{*,k_0:K})$. Then, at the final stage $K$, the value function for
player $i \in \{1, 2\}$ with type $\theta_i$ at state $x^K$ is
$$V_i^K(x^K, \theta_i) = \sup_{\sigma_i^K \in \Sigma_i^K} \mathbb{E}_{\theta_j \sim b_i^K,\, a_i^K \sim \sigma_i^K,\, a_j^K \sim \sigma_j^{*,K}} \left[ J_i^K(x^K, a_1^K, a_2^K, \theta_1, \theta_2) \right]. \qquad (6)$$
For any feasible sequence of belief pairs $(b_1^k, b_2^k), k = 0, \cdots, K-1$, we have the following recursive equations
for player $i$ to find the equilibrium strategy pairs $(\sigma_1^{*,k}, \sigma_2^{*,k})$ backward from stage $K-1$ to the initial stage 0,
i.e., $\forall k \in \{0, \cdots, K-1\}, \forall i, j \in \{1, 2\}, j \ne i$,
$$V_i^k(x^k, \theta_i) = \sup_{\sigma_i^k \in \Sigma_i^k} \mathbb{E}_{\theta_j \sim b_i^k,\, a_i^k \sim \sigma_i^k,\, a_j^k \sim \sigma_j^{*,k}} \left[ V_i^{k+1}(f^k(x^k, a_1^k, a_2^k), \theta_i) + J_i^k(x^k, a_1^k, a_2^k, \theta_1, \theta_2) \right]. \qquad (7)$$
4. Computational Algorithms
In Section 4.1, we formulate a constrained optimization problem to compute the SBNE and $V_i^K$ for the one-stage game. In
Section 4.2, we use the proposed optimization problem as a building block to compute the DBNE and $V_i^k, \forall k \in \{0, \cdots, K-1\}$.
Finally, we propose an iterative algorithm to solve for the PBNE. Efficient algorithms for computing the PBNE lay a solid
foundation for quantifying the risk of cyber-physical attacks and guiding the design of proactive defense-in-depth strategies.
In Theorem 1, we propose a constrained optimization program $C^K$ to compute the SBNE. We suppress the stage superscript
$K$ without ambiguity in the one-stage game.
Theorem 1. A strategy pair (𝜎1∗ ∈ Σ1 , 𝜎2∗ ∈ Σ2 ) constitutes a SBNE to the one-stage bi-matrix Bayesian game (𝐽1 , 𝐽2 )
under private type 𝜃𝑖 ∈ Θ𝑖 , ∀𝑖 ∈ {1, 2}, belief 𝑏𝑖 , ∀𝑖 ∈ {1, 2}, and a given state 𝑥, if and only if the strategy pair is a
solution to 𝐶 𝐾 :
$$[C^K]: \max_{\sigma_1, \sigma_2, s_1, s_2} \sum_{\theta_1 \in \Theta_1} \alpha_1(\theta_1) s_1(x, \theta_1) + \sum_{\theta_2 \in \Theta_2} \alpha_2(\theta_2) s_2(x, \theta_2)$$
$$+ \sum_{\theta_1 \in \Theta_1} \alpha_1(\theta_1) \mathbb{E}_{\theta_2 \sim b_1,\, a_1 \sim \sigma_1,\, a_2 \sim \sigma_2}[J_1(x, a_1, a_2, \theta_1, \theta_2)]$$
$$+ \sum_{\theta_2 \in \Theta_2} \alpha_2(\theta_2) \mathbb{E}_{\theta_1 \sim b_2,\, a_1 \sim \sigma_1,\, a_2 \sim \sigma_2}[J_2(x, a_1, a_2, \theta_1, \theta_2)]$$
$$\text{s.t.} \quad (a)\; \mathbb{E}_{\theta_1 \sim b_2,\, a_1 \sim \sigma_1}[J_2(x, a_1, a_2, \theta_1, \theta_2)] \le -s_2(x, \theta_2), \;\forall \theta_2, \forall a_2,$$
$$(b)\; \sum_{a_1 \in A_1} \sigma_1(a_1 | x, \theta_1) = 1, \;\; \sigma_1(a_1 | x, \theta_1) \ge 0, \;\forall \theta_1,$$
$$(c)\; \mathbb{E}_{\theta_2 \sim b_1,\, a_2 \sim \sigma_2}[J_1(x, a_1, a_2, \theta_1, \theta_2)] \le -s_1(x, \theta_1), \;\forall \theta_1, \forall a_1,$$
$$(d)\; \sum_{a_2 \in A_2} \sigma_2(a_2 | x, \theta_2) = 1, \;\; \sigma_2(a_2 | x, \theta_2) \ge 0, \;\forall \theta_2.$$
The dimensions of decision variables 𝜎1 (𝑎1 |𝑥, 𝜃1 ), ∀𝜃1 ∈ Θ1 , and 𝜎2 (𝑎2 |𝑥, 𝜃2 ), ∀𝜃2 ∈ Θ2 , are |𝐴1 | × |Θ1 | and |𝐴2 | ×
|Θ2 |, respectively. Besides, 𝑠1 (𝑥, 𝜃1 ), ∀𝜃1 and 𝑠2 (𝑥, 𝜃2 ), ∀𝜃2 are scalar decision variables for each given 𝜃𝑖 , 𝑖 ∈ {1, 2}.
The non-decision variables 𝛼1 (𝜃1 ), ∀𝜃1 and 𝛼2 (𝜃2 ), ∀𝜃2 , can be any strictly positive and finite numbers. The solution
to 𝐶 𝐾 exists and is achieved at the equality of constraints (𝑎), (𝑐), i.e., 𝑠∗2 (𝑥, 𝜃2 ) = −𝑉2 (𝑥, 𝜃2 ), 𝑠∗1 (𝑥, 𝜃1 ) = −𝑉1 (𝑥, 𝜃1 ).
PROOF. The finiteness and discreteness of the action and type spaces guarantee the existence of the SBNE in mixed
strategies, as shown in Shoham and Leyton-Brown (2008), which further guarantees that program $C^K$ has solutions. To
show the equivalence between the solutions to $C^K$ and the SBNEs, we first show that every SBNE is a solution of
$C^K$. If $(\sigma_1^* \in \Sigma_1, \sigma_2^* \in \Sigma_2)$ is a SBNE pair, then the quadruple $\sigma_1^*(\theta_1), \sigma_2^*(\theta_2), s_2^*(x, \theta_2) = -V_2(x, \theta_2), s_1^*(x, \theta_1) =
-V_1(x, \theta_1), \forall \theta_i \in \Theta_i, \forall i \in \{1, 2\}$, is feasible because it satisfies constraints (a), (b), (c), (d). Constraints (a) and (c)
imply a non-positive objective function of $C^K$. Since the value of the objective function achieved under this quadruple
is 0, this quadruple is also optimal. Second, we show that the solution $\sigma_1^*(\theta_1), \sigma_2^*(\theta_2), s_2^*(x, \theta_2), s_1^*(x, \theta_1)$ of $C^K$ is a
SBNE. The solution of $C^K$ satisfies all the constraints, i.e.,
$$\mathbb{E}_{\theta_1 \sim b_2,\, a_1 \sim \sigma_1^*,\, a_2 \sim \sigma_2}[J_2(x, a_1, a_2, \theta_1, \theta_2)] \le -s_2^*(x, \theta_2), \quad \forall \theta_2, \forall \sigma_2 \in \Sigma_2, \qquad (9)$$
$$\mathbb{E}_{\theta_2 \sim b_1,\, a_1 \sim \sigma_1,\, a_2 \sim \sigma_2^*}[J_1(x, a_1, a_2, \theta_1, \theta_2)] \le -s_1^*(x, \theta_1), \quad \forall \theta_1, \forall \sigma_1 \in \Sigma_1.$$
In particular, if we pick $\sigma_i(\theta_i) = \sigma_i^*(\theta_i), \forall \theta_i, \forall i \in \{1, 2\}$, and combine the fact that the optimal value is achieved at
0, the inequalities become equalities and equation (9) becomes (8), which shows that $(\sigma_1^* \in \Sigma_1, \sigma_2^* \in \Sigma_2)$ is a
SBNE. □
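To illustrate the structure of program $C^K$, the following is a minimal sketch that solves a randomly generated one-stage bi-matrix Bayesian game with SciPy's SLSQP solver. The payoff tensors, beliefs, and weights are illustrative placeholders; note that SLSQP only finds local solutions of this nonconvex bilinear program, whereas the case study below uses YALMIP with the global solver BARON.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n1, n2, t1, t2 = 2, 2, 2, 2                 # |A1|, |A2|, |Theta_1|, |Theta_2|
J1 = rng.uniform(-1, 1, (t1, t2, n1, n2))   # J1[th1, th2, a1, a2]
J2 = rng.uniform(-1, 1, (t1, t2, n1, n2))
b1 = np.array([0.5, 0.5])                   # player 1's belief over theta_2
b2 = np.array([0.5, 0.5])                   # player 2's belief over theta_1
alpha = 1.0                                 # any strictly positive weight

def unpack(z):
    s1 = z[:t1 * n1].reshape(t1, n1)        # sigma_1(a1 | x, theta_1)
    s2 = z[t1 * n1:t1 * n1 + t2 * n2].reshape(t2, n2)
    return s1, s2, z[-t1 - t2:-t2], z[-t2:] # ..., s_1(x, .), s_2(x, .)

def exp_J1(s1, s2, th1):                    # E_{theta_2 ~ b1, a ~ sigma}[J1]
    return sum(b1[th2] * s1[th1] @ J1[th1, th2] @ s2[th2] for th2 in range(t2))

def exp_J2(s1, s2, th2):                    # E_{theta_1 ~ b2, a ~ sigma}[J2]
    return sum(b2[th1] * s1[th1] @ J2[th1, th2] @ s2[th2] for th1 in range(t1))

def neg_obj(z):                             # negated objective of C^K
    s1, s2, v1, v2 = unpack(z)
    val = alpha * (v1.sum() + v2.sum())
    val += alpha * sum(exp_J1(s1, s2, th1) for th1 in range(t1))
    val += alpha * sum(exp_J2(s1, s2, th2) for th2 in range(t2))
    return -val

cons = []
# (a): E_{theta_1~b2, a1~sigma_1}[J2(., a2)] <= -s_2(x, theta_2), all theta_2, a2
for th2 in range(t2):
    for a2 in range(n2):
        cons.append({'type': 'ineq', 'fun': lambda z, th2=th2, a2=a2:
            -unpack(z)[3][th2] - sum(b2[th1] * unpack(z)[0][th1]
                                     @ J2[th1, th2][:, a2] for th1 in range(t1))})
# (c): E_{theta_2~b1, a2~sigma_2}[J1(a1, .)] <= -s_1(x, theta_1), all theta_1, a1
for th1 in range(t1):
    for a1 in range(n1):
        cons.append({'type': 'ineq', 'fun': lambda z, th1=th1, a1=a1:
            -unpack(z)[2][th1] - sum(b1[th2] * J1[th1, th2][a1, :]
                                     @ unpack(z)[1][th2] for th2 in range(t2))})
# (b), (d): each sigma_i(. | x, theta_i) lies on the probability simplex
for th1 in range(t1):
    cons.append({'type': 'eq', 'fun': lambda z, th1=th1: unpack(z)[0][th1].sum() - 1})
for th2 in range(t2):
    cons.append({'type': 'eq', 'fun': lambda z, th2=th2: unpack(z)[1][th2].sum() - 1})

bounds = [(0, 1)] * (t1 * n1 + t2 * n2) + [(None, None)] * (t1 + t2)
z0 = np.concatenate([np.full(t1 * n1 + t2 * n2, 0.5), np.zeros(t1 + t2)])
res = minimize(neg_obj, z0, method='SLSQP', bounds=bounds, constraints=cons)
sigma1, sigma2, _, _ = unpack(res.x)
print("objective:", -res.fun)               # approximately 0 at a SBNE
print("sigma_1:\n", np.round(sigma1, 3), "\nsigma_2:\n", np.round(sigma2, 3))
```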
Theorem 1 focuses on the double-sided Bayesian game where each player $i$ has a private type $\theta_i \in \Theta_i$. To
accommodate the one-sided Bayesian game where player $i$'s type $\theta_i \in \Theta_i$ is known by both players and player $j$'s type
remains unknown to player $i$, we can modify program $C^K$ by letting $\alpha_i(\theta_i) > 0$ and $\alpha_i(\tilde{\theta}_i) = 0, \forall \tilde{\theta}_i \in \Theta_i \setminus \{\theta_i\}$.
$$[C^k]: \max_{\sigma_1^k, \sigma_2^k, s_1^k, s_2^k} \sum_{i=1}^{2} \sum_{\theta_i \in \Theta_i} \alpha_i(\theta_i) \Big\{ s_i^k(x^k, \theta_i) + \sum_{\theta_j \in \Theta_j} b_i^k(\theta_j | x^k, \theta_i) \sum_{a_1^k \in A_1^k} \sigma_1^k(a_1^k | x^k, \theta_1) \sum_{a_2^k \in A_2^k} \sigma_2^k(a_2^k | x^k, \theta_2)$$
$$\cdot \big[ J_i^k(x^k, a_1^k, a_2^k, \theta_1, \theta_2) + V_i^{k+1}(f^k(x^k, a_1^k, a_2^k), \theta_i) \big] \Big\},$$
subject to the stage-$k$ analogs of the constraints in $C^K$, with the stage utility $J_i$ replaced by $J_i^k + V_i^{k+1}(f^k(\cdot), \theta_i)$.
Similarly, $\alpha_1(\theta_1), \alpha_2(\theta_2)$ can be any strictly positive and finite numbers, and $(s_1^k(x^k, \theta_1), s_2^k(x^k, \theta_2))$ is a sequence of
scalar variables for each $x^k \in X^k, \theta_i \in \Theta_i, i \in \{1, 2\}$. The optimum exists and is achieved at the equality of constraints
(a), (b), i.e., $s_i^{*,k}(x^k, \theta_i) = -V_i^k(x^k, \theta_i), \forall \theta_i \in \Theta_i, \forall i \in \{1, 2\}$.
The proof is similar to the one for Theorem 1. The decision variables 𝜎𝑖𝑘 are of size |𝐴𝑘𝑖 | × |𝑋 𝑘 | × |Θ𝑖 |. By letting
stage 𝑘 = 𝐾 and 𝑉𝑖𝐾+1 = 0, program 𝐶 𝐾 for the static Bayesian game is a special case of 𝐶 𝑘 for the multi-stage
Bayesian game. We can solve program 𝐶 𝑘+1 to obtain the DBNE strategy pair (𝜎1𝑘+1 , 𝜎2𝑘+1 ) and the value of 𝑉𝑖𝑘+1 .
Then, we apply 𝑉𝑖𝑘+1 in program 𝐶 𝑘 to obtain a DBNE strategy pair (𝜎1𝑘 , 𝜎2𝑘 ) and the value of 𝑉𝑖𝑘 . Thus, for any given
sequences of type belief pairs 𝑏𝑘𝑖 , ∀𝑖 ∈ {1, 2}, ∀𝑘 ∈ {0, 1, ⋯ , 𝐾}, we can solve 𝐶 𝑘 from 𝑘 = 𝐾 to 𝑘 = 0 recursively
to obtain the DBNE pair (𝜎1∗,0∶𝐾−1 , 𝜎2∗,0∶𝐾−1 ).
4.2.1. PBNE
Given a sequence of beliefs, we can obtain the corresponding DBNE via 𝐶 𝑘 in a backward fashion. However,
given a sequence of policies, both players forwardly update their beliefs at each stage by (2). Thus, we need to find a
consistent pair of belief and policy sequences as required by the PBNE. As summarized in Algorithm 1, we iteratively
alternate between the forward belief update and the backward policy computation to find the PBNE. We resort to
𝜀-PBNE solutions when the existence of PBNE is not guaranteed.
Algorithm 1 provides a computational approach to find 𝜀-PBNE with the following procedure. First, both players
initialize their beliefs 𝑏𝑘𝑖 for every state 𝑥𝑘 at stage 𝑘 ∈ {0, 1, ⋯ , 𝐾}, according to their types. Then, they compute
the DBNE strategy pair 𝜎𝑖∗,0∶𝐾 , ∀𝑖 ∈ {1, 2}, under the given belief sequence at each stage by solving program 𝐶 𝑘
from stage 𝐾 to stage 0 in sequence. Next, they update their beliefs at each stage according to the strategy pair
𝜎𝑖∗,0∶𝐾−1 , ∀𝑖 ∈ {1, 2}, via the Bayesian update (2). If the strategy pair 𝜎𝑖∗,0∶𝐾−1 , ∀𝑖 ∈ {1, 2}, satisfies (5) under the
updated belief, we find the 𝜀-PBNE and terminate the iteration. Otherwise, we repeat the backward policy computation
in step two and the forward belief update in step three.
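The following Python skeleton mirrors this procedure; solve_Ck and bayes_update are placeholders standing in for program $C^k$ of Section 4.2 and the update (2), beliefs are assumed to be stored as stage-indexed NumPy arrays, and consistency is checked here by belief convergence, a simplification of the check of condition (5). The authors' implementation instead uses MATLAB/YALMIP with the BARON solver.

```python
import numpy as np

def solve_pbne(K, init_beliefs, solve_Ck, bayes_update, eps=1e-4, max_iter=50):
    beliefs = list(init_beliefs)                # beliefs[k]: array over theta_j
    for _ in range(max_iter):
        # backward pass: DBNE policies and values from stage K down to stage 0
        policies, V = [None] * (K + 1), [None] * (K + 2)
        V[K + 1] = 0.0                          # terminal value V_i^{K+1} := 0
        for k in range(K, -1, -1):
            policies[k], V[k] = solve_Ck(k, beliefs[k], V[k + 1])
        # forward pass: update the beliefs stage by stage under the new policies
        new_beliefs, consistent = [beliefs[0]], True
        for k in range(K):
            new_beliefs.append(bayes_update(k, new_beliefs[k], policies[k]))
            if np.max(np.abs(new_beliefs[k + 1] - beliefs[k + 1])) > eps:
                consistent = False
        beliefs = new_beliefs
        if consistent:                          # beliefs and policies agree:
            return policies, beliefs            # an epsilon-PBNE candidate
    return policies, beliefs                    # best iterate if not converged
```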
5. Case Study
The model presented in Section 2 can be applied to various APT scenarios. To illustrate the framework, this
section presents a specific attack scenario where the attacker stealthily initiates infection and escalates privileges in the
cyber network, aiming to launch attacks on the physical plant as shown in Fig. 3. Three vertical columns in the left
block illustrate the state transitions across three stages: the initial compromise, the privilege escalation, and the sensor
compromise of a physical system. The red squares at each column represent possible states at that stage. The right
block illustrates a simplified flow chart of the Tennessee Eastman Process. We use the Tennessee Eastman process as
a benchmark of industrial control systems to show that attackers can strategically compromise the SCADA system and
decrease the operational efficiency of a physical plant without triggering the alarm.
In this case study, we adopt the binary type space Θ2 = {𝜃2𝑏 , 𝜃2𝑔 } and Θ1 = {𝜃1𝐻 , 𝜃1𝐿 } for the user and the defender,
respectively. In particular, 𝜃2𝑏 and 𝜃2𝑔 denote the adversarial and legitimate user, respectively; 𝜃1𝐻 and 𝜃1𝐿 denote the
sophisticated and primitive defender, respectively. The bi-matrices in Table 2, 3, and 4 represent both players’ expected
utilities at three stages, respectively. In these matrices, the defender is the row player and the user is the column player.
Each entry of the matrix corresponds to the players' payoffs under their action pairs, types, and the state. In particular, the
pair in red before the semicolon gives the payoffs of the defender and the user, respectively, under type $\theta_2^b$, while the
pair in blue after the semicolon gives the payoffs of the defender and the user, respectively, under type $\theta_2^g$.
[Figure 3 residue: the left block shows the cyber states across the initial, intermediate, and final stages; the right block shows the Tennessee Eastman plant with reactor, compressor, condenser, separator, stripper, sensors, and controller.]
Figure 3: The diagram of the cyber state transition (denoted by the left block in orange) and the physical attack on
Tennessee Eastman process via the compromise of the SCADA system (denoted by the right block in blue). APTs
can damage the normal industrial operation by falsifying controllers’ setpoints, tampering sensor readings, and blocking
communication channels to cause delays in either the control message or the sensing data.
Table 2
The expected utilities of the defender and the user at the initial stage, i.e., 𝐽10 and 𝐽20 , respectively.
| 𝜃2𝑏 ; 𝜃2𝑔 | Email Employees | Email Managers | Email Avatars |
|---|---|---|---|
| No Training | (−𝑟02, 𝑟02); (0, 𝑟01) | (−𝑟02, 𝑟02); (0, 𝑟01) | (0, 𝑟0𝑏,𝑓); (0, 𝑟0𝑔,𝑓) |
| Train Employees | (−𝑐0, −𝑟0); (−𝑐0, 𝑟01) | (−𝑐0, 𝑟02); (−𝑐0, 𝑟01) | (−𝑐0, 𝑟0𝑏,𝑓); (−𝑐0, 𝑟0𝑔,𝑓) |
| Train Managers | (−𝑐0, 𝑟02); (−𝑐0, 𝑟01) | (−𝑐0, −𝑟0); (−𝑐0, 𝑟01) | (−𝑐0, 𝑟0𝑏,𝑓); (−𝑐0, 𝑟0𝑔,𝑓) |
Table 3
The expected utilities of the defender and the user at the intermediate stage, i.e., 𝐽11 and 𝐽21 , respectively.
$$U_{TE} = R_p \times Q_p \times P_G - C_o. \qquad (10)$$
[Figure 4 residue: two panels of utility ($, ×10⁵) versus time (hrs); the upper panel compares the normal operation with the constant-reading and twofold-reading attacks, and the lower panel shows the sensor compromise in loops 8 and 13 and a composition attack.]
Figure 4: The economic impact of sensor compromise in the Tennessee Eastman process. The black line represents the
utility of Tennessee Eastman process under the normal operation while the other four lines represent the utility of Tennessee
Eastman process under attacks with four possible privilege levels. We use the time average of these utilities to obtain
the normal operational utility 𝑟24 and compromised utilities 𝑟21 (𝑥2 ), ∀𝑥2 ∈ {0, 1, 2, 3}, under four different states of privilege
levels in Table 4.
When the attacker loses access to XMEAS(40) at the 6th hour, the system is sufficiently resilient to recover partially in about
16 hours and achieve the same level of utility as under the single attack in green. When the attacker also loses access to
XMEAS(17) at the 36th hour, the utility returns to normal in about 13 hours.
Table 4
The expected utilities of the defender and the user at the final stage, i.e., 𝐽12 and 𝐽22 , respectively.
| 𝜃2𝑏 ; 𝜃2𝑔 | Unencrypted Command (UC) | Encrypted Command (EC) |
|---|---|---|
| Selective Monitoring (SM) | (𝑟24, 0); (𝑟24, 𝑟24∕2) | (𝑟21(𝑥2), 𝑟24 − 𝑟21(𝑥2)); (𝑟24, 𝑟24) |
| Complete Monitoring (CM) | (𝑟24 − 𝑐2, 0); (𝑟24 − 𝑐2, 𝑟24∕2) | (𝑟2 − 𝑐2, −𝑟2); (𝑟24 − 𝑐2, 𝑟24) |
The complete monitoring collects all communication data and analyzes it thoroughly to identify malicious commands
despite encryption, whereas the selective monitoring cannot identify malicious commands if they are encrypted. The
implementation of the complete monitoring incurs an additional cost $c^2$ compared to the selective one. The
last-stage utility matrix of both players is defined in Table
4. If the user is legitimate, as denoted in blue, both the defender and the user receive a reward of $r_2^4$ when the
Tennessee Eastman process operates normally. Legitimate users further incur a utility reduction of $r_2^4/2$ for the
potential privacy loss if they choose unencrypted commands. Adversarial users send malicious commands
only when the communication is encrypted, to evade detection. Thus, if they choose not to encrypt the communication,
they receive 0 utility and the defender receives a reward of $r_2^4$ for the normal operation. However, if they choose to
send encrypted malicious commands, both players’ rewards depend on whether the defender chooses the selective
or complete monitoring. If the defender chooses the selective monitoring, then the adversarial user can successfully
compromise the sensor, which results in a reduced utility of 𝑟21 (𝑥2 ). In the meantime, the attacker benefits from the
reward reduction of 𝑟24 − 𝑟21 (𝑥2 ). If the defender chooses the complete monitoring, then the adversarial user suffers a
loss of $r^2$ for being detected. The detection reward and the implementation cost for the two types of defenders are $r_L^2, r_H^2$
and $c_L^2, c_H^2$, respectively. Let $r^2 := r_L^2 \cdot \mathbf{1}_{\{\theta_1 = \theta_1^L\}} + r_H^2 \cdot \mathbf{1}_{\{\theta_1 = \theta_1^H\}}$ and $c^2 := c_L^2 \cdot \mathbf{1}_{\{\theta_1 = \theta_1^L\}} + c_H^2 \cdot \mathbf{1}_{\{\theta_1 = \theta_1^H\}}$.
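For concreteness, the following sketch assembles the final-stage bi-matrices of Table 4 in NumPy for one defender type; the numeric values are illustrative placeholders, not the calibrated rewards of the case study.

```python
import numpy as np

r24 = 3.2e5        # normal operational utility r_2^4 (placeholder value)
r21 = 1.5e5        # compromised utility r_2^1(x^2) at one state (placeholder)
r2, c2 = 5e4, 2e4  # detection reward r^2 and monitoring cost c^2 for one type

# J[d, u] = (defender payoff, user payoff); rows: SM, CM; columns: UC, EC
J_adversarial = np.array([
    [(r24, 0.0),      (r21, r24 - r21)],   # SM: an encrypted attack succeeds
    [(r24 - c2, 0.0), (r2 - c2, -r2)],     # CM: an encrypted attack is detected
])
J_legitimate = np.array([
    [(r24, r24 / 2),      (r24, r24)],     # UC halves the legitimate user's
    [(r24 - c2, r24 / 2), (r24 - c2, r24)] # utility due to the privacy loss
])
```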
6. Computation Results
In this section, we apply the algorithms introduced in Section 4 to compute both players’ strategies and utilities at
the equilibrium. We implement our algorithms in MATLAB and use YALMIP (see Löfberg (2004)) as the interface
to call external solvers such as BARON (see Tawarmalani and Sahinidis (2005)) to solve the optimization problems.
We present detailed results from the concrete case study and provide meaningful insights into the proactive cross-layer
defense against multi-stage APT attacks that are stealthy and deceptive.
For the static Bayesian game at the final stage in Section 6.1, we focus on illustrating how two players’ private types
affect their policies and utilities under different information structures. We further apply sensitivity analysis to show
how the value of the key parameter affects the defender’s and the attacker’s utilities. For the multi-stage Bayesian game
in Section 6.2, we focus on the dynamics of the belief update and the state transition under the interaction of the stealthy attacker
and the proactive defender. Moreover, we investigate how the adversarial and defensive deceptions and the initial
state affect the stage utility and the cumulative utility of the user and the defender.
[Figure 5 residue: two panels (a) and (b); in each, the upper plot shows the probability of UC/SM and the lower plot shows the expected utility (×10⁵), both versus the probability of the adversarial user, with curves for the defender, the legitimate user, and the adversarial user.]
(a) The user knows that the defender is primitive, yet the defender only knows the probability of the user being adversarial. (b) Both players' types are private, and each player only knows the probability of the other player's type.
Figure 5: The SBNE strategy and the expected utility of the primitive defender and the user who is either legitimate or
adversarial. The 𝑥-axis represents the probability of the user being adversarial. The 𝑦-axis of the upper figure represents
the probability of either the defender taking action 'selective monitoring (SM)' or the user taking action 'unencrypted
command (UC)'.
[Figure 6 residue: the upper plot shows the probability of UC/SM and the lower plot shows the expected utility (×10⁵), both versus the probability of the sophisticated defender.]
Figure 6: The SBNE strategy and the expected utility of the adversarial user and the defender who is either primitive
or sophisticated. The defender knows that the user is adversarial while the adversarial user only knows the probability of
the defender being primitive. The 𝑥-axis represents the probability of the defender being sophisticated. The 𝑦-axis of the
upper figure represents the probability of either the defender taking action 'selective monitoring (SM)' or the user taking
action 'unencrypted command (UC)'.
Fig. 6 shows that the defender benefits from introducing defensive deception. When the defender becomes more
likely to be a sophisticated one, both types of defenders can apply the selective monitoring with a higher probability and
save the extra surveillance cost of the complete monitoring. The attacker with incomplete information has a threshold
policy and switches to a lower attacking probability after the belief reaches the threshold of 0.5, as shown by the black line.
When the probability goes beyond the threshold, the primitive defender can pretend to be a sophisticated one and
take action 'selective monitoring'. Meanwhile, a sophisticated defender can reduce the security effort and take action
Figure 7: Utilities of the primitive defender and the attacker versus the value of 𝑟2𝐿 under different states 𝑥2 ∈ {0, 1, 2, 3}.
‘selective monitoring’ with a higher probability since the attacker becomes more cautious in taking adversarial actions
after identifying the defender as more likely to be sophisticated. It is also observed that the sophisticated defender
receives a higher payoff before the attacker’s belief reaches the 0.5 threshold. After the belief reaches the threshold,
the attacker is deterred into taking less aggressive actions, and both types of defenders share the same payoff.
Finally, we consider the double-sided incomplete information where both players’ types are private information,
and each player only has the belief of the other player’s type. Compared with the defender in Fig. 5a who takes action
'selective monitoring' with a probability less than 0.5 and receives a decreasing expected payoff, the defender in Fig.
5b can take 'selective monitoring' with a probability close to 1 and receive a constant payoff in expectation after
the user’s belief exceeds the threshold. Thus, the defender can spare defense efforts and mitigate risks by introducing
uncertainties on her type as a countermeasure to the adversarial deception.
Figure 8: The defender’s prior and posterior beliefs of the user being adversarial.
[Figure 9 residue: state probabilities versus the prior belief of the user being adversarial.]
[Figure 10 residue: expected utilities (×10⁵) versus the belief of the adversarial user under deception.]
The results from Fig. 11 are summarized as follows. First, the sophisticated defender's payoffs can be as much as 56%
higher than those of the primitive defender. Also, preventing effectual reconnaissance increases the defender's
utility by as much as 41% and reduces the attacker's utility by as much as 38%. Second, the defender and the attacker
receive the highest and the lowest payoff, respectively, under complete information. When the attacker introduces
deceptions over his type, the attacker's utility increases and the defender's utility decreases. Third, when the defender
adopts defensive deceptions to introduce double-sided incomplete information, the decrease of the sophisticated
defender's utility shrinks from $55,570 to $35,570 (i.e., to 64% of its value) when the reconnaissance
is effectual. The double-sided incomplete information also brings lower utilities to the attacker than the one-sided
adversarial deception. However, the defender's utility under the double-sided deception is still less than in the complete
information case, which shows that acquiring complete information about the adversarial user is the most effective
defense. If complete information cannot be obtained, the defender can mitigate her loss by introducing
defensive deceptions.
[Figure 11 residue: grouped bar charts of the defender's and the attacker's cumulative utilities ($) under ineffectual and effectual reconnaissance, with legend entries Complete Information, Deception, and Double Deception, each for the H-type and the L-type defender.]
Figure 11: The cumulative utilities of the attacker and the defender under the complete information, the adversarial
deception, and the defensive deception. In the legend, the left three represent the utilities for a sophisticated defender and
the right three represent the ones for a primitive defender.
The attacker receives a higher payoff after introducing the adversarial deception, as it increases the defender's uncertainty
about the user's type. On the other hand, by creating uncertainties for attackers, the defender can successfully deter
them into more conservative behaviors and make them less motivated to launch attacks. The results show that the
defender can significantly benefit from the mitigation of attack losses when she adopts defensive deceptions.
The main challenge of our approach is to identify the utility and feasible actions of defenders and users at each stage.
One future direction to reduce the complexity of the model description is to develop mechanisms that can automate
the synthesis of verifiably correct game-theoretic models. It would alleviate the workload of the system defender and
operator. Nevertheless, compared to rule-based and machine-learning-based defense methods, game theory provides a
quantitative and explainable framework to design proactive defensive responses under uncertainty. The rule-based
defense is static, so an attacker can circumvent it with sufficient effort, while machine learning methods require large
labeled data sets, which may be hard to obtain in the APT scenario. Second, we have proposed
the belief to quantify the uncertainty which results from players’ private types. The belief is continuously updated to
reduce uncertainties and provide a probabilistic detection system as a byproduct of the APT response design. Third,
our approach enables the defender to evaluate the multi-stage impact of her defense strategies on both legitimate and
adversarial users when adversarial and defensive deceptions are present at the same time. Based on the evaluation, defend-
ers can further find revised countermeasures and design new game rules to achieve a better tradeoff between security
and usability. Our model can be broadly applied to scenarios in artificial intelligence, economics, and social science
where multi-stage interactions occur between multiple agents with incomplete information. Multi-sided non-binary
types can be defined based on the scenario, and our iteration algorithm of the forward belief update and the backward
policy computation can be extended for efficient computations of the perfect Bayesian Nash equilibrium. The future
work would extend the framework to an 𝑁-person game to characterize the simultaneous interactions among multiple
users and model composition attacks. We would also consider scenarios where players’ actions and the system state
are partially observable.
References
Bathelt, A., Ricker, N.L., Jelali, M., 2015. Revision of the Tennessee Eastman process model. IFAC-PapersOnLine 48, 309 – 314. doi:https:
//doi.org/10.1016/j.ifacol.2015.08.199. 9th IFAC Symposium on Advanced Control of Chemical Processes ADCHEM 2015.
Cárdenas, A.A., Amin, S., Lin, Z.S., Huang, Y.L., Huang, C.Y., Sastry, S., 2011. Attacks against process control systems: risk assessment, detection,
and response, in: Proceedings of the 6th ACM symposium on information, computer and communications security, ACM. pp. 355–366.
Corporation, T.M., 2019. Enterprise matrix. URL: https://attack.mitre.org/matrices/enterprise/.
Dufresne, M., 2018. Putting the MITRE ATT&CK evaluation into context. URL: https://www.endgame.com/blog/technical-blog/
putting-mitre-attck-evaluation-context.
Feng, X., Zheng, Z., Hu, P., Cansever, D., Mohapatra, P., 2015. Stealthy attacks meets insider threats: a three-player game model, in: MILCOM
2015-2015 IEEE Military Communications Conference, IEEE. pp. 25–30.
FireEye, 2017. Advanced Persistent Threat Groups | FireEye. URL: https://www.fireeye.com/current-threats/apt-groups.html.
Friedberg, I., Skopik, F., Settanni, G., Fiedler, R., 2015. Combating advanced persistent threats: From network event correlation to incident
detection. Computers & Security 48, 35–57.
Ghafir, I., Hammoudeh, M., Prenosil, V., Han, L., Hegarty, R., Rabie, K., Aparicio-Navarro, F.J., 2018. Detection of advanced persistent threat
using machine-learning correlation analysis. Future Generation Computer Systems 89, 349–359.
Ghafir, I., Kyriakopoulos, K.G., Lambotharan, S., Aparicio-Navarro, F.J., AsSadhan, B., BinSalleeh, H., Diab, D.M., 2019. Hidden markov models
and alert correlations for the prediction of advanced persistent threats. IEEE Access 7, 99508–99520.
Ghafir, I., Prenosil, V., Hammoudeh, M., Han, L., Raza, U., 2017. Malicious ssl certificate detection: A step towards advanced persistent threat
defence, in: Proceedings of the International Conference on Future Networks and Distributed Systems, ACM. p. 27.
Harsanyi, J.C., 1967. Games with incomplete information played by "Bayesian" players, I–III. Part I. The basic model. Management Science 14,
159–182.
Department of Homeland Security, 2018. NSA/CSS Technical Cyber Threat Framework v2. Technical Report, Cybersecurity Operations, The
Cybersecurity Products and Sharing Division. URL: https://www.nsa.gov/Portals/70/
documents/what-we-do/cybersecurity/professional-resources/ctr-nsa-css-technical-cyber-threat-framework.pdf.
Horák, K., Zhu, Q., Bošanskỳ, B., 2017. Manipulating adversary's belief: A dynamic game approach to deception by design for proactive
network security, in: International Conference on Decision and Game Theory for Security, Springer. pp. 273–294.
Huang, L., Chen, J., Zhu, Q., 2017. A large-scale markov game approach to dynamic protection of interdependent infrastructure networks, in:
International Conference on Decision and Game Theory for Security, Springer. pp. 357–376.
Huang, L., Zhu, Q., 2018. Analysis and computation of adaptive defense strategies against advanced persistent threats for cyber-physical systems,
in: International Conference on Decision and Game Theory for Security, Springer. pp. 205–226.
Huang, L., Zhu, Q., 2019a. Adaptive honeypot engagement through reinforcement learning of semi-markov decision processes. CoRR
abs/1906.12182. URL: http://arxiv.org/abs/1906.12182, arXiv:1906.12182.
Huang, L., Zhu, Q., 2019b. Adaptive strategic cyber defense for advanced persistent threats in critical infrastructure networks. ACM SIGMETRICS
Performance Evaluation Review 46, 52–56.
Hutchins, E.M., Cloppert, M.J., Amin, R.M., 2011. Intelligence-driven computer network defense informed by analysis of adversary campaigns
and intrusion kill chains. Leading Issues in Information Warfare & Security Research 1, 80.
Krotofil, M., Cárdenas, A.A., 2013. Resilience of process control systems to cyber-physical attacks, in: Nordic Conference on Secure IT Systems,
Springer. pp. 166–182.
La, Q.D., Quek, T.Q., Lee, J., Jin, S., Zhu, H., 2016. Deceptive attack and defense game in honeypot-enabled networks for the internet of things.
IEEE Internet of Things Journal 3, 1025–1035.
Li, P., Yang, X., Xiong, Q., Wen, J., Tang, Y.Y., 2018. Defending against the advanced persistent threat: An optimal control approach. Security and
Communication Networks 2018.
LLC, P.I., 2018. 2018 Cost of a Data Breach Study.
Löfberg, J., 2004. Yalmip : A toolbox for modeling and optimization in MATLAB, in: In Proceedings of the CACSD Conference, Taipei, Taiwan.
Marchetti, M., Pierazzi, F., Colajanni, M., Guido, A., 2016. Analysis of high volumes of network traffic for advanced persistent threat detection.
Computer Networks 109, 127–141.
Messaoud, B.I., Guennoun, K., Wahbi, M., Sadik, M., 2016. Advanced persistent threat: New analysis driven by life cycle phases and their
challenges, in: 2016 International Conference on Advanced Communication Systems and Information Security (ACOSIS), IEEE. pp. 1–6.
Milajerdi, S.M., Kharrazi, M., 2015. A composite-metric based path selection technique for the tor anonymity network. Journal of Systems and
Software 103, 53–61.
Mitnick, K.D., Simon, W.L., 2011. The art of deception: Controlling the human element of security. John Wiley & Sons.
Molok, N.N.A., Chang, S., Ahmad, A., 2010. Information leakage through online social networking: Opening the doorway for advanced persistence
threats .
Morris, T.H., Gao, W., 2013. Industrial control system cyber attacks, in: Proceedings of the 1st International Symposium on ICS & SCADA Cyber
Security Research, pp. 22–29.
Nguyen, T.H., Wang, Y., Sinha, A., Wellman, M.P., . Deception in finitely repeated security games.
Nissim, N., Cohen, A., Glezer, C., Elovici, Y., 2015. Detection of malicious pdf files and directions for enhancements: A state-of-the art survey.
Computers & Security 48, 246–266.
Pawlick, J., Chen, J., Zhu, Q., 2018. istrict: An interdependent strategic trust mechanism for the cloud-enabled internet of controlled things. arXiv
preprint arXiv:1805.00403 .
Pawlick, J., Colbert, E., Zhu, Q., 2017. A game-theoretic taxonomy and survey of defensive deception for cybersecurity and privacy. arXiv preprint
arXiv:1712.05441 .
Ricker, N.L., 1996. Decentralized control of the tennessee eastman challenge process. Journal of Process Control 6, 205–221.
Rowe, N.C., Custy, E.J., Duong, B.T., 2007. Defending cyberspace with fake honeypots .
Sahoo, D., Liu, C., Hoi, S.C., 2017. Malicious url detection using machine learning: a survey. arXiv preprint arXiv:1701.07179 .
Shoham, Y., Leyton-Brown, K., 2008. Multiagent systems: Algorithmic, game-theoretic, and logical foundations. Cambridge University Press.
Sigholm, J., Bang, M., 2013. Towards offensive cyber counterintelligence: Adopting a target-centric view on advanced persistent threats, in: 2013
European Intelligence and Security Informatics Conference, IEEE. pp. 166–171.
Tawarmalani, M., Sahinidis, N.V., 2005. A polyhedral branch-and-cut approach to global optimization. Mathematical Programming 103, 225–249.
Team, M.D.A.R., 2017. Detecting stealthier cross-process injection techniques with windows defender atp: Process hollowing and atom bombing.
URL: https://bit.ly/2nVWDQd.
Van Dijk, M., Juels, A., Oprea, A., Rivest, R.L., 2013. Flipit: The game of “stealthy takeover”. Journal of Cryptology 26, 655–713.
Yang, L.X., Li, P., Zhang, Y., Yang, X., Xiang, Y., Zhou, W., 2018. Effective repair strategy against advanced persistent threat: A differential game
approach. IEEE Transactions on Information Forensics and Security 14, 1713–1728.
Zhang, M., Zheng, Z., Shroff, N.B., 2015. A game theoretic model for defending against stealthy attacks with limited resources, in: International
Conference on Decision and Game Theory for Security, Springer. pp. 93–112.
Zhu, Q., Rass, S., 2018. On multi-phase and multi-stage game-theoretic modeling of advanced persistent threats. IEEE Access 6, 13958–13971.