Cloud Storage Defense Against Advanced Persistent Threats: A Prospect Theoretic Study
Abstract—Cloud storage is vulnerable to Advanced Persistent Threats (APTs), in which an attacker launches stealthy, continuous, well-funded and targeted attacks on storage devices. In this paper, we apply prospect theory (PT) to formulate the interaction between the defender of a cloud storage system and an APT attacker who makes subjective decisions that sometimes deviate from the results of expected utility theory, which is the basis of traditional game theory. In the PT-based cloud storage defense game with pure-strategy, the defender chooses a scan interval for each storage device and the subjective APT attacker chooses his or her attack interval against each device. A mixed-strategy subjective storage defense game is also investigated, in which each of the subjective defender and APT attacker acts under uncertainty about the action of its opponent. The Nash equilibria (NEs) of both games are derived, showing that the subjective view of an APT attacker can improve the utility of the defender. A Q-learning based APT defense scheme is proposed that the storage defender can apply without being aware of the APT attack model and the subjectivity model of the attacker in the dynamic APT defense game. Simulation results show that the proposed defense scheme suppresses the attack motivation of subjective APT attackers and improves the utility of the defender, compared with the benchmark greedy defense strategy.

Index Terms—Cloud storage, advanced persistent threat, game theory, prospect theory, Q-learning.

This manuscript was received on April 30, 2016, and revised on September 21, 2016.
Liang Xiao, Dongjin Xu, and Caixia Xie are with the Department of Communication Engineering, Xiamen University, Xiamen, China. Liang Xiao is also with the Key Laboratory of Underwater Acoustic Communication and Marine Information Technology, Ministry of Education, Xiamen University, Xiamen, China, and the Beijing Key Laboratory of IOT Information Security Technology, Institute of Information Engineering, CAS, China (email: [email protected]).
Narayan B. Mandayam is with the Wireless Information Network Laboratory, Department of Electrical and Computer Engineering, Rutgers University, New Brunswick, NJ 08816 USA (e-mail: [email protected]).
H. Vincent Poor is with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA (email: [email protected]).
This research was supported in part by NSFC (61671396, 61271242), in part by the U.S. National Science Foundation under Grants CMMI-1435778, ECCS-1549881, CNS-1421961 and ACI-1541069, and in part by the CCF-Venustech Hongyan Research Initiative (2016-010).
Digital Object Identifier: 10.1109/JSAC.2017.2659418

I. INTRODUCTION

Cloud storage is vulnerable to Advanced Persistent Threats (APTs), in which an attacker launches sophisticated, stealthy, continuous, and targeted attacks. By applying multiple sophisticated attack methods, APT attackers aim to steal information from a target cyber system, including cloud storage, over an extended period of time without being noticed. APT attackers usually take multiple attack phases and study the defense policy of the target system in advance, making it challenging to detect APTs and estimate the attack duration. According to [1], more than 65% of the organizations responding to the survey in 2014 witnessed an increase of APT attacks, and the current doctrine against APTs is to detect them as early as possible [2].

Game theory is an important tool for studying APT attacks. In the seminal work in [3], the interaction between an APT attacker and a defender was formulated as a stealthy takeover game. Most existing game theoretic studies on APT attacks are based on expected utility theory (EUT), in which each player chooses the strategy that maximizes the expected utility. However, as human beings, APT attackers are not always rational as assumed in traditional game theoretic models, and they sometimes make subjective decisions under uncertainties that deviate from the results of expected utility theory, such as risk seeking, loss aversion and the nonlinear weighting of gains and losses [4], as illustrated by the Allais paradox described in [5]. Similarly, defenders are also subject to such subjective traits in decision-making, making the model here amenable to the use of prospect theory (PT).

By using the probability weighting function and value function, prospect theory can model the subjective decision-making processes of end-users and successfully explain the deviations of their decisions from the EUT-based results [6]. Prospect theory has been successfully applied to study the interactions between people in many areas, such as social sciences [7], [8], communication networks [9]–[15], and smart energy management [16], [17].

In this paper, prospect theory is applied to study the cloud storage defense against advanced persistent threats and to investigate the impact of end-user subjectivity on storage defense. More specifically, we formulate a cloud storage defense game, in which a subjective attacker chooses his or her interval to launch APT attacks to compromise storage devices and a defender chooses its scan interval to recapture the compromised storage devices. The Prelec probability weighting function [18] is applied to model the subjective decision-making of the attacker and defender under uncertain attack durations in the pure-strategy game or the uncertain action of their opponent in the mixed-strategy game. The Nash equilibria (NEs) of both subjective games are derived to investigate the impact of end-user subjectivity on the APT defense games.

A Q-learning based APT defense strategy is proposed for the cloud storage defender who is unaware of the attack model and the subjectivity model of the APT attacker to derive the optimal storage scan policy via trials in the dynamic games. Based on the iterative Bellman equation, the Q-learning algorithm, as a model-free reinforcement learning technique,
is convenient to implement and can achieve the optimal policy in the Markov decision process (MDP). Simulations are performed to evaluate the performance of the Q-learning based APT defense scheme, showing that it can suppress the attack motivation of subjective APT attackers and improve the utility of the defender.
Fig. 1. Illustration of a cloud storage defense game, in which the defender scans storage device $i$ at interval $x_i^k$, while the APT attacker takes a duration $z_i^k$ to complete the $k$-th attack against device $i$ after attack interval $y_i^k$, with $1 \le i \le S$ and $k > 0$.

The main contributions of this work can be summarized as follows:
• We formulate a PT-based cloud storage defense game, in which both the APT attacker and the storage defender hold subjective views to choose their attack or scan interval at each cloud storage device, under uncertain attack durations in the pure-strategy game or uncertain actions of the opponent in the mixed-strategy game. We derive the NEs of the PT-based storage defense games and provide the conditions under which the equilibria exist, showing that a subjective APT attacker tends to attack less frequently.
• We propose a Q-learning based APT defense scheme for the cloud storage defender to derive the optimal scan interval policy without knowing the APT attack model or the subjectivity model in the dynamic storage defense games against subjective APT attackers.

The remainder of the paper is organized as follows. We review related work in Section II and present the system model in Section III. We present a static subjective storage defense game with pure-strategy in Section IV and investigate the mixed-strategy PT-based game in Section V. We propose the Q-learning based APT defense schemes in dynamic storage defense games in Section VI. We provide simulation results in Section VII and conclude in Section VIII.

II. RELATED WORK

Game theoretic approaches for modeling and studying APT attacks have received a lot of attention. In the seminal work of [3], a FlipIt game was proposed to formulate the stealthy and continuous attacks of APT. The game between an overt defender and a stealthy attacker was investigated in [19], showing that the periodic defense strategy is the best response against a non-adaptive attacker. A cyber-physical signaling game among an APT attacker, a cloud defender and a mobile device was formulated in [20], in which the mobile device decides whether to trust the commands from the cloud under APTs. The defense based on the dynamic programming algorithm proposed in [21] provides a nearly optimal solution against APT attacks. The two-layer APT defense game formulated in [22] studies the joint threats from an APT attacker and insiders in the cyber system.

Prospect theory has been applied to study wireless communications and network security. For instance, a random access game formulated in [9] applies prospect theory to study the channel access between two subjective end-users in wireless networks. The impact of user subjectivity on both the wireless random access and data pricing games was identified in [10] based on prospect theory. The spectrum investment of subjective secondary operators was investigated in [11], and a PT-based sensing and leasing method was derived. The PT-based pricing and resource allocation scheme proposed in [15] improves the revenue of service providers in the presence of subjective users. A PT-based anti-jamming transmission game formulated in [23] investigates the impact of the subjectivity of end-users and jammers on the throughput in cognitive radio networks.

Game theory can help develop security mechanisms for cloud computing. For example, the game theoretic study on coresident attacks in [24] develops a semi-supervised learning based defense strategy to increase the attack costs. In the PT-based storage defense game against subjective APT attacks as presented in [25], we derived the NE of the game under uncertain APT attack durations. In this paper, we consider generic APT scenarios with multiple storage devices and multiple attack duration levels, instead of the special case with a single device as assumed in [25]. We also present a dynamic storage defense game with mixed-strategy and propose a Q-learning based defense strategy to resist subjective APT attacks under uncertain device scan intervals.

III. SYSTEM MODEL

We consider a cloud storage system consisting of $S$ storage devices that are threatened by a subjective APT attacker (A) and are protected by a storage defender (D), as shown in Fig. 1. The defender chooses the time interval to perform the $k$-th detection at storage device $i$ against APT attacks, denoted by $x_i^k$, with $1 \le i \le S$. It is clear that $x_i^k > 0$, because the defender has to take time to scan a storage device to detect APT attacks. Upon detecting APT attacks, the defender restores a compromised storage device and provides privacy for the data stored on the device. The defender is unaware of whether a storage device is compromised unless the device is monitored.

According to the APT model as given in [21], the APT attacker can apply advanced and sophisticated methods and inject multiple types of malware to estimate the defense strategy of the target system. The attacker can also determine whether the attack successfully controls the target storage device according to the data stolen from the device, and observe the size of the stolen data to determine when the attack is detected and stopped by the defender. The attacker waits $y_i^k$ time before launching the $k$-th APT attack against storage device $i$, once the defender detects attacks and restores that storage device. The duration for the attacker to complete its $k$-th attack at storage device $i$, denoted by $z_i^k$, is in general
a random positive variable that is unknown to both players. The defender is assumed to take charge of all the $S$ storage devices at the beginning.

We use the Prelec function in [18] to explain how a subjective attacker or defender over-weighs low-probability events and under-weighs outcomes having a high probability. Being easy to analyze, the Prelec function has been used to explain human decision deviations from EUT results in network security [12], [16]. Therefore, we apply this probability weighting function to model the subjective probability of the attacker (or defender), denoted by $w_A$ (or $w_D$), and given by

$$w_r(p) = \exp\left(-(-\ln p)^{\alpha_r}\right), \tag{1}$$

where $\alpha_r \in (0, 1]$, the objective weight of player $r$, represents the distortion in decision-making. For example, if $\alpha_A = 1$, the attacker is objective and $w_A(p) = p$. Table I summarizes the notation used in the paper.

TABLE I
SUMMARY OF SYMBOLS AND NOTATION.

  $S$:              Number of storage devices
  $\alpha_{A/D}$:   Objective weight of the attacker/defender
  $x_i^k$/$y_i^k$:  Defense/attack interval at time $k$ against device $i$
  $z_i^k$:          Duration to complete the $k$-th attack against device $i$
  $G_i$:            Defense gain of device $i$
  $C_i$:            Attack cost against device $i$
  $L$:              Number of non-zero attack duration levels
  $M$:              Number of detection interval levels
  $N$:              Number of non-zero attack interval levels
  $\mathbf{p}$/$\mathbf{q}$: Mixed-strategy of the defender/attacker
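Since the rest of the analysis leans on (1), here is a minimal Python sketch (ours, not code from the paper) of the Prelec weight, illustrating how a subjective player with $\alpha_r < 1$ over-weighs low-probability events and under-weighs high-probability ones:

```python
import math

def prelec_weight(p, alpha):
    """Subjective probability w_r(p) = exp(-(-ln p)^alpha) from (1).
    alpha = 1 recovers the objective probability, w(p) = p."""
    if p <= 0.0:
        return 0.0
    return math.exp(-((-math.log(p)) ** alpha))

# A subjective player (alpha = 0.5) over-weighs a rare event ...
print(prelec_weight(0.05, 0.5))  # ~0.177 > 0.05
# ... and under-weighs a likely one.
print(prelec_weight(0.95, 0.5))  # ~0.798 < 0.95
```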
IV. SUBJECTIVE STORAGE DEFENSE GAME WITH PURE-STRATEGY

The interaction between an APT attacker and a storage defender over $S$ storage devices is formulated as a subjective cloud storage defense game with pure-strategy, denoted by G. In this game, the storage defender chooses the scan interval $x_i$ for storage device $i$, and the attacker decides his or her attack interval $y_i$ against storage device $i$. The defense interval and the attack interval are normalized for simplicity of analysis. According to the maximum scan interval of the defender, denoted by $T$, the attacker and defender compete to take charge of the $S$ storage devices, with $0 < x_i \le 1$ and $0 \le y_i \le 1$, $\forall 1 \le i \le S$. If the attack interval denoted by $T_a$ is greater than $T$, the game can be divided into $K = \lceil T_a/T \rceil$ interactions, with $y_i = 1$, $\forall i < K$ and $y_K = \mathrm{mod}(T_a, T)$, where $\lceil \cdot \rceil$ is the ceiling function.

The gain of the defender for a longer scan interval at storage device $i$ is denoted by $G_i$, and the attack cost against device $i$ is denoted by $C_i$. As shown in Fig. 1, the time interval during which storage device $i$ is not compromised and the data is safe is $\min((y_i + z_i)/x_i, 1)$. Therefore, the attack rate, denoted by $R$, is defined as the normalized "bad" interval during which data privacy is at risk, averaged over the $S$ storage devices, and is given by

$$R = 1 - \frac{1}{S}\sum_{i=1}^{S} \min\left(\frac{y_i + z_i}{x_i},\, 1\right). \tag{2}$$

The utility of the defender depends on the normalized "good" interval during which each storage device is protected by the defender, i.e., $\min((y_i + z_i)/x_i, 1)$, and the gain of a longer defense interval. Similar to the game model presented in [21], the utility of the defender, denoted by $u_D$, is defined as

$$u_D(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{S} \left[\min\left(\frac{y_i + z_i}{x_i},\, 1\right) + x_i G_i\right]. \tag{3}$$

The utility of the attacker, denoted by $u_A$, is defined as

$$u_A(\mathbf{x}, \mathbf{y}) = -\sum_{i=1}^{S} \left[\min\left(\frac{y_i + z_i}{x_i},\, 1\right) + I(y_i < x_i)\, C_i\right], \tag{4}$$

where the indicator function $I(\xi) = 1$ if $\xi$ is true and 0 otherwise. The desire of the attacker to steal information from the storage device is modeled by $-\min\left((y_i + z_i)/x_i,\, 1\right)$.

The time interval $z_i$ for the attacker to successfully launch an APT attack against storage device $i$ is difficult to estimate and is quantized into $L$ non-zero levels following the distribution $[P_l^i]_{0 \le l \le L}$, where $P_l^i = \Pr(z_i = l/L)$, $\forall 0 \le l \le L$ and $1 \le i \le S$. By definition, we have $P_l^i \ge 0$ and $\sum_{l=0}^{L} P_l^i = 1$. The expected utilities of the defender and the attacker over the realizations of the attack duration $z_i$, denoted by $U_D^{EUT}$ and $U_A^{EUT}$, respectively, are given by (3) and (4) as

$$U_D^{EUT}(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{S} \left[\sum_{l=0}^{L} P_l^i \min\left(\frac{y_i L + l}{x_i L},\, 1\right) + x_i G_i\right] \tag{5}$$

$$U_A^{EUT}(\mathbf{x}, \mathbf{y}) = -\sum_{i=1}^{S} \left[\sum_{l=0}^{L} P_l^i \min\left(\frac{y_i L + l}{x_i L},\, 1\right) + I(y_i < x_i)\, C_i\right]. \tag{6}$$

The Prelec probability weighting function in (1) is used to model the subjective decision-making of the players under uncertain attack durations. The PT-based utilities of the subjective defender and the attacker, denoted by $U_D^{PT}$ and $U_A^{PT}$, respectively, are given by replacing the objective probability $P_l^i$ in (5) and (6) with the subjective probability $w(P_l^i)$, i.e.,

$$U_D^{PT}(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{S} \left[\sum_{l=0}^{L} w_D(P_l^i) \min\left(\frac{y_i L + l}{x_i L},\, 1\right) + x_i G_i\right] \tag{7}$$

$$U_A^{PT}(\mathbf{x}, \mathbf{y}) = -\sum_{i=1}^{S} \left[\sum_{l=0}^{L} w_A(P_l^i) \min\left(\frac{y_i L + l}{x_i L},\, 1\right) + I(y_i < x_i)\, C_i\right]. \tag{8}$$

A Nash equilibrium of the PT-based storage defense game G, denoted by $(\mathbf{x}^*, \mathbf{y}^*)$, consists of the best response of each player in terms of the PT-based utility, if the opponent uses the NE strategy, i.e.,

$$\mathbf{x}^* = \arg\max_{\mathbf{x}} U_D^{PT}(\mathbf{x}, \mathbf{y}^*) \tag{9}$$

$$\mathbf{y}^* = \arg\max_{\mathbf{y}} U_A^{PT}(\mathbf{x}^*, \mathbf{y}). \tag{10}$$
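For readers who want to experiment with these utilities, the following sketch (our illustration, assuming a single device, $S = 1$) evaluates (7) and (8); setting the objective weight to 1 recovers the EUT utilities (5) and (6):

```python
import math

def w(p, alpha):
    # Prelec weight (1); defined as 0 at p = 0
    return math.exp(-((-math.log(p)) ** alpha)) if p > 0 else 0.0

def U_D_PT(x, y, P, G, alpha_D):
    """Defender's PT utility (7) for a single device.
    P = [P_0, ..., P_L] is the attack duration distribution."""
    L = len(P) - 1
    protected = sum(w(P[l], alpha_D) * min((y * L + l) / (x * L), 1.0)
                    for l in range(L + 1))
    return protected + x * G

def U_A_PT(x, y, P, C, alpha_A):
    """Attacker's PT utility (8) for a single device."""
    L = len(P) - 1
    protected = sum(w(P[l], alpha_A) * min((y * L + l) / (x * L), 1.0)
                    for l in range(L + 1))
    return -(protected + (C if y < x else 0.0))
```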
If the APT attack model and the subjectivity model of the attacker are unknown, the defender can apply the Q-learning based defense strategy to derive its scan policy, as discussed in Section VI.

We first consider the storage defense game G with a single storage device and 2 non-zero attack duration levels, i.e., the probability mass function of $z$ is given by $[P_0, P_1, 1 - P_0 - P_1]$. The index $i$ in the superscript is omitted for brevity.

Theorem 1. The subjective storage defense game G with $S = 1$ and $L = 2$ has an NE $(x^*, y^*) = (0.5, 0)$, if

$$I_1:\ \begin{cases} G \le \exp\left(-(-\ln P_1)^{\alpha_D}\right) & \text{(11a)} \\ C \le \exp\left(-(-\ln P_0)^{\alpha_A}\right); & \text{(11b)} \end{cases}$$

$(x^*, y^*) = (1, 0)$, if

$$I_2:\ \begin{cases} G > \exp\left(-(-\ln P_1)^{\alpha_D}\right) & \text{(12a)} \\ C < \exp\left(-(-\ln P_0)^{\alpha_A}\right) + 0.5\exp\left(-(-\ln P_1)^{\alpha_A}\right); & \text{(12b)} \end{cases}$$

and $(x^*, y^*) = (1, 1)$, if

$$I_3:\ C > \exp\left(-(-\ln P_0)^{\alpha_A}\right) + 0.5\exp\left(-(-\ln P_1)^{\alpha_A}\right). \tag{13}$$

Proof: By (1) and (8), we see that if $0 \le y < 0.5$,

$$U_A^{PT}(0.5, 0) = -w_A(P_1) - w_A(1 - P_0 - P_1) - C \ge -2y\, w_A(P_0) - w_A(P_1) - w_A(1 - P_0 - P_1) - C = U_A^{PT}(0.5, y). \tag{14}$$

Similarly, if (11b) holds, $\forall 0.5 \le y \le 1$, we have

$$U_A^{PT}(0.5, 0) = -w_A(P_1) - w_A(1 - P_0 - P_1) - C \ge -w_A(P_0) - w_A(P_1) - w_A(1 - P_0 - P_1) = U_A^{PT}(0.5, y). \tag{15}$$

Thus, (10) holds for $(x^*, y^*) = (0.5, 0)$. By (7), if $0 < x \le 0.5$, we have

$$U_D^{PT}(x, 0) = w_D(P_1) + w_D(1 - P_0 - P_1) + xG, \tag{16}$$

which increases linearly with $x$ and is maximized at $x = 0.5$. Similarly, if $0.5 < x \le 1$, we have

$$U_D^{PT}(x, 0) = \frac{1}{2x}\, w_D(P_1) + w_D(1 - P_0 - P_1) + xG, \tag{17}$$

and

$$\left.\frac{\partial^2 U_D^{PT}}{\partial x^2}\right|_{y=0} = \frac{1}{x^3}\, w_D(P_1) \ge 0, \tag{18}$$

indicating that $U_D^{PT}(x, 0)$ is convex over this range and thus maximized at $x = 0.5$ or 1. If (11a) holds, by (7), we have

$$U_D^{PT}(0.5, 0) = w_D(P_1) + w_D(1 - P_0 - P_1) + 0.5G \ge 0.5\, w_D(P_1) + w_D(1 - P_0 - P_1) + G = U_D^{PT}(1, 0). \tag{19}$$

Thus, (9) holds for $(x^*, y^*) = (0.5, 0)$, which is an NE of the game G. Similarly, we can prove that $(1, 0)$ and $(1, 1)$ are also NEs of the game.

Fig. 2. Performance of the static PT-based storage defense game G at the NEs, with C = 0.62, G = 0.6, P0 = 0.46, P1 = 0.5, αD = 1 and L = 2: (a) attack rate; (b) utility.

Under a low attack cost, as shown in (11b) and (12b), the APT attack is launched immediately, and the scan interval is maximized to save energy, as shown in (12a). Otherwise, if the attack cost is high, as shown in (13), a subjective APT attacker has no motivation to launch attacks against the storage device. As a concrete example, we evaluate the performance of the storage defense game G with C = 0.62, G = 0.6, P0 = 0.46, P1 = 0.5, αD = 1 and L = 2. As shown in Fig. 2, the attack rate has a sharp increase from 0 to 0.7 as the attacker's objective weight $\alpha_A$ changes at around 0.42, because a subjective APT attacker
tends to overweigh his or her attack cost. The objective weight of the attacker $\alpha_A = 0.42$ is a turning point from condition $I_3$ to $I_2$, i.e., the utility of the defender decreases sharply from 1.6 to 0.89.
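The turning point can be checked directly from the closed-form conditions. Below is a small Python sketch (ours, not code from the paper) that maps game parameters to the NE predicted by Theorem 1:

```python
import math

def w(p, alpha):
    # Prelec weight (1)
    return math.exp(-((-math.log(p)) ** alpha)) if p > 0 else 0.0

def theorem1_ne(G, C, P0, P1, alpha_D, alpha_A):
    """Predicted pure-strategy NE (x*, y*) from Theorem 1 (S = 1, L = 2).
    Returns None when no listed condition holds."""
    if C > w(P0, alpha_A) + 0.5 * w(P1, alpha_A):      # condition I3, (13)
        return (1.0, 1.0)   # high attack cost: no attack motivation
    if G > w(P1, alpha_D):                             # condition I2, (12)
        return (1.0, 0.0)
    if G <= w(P1, alpha_D) and C <= w(P0, alpha_A):    # condition I1, (11)
        return (0.5, 0.0)
    return None

# Fig. 2 parameters: the NE flips from (1, 1) to (1, 0) just above
# alpha_A = 0.42, reproducing the turning point discussed in the text.
for aA in (0.3, 0.42, 0.5, 1.0):
    print(aA, theorem1_ne(G=0.6, C=0.62, P0=0.46, P1=0.5,
                          alpha_D=1.0, alpha_A=aA))
```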
Next, we consider the storage defense game with a single storage device and 3 non-zero attack duration levels, i.e., the distribution of the attack duration follows $[P_0, P_1, P_2, 1 - P_0 - P_1 - P_2]$.

Theorem 2. The subjective storage defense game G with $S = 1$ and $L = 3$ has an NE $(x^*, y^*) = (1/3, 0)$, if

$$I_4:\ \begin{cases} G < \min\left(\frac{3}{2}\exp\left(-(-\ln P_1)^{\alpha_D}\right),\ \exp\left(-(-\ln P_1)^{\alpha_D}\right) + \frac{1}{2}\exp\left(-(-\ln P_2)^{\alpha_D}\right)\right) & \text{(20a)} \\ C < \exp\left(-(-\ln P_0)^{\alpha_A}\right); & \text{(20b)} \end{cases}$$

$(x^*, y^*) = (2/3, 0)$, if

$$I_5:\ \begin{cases} \frac{3}{2}\exp\left(-(-\ln P_1)^{\alpha_D}\right) < G < \frac{1}{2}\exp\left(-(-\ln P_1)^{\alpha_D}\right) + \exp\left(-(-\ln P_2)^{\alpha_D}\right) & \text{(21a)} \\ C < \exp\left(-(-\ln P_0)^{\alpha_A}\right) + \frac{1}{2}\exp\left(-(-\ln P_1)^{\alpha_A}\right); & \text{(21b)} \end{cases}$$

$(x^*, y^*) = (1, 0)$, if

$$I_6:\ \begin{cases} G > \max\left(\exp\left(-(-\ln P_1)^{\alpha_D}\right) + \frac{1}{2}\exp\left(-(-\ln P_2)^{\alpha_D}\right),\ \frac{1}{2}\exp\left(-(-\ln P_1)^{\alpha_D}\right) + \exp\left(-(-\ln P_2)^{\alpha_D}\right)\right) & \text{(22a)} \\ C < \exp\left(-(-\ln P_0)^{\alpha_A}\right) + \frac{2}{3}\exp\left(-(-\ln P_1)^{\alpha_A}\right) + \frac{1}{3}\exp\left(-(-\ln P_2)^{\alpha_A}\right); & \text{(22b)} \end{cases}$$

and $(x^*, y^*) = (1, 1)$, if

$$I_7:\ C > \exp\left(-(-\ln P_0)^{\alpha_A}\right) + \frac{2}{3}\exp\left(-(-\ln P_1)^{\alpha_A}\right) + \frac{1}{3}\exp\left(-(-\ln P_2)^{\alpha_A}\right). \tag{23}$$

Proof: The proof is given in Appendix A.

Under a low attack cost, as shown in (20b), (21b) and (22b), the attacker launches an attack immediately against the storage device, and the defender maximizes its detection interval to save energy, as shown in (22a). If the attack cost is high, as in (23), the attacker has no motivation to launch APT attacks.

Now we consider the case with two storage devices that have the same detection gain, i.e., $G_1 = G_2 = G$, and the same attack duration distribution, i.e., $P_l^1 = P_l^2 = P_l$.

Theorem 3. If $S = 2$ and $L = 2$, the subjective storage defense game G has an NE $(\mathbf{x}^*, \mathbf{y}^*) = (0.5, 0)$, if

$$I_8:\ \begin{cases} G \le \exp\left(-(-\ln P_1)^{\alpha_D}\right) & \text{(24a)} \\ \max(C_1, C_2) \le \exp\left(-(-\ln P_0)^{\alpha_A}\right); & \text{(24b)} \end{cases}$$

$(\mathbf{x}^*, \mathbf{y}^*) = (1, 0)$, if

$$I_9:\ \begin{cases} G > \exp\left(-(-\ln P_1)^{\alpha_D}\right) & \text{(25a)} \\ \max(C_1, C_2) \le \exp\left(-(-\ln P_0)^{\alpha_A}\right) + 0.5\exp\left(-(-\ln P_1)^{\alpha_A}\right); & \text{(25b)} \end{cases}$$

$(\mathbf{x}^*, \mathbf{y}^*) = ([1, 0.5], [1, 0])$, if

$$I_{10}:\ \begin{cases} G \le \exp\left(-(-\ln P_1)^{\alpha_D}\right) & \text{(26a)} \\ C_1 > \exp\left(-(-\ln P_0)^{\alpha_A}\right) + 0.5\exp\left(-(-\ln P_1)^{\alpha_A}\right) & \text{(26b)} \\ C_2 \le \exp\left(-(-\ln P_0)^{\alpha_A}\right); & \text{(26c)} \end{cases}$$

$(\mathbf{x}^*, \mathbf{y}^*) = (1, [1, 0])$, if

$$I_{11}:\ \begin{cases} G > \exp\left(-(-\ln P_1)^{\alpha_D}\right) & \text{(27a)} \\ C_2 < \exp\left(-(-\ln P_0)^{\alpha_A}\right) + 0.5\exp\left(-(-\ln P_1)^{\alpha_A}\right) < C_1; & \text{(27b)} \end{cases}$$

and $(\mathbf{x}^*, \mathbf{y}^*) = (1, 1)$, if

$$I_{12}:\ \min(C_1, C_2) > \exp\left(-(-\ln P_0)^{\alpha_A}\right) + 0.5\exp\left(-(-\ln P_1)^{\alpha_A}\right). \tag{28}$$

Proof: By (8), if $0 \le y_1, y_2 < 0.5$, we have

$$U_A^{PT}(0.5, 0) = -2w_A(P_1) - 2w_A(1 - P_0 - P_1) - C_1 - C_2 \ge -(2y_1 + 2y_2)\, w_A(P_0) - 2w_A(P_1) - 2w_A(1 - P_0 - P_1) - C_1 - C_2 = U_A^{PT}(0.5, \mathbf{y}). \tag{29}$$

If $0 \le y_2 < 0.5 \le y_1 \le 1$ and $C_1 < w_A(P_0)$, we have

$$U_A^{PT}(0.5, 0) = -2w_A(P_1) - 2w_A(1 - P_0 - P_1) - C_1 - C_2 \ge -(1 + 2y_2)\, w_A(P_0) - 2w_A(P_1) - 2w_A(1 - P_0 - P_1) - C_2 = U_A^{PT}(0.5, \mathbf{y}). \tag{30}$$

Similarly, (10) also holds if $0 \le y_1 < 0.5 \le y_2 \le 1$. Thus, (10) holds for $(\mathbf{x}^*, \mathbf{y}^*) = (0.5, 0)$.

By (7), if $0 < x_1, x_2 \le 0.5$, we have

$$U_D^{PT}(0.5, 0) = 2w_D(P_1) + 2w_D(1 - P_0 - P_1) + G \ge 2w_D(P_1) + 2w_D(1 - P_0 - P_1) + (x_1 + x_2)G = U_D^{PT}(\mathbf{x}, 0). \tag{31}$$

If $0 \le x_2 < 0.5 \le x_1 \le 1$ and $G \le w_D(P_1)$, we have

$$U_D^{PT}(0.5, 0) = 2w_D(P_1) + 2w_D(1 - P_0 - P_1) + G \ge \left(\frac{1}{2x_1} + 1\right) w_D(P_1) + 2w_D(1 - P_0 - P_1) + (x_1 + x_2)G = U_D^{PT}(\mathbf{x}, 0). \tag{32}$$

Similarly, (9) holds for the other cases, indicating that $(0.5, 0)$ is an NE of the game. We can prove the other NEs of the game similarly.

If the attack cost is low, i.e., (24b) holds, the attacker launches APT attacks and the defender scans the two devices at the same frequency. If (28) holds with a high attack cost, the attack motivation is suppressed and the defender maximizes the scan interval.
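Theorem 3 can be spot-checked numerically. The brute-force sketch below (ours; the strategy grids and parameters are illustrative choices satisfying condition $I_8$) verifies the best-response conditions (9) and (10) over quantized strategies:

```python
import itertools
import math

def w(p, a):
    # Prelec weight (1)
    return math.exp(-((-math.log(p)) ** a)) if p > 0 else 0.0

def U_D(x, y, P, G, aD):
    # Defender's PT utility (7) with S = 2, L = 2, shared [P0, P1, P2]
    return sum(sum(w(P[l], aD) * min((y[i] + l / 2) / x[i], 1.0)
                   for l in range(3)) + x[i] * G for i in range(2))

def U_A(x, y, P, C, aA):
    # Attacker's PT utility (8)
    return -sum(sum(w(P[l], aA) * min((y[i] + l / 2) / x[i], 1.0)
                    for l in range(3)) + (C[i] if y[i] < x[i] else 0.0)
                for i in range(2))

XS = list(itertools.product([0.25, 0.5, 0.75, 1.0], repeat=2))
YS = list(itertools.product([0.0, 0.25, 0.5, 0.75, 1.0], repeat=2))

def is_ne(x0, y0, P, G, C, aD, aA, tol=1e-9):
    # (9): no defender deviation helps; (10): no attacker deviation helps
    return (U_D(x0, y0, P, G, aD) >= max(U_D(x, y0, P, G, aD) for x in XS) - tol
            and U_A(x0, y0, P, C, aA) >= max(U_A(x0, y, P, C, aA) for y in YS) - tol)

# Parameters chosen (by us) to satisfy I8: G <= w(P1, aD) and
# max(C1, C2) <= w(P0, aA); Theorem 3 then predicts the NE (0.5, 0).
P, G, C = [0.46, 0.5, 0.04], 0.3, [0.4, 0.4]
print(is_ne((0.5, 0.5), (0.0, 0.0), P, G, C, aD=0.8, aA=0.8))  # True
```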
V. SUBJECTIVE STORAGE DEFENSE GAME WITH MIXED-STRATEGY

With the quantized strategies of the mixed-strategy storage defense game G′, the utilities of the defender and the attacker at storage device $i$, for scan interval $m/M$ and attack interval $n/N$, are given by

$$u_D^i\left(\frac{m}{M}, \frac{n}{N}\right) = \min\left(\frac{nM + z_i MN}{mN},\, 1\right) + \frac{m G_i}{M} \tag{42}$$

$$u_A^i\left(\frac{m}{M}, \frac{n}{N}\right) = -\min\left(\frac{nM + z_i MN}{mN},\, 1\right) - I\left(\frac{n}{N} < \frac{m}{M}\right) C_i. \tag{43}$$

The Karush-Kuhn-Tucker conditions of the defender's optimization problem, with Lagrangian $\mathcal{L}_D$ and multipliers $\mu_m^i$, are

$$\begin{cases} \dfrac{\partial \mathcal{L}_D}{\partial p_m^i} = 0 \\ -p_m^i \le 0,\ \mu_m^i \ge 0,\ \mu_m^i p_m^i = 0, \quad 1 \le m \le M \\ \sum_{m=1}^{M} p_m^i - 1 = 0. \end{cases} \tag{44}$$

According to (38), we apply the complementary slackness condition for (44) to obtain

$$\begin{cases} \sum_{n=0}^{N} u_D^i\left(\dfrac{k}{M}, \dfrac{n}{N}\right) w_D(q_n^{i*}) - \lambda_D^i = 0, \quad 1 \le k \le M \\ \sum_{m=1}^{M} p_m^i = 1 \\ \lambda_D^i \ge 0, \end{cases} \tag{45}$$

and yield (41a). Similarly, we have (41b).

Corollary 1. If $S = 1$, $M = 2$, $N = 1$, and

$$\frac{u_A(0.5, 1) - u_A(0.5, 0)}{u_A(1, 0) - u_A(1, 1)} > 1, \tag{47}$$

the subjective storage defense game G′ has a unique NE given by

$$\ln\frac{u_A(0.5, 1) - u_A(0.5, 0)}{u_A(1, 0) - u_A(1, 1)} + \left(-\ln\left(1 - p_1^*\right)\right)^{\alpha_A} - \left(-\ln p_1^*\right)^{\alpha_A} = 0 \tag{48}$$

$$\ln\frac{u_D(0.5, 0) - u_D(1, 0)}{u_D(1, 1) - u_D(0.5, 1)} + \left(-\ln\left(1 - q_0^*\right)\right)^{\alpha_D} - \left(-\ln q_0^*\right)^{\alpha_D} = 0. \tag{49}$$

Proof: According to (1), (41a) and (46), we have (49). Similarly, we can obtain (48) by (1), (41b) and (47). Next, we prove the uniqueness of $q_0^*$. As $f(x) = (-\ln x)^{\alpha_D}$ monotonically decreases with $x$, by (46) and (49) we have $f(q_0^*) > f(1 - q_0^*)$, yielding $0 < q_0^* < 1 - q_0^* < 1$. Thus we have $0 < q_0^* < 1/2$. If $0 < x < 1/2$, we have

$$\frac{d\left(f(1 - x) - f(x)\right)}{dx} = -f'(1 - x) - f'(x) > 0, \tag{50}$$

indicating that $f(1 - x) - f(x)$ increases with $x$. Therefore, (49) has a unique solution. Similarly, (48) has a unique solution.

According to Corollary 1, the NE of the EUT-based storage defense game is given by

$$(p_1^*, q_0^*) = \left(\frac{1 - z - C}{\min(2z, 1) - z},\ \frac{G}{2\min(2z, 1) - 2z}\right). \tag{51}$$

Fig. 3. Performance of the static subjective storage defense game with mixed-strategy G′ at the NE, with C = 0.5, G = 0.1 and z = 0.2: (a) attack rate; (b) scan frequency $p_1^*$, with $\alpha_D = 1$; (c) utility.

As shown in Fig. 3, the attack rate of the subjective storage defense game G′ decreases with $\alpha_D$, e.g., it decreases by 1.04% as $\alpha_D$ changes from 0.5 to 1, because a subjective defender scans less frequently. Consequently, the utility of the defender increases from 0.66 to 0.69 if $\alpha_D$ changes from 0.5 to 1. In addition, the scan frequency decreases by 15% as $\alpha_A$ changes from 0.5 to 1, and thus the utility of the defender decreases from 0.7 to 0.69.
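Equations (48) and (49) share the form $\ln K + (-\ln(1 - v))^{\alpha} - (-\ln v)^{\alpha} = 0$, and by (50) the root in $(0, 0.5)$ is unique, so they are easy to solve numerically. A bisection sketch (ours, not from the paper; $K$ stands in for the payoff ratios computed from the utilities):

```python
import math

def f(x, alpha):
    # f(x) = (-ln x)^alpha, the exponent inside the Prelec weight (1)
    return (-math.log(x)) ** alpha

def solve_indifference(K, alpha, iters=200):
    """Solve ln K + f(1 - v) - f(v) = 0, the common form of (48)/(49).
    For K > 1, the left side rises monotonically from -inf (v -> 0+)
    to ln K > 0 (v = 0.5), so the root in (0, 0.5) is unique, as in (50)."""
    g = lambda v: math.log(K) + f(1 - v, alpha) - f(v, alpha)
    lo, hi = 1e-12, 0.5
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

# Sanity check: with alpha = 1 (objective player) the root is 1/(1 + K).
print(solve_indifference(2.0, 1.0))   # ~0.3333
print(solve_indifference(2.0, 0.8))   # subjective player: root shifts
```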
VI. DYNAMIC PT-BASED STORAGE DEFENSE GAME

If the defender is unaware of the APT attack model and the subjective view model in the dynamic subjective cloud storage defense game, the storage defender can apply the Q-learning technique, a model-free and widely-used reinforcement learning method, to derive its scan policy.

A. PT-based dynamic game with pure-strategy

In the dynamic game with pure-strategy, the system state $s^k$ at time $k$ is the observed attack duration in the last slot, $z^{k-1} + y^{k-1}$. Based on the iterative Bellman equation, the Q-function and the value function $V(s)$ are updated by

$$Q(s^k, x^k) \leftarrow (1 - \gamma)\, Q(s^k, x^k) + \gamma \left(u_D(s^k, x^k) + \delta V(s^{k+1})\right) \tag{52}$$

$$V(s^k) = \max_{x \in \mathbf{x}} Q(s^k, x), \tag{53}$$

where $\delta \in [0, 1]$ is the discount factor regarding the future reward, and $\gamma \in (0, 1]$ is the learning rate of the current experience.

By applying the $\epsilon$-greedy policy, the defender chooses its scan interval $x^k$ to maximize its current Q-function as

$$\Pr(x^k = \tilde{x}) = \begin{cases} 1 - \epsilon, & \tilde{x} = \arg\max_x Q(s^k, x) \\ \dfrac{\epsilon}{M - 1}, & \text{o.w.} \end{cases} \tag{54}$$

The Q-learning based storage defense algorithm is summarized in Algorithm 1.

Algorithm 1 APT defense in a dynamic game with pure-strategy.
  Initialize $\gamma = 0.7$, $\delta = 0.7$, $y^0$, $z^0$, $Q(s, x) = 0$, $V(s) = 0$, $\forall x, s$.
  For $k = 1, 2, 3, ...$
    $s^k = y^{k-1} + z^{k-1}$
    Choose $x^k$ via (54)
    Scan the storage device after time $x^k$
    Observe $u_D$ and $y^k + z^k$
    Update $Q(s^k, x^k)$ via (52)
    Update $V(s^k)$ via (53)
  End for
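A compact Python rendering of Algorithm 1 follows (our sketch; the observe_attack stub is a hypothetical stand-in for the environment and is not the paper's subjective attacker model):

```python
import random

GAMMA, DELTA, EPS = 0.7, 0.7, 0.1      # gamma/delta in (52), epsilon in (54)
M = 4                                   # number of detection interval levels
X = [(m + 1) / M for m in range(M)]     # quantized scan intervals in (0, 1]
G = 0.6                                 # defense gain
Q, V = {}, {}

def u_D(x, y_plus_z):
    # Single-device defender utility (3)
    return min(y_plus_z / x, 1.0) + x * G

def choose_x(s):
    # Epsilon-greedy policy (54): the greedy action w.p. 1 - eps,
    # otherwise one of the remaining M - 1 actions uniformly
    best = max(X, key=lambda x: Q.get((s, x), 0.0))
    if random.random() < 1 - EPS:
        return best
    return random.choice([x for x in X if x != best])

def update(s, x, u, s_next):
    # Q-function update (52) and value-function update (53)
    q = Q.get((s, x), 0.0)
    Q[(s, x)] = (1 - GAMMA) * q + GAMMA * (u + DELTA * V.get(s_next, 0.0))
    V[s] = max(Q.get((s, a), 0.0) for a in X)

def observe_attack():
    # Stub: quantized observation of y + z in the last slot
    return random.choice([0.25, 0.5, 0.75, 1.0])

s = observe_attack()                    # s^1 = y^0 + z^0
for k in range(5000):
    x = choose_x(s)                     # choose x^k via (54)
    s_next = observe_attack()           # scan, then observe y^k + z^k
    update(s, x, u_D(x, s_next), s_next)
    s = s_next
```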
B. PT-based dynamic game with mixed-strategy

In the dynamic PT-based storage defense game with mixed-strategy, denoted by G′, the defender chooses the detection interval distribution $\mathbf{p} = [p_m]_{1 \le m \le M}$, while the attacker determines the attack interval distribution $\mathbf{q} = [q_n]_{0 \le n \le N}$, where $p_m$ and $q_n$ are quantized into $\zeta$ levels, with $1 \le m \le M$ and $0 \le n \le N$.

The system state at time $k$ is defined as the total attack duration distribution in the last time slot, denoted by $\Phi^{k-1}$. Let $Q(s, \mathbf{p})$ denote the Q-function with mixed-strategy $\mathbf{p}$, and $V(s)$ be the value function. Based on the iterative Bellman equation, the Q-function can be updated with

$$Q(s^k, \mathbf{p}^k) \leftarrow (1 - \gamma)\, Q(s^k, \mathbf{p}^k) + \gamma \left(u_D + \delta V(s^{k+1})\right) \tag{55}$$

$$V(s^k) = \max_{\mathbf{p}} Q(s^k, \mathbf{p}). \tag{56}$$

The mixed-strategy is chosen based on the $\epsilon$-greedy algorithm in terms of the Q-function in (55). The algorithm is summarized in Algorithm 2.

Algorithm 2 APT defense in a dynamic game with mixed-strategy.
  Initialize $\gamma = 0.7$, $\delta = 0.7$, $\Phi^0$, $Q(s, \mathbf{p}) = 0$, $V(s) = 0$, $\forall s, \mathbf{p}$.
  For $k = 1, 2, 3, ...$
    Update $s^k = \Phi^{k-1}$
    Choose $\mathbf{p}^k$ with the $\epsilon$-greedy algorithm
    Scan the storage device according to strategy $\mathbf{p}^k$
    Observe $u_D$ and $\Phi^k$
    Update $Q(s^k, \mathbf{p}^k)$ via (55)
    Update $V(s^k)$ via (56)
  End for

VII. SIMULATION RESULTS

Simulations have been performed to evaluate the performance of the Q-learning based APT detection scheme in the PT-based dynamic games G and G′. If not specified otherwise, we set $\alpha_D = 1$ to maximize the utility of the defender,
$\alpha_A = 0.8$ to represent a typical subjective attacker, and $\gamma = 0.7$, $\delta = 0.7$ and $\epsilon = 0.1$ to achieve good performance. We chose typical attack and defense parameters, G = 0.6 and z = 0.3, and used a greedy detection strategy as the benchmark, in which the scan interval is chosen to maximize the estimated immediate utility based on the previous attack interval. The attack strategy is chosen to maximize the PT utility of the attacker according to the attack history in the last time slot.
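A minimal sketch of this greedy benchmark (ours; the candidate interval set and the last-slot observation are illustrative assumptions):

```python
def greedy_scan_interval(X, y_prev, z_prev, G=0.6):
    # Myopic benchmark: pick the scan interval that maximizes the estimated
    # immediate utility (3) from the last observed attack interval and
    # duration, with no discounting of future reward.
    return max(X, key=lambda x: min((y_prev + z_prev) / x, 1.0) + x * G)

print(greedy_scan_interval([0.25, 0.5, 0.75, 1.0], 0.0, 0.5))  # -> 0.5
```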
The performance of the PT-based dynamic game with pure-strategy G is shown in Fig. 4, with L = 5, C = 0.4 and $\alpha_A = 0.8$ in Case 1, and L = 2, C = 0.62 and $\alpha_A = 0.3$ in Case 2. The attack rate decreases over time, from 8% at the beginning of the game to 0.5% after 6000 time slots in Case 1, about 93.7% lower than the convergent attack rate of the benchmark strategy. Consequently, as shown in Fig. 4(b), the utility of the defender increases over time from 1.29 at the beginning to 1.43 after 6000 time slots at convergence, about 10.9% higher than the benchmark strategy. In Case 2, the utility of the defender converges to 1.6 after 8000 time slots, which matches the result of the NE given by Theorem 1.

Fig. 4. Performance of the dynamic storage defense game with pure-strategy G averaged over 1000 runs, with L = 5, C = 0.4 and $\alpha_A = 0.8$ in Case 1, and L = 2, C = 0.62 and $\alpha_A = 0.3$ in Case 2: (a) attack rate; (b) utility of the defender.

As shown in Fig. 5, the attack rate increases with the objective weight of the attacker, e.g., R increases about 4 times if $\alpha_A$ changes from 0.5 to 1. The attack rate at convergence is 4.4%, which is 70.6% lower than the benchmark strategy with $\alpha_A = 1$. Consequently, the utility of the defender decreases from 1.39 to 1.27 if $\alpha_A$ changes from 0.5 to 1. The utility changes most significantly if $\alpha_A$ changes between 0.7 and 0.9, because the attack interval changes most significantly due to the probability distortion of the subjective attacker, and the turning points at $\alpha_A = 0.64$ and 0.94 match the theoretical results in Theorem 1.

Fig. 5. Performance of the dynamic storage defense game with pure-strategy G averaged over 4000 time slots, with L = 5, C = 0.4, G = 0.6 and $\epsilon = 0.1$: (a) attack rate; (b) utility of the defender.

The performance of the PT-based mixed-strategy game G′ in Fig. 6 shows that the attack rate decreases with time from 15% at the beginning of the game to 6% after 500 time slots, which is only half of that of the benchmark strategy if $\alpha_A = 1$. Thus, the utility of the defender increases over time, e.g., from 1 to around 1.1 after 500 time slots, and is 10% higher than that of the benchmark strategy at time slot 500.

Fig. 6. Performance of the dynamic storage defense game with mixed-strategy G′ averaged over 1000 runs, with z = 0.3, C = 0.6, G = 0.25, M = 2, N = 1, $\alpha_D = 1$, $\alpha_A = 0.8$ and $\epsilon = 0.1$: (a) attack rate; (b) utility of the defender.

As shown in Fig. 7, the attack rate R increases from 3% to 6.5% if $\alpha_A$ changes from 0.2 to 1, and the attack rate is only half of that of the benchmark strategy at $\alpha_A = 1$. Consequently, the utility of the defender decreases from 1.15 to 1.095 if $\alpha_A$ changes from 0.2 to 1.

VIII. CONCLUSION

In this work, we have formulated PT-based cloud storage defense games to investigate the impact of the subjective view of APT attackers under uncertain attack durations in the pure-strategy game or the uncertain scan interval of the defender in the mixed-strategy game. The NEs of the PT-based games have been provided, showing that a subjective attacker tends to attack less frequently.
Fig. 7. (a) Average attack rate; (b) average utility of the defender.

APPENDIX A
PROOF OF THEOREM 2

… If (20b) holds, by (8), $\forall\, 1/3 \le y \le 1$, we have

$$U_A^{PT}\left(\frac{1}{3}, 0\right) = -w_A(P_1) - w_A(P_2) - w_A(1 - P_0 - P_1 - P_2) - C \ge -w_A(P_0) - w_A(P_1) - w_A(P_2) - w_A(1 - P_0 - P_1 - P_2) = U_A^{PT}\left(\frac{1}{3}, y\right). \tag{58}$$

Thus, (10) holds for $(x^*, y^*) = (1/3, 0)$. By (7), if $0 < x < 1/3$, we see that $U_D^{PT}(x, 0)$ increases … or 2/3. Similarly, if $2/3 < x < 1$, we see that $U_D^{PT}(x, 0)$ is concave …

REFERENCES

[13] …, "… in wireless networks," in Proc. IEEE Annu. Conf. Inf. Sci. Syst. (CISS), Princeton, NJ, Mar. 2014, pp. 1–6.
[14] Y. Yang and N. B. Mandayam, "Impact of end-user decisions on pricing in wireless networks under a multiple-user-single-provider setting," in Proc. Annu. Allerton Conf. Commun. Control Comput., Monticello, IL, Oct. 2014, pp. 206–212.

Dongjin Xu received the B.S. degree in communication engineering from Xiamen University, Xiamen, China, in 2016, where she is currently pursuing the M.S. degree with the Department of Communication Engineering. Her research interests include network security and wireless communications.