
Cloud Storage Defense Against Advanced Persistent Threats: A Prospect Theoretic Study

Liang Xiao, Senior Member, IEEE, Dongjin Xu, Caixia Xie, Narayan B. Mandayam, Fellow, IEEE, and H. Vincent Poor, Fellow, IEEE

Manuscript received April 30, 2016; revised September 21, 2016. Liang Xiao, Dongjin Xu, and Caixia Xie are with the Department of Communication Engineering, Xiamen University, Xiamen, China. Liang Xiao is also with the Key Laboratory of Underwater Acoustic Communication and Marine Information Technology, Ministry of Education, Xiamen University, Xiamen, China, and the Beijing Key Laboratory of IoT Information Security Technology, Institute of Information Engineering, CAS, China (email: [email protected]). Narayan B. Mandayam is with the Wireless Information Network Laboratory, Department of Electrical and Computer Engineering, Rutgers University, New Brunswick, NJ 08816 USA (e-mail: [email protected]). H. Vincent Poor is with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA (email: [email protected]). This research was supported in part by NSFC (61671396, 61271242), in part by the U.S. National Science Foundation under Grants CMMI-1435778, ECCS-1549881, CNS-1421961 and ACI-1541069, and in part by the CCF-Venustech Hongyan Research Initiative (2016-010). Digital Object Identifier: 10.1109/JSAC.2017.2659418

Abstract—Cloud storage is vulnerable to Advanced Persistent Threats (APTs), in which an attacker launches stealthy, continuous, well-funded and targeted attacks on storage devices. In this paper, we apply prospect theory (PT) to formulate the interaction between the defender of a cloud storage system and an APT attacker who makes subjective decisions that sometimes deviate from the results of expected utility theory, the basis of traditional game theory. In the PT-based cloud storage defense game with pure strategies, the defender chooses a scan interval for each storage device and the subjective APT attacker chooses his or her attack interval against each device. A mixed-strategy subjective storage defense game is also investigated, in which the subjective defender and the APT attacker each act under uncertainty about the action of the opponent. The Nash equilibria (NEs) of both games are derived, showing that the subjective view of an APT attacker can improve the utility of the defender. A Q-learning based APT defense scheme is proposed that the storage defender can apply, without being aware of the APT attack model or the subjectivity model of the attacker, in the dynamic APT defense game. Simulation results show that the proposed defense scheme suppresses the attack motivation of subjective APT attackers and improves the utility of the defender, compared with a benchmark greedy defense strategy.

Index Terms—Cloud storage, advanced persistent threat, game theory, prospect theory, Q-learning.

I. INTRODUCTION

Cloud storage is vulnerable to Advanced Persistent Threats (APTs), in which an attacker launches sophisticated, stealthy, continuous, and targeted attacks. By applying multiple sophisticated attack methods, APT attackers aim to steal information from a target cyber system, including cloud storage, over an extended period of time without being noticed. APT attackers usually take multiple attack phases and study the defense policy of the target system in advance, making it challenging to detect APTs and estimate the attack duration. According to [1], more than 65% of the organizations responding to the survey in 2014 witnessed an increase of APT attacks, and the current doctrine against APTs is to detect them as early as possible [2].

Game theory is an important tool for studying APT attacks. In the seminal work in [3], the interaction between an APT attacker and a defender was formulated as a stealthy takeover game. Most existing game theoretic studies on APT attacks are based on expected utility theory (EUT), in which each player chooses the strategy that maximizes its expected utility. However, as human beings, APT attackers are not always rational as assumed in traditional game theoretic models; they sometimes make subjective decisions under uncertainty that deviate from the results of expected utility theory, exhibiting risk seeking, loss aversion, and the nonlinear weighting of gains and losses [4], as illustrated by the Allais paradox described in [5]. Defenders are subject to the same subjective traits in decision-making, which makes the model here amenable to the use of prospect theory (PT).

By using a probability weighting function and a value function, prospect theory can model the subjective decision-making processes of end-users and successfully explain the deviations of their decisions from the EUT-based results [6]. Prospect theory has been applied to study interactions between people in many areas, such as the social sciences [7], [8], communication networks [9]–[15], and smart energy management [16], [17].

In this paper, prospect theory is applied to study cloud storage defense against advanced persistent threats and to investigate the impact of end-user subjectivity on storage defense. More specifically, we formulate a cloud storage defense game, in which a subjective attacker chooses his or her interval to launch APT attacks to compromise storage devices and a defender chooses its scan interval to recapture the compromised storage devices. The Prelec probability weighting function [18] is applied to model the subjective decision-making of the attacker and defender, under uncertain attack durations in the pure-strategy game or under the uncertain action of the opponent in the mixed-strategy game. The Nash equilibria (NEs) of both subjective games are derived to investigate the impact of end-user subjectivity on the APT defense games.

A Q-learning based APT defense strategy is proposed for the cloud storage defender who is unaware of the attack model and the subjectivity model of the APT attacker, to derive the optimal storage scan policy via trials in the dynamic games. Based on the iterative Bellman equation, the Q-learning algorithm, as a model-free reinforcement learning technique, is convenient to implement and can achieve the optimal policy in a Markov decision process (MDP). Simulations are performed to evaluate the performance of the Q-learning based APT defense scheme, showing that it can suppress the attack motivation of subjective APT attackers and improve the utility of the defender.
The main contributions of this work can be summarized as follows:

• We formulate a PT-based cloud storage defense game, in which both the APT attacker and the storage defender hold subjective views when choosing their attack or scan interval at each cloud storage device, under uncertain attack durations in the pure-strategy game or uncertain actions of the opponent in the mixed-strategy game. We derive the NEs of the PT-based storage defense games and provide the conditions under which the equilibria exist, showing that a subjective APT attacker tends to attack less frequently.

• We propose a Q-learning based APT defense scheme for the cloud storage defender to derive the optimal scan interval policy, without knowing the APT attack model or the subjectivity model, in the dynamic storage defense games against subjective APT attackers.

The remainder of the paper is organized as follows. We review related work in Section II and present the system model in Section III. We present a static subjective storage defense game with pure strategies in Section IV and investigate the mixed-strategy PT-based game in Section V. We propose the Q-learning based APT defense schemes in dynamic storage defense games in Section VI. We provide simulation results in Section VII and conclude in Section VIII.
II. RELATED WORK

Game theoretic approaches for modeling and studying APT attacks have received a lot of attention. In the seminal work of [3], a FlipIt game was proposed to formulate the stealthy and continuous attacks of APTs. The game between an overt defender and a stealthy attacker was investigated in [19], showing that a periodic defense strategy is the best response against a non-adaptive attacker. A cyber-physical signaling game among an APT attacker, a cloud defender and a mobile device was formulated in [20], in which the mobile device decides whether to trust the commands from the cloud under APTs. The defense based on the dynamic programming algorithm proposed in [21] provides a nearly optimal solution against APT attacks. The two-layer APT defense game formulated in [22] studies the joint threats from an APT attacker and insiders in the cyber system.

Prospect theory has been applied to study wireless communications and network security. For instance, a random access game formulated in [9] applies prospect theory to study the channel access between two subjective end-users in wireless networks. The impact of user subjectivity on both the wireless random access and data pricing games was identified in [10] based on prospect theory. The spectrum investment of subjective secondary operators was investigated in [11], and a PT-based sensing and leasing method was derived. The PT-based pricing and resource allocation scheme proposed in [15] improves the revenue of service providers in the presence of subjective users. A PT-based anti-jamming transmission game formulated in [23] investigates the impact of the subjectivity of end-users and jammers on the throughput in cognitive radio networks.

Game theory can also help develop security mechanisms for cloud computing. For example, the game theoretic study on co-resident attacks in [24] develops a semi-supervised learning based defense strategy to increase the attack costs. In the PT-based storage defense game against subjective APT attacks presented in [25], we derived the NE of the game under uncertain APT attack durations. In this paper, we consider generic APT scenarios with multiple storage devices and multiple attack duration levels, instead of the special case with a single device assumed in [25]. We also present a dynamic storage defense game with mixed strategies and a Q-learning based defense strategy to resist subjective APT attacks under uncertain device scan intervals.

III. SYSTEM MODEL

Fig. 1. Illustration of a cloud storage defense game, in which the defender scans storage device $i$ at interval $x_i^k$, while the APT attacker takes a duration $z_i^k$ to complete the $k$-th attack against device $i$ after attack interval $y_i^k$, with $1 \le i \le S$ and $k > 0$.

We consider a cloud storage system consisting of $S$ storage devices that are threatened by a subjective APT attacker (A) and are protected by a storage defender (D), as shown in Fig. 1. The defender chooses the time interval to perform the $k$-th detection at storage device $i$ against APT attacks, denoted by $x_i^k$, with $1 \le i \le S$. It is clear that $x_i^k > 0$, because the defender has to take time to scan a storage device to detect APT attacks. Upon detecting APT attacks, the defender restores a compromised storage device and protects the privacy of the data stored on the device. The defender is unaware of whether a storage device is compromised unless the device is monitored.

According to the APT model given in [21], the APT attacker can apply advanced and sophisticated methods and inject multiple types of malware to estimate the defense strategy of the target system. The attacker can also determine whether the attack has successfully taken control of the target storage device according to the data stolen from the device, and observe the size of the stolen data to determine when the attack is detected and stopped by the defender. The attacker waits a time $y_i^k$ before launching the $k$-th APT attack against storage device $i$, once the defender detects attacks and restores that storage device. The duration for the attacker to complete its $k$-th attack at storage device $i$, denoted by $z_i^k$, is in general a random positive variable that is unknown to both players. The defender is assumed to take charge of all the $S$ storage devices at the beginning.

We use the Prelec function in [18] to explain how a subjective attacker or defender over-weighs low-probability events and under-weighs outcomes having a high probability. Being easy to analyze, the Prelec function has been used to explain human decision deviations from EUT results in network security [12], [16]. Therefore, we apply this probability weighting function to model the subjective probability of the attacker (or defender), denoted by $w_A$ (or $w_D$), given by

$$w_r(p) = \exp\left(-\left(-\ln p\right)^{\alpha_r}\right), \quad r \in \{A, D\}, \tag{1}$$

where $\alpha_r \in (0, 1]$, the objective weight of player $r$, represents the distortion in decision-making. For example, if $\alpha_A = 1$, the attacker is objective and $w_A(p) = p$. Table I summarizes the notation used in the paper.

TABLE I
SUMMARY OF SYMBOLS AND NOTATION

Notation          Definition
$S$               Number of storage devices
$\alpha_{A/D}$    Objective weight of the attacker/defender
$x_i^k$/$y_i^k$   Defense/attack interval at time $k$ against device $i$
$z_i^k$           Duration to complete the $k$-th attack against device $i$
$G_i$             Defense gain of device $i$
$C_i$             Attack cost against device $i$
$L$               Number of non-zero attack duration levels
$M$               Number of detection interval levels
$N$               Number of non-zero attack interval levels
$\mathbf{p}$/$\mathbf{q}$   Mixed strategy of the defender/attacker
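As a quick numerical illustration of (1), the following sketch (in Python; the helper name `prelec` is ours, not from the paper) shows the characteristic over-weighting of rare events and under-weighting of likely ones by a subjective player:

```python
import math

def prelec(p: float, alpha: float) -> float:
    """Prelec probability weighting function w(p) = exp(-(-ln p)^alpha), Eq. (1)."""
    if p <= 0.0:
        return 0.0   # w(0) = 0 by continuity
    if p >= 1.0:
        return 1.0   # w(1) = 1
    return math.exp(-((-math.log(p)) ** alpha))

# An objective player (alpha = 1) keeps probabilities unchanged, while a
# subjective player (alpha < 1) distorts them:
print(prelec(0.05, 1.0))   # 0.05
print(prelec(0.05, 0.5))   # ~0.18: a rare event is over-weighted
print(prelec(0.95, 0.5))   # ~0.80: a likely event is under-weighted
```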
IV. SUBJECTIVE STORAGE DEFENSE GAME WITH PURE-STRATEGY

The interaction between an APT attacker and a storage defender over $S$ storage devices is formulated as a subjective cloud storage defense game with pure strategies, denoted by $G$. In this game, the storage defender chooses the scan interval $x_i$ for storage device $i$, and the attacker decides his or her attack interval $y_i$ against storage device $i$. The defense interval and the attack interval are normalized for simplicity of analysis. According to the maximum scan interval of the defender, denoted by $T$, the attacker and defender compete to take charge of the $S$ storage devices, with $0 < x_i \le 1$ and $0 \le y_i \le 1$, $\forall 1 \le i \le S$. If the attack interval, denoted by $T_a$, is greater than $T$, the game can be divided into $K = \lceil T_a/T \rceil$ interactions, with $y_i = 1$, $\forall i < K$ and $y_K = \mathrm{mod}(T_a, T)$, where $\lceil \cdot \rceil$ is the ceiling function.

The gain of the defender for a longer scan interval at storage device $i$ is denoted by $G_i$, and the attack cost against device $i$ is denoted by $C_i$. As shown in Fig. 1, the time interval during which storage device $i$ is not compromised and the data is safe is $\min\left((y_i + z_i)/x_i, 1\right)$. Therefore, the attack rate, denoted by $R$, is defined as the normalized "bad" interval during which data privacy is at risk, averaged over the $S$ storage devices, and is given by

$$R = 1 - \frac{1}{S} \sum_{i=1}^{S} \min\left(\frac{y_i + z_i}{x_i}, 1\right). \tag{2}$$

The utility of the defender depends on the normalized "good" interval during which each storage device is protected by the defender, i.e., $\min\left((y_i + z_i)/x_i, 1\right)$, and the gain of a longer defense interval. Similar to the game model presented in [21], the utility of the defender, denoted by $u_D$, is defined as

$$u_D(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{S} \left[\min\left(\frac{y_i + z_i}{x_i}, 1\right) + x_i G_i\right]. \tag{3}$$

The utility of the attacker, denoted by $u_A$, is defined as

$$u_A(\mathbf{x}, \mathbf{y}) = -\sum_{i=1}^{S} \left[\min\left(\frac{y_i + z_i}{x_i}, 1\right) + I(y_i < x_i)\, C_i\right], \tag{4}$$

where the indicator function $I(\xi) = 1$ if $\xi$ is true and 0 otherwise. The desire of the attacker to steal information from the storage device is modeled by $-\min\left((y_i + z_i)/x_i, 1\right)$.
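To make (3) and (4) concrete, here is a minimal sketch that evaluates both utilities for one realization of the attack durations; `utilities` is a hypothetical helper name, and all arguments are length-$S$ sequences:

```python
def utilities(x, y, z, G, C):
    """Per-realization utilities u_D and u_A of Eqs. (3)-(4).

    x: scan intervals, y: attack intervals, z: attack durations,
    G: defense gains, C: attack costs, one entry per storage device.
    """
    u_D, u_A = 0.0, 0.0
    for xi, yi, zi, Gi, Ci in zip(x, y, z, G, C):
        protected = min((yi + zi) / xi, 1.0)         # normalized "good" interval
        u_D += protected + xi * Gi                   # Eq. (3)
        u_A -= protected + (Ci if yi < xi else 0.0)  # Eq. (4): cost paid only when attacking
    return u_D, u_A

# One device: scan interval 0.5, immediate attack, duration 0.3:
print(utilities([0.5], [0.0], [0.3], [0.6], [0.62]))   # (0.9, -1.22)
```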
The time interval $z_i$ for the attacker to successfully launch an APT attack against storage device $i$ is difficult to estimate, and is quantized into $L$ non-zero levels following the distribution $[P_l^i]_{0 \le l \le L}$, where $P_l^i = \Pr(z_i = l/L)$, $\forall 0 \le l \le L$ and $1 \le i \le S$. By definition, we have $P_l^i \ge 0$ and $\sum_{l=0}^{L} P_l^i = 1$. The expected utilities of the defender and the attacker over the realizations of the attack duration $z_i$, denoted by $U_D^{EUT}$ and $U_A^{EUT}$, respectively, are given by (3) and (4) as

$$U_D^{EUT}(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{S} \left[\sum_{l=0}^{L} P_l^i \min\left(\frac{y_i L + l}{x_i L}, 1\right) + x_i G_i\right] \tag{5}$$

$$U_A^{EUT}(\mathbf{x}, \mathbf{y}) = -\sum_{i=1}^{S} \left[\sum_{l=0}^{L} P_l^i \min\left(\frac{y_i L + l}{x_i L}, 1\right) + I(y_i < x_i)\, C_i\right]. \tag{6}$$

The Prelec probability weighting function in (1) is used to model the subjective decision-making of the players under uncertain attack durations. The PT-based utilities of the subjective defender and attacker, denoted by $U_D^{PT}$ and $U_A^{PT}$, respectively, are given by replacing the objective probability $P_l^i$ in (5) and (6) with the subjective probability $w(P_l^i)$, i.e.,

$$U_D^{PT}(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{S} \left[\sum_{l=0}^{L} w_D(P_l^i) \min\left(\frac{y_i L + l}{x_i L}, 1\right) + x_i G_i\right] \tag{7}$$

$$U_A^{PT}(\mathbf{x}, \mathbf{y}) = -\sum_{i=1}^{S} \left[\sum_{l=0}^{L} w_A(P_l^i) \min\left(\frac{y_i L + l}{x_i L}, 1\right) + I(y_i < x_i)\, C_i\right]. \tag{8}$$
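The PT-based utilities (7)-(8) differ from (5)-(6) only in passing each duration probability through the Prelec function. A sketch, reusing the `prelec` helper from Section III (the function name is ours):

```python
def pt_utilities(x, y, P, G, C, alpha_D, alpha_A):
    """PT-based utilities of Eqs. (7)-(8): the attack duration of device i
    equals l/L with probability P[i][l], and each player weighs that
    distribution with its own Prelec function."""
    L = len(P[0]) - 1
    U_D, U_A = 0.0, 0.0
    for i, (xi, yi, Gi, Ci) in enumerate(zip(x, y, G, C)):
        for l, Pl in enumerate(P[i]):
            good = min((yi * L + l) / (xi * L), 1.0)
            U_D += prelec(Pl, alpha_D) * good        # Eq. (7)
            U_A -= prelec(Pl, alpha_A) * good        # Eq. (8)
        U_D += xi * Gi
        U_A -= Ci if yi < xi else 0.0
    return U_D, U_A

# Single device, L = 2 duration levels with distribution [P0, P1, 1-P0-P1]:
print(pt_utilities([0.5], [0.0], [[0.46, 0.5, 0.04]], [0.6], [0.62], 1.0, 0.8))
```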
A Nash equilibrium (NE) of the PT-based storage defense game $G$, denoted by $(\mathbf{x}^*, \mathbf{y}^*)$, consists of the best response of each player in terms of the PT-based utility, given that the opponent uses the NE strategy. By definition, we have

$$\mathbf{x}^* = \arg\max_{\mathbf{x}} U_D^{PT}(\mathbf{x}, \mathbf{y}^*) \tag{9}$$

$$\mathbf{y}^* = \arg\max_{\mathbf{y}} U_A^{PT}(\mathbf{x}^*, \mathbf{y}). \tag{10}$$

The objective weight of the attacker $\alpha_A$ can be estimated by the defender according to the defense history, or provided by security agents. Similarly, the attacker can obtain both $\alpha_D$ and $\alpha_A$ according to the attack history against the target storage system. On the other hand, if $\alpha_A$ and $\alpha_D$ are unknown, the defender can apply the Q-learning based defense strategy to derive its best defense policy, which converges to the NEs, as described in Section VI.

We first evaluate the NE of the static cloud storage defense game $G$ with a single storage device and 2 non-zero attack duration levels, i.e., the probability mass function of $z$ is given by $[P_0, P_1, 1 - P_0 - P_1]$. The index $i$ in the superscript is omitted if no confusion occurs.
Theorem 1. The subjective storage defense game $G$ with $S = 1$ and $L = 2$ has an NE $(x^*, y^*) = (0.5, 0)$, if

$$I_1: \quad G \le \exp\left(-(-\ln P_1)^{\alpha_D}\right) \tag{11a}$$
$$\qquad\;\; C \le \exp\left(-(-\ln P_0)^{\alpha_A}\right); \tag{11b}$$

$(x^*, y^*) = (1, 0)$, if

$$I_2: \quad G > \exp\left(-(-\ln P_1)^{\alpha_D}\right) \tag{12a}$$
$$\qquad\;\; C < \exp\left(-(-\ln P_0)^{\alpha_A}\right) + 0.5 \exp\left(-(-\ln P_1)^{\alpha_A}\right); \tag{12b}$$

and $(x^*, y^*) = (1, 1)$, if

$$I_3: \quad C > \exp\left(-(-\ln P_0)^{\alpha_A}\right) + 0.5 \exp\left(-(-\ln P_1)^{\alpha_A}\right). \tag{13}$$
Proof: By (1) and (8), we see that if $0 \le y < 0.5$,

$$U_A^{PT}(0.5, 0) = -w_A(P_1) - w_A(1 - P_0 - P_1) - C \ge -2y\, w_A(P_0) - w_A(P_1) - w_A(1 - P_0 - P_1) - C = U_A^{PT}(0.5, y). \tag{14}$$

Similarly, if (11b) holds, $\forall 0.5 \le y \le 1$, we have

$$U_A^{PT}(0.5, 0) = -w_A(P_1) - w_A(1 - P_0 - P_1) - C \ge -w_A(P_0) - w_A(P_1) - w_A(1 - P_0 - P_1) = U_A^{PT}(0.5, y). \tag{15}$$

Thus, (10) holds for $(x^*, y^*) = (0.5, 0)$. By (7), if $0 < x \le 0.5$, we have

$$U_D^{PT}(x, 0) = w_D(P_1) + w_D(1 - P_0 - P_1) + xG, \tag{16}$$

which increases linearly with $x$ and is maximized at $x = 0.5$. Similarly, if $0.5 < x \le 1$, we have

$$U_D^{PT}(x, 0) = \frac{1}{2x} w_D(P_1) + w_D(1 - P_0 - P_1) + xG, \tag{17}$$

and

$$\left.\frac{\partial^2 U_D^{PT}}{\partial x^2}\right|_{y=0} = \frac{1}{x^3}\, w_D(P_1) \ge 0, \tag{18}$$

indicating that $U_D^{PT}(x, 0)$ is convex in $x$ and is thus maximized at $x = 0.5$ or $x = 1$. If (11a) holds, by (7), we have

$$U_D^{PT}(0.5, 0) = w_D(P_1) + w_D(1 - P_0 - P_1) + 0.5G \ge 0.5\, w_D(P_1) + w_D(1 - P_0 - P_1) + G = U_D^{PT}(1, 0). \tag{19}$$

Thus, (9) holds for $(x^*, y^*) = (0.5, 0)$, which is therefore an NE of the game $G$. Similarly, we can prove that $(1, 0)$ and $(1, 1)$ are also NEs of the game under their respective conditions. ∎

Under a low attack cost, as shown in (11b) and (12b), the APT attack is launched immediately, and under (12a) the scan interval is maximized to save energy. Otherwise, if the attack cost is high, as shown in (13), a subjective APT attacker has no motivation to launch attacks against the storage device. As a concrete example, we evaluate the performance of the storage defense game $G$ with $C = 0.62$, $G = 0.6$, $P_0 = 0.46$, $P_1 = 0.5$, $\alpha_D = 1$ and $L = 2$. As shown in Fig. 2, the attack rate increases sharply from 0 to 0.7 as the attacker's objective weight $\alpha_A$ crosses a turning point around 0.42, because a subjective APT attacker tends to overweigh his or her attack cost. The objective weight $\alpha_A = 0.42$ marks the transition from Condition $I_3$ to $I_2$; correspondingly, the utility of the defender decreases sharply from 1.6 to 0.89.

Fig. 2. Performance of the static PT-based storage defense game $G$ at the NEs, with $C = 0.62$, $G = 0.6$, $P_0 = 0.46$, $P_1 = 0.5$, $\alpha_D = 1$ and $L = 2$: (a) attack rate; (b) utility of the attacker and of the defender.
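The regime boundaries of Theorem 1 are easy to verify numerically. The sketch below (the helper name is ours, reusing `prelec` from Section III; the conditions are checked in the order $I_3$, $I_1$, $I_2$) reproduces the turning point near $\alpha_A = 0.42$ for the parameters of Fig. 2:

```python
def theorem1_region(G, C, P0, P1, alpha_D, alpha_A):
    """Predict the pure-strategy NE (x*, y*) of Theorem 1 (S = 1, L = 2)."""
    wD_P1 = prelec(P1, alpha_D)
    wA_P0 = prelec(P0, alpha_A)
    wA_P1 = prelec(P1, alpha_A)
    if C > wA_P0 + 0.5 * wA_P1:                  # I3, Eq. (13): no attack motivation
        return (1.0, 1.0)
    if G <= wD_P1 and C <= wA_P0:                # I1, Eqs. (11a)-(11b)
        return (0.5, 0.0)
    if G > wD_P1 and C < wA_P0 + 0.5 * wA_P1:    # I2, Eqs. (12a)-(12b)
        return (1.0, 0.0)
    return None                                  # no pure-strategy NE predicted

# Fig. 2 parameters: the NE flips from (1, 1) to (1, 0) near alpha_A = 0.42:
for aA in (0.3, 0.42, 0.6):
    print(aA, theorem1_region(0.6, 0.62, 0.46, 0.5, 1.0, aA))
```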
Next, we consider the storage defense game with a single storage device and 3 non-zero attack duration levels, i.e., the distribution of the attack duration follows $[P_0, P_1, P_2, 1 - P_0 - P_1 - P_2]$.

Theorem 2. The subjective storage defense game $G$ with $S = 1$ and $L = 3$ has an NE $(x^*, y^*) = (1/3, 0)$, if

$$I_4: \quad G < \min\left\{\frac{3}{2} \exp\left(-(-\ln P_1)^{\alpha_D}\right),\; \exp\left(-(-\ln P_1)^{\alpha_D}\right) + \frac{1}{2} \exp\left(-(-\ln P_2)^{\alpha_D}\right)\right\} \tag{20a}$$
$$\qquad\;\; C < \exp\left(-(-\ln P_0)^{\alpha_A}\right); \tag{20b}$$

$(x^*, y^*) = (2/3, 0)$, if

$$I_5: \quad \frac{3}{2} \exp\left(-(-\ln P_1)^{\alpha_D}\right) < G < \frac{1}{2} \exp\left(-(-\ln P_1)^{\alpha_D}\right) + \exp\left(-(-\ln P_2)^{\alpha_D}\right) \tag{21a}$$
$$\qquad\;\; C < \exp\left(-(-\ln P_0)^{\alpha_A}\right) + \frac{1}{2} \exp\left(-(-\ln P_1)^{\alpha_A}\right); \tag{21b}$$

$(x^*, y^*) = (1, 0)$, if

$$I_6: \quad G > \max\left\{\exp\left(-(-\ln P_1)^{\alpha_D}\right) + \frac{1}{2} \exp\left(-(-\ln P_2)^{\alpha_D}\right),\; \frac{1}{2} \exp\left(-(-\ln P_1)^{\alpha_D}\right) + \exp\left(-(-\ln P_2)^{\alpha_D}\right)\right\} \tag{22a}$$
$$\qquad\;\; C < \exp\left(-(-\ln P_0)^{\alpha_A}\right) + \frac{2}{3} \exp\left(-(-\ln P_1)^{\alpha_A}\right) + \frac{1}{3} \exp\left(-(-\ln P_2)^{\alpha_A}\right); \tag{22b}$$

and $(x^*, y^*) = (1, 1)$, if

$$I_7: \quad C > \exp\left(-(-\ln P_0)^{\alpha_A}\right) + \frac{2}{3} \exp\left(-(-\ln P_1)^{\alpha_A}\right) + \frac{1}{3} \exp\left(-(-\ln P_2)^{\alpha_A}\right). \tag{23}$$

Proof: The proof is given in Appendix A. ∎

Under a low attack cost, as shown in (20b), (21b) and (22b), the attacker launches an attack immediately against the storage device, and under (22a) the defender maximizes its detection interval to save energy. If the attack cost is high, as in (23), the attacker has no motivation to launch APTs.

Now we consider the case with two storage devices that have the same detection gain, i.e., $G_1 = G_2 = G$, and the same attack duration distribution, i.e., $P_l^1 = P_l^2 = P_l$.

Theorem 3. If $S = 2$ and $L = 2$, the subjective storage defense game $G$ has an NE $(\mathbf{x}^*, \mathbf{y}^*) = (0.5, 0)$, if

$$I_8: \quad G \le \exp\left(-(-\ln P_1)^{\alpha_D}\right) \tag{24a}$$
$$\qquad\;\; \max(C_1, C_2) \le \exp\left(-(-\ln P_0)^{\alpha_A}\right); \tag{24b}$$

$(\mathbf{x}^*, \mathbf{y}^*) = (1, 0)$, if

$$I_9: \quad G > \exp\left(-(-\ln P_1)^{\alpha_D}\right) \tag{25a}$$
$$\qquad\;\; \max(C_1, C_2) \le \exp\left(-(-\ln P_0)^{\alpha_A}\right) + 0.5 \exp\left(-(-\ln P_1)^{\alpha_A}\right); \tag{25b}$$

$(\mathbf{x}^*, \mathbf{y}^*) = ([1, 0.5], [1, 0])$, if

$$I_{10}: \quad G \le \exp\left(-(-\ln P_1)^{\alpha_D}\right) \tag{26a}$$
$$\qquad\;\;\; C_1 > \exp\left(-(-\ln P_0)^{\alpha_A}\right) + 0.5 \exp\left(-(-\ln P_1)^{\alpha_A}\right) \tag{26b}$$
$$\qquad\;\;\; C_2 \le \exp\left(-(-\ln P_0)^{\alpha_A}\right); \tag{26c}$$

$(\mathbf{x}^*, \mathbf{y}^*) = (1, [1, 0])$, if

$$I_{11}: \quad G > \exp\left(-(-\ln P_1)^{\alpha_D}\right) \tag{27a}$$
$$\qquad\;\;\; C_2 < \exp\left(-(-\ln P_0)^{\alpha_A}\right) + 0.5 \exp\left(-(-\ln P_1)^{\alpha_A}\right) < C_1; \tag{27b}$$

and $(\mathbf{x}^*, \mathbf{y}^*) = (1, 1)$, if

$$I_{12}: \quad \min(C_1, C_2) > \exp\left(-(-\ln P_0)^{\alpha_A}\right) + 0.5 \exp\left(-(-\ln P_1)^{\alpha_A}\right). \tag{28}$$

Proof: By (8), if $0 \le y_1, y_2 < 0.5$, we have

$$U_A^{PT}(0.5, 0) = -2w_A(P_1) - 2w_A(1 - P_0 - P_1) - C_1 - C_2 \ge -(2y_1 + 2y_2)\, w_A(P_0) - 2w_A(P_1) - 2w_A(1 - P_0 - P_1) - C_1 - C_2 = U_A^{PT}(0.5, \mathbf{y}). \tag{29}$$

If $0 \le y_2 < 0.5 \le y_1 \le 1$ and $C_1 < w_A(P_0)$, we have

$$U_A^{PT}(0.5, 0) = -2w_A(P_1) - 2w_A(1 - P_0 - P_1) - C_1 - C_2 \ge -(1 + 2y_2)\, w_A(P_0) - 2w_A(P_1) - 2w_A(1 - P_0 - P_1) - C_2 = U_A^{PT}(0.5, \mathbf{y}). \tag{30}$$

Similarly, (10) also holds if $0 \le y_1 < 0.5 \le y_2 \le 1$. Thus, (10) holds for $(\mathbf{x}^*, \mathbf{y}^*) = (0.5, 0)$.

By (7), if $0 < x_1, x_2 \le 0.5$, we have

$$U_D^{PT}(0.5, 0) = 2w_D(P_1) + 2w_D(1 - P_0 - P_1) + G \ge 2w_D(P_1) + 2w_D(1 - P_0 - P_1) + (x_1 + x_2)G = U_D^{PT}(\mathbf{x}, 0). \tag{31}$$

If $0 < x_2 < 0.5 \le x_1 \le 1$ and $G \le w_D(P_1)$, we have

$$U_D^{PT}(0.5, 0) = 2w_D(P_1) + 2w_D(1 - P_0 - P_1) + G \ge \left(\frac{1}{2x_1} + 1\right) w_D(P_1) + 2w_D(1 - P_0 - P_1) + (x_1 + x_2)G = U_D^{PT}(\mathbf{x}, 0). \tag{32}$$

Similarly, (9) holds for the other cases, indicating that $(0.5, 0)$ is an NE of the game. The other NEs of the game can be proved similarly. ∎

If the attack cost is low, i.e., (24b) holds, the attacker launches APT attacks and the defender scans the two devices at the same frequency. If (28) holds with a high attack cost, the attack motivation is suppressed and the defender maximizes the scan interval.
Theorem 4. An NE of the subjective storage defense game $G$ with $S$ storage devices and $L$ non-zero attack duration levels is given by $(\mathbf{x}^*, \mathbf{y}^*) = (\mathbf{1}, \mathbf{1})$, if

$$C_i > \sum_{l=0}^{L} \frac{L - l}{L} \exp\left(-\left(-\ln P_l^i\right)^{\alpha_A}\right), \quad \forall 1 \le i \le S. \tag{33}$$

Proof: If (33) holds, by (8), $\forall 0 \le y_i \le 1$ we have

$$U_A^{PT}(\mathbf{1}, \mathbf{1}) = -\sum_{i=1}^{S} \sum_{l=0}^{L} \exp\left(-\left(-\ln P_l^i\right)^{\alpha_A}\right) \ge -\sum_{i=1}^{S} \sum_{l=0}^{L} \exp\left(-\left(-\ln P_l^i\right)^{\alpha_A}\right) \min\left(\frac{y_i L + l}{L}, 1\right) - \sum_{i=1}^{S} C_i = U_A^{PT}(\mathbf{1}, \mathbf{y}). \tag{34}$$

Thus, (10) holds for $(\mathbf{x}^*, \mathbf{y}^*) = (\mathbf{1}, \mathbf{1})$.

By (7), $\forall 0 < x_i \le 1$, we have

$$U_D^{PT}(\mathbf{1}, \mathbf{1}) = \sum_{i=1}^{S} \sum_{l=0}^{L} \exp\left(-\left(-\ln P_l^i\right)^{\alpha_D}\right) + \sum_{i=1}^{S} G_i \ge \sum_{i=1}^{S} \sum_{l=0}^{L} \exp\left(-\left(-\ln P_l^i\right)^{\alpha_D}\right) + \sum_{i=1}^{S} x_i G_i = U_D^{PT}(\mathbf{x}, \mathbf{1}). \tag{35}$$

Thus, (9) holds for $(\mathbf{x}^*, \mathbf{y}^*) = (\mathbf{1}, \mathbf{1})$, indicating that $(\mathbf{1}, \mathbf{1})$ is an NE of the game. ∎

The subjective attacker has no motivation to launch APTs, and the scan interval is maximized, if (33) holds for a high attack cost. Note that for $S = 1$ and $L = 2$, the right-hand side of (33) reduces to the threshold of Condition $I_3$ in (13).

V. SUBJECTIVE PT-BASED STORAGE DEFENSE GAME WITH MIXED-STRATEGY

In the subjective cloud storage defense game with mixed strategies, denoted by $G'$, the defender quantizes the scan interval into $M$ levels, i.e., $x_i \in \{m/M\}_{1 \le m \le M}$, and chooses $x_i$ according to the mixed strategy denoted by $\mathbf{p} = [p_m^i]_{1 \le m \le M,\, 1 \le i \le S}$, where $p_m^i = \Pr(x_i = m/M)$. The APT attacker quantizes his or her attack interval into $N$ non-zero levels, i.e., $y_i \in \{n/N\}_{0 \le n \le N}$, and determines the mixed strategy denoted by $\mathbf{q} = [q_n^i]_{0 \le n \le N,\, 1 \le i \le S}$, where $q_n^i = \Pr(y_i = n/N)$. By definition, we have $p_m^i \ge 0$, $q_n^i \ge 0$, $\sum_{m=1}^{M} p_m^i = 1$ and $\sum_{n=0}^{N} q_n^i = 1$.

For simplicity, we assume a known and constant attack duration $z_i$ in the mixed-strategy game $G'$, to focus on the impact of uncertain opponent actions. The expected utilities of the defender and the attacker in the mixed-strategy game $G'$ are given by (3) and (4) as

$$U_D^{EUT}(\mathbf{p}, \mathbf{q}) = \sum_{i=1}^{S} \sum_{m=1}^{M} \sum_{n=0}^{N} p_m^i q_n^i \left[\min\left(\frac{nM + z_i M N}{mN}, 1\right) + \frac{m G_i}{M}\right] \tag{36}$$

$$U_A^{EUT}(\mathbf{p}, \mathbf{q}) = \sum_{i=1}^{S} \sum_{m=1}^{M} \sum_{n=0}^{N} p_m^i q_n^i \left[-\min\left(\frac{nM + z_i M N}{mN}, 1\right) - I\left(\frac{n}{N} < \frac{m}{M}\right) C_i\right]. \tag{37}$$

In the PT-based game $G'$, the subjective defender and attacker make decisions to maximize their PT-based utilities, given by

$$U_D^{PT}(\mathbf{p}, \mathbf{q}) = \sum_{i=1}^{S} \sum_{m=1}^{M} \sum_{n=0}^{N} p_m^i\, w_D(q_n^i) \left[\min\left(\frac{nM + z_i M N}{mN}, 1\right) + \frac{m G_i}{M}\right] \tag{38}$$

$$U_A^{PT}(\mathbf{p}, \mathbf{q}) = \sum_{i=1}^{S} \sum_{m=1}^{M} \sum_{n=0}^{N} w_A(p_m^i)\, q_n^i \left[-\min\left(\frac{nM + z_i M N}{mN}, 1\right) - I\left(\frac{n}{N} < \frac{m}{M}\right) C_i\right]. \tag{39}$$

By definition, an NE of the PT-based mixed-strategy storage defense game $G'$, denoted by $(\mathbf{p}^*, \mathbf{q}^*)$, is given by

$$\mathbf{p}^* = \arg\max_{\mathbf{p}} U_D^{PT}(\mathbf{p}, \mathbf{q}^*) \tag{40a}$$
$$\mathbf{q}^* = \arg\max_{\mathbf{q}} U_A^{PT}(\mathbf{p}^*, \mathbf{q}) \tag{40b}$$
$$\sum_{m=1}^{M} p_m^i = 1, \quad \mathbf{p} \succeq \mathbf{0} \tag{40c}$$
$$\sum_{n=0}^{N} q_n^i = 1, \quad \mathbf{q} \succeq \mathbf{0}, \quad 1 \le i \le S. \tag{40d}$$
Theorem 5. The NE of the subjective storage defense game $G'$ is given by

$$\left[u_D^i\!\left(\frac{m}{M}, \frac{n}{N}\right)\right]_{1 \le m \le M,\, 0 \le n \le N} \left[w_D\!\left(q_k^{i*}\right)\right]_{0 \le k \le N}^{T} = \lambda_D^i\, \mathbf{1}_M \tag{41a}$$

$$\left[u_A^i\!\left(\frac{m}{M}, \frac{n}{N}\right)\right]_{1 \le m \le M,\, 0 \le n \le N}^{T} \left[w_A\!\left(p_k^{i*}\right)\right]_{1 \le k \le M}^{T} = \lambda_A^i\, \mathbf{1}_{N+1} \tag{41b}$$

$$\sum_{m=1}^{M} p_m^{i*} = 1, \quad \mathbf{p} \succeq \mathbf{0}, \quad 1 \le i \le S \tag{41c}$$

$$\sum_{n=0}^{N} q_n^{i*} = 1, \quad \mathbf{q} \succeq \mathbf{0}, \quad 1 \le i \le S \tag{41d}$$

$$\lambda_D^i \ge 0, \quad \lambda_A^i \le 0, \tag{41e}$$

if the solution exists, where $\mathbf{1}_\eta$ is the $\eta$-dimensional all-one column vector, and

$$u_D^i\!\left(\frac{m}{M}, \frac{n}{N}\right) = \min\left(\frac{nM + z_i M N}{mN}, 1\right) + \frac{m G_i}{M} \tag{42}$$

$$u_A^i\!\left(\frac{m}{M}, \frac{n}{N}\right) = -\min\left(\frac{nM + z_i M N}{mN}, 1\right) - I\left(\frac{n}{N} < \frac{m}{M}\right) C_i. \tag{43}$$
Proof: The Karush-Kuhn-Tucker (KKT) conditions of (40) are given by

$$\begin{cases} L_D = U_D^{PT}(\mathbf{p}, \mathbf{q}^*) - \varphi \left(\sum_{m=1}^{M} p_m^i - 1\right) + \sum_{m=1}^{M} \mu_m^i p_m^i \\ \dfrac{\partial L_D}{\partial p_m^i} = 0 \\ -p_m^i \le 0, \quad \mu_m^i \ge 0, \quad \mu_m^i p_m^i = 0, \quad 1 \le m \le M \\ \sum_{m=1}^{M} p_m^i - 1 = 0. \end{cases} \tag{44}$$

According to (38), we apply complementary slackness to (44) to obtain

$$\begin{cases} \sum_{n=0}^{N} u_D^i\!\left(\dfrac{k}{M}, \dfrac{n}{N}\right) w_D\!\left(q_n^{i*}\right) - \lambda_D^i = 0, \quad 1 \le k \le M \\ \sum_{m=1}^{M} p_m^i = 1 \\ \lambda_D^i \ge 0, \end{cases} \tag{45}$$

which yields (41a). Similarly, we have (41b). ∎
Corollary 1. If $S = 1$, $M = 2$, $N = 1$,

$$\frac{u_D(0.5, 0) - u_D(1, 0)}{u_D(1, 1) - u_D(0.5, 1)} > 1, \tag{46}$$

$$\frac{u_A(0.5, 1) - u_A(0.5, 0)}{u_A(1, 0) - u_A(1, 1)} > 1, \tag{47}$$

the subjective storage defense game $G'$ has a unique NE given by

$$\ln\left(\frac{u_A(0.5, 1) - u_A(0.5, 0)}{u_A(1, 0) - u_A(1, 1)}\right) + \left(-\ln\left(1 - p_1^*\right)\right)^{\alpha_A} - \left(-\ln p_1^*\right)^{\alpha_A} = 0 \tag{48}$$

$$\ln\left(\frac{u_D(0.5, 0) - u_D(1, 0)}{u_D(1, 1) - u_D(0.5, 1)}\right) + \left(-\ln\left(1 - q_0^*\right)\right)^{\alpha_D} - \left(-\ln q_0^*\right)^{\alpha_D} = 0. \tag{49}$$

Proof: According to (1), (41a) and (46), we have (49). Similarly, we can obtain (48) by (1), (41b) and (47). Next, we prove the uniqueness of $q_0^*$. As $f(x) = (-\ln x)^{\alpha}$ monotonically decreases with $x$, by (46) and (49) we have $f(q_0^*) > f(1 - q_0^*)$, yielding $0 < q_0^* < 1 - q_0^* < 1$. Thus we have $0 < q_0^* < 1/2$. If $0 < x < 1/2$, we have

$$\frac{d\left(f(1 - x) - f(x)\right)}{dx} = -f'(1 - x) - f'(x) > 0, \tag{50}$$

since $f' < 0$, indicating that $f(1 - x) - f(x)$ increases with $x$. Therefore, (49) has a unique solution. Similarly, (48) has a unique solution. ∎
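Equations (48) and (49) have no closed form for general $\alpha$, but by the monotonicity argument above each has a unique root in $(0, 1/2)$, so bisection suffices. A minimal sketch (the helper name is ours), assuming the utility-difference ratio of (46) or (47) exceeds 1:

```python
import math

def solve_ne_mix(ratio: float, alpha: float, tol: float = 1e-10) -> float:
    """Solve ln(ratio) + (-ln(1 - v))^alpha - (-ln v)^alpha = 0 for v in (0, 1/2),
    the common fixed-point form of Eqs. (48)-(49)."""
    f = lambda v: (-math.log(v)) ** alpha
    g = lambda v: math.log(ratio) + f(1.0 - v) - f(v)   # increasing on (0, 1/2)
    lo, hi = 1e-12, 0.5          # g(lo) < 0 < g(hi) when ratio > 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Sanity check: for an objective player (alpha = 1) the equation reduces to
# ln(ratio * v / (1 - v)) = 0, whose root is 1 / (1 + ratio):
print(solve_ne_mix(2.0, 1.0))   # ~0.3333
```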
According to Corollary 1, the NE of the EUT-based storage defense game is given by

$$(p_1^*, q_0^*) = \left(\frac{1 - z - C}{\min(2z, 1) - z},\; \frac{G}{2\min(2z, 1) - 2z}\right). \tag{51}$$

Fig. 3. Performance of the static subjective storage defense game with mixed-strategy $G'$ at the NE, with $C = 0.5$, $G = 0.1$ and $z = 0.2$: (a) attack rate; (b) scan frequency $p_1^*$, with $\alpha_D = 1$; (c) utility of the attacker and of the defender.

As shown in Fig. 3, the attack rate of the subjective storage defense game $G'$ decreases with $\alpha_D$, e.g., by 1.04% as $\alpha_D$ changes from 0.5 to 1, because a subjective defender scans less frequently. Consequently, the utility of the defender increases from 0.66 to 0.69 as $\alpha_D$ changes from 0.5 to 1. In addition, the scan frequency decreases by 15% as $\alpha_A$ changes from 0.5 to 1, and thus the utility of the defender decreases from 0.7 to 0.69.
VI. DYNAMIC PT-BASED STORAGE DEFENSE GAME

If the defender is unaware of the APT attack model and the subjective view model in the dynamic subjective cloud storage defense game, the storage defender can apply the Q-learning technique, a model-free and widely used reinforcement learning technique, to derive an optimal action-selection policy in the Markov decision process. The Q-learning based defense strategy updates the quality function, denoted by $Q(s, x)$, which is the expected long-term discounted reward of action $x$ at system state $s$; the state consists of the action of the opponent and the parameters of the environment.

A. Dynamic Storage Defense Game with Pure-Strategy

The dynamic PT-based game with pure strategies, denoted by $\mathcal{G}$, consists of a storage defender and a subjective APT attacker under uncertain attack duration against a storage device. The system state observed at time $k$ is the total attack duration in the last slot, $s^k = z^{k-1} + y^{k-1}$. The value function $V(s)$ provides the maximum expected reward of the defender at system state $s$. The defender updates the Q-function based on the immediate utility $u_D$ and the value function as follows:

$$Q\left(s^k, x^k\right) \leftarrow (1 - \gamma)\, Q\left(s^k, x^k\right) + \gamma \left[u_D\left(s^k, x^k\right) + \delta V\left(s^{k+1}\right)\right] \tag{52}$$

$$V\left(s^k\right) = \max_{x} Q\left(s^k, x\right), \tag{53}$$

where $\delta \in [0, 1]$ is the discount factor on the future reward, and $\gamma \in (0, 1]$ is the learning rate weighting the current experience.

By applying the $\epsilon$-greedy policy, the defender chooses its scan interval $x^k$ to maximize its current Q-function with high probability, i.e.,

$$\Pr\left(x^k = \tilde{x}\right) = \begin{cases} 1 - \epsilon, & \tilde{x} = \arg\max_{x} Q\left(s^k, x\right) \\ \dfrac{\epsilon}{M - 1}, & \text{otherwise.} \end{cases} \tag{54}$$

The Q-learning based storage defense algorithm is summarized in Algorithm 1.

Algorithm 1: APT defense in a dynamic game with pure-strategy.
  Initialize $\gamma = 0.7$, $\delta = 0.7$, $y^0$, $z^0$, $Q(s, x) = 0$, $V(s) = 0$, $\forall x, s$.
  For $k = 1, 2, 3, \ldots$
    $s^k = y^{k-1} + z^{k-1}$
    Choose $x^k$ via (54)
    Scan the storage device after time $x^k$
    Observe $u_D$ and $y^k + z^k$
    Update $Q(s^k, x^k)$ via (52)
    Update $V(s^k)$ via (53)
  End for
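As a concrete rendering of Algorithm 1, the following Python sketch runs the learning loop against a simulator hook; `env` and the function name are our assumptions, not part of the paper:

```python
import random
from collections import defaultdict

def q_learning_defense(env, scan_levels, slots=8000,
                       gamma=0.7, delta=0.7, eps=0.1):
    """A minimal sketch of Algorithm 1 for one storage device.

    env(x) is a hypothetical simulator hook: it applies scan interval x and
    returns the defender's utility u_D together with the observed total
    attack duration y + z, which serves as the next state s^{k+1}.
    """
    Q = defaultdict(float)      # Q(s, x), initialized to 0
    V = defaultdict(float)      # V(s), initialized to 0
    s = 0.0                     # initial state y^0 + z^0
    for k in range(slots):
        # epsilon-greedy scan interval selection, Eq. (54)
        greedy = max(scan_levels, key=lambda a: Q[(s, a)])
        if random.random() < eps and len(scan_levels) > 1:
            x = random.choice([a for a in scan_levels if a != greedy])
        else:
            x = greedy
        u_D, s_next = env(x)    # scan after time x, observe u_D and y + z
        # Q-function and value-function updates, Eqs. (52)-(53)
        Q[(s, x)] = (1 - gamma) * Q[(s, x)] + gamma * (u_D + delta * V[s_next])
        V[s] = max(Q[(s, a)] for a in scan_levels)
        s = s_next
    return Q
```

At convergence, the greedy action $\arg\max_x Q(s, x)$ approximates the best-response scan policy, without the defender ever knowing the attack model or the subjectivity model.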
B. PT-based Dynamic Game with Mixed-Strategy

In the dynamic PT-based storage defense game with mixed strategies, denoted by $\mathcal{G}'$, the defender chooses the detection interval distribution $\mathbf{p} = [p_m]_{1 \le m \le M}$, while the attacker determines the attack interval distribution $\mathbf{q} = [q_n]_{0 \le n \le N}$, where $p_m$ and $q_n$ are quantized into $\zeta$ levels, with $1 \le m \le M$ and $0 \le n \le N$.

The system state at time $k$ is defined as the total attack duration distribution in the last time slot, denoted by $\Phi^{k-1}$. Let $Q(s, \mathbf{p})$ denote the Q-function with mixed strategy $\mathbf{p}$, and $V(s)$ be the value function. Based on the iterative Bellman equation, the Q-function can be updated with

$$Q\left(s^k, \mathbf{p}^k\right) \leftarrow (1 - \gamma)\, Q\left(s^k, \mathbf{p}^k\right) + \gamma \left[u_D + \delta V\left(s^{k+1}\right)\right] \tag{55}$$

$$V\left(s^k\right) = \max_{\mathbf{p}} Q\left(s^k, \mathbf{p}\right). \tag{56}$$

The mixed strategy is chosen with the $\epsilon$-greedy algorithm in terms of the Q-function in (55). The algorithm is summarized in Algorithm 2.

Algorithm 2: APT defense in a dynamic game with mixed-strategy.
  Initialize $\gamma = 0.7$, $\delta = 0.7$, $\Phi^0$, $Q(s, \mathbf{p}) = 0$, $V(s) = 0$, $\forall s, \mathbf{p}$.
  For $k = 1, 2, 3, \ldots$
    Update $s^k = \Phi^{k-1}$
    Choose $\mathbf{p}^k$ with the $\epsilon$-greedy algorithm
    Scan the storage device according to strategy $\mathbf{p}^k$
    Observe $u_D$ and $\Phi^k$
    Update $Q(s^k, \mathbf{p}^k)$ via (55)
    Update $V(s^k)$ via (56)
  End for
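In Algorithm 2 the action is itself a probability vector, so the learner needs a finite set of quantized mixed strategies. A minimal sketch of this discretization for $M = 2$ (names are ours, under the quantization assumption stated above):

```python
import random

def mixed_strategies(zeta=10):
    """Quantized mixed strategies over M = 2 scan levels for Algorithm 2:
    each probability is a multiple of 1/zeta, giving a finite action set
    of size zeta + 1."""
    return [(j / zeta, 1 - j / zeta) for j in range(zeta + 1)]

def sample_scan_interval(p, levels=(0.5, 1.0)):
    """Draw a scan interval according to the chosen mixed strategy p."""
    return random.choices(levels, weights=p, k=1)[0]

# The learner runs the same loop as Algorithm 1, but with actions drawn
# from mixed_strategies() and states given by the observed distribution Phi:
print(sample_scan_interval(mixed_strategies()[7]))   # p = (0.7, 0.3)
```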
VII. SIMULATION RESULTS

Simulations have been performed to evaluate the performance of the Q-learning based APT defense scheme in the PT-based dynamic games $\mathcal{G}$ and $\mathcal{G}'$. If not specified otherwise, we set $\alpha_D = 1$ to maximize the utility of the defender, $\alpha_A = 0.8$ to represent a typical subjective attacker, and $\gamma = 0.7$, $\delta = 0.7$, $\epsilon = 0.1$ to achieve good performance. We chose typical attack and defense parameters, $G = 0.6$ and $z = 0.3$, and used a greedy detection strategy as the benchmark, in which the scan interval is chosen to maximize the estimated immediate utility based on the previous attack interval. The attack strategy is chosen to maximize the PT-based utility of the attacker according to the attack history in the last time slot.

The performance of the PT-based dynamic game with pure strategies $\mathcal{G}$ is shown in Fig. 4, with $L = 5$, $C = 0.4$ and $\alpha_A = 0.8$ in Case 1, and $L = 2$, $C = 0.62$ and $\alpha_A = 0.3$ in Case 2. The attack rate decreases over time, from 8% at the beginning of the game to 0.5% after 6000 time slots in Case 1, about 93.7% lower than the convergent attack rate of the benchmark strategy. Consequently, as shown in Fig. 4(b), the utility of the defender increases over time, from 1.29 at the beginning to 1.43 after 6000 time slots at convergence, about 10.9% higher than the benchmark strategy. In Case 2, the utility of the defender converges to 1.6 after 8000 time slots, which matches the NE result given by Theorem 1.

Fig. 4. Performance of the dynamic storage defense game with pure-strategy $\mathcal{G}$ averaged over 1000 runs, with $L = 5$, $C = 0.4$ and $\alpha_A = 0.8$ in Case 1, and $L = 2$, $C = 0.62$ and $\alpha_A = 0.3$ in Case 2: (a) attack rate; (b) utility of the defender.

As shown in Fig. 5, the attack rate increases with the objective weight of the attacker, e.g., $R$ increases about 4 times as $\alpha_A$ changes from 0.5 to 1. The attack rate at convergence is 4.4%, which is 70.6% lower than the benchmark strategy, with $\alpha_A = 1$. Consequently, the utility of the defender decreases from 1.39 to 1.27 as $\alpha_A$ changes from 0.5 to 1. The utility changes most significantly as $\alpha_A$ varies between 0.7 and 0.9, because the attack interval changes most significantly there due to the probability distortion of the subjective attacker, and the turning points at $\alpha_A = 0.64$ and $0.94$ match the theoretical results of Theorem 1.

Fig. 5. Performance of the dynamic storage defense game with pure-strategy $\mathcal{G}$ averaged over 4000 time slots, with $L = 5$, $C = 0.4$, $G = 0.6$ and $\epsilon = 0.1$: (a) average attack rate; (b) average utility of the defender.

The performance of the PT-based mixed-strategy game $\mathcal{G}'$ in Fig. 6 shows that the attack rate decreases with time, from 15% at the beginning of the game to 6% after 500 time slots, which is only half of that of the benchmark strategy if $\alpha_A = 1$. Thus, the utility of the defender increases over time, e.g., from 1 to around 1.1 after 500 time slots, and is 10% higher than that of the benchmark strategy at time slot 500.

Fig. 6. Performance of the dynamic storage defense game with mixed-strategy $\mathcal{G}'$ averaged over 1000 runs, with $z = 0.3$, $C = 0.6$, $G = 0.25$, $M = 2$, $N = 1$, $\alpha_D = 1$, $\alpha_A = 0.8$ and $\epsilon = 0.1$: (a) attack rate; (b) utility of the defender.

As shown in Fig. 7, the attack rate $R$ increases from 3% to 6.5% as $\alpha_A$ changes from 0.2 to 1, and the attack rate is only half of that of the benchmark strategy at $\alpha_A = 1$. Consequently, the utility of the defender decreases from 1.15 to 1.095 as $\alpha_A$ changes from 0.2 to 1.

Fig. 7. Performance of the dynamic storage defense game with mixed-strategy $\mathcal{G}'$ averaged over 2500 time slots, with $z = 0.3$, $C = 0.6$, $G = 0.25$, $M = 2$, $N = 1$ and $\epsilon = 0.1$: (a) average attack rate; (b) average utility of the defender.

VIII. CONCLUSION

In this work, we have formulated PT-based cloud storage defense games to investigate the impact of the subjective view of APT attackers, under uncertain attack durations in the pure-strategy game or under the uncertain scan interval of the defender in the mixed-strategy game. The NEs of the PT-based games have been provided, showing that a subjective attacker tends to overweigh his or her attack cost and thus increases the attack interval, yielding a higher utility for the defender. A Q-learning based APT resistance scheme has been proposed to improve the performance of the dynamic storage defense game; e.g., in our simulation examples, the attack rate decreases by 50% and the utility of the defender increases by 10% against subjective APT attackers, compared with the benchmark greedy strategy.
REFERENCES

[1] R. B. Sagi, "The Economic Impact of Advanced Persistent Threats," IBM Research Intelligence, May 2014, http://www-01.ibm.com.
[2] C. Tankard, "Advanced persistent threats and how to monitor and deter them," Network Security, vol. 2011, no. 8, pp. 16–19, Aug. 2011.
[3] M. van Dijk, A. Juels, A. Oprea, and R. L. Rivest, "FlipIt: The game of stealthy takeover," J. Cryptology, vol. 26, no. 4, pp. 655–713, Oct. 2013.
[4] D. Kahneman and A. Tversky, "Prospect theory: An analysis of decision under risk," Econometrica, vol. 47, no. 2, pp. 263–291, Mar. 1979.
[5] A. Tversky and D. Kahneman, "Advances in prospect theory: Cumulative representation of uncertainty," J. Risk Uncertainty, vol. 5, no. 4, pp. 297–323, Oct. 1992.
[6] W. Saad, A. Glass, N. B. Mandayam, and H. V. Poor, "Toward a user-centric grid: A behavioral perspective," Proc. IEEE, vol. 104, no. 4, pp. 865–882, Apr. 2016.
[7] S. Gao, E. Frejinger, and M. Ben-Akiva, "Adaptive route choices in risky traffic networks: A prospect theory approach," Transp. Res. C Emerg. Technol., vol. 18, no. 5, pp. 727–740, Oct. 2010.
[8] G. W. Harrison and E. E. Rutstrom, "Expected utility theory and prospect theory: One wedding and a decent funeral," Experim. Econ., vol. 12, no. 2, pp. 133–158, Feb. 2009.
[9] T. Li and N. B. Mandayam, "Prospects in a wireless random access game," in Proc. Annu. Conf. Inf. Sci. Syst., pp. 1–6, Princeton, NJ, Mar. 2012.
[10] T. Li and N. B. Mandayam, "When users interfere with protocols: Prospect theory in wireless networks using random access and data pricing as an example," IEEE Trans. Wireless Commun., vol. 13, no. 4, pp. 1888–1907, Apr. 2014.
[11] J. Yu, M. H. Cheung, and J. Huang, "Spectrum investment with uncertainty based on prospect theory," in Proc. IEEE Int. Conf. Commun. (ICC), pp. 1620–1625, Sydney, Australia, Jun. 2014.
[12] L. Xiao, J. Liu, Y. Li, N. B. Mandayam, and H. V. Poor, "Prospect theoretic analysis of anti-jamming communications in cognitive radio networks," in Proc. IEEE Global Commun. Conf., pp. 746–751, Austin, TX, Dec. 2014.
[13] Y. Yang and N. B. Mandayam, "Impact of end-user decisions on pricing in wireless networks," in Proc. IEEE Annu. Conf. Inf. Sci. Syst. (CISS), pp. 1–6, Princeton, NJ, Mar. 2014.
[14] Y. Yang and N. B. Mandayam, "Impact of end-user decisions on pricing in wireless networks under a multiple-user-single-provider setting," in Proc. Annu. Allerton Conf. Commun. Control Comput., pp. 206–212, Monticello, IL, Oct. 2014.
[15] Y. Yang, L. T. Park, N. B. Mandayam, I. Seskar, A. L. Glass, and N. Sinha, "Prospect pricing in cognitive radio networks," IEEE Trans. Cognitive Commun. Netw., vol. 1, no. 1, pp. 56–70, Mar. 2015.
[16] L. Xiao, N. B. Mandayam, and H. V. Poor, "Prospect theoretic analysis of energy exchange among microgrids," IEEE Trans. Smart Grid, vol. 6, no. 1, pp. 63–72, Jan. 2015.
[17] Y. Wang, W. Saad, N. B. Mandayam, and H. V. Poor, "Integrating energy storage into the smart grid: A prospect theoretic approach," in Proc. IEEE Int. Conf. Acoust. Speech Signal Processing, pp. 7779–7783, Florence, Italy, May 2014.
[18] D. Prelec, "The probability weighting function," Econometrica, vol. 66, no. 3, pp. 497–527, May 1998.
[19] M. Zhang, Z. Zheng, and N. B. Shroff, "Stealthy attacks and observable defenses: A game theoretic model under strict resource constraints," in Proc. IEEE Global Conf. Signal Inf. Process. (GlobalSIP), pp. 813–817, Atlanta, GA, Dec. 2014.
[20] J. Pawlick, S. Farhang, and Q. Zhu, "Flip the cloud: Cyber-physical signaling games in the presence of advanced persistent threats," in Decision and Game Theory for Security, vol. 9406, pp. 289–308, Nov. 2015.
[21] M. Zhang, Z. Zheng, and N. B. Shroff, "A game theoretic model for defending against stealthy attacks with limited resources," in Decision and Game Theory for Security, vol. 9406, pp. 93–112, Nov. 2015.
[22] P. Hu, H. Li, H. Fu, D. Cansever, and P. Mohapatra, "Dynamic defense strategy against advanced persistent threat with insiders," in Proc. IEEE Conf. Comput. Commun. (INFOCOM), pp. 747–755, Hong Kong, May 2015.
[23] L. Xiao, J. Liu, Q. Li, N. B. Mandayam, and H. V. Poor, "User-centric view of jamming games in cognitive radio networks," IEEE Trans. Inf. Forensics and Security, vol. 10, no. 12, pp. 2578–2590, Dec. 2015.
[24] Y. Han, T. Alpcan, J. Chan, C. Leckie, and B. I. P. Rubinstein, "A game theoretical approach to defend against co-resident attacks in cloud computing: Preventing co-residence using semi-supervised learning," IEEE Trans. Inf. Forensics and Security, vol. 11, no. 3, pp. 556–570, Mar. 2016.
[25] D. Xu, Y. Li, L. Xiao, N. B. Mandayam, and H. V. Poor, "Prospect theoretic study of cloud storage defense against advanced persistent threats," in Proc. IEEE Global Commun. Conf., Washington, DC, Dec. 2016.
APPENDIX A
PROOF OF THEOREM 2

By (8), if $0 \le y < 1/3$, we have

$$U_A^{PT}\left(\frac{1}{3}, 0\right) = -w_A(P_1) - w_A(P_2) - w_A(1 - P_0 - P_1 - P_2) - C \ge -3y\, w_A(P_0) - w_A(P_1) - w_A(P_2) - w_A(1 - P_0 - P_1 - P_2) - C = U_A^{PT}\left(\frac{1}{3}, y\right). \tag{57}$$

If (20b) holds, by (8), $\forall 1/3 \le y \le 1$, we have

$$U_A^{PT}\left(\frac{1}{3}, 0\right) = -w_A(P_1) - w_A(P_2) - w_A(1 - P_0 - P_1 - P_2) - C \ge -w_A(P_0) - w_A(P_1) - w_A(P_2) - w_A(1 - P_0 - P_1 - P_2) = U_A^{PT}\left(\frac{1}{3}, y\right). \tag{58}$$

Thus, (10) holds for $(x^*, y^*) = (1/3, 0)$.

By (7), if $0 < x < 1/3$, we see that $U_D^{PT}(x, 0)$ increases linearly with $x$ and is maximized at $x = 1/3$. According to (7), if $1/3 < x < 2/3$, we have

$$U_D^{PT}(x, 0) = \frac{1}{3x} w_D(P_1) + w_D(P_2) + w_D(1 - P_0 - P_1 - P_2) + xG, \tag{59}$$

and

$$\left.\frac{\partial^2 U_D^{PT}}{\partial x^2}\right|_{y=0} = \frac{2}{3x^3}\, w_D(P_1) \ge 0, \tag{60}$$

indicating that $U_D^{PT}(x, 0)$ is convex in $x$ and is maximized at $x = 1/3$ or $x = 2/3$. Similarly, if $2/3 < x < 1$, $U_D^{PT}(x, 0)$ is convex and is maximized at $x = 2/3$ or $x = 1$.

By (7), if (20a) holds, we have

$$U_D^{PT}\left(\frac{1}{3}, 0\right) = w_D(P_1) + w_D(P_2) + w_D(1 - P_0 - P_1 - P_2) + \frac{1}{3} G \ge \max\left\{\frac{1}{3} w_D(P_1) + \frac{2}{3} w_D(P_2) + w_D(1 - P_0 - P_1 - P_2) + G,\; \frac{1}{2} w_D(P_1) + w_D(P_2) + w_D(1 - P_0 - P_1 - P_2) + \frac{2}{3} G\right\} = \max\left\{U_D^{PT}(1, 0),\; U_D^{PT}\left(\frac{2}{3}, 0\right)\right\}. \tag{61}$$

Thus, (9) holds for $(x^*, y^*) = (1/3, 0)$. Similarly, we can prove the other NEs of the subjective APT game. ∎

Liang Xiao (M'09, SM'13) received the B.S. degree in communication engineering from the Nanjing University of Posts and Telecommunications, Nanjing, China; the M.S. degree in electrical engineering from Tsinghua University, Beijing, China; and the Ph.D. degree in electrical engineering from Rutgers University, New Brunswick, NJ, USA, in 2000, 2003, and 2009, respectively. She is currently a Professor with the Department of Communication Engineering, Xiamen University, Fujian, China. Her current research interests include smart grids, network security, and wireless communications.

Dongjin Xu received the B.S. degree in communication engineering from Xiamen University, Xiamen, China, in 2016, where she is currently pursuing the M.S. degree with the Department of Communication Engineering. Her research interests include network security and wireless communications.

Caixia Xie received the B.S. degree in communication engineering from Xiamen University, Xiamen, China, in 2015, where she is currently pursuing the M.S. degree with the Department of Communication Engineering. Her research interests include network security and wireless communications.

Narayan B. Mandayam (S'89, M'94, SM'99, F'09) received the B.Tech (Hons.) degree in 1989 from the Indian Institute of Technology, Kharagpur, and the M.S. and Ph.D. degrees in 1991 and 1994 from Rice University, all in electrical engineering. Since 1994 he has been at Rutgers University, where he is currently a Distinguished Professor and Chair of the Electrical and Computer Engineering department. He also serves as Associate Director at WINLAB. He was a visiting faculty fellow in the Department of Electrical Engineering, Princeton University, in 2002 and a visiting faculty at the Indian Institute of Science, Bangalore, India, in 2003. Using constructs from game theory, communications and networking, his work has focused on system modeling, information processing and resource management for enabling cognitive wireless technologies to support various applications. He has been working recently on the use of prospect theory in understanding the psychophysics of data pricing for wireless networks as well as the smart grid. His recent interests also include privacy in IoT and the modeling and analysis of trustworthy knowledge creation on the internet.

Dr. Mandayam is a co-recipient of the 2015 IEEE Communications Society Advances in Communications Award for his seminal work on power control and pricing, the 2014 IEEE Donald G. Fink Award for his IEEE Proceedings paper titled "Frontiers of Wireless and Mobile Communications," and the 2009 Fred W. Ellersick Prize from the IEEE Communications Society for his work on dynamic spectrum access models and spectrum policy. He is also a recipient of the Peter D. Cherasia Faculty Scholar Award from Rutgers University (2010), the National Science Foundation CAREER Award (1998), and the Institute Silver Medal from the Indian Institute of Technology (1989). He is a coauthor of the books Principles of Cognitive Radio (Cambridge University Press, 2012) and Wireless Networks: Multiuser Detection in Cross-Layer Design (Springer, 2004). He has served as an Editor for the journals IEEE Communication Letters and IEEE Transactions on Wireless Communications, and as a guest editor of the IEEE JSAC Special Issues on Adaptive, Spectrum Agile and Cognitive Radio Networks (2007) and Game Theory in Communication Systems (2008). He is a Fellow and Distinguished Lecturer of the IEEE.
H. Vincent Poor (S'72, M'77, SM'82, F'87) received the Ph.D. degree in EECS from Princeton University in 1977. From 1977 until 1990, he was on the faculty of the University of Illinois at Urbana-Champaign. Since 1990 he has been on the faculty at Princeton, where he is currently the Michael Henry Strater University Professor of Electrical Engineering. During 2006 to 2016, he served as Dean of Princeton's School of Engineering and Applied Science. His research interests are in the areas of information theory, statistical signal processing and stochastic analysis, and their applications in wireless networks and related fields. Among his publications in these areas is the book Mechanisms and Games for Dynamic Spectrum Allocation (Cambridge University Press, 2014).

Dr. Poor is a member of the National Academy of Engineering and the National Academy of Sciences, and is a foreign member of the Royal Society. He is also a fellow of the American Academy of Arts and Sciences, the National Academy of Inventors, and other national and international academies. He received the Marconi and Armstrong Awards of the IEEE Communications Society in 2007 and 2009, respectively. Recent recognition of his work includes the 2016 John Fritz Medal, the 2017 IEEE Alexander Graham Bell Medal, a Doctor of Science honoris causa from Syracuse University (2017), and Honorary Professorships at Peking University and Tsinghua University, both conferred in 2016.