


Proceedings of the 54th Hawaii International Conference on System Sciences | 2021

Understanding Human-AI Cooperation Through Game-Theory and Reinforcement Learning Models

Beau G. Schelble¹, Christopher Flathmann¹, Nathan McNeese¹, Lorenzo Barberis Canonico¹

¹ Human-Centered Computing, Clemson University
(bschelb, cflathm, mcneese, lorenzb)@g.clemson.edu

Abstract

For years, researchers have demonstrated the viability and applicability of game theory principles to the field of artificial intelligence. Furthermore, game theory has been shown to be a useful tool for researching human-machine interaction, specifically cooperation, by creating an environment where cooperation can initially form before reaching a continuous and stable presence in a human-machine system. Additionally, recent developments in reinforcement learning artificial intelligence have led to artificial agents cooperating more efficiently with humans, especially in more complex environments. This research conducts an empirical study to understand how different modern reinforcement learning algorithms and game theory scenarios could create different cooperation levels in human-machine teams. Three different reinforcement learning algorithms (Vanilla Policy Gradient, Proximal Policy Optimization, and Deep Q-Network) and two different game theory scenarios (Hawk Dove and Prisoners Dilemma) were examined in a large-scale experiment. The results indicated that different reinforcement learning models interact differently with humans, with Deep Q-Network engendering higher cooperation levels. The Hawk Dove game theory scenario elicited significantly higher levels of cooperation in the human-artificial intelligence system. A multiple regression using these two independent variables also found a significant ability to predict cooperation in the human-artificial intelligence systems. The results highlight the importance of social and task framing in human-artificial intelligence systems and the importance of choosing reinforcement learning models carefully.

1. Introduction

Human-artificial intelligence (AI) systems have a massive potential to outperform either agent alone. Human-AI systems are characterized by at least one human and at least one AI interacting with one another in a shared environment or task, as seen in other recent work [1]. This potential began to be shown when IBM's artificial intelligence system Deep Blue defeated Kasparov [2], shifting the field of human-AI interaction. Human-AI systems' ability was shown explicitly in Kasparov's "advanced chess" tournament (where AIs, humans, and human-AI systems compete against each other), which highlighted a human-AI system that could defeat both the top AI and the top human chess players. This team consisted of an amateur human and a mediocre AI [3]. The finding suggested that successful collaboration between humans and AI is certainly possible.

The enhanced ability behind many human-AI systems seen in recent research lies in leveraging either agent's strengths. Research published in Nature shows that bots attempting to solve a graph coloring problem requiring high levels of coordination fail to achieve a globally optimal solution [4]. The study points out a way to transcend the limitations of bots coordinating on a macro level by adding humans to the team [4]. Successful human-AI systems are capable of doing more than merely playing chess and solving graph coloring problems; they extend even to the medical field. Specifically, in cancer detection, teams of doctors partnering with machine learning algorithms outperformed both expert teams and state-of-the-art neural networks in diagnosing cancer [5]. This level of success can be attributed to the unique advantages that emerge from harnessing human and AI potential in a compatible and integrated way, and it should pave the way for more research of this kind.

However, these human-AI systems are not without their challenges. Prior research has shown that effective team behavior occurs when each team member seeks to model their teammates' thought processes, which is inherently more challenging in a human-AI system [6]. Specifically, humans tend to distance themselves from teammates they perceive to be autonomous, and AIs tend to avoid cooperating with human agents who

do not share their thought process [7]. These challenges can make cooperation in human-AI systems difficult, especially when coupled with the fact that cooperation in these human-AI systems can be significantly altered by the nature of the task and the most mutually beneficial outcome, a problem compounded by the fact that task and social framing vary widely from system to system.

Task framing is the reason for taking an action, while social framing is the context of the agents' relationship and the results of taking an action [8]. The importance of task design and group dynamics is mentioned in recent research agendas on human-machine interaction [9], emphasizing the importance of addressing these aspects of human-AI systems as the current study does. Choosing the right reinforcement learning (RL) model from the many available contributes to the problem, as the chosen model may impact an agent's ability to cooperate. As RL agents become more prevalent in applied settings around the world, serving in a variety of different industries [10, 11], the need for further research to clarify the dynamics behind human-AI systems is obvious. With the numerous RL models available to practitioners, there is a specific need to highlight the effect of the RL model used on human-AI system dynamics like cooperation. Along with the various settings and contexts that human-AI systems are deployed to, it is vital to identify how the social and task framing of these systems may potentially alter system outcomes.

The current paper leverages two similar but distinct game theory scenarios that specifically emphasize cooperation to construct strategic interactions. The experimental setups incorporate three state-of-the-art RL models whose strategic behavior illuminates their different receptiveness to specific incentive structures, while the two game theory scenarios emphasize differences in social and task framing for these same systems. In order to capture these differences, the current study focuses on answering the following research questions:

• Research Question 1: Does average overall cooperation in a human-AI system differ based on the reinforcement learning model used?

• Research Question 2: What effect does the game theory scenario have on average overall human-AI system cooperation?

• Research Question 3: Can the game theory scenario and reinforcement learning model used predict overall human-AI system cooperation?

2. Related Work

Game theory is the study of the decision-making process of self-interested agents in strategic situations. Emerging from the intersection between mathematics and economics, it functions as a highly appropriate framework for conducting AI research in cooperation, as it provides a mathematical common ground that humans can understand and AI can train to be expert in. Hence, a reward-maximizing agent embodies the definition of a "rational" player within these specific contexts; however, this does not mean this mathematical rationality would extend to other contexts, as complexity and environmental factors could change. While a human could achieve this rationality, AI would have greater consistency in being rational in game theory specific scenarios. While it is out of the scope of this study, human rationality in different environments could be further explored through concepts such as bounded rationality, which would provide a more human-centered definition of rationality, especially in contexts outside of these simplistic game theory scenarios. However, the game theory rationality studied here is essential in reaching a Nash Equilibrium, where each individual's strategies can converge and mutually respond to each other [12]. The use of game theory models, and therefore game theory scenarios, implies that players are going to converge to this equilibrium.

2.1. Game Theory

While a Nash Equilibrium can be present in a wide variety of scenarios, the optimal equilibrium can differ from scenario to scenario. Due to the existence of scenario-specific optimal strategies, research efforts have created Matrix Game Social Dilemmas (MGSDs), which allow game theory principles to be applied to a variety of scenarios to elicit multiple factors in creating group strategy, including group reciprocity, norm enforcement, and social network effects [13]. The design of these scenarios has resulted in the creation of games where individual players are not able to succeed solely through an individualist mindset, but rather through group strategy [14].

Due to the implicit goal of reaching a group strategy, game theory provides a potentially beneficial lens for viewing human-AI interaction; most notably, game theory can be used to evaluate the team's ability to coordinate and cooperate [15]. Specifically, MGSDs provide a powerful method for evaluating human-AI cooperation as they provide a variety of social and task contexts that can be used to engage joint strategy within a team [16].

The formation of these strategies depends heavily on important teaming factors, like fairness, coordination, reciprocity, and cooperation, which is the focus of this study [17]. Due to the factors that contribute to efficient game theory models, which are essential to teams in general, game theory serves as a powerful tool for observing human-agent interaction.

Due to the limitations of AI in representing the expansive real world, limiting human-AI system interaction to MGSDs provides a capable and fair environment for observing and understanding human-AI interaction. The matrix design of MGSDs and their social nature create a platform that can put humans and AI on an even playing field. This methodology, and the benefits outlined above, can be extended to the context of human-AI systems, specifically those involving AI built with RL, to observe human-AI cooperation and the potential of using game theory to predict and plan human-AI strategy.

2.2. Reinforcement Learning

RL is a class of machine learning algorithms based upon behavioral models that reward and punish behavior to induce the discovery of a policy mapping situations to actions so as to maximize positive rewards over time [18]. These tradeoffs and strategies are balanced through a series of hyperparameters inherent to each RL model, giving unique advantages based on the underlying algorithm. Some of the most widely used modern RL models include Deep Q-Network (DQN) [19], Vanilla Policy Gradients (VPG) [20], and Proximal Policy Optimization (PPO) [21]. In terms of similarities and differences, VPG and PPO are more closely related to each other, as they are both on-policy methods, while the DQN is an off-policy method; simplistically, this means the DQN maximizes the utility of target states while VPG and PPO maximize the utility of the current state. These optimization differences could lead to a higher level of convergence by the DQN, especially in the more simplistic environments used in game theory.
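To make the distinction concrete, the following is a minimal illustrative sketch (our own, not the implementation used in this study) of the value-maximization idea behind off-policy learning, using tabular Q-learning on a single-state matrix game; a DQN replaces the table with a neural network. The payoff values follow the Prisoner's Dilemma interface described in Section 3.2.1, and the random opponent is an assumption made only for illustration.

```python
import random

# Illustrative sketch only (not the authors' implementation): tabular Q-learning
# against a fixed opponent policy in a 2x2 matrix game. A DQN would replace the
# Q table with a neural network, but the maximization over target values below
# is the same idea described for off-policy methods.

ACTIONS = ["cooperate", "defect"]
# Row player's payoffs for (own action, opponent action); values are assumptions
# matching the Prisoner's Dilemma payoff square described in Section 3.2.1.
PAYOFF = {("cooperate", "cooperate"): -1, ("cooperate", "defect"): -3,
          ("defect", "cooperate"): 0, ("defect", "defect"): -2}

Q = {a: 0.0 for a in ACTIONS}      # single-state game, so Q is just one value per action
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def opponent_policy():
    return random.choice(ACTIONS)  # placeholder opponent, not a real human or agent

for episode in range(5000):
    # epsilon-greedy action selection
    if random.random() < epsilon:
        action = random.choice(ACTIONS)
    else:
        action = max(Q, key=Q.get)
    reward = PAYOFF[(action, opponent_policy())]
    # off-policy update: bootstrap from the maximizing action, not the action taken
    Q[action] += alpha * (reward + gamma * max(Q.values()) - Q[action])

print(Q)  # against this random opponent, "defect" ends up with the higher value
```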
One of RL's greatest strengths is creating an understanding of an environment through simple board states, which has led to RL models being highly skilled in a variety of games, such as Go, chess, and soccer [22]. This strength is made possible through self-play, where AI agents can repeatedly play many games over time to develop a sophisticated understanding of their environment, which can be represented as a simplistic board state and reward signals [23]. These skills are important for navigating game theory scenarios, as they too can be represented as matrix-based board states that can be navigated by AI systems, which includes the ability to learn a cooperative strategy from the self-play of simulated game-theory games.

While RL's strengths allow it to find optimal paths in more simplistic contexts, such as older video games, RL has shown a deficiency in understanding more complex environments where multiple solutions exist and the possibility of getting stuck in a local optimum increases [24]. Advances in RL have begun to mitigate these problems; for example, J. W. Crandall's work in human-AI cooperation has produced a novel RL model that ensures payoffs are at a minimum of the game and also learns to cooperate [25]. These findings are in addition to more recent work showing the newly developed S# model to be fully capable of cooperating with human players and AI players in game-theoretical situations like the Prisoner's Dilemma, which require intuition and are affected by cultural norms and emotion [26]. Due to the potential RL AI has shown in recent years for understanding complex environments, it should prove to be an essential tool for human-AI cooperation as teams are set in more realistic environments. However, despite these advancements, there is still a lack of empirical research on human-AI systems' ability to converge and cooperate on optimal strategies when observed under differing social and task framing.

2.3. Human-AI Systems

Human-AI systems involve two or more agents, consisting of at least a single human and a single AI. The principal obstacle has been avoiding limiting each agent to only local information to make processing a complex environment tractable. Markovian games have emerged as the primary model for human-AI systems that include RL because they enable a distributed decision-making process and a stochastic environment [27]. The dominant approach in those settings relies on the Nash-Q algorithm for general-sum stochastic games, which enables RL agents to converge towards stable strategies in zero-sum and common-payoff games. Such restrictive parameters have made Nash-Q challenging to generalize from [28]. Furthermore, such tight game structures do not lend themselves to optimal solutions for games with multiple Nash Equilibria, making learning in such settings nontrivial [29].

Prior human-AI research has focused on independent RL, where each RL agent is not aware of the other agents and instead senses them as part of the interactions with its environment. Such non-stationary environments violate the Markovian property, which undermines the generalizability of the policies the agents learn [30]. Specifically, human-AI system strategies rarely

converge towards an optimal equilibrium because, as long as future rewards are highly discounted, agents may not risk deviating from a suboptimal equilibrium [31]. Alternatives to independent RL involve creating special-purpose algorithms (WoLF, JAL, AWESOME) that privilege rationality and joint action in many cooperative scenarios [32]. However, recent research has pointed out that such algorithms cannot shape the learning behavior of their opponents to obtain higher payouts at convergence, especially over repeated games [22].

The current study, however, does not focus on hyperparameter tuning, algorithm development, or other technical advancements, but instead on the human aspects affecting the dynamics behind these human-AI systems, hoping to better understand and predict those systems' outcomes.

3. The Current Study

The current research study reports on an experiment in which a human-AI system played two game theory scenarios, Hawk Dove and Prisoners Dilemma (detailed in a later section). Each game theory scenario focuses on cooperation in order to achieve the optimal expected reward, but the motive and concept of the two scenarios are unique. Participants also interacted with three different reinforcement learning models: DQN, VPG, and PPO. These two variable groups represented the two independent variables (IVs) manipulated for this experiment: 1) game theory scenario (Prisoners Dilemma, Hawk Dove), and 2) reinforcement learning model (DQN, VPG, PPO). All independent variables were examined, resulting in a 2x3 factorial design conducted between subjects. Based on the experimental design and previous research, the following hypothesis can be considered regarding RL algorithms: (1) due to the algorithmic design of DQN models, we would expect them to achieve higher levels of cooperation. Regarding scenario choice, this study elects to take a more exploratory approach to the effects the social framing of each scenario could have on cooperation rather than hypothesizing the specific superiority of either scenario.

3.1. Participants

This experiment recruited 226 participants from Amazon Mechanical Turk, resulting in 226 human-AI systems completing the experiment. Participants' demographics were as follows: gender: 145 male, 80 female, 1 other; age: 63 between 18-25 years, 94 between 26-35 years, 41 between 36-45 years, 14 between 46-55 years, 14 between 56-65 years. The Prisoners Dilemma condition consisted of 103 human-AI systems, while the Hawk Dove condition consisted of the remaining 123 human-AI systems. The number of human-AI systems per reinforcement learning model is shown below in Table 1. The imbalance in the number of human-AI systems completing the tasks was a result of systems being dropped from the analysis for incomplete data recording during the completion of the game theory task, which was the result of client-side connectivity issues.

Table 1. Participant Numbers
Prisoners Dilemma: 103 (DQN: 43, VPG: 27, PPO: 33)
Hawk Dove: 123 (DQN: 43, VPG: 39, PPO: 41)

3.2. Task

The cooperative game theory scenarios known as Prisoners Dilemma and Hawk Dove were selected to provide a broad analytical base to identify the extent to which different factors affect the willingness to cooperate for both the human players and the reinforcement learning agents. While both scenarios target cooperation between the two players, the fundamental motivations and concepts are unique, making a detailed description of each scenario necessary.

3.2.1. Prisoners Dilemma  The Prisoners Dilemma is an idealized scenario in game theory in which two players are posited to have been arrested by authorities for committing a crime. Once apprehended, each player is separated from the other so that they are unable to communicate. Because the police do not have sufficient evidence to convict both players, they offer each player the opportunity to confess to gain a lighter sentence at the expense of the other player.

Prisoner's Dilemma's core result is that the Nash Equilibrium induces both players to confess, leading to the collectively worst outcome for both players. However, in experimental settings, this dynamic often changes when the Prisoners Dilemma is played iteratively, because a sequential Prisoner's Dilemma creates the opportunity for players to punish one another for defecting from an agreement to remain silent, thus creating a reasonable expectation of cooperation.

Figure 1 is taken from the Prisoners Dilemma interface of the custom experimental platform. The payoffs for mutual cooperation are -1 for each player, the payoffs for mutual defection are -2 for each player, and the payoffs for successfully defecting on a cooperative player are 0 and -3, respectively. In a Prisoner's Dilemma, defecting is the dominant strategy because both players are better off defecting, given what they expect the other player to do. Axelrod's famous tournament, which included both human and computer-generated solutions, found the tit-for-tat strategy to be most beneficial, effectively repeating the partner's previous decision [33].

Figure 1. Interface for the Prisoners Dilemma game theory scenario.
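As an illustration only (our own sketch, not the experimental platform's code), the payoff square above and Axelrod's tit-for-tat strategy can be written as follows; the move labels and function names are assumptions made for readability.

```python
# Illustrative sketch: the Prisoner's Dilemma payoff square described above,
# plus Axelrod's tit-for-tat. 'C' = cooperate (stay silent), 'D' = defect (confess).

PD_PAYOFFS = {          # (player A action, player B action) -> (A payoff, B payoff)
    ("C", "C"): (-1, -1),
    ("D", "D"): (-2, -2),
    ("D", "C"): (0, -3),
    ("C", "D"): (-3, 0),
}

def tit_for_tat(opponent_history):
    """Cooperate on the first turn, then repeat the partner's previous move."""
    return "C" if not opponent_history else opponent_history[-1]

# Defection dominates: whatever B plays, A's payoff for defecting is at least as high.
for b_action in ("C", "D"):
    assert PD_PAYOFFS[("D", b_action)][0] >= PD_PAYOFFS[("C", b_action)][0]
```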

3.2.2. Hawk Dove  The Hawk Dove game is a more dynamic version of a Prisoner's Dilemma in which each player is faced with a decision of whether to attack or to remain peaceful. Hawk Dove is symmetric, so each player abides by the same incentive structure, which rewards peace (0 payoff) over war (-2 payoff). The only situation in which a player is better off than being peaceful is by successfully attacking when the other player elects to be peaceful.

The Hawk Dove scenario looked the same as Figure 1, but the scenario, title, and reward square were modified to match the Hawk Dove scenario (the exact reward square is defined in the measures section). A successful attack occurs when one of the players decides to remain peaceful, and it results in one point being transferred from the peaceful player to the attacker. This payoff is significant because it results in a smaller loss for the peaceful player when attacked than when mutually attacking. This aspect is essential because it creates a somewhat powerful incentive to remain peaceful, implying that attacks result from an essentially zero-sum mentality driving the player.

3.3. Materials and Equipment

A custom experimental platform was developed to accommodate the current study, consisting of an interface that supported each of the experimental conditions. The interface for Prisoners Dilemma is shown in Figure 1, and the interface for Hawk Dove uses the same format with slight modifications to the content provided. Each move by both players was recorded by the application and stored on a server. Each player plays three rounds of the game theory scenario for ten turns, with the score resetting every round.

The open-source RL framework TensorForce was utilized to implement the RL agents. The TensorForce library is focused on providing explicit APIs, readability, and modularization to deploy RL solutions in both research and practical applications [34, 35].

3.4. Procedure

Participants were recruited through the Amazon Mechanical Turk (MTurk) platform, which allows researchers to recruit participants worldwide in return for monetary compensation [36]. The MTurk platform is highly reliable and far more representative of the population than typical university subject pools [37].

For the current study, participants were randomly assigned to conditions. Participants first gave informed consent to participate in the study, after which the experiment began automatically. The participants were shown one of two interfaces depending on the game theory scenario they were grouped into (Figure 1 shows the basic layout). Directions for the scenario were shown at the top of the interface, with the players' scores directly below, followed by the outcome table and possible decisions. After playing the game for ten turns, the interface reinitialized and began a new game. Participants played three games for a total of thirty turns, with data being automatically collected and stored by the platform. Once participants had completed the three games, they were directed to a Qualtrics survey for demographics collection and quality assurance. Quality assurance involved the participants answering a question to prove they were human and were taking the experiment seriously. Upon completing the experiment, participants were paid $1 for their time (roughly ten minutes).
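For concreteness, the task structure just described (three games of ten turns each, with the score reset between games and every move logged) can be sketched as follows. This is a library-agnostic illustration under our own assumptions about function names and the logging format, not the platform's or TensorForce's actual code.

```python
# Library-agnostic sketch of the task structure: three games of ten turns each,
# with the score reset between games. The move-provider callables and logging
# format are illustrative assumptions, not the experimental platform's code.

def run_session(get_human_move, get_agent_move, payoffs, games=3, turns=10):
    log = []
    for game in range(games):
        human_score, agent_score = 0, 0          # score resets every game
        human_history, agent_history = [], []
        for turn in range(turns):
            h = get_human_move(agent_history)    # e.g., a click in the interface
            a = get_agent_move(human_history)    # e.g., an RL agent's chosen action
            h_reward, a_reward = payoffs[(h, a)]
            human_score += h_reward
            agent_score += a_reward
            human_history.append(h)
            agent_history.append(a)
            log.append((game, turn, h, a, h_reward, a_reward))
    return log
```

With a payoff dictionary such as the Prisoner's Dilemma square sketched earlier, the same loop could drive either scenario.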

3.5. Measures

Average cooperation was the only dependent variable recorded in the current experiment. The operationalization of cooperation must be clearly defined due to the unique differences in the reward structure of the Prisoners Dilemma and Hawk Dove scenarios. As can be seen in Figure 1, the participants are shown the reward structure for their decisions (the Hawk Dove reward structure square, read from top left to bottom right, is: -2,-2; 1,-1; -1,1; 0,0). The 2x2 decision tree consists of four outcomes, regardless of the score displayed. These outcomes include the following: 1) Player A and Player B both do not cooperate, 2) Player B cooperates while Player A does not, 3) Player A cooperates while Player B does not, 4) both players cooperate. Accordingly, the experimental platform's result was a number between 1 and 4, with 1 being low levels of cooperation and 4 being high levels of cooperation. This result was recorded for each turn in all three of the ten-turn games. The results were then averaged for average overall cooperation in the human-AI system. Outcome 2 is considered lower cooperation than outcome 3 because the AI is rewarded based on global performance, making its tendency to cooperate inherently higher than the human agent's tendency to cooperate. This ordering also aligns with the general philosophy of game theory, as the AI is much more likely to play rationally.
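A minimal sketch of this coding and averaging is given below. It is our own reconstruction, with illustrative function names, and it assumes that Player B is the AI agent, which is what the ordering rationale above suggests; the Hawk Dove reward square is the one quoted in the text.

```python
# Sketch of the cooperation measure (our reconstruction, not the platform's code).
# Assumption: Player A is the human, Player B the AI agent. 'C' = cooperate/peaceful,
# 'D' = defect/attack.

HAWK_DOVE_PAYOFFS = {               # reward square given in the text
    ("D", "D"): (-2, -2),           # both attack
    ("D", "C"): (1, -1),            # A attacks a peaceful B
    ("C", "D"): (-1, 1),            # B attacks a peaceful A
    ("C", "C"): (0, 0),             # both remain peaceful
}

def cooperation_outcome(a_move, b_move):
    """Return the per-turn cooperation score from 1 (lowest) to 4 (highest)."""
    if a_move == "D" and b_move == "D":
        return 1    # neither player cooperates
    if a_move == "D" and b_move == "C":
        return 2    # only the AI (B) cooperates
    if a_move == "C" and b_move == "D":
        return 3    # only the human (A) cooperates
    return 4        # both players cooperate

def average_cooperation(turns):
    """turns: list of (a_move, b_move) pairs across all three ten-turn games."""
    return sum(cooperation_outcome(a, b) for a, b in turns) / len(turns)
```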
4. Results

Due to the unequal sample sizes, the assumption of homogeneity of variances was violated for this data set, making the use of non-parametric tests necessary; however, normality of the data set was maintained. Accordingly, the recommendations of current literature in statistical analysis were followed [38]. Notably, in order to minimize the chances of committing a Type 1 error, two independent groups were compared using the Mann-Whitney U test, while the Kruskal-Wallis test was used to compare three independent groups. Post-hoc comparisons for Kruskal-Wallis tests were completed using Mann-Whitney U tests with Bonferroni-adjusted p values. Finally, to do more than compare independent group means and determine predictability, the current analysis utilized a heteroskedasticity-consistent standard error estimator for ordinary least squares regression, as detailed by Hayes and Cai [39]. The results of these analyses are detailed in the following section, organized by research question. Additionally, gender and age data revealed no differences when used as control variables.

4.1. RQ1: Does Cooperation Change Based on the AI Model Used?

In order to determine whether significant differences existed in the overall cooperation of the human-AI systems between independent groups with different RL models, a Kruskal-Wallis test was run on the data set in its entirety.

Table 2. Kruskal-Wallis Test on RL Model and Overall Cooperation
Kruskal-Wallis Test: n = 226, H = 36.22, df = 2, p < .001
Post-Hoc Tests (Mann-Whitney U):
  DQN-VPG: U = 1650, Z = -4.42, p < .001
  DQN-PPO: U = 1581.50, Z = -5.48, p < .001
  VPG-PPO: U = 2015.50, Z = -1.78, p = .075

The Kruskal-Wallis test (see Table 2) showed that the RL model used significantly affected overall human-AI system cooperation, H(2) = 36.22, p < .001. Post-hoc Mann-Whitney U tests using a Bonferroni-adjusted alpha level of .017 (.05/3) were used to determine the significance of each pairwise comparison. The comparison between human-AI systems using the DQN RL model and the VPG RL model was significant, U(N_DQN = 86, N_VPG = 66) = 1650.00, Z = -4.42, p < .001. The difference in overall cooperation between human-AI systems using the DQN RL model and the PPO RL model was also significant, U(N_PPO = 74, N_DQN = 86) = 1581.50, Z = -5.48, p < .001. All other pairwise comparisons were not statistically significant, and a follow-up Kruskal-Wallis test between RL models and improvement over the three games was not significant. Finally, to test the interaction effect between game theory scenario and RL model, a two-way ANOVA was used; however, the results of this test should be interpreted carefully, as ANOVAs are robust to violations of homoskedasticity only with roughly equal sample sizes. The ANOVA results revealed a significant interaction effect between game theory scenario and RL model, F(2, 222) = 35.69, p < .001, η² = .25.

Based on these results, we can tell that the AI model used did have a significant effect on overall cooperation within the human-AI systems. While the VPG and PPO RL models had very similar cooperation levels, the DQN RL model had much higher levels of cooperation, lending credence to the notion that the RL model an AI uses to train impacts its interactions with human agents. This result also supports the earlier hypothesis that the

DQN model would produce higher levels of cooperation in human-AI systems.
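For readers who want to mirror these comparisons, the sketch below shows how the reported tests (a Kruskal-Wallis test across the three RL-model groups, followed by Bonferroni-corrected pairwise Mann-Whitney U tests) could be run in SciPy. The data here are placeholders rather than the study's data, and the variable names are our own.

```python
import numpy as np
from scipy import stats
from itertools import combinations

# Sketch only (not the authors' analysis code): Kruskal-Wallis across the three
# RL-model groups, then pairwise Mann-Whitney U tests evaluated against a
# Bonferroni-adjusted alpha of .05 / 3, as described above.

rng = np.random.default_rng(0)
groups = {                           # placeholder data; one value per human-AI system
    "DQN": rng.normal(3.0, 0.5, 86),
    "VPG": rng.normal(2.7, 0.5, 66),
    "PPO": rng.normal(2.7, 0.5, 74),
}

H, p = stats.kruskal(*groups.values())
print(f"Kruskal-Wallis: H = {H:.2f}, p = {p:.4f}")

alpha = 0.05 / len(list(combinations(groups, 2)))   # Bonferroni adjustment (~.017)
for name_a, name_b in combinations(groups, 2):
    U, p_pair = stats.mannwhitneyu(groups[name_a], groups[name_b],
                                   alternative="two-sided")
    print(f"{name_a} vs {name_b}: U = {U:.1f}, p = {p_pair:.4f}, "
          f"significant = {p_pair < alpha}")
```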
4.2. RQ2: Is Cooperation Affected by Game Theory Scenario?

To investigate whether overall human-AI system cooperation was affected by the game theory scenario the system completed, a Mann-Whitney U test was run on the two independent groups' data (see Table 3).

Table 3. Mann-Whitney U Test on Game Theory Scenario and Overall Cooperation
Mann-Whitney U Test: Average Cooperation, U = 4575.50, p < .001, rpb = -.281
Mean and Standard Deviation:
  PD: n = 103, Mdn = 2.7, SD = .51
  HD: n = 123, Mdn = 2.98, SD = .58

Descriptive statistics revealed that the overall cooperation of human-AI systems completing the Prisoners Dilemma scenario (Mdn = 2.7) was lower than that of those completing the Hawk Dove scenario (Mdn = 2.98). The Mann-Whitney U test indicated that this difference was statistically significant, U(N_PrisonersDilemma = 103, N_HawkDove = 123) = 4575.50, z = -3.60, p < .001, rpb = -.281. A follow-up Mann-Whitney test between game theory scenarios and improvement over the three games was not significant.

This analysis provides additional clarity on the importance of task and social framing in human-AI systems. As stated previously, while the Prisoners Dilemma and Hawk Dove scenarios both target cooperation, the two have unique differences in the context and motivation conveyed to the two players. While the current study cannot say with certainty that the different contexts and motivations are the driving force behind these differences in cooperation, the results emphasize their impact.

4.3. RQ3: Can the Game Theory Scenarios and AI Model Used Predict Cooperation?

In order to move beyond simply ascertaining whether overall cooperation differences between the independent groups are significant, an ordinary least squares regression must be utilized. Running this regression gives the current study the ability to determine whether the RL model and game theory scenario can predict the human-AI systems' overall cooperation. To accomplish this, a multiple ordinary least squares regression with a heteroskedasticity-consistent standard error estimator (HC3) was used to predict a human-AI system's overall cooperation from the game theory scenario used and the RL model used (see Table 4). As all variables were nominal, each was dummy coded for use in the regression.

Table 4. Game Theory Scenario and AI Model Linear Regression for Cooperation
Model Fit: R² = .209, F = 26.76, df = 3, 222, p < .001
Coefficients (Variable, Coefficient, Std. Error, p):
  Constant: 2.46, .06, < .001
  DQN: .47, .08, < .001
  VPG: .09, .07, .230
  HD: .33, .07, < .001
Setwise Hypothesis Test: F = 12.15, df num = 2, df den = 222, p < .001

The model explained a statistically significant amount of variance in overall cooperation, F(3, 222) = 26.76, p < .001, R² = .21, adjusted R² = .20. AI type DQN was a significant predictor of overall cooperation, β = .474, t(223) = 5.81, p < .001: using the DQN model was associated with an increase of 0.474 points in a human-AI system's overall cooperation, B = 0.474, 95% CI [0.315, 0.633]. AI type VPG was not a significant predictor of overall cooperation, β = .085, t(223) = 1.20, p = .230. Alternatively, the game theory scenario Hawk Dove significantly predicted overall cooperation, β = .325, t(223) = 4.75, p < .001: a change in game theory scenario to Hawk Dove saw human-AI systems' overall cooperation increase by 0.325 points, B = 0.325, 95% CI [0.191, 0.459].

The results of this regression analysis showcase the impact of RQ1 and RQ2 based on the significant predictive ability of the game theory scenario and RL model used on cooperation. This finding further cements the point made in RQ1 and RQ2 that the social and task framing conveyed to both humans and AI models is highly relevant to human-AI system outcomes.
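As a sketch of how such a model could be fit (not the authors' analysis code; the data below are placeholders and the column names are our own), the dummy coding and HC3 robust standard errors map directly onto statsmodels' OLS interface, with PPO and the Prisoners Dilemma acting as the reference categories absorbed by the constant, as in Table 4.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Sketch only (not the authors' scripts): dummy-coded OLS with HC3
# heteroskedasticity-consistent standard errors, mirroring the model in Table 4.

df = pd.DataFrame({                       # placeholder data, one row per human-AI system
    "cooperation": np.random.default_rng(1).normal(2.8, 0.5, 226),
    "model": np.random.default_rng(2).choice(["DQN", "VPG", "PPO"], 226),
    "scenario": np.random.default_rng(3).choice(["PD", "HD"], 226),
})

X = pd.DataFrame({                        # PPO and PD are the reference categories
    "DQN": (df["model"] == "DQN").astype(int),
    "VPG": (df["model"] == "VPG").astype(int),
    "HD": (df["scenario"] == "HD").astype(int),
})
X = sm.add_constant(X)

fit = sm.OLS(df["cooperation"], X).fit(cov_type="HC3")   # HC3 robust standard errors
print(fit.summary())
```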

5. Discussion

Our results show meaningful differences in the cooperative dynamics between humans and AIs across various settings. Instead of limiting ourselves to just one game theory model, such as the often-used Prisoner's Dilemma, we explored the additional Hawk Dove scenario for human-AI cooperation and compared the impact of the similar but unique social and task

framing of each. The results from each game can be analyzed separately, but should also be understood as indicative of a broader behavior pattern. Beforehand, it is important to briefly discuss how the use of our selected game theory scenarios and RL models could have produced our observed results.

Specifically, the data from both scenarios have implications for both AI and humans' cooperative dynamics. Cooperation manifested differently based on the RL model that the AI teammate utilized. This finding may be the result of the distinct ways in which DQNs process strategic interactions compared to PPOs and VPGs. This would be expected, as the design of DQNs generally leads to higher levels of convergence over time, which could result in a more cooperative agent in these scenarios. Generally, the downside to this algorithm would be the time it takes to build the model; however, the simplistic nature of game theory scenarios allows DQN models to be trained quickly, resulting in higher degrees of cooperation forming in training times similar to the PPO and VPG. It is essential to understand, consider, and compensate for these differences when implementing AI alongside humans. These differences would need to be clarified for the specific task a team is conducting, which will allow a more intelligent and deliberate choice when deciding the back-end design of AI teammates. For instance, the DQN model's higher cooperation in these two tasks would suggest that it be utilized in contexts similar to the Prisoner's Dilemma and Hawk Dove scenarios; however, choosing a PPO or VPG model out of ignorance of model differences could significantly reduce the cooperation within the human-AI system. Without this knowledge and design, significant performance differences could be seen between different human-AI systems despite them existing in similar contexts and environments. Additionally, differences in cooperation levels over time were not significant between conditions, highlighting the stable development of cooperation across all conditions.

It is also important to note that AI safety researchers should not assume that the willingness of an AI to cooperate with humans in one scenario necessarily generalizes to all situations. Our setup goes a long way in establishing a strong basis to investigate human-AI interactions by testing cooperative dynamics across two unique game theory scenarios where cooperation is in the collective interest of the multi-agent system. Using game theory in this setting is useful because sharp deviations from Nash Equilibria clearly indicate the complex nature of the interactions. However, limiting empirical research on human-AI cooperation to just one game would have provided only scant evidence about AI and human players' behavioral patterns. While both scenarios look to evaluate cooperation in human-AI systems, the actual context and motive given to the participants varied based on the game. While we cannot say that these contextual differences are the reason for cooperation differences, their existence highlights the importance of the social framing of a task and scenario, which is suspected to be the reason behind varying levels of cooperation between games. Therefore, it is essential to consider team, task, and evaluation contexts when looking to understand human-AI systems. A lack of understanding in these areas could significantly change the utility and viability of powerful tools, such as game theory, for helping evaluate and coordinate aspects of human-AI systems.

While the bulk of this study's contribution is to the field of human-AI interaction, additional implications exist for the field of game theory. Specifically, this study further contributes to the literature regarding the applicability of game theory to evaluating human-AI interaction. Different game theory scenarios were shown to change the level of cooperation possible during human-AI interaction. These findings demonstrate the value game theory can have in the promotion and encouragement of cooperation. Furthermore, while identifying game theory as an evaluation tool is not entirely novel, the ability to use game theory as an encouragement and social scoping tool is highly important to future game theory research, especially regarding AI's interactions with game theory. Building scenarios and tasks that are scoped within the design of game theory scenarios, especially with the framing of the Hawk Dove, could create tasks and environments that demonstrate a greater and more apparent benefit from human-AI cooperation.

The more back-end consideration of algorithm selection and the more user-facing consideration of social and task framing show that understanding and designing human-AI systems rely on multiple layers of human-AI interactions. The continued pursuit of advancing human-AI systems will need to consider these features, especially in research settings where algorithm and task selection could significantly affect the results human-AI systems exhibit. As research in this area continues, a complete understanding of the factors that affect human-AI interaction can be achieved.

6. Limitations and Future Work

The following limiting factors should be taken into account when interpreting the results of this study. Response times could not be recorded during the game theory task,

so quality checks could only be implemented in the post-task survey. Additionally, while the scoping of game theory used has identified advantages in observing strategic play, its use creates some partial limitations in this study. Firstly, the simplistic nature of the game theory scenarios used makes it easier for cooperation to occur, as the benefits can become apparent more quickly. Real-world environments and scenarios may not benefit from the same simplicity and may not be able to achieve high levels of cooperation in such a short amount of time. Secondly, game theory lends itself to a specific definition of rationality that can be viewed simplistically and mathematically. However, theories of rational behavior in humans exist, such as bounded rationality, which may go beyond the simplistic definition used in the current study. These limitations do not mean that these results are not applicable to the real world, where humans have complex rationales, but they do mean that the relevancy of these results should not be considered without consideration for the game theory scoping used.

The two primary avenues for expanding upon this work involve the choice of RL models and game theory scenarios. For the former, the field of RL is expanding so rapidly that new RL models have emerged that analyze strategic situations in different ways. This paper limited itself to DQNs, VPGs, and PPOs because they provide representative models from the classical, deep learning, and modern RL paradigms. The results strongly indicate that the RL model the agent operates by is not ancillary to the outcome of a human-AI interaction in a game theory setting; thus, it would follow that empirically testing additional models might also be useful for generating a complete picture of human-AI cooperation. For the latter, many game theory scenarios would enable the exploration of human-AI cooperation under different incentive structures. Ideally, future research will focus on long-form games instead of iterative games, since the nature of cooperation is different. This focus would expand game theory's viability for human-AI system interaction, as specific game theory scenarios could be chosen based on the context and function of the human-AI system being evaluated. For example, the Centipede game, where participants take turns taking a slightly larger payoff or passing on a pot of rewards, would help identify how backward induction plays a role in human-AI cooperation. This type of scenario could be highly applicable in teams that mostly function asynchronously but are highly dependent on shared resources.

Overall, using cutting-edge RL models as well as context-applicable and extended-duration games can shed light on different types of human-AI cooperation. While this study provides insight into the viability of using game theory to understand human-AI systems, further research efforts are required to ensure broader applicability of game theory to real-world human-AI systems.

7. Conclusion

A significant question in AI safety and AI research as a whole for the years to come will be how to train humans and AIs to work together. Reinforcement learning is quickly becoming the dominant machine learning paradigm because of its generalizability. Thus, it is crucial to understand how the tasks human-AI systems face need to be framed, as well as the task-specific benefits of differing agent designs. To that end, this paper's methodology shows how different game theory models can be used to frame human-AI systems and better understand cooperation differences based on context. It is essential that the understanding from this research be further expanded to cover a more extensive variety of contexts, specifically related to human-AI interaction. As human-AI systems continue to progress into more environments, understanding the impact that task and social framing and context have on interaction, along with the underlying algorithms used for AI, will be vital to human-AI cooperation.

8. Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant No. 1829008.

References

[1] M. H. Jarrahi, "Artificial intelligence and the future of work: Human-AI symbiosis in organizational decision making," Business Horizons, vol. 61, no. 4, pp. 577–586, 2018.
[2] F.-H. Hsu, Behind Deep Blue: Building the Computer that Defeated the World Chess Champion. Princeton University Press, 2004.
[3] C. Thompson, "Clive Thompson on the cyborg advantage," Wired Magazine, 2010.
[4] H. Shirado and N. A. Christakis, "Locally noisy autonomous agents improve global human coordination in network experiments," Nature, vol. 545, no. 7654, p. 370, 2017.
[5] D. Wang, A. Khosla, R. Gargeya, H. Irshad, and A. H. Beck, "Deep learning for identifying metastatic breast cancer," arXiv preprint arXiv:1606.05718, 2016.
[6] N. J. McNeese, M. Demir, N. J. Cooke, and C. Myers, "Teaming with a synthetic teammate: Insights into human-autonomy teaming," Human Factors, pp. 262–273, 2017.
[7] M. Demir, N. J. McNeese, and N. J. Cooke, "The impact of perceived autonomous agents on dynamic team

behaviors," IEEE Transactions on Emerging Topics in Computational Intelligence, 2018.
[8] P. M. Miller and N. S. Fagley, "The effects of framing, problem variations, and providing rationale on choice," Personality and Social Psychology Bulletin, vol. 17, no. 5, pp. 517–522, 1991.
[9] I. Seeber, E. Bittner, R. O. Briggs, T. de Vreede, G.-J. De Vreede, A. Elkins, R. Maier, A. B. Merz, S. Oeste-Reiß, N. Randrup, et al., "Machines as teammates: A research agenda on AI in team collaboration," Information & Management, vol. 57, no. 2, p. 103174, 2020.
[10] K. Mason, P. Mannion, J. Duggan, and E. Howley, "Applying multi-agent reinforcement learning to watershed management," in Proceedings of the Adaptive and Learning Agents Workshop (at AAMAS 2016), 2016.
[11] L. Zheng, J. Yang, H. Cai, M. Zhou, W. Zhang, J. Wang, and Y. Yu, "MAgent: A many-agent reinforcement learning platform for artificial collective intelligence," in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[12] J. F. Nash et al., "Equilibrium points in n-person games," Proceedings of the National Academy of Sciences, vol. 36, no. 1, pp. 48–49, 1950.
[13] J. Z. Leibo, V. Zambaldi, M. Lanctot, J. Marecki, and T. Graepel, "Multi-agent reinforcement learning in sequential social dilemmas," in Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, pp. 464–473, International Foundation for Autonomous Agents and Multiagent Systems, 2017.
[14] M. L. Littman, "Markov games as a framework for multi-agent reinforcement learning," in Machine Learning Proceedings 1994, pp. 157–163, Elsevier, 1994.
[15] A. Bab and R. I. Brafman, "Multi-agent reinforcement learning in common interest and fixed sum stochastic games: An experimental study," Journal of Machine Learning Research, vol. 9, no. Dec, pp. 2635–2675, 2008.
[16] E. M. de Cote, A. Lazaric, and M. Restelli, "Learning to cooperate in multi-agent social dilemmas," in AAMAS, vol. 6, pp. 783–785, 2006.
[17] I. Erev and A. E. Roth, "Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria," American Economic Review, pp. 848–881, 1998.
[18] K. Tuyls and G. Weiss, "Multiagent learning: Basics, challenges, and prospects," AI Magazine, vol. 33, no. 3, pp. 41–41, 2012.
[19] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, "Playing Atari with deep reinforcement learning," arXiv preprint arXiv:1312.5602, 2013.
[20] R. J. Williams, "Simple statistical gradient-following algorithms for connectionist reinforcement learning," Machine Learning, vol. 8, no. 3-4, pp. 229–256, 1992.
[21] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv preprint arXiv:1707.06347, 2017.
[22] J. Foerster, R. Y. Chen, M. Al-Shedivat, S. Whiteson, P. Abbeel, and I. Mordatch, "Learning with opponent-learning awareness," in Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pp. 122–130, International Foundation for Autonomous Agents and Multiagent Systems, 2018.
[23] T. Bansal, J. Pachocki, S. Sidor, I. Sutskever, and I. Mordatch, "Emergent complexity via multi-agent competition," arXiv preprint arXiv:1710.03748, 2017.
[24] J. Hu, M. P. Wellman, et al., "Multiagent reinforcement learning: Theoretical framework and an algorithm," in ICML, vol. 98, pp. 242–250, Citeseer, 1998.
[25] J. W. Crandall and M. A. Goodrich, "Learning to compete, compromise, and cooperate in repeated general-sum games," in Proceedings of the 22nd International Conference on Machine Learning, pp. 161–168, 2005.
[26] J. W. Crandall, M. Oudah, F. Ishowo-Oloko, S. Abdallah, J.-F. Bonnefon, M. Cebrian, A. Shariff, M. A. Goodrich, I. Rahwan, et al., "Cooperating with machines," Nature Communications, vol. 9, no. 1, pp. 1–12, 2018.
[27] W. Zhang, L. Ma, and X. Li, "Multi-agent reinforcement learning based on local communication," Cluster Computing, pp. 1–10, 2018.
[28] Y. Shoham, R. Powers, and T. Grenager, "Multi-agent reinforcement learning: A critical survey," Web manuscript, 2003.
[29] X. Wang and T. Sandholm, "Reinforcement learning to play an optimal Nash equilibrium in team Markov games," in Advances in Neural Information Processing Systems, pp. 1603–1610, 2003.
[30] M. Lanctot, V. Zambaldi, A. Gruslys, A. Lazaridou, K. Tuyls, J. Perolat, D. Silver, and T. Graepel, "A unified game-theoretic approach to multiagent reinforcement learning," in Advances in Neural Information Processing Systems, pp. 4190–4203, 2017.
[31] C. Claus and C. Boutilier, "The dynamics of reinforcement learning in cooperative multiagent systems," AAAI/IAAI, vol. 1998, pp. 746–752, 1998.
[32] Y. Yang, R. Luo, M. Li, M. Zhou, W. Zhang, and J. Wang, "Mean field multi-agent reinforcement learning," arXiv preprint arXiv:1802.05438, 2018.
[33] R. Axelrod, "More effective choice in the prisoner's dilemma," Journal of Conflict Resolution, vol. 24, no. 3, pp. 379–403, 1980.
[34] M. Schaarschmidt, A. Kuhnle, B. Ellis, K. Fricke, F. Gessert, and E. Yoneki, "LIFT: Reinforcement learning in computer systems by learning from demonstrations," CoRR, vol. abs/1808.07903, 2018.
[35] A. Kuhnle, M. Schaarschmidt, and K. Fricke, "Tensorforce: A TensorFlow library for applied reinforcement learning." Web page, 2017.
[36] J. J. Horton, D. G. Rand, and R. J. Zeckhauser, "The online laboratory: Conducting experiments in a real labor market," Experimental Economics, vol. 14, no. 3, pp. 399–425, 2011.
[37] G. Paolacci, J. Chandler, and P. G. Ipeirotis, "Running experiments on Amazon Mechanical Turk," Judgment and Decision Making, vol. 5, no. 5, pp. 411–419, 2010.
[38] P. J. Rosopa, M. M. Schaffer, and A. N. Schroeder, "Managing heteroscedasticity in general linear models," Psychological Methods, vol. 18, no. 3, p. 335, 2013.
[39] A. F. Hayes and L. Cai, "Using heteroskedasticity-consistent standard error estimators in OLS regression: An introduction and software implementation," Behavior Research Methods, vol. 39, no. 4, pp. 709–722, 2007.


