ARTIFICIAL INTELLIGENCE IN GAMES
BEng (Hons) MECHATRONICS

Team members:
THIBAUT RIFAI (1312767)
LALLBAHADOOR GOVIND (1413959)

ABSTRACT
Games are among today's most popular application areas for AI research. While board games have been central to AI research since the inception of the field, video games have during the last decade increasingly become the domain of choice for testing and showcasing new algorithms. This paper illustrates some AI methods used to play games and how they may be applied.
Contents
1. Introduction
2. AI Methods for playing games
A. AI Methods: Ad-Hoc Behavior Authoring
B. AI Methods: Tree Search
C. AI Methods: Supervised Learning
3. Conclusion
References
1. Introduction
Artificial Intelligence (AI) has seen immense progress in recent years. AI advances have enabled
better understanding of images and speech, emotion detection, self-driving cars, web searching,
AI-assisted creative design, and game-playing, among many other tasks; for some of these tasks
machines have reached human-level performance or beyond. There is, however, a difference between
what machines can do well and what humans are good at. In the early days of AI, these problems
were presented to the machines as a set of formal mathematical notions within rather narrow and
controlled spaces, which could be solved by some form of symbol manipulation or search in
symbolic space. Over the years the focus of much AI research has shifted to tasks that are relatively
simple for humans to do but are hard for us to describe how to do, such as remembering a face or
recognizing a friend's voice over the phone [STUDY PANEL (2016); Yannakakis. G. N (2018)].
Game AI—in particular video game or computer game AI—has seen major advancements in the
(roughly) fifteen years of its existence as a separate research field. We can nowadays use AI to
play many games better than any human, we can design AI bots that are more believable and
human-like than human players, we can collaborate with AI to design better and unconventional
(aspects of) games, we can better understand players and play by modeling the overall game
experience, we can better understand game design by modeling it as an algorithm, and we can
improve game design and tune our monetization strategy by analyzing massive amounts of player
data. Any AI-informed decisions about the future of a game’s design or development are based on
evidence rather than intuition, which showcases the potential of AI—via game analytics and game
data mining—for better design, development and quality assurance procedures [Togelius. J
(2011)].
2. AI Methods for playing games
This section presents a number of basic AI methods that are commonly used in games.
Throughout this section the game of Ms Pac-Man (Namco, 1982) is used as a reference to illustrate how these methods can be applied in a game. In Artificial Intelligence and Games [Yannakakis. G. N (2018)], Ms Pac-Man was picked for its popularity and its game design simplicity, as well as for its high complexity when it comes to playing the game. It is important to remember that Ms
Pac-Man is a non-deterministic variant of its ancestor Pac-Man (Namco, 1980) which implies that
the movements of ghosts involve a degree of randomness. For the sake of consistency, all the
methods covered are employed to control Ms Pac-Man’s behavior even though they can find a
multitude of other uses in this game [Yannakakis. G. N (2018)].
All the methods covered rely on two core components: the first is the algorithm's representation and the second is its utility. On the one hand, any AI algorithm somehow stores and maintains knowledge obtained about a particular task at hand. On the other hand, most AI algorithms seek to find better representations of knowledge, and this seeking process is driven by a utility function of some form. The only methods that make no use of a utility are those that employ static knowledge representations, such as finite state machines or behavior trees.
1. Representation
Representation refers to how an algorithm stores and maintains the knowledge it has obtained about the task at hand, for instance as a set of states, a tree, a set of rules, or the weights of a neural network.
2. Utility
Utility in game theory is a measure of rational choice when playing a game. In general, it can be
viewed as a function that is able to assist a search algorithm to decide which path to take. For that
purpose, the utility function samples aspects of the search space and gathers information about the
“goodness” of areas in the space. In a sense, a utility function is an approximation of the solution
we try to find. In other words, it is a measure of the goodness of the existing representation we search through [Woodbury. R.F (1991)].
A. AI Methods: Ad-Hoc Behavior Authoring
1. Finite State Machines
A Finite State Machine (FSM) models behavior as a set of states, a set of transitions between states, and the actions executed within each state. In this section we showcase FSMs as employed to control the Ms Pac-Man agent. A hypothetical
and simplified FSM controller for Ms Pac-Man is illustrated in Fig. 2.1. In this example our FSM
has three states (seek pellets, chase ghosts and evade ghosts) and four transitions (ghosts flashing,
no visible ghost, ghost in sight, and power pill eaten). While in the seek pellets state, Ms Pac-Man
moves randomly until it detects a pellet, and then follows a pathfinding algorithm to eat as many
pellets as possible, as quickly as possible. If a power pill is eaten, then Ms Pac-Man moves
to the chase ghosts state, in which it can use any tree-search algorithm to chase the blue ghosts.
When the ghosts start flashing, Ms Pac-Man moves to the evade ghosts state in which it uses tree
search to evade ghosts so that none is visible within a distance; when that happens Ms Pac-Man
moves back to the seek pellets state.
Figure 2-1 A high-level and simplified FSM example for controlling Ms Pac-Man.
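To make the FSM of Figure 2-1 concrete, below is a minimal Python sketch of its three states and four transitions. The game object and its predicates (power_pill_eaten, ghost_in_sight, ghosts_flashing, no_visible_ghost) are hypothetical stand-ins for the actual game interface, and the ghost in sight transition is assumed to lead from seek pellets to evade ghosts, consistent with the figure.

from enum import Enum, auto

class State(Enum):
    SEEK_PELLETS = auto()
    CHASE_GHOSTS = auto()
    EVADE_GHOSTS = auto()

def next_state(state, game):
    # One FSM update: fire the first transition whose condition holds.
    if state is State.SEEK_PELLETS:
        if game.power_pill_eaten():
            return State.CHASE_GHOSTS   # blue ghosts are now edible
        if game.ghost_in_sight():
            return State.EVADE_GHOSTS   # a ghost threatens Ms Pac-Man
    elif state is State.CHASE_GHOSTS and game.ghosts_flashing():
        return State.EVADE_GHOSTS       # ghosts are about to become dangerous
    elif state is State.EVADE_GHOSTS and game.no_visible_ghost():
        return State.SEEK_PELLETS       # safe again: resume eating pellets
    return state                        # no transition fired; keep current state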
2. Behavior Trees
A Behavior Tree (BT) [Champandard, 2007] is an expert-knowledge system which, similarly to
an FSM, models transitions between a finite set of tasks (or behaviors). The strength of BTs
compared to FSMs is their modularity: if designed well, they can yield complex behaviors
composed of simple tasks. The main difference between BTs and FSMs is that they are composed
of behaviors rather than states. As with finite state machines, BTs are easy to design, test and
debug, which has made them dominant in the game development scene.
A BT employs a tree structure with a root node and a number of parent and corresponding child
nodes representing behaviors—see Fig. 2.2 for an example. We traverse a BT starting from the
root. We then activate the execution of parent-child pairs as denoted in the tree. A child may return
the following values to the parent in predetermined time steps (ticks): run if the behavior is still
active, success if the behavior is completed, failure if the behavior failed. BTs are composed of
three node types: the sequence, the selector, and the decorator, the basic functionality of which is
described below:
• Sequence (see blue rectangle in Fig. 2.2): if the child behavior succeeds, the sequence continues
and eventually the parent node succeeds if all child behaviors succeed; otherwise the sequence
fails.
• Selector (see red rounded rectangle in Fig. 2.2): there are two main types of selector nodes: the probability and the priority selectors. When a probability selector is used, child behaviors are selected based on parent-child probabilities set by the BT designer. If, on the other hand, priority selectors are used, child behaviors are ordered in a list and tried one after the other. Regardless of the selector type used, if the child behavior succeeds the selector succeeds. If the child behavior fails, the next child in the order is selected (in priority selectors) or the selector fails (in probability selectors).
• Decorator: a node with a single child that modifies or repeats that child's behavior; typical decorators run a child a fixed number of times, invert its return status, or keep it running until a condition is met (as in the seek pellets example below).
A BT for Ms Pac-Man
In Fig. 2.3 we illustrate a simple BT for the seek pellets behavior of Ms Pac-Man. While in the
seek pellets sequence behavior Ms Pac-Man will first move (selector), it will then find a pellet and
finally it will keep eating pellets until a ghost comes in sight (decorator). While in the move
behavior, which is a priority selector, Ms Pac-Man will prioritize ghost-free corridors over
corridors with pellets and over corridors without pellets.
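As a minimal sketch of how such a tree could be implemented, the Python snippet below encodes the sequence and priority selector node types under the tick protocol (run/success/failure) described above; the leaf behaviors and the game object are hypothetical.

SUCCESS, FAILURE, RUN = "success", "failure", "run"

class Sequence:
    # Succeeds only if all children succeed, in order; fails on the first failure.
    def __init__(self, children):
        self.children = children
    def tick(self, game):
        for child in self.children:
            status = child.tick(game)
            if status != SUCCESS:
                return status           # propagate failure (or run) to the parent
        return SUCCESS

class PrioritySelector:
    # Tries children in priority order; succeeds on the first child that succeeds.
    def __init__(self, children):
        self.children = children
    def tick(self, game):
        for child in self.children:
            status = child.tick(game)
            if status != FAILURE:
                return status           # propagate success (or run) to the parent
        return FAILURE

# Hypothetical seek-pellets subtree mirroring Fig. 2.3:
# seek_pellets = Sequence([move_selector, find_pellet, eat_until_ghost_in_sight])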
3. Utility-Based AI
An increasingly popular method for ad-hoc behavior authoring that eliminates the modularity
limitations of FSMs and BTs is the utility-based AI approach which can be used for the design of
control and decision making systems in games [Mark. D, 2009]. Following this approach, instances
in the game get assigned a particular utility function that gives a value for the importance of the
particular instance [Alexander. B (2002)]. For instance, the importance of an enemy being present
at a particular distance, or the importance of an agent's health being low in this particular context.
Given the set of all utilities available to an agent and all the options it has, utility-based AI decides
which is the most important option it should consider at this moment [Mark. D and Dill. K. (2010)].
The approach is similar to the design of membership functions in a fuzzy set. A utility can measure
anything from observable objective data (e.g., enemy health) to subjective notions such as
emotions, mood and threat. The various utilities about possible actions or decisions can be
aggregated into linear or non-linear formulas and guide the agent to take decisions based on the
aggregated utility.
So while FSMs and BTs would examine one decision at a time, utility-based AI architectures
examine all available options, assign a utility to each of them, and select the option that is most
appropriate (i.e., the one with the highest utility).
Figure 2.4 illustrates an example of three simple utility functions that could be considered by Ms
Pac-Man during play. Each function corresponds to a different behavior that is dependent on the
current threat level of the game; threat is, in turn, a function of current ghost positions. At any
point in the game Ms Pac-Man selects the behavior with the highest utility value.
Figure 2-4 A utility-based approach for controlling Ms Pac-Man behavior. The threat level (x-axis) is a function that lies
between 0 and 1 which is based on the current position of ghosts. Ms Pac-Man considers the current level of threat, assigns
utility values (through the three different curves) and decides to follow the behavior with the highest utility value. In this example
the utility of ghost evading rises exponentially with the level of threat. The utility of ghost hunting decreases linearly with respect
to threat up to a point where it stabilizes; it then decreases linearly as the threat level increases above a threshold value. Finally,
the utility of pellet seeking increases linearly up to considerable threat level from which point it decreases exponentially.
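The decision rule of Figure 2-4 reduces to evaluating one utility curve per behavior and picking the maximum. The Python sketch below uses made-up coefficients that only roughly reproduce the curve shapes described in the caption; the threat value in [0, 1] is assumed to be computed elsewhere from the ghost positions.

import math

def utility_evade(threat):
    # rises exponentially with the threat level, scaled into [0, 1]
    return (math.exp(3.0 * threat) - 1.0) / (math.exp(3.0) - 1.0)

def utility_hunt(threat):
    # decreases linearly, stabilizes, then decreases again above a threshold
    if threat < 0.3:
        return 0.9 - threat
    if threat < 0.6:
        return 0.6
    return max(0.0, 0.6 - 1.5 * (threat - 0.6))

def utility_seek(threat):
    # increases up to a considerable threat level, then falls off exponentially
    return threat / 0.6 if threat < 0.6 else math.exp(-8.0 * (threat - 0.6))

def choose_behavior(threat):
    utilities = {
        "evade ghosts": utility_evade(threat),
        "chase ghosts": utility_hunt(threat),
        "seek pellets": utility_seek(threat),
    }
    return max(utilities, key=utilities.get)  # behavior with the highest utility

print(choose_behavior(0.9))  # high threat: "evade ghosts"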
B. AI Methods: Tree Search
The three main tree-search approaches covered are:
1. Uninformed Search
2. Best-First Search
3. Monte Carlo Tree Search
1. Uninformed Search
Uninformed search algorithms are algorithms which search a state space without any further
information about the goal. The basic uninformed search algorithms are commonly seen as
fundamental computer science algorithms, and are sometimes not even seen as AI. Depth-first
search is a search algorithm which explores each branch as far as possible before backtracking and
trying another branch. At every iteration of its main loop, depth-first search selects a branch and
then moves on to explore the resulting node in the next iteration. When a terminal node is reached,
depth-first search advances up the list of visited nodes until it finds one which has unexplored
actions [Russell. S and Norvig. P (1995)].
A depth-first approach in Ms Pac-Man would normally consider the branches of the game tree
until Ms Pac-Man either completes the level or loses. The outcome of this search for each possible
action would determine which action to take at a given moment. Breadth-first instead would first
explore all possible actions of Ms Pac-Man at the current state of the game (e.g., going left, up,
down or right) and would then explore all their resulting nodes (children) and so on. The game tree
of either method is too big and complex to visualize within a Ms Pac-Man example.
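Both uninformed strategies can be sketched with a single Python routine over an abstract state space: the only difference between depth-first and breadth-first search is whether the frontier behaves as a stack or as a queue. The expand(state) forward model, which returns (action, successor) pairs, and the is_goal test are hypothetical, and states are assumed to be hashable.

from collections import deque

def uninformed_search(start, is_goal, expand, depth_first=True):
    # Frontier of (state, path-of-actions) pairs; no goal knowledge is used
    # beyond the is_goal test itself.
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        # pop() -> stack -> depth-first; popleft() -> queue -> breadth-first
        state, path = frontier.pop() if depth_first else frontier.popleft()
        if is_goal(state):
            return path                 # sequence of actions reaching the goal
        for action, child in expand(state):
            if child not in visited:    # avoid revisiting explored states
                visited.add(child)
                frontier.append((child, path + [action]))
    return None                         # goal unreachable from start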
2. Best-First Search
In best-first search, the expansion of nodes in the search tree is informed by some knowledge
about the goal state. In general, the node that is closest to the goal state by some criterion is
expanded first. The most well-known best-first search algorithm is A* (pronounced A star). The
A* algorithm keeps a list of “open” nodes, which are next to an explored node but which have not
themselves been explored. For each open node, an estimate of its distance from the goal is made.
New nodes are chosen to explore based on a lowest cost basis, where the cost is the distance from
the origin node plus the estimate of the distance to the goal. A* can easily be understood as
navigation in two- or three-dimensional space. Variants of this algorithm are therefore commonly
used for path finding in games [Sturtevant. N (2008)].
Best-first search is applicable to Ms Pac-Man in the form of A*. Following the paradigm of the
2009 Mario AI competition champion, Ms Pac-Man can be controlled by an A* algorithm that
searches through possible game states within a short time frame and takes a decision on where to
move next (up, down, left or right). The game state can be represented in various ways: from a
very direct, yet costly, representation that takes ghost and pellet coordinates into account to an
indirect representation that considers the distance to the closest ghost or pellet. Regardless of the
representation chosen, A* requires the design of a cost function that will drive the search. Relevant
cost functions for Ms Pac-Man would normally reward moves to areas containing pellets and
penalize moves to areas containing ghosts.
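A compact sketch of A* for maze navigation of this kind is given below. The neighbors(pos) function, returning the walkable cells adjacent to a grid position, is a hypothetical stand-in for the Ms Pac-Man level, and Manhattan distance serves as the admissible estimate of the remaining distance to the goal.

import heapq

def manhattan(a, b):
    # admissible distance estimate on a grid with 4-directional movement
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def a_star(start, goal, neighbors):
    # Open list ordered by f = cost from origin (g) + estimated cost to goal (h).
    open_list = [(manhattan(start, goal), 0, start, [start])]
    best_g = {start: 0}                 # cheapest known cost to reach each cell
    while open_list:
        f, g, pos, path = heapq.heappop(open_list)
        if pos == goal:
            return path                 # list of grid cells from start to goal
        for nxt in neighbors(pos):
            new_g = g + 1               # uniform step cost in the maze
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(open_list,
                               (new_g + manhattan(nxt, goal), new_g, nxt, path + [nxt]))
    return None                         # goal unreachable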
3. Monte Carlo Tree Search
Figure 2-5 The four basic steps of MCTS exemplified through one iteration of the algorithm. The figure is a recreation of the corresponding MCTS outline figure by Chaslot et al. (2008).
MCTS can be applicable to the real-time control of the Ms Pac-Man agent. We will outline the
approach followed by Pepels et al. (2014) given its success in obtaining high scores for Ms Pac-
Man. When MCTS is used for real-time decision making a number of challenges become critical.
First, the algorithm has a limited computational budget for rollouts, which increases the importance of
heuristic knowledge. Second, the action space can be particularly fine-grained, which suggests that
macro-actions are a more powerful way to model the game tree; otherwise the agent’s planning
will be very short-term. Third, there might be no terminal node in sight which calls for good
heuristics and possibly restricting the simulation depth. The MCTS agent of Pepels et al. [Pepels.
T (2014)] managed to cope with all the above challenges of using MCTS for real-time control by
using a restricted game tree and a junction-based game state representation (see Fig. 2.8).
Figure 2-8 The junction-based representation of a game state for the Maastricht MCTS controller [Pepels. T (2014)]. All letter nodes refer to game tree nodes (decisions) for Ms Pac-Man. Image adapted from Pepels et al. (2014) with permission from the authors.
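A generic UCT-style skeleton of the four MCTS steps is sketched below. This is not the Pepels et al. controller; it merely restricts the simulation depth and scores rollouts with a heuristic, as discussed above, and the forward model interface (actions, step, heuristic) is a hypothetical assumption.

import math, random

class Node:
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children, self.visits, self.value = [], 0, 0.0

def uct(node, c=1.4):
    # exploit the mean reward, explore children tried comparatively rarely
    return (node.value / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def mcts(root_state, model, budget=1000, depth_limit=50):
    # Assumes the root state has at least one legal action.
    root = Node(root_state)
    for _ in range(budget):
        node = root
        # 1. Selection: descend while the node has all its actions expanded
        while node.children and len(node.children) == len(model.actions(node.state)):
            node = max(node.children, key=uct)
        # 2. Expansion: add a child for one untried action
        tried = {child.action for child in node.children}
        untried = [a for a in model.actions(node.state) if a not in tried]
        if untried:
            action = random.choice(untried)
            child = Node(model.step(node.state, action), parent=node, action=action)
            node.children.append(child)
            node = child
        # 3. Simulation: random rollout with restricted depth, scored heuristically
        state = node.state
        for _ in range(depth_limit):
            actions = model.actions(state)
            if not actions:
                break                       # terminal state reached
            state = model.step(state, random.choice(actions))
        reward = model.heuristic(state)     # e.g., normalized game score
        # 4. Backpropagation: update statistics from the new node up to the root
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda c: c.visits).action  # most robust move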
C. AI methods: Supervised Learning
Supervised learning requires a set of labeled training examples; hence supervised. More
specifically, the training signal comes as a set of supervised labels on the data (e.g., this is an apple
whereas that one is a pear) which acts upon a set of characterizations of these labels (e.g., this
apple has red color and medium size). The ultimate goal of supervised learning is not to merely
learn from the input-output pairs but to derive a function that approximates (or better, imitates) their
relationship. The derived function should be able to map well to new and unseen instances of input
and output pairs [Bishop. M. C. (2006)].
1. ANNs
ANNs learn by, first, evaluating a cost function that measures the quality of any given set of weights and, second, using a search strategy within the space of possible solutions (i.e., the weight space). These two aspects are outlined below.
Figure 2-9 An MLP example with three inputs, one hidden layer containing four hidden neurons
and two outputs. The ANN has labeled and ordered neurons and example connection weight labels.
Bias weights bj are not illustrated in this example but are connected to each neuron j of the ANN.
Before we attempt to adjust the weights to approximate f, we need some measure of MLP
performance. The most common performance measure for training ANNs in a supervised manner
is the squared Euclidean distance (error) between the vectors of the actual output of the ANN (a)
and the desired labeled output y (see equation below).
E = \frac{1}{2} \sum_{j} (a_j - y_j)^2
where the sum is taken over all the output neurons (the neurons in the final layer) and the factor of 1/2 is conventionally included to simplify the derivative. Note that the
yj labels are constant values and more importantly, also note that E is a function of all the weights
of the ANN since the actual outputs depend on them. As we will see below, ANN training
algorithms build strongly upon this relationship between error and weights.
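For concreteness, a two-output example of this error measure with made-up values:

a = [0.8, 0.2]   # actual ANN outputs
y = [1.0, 0.0]   # desired labeled outputs
E = 0.5 * sum((aj - yj) ** 2 for aj, yj in zip(a, y))
print(E)         # 0.5 * ((-0.2)**2 + 0.2**2) = 0.04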
Back-propagation
One straightforward way to use ANNs in Ms Pac-Man is to attempt to imitate expert players of
the game. Thus, one can ask experts to play the game and record their play-throughs, through
which a number of features can be extracted and used as the input of the ANN. The resolution of
the ANN input may vary from simple statistics of the game—such as the average distance between
ghosts and Ms Pac-Man—to detailed pixel-to-pixel RGB values of the game level image. The
output data, on the other hand, may contain the actions selected by Ms Pac-Man in each frame of
the game. Given the input and desired output pairs, the ANN is trained via back-propagation to
predict the action performed by expert players (ANN output) given the current game state (ANN
input). The size (width and depth) of the ANN depends on both the amount of data available from
the expert Ms Pac-Man players and the size of the input vector considered.
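As a sketch of this imitation pipeline, and assuming hypothetical game-state features and expert action labels, a small MLP can be trained with scikit-learn, whose MLPClassifier fits its weights via back-propagation:

import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical training data: one row per recorded frame, e.g.
# [distance to nearest ghost, distance to nearest pellet, power pill active]
X = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.5, 0.0],
              [0.1, 0.4, 1.0],
              [0.8, 0.3, 0.0],
              [0.3, 0.6, 0.0],
              [0.2, 0.3, 1.0]])
# Expert's chosen action in each of those frames (the desired ANN output)
y = np.array(["up", "left", "right", "up", "left", "right"])

# One hidden layer of four neurons, as in the MLP figure above
net = MLPClassifier(hidden_layer_sizes=(4,), max_iter=5000, random_state=0)
net.fit(X, y)                            # weights adjusted via back-propagation

print(net.predict([[0.25, 0.45, 1.0]]))  # predicted expert-like action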
2. Decision Trees
Decision trees vary with respect to the output data type they attempt to learn [Breiman. L et al. (1984)]. The goal of decision tree learning is to construct a mapping (a tree model) that predicts the value of target outputs based on a number of input attributes.
As with ANNs, decision tree learning requires data to be trained on. Presuming that data from
expert Ms Pac-Man players would be of good quality and quantity, decision trees can be
constructed to predict the strategy of Ms Pac-Man based on a number of ad-hoc designed attributes
of the game state. Figure 2.16 illustrates a simplified hypothetical decision tree for controlling Ms
Pac-Man. According to that example, if a ghost is nearby, then Ms Pac-Man checks whether power pills
are available within a close distance and aims for those; otherwise it takes actions so that it evades the
ghost. If, alternatively, ghosts are not visible, Ms Pac-Man checks for pellets. If those are nearby or
in a fair distance then it aims for them; otherwise it aims for the fruit, if that is available on the
level. It is important to note that the leaves of the tree in our example represent control strategies
(macro-actions) rather than actual actions (up, down, left, and right) for Ms Pac-Man.
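A learned, rather than hand-designed, version of such a tree can be sketched with scikit-learn; the attributes and the macro-action labels below are hypothetical but mirror the structure of the example above.

from sklearn.tree import DecisionTreeClassifier, export_text

features = ["ghost_nearby", "power_pill_close", "pellet_distance"]
# Hypothetical game states described by the attributes above
X = [[1, 1, 0.3], [1, 0, 0.5], [0, 0, 0.2],
     [0, 0, 0.9], [1, 1, 0.7], [0, 1, 0.4]]
# Macro-action (control strategy) chosen by an expert in each state
y = ["go_to_power_pill", "evade_ghost", "seek_pellets",
     "go_to_fruit", "go_to_power_pill", "seek_pellets"]

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=features))  # human-readable tree rules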
3. Conclusion
AI can either play to win or play in order to create a particular experience for a human player or
observer [Yannakakis. G. N (2018)]. The former goal involves maximizing a utility that maps to
the game performance whereas the latter goal involves objectives beyond merely winning such as
engagement, believability, balance and interestingness. AI as an actor can take the role of either a
player character or a non-player character that exists in a game. The characteristics of games an
AI method needs to consider when playing include the number of players, the level of stochasticity
of the game, the amount of observability available, the action space and the branching factor, and
the time granularity. Further, when we design an algorithm to play a game we also need to consider
algorithmic aspects such as the state representation, the existence of a forward model, the training
time available, and the number of games the AI can play. These roles and characteristics are important and relevant regardless of the AI method applied. When it comes to the methods covered in this paper, we focused on ad-hoc behavior authoring (FSMs, behavior trees and utility-based AI), tree search, and supervised learning for playing games. In a sense, we tailored general-purpose AI methods to the task of game playing.
References
Alexander, B. (2002). The beauty of response curves. AI Game Programming Wisdom, page 78.

Breiman, L., Friedman, J., Stone, C. J. and Olshen, R. A. (1984). Classification and Regression Trees. CRC Press.

Champandard, A. J. (2007). Understanding behavior trees. AiGameDev.com.

Chaslot, G. M. J. B., Winands, M. H. M., van den Herik, H. J., Uiterwijk, J. W. H. M. and Bouzy, B. (2008). Progressive strategies for Monte-Carlo tree search. New Mathematics and Natural Computation, 4(3):343–357.

Mark, D. (2009). Behavioral Mathematics for Game AI. Charles River Media.

Mark, D. and Dill, K. (2010). Improving AI decision modeling through utility theory. In Game Developers Conference.

Pepels, T., Winands, M. H. M. and Lanctot, M. (2014). Real-time Monte Carlo tree search in Ms Pac-Man. IEEE Transactions on Computational Intelligence and AI in Games, 6(3):245–257.

STUDY PANEL (2016). Artificial Intelligence and Life in 2030. Stanford University. Made available under a Creative Commons Attribution-NoDerivatives 4.0 International License: https://creativecommons.org/licenses/by-nd/4.0/

Togelius, J., Yannakakis, G. N., Stanley, K. O. and Browne, C. (2011). Search-based procedural content generation: A taxonomy and survey. IEEE Transactions on Computational Intelligence and AI in Games, 3(3):172–186.

Woodbury, R. F. (1991). Searching for designs: Paradigm and practice. Building and Environment, 26(1):61–73.

Yannakakis, G. N. and Togelius, J. (2018). Artificial Intelligence and Games. New York, NY, USA: Springer. Available at: http://gameaibook.org/book.pdf