Adversarial Search
CS 221: Artificial Intelligence
Do the Right Thing under Uncertainty
Problems of Agent Uncertainty
Problem: Stochasticity
Solution: MDPs → a policy π(s)
Problem: Partial observability
Solution: operate in belief space: POMDPs → a policy
Problem: Unknown model
Solution: exploration; reinforcement learning
Problem: Computational limitation
Solution: heuristics; A*; Monte Carlo approximations
Problem: Other agents; adversaries
Solution: adversarial search (this lecture)
Environments and Agents
What is the environment?
Something that evolves from one state to the next in response to actions: Result(s, a) → s′
Decision to Model as an Agent
Model an object as an agent iff:
The object’s actions depend not just on the state, but on my belief state (especially my beliefs about my own actions)
“Now, a clever man would put the poison into his own goblet, because he would know that only a great fool would reach for what he was given. I am not a great fool, so I can clearly not choose the wine in front of you. But you must have known I was not a great fool, you would have counted on it, so I can clearly not choose the wine in front of me.”
(The Princess Bride: Vizzini vs. Westley)
Other Agents
Cooperation
Competition
Types of Games
Deterministic (Chess)
Stochastic (Soccer)
Pacman: unknown
1: Deterministic, Fully Observable
Many possible formalizations; one is:
States: S (start at s₀)
Players: P = {1…N} (usually take turns; often N = 2)
Actions: A (may depend on player / state)
Transition Function: S × A → S (or S × {Aᵢ} → S)
Terminal Test: S → {true, false}
Terminal Utilities: S × P → ℝ
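A minimal sketch of this formalization as a Python interface; the names (GameState, actions, result, is_terminal, utility) are illustrative, not from any particular library or the course code:

# Skeleton of the game formalization above (illustrative names only).
from dataclasses import dataclass

@dataclass(frozen=True)
class GameState:
    board: tuple        # some encoding of the position
    to_move: int        # which player acts next (0 .. N-1)

def actions(state):
    """Legal actions A for the player to move; may depend on player and state."""
    raise NotImplementedError

def result(state, action):
    """Transition function: S x A -> S."""
    raise NotImplementedError

def is_terminal(state):
    """Terminal test: S -> {true, false}."""
    raise NotImplementedError

def utility(state, player):
    """Terminal utilities: S x P -> R, defined only at terminal states."""
    raise NotImplementedError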
Deterministic Single-Player
Deterministic, single player (solitaire), perfect information:
Know the rules
Know what actions do
Know when you win
E.g. FreeCell, Rubik’s Cube
… it’s just search!
Slight reinterpretation:
Each node stores a value: the best outcome it can reach
This is the maximal outcome of its children (the max value)
Note that we don’t have path sums as before (utilities are only at the end)
(Example tree: leaves labeled lose, win, lose.)
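A tiny runnable illustration of this reinterpretation; encoding the leaves as 0 = lose and 1 = win is just a convenience here:

# Single-player value of a node: the best outcome reachable from it.
def best_outcome(node):
    if isinstance(node, (int, float)):       # leaf: terminal utility (0 = lose, 1 = win)
        return node
    # No path costs: the node's value is the max of its children's values.
    return max(best_outcome(child) for child in node)

# The slide's tiny tree: three leaves, lose / win / lose.
print(best_outcome([0, 1, 0]))               # -> 1: the win is reachable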
Deterministic Two-Player
Deterministic, zero-sum games: tic-tac-toe, chess, checkers
One player maximizes the result; the other minimizes the result
Minimax search:
A state-space search tree
Players alternate turns
Each node has a minimax value: the best achievable utility against a rational adversary
Minimax values are computed recursively; terminal values are part of the game
(Example tree: MAX root with value 5 over MIN nodes with values 2 and 5, from leaves 8, 2, 5, 6.)
Computing Minimax Values
Two recursive functions:
max-value maxes over the values of successors
min-value mins over the values of successors

def value(state):
    if the state is a terminal state: return the state’s utility
    if the next agent is MAX: return max-value(state)
    if the next agent is MIN: return min-value(state)

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v ← max(v, value(successor))
    return v

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v ← min(v, value(successor))
    return v
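A concrete, runnable version of the same recursion on a tiny hand-built tree; the tree encodes only terminal utilities at the leaves and matches the 8, 2, 5, 6 example above:

# Runnable minimax on a small explicit tree (illustrative example).
def minimax(node, maximizing):
    if isinstance(node, (int, float)):          # leaf: terminal utility
        return node
    children = (minimax(child, not maximizing) for child in node)
    return max(children) if maximizing else min(children)

# MAX moves at the root; each inner list is a MIN (opponent) choice.
tree = [[8, 2], [5, 6]]
print(minimax(tree, maximizing=True))           # -> 5: MAX picks the right branch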
Tic-tac-toe Game Tree
Minimax Example
(Worked tree: MIN nodes take values 3, 2, and 1 from the leaf groups 3 12 8, 2 3 9, and 14 1 8; the MAX root takes value 3.)
Minimax Properties
Optimal against a perfect player. Against a non-perfect player?
(Example tree: the left MIN subtree has leaves 10 and 10, value 10; the right has leaves 9 and 100, value 9. Minimax picks the left branch, even though an imperfect opponent might have let us reach 100 on the right.)
Time complexity? O(b^m)
Space complexity? O(bm)
Overcoming Computational Limits
Cannot search to the leaves in most games
Depth-limited search: instead, search only to a limited depth of the tree
Replace terminal utilities with an evaluation function for non-terminal nodes
The guarantee of optimal play is gone
(Example with limit = 2: a MAX root over MIN nodes with leaves -1, -2 and 4, 9; the MIN values are -2 and 4, so the root value is 4.)
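A sketch of depth-limited minimax, assuming the game interface sketched earlier plus a heuristic evaluation(state); this is an illustration, not the course's reference implementation:

# Depth-limited minimax: use evaluation() at the depth cutoff (sketch).
def dl_value(state, depth, maximizing, player=0):
    if is_terminal(state):
        return utility(state, player)
    if depth == 0:
        return evaluation(state)                 # heuristic estimate, not a true utility
    values = (dl_value(result(state, a), depth - 1, not maximizing, player)
              for a in actions(state))
    return max(values) if maximizing else min(values)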
α-β Pruning in Depth-Limited Search
General configuration:
α is the best value that the maximizing Player can get so far along the current path
If the value of node n becomes worse than α, MAX will avoid it, so Player can stop considering n’s other children
(Figure: Player at the top, the Opponent choosing at node n below.)
Another α-β Pruning Example
(Worked tree example; pruned nodes carry bounds such as ≤2, ≤1, and ≥8; leaf values include 3, 12, 2, 14, 5, 1, and 8.)
α-β Pruning Algorithm
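The algorithm itself did not survive extraction; below is a standard α-β formulation in the style of the minimax pseudocode above, assuming the same game interface (a reconstruction, not copied from the course slides):

# Alpha-beta pruning over the same max/min recursion (standard formulation).
import math

def ab_value(state, alpha, beta, maximizing, player=0):
    if is_terminal(state):
        return utility(state, player)
    if maximizing:
        v = -math.inf
        for a in actions(state):
            v = max(v, ab_value(result(state, a), alpha, beta, False, player))
            if v >= beta:            # MIN above will never allow this branch
                return v
            alpha = max(alpha, v)
        return v
    else:
        v = math.inf
        for a in actions(state):
            v = min(v, ab_value(result(state, a), alpha, beta, True, player))
            if v <= alpha:           # MAX above will never allow this branch
                return v
            beta = min(beta, v)
        return v

# Initial call from the root: ab_value(start, -math.inf, math.inf, maximizing=True)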
α-β Pruning Properties
Pruning has no effect on the final minimax value or action computed at the root
Stochasticity
Expectimax Search Trees
What if we don’t know what the result of an action will be? E.g.:
In solitaire, the next card is unknown
In backgammon, the dice roll
In minesweeper, the mine locations
In Pacman, the random ghost moves
Reminder: Expectations
We can define a function f(X) of a random variable X; its expected value is the probability-weighted average over outcomes:
E[f(X)] = Σₓ P(X = x) · f(x)
Expectimax Search
In expectimax search, we have a probabilistic model of how the opponent (or environment) will behave in any state
The model could be a simple uniform distribution (roll a die)
The model could be sophisticated and require a great deal of computation
We have a node for every outcome that is out of our control: opponent or environment
The model might say that adversarial actions are likely!
For now, assume that for any state we magically have a distribution that assigns probabilities to opponent actions / environment outcomes
Having a probabilistic belief about an agent’s action does not mean that agent is flipping any coins!
Expectimax Algorithm

def value(s):
    if s is a terminal node: return evaluation(s)
    if s is a max node: return max_value(s)
    if s is an exp node: return exp_value(s)

def max_value(s):
    values = [value(t) for t in successors(s)]
    return max(values)

def exp_value(s):
    values = [value(t) for t in successors(s)]
    weights = [probability(s, t) for t in successors(s)]
    return sum(w * v for w, v in zip(weights, values))   # expectation of the values
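A concrete, runnable instance of the same recursion on a tiny explicit tree; the tree encoding and probabilities are illustrative, with leaves 8, 4, 5, 6 as in the slide's example:

# Runnable expectimax on a tiny explicit tree (illustrative example).
def expectimax(node):
    kind = node[0]
    if kind == "leaf":                       # ("leaf", utility)
        return node[1]
    if kind == "max":                        # ("max", [child, ...])
        return max(expectimax(c) for c in node[1])
    # ("exp", [(probability, child), ...]): probability-weighted average
    return sum(p * expectimax(c) for p, c in node[1])

tree = ("max", [
    ("exp", [(0.5, ("leaf", 8)), (0.5, ("leaf", 4))]),   # expected value 6
    ("exp", [(0.5, ("leaf", 5)), (0.5, ("leaf", 6))]),   # expected value 5.5
])
print(expectimax(tree))                      # -> 6.0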
Expectimax for Pacman
Notice that we’ve gotten away from thinking that the ghosts are trying to minimize Pacman’s score
Instead, they are now a part of the environment
Pacman has a belief (a distribution) over how they will act
Quiz: Can we see minimax as a special case of expectimax?
Quiz: What would Pacman’s computation look like if we assumed that the ghosts were doing 1-ply minimax and taking that result 80% of the time, otherwise moving randomly? (One possible sketch follows below.)
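One possible answer sketch for that second quiz, using the same assumed helpers as the expectimax pseudocode above (successors, value, evaluation); this is an illustration, not the course's reference answer:

# Ghost node for the quiz: 80% of the time the ghost plays its 1-ply minimizing
# move (judged by the evaluation function), 20% of the time it moves uniformly
# at random. Pacman's own value() recursion is unchanged. (Illustrative sketch.)
def ghost_value(s):
    succs = list(successors(s))
    ghost_pick = min(succs, key=evaluation)          # ghost's 1-ply minimizing choice
    vals = {t: value(t) for t in succs}              # Pacman still evaluates fully
    uniform_avg = sum(vals.values()) / len(vals)
    return 0.8 * vals[ghost_pick] + 0.2 * uniform_avg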
If you take this further, you end up calculating belief distributions over your opponents’ belief distributions over your belief distributions, etc.
This can get unmanageable very quickly!
Expectimax for Pacman: Results
Results from playing 5 games against a minimizing ghost and against a random ghost (the score table is not reproduced here)
Pacman used depth-4 search with an evaluation function that avoids trouble
The ghost used depth-2 search with an evaluation function that seeks Pacman
Expectimax Example
Expectimax Pruning?
In general, no: without bounds on the leaf values, any unseen outcome could change a chance node’s expected value, so α-β-style pruning does not carry over.
Expectimax Evaluation
Evaluation functions quickly return an estimate of a node’s true value (which value: expectimax or minimax?)
For minimax, the scale of the evaluation function doesn’t matter: we just want better states to have higher evaluations (get the ordering right)
For expectimax, we need the magnitudes to be meaningful (see the worked example below)
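A small illustration, with numbers invented for this point: suppose chance node A has outcomes 30 and 30 (each with probability 0.5) and chance node B has outcomes 0 and 50. Expectimax prefers A (expected value 30 vs. 25). Now apply the order-preserving transform x → x²: A's expectation becomes 900 while B's becomes 1250, so expectimax flips to B, even though the ordering of individual outcomes never changed. Minimax, which only compares worst cases (30 vs. 0, then 900 vs. 0), picks A either way.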
ExpectiMinimax-Value(state):
= Utility(state)  if state is terminal
= max over actions a of ExpectiMinimax-Value(Result(state, a))  if the MAX player moves
= min over actions a of ExpectiMinimax-Value(Result(state, a))  if the MIN player moves
= Σ over outcomes r of P(r) · ExpectiMinimax-Value(Result(state, r))  if state is a chance node
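A compact sketch of that recursion in the style of the expectimax pseudocode above; the node-type tests (is_max_node, is_min_node) and the other helpers are assumed, not defined in the slides:

# Expectiminimax: interleaved max, min, and chance nodes (sketch).
def expectiminimax_value(s):
    if is_terminal(s):
        return evaluation(s)                 # true utility at terminals (or heuristic at a cutoff)
    succs = list(successors(s))
    if is_max_node(s):
        return max(expectiminimax_value(t) for t in succs)
    if is_min_node(s):
        return min(expectiminimax_value(t) for t in succs)
    # chance node: probability-weighted average over outcomes
    return sum(probability(s, t) * expectiminimax_value(t) for t in succs)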
Stochastic Two-Player
E.g. backgammon
Dice rolls increase b: 21 possible rolls with 2 dice, and about 20 legal moves per position
Depth 2: 20 × (21 × 20)^3 ≈ 1.2 × 10^9 nodes
As depth increases, the probability of reaching a given search node shrinks
So the usefulness of deep search is diminished
So limiting depth is less damaging
But pruning is trickier…
TD-Gammon uses depth-2 search + a very good evaluation function + reinforcement learning:
world-champion-level play
The first AI world champion in any game!