
Announcements

 Assignment 3 due before midnight tonight

 Assignment 4 out today, due next Thursday

1
CS 221: Artificial Intelligence

Lecture 8: Adversarial Search


also known as
Games

Peter Norvig and Sebastian Thrun


Slide credits: Dan Klein, Stuart Russell, Andrew Moore

2
Do the Right Thing under Uncertainty

3
Problems of Agent Uncertainty
 Problem: Stochasticity
 Solution: MDPs → a policy π(s)
 Problem: Partial Observability
 Operate in belief space: POMDPs → policy
 Problem: Unknown Model
 Exploration; Reinforcement learning
 Problem: Computational Limitation
 Heuristics; A*; Monte Carlo approximations
 Problem: Other agents; Adversaries
4
Environments and Agents
 What is the environment?
 Something that evolves from one state to the next
in response to actions: Result(s, a) → s′

 What is an Agent? An adversary?


 Lecture 2: agent: “a person or thing that acts”
 For a sailor: The moon?
 Tides: “remain at rest unless a force acts …”
 The wind and waves?
 For Pacman: A RandomGhostAgent?

5
Decision to Model as an Agent
 Model an object as an agent iff:
 The object’s actions are dependent not just on
the state, but on my belief state
(especially my beliefs about my own actions)
“Now, a clever man would put
the poison into his own goblet,
because he would know that
only a great fool would reach
for what he was given. I am not
a great fool, so I can clearly not
choose the wine in front of you.
But you must have known I was
not a great fool, you would
have counted on it, so I can
clearly not choose the wine in
front of me.”
Princess Bride: Vizzini vs. Westley
6
Other Agents

7
Other Agents
 Cooperation

 Competition

8
Types of Games
 Deterministic (Chess)

 Stochastic (Soccer)

 Partially Observable (Poker)


 (Also n > 2 players; stochastic)

 Large state space (Go)


9
Game Playing State-of-the-Art
 Chess: Deep Blue defeated human world champion Garry Kasparov in a six-
game match in 1997. Deep Blue examined 200 million positions per second,
used very sophisticated evaluation and undisclosed methods for extending
some lines of search up to 40 ply. Current programs are even better, if less
historic.

 Checkers: Chinook ended the 40-year reign of human world champion Marion
Tinsley in 1994. Used an endgame database defining perfect play for all
positions involving 8 or fewer pieces on the board, a total of 443 billion
positions. Checkers is now solved!

 Othello: Human champions refuse to compete against computers, which are
too good.

 Go: Human champions are just beginning to be challenged by machines,
though the best humans still beat the best machines. In Go, b > 300, so most
programs use pattern knowledge bases to suggest plausible moves, along with
aggressive pruning.

 Pacman: unknown

10
1: Deterministic, Fully Observable
 Many possible formalizations, one is:
 States: S (start at s0)
 Players: P={1...N} (usually take turns; often N=2)
 Actions: A (may depend on player / state)
 Transition Function: S × A → S or S × {Ai} → S
 Terminal Test: S → {t, f}
 Terminal Utilities: S × P → R

 Solution for a player is a policy: S → A

11
Deterministic Single-Player
 Deterministic, single player
(solitaire), perfect information:
 Know the rules
 Know what actions do
 Know when you win
 E.g. Freecell, Rubik’s cube
 … it’s just search!

 Slight reinterpretation:
 Each node stores a value: the
best outcome it can reach
 This is the maximal outcome of
its children (the max value)
 Note that we don’t have path
sums as before (utilities at end)
[Figure: single-player game tree with terminal outcomes lose, win, lose]

12
Deterministic Two-Player
 Deterministic, zero-sum games:
 Tic-tac-toe, chess, checkers
 One player maximizes the result
 The other minimizes the result

 Minimax search:
 A state-space search tree
 Players alternate turns
 Each node has a minimax value: the best achievable
utility against a rational adversary
 Minimax values are computed recursively; terminal
values are part of the game

[Figure: minimax tree — root max value 5; min values 2 and 5; terminal values 8, 2, 5, 6]

13
Computing Minimax Values
 Two recursive functions:
 max-value maxes the values of successors
 min-value mins the values of successors

def value(state):
    If the state is a terminal state: return the state’s utility
    If the next agent is MAX: return max-value(state)
    If the next agent is MIN: return min-value(state)

def max-value(state):
    Initialize v = -∞
    For each successor of state:
        v ← max(v, value(successor))
    Return v

(min-value is symmetric: initialize v = +∞ and take the minimum over successors)
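As a concrete, runnable illustration of the recursion above, here is a minimal Python sketch. The tiny hand-built game tree is hypothetical (its leaf utilities match the Minimax Example two slides ahead); a real game would supply its own successor and terminal tests.

import math

# Hypothetical game tree: each non-terminal state maps to its successors;
# numbers are terminal utilities. MAX moves at the root, MIN one ply below.
TREE = {
    "root": ["a", "b", "c"],
    "a": [3, 12, 8],
    "b": [2, 3, 9],
    "c": [14, 1, 8],
}

def value(state, to_move):
    """Minimax value of a state; to_move is 'MAX' or 'MIN'."""
    if isinstance(state, (int, float)):        # terminal: return its utility
        return state
    return max_value(state) if to_move == "MAX" else min_value(state)

def max_value(state):
    v = -math.inf
    for successor in TREE[state]:
        v = max(v, value(successor, "MIN"))    # MIN moves next
    return v

def min_value(state):
    v = math.inf
    for successor in TREE[state]:
        v = min(v, value(successor, "MAX"))    # MAX moves next
    return v

print(value("root", "MAX"))   # -> 3 (min values of the three subtrees are 3, 2, 1)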
Tic-tac-toe Game Tree

15
Minimax Example

[Figure: minimax example tree — leaf utilities (3, 12, 8), (2, 3, 9), (14, 1, 8); min values 3, 2, 1; root max value 3]

16
Minimax Properties
 Optimal against a perfect player.
Against non-perfect player?

 Time complexity?
 O(b^m)

 Space complexity?
 O(bm)

 For chess, b ≈ 35, m ≈ 100

 Exact solution is completely infeasible
 But, do we need to explore the whole tree?

[Figure: two-ply minimax tree with leaf values 10, 10, 9, 100]

17
Overcoming Computational Limits
 Cannot search to leaves in most games

 Depth-limited search
 Instead, search a limited depth of the tree
 Replace terminal utilities with an evaluation
function for non-terminals
 Guarantee of optimal play is gone

 More plies make a BIG difference
(as does a good evaluation function)

 Example: Chess program
 Suppose we have 100 seconds, can explore
10K nodes / sec
 So can check 1M nodes per move
 Minimax won’t finish depth 4: novice
 If we could reach depth 8: decent
 How could we achieve that?

[Figure: depth-limited search with limit=2 — root max value 4; min values -2 and 4; evaluation values -1, -2, 4, 9; deeper nodes unexplored (?)]
18
Depth-Limited Search
 Still two recursive functions:
 max-value and min-value

def value(state, limit):
    If the state is a terminal state: return the state’s utility
    If limit = 0: return evaluation_function(state)
    If the next agent is MAX: return max-value(state, limit)
    If the next agent is MIN: return min-value(state, limit)

def max-value(state, limit):
    Initialize v = -∞
    For each successor of state:
        v ← max(v, value(successor, limit-1))
    Return v
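A minimal runnable sketch of the depth-limited recursion, for concreteness. The tiny tree and the heuristic values are hypothetical stand-ins for a real game and evaluation function (the values mirror the figure on the previous slide).

TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
HEURISTIC = {"a1": -1, "a2": -2, "b1": 4, "b2": 9}   # made-up evaluation values

def evaluation_function(state):
    return HEURISTIC.get(state, 0)

def value(state, limit, to_move):
    if state not in TREE:                  # no successors: treat like a terminal
        return evaluation_function(state)
    if limit == 0:                         # depth cutoff: estimate with the evaluation function
        return evaluation_function(state)
    nxt = "MIN" if to_move == "MAX" else "MAX"
    child_values = [value(s, limit - 1, nxt) for s in TREE[state]]
    return max(child_values) if to_move == "MAX" else min(child_values)

print(value("root", 2, "MAX"))   # -> 4: the min values below the root are -2 and 4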
Problem: Horizon Effect
(or Why Pacman Starves)

[Figure: depth-2 Pacman search tree — root value 1; child values 1 (west), 0, 1 (east)]

 Example: Depth-limited search with depth 2
 Evaluation function = number of dots eaten.
 (For now ignore ghosts; treat as a single-player game.)
 Backing up values gives 1 for the root, and 1 for both the west and east moves from the root.
 So Pacman is just as happy to postpone eating as to eat now.
 But note he might go east, west, east, west, … forever!
Evaluation Functions
 Function which scores non-terminals

 Ideal function: returns the utility of the position

 In practice: typically a weighted linear sum of features:
Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
 e.g. f1(s) = (num white queens – num black queens), etc. (see the sketch below)
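A minimal sketch of such a weighted linear sum in Python. The piece-count features and the weights are illustrative only, not taken from any particular chess program.

from dataclasses import dataclass

@dataclass
class Counts:                      # hypothetical piece counts for one side
    queens: int
    rooks: int
    pawns: int

def evaluate(white: Counts, black: Counts) -> float:
    """Weighted linear sum of features; weights are rough material values."""
    features = [white.queens - black.queens,
                white.rooks - black.rooks,
                white.pawns - black.pawns]
    weights = [9.0, 5.0, 1.0]
    return sum(w * f for w, f in zip(weights, features))

print(evaluate(Counts(1, 2, 8), Counts(1, 1, 7)))   # -> 6.0 (up a rook and a pawn)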


21
Pruning in Minimax

[Figure: minimax tree with pruning — root max value 3; first min node = 3 (leaves 3, 12, 8); second min node ≤ 2 after leaf 2, so its remaining leaves are pruned; third min node ≤ 1 after leaves 14 and 1]

22
α-β: Pruning in Depth-Limited Search
 General configuration
 α is the best value that MAX can get at any
choice point along the current path
 If n becomes worse than α, MAX will avoid it,
so we can stop considering n’s other children
 Define β similarly for MIN

[Figure: alternating Player (MAX) / Opponent (MIN) layers along the current path, with node n at a deeper Opponent level]

23
Another - Pruning Example

≤2 ≤1
3

3 12 2 14 5 1

≥8

8
α-β Pruning Algorithm
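The pseudocode for this slide was an image and did not survive conversion, so here is a minimal Python sketch of α-β pruning over the same kind of hypothetical hand-built tree used earlier. It returns the same root value as plain minimax, just without examining every leaf.

import math

TREE = {"root": ["a", "b", "c"],
        "a": [3, 12, 8], "b": [2, 3, 9], "c": [14, 1, 8]}

def max_value(state, alpha, beta):
    if isinstance(state, (int, float)):        # terminal leaf
        return state
    v = -math.inf
    for s in TREE[state]:
        v = max(v, min_value(s, alpha, beta))
        if v >= beta:                          # MIN above will never allow this branch: prune
            return v
        alpha = max(alpha, v)
    return v

def min_value(state, alpha, beta):
    if isinstance(state, (int, float)):
        return state
    v = math.inf
    for s in TREE[state]:
        v = min(v, max_value(s, alpha, beta))
        if v <= alpha:                         # MAX above already has something better: prune
            return v
        beta = min(beta, v)
    return v

print(max_value("root", -math.inf, math.inf))  # -> 3, skipping several leaves of b and c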

25
α-β Pruning Properties
 Pruning has no effect on final action computed

 Good move ordering improves effectiveness of pruning

 With “perfect ordering”:
 Time complexity drops to O(b^(m/2))
 Doubles solvable depth
 Chess: from bad to good player, but far from perfect

 A simple example of metareasoning, here reasoning
about which computations are relevant

26
Stochasticity

28
Expectimax Search Trees
 What if we don’t know what the
result of an action will be? E.g.,
 In solitaire, next card is unknown
 In backgammon, the dice roll
 In minesweeper, mine locations
 In Pacman, random ghost moves

 Can do expectimax search
 Max nodes as in minimax search
 Chance nodes are like min nodes,
except the outcome is uncertain
 Chance nodes take the average
(expectation) of the values of their children

 This is a Markov Decision Process
couched in the language of trees

[Figure: expectimax tree — a max node over chance nodes, with leaf values 10, 4, 5, 7]

29
Reminder: Expectations
 We can define a function f(X) of a random variable X

 The expected value, E[f(X)], is the average value,
weighted by the probability of each value X = xi

 Example: How long to get to the airport?


 Length of driving time as a function of traffic, L(T):
L(none) = 20 min, L(light) = 30 min, L(heavy) = 60 min
 Given P(T) = {none: 0.25, light: 0.5, heavy: 0.25}
 What is my expected driving time, E[ L(T) ]?
 E[ L(T) ] = ∑i L(ti) P(ti)
 E[ L(T) ] = L(none) P(none) + L(light) P(light) + L(heavy) P(heavy)
 E[ L(T) ] = (20 * 0.25) + (30 * 0.5) + (60 * 0.25) = 35 min
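The same computation in a few lines of Python, using the numbers from the example above:

# Expected driving time E[L(T)] as a probability-weighted average.
L = {"none": 20, "light": 30, "heavy": 60}        # minutes
P = {"none": 0.25, "light": 0.5, "heavy": 0.25}
expected = sum(L[t] * P[t] for t in L)
print(expected)   # -> 35.0 minutes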

30
Expectimax Search
 In expectimax search, we have a
probabilistic model of how the
opponent (or environment) will
behave in any state
 Model could be a simple uniform
distribution (roll a die)
 Model could be sophisticated
and require a great deal of
computation
 We have a node for every
outcome out of our control:
opponent or environment
 The model might say that
adversarial actions are likely!
 For now, assume for any state
we magically have a distribution
to assign probabilities to
opponent actions / environment
outcomes

 Having a probabilistic belief about
an agent’s action does not mean
that agent is flipping any coins!
31
Expectimax Algorithm
def value(s)
    if s is a max node return maxValue(s)
    if s is an exp node return expValue(s)
    if s is a terminal node return evaluation(s)

def maxValue(s)
    values = [value(s’) for s’ in successors(s)]
    return max(values)

def expValue(s)
    values = [value(s’) for s’ in successors(s)]
    weights = [probability(s, s’) for s’ in successors(s)]
    return expectation(values, weights)

[Figure: expectimax tree with leaf values 8, 4, 5, 6]

32
Expectimax for Pacman
 Notice that we’ve gotten away from thinking that the
ghosts are trying to minimize pacman’s score
 Instead, they are now a part of the environment
 Pacman has a belief (distribution) over how they will act
 Quiz: Can we see minimax as a special case of
expectimax?
 Quiz: what would pacman’s computation look like if we
assumed that the ghosts were doing 1-ply minimax and
taking the result 80% of the time, otherwise moving
randomly?
 If you take this further, you end up calculating belief
distributions over your opponents’ belief distributions
over your belief distributions, etc…
 Can get unmanageable very quickly!

33
Expectimax for Pacman
Results from playing 5 games:

                      Minimizing Ghost        Random Ghost
Minimax Pacman        Won 5/5, Avg. 493       Won 5/5, Avg. 483
Expectimax Pacman     Won 1/5, Avg. -303      Won 5/5, Avg. 503

Pacman used depth 4 search with an eval function that avoids trouble
Ghost used depth 2 search with an eval function that seeks Pacman
Expectimax Example

35
Expectimax Pruning?

36
Expectimax Evaluation
 Evaluation functions quickly return an estimate for a
node’s true value (which value, expectimax or minimax?)
 For minimax, evaluation function scale doesn’t matter
 We just want better states to have higher evaluations
(get the ordering right)
 For expectimax, we need magnitudes to be meaningful

[Figure: two chance nodes with leaf values (0, 40) and (20, 30); applying x² gives (0, 1600) and (400, 900) — the ordering of individual leaves is preserved, but the expectimax choice changes]
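A tiny sketch of why magnitudes matter, using the leaf values from the figure: squaring every evaluation keeps the ordering of individual states (so a max node's choice is unchanged) but flips which child an expectation node prefers.

left, right = [0, 40], [20, 30]                # leaf values under two chance nodes

def expect(vals):                              # uniform chance node
    return sum(vals) / len(vals)

print(expect(left), expect(right))             # 20.0 25.0  -> expectimax picks right
print(expect([v**2 for v in left]),
      expect([v**2 for v in right]))           # 800.0 650.0 -> after squaring it picks left
print(max(left), max(right))                   # 40 30      -> a max node picks left either way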


Expectiminimax
 E.g. Backgammon
 Environment is an extra
player that moves after
each agent
 Combines minimax
and expectimax

ExpectiMinimax-Value(state):
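The definition of ExpectiMinimax-Value on the original slide was an image; the sketch below shows the idea in Python, assuming each node is tagged as a max, min, chance, or terminal node (the tiny tree at the bottom is hypothetical).

# Expectiminimax: max and min layers as in minimax, plus chance layers
# (e.g. dice rolls) whose value is a probability-weighted average of children.
def expectiminimax(node):
    kind = node["kind"]
    if kind == "terminal":
        return node["utility"]
    child_values = [expectiminimax(c) for c in node["children"]]
    if kind == "max":
        return max(child_values)
    if kind == "min":
        return min(child_values)
    return sum(p * v for p, v in zip(node["probs"], child_values))   # chance node

# One MAX choice, each followed by a 50/50 chance node over two MIN replies.
leaf = lambda u: {"kind": "terminal", "utility": u}
tree = {"kind": "max", "children": [
    {"kind": "chance", "probs": [0.5, 0.5], "children": [
        {"kind": "min", "children": [leaf(3), leaf(7)]},
        {"kind": "min", "children": [leaf(10), leaf(4)]}]},
    {"kind": "chance", "probs": [0.5, 0.5], "children": [
        {"kind": "min", "children": [leaf(5), leaf(6)]},
        {"kind": "min", "children": [leaf(6), leaf(9)]}]},
]}
print(expectiminimax(tree))   # -> 5.5: 0.5*5 + 0.5*6 beats 0.5*3 + 0.5*4 = 3.5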
Stochastic Two-Player
 Dice rolls increase b: 21 possible rolls with
2 dice
 Backgammon ≈ 20 legal moves
 Depth 4 = 20 x (21 x 20)^3 ≈ 1.2 x 10^9 nodes
 As depth increases, the probability of reaching
a given search node shrinks
 So the usefulness of search is diminished
 So limiting depth is less damaging
 But pruning is trickier…
 TD-Gammon uses depth-2 search + a very
good evaluation function + reinforcement
learning: world-champion level play
 1st AI world champion in any game!
