
Adversarial Search and Game-Playing

Environment Type

[Diagram: a decision tree classifying environment types. Game tree search applies to environments that are fully observable, turn-taking (semi-dynamic), deterministic or non-deterministic, multi-agent, sequential, and discrete; game matrices cover the non-sequential case, and continuous action games the non-discrete case.]
Adversarial Search
 Examine the problems that arise when we try to plan ahead in a world where other agents are planning against us.

 Board games are a good example.

 Adversarial games, while much studied in AI, are a small part of game theory in economics.
Typical AI assumptions

 Two agents whose actions alternate

 Utility values for each agent are the opposite of the other's
 creates the adversarial situation

 Fully observable environments

 In game theory terms: zero-sum games of perfect information.

 We'll relax these assumptions later.

Search versus Games
 Search – no adversary
 Solution is a (heuristic) method for finding a goal
 Heuristic techniques can find an optimal solution
 Evaluation function: estimate of cost from start to goal through a given node
 Examples: path planning, scheduling activities

 Games – adversary
 Solution is a strategy (a strategy specifies a move for every possible opponent reply)
 Optimality depends on the opponent. Why?
 Time limits force an approximate solution
 Evaluation function: evaluate "goodness" of a game position

 Examples: chess, checkers, Othello, backgammon

Types of Games

                          Deterministic               Chance moves
Perfect information       Chess, checkers,            Backgammon,
                          Go, Othello                 Monopoly
Imperfect information     Bridge, Skat                Poker, Scrabble,
(initial chance moves)                                blackjack

(On-line demos: backgammon, chess, tic-tac-toe.)

• Theorem of Nobel Laureate Harsanyi: Every game with chance moves during the game has an equivalent representation with initial chance moves only.
• A deep result, but computationally it is more tractable to consider chance moves as the game goes along.
• This is basically the same as the issue of full observability + nondeterminism vs. partial observability + determinism.
Game Setup
 Two players: MAX and MIN

 MAX moves first and they take turns until the game is over
 Winner gets a reward, loser gets a penalty.

 Games as search:
 Initial state: e.g. board configuration of chess
 Successor function: list of (move, state) pairs specifying legal moves
 Terminal test: is the game finished?
 Utility function: gives a numerical value for terminal states, e.g. win (+1), lose (-1) and draw (0) in tic-tac-toe or chess
Size of search trees
 b = branching factor

 d = number of moves by both players

 Search tree is O(b^d)

 Chess
 b ~ 35
 d ~ 100
 - search tree is ~ 10^154 (!!)
 - completely impractical to search this exhaustively

 Game-playing emphasizes being able to make optimal decisions in a finite amount of time
 Somewhat realistic as a model of a real-world agent
 Even if games themselves are artificial
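The 10^154 figure is easy to reproduce (a quick Python check, using only the b and d values above):

import math

b, d = 35, 100                 # chess: branching factor and typical game length
print(math.log10(b ** d))      # ~154.4, i.e. the game tree has ~10^154 nodes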
Partial Game Tree for Tic-Tac-Toe
Game tree (2-player, deterministic, turns)

How do we search this tree to find the optimal move?


Minimax strategy: Look ahead and reason backwards
 Find the optimal strategy for MAX assuming an infallible MIN opponent
 Need to compute this all the way down the tree
 Game Tree Search Demo

 Assumption: Both players play optimally!

 Given a game tree, the optimal strategy can be determined by using the minimax value of each node.
 Zermelo 1912.
Two-Ply Game Tree
Minimax maximizes the utility for the worst-case outcome for MAX.

The minimax decision


Pseudocode for Minimax Algorithm

function MINIMAX-DECISION(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← -∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s))
  return v
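For concreteness, a direct Python transcription of this pseudocode (a minimal sketch; the game object with successors, is_terminal and utility methods is a hypothetical interface, not something defined in these slides):

import math

def minimax_decision(state, game):
    # Choose the action whose resulting state has the highest MIN-VALUE.
    action, _ = max(game.successors(state),
                    key=lambda pair: min_value(pair[1], game))
    return action

def max_value(state, game):
    if game.is_terminal(state):
        return game.utility(state)
    v = -math.inf
    for action, s in game.successors(state):
        v = max(v, min_value(s, game))
    return v

def min_value(state, game):
    if game.is_terminal(state):
        return game.utility(state)
    v = math.inf
    for action, s in game.successors(state):
        v = min(v, max_value(s, game))
    return v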
Example of Algorithm Execution
MAX to move
Minimax Algorithm
 Complete depth-first exploration of the game tree

 Assumptions:
 Max depth = d, b legal moves at each point
 E.g., chess: d ~ 100, b ~ 35

Criterion    Minimax
Time         O(b^d)
Space        O(b·d)

Practical problem with minimax search
 Number of game states is exponential in the number of moves.
 Solution: Do not examine every node
=> pruning
 Remove branches that do not influence final decision

 Revisit example …
Alpha-Beta Example

Do DF-search until first leaf.

[Figure sequence: alpha-beta on a two-ply tree, annotating each node with its range of possible values.]

 The root and the first MIN node start at [-∞, +∞].
 After its first leaf the first MIN node narrows to [-∞, 3]; after its remaining leaves it settles at [3, 3], and the root becomes [3, +∞].
 The second MIN node reaches [-∞, 2] after one leaf. This node is worse for MAX, so its remaining leaves are pruned.
 The third MIN node narrows from [-∞, 14] (root: [3, 14]) through [-∞, 5] (root: [3, 5]) to [2, 2].
 The root settles at [3, 3].


Alpha-beta Algorithm
 Depth-first search – only considers nodes along a single path at any time

 α = highest-value choice that we can guarantee for MAX so far in the current subtree.
 β = lowest-value choice that we can guarantee for MIN so far in the current subtree.

 Update the values of α and β during the search, and prune the remaining branches at a node as soon as the value is known to be worse than the current α or β value for MAX or MIN.

 Alpha-beta Demo.
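In code, α and β become two extra parameters threaded through the minimax functions above (again a sketch over the same hypothetical game interface):

import math

def alpha_beta_decision(state, game):
    best_action, alpha = None, -math.inf
    for action, s in game.successors(state):
        v = min_value(s, game, alpha, math.inf)
        if v > alpha:                   # best guaranteed value for MAX so far
            best_action, alpha = action, v
    return best_action

def max_value(state, game, alpha, beta):
    if game.is_terminal(state):
        return game.utility(state)
    v = -math.inf
    for action, s in game.successors(state):
        v = max(v, min_value(s, game, alpha, beta))
        if v >= beta:                   # MIN would never let play reach here
            return v                    # prune the remaining branches
        alpha = max(alpha, v)
    return v

def min_value(state, game, alpha, beta):
    if game.is_terminal(state):
        return game.utility(state)
    v = math.inf
    for action, s in game.successors(state):
        v = min(v, max_value(s, game, alpha, beta))
        if v <= alpha:                  # MAX would never let play reach here
            return v
        beta = min(beta, v)
    return v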
Effectiveness of Alpha-Beta Search
 Worst-Case
 branches are ordered so that no pruning takes place. In this case alpha-beta gives no improvement over exhaustive search.

 Best-Case
 each player's best move is the left-most alternative (i.e., evaluated first)
 in practice, performance is closer to the best case than the worst case

 In practice we often get O(b^(d/2)) rather than O(b^d)
 this is the same as having a branching factor of sqrt(b),
 since (sqrt(b))^d = b^(d/2)
 i.e., we have effectively gone from b to the square root of b
 e.g., in chess, go from b ~ 35 to b ~ 6
 this permits much deeper search in the same amount of time
 Typically twice as deep.
Example
 Which nodes can be pruned?

[Figure: a three-level MAX/MIN/MAX tree whose leaves carry the values 3, 4, 1, 2, 7, 8, 5, 6.]
Final Comments about Alpha-Beta Pruning
 Pruning does not affect the final result.

 Entire subtrees can be pruned.

 Good move ordering improves the effectiveness of pruning.

 Repeated states are again possible.
 Store them in memory = transposition table
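A transposition table is just a cache from (hashable) game states to their computed values. A minimal sketch (game.to_move is a hypothetical method telling whose turn it is; the rest of the interface matches the sketches above):

def minimax_value_tt(state, game, table=None):
    # table maps already-seen states to their minimax value.
    if table is None:
        table = {}
    if state in table:
        return table[state]            # repeated state: reuse the stored value
    if game.is_terminal(state):
        v = game.utility(state)
    else:
        children = (minimax_value_tt(s, game, table)
                    for _, s in game.successors(state))
        v = max(children) if game.to_move(state) == 'MAX' else min(children)
    table[state] = v
    return v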
Practical Implementation
How do we make these ideas practical in real game trees?

Standard approach:
 cutoff test: where do we stop descending the tree?
 depth limit
 better: iterative deepening
 cutoff only when no big changes are expected to occur next (quiescence search)

 evaluation function
 When the search is cut off, we evaluate the current state by estimating its utility using an evaluation function.
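A sketch of how the cutoff test and iterative deepening fit together (the time budget, the evaluate method and the depth-limited recursion below are illustrative assumptions, not prescriptions from the slides):

import time

def value(state, game, depth, maximizing):
    # Cutoff test: depth limit reached or game over -> static evaluation.
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state)
    children = (value(s, game, depth - 1, not maximizing)
                for _, s in game.successors(state))
    return max(children) if maximizing else min(children)

def iterative_deepening_decision(state, game, budget=1.0):
    deadline = time.monotonic() + budget
    best, depth = None, 1
    while time.monotonic() < deadline:
        # Redo the search one ply deeper; keep the last completed answer.
        best, _ = max(game.successors(state),
                      key=lambda pair: value(pair[1], game, depth - 1, False))
        depth += 1
    return best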
Static (Heuristic) Evaluation Functions
 An evaluation function:
 estimates how good the current board configuration is for a player.
 Typically, one figures how good it is for the player and how good it is for the opponent, and subtracts the opponent's score from the player's.
 Othello: number of white pieces - number of black pieces
 Chess: value of all white pieces - value of all black pieces

 Typical values run from -infinity (loss) to +infinity (win), or [-1, +1].

 If the board evaluation is X for a player, it's -X for the opponent.
 Many clever ideas about how to use the evaluation function,
 e.g. the null move heuristic: let the opponent move twice.
 Examples (see the sketch below):
 Evaluating chess boards
 Checkers
 Tic-tac-toe
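A chess material-count evaluation of the "white minus black" form just described (the piece weights are the conventional ones; the board-as-dict encoding is a hypothetical choice for this sketch):

# Conventional material values; uppercase = white, lowercase = black.
PIECE_VALUES = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9, 'K': 0}

def material_eval(board):
    """White material minus black material.
    `board` maps squares (e.g. 'd4') to piece letters ('P', 'n', ...)."""
    score = 0
    for piece in board.values():
        value = PIECE_VALUES[piece.upper()]
        score += value if piece.isupper() else -value
    return score

# If white is up a knight, the position evaluates to +3 for white (-3 for black):
print(material_eval({'e1': 'K', 'e8': 'k', 'd4': 'N'}))  # -> 3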
Summary
 Game playing can be effectively modeled as a search problem.

 Game trees represent alternate computer/opponent moves.

 Evaluation functions estimate the quality of a given board configuration for the MAX player.

 Minimax is a procedure which chooses moves by assuming that the opponent will always choose the move which is best for them.

 Alpha-beta is a procedure which can prune large parts of the search tree and allow search to go deeper.

 For many well-known games, computer algorithms based on heuristic search match or out-perform human world experts.
Constraint Satisfaction Problems

Constraint satisfaction problems (CSPs)
 CSP:
 state is defined by variables Xi with values from domain Di
 goal test is a set of constraints specifying allowable combinations of values for subsets of variables

 Allows useful general-purpose algorithms with more power than standard search algorithms
Example: Map-Coloring

 Variables: WA, NT, Q, NSW, V, SA, T
 Domains: Di = {red, green, blue}
 Constraints: adjacent regions must have different colors
 e.g., WA ≠ NT
Example: Map-Coloring

 Solutions are complete and consistent assignments, e.g., WA = red, NT = green, Q = red, NSW = green, V = red, SA = blue, T = green
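This CSP is small enough to write out directly. A minimal Python sketch (the dict-based representation is an illustrative choice, not from the slides):

# Variables and domains for the Australia map-coloring CSP.
DOMAINS = {v: {'red', 'green', 'blue'}
           for v in ('WA', 'NT', 'Q', 'NSW', 'V', 'SA', 'T')}

# Binary "different color" constraints, one per pair of adjacent regions.
NEIGHBORS = {
    'WA': ['NT', 'SA'],        'NT': ['WA', 'SA', 'Q'],
    'SA': ['WA', 'NT', 'Q', 'NSW', 'V'],
    'Q':  ['NT', 'SA', 'NSW'], 'NSW': ['Q', 'SA', 'V'],
    'V':  ['SA', 'NSW'],       'T': [],
}

def consistent(assignment):
    """True iff no two adjacent assigned regions share a color."""
    return all(assignment[v] != assignment[n]
               for v in assignment for n in NEIGHBORS[v] if n in assignment)

# The solution from the slide is complete and consistent:
solution = {'WA': 'red', 'NT': 'green', 'Q': 'red',
            'NSW': 'green', 'V': 'red', 'SA': 'blue', 'T': 'green'}
print(consistent(solution) and len(solution) == len(DOMAINS))  # -> True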
Constraint graph
 Binary CSP: each constraint relates two variables
 Constraint graph: nodes are variables, arcs are constraints

Varieties of CSPs
 Discrete variables
 finite domains:
 n variables, domain size d → O(d^n) complete assignments
 e.g., 3-SAT (NP-complete)
 infinite domains:
 integers, strings, etc.
 e.g., job scheduling, variables are start/end days for each job
 need a constraint language, e.g., StartJob1 + 5 ≤ StartJob3

 Continuous variables
 e.g., start/end times for Hubble Space Telescope observations
 linear constraints solvable in polynomial time by linear programming
Varieties of constraints
 Unary constraints involve a single variable,
 e.g., SA ≠ green

 Binary constraints involve pairs of variables,
 e.g., SA ≠ WA

 Higher-order constraints involve 3 or more variables,
 e.g., SA ≠ WA ≠ NT
Example: Cryptarithmetic (TWO + TWO = FOUR)

 Variables: F T U W R O X1 X2 X3 (the Xi are carry digits)
 Domains: {0,1,2,3,4,5,6,7,8,9} for the letters, {0,1} for the carries
 Constraints: Alldiff(F, T, U, W, R, O)
 O + O = R + 10 · X1
 X1 + W + W = U + 10 · X2
 X2 + T + T = O + 10 · X3
 X3 = F, T ≠ 0, F ≠ 0
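With domains this small the puzzle yields to brute force over the Alldiff assignments (a minimal sketch; a serious solver would use the backtracking and heuristics introduced below):

from itertools import permutations

def solve_two_two_four():
    # Try every assignment of distinct digits to the six letters.
    for f, t, u, w, r, o in permutations(range(10), 6):
        if t == 0 or f == 0:
            continue
        # The single sum TWO + TWO = FOUR implies the column constraints
        # O+O = R+10*X1, X1+W+W = U+10*X2, X2+T+T = O+10*X3, X3 = F.
        two = 100 * t + 10 * w + o
        four = 1000 * f + 100 * o + 10 * u + r
        if two + two == four:
            yield {'F': f, 'T': t, 'U': u, 'W': w, 'R': r, 'O': o}

print(next(solve_two_two_four()))  # prints one solution, e.g. 734 + 734 = 1468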
More cryptarithmetic puzzles:
 SEND + MORE = MONEY
 BASE + BALL = GAMES
 LOGIC + LOGIC = PROLOG
Real-world CSPs
 Assignment problems
 e.g., who teaches what class
 Timetabling problems
 e.g., which class is offered when and where?
 Transportation scheduling
 Factory scheduling

 Notice that many real-world problems involve real-valued variables
Standard search formulation
Let’s try the standard search formulation.

We need:
• Initial state: none of the variables has a value (color)
• Successor state: one of the variables without a value will get some value.
• Goal: all variables have a value and none of the constraints is violated.

[Figure: the naive search tree. The root has NxD successors (any variable, any value), second-layer nodes have (N-1)xD successors each, and so on for N layers, giving N! x D^N leaves.]

There are N! x D^N nodes in the tree but only D^N distinct states??
Backtracking (Depth-First) search
• Special property of CSPs: they are commutative. This means the order in which we assign variables does not matter: assigning NT and then WA reaches the same state as assigning WA and then NT.
• Better search tree: first order the variables, then assign them values one by one (see the sketch below).

[Figure: with a fixed variable ordering the tree has D branches at the first level, D^2 nodes at the second, and D^N leaves, one per complete assignment.]
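A minimal recursive backtracking solver for the map-coloring CSP, reusing the hypothetical DOMAINS, NEIGHBORS and consistent names from the earlier sketch:

def backtrack(assignment, variables):
    """Depth-first search that assigns variables in a fixed order."""
    if len(assignment) == len(variables):
        return assignment                       # complete, consistent assignment
    var = next(v for v in variables if v not in assignment)  # fixed ordering
    for value in DOMAINS[var]:
        assignment[var] = value
        if consistent(assignment):              # check constraints so far
            result = backtrack(assignment, variables)
            if result is not None:
                return result
        del assignment[var]                     # undo and try the next value
    return None                                 # triggers backtracking above

print(backtrack({}, list(DOMAINS)))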
Backtracking example

[A sequence of four figure slides traces backtracking on the map-coloring example, one assignment per slide.]
Improving backtracking efficiency
 General-purpose methods can give huge gains in speed:
 Which variable should be assigned next?
 In what order should its values be tried?
 Can we detect inevitable failure early?

Most constrained variable
 Most constrained variable: choose the variable with the fewest legal values
 a.k.a. minimum remaining values (MRV) heuristic

 Picks a variable which will cause failure as soon as possible, allowing the tree to be pruned.
Most constraining variable
 Tie-breaker among most constrained variables

 Most constraining variable:
 choose the variable with the most constraints on remaining variables (most edges in the constraint graph)
Least constraining value
 Given a variable, choose the least constraining value:
 the one that rules out the fewest values in the remaining variables

 Leaves maximal flexibility for a solution.

 Combining these heuristics makes 1000-queens feasible (see the sketch below).
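Both ordering heuristics are a few lines over the same hypothetical representation (here `domains` maps each variable to its remaining legal values):

def mrv_variable(domains, assignment):
    """Most constrained variable: fewest remaining legal values."""
    unassigned = [v for v in domains if v not in assignment]
    return min(unassigned, key=lambda v: len(domains[v]))

def lcv_order(var, domains, assignment):
    """Least constraining value first: rules out the fewest neighbor values."""
    def ruled_out(value):
        return sum(value in domains[n]
                   for n in NEIGHBORS[var] if n not in assignment)
    return sorted(domains[var], key=ruled_out)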
Forward checking
 Idea:
 Keep track of remaining legal values for unassigned variables
 Terminate search when any variable has no legal values

[A sequence of four figure slides traces forward checking on the map-coloring example, crossing out eliminated values after each assignment.]
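A sketch of forward checking woven into the backtracking loop (same hypothetical representation as above; copies of the domain sets play the role of the crossed-out value tables in the figures, and mrv_variable comes from the previous sketch):

def forward_check(var, value, domains, assignment):
    """Prune `value` from unassigned neighbors; return None on a wipe-out."""
    new_domains = {v: set(d) for v, d in domains.items()}
    for n in NEIGHBORS[var]:
        if n not in assignment:
            new_domains[n].discard(value)
            if not new_domains[n]:         # some variable has no legal values
                return None                # terminate this branch early
    return new_domains

def backtrack_fc(assignment, domains):
    if len(assignment) == len(domains):
        return assignment
    var = mrv_variable(domains, assignment)
    for value in domains[var]:
        assignment[var] = value
        pruned = forward_check(var, value, domains, assignment)
        if pruned is not None:
            result = backtrack_fc(assignment, pruned)
            if result is not None:
                return result
        del assignment[var]
    return None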
Constraint propagation
 Forward checking propagates information from assigned to unassigned variables, but doesn't provide early detection for all failures:
 NT and SA cannot both be blue!

 Constraint propagation repeatedly enforces constraints locally
Arc consistency
 Simplest form of propagation makes each arc consistent
 X → Y is consistent iff for every value x of X there is some allowed y
 Constraint propagation propagates arc consistency on the graph.

 If X loses a value, neighbors of X need to be rechecked

 Arc consistency detects failure earlier than forward checking
 Can be run as a preprocessor or after each assignment

[Figure slides step through arc consistency on the map-coloring example.]

 Time complexity: O(n^2 d^3)
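The textbook algorithm for enforcing arc consistency is AC-3. A sketch over the map-coloring representation used earlier (the inequality constraint is hard-coded in revise; a general version would look the constraint up per arc):

from collections import deque

def revise(domains, x, y):
    """Remove values of x with no allowed value at y (here, allowed = different)."""
    removed = False
    for vx in set(domains[x]):
        if not any(vx != vy for vy in domains[y]):
            domains[x].discard(vx)
            removed = True
    return removed

def ac3(domains):
    """Make every arc consistent; return False if some domain wipes out."""
    queue = deque((x, y) for x in domains for y in NEIGHBORS[x])
    while queue:
        x, y = queue.popleft()
        if revise(domains, x, y):
            if not domains[x]:
                return False                 # failure detected early
            # If X loses a value, neighbors of X need to be rechecked.
            queue.extend((z, x) for z in NEIGHBORS[x] if z != y)
    return True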
