Lecture 7

Uploaded by Par Veen

Adversarial Search and Game Playing

CHAPTER 5
ICE 3201
Bangladesh University of Professionals
Environment Type Discussed In this Lecture

 Fully observable: yes
 Turn-taking: semi-dynamic
 Deterministic and non-deterministic
 Multi-agent: yes
 Sequential: yes → game tree search; no → game matrices
 Discrete: yes → game tree search; no → continuous action games
Adversarial Search

 Examine the problems that arise when we try to plan ahead in a world where other agents are planning against us.

 A good example is in board games.

 Adversarial games, while much studied in AI, are a small part of game theory in economics.
Typical AI assumptions

 Two agents whose actions alternate

 Utility values for each agent are the opposite of the other
  creates the adversarial situation

 Fully observable environments

 In game theory terms: zero-sum games of perfect information.

 We’ll relax these assumptions later.

Search versus Games

 Search – no adversary
 Solution is (heuristic) method for finding goal
 Heuristic techniques can find optimal solution
 Evaluation function: estimate of cost from start to goal through given node
 Examples: path planning, scheduling activities

 Games – adversary
 Solution is strategy (strategy specifies move for every possible opponent
reply).
 Optimality depends on opponent. Why?
 Time limits force an approximate solution
 Evaluation function: evaluate “goodness” of game position
 Examples: chess, checkers, Othello, backgammon
Types of Games

                          deterministic            chance moves
 Perfect information      Chess, checkers,         Backgammon,
                          go, othello,             monopoly
                          tic-tac-toe
 Imperfect information    Bridge, Skat             Poker, scrabble,
 (initial chance moves)                            blackjack

 (Both classes are played on-line, e.g. on-line backgammon and on-line chess.)

• Theorem of Nobel Laureate Harsanyi: Every game with


chance moves during the game has an equivalent representation
with initial chance moves only.
• A deep result, but computationally it is more tractable to
consider chance moves as the game goes along.
• This is basically the same as the issue of full observability +
nondeterminism vs. partial observability + determinism.
Game Setup

 Two players: MAX and MIN

 MAX moves first and they take turns until the game is over
 Winner gets a reward, loser gets a penalty.

 Games as search:
 Initial state: e.g. board configuration of chess
 Successor function: list of (move,state) pairs specifying legal moves.
 Terminal test: Is the game finished?
 Utility function: Gives numerical value of terminal states. E.g. win (+1), lose
(-1) and draw (0) in tic-tac-toe or chess

 MAX uses search tree to determine next move.


Size of search trees

 b = branching factor

 d = number of moves by both players

 Search tree is O(b^d)

 Chess
 b ~ 35
 d ~ 100
- search tree is ~ 10^154 (!!)
- completely impractical to search this exhaustively

 Game-playing emphasizes being able to make optimal decisions in a finite amount of time
 Somewhat realistic as a model of a real-world agent
 Even if games themselves are artificial
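The ~10^154 figure above can be checked directly: with the assumed b ~ 35 and d ~ 100, the tree has about b^d nodes, and a quick logarithm gives the order of magnitude.

```python
import math

# Sketch: order of magnitude of the chess game tree,
# assuming b ~ 35 legal moves and d ~ 100 plies as above.
b, d = 35, 100
magnitude = d * math.log10(b)   # log10(35**100)
print(round(magnitude))          # ~154, i.e. about 10**154 nodes
```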
Partial Game Tree for Tic-Tac-Toe
Game tree (2-player, deterministic, turns)

How do we search this tree to find the optimal move?


Minimax strategy: Look ahead and reason backwards

 Find the optimal strategy for MAX assuming an infallible MIN opponent
 Need to compute this all the way down the tree
 Game Tree Search Demo

 Assumption: Both players play optimally!

 Given a game tree, the optimal strategy can be determined by using the minimax value of each node.
 Zermelo 1912.
Two-Ply Game Tree
Minimax maximizes the utility for the worst-case outcome for max

The minimax decision


Pseudocode for Minimax Algorithm

function MINIMAX-DECISION(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← -∞
  for a,s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a,s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s))
  return v
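The pseudocode above can be sketched in Python. This is a minimal illustration, not a full game engine: the game tree is a hypothetical nested list, where an integer is a terminal state's utility (for MAX) and a list holds a node's successors.

```python
def minimax(node, maximizing):
    """Minimax value of a game-tree node.

    Leaves are integers (utilities for MAX); internal nodes are
    lists of successors. MAX and MIN alternate by level.
    """
    if isinstance(node, int):        # TERMINAL-TEST / UTILITY
        return node
    values = (minimax(child, not maximizing) for child in node)
    return max(values) if maximizing else min(values)

# Two-ply example: MAX at the root, three MIN nodes below.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree, True))  # -> 3 (MAX's best guaranteed outcome)
```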
Minimax algorithm
Example of Algorithm Execution

MAX to move
Minimax Algorithm

 Complete depth-first exploration of the game tree

 Assumptions:
 Max depth = d, b legal moves at each point

 E.g., Chess: d ~ 100, b ~35


Criterion   Minimax

Time        O(b^d)

Space       O(bd)

Practical problem with minimax search

 Number of game states is exponential in the number of


moves.
 Solution: Do not examine every node

=> pruning
 Remove branches that do not influence final decision

 Revisit example …
Alpha-Beta Example

Do DF-search until first leaf

Range of possible values

[-∞,+∞]

[-∞,+∞]
Alpha-Beta Example (continued)

[-∞,+∞]

[-∞,3]
Alpha-Beta Example (continued)

[-∞,+∞]

[-∞,3]
Alpha-Beta Example (continued)

[3,+∞]

[3,3]
Alpha-Beta Example (continued)

[3,+∞]
This node is worse for MAX

[3,3] [-∞,2]
Alpha-Beta Example (continued)

[3,14]

[3,3] [-∞,2] [-∞,14]

Alpha-Beta Example (continued)

[3,5]

[3,3] [-∞,2] [-∞,5]

Alpha-Beta Example (continued)

[3,3]

[3,3] [-∞,2] [2,2]

Alpha-Beta Example (continued)

[3,3]

[3,3] [-∞,2] [2,2]
Alpha-beta Algorithm

 Depth-first search – only considers nodes along a single path at any time

 α = highest-value choice that we can guarantee for MAX so far in the current subtree.
 β = lowest-value choice that we can guarantee for MIN so far in the current subtree.
 Update the values of α and β during search, and prune remaining branches as soon as the value is known to be worse than the current α or β value for MAX or MIN.
 Alpha-beta Demo.
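The α and β bounds described above can be carried through the same nested-list game-tree sketch used earlier; this is a minimal illustration of the idea, cutting off a node's remaining children whenever α >= β.

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Alpha-beta minimax on a nested-list tree (ints = utilities)."""
    if isinstance(node, int):
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, v)
            if alpha >= beta:      # MIN above will never allow this value
                break              # beta cutoff: prune remaining children
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, True, alpha, beta))
        beta = min(beta, v)
        if alpha >= beta:          # MAX above already has something better
            break                  # alpha cutoff: prune remaining children
    return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, True))  # -> 3, same answer as plain minimax
```

Pruning never changes the value returned at the root; it only skips branches that cannot influence it.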
Effectiveness of Alpha-Beta Search

 Worst-Case
 branches are ordered so that no pruning takes place. In this case alpha-beta
gives no improvement over exhaustive search

 Best-Case
 each player’s best move is the left-most alternative (i.e., evaluated first)
 in practice, performance is closer to best rather than worst-case

 In practice often get O(b^(d/2)) rather than O(b^d)

 this is the same as having a branching factor of sqrt(b),
 since (sqrt(b))^d = b^(d/2)
 i.e., we have effectively gone from b to square root of b
 e.g., in chess go from b ~ 35 to b ~ 6
 this permits much deeper search in the same amount of time
 Typically twice as deep.
Example
- which nodes can be pruned?

[Figure: three-ply tree with MAX, MIN, MAX levels and leaf values 3, 4, 1, 2, 5, 6, 7, 8]
Alpha Beta Pruning

 Alpha-beta pruning is a modified version of the minimax algorithm: an optimization technique for minimax search.
 As we have seen, the number of game states the minimax search algorithm has to examine is exponential in the depth of the tree.
 We cannot eliminate the exponent, but we can cut it roughly in half. There is a technique by which we can compute the correct minimax decision without checking every node of the game tree, and this technique is called pruning.
Alpha Beta Pruning

 Alpha-beta pruning can be applied at any depth of a tree, and sometimes it prunes not only tree leaves but entire sub-trees.
 The two parameters can be defined as:
 Alpha: The best (highest-value) choice we have found so far at any point along the path of Maximizer. The initial value of alpha is -∞.
 Beta: The best (lowest-value) choice we have found so far at any point along the path of Minimizer. The initial value of beta is +∞.
Condition for Alpha Beta Pruning

 The main condition required for alpha-beta pruning is:

α >= β
Key Points

 The Max player will only update the value of alpha.


 The Min player will only update the value of beta.
 While backtracking the tree, node values are passed up to parent nodes, not the values of alpha and beta.
 We only pass the alpha and beta values down to the child nodes.
Example

 Step 1: In the first step, the Max player makes the first move from node A, where α = -∞ and β = +∞. These values of alpha and beta are passed down to node B, where again α = -∞ and β = +∞, and node B passes the same values to its child D.

Example

 Step 2: At node D, the value of α will be calculated, as it is Max's turn. The value of α is compared first with 2 and then with 3; max(2, 3) = 3 becomes the value of α at node D, and the node value will also be 3.
 Step 3: Now the algorithm backtracks to node B, where the value of β will change, as it is Min's turn. β = +∞ is compared with the available successor node value, i.e. min(∞, 3) = 3; hence at node B now α = -∞ and β = 3.
Example

 In the next step, the algorithm traverses the next successor of node B, which is node E, and the values α = -∞ and β = 3 are passed along.
 Step 4: At node E, Max takes its turn, and the value of alpha will change. The current value of alpha is compared with 5, so max(-∞, 5) = 5; hence at node E, α = 5 and β = 3, where α >= β, so the right successor of E will be pruned and the algorithm will not traverse it. The value at node E will be 5.

Example

 Step 5: Next, the algorithm again backtracks the tree, from node B to node A. At node A, the value of alpha is changed: the maximum available value is 3, as max(-∞, 3) = 3, and β = +∞. These two values are now passed to the right successor of A, which is node C.
 At node C, α = 3 and β = +∞, and the same values are passed on to node F.
 Step 6: At node F, the value of α is again compared, first with the left child, which is 0, giving max(3, 0) = 3, and then with the right child, which is 1, giving max(3, 1) = 3. α remains 3, but the node value of F becomes 1.
Example

 Step 7: Node F returns the node value 1 to node C. At C, α = 3 and β = +∞; here the value of beta will be changed: comparing with 1 gives min(∞, 1) = 1. Now at C, α = 3 and β = 1, which again satisfies the condition α >= β, so the next child of C, which is G, will be pruned, and the algorithm will not compute the entire sub-tree G.
Example

 Step 8: C now returns the value 1 to A. Here the best value for A is max(3, 1) = 3. The final game tree shows the nodes which were computed and the nodes which were never computed. Hence the optimal value for the maximizer is 3 in this example.
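The walkthrough above can be replayed in code. The tree shape (A with children B, C; B with D, E; C with F, G) and the leaf values 2, 3, 5, 0, 1 come from the example; the leaves cut off at E and the whole sub-tree G were never shown on the slides, so the values used for them here (8, 7, 20) are arbitrary stand-ins — pruning means they cannot affect the answer.

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Alpha-beta on a nested-list tree; integers are leaf utilities."""
    if isinstance(node, int):
        return node
    best = -math.inf if maximizing else math.inf
    for child in node:
        v = alphabeta(child, not maximizing, alpha, beta)
        if maximizing:
            best = max(best, v)
            alpha = max(alpha, best)
        else:
            best = min(best, v)
            beta = min(beta, best)
        if alpha >= beta:
            break  # prune the remaining children, as in Steps 4 and 7
    return best

# A (MAX) -> B, C (MIN); B -> D, E (MAX); C -> F, G (MAX) -> leaves.
# 8, 7, 20 stand in for the pruned leaves the slides never show.
A = [[[2, 3], [5, 8]], [[0, 1], [7, 20]]]
print(alphabeta(A, True))  # -> 3, as in Step 8
```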
Final Comments about Alpha-Beta Pruning

 Pruning does not affect final results

 Entire subtrees can be pruned.

 Good move ordering improves effectiveness of


pruning

 Repeated states are again possible.


 Store them in memory = transposition table
Practical Implementation

How do we make these ideas practical in real game trees?

Standard approach:
 cutoff test: (where do we stop descending the tree)
 depth limit
 better: iterative deepening
 cutoff only when no big changes are expected to occur next (quiescence search).

 evaluation function
 When the search is cut off, we evaluate the current state
by estimating its utility using an evaluation function.
Static (Heuristic) Evaluation Functions

 An Evaluation Function:
 estimates how good the current board configuration is for a player.

 Typically, one figures how good it is for the player and how good it is for the opponent, and subtracts the opponent's score from the player's
 Othello: Number of white pieces - Number of black pieces
 Chess: Value of all white pieces - Value of all black pieces

 Typical values from -infinity (loss) to +infinity (win) or [-1, +1].


 If the board evaluation is X for a player, it’s -X for the opponent.
 Many clever ideas about how to use the evaluation function.
 e.g. null move heuristic: let opponent move twice.
 Example:
 Evaluating chess boards,
 Checkers
 Tic-tac-toe
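A minimal sketch of the "my material minus your material" idea above, using standard chess piece values; the board encoding (uppercase letters = MAX's pieces, lowercase = opponent's) is a made-up convention for illustration.

```python
# Assumed standard material values; kings are omitted since both sides have one.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def evaluate(board):
    """Material balance: MAX's piece values minus the opponent's.

    `board` is any iterable of piece letters; uppercase pieces
    belong to MAX, lowercase to the opponent (a made-up encoding).
    """
    score = 0
    for piece in board:
        value = PIECE_VALUES.get(piece.upper(), 0)
        score += value if piece.isupper() else -value
    return score

print(evaluate(["Q", "R", "p", "n"]))  # 9 + 5 - 1 - 3 = 10
```

Note that if the evaluation is X for one player, it is -X for the other, matching the zero-sum assumption.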
Iterative (Progressive) Deepening

 In real games, there is usually a time limit T on making a


move

 How do we take this into account?


 using alpha-beta we cannot use “partial” results with any confidence
unless the full breadth of the tree has been searched
 So, we could be conservative and set a conservative depth-limit
which guarantees that we will find a move in time < T
 disadvantage is that we may finish early, could do more search

 In practice, iterative deepening search (IDS) is used


 IDS runs depth-first search with an increasing depth-limit

 when the clock runs out we use the solution found at the previous
depth limit
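The IDS loop above can be sketched under a move clock. The helpers `children` and `evaluate` are hypothetical stand-ins for the successor function and the evaluation function; depth-limited minimax is run at each deepening step, and the move from the last completed depth is what gets played.

```python
import time

def dl_minimax(node, depth, maximizing, children, evaluate):
    """Depth-limited minimax: cut off at depth 0 and use evaluate()."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    vals = [dl_minimax(k, depth - 1, not maximizing, children, evaluate)
            for k in kids]
    return max(vals) if maximizing else min(vals)

def ids_decision(root, children, evaluate, time_limit=0.05, max_depth=8):
    """Deepen until the clock (or a depth cap) runs out; keep the
    best move found at the last completed depth."""
    deadline = time.monotonic() + time_limit
    best = None
    for depth in range(1, max_depth + 1):
        scored = [(dl_minimax(k, depth - 1, False, children, evaluate), k)
                  for k in children(root)]
        best = max(scored, key=lambda pair: pair[0])[1]
        if time.monotonic() >= deadline:
            break  # use the result of the last completed depth
    return best

# Toy nested-list tree: leaves are utilities, internal nodes are lists.
children = lambda n: n if isinstance(n, list) else []
evaluate = lambda n: n if isinstance(n, int) else 0
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(ids_decision(tree, children, evaluate))  # -> [3, 12, 8]
```

A real engine would check the clock inside the search as well, not only between depths; this sketch keeps the structure minimal.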
The State of Play

 Checkers:
 Chinook ended 40-year-reign of human world champion
Marion Tinsley in 1994.
 Chess:
 Deep Blue defeated human world champion Garry Kasparov in
a six-game match in 1997.
 Othello:
 human champions refuse to compete against computers: they
are too good.
 Go:
human champions refuse to compete against computers: they
are too bad; b > 300 (!)
 See (e.g.) http://www.cs.ualberta.ca/~games/ for more information
Deep Blue

 1957: Herbert Simon


 “within 10 years a computer will beat the world chess champion”

 1997: Deep Blue beats Kasparov


 Parallel machine with 30 processors for “software” and 480
VLSI processors for “hardware search”
 Searched 126 million nodes per second on average
 Generated up to 30 billion positions per move
 Reached depth 14 routinely
 Uses iterative-deepening alpha-beta search with
transposition tables
 Can explore beyond depth-limit for interesting moves
Summary
 Game playing can be effectively modeled as a search problem

 Game trees represent alternate computer/opponent moves

 Evaluation functions estimate the quality of a given board


configuration for the Max player.

 Minimax is a procedure which chooses moves by assuming that


the opponent will always choose the move which is best for them

 Alpha-Beta is a procedure which can prune large parts of the


search tree and allow search to go deeper

 For many well-known games, computer algorithms based on


heuristic search match or out-perform human world experts.
