Unit 202 Game Playing


Adversarial Search

Chapter 5

Mausam
(Based on slides of Stuart Russell, Henry
Kautz, Linda Shapiro & UW AI Faculty)
Game Playing

Why do AI researchers study game playing?

1. It's a good reasoning problem, formal and nontrivial.
2. Direct comparison with humans and other computer programs is easy.
What Kinds of Games?
Mainly games of strategy with the following characteristics:

1. Sequence of moves to play
2. Rules that specify possible moves
3. Rules that specify a payment for each move
4. Objective is to maximize your payment
Games vs. Search Problems

• Unpredictable opponent ⇒ a solution must specify a move for every possible opponent reply

• Time limits ⇒ unlikely to find goal, must approximate
Two-Player Game
(The slide shows the move loop as a flowchart:)

1. Opponent's move: generate the new position.
2. If the game is over, stop.
3. Generate successors of the current position.
4. Evaluate the successors.
5. Move to the highest-valued successor.
6. If the game is over, stop; otherwise return to step 1.
Games as Adversarial Search
• States:
– board configurations
• Initial state:
– the board position and which player will move
• Successor function:
– returns list of (move, state) pairs, each indicating a legal
move and the resulting state
• Terminal test:
– determines when the game is over
• Utility function:
– gives a numeric value in terminal states
(e.g., -1, 0, +1 for loss, tie, win)
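As a concrete illustration of this formulation, here is a minimal sketch for a trivial take-away game; the game choice and all names below are illustrative assumptions, not part of the slides.

from typing import List, Tuple

State = Tuple[int, str]                      # (counters left, player to move: 'MAX' or 'MIN')

def initial_state(n: int = 5) -> State:
    return (n, 'MAX')

def successors(state: State) -> List[Tuple[int, State]]:
    counters, player = state
    nxt = 'MIN' if player == 'MAX' else 'MAX'
    # (move, resulting state) pairs for every legal move: take 1 or 2 counters
    return [(take, (counters - take, nxt)) for take in (1, 2) if take <= counters]

def terminal(state: State) -> bool:
    return state[0] == 0                     # game over when no counters remain

def utility(state: State) -> int:
    # Whoever is to move at an empty pile did NOT take the last counter, so they lost.
    return +1 if state[1] == 'MIN' else -1   # +1 = win for MAX, -1 = loss for MAX

print(successors(initial_state()))           # [(1, (4, 'MIN')), (2, (3, 'MIN'))]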
Game Tree (2-player, Deterministic, Turns)

(The slide shows a game tree whose levels alternate: computer's turn, opponent's turn, computer's turn, opponent's turn, down to the leaf nodes.)

The computer is Max. The opponent is Min.

At the leaf nodes, the utility function is employed. Big value means good, small is bad.
Mini-Max Terminology
• move: a move by both players
• ply: a half-move
• utility function: the function applied to leaf nodes
• backed-up value
– of a max-position: the value of its largest-valued successor
– of a min-position: the value of its smallest-valued successor
• minimax procedure: search down several levels; at the bottom level apply the utility function, back the values up all the way to the root node, and select the move there.
Minimax
• Perfect play for deterministic games
• Idea: choose the move to the position with the highest minimax value = best achievable payoff against best play
• E.g., the worked example traced on the following slides:
(Slides 11-27 trace minimax on a four-level example tree whose sixteen leaf values are
80 30 25 35 55 20 05 65 40 10 70 15 50 45 60 75.
Values are backed up one level at a time: each MIN node takes the smallest of its children's values and each MAX node the largest, giving 30 25 20 05 10 15 45 60 at the lowest MIN level, then 30 20 15 60, then 20 15, and finally the value 20 at the MAX root.)
Minimax Strategy
• Why do we take the min value every other level of the tree?

• These nodes represent the opponent's choice of move.

• The computer assumes that the human will choose the move that is of least value to the computer.
Minimax algorithm
Adversarial analogue of DFS

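A runnable sketch of this depth-first procedure; the nested-list tree representation (a number is a leaf utility, a list is an internal node) and the function name are my own assumptions, not from the slides.

def minimax(node, maximizing=True):
    if isinstance(node, (int, float)):              # leaf: apply the utility function
        return node
    child_values = [minimax(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)

# The four-level example tree from the preceding slides
# (leaves 80 30 25 35 55 20 05 65 40 10 70 15 50 45 60 75, MAX to move at the root):
tree = [[[[80, 30], [25, 35]], [[55, 20], [5, 65]]],
        [[[40, 10], [70, 15]], [[50, 45], [60, 75]]]]
print(minimax(tree))                                # -> 20, the backed-up root value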
Properties of Minimax
• Complete?
– Yes (if tree is finite)
• Optimal?
– Yes (against an optimal opponent)
– No (does not exploit opponent weakness against suboptimal opponent)
• Time complexity?
– O(b^m)
• Space complexity?
– O(bm) (depth-first exploration)
Good Enough?
• Chess:
– branching factor b ≈ 35
– game length m ≈ 100
– search space b^m ≈ 35^100 ≈ 10^154

• The Universe:
– number of atoms ≈ 10^78
– age ≈ 10^18 seconds
– 10^8 moves/sec × 10^78 × 10^18 = 10^104

• Exact solution completely infeasible
(Slides 32-38 revisit the same example tree to motivate pruning. After the first MIN node backs up 30, its sibling MIN node's first leaf is 25; that sibling can only end up ≤ 25, already worse for MAX than 30, so its remaining leaf need not be checked and is marked X. The same argument later prunes the leaf after 05: the MAX node above already has 20, so once the next MIN node sees the leaf 05 its other leaf is skipped.)
Alpha-Beta
• The alpha-beta procedure can speed up a depth-first minimax search.
• Alpha: a lower bound on the value that a max node may ultimately be assigned (v ≥ α)
• Beta: an upper bound on the value that a minimizing node may ultimately be assigned (v ≤ β)
Alpha-Beta
MinVal(state, alpha, beta){
  if (terminal(state))
    return utility(state);
  v = +infinity;
  for (s in children(state)){
    v = min(v, MaxVal(s, alpha, beta));
    beta = min(beta, v);
    if (alpha >= beta) return v;   // prune: MAX already has a better alternative
  }
  return v;   // the best (smallest) child value
}

alpha = the highest value for MAX found so far along the path
beta = the lowest value for MIN found so far along the path
Alpha-Beta
MaxVal(state, alpha, beta){
  if (terminal(state))
    return utility(state);
  v = -infinity;
  for (s in children(state)){
    v = max(v, MinVal(s, alpha, beta));
    alpha = max(alpha, v);
    if (alpha >= beta) return v;   // prune: MIN already has a better alternative
  }
  return v;   // the best (largest) child value
}

alpha = the highest value for MAX found so far along the path
beta = the lowest value for MIN found so far along the path
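A runnable sketch of the MaxVal/MinVal pseudocode above; the Python names and the nested-list tree representation (a number is a leaf utility, a list is an internal node) are my own assumptions, not from the slides.

import math

def max_val(state, alpha, beta):
    if isinstance(state, (int, float)):            # leaf: return its utility
        return state
    v = -math.inf
    for s in state:
        v = max(v, min_val(s, alpha, beta))
        alpha = max(alpha, v)
        if alpha >= beta:                          # prune: MIN will avoid this line of play
            return v
    return v

def min_val(state, alpha, beta):
    if isinstance(state, (int, float)):
        return state
    v = math.inf
    for s in state:
        v = min(v, max_val(s, alpha, beta))
        beta = min(beta, v)
        if alpha >= beta:                          # prune: MAX already has something at least as good
            return v
    return v

# The example tree from the preceding slides (MAX to move at the root):
tree = [[[[80, 30], [25, 35]], [[55, 20], [5, 65]]],
        [[[40, 10], [70, 15]], [[50, 45], [60, 75]]]]
print(max_val(tree, -math.inf, math.inf))          # -> 20, the same value full minimax computes

Pruned subtrees are simply never generated, which is where the speed-up traced on the following slides comes from.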
(Slides 42-61 trace alpha-beta on the same example tree. α, the best value found so far for MAX along the path, and β, the best value found so far for MIN along the path, are passed down each branch, starting from α = -∞, β = +∞. Whenever β ≤ α at a node, its remaining children are pruned and marked X: the leaves 35 and 65 are skipped, and after the left half of the tree establishes a root value of 20 and the right MIN child backs up 15, the entire subtree containing 50, 45, 60 and 75 is cut off. The search returns the same root value, 20, as full minimax.)
Bad and Good Cases for Alpha-Beta Pruning
• Bad: worst moves encountered first. The slide's example tree (root value 4 for MAX) puts the weakest alternative under each node first, so alpha-beta must examine nearly every leaf and prunes almost nothing.

• Good: good moves ordered first. With the strongest moves examined first in the same tree, whole subtrees (marked x on the slide) are cut off and many leaves are never evaluated.

• If we can order moves, we can get more benefit from alpha-beta pruning (see the sketch below).
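The effect can be measured directly. Below is a small sketch that counts how many leaves alpha-beta evaluates on the same tree with worst-first versus best-first child ordering; the tree values and all names are my own illustrative choices, not the slide's example.

import math

def alphabeta(node, alpha, beta, maximizing, counter):
    if isinstance(node, (int, float)):
        counter[0] += 1                      # count leaves actually evaluated
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False, counter))
            alpha = max(alpha, v)
            if alpha >= beta:
                break
        return v
    else:
        v = math.inf
        for child in node:
            v = min(v, alphabeta(child, alpha, beta, True, counter))
            beta = min(beta, v)
            if alpha >= beta:
                break
        return v

def leaves_examined(tree):
    counter = [0]
    value = alphabeta(tree, -math.inf, math.inf, True, counter)
    return value, counter[0]

bad  = [[9, 9, 2], [9, 9, 3], [8, 6, 4]]     # worst root move first, worst replies first
good = [[4, 6, 8], [3, 9, 9], [2, 9, 9]]     # same tree reordered: best moves first

print(leaves_examined(bad))                  # (4, 9): same value, every leaf examined
print(leaves_examined(good))                 # (4, 5): same value, four leaves pruned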
Properties of α-β
• Pruning does not affect the final result: alpha-beta returns exactly the same value as full minimax.

• Good move ordering improves the effectiveness of pruning.

• With "perfect ordering," time complexity = O(b^(m/2))
⇒ doubles the depth of search

• A simple example of reasoning about 'which computations are relevant' (a form of metareasoning)
Why O(b^(m/2))?
Let T(m) be the time complexity of a search to depth m.

Normally:
T(m) = b·T(m-1) + c  ⇒  T(m) = O(b^m)

With ideal α-β pruning:
T(m) = T(m-1) + (b-1)·T(m-2) + c  ⇒  T(m) = O(b^(m/2))
Node Ordering
• Iterative deepening search
• Use the evaluations from the previous (shallower) search to order moves
• Also helps in returning a move within a given time limit (see the sketch below)
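A sketch of this idea under my own assumptions about the tree representation and names (plain depth-limited minimax stands in for the real alpha-beta search here): each iteration searches one ply deeper, orders the root moves by the values found in the previous iteration, and returns the best move found so far if time runs out.

import time

def depth_limited_value(node, depth, maximizing):
    if isinstance(node, (int, float)):
        return node
    if depth == 0:
        return 0                                    # stand-in Eval at the cutoff
    values = [depth_limited_value(c, depth - 1, not maximizing) for c in node]
    return max(values) if maximizing else min(values)

def iterative_deepening(root_children, time_limit=1.0, max_depth=10):
    deadline = time.time() + time_limit
    ordered = list(range(len(root_children)))       # current root-move ordering
    best_move = ordered[0]
    for depth in range(1, max_depth + 1):
        scores = {}
        for move in ordered:
            if time.time() > deadline:              # out of time: return best so far
                return best_move
            scores[move] = depth_limited_value(root_children[move], depth, False)
        ordered = sorted(scores, key=scores.get, reverse=True)   # best move first next time
        best_move = ordered[0]
    return best_move

tree = [[[[80, 30], [25, 35]], [[55, 20], [5, 65]]],
        [[[40, 10], [70, 15]], [[50, 45], [60, 75]]]]
print(iterative_deepening(tree))                    # -> 0: the left root move (value 20)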
Good Enough?
• Chess:
– branching factor b ≈ 35
– game length m ≈ 100
– search space b^(m/2) ≈ 35^50 ≈ 10^77

• The Universe:
– number of atoms ≈ 10^78
– age ≈ 10^18 seconds
– 10^8 moves/sec × 10^78 × 10^18 = 10^104

The universe can play chess - can we?
Cutting off Search
MinimaxCutoff is identical to MinimaxValue except
1. Terminal? is replaced by Cutoff?
2. Utility is replaced by Eval

Does it work in practice?

b^m = 10^6, b = 35  ⇒  m ≈ 4

4-ply lookahead is a hopeless chess player!
– 4-ply ≈ human novice
– 8-ply ≈ typical PC, human master
– 12-ply ≈ Deep Blue, Kasparov
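A sketch of MinimaxCutoff as described above: identical to minimax except that a depth-based Cutoff? test replaces Terminal? and a heuristic Eval replaces Utility. The tree, cutoff depth, and placeholder Eval are illustrative assumptions.

def evaluate(node):
    return 0                                        # placeholder Eval, as in the Cutoff illustration below

def minimax_cutoff(node, depth, maximizing=True):
    if isinstance(node, (int, float)):              # true terminal: real utility
        return node
    if depth == 0:                                  # Cutoff? fires: apply Eval instead of searching deeper
        return evaluate(node)
    values = [minimax_cutoff(c, depth - 1, not maximizing) for c in node]
    return max(values) if maximizing else min(values)

tree = [[[[80, 30], [25, 35]], [[55, 20], [5, 65]]],
        [[[40, 10], [70, 15]], [[50, 45], [60, 75]]]]
print(minimax_cutoff(tree, depth=2))                # -> 0: the true leaf values are never seen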
Cutoff

(Slides 68-69 show the example tree searched with a cutoff two levels below the root: Eval is applied at the cutoff nodes instead of searching down to the true leaf utilities, and here it returns 0 everywhere, so 0 is what gets backed up to the root.)
Evaluation Functions
Tic Tac Toe
• Let p be a position in the game
• Define the evaluation function f(p) by:
– f(p) =
• largest positive number if p is a win for the computer
• smallest negative number if p is a win for the opponent
• RCDC – RCDO otherwise
– where RCDC is the number of rows, columns and diagonals in which the computer could still win
– and RCDO is the number of rows, columns and diagonals in which the opponent could still win.
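A minimal sketch of the RCDC – RCDO part of this function; the board representation and names are my own assumptions, and the special-cased win/loss values are omitted for brevity.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
         (0, 4, 8), (2, 4, 6)]                 # diagonals

def eval_tictactoe(board, computer='X', opponent='O'):
    # A line is still winnable for a player if the other player has no mark in it.
    rcdc = sum(1 for line in LINES if all(board[i] != opponent for i in line))
    rcdo = sum(1 for line in LINES if all(board[i] != computer for i in line))
    return rcdc - rcdo

# Example: X holds the centre, O a corner.
board = [' ', ' ', 'O',
         ' ', 'X', ' ',
         ' ', ' ', ' ']
print(eval_tictactoe(board))                   # -> 1: more lines remain open for X than for O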
Sample Evaluations
• X = Computer; O = Opponent

(The slide shows two example board positions and, for each, tallies the rows, columns and diagonals still open to X and to O in order to compute f(p) = RCDC – RCDO.)
Evaluation functions
• For chess/checkers, typically linear weighted sum of features
Eval(s) = w1·f1(s) + w2·f2(s) + … + wm·fm(s)

e.g., w1 = 9 with f1(s) = (number of white queens) – (number of black queens), etc.
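In code this is just a weighted sum over feature values; the queen-count feature matches the example above, while the dictionary state and function names are assumptions for illustration.

def eval_linear(state, weights, features):
    # Eval(s) = w1*f1(s) + w2*f2(s) + ... + wm*fm(s)
    return sum(w * f(state) for w, f in zip(weights, features))

f1 = lambda s: s['white_queens'] - s['black_queens']   # e.g. w1 = 9 with this feature
print(eval_linear({'white_queens': 1, 'black_queens': 0}, [9], [f1]))   # -> 9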
Example: Samuel's Checker-Playing Program
• It uses a linear evaluation function
f(n) = w1·f1(n) + w2·f2(n) + ... + wm·fm(n)

For example: f = 6K + 4M + U
– K = King Advantage
– M = Man Advantage
– U = Undenied Mobility Advantage (number of moves available to Max for which Min has no jump-move reply)
Samuel’s Checker Player
• In learning mode:
– Computer acts as 2 players: A and B
– A adjusts its coefficients after every move
– B uses the static utility function
– If A wins, its function is given to B
Samuel’s Checker Player
• How does A change its function?
Coefficient replacement:
Δ(node) = backed-up value(node) – initial value(node)
If Δ > 0, terms that contributed positively are given more weight and terms that contributed negatively get less weight.
If Δ < 0, terms that contributed negatively are given more weight and terms that contributed positively get less weight.
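A heavily simplified, purely illustrative sketch of this kind of sign-based adjustment; it is not Samuel's actual procedure, and the step size and update rule are assumptions.

def adjust_weights(weights, feature_values, backed_up_value, initial_value, step=0.1):
    delta = backed_up_value - initial_value          # Δ(node)
    new_weights = []
    for w, f in zip(weights, feature_values):
        contribution = w * f
        # If Δ and the term's contribution have the same sign, strengthen the term;
        # otherwise weaken it.
        if delta * contribution > 0:
            new_weights.append(w * (1 + step))
        else:
            new_weights.append(w * (1 - step))
    return new_weights

print(adjust_weights([6, 4, 1], [1, -2, 3], backed_up_value=5, initial_value=2))
# -> [6.6, 3.6, 1.1]: the terms that pushed the estimate in the right direction grow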
