COE206 L5 Adversarial Search

The document outlines the principles of adversarial search in artificial intelligence, focusing on game playing strategies such as MiniMax and Alpha-Beta pruning. It defines key concepts such as game states, utility functions, and the structure of game trees, while also discussing the complexities of multiplayer games and the potential for alliances. The document emphasizes the importance of optimizing decision-making processes in competitive environments through efficient search algorithms.


COE206 – Principles of Artificial

Intelligence
Mustafa MISIR

Istinye University, Department of Computer Engineering

[email protected]

http://mustafamisir.github.io
http://memoryrlab.github.io
L5: Adversarial Search
Game Playing 1

1
MIT 6.034 Artificial Intelligence (Fall 2010) - Search: Games, Minimax, and Alpha-Beta - https://www.youtube.com/watch?v=STjW3eH0Cik
1 / 57
Outline

▶ Formal Definition
▶ Optimal Decision in Games (MiniMax)
▶ Alpha-Beta Pruning
▶ Stochastic Games
▶ Partially Observable Games

2 / 57
Games

A game can be formally defined as a kind of search problem with
the following elements:
▶ S0: the initial state, which specifies how the game is set up at the start.
▶ PLAYER(s): defines which player has the move in a state.
▶ ACTIONS(s): returns the set of legal moves in a state.
▶ RESULT(s, a): the transition model, which defines the result of a move.
▶ TERMINAL-TEST(s): a terminal test, which is true when the game is
over and false otherwise. States where the game has ended are called
terminal states.
▶ UTILITY(s, p): a utility function (objective / payoff function), which
defines the final numeric value for a game that ends in terminal state s
for player p.

3 / 57
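As an illustration, these elements can be instantiated for a tiny take-away game. The class, state encoding, and rules below are illustrative choices for the sketch, not part of the slides:

```python
class Nim:
    """A tiny two-player take-away game exercising the formal elements.
    Players alternately take 1 or 2 sticks; whoever takes the last stick wins.
    A state is (sticks remaining, player to move)."""

    def initial_state(self):                 # S0
        return (5, 'MAX')

    def player(self, s):                     # PLAYER(s)
        return s[1]

    def actions(self, s):                    # ACTIONS(s): take 1 or 2 sticks
        return [n for n in (1, 2) if n <= s[0]]

    def result(self, s, a):                  # RESULT(s, a): transition model
        nxt = 'MIN' if s[1] == 'MAX' else 'MAX'
        return (s[0] - a, nxt)

    def terminal_test(self, s):              # TERMINAL-TEST(s)
        return s[0] == 0

    def utility(self, s, p):                 # UTILITY(s, p)
        # In a terminal state, s[1] is the player who would move next,
        # so the *other* player made the winning last take.
        winner = 'MIN' if s[1] == 'MAX' else 'MAX'
        return 1 if winner == p else -1
```

With utilities of +1 and −1 summing to zero in every terminal state, this is a zero-sum game in the sense defined above.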
Games – Utility

In chess, the outcome is a win, loss, or draw, with payoffs +1, 0, or ½.

Some games have a wider variety of possible outcomes:
▶ the payoffs in backgammon range from 0 to +192.

A zero-sum game is defined as one where the total payoff to all
players is the same for every instance of the game.
▶ Chess is zero-sum because every game has a payoff of either
0 + 1, 1 + 0, or ½ + ½.

4 / 57
Games – Types

▶ Two-player game: Player A and B. Player A starts.
▶ Deterministic: none of the moves/states are subject to
chance (no random draws).
▶ Perfect information: both players see all the states and
decisions. Each decision is made sequentially.
▶ Zero-sum: Player A's gain is exactly equal to Player B's loss.
One of the players must win, or there is a draw (both gains
are equal).

5 / 57
Games – Types, e.g.

6 / 57
Games AI – History 2

2
image source: https://www.andreykurenkov.com/writing/ai/a-brief-history-of-game-ai/
7 / 57
Adversarial Search 3

Multiagent environments are those in which each agent needs to consider
the actions of other agents and how those actions affect its own welfare.

The unpredictability of these other agents can introduce
contingencies into the agent's problem-solving process.

Competitive environments, in which the agents' goals are in
conflict, give rise to adversarial search problems, often
known as games.

3
involving or characterized by conflict or opposition

8 / 57
Games – Game Tree, e.g. Tic Tac Toe 4

The initial state, ACTIONS function, and RESULT function define the
game tree for the game: a tree where the nodes are game states
and the edges are moves.

4
A (partial) game tree for the game of tic-tac-toe. The top node is the initial state, and MAX moves first, placing an X in an empty square. We show part of the
tree, giving alternating moves by MIN (O) and MAX (X), until we eventually reach terminal states, which can be assigned utilities according to the rules of the game.

9 / 57
Games – Game Tree, e.g. Tic Tac Toe

For tic-tac-toe the game tree is relatively small: fewer than
9! = 362,880 terminal nodes.
▶ From the initial state, MAX has 9 possible moves.
▶ Play alternates between MAX placing an X and MIN
placing an O until we reach leaf nodes corresponding to
terminal states such that one player has three in a row or all
the squares are filled.
▶ The number on each leaf node indicates the utility value of
the terminal state from the point of view of MAX; high
values are assumed to be good for MAX and bad for MIN.

10 / 57
Optimal Decisions in Games

In a normal search problem, the optimal solution would be a
sequence of actions leading to a goal state: a terminal state
that is a win.

In adversarial search,
▶ MIN has something to say about it.
▶ MAX must find a strategy, which specifies MAX's move in
the initial state, then MAX's moves in the states resulting
from every possible response by MIN, then MAX's moves in
the states resulting from every possible response by MIN to
those moves, and so on.

11 / 57
Optimal Decisions in Games – MiniMax 5

▶ The possible moves for MAX at the root node are labeled a1,
a2, and a3.
▶ The possible replies to a1 for MIN are b1, b2, b3, and so on.
▶ The utilities of the terminal states range from 2 to 14.

5
Even a simple game like tic-tac-toe is too complex to draw the entire game tree on one page, so we switch to a trivial game. The △ nodes are MAX
nodes, in which it is MAX's turn to move, and the ▽ nodes are MIN nodes. The terminal nodes show the utility values for MAX; the other nodes are labeled with their
minimax values. MAX's best move at the root is a1, because it leads to the state with the highest minimax value, and MIN's best reply is b1, because it leads to the state
with the lowest minimax value.

12 / 57
Optimal Decisions in Games – MiniMax

Given a game tree, the optimal strategy can be determined from
the minimax value of each node, which we write as MINIMAX(n).
▶ The minimax value of a node is the utility (for MAX) of being in the
corresponding state, assuming that both players play optimally from there
to the end of the game.
▶ The minimax value of a terminal state is just its utility.
▶ MAX prefers to move to a state of maximum value, whereas MIN prefers
a state of minimum value.

13 / 57
Optimal Decisions in Games – MiniMax 6

6
returns the action corresponding to the best possible move, that is, the move that leads to the outcome with the best utility, under the assumption that the
opponent plays to minimize utility.

14 / 57
Optimal Decisions in Games – MiniMax

▶ The terminal nodes on the bottom level get their utility
values from the game's UTILITY function.
▶ The first MIN node, labeled B, has 3 successor states with
values 3, 12, and 8, so its minimax value is 3.
▶ The other two MIN nodes have minimax value 2.
▶ The root node is a MAX node; its successor states have
minimax values 3, 2, and 2; so it has a minimax value of 3.
15 / 57
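The backed-up values described on this slide can be computed with a short recursion. This is a sketch that assumes the game tree is given as nested lists of utilities (a simplification for illustration, not the slides' pseudocode):

```python
def minimax_value(node, maximizing):
    """MINIMAX(n) over a game tree given as nested lists:
    a leaf is a number (terminal utility); an internal node is a
    list of children, with MAX and MIN levels alternating."""
    if isinstance(node, (int, float)):
        return node                      # terminal state: just its utility
    values = [minimax_value(c, not maximizing) for c in node]
    return max(values) if maximizing else min(values)

# The tree from this slide: MIN nodes B, C, D below a MAX root.
tree = [[3, 12, 8],    # B: min -> 3
        [2, 4, 6],     # C: min -> 2
        [14, 5, 2]]    # D: min -> 2
root_value = minimax_value(tree, maximizing=True)   # max(3, 2, 2) = 3
best_move = max(range(len(tree)),
                key=lambda i: minimax_value(tree[i], False))  # index 0, i.e. a1
```

The recursion proceeds all the way down to the leaves and then backs the minimax values up through the tree as it unwinds, which is exactly the depth-first exploration described in the MiniMax properties later on.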
Optimal Decisions in Games – MiniMax, e.g. 8

With 3 players A, B and C, we need to replace the single value
for each node with a vector of values.
▶ A vector ⟨vA, vB, vC⟩ is associated with each node.
▶ This vector gives the utility of the state from each player's
viewpoint. 7

7
In two-player, zero-sum games, the two-element vector can be reduced to a single value because the values are always opposite.
8
a game tree with three players (A, B, C). Each node is labeled with values from the viewpoint of each player. The best move is marked at the root.

16 / 57
Optimal Decisions in Games – MiniMax, e.g.

Consider the node marked X in the game tree, where player C chooses
what to do.
▶ The two choices lead to terminal states with utility vectors
⟨vA = 1, vB = 2, vC = 6⟩ and ⟨vA = 4, vB = 2, vC = 3⟩.
▶ Since 6 is bigger than 3, C should choose the first move. This
means that if state X is reached, subsequent play will lead to
a terminal state with utilities ⟨vA = 1, vB = 2, vC = 6⟩.

17 / 57
Optimal Decisions in Games – Alliances

Anyone who plays multiplayer games quickly becomes aware that
much more is going on than in two-player games.
▶ Multiplayer games usually involve alliances, whether formal or
informal, among the players.
▶ Alliances are made and broken as the game proceeds.

e.g. suppose A and B are in weak positions and C is in a stronger
position.
▶ Then it is often optimal for both A and B to attack C rather
than each other, lest C destroy each of them individually.
▶ Collaboration emerges from purely selfish behavior.
As soon as C weakens, the alliance loses its value, and either A or
B could violate the agreement.

18 / 57
Optimal Decisions in Games – Alliances
If the game is not zero-sum, then collaboration can also occur
with just two players.
▶ e.g. suppose there is a terminal state with utilities
⟨vA = 1000, vB = 1000⟩, and 1000 is the highest possible
utility for each player.
▶ The optimal strategy is then for both players to do everything
possible to reach this state; that is, the players will
automatically cooperate to achieve a mutually desirable goal.

19 / 57
MiniMax – Properties
Performs a complete depth-first exploration of the game tree

▶ Completeness: Yes 9
▶ Time Complexity: O(b^m) 10
▶ Space Complexity: O(bm) (depth-first exploration) 11
▶ Optimality: Yes, against an optimal opponent 12

Yet, e.g. for chess, b ≈ 35, m ≈ 100 for "reasonable" games, so
an exact solution is completely infeasible.
▶ Do we need to explore every path?

9
Is the algorithm guaranteed to find a solution when there is one?
10
How long does it take to find a solution?
11
How much memory is needed to perform the search?
12
Does the strategy find the optimal solution?

20 / 57
Alpha–Beta Pruning – Improving MiniMax 14

The problem with minimax search is that the number of game
states it has to examine is exponential in the depth of the tree.
▶ We can't eliminate the exponent, but we can effectively cut it in half.
▶ The trick is that it is possible to compute the correct minimax
decision without looking at every node in the game tree.

Perform pruning 13 – eliminate (large) parts of the tree from
consideration:
▶ prune away branches that cannot possibly influence the final
decision.

13
selective removal of certain parts of a plant, such as branches, buds, or roots
14
https://en.wikipedia.org/wiki/Alpha-beta_pruning
21 / 57
Alpha–Beta Pruning 15

Consider a node n somewhere in the tree,
such that Player has a choice of moving
to that node.
▶ If Player has a better choice m either at
the parent node of n or at any choice
point further up, then n will never be
reached in actual play.
▶ So once we have found out enough about
n (by examining some of its descendants)
to reach this conclusion, we can prune it.

15
can be applied to trees of any depth, and it is often possible to prune entire subtrees rather than just leaves

22 / 57
Alpha–Beta Pruning

Alpha–beta pruning gets its name from the following two
parameters, which describe bounds on the backed-up values that
appear anywhere along the path:

▶ α: the value of the best (i.e., highest-value) choice we have
found so far at any choice point along the path for MAX
▶ β: the value of the best (i.e., lowest-value) choice we have
found so far at any choice point along the path for MIN

23 / 57
Alpha–Beta Pruning 16

16
these routines are the same as the MINIMAX functions, except for the two lines in each of MIN-VALUE and MAX-VALUE that maintain α and β (and the
bookkeeping to pass these parameters along).

24 / 57
Alpha–Beta Pruning

Identify the minimax decision without ever evaluating two of the
leaf nodes.

25 / 57
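The pruned run on this slide can be reproduced with a short sketch. It assumes the game tree is given as nested lists of utilities (a simplification for illustration, not the slides' exact pseudocode):

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Alpha-beta over a nested-list game tree (leaves are utilities).
    alpha: best value found so far for MAX along the path;
    beta: best value found so far for MIN along the path."""
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:      # MIN above will never let play reach here
                break              # prune the remaining children
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:          # MAX above already has a better option
            break                  # prune the remaining children
    return value

# The earlier example tree: after seeing the leaf 2 under the second
# MIN node, its remaining leaves (4 and 6) are never evaluated, yet
# the root's minimax value is still computed correctly.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
decision_value = alphabeta(tree, maximizing=True)   # 3, same as full minimax
```

Tracing the second MIN node: by then α = 3 from the first branch, so once the leaf 2 drives β down to 2, α ≥ β and the cutoff fires, exactly the two skipped leaves the slide refers to.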
Alpha–Beta Pruning

26 / 57
Alpha–Beta Pruning, e.g. 17

17
Alpha–Beta Pruning example by John Levine (U. Strathclyde): https://www.youtube.com/watch?v=zp3VMe0Jpf8
27 / 57
Alpha–Beta Pruning – Properties

Pruning preserves the completeness and optimality of the original
minimax algorithm.

Improves Time Complexity from O(b^m) to O(b^(m/2)) 18 (with
perfect move ordering)
▶ effectively doubles the depth of search that is feasible

18
How long does it take to find a solution?

28 / 57
Imperfect Real-Time Decisions

The minimax algorithm generates the entire game search space,
whereas the alpha–beta algorithm allows us to prune large parts
of it.
▶ However, alpha–beta still has to search all the way to
terminal states for at least a portion of the search space.

This depth is usually not practical, because moves must be made
in a reasonable amount of time, typically a few minutes at most.

29 / 57
Imperfect Real-Time Decisions – Evaluation Functions
Suggestion: cut off the search earlier and apply a heuristic
evaluation function to states in the search, effectively turning
nonterminal nodes into terminal leaves.

Alter minimax or alpha–beta in two ways:
▶ replace the utility function by a heuristic evaluation function EVAL,
which estimates the position's utility
▶ replace the terminal test by a cutoff test that decides when to apply
EVAL
That gives us the following definition of heuristic minimax for
state s and maximum depth d:

H-MINIMAX(s, d) =
    EVAL(s)                                 if CUTOFF-TEST(s, d)
    max_a H-MINIMAX(RESULT(s, a), d + 1)    if PLAYER(s) = MAX
    min_a H-MINIMAX(RESULT(s, a), d + 1)    if PLAYER(s) = MIN

30 / 57
Imperfect Real-Time Decisions – Evaluation Functions
An evaluation function returns an estimate of the expected
utility of the game from a given position, just as the heuristic
functions discussed before estimate the distance to the goal.

A linear weighted sum of features evaluation function for chess,
e.g. with f1(s) = (number of white queens) – (number of black queens):

Eval(s) = w1 f1(s) + w2 f2(s) + ... + wn fn(s)

31 / 57
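The weighted sum above can be sketched directly. The material-count features, the dictionary state encoding, and the weights below are illustrative assumptions, not values from the slides:

```python
def eval_linear(state, features, weights):
    """Eval(s) = w1*f1(s) + ... + wn*fn(s): linear weighted sum of features."""
    return sum(w * f(state) for f, w in zip(features, weights))

# Illustrative chess-like material features; `state` maps piece
# categories to counts (a hypothetical encoding for this sketch).
f1 = lambda s: s['white_queens'] - s['black_queens']   # queen difference
f2 = lambda s: s['white_pawns'] - s['black_pawns']     # pawn difference

state = {'white_queens': 1, 'black_queens': 0,
         'white_pawns': 6, 'black_pawns': 7}
score = eval_linear(state, [f1, f2], [9.0, 1.0])       # 9*1 + 1*(-1) = 8.0
```

A positive score estimates an advantage for White (the MAX player here); the weights encode how much each feature is deemed worth.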
Imperfect Real-Time Decisions – Evaluation Functions

Behaviour is preserved under any monotonic transformation of
Eval.
▶ Only the order matters: payoff in deterministic games acts as
an ordinal utility function.

32 / 57
Imperfect Real-Time Decisions – Cutting Off Search

As a simple example, TERMINAL-TEST can be replaced by the
following statement, in order to stop the search upon reaching
tree depth d:

    if CUTOFF-TEST(state, depth) then return EVAL(state)

33 / 57
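Combining the cutoff test with EVAL gives a depth-limited minimax. This sketch again assumes nested-list trees; the frontier-averaging heuristic is a crude illustrative stand-in for a real evaluation function:

```python
def h_minimax(node, maximizing, depth, d, eval_fn):
    """Heuristic minimax: CUTOFF-TEST replaces TERMINAL-TEST, and
    EVAL replaces UTILITY once the depth limit d is reached."""
    if isinstance(node, (int, float)):           # true terminal state
        return node
    if depth >= d:                               # CUTOFF-TEST fires here
        return eval_fn(node)                     # EVAL estimates the utility
    vals = [h_minimax(c, not maximizing, depth + 1, d, eval_fn)
            for c in node]
    return max(vals) if maximizing else min(vals)

def avg_frontier(node):
    """Crude illustrative EVAL: average of all values below the node."""
    if isinstance(node, (int, float)):
        return node
    vals = [avg_frontier(c) for c in node]
    return sum(vals) / len(vals)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
shallow = h_minimax(tree, True, depth=0, d=1, eval_fn=avg_frontier)
deep = h_minimax(tree, True, depth=0, d=9, eval_fn=avg_frontier)  # full minimax: 3
```

With d large enough to reach every leaf, the cutoff never fires and the result coincides with ordinary minimax; with d = 1 the search trades exactness for speed, which is the point of the slide.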
Imperfect Real-Time Decisions – Forward Pruning

It is also possible to do forward pruning: some moves at a given
node are pruned immediately, without further consideration.
▶ One approach to forward pruning is beam search 19 –
consider only a beam of the n best moves (according to the
evaluation function) rather than considering all possible
moves.

Of course, such approaches can be rather dangerous, because there
is no guarantee that the best move will not be pruned away.

19
https://en.wikipedia.org/wiki/Beam_search
34 / 57
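A minimal sketch of beam-style forward pruning, under the same nested-list tree assumption as before; the cheap frontier-averaging evaluation used to rank moves is an illustrative stand-in:

```python
def beam_minimax(node, maximizing, n, eval_fn):
    """Forward pruning with a beam: rank the children by a cheap
    evaluation, expand only the n most promising, discard the rest
    without ever searching them."""
    if isinstance(node, (int, float)):
        return node
    est = lambda c: c if isinstance(c, (int, float)) else eval_fn(c)
    beam = sorted(node, key=est, reverse=maximizing)[:n]  # keep n best moves
    vals = [beam_minimax(c, not maximizing, n, eval_fn) for c in beam]
    return max(vals) if maximizing else min(vals)

def avg_frontier(node):
    """Cheap illustrative evaluation: average of all values below."""
    if isinstance(node, (int, float)):
        return node
    vals = [avg_frontier(c) for c in node]
    return sum(vals) / len(vals)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
wide = beam_minimax(tree, True, n=3, eval_fn=avg_frontier)    # beam = all moves
narrow = beam_minimax(tree, True, n=1, eval_fn=avg_frontier)  # aggressive pruning
```

With n covering all moves this reduces to plain minimax; with n = 1 it happens to still find the optimal value on this tree, but unlike alpha–beta there is no guarantee, since the branch holding the best move can be discarded before it is ever searched.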
Stochastic Games

Many unpredictable external events can put us into unforeseen
situations.
▶ Many games mirror this unpredictability by including a
random element, such as the throwing of dice.

Backgammon is a typical game that
combines luck and skill.
▶ Dice are rolled at the beginning
of a player's turn to determine
the legal moves.
▶ e.g. White has rolled a 6–5 and
has 4 possible moves.

35 / 57
Stochastic Games

Although White knows what his or her own legal moves are, White
does not know what Black is going to roll and thus does not know
what Black's legal moves will be.
▶ This means that White cannot construct a standard game tree of
the sort used in tic-tac-toe.

A game tree in backgammon must include chance nodes in
addition to MAX and MIN nodes.
▶ The branches leading from each chance node denote the
possible dice rolls;
▶ each branch is labeled with the roll and its probability.

There are 36 ways to roll two dice, each equally likely; but because
a 6–5 is the same as a 5–6, there are only 21 distinct rolls.

36 / 57
Stochastic Games 20

20
Chance nodes are shown as circles. The branches leading from each chance node denote the possible dice rolls; each branch is labeled with the roll and its
probability.

37 / 57
Stochastic Games
We still want to pick the move that leads to the best position,
even without having definite minimax values.
▶ Instead, we can only calculate the expected value of a
position: the average over all possible outcomes of the
chance nodes.
▶ This leads us to generalize the minimax value for deterministic
games to an expectiminimax value for games with chance
nodes.

38 / 57
Stochastic Games – Coin Flipping, e.g.

39 / 57
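A coin-flip game like the one on this slide can be evaluated with a small expectiminimax sketch. The tagged-tuple tree encoding and the particular utilities are assumptions made for illustration:

```python
def expectiminimax(node):
    """Expectiminimax over a tree with MAX, MIN and chance nodes.
    A node is a number (terminal utility) or a tagged tuple:
    ('max', children), ('min', children),
    ('chance', [(probability, child), ...])."""
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    # chance node: probability-weighted average over the outcomes
    return sum(p * expectiminimax(c) for p, c in children)

# MAX picks between two gambles, each resolved by a fair coin flip.
tree = ('max', [
    ('chance', [(0.5, ('min', [2, 4])), (0.5, ('min', [0, 6]))]),  # 0.5*2 + 0.5*0
    ('chance', [(0.5, 3), (0.5, 1)]),                              # 0.5*3 + 0.5*1
])
best = expectiminimax(tree)   # max(1.0, 2.0) = 2.0
```

MAX prefers the second gamble even though the first one contains the single highest payoff (6), because only expected values, not best cases, are backed up through chance nodes.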
Stochastic Games – Evaluation Functions
The presence of chance nodes means that one has to be more
careful about what the evaluation values mean.
▶ With an evaluation function that assigns the values [1, 2, 3,
4] to the leaves, move a1 is best;
▶ with values [1, 20, 30, 400], move a2 is best.
The program behaves totally differently if we make a change in the
scale of some evaluation values!

40 / 57
Partially Observable Games
In deterministic partially observable games, uncertainty about
the state of the board arises entirely from lack of access to the
choices made by the opponent.
▶ e.g. the game of Kriegspiel, a partially observable variant of
chess in which pieces can move but are completely invisible to
the opponent.

41 / 57
Partially Observable Games – Card Games
Card games provide many examples of stochastic partial
observability, where the missing information is generated
randomly.
▶ e.g. in many games, cards are dealt randomly at the
beginning of the game, with each player receiving a hand that
is not visible to the other players – e.g. bridge, whist, hearts,
and some forms of poker.

42 / 57
