Lecture 5 - Adversarial Search
◉ https://int8.io/monte-carlo-tree-search-beginners-guide/
Plan
◉ Game AI
◉ Alpha-Beta Pruning
◉ https://www.ibm.com/ibm/history/ibm100/us/en/icons/deepblue/
AI for Go
◉ Go originated in China over 3,000 years ago. Winning this
board game requires multiple layers of strategic thinking.
https://deepmind.com/research/case-studies/
More Games…
◉ Poker AI: Libratus (CMU, 2017), Pluribus (CMU, 2019), DeepStack (University of Alberta)
…
◉ Axes:
• Deterministic or stochastic?
• One, two, or more players?
• Zero sum?
• Perfect information (can you see the state)?
• For a MAX node, the backed-up value is the maximum of the values of its children (i.e., the best for MAX)
• For a MIN node, the backed-up value is the minimum of the values of its children (i.e., the best for MIN)
The Minimax Procedure
◉ Time complexity: Minimax performs a depth-first search of the game tree, so its time complexity is O(b^m), where b is the branching factor of the game tree and m is its maximum depth.
◉ Space complexity: Like DFS, the space complexity of the minimax algorithm is O(bm).
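A minimal Python sketch of the procedure (the state interface – is_terminal(), utility(), successors() – is an illustrative assumption, not taken from the slides):

def minimax_value(state, maximizing):
    # Backed-up value of `state` via depth-first search:
    # MAX nodes take the maximum of their children, MIN nodes the minimum.
    if state.is_terminal():
        return state.utility()  # utility from MAX's point of view
    values = [minimax_value(s, not maximizing) for s in state.successors()]
    return max(values) if maximizing else min(values)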
What if MIN does not play optimally?
◉ Definition of optimal play for MAX assumes
MIN plays optimally:
• Maximizes worst-case outcome for MAX.
• (Classic game theoretic strategy)
◉ But if MIN does not play optimally, what
will happen?
What if MIN does not play optimally?
◉ MAX will do even better.
◉ Basic idea: “If you have an idea which is surely bad, don’t take the time to see how truly
awful it is” ~ Pat Winston
Alpha-Beta Pruning
◉ During Minimax, keep track of two additional values:
• α: MAX's current lower bound on MAX's outcome
• β: MIN's current upper bound on MAX's outcome
◉ MAX will never allow a move that could lead to a worse score (for MAX) than α
◉ MIN will never allow a move that could lead to a better score (for MAX) than β
◉ Initially, α = -∞ and β = +∞
◉ As the search tree is traversed, the window of possible utility values shrinks as α increases and β decreases
◉ Whenever the ranges of α and β no longer overlap (α ≥ β), the current node is a dead end and can be pruned
When to Prune
◉ Prune below a MAX node when its α value becomes ≥ the β value of its MIN ancestors.
• MAX nodes update α based on children's returned values.
• Idea: player MIN at the node above won't pick that value anyway; he can force a worse value (for MAX).
◉ Prune below a MIN node when its β value becomes ≤ the α value of its MAX ancestors.
• MIN nodes update β based on children's returned values.
• Idea: player MAX at the node above won't pick that value anyway; she can do better.
Example
◉ The MAX player makes the first move from node A, where α = -∞ and β = +∞. These values of α and β are passed down from A to B, and then from B to D.
Example
◉ At node D it is MAX's turn, so α = max(2, 3) = 3; this is the value of α at node D, and the value of node D is also 3. The algorithm then backtracks to node B, where it is MIN's turn, so β changes: β = min(+∞, 3) = 3. Hence at node B, α = -∞ and β = 3.
◉ In the next step, the algorithm traverses the other successor of node B, node E, and the values α = -∞ and β = 3 are passed down as well.
Example
◉ At node E, MAX takes its turn: α = max(-∞, 5) = 5, so at node E α = 5 and β = 3. Since α ≥ β, the right successor of E is pruned and the algorithm does not traverse it; the value of node E is 5.
Example
◉ Next, the algorithm backtracks from node B to node A. At node A: α = max(-∞, 3) = 3 and β = +∞; these values are passed to node C, so at node C, α = 3 and β = +∞, and they are then passed on to node F. At node F, comparing against its left and right children, α remains 3, and the value of node F becomes 1.
Example
◉ Node F returns the value 1 to node C, where α = 3 and β = +∞. Here β changes: comparing with 1, β = min(+∞, 1) = 1. Now at C, α = 3 and β = 1, so α ≥ β and the next child of C, node G, is pruned. The algorithm never examines the subtree under G.
Example
◉ C now returns the value 1 to A. The best value for A is max(3, 1) = 3. Hence the optimal value for the maximizer is 3 in this example.
Alpha-Beta Implementation
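The implementation figure from this slide is not reproduced here; below is a minimal Python sketch under the same assumed state interface as before, following the α/β updates and pruning rules described above:

import math

def alphabeta(state, alpha, beta, maximizing):
    # Returns the same value as minimax, but prunes branches that
    # fall outside the (alpha, beta) window.
    if state.is_terminal():
        return state.utility()
    if maximizing:
        value = -math.inf
        for s in state.successors():
            value = max(value, alphabeta(s, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:   # MIN above will never allow this branch
                break
        return value
    else:
        value = math.inf
        for s in state.successors():
            value = min(value, alphabeta(s, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:   # MAX above will never allow this branch
                break
        return value

# Initial call, matching the slides: alphabeta(root, -math.inf, math.inf, True)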
Move Ordering in Alpha-Beta Pruning
◉ The effectiveness of alpha-beta pruning is highly dependent on the order in which each node is
examined.
◉ Worst ordering:
• In some cases, alpha-beta pruning does not prune any of the leaves of the tree and behaves exactly like the minimax algorithm.
• This happens when the best move always lies on the right side of the tree. The time complexity for such an ordering is O(b^m).
◉ Ideal ordering:
• The ideal ordering for alpha-beta pruning occurs when the best moves lie on the left side of the tree.
• Since we apply DFS, the left side of the tree is searched first, so alpha-beta can go twice as deep as the minimax algorithm in the same amount of time. The complexity with ideal ordering is O(b^(m/2)) (best-case analysis of alpha-beta pruning).
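A hypothetical sketch of move ordering: score successors with a cheap heuristic (here, `heuristic` is an assumed scoring function, not from the slides) and visit the most promising moves first, pushing alpha-beta toward its best case:

def ordered_successors(state, maximizing, heuristic):
    # Visit likely-best moves first: descending heuristic scores for MAX,
    # ascending for MIN, so the (alpha, beta) window tightens early.
    return sorted(state.successors(), key=heuristic, reverse=maximizing)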
Test Example…
[Figure: three-ply game tree (MAX-MIN-MAX) with leaf values 5 6 / 3 4 1 2 7 8]
Test Example…
Which nodes can be pruned?
[Figure: the same game tree, with layers labeled MAX, MIN, MAX and leaf values 5 6 / 3 4 1 2 7 8]
Answer: NONE! Because the most favorable nodes for both are
explored last (i.e., in the diagram, are on the right-hand side).
Test Example 2…
Which nodes can be pruned?
[Figure: three-ply game tree (MAX-MIN-MAX) with leaf values 3 4 / 6 5 8 7 2 1]
Test Example 2…
Which nodes can be pruned?
[Figure: the same game tree, with layers labeled MAX, MIN, MAX and leaf values 6 5 8 7 2 1 3 4]
Answer: LOTS! Because the most favorable nodes for both are
explored first (i.e., in the diagram, are on the left-hand side).
Resource Limits
◉ Problem: In realistic games, cannot search to leaves!
◉ Example:
• Suppose we have 100 seconds and can explore 10K nodes/sec
• So can check 1M nodes per move
• α-β reaches about depth 8 – decent chess program
◉ Guarantee of optimal play is gone
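The standard remedy, sketched below under the same assumed state interface (`eval_fn` is a hypothetical heuristic evaluator), is to cut off the search at a fixed depth and substitute a heuristic evaluation for the true utility:

def depth_limited_value(state, depth, maximizing, eval_fn):
    if state.is_terminal():
        return state.utility()
    if depth == 0:
        return eval_fn(state)  # heuristic estimate replaces the true utility
    values = (depth_limited_value(s, depth - 1, not maximizing, eval_fn)
              for s in state.successors())
    return max(values) if maximizing else min(values)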
https://www.alphagomovie.com/
AlphaZero, MuZero, and More…
◉ From a helicopter view, Monte Carlo Tree Search has one main purpose: given a game state, choose the most promising next move.
Monte Carlo Tree Search
1. Selection
Start from root R and select successive child nodes until a leaf node L is reached. The root is the current
game state and a leaf is any node that has a potential child from which no simulation (playout) has yet
been initiated.
2. Expansion
Unless L ends the game decisively (e.g. win/loss/draw) for either player, create one (or more) child
nodes and choose node C from one of them. Child nodes are any valid moves from the game position
defined by L.
3. Simulation
Complete one random playout from node C. This step is sometimes also called playout or rollout. A
playout may be as simple as choosing uniform random moves until the game is decided (for example in
chess, the game is won, lost, or drawn).
4. Backpropagation
Use the result of the playout to update information in the nodes on the path from C to R.
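A minimal sketch of one iteration of these four steps (the Node fields and the game-state methods – legal_moves(), apply(), is_terminal(), result() – are illustrative assumptions; best_uct_child is sketched on the UCT slide below):

import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = []
        self.untried = list(state.legal_moves())  # moves not yet expanded
        self.Q = 0.0  # total simulation reward
        self.N = 0    # total number of visits

def mcts_iteration(root):
    node = root
    # 1. Selection: descend while the node is fully expanded and non-terminal
    while not node.untried and node.children:
        node = best_uct_child(node)  # see the UCT slide below
    # 2. Expansion: create one child C for an untried move
    if node.untried:
        move = node.untried.pop()
        child = Node(node.state.apply(move), parent=node)
        node.children.append(child)
        node = child
    # 3. Simulation: random playout from C until the game is decided
    state = node.state
    while not state.is_terminal():
        state = state.apply(random.choice(state.legal_moves()))
    reward = state.result()  # e.g., +1 win / 0 draw / -1 loss
    # 4. Backpropagation: update statistics on the path from C back to R
    # (a single-perspective reward is used here for simplicity)
    while node is not None:
        node.N += 1
        node.Q += reward
        node = node.parent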
Choosing the Most Promising Move: Monte Carlo
◉ In the Monte Carlo Tree Search algorithm, the most promising move is computed in a slightly different fashion.
◉ For every node on the backpropagation path, certain statistics are computed and updated.
Node’s Statistics
◉ Backpropagation updates the total simulation reward Q(v) and the total number of visits N(v) for all nodes v on the backpropagation path:
• Q(v) – total simulation reward, e.g., the sum of the simulation results that passed through the considered node.
• N(v) – total number of visits, i.e., how many times a node has been on the backpropagation path.
◉ Upper Confidence Bound applied to trees (UCT) is a function that lets us choose the next node among visited nodes to traverse through – the core function of Monte Carlo Tree Search. For a child v_i of the current node v:

UCT(v_i, v) = Q(v_i)/N(v_i) + c · √( ln N(v) / N(v_i) )

where the first term (the average simulation reward) is the exploitation component and the second term is the exploration component.
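A sketch of this selection rule matching the formula above (c = √2 is a common textbook choice, assumed here; Node is the class from the earlier MCTS sketch):

import math

def best_uct_child(parent, c=math.sqrt(2)):
    def uct(child):
        exploit = child.Q / child.N  # average simulation reward
        explore = c * math.sqrt(math.log(parent.N) / child.N)
        return exploit + explore
    return max(parent.children, key=uct)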
◉ The general idea of simulating moves into the future, observing the outcome, and using the outcome to determine which moves are good ones is one kind of reinforcement learning (which we will cover in future lectures).
Search with Uncertainty
Stochastic Games
◉ What if we don’t know what the result of an action will be? E.g.,
• In solitaire, the shuffle is unknown
◉ Quiz: what would Pacman's computation look like if we assumed that the ghosts were doing 1-ply minimax and taking the result 80% of the time, otherwise moving randomly?
Expectimax for Pacman
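The Pacman diagram from this slide is not reproduced; below is a minimal expectimax sketch (state interface assumed as before; outcomes() is a hypothetical method returning (probability, successor) pairs, which could encode the quiz's 80%/20% ghost model):

def expectimax_value(state):
    if state.is_terminal():
        return state.utility()
    if state.to_move() == 'MAX':
        return max(expectimax_value(s) for s in state.successors())
    # Chance node: probability-weighted average over possible outcomes
    return sum(p * expectimax_value(s) for p, s in state.outcomes())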
Stochastic Two-Player: Backgammon
◉ The goal of the game is to move all one’s pieces
off the board.
• Black moves clockwise toward 25, and
White moves counterclockwise toward 0.
• A piece can move to any position unless
multiple opponent pieces are there; if there is
one opponent, it is captured and must
start over.
◉ Generalization of minimax:
• Terminals have utility tuples
• Node values are also utility tuples
• Each player maximizes its own component
• Can give rise to cooperation and competition dynamically… (see the sketch below)
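A sketch of this tuple-valued generalization (sometimes called max^n; the state interface – utility_tuple(), and to_move() returning a player index – is an assumption for illustration):

def maxn_value(state):
    if state.is_terminal():
        return state.utility_tuple()  # one utility component per player
    i = state.to_move()  # index of the player choosing at this node
    return max((maxn_value(s) for s in state.successors()),
               key=lambda t: t[i])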
Utilities
◉ Utilities are functions from outcomes (states of the world) to real numbers that
describe an agent’s preferences
◉ For average-case expectimax reasoning, we need magnitudes to be meaningful (we’ll talk more about
utilities in the future)
Summary
◉ In two-player, discrete, deterministic, turn-taking zero-sum games with perfect information, the minimax
algorithm can select optimal moves by a depth-first enumeration of the game tree.
◉ The alpha–beta search algorithm computes the same optimal move as minimax, but achieves much
greater efficiency by eliminating subtrees that are provably irrelevant.
◉ Usually, it is not feasible to consider the whole game tree (even with alpha–beta), so we need to cut the
search off at some point and apply a heuristic evaluation function that estimates the utility of a state.
◉ An alternative called Monte Carlo tree search (MCTS) evaluates states not by applying a heuristic
function, but by playing out the game all the way to the end and using the rules of the game to see who
won. Since the moves chosen during the playout may not have been optimal moves, the process is
repeated multiple times and the evaluation is an average of the results.
◉ Games of chance can be handled by expectiminimax, an extension to the minimax algorithm that
evaluates a chance node by taking the average utility of all its children, weighted by the probability of
each child.
Thanks!
Q&A