Unit III AI
Game Theory, Optimal Decisions in Games, Heuristic Alpha–Beta Tree Search, Monte Carlo Tree
Search, Stochastic Games, Partially Observable Games, Limitations of Game Search Algorithms,
Constraint Satisfaction Problems (CSP), Constraint Propagation: Inference in CSPs, Backtracking
Search for CSPs.
Unit III
Adversarial Search and Games
Adversarial Search
Adversarial search is search in competitive environments, where the agents' goals are in conflict.
Games are the classic examples of adversarial search:
States are easy to represent (unlike many other real-world problems).
Agents are restricted to a small number of actions.
Outcomes of agent actions are defined by precise rules.
Games are usually hard to solve.
Game Theory
Multiagent environments are those in which each agent needs to consider the actions of other agents and how they affect its own welfare.
The unpredictability of these other agents can introduce contingencies into the agent's problem-solving process.
In environments where the agents' goals are in conflict, we get adversarial search problems, often known as games.
Mathematical game theory, a branch of economics, views any multiagent environment as a game, provided that the impact of each agent on the others is "significant," regardless of whether the agents are cooperative or competitive.
Types of Games
1. Perfect Information  2. Imperfect Information  3. Deterministic
4. Non-Deterministic  5. Zero-sum  6. Constant sum
1. Perfect information
The agent can see the complete board.
The agent has all information about the game.
Each agent can also see the other agents' moves.
Example: Chess, Checkers, etc.
2. Imperfect information
The agent does not have all information about the game.
The agent is not fully aware of what is going on.
Example: Battleship, Bridge, Poker, etc.
3. Deterministic
The game follows a strict pattern and set of rules.
There is no randomness associated with the game.
Example: Chess, Checkers, Tic-Tac-Toe, etc.
4. Non-deterministic
The game has various unpredictable events and a factor of luck or chance.
The luck factor is introduced by dice or cards.
The response to each action is not fixed.
Such games are also called stochastic games.
Example: Poker, Backgammon, etc.
Figure shows part of the game tree for tic-tac-toe (noughts and crosses).
From the initial state, MAX has nine possible moves.
Play alternates between MAX's placing an X and MIN's placing an O until we reach leaf nodes corresponding to terminal states such that one player has three in a row or all the squares are filled.
The number on each leaf node indicates the utility value of the terminal state from the point of view of MAX; high values are assumed to be good for MAX and bad for MIN.
For tic-tac-toe the game tree is relatively small—fewer than 9! = 362,880 terminal nodes.
Minimax value
The optimal strategy can be determined from the minimax value of each node, written MINIMAX(s), which defines the numeric value at every node, terminal and non-terminal alike:

MINIMAX(s) =
  UTILITY(s)                           if TERMINAL-TEST(s)
  max_a MINIMAX(RESULT(s, a))          if PLAYER(s) = MAX
  min_a MINIMAX(RESULT(s, a))          if PLAYER(s) = MIN
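As a concrete illustration, here is a minimal minimax sketch in Python over a toy two-ply tree; the leaf values are hypothetical, chosen only to exercise the recursion:

```python
# Minimax over a toy game tree given as nested lists: an inner list is a
# node whose children alternate between MAX and MIN levels; a number is a
# terminal utility from MAX's point of view.
def minimax(node, maximizing=True):
    if isinstance(node, (int, float)):  # terminal state: return its utility
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# MAX chooses among three MIN nodes; the minimax value here is 3.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree))  # -> 3
```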
Heuristic Alpha–Beta Tree Search
The problem with minimax search is that the number of game states it has to examine is exponential in the depth of the tree.
Unfortunately, we can't eliminate the exponent, but it turns out we can effectively cut it in half.
The trick is that it is possible to compute the correct minimax decision without looking at every node in the game tree.
That is, we can borrow the idea of pruning to eliminate large parts of the tree from consideration.
The particular technique is called alpha–beta pruning.
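A sketch of the same toy search with alpha–beta pruning added: alpha tracks the best value MAX can already guarantee, beta the best value MIN can guarantee, and a branch is abandoned as soon as alpha ≥ beta:

```python
# Alpha-beta pruning over the same nested-list toy tree.
def alphabeta(node, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:        # remaining children cannot change the result
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree))  # -> 3, while examining fewer leaves than plain minimax
```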
Monte Carlo Tree Search (MCTS)
MCTS builds a search tree by repeating four steps: selection, expansion, simulation, and backpropagation.
1. Selection:
Starting at the root, the tree is descended by repeatedly choosing the child with the highest UCB (upper confidence bound) score:

Si = xi + C √(ln N / ni)

where:
Si = value of node i
xi = empirical mean (average reward) of node i
C = a constant (the exploration parameter)
N = total number of simulations of the parent node
ni = number of simulations of node i

When traversing the tree during the selection process, the child node that returns the greatest value from the above equation is the one that gets selected. During traversal, once a child node is found which is also a leaf node, MCTS jumps to the expansion step.
2. Expansion:
In this step, a new child node is added to the tree, attached to the node that was optimally reached during the selection process.
3. Simulation:
In this process, a simulation is performed by choosing moves or strategies until a result or predefined
state is achieved.
4. Backpropagation:
After determining the value of the newly added node, the remaining tree must be updated. The backpropagation process walks from the new node back to the root node. Along the way, the number of simulations stored in each node is incremented, and if the new node's simulation resulted in a win, the number of wins is also incremented.
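Putting the four steps together, here is a compact MCTS sketch in Python. It plays the simple game of Nim (take 1-3 stones; whoever takes the last stone wins), which is an assumed stand-in game, not one discussed in these notes; the exploration constant C = 1.4 is a conventional choice:

```python
import math, random

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

class Node:
    def __init__(self, stones, player, parent=None):
        self.stones, self.player = stones, player   # player = side to move
        self.parent, self.children = parent, []
        self.wins, self.visits = 0, 0
        self.untried = legal_moves(stones)          # moves not yet expanded

    def uct_child(self, c=1.4):
        # Selection rule: maximize x_i + C * sqrt(ln N / n_i)
        return max(self.children,
                   key=lambda ch: ch.wins / ch.visits
                   + c * math.sqrt(math.log(self.visits) / ch.visits))

def mcts(root_stones, iterations=3000):
    root = Node(root_stones, player=0)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes via UCT.
        while not node.untried and node.children:
            node = node.uct_child()
        # 2. Expansion: add one child for an untried move.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, 1 - node.player, parent=node)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new node to a terminal state.
        stones, player = node.stones, node.player
        while stones > 0:
            stones -= random.choice(legal_moves(stones))
            player = 1 - player
        winner = 1 - player  # whoever moved last took the final stone
        # 4. Backpropagation: update visit/win counts back up to the root.
        while node is not None:
            node.visits += 1
            if winner != node.player:  # win for the player who moved INTO node
                node.wins += 1
            node = node.parent
    best = max(root.children, key=lambda ch: ch.visits)
    return root.stones - best.stones   # number of stones to take

print(mcts(10))  # with 10 stones, taking 2 (leaving a multiple of 4) is optimal
```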
Stochastic Games
Many games mirror the unpredictability by including a random element, such as the
throwing of dice.
We call these stochastic games.
Backgammon is a typical game that combines luck and skill. Dice are rolled at the beginning of a player's turn to determine the legal moves.
In the backgammon position of Figure 5.6,
for example, White has rolled a 6–5 and has four possible moves.
The branches leading from each chance node denote the possible dice rolls; each branch
is labeled with the roll and its probability.
There are 36 ways to roll two dice, each equally likely; but because a 6–5 is the same as
a 5–6, there are only 21 distinct rolls.
The six doubles (1–1 through 6–6) each have a probability of 1/36, so we say P(1–1) =
1/36. The other 15 distinct rolls each have a 1/18 probability.
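These counts are easy to verify by enumerating all 36 ordered rolls, for example in Python:

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# Group the 36 equally likely ordered rolls into distinct unordered rolls.
counts = Counter(tuple(sorted(roll)) for roll in product(range(1, 7), repeat=2))
print(len(counts))                   # 21 distinct rolls
print(Fraction(counts[(1, 1)], 36))  # a double such as 1-1: 1/36
print(Fraction(counts[(5, 6)], 36))  # a non-double such as 6-5: 1/18
```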
The next step is to understand how to make correct decisions. Obviously, we still want
to pick the move that leads to the best position.
However, positions do not have definite minimax values. Instead, we can only calculate
the expected value of a position: the average over all possible outcomes of the chance nodes.
This leads us to generalize the minimax value for deterministic games to an expectiminimax value for games with chance nodes. Terminal nodes and MAX and MIN nodes (for which the dice roll is known) work exactly the same way as before.
For chance nodes we compute the expected value, which is the sum of the value over
all outcomes, weighted by the probability of each chance action:
EXPECTIMINIMAX(s) =
  UTILITY(s)                                 if TERMINAL-TEST(s)
  max_a EXPECTIMINIMAX(RESULT(s, a))         if PLAYER(s) = MAX
  min_a EXPECTIMINIMAX(RESULT(s, a))         if PLAYER(s) = MIN
  Σ_r P(r) EXPECTIMINIMAX(RESULT(s, r))      if PLAYER(s) = CHANCE
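A minimal expectiminimax sketch over a hypothetical toy tree, where chance nodes carry explicit probabilities:

```python
# A node is either a number (terminal utility), ("max", children),
# ("min", children), or ("chance", [(probability, child), ...]).
def expectiminimax(node):
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    # Chance node: probability-weighted average over the outcomes.
    return sum(p * expectiminimax(c) for p, c in children)

# MAX chooses between a safe move worth 2 and a coin flip worth 0 or min(10, 4).
tree = ("max", [2, ("chance", [(0.5, 0), (0.5, ("min", [10, 4]))])])
print(expectiminimax(tree))  # -> max(2, 0.5*0 + 0.5*4) = 2
```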
Partially Observable Games
Chess has often been described as war in miniature, but it lacks at least one major
characteristic of real wars, namely, partial observability.
In the "fog of war," the existence and disposition of enemy units is often unknown until revealed by direct contact.
As a result, warfare includes the use of scouts and spies to gather information and the
use of concealment and bluff to confuse the enemy.
Kriegspiel, a partially observable variant of chess, makes this concrete: each player sees a board containing only his own pieces, and a referee, who can see everything, adjudicates each attempted move. If Black is checkmated or stalemated, the referee says so; otherwise, it is Black's turn to move.
Initially, White's belief state is a singleton because Black's pieces haven't moved yet.
After White makes a move and Black responds, White's belief state contains 20 positions, because Black has 20 replies to any White move.
Keeping track of the belief state as the game progresses is exactly the problem of state estimation.
We can map Kriegspiel state estimation directly onto the partially observable,
nondeterministic framework, if we consider the opponent as the source of nondeterminism;
that is, the RESULTS of White's move are composed from the (predictable) outcome of White's own move and the unpredictable outcome given by Black's reply.
Given a current belief state, White may ask, “Can I win the game?”
For a partially observable game, the notion of a strategy is altered; instead of
specifying a move to make for each possible move the opponent might make, we need a move
for every possible percept sequence that might be received.
For Kriegspiel, a winning strategy, or guaranteed checkmate, is one that, for each
possible percept sequence, leads to an actual checkmate for every possible board state in the
current belief state, regardless of how the opponent moves.
With this definition, the opponent‘s belief state is irrelevant—the strategy has to work
even if the opponent can see all the pieces.
This greatly simplifies the computation. Figure 5.13 shows part of a guaranteed
checkmate for the KRK (king and rook against king) endgame.
In this case, Black has just one piece (the king), so a belief state for White can be
shown in a single board by marking each possible position of the Black king.
The general AND-OR search algorithm can be applied to the belief-state space to find
guaranteed checkmates.
An incremental belief-state algorithm often finds midgame checkmates up to depth 9—probably well beyond the abilities of human players.
In addition to guaranteed checkmates, Kriegspiel admits an entirely new concept that
makes no sense in fully observable games: probabilistic checkmate.
Such checkmates are still required to work in every board state in the belief state; they
are probabilistic with respect to randomization of the winning player‘s moves.
To get the basic idea, consider the problem of finding a lone black king using just the
white king.
Simply by moving randomly, the white king will eventually bump into the black king even if the latter tries to avoid this fate, since Black cannot keep guessing the right evasive moves indefinitely. In the terminology of probability theory, detection occurs with probability 1.
The KBNK endgame—king, bishop and knight against king—is won in this sense;
White presents Black with an infinite random sequence of choices, for one of which Black
will guess incorrectly and reveal his position, leading to checkmate.
The KBBK endgame, on the other hand, is won with probability 1 − ε.
White can force a win only by leaving one of his bishops unprotected for one move.
If Black happens to be in the right place and captures the bishop (a move that would lose if the bishops were protected), the game is drawn.
White can choose to make the risky move at some randomly chosen point in the middle of a very long sequence, thus reducing ε to an arbitrarily small constant, but cannot reduce ε to zero.
It is quite rare that a guaranteed or probabilistic checkmate can be found within any
reasonable depth, except in the endgame.
Sometimes a checkmate strategy works for some of the board states in the current belief
state but not others.
Trying such a strategy may succeed, leading to an accidental checkmate—accidental
in the sense that White could not know that it would be checkmate—if Black‘s pieces happen
to be in the right places.
(Most checkmates in games between humans are of this accidental nature.)
This idea leads naturally to the question of how likely it is that a given strategy will win,
which leads in turn to the question of how likely it is that each board state in the current belief
state is the true board state.
One‘s first inclination might be to propose that all board states in the current belief state
are equally likely—but this can‘t be right. Consider, for example, White‘s belief state after
Black‘s first move of the game.
By definition (assuming that Black plays optimally), Black must have played an
optimal move, so all board states resulting from suboptimal moves ought to be assigned zero
probability.
This argument is not quite right either, because each player’s goal is not just to move
pieces to the right squares but also to minimize the information that the opponent has about
their location.
Playing any predictable "optimal" strategy provides the opponent with information.
Hence, optimal play in partially observable games requires a willingness to play somewhat
randomly. (This is why restaurant hygiene inspectors do random inspection visits.)
This means occasionally selecting moves that may seem "intrinsically" weak—but they gain strength from their very unpredictability, because the opponent is unlikely to have prepared any defense against them.
From these considerations, it seems that the probabilities associated with the board
states in the current belief state can only be calculated given an optimal randomized
strategy; in turn, computing that strategy seems to require knowing the probabilities of the
various states the board might be in.
This conundrum can be resolved by adopting the game-theoretic notion of an equilibrium solution.
An equilibrium specifies an optimal randomized strategy for each player.
Computing equilibria is prohibitively expensive, however, even for small games, and is
out of the question for Kriegspiel.
At present, the design of effective algorithms for general Kriegspiel play is an open research topic. Most systems perform bounded-depth lookahead in their own belief-state space, ignoring the opponent's belief state. Evaluation functions resemble those for the observable game but include a component for the size of the belief state—smaller is better!
Card games
Card games provide many examples of stochastic partial observability, where the missing
information is generated randomly.
For example, in many games, cards are dealt randomly at the beginning of the game,
with each player receiving a hand that is not visible to the other
players.
Such games include bridge, whist, hearts, and some forms of poker.
At first sight, it might seem that these card games are just like dice games: the cards are dealt randomly and determine the moves available to each player, but all the "dice" are rolled at the beginning!
Even though this analogy turns out to be incorrect, it suggests an effective algorithm: consider all possible deals of the invisible cards; solve each one as if it were a fully observable game; and then choose the move that has the best outcome averaged over all the deals.
Suppose that each deal s occurs with probability P(s); then the move we want is

argmax_a Σ_s P(s) MINIMAX(RESULT(s, a)).    (5.1)

Solving even one deal is quite difficult, so solving ten million is out of the question.
Instead, we resort to a Monte Carlo approximation: instead of adding up all the deals, we take a random sample of N deals, where the probability of deal s appearing in the sample is proportional to P(s):

argmax_a (1/N) Σ_{i=1}^{N} MINIMAX(RESULT(s_i, a)).    (5.2)
(Notice that P(s) does not appear explicitly in the summation, because the samples are
already drawn according to P(s).)
As N grows large, the sum over the random sample tends to the exact value, but even
for fairly small N—say, 100 to 1,000—the method gives a good
approximation.
It can also be applied to deterministic games such as Kriegspiel, given some reasonable
estimate of P(s).
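As a sketch of this sampling idea in Python: `sample_deal` and `solve` below are hypothetical stand-ins for a game-specific deal generator (drawing deals with probability P(s)) and a fully observable solver such as minimax:

```python
import random

# Sketch of Equation 5.2: sample N deals, solve each as a fully observable
# game, and pick the move with the best average value.
def monte_carlo_move(moves, sample_deal, solve, n=100):
    totals = {m: 0.0 for m in moves}
    for _ in range(n):
        deal = sample_deal()          # drawn with probability P(s)
        for m in moves:
            totals[m] += solve(deal, m)
    return max(moves, key=lambda m: totals[m] / n)

# Toy usage: move "a" is worth the hidden value, move "b" a flat 0.4.
best = monte_carlo_move(["a", "b"], lambda: random.random(),
                        lambda deal, m: deal if m == "a" else 0.4)
print(best)  # usually "a", since the hidden value averages 0.5
```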
For games like whist and hearts, where there is no bidding or betting phase before play
commences, each deal will be equally likely and so the values of P(s) are all equal.
For bridge, play is preceded by a bidding phase in which each team indicates how many
tricks it expects to win.
Since players bid based on the cards they hold, the other players learn more about the probability of each deal.
Taking this into account in deciding how to play the hand is tricky, for the reasons
mentioned in our description of Kriegspiel: players may bid in such
a way as to minimize the information conveyed to their opponents.
The strategy described in Equations 5.1 and 5.2 is sometimes called averaging over
clairvoyance because it assumes that the game will become observable to both players
immediately after the first move.
Consider the following story:
Day 1: Road A leads to a heap of gold; Road B leads to a fork. Take the left fork and
you‘ll find a bigger heap of gold, but take the right fork and you‘ll be run over by a bus.
Day 2: Road A leads to a heap of gold; Road B leads to a fork. Take the right fork and
you‘ll find a bigger heap of gold, but take the left fork and you‘ll be run over by a bus.
Day 3: Road A leads to a heap of gold; Road B leads to a fork. One branch of the
fork leads to a bigger heap of gold, but take the wrong fork and you‘ll be hit by a bus.
Unfortunately you don‘t know which fork is which.
Averaging over clairvoyance leads to the following reasoning: on Day 1, B is the right
choice; on Day 2, B is the right choice; on Day 3, the situation is the same as either Day 1 or
Day 2, so B must still be the right choice. Now we can see how averaging over clairvoyance
fails: it does not consider the belief state that the agent will be in after acting.
A belief state of total ignorance is not desirable, especially when one possibility is certain death. Because averaging over clairvoyance assumes that every future state will automatically be one of perfect knowledge, the approach never selects actions that gather information (like the first move in Figure 5.13); nor will it choose actions that hide information from the opponent or provide information to a partner, because it assumes that they already know the information; and it will never bluff in poker, because it assumes the opponent can see its cards.
Limitations of Game Search Algorithms
2. Computational cost:
Game playing can be computationally expensive, especially for complex games such as chess or Go, and may require powerful computers to achieve real-time performance.
3. Due to the huge branching factor, the process of reaching the goal is slower.
4. Evaluation and search of all possible nodes and branches degrades the performance and efficiency of the engine.
5. Both the players have too many choices to decide from.
6. If there is a restriction of time and space, it is not possible to explore the entire tree.
Constraint Satisfaction Problems (CSP)
A constraint satisfaction problem consists of a set of variables, a domain of values for each variable, and a set of constraints specifying allowable combinations of values.
A complete assignment is one in which every variable is assigned, and a solution to a CSP is a consistent, complete assignment.
A partial assignment is one that assigns values to only some of the variables.
Example: job-shop scheduling. Consider the problem of scheduling the assembly of a car. Constraints can assert that one task must occur before another—for example, a wheel must be installed before the hubcap is put on—and that only so many tasks can go on at once.
Constraints can also specify that a task takes a certain amount of time to complete.
We consider a small part of the car assembly, consisting of 15 tasks: install axles (front and back),
affix all four wheels (right and left, front and back), tighten nuts for each wheel, affix hubcaps, and inspect
the final assembly. We can represent the tasks with 15 variables:
X = {AxleF, AxleB, WheelRF, WheelLF, WheelRB, WheelLB, NutsRF, NutsLF, NutsRB, NutsLB, CapRF, CapLF, CapRB, CapLB, Inspect}.

The value of each variable is the time that the task starts. Next we represent precedence constraints between individual tasks. Whenever a task T1 must occur before task T2, and task T1 takes duration d1 to complete, we add an arithmetic constraint of the form

T1 + d1 ≤ T2.

In our example, the axles have to be in place before the wheels are put on, and it takes 10 minutes to install an axle, so we write

AxleF + 10 ≤ WheelRF;   AxleF + 10 ≤ WheelLF;
AxleB + 10 ≤ WheelRB;   AxleB + 10 ≤ WheelLB.

Next, for each wheel, we must affix the wheel (which takes 1 minute), then tighten the nuts (2 minutes), and finally attach the hubcap (1 minute, but not represented yet):

WheelRF + 1 ≤ NutsRF;   NutsRF + 2 ≤ CapRF;
WheelLF + 1 ≤ NutsLF;   NutsLF + 2 ≤ CapLF;
WheelRB + 1 ≤ NutsRB;   NutsRB + 2 ≤ CapRB;
WheelLB + 1 ≤ NutsLB;   NutsLB + 2 ≤ CapLB.

Suppose we have four workers to install wheels, but they have to share one tool that helps put the axle in place. We need a disjunctive constraint to say that AxleF and AxleB must not overlap in time; either one comes first or the other does:

(AxleF + 10 ≤ AxleB)  or  (AxleB + 10 ≤ AxleF).

This looks like a more complicated constraint, combining arithmetic and logic. But it still reduces to a set of pairs of values that AxleF and AxleB can take on.

We also need to assert that the inspection comes last and takes 3 minutes. For every variable except Inspect we add a constraint of the form X + dX ≤ Inspect. Finally, suppose there is a requirement to get the whole assembly done in 30 minutes. We can achieve that by limiting the domain of all variables: Di = {1, 2, . . . , 27}.
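A small sketch of how a few of these scheduling constraints might be encoded in Python. Only four of the fifteen tasks are shown, and representing constraints as predicates over an assignment is one possible encoding, not the only one:

```python
# Variables map task names to start times; constraints are predicates.
durations = {"AxleF": 10, "AxleB": 10, "WheelRF": 1, "NutsRF": 2}

def precedes(t1, t2):
    # Precedence constraint T1 + d1 <= T2: t1 finishes before t2 starts.
    return lambda a: a[t1] + durations[t1] <= a[t2]

constraints = [
    precedes("AxleF", "WheelRF"),
    precedes("WheelRF", "NutsRF"),
    # Disjunctive constraint: the two axle jobs must not overlap.
    lambda a: a["AxleF"] + 10 <= a["AxleB"] or a["AxleB"] + 10 <= a["AxleF"],
]

# Check one hand-built schedule against the constraints.
assignment = {"AxleF": 1, "AxleB": 11, "WheelRF": 11, "NutsRF": 12}
print(all(c(assignment) for c in constraints))  # -> True
```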
The simplest kind of CSP involves variables that have discrete, finite domains. Map-coloring problems and scheduling with time limits are both of this kind. The 8-queens problem can also be viewed as a finite-domain CSP, where the variables Q1, . . . , Q8 are the positions of each queen in columns 1, . . . , 8 and each variable has the domain Di = {1, 2, 3, 4, 5, 6, 7, 8}.
A discrete domain can be infinite, such as the set of integers or strings.
(If we didn‘t put a deadline on the job-scheduling problem, there would be an infinite number of start
times
for each variable.)
With infinite domains, it is no longer possible to describe constraints by enumerating all allowed combinations of values.
Instead, a constraint language must be used that understands constraints such as T1 + d1 ≤ T2 directly, without enumerating the set of pairs of allowable values for (T1, T2).
In a cryptarithmetic puzzle, for example, each letter variable has the domain {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} and each auxiliary carry variable has the domain {0, 1}.
Global constraint
A global constraint involves an arbitrary number of variables; the most common is Alldiff, which says that all of the variables involved must have distinct values.
Constraint Propagation: Inference in CSPs
1. Node consistency
The simplest type of constraint is the unary constraint, which restricts the value of a single variable. For example, in the map-coloring problem it could be the case that South Australians won't tolerate the color green; we can express that with the unary constraint ⟨(SA), SA ≠ green⟩.
A binary constraint relates two variables. For example, SA ≠ NSW is a binary constraint.
A single variable (corresponding to a node in the CSP network) is node-consistent if all the values in the variable's domain satisfy the variable's unary constraints.
For example, in the variant of the Australia map-coloring problem (Figure 6.1) where South Australians dislike green, the variable SA starts with domain {red, green, blue}, and we can make it node-consistent by eliminating green, leaving SA with the reduced domain {red, blue}.
2. Arc consistency
A variable in a CSP is arc-consistent if every value in its domain satisfies the variable's binary constraints.
A network is arc-consistent if every variable is arc-consistent with every other variable.
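A minimal AC-3 sketch in Python, using the usual queue-of-arcs formulation; the example applies it to the WA/SA/NT fragment of the Australia map with WA fixed to red:

```python
from collections import deque

# `domains` maps each variable to a set of values; `constraints` maps each
# directed arc (x, y) to a predicate over (vx, vy).
def ac3(domains, constraints):
    queue = deque(constraints)  # start with every arc
    while queue:
        x, y = queue.popleft()
        # Revise: remove values of x that have no supporting value in y.
        removed = {vx for vx in domains[x]
                   if not any(constraints[(x, y)](vx, vy) for vy in domains[y])}
        if removed:
            domains[x] -= removed
            if not domains[x]:
                return False  # a domain was wiped out: inconsistency detected
            # x's domain shrank, so every arc pointing at x must be rechecked.
            queue.extend(arc for arc in constraints if arc[1] == x)
    return True

# With WA fixed to red, two colors are not enough for WA, SA, and NT.
neq = lambda a, b: a != b
domains = {"WA": {"red"}, "SA": {"red", "blue"}, "NT": {"red", "blue"}}
pairs = [("WA", "SA"), ("WA", "NT"), ("SA", "NT")]
constraints = {(x, y): neq for x, y in pairs}
constraints.update({(y, x): neq for x, y in pairs})
print(ac3(domains, constraints))  # -> False
```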
3. Path consistency
A two-variable set {Xi, Xj} is path-consistent with respect to a third variable Xm if, for every assignment {Xi = a, Xj = b}
consistent with the constraints on {Xi, Xj},
there is an assignment to Xm that satisfies the constraints on {Xi, Xm} and {Xm, Xj}.
This is called path consistency.
Let‘s see how path consistency fares in coloring the Australia map with two colors.
We will make the set {WA, SA} path consistent with respect to NT.
We start by enumerating the consistent assignments to the set.
In this case, there are only two:
{WA = red, SA = blue} and {WA = blue, SA = red}.
With both of these assignments, NT can be neither red nor blue (because it would conflict with either WA or SA). Because there is no valid choice for NT, we eliminate both assignments, and we end up with no valid assignments for {WA, SA}. Therefore, there can be no solution to this problem with two colors.
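The same elimination can be checked mechanically with a short enumeration (a sketch, with the colors restricted to two as in the example):

```python
from itertools import product

# Path consistency of {WA, SA} with respect to NT: every consistent
# (WA, SA) pair must have some NT value consistent with both.
colors = ["red", "blue"]
pairs = [(wa, sa) for wa, sa in product(colors, colors) if wa != sa]
surviving = [(wa, sa) for wa, sa in pairs
             if any(nt != wa and nt != sa for nt in colors)]
print(pairs)      # [('red', 'blue'), ('blue', 'red')]
print(surviving)  # [] -> no pair survives, so two colors cannot work
```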
4. K-consistency
Stronger forms of propagation can be defined with the notion of k-consistency.
A CSP is k-consistent if, for any set of k − 1 variables and for any consistent
assignment to those variables, a consistent value can always be assigned to any kth variable.
1-consistency says that, given the empty set, we can make any set of one variable
consistent: this is what we called node consistency.
2-consistency is the same as arc consistency. For binary constraint networks, 3-consistency is the same as path consistency.
Backtracking Search for CSPs
1. Variable ordering
The backtracking algorithm contains the line
var ←SELECT-UNASSIGNED-VARIABLE(csp)
The simplest strategy for SELECT-UNASSIGNED-VARIABLE is to choose the next
unassigned variable in order, {X1,X2, . . .}.
This static variable ordering seldom results in the most efficient search.
For example, after the assignments WA = red and NT = green in Figure 6.6, there is only one possible value for SA, so it makes sense to assign SA = blue next rather than assigning Q.
In fact, after SA is assigned, the choices for Q, NSW, and V are all forced.
This intuitive idea—choosing the variable with the fewest "legal" values—is called the minimum-remaining-values (MRV) heuristic.
It also has been called the ―most constrained variable‖ or ―fail-first‖ heuristic, the latter
because it picks a variable that is most likely to cause a failure soon, thereby pruning the
search tree.
2. Degree heuristic
The degree heuristic attempts to reduce the branching factor on future choices by selecting the variable that is involved in the largest number of constraints on other unassigned variables.
In Figure 6.1(b), SA is the variable with highest degree, 5; the other variables have degree 2 or 3, except for T, which has degree 0.
In fact, once SA is chosen, applying the degree heuristic solves the problem without any false steps: you can choose any consistent color at each choice point and still arrive at a solution with no backtracking.
The minimum-remaining-values heuristic is usually a more powerful guide, but the degree heuristic can be useful as a tie-breaker.
Least-constraining-value heuristic
It prefers the value that rules out the fewest choices for the neighboring variables in the
constraint graph.
For example, suppose that in Figure 6.1 we have generated the partial assignment with
WA=red and NT =green and that our next choice is for Q.
Blue would be a bad choice because it eliminates the last legal value left for Q's neighbor, SA.
The least-constraining-value heuristic therefore prefers red to blue.
In general, the heuristic is trying to leave the maximum flexibility for subsequent
variable assignments.
Forward checking
One of the simplest forms of inference is called forward checking. Whenever a variable X is assigned, the forward-checking process establishes arc consistency for it: for each unassigned variable Y that is connected to X by a constraint, delete from Y's domain any value that is inconsistent with the value chosen for X.
Because forward checking only does arc-consistency inferences, there is no reason to do forward checking if we have already done arc consistency as a preprocessing step.
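Combining these pieces, here is a compact sketch of backtracking search with the MRV heuristic and forward checking on the Australia map-coloring CSP. The adjacency table follows Figure 6.1; this is an illustrative sketch, not the textbook's pseudocode:

```python
neighbors = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"], "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": [],
}

def backtrack(assignment, domains):
    if len(assignment) == len(neighbors):
        return assignment
    # MRV: choose the unassigned variable with the fewest remaining values.
    var = min((v for v in neighbors if v not in assignment),
              key=lambda v: len(domains[v]))
    for value in list(domains[var]):
        # Forward checking: remove `value` from each unassigned neighbor.
        pruned = {n: domains[n] - {value} for n in neighbors[var]
                  if n not in assignment}
        if all(pruned.values()):  # no neighbor's domain was wiped out
            result = backtrack({**assignment, var: value},
                               {**domains, var: {value}, **pruned})
            if result:
                return result
    return None  # failure here triggers backtracking in the caller

domains = {v: {"red", "green", "blue"} for v in neighbors}
print(backtrack({}, domains))  # -> a valid 3-coloring of the map
```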
When a branch of the search fails, the standard backtracking algorithm backs up to the preceding decision point and tries a different value for it. This is called chronological backtracking because the most recent decision point is revisited.
Consider what happens when we apply simple backtracking in Figure 6.1 with a fixed
variable ordering Q, NSW, V , T, SA, WA, NT.
Suppose we have generated the partial assignment {Q=red, NSW =green, V =blue, T
=red}.
When we try the next variable, SA, we see that every value violates a constraint. We back up to T and try a new color for Tasmania! Obviously this is silly: repainting Tasmania cannot possibly resolve the problem with South Australia. A more intelligent approach is to backtrack to a variable that might fix the problem.
To do this, we will keep track of a set of assignments that are in conflict with some
value for SA.
The set (in this case {Q = red, NSW = green, V = blue}) is called the conflict set for SA.
The backjumping method backtracks to the most recent assignment in the conflict set; in this case, backjumping would jump over Tasmania and try a new value for V.
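As a small illustration, SA's conflict set can be computed directly from the partial assignment. The adjacency facts follow the Australia map; this is a toy fragment, not a full backjumping implementation:

```python
# Partial assignment under the ordering Q, NSW, V, T; T does not touch SA,
# so only the assignments to SA's actual neighbors enter the conflict set.
partial = {"Q": "red", "NSW": "green", "V": "blue", "T": "red"}
adjacent_to_SA = {"Q", "NSW", "V"}

conflict_set = {v: c for v, c in partial.items() if v in adjacent_to_SA}
print(conflict_set)  # {'Q': 'red', 'NSW': 'green', 'V': 'blue'}
# All three colors appear, so SA has no legal value; backjumping returns to
# the most recent member of this set (V), skipping over Tasmania.
```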