
UNIT II PROBLEM SOLVING METHODS

Problem Solving Methods – Search Strategies – Uninformed – Informed – Heuristics – Local Search Algorithms and Optimization Problems – Searching with Partial Observations – Constraint Satisfaction Problems – Constraint Propagation – Backtracking Search – Game Playing – Optimal Decisions in Games – Alpha-Beta Pruning – Stochastic Games

Solving Problems by Searching


Simple reflex agent - unable to plan ahead; it considers only the current percept when choosing an action.
Goal-based agent - a problem-solving agent decides what to do by finding sequences of actions leading to a desirable (goal) state.
Problem Solving Agents
This adopts a goal and aims to satisfy it. Example : Driving from one major town to another.
Steps in Problem Solving are :
 Goal Formulation - based on the current situation and the agent’s performance measure is
the first step in problem solving.
 Decide on factors that affect desirability to achieve goal
 Decide the various sequences of actions and states to consider. Choose best one.
 Find out which actions will lead to Goal state.
The process of looking for a sequence of actions that reaches the goal is called Search. A search
algorithm takes a problem as input and returns a solution in the form of an action sequence. Once a
solution is found, the actions it recommends can be carried out. This is called the execution phase.
Thus we have a simple "formulate, search, execute" design for the agent, as shown in the Figure.

Well- defined Problems and Solutions


A Problem can be defined formally by 5 components
 The initial state that the agent starts in.
Example : The initial state for our agent in Romania might be described as In(Arad)
 A description of the possible actions available to the agent. Given a particular state s,
ACTION(s) returns the set of actions that can be executed in s. For example, from the state
In (Arad) , the applicable actions are {Go(Sibiu), Go(Timisoara), Go(Zerind)}
 A description of what each action does: the formal name for this is the transition model,
specified by a function RESULT(s,a) that returns the state that results from doing action a in
state s.
RESULT(In(Arad),Go (Zerind)) = In (Zerind)
 The goal test, which determines whether a given state is a goal state. The agent’s goal in
Romania is the singleton set {In(Bucharest)}
 A path cost function that assigns a numeric cost to each path. The problem-solving agent
chooses a cost function that reflects its own performance measure.

Fig : A simplified road map of part of Romania

On holiday in Romania; currently in Arad. Flight leaves tomorrow from Bucharest


Formulate goal: be in Bucharest
Formulate problem:
states: various cities
actions: drive between cities
Find solution: sequence of cities, e.g., Arad, Sibiu, Fagaras, Bucharest

Problem Formulation is the process of deciding what actions and states to consider, given a goal.
Knowledge available to the agent is considered
 Current state
 Outcome of actions



Example Problems
A Toy Problem is intended to illustrate or exercise various problem-solving methods. A real-
world problem is one whose solutions people actually care about.
Toy Problems:
Vacuum World
States : The state is determined by both the agent location and the dirt locations. The agent is in
one of the 2 locations, each of which might or might not contain dirt. Thus there are 2 × 2^2 = 8
possible world states.
Initial state: Any state can be designated as the initial state.
Actions: In this simple environment, each state has just three actions: Left, Right, and Suck.
Larger environments might also include Up and Down.
Transition model: The actions have their expected effects, except that moving Left in the
leftmost square, moving Right in the rightmost square, and Sucking in a clean square have no
effect. The complete state space is shown in Figure.
Goal test: This checks whether all the squares are clean.
Path cost: Each step costs 1, so the path cost is the number of steps in the path.
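This formulation can be written down directly. The following minimal Python sketch (illustrative only; the state encoding and function names are not from the text) represents a state as the pair (agent location, set of dirty squares):

# A state is (agent_location, dirt) where dirt is a frozenset of dirty squares.
ACTIONS = ["Left", "Right", "Suck"]

def result(state, action):
    """Transition model: the state reached by doing the action in the state."""
    loc, dirt = state
    if action == "Left":
        return ("A", dirt)                     # moving Left in the leftmost square has no effect
    if action == "Right":
        return ("B", dirt)                     # moving Right in the rightmost square has no effect
    if action == "Suck":
        return (loc, dirt - {loc})             # sucking in a clean square has no effect
    return state

def goal_test(state):
    """Goal test: all squares are clean."""
    return len(state[1]) == 0

initial = ("A", frozenset({"A", "B"}))         # agent in square A, both squares dirty
plan = ["Suck", "Right", "Suck"]
s = initial
for a in plan:
    s = result(s, a)
print(goal_test(s))                            # True: the plan reaches a goal state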

8- Puzzle Problem

States: A state description specifies the location of each of the eight tiles and the blank in one of the
nine squares.
Initial state: Any state can be designated as the initial state. Note that any given goal can be
reached from exactly half of the possible initial states.
Actions: The simplest formulation defines the actions as movements of the blank space Left, Right,
Up, or Down. Different subsets of these are possible depending on where the blank is.
Transition model: Given a state and action, this returns the resulting state; for example, if we apply
Left to the start state in Figure 3.4, the resulting state has the 5 and the blank switched.
Goal test: This checks whether the state matches the goal configuration shown in Figure.
Path cost: Each step costs 1, so the path cost is the number of steps in the path.


Search Algorithms
Figure shows the first few steps in growing the search tree for finding a route from Arad to
Bucharest. The root node of the tree corresponds to the initial state, In(Arad). The first step is to test
whether this is a goal state. We do this by expanding the current state; that is, applying each legal
action to the current state, thereby generating a new set of states. In this case, we add three branches
from the parent node In(Arad) leading to three new child nodes: In(Sibiu), In(Timisoara), and
In(Zerind). Now we must choose which of these three possibilities to consider further.
Suppose we choose Sibiu first. We check to see whether it is a goal state (it is not) and then expand
it to get In(Arad), In(Fagaras), In(Oradea), and In(RimnicuVilcea). We can then choose any of
these four or go back and choose Timisoara or Zerind. Each of these six nodes is a leaf node, that is, a
node with no children in the tree. The set of all leaf nodes available for expansion at any given
point is called the frontier.
The process of expanding nodes on the frontier continues until either a solution is found or there are
no more states to expand. The general TREE-SEARCH algorithm is shown as follows:
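The pseudocode figure is not reproduced here; the following is a minimal Python sketch of the same idea (an illustrative reconstruction, assuming caller-supplied actions, result, and goal_test functions, not the book's pseudocode):

from collections import deque

def tree_search(initial_state, actions, result, goal_test):
    """Generic TREE-SEARCH: repeatedly choose a frontier node, test it, expand it.
    A FIFO frontier gives breadth-first behaviour; a LIFO one gives depth-first."""
    frontier = deque([(initial_state, [])])    # each entry: (state, list of actions so far)
    while frontier:
        state, path = frontier.popleft()       # choose a leaf node for expansion
        if goal_test(state):
            return path                        # solution: a sequence of actions
        for action in actions(state):          # expand: apply each legal action
            frontier.append((result(state, action), path + [action]))
    return None                                # no more states to expand: failure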

Different Search Algorithms

UNINFORMED SEARCH STRATEGIES


• Uninformed or blind search strategies use only the information available in the
problem definition
• Informed or heuristic search strategies use additional information. Heuristic tells us
approximately how far the state is from the goal state. Heuristics might underestimate
or overestimate the merit of a state.

Breadth-First Search
Breadth-first search is a simple strategy in which the root node is expanded first, then all
the successors of the root node are expanded next, then their successors, and so on. In general, all
the nodes are expanded at a given depth in the search tree before any nodes at the next level are
expanded.
Breadth-first search is an instance of the general graph-search algorithm in which the
shallowest unexpanded node is chosen for expansion. This is achieved very simply by using a FIFO
queue for the frontier. Thus, new nodes (which are always deeper than their parents) go to the back
of the queue, and old nodes, which are shallower than the new nodes, get expanded first. Thus,
breadth-first search always has the shallowest path to every node on the frontier.
 All nodes in depth ‘d’ expanded before the nodes at depth ‘d+1’


 Finds the shallowest goal state
 Queuing function puts the newly generated states at the end of the queue
 Systematic strategy (level by level)
 BFS is complete and optimal (e.g., when every step costs 1). With branching factor b, every state is expanded into b new states.
 For a solution at depth d, 1 + b + b^2 + b^3 + ... + b^d nodes are expanded.
 The solution is found only at the dth level, giving time complexity O(b^d).
 Space and time complexity are both exponential, which is the main drawback.
Pseudocode and the progress of the search on a simple binary tree are shown in Figure.

Properties of breadth-First search


Complete: Yes (if b is finite)
Time : 1 + b + b^2 + b^3 + ... + b^d + b(b^d - 1) = O(b^(d+1)), i.e., exponential in d
Space : O(b^(d+1)) (keeps every node in memory)
Optimal : Yes (if cost = 1 per step); not optimal in general
Space is the big problem; can easily generate nodes at 100MB/sec so 24hrs = 8640GB.
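A minimal breadth-first (graph) search sketch, assuming a successors(state) function that yields (action, next_state) pairs (an illustrative implementation, not the book's pseudocode):

from collections import deque

def breadth_first_search(start, goal_test, successors):
    """BFS: FIFO frontier plus an explored set."""
    if goal_test(start):
        return []
    frontier = deque([(start, [])])
    explored = {start}
    while frontier:
        state, path = frontier.popleft()       # shallowest unexpanded node first
        for action, child in successors(state):
            if child not in explored:
                if goal_test(child):           # apply the goal test when the node is generated
                    return path + [action]
                explored.add(child)
                frontier.append((child, path + [action]))
    return None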



Uniform-cost search
When all step costs are equal, breadth-first search is optimal because it always expands the
shallowest unexpanded node. By a simple extension, we can find an algorithm that is optimal with
any step-cost function. Instead of expanding the shallowest node, uniform-cost search expands
the node n with the lowest path cost g(n). This is done by storing the frontier as a priority queue
ordered by g.
 Expands the lowest cost node as measured by the path cost g(n) rather than lowest depth
node.
 BFS uses g(n)=DEPTH(n)
 Uniform cost search finds the cheapest solution provided a simple requirement is met: the
cost of a path must never decrease as we go along the path, i.e., g(SUCCESSOR(n)) >= g(n).
 Finds the cheapest path without exploring the whole search tree.
The algorithm is shown in Figure.

Properties of Uniform-cost Search


Expand least-cost unexpanded node
Implementation: fringe = queue ordered by path cost, lowest first
Equivalent to breadth-first if step costs all equal
Complete: Yes, if every step cost >= ε for some ε > 0
Time: number of nodes with g <= cost of the optimal solution, O(b^⌈C*/ε⌉),
where C* is the cost of the optimal solution
Space: number of nodes with g <= cost of the optimal solution, O(b^⌈C*/ε⌉)
Optimal: Yes, since nodes are expanded in increasing order of g(n)
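A minimal uniform-cost search sketch (illustrative; successors(state) is assumed to yield (action, next_state, step_cost) triples), with the frontier kept as a priority queue ordered by g:

import heapq, itertools

def uniform_cost_search(start, goal_test, successors):
    """Uniform-cost search: expand the unexpanded node with the lowest path cost g(n)."""
    counter = itertools.count()                # tie-breaker so the heap never compares states
    frontier = [(0, next(counter), start, [])]
    best_g = {start: 0}
    while frontier:
        g, _, state, path = heapq.heappop(frontier)
        if g > best_g.get(state, float("inf")):
            continue                           # stale queue entry: a cheaper path was found later
        if goal_test(state):
            return g, path                     # nodes are expanded in increasing order of g
        for action, child, cost in successors(state):
            new_g = g + cost
            if new_g < best_g.get(child, float("inf")):
                best_g[child] = new_g
                heapq.heappush(frontier, (new_g, next(counter), child, path + [action]))
    return None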



Depth First Search
Depth-first search always expands the deepest node in the current frontier of the search tree.
The progress of the search is illustrated in Figure. The search proceeds immediately to the deepest
level of the search tree, where the nodes have no successors. As those nodes are expanded, they are
dropped from the frontier, so then the search “backs up” to the next deepest node that still has
unexplored successors.
The depth-first search algorithm is an instance of the graph-search algorithm, whereas
breadth-first-search uses a FIFO queue, depth-first search uses a LIFO queue. A LIFO queue means
that the most recently generated node is chosen for expansion. This must be the deepest unexpanded
node because it is one deeper than its parent—which, in turn, was the deepest unexpanded node
when it was selected.
 Expands one of nodes at the deepest level till it reaches a dead end. Search goes back and
continues
 Less memory requirement.
Example: with branching factor b and maximum depth m, DFS needs storage for only about bm
nodes, whereas BFS keeps on the order of b^d nodes in memory.
 Time complexity O(b^m)
 DFS can be faster than BFS when solutions are plentiful
 Avoid DFS for search trees with large or infinite maximum depths

Properties of Depth First-Search


Complete : No; fails in infinite-depth spaces and in spaces with loops.
Modify to avoid repeated states along the path; then complete in finite spaces.
Time: O(b^m); terrible if m is much larger than d, but if solutions are dense it
may be much faster than breadth-first.
Space: O(bm), i.e., linear space
Optimal: No



Depth-Limited Search
 Avoids the pitfalls of DFS by using a cutoff on the max depth of a path.
Example : 20 cities longest path length = 19 . “Generate a new state with a path length that
is one greater, if the already travelled path is less than 19”
 This search is complete but not optimal.
 For a small depth limit, the search is incomplete
 O(b^l) time and O(bl) space, where l is the depth limit
 Disadvantage : picking a good limit.

Depth-limited search can be implemented as a simple modification to the general tree or graph-
search algorithm. Alternatively, it can be implemented as a simple recursive algorithm as shown in
Figure. Notice that depth-limited search can terminate with two kinds of failure: the standard failure
value indicates no solution; the cutoff value indicates no solution within the depth limit.
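A minimal recursive depth-limited search sketch (illustrative; successors(state) is assumed to yield (action, next_state) pairs), distinguishing the two kinds of failure described above:

def depth_limited_search(state, goal_test, successors, limit):
    """Recursive depth-limited DFS. Returns a list of actions, None (failure),
    or the string 'cutoff' (no solution within the depth limit)."""
    if goal_test(state):
        return []
    if limit == 0:
        return "cutoff"
    cutoff_occurred = False
    for action, child in successors(state):
        result = depth_limited_search(child, goal_test, successors, limit - 1)
        if result == "cutoff":
            cutoff_occurred = True
        elif result is not None:
            return [action] + result
    return "cutoff" if cutoff_occurred else None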

Iterative-Deepening Search
Iterative deepening search (or iterative deepening depth-first search) is a general strategy, often used
in combination with depth-first tree search, that finds the best depth limit. It does this by gradually
increasing the limit—first 0, then 1, then 2, and so on until a goal is found. This will occur when the
depth limit reaches d, the depth of the shallowest goal node. The algorithm is shown in Figure.

Iterative deepening combines the benefits of depth-first and breadth-first search. Like depth-
first search, its memory requirements are modest: O(bd) to be precise. Like breadth-first search, it is
complete when the branching factor is finite and optimal when the path cost is a nondecreasing
function of the depth of the node. The expansion of states is like BFS except some states are
expanded multiple times. IDS is preferred for large search space and when the depth of the solution
is unknown.



Figure: iterative deepening search on a binary tree for depth limits L = 1, 2, and 3.

Properties of Iterative Deepening Search:


Complete: Yes
Time: (d + 1)b^0 + d·b^1 + (d - 1)b^2 + ... + b^d = O(b^d)
Space: O(bd)
Optimal: Yes, if step cost = 1
Can be modified to explore a uniform-cost tree
Numerical comparison for b = 10 and d = 5, solution at the far-right leaf:
N(IDS) = 50 + 400 + 3,000 + 20,000 + 100,000 = 123,450
N(BFS) = 10 + 100 + 1,000 + 10,000 + 100,000 + 999,990 = 1,111,100
IDS does better because the other nodes at depth d are not expanded
BFS can be modified to apply the goal test when a node is generated
Disadvantage
 Iterative deepening looks inefficient because so many states are expanded multiple times. In
practice this is not that bad, because by far most of the nodes are at the bottom level.
 For a branching factor b of 2, this might double the search time.
 For a branching factor b of 10, this might add 10% to the search time.
Advantage
 Avoids the problem of choosing cutoffs without sacrificing efficiency.
 DFID (iterative deepening) is in general the preferred uninformed search method when the state space is large and the depth of the solution is unknown.
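Iterative deepening can be sketched by wrapping the depth-limited search sketch shown earlier (illustrative; max_depth is only a safety bound, not part of the algorithm):

def iterative_deepening_search(state, goal_test, successors, max_depth=50):
    """Run depth-limited search with limits 0, 1, 2, ... until a solution is
    found or the search fails without hitting the cutoff."""
    for limit in range(max_depth + 1):
        result = depth_limited_search(state, goal_test, successors, limit)
        if result != "cutoff":
            return result                      # either a solution or a definite failure
    return "cutoff"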



Bidirectional Search
The idea behind bidirectional search is to run two simultaneous searches—one forward from the
initial state and the other backward from the goal—hoping that the two searches meet in the middle.
Bidirectional search is implemented by replacing the goal test with a check to see whether the
frontiers of the two searches intersect; if they do, a solution has been found.

For example, if a problem has solution depth d=6, and each direction runs breadth-first
search one node at a time, then in the worst case the two searches meet when they have generated
all of the nodes at depth 3. For b=10, this means a total of 2,220 node generations, compared with
1,111,110 for a standard breadth-first search. Thus, the time complexity of bidirectional search
using breadth-first searches in both directions is O(b^(d/2)). The space complexity is also O(b^(d/2)).

Comparing Uninformed Search Strategies


The following figure compares search strategies in terms of the four evaluation criteria. This
comparison is for tree-search versions. For graph searches, the main differences are that depth-first
search is complete for finite state spaces and that the space and time complexities are bounded by
the size of the state space.

INFORMED (HEURISTIC) SEARCH STRATEGIES


Best-First Search
Best first search combines the advantages of Breadth-First and Depth-First searches.
– DFS: follows a single path, don’t need to generate all competing paths.
– BFS: doesn’t get caught in loops or dead-end-paths.
• Best First Search: explore the most promising path seen so far. Nodes are ordered by an
evaluation function, and the node with the best evaluation value is expanded first.
• Two types of evaluation function give two algorithms:
– Greedy Best First Search and A* search


(i) Greedy Best First Search
Greedy best first search minimizes the estimated cost to reach the goal. It always expands the
node that appears to be closest to the goal.
Evaluation function: f(n) = h(n)
• h(n) = estimated cost of the cheapest path from the state at node n to a goal state
Algorithm
1. Start with OPEN containing just the initial state.
2. Until a goal is found or there are no nodes left on OPEN do
(a) Pick the best node on OPEN
(b) Generate its successors
(c) For each successor do
(i) If it has not been generated before, evaluate it, add it to OPEN, and
record its parent.
(ii) If it has been generated before, change the parent if this new path is better
than the previous one. In that case, update the cost of getting to this node and
to any successors that this node may already have.
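A minimal greedy best-first search sketch (illustrative; it omits the re-parenting step of the algorithm above and assumes successors(state) yields (action, next_state) pairs and h(state) returns the heuristic estimate):

import heapq, itertools

def greedy_best_first_search(start, goal_test, successors, h):
    """Greedy best-first search: always expand the node with the lowest h(n)."""
    counter = itertools.count()
    open_list = [(h(start), next(counter), start, [])]
    seen = {start}
    while open_list:
        _, _, state, path = heapq.heappop(open_list)   # pick the best node on OPEN
        if goal_test(state):
            return path
        for action, child in successors(state):        # generate its successors
            if child not in seen:
                seen.add(child)
                heapq.heappush(open_list,
                               (h(child), next(counter), child, path + [action]))
    return None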

Let us see how this works for route-finding problems in Romania; we use the straight line distance
heuristic, which we will call hSLD. If the goal is Bucharest, we need to know the straight-line
distances to Bucharest, which are shown in Figure. For example, hSLD(In(Arad))=366.

The following Figure shows the progress of a greedy best-first search using hSLD to find a
path from Arad to Bucharest. The first node to be expanded from Arad will be Sibiu because it is
closer to Bucharest than either Zerind or Timisoara. The next node to be expanded will be Fagaras
because it is closest. Fagaras in turn generates Bucharest, which is the goal. For this particular
problem, greedy best-first search using hSLD finds a solution without ever expanding a node that is
not on the solution path; hence, its search cost is minimal. It is not optimal, however: the path via
Sibiu and Fagaras to Bucharest is 32 kilometers longer than the path through Rimnicu Vilcea and
Pitesti. This shows why the algorithm is called “greedy”—at each step it tries to get as close to the
goal as it can.

Performance Analysis
• Time and space complexity – O(b^m)
• Optimality – no
• Completeness - no

(ii) A* SEARCH ALGORITHM (Minimizing the total estimated solution cost)


The most widely known form of best-first search is called A*search. It evaluates nodes by
combining g(n), the cost to reach the node, and h(n), the cost to get from the node to the goal
f(n) = h(n)+g(n)
• f(n) = cost of the cheapest solution through n
• g(n) = actual path cost from the start node to node n
Since g(n) gives the path cost from the start node to node n, and h(n) is the estimated cost of
the cheapest path from n to the goal, we have f(n) = estimated cost of the cheapest solution
through n .



Thus, if we are trying to find the cheapest solution, a reasonable thing to try first is the node
with the lowest value of g(n) + h(n). It turns out that this strategy is more than just reasonable:
provided that the heuristic function h(n) satisfies certain conditions, A* search is both
complete and optimal. The algorithm is identical to UNIFORM-COST-SEARCH except that
A* uses g + h instead of g.
A* has the following properties: the tree-search version of A* is optimal if h(n) is admissible,
while the graph-search version is optimal if h(n) is consistent.



Algorithm
1. Create a priority queue of search nodes (initially containing just the start state). Priority is
determined by the function f.
2. While queue not empty and goal not found:
(a) Get best state x from the queue.
(b) If x is not goal state:
(i) generate all possible children of x (and save path information with each
node).
(ii) Apply f to each new node and add to queue.
(iii) Remove duplicates from queue (using f to pick the best).

Example: on the Romania map, A* finds the route Arad – Sibiu – Rimnicu Vilcea – Pitesti – Bucharest.
Performance Analysis
• Time complexity – depends on the heuristic function; the better the admissible heuristic, the fewer nodes expanded
• Space complexity – O(b^m)
• Optimality – yes (locally finite graphs)
• Completeness – yes (locally finite graphs)
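A minimal A* sketch (illustrative; successors(state) is assumed to yield (action, next_state, step_cost) triples and h to be an admissible heuristic). It is the uniform-cost sketch with g + h as the priority:

import heapq, itertools

def astar_search(start, goal_test, successors, h):
    """A* search: expand the node with the lowest f(n) = g(n) + h(n)."""
    counter = itertools.count()
    frontier = [(h(start), 0, next(counter), start, [])]   # (f, g, tie, state, path)
    best_g = {start: 0}
    while frontier:
        f, g, _, state, path = heapq.heappop(frontier)
        if g > best_g.get(state, float("inf")):
            continue                           # stale entry: a cheaper path to state exists
        if goal_test(state):
            return g, path
        for action, child, cost in successors(state):
            new_g = g + cost
            if new_g < best_g.get(child, float("inf")):    # keep only the cheapest known path
                best_g[child] = new_g
                heapq.heappush(frontier, (new_g + h(child), new_g, next(counter),
                                          child, path + [action]))
    return None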

HEURISTIC FUNCTIONS

The 8-puzzle was one of the earliest heuristic search problems. The object of the puzzle is to slide
the tiles horizontally or vertically into the empty space until the configuration matches the goal
configuration. The average solution cost for a randomly generated 8-puzzle instance is about 22
steps. The branching factor is about 3.

There are two commonly used candidates:


• h1 = the number of misplaced tiles. For Figure 3.28, all of the eight tiles are out of position, so the
start state would have h1 = 8. h1 is an admissible heuristic because it is clear that any tile that is out
of place must be moved at least once.


• h2 = the sum of the distances of the tiles from their goal positions. Because tiles cannot move
along diagonals, the distance we will count is the sum of the horizontal and vertical distances. This
is sometimes called the city block distance or Manhattan distance. h2 is also admissible because
all any move can do is move one tile one step closer to the goal. Tiles 1 to 8 in the start state give a
Manhattan distance of h2 = 3 + 1 + 2 + 2 + 2 + 3 + 3 + 2 = 18. As expected, neither of these
overestimates the true solution cost, which is 26.
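Both heuristics are easy to compute. The sketch below assumes the standard start and goal configurations of Figure 3.28 (states are length-9 tuples read row by row, 0 marking the blank); these exact layouts are an assumption, chosen because they reproduce the h1 = 8 and h2 = 18 values quoted above:

def misplaced_tiles(state, goal):
    """h1: number of tiles not in their goal position (the blank, 0, is ignored)."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def manhattan_distance(state, goal):
    """h2: sum of horizontal + vertical distances of each tile from its goal square."""
    total = 0
    for tile in range(1, 9):
        i, j = state.index(tile), goal.index(tile)
        total += abs(i // 3 - j // 3) + abs(i % 3 - j % 3)
    return total

goal  = (0, 1, 2, 3, 4, 5, 6, 7, 8)
start = (7, 2, 4, 5, 0, 6, 8, 3, 1)            # assumed start configuration (Figure 3.28)
print(misplaced_tiles(start, goal), manhattan_distance(start, goal))   # 8 18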

LOCAL SEARCH ALGORITHMS AND OPTIMIZATION PROBLEM


Local search algorithms operate using a single current node (rather than multiple paths) and
generally move only to neighbors of that node. Typically, the paths followed by the search are not
retained. Although local search algorithms are not systematic, they have two key advantages:
(1) they use very little memory, usually a constant amount;
(2) they can often find reasonable solutions in large or infinite (continuous) state spaces for which
systematic algorithms are unsuitable.
In addition to finding goals, local search algorithms are useful for solving pure optimization
problems, in which the aim is to find the best state according to an objective function.

To understand local search, we find it useful to consider the state-space landscape (as in Figure). A
landscape has both “location” (defined by the state) and “elevation” (defined by the value of the
heuristic cost function or objective function). If elevation corresponds to cost, then the aim is to find
the lowest valley—a global minimum; if elevation corresponds to an objective function, then the
aim is to find the highest peak—a global maximum. Local search algorithms explore this
landscape. A complete local search algorithm always finds a goal if one exists; an optimal
algorithm always finds a global minimum/maximum.

Hill Climbing
The hill-climbing search algorithm (steepest-ascent version) is shown in Figure. It is
simply a loop that continually moves in the direction of increasing value ie, uphill. It terminates
when it reaches a “peak” where no neighbor has a higher value. The algorithm does not maintain a
search tree, so the data structure for the current node need only record the state and the value of the
objective function. Hill climbing does not look ahead beyond the immediate neighbors of the current state.

To illustrate hill climbing, we will use the 8-queens problem. Local search algorithms typically use
a complete-state formulation, where each state has 8 queens on the board, one per column. The
successors of a state are all possible states generated by moving a single queen to another square in
the same column.
The heuristic cost function h is the number of pairs of queens that are attacking each other,
either directly or indirectly. The global minimum of this function is zero, which occurs only at
perfect solutions. The following figure shows a state with h=17. The figure also shows the values
of all its successors, with the best successors having h=12. Hill-climbing algorithms typically
choose randomly among the set of best successors if there is more than one. Hill climbing is
sometimes called greedy local search because it grabs a good neighbor state without thinking
ahead about where to go next.

For example, from the state in Figure (a), it takes just five steps to reach the state in Figure (b),
which has h=1 and is very nearly a solution. Unfortunately, hill climbing often gets stuck for the
following reasons:
Local maxima: a local maximum is a peak that is higher than each of its neighboring states but
lower than the global maximum.
Ridges: a ridge is shown in Figure. Ridges result in a sequence of local maxima that is very difficult
for greedy algorithms to navigate.
Plateaux: a plateau is a flat area of the state-space landscape. It can be a flat local maximum, from
which no uphill exit exists, or a shoulder, from which progress is possible.

Many variants of hill climbing have been invented. Stochastic hill climbing chooses at
random from among the uphill moves; the probability of selection can vary with the steepness of the
uphill move. First-choice hill climbing implements stochastic hill climbing by generating
successors randomly until one is generated that is better than the current state. Random-restart hill
climbing adopts the well-known adage, “If at first you don’t succeed, try, try again.”
Properties of Hill Climbing
• Estimates how far away the goal is.
• Is neither optimal nor complete.
• Can be very fast.
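A minimal steepest-ascent hill-climbing sketch (illustrative; successors(state) returns the neighbouring states and value(state) is the objective function; unlike the description above it does not break ties randomly):

def hill_climbing(initial, successors, value):
    """Steepest-ascent hill climbing: move to the best neighbour until no neighbour
    improves on the current state (a peak, which may be only a local maximum)."""
    current = initial
    while True:
        neighbours = successors(current)
        if not neighbours:
            return current
        best = max(neighbours, key=value)
        if value(best) <= value(current):      # no uphill move left: stop at the peak
            return current
        current = best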

Simulated Annealing
A hill-climbing algorithm that never makes “downhill” moves toward states with lower
value (or higher cost) is guaranteed to be incomplete, because it can get stuck on a local maximum.
In contrast, a purely random walk—that is, moving to a successor chosen uniformly at random from
the set of successors—is complete but extremely inefficient. Therefore, it seems reasonable to try to
combine hill climbing with a random walk in some way that yields both efficiency and
completeness. Simulated annealing is such an algorithm. To explain simulated annealing, we
switch our point of view from hill climbing to gradient descent (i.e., minimizing cost) and imagine
the task of getting a ping-pong ball into the deepest crevice in a bumpy surface.
The innermost loop of the simulated-annealing algorithm is quite similar to hill climbing.
Instead of picking the best move, however, it picks a random move. If the move improves the
situation, it is always accepted. Otherwise, the algorithm accepts the move with some probability
less than 1. The probability decreases exponentially with the “badness” of the move—the amount
ΔE by which the evaluation is worsened. The probability also decreases as the “temperature” T goes
down: “bad” moves are more likely to be allowed at the start when T is high, and they become more
unlikely as T decreases. If the schedule lowers T slowly enough, the algorithm will find a global
optimum with probability approaching 1.
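A minimal simulated-annealing sketch (illustrative; neighbour(state) returns a random successor, value is the objective to maximize, and the exponential cooling schedule is just one possible choice):

import math, random

def simulated_annealing(initial, neighbour, value, schedule, steps=10_000):
    """Accept every uphill move; accept a downhill move with probability exp(dE/T)."""
    current = initial
    for t in range(1, steps):
        T = schedule(t)
        if T <= 0:
            return current
        candidate = neighbour(current)
        delta_e = value(candidate) - value(current)
        if delta_e > 0 or random.random() < math.exp(delta_e / T):
            current = candidate                # bad moves become less likely as T decreases
    return current

# Example schedule: exponential cooling (an illustrative choice).
schedule = lambda t: 1.0 * (0.95 ** t)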

Local Beam Search


The local beam search algorithm keeps track of k states rather than just one. It begins with
k randomly generated states. At each step, all the successors of all k states are generated. If any one
is a goal, the algorithm halts. Otherwise, it selects the k best successors from the complete list and
repeats.
At first sight, a local beam search with k states might seem to be nothing more than running
k random restarts in parallel instead of in sequence. In fact, the two algorithms are quite different. In

a random-restart search, each search process runs independently of the others. In a local beam
search, useful information is passed among the parallel search threads. In effect, the states that
generate the best successors say to the others, “Come over here, the grass is greener!” The algorithm
quickly abandons unfruitful searches and moves its resources to where the most progress is being
made.
In its simplest form, local beam search can suffer from a lack of diversity among the k
states—they can quickly become concentrated in a small region of the state space, making the
search little more than an expensive version of hill climbing.

Genetic Algorithms
A genetic algorithm (or GA) is a variant of stochastic beam search in which successor states are
generated by combining two parent states rather than by modifying a single state.

Like beam searches, GAs begin with a set of k randomly generated states, called the
population. Each state, or individual, is represented as a string over a finite alphabet—most
commonly, a string of 0s and 1s. For example, an 8-queens state must specify the positions of 8
queens, each in a column of 8 squares, and so requires 8 × log2 8 = 24 bits. Alternatively, the state
could be represented as 8 digits, each in the range from 1 to 8.
Figure (a) shows a population of four 8-digit strings representing 8-queens states. The
production of the next generation of states is shown in Figure (b)–(e). In (b), each state is rated by
the objective function, or (in GA terminology) the fitness function. A fitness function should return
higher values for better states, so, for the 8-queens problem we use the number of nonattacking
pairs of queens, which has a value of 28 for a solution. The values of the four states are 24, 23, 20,
and 11.
In (c), two pairs are selected at random for reproduction, in accordance with the probabilities
in (b). Notice that one individual is selected twice and one not at all. For each pair to be mated, a
crossover point is chosen randomly from the positions in the string. In Figure, the crossover points
are after the third digit in the first pair and after the fifth digit in the second pair. In (d), the offspring
themselves are created by crossing over the parent strings at the crossover point.
The 8-queens states involved in this reproduction step are shown in Figure. The example
shows that when two parent states are quite different, the crossover operation can produce a state
that is a long way from either parent state. It is often the case that the population is quite diverse
early on in the process, so crossover (like simulated annealing) frequently takes large steps in the
state space early in the search process and smaller steps later on when most individuals are quite
similar.
Finally, in (e), each location is subject to random mutation with a small independent
probability. One digit was mutated in the first, third, and fourth offspring. In the 8-queens problem,

this corresponds to choosing a queen at random and moving it to a random square in its column. An
algorithm that implements all these steps is given in the figure (not reproduced here).
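In its place, here is a minimal, illustrative sketch of such a genetic algorithm, with individuals represented as strings (for 8-queens, e.g. "24748552") and fitness-proportional selection:

import random

def reproduce(x, y):
    """Crossover: split both parents at a random point and splice the halves."""
    c = random.randint(1, len(x) - 1)
    return x[:c] + y[c:]

def mutate(individual, alphabet):
    """Mutation: replace one randomly chosen position with a random symbol."""
    i = random.randrange(len(individual))
    return individual[:i] + random.choice(alphabet) + individual[i + 1:]

def genetic_algorithm(population, fitness, alphabet, generations=1000, p_mutate=0.1):
    """One simple GA loop: fitness-proportional selection, crossover, mutation."""
    for _ in range(generations):
        weights = [fitness(ind) for ind in population]
        new_population = []
        for _ in range(len(population)):
            x, y = random.choices(population, weights=weights, k=2)   # select two parents
            child = reproduce(x, y)
            if random.random() < p_mutate:
                child = mutate(child, alphabet)
            new_population.append(child)
        population = new_population
    return max(population, key=fitness)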

SEARCHING WITH PARTIAL OBSERVATIONS


The key concept required for solving partially observable problems is the belief state, representing
the agent’s current belief about the possible physical states it might be in, given the sequence of
actions and percepts up to that point.

Searching with no Observation


When the agent’s percepts provide no information at all, we have what is called a sensorless
problem or sometimes a conformant problem. At first, one might think the sensorless agent has
no hope of solving a problem if it has no idea what state it’s in; in fact, sensorless problems are
quite often solvable. Moreover, sensorless agents can be surprisingly useful, primarily because they
don’t rely on sensors working properly.
Assume that the agent knows the geography of its world, but doesn’t know its location or the
distribution of dirt. In that case, its initial state could be any element of the set {1, 2, 3, 4, 5, 6, 7, 8}.
Now, consider what happens if it tries the action Right. This will cause it to be in one of the states

{2, 4, 6, 8}—the agent now has more information! Furthermore, the action sequence [Right,Suck]
will always end up in one of the states {4, 8}. Finally, the sequence [Right,Suck,Left,Suck] is
guaranteed to reach the goal state no matter what the start state is.
It is instructive to see how the belief-state search problem is constructed. Suppose the underlying
physical problem P is defined by ACTIONSP, RESULTP, GOAL-TESTP, and STEP-COSTP . Then
we can define the corresponding sensorless problem as follows:
• Belief states: The entire belief-state space contains every possible set of physical states.
If P has N states, then the sensorless problem has up to 2^N belief states, although many may
be unreachable from the initial state.
• Initial state: Typically the set of all states in P, although in some cases the agent will
have more knowledge than this.
• Actions: This is slightly tricky. Suppose the agent is in belief state b={s1, s2}, but
ACTIONSP (s1) !=ACTIONSP (s2); then the agent is unsure of which actions are legal.
If we assume that illegal actions have no effect on the environment, then it is safe to take the
union of all the actions in any of the physical states in the current belief state b:
ACTIONS(b) = ∪_{s ∈ b} ACTIONS_P(s).
On the other hand, if an illegal action might be the end of the world, it is safer to allow only the
intersection, that is, the set of actions legal in all the states. For the vacuum world, every state
has the same legal actions, so both methods give the same result.
• Transition model: The agent doesn’t know which state in the belief state is the right one; so
as far as it knows, it might get to any of the states resulting from applying the action to one
of the physical states in the belief state. For deterministic actions, the set of states that might
be reached is b′ = RESULT(b, a) = {s′ : s′ = RESULT_P(s, a) and s ∈ b},
which may be larger than b, as shown in the following Figure.

The process of generating the new belief state after the action is called the prediction step;
the notation b̂ = PREDICT_P(b, a) will come in handy.
• Goal test: The agent wants a plan that is sure to work, which means that a belief state
satisfies the goal only if all the physical states in it satisfy GOAL-TESTP . The agent may
accidentally achieve the goal earlier, but it won’t know that it has done so.
• Path cost: This is also tricky. If the same action can have different costs in different states,
then the cost of taking an action in a given belief state could be one of several values.



The following Figure shows the reachable belief-state space for the deterministic, sensorless
vacuum world.
There are only 12 reachable belief states out of 2^8 = 256 possible belief states. The action sequence
[Suck,Left,Suck] starting at the initial state reaches the same belief state as [Right,Left,Suck],
namely, {5, 7}. Now, consider the belief state reached by [Left], namely, {1, 3, 5, 7}. Obviously,
this is not identical to {5, 7}, but it is a superset. It is easy to prove that if an action sequence is a
solution for a belief state b, it is also a solution for any subset of b. Hence, we can discard a path
reaching {1, 3, 5, 7} if {5, 7} has already been generated. Conversely, if {1, 3, 5, 7} has already
been generated and found to be solvable, then any subset, such as {5, 7}, is guaranteed to be
solvable. This extra level of pruning may dramatically improve the efficiency of sensorless problem
solving.

Searching with Observations


For a general partially observable problem, we have to specify how the environment
generates percepts for the agent. For example, we might define the local-sensing vacuum world to
be one in which the agent has a position sensor and a local dirt sensor but has no sensor capable of
detecting dirt in other squares. The formal problem specification includes a PERCEPT(s) function
that returns the percept received in a given state.
For example, in the local-sensing vacuum world, the PERCEPT in state 1 is [A, Dirty]. Fully
observable problems are a special case in which PERCEPT(s)=s for every state s, while sensorless
problems are a special case in which PERCEPT(s)=null . When observations are partial, it will
usually be the case that several states could have produced any given percept. For example, the
percept [A, Dirty] is produced by state 3 as well as by state 1. Hence, given this as the initial
percept, the initial belief state for the local-sensing vacuum world will be {1, 3}. The ACTIONS,
STEP-COST, and GOAL-TEST are constructed from the underlying physical problem just as for
sensorless problems, but the transition model is a bit more complicated. We can think of transitions
from one belief state to the next for a particular action as occurring in three stages, as shown in
following Figure.
• The prediction stage is the same as for sensorless problems: given the action a in belief state
b, the predicted belief state is b̂ = PREDICT(b, a).
• The observation prediction stage determines the set of percepts o that could be observed
in the predicted belief state: POSSIBLE-PERCEPTS(b̂) = {o : o = PERCEPT(s) and s ∈ b̂}.
• The update stage determines, for each possible percept, the belief state that would result
from the percept. The new belief state bo is just the set of states in b̂ that could have
produced the percept: bo = UPDATE(b̂, o) = {s : o = PERCEPT(s) and s ∈ b̂}.
Putting these three stages together, we obtain the possible belief states resulting from a given
action and the subsequent possible percepts:
RESULTS(b, a) = {bo : bo = UPDATE(PREDICT(b, a), o) and o ∈ POSSIBLE-PERCEPTS(PREDICT(b, a))}.

Solving partially Observable Problems

Figure 4.16 shows part of the search tree for the local-sensing vacuum world, assuming an initial
percept [A, Dirty]. The solution is the conditional plan
[Suck, Right, if Bstate ={6} then Suck else [ ]]
Because we supplied a belief-state problem to the AND–OR search algorithm, it returned a
conditional plan that tests the belief state rather than the actual state. This is as it should be: in a
partially observable environment the agent won’t be able to execute a solution that requires testing
the actual state.
As in the case of standard search algorithms applied to sensorless problems, the AND– OR
search algorithm treats belief states as black boxes, just like any other states. One can improve on
this by checking for previously generated belief states that are subsets or supersets of the current
state, just as for sensorless problems. One can also derive incremental search algorithms, analogous
to those described for sensorless problems, that provide substantial speedups over the black-box
approach.

An Agent for partially Observable Environments


The design of a problem-solving agent for partially observable environments is quite similar to the
simple problem-solving agent : the agent formulates a problem, calls a search algorithm (such as
AND-OR-GRAPH-SEARCH) to solve it, and executes the solution. There are two main
differences. First, the solution to a problem will be a conditional plan rather than a sequence; if
the first step is an if–then–else expression, the agent will need to test the condition in the if-part and
execute the then-part or the else-part accordingly. Second, the agent will need to maintain its belief
state as it performs actions and receives percepts.
Given an initial belief state b, an action a, and a percept o, the new belief state is:
b′ = UPDATE(PREDICT(b, a), o).
This Equation is called a recursive state estimator because it computes the new belief state from the
previous one rather than by examining the entire percept sequence.
Figure 4.17 shows the belief state being maintained in the kindergarten vacuum world with
local sensing, wherein any square may become dirty at any time unless the agent is actively
cleaning it at that moment

In partially observable environments—which include the vast majority of real-world
environments—maintaining one’s belief state is a core function of any intelligent system. This
function goes under various names, including monitoring, filtering and state estimation.

Here we will show an example in a discrete environment with deterministic sensors and
nondeterministic actions. The example concerns a robot with the task of localization: working out
where it is, given a map of the world and a sequence of percepts and actions. Our robot is placed in
the maze-like environment of Figure 4.18. The robot is equipped with four sonar sensors that tell
whether there is an obstacle—the outer wall or a black square in the figure—in each of the four
compass directions. We assume that the sensors give perfectly correct data, and that the robot has a
correct map of the environment. But unfortunately the robot’s navigational system is broken, so
when it executes a Move action, it moves randomly to one of the adjacent
squares. The robot’s task is to determine its current location.

Suppose the robot has just been switched on, so it does not know where it is. Thus its initial belief
state b consists of the set of all locations. Then the robot receives the percept NSW, meaning there are
obstacles to the north, west, and south, and does an update using the equation bo = UPDATE(b, NSW),
yielding the 4 locations shown in Figure 4.18(a). You can inspect the maze to see that those are the
only four locations that yield the percept NSW.
Next the robot executes a Move action, but the result is nondeterministic. The new belief
state, ba =PREDICT(bo,Move), contains all the locations that are one step away from the locations
in bo. When the second percept, NS, arrives, the robot does UPDATE(ba,NS) and finds that the
belief state has collapsed down to the single location shown in Figure 4.18(b). That’s the only
location that could be the result of
UPDATE(PREDICT(UPDATE(b,NSW),Move),NS) .
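The recursive state estimator can be sketched in a few lines (illustrative; result(s, a) is assumed to return the set of possible next states for a nondeterministic action, and perceive(s) the percept received in state s):

def predict(belief, action, result):
    """Prediction step: all states reachable by the (nondeterministic) action."""
    return frozenset(s2 for s in belief for s2 in result(s, action))

def update(belief, percept, perceive):
    """Update step: keep only the states that could have produced the percept."""
    return frozenset(s for s in belief if perceive(s) == percept)

def state_estimate(belief, action, percept, result, perceive):
    """Recursive state estimator: b' = UPDATE(PREDICT(b, a), o)."""
    return update(predict(belief, action, result), percept, perceive)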

CONSTRAINT SATISFACTION PROBLEM


A problem is solved when each variable has a value that satisfies all the constraints on the variable.
A problem described this way is called a constraint satisfaction problem, or CSP.

CSP Search – Search procedure that operates in a space of constraint sets.


Initial State – contains the constraints that are originally given in the problem description
Goal State – any state that has been constrained “enough”, where “enough” must be defined separately for each
problem
Example – Crypt arithmetic, Graph Coloring

Defining Constraint Satisfaction Problems


A constraint satisfaction problem consists of three components, X,D, and C:
X is a set of variables, {X1, . . . ,Xn}.
D is a set of domains, {D1, . . . ,Dn}, one for each variable.
C is a set of constraints that specify allowable combinations of values.
Each domain Di consists of a set of allowable values, {v1, . . . , vk} for variable Xi. Each constraint
Ci consists of a pair {scope, rel } where scope is a tuple of variables that participate in the constraint
and rel is a relation that defines the values that those variables can take on. A relation can be
represented as an explicit list of all tuples of values that satisfy the constraint, or as an abstract
relation that supports two operations: testing if a tuple is a member of the relation and enumerating
the members of the relation. For example, if X1 and X2 both have the domain {A,B}, then the
constraint saying the two variables must have different values can be written as {(X1,X2), [(A,B),
(B,A)]} or as {(X1,X2), X1 ≠ X2}.
To solve a CSP, we need to define a state space and the notion of a solution. Each state in a CSP is
defined by an assignment of values to some or all of the variables, {Xi = vi,Xj = vj , . . .}. An
assignment that does not violate any constraints is called a consistent or legal assignment. A
complete assignment is one in which every variable is assigned, and a solution to a CSP is a
consistent, complete assignment. A partial assignment is one that assigns values to only some of
the variables.

Example Problem : Map Coloring


Consider the map of Australia showing each of its states and territories (Figure 6.1(a)). We are
given the task of coloring each region either red, green, or blue in such a way that no neighboring
regions have the same color. To formulate this as a CSP, we define the variables to be the regions X
= {WA,NT,Q,NSW, V,SA, T} .
The domain of each variable is the set Di = {red , green, blue}. The constraints require neighboring
regions to have distinct colors. Since there are nine places where regions border, there are nine
constraints:
C = {SA ≠ WA, SA ≠ NT, SA ≠ Q, SA ≠ NSW, SA ≠ V, WA ≠ NT, NT ≠ Q, Q ≠ NSW, NSW ≠ V}.

Here we are using abbreviations; SA ≠ WA is a shortcut for {(SA,WA), SA ≠ WA}, where
SA ≠ WA can be fully enumerated in turn as {(red, green), (red, blue), (green, red), (green,
blue), (blue, red), (blue, green)}.
There are many possible solutions to this problem, such as {WA=red ,NT =green,Q=red ,NSW
=green, V =red ,SA=blue, T =red }. It can be helpful to visualize a CSP as a constraint graph, as
shown in Figure 6.1(b). The nodes of the graph correspond to variables of the problem, and a link
connects any two variables that participate in a constraint.
CSPs yield a natural representation for a wide variety of problems. In addition, CSP solvers can be
faster than state-space searchers because the CSP solver can quickly eliminate large swaths of the
search space. For example, once we have chosen {SA=blue} in the Australia problem, we can
conclude that none of the five neighboring variables can take on the value blue. Without taking
advantage of constraint propagation, a search procedure would have to consider 3^5 = 243
assignments for the five neighboring variables; with constraint propagation we never have to
consider blue as a value, so we have only 2^5 = 32 assignments to look at, a reduction of 87%.
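The Australia CSP can be written down directly as data. The following minimal, illustrative Python encoding (names are not from the text) is reused by the constraint-propagation and backtracking sketches later in this unit:

# Variables, domains, and neighbours of the Australia map-colouring CSP.
variables = ["WA", "NT", "SA", "Q", "NSW", "V", "T"]
domains = {v: {"red", "green", "blue"} for v in variables}
neighbours = {
    "WA": {"NT", "SA"}, "NT": {"WA", "SA", "Q"}, "SA": {"WA", "NT", "Q", "NSW", "V"},
    "Q": {"NT", "SA", "NSW"}, "NSW": {"Q", "SA", "V"}, "V": {"SA", "NSW"}, "T": set(),
}

def constraint_ok(var1, val1, var2, val2):
    """Binary constraint: two neighbouring regions must not share a colour."""
    if var2 in neighbours[var1]:
        return val1 != val2
    return True                                # non-neighbours are unconstrained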

Example Problem – Crypt Arithmetic


The aim is to find a substitution of digits for letters such that the resulting sum is arithmetically
correct , each letter stand for a different digit
Given :
FORTY + TEN + TEN = SIXTY
29786 + 850 + 850 = 31486
F=2, O=9, R=7, T=8, Y=6, E=5, N=0, S=3, I=1, X=4
Constraint Satisfaction is a two-step process
 Constraints are discovered and propagated as far as possible throughout the system. Then if
there is still not a solution, search begins
 A guess about something is made and added as a new constraint. Propagation can then occur
with this new constraint and so forth

Constraint Propagation : Inference in CSP
In CSPs there is a choice: an algorithm can search (choose a new variable assignment from several
possibilities) or do a specific type of inference called constraint propagation: using the
constraints to reduce the number of legal values for a variable, which in turn can reduce the legal
values for another variable, and so on.

The key idea is local consistency. If we treat each variable as a node in a graph (see Figure 6.1(b))
and each binary constraint as an arc, then the process of enforcing local consistency in each part of
the graph causes inconsistent values to be eliminated throughout the graph. There are different
types of local consistency, which are as follows.

Node consistency
A single variable (corresponding to a node in the CSP network) is node-consistent if all the values
in the variable’s domain satisfy the variable’s unary constraints. For example, in the variant of the
Australia map-coloring problem (Figure 6.1) where South Australians dislike green, the variable SA
starts with domain {red , green, blue}, and we can make it node consistent by eliminating green,
leaving SA with the reduced domain {red , blue}. We say that a network is node-consistent if every
variable in the network is node-consistent. It is always possible to eliminate all the unary constraints
in a CSP by running node
consistency.

Arc consistency
A variable in a CSP is arc-consistent if every value in its domain satisfies the variable’s binary
constraints. More formally, Xi is arc-consistent with respect to another variable Xj if for every value
in the current domain Di there is some value in the domain Dj that satisfies the binary constraint on
the arc (Xi,Xj). A network is arc-consistent if every variable is arc consistent with every other
variable. For example, consider the constraint Y = X^2 where the domain of both X and Y is the set
of digits. We can write this constraint explicitly as {(X, Y), {(0, 0), (1, 1), (2, 4), (3, 9)}}.
To make X arc-consistent with respect to Y , we reduce X’s domain to {0, 1, 2, 3}. If we also make
Y arc-consistent with respect to X, then Y ’s domain becomes {0, 1, 4, 9} and the whole CSP is arc-
consistent.
On the other hand, arc consistency can do nothing for the Australia map-coloring problem. Consider
the following inequality constraint on (SA,WA):
{(red , green), (red , blue), (green, red ), (green, blue), (blue, red ), (blue, green)} .

No matter what value you choose for SA (or for WA), there is a valid value for the other variable.
So applying arc consistency has no effect on the domains of either variable.
The most popular algorithm for arc consistency is called AC-3 (see Figure 6.3). To make every
variable arc-consistent, the AC-3 algorithm maintains a queue of arcs to consider. Initially, the
queue contains all the arcs in the CSP. AC-3 then pops off an arbitrary arc (Xi,Xj) from the queue
and makes Xi arc-consistent with respect to Xj . If this leaves Di unchanged, the algorithm just
moves on to the next arc. But if this revises Di (makes the domain smaller), then we add to the
queue all arcs (Xk,Xi) where Xk is a neighbor of Xi. We need to do that because the change in Di
might enable further reductions in the domains of Dk, even if we have previously considered Xk. If
Di is revised down to nothing, then we know the whole CSP has no consistent solution, and AC-3
can immediately return failure. Otherwise, we keep checking, trying to remove values from the
domains of variables until no more arcs are in the queue. At that point, we are left with a CSP that is
equivalent to the original CSP—they both have the same solutions—but the arc-consistent CSP will
in most cases be faster to search because its variables have smaller domains.
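A minimal AC-3 sketch (illustrative), written against the Australia-style encoding given earlier (domains maps each variable to a set of values, neighbours to its neighbouring variables, and constraint_ok tests one binary constraint):

from collections import deque

def revise(domains, xi, xj, constraint_ok):
    """Make Xi arc-consistent with Xj: delete values of Xi with no support in Dj."""
    removed = {x for x in domains[xi]
               if not any(constraint_ok(xi, x, xj, y) for y in domains[xj])}
    domains[xi] -= removed
    return bool(removed)

def ac3(variables, domains, neighbours, constraint_ok):
    """AC-3: pop arcs off a queue and revise until no domain changes."""
    queue = deque((xi, xj) for xi in variables for xj in neighbours[xi])
    while queue:
        xi, xj = queue.popleft()
        if revise(domains, xi, xj, constraint_ok):
            if not domains[xi]:
                return False                   # an empty domain: the CSP has no consistent solution
            for xk in neighbours[xi] - {xj}:
                queue.append((xk, xi))         # Di changed, so recheck the arcs into Xi
    return True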

Path Consistency
Path consistency tightens the binary constraints by using implicit constraints that are inferred by
looking at triples of variables.
A two-variable set {Xi,Xj} is path-consistent with respect to a third variable Xm if, for every
assignment {Xi = a,Xj = b} consistent with the constraints on {Xi,Xj}, there is an assignment to
Xm that satisfies the constraints on {Xi,Xm} and {Xm,Xj}. This is called path consistency because
one can think of it as looking at a path from Xi to Xj with Xm in the middle.

Let’s see how path consistency fares in coloring the Australia map with two colors. We will make
the set {WA,SA} path consistent with respect to NT. We start by enumerating the consistent
assignments to the set. In this case, there are only two: {WA = red ,SA = blue}and {WA = blue,SA
= red}. We can see that with both of these assignments NT can be neither red nor blue (because it
would conflict with either WA or SA). Because there is no valid choice for NT, we eliminate both

assignments, and we end up with no valid assignments for {WA,SA}. Therefore, we know that
there can be no solution to this problem.

Backtracking Search for CSP


The term backtracking search is used for a depth-first search that chooses values for one variable
at a time and backtracks when a variable has no legal values left to assign. The algorithm is shown
in Figure 6.5. It repeatedly chooses an unassigned variable, and then tries all values in the domain
of that variable in turn, trying to find a solution. If an inconsistency is detected, then BACKTRACK
returns failure, causing the previous call to try another value. Part of the search tree for the Australia
problem is shown in Figure 6.6, where we have assigned variables in the order WA,NT,Q, . . ..
Because the representation of CSPs is standardized, there is no need to supply BACKTRACKING-
SEARCH with a domain-specific initial state, action function, transition model, or goal test.

The backtracking algorithm contains the line var ←SELECT-UNASSIGNED-VARIABLE(csp) .
The simplest strategy for SELECT-UNASSIGNED-VARIABLE is to choose the next unassigned
variable in order, {X1,X2, . . .}. This static variable ordering seldom results in the most efficient
search. For example, after the assignments for WA=red and NT =green in Figure 6.6, there is only
one possible value for SA, so it makes sense to assign SA=blue next rather than assigning Q. In fact,
after SA is assigned, the choices for Q, NSW, and V are all forced. This intuitive idea—choosing
the variable with the fewest “legal” values—is called the minimum remaining-values (MRV)
heuristic. It also has been called the “most constrained variable” or “fail-first” heuristic, the latter
because it picks a variable that is most likely to cause a failure soon, thereby pruning the search tree.
The degree heuristic is a useful tie-breaker: it selects the variable that is involved in the largest
number of constraints on other unassigned variables.
In Figure 6.1, SA is the variable with highest degree, 5; the other variables have degree 2 or 3,
except for T, which has degree 0. In fact, once SA is chosen, applying the degree heuristic solves
the problem without any false steps—you can choose any consistent color at each choice point and
still arrive at a solution with no backtracking.
Once a variable has been selected, the algorithm must decide on the order in which to examine its
values. For this, the least-constraining-value heuristic can be effective in some cases. It prefers
the value that rules out the fewest choices for the neighboring variables in the constraint graph. For
example, suppose that in Figure 6.1 we have generated the partial assignment with WA=red and NT
=green and that our next choice is for Q. Blue would be a bad choice because it eliminates the last
legal value left for Q’s neighbor, SA. The least-constraining-value heuristic therefore prefers red to
blue. In general, the heuristic is trying to leave the maximum flexibility for subsequent variable
assignments. Of course, if we are trying to find all the solutions to a problem, not just the first one,
then the ordering does not matter because we have to consider every value anyway. The same holds
if there are no solutions to the problem.
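A minimal backtracking sketch with the MRV heuristic (illustrative; it reuses the encoding given earlier and counts a variable's remaining legal values against the current partial assignment, which is a simplification of full forward checking):

def backtracking_search(assignment, variables, domains, neighbours, constraint_ok):
    """Depth-first backtracking search using the minimum-remaining-values heuristic."""
    if len(assignment) == len(variables):
        return assignment                      # complete, consistent assignment: a solution

    def legal_values(v):
        """Values of v consistent with the already assigned neighbouring variables."""
        return [x for x in domains[v]
                if all(constraint_ok(v, x, n, assignment[n])
                       for n in neighbours[v] if n in assignment)]

    unassigned = [v for v in variables if v not in assignment]
    var = min(unassigned, key=lambda v: len(legal_values(v)))   # MRV: fewest legal values
    for value in legal_values(var):
        assignment[var] = value
        result = backtracking_search(assignment, variables, domains, neighbours, constraint_ok)
        if result is not None:
            return result
        del assignment[var]                    # backtrack: undo and try the next value
    return None

# Example (reuses the Australia encoding above):
# solution = backtracking_search({}, variables, domains, neighbours, constraint_ok)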

GAME PLAYING
Early work on AI focused on formal tasks such as game playing and theorem proving. In
game playing to select the next state, search technique is used. There were two reasons that
games appeared to be a good domain in which to explore machine intelligence.
(i) Games provide a structured task in which it is easy to measure success or failure
(ii) They did not require large amounts of knowledge. They apply a straightforward
search and provide a solution from the starting state to a winning state

In the generate-and-test procedure, the generator generates entire proposed solutions and the
tester then evaluates them. To improve the effectiveness of the generate-and-test procedure, two things can be
done.
 Improve the generate procedure so that only good moves are generated.
 Improve the test procedure, so that the best moves will be recognized and explored first.
Two types of generator
(i) Legal move generator – All the possible moves will be generated
(ii) Plausible move generator – Some smaller number of promising moves are generated.
A game can be formally defined as a kind of search problem with the following elements:
• S0: The initial state, which specifies how the game is set up at the start.
• PLAYER(s): Defines which player has the move in a state.
• ACTIONS(s): Returns the set of legal moves in a state.
• RESULT(s, a): The transition model, which defines the result of a move.
• TERMINAL-TEST(s): A terminal test, which is true when the game is over and false
otherwise. States where the game has ended are called terminal states.
• UTILITY(s, p): A utility function (also called an objective function or payoff function), defines
the final numeric value for a game that ends in terminal state s for a player p. In chess, the outcome
is a win, loss, or draw, with values +1, 0, or 1/2. Some games have a wider variety of possible
outcomes; the payoffs in backgammon range from 0 to +192.

Figure 5.1 shows part of the game tree for tic-tac-toe (noughts and crosses). From the initial state,
MAX has
nine possible moves. Play alternates between MAX’s placing an X and MIN’s placing an O until we
reach leaf nodes corresponding to terminal states such that one player has three in a row or all the
squares are filled. The number on each leaf node indicates the utility value of the terminal state from
the point of view of MAX; high values are assumed to be good for MAX and bad for MIN (which is
how the players get their names).

OPTIMAL DECISIONS IN GAMES


Consider the trivial game in Figure 5.2. The possible moves for MAX at the root node are labeled
a1, a2, and a3. The possible replies to a1 for MIN are b1, b2, b3, and so on. This particular game
ends after one move each by MAX and MIN. (In game parlance, we say that this tree is one move
deep, consisting of two half-moves, each of which is called a ply.) The utilities of the terminal
states in this game range from 2 to 14.
Given a game tree, the optimal strategy can be determined from the minimax value of each node,
which we write as MINIMAX(n). The minimax value of a node is the utility (for MAX) of being in
the corresponding state, assuming that both players play optimally from there to the end of the
game. Obviously, the minimax value of a terminal state is just its utility. Furthermore, given a
choice, MAX prefers to move to a state of maximum value, whereas MIN prefers a state of
minimum value. So we have the following:
MINIMAX(s) =
    UTILITY(s)                                          if TERMINAL-TEST(s)
    max over a in ACTIONS(s) of MINIMAX(RESULT(s, a))   if PLAYER(s) = MAX
    min over a in ACTIONS(s) of MINIMAX(RESULT(s, a))   if PLAYER(s) = MIN
Let us apply these definitions to the game tree in Figure 5.2. The terminal nodes on the bottom level
get their utility values from the game’s UTILITY function. The first MIN node, labeled B, has three
successor states with values 3, 12, and 8, so its minimax value is 3. Similarly, the other two MIN
nodes have minimax value 2. The root node is a MAX node; its successor states have minimax
values 3, 2, and 2; so it has a minimax value of 3. We can also identify the minimax decision at the
root: action a1 is the optimal choice for MAX because it leads to the state with the highest minimax
value.
The Minimax Algorithm
The minimax algorithm (Figure 5.3) computes the minimax decision from the current state. It uses
a simple recursive computation of the minimax values of each successor state, directly
implementing the defining equations. The recursion proceeds all the way down to the leaves of the
tree, and then the minimax values are backed up through the tree as the recursion unwinds. For
example, in Figure 5.2, the algorithm first recurses down to the three bottom left nodes and uses the
UTILITY function on them to discover that their values are 3, 12, and 8, respectively. Then it takes
the minimum of these values, 3, and returns it as the backed-up value of node B. A similar process
gives the backed-up values of 2 for C and 2 for D. Finally, we take the maximum of 3, 2, and 2 to
get the backed-up value of 3 for the root node.
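The following is a minimal sketch of this recursive computation in Python, written against the illustrative Game interface sketched earlier (actions, result, terminal_test, utility); it is an approximation, not the exact pseudocode of Figure 5.3.

```python
def minimax_decision(state, game):
    """Choose the action for MAX whose successor has the highest minimax value."""
    return max(game.actions(state),
               key=lambda a: min_value(game.result(state, a), game))

def max_value(state, game):
    # MAX to move: take the maximum over the successors' minimax values.
    if game.terminal_test(state):
        return game.utility(state, 'MAX')   # utility from MAX's point of view
    return max(min_value(game.result(state, a), game)
               for a in game.actions(state))

def min_value(state, game):
    # MIN to move: take the minimum over the successors' minimax values.
    if game.terminal_test(state):
        return game.utility(state, 'MAX')
    return min(max_value(game.result(state, a), game)
               for a in game.actions(state))
```

On the tree of Figure 5.2 this backs up 3 for B, 2 for C, 2 for D, and returns a1 at the root.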
Let us examine how to extend the minimax idea to multiplayer games. We need to replace the single
value for each node with a vector of values. For example, in a three-player game with players A, B,
and C, a vector ⟨vA, vB, vC⟩ is associated with each node. For terminal states, this vector gives the
utility of the state from each player's viewpoint. (In two-player, zero-sum games, the two-element
vector can be reduced to a single value because the values are always opposite.) The simplest way
to implement this is to have the UTILITY function return a vector of utilities.
Consider the node marked X in the game tree shown in Figure 5.4. In that state, player C chooses
what to do. The two choices lead to terminal states with utility vectors ⟨vA = 1, vB = 2, vC = 6⟩ and
⟨vA = 4, vB = 2, vC = 3⟩. Since 6 is bigger than 3, C should choose the first move. This means that if
state X is reached, subsequent play will lead to a terminal state with utilities ⟨vA = 1, vB = 2, vC = 6⟩.
Hence, the backed-up value of X is this vector. The backed-up value of a node n is always the
utility vector of the successor state with the highest value for the player choosing at n. Anyone who
plays multiplayer games, such as Diplomacy, quickly becomes aware that much more is going on
than in two-player games. Multiplayer games usually involve alliances, whether formal or informal,
among the players.
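As a rough sketch of this vector-valued backup, the code below assumes the game object can report one utility per player and that player(s) returns the index of the player to move; both utility_vector and that indexing convention are assumptions introduced for illustration.

```python
def vector_value(state, game):
    """Back up the utility vector of the successor that is best for the player to move."""
    if game.terminal_test(state):
        return game.utility_vector(state)   # assumed: one utility per player, e.g. (vA, vB, vC)
    mover = game.player(state)              # assumed: index of the player choosing at this node
    return max((vector_value(game.result(state, a), game)
                for a in game.actions(state)),
               key=lambda v: v[mover])      # best successor for the mover
```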
ALPHA BETA CUT OFF
MINIMAX searches the entire game tree, even when large parts of it can safely be ignored. The
alpha–beta cutoff returns the same minimax decision without exploring the entire tree. The
minimax search procedure is modified slightly by including a branch-and-bound strategy, one
bound for each player. This modified strategy is known as alpha–beta pruning. It requires the
maintenance of two threshold values: one representing a lower bound on the value that a
maximizing node may ultimately be assigned (alpha), and another representing an upper bound
on the value that a minimizing node may be assigned (beta).
When applied to a standard minimax tree, alpha–beta pruning returns the same move as
minimax would, but it prunes away branches that cannot possibly influence the final decision.
Let the two unevaluated successors of node C in Figure 5.5 have values x and y. Then the value of
the root node is given by
MINIMAX(root) = max(min(3, 12, 8), min(2, x, y), min(14, 5, 2))
= max(3, min(2, x, y), 2)
= max(3, z, 2) where z = min(2, x, y) ≤ 2
= 3.
In other words, the value of the root and hence the minimax decision are independent of the
values of the pruned leaves x and y. Alpha–beta pruning can be applied to trees of any depth, and it
is often possible to prune entire subtrees rather than just leaves. The general principle is this:
consider a node n somewhere in the tree (see Figure 5.6), such that Player has a choice of moving to
that node. If Player has a better choice m either at the parent node of n or at any choice point further
up, then n will never be reached in actual play.
Alpha–beta pruning gets its name from the following two parameters that describe bounds on the
backed-up values that appear anywhere along the path.
α = the value of the best (i.e., highest-value) choice we have found so far at any choice point
along the path for MAX.
β = the value of the best (i.e., lowest-value) choice we have found so far at any choice point
along the path for MIN.
Alpha–beta search updates the values of α and β as it goes along and prunes the remaining branches
at a node (i.e., terminates the recursive call) as soon as the value of the current node is known to be
worse than the current α or β value for MAX or MIN, respectively. The complete algorithm is given
in Figure 5.7.
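The sketch below is one way the complete algorithm might be written in Python against the illustrative Game interface used earlier; it is an approximation of Figure 5.7, not a transcription of it.

```python
import math

def alpha_beta_search(state, game):
    """Return the minimax action while pruning branches that cannot affect the decision."""
    best_action, alpha = None, -math.inf
    for a in game.actions(state):
        v = ab_min(game.result(state, a), game, alpha, math.inf)
        if v > alpha:
            alpha, best_action = v, a
    return best_action

def ab_max(state, game, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state, 'MAX')
    v = -math.inf
    for a in game.actions(state):
        v = max(v, ab_min(game.result(state, a), game, alpha, beta))
        if v >= beta:              # MIN already has a better option higher up: prune
            return v
        alpha = max(alpha, v)
    return v

def ab_min(state, game, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state, 'MAX')
    v = math.inf
    for a in game.actions(state):
        v = min(v, ab_max(game.result(state, a), game, alpha, beta))
        if v <= alpha:             # MAX already has a better option higher up: prune
            return v
        beta = min(beta, v)
    return v
```

On the tree of Figure 5.5 this evaluates min(3, 12, 8) fully, but abandons node C as soon as the leaf value 2 is seen, exactly as in the hand calculation above.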
STOCHASTIC GAMES
In real life, many unpredictable external events can put us into unforeseen situations. Many games
mirror this unpredictability by including a random element, such as the throwing of dice. We call
these stochastic games. Backgammon is a typical game that combines luck and skill. Dice are
rolled at the beginning of a player’s turn to determine the legal moves. In the backgammon position
of Figure 5.10, for example, White has rolled a 6–5 and has four possible moves.
Although White knows what his or her own legal moves are, White does not know what Black is
going to roll and thus does not know what Black’s legal moves will be. That means White cannot
construct a standard game tree of the sort we saw in chess and tic-tac-toe. A game tree in
backgammon must include chance nodes in addition to MAX and MIN nodes. Chance nodes are
shown as circles in Figure 5.11.
Terminal nodes and MAX and MIN nodes (for which the dice roll is known) work exactly the same
way as before. For chance nodes we compute the expected value, which is the sum of the values over
all outcomes, weighted by the probability of each chance action:
EXPECTIMINIMAX(s) =
    UTILITY(s)                                            if TERMINAL-TEST(s)
    max over a of EXPECTIMINIMAX(RESULT(s, a))            if PLAYER(s) = MAX
    min over a of EXPECTIMINIMAX(RESULT(s, a))            if PLAYER(s) = MIN
    sum over r of P(r) * EXPECTIMINIMAX(RESULT(s, r))     if PLAYER(s) = CHANCE
where r represents a possible dice roll (or other chance event) and RESULT(s, r) is the same state as
s, with the additional fact that the result of the dice roll is r.
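A minimal recursive sketch of this computation is shown below; it assumes the Game object can flag chance nodes through player(s) and can enumerate dice outcomes with their probabilities through an assumed helper chance_outcomes(s).

```python
def expectiminimax(state, game):
    """Value of a state, from MAX's point of view, in a game with chance nodes."""
    if game.terminal_test(state):
        return game.utility(state, 'MAX')
    mover = game.player(state)
    if mover == 'CHANCE':
        # Expected value: probability-weighted sum over the possible rolls r.
        return sum(prob * expectiminimax(game.result(state, r), game)
                   for r, prob in game.chance_outcomes(state))   # assumed helper
    values = [expectiminimax(game.result(state, a), game)
              for a in game.actions(state)]
    return max(values) if mover == 'MAX' else min(values)
```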