
CSIT 5900

Lecture 3: Agents that Search

Department of Computer Science and Engineering


Hong Kong University of Science and Technology

(HKUST) Lecture 3: Search 1 / 77


Overview

Search problems.
Uninformed search.
Heuristic search.
Local search and constraint satisfaction problems.
Game tree search.



Missionaries and cannibals problem
From wikipedia (https:
//en.wikipedia.org/wiki/Missionaries_and_cannibals_problem):
In the missionaries and cannibals problem, three missionaries and
three cannibals must cross a river using a boat which can carry
at most two people, under the constraint that, for both banks, if
there are missionaries present on the bank, they cannot be
outnumbered by cannibals (if they were, the cannibals would eat the
missionaries). The boat cannot cross the river by itself with no
people on board...
In the jealous husbands problem, the missionaries and cannibals
become three married couples, with the constraint that no woman
can be in the presence of another man unless her husband is also
present. Under this constraint, there cannot be both women and
men present on a bank with women outnumbering men...



Search in State Spaces - The 8-Puzzle
8-puzzle:

2 8 3        1 2 3
1 6 4        8   4
7   5        7 6 5

Start State        Goal State

8-puzzle solving agent:


Accessible environment: the agent knows exactly which state she is in.
Four possible actions: move blank left, right, up, or down.
Goal: find a sequence of actions that change the environment from
the initial to the goal state.
Search Problem Definition

In general, a search problem consists of


A set of states
An initial state
A set of deterministic actions (or operators): if an action is
executable in a state, then it will produce a unique next state
A goal test
A path cost function
A solution is a sequence of actions that leads to a goal state, i.e. a state
in which the goal is satisfied.
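These components can be collected into a small Python interface (a sketch; the class and parameter names are ours, not from the lecture):

```python
class SearchProblem:
    """A search problem: states, an initial state, deterministic actions,
    a goal test, and a path cost function."""

    def __init__(self, initial, actions, result, is_goal, step_cost):
        self.initial = initial        # the initial state
        self.actions = actions        # state -> iterable of applicable actions
        self.result = result          # (state, action) -> unique next state
        self.is_goal = is_goal        # state -> bool (the goal test)
        self.step_cost = step_cost    # (state, action) -> cost of one step

    def path_cost(self, state, plan):
        """Cost of executing a sequence of actions starting in `state`."""
        total = 0.0
        for a in plan:
            total += self.step_cost(state, a)
            state = self.result(state, a)
        return total
```

A solution is then any action sequence `plan` with `is_goal` true in the state it reaches.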



8-Puzzle - a formulation

States: any arrangement of the blank and the numbers 1 to 8 on the board.
Initial state: any given one.
Goal test: the blank in the middle, and the numbers are in order
clockwise starting from the left top corner.
Actions: move the blank left, right, up, or down.
Path cost: the length of the path.



Problem-Solving Agents

Problem-solving agents are often goal-directed.


Key ideas:
I To determine what the best action is, a problem-solving agent
systematically considers the expected outcomes of different possible
sequences of actions that lead to some goal state.
I A systematic search process is conducted in some representation space.
Steps:
1 Goal and problem formulation
2 Search process
3 Action execution



Searching For Solutions
Searching for a solution to a problem can be thought of as a process of
building up a search tree:
The root of the search tree is the node corresponding to the initial
state.
At each step, the search algorithm chooses one leaf node to expand by
applying all possible actions to the state corresponding to the node.

function GENERAL-SEARCH( problem, strategy) returns a solution, or failure


initialize the search tree using the initial state of problem
loop do
if there are no candidates for expansion then return failure
choose a leaf node for expansion according to strategy
if the node contains a goal state then return the corresponding solution
else expand the node and add the resulting nodes to the search tree
end
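GENERAL-SEARCH can be made executable with the strategy passed in as the function that picks which leaf to expand next (a sketch; `expand`, `choose`, and the node representation are our own assumptions):

```python
def general_search(initial, expand, is_goal, choose):
    """Generic tree search. `expand(state)` yields (action, next_state) pairs;
    `choose(frontier)` removes and returns one leaf node per the strategy.
    Nodes are (state, path-of-actions) pairs."""
    frontier = [(initial, [])]
    while frontier:
        state, path = choose(frontier)       # the strategy decides the order
        if is_goal(state):
            return path                      # solution: a sequence of actions
        for action, nxt in expand(state):
            frontier.append((nxt, path + [action]))
    return None                              # no candidates left: failure
```

With `choose = lambda f: f.pop(0)` (FIFO) this is breadth-first search; with `f.pop()` (LIFO) it is depth-first search.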



An Example Search Strategy:
Breadth-First

Expand first the root node.


At each step, expand all leaf nodes.
Some properties of breadth-first search:
If there is a solution, then the algorithm will find it eventually.
If there is more than one solution, then the first one returned will
be one with the shortest length.
The complexity (both time and space complexity) of breadth-first
search is very high, making it impractical for most real-world problems.



Breadth-First Search of the Eight-Puzzle
[Figure: breadth-first search of the 8-puzzle. Starting from the board
2 8 3 / 1 6 4 / 7 _ 5, all nodes at each depth are expanded in turn
(nodes are numbered in order of expansion) until the goal
1 2 3 / 8 _ 4 / 7 6 5 is reached. © 1998 Morgan Kaufman Publishers]


Search Strategies
The search strategy determines the order in which nodes in the search
tree are expanded.
Different search strategies lead to different search trees.
Four different criteria that determine what the right search strategy is
for a problem:
I Completeness:
Is it guaranteed to find a solution if one exists?
I Time complexity:
How long does it take to find a solution?
I Space complexity:
How much memory is needed?
I Optimality:
Does it find the “best” solution if several different solutions exist?
Types:
I Uninformed (or blind) search strategies
I Informed (or heuristic) search strategies



Depth-First (Backtracking) Search
Depth-first search generates the successors of a node one at a time;
as soon as a successor is generated, one of its successors is generated.
Normally, a depth bound is used: no successor is generated whose
depth is greater than the depth bound.
The following illustrates depth-first search on the 8-puzzle with a
depth bound of 5.
[Figure: depth-first search of the 8-puzzle with a depth bound of 5,
shown in three stages (a)-(c); the numbers beside the boards give the
order in which nodes are generated, and in (b) one branch is discarded
at the bound before node 7 is generated. © 1998 Morgan Kaufman Publishers]


Comparing Breadth-First and Depth-First Search

In the following, b is the branching factor, d is the depth of the
shallowest solution, and m is the depth bound of the depth-first search:

                Time    Space   Optimal?   Complete?
Breadth-first   b^d     b^d     Yes        Yes
Depth-first     b^m     bm      No         Yes, if m ≥ d



Iterative Deepening Search

A technique called iterative deepening enjoys the linear memory
requirements of depth-first search while guaranteeing optimality if a
goal can be found at all;
it conducts successive depth-first searches, increasing the depth
bound by 1 each time, until a goal is found:

[Figure: four successive depth-first searches with depth bounds 1, 2, 3,
and 4. © 1998 Morgan Kaufman Publishers]
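The successive bounded searches can be sketched as follows (a sketch; the function names and the `expand` callback are our own):

```python
def depth_limited(state, expand, is_goal, bound, path=()):
    """Depth-first search that never goes deeper than `bound`."""
    if is_goal(state):
        return list(path)
    if bound == 0:
        return None
    for action, nxt in expand(state):
        found = depth_limited(nxt, expand, is_goal, bound - 1, path + (action,))
        if found is not None:
            return found
    return None

def iterative_deepening(state, expand, is_goal, max_bound=50):
    """Run depth-first searches with bounds 1, 2, 3, ... until a goal is found."""
    for bound in range(1, max_bound + 1):
        found = depth_limited(state, expand, is_goal, bound)
        if found is not None:
            return found
    return None
```

Because the bound grows one level at a time, the first solution found is a shortest one, as with breadth-first search.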


Avoiding Repeated States
Newly expanded nodes may contain states that have already been
encountered before.
There are three ways to deal with repeated states, in increasing order
of effectiveness and computational overhead:
I Do not return to the state the agent just came from.
I Do not create a path with cycles, that is, do not expand a node whose
state has already appeared in the path.
I Do not generate any state that was generated before.
A state space that generates an exponentially large search tree:

[Figure: a small state space (states A, B, C, ...) whose search tree
contains exponentially many nodes because states are repeated.]
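The third (most effective) strategy above — never generate a state twice — can be sketched as breadth-first search with a `seen` set (a sketch; names are ours):

```python
from collections import deque

def breadth_first_graph_search(initial, expand, is_goal):
    """Breadth-first search that never generates a state twice."""
    if is_goal(initial):
        return []
    frontier = deque([(initial, [])])
    seen = {initial}                      # every state generated so far
    while frontier:
        state, path = frontier.popleft()
        for action, nxt in expand(state):
            if nxt in seen:
                continue                  # repeated state: prune it
            if is_goal(nxt):
                return path + [action]
            seen.add(nxt)
            frontier.append((nxt, path + [action]))
    return None
```

The `seen` set trades memory for the exponential blow-up illustrated above.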


Heuristic Search

A heuristic function is a mapping from states to numbers.


It normally measures how far the current state is from a goal state:
the smaller the value, the closer the state is to the goal.
Heuristic or best-first search starts with a heuristic function and
always chooses a node with the smallest value to expand.
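Best-first search can be sketched with a priority queue ordered by the heuristic value (a sketch; names are ours, and a counter breaks ties so the heap never compares states):

```python
import heapq

def best_first_search(initial, expand, is_goal, h):
    """Greedy best-first search: always expand the frontier node
    whose heuristic value h(state) is smallest."""
    frontier = [(h(initial), 0, initial, [])]   # (h, tiebreak, state, path)
    tie = 1
    seen = {initial}
    while frontier:
        _, _, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        for action, nxt in expand(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (h(nxt), tie, nxt, path + [action]))
                tie += 1
    return None
```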



Eight Puzzle
Consider the following heuristic function for the eight-puzzle:
f(n) = number of tiles out of place compared with goal
Here is a search using this function (the number next to each node is its
value under this function):
[Figure: best-first search tree for the 8-puzzle; each node is labelled
with its heuristic value. The start node has value 4; the branch through
the lowest-valued nodes leads to the goal, while the others lead to more
fruitless wandering.]
Eight Puzzle
The following is the search using the same heuristic function but with path
cost added:
[Figure: the same search with path cost added; each node is labelled
g+h (the start node is 0+4), and the goal is reached at depth 5 with
value 5+0. © 1998 Morgan Kaufman Publishers]
A∗ Search

Evaluation function:
I Sum of:
F the actual path cost g (n) from the start node to node n
F the estimated cost h(n) of the cheapest path from n to a goal
I f (n) = g (n) + h(n) is then the estimated cost of the cheapest solution through node n.
Idea: Try to find the cheapest solution.



A* Search by Tree
A∗ search by trees - check only ancestors for repeated states
1 Create a search tree T , consisting solely of the start node, n0 . Put n0
on a list called OPEN.
2 If OPEN is empty, then exit with failure.
3 Select the first node on OPEN, and remove it from OPEN. Call this
node n.
4 If n is a goal node, exit successfully with the solution corresponding
to the path from the root n0 to this node.
5 Expand node n, generating the set M of its successors that are not
already ancestors of n in T . Install these members of M as children of
n in T , and add them to OPEN.
6 Reorder the list OPEN in order of increasing g (n) + h(n) values. (Ties
are resolved in favor of the deepest node in the search tree.)
7 Go to step 2.
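The procedure above can be sketched in Python; this version prunes by the best path cost g found so far rather than checking ancestors, a common simplification (a sketch; names are ours, and `expand` is assumed to yield (action, next_state, step_cost) triples):

```python
import heapq

def a_star(initial, expand, is_goal, h):
    """A* search: expand nodes in order of f(n) = g(n) + h(n)."""
    frontier = [(h(initial), 0.0, initial, [])]   # (f, g, state, path)
    best_g = {initial: 0.0}
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path, g                        # solution and its cost
        for action, nxt, cost in expand(state):
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2                  # best known path to nxt
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt, path + [action]))
    return None
```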
Route Finding

[Figure: road map of Romania with driving distances on the edges]

Straight-line distance to Bucharest:
Arad 366, Bucharest 0, Craiova 160, Dobreta 242, Eforie 161,
Fagaras 178, Giurgiu 77, Hirsova 151, Iasi 226, Lugoj 244,
Mehadia 241, Neamt 234, Oradea 380, Pitesti 98, Rimnicu Vilcea 193,
Sibiu 253, Timisoara 329, Urziceni 80, Vaslui 199, Zerind 374


A∗ Search for Route Finding

[Figure: stages of an A∗ search for Bucharest starting from Arad.
Initially: Arad, f = 0+366 = 366.
After expanding Arad: Sibiu f = 140+253 = 393, Timisoara f = 118+329 = 447,
Zerind f = 75+374 = 449.
After expanding Sibiu: Arad f = 280+366 = 646, Fagaras f = 239+178 = 417,
Oradea f = 146+380 = 526, Rimnicu f = 220+193 = 413.
After expanding Rimnicu: Craiova f = 366+160 = 526, Pitesti f = 317+98 = 415,
Sibiu f = 300+253 = 553.]


Behavior of A∗

We shall assume:
h is admissible, that is, it is never larger than the actual cost.
g is the sum of the costs of the operators along the path.
The cost of each operator is greater than some positive amount ε.
The number of operators is finite (thus the search tree has a finite
branching factor).
Under these conditions, we can always revise h into another admissible
heuristic function so that the f = h + g values along any path in the search
tree are never decreasing. (Monotonicity.) If f ∗ is the cost of an optimal
solution, then


A∗ expands all nodes with f (n) < f ∗ .
A∗ may expand some nodes with f (n) = f ∗ before finding a goal.
[Figure: map of Romania with f -cost contours at 380, 400, and 420
fanning out from the start node Arad; A∗ expands all cities inside a
contour before any city outside it.]


Completeness and Optimality of A∗
Under these assumptions,
A∗ is complete: A∗ expands nodes in order of increasing f , and there
are only a finite number of nodes with f (n) ≤ f ∗ , so it must
eventually expand a goal state.
A∗ is optimal: it will always return an optimal goal. Proof: let G be
an optimal goal and G2 a suboptimal one. It is impossible for A∗ to
find G2 before G :

[Figure: a search tree whose start node leads to both goals, G and G2 .]

It is optimally efficient: no other optimal algorithm is guaranteed to
expand fewer nodes than A∗ (Dechter and Pearl 1985).
Complexity of A∗

Complexity:
I Number of nodes expanded is exponential in the length of the solution.
I All generated nodes are kept in memory. (A∗ usually runs out of space
long before it runs out of time.)
I With a good heuristic, significant savings are still possible compared to
uninformed search methods.
I Admissible heuristic functions that give higher values tend to make the
search more efficient.
Memory-bounded extensions to A∗ :
I Iterative deepening A∗ (IDA∗ )
I Simplified memory-bounded A∗ (SMA∗ )



Heuristic Functions for 8-Puzzle
Some possible candidates (admissible heuristic functions):
I Number of tiles that are in the wrong position (h1 )
I Sum of the city block distances of the tiles from their goal positions
(h2 )
Comparison of iterative deepening search, A∗ with h1 , and A∗ with h2
(averaged over 100 instances of the 8-puzzle, for various solution
lengths d):
Search Cost Effective Branching Factor
d IDS A*(h1 ) A*(h2 ) IDS A*(h1 ) A*(h2 )
2 10 6 6 2.45 1.79 1.79
4 112 13 12 2.87 1.48 1.45
6 680 20 18 2.73 1.34 1.30
8 6384 39 25 2.80 1.33 1.24
10 47127 93 39 2.79 1.38 1.22
12 364404 227 73 2.78 1.42 1.24
14 3473941 539 113 2.83 1.44 1.23
16 – 1301 211 – 1.45 1.25
18 – 3056 363 – 1.46 1.26
20 – 7276 676 – 1.47 1.27
22 – 18094 1219 – 1.48 1.28
24 – 39135 1641 – 1.48 1.26

We see that h2 is better than h1 .


In general, it is always better to use a heuristic function with higher
values, as long as it is admissible, i.e. does not overestimate: h2
dominates (is better than) h1 because for every node n, h2 (n) ≥ h1 (n).
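Both heuristics can be written down directly for the goal layout used in these slides (a sketch; the names and board encoding — tuples of rows with 0 for the blank — are our own):

```python
GOAL = ((1, 2, 3),
        (8, 0, 4),
        (7, 6, 5))          # 0 is the blank; goal layout from these slides

# Precompute where each tile belongs in the goal.
_GOAL_POS = {tile: (r, c) for r, row in enumerate(GOAL)
                          for c, tile in enumerate(row)}

def h1(board):
    """Number of tiles (blank excluded) that are in the wrong position."""
    return sum(1 for r, row in enumerate(board)
                 for c, tile in enumerate(row)
                 if tile != 0 and _GOAL_POS[tile] != (r, c))

def h2(board):
    """Sum of city-block (Manhattan) distances of tiles to their goals."""
    return sum(abs(r - _GOAL_POS[tile][0]) + abs(c - _GOAL_POS[tile][1])
               for r, row in enumerate(board)
               for c, tile in enumerate(row) if tile != 0)
```

For the start state 2 8 3 / 1 6 4 / 7 _ 5 used earlier, h1 gives 4 (the value shown on slide 17) and h2 gives 5, so h2 ≥ h1 here as claimed.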
Inventing Heuristic Functions

Questions:
How might one have come up with good heuristics such as h2 for the
8-puzzle?
Is it possible for a computer to mechanically invent such heuristics?
Relaxed Problem. Given a problem, a relaxed problem is one with fewer
restrictions on the operators.
Strategy. Use the path cost of a relaxed problem as the heuristic function
for the original problem.



Inventing Heuristic Function - 8-Puzzle

Example. Suppose the 8-puzzle operators are described as:


A tile can move from square A to B if A is adjacent to B and B
is blank.
We can then generate three relaxed problems by removing some of the
conditions:
(a) A tile can move from square A to B if A is adjacent to B.
(b) A tile can move from square A to B if B is blank.
(c) A tile can move from square A to B.
Using (a), we get h2 . Using (c), we get h1 .
ABSOLVER (Prieditis 1993) is a computer program that automatically
generates heuristics based on this and other techniques.



Constraint Satisfaction Problem (Assignment Problem)

Problem definition:
A finite set of variables and their domains.
A finite set of conditions on these variables.
A solution is an assignment to these variables that satisfies all the
conditions.
Example: 8-queens problem:
Variables: q1 , ..., q8 , the position (row) of the queen in column 1, ..., 8.
Domain: the same for all variables, {1, ..., 8}.
Constraints: q1 − q2 ≠ 0, |q1 − q2 | ≠ 1, ...



Constructive Methods

Constructive methods use search strategies, especially depth-first search,
to try to find a solution:
States: any partial assignment (assign values to some of the
variables).
Initial state: the empty assignment.
Operator: pick a new variable, and assign a value in its domain to it.
Path cost: 0.
Goal condition: when all constraints are satisfied.
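The constructive method can be sketched as a recursive depth-first search over partial assignments, here instantiated for n-queens (a sketch; names are ours):

```python
def backtrack(assignment, variables, domains, consistent):
    """Constructive CSP search: extend a partial assignment one variable
    at a time, backtracking when a constraint is violated."""
    if len(assignment) == len(variables):
        return assignment                     # all variables assigned
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        assignment[var] = value
        if consistent(assignment):
            result = backtrack(assignment, variables, domains, consistent)
            if result is not None:
                return result
        del assignment[var]                   # undo and try the next value
    return None

def queens_consistent(asg):
    """No two queens share a row or a diagonal (columns are the variables)."""
    return all(asg[i] != asg[j] and abs(asg[i] - asg[j]) != abs(i - j)
               for i in asg for j in asg if i < j)
```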



Constraint Propagation

A technique called constraint propagation can be used to prune the


search space: given the current partial assignment, one can use the
constraints to narrow down the search, i.e. eliminate some possible
values of the remaining variables.
How to do this depends on the type of constraints.
An example using SAT (plus dependency-based backtracking and
conflict-clause learning) will be given later.
The next few slides illustrate the idea using the 4-Queens Problem and
constraint graphs constructed from constraints involving two
variables: a constraint graph is one where a node is labelled by a
variable and the domain that the variable can take, and there is an
edge from node i to node j if variables i and j are constrained.



Running Example: 4-Queens Problem
Constraint graph:
[Figure: constraint graph for the 4-queens problem: nodes q1 , q2 , q3 , q4 ,
each with domain {1, 2, 3, 4}, and an edge between every pair of
variables. © 1998 Morgan Kaufman Publishers]
Running Example: 4-Queens Problem
Constraint graph with q1 = 1:
[Figure: the constraint graph after assigning q1 = 1, showing values
eliminated from the domains of q2 , q3 , and q4 by successively making
the arcs (q2 , q3 ), (q3 , q2 ), and (q3 , q4 ) consistent.
© 1998 Morgan Kaufman Publishers]


Running Example: 4-Queens Problem
Constraint graph with q1 = 2:

[Figure: the constraint graph after assigning q1 = 2, showing values
eliminated by successively making the arcs (q3 , q2 ), (q4 , q3 ), and
(q4 , q2 ) consistent; the domain of q2 is reduced to {4}.
© 1998 Morgan Kaufman Publishers]


Satisfiability
A clause is a disjunction of literals - a literal is a variable or the
negation of a variable.
Satisfiability (SAT) refers to the problem of deciding if a set of
clauses is satisfiable. It was the first NP-complete problem discovered
(by Cook), and it is also one of the most intensely studied NP-complete
problems.
There is a huge literature on SAT. SAT algorithms have many
applications.
3SAT refers to the problem of deciding if a set of clauses that have
no more than 3 literals is satisfiable.
In terms of computational complexity, SAT is equivalent to 3SAT.
A procedure for SAT is sound if whenever it returns yes, the input is
indeed satisfiable; it is complete if it will return yes whenever the
input is satisfiable. Obviously, we want only sound procedures. But
incomplete ones can sometimes be very useful.
DPLL Procedure
The most popular and widely studied complete method for SAT is the
so-called Davis-Putnam-Logemann-Loveland (DPLL) procedure. (Can you
see that it’s really the constructive method for CSP?)
Let α be a set of clauses. If l is a literal occurring in α, then by α(l) we
mean the result of making l true and simplifying: if a clause C mentions
neither l nor ¬l, then do nothing; if C mentions l, then delete C ; if C
mentions ¬l, then delete ¬l from C .
Procedure DPLL(CNF: α)
if α is empty, then return yes.
else if there is an empty clause in α, return no.
else if there is a pure literal l in α, then return DPLL(α(l)).
else if there is a unit clause {l} in α, then return DPLL(α(l)).
else select a variable p in α, and do the following
if DPLL(α(p)) return yes, then return yes.
else return DPLL(α(¬p)).
Where a literal in α is pure if it occurs only positively or only negatively,
and a unit clause is one whose length is 1.
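A direct transcription of the procedure in Python, with clauses as lists of nonzero integers (−p standing for ¬p; a sketch, names are ours):

```python
def simplify(clauses, lit):
    """alpha(l): make literal `lit` true and simplify the clause set."""
    out = []
    for clause in clauses:
        if lit in clause:
            continue                                  # clause satisfied: drop it
        if -lit in clause:
            clause = [x for x in clause if x != -lit] # remove the false literal
        out.append(clause)
    return out

def dpll(clauses):
    """Return True iff the clause set is satisfiable."""
    if not clauses:
        return True                    # no clauses left: yes
    if any(not c for c in clauses):
        return False                   # an empty clause: no
    lits = {l for c in clauses for l in c}
    pure = next((l for l in lits if -l not in lits), None)
    if pure is not None:
        return dpll(simplify(clauses, pure))
    unit = next((c[0] for c in clauses if len(c) == 1), None)
    if unit is not None:
        return dpll(simplify(clauses, unit))
    p = next(iter(lits))               # otherwise branch on some variable
    return dpll(simplify(clauses, p)) or dpll(simplify(clauses, -p))
```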
Modern DPLL
The state-of-the-art SAT solvers use DPLL with conflict-driven clause
learning and dependency-based backtracking:
1 Start with the empty assignment.
2 If all variables are assigned then return with the assignment.
3 Choose an assignment for an un-assigned variable (backtrack point).
4 Propagate (deduce) the extended partial assignment.
5 If propagation yields a contradiction:
I analyze the conflict to learn some new clauses
(conflict driven clause learning)
I backtrack to the nearest backtrack point that is relevant to the conflict.
I exit with failure (unsatisfiable) if backtrack returns no new alternatives.
6 Else go back to step 2.
See https:
//en.wikipedia.org/wiki/Conflict-Driven_Clause_Learning for
more details.
GSAT
Incomplete methods cannot be used to show that a CNF is unsatisfiable.
But on satisfiable CNFs, incomplete methods can sometimes be much
faster than the currently known best complete methods.
The following randomized local search algorithm, often called GSAT, is
typical of the current best known incomplete methods:
Procedure GSAT(CNF: α, max-restart: mr, max-climb: mc)
for i = 1 to mr do
get a randomly generated truth assignment v .
for j = 1 to mc do
if v satisfies α, return yes
else let v be a random choice of one of the best successors of v
return failure
where a successor of v is an assignment with the truth value of one of the
variables flipped, and a best successor is one that satisfies the maximum
number of clauses in α.
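GSAT in Python, with the same clause representation as before (lists of nonzero integers; a sketch, names are ours):

```python
import random

def num_satisfied(clauses, v):
    """Number of clauses made true by assignment v (variable -> bool)."""
    return sum(any(v[abs(l)] == (l > 0) for l in c) for c in clauses)

def gsat(clauses, max_restart=10, max_climb=100, seed=0):
    rng = random.Random(seed)
    variables = sorted({abs(l) for c in clauses for l in c})
    for _ in range(max_restart):
        v = {x: rng.random() < 0.5 for x in variables}  # random assignment
        for _ in range(max_climb):
            if num_satisfied(clauses, v) == len(clauses):
                return v                                # v satisfies alpha
            # score every one-variable flip and keep the best ones
            scores = []
            for x in variables:
                v[x] = not v[x]
                scores.append((num_satisfied(clauses, v), x))
                v[x] = not v[x]
            best = max(s for s, _ in scores)
            x = rng.choice([x for s, x in scores if s == best])
            v[x] = not v[x]                             # flip a best successor
    return None
```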



Heuristic Repair

GSAT is a local search method that belongs to the heuristic repair class
of methods for CSPs.
In CSP, constructive methods start with the empty assignment, and build
up a solution gradually by assigning values to variables one by one.
Heuristic repair starts with a proposed solution, which most probably
does not satisfy all constraints, and repairs it until it does.
An often used method, the min-conflicts repair method by Gu, selects
a variable to adjust and finds a value for it that minimizes conflicts - see
the following example on 8-Queens.



8-Queens By Min-Conflicts
[Figure: solving the 8-queens problem by min-conflicts repair. Each panel
shows the board with conflict counts for candidate squares; the queen at
(1,3) is moved to (1,2), then the queen at (2,6) to (2,7), after which no
further change is needed. © 1998 Morgan Kaufman Publishers]
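The min-conflicts repair loop for n-queens can be sketched as follows (a sketch; names are ours):

```python
import random

def conflicts(q, col, row):
    """Queens in other columns attacking square (col, row)."""
    return sum(1 for c, r in enumerate(q) if c != col and
               (r == row or abs(r - row) == abs(c - col)))

def min_conflicts(n=8, max_steps=10000, seed=0):
    """Heuristic repair for n-queens: start from a random full assignment,
    repeatedly move some conflicted queen to a minimum-conflict row."""
    rng = random.Random(seed)
    q = [rng.randrange(n) for _ in range(n)]       # queen row per column
    for _ in range(max_steps):
        conflicted = [c for c in range(n) if conflicts(q, c, q[c]) > 0]
        if not conflicted:
            return q                               # no attacks: a solution
        col = rng.choice(conflicted)
        best = min(conflicts(q, col, r) for r in range(n))
        q[col] = rng.choice([r for r in range(n)
                             if conflicts(q, col, r) == best])
    return None
```

Random choice among the conflicted columns and among the tied best rows keeps the repair from cycling.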


Unsupervised Learning (Clustering and K-Means via CSP)

Unsupervised learning, as its name suggests, can refer to any learning
method that does not need explicitly labeled examples.
However, it often refers to those clustering methods whose input
Dt is a set of inputs x without labels y .
Reinforcement learning, while needing no explicitly labeled data, is
considered on its own (samples are generated dynamically and under
the control of the agent).



Example 1: English word clustering
Brown et al 1992:
Input: a (large) corpus of words (100 million words of news articles)
Output: Clusters of words, for example:
1 Friday Monday Thursday Wednesday Tuesday Saturday Sunday
weekends Sundays Saturdays
2 June March July April January December October November
September August
3 people guys folks fellows CEOs chaps doubters commies unfortunates
blokes
4 down backwards ashore sideways southward northward overboard aloft
downwards adrift
5 water gas coal liquid acid sand carbon steam shale iron
6 great big vast sudden mere sheer gigantic lifelong scant colossal
7 man woman boy girl lawyer doctor guy farmer teacher citizen
8 American Indian European Japanese German African Catholic Israeli
Italian Arab
It’s often called Brown clustering or IBM clustering, and it used to be a
cornerstone of NLP systems.
Example 2: Image clustering

Feature learning using neural networks clusters 10 million images (from
YouTube stills) into 22 thousand categories:

Figure: From Percy Liang’s AI note



Clustering

Given a training set of inputs Dt = {x1 , ..., xn }, classify them into a unique
cluster in {C1 , ..., CK }.
A CSP problem?
Variables: c1 ,...,cn .
Domains: {1, ..., K } (ci = j means that the ith input xi is assigned to
cluster j).
Constraints?
Intuition: “Similar” inputs should get the same assignment.



K-means

K-means clustering: “similar” = close in distance, and each cluster i is
represented by a centroid µi .
K-means clustering CSP:
Variables: c1 ,...,cn , µ1 ,...,µK .
Domains: {1, ..., K } for ci and the feature space for µi (i.e. the
domain for φ(x)).
Constraints: minimize the following

∑i=1..n ‖φ(xi ) − µci ‖2


K-means - example
Consider Dt with four examples whose feature space is one dimensional
numbers:
φ(Dt ) = {1, 2, 10, 11}.
and K=2, meaning we want to have two clusters. Clearly the solution is

c1 = c2 = 1, c3 = c4 = 2, µ1 = 1.5, µ2 = 10.5.

Insight: c and µ: knowing one makes the other easily computed:


Given µ1 = 1.5, µ2 = 10.5,

c1 = arg mini (1 − µi )2 = 1.

Given c1 = c2 = 1, c3 = c4 = 2,

µ1 = arg minα (1 − α)2 + (2 − α)2 = 1.5.



K-means

Algorithm - informal:
1 Initialize µ1 ,...,µK .
2 Iterate the following for T steps:
1 compute the best assignment c1 , ..., cn for the given µ.
2 compute the best centroids µ1 , ..., µK for the given c.
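For the one-dimensional example above, the two alternating steps look like this (a sketch; names are ours):

```python
def kmeans(points, k, centroids, steps=10):
    """Alternating minimization for 1-D K-means.
    points: list of numbers; centroids: k initial numbers (mutated in place)."""
    for _ in range(steps):
        # step 1: best assignment c_i given the centroids (nearest centroid)
        assign = [min(range(k), key=lambda j: (x - centroids[j]) ** 2)
                  for x in points]
        # step 2: best centroid mu_j given the assignment (the mean)
        for j in range(k):
            members = [x for x, c in zip(points, assign) if c == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return assign, centroids
```

On φ(Dt ) = {1, 2, 10, 11} with K = 2 this converges to the solution given above: clusters {1, 2} and {10, 11} with centroids 1.5 and 10.5.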



Search as Function Maximization

Function maximization problems: find an x such that Value(x) is
maximal.
Examples: 8-queens problem
I find a state in which none of the 8 queens are attacked.
I design a function on the states such that its value is maximal when
none of the queens are attacked.
I one such function can be: Value(n) = 1/e^k(n) , where k(n) is the
number of queens that are attacked.
Another example: VLSI design - find a layout that satisfies the
constraints.



Hill-Climbing
Basic ideas:
Start at the initial state.
At each step, move to a next state with highest value.
It does not maintain the search tree and does not backtrack - it keeps
only the current node and its evaluation.

[Figure: a one-dimensional landscape plotting evaluation against state,
with the current state on a slope below the global maximum.]


Hill-Climbing Search Algorithm

function HILL-CLIMBING( problem) returns a solution state
  inputs: problem, a problem
  static: current, a node
          next, a node
  current ← MAKE-NODE(INITIAL-STATE[problem])
  loop do
    next ← a highest-valued successor of current
    if VALUE[next] < VALUE[current] then return current
    current ← next
  end



Simulated Annealing
Hill-climbing can easily get stuck in a local maximum.
One way to avoid getting stuck in a local maximum is by using
simulated annealing.
The main idea:
I at each step, instead of picking the best move, it picks a random move.
I if the move leads to a state with higher value, then execute the move.
I otherwise, execute the move with certain probability that becomes
smaller as the algorithm progresses.
I the probability is computed according to a function (schedule) that
maps time to probabilities.
The idea comes from the process of annealing - cooling metal liquid
gradually until it freezes.
It has been found to be extremely effective in real applications such
as factory scheduling.
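The move-acceptance rule can be sketched as follows; since the slides leave the probability function abstract, this sketch assumes the standard Boltzmann rule, accepting a worsening move of size Δ with probability e^(Δ/T) (names and the cooling schedule in the example are our own):

```python
import math
import random

def simulated_annealing(state, neighbors, value, schedule, steps=1000, seed=0):
    """Pick a random move at each step; always accept improvements, and
    accept downhill moves with probability exp(delta / T), where the
    temperature T = schedule(t) shrinks as the algorithm progresses."""
    rng = random.Random(seed)
    for t in range(1, steps + 1):
        T = schedule(t)
        if T <= 0:
            break                               # frozen: stop moving
        nxt = rng.choice(neighbors(state))      # a random move
        delta = value(nxt) - value(state)
        if delta > 0 or rng.random() < math.exp(delta / T):
            state = nxt
    return state
```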



Games - An Example

Tic-Tac-Toe:
A board with nine squares.
Two players: “X” and “O”; “X” moves first, and then alternate.
At each step, the player chooses an unoccupied square and marks it
with his/her symbol. Whoever gets three in a line wins.



Games as Search Problems
Mainly two-player games are considered.
Two-player game-playing problems require an agent to plan ahead in
an environment in which another agent acts as its opponent.
Two sources of uncertainty:
I Opponent: The agent does not know in advance what action its
opponent will take.
I Time-bounded search: Time limit has to be imposed because searching
for optimal decisions in large state spaces may not be practical. The
“best” decisions found may be suboptimal.
Search problem:
I Initial state: initial board configuration and indication of who makes
the first move.
I Operators: legal moves.
I Terminal test: determines when a terminal state is reached.
I Utility function (payoff function): returns a numeric score to quantify
the outcome of a game.



Game Tree for Tic-Tac-Toe

[Figure: partial game tree for tic-tac-toe. MAX (X) moves first; MAX
and MIN levels alternate down to terminal positions, whose utilities are
−1, 0, or +1.]


Minimax Algorithm With Perfect Decisions

Minimax Algorithm (With Perfect Decisions) Assume the two players are:
MAX (self) and MIN (opponent). To evaluate a node n in a game tree:
1 Expand the entire tree below n.
2 Evaluate the terminal nodes using the given utility function.
3 Select a node that has not been evaluated yet but all of whose children
have been evaluated. If there is no such node, then return.
4 If the selected node is one at which MIN moves, assign it the
minimum of the values of its children. If the selected node is one at
which MAX moves, assign it the maximum of the values of its
children. Return to step 3.
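The algorithm is naturally written as a recursion that backs values up from the leaves (a sketch; names are ours). The test below uses the two-ply example tree from the next slide, whose leaf utilities are (3, 12, 8), (2, 4, 6), (14, 5, 2) and whose root value is 3:

```python
def minimax(state, expand, utility, is_max):
    """Evaluate a game-tree node: MAX takes the maximum of its children's
    values, MIN the minimum; terminal nodes get the utility function."""
    children = expand(state)
    if not children:
        return utility(state)             # terminal node
    values = [minimax(c, expand, utility, not is_max) for c in children]
    return max(values) if is_max else min(values)
```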



MAX tries to maximize the utility, assuming that MIN will act to minimize
it.

[Figure: a two-ply game tree. The three MIN nodes, reached by moves
A1 , A2 , A3 , have leaf children with utilities (3, 12, 8), (2, 4, 6), and
(14, 5, 2), so their minimax values are 3, 2, and 2; the MAX root
therefore has value 3.]


Imperfect Decisions

Minimax algorithm with perfect decisions:


I No time limit is imposed.
I The complete search tree is generated.
It is often impractical to make perfect decisions, as the time and/or
space requirements for complete game tree search (to terminal states)
are intractable.
Two modifications to minimax algorithm with perfect decisions:
I Partial tree search:
Terminal test is replaced by a cutoff test.
I Evaluation function:
Utility function is replaced by a heuristic evaluation function.



Minimax Algorithm (With Imperfect Decisions) Assume the two players
are: MAX (self) and MIN (opponent). To evaluate a node n in a game
tree:
1 Expand the tree below n according to the partial tree search.
2 Evaluate the leaf nodes using the given evaluation function.
3 Select a node that has not been evaluated yet but all of whose children
have been evaluated. If there is no such node, then return.
4 If the selected node is one at which MIN moves, assign it the
minimum of the values of its children. If the selected node is one at
which MAX moves, assign it the maximum of the values of its
children. Return to step 3.



Evaluation Functions
An evaluation function returns an estimate of the expected utility of
the game from a given position.
Requirements:
I Computation of evaluation function values is efficient.
I Evaluation function agrees with utility function on terminal states.
I Evaluation function accurately reflects the chances of winning.
Most game-playing programs use a weighted linear function:

w1 f1 + w2 f2 + · · · + wn fn

where the f ’s are the features (e.g. number of queens in chess) of the
game position, and w ’s are the weights that measure the importance
of the corresponding features.
Learning good evaluation functions automatically from past
experience is a promising new direction.



Tic-Tac-Toe

Evaluation function, e(p), for Tic-Tac-Toe


If p is not a winning position for either player, then e(p) = (the
number of complete rows, columns, or diagonals that are still open for
MAX) - (the number of complete rows, columns, or diagonals that
are still open for MIN).
If p is a win for MAX, then e(p) = ∞; if p is a win for MIN, then
e(p) = −∞.
The next three slides illustrate Minimax search with imperfect decisions.
Notice that we have eliminated symmetric positions.
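The definition of e(p) translates directly into code. A minimal sketch, assuming the board is a 3×3 grid holding 'X' (MAX), 'O' (MIN), or None:

```python
# The 8 complete lines: 3 rows, 3 columns, 2 diagonals.
LINES = ([[(r, c) for c in range(3)] for r in range(3)]
         + [[(r, c) for r in range(3)] for c in range(3)]
         + [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]])

def e(board):
    """(lines still open for MAX) - (lines still open for MIN);
    a completed line gives +inf (MAX win) or -inf (MIN win)."""
    for line in LINES:
        cells = [board[r][c] for r, c in line]
        if cells == ['X', 'X', 'X']:
            return float('inf')
        if cells == ['O', 'O', 'O']:
            return float('-inf')
    open_for_max = sum('O' not in [board[r][c] for r, c in line]
                       for line in LINES)
    open_for_min = sum('X' not in [board[r][c] for r, c in line]
                       for line in LINES)
    return open_for_max - open_for_min

# X in the centre, O in a corner: 5 lines still open for X,
# 4 lines still open for O.
board = [['O', None, None], [None, 'X', None], [None, None, None]]
print(e(board))  # -> 1
```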

(HKUST) Lecture 3: Search 61 / 77


The First Stage of Search
[Figure: the first stage of Tic-Tac-Toe search, carried two plies deep from
the empty board. With symmetric positions eliminated, MAX has three
distinct opening moves. The depth-2 positions are scored with e(p) (values
such as 6 − 4 = 2, 5 − 6 = −1, 4 − 6 = −2) and backed up, giving the
openings the values −1, −2, and 1; MAX plays the move valued 1, placing X
in the centre.]
(HKUST) Lecture 3: Search 62 / 77
The Second Stage of Search

[Figure (© 1998 Morgan Kaufman Publishers): the second stage of search,
two plies below the position reached after the first stage. The new depth-2
positions are scored with e(p) (values from 3 − 3 = 0 up to 5 − 2 = 3) and
backed up; MAX's best move again has backed-up value 1.]
(HKUST) Lecture 3: Search 63 / 77


The Last Stage of Search
[Figure: the last stage of search. Positions in which MIN can force a win
are scored −∞; after backing the values up, MAX's best move has value 1,
which is the move MAX makes.]
(HKUST) Lecture 3: Search 64 / 77


Partial Tree Search

Different approaches:
I Depth-limited search
I Iterative deepening search
I Quiescent search
Quiescent positions are positions that are not likely to have large
variations in evaluation in the near future.
Quiescent search:
I Expansion of nonquiescent positions until quiescent positions are
reached.

(HKUST) Lecture 3: Search 65 / 77


Pruning

Pruning: The process of eliminating a branch of the search tree from
consideration without actually examining it.
General idea: If m is better than n for Player, then n will never be
reached in actual play and hence can be pruned away.

[Figure: a game-tree path alternating Player and Opponent levels, with
Opponent node m near the top and Opponent node n further down on a
continuation of the path.]

(HKUST) Lecture 3: Search 66 / 77


Alpha-Beta Pruning
Minimax search without pruning:

MAX 3

A1 A2 A3

MIN 3 2 2

A 11 A 12 A 13 A 21 A 22 A 23 A 31 A 32 A 33

3 12 8 2 4 6 14 5 2

Minimax search with alpha-beta pruning:

MAX 3

A1 A2 A3

MIN 3 <=2 2

A 11 A 12 A 13 A 21 A 22 A 23 A 31 A 32 A 33

3 12 8 2 14 5 2

Effectiveness of alpha-beta pruning depends on the order in which
successor nodes are examined.
(HKUST) Lecture 3: Search 67 / 77
Alpha-Beta Search Algorithm - Informal Description
To evaluate a MAX node in a game tree (we shall call the value assigned
to a MAX node its alpha value, and that to a MIN node its beta value):
1 Expand the node depth-first until a node that satisfies the cutoff test
is reached.
2 Evaluate the cutoff node.
3 Update the values of all the nodes that have so far been expanded
according to Minimax algorithm, and using the following pruning
strategy:
I prune all children of any MIN node whose beta value is ≤ the alpha
value of any of its MAX ancestors.
I prune all children of any MAX node whose alpha value is ≥ the beta
value of any of its MIN ancestors.
4 Backtrack to a node that has not been pruned, and go back to step
1. If there is no such node to backtrack to, then return with the
value assigned to the original node.

(HKUST) Lecture 3: Search 68 / 77


Example
Part of the first state of search in Tic-Tac-Toe using Alpha-Beta search.
[Figure: part of the first stage of Tic-Tac-Toe search using alpha-beta
search. The start node A (a MAX node) acquires alpha value −1 once its
first successor has been evaluated; node C (a MIN node) then acquires beta
value −1 from its own first evaluated successor. Since C's beta value is ≤
A's alpha value, C's remaining successors are pruned.]
(HKUST) Lecture 3: Search 69 / 77


Alpha-Beta Search Algorithm
The above informal description corresponds to calling the following
function
MAX-VALUE(node,game,-∞,∞):

function MAX-VALUE(state, game, α, β) returns the minimax value of state
    inputs: state, current state in game
            game, game description
            α, the best score for MAX along the path to state
            β, the best score for MIN along the path to state
    if CUTOFF-TEST(state) then return EVAL(state)
    for each s in SUCCESSORS(state) do
        α ← MAX(α, MIN-VALUE(s, game, α, β))
        if α ≥ β then return β
    end
    return α

function MIN-VALUE(state, game, α, β) returns the minimax value of state
    if CUTOFF-TEST(state) then return EVAL(state)
    for each s in SUCCESSORS(state) do
        β ← MIN(β, MAX-VALUE(s, game, α, β))
        if β ≤ α then return α
    end
    return β
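For concreteness, a direct Python transcription of the two functions, under the simplifying assumption that the game tree is given explicitly as nested lists whose leaves hold their EVAL values (so CUTOFF-TEST becomes a leaf check and EVAL the identity):

```python
import math

def max_value(state, alpha, beta):
    if not isinstance(state, list):      # CUTOFF-TEST: leaf reached
        return state                     # EVAL: leaf stores its value
    for s in state:                      # SUCCESSORS
        alpha = max(alpha, min_value(s, alpha, beta))
        if alpha >= beta:
            return beta                  # prune remaining successors
    return alpha

def min_value(state, alpha, beta):
    if not isinstance(state, list):
        return state
    for s in state:
        beta = min(beta, max_value(s, alpha, beta))
        if beta <= alpha:
            return alpha                 # prune remaining successors
    return beta

# The three-branch example from the pruning slide: the middle
# branch is cut off after its first leaf (2 <= alpha = 3).
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(max_value(tree, -math.inf, math.inf))  # -> 3
```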

(HKUST) Lecture 3: Search 70 / 77


Analyses of Game-Playing Algorithm

The effectiveness of alpha-beta pruning depends on the ordering in
which successors are generated.
Assuming optimal ordering, alpha-beta pruning can reduce the
effective branching factor from b to √b (Knuth and Moore 1975).
The result also assumes that all nodes have the same branching
factor, that all paths reach the same fixed depth limit, and that the
leaf (cutoff node) evaluations are randomly distributed.
All the game-playing algorithms assume that the opponent plays
optimally.
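To see what the b-to-√b reduction buys, compare leaf counts for a hypothetical branching factor of 35 (a figure often quoted for chess) at depth 4:

```python
b, d = 35, 4
minimax_leaves = b ** d            # full-width search: b^d
alphabeta_leaves = b ** (d // 2)   # optimal-ordering estimate: (sqrt(b))^d
print(minimax_leaves, alphabeta_leaves)  # -> 1500625 1225
```

With the same node budget, alpha-beta under good move ordering can therefore search roughly twice as deep.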

(HKUST) Lecture 3: Search 71 / 77


Monte-Carlo Search
Pure Monte-Carlo Search:
Expand the current node;
For each child, play a large number of random games to the finish
and compute the average payoffs (values).
Play the child move that has the largest average value.

(HKUST) Lecture 3: Search 72 / 77


Monte-Carlo Tree Search

Making use of MCS in game playing:


Use formulas other than average to summarize the sampling results.
Combine it with alpha-beta pruning search: start with alpha-beta
pruning and go to Monte-Carlo when the level is too deep to continue
or when there is no good evaluation function on the states.
Make use of domain knowledge, and combine it with learning
techniques.

(HKUST) Lecture 3: Search 73 / 77


An Example

Averages may be misleading: minimax would choose the left branch while
averaging would lead to the right branch.
[Figure: a small two-branch game tree in which the left branch has the
better minimax value but the right branch has the higher average leaf
value.]

(HKUST) Lecture 3: Search 74 / 77


Monte-Carlo Search with UCB
MCS with UCB (Upper Confidence Bound):
Expand the current node v ;
For each child Mi , play a large number of random games to the finish
and compute its UCB. A common formula is:
UCB(Mi ) = µi + c √(log N / Ni )

where
I µi is the expected value of the games (tryouts) for the child node Mi .
For example, in a zero-sum game it is Wi /Ni , where Wi is the number
of win's for Mi , and Ni is the total number of plays for Mi ;
I N is the total number of plays for v ;
I c is a constant called the exploration parameter used to balance
branches.
Play the child move that has the largest average value.
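The formula in code; Wi, Ni, and N follow the slide's notation, and the default c = √2 is a common choice (an assumption here, not prescribed by the slide):

```python
import math

def ucb(wins, plays, total_plays, c=math.sqrt(2)):
    # mu_i = W_i / N_i, plus the exploration bonus c * sqrt(log N / N_i).
    mu = wins / plays
    return mu + c * math.sqrt(math.log(total_plays) / plays)

# A well-explored child with the better average vs. a barely tried one:
well_tried = ucb(6, 10, 12)   # mu = 0.6, small bonus
barely_tried = ucb(1, 2, 12)  # mu = 0.5, large bonus
print(barely_tried > well_tried)  # -> True
```

The exploration term rewards rarely visited children, which is exactly how c balances trying new branches against replaying the currently best one.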
(HKUST) Lecture 3: Search 75 / 77
Monte-Carlo Tree Search (MCTS)
MCTS:
Start building the tree using alpha-beta pruning.
Continue on with Monte-Carlo search with UCB scoring rule.
Propagate the values up the alpha-beta tree.

(HKUST) Lecture 3: Search 76 / 77


State of the Art

Chess: Deep Blue (Benjamin, Tan, et al.) defeated world chess
champion Garry Kasparov on May 11, 1997.
Checkers: Chinook (Schaeffer et al.) became the world champion in
1994.
Go: AlphaGo (Silver et al.) defeated a top-level player in 2016.

(HKUST) Lecture 3: Search 77 / 77
