VIP Cheatsheet: States-based models

Afshine Amidi and Shervine Amidi

May 23, 2019


Search optimization

In this section, we assume that by accomplishing action a from state s, we deterministically
arrive in state Succ(s,a). The goal here is to determine a sequence of actions (a1, a2, a3, a4, ...)
that starts from an initial state and leads to an end state. In order to solve this kind of problem,
our objective will be to find the minimum cost path by using states-based models.


Tree search

This category of states-based algorithms explores all possible states and actions. It is quite
memory efficient, and is suitable for huge state spaces, but the runtime can become exponential
in the worst cases.

❒ Search problem – A search problem is defined with, among others, the successor Succ(s,a) of
state s after action a, the action cost Cost(s,a) of taking action a from state s, and whether an
end state was reached, IsEnd(s). The objective is to find a path that minimizes the cost.

❒ Backtracking search – Backtracking search is a naive recursive algorithm that tries all
possibilities to find the minimum cost path. Here, action costs can be either positive or negative.

❒ Breadth-first search (BFS) – Breadth-first search is a graph search algorithm that does a
level-by-level traversal. We can implement it iteratively with the help of a queue that stores at
each step future nodes to be visited. For this algorithm, we can assume action costs to be equal
to a constant c > 0.

❒ Depth-first search (DFS) – Depth-first search is a search algorithm that traverses a graph
by following each path as deep as it can. We can implement it recursively, or iteratively with
the help of a stack that stores at each step future nodes to be visited. For this algorithm, action
costs are assumed to be equal to 0.

❒ Iterative deepening – The iterative deepening trick is a modification of the depth-first
search algorithm so that it stops after reaching a certain depth, which guarantees optimality
when all action costs are equal. Here, we assume that action costs are equal to a constant c > 0.

❒ Tree search algorithms summary – The space and time complexities of these algorithms can
be compared by noting b the number of actions per state, d the solution depth, and D the
maximum depth of the search tree.
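To make the backtracking search above concrete, here is a minimal Python sketch. The callables
`actions`, `succ`, `cost` and `is_end` are hypothetical stand-ins for Actions(s), Succ(s,a),
Cost(s,a) and IsEnd(s), and the tiny example at the bottom is made up for illustration.

```python
# Minimal backtracking search: tries every action sequence and keeps the
# cheapest complete path found. Exponential time, memory linear in the depth.
def backtracking_search(start, actions, succ, cost, is_end):
    best = {"cost": float("inf"), "path": None}

    def recurse(state, path, path_cost):
        if is_end(state):
            if path_cost < best["cost"]:
                best["cost"], best["path"] = path_cost, list(path)
            return
        for a in actions(state):
            path.append(a)
            recurse(succ(state, a), path, path_cost + cost(state, a))
            path.pop()

    recurse(start, [], 0.0)
    return best["cost"], best["path"]

# Tiny example: walk from 0 to at least 3 using steps of +1 (cost 1) or +2 (cost 3).
if __name__ == "__main__":
    print(backtracking_search(
        0,
        actions=lambda s: ["+1", "+2"],
        succ=lambda s, a: s + (1 if a == "+1" else 2),
        cost=lambda s, a: 1.0 if a == "+1" else 3.0,
        is_end=lambda s: s >= 3,
    ))  # -> (3.0, ['+1', '+1', '+1'])
```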
Graph search

This category of states-based algorithms aims at constructing optimal paths, enabling exponential
savings. In this section, we will focus on dynamic programming and uniform cost search.

❒ Graph – A graph is comprised of a set of vertices V (also called nodes) as well as a set of
edges E (also called links).

❒ Dynamic programming – Dynamic programming (DP) is a backtracking search algorithm with
memoization (i.e. partial results are saved) that finds the minimum cost path from a state s to
an end state; it only works on acyclic graphs. For any given state s, the future cost satisfies:

    FutureCost(s) = 0                                                      if IsEnd(s)
    FutureCost(s) = min_{a ∈ Actions(s)} [Cost(s,a) + FutureCost(Succ(s,a))]   otherwise

Remark: the formula above gives the intuition of a top-to-bottom problem resolution; the same
values can equivalently be computed bottom-to-top, starting from the end states.

❒ Types of states – The terminology below is used for states in the context of uniform cost
search:

• Explored E: states for which the optimal path has already been found

• Frontier F: states seen, for which we are still figuring out how to get there with the cheapest
cost

• Unexplored U: states not seen yet

❒ Uniform cost search – Uniform cost search (UCS) is a search algorithm that aims at finding
the shortest path from a state sstart to an end state send. It explores states s in increasing order
of PastCost(s) and relies on the fact that all action costs are non-negative.

Remark: the complexity analyses of these search algorithms assume the number of possible actions
per state to be constant.
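Below is a rough sketch of uniform cost search, assuming a hypothetical `succ_and_cost(s)`
callable that yields (action, next_state, cost) triples with non-negative costs.

```python
import heapq
import itertools

def uniform_cost_search(start, succ_and_cost, is_end):
    """Sketch of UCS: pops states in increasing order of PastCost(s).
    Assumes all action costs are non-negative."""
    counter = itertools.count()               # tie-breaker so states are never compared
    frontier = [(0.0, next(counter), start, [])]
    explored = set()                          # states whose cheapest PastCost is settled
    while frontier:
        past_cost, _, state, path = heapq.heappop(frontier)
        if state in explored:
            continue
        explored.add(state)
        if is_end(state):
            return past_cost, path            # minimum cost path found
        for action, nxt, cost in succ_and_cost(state):
            if nxt not in explored:
                heapq.heappush(frontier,
                               (past_cost + cost, next(counter), nxt, path + [action]))
    return float("inf"), None                 # no end state reachable
```

Popping a state from the frontier moves it to the explored set, at which point its PastCost is
final; this mirrors the Explored/Frontier/Unexplored distinction above.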
Learning costs

Suppose we are not given the values of Cost(s,a); we want to estimate these quantities from a
training set of minimum-cost-path sequences of actions (a1, a2, ..., ak).

❒ Structured perceptron – The structured perceptron is an algorithm that aims at iteratively
learning the cost of each state-action pair. At each step, it:

• decreases the estimated cost of each state-action pair of the true minimizing path y given by
the training data,

• increases the estimated cost of each state-action pair of the current predicted path y' inferred
from the learned weights.

Remark: there are several versions of the algorithm, one of which simplifies the problem to only
learning the cost of each action a, and another which parametrizes Cost(s,a) with a feature
vector of learnable weights.
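As a rough sketch of one structured perceptron step under the simplest parametrization (one
learnable cost per state-action pair), the update below decreases the costs along the true path
and increases them along the predicted one. The names `costs`, `true_path`, `predicted_path`
and `eta` are hypothetical; the predicted path would come from running a shortest-path search
with the current cost estimates.

```python
def structured_perceptron_step(costs, true_path, predicted_path, eta=1.0):
    """One update step. `costs` maps (state, action) -> estimated cost;
    `true_path` and `predicted_path` are lists of (state, action) pairs."""
    for s, a in true_path:          # make the demonstrated path look cheaper
        costs[(s, a)] = costs.get((s, a), 0.0) - eta
    for s, a in predicted_path:     # make the currently predicted path look costlier
        costs[(s, a)] = costs.get((s, a), 0.0) + eta
    return costs
```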
A* search

❒ Heuristic function – A heuristic is a function h over states s, where each h(s) aims at
estimating FutureCost(s), the cost of the path from s to send.

❒ Algorithm – A* is a search algorithm that aims at finding the shortest path from a state s to
an end state send. It explores states s in increasing order of PastCost(s) + h(s). It is equivalent
to a uniform cost search with edge costs Cost'(s,a) given by:

    Cost'(s,a) = Cost(s,a) + h(Succ(s,a)) − h(s)

Remark: this algorithm can be seen as a biased version of UCS exploring states estimated to be
closer to the end state.

❒ Consistency – A heuristic h is said to be consistent if it satisfies the two following properties:

• For all states s and actions a,

    h(s) ≤ Cost(s,a) + h(Succ(s,a))

• The end state verifies the following:

    h(send) = 0

❒ Correctness – If h is consistent, then A* returns the minimum cost path.

❒ Admissibility – A heuristic h is said to be admissible if we have:

    h(s) ≤ FutureCost(s)

❒ Theorem – Let h(s) be a given heuristic. We have:

    h(s) consistent =⇒ h(s) admissible

❒ Efficiency – A* explores all states s satisfying the following equation:

    PastCost(s) ≤ PastCost(send) − h(s)

Remark: larger values of h(s) are better, since the equation above shows that they restrict the
set of states s that will be explored.
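A minimal A* sketch, assuming a consistent heuristic `h` and the same hypothetical
`succ_and_cost(s)` interface as in the UCS sketch above; the only change from UCS is that the
frontier is ordered by PastCost(s) + h(s).

```python
import heapq
import itertools

def a_star(start, succ_and_cost, is_end, h):
    """Sketch of A*: identical to uniform cost search, but states are popped in
    increasing order of PastCost(s) + h(s); `h` is assumed consistent."""
    counter = itertools.count()
    frontier = [(h(start), 0.0, next(counter), start, [])]
    explored = set()
    while frontier:
        _, past_cost, _, state, path = heapq.heappop(frontier)
        if state in explored:
            continue
        explored.add(state)
        if is_end(state):
            return past_cost, path
        for action, nxt, cost in succ_and_cost(state):
            if nxt not in explored:
                new_cost = past_cost + cost
                heapq.heappush(frontier,
                               (new_cost + h(nxt), new_cost, next(counter), nxt,
                                path + [action]))
    return float("inf"), None
```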
Relaxation

It is a framework for producing consistent heuristics. The idea is to find closed-form reduced
costs by removing constraints and to use them as heuristics.

❒ Relaxed search problem – The relaxation of a search problem P with costs Cost is noted
Prel with costs Costrel, and satisfies the identity:

    Costrel(s,a) ≤ Cost(s,a)

❒ Relaxed heuristic – Given a relaxed search problem Prel, we define the relaxed heuristic
h(s) = FutureCostrel(s) as the minimum cost of a path from s to an end state in the graph of
costs Costrel(s,a). We have:

    h(s) = FutureCostrel(s) =⇒ h(s) consistent

❒ Tradeoff when choosing heuristic – We have to balance two aspects in choosing a heuristic:

• Computational efficiency: removing more constraints makes h(s) = FutureCostrel(s) easier to
compute, ideally in closed form.

• Good enough approximation: the heuristic h(s) should be close to FutureCost(s), so we should
not remove too many constraints.

❒ Max heuristic – Let h1(s), h2(s) be two heuristics. We have the following property:

    h1(s), h2(s) consistent =⇒ h(s) = max{h1(s), h2(s)} consistent
Markov decision processes
In this section, we assume that performing action a from state s can lead to several states s′1, s′2, ...
in a probabilistic manner. In order to find our way between an initial state and an end state,
our objective will be to find the maximum value policy by using Markov decision processes that
help us cope with randomness and uncertainty.
❒ Markov decision process – A Markov decision process (MDP) is defined with:

• a starting state sstart

• possible actions Actions(s) from state s

• transition probabilities T(s,a,s′) from s to s′ with action a

• rewards Reward(s,a,s′) from s to s′ with action a

• whether an end state was reached, IsEnd(s)

• a discount factor 0 ≤ γ ≤ 1

❒ Policy – A policy π is a function that maps each state s to an action a.

❒ Utility – The utility of a path (s0, ..., sk) is the discounted sum of the rewards on that path.
In other words,

    u(s0, ..., sk) = Σ_{i=1..k} ri γ^(i−1) = r1 + γ r2 + γ^2 r3 + ... + γ^(k−1) rk

❒ Q-value – The Q-value of a policy π at state s with action a, noted Qπ(s,a), is the expected
utility from state s after taking action a and then following policy π.

❒ Value of a policy – The value of a policy π from state s, also noted Vπ(s), is the expected
utility obtained by following policy π from state s over random paths. It is defined as follows:

    Vπ(s) = Qπ(s,π(s))

Remark: Vπ(s) is equal to 0 if s is an end state.
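A quick numerical check of the discounted utility defined above; the rewards and discount factor
are made up for illustration.

```python
def utility(rewards, gamma):
    """Discounted utility u = sum_i r_i * gamma**(i-1), with rewards = [r_1, ..., r_k]."""
    return sum(r * gamma ** (i - 1) for i, r in enumerate(rewards, start=1))

print(utility([4, 4, 4], gamma=0.5))   # 4 + 2 + 1 = 7.0
```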
Applications
❒ Policy evaluation – Given a policy π, policy evaluation is an iterative algorithm that computes
Vπ. It is done as follows:

• Initialization: for all states s, we have

    Vπ^(0)(s) ← 0

• Iteration: for t from 1 to TPE, we have

    ∀s, Vπ^(t)(s) ← Σ_{s′∈States} T(s,π(s),s′) [Reward(s,π(s),s′) + γ Vπ^(t−1)(s′)]
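A minimal policy evaluation sketch, assuming a hypothetical MDP encoding where
`transitions[(s, a)]` lists `(next_state, probability, reward)` triples and `policy[s]` gives π(s);
end states are simply states with no outgoing transitions.

```python
def policy_evaluation(states, transitions, policy, gamma, num_iters=100):
    """Iteratively computes V_pi for the given policy."""
    V = {s: 0.0 for s in states}                       # V_pi^(0)(s) <- 0
    for _ in range(num_iters):
        new_V = {}
        for s in states:
            outcomes = transitions.get((s, policy.get(s)), [])
            # sum_s' T(s,pi(s),s') [Reward(s,pi(s),s') + gamma * V(s')]
            new_V[s] = sum(p * (r + gamma * V[s2]) for s2, p, r in outcomes)
        V = new_V
    return V
```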
❒ Optimal Q-value – The optimal Q-value Qopt(s,a) of state s with action a is defined to be
the maximum Q-value attained by any policy, starting from state s and taking action a. It is
computed as follows:

    Qopt(s,a) = Σ_{s′∈States} T(s,a,s′) [Reward(s,a,s′) + γ Vopt(s′)]
❒ Optimal value – The optimal value Vopt(s) of state s is defined as being the maximum value
attained by any policy. It is computed as follows:

    Vopt(s) = max_{a ∈ Actions(s)} Qopt(s,a)
❒ Value iteration – Value iteration is an iterative algorithm that computes the optimal value
Vopt. It is done as follows:

• Initialization: for all states s, we have

    Vopt^(0)(s) ← 0

• Iteration: for t from 1 to TVI, we have

    ∀s, Vopt^(t)(s) ← max_{a ∈ Actions(s)} Qopt^(t−1)(s,a)

with

    Qopt^(t−1)(s,a) = Σ_{s′∈States} T(s,a,s′) [Reward(s,a,s′) + γ Vopt^(t−1)(s′)]
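A minimal value iteration sketch under the same hypothetical MDP encoding as above
(`transitions[(s, a)]` lists `(next_state, probability, reward)` triples, `actions[s]` lists the
actions available in s, empty for end states); it also reads off a greedy policy from the final
values.

```python
def value_iteration(states, actions, transitions, gamma, num_iters=100):
    """Iteratively computes V_opt and a greedy policy."""
    V = {s: 0.0 for s in states}                       # V_opt^(0)(s) <- 0

    def q_opt(s, a):
        # Q_opt(s,a) = sum_s' T(s,a,s') [Reward(s,a,s') + gamma * V(s')]
        return sum(p * (r + gamma * V[s2]) for s2, p, r in transitions[(s, a)])

    for _ in range(num_iters):
        V = {s: max((q_opt(s, a) for a in actions[s]), default=0.0) for s in states}

    # Greedy policy read off from the final values
    pi = {s: max(actions[s], key=lambda a: q_opt(s, a)) for s in states if actions[s]}
    return V, pi
```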
Now, suppose that the transition probabilities T(s,a,s′) and the rewards Reward(s,a,s′) are
unknown; the following methods estimate the relevant quantities directly from experience.

❒ Model-based Monte Carlo – The model-based Monte Carlo method estimates T(s,a,s′) and
Reward(s,a,s′) from observed transitions (s,a,r,s′), with:

    T̂(s,a,s′) = (# times (s,a,s′) occurs) / (# times (s,a) occurs)

and Reward(s,a,s′) estimated by the reward r observed in the corresponding (s,a,r,s′). These
estimations will then be used to deduce Q-values, including Qπ and Qopt.

Remark: model-based Monte Carlo is said to be off-policy, because the estimation does not
depend on the exact policy.

❒ Model-free Monte Carlo – The model-free Monte Carlo method aims at directly estimating
Qπ, as follows:

    Q̂π(s,a) = average of ut where st−1 = s, at = a

where ut denotes the utility starting at step t of a given episode.

Remark: model-free Monte Carlo is said to be on-policy, because the estimated value is dependent
on the policy π used to generate the data.
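A rough sketch of the model-free Monte Carlo estimate, assuming episodes are recorded as lists
of `(state, action, reward)` triples generated by following π.

```python
from collections import defaultdict

def model_free_monte_carlo(episodes, gamma):
    """Estimates Q_pi(s,a) as the average of the utilities u_t observed after (s,a)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for episode in episodes:
        rewards = [r for _, _, r in episode]
        for t, (s, a, _) in enumerate(episode):
            # u_t: discounted utility collected from step t onwards
            u_t = sum(r * gamma ** i for i, r in enumerate(rewards[t:]))
            totals[(s, a)] += u_t
            counts[(s, a)] += 1
    return {sa: totals[sa] / counts[sa] for sa in totals}
```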
❒ SARSA – State-action-reward-state-action (SARSA) is a bootstrapping method estimating
Qπ by using both raw data and estimates as part of the update rule. For each (s,a,r,s′,a′), we
have:

    Q̂π(s,a) ← (1 − η) Q̂π(s,a) + η [r + γ Q̂π(s′,a′)]

Remark: the SARSA estimate is updated on the fly, as opposed to the model-free Monte Carlo
one, where the estimate can only be updated at the end of the episode.

❒ Q-learning – Q-learning is an off-policy algorithm that produces an estimate for Qopt. On
each (s,a,r,s′,a′), we have:

    Q̂opt(s,a) ← (1 − η) Q̂opt(s,a) + η [r + γ max_{a′ ∈ Actions(s′)} Q̂opt(s′,a′)]
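A minimal sketch of one Q-learning update for a single observed `(s, a, r, s')` transition; `Q`
is a dictionary of estimates and `actions` is a hypothetical callable listing the actions
available in a state.

```python
def q_learning_update(Q, s, a, r, s_next, actions, gamma, eta):
    """One Q-learning step: Q(s,a) <- (1-eta) Q(s,a) + eta [r + gamma max_a' Q(s',a')]."""
    target = r + gamma * max((Q.get((s_next, a2), 0.0) for a2 in actions(s_next)),
                             default=0.0)        # max over a' of the current estimate
    Q[(s, a)] = (1 - eta) * Q.get((s, a), 0.0) + eta * target
    return Q
```

The SARSA update would be identical except that the max over a′ is replaced by the estimate for
the action a′ actually taken next.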
Game playing

❒ Game tree – A game tree is a tree that describes the possibilities of a game. In particular,
each node is a decision point for a player and each root-to-leaf path is a possible outcome of the
game.

❒ Two-player zero-sum game – It is a game where each state is fully observed and such that
players take turns. It is defined with:

• a starting state sstart

• possible actions Actions(s) from state s

• successors Succ(s,a) of state s after action a

• whether an end state was reached, IsEnd(s)

• the agent's utility Utility(s) at end state s

• the player Player(s) who controls state s

Remark: we will assume that the utility of the agent has the opposite sign of the one of the
opponent.
❒ Types of policies – There are two types of policies:

• Deterministic policies, noted πp(s), which are actions that player p takes in state s.

• Stochastic policies, noted πp(s,a) ∈ [0,1], which are probabilities that player p takes action
a in state s.

❒ Expectimax – For a given state s, the expectimax value Vexptmax(s) is the maximum expected
utility of any agent policy when playing with respect to a fixed and known opponent policy πopp.
It is computed as follows:

    Vexptmax(s) = Utility(s)                                             if IsEnd(s)
    Vexptmax(s) = max_{a ∈ Actions(s)} Vexptmax(Succ(s,a))               if Player(s) = agent
    Vexptmax(s) = Σ_{a ∈ Actions(s)} πopp(s,a) Vexptmax(Succ(s,a))       if Player(s) = opp

❒ Minimax – The minimax value Vminimax(s) assumes a worst-case opponent, i.e. one playing to
minimize the agent's utility; it is obtained from the recursion above by replacing the expectation
over the opponent's actions with a minimum over them.

Remark: we can extract πmax and πmin from the minimax value Vminimax.

❒ Minimax properties – By noting V the value function, there are 3 properties around
minimax to have in mind:

• Property 1: if the agent were to change its policy to any πagent, then the agent would be
no better off.

    ∀πagent, V(πmax,πmin) ≥ V(πagent,πmin)

• Property 2: if the opponent changes its policy from πmin to πopp, then he will be no
better off.

    ∀πopp, V(πmax,πmin) ≤ V(πmax,πopp)

• Property 3: if the opponent is known to be not playing the adversarial policy, then the
minimax policy might not be optimal for the agent.
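A minimal sketch of the expectimax and minimax recursions above; `is_end`, `utility`, `player`,
`actions`, `succ` and `pi_opp` are hypothetical callables describing the game.

```python
def expectimax(s, is_end, utility, player, actions, succ, pi_opp):
    """Expectimax value of state s against a known stochastic opponent pi_opp(s, a)."""
    if is_end(s):
        return utility(s)
    values = [expectimax(succ(s, a), is_end, utility, player, actions, succ, pi_opp)
              for a in actions(s)]
    if player(s) == "agent":
        return max(values)                                     # agent maximizes
    return sum(pi_opp(s, a) * v for a, v in zip(actions(s), values))  # expectation over opponent

def minimax(s, is_end, utility, player, actions, succ):
    """Minimax value of state s: the opponent is assumed fully adversarial."""
    if is_end(s):
        return utility(s)
    values = [minimax(succ(s, a), is_end, utility, player, actions, succ)
              for a in actions(s)]
    return max(values) if player(s) == "agent" else min(values)
```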
Speeding up minimax

❒ TD learning – Temporal difference (TD) learning is used when we don't know the transitions
and rewards. The value is based on an exploration policy. To be able to use it, we need to know
the rules of the game, Succ(s,a). For each (s,a,r,s′), the update is done as follows:

    w ← w − η [V(s,w) − (r + γ V(s′,w))] ∇w V(s,w)

where V(s,w) is an evaluation function of the state parametrized by weights w.
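A minimal sketch of the TD update for a linear evaluation function V(s,w) = w · φ(s), for which
∇w V(s,w) = φ(s); `phi` is a hypothetical feature map returning a NumPy array.

```python
import numpy as np

def td_update(w, phi, s, r, s_next, gamma, eta):
    """One TD step: w <- w - eta [V(s,w) - (r + gamma V(s',w))] grad_w V(s,w)."""
    v = np.dot(w, phi(s))
    v_next = np.dot(w, phi(s_next))
    return w - eta * (v - (r + gamma * v_next)) * phi(s)
```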
Simultaneous games

Unlike turn-based games, simultaneous games have no ordering on the players' moves.

❒ Single-move simultaneous game – Let there be two players A and B, with given possible
actions. We note V(a,b) to be A's utility if A chooses action a and B chooses action b. V is
called the payoff matrix.

❒ Mixed strategy – A mixed strategy π is a probability distribution over actions:

    ∀a ∈ Actions, 0 ≤ π(a) ≤ 1

❒ Game evaluation – The value of the game V(πA,πB) when player A follows πA and player
B follows πB is such that:

    V(πA,πB) = Σ_{a,b} πA(a) πB(b) V(a,b)

❒ Minimax theorem – With πA, πB ranging over mixed strategies, for every simultaneous
two-player zero-sum game with a finite number of actions, we have:

    max_{πA} min_{πB} V(πA,πB) = min_{πB} max_{πA} V(πA,πB)


Non-zero-sum games

❒ Payoff matrix – We define Vp(πA,πB) to be the utility for player p.

❒ Nash equilibrium – A Nash equilibrium is a pair of policies (π*A, π*B) such that no player
has an incentive to change its strategy unilaterally.

Remark: in any finite-player game with a finite number of actions, there exists at least one Nash
equilibrium.
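To make the game evaluation formula above concrete, here is a minimal sketch computing
V(πA,πB) from a payoff matrix; the payoffs and mixed strategies are made up for illustration.

```python
def game_value(payoff, pi_A, pi_B):
    """V(pi_A, pi_B) = sum_{a,b} pi_A(a) pi_B(b) V(a,b), where payoff[(a, b)] is A's utility."""
    return sum(pi_A[a] * pi_B[b] * v for (a, b), v in payoff.items())

# Example: a small two-action payoff matrix with uniform mixed strategies.
payoff = {("1", "1"): 2, ("1", "2"): -3, ("2", "1"): -3, ("2", "2"): 4}
uniform = {"1": 0.5, "2": 0.5}
print(game_value(payoff, uniform, uniform))   # 0.25 * (2 - 3 - 3 + 4) = 0.0
```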