Lecture 25
Sequential Decision Problems
Today, in sequential decision problems, we will look at how utility is computed for a sequence
of actions, and we will see how we can go about making decisions under such a scenario.
(Refer Slide Time: 01:45)
Sequential decision problems are concerned with making decisions when a sequence of actions is involved.
Now if you think of the search algorithms that we have discussed previously, we were looking for
a sequence of actions that could lead to a good state for us. However, here there is a basic
difference: instead of returning a sequence of actions, what would be returned by a sequential
decision problem is a policy, a set of situation-action rules, one for each state.
And that policy is arrived at by calculating utilities for each state.
(Refer Slide Time: 02:36)
So let us look at an example to understand sequential decision problems. Here is a 4 x 3 grid,
and an agent needs to start at (1, 1); ideally it must reach the state marked with +1. The
problem terminates when either of the goal states, given by +1 or -1, is reached. This example that
we have taken here today for discussion is from the book by Russell and Norvig, chapter 17, pages
498 to 507.
Now the possible actions for this agent in this scenario are to go up, down, left or right. So the
agent needs to execute a sequence of actions, and we assume that the environment is fully
observable, that is, the agent always knows where it is.
(Refer Slide Time: 03:57)
In a deterministic version of this problem, if I am interested in getting to the goal, the objective
would be to get the maximum reward. Each action moves the agent one square in the intended
direction, except when it moves into a wall, resulting in no change in its position. In a
deterministic version, if I know the initial state and the effects of my actions, then this problem
can be directly solved by the search algorithms that we have described before in this course.
Now this is true regardless of whether the environment is accessible or inaccessible. In
such a scenario the solution for this sequential decision problem is to move up once and
thereafter move to the right three times, which brings me to the goal. Now let us look at a
stochastic version of the same problem. Let us assume that the agent goes forward with a
probability of 0.8, and the rest of the time it goes at right angles, to the cell on its right or on
its left, each with probability 0.1.
And like the previous case, if my agent bumps into the wall it stays in place. Now under such a
scenario one cannot create a plan ahead of time; in fact, if I ask whether my old plan would
succeed, I can compute the probability of the old plan succeeding. Because the
actions are unreliable, each action has the intended effect only with a certain probability, and that is 0.8
when the agent goes forward and 0.1 when it goes at right angles.
So if I now compute the probability of success, the direct route marked in red succeeds with
probability 0.8^5 = 0.32768, while a very unlikely other route marked in blue, which goes around the
obstacle by accident, contributes 0.1^4 x 0.8 = 0.00008, so the total probability of success is 0.32776.
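As a quick check of this arithmetic, here is a small Python sketch, assuming (as in the textbook figure) that the red route is the direct five-step plan and that the blue route reaches the goal through four sideways slips followed by one intended move:

# Probability that the pre-committed plan [Up, Up, Right, Right, Right] succeeds.
p_red = 0.8 ** 5           # every step goes in the intended direction: 0.32768
p_blue = 0.1 ** 4 * 0.8    # the unlikely route around the obstacle: 0.00008
print(p_red + p_blue)      # approximately 0.32776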
(Refer Slide Time: 06:52)
Now we should understand that in the stochastic version we were talking of, the actions are
unreliable: each action achieves the intended effect with a probability of 0.8, but the rest
of the time the action moves the agent at right angles to the intended direction, and therefore we
will need to use a transition model. This transition model refers to the set of probabilities
associated with the possible transitions between states after any given action.
The transition model is a specification of the outcome probabilities for each action, in each
possible state. In our case the transition model is given as T(s, a, s'), which denotes the
probability of reaching the state s' if I do an action a in the state s. So under such a
transition model I would now know the likelihood of a given action in a given
state leading to a new state.
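One simple way to picture such a transition model is as a table keyed by (state, action) pairs. The sketch below shows only the entries for the state (1, 1) of our grid, assuming the usual behaviour that a move into a wall leaves the agent in place; the names T and transition_prob are illustrative choices, not from the lecture:

# A transition model as a nested dictionary: T[(state, action)] is a
# dictionary mapping successor states to probabilities (they sum to 1).
T = {
    ((1, 1), "Up"):    {(1, 2): 0.8, (2, 1): 0.1, (1, 1): 0.1},
    ((1, 1), "Right"): {(2, 1): 0.8, (1, 2): 0.1, (1, 1): 0.1},
    # ... one entry per (state, action) pair of the grid
}

def transition_prob(s, a, s_prime):
    """P(s' | s, a) under the tabular model above."""
    return T.get((s, a), {}).get(s_prime, 0.0)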
(Refer Slide Time: 08:25)
A utility function would need to be specified for the agent in order to determine the value of an
action. Now in our earlier discussion each agent was involved in a single decision, and when I
was looking for the utility I was computing it based on the final result of a given action.
Here the problem is sequential, and therefore the utility function depends not on a single state that
I arrive at but on a sequence of states; under such a scenario it is trickier to compute the
utility function.
In our case, except for the terminal states, which are marked with +1 or -1, no other state has been
given an indication of what its utility is. So when we base the utility function on a sequence of
states we need to look at the history, or what we call the environment history, rather than on a
single state. We do it in the following way: rewards are assigned to the states, that is, each state
returns a reward, which is given as R(s).
For this example we can assume the following: the reward for all states except for the goal
states is -0.04. Then, if we look at the utility function, the utility function is the sum of the rewards
of all the states visited so far. So for example, if my agent reaches (4, 3) in 10 steps, then the total
utility would be +1 plus the rewards of the 10 steps it has taken, each of which is -0.04. So 0.4 would
be deducted from 1 and I would have a total utility of 0.6.
I can also make provisions for negative rewards as an incentive to stop interacting as quickly
as possible. After we have looked at the rewards we can now deal with action sequences. We
could look at action sequences as long actions themselves, and then apply the basic maximum
expected utility principle to the complete sequence.
(Refer Slide Time: 11:11)
And the rational action would then be the first action of the optimal sequence. But there is a
flaw in this approach, even though it is closely related to the way that search algorithms work: for a
stochastic environment, the flaw is that here I assume the agent commits to an entire sequence
of actions before executing it. Now in a case where the agent has no sensors this is the best it can
do. But let us assume that the agent can acquire new sensory information.
Then committing to an entire sequence is actually irrational, because after I reach a state, the new
percepts that I have may enable me to take a better decision rather than committing to a sequence
of actions a priori. So even if this idea of treating action sequences as long actions and
just using maximum expected utility looks attractive, it is not the best thing to do. In reality, as
I have been emphasizing, the agent will have the opportunity to choose a new action after each
step.
(Refer Slide Time: 12:44)
Given whatever new information it gets from the sensors, we therefore need an approach which
is more like conditional planning algorithms rather than search algorithms. This will of course
have to be extended to handle probabilities and utilities, and we must deal with the fact that the
conditional plan for a stochastic environment may have to be infinite in size, because one
needs to understand that, though very unlikely, there is a possibility that an agent may get stuck in
one place or in a loop no matter how hard it tries not to, when dealing with a stochastic
environment.
So when we are dealing with action sequences we would rather use an approach more like
conditional planning instead of searching for a sequence. In an accessible environment, the
agent's percepts at each step will identify the state the agent is in.
(Refer Slide Time: 14:02)
If it can calculate an optimal action for each state, then that will completely determine its
behavior. A complete mapping from states to actions is what I need, and this is called a policy.
Given a policy, it is possible to calculate the expected utility of the possible environment
histories generated by the policy, and the problem is therefore not to calculate the optimal action
sequence; what I am more interested in, in sequential decision problems, is to calculate the optimal
policy.
That is, the policy that results in the highest expected utility. So instead of looking at the complete
sequence of actions as one long action and trying to commit to that sequence a priori, we
rather look at a mapping from states to actions, and we are interested in finding such a policy.
The problem of calculating an optimal policy in an accessible, stochastic environment with a
known transition model is called a Markov decision problem.
(Refer Slide Time: 15:37)
Now it is very important for us to realize that a Markov decision problem is about getting to an
optimal policy in an accessible, stochastic environment, where the transition model, that is,
the probability of changing from one state to another given an action in the previous
state, is known to me. Markov's work is very closely associated with the assumption of
accessibility, and decision problems are therefore often divided into what are called Markov and
non-Markov decision problems.
Closely related to the concept of a Markov decision process is the Markov property. The
Markov property holds if the transition probabilities from any given state depend only on that
state and not on the previous history. In our case, if the transition model T that we have assumed
depends only on the state s and not on the rest of the history, then we say it satisfies the Markov
property.
(Refer Slide Time: 17:14)
A Markov decision process is the specification of a sequential decision problem for a fully
observable environment that satisfies the Markov assumption. It is defined as a 4-tuple (S, A, T, R),
where I have the states S, the actions A, the transition model T, which is in fact a table of
probabilities of reaching s' given action a in state s, and the rewards R; recall that the reward
at every state is the cost or the reward of being in state s.
So this 4-tuple, under a fully observable environment satisfying the Markov
assumption and yielding additive rewards, is a Markov decision process. Now for the example
problem with which we started our discussion today, let us look at what these elements of the
4-tuple are.
(Refer Slide Time: 18:25)
So we have S, the state of the agent on the grid; it could be like (4, 3), the cell denoted by (x, y)
being the state of the agent. We have the actions: in our case, as already noted, the agent can go up,
down, left and right. We then have the transition model, so we could say things like: being in (4, 3),
the probability of the next state being (3, 3) under the action Up is 0.1. So I would have a table that
gives me the probability of s' given action a in state s.
And finally I have the rewards; as already mentioned, for our example, except for the goal
states the reward is assumed to be -0.04.
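Putting these four elements together, here is a minimal Python sketch of how the 4 x 3 world could be written down as an MDP, assuming (as in the textbook figure) that the blocked cell is at (2, 2); the names GOALS, OBSTACLE, states, actions, transition and reward are illustrative choices, not from the lecture:

# The 4 x 3 grid world as a Markov decision process (S, A, T, R).
GOALS = {(4, 3): +1.0, (4, 2): -1.0}      # terminal states and their rewards
OBSTACLE = (2, 2)                          # blocked cell

states = [(x, y) for x in range(1, 5) for y in range(1, 4) if (x, y) != OBSTACLE]
actions = ["Up", "Down", "Left", "Right"]
MOVES = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}
SLIPS = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
         "Left": ("Up", "Down"), "Right": ("Up", "Down")}   # right-angle slips

def move(s, a):
    """Deterministic effect of action a in state s (a blocked move stays put)."""
    x, y = s
    dx, dy = MOVES[a]
    nxt = (x + dx, y + dy)
    return nxt if nxt in states else s

def transition(s, a):
    """T(s, a, .): a dict of successor states and their probabilities."""
    if s in GOALS:                          # terminal states do not transition further
        return {s: 1.0}
    dist = {}
    for prob, direction in [(0.8, a), (0.1, SLIPS[a][0]), (0.1, SLIPS[a][1])]:
        s2 = move(s, direction)
        dist[s2] = dist.get(s2, 0.0) + prob
    return dist

def reward(s):
    """R(s): -0.04 for every non-terminal state."""
    return GOALS.get(s, -0.04)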
(Refer Slide Time: 19:35)
So, having formulated the problem as a Markov decision problem, we need to look at the
solution. Since the outcomes of actions are not deterministic, a fixed sequence of actions cannot be a
solution here. A solution needs to specify what the agent should do for any state that it might
reach. So we are looking for a policy, and the policy is denoted by pi; what the policy does is
recommend an action for a given state.
So pi(s) is the action recommended by the policy pi for the state s. Now one needs
to remember that the environment is stochastic, so each time a given policy is
executed there can be different environment histories, and the quality of a policy that I need to
evaluate is determined by the expected utility of the possible environment histories that
are generated by that policy.
(Refer Slide Time: 20:57)
And finally I have what is called an optimal policy, that is, a policy that would yield the highest
expected utility. An optimal policy is denoted by pi*, and once pi* has been computed for a
problem, the agent, identifying the state s that it is in, would consult the optimal policy for the
next action to execute. So what I am saying is, in a sequential decision problem, once we have
somehow arrived at the optimal policy, whenever the agent needs to take an action it will consult
the optimal policy for the next action to execute at the given state.
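As a small illustration of this execute-by-lookup idea, here is a sketch of such an agent loop, assuming pi_star is a Python dictionary mapping states to actions and reusing transition and reward helpers like the ones sketched earlier (the function and parameter names are hypothetical, not from the lecture):

import random

def run_policy(pi_star, transition, reward, start, terminals):
    """Execute a policy from start until a terminal state; return the total reward."""
    s, total = start, 0.0
    while s not in terminals:
        total += reward(s)
        a = pi_star[s]                      # consult the policy at the current state
        dist = transition(s, a)             # stochastic outcome of the chosen action
        s = random.choices(list(dist), weights=list(dist.values()))[0]
    return total + reward(s)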
So here is an example of the optimal policy for the problem that we were discussing. This is
under a reward of -0.04 for all non-terminal states. Now one thing that I want to make explicit
here is that if you look at (3, 1), even though the shortest path to the goal passes close to the -1
state on the way to +1, the optimal policy here takes the long route. This is because the cost
of taking a step in our case is fairly small compared to the penalty of ending up in -1 by accident.
And therefore the optimal policy takes a longer route to the goal rather than a shorter one
with the possibility of incurring the penalty by falling into -1; in order to avoid this, the optimal
policy rather takes the long route.
(Refer Slide Time: 23:24)
Once we have the environment, the utility function for environment histories, or sequences of
states, can be computed in two ways. One is called additive rewards: the additive rewards
method is about summing up the rewards of the states that comprise the environment history. The
other method is called discounted rewards, which is the sum of progressively discounted
rewards of the states, that is,
U([s0, s1, s2, ...]) = R(s0) + gamma R(s1) + gamma^2 R(s2) + ...,
where the discount factor gamma is a number between 0 and 1. Now the closer gamma is to 0, the
less future rewards count, and note that when gamma = 1 the discounted reward is the same
as the additive reward. So this is how I assign utility to state sequences; now let us look at the
utilities for our example problem.
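The following small Python sketch shows this utility measure for a state sequence, assuming a reward function like the one sketched earlier; with gamma = 1 it reduces to the additive case:

def discounted_utility(history, reward, gamma=1.0):
    """Sum of progressively discounted rewards over a sequence of states."""
    return sum(gamma ** t * reward(s) for t, s in enumerate(history))

# With gamma = 1, a history of 10 non-terminal states followed by the +1 goal
# gives 10 * (-0.04) + 1.0 = 0.6, matching the earlier calculation.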
(Refer Slide Time: 24:54)
So here are the utilities of the states in our 4 x 3 world. For the non-terminal states, with gamma = 1
and the reward being -0.04, we have computed the utilities as shown here. Note that the
utilities of states closer to (4, 3) are higher, because we need fewer steps to reach the exit from
those states.
(Refer Slide Time: 25:32)
Having looked at the optimal policy and the utilities of states, the question now is about choosing
between policies, that is, how do I rank the policies? The value of a policy is the expected sum of
the discounted rewards obtained, where the expectation is taken over all possible state sequences
that could occur. So the pi* that I have is the policy that maximizes this value; there are many
state sequences to consider, and as before we would like to maximize the expected utility when
choosing between policies. Value iteration is an algorithm that allows me to compute the optimal policy.
(Refer Slide Time: 26:28)
The key insight in this algorithm is that the utility of a state is its immediate reward plus the
discounted expected utility of the next states. So this is the key insight of the value iteration
algorithm; the basic idea is to calculate the utility of each state and then use the state utilities
to select an optimal action.
(Refer Slide Time: 27:05)
So the utility of a state is the expected utility of the state sequences that might
follow it. Here we have U^pi(s), the utility of a state s when the policy pi is executed starting from s;
we are interested in this expected utility as the discounted sum of the rewards. So here the reward
R(s) is the short-term reward for being in the state s, whereas the utility U(s) is the long-term total
reward from s onwards.
(Refer Slide Time: 27:50)
So pi* selects the action that maximizes the expected utility of the subsequent state, and the
Bellman equation defines the utility of a state as its reward plus the discounted expected utility of
the next state, assuming the optimal action is chosen:
U(s) = R(s) + gamma max over a of [ sum over s' of T(s, a, s') U(s') ].
(Refer Slide Time: 28:16)
So for our example problem, if I want to evaluate the utility of (1, 1) using the Bellman equation,
then I have its own reward plus the discount factor times the maximum over the actions; I could
either do an Up, a Left, a Down or a Right. Now when we plug in the numbers from the
utilities for the optimal policy that we have discussed before, we find that Up is the optimal action
in (1, 1); that comes out very clearly for us.
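Written out in full, the expansion behind this comparison, which is the standard expansion given in Russell and Norvig for the 4 x 3 world, is:

U(1,1) = -0.04 + gamma * max[ 0.8 U(1,2) + 0.1 U(2,1) + 0.1 U(1,1),   (Up)
                              0.9 U(1,1) + 0.1 U(1,2),                 (Left)
                              0.9 U(1,1) + 0.1 U(2,1),                 (Down)
                              0.8 U(2,1) + 0.1 U(1,2) + 0.1 U(1,1) ]   (Right)

and with the utilities shown on the slide, the Up term is the largest.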
(Refer Slide Time: 29:03)
So using the Bellman equation, if there are n possible states, then there are n Bellman equations,
one for each state, and to compute the n utilities we would like to solve the n Bellman equations
simultaneously. Now this is problematic, because in the Bellman equation the max function is not
a linear operator. So what is done is that we apply an iterative approach, and in that approach we
replace the equality of the Bellman equation by an assignment.
So we use iteration, applying the Bellman update: instead of an equality I now have an assignment,
U(s) <- R(s) + gamma max over a of [ sum over s' of T(s, a, s') U(s') ].
We start with the utilities of all states initialized to 0 and apply this iterative process, which is
guaranteed to converge.
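Here is a minimal sketch of value iteration in Python for the 4 x 3 world, reusing the states, actions, transition, reward and GOALS names from the earlier sketch (those names, and the stopping threshold epsilon, are illustrative choices, not from the lecture):

def value_iteration(states, actions, transition, reward, gamma=1.0, epsilon=1e-6):
    """Repeatedly apply the Bellman update until the utilities stop changing."""
    U = {s: 0.0 for s in states}               # start with all utilities at 0
    while True:
        delta, U_new = 0.0, {}
        for s in states:
            if s in GOALS:                      # terminal states keep their reward
                u = reward(s)
            else:
                best = max(sum(p * U[s2] for s2, p in transition(s, a).items())
                           for a in actions)
                u = reward(s) + gamma * best    # Bellman update
            delta = max(delta, abs(u - U[s]))
            U_new[s] = u
        U = U_new
        if delta < epsilon:
            return U

def best_policy(U, states, actions, transition):
    """Extract a policy by one-step look-ahead on the computed utilities."""
    return {s: max(actions,
                   key=lambda a: sum(p * U[s2] for s2, p in transition(s, a).items()))
            for s in states if s not in GOALS}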
(Refer Slide Time: 30:11)
Another alternative to value iteration for finding an optimal policy is policy iteration. Policy
iteration searches the policy space. The basic idea is to start with a random policy and calculate
the utilities of the states if that policy were executed, and then calculate a new policy by choosing
in each state the action with the maximum expected utility based on the computed utilities; that is
called policy improvement. We then iterate until the policy does not change.
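A corresponding sketch of policy iteration, again reusing the helpers from the earlier sketches, might look as follows; for brevity the policy-evaluation step here is itself a short iteration (a simplified variant) rather than an exact solution of the linear equations:

import random

def policy_iteration(states, actions, transition, reward, gamma=1.0, eval_sweeps=30):
    """Alternate policy evaluation and policy improvement until the policy is stable."""
    pi = {s: random.choice(actions) for s in states if s not in GOALS}
    U = {s: 0.0 for s in states}
    while True:
        # Policy evaluation: approximate the utilities of the current policy.
        for _ in range(eval_sweeps):
            for s in states:
                if s in GOALS:
                    U[s] = reward(s)
                else:
                    U[s] = reward(s) + gamma * sum(
                        p * U[s2] for s2, p in transition(s, pi[s]).items())
        # Policy improvement: greedy one-step look-ahead on the evaluated utilities.
        unchanged = True
        for s in pi:
            best_a = max(actions,
                         key=lambda a: sum(p * U[s2]
                                           for s2, p in transition(s, a).items()))
            if best_a != pi[s]:
                pi[s], unchanged = best_a, False
        if unchanged:
            return pi, U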
(Refer Slide Time: 30:50)
So methods used for Markov decision problems are not directly applicable to partially
observable Markov decision problems.
(Refer Slide Time: 31:39)
The correct approach for partially observable Markov decision problems is actually to calculate a
probability distribution over the possible states, given all previous percepts, and to base decisions
on this distribution. Now this seems simple enough, but it is not, because in partially observable
Markov decision problems calculating the utility of an action in a state is made difficult by the
fact that actions will cause the agent to obtain new percepts.
These percepts will in turn cause the agent's beliefs to change in complex ways, and therefore we can
see that partially observable Markov decision problems include the value of information, which we
discussed when we were talking of reasoning under uncertainty, as a component of the
decision problem. Now the standard method of solving a partially observable Markov decision
problem is to construct a new Markov decision problem in which this probability distribution
plays the role of the state variable.
(Refer Slide Time: 33:01)
However, the resulting Markov decision problem is not easy to solve; this is because the state
space is characterized by real-valued probabilities, which makes it infinite. Exact solution
methods for partially observable Markov decision problems require some fairly advanced tools
and are beyond the scope of this fundamental course on artificial intelligence. Instead of exact
solutions, what one can often obtain is a good approximation using limited look-ahead, as we had
done while discussing game playing.
We shall revisit decision networks and see how this approach can be realized for partially
observable Markov decision problems. Thank you very much.