Unit-4 of AI
MDP FORMULATION:
Reinforcement Learning:
Reinforcement Learning is a type of Machine
Learning. It allows machines and software agents to
automatically determine the ideal behavior within a
specific context in order to maximize their
performance. Simple reward feedback is required for
the agent to learn its behavior; this is known as the
reinforcement signal.
There are many different algorithms that tackle this
issue. In fact, Reinforcement Learning is defined by a
specific type of problem, and all its solutions are
classed as Reinforcement Learning algorithms. In this
problem, an agent must decide the best action to
select based on its current state. When this step is
repeated, the problem is known as a Markov Decision
Process.
A Markov Decision Process (MDP) model contains a
set of states, actions, a transition model and rewards.
Its main elements are described below:
What is a State?
A State is a set of tokens that represent every state
that the agent can be in.
What is a Model?
A Model (sometimes called Transition Model) gives
an action’s effect in a state. In particular, T(S, a, S’)
defines a transition T where being in state S and
taking an action ‘a’ takes us to state S’ (S and S’ may
be the same).
What is a Policy?
A Policy is a solution to the Markov Decision
Process. A policy is a mapping from S to a. It
indicates the action ‘a’ to be taken while in state S.
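For concreteness, here is a tiny sketch in Python of how these pieces might be represented; the states, actions, transition probabilities and rewards below are purely illustrative and not taken from any particular problem.

    # A hypothetical two-state MDP, written with plain dictionaries.
    states = ["S0", "S1"]
    actions = ["stay", "move"]

    # Transition model: T[s][a] maps each next state S' to P(S' | S, a)
    T = {
        "S0": {"stay": {"S0": 1.0}, "move": {"S1": 0.8, "S0": 0.2}},
        "S1": {"stay": {"S1": 1.0}, "move": {"S0": 0.8, "S1": 0.2}},
    }

    # Reward for taking action a in state S
    R = {("S0", "stay"): 0.0, ("S0", "move"): 1.0,
         ("S1", "stay"): 0.0, ("S1", "move"): 1.0}

    # A policy maps each state S to the action a to take in that state
    policy = {"S0": "move", "S1": "stay"}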
UTILITY THEORY:
Utility theory offers a framework for making
decisions in situations of uncertainty by assigning
utilities (values) to several possible outcomes. It is
very useful for modelling and optimising decision-
making processes that involve uncertain and
probabilistic outcomes.
In artificial intelligence (AI), utility theory aims to
represent and measure the choices and preferences
of an intelligent entity (agent). It can be applied in
various areas of AI, such as game theory,
reinforcement learning and decision making. A utility
function is used in utility theory to represent an
agent's preferences: it maps potential outcomes or
states to numerical values expressing how desirable
they are to the agent.
UTILITY FUNCTION:
In AI, a utility function is a mathematical
representation of an agent's preferences. It takes an
input (such as a state or an outcome) and returns a
numerical value, the utility, representing the agent's
satisfaction with or preference for that input, so that
decision-making can favor outcomes with higher
utility values.
It helps AI systems make decisions by quantifying
the desirability of different actions or states,
allowing the system to choose the option that
maximizes its expected utility.
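As a minimal sketch (the outcomes, probabilities and utility values below are made up for illustration), an agent can compare actions by their expected utility and pick the best one:

    # Hypothetical utilities assigned to possible outcomes
    utility = {"win": 100, "draw": 20, "lose": -50}

    # Each action leads to outcomes with some probability
    action_outcomes = {
        "aggressive": {"win": 0.5, "lose": 0.5},
        "cautious":   {"win": 0.2, "draw": 0.7, "lose": 0.1},
    }

    def expected_utility(action):
        # Expected utility = sum over outcomes of P(outcome) * U(outcome)
        return sum(p * utility[o] for o, p in action_outcomes[action].items())

    best_action = max(action_outcomes, key=expected_utility)
    print(best_action, expected_utility(best_action))

Here "aggressive" has expected utility 0.5*100 + 0.5*(-50) = 25, while "cautious" gives 0.2*100 + 0.7*20 + 0.1*(-50) = 29, so the agent chooses the cautious action.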
VALUE ITERATION:
Value Iteration is an algorithm used in
Reinforcement Learning and Markov Decision
Processes (MDPs) to compute the optimal policy
for an agent. It focuses on finding the optimal value
function, which assigns a value to each state,
representing the maximum cumulative reward an
agent can achieve starting from that state.
Here’s a step-by-step explanation of how it works (a
short code sketch follows the steps):
1. Initialize Value Function: Start with an arbitrary
value for each state (usually zero).
2. Iterative Update:
o For each state, compute the maximum
expected reward by considering all possible
actions and their outcomes (using the
transition probabilities and rewards).
o Update the value of the state based on this
computation.
3. Convergence:
o Repeat the updates until the values stabilize
(i.e., the difference between consecutive
updates is smaller than a predefined
threshold).
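A short Python sketch of value iteration, assuming the same dictionary-based MDP representation as in the earlier example (T[s][a] maps next states to probabilities and R[(s, a)] gives the immediate reward; gamma is the discount factor and theta the stopping threshold, both illustrative names):

    def value_iteration(states, actions, T, R, gamma=0.9, theta=1e-6):
        # 1. Initialize the value of every state to zero
        V = {s: 0.0 for s in states}
        while True:
            delta = 0.0
            for s in states:
                # 2. Bellman update: best expected return over all actions
                best = max(
                    R[(s, a)] + gamma * sum(p * V[s2] for s2, p in T[s][a].items())
                    for a in actions
                )
                delta = max(delta, abs(best - V[s]))
                V[s] = best
            # 3. Convergence: stop once no value changed by more than theta
            if delta < theta:
                return V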
POLICY ITERATION:
Policy Iteration is an algorithm used in
Reinforcement Learning and Markov Decision
Processes (MDPs) to find the optimal policy for an
agent. Unlike Value Iteration, which focuses on
directly refining the value function, Policy Iteration
alternates between two steps: policy evaluation
and policy improvement.
Here’s how it works (a code sketch follows the steps):
1. Policy Evaluation:
o Start with an initial policy (which can be
arbitrary).
o Evaluate how "good" this policy is by
calculating the value function for each
state, assuming the agent follows this
policy.
2. Policy Improvement:
o Using the value function from the policy
evaluation step, update the policy by
selecting the action in each state that leads
to the maximum expected reward.
o This creates a new, improved policy.
3. Repeat Until Convergence:
o Alternate between policy evaluation and
policy improvement until the policy stops
changing. At this point, you have the
optimal policy.
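A compact Python sketch of policy iteration under the same assumed MDP representation (the structure of the evaluation loop and the variable names are illustrative):

    def policy_iteration(states, actions, T, R, gamma=0.9, theta=1e-6):
        # Start from an arbitrary initial policy
        policy = {s: actions[0] for s in states}
        V = {s: 0.0 for s in states}
        while True:
            # 1. Policy evaluation: compute V for the current policy
            while True:
                delta = 0.0
                for s in states:
                    a = policy[s]
                    v = R[(s, a)] + gamma * sum(p * V[s2] for s2, p in T[s][a].items())
                    delta = max(delta, abs(v - V[s]))
                    V[s] = v
                if delta < theta:
                    break
            # 2. Policy improvement: pick the best action in each state
            stable = True
            for s in states:
                best_a = max(actions,
                             key=lambda a: R[(s, a)] + gamma *
                             sum(p * V[s2] for s2, p in T[s][a].items()))
                if best_a != policy[s]:
                    policy[s] = best_a
                    stable = False
            # 3. Stop when the policy no longer changes
            if stable:
                return policy, V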
PARTIALLY OBSERVABLE MDPS:
Partially Observable Markov Decision Processes
(POMDPs) are an extension of Markov Decision
Processes (MDPs) used to model decision-making in
situations where the agent doesn't have full visibility
or certainty about the environment's current state.
They are particularly useful in real-world scenarios
where an agent must act under uncertainty.
The agent's goal in a POMDP is to develop a policy
that determines the best action to take based on its
belief about the current state. This belief is
represented as a probability distribution over all
possible states, updated as new observations are
made.
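A minimal sketch of such a belief update in Python (essentially a Bayes filter over states), assuming hypothetical tables T[s][a] for transition probabilities and O[s][o] for observation probabilities:

    def update_belief(belief, action, observation, T, O):
        # belief: {state: probability}; T[s][a][s'] = P(s' | s, a); O[s'][o] = P(o | s')
        new_belief = {}
        for s2 in belief:
            # Predict: probability of reaching s2 after taking the action
            predicted = sum(belief[s] * T[s][action].get(s2, 0.0) for s in belief)
            # Correct: weight by how likely the observation is in s2
            new_belief[s2] = O[s2][observation] * predicted
        total = sum(new_belief.values())
        # Normalize so the belief is again a probability distribution
        # (assumes the observation is possible in at least one state)
        return {s: p / total for s, p in new_belief.items()}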
Key elements of a POMDP:
1. States: A set of all possible states the
environment can be in (similar to MDPs).
2. Actions: The set of actions the agent can take to
interact with the environment.
3. Transition Probabilities: Probabilities that
taking an action in a given state will lead to a
specific new state.
4. Observations: Instead of directly observing the
state, the agent receives observations that
provide partial information about the true state.
5. Observation Probabilities: The likelihood of
receiving a specific observation in a given state.
6. Rewards: Numerical values representing the
immediate benefit of taking an action in a
particular state.
Applications
POMDPs are widely used in areas where uncertainty
is inherent, such as:
• Robot Navigation: Robots may not have full
knowledge of their surroundings due to sensor
limitations.
• Medical Diagnosis: Doctors making decisions
based on incomplete or noisy patient data.
• Speech Recognition: Understanding spoken
words when the input contains ambiguity or
noise.