Markov Decision Process (MDP)

• A Markov Decision Process (MDP) is a mathematical framework used for modeling decision-making in environments where outcomes are partly random and partly under the control of a decision-maker. MDPs are widely used in reinforcement learning and operations research to solve sequential decision problems.
Components of an MDP

• An MDP is defined by the tuple (S, A, P, R, γ):
1. States (S): The set of all possible situations the agent can be in. Example: the location of a robot in a grid.
2. Actions (A): The set of all possible actions the agent can take in a state. Example: moving up, down, left, or right.
3. Transition Probabilities (P): The probability of moving from one state to another, given an action: P(s′ | s, a) = the probability of transitioning to state s′ from state s after taking action a.
4. Reward Function (R): The immediate reward received after transitioning from one state to another due to an action: R(s, a, s′) = the reward for transitioning from s to s′ using action a.
5. Discount Factor (γ): A value between 0 and 1 that determines the importance of future rewards:
• γ = 0: the agent considers only immediate rewards.
• γ close to 1: future rewards are weighted almost as heavily as immediate ones.
• The goal is to find a policy (π) that tells the agent the best action to take in each state to maximize the cumulative reward (expected return) over time.
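To make these components concrete, below is a minimal value-iteration sketch for a small hypothetical MDP; the three states, two actions, transition probabilities, rewards, and discount factor are all invented for illustration.

```python
import numpy as np

# Hypothetical MDP with 3 states and 2 actions (illustrative numbers only).
n_states, n_actions = 3, 2
# P[a][s][s'] = probability of moving from state s to s' under action a.
P = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],  # action 0
    [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]],  # action 1
])
# R[a][s] = expected immediate reward for taking action a in state s.
R = np.array([
    [0.0, 0.0, 1.0],   # action 0
    [0.0, 0.5, 1.0],   # action 1
])
gamma = 0.9            # discount factor

# Value iteration: repeatedly apply the Bellman optimality update.
V = np.zeros(n_states)
for _ in range(1000):
    Q = R + gamma * (P @ V)     # Q[a][s] = R(s, a) + γ Σ_s' P(s'|s, a) V(s')
    V_new = Q.max(axis=0)       # best achievable value in each state
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=0)       # greedy policy with respect to the converged values
print("State values:", V)
print("Policy (best action per state):", policy)
```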
Hidden Markov Model (HMM)
• The Hidden Markov Model (HMM) is a statistical model used to describe the probabilistic relationship between a sequence of observations and a sequence of hidden states. It is often used in situations where the underlying system or process that generates the observations is unknown or hidden, hence the name "Hidden Markov Model."
• It is used to predict future observations or classify
sequences, based on the underlying hidden process that
generates the data.
• An HMM consists of two types of variables: hidden states
and observations.
• The hidden states are the underlying variables that
generate the observed data, but they are not directly
observable.

• The observations are the variables that are measured and observed.
• The relationship between the hidden states and the observations is modeled using a probability distribution. The HMM captures this relationship with two sets of probabilities: the transition probabilities and the emission probabilities.
• The transition probabilities describe the probability
of transitioning from one hidden state to another.

• The emission probabilities describe the probability of observing an output given a hidden state.
Algorithm Steps
• Step 1: Define the state space and observation space
• The state space is the set of all possible hidden states, and
the observation space is the set of all possible observations.

• Step 2: Define the initial state distribution
• This is the probability distribution over the initial state.

• Step 3: Define the state transition probabilities
• These are the probabilities of transitioning from one hidden state to another; together they form the transition matrix.
• Step 4: Define the observation likelihoods
• These are the probabilities of generating each observation from each hidden state; together they form the emission matrix.
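Steps 1–4 amount to writing down a few arrays. Below is a minimal sketch for a hypothetical two-state weather HMM; the state names, observation symbols, and probabilities are invented for illustration.

```python
import numpy as np

# Step 1: state space and observation space (hypothetical example).
states = ["Rainy", "Sunny"]                 # hidden states
observations = ["walk", "shop", "clean"]    # observable symbols

# Step 2: initial state distribution, pi[i] = P(state_0 = i).
pi = np.array([0.6, 0.4])

# Step 3: transition matrix, A[i, j] = P(state_{t+1} = j | state_t = i).
A = np.array([
    [0.7, 0.3],   # Rainy -> Rainy, Rainy -> Sunny
    [0.4, 0.6],   # Sunny -> Rainy, Sunny -> Sunny
])

# Step 4: emission matrix, B[i, k] = P(observation_t = k | state_t = i).
B = np.array([
    [0.1, 0.4, 0.5],   # Rainy: walk, shop, clean
    [0.6, 0.3, 0.1],   # Sunny: walk, shop, clean
])

# pi and every row of A and B are probability distributions, so they sum to 1.
assert np.isclose(pi.sum(), 1)
assert np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=1), 1)
```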

• Step 5: Train the model
• The state transition probabilities and observation likelihoods are estimated with the Baum-Welch algorithm, an expectation-maximization procedure built on the forward-backward algorithm. The parameters are updated iteratively until convergence.
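In practice the training step is usually delegated to a library. The sketch below assumes the third-party hmmlearn package (its CategoricalHMM class in recent releases) and an invented observation sequence.

```python
import numpy as np
from hmmlearn import hmm   # assumes hmmlearn is installed (pip install hmmlearn)

# Observed sequence encoded as integer symbols (0 = walk, 1 = shop, 2 = clean).
obs = np.array([0, 1, 2, 2, 1, 0, 0, 2, 1, 2]).reshape(-1, 1)

# Two hidden states; fit() runs Baum-Welch (EM) until convergence or n_iter is reached.
model = hmm.CategoricalHMM(n_components=2, n_iter=100, random_state=0)
model.fit(obs)

print("Estimated transition matrix:\n", model.transmat_)
print("Estimated emission matrix:\n", model.emissionprob_)
print("Estimated initial distribution:", model.startprob_)
```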
• Step 6: Decode the most likely sequence of
hidden states
• Given the observed data, the Viterbi algorithm is used
to compute the most likely sequence of hidden states.
This can be used to predict future observations, classify
sequences, or detect patterns in sequential data.
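The decoding step can also be written directly in a few lines of NumPy. Below is a compact Viterbi sketch, reusing the toy weather parameters from the earlier sketch (all numbers illustrative).

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Return the most likely hidden-state sequence for an observation sequence."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))           # delta[t, i]: best path probability ending in state i at t
    psi = np.zeros((T, N), dtype=int)  # psi[t, i]: predecessor of state i on that best path
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A   # scores[i, j] = delta[t-1, i] * A[i, j]
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    # Backtrack from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

# Toy parameters (same hypothetical weather HMM as above).
pi = np.array([0.6, 0.4])                          # Rainy, Sunny
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])
obs = [0, 1, 2]                                    # walk, shop, clean
print(viterbi(obs, pi, A, B))   # -> [1, 0, 0], i.e. Sunny, Rainy, Rainy
```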

• Step 7: Evaluate the model
• The performance of the HMM can be evaluated using various metrics, such as accuracy, precision, recall, or F1 score.
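If ground-truth hidden states are available for a labelled test sequence, the decoded states can be scored directly. A minimal sketch, assuming scikit-learn is available and using invented state sequences:

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical ground-truth and decoded hidden-state sequences.
true_states    = [1, 0, 0, 1, 1, 0]
decoded_states = [1, 0, 1, 1, 1, 0]

print("Accuracy:", accuracy_score(true_states, decoded_states))
print("F1 score:", f1_score(true_states, decoded_states))
```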
Exploration and Exploitation
• Exploration and exploitation are complementary strategies for building effective learning algorithms that can adapt and perform well in different environments.
Exploitation

Exploitation is a strategy of using the accumulated knowledge to make decisions that maximize the expected reward based on the present information.
1. Reward Maximization: Maximizing the immediate or short-term reward based on the current understanding of the environment is the main objective of exploitation. This means choosing the actions whose learned values the model predicts will yield the highest expected payoff.
2. Decision Efficiency: Exploitation can often make more efficient decisions by
concentrating on known high-reward actions, which lowers the computational and
temporal costs associated with exploration.

3. Risk Aversion: Exploitation inherently involves a lower level of risk as it relies on tried and tested actions, avoiding the uncertainty associated with less familiar options.
Techniques
• Greedy Algorithms: Greedy algorithms choose the locally optimal solution at each step without considering the potential impact on the overall solution. They are often efficient in terms of computation time; however, this approach may be suboptimal when short-term sacrifices are required to achieve the best global solution.

• Exploitation of Learned Policies: Reinforcement learning algorithms often act according to previously learned policies as a way of leveraging earlier gains. This means picking the action that produced high rewards in similar past experiences.

• Model-Based Methods: Model-based approaches take advantage of an underlying model of the environment and make decisions based on its predictive capabilities.
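As a concrete illustration of pure exploitation, a greedy agent simply picks the action with the highest learned value. A minimal sketch with an invented action-value (Q) table:

```python
import numpy as np

# Hypothetical learned action-value table: Q[state, action].
Q = np.array([
    [0.2, 0.8, 0.1],   # state 0
    [0.5, 0.3, 0.9],   # state 1
])

def greedy_action(state: int) -> int:
    """Pure exploitation: always take the action with the highest learned value."""
    return int(Q[state].argmax())

print(greedy_action(0))   # -> 1
print(greedy_action(1))   # -> 2
```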
Exploration
• Exploration is used to increase knowledge about an environment or model. The exploration process selects actions with uncertain outcomes to gather information about the possible states and rewards that the performed actions may lead to.
1. Information Gain: The main objective of exploration is to gather fresh data that improves the model's understanding of the environment. This involves exploring distinct regions of the state space or experimenting with actions whose outcomes are unknown.

2. Uncertainty Reduction: Reducing uncertainty in the model's estimates of the environment guides which actions are selected. For example, actions that have rarely been chosen in the past may be prioritized because their possible rewards are still uncertain.

3. State Space Coverage: In certain models, especially those with large or continuous state
spaces, exploration makes sure that enough different areas of the state space are visited to
prevent learning that is biased toward a small number of experiences.
Techniques
• Epsilon-Greedy Exploration: Epsilon-greedy algorithms unify the two behaviours (exploitation and exploration) by choosing a completely random action with probability epsilon while using the current best-known action with probability (1 - epsilon).
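A minimal epsilon-greedy action-selection sketch; the action values and epsilon below are invented for illustration.

```python
import random

EPSILON = 0.1   # exploration rate (hypothetical value)

# Hypothetical learned action values for a single state.
q_values = {"up": 0.2, "down": 0.7, "left": 0.1, "right": 0.4}

def epsilon_greedy(q_values: dict, epsilon: float) -> str:
    """With probability epsilon pick a random action, otherwise the best-known one."""
    if random.random() < epsilon:
        return random.choice(list(q_values))    # explore
    return max(q_values, key=q_values.get)      # exploit

print(epsilon_greedy(q_values, EPSILON))   # usually "down", occasionally a random action
```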

• Thompson Sampling: Thompson sampling uses a Bayesian approach to explore and exploit simultaneously. It maintains a probability distribution over the unknown parameters (for example, each action's reward probability), samples from that distribution, and acts on the sample, so actions are chosen roughly in proportion to how likely they are to be optimal, balancing exploration and exploitation.
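A minimal Thompson-sampling sketch for a hypothetical Bernoulli bandit; the true success probabilities below are invented and would be unknown to the agent in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
true_probs = [0.3, 0.5, 0.7]   # hidden reward probabilities of 3 arms (illustrative)
successes = np.ones(3)          # Beta posterior parameters alpha (start from Beta(1, 1))
failures = np.ones(3)           # Beta posterior parameters beta

for _ in range(1000):
    # Sample a plausible reward probability for each arm from its posterior.
    samples = rng.beta(successes, failures)
    arm = int(samples.argmax())           # act greedily with respect to the sampled beliefs
    reward = rng.random() < true_probs[arm]
    # Update the chosen arm's posterior with the observed outcome.
    successes[arm] += reward
    failures[arm] += 1 - reward

print("Posterior mean per arm:", successes / (successes + failures))
# The best arm (index 2) ends up pulled most often, so its estimate is the sharpest.
```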
Balancing Exploitation and
Exploration
• Exploration-Exploitation Trade-off: The central idea is to understand the trade-off between the exploration and exploitation processes. Resources should be allocated to each depending on the current state of knowledge and the complexity of the learning task.

• Dynamic Parameter Tuning: The algorithm dynamically adjusts its exploration and exploitation parameters according to how the model is performing and how the environment's characteristics change, so that it adapts to the changing environment while still learning efficiently.
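One common form of dynamic tuning is to decay the exploration rate over time. A minimal sketch; the schedule constants are invented for illustration.

```python
# Exponentially decay epsilon from 1.0 (explore a lot) toward 0.05 (mostly exploit).
EPS_START, EPS_END, DECAY = 1.0, 0.05, 0.995   # hypothetical schedule constants

epsilon = EPS_START
for episode in range(2000):
    # ... run one episode of epsilon-greedy learning with the current epsilon ...
    epsilon = max(EPS_END, epsilon * DECAY)    # reduce exploration as knowledge accumulates

print(f"Final epsilon after 2000 episodes: {epsilon:.3f}")
```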
