Markov Decision Process (MDP)
A Markov Decision Process (MDP) is a mathematical framework for modeling sequential decision-making under uncertainty. Its core components include:
1. States (S):
A finite set of states representing the environment's different situations.
Example: In a grid world, each cell (position) can be a state.
2. Actions (A):
A finite set of actions available to the agent in each state.
Example: In a grid world, the actions could be "up," "down," "left," and "right." (Both sets are sketched in code after this list.)
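To make these two components concrete, here is a minimal Python sketch of the state and action sets for a small grid world. The grid size and the (row, col) encoding of states are illustrative assumptions, not part of the definitions above.

```python
# Minimal sketch of the state and action sets for a grid world.
# The 3x3 grid size and (row, col) state encoding are assumptions for illustration.

GRID_ROWS, GRID_COLS = 3, 3

# S: every cell (position) in the grid is a state.
states = [(r, c) for r in range(GRID_ROWS) for c in range(GRID_COLS)]

# A: the same four moves are available in every state.
actions = ["up", "down", "left", "right"]

print(f"|S| = {len(states)} states, |A| = {len(actions)} actions")
```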
Goal of an MDP
The goal of an MDP is to find a policy \( \pi \) that maximizes the expected
cumulative reward over time. A policy is a function that maps states to actions, \(
\pi: S \rightarrow A \).
- Optimal Policy (\( \pi^* \)): The policy that yields the maximum expected cumulative reward
starting from any state \( s \).
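Written out, the objective that the optimal policy maximizes is the expected discounted return. The discount factor \( \gamma \in [0, 1) \) and the per-step reward \( R_{t+1} \) are standard MDP ingredients assumed here; they are not listed among the components above.

\[
\pi^* = \arg\max_{\pi} \; \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} R_{t+1} \,\middle|\, \pi \right]
\]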
Bellman Equations
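As a sketch of what this heading refers to, the standard Bellman optimality equation for the state-value function is given below, assuming transition probabilities \( P(s' \mid s, a) \) and a reward function \( R(s, a) \) (standard MDP components not listed among those above):

\[
V^{*}(s) = \max_{a \in A} \left[ R(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^{*}(s') \right]
\]

Value Iteration, described in the next section, repeatedly applies this equation as an update rule until the values stop changing.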
Solving MDPs
1. Dynamic Programming:
Methods like Policy Iteration and Value Iteration are used to compute the
optimal policy. These methods rely on the Bellman equations to iteratively
improve the value functions until convergence (a minimal sketch of Value
Iteration follows below).
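The following minimal Python sketch shows how Value Iteration applies the Bellman optimality equation on a tiny, hand-specified MDP. The particular states, actions, transition table, rewards, discount factor, and convergence threshold are illustrative assumptions, not taken from the text above.

```python
# Minimal Value Iteration sketch for a tiny, hand-specified MDP.
# The states, actions, transitions P, rewards R, discount gamma, and
# threshold theta are all illustrative assumptions.

states = ["s0", "s1"]
actions = ["a0", "a1"]

# P[(s, a)] -> list of (next_state, probability); R[(s, a)] -> immediate reward.
P = {
    ("s0", "a0"): [("s0", 0.9), ("s1", 0.1)],
    ("s0", "a1"): [("s1", 1.0)],
    ("s1", "a0"): [("s0", 1.0)],
    ("s1", "a1"): [("s1", 1.0)],
}
R = {("s0", "a0"): 0.0, ("s0", "a1"): 1.0, ("s1", "a0"): 0.0, ("s1", "a1"): 2.0}

gamma = 0.9   # discount factor
theta = 1e-6  # convergence threshold

V = {s: 0.0 for s in states}  # initialize the value function to zero

while True:
    delta = 0.0
    for s in states:
        # Bellman optimality backup: best one-step lookahead over actions.
        q_values = [
            R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)])
            for a in actions
        ]
        new_v = max(q_values)
        delta = max(delta, abs(new_v - V[s]))
        V[s] = new_v
    if delta < theta:  # stop once no state's value changes noticeably
        break

# Extract a greedy policy from the converged value function.
policy = {
    s: max(actions,
           key=lambda a: R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)]))
    for s in states
}
print("V* =", V)
print("pi* =", policy)
```

Policy Iteration differs in that it alternates a full policy-evaluation step with a greedy policy-improvement step instead of the single value sweep shown here, but both methods converge to the same optimal policy.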
Summary