Markov Decision Process (MDP)
• Transition Probability: the probability of moving from state 𝑠 to
state 𝑠′ using action 𝑎.
• Discount Factor (γ):
• A value between 0 and 1 that determines the
importance of future rewards:
• γ=0: the agent considers only immediate rewards.
• γ close to 1: the agent places high importance on future rewards.
• The goal is to find a policy (π) that tells the agent the
best action to take in each state to maximize the
cumulative reward (expected return) over time.
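The role of the discount factor can be sketched as a small computation. This is an illustrative example (the reward sequence is hypothetical, not from the text): it shows how γ weights a sequence of rewards into a single return G = Σ γᵗ·rₜ.

```python
def discounted_return(rewards, gamma):
    """Compute the discounted cumulative reward of one episode."""
    g = 0.0
    for t, r in enumerate(rewards):
        g += (gamma ** t) * r  # each later reward is scaled by gamma^t
    return g

rewards = [1.0, 1.0, 1.0]               # hypothetical reward sequence
print(discounted_return(rewards, 0.0))  # gamma=0: only the immediate reward counts
print(discounted_return(rewards, 1.0))  # gamma=1: all rewards count equally
```

With γ=0 the return is just the first reward (1.0); with γ=1 every reward contributes fully (3.0), matching the two extremes described above.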
Hidden Markov Model (HMM)
• The Hidden Markov Model (HMM) is a statistical model
that is used to describe the probabilistic relationship
between a sequence of observations and a sequence of
hidden states. It is often used in situations where the
underlying system or process that generates the
observations is unknown or hidden, hence it has the
name “Hidden Markov Model.”
• It is used to predict future observations or classify
sequences, based on the underlying hidden process that
generates the data.
• An HMM consists of two types of variables: hidden states
and observations.
• The hidden states are the underlying variables that
generate the observed data, but they are not directly
observable.
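The relationship between hidden states and observations can be sketched with the forward algorithm, which sums over all hidden-state paths to score an observation sequence. The states, observations, and probabilities below are illustrative assumptions, not from the text.

```python
# Hypothetical HMM: hidden weather states emit observable activities.
states = ["Rainy", "Sunny"]                     # hidden states
start_p = {"Rainy": 0.6, "Sunny": 0.4}          # initial distribution
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

def forward(obs):
    """Return P(observation sequence) under the model above."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: sum(alpha[p] * trans_p[p][s] for p in states) * emit_p[s][o]
                 for s in states}
    return sum(alpha.values())

print(forward(["walk", "shop", "clean"]))
```

Because the hidden states are never observed directly, the forward pass marginalizes over every possible state sequence; this is the quantity an HMM uses to classify or score sequences.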
3. State Space Coverage: In certain models, especially those with large or continuous state
spaces, exploration ensures that enough distinct regions of the state space are visited to
prevent the learned policy from being biased toward a small number of experiences.
Techniques
• Epsilon-Greedy Exploration: Epsilon-greedy algorithms balance
the two characteristics (exploitation and exploration) by
choosing a completely random action with probability epsilon
while continuing to use the current best-known action with
probability (1 - epsilon).
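The epsilon-greedy rule can be sketched in a few lines. This is a minimal illustration (the function name and value estimates are assumptions): with probability ε a random action is explored, otherwise the action with the highest estimated value is exploited.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """q_values: estimated value per action; returns an action index."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))        # explore: random action
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

q = [0.1, 0.5, 0.3]                  # hypothetical action-value estimates
print(epsilon_greedy(q, epsilon=0.0))  # epsilon=0 always exploits -> index 1
```

In practice ε is often decayed over training, so the agent explores heavily early on and exploits its learned estimates later.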