Unit 4
Learning,
Semi-Supervised Learning
Image Classification
Anomaly Detection
Reinforcement Learning
Reinforcement learning (RL) is a machine learning (ML) technique that trains software
to make decisions that achieve optimal results.
It mimics the trial-and-error learning process that humans use to achieve their goals.
Reinforcement Learning (RL) is the science of decision making. It is about learning the
optimal behavior in an environment to obtain maximum reward.
In Reinforcement Learning, the agent learns automatically from
feedback (rewards and penalties), without any labeled data, unlike supervised learning.
The agent cannot cross the S6 block, as it is a solid wall. If the agent
reaches the S4 block, it gets a +1 reward; if it reaches the fire
pit, it gets a -1 reward. It can take four actions: move up,
move down, move left, and move right.
The agent can take any path to reach the final point, but it should do so
in as few steps as possible. Suppose the agent follows the
path S9-S5-S1-S2-S3; it will then receive the +1 reward.
For the 1st block:
V(s3) = max[R(s,a) + γV(s')], where V(s') = 0 because there is no
further state to move to.
V(s3) = max[R(s,a)] => V(s3) = max[1] => V(s3) = 1
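The same backup can be carried backwards along the rest of the path. Below is a minimal Python sketch of that calculation; the discount factor γ = 0.9 is an assumed value, since the text does not fix it.

```python
# Bellman backup V(s) = max[R(s, a) + gamma * V(s')] applied backwards
# along the example path S9-S5-S1-S2-S3.
gamma = 0.9                       # assumed discount factor
path = ["s9", "s5", "s1", "s2", "s3"]

V = {}
next_value = 0.0                  # beyond s3 there is no further state, so V(s') = 0
for state in reversed(path):
    reward = 1.0 if state == "s3" else 0.0   # +1 only on the final, goal-reaching step
    V[state] = reward + gamma * next_value
    next_value = V[state]

print(V)   # V(s3) = 1.0, V(s2) = 0.9, V(s1) = 0.81, and so on
```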
It’s a bit like learning the rules of a game by playing it many times,
rather than studying its manual.
At the end of each episode, the algorithm looks back at the states
visited and the rewards received to calculate what’s known as the
“return” — the cumulative reward starting from a specific state until
the end of the episode.
Monte Carlo policy evaluation repeatedly simulates episodes, tracking the
total rewards that follow each state and then calculating the average.
These averages give an estimate of the state value under the policy being
followed.
These values are useful because they help us understand which states are
more valuable and thus guide the agent toward better decision-making in
the future.
Over time, as the agent learns the value of different states, it can
refine its policy, favoring actions that lead to higher rewards.
N(s) is the number of times state s is visited across episodes, and
Gi is the return obtained after visiting state s in the i-th such episode.
The Monte Carlo estimate of the state value is simply the average of these returns:
V(s) ≈ (G1 + G2 + ... + GN(s)) / N(s)
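As an illustration, here is a minimal first-visit Monte Carlo policy-evaluation sketch in Python. The episode format (a list of (state, reward) pairs produced by following some fixed policy) and the discount factor are assumptions made for the example.

```python
from collections import defaultdict

def mc_policy_evaluation(episodes, gamma=0.9):
    returns_sum = defaultdict(float)   # running sum of returns Gi per state
    visit_count = defaultdict(int)     # N(s): number of first visits to s

    for episode in episodes:
        # Walk the episode backwards to accumulate the return G at each step
        G = 0.0
        returns = []
        for state, reward in reversed(episode):
            G = reward + gamma * G
            returns.append((state, G))
        returns.reverse()

        # First-visit rule: count each state at most once per episode
        seen = set()
        for state, G in returns:
            if state not in seen:
                seen.add(state)
                returns_sum[state] += G
                visit_count[state] += 1

    # V(s) = average return observed after visiting s
    return {s: returns_sum[s] / visit_count[s] for s in returns_sum}
```

Running it on a batch of simulated episodes yields a dictionary of estimated state values under the policy that generated them.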
Conclusion
Here the agent moves on a probabilistic basis and changes state
accordingly. But if we want exact (deterministic) moves, we need to
work in terms of Q-values.
Q represents the quality of the actions at each state. So instead of
keeping a value for each state, we keep a value for each state-action pair, i.e.,
Q(s, a).
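A tabular Q-function can be represented directly as a mapping from (state, action) pairs to values; the states, actions, and numbers below are purely illustrative.

```python
from collections import defaultdict

# One entry per (state, action) pair instead of one per state.
actions = ["up", "down", "left", "right"]
Q = defaultdict(float)            # Q[(state, action)] defaults to 0.0

Q[("s9", "up")] = 0.73            # illustrative value only

def state_value(state):
    # The value of a state is the best Q-value over its actions
    return max(Q[(state, a)] for a in actions)

print(state_value("s9"))          # 0.73
```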
State-Action-Reward-State-Action
The robot can proceed in numerous directions at each step (these are
the "Actions": what it does). As it travels, the robot receives feedback
in the form of rewards, positive or negative numbers that indicate how
well it is performing.
Explanation of SARSA:
The amazing thing about SARSA is that it doesn't need a map of the
maze or explicit instructions on what to do.
Here, the update equation for SARSA depends on the current state,
current action, reward obtained, next state and next action.
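Written out, with α as the learning rate and γ as the discount factor, this is the standard SARSA update:
Q(s, a) ← Q(s, a) + α [R + γ Q(s', a') − Q(s, a)]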
Code Snippet
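A minimal SARSA sketch in Python is given below; the one-dimensional corridor environment, the ε-greedy exploration, and the hyperparameter values are illustrative assumptions rather than a specific library implementation.

```python
import random

# SARSA on a hypothetical 1-D corridor of 6 states (0..5);
# reaching state 5 gives +1 reward and ends the episode.
N_STATES = 6
ACTIONS = [-1, +1]                       # move left / move right
alpha, gamma, epsilon = 0.1, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def choose_action(state):
    # epsilon-greedy selection with random tie-breaking
    values = [Q[(state, a)] for a in ACTIONS]
    if random.random() < epsilon or values[0] == values[1]:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def env_step(state, action):
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

for episode in range(500):
    state, action = 0, choose_action(0)
    for _ in range(200):                 # cap episode length
        next_state, reward, done = env_step(state, action)
        next_action = choose_action(next_state)
        # SARSA update: uses the quintuple (s, a, r, s', a')
        Q[(state, action)] += alpha * (
            reward + gamma * Q[(next_state, next_action)] - Q[(state, action)]
        )
        state, action = next_state, next_action
        if done:
            break
```

Each update uses the quintuple (s, a, r, s', a'), which is where the algorithm's name comes from.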
Model-Based Reinforcement Learning
The model is used for planning, which means it provides a way to take a
course of action by considering all future situations before actually
experiencing those situations.
Approaches that solve RL problems with the help of such a model are
termed model-based approaches.
With the help of the model, one can make inferences (ideas or conclusions)
about how the environment will behave.
For example, given a state and an action, the model can predict
the next state and reward.
This model can predict the next state and reward given the current
state and action. It helps the agent to simulate future states and
outcomes without direct interaction.
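A minimal sketch of such a model in Python is given below; it assumes a deterministic environment and simply memorizes observed transitions, so the states, actions, and rewards shown are illustrative.

```python
# A learned tabular model: model[(state, action)] -> (next_state, reward).
model = {}

def update_model(state, action, next_state, reward):
    # Deterministic-environment assumption: remember the last observed outcome
    model[(state, action)] = (next_state, reward)

def simulate(state, action):
    # Planning step: query the model instead of the real environment
    return model.get((state, action), (state, 0.0))

# Record one real transition, then reuse it for planning
update_model("s9", "up", "s5", 0.0)
print(simulate("s9", "up"))       # ('s5', 0.0)
```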
Approach in MBRL
1. Model Learning:
Implicit: indirectly learning models, often through latent variable
representations or embeddings.
Explicit: directly learning the dynamics of the environment (e.g., using
neural networks or Gaussian processes).
2. Planning Algorithms:
Planning involves using the model to simulate multiple future scenarios,
enabling the agent to choose actions that maximize long-term rewards
(a small planning sketch follows this list).
3. Hybrid Approach:
Combining MBRL with model-free methods can leverage the strengths of
both approaches.
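As a sketch of the planning step mentioned in point 2, the snippet below uses random shooting: it simulates several candidate action sequences through an assumed model (a function mapping (state, action) to (next_state, reward)) and returns the first action of the best sequence. The action set, horizon, and candidate count are illustrative.

```python
import random

ACTIONS = ["up", "down", "left", "right"]

def plan(model, state, horizon=5, n_candidates=50, gamma=0.9):
    # Simulate candidate action sequences with the model and keep the best one
    best_return, best_first_action = float("-inf"), None
    for _ in range(n_candidates):
        sequence = [random.choice(ACTIONS) for _ in range(horizon)]
        s, total, discount = state, 0.0, 1.0
        for a in sequence:
            s, r = model(s, a)        # simulated transition, no real interaction
            total += discount * r
            discount *= gamma
        if total > best_return:
            best_return, best_first_action = total, sequence[0]
    return best_first_action

# Toy model for illustration: "right" from state i leads to i + 1,
# and the transition into state 3 yields a +1 reward.
def toy_model(state, action):
    if action == "right":
        return state + 1, (1.0 if state + 1 == 3 else 0.0)
    return state, 0.0

print(plan(toy_model, 0))   # chooses the first action of the best simulated sequence
```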