Reinforcement Learning
By Shweta Saxena
Types of machine learning
[Figure: Reinforcement learning. An agent (computer program) sends an action to the environment; figure labels: State, Action, Reward.]
Reinforcement Learning
• Art of optimal decision making
• Reinforcement learning is a type of machine learning in which an
intelligent agent (a computer program) interacts with an environment and
learns to act within it.
• In RL an agent learns by trial and error using feedback from its own actions
and experiences.
• A robotic dog learning to move its limbs is an example of reinforcement
learning.
• RL solves a specific type of problem where decision making is sequential and
the goal is long-term.
• Game-playing
• Robotics, etc.
Reinforcement Learning
• The figure below illustrates the action-reward feedback loop of a
generic RL model.
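The feedback loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `CoinFlipEnv`, `run_loop`, and the `always_heads` policy are hypothetical names invented here, not part of any library, and the policy is fixed, so the sketch shows the loop itself rather than learning.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: the agent guesses a biased coin flip."""

    def __init__(self, p_heads=0.8, seed=0):
        self.rng = random.Random(seed)
        self.p_heads = p_heads

    def reset(self):
        # This toy problem has a single dummy state, labelled 0.
        return 0

    def step(self, action):
        # The environment reacts to the action with a new state and a reward.
        outcome = 1 if self.rng.random() < self.p_heads else 0
        reward = 1 if action == outcome else -1
        next_state = 0
        return next_state, reward


def run_loop(env, policy, steps=100):
    """Run the loop: state -> action -> (next state, reward), repeated."""
    state = env.reset()
    total_reward = 0
    for _ in range(steps):
        action = policy(state)            # the agent acts
        state, reward = env.step(action)  # the environment gives feedback
        total_reward += reward            # the agent accumulates reward
    return total_reward


def always_heads(state):
    """A fixed, non-learning policy: always guess heads (1)."""
    return 1


total_reward = run_loop(CoinFlipEnv(), always_heads, steps=100)
print("total reward over 100 steps:", total_reward)
```

A learning agent would use the reward signal to change its policy between steps; here the policy is kept fixed so that the loop structure stays visible.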
Reinforcement Learning
• The above image shows a robot, a diamond, and fire.
• The goal of the robot is to reach the reward (the diamond) and
avoid the hurdles (the fire).
• The robot learns by trying the possible paths and then choosing
the path that reaches the reward with the fewest hurdles.
• Each right step gives the robot a reward, and each wrong step
subtracts from the robot's reward.
• The total reward is calculated when the robot reaches the final reward,
the diamond.
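The robot example can be made concrete with a small grid. The sketch below is an assumption-laden illustration: the 3x4 layout, the wall cell, the reward values (+1 for the diamond, -1 for fire, 0 elsewhere), and the names `GRID`, `find`, and `path_reward` are all hypothetical choices made here, not taken from the slides. It only scores two hand-written paths; it does not search or learn.

```python
# Hypothetical 3x4 grid: S = start, D = diamond, F = fire, # = wall, . = free
GRID = [
    ". . . D".split(),
    ". # . F".split(),
    "S . . .".split(),
]
REWARDS = {"D": 1.0, "F": -1.0}  # assumed values; every other cell gives 0
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def find(symbol):
    """Return the (row, column) of the first cell holding `symbol`."""
    for r, row in enumerate(GRID):
        for c, cell in enumerate(row):
            if cell == symbol:
                return (r, c)
    raise ValueError(f"{symbol!r} not found in grid")


def path_reward(moves):
    """Follow a list of moves from S and return the total reward collected."""
    r, c = find("S")
    total = 0.0
    for move in moves:
        dr, dc = MOVES[move]
        nr, nc = r + dr, c + dc
        inside = 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0])
        if inside and GRID[nr][nc] != "#":
            r, c = nr, nc  # a blocked move leaves the robot where it is
        cell = GRID[r][c]
        total += REWARDS.get(cell, 0.0)
        if cell in ("D", "F"):  # the diamond and the fire both end the episode
            break
    return total


safe_path = ["up", "up", "right", "right", "right"]   # ends at the diamond
risky_path = ["right", "right", "right", "up", "up"]  # walks into the fire

print("safe path reward:", path_reward(safe_path))
print("risky path reward:", path_reward(risky_path))
```

Comparing the totals of candidate paths is the simplest version of "choosing the path that gives the reward with the fewest hurdles"; a real RL agent would discover good paths from experience instead of being handed them.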
Main points in Reinforcement learning
1. Input: The input should be an initial state from which the model will
start
2. Output: There are many possible outputs as there are a variety of
solutions to a particular problem
3. Training:
• The training is based upon the input.
• The model returns a state, and the user decides whether to reward or punish the
model based on its output.
• The model continues to learn.
• The best solution is decided based on the maximum reward.
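The training points above can be illustrated with a trial-and-error loop over a handful of candidate actions. This is a minimal sketch, not the method the slides prescribe: the epsilon-greedy rule, the running-average estimate, and the names `train`, `noisy_reward`, and `MEANS` are assumptions introduced for illustration, and the reward function stands in for the "user" who rewards or punishes each output.

```python
import random


def train(reward_fn, n_actions, episodes=200, epsilon=0.1, seed=0):
    """Try actions repeatedly and keep a running average reward for each."""
    rng = random.Random(seed)
    counts = [0] * n_actions
    averages = [0.0] * n_actions
    for t in range(episodes):
        if t < n_actions:
            action = t                          # try every action once first
        elif rng.random() < epsilon:
            action = rng.randrange(n_actions)   # explore: pick a random action
        else:
            # exploit: pick the action with the best average so far
            action = max(range(n_actions), key=lambda a: averages[a])
        reward = reward_fn(action, rng)         # feedback: reward or punishment
        counts[action] += 1
        # incremental update of the average reward for this action
        averages[action] += (reward - averages[action]) / counts[action]
    # the best solution is the action with the maximum estimated reward
    best = max(range(n_actions), key=lambda a: averages[a])
    return best, averages


MEANS = [0.2, 0.5, 0.9]  # hypothetical average reward of each action


def noisy_reward(action, rng):
    """Hypothetical feedback: the action's mean reward plus small noise."""
    return MEANS[action] + rng.uniform(-0.1, 0.1)


best_action, estimates = train(noisy_reward, n_actions=3)
print("best action:", best_action)
print("estimated rewards:", [round(e, 2) for e in estimates])
```

The occasional random choice keeps the agent exploring, which matters whenever early feedback happens to favor an action that is not actually the best.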
Terms used in Reinforcement Learning
• Agent(): An entity that can perceive/explore the environment and act upon it.
• Environment(): The situation in which an agent is present or by which it is surrounded. In RL, we assume
a stochastic environment, which means it is random in nature.
• Action(): Actions are the moves taken by an agent within the environment.
• State(): The situation returned by the environment after each action taken by the agent.
• Reward(): Feedback returned to the agent from the environment to evaluate the agent's
action.
• Policy(): The strategy the agent applies to choose the next action based on the current state.
• Value(): The expected long-term return with the discount factor, as opposed to the
short-term reward.
• Q-value(): Mostly similar to the value, but it takes one additional parameter: the current
action (a).
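A short sketch may help separate reward, value, and Q-value. The discount factor is written `gamma` here, and the state names, action names, and numbers in the `Q` table are hypothetical values chosen only for illustration.

```python
def discounted_return(rewards, gamma=0.9):
    """Long-term return: r0 + gamma*r1 + gamma^2*r2 + ... for one episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# A Q-value is indexed by (state, action); a value is indexed by state only.
# Hypothetical Q-table for a single state "s0" with two possible actions.
Q = {("s0", "left"): 0.2, ("s0", "right"): 0.7}


def greedy_action(q_table, state):
    """A simple policy: in `state`, pick the action with the highest Q-value."""
    candidates = {a: q for (s, a), q in q_table.items() if s == state}
    return max(candidates, key=candidates.get)


def greedy_state_value(q_table, state):
    """Value of `state` if the agent then follows the greedy policy."""
    return max(q for (s, _), q in q_table.items() if s == state)


# The short-term reward of the first step is 0, yet the long-term return
# is positive because a reward arrives two steps later.
print("return:", discounted_return([0, 0, 1], gamma=0.9))
print("greedy action in s0:", greedy_action(Q, "s0"))
print("greedy value of s0:", greedy_state_value(Q, "s0"))
```

With `gamma` close to 0 the agent cares mostly about the immediate reward; with `gamma` close to 1 later rewards count almost as much as immediate ones.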
Reinforcement learning and Supervised
learning
• Both supervised and reinforcement learning use mapping between
input and output.
• In supervised learning, the feedback provided to the agent is the correct set
of actions for performing a task.
• Reinforcement learning uses rewards and punishments as signals for positive
and negative behavior.
• The goal in unsupervised learning is to find similarities and differences
between data points.
• In reinforcement learning, the goal is to find a suitable action model
that maximizes the total cumulative reward of the agent.
Difference between Reinforcement learning
and Supervised learning
Reinforcement learning Supervised learning