Unit - 5 Reinforcement Learning
Example: We have an agent and a reward, with many hurdles in between. The agent is supposed
to find the best possible path to reach the reward. The following example illustrates the
problem.
The image above shows a robot, a diamond and fire. The goal of the robot is to get the reward,
which is the diamond, while avoiding the hurdles, which are the fire. The robot learns by trying
the possible paths and then choosing the path that gives it the reward with the fewest hurdles.
Each right step gives the robot a reward and each wrong step subtracts from the robot's reward.
The total reward is calculated when the robot reaches the final reward, the diamond.
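To make the reward bookkeeping concrete, here is a minimal Python sketch on a hypothetical 3 x 4 grid. The layout and the reward values are assumptions, since the notes give no numbers. It follows one common convention: every ordinary move costs 1 point, stepping into fire costs 10, and reaching the diamond earns 10. Under these values, the shortest path that avoids fire collects the highest total.

```python
# Hypothetical 3x4 grid: R = robot start, F = fire, D = diamond, . = free cell
GRID = [
    "R..F",
    ".F..",
    "...D",
]

# Assumed reward values (the notes do not specify any numbers)
STEP_REWARD = -1      # small cost for every ordinary move
FIRE_REWARD = -10     # stepping into fire is a "wrong step"
DIAMOND_REWARD = 10   # reaching the diamond ends the episode

MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def find(symbol):
    """Return the (row, col) of the first cell containing symbol."""
    for r, row in enumerate(GRID):
        c = row.find(symbol)
        if c != -1:
            return r, c
    raise ValueError(f"{symbol!r} not on grid")


def total_reward(path):
    """Follow a string of moves from the start and sum the rewards.

    A move that would leave the grid keeps the robot in place but still
    costs STEP_REWARD. Returns the total as soon as the diamond is reached,
    or None if the path ends before reaching it.
    """
    r, c = find("R")
    total = 0
    for move in path:
        dr, dc = MOVES[move]
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]):
            r, c = nr, nc
        cell = GRID[r][c]
        if cell == "D":
            return total + DIAMOND_REWARD
        total += FIRE_REWARD if cell == "F" else STEP_REWARD
    return None


for p in ["DDRRR", "RDDRR", "RLDDRRR"]:
    print(p, total_reward(p))
```

With these assumed values, "DDRRR" avoids the fire and totals 6, "RDDRR" walks through fire and totals -3, and the detour "RLDDRRR" totals 4. Comparing the totals therefore picks out the best path.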
Input: An initial state from which the model starts.
Output: There are many possible outputs, as there are a variety of solutions to a particular
problem.
Training: The training is based on the input. The model returns a state, and the user
decides whether to reward or punish the model based on its output.
The model continues to learn.
The best solution is decided based on the maximum reward.
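The training loop described above can be sketched with tabular Q-learning, one standard reinforcement learning algorithm. The notes do not name a specific algorithm, so this choice is an assumption. In this sketch the environment's reward function plays the role of rewarding or punishing each move instead of a human user. The grid, the reward values and the hyperparameters (alpha, gamma, epsilon, number of episodes) are all illustrative assumptions.

```python
import random

# Hypothetical grid: R = start (0, 0), F = fire, D = diamond
GRID = ["R..F", ".F..", "...D"]
ROWS, COLS = len(GRID), len(GRID[0])
START = (0, 0)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < ROWS and 0 <= nc < COLS):
        nr, nc = r, c  # bumping into a wall leaves the robot in place
    cell = GRID[nr][nc]
    if cell == "D":
        return (nr, nc), 10, True  # reward: reaching the diamond ends the episode
    # punishment: fire costs 10, any other move costs 1 (fire is not terminal here)
    return (nr, nc), (-10 if cell == "F" else -1), False


def best_action(q, state):
    """Index of the action with the highest learned value in this state."""
    return max(range(len(ACTIONS)), key=lambda a: q[state][a])


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn Q-values by trial and error with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(ROWS) for c in range(COLS)}
    for _ in range(episodes):
        state = START
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))  # explore a random move
            else:
                action = best_action(q, state)  # exploit what was learned
            nxt, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=20):
    """Follow the highest-valued action from the start; return visited cells."""
    state, path = START, [START]
    for _ in range(max_steps):
        state, _, done = step(state, best_action(q, state))
        path.append(state)
        if done:
            break
    return path


print(greedy_path(train()))
```

After training, greedy_path follows the highest-valued action in each cell. On this grid it should return a five-move path from (0, 0) to the diamond at (2, 3) that avoids both fire cells. Several such paths tie for the maximum reward, so which one is printed may depend on the seed.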
Difference between Reinforcement learning and Supervised learning:
Types of Reinforcement: There are two types of reinforcement:
1. Positive –
Positive reinforcement occurs when an event that follows a particular behavior increases
the strength and frequency of that behavior. In other words, it has a positive effect on
the behavior.
o Maximizes performance
o Sustains change for a long period of time
o Too much reinforcement can lead to an overload of states, which can diminish the
results
2. Negative –
Negative reinforcement is defined as the strengthening of a behavior because a negative
condition is stopped or avoided.
o Increases behavior
o Provides a defense against falling below a minimum standard of performance
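One way to see both types of reinforcement in code is the gradient-bandit preference update from Sutton and Barto's Reinforcement Learning: An Introduction. It is swapped in here for illustration: an action's preference rises when the reward it produced beats a baseline. Treating the baseline as "what the agent was already receiving" is an assumption of this sketch. Under that assumption, positive reinforcement is a reward added above a neutral baseline, and negative reinforcement is an ongoing penalty (baseline -1) that stops (reward 0). The action names are hypothetical.

```python
import math


def softmax(prefs):
    """Turn action preferences into action probabilities."""
    top = max(prefs.values())  # subtract the max for numerical stability
    exps = {a: math.exp(h - top) for a, h in prefs.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def reinforce(prefs, action, reward, baseline, alpha=0.5):
    """One gradient-bandit update of the preferences after taking `action`.

    If reward > baseline the chosen action becomes more likely;
    if reward < baseline it becomes less likely.
    """
    probs = softmax(prefs)
    advantage = reward - baseline
    updated = {}
    for a, h in prefs.items():
        if a == action:
            updated[a] = h + alpha * advantage * (1 - probs[a])
        else:
            updated[a] = h - alpha * advantage * probs[a]
    return updated


# Positive reinforcement: a reward (+1) is added on top of a neutral baseline (0).
positive = softmax(reinforce({"wait": 0.0, "move": 0.0}, "move",
                             reward=1.0, baseline=0.0))

# Negative reinforcement: the agent was losing 1 point per step (baseline -1);
# the hypothetical "step_away" action stops that penalty (reward 0).
negative = softmax(reinforce({"stay": 0.0, "step_away": 0.0}, "step_away",
                             reward=0.0, baseline=-1.0))

print(round(positive["move"], 2), round(negative["step_away"], 2))  # prints 0.62 0.62
```

In both cases the reinforced action's probability rises from 0.5 to about 0.62 after a single update. This matches the definitions above: adding a pleasant outcome and removing an unpleasant one both strengthen the behavior.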