Unit 5 ML
Digital Notes
[Department of Computer Science Engineering]
Course : B.TECH
Branch : CSE 3rd Yr
Subject Name: Machine Learning Techniques
(BCS055)
Prepared by: Mr. Abhishek Singh Sengar
UNIT 5
• REINFORCEMENT LEARNING–Introduction to
Reinforcement Learning, Learning Task, Example of
Reinforcement Learning in Practice, Learning Models
for Reinforcement – (Markov Decision process, Q
Learning - Q Learning function, Q Learning Algorithm)
• Application of Reinforcement Learning, Introduction
to Deep Q Learning. GENETIC ALGORITHMS:
Introduction, Components, GA cycle of reproduction,
Crossover, Mutation, Genetic Programming, Models
of Evolution and Learning, Applications
• Reinforcement Learning (RL) is a branch of
machine learning that focuses on how agents
can learn to make decisions through trial and
error to maximize cumulative rewards.
• RL allows machines to learn by interacting
with an environment and receiving feedback
based on their actions. This feedback comes in
the form of rewards or penalties.
• Reinforcement Learning revolves around the idea that an agent
(the learner or decision-maker) interacts with an environment
to achieve a goal. The agent performs actions and receives
feedback to optimize its decision-making over time.
• Agent: The decision-maker that performs actions.
• Environment: The world or system in which the agent operates.
• State: The situation or condition the agent is currently in.
• Action: The possible moves or decisions the agent can make.
• Reward: The feedback or result from the environment based
on the agent’s action.
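The agent–environment loop described above can be sketched in a few lines of Python. The environment here is a made-up example (a 5-cell corridor where only reaching the last cell yields a reward); the class name and dynamics are illustrative assumptions, not part of the notes.

```python
import random

# Toy environment: a corridor of cells 0..4. The agent starts in cell 0
# and receives a reward of +1 only when it reaches cell 4 (the goal).
class CorridorEnv:
    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):          # action: 0 = left, 1 = right
        self.state = max(0, min(4, self.state + (1 if action == 1 else -1)))
        done = self.state == 4       # terminal state ends the episode
        reward = 1.0 if done else 0.0
        return self.state, reward, done

env = CorridorEnv()
state = env.reset()
total_reward, done = 0.0, False
while not done:
    action = random.choice([0, 1])   # a random policy, just for illustration
    state, reward, done = env.step(action)
    total_reward += reward
```

Each pass through the loop is one agent–environment interaction: observe the state, choose an action, and receive the reward and next state as feedback.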
How Does Reinforcement Learning Work?
• 1. Positive Reinforcement
• Positive reinforcement occurs when an event, presented as a consequence of a
particular behavior, increases the strength and frequency of that behavior. In
other words, it has a positive effect on the behavior.
• Advantages: Maximizes performance, helps sustain change over time.
• Disadvantages: Overuse of rewards can saturate the agent with reinforced
states, which may reduce their effectiveness.
• 2. Negative Reinforcement
• Negative reinforcement is defined as the strengthening of a behavior because a
negative condition is stopped or avoided.
• Advantages: Increases behavior frequency, ensures a minimum
performance standard.
• Disadvantages: It may encourage only the minimum action needed to avoid
penalties.
CartPole in OpenAI Gym
• 1. Q-Values or Action-Values
• Q-values represent the expected rewards for taking an
action in a specific state. These values are updated over
time using the Temporal Difference (TD) update rule.
• 2. Rewards and Episodes
• The agent moves through different states by taking actions
and receiving rewards. The process continues until the
agent reaches a terminal state, which ends the episode.
• 3. Temporal Difference or TD-Update
• The agent updates Q-values using the formula:
Q(S, A) ← Q(S, A) + α [R + γ max_a Q(S′, a) − Q(S, A)]
Where,
• S is the current state and A the action taken,
• R is the reward received and S′ the resulting next state,
• α is the learning rate and γ the discount factor.
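The TD update above can be sketched as tabular Q-learning on a toy problem. The corridor environment, reward of +1 at the goal, and all hyperparameters here are illustrative assumptions, not taken from the notes.

```python
import random
from collections import defaultdict

random.seed(0)
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.3   # learning rate, discount, exploration
Q = defaultdict(float)                   # Q[(state, action)], initialised to 0

def td_update(s, a, r, s_next, done):
    """Q(S,A) <- Q(S,A) + alpha * (R + gamma * max_a Q(S',a) - Q(S,A))."""
    target = r if done else r + GAMMA * max(Q[(s_next, b)] for b in (0, 1))
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

def step(s, a):
    """Toy corridor: states 0..4, reward +1 on reaching terminal state 4."""
    s2 = max(0, min(4, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == 4 else 0.0), s2 == 4

for episode in range(200):
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection: mostly greedy, sometimes random.
        if random.random() < EPSILON:
            a = random.choice((0, 1))
        else:
            a = max((0, 1), key=lambda b: Q[(s, b)])
        s2, r, done = step(s, a)
        td_update(s, a, r, s2, done)     # the TD update from the formula
        s = s2
```

After enough episodes the Q-values near the goal approach the reward, and the greedy action in every state is "move right", i.e. the learned policy heads for the terminal state.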
• Applications for Q-learning, a reinforcement learning algorithm, can be found in many different
fields. Here are a few noteworthy instances:
• Atari Games: Classic Atari 2600 games can now be played with Q-learning. In games like Space
Invaders and Breakout, Deep Q Networks (DQN), an extension of Q-learning that makes use of
deep neural networks, has demonstrated superhuman performance.
• Robot Control: Q-learning is used in robotics to perform tasks like navigation and robot control.
With Q-learning algorithms, robots can learn to navigate through environments, avoid obstacles,
and optimise their movements.
• Traffic Management: Autonomous vehicle traffic management systems use Q-learning. It lessens
congestion and enhances traffic flow overall by optimising route planning and traffic signal timings.
• Algorithmic Trading: The use of Q-learning to make trading decisions has been investigated in
algorithmic trading. It makes it possible for automated agents to pick up the best strategies from
past market data and adjust to shifting market conditions.
• Personalized Treatment Plans: To make treatment plans more unique, Q-learning is used in the
medical field. Through the use of patient data, agents are able to recommend personalized
interventions that account for individual responses to various treatments.
Deep Q-Learning in Reinforcement Learning
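In Deep Q-Learning, a neural network approximates Q(s, ·) instead of a lookup table, and the TD error drives gradient updates to the network weights. Below is a minimal sketch of one such update using a tiny NumPy network; the layer sizes, hyperparameters, and state vector are all made-up assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, hidden = 4, 2, 16          # illustrative sizes
W1 = rng.normal(0, 0.1, (hidden, n_states))     # input -> hidden weights
W2 = rng.normal(0, 0.1, (n_actions, hidden))    # hidden -> Q-values weights

def q_values(s):
    """Forward pass: state vector -> one Q-value per action."""
    h = np.tanh(W1 @ s)
    return W2 @ h, h

def dqn_step(s, a, r, s_next, done, alpha=0.05, gamma=0.99):
    """One gradient step moving Q(s, a) toward the TD target."""
    global W1, W2
    q, h = q_values(s)
    q_next, _ = q_values(s_next)
    target = r if done else r + gamma * np.max(q_next)
    err = q[a] - target                          # TD error
    # Backpropagate the squared TD error through both layers
    # (only action a's output row of W2 receives a gradient).
    grad_h = err * W2[a]
    W2[a] -= alpha * err * h
    W1 -= alpha * np.outer(grad_h * (1.0 - h ** 2), s)
    return err

# Repeatedly fit Q(s, 0) to a fixed terminal reward of 1.0:
s = np.array([0.1, -0.2, 0.05, 0.0])
errs = [dqn_step(s, 0, 1.0, s, done=True) for _ in range(200)]
```

With each step the TD error shrinks as the network's estimate of Q(s, 0) moves toward the target; a full DQN additionally uses experience replay and a separate target network, which are omitted here.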
Genetic Algorithms
• A fitness score is given to each individual, showing its ability to
“compete”. Individuals with optimal (or near-optimal) fitness scores are sought.
• The GA maintains a population of n individuals (chromosomes/solutions) along
with their fitness scores. Individuals with better fitness scores are given more
chances to reproduce than others: they are selected to mate and produce better
offspring by combining the chromosomes of both parents. Because the population
size is static, room must be created for new arrivals, so some individuals die
and are replaced, eventually creating a new generation once all the mating
opportunities of the old population are exhausted. The hope is that over
successive generations better solutions arrive while the least fit die out.
• Each new generation has, on average, more “good genes” than the individuals of
previous generations, so each new generation carries better partial solutions.
Once the offspring show no significant difference from the offspring produced by
previous populations, the population has converged, and the algorithm is said to
have converged to a set of solutions for the problem.
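The generational cycle described above (selection by fitness, mating, replacement) can be sketched on a simple "OneMax"-style problem, where fitness counts the 1-bits in a chromosome. The problem, tournament selection scheme, and all parameters below are illustrative assumptions.

```python
import random

random.seed(0)
N_BITS, POP_SIZE, P_MUT, GENERATIONS = 20, 30, 0.02, 60

def fitness(chrom):
    """Fitness score: the number of 1-bits in the chromosome."""
    return sum(chrom)

def select(pop):
    """Tournament selection: fitter individuals get more chances to mate."""
    return max(random.sample(pop, 3), key=fitness)

def crossover(p1, p2):
    """Single-point crossover: combine the chromosomes of both parents."""
    cut = random.randrange(1, N_BITS)
    return p1[:cut] + p2[cut:]

def mutate(chrom):
    """Rare random bit flips keep diversity in the population."""
    return [1 - g if random.random() < P_MUT else g for g in chrom]

# Initial random population of chromosomes.
pop = [[random.randint(0, 1) for _ in range(N_BITS)] for _ in range(POP_SIZE)]

# Each generation fully replaces the old population (static population size).
for gen in range(GENERATIONS):
    pop = [mutate(crossover(select(pop), select(pop)))
           for _ in range(POP_SIZE)]

best = max(pop, key=fitness)
```

Over successive generations the average fitness rises, and the best individual approaches the all-ones chromosome, mirroring the convergence behaviour described in the notes.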
Operators of Genetic Algorithms