Reinforcement Learning


Unit V Reinforcement Learning

Dr. M. Thamarai
Professor, ECE Department,
SVEC
Reinforcement learning (RL)
• Reinforcement learning (RL) is a type of
machine learning where an agent learns how
to make decisions by performing actions in an
environment to maximize some notion of
cumulative reward.
• Unlike supervised learning, where the model
learns from labeled data, RL is based on trial
and error, and it uses feedback from its
actions to improve its performance over time.
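A minimal Python sketch of this trial-and-error loop (illustrative only; `RandomAgent`, `run_episode`, and the `reset`/`step` environment interface are assumed names in the style of Gymnasium, not from the slides):

import random

class RandomAgent:
    """Placeholder agent: chooses among available actions uniformly at random."""
    def __init__(self, actions):
        self.actions = actions

    def act(self, state):
        return random.choice(self.actions)

def run_episode(env, agent):
    """One episode of the RL loop: observe state, act, receive reward, repeat.
    Assumes env.reset() -> state and env.step(action) -> (state, reward, done)."""
    state = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = agent.act(state)                # agent picks an action
        state, reward, done = env.step(action)   # environment gives feedback
        total_reward += reward                   # accumulate cumulative reward
    return total_reward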
Supervised Vs Reinforcement Learning
• Supervised learning: the model learns from labeled input-output examples provided in advance.
• Reinforcement learning: the agent learns from reward feedback gathered through trial and error while interacting with an environment; no labeled examples are given.
Key Concepts in Reinforcement Learning

• 1. Agent: The learner or decision-maker that interacts with the environment.
• 2. Environment: The world or system within which the agent operates and learns.
• 3. State (S): A representation of the current situation of the environment.
• 4. Action (A): A move or decision the agent can make in a given state.
• 5. Reward (R): A feedback signal the agent receives after taking an action in a state, indicating the immediate gain or loss.
• 6. Policy (π): A strategy or function that maps states to actions and determines the agent's behavior.
• 7. Value Function (V): A function that estimates the expected cumulative reward (or "value") of being in a certain state.
• 8. Q-Function (Q): A function that estimates the expected cumulative reward of taking a particular action in a given state, helping the agent decide between different actions.
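In standard notation (conventional definitions, not given on the slides), with discount factor γ ∈ [0, 1) and rewards R_{t+1} received while following policy π:

V^π(s) = E_π[ Σ_{t=0}^∞ γ^t R_{t+1} | S_0 = s ]
Q^π(s, a) = E_π[ Σ_{t=0}^∞ γ^t R_{t+1} | S_0 = s, A_0 = a ]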
How RL Works
• 1. Exploration vs. Exploitation: The agent must balance exploring new actions to discover potentially better rewards (exploration) with exploiting known actions that have previously yielded good rewards (exploitation); an ε-greedy sketch of this balance follows this list.
• 2. Learning Process: The agent goes through a cycle of observing the current state, choosing and performing an action, receiving a reward, and moving to the next state. The reward serves as feedback that helps the agent learn which actions yield the highest cumulative reward over time.
• 3. Goal: The agent's goal is to learn a policy that maximizes cumulative reward over time, with future rewards often discounted.
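A minimal ε-greedy sketch in Python (an illustrative strategy; the dict-based Q-table `q_values` and the function name are assumptions, not from the slides):

import random

def epsilon_greedy(q_values, state, actions, epsilon=0.1):
    """Choose an action for `state`. With probability epsilon, explore a
    random action; otherwise exploit the best-known action so far.
    `q_values` maps (state, action) -> estimated cumulative reward."""
    if random.random() < epsilon:
        return random.choice(actions)  # explore
    # exploit: pick the action with the highest current estimate (0.0 if unseen)
    return max(actions, key=lambda a: q_values.get((state, a), 0.0))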
Types of RL Algorithms
• 1. Model-Free vs. Model-Based RL:
  - Model-Free: The agent learns purely from interaction, without building a model of the environment's dynamics (e.g., Q-learning, SARSA).
  - Model-Based: The agent learns or uses a model of the environment to plan actions (e.g., Dyna-Q, AlphaGo).
• 2. Value-Based Methods: Focus on estimating the value of states or state-action pairs, typically with Q-learning and Deep Q-Networks (DQNs); the Q-learning update is sketched after this list.
• 3. Policy-Based Methods: Directly optimize the policy function without estimating values, usually through gradient-based methods (e.g., REINFORCE, Proximal Policy Optimization).
• 4. Actor-Critic Methods: Combine value-based and policy-based approaches. The "actor" updates the policy, and the "critic" estimates the value function to critique the actions taken (e.g., A2C, A3C).
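A minimal sketch of the tabular Q-learning update in Python (the standard update rule; the dict-based Q-table and the default α and γ values are illustrative choices):

Q(s, a) ← Q(s, a) + α [ r + γ max_{a'} Q(s', a') − Q(s, a) ]

def q_learning_update(q_values, state, action, reward, next_state,
                      actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step. `q_values` maps (state, action) -> estimate."""
    # best value the agent believes it can obtain from the next state
    best_next = max(q_values.get((next_state, a), 0.0) for a in actions)
    # temporal-difference target and error
    td_target = reward + gamma * best_next
    td_error = td_target - q_values.get((state, action), 0.0)
    # nudge the current estimate toward the target by learning rate alpha
    q_values[(state, action)] = q_values.get((state, action), 0.0) + alpha * td_error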
Applications of RL
• Games: RL is used in board games like chess and Go (e.g., AlphaGo), video games, and other competitive environments.
• Robotics: Teaching robots to navigate, manipulate objects, and perform tasks autonomously.
• Healthcare: Assisting in personalized treatment plans, optimizing resource allocation, and managing healthcare operations.
• Finance: Portfolio optimization, trading strategies, and risk management.
• Self-Driving Cars: Decision-making in complex environments, such as lane changing, braking, and accelerating.
