Unit 4

The document discusses Semi-Supervised Learning (SSL) and Reinforcement Learning (RL) as machine learning techniques that utilize both labeled and unlabeled data for model training and decision-making, respectively. It covers various concepts within RL, including the Markov Decision Process, Bellman Equation, Monte Carlo policy evaluation, Q-learning, and SARSA, highlighting their applications and methodologies. Additionally, it introduces Model-Based Reinforcement Learning, which enhances learning efficiency by predicting future states and rewards based on a model of the environment.


Semi-Supervised Learning

 Semi-supervised learning (SSL) is a machine learning technique that uses both labeled and unlabeled data to train AI models for classification and regression.

 SSL is a combination of supervised and unsupervised learning, and it uses techniques that incorporate unlabeled data into model training.
Examples of Semi-Supervised Learning
 Text Classification

 Image Classification

 Anomaly Detection
Reinforcement Learning

 Reinforcement learning (RL) is a machine learning (ML) technique that trains software to make decisions that achieve optimal results.

 Reinforcement learning is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and observing the results of those actions.

 Reinforcement learning is a form of machine learning that teaches a model to choose the best course of action while solving a problem.

 It mimics the trial-and-error learning process that humans use to achieve their goals.

 Reinforcement Learning (RL) is the science of decision making. It is about learning the optimal behavior in an environment to obtain maximum reward.
 In reinforcement learning, the agent learns automatically from feedback, without any labeled data, unlike supervised learning.

 Since there is no labeled data, the agent is bound to learn from its experience alone.

 How a robotic dog learns the movement of its arms is an example of reinforcement learning.
Types of Reinforcement:

 There are two types of Reinforcement:


1. Positive: Positive reinforcement occurs when an event, triggered by a particular behavior, increases the strength and frequency of that behavior. In other words, it has a positive effect on behavior.

Advantages of positive reinforcement:

1. Maximizes performance
2. Sustains change for a long period of time

Drawback: too much reinforcement can lead to an overload of states, which can diminish the results.
Types of Reinforcement:

2. Negative: Negative reinforcement is the strengthening of a behavior because a negative condition is stopped or avoided.

Advantages of negative reinforcement:

1. Increases the desired behavior
2. Helps enforce a minimum standard of performance

Drawback: it only provides enough motivation to meet the minimum required behavior.
• Example: Suppose there is an AI agent within a maze environment, and its goal is to find the diamond. The agent interacts with the environment by performing actions, and based on those actions, the state of the agent changes, and it also receives a reward or penalty as feedback.
• The agent keeps doing these three things (take an action, change state or remain in the same state, and get feedback), and by doing so it learns and explores the environment.
• The agent learns which actions lead to positive feedback or rewards and which actions lead to negative feedback or penalties. For a positive reward, the agent gets a positive point, and as a penalty, it gets a negative point.
Markov Decision Process

 The Markov Decision Process, or MDP, is used to formalize reinforcement learning problems.

 If the environment is completely observable, then its dynamics can be modeled as a Markov Process.

 In an MDP, the agent constantly interacts with the environment and performs actions; at each action, the environment responds and generates a new state.
 A State is a set of tokens that represents every situation the agent can be in.

 An MDP is a tuple of four elements (S, A, Pa, Ra):

• A finite set of states S
• A finite set of actions A
• A transition probability Pa: the probability of moving from state S to state S' when action a is taken
• A reward Ra: the reward received after transitioning from state S to state S' due to action a
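
As a minimal illustration of this tuple (the states, actions, probabilities, and rewards below are made-up values, not taken from the slides), an MDP can be written down as plain Python dictionaries:

# Illustrative MDP: P[s][a] maps to {next_state: probability}, R maps (s, a, s') to a reward.
states = ["s1", "s2", "s3"]
actions = ["left", "right"]

P = {
    "s1": {"left": {"s1": 1.0}, "right": {"s2": 1.0}},
    "s2": {"left": {"s1": 1.0}, "right": {"s3": 1.0}},
    "s3": {"left": {"s2": 1.0}, "right": {"s3": 1.0}},
}

R = {("s2", "right", "s3"): 1.0}  # only the move from s2 into s3 is rewarded

def reward(s, a, s_next):
    # All unlisted transitions default to a reward of 0.
    return R.get((s, a, s_next), 0.0)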
Markov Property

 MDP relies on the Markov property, so to better understand MDP we need to understand this property first.

 "If the agent is in the current state s1, performs an action a1 and moves to state s2, then the state transition from s1 to s2 depends only on the current state and action; it does not depend on past actions, rewards, or states."

 Example: in a chess game, the players only focus on the current state and do not need to remember past actions or states.

 A Markov Process is a memoryless process with a sequence of random states S1, S2, ....., St that satisfies the Markov property. A Markov process is also known as a Markov chain, which is a tuple (S, P) of a state set S and a transition function P.
Bellman Equation

 The Bellman equation was introduced by the mathematician Richard Ernest Bellman in 1953, and hence it is called the Bellman equation.

 It is associated with dynamic programming and is used to calculate the value of a decision problem at a certain point by including the values of the states that follow it:

V(s) = max [R(s,a) + γV(s`)]

Where,
V(s) = the value calculated at a particular state.
R(s,a) = the reward obtained at state s by performing action a.
γ = the discount factor (gamma).
V(s`) = the value of the next state s`.
 In the maze image on the slide, the agent starts at the very first block of the maze. The maze consists of an S6 block, which is a wall, S8, a fire pit, and S4, a diamond block.

 The agent cannot cross the S6 block, as it is a solid wall. If the agent reaches the S4 block, it gets a +1 reward; if it reaches the fire pit, it gets a -1 reward point. It can take four actions: move up, move down, move left, and move right.

 The agent can take any path to reach the final point, but it needs to do so in as few steps as possible. Suppose the agent follows the path S9-S5-S1-S2-S3; it will then get the +1 reward point.
 For the 1st block:
 V(s3) = max [R(s,a) + γV(s`)], here V(s`) = 0 because there is no further state to move to.
 V(s3) = max[R(s,a)] => V(s3) = max[1] => V(s3) = 1

 For the 2nd block:
 V(s2) = max [R(s,a) + γV(s`)], here γ = 0.9 (say), V(s`) = 1, and R(s,a) = 0, because there is no reward at this state.
 V(s2) = max[0.9(1)] => V(s2) = max[0.9] => V(s2) = 0.9

 For the 3rd block:
 V(s1) = max [R(s,a) + γV(s`)], here γ = 0.9, V(s`) = 0.9, and R(s,a) = 0, because there is no reward at this state either.
 V(s1) = max[0.9(0.9)] => V(s1) = max[0.81] => V(s1) = 0.81

 For the 4th block:
 V(s5) = max [R(s,a) + γV(s`)], here γ = 0.9, V(s`) = 0.81, and R(s,a) = 0, because there is no reward at this state either.
 V(s5) = max[0.9(0.81)] => V(s5) = max[0.729] => V(s5) ≈ 0.73

 For the 5th block:
 V(s9) = max [R(s,a) + γV(s`)], here γ = 0.9, V(s`) = 0.73, and R(s,a) = 0, because there is no reward at this state either.
 V(s9) = max[0.9(0.73)] => V(s9) = max[0.657] => V(s9) ≈ 0.66

 Using the Bellman equation, the agent calculates the value of every block except the diamond and the fire state (V = 0); these cannot have values since they are the end of the maze.
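
As a minimal sketch of this backward calculation (the path and γ = 0.9 follow the example above; the variable names are my own), the values can be computed in Python:

# Compute V(s) = max[R(s,a) + gamma * V(s')] backwards along the path S9-S5-S1-S2-S3,
# where only the final move (out of s3, into the diamond block) yields a reward of +1.
gamma = 0.9
path = ["s9", "s5", "s1", "s2", "s3"]

values = {}
next_value = 0.0   # there is no further state beyond the diamond
reward = 1.0       # reward for the last step; earlier steps have no reward
for state in reversed(path):
    values[state] = reward + gamma * next_value
    next_value = values[state]
    reward = 0.0

print(values)  # approx: s3=1.0, s2=0.9, s1=0.81, s5=0.729, s9=0.656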
Policy Evaluation using Monte Carlo

 Monte Carlo policy evaluation is a technique within the field of reinforcement learning that estimates the effectiveness of a policy, i.e., a strategy for making decisions in an environment.

 It is a bit like learning the rules of a game by playing it many times, rather than studying its manual.

 This approach does not require a pre-built model of the environment; instead, it learns exclusively from the outcomes of the episodes it experiences.
How Monte Carlo Policy Evaluation Works?
 The method works by running simulations or episodes (repeated random sampling) in which an agent interacts with the environment until it reaches a terminal state.

 At the end of each episode, the algorithm looks back at the states visited and the rewards received to calculate what is known as the "return": the cumulative reward starting from a specific state until the end of the episode.
 Monte Carlo policy evaluation repeatedly simulates episodes, tracking the total rewards that follow each state and then calculating the average.

 These averages give an estimate of the state value under the policy being followed.

 By aggregating the results over many episodes, the method converges to the true value of each state when following the policy.

 These values are useful because they help us understand which states are more valuable and thus guide the agent toward better decision-making in the future.
 Over time, as the agent learns the value of different states, it can
refine its policy, favoring actions that lead to higher rewards.

 In Monte Carlo policy evaluation, the value V of a state s under a policy π is estimated by the average return G following that state. The return is the cumulative reward obtained after visiting state s:

V(s) = (1 / N(s)) Σ Gi, for i = 1, ..., N(s)

where N(s) is the number of times state s is visited across episodes, and Gi is the return from the i-th episode after visiting state s.
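
A minimal sketch of this averaging in Python (the episode format, γ value, and function name here are illustrative assumptions, not from the slides):

from collections import defaultdict

def mc_policy_evaluation(episodes, gamma=0.9):
    # First-visit Monte Carlo evaluation.
    # episodes: list of episodes, each a list of (state, reward) pairs in visit order.
    # Returns a dict mapping state -> estimated V(s).
    returns_sum = defaultdict(float)   # total return G observed per state
    visit_count = defaultdict(int)     # N(s): number of first visits to s

    for episode in episodes:
        G = 0.0
        first_visit = {}  # state -> return at its earliest visit in this episode
        # Walk the episode backwards, accumulating the discounted return.
        for t in reversed(range(len(episode))):
            state, reward = episode[t]
            G = reward + gamma * G
            first_visit[state] = G  # overwritten until only the earliest visit remains
        for state, G_first in first_visit.items():
            returns_sum[state] += G_first
            visit_count[state] += 1

    return {s: returns_sum[s] / visit_count[s] for s in returns_sum}

# Illustrative usage with two tiny made-up episodes:
episodes = [[("s1", 0.0), ("s2", 0.0), ("s3", 1.0)],
            [("s1", 0.0), ("s3", 1.0)]]
print(mc_policy_evaluation(episodes))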
Conclusion

 Monte Carlo policy evaluation is like learning through full experience. It is a hands-on way to measure how effective certain actions are, based on the rewards they yield over many trials.
Q Learning

 Q-learning is a basic form of reinforcement learning that uses Q-values (also called action values) to iteratively improve the behavior of the learning agent.

 Q-learning is a popular model-free reinforcement learning algorithm used in machine learning and artificial intelligence applications.

 It falls under the category of temporal difference learning techniques, in which an agent picks up new information by observing results, interacting with the environment, and getting feedback in the form of rewards.
 In the slide's illustration, there is an agent that has three value options, V(s1), V(s2), and V(s3). As this is an MDP, the agent only cares about the current state and the future state.

 The agent can go in any direction (up, left, or right), so it needs to decide where to go to follow the optimal path.

 Here the agent will take a move on a probability basis and change its state. But if we want exact moves, we need to work in terms of Q-values.
 Q represents the quality of the actions at each state. So instead of using a value at each state, we use a pair of state and action, i.e., Q(s, a).

 The Q-value specifies which action is more lucrative than the others, and according to the best Q-value, the agent takes its next move.

 The Bellman equation can be used to derive the Q-value:

Q(s,a) = Q(s,a) + α [R(s,a) + γ max Q(s`,a`) - Q(s,a)]

The above formula is used to estimate the Q-values in Q-Learning, where α is the learning rate and max Q(s`,a`) is the highest Q-value over the actions available in the next state s`.
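
A minimal sketch of this update rule (the state names, α, γ, and ε below are illustrative assumptions, not from the slides):

import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 0.9, 0.1     # learning rate, discount factor, exploration rate
actions = ["up", "down", "left", "right"]
Q = defaultdict(float)                     # Q[(state, action)], defaults to 0

def choose_action(state):
    # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    # Q-learning update: move Q(s,a) toward R + gamma * max over a' of Q(s', a').
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# Illustrative single update: taking "right" in "s2" reaches "s3" with reward +1.
q_update("s2", "right", 1.0, "s3")
print(Q[("s2", "right")])  # 0.1 after one update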
SARSA

 State-Action-Reward-State-Action

 Consider teaching a computer to play a game, operate a car, or manage resources.

 SARSA is a reinforcement learning algorithm that teaches computers how to make good decisions by interacting with an environment.

 It helps computers learn from their experiences to determine the best actions.
Explanation of SARSA:

 Assume you're teaching a robot to navigate a maze. The robot begins at a specific location (the "State" - where it is), and you want it to discover the best path to the maze's finish.

 The robot can proceed in numerous directions at each step (these are the "Actions" - what it does). As it travels, the robot receives feedback in the form of rewards - positive or negative numbers indicating its performance.
Explanation of SARSA:

 The amazing thing about SARSA is that it doesn't need a map of the maze or explicit instructions on what to do.

 It learns by trial and error, discovering which actions work best in different situations. This way, SARSA helps computers learn to make decisions in various scenarios, from games to driving cars to managing resources efficiently.
Equation

Q(s,a) = Q(s,a) + α [r + γ Q(s`,a`) - Q(s,a)]

Here, the update equation for SARSA depends on the current state s, the current action a, the reward obtained r, the next state s`, and the next action a`; α is the learning rate and γ is the discount factor.
Code Snippet
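The slide's original code and output are not reproduced here; below is a minimal SARSA sketch in Python following the update equation above (the state names, α, γ, and ε are illustrative assumptions):

import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 0.9, 0.1
actions = ["up", "down", "left", "right"]
Q = defaultdict(float)  # Q[(state, action)], defaults to 0

def epsilon_greedy(state):
    # The behavior policy: mostly greedy, with occasional random exploration.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def sarsa_update(state, action, reward, next_state, next_action):
    # On-policy update: uses the action actually chosen in the next state,
    # unlike Q-learning, which uses the best possible next action.
    td_target = reward + gamma * Q[(next_state, next_action)]
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])

# Illustrative single step of the State-Action-Reward-State-Action loop:
s, a = "s2", "right"
r, s_next = 1.0, "s3"            # pretend the environment returned these
a_next = epsilon_greedy(s_next)  # choose the next action with the same policy
sarsa_update(s, a, r, s_next, a_next)
print(Q[(s, a)])  # 0.1 after one update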
Model-Based Reinforcement Learning
 The model is used for planning, which means it provides a way to choose a course of action by considering future situations before actually experiencing them.

 Model-based reinforcement learning uses a model that mimics the behavior of the environment.

 The approaches for solving RL problems with the help of a model are termed model-based approaches.

 With the help of the model, one can make inferences (ideas or conclusions) about how the environment will behave.

 For example, if a state and an action are given, then the model can predict the next state and reward.

 Model-based reinforcement learning (MBRL) is an approach within the field of reinforcement learning (RL) that incorporates a model of the environment to improve the efficiency and effectiveness of the learning process.
 In MBRL, an agent not only learns from interactions with the
environment but also builds and utilizes a model of the environment.

 This model can predict the next state and reward given the current
state and action. It helps the agent to simulate future states and
outcomes without direct interaction.
Approach in MBRL

 1. Model Learning:
 Implicit - indirectly learning models, often through latent variable representations or embeddings.
 Explicit - directly learning the dynamics of the environment (e.g., using neural networks or Gaussian processes).

 2. Planning Algorithms:
 Planning involves using the model to simulate multiple future scenarios, enabling the agent to choose actions that maximize long-term rewards (a small sketch follows this list).

 3. Hybrid Approach:
 Combining MBRL with model-free methods can leverage the strengths of both approaches.
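
As a minimal sketch of explicit model learning followed by one-step planning (the transition data, value estimates, and function names below are illustrative assumptions, not from the slides):

# 1. Model learning (explicit, tabular): estimate the next state and reward per (state, action)
#    from observed transitions. Real MBRL would typically fit e.g. a neural network instead.
model = {}  # (state, action) -> (predicted next state, predicted reward)

def learn_model(transitions):
    # transitions: list of (state, action, reward, next_state) tuples from experience
    for s, a, r, s_next in transitions:
        model[(s, a)] = (s_next, r)  # deterministic model: the last observation wins

# 2. Planning: simulate each action with the learned model and pick the most rewarding one.
def plan(state, actions, values, gamma=0.9):
    def simulated_return(a):
        if (state, a) not in model:
            return 0.0  # unknown transition, assume no value
        s_next, r = model[(state, a)]
        return r + gamma * values.get(s_next, 0.0)
    return max(actions, key=simulated_return)

# Illustrative usage with made-up experience and state-value estimates:
learn_model([("s1", "right", 0.0, "s2"), ("s2", "right", 1.0, "s3")])
values = {"s2": 0.9, "s3": 1.0}  # e.g. produced by a separate value-estimation step
print(plan("s1", ["left", "right"], values))  # "right"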
