Lecture – 29
DISCOVER . LEARN . EMPOWER
Reinforcement Learning
COURSE OBJECTIVES
The course aims to:
•Understand and apply various data handling and visualization
techniques.
•Understand basic learning algorithms and techniques and their
applications, as well as general questions related to analysing and
handling large data sets.
•Develop skills in supervised and unsupervised learning techniques
and implement them to solve real-life problems.
•Develop basic knowledge of machine learning techniques for building
intelligent machines that can make decisions on behalf of humans.
•Develop skills for selecting suitable model parameters and apply them
to design optimized machine learning applications.
COURSE OUTCOMES
On completion of this course, the students shall be able to:
CO2 Understand data pre-processing techniques and apply these for data cleaning.
CO3 Identify and implement simple learning strategies using data science and statistics principles.
Unit-3 Syllabus
SUGGESTIVE READINGS
TEXT BOOKS:
T1: Tom M. Mitchell, "Machine Learning", McGraw Hill International Edition.
T2: Ethem Alpaydin, "Introduction to Machine Learning", Eastern Economy Edition, Prentice Hall of India, 2005.
T3: Andreas C. Müller, Sarah Guido, "Introduction to Machine Learning with Python", O'Reilly, 2016.
REFERENCE BOOKS:
R1: Sebastian Raschka, Vahid Mirjalili, "Python Machine Learning", Packt.
R2: Richard O. Duda, Peter E. Hart, David G. Stork, "Pattern Classification", Wiley, 2nd Edition.
R3: Christopher Bishop, "Pattern Recognition and Machine Learning", illustrated Edition, Springer, 2006.
What is learning?
Learning takes place as a result of interaction
between an agent and the world. The idea
behind learning is that
percepts received by an agent should be used not
only for acting, but also for improving the agent's
ability to behave optimally in the future to achieve
its goal.
Learning types
Supervised learning:
a situation in which sample (input, output) pairs of the
function to be learned can be perceived or are given
You can think of it as learning with a kind teacher
Reinforcement learning:
the agent acts on its environment and receives some
evaluation of its action (a reinforcement), but is not told
which action is the correct one for achieving its goal
Reinforcement learning
Task
Learn how to behave successfully to achieve a
goal while interacting with an external
environment
Learn via experiences!
Examples
Game playing: the player knows whether it won or
lost, but not how it should have moved at each step
Control: a traffic system can measure the delay of
cars, but does not know how to decrease it.
RL is learning from interaction
RL model
Each percept (e) is enough to determine the
state (the state is accessible)
The agent can decompose the reward
component from a percept.
The agent's task: to find an optimal policy, mapping
states to actions, that maximizes a long-run measure
of the reinforcement
Think of reinforcement as reward
Can be modeled as an MDP!
Review of MDP model
MDP model <S, A, T, R>
• S – set of states
• A – set of actions
• T(s,a,s') = P(s'|s,a) – the probability of transition from s to s' given action a
• R(s,a) – the expected reward for taking action a in state s:
  R(s,a) = Σ_{s'} P(s'|s,a) r(s,a,s') = Σ_{s'} T(s,a,s') r(s,a,s')
[Figure: the agent observes the state and reward from the environment and sends back an action; repeated interaction yields a trajectory s0 –a0, r0→ s1 –a1, r1→ s2 –a2, r2→ s3 …]
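As a concrete illustration of this tuple, here is a minimal Python sketch of one way the model could be stored; the two states, two actions, and all numbers below are hypothetical, chosen only to show the data layout and the expected-reward formula.

```python
# A minimal MDP container: states S, actions A, transitions T(s,a,s') and rewards r(s,a,s').
# The two-state example is hypothetical, used only to show the data layout.
S = ["s0", "s1"]
A = ["stay", "move"]

# T[(s, a)] maps each successor s' to P(s'|s, a); each row sums to 1.
T = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "move"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "move"): {"s0": 0.9, "s1": 0.1},
}

# r[(s, a, s')] is the immediate reward; R(s,a) is its expectation under T.
r = {("s0", "move", "s1"): 1.0}

def R(s, a):
    """Expected reward: R(s,a) = sum_{s'} T(s,a,s') * r(s,a,s')."""
    return sum(p * r.get((s, a, s2), 0.0) for s2, p in T[(s, a)].items())

print(R("s0", "move"))  # 0.8 * 1.0 = 0.8
```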
Model-based vs. model-free
approaches
But we don't know anything about the environment
model, i.e. the transition function T(s,a,s')
Here come two approaches
Model-based RL:
learn the model, and use it to derive the optimal policy,
e.g. the adaptive dynamic programming (ADP) approach
Adaptive dynamic programming (ADP)
in passive learning
Different from the LMS and TD methods (model-free
approaches),
ADP is a model-based approach!
The updating rule for passive learning:
U(s) ← Σ_{s'} T(s,s') (r(s,s') + γ U(s'))
(γ is the discount factor)
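A minimal sketch of this passive update in Python, assuming the (already learned) model T(s,s') and rewards r(s,s') are held in dictionaries and a discount factor gamma is used; the two-state chain at the bottom is a made-up example, not the grid world from the slides.

```python
GAMMA = 0.9  # discount factor (assumed)

def adp_passive_update(U, T, r):
    """One sweep of U(s) <- sum_{s'} T(s,s') * (r(s,s') + gamma * U(s'))."""
    new_U = {}
    for s, successors in T.items():
        new_U[s] = sum(p * (r.get((s, s2), 0.0) + GAMMA * U.get(s2, 0.0))
                       for s2, p in successors.items())
    # states with no recorded successors keep their old utilities
    for s in U:
        new_U.setdefault(s, U[s])
    return new_U

# illustrative two-state chain under a fixed policy
T = {"a": {"b": 1.0}, "b": {"b": 1.0}}
r = {("a", "b"): 0.0, ("b", "b"): 1.0}
U = {"a": 0.0, "b": 0.0}
for _ in range(50):
    U = adp_passive_update(U, T, r)
print(U)  # U(b) approaches 1/(1-gamma) = 10, U(a) approaches gamma * U(b)
```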
Active learning
An active agent must consider
what actions to take?
what their outcomes may be (both for learning and for
receiving rewards in the long run)?
Utility update equation
U(s) ← max_a ( R(s,a) + γ Σ_{s'} T(s,a,s') U(s') )
Rule to choose the action
a = argmax_a ( R(s,a) + γ Σ_{s'} T(s,a,s') U(s') )
Active ADP algorithm
For each s, initialize U(s), T(s,a,s') and R(s,a)
Initialize s to the current state that is perceived
Loop forever
{
Select an action a and execute it (using the current model R and T) using
a = argmax_a ( R(s,a) + γ Σ_{s'} T(s,a,s') U(s') )
Receive the immediate reward r and observe the new state s'
Use the transition tuple <s, a, s', r> to update the model R and T (next slide)
For all states, update U(s) using the utility update equation
s = s'
}
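A minimal Python sketch of the two pieces used inside this loop, the greedy action rule and the full utility sweep, assuming a discount factor gamma and the same dictionary layout as the earlier sketches; interacting with the environment and updating the model are left out (see the next slide).

```python
GAMMA = 0.9  # discount factor (assumed)

def q_estimate(s, a, U, T, R):
    """R(s,a) + gamma * sum_{s'} T(s,a,s') * U(s') under the current model."""
    return R.get((s, a), 0.0) + GAMMA * sum(
        p * U.get(s2, 0.0) for s2, p in T.get((s, a), {}).items())

def greedy_action(s, A, U, T, R):
    """a = argmax_a [ R(s,a) + gamma * sum_{s'} T(s,a,s') U(s') ]"""
    return max(A, key=lambda a: q_estimate(s, a, U, T, R))

def utility_sweep(S, A, U, T, R):
    """U(s) <- max_a [ R(s,a) + gamma * sum_{s'} T(s,a,s') U(s') ] for all states."""
    return {s: max(q_estimate(s, a, U, T, R) for a in A) for s in S}
```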
How to learn the model?
Use the transition tuple <s, a, s', r> to learn T(s,a,s') and
R(s,a). That's supervised learning!
Since the agent observes every transition (s, a, s', r) directly, it can take
(s,a)/s' as an input/output example of the transition probability
function T.
Different supervised learning techniques can be used (see further
reading for details)
Use r and T(s,a,s') to learn R(s,a):
R(s,a) = Σ_{s'} T(s,a,s') r(s,a,s')
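One simple way to do this, sketched below, is count-based maximum-likelihood estimation: T(s,a,s') is estimated from visit counts and R(s,a) as the running average of observed rewards (equivalent in expectation to the sum above). The dictionary-of-counts layout is an illustrative assumption, not something prescribed by the slides.

```python
from collections import defaultdict

N_sa = defaultdict(int)          # visits of (s, a)
N_sas = defaultdict(int)         # visits of (s, a, s')
reward_sum = defaultdict(float)  # accumulated reward observed for (s, a)

def observe(s, a, s2, r):
    """Update the model from one transition tuple <s, a, s', r>."""
    N_sa[(s, a)] += 1
    N_sas[(s, a, s2)] += 1
    reward_sum[(s, a)] += r

def T(s, a, s2):
    """Maximum-likelihood estimate T(s,a,s') = N(s,a,s') / N(s,a)."""
    return N_sas[(s, a, s2)] / N_sa[(s, a)] if N_sa[(s, a)] else 0.0

def R(s, a):
    """Average observed reward, an estimate of sum_{s'} T(s,a,s') r(s,a,s')."""
    return reward_sum[(s, a)] / N_sa[(s, a)] if N_sa[(s, a)] else 0.0
```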
ADP approach: pros and cons
Pros:
The ADP algorithm converges far faster than LMS and temporal-difference
(TD) learning, because it uses the information from the model
of the environment.
Cons:
Intractable for large state spaces
(in each step, U is updated for all states)
This can be improved by prioritized sweeping (see further reading for details)
Another model-free method:
TD-Q learning
Define the Q-value function:
U(s) = max_a Q(s,a)
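A minimal sketch of this relation with a hypothetical Q-table (the states, actions, and values below are made up for illustration):

```python
# Hypothetical Q-table: Q[(s, a)] holds the learned Q-value for each state-action pair.
Q = {
    ("s0", "left"): 0.1, ("s0", "right"): 0.7,
    ("s1", "left"): 0.4, ("s1", "right"): 0.2,
}
ACTIONS = ["left", "right"]

def U(s):
    """U(s) = max_a Q(s, a)"""
    return max(Q[(s, a)] for a in ACTIONS)

print(U("s0"))  # 0.7
```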