ML Mod 6
4. The goal is to find a suitable action model that increases the total reward of the agent.
EXAMPLE:
2. The agent is supposed to find the best possible path to reach the reward.
[Figure 8.1: Example of Reinforcement Learning - showing a grid with robot,
diamond and fire]
4. The goal of the robot is to get the reward.
7. After learning, the robot chooses a path that gives it the reward with the fewest hurdles.
10. The total reward is calculated when the robot reaches the final reward, i.e. the diamond.
1. A policy
2. A reward function
3. A value function
4. A model (of the environment)
5. Policy:
The policy defines the agent's way of behaving at a given time; it maps perceived states of the environment to the actions to be taken in those states
1. Reward Function:
The reward function defines what the good and bad events are for the
agent
1. Value Function:
The value of a state is the total amount of reward an agent can expect to
collect over the future, starting from that state
1. Model:
The model mimics the behaviour of the environment; for example, given a state and an action, it can predict the resultant next state and the next reward
2. These models are parameterized with a fixed number of parameters, which does not change as the amount of data grows.
4. For example, if you assume that a set of data {(x₁, …, xₙ)} you are given follows a linear model y = f(x; w, b), where w ∈ ℝᵈ and b ∈ ℝ, then the model has d + 1 parameters, where d is the dimension of each data point, irrespective of n.
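A minimal sketch of this idea with made-up synthetic data: the linear model below has exactly d + 1 parameters (w and b), no matter how large n gets.

```python
import numpy as np

# Sketch with made-up synthetic data: a linear model y = w.x + b has d + 1
# parameters (w and b), regardless of how many data points n we fit it on.
rng = np.random.default_rng(0)
n, d = 1000, 3                                   # n data points of dimension d
X = rng.normal(size=(n, d))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.3         # assumed "true" w and b

A = np.hstack([X, np.ones((n, 1))])              # append a 1s column so b is fitted too
params, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares fit
w_hat, b_hat = params[:d], params[d]
print(len(params))                               # d + 1 = 4, independent of n
```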
4. We can directly solve for the optimal policy using dynamic programming
6. When we have the optimal value function, the optimal policy is to choose the action that maximizes the value of the next state, as follows:
π*(sₜ) = argmaxₐ [ r(sₜ, a) + γ Σ_{sₜ₊₁} P(sₜ₊₁ | sₜ, a) V*(sₜ₊₁) ]
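Below is a minimal sketch of this on a made-up 3-state, 2-action MDP (the transition matrix P, rewards r and γ are all assumed values): value iteration computes V* by dynamic programming, and the optimal policy is then read off with the argmax above.

```python
import numpy as np

# Minimal sketch on a made-up MDP: 3 states, 2 actions.
# P[a, s, s'] = transition probability, r[s, a] = immediate reward (all assumed values).
P = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],   # action 0
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],   # action 1
])
r = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 5.0]])
gamma = 0.9

# Value iteration: V(s) <- max_a [r(s,a) + gamma * sum_s' P(s'|s,a) V(s')]
V = np.zeros(3)
for _ in range(1000):
    Q = r + gamma * np.einsum("asn,n->sa", P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

# Optimal policy: the action that maximizes the expected value of the next state.
pi_star = Q.argmax(axis=1)
print(V, pi_star)
```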
BENEFITS:
2. Allow for incorporation of prior knowledge
1. They can be used to predict returns - the total reward expected over the
future
2. Estimates are usually "bootstrapped": they are updated towards a target return that is itself built from current estimates, which becomes a more accurate target over time
Different TD algorithms:
1. TD(0) algorithm
2. TD(λ) algorithm
3. TD(n) algorithm
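As a concrete illustration of the bootstrapped target, here is a minimal sketch of the TD(0) update from the list above; the state names, value estimates and the single transition are all made up.

```python
# Minimal sketch of the TD(0) update (tabular values, made-up numbers).
alpha, gamma = 0.1, 0.9
V = {"s0": 0.0, "s1": 0.5}              # hypothetical state-value estimates

def td0_update(V, s, r, s_next):
    # Bootstrapped target: r + gamma * V(s'); move V(s) a step alpha toward it.
    target = r + gamma * V[s_next]
    V[s] += alpha * (target - V[s])
    return V

td0_update(V, "s0", r=1.0, s_next="s1")
print(V["s0"])                          # 0.0 + 0.1 * (1.0 + 0.9*0.5 - 0.0) = 0.145
```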
5. The policy is not strict: it does not always choose the action that gives the most reward
6. On-policy algorithms cannot separate exploration from control
2. Again, the behaviour policies are usually "soft" to ensure that sufficient exploration takes place
I) ε-greedy:
1. Most of the time, the action with the highest estimated reward is chosen; this is called the greedy action
4. This method ensures that, if enough trials are done, each action will be tried an infinite number of times
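A minimal sketch of the ε-greedy selection just described (the Q-values and ε are assumed numbers): with probability 1−ε the greedy action is chosen, otherwise a uniformly random action.

```python
import numpy as np

# Sketch of epsilon-greedy selection over made-up Q-values.
def epsilon_greedy(q_values, epsilon=0.1, rng=np.random.default_rng()):
    if rng.random() < epsilon:                 # explore: uniformly random action
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))            # exploit: the greedy action

action = epsilon_greedy(np.array([0.2, 0.8, 0.5]), epsilon=0.1)   # usually returns 1
```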
II) Softmax:
4. An action is selected with respect to the weight associated with each action
6. This is a good approach to take when the worst actions are very
unfavorable
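A minimal sketch of softmax (Boltzmann) action selection; the Q-values and temperature τ are assumed numbers. Because an action's probability shrinks with its Q-value, the worst actions are almost never picked, which is why this suits the case above.

```python
import numpy as np

# Sketch of softmax (Boltzmann) selection over made-up Q-values.
def softmax_action(q_values, tau=1.0, rng=np.random.default_rng()):
    prefs = np.asarray(q_values, dtype=float) / tau
    prefs -= prefs.max()                         # for numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()  # weight of each action
    return int(rng.choice(len(q_values), p=probs))

action = softmax_action([0.2, 0.8, 0.5], tau=0.5)   # action 1 is most likely
```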
ADVANTAGES OF TD METHODS:
7. An experience ⟨s, a, r, s'⟩ provides one data point for the value of Q(s, a)
8. The data point is that the agent received the future value of r + γV(s'), where V(s') = maxₐ' Q(s', a')
9. This is the actual current reward plus the discounted estimated future value
11. The agent can use the temporal difference equation to update its estimate
for Q(s, a):
Q(s, a) ← Q(s, a) + α [r + γ maxₐ' Q(s', a') − Q(s, a)]
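A minimal sketch of this single update on a made-up tabular Q (the states, actions, α, γ and the transition are all assumptions):

```python
# Sketch of one Q-learning update on a made-up tabular Q.
alpha, gamma = 0.5, 0.9
Q = {"s": {"left": 0.0, "right": 0.0}, "s'": {"left": 1.0, "right": 2.0}}

def q_update(Q, s, a, r, s_next):
    td_error = r + gamma * max(Q[s_next].values()) - Q[s][a]   # r + gamma*max_a' Q(s',a') - Q(s,a)
    Q[s][a] += alpha * td_error
    return Q[s][a]

q_update(Q, "s", "right", r=1.0, s_next="s'")   # 0 + 0.5*(1 + 0.9*2 - 0) = 1.4
```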
13. It can be proven that given sufficient training under any ε-soft policy, the
algorithm converges with probability 1 to a close approximation of the
action-value function for an arbitrary target policy
14. Q-Learning learns the optimal policy even when actions are selected
according to a more exploratory or even random policy
Q-TABLE:
3. Basically, this table will guide us to the best action at each state
4. In the Q-Table, the columns are the actions and the rows are the states
5. Each Q-table score will be the maximum expected future reward that the
agent will get if it takes that action at that state
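A minimal sketch of such a table as a 2-D array (the state and action counts are assumed): rows index states, columns index actions, and each entry holds the current estimate of the maximum expected future reward for that state-action pair.

```python
import numpy as np

# Sketch of a Q-table: rows = states, columns = actions (sizes are assumed).
n_states, n_actions = 5, 4
q_table = np.zeros((n_states, n_actions))        # typically initialised to zeros

state = 2
best_action = int(np.argmax(q_table[state]))     # best action at this state
best_value = q_table[state, best_action]         # its expected future reward
```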
2. Setting the learning rate, α, to 0 means that the Q-values are never updated; hence nothing is learned
3. Setting a high value such as 0.9 means that learning can occur quickly
2. The discount factor, γ, models the fact that future rewards are worth less than immediate rewards
3. Mathematically, the discount factor needs to be set to less than 1 for the algorithm to converge
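A small worked example of discounting (the rewards and γ are made up): with γ = 0.9 the reward of 10 received three steps in the future only contributes 7.29 to the return, and keeping γ < 1 makes the geometrically weighted sum finite.

```python
# Worked example of a discounted return with made-up rewards.
gamma = 0.9
rewards = [1.0, 1.0, 1.0, 10.0]             # rewards at steps t, t+1, t+2, t+3

# G_t = r_t + gamma*r_{t+1} + gamma^2*r_{t+2} + ...
G = sum(gamma**k * r for k, r in enumerate(rewards))
print(G)                                    # 1 + 0.9 + 0.81 + 7.29 = 10.0
```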
PROCEDURAL APPROACH:
3. Choose an action, a, for that state based on one of the action selection
policies (i.e. soft, ε-greedy or Softmax)
4. Take the action, and observe the reward, r, as well as the new state, s'
5. Update the Q-value for the state using the observed reward and the
maximum reward possible for the next state. (The updating is done
according to the formula and parameters described above.)
6. Set the state to the new state, and repeat the process until a terminal state
is reached
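A minimal sketch of this procedure as a training loop. The environment interface (env.reset() returning a state and env.step(a) returning (s', r, done)) and the hyperparameters are assumptions for illustration, not something from these notes.

```python
import numpy as np

# Sketch of the procedure above, assuming a hypothetical discrete environment
# `env` with reset() -> s and step(a) -> (s', r, done).
def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # step 3: epsilon-greedy action selection
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(Q[s]))
            # step 4: take the action, observe r and s'
            s_next, r, done = env.step(a)
            # step 5: update Q(s,a) towards r + gamma * max_a' Q(s',a')
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
            # step 6: move to the new state; stop at a terminal state
            s = s_next
    return Q
```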
EXPLORATION:
5. However, due to the lack of algorithms that properly scale well with the
number of states, simple exploration methods are the most practical
6. One such method is ε-greedy, where the agent chooses the action that it believes has the best long-term effect with probability 1−ε, and a random action with probability ε
DELAYED REWARDS:
2. It also determines (at least probabilistically) the next state of the environment
6. The agent must be able to learn which of its actions are desirable based on rewards that can take place arbitrarily far in the future
7. It can also be done with eligibility traces, which weight the most recent action most heavily
8. The action before that a little less, the action before that even less, and so on; however, this takes a lot of computational time
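A minimal sketch of the eligibility-trace idea for state values (a TD(λ)-style update; the sizes and α, γ, λ are assumed): at every step all traces decay by γλ and the current state's trace is bumped, so the most recently visited states receive the largest share of each update.

```python
import numpy as np

# Sketch of an eligibility-trace update for state values (TD(lambda)-style).
def td_lambda_step(V, e, s, r, s_next, alpha=0.1, gamma=0.9, lam=0.8):
    delta = r + gamma * V[s_next] - V[s]   # TD error for this transition
    e *= gamma * lam                       # older states are weighted less and less
    e[s] += 1.0                            # the most recent state is weighted most
    V += alpha * delta * e                 # spread the credit over recent states

V = np.zeros(5)                            # value estimates (assumed 5 states)
e = np.zeros(5)                            # eligibility traces
td_lambda_step(V, e, s=2, r=1.0, s_next=3)
```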
1. In certain applications, the agent does not know the state exactly
2. It is equipped with sensors that return an observation using which the agent
should estimate the state
4. The robot may not know its exact location in the room, or what else is in the room
5. The robot may have a camera with which sensory observations are
recorded
6. This does not tell the robot its state exactly but gives inputs as to its likely
state
7. For example, the robot may only know that there is a wall to its right
8. The setting is like a Markov decision process, except that after taking action aₜ in state sₜ
9. The new state sₜ₊₁ is not known; the agent only receives an observation oₜ₊₁, which is a stochastic function of sₜ and aₜ: p(oₜ₊₁ | sₜ, aₜ)
10. This is called a partially observable state.
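A minimal sketch of this setting (the states, actions and all probabilities are made up): the agent cannot query the state directly, it only samples an observation from an assumed p(o | s, a).

```python
import numpy as np

# Sketch of partial observability: the agent only sees o ~ p(o | s, a), not s itself.
rng = np.random.default_rng(0)
observations = ["wall_right", "wall_left", "open_space"]

# p_obs[s][a] is a distribution over observations (all values made up).
p_obs = {
    0: {0: [0.7, 0.1, 0.2], 1: [0.6, 0.2, 0.2]},
    1: {0: [0.1, 0.7, 0.2], 1: [0.2, 0.6, 0.2]},
}

def observe(s, a):
    return rng.choice(observations, p=p_obs[s][a])

o = observe(s=0, a=1)   # e.g. "wall_right" with probability 0.6
```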
SUMS ON Q1