Function Approximation

Module-6
Table of Contents

● Function Approximation
● Drawbacks of tabular implementation
● Gradient Descent Methods
● Linear parameterization
● Policy gradient with function approximation
Drawbacks of Tabular Representation
1. Does not learn anything about how different states and actions relate
to each other.
2. Curse of Dimensionality

3. Lack of Generalization

4. Inefficient Learning

5. Scalability Issues

6. Lack of Flexibility
What to do when state and action spaces explode… literally?
Example
Let's take a concrete example. Suppose an agent is in a 4x4 grid, so the location of the
agent on the grid is a feature. This gives 16 different locations, meaning 16
different states.
Example
But that's not all: suppose the orientation (north, south, east, west)
is also a feature.
This gives 4 possibilities for each location, which brings the
number of states to 16 * 4 = 64. Furthermore, if the agent can use
5 different tools (including the "no tool" case), the number of
states grows to 64 * 5 = 320.
Example
One way to represent those states is to create a multidimensional
array such as V[row, column, direction, tool]. Then we can either query or
update the value of a state.

For example, V[1, 2, north, torch] represents the state where the agent
is at row 1, column 2, looking north and holding a torch. The value
stored in this array cell tells us how valuable this state is.

Another example would be V[4, 1, west, nothing], which is an agent at the
4th row, 1st column, heading west and holding nothing.
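As a minimal sketch, such a table could be stored as a NumPy array indexed by the state features. The grid size and the four orientations come from the example above; the specific tool names are made up for illustration.

```python
import numpy as np

DIRECTIONS = ["north", "south", "east", "west"]
TOOLS = ["nothing", "torch", "rope", "key", "shovel"]   # hypothetical tool set

# One table entry per state: 4 x 4 x 4 x 5 = 320 in total.
V = np.zeros((4, 4, len(DIRECTIONS), len(TOOLS)))

def value(row, col, direction, tool):
    """Look up the stored value of a state, e.g. V[1, 2, north, torch]."""
    return V[row, col, DIRECTIONS.index(direction), TOOLS.index(tool)]

print(V.size)                          # 320 table entries
print(value(1, 2, "north", "torch"))   # 0.0 until the agent has learned something
```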
Example
Let's also consider the game of chess.

The configuration of the board after each move is a state. The estimated
number of states is about 10^120!

One representation of a state in chess could be:

V[black pawn, black rook, …, none, none, …, white queen, white bishop]

where each dimension represents a square on the board and its value is
one of the black or white pieces, or none.
Example
So once we have the set of states, we can assign a state value to each of them.

Needless to say, the amount of memory needed to store this number of states
is huge, and the amount of time needed to compute the value of each state
is also prohibitive.

This naturally pushes us to find better and more suitable solutions.


Solution
It is always useful to keep in mind what we are trying to do, because
with all the details we might lose sight of it.

The idea is that we want to find the value of each state/action in an
environment, so that the agent follows the optimal path that collects
the maximum reward.

In the previous section we showed that when the state space
becomes too large, tabular methods become insufficient and
unsuitable.
Solution
To address this shortcoming, we can adopt a new approach based
on the features of each state. The aim is to use this set of features to
generalise the estimation of the value to states that have similar features.
We use the word estimation to indicate that this approach will never find
the true value of a state, only an approximation of it. Despite this
seemingly inconvenient result, it achieves faster computation and much
better generalisation.
The methods that compute these approximations are called Function
Approximators.
Function Approximation
We have seen that our estimates of value functions are represented as a
table with one entry for each state or for each state–action pair.

This is a particularly clear and instructive case, but of course it is limited to
problems with small numbers of states and actions.

The problem is not just the memory needed for large tables, but the time
and data needed to fill them accurately.

In other words, the key issue is that of generalization.

How can experience with a limited subset of the state space be usefully
generalized to produce a good approximation over a much larger subset?
Function Approximation
In many tasks to which we would like to apply reinforcement learning, most states
encountered will never have been experienced exactly before.
This will almost always be the case when the state or action spaces include continuous
variables or complex sensations, such as a visual image.

The only way to learn anything at all on these tasks is to generalize from previously
experienced states to ones that have never been seen.

We need to combine reinforcement learning methods with existing generalization
methods.

The kind of generalization we require is often called function approximation because it
takes examples from a desired function (e.g., a value function) and attempts to
generalize from them to construct an approximation of the entire function.
Function Approximation
A function is just a mapping from inputs to
outputs, and these can take many forms.
Let’s say you want to train a machine learning
model that predicts a person’s clothing size.
The inputs are the person’s height, weight, and
age. The output is the size.
What we are trying to do is produce a function
that converts a person’s height/weight/age
combination (a triple of numbers) into a size
(perhaps a continuous scalar value or a
classification like XS, S, M, L, XL).
Function Approximation
Using machine learning, we can do this with the following steps:

1. Gather data that is representative of the population (height/weight/age numbers for a large
number of people, paired with their actual clothing sizes).

2. Train a model to approximate the function that maps the inputs to the outputs of your training
data.

3. Test your model on unseen data: give it height/weight/age numbers for new people and hopefully
it will produce an accurate clothing size!

Training a model would be easy if the clothing size were just a linear combination of the input
variables.

A simple linear regression could get you good values for a, b, c, and d in the following equation:

size = a*height + b*weight + c*age + d
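A minimal sketch of that fit, assuming we already have a small table of (height, weight, age, size) measurements; the numbers below are invented purely for illustration.

```python
import numpy as np

# Toy training data: height (cm), weight (kg), age (years) -> clothing size (illustrative only).
X = np.array([[170.0, 65.0, 30.0],
              [180.0, 80.0, 45.0],
              [160.0, 55.0, 25.0],
              [175.0, 72.0, 35.0]])
sizes = np.array([40.0, 44.0, 38.0, 42.0])

# Append a column of ones so the model is size = a*height + b*weight + c*age + d.
X1 = np.hstack([X, np.ones((len(X), 1))])

# Least-squares estimates of a, b, c, d.
a, b, c, d = np.linalg.lstsq(X1, sizes, rcond=None)[0]

# Predict the size of an unseen person.
print(a * 172 + b * 68 + c * 28 + d)
```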


Function Approximation
However, we cannot assume, in general, that an output is a
linear combination of the input variables.

Conditions in real life are complicated. Rules have
exceptions and special cases.

Examples like handwriting recognition and image
classification clearly require very complicated patterns to be
learned from high-dimensional input data.

Wouldn't it be great if there were a way to
approximate any function?

According to the Universal Approximation Theorem, a neural
network with a single hidden layer (with enough hidden units) can do exactly that.
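For intuition only, here is a minimal sketch of such a single-hidden-layer network in NumPy; the layer width and the random (untrained) weights are placeholders, not a claim about how it should be trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# One hidden layer of 32 units; in practice the weights would be learned, e.g. by gradient descent.
W1 = rng.normal(size=(3, 32))   # 3 inputs: height, weight, age
b1 = np.zeros(32)
W2 = rng.normal(size=(32, 1))   # 1 output: predicted size
b2 = np.zeros(1)

def predict(x):
    """y = W2 . sigma(W1 . x + b1) + b2 -- the functional form covered by the theorem."""
    h = np.tanh(x @ W1 + b1)    # non-linear hidden layer
    return h @ W2 + b2

print(predict(np.array([170.0, 65.0, 30.0])))
```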
Function Approximators
Function approximators are used in reinforcement learning (RL) to
estimate the value function or the policy when the state or action
spaces are too large or continuous to be represented with tabular
methods.
● Value Function Approximation

● Policy Approximation

There are many function approximators, such as:


● Linear combinations of features
● Neural networks
Advantages of Function Approximators
● Ability to handle large or continuous state/action spaces.

● Generalization to unseen states by exploiting similarities between states.

● Compact representation of the value function or policy.


Disadvantages of Function Approximators
● Potential instability or divergence during learning.

● Approximation errors due to the limited capacity of the function approximator.

● Need for careful feature engineering or representation learning.


Types of Function Approximators
● Linear Combinations of Features

● Artificial Neural Networks

● Kernel Methods

● Decision Trees and Ensembles

The choice of function approximator depends on factors like the nature of the problem, the available data,
and the trade-off between modeling capacity and computational complexity.
Gradient Descent Methods
● Gradient descent methods are widely used for function
approximation, particularly in machine learning and optimization tasks.

● Objective function: 𝐽(𝜃), the quantity we want to minimize.

● Parameters: 𝜃, the parameters of the approximator.

● Gradient descent: minimize the objective function by iterating
○ Gradient calculation: compute ∇𝐽(𝜃)
○ Update rule: 𝜃 ← 𝜃 − 𝛼 ∇𝐽(𝜃)

where ∇𝐽(𝜃) represents the gradient of the objective function and
𝛼 is the learning rate.
● Convergence: repeat the updates until 𝐽(𝜃) (or 𝜃) stops changing appreciably.
Types of Gradient Descent
● Batch Gradient Descent:
○ Computes the gradient of the objective function using the entire dataset.
○ Slow for large datasets, but the updates are stable (and converge to the global minimum when the objective is convex).

● Stochastic Gradient Descent (SGD):
○ Computes the gradient using only one sample at a time.
○ Faster, but noisy and may oscillate around the minimum.

● Mini-batch Gradient Descent:
○ Computes the gradient using a small subset of the dataset.
○ Combines the advantages of batch and stochastic gradient descent (see the sketch below).
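A minimal sketch of the three variants, assuming a dataset X, y and the mean-squared-error objective from the clothing-size example; batch_size=None gives batch gradient descent, 1 gives SGD, and anything in between gives mini-batch.

```python
import numpy as np

def gradient(theta, X_batch, y_batch):
    """Gradient of J(theta) = mean((X.theta - y)^2) for a linear model."""
    errors = X_batch @ theta - y_batch
    return 2.0 * X_batch.T @ errors / len(y_batch)

def gradient_descent(X, y, alpha=0.01, epochs=100, batch_size=None):
    """batch_size=None -> batch GD, 1 -> SGD, otherwise mini-batch GD."""
    theta = np.zeros(X.shape[1])
    n = len(y)
    batch_size = batch_size or n
    for _ in range(epochs):
        order = np.random.permutation(n)                   # shuffle samples each epoch
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            theta -= alpha * gradient(theta, X[batch], y[batch])   # theta <- theta - alpha * grad J
    return theta
```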
Linear Parameterization
Let's look at the details of the Linear Function Approximator method.

Recall the recursive (Bellman) formula, which computes the value of a state s based on
the values of the next states s' and so on:

V(s) = 𝔼[ r + 𝛾 V(s') ]

The problem with this formula is that each time we need to compute V(s) we
need to compute the values of all future states. Even worse, if we encounter a
state similar to one that we have already seen in the past, we have no way
of recognizing it.
Linear Parameterization
Also, from the Temporal Difference and Q-learning update formulas, we have a
way of estimating the value of future states:

V(s) ← V(s) + 𝛼 [ r + 𝛾 V(s') − V(s) ]
Q(s, a) ← Q(s, a) + 𝛼 [ r + 𝛾 max_a' Q(s', a') − Q(s, a) ]

We will use these formulas to derive some interesting solutions.

Let's redefine V(s) so that it reflects the features of the state.

Linear Parameterization
The state-value function can be expressed as a weighted sum of its features:

V(s) = W1 . F1(s) + W2 . F2(s) + … + Wn . Fn(s), or in shorter terms:

V(s) = 𝜽ᵀ 𝝫(s)

where

● V(s) is the estimated value function for state s,

● 𝝫(s) is the vector of features at state s, and

● 𝜽ᵀ is the transposed vector of weights applied to the features, in such a
manner that some features are valued more than others at any state s.
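A minimal sketch of this weighted sum, assuming a hypothetical feature function 𝝫(s) loosely based on the grid-world example; the specific features and weights below are illustrative only.

```python
import numpy as np

def features(state):
    """Hypothetical phi(s): each entry F_i(s) describes one aspect of the state."""
    row, col, holding_torch = state
    return np.array([row / 3.0, col / 3.0, float(holding_torch), 1.0])  # last entry acts as a bias feature

theta = np.array([0.5, -0.2, 1.0, 0.1])   # weights W1..Wn; in practice these are learned, not hand-set

def V(state):
    """V(s) = theta^T . phi(s), the linear approximation of the state value."""
    return theta @ features(state)

print(V((1, 2, True)))
```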
Linear Parameterization
However, our problem is that we don't know the true value!

Instead, we compute the difference between the current estimate and a newer
estimate of the target. After each iteration we have a new estimation of the
true value, which makes it feel as if we are aiming at a moving target.

Still, the idea is to keep refining the estimation until the difference
between the two becomes small enough.

It has been proven that this approach has sufficient guarantees of
convergence (in particular, on-policy TD learning with linear function
approximation converges).
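A minimal sketch of that refinement loop for the linear V(s) = 𝜽ᵀ𝝫(s) above, using the standard semi-gradient TD(0) update; the feature vectors in the usage example are placeholders.

```python
import numpy as np

def td0_update(theta, phi_s, reward, phi_next, alpha=0.1, gamma=0.99, done=False):
    """One semi-gradient TD(0) step: nudge theta toward the bootstrapped target r + gamma*V(s')."""
    v_s = theta @ phi_s
    v_next = 0.0 if done else theta @ phi_next
    td_error = reward + gamma * v_next - v_s        # the "moving target" difference
    return theta + alpha * td_error * phi_s         # gradient of V(s) w.r.t. theta is phi(s)

# Usage: after observing a transition (s, r, s'), refine the weights.
theta = np.zeros(4)
theta = td0_update(theta,
                   phi_s=np.array([0.3, 0.6, 1.0, 1.0]),
                   reward=1.0,
                   phi_next=np.array([0.3, 0.9, 1.0, 1.0]))
print(theta)
```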
Advantages of Linear Parameterization

● Generalization: By using feature vectors, the learned value
function or policy can generalize to unseen states or state-action
pairs, as long as they share similar feature representations.

● Scalability: It allows for efficient computation and storage,
especially when the state or action space is large or continuous.

● Interpretability: The learned weights θ can provide insights into
the relative importance of different features in determining the
value function or policy.
Disadvantage of Linear Parameterization

● The inability to represent complex, non-linear functions accurately.

● Solution:
○ non-linear function approximation techniques, like neural networks or decision
trees, may be more suitable.
Policy gradient with function approximation
● Policy gradient methods with function approximation are a class of
reinforcement learning algorithms used to learn policies in
environments with large or continuous state and action spaces.
● These methods aim to directly optimize the policy parameters to
maximize the expected return. The main components are:
1. Policy
2. Objective
3. Policy Gradient Theorem
4. Function Approximation
5. Update Rule
Policy

● The policy is a mapping from states to actions, denoted as
𝜋(𝑎∣𝑠;𝜃), where 𝜃 represents the parameters of the policy.
Objective
● The objective is to maximize the expected return 𝐽(𝜃), the expected
discounted sum of rewards obtained by following the policy:

𝐽(𝜃) = 𝔼_{𝜏∼𝜋_𝜃} [ Σ_{t=0..𝑇} 𝛾^𝑡 𝑟_𝑡 ]

where

𝜏 represents a trajectory generated by the policy 𝜋,
𝑟_𝑡 is the reward at time step 𝑡,
𝑇 is the time horizon, and
𝛾 is the discount factor.
Policy Gradient Theorem
● The policy gradient theorem provides a way to compute the gradient of the
objective function with respect to the policy parameters:

∇_𝜃 𝐽(𝜃) = 𝔼_{𝜏∼𝜋_𝜃} [ Σ_{t=0..𝑇} ∇_𝜃 log 𝜋(𝑎_𝑡∣𝑠_𝑡;𝜃) 𝐺_𝑡 ]

where 𝐺_𝑡 is an estimate of the return at time step 𝑡, often computed from
the rewards obtained in the rest of the trajectory.
Function Approximation
● To represent the policy, we use a parameterized function approximator,
such as a neural network.

● So 𝜋(𝑎∣𝑠;𝜃) becomes a parameterized function, typically denoted
as 𝜋𝜃(𝑠,𝑎).
Update Rule
● We update the policy parameters in the direction of the policy gradient to increase the
expected return:

𝜃 ← 𝜃 + 𝛼 ∇_𝜃 𝐽(𝜃)

where 𝛼 is the learning rate.

The steps for policy gradient with function approximation are:

○ Collect trajectories using the current policy.
○ Compute the policy gradient using the sampled trajectories.
○ Update the policy parameters using gradient ascent.

This iterative process continues until the policy converges to an optimal or near-optimal
policy (see the sketch below).
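A minimal sketch of those steps for a discrete action space, using a softmax policy over linear features and the basic REINFORCE gradient estimator; the environment loop that collects `trajectory` (a list of (phi_s, action, reward) tuples) is assumed, not shown.

```python
import numpy as np

def policy_probs(theta, phi_s):
    """pi(a|s; theta): softmax over one linear preference per action; theta has shape (n_actions, n_features)."""
    prefs = theta @ phi_s
    exp = np.exp(prefs - prefs.max())     # subtract max for numerical stability
    return exp / exp.sum()

def reinforce_update(theta, trajectory, alpha=0.01, gamma=0.99):
    """One gradient-ascent step computed from a single sampled trajectory."""
    G = 0.0
    grad = np.zeros_like(theta)
    # Walk the trajectory backwards to accumulate the discounted return G_t.
    for phi_s, action, reward in reversed(trajectory):
        G = reward + gamma * G
        probs = policy_probs(theta, phi_s)
        one_hot = np.zeros(len(probs))
        one_hot[action] = 1.0
        # grad of log pi(a|s) for a softmax-linear policy: (one_hot(a) - probs) outer phi(s).
        grad += G * np.outer(one_hot - probs, phi_s)
    return theta + alpha * grad           # gradient ascent: theta <- theta + alpha * grad J

# Usage (with a collected trajectory):
# theta = reinforce_update(theta, trajectory)
```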
References
Chapter 4 from Sutton and Barto

https://web.stanford.edu/class/cs234/CS234Win2020/slides/lecture5_post.pdf

https://towardsdatascience.com/function-approximation-in-reinforcement-learning-85a4864d566

https://www.youtube.com/watch?v=gqSuPgrcVx8&list=PLEAYkSg4uSQ0Hkv_1LHlJtC_wqwVu6RQX&index=39&ab_channel=ReinforcementLearning

https://www.youtube.com/watch?v=c4cvheE3diA&list=PLEAYkSg4uSQ0Hkv_1LHlJtC_wqwVu6RQX&index=40&ab_channel=ReinforcementLearning
