Function Approximation
Module-6
Table of Contents
● Function Approximation
● Drawbacks of tabular implementation
● Gradient Descent Methods
● Linear parameterization
● Policy gradient with function approximation
Drawbacks of Tabular Representation
1. Does not learn anything about how different states and actions relate
to each other.
2. Curse of Dimensionality
3. Lack of Generalization
4. Inefficient Learning
5. Scalability Issues
6. Lack of Flexibility
What to do when state and action spaces explode… literally?
Example
Let’s take a concrete example. Suppose an agent is in a 4x4 grid, so the location of the
agent on the grid is a feature. This gives 16 different locations, meaning 16
different states.
Example
But that’s not all: suppose the orientation (north, south, east, west)
is also a feature.
This gives 4 possibilities for each location, which brings the
number of states to 16*4 = 64. Furthermore, if the agent can
use 5 different tools (including the “no tool” case), the
number of states grows to 64 * 5 = 320.
Example
One way to represent these states is by creating a multidimensional
array such as V[row, column, direction, tool]. We can then query or
update the value stored for any state.
For example V[1, 2, north, torch] represents the state where the agent
is at row 1, column 2, looking north and holding a torch. The value
inside this array cell tells how valuable this state is.
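To make this concrete, here is a minimal sketch in Python of such a table, under the assumptions of the example above (the tool names other than “torch” and “no tool” are invented for illustration):

```python
import numpy as np

# Minimal sketch of the tabular representation V[row, column, direction, tool].
N_ROWS, N_COLS = 4, 4
DIRECTIONS = ["north", "south", "east", "west"]
# 5 tools including the "no tool" case; only "torch" appears in the example,
# the other names are placeholders.
TOOLS = ["none", "torch", "rope", "key", "shovel"]

# One entry per state: 4 * 4 * 4 * 5 = 320 stored values.
V = np.zeros((N_ROWS, N_COLS, len(DIRECTIONS), len(TOOLS)))

# Query the value of "row 1, column 2, looking north, holding a torch".
value = V[1, 2, DIRECTIONS.index("north"), TOOLS.index("torch")]
print(V.size, value)   # 320 0.0
```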
Now consider chess: the configuration of the board after each move is a state.
The number of such states is astronomically large; Shannon estimated the complexity of the game at about 10^120!
V[black pawn, black rook, …, none, none, …, white queen, white bishop]
where each dimension represents a square on the board and its value is
one of the black or white pieces, or none.
Example
Once we have the set of states, we can assign a value to each state
through the state-value function.
In the previous section we showed that when the state space
becomes too large, tabular methods become insufficient and
unsuitable.
Solution
To address this shortcoming, we can adopt a new approach based
on the features of each state. The aim is to use this set of features to
generalise the estimate of the value to states that have similar features.
We use the word estimate to indicate that this approach will never find
the true value of a state, only an approximation of it. Despite this
seemingly inconvenient result, it achieves faster computation and much
better generalisation.
The methods that compute these approximations are called Function
Approximators.
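As a minimal sketch of this idea (the specific features below are invented for the grid example, not taken from the slides), a state can be described by a small feature vector and its value estimated as a weighted sum of those features:

```python
import numpy as np

# Linear function approximation: v_hat(s) = theta . x(s).
# The features are hypothetical hand-crafted descriptors of a grid state.
def features(row, col, facing_north, has_torch):
    return np.array([
        1.0,                      # bias term
        row / 3.0,                # normalised row position
        col / 3.0,                # normalised column position
        1.0 if facing_north else 0.0,
        1.0 if has_torch else 0.0,
    ])

theta = np.zeros(5)               # one weight per feature, not one per state

def v_hat(row, col, facing_north, has_torch):
    return theta @ features(row, col, facing_north, has_torch)

# States with similar features now share information through theta,
# so learning about one state generalises to its neighbours.
print(v_hat(1, 2, True, True), v_hat(1, 3, True, True))
```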
Function Approximation
We have seen that our estimates of value functions are represented as a
table with one entry for each state or for each state–action pair.
The problem is not just the memory needed for large tables, but the time
and data needed to fill them accurately.
How can experience with a limited subset of the state space be usefully
generalized to produce a good approximation over a much larger subset?
Function Approximation
In many tasks to which we would like to apply reinforcement learning, most states
encountered will never have been experienced exactly before.
This will almost always be the case when the state or action spaces include continuous
variables or complex sensations, such as a visual image.
The only way to learn anything at all on these tasks is to generalize from previously
experienced states to ones that have never been seen.
As an analogy, consider predicting a person’s clothing size from their height, weight, and age:
1. Gather data that is representative of the population (height/weight/age numbers for a large
number of people, paired with their actual clothing size).
2. Train a model to approximate the function that maps the inputs to the outputs of your training
data.
3. Test your model on unseen data: give it height/weight/age numbers for new people and hopefully
it will produce an accurate clothing size!
Training a model would be easy if the clothing size were just a linear combination of the input
variables.
A simple linear regression could get you good values for a, b, c, and d in the following equation:
size = a * height + b * weight + c * age + d
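As a small illustration (the data below is synthetic, made up purely so there is something to fit), ordinary least squares recovers such coefficients directly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
height = rng.uniform(150, 200, n)          # cm
weight = rng.uniform(45, 110, n)           # kg
age = rng.uniform(18, 70, n)               # years
# Synthetic "true" sizes with a little noise (coefficients are invented).
size = 0.08 * height + 0.15 * weight + 0.01 * age - 5 + rng.normal(0, 0.5, n)

# Fit size ~ a*height + b*weight + c*age + d by ordinary least squares.
X = np.column_stack([height, weight, age, np.ones(n)])   # last column gives d
coeffs, *_ = np.linalg.lstsq(X, size, rcond=None)        # [a, b, c, d]

new_person = np.array([172, 68, 30, 1])                  # unseen input
print("predicted size:", new_person @ coeffs)
```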
Many different function approximators can be used, for example:
● Policy Approximation
● Kernel Methods
The choice of function approximator depends on factors like the nature of the problem, the available data,
and the trade-off between modeling capacity and computational complexity.
Gradient Descent Methods
● Gradient descent methods are widely used for function
approximation, particularly in machine learning and optimization.
● Parameters: the approximate function is described by a parameter vector θ.
● Gradient Descent: adjust θ to minimize an objective function, e.g. the mean
squared error between the estimates and their targets.
○ Gradient Calculation: compute the gradient of the objective with respect to θ.
○ Update Rule: take a small step against the gradient, θ ← θ − α∇θJ(θ).
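As one concrete instance of these bullets, here is a hedged sketch of stochastic gradient descent on the squared value error, using Monte Carlo returns as targets and a linear value function (the feature mapping and the toy episode are stand-ins, not part of the slides):

```python
import numpy as np

def features(state, n_features=8):
    """Hypothetical feature mapping x(s); in practice this is task-specific."""
    rng = np.random.default_rng(state)        # deterministic features per state
    return rng.standard_normal(n_features)

def gradient_mc_update(theta, episode, alpha=0.01, gamma=0.99):
    """One pass of gradient Monte Carlo prediction over an episode.

    episode: list of (state, reward) pairs from one rollout.
    """
    G = 0.0
    # Work backwards so G accumulates the discounted return from each state.
    for state, reward in reversed(episode):
        G = reward + gamma * G
        x = features(state)
        # Update rule: theta <- theta + alpha * [target - estimate] * grad v_hat
        theta = theta + alpha * (G - theta @ x) * x
    return theta

theta = np.zeros(8)
episode = [(0, 0.0), (1, 0.0), (2, 1.0)]      # toy trajectory
theta = gradient_mc_update(theta, episode)
```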
The problem with targets based on the full return is that each time we need
a value for V(s) we have to look ahead over all future states and rewards.
Even worse, if at some time we encounter a state similar to one that we have
already seen in the past, we have no way of recognizing it.
Linear Parameterization
From the Temporal Difference and Q-learning formulas we also have a
way of estimating what the future beyond a state is worth: the target
r + γ·V(s′) (or r + γ·max_a Q(s′, a) in the Q-learning case).
Instead of the full return, we compute the difference between this target and
our current estimate. After each iteration we have a new estimate of the
true value, which makes it appear as if we are aiming at a moving target.
Still, the idea is to keep refining the estimate until the difference
between the two becomes small enough.
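A minimal sketch of this “moving target” refinement, assuming a linear value function and a hypothetical feature mapping (semi-gradient TD(0), one common way to realise the idea):

```python
import numpy as np

def features(state, n_features=8):
    rng = np.random.default_rng(state)        # stand-in feature mapping x(s)
    return rng.standard_normal(n_features)

def td0_update(theta, s, r, s_next, done, alpha=0.05, gamma=0.99):
    x, x_next = features(s), features(s_next)
    v = theta @ x
    v_next = 0.0 if done else theta @ x_next
    # The target r + gamma * v_next itself depends on theta: a moving target.
    td_error = r + gamma * v_next - v
    # Keep refining theta until the TD error becomes small enough.
    return theta + alpha * td_error * x

theta = np.zeros(8)
s, r, s_next, done = 0, 0.0, 1, False         # one toy transition
theta = td0_update(theta, s, r, s_next, done)
```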
● Solution (when a linear model is not expressive enough):
○ non-linear function approximation techniques, like neural networks or decision
trees, may be more suitable.
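To show what a non-linear approximator can look like, here is a from-scratch sketch of a tiny one-hidden-layer network used as a value estimator (architecture, sizes, and targets are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 5, 16
W1 = rng.standard_normal((n_hidden, n_in)) * 0.1   # first-layer weights
b1 = np.zeros(n_hidden)
w2 = rng.standard_normal(n_hidden) * 0.1           # output weights
b2 = 0.0

def v_hat(x):
    """Non-linear value estimate v_hat(s) = w2 . tanh(W1 x + b1) + b2."""
    h = np.tanh(W1 @ x + b1)
    return w2 @ h + b2, h

def sgd_step(x, target, alpha=0.01):
    """One gradient step on 0.5 * (target - v_hat(x))**2, gradients by hand."""
    global W1, b1, w2, b2
    v, h = v_hat(x)
    err = target - v
    dh = err * w2 * (1 - h ** 2)      # backpropagate through tanh (uses old w2)
    w2 = w2 + alpha * err * h
    b2 = b2 + alpha * err
    W1 = W1 + alpha * np.outer(dh, x)
    b1 = b1 + alpha * dh

x = rng.standard_normal(n_in)         # a made-up feature vector for one state
sgd_step(x, target=1.0)               # target could be a return or a TD target
```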
Policy gradient with function approximation
● Policy gradient methods with function approximation are a class of
reinforcement learning algorithms used to learn policies in
environments with large or continuous state and action spaces.
● These methods aim to directly optimize the policy parameters to
maximize the expected return.
1. Policy
2. Objective
3. Policy Gradient Theorem (stated below)
4. Function Approximation
5. Update Rule
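For reference, a standard statement of the policy gradient theorem (the textbook form from Sutton and Barto) is:

```latex
\nabla_\theta J(\theta) \;\propto\;
\mathbb{E}_{\pi}\!\left[\, \nabla_\theta \log \pi(A_t \mid S_t, \theta)\; Q^{\pi}(S_t, A_t) \right]
```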
Policy
The agent’s behaviour is represented directly by a parameterised policy π(a | s, θ),
which gives the probability of selecting action a in state s. The parameters θ are
updated repeatedly along the estimated gradient of the expected return; this iterative
process continues until the policy converges to an optimal or near-optimal
policy.
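As one concrete (and simplified) instance of such a method, here is a sketch of a REINFORCE-style update with a softmax policy over linear action preferences; the feature vectors, episode, and sizes are all illustrative assumptions:

```python
import numpy as np

n_actions, n_features = 3, 4
theta = np.zeros((n_actions, n_features))      # policy parameters

def softmax_policy(x, theta):
    """pi(a | s, theta) for a softmax over linear action preferences."""
    prefs = theta @ x
    prefs = prefs - prefs.max()                # numerical stability
    p = np.exp(prefs)
    return p / p.sum()

def reinforce_update(theta, episode, alpha=0.01, gamma=0.99):
    """episode: list of (x, action, reward), where x is a state feature vector."""
    G = 0.0
    for x, a, r in reversed(episode):
        G = r + gamma * G                      # return following this step
        pi = softmax_policy(x, theta)
        # grad log pi(a | x, theta) for the softmax-linear policy:
        grad_log = -np.outer(pi, x)
        grad_log[a] += x
        theta = theta + alpha * G * grad_log   # ascend the expected return
    return theta

x = np.ones(n_features)
episode = [(x, 0, 0.0), (x, 2, 1.0)]           # toy two-step trajectory
theta = reinforce_update(theta, episode)
```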
References
Sutton and Barto, Reinforcement Learning: An Introduction, Chapter 9 (On-policy Prediction with Approximation) and Chapter 13 (Policy Gradient Methods)
https://web.stanford.edu/class/cs234/CS234Win2020/slides/lecture5_post.pdf
https://towardsdatascience.com/function-approximation-in-reinforcement-learning-85a4864d566
https://www.youtube.com/watch?v=gqSuPgrcVx8&list=PLEAYkSg4uSQ0Hkv_1LHlJtC_wqwVu6RQX&index=39&ab_channel=ReinforcementLearning
https://www.youtube.com/watch?v=c4cvheE3diA&list=PLEAYkSg4uSQ0Hkv_1LHlJtC_wqwVu6RQX&index=40&ab_channel=ReinforcementLearning