Function Approximation

Module-6
Table of Contents

● Function Approximation
● Drawbacks of tabular implementation
● Gradient Descent Methods
● Linear parameterization
● Policy gradient with function approximation
Drawbacks of Tabular Representation
1. Does not learn anything about how different states and actions relate
to each other.
2. Curse of Dimensionality

3. Lack of Generalization

4. Inefficient Learning

5. Scalability Issues

6. Lack of Flexibility
What to do when state and action spaces explode… literally?
Example
Let's take a concrete example. Suppose an agent is in a 4x4 grid, so the location of the
agent on the grid is a feature. This gives 16 different locations, meaning 16
different states.
Example
But that's not all: suppose the orientation (north, south, east, west)
is also a feature.
This gives 4 possibilities for each location, which brings the
number of states to 16 * 4 = 64. Furthermore, if the agent can use
5 different tools (including the "no tool" case), the number of
states grows to 64 * 5 = 320.
Example
One way to represent those states is to create a multidimensional
array such as V[row, column, direction, tool]. Then we can either query or
update the value of a state.

For example, V[1, 2, north, torch] represents the state where the agent
is at row 1, column 2, looking north and holding a torch. The value
stored in this array cell tells us how valuable this state is.

Another example would be V[4, 1, west, nothing], which is an agent at the
4th row, 1st column, heading west and holding nothing.
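As a minimal sketch, such a table could be stored as a NumPy array indexed by the state features. The grid size and the four orientations come from the example above; the specific tool names are made up for illustration.

```python
import numpy as np

DIRECTIONS = ["north", "south", "east", "west"]
TOOLS = ["nothing", "torch", "rope", "key", "shovel"]   # hypothetical tool set

# One table entry per state: 4 x 4 x 4 x 5 = 320 in total.
V = np.zeros((4, 4, len(DIRECTIONS), len(TOOLS)))

def value(row, col, direction, tool):
    """Look up the stored value of a state, e.g. V[1, 2, north, torch]."""
    return V[row, col, DIRECTIONS.index(direction), TOOLS.index(tool)]

print(V.size)                          # 320 table entries
print(value(1, 2, "north", "torch"))   # 0.0 until the agent has learned something
```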
Example
Let's also consider the game of chess.

The configuration of the board after each move is a state. The estimated
number of states is about 10^120!

One representation of a state in chess could be:

V[black pawn, black rook, …, none, none, …, white queen, white bishop]

where each dimension represents a square on the board and its value is
one of the black or white pieces, or none.
Example
So once we have the set of states, we can assign a state value to each of them.

Needless to say, the amount of memory needed to store this number of states
is huge, and the amount of time needed to compute the value of each state
is also prohibitive.

This naturally pushes us to find better and more suitable solutions.


Solution
It is always useful to keep in mind what we are trying to do, because
with all the details we might lose sight of it.

The idea is that we want to find the value of each state/action in an
environment, so that the agent follows the optimal path that collects
the maximum reward.

In the previous section we showed that when the state space
becomes too large, tabular methods become insufficient and
unsuitable.
Solution
To address this shortcoming, we can adopt a new approach based
on the features of each state. The aim is to use this set of features to
generalise the estimation of the value to states that have similar features.
We use the word estimation to indicate that this approach will never find
the true value of a state, only an approximation of it. Despite this
seemingly inconvenient result, it achieves faster computation and much
better generalisation.
The methods that compute these approximations are called Function
Approximators.
Function Approximation
We have seen that our estimates of value functions are represented as a
table with one entry for each state or for each state–action pair.

This is a particularly clear and instructive case, but of course it is limited to
problems with small numbers of states and actions.

The problem is not just the memory needed for large tables, but the time
and data needed to fill them accurately.

In other words, the key issue is that of generalization.

How can experience with a limited subset of the state space be usefully
generalized to produce a good approximation over a much larger subset?
Function Approximation
In many tasks to which we would like to apply reinforcement learning, most states
encountered will never have been experienced exactly before.
This will almost always be the case when the state or action spaces include continuous
variables or complex sensations, such as a visual image.

The only way to learn anything at all on these tasks is to generalize from previously
experienced states to ones that have never been seen.

We need to combine reinforcement learning methods with existing generalization
methods.

The kind of generalization we require is often called function approximation because it
takes examples from a desired function (e.g., a value function) and attempts to
generalize from them to construct an approximation of the entire function.
Function Approximation
A function is just a mapping from inputs to
outputs, and these can take many forms.
Let’s say you want to train a machine learning
model that predicts a person’s clothing size.
The inputs are the person’s height, weight, and
age. The output is the size.
What we are trying to do is produce a function
that converts a person’s height/weight/age
combination (a triple of numbers) into a size
(perhaps a continuous scalar value or a
classification like XS, S, M, L, XL).
Function Approximation
Using machine learning, we can do this with the following steps:

1. Gather data that is representative of the population (height/weight/age numbers for a large
number of people, paired with their actual clothing sizes).

2. Train a model to approximate the function that maps the inputs to the outputs of your training
data.

3. Test your model on unseen data: give it height/weight/age numbers for new people and hopefully
it will produce an accurate clothing size!

Training a model would be easy if the clothing size were just a linear combination of the input
variables.

A simple linear regression could get you good values for a, b, c, and d in the following equation:

size = a*height + b*weight + c*age + d
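A minimal sketch of that fit, assuming we already have a small table of (height, weight, age, size) measurements; the numbers below are invented purely for illustration.

```python
import numpy as np

# Toy training data: height (cm), weight (kg), age (years) -> clothing size (illustrative only).
X = np.array([[170.0, 65.0, 30.0],
              [180.0, 80.0, 45.0],
              [160.0, 55.0, 25.0],
              [175.0, 72.0, 35.0]])
sizes = np.array([40.0, 44.0, 38.0, 42.0])

# Append a column of ones so the model is size = a*height + b*weight + c*age + d.
X1 = np.hstack([X, np.ones((len(X), 1))])

# Least-squares estimates of a, b, c, d.
a, b, c, d = np.linalg.lstsq(X1, sizes, rcond=None)[0]

# Predict the size of an unseen person.
print(a * 172 + b * 68 + c * 28 + d)
```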


Function Approximation
However, we cannot assume, in general, that an output is a
linear combination of the input variables.

Conditions in real life are complicated. Rules have
exceptions and special cases.

Examples like handwriting recognition and image
classification clearly require very complicated patterns to be
learned from high-dimensional input data.

Wouldn't it be great if there were a way to
approximate any function?

According to the Universal Approximation Theorem, a neural
network with a single hidden layer (with enough hidden units) can do exactly that.
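For intuition only, here is a minimal sketch of such a single-hidden-layer network in NumPy; the layer width and the random (untrained) weights are placeholders, not a claim about how it should be trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# One hidden layer of 32 units; in practice the weights would be learned, e.g. by gradient descent.
W1 = rng.normal(size=(3, 32))   # 3 inputs: height, weight, age
b1 = np.zeros(32)
W2 = rng.normal(size=(32, 1))   # 1 output: predicted size
b2 = np.zeros(1)

def predict(x):
    """y = W2 . sigma(W1 . x + b1) + b2 -- the functional form covered by the theorem."""
    h = np.tanh(x @ W1 + b1)    # non-linear hidden layer
    return h @ W2 + b2

print(predict(np.array([170.0, 65.0, 30.0])))
```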
Function Approximators
Function approximators are used in reinforcement learning (RL) to
estimate the value function or the policy when the state or action
spaces are too large or continuous to be represented with tabular
methods.
● Value Function Approximation

● Policy Approximation

There are many function approximators, such as:


● Linear combinations of features
● Neural networks
Advantages of Function Approximators
● Ability to handle large or continuous state/action spaces.

● Generalization to unseen states by exploiting similarities between states.

● Compact representation of the value function or policy.


Disadvantages of Function Approximators
● Potential instability or divergence during learning.

● Approximation errors due to the limited capacity of the function approximator.

● Need for careful feature engineering or representation learning.


Types of Function Approximators
● Linear Combinations of Features

● Artificial Neural Networks

● Kernel Methods

● Decision Trees and Ensembles

The choice of function approximator depends on factors like the nature of the problem, the available data,
and the trade-off between modeling capacity and computational complexity.
Gradient Descent Methods
● Gradient descent methods are widely used for function
approximation, particularly in machine learning and optimization tasks.

● Objective function: 𝐽(𝜃), the quantity we want to minimize.

● Parameters: 𝜃, the parameters of the approximator.

● Gradient descent: minimize the objective function by iterating
○ Gradient calculation: compute ∇𝐽(𝜃)
○ Update rule: 𝜃 ← 𝜃 − 𝛼 ∇𝐽(𝜃)

where ∇𝐽(𝜃) represents the gradient of the objective function and
𝛼 is the learning rate.
● Convergence: repeat the updates until 𝐽(𝜃) (or 𝜃) stops changing appreciably.
Types of Gradient Descent
● Batch Gradient Descent:
○ Computes the gradient of the objective function using the entire dataset.
○ Slow for large datasets, but the updates are stable (and converge to the global minimum when the objective is convex).

● Stochastic Gradient Descent (SGD):
○ Computes the gradient using only one sample at a time.
○ Faster, but noisy and may oscillate around the minimum.

● Mini-batch Gradient Descent:
○ Computes the gradient using a small subset of the dataset.
○ Combines the advantages of batch and stochastic gradient descent (see the sketch below).
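A minimal sketch of the three variants, assuming a dataset X, y and the mean-squared-error objective from the clothing-size example; batch_size=None gives batch gradient descent, 1 gives SGD, and anything in between gives mini-batch.

```python
import numpy as np

def gradient(theta, X_batch, y_batch):
    """Gradient of J(theta) = mean((X.theta - y)^2) for a linear model."""
    errors = X_batch @ theta - y_batch
    return 2.0 * X_batch.T @ errors / len(y_batch)

def gradient_descent(X, y, alpha=0.01, epochs=100, batch_size=None):
    """batch_size=None -> batch GD, 1 -> SGD, otherwise mini-batch GD."""
    theta = np.zeros(X.shape[1])
    n = len(y)
    batch_size = batch_size or n
    for _ in range(epochs):
        order = np.random.permutation(n)                   # shuffle samples each epoch
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            theta -= alpha * gradient(theta, X[batch], y[batch])   # theta <- theta - alpha * grad J
    return theta
```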
Linear Parameterization
Let's look at the details of the Linear Function Approximator method.

Recall the recursive (Bellman) formula, which computes the value of a state s based on
the values of the next states s' and so on:

V(s) = 𝔼[ r + 𝛾 V(s') ]

The problem with this formula is that each time we need to compute V(s) we
need to compute the values of all future states. Even worse, if we encounter a
state similar to one that we have already seen in the past, we have no way
of recognizing it.
Linear Parameterization
Also, from the Temporal Difference and Q-learning update formulas, we have a
way of estimating the value of future states:

V(s) ← V(s) + 𝛼 [ r + 𝛾 V(s') − V(s) ]
Q(s, a) ← Q(s, a) + 𝛼 [ r + 𝛾 max_a' Q(s', a') − Q(s, a) ]

We will use these formulas to derive some interesting solutions.

Let's redefine V(s) so that it reflects the features of the state.

Linear Parameterization
The state-value function can be expressed as a weighted sum of its features:

V(s) = W1 . F1(s) + W2 . F2(s) + … + Wn . Fn(s), or in shorter terms:

V(s) = 𝜽ᵀ 𝝫(s)

where

● V(s) is the estimated value function for state s,

● 𝝫(s) is the vector of features at state s, and

● 𝜽ᵀ is the transposed vector of weights applied to the features, in such a
manner that some features are valued more than others at any state s.
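A minimal sketch of this weighted sum, assuming a hypothetical feature function 𝝫(s) loosely based on the grid-world example; the specific features and weights below are illustrative only.

```python
import numpy as np

def features(state):
    """Hypothetical phi(s): each entry F_i(s) describes one aspect of the state."""
    row, col, holding_torch = state
    return np.array([row / 3.0, col / 3.0, float(holding_torch), 1.0])  # last entry acts as a bias feature

theta = np.array([0.5, -0.2, 1.0, 0.1])   # weights W1..Wn; in practice these are learned, not hand-set

def V(state):
    """V(s) = theta^T . phi(s), the linear approximation of the state value."""
    return theta @ features(state)

print(V((1, 2, True)))
```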
Linear Parameterization
However, our problem is that we don't know the true value!

Instead, we compute the difference between the current estimate and a newer
estimate of the target. After each iteration we have a new estimation of the
true value, which makes it feel as if we are aiming at a moving target.

Still, the idea is to keep refining the estimation until the difference
between the two becomes small enough.

It has been proven that this approach has sufficient guarantees of
convergence (in particular, on-policy TD learning with linear function
approximation converges).
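A minimal sketch of that refinement loop for the linear V(s) = 𝜽ᵀ𝝫(s) above, using the standard semi-gradient TD(0) update; the feature vectors in the usage example are placeholders.

```python
import numpy as np

def td0_update(theta, phi_s, reward, phi_next, alpha=0.1, gamma=0.99, done=False):
    """One semi-gradient TD(0) step: nudge theta toward the bootstrapped target r + gamma*V(s')."""
    v_s = theta @ phi_s
    v_next = 0.0 if done else theta @ phi_next
    td_error = reward + gamma * v_next - v_s        # the "moving target" difference
    return theta + alpha * td_error * phi_s         # gradient of V(s) w.r.t. theta is phi(s)

# Usage: after observing a transition (s, r, s'), refine the weights.
theta = np.zeros(4)
theta = td0_update(theta,
                   phi_s=np.array([0.3, 0.6, 1.0, 1.0]),
                   reward=1.0,
                   phi_next=np.array([0.3, 0.9, 1.0, 1.0]))
print(theta)
```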
Advantages of Linear Parameterization

● Generalization: By using feature vectors, the learned value
function or policy can generalize to unseen states or state-action
pairs, as long as they share similar feature representations.

● Scalability: It allows for efficient computation and storage,
especially when the state or action space is large or continuous.

● Interpretability: The learned weights θ can provide insights into
the relative importance of different features in determining the
value function or policy.
Disadvantage of Linear Parameterization

● The inability to represent complex, non-linear functions accurately.

● Solution:
○ non-linear function approximation techniques, like neural networks or decision
trees, may be more suitable.
Policy gradient with function approximation
● Policy gradient methods with function approximation are a class of
reinforcement learning algorithms used to learn policies in
environments with large or continuous state and action spaces.
● These methods aim to directly optimize the policy parameters to
maximize the expected return. The main components are:
1. Policy
2. Objective
3. Policy Gradient Theorem
4. Function Approximation
5. Update Rule
Policy

● The policy is a mapping from states to actions, denoted as
𝜋(𝑎∣𝑠;𝜃), where 𝜃 represents the parameters of the policy.
Objective
● The objective is to maximize the expected return 𝐽(𝜃), the expected
discounted sum of rewards obtained by following the policy:

𝐽(𝜃) = 𝔼_{𝜏∼𝜋_𝜃} [ Σ_{t=0..𝑇} 𝛾^𝑡 𝑟_𝑡 ]

where

𝜏 represents a trajectory generated by the policy 𝜋,
𝑟_𝑡 is the reward at time step 𝑡,
𝑇 is the time horizon, and
𝛾 is the discount factor.
Policy Gradient Theorem
● The policy gradient theorem provides a way to compute the gradient of the
objective function with respect to the policy parameters:

∇_𝜃 𝐽(𝜃) = 𝔼_{𝜏∼𝜋_𝜃} [ Σ_{t=0..𝑇} ∇_𝜃 log 𝜋(𝑎_𝑡∣𝑠_𝑡;𝜃) 𝐺_𝑡 ]

where 𝐺_𝑡 is an estimate of the return at time step 𝑡, often computed from
the rewards obtained in the rest of the trajectory.
Function Approximation
● To represent the policy, we use a parameterized function approximator,
such as a neural network.

● So 𝜋(𝑎∣𝑠;𝜃) becomes a parameterized function, typically denoted
as 𝜋𝜃(𝑠,𝑎).
Update Rule
● We update the policy parameters in the direction of the policy gradient to increase the
expected return:

𝜃 ← 𝜃 + 𝛼 ∇_𝜃 𝐽(𝜃)

where 𝛼 is the learning rate.

The steps for policy gradient with function approximation are:

○ Collect trajectories using the current policy.
○ Compute the policy gradient using the sampled trajectories.
○ Update the policy parameters using gradient ascent.

This iterative process continues until the policy converges to an optimal or near-optimal
policy (see the sketch below).
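A minimal sketch of those steps for a discrete action space, using a softmax policy over linear features and the basic REINFORCE gradient estimator; the environment loop that collects `trajectory` (a list of (phi_s, action, reward) tuples) is assumed, not shown.

```python
import numpy as np

def policy_probs(theta, phi_s):
    """pi(a|s; theta): softmax over one linear preference per action; theta has shape (n_actions, n_features)."""
    prefs = theta @ phi_s
    exp = np.exp(prefs - prefs.max())     # subtract max for numerical stability
    return exp / exp.sum()

def reinforce_update(theta, trajectory, alpha=0.01, gamma=0.99):
    """One gradient-ascent step computed from a single sampled trajectory."""
    G = 0.0
    grad = np.zeros_like(theta)
    # Walk the trajectory backwards to accumulate the discounted return G_t.
    for phi_s, action, reward in reversed(trajectory):
        G = reward + gamma * G
        probs = policy_probs(theta, phi_s)
        one_hot = np.zeros(len(probs))
        one_hot[action] = 1.0
        # grad of log pi(a|s) for a softmax-linear policy: (one_hot(a) - probs) outer phi(s).
        grad += G * np.outer(one_hot - probs, phi_s)
    return theta + alpha * grad           # gradient ascent: theta <- theta + alpha * grad J

# Usage (with a collected trajectory):
# theta = reinforce_update(theta, trajectory)
```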
References
Chapter 4 from Sutton and Barto

https://web.stanford.edu/class/cs234/CS234Win2020/slides/lecture5_post.pdf

https://towardsdatascience.com/function-approximation-in-reinforcement-learning-85a4864d566

https://www.youtube.com/watch?v=gqSuPgrcVx8&list=PLEAYkSg4uSQ0Hkv_1LHlJtC_wqwVu6RQX&index=39&ab_channel=ReinforcementLearning

https://www.youtube.com/watch?v=c4cvheE3diA&list=PLEAYkSg4uSQ0Hkv_1LHlJtC_wqwVu6RQX&index=40&ab_channel=ReinforcementLearning
