Entropy 2019, 21, 257
Article
PID Control as a Process of Active Inference with Linear
Generative Models †
Manuel Baltieri * and Christopher L. Buckley
EASY Group—Sussex Neuroscience, Department of Informatics, University of Sussex, Brighton BN1 9RH, UK;
[email protected]
* Correspondence: [email protected]
† This paper is an extended version of our paper published in From Animals to Animats 15: 15th International
Conference on Simulation of Adaptive Behavior, SAB 2018, Frankfurt/Main, Germany, 14–17 August 2018.
Received: 18 January 2019; Accepted: 3 March 2019; Published: 7 March 2019
Abstract: In the past few decades, probabilistic interpretations of brain functions have become widespread
in cognitive science and neuroscience. In particular, the free energy principle and active inference are
increasingly popular theories of cognitive functions that claim to offer a unified understanding of life
and cognition within a general mathematical framework derived from information and control theory,
and statistical mechanics. However, we argue that if the active inference proposal is to be taken as a
general process theory for biological systems, it is necessary to understand how it relates to existing
control theoretical approaches routinely used to study and explain biological systems. For example,
recently, PID (Proportional-Integral-Derivative) control has been shown to be implemented in simple
molecular systems and is becoming a popular mechanistic explanation of behaviours such as chemotaxis
in bacteria and amoebae, and robust adaptation in biochemical networks. In this work, we will show
how PID controllers can fit a more general theory of life and cognition under the principle of (variational)
free energy minimisation when using approximate linear generative models of the world. This more
general interpretation also provides a new perspective on traditional problems of PID controllers such as
parameter tuning, as well as the need to balance the performance and robustness of a controller. Specifically, we show how these problems can be understood in terms of the optimisation of the
precisions (inverse variances) modulating different prediction errors in the free energy functional.
Keywords: approximate Bayesian inference; active inference; PID control; generalised state-space models;
sensorimotor loops; information theory; control theory
1. Introduction
Probabilistic approaches to the study of living systems and cognition are becoming increasingly
popular in the natural sciences. In particular for the brain sciences, the Bayesian brain hypothesis,
predictive coding, the free energy principle and active inference have been proposed to explain
brain processes including perception, action and higher order cognitive functions [1–8]. According
to these theories, brains, and biological systems more generally, should be thought of as Bayesian
inference machines, since they appear to estimate the latent states of their sensory input in a process
consistent with a Bayesian inference scheme. Given the complexity of exact Bayesian inference, however,
approximate schemes are believed to provide a more concrete hypothesis about the underlying mechanisms.
One candidate scheme is the free energy principle (FEP), which was introduced in [4] and later
elaborated in a series of papers, e.g., [9–11], and has its roots in information theory, control theory,
thermodynamics and statistical mechanics. While initially the theory emerged in the computational [12]
and behavioural/cognitive neurosciences [13,14], over time, further connections with the fields of biological
self-organisation, information theory, optimal control, cybernetics and economics among others, have also
been suggested [10,15–17]. According to the FEP, living systems exist in a limited set of physical states and
thus must minimise the entropy of those physical states (see fluctuation theorems for non-equilibrium
thermodynamics, e.g., [18]). To achieve this, organisms can minimise the informational entropy of their
sensory states, which, under ergodic assumptions, is equivalent to the time average of surprisal (or
self-information) [9]. Surprisal quantifies how improbable an outcome is for a system, i.e., a fish out of
water is in a surprising state. Biological creatures can thus be seen as minimising the surprisal of their
sensations to maintain their existence, e.g., a fish’s observations should be limited to states in water. Since
this surprisal itself is not directly accessible by an agent, variational free energy is proposed as an upper
bound on this quantity which can be minimised in its stead [4,19]. It has also been suggested that cognitive
functions such as perception, action, learning and attention can be accounted for in terms of approximate
Bayesian inference schemes such as the FEP. In particular, according to this hypothesis, perception
can be described using predictive coding models of the cortex. These models describe perception as a
combination of feedforward prediction errors and feedback predictions combined under a generative
model to infer the hidden causes and states of sensory data [2]. More recent work has connected these
ideas to control theory and cybernetics [15,17,20], extending existing accounts of (optimal) motor control
and behaviour [10,13,21,22]. In this view, behaviour is cast as a process of acting on the world to make
sensory data better fit existing predictions, with (optimal) motor control cast as a Bayesian inference
problem. The most recent attempt to unify predictive coding and optimal control theory usually falls
under the name of active inference [10,13].
While in standard accounts of perceptual inference, prediction errors can be suppressed only by
updating predictions of the incoming sensations, in active inference, errors can also be minimised by
directly acting on the environment to change sensory input to better accord with existing predictions [9,13].
If a generative model encodes information about favourable states for an agent, then this process constitutes
a way by which the agent can change its environment to better meet its needs. Thus, under the FEP,
these two processes of error suppression allow a system to both infer the properties of, and control,
the surrounding environment. Most models implementing the FEP and active inference assume that
agents have a deep understanding of their environment and its dynamics in the form of an accurate and
detailed generative model. For instance, in [13,23] the generative model of the agent explicitly mirrors the
generative process of the environment, i.e., the dynamics of the world the agent interacts with. In recent
work, it has been argued that this need not be the case [24–27], especially if we consider simple living
systems with limited resources. We intuitively do not expect an ant to model the entire environment
where it forages, performing complex simulations of the world in its brain (cf. the concept of Umwelt [28]).
When states and parameters in the world change too rapidly, accurate online inference and learning
are implausible [29]. This idea is however common in the literature, e.g., [6,13,14,23], where cognition
and perception are presented as processes of inference to the best explanation, and agents are primarily
thought to build sophisticated models of their worlds with only a secondary role for action and behaviour.
A possible alternative introduces action-oriented models entailing a more parsimonious approach where
only task-relevant information is encoded [24,25]. On this normative view, agents only model a minimal
set of environmental properties, perhaps in the form of sensorimotor contingencies [26], that are necessary
to achieve their goals.
The relationship between information/probability theory and control theory has long been recognised,
with the first intuitions emerging from work by Shannon [30] and Kalman [31]. A unifying view of these
two theoretical frameworks is nowadays proposed for instance in stochastic optimal control [32,33]
and extended in active inference [15], with connections to ideas of sensorimotor loops in biological
systems [11,13]. These connections emphasise homeostasis, regulation and concepts such as set-point
control and negative feedback for the study of different aspects of living systems, with roots in the
cybernetics movement [34,35]. It remains, however, unclear how the active inference formulation directly
relates to more traditional concepts of control theory. PID control, a popular control strategy working with
little prior knowledge of the process to regulate, is commonly applied in engineering [36–38] and more
recently used in biology and neuroscience modelling [39–43]. In this work, we develop an information
theoretic interpretation of PID control, showing how it can be derived in a more general Bayesian (active)
inference framework. We will show that approximate models of the world are often enough for regulation,
and in particular that simple linear generative models that only approximate the true dynamics of the
environment implement PID control as a process of inference. Using this formulation we also propose a
new method for the optimisation of the gains of PID controllers based on the same principle of variational
free energy minimisation, and implemented as a second order optimisation process. Finally, we will
show that our implementation of PID controllers as approximate Bayesian inference lends itself to a
general framework for the formalisation of different (conflicting) criteria in the design of a controller,
the so-called performance-robustness trade-off [38,44], as a cohesive set of constraints implemented in a
free energy functional. In active inference, these criteria will be mapped to precisions, or inverse variances,
of observations and dynamics of a state-space model with a straightforward interpretation in terms of
uncertainty on different variables of a system.
In Section 2 we will introduce PID control and give a brief overview of the recent literature highlighting
the most common design principles used nowadays for PID controllers. The free energy principle will
be presented in Section 3, followed by a complete derivation of PID control as a form of active inference.
In this section we will also propose that the parameters of a PID controller, its gains, can be optimised
following the active inference formulation, which also captures modern design constraints and desiderata
of PID controllers.
2. PID Control
Proportional-Integral-Derivative (PID) control is one of the most popular types of controllers used in
industrial applications, with more than 90% of total controllers implementing PID or PI (no derivative)
regulation [38,45]. It is one of the simplest set-point regulators, whereby a desired state (i.e., set-point,
reference, target) represents the final goal of the regulation process, e.g., to maintain a room temperature of
23 ◦ C. PID controllers are based on closed-loop strategies with a negative feedback mechanism that tracks
the real state of the environment. In the most traditional implementation of negative feedback methods,
the difference between the measured state of the variable to regulate (e.g., the real temperature in a room)
and the target value (e.g., 23 ◦ C) produces a prediction error whose minimisation drives the controller’s
output, e.g., if the temperature is too high, it is decreased and if too low, it is raised. In mathematical terms:
e(t) = y_r − y(t)    (1)

where e(t) is the error, y_r is the reference or set-point (e.g., desired temperature) and y(t) is the observed variable (e.g., the actual room temperature).
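The limitation of pure negative feedback described next can be made concrete with a minimal simulation. The plant, gain and disturbance below are arbitrary illustrative choices, not taken from the paper:

```python
# Proportional-only negative feedback: u = k_p * e, regulating a leaky plant
# dy/dt = -y + u + d towards y_r under a constant, unmodelled disturbance d.
dt, y_r, d, k_p = 0.001, 23.0, 5.0, 10.0
y = 20.0
for _ in range(20000):                 # 20 seconds of simulated time
    e = y_r - y                        # Equation (1)
    u = k_p * e                        # P term only
    y += dt * (-y + u + d)

# The loop settles, but with a residual steady-state offset:
# at equilibrium 0 = -y + k_p (y_r - y) + d, so y = (k_p y_r + d) / (1 + k_p)
print(round(y, 2), round(y_r - y, 2))
```

Raising k_p shrinks this offset but never removes it, which is precisely what the integral term is for.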
This mechanism is, however, unstable in very common conditions, in particular when a steady-state
offset is added (e.g., a sudden and unpredictable change in external conditions affecting the room
temperature which are not under our control), or when fluctuations need to be suppressed (e.g., too
many oscillations while regulating the temperature may be undesirable). PID controllers elegantly deal
with both of these problems by augmenting the standard negative feedback architecture, here called
proportional or P term, with an integral or I and a derivative or D term, see Figure 1. The integral term
accumulates the prediction error over time in order to cancel out errors due to unaccounted steady-state
input, while minimising the derivative of the prediction error leads to a decrease in the amplitude of
fluctuations of the controlled signal. The general form of the control signal u(t) generated by a PID
controller is usually described by:
u(t) = k_p e(t) + k_i ∫_0^t e(τ) dτ + k_d de(t)/dt    (2)
where e(t) is, again, the prediction error and k_p, k_i, k_d are the so-called proportional, integral and derivative gains respectively, a set of parameters used to tune the relative strength of the P, I and D terms of the controller. The popularity of PID controllers is largely due to their simple formulation and implementation. One of the major challenges, on the other hand, lies with the tuning of the parameters k_p, k_i, k_d, which have to be adapted to deal with different (often conflicting) constraints on the regulation process [36,44].
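As an illustration of Equation (2), a discrete-time PID controller can be sketched in a few lines; the plant, gains and time step are arbitrary example choices, not from the paper:

```python
class PID:
    """Discrete-time PID controller implementing Equation (2).

    The integral term is a running sum and the derivative is a
    backward finite difference.
    """
    def __init__(self, k_p, k_i, k_d, dt):
        self.k_p, self.k_i, self.k_d, self.dt = k_p, k_i, k_d, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, y_ref, y):
        e = y_ref - y                          # e(t) = y_r - y(t), Equation (1)
        self.integral += e * self.dt           # running estimate of ∫ e(τ) dτ
        derivative = (e - self.prev_error) / self.dt
        self.prev_error = e
        return self.k_p * e + self.k_i * self.integral + self.k_d * derivative


# Regulate a hypothetical first-order plant dy/dt = -y + u towards y_r = 23
dt = 0.01
pid = PID(k_p=2.0, k_i=1.0, k_d=0.1, dt=dt)
y = 0.0
for _ in range(5000):                          # 50 seconds of simulated time
    u = pid.step(23.0, y)
    y += dt * (-y + u)
```

Unlike the proportional-only loop, the integral term keeps accumulating any residual error, so the plant settles on the set-point itself.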
Figure 1. A PID controller [46]. The prediction error e(t) is given by the difference between a reference signal r(t), y_r in our formulation, and the output y(t) of a process. The different terms, one proportional to the error (P term), one integrating the error over time (I term) and one differentiating it (D term), drive the control signal u(t).
Modern approaches to the design and tuning of PID controllers typically evaluate a number of criteria, including:
• load disturbance response, how a controller reacts to changes in external inputs, e.g., a step input,
• set-point response, how a controller responds to different set-points over time,
• measurement noise response, how noise on the observations impacts the regulation process.
The goal of a general methodology for the design and tuning of PID controllers is to bring together
these (and possibly more) criteria into a formal and tractable framework that can be used for a large
class of problems. One possible example is presented in [48] (see also [50,51] for other attempts). This
methodology is based on the maximisation of the integral gain (equivalent to the minimisation of the
integral of the error from the set-point, see [36]), subject to constraints derived from a frequency domain
analysis related to the Nyquist stability criterion applied to the controlled system [48]. In this work,
we propose our formulation also as a general framework for the design and tuning of PID controllers
leveraging the straightforward interpretation of the performance-robustness trade-off for PID controllers
in terms of uncertainty parameters (i.e., precisions or inverse variances) in the variational free energy.
3. The Free Energy Principle

As anticipated, under the FEP agents are assumed to minimise the surprisal of their sensory observations, defined as:

− ln p(ψ|m)    (3)
where ψ is a set of sensory inputs conditioned on an agent m. Surprisal, in general, can in fact differ from
agent to agent, with states that are favourable for a fish (in water), different from those favourable for a
bird (out of water) (see [52] for a review on the value of information). According to the FEP, agents that
minimise the surprisal of their sensory states over time will also minimise the entropy of their sensations,
thus limiting the number of states they can physically occupy [4,19]. This minimisation is, however,
intractable in any practical scenario since surprisal can be seen as the negative log-model evidence or
negative log-marginal likelihood of observations ψ, with (omitting m for simplicity from now on) the
marginal likelihood or model evidence expressed as:
p(ψ) = ∫_ϑ p(ψ, ϑ) dϑ.    (4)
This integral is defined over all possible hidden variables, ϑ, of observations ψ. In many cases,
the marginalisation is intractable since the latent space of ϑ may be high dimensional or the distribution
may have a complex (analytical) form. In statistical mechanics, an approximation under variational
formulations transforms this into an optimisation problem. The approximation goes by several names,
including variational Bayes and ensemble learning [53,54], and constitutes the mathematical basis of the
free energy principle. Using variational Bayes, surprisal can then be decomposed into [54]:

− ln p(ψ) = F − D_KL( q(ϑ) || p(ϑ|ψ) )    (5)

where

D_KL( q(ϑ) || p(ϑ|ψ) ) = ∫ q(ϑ) ln [ q(ϑ) / p(ϑ|ψ) ] dϑ,    (6)
is the Kullback-Leibler (KL) divergence [55], or relative entropy [54], an asymmetrical non-negative
measure of the difference between two probability distributions. The first one, p(ϑ |ψ), represents
the posterior distribution specifying the probability of hidden states, causes and parameters (ϑ) given
observations ψ, while the second one, q(ϑ), is the variational or recognition density, which encodes current beliefs over hidden variables ϑ. The latter is introduced with the idea of approximating the
(also) intractable posterior p(ϑ |ψ) with a simpler distribution, q(ϑ ), and then minimising their difference
through the KL divergence: when the difference is zero (following Jensen’s inequality, the divergence is
always non-negative [54]), q(ϑ ) is a perfect description of p(ϑ |ψ). Analogously, from the point of view of
an agent, its goal is to explain the hidden states, causes and parameters ϑ of sensations ψ by approximating
the posterior p(ϑ |ψ) with a known distribution, q(ϑ ). The first term in Equation (5) can be written as
F = ∫ q(ϑ) ln [ q(ϑ) / p(ϑ, ψ) ] dϑ    (7)
and is defined as (variational) free energy [8,12,56,57] for its mathematical analogies with free energy in
thermodynamics, or [54] (negative) evidence lower bound in machine learning. Since the KL divergence is
always non-negative, we arrive at

F ≥ − ln p(ψ)    (8)

which demonstrates that variational free energy is an upper bound on surprisal, since by minimising F we are guaranteed to minimise − ln p(ψ). To evaluate the variational free energy F, we must formalise a
recognition density q(ϑ ) and a generative density p(ϑ, ψ) specific to an agent. Starting from the latter, we
define a generative model formulated as a one dimensional generalised state-space model [12]:
ψ = g(x, v) + z                                  ẋ = x′ = f(x, v) + w
ψ′ = g_x(x, v) x′ + g_v(x, v) v′ + z′            ẋ′ = x″ = f_x(x, v) x′ + f_v(x, v) v′ + w′
ψ″ = g_x(x, v) x″ + g_v(x, v) v″ + z″            ẋ″ = x‴ = f_x(x, v) x″ + f_v(x, v) v″ + w″    (9)
⋮                                                ⋮
where ψ are the observations and ϑ = {x, v, θ, γ}, with x as the hidden states and v as the exogenous inputs, while θ and γ follow a partition into parameters and hyperparameters defined in [12] and, to simplify the notation, will be specified later. The function g(·) maps hidden states/inputs to observations, while f(·) describes the dynamics of the hidden states/inputs. The prime symbols, e.g., x′, x″, x‴,
are used to define higher orders of motion of a variable. Generalised coordinates of motion are introduced
to represent non-Markovian continuous stochastic processes based on Stratonovich calculus, with strictly
non-zero autocorrelation functions [12,58,59]. Ito’s formulation of stochastic processes, on the other hand,
is based on Wiener noise, where the autocorrelation can be seen as strictly equal to a delta function [59,60].
In general, the Stratonovich formulation is preferred in physics, where it is assumed that perfect white noise
does not exist in the real world [61], while Ito’s calculus is extensively used in mathematics/economics for
its definition preserving the Martingale property [62]. It is proposed that models of biological systems
should be based on the Stratonovich derivation [12], to accommodate more realistic properties of the
physical world (i.e., non-Markovian processes). Using the Stratonovich interpretation, random processes
can be described as analytic (i.e., differentiable) and become better approximations of real-world (weakly)
coloured noise [60,63,64]. In this formulation, standard state-space models are extended, describing
dynamics and observations for higher “orders of motion” encoding, altogether, a trajectory for each
variable. The more traditional state-space description is based on Markovian processes (i.e., white noise)
and can be seen as a special case of generalised state-space models defined here and in, for instance [8,12].
When coloured noise is introduced, one should either define a high order autoregressive process expressed
in terms of white noise [65] or embrace the Stratonovich formulation defining all the necessary equations
in a state-space form [12]. The higher “orders of motion” introduced here can be thought of as quantities
specifying “velocity” (e.g., (ψ)0 ), “acceleration” (e.g., (ψ)00 ), etc. for each variable, which is neglected
in more standard formulations. For practical purposes, in Equation (9) we also made a local linearity
approximation on higher orders of motion, suppressing nonlinear terms [8,12]. We then introduce a more compact form:

ψ̃ = g̃(x̃, ṽ) + z̃        x̃′ = f̃(x̃, ṽ) + w̃    (10)

where the tilde sign (e.g., ψ̃) summarises a variable and its higher orders of motion (e.g., ψ̃ = {ψ, ψ′, ψ″, . . . }).
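The distinction between Markovian white noise and the analytic fluctuations assumed in generalised coordinates can be illustrated numerically: smoothing white noise with a Gaussian kernel yields a differentiable process whose autocorrelation is no longer a delta function. This is a generic sketch with arbitrary parameters, not from the paper:

```python
import math
import random

random.seed(0)
n, dt, sigma = 20000, 0.01, 0.05     # samples, time step, kernel width (arbitrary)
white = [random.gauss(0, 1) for _ in range(n)]

# Convolve with a Gaussian kernel: an analytic, "coloured" noise process
half = int(4 * sigma / dt)
kernel = [math.exp(-0.5 * ((k * dt) / sigma) ** 2) for k in range(-half, half + 1)]
norm = sum(kernel)
kernel = [k / norm for k in kernel]
coloured = [
    sum(kernel[j] * white[i + j - half] for j in range(len(kernel)))
    for i in range(half, n - half)
]

def autocorr(x, lag):
    """Sample autocorrelation of sequence x at a given integer lag."""
    m = sum(x) / len(x)
    var = sum((v - m) ** 2 for v in x)
    return sum((x[i] - m) * (x[i + lag] - m) for i in range(len(x) - lag)) / var

# White noise decorrelates after one step; the smoothed noise stays correlated
lag = int(sigma / dt)                # lag of one kernel width
print(autocorr(white, lag), autocorr(coloured, lag))
```

The raw sequence has near-zero autocorrelation at this lag, while the smoothed one remains strongly correlated, which is the "weakly coloured" regime that generalised coordinates are meant to capture.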
The stochastic model in Equation (9) can then be described in terms of a generative density:

P(ψ̃, x̃, ṽ; θ, γ) = P(ψ̃ | x̃, ṽ; θ, γ) P(x̃, ṽ; θ, γ)    (11)
In this case, we also make the conditional dependence on θ, γ explicit, defining θ as slowly changing
parameters coupling hidden states and causes to observations, and hyperparameters γ as encoding
properties of random fluctuations/noise w̃ and z̃. P(ψ̃ | x̃, ṽ; θ, γ) is a likelihood function describing the measurement law in Equation (10), while the prior P(x̃, ṽ; θ, γ) describes the system’s dynamics. Under the
Laplace approximation [66,67], the form of the recognition density q(ϑ ) is specified in terms of a Gaussian
distribution centred around the estimated mode (i.e., the mean for a Gaussian distribution), which can
be evaluated using an extension of the EM algorithm [56,57]. Furthermore, (co)variances can be solved
analytically in terms of the Hessian of the free energy evaluated at the mode [8,67,68]. The variational free
energy in Equation (7) can then be simplified, without constants, to [8]:

F ≈ − ln P(ψ̃, ϑ̃) |_{ϑ̃ = µ̃_ϑ}    (12)

where the condition ϑ̃ = µ̃_ϑ represents the fact that the generative density P(ψ̃, x̃, ṽ; θ, γ) will be
approximated by a Gaussian distribution centred around the best estimates µ̃ϑ of the unknown ϑ̃, following
the Laplace method implemented in a variational context [66]. With Gaussian assumptions on random
variables z̃ and w̃ in Equation (10), the likelihood and prior in Equation (11) are also Gaussian, and the
variational free energy can be expressed as:
F ≈ ½ [ π_z̃ (ψ̃ − g(µ̃_x, µ̃_v))² + π_w̃ (µ̃′_x − f(µ̃_x, µ̃_v))² − ln(π_z̃ π_w̃) ]    (13)
where x̃ and ṽ are replaced by their sufficient statistics, means/modes µ̃ x , µ̃v , and sensory and
dynamics/process precisions πz̃ , πw̃ , or inverse variances, of random variables z̃ and w̃. Following [12,56],
the optimisation of the (Laplace-encoded) free energy with respect to expected hidden states µ̃ x , equivalent
to estimation or perception, can be implemented via a gradient descent:
µ̃̇_x = Dµ̃_x − ∂F/∂µ̃_x    (14)
while, considering that, from the perspective of the agent, only observations ψ are affected by actions a (i.e., ψ(a)), control or action can be cast as:
ȧ = − ∂F/∂a = − (∂F/∂ψ̃)(∂ψ̃/∂a)    (15)
representing a set of coupled differential equations describing a closed sensorimotor loop in terms of a
physically plausible minimisation scheme [12]. The first equation includes a term D µ̃ x that represents the
“mode of the motion” (also the mean for Gaussian variables) in the minimisation of states in generalised
coordinates of motion [8,12,69], with D as a differential operator “shifting” the order of motion of µ̃ x
such that D µ̃ x = µ̃0x . More intuitively, since we are now minimising the components of a generalised
state representing a trajectory rather than a static state, variables are in a moving frame of reference in
the phase-space, and the minimisation is achieved when the temporal dynamics of the gradient descent
match the ensemble dynamics of the estimates of hidden states, so for µ̃˙ x = µ̃0x rather than for µ̃˙ x = 0
(which assumes that the mode of the motion is zero, as in standard state-space formulations with Markov
assumptions). In the second equation, active inference makes the assumption that agents have innate
knowledge of the mapping between actions a and observations ψ̃ (i.e., ∂ψ̃/∂a) as reflex arcs, acquired on
an evolutionary time scale, see [13,15] for discussion.
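The coupled gradient descent of Equations (14) and (15) can be sketched in code. This is a hypothetical one-dimensional, single-order example (not from the paper): the generative model is ψ = x + z with a strong prior η on the hidden state, so that F ≈ ½[π_z (ψ − µ)² + π_w (µ − η)²] once higher orders of motion (and the Dµ̃_x term) are dropped; all parameters are arbitrary.

```python
dt = 0.01
eta = 1.0                 # prior belief / desired state
pi_z, pi_w = 0.1, 10.0    # low sensory precision biases inference towards the prior
mu, a, y = 0.0, 0.0, 0.0  # expectation, action, environment state

for _ in range(10000):
    psi = y                                  # observation (noise-free for clarity)
    dF_dmu = -pi_z * (psi - mu) + pi_w * (mu - eta)
    mu += -dt * dF_dmu                       # perception, Equation (14)
    a += -dt * pi_z * (psi - mu)             # action, Equation (15), with ∂ψ/∂a = 1
    y += dt * (-y + a)                       # environment dynamics, unknown to the agent

print(round(y, 2))
```

Because sensory precision is low relative to the prior, the expectation µ stays close to η, and action then drags the observation towards it: the agent makes the world fit its beliefs rather than the other way around.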
4. Results

We can now apply this scheme to a minimal linear generative model, written in generalised coordinates of motion as:

ψ = x + z          ẋ = x′ = −α(x − v) + w
ψ′ = x′ + z′       ẋ′ = x″ = −α(x′ − v′) + w′    (16)
ψ″ = x″ + z″       ẋ″ = x‴ = −α(x″ − v″) + w″

where α ∈ θ is a parameter. As previously suggested, with a Gaussian assumption on z̃, w̃, the likelihood reduces to:

P(ψ̃ | x̃; θ, γ) = N(ψ̃; x̃, Π_z̃⁻¹)    (17)

where we assume no direct dependence of observations ψ̃ on external inputs ṽ, while the prior is described by:

P(x̃, ṽ; θ, γ) = P(x̃ | ṽ; θ, γ) P(ṽ; θ, γ)    (18)

with

P(x̃ | ṽ; θ, γ) = N(x̃′; −α(x̃ − ṽ), Π_w̃⁻¹),    P(ṽ; θ, γ) = N(ṽ; η̃_x, Π_ṽ⁻¹).    (19)
To simplify our formulation, we assume that precisions πṽ tend to infinity (i.e., no uncertainty on
the priors for ṽ), so that P(ṽ; θ, γ) in Equation (18) becomes a delta function and inputs ṽ reduce to their
prior expectations η̃_x, i.e., µ̃_v = η̃_x. With this simplification, prior precisions π_ṽ and the respective prediction errors (µ̃_v − η̃_x) are not included in our formulation (see [56,57] for more general treatments). By applying
the gradient descent described in Equations (14) and (15) to our free energy functional, we then get the
following update equations for perception (estimation):
µ̇_x = µ′_x − [ −π_z (ψ − µ_x) + π_w α (µ′_x + α(µ_x − η_x)) ]

µ̇′_x = µ″_x − [ −π_z′ (ψ′ − µ′_x) + π_w′ α (µ″_x + α(µ′_x − η′_x)) + π_w (µ′_x + α(µ_x − η_x)) ]    (20)

µ̇″_x = µ‴_x − [ −π_z″ (ψ″ − µ″_x) + π_w″ α (µ‴_x + α(µ″_x − η″_x)) + π_w′ (µ″_x + α(µ′_x − η′_x)) ]

and for action (control):

ȧ = − [ π_z (ψ − µ_x) ∂ψ/∂a + π_z′ (ψ′ − µ′_x) ∂ψ′/∂a + π_z″ (ψ″ − µ″_x) ∂ψ″/∂a ]    (21)
The mapping of these equations to a PID control scheme becomes clearer under a few simplifying assumptions. First, we assume strong priors on the causes of proprioceptive observations ψ. For
consistency with previous formulations, e.g., [8,13,15], we will define ψ as proprioceptive observations,
where proprioception is the sense of position and movement of different parts of one’s body. For the
car model we introduce later, this is equivalent for instance to readings of the velocity of the vehicle.
Intuitively, these priors are used to define actions that change the observations to better fit the agent’s
desires, i.e., the target of the PID controller. This is implemented in the weighting mechanism of prediction
errors by precisions in Equation (19); see also [13,26,70] for similar discussions on the role of precisions for
behaviour. In our derivation, prediction errors on the system dynamics, π_w̃ (µ̃′_x + α(µ̃_x − η̃_x)), will be weighted more heavily than errors on observations, π_z̃ (ψ̃ − µ̃_x). To achieve this, we decrease sensory
precisions πz̃ on proprioceptive observations, effectively biasing the gradient descent procedure towards
minimising errors on the prior dynamics [70]. Secondly, we set the decay parameter α to a large value
(theoretically α → ∞, in practice α = 10^5 in our simulations), obtaining a set of differential equations including only terms of order α² for perception:
µ̇_x ≈ − π_w α² (µ_x − η_x)
µ̇′_x ≈ − π_w′ α² (µ′_x − η′_x)    (22)
µ̇″_x ≈ − π_w″ α² (µ″_x − η″_x)
This can be interpreted as an agent encoding beliefs in a world that quickly settles to a desired
equilibrium state. This assumption effectively decouples orders of generalised motion, with higher
embedding orders not affecting the minimisation of lower ones in Equation (20), since terms from
lower orders are modulated by α directly. The remaining terms effectively impose constraints on the
generalised motion only close to equilibrium, playing a minor role in the control process away from the
target/equilibrium (the more interesting part of regulation). These terms are necessary for the system to
settle to a proper steady state when (µ̃ x − η̃x ) → 0 and maintain consistency across generalised orders of
motion for small fluctuations at steady state, but have virtually no influence at all in conditions far from
equilibrium. Following Equation (22), at steady state, expectations on hidden states µ̃ x are mainly driven
by priors η̃x :
µ̃_x = η̃_x    (23)
but are still not met by appropriate changes in observations ψ̃ which effectively implement the regulation
around the desired target. To minimise free energy in the presence of strong priors, this agent will
necessarily have to modify its observations ψ̃ to better match expectations µ̃ x , which in turn are shaped
by priors (i.e., desires) η̃x . Effectively, the agent “imposes” its desires on the world, acting to minimise
the prediction errors arising at the proprioceptive sensory layers. In essence, an active inference agent
implements set-point regulation by behaving to make its sensations accord with its strong priors/desires.
After these assumptions, action can be written as:
ȧ ≈ − [ π_z (ψ − η_x) ∂ψ/∂a + π_z′ (ψ′ − η′_x) ∂ψ′/∂a + π_z″ (ψ″ − η″_x) ∂ψ″/∂a ]    (24)
where we still need to specify partial derivatives ∂ψ̃/∂a. As discussed in [13], this step highlights the
fundamental differences between the FEP and the more traditional forward/inverse models formulation
of control problems in biological systems [71,72]. While these derivatives help in the definition of an
inverse model (i.e., finding the correct action for a desired output), unlike more traditional approaches,
active inference does not involve a mapping from hidden states x̃ to actions a, but is cast in terms of
(proprioceptive) sensory data ψ̃ directly. This is thought to simplify the problem: from a mapping between
unknown hidden states and actions, to a mapping between known proprioceptive observations ψ̃ and
actions a. It is claimed that this provides an easier implementation for an inverse model [15], one that is
grounded in an extrinsic frame of reference, i.e., the real world (ψ̃), rather than in an intrinsic one in terms
of hidden states (x̃) to be inferred first. To achieve PID-like control, we assume that the agent adopts the
simplest (i.e., linear) relationship between its actions (controls) and their effects on sensory input across all
orders of motion:
∂ψ/∂a = ∂ψ′/∂a = ∂ψ″/∂a = 1.    (25)
This reflects a very simple reflex-arc-like mechanism that is triggered every time a proprioceptive
prediction is generated: positive actions (linearly) increase the values of the sensed variables ψ̃, while
negative actions decrease them. There is, however, an apparent inconsistency here that we need to
dissolve: the proprioceptive input ψ and its higher order states ψ′, ψ″ are all linearly dependent with respect to actions a, as represented in Equation (25). While an action may not change position, velocity
and acceleration of a variable in the same way, a generative model does not need to perfectly describe the
system to regulate: these derivatives only encode sensorimotor dependencies that allow for, in this case,
sub-optimal control. In the same way, PID controllers are, in most cases, effective but only approximate
solutions for control [36,73]. This allows us to understand the encoding of an inverse model from the
perspective of an agent (i.e., the controller) rather than assuming a perfect, objective mapping from
sensations to actions that reflects exactly how actions affect sensory input [13]. This also points at
possible investigations of generative/inverse models in simpler living systems where accurate models
are not perhaps needed, and where strategies like PID control are implemented [39–41]. By combining
Equations (24) and (25), action can then be simplified to:

ȧ ≈ π_z (η_x − ψ) + π_z′ (η′_x − ψ′) + π_z″ (η″_x − ψ″)    (26)

which is consistent with the “velocity form” or velocity algorithm of a PID controller [36]:
u̇ = k_i (y_r − y) + k_p d/dt (y_r − y) + k_d d²/dt² (y_r − y).    (27)
Velocity forms are used in control problems where, for instance, integration is provided by an
external mechanism outside the controller [36,73]. Furthermore, velocity algorithms are the most natural
form for the implementation of integral control: they avoid the windup effects that emerge when actuators,
due to physical limitations, cannot counteract an indiscriminate accumulation of steady-state error in the
integral term [36,74]. This algorithm is usually described using discrete systems
to avoid the definition of the derivative of random variables, often assumed to be white noise in the
Itô sense (i.e., Markovian processes). In the continuous case, if the variable y is a Markov process,
its time derivative is in fact not well defined. For this form to exist in continuous systems, y must
be a smooth (stochastic) process. Effectively, this drops the Markov assumption of white noise and
implements the same definition of analytic (i.e., differentiable) noise related to Stratonovich calculus and
the generalised coordinates of motion we described earlier. The presence of extra prediction errors beyond
the traditional negative feedback (proportional term) can, in this light, be seen as a natural consequence
of considering linear non-Markovian processes with simple reflex mechanisms responding to position,
velocity and acceleration in the generalised motion phase space (see Equation (25)). To ensure that the
active inference implementation approximates the velocity form of PID control we still need to clarify
the relationship between the generalised coordinates of motion in Equation (26) and the differential
operators d/dt, d2 /dt2 in Equation (27). As pointed out in previous work, when the variational free
energy is minimised, the two of them are equal since the motion of the mode becomes the mode of the
motion [8,56]. To simplify our formulation and show PID control more directly, we can consider the case
for ηx′ = ηx″ = 0, defining the more standard set-point control where a desired or set-trajectory collapses to
a single set-point in the state-space and equivalent, in the velocity form, to the case where yr is a constant
and dyr /dt = d²yr /dt² = 0.
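To make this correspondence concrete, the sketch below (our own illustration, not the paper's code: the first-order plant, the gains and the time step are arbitrary assumptions) integrates a velocity-form PI controller as in Equation (27), with kd = 0 and a constant set-point, and shows the integral action rejecting a constant load disturbance:

```python
def simulate(k_i=2.0, k_p=1.0, y_r=10.0, d=3.0, T=30.0, dt=1e-3):
    """Velocity-form PI control of a toy first-order plant dy/dt = -y + u + d.

    The controller outputs the rate of change u_dot of the control signal
    (cf. Eq. 27 with k_d = 0 and constant y_r); the integration of u_dot
    into u happens outside the controller, as in the velocity form.
    """
    y, u = 0.0, 0.0
    e_prev = y_r - y
    for _ in range(int(T / dt)):
        e = y_r - y
        u_dot = k_i * e + k_p * (e - e_prev) / dt  # u' = ki*e + kp*de/dt
        u += u_dot * dt                            # external integration of u'
        y += (-y + u + d) * dt                     # plant with load disturbance d
        e_prev = e
    return y
```

At equilibrium u̇ = 0 forces e = 0, so the output settles exactly at yr despite the disturbance d: the disturbance-rejection property of the integral term discussed in the text.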
To show an implementation of PID control through active inference we use a standard model
of cruise control, i.e., a car trying to maintain a certain velocity over time (our code is available at
https://github.com/mbaltieri/PIDControlActiveInference.). While only a toy model, the intuitions and
results we derive can easily be transferred to the regulation of proteins in bacterial chemotaxis [39] or yeast
osmoregulation [75], and more generally to any homeostatic mechanism [34], especially when including
limited knowledge of external forces [76]. In this setup, a controller receives the speed of the car as an
input and adapts the throttle of the vehicle based on a negative feedback mechanism to achieve the desired,
or target, cruising velocity. In real-world scenarios, this mechanism needs to be robust in the presence of
external disturbances, essentially represented by changes in the slope of the road, wind blowing, etc., see
Figure 2d. For simplicity, we will use the model based on the formulation in [73], see also Appendix A.
In this particular instance, we will provide a simple proof of concept, simplifying PID to PI control as in [73],
hence implementing only a first order generalised state-space model (see Equation (16)). The controller
receives noisy readings ψ, ψ′ of the true velocity and acceleration of the car, x, x′ , following the formulation
in Equation (16). The controller is provided with a Gaussian prior in generalised coordinates encoding
desired velocity and acceleration with means ηx = 10 km/h, ηx′ = 0 km/h². This prior represents a target
trajectory for the agent that, as we saw in Equation (26), will be equivalent to integral and proportional
terms of a PI controller in velocity form. The recognition dynamics [69] are then specified in Equations (20)
and (21).
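This setup can be sketched as a deterministic toy reconstruction (our own simplifying assumptions, not the released code: an integrator plant, hand-picked precisions and no sensory noise):

```python
def cruise(eta=10.0, pi_z=0.5, pi_z1=1.0, pi_w=10.0, d=3.0, T=200.0, dt=0.01):
    """First-order generalised model controlling a toy car.

    mu, mu1 are expected velocity and acceleration; the dynamics prior pulls
    mu towards the target eta with (large) precision pi_w, cf. Eqs. (20)-(21).
    Action descends the free energy gradient assuming dpsi/da = dpsi'/da = 1
    (Eq. 25), so it acts like PI control with gains (pi_z, pi_z1), cf. Eq. (26).
    The plant is a bare integrator dv/dt = a + d with load disturbance d.
    """
    mu, mu1, a, v = 0.0, 0.0, 0.0, 0.0
    for _ in range(int(T / dt)):
        psi, psi1 = v, a + d                   # noiseless sensory readings
        eps_z, eps_z1 = psi - mu, psi1 - mu1   # sensory prediction errors
        eps_w = mu1 + (mu - eta)               # dynamics (prior) error
        dmu = mu1 + pi_z * eps_z - pi_w * eps_w
        dmu1 = pi_z1 * eps_z1 - pi_w * eps_w
        da = -(pi_z * eps_z + pi_z1 * eps_z1)  # action: da/dt = -dF/da
        mu, mu1 = mu + dmu * dt, mu1 + dmu1 * dt
        a, v = a + da * dt, v + (a + d) * dt
    return v, a
```

The prior η plays the role of the set-point: the strong dynamics precision keeps µ close to η, and at steady state v = η while a converges to −d, counteracting the external force as in Figure 2c.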
In Figure 2 we show the behaviour of a standard simulation of active inference implementing PI-like
control for the controller of the speed of a car. The sensory and process precisions πz̃ , πw̃ are fixed, to show
here only the basic disturbance rejection property of PID controllers [36,76]. In Figure 2a, after the car
is exposed to some new external condition (e.g., wind) represented in Figure 2c and not encoded in the
controller’s generative model, the regulation process brings the velocity of the car back to the desired state
after a short transition period. Figure 2b shows how sudden changes in the acceleration of the car are
quickly cancelled out in accord with the specified prior ηx′ = 0 km/h². The action of the car is then shown,
as one would expect [76], to counteract the external force v, Figure 2c.
Figure 2. A cruise controller based on PI control under active inference. (a) The response of the car velocity
over time with a target state, or prior in our formulation, ηx = 10 km/h, ηx′ = 0 km/h²; (b) The acceleration
of the car over time with a specified prior ηx′ = 0 km/h²; (c) The external force v, introduced at t = 150 s,
models a sudden change in the environmental conditions, for instance wind or change in slope. Action is
obtained via the minimisation of variational free energy with respect to a and counteracts the effects of
v. The motor action is never zero since we assume a constant slope, λ = 4° (see Table A1, Appendix A);
(d) The model car we implemented, where v could be thought of as a sudden wind or a changing slope.
In the limit for process prediction errors πw̃ (µ̃′x + α(µ̃x − η̃x )) much larger than the sensory
ones πz̃ (ψ̃ − µ̃x ) and with fixed expected sensory precisions πz̃ , the response to load disturbances
is invariant (Figure 3a). A new target velocity for the car creates different responses with varying
πw = {exp(−24), exp(−22), exp(−20)} (precisions on higher embedding orders are built, in both cases,
using a smoothness (i.e., decay) factor of 1/2, see [12]). Larger πw̃ values imply an expected low uncertainty
on the dynamics (i.e., changes to the set-point are not encoded and therefore not expected) and are met
almost instantaneously with an update of expected hidden states µ̃ x , matched by suitable actions a. On the
other hand, smaller πw̃ account for higher variance/uncertainty and thus changes in the target velocity
are to be expected, making the transitions to new reference values slower, as seen in Figure 3b.
Figure 3. Different responses to load disturbances and set-point changes. The simulations were 300 s long,
with an external disturbance/different target velocity introduced at t = 150 s. Here we report only a 20 s
time window around the change in conditions. (a) The same load disturbance (v = 3.0 km/h²) is applied
with varying expected process precisions πw̃ where πw = {exp(−24), exp(−22), exp(−20)}. Expected
sensory log-precisions πz̃ are fixed over the duration of the simulations, with µγz = 1; (b) A similar example
for changes in the target velocity of the car, from ηx = 13 km/h to ηx = 10 km/h, tested on varying expected
process precisions πw̃ where πw = {exp(−24), exp(−22), exp(−20)}.
Active inference thus provides an analytical criterion for the tuning of PID gains in the temporal
domain, where so far mostly empirical methods, or more complex methods in the frequency domain,
have been proposed [36,38,47,48]. In frameworks used to implement active inference, such as
DEM [12,56], parameters and hyperparameters are usually assumed to be conditionally independent of
hidden states based on a strict separation of time scales (i.e., a mean-field approximation). This assumption
prescribes a minimisation scheme with respect to the path-integral of free energy, or free action, requiring
the explicit integration of this functional over time. In our work, however, for the purposes of building
an online self-tuning controller, we will treat expected sensory precisions as conditionally dependent but
changing on a much slower time-scale with respect to states x, using a second order online update scheme
based on generalised filtering [57]. The controller gains, µπz , µπz′ , µπz″ , will thus be updated by specifying
instantaneous changes of the curvature of expected precisions with respect to variational free energy, rather
than first order updates with respect to free action:
µ̈πz̃ = −∂F/∂µπz̃ (28)
Expected precisions µπz̃ should however be non-negative since variances need to be positive, a
fact also consistent with the negative feedback principle behind PID controllers (i.e., negative expected
precisions would apply a positive feedback). To include this constraint, following [66] we thus parametrise
sensory precisions πz̃ (and consequently expected sensory precisions µπz̃ ) in the generative model as:
πz̃ = exp(γz̃ ) (29)
creating, effectively, log-normal priors and making them strictly positive thanks to the exponential mapping
of hyperparameters γ. The scheme in Equation (28) is then replaced by one in terms of expected sensory
log-precisions µγz̃ :
µ̈γz̃ = −∂F/∂µγz̃ (30)
For practical purposes, the second order system presented in Equation (30) is usually reduced to a
simpler set of first order differential equations [8]:
µ̇γz̃ = µ′γz̃
µ̇′γz̃ = −∂F/∂µγz̃ − κµ′γz̃ (31)
where µ′γz̃ is a prior on the motion of hyperparameters γ which encodes a “damping” term for the
minimisation of free energy F (in [57] we can see that this is equivalent to the introduction of a prior p(γ̃)
on the motion of γ̃ to be zero (i.e., zero mean) with precision 2κ). This term enforces hyperparameters
to converge to a solution close to the real steady state thanks to a drag term for κ > 0 (κ = 5 in our
simulations). The parametrisation of expected precisions in terms of log-precisions γz̃ , in fact, makes the
derivative of the free energy with respect to log-precisions strictly positive (∂F/∂γz̃ > 0), not providing a
steady-state solution for the gradient descent [57]. This “damping” term stabilises the solution, reducing
the inevitable oscillations around the real equilibrium of the system. Given the free energy defined in
Entropy 2019, 21, 257 15 of 25
Equation (19), with exp(µγz̃ ) replacing πz̃ , the update of expected sensory log-precisions (or “log-
PID gains”) is prescribed by the following equations:
µ̇γz = µ′γz
µ̇′γz = −∂F/∂µγz − κµ′γz = −½ [exp(µγz )(ψ − µx )² − 1] − κµ′γz
µ̇γz′ = µ′γz′
µ̇′γz′ = −∂F/∂µγz′ − κµ′γz′ = −½ [exp(µγz′ )(ψ′ − µ′x )² − 1] − κµ′γz′
µ̇γz″ = µ′γz″
µ̇′γz″ = −∂F/∂µγz″ − κµ′γz″ = −½ [exp(µγz″ )(ψ″ − µ″x )² − 1] − κµ′γz″ (32)
This scheme introduces a new mechanism for the tuning of the gains of a PID controller, allowing the
controller to adapt to adverse and unexpected conditions in an optimal way, in order to avoid oscillations
around the target state.
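As a sanity check of this scheme, the sketch below (our illustration: the i.i.d. residual stream, time step and run length are arbitrary assumptions) applies the damped second-order update of Equation (32) to a single embedding order and lets the expected log-precision settle:

```python
import numpy as np

def adapt_log_precision(gamma_true=2.0, kappa=5.0, T=400.0, dt=0.01, seed=0):
    """Damped update of an expected sensory log-precision, cf. Eqs. (31)-(32).

    Residuals eps stand in for the prediction errors psi - mu_x and are drawn
    i.i.d. from N(0, exp(-gamma_true)). The averaged fixed point satisfies
    exp(mu_g) * E[eps^2] = 1, i.e. mu_g converges towards gamma_true.
    """
    rng = np.random.default_rng(seed)
    sigma = np.exp(-gamma_true / 2.0)   # so that Var(eps) = exp(-gamma_true)
    mu_g, mu_g1 = 0.0, 0.0              # expectation and its motion
    for _ in range(int(T / dt)):
        eps = rng.normal(0.0, sigma)
        # mu_g'' = -dF/dmu_g - kappa * mu_g' (one order of Eq. 32)
        dmu_g1 = -0.5 * (np.exp(mu_g) * eps ** 2 - 1.0) - kappa * mu_g1
        mu_g += mu_g1 * dt
        mu_g1 += dmu_g1 * dt
    return mu_g
```

With κ = 5 the descent is overdamped, and µγz settles close to the true log-precision of the residuals, illustrating how the damping term suppresses the oscillations mentioned above.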
In Figure 4 the controller for the car velocity is initialised with suboptimal sensory log-precisions µγz̃ ,
i.e., log-PI gains. The parameters were initially not updated (Figure 4d) to allow the controller to settle
around the desired state, see Figure 4a. The adaptation begins at t = 30 s and is stopped at t = 150 s, when
an external force is introduced, to test the response of the controller after the gains have been optimised.
With the adaptation process, the controller becomes more responsive when facing external disturbances
(cf. Figure 2), quickly and effectively counteracted by prompt changes in controls, see Figure 4c. As
a trade-off, the variances of the velocity and the acceleration are however increased, see Figure 4a,b.
The optimisation of the gains through µγz̃ without extra constraints (apart from the stopping condition we
imposed at t = 150 s, after the adaptation reaches a steady-state) effectively introduces an extremely
responsive controller: cancelling out the effects of unwanted external inputs, such as wind in our cruise
control example, but also more sensitive to measurement noise. In Figure 5 we show summary statistics
with the results of the adaptation of the gains. Following the examples in Figures 2 and 4, we simulated 20
different cars with expected sensory log-precisions µγz̃ sampled uniformly in the interval [−4, −2] and
expected process log-precisions µγw̃ in the interval [−23, −21]. We initially maintained (i.e., no adaptation)
the same hyperparameters and introduced a load disturbance at t = 150 s, then repeated the simulations
(20 cars) with the same initial conditions allowing for the adaptation of expected sensory log-precisions as
log-PI gains after t = 30 s, as in Figure 4. Following [79], we measured the performance of the controllers
by defining the integral absolute error (IAE):
Z t+τ
I AE = |e(t)| dt (33)
t
between two zero-crossings: the last time the velocity was at the target value before a disturbance is
introduced, assumed to be t = 150 s in our case, and the first time the velocity goes back to the target
after a disturbance is introduced (t + τ). To compute t + τ, we took into account the stochasticity of the
system and errors due to numerical approximations, considering the case for the real velocity to be within
a ±0.5 km/h interval away from the target value. The IAE captures the impact of oscillations on the
regulation problem by integrating the error over the temporal interval where the car is pushed away from
its target due to some disturbance (for more general discussions on its role and uses see [36]). As we can
see in Figure 5, the IAE converges to a single value for all cars (taking into account our approximation of a
±0.5 km/h interval while measuring it) and is clearly lower when the adaptation mechanism for expected
sensory log-precisions is introduced, making the controller very responsive to external forces and thus
reducing the time away from the target velocity, see Figure 4 for an example.
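A small helper illustrating this measure (our own code; the ±tol band plays the role of the ±0.5 km/h interval described above):

```python
import numpy as np

def iae(t, y, target, t_dist, tol=0.5):
    """Integral absolute error (Eq. 33) of a response y(t) to a disturbance.

    Integrates |y - target| from the first excursion outside a +/- tol band
    after the disturbance at t_dist until the response re-enters the band,
    mirroring the zero-crossing window described in the text.
    """
    e = np.abs(np.asarray(y) - target)
    i0 = int(np.searchsorted(t, t_dist))
    out = np.nonzero(e[i0:] > tol)[0]
    if out.size == 0:                 # never leaves the band: no excursion
        return 0.0
    a = i0 + int(out[0])
    back = np.nonzero(e[a:] <= tol)[0]
    b = a + (int(back[0]) if back.size else len(e) - 1 - a)
    seg, ts = e[a:b + 1], np.asarray(t)[a:b + 1]
    return float(np.sum(0.5 * (seg[1:] + seg[:-1]) * np.diff(ts)))  # trapezoid
```

For example, for e(t) = sin(t) on [0, π] with tol = 0.1, the window spans the two band crossings and the integral evaluates to 2√(1 − 0.01) ≈ 1.99.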
Figure 4. Optimising PID gains as expected sensory log-precisions µγz̃ . This example shows the control
of the car velocity before and after the optimisation of µγz̃ (before and after the vertical dash dot black
line) is introduced. (a) The velocity of the car; (b) The acceleration of the car; (c) The action of the car, with
an external disturbance introduced at t = 150 s; (d) The optimisation of expected sensory precisions µγz̃
and their convergence to an equilibrium state, after which the optimisation is stopped before introducing
an external force. The blue line represents the true log-precision of observation noise in the system,
γz = γz′ = 5.
Figure 5. Performance of PID controllers with and without adaptation of the gains based on the
minimisation of free energy. The integral absolute error (IAE) is used to measure the effects of the
oscillations introduced by a single load disturbance at t = 150 s (see text for the exact definition of the IAE).
5. Discussion
In this work we developed a minimal account of regulation and control mechanisms based on active
inference, a process theory for perception, action and higher order functions expressed via the minimisation
of variational free energy [4,8,10,13]. Our implementation constitutes an example of the parsimonious,
action-oriented models described in [24,25], connecting them to methods from classic control theory.
We focused in particular on Proportional-Integral-Derivative (PID) control, both extensively used in
industry [36–38,78] and more recently emerging as a model of robust feedback mechanisms in biology,
implemented for instance by bacteria [39], amoeba [40] and gene networks [41], and in psychology [42].
PID controllers are ubiquitous in engineering largely because they require only limited knowledge of the
process to be regulated. In the biological sciences, this mechanism is thought to be easily implemented
even at a molecular level [43] and to constitute a possible account for limited knowledge of the external
world in simple agents [76].
Following our previous work on minimal generative models [26], we showed that this mechanism
corresponds, in active inference terms, to linear generative models for agents that only approximate
properties of the world dynamics. Specifically, our model describes linear dynamics for a single
hidden or latent state and a linear mapping from the hidden state to an observed variable, representing
knowledge of the world that is potentially far removed from the real complexity behind observations
and their hidden variables. To implement such a model, we defined a generative model that only
approximates the environment of an agent and showed how under a set of assumptions including
analytic (i.e., non-Markovian, differentiable) Gaussian noise and linear dynamics, this recapitulates PID
control. A crucial component of our formulation is the presence of low sensory precision parameters
on proprioceptive prediction errors of our free energy function or equivalently, high expected variance
of proprioceptive signals. These low precisions play two roles during the minimisation of free energy:
(1) they implement control signals as predictions of proprioceptive input influenced by strong priors (i.e.,
desires) rather than by observations, see Equation (24) and [13], and (2) they reflect a belief that there are
large exogenous fluctuations (low precision = high variance) in the observed proprioceptive input. This
last point can be seen as the well-known property of the integral term [73,76] of PID controllers, dealing
with unexpected external input (i.e., large exogenous fluctuations). The model represented by the derivatives
∂ψ̃/∂a then encodes how actions a approximately affect observed proprioceptive sensations ψ̃, with an
agent implementing a sensorimotor mapping that does not match the real dynamics of actions applied
to the environment. The formulation in Equations (20) and (21) can in general be applied to different
tasks, in the same way PID control is used in different problems without specific knowledge of the system
to regulate.
The generative model we used is expressed in generalised coordinates of motion, a mathematical
construct used to build non-Markovian continuous stochastic models based on Stratonovich calculus.
Their importance has been expressed before [12,56,57], for the treatment of real world processes best
approximated by continuous models and for which Markov assumptions do not really hold (see also [69]
for discussion). The definition of a generalised state-space model then provides a series of weighted
prediction errors and their higher orders of motion from the start, with PID control emerging as the
consequence of an agent trying to impose its desired prior dynamics on the world via the approximate
control of its observations on different embedding orders (for I, P and D terms). In this light, the ubiquitous
efficacy of PID control may thus reflect the fact that the simplest models of controlled dynamics are
first-order approximations to generalised motion. This simplicity is mandated because the minimisation
of free energy is equivalent to the maximisation of model evidence, which can be expressed as accuracy
minus complexity [10,24]. On this view, PID control emerges via the implementation of constrained
(parsimonious, minimum complexity) generative models that are, under some constraints, the most
effective (maximum accuracy) for a task.
In the control theory literature, many tuning rules for PID gains have been proposed (e.g.,
Ziegler-Nichols, IMC, etc., see [36,38] for a review) and used in different applications [36–38,48,78],
however, most of them produce quite different results, highlighting their inherent fit to only one of many
different goals of the control problem. With our active inference formulation, we argue that different
criteria can (and should) be expressed within the same set of equations in order to better understand their
implications for a system. Modern approaches to the study of PID controllers propose four points as
fundamental features to be considered for the design of a controller [44]: (1) the response to load
disturbances, (2) the response to measurement noise, (3) robustness to model uncertainty and (4) set-point
following.
In our formulation, these criteria can be interpreted using precision (inverse variance) parameters
of different prediction errors in the variational free energy, expressing the uncertainty associated to
observations and priors, as reported in Table 1, see also Appendix B for further reference.
After establishing the equivalence between PID control and linear approximations of generalised
motion in generative models, we showed that the controller gains, ki , kp , kd , are in our formulation
equivalent to expected precisions, µπz , µπz′ , µπz″ , for which a minimisation scheme is provided in [12,56,57].
The basic version of this optimisation also produces promising results in the presence of time-varying
measurement (white) noise in the simulated car (see Figure A1 in Appendix B). If the adaptation is halted
on a system with fixed measurement noise, it can be used to effectively deal with load disturbances,
external forces acting against a system reaching its target (see Figure 4), e.g., a change in chemical
concentrations for a bacterium.
Future extensions could provide a more principled way of dealing with these two (and possibly
other) conflicting cases, an issue that can be solved by introducing suitable hyperpriors (priors on
hyperparameters) expressing the confidence of a system regarding changes in measurement noise via the
use of precisions on hyperpriors [12]. High confidence (i.e., high precision on hyperpriors) would imply
that a system should quickly react to sudden changes, both in measurement noise and other disturbances,
since they are unexpected. On the other hand, low confidence (i.e., low precision on hyperpriors) would
make a system’s reaction to new conditions slower since such changes are expected. A trade-off between
these conditions, with appropriate knowledge of a system or a class of systems introduced in the form
of hyperpriors, would then make the process completely automatised, taking advantage of, for instance,
empirical Bayes for learning such hyperpriors [10]. By extending our proposition with priors on precisions
we can also, in principle, cast more criteria for the controller, expressing different requirements for
more complex regulation processes. Given the fact that any optimality criterion can be recast as a
prior, following the complete class theorem [80,81], as long as we know how to represent these rules as
priors for the controller, we can provide any combination of requirements and tune the parameters in a
straightforward fashion.
6. Conclusions
PID controllers are robust controllers used as a model of regulation for noisy and non-stationary
processes in different engineering fields [38,73]. More recently, they have also been proposed as behavioural
models of adaptive learning in humans [42] and as mechanistic explanations of different functions of
systems in microbiology [39–41]. Their utmost relevance to the natural sciences is becoming clear, with
implementations now proposed at the level of simple biomolecular interactions [43,82]. PID controllers
are renowned for their simplicity and straightforward interpretation in control theory, however, a general
interpretation in probabilistic frameworks (e.g., Bayesian inference) is still missing.
Active inference has been proposed as a general mathematical theory of life and cognition according
to the minimisation of variational free energy [10]. On this view, biological agents are seen as homeostatic
systems maintaining their existence via the minimisation of free energy. This process is implemented
via the estimation and prediction of latent variables in the world (equivalent to perception) and the
control of sensory inputs with behaviours accommodating normative constraints of an agent. Active
inference is often described as an extension of optimal control theory with deep connections to Bayesian
inference [15]. While methods such as PID control are still widely adopted as models of biological
systems, it is unclear how general theories such as active inference connect to practical implementation
of homeostatic principles such as PID control. In this work we proposed a way to connect these two
perspectives showing how PID controllers can be seen as a special case of active inference. This account
is based on the definition of a linear generative model for an agent approximating the dynamics of its
environment, potentially very different from the information represented by the model. The model is
expressed in generalised coordinates of motion [8,12,69] with prediction errors at different embedding
orders for integral, proportional and derivative components emerging naturally as a consequence of an
agent assuming non-Markovian dynamics on its sensory input. Through the use of active inference we
also proposed the implementation of a mechanism for the optimisation of the gains of a PID controller,
i.e., the weights of different prediction errors, now interpreted as precision parameters encoding the
uncertainty of different variables from the perspective of an agent.
Author Contributions: M.B.: Conceptualization, Data curation, Formal analysis, Investigation, Methodology,
Software, Validation, Visualization, Writing. C.L.B.: Conceptualization, Methodology, Supervision, Writing.
Funding: This work was supported in part by a BBSRC Grant BB/P022197/1.
Acknowledgments: The authors would like to thank Karl Friston for thought-provoking discussions and insightful
feedback on a previous version of this manuscript, and Martijn Wisse and Sherin Grimbergen for important comments
on the initial mathematical derivation.
Conflicts of Interest: The authors declare no competing interest.
m d²s/dt² = F − Fd (A1)
where s is the position, F the force generated by the engine and Fd a disturbance force that accounts for a
gravitational component Fg , a rolling friction Fr and an aerodynamic drag Fa , such that Fd = Fg + Fr + Fa ,
see again Figure 2d. The forces will be modelled as following:
F = rg a(t) Tm (1 − β(ω/ωm − 1)²)
Fg = mg sin λ
Fr = mg Cr sgn(ṡ)
Fa = ½ ρ Cd A ṡ² (A2)
with all the constants and variables reported and explained in Table A1.
Table A1. Constants and variables of the car model.

Symbol   Description                          Value
s(t)     car position                         -
rg       gear ratio divided by wheel radius   12
a(t)     control                              -
Tm       maximum torque                       190 Nm
β        motor constant                       0.4
ω        engine speed                         αn v
ωm       speed that gives maximum torque      420 rad/s
m        car mass                             100 kg
g        gravitational acceleration           9.81 m/s²
λ        slope of the road                    4°
Cr       coefficient of rolling friction      0.01
ρ        density of the air                   1.3 kg/m³
Cd       aerodynamic drag coefficient         0.32
A        frontal area of the car              2.4 m²
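For concreteness, the forces of Equations (A1) and (A2) can be transcribed directly (a sketch assuming the standard torque curve T(ω) = Tm (1 − β(ω/ωm − 1)²) of the model in [73], with ω = rg v and the constants of Table A1):

```python
import math

R_G, T_M, BETA, W_M = 12.0, 190.0, 0.4, 420.0  # gearing and torque curve
M, G, SLOPE = 100.0, 9.81, math.radians(4.0)   # mass, gravity, road slope
C_R, RHO, C_D, AREA = 0.01, 1.3, 0.32, 2.4     # friction and drag constants

def engine_force(v, a):
    """F = rg * a * Tm * (1 - beta * (omega/omega_m - 1)^2), omega = rg * v."""
    omega = R_G * v
    return R_G * a * T_M * (1.0 - BETA * (omega / W_M - 1.0) ** 2)

def disturbance_force(v):
    """Fd = Fg + Fr + Fa: gravity, rolling friction and aerodynamic drag."""
    f_g = M * G * math.sin(SLOPE)
    f_r = M * G * C_R * math.copysign(1.0, v) if v != 0.0 else 0.0
    f_a = 0.5 * RHO * C_D * AREA * v ** 2
    return f_g + f_r + f_a

def acceleration(v, a):
    """m * d2s/dt2 = F - Fd (Eq. A1), returned as dv/dt."""
    return (engine_force(v, a) - disturbance_force(v)) / M
```

At rest on the 4° slope, for instance, the disturbance reduces to the gravitational component mg sin λ ≈ 68.4 N.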
By construction, the integral gain ki encodes the uncertainty (i.e., precision) of the measurement noise
πz , see Equations (26) and (27). The remaining gains kp , kd can then be
seen as encoding the uncertainty (i.e., precision) of higher orders of motion when the measurement noise
is effectively coloured, otherwise just approximating possible correlations of the observed data over time.
On the other hand, the robustness to model uncertainty is related to expected process log-precisions
µγw̃ encoding (again by construction) the amplitude of fluctuations due to unknown effects on the
dynamics [12]. By modulating the prior dynamics of a system, these hyperparameters then assume
a double role; they can either: (1) passively describe (estimate) the dynamics of a system (cf. Kalman
filters [83]) or (2) actively impose desired trajectories on observations that can be implemented through
actions on the world, as explained in Section 4.1. With these conditions at the extremes, a spectrum of
intermediate behaviours is also possible, with µγw̃ enacting different sensorimotor couplings by weighting
the importance of objective information and desired states/priors of a system.
In the majority of the formulations of control problems, the properties of measurement noise and
model uncertainty (especially their (co)variance) are assumed to be constant over time. Often, these
parameters also need to be adapted to different systems since their properties are likely to be different.
In Section 4.3, we proposed an optimisation method for the gains of a PID controller based on active
inference, which here we exploit for time-varying properties of the noise of a system, and which we show in
an example where the measurement noise suddenly increases. In our car example, we could think of a
drop in performance of the sensors recording velocity.
We simulated 20 cars for 300 s with adaptation of expected sensory log-precisions (or log-PI gains)
µγz̃ , introduced at t = 30 s and stopped at t = 150 s. At t = 150 s we then decreased the log-precision of
measurement noise (n.b. not the expectation on the log-precision) from γz = 5 to γz = 2 for the rest of
the simulations to simulate the partial failure of a sensor, and stopped the adaptation process. We then
simulated 20 cars where adaptation was not halted after the increased measurement noise. To represent
the difference, we measured the variance of the real velocity of the cars (without measurement noise to
avoid biases), from t = 225 s to t = 300 s to allow the velocity to settle after the transient due to the sudden
change. Agents that kept adapting their gains are shown to be more robust to persistent changes in noise,
see Figure A1.
Figure A1. Performance of PID controllers with a sudden increase in measurement noise. Twenty cars
simulated in the case where measurement noise is increased at t = 150 s during the 300 s simulations. We
report aggregate results with the variance from the target value measured over the last 25% (225 < t < 300 s)
of a simulation. We show (1) the case for adaptation of the gains of the PI controller (through expected
sensory log-precisions, or log-PI gains, µγz̃ ) interrupted before the measurement noise drastically changes,
and (2) the case where the adaptation process persists for the entire duration of the simulations.
In the case of model uncertainty, given the dual role of µγw̃ explained above, i.e., encoding prior
dynamics reflecting both real properties of the environment and desired trajectories imposed on the system
to regulate, it is harder to show the update of expected precisions without compromising the control of the
car. The optimisation of variational free energy is, in fact, not intrinsically biased towards the control of a
system, i.e., we externally imposed that as a condition for the agent. With more flexible priors,
an agent could potentially begin to account for uncertainty in the world rather than forcibly changing its
observations to reach its target.
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).