Quantum Mechanics and Bayesian Machines
George Chapline
Lawrence Livermore National Laboratory, USA
World Scientific
Chapter 1

Introduction
The symbol P means averaging over all the possible paths going
from x at t = 0 to x(T ) at t = T . The optimal path will be defined
by the path where the integral over q(x, t) is minimized. However,
because of the necessity of exploring many paths in Eq. (1.3), the
change in V (t) along the optimal path will have an additional cost
term, the “control cost”, which encourages the controlled history of
the system to lie near a path that might be attributed to the system
dynamics without controls and can be identified as a KL divergence.
In an earlier paper [19], Todorov showed that the loss rate for the
optimal Bellman cost function can be written as a sum of q(x, u) and
the KL divergence term, which together are minimized w.r.t. u(x).
This in turn leads to an expression for V (t) as the backward filtering
probability for the state of the system given all previous observations
of the system. By expressing the Bellman function V (t) in terms of a
probabilistic chain of actions (cf. [19]), the sum in Eq. (1.3) can also
be evaluated using Monte Carlo methods [31]. However, this can be
very time-consuming, if not intractable.
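To see why this brute-force averaging scales badly, consider the following minimal sketch (our illustration, not Todorov's or Kappen's code; the quadratic state cost, the unit noise level, and all parameter values are assumptions made for the example). It estimates the path average in the spirit of Eq. (1.3) by sampling uncontrolled diffusion paths:

```python
import numpy as np

rng = np.random.default_rng(0)

def state_cost(x):
    # Illustrative quadratic state cost q(x); any bounded cost would do.
    return 0.5 * x**2

def psi_estimate(x0, T=1.0, n_steps=50, n_paths=1000, sigma=1.0):
    """Monte Carlo estimate of the desirability function
    psi(x0) = E[ exp(-integral of q along an uncontrolled path) ],
    averaging over sample paths of the passive, noise-driven dynamics."""
    dt = T / n_steps
    x = np.full(n_paths, x0, dtype=float)
    cost = np.zeros(n_paths)
    for _ in range(n_steps):
        cost += state_cost(x) * dt
        x += sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    return np.mean(np.exp(-cost))

# The optimal control follows from the gradient of log(psi); the point
# here is only how slowly the naive path average converges.
for n in (100, 1000, 10000):
    print(n, psi_estimate(0.5, n_paths=n))
```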
Taking to heart the similarity between the space-time structure
of MDPs and quantum dynamics that was one of our original inspi-
rations [16], one might guess that Eq. (1.3) can be faithfully emu-
lated using Feynman’s “sum over paths” interpretation for quantum
mechanics introduced in his 1942 PhD thesis [32]. In this interpre-
tation of Eq. (1.3), the action function appearing in the exponent of
Feynman’s sum over paths replaces the reward function q(x, u) that
appears in Todorov’s formulation of optimal control [19], while the
control variable u(t) is represented by Dirac’s momentum operator
−i∂/∂x. The r.h.s of Eq. (1.3) would then be replaced by a sum over
quantum paths where the real exponents on the r.h.s of Eq. (1.3) are
replaced by iq(x), where i = √−1. In this quantum interpretation
of Eq. (1.3) the sum over real negative exponentials is replaced by
Feynman’s sum over quantum paths expression [32] for a quantum
propagator describing the translation along a path x(t) of a two-
component wave function Ψ(x):
$$T(x, y|\lambda) = \mathcal{P}\exp\left(-i\int_x^y \tilde{q}(x', t|\lambda)\, dx'(t)\right), \tag{1.4}$$
Quantum theory began to take form at the end of the 19th cen-
tury as a result of Max Planck’s introduction [40] of the quantum of
light in 1900 in connection with the problem of understanding the
spectrum of thermal radiation emerging from an oven. There was
already an appreciation at the time of Planck’s paper that there were
a variety of physical phenomena, e.g. the dependence of the chemical
and spectral properties of atoms on their atomic number, the spec-
trum of thermal radiation, radioactivity, etc. that were refractory
to explanations based on classical physics. However, Planck’s focus
on the problem of understanding the entropy of thermal radiation
turned out to be pivotal for the future of physics. Following Planck’s
unveiling of light quanta, it was soon realized, largely as a result of
the work of Bohr and Sommerfeld [41], that Planck’s discovery had
profound implications for our understanding of atomic matter. Quan-
tum mechanics emerged in the 1920s because of a desire to extend
the Bohr–Sommerfeld quantum theory, which had only been success-
ful for simple (actually “integrable”) physical systems, to all types
of physical systems. As prophesied by Dirac [35], quantum mechan-
ics does in fact appear to provide us with a mathematically con-
sistent framework for understanding all known natural phenomena.
Quantum mechanics made its debut [42] in 1925 with the two simul-
taneous papers of Dirac and Heisenberg, Born, and Jordan. These
papers provided a foundation for theoretical physics where matri-
ces were used to represent physical quantities. Initially the physical
meaning of this “matrix mechanics” was rather obscure, although
it soon became clear [43] that the epistemological flaw with classi-
cal physics lay with the implicit assumption that the variables used
in classical physics, e.g. the position of a particle or the magnitude
and polarization of an electric field, could — at least in principle —
be simultaneously measured with arbitrary precision. Before 1925 it
had always been imagined that physics should be directly based on
measurable quantities. Heisenberg’s great achievement [43] was his
“uncertainty principle”, which explained that the flaw in classical
physics lay in the tension that always exists between the way exper-
imental measurements are carried out — particularly when atomic
phenomena are involved — and the desire that physics should be
based entirely on physical laws that were completely independent of
the way measurements are carried out. In the 1925 papers of Dirac
and Born, Heisenberg, and Jordan [42] it was proposed that this
This is true for both kitchen ovens and the universe. (The entropy
of the universe is to a very good approximation just the number of
cosmic microwave photons per gram of dark matter.) As was empha-
sized by Planck in his original work on thermal radiation [40], one of
the most satisfying consequences of quantizing the energy levels of a
system is that the absolute entropy of any physical system acquires
a well-defined combinatorial definition. The combinatorial definition
of the entropy of thermal radiation provided by Planck suggests that
quantum theory could be relevant to representing the information
theoretic aspects of Bayesian learning.
At first sight this may appear implausible because the equations
of quantum mechanics, e.g. the Schrodinger wave equation, are by
themselves deterministic, and therefore there is no obvious mech-
anism for representing the gathering of information required for
Bayesian learning. On the other hand, there is an underlying random-
ness associated with the choice of quantum paths in the path integral
formulation of the Schrodinger equation. In addition, including the
measurement process into a quantum description of the dynamics
of a system apparently does offer the possibility of introducing the
randomness represented by the conditional probabilities in Bayes’s
formula. There have been several attempts (see e.g. [48]) to describe
the effects of measurements by modifying the Schrodinger equation
so that it is no longer deterministic. However, there is as yet no
universal agreement as to which of these stochastic extensions of
Schrodinger’s equation would be the canonical best choice. For our
purposes we will follow the ideas of Schwinger [49], Caldeira and
Leggett [50], and Keldysh [51] regarding how to describe relaxation
processes due to measurements within the framework of quantum
mechanics. In particular, we will make use of the double path inte-
gral description of interacting quantum systems due to Feynman and
Vernon [47] (see also Appendix D).
Kappen [31] pointed out that as a function of the level of innovation
noise (the difference between observations and the state models for
a system) the Bellman dynamic programming equations change from
a system) the Bellman dynamic programming equations change from
being deterministic at low noise levels to being explicitly stochastic
at high noise levels. This transition is reflected in the relative con-
tributions of the reward function and KL divergence to the Bellman
function loss rate. At low noise levels, the KL divergence term can be
neglected and the Bellman loss rate will be determined by a reward
Dyson. In 1975, Dyson noticed [58] that at low light intensities where
photon noise becomes important, the feedback equations of adaptive
optics are formally identical with the theory of inverse scattering for
the 3D Schrodinger equation. In Dyson’s approach to adaptive optics,
the effect of the atmosphere on a flat 2D wave front is observed by
using a phase sensor that allows for observation of arbitrary 2D cor-
relations between the atmospheric noise in different optical channels.
These equations are a 3D generalization of the equations developed
in the 1950s by Gelfand, Levitan, and Marčenko [59,60] for the pur-
pose of finding the potential of the 1D Schrodinger equation based
on scattering data (see Appendix B). Dyson's discovery of a connection
between adaptive optics in the presence of photon noise and
the 3D Schrodinger equation naturally stimulated interest in why
two such seemingly disparate topics are connected; a full resolution of
this puzzle is still lacking to this day. Our aim though is somewhat different
than just understanding scattering solutions of the 3D Schrodinger
equation. As in our original paper [36], we will be focused on regres-
sion between observations and models for an entire control history
which terminates in a desired history.
In the following, we do not claim to prove that translating
probabilistic models for optimal control such as Bellman’s dynamic
programming into the language of quantum mechanics necessar-
ily provides better results than what might be achieved with conven-
tional computational resources. However, we do wish to emphasize
some ab initio advantages that quantum amplitudes enjoy in com-
parison with conventional probabilistic representations for Markov
decision processes (MDPs). One prominent advantage is the elegant
way in which quantum amplitudes can capture causal relationships.
This is very challenging [62] for conventional machine learning techniques,
especially in cases where the computational model aspires to
“artificial intelligence” [63]. As was noted by Feynman in his Nobel
Prize winning paper [64] introducing a relativistic quantum theory
of photons interacting with electrons and positrons, there is no nat-
ural way to combine causal and anti-causal influences within the
framework of classical electrodynamics (a footnote in [64]). On the
other hand, in his theory of quantum electrodynamics [64] Feynman
introduced a way of combining forward and backward in time prop-
agation for electrons that takes full advantage of the fact that the
relevant quantum amplitudes can be regarded as smooth functions of
that from the point of view of information theory is the most eco-
nomical. This principle is a legacy of William of Ockham, who early
in the 14th century [85] put forward one of the fundamental tenets
of science: that the best explanation for a physical phenomenon is
usually the simplest. It is perhaps counterintuitive that a principle
of physical science should underlie data analysis. However, William’s
principle of minimizing the complexity of the explanation for a set
of observations is at the heart of the notion that Bayes’s formula
provides the logical basis for solving a variety of problems includ-
ing stochastic estimation, Bayesian searches, and feedback control.
The maximum likelihood method [6], which is widely used to solve
these types of problems, short-circuits the full use of the Bayes formula
by looking only at ratios of the likelihood factor in Eq. (1.1).
However, as has been emphasized by MacKay [6], simply looking at
the likelihood that a model for the data yields a particular set of
observations can lead to serious errors when one must choose the
model from an ensemble of a priori approximately equally plausible
models. Reflecting Mumford’s insight [23] regarding the MDL princi-
ple and mammalian cognition, MacKay's "Occam razor" factor [6] is
possibly the best metric yet proposed for guiding data model selec-
tion. Chapter 3 concludes with a brief account of how the search for
methods for dealing with hidden factors [7,29] led to the Helmholtz
machine [8], which provides a logical framework for how conditional
probabilities which reflect the MDL principle might be computed as
Markov decision chains.
Chapter 4 focuses on control theory [3], and in particular on the
deterministic limit of Bellman optimization, known as Pontryagin
control [86,87]. The Pontryagin procedure for realizing the determin-
istic limit of optimal control is somewhat different than the Euler–
Lagrange variational procedure described in textbooks for obtaining
the classical equations of motion by minimizing the Maupertuis
action (see e.g. [88]). The Euler–Lagrange method for obtaining the
equations of motion for classical mechanics differs from the proce-
dure for obtaining the optimal path for feedback control by minimiz-
ing the Bellman cost function in that the Euler–Lagrange method
only demands uniform convergence for the classical position
trajectory x(t), whereas the Pontryagin limit of Bellman optimiza-
tion demands simultaneous uniform convergence in both the system
Thomas Bayes was born in 1701 and died in 1761. He was an ordained
minister in Tunbridge Wells — about 35 miles southeast of London.
Although he published no scientific papers during his lifetime, his
mathematical abilities must have been known to his contemporaries
because he was elected a Fellow of the Royal Society in 1742. After
his death, his “Essay Towards Solving a Problem in the Doctrine
of Chances” was published in the Philosophical Transactions of the
Royal Society [82]. This essay, arguably one of the most important
papers in the history of science, introduced a rigorous methodology
for estimating the probabilities for different possible explanations for
experimental observations based on the “evidence” [5,6]. Given the
revolutionary implications of Bayes’s essay, especially compared with
what was understood in the 18th century about the scientific method,
one might have expected that his formula would have immediately
been celebrated. Unfortunately, in what perhaps may be assayed as
one of the most significant lapses in the post-Renaissance progress
of science, it took nearly two centuries after its publication for the
great value of Bayes’s theorem for data analysis to be fully appreci-
ated. Fortunately, the fundamental usefulness of Bayes’s formula for
data analysis, Markov decision problems, and optimal control is now
widely appreciated (see e.g. [6]) — even if not widely used.
The origins of Bayes’s essay are not entirely understood, but it
seems likely [82] that he was motivated by the earlier work of another
One of the reasons for the long historical delay in making exten-
sive use of Bayes’s formula was apparently confusion as to how one
could estimate the generally unknown a priori probability distri-
butions. This uncertainty about the usefulness of Bayes’s formula
was eventually dissipated by the development of Bayesian approaches
to search and control problems, where the problem of defining the
prior probabilities for possible explanations is side-stepped by using
the probabilistic predictions from the prior step of the search or con-
trol process as the input a priori probability for the next step. The
problem with the initial a priori probability lingers, but it is often
the case that the final answer is insensitive to the exact initial a priori
probability distribution. In addition, conceptual unease with uncer-
tainties in a priori probabilities has been largely erased by the notion
of using a model generation or “adversarial” network to predict the
probabilities for observed data.
The success of Kalman’s filter also relies on the use of two Gaussian
processes: (1) a noise source N (t) which limits the ability of an
observer to measure a signal (i.e. Y (t) = Ẑ(t)+N (t)), and (2) another
GP w(t) which represents an intrinsic randomness in the system
dynamics. Kailath [83] introduced the designation "innovation" for
the residual Y(t) − Ẑ(t), in recognition of the fact that it represents that
part of an observation which yields information about the new state. The gen-
eral scheme (Fig. 4.1) can be pictured as an interaction between an
“observer–controller” and a system; e.g. a mechanical device or an
“environment” (cf. the zebras and their surroundings in Fig. 2.1).
The objective of the Kalman filter is that, given the GPs, w(t) and
v(t), minimize the mean square difference between the estimated cur-
rent state of the system X(u, t) and a desired target state XT .
In cases where both the state X(t) of the environment and
measurements Y (t) are continuous matrix functions of time, these
matrices satisfy:
$$\hat{Z}(t) = H(t)\hat{X}(t), \qquad \dot{\hat{X}}(t) = F(t)\hat{X}(t) + K(t)\big(Y(t) - \hat{Z}(t)\big), \tag{2.8}$$
where the observations Y (t) differ from the signal Z(t) by an obser-
vational noise ν(t); i.e. Y(t) = Z(t) + ν(t), and the gain $K(t) = P(t)H^T R^{-1}$
describes the increase in our knowledge of the system based on con-
tinuous observation of particular features of the system, and it is
assumed that all the coefficient matrices are known functions of
time. Here, Hk is a matrix which defines the “features” {Zk } of the
environment which are of greatest interest to designated controllers.
It is a result of combining observation of a system (or ‘environment’
in the case of RL) with the linear dynamical model, Eq. (2.3), for the
system that the controller hopes to gain enough information about
the state or environment to take effective corrective actions.
The time-dependent covariance $P(t) = E[\tilde{x}(t)\tilde{x}^T(t)]$ for the error
$\tilde{x}(t) = x(t) - \hat{x}(t)$ can then be found [118] by numerically solving
an ordinary nonlinear (Riccati) differential equation:
$$\dot{P} = -PH^TR^{-1}HP + FP + PF^T + Q, \tag{2.9}$$
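For readers who want the mechanics of Eqs. (2.8) and (2.9) in executable form, here is a minimal discrete-time sketch; the predict/update split and the matrix names F, H, Q, R follow standard Kalman conventions, and the discretization itself is our simplifying assumption rather than anything prescribed in the text:

```python
import numpy as np

def kalman_step(x_hat, P, y, F, H, Q, R):
    """One predict/update cycle of a discrete-time Kalman filter,
    a stand-in for the continuous Eqs. (2.8)-(2.9)."""
    # Predict with the linear dynamical model.
    x_pred = F @ x_hat
    P_pred = F @ P @ F.T + Q
    # Innovation: the part of the observation carrying new information.
    innovation = y - H @ x_pred
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # gain, the analog of K(t)
    x_new = x_pred + K @ innovation
    P_new = (np.eye(len(x_hat)) - K @ H) @ P_pred
    return x_new, P_new
```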
$$V(T) = \min_{[t, t_1]}\ \min_{[t_1, T]}\left[\int_t^{t_1} l(X(\tau), U(\tau), \tau)\, d\tau + \int_{t_1}^{T} l(X(\tau), U(\tau), \tau)\, d\tau\right]. \tag{2.10}$$
where
$$L = p\dot{q} - H(q, p). \tag{2.13}$$
Although using a Lagrangian function of q and dq/dt rather than a
Hamiltonian function of q and p might seem to be a trivial differ-
ence, it turned out that using the Principle of Least Action as the
starting point for formulating quantum mechanics made a profound
difference.
Following Dirac’s lead [35], Richard Feynman investigated in his
PhD thesis [32] what role the classical principle of least action might
where S[b, a] is the classical action (cf. [47]) going from xa to xb , and
D[b, a] denotes a sum over all paths leading from xa to xb . For a free
particle with mass m in one-dimension, this propagator takes the
simple form [47]:
$$K(x_b, t_b; x_a, t_a) = \frac{1}{A(t_b - t_a)}\exp\left[\frac{im(x_b - x_a)^2}{2\hbar(t_b - t_a)}\right], \tag{2.17}$$
brains of other animal species is that the mammalian brain has some
capability of dealing with ambiguities in the interpretation of sensory
data, which necessarily involves [78] combinatorial optimization.
Our introduction of the term quantum self-organization in con-
nection with the TSP is a pointer to the relationship between our
solution for the TSP and the appearance of holomorphic functions
in Kohonen self-organization of sensory data [12]. Our discovery was
prompted by the Durbin–Willshaw elastic net method [1] for finding
solutions to the TSP. As the name suggests, this involves adding to
the locations of the cities to be visited, indicated by the round dots
in Fig. 2.2, a trial "itinerary" for the salesman, the square points in
Fig. 2.2, and then connecting all the points to each other by springs.
The lengths d(i, μ) of the springs connecting the nodes where the
salesman is assumed to stop to the actual location of cities that are
to be visited is the “innovation” for the Durbin–Willshaw method;
i.e. the distance between the actual locations of the cities and a model
for the salesman’s itinerary.
In the Durbin–Willshaw approach [1], the TSP is solved by the
simple expedient of connecting the initially randomly placed movable
nodes with elastic strings, and the cities to be visited by nonlinear
springs, and allowing the system to relax to the lowest energy state
using gradient descent dynamics for an energy functional [1]:
$$E[\{w_i\}] = -\sum_{\mu} \log \sum_i \exp\left(-\frac{|\xi^\mu - w_i|^2}{2}\right) + \frac{K}{2}\sum_i |w_{i+1} - w_i|^2. \tag{2.19}$$
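A minimal sketch of one relaxation step under an energy of the form (2.19) might look as follows; the Gaussian width kappa (annealed toward zero in the original method) and the learning rate are our illustrative additions, since Eq. (2.19) as written fixes the length scale to one:

```python
import numpy as np

def elastic_net_step(w, cities, kappa=0.2, K=0.1, lr=0.05):
    """One gradient-descent step on an energy of the form (2.19).
    w: (M, 2) itinerary nodes on a closed tour; cities: (N, 2) locations."""
    d2 = ((cities[:, None, :] - w[None, :, :])**2).sum(-1)  # |xi_mu - w_i|^2
    lam = np.exp(-d2 / (2 * kappa**2))
    lam /= lam.sum(axis=1, keepdims=True)  # soft assignment of city mu to node i
    attract = (lam[:, :, None] * (cities[:, None, :] - w[None, :, :])).sum(0)
    tension = K * (np.roll(w, 1, 0) - 2 * w + np.roll(w, -1, 0))
    return w + lr * (attract + tension)
```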
When the locations {ξ μ } of the square points in Fig. 2.2 are not too
far from the round points wi, this approach gives a satisfactory solution
for the TSP via gradient descent on the energy functional (2.19). What makes
the traveling salesman problem especially interesting from the point
of view of using quantum mechanics to solve optimal control and RL
problems is that the term in Eq. (2.19) involving the difference in
the positions of the round and square points can be replaced [36] by
a Feynman path over all paths marked with squares.
$$K(t - t_0) = \int \exp\left(\frac{i}{\hbar}\int_{t_0}^{t} \frac{m}{2}\,|\dot{x} - v(t')|^2\, dt'\right) Dy(t'), \tag{2.20}$$
Fig. 2.2. Durbin–Willshaw setup for solving the traveling salesman problem.
where the classical velocity v(x) is defined for all x by the actual
motion of the salesman, and y(t) = x(t) − xcl (t) is the deviation of
the Feynman path from the salesman’s itinerary.
Equation (2.19) together with Fig. 2.2 implicitly illustrates one
aspect of the Bayesian model selection problem that is particularly
troublesome; namely, connecting data points with models will in gen-
eral involve a topologically non-trivial planar graph (i.e. a planar
graph where at least two lines defining the graph cross one another).
If one changes the order in which the cities are visited, then in gen-
eral both the solid line marking the salesman’s path and the dashed
line will cross one another. This makes improving the model for
the salesman's itinerary using Markov chain Monte Carlo regression meth-
ods essentially intractable (see e.g. [79]). However, because of the
general mathematical equivalence of topologically nontrivial planar
graphs and topologically simple paths on Riemann surfaces [80], the
regressions that are ill-defined for planar graphs can be carried out
on a topologically nontrivial surface.
In an almost obvious way, the Durbin–Willshaw setup can also be
interpreted as a control problem by interpreting the x and y coordi-
nates of the square dots as estimations of a time-dependent vector
X(t) describing the evolution of the state of a system in phase space
(viz. the evolution of position and velocity variables ( x, ẋ ) for a
self-driving car). In this interpretation, the round points represent
an underlying model {ẑ(t)}, for how these variables vary with time.
The distances d(i, μ) in Fig. 2.2 correspond to Kailath's "innovation"
Chapter 3

Ockham's Razor
where the integral extends over all space and the log is base 2.
The Shannon entropy, Eq. (3.4), is a measure of the progress for
Bayesian searches, optimal control, and reinforcement learning. All
three of these types of machine learning problems can be charac-
terized as the problem of finding a policy for choosing the sequence
of actions so that the Shannon entropy H(pN ) is minimized after
N steps. Bellman's optimization of this cost function introduced in
his paper on dynamic programming [20] minimizes at each step the
entropy (3.4):
where the second term on the r.h.s is the expected information avail-
able after step n + 1 given the choice An = A assuming that prior
information is the information contained in the posterior probability
density pn (x). The l.h.s of (3.6) also represents the mutual informa-
tion between the conditional distributions for x∗ and Yn . This mutual
relation [20]
$$p(x_N = x^*|U_{N-1}, Y_N) = \frac{\prod_{k=1}^{N} p(Y_k|U_{k-1}, x^*, p_{k-1})\; p_1(x^* = x)}{p(Y_n|U_n)}. \tag{3.13}$$
In a completely analogous manner to the way the posterior proba-
bility for a Bayesian search p(x|D) was obtained by integration over
all possible values of an interpolation function for input data labels,
the posterior probability that after N + 1 steps the new state will
be $x_N + \Delta x_N$ can be obtained by integrating over an interpolation
function U(x) for the controls:
$$p(\Delta x|D_{N-1}, x_N) = \int DU(x)\, p(\Delta x|U(x), x_N)\, p(U(x)|D_{N-1}). \tag{3.14}$$
If all the a priori probabilities are roughly the same, then MacKay’s
prescription is to turn to the evidence P (D|Hi ) in order to rank
The ratio of prior probabilities on the r.h.s of Eq. (3.16) allows one
to input one’s personal judgement regarding the relative elegance or
simplicity of the two models. However, MacKay [6] points out that
the ratio of the evidence factors in Eq. (3.16) allows one to assay
the relative simplicity of two models in a way that is independent of
subjective judgments. Moreover, the evidence factor for any model
can be estimated from the way the model parameters needed for
the given dataset are distributed relative to the ML value for these
parameters:
$$P(D|H_i) \cong P(D|w_{ML}, H_i)\, P(w_{ML}|H_i)\, \sigma_{w|D}, \tag{3.17}$$
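A hedged sketch of how Eq. (3.17) is used in practice: under a Laplace (Gaussian) approximation the posterior width σ_w|D becomes a determinant factor, so the log evidence splits into a best-fit log likelihood plus an Occam penalty. The function below is our illustration, not MacKay's code:

```python
import numpy as np

def log_evidence_laplace(log_lik_ml, log_prior_ml, A):
    """Laplace estimate of the evidence in the spirit of Eq. (3.17).
    A = -Hessian of the log posterior at the ML parameters w_ML; in k
    dimensions the width sigma_{w|D} generalizes to
    (2*pi)^{k/2} / sqrt(det A)."""
    k = A.shape[0]
    _, logdetA = np.linalg.slogdet(A)
    log_occam = log_prior_ml + 0.5 * k * np.log(2 * np.pi) - 0.5 * logdetA
    return log_lik_ml + log_occam   # best fit  +  Occam factor
```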
$$p(D|k_m) = \sum_{m=1}^{M} p(D|\theta^n_m, k_m)\, p(\theta^n_m|k_m), \tag{3.18}$$
where L(α) is the likelihood function defined in Eq. (3.21). The exact
answer for the best model is still given by maximizing the probability
defined in Eq. (3.23), and this is equivalent to minimizing the free
energy of the avatar physical system:
$$F = \sum_{\alpha}\left\{E_\alpha P(\alpha) - \big(-P(\alpha)\log P(\alpha)\big)\right\}. \tag{3.20}$$
$$P\left(\{d_n\}_{n=1}^{N} \,\middle|\, \{\theta_\alpha\}_{\alpha=1}^{M}\right) = \prod_{n=1}^{N}\prod_{\alpha=1}^{M}\left[P(d_n|\Theta_\alpha)\right]^{k^n_m}, \tag{3.21}$$
observational input.
Determining MacKay’s Occam’s razor is also closely related to
the problem of constructing “adversarial” networks, i.e. devising an
algorithm which will generate authentic looking input data given a
suitable choice of network parameters. The construction of adversar-
ial networks that are of practical use in general settings is currently
an active area of research in the data science community. Here we will
focus on the approach of Neal, Hinton, et al., known as the Helmholtz
machine [8,9].
where the $s^l_j(\alpha)$ are the spin excitations in the l'th layer. Because the
excitations are stochastic variables, running the recognition network
many times over generates a probability distribution Qα . According
to Mumford–Rissanen [26], the best explanation minimizes the total
cost C(α):
$$C(\alpha) = -\sum_{\alpha}\left[Q_\alpha E_\alpha + Q_\alpha \log Q_\alpha - Q_\alpha \log(P_\alpha/Q_\alpha)\right], \tag{3.25}$$
In this way, we recover Todorov’s formula, Eq. (1.3), for the Evidence
factor. Of course, in both quantum mechanics and Todorov’s formu-
lation of control theory, the devil lies in the difficulty of taking into
account many alternative paths in a path integral representation for
the Bellman function.
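The bookkeeping behind this minimization is easy to make concrete. The sketch below adopts the convention E_α = −log P_α (our assumption for the example), under which the free energy of the avatar system reduces to the KL divergence of the recognition distribution Q from the generative distribution P:

```python
import numpy as np

def free_energy(Q, E):
    """Variational free energy <E>_Q - H(Q) of a recognition
    distribution Q over explanations alpha with energies E_alpha.
    With E_alpha = -log P_alpha this equals KL(Q || P), the total
    description cost minimized by the wake-sleep algorithm."""
    Q = np.asarray(Q, dtype=float)
    Q = Q / Q.sum()
    return float(np.sum(Q * E) + np.sum(Q * np.log(Q + 1e-300)))
```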
Chapter 4

Control Theory
$$\dot{g} = f(A + u_1E_1 + \cdots + u_lE_l), \qquad f \in TG,\ u \in \mathbb{R}^l, \tag{4.21}$$
4.4. H∞ Control
$$x_k = \mu_k + P_k\lambda_k^T \tag{4.32}$$
for the state variables. The solution to Eq. (4.31) that minimizes Jmax
w.r.t. xk and yk is x̂k = μk and yk = Hk μk . Evidently, given the
existence of a bound on the magnitude of J, we obtain a quiescent
equilibrium state. Thus, H∞ control does seem to provide relief for
an adversarial increase in the innovation noise.
In contrast with the Kalman filter, the variance of the innovation
is bounded. In other words, the actual forward (or backward) phase
space trajectories for a system or environment are uniformly close
to the observed trajectories. (This is also the miracle of Pontryagin
control [86,87]). The geometric and topological proximity of these
trajectories allows one to picture the forward and backward “innova-
tions”, i.e. two fluctuating smooth surfaces representing differences
between the model and observed trajectories viewed from the per-
spective of the observer/controller or environment acting as the RL
agent while the degrees of freedom of the adversary are frozen. The
areas of these two surfaces are just the Bellman values V of the
actions of the two agents. In the von Neumann equilibrium state,
the two surfaces have the same shape, but the Bellman values have
opposite signs due to the reversal of the direction of time. (In quan-
tum mechanics, systems propagating backward in time have negative
energy [64].)
Chapter 5

Integrable Systems

$$\frac{d^2\psi}{dx^2} + \Gamma^2(x)\psi(x) = 0, \qquad \Gamma^2(x) = \frac{2m}{\hbar^2}\left[E - v(x)\right] \tag{5.2}$$
of the form
$$\psi_{MAF}(x) \equiv C_1\,\frac{Ai(\xi)}{\sqrt{\xi'(x)}} + C_2\,\frac{Bi(\xi)}{\sqrt{\xi'(x)}}, \tag{5.3}$$
v(x) (even very near to a turning point where the WKB approxima-
tion fails) [70,71].
Airy functions were originally introduced into quantum mechanics
in order to solve the problem of quantum motion in a linear potential
[14], but later made an appearance in connection with the general
problem of relating the oscillating and exponentially decaying solu-
tions of the 1D Schrodinger equation at classical turning points [71].
What is of paramount interest for us is that the solutions (5.3) pro-
vide approximate solutions for the time-independent 1D Schrodinger
equation that are accurate for all values of x, and for any poten-
tial. Thus, the MAFs may be especially useful for analytically rep-
resenting the progress of feedback control or RL, both of the past
and future. This property of MAFs is shared with solutions of the
KdV and NLS equations and reflects a universal property of func-
tions that satisfy integrable PDEs. They also share the fact that they
have meromorphic integral representations as Cauchy integrals [68].
This analytic behavior and its attendant connection with solvability
is fundamental to our approach to optimal control.
In Landau’s Appendix for Quantum Mechanics [15] (which has a
colorful history going back to the time he was a postdoctoral fellow in
Copenhagen), he points out that the solutions to the Airy equation
can also be expressed as an integral along a line in the complex plane.
When the contour is the imaginary axis, the representation for the
forward propagating solution is
$$Ai(x) = \frac{1}{\pi}\int_0^{\infty} \cos\left(sx + \frac{s^3}{3}\right) ds. \tag{5.4}$$
A similar integral expression exists for the backward propagating
eigenstate Bi(x). Arnold Its [68] pointed out that these integral rep-
resentations for the Airy function can also be reformulated as a
Riemann–Hilbert problem (see Appendix B for an introduction to
the Riemann–Hilbert problem, which goes back to Riemann’s PhD
thesis). The Riemann–Hilbert problem [67–69,73] is to reconstruct
a function that is analytic in the complement of a closed contour Γ
from the discontinuity of the function along the contour. Applying
this to the case where the holomorphic functions are the two simple
momentum space eigenstates of the 1D Schrodinger equation [13] for
a linear potential yields the representation in Eq. (5.4). Except for
arcs at infinity, the RH contour Γ consists of the real line plus the
60◦ and 120◦ diagonal lines in the complex plane. Remarkably, these
same lines played a central role in the “Eightfold Way” scheme for
constructing representations of SU(3) [118] that played a historical
role in the early understanding of elementary particles with strong
interactions. The Riemann–Hilbert problem of interest to us consists
in recovering the 2-component analytic function Φ(z) and the solu-
tion of the nonlinear Airy equation from a jump condition across Γ:
$$\Phi_+(z) = \Phi_-(z)\,G(z), \qquad G(z) = \begin{pmatrix} 1 & \Gamma_k \displaystyle\int \frac{e^{-i(2xs + 8s^3/3)}}{s - z}\, ds \\ 0 & 1 \end{pmatrix}, \tag{5.5}$$
where Φ+ and Φ− are the holomorphic pieces of Φ defined on the two
sides of the RH contour Γ. The integral
in Eq. (5.4) is recovered from the component of Φ corresponding to
the Ai(x) solution of the Airy equation with a wave incident from
the left:
$$u(x, t) = 2i \lim_{z\to\infty} z\,\Phi(z)_{12}. \tag{5.6}$$
Its [68] generalized this construction for Ai(x) to a solution that
includes both of the two independent solutions of the Airy equation.
The jump contour Γ now consists of the real line (the "I-spin" axis)
plus the entire "U-spin" and "V-spin" axes. The jump condition
is modified so that along the added pieces of the "U-spin" and "V-spin"
axes, the jump matrix G(z) in Eq. (5.5) is replaced by its
inverse [68,69]. T–O describe a version of this construction where the
analytic matrices Φ(z) involve MAFs near the turning points. They
enjoy nice analytic behavior when ξ(x) is extended to the entire
complex plane — with the exception of the origin — which induced
T–O to modify the jump contour by adding a circle around the ori-
gin where Φ is a 2 × 2 matrix that describes the two independent
MAFs that appear inside and outside this circle around the origin
in the complex plane. The matrix Φ is holomorphic inside and outside
the jump contour, which anticipates the Segal–Wilson solution for
the KdV equation.
The matrix Φ has a Cauchy integral representation in terms of its
discontinuity across the jump contour
$$\Phi(z) = \frac{1}{2\pi i}\int_\Gamma \frac{\Phi_+ - \Phi_-}{s - z}\, ds, \tag{5.7}$$
where on the contour
$$\Phi_+(s) = \Phi_-(s)\,G(s).$$
Landau's integral representation g(λ)dλ for the Airy function
can be recovered by taking the limit λ → ∞ of λG₁₂. When the
contour is the real axis, the jump condition is
$$\Phi_+ = \Phi_-\begin{pmatrix} 1 - |r|^2 & -\bar{r}\,e^{-2ixs} \\ r\,e^{2ixs} & 1 \end{pmatrix}. \tag{5.8}$$
In this formalism, the input data which allow one to extract u(x, t)
are the reflection coefficients r(s). If the contour is a polygon with
$G(\lambda) = M_1 M_2 \cdots M_k$ a product of piecewise constant matrices,
then the meromorphic scattering amplitude S(λ) can be constructed
as a product of τ-functions [68]:
$$\tau_k(Y) = Y(\lambda)M_k, \tag{5.9}$$
where
$$Y(\lambda) = \begin{pmatrix} 1 & \displaystyle\int \frac{g(\mu)}{\mu - \lambda}\, d\mu \\ 0 & 1 \end{pmatrix}.$$
$$\frac{\partial u}{\partial t} + \frac{\partial^3 u}{\partial x^3} + 6u\frac{\partial u}{\partial x} = 0, \tag{5.10}$$
$$\frac{\partial \Psi(x, t)}{\partial t_n} = B_n\Psi(x, t). \tag{5.12}$$
In both the KdV and NLS cases, this Riemann surface arises as a
corollary of the Burchnall–Chaundry theorem [77] for commuting
scalar differential operators. The wave function φ(z, x), as well as
the functions q(x) and p(x), can be calculated exactly in terms of
the Θ-functions associated with this Riemann surface [56].
The integrable structure for the KdV and NLS equations is largely
hidden. Indeed, initially it was not even suspected that these equa-
tions were completely integrable. However, following a decade-long
campaign by some talented mathematicians, this hidden structure
was finally revealed (see [72–75] for nice reviews). The solution of
the KdV equation representing a single solitary wave turned out to
have the form [75] $u_1(x,t) = -2\eta^2\,\mathrm{sech}^2\!\big(\eta(x - x_0 + \eta^3 t + \eta^5 t_5 + \cdots)\big)$,
where the appearance of an infinite set of independent time variables
reflects the fact that the KdV equation is an example of an integrable
dynamical system with an infinite number of degrees of freedom. An
important development in the theory of the KdV equation was the
discovery by Hirota [76] that multi-solitary wave solutions of
the KdV equation can be represented in terms of Θ-functions:
$$u(x, t) = 2\frac{\partial^2}{\partial x^2}\ln \tau(\theta_1, \ldots, \theta_N), \tag{5.16}$$
where for multiple solitary wave solutions of the KdV equation:
$$\tau(x_1, t_2, \ldots, t_N) = \sum_{n_i, n_j \in \mathbb{Z}} \exp\left[i\pi\left(\sum_{i,j=1}^{N} n_i n_j T_{ij} + 2\sum_{j=1}^{N} \theta_j n_j\right)\right]$$
The model selection problem for the KdV equation amounts to choos-
ing a set of solitary waves and values for the initial positions xi0 and
“momenta” ηi that best explain a set of observations of the wave
amplitude that, say for practical reasons, are limited in their scope
of times and locations. Finding the best choice of parameters for
the τ -function in Eq. (5.16) based on actual video observations of
wave amplitudes would be a very difficult problem for conventional
machine learning if many solitary waves were present.
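To make this concrete, the sketch below evaluates u(x, t) from Hirota's two-soliton τ-function via Eq. (5.16); the wavenumbers and phase constants are exactly the free parameters that a Bayesian fit to observed wave amplitudes would have to select, and the numerical second derivative is our shortcut:

```python
import numpy as np

def kdv_two_soliton(x, t, k=(1.0, 2.0), delta=(0.0, 0.0)):
    """u(x, t) = 2 d^2/dx^2 log(tau) for Hirota's 2-soliton tau function
    of the KdV equation u_t + 6 u u_x + u_xxx = 0 (cf. Eq. (5.16))."""
    k1, k2 = k
    e1 = np.exp(k1 * x - k1**3 * t + delta[0])
    e2 = np.exp(k2 * x - k2**3 * t + delta[1])
    A12 = ((k1 - k2) / (k1 + k2))**2        # soliton interaction phase
    tau = 1.0 + e1 + e2 + A12 * e1 * e2
    dx = x[1] - x[0]
    return 2.0 * np.gradient(np.gradient(np.log(tau), dx), dx)

x = np.linspace(-20, 20, 2001)
u = kdv_two_soliton(x, t=0.0)   # two humps of heights k_i^2 / 2
```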
In the context of using the KdV equation as an avatar for Bayesian
learning, this model selection problem amounts to choosing a "Bäcklund
transformation" [75]. This involves transforming the τ-function
to accommodate the addition of another solitary wave. Using the
formula for u(x, t) as a second derivative of the τ -function one finds
that the new τ -function can be expressed in terms of asymptotic
scattering wave functions for the KdV Lax equation:
The work of Its [68] set the data science community on the path
connecting the use of the inverse scattering method to solve nonlin-
ear integrable PDEs to construct special meromorphic functions in
a neighborhood of the north pole of a Riemann surface, which can
be used to construct a 1:1 map from input data to data features.
Following the success of the inverse scattering method for constructing
exact solutions of the KdV or NLS equation, Segal and Wilson
discovered [73] a nice geometric way of side-stepping the usual way
of solving the GLM integral equation. In the Segal–Wilson approach,
scattering amplitudes for the Schrodinger or Dirac equations appears
as a discontinuity between square-integrable holomorphic functions
defined in the upper and lower halves of the complex plane. Their
construction is based on the introduction of the “Grassmannian” of
all closed subspaces W of the Hilbert space H consisting of square
$$g : S^1 \to H_+ \oplus H_-, \tag{5.23}$$
$$g(\lambda) = \begin{pmatrix} a_\lambda & b_\lambda \\ c_\lambda & d_\lambda \end{pmatrix}. \tag{5.24}$$
where
$$S_\lambda = \frac{1}{d_\lambda}\begin{pmatrix} 1 & b_\lambda \\ -c_\lambda & 1 \end{pmatrix} \equiv \begin{pmatrix} T_\lambda & R_\lambda \\ -R_\lambda & T_\lambda \end{pmatrix} \tag{5.26}$$
If the tk in Eq. (5.30) are nonzero, then the action of the exponential
factor in Eq. (5.17) represents the effect of the multiple flows on the
solution uw (z), provided that these are independent linear flows for
each value of k, which of course makes sense since the KdV equation is
an integrable system with infinitely many commuting flows. The transversal condition means that
the parameters defining w satisfy certain conditions, which in the
“Kyoto school” theory of the KdV equation are met by demanding
that the Baker function be derived as the ratio of two τ -functions as
in Eq. (5.17), which explains the meromorphic structure of the Baker
function a la Wiener filter.
In the multi-solitary wave case, these τ -functions can also be rep-
resented as a determinant of propagators for solutions of a Fokker-
Planck equation [120]:
$$D(z) = \sum_{l=1}^{\infty} \frac{z^l}{l!} \int_a^b ds_1 \cdots \int_a^b ds_l\, \det\begin{pmatrix} K(s_1, s_1) & \cdots & K(s_1, s_l) \\ \vdots & & \vdots \\ K(s_l, s_1) & \cdots & K(s_l, s_l) \end{pmatrix}, \tag{5.31}$$
The solution for u(x, t) obtained from (5.31) is the same as the
analytic expression involving theta functions for a Riemann surface
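The series (5.31) is the expansion of a Fredholm determinant det(I + zK), which can be estimated by Nyström discretization; the quadrature order and the sine-kernel example (anticipating Eq. (5.44)) are our illustrative choices:

```python
import numpy as np

def fredholm_det(kernel, a, b, z, n=80):
    """Nystrom estimate of det(I + z*K) on [a, b], the operator form of
    the series (5.31), using Gauss-Legendre quadrature."""
    nodes, weights = np.polynomial.legendre.leggauss(n)
    s = 0.5 * (b - a) * nodes + 0.5 * (b + a)
    w = 0.5 * (b - a) * weights
    K = kernel(s[:, None], s[None, :])
    # Symmetrized weighting keeps the discretized operator well behaved.
    M = np.eye(n) + z * np.sqrt(w)[:, None] * K * np.sqrt(w)[None, :]
    return np.linalg.det(M)

# Example: the sine kernel sin(x - y)/(x - y) that reappears in Eq. (5.44).
sine = lambda x, y: np.sinc((x - y) / np.pi)
print(fredholm_det(sine, -1.0, 1.0, z=-1.0))
```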
The Lax equation approach for finding exact solutions also works for
the nonlinear Schrodinger equation [69,75]. Of particular interest to
us is its “complexified form” where the scalar wave amplitude u(x, t)
of the KdV equation is replaced by two amplitudes p(x, t) and q(x, t),
which play the role of momentum and position controls:
$$i\frac{\partial q}{\partial t} = -q_{xx} + |p|^2 q, \qquad i\frac{\partial p}{\partial t} = p_{xx} - |q|^2 p. \tag{5.32}$$
If we assume p = ±q, then we have the real form analogs to the KdV
equation:
$$i\frac{\partial u}{\partial t} = -u_{xx} + 2|u|^2 u, \tag{5.33}$$
where
$$M = i\begin{pmatrix} d/dx & -q \\ p & -d/dx \end{pmatrix}. \tag{5.35}$$
$$\psi_2 = \phi(z, x)\psi_1,$$
For similar reasons in the case of the NLS equation it is almost nec-
essary from the beginning to recognize that the ψ(x, P ) amplitudes
are quantum fields.
A “second quantized” version of the NLS equation was introduced
by Faddeev et al. [125]. Their Quantum Inverse Scattering formalism
allows one to express the τ -function and Baker function for the sec-
ond quantized NLS equation in terms of expectation values for prod-
ucts of creation and annihilation operators for the oscillator array.
The Hamiltonian for NLS model introduced by Faddeev & Co is
the same as the Hamiltonian for discretized version of a 1D gas of
strongly repulsive bosons, with interactions:
$$H = \int dx\left[\partial_x\Psi^{\dagger}\,\partial_x\Psi + c\,\Psi^{\dagger}\Psi^{\dagger}\Psi\Psi\right], \tag{5.37}$$
where λ is the spectral parameter and $E_N = \sum_{j=1}^{N}(\lambda_j^2 - \mu_F)$. The
momentum eigenvalues associated with excitations of the Bethe vac-
uum are
The quantities that would most naturally play the role of the real
valued kernel functions K(x, y) that appear in the theory of the KdV
equation are the equal time correlation functions
$$Q(x_1, x_2) = \left\langle T \int_{x_1}^{x_2} \Psi^{\dagger}(y)\Psi(y)\, dy \right\rangle, \tag{5.43}$$
$$K(\lambda, \mu) = \sqrt{\vartheta(\lambda)}\,\frac{\sin(\lambda - \mu)}{\lambda - \mu}\,\sqrt{\vartheta(\mu)}, \tag{5.44}$$
where ϑ(λ) is the Fermi weight $1/(1 + \exp(\lambda^2 - \beta))$. The analog of the
conventional scattering problem is the limit |λ| → ∞. Because we have ana-
lytic expressions for these kernel functions, we can naturally make
contact between traditional methods of data analysis and our use
of integrable models to make predictions for optimal control or RL
strategies.
The τ -function τ (λ) is defined to be the trace of the analog of the
S-matrix [125]:
Chapter 6

Quantum Tools
This group is "nilpotent", which means that the Lie algebra for the
Heisenberg group has the same form as the above matrix minus the
identity operator. (Incidentally, this is the mathematically rigorous
formulation of the original matrix mechanics of Heisenberg, Born,
and Jordan [42]).
Among the representations of the Weyl–Heisenberg group, the
representations related to the energy eigenstates of a harmonic oscil-
lator will be of special importance to us. In particular, Bargmann
introduced a type of quantum coherent state for 1D quantum oscil-
lators, known as the BFS states [94]. The Bargmann–Segal trans-
form [95] is a map f (x) → F (z) from square integrable functions on
Euclidean space Rd to Cd :
$$F(z) = \frac{1}{\pi^{d/4}}\int_{\mathbb{R}^d} e^{-\frac{1}{2}(z^2 + x^2) + \sqrt{2}\,x\cdot z}\, f(x)\, dx \tag{6.4}$$
The power of the BS transform (6.4) is that a space of holomorphic
functions defined on a compact domain can be mapped to a compact
space of harmonic oscillators with real valued wave functions.
Because of the ubiquitous importance of holomorphic functions for
Bayesian learning, this result is potentially of great interest to us.
In 1928, Fock had observed [94,95] that regarded as operators in
a Hilbert space of holomorphic functions z and d/dz obey the same
$$K(z, w) = e^{z\cdot\bar{w}}. \tag{6.5}$$
The presence of the factor μ(w) in this equation means that the
displacement operator f (z) → f (z − a) is not unitary; instead one
represents displacements with a unitary operator
$$T_a f(z) = e^{-\frac{|a|^2}{2} + z\cdot\bar{a}}\, f(z - a). \tag{6.7}$$
$$[A, A^{*}] = 1. \tag{6.8}$$
The last factor in Eq. (6.11) is a signature for the fact that a+ and a
operators obey the Fock commutation relation (6.8). In the position
representation commonly used for the Schrodinger equation, these
states have the form
$$|\alpha\rangle = \exp\left(-\frac{x^2}{2} + \sqrt{2}\,\alpha x - \frac{\alpha^2}{2}\right). \tag{6.12}$$
$$= \exp\left[-\frac{i\omega T}{2} - \frac{L\omega}{4\hbar}\left(a^2 + b^2 - 2ab\,e^{i\omega T}\right)\right]. \tag{6.14}$$
$$\psi(x, T) = \exp\left[-\frac{i\omega T}{2} - \frac{m\omega}{4\hbar}\left(x^2 - 2ab\,x\,e^{-i\omega T} + a^2\cos(\omega T)\,e^{-i\omega T}\right)\right].$$
$$\tilde{f}(z) = \sum_{\alpha,\beta} (f, \Phi_{\alpha,\beta})\,\Phi_{\alpha,\beta}, \tag{6.15}$$
where f ∈ L2 (CN ). When α, β are integers and Φα,β has the form
$$\Phi_{\mu,\nu}(z) = \frac{1}{(2\pi)^{N/2}}\int e^{ix\cdot\xi}\,\Phi_\mu\!\left(\xi + \frac{y}{2}\right)\Phi_\nu\!\left(\xi - \frac{y}{2}\right) d\xi, \tag{6.16}$$
where $H_{n_i}$ is the Hermite function for a single Fock state, $n = \{n_j\} \in \mathbb{Z}^N$
and $x = \{x_j\} \in \mathbb{R}^N$. It can be shown that the set of functions
{Φn } for all n provides a basis for the Hilbert space L2 (RN ). This
Hilbert space, in common with the Hilbert space for the single quan-
tum oscillator, is infinite dimensional. However, it can be truncated
in a natural way by restricting attention to values of n ∈ ZN , and using
log2 N qubits to label the values of n. These states not only form
the basis for the Hilbert space L2 (CN ), but also form a space of
$$f(r) = \sum_{k=0}^{\infty}\left(\int_0^{\infty} f(s)\,\varphi_k(s)\, s^{2n+1}\, ds\right)\varphi_k(r), \tag{6.18}$$
where the $\varphi_k(r) = \left[\frac{2^{-n}\,k!}{(k+n)!}\right]^{1/2} r^{n}\, e^{-r^2/4}\, L_k^n(r)$ are the generalized Laguerre
functions which also appear in elementary quantum mechanics [14].
The expansion (6.18) can also be expressed as a projection:
$$P_k f(z) = \sum_{|\beta|=k}\sum_{\alpha} (f, \Phi_{\alpha,\beta})\,\Phi_{\alpha,\beta},$$
In the case where the shifts x and y are restricted to the integers mod
n, z lies on a complex torus $\mathbb{C}^g/\Lambda$, where Λ is a 2g-dimensional
lattice. The indexed Θ-functions (6.21) were originally introduced
[54] by Solomon Lefschetz as the coordinates of a Riemann surface
embedded in flat projective space PN . (This embedding is of par-
ticular importance in mathematics because it means that Riemann
surfaces are “algebraic varieties”.) Remarkably, the set of functions
defined in Eq. (6.21) form a “reproducing” kernel space (cf. [99])
of dimension n2g . The term reproducing means that they are the
eigenfunctions for the defining kernel, which in the case of (6.21) is
the closed string theory propagator used in relativistic string the-
ory [100]. These functions can also be constructed [55] by first
identifying their value at a reference point z = 0, and then using
the Weyl–Heisenberg shift operators, Eq. (6.2), to define their values
over the entire Riemann surface. It was Lefschetz's discovery of
the embedding of Riemann surfaces in projective space using these
functions that allowed quantum mechanics to emerge from algebraic
geometry. (For a detailed discussion of theta functions with charac-
teristics, see Griffiths and Harris’s Principles of Algebraic Geometry
[54] or Mumford’s more succinct Tata on Theta [56].)
Since the set of quantum states that can be represented with even a finite set of basis states
is literally infinite, it might seem that there would be an enormous
advantage to storing data features as quantum states. However, this
is probably a chimera because one must take into account that there
are strict limits in how much information can be stored in quantum
states. The key to understanding this is Helstrom’s theorem [102],
which places strict limits on the distinguishability of two quantum
states. Helstrom’s theorem plays a role in quantum Bayesian infer-
ence that is analogous to the singular role that the Neyman–Pearson
test plays in classical Bayesian approaches to data interpretation.
One of the advantages of quantum information processing is that
as a consequence of Helstrom’s theorem, one is able to immediately
attach information theoretic significance to the data features regard-
less of whether these features are Gaussian distributed variables.
One of the enigmas of quantum information processing is whether
it is possible to encode experimental data as quantum states. For
example, if one wants to know how many measurements are needed
to distinguish two Gaussian distributed variables, one only needs
to know the estimated mean and variance for the two variables in
order to determine, for example, the probability of false alarm (PFA),
i.e. the probability that the two variables were really the same even when
the measurements suggested they were different. However, in quantum mechanics the wave functions
themselves are a deterministic rather than probabilistic quantity.
Therefore, in quantum mechanics there is no automatic way of asso-
ciating information with a state in Hilbert space. Nevertheless, there
is a simple and universal way for estimating the PFA for quantum
measurements. Namely, the probability of “false alarms” is elegantly
provided by Helstrom’s theorem:
$$P_{FA} = 1 - \sqrt{1 - \eta}, \tag{6.23}$$
where $\eta = |\langle\Psi_1|\Psi_2\rangle|^2$. This estimate for the PFA is independent
of the number or type of measurements. Thus, Helstrom’s theorem
does provide a limitation on how well quantum measurements can
reproduce Bayes’s conditional probabilities. However, in practice the
statistical uncertainties associated with weak measurements typically
obscure this limitation. On the other hand, as noted in the intro-
duction, for the most part we are going to restrict our attention to
weak measurements, which allows Bayesian conditional probabilities
to appear in a completely natural way.
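In executable form Eq. (6.23) reads as follows; note as a caveat that the conventional equal-prior Helstrom error bound carries an extra factor of one half relative to this expression:

```python
import numpy as np

def pfa_helstrom(psi1, psi2):
    """False-alarm probability from Eq. (6.23): PFA = 1 - sqrt(1 - eta),
    with eta = |<psi1|psi2>|^2 the overlap of the two pure states."""
    eta = abs(np.vdot(psi1, psi2))**2
    return 1.0 - np.sqrt(1.0 - eta)

# Nearly orthogonal states are easy to tell apart; overlapping ones are not.
psi1 = np.array([1.0, 0.0])
psi2 = np.array([0.1, np.sqrt(1.0 - 0.01)])
print(pfa_helstrom(psi1, psi2))
```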
where φj is an eigenfunction of the Hill–Schrodinger operator
$-\frac{\partial^2}{\partial x^2} - \frac{\partial^2}{\partial y^2} + \frac{1}{2}(x^2 + y^2)$. (This operator first appeared in the 19th
century in connection with Hill’s theory of the stability of Lagrange
triangles, but reappeared in 1925 in connection with Schrodinger’s
equation for an “upside down” 2D quantum oscillator). It turns out
that the radial part of wave function for a 2D quantum oscillator
involves a generalized Laguerre polynomial $L_n^M$ that is closely related
to generalized Laguerre polynomial that appears in the radial wave
functions for the 2D hydrogen atom problem [14]. This brings us full
circle back to the problem that originally inspired Bayes and Gauss;
i.e. finding the orbital parameters for astronomical objects moving
under the influence of a 1/r potential. It is worth keeping this in mind
because this suggests that the model selection problem that attracted
Gauss’s interest, namely assigning multiple solar system objects to
distinctive orbits, might also be treated as a quantum problem for
multiple oscillators.
This focus on the 2D harmonic oscillator permits us an easy segue
to another very important Hilbert space related to the quantum the-
ory of angular momentum. Following the epochal 1922 discovery
by Stern and Gerlach of “spatial quantization” [131], Wigner and
Racah [113] developed a beautiful formalism for describing angular
momentum states in quantum mechanics. These states are of interest
for quantum machine learning because of a connection between the
energy eigenstates (Fock states) of a quantum oscillator and quan-
tum angular momentum states discovered by Julian Schwinger (when
he was a graduate student!). In On Angular Momentum, Schwinger
describes a very elegant way of constructing the quantum angular
momentum operators Jˆx ,Jˆy , and Jˆz as well as the Wigner–Racah
algebra [113] for representing vector sums of angular momentum in
terms of the annihilation and creation operators for a 2D quantum
harmonic oscillator. (These notes are unpublished, but a brief sum-
mary can be found in [112].) Schwinger’s construction of these states
is based on representing the quantum angular momentum operators
Quantum Tools 99
in terms of the raising and lowering operators for the number states
of a 2D quantum harmonic oscillator:
$$\mathbf{J} \equiv \sum_{\varsigma,\varsigma'=\pm} a^{+}_{\varsigma}\,\langle\varsigma|\frac{\boldsymbol{\sigma}}{2}|\varsigma'\rangle\, a_{\varsigma'},$$
$$[a_{\varsigma}, a_{\varsigma'}] = [a^{+}_{\varsigma}, a^{+}_{\varsigma'}] = 0, \qquad [a_{\varsigma}, a^{+}_{\varsigma'}] = \delta_{\varsigma\varsigma'}, \tag{6.27}$$
$$J_{+} = a^{+}_{1}a_{2}, \qquad J_{-} = a^{+}_{2}a_{1}, \qquad J_{3} = \tfrac{1}{2}\big(a^{+}_{1}a_{1} - a^{+}_{2}a_{2}\big).$$
One potential advantage of using the Fock states for a 2D oscillator to
represent quantum angular momentum states is that superconduct-
ing quantum oscillators provide an analog method for representing
these states.
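Schwinger's construction is easy to verify numerically on a truncated Fock space; the cutoff n_max below is an artifact of the truncation, and the su(2) algebra holds exactly only on states that the ladder operators do not push past the cutoff:

```python
import numpy as np

def ladder(n_max):
    """Truncated annihilation operator on Fock states |0>, ..., |n_max>."""
    return np.diag(np.sqrt(np.arange(1, n_max + 1)), k=1)

n_max = 6
a = ladder(n_max)
I = np.eye(n_max + 1)
a1, a2 = np.kron(a, I), np.kron(I, a)      # two independent oscillator modes
Jp = a1.conj().T @ a2                      # J+ = a1^+ a2
Jm = a2.conj().T @ a1                      # J- = a2^+ a1
J3 = 0.5 * (a1.conj().T @ a1 - a2.conj().T @ a2)
comm = Jp @ Jm - Jm @ Jp                   # should equal 2*J3 below the cutoff
print(np.allclose((comm - 2 * J3)[:5, :5], 0.0))
```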
Although our presentation has for the most part ignored the exten-
sive literature on qubit quantum computing, there is one develop-
ment in qubit quantum computing that mirrors our approach to
Bayesian inference: the “measurement-based quantum computing”
formalism of Raussendorf and Briegel [132]. Our approach to find-
ing the optimum strategies for Bayesian search and model selection
problems by encoding both observational data and the conditional
probabilities used in Bayesian inference as self-organized quantum
states is very similar in spirit to using measurements of entangled
states of qubits to carry out quantum computations. In Ref. [132], it
was shown that essentially all quantum computations that have been
contemplated using qubit quantum circuits can also be carried out
by making measurements of qubit states in a 2D array whose quan-
tum states have become entangled by applying a controlled phase
gate $CZ = \exp\!\big(-i\frac{\pi}{4}\sum_{\langle i,j\rangle}\sigma^z_i\sigma^z_j\big)$ between qubits on neighboring nodes.
Such controlled phase gates can be realized naturally by allowing an
Ising spin-like interaction between neighboring qubits to act for time
intervals analogous to the Rabi time for spin flip in a magnetic field.
As a simple illustration of how measurement-based quantum com-
puting works for qubits, we consider the problem of teleporting a
bipartite qubit state of the form (α1 |0 > + β1 |1 >)(α2 |0 > + β2 |1 >)
from one location to another.
Then measuring the eigenvalues of σ1x and σ2x yields the initial wave
function defined on the 1 and 2 nodes teleported to the 3 and 4 nodes.
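A single-qubit version of this teleportation step, pared down from the bipartite example above, can be simulated in a few lines; the two-node wire and the byproduct-correction convention are our simplifications:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]])
CZ = np.diag([1.0, 1.0, 1.0, -1.0])

def teleport_one_step(psi, outcome):
    """Move a one-qubit state one node down a cluster-state wire:
    attach |+>, entangle with CZ, measure qubit 1 in the X basis."""
    plus = np.array([1.0, 1.0]) / np.sqrt(2)
    state = CZ @ np.kron(psi, plus)                  # entangle with new node
    proj = np.array([1.0, 1.0 if outcome == 0 else -1.0]) / np.sqrt(2)
    out = proj[0] * state[:2] + proj[1] * state[2:]  # project qubit 1
    out /= np.linalg.norm(out)
    # Up to the known byproduct X^outcome, qubit 2 now carries H|psi>.
    return np.linalg.matrix_power(X, outcome) @ out

psi = np.array([0.6, 0.8])
print(teleport_one_step(psi, 0), H @ psi)   # the two outputs should match
```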
As an illustration of the potential usefulness of this type of scheme
for Bayesian searches, we consider the “Monty Hall” search problem
where the location of an object of interest within a linear array of
boxes is being sought [5]. The 2D quantum oscillator array we envi-
sion using for this problem is illustrated in Fig. 6.1.
In this cartoon, each node of the middle layer consists of either
a single quantum oscillator plus a qubit or a pair of quantum oscil-
lators. This layer is the “quantum computer” which we use to find
the location of the hidden object. The quantum computations can
be carried out with either N levels of the single quantum oscilla-
tor or with [N/2] + 1 levels in each oscillator of a pair of quantum
oscillators.
Fig. 6.1. Quantum scheme for solving the Monty Hall problem.
where $\sigma^z_{N+1} = 1$. Expanding the r.h.s of (6.32) after entanglement,
the search proceeds by measuring at each step of the search the
Chapter 7

Quantum Self-organization
$$\frac{\partial P}{\partial t} = -\sum_{i,j=1}^{d}\frac{\partial}{\partial x^i}\left(g^{ij} P \frac{\partial S}{\partial x^j}\right) \tag{7.3}$$
$$\frac{\partial S}{\partial t} = -\frac{1}{2}\sum_{i,j}^{d} g^{ij}\frac{\partial S}{\partial x^i}\frac{\partial S}{\partial x^j} + v(x) - \frac{\hbar^2}{8}\sum_{i,j}\left(\frac{2}{P}\frac{\partial^2 P}{\partial x^i \partial x^j} - \frac{1}{P^2}\frac{\partial P}{\partial x^i}\frac{\partial P}{\partial x^j}\right). \tag{7.4}$$
hole entropy [139]. Of course, this begs the question as to why some
interesting RL problem might be mapped onto such a model, but
our Gross–Pitaevskii model suggests that one way to understand
optimization of the Bellman function is to observe the time history
of the order parameter falling in a gravitational field as a function of
altitude:
$$i\hbar\frac{\partial \psi}{\partial t} = -\frac{\hbar^2}{2m}\nabla^2\psi + \left[U(|\psi|^2) - g(t)h(x)\right]\psi, \tag{7.10}$$
If the altitudes hi of atoms in the cloud and their velocities vi are
controlled by varying the acceleration of gravity g, then the optimal
point is the altitude h where the speed of sound vanishes, and the
phase of the order parameter has a stationary point.
The moon lander problem described in Chapter 4 illustrates what
the solution to Eq. (7.10) might look like. In this case, turning on
the rocket thrust of the moon lander can be emulated in a cloud of
bosons falling in a gravitational field by demanding that on reaching
a surface the bosons should come to rest. Another advantage of visu-
alizing the system being controlled as a quantum fluid is that fluid
control is susceptible to H∞ control [119], which provides an entrée
[84] for quantum self-organization to the theory of games.
$$\Psi = \exp\{-2\pi S/B\}, \tag{7.14}$$
where S is the area of the region between the salesman’s path con-
necting the square dots in Fig. 2.2 and the observations of the salesman's
path linking the round dots representing the cities. B = curl A
is assumed to be constant in this region. Of course, this formula only
makes sense on a Riemann surface because in the plane the sales-
man’s itinerary will in general be a self-intersecting graph. Equa-
tion (7.14) is completely consistent with our conjecture [36] that the
area S plays the same role as the Bellman function in optimal control,
i.e. the area S represents the information regarding the salesman’s
itinerary that has been lost as a result of environmental noise. In
other words, in our quantum version of the TSP the innovation noise
is just a consequence of using a path integral with the Nambu-like
action for a string [100] to describe the observations. When the devi-
ations of the quantum path from the classical path are limited such
that the path is not self-intersecting, then Bellman optimization is
$$j_{\alpha\beta} = \sigma_H\,\varepsilon_{\alpha\beta\gamma}E_\gamma, \qquad g = \pm e^2/mc\kappa,$$
The aim of the wake–sleep algorithm [9] for training the Helmholtz
machine is to produce joint probability distributions for the ensem-
ble of Ising spins (states = {+1, −1}) which minimizes the informa-
tion costs of representing a set of observations. One of the key ideas
behind the Helmholtz machine of Dayan et al. [8] was to follow in the
footsteps of the Boltzmann machine [1] by regarding the two arrays
of Ising spins as physical systems. From this perspective, minimiza-
tion of the information cost of the descriptions of the states of the
recognition and data generation is equivalent to minimizing the free
energy of this physical system of interacting Ising spins. This in turn
implies [8,9] that the conditional probability for a given model, i.e.
the l.h.s of Eq. (1.1), will be given by Bayes’s formula. We want to
extend this scenario by replacing the Ising spins with the Riemann
surface degrees of freedom introduced in Chapter 5.
One possibility [52] for going in this direction would be to replace
the array of Ising spins with the Ashkin–Teller (AT) statistical
model [144] for a 2D array of spins with two Ising spins per lat-
tice site. It was discovered by Kadanoff and Brown [145] that if
the 4-spin couplings are carefully chosen, then the energy functional
for the model has a Gaussian form similar to the energy
cost functions that naturally appear in Kohonen self-organization
[13]. When the spins at each lattice site are allowed to interact,
these AT models share with self-organizing networks [108] the cru-
cial property that for critical values of the spin couplings the
AT model will possess string-like excitations where the informa-
tion regarding the state of the observer/controller or environment
can be represented by the shape of a possibly topologically non-
trivial 2D surface. The emergence here of “self-organization” means
that the original AT spin degrees of freedom are effectively replaced
with Θ-functions representing the shape of a Riemann surface. This
has the pleasant consequence that both the recognition and gen-
erative networks of the Helmholtz machine can then be described
as path integral representations of the Lax equation for the KdV
equation.
The story line here is that we want to use the double path
integral formulation of Feynman and Vernon [47] to represent the
string degrees of freedom in the two Helmholtz machine arrays; one
representing the forward evolution of an observer/controller which
includes an estimation for the innovation, and the other to represent
the backward evolution of the system or environment. Formally, this
amounts to replacing Feynman’s original path integral with a double
path integral of the form (see Appendix E):
∫ e^{−iS[x(t)]} Dx(t) → ∫∫ e^{−i(S[x(t)] − S[x′(t)])} F[x(t), x′(t)] Dx(t) Dx′(t),   (E.3)
where the ω_i's are the frequencies of the oscillators in the 2nd oscillator
array making up the “environment”. For a single harmonic oscillator
and an environment consisting of oscillators with frequencies
ω_i, where Δ_i = ω_i is the level spacing, the exponential factor in
Eq. (7.11) becomes
exp[ − Σ_{ij} (g_i² M ω₀ / 2ℏΔ_i) ∫_{t_j}^{t_{j+1}} dt ∫_{t_j}^{t} ds ( x(s) e^{−iΔ_i(t−s)} − x′(s) e^{iΔ_i(t−s)} )( x(t) − x′(t) ) ]
ΔS_{12} = ∫ dσ₁ dσ₂ [det(g¹_{ab})]^{1/2} [det(g²_{ab})]^{1/2} G(ẋ² − ẏ²),   (7.23)
where g_ab is the metric for a Riemann surface and ẋ² − ẏ² is the
Lorentz invariant distance between a position on a Riemann surface
representing the observer/controller and a position on a Riemann
surface representing the environment. This is consistent with the
Chern–Simons interaction that appeared in our treatment of the
TSP. However, the path of the salesman is fixed, so there is no Rie-
mann surface associated with the salesman’s trajectory. On the other
hand, it will turn out to be of considerable interest in the case of
the Helmholtz machine to consider what happens when the degrees
of freedom of either array are frozen in time. In fact, this takes us
back to the Segal–Wilson and T–O descriptions of exact solutions of
the KdV equation in terms of holomorphic functions on a Riemann
surface.
Our elucidation of this follows some ideas of Chu [146]. Carrying
out the integration over the string degrees of freedom for the envi-
ronment array in the double path integral in (E.3) leads to an action
function for each string in the observer/controller array of strings of
the form
S = ∫₀^{τ_f} dτ ∫₀^{β} dy exp[ c_s² ( (dx_i/dτ)² − (dx_i/dy)² ) + Σ_j q_j(x_i[y, τ]) ],   (7.24)
On the other hand, the equilibrium state of either player can also
be described as the state where the information that the team of
agents representing a player has gathered regarding the state of the
adversarial player is maximized. In the equilibrium state, the optimal
strategies {πi∗ } for the 2N agents representing the two players satisfy
the Nash condition [105]:

e(π₁*, . . . , π_i*, . . . , π_{2N}*) ≥ e(π₁*, . . . , π_i, . . . , π_{2N}*),   (7.26)

where e({π_i}) is the game payoff for two teams of N agents represent-
ing the players in a two-player stochastic game. The {πi } are a set
of mixed strategies for each agent, while the ith strategy on the r.h.s
of the inequality (7.26) is any strategy other than the optimal strat-
egy. The strategies with an asterisk are the optimal strategies that
define the Nash equilibrium state. The Nash equilibrium condition
in Eq. (7.26) is formally the same as optimization of the Bellman–
Issacs function [53], which in this case sums up the payoffs for all the
agents. Kohonen's self-organizing learning rule takes the form

Δw(r_i) = η Λ(|r_i − r_i*|)(ξ_j − w(r_i)),   (7.27)
where r_i* is the position of the neuron whose output is initially closest
to an input feature ξ_j, while the function Λ(r) defines a “receptive
field” for each detector: the union of all input signals from a
particular field of view that produce a response in the detector at
position r_i that is closer to the current state of that detector than
to the state of any other detector (cf. the receptive fields for the
TSP defined by the dashed lines in Fig. 2.1). Λ(|r_i − r_i*|) is typically a
Gaussian function that allows the feature detectors to adjust their
outputs so that not only the detector located at r_i*, but also nearby
detectors, respond to the signal ξ_j. The self-organizing algorithm (7.27)
adjusts the response of the sensor at position r_i to a particular
environmental stimulus ξ_μ in its receptive field to be at least as
strong as that of any of its nearest neighbors.
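A minimal sketch of a Kohonen update of the kind referenced as Eq. (7.27) makes the receptive-field picture concrete. The grid size, learning rate η, Gaussian neighborhood width, and uniform stimuli below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
# A 10 x 10 lattice of detectors r_i, each with an output vector w(r_i).
grid = np.stack(np.meshgrid(np.arange(10), np.arange(10)), -1).reshape(-1, 2)
w = rng.random((100, 2))

def som_step(xi, eta=0.1, width=2.0):
    """One update of the rule (7.27): the winner r_i* and its neighbors
    (weighted by a Gaussian Lambda) move toward the stimulus xi."""
    winner = np.argmin(np.sum((w - xi)**2, axis=1))            # r_i*
    lam = np.exp(-np.sum((grid - grid[winner])**2, axis=1)/(2*width**2))
    w[:] += eta*lam[:, None]*(xi - w)

for _ in range(5000):
    som_step(rng.random(2))        # stimuli xi drawn uniformly from the square
```

After training, each node's output vector sits at the center of its receptive field, and neighboring nodes respond to neighboring regions of the stimulus space.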
Of great importance for us is that Kohonen’s self-organization
maps also give rise to holomorphic functions that can serve
mammalian cognition in much the same way that the holomorphic
functions introduced in Chapters 5 and 6 can be melded together to
provide an analytic model for the reward function for optimal control.
In this regard, Ritter and Schulten have shown [13] that under the
influence of random variables ξμ, the model outputs {w(ri )} evolve
to a state which minimizes a stochastic energy functional
E[w(r_j)] = (1/2) Σ_{⟨r,s⟩} Σ_{ξ_μ∈R} P(ξ_μ) |ξ_μ − w(r_j)|².   (7.28)
The reason that a sum over neighboring nodes appears on the r.h.s
of Eq. (7.28) is that at each time step the change in position of a
node affects its neighbors, which allows the entire ensemble to relax
to a statistical configuration described by a partition function
Z = ∫_L ∏_{j=1}^{F} dw(r_j) exp( −(κ/2) Σ_{i,j} |w(r_i) − w(r_j)|² ),   (7.29)
picture of the object is created within the sensor array itself. There
is already evidence from MEG recordings [150] that different audio
patterns are recorded in different areas of the cerebral cortex. The
ability to provide a holistic understanding of different features of an
environment is probably one of the reasons for the evolutionary
success of mammals.
Chapter 8
Holistic Computing
In our scheme the single quantum wire in Ref. [4] used to tele-
port wave functions of a continuous variable q is replaced by a
two-dimensional graph for teleporting entangled states of three angu-
lar momenta. Quantum states representing triangles are created by
entangling the angular momentum states attached to three neigh-
boring nodes in a column of the quantum wire — as illustrated in
Fig. 6.1 — so that the sum of the three angular momenta is zero; i.e.
the angular momentum vectors form a perfect triangle in the classical
limit. In a basis where the angular momentum component J z along
a fixed axis is well defined, the state representing a triangle is a
sum of products of the three states |j_i m_i>, where the coefficients
are Wigner's 3j symbols [114]. In this chapter, we will focus on the
where F is the operator that switches the J_i^z basis to the φ_i basis,
X(<J_i^z>) is a shift operator that depends on the result of a φ_i
measurement, and R_z(θ) is a rotation operator about a common z-axis for
the input and target column, which can be applied either before or
after the CZ gate. The CZ gate in Eq. (6.28) allows one to entangle
an input state representing a triangle of angular momentum states
with control states |ΣJ^z = 0> on neighboring columns; then, as
a result of measurements of φ_i for the input nodes, the wave function
for the nodes in the next target layer {i_t} becomes X(φ_i0)Ψ_in(q_i1).
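As a small illustration of the kind of gate being described, the sketch below builds CZ = exp(iJ_z ⊗ J_z) for a pair of spin-1 nodes, along with a discrete Fourier operator F standing in for the switch between the J_z basis and the phase basis. The choice of spin-1, the single-node stand-in for a control state, and the uniform input state are assumptions made for illustration:

```python
import numpy as np
from scipy.linalg import expm

# J_z for a spin-1 node in the |j=1, m> basis, ordered m = +1, 0, -1.
Jz = np.diag([1.0, 0.0, -1.0])

# Controlled-phase gate coupling a target node to a control node,
# mirroring the text's CZ = exp(i Jz^{i_t} (x) Jz^{i_0}).
CZ = expm(1j*np.kron(Jz, Jz))

# F: a discrete Fourier transform as an assumed stand-in for the operator
# switching the Jz basis to the phase (phi) basis.
d = 3
F = np.exp(2j*np.pi*np.outer(np.arange(d), np.arange(d))/d)/np.sqrt(d)

psi_in = np.ones(d)/np.sqrt(d)           # uniform input state (illustrative)
ctrl = np.array([0.0, 1.0, 0.0])         # m = 0 single-node control (illustrative)
psi = CZ @ np.kron(psi_in, ctrl)         # entangle input with the control node
phi_meas = np.kron(F, np.eye(d)) @ psi   # rotate the input node to the phi basis
```

Because CZ is diagonal in the J_z ⊗ J_z basis, it only attaches phases, which is what allows the subsequent φ_i measurements to steer the state of the target column.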
As an illustration of how quantum angular momentum states
might be used for the teleportation of geometric objects, let us con-
sider the teleportation of angular momentum states representing a
triangle within a quantum circuit consisting of an array of two-
dimensional oscillators. We envision that this array consists of 2D
oscillators which are localized at points in three dimensions and
connected in such a fashion that each node in the lattice can be used
to define an angular momentum state in a basis where the J_i^z operators
for all the nodes within each column have definite values with respect
to a common z-axis. The wave function for the input layer corre-
sponding to three nodes within the first column of the lattice has the
form given in Eq. (8.3), where i = 1, 2, 3. The lines connecting the
nodes correspond to controlled phase gates CZ = exp(iJ_z^{i_t} ⊗ J_z^{i_0}) acting
We end our presentation with some musings about the deeper signif-
icance of the connection between quantum mechanics and Bayesian
inference. It is easy to get the impression from the chaotic literature
devoted to data science that there is nothing particularly mathe-
matically profound about machine learning algorithms. On the other
hand, one of our aims with this book is to frame the question of
the mathematical significance of Bayesian inference in terms of its
relationship to quantum mechanics. John von Neumann apparently
thought that there was something mathematically profound about
quantum mechanics. However, von Neumann did not clearly artic-
ulate what this means in his publications. Although our presen-
tation has been focused on the relationship of the Bayes formula
and quantum mechanics, we believe that our results may also shed
light on this question.
A. Gaussian Processes
P(z(x)|l_N, X_N) = P(l_N|z(x), X_N) P(z(x)) / P(l_N|X_N),   (A.1)
When the training labels l(n) are real numbers, the problem of find-
ing the function y(x) is usually referred to as a regression problem. If
l(n) = 0 or 1, the inference problem is usually referred to as a search
problem. Given a training dataset DN = {XN , ZN }, the regression
for a training set X_N of input data {x^(k)} with say N examples (i.e.
k = 1, . . . , N), and then using this interpolation model to attach similar
labels to new examples of data. MacKay's method assumes that
each measurement y(x^(k)) of l^(k) differs from the model z(x^(k)) by
a random error:

y(x^(k)) = z(x^(k)) + v,   (A.4)
P(z*|D, x*) = ∫ dz P(z*|z, x, x*) P(z|D) = ∫ dz P(z*, z|x, x*) P(t|z)/P(t),   (A.7)
where the “observation noise” P (t|z) as well as the prior P (t) are
assumed to be normally distributed about z. If the variables xi , zi ,
and ti are all assumed to be independent Gaussian distributed vari-
ables, the integral in Eq. (A.7) can be carried out analytically. The
result is that z*(x) is a Gaussian random variable with mean

z̄*(x) = K(x*, D)(K(D, D) + σ²I)^{−1} t,   (A.8)

and variance

σ²(x*) = K(x*, x*) − K(x*, D)(K(D, D) + σ²I)^{−1} K(D, x*).   (A.9)
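The mean and variance formulas (A.8) and (A.9) translate directly into a few lines of linear algebra. Below is a sketch using a squared-exponential covariance and toy one-dimensional data; both choices are illustrative assumptions rather than prescriptions from the text:

```python
import numpy as np

def rbf(X1, X2, ell=1.0):
    """Squared-exponential covariance K (a common, assumed choice)."""
    return np.exp(-0.5*np.subtract.outer(X1, X2)**2/ell**2)

# Toy training data D = (X, t) with observation noise sigma.
X = np.array([-2.0, -1.0, 0.0, 1.5, 2.5])
t = np.sin(X)
sigma = 0.1
Xs = np.linspace(-3, 3, 200)                   # test inputs x*

Kdd = rbf(X, X) + sigma**2*np.eye(len(X))      # K(D,D) + sigma^2 I
Ksd = rbf(Xs, X)                               # K(x*, D)
mean = Ksd @ np.linalg.solve(Kdd, t)           # Eq. (A.8)
var = rbf(Xs, Xs).diagonal() - np.einsum(
    'ij,ji->i', Ksd, np.linalg.solve(Kdd, Ksd.T))   # Eq. (A.9)
```

The single Cholesky-sized solve against K(D, D) + σ²I is shared between the mean and the variance, which is what makes Gaussian process regression practical for modest dataset sizes.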
B. Wiener–Hopf Methods
with noise, Wiener’s original derivation of his filter [81] involved solv-
ing an integral equation of the same form as the integral equation
for wave scattering discovered by Wiener and Hopf. Not only was
Wiener’s discovery applied with good effect during the war, but in
the years following WWII Kalman modified Wiener’s signal filter in
such a way as to address the problem of optimal control [83]. It turns
out [41] that the mathematical structure underlying both the Wiener
and Kalman filters involves functions that are rational functions (i.e.
ratios of polynomials) of the wave frequency regarded as a complex
number. Perhaps the most momentous aspect of the effort to find an
exact solution of the KdV equation is that a certain rational function
of the eigenvalue of the 1D Schrodinger operator regarded as a com-
plex variable plays much the same role as the rational functions in
the Wiener and Kalman filters. Thus, the parts of machine learning
that flow from the work of Wiener and Kalman seem to involve in
essence the construction of a certain rational function of a complex
variable. It was left to a group of Russian mathematicians [119] as
well as Segal [73] to point out [94] the connection of this effort with
algebraic geometry and that the setting for this rational function is
a Riemann surface rather than the usual complex plane.
A potentially very profound advantage of using quantum ampli-
tudes rather than real probabilities to solve pattern recognition and
decision tree problems arises from the observation that, in contrast
with classical probability densities, the quantum amplitudes used
to describe the state and evolution of a quantum system are always
complex valued quantities that are typically analytic functions of the
continuous variables describing the system. This allows one to take
advantage of powerful methods for representing analytic functions
in terms of their singularities in the complex plane. In particular, in
1931 Wiener and Hopf [61] made the remarkable observation that cer-
tain kinds of integral equations that arise in scattering problems can
be solved by regarding the scattering amplitudes as analytic func-
tions of the parameters of the problem. For example, it was shown in
the 1950s that certain interesting problems involving the scattering
of electromagnetic waves that one might guess are intractable, e.g.
the scattering of an electromagnetic wave from a flat conductor with
a knife edge, could be easily solved by extension of the physical solu-
tion to a solution where the frequency is a complex variable. In the
f(k) = e^{iδ} sin δ / k.   (B.5)
The time reversal symmetry of the Schrodinger equation implies that
f*(k) = f(−k*)   and   f*(k²) = f(k²*).   (B.6)
Given the input data R(t+x) the acausal covariance function K(x, y)
can be determined by solving the linear integral Eq. (2.15). This is the
covariance function that is used in least squares stochastic estima-
tion. The potential that appears in the wave equation (B.9) is given
by
u(x) = 2 (d/dx) K(x, x).   (B.11)
The GL equations pertain to the scattering solutions of the time-
independent form of Eq. (B.9):
∂²ψ/∂x² − u(x)ψ(t, x) = k²ψ(t, x),   (B.12)
where u(x) is assumed to be everywhere positive (u(x) > 0) with
compact support centered on the origin. The solutions to Eq. (B.12)
that are of interest in connection with the inverse scattering problem
are solutions which for x → −∞ have the form
A similar approach works for the KdV equation using the real
line as the contour [71]. G(z, x) contains the scattering data for the
Baker function, and the matrix linking the two holomorphic Hilbert
spaces is
G(z, x) = ( 1 − |r|²              r e^{−2i(z³t + xz)}
            r e^{2i(z³t + xz)}    1 ).   (B.26)
We now come to the punch line; this matrix contains all the infor-
mation necessary to construct the Bellman and reward functions for
the Kalman filter:
Φ(z) = (1/2πi) ∫ (Φ⁺(s) − Φ⁻(s))/(s − z) ds,   (B.27)
where

Φ⁺(s) = Φ⁻(s)G(s)   and   τ_k(Y) = Y(λ)M_k.   (B.28)
The existence of the solution u+ (x, t) allows one to write the solution
to (B.7) when the incident wave u0 (x, t) has any shape with a sharp
wave front, i.e. u0 (x, t) = 0 when x − t > x0 , in the Green’s function
form:
u(t, x, x₀) = ∫_{−∞}^{∞} u₊(t − τ, x) u₀(τ, 0, x₀) dτ.   (B.31)
The Rose optimization principle [34] is that the “best” choice for v(x)
is the one that leads to a scattering state that is entirely focused on
a particular location x∗ at a chosen time t∗ in the future.
B = ∫_{−∞}^{∞} [u(t, x; x₀) − δ(t − x + x₀)]² dx = 0.   (B.33)
where it should be noted that the second integral only goes over times
prior to t. They showed that problems of this type can be solved
by first calculating the wave response G(x, x0 , ω) when a periodic
impulse is applied at a particular point x0 on the boundary of the
medium carrying the wave:
ψ(x, x₀|t) = ∫_{−∞}^{∞} G(x, x₀|ω) F(ω) e^{−iωt} dω,   (B.35)
where
F(ω) = ∫_{−∞}^{∞} f(t) e^{iωt} dt.   (B.36)
transform:
F(s) = ∫₀^{∞} f(t) e^{−st} dt.   (B.37)
The wave motion with a flexible boundary, taking into account
the distributed impulse resulting from the entire flexible boundary,
can now be found by constructing the “filter” function g(x, x₀, t) for the
coupled wave medium/flexible boundary whose Laplace transform is
the Green's function G(x, x₀, ω). Given this filter function, the complete
motion of the string due to the imposition of a distributed force
f(x₀|t) along the length of the string can be obtained. We can do no
better in summarizing the Morse–Feshbach proposal for how to solve
this type of problem than to simply quote their synopsis in Methods of
Theoretical Physics [116]:
“First compute the Green’s function G(x, x0 |ω) for the steady
state response of the system to a force of unit amplitude and fre-
quency ω applied to point (x0 , y0, z0 ) within or on the boundary, by
solving an inhomogeneous Helmholtz equation with an inhomoge-
neous boundary condition. Find the impulse function g(x, x0 , t) for
which G is the Laplace transform, either by contour integration of
(Eq. (B.35)) or inverting (Eq. (B.37)). The response to f (x, t) is then
given by (Eq. (B.35))”.
One beautiful feature of the Morse–Feshbach prescription for dealing
with rubber potentials is that it not only clarifies the role of
causality, but also immediately shows why the nonrelativistic
Schrodinger equation is relevant for understanding wave propagation
with rubber potentials.
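The three-step recipe quoted above is easy to exercise numerically. In the sketch below, a damped harmonic oscillator stands in for the wave medium: its steady-state response G(ω) plays the role of the Green's function, its inverse transform gives the impulse function, and the response to a force f(t) follows from the Fourier-domain product, in the spirit of Eq. (B.35). The oscillator parameters and driving force are assumptions made for illustration:

```python
import numpy as np

# Step 1: steady-state response G(omega) of a toy medium, here a damped
# harmonic oscillator with natural frequency w0 and damping gamma (assumed).
w0, gamma = 2.0, 0.2
N, dt = 4096, 0.01
w = 2*np.pi*np.fft.fftfreq(N, d=dt)
G = 1.0/(w0**2 - w**2 - 1j*gamma*w)

# Step 2: the impulse function g(t) for which G is the transform
# (up to the discrete-FFT normalization convention).
g = np.fft.ifft(G).real/dt

# Step 3: response to an applied force f(t) via the transform product,
# i.e. the discrete analogue of Eq. (B.35).
tgrid = np.arange(N)*dt
f = np.exp(-(tgrid - 5.0)**2)
response = np.fft.ifft(np.fft.fft(f)*G).real
```

The causal, ringing response appears only after the force arrives, which is the point of the prescription.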
C. Riemann Surfaces
What gives Riemann surfaces their punch are the Θ-functions [53–56],
which define an n^g-dimensional Hilbert space of independent
meromorphic functions:

Θ(A|T_{ij}) ≡ Σ_{n∈Z^g} exp( iπ Σ_{ij} n_i T_{ij} n_j + 2πi Σ_j A_j n_j ),   (C.2)
where A_j ≡ ∫_{P₀}^{P} dω_j and T_ij ≡ ∮_{B_i} dω_j are the “periods”
for the Riemann surface obtained by integrating one of the g algebraically
independent rational differentials on the Riemann surface
around one of its “B” cycles. These functions are not L-periodic, but
L-automorphic:
Θ( A_i + Σ_j T_{ij} m_j | T_{ij} ) = exp( −iπ Σ_{ij} m_i T_{ij} m_j − 2πi Σ_i m_i A_i ) Θ(A|T_{ij}).   (C.3)
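Because the series in Eq. (C.2) converges rapidly when the period matrix T has positive-definite imaginary part, a truncated sum already gives a usable numerical Θ-function. The genus-2 period matrix and argument below are illustrative assumptions:

```python
import numpy as np
from itertools import product

def theta(A, T, N=10):
    """Truncated Riemann theta sum of Eq. (C.2): sum over n in Z^g with
    |n_i| <= N. T must have positive-definite imaginary part so that
    the series converges."""
    g = len(A)
    total = 0.0 + 0.0j
    for n in product(range(-N, N + 1), repeat=g):
        n = np.array(n)
        total += np.exp(1j*np.pi*(n @ T @ n) + 2j*np.pi*(A @ n))
    return total

# Toy genus-2 example with a purely imaginary period matrix (assumed).
T = 1j*np.array([[2.0, 0.5], [0.5, 2.0]])
A = np.array([0.1, 0.2])
print(theta(A, T))
```

Shifting A by a column of T and comparing with the factor in Eq. (C.3) provides a quick numerical check of the automorphy property.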
One thing that is remarkable about Θ-functions is that they define
an embedding of a Riemann surface in a projective space.
As the integer characteristics [ε, ε′] run over the coset labels (0, 1)
for Z^g/2Z^g, the “theta-null-werte”, i.e. the values of θ(z|T) at
z = 0, define the generators for a 2^{2g}-dimensional representation
of the Heisenberg group! In this representation, the cosets Z^g/2Z^g
play the role of t in (6.2). The action of the translation part of the
If g ∈ G(V, V ∗ ), then
where the brackets refer to the ground state expectation value for
an array of quantum oscillators. This τ -function is the glue that ties
together the long-term behavior of an integrable system with local
behavior represented by the Hamiltonian H. Of course, in practice
the sums over l and n in the expression for H(t) would have to be
truncated, not to mention the difficulties of representing
fermions in a practical computational setting. Nevertheless, we have
a setup which in principle would allow the τ and Baker functions for
the KdV equation to be exactly evaluated using an array of quantum
oscillators. Whether this is of any practical value remains to be seen.
However, the “I-spin”, “U-spin”, and “V-spin” lines in the SU(3)
root diagram used in the Eightfold Way make a ubiquitous appearance
in the RH approach to solving nonlinear PDEs. The reason for
this is that these axes play an important role [69] in defining the
boundary separating the domains of the holomorphic functions which
are used in the Riemann–Hilbert approach to finding analytic solutions
for integrable nonlinear PDEs.
where S[x(t)] is the classical action for the system and A(t, t′) is the
autocorrelation function for the noise. If instead of a classical noise
signal the quantum system is coupled to a quantum environment,
then the real exponential in the formula for J is replaced by the
complex valued influence functional F[x(t), x′(t)] and A(t, t′) is
replaced by a complex function α(t, t′). The exponential factor in
the density matrix propagator J can also be thought of as an overlap
integral for final and initial states for the forward and backward 2nd
oscillator array as a functional of x(t) and x′(t):

F[x(t), x′(t)] = ∫ ψ_Y(y_b) ψ*_Y(y_b) dY_b,
where α(t, t′) is a complex function that plays much the same role
as the real autocorrelation function for the signal. In the case where
the environment consists of pure classical noise, the usefulness of the
(E.5)
where α(t, t′) is a complex function that will play much the same
role as the real autocorrelation function A(t, t′) for a real-valued
random time signal. When the environment consists of another harmonic
oscillator with level spacing Δ, the influence function is
F(x, y) ≅ exp[ −(g²mω₀²/2Δ²) ∫_{t₀}^{t₀+T} dt ∫_{t₀}^{t} dt′ ( e^{−iΔ(t−t′)} x(t′) − e^{iΔ(t−t′)} y(t′) )( x(t) − y(t) )
   − (C² ln(Δ_max/Δ_min)/2Δ_max) Σ_{t_i} ( Q(t) − Q′(t) ) ∫_{t_i−τ}^{t_i} ( Q(s) − Q′(s) ) ds ] ρ.   (E.6)
ti −τ
F[q(t), q′(t)] = exp( − ∫₀^T dt ∫₀^t dt′ [α(t, t′) q(t′) − α*(t, t′) q′(t′)][q(t) − q′(t)] ),   (E.7)
where α(t, t′) is a complex function that plays much the same role
in quantum mechanics as the autocorrelation function for Gaussian
processes. The exact relation of α(t, t′) to classical noise can be
understood by looking at matrix elements of the quadratic functional of
q(t) and q′(t) in the Hilbert space spanned by energy eigenstates of
an array of quantum oscillators. For example,
∫∫ e^{S[q(t)]−S[q′(t)]} ( ∫₀^T dt ∫₀^t dt′ α(t, t′) q(t′)[q(t) − q′(t)] ) Dq(t) Dq′(t)
   = − ∫₀^T dt ∫₀^t dt′ α(t, t′) <m|q(t)|n><m|q(t′)|n>,   (E.8)
α(t, t′) = − Σ_i (g_i²/2ω_i) e^{−iω_i(t−t′)},   (E.9)
where the ω_i's are the frequencies of the oscillators in the 2nd oscillator
array making up the “environment”. For a single harmonic oscillator
and an environment consisting of oscillators with frequencies
ω_i, where Δ_i = ω_i is the level spacing, by analogy with the classical
Wiener filter one might assume that α(t, t′) has a piece that represents
the signal and a piece that represents the noise. As a reminder,
the Wiener filter HF (t, s), described in Chapter 2, is obtained as
a ratio of Laplace transforms of the signal correlation K(s, t) and
R(s, t), the sum of the signal and noise correlation functions. If we
R(t, t′) = − Σ_i (g_i²/2ω_i) e^{−iω_i|t−t′|} + A(t, t′),   (E.10)
H_F(s) = R₊(s) / (1 + R₊(s)).   (E.14)
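For comparison with the ratio in Eq. (E.14), here is a sketch of the classical frequency-domain Wiener filter, built as the ratio of the signal spectrum to the signal-plus-noise spectrum. The Lorentzian signal spectrum, white noise floor, and test signal are illustrative assumptions:

```python
import numpy as np

N, dt = 2048, 0.01
w = 2*np.pi*np.fft.fftfreq(N, d=dt)

S_signal = 1.0/(1.0 + w**2)                # assumed Lorentzian signal spectrum
S_noise = 0.1*np.ones_like(w)              # assumed white noise floor
H = S_signal/(S_signal + S_noise)          # Wiener transfer function

rng = np.random.default_rng(2)
t = np.arange(N)*dt
clean = np.sin(1.5*t)*np.exp(-0.1*t)
noisy = clean + 0.3*rng.standard_normal(N)
estimate = np.fft.ifft(H*np.fft.fft(noisy)).real   # filtered estimate
```

The filter passes frequencies where the signal dominates and suppresses those where the noise floor does, which is the content of the signal/(signal + noise) ratio.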
F(t) = −(g²/2) Σ_j ∫_{t₀}^{t} sin(t − t′) σ_j^z(t′) [a⁺(t′) − a(t′)] dt′
Index

A
adaptive optics, 8, 13, 23, 144, 152–153
Ashkin–Teller model, 114

B
Bargmann–Segal transform, 90
Bayes's formula, 3–5, 11, 16–17, 25–27, 42, 46, 50–51, 53–54, 107, 114, 136, 137, 139
Bayes, Thomas, 15, 25
Bayesian learning, 11, 15, 19, 29, 41, 57, 71, 75, 80, 90, 92, 159
Bayesian searches, 17, 41–43, 45, 100–101
Bellman cost function, 7, 17, 44, 55
Bellman–Issacs function, 12, 119, 121
Boltzmann distribution, 5

D
dynamic programming, 6, 11, 13, 16, 32–33, 43–46, 55
Dyson, Freeman, 13

E
eightfold way, 69

F
Θ-functions, 12, 18–19, 73–76, 79, 95
Feynman path integral, 8, 18, 36, 57
Feynman, Richard, 34
Feynman–Vernon influence function, 20

G
Galois theory, 18, 86–87
Gross–Pitaevski equation, 105, 107–108

H
H∞ control, 63–65, 120, 122
Hamilton–Jacobi–Bellman equation, 55, 105
Hardy spaces, 15, 18–20, 64, 120, 124, 127
Helmholtz machine, 5–6, 12, 17–18, 20–22, 49–53, 84, 114–115, 117–119, 122
Helstrom's theorem, 20, 106
Hilbert spaces, 15, 19, 28, 72, 76, 89–90, 93–98, 127
Hilbert, David, 15, 18
Hinton, Geoffrey, 5–6