
Quantum Information and Foundations
Edited by
Giacomo Mauro D’Ariano and Paolo Perinotti
Printed Edition of the Special Issue Published in Entropy

www.mdpi.com/journal/entropy

Special Issue Editors

Giacomo Mauro D’Ariano
Paolo Perinotti

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin
Special Issue Editors
Giacomo Mauro D’Ariano
QUit Group, Department of Physics, University of Pavia
Italy

Paolo Perinotti
QUit Group, Department of Physics, University of Pavia
Italy

Editorial Office
MDPI
St. Alban-Anlage 66
4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal Entropy
(ISSN 1099-4300) (available at: https://www.mdpi.com/journal/entropy/special_issues/quantum_information_foundations).

For citation purposes, cite each article independently as indicated on the article page online and as
indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. Journal Name Year, Article Number,
Page Range.

ISBN 978-3-03928-380-4 (Pbk)

ISBN 978-3-03928-381-1 (PDF)

© 2020 by the authors. Articles in this book are Open Access and distributed under the Creative
Commons Attribution (CC BY) license, which allows users to download, copy and build upon
published articles, as long as the author and publisher are properly credited, which ensures maximum
dissemination and a wider impact of our publications.
The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons
license CC BY-NC-ND.
Contents

About the Special Issue Editors

Giacomo Mauro D’Ariano and Paolo Perinotti
Quantum Information and Foundations
Reprinted from: Entropy 2020, 22, 22, doi:10.3390/e22010022

Alessandro Bisio, Giacomo Mauro D’Ariano, Nicola Mosco, Paolo Perinotti and Alessandro Tosini
Solutions of a Two-Particle Interacting Quantum Walk
Reprinted from: Entropy 2018, 20, 435, doi:10.3390/e20060435

Marc-Olivier Renou, Nicolas Gisin and Florian Fröwis
Robust Macroscopic Quantum Measurements in the Presence of Limited Control and Knowledge
Reprinted from: Entropy 2018, 20, 39, doi:10.3390/e20010039

Louis H. Kauffman
Iterant Algebra
Reprinted from: Entropy 2017, 19, 347, doi:10.3390/e19070347

Časlav Brukner
A No-Go Theorem for Observer-Independent Facts
Reprinted from: Entropy 2018, 20, 350, doi:10.3390/e20050350

Alexander Wilce
A Royal Road to Quantum Theory (or Thereabouts)
Reprinted from: Entropy 2018, 20, 227, doi:10.3390/e20040227

Giulio Chiribella
Agents, Subsystems, and the Conservation of Information
Reprinted from: Entropy 2018, 20, 358, doi:10.3390/e20050358

Howard Barnum, Ciarán M. Lee, Carlo Maria Scandolo and John Selby
Ruling out Higher-Order Interference from Purity Principles
Reprinted from: Entropy 2017, 19, 253, doi:10.3390/e19060253

John Selby and Bob Coecke
Leaks: Quantum, Classical, Intermediate and More
Reprinted from: Entropy 2017, 19, 174, doi:10.3390/e19040174

Alberto Barchielli, Matteo Gregoratti and Alessandro Toigo
Measurement Uncertainty Relations for Position and Momentum: Relative Entropy Formulation
Reprinted from: Entropy 2017, 19, 301, doi:10.3390/e19070301

Giovanni Amelino-Camelia
Planck-Scale Soccer-Ball Problem: A Case of Mistaken Identity
Reprinted from: Entropy 2017, 19, 400, doi:10.3390/e19080400

Mario Arnolfo Ciampini, Paolo Mataloni and Mauro Paternostro
Structure of Multipartite Entanglement in Random Cluster-Like Photonic Systems
Reprinted from: Entropy 2017, 19, 473, doi:10.3390/e19090473

Ämin Baumeler and Stefan Wolf
Non-Causal Computation
Reprinted from: Entropy 2017, 19, 326, doi:10.3390/e19070326

Chris Heunen
The Many Classical Faces of Quantum Structures
Reprinted from: Entropy 2017, 19, 144, doi:10.3390/e19040144

Philipp Andres Höhn
Quantum Theory from Rules on Information Acquisition
Reprinted from: Entropy 2017, 19, 98, doi:10.3390/e19030098

Catalina Curceanu, Hexi Shi, Sergio Bartalucci, Sergio Bertolucci, Massimiliano Bazzi,
Carolina Berucci, Mario Bragadireanu, Michael Cargnelli, Alberto Clozza, Luca De Paolis,
Sergio Di Matteo, Jean-Pierre Egger, Carlo Guaraldo, Mihail Iliescu, Johann Marton,
Matthias Laubenstein, Edoardo Milotti, Marco Miliucci, Andreas Pichler, Dorel Pietreanu,
Kristian Piscicchia, Alessandro Scordo, Diana Laura Sirghi, Florin Sirghi, Laura Sperandio,
Oton Vazquez Doce, Eberhard Widmann and Johann Zmeskal
Test of the Pauli Exclusion Principle in the VIP-2 Underground Experiment
Reprinted from: Entropy 2017, 19, 300, doi:10.3390/e19070300

Kristian Piscicchia, Angelo Bassi, Catalina Curceanu, Raffaele Del Grande, Sandro Donadi,
Beatrix C. Hiesmayr and Andreas Pichler
CSL Collapse Model Mapped with the Spontaneous Radiation
Reprinted from: Entropy 2017, 19, 319, doi:10.3390/e19070319

Robert B. Griffiths
Quantum Information: What Is It All About?
Reprinted from: Entropy 2017, 19, 645, doi:10.3390/e19120645

Benjamin F. Dribus
Entropic Phase Maps in Discrete Quantum Gravity
Reprinted from: Entropy 2017, 19, 322, doi:10.3390/e19070322

Yangyang Wang, Xiaofei Qi and Jinchuan Hou
Nonclassicality by Local Gaussian Unitary Operations for Gaussian States
Reprinted from: Entropy 2018, 20, 266, doi:10.3390/e20040266

Kevin Vanslette
Entropic Updating of Probabilities and Density Matrices
Reprinted from: Entropy 2017, 19, 664, doi:10.3390/e19120664

Andriyan Bayu Suksmono
Finding a Hadamard Matrix by Simulated Quantum Annealing
Reprinted from: Entropy 2018, 20, 141, doi:10.3390/e20020141

Ameneh Arjmandzadeh and Majid Yarahmadi
Quantum Genetic Learning Control of Quantum Ensembles with Hamiltonian Uncertainties
Reprinted from: Entropy 2017, 19, 376, doi:10.3390/e19080376

Lucas Kocia, Yifei Huang and Peter Love
Discrete Wigner Function Derivation of the Aaronson–Gottesman Tableau Algorithm
Reprinted from: Entropy 2017, 19, 353, doi:10.3390/e19070353

Alain Deville and Yannick Deville
Concepts and Criteria for Blind Quantum Source Separation and Blind Quantum Process Tomography
Reprinted from: Entropy 2017, 19, 311, doi:10.3390/e19070311

About the Special Issue Editors
Giacomo Mauro D’Ariano is a full professor at Pavia University, where he teaches Quantum
Mechanics and Foundations of Quantum Theory, and leads the group QUit. He is a fellow of
the American Physical Society and of the Optical Society of America, a member of the Academy
Istituto Lombardo of Scienze e Lettere, of the Center for Photonic Communication and Computing
at Northwestern IL, and of FQXi. He is the (co)author of more than 350 articles in peer-reviewed
physics journals. He initiated quantum information research in Italy, where he founded a school
whose scholars have spread worldwide.

Paolo Perinotti is an associate professor at the Physics Department of Pavia University. He
teaches “Theoretical physics and information theory” and “Statistical Mechanics”. His research
activity is focused on quantum information theory, quantum foundations, and quantum mechanics.
In 2016 he was awarded the Birkhoff–von Neumann Prize for research in Quantum Foundations.
He is a member of the Foundational Questions Institute (FQXi), and of the International Quantum
Structures Association.

Editorial
Quantum Information and Foundations
Giacomo Mauro D’Ariano * and Paolo Perinotti *
QUIT Group, Dipartimento di Fisica dell’Università di Pavia, Istituto Nazionale di Fisica Nucleare, via Bassi 6,
27100 Pavia, Italy
* Correspondence: [email protected] (G.M.D.); [email protected] (P.P.)

Received: 3 December 2019; Accepted: 10 December 2019; Published: 23 December 2019

Keywords: quantum information; quantum foundations; quantum theory and gravity

The new era of quantum foundations, fed by the quantum information theory experience and
opened in the early 2000s by a series of memorable papers [1–3], led in a few years to a wealth of
results that can all be roughly traced back to the idea of testing quantum theory against new rivals
instead of struggling in the worn-out attempt to recomprehend it within a classical imaginative
world. The first remarkable construction of a toy theory for foundational purposes, to our knowledge,
is represented by Ref. [4].
The study of foil theories along with their informational power led to important progress,
paralleled with an increasing understanding of the new foundational scenario [5]. Most importantly,
this stream of thought is the origin of the new paradigm of the so-called reconstructions, which aim
at singling out quantum theory in a wider scenario of possible theories of elementary physical
systems [6,7]. Grant the authors an unwarranted bit of pride in stating that a clear picture of such a
playground is now available thanks to the formulation of the concept of Operational Probabilistic Theory
(OPT) [8,9]. As a result of the growing interest, we now understand quantum theory as a special kind
of information theory, with postulates that regard the possibility or impossibility to carry out specific
information processing tasks, instead of directly describing the mathematical structures of Hilbert
spaces, operator algebras, and the like.
One of the future challenges for the informational approach to quantum foundations is then to
embrace the mechanical part of the theory, besides the merely information-theoretic one, or, better,
to remain on top of it.
The time demarcation represented by the year 2000 is of course artificial, just like every symbolic
date, as quantum information has been closely connected to foundations since its very birth. One could
not express this fact in better words than Chris Fuchs’ own: “The title of the NATO Advanced Research
Workshop that gave birth to this volume was ‘Decoherence and its Implications in Quantum Computation
and Information Transfer’ . . . The life of the party was all the talks and conversations on ‘Decoherence and its
Implications in Quantum Foundations’.” [10]. The new approach, moreover, has some deep connections
with the previous experience that can be broadly collected under the name quantum logic. Having said
that, the turn of the century undoubtedly brought the foundations new vigour.
This special issue is meant to witness recent progress in the balanced and fertile interchange
between the developments in application-oriented quantum information theory and those in
foundations. The response of the authors was great, and produced a perfect blend of
flavours. The subjects of the contributions can be briefly classified into three groups.
The first one can be deemed resources. One of the main topics in a well-organised information
theory is quantification and classification of resources. It is nowadays common wisdom that the
resource for quantum computation and information is entanglement, which is incidentally one of
the main resources also for foundations. In a broader view, entanglement is one of the nonclassical
resources allowed by quantum theory.

Entropy 2020, 22, 22; doi:10.3390/e22010022 1 www.mdpi.com/journal/entropy

In the contribution, Nonclassicality by Local Gaussian Unitary Operations for Gaussian States [11],
the authors introduce a measure of nonclassicality for Gaussian states of continuous variable systems
and compare it with other measures of nonclassical correlations. The resource in this case is
nonclassicality, namely, the ability to produce phenomena that are not reproducible by classical
means. The proposed measure of nonclassicality is explicitly computed for a system of two bosonic
modes, and estimated in the general case.
In another respect, one of the primary resources in quantum information is the ability to prepare
states on demand. Methods for predicting the statistical efficiency of sources, or for sharpening our
description of preparations through density matrices in the presence of partial information are then of
the utmost importance.
In the paper, Entropic Updating of Probabilities and Density Matrices [12], the author analyses the task
of reconstructing the theoretical description of a quantum state from partial experimental information.
The standard relative entropy and the Umegaki entropy are derived in parallel from the same set of
design criteria.
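
To make the parallel between the two entropies concrete, the following minimal sketch evaluates the classical relative entropy and the Umegaki (quantum) relative entropy, here taken in the common convention S(ρ‖σ) = Tr ρ (log ρ − log σ), and checks that the two coincide when the density matrices commute. The function names and example states are illustrative assumptions rather than anything taken from Ref. [12], whose sign convention may differ.

```python
import numpy as np

def classical_relative_entropy(p, q):
    """Sum_i p_i log(p_i / q_i), skipping terms with p_i = 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def matrix_log(rho):
    """Logarithm of a positive-definite Hermitian matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(rho)
    return (vecs * np.log(vals)) @ vecs.conj().T

def umegaki_relative_entropy(rho, sigma):
    """Tr[rho (log rho - log sigma)] for full-rank density matrices."""
    return float(np.real(np.trace(rho @ (matrix_log(rho) - matrix_log(sigma)))))

# Commuting (diagonal) states: the quantum quantity reduces to the classical one.
p, q = [0.5, 0.5], [0.25, 0.75]
classical = classical_relative_entropy(p, q)
quantum_diag = umegaki_relative_entropy(np.diag(p), np.diag(q))

# A non-diagonal state compared with the maximally mixed qubit state.
rho = np.array([[0.7, 0.2], [0.2, 0.3]])
sigma = np.eye(2) / 2
quantum_general = umegaki_relative_entropy(rho, sigma)
```

The eigendecomposition route assumes full-rank states; states with zero eigenvalues would need a separate treatment of the support.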
Finally, in the contribution, Structure of Multipartite Entanglement in Random Cluster-Like Photonic
Systems [13], the authors analyse the size of multipartite entanglement in randomly generated cluster
states, relating it to the density of nodes in the cluster.
A second collection of contributions regards algorithms and protocols. This selection witnesses
progress in the ongoing challenge towards new algorithms and new tasks. In the contribution, Finding
a Hadamard Matrix by Simulated Quantum Annealing [14], the author analyses quantum algorithms for
finding a Hadamard matrix, which is itself a hard problem. The problem is reformulated in terms of
energy minimisation of spin vectors connected by a complete graph, and approached via path-integral
Monte-Carlo techniques. The scaling properties of the method show that the quantum algorithm
outperforms its classical counterpart in solving this hard problem, providing yet another hint of
quantum supremacy.
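
As a rough illustration of what an energy formulation of the Hadamard search can look like, the sketch below assigns to a ±1 matrix an energy equal to the sum of squared inner products between distinct rows, so that the energy vanishes exactly on Hadamard matrices. Both this energy function and the plain classical simulated annealing loop are assumptions made for illustration; they are a stand-in for, not a reproduction of, the path-integral Monte-Carlo method of Ref. [14].

```python
import numpy as np

def hadamard_energy(m):
    """Sum over distinct row pairs of the squared inner product (0 iff rows are orthogonal)."""
    gram = m @ m.T
    off_diag = gram - np.diag(np.diag(gram))
    return int(np.sum(off_diag ** 2)) // 2

def sylvester(k):
    """Sylvester construction of a Hadamard matrix of order 2**k."""
    h = np.array([[1]])
    for _ in range(k):
        h = np.block([[h, h], [h, -h]])
    return h

def anneal(order, steps=2000, t_start=4.0, t_end=0.05, seed=0):
    """Classical single-spin-flip annealing; returns the best matrix seen and its energy."""
    rng = np.random.default_rng(seed)
    m = rng.choice([-1, 1], size=(order, order))
    energy = hadamard_energy(m)
    best, best_energy = m.copy(), energy
    for step in range(steps):
        # Geometric cooling schedule from t_start down towards t_end.
        temperature = t_start * (t_end / t_start) ** (step / steps)
        i, j = rng.integers(order, size=2)
        m[i, j] *= -1  # propose a single spin flip
        new_energy = hadamard_energy(m)
        accept = new_energy <= energy or rng.random() < np.exp((energy - new_energy) / temperature)
        if accept:
            energy = new_energy
            if energy < best_energy:
                best, best_energy = m.copy(), energy
        else:
            m[i, j] *= -1  # undo the rejected flip
    return best, best_energy

h4 = sylvester(2)
flipped = h4.copy()
flipped[0, 0] *= -1  # one wrong entry breaks orthogonality of the first row

candidate, candidate_energy = anneal(4)
```

A zero `candidate_energy` would mean the annealer found a Hadamard matrix of order 4; the sketch does not guarantee this for any particular seed or schedule.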
In the contribution, Quantum Genetic Learning Control of Quantum Ensembles with Hamiltonian
Uncertainties [15], the authors propose a new method for controlling a quantum ensemble of two-level
systems with uncertainties in the parameters of the Hamiltonian system. The method is based on the
combination of a sample-learning control and a quantum genetic algorithm, witnessing the continuous
cross-fertilisation between quantum theory and computer science.
The authors of the contribution, Discrete Wigner Function Derivation of the Aaronson-Gottesman
Tableau Algorithm [16], present a discrete Wigner-function-based simulation algorithm for odd-d
qudits that has the same time and space complexity as the Aaronson–Gottesman algorithm for qubits.
The authors also discuss the differences between the Wigner function algorithm for odd-d and the
Aaronson–Gottesman algorithm for qubits, conjecturing that they are due to the fact that qubits exhibit
state-independent contextuality. This may provide a guide for extending the discrete Wigner function
approach to qubits. Considering this result, one can easily realise how tightly quantum computation
and quantum foundations are bound.
Concepts and Criteria for Blind Quantum Source Separation and Blind Quantum Process Tomography [17]
discusses communication protocols for demixing a signal from the output of a communication line
and establishes properties that were already used without justification in that context. The scenario
considered here involves a pair of electron spins initially prepared in a pure state and then submitted
to an undesired exchange coupling. The authors introduce a criterion for checking that the coupling
does not produce entanglement.
In recent years, after studies that provided a fully algebraic method for analysing quantum
circuits [18], it was realised that there are simple protocols that challenge the circuit model yet are still
amenable to a fully algebraic account [19]. Some of these protocols can be interpreted as computations
that call events in a causally indefinite order, thus hinting at interesting foundational questions. In the
article, Non-Causal Computation [20], the authors review recent results on indefinite orders and their
potential in computation, replacing the requirement of a global ordering between gates in the
computation with that of mere logical consistency.
The third collection regards foundations. This is the subject that encompasses all the remaining
contributions, fifteen in all, with a very diverse span of subjects, approaches, and techniques.
One of the lessons of the quantum information theoretical approach to foundations is that very
often physical concepts are easily grasped by referring to the operations and processes they can undergo.
In this spirit, the author of the contribution, Agents, Subsystems, and the Conservation of Information [21]
proposes a mathematical modelling for subsystems of physical systems in the general scenario of OPTs,
where subsystems are identified through a subalgebra of the full algebra of operations on the composite
system they are part of. Various cases are then discussed, with a particular focus on quantum systems.
The relevance of appropriately treating subsystems of composite systems might appear somewhat
technical at first sight, but after giving the subject some more thought, one realises that
the notion of subsystem underlies many fundamental questions, e.g., Wigner’s thought experiment
popularly known as the Wigner’s friend paradox. This is the subject of the contribution, A No-Go
Theorem for Observer-Independent Facts [22], which proposes a perspective on the argument of
Frauchiger and Renner [23] proving that “single-world interpretations of quantum theory cannot be
self-consistent”. The author derives a no-go theorem for observer-independent facts, which would be
common to both Wigner and the friend. This result is claimed to undermine one of the assumptions
behind the concept of “self-consistency” by the authors of Ref. [23].
The analysis of conceptual foundational questions is possible thanks to the availability of a
suitable mathematical language. A continuous process of reformulation and reconsideration of the
latter is an important chapter in quantum foundations, as witnessed by the contribution, A Royal
Road to Quantum Theory (or Thereabouts) [24]. Here, the author proposes an alternate perspective for
approaching the problem of reformulating the mathematical language of quantum theory from simple
postulates, based on the theory of Euclidean Jordan algebras. While the paper, as declared by the
author, “fails to derive quantum mechanics”, it derives a more general framework that embraces the
quantum along with alternate, not wildly different possible theories.
In addition, the article Quantum Theory from Rules on Information Acquisition [25] reviews a
reconstruction of the mathematical framework of quantum theory. The starting point here is a set of
rules constraining an observer’s acquisition of information about physical systems. The reconstruction
offers an informational explanation for entanglement, monogamy, and nonlocality, from limited
accessible information and complementarity. The analysis leads to a notion of “conserved informational
charges” that stems from complementarity relations that characterise the unitary group and the set of
pure states.
The review The Many Classical Faces of Quantum Structures [26] addresses a mathematical
reformulation of quantum mechanics in terms of classical mechanics. The standpoint for this approach
is that interpretational problems with quantum mechanics can be phrased precisely by only talking
about empirically accessible information. This review spells out the main points of the abovementioned
approach in terms of the algebraic structures lying behind quantum theory.
After the reconstruction of the mathematical language of quantum theory from information
theoretical postulates was completed, one of the possible developments was the attempt at a
reformulation of quantum mechanics from information processing. In this respect, much progress
was achieved, essentially showing that one can have a fully information-theoretic account of the
basic equations at the core of relativistic quantum field theory, such as Weyl’s and Dirac’s [27,28],
and Maxwell’s [29]. The next difficult step in this direction is introducing interactions. A recent
result in this direction is the study of all possible interacting cellular automata in one dimension
along with a full diagonalization of their two-particle sector [30]. In the contribution, Solutions of a
Two-Particle Interacting Quantum Walk [31], the authors provide an alternative solution of the dynamics
of the abovementioned class of cellular automata based on a path-sum approach.

Once again, on the exploration of the language of quantum foundations, one can read Ruling
out Higher-Order Interference from Purity Principles [32], where the authors analyse the principles of
Causality, Purity Preservation, Pure Sharpness, and Purification in the operational framework of
generalised probabilistic theories, proving that these principles limit interference to second-order,
namely, the interference pattern formed in a multislit experiment is a function of the interference
patterns formed between pairs of slits. This behaviour is typical of quantum theory, where there are no
genuinely new features resulting from considering three slits instead of two. Systems in such theories
correspond to Euclidean Jordan algebras.
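
The absence of higher-order interference in quantum theory can be checked numerically. The sketch below assumes the standard Born-rule pattern for a set of open slits, namely the squared modulus of the summed amplitudes, and evaluates the inclusion-exclusion combination of patterns commonly used to define the interference term of a given order. The amplitudes and function names are illustrative assumptions, not taken from Ref. [32].

```python
from itertools import combinations

def pattern(amplitudes, open_slits):
    """Unnormalised Born-rule detection probability with only `open_slits` open."""
    return abs(sum(amplitudes[s] for s in open_slits)) ** 2

def interference_term(amplitudes, slits):
    """Inclusion-exclusion combination of patterns over all non-empty subsets of `slits`."""
    n = len(slits)
    total = 0.0
    for k in range(1, n + 1):
        sign = (-1) ** (n - k)
        for subset in combinations(slits, k):
            total += sign * pattern(amplitudes, subset)
    return total

# Hypothetical complex amplitudes for three slits at a fixed detector position.
amplitudes = [0.6, 0.3 + 0.4j, -0.2j]

second_order = interference_term(amplitudes, (0, 1))     # P_12 - P_1 - P_2
third_order = interference_term(amplitudes, (0, 1, 2))   # P_123 - P_12 - P_13 - P_23 + P_1 + P_2 + P_3
```

With these amplitudes the second-order term is nonzero while the third-order term vanishes up to rounding, which is the sense in which the three-slit pattern is fixed by the one-slit and two-slit patterns.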
Another contribution that is focused on the mathematical language and its framework is Leaks:
Quantum, Classical, Intermediate and More [33], where the authors introduce the notion of a leak for
general process theories and identify quantum theory as a theory with minimal leakage, as opposed
to classical theory that has maximal leakage. Leaks are processes that provide leakage of classical
information, and can be introduced in most theories. These processes allow for a category theoretical
account of decoherence as a mechanism for the emergence of classical theory in a quantum scenario.
The authors also discuss the relation of leaks with purity of processes.
One of the main themes in the context of reconstructions and reformulations of quantum theory is
to open the route to possible new post-quantum theories. The article Iterant Algebra [34] moves a step
beyond quantum theory, starting from a generalisation of the structure of matrix algebra, motivated
by the structure of measurement for discrete processes. Iterant algebra is shown to embrace matrix
and Clifford algebras, and the framework is then applied to discuss various aspects of quantum
mechanics, such as the Schrödinger and Dirac equations, Majorana Fermions, and representations of
the braid group.
We now move to a different chapter in foundations, where one can use the standard mathematical
formalism to face questions and concepts that have interpretational issues. An example is given
by Robust Macroscopic Quantum Measurements in the Presence of Limited Control and Knowledge [35].
The authors tackle the problem of compatibility of quantum behaviour and macroscopic measurements,
focusing on the estimation of the polarization direction for a large system of spin 1/2 particles.
The analysis starts from a model of von Neumann pointer measurement and shows traits of a classical
measurement for an intermediate coupling strength. A relevant part of the contribution is devoted to
the analysis of the response of the model to relaxations of the initial assumptions, showing that the
model is robust.
One of the fundamental subjects that attracted interest from the very birth of quantum mechanics
is uncertainty. The study of uncertainty is still lively, and the present special issue includes one
contribution that is devoted to this subject: Measurement Uncertainty Relations for Position and Momentum:
Relative Entropy Formulation [36]. The authors analyse uncertainty as related to incompatibility of
different observables, where the latter is quantified by the amount of unavoidable approximation in a
joint measurement. As a quantifier of information loss, the authors consider relative entropy of a “true”
probability distribution and an approximating one. Such an analysis is applied to obtain a lower bound
for the amount of information that is lost by replacing the distributions of the sharp position and
momentum observables, as they could be obtained with two separate experiments, by the marginals of
any smeared joint measurement.
The renewed interest in fundamental problems produced new approaches to the unification of
quantum mechanics and the theory of gravity. Recent trends in quantum gravity are thus of high
interest for the community working in foundations and, for this reason, we appreciate the value of
a contribution such as Planck-Scale Soccer-Ball Problem: A Case of Mistaken Identity [37], which reports
reflections on the rule of composition for momenta. Over the last decade, nonlinear laws of
composition of momenta were predicted by many approaches to quantum gravity. In order to dispel
concerns about such nonlinearity, the author discusses the subtle difference between the two roles that
a law of momentum composition plays: the first one is related to the description of space-time locality,
and the second one is related to translational invariance. The contribution exhibits an example of
space-time where the local structure provides a nonlinear composition of momenta and yet translational
invariance is expressed by a linear law for the addition of momenta of many-particle systems.
Another contribution focused on a model aiming at a formulation of quantum gravity is Entropic
Phase Maps in Discrete Quantum Gravity [38], where the author makes an attempt based on path
summation over a space of evolutionary pathways in a history configuration space. This approach
enables derivation of discrete Schrödinger-type equations, and mathematical constructions thereof are
used to introduce entropic functions that obey an abstract version of the second law of thermodynamics.
One of the most remarkable consequences of the widespread interest in foundations is a
flourishing of experiments aimed at testing fundamental questions, or challenging established pillars
of quantum theory. A remarkable example is the Pauli exclusion principle for Fermions, which has been
tested in a series of recent experiments, in an ongoing effort that is witnessed also by a contribution
in the present issue, Test of the Pauli Exclusion Principle in the VIP-2 Underground Experiment [39].
Here, the authors report progress of the VIP-2 experiments at the Laboratori Nazionali del Gran Sasso,
seeking a prohibited transition in copper atoms of a 2p orbit electron to the fully populated ground
state, via X-ray analysis. The present limit on the probability for Pauli exclusion principle violation for
electrons set by the VIP experiment is 4.7 × 10⁻²⁹. A first result from the VIP-2 experiment improves
on the VIP limit, while the goal is a gain of two orders of magnitude in the long run.
A second example is the test of spontaneous collapse models, which aim at an objective solution of
the measurement problem that keeps the quantum formalism untouched while tweaking its dynamical
equations. In the contribution, CSL Collapse Model Mapped with the Spontaneous Radiation [40], new upper
limits on the parameters of the Continuous Spontaneous Localization collapse models are extracted.
The main idea behind the experiment is to analyse IGEX data about X-ray emission and compare
them with the spectrum of the spontaneous photon emission process predicted by collapse models.
This study allows for the exclusion of a broad range of the parameter space for CSL models.
Finally, we include a contribution that stands somewhat apart, being more focused on interpretational
issues than on technical ones: Quantum Information: What Is It All About? [41]. In this contribution, the author
answers the provocative question originally posed by John Bell, claiming that, in the consistent
histories approach to quantum theory, information is about projectors onto subspaces of the
Hilbert space of a system, representing its quantum properties. The main focus is the discussion of
how the single-framework rule—i.e., the rule for assigning probabilities to a projective decomposition
of the identity—for consistent histories avoids contradictions and recovers both classical information
theory and macroscopic physics. Room for issues is left only in the regimes without classical analogue,
where a single framework is not sufficient.
As a concluding remark, we would like to thank all the authors for their contributions and declare
our satisfaction in verifying the ongoing interest in fundamental problems—the only possible fuel for
the science and technology of tomorrow.

Acknowledgments: We express our thanks to the authors of the above contributions, and to the journal Entropy
and MDPI for their support during this work.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Hardy, L. Quantum theory from five reasonable axioms. arXiv 2001, arXiv:quant-ph/0101012.
2. Fuchs, C.A. Quantum Mechanics as Quantum Information (and only a little more). arXiv 2002,
arXiv:quant-ph/0205039.
3. Brassard, G. Is information the key? Nat. Phys. 2005, 1, 2–4. [CrossRef]
4. Hardy, L. Disentangling nonlocality and teleportation. arXiv 1999, arXiv:quant-ph/9906123.
5. Spekkens, R.W. Evidence for the epistemic view of quantum states: A toy theory. Phys. Rev. A 2007, 75, 032110.
[CrossRef]
6. Dakic, B.; Brukner, C. Quantum theory and beyond: Is entanglement special? In Deep Beauty: Understanding
the Quantum World through Mathematical Innovation; Halvorson, H., Ed.; Cambridge University Press:
Cambridge, UK, 2011; pp. 365–392.
7. Masanes, L.; Müller, M.P. A derivation of quantum theory from physical requirements. New J. Phys.
2011, 13, 063001. [CrossRef]
8. Chiribella, G.; D’Ariano, G.M.; Perinotti, P. Probabilistic theories with purification. Phys. Rev. A
2010, 81, 062348. [CrossRef]
9. D’Ariano, G.M.; Chiribella, G.; Perinotti, P. Quantum Theory from First Principles: An Informational Approach;
Cambridge University Press: Cambridge, UK, 2017.
10. Gonis, T.; Gonis, A.; Turchi, P.E. Decoherence and Its Implications in Quantum Computation and
Information Transfer; NATO Science Series: Computer and Systems Sciences; IOS Press: Amsterdam,
The Netherlands, 2001.
11. Wang, Y.; Qi, X.; Hou, J. Nonclassicality by Local Gaussian Unitary Operations for Gaussian States.
Entropy 2018, 20, 266. [CrossRef]
12. Vanslette, K. Entropic Updating of Probabilities and Density Matrices. Entropy 2017, 19, 664. [CrossRef]
13. Ciampini, M.A.; Mataloni, P.; Paternostro, M. Structure of Multipartite Entanglement in Random Cluster-Like
Photonic Systems. Entropy 2017, 19, 473. [CrossRef]
14. Suksmono, A.B. Finding a Hadamard Matrix by Simulated Quantum Annealing. Entropy 2018, 20, 141.
[CrossRef]
15. Arjmandzadeh, A.; Yarahmadi, M. Quantum Genetic Learning Control of Quantum Ensembles with
Hamiltonian Uncertainties. Entropy 2017, 19, 376. [CrossRef]
16. Kocia, L.; Huang, Y.; Love, P. Discrete Wigner Function Derivation of the Aaronson–Gottesman Tableau
Algorithm. Entropy 2017, 19, 353. [CrossRef]
17. Deville, A.; Deville, Y. Concepts and Criteria for Blind Quantum Source Separation and Blind Quantum
Process Tomography. Entropy 2017, 19, 311. [CrossRef]
18. Chiribella, G.; D’Ariano, G.M.; Perinotti, P. Theoretical framework for quantum networks. Phys. Rev. A
2009, 80, 022339. [CrossRef]
19. Chiribella, G.; D’Ariano, G.M.; Perinotti, P.; Valiron, B. Quantum computations without deﬁnite causal
structure. Phys. Rev. A 2013, 88, 022318. [CrossRef]
20. Baumeler, Ä.; Wolf, S. Non-Causal Computation. Entropy 2017, 19, 326. [CrossRef]
21. Chiribella, G. Agents, Subsystems, and the Conservation of Information. Entropy 2018, 20, 358. [CrossRef]
22. Brukner, Č. A No-Go Theorem for Observer-Independent Facts. Entropy 2018, 20, 350. [CrossRef]
23. Frauchiger, D.; Renner, R. Quantum theory cannot consistently describe the use of itself. Nat. Commun.
2018, 9, 3711. [CrossRef]
24. Wilce, A. A Royal Road to Quantum Theory (or Thereabouts). Entropy 2018, 20, 227. [CrossRef]
25. Höhn, P.A. Quantum Theory from Rules on Information Acquisition. Entropy 2017, 19, 98. [CrossRef]
26. Heunen, C. The Many Classical Faces of Quantum Structures. Entropy 2017, 19, 144. [CrossRef]
27. Bisio, A.; D’Ariano, G.M.; Tosini, A. Dirac quantum cellular automaton in one dimension: Zitterbewegung
and scattering from potential. Phys. Rev. A 2013, 88, 032301. [CrossRef]
28. D’Ariano, G.M.; Perinotti, P. Derivation of the Dirac equation from principles of information processing.
Phys. Rev. A 2014, 90, 062106. [CrossRef]
29. Bisio, A.; D’Ariano, G.M.; Perinotti, P. Quantum Cellular Automaton Theory of Light. arXiv 2014,
arXiv:1407.6928.
30. Bisio, A.; D’Ariano, G.M.; Perinotti, P.; Tosini, A. Thirring quantum cellular automaton. Phys. Rev. A
2018, 97, 032132. [CrossRef]
31. Bisio, A.; D’Ariano, G.M.; Mosco, N.; Perinotti, P.; Tosini, A. Solutions of a Two-Particle Interacting Quantum
Walk. Entropy 2018, 20, 435. [CrossRef]
32. Barnum, H.; Lee, C.M.; Scandolo, C.M.; Selby, J.H. Ruling out Higher-Order Interference from Purity
Principles. Entropy 2017, 19, 253. [CrossRef]
33. Selby, J.; Coecke, B. Leaks: Quantum, Classical, Intermediate and More. Entropy 2017, 19, 174. [CrossRef]
34. Kauffman, L.H. Iterant Algebra. Entropy 2017, 19, 347. [CrossRef]
35. Renou, M.O.; Gisin, N.; Fröwis, F. Robust Macroscopic Quantum Measurements in the Presence of Limited
Control and Knowledge. Entropy 2018, 20, 39. [CrossRef]

36. Barchielli, A.; Gregoratti, M.; Toigo, A. Measurement Uncertainty Relations for Position and Momentum:
Relative Entropy Formulation. Entropy 2017, 19, 301. [CrossRef]
37. Amelino-Camelia, G. Planck-Scale Soccer-Ball Problem: A Case of Mistaken Identity. Entropy 2017, 19, 400.
[CrossRef]
38. Dribus, B.F. Entropic Phase Maps in Discrete Quantum Gravity. Entropy 2017, 19, 322. [CrossRef]
39. Curceanu, C.; Shi, H.; Bartalucci, S.; Bertolucci, S.; Bazzi, M.; Berucci, C.; Bragadireanu, M.; Cargnelli, M.;
Clozza, A.; De Paolis, L.; et al. Test of the Pauli Exclusion Principle in the VIP-2 Underground Experiment.
Entropy 2017, 19, 300. [CrossRef]
40. Piscicchia, K.; Bassi, A.; Curceanu, C.; Grande, R.D.; Donadi, S.; Hiesmayr, B.C.; Pichler, A. CSL Collapse
Model Mapped with the Spontaneous Radiation. Entropy 2017, 19, 319. [CrossRef]
41. Grifﬁths, R.B. Quantum Information: What Is It All About? Entropy 2017, 19, 645. [CrossRef]

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Article
Solutions of a Two-Particle Interacting
Quantum Walk
Alessandro Bisio, Giacomo Mauro D’Ariano, Nicola Mosco *, Paolo Perinotti * and
Alessandro Tosini
Dipartimento di Fisica dell’Università di Pavia, Istituto Nazionale di Fisica Nucleare, Pavia 27100, Italy;
[email protected] (A.B.); [email protected] (G.M.D.); [email protected] (A.T.)
* Correspondence: [email protected] (N.M.); [email protected] (P.P.);
Tel.: +39-0382-987675 (N.M. & P.P.)

Received: 22 April 2018; Accepted: 31 May 2018; Published: 5 June 2018

Abstract: We study the solutions of an interacting Fermionic cellular automaton which is the analogue
of the Thirring model with both space and time discrete. We present a derivation of the two-particle
solutions of the automaton recently reported in the literature, which exploits the symmetries of the evolution
operator. In the two-particle sector, the evolution operator is given by the sequence of two steps, the
ﬁrst one corresponding to a unitary interaction activated by two-particle excitation at the same site,
and the second one to two independent one-dimensional Dirac quantum walks. The interaction step
can be regarded as the discrete-time version of the interacting term of some Hamiltonian integrable
system, such as the Hubbard or the Thirring model. The present automaton exhibits scattering
solutions with nontrivial momentum transfer, jumping between different regions of the Brillouin
zone that can be interpreted as Fermion-doubled particles, in stark contrast with the customary
momentum-exchange of the one-dimensional Hamiltonian systems. A further difference compared
to the Hamiltonian model is that there exist bound states for every value of the total momentum and
of the coupling constant. Even in the special case of vanishing coupling, the walk manifests bound
states, for ﬁnitely many isolated values of the total momentum. As a complement to the analytical
derivations we show numerical simulations of the interacting evolution.

Keywords: quantum walks; Hubbard model; Thirring model

1. Introduction
Quantum walks (QWs) describe the evolution of one-particle quantum states on a lattice, or, more
generally, on a graph. The quantum walk evolution is linear in the quantum state and the quantum
aspect of the evolution occurs in the interference between the different paths available to the walker.
There are two kinds of quantum walks: continuous-time QWs, where the evolution operator of
the system, given in terms of a Hamiltonian, can be applied at any time (see Farhi et al. [1]),
and discrete-time QWs, where the evolution operator is applied in discrete unitary time-steps.
The discrete-time model, which appeared already in the Feynman discretization of the Dirac
equation [2], was later rediscovered in quantum information [3–7], and proved to be a versatile
platform for various scopes. For example, QWs have been used for empowering quantum algorithms,
such as database search [8,9], or graph isomorphism [10,11]. Moreover, quantum walks have been
studied as a simulation tool for relativistic quantum ﬁelds [12–28], and they have been used as discrete
models of spacetime [29–32].
QWs are among the most promising quantum simulators with possible realizations in a variety of
physical systems, such as nuclear magnetic resonance [33,34], trapped ions [35], integrated photonics,
and bulk optics [36–39].

Entropy 2018, 20, 435; doi:10.3390/e20060435 www.mdpi.com/journal/entropy

New research perspectives are unfolding in the scenario of multi-particle interacting quantum
walks where two or more walking particles are coupled via nonlinear (in the field) unitary operators.
The properties of these systems are still largely unexplored. Both continuous-time [40] and
discrete-time [41] quantum walks on sparse unweighted graphs are equivalent in power to the quantum
circuit model. However, it is highly non-trivial to design a suitable architecture for universal quantum
computation based on quantum walks. Within this perspective, a possible route has been suggested
in [42] based on interacting multi-particle quantum walks with indistinguishable particles (Bosons
or Fermions), proving that “almost any interaction” is universal. Among the universal interacting
many-body systems are the models with coupling term of the form χδx1 ,x2 n̂( x1 )n̂( x2 ), with n̂( x )
the number operator at site x. The latter two-body interaction lies at the basis of notable integrable
quantum systems in one space dimension such as the Hubbard and the Thirring Hamiltonian models.
The first attempts at the analysis of interacting quantum walks were carried out in [43,44].
More recently, in [45], the authors proposed a discrete-time analogue of the Thirring model, which is
indeed a Fermionic quantum cellular automaton, whose dynamics in the two-particle sector reduces to
an interacting two-particle quantum walk. As for its Hamiltonian counterpart, the discrete-time
interacting walk has been solved analytically in the case of two Fermions. Analogously to any
Hamiltonian integrable system, also in the discrete-time case the solution is based on the Bethe
Ansatz technique. However, discreteness of the evolution prevents the application of the usual Ansatz,
and a new Ansatz has been introduced successfully [45].
In this paper, we present an original simplified derivation of the solution of [45], which exploits
the symmetries of the interacting walk. We present the diagonalization of the evolution operator and
the characterization of its spectrum. We explicitly write the two particle states corresponding to the
scattering solutions of the system, having eigenvalues in the continuous spectrum of the evolution
operator. We then show how the present model predicts the formation of bound states, which are
eigenstates of the interacting walk corresponding to the discrete spectrum. We provide also in this
case the analytic expression of such molecular states.
We comment on the phenomenological differences between the Hamiltonian model and the
discrete-time one. First, we see that the set of possible scattering solutions is larger in the discrete-time
case: for a fixed value of the total momentum, a nontrivial transfer of relative momentum can occur besides
the simple exchange of momentum between the two particles, differently from the Hamiltonian case.
In addition, the family of bound states appearing in the discrete-time scenario is larger than the
corresponding Hamiltonian one. Indeed, for any fixed value of the coupling constant, a bound state
exists with any possible value of the total momentum, while, for Hamiltonian systems, bound states
cannot have arbitrary total momentum.
Finally, we show that, in the set of solutions for the interacting walk, there are perfectly localized
states (namely, states that lie on a finite number of lattice sites). Moreover, differently from the
Hamiltonian systems, bound states exist also for null coupling constants; however, this is true only for
finitely many isolated values of the total momentum. In addition to the exact analytical solution of the
dynamics, we show the simulation of some significant initial states.

2. The Dirac Quantum Walk

In this section, we review the Dirac quantum cellular automaton on the line describing the
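free evolution of a two-component Fermionic field. The single particle Hilbert space is given by
$\mathcal{H} := \mathbb{C}^2 \otimes \ell^2(\mathbb{Z})$, for which we employ the factorized basis $|a\rangle |x\rangle$, with $a \in \{\uparrow, \downarrow\}$ and $x \in \mathbb{Z}$.
The Dirac automaton describes an arbitrary number of Fermions whose evolution is linear in the field:

\[ \psi(x, t+1) = W \psi(x, t), \qquad \psi(x, t) = \begin{pmatrix} \psi_\uparrow(x, t) \\ \psi_\downarrow(x, t) \end{pmatrix}, \tag{1} \]
\[ [\psi_a(x), \psi_b^\dagger(x')]_+ = \delta_{a,b}\, \delta_{x,x'}, \qquad [\psi_a(x), \psi_b(x')]_+ = 0, \tag{2} \]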

where $W$ is a unitary operator. In the single particle sector, the automaton can be regarded as a
quantum walk on the single-particle Hilbert space $\mathcal{H}$ whose unitary evolution operator $W$ is given by

\[ W = \begin{pmatrix} \nu T_x & -i\mu \\ -i\mu & \nu T_x^\dagger \end{pmatrix}, \qquad \nu, \mu > 0, \quad \nu^2 + \mu^2 = 1, \tag{3} \]

where $T_x$ denotes the translation operator on $\ell^2(\mathbb{Z})$, defined by $T_x |x\rangle = |x + 1\rangle$.

Since the walk $W$ is translation invariant (it commutes with the translation operator),
it can be diagonalized in momentum space. In the momentum representation, defining
$|p\rangle := (2\pi)^{-1/2} \sum_{x \in \mathbb{Z}} e^{-ipx} |x\rangle$, with $p \in B := (-\pi, \pi]$, the walk operator can be written as

\[ W = \int_B dp\, W(p) \otimes |p\rangle\langle p|, \qquad W(p) = \begin{pmatrix} \nu e^{ip} & -i\mu \\ -i\mu & \nu e^{-ip} \end{pmatrix}, \tag{4} \]

where $|\nu|^2 + |\mu|^2 = 1$. The spectrum of the walk is given by $\{e^{-i\omega(p)}, e^{i\omega(p)}\}$, where the dispersion
relation $\omega(p)$ is given by

\[ \omega(p) := \operatorname{Arccos}(\nu \cos p), \tag{5} \]

where Arccos denotes the principal value of the arccosine function. The single-particle eigenstates,
solving the eigenvalue problem

\[ W(p)\, v_p^s = e^{-is\omega(p)}\, v_p^s, \qquad s = \pm, \tag{6} \]

can be conveniently written as

\[ v_p^s = \frac{1}{|N_s|} \begin{pmatrix} -i\mu \\ g_s(p) \end{pmatrix}, \tag{7} \]

with $g_s(p) := -i(s \sin\omega(p) + \nu \sin p)$ and $|N_s|^2 := \mu^2 + |g_s|^2$.
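
The single-particle results above can be checked numerically. The following is a minimal sketch, assuming NumPy and an illustrative value ν = 0.8 that is not taken from the paper; the function names are hypothetical helpers. It builds W(p) of Equation (4) and confirms that the vectors of Equation (7) are eigenvectors with eigenvalues e^{-isω(p)}.

```python
import numpy as np

def walk_matrix(p, nu):
    """Momentum-space Dirac walk W(p) of Equation (4)."""
    mu = np.sqrt(1.0 - nu**2)
    return np.array([[nu * np.exp(1j * p), -1j * mu],
                     [-1j * mu, nu * np.exp(-1j * p)]])

def omega(p, nu):
    """Dispersion relation of Equation (5), principal value of arccos."""
    return np.arccos(nu * np.cos(p))

def eigenvector(p, s, nu):
    """Normalized eigenvector v_p^s of Equation (7); s is +1 or -1."""
    mu = np.sqrt(1.0 - nu**2)
    g = -1j * (s * np.sin(omega(p, nu)) + nu * np.sin(p))
    v = np.array([-1j * mu, g])
    return v / np.linalg.norm(v)  # the norm equals |N_s|

nu = 0.8  # illustrative value, not from the paper
for p in np.linspace(-3.0, 3.0, 7):
    W = walk_matrix(p, nu)
    for s in (+1, -1):
        v = eigenvector(p, s, nu)
        # Equation (6): W(p) v = exp(-i s omega(p)) v
        assert np.allclose(W @ v, np.exp(-1j * s * omega(p, nu)) * v)
```

The check relies only on cos ω(p) = ν cos p, which is why the same function g_s serves both signs of s.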

3. The Thirring Quantum Walk

In this section, we present a Fermionic cellular automaton in one spatial dimension with an on-site
interaction, namely two particles interact only when they lie at the same lattice site. The linear part
corresponds to the Dirac QW [17] and the interaction term is the most general number-preserving
coupling in one dimension [46]. The same kind of interaction characterizes also the most studied
integrable quantum systems, such as the Thirring [47] and the Hubbard [48] models.
The linear part of the automaton is given by the Dirac automaton, describing the free evolution of
the particles. In order to introduce an interaction, we modify the evolution operator adding an extra
unitary step of the form:
\[ V_{\mathrm{int}} := e^{i\chi n_\uparrow(x) n_\downarrow(x)}, \tag{8} \]

where $n_a(x)$, $a \in \{\uparrow, \downarrow\}$, represents the particle number at site $x$, namely $n_a(x) = \psi_a^\dagger(x) \psi_a(x)$, and $\chi$
is a real coupling constant. Since the interaction term preserves the total number operator, we can
study the automaton for a fixed number of particles. For $N$ interacting particles, we can describe the
evolution in terms of an interacting quantum walk over $\mathcal{H}_N = \mathcal{H}^{\otimes N}$ with the free evolution given by
$W_N := W^{\otimes N}$.
In this work, we focus on the two-particle sector, whose solutions have been derived in [45]. As we
will see, the Thirring walk features molecule states besides scattering solutions. This feature is shared
also by the Hadamard walk with the same on-site interaction [44].

Explicitly, the free evolution of $N$ particles is $W_N := W^{\otimes N}$, acting on the Hilbert space
$\mathcal{H}_N = \mathcal{H}^{\otimes N}$, and the interaction is introduced by modifying the update rule of the walk with an extra
step $V_{\mathrm{int}}$: $U_N := W_N V_{\mathrm{int}}$. In the present case, the term $V_{\mathrm{int}}$ has the form

\[ V_{\mathrm{int}} = V_N(\chi) := e^{i\chi n_\uparrow(x) n_\downarrow(x)}. \tag{9} \]

Since we focus on the solutions involving the interaction of two particles, it is convenient to write
the walk in the centre of mass basis $|a_1, a_2\rangle |y\rangle |w\rangle$, with $a_1, a_2 \in \{\uparrow, \downarrow\}$, $y = x_1 - x_2$ and $w = x_1 + x_2$.
Therefore, on this basis, the generic Fermionic state is $|\psi\rangle = \sum_{a_1, a_2, y, w} c(a_1, a_2, y, w)\, |a_1, a_2\rangle |y\rangle |w\rangle$ with
$c(a_2, a_1, y, w) = -c(a_1, a_2, -y, w)$. Notice that only the pairs $(y, w)$ with $y$ and $w$ both even or both odd
correspond to physical points in the original basis $x_1, x_2$.
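
The parity constraint on the centre of mass coordinates can be made concrete with a few lines of code. This is only an illustrative sketch; the helper names `to_relative`, `is_physical` and `to_sites` are hypothetical and do not appear in the paper.

```python
def to_relative(x1, x2):
    """Map lattice sites (x1, x2) to the coordinates (y, w) = (x1 - x2, x1 + x2)."""
    return x1 - x2, x1 + x2

def is_physical(y, w):
    """A pair (y, w) comes from integer sites only if y and w have the same parity."""
    return (y - w) % 2 == 0

def to_sites(y, w):
    """Inverse map, defined only on physical pairs."""
    if not is_physical(y, w):
        raise ValueError("y and w must be both even or both odd")
    return (w + y) // 2, (w - y) // 2

# Every pair of sites lands on a physical (y, w), and the map is invertible there.
for x1 in range(-3, 4):
    for x2 in range(-3, 4):
        y, w = to_relative(x1, x2)
        assert is_physical(y, w)
        assert to_sites(y, w) == (x1, x2)
```

Pairs such as (y, w) = (1, 0) are rejected by `to_sites`, which is the half of the enlarged lattice that the walk carries as a second, unphysical copy.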
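We define the two-particle walk with both $y$ and $w$ in $\mathbb{Z}$, so that the linear part of the walk can be
written as

\[ W_2 = \mu\nu \begin{pmatrix}
\frac{\nu}{\mu} T_w^2 & -i T_y \otimes T_w & -i T_y^\dagger \otimes T_w & -\frac{\mu}{\nu} \\
-i T_y \otimes T_w & \frac{\nu}{\mu} T_y^2 & -\frac{\mu}{\nu} & -i T_y \otimes T_w^\dagger \\
-i T_y^\dagger \otimes T_w & -\frac{\mu}{\nu} & \frac{\nu}{\mu} T_y^{\dagger 2} & -i T_y^\dagger \otimes T_w^\dagger \\
-\frac{\mu}{\nu} & -i T_y \otimes T_w^\dagger & -i T_y^\dagger \otimes T_w^\dagger & \frac{\nu}{\mu} T_w^{\dagger 2}
\end{pmatrix}, \tag{10} \]

where $T_y$ represents the translation operator in the relative coordinate $y$, and $T_w$ the translation operator
in the centre of mass coordinate $w$, whereas the interacting term reads

\[ V_2(\chi) = \begin{pmatrix}
I_y \otimes I_w & 0 & 0 & 0 \\
0 & e^{i\chi\delta_{y,0}} \otimes I_w & 0 & 0 \\
0 & 0 & e^{i\chi\delta_{y,0}} \otimes I_w & 0 \\
0 & 0 & 0 & I_y \otimes I_w
\end{pmatrix}. \tag{11} \]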

This definition gives a walk $U_2 = W_2 V_2(\chi)$ that can be decomposed into two identical copies
of the original walk. Indeed, defining as $C$ the projector on the physical centre of mass coordinates,
one has $U_2 = C U_2 C + (I - C) U_2 (I - C)$, where $C U_2 C$ and $(I - C) U_2 (I - C)$ are unitarily equivalent.
We will then diagonalize the operator $U_2$, keeping in mind that the physical solutions are obtained
by projecting the eigenvectors with $C$.
Introducing the (half) relative momentum $k = \frac{1}{2}(p_1 - p_2)$ and the (half) total momentum
$p = \frac{1}{2}(p_1 + p_2)$, the free evolution of the two particles is written in the momentum representation as

\[ W_2 = \int dk\, dp\; W_2(p, k) \otimes |k\rangle\langle k| \otimes |p\rangle\langle p|, \tag{12} \]

where the matrix $W_2(p, k)$ is given by

\[ W_2(p, k) = W(p_1) \otimes W(p_2). \tag{13} \]

Furthermore, we introduce the vectors $v_k^{sr} := v_{p+k}^s \otimes v_{p-k}^r$, with $s, r = \pm$, such that

\[ W_2(p, k)\, v_k^{sr} = e^{-i\omega_{sr}(p, k)}\, v_k^{sr}, \tag{14} \]

where $\omega_{sr}(p, k) := s\omega(p + k) + r\omega(p - k)$ is the dispersion relation of the two-particle walk. Explicitly,
the vectors $v_k^{sr}$ are given by

\[ v_k^{sr} = \frac{1}{|N_s(p+k)|\, |N_r(p-k)|} \begin{pmatrix} -\mu^2 \\ -i\mu\, g_r(p-k) \\ -i\mu\, g_s(p+k) \\ g_s(p+k)\, g_r(p-k) \end{pmatrix}. \tag{15} \]
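
Equations (13)–(15) can be verified in the same spirit as the single-particle case. The sketch below assumes NumPy, illustrative parameter values that are not from the paper, and hypothetical helper names; the basis ordering (↑↑, ↑↓, ↓↑, ↓↓) is the one produced by `np.kron`.

```python
import numpy as np

def walk_matrix(q, nu):
    """Single-particle walk W(q) of Equation (4)."""
    mu = np.sqrt(1.0 - nu**2)
    return np.array([[nu * np.exp(1j * q), -1j * mu],
                     [-1j * mu, nu * np.exp(-1j * q)]])

def omega(q, nu):
    """Dispersion relation of Equation (5)."""
    return np.arccos(nu * np.cos(q))

def g(q, s, nu):
    """The function g_s(q) defined after Equation (7)."""
    return -1j * (s * np.sin(omega(q, nu)) + nu * np.sin(q))

def two_particle_vector(p, k, s, r, nu):
    """Normalized vector v_k^{sr} written component by component as in Equation (15)."""
    mu = np.sqrt(1.0 - nu**2)
    gs, gr = g(p + k, s, nu), g(p - k, r, nu)
    v = np.array([-mu**2, -1j * mu * gr, -1j * mu * gs, gs * gr])
    return v / np.linalg.norm(v)

nu, p, k = 0.8, 0.4, 1.1  # illustrative values
W2 = np.kron(walk_matrix(p + k, nu), walk_matrix(p - k, nu))  # Equation (13)
for s in (+1, -1):
    for r in (+1, -1):
        v = two_particle_vector(p, k, s, r, nu)
        w_sr = s * omega(p + k, nu) + r * omega(p - k, nu)
        # Equation (14): W2(p, k) v = exp(-i omega_sr(p, k)) v
        assert np.allclose(W2 @ v, np.exp(-1j * w_sr) * v)
```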

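We focus in this work on Fermionic solutions satisfying the eigenvalue equation

\[ U_2(\chi, p)\, |\psi\rangle = e^{-i\omega} |\psi\rangle, \qquad \omega \in \mathbb{R}, \tag{16} \]

with $|\psi(y)\rangle \in \mathbb{C}^4$. In the centre of mass basis, the antisymmetry condition reads

\[ |\psi(y)\rangle = -E\, |\psi(-y)\rangle, \tag{17} \]

$E$ being the exchange matrix

\[ E = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{18} \]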

4. Symmetries of the Thirring Quantum Walk

The Thirring walk manifests some symmetries that allow for simplifying the derivation and
the study of the solutions. First of all, as we already mentioned, one can show that the interaction
V (χ) commutes with the total number operator. This means that one can study the walk dynamics
separately for each ﬁxed number of particles. We focus here on the two-particle walk U2 = W2 V2 (χ),
where $W_2 = W \otimes W$ and $V_2(\chi) = e^{i\chi\delta_{y,0}(1 - \delta_{a_1,a_2})}$.
Since the interacting walk $U_2$ commutes with the translations in the centre of mass coordinate
$w$, the total momentum is a conserved quantity, so it is convenient to study the walk parameterized
by the total momentum $p$. To this end, we consider the basis $|a_1, a_2\rangle |y\rangle |p\rangle$, so that, for fixed values
of $p$, the interacting walk of two particles can be expressed in terms of a one-dimensional QW
$U_2(\chi, p) = W_2(p) V(\chi)$ with a four-dimensional coin:

\[ W_2(p) = \mu\nu \begin{pmatrix}
\frac{\nu}{\mu} e^{i2p} & -i e^{ip} T_y & -i e^{ip} T_y^\dagger & -\frac{\mu}{\nu} \\
-i e^{ip} T_y & \frac{\nu}{\mu} T_y^2 & -\frac{\mu}{\nu} & -i e^{-ip} T_y \\
-i e^{ip} T_y^\dagger & -\frac{\mu}{\nu} & \frac{\nu}{\mu} T_y^{\dagger 2} & -i e^{-ip} T_y^\dagger \\
-\frac{\mu}{\nu} & -i e^{-ip} T_y & -i e^{-ip} T_y^\dagger & \frac{\nu}{\mu} e^{-i2p}
\end{pmatrix}. \tag{19} \]

Although the range of the variable $p$ is the interval $(-\pi, \pi]$, it is possible to show that one can
restrict the study of the walk to the interval $[0, \pi/2]$. On the one hand, the two-particle walk transforms
unitarily under a parity transformation in the momentum space. Starting from the single particle walk,
$W(p)$ transforms under a parity transformation as

\[ W(p) = \sigma_x W(-p) \sigma_x, \qquad p \in (-\pi, \pi], \tag{20} \]

so that, for the two-particle walk, we have the relation

\[ W_2(-p, y) = \sigma_x \otimes \sigma_x\, E\, W_2(p, y)\, E\, \sigma_x \otimes \sigma_x. \tag{21} \]

On the other hand, a translation of $\pi$ of the total momentum $p$ entails that

\[ W_2(p + \pi, y) = \sigma_z \otimes \sigma_z\, W_2(p, y)\, \sigma_z \otimes \sigma_z, \tag{22} \]

while the interaction term remains unaffected in both cases.
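
The symmetry relations of Equations (20)–(22) can be checked at fixed relative momentum k, where the two-particle walk reduces to the 4×4 matrix W(p + k) ⊗ W(p − k) of Equation (13). The following sketch assumes NumPy; the parameter values and the helper names are illustrative choices, not taken from the paper.

```python
import numpy as np

def walk_matrix(q, nu):
    """Single-particle walk W(q) of Equation (4)."""
    mu = np.sqrt(1.0 - nu**2)
    return np.array([[nu * np.exp(1j * q), -1j * mu],
                     [-1j * mu, nu * np.exp(-1j * q)]])

def two_particle_walk(p, k, nu):
    """W2(p, k) = W(p + k) (x) W(p - k), Equation (13)."""
    return np.kron(walk_matrix(p + k, nu), walk_matrix(p - k, nu))

sx = np.array([[0, 1], [1, 0]])
sz = np.diag([1, -1])
E = np.array([[1, 0, 0, 0],   # exchange matrix of Equation (18)
              [0, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1]])
XX, ZZ = np.kron(sx, sx), np.kron(sz, sz)

nu = 0.8  # illustrative value
for p in (0.3, 1.2, -2.0):
    for k in (0.0, 0.7, 2.5):
        W2 = two_particle_walk(p, k, nu)
        # Equation (20): parity of the single-particle walk
        assert np.allclose(walk_matrix(p, nu), sx @ walk_matrix(-p, nu) @ sx)
        # Equation (21): parity of the total momentum
        assert np.allclose(two_particle_walk(-p, k, nu), XX @ E @ W2 @ E @ XX)
        # Equation (22): shift of the total momentum by pi
        assert np.allclose(two_particle_walk(p + np.pi, k, nu), ZZ @ W2 @ ZZ)
```

Together the two relations map any p in (−π, π] to a unitarily equivalent walk with p in [0, π/2], which is the restriction used in the text.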

The Thirring walk features also another symmetry that can be exploited to simplify the derivation
of the solutions. It is easy to check that the walk operator $U_2(\chi, p) = W_2(p) V(\chi)$ commutes with the
projector defined by

\[ P := \begin{pmatrix} P_o & 0 & 0 & 0 \\ 0 & P_e & 0 & 0 \\ 0 & 0 & P_e & 0 \\ 0 & 0 & 0 & P_o \end{pmatrix}, \tag{23} \]

where $P_e$ and $P_o$ are the projectors on the even and the odd subspaces, respectively:

\[ P_e = \sum_{z \in \mathbb{Z}} |2z\rangle\langle 2z|, \qquad P_o = \sum_{z \in \mathbb{Z}} |2z + 1\rangle\langle 2z + 1|. \tag{24} \]

The projector $P$ induces a splitting of the total Hilbert space $\mathcal{H}$ into two subspaces $P\mathcal{H}$ and
$(I - P)\mathcal{H}$, with the interaction term acting non-trivially only in the subspace $P\mathcal{H}$. In the
complementary subspace $(I - P)\mathcal{H}$, the evolution is free for Fermionic particles. This means that
solutions of the free theory are also solutions of the interacting one, as opposed to the Bosonic case for
which the interaction is non-trivial also in $(I - P)\mathcal{H}$.
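
The structure described in this section can be explored on a finite lattice. The sketch below is an illustration under stated assumptions rather than the construction used in the paper: the relative coordinate y is placed on a ring of L sites (L even, periodic boundary), the basis index is coin × L + y, and all names and parameter values are hypothetical. It builds W_2(p) from Equation (19), the interaction step, and the projector P of Equation (23), then checks unitarity, the commutation with P, and consistency with W(p + k) ⊗ W(p − k) on plane waves.

```python
import numpy as np

def walk_matrix(q, nu):
    """Single-particle walk W(q) of Equation (4)."""
    mu = np.sqrt(1.0 - nu**2)
    return np.array([[nu * np.exp(1j * q), -1j * mu],
                     [-1j * mu, nu * np.exp(-1j * q)]])

def ring_operators(p, chi, nu, L):
    """Return (W2, V, P) on a ring of L sites; basis index = coin * L + y."""
    mu = np.sqrt(1.0 - nu**2)
    I = np.eye(L)
    T = np.roll(I, 1, axis=0)   # T|y> = |y + 1 mod L>
    Td = T.T                    # adjoint shift
    e = np.exp(1j * p)
    a = -1j * mu * nu           # amplitude shared by all hopping blocks
    W2 = np.block([             # Equation (19), with the prefactor mu*nu multiplied in
        [nu**2 * e**2 * I, a * e * T,      a * e * Td,      -mu**2 * I],
        [a * e * T,        nu**2 * T @ T,  -mu**2 * I,      a / e * T],
        [a * e * Td,       -mu**2 * I,     nu**2 * Td @ Td, a / e * Td],
        [-mu**2 * I,       a / e * T,      a / e * Td,      nu**2 / e**2 * I],
    ])
    phase = np.ones(4 * L, dtype=complex)
    phase[1 * L] = phase[2 * L] = np.exp(1j * chi)  # y = 0, coin states 1 and 2 (opposite spins)
    V = np.diag(phase)
    odd = (np.arange(L) % 2).astype(float)
    even = 1.0 - odd
    P = np.diag(np.concatenate([odd, even, even, odd]))  # Equation (23)
    return W2, V, P

nu, p, chi, L = 0.8, 0.4, np.pi / 2, 12  # illustrative values; L must be even
W2, V, P = ring_operators(p, chi, nu, L)
U = W2 @ V
Id = np.eye(4 * L)

assert np.allclose(U.conj().T @ U, Id)    # the interacting step is unitary
assert np.allclose(P @ U, U @ P)          # U commutes with the projector P
assert np.allclose((Id - P) @ V, Id - P)  # the interaction is trivial on (I - P)H

# On plane waves e^{-iky}, W2(p) acts on the coin as W(p + k) (x) W(p - k).
y = np.arange(L)
coin = np.array([1.0, 2.0, -1.0, 0.5j])
for n in range(L):
    k = 2 * np.pi * n / L
    plane = np.exp(-1j * k * y) / np.sqrt(L)
    coin_out = np.kron(walk_matrix(p + k, nu), walk_matrix(p - k, nu)) @ coin
    assert np.allclose(W2 @ np.kron(coin, plane), np.kron(coin_out, plane))
```

Diagonalizing `U` with `np.linalg.eigvals` gives a finite-size approximation of the spectrum discussed in the following sections; the ring size and boundary conditions are modelling choices of this sketch.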

5. Review of the Solutions

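We focus in this section on the antisymmetric solutions of the Thirring walk that actually feel
the interaction. From the remarks that we have made in the previous section, such solutions can only
be found in the subspace $P\mathcal{H}$. Formally, we have to solve the eigenvalue equation
$P U_2(\chi, p)\, |\psi\rangle = e^{-i\omega} |\psi\rangle$, with $|\psi\rangle \in P\mathcal{H}$. Conveniently, we write a vector $|\psi\rangle \in P\mathcal{H}$ in the form

\[ |\psi\rangle = \sum_{z \in \mathbb{Z}} \begin{pmatrix} \psi^1(z) \\ 0 \\ 0 \\ \psi^4(z) \end{pmatrix} \otimes |2z + 1\rangle + \sum_{z \in \mathbb{Z}} \begin{pmatrix} 0 \\ \psi^2(z) \\ \psi^3(z) \\ 0 \end{pmatrix} \otimes |2z\rangle, \tag{25} \]

and the antisymmetry condition becomes:

\[ \psi^{1,4}(-z) = -\psi^{1,4}(z - 1), \tag{26} \]
\[ \psi^2(-z) = -\psi^3(z). \tag{27} \]

The restriction of the walk to the subspace $P\mathcal{H}$ entails that the eigenvalue problem is equivalent
to the following system of equations:

\[ \begin{cases}
e^{-i\omega} \psi^1(z) = \nu^2 e^{i2p} \psi^1(z) - i\mu\nu e^{ip} e^{i\chi\delta_{z,0}} \psi^2(z) - i\mu\nu e^{ip} e^{i\chi\delta_{z,-1}} \psi^3(z+1) - \mu^2 \psi^4(z), \\
e^{-i\omega} \psi^2(z) = -i\mu\nu e^{ip} \psi^1(z-1) + \nu^2 e^{i\chi\delta_{z,1}} \psi^2(z-1) - \mu^2 e^{i\chi\delta_{z,0}} \psi^3(z) - i\mu\nu e^{-ip} \psi^4(z-1), \\
e^{-i\omega} \psi^3(z) = -i\mu\nu e^{ip} \psi^1(z) - \mu^2 e^{i\chi\delta_{z,0}} \psi^2(z) + \nu^2 e^{i\chi\delta_{z,-1}} \psi^3(z+1) - i\mu\nu e^{-ip} \psi^4(z), \\
e^{-i\omega} \psi^4(z) = -\mu^2 \psi^1(z) - i\mu\nu e^{-ip} e^{i\chi\delta_{z,0}} \psi^2(z) - i\mu\nu e^{-ip} e^{i\chi\delta_{z,-1}} \psi^3(z+1) + \nu^2 e^{-i2p} \psi^4(z).
\end{cases} \tag{28} \]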

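The most general solution of Equation (28) for $p \notin \{0, \pi/2\}$ has two forms:

\[ U_2(\chi, p)\, |\psi_{\pm\infty}\rangle = e^{\pm i2p} |\psi_{\pm\infty}\rangle, \qquad
\psi_{\pm\infty}(z) = \begin{cases}
\begin{pmatrix} \zeta_{\pm\infty} \\ \eta_{\pm\infty} \\ -\eta_{\pm\infty} \\ \zeta'_{\pm\infty} \end{pmatrix} \delta_{z,0}, & z \geq 0, \\
\text{antisymmetrized}, & z < 0,
\end{cases} \tag{29} \]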

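and

\[ \psi(z) = \begin{cases}
\sum_{s,r=\pm} \int_S dk\, g_\omega^{sr}(k)\, w_k^{sr}(z), & z > 0, \\
\text{antisymmetrized}, & z < 0,
\end{cases}
\qquad
w_k^{sr}(z) := \begin{pmatrix} v_k^{sr,1} e^{-i(2z+1)k} \\ v_k^{sr,2} e^{-i2zk} \\ v_k^{sr,3} e^{-i2zk} \\ v_k^{sr,4} e^{-i(2z+1)k} \end{pmatrix}, \]
\[ \psi(0) = \begin{pmatrix} \sum_{s,r=\pm} \int_S dk\, g_\omega^{sr}(k)\, v_k^{sr,1} \\ \xi \\ -\xi \\ \sum_{s,r=\pm} \int_S dk\, g_\omega^{sr}(k)\, v_k^{sr,4} \end{pmatrix}, \tag{30} \]

with $k = k_R + i k_I$, $S := \{ k \in \mathbb{C} \mid k_R \in (-\pi, \pi] \}$, and $g_\omega^{sr}$ satisfying the condition

\[ e^{-i\omega} \neq e^{-i\omega_{sr}(p,k)} \implies g_\omega^{sr}(k) = 0. \tag{31} \]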

Solving Equation (28) now amounts to finding the function $g_\omega^{sr}$. Let us now study the equation

\[ e^{-i\omega_{sr}(p,k)} = e^{-i\omega}. \tag{32} \]

Since $e^{-i\omega_{sr}(p,k)}$ has to be an eigenvalue of $U_2(\chi, p)$, $\omega_{sr}(p, k)$ must be real and thus $k \in \Gamma_f$ or
$k \in \Gamma_l$ with $l = 0, \pm 1, 2$, so we conveniently define the sets:

\[ \Omega_f^{sr} := \{ e^{-i\omega_{sr}(p,k)} \mid k \in \Gamma_f \}, \qquad \Omega_l^{sr} := \{ e^{-i\omega_{sr}(p,k)} \mid k \in \Gamma_l \}, \tag{33} \]
\[ \Gamma_f := \{ k \in S \mid k_I = 0 \}, \qquad \Gamma_l := \{ k \in S \mid k_R = l\pi/2 \}, \qquad l = 0, \pm 1, 2. \tag{34} \]

It is easy to see that $\Omega_f^{sr} \cap \Omega_l^{sr} = \emptyset$ for all $s$, $r$ and $l$, and the range of the function $e^{-i\omega_{sr}(p,k)}$
covers the entire unit circle except for the points $e^{\pm i2p}$. Therefore, we can discuss separately the case
$e^{-i\omega} \in \Omega_f^{sr}$ and the case $e^{-i\omega} \in \Omega_l^{sr}$. A solution with $e^{-i\omega} = e^{\pm i2p}$ actually exists, corresponding to the
function of Equation (29), and it will be discussed in Section 5.3.
Let us start with the case $e^{-i\omega} \in \Omega_f^{sr}$, which will lead to the characterization of the continuous
spectrum of the Thirring walk $U_2(\chi, p)$ and of the scattering solutions.

5.1. Scattering Solutions

In this section, we assume $p \notin \{0, \pi/2\}$ with $e^{-i\omega} \in \Omega_f^{sr}$. This implies that $e^{-i\omega} \neq e^{\pm i2p}$:
indeed, as one can notice from Figure 1, the lines $\omega = \pm 2p$ lie entirely in the gaps between the curves
$\omega = \pm 2\omega(p)$ and $\omega = \pm(\pi - 2\operatorname{Arccos}(\nu \sin p))$. The solution is thus the one given in Equation (30).
One can prove that $\Omega_f^{++} = \Omega_f^{--}$ and $\Omega_f^{+-} = \Omega_f^{-+}$. Furthermore, as one can notice from Figure 2,
there are four values of the triple $(s, r, k)$ such that $e^{-i\omega_{sr}(p,k)} = e^{-i\omega}$ for a given value of $e^{-i\omega}$: if the
triple $(+, +, k)$ is a solution, so are $(+, +, -k)$, $(-, -, \pi - k)$ and $(-, -, k - \pi)$; and if $(+, -, k)$ is
a solution, then also $(+, -, \pi - k)$, $(-, +, -k)$ and $(-, +, k - \pi)$ are solutions. This result greatly
simplifies Equation (30). Indeed, the sum over $s, r$ and the integral over $k$ reduce to the sum of
four terms:

\[ \begin{aligned}
\psi_k^{\pm,1}(z) &:= (\alpha_k^\pm v_k^{+\pm,1} + \delta_k^\pm v_{k-\pi}^{-\mp,1})\, e^{-i(2z+1)k} - (\beta_k^\pm v_{-k}^{\pm+,1} + \gamma_k^\pm v_{\pi-k}^{\mp-,1})\, e^{i(2z+1)k}, & z \geq 0, \\
\psi_k^{\pm,2}(z) &:= (\alpha_k^\pm v_k^{+\pm,2} - \delta_k^\pm v_{k-\pi}^{-\mp,2})\, e^{-i2zk} - (\beta_k^\pm v_{-k}^{\pm+,2} - \gamma_k^\pm v_{\pi-k}^{\mp-,2})\, e^{i2zk}, & z > 0, \\
\psi_k^{\pm,3}(z) &:= (\alpha_k^\pm v_k^{+\pm,3} - \delta_k^\pm v_{k-\pi}^{-\mp,3})\, e^{-i2zk} - (\beta_k^\pm v_{-k}^{\pm+,3} - \gamma_k^\pm v_{\pi-k}^{\mp-,3})\, e^{i2zk}, & z > 0, \\
\psi_k^{\pm,4}(z) &:= (\alpha_k^\pm v_k^{+\pm,4} + \delta_k^\pm v_{k-\pi}^{-\mp,4})\, e^{-i(2z+1)k} - (\beta_k^\pm v_{-k}^{\pm+,4} + \gamma_k^\pm v_{\pi-k}^{\mp-,4})\, e^{i(2z+1)k}, & z \geq 0, \\
\psi_k^{\pm,2}(0) &= -\psi_k^{\pm,3}(0) := \xi.
\end{aligned} \tag{35} \]
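
The fourfold degeneracy behind Equation (35) can be checked numerically. The sketch below assumes NumPy and illustrative parameter values that are not from the paper; it evaluates e^{-iω_sr(p,k)} at the four sign and momentum combinations that label the vectors in Equation (35), namely v_k, v_{k−π}, v_{−k} and v_{π−k} with their respective superscripts.

```python
import numpy as np

def omega(q, nu):
    """Dispersion relation of Equation (5)."""
    return np.arccos(nu * np.cos(q))

def eigenvalue(p, k, s, r, nu):
    """exp(-i omega_sr(p, k)) with omega_sr = s*omega(p + k) + r*omega(p - k)."""
    return np.exp(-1j * (s * omega(p + k, nu) + r * omega(p - k, nu)))

nu, p, k = 0.8, np.pi / 6, 0.7  # illustrative values

# Upper sign in Equation (35): v_k^{++}, v_{k-pi}^{--}, v_{-k}^{++}, v_{pi-k}^{--}
reference = eigenvalue(p, k, +1, +1, nu)
for s, r, kk in [(-1, -1, k - np.pi), (+1, +1, -k), (-1, -1, np.pi - k)]:
    assert np.isclose(eigenvalue(p, kk, s, r, nu), reference)

# Lower sign in Equation (35): v_k^{+-}, v_{k-pi}^{-+}, v_{-k}^{-+}, v_{pi-k}^{+-}
reference = eigenvalue(p, k, +1, -1, nu)
for s, r, kk in [(-1, +1, k - np.pi), (-1, +1, -k), (+1, -1, np.pi - k)]:
    assert np.isclose(eigenvalue(p, kk, s, r, nu), reference)
```

Both checks follow from the identity ω(q ± π) = π − ω(q), a consequence of Equation (5).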

As we will see, the original problem can be simplified in this way to an algebraic problem
with a finite set of equations. We note that the fact that the equation $e^{-i\omega_{sr}(p,k)} = e^{-i\omega}$ has a finite
number of solutions is a consequence of the fact that we are considering a model in one spatial
dimension. However, in analogous one-dimensional Hamiltonian models (e.g., the Hubbard model),
the degeneracy of the eigenvalues is two.

Figure 1. Continuous spectrum of the two-particle walk as a function of the total momentum
$p \in [0, \pi/2]$ with mass parameter $m = 0.7$. The continuous spectrum is the same as in the free
case. The solid blue curves are described by the functions $\omega = \pm 2\omega(p)$, and the red ones by
$\omega = \pm(\pi - 2\operatorname{Arccos}(\nu \sin p))$. As one can notice, the light-red lines $\omega = \pm 2p$ lie entirely in the
gaps between the solid curves, highlighting the fact that $e^{\pm i2p}$ is not in the range of $e^{-i\omega_{sr}(p,k)}$ for
$p \neq 0, \pi/2$ (see text).

Figure 2. Spectrum of the walk for $m = 0.6$ and $p = \pi/6$ as a function of $k$. The colours highlight
the different ranges of eigenvalues corresponding to the dispersion relation $\omega_{sr}(p, k)$. The range of
$\omega_{sr}(p, k)$ is understood to be computed mod $2\pi$. One can notice that there are four values of the
relative momentum $k$ having the same value of the dispersion relation ($\omega = 2$ in the figure). This is in
contrast to the Hamiltonian model, for which there are only two solutions.

Let us consider for the sake of simplicity the solution of the kind $\psi_k^{+,j}(z)$, since the other one can
be analysed in a similar way. Using the notation of Appendix A, Equation (35) reduces to the expressions
(dropping the $+$ superscript)

\[ \begin{aligned}
\psi_k^1(z) &= a\,[\lambda e^{-i(2z+1)k} - \rho e^{i(2z+1)k}], \\
\psi_k^2(z) &= \lambda b\, e^{-i2zk} - \rho c\, e^{i2zk}, \\
\psi_k^3(z) &= \lambda c\, e^{-i2zk} - \rho b\, e^{i2zk}, \\
\psi_k^4(z) &= d\,[\lambda e^{-i(2z+1)k} - \rho e^{i(2z+1)k}], \\
\lambda &:= \alpha_k + \delta_k, \qquad \rho := \beta_k + \gamma_k, \\
\psi_k^2(0) &= \xi.
\end{aligned} \tag{36} \]

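We notice that now the number of unknown parameters is further reduced to three, namely
$\lambda$, $\rho$, and $\xi$. Clearly, one of the parameters can be fixed by choosing arbitrarily the normalization.
From now on, we fix $\lambda = 1$ and define $T_+ := \rho$. Equation (36) has to satisfy the recurrence relations
of Equation (28) for $z = 0$ and $z = 1$, while, for $z > 1$, it is automatically satisfied. For $z = 0$,
Equation (28), together with the antisymmetry conditions of Equations (26) and (27), becomes

\[ e^{-i\omega} \psi_k^1(0) = \nu^2 e^{i2p} \psi_k^1(0) - i\mu\nu e^{ip} e^{i\chi} \xi - i\mu\nu e^{ip} \psi_k^3(1) - \mu^2 \psi_k^4(0), \tag{37} \]
\[ e^{-i\omega} \xi = i\mu\nu e^{ip} \psi_k^1(0) - \nu^2 \psi_k^3(1) + \mu^2 e^{i\chi} \xi + i\mu\nu e^{-ip} \psi_k^4(0), \tag{38} \]
\[ -e^{-i\omega} \xi = -i\mu\nu e^{ip} \psi_k^1(0) - \mu^2 e^{i\chi} \xi + \nu^2 \psi_k^3(1) - i\mu\nu e^{-ip} \psi_k^4(0), \tag{39} \]
\[ e^{-i\omega} \psi_k^4(0) = -\mu^2 \psi_k^1(0) - i\mu\nu e^{-ip} e^{i\chi} \xi - i\mu\nu e^{-ip} \psi_k^3(1) + \nu^2 e^{-i2p} \psi_k^4(0). \tag{40} \]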

Starting from Equation (37), we can notice that $\nu^2 e^{i2p} a - i\mu\nu e^{ip} e^{ik} b - i\mu\nu e^{ip} e^{-ik} c - \mu^2 d = e^{-i\omega} a$,
where we employed the notation of Appendix A, so that we obtain $\xi = e^{-i\chi}(b - T_+ c)$. We can
then substitute this expression in Equation (39) and use the relations

\[ -i\mu\nu e^{ip} e^{ik} a + \nu^2 e^{i2k} b - \mu^2 c - i\mu\nu e^{-ip} e^{ik} d = e^{-i\omega} b, \tag{41} \]
\[ -i\mu\nu e^{ip} e^{-ik} a - \mu^2 b + \nu^2 e^{-i2k} c - i\mu\nu e^{-ip} e^{-ik} d = e^{-i\omega} c, \tag{42} \]

to obtain the expression

\[ e^{-i\chi}(b - T_+ c) = T_+ b - c, \tag{43} \]

and thus

\[ T_+ = \frac{c + e^{-i\chi} b}{b + e^{-i\chi} c} = \frac{g_+(p + k) + e^{-i\chi} g_+(p - k)}{g_+(p - k) + e^{-i\chi} g_+(p + k)}. \tag{44} \]
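
The step from Equation (39) to Equation (43) can be spelled out as follows. This is a sketch that uses only Equations (36), (41) and (42) with $\lambda = 1$ and $\rho = T_+$, and it treats $a, b, c, d$ as the $k$-dependent coefficients of Appendix A without assuming their explicit form. Inserting $\psi_k^1(0) = a(e^{-ik} - T_+ e^{ik})$, $\psi_k^4(0) = d(e^{-ik} - T_+ e^{ik})$, $\psi_k^3(1) = c\, e^{-i2k} - T_+ b\, e^{i2k}$ and $e^{i\chi}\xi = b - T_+ c$ into the right-hand side of Equation (39), and collecting the terms with and without $T_+$, gives

\[ \begin{aligned}
-e^{-i\omega} \xi ={}& \left[ -i\mu\nu e^{ip} e^{-ik} a - \mu^2 b + \nu^2 e^{-i2k} c - i\mu\nu e^{-ip} e^{-ik} d \right] \\
& - T_+ \left[ -i\mu\nu e^{ip} e^{ik} a + \nu^2 e^{i2k} b - \mu^2 c - i\mu\nu e^{-ip} e^{ik} d \right] \\
={}& e^{-i\omega} (c - T_+ b),
\end{aligned} \]

where the first bracket equals $e^{-i\omega} c$ by Equation (42) and the second equals $e^{-i\omega} b$ by Equation (41). Hence $\xi = T_+ b - c$, and comparing with $\xi = e^{-i\chi}(b - T_+ c)$ yields Equation (43).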
For these values of $\xi$ and $T_+$ one can verify that Equation (28) is satisfied also for $z = 1$,
thus concluding the derivation. For the solution of the kind $\psi_k^{-,j}(z)$, we can follow a similar reasoning,
obtaining the analogous quantity $T_-$:

\[ T_- := \frac{g_+(p + k) + e^{-i\chi} g_-(p - k)}{g_-(p - k) + e^{-i\chi} g_+(p + k)}. \tag{45} \]

It is worth noticing that $T_\pm$ is of unit modulus for $k \in (-\pi, \pi]$.
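
The unit-modulus property of $T_\pm$ for real $k$ follows from the fact that $g_s$ is purely imaginary for real argument, and it is easy to confirm numerically. The sketch below assumes NumPy; the parameter values and the function name `transmission` are illustrative choices, not taken from the paper.

```python
import numpy as np

def omega(q, nu):
    """Dispersion relation of Equation (5)."""
    return np.arccos(nu * np.cos(q))

def g(q, s, nu):
    """The function g_s(q) defined after Equation (7)."""
    return -1j * (s * np.sin(omega(q, nu)) + nu * np.sin(q))

def transmission(p, k, chi, nu, sign):
    """T_+ of Equation (44) for sign = +1, T_- of Equation (45) for sign = -1."""
    g_plus = g(p + k, +1, nu)
    g_other = g(p - k, sign, nu)
    phase = np.exp(-1j * chi)
    return (g_plus + phase * g_other) / (g_other + phase * g_plus)

nu, p, chi = 0.8, np.pi / 6, 2 * np.pi / 3  # illustrative values
for k in np.linspace(-np.pi, np.pi, 41):
    for sign in (+1, -1):
        assert np.isclose(abs(transmission(p, k, chi, nu, sign)), 1.0)
```

For complex $k$ the modulus is no longer constrained, which is what allows the condition $T_\pm = 0$ used for the bound states.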

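The final form of the solution is:

\[ \begin{aligned}
\psi_k^{\pm,1}(z) &= (v_k^{+\pm,1} + v_{k-\pi}^{-\mp,1})\, e^{-i(2z+1)k} - T_\pm (v_{-k}^{\pm+,1} + v_{\pi-k}^{\mp-,1})\, e^{i(2z+1)k}, \\
\psi_k^{\pm,2}(z) &= e^{-i\chi\delta_{z,0}} \left[ (v_k^{+\pm,2} - v_{k-\pi}^{-\mp,2})\, e^{-i2zk} - T_\pm (v_{-k}^{\pm+,2} - v_{\pi-k}^{\mp-,2})\, e^{i2zk} \right], \\
\psi_k^{\pm,3}(z) &= (v_k^{+\pm,3} - v_{k-\pi}^{-\mp,3})\, e^{-i2zk} - T_\pm (v_{-k}^{\pm+,3} - v_{\pi-k}^{\mp-,3})\, e^{i2zk}, \\
\psi_k^{\pm,4}(z) &= (v_k^{+\pm,4} + v_{k-\pi}^{-\mp,4})\, e^{-i(2z+1)k} - T_\pm (v_{-k}^{\pm+,4} + v_{\pi-k}^{\mp-,4})\, e^{i(2z+1)k},
\end{aligned} \tag{46} \]

which in terms of the relative coordinate $y$ can be written as

\[ \psi_k^\pm(y) = \begin{cases}
e^{-i\chi\delta_{z,0}\delta_{j,2}} \left[ (v_k^{+\pm} + v_{k-\pi}^{-\mp})\, e^{-iky} - T_\pm (v_{-k}^{\pm+} + v_{\pi-k}^{\mp-})\, e^{iky} \right], & y \geq 0, \\
\text{antisymmetrized}, & y < 0.
\end{cases} \tag{47} \]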

We can interpret such a solution as a scattering of plane waves for which the coefficient $T_\pm$ plays
the role of the transmission coefficient. Since the total momentum is a conserved quantity, the two
particles can only exchange their momenta, as expected from a theory in one dimension. Furthermore,
for each value $k$ of the relative momentum, the two particles can also acquire an additional phase of $\pi$.
As the interaction is a compact perturbation of the free evolution, the continuous spectrum is the same
as that of the free walk. Equation (46) provides the generalized eigenvectors of $U_2(\chi, p)$ corresponding
to the continuous spectrum $\sigma_c = \Omega_f^{++} \cup \Omega_f^{+-}$.

5.2. Bound States

In the previous section, we derived the solutions in the continuous spectrum, which can
be interpreted as scattering plane waves in one spatial dimension. We seek now the solutions
corresponding to the discrete spectrum, namely solutions with eigenvalues in any one of the sets
$\Omega_l^{sr}$. The derivation of the solution follows similar steps as for the scattering solutions. In particular,
the degeneracy in $k$ is the same: there are four solutions to the equation $e^{-i\omega_{sr}(p,k)} = e^{-i\omega}$ even in
this case, as proved in [45]. Therefore, the general form of the solution in this case can be written
again as in Equation (35) and, following the same reasoning, one obtains the same set of solutions
as in Equation (46). At this stage, we have not yet imposed that the solution be a proper eigenvector in the
Hilbert space $\mathcal{H}$. To this end, we have to set $T_\pm = 0$ to eliminate the exponentially divergent terms in
Equation (46). As one can prove, the equation $T_\pm = 0$ has only one solution for fixed values of $\chi$ and
$p$. More precisely, there is a unique $k \in \Gamma_0 \cup \Gamma_{-1} \cup \Gamma_1 \cup \Gamma_2$, with $k_I < 0$ and $e^{i\chi} \notin \{1, -1\}$, such that
either $T_+ = 0$ or $T_- = 0$.
In other words, for each pair of values $(\chi, p)$, the walk $U_2(\chi, p)$ has one and only one eigenvector
corresponding to an eigenvalue in the point spectrum. Such an eigenvector can be written as

\[ \begin{aligned}
\psi_{\tilde{k}}^1(z) &= (v_{\tilde{k}}^{+\pm,1} + v_{\tilde{k}-\pi}^{-\mp,1})\, e^{-i(2z+1)\tilde{k}}, \\
\psi_{\tilde{k}}^2(z) &= e^{-i\chi\delta_{z,0}} (v_{\tilde{k}}^{+\pm,2} - v_{\tilde{k}-\pi}^{-\mp,2})\, e^{-i2z\tilde{k}}, \\
\psi_{\tilde{k}}^3(z) &= (v_{\tilde{k}}^{+\pm,3} - v_{\tilde{k}-\pi}^{-\mp,3})\, e^{-i2z\tilde{k}}, \\
\psi_{\tilde{k}}^4(z) &= (v_{\tilde{k}}^{+\pm,4} + v_{\tilde{k}-\pi}^{-\mp,4})\, e^{-i(2z+1)\tilde{k}},
\end{aligned} \tag{48} \]

where $\tilde{k}$ is the solution of $T_+ = 0$ or $T_- = 0$, with the sign $\pm$ chosen accordingly. More compactly, in the $y$
coordinate, the solution can be written as

\[ \psi_{\tilde{k}}(y) = \begin{cases}
e^{-i\chi\delta_{z,0}\delta_{j,2}} (v_{\tilde{k}}^{+\pm} + v_{\tilde{k}-\pi}^{-\mp})\, e^{-i\tilde{k}y}, & y \geq 0, \\
\text{antisymmetrized}, & y < 0.
\end{cases} \tag{49} \]

In Figure 3, the discrete spectrum of the interacting walk together with the continuous
spectrum as a function of the total momentum p is depicted. The solid curves in the gaps between
the continuous bands denote the discrete spectrum for different values of the coupling constant
χ = 2π/3, 3π/7, −3π/7, −2π/3. Molecule states appear also in the Hadamard walk with the same
on-site interaction [44].
Referring to Figure 4, we show the evolution of two particles initially prepared in a singlet
state localized at the origin. From the figure, one can appreciate the appearance of the bound
state component that has non-vanishing overlap with the initial state. The bound state,
being exponentially decaying in the relative coordinate $y$, is localized on the diagonal of the plot, that
is, where the two particles lie at the same point.
In Figure 5, the probability distribution of the bound state corresponding to a choice of parameters
χ = 0.2π and p = 0.035π is depicted. The plot highlights the exponential decay of the tails, which is
the characterizing feature of the bound state.


Figure 3. Complete spectrum of the two-particle Thirring walk as a function of the total momentum
$p$ with mass parameter $m = 0.7$. The continuous spectrum is as in Figure 1. The solid lines in the
gaps show the point spectrum for different values of the coupling constant: from top to bottom,
$\chi = 2\pi/3, 3\pi/7, -3\pi/7, -2\pi/3$. It is worth noticing that, for each pair $(\chi, p)$, there is only one value
in the discrete spectrum. The light-red lines $\omega = \pm 2p$ lie entirely in the gap between the continuous
bands, highlighting the fact that $e^{\pm i2p}$ is not in the range of $e^{-i\omega_{sr}(p,k)}$ for $p \neq 0, \pi/2$; for a given
coupling constant $\chi$, $e^{\pm i2p}$ is an eigenvalue for $p = \chi/2$.


Figure 4. We show for comparison the free evolution (a) and the interacting one (b) highlighting the
appearance of bound states components along the diagonal, namely when the two particles are at the
same site (i.e., x1 = x2 ), where x1 and x2 denote the positions of the two particles. The plots show
the probability distribution p( x1 , x2 ) in position space after t = 32 time-steps. The chosen value of
the mass parameter is m = 0.6 and the coupling constant is χ = π/2. The two particles are initially
prepared in a singlet state located at the origin.

Figure 5. We show the evolution of a bound state of the two particles peaked around the value of the total
momentum $p = 0.035\pi$. The mass parameter is $m = 0.6$ and the coupling constant is $\chi = 0.2\pi$. Panel (a)
depicts the probability distribution of the initial state; panel (b) depicts the probability distribution
of the evolved state after $t = 128$ time-steps. One can notice that, in the relative coordinate $x_1 - x_2$,
the probability distribution remains concentrated on the diagonal, highlighting the fact that the two
particles are in a bound state. The diffusion of the state happens only in the centre-of-mass coordinate.

5.3. Solution for $e^{-i\omega} = e^{\pm i2p}$

Thus far, we have studied proper eigenvectors that decay exponentially as the two particles move
further apart. However, the previous analysis does not cover the particular case $e^{-i\omega} = e^{\pm i2p}$,
since the range of $e^{-i\omega_{sr}(p,k)}$ does not include the two points $e^{\pm i2p}$ of the unit circle.
We now study the solutions with $e^{-i\omega} = e^{\pm i2p}$ having the form given in Equation (29). One can
prove that such solutions are non-vanishing only for $z = 0$ on $P\mathcal{H}$, namely we look for a solution of
the form

$$
|\psi\rangle =
\begin{pmatrix} -\zeta \\ 0 \\ 0 \\ -\zeta' \end{pmatrix} \otimes |{-1}\rangle +
\begin{pmatrix} 0 \\ \eta \\ -\eta \\ 0 \end{pmatrix} \otimes |0\rangle +
\begin{pmatrix} \zeta \\ 0 \\ 0 \\ \zeta' \end{pmatrix} \otimes |1\rangle .
\tag{50}
$$

Subtracting the first and the last equations of (28) using (50), we obtain the following equation:

$$
(e^{-i\omega} - e^{i2p})\,\zeta = e^{i2p}(e^{-i\omega} - e^{-i2p})\,\zeta' .
\tag{51}
$$

If both $\zeta$ and $\zeta'$ are non-zero, one can prove that a solution does not exist and thus we have to
consider the two cases $\zeta' = 0$ and $\zeta = 0$ separately. Starting from $\zeta' = 0$, Equation (51) imposes that
$e^{-i\omega} = e^{i2p}$, meaning that, if a solution exists in this case, it is an eigenvector corresponding to the
eigenvalue $e^{i2p}$. From the second equation of (28), we obtain the relation

$$
(1 - \mu^2 e^{i(\chi-2p)})\,\eta = i\mu\nu e^{-ip}\,\zeta
\tag{52}
$$

and, using the first equation of (28), it turns out that a solution exists only if $e^{i\chi} = e^{i2p}$, as expected,
since, otherwise, the case of Section 5.2 would have held. The other case, namely $e^{-i\omega} = e^{-i2p}$, can be
studied analogously. Let us, then, denote as $|\psi_{\pm\infty}\rangle$ such proper eigenvectors with eigenvalue $e^{\pm i2p}$
for $e^{i\chi} = e^{\pm i2p}$ and, choosing $\eta = \mu/\nu$ as the value for the free parameter $\eta$, we obtain the following
expression for $|\psi_{\pm\infty}\rangle$:

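$$
|\psi_{\pm\infty}\rangle =
i e^{\pm ip} \begin{pmatrix} \frac{1\pm 1}{2} \\ 0 \\ 0 \\ -\frac{1\mp 1}{2} \end{pmatrix} \otimes |{-1}\rangle +
\begin{pmatrix} 0 \\ \frac{\mu}{\nu} \\ -\frac{\mu}{\nu} \\ 0 \end{pmatrix} \otimes |0\rangle +
i e^{\pm ip} \begin{pmatrix} -\frac{1\pm 1}{2} \\ 0 \\ 0 \\ \frac{1\mp 1}{2} \end{pmatrix} \otimes |1\rangle .
\tag{53}
$$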

Such solutions provide a special case of molecule states (namely, proper eigenvectors of $U_2(\chi, p)$):
being localized on a few sites, they differ from the previous solutions, which show an exponential decay in
the relative coordinate.
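
As a quick numerical cross-check of Equation (52), the sketch below solves it for ζ in the case $e^{i\chi} = e^{i2p}$ with the choice η = μ/ν. It assumes the normalization μ² + ν² = 1, which is not stated in this section and is an assumption of the sketch; the function name and the numerical values of μ and p are purely illustrative.

```python
import cmath
import math


def zeta_from_eq52(mu, p):
    """Solve Eq. (52) for zeta, taking e^{i chi} = e^{i 2p} and eta = mu / nu.

    Assumption of this sketch: mu**2 + nu**2 == 1.
    """
    nu = math.sqrt(1.0 - mu ** 2)
    chi = 2.0 * p                      # resonance condition e^{i chi} = e^{i 2p}
    eta = mu / nu                      # choice of the free parameter eta
    lhs = (1.0 - mu ** 2 * cmath.exp(1j * (chi - 2.0 * p))) * eta
    return lhs / (1j * mu * nu * cmath.exp(-1j * p))


mu, p = 0.6, 0.3                       # illustrative values only
zeta = zeta_from_eq52(mu, p)
print(zeta, -1j * cmath.exp(1j * p))   # the two numbers should coincide
```

Under these assumptions one finds ζ = −i e^{ip}, so that −ζ reproduces the prefactor i e^{ip} multiplying the component on |−1⟩ in Equation (53).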

5.4. Solutions for p ∈ {0, π/2}

The solutions that we presented in the previous discussion do not cover the extreme values
$p = 0, \pi/2$ (see [45] for a reference). Let us consider for definiteness the case $p = 0$, since the
other case is obtained in a similar way. For $e^{-i\omega} \neq 1$, the previous analysis still holds. Indeed,
noticing that $\omega_{\pm\pm}(0, k) = \pm 2\omega(k)$, we have $\omega(k) \in \mathbb{R}$ and $\omega(k) \neq 0$ if and only if $k \in \Gamma_f \cup \Gamma_0 \cup \Gamma_2$,
whereas $\omega_{\pm\mp}(0, k) = 0$ for all $k \in \mathbb{C}$. This means that the solutions $|\psi_k^{+}\rangle$ of Equation (46) are actually
eigenvectors of $U_2(\chi, 0)$. Thus, the spectrum consists of a continuous part, given by the arc of the
unit circle containing $-1$ and having $e^{\pm 2i\omega(0)}$ as extremes, and a point spectrum with two points:
$e^{-2i\omega(\tilde k)}$, where $\tilde k$ is the solution of $T_+ = 0$ for $p = 0$, and 1. As shown in [45], 1 is a separated part of
the spectrum of $U_2(\chi, 0)$ and the corresponding eigenspace is a separable Hilbert space of stationary
bound states. This fact underlines an important feature of the Thirring walk not shared by analogous
Hamiltonian models. It is remarkable that this behaviour occurs also for the free walk with $\chi = 0$.
In Figure 6, we show the probability distribution of two states having the properties discussed here.
It is worth noticing that all the states $v^{+-}_k$ with $k \in (-\pi, \pi]$ are eigenvectors relative to the eigenvalue 1,
and thus they generate a subspace on which the walk acts as the identity. We remark that this behaviour
relies on the fact that the dispersion relation in one dimension is an even function of $k$.

Figure 6. We show the case of two proper eigenstates for $p = 0$. In both cases the mass parameter
is $m = 0.6$. (a): probability distribution $p(y)$ in the relative coordinate $y$ of $\int dk\,(v^{+-}_k - v^{-+}_k)\,e^{-iyk}$.
(b): probability distribution $p(y)$ in the $y$-coordinate of $\int dk\,(v^{+-}_k + v^{-+}_k)\,e^{-iyk}$.

6. Conclusions
In this work, we reviewed the Thirring quantum walk [45], providing a simpliﬁed derivation of
its solutions for Fermionic particles. The simplified derivation relies on the symmetry properties of the
walk evolution operator, allowing for separating the subspace of solutions affected by the interaction
from the subspace where the interaction step acts trivially. The interaction term is the most general
number-preserving interaction in one dimension, whereas the free evolution is provided by the Dirac
QW [17].

We showed the explicit derivation of the scattering solutions (solutions for the continuous
spectrum) as well as of the bound-state solutions. The Thirring walk also features localized bound
states (namely, states whose support is finite on the lattice) when $e^{-i\omega} = e^{\pm i2p}$. Such solutions exist
only when the coupling constant is $\chi = 2p$. Figure 4 depicts the evolution of a perfectly localized
state showing the overlap with bound state components. In Figure 5, we reported the evolution
of a bound state of the two particles peaked around a certain value of the total momentum: one can
appreciate that the probability distribution remains localized on the main diagonal during the evolution.
Finally, we showed that bound states exist also for a vanishing coupling constant (even though
this is true only for a finite set of values of the total momentum $p$), which is a striking difference
between the discrete model of the present work and corresponding Hamiltonian systems.

Author Contributions: G.M.D. and P.P. conceived and designed the model; A.B. and A.T. performed the calculations;
N.M. reviewed the derivation exploiting the symmetries of the walk, and performed the numerical analysis.
Funding: This publication was made possible through the support of a grant from the John Templeton Foundation
under the project ID# 60609 Causal Quantum Structures. The opinions expressed in this publication are those of the
authors and do not necessarily reflect the views of the John Templeton Foundation.
Conflicts of Interest: The authors declare no conflict of interest. The funding sponsors had no role in the design
of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the
decision to publish the results.

Appendix A. Notation
For the single particle walk of Equation (3), the eigenstates can be written as

$$
v^{s}_{p} = \frac{1}{|N_s(p)|} \begin{pmatrix} -i\mu \\ g_s(p) \end{pmatrix}, \qquad
g_s(p) := -i\,(s \sin\omega(p) + \nu \sin p),
\tag{A1}
$$

with $|N_s(p)|^2 = \mu^2 + |g_s(p)|^2$. For the two-particle walk, we define $v^{sr}_{k} := v^{s}_{p+k} \otimes v^{r}_{p-k}$. If $s = r$,
then we name the related eigenspace the even eigenspace; whereas, if $s \neq r$, we call the related
eigenspace the odd eigenspace. As proven in item 3 of Lemma 1 of [45], for a given $k$, the degeneracy is
4 both in the even and in the odd case. Namely, if the triple $(+, +, k)$ is a solution, then also $(+, +, -k)$,
$(-, -, \pi - k)$ and $(-, -, k - \pi)$ are solutions; if the triple $(+, -, k)$ is a solution, then also $(+, -, \pi - k)$,
$(-, +, -k)$ and $(-, +, k - \pi)$ are solutions.
Explicitly, for the even case, we have:

$$
v^{++}_{k} \propto \begin{pmatrix} -\mu^2 \\ -i\mu\, g_+(p-k) \\ -i\mu\, g_+(p+k) \\ g_+(p+k)\, g_+(p-k) \end{pmatrix}, \qquad
v^{--}_{\pi-k} \propto \begin{pmatrix} -\mu^2 \\ i\mu\, g_+(p+k) \\ i\mu\, g_+(p-k) \\ g_+(p+k)\, g_+(p-k) \end{pmatrix},
\tag{A2}
$$

$$
v^{++}_{-k} \propto \begin{pmatrix} -\mu^2 \\ -i\mu\, g_+(p+k) \\ -i\mu\, g_+(p-k) \\ g_+(p+k)\, g_+(p-k) \end{pmatrix}, \qquad
v^{--}_{k-\pi} \propto \begin{pmatrix} -\mu^2 \\ i\mu\, g_+(p-k) \\ i\mu\, g_+(p+k) \\ g_+(p+k)\, g_+(p-k) \end{pmatrix}.
\tag{A3}
$$

Analogously for the odd case, the eigenstates are

$$
v^{+-}_{k} \propto \begin{pmatrix} -\mu^2 \\ -i\mu\, g_-(p-k) \\ -i\mu\, g_+(p+k) \\ g_+(p+k)\, g_-(p-k) \end{pmatrix}, \qquad
v^{+-}_{\pi-k} \propto \begin{pmatrix} -\mu^2 \\ i\mu\, g_+(p+k) \\ i\mu\, g_-(p-k) \\ g_+(p+k)\, g_-(p-k) \end{pmatrix},
\tag{A4}
$$

$$
v^{-+}_{-k} \propto \begin{pmatrix} -\mu^2 \\ -i\mu\, g_+(p+k) \\ -i\mu\, g_-(p-k) \\ g_+(p+k)\, g_-(p-k) \end{pmatrix}, \qquad
v^{-+}_{k-\pi} \propto \begin{pmatrix} -\mu^2 \\ i\mu\, g_-(p-k) \\ i\mu\, g_+(p+k) \\ g_+(p+k)\, g_-(p-k) \end{pmatrix}.
\tag{A5}
$$

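In order to simplify the derivation of the solution, we adopt the following notation:

$$
v^{++}_{k} =: \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix}, \quad
v^{++}_{-k} = \begin{pmatrix} a \\ c \\ b \\ d \end{pmatrix}, \quad
v^{--}_{\pi-k} = \begin{pmatrix} a \\ -c \\ -b \\ d \end{pmatrix}, \quad
v^{--}_{k-\pi} = \begin{pmatrix} a \\ -b \\ -c \\ d \end{pmatrix},
\tag{A6}
$$

$$
v^{+-}_{k} =: \begin{pmatrix} a' \\ b' \\ c' \\ d' \end{pmatrix}, \quad
v^{-+}_{-k} = \begin{pmatrix} a' \\ c' \\ b' \\ d' \end{pmatrix}, \quad
v^{+-}_{\pi-k} = \begin{pmatrix} a' \\ -c' \\ -b' \\ d' \end{pmatrix}, \quad
v^{-+}_{k-\pi} = \begin{pmatrix} a' \\ -b' \\ -c' \\ d' \end{pmatrix}.
\tag{A7}
$$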
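
The component pattern of (A6) and (A7) can be checked numerically from the tensor-product definition of the two-particle vectors. The sketch below is only illustrative: it assumes the dispersion relation cos ω(p) = ν cos p and the normalization μ² + ν² = 1, neither of which is stated in this appendix, works with unnormalized vectors, and uses hypothetical helper names and arbitrary values of μ, p and k.

```python
import math

MU = 0.6                              # illustrative value
NU = math.sqrt(1.0 - MU ** 2)         # assumption: mu^2 + nu^2 = 1


def omega(q):
    # Assumed dispersion relation: cos(omega) = nu * cos(q)
    return math.acos(NU * math.cos(q))


def g(s, q):
    # g_s(q) of Eq. (A1), with s = +1 or -1
    return -1j * (s * math.sin(omega(q)) + NU * math.sin(q))


def v1(s, q):
    # Unnormalized single-particle vector (-i mu, g_s(q))
    return (-1j * MU, g(s, q))


def v2(s, r, p, k):
    # Unnormalized two-particle vector v^s_{p+k} (x) v^r_{p-k}
    x, y = v1(s, p + k), v1(r, p - k)
    return [x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1]]


def close(u, w, tol=1e-12):
    return all(abs(ui - wi) < tol for ui, wi in zip(u, w))


p, k = 0.4, 0.9                       # illustrative values
a, b, c, d = v2(+1, +1, p, k)         # even case, Eq. (A6)
a2, b2, c2, d2 = v2(+1, -1, p, k)     # odd case, Eq. (A7)

print(close(v2(+1, +1, p, -k), [a, c, b, d]))
print(close(v2(-1, -1, p, math.pi - k), [a, -c, -b, d]))
print(close(v2(-1, +1, p, k - math.pi), [a2, -b2, -c2, d2]))
```

The check relies on ω(q ± π) = π − ω(q), which follows from the assumed dispersion relation; with a different dispersion relation the sign patterns would have to be re-derived.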

References
1. Farhi, E.; Goldstone, J.; Gutmann, S. A Quantum Algorithm for the Hamiltonian NAND Tree. Theory Comput.
2008, 4, 169–190. doi:10.4086/toc.2008.v004a008. [CrossRef]
2. Feynman, R.P.; Hibbs, A.R.; Styer, D.F. Quantum Mechanics and Path Integrals; Volume 2, International Series
in Pure and Applied Physics; McGraw-Hill: New York, NY, USA, 1965.
3. Grossing, G.; Zeilinger, A. Quantum cellular automata. Complex Syst. 1988, 2, 197–208.
4. Ambainis, A.; Bach, E.; Nayak, A.; Vishwanath, A.; Watrous, J. One-dimensional Quantum Walks.
In Proceedings of the STOC ’01 Thirty-Third Annual ACM Symposium on Theory of Computing, Hersonissos,
Greece, 6–8 July 2001; ACM: New York, NY, USA, 2001; pp. 37–49. doi:10.1145/380752.380757. [CrossRef]
5. Reitzner, D.; Nagaj, D.; Bužek, V. Quantum Walks. Acta Phys. Slov. Rev. Tutor. 2011, 61, 603–725. [CrossRef]
6. Gross, D.; Nesme, V.; Vogts, H.; Werner, R. Index theory of one dimensional quantum walks and cellular
automata. Commun. Math. Phys. 2012, 310, 419–454. [CrossRef]
7. Shikano, Y. From Discrete Time Quantum Walk to Continuous Time Quantum Walk in Limit Distribution.
J. Comput. Theor. Nanosci. 2013, 10, 1558–1570. doi:10.1166/jctn.2013.3097. [CrossRef]
8. Childs, A.M.; Goldstone, J. Spatial search by quantum walk. Phys. Rev. A 2004, 70, 022314.
doi:10.1103/PhysRevA.70.022314. [CrossRef]
9. Portugal, R. Quantum Walks and Search Algorithms; Springer Science & Business Media: Berlin, Germany, 2013.
10. Douglas, B.L.; Wang, J.B. A classical approach to the graph isomorphism problem using quantum walks.
J. Phys. A Math. Theor. 2008, 41, 075303. [CrossRef]
11. Gamble, J.K.; Friesen, M.; Zhou, D.; Joynt, R.; Coppersmith, S.N. Two-particle quantum walks applied to the
graph isomorphism problem. Phys. Rev. A 2010, 81, 052313. doi:10.1103/PhysRevA.81.052313. [CrossRef]
12. Bialynicki-Birula, I. Weyl, Dirac, and Maxwell equations on a lattice as unitary cellular automata. Phys. Rev. D
1994, 49, 6920. [CrossRef]
13. Meyer, D. From quantum cellular automata to quantum lattice gases. J. Stat. Phys. 1996, 85, 551–574.
[CrossRef]
14. Yepez, J. Relativistic Path Integral as a Lattice-based Quantum Algorithm. Quantum Inf. Process. 2006,
4, 471–509. [CrossRef]
15. Arrighi, P.; Facchini, S. Decoupled quantum walks, models of the Klein-Gordon and wave equations. EPL
2013, 104, 60004. [CrossRef]

16. Bisio, A.; D’Ariano, G.M.; Tosini, A. Quantum ﬁeld as a quantum cellular automaton: The Dirac free
evolution in one dimension. Ann. Phys. 2015, 354, 244–264. doi:10.1016/j.aop.2014.12.016. [CrossRef]
17. D’Ariano, G.M.; Perinotti, P. Derivation of the Dirac equation from principles of information processing.
Phys. Rev. A 2014, 90, 062106. doi:10.1103/PhysRevA.90.062106. [CrossRef]
18. D’Ariano, G.M.; Mosco, N.; Perinotti, P.; Tosini, A. Path-integral solution of the one-dimensional Dirac
quantum cellular automaton. Phys. Lett. A 2014, 378, 3165–3168. doi:10.1016/j.physleta.2014.09.020.
[CrossRef]
19. D’Ariano, G.M.; Mosco, N.; Perinotti, P.; Tosini, A. Discrete Feynman propagator for the Weyl quantum walk
in 2 + 1 dimensions. EPL 2015, 109, 40012. doi:10.1209/0295-5075/109/40012. [CrossRef]
20. Arrighi, P.; Facchini, S.; Forets, M. Quantum walking in curved spacetime. Quantum Inf. Process. 2016,
15, 3467–3486. doi:10.1007/s11128-016-1335-7. [CrossRef]
21. Bisio, A.; D’Ariano, G.M.; Perinotti, P. Quantum cellular automaton theory of light. Ann. Phys. 2016,
368, 177–190. doi:10.1016/j.aop.2016.02.009. [CrossRef]
22. Arnault, P.; Debbasch, F. Quantum walks and discrete gauge theories. Phys. Rev. A 2016, 93, 052301.
doi:10.1103/PhysRevA.93.052301. [CrossRef]
23. Bisio, A.; D’Ariano, G.M.; Erba, M.; Perinotti, P.; Tosini, A. Quantum walks with a one-dimensional coin.
Phys. Rev. A 2016, 93, 062334. doi:10.1103/PhysRevA.93.062334. [CrossRef]
24. Mallick, A.; Mandal, S.; Chandrashekar, C.M. Neutrino oscillations in discrete-time quantum walk
framework. Eur. Phys. J. C 2017, 77, 85. doi:10.1140/epjc/s10052-017-4636-9. [CrossRef]
25. Molfetta, G.D.; Pérez, A. Quantum walks as simulators of neutrino oscillations in a vacuum and matter.
New J. Phys. 2016, 18, 103038. [CrossRef]
26. Brun, T.A.; Mlodinow, L. Discrete spacetime, quantum walks and relativistic wave equations. Phys. Rev. A
2018, 97, 042131. doi:10.1103/PhysRevA.97.042131. [CrossRef]
27. Brun, T.A.; Mlodinow, L. Detection of discrete spacetime by matter interferometry. arXiv 2018,
arXiv:1802.03911. [CrossRef]
28. Raynal, P. Simple derivation of the Weyl and Dirac quantum cellular automata. Phys. Rev. A 2017, 95, 062344.
doi:10.1103/PhysRevA.95.062344. [CrossRef]
29. Bibeau-Delisle, A.; Bisio, A.; D’Ariano, G.M.; Perinotti, P.; Tosini, A. Doubly special relativity from quantum
cellular automata. EPL 2015, 109, 50003. [CrossRef]
30. Bisio, A.; D’Ariano, G.M.; Perinotti, P. Special relativity in a discrete quantum universe. Phys. Rev. A 2016,
94, 042120. doi:10.1103/PhysRevA.94.042120. [CrossRef]
31. Bisio, A.; D’Ariano, G.M.; Perinotti, P. Quantum walks, deformed relativity and Hopf algebra symmetries.
Philos. Trans. R. Soc. Lond. A Math. Phys. Eng. Sci. 2016, 374, doi:10.1098/rsta.2015.0232. [CrossRef]
[PubMed]
32. Arrighi, P.; Facchini, S.; Forets, M. Discrete Lorentz covariance for quantum walks and quantum cellular
automata. New J. Phys. 2014, 16, 093007. [CrossRef]
33. Du, J.; Li, H.; Xu, X.; Shi, M.; Wu, J.; Zhou, X.; Han, R. Experimental implementation of the quantum
random-walk algorithm. Phys. Rev. A 2003, 67, 042316. doi:10.1103/PhysRevA.67.042316. [CrossRef]
34. Ryan, C.A.; Laforest, M.; Boileau, J.C.; Laﬂamme, R. Experimental implementation of a discrete-time
quantum random walk on an NMR quantum-information processor. Phys. Rev. A 2005, 72, 062317.
doi:10.1103/PhysRevA.72.062317. [CrossRef]
35. Xue, P.; Sanders, B.C.; Leibfried, D. Quantum Walk on a Line for a Trapped Ion. Phys. Rev. Lett. 2009,
103, 183602. doi:10.1103/PhysRevLett.103.183602. [CrossRef] [PubMed]
36. Do, B.; Stohler, M.L.; Balasubramanian, S.; Elliott, D.S.; Eash, C.; Fischbach, E.; Fischbach, M.A.; Mills, A.;
Zwickl, B. Experimental realization of a quantum quincunx by use of linear optical elements. J. Opt. Soc.
Am. B 2005, 22, 499–504. doi:10.1364/JOSAB.22.000499. [CrossRef]
37. Sansoni, L.; Sciarrino, F.; Vallone, G.; Mataloni, P.; Crespi, A.; Ramponi, R.; Osellame, R. Two-Particle
Bosonic-Fermionic Quantum Walk via Integrated Photonics. Phys. Rev. Lett. 2012, 108, 010502.
doi:10.1103/PhysRevLett.108.010502. [CrossRef] [PubMed]
38. Crespi, A.; Osellame, R.; Ramponi, R.; Giovannetti, V.; Fazio, R.; Sansoni, L.; De Nicola, F.; Sciarrino, F.;
Mataloni, P. Anderson localization of entangled photons in an integrated quantum walk. Nat. Photonics
2013, 7, 322–328. [CrossRef]

39. Flamini, F.; Spagnolo, N.; Sciarrino, F. Photonic quantum information processing: A review. arXiv 2018,
arXiv:1803.02790. [CrossRef]
40. Childs, A.M. Universal Computation by Quantum Walk. Phys. Rev. Lett. 2009, 102, 180501.
doi:10.1103/PhysRevLett.102.180501. [CrossRef] [PubMed]
41. Lovett, N.B.; Cooper, S.; Everitt, M.; Trevers, M.; Kendon, V. Universal quantum computation using the
discrete-time quantum walk. Phys. Rev. A 2010, 81, 042330. doi:10.1103/PhysRevA.81.042330. [CrossRef]
42. Childs, A.M.; Gosset, D.; Webb, Z. Universal computation by multiparticle quantum walk. Science 2013,
339, 791–794. [CrossRef] [PubMed]
43. Meyer, D.A. Quantum lattice gases and their invariants. Int. J. Mod. Phys. C 1997, 8, 717–735. [CrossRef]
44. Ahlbrecht, A.; Alberti, A.; Meschede, D.; Scholz, V.B.; Werner, A.H.; Werner, R.F. Molecular binding in
interacting quantum walks. New J. Phys. 2012, 14, 073050. [CrossRef]
45. Bisio, A.; D’Ariano, G.M.; Perinotti, P.; Tosini, A. Thirring quantum cellular automaton. Phys. Rev. A 2018,
97, 032132. doi:10.1103/PhysRevA.97.032132. [CrossRef]
46. Östlund, S.; Mele, E. Local canonical transformations of fermions. Phys. Rev. B 1991, 44, 12413–12416.
doi:10.1103/PhysRevB.44.12413. [CrossRef]
47. Thirring, W.E. A soluble relativistic ﬁeld theory. Ann. Phys. 1958, 3, 91–112. doi:10.1016/0003-4916(58)90015-0.
[CrossRef]
48. Hubbard, J. Electron correlations in narrow energy bands. Proc. R. Soc. Lond. A Math. Phys. Eng. Sci. 1963,
276, 238–257. doi:10.1098/rspa.1963.0204. [CrossRef]

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).

entropy
Article
Robust Macroscopic Quantum Measurements in the
Presence of Limited Control and Knowledge
Marc-Olivier Renou *, Nicolas Gisin and Florian Fröwis
Department of Applied Physics, University of Geneva, 1211 Geneva 4, Switzerland;
[email protected] (N.G.); [email protected] (F.F.)
* Correspondence: [email protected]

Received: 31 October 2017; Accepted: 26 December 2017; Published: 9 January 2018

Abstract: Quantum measurements have intrinsic properties that seem incompatible with our
everyday-life macroscopic measurements. Macroscopic Quantum Measurement (MQM) is a concept
that aims at bridging the gap between well-understood microscopic quantum measurements and
macroscopic classical measurements. In this paper, we focus on the task of the polarization direction
estimation of a system of N spin-1/2 particles and investigate the model some of us proposed
in Barnea et al., 2017. This model is based on a von Neumann pointer measurement, where each spin
component of the system is coupled to one of the three spatial component directions of a pointer.
It shows traits of a classical measurement for an intermediate coupling strength. We investigate
relaxations of the assumptions on the initial knowledge about the state and on the control over the
MQM. We show that the model is robust with regard to these relaxations. It performs well for thermal
states and a lack of knowledge about the size of the system. Furthermore, a lack of control on the
MQM can be compensated by repeated “ultra-weak” measurements.

Keywords: quantum measurement; quantum estimation; macroscopic quantum measurement

1. Introduction
In our macroscopic world, we constantly measure our environment. For instance, to find north
with a compass, we perform a direction measurement by looking at the pointer. Yet, finding a quantum
model for this kind of macroscopic measurement faces several problems. Many characteristics
of quantum measurements seem to be incompatible with our intuitive notion of macroscopic
measurements. For example, perfectly measuring two non-commuting observables is impossible
in quantum mechanics, and any informative measurement has a nonvanishing invasiveness. Thus, if it
exists, such a model cannot be of the standard projective kind. Although we have a good intuition of
what such a measurement is, the natural characteristics it should satisfy are not obvious. Even if these
characteristics can be rigorously formulated, it is not clear whether there exists a quantum model that
satisfies them all.
For concreteness, quantum models for macroscopic measurements can be considered as a
parameter estimation task. In this paper, we focus on the estimation of the direction of polarization
of N qubits, oriented in a direction that is uniformly chosen at random. The question of the optimal
way to estimate N qubit polarization is already well studied [1,2] and can be seen as part of a larger
class of covariant estimation problems [3]. It is linked to covariant cloning [4] and purification of
state [5]. In the limit of macroscopic systems, those optimal measurements are arbitrarily precise
and potentially with low disturbance of the system [6,7]. A tradeoff between the quality of the guess
and the disturbance of the state has been demonstrated [8], as well as an improvement of the guess
when abstention is allowed [9]. However, these optimal measurements may not be satisfying models
of our everyday-life macroscopic measurements as it is not clear how these optimal measurements

Entropy 2018, 20, 39; doi:10.3390/e20010039 27 www.mdpi.com/journal/entropy

could be physically implemented in a natural way. A first attempt to solve this problem is to reduce
the optimal Positive Operator Valued Measure (POVM), which is continuous, to a POVM with a finite
(and small) number of elements [10,11]. However, even if this reduction exists, the resulting POVM is
difficult to interpret physically, and to the best of our knowledge, no family of reduced POVMs for every N exists.
In [12], we argue that a good model of a macroscopic measurement should be highly non-invasive,
collect a large amount of information in a single shot and be described by a “fairly simple” coupling
between system and observer. Measurements that fulfills these requirements are called “Macroscopic
Quantum Measurements” (MQM). Invasiveness seems to be difficult to satisfy with a quantum
model. Indeed, the disturbance induced on the state by a measurement is generic in quantum
mechanics. This has no counterpart in classical physics, where any measurement can ideally be done
without disturbance of the system. However, it is now well known that this issue can be solved by
accepting quantum measurements of finite accuracy. In [13], Poulin shows the existence of a trade-off
between state disturbance and measurement resolution as a function of the size of the ensemble.
One macroscopic observable can behave “classically”, provided we measure it with sufficiently low
resolution. Yet, the question is still open for several non-commuting observables. Quantum physics
allows precise measurements of only one observable among two non-commuting ones.
In this paper, we study the behavior of an MQM model for the measurement of the polarization of a
large ensemble of N parallel spin 1/2 particles, which implies the measurement of the non-commuting
spin operators. In this model, the measured system is first coupled to a measurement apparatus through
an intuitive Hamiltonian already introduced in [14]. Then, the apparatus is measured. We extend our
previous study to more general cases. In [12], it was shown that this model allows good direction
estimation and low disturbance for systems of N parallel spin 1/2 particles. This system can be
interpreted as the ground state of a product Hamiltonian. Here, we generalize the scenario to thermal
states. We also study a different measurement procedure based on repeated weak measurements.
The paper is structured as follows: We first present a simplified technical framework that describes
the measurement of a random direction for a given quantum state and observable. Considering an
input state and an observable independent of the particle number and with no preferred direction,
we show that the problem reduces to many sub-problems, which correspond to systems of fixed total
spin j. Then, we quantitatively treat the case of the thermal state, which generalizes the N parallel spin
1/2 particle for non-zero temperature, showing that the discussed MQM is still close to the optimal
measurement. In the proposed MQM, the precision of the estimated direction highly depends on the
optimized coupling strength of the model. In Section 4, we follow the ideas of [13], and we show that
one may relax this requirement by doing repeated “ultra-weak” measurements and a naive guess.
We conclude and summarize in the last section.

2. Estimation of a Direction
In this paper, we aim to study the behavior of a specific MQM model for a direction estimation
task, e.g., the estimation of the direction of a magnet or a collection of spins. Hence, we first introduce
an explicit (and specific) direction estimation problem, which is presented as a game. It concerns the
direction estimation of a qubit ensemble. In the following, Su = S · u represents the spin operator
projected in direction u, i.e., the elementary generator of rotations around u. For a given state ρu of
N = 2J qubits, we say that ρu points in the direction u if it is positively polarized in the u direction, i.e.,
if [ρu , Su ] = 0 and Tr(ρu Su ) > 0. We consider the problem of polarization direction estimation from
states that are all the same, but point in a direction that is chosen uniformly at random. This problem
has already been widely studied [1–3,6,15]. We give here a unified framework adapted to our task.

2.1. General Framework

We consider a game with a referee, Alice, and a player, Bob. Alice and Bob agree on some initial
state ρz . In each round of the game, Alice chooses a direction u from a uniform distribution on the unit
sphere. She rotates $\rho_z$ to $\rho_u = R_u^\dagger \rho_z R_u$, where $R_u$ is a rotation operator, which maps $z$ to $u$. She sends
$\rho_u$ to Bob, who measures it with some given measurement device characterized by a Positive Operator
Valued Measure (POVM) $\Omega_r$. He obtains a result $r$ with probability $p(r|u) = \mathrm{Tr}(\Omega_r \rho_u)$, from which
he deduces $v_r$, his guess for $u$. Bob’s score is computed according to some predefined score function
$g(u, v_r) = u \cdot v_r$. Given his measurement result, Bob’s goal is to find the optimal estimate, i.e., the one
that optimizes his mean score [16]:

$$
G = \int dr \int du\; p(r|u)\, g(u, v_r).
\tag{1}
$$
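
To make Equation (1) concrete, the following sketch evaluates the mean score for the simplest instance of the game. It is a toy case chosen here for illustration, not the MQM model studied in the paper: a single pure qubit polarized along $u$, measured projectively along $z$, with outcome probabilities $(1 \pm u_z)/2$ and guesses $v_\pm = \pm z$. The uniform distribution on the sphere is sampled through $u_z = \cos\theta$, which is uniform on $[-1, 1]$; the function name and grid size are assumptions of the sketch.

```python
def mean_score(n_grid=20000):
    """Mean score G of Eq. (1) for one qubit measured along z.

    The score u . v_r only depends on u_z here, so the integral over the
    sphere reduces to an average over u_z in [-1, 1] (midpoint rule).
    """
    total = 0.0
    for i in range(n_grid):
        uz = -1.0 + (i + 0.5) * 2.0 / n_grid
        p_plus = (1.0 + uz) / 2.0      # probability of outcome "+", guess +z
        p_minus = (1.0 - uz) / 2.0     # probability of outcome "-", guess -z
        total += p_plus * uz + p_minus * (-uz)
    return total / n_grid


print(mean_score())
```

In this toy case the integrand reduces to $u_z^2$, whose average over the sphere is 1/3.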

For simplicity, we consider an equivalent, but simplified POVM. In our description, Bob measures
the system, obtains results r and then post-processes this information to find his guess vr . We now
regroup all POVM elements corresponding to the same guess and label by the guessed direction.

Formally, we go from $\Omega_r$ to $O_v = \int dr\, \Omega_r\, \delta(v_r - v)$.
Some assumptions are made about $\rho_z$ and $O_v$. We suppose that $\rho_z$ points in the $z$ direction.
Moreover, we assume that $\rho_z$ is symmetric under the exchange of particles, which implies $[\rho_z, S^2] = 0$.
Let $|\alpha, j, m\rangle$ be the basis in which $S_z$ and $S^2$ are diagonal (where $j \in \{J, J-1, \ldots\}$ is the total spin,
$\alpha$ the multiplicity due to particle exchange and $m$ the spin along $z$). Then, $\rho_z$ is diagonal in this basis,
with coefficients independent of $\alpha$, denoted as $c^j_m = \langle \alpha, j, m|\rho_z|\alpha, j, m\rangle$.
We also suppose that the measurement device does not favor any direction and treats each particle
equally. Mathematically, this means that $O_v$ is covariant with respect to particle exchange and rotations.
Then, any POVM element is generated from one kernel $O_z$ and the rotations $R_v$: $O_v = R_v^\dagger O_z R_v$
(for more technical details, see [15]). With this, Equation (1) simplifies to:

$$
G = \int dv \int du\; p(v|u)\, g(u, v).
\tag{2}
$$

2.2. Score for Given Input State and Measurement

The following lemma is already implicitly proven in [15].

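Lemma 1. Bob’s mean score is:

$$
G = \sum_j \frac{j A_j \mathrm{Tr}\!\left(\rho_z^j\right)}{j+1}\;
\mathrm{Tr}\!\left(\frac{S_z}{j}\,\tilde\rho_z^j\right)
\mathrm{Tr}\!\left(\frac{S_z}{j}\,\frac{O_z^j}{2j+1}\right),
\tag{3}
$$

where $A_j = \binom{2J}{J-j} - \binom{2J}{J-j-1}$ is the degeneracy of the multiplicity $\alpha$ in a subspace of given $(j, m)$, $O_z^j$ is the
projection of $O_z$ over all subspaces of fixed $(\alpha, j)$, $\rho_z^j$ is the projection of $\rho_z$ over all subspaces of fixed $(\alpha, j)$ and
$\tilde\rho_z^j = \rho_z^j / \mathrm{Tr}\!\left(\rho_z^j\right)$.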

Lemma 1 says that Bob cannot use any coherence between subspaces associated with different
$(\alpha, j)$ to increase his score. In other words, the score Bob achieves is the weighted sum (where the
weights are $\mathrm{Tr}\!\left(\rho_z^{\alpha,j}\right)$) of the scores $G_j$ Bob would achieve by playing with the states $\tilde\rho_z^j$. This property
is a consequence of the assumption that no direction or particle is preferred by Bob’s measurement or
in the set of initial states. For self-consistency, we prove this lemma.
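
The multiplicity $A_j$ appearing in Lemma 1 can be checked by dimension counting: summing $A_j (2j+1)$ over all allowed total spins $j$ must give the full Hilbert-space dimension $2^N$ of $N = 2J$ qubits. The sketch below performs this check; the function names are illustrative, and spins are passed as the integer $2j$ so that half-integer values need no special handling.

```python
from math import comb


def multiplicity(n_qubits, two_j):
    """A_j = C(2J, J - j) - C(2J, J - j - 1), with 2J = n_qubits and 2j = two_j."""
    k = (n_qubits - two_j) // 2                # k = J - j, a non-negative integer
    lower = comb(n_qubits, k - 1) if k >= 1 else 0
    return comb(n_qubits, k) - lower


def total_dimension(n_qubits):
    """Sum of A_j * (2j + 1) over j = J, J - 1, ..., down to 0 or 1/2."""
    return sum(multiplicity(n_qubits, two_j) * (two_j + 1)
               for two_j in range(n_qubits % 2, n_qubits + 1, 2))


for n in (1, 2, 3, 4, 10):
    print(n, total_dimension(n), 2 ** n)
```

For four qubits, for instance, this gives $A_2 = 1$, $A_1 = 3$ and $A_0 = 2$, and $1\cdot 5 + 3\cdot 3 + 2\cdot 1 = 16$.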

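Proof. Bob’s mean score is:

$$
G = \int dv \int du\; p(v|u)\, g(u, v) = \int dv\; \mathrm{Tr}(O_v \Gamma_v),
\tag{4}
$$

where $\Gamma_v = v \cdot \int du\, \rho_u\, u$. As $\rho_u$ is the rotated $\rho_z$ and $O_v$ is covariant, we have:

$$
G = \mathrm{Tr}(O_z \Gamma_z).
\tag{5}
$$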


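Let $P_{\alpha,j} = \sum_m |\alpha, j, m\rangle\langle \alpha, j, m|$ be projectors, $\Gamma_z^{\alpha,j} = P_{\alpha,j} \Gamma_z P_{\alpha,j}$ and $O_z^{\alpha,j} = P_{\alpha,j} O_z P_{\alpha,j}$. Here, as $\rho$ and
$O_z$ do not depend on the particle number, $\alpha$ is only a degeneracy.
As $\Gamma_z$ is invariant under rotation around $z$ and commutes with $S^2$, we have $\Gamma_z = \sum_{\alpha,j} \Gamma_z^{\alpha,j}$.
Then, $G = \sum_{\alpha,j} \mathrm{Tr}\!\left(O_z^{\alpha,j} \Gamma_z^{\alpha,j}\right) = \sum_j A_j \mathrm{Tr}\!\left(O_z^j \Gamma_z^j\right)$, where $O_z^j$, $\Gamma_z^j$ are respectively the projections of $O_z$, $\Gamma_z$
over any spin coherent subspace of fixed $\alpha, j$. Let $G_j = \mathrm{Tr}\!\left(O_z^j \Gamma_z^j\right)$.
$\Gamma_z^j = \sum_m c^j_m \int du\, u_z\, R_u^\dagger |\alpha, j, m\rangle\langle \alpha, j, m| R_u$ is symmetric under rotations around $z$. Then, it is
diagonal in the basis $|\alpha, j, m\rangle$ with fixed $j, \alpha$. As

$$
\langle \alpha, j, \mu| \int du\, u_z\, R_u^\dagger |\alpha, j, m\rangle\langle \alpha, j, m| R_u |\alpha, j, \mu\rangle
= \frac{m\mu}{j(j+1)(2j+1)}
= \frac{m}{j(j+1)(2j+1)} \langle \alpha, j, \mu| S_z^{\alpha,j} |\alpha, j, \mu\rangle,
$$

we have:

$$
\Gamma_z^j = \sum_m c^j_m \frac{m}{j(j+1)(2j+1)}\, S_z^{\alpha,j}
\tag{6}
$$

and:

$$
G_j = \frac{1}{j(j+1)(2j+1)}\, \mathrm{Tr}\!\left(S_z \rho_z^j\right) \mathrm{Tr}\!\left(S_z O_z^j\right).
\tag{7}
$$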
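
The angular integral used in the proof can be checked numerically for small spins. The sketch below is a hedged illustration, not part of the original derivation: it tabulates the squared rotation-matrix elements $|\langle j,\mu|R|j,m\rangle|^2$ for $j = 1/2$ and $j = 1$ as functions of $x = \cos\theta$ (standard Wigner small-d expressions), averages $x\,|\langle j,\mu|R|j,m\rangle|^2$ over the sphere with the uniform measure normalized to one, and compares with $m\mu/[j(j+1)(2j+1)]$. Function names and the grid size are assumptions of the sketch.

```python
def overlap_sq(j, mu, m, x):
    """|<j,mu| R |j,m>|^2 as a function of x = cos(theta), for j = 1/2 and j = 1."""
    if j == 0.5:
        return (1 + x) / 2 if mu == m else (1 - x) / 2
    if j == 1:
        if mu == 0 and m == 0:
            return x * x
        if mu == 0 or m == 0:
            return (1 - x * x) / 2
        return ((1 + x) / 2) ** 2 if mu == m else ((1 - x) / 2) ** 2
    raise ValueError("only j = 1/2 and j = 1 are tabulated in this sketch")


def sphere_average(j, mu, m, n_grid=20000):
    """Average of u_z * |<j,mu| R_u |j,m>|^2 over the unit sphere (midpoint rule in x)."""
    total = 0.0
    for i in range(n_grid):
        x = -1.0 + (i + 0.5) * 2.0 / n_grid
        total += x * overlap_sq(j, mu, m, x)
    return total / n_grid


def closed_form(j, mu, m):
    return m * mu / (j * (j + 1) * (2 * j + 1))


for j, mu, m in [(0.5, 0.5, 0.5), (0.5, 0.5, -0.5), (1, 1, 1), (1, 1, -1), (1, 1, 0)]:
    print(j, mu, m, sphere_average(j, mu, m), closed_form(j, mu, m))
```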

2.3. State Independent Optimal Measurement, Optimal State for Direction Estimation

α,j
Given the state ρz , the measurement that optimizes Bob’s score is the set of Θv such that

α,j α,j
Tr Sz Θz is maximal. The maximum is obtained when Θz is proportional to a projector on the
α,j
eigenspace of Sz with the maximal
eigenvalue,
that is for Θz = (2j + 1)|α, j, ± j α, j, ± j|. Here, the
j
sign depends on the sign of Tr Sz ρ z
. In the following, we restrict ourselves to the case where the

j
Tr Sz ρz are all positive (this is the case for the thermal state, considered below). Then:

j
jA j Tr ρz
Sz j
Gopt = ∑ j+1
Tr
j
ρ̃z . (8)
j

For $\rho_z = |J,J\rangle\langle J,J|$, the thermal state of temperature $T = 0$, we find $G_{\mathrm{opt},T=0} = \frac{J}{J+1}$. Equivalently, we recover the optimal fidelity $F_{\mathrm{opt},T=0} = \frac{1}{2}\big(1 + G_{\mathrm{opt},T=0}\big) = \frac{N+1}{N+2}$, already found in [1]. Asymptotically, we have $G_{\mathrm{opt},T=0} = 1 - 1/J + O(1/J^2)$. This induces a natural characterization of the optimality of an estimation procedure. Writing $G_{T=0}$ as $G_{T=0} = 1 - \varepsilon_J/J$, where $\varepsilon_J = J(1 - G_{T=0}) \geq 1$, we say that the procedure is asymptotically optimal if $\varepsilon_J = 1 + O(1/J)$ and almost optimal if $\varepsilon_J - 1$ is asymptotically not far from zero.
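The zero-temperature relations above can be checked with exact rational arithmetic. The following is a minimal sketch; the function names are hypothetical and only the formulas $G_{\mathrm{opt},T=0} = J/(J+1)$, $F = (1+G)/2$ and $J(1-G)$ are taken from the text.

```python
from fractions import Fraction


def g_opt_zero_temperature(n_spins: int) -> Fraction:
    """Optimal score J/(J+1) for N = 2J perfectly aligned spins."""
    big_j = Fraction(n_spins, 2)
    return big_j / (big_j + 1)


def fidelity_from_score(score: Fraction) -> Fraction:
    """Fidelity F = (1 + G)/2 associated with a score G."""
    return (1 + score) / 2


def scaling_factor(n_spins: int, score: Fraction) -> Fraction:
    """Scaling factor J(1 - G) used to characterize optimality."""
    return Fraction(n_spins, 2) * (1 - score)


if __name__ == "__main__":
    for n in (1, 2, 10, 100):
        g = g_opt_zero_temperature(n)
        print(n, g, fidelity_from_score(g), scaling_factor(n, g))
```

For every N, the fidelity returned by the sketch equals (N+1)/(N+2), the value quoted from [1].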

2.4. Optimality of a State and a Measurement for Direction Guessing

Given the input state ρz , we can now compare the performances of a given measurement to the
optimal measurement. From Equations (3) and (8), we have, for an arbitrary measurement:

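$$\Delta G \equiv G_{\mathrm{opt}} - G = \sum_j \frac{j\,A_j\,\mathrm{Tr}\big(\rho_z^{j}\big)}{j+1}\;\mathrm{Tr}\Big(\frac{S_z}{j}\,\tilde{\rho}_z^{\,j}\Big)\;\mathrm{Tr}\Big(\frac{S_z}{j}\,\frac{\Theta_z^{j} - O_z^{j}}{2j+1}\Big). \tag{9}$$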

For every j, the three terms of the product are positive. Then, qualitatively, the measurement is
nearly optimal if for each j, the product of the three is small. We give here the interpretation of each of
these terms:

Entropy 2018, 20, 39

• $A_j$ is the degeneracy under permutation of particles (labeled by $\alpha$) and $\mathrm{Tr}\big(\rho_z^{j}\big)$ the weight of $\rho_z$ over a subspace $j, \alpha$. Hence, the first term, bounded by $j/(j+1)$, only contains the total weight of $\rho_z$ over a fixed total spin $j$. Hence, it is small whenever $\rho$ has little weight in the subspace $j$.
• $\mathrm{Tr}\big(\frac{S_z}{j}\,\tilde{\rho}_z^{\,j}\big)$ is small whenever the component of $\rho_z$ on the subspace of total spin $j$, $\rho_z^{j} = P_j\,\rho_z\,P_j$, is small or not well polarized. It is bounded by one. When $\rho_z^{j}$ is not well polarized, the optimality of the measurement in that subspace makes little difference. Then, this second term characterizes the quality of the component $\rho_z^{j}$ for the guess of the direction.
• The last term is small when $O_z^{j}$ is nearly optimal and is also bounded by one. More exactly, as $O_z^{j}$ is a covariant POVM, we have $\mathrm{Tr}\big(O_z^{j}\big) = 2j+1$, and all diagonal coefficients are positive. Because of $S_z/j$, $O_z^{j}$ is (nearly) optimal when it projects (mainly) onto the subspace of $S_z$ with the highest eigenvalue. POVMs containing other projections are sub-optimal. This effect is amplified by the operator $S_z$: the further away these extra projections $\propto |j,m\rangle\langle j,m|$ are from the optimal projector $\propto |j,j\rangle\langle j,j|$ (in the sense of $j - m$), the stronger the sub-optimality is. Then, the last term corresponds to the optimality of the measurement component $O_z^{j}$ for the guess of the direction.

Interestingly, we see here that the state and measurement “decouple”: the optimal measurement is
independent of the considered state. However, a measurement that is suboptimal only in subspaces
where ρz has low weight or is not strongly polarized will still result in a good mean score.

2.5. Estimation from a Thermal State

We now consider the case where the game is played with a thermal state (with temperature $T = 1/\beta$) of $N = 2J$ spins:
$$\rho_z = \frac{1}{Z}\big(e^{-\beta\sigma_z/2}\big)^{\otimes N} = \frac{1}{Z}\sum_{\alpha,j,m} e^{-\beta m}\,|\alpha,j,m\rangle\langle\alpha,j,m|, \tag{10}$$
where $Z = (2\cosh(\beta/2))^N$ is the partition sum. $\rho_z$ is clearly invariant under rotations around $z$ and symmetric under particle exchange. For later purposes, we define
$$f_j(\beta) = Z\,\mathrm{Tr}\Big(\frac{S_z}{j}\,\rho_z^{\alpha,j}\Big) = \frac{(1+j)\sinh(j\beta) - j\sinh((1+j)\beta)}{2j\,\sinh^2(\beta/2)}.$$
Equation (3) now reads:
$$G_{T=0} = \frac{J}{J+1}\,\mathrm{Tr}\Big(\frac{S_z}{J}\,\frac{O_z}{2J+1}\Big), \tag{11}$$
and for any temperature $\beta$:
$$G = \frac{1}{Z}\sum_j A_j\,f_j(\beta)\,G^{j}_{T=0}, \tag{12}$$
with the optimal measurement, $G_{\mathrm{opt},T} = \frac{1}{Z}\sum_j \frac{j\,A_j}{j+1}\,f_j(\beta)$. Note that for low temperatures, this expression can be approximated with $\langle J\rangle_\beta/(\langle J\rangle_\beta + 1)$, where $\langle J\rangle_\beta$ is the mean value of the total spin operator for a thermal state.
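The closed form of $f_j(\beta)$ and the sum over total-spin sectors can be checked numerically. The sketch below rests on two assumptions that are not stated in this section: $A_j$ is taken to be the standard number of spin-$j$ multiplets in $N$ spin-1/2 particles, $\binom{N}{N/2-j} - \binom{N}{N/2-j-1}$, and, because Equation (10) weights the level $m$ with $e^{-\beta m}$ so that $f_j(\beta)$ comes out negative for $\beta > 0$, the function reports the magnitude of the sum as the score. All function names are hypothetical.

```python
import math


def multiplicity(n_spins: int, two_j: int) -> int:
    """Number of spin-j multiplets (j = two_j/2) in n_spins spin-1/2 particles.

    Assumed meaning of A_j: C(N, N/2 - j) - C(N, N/2 - j - 1).
    """
    if two_j < 0 or two_j > n_spins or (n_spins - two_j) % 2:
        return 0
    k = (n_spins - two_j) // 2
    return math.comb(n_spins, k) - (math.comb(n_spins, k - 1) if k > 0 else 0)


def f_closed(j: float, beta: float) -> float:
    """Closed form of f_j(beta) as given in the text (requires j > 0, beta != 0)."""
    num = (1 + j) * math.sinh(j * beta) - j * math.sinh((1 + j) * beta)
    return num / (2 * j * math.sinh(beta / 2) ** 2)


def f_direct(two_j: int, beta: float) -> float:
    """Direct evaluation: sum over m of (m/j) exp(-beta m)."""
    j = two_j / 2
    return sum((m2 / 2) / j * math.exp(-beta * m2 / 2)
               for m2 in range(-two_j, two_j + 1, 2))


def g_opt_thermal(n_spins: int, beta: float) -> float:
    """Magnitude of (1/Z) sum_j A_j j/(j+1) f_j(beta)."""
    z = (2 * math.cosh(beta / 2)) ** n_spins
    total = 0.0
    for two_j in range(n_spins % 2, n_spins + 1, 2):
        if two_j == 0:
            continue  # the j = 0 sector carries the prefactor j/(j+1) = 0
        j = two_j / 2
        total += multiplicity(n_spins, two_j) * j / (j + 1) * f_closed(j, beta)
    return abs(total) / z


if __name__ == "__main__":
    for beta in (0.5, 1.0, 2.0, 30.0):
        print(beta, g_opt_thermal(6, beta))
```

At large $\beta$ only the sector $j = J$ contributes and the sketch approaches $J/(J+1)$, the zero-temperature value.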

3. A Macroscopic Quantum Measurement

3.1. The Model

In the following, we consider a model already introduced in [12,14] for polarization estimation.
It is adapted from the Arthurs–Kelly model, which is designed to simultaneously measure momentum
and position [17–19]. The model is expressed in the von Neumann measurement formalism [20–22].
The measurement device consists of a quantum object (the pointer), which is first initialized in a
well-known state and coupled to the system to be measured. Finally, the pointer is measured in a
projective way. The result of the measurement provides information about the state of the system.
Tuning the initial state of the pointer and the strength of interaction, one can model a large range of
measurements on the system, from projective measurements, which are partially informative but
destroy the state, to weak measurements, which acquire little information but perturb the state only slightly.
More specifically, to measure the direction of $\rho_u$, we use a pointer with three spatial degrees of freedom:
$$|\phi\rangle = \frac{1}{(2\pi\Delta^2)^{3/4}}\int dx\,dy\,dz\; e^{-\frac{x^2+y^2+z^2}{4\Delta^2}}\,|x\rangle|y\rangle|z\rangle, \tag{13}$$
where $x, y, z$ are the coordinates of the pointer. The parameter $\Delta$ in $|\phi\rangle$ represents the width of the pointer: a small $\Delta$ corresponds to a narrow pointer and implies a strong measurement, while a large $\Delta$ gives a wide pointer and a weak measurement. The interaction Hamiltonian reads:
$$H_{\mathrm{int}} = \vec{p}\cdot\vec{S} \equiv p_x\otimes S_x + p_y\otimes S_y + p_z\otimes S_z, \tag{14}$$
where $p_x, p_y, p_z$ are the conjugate variables of $x, y, z$. A longer interaction time or stronger coupling can always be renormalized by adjusting $\Delta$. Hence, we take the two equal to one. Finally, a position measurement with outcome $\vec{r}$ is performed on the pointer. The POVM elements associated with this measurement are $O_r = E_r E_r^\dagger$, where the Kraus operator $E_r$ reads:
$$E_r \propto \int d\vec{p}\; e^{i\vec{r}\cdot\vec{p}}\, e^{-\Delta^2 p^2}\, e^{-i\vec{p}\cdot\vec{S}}. \tag{15}$$

The POVM associated with this model is already covariant. Indeed, the index of each POVM element is the direction of guess (to exactly obtain the form given in Section 2.1, one has to define $O_v = \int_0^\infty r^2\,O_r\,dr$, which is equivalent to identifying each vector with its direction). Any $O_r$ is a rotation of $O_z$: $O_r = R_r^\dagger\,O_z\,R_r$.

3.2. Behavior for Zero Temperature States

At zero temperature, it is already known that the score obtained for a game where Bob does the
MQM remains close to the optimal one. In our previous study [12], we demonstrated a counter-intuitive
behavior of the quality of the guess: a weaker coupling strength can achieve better results than a
strong coupling; see Figure 1a. In particular, we show that for well-chosen finite coupling strength, the score of the guess is almost optimal. The optimal value of the coupling is $\Delta = \sqrt{J}/4$: it scales with the square root of the number of particles.
Additional calculations confirm this first conclusion (see Figure 1b). Exploiting the conclusion of the discussion of Section 2.4, we only considered the first diagonal coefficient of $O_z$, $o_J = \langle J,J|O_z|J,J\rangle$, to lower bound the performance of the POVM [23]. Numerical simulations suggest that for a coupling strength $\Delta = \sqrt{J}/4$, only considering the bound over $o_J$, $G_{T=0}$ develops as $G_{T=0} = 1 - \varepsilon_J/J$ with $\varepsilon_J = J(1 - G_J) \approx 19/18$ for large $J$. Hence, the asymptotic difference between $G_{\mathrm{opt},T=0}$ and $G_{\mathrm{MQM},T=0}$ is such that $J\Delta G_{T=0}$ remains bounded, in the order of 0.05.
Figure 1. (a) Mean score G as a function of the pointer width Δ for various N = 2J. The dashed lines
correspond to the optimal value Gopt. (b) Scaling factor $\varepsilon_J = J(1 - G_J)$ from the approximate lower
bound on the score G (upper, blue curve) compared to the optimal scaling factor $J(1 - G_{\mathrm{opt}})$ (lower,
red curve). For large J, $\varepsilon_J$ seems to go to 19/18 (dashed line). See Section 3.2 for further details.

From Equation (3) and the discussion about Equation (9), we see that, to achieve optimality,
the first diagonal coefficient o J must be maximal [24], that is equal to 2J + 1. When this is not the case,
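as $\mathrm{Tr}(O_z) = 2J+1$, the difference $(2J+1) - o_J = \mathrm{Tr}(O_z) - o_J = \sum_{m\neq J} o_m$ is distributed between the other diagonal coefficients $o_m = \langle J,m|O_z|J,m\rangle$, for $m \neq J$. The score achieved by the measurement is given by Equation (11):
$$G_{T=0} = \frac{J}{J+1}\,\mathrm{Tr}\Big(\frac{S_z}{J}\,\frac{O_z}{2J+1}\Big) = \frac{J}{J+1}\sum_m \frac{m}{J}\,\frac{o_m}{2J+1}. \tag{16}$$

Our bound only considers the coefficient $o_J$. However, a simple calculation shows that this is enough to deduce the strict suboptimality of the measurement. Indeed, one can derive:
$$\begin{aligned}
\varepsilon_J &= J\left(1 - \frac{J}{J+1}\left[\frac{o_J}{2J+1} + \sum_{m\neq J}\frac{m}{J}\,\frac{o_m}{2J+1}\right]\right)\\
&\geq J\left(1 - \frac{J}{J+1}\left[\frac{o_J}{2J+1} + \frac{J-1}{J}\left(1 - \frac{o_J}{2J+1}\right)\right]\right)\\
&\geq 2 - \frac{o_J}{2J+1} + o(1),
\end{aligned}$$
where $o(1) \to 0$ when $J \to \infty$. Hence, if $o_J$ is not asymptotically $2J+1$, $\varepsilon_J$ cannot be asymptotically one.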
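The chain of inequalities above can be illustrated numerically. The sketch below assumes only what the text states about the diagonal coefficients (they are non-negative and sum to 2J + 1); the random sampling and all function names are hypothetical. The middle expression of the derivation simplifies to J(2 − o_J/(2J+1))/(J+1), which is the bound that the code compares against.

```python
import random


def score(two_j: int, diag: list) -> float:
    """Equation (16): G = (1/(J+1)) * sum_m m * o_m / (2J+1), with m = J, J-1, ..., -J."""
    big_j = two_j / 2
    ms = [big_j - k for k in range(two_j + 1)]
    return sum(m * o for m, o in zip(ms, diag)) / ((big_j + 1) * (2 * big_j + 1))


def epsilon(two_j: int, diag: list) -> float:
    """Scaling factor J (1 - G)."""
    return (two_j / 2) * (1 - score(two_j, diag))


def epsilon_lower_bound(two_j: int, o_top: float) -> float:
    """Bound obtained by placing all remaining weight at m = J - 1."""
    big_j = two_j / 2
    return big_j / (big_j + 1) * (2 - o_top / (2 * big_j + 1))


def random_diagonal(two_j: int) -> list:
    """Random non-negative coefficients o_m with sum 2J + 1 (illustrative only)."""
    weights = [random.random() for _ in range(two_j + 1)]
    scale = (two_j + 1) / sum(weights)
    return [w * scale for w in weights]


if __name__ == "__main__":
    random.seed(0)
    for two_j in (2, 8, 32):
        diag = random_diagonal(two_j)
        print(two_j, epsilon(two_j, diag), epsilon_lower_bound(two_j, diag[0]))
```

The bound is saturated when the weight that is missing from o_J sits entirely at m = J − 1, and it reduces to J/(J+1) for the optimal choice o_J = 2J + 1.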

In the following, we show that a lower bound on G for thermal states can be calculated with
methods based on the T = 0 case.

3.3. Behavior for Finite Temperature States

As it is built from the spin operators only, the measurement scheme depends only on the properties of the system with respect to the spin operators. More precisely, for a given system size $N = 2J$, we consider the basis $\{|\alpha,j,m\rangle^{(N)}\}$, and for given total spin $j$ and permutation multiplicity $\alpha$, the projector $P^{(N)}_{\alpha,j} = \sum_m |\alpha,j,m\rangle\langle\alpha,j,m|^{(N)}$. Then, the projection of Equation (15) for $N = 2J$ spins onto the subspace $j, \alpha$ is equivalent to the projected Kraus operator for $n = 2j$ spins onto $j$:
$$P^{(N)}_{\alpha,j}\,E^{(N)}_r\,P^{(N)}_{\alpha,j} \equiv P^{(n)}_{j}\,E^{(n)}_r\,P^{(n)}_{j}, \tag{17}$$
where the equivalence $\equiv$ is interpreted as $|\alpha,j,m\rangle^{(N)} \equiv |m\rangle^{(n)}$ (there is no multiplicity for $n$ and $j = n/2$).
For non-zero temperature, we adapt the numerical estimation model of [12]. Due to Lemma 1 and Equation (17), we can directly exploit the same model and combine the results for the different subspaces for given $j$. However, in this case, we are limited by the choice of the coupling strength $\Delta$ of the pointer with the system. At zero temperature, only the total spin subspace that corresponds to $j = J$ is involved. The optimal coupling strength is then $\Delta = \sqrt{J}/4$. For a non-zero temperature, all possible $j$ appear, and the value of $\Delta$ cannot be optimized for each one. Our strategy is to choose the optimal coupling value for the equivalent total spin $J_{\mathrm{eq}}$ satisfying $\langle S^2\rangle = J_{\mathrm{eq}}(J_{\mathrm{eq}}+1)$, which can be deduced from $\langle S^2\rangle = \frac{1}{2}\big(3J + J(2J-1)\tanh^2(\beta/2)\big)$ (for a thermal state). Depending on the sensitivity of the MQM guessing scheme with respect to a change in the value of $\Delta$, this method may work or not. Numeric simulations show that a change of order $O(\sqrt{J})$ perturbs the score. However, one can hope that for smaller variation, the perturbation is insignificant.
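The equivalent total spin can be obtained by solving the quadratic relation for its positive root. The following is a minimal sketch with hypothetical function names; it assumes the thermal-state expression ⟨S²⟩ = (3J + J(2J − 1) tanh²(β/2))/2 and reads the optimal pointer width as √J/4 evaluated at the equivalent spin.

```python
import math


def mean_s_squared(big_j: float, beta: float) -> float:
    """<S^2> for a thermal state of N = 2J spins."""
    return 0.5 * (3 * big_j + big_j * (2 * big_j - 1) * math.tanh(beta / 2) ** 2)


def equivalent_spin(big_j: float, beta: float) -> float:
    """Positive root of J_eq (J_eq + 1) = <S^2>."""
    return (-1 + math.sqrt(1 + 4 * mean_s_squared(big_j, beta))) / 2


def pointer_width(big_j: float, beta: float) -> float:
    """Pointer width sqrt(J_eq)/4 (assumed reading of the optimal coupling)."""
    return math.sqrt(equivalent_spin(big_j, beta)) / 4


if __name__ == "__main__":
    for beta in (0.0, 0.5, 2.0, 100.0):
        print(beta, equivalent_spin(10, beta), pointer_width(10, beta))
```

At low temperature the sketch returns the full spin J, and at infinite temperature it returns the much smaller value fixed by ⟨S²⟩ = 3J/2.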
We tested the method for different values of temperature $T = 1/\beta$ corresponding to spin polarization $\langle S_z\rangle = J\tanh(\beta/2)$. We find again that the asymptotic difference between $G_{\mathrm{opt}}$ and $G_{\mathrm{MQM}}$ is small. More precisely, Figure 2 shows $J\Delta G_\beta$ as a function of $J$, for different temperatures corresponding to $\langle S_z\rangle = cJ$, for various $c$. For each $\Delta$, the error $J\Delta G_\beta$ seems to be bounded for large $J$.

Figure 2. $J\Delta G$ (Equation (9)) as a function of J, for various β chosen such that $\langle S_z\rangle = J\tanh(\beta/2)$.
The Macroscopic Quantum Measurement (MQM) is close to optimal even for finite temperature.
See Section 3.3 for further details.

4. Estimation of a Direction through Repeated Weak Measurements

In the previous section, we considered a specific MQM and studied the mean score of the state
direction for pure states, as well as for more realistic thermal states. We compared it to its optimal
value, obtained with the optimal theoretical measurement. We showed that the difference remained
bounded. As the model makes use of a simple Hamiltonian coupling between system and observer,
it satisfies the requirements of an MQM as stated in the introduction for thermal states.
However, this model requires that three one-dimensional (1D) pointers (or equivalently one
three-dimensional (3D) pointer) are coupled to the system at the very same time, to be then measured.
This requirement is difficult to meet. Moreover, an optimized coupling strength between system and
pointer is necessary: the pointer width has to be $\Delta = \sqrt{J}/4$ within relatively tight limits. This requires
a good knowledge about the system to be measured (its size, its temperature, etc.) and fine control
over the measurement. Following [13], we can overcome this problem by implementing many
ultra-weak measurements. To this end, we focus on a relaxation of the measurement procedure,
where we consider repeated very weak measurements (with $\Delta \gg \sqrt{J}/4$) in successive orthogonal
directions on the state, which is gradually disturbed by the measurements. This idea has already been
implemented experimentally [25]. The guessed state is obtained by averaging the results in each of
the three directions. Note that this is not optimal, as the first measurements are more reliable than
the last. However, we show in the following that this intuitive approach gives almost optimal results.
For simplicity, we restrict ourselves to the case of a perfectly-polarized state, or equivalently a thermal
state at a zero temperature.

4.1. The Model

We modify the game considered so far in the following way. Bob now uses a modified strategy, in which he successively repeats the same measurement, potentially in different measurement bases. First, he weakly couples the state to a 1D Gaussian pointer through an interaction Hamiltonian in some direction $w$. The pointer state is:
$$|\phi\rangle = \frac{1}{(2\pi\Delta^2)^{1/4}}\int dw\; e^{-\frac{w^2}{4\Delta^2}}\,|w\rangle, \tag{18}$$
and the Hamiltonian reads:
$$H_w \propto p_w\otimes S_w, \tag{19}$$
where $w \in \{x, y, z\}$. Then, Bob measures the pointer. The post-measurement state is used again for the next measurement and is disturbed in each round. We first analytically derive the case where Bob only measures in one direction ($w = z$). Then, we consider the case where Bob does $t$ measurements successively in each orthogonal direction $x, y, z$. He obtains results $x_1, y_1, z_1, x_2, y_2, \ldots, z_t$ and estimates the direction with the vector whose coordinates are the averages of the $x_i$, the $y_i$ and the $z_i$.

4.2. Measurement in a Single Direction

We first study the 1D case. First, note that the optimal strategy when the measurement operators $O_r^{j}$ are required to measure in a fixed direction $z$ (i.e., $[O_r^{j}, S_z] = 0$) is to measure the operator $S_z$: as the $O_r^{j}$ commute with $S_z$, they can be simulated with a measurement of $S_z$. The optimum is to answer $\pm z$ depending on the sign of the result. The obtained score is then $G = \frac{J}{2J+1}$ for integer $J = N/2$ and $G = \frac{2J+1}{4(J+1)}$ otherwise.
In our model, we consider an interaction Hamiltonian $H_w$ taken in a constant direction $w = z$. The total number of measurements is $t$.
The measurement results form a vector $r = \{r_1, \ldots, r_t\}$. The POVM of the full measurement sequence is:
$$\Omega_r = \begin{bmatrix} \ddots & & \\ & F_m(r) & \\ & & \ddots \end{bmatrix} \tag{20}$$

where:
$$F_m(r) = \frac{1}{(\Delta\sqrt{2\pi})^{t}}\; e^{-\frac{\|r - m\mathbf{1}\|^2}{2\Delta^2}}, \tag{21}$$

where $\mathbf{1} = \{1, \ldots, 1\}$. As all measurements for each step commute, this case can be solved analytically. Note first that the ordering of the measurement results is irrelevant. From Equation (1), we find:
$$\begin{aligned}
G &= \frac{1}{(J+1)(2J+1)}\,\mathrm{Tr}(S_z O_z)\\
&= \frac{2}{(J+1)(2J+1)\,(\Delta\sqrt{2\pi})^{t}}\int dr\;\delta(v_r - z)\; e^{-\frac{\|r\|^2}{2\Delta^2}}\sum_{m>0} m\; e^{-\frac{m^2 t}{2\Delta^2}}\,\sinh\Big(\frac{m\; r\cdot\mathbf{1}}{\Delta^2}\Big),
\end{aligned}$$

where $v_r$ is the optimal guess. For $r$ such that $r\cdot\mathbf{1} \geq 0$, the optimal guess is clearly $v_r = z$. By symmetry, $v_{-r} = -v_r$, and the optimal guess is $v_r = \mathrm{sign}(r\cdot\mathbf{1})\,z$. Then:
$$G = \frac{2}{(J+1)(2J+1)}\sum_{m>0} m\;\mathrm{erf}\Big(\frac{m\sqrt{t}}{\Delta\sqrt{2}}\Big) \tag{22}$$

is easily computed by integration over $r$ and by decomposition into its parallel and orthogonal components to $\mathbf{1}$. We see here that the score only depends on the ratio $\sqrt{t}/\Delta$ and reaches the 1D strong measurement limit for $\sqrt{t}/\Delta \gg 1$ (see Figure 3). Here, erf is the error function. We see that $G \to 1/2$ for $J \to \infty$, which is the optimal value for optimal measurements lying on one direction.
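Equation (22) is straightforward to evaluate. The sketch below (hypothetical function names, standard library only) computes the score for given J, t and Δ, and compares the limit of many measurements with the strong-measurement values J/(2J+1) and (2J+1)/(4(J+1)) quoted at the beginning of this section.

```python
import math


def score_single_direction(two_j: int, t: int, delta: float) -> float:
    """Equation (22): G = 2/((J+1)(2J+1)) * sum_{m>0} m erf(m sqrt(t) / (delta sqrt(2)))."""
    big_j = two_j / 2
    positive_m = [big_j - k for k in range(two_j + 1) if big_j - k > 0]
    total = sum(m * math.erf(m * math.sqrt(t) / (delta * math.sqrt(2)))
                for m in positive_m)
    return 2 * total / ((big_j + 1) * (2 * big_j + 1))


def strong_limit(two_j: int) -> float:
    """Score of a projective S_z measurement for integer or half-integer J."""
    big_j = two_j / 2
    if two_j % 2 == 0:
        return big_j / (2 * big_j + 1)
    return (2 * big_j + 1) / (4 * (big_j + 1))


if __name__ == "__main__":
    for t in (1, 10, 100, 1000):
        print(t, score_single_direction(8, t, 10.0), strong_limit(8))
```

Since t and Δ enter only through √t/Δ, a weaker coupling can be compensated by more repetitions, as stated in the text.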

Figure 3. Score G as a function of the number of measurements t for repeated weak measurement in a
single fixed direction with Δ = 10 and J = 2, 4, 8, 16. See Section 4.2 for further details.

4.3. Ultra-Weak Measurements in Three Orthogonal Directions

We now study the relaxation of our initial MQM model. In this case, for a large number of
measurements t, we could not analytically derive the mean score. We hence implemented a numerical
simulation of the model. We fix the number of qubits N = 2J and pointer width Δ. The vector u
is drawn at random on the Bloch sphere. Then, we simulate τ successive weak measurements in
directions x, y, z of the system $|u\rangle^{\otimes N}$. For each t ≤ τ, we guess u from the mean of the results for x, y, z
for measurements up to t.
For large Δ, our procedure can be seen as successive weak measurements of the system.
Each measurement acquires a small amount of information and weakly disturbs the state. We attribute
the same weight to each measurement result to find the estimated polarization. As each measurement
disturbs the state, this strategy is not optimal. However, keeping the heuristic of “intuitive measurement”,
we consider this guessing method as being natural.
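The procedure described above can be sketched as a single Monte Carlo run. This is an illustrative reimplementation under stated assumptions, not the authors' simulation code: the state is restricted to the symmetric spin-J subspace, each weak measurement is modelled by a Gaussian Kraus operator of width Δ centred on the eigenvalues of S_w (unit coupling), and all function names are hypothetical.

```python
import numpy as np


def spin_matrices(two_j):
    """Return (Sx, Sy, Sz) for spin J = two_j/2 in the basis m = J, J-1, ..., -J."""
    big_j = two_j / 2
    m = big_j - np.arange(two_j + 1)
    sz = np.diag(m).astype(complex)
    # <m+1|S+|m> = sqrt(J(J+1) - m(m+1)) sits on the first superdiagonal
    s_plus = np.diag(np.sqrt(big_j * (big_j + 1) - m[1:] * (m[1:] + 1)), k=1).astype(complex)
    sx = (s_plus + s_plus.conj().T) / 2
    sy = (s_plus - s_plus.conj().T) / 2j
    return sx, sy, sz


def simulate_scores(two_j, delta, rounds, rng):
    """Scores u.v_t for t = 1..rounds in one run of repeated x, y, z weak measurements."""
    spins = spin_matrices(two_j)
    eigs = [np.linalg.eigh(s) for s in spins]
    # Random direction u, uniform on the sphere
    u = rng.normal(size=3)
    u /= np.linalg.norm(u)
    # |u>^{(x)N} is the eigenvector of u.S with the largest eigenvalue (+J)
    _, vecs = np.linalg.eigh(u[0] * spins[0] + u[1] * spins[1] + u[2] * spins[2])
    psi = vecs[:, -1]
    results = np.empty((rounds, 3))
    for k in range(rounds):
        for axis, (vals, basis) in enumerate(eigs):
            amp = basis.conj().T @ psi            # amplitudes in the S_w eigenbasis
            prob = np.abs(amp) ** 2
            idx = rng.choice(len(vals), p=prob / prob.sum())
            r = vals[idx] + delta * rng.normal()  # pointer reading: Gaussian mixture sample
            amp = amp * np.exp(-((r - vals) ** 2) / (4 * delta**2))  # Kraus update
            psi = basis @ amp
            psi /= np.linalg.norm(psi)
            results[k, axis] = r
    running_mean = np.cumsum(results, axis=0) / np.arange(1, rounds + 1)[:, None]
    guesses = running_mean / np.linalg.norm(running_mean, axis=1, keepdims=True)
    return guesses @ u


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    runs = np.array([simulate_scores(10, 8.0, 200, rng) for _ in range(50)])
    print(runs.mean(axis=0)[[0, 9, 49, 199]])  # mean score after t = 1, 10, 50, 200 rounds
```

Averaging the returned array over many runs gives an estimate of the mean score as a function of t for fixed N and Δ.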
The results from the numerical simulation suggest that for a fixed number of particles N = 2J and
fixed pointer width Δ, the score as a function of t increases and then decreases (see Figure 4a), which
is intuitive. Indeed, for few measurements, the state is weakly disturbed, and each measurement
acquires only a small amount of information about the original state. Then, after a significant number
of measurements, the state is strongly disturbed, and each measurement is done over a noisy state
and gives no information about the initial state. Hence, there is an optimal number of measurements
tmax ( N, Δ) that gives a maximal score Gmax ( N, Δ). Moreover, for a fixed N = 2J, Gmax ( N, Δ) increases
smoothly as the measurements are weaker, i.e., as Δ increases. It reaches a limit Gmax ( N ) (see Figure 4a).
This suggests that for weak enough measurements, we observe the same behavior as in the 1D case.
More measurements compensate a weaker interaction strength, without loss of precision. Hence,
the precision of a single measurement is not important, as long as the measurement is weak enough.
Moreover, in that case, we observe a plateau, which suggests that the exact value of t is not important.
For $N \gg 1$, even with t far from $t_{\max}$, the mean score is close to $G_{\max}$. Interestingly, the trade-off
between $t_{\max}$ and Δ found for the 1D case seems to repeat here. We numerically find that $\sqrt{t_{\max}}/\Delta$ is
constant for a given N = 2J (see Figure 4c) and scales as $1/\sqrt{N}$.

Figure 4. (a) Score G as a function of the number of measurements t for N = 2J = 20. For each Δ, there is
an optimal repetition rate $t_{\max}$. The optimal score $G_{\max}$ saturates for Δ big enough. (b) Ratio $\sqrt{t_{\max}}/\Delta$ for
N = 5, 10, 15, 20. As in the 1D case, $\sqrt{t_{\max}}/\Delta$ is constant and only depends on N. (c) Score as a function
of the number of measurements t, for N = 2J = 1.50 and $\Delta = 8\sqrt{N}$. See Section 4.3 for further details.

Most importantly, for weak enough measurements, the obtained score is close to the optimal one,
as shown in Figure 5. Numerical ﬂuctuations prevent any precise statements about an estimation of the
error, but the error is close to what was obtained with the initial measurement procedure; see Figure 5.

Figure 5. Mean score $G_{\max}$ as a function of N maximized over t. A too strong measurement (Δ = 1) fails
to achieve an optimum. A weak enough measurement ($\Delta = 8\sqrt{N}$) achieves a good score. The inset
shows $\varepsilon_N = \frac{N}{2}(1 - G_N)$. See Section 4.3 for further details.

5. Conclusions
In this paper, we asked the question of how to model everyday measurements of a macroscopic
system within quantum mechanics. We introduced the notion of Macroscopic Quantum Measurement
and argued that such a measurement should be highly non-invasive, collect a large amount of
information in a single shot and be described by a “fairly simple” coupling between system and
observer. We proposed a concrete model based on a pointer von Neumann measurement inspired
by the Arthurs–Kelly model, where a pointer is coupled to the macroscopic quantum system through
a Hamiltonian and then measured. This approach applies to many situations, as long as a natural
Hamiltonian for the measured system can be found.
Here, we focused on the problem of a direction estimation. The Hamiltonian naturally couples
the spin of the macroscopic quantum state to the position of a pointer in three dimensions, which
is then measured. This reveals information about the initial direction of the state. We extended our
previous study to consider a collection of aligned spins, which exploits the non-monotonic behavior of
the mean score as a function of the coupling strength. We presented more precise results. We relaxed
the assumptions about the measured system, by considering a thermal state of finite temperature
and showed that our initial conclusions are still valid. We also relaxed the assumptions over the
measurement scheme, looking at its approximation by a repetition of ultra weak measurements
in several orthogonal directions. Here again, we obtained numerical results supporting the initial
conclusion. In summary, this MQM proposal tolerates several relaxations regarding lack of control
or knowledge.
It is likely that these two relaxations can be unified: polarization measurement of systems with
an unknown number of particles or an unknown temperature should be accessible via the repeated 1D
ultra-weak measurement method. However, this claim has to be justified numerically. Further open questions
include the behavior of Arthurs–Kelly models in other situations where two or more non-commuting
quantities have to be estimated, e.g., for position and velocity estimation.

Acknowledgments: We would like to thank Tomer Barnea for fruitful discussions. Partial financial support by
ERC-AG MEC and Swiss NSF is gratefully acknowledged.
Author Contributions: Nicolas Gisin suggested the study. Marc-Olivier Renou and Florian Fröwis performed
the simulations and worked out the theory. Marc-Olivier Renou wrote the paper. All authors discussed the
results and implications and commented on the manuscript at all stages. All authors have read and approved the
final manuscript.
Conflicts of Interest: The authors declare no conflict of interest.

References and Notes

1. Massar, S.; Popescu, S. Optimal Extraction of Information from Finite Quantum Ensembles. Phys. Rev. Lett.
1995, 74, 1259–1263.
2. Gisin, N.; Popescu, S. Spin Flips and Quantum Information for Antiparallel Spins. Phys. Rev. Lett. 1999,
83, 432–435.
3. Chiribella, G.; D’Ariano, G.M. Maximum likelihood estimation for a group of physical transformations.
Int. J. Quantum Inf. 2006, 4, 453–472.
4. Scarani, V.; Iblisdir, S.; Gisin, N.; Acín, A. Quantum cloning. Rev. Mod. Phys. 2005, 77, 1225–1256.
5. Cirac, J.; Ekert, A.; Macchiavello, C. Optimal purification of single qubits. Phys. Rev. Lett. 1999, 82, 4344.
6. Bagan, E.; Monras, A.; Muñoz Tapia, R. Comprehensive analysis of quantum pure-state estimation for
two-level systems. Phys. Rev. A 2005, 71, 062318.
7. Bagan, E.; Ballester, M.A.; Gill, R.D.; Monras, A.; Muñoz-Tapia, R. Optimal full estimation of qubit mixed
states. Phys. Rev. A 2006, 73, 032301.
8. Sacchi, M.F. Information-disturbance tradeoff for spin coherent state estimation. Phys. Rev. A 2007, 75, 012306.
9. Gendra, B.; Ronco-Bonvehi, E.; Calsamiglia, J.; Muñoz-Tapia, R.; Bagan, E. Optimal parameter estimation
with a fixed rate of abstention. Phys. Rev. A 2013, 88, 012128.
10. Latorre, J.; Pascual, P.; Tarrach, R. Minimal optimal generalized quantum measurement. Phys. Rev. Lett.
1998, 81, 1351.
11. Chiribella, G.; D’Ariano, G.M.; Schlingemann, D.M. How Continuous Quantum Measurements in Finite
Dimensions Are Actually Discrete. Phys. Rev. Lett. 2007, 98, 190403.
12. Barnea, T.; Renou, M.O.; Fröwis, F.; Gisin, N. Macroscopic quantum measurements of non-commuting
observables. Phys. Rev. A 2017, 96, 012111.
13. Poulin, D. Macroscopic observables. Phys. Rev. A 2005, 71, 022102.
14. D’Ariano, G.; Presti, P.L.; Sacchi, M. A quantum measurement of the spin direction. Phys. Lett. A 2002,
292, 233–237.
15. Holevo, A. Probabilistic and Statistical Aspects of Quantum Theory; Edizioni della Normale: Pisa, Italy, 1982.

16. Often, the considered score is $F = \int dr\,du\; p(r|u)\,f(u, v_r)$, where $f(u, v_r) = |\langle u|v_r\rangle|^2$ can be seen as
the fidelity between qubits $|u\rangle$ and $|v_r\rangle$, where a unit vector is associated with the corresponding qubit
via the Bloch sphere identification. As $F = \frac{1}{2}(1 + G)$, this is equivalent. We chose this formulation for
practical reasons.
17. Arthurs, E.; Kelly, J.L. On the Simultaneous Measurement of a Pair of Conjugate Observables. Bell Syst. Tech. J.
1965, 44, 725–729.
18. Pal, R.; Ghosh, S. Approximate joint measurement of qubit observables through an Arthur–Kelly model.
J. Phys. A Math. Theor. 2011, 44, 485303.
19. Levine, R.Y.; Tucci, R.R. On the simultaneous measurement of spin components using spin-1/2 meters:
Naimark embedding and projections. Found. Phys. 1989, 19, 175–187.
20. Von Neumann, J. Mathematical Foundations of Quantum Mechanics; Princeton University Press: Princeton, NJ,
USA, 1955.
21. Busch, P.; Lahti, P.J.; Mittelstaedt, P. The Quantum Theory of Measurement; Springer: Berlin/Heidelberg,
Germany, 1991; pp. 27–98.
22. Peres, A. Quantum Theory: Concepts and Methods; Kluwer Academic Publishers: Norwell, MA, USA, 2002.
23. This method is equivalent to the one previously used in [12]. There, $\langle z|^{\otimes N}\,\Omega_r\,|z\rangle^{\otimes N}$ is lower bounded by
$|\langle z|^{\otimes N} E_r |r\rangle^{\otimes N}|^2$, but as the Kraus operator is diagonal, this last term is nothing else than $o_J\,|\langle z|^{\otimes N}|r\rangle^{\otimes N}|^2$.
24. We can interpret this physically. We see from Section 2.3 that the best covariant measurement is obtained
from $O_z \propto |J,J\rangle\langle J,J|$. Other covariant measurements can be obtained with $O_z \propto |m,J\rangle\langle m,J|$ for $0 \leq m < J$.
The coefficients $o_m$ can be interpreted as how much each of these measurements is done. The term $|m,J\rangle\langle m,J|$
can also be thought of as the physical system used to measure. When it is highly polarized (m = J), the
measurement is efficient. However, when the polarization is low, the information gain is weak, e.g., for m = 0,
where we clearly see that all POVM elements are $\propto \mathbb{1}$.
25. Hacohen-Gourgy, S.; Martin, L.S.; Flurin, E.; Ramasesh, V.V.; Whaley, K.B.; Siddiqi, I. Quantum dynamics of
simultaneously measured non-commuting observables. Nature 2016, 538, 491–494.

entropy
Article
Iterant Algebra
Louis H. Kauffman
Department of Mathematics, Statistics and Computer Science, University of Illinois at Chicago,
851 South Morgan Street, Chicago, IL 60607-7045, USA; [email protected]; Tel.: +1-773-363-5115

Received: 29 May 2017; Accepted: 5 July 2017; Published: 11 July 2017

Abstract: We give an exposition of iterant algebra, a generalization of matrix algebra that is motivated
by the structure of measurement for discrete processes. We show how Clifford algebras and matrix
algebras arise naturally from iterants, and we then use this point of view to discuss the Schrödinger
and Dirac equations, Majorana Fermions, representations of the braid group and the framed braids
in relation to the structure of the Standard Model for physics.

Keywords: iterant; Clifford algebra; matrix algebra; braid group; Fermion; Dirac equation

1. Introduction
This is a paper about an approach to algebra that we call iterants. The idea behind the definition
of iterant (see Section 2) is that one is studying a periodic discrete process with an associated action
of a group of permutations on the sequences of the process. The simplest such discrete system is
an alternation between +1 and −1. We will show that this system gives rise in a natural way to
the square root of minus one. This way of thinking about the square root of minus one as an iterant
is explained below. More generally, by starting with a discrete time series of positions, one has a
non-commutativity of observations due to time-delays (the clock must tick to measure a velocity) and
this non-commutativity can be encapsulated in a generalized iterant algebra as defined in Section 3 of
the present paper. Iterant algebra generalizes matrix algebra and we shall see how it can be used to
formulate the algebra of the framed Artin Braid Group, the Lie algebra su(3) for the Standard Model
for particle physics, the framed braid representations for Fermions of Sundance Bilson-Thompson,
the Clifford algebra for Majorana Fermions and the structure of the Schrödinger and Dirac equations.
This paper is a sequel to [1] and it uses material from that paper and extends it into the more general
context of the present paper. See also [1–4] for previous work by the author about iterants. This
paper also incorporates results of the author that appear in the joint paper of the author and Rukhsan
Ul-Haq [5]. Our intent is to give a picture of the range of application of the basic mathematical idea of
iterants and to include a description of the basic results that make them work.
This paper is organized as follows. Sections 2–4 are devoted to the mathematics of iterants. Each
remaining section of the paper applies the iterant structure to a topic in mathematical physics that is of
interest to the author. We hope that the reader finds the first few sections to be a readable introduction
to iterants. An interested reader can then turn to the remaining sections to see how iterants can be
used in specific cases. The reader should note that applying iterants often means reformulating a
topic usually written in matrix algebra in terms of iterant algebra, and that the specific interest in such a
formulation may be, at this time, of a formal nature. Nevertheless, the reformulation often raises many
interesting questions, and these will be the subject of subsequent work.
Sections 2 and 3 are an introduction to the process algebra of iterants and how the square root of
minus one arises from an alternating process. Section 4 shows how iterants give an alternative way to
do matrix algebra. The section ends with the construction of the split quaternions. Section 4 considers
iterants of arbitrary period (not just two) and shows, with the example of the cyclic group, how the
ring of all n × n matrices can be seen as a faithful representation of an iterant algebra based on the

Entropy 2017, 19, 347; doi:10.3390/e19070347; www.mdpi.com/journal/entropy

cyclic group of order n. We then generalize this construction (Theorem 1) to arbitrary non-commutative
ﬁnite groups G. Such a group has a multiplication table (n × n where n is the order of the group G).
We show that, by rearranging the multiplication table so that the identity elements appear on the
diagonal, we get a set of permutation matrices that represent the group faithfully as n × n matrices.
This gives a faithful representation of the iterant algebra associated with the group G onto the ring of
n × n matrices. As a result, we see that iterant algebra is fundamental to matrix algebra. Section 4 ends
with a number of classical examples including iterant representations for quaternion algebra.
Section 5 is a discussion of the Schrödinger equation. We formulate a discrete model related to
the diffusion equation by following a heuristic that would identify the square root of minus one as a
controlled oscillation between plus one and minus one. The resulting discrete model has the equation
(compare with [1])

$$\psi(x, t+\tau) = \frac{(-1)^{n(t)}}{2}\,\psi(x-\Delta, t) + \big(1 - (-1)^{n(t)}\big)\,\psi(x, t) + \frac{(-1)^{n(t)}}{2}\,\psi(x+\Delta, t)$$

and satisfies a discrete version of the diffusion equation with an extra coefficient of $(-1)^{n(t)}$, where
$n(t)$ denotes the number of time steps τ needed to reach time t. By dividing this discrete system into its
even and odd parts (the parity of $(-1)^{n(t)}$), we retrieve the Schrödinger equation, and the formalism of
the complex numbers handles the parity. In the discrete model, the iterant structure appears directly.
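One step of the discrete model can be written out explicitly. The sketch below uses a periodic lattice, which is an assumption made only to keep the example self-contained, and a hypothetical function name. It shows that the update is equivalent to adding (−1)^{n(t)}/2 times the discrete second difference in x, i.e., a diffusion step whose sign alternates with the parity of the time step.

```python
import numpy as np


def step(psi: np.ndarray, n: int) -> np.ndarray:
    """One time step of the discrete model; n is the number of steps taken so far."""
    sign = (-1) ** n
    left = np.roll(psi, 1)    # psi(x - Delta, t) on a periodic lattice (assumption)
    right = np.roll(psi, -1)  # psi(x + Delta, t)
    return 0.5 * sign * left + (1 - sign) * psi + 0.5 * sign * right


if __name__ == "__main__":
    psi = np.zeros(9)
    psi[4] = 1.0  # initial spike in the middle of the lattice
    for n in range(4):
        psi = step(psi, n)
        print(n, np.round(psi, 3))
```

Constant fields are left unchanged by the update, since the three coefficients sum to one for either parity.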
Section 6 discusses the iterant structure of the framed Artin braid group via framed braids and
discusses the basics of the Sundance Bilson-Thompson model for elementary particles. In Section 7,
we apply this to a formulation of the particle model of Sundance Bilson-Thompson [6], using
framed braids.
In Section 7, we give an iterant interpretation of the su(3) Lie algebra for the Standard Model
using [7]. The resulting formulation of the su(3) Lie algebra is particularly elegant from our point
of view, and we expect it to give further insight into the standard model. This iterant formulation
of the su(3) Lie Algebra is so concise that we show it here in the Introduction. We use the specific
iterant formulas
$$T_+ = [1, 0, 0]A, \quad T_- = [0, 1, 0]B,$$
$$U_+ = [0, 1, 0]A, \quad U_- = [0, 0, 1]B,$$
$$V_+ = [0, 0, 1]A, \quad V_- = [1, 0, 0]B,$$
$$T_3 = [1/2, -1/2, 0], \quad Y = \frac{1}{\sqrt{3}}[1, 1, -2].$$
We have the permutation relations A[ x, y, z] = [y, z, x ] A and B = A2 = A−1 so that B[ x, y, z] =
[z, y, x ] B. This reduces the basic su(3) Lie algebra to a very elementary patterning of order three cyclic
operations. The details of this formulation are given in Section 7.
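The relations above can be checked with a minimal Python sketch. It assumes (our choice for illustration, not fixed by the text at this point) that a vector [x, y, z] is realized as the diagonal matrix diag(x, y, z) and that A is the cyclic permutation matrix with ones in positions (1,2), (2,3), (3,1); the helper names are hypothetical.

```python
from fractions import Fraction

def matmul(p, q):
    """Plain 3 x 3 matrix product on nested lists."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def diag(v):
    """Diagonal matrix standing in for the iterant vector v."""
    return [[v[i] if i == j else 0 for j in range(len(v))] for i in range(len(v))]

def commutator(p, q):
    pq, qp = matmul(p, q), matmul(q, p)
    return [[pq[i][j] - qp[i][j] for j in range(len(p))] for i in range(len(p))]

A = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]   # assumed matrix model of the cyclic shift A
B = matmul(A, A)                        # B = A^2

T_plus = matmul(diag([1, 0, 0]), A)     # T_+ = [1, 0, 0] A
T_minus = matmul(diag([0, 1, 0]), B)    # T_- = [0, 1, 0] B
T3 = diag([Fraction(1, 2), Fraction(-1, 2), 0])

x, y, z = 2, 3, 5
shift_A_ok = matmul(A, diag([x, y, z])) == matmul(diag([y, z, x]), A)
shift_B_ok = matmul(B, diag([x, y, z])) == matmul(diag([z, x, y]), B)
order_three_ok = matmul(A, B) == diag([1, 1, 1])
su2_ok = commutator(T_plus, T_minus) == [[2 * e for e in row] for row in T3]
```

In this model T_+ and T_− are the elementary matrices E_12 and E_21, so the check su2_ok is the familiar relation [T_+, T_−] = 2T_3.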
In Section 8 we apply this point of view on the Standard Model to obtain an embedding of the
framed braid algebra for the Sundance Bilson-Thompson model into the iterant version of su(3). These
three sections are an account of research of the author and Rukhsan Ul-Haq in [5].
Section 9 discusses how Clifford algebras are related to the structure of Fermions. We show how
the algebra of the split quaternions, the very first iterant algebra that appears in relation to the square
root of minus one, is behind the structure of the operator algebra of the electron. The Clifford structure
on two generators describes a pair of Majorana Fermion operators. Majorana Fermions are particles
that are their own antiparticles. These Majorana Fermion operators correspond to Clifford algebra
generators a and b such that a^2 = b^2 = 1 and ab = −ba. Using our iterant formulation, we can take a as
the iterant corresponding to a period two oscillation, and b as the time shifting operator. The product ab
is a square root of minus one in the non-commutative context of this Clifford algebra. The annihilation
operator for an electron can be symbolized by φ = ( a + ib)/2 and the creation operator for an electron
by φ† = ( a − ib)/2. These form the operator algebra for an electron. Note that

Entropy 2017, 19, 347

φ^2 = (a + ib)(a + ib)/4 = (a^2 − b^2 + i(ab + ba))/4 = (0 + i0)/4 = 0 = (φ†)^2

and therefore
φφ† + φ†φ = (φ + φ†)^2 = a^2 = 1.

The electron is seen in terms of its underlying Clifford structure in the form of a pair of Majorana
Fermions. Section 9 shows how braiding is related to the Majorana Fermions.
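The operator identities above can be verified numerically. The following Python sketch assumes the 2 × 2 matrix model in which a is the diagonal matrix of the iterant [1, −1] and b is the interchange matrix for the shift; these concrete matrices and the helper names are illustrative assumptions.

```python
def matmul(p, q):
    """2 x 2 matrix product on nested lists (entries may be complex)."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def add(p, q):
    return [[p[i][j] + q[i][j] for j in range(2)] for i in range(2)]

def scale(c, p):
    return [[c * p[i][j] for j in range(2)] for i in range(2)]

a = [[1, 0], [0, -1]]   # assumed model of the oscillation [1, -1]
b = [[0, 1], [1, 0]]    # assumed model of the time shift

phi = scale(0.5, add(a, scale(1j, b)))        # annihilation operator (a + ib)/2
phi_dag = scale(0.5, add(a, scale(-1j, b)))   # creation operator (a - ib)/2

zero = [[0, 0], [0, 0]]
one = [[1, 0], [0, 1]]

anticommute_ok = add(matmul(a, b), matmul(b, a)) == zero
nilpotent_ok = matmul(phi, phi) == zero and matmul(phi_dag, phi_dag) == zero
anticommutator_ok = add(matmul(phi, phi_dag), matmul(phi_dag, phi)) == one
```

All entries are multiples of 1/4, so the floating-point comparisons here are exact.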
Section 10 discusses the structure of the Dirac equation and how the nilpotent and the Majorana
operators arise naturally in this context. This section provides a link between our work and the
work on nilpotent structures and the Dirac equation of Peter Rowlands [8]. We end this section
with an expression in split quaternions for the Majorana–Dirac equation in (3 + 1) spacetime.
The Majorana–Dirac equation can be written as follows:

(∂/∂t + η̂η ∂/∂x + ε ∂/∂y + ε̂η ∂/∂z − ε̂η̂η m)ψ = 0,

where η and ε are the generators of our simplest iterant algebra with η^2 = ε^2 = 1 and ηε + εη = 0.
The elements ε̂, η̂ form a commuting copy of this algebra. This use of a combination of the simplest
Clifford algebra with itself is the underlying structure of the Majorana–Dirac equation.
We give a specific real solution to the Majorana–Dirac equation in our iterant/Clifford algebra
formalism. Here, ρ(x, t) = e^{(p•x − Et)}, where p = (p_x, p_y, p_z) is a constant vector momentum, and x
denotes the vector (x, y, z). The solution to the Majorana–Dirac equation is Γ̂ρ(x, t) as shown below:

Γ̂ρ(x, t) = (−E − η̂η p_x − ε p_y − ε̂η p_z + ε̂η̂η m)ρ(x, t).

This solution is real in the sense that its coordinates are all real valued functions once the iterant
or matrix forms for the operators are made explicit. The combination of iterant and Clifford algebra
language that we develop here makes the analysis of certain aspects of the Dirac equation and the
Majorana–Dirac equation very clear. More work needs to be done on all these fronts.
This paper is a snapshot of a larger story. Iterant algebra is a basically simple reformulation
of aspects of patterned algebra that can often illuminate correspondingly elementary topics in
mathematics and physics. The present work is a beginning in the larger enterprise of understanding
relationships in discrete physics and relationships between algebra and physics.

2. Iterants
An iterant is a sum of elements of the form

aσ = [a_1, a_2, ..., a_n]σ,

where a = [a_1, a_2, ..., a_n] is a vector of elements that are scalars (usually real or complex numbers) and
σ is a permutation on n letters. Vectors are added and multiplied coordinatewise (see below), and we
take the following rule for multiplication of vector/permutation combinations:

(aσ)(bτ) = (a b^σ)στ,

where b^σ denotes the vector b with its elements permuted by the action of σ.
If a and b are vectors, then ab denotes the vector, where (ab)_i = a_i b_i, and a + b denotes the vector
where (a + b)_i = a_i + b_i. Then,
(ka)σ = k(aσ)
for a scalar k, and
(a + b)σ = aσ + bσ,


where vectors are multiplied as above and we take the usual product of the permutations. All of matrix
algebra is naturally represented in the iterant framework, as we shall see in the next sections.
For example, if η is the order two permutation of two elements, then [a, b]^η = [b, a]. Thus,

[a, b]η = η[a, b]^η = η[b, a].

We define
i = [1, −1]η

and then

i^2 = [1, −1]η[1, −1]η = [1, −1][−1, 1]η^2 = [1, −1][−1, 1] = [−1, −1] = −1.

In this way, the complex numbers arise naturally from iterants. One can interpret [1, −1] as an
oscillation between +1 and −1 and η as a temporal shift operator. Then, i = [1, −1]η is a time sensitive
element and its self-interaction has square minus one. In this way, iterants can be interpreted as a
formalization of elementary discrete processes.
Note that if we let e = [1, −1], then e^2 = 1, η^2 = 1 and eη = −ηe. Thus, e and η generate a small
Clifford algebra.
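The multiplication rule (aσ)(bτ) = (a b^σ)στ for period two can be sketched in a few lines of Python. The representation below (a dictionary from the power of η to a pair of numbers) and the function names are assumptions made for illustration only.

```python
def mul(u, v):
    """Product of period-two iterants.

    An iterant is a dict {0: (a, b), 1: (c, d)} standing for [a, b] + [c, d]η.
    """
    out = {0: (0, 0), 1: (0, 0)}
    for s, a in u.items():
        for t, b in v.items():
            bs = b[::-1] if s == 1 else b   # b permuted by η^s
            k = (s + t) % 2                 # η^s η^t = η^(s+t), with η^2 = 1
            out[k] = (out[k][0] + a[0] * bs[0], out[k][1] + a[1] * bs[1])
    return out

def scalar(c):
    """The scalar c as the constant iterant [c, c]."""
    return {0: (c, c), 1: (0, 0)}

def neg(u):
    return {k: (-x, -y) for k, (x, y) in u.items()}

e = {0: (1, -1), 1: (0, 0)}    # e = [1, -1]
eta = {0: (0, 0), 1: (1, 1)}   # the shift η
i = mul(e, eta)                # i = [1, -1]η

i_squared = mul(i, i)
```

Here i_squared is the constant iterant [−1, −1], and mul(e, eta) is the negative of mul(eta, e), in agreement with the Clifford relations stated above.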

3. Iterants and Discrete Processes

The primitive idea behind an iterant is a periodic time series or “waveform”

· · · abababababab · · · .

We illustrate with period two. The elements of the waveform can be any mathematically or
empirically well-defined objects. We can regard the ordered pairs [ a, b] and [b, a] as abbreviations for
the waveform or as two points of view about the waveform (a first or b first). We have called [ a, b]
an iterant. Thinking of an iterant as a discrete process, we define a time shift operator η such that
[a, b]η = η[b, a] and η^2 = 1.

Discrete Calculus and the Temporal Shift Operator. If we have a discrete time series X^t, X^{t+1}, X^{t+2}, ···,
then it is convenient to define an operator J so that X^t J = JX^{t+1}, and it is this temporal shift operator
that can be used to correlate discrete calculus for the time series. For example, we can define a discrete
derivative D by the equation
DX^t = J(X^{t+1} − X^t)/Δt,

(with time increment equal to Δt). Note then that the derivative is expressed as a commutator:

DX^t = J(X^{t+1} − X^t)/Δt = (JX^{t+1} − JX^t)/Δt = (X^t J − JX^t)/Δt = [X^t, J/Δt],

where here [ R, S] = RS − SR is the commutator. This means that this discrete derivative satisfies the
Leibniz rule for products, and it can be used for formulations of discrete physics. This use of the
temporal shift operator dovetails with its use for keeping track of observation in a discrete model,
where successive observations require temporal shifts. In particular, let P = mDX^t and Q = X^t denote
momentum and position, respectively (m is mass and commutes with J, as does Δt). Then, PQ and QP
do not commute and the temporal shift operator J keeps track of the fact that measuring momentum
requires a tick of the clock. We can interpret PQ as first measuring Q and then measuring P, while QP
represents first measuring P and then measuring Q:

PQ = (mDX^t)(X^t) = (mJ(X^{t+1} − X^t)/Δt)X^t = mJ(X^{t+1} − X^t)X^t/Δt,

QP = X^t(mJ(X^{t+1} − X^t)/Δt) = mJX^{t+1}(X^{t+1} − X^t)/Δt.


Thus,
[Q, P] = QP − PQ = mJ(X^{t+1} − X^t)^2/Δt = mJ(ΔX)^2/Δt.
In this form of discrete physics, the commutator equation

[Q, P] = k,

where k is a constant, is satisfied by a Brownian walk with diffusion constant (ΔX)^2/Δt. In this way,
our interpretation of the square root of negative one in terms of the temporal shift operator fits into a
larger context of the physics of discrete observations. In this paper, we work with periodic series and
use cyclic operators such as η to keep track of the periodicity. For related discussion, see [2,3,5,9–16].
See also [17] for other uses of iterants in the context of Clifford algebras. For papers of the author about
discrete physics and quantum computing see [18–28].
We have defined products and sums of iterants as follows

[ a, b][c, d] = [ ac, bd]

and
[ a, b] + [c, d] = [ a + c, b + d].
The operation of juxtaposition of waveforms is multiplication while + denotes ordinary addition of
ordered pairs. These operations are natural with respect to the structural juxtaposition of iterants:

...abababababab...

...cdcdcdcdcdcd...

Structures combine at the points where they correspond. Waveforms combine at the times where
they correspond. Iterants combine in juxtaposition. This theme of including the result of time in
observations of a discrete system occurs at the foundation of our construction.
In the next section, we show how all matrix algebra can be formulated in terms of iterants.

4. Matrix Algebra via Iterants

Here is a direct translation of period-two iterants into 2 × 2 matrices. Let

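[a, b] + [c, d]η = \begin{pmatrix} a & c \\ d & b \end{pmatrix},

where
[x, y] = \begin{pmatrix} x & 0 \\ 0 & y \end{pmatrix},

and
η = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.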

The reader will have no difficulty verifying that the usual definition of matrix multiplication
corresponds exactly to the iterant multiplication that we have already described. In particular,

[x, y][z, w] = [xz, yw]

and
[x, y] + [z, w] = [x + z, y + w]


are the rules of matrix multiplication and addition for diagonal matrices, as is

[ x, y]η = η [y, x ].

Thus, matrix multiplication and addition is identical with iterant multiplication. There are many ways
to motivate the rules for matrix algebra. Iterants are a natural entry into matrix structure.
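A short Python sketch can confirm that the translation is multiplicative. It stores [a, b] + [c, d]η as the pair ((a, b), (c, d)) and uses the matrix with rows (a, c) and (d, b); the function names and the sample integers are illustrative assumptions.

```python
def mul(u, v):
    """Iterant product of u = [a, b] + [c, d]η and v = [p, q] + [r, s]η."""
    (a, b), (c, d) = u
    (p, q), (r, s) = v
    # Collect terms using [x, y]η = η[y, x] and η^2 = 1.
    return ((a * p + c * s, b * q + d * r), (a * r + c * q, b * s + d * p))

def to_matrix(u):
    """The 2 x 2 matrix with rows (a, c) and (d, b)."""
    (a, b), (c, d) = u
    return [[a, c], [d, b]]

def matmul(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

u = ((2, 3), (5, 7))
v = ((11, 13), (17, 19))

multiplicative = to_matrix(mul(u, v)) == matmul(to_matrix(u), to_matrix(v))

i = ((0, 0), (1, -1))   # i = [1, -1]η
i_squared = mul(i, i)   # expected to be the constant iterant [-1, -1]
```

The same check can be repeated with any other integers for u and v, since the identity is purely algebraic.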
The fact that the iterant expression [ a, d]1 + [b, c]η captures the whole of 2 × 2 matrix algebra
corresponds to the fact that a two by two matrix is combinatorially the union of the identity pattern
(the diagonal) and the interchange pattern (the antidiagonal) that correspond to the operators 1 and η :

\begin{pmatrix} ∗ & @ \\ @ & ∗ \end{pmatrix}.

In the formal diagram for a matrix shown above, we indicate the diagonal by ∗ and the anti-diagonal by @.
In the case of complex numbers, we represent

\begin{pmatrix} a & −b \\ b & a \end{pmatrix} = [a, a] + [−b, b]η = a1 + b[−1, 1]η = a + bi,

where, for this matrix form, i is taken to be [−1, 1]η, which, like [1, −1]η, squares to minus one.
In this way, we see that 2 × 2 matrix algebra can be seen as a hypercomplex number system based on
the symmetric group S_2. In the next section, we generalize this point of view to arbitrary finite groups
by generalizing Cayley’s Theorem that shows that every finite group has a faithful representation as a
permutation group.
The factorization of i into a product εη of non-commuting iterant operators shows, in the iterant
viewpoint, the temporal nature of i and its algebraic roots.
Note that the quaternions arise from the split quaternions: The split quaternions are the system

{±1, ±ε, ±η, ±i}.

Here, εε = 1 = ηη while i = εη so that ii = −1. The quaternions come about once we construct an
extra square root of minus one that commutes with them. Call this extra root of minus one √−1. Then,
the quaternions are generated by

I = √−1 ε, J = εη, K = √−1 η

with
I^2 = J^2 = K^2 = IJK = −1.

Remark 1. The rest of this section is an exposition of the higher period iterants and the general Theorem 1 about
finite groups and iterant matrix representations. The exposition follows the corresponding exposition in our
paper [1].

4.1. Iterants of Arbitrarily High Period and General Matrix Algebras

Consider a waveform of period three.

· · · abcabcabcabcabcabc · · ·

Here, we see three natural iterant views (depending upon whether one starts at a, b or c).

[ a, b, c], [b, c, a], [c, a, b].


The appropriate shift operator is given by the cyclic permutation S:

[x, y, z]S = S[z, x, y].

With T = S^2, we have
[x, y, z]T = T[y, z, x]
and S^3 = 1. We obtain a closed algebra of iterants whose general element is of the form

[a, b, c] + [d, e, f]S + [g, h, k]S^2,

where a, b, c, d, e, f, g, h, k are real or complex numbers. Call this algebra Vect_3(F) when the scalars are
in a commutative ring with unit F. Let M_3(F) denote the 3 × 3 matrix algebra over F. We have the:

Lemma 1. The iterant algebra Vect_3(F) is isomorphic to the full 3 × 3 matrix algebra M_3(F).

Proof.
[a, b, c] + [d, e, f]S + [g, h, k]S^2
maps to the matrix
\begin{pmatrix} a & d & g \\ h & b & e \\ f & k & c \end{pmatrix},
preserving the algebra structure. Since any 3 × 3 matrix can be written uniquely in this form, it follows
that Vect_3(F) is isomorphic to the full 3 × 3 matrix algebra M_3(F).
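The map in the proof can be spot-checked in Python. The sketch assumes that S is the permutation matrix with ones at positions (1,2), (2,3), (3,1), which is the choice compatible with the displayed matrix; the helper names are hypothetical.

```python
def matmul(p, q):
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def diag(v):
    return [[v[i] if i == j else 0 for j in range(len(v))] for i in range(len(v))]

def madd(*ms):
    """Entrywise sum of several square matrices."""
    n = len(ms[0])
    return [[sum(m[i][j] for m in ms) for j in range(n)] for i in range(n)]

S = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]   # assumed matrix for the shift S
S2 = matmul(S, S)

a, b, c, d, e, f, g, h, k = range(1, 10)
M = madd(diag([a, b, c]),
         matmul(diag([d, e, f]), S),
         matmul(diag([g, h, k]), S2))

x, y, z = 2, 3, 5
shift_rule_ok = matmul(diag([x, y, z]), S) == matmul(S, diag([z, x, y]))
```

With the values 1 through 9 assigned to a through k, M reproduces the entry pattern of the matrix in the proof, and shift_rule_ok confirms [x, y, z]S = S[z, x, y] in this model.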

We can summarize the pattern behind this expression of 3 × 3 matrices by the following
symbolic matrix:
\begin{pmatrix} 1 & S & T \\ T & 1 & S \\ S & T & 1 \end{pmatrix}.
Here, the letter S occupies the positions in the matrix that correspond to the permutation matrix
that represents it, and the letter T = S^2 occupies the positions corresponding to its permutation
matrix. The 1s occupy the diagonal for the corresponding identity matrix. The iterant representation
corresponds to writing the 3 × 3 matrix as a disjoint sum of these permutation matrices such that the
matrices themselves are closed under multiplication. In this case, the matrices form a permutation
representation of the cyclic group of order 3, C_3 = {1, S, S^2}.

Remark 2. Note that a permutation matrix is a matrix of zeroes and ones such that some permutation of the
rows of the matrix transforms it to the identity matrix. Given an n × n permutation matrix P, we associate to it
a permutation

σ ( P) : {1, 2, · · · , n} −→ {1, 2, · · · , n}

via the following formula

iσ( P) = j,

where j denotes the column in P where the i-th row has a 1. Note that an element of the domain of a permutation
is indicated to the left of the symbol for the permutation. It is then easy to check that for permutation matrices
P and Q,
σ( P)σ( Q) = σ( PQ),

given that we compose the permutations from left to right according to this convention.


This construction generalizes directly for iterants of any period and hence for a set of operators
forming a cyclic group of any order. We shall generalize further to any finite group G. We now define
Vectn ( G, F) for any finite group G.

Definition 1. Let G be a finite group, written multiplicatively. Let F denote a given commutative ring with
unit. Assume that G acts as a group of permutations on the set {1, 2, 3, ···, n} so that given an element g ∈ G
we have (by abuse of notation)

g : {1, 2, 3, ···, n} −→ {1, 2, 3, ···, n}.

We shall write
ig

for the image of i ∈ {1, 2, 3, ···, n} under the permutation represented by g. The notation denotes functionality
from the left. We have (ig)h = i(gh) for all elements g, h ∈ G and i1 = i for all i, in order to have a
representation of G as permutations. We shall call an n-tuple of elements of F a vector and denote it by
a = (a_1, a_2, ···, a_n). We then define an action of G on vectors over F by the formula

a^g = (a_{1g}, a_{2g}, ···, a_{ng}),

and note that (a^g)^h = a^{gh} for all g, h ∈ G. Define an algebra Vect_n(G, F), the iterant algebra for G, to be the
set of finite sums of formal products of vectors and group elements in the form ag with multiplication rule

(ag)(bh) = a b^g (gh),

and the understanding that (a + b)g = ag + bg for all vectors a, b and group elements g. It is understood
that vectors are added coordinatewise and multiplied coordinatewise. Thus, (a + b)_i = a_i + b_i and (ab)_i = a_i b_i.

Theorem 1. Let G be a finite group of order n [1]. Let ρ : G −→ S_n denote the right regular representation of G
as permutations of n objects. List the elements of G as G = {g_1, ···, g_n}, and let G act on its own underlying
set via the definition g_i ρ(g) = g_i g. Here, we describe ρ(g) acting on the set of elements g_k of G. We also regard
ρ(g) as a mapping of the set {1, 2, ···, n}, replacing g_k by k, so that iρ(g) = k where g_i g = g_k.
Then, Vect_n(G, F) is isomorphic to the matrix algebra M_n(F). In particular, Vect_{n!}(S_n, F) is isomorphic
with the matrices of size n! × n!, M_{n!}(F).

Proof. Take the multiplication table for G to be the n × n matrix with columns and rows listed in
the order [g_1, ···, g_n]. Permute the rows of this matrix so that the diagonal consists of all 1s. Let the
resulting matrix be called the G-Table. The G-Table is labeled by elements of the group. For any vector
a, let D(a) denote the n × n diagonal matrix whose entries in order down the diagonal are the entries
of a in the order specified by a. For each group element g, let P_g denote the permutation matrix with 1
in every spot on the G-Table that is labeled by g and 0 in all other spots. It is now a direct verification
that the mapping
F(Σ_{i=1}^{n} a_i g_i) = Σ_{i=1}^{n} D(a_i) P_{g_i}

defines an isomorphism from Vect_n(G, F) to the matrix algebra M_n(F). The main point to check is
that σ(P_g) = ρ(g). We now prove this fact.
In the G-Table, the rows correspond to {g_1^{−1}, g_2^{−1}, ···, g_n^{−1}} and the columns correspond to
{g_1, g_2, ···, g_n} so that the i-i entry of the table is g_i^{−1} g_i = 1. With this, we have that, in the table,
a group element g occurs in the i-th row at column j where g_i^{−1} g_j = g. This is equivalent to the
equation g_i g = g_j which, in turn, is equivalent to the statement iρ(g) = j. This is exactly our functional
interpretation of the action of the permutation corresponding to the matrix P_g. Thus, ρ(g) = σ(P_g).
The rest of the proof is straightforward and left to the reader.
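The G-Table construction in the proof can be sketched in Python for a small group. The example below takes G to be the six permutations of three letters, composed left to right; the encoding of group elements as tuples and all function names are assumptions made for illustration.

```python
from itertools import permutations

def compose(p, q):
    """Apply p first, then q (left-to-right convention)."""
    return tuple(q[p[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def matmul(p, q):
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

G = list(permutations(range(3)))   # the group elements g_1, ..., g_n
n = len(G)
identity = tuple(range(3))

# G-Table: row i is labeled by the inverse of g_i, column j by g_j.
g_table = [[compose(inverse(gi), gj) for gj in G] for gi in G]

def perm_matrix(g):
    """P_g has a 1 wherever the G-Table is labeled by g."""
    return [[1 if g_table[i][j] == g else 0 for j in range(n)] for i in range(n)]

diagonal_ok = all(g_table[i][i] == identity for i in range(n))
covers_ok = all(sum(perm_matrix(g)[i][j] for g in G) == 1
                for i in range(n) for j in range(n))
homomorphism_ok = all(matmul(perm_matrix(g), perm_matrix(h)) == perm_matrix(compose(g, h))
                      for g in G for h in G)
```

The three checks correspond to the diagonal of 1s, the disjoint decomposition of the n × n pattern into permutation matrices, and the fact that g ↦ P_g respects multiplication.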


Example 1.

1. Consider the cyclic group of order three,

C_3 = {1, S, S^2}

with S^3 = 1. The multiplication table is

\begin{pmatrix} 1 & S & S^2 \\ S & S^2 & 1 \\ S^2 & 1 & S \end{pmatrix}.

Interchanging the second and third rows, we obtain

\begin{pmatrix} 1 & S & S^2 \\ S^2 & 1 & S \\ S & S^2 & 1 \end{pmatrix},

and this is the G-Table that we used for Vect_3(C_3, F) prior to proving the Main Theorem. The same
pattern works for arbitrary cyclic groups.

2. Consider the symmetric group on three letters, which has six elements,

S_3 = {1, R, R^2, F, RF, R^2F},

where R^3 = 1, F^2 = 1, FR = R^2F. Then, the multiplication table is

\begin{pmatrix}
1 & R & R^2 & F & RF & R^2F \\
R & R^2 & 1 & RF & R^2F & F \\
R^2 & 1 & R & R^2F & F & RF \\
F & R^2F & RF & 1 & R^2 & R \\
RF & F & R^2F & R & 1 & R^2 \\
R^2F & RF & F & R^2 & R & 1
\end{pmatrix}.

The corresponding G-Table is

\begin{pmatrix}
1 & R & R^2 & F & RF & R^2F \\
R^2 & 1 & R & R^2F & F & RF \\
R & R^2 & 1 & RF & R^2F & F \\
F & R^2F & RF & 1 & R^2 & R \\
RF & F & R^2F & R & 1 & R^2 \\
R^2F & RF & F & R^2 & R & 1
\end{pmatrix}.

This G-Table encodes the isomorphism of Vect_6(S_3, F) with the full algebra of six by six matrices. Similarly,
Vect_{n!}(S_n, F) is isomorphic with the full algebra of n! × n! matrices. The permutation matrices are
obtained from the G-Table by choosing a given group element and then replacing it by 1 for each appearance
in the table, and replacing the other elements of the table by 0. For example, we have the permutation
matrix for R given by the formula below:


R = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}.

3. Consider the group G = C_2 × C_2, the “Klein 4-Group”. Take G = {1, A, B, C} where A^2 = B^2 = C^2 =
1, AB = BA = C. G has the multiplication table, which is also its G-Table for Vect_4(G, F):

\begin{pmatrix} 1 & A & B & C \\ A & 1 & C & B \\ B & C & 1 & A \\ C & B & A & 1 \end{pmatrix}.

Thus, we have the corresponding permutation matrices that I shall call E, A, B, C. The reader can verify
that A^2 = B^2 = C^2 = 1, AB = BA = C. Let

α = [1, −1, −1, 1], β = [1, 1, −1, −1], γ = [1, −1, 1, −1].

In addition, let
I = αA, J = βB, K = γC.

Then, it is easy to check that

I^2 = J^2 = K^2 = IJK = −1, IJ = K, JI = −K.

Thus, we have constructed the quaternions as iterants in relation to the Klein 4-Group. In Figure 1, we
illustrate these quaternion generators with string diagrams for the permutations. The reader can check
that the permutations correspond to the permutation matrices constructed for the Klein 4-Group.

Figure 1. Quaternions from the Klein 4-Group. [String diagrams for 1, I, J, K as elements of the Klein Four-Group, with sign vectors + + + +, + − − +, + + − −, + − + −; basic products II = JJ = KK = IJK = −1; the product IJ = K performed as flat framed braid multiplication.]


4. Since complex numbers commute with one another, we could consider iterants whose values are in the
complex numbers. This is just like considering matrices whose entries are complex numbers. Thus, we
shall allow a version of i that commutes with the iterant shift operator η. Let this commuting i be denoted
by ι. Then, we are assuming that

ι^2 = −1,

ηι = ιη,

η^2 = +1.

We then consider iterants of the form [a + bι, c + dι] and [a + bι, c + dι]η = η[c + dι, a + bι]. In particular,
we have ε = [1, −1], and i = εη is quite distinct from ι. Note, as before, that εη = −ηε and that ε^2 = 1.
Now, let

I = ιε,

J = εη,

K = ιη.

We find the quaternions once more:

I^2 = ιειε = ιιεε = (−1)(+1) = −1,

J^2 = εηεη = ε(−ε)ηη = −1,

K^2 = ιηιη = ιιηη = −1,

IJK = ιεεηιη = ι1ιηη = ιι = −1.

Thus,
I^2 = J^2 = K^2 = IJK = −1.

This construction shows how the structure of the quaternions comes directly from the non-commutative
structure of period two iterants. The group SU(2) of 2 × 2 unitary matrices of determinant one is
isomorphic to the quaternions of length one.

5. Similarly,

H = [a, b] + [c + dι, c − dι]η = \begin{pmatrix} a & c + dι \\ c − dι & b \end{pmatrix}

represents a Hermitian 2 × 2 matrix and hence an observable for quantum processes mediated by SU(2).
Hermitian matrices have real eigenvalues.

If in the above Hermitian matrix form, we take a = T + X, b = T − X, c = Y, d = Z, then we obtain an
iterant and/or matrix representation for a point in Minkowski spacetime:

H = [T + X, T − X] + [Y + Zι, Y − Zι]η = \begin{pmatrix} T + X & Y + Zι \\ Y − Zι & T − X \end{pmatrix}.

Note that we have the formula

Det(H) = T^2 − X^2 − Y^2 − Z^2.


It is not hard to see that the eigenvalues of H are T ± √(X^2 + Y^2 + Z^2). Thus, viewed as an observable, H
can observe the time and the invariant spatial distance from the origin of the event ( T, X, Y, Z ). At least
at this very elementary juncture, quantum mechanics and special relativity are reconciled.

6. Hamilton’s Quaternions are generated by iterants, as discussed above, and we can express them purely
algebraically by writing the corresponding permutations as shown below:

I = [+1, −1, −1, +1]s,

J = [+1, +1, −1, −1]l,

K = [+1, −1, +1, −1]t,

where
s = (12)(34),

l = (13)(24),

t = (14)(23).

Here, we represent the permutations as products of transpositions (ij). The transposition (ij) interchanges
i and j, leaving all other elements of {1, 2, ..., n} ﬁxed.
One can verify that
I^2 = J^2 = K^2 = IJK = −1.

Note that making an iterant interpretation of an entity like I = [+1, −1, −1, +1]s is a conceptual
departure from our original period two iterant (or cyclic period n) notion. Now, we consider iterants such
as [+1, −1, −1, +1] where the permutation group acts to produce other orderings of a given sequence.
The iterant itself can represent a form that can be seen in any of its possible orders. These orders are
subject to permutations that produce the possible views of the iterant. Algebraic structures such as the
quaternions appear in the explication of such forms.

7. In all these examples, we can interpret the iterants as shorthand for matrix algebra based on permutation
matrices, or as indicators of discrete processes. The discrete processes become more complex in proportion
to the complexity of the groups used in the construction. We began with processes of order two, then
considered cyclic groups of arbitrary order, then the symmetric group S3 in relation to 6 × 6 matrices,
and the Klein 4-Group in relation to the quaternions. In the case of the quaternions, we know that
this structure is intimately related to rotations of three- and four-dimensional space and many other
geometric themes.
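The quaternion relations of items 3 and 6 can be checked mechanically. The following Python sketch implements the rule (aσ)(bτ) = (a b^σ)στ for single-term iterants of period four, with permutations written as 0-indexed tuples; this encoding and the function names are assumptions made for illustration.

```python
def mul(u, v):
    """Product of single-term iterants u = (a, sigma) and v = (b, tau)."""
    a, sigma = u
    b, tau = v
    vec = tuple(a[i] * b[sigma[i]] for i in range(len(a)))   # a times b permuted by sigma
    perm = tuple(tau[sigma[i]] for i in range(len(a)))       # sigma followed by tau
    return (vec, perm)

def neg(u):
    return (tuple(-x for x in u[0]), u[1])

identity = (0, 1, 2, 3)
s = (1, 0, 3, 2)   # (12)(34)
l = (2, 3, 0, 1)   # (13)(24)
t = (3, 2, 1, 0)   # (14)(23)

I = ((1, -1, -1, 1), s)
J = ((1, 1, -1, -1), l)
K = ((1, -1, 1, -1), t)
minus_one = ((-1, -1, -1, -1), identity)

squares_ok = mul(I, I) == minus_one and mul(J, J) == minus_one and mul(K, K) == minus_one
ijk_ok = mul(mul(I, J), K) == minus_one
ij_ok = mul(I, J) == K and mul(J, I) == neg(K)
```

Because s, l and t commute and square to the identity, all of the non-commutativity of the quaternions in this model comes from the action of the permutations on the sign vectors.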

5. Schrödinger’s Equation
In this section, we go more deeply into a treatment of Schrödinger’s equation that was begun
in the introduction to [1]. In that paper, we used this example for Schrödinger’s equation to
motivate the introduction of iterants. Here, we already have iterants, but we find that a discrete
model for Schrödinger’s equation instantiates an alternating pattern that is essentially of the form
· · · + − + − + − + · · · , and the problem of taking the continuum limit of this discrete model leads to
the complex numbers by a parity consideration. The parity consideration corresponds to our iterant
construction of the square root of minus one, and so we see in this model how the iterant square root
of minus one can correspond to an alternation in a discrete process while the usual square root of
minus one describes the behaviour of the limit of the process.


5.1. Brownian Walks and the Diffusion Equation

Recall how the diffusion equation arises in discussing Brownian motion. We are given a Brownian
process where
x (t + τ ) = x (t) ± Δ,

so that the time step is τ and the space step is of absolute value Δ. We regard the probability of left or
right steps as equal, so that if P( x, t) denotes the probability that the Brownian particle is at point x at
time t, then
P( x, t + τ ) = P( x − Δ, t)/2 + P( x + Δ, t)/2.

From this equation for the probability, we can write a difference equation for the partial derivative of
the probability with respect to time:

(P(x, t + τ) − P(x, t))/τ = (Δ^2/2τ)[(P(x − Δ, t) − 2P(x, t) + P(x + Δ, t))/Δ^2].

The expression in brackets on the right-hand side is a discrete approximation to the second partial of
P(x, t) with respect to x. Thus, if the ratio C = Δ^2/2τ remains constant as the space and time intervals
approach zero, then this equation goes in the limit to the diffusion equation

∂P(x, t)/∂t = C ∂^2 P(x, t)/∂x^2.

C is called the diffusion constant for the Brownian process.
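The probability recursion can be iterated directly. The Python sketch below works on a periodic lattice with unit spacing (so Δ = 1 and τ = 1 in lattice units) and uses exact fractions; the lattice size, the starting point and the number of steps are arbitrary illustrative choices.

```python
from fractions import Fraction

def step(P):
    """One time step of P(x, t + τ) = P(x − Δ, t)/2 + P(x + Δ, t)/2 on a periodic lattice."""
    n = len(P)
    return [(P[(x - 1) % n] + P[(x + 1) % n]) / 2 for x in range(n)]

size, center, steps = 41, 20, 10
P = [Fraction(0)] * size
P[center] = Fraction(1)          # the particle starts at the center with certainty

for _ in range(steps):
    P = step(P)

total = sum(P)                                                   # probability is conserved
mean = sum(p * x for x, p in enumerate(P))                       # stays at the center
variance = sum(p * (x - center) ** 2 for x, p in enumerate(P))   # grows by Δ^2 per step
```

The variance after n steps is nΔ^2 = 2Ct with t = nτ and C = Δ^2/2τ, which is the spreading law of the diffusion equation.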

5.2. An Iterant Interpretation of Schrödinger’s Equation

Recall that Schrödinger’s equation can be regarded as the diffusion equation with an imaginary
diffusion constant. Recall how this works. The Schrödinger equation is

iħ∂ψ/∂t = Hψ,

where the Hamiltonian H is given by the equation H = p^2/2m + V, where V(x, t) is the potential
energy and p = (ħ/i)∂/∂x is the momentum operator. With this, we have p^2/2m = (−ħ^2/2m)∂^2/∂x^2.
Thus, with V(x, t) = 0, the equation becomes iħ∂ψ/∂t = (−ħ^2/2m)∂^2ψ/∂x^2, which simplifies to

∂ψ/∂t = (iħ/2m)∂^2ψ/∂x^2.

Thus, we have arrived at the form of the diffusion equation with an imaginary constant, and it is
possible to make the identification with the diffusion equation by setting

ħ/m = Δ^2/τ,

where Δ denotes a space interval, and τ denotes a time interval as explained in the last section
about the Brownian walk. With this, we can ask what space interval and time interval will satisfy
this relationship. One answer is that this equation is satisfied when m is the Planck mass, Δ is
the Planck length and τ is the Planck time. Note that L^2/T = (ħ/Mc)^2/(ħ/Mc^2) = ħ/M. Here,
ħ is Planck’s constant divided by 2π, c is the speed of light, G is Newton’s gravitational constant, and
M = √(ħc/G), L = ħ/Mc, T = ħ/Mc^2.
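As a numerical illustration of the relation L^2/T = ħ/M, the following Python sketch evaluates the Planck mass, length and time. The numerical constants are standard SI values supplied here as an assumption; they do not appear in the text.

```python
import math

hbar = 1.054571817e-34   # reduced Planck constant in J s (assumed SI value)
c = 299792458.0          # speed of light in m/s
G = 6.67430e-11          # Newton's gravitational constant in m^3 kg^-1 s^-2 (assumed SI value)

M = math.sqrt(hbar * c / G)   # Planck mass
L = hbar / (M * c)            # Planck length
T = hbar / (M * c ** 2)       # Planck time

lhs = L ** 2 / T              # Δ^2/τ with Δ = L and τ = T
rhs = hbar / M                # ħ/m with m = M
```

The two sides agree up to floating-point rounding, since the identity holds algebraically for any positive M.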
What does all this say about the nature of the Schrödinger equation itself? Consider a discrete
function ψ( x, t) deﬁned (recursively) by the following equation:

ψ( x, t + τ ) = (i/2)ψ( x − Δ, t) + (1 − i )ψ( x, t) + (i/2)ψ( x + Δ, t).


In other words, we are thinking here of a random “quantum walk” where the amplitude for stepping
right or stepping left is proportional to i while the amplitude for not moving at all is proportional to
(1 − i ). It is then easy to see that ψ is a discretization of

∂ψ/∂t = (iΔ^2/2τ)∂^2ψ/∂x^2.

Just note that ψ satisfies the difference equation

(ψ(x, t + τ) − ψ(x, t))/τ = (iΔ^2/2τ)(ψ(x − Δ, t) − 2ψ(x, t) + ψ(x + Δ, t))/Δ^2.

This gives a direct interpretation of the solution to the Schrödinger equation as a limit of a sum over
generalized Brownian paths with complex amplitudes.

Replacing i by An Iterant. Now, however, suppose that we replace i by (−1)^{n(t)} at time step t = n(t)τ
where n(t) is a non-negative integer. Instead of writing

ψ(x, t + τ) = (i/2)ψ(x − Δ, t) + (1 − i)ψ(x, t) + (i/2)ψ(x + Δ, t),

we will write

ψ(x, t + τ) = ((−1)^{n(t)}/2)ψ(x − Δ, t) + (1 − (−1)^{n(t)})ψ(x, t) + ((−1)^{n(t)}/2)ψ(x + Δ, t).

Then, we will find that

(ψ(x, t + τ) − ψ(x, t))/τ = (−1)^{n(t)}(Δ^2/2τ)(ψ(x − Δ, t) − 2ψ(x, t) + ψ(x + Δ, t))/Δ^2,

so that the diffusion equation seems to have been replaced with an equation of the form

∂ψ/∂t = ±κ ∂^2ψ/∂x^2.

We wish to consider the continuum limit. However, there is no meaning to

(−1)^{n(t)}

in the realm of continuous time. In the discrete world, the wave function ψ divides into ψ_e and ψ_o
where the (discrete) time, n(t), is either even or odd. We write

∂_t ψ_e = κ ∂_x^2 ψ_o,

∂_t ψ_o = −κ ∂_x^2 ψ_e,

and take the continuum limit of ψ_e and ψ_o separately.

In fact, we can interpret the {±} as the complex number i. We write

ψ = ψ_e + iψ_o,

so that
i∂_t ψ = i∂_t(ψ_e + iψ_o) = i∂_t ψ_e − ∂_t ψ_o

= iκ ∂_x^2 ψ_o + κ ∂_x^2 ψ_e = κ ∂_x^2(ψ_e + iψ_o)

= κ ∂_x^2 ψ.
Thus,
i∂ψ/∂t = κ ∂^2ψ/∂x^2.


This is the Schrödinger equation. Instead of the simple diffusion equation, we have a mutual
dependency where the temporal variation of ψ_e is mediated by the spatial variation of ψ_o
and vice versa:
ψ = ψ_e + iψ_o,

∂_t ψ_e = κ ∂_x^2 ψ_o,

∂_t ψ_o = −κ ∂_x^2 ψ_e,

i∂ψ/∂t = κ ∂^2ψ/∂x^2.

Note that in terms of the iterant interpretation, the pair [ψ_e, ψ_o] is an abbreviation of the temporal
series ···, ψ_t, ψ_{t+τ}, ψ_{t+2τ}, ··· that represents the discrete process ψ_{t+τ}(x) = ((−1)^{n(t)}/2)ψ_t(x − Δ) +
(1 − (−1)^{n(t)})ψ_t(x) + ((−1)^{n(t)}/2)ψ_t(x + Δ). Here, the process itself is not periodic, but the underlying
alternation of the parity of (−1)^{n(t)} gives the iterant structure that allows the use of i as a combination
of shift and permutation.
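The alternating recursion is easy to iterate. The Python sketch below runs it on a small periodic lattice with exact fractions and checks, step by step, the difference equation with the coefficient (−1)^{n(t)}; the lattice, the initial data and the function names are illustrative assumptions, and this is not the approximation scheme deferred to later work.

```python
from fractions import Fraction

def step(psi, n):
    """One step of the recursion with i replaced by (−1)^n, on a periodic lattice."""
    sign = (-1) ** n
    half = Fraction(sign, 2)
    N = len(psi)
    return [half * psi[(x - 1) % N] + (1 - sign) * psi[x] + half * psi[(x + 1) % N]
            for x in range(N)]

def second_difference(psi):
    """psi(x − Δ) − 2 psi(x) + psi(x + Δ) at every lattice point."""
    N = len(psi)
    return [psi[(x - 1) % N] - 2 * psi[x] + psi[(x + 1) % N] for x in range(N)]

psi0 = [Fraction(v) for v in (0, 1, 4, 2, 0, 0, 3, 0)]
history = [psi0]
for n in range(4):
    history.append(step(history[-1], n))

# psi(t + τ) − psi(t) = (−1)^n (1/2) * (second difference), i.e. the displayed
# difference equation after multiplying through by τ.
identity_holds = all(
    [new - old for new, old in zip(history[n + 1], history[n])]
    == [Fraction((-1) ** n, 2) * d for d in second_difference(history[n])]
    for n in range(4)
)
```

Even steps act like a diffusion step and odd steps like its time reversal, which is the alternation that the pair [ψ_e, ψ_o] records.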

Remark 3. The discrete recursion at the beginning of this section can be implemented to approximate solutions
to the Schrödinger equation. This will be the subject of another paper. The main point of this section is that a
discrete version of the Schrödinger equation actually uses the temporal iterant interpretation of the square root
of minus one, so that one can think of this oscillation as part of a discrete process in back of the Schrödinger
evolution. This reformulation of basic quantum mechanics deserves further study.

6. The Framed Braid Group and the Sundance Bilson-Thompson Model for Elementary Particles
The reader should recall that the symmetric group S_n has presentation

S_n = (T_1, ···, T_{n−1} | T_i^2 = 1, T_i T_{i+1} T_i = T_{i+1} T_i T_{i+1}, T_i T_j = T_j T_i; |i − j| > 1).

The Artin Braid Group B_n is a relative of the symmetric group that is obtained by removing the
condition that each generator has a square equal to the identity:

B_n = (σ_1, ···, σ_{n−1} | σ_i σ_{i+1} σ_i = σ_{i+1} σ_i σ_{i+1}, σ_i σ_j = σ_j σ_i; |i − j| > 1).

In Figure 2, we illustrate the generators σ_1, σ_2, σ_3 of the 4-strand braid group and we show the
topological nature of the relation σ_1σ_2σ_1 = σ_2σ_1σ_2 and the commuting relation σ_1σ_3 = σ_3σ_1. Topological
braids are represented as collections of always descending strings, starting from a row of points and
ending at another row of points. The strings are embedded in three-dimensional space and can wind
around one another. The elementary braid generators σ_i correspond to the i-th strand interchanging
with the (i + 1)-th strand. Two braids are multiplied by attaching the bottom endpoints of one braid
to the top endpoints of the other braid to form a new braid.
There is a fundamental homomorphism

π : B_n −→ S_n

defined on generators by
π(σ_i) = T_i

in the language of the presentations above. In terms of the diagrams in Figure 2, a braid diagram is a
permutation diagram if one forgets about its weaving structure of over and under strands at a crossing.
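The homomorphism π can be sketched in Python by encoding a braid word as a list of signed integers, with +i standing for σ_i and −i for its inverse. This encoding and the function name are assumptions made for illustration.

```python
def pi(word, n):
    """Image in S_n of a braid word; crossings are reduced to adjacent transpositions."""
    perm = list(range(n))
    for g in word:
        i = abs(g) - 1                       # σ_i and its inverse both map to T_i
        perm[i], perm[i + 1] = perm[i + 1], perm[i]
    return tuple(perm)

n = 4
identity = tuple(range(n))

braid_relation_ok = pi([1, 2, 1], n) == pi([2, 1, 2], n)   # σ1 σ2 σ1 = σ2 σ1 σ2
far_commute_ok = pi([1, 3], n) == pi([3, 1], n)            # σ1 σ3 = σ3 σ1
inverse_ok = pi([1, -1], n) == identity
square_forgotten = pi([1, 1], n) == identity               # T_1^2 = 1, although σ1^2 is not trivial in B_n
```

The last check illustrates what is lost: the double crossing σ_1σ_1 is a nontrivial braid, but its image is the identity permutation because π forgets over and under information.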


Figure 2. Braid generators. [Diagrams of the braid generators σ_1, σ_2, σ_3 and the inverse σ_1^{−1}, together with the relations σ_1^{−1}σ_1 = 1, σ_1σ_2σ_1 = σ_2σ_1σ_2 and σ_1σ_3 = σ_3σ_1.]

We now turn to a generalization of the braid group, the framed braid group. In this generalization,
we associate elements of the form t^a to the top of each braid strand. For these purposes, it is useful
to take t as an algebraic variable and a as an integer. To interpret this framing geometrically, replace
each braid strand by a ribbon and interpret t^a as a 2πa twist in the ribbon. In Figure 3, we illustrate
how to multiply two framed braids. In our formalism, the braids A and B in this figure are given by
the formulas
A = [t^a, t^b, t^c]σ_1σ_2σ_3,

B = [t^d, t^e, t^f]σ_2σ_3,

in the framed braid group on three strands, denoted FB_3. As Figure 3 illustrates, we have the
basic formula
vσ = σ v^{π(σ)},

where v is a vector of the form v = [t^a, t^b, t^c] (for n = 3) and v^{π(σ)} denotes the action of the permutation
associated with the braid σ on the vector v. In the figure, the permutation is accomplished by sliding
the algebra along the strings of the braid.

Figure 3. Framed braids: the framed braids $A$ (top framing $[t^a, t^b, t^c]$) and $B$ (top framing
$[t^d, t^e, t^f]$), and their product $AB$ with framing $[t^{a+f}, t^{b+e}, t^{c+d}]$.

We can form an algebra $\mathrm{Alg}[FB_n]$ by taking formal sums of framed braids of the form $\sum_k c_k v_k G_k$,
where $c_k$ is a scalar, $v_k$ is a framing vector and $G_k$ is an element of the Artin braid group $B_n$. Since
braids act on framing vectors by permutations, this algebra is a generalization of the iterant algebras
we have defined so far. The algebra of framed braids uses an action of the braid group based on its
representation to the symmetric group. Furthermore, the representation $\pi : B_n \to S_n$ induces a map
of algebras
$$\hat{\pi} : \mathrm{Alg}[FB_n] \to \mathrm{Alg}[FS_n],$$

where we recognize $\mathrm{Alg}[FS_n]$ as exactly an iterant algebra based on $S_n$.
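
The sliding rule can be checked concretely in the image algebra $\mathrm{Alg}[FS_3]$, where a framed permutation is a
diagonal matrix times a permutation matrix. The following is a minimal sketch under stated assumptions: the
variable $t$ is evaluated at the number 2, and the permutation under $A$ is taken to be the reversal of the three
strands, which is what the framing $[t^{a+f}, t^{b+e}, t^{c+d}]$ of $AB$ in Figure 3 requires. The helper names are
illustrative and not part of the paper.

```python
# Framed permutations as matrices: diag(t^a, t^b, t^c) times a permutation matrix.
T = 2  # assumption: the formal variable t evaluated at 2, so all entries are integers


def matmul(X, Y):
    """Product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[r][k] * Y[k][c] for k in range(n)) for c in range(n)] for r in range(n)]


def framing(exponents):
    """Diagonal matrix [t^a, t^b, t^c] for non-negative integer exponents."""
    n = len(exponents)
    return [[T ** exponents[r] if r == c else 0 for c in range(n)] for r in range(n)]


# Reversal permutation (assumed permutation under A) and a 3-cycle (arbitrary choice for B).
R = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
H = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]

a, b, c = 1, 2, 3
d, e, f = 4, 5, 6

A_img = matmul(framing([a, b, c]), R)
B_img = matmul(framing([d, e, f]), H)

# Sliding the second framing through the reversal permutes its entries.
slide_ok = matmul(R, framing([d, e, f])) == matmul(framing([f, e, d]), R)

# The product carries the framing [t^(a+f), t^(b+e), t^(c+d)] on the composite permutation.
product = matmul(A_img, B_img)
expected = matmul(framing([a + f, b + e, c + d]), matmul(R, H))

print(slide_ok, product == expected)
```

Because everything is an ordinary matrix product, no convention for how permutations act on framings has to be
chosen by hand; the permutation of the framing entries falls out of the multiplication.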

In [6], Sundance Bilson-Thompson represents Fermions as framed braids. See Figure 4 for his
diagrammatic representations. In this theory, each Fermion is associated with a framed braid. Thus,
from the figure, we see that the positron and the electron are given by the framed braids

$$e^+ = [t, t, t]\,\sigma_1\sigma_2^{-1},$$

and
$$e^- = \sigma_2\sigma_1^{-1}\,[t^{-1}, t^{-1}, t^{-1}].$$

Here, we use $[t^a, t^b, t^c]$ for the framing numbers $(a, b, c)$. Products of framed braids correspond to
particle interactions. Note that $e^+e^- = [1, 1, 1] = \gamma$ so that the electron and the positron are inverses
in this algebra. Figure 5 illustrates the representations of bosons, including $\gamma$, which is both a photon
and the identity element in this algebra. Other relations in the algebra correspond to particle interactions.
For example, Figure 6 illustrates the muon decay:

$$\mu \to \nu_\mu + W^- \to \nu_\mu + \bar{\nu}_e + e^-.$$

The reader can see the definitions of the different parts of this decay sequence from the three
figures we have just mentioned. Note that, strictly speaking, the muon decay is a multiplicative identity
in the braid algebra:
$$\mu = \nu_\mu W^- = \nu_\mu \bar{\nu}_e e^-.$$

Particle interactions in this model are mediated by factorizations in the non-commutative algebra of the
framed braids.

Figure 4. Sundance Bilson-Thompson Framed Braid Fermions (“(3)” under the labels for the up and
down quarks and antiquarks represent the fact that there are three permutations of charge placement
giving the three colours).

Figure 5. Bosons.

Figure 6. Representation of $\mu \to \nu_\mu + W^- \to \nu_\mu + \bar{\nu}_e + e^-$.

By using the representation $\hat{\pi} : \mathrm{Alg}[FB_3] \to \mathrm{Alg}[FS_3]$, we can image the structure of
Bilson-Thompson’s framed braids in the iterant algebra corresponding to the symmetric group.
However, we propose to change this map so that we have a non-trivial representation of the Artin
braid group. This can be accomplished by defining

$$\rho : \mathrm{Alg}[FB_3] \to \mathrm{Alg}[FS_3],$$

where
$$\rho(\sigma_k) = [t, t, t]\,T_k$$

and
$$\rho(\sigma_k^{-1}) = [t^{-1}, t^{-1}, t^{-1}]\,T_k$$

for $k = 1, 2$. The reader will find that we have now represented the braid group in the iterant algebra
$\mathrm{Alg}[FS_3]$ and extended the representation to the framed braid group algebra. Thus, the Sundance
Bilson-Thompson representation of elementary particles as framed braids is represented inside the iterant algebra
for the symmetric group on three letters. In Section 8, we carry this further and place the representation
inside the Lie algebra su(3).

7. Iterants and the Standard Model

In this section, we shall give an iterant interpretation for the Lie algebra of the special unitary
group SU (3). The Lie algebra in question is denoted as su(3) and is often described by a matrix basis.
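The Lie algebra su(3) is generated by the following eight Gell-Mann matrices [29]:
$$\lambda_1 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad
\lambda_2 = \begin{pmatrix} 0 & -i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad
\lambda_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$
$$\lambda_4 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \quad
\lambda_5 = \begin{pmatrix} 0 & 0 & i \\ 0 & 0 & 0 \\ -i & 0 & 0 \end{pmatrix}, \quad
\lambda_6 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix},$$
$$\lambda_7 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -i \\ 0 & i & 0 \end{pmatrix}, \quad
\lambda_8 = \frac{1}{\sqrt{3}} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -2 \end{pmatrix}.$$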

The group $SU(3)$ consists of the matrices $U(\epsilon_1, \cdots, \epsilon_8) = e^{i \sum_a \epsilon_a \lambda_a}$, where
$\epsilon_1, \cdots, \epsilon_8$ are real numbers and $a$ ranges from 1 to 8. The Gell-Mann matrices satisfy the
following relations:

$$\mathrm{tr}(\lambda_a \lambda_b) = 2\delta_{ab},$$

$$[\lambda_a/2, \lambda_b/2] = i f_{abc} \lambda_c/2.$$

Here, we use the summation convention summing over repeated indices, tr denotes the standard
matrix trace, $[A, B] = AB - BA$ is the matrix commutator and $\delta_{ab}$ is the Kronecker delta, equal to 1
when $a = b$ and equal to 0 otherwise. The structure coefficients $f_{abc}$ take the following non-zero values:

$$f_{123} = 1, \quad f_{147} = 1/2, \quad f_{156} = -1/2, \quad f_{246} = 1/2, \quad f_{257} = 1/2,$$

$$f_{345} = 1/2, \quad f_{367} = -1/2, \quad f_{458} = \sqrt{3}/2, \quad f_{678} = \sqrt{3}/2.$$

We now give an iterant representation for these matrices that is based on the pattern
$$\begin{pmatrix} 1 & A & B \\ B & 1 & A \\ A & B & 1 \end{pmatrix}$$

as described in the previous section. That is, we use the cyclic group of order three to represent all
$3 \times 3$ matrices as iterants based on the permutation matrices
$$A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}, \quad
B = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.$$

Recalling that $[a, b, c]$, as an iterant, denotes the diagonal matrix

$$[a, b, c] = \begin{pmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{pmatrix},$$

the reader will have no difficulty verifying the following formulas for the Gell-Mann matrices in the
iterant format:
$$\lambda_1 = [1, 0, 0]A + [0, 1, 0]B,$$

$$\lambda_2 = [-i, 0, 0]A + [0, i, 0]B,$$

$$\lambda_3 = [1, -1, 0],$$

$$\lambda_4 = [1, 0, 0]B + [0, 0, 1]A,$$

$$\lambda_5 = [i, 0, 0]B + [0, 0, -i]A,$$

$$\lambda_6 = [0, 1, 0]A + [0, 0, 1]B,$$

$$\lambda_7 = [0, -i, 0]A + [0, 0, i]B,$$

$$\lambda_8 = \frac{1}{\sqrt{3}}[1, 1, -2].$$
Letting $F_a = \lambda_a/2$, we can now rewrite the Lie algebra into simple iterants of the form $[a, b, c]G$
where $G$ is a cyclic group element. Compare with [7]. Let

$$T_\pm = F_1 \pm iF_2,$$

$$U_\pm = F_6 \pm iF_7,$$

$$V_\pm = F_4 \pm iF_5,$$

$$T_3 = F_3,$$

$$Y = \frac{2}{\sqrt{3}} F_8.$$
Iterant Formulation of the su(3) Lie Algebra. We now have the specific iterant formulas

$$T_+ = [1, 0, 0]A,$$

$$T_- = [0, 1, 0]B,$$

$$U_+ = [0, 1, 0]A,$$

$$U_- = [0, 0, 1]B,$$

$$V_+ = [0, 0, 1]A,$$

$$V_- = [1, 0, 0]B,$$

$$T_3 = [1/2, -1/2, 0],$$

$$Y = \frac{1}{3}[1, 1, -2].$$

We have that $A[x, y, z] = [y, z, x]A$ and $B = A^2 = A^{-1}$ so that $B[x, y, z] = [z, x, y]B$. We have reduced
the basic su(3) Lie algebra to a very elementary patterning of order three cyclic operations. In a subsequent
paper, we will use this point of view to examine the irreducible representations of this algebra and to
illuminate the Standard Model’s Eightfold Way.
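
The iterant formulas above are easy to confirm by direct matrix multiplication. The following is a minimal
sketch in plain Python (nested lists, no external libraries); the helper names `matmul`, `add`, `scale`, `diag`
and `trace` are illustrative and not part of the paper. It rebuilds $\lambda_1$ and $\lambda_2$ from their iterant
formulas, forms $T_\pm = F_1 \pm iF_2$, and checks the trace relation and the rule $A[x, y, z] = [y, z, x]A$ on
sample values.

```python
def matmul(X, Y):
    """Product of two 3 x 3 matrices given as nested lists."""
    return [[sum(X[r][k] * Y[k][c] for k in range(3)) for c in range(3)] for r in range(3)]


def add(X, Y):
    return [[X[r][c] + Y[r][c] for c in range(3)] for r in range(3)]


def scale(s, X):
    return [[s * X[r][c] for c in range(3)] for r in range(3)]


def diag(a, b, c):
    """The iterant [a, b, c] as a diagonal matrix."""
    return [[a, 0, 0], [0, b, 0], [0, 0, c]]


def trace(X):
    return sum(X[r][r] for r in range(3))


A = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
B = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]

# lambda_1 and lambda_2 built from their iterant formulas.
lam1 = add(matmul(diag(1, 0, 0), A), matmul(diag(0, 1, 0), B))
lam2 = add(matmul(diag(-1j, 0, 0), A), matmul(diag(0, 1j, 0), B))

# T+ = F1 + i F2 and T- = F1 - i F2 with F_a = lambda_a / 2.
T_plus = scale(0.5, add(lam1, scale(1j, lam2)))
T_minus = scale(0.5, add(lam1, scale(-1j, lam2)))

print(lam1 == [[0, 1, 0], [1, 0, 0], [0, 0, 0]])
print(lam2 == [[0, -1j, 0], [1j, 0, 0], [0, 0, 0]])
print(T_plus == matmul(diag(1, 0, 0), A), T_minus == matmul(diag(0, 1, 0), B))
print(trace(matmul(lam1, lam1)) == 2, trace(matmul(lam1, lam2)) == 0)
print(matmul(A, diag(5, 7, 11)) == matmul(diag(7, 11, 5), A))
```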

8. Iterants, Braiding and the Sundance Bilson-Thompson Model for Fermions

In the last section, we based our iterant representations on the following patterns and matrices.
The pattern
$$\begin{pmatrix} 1 & A & B \\ B & 1 & A \\ A & B & 1 \end{pmatrix}$$
uses the cyclic group of order three to represent all $3 \times 3$ matrices as iterants based on the
permutation matrices
$$A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}, \quad
B = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.$$

Recall that $[a, b, c]$, as an iterant, denotes the diagonal matrix

$$[a, b, c] = \begin{pmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{pmatrix}.$$

In fact, there are six $3 \times 3$ permutation matrices: $\{I, A, B, P, Q, R\}$, where

$$P = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
Q = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad
R = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}.$$

We then have $A = QP$, $B = PQ$, $R = PQP = QPQ$. The two transpositions $P$ and $Q$ generate the entire
group of permutations $S_3$. It is usual to think of the order-three transformations $A$ and $B$ as expressed
in terms of these transpositions, but we can also use the iterant structure of the $3 \times 3$ matrices to express
$P$, $Q$ and $R$ in terms of $A$ and $B$. The result is as follows:

P = [0, 0, 1] + [1, 0, 0] A + [0, 1, 0] B,

Q = [1, 0, 0] + [0, 1, 0] A + [0, 0, 1] B,

R = [0, 1, 0] + [0, 0, 1] A + [1, 0, 0] B.

Recall from the previous section that we have the iterant generators for the su(3) Lie algebra:

T+ = [1, 0, 0] A,

T− = [0, 1, 0] B,

U+ = [0, 1, 0] A,

U− = [0, 0, 1] B,

V+ = [0, 0, 1] A,

V− = [1, 0, 0] B.

Thus, we can express the transpositions P, Q and R in the iterant form of the Lie algebra as

P = [0, 0, 1] + T+ + T− ,

Q = [1, 0, 0] + U+ + U− ,

R = [0, 1, 0] + V+ + V− .

The basic permutations receive elegant expressions in the iterant Lie algebra.
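
These expressions can be confirmed by multiplying out the matrices. The following is a minimal sketch in plain
Python; the helper names are illustrative and not part of the paper. It builds the six generators from their
iterant formulas, assembles $P$, $Q$ and $R$, and checks the products $A = QP$, $B = PQ$ and $R = PQP = QPQ$.

```python
def matmul(X, Y):
    """Product of two 3 x 3 matrices given as nested lists."""
    return [[sum(X[r][k] * Y[k][c] for k in range(3)) for c in range(3)] for r in range(3)]


def add(*matrices):
    """Entrywise sum of any number of 3 x 3 matrices."""
    return [[sum(M[r][c] for M in matrices) for c in range(3)] for r in range(3)]


def diag(a, b, c):
    """The iterant [a, b, c] as a diagonal matrix."""
    return [[a, 0, 0], [0, b, 0], [0, 0, c]]


A = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
B = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]

# Iterant generators of su(3) as listed in the text.
T_plus, T_minus = matmul(diag(1, 0, 0), A), matmul(diag(0, 1, 0), B)
U_plus, U_minus = matmul(diag(0, 1, 0), A), matmul(diag(0, 0, 1), B)
V_plus, V_minus = matmul(diag(0, 0, 1), A), matmul(diag(1, 0, 0), B)

# The three transpositions assembled from the generators.
P = add(diag(0, 0, 1), T_plus, T_minus)
Q = add(diag(1, 0, 0), U_plus, U_minus)
R = add(diag(0, 1, 0), V_plus, V_minus)

print(P, Q, R, sep="\n")
print(matmul(Q, P) == A, matmul(P, Q) == B)
print(matmul(matmul(P, Q), P) == R, matmul(matmul(Q, P), Q) == R)
```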
Now that we have basic permutations in the Lie algebra, we can take the map from Section 6

$$\rho : \mathrm{Alg}[FB_3] \to \mathrm{Alg}[FS_3]$$

with
$$\rho(\sigma_k) = [t, t, t]\,T_k$$

and
$$\rho(\sigma_k^{-1}) = [t^{-1}, t^{-1}, t^{-1}]\,T_k$$

for $k = 1, 2$ and send $T_1$ to $P$ and $T_2$ to $Q$. Then, we have

$$\rho(\sigma_1) = [t, t, t]\,P$$

and
$$\rho(\sigma_1^{-1}) = [t^{-1}, t^{-1}, t^{-1}]\,P$$

and
$$\rho(\sigma_2) = [t, t, t]\,Q$$

and
$$\rho(\sigma_2^{-1}) = [t^{-1}, t^{-1}, t^{-1}]\,Q.$$

By choosing $t \neq 1$ on the unit circle in the complex plane, we obtain representations of the
Sundance Bilson-Thompson constructions of Fermions via framed braids inside the su(3) Lie algebra.
This brings the Bilson-Thompson formalism in direct contact with the Standard Model via our iterant
representations. We shall return to these relationships in a sequel to the present paper.
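
That $\rho$ respects the braid relation, while not collapsing to the symmetric group, can be checked numerically.
The following is a minimal sketch, assuming the sample value $t = i$ (a point on the unit circle different
from 1, chosen only because it keeps the arithmetic exact); the variable names are illustrative. It also checks
that the images of the positron and electron braids from Section 6 multiply to the identity.

```python
def matmul(X, Y):
    """Product of two 3 x 3 matrices given as nested lists."""
    return [[sum(X[r][k] * Y[k][c] for k in range(3)) for c in range(3)] for r in range(3)]


def scale(s, X):
    return [[s * X[r][c] for c in range(3)] for r in range(3)]


I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
P = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
Q = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]

t = 1j       # assumption: sample point on the unit circle with t != 1
t_inv = -1j  # 1 / t

# [t, t, t] is the scalar matrix t * I, so rho(sigma_k) is simply t times a transposition.
rho_s1, rho_s2 = scale(t, P), scale(t, Q)
rho_s1_inv, rho_s2_inv = scale(t_inv, P), scale(t_inv, Q)

braid_lhs = matmul(matmul(rho_s1, rho_s2), rho_s1)
braid_rhs = matmul(matmul(rho_s2, rho_s1), rho_s2)

# Images of e+ = [t,t,t] s1 s2^{-1} and e- = s2 s1^{-1} [t^{-1},t^{-1},t^{-1}].
e_plus = scale(t, matmul(rho_s1, rho_s2_inv))
e_minus = scale(t_inv, matmul(rho_s2, rho_s1_inv))

print(braid_lhs == braid_rhs)                # braid relation holds
print(matmul(rho_s1, rho_s1_inv) == I3)      # inverses are respected
print(matmul(rho_s1, rho_s1) == I3)          # False here: sigma_1^2 is not sent to the identity
print(matmul(e_plus, e_minus) == I3)
```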

9. Clifford Algebra, Majorana Fermions and Braiding

This section is based on our paper [1]. We show how the very simple Clifford algebra(s) that
come from iterants ﬁgure in studying Fermions and Majorana Fermions. This section also provides the
background for the next section on the Dirac equation. The original paper by Ettore Majorana [30]
led to the notion of Clifford algebraic Majorana operators that we discuss in this section. In the next
section on the Dirac equation, we show how this Clifford algebra is related to Majorana’s original
equation. A key relationship between the physics of the Quantum Hall effect and the kind of braiding
representations considered here originates with the paper of Moore and Read [31]. See also [28] where
we look at the combinatorial topology behind the braid group representations of Moore and Read.
Recall Fermion algebra. One has Fermion annihilation operators $\psi$ and their conjugate creation
operators $\psi^\dagger$. One has $\psi^2 = 0 = (\psi^\dagger)^2$. There is a fundamental anticommutation relation

$$\psi\psi^\dagger + \psi^\dagger\psi = 1.$$

If there is more than one of them, say $\psi$ and $\phi$, then they anti-commute:

$$\psi\phi = -\phi\psi.$$

Majorana Fermion operators $c$ satisfy $c^\dagger = c$ so that the corresponding particles are their own
anti-particles. A group of researchers [32] claims, at this writing, to have found Majorana Fermions in
edge effects in nano-wires.
Majorana operators are related to standard Fermions as follows: the algebra for Majoranas is
$c = c^\dagger$ and $cc' = -c'c$ if $c$ and $c'$ are distinct Majorana Fermions with $c^2 = 1$ and $c'^2 = 1$. One can
make a standard Fermion operator from two Majorana operators via

$$\psi = (c + ic')/2,$$

$$\psi^\dagger = (c - ic')/2.$$

Similarly, one can mathematically make two Majoranas from any single Fermion. If one takes a set
of Majoranas
$$\{c_1, c_2, c_3, \cdots, c_n\},$$
then there are natural braiding operators that act on the vector space with these $c_k$ as the basis.
The operators are mediated by algebra elements that themselves satisfy braiding relations:
$$\tau_k = (1 + c_{k+1}c_k)/\sqrt{2},$$
$$\tau_k^{-1} = (1 - c_{k+1}c_k)/\sqrt{2}.$$

The Ivanov [33] braiding operators are

$$T_k : \mathrm{Span}\{c_1, c_2, \cdots, c_n\} \to \mathrm{Span}\{c_1, c_2, \cdots, c_n\}$$

via
$$T_k(x) = \tau_k x \tau_k^{-1}.$$

The braiding is simply:

$$T_k(c_k) = c_{k+1},$$

$$T_k(c_{k+1}) = -c_k,$$

and $T_k$ is the identity otherwise. We then have a unitary representation of the Artin braid group.
See Figure 7 for a depiction of the braiding of Majorana Fermions in relation to the topology of a
belt that connects them. In quantum mechanics, we must represent rotations of three-dimensional
space as unitary transformations. This relationship between rotations and unitary transformations
is encoded in the topology of the belt. See [34] for more about this topological view of the physics
of Fermions. In the figure, we see that the strictly topological belt does not know which of the two
Fermions will individually acquire a phase change, but the Ivanov algebra above makes this decision.
More understanding is needed in this area of subtle topological structure of Fermions.
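
The braiding formulas can be checked in a concrete matrix model. The following is a minimal sketch under stated
assumptions: three Majorana operators are modelled by $4 \times 4$ matrices built from Pauli matrices (a standard
Jordan-Wigner style choice, not taken from the paper), and the factor $1/\sqrt{2}$ is dropped from $\tau_k$ so that
all arithmetic stays exact, using $(1 + c_{k+1}c_k)(1 - c_{k+1}c_k) = 2$. The helper names are illustrative.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[r][k] * Y[k][c] for k in range(n)) for c in range(n)] for r in range(n)]


def add(X, Y):
    n = len(X)
    return [[X[r][c] + Y[r][c] for c in range(n)] for r in range(n)]


def scale(s, X):
    return [[s * entry for entry in row] for row in X]


def kron(X, Y):
    """Kronecker product of two square matrices."""
    m = len(Y)
    size = len(X) * m
    return [[X[r // m][c // m] * Y[r % m][c % m] for c in range(size)] for r in range(size)]


I2 = [[1, 0], [0, 1]]
PX = [[0, 1], [1, 0]]
PY = [[0, -1j], [1j, 0]]
PZ = [[1, 0], [0, -1]]
I4 = kron(I2, I2)

# Assumed matrix model: each c_k squares to the identity and distinct c_k anti-commute.
c1, c2, c3 = kron(PX, I2), kron(PY, I2), kron(PZ, PX)


def tau_unnormalized(c_next, c_here):
    """1 + c_{k+1} c_k, that is sqrt(2) times tau_k."""
    return add(I4, matmul(c_next, c_here))


def braid(c_next, c_here, x):
    """T_k(x) = tau_k x tau_k^{-1}, computed as (1 + c c') x (1 - c c') / 2."""
    left = tau_unnormalized(c_next, c_here)
    right = add(I4, scale(-1, matmul(c_next, c_here)))
    return scale(0.5, matmul(matmul(left, x), right))


tau1, tau2 = tau_unnormalized(c2, c1), tau_unnormalized(c3, c2)

print(braid(c2, c1, c1) == c2)             # T_1(c_1) = c_2
print(braid(c2, c1, c2) == scale(-1, c1))  # T_1(c_2) = -c_1
print(braid(c2, c1, c3) == c3)             # T_1 fixes c_3
print(matmul(matmul(tau1, tau2), tau1) == matmul(matmul(tau2, tau1), tau2))
```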

Figure 7. Braiding action on a pair of fermions: the topological exchange of x and y, and the Ivanov
braiding transformation of Majorana Fermion operators, T(x) = y, T(y) = −x. (Note that x goes to the
y-position and y goes to the x-position with a twist.)

Recall that, in discussing the inception of iterants, we introduced a temporal shift operator $\eta$ such that

$$[a, b]\eta = \eta[b, a]$$

and
$$\eta\eta = 1$$

for any iterant $[a, b]$. In this way, we have a Clifford algebra generated by $e = [1, -1]$ and $\eta$. We can
take $e$ and $\eta$ as Majorana Fermion operators and construct Fermion operators

$$\psi = (e + i\eta)/2,$$

$$\psi^\dagger = (e - i\eta)/2.$$

Here, $i$ is an extra square root of minus one that commutes with the operators $e$ and $\eta$. We arrive at
fermions in a few short steps from the origin of the iterants. Algebraically, we have controlled the
period two oscillation $e$ so that it satisfies the fermion algebra. From the point of view taken in this
paper, it is worth examining whether this discrete process view of fermion algebra and Majorana operator
algebra can shed light on the many properties in this domain. In particular, I would like to see if there
is insight into the braiding of Majorana Fermion operators to be gained from the iterant viewpoint.
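
The fermion relations for $\psi$ and $\psi^\dagger$ follow in a few lines of matrix arithmetic. The following is a
minimal sketch, assuming the usual $2 \times 2$ matrix model in which the iterant $[a, b]$ is the diagonal matrix
with entries $a, b$ and $\eta$ is the swap matrix; the helper names are illustrative.

```python
def matmul(X, Y):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(X[r][k] * Y[k][c] for k in range(2)) for c in range(2)] for r in range(2)]


def add(X, Y):
    return [[X[r][c] + Y[r][c] for c in range(2)] for r in range(2)]


def scale(s, X):
    return [[s * X[r][c] for c in range(2)] for r in range(2)]


def iterant(a, b):
    """The iterant [a, b] as a diagonal matrix."""
    return [[a, 0], [0, b]]


ZERO = [[0, 0], [0, 0]]
ONE = [[1, 0], [0, 1]]

e = iterant(1, -1)
eta = [[0, 1], [1, 0]]  # temporal shift operator

# Fermion operators built from the two Majorana operators e and eta.
psi = scale(0.5, add(e, scale(1j, eta)))
psi_dag = scale(0.5, add(e, scale(-1j, eta)))

print(matmul(iterant(3, 5), eta) == matmul(eta, iterant(5, 3)))   # [a, b] eta = eta [b, a]
print(matmul(e, eta) == scale(-1, matmul(eta, e)))                # e and eta anti-commute
print(matmul(psi, psi) == ZERO, matmul(psi_dag, psi_dag) == ZERO)
print(add(matmul(psi, psi_dag), matmul(psi_dag, psi)) == ONE)
```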

10. The Dirac Equation and Majorana Fermions

This section goes beyond our paper [1]. We expand on the relationship of a nilpotent formulation
of the Dirac equation and an iterant formulation. We ﬁrst construct the Dirac equation. The algebra
underlying this equation has the same properties as the creation and annihilation algebra for Majorana
Fermion operators, so it is by way of this algebra that we will come to the Dirac equation.

If the speed of light is equal to 1 (by convention), then energy $E$, momentum $p$ and mass $m$ are
related by the (Einstein) equation
$$E^2 = p^2 + m^2.$$

Dirac constructed his equation by finding an algebraic square root of $p^2 + m^2$. A corresponding linear
operator for $E$ can then take the role of the Hamiltonian in the Schrödinger equation. We first assume
that $p$ is a scalar (using one dimension of space and one dimension of time). Let $E = \alpha p + \beta m$, where
$\alpha$ and $\beta$ are elements of a non-commutative, associative algebra. Then,

$$E^2 = \alpha^2 p^2 + \beta^2 m^2 + pm(\alpha\beta + \beta\alpha).$$

Hence, $E^2 = p^2 + m^2$ if $\alpha^2 = \beta^2 = 1$ and $\alpha\beta + \beta\alpha = 0$. We can use the iterant algebra
generated by $e$ and $\eta$ with $\alpha = e$ and $\beta = \eta$. Recall that the quantum operator for momentum is
$\hat{p} = -i\partial/\partial x$ and the operator for energy is $\hat{E} = i\partial/\partial t$. The Dirac equation is

$$\hat{E}\psi = \alpha\hat{p}\psi + \beta m\psi.$$

This becomes the explicit equation:

$$i\partial\psi/\partial t = -i\alpha\,\partial\psi/\partial x + \beta m\psi.$$

Let
$$O = i\partial/\partial t + i\alpha\,\partial/\partial x - \beta m$$
so that the Dirac equation takes the form

$$O\psi(x, t) = 0.$$

A Plane Wave Solution to the Dirac Equation. Note that

$$O e^{i(px - Et)} = (E - \alpha p - \beta m)e^{i(px - Et)}$$

and note also that

$$(E + \alpha p + \beta m)(E - \alpha p - \beta m) = E^2 - p^2 - m^2 = 0.$$
Thus, it follows that
$$\phi = (E + \alpha p + \beta m)e^{i(px - Et)}$$

is a solution of the Dirac equation.

Now let $\Delta = (E - \alpha p - \beta m)$ and let

$$U = \Delta\beta\alpha = (E - \alpha p - \beta m)\beta\alpha = \beta\alpha E + \beta p - \alpha m.$$

Then,
$$U^2 = -E^2 + p^2 + m^2 = 0.$$

The nilpotent element $U$ leads to the same plane wave solution to the Dirac equation as follows.
We have shown that
$$O\psi = \Delta\psi$$
for $\psi = e^{i(px - Et)}$. It then follows that

$$O(\beta\alpha\Delta\beta\alpha\psi) = \Delta\beta\alpha\Delta\beta\alpha\psi = U^2\psi = 0,$$

from which it follows that

$$\psi = \beta\alpha U e^{i(px - Et)}$$

is a plane wave solution to the Dirac equation.

We can multiply the operator $O$ by $\beta\alpha$ on the right, obtaining the operator

$$D = O\beta\alpha = i\beta\alpha\,\partial/\partial t - i\beta\,\partial/\partial x - \alpha m,$$

where we have used $\alpha\beta\alpha = -\beta$, and the equivalent Dirac equation

$$D\psi = 0.$$
For $\psi$ above, we have $D(Ue^{i(px - Et)}) = U^2 e^{i(px - Et)} = 0$. This beautiful observation that the Dirac
operator can be modified so that one can directly construct nilpotent solutions to the Dirac equation
was first made by Peter Rowlands [8] in the context of doubled quaternion algebra. Here we have
shown how Rowlands’s work fits into the Clifford algebra and iterant approach to the Dirac equation.
Such solutions can be articulated into specific vector solutions by using either an iterant or matrix
representation of the algebra.
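
The nilpotent construction can be tested on numbers. The following is a minimal sketch, assuming the iterant
matrix model $\alpha = e = [1, -1]$ and $\beta = \eta$ (the swap matrix) and the sample values $p = 3$, $m = 4$,
$E = 5$, which satisfy $E^2 = p^2 + m^2$; the helper names are illustrative.

```python
def matmul(X, Y):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(X[r][k] * Y[k][c] for k in range(2)) for c in range(2)] for r in range(2)]


def combine(*terms):
    """Linear combination: each term is a (coefficient, matrix) pair."""
    return [[sum(s * M[r][c] for s, M in terms) for c in range(2)] for r in range(2)]


ONE = [[1, 0], [0, 1]]
ZERO = [[0, 0], [0, 0]]
alpha = [[1, 0], [0, -1]]  # assumption: alpha = e = [1, -1]
beta = [[0, 1], [1, 0]]    # assumption: beta = eta
beta_alpha = matmul(beta, alpha)

p, m, E = 3, 4, 5  # sample values with E^2 = p^2 + m^2

delta = combine((E, ONE), (-p, alpha), (-m, beta))      # E - alpha p - beta m
conjugate = combine((E, ONE), (p, alpha), (m, beta))    # E + alpha p + beta m

U = matmul(delta, beta_alpha)
U_formula = combine((E, beta_alpha), (p, beta), (-m, alpha))

print(matmul(conjugate, delta) == ZERO)  # (E + alpha p + beta m)(E - alpha p - beta m) = 0
print(U == U_formula)                    # U = beta alpha E + beta p - alpha m
print(matmul(U, U) == ZERO)              # U is nilpotent
```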

10.1. U and U † as Creation and Annihilation Operators

The Clifford algebra element $U$ can be regarded (in the context of this rewrite of the Dirac equation) as a
creation operator for a Fermion.
If, reversing time, we let
$$\tilde{\psi} = e^{i(px + Et)},$$

then
$$D\tilde{\psi} = (-\beta\alpha E + \beta p - \alpha m)\tilde{\psi} = U^\dagger\tilde{\psi},$$
giving a definition of $U^\dagger$ for the anti-particle of $U\psi$:

$$U = \beta\alpha E + \beta p - \alpha m$$

and
$$U^\dagger = -\beta\alpha E + \beta p - \alpha m.$$

Note that here we have

$$(U + U^\dagger)^2 = (2\beta p - 2\alpha m)^2 = 4(p^2 + m^2) = 4E^2,$$

and
$$(U - U^\dagger)^2 = (2\beta\alpha E)^2 = -4E^2.$$
Since
$$U^2 = (U^\dagger)^2 = 0,$$

it follows that
$$UU^\dagger + U^\dagger U = 4E^2.$$

The Fermion operator algebra emerges from these plane wave solutions to the Dirac equation.
The decomposition of $U$ and $U^\dagger$ into the corresponding Majorana Fermion operators corresponds
to the decomposition of the energy into momentum and mass: $E^2 = p^2 + m^2$. Normalizing by dividing
by $2E$, we have
$$A = (\beta p - \alpha m)/E$$

and
$$B = -i\beta\alpha,$$

so that
$$A^2 = B^2 = 1$$

and
$$AB + BA = 0.$$

Then,
$$U = (A + Bi)E$$

and
$$U^\dagger = (A - Bi)E,$$

showing how the Fermion operators are expressed in terms of the simpler Clifford algebra of Majorana
operators (split quaternions once again). We can take $A = e$ and $B = \eta$ and regard these Fermion
annihilation and creation operators in the simplest iterant framework.
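
The relations between $U$ and $U^\dagger$ can be confirmed on the same kind of numerical example. The following is a
minimal sketch, again assuming the matrix model $\alpha = [1, -1]$, $\beta = \eta$ and the sample values $p = 3$,
$m = 4$, $E = 5$; to keep the arithmetic in integers it works with the unnormalized sum and difference of $U$ and
$U^\dagger$ rather than with the normalized Majorana operators. The helper names are illustrative.

```python
def matmul(X, Y):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(X[r][k] * Y[k][c] for k in range(2)) for c in range(2)] for r in range(2)]


def combine(*terms):
    """Linear combination: each term is a (coefficient, matrix) pair."""
    return [[sum(s * M[r][c] for s, M in terms) for c in range(2)] for r in range(2)]


ONE = [[1, 0], [0, 1]]
ZERO = [[0, 0], [0, 0]]
alpha = [[1, 0], [0, -1]]  # assumption: alpha = e = [1, -1]
beta = [[0, 1], [1, 0]]    # assumption: beta = eta
beta_alpha = matmul(beta, alpha)

p, m, E = 3, 4, 5  # sample values with E^2 = p^2 + m^2

U = combine((E, beta_alpha), (p, beta), (-m, alpha))
U_dag = combine((-E, beta_alpha), (p, beta), (-m, alpha))

total = combine((1, U), (1, U_dag))        # U + U^dagger = 2 (beta p - alpha m)
difference = combine((1, U), (-1, U_dag))  # U - U^dagger = 2 beta alpha E

anticommutator = combine((1, matmul(U, U_dag)), (1, matmul(U_dag, U)))

print(matmul(U, U) == ZERO, matmul(U_dag, U_dag) == ZERO)
print(anticommutator == combine((4 * E * E, ONE)))
print(matmul(total, total) == combine((4 * E * E, ONE)))
print(matmul(difference, difference) == combine((-4 * E * E, ONE)))
print(combine((1, matmul(total, difference)), (1, matmul(difference, total))) == ZERO)
```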

10.2. Iterant Formulation of the Dirac Equation

Note that the solutions to the Dirac equation that we have written are expressed using abstract
algebra. To write explicit solutions using this algebraic approach, we can write

O = Ê − α p̂ − βm,

where Ê is the energy operator and p̂ is the momentum operator. Then, a solution

φ = A + αB + βC + αβD

of the Dirac equation consists in a quadruple of complex functions of ( x, t) such that

O φ = 0.

We can regard [ A, B, C, D ] = φ = A + αB + βC + αβD as an iterant that is acted upon by α and β.

We see that (by multiplying on the left)

[ A, B, C, D ]α = [ B, A, D, C ]

and
[ A, B, C, D ] β = [C, − D, A, − B].
Thus, the structure corresponds to the action of the split quaternions as a signed Klein 4-group.
The equation O φ = 0 becomes four operator equations involving these signed permutations:

O φ = ( Ê − α p̂ − βm)( A + αB + βC + αβD ) =

ÊA + α ÊB + β ÊC + αβ ÊD

−α p̂A − p̂B − αβ p̂C − β p̂D

− βmA + αβmB − mC + αmD
= ( ÊA − p̂B − mC ) + α( ÊB − p̂A + mD ) + β( ÊC − p̂D − mA) + αβ( ÊD − p̂C + mB).
Thus, O φ = 0 is equivalent to the set of equations

ÊA = p̂B + mC,

ÊB = p̂A − mD,

ÊC = p̂D + mA,

ÊD = p̂C − mB.

This, in turn, can be written in iterant form as

Ê[ A, B, C, D ] = p̂[ B, A, D, C ] + m[C, − D, A, − B] = p̂[ A, B, C, D ]α + m[ A, B, C, D ] β .

The plane wave solution $\phi = (E + \alpha p + \beta m)e^{i(px - Et)}$ corresponds, in this iterant formalism, to
$\phi = [E, p, m, 0]e^{i(px - Et)}$.
In this way, we can think of a solution to the Dirac equation as an iterant composed of four
complex valued functions taken in order with the given action of the split quaternions as described
above. This can then be reformulated as a single recursive system, as we did for the Schrödinger
equation in the introduction. The way the recursion acts on the time steps is now given by the
action of the split quaternions rather than the action of the complex numbers
($[a, b]i = [-b, a]$). The idea remains the same, and the matrix representations for the Dirac algebra arise
naturally from the algebra itself.
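
The iterant form of the equation can be checked on the plane wave. The following is a minimal sketch: on
$e^{i(px - Et)}$ the operators $\hat{E}$ and $\hat{p}$ act as multiplication by $E$ and $p$, so the common exponential
factor is dropped and only the coefficient quadruple $[E, p, m, 0]$ is tested. The sample values $p = 3$, $m = 4$,
$E = 5$ and the function names are assumptions made for illustration.

```python
def act_alpha(v):
    """Left multiplication by alpha on [A, B, C, D]: returns [B, A, D, C]."""
    A, B, C, D = v
    return [B, A, D, C]


def act_beta(v):
    """Left multiplication by beta on [A, B, C, D]: returns [C, -D, A, -B]."""
    A, B, C, D = v
    return [C, -D, A, -B]


def scaled(s, v):
    return [s * x for x in v]


def plus(v, w):
    return [x + y for x, y in zip(v, w)]


p, m, E = 3, 4, 5        # sample values with E^2 = p^2 + m^2
phi = [E, p, m, 0]       # coefficients of (E + alpha p + beta m)

lhs = scaled(E, phi)
rhs = plus(scaled(p, act_alpha(phi)), scaled(m, act_beta(phi)))
print(lhs == rhs)

# The signed permutations reproduce the algebra: alpha^2 = beta^2 = 1, alpha beta = -beta alpha.
sample = [2, 7, 11, 13]
print(act_alpha(act_alpha(sample)) == sample, act_beta(act_beta(sample)) == sample)
print(act_alpha(act_beta(sample)) == scaled(-1, act_beta(act_alpha(sample))))
```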

10.3. Writing in the Full Dirac Algebra

This section closely follows our paper [1] and is expanded for the discussion at the end. The aim
is to write the Dirac equation for three dimensions of space and one dimension of time, and then to
write a version of the Majorana–Dirac Equation (that can have real solutions) in terms of a doubled
split quaternion algebra, expressed in iterant language. This provides an alternative to working with
modiﬁcations of the 4 × 4 Dirac matrices. We formulate it to illustrate again the iterant concept and to
raise the question of ﬁnding other matrix representations for equations of Majorana type.
We have written the Dirac equation so far in one dimension of space and one dimension of time.
In order to write in three spatial dimensions, we take an independent Clifford algebra generated
by $\sigma_1, \sigma_2, \sigma_3$ with $\sigma_i^2 = 1$ for $i = 1, 2, 3$ and $\sigma_i\sigma_j = -\sigma_j\sigma_i$ for $i \neq j$.
Assume that $\alpha$ and $\beta$ generate an independent Clifford algebra that commutes with the algebra of the
$\sigma_i$. Replace the scalar momentum $p$ by a 3-vector momentum $p = (p_1, p_2, p_3)$ and let
$p \bullet \sigma = p_1\sigma_1 + p_2\sigma_2 + p_3\sigma_3$. Replace $\partial/\partial x$ with
$\nabla = (\partial/\partial x_1, \partial/\partial x_2, \partial/\partial x_3)$ and $\partial p/\partial x$ with $\nabla \bullet p$.
The Dirac equation is then written

$$i\partial\psi/\partial t = -i\alpha\nabla \bullet \sigma\psi + \beta m\psi.$$

The Dirac operator is

$$O = i\partial/\partial t + i\alpha\nabla \bullet \sigma - \beta m.$$
Using the Dirac operator, the Dirac equation is

$$O\psi(x, t) = 0.$$

Let
$$\psi(x, t) = e^{i(p \bullet x - Et)}$$

and construct solutions by first applying the Dirac operator to this $\psi$. The modified Dirac operator is

$$D = O\beta\alpha = i\beta\alpha\,\partial/\partial t - i\beta\nabla \bullet \sigma - \alpha m.$$

We have that
$$D\psi = U\psi,$$

where $U = \beta\alpha E + \beta p \bullet \sigma - \alpha m$. Here, $U^2 = 0$ and $U\psi$ is a solution to the modified Dirac Equation.
We can use the Fermion operators as creation and annihilation operators, and locate the corresponding
Majorana Fermion operators. We leave these details to the reader.

10.4. Majorana Fermions in the Sense of Majorana

We end with a brief discussion of a Dirac algebra, distinct from the one generated by
$\alpha, \beta, \sigma_1, \sigma_2, \sigma_3$, that yields an equation that can have real solutions. This was the strategy that
Majorana [30] followed to construct his Majorana Fermions. A real equation can have solutions that are invariant
under complex conjugation and so can correspond to particles that are their own anti-particles. We will
describe this Majorana algebra in terms of the split quaternions $\epsilon$ and $\eta$. For convenience, we use the
matrix representation given below. The reader of this paper can substitute the corresponding iterants:

$$\epsilon = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \eta = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

Let $\hat{\epsilon}$ and $\hat{\eta}$ generate another, independent algebra of split quaternions, commuting with the first
algebra generated by $\epsilon$ and $\eta$. Then, a totally real Majorana–Dirac equation can be written as follows:

$$(\partial/\partial t + \hat{\eta}\eta\,\partial/\partial x + \epsilon\,\partial/\partial y + \hat{\epsilon}\eta\,\partial/\partial z - \hat{\epsilon}\hat{\eta}\eta\, m)\psi = 0.$$

To see that this is a correct Dirac equation, note that

$$\hat{E} = \alpha_x\hat{p}_x + \alpha_y\hat{p}_y + \alpha_z\hat{p}_z + \beta m$$

(here, the “hats” denote the quantum differential operators corresponding to the energy and
momentum) will satisfy
$$\hat{E}^2 = \hat{p}_x^2 + \hat{p}_y^2 + \hat{p}_z^2 + m^2$$

if the algebra generated by $\alpha_x, \alpha_y, \alpha_z, \beta$ has each generator of square one and each distinct pair of
generators anti-commuting. From there, we obtain the general Dirac equation by replacing $\hat{E}$ by $i\partial/\partial t$,
and $\hat{p}_x$ with $-i\partial/\partial x$ (and the same for $y$, $z$):

$$(i\partial/\partial t + i\alpha_x\,\partial/\partial x + i\alpha_y\,\partial/\partial y + i\alpha_z\,\partial/\partial z - \beta m)\psi = 0.$$

This is equivalent to
$$(\partial/\partial t + \alpha_x\,\partial/\partial x + \alpha_y\,\partial/\partial y + \alpha_z\,\partial/\partial z + i\beta m)\psi = 0.$$
Thus, here we take
$$\alpha_x = \hat{\eta}\eta, \quad \alpha_y = \epsilon, \quad \alpha_z = \hat{\epsilon}\eta, \quad \beta = i\hat{\epsilon}\hat{\eta}\eta,$$

and observe that these elements satisfy the requirements for the Dirac algebra. Since the algebra
appearing in the Majorana–Dirac operator is constructed entirely from two commuting copies of the
split quaternions, there is no appearance of the complex numbers, and when written out in $2 \times 2$
matrices, we obtain coupled real differential equations to be solved.
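
The requirements for the Dirac algebra can be verified with Kronecker products. The following is a minimal
sketch under stated assumptions: the first split-quaternion generator is written `eps` (the diagonal matrix
with entries $-1, 1$), the unhatted generators act on the first tensor factor and the hatted generators on the
second, and the helper names are illustrative. The coefficients of the Majorana–Dirac operator come out as real
integer matrices.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[r][k] * Y[k][c] for k in range(n)) for c in range(n)] for r in range(n)]


def add(X, Y):
    n = len(X)
    return [[X[r][c] + Y[r][c] for c in range(n)] for r in range(n)]


def scale(s, X):
    return [[s * entry for entry in row] for row in X]


def kron(X, Y):
    """Kronecker product of two square matrices."""
    m = len(Y)
    size = len(X) * m
    return [[X[r // m][c // m] * Y[r % m][c % m] for c in range(size)] for r in range(size)]


I2 = [[1, 0], [0, 1]]
eps = [[-1, 0], [0, 1]]
eta = [[0, 1], [1, 0]]
I4 = kron(I2, I2)
ZERO4 = scale(0, I4)

# Two commuting copies of the split quaternions (assumed tensor-factor convention).
eps_1, eta_1 = kron(eps, I2), kron(eta, I2)       # unhatted copy
eps_hat, eta_hat = kron(I2, eps), kron(I2, eta)   # hatted copy

alpha_x = matmul(eta_hat, eta_1)
alpha_y = eps_1
alpha_z = matmul(eps_hat, eta_1)
mass_term = matmul(matmul(eps_hat, eta_hat), eta_1)  # real matrix with beta = i * mass_term
beta = scale(1j, mass_term)

generators = [alpha_x, alpha_y, alpha_z, beta]
squares_ok = all(matmul(g, g) == I4 for g in generators)
anticommute_ok = all(
    add(matmul(g, h), matmul(h, g)) == ZERO4
    for i, g in enumerate(generators)
    for h in generators[i + 1:]
)

print(squares_ok, anticommute_ok)
print(matmul(mass_term, mass_term) == scale(-1, I4))  # the real mass coefficient squares to -1
```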

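A solution to the Majorana–Dirac Equation. Let $\rho(x, t) = e^{(p \bullet x - Et)}$. Note that $\rho$ is a real-valued
function. Let
$$MO = (\partial/\partial t + \hat{\eta}\eta\,\partial/\partial x + \epsilon\,\partial/\partial y + \hat{\epsilon}\eta\,\partial/\partial z - \hat{\epsilon}\hat{\eta}\eta\, m).$$
This is the Majorana–Dirac operator, as we have explained above. Then, we have the equation

$$MO\,\rho(x, t) = (-E + \hat{\eta}\eta\, p_x + \epsilon\, p_y + \hat{\epsilon}\eta\, p_z - \hat{\epsilon}\hat{\eta}\eta\, m)\rho(x, t).$$

Let
$$\Gamma = -E + \hat{\eta}\eta\, p_x + \epsilon\, p_y + \hat{\epsilon}\eta\, p_z - \hat{\epsilon}\hat{\eta}\eta\, m,$$

and
$$\hat{\Gamma} = -E - \hat{\eta}\eta\, p_x - \epsilon\, p_y - \hat{\epsilon}\eta\, p_z + \hat{\epsilon}\hat{\eta}\eta\, m.$$

Then, we have
$$\hat{\Gamma}\Gamma = 0,$$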

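provided that $E^2 = p_x^2 + p_y^2 + p_z^2 - m^2$: the algebraic coefficients anti-commute pairwise, the three
momentum coefficients square to one and the mass coefficient squares to minus one, so that
$\hat{\Gamma}\Gamma = E^2 - p_x^2 - p_y^2 - p_z^2 + m^2$. Therefore,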

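$$MO(\hat{\Gamma}\rho(x, t)) = \hat{\Gamma}\Gamma\rho(x, t) = 0.$$

Thus,
$$\hat{\Gamma}\rho(x, t) = (-E - \hat{\eta}\eta\, p_x - \epsilon\, p_y - \hat{\epsilon}\eta\, p_z + \hat{\epsilon}\hat{\eta}\eta\, m)\rho(x, t)$$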

is a solution to the Majorana–Dirac equation. When this solution is written out into its components, it
is an entirely real valued solution since the components of the matrices representing the algebra are all
real numbers. Recall from the earlier part of this section that we were able to reformulate solutions of
this kind for the usual Dirac equation in terms of the nilpotent formalism with the algebraic element
U with U 2 = 0. Here, we can produce real solutions to the Majorana–Dirac equation, but it does not
seem possible to put them in the nilpotent formalism. This is surely a reﬂection of the fact that these
solutions are not Fermions in the usual sense. On the other hand, one can regard the solution Γ̂ρ( x, t)
in relation to the algebra element $\hat{\Gamma}$, and this algebra element is a combination of Majorana Fermion
operators $\{\hat{\eta}\eta, \epsilon, \hat{\epsilon}\eta, \hat{\epsilon}\hat{\eta}\eta\}$ in the sense of Clifford algebra or iterant operators
that we have used earlier
in this paper. Thus, we see that there is at least the beginning of a relationship between the modern use
of the Majorana Fermion operators and the original intents of Ettore Majorana to ﬁnd real solutions to
the Dirac equation.
We would like to know if there are other ways to produce such real Dirac equations, and particularly
if there are ways to accomplish this aim that do not algebraically entangle the two copies of the split
quaternions as our construction (and Majorana’s original construction) seems to require.

Acknowledgments: It gives the author great pleasure to thank G. Spencer-Brown, James Flagg, Alex Comfort,
David Finkelstein, Pierre Noyes, Peter Rowlands, Sam Lomonaco, Bernd Schmeikal and Rukhsan Ul-Haq for
conversations related to the considerations in this paper.
Conﬂicts of Interest: The author declares no conﬂict of interest.

References
1. Kauffman, L.H. Iterants, Fermions and Majorana Operators. In Unified Field Mechanics—Natural Science Beyond
the Veil of Spacetime; Amoroso, R., Kauffman, L.H., Rowlands, P., Eds.; World Scientific Pub. Co.: Singapore,
2015; pp. 1–32.
2. Kauffman, L.H. Knot Logic. In Knots and Applications; Kauffman, L., Ed.; World Scientific Pub. Co.: Singapore,
1994; pp. 1–110.
3. Kauffman, L.H. Knot logic and topological quantum computing with Majorana fermions. In Logic and
Algebraic Structures in Quantum Computing and Information; Lecture Notes in Logic; Chubb, J., Eskandarian, A.,
Harizanov, V., Eds.; Cambridge University Press: Cambridge, UK, 2016; 124p.
4. Kauffman, L.H.; Lomonaco, S.J. Braiding, Majorana Fermions and Topological Quantum Computing, (to appear
in Special Issue of QIP on Topological Quantum Computing). In Proceedings of the 2nd International Conference
and Exhibition on Mesoscopic and Condensed Matter Physics, Chicago, IL, USA, 26–28 October 2016.
5. Ul Haq, R.; Kauffman, L.H. Iterants, Idempotents and Clifford algebra in Quantum Theory. arXiv 2017,
arXiv:1705.06600.
6. Bilson-Thompson, S.O. A topological model of composite fermions. arXiv 2006, arXiv:hep-ph/0503213.
7. Gasiorowicz, S. Elementary Particle Physics; Wiley: New York, NY, USA, 1966.

8. Rowlands, P. Zero to Infinity: The Foundations of Physics; Series on Knots and Everything, Volume 41; World
Scientific Publishing Co.: Singapore, 2007.
9. Spencer-Brown, G. Laws of Form; George Allen and Unwin Ltd.: London, UK, 1969.
10. Kauffman, L. Sign and Space. In Religious Experience and Scientific Paradigms, Proceedings of the 1982 IASWR
Conference; Institute of Advanced Study of World Religions: Stony Brook, NY, USA, 1985; pp. 118–164.
11. Kauffman, L. Self-reference and recursive forms. J. Soc. Biol. Struct. 1987, 10, 53–72.
12. Kauffman, L. Special relativity and a calculus of distinctions. In Proceedings of the 9th Annual International
Meeting of ANPA, Cambridge, UK, 23–28 September 1987; pp. 290–311.
13. Kauffman, L. Imaginary values in mathematical logic. In Proceedings of the Seventeenth International
Conference on Multiple Valued Logic, Boston, MA, USA, 26–28 May 1987; pp. 282–289.
14. Kauffman, L.H. Biologic. AMS Contemp. Math. Ser. 2002, 304, 313–340.
15. Kauffman, L.H. Temperley-Lieb Recoupling Theory and Invariants of Three-Manifolds (Annals Studies-114);
Princeton University Press: Princeton, NJ, USA, 1994.
16. Kauffman, L.H. Time imaginary value, paradox sign and space. In Computing Anticipatory Systems, Proceedings
of the AIP Conference CASYS—Fifth International Conference, Liege, Belgium, 13–18 August 2001; Dubois, D., Ed.;
AIP Conference Publishing: Melville, NY, USA, 2002; Volume 627.
17. Schmeikal, B. Decay of Motion: The Anti-Physics of SpaceTime; Nova Publishers, Inc.: Hauppauge, NY, USA, 2014.
18. Kauffman, L.H.; Noyes, H.P. Discrete Physics and the Derivation of Electromagnetism from the formalism of
Quantum Mechanics. Proc. R. Soc. Lond. A 1996, 452, 81–95.
19. Kauffman, L.H.; Noyes, H.P. Discrete Physics and the Dirac Equation. Phys. Lett. A 1996, 218, 139–146.
20. Kauffman, L.H. Noncommutativity and discrete physics. Phys. D Nonlinear Phenom. 1998, 120, 125–138.
21. Kauffman, L.H. Space and time in discrete physics. Int. J. Gen. Syst. 1998, 27, 241–273.
22. Kauffman, L.H. A non-commutative approach to discrete physics. In Aspects II: Proceedings of ANPA 20; ANPA:
Stanford, CA, USA, 1999; pp. 215–238.
23. Kauffman, L.H. Non-commutative calculus and discrete physics. In Boundaries: Scientific Aspects of ANPA 24;
ANPA: Stanford, CA, USA, 2003; pp. 73–128.
24. Kauffman, L.H. Non-commutative worlds. New J. Phys. 2004, 6, 73.
25. Kauffman, L.H. Non-commutative worlds and classical constraints. In Scientific Essays in Honor of Pierre Noyes
on the Occasion of His 90-th Birthday; Amson, J., Kauffman, L.H., Eds.; World Scientific Pub. Co.: Singapore,
2013; pp. 169–210.
26. Kauffman, L.H. Differential geometry in non-commutative worlds. In Quantum Gravity: Mathematical Models
and Experimental Bounds; Fauser, B., Tolksdorf, J., Zeidler, E., Eds.; Birkhauser: Basel, Switzerland, 2007;
pp. 61–75.
27. Kauffman, L.H. Knot Logic and Topological Quantum Computing with Majorana Fermions. arXiv 2013,
arXiv:1301.6214.
28. Kauffman, L.H.; Lomonaco, S.J., Jr. q-deformed spin networks, knot polynomials and anyonic topological
quantum computation. J. Knot Theory Ramif. 2007, 16, 267–332.
29. Cheng, T.P.; Li, L.F. Gauge Theory of Elementary Particle Physics; Clarendon Press: Oxford, UK, 1988.
30. Majorana, E. A symmetric theory of electrons and positrons. Il Nuovo Cimento 1937, 14, 171–184.
31. Moore, G.; Read, N. Nonabelions in the fractional quantum Hall effect. Nucl. Phys. B 1991, 360, 362–396.
32. Mourik, V.; Zuo, K.; Frolov, S.M.; Plissard, S.R.; Bakkers, E.P.A.M.; Kouwenhoven, L.P. Signatures of Majorana
fermions in hybrid superconductor-semiconductor nanowire devices. Science 2012, 336, 1003–1007.
33. Ivanov, D.A. Non-abelian statistics of half-quantum vortices in p-wave superconductors. Phys. Rev. Lett. 2001,
86, 268, doi:10.1103/PhysRevLett.86.268.
34. Kauffman, L.H. Knots and Physics; World Scientific Pub., Co.: Singapore, 2012.

© 2017 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).

entropy
Article
A No-Go Theorem for Observer-Independent Facts
Časlav Brukner 1,2
1 Vienna Center for Quantum Science and Technology (VCQ), Faculty of Physics, University of Vienna,
Boltzmanngasse 5, A-1090 Vienna, Austria; [email protected]
2 Institute of Quantum Optics and Quantum Information (IQOQI), Austrian Academy of Sciences,
Boltzmanngasse 3, A-1090 Vienna, Austria

Received: 5 April 2018; Accepted: 2 May 2018; Published: 8 May 2018

Abstract: In his famous thought experiment, Wigner assigns an entangled state to the composite
quantum system made up of Wigner’s friend and her observed system. While the two of them have
different accounts of the process, Wigner and his friend can each in principle verify his/her respective
state assignment by performing an appropriate measurement. As manifested through a click in a
detector or a specific position of the pointer, the outcomes of these measurements can be regarded as
reflecting directly observable “facts”. Reviewing arXiv:1507.05255, I will derive a no-go theorem for
observer-independent facts, which would be common both for Wigner and the friend. I will then
analyze this result in the context of a newly-derived theorem arXiv:1604.07422, where Frauchiger
and Renner prove that “single-world interpretations of quantum theory cannot be self-consistent”.
It is argued that “self-consistency” has the same implications as the assumption that observational
statements of different observers can be compared in a single (and hence an observer-independent)
theoretical framework. The latter, however, may not be possible, if the statements are to be understood
as relational in the sense that their determinacy is relative to an observer.

Keywords: Wigner-friend experiment; no-go theorem; quantum foundations; interpretations of quantum mechanics

1. Introduction
One of the most debated situations concerning the quantum measurement problem is described
in the thought experiment of the so-called “Wigner’s friend”. The experiment involves a quantum
system and an observer (Wigner’s friend) who performs measurements on this system in a sealed
laboratory. A “super-observer” (Wigner) is placed outside the laboratory. While for the friend,
the measurement outcome is reflected in a property of the device recording it (e.g., in the form of a
click in a photo-detector or a certain position of a pointer device), Wigner can describe the process
unitarily on the basis of the information that is in principle available to him. At the end of the process,
the friend projects the state of the system onto the one corresponding to the observed outcome, whereas Wigner
assigns a specific entangled state to the system and the friend, which he can verify by performing a further
experiment. When Wigner’s friend observes an outcome, does the state collapse for Wigner as well?
If not, how can we reconcile their different accounts of the process?
The thought experiment of Wigner’s friend has great conceptual value, as it challenges different
approaches to understanding quantum theory. In his original work [1], Wigner designed the experiment
to support his view that consciousness is necessary to complete the quantum measurement process.
According to the many-worlds interpretation [2], there are many copies of Wigner’s friend in different
“worlds”. Each copy observes one outcome, a different one in each world. According to the Copenhagen,
relational [3] and quantum Bayesian [4] interpretations, the state is defined only relative to the observer;
relative to the friend, the state is projected, while relative to Wigner, it is in a superposition. Either
way, supporters of any of these interpretations will arrive at the same predictions in Wigner’s verifying
experiment. In contrast, objective collapse theories [5–7] predict that the quantum state collapses when
a superposed system reaches a certain threshold of mass, size, complexity, etc., such that it becomes
impossible to even prepare the entangled state of Wigner’s friend and the system. Consequently,
Wigner’s state assignment can statistically be disproved by repeating the verifying experiment.

Entropy 2018, 20, 350; doi:10.3390/e20050350; www.mdpi.com/journal/entropy

The descriptions of “what is happening inside the lab” as given by Wigner and Wigner’s friend
respectively will differ. This difference need not pose a consistency problem for quantum theory,
for example if one takes the view that the theory gives the physical description relative to the observer
and her/his measuring apparatus in agreement with [3]. As long as the two observers do not exchange
the information about their outcomes, they will remain separated from each other, each holding a
different description of the systems with respect to their individual experimental arrangements. If they
do compare their predictions, they will agree. For example, should the friend communicate her result
to Wigner, this would collapse the state he assigns to the friend and the system. This suggests that
there should be no tension in accepting that, relative to their experimental arrangements, Wigner’s
friend in her measurement, as well as Wigner in his verifying measurement, each obtains a respective
measurement outcome. Since these outcomes are usually manifested as clicks in detectors or definite
positions of a pointer, they can be considered as directly accessible “facts”. Quite naturally, the question
arises: Can the facts as observed by Wigner and by Wigner’s friend be jointly considered as objective
properties of the world, in which case we might call them “facts of the world”? What we mean with
this question is whether there exists any theory, potentially different from quantum theory, where a
joint probability may be assigned for Wigner’s outcome and for that of his friend.
Reviewing the results of [8], I will derive a Bell-type no-go theorem for observer-independent
facts, showing that there can be no theory in which Wigner’s and Wigner’s friend’s facts can jointly
be considered as (local) objective properties. More precisely, I will show that the assumptions of
“locality”, “freedom of choice” and “universality of quantum theory” (the latter in the sense that
there are no constraints on the systems to which the theory can be applied) are incompatible with
the assumption of observer-independent facts, i.e., under these assumptions one cannot define joint
probabilities for Wigner’s outcome and for that of Wigner’s friend. This might indicate that in quantum
theory, we can only define facts relative to an observation and an observer. I will then analyze the
relation of these results to the theorem developed by Frauchiger and Renner [9], which proves that
“single-world interpretations of quantum theory cannot be self-consistent”. In particular, I will argue
that the implications of their “self-consistency” requirement are equivalent to those of a theoretical
framework in which the truth values of the observational statements by Wigner and Wigner’s friend
can be jointly assigned and then checked for mutual consistency. However, in view of
the no-go theorem, this need not in general be possible in a physical theory; the theory may operate
only with facts relative to the observer.
It should be emphasized that the no-go theorem applies to “facts” understood as “immediate
experiences of observers”; it may refer to what various interpretations of quantum mechanics assume to
be “real” (e.g., the wave function of the Universe, Bohmian trajectories, etc.) only to the extent to which
these “realities” give rise to directly observable facts in terms of detector clicks or pointer positions.

2. Deutsch’s Version of Wigner’s Friend Experiment

The standard description of the Wigner-friend thought experiment involves a quantum two-level
system (System 1, e.g., a spin-1/2 particle), which can give rise to two outcomes upon measurement
(e.g., two opposite directions when passing through a Stern–Gerlach apparatus). The outcomes are
recorded by a measurement apparatus and eventually in the friend’s memory (System 2). Now,
Wigner is placed outside the isolated laboratory in which the experiment takes place and can perform a
quantum measurement on the overall system (spin-1/2 particle + friend’s laboratory). We assume that all
experiments are carried out a sufficient number of times to collect statistics.


For concreteness, suppose that a measurement of spin along z is performed on a particle
initially prepared in state $|x+\rangle_S = \frac{1}{\sqrt{2}}\left(|z+\rangle_S + |z-\rangle_S\right)$, where subscript S refers to the spin. After the
measurement is completed, the measurement apparatus is found in one of many perceptively different
macroscopic configurations, like different positions of a pointer along a scale. If the apparatus pointer
is found in a specific position along the scale, the friend can say that the observable spin z has the
value “up” or “down”. Note that for the present argument, we need not make any assumption about
how the friend formally describes the spin and the apparatus, which measurement formalism she uses
or even if she uses quantum theory for it. All that is needed is the assumption that the friend perceives
a definite outcome.
Wigner uses quantum theory to describe the friend’s measurement. From his perspective,
the measurement is described by a unitary transformation. The different possible spin states $|z+\rangle_S$
and $|z-\rangle_S$ are supposed to get entangled with the perceptively different macroscopic configurations of
the apparatus and the parts of the laboratory including the friend’s memory. The states of different
macroscopic configurations are represented by orthogonal states $|F_{z+}\rangle_F$ and $|F_{z-}\rangle_F$, respectively.
We assume that the state of the composite system “spin + friend’s laboratory” is given by:

$$|\Phi\rangle_{SF} = \frac{1}{\sqrt{2}}\left(|z+\rangle_S |F_{z+}\rangle_F + |z-\rangle_S |F_{z-}\rangle_F\right), \tag{1}$$

where the particular phase (here “+”) between the two amplitudes in Equation (1) is specified by the
measurement interaction, which is under Wigner’s control (note that if Wigner did not know this phase due to
a lack of control over it, he would describe the “spin + friend’s laboratory” as an incoherent mixture
of the two possibilities). Wigner can verify his state assignment (1), for example by performing
a Bell state measurement in the basis $|\Phi^{\pm}\rangle_{SF} = \frac{1}{\sqrt{2}}\left(|z+\rangle_S |F_{z+}\rangle_F \pm |z-\rangle_S |F_{z-}\rangle_F\right)$ and
$|\Psi^{\pm}\rangle_{SF} = \frac{1}{\sqrt{2}}\left(|z+\rangle_S |F_{z-}\rangle_F \pm |z-\rangle_S |F_{z+}\rangle_F\right)$.
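The contrast between the coherent assignment (1) and an incoherent mixture of the two possibilities can be made concrete with a small numerical sketch. It is only an illustration under stated assumptions: the four-dimensional basis ordering, the dictionary `BELL_BASIS` and the helper names are hypothetical choices made here, and the friend's laboratory is idealized as a two-level system spanned by the two orthogonal pointer states.

```python
from math import sqrt

# Assumed basis ordering (illustrative): |z+,F_z+>, |z+,F_z->, |z-,F_z+>, |z-,F_z->
r = 1 / sqrt(2)
BELL_BASIS = {
    "Phi+": [r, 0.0, 0.0, r],
    "Phi-": [r, 0.0, 0.0, -r],
    "Psi+": [0.0, r, r, 0.0],
    "Psi-": [0.0, r, -r, 0.0],
}

def overlap_sq(a, b):
    """Return |<a|b>|^2 for real amplitude vectors."""
    return sum(x * y for x, y in zip(a, b)) ** 2

def bell_probs_pure(state):
    """Outcome probabilities of the Bell state measurement on a pure state."""
    return {name: overlap_sq(vec, state) for name, vec in BELL_BASIS.items()}

def bell_probs_mixture(weighted_states):
    """Outcome probabilities for an incoherent mixture of (weight, state) pairs."""
    return {
        name: sum(w * overlap_sq(vec, s) for w, s in weighted_states)
        for name, vec in BELL_BASIS.items()
    }

# Wigner's coherent state assignment, Equation (1).
coherent = [r, 0.0, 0.0, r]
# Equal mixture of "up was seen" and "down was seen" (no phase coherence).
mixture = [(0.5, [1.0, 0.0, 0.0, 0.0]), (0.5, [0.0, 0.0, 0.0, 1.0])]

probs_coherent = bell_probs_pure(coherent)
probs_mixture = bell_probs_mixture(mixture)
print(probs_coherent)
print(probs_mixture)
```

In this idealization the coherent state yields the outcome associated with the "+" phase with certainty, whereas the mixture splits evenly between the two Φ outcomes, which is why repeated runs of the verifying measurement can distinguish the two descriptions statistically.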
The fact that the friend and Wigner have different accounts of the friend’s measurement process is
at the heart of the discussion surrounding the Wigner-friend thought experiment. Still, the difference
need not give rise to any inconsistency in practicing quantum theory, since the two descriptions
belong to two different observers, who remain separated in making predictions for their respective
systems. The novelty of Deutsch’s proposal [10] lies in the possibility for Wigner to acquire direct
knowledge on whether the friend has observed a definite outcome upon her measurement or not
without revealing what outcome she has observed. The friend could open the laboratory in a manner
that allowed communication (e.g., a specific message written on a piece of paper) to be passed outside
to Wigner, keeping all other degrees of freedom fully isolated, as illustrated in Figure 1. Obviously,
it is of central importance that the message does not contain any information concerning the specific
observed outcome (which would destroy the coherence of state (1)), but merely an indication of the
kind: “I have observed a definite outcome” or “I have not observed a definite outcome”. If the message
is encoded in the state of system M, the overall state is:

$$|\Phi\rangle_{SFM} = \frac{1}{\sqrt{2}}\left(|z+\rangle_S |F_{z+}\rangle_F + |z-\rangle_S |F_{z-}\rangle_F\right) |\text{“I have observed a definite outcome”}\rangle_M, \tag{2}$$

since the state of the message is factorized out from the total state (I leave the option for the message
“I have not observed a definite outcome” out, as it conflicts with our experience of the situation
that we refer to as measurement and it also can be used to violate the bound on quantum state
discrimination [8]).
If we assume the universality of quantum theory in the sense that it can be applied at any scale,
including the apparatus, the entire laboratory and even the observer’s memory, we conclude that
the message will indicate that the friend perceives a definite outcome and yet Wigner will confirm
his state assignment (1). This should be contrasted to the “collapse models” by Ghirardi, Rimini and
Weber [5] or by Diosi [6] and Penrose [7], which predict a breakdown of the quantum-mechanical laws
at some scale. In the presence of such a collapse, the prediction based on Wigner’s state assignment
will statistically deviate from the result obtained in the verification test.

Figure 1. Deutsch’s version of the Wigner-friend thought experiment. An observer (Wigner’s friend)
performs a Stern–Gerlach experiment on a spin 1/2 particle in a sealed laboratory. The outcome,
either “spin up” or “spin down”, is recorded in the friend’s laboratory, including her memory.
A super-observer (Wigner) describes the entire experiment as a unitary transformation resulting
in an encompassing entangled state between the system and the friend’s laboratory. The friend is
allowed to communicate a message, which only reports whether she sees a definite outcome or not,
without in any way revealing the actual outcome she observes.

3. The No-Go Theorem

We have seen that Wigner not only perceives his own facts, he is also able to obtain direct
evidence for the existence of the friend’s facts (although without knowing which specific outcome
has been realized in the laboratory). This strongly suggests that Wigner’s and Wigner’s friend’s facts
coexist. We pose the question: Is there a theoretical framework, potentially going beyond quantum
theory, in which one can account for observer-independent facts, ones that hence can be called “facts
of the world”? In such a framework, one could jointly assign truth values to both observational
statements, A1 : “The pointer of Wigner’s friend’s apparatus points to result z+” and A2 : “The pointer
of Wigner’s apparatus points to result Φ”.
One important remark: Whenever Wigner performs his measurement, he can inform the friend
about the outcome he observed. Hence, Wigner’s friend can learn Wigner’s outcome in addition
to the outcome she herself observed directly. In this way, Wigner’s friend can know the truth
values of both statements A1 and A2 . The assumption of “observer-independent facts” is a stronger
condition: we require an assignment of truth values to statements A1 and A2 independently of which
measurement Wigner performs. Wigner can either perform his verifying experiment or he can perform
Wigner’s friend’s measurement (for example, by opening the lab, or learning it from the friend).
In either experiment, the observed outcome (e.g., “Φ” and “z+”, respectively) is required to reveal the
assigned truth value for A1 or A2 . We formalize the requirement of “observer-independent facts” in
the following assumption.

Postulate 1. (“Observer-independent facts”) The truth values of the propositions Ai of all observers form a
Boolean algebra A. Moreover, the algebra is equipped with a (countably additive) positive measure p( A) ≥ 0 for
all statements A ∈ A, which is the probability for the statement to be true.

In the proof, we will only use the conjunction of propositions of different observers, which is a
weaker requirement. Furthermore, we use a countably additive measure since we are dealing with only
a countable (in fact only a finite) set of elements. In Boolean algebra, one can build the conjunction,
the disjunction and the negation of the statements. A typical example of a Boolean algebra is set theory.
The operations are identified with the set theoretic intersection, union and complement, respectively.
This is significant in the context of classical physics, where the propositions can be represented by
subsets of a phase space. In the present context, one can jointly assign truth values “true” or “false” to
statements A1 and A2 about observations made by Wigner’s friend and Wigner, respectively. Moreover,
one can build the conjunction A1 ∩ A2 and assign joint probability p( A1 = ±1, A2 = ±1), where A1
is observed by the friend and A2 by Wigner (and where truth value “true” corresponds to a value
of +1 and “false” to −1). Note that since observables corresponding to A1 and A2 do not commute
with each other, this amounts to introducing “hidden variables”, for which we now formulate a Bell
theorem [11].

Theorem 1. (No-go theorem for “observer-independent facts”) The following statements are incompatible
(i.e., lead to a contradiction)
1. “Universal validity of quantum theory”. Quantum predictions hold at any scale, even if the measured
system contains objects as large as an “observer” (including her laboratory, memory etc.).
2. “Locality”. The choice of the measurement settings of one observer has no influence on the outcomes of the
other distant observer(s).
3. “Freedom of choice”. The choice of measurement settings is statistically independent from the rest of
the experiment.
4. “Observer-independent facts”. One can jointly assign truth values to the propositions about observed
outcomes (“facts”) of different observers (as specified in the postulate above).

Before going to the proof, I make two comments. Firstly, we use the word “universal” in assumption 1
in the sense of Peres [12]: “There is nothing in quantum theory making it applicable to three atoms
and inapplicable to $10^{23}$ ... Even if quantum theory is universal, it is not closed. A distinction must be
made between endophysical systems—those which are described by the theory—and exophysical ones,
which lie outside the domain of the theory (for example, the telescopes and photographic plates used
by astronomers for verifying the laws of celestial mechanics). While quantum theory can in principle
describe anything, a quantum description cannot include everything. In every physical situation
something must remain unanalyzed. This is not a flaw of quantum theory, but a logical necessity ...”.
Secondly, the theorem can be derived by replacing assumptions 2, 3 and 4 with a single assumption
of Bell’s “local causality”. The latter already implies the existence of (local) probabilities for “joint facts”
for Wigner and Wigner’s friend [13], which is the subject of the present no-go theorem. The reason for
working with the present choice of assumptions is that the relevance of the theorem for the propositions
different observers make about their respective outcomes becomes apparent.

Proof. With reference to Figure 2, consider a pair of super-observers (Alice and Bob) who can carry out
experiments on two systems that include a laboratory for each system, in each of which an observer
(Charlie and Debbie, respectively) performs a measurement on a spin-1/2 particle. We consider
a Bell inequality test and assume that Alice chooses between two measurement settings $A_1$ and
$A_2$, and similarly, Bob chooses between $B_1$ and $B_2$. The settings $A_1$ and $A_2$ correspond to the
observational statements that Charlie and Alice, respectively, can make about their outcomes.
Similarly, the settings $B_1$ and $B_2$ correspond to observational statements of Debbie and Bob, respectively.
Assumptions (2), (3) and (4) together account for the existence of local hidden variables that predefine
the values for $A_1$, $A_2$, $B_1$ and $B_2$ to be +1 or −1. Moreover, the assumptions imply the existence
of the joint probability $p(A_1, A_2, B_1, B_2)$ whose marginals satisfy the Clauser–Horne–Shimony–Holt
(CHSH) inequality: $S = \langle A_1 B_1\rangle + \langle A_1 B_2\rangle + \langle A_2 B_1\rangle - \langle A_2 B_2\rangle \le 2$. Here, for example,
$\langle A_1 B_1\rangle = \sum_{A_1, B_1 = -1, 1} A_1 B_1 \, p(A_1, B_1)$ and $p(A_1, B_1) = \sum_{A_2, B_2 = -1, 1} p(A_1, A_2, B_1, B_2)$, and similarly for the other cases.
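The CHSH bound for predefined values can be checked by brute force. The following is a minimal sketch, assuming only that each of the four quantities takes the value +1 or −1; the function name `chsh` is an illustrative choice. Because any joint probability over the four values is a convex mixture of the 16 deterministic assignments, bounding each assignment bounds the averaged expression as well.

```python
from itertools import product

def chsh(a1, a2, b1, b2):
    """CHSH combination for one deterministic assignment of +/-1 values."""
    return a1 * b1 + a1 * b2 + a2 * b1 - a2 * b2

# Enumerate all 16 assignments of predefined values to A1, A2, B1, B2.
values = [chsh(*assignment) for assignment in product((-1, 1), repeat=4)]

lhv_max = max(values)
lhv_min = min(values)
print(sorted(set(values)), lhv_max, lhv_min)
```

Every assignment gives either +2 or −2, so no mixture of them can exceed 2 in absolute value.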
Suppose that Charlie and Debbie initially share an entangled state of two respective spin-1/2
particles S1 and S2 in a state:

$$|\psi\rangle_{S_1 S_2} = -\sin\frac{\theta}{2}\, |\phi^+\rangle_{S_1 S_2} + \cos\frac{\theta}{2}\, |\psi^-\rangle_{S_1 S_2}, \tag{3}$$

where $|\phi^+\rangle_{S_1 S_2} = \frac{1}{\sqrt{2}}\left(|z+\rangle_{S_1} |z+\rangle_{S_2} + |z-\rangle_{S_1} |z-\rangle_{S_2}\right)$ and
$|\psi^-\rangle_{S_1 S_2} = \frac{1}{\sqrt{2}}\left(|z+\rangle_{S_1} |z-\rangle_{S_2} - |z-\rangle_{S_1} |z+\rangle_{S_2}\right)$, and the first spin is in possession of Charlie and the second of Debbie. The state can be
obtained by applying the rotation $\mathbb{1} \otimes e^{-\frac{i}{2}\theta\sigma_y}$ to the singlet state $|\psi^-\rangle_{S_1 S_2}$,
where θ is the angle of rotation of Debbie’s spin around the y-axis and $\sigma_y$ is a Pauli
matrix. This particular choice of the state enables all measured observables to be either of the Wigner’s
friend type, or of the Wigner type.
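For Alice and Bob, the overall state of the spins together with Charlie’s and Debbie’s laboratories
is initially:

$$|\Psi_0\rangle = |\psi\rangle_{S_1 S_2} |0\rangle_C |0\rangle_D, \tag{4}$$

in agreement with Assumption 1. The state $|0\rangle_C |0\rangle_D$ of the two observers does not require further
characterization, except that it describes observers capable of completing a measurement.
Now, Charlie and Debbie each perform a measurement of the respective spin along the z direction.
This measurement procedure is described as a unitary transformation from the point of view of
Alice and Bob. We assume that after Charlie and Debbie complete their measurement, the overall
state becomes:

$$|\tilde{\Psi}\rangle = -\sin\frac{\theta}{2}\, |\Phi^+\rangle + \cos\frac{\theta}{2}\, |\Psi^-\rangle, \tag{5}$$

where:

$$|\Phi^+\rangle = \frac{1}{\sqrt{2}}\left(|A_{\mathrm{up}}\rangle |B_{\mathrm{up}}\rangle + |A_{\mathrm{down}}\rangle |B_{\mathrm{down}}\rangle\right), \tag{6}$$

$$|\Psi^-\rangle = \frac{1}{\sqrt{2}}\left(|A_{\mathrm{up}}\rangle |B_{\mathrm{down}}\rangle - |A_{\mathrm{down}}\rangle |B_{\mathrm{up}}\rangle\right) \tag{7}$$

and:

$$|A_{\mathrm{up}}\rangle = |z+\rangle_{S_1} |C_{z+}\rangle_C, \quad |B_{\mathrm{up}}\rangle = |z+\rangle_{S_2} |D_{z+}\rangle_D, \tag{8}$$

$$|A_{\mathrm{down}}\rangle = |z-\rangle_{S_1} |C_{z-}\rangle_C, \quad |B_{\mathrm{down}}\rangle = |z-\rangle_{S_2} |D_{z-}\rangle_D. \tag{9}$$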

We take now θ = π/4 and define two sets of (binary) observables, which play the same role as
spin (Pauli) operators along the z and x axis, respectively: $A_z = |A_{\mathrm{up}}\rangle\langle A_{\mathrm{up}}| - |A_{\mathrm{down}}\rangle\langle A_{\mathrm{down}}|$ and
$A_x = |A_{\mathrm{up}}\rangle\langle A_{\mathrm{down}}| + |A_{\mathrm{down}}\rangle\langle A_{\mathrm{up}}|$ for Alice and similarly $B_z$ and $B_x$ for Bob. In the Bell experiment,
Alice chooses between $A_1 = A_z$ and $A_2 = A_x$, whereas Bob chooses between $B_1 = B_z$ and $B_2 = B_x$.
Note that Alice and Bob each choose between the friend’s ($A_1$ and $B_1$) and Wigner’s ($A_2$ and $B_2$) type
of measurement. The Bell test with these measurement settings and state (5) results in $S_Q = 2\sqrt{2}$.
The violation of the inequality implies that the conjunction of the assumptions (1–4) used to derive it
is untenable.
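The size of the quantum value can be reproduced numerically. The sketch below is an illustration under stated assumptions rather than the author's calculation: each laboratory is treated as an effective two-level system spanned by its "up" and "down" states, the basis ordering and all helper names are hypothetical, and, since the placement of the minus sign that attains the bound depends on sign conventions for the state and observables, the code scans the four standard placements and reports the largest magnitude.

```python
from math import sin, cos, pi, sqrt

# Effective two-level operators in the (up, down) basis of each laboratory.
Z = [[1, 0], [0, -1]]   # friend-type observable (A_z, B_z)
X = [[0, 1], [1, 0]]    # Wigner-type observable (A_x, B_x)

def kron(a, b):
    """Kronecker product of two 2x2 matrices, giving a 4x4 matrix."""
    return [[a[i][j] * b[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]

def expectation(op, state):
    """Return <state|op|state> for a real, normalized 4-component state."""
    applied = [sum(op[i][j] * state[j] for j in range(4)) for i in range(4)]
    return sum(s * a for s, a in zip(state, applied))

def state_5(theta):
    """State (5) in the assumed ordering: up-up, up-down, down-up, down-down."""
    r = 1 / sqrt(2)
    phi_plus = [r, 0.0, 0.0, r]
    psi_minus = [0.0, r, -r, 0.0]
    return [-sin(theta / 2) * p + cos(theta / 2) * m
            for p, m in zip(phi_plus, psi_minus)]

psi = state_5(pi / 4)
e11 = expectation(kron(Z, Z), psi)  # <A1 B1>
e12 = expectation(kron(Z, X), psi)  # <A1 B2>
e21 = expectation(kron(X, Z), psi)  # <A2 B1>
e22 = expectation(kron(X, X), psi)  # <A2 B2>

# The four CHSH expressions, differing in which correlator carries the minus sign.
chsh_values = [
    e11 + e12 + e21 - e22,
    e11 + e12 - e21 + e22,
    e11 - e12 + e21 + e22,
    -e11 + e12 + e21 + e22,
]
s_max = max(abs(v) for v in chsh_values)
print(s_max)
```

All four correlators have magnitude 1/√2 in this sketch, and the largest CHSH magnitude exceeds the bound of 2 obeyed by predefined values.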

Figure 2. A Bell experiment on two entangled observers in a Wigner-friend scenario. The super-observers
Alice and Bob perform their respective measurements on laboratories containing the observers Charlie
and Debbie, who both perform a Stern–Gerlach measurement on their respective spin-1/2 particles.

In Appendix A, we present a Greenberger–Horne–Zeilinger-type version of the theorem with three
Wigners and three friends. There, the discrepancy between quantum theory and the theories respecting
(2–4) is no longer of a probabilistic, but of a deterministic nature.


We conclude that Wigner, even though he has clear evidence for the occurrence of a definite outcome
in the friend’s laboratory, cannot assume any specific value for the outcome to coexist together
with the directly observed value of his outcome, given that all other assumptions are respected.
Moreover, there is no theoretical framework where one can assign jointly the truth values to
observational propositions of different observers (they cannot build a single Boolean algebra) under
these assumptions. A possible consequence of the result is that there cannot be facts of the world
per se, but only relative to an observer, in agreement with Rovelli’s relational interpretation [3],
quantum Bayesianism (already in 1996, in the “Replies to Referee 4” of [14], Fuchs drew a distinction
between “facts for the agent” and “facts for everybody”) [4], as well as the (neo)-Copenhagen
interpretation [8]. It is interesting to note that a similar view was expressed by Jammer as early
as in 1974 [15], when he wrote that “the description of the state of a system, rather than being restricted
to the particle (or systems of particles) under observation, expresses a relation between the particle
and all the measurement devices involved.” Other possible interpretations of the violation of Bell’s
inequalities include violations of Assumption 1 in collapse models [5–7], of Assumption 2 in non-local
hidden variable models such as de Broglie–Bohm theory [16] or of Assumption 3 in superdeterministic
theories [17]. The proper account of the result in the many-worlds interpretation should be found in the
interpretation’s account of Bell’s inequality violation [18,19] and points again to observer-dependent
facts as they depend on the branch of the many worlds.

4. Relation to the Paper by Frauchiger and Renner, arXiv: 1604.07422

Building upon works by Deutsch [10], Hardy [20,21] and the results of [8] reviewed above, Frauchiger and
Renner [9] proposed an “extended Wigner-friend thought experiment”, from which they concluded
that “single-world interpretations of quantum theory cannot be self-consistent”. The implications of
their argument have been discussed since then [4,22–24].
The claim of [9] is based on an incompatibility proof stating that there cannot exist a physical
theory T that would fulfill the following three properties (informal versions; see [9] for details):
(QT) “Compliance with quantum theory: T forbids all measurement results that are forbidden by
standard quantum theory (and this condition holds even if the measured system is large enough
to contain itself an experimenter).”
(SW) “Single-world: T rules out the occurrence of more than one single outcome if an experimenter
measures a system once.”
(SC) “Self-consistency: T’s statements about measurement outcomes are logically consistent (even if
they are obtained by considering the perspectives of different experimenters).”
Property (QT) is essentially a weaker version of our Assumption 1 where it is sufficient to require
the validity of quantum theory for results with vanishing probability (as the argument is possibilistic,
not probabilistic). An example of a theory violating property (SW) is the many-worlds interpretation
of quantum theory.
The argument combines a set of statements that involve different observers F1 , F2 , A and W and
can be drawn on the basis of theory T:
S1 If F1 sees r = t, then W sees w ≠ ok.
S2 If F2 sees z = +, then F1 sees r = t.
S3 If A sees x = ok, then F2 sees z = +.
S4 W sees w = ok and is told by A that x = ok.
The specific type of quantum state, measurements and outcomes involved in the argument is not
relevant for further discussion and will be omitted here.
Property (SC) is crucial in a step of the proof, where one combines “nested” statements (S1 –S4 ) [25].
In the first step, the self-consistency property (SC) implies the following:
$$S_a \cap S_b \Longrightarrow S_c \tag{10}$$
where ∩ denotes logical “and” and the statements are of the type:


Sa Observer W assigns the truth value “true” to the statement: “A sees x = ok”;
Sb Observer A assigns the truth value “true” to the statement: “If x = ok, then F2 sees z = +”;
Sc Observer W assigns the truth value “true” to the statement: “A concludes that F2 sees
z = +”.

By repeating reasoning (10) in an iterative way, going from statement S4 to S1 , one arrives at a
new statement:

T Observer W concludes that A concludes that F2 concludes that F1 concludes that w ≠ ok.

It is important to note that this statement refers to W’s conclusion about what other observers
conclude when they apply T conditional on the outcomes they observe. It is not a statement about his
directly observed outcome.
In the second step, the self-consistency property (SC) is used to arrive at an implication of the
following type:
$$T \Longrightarrow S, \tag{11}$$
where the implied statement is:

S Observer W concludes that w ≠ ok,

which stands in logical contradiction with W’s directly observed outcome w = ok.
The second step is non-trivial. It enables promoting others’ knowledge based on their observations
to one’s own knowledge and then putting this “promoted knowledge” in logical comparison with one’s
own knowledge gained through direct observation. Through implication (11), the self-consistency
property (SC) enables observational statements of other observers (A, F2 and F1 ) to be logically
compared with one’s (W’s) own. This has the same predictive power as a theoretical framework in
which the truth values of statements of different observers can jointly be assigned and compared.
To see this, denote statements Si , i = 1, 2, 3 as implications S3 : (P ⟹ Q), S2 : (Q ⟹ R) and
S1 : (R ⟹ S), where P: “A sees x = ok”, Q: “F2 sees z = +”, R: “F1 sees r = t” and S: “W sees
w ≠ ok”. Then, “collapsing” others’ knowledge into W’s knowledge via Equation (11) is equivalent in
its implications to considering all the statements as belonging to a single Boolean algebra (i.e., they are
now all propositions of observer W, who can apply logical operations on them) for which one can
use the transitivity of implication to arrive at [P ∩ (P ⟹ Q) ∩ (Q ⟹ R) ∩ (R ⟹ S)] ⟹ S.
Statement S is again in logical contradiction to W’s directly observed outcome w = ok.
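The purely logical part of this step can be verified with a truth table. The sketch below assumes that P, Q, R and S are ordinary Boolean propositions living in one Boolean algebra, which is exactly the assumption under discussion; the function names are illustrative.

```python
from itertools import product

def implies(p, q):
    """Material implication p => q."""
    return (not p) or q

def premises(p, q, r, s):
    """P together with the chain P => Q, Q => R, R => S."""
    return p and implies(p, q) and implies(q, r) and implies(r, s)

assignments = list(product((False, True), repeat=4))

# The chain entails S for every truth assignment (a tautology).
is_tautology = all(implies(premises(*v), v[3]) for v in assignments)

# The premises can never hold together with "not S", i.e., with W seeing w = ok.
contradiction_possible = any(premises(*v) and not v[3] for v in assignments)

print(is_tautology, contradiction_possible)
```

Within a single Boolean algebra the conflict with W's observation w = ok is therefore unavoidable; the argument in the text concerns whether placing all four propositions in one algebra is justified in the first place.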
We have seen that the existence of a single Boolean algebra for truth values for observational
statements of different observers is incompatible with the assumptions of “locality”, “freedom of
choice” and the predictions of quantum theory, which does not impose any constraints on the objects
to which it is applied. This might be interpreted as an indication that the strong conclusions implied
by the theorem of [9] rely on an overly restrictive requirement, property (SC), imposed on a physical theory.
The requirement need not be fulfilled, not only in quantum theory, but in other physical theories as well.
An example was provided by Sudbery [23]: In the special theory of relativity, due to time dilation, every
inertial observer can claim that the clock of a moving partner ticks slower than her/his own. This apparent
contradiction in predictions of different observers is resolved when one realizes that the statements only
have meaning with respect to the specific, observer-dependent measurement procedures that define
“simultaneity”. Similarly, the states referring to outcomes of different observers in a Wigner-friend
type of experiment cannot be defined without referring to the specific experimental arrangements
of the observers, in agreement with Bohr’s idea of contextuality as formulated by him in 1963 [26]:
“the unambiguous account of proper quantum phenomena must, in principle, include a description of
all relevant features of experimental arrangement.”
I conclude with a remark that the theorem by Frauchiger and Renner has deep conceptual value,
as it points to the necessity to differentiate between one’s knowledge about direct observations and
one’s knowledge about others’ knowledge that is compatible with physical theories. It is likely that
understanding this difference will be an important ingredient in further development of the method of
Bayesian inference in situations as in the Wigner-friend experiment.

Funding: I acknowledge the support of the Austrian Science Fund (FWF) through the project I-2526-N27.
This research was funded by the John Templeton Foundation, grant number 60609. The opinions expressed in this
publication are those of the authors and do not necessarily reflect the views of the John Templeton Foundation.
Acknowledgments: I acknowledge helpful discussions with Mateus Araújo, Veronika Baumann, Adán Cabello,
Giulio Chiribella, Christopher Fuchs, Borivoje Dakić, Philipp Höhn, Nikola Paunković, Lídia del Rio, Rüdiger
Schack and Stefan Wolf. I would like to especially acknowledge the fruitful discussions with Renato Renner and
thank him for providing notes summarizing that discussion.
Conflicts of Interest: The author declares no conflict of interest.

Appendix A
The Bell theorem from the main text can be extended to a Greenberger–Horne–Zeilinger (GHZ)
version [27] with three friends and three Wigners. Since the incompatibility of Assumptions 1–4 is not of a
probabilistic, but rather of a deterministic nature, this version of the theorem completely bypasses any use
of the notion of probability, similarly to the version by Frauchiger and Renner [9]. The experiment was
independently introduced in [28], where it was argued that it suggests a violation of Lorentz symmetry.
Consider three spatially-separated observers (Wigners), Alice, Bob and Cleve. They each perform a
measurement on a subsystem of a tripartite system. Each of the subsystems includes a further observer,
Debbie, Eric and Fiona (Wigner’s friends), who perform a Stern–Gerlach measurement of spin along x of
their respective spin-1/2 particles. Alice measures Debbie and her spin particle; Bob measures Eric and his
spin particle; and finally, Cleve measures Fiona and her spin particle. We consider a GHZ test where Alice
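chooses between two measurement settings, $A_1$ and $A_2$, Bob between $B_1$ and $B_2$ and Cleve between $C_1$
and $C_2$. Assumptions 2, 3 and 4 imply that $A_1$, $A_2$, $B_1$, $B_2$, $C_1$ and $C_2$ have predefined values of $+1$ or $-1$.
Define $\hat{A}_x = |A_{up}\rangle\langle A_{up}| - |A_{down}\rangle\langle A_{down}|$ and $\hat{A}_y = i\,(|A_{up}\rangle\langle A_{down}| - |A_{down}\rangle\langle A_{up}|)$ for Alice
and similarly $\hat{B}_x$ and $\hat{B}_y$ for Bob and $\hat{C}_x$ and $\hat{C}_y$ for Cleve, where:

$$|A_{up}\rangle = |x+\rangle_{A_1} |D_{x+}\rangle_{A_2}, \quad |B_{up}\rangle = |x+\rangle_{B_1} |E_{x+}\rangle_{B_2}, \quad |C_{up}\rangle = |x+\rangle_{C_1} |F_{x+}\rangle_{C_2}, \tag{A1}$$

$$|A_{down}\rangle = |x-\rangle_{A_1} |D_{x-}\rangle_{A_2}, \quad |B_{down}\rangle = |x-\rangle_{B_1} |E_{x-}\rangle_{B_2}, \quad |C_{down}\rangle = |x-\rangle_{C_1} |F_{x-}\rangle_{C_2}. \tag{A2}$$

In the GHZ test, we choose $\hat{A}_1 = \hat{A}_x$, $\hat{A}_2 = \hat{A}_y$ for Alice and similarly for Bob and Cleve.
Assume that Alice, Bob and Cleve perform these measurements on a shared GHZ state:

$$|\Psi_{GHZ}\rangle_{ABC} = \frac{1}{\sqrt{2}} \left( |A+\rangle |B+\rangle |C+\rangle - |A-\rangle |B-\rangle |C-\rangle \right), \tag{A3}$$

where due to Assumption 1, we presume that such a state can be prepared and $|A\pm\rangle = \frac{1}{\sqrt{2}}(|A_{up}\rangle \pm |A_{down}\rangle)$,
$|B\pm\rangle = \frac{1}{\sqrt{2}}(|B_{up}\rangle \pm |B_{down}\rangle)$ and $|C\pm\rangle = \frac{1}{\sqrt{2}}(|C_{up}\rangle \pm |C_{down}\rangle)$.
In order to reproduce perfect correlations in the GHZ state, the predefined values need to satisfy
$A_x B_y C_y = A_y B_x C_y = A_y B_y C_x = 1$. These equations imply then that $A_x B_x C_x = 1$ (multiply the three products
and use $A_y^2 = B_y^2 = C_y^2 = 1$); however, one finds the opposite result in quantum mechanics:
$\hat{A}_x \hat{B}_x \hat{C}_x |\Psi_{GHZ}\rangle_{ABC} = -|\Psi_{GHZ}\rangle_{ABC}$.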
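The contradiction can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes that each laboratory (friend plus spin) is modelled as one effective two-level system with basis $|\pm\rangle$, in which $\hat{A}_x$ and $\hat{A}_y$ act as the Pauli matrices X and Y. The helper names (`kron`, `expectation`, `classical_solutions`) are illustrative.

```python
from itertools import product
from math import sqrt

# One laboratory in the {|+>, |->} basis (rows/columns ordered +, -).
# In this basis A_x swaps |+> and |->, and A_y maps |+> to i|-> and
# |-> to -i|+>, i.e. they act as Pauli X and Pauli Y (modelling assumption).
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]

def kron(a, b):
    """Kronecker product of two square matrices stored as nested lists."""
    na, nb = len(a), len(b)
    return [[a[i][j] * b[k][l] for j in range(na) for l in range(nb)]
            for i in range(na) for k in range(nb)]

def expectation(op_a, op_b, op_c, state):
    """Return <state| op_a (x) op_b (x) op_c |state> for an 8-component state."""
    m = kron(kron(op_a, op_b), op_c)
    image = [sum(m[i][j] * state[j] for j in range(8)) for i in range(8)]
    return sum(complex(s).conjugate() * w for s, w in zip(state, image))

# GHZ state (|+++> - |--->)/sqrt(2); index 0 is |+++>, index 7 is |--->.
ghz = [0.0] * 8
ghz[0], ghz[7] = 1 / sqrt(2), -1 / sqrt(2)

quantum = {
    "xyy": expectation(X, Y, Y, ghz),
    "yxy": expectation(Y, X, Y, ghz),
    "yyx": expectation(Y, Y, X, ghz),
    "xxx": expectation(X, X, X, ghz),
}

# Brute-force search for predefined values in {+1, -1} that reproduce
# all four quantum correlations at once.
classical_solutions = [
    (ax, ay, bx, by, cx, cy)
    for ax, ay, bx, by, cx, cy in product([1, -1], repeat=6)
    if ax * by * cy == 1 and ay * bx * cy == 1
    and ay * by * cx == 1 and ax * bx * cx == -1
]
```

In this sketch the three mixed products have expectation +1 and the xxx product has expectation −1 on the GHZ state, while the exhaustive search over the 64 sign assignments returns no solution.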

References
1. Wigner, E.P. Remarks on the mind-body question. In The Scientist Speculates; Good, I.J., Ed.; Heinemann:
London, UK, 1961.
2. Everett, H. “Relative State” Formulation of Quantum Mechanics. Rev. Mod. Phys. 1957, 29, 454–462. [CrossRef]
3. Rovelli, C. Relational quantum mechanics. Int. J. Theor. Phys. 1996, 35, 1637–1678. [CrossRef]
4. Fuchs, C.A. Notwithstanding Bohr, the Reasons for QBism. Mind Matter 2017, 15, 245–300.
5. Ghirardi, G.C.; Rimini, A.; Weber, T. Unified dynamics for microscopic and macroscopic systems.
Phys. Rev. D 1986, 34, 470. [CrossRef]
6. Diosi, L. Models for universal reduction of macroscopic quantum fluctuations. Phys. Rev. A 1989, 40, 1165.
[CrossRef]
7. Penrose, R. On gravity’s role in quantum state reduction. Gen. Relat. Gravit. 1996, 28, 581–600. [CrossRef]
8. Brukner, Č. On the quantum measurement problem. In Quantum [Un]speakables II; Bertlmann, R.,
Zeilinger, A., Eds.; The Frontiers Collection; Springer: New York, NY, USA, 2016. [CrossRef]
9. Frauchiger, D.; Renner, R. Single-world interpretations of quantum theory cannot be self-consistent. arXiv
2016, arXiv:1604.07422. [CrossRef]
10. Deutsch, D. Quantum theory as a universal physical theory. Int. J. Theor. Phys. 1985, 24, 1–41. [CrossRef]
11. Bell, J.S. Speakable and Unspeakable in Quantum Mechanics; Collected Papers on Quantum Philosophy;
Cambridge University Press: Cambridge, UK, 2004. [CrossRef]
12. Peres, A. Quantum Theory: Concepts and Methods; Springer: New York, NY, USA, 1995; p. 173. [CrossRef]
13. Zukowski, M.; Brukner, Č. Quantum non-locality—It ain’t necessarily so ... J. Phys. A Math. Theor. 2014,
47, 424009. [CrossRef]
14. Fuchs, C.A.; Schlosshauer, M.; Stacey, B.C. My Struggles with the Block Universe. arXiv 2015, arXiv:1405.2390.
[CrossRef]
15. Jammer, M. The Philosophy of Quantum Mechanics: The Interpretations of QM in Historical Perspective; John Wiley
and Sons: Hoboken, NJ, USA, 1974; pp. 197–198.
16. Bohm, D. A Suggested Interpretation of the Quantum Theory in Terms of “Hidden” Variables, I and II.
Phys. Rev. 1952, 85, 166–193. [CrossRef]
17. ’t Hooft, G. Free Will in the Theory of Everything. arXiv 2017, arXiv:1709.02874. [CrossRef]
18. Brown, H.R.; Timpson, C.G. Bell on Bell’s theorem: The changing face of nonlocality. In Quantum Nonlocality
and Reality: 50 Years of Bell’s Theorem; Bell, M., Gao, S., Eds.; Cambridge University Press: Cambridge,
UK, 2016.
19. Araújo, M. Understanding Bell’s Theorem Part 3: The Many-Worlds Version. Blog: More Quantum.
Available online: http://mateusaraujo.info/2016/08/02/understanding-bells-theorem-part-3-the-many-worlds-version/
(accessed on 2 August 2016).
20. Hardy, L. Quantum mechanics, local realistic theories, and Lorentz-invariant realistic theories. Phys. Rev. Lett.
1992, 68, 2981. [CrossRef] [PubMed]
21. Hardy, L. Nonlocality for two particles without inequalities for almost all entangled states. Phys. Rev. Lett.
1993, 71, 1665. [CrossRef] [PubMed]
22. Baumann, V.; Hansen, A.; Wolf, S. The measurement problem is the measurement problem is the
measurement problem. arXiv 2016, arXiv:1611.01111. [CrossRef]
23. Sudbery, A. Single-World Theory of the Extended Wigner’s Friend Experiment. Found. Phys. 2017, 47,
658–669. [CrossRef]
24. Bub, J. Why Bohr was (Mostly) Right. arXiv 2017, arXiv:1711.01604. [CrossRef]
25. Brukner, Č. (University of Vienna, Austria; Austrian Academy of Sciences, Austria); Renner, R. (Institute for
Theoretical Physics, ETH Zürich, Switzerland). Personal communication, 2017.
26. Bohr, N. Quantum Physics and Philosophy: Causality and Complementarity. In Philosophy in Mid-Century:
A Survey; Klibansky, R., Ed.; La Nuova Italia Editrice: Florence, Italy, 1963.
27. Greenberger, D.M.; Horne, M.A.; Shimony, A.; Zeilinger, A. Going beyond Bell’s Theorem. Am. J. Phys. 1990,
58, 1131–1143. [CrossRef]
28. Leegwater, G. When GHZ Meet Wigner’s Friend. Erasmus University Rotterdam, Rotterdam, The Netherlands.
Unpublished manuscript, 2017.

© 2018 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Article
A Royal Road to Quantum Theory (or Thereabouts)
Alexander Wilce
Department of Mathematics, Susquehanna University, Selinsgrove, PA 17870, USA

Received: 15 January 2018; Accepted: 19 March 2018; Published: 26 March 2018

Abstract: This paper fails to derive quantum mechanics from a few simple postulates. However,
it gets very close, and does so without much exertion. More precisely, I obtain a representation
of finite-dimensional probabilistic systems in terms of Euclidean Jordan algebras, in a strikingly
easy way, from simple assumptions. This provides a framework within which real, complex and
quaternionic QM can play happily together and allows some (but not too much) room for more
exotic alternatives. (This is a leisurely summary, based on recent lectures, of material from the papers
arXiv:1206.2897 and arXiv:1507.06278, the latter joint work with Howard Barnum and Matthew
Graydon. Some further ideas are also explored, developing the connection between conjugate
systems and the possibility of forming stable measurement records and making connections between
this approach and the categorical approach to quantum theory.)

Keywords: reconstruction of quantum mechanics; conjugate systems; Jordan algebras

1. Introduction and Overview

Whatever else it may be, quantum mechanics (QM) is a machine for making probabilistic
predictions about the results of measurements. To this extent, QM is, at least in part, about information.
Over the last decade or so, it has become clear that the formal apparatus of quantum theory, at least
in finite dimensions, can be recovered from constraints on how physical systems store and process
information. To this extent, finite-dimensional QM is just about information.
The broad idea of regarding QM in this way, and of attempting to derive its mathematical structure
from simple operational or probabilistic axioms, is not new. Efforts in this direction go back at least to
the work of von Neumann [1], and include also attempts by Schwinger [2], Mackey [3], Ludwig [4],
Piron [5], and many others. However, the consensus is that these were not entirely successful: partly
because the results they achieved (e.g., Piron’s well-known representation theorem) did not rule out
certain rather exotic alternatives to QM, but mostly because the axioms deployed seem, in retrospect,
to lack sufficient physical or operational motivation.
More recently, with inspiration from quantum information theory, attention has focused on
finite-dimensional systems, where the going is a bit easier. Just as importantly, quantum information
theory prompts us to treat properties of composite systems as fundamental, where earlier work focused
largely on systems in isolation (a recent exception to this trend is the paper [6] of Barnum, Müller and
Ududec). These shifts of emphasis are illustrated by the work of Hardy [7], who presented five simple,
broadly information-theoretic postulates governing the states and measurements associated with a
physical system, determining a very restricted set of possible theories, parametrized by a positive
integer r, with finite-dimensional classical and quantum probability theory corresponding to r = 1
and r = 2, respectively. Following this lead, several papers, notably [8–10], have derived finite-dimensional QM
from various packages of axioms governing the information-carrying and information-processing
capacity of finite-dimensional systems.

Entropy 2018, 20, 227; doi:10.3390/e20040227 81 www.mdpi.com/journal/entropy

Problems with existing approaches. These recent reconstructive efforts suffer from two related
problems. First, they make use of assumptions that seem too strong. Secondly, in trying to derive
exactly complex, finite-dimensional quantum theory, they derive too much.

• All of the cited papers assume local tomography. This is the doctrine that the state of a bipartite
composite system is entirely determined by the joint probabilities it assigns to outcomes of
measurements on the two subsystems. This rules out both real and quaternionic QM, both of
which are legitimate quantum theories [11].
• These papers also all make some version of a uniformity assumption: that all systems having the
same information-carrying capacity are isomorphic, or that all systems are composed, in a uniform
way, from “bits” of a uniform type. Here, “information carrying capacity” means essentially
the maximum number of states that can be distinguished from one another with probability
one by a single measurement. A bit is a system for which this number is two. This rules out
systems involving superselection rules, i.e., those that admit both quantum and classical degrees of
freedom (for example, the quantum system $M_2(\mathbb{C}) \oplus M_2(\mathbb{C})$, corresponding to a
classical choice between one of two qubits, has the same information-carrying capacity as a single,
four-level quantum system). More seriously, it rules out any theory that includes, e.g., real and
complex, or real and quaternionic systems, as the state spaces of the bits of these theories have
different dimensions. As I will discuss below, one can indeed construct mathematically-reasonable
theories that embrace finite-dimensional quantum systems of all three types.
• Another shortcoming, not related to the exclusion of real and quaternionic QM, is the technical
assumption (explicit in [10] for bits) that all positive affine functionals on the state space
taking values between zero and one correspond to physically-accessible “effects”, i.e., possible
measurement results. From an operational point of view, this principle (called the “no-restriction
hypothesis” in [12]) seems to call for further motivation.

Another approach. In these notes, I am going to describe an alternative approach that avoids these
difficulties. This begins by associating with every physical system a convex set of states and a
distinguished set of basic measurements (or experiments) that can be made on the system. We then
isolate two striking features shared by classical and quantum probabilistic systems. The first is the
possibility of finding a joint state that perfectly correlates a system $A$ with an isomorphic system $\overline{A}$
(call it a conjugate system) in the sense that every basic measurement on $A$ is perfectly correlated
with the corresponding measurement on $\overline{A}$. In finite-dimensional QM, where $A$ is represented by
a finite-dimensional Hilbert space $H$, $\overline{A}$ corresponds to the conjugate Hilbert space $\overline{H}$, and the
perfectly-correlating state is the maximally-entangled “EPR” state on $H \otimes \overline{H}$.
The second feature is the existence of what I call filters associated with each basic measurement.
These are processes that independently attenuate the “response” of each outcome of the measurement
by some specified factor. Such a process will generally not preserve the normalization of states, but up
to a constant factor, in both classical and quantum theory, one can prepare any desired state by applying
a suitable filter to the maximally-mixed state. Moreover, when the target state is not singular (that is,
when it does not assign probability zero to any nonzero measurement outcome), one can reverse the
filtering process, in the sense that it can be undone by another process with positive probability.
The upshot is that all probabilistic systems having conjugates and a sufficiently lavish supply
of (probabilistically) reversible filters can be represented by formally real Jordan algebras, a class
of structures that includes real, complex and quaternionic quantum systems, and just two further
well-studied additional possibilities, which I will review below.
In addition to leaving room for real and quaternionic quantum mechanics (which I take to be
a virtue), this approach has another advantage: it is much easier! The assumptions involved are
few and easily stated, and the proof of the main technical result (Lemma 1 in Section 4) is short
and straightforward. By contrast, the mathematical developments in the papers listed above are
significantly more difficult and ultimately lean on the (even more difficult) classification of compact
groups acting on spheres. My approach, too, leans on a received result, but one that is relatively
accessible. This is the Koecher–Vinberg theorem, which characterizes formally real, or Euclidean,
Jordan algebras in terms of ordered real vector spaces with homogeneous, self-dual cones. A short and
non-taxing proof of this classical result can be found in [13].
These ideas were developed in [14–16] and especially [17], of which this paper is, to an extent,
a summary. However, the presentation here is slightly different, and some additional ideas are
also explored. In particular, I have spelled out in more detail the connection between conjugate
systems and measurement records, only alluded to in the earlier paper. I also link this approach to the
categorical approach to quantum theory due to Abramsky, Coecke and others [18], along the way
briefly discussing recent work with Howard Barnum and Matthew Graydon [19] on the construction
of probabilistic theories in which real, complex and quaternionic quantum systems coexist. Finally,
Appendix B presents a uniqueness result for spectral decompositions of states, which may find
further application.

A bit of background. At this point, I had better pause to explain some terms. A Jordan algebra is a real
commutative algebra (a real vector space $E$ with a commutative bilinear multiplication $a, b \mapsto a \cdot b$)
having a multiplicative unit $u$ and satisfying the Jordan identity: $a^2 \cdot (a \cdot b) = a \cdot (a^2 \cdot b)$, for all
$a, b \in E$, where $a^2 = a \cdot a$. A Jordan algebra is formally real if sums of squares of nonzero elements are
always nonzero. The basic, and motivating, example is the space $L_{sa}(H)$ of self-adjoint operators on a
complex Hilbert space, with the Jordan product given by $a \cdot b = \frac{1}{2}(ab + ba)$. Note that here, $a \cdot a = aa$,
so the notation $a^2$ is unambiguous. To see that $L_{sa}(H)$ is formally real, just note that $a^2$ is always a
positive operator.
If $H$ is finite dimensional, $L_{sa}(H)$ carries a natural inner product, namely $\langle a, b \rangle = \mathrm{Tr}(ab)$.
This plays well with the Jordan product: $\langle a \cdot b, c \rangle = \langle b, a \cdot c \rangle$ for all $a, b, c \in L_{sa}(H)$. More generally,
a finite-dimensional Jordan algebra equipped with an inner product having this property is said to
be Euclidean. For finite-dimensional Jordan algebras, being formally real and being Euclidean are
equivalent [13]. In what follows, I will abbreviate “Euclidean Jordan algebra” to EJA.
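These properties are easy to check by hand for small matrices. The sketch below is illustrative only: it uses 2 × 2 self-adjoint matrices with arbitrarily chosen entries, and the function names are hypothetical. It tests commutativity, the Jordan identity and the compatibility of the trace inner product with the Jordan product, and shows with Pauli matrices that the Jordan product is not associative.

```python
def matmul(a, b):
    """Ordinary matrix product of square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def jordan(a, b):
    """Jordan product a.b = (ab + ba)/2."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[(ab[i][j] + ba[i][j]) / 2 for j in range(n)] for i in range(n)]

def inner(a, b):
    """Trace inner product <a, b> = Tr(ab)."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))

def close(a, b, tol=1e-12):
    """Entrywise comparison of two matrices up to a small tolerance."""
    n = len(a)
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(n) for j in range(n))

# Three self-adjoint 2x2 matrices (arbitrary illustrative values).
a = [[1, 2 - 1j], [2 + 1j, -3]]
b = [[0, 1j], [-1j, 2]]
c = [[2, 1], [1, 1]]

a2 = jordan(a, a)
commutative = close(jordan(a, b), jordan(b, a))
jordan_identity = close(jordan(a2, jordan(a, b)), jordan(a, jordan(a2, b)))
euclidean = abs(inner(jordan(a, b), c) - inner(b, jordan(a, c))) < 1e-12

# Non-associativity: for the Pauli matrices sx and sy, (sx.sx).sy = sy,
# whereas sx.(sx.sy) = 0 because sx and sy anticommute.
sx = [[0, 1], [1, 0]]
sy = [[0, -1j], [1j, 0]]
associative_here = close(jordan(jordan(sx, sx), sy), jordan(sx, jordan(sx, sy)))
```

The Jordan identity is a weak substitute for associativity: it holds for every pair of elements, while the Pauli example shows that full associativity fails.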
Jordan algebras were originally proposed, with what now looks like slightly thin motivation,
by P. Jordan [20]: if a and b are quantum-mechanical observables, represented by a, b ∈ Lsa (H),
then while $a + b$ is again self-adjoint, $ab$ and $ba$ are not, unless $a$ and $b$ commute; however, their
average, $a \cdot b$, is self-adjoint and, thus, represents another observable. Almost immediately, Jordan,
von Neumann and Wigner showed [21] that all formally real Jordan algebras are direct sums of simple
such algebras, with the latter falling into just five classes, parametrized by positive integers $n$: the
self-adjoint parts, $M_n(\mathbb{F})_{sa}$, of matrix algebras $M_n(\mathbb{F})$, where $\mathbb{F} = \mathbb{R}$, $\mathbb{C}$ or $\mathbb{H}$ (the quaternions) or, for
$n = 3$, over $\mathbb{O}$ (the octonions); and also what are called spin factors $V_n$ (closely related to Clifford
algebras). There is some overlap: $V_2 \simeq M_2(\mathbb{R})_{sa}$, $V_3 \simeq M_2(\mathbb{C})_{sa}$ and $V_5 \simeq M_2(\mathbb{H})_{sa}$. In all but one case, one
can show that a simple Jordan algebra is a Jordan subalgebra of $M_n(\mathbb{C})$ for suitable $n$. The exceptional
Jordan algebra, $M_3(\mathbb{O})_{sa}$, admits no such representation.
Besides this classification theorem, there is only one other important fact about Euclidean Jordan
algebras that is needed for what follows. This is the Koecher–Vinberg (KV) theorem alluded to above.
Recall that an ordered vector space is a real vector space, call it E, spanned by a distinguished convex
cone E+ having its vertex at the origin. Such a cone induces a translation-invariant partial order on
$E$, namely $a \le b$ iff $b - a \in E_+$. As an example, the space $L_{sa}(H)$ is ordered by the cone of positive
operators. More generally, any EJA $E$ is an ordered vector space, with positive cone $E_+ := \{ a^2 \mid a \in E \}$.
This cone has two special features: first, it is homogeneous, i.e., for any points $a, b$ in the interior of $E_+$,
there exists an automorphism of the cone (a linear isomorphism $E \to E$, taking $E_+$ onto itself) that
maps $a$ to $b$. In other words, the group of automorphisms of the cone acts transitively on the cone’s
interior. The other special property is that $E_+$ is self-dual. This means that $E$ carries an inner product
(in fact, the given one making $E$ Euclidean) such that $a \in E_+$ iff $\langle a, b \rangle \ge 0$ for all $b \in E_+$.

An order unit in an ordered vector space $E$ is an element $u \in E_+$ such that, for all $a \in E$, there exists
some $n \in \mathbb{N}$ with $a \le nu$. In finite dimensions, this is equivalent to $u$ belonging to the interior of
the cone $E_+$ [22]. In the following, by a Euclidean order unit space, I mean an ordered vector space $E$
equipped with an inner product $\langle \cdot , \cdot \rangle$ with $\langle a, b \rangle \ge 0$ for all $a, b \in E_+$, and a distinguished order unit
$u$. I will say that such a space $E$ is HSD iff $E_+$ is homogeneous, and also self-dual with respect to the
given inner product.

Theorem 1 (Koecher 1958; Vinberg 1961). Let $E$ be a finite-dimensional Euclidean order-unit space. If $E$ is
HSD, then there exists a unique product $\cdot$ with respect to which $E$ (with its given inner product) is a Euclidean
Jordan algebra, $u$ is the Jordan unit, and $E_+$ is the cone of squares.

It seems, then, that if we can motivate a representation of physical systems in terms of
HSD order-unit spaces, we will have “reconstructed” what with a little license we might call
finite-dimensional Jordan-quantum mechanics. In view of the classification theorem glossed above,
this gets us into the neighborhood of orthodox QM, but still leaves open the possibility of taking real
and quaternionic quantum systems seriously. (It also leaves the door open to two possibly unwanted
guests, namely spin factors and the exceptional Jordan algebra. I will discuss below some constraints
that at least bar the latter.)

Some notational conventions. My notation is mostly consistent with the following conventions
(more standard in the mathematics than the physics literature, but in places slightly eccentric relative
to either). Capital Roman letters A, B, C serve as labels for systems. Mn (F) stands for the set of n × n
matrices over F = R, C or H; Mn (F)sa is the set of self-adjoint such matrices. Vectors in a Hilbert space H
are denoted by little Roman letters x, y, z from the end of the alphabet. Operators on H will usually be
denoted by little Roman letters a, b, c, ... from the beginning of the alphabet. Roman letters t, s typically
stand for real numbers. The space of all linear operators on H is denoted L(H); as already indicated
above, Lsa (H) is the (real) vector space of self-adjoint operators on H.
As above, the conjugate Hilbert space is denoted $\overline{H}$. I will write $\bar{x}$ for the vector in $\overline{H}$
corresponding to $x \in H$. From a certain point of view, this is the same vector; the bar serves to
remind us that $c\bar{x} = \overline{\bar{c}x}$ for scalars $c \in \mathbb{C}$. Alternatively, one can regard $\overline{H}$ as the space of “bra” vectors
$\langle x|$ corresponding to the “kets” $|x\rangle$ in $H$, i.e., as the dual space of $H$.
The inner product of $x, y \in H$ is written as $\langle x, y \rangle$ and is linear in the first argument (if you
like: $\langle x, y \rangle = \langle y | x \rangle$ in Dirac notation). The inner product on $\overline{H}$ is then $\langle \bar{x}, \bar{y} \rangle = \langle y, x \rangle$. The rank-one
projection operator associated with a unit vector $x \in H$ is $p_x$. Thus, $p_x(y) = \langle y, x \rangle x$. I denote
functionals on $L_{sa}(H)$ by little Greek letters, e.g., $\alpha, \beta, \ldots$, and operators on $L_{sa}(H)$ by capital Greek
letters, e.g., $\Phi$. Two exceptions to this scheme: a generic density operator on $H$ is denoted by the
capital Roman letter $W$, and a certain special unit vector in $H \otimes \overline{H}$ is denoted by the capital Greek
letter $\Psi$. With luck, context will help keep things straight.

2. Homogeneity and Self-Duality in Quantum Theory

Why should a probabilistic physical system be represented by a Euclidean order-unit space that
is either homogeneous or self-dual? One place to start hunting for an answer might be to look at
standard quantum probability theory, to see if we can isolate, in operational or probabilistic terms,
what makes this self-dual and homogeneous.

Correlation and self-duality. Let $H$ be a finite-dimensional complex Hilbert space, representing
some finite-dimensional quantum system. The system’s states are represented by density operators,
i.e., positive trace-one operators $W \in L_{sa}(H)$; possible measurement-outcomes are represented by
effects, i.e., positive operators $a \in L_{sa}(H)$ with $a \le 1$. The Born rule specifies the probability of
observing effect $a$ in state $W$ as $\mathrm{Tr}(Wa)$. If $W$ is a pure state, i.e., $W = p_v$ where $v$ is a unit vector in $H$,
then $\mathrm{Tr}(Wa) = \langle av, v \rangle$; by the same token, if $a = p_x$, then $\mathrm{Tr}(Wa) = \langle Wx, x \rangle$.
For $a, b \in L_{sa}(H)$, let $\langle a, b \rangle := \mathrm{Tr}(ab)$. This is an inner product. By the spectral theorem,
$\mathrm{Tr}(ab) \ge 0$ for all $b \in L_{sa}(H)_+$ iff $\mathrm{Tr}(ap_x) \ge 0$ for all unit vectors $x$. However, $\mathrm{Tr}(ap_x) = \langle ax, x \rangle$.
So $\mathrm{Tr}(ab) \ge 0$ for all $b \in L_{sa}(H)_+$ iff $a \in L_{sa}(H)_+$, i.e., the trace inner product is self-dualizing.
However, this now leaves us with the following:

Question: What does the trace inner product represent, operationally or probabilistically?

Let $\overline{H}$ be the conjugate Hilbert space to $H$. Suppose $H$ has dimension $n$. Any unit vector $\Psi$ in
$H \otimes \overline{H}$ gives rise to a joint probability assignment to effects $a$ on $H$ and $\bar{b}$ on $\overline{H}$, namely
$\langle (a \otimes \bar{b})\Psi, \Psi \rangle$ (here $\bar{b}$ denotes the operator on $\overline{H}$ with $\bar{b}\,\bar{x} = \overline{bx}$).
Consider the EPR state for $H \otimes \overline{H}$ defined by the unit vector:

$$\Psi = \frac{1}{\sqrt{n}} \sum_{x \in E} x \otimes \bar{x} \in H \otimes \overline{H},$$

where $E$ is any orthonormal basis for $H$. A straightforward computation shows that the joint probability
of observing $a$ and $\bar{b}$ in the state $\Psi$ is:

$$\langle (a \otimes \bar{b})\Psi, \Psi \rangle = \frac{1}{n} \mathrm{Tr}(ab).$$

In other words, the normalized trace inner product just is the joint probability function determined
by the pure state vector $\Psi$!
As a consequence, the state represented by $\Psi$ has a very strong correlational property: if $x, y$ are
two orthogonal unit vectors with corresponding rank-one projections $p_x$ and $p_y$, we have $p_x p_y = 0$,
so $\langle (p_x \otimes \bar{p}_y)\Psi, \Psi \rangle = 0$. On the other hand, $\langle (p_x \otimes \bar{p}_x)\Psi, \Psi \rangle = \frac{1}{n} \mathrm{Tr}(p_x) = \frac{1}{n}$. Hence, $\Psi$ perfectly,
and uniformly, correlates every basic measurement (orthonormal basis) of $H$ with its counterpart in $\overline{H}$.
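The trace formula and the perfect-correlation property can be verified numerically for a qubit. This is a sketch under stated assumptions: the conjugate space is identified with $\mathbb{C}^n$ through the basis $\{\bar{x} : x \in E\}$, so that the operator paired with $b$ on the conjugate side has the entrywise complex-conjugate matrix. The effects `a` and `b` and all function names are illustrative choices.

```python
from math import sqrt

def conj_matrix(b):
    """Entrywise complex conjugate: the matrix assumed here for the
    operator on the conjugate space that corresponds to b."""
    return [[complex(v).conjugate() for v in row] for row in b]

def kron(a, b):
    """Kronecker product of two square matrices stored as nested lists."""
    na, nb = len(a), len(b)
    return [[a[i][j] * b[k][l] for j in range(na) for l in range(nb)]
            for i in range(na) for k in range(nb)]

def joint_probability(a, b):
    """<(a (x) conj(b)) Psi, Psi> for Psi = n^(-1/2) * sum_i e_i (x) e_i."""
    n = len(a)
    m = kron(a, conj_matrix(b))
    psi = [0.0] * (n * n)
    for i in range(n):
        psi[i * n + i] = 1 / sqrt(n)
    image = [sum(m[r][s] * psi[s] for s in range(n * n)) for r in range(n * n)]
    return sum(w * complex(p).conjugate() for w, p in zip(image, psi))

def trace_product(a, b):
    """Tr(ab)."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))

def projector(x):
    """Rank-one projection p_x(y) = <y, x> x onto the unit vector x."""
    return [[xi * complex(xj).conjugate() for xj in x] for xi in x]

# Two qubit effects (self-adjoint, eigenvalues between 0 and 1).
a = [[0.5, 0.25j], [-0.25j, 0.5]]
b = [[0.75, 0.25], [0.25, 0.25]]

lhs = joint_probability(a, b)
rhs = trace_product(a, b) / 2

# An orthonormal basis {x, y} that is not the computational basis.
x = [1 / sqrt(2), 1j / sqrt(2)]
y = [1 / sqrt(2), -1j / sqrt(2)]
same = joint_probability(projector(x), projector(x))       # equals 1/n = 0.5
different = joint_probability(projector(x), projector(y))  # equals 0
```

The check with the basis $\{x, y\}$ illustrates that the correlation does not depend on which orthonormal basis is used to write down $\Psi$.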

Filters and homogeneity. Next, let us see why the cone $L_{sa}(H)_+$ is homogeneous. Recall that this
means that any state in the interior of the cone (here, any non-singular density operator) can be
obtained from any other by an automorphism of the cone. However, in fact, something better is true:
this order-automorphism can be chosen to represent a probabilistically-reversible physical process,
i.e., an invertible CP mapping with a CP inverse.
To see how this works, suppose $W$ is a positive operator on $H$. Consider the pure CP mapping
$\Phi_W : L_{sa}(H) \to L_{sa}(H)$ given by:

$$\Phi_W(a) = W^{1/2} a W^{1/2}.$$

Then, $\Phi_W(1) = W$. If $W$ is nonsingular, so is $W^{1/2}$, so $\Phi_W$ is invertible, with inverse $\Phi_W^{-1} = \Phi_{W^{-1}}$,
again a pure CP mapping. Now, given another nonsingular density operator $M$, we can get from $W$ to
$M$ by applying $\Phi_M \circ \Phi_W^{-1}$.
All well and good, but we are still left with the following:

Question: What does the mapping ΦW represent, physically?

To answer this, suppose $W$ is a density operator, with spectral expansion $W = \sum_{x \in E} t_x p_x$. Here, $E$
is an orthonormal basis for $H$ diagonalizing $W$, and $t_x$ is the eigenvalue of $W$ corresponding to $x \in E$.
Then, for each vector $x \in E$,

$$\Phi_W(p_x) = t_x p_x,$$

where $p_x$ is the projection operator associated with $x$. We can understand this to mean that $\Phi_W$ acts as
a filter on the test $E$: the response of each outcome $x \in E$ is attenuated by a factor $0 \le t_x \le 1$ (my usage
here is slightly non-standard, in that I allow filters that “pass” the system with a probability strictly
between zero and one). Thus, if $M$ is another density operator on $H$, representing some state of the
corresponding system, then the probability of obtaining outcome $x$ after preparing the system in state
$M$ and applying the process $\Phi_W$ is $t_x$ times the probability of $x$ in state $M$. In detail: suppose $p_x$ is the
rank-one projection operator associated with $x$, and note that $W^{1/2} p_x = p_x W^{1/2} = t_x^{1/2} p_x$. Thus,

$$\mathrm{Tr}(\Phi_W(M) p_x) = \mathrm{Tr}(W^{1/2} M W^{1/2} p_x) = \mathrm{Tr}(W^{1/2} M t_x^{1/2} p_x) = \mathrm{Tr}(t_x^{1/2} p_x W^{1/2} M) = \mathrm{Tr}(t_x p_x M) = t_x \mathrm{Tr}(M p_x).$$

If we think of the basis E as representing a set of alternative channels plus detectors, as in the
figure below, we can add a classical filter attenuating the response of one of the detectors (say, x) by a
fraction t x . What the computation above tells us is that we can achieve the same result by applying a
suitable CP map to the system’s state. Moreover, this can be done independently for each outcome
of E. In Figure 1, this is illustrated for a three-level quantum system: E = { x, y, z} is an orthonormal
basis, representing three possible outcomes of a Stern–Gerlach-like experiment; the filter Φ acts on the
system’s state in such a way that the probability of outcome x is attenuated by a factor of t x = 1/2,
while outcomes $y$ and $z$ are unaffected. Returning to the general situation, if we apply a filter $\Phi_W$ to the
maximally-mixed state $\frac{1}{n} 1$, we obtain $\frac{1}{n} W$. Thus, we can prepare $W$, up to normalization, by applying
the filter $\Phi_W$ to the maximally mixed state.
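Both statements, the attenuation of each outcome by its eigenvalue and the preparation of $W$ from the maximally mixed state, can be checked with a small numerical sketch. The basis, the eigenvalues `t` and the state `M` below are illustrative choices, and $W^{1/2}$ is built directly from the assumed spectral decomposition rather than by a general matrix square root.

```python
from math import sqrt

def matmul(a, b):
    """Ordinary matrix product of square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def projector(x):
    """Rank-one projection onto the unit vector x."""
    return [[xi * complex(xj).conjugate() for xj in x] for xi in x]

def from_spectrum(basis, weights):
    """Return sum_x weights[x] * p_x for an orthonormal basis."""
    n = len(basis)
    out = [[0j] * n for _ in range(n)]
    for x, t in zip(basis, weights):
        p = projector(x)
        for i in range(n):
            for j in range(n):
                out[i][j] += t * p[i][j]
    return out

# Density operator W with eigenvalues t on an orthonormal basis E.
basis = [[1 / sqrt(2), 1 / sqrt(2)], [1 / sqrt(2), -1 / sqrt(2)]]
t = [0.25, 0.75]
W = from_spectrum(basis, t)
W_half = from_spectrum(basis, [sqrt(s) for s in t])

def filter_map(a):
    """Phi_W(a) = W^(1/2) a W^(1/2)."""
    return matmul(matmul(W_half, a), W_half)

# Another density operator M, filtered by Phi_W.
M = [[0.6, 0.2j], [-0.2j, 0.4]]
filtered = filter_map(M)

# Each outcome x in E has its probability scaled by the eigenvalue t_x.
attenuation_ok = all(
    abs(trace(matmul(filtered, projector(x)))
        - s * trace(matmul(M, projector(x)))) < 1e-12
    for x, s in zip(basis, t)
)

# Filtering the maximally mixed state yields W up to the factor 1/n.
prepared = filter_map([[0.5, 0], [0, 0.5]])
preparation_ok = all(abs(prepared[i][j] - W[i][j] / 2) < 1e-12
                     for i in range(2) for j in range(2))
```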

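[Figure 1: a system in state $\alpha$ passes through the filter $\Phi$ and then a three-outcome test; outcome $x$ occurs with
probability $\frac{1}{2}\alpha(x)$, outcome $y$ with probability $\alpha(y)$ and outcome $z$ with probability $\alpha(z)$.]

Figure 1. $\Phi$ attenuates $x$’s sensitivity by 1/2.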

Filters are symmetric. Here is a final observation, linking these last two: the filter $\Phi_W$ is symmetric
with respect to the uniformly-correlating “EPR” state $\Psi$, in the sense that:

$$\langle (\Phi_W(a) \otimes \bar{b})\Psi, \Psi \rangle = \langle (a \otimes \overline{\Phi_W(b)})\Psi, \Psi \rangle$$

for all effects $a, b \in L_{sa}(H)_+$. Remarkably, this is all that is needed to recover the Jordan structure of
finite-dimensional quantum theory: the existence of a conjugate system, with a uniformly-correlating
joint state, plus the possibility of preparing non-singular states by means of filters that are symmetric
with respect to this state, and doing so reversibly when the state is nonsingular.
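The symmetry property follows in one line from the trace formula for $\Psi$ and the cyclicity of the trace. The derivation below is a sketch; it assumes the convention that a barred operator acts on the conjugate space by $\bar{b}\,\bar{x} = \overline{bx}$, so that $\langle (a \otimes \bar{b})\Psi, \Psi \rangle = \frac{1}{n}\mathrm{Tr}(ab)$ for self-adjoint $a$ and $b$.

```latex
\begin{align*}
\langle (\Phi_W(a) \otimes \bar{b})\Psi, \Psi \rangle
  &= \tfrac{1}{n}\,\mathrm{Tr}\bigl(W^{1/2} a W^{1/2}\, b\bigr) \\
  &= \tfrac{1}{n}\,\mathrm{Tr}\bigl(a\, W^{1/2} b W^{1/2}\bigr) \\
  &= \langle (a \otimes \overline{\Phi_W(b)})\Psi, \Psi \rangle .
\end{align*}
```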
In a very rough outline, the argument is that states preparable (up to normalization) by
symmetric filters have spectral decompositions, and the existence of spectral decompositions makes
the uniformly-correlating joint state a self-dualizing inner product. However, to spell this out in a
precise way, I need a general mathematical framework for discussing states, effects and processes in
abstraction from quantum theory. The next section reviews the necessary apparatus.

3. General Probabilistic Theories

A characteristic feature of quantum mechanics is the existence of incompatible,
or non-comeasurable, observables. This suggests the following simple, but very fruitful, notion:

Definition 1. A test space is a collection $\mathcal{M}$ of non-empty sets $E, F, \ldots$, each representing the outcome-set of
some measurement, experiment, or test. At the outset, one makes no special assumptions about the combinatorial
structure of $\mathcal{M}$. In particular, distinct tests are permitted to overlap. Let $X := \bigcup \mathcal{M}$ denote the set of all
outcomes of all tests in $\mathcal{M}$: a probability weight on $\mathcal{M}$ is a function $\alpha : X \to [0, 1]$ such that $\sum_{x \in E} \alpha(x) = 1$
for every $E \in \mathcal{M}$.
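The definition translates directly into a membership check. The following is a minimal sketch in which a test space is a list of sets of outcome labels and a candidate weight is a dictionary; the labels, the example values and the function names are all hypothetical.

```python
def outcomes(tests):
    """The outcome set X: the union of all tests in the test space."""
    out = set()
    for test in tests:
        out |= set(test)
    return out

def is_probability_weight(tests, alpha, tol=1e-12):
    """True iff alpha takes values in [0, 1] and sums to 1 over every test."""
    in_range = all(0 <= alpha[x] <= 1 for x in outcomes(tests))
    normalized = all(abs(sum(alpha[x] for x in test) - 1) < tol
                     for test in tests)
    return in_range and normalized

# Two overlapping three-outcome tests that share the outcome "c".
tests = [{"a", "b", "c"}, {"c", "d", "e"}]

# alpha sums to 1 on both tests; beta fails on the second test.
alpha = {"a": 0.5, "b": 0.25, "c": 0.25, "d": 0.75, "e": 0.0}
beta = {"a": 0.5, "b": 0.5, "c": 0.0, "d": 0.5, "e": 0.25}
```

Because the two tests overlap in `"c"`, the value assigned to `"c"` constrains both tests at once, which is the sense in which overlapping tests are not independent.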

Test spaces were introduced and studied by D. J. Foulis and C. H. Randall in a long series of papers
beginning around 1970. The original term for a test was an operation, which has the advantage of
signaling that the concept has wider applicability than simply reading a number off a meter: anything
an agent can do that leads to a well-defined, exhaustive set of mutually-exclusive outcomes defines an
operation. Accordingly, test spaces were originally called “manuals of operations”.
It can happen that a test space admits no probability weights at all. However, to serve as a model
of a real family of experiments associated with an actual physical system, a test space should obviously
carry a lavish supply of such weights. One might want to single out some of these as describing
physically (or otherwise) possible states of the system. This suggests the following:

Definition 2. A probabilistic model is a pair $A = (\mathcal{M}, \Omega)$, where $\mathcal{M}$ is a test space and $\Omega$ is some designated
convex set of probability weights, called the states of the model.

The definition is deliberately spare. Nothing prohibits us from adding further structure (a group
of symmetries, say, or a topology on the space of outcomes). However, no such additional structure
is needed for the results I will discuss below. I will write $\mathcal{M}(A)$, $X(A)$ and $\Omega(A)$ for the test space,
associated outcome space and state space of a model $A$. The convexity assumption on $\Omega(A)$ is
intended to capture the possibility of forming mixtures of states. To allow the modest idealization of
taking outcome-wise limits of states to be states, I will also assume that $\Omega(A)$ is closed as a subset of
$[0, 1]^{X(A)}$ (in its product topology). This makes $\Omega(A)$ compact and, so, guarantees the existence of
pure states, that is, extreme points of $\Omega(A)$. If $\Omega(A)$ is the set of all probability weights on $\mathcal{M}(A)$, I
will say that $A$ has a full state space.

Two bits. Here is a simple, but instructive illustration of these notions. Consider a test space $\mathcal{M} =
\{\{x, x'\}, \{y, y'\}\}$. Here, we have two tests, each with two outcomes. We are permitted to perform either
test, but not both at once. A probability weight is determined by the values it assigns to $x$ and to $y$, and
since the sets $\{x, x'\}$ and $\{y, y'\}$ are disjoint, these values are independent. Thus, geometrically, the
space of all probability weights is the unit square in $\mathbb{R}^2$ (Figure 2a, below). To construct a probabilistic
model, we can choose any closed, convex subset of the square for $\Omega$. For instance, we might let $\Omega$ be
the convex hull of the four probability weights $\delta_x$, $\delta_{x'}$, $\delta_y$ and $\delta_{y'}$ corresponding to the midpoints of the
four sides of the square, as in Figure 2b, that is,

$$\delta_x(x) = 1, \quad \delta_x(x') = 0, \quad \delta_x(y) = \delta_x(y') = 1/2,$$

$$\delta_{x'}(x) = 0, \quad \delta_{x'}(x') = 1, \quad \delta_{x'}(y) = \delta_{x'}(y') = 1/2,$$

and similarly for $\delta_y$ and $\delta_{y'}$.

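[Figure 2: panels (a) and (b), each drawn on axes x and y running from 0 to 1; in panel (b) the vertices δ_x, δ_{x'}, δ_y and δ_{y'} sit at the midpoints of the sides of the square.]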

Figure 2. The state spaces of two bits. (a) The square bit; (b) The diamond bit.

Entropy 2018, 20, 227

The model of Figure 2a, in which we take $\Omega$ to be the entire set of probability weights on $\mathcal{M} = \{\{x, x'\}, \{y, y'\}\}$, is sometimes called the square bit. I will call the model of Figure 2b the diamond bit.
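The two models can be made concrete in a few lines of Python. This is only an illustrative sketch: the names `TESTS`, `weight`, `is_probability_weight` and `in_diamond` are hypothetical helpers introduced here, and membership in the diamond is tested through the inequality |p_x − 1/2| + |p_y − 1/2| ≤ 1/2, which describes the convex hull of the four midpoints.

```python
from fractions import Fraction as F

# Test space of the two-bit example: two disjoint two-outcome tests.
TESTS = [("x", "x'"), ("y", "y'")]


def is_probability_weight(w):
    """True if w is non-negative and sums to one over every test."""
    return all(v >= 0 for v in w.values()) and all(
        sum(w[o] for o in test) == 1 for test in TESTS
    )


def weight(px, py):
    """Weight determined by the probabilities it assigns to x and to y."""
    return {"x": px, "x'": 1 - px, "y": py, "y'": 1 - py}


half = F(1, 2)

# Vertices of the diamond bit: midpoints of the four sides of the unit square.
DIAMOND = {
    "x": weight(F(1), half),
    "x'": weight(F(0), half),
    "y": weight(half, F(1)),
    "y'": weight(half, F(0)),
}


def in_diamond(w):
    """Membership in the convex hull of the four diamond vertices."""
    return abs(w["x"] - half) + abs(w["y"] - half) <= half


# A corner of the square is a state of the square bit but not of the diamond bit.
corner = weight(F(1), F(1))
print(is_probability_weight(corner), in_diamond(corner))
print(all(is_probability_weight(d) and in_diamond(d) for d in DIAMOND.values()))
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the normalization checks are equalities rather than floating-point comparisons.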

Classical, quantum and Jordan models. If $E$ is a finite set, the corresponding classical model is $A(E) = (\{E\}, \Delta(E))$, where $\Delta(E)$ is the simplex of probability weights on $E$. If $\mathcal{H}$ is a finite-dimensional complex Hilbert space, let $\mathcal{M}(\mathcal{H})$ denote the set of orthonormal bases of $\mathcal{H}$: then $X = \bigcup \mathcal{M}(\mathcal{H})$ is the unit sphere of $\mathcal{H}$, and any density operator $W$ on $\mathcal{H}$ defines a probability weight $\alpha_W$, given by $\alpha_W(x) = \langle Wx, x \rangle$ for all $x \in X$. Letting $\Omega(\mathcal{H})$ denote the set of states of this form, we obtain the quantum model, $A(\mathcal{H}) = (\mathcal{M}(\mathcal{H}), \Omega(\mathcal{H}))$, associated with $\mathcal{H}$ (Gleason's theorem tells us that $A(\mathcal{H})$ has a full state space for $\dim(\mathcal{H}) > 2$, but we will not need this fact).
More generally, every Euclidean Jordan algebra $\mathbf{E}$ gives rise to a probabilistic model as follows. A minimal or primitive idempotent of $\mathbf{E}$ is an element $p \in \mathbf{E}$ with $p^2 = p$ and, for $q = q^2 < p$, $q = 0$. A Jordan frame is a maximal pairwise orthogonal set of primitive idempotents. Let $X(\mathbf{E})$ be the set of primitive idempotents; let $\mathcal{M}(\mathbf{E})$ be the set of Jordan frames; and let $\Omega(\mathbf{E})$ be the set of probability weights of the form $\alpha(p) = \langle a, p \rangle$, where $a \in \mathbf{E}_+$ with $\langle a, u \rangle = 1$. These data define the Jordan model $A(\mathbf{E})$ associated with $\mathbf{E}$. In the case where $\mathbf{E} = \mathcal{L}_h(\mathcal{H})$ for a finite-dimensional Hilbert space $\mathcal{H}$, this almost gives us back the quantum model $A(\mathcal{H})$: the difference is that we replace unit vectors by their associated projection operators, thus conflating outcomes that differ only by a phase.

Sharp models. Jordan models enjoy many special features that the generic probabilistic model lacks.
I want to take a moment to discuss one such feature, which will be important below.

Deﬁnition 3. A model A is unital iff, for every outcome x ∈ X ( A), there exists a state α ∈ Ω( A) with
α( x ) = 1, and sharp if this state is unique (from which it follows easily that it must be pure). If A is sharp, I
will write δx for the unique state making x ∈ X ( A) certain.

If A is sharp, then there is a sense in which each test E ∈ M(A) is maximally informative: if we are certain which outcome x ∈ E will occur, then we know the system's state exactly, as there is only one state in which x has probability 1.
Classical and quantum models are obviously sharp. More generally, every Jordan model is sharp.
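To see this, note first that every state $\alpha$ on a Euclidean Jordan algebra $\mathbf{E}$ has the form $\alpha(x) = \langle a, x \rangle$, where $a \in \mathbf{E}_+$ with $\langle a, u \rangle = 1$ and where $\langle\, ,\, \rangle$ is the given inner product on $\mathbf{E}$, normalized so that $\|x\| = 1$ for all primitive idempotents (equivalently, so that $\|u\|^2 = n$, the rank of $\mathbf{E}$). The spectral theorem for EJAs [13] shows that $a = \sum_{p \in E} t_p\, p$, where $E$ is a Jordan frame and the coefficients $t_p$ are non-negative and sum to one (since $\langle a, u \rangle = 1$). If $\langle a, x \rangle = 1$, then $\sum_{p \in E} t_p \langle p, x \rangle = 1$; since each $\langle p, x \rangle \leq \|p\|\|x\| = 1$, this implies that, for every $p \in E$ with $t_p > 0$, $\langle p, x \rangle = 1$. However, $\|p\| = \|x\| = 1$, so this implies that $\langle p, x \rangle = \|p\|\|x\|$, which in turn implies that $p = x$.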
In general, a probabilistic model need not even be unital, much less sharp. On the other hand,
given a unital model A, it is often possible to construct a sharp model by suitably restricting the state
space. This is illustrated in Figure 2b above: the full state space of the square bit is unital, but far
from sharp; however, by restricting the state space to the convex hull of the barycenters of the faces,
we obtain a sharp model. This is possible whenever A is unital and carries a group of symmetries
acting transitively on the outcome-set X ( A). For details, see Appendix A. The point here is that
sharpness is not, by itself, a very stringent condition: since we should expect to find highly symmetric,
unital models represented abundantly “in nature”, we can also expect to encounter an abundance of
systems represented by sharp models.


The spaces V( A), V∗ ( A). Any probabilistic model gives rise to a pair of ordered vector spaces in a
canonical way. These will be essential in the development below, so I am going to go into a bit of
detail here.

Definition 4. Let A be any probabilistic model. Let $\mathbf{V}(A)$ be the span of the state space $\Omega(A)$ in $\mathbb{R}^{X(A)}$, ordered by the cone $\mathbf{V}(A)_+$ consisting of non-negative multiples of states, i.e.,

\[ \mathbf{V}(A)_+ = \{ t\alpha \mid \alpha \in \Omega(A),\ t \geq 0 \}. \]

Call the model A finite-dimensional iff V( A) is finite-dimensional. From now on, I assume that
all models are finite-dimensional.
Let $\mathbf{V}^*(A)$ denote the dual space of $\mathbf{V}(A)$, ordered by the dual cone of positive linear functionals, i.e., functionals $f$ with $f(\alpha) \geq 0$ for all $\alpha \in \mathbf{V}(A)_+$. Any measurement-outcome $x \in X(A)$ yields an evaluation functional $\hat{x} \in \mathbf{V}^*(A)$, given by $\hat{x}(\alpha) = \alpha(x)$ for all $\alpha \in \mathbf{V}(A)$. More generally, an effect is a positive linear functional $f \in \mathbf{V}^*(A)$ with $0 \leq f(\alpha) \leq 1$ for every state $\alpha \in \Omega(A)$. The functionals $\hat{x}$ are effects. One can understand an arbitrary effect $a$ to represent a mathematically possible measurement outcome, having probability $a(\alpha)$ in state $\alpha$. I stress the adjective mathematically because, a priori, there is no guarantee that every effect will correspond to a physically-realizable measurement outcome. In fact, at this stage, I make no assumption at all about what, apart from the tests $E \in \mathcal{M}(A)$, is or is not physically realizable. (Later, it will follow from further assumptions that every element of $\mathbf{V}^*(A)$ represents a random variable associated with some $E \in \mathcal{M}(A)$ and is, therefore, operationally meaningful. However, this will be a theorem, not an assumption.)
The unit effect is the functional $u_A := \sum_{x \in E} \hat{x}$, where $E$ is any element of $\mathcal{M}(A)$. This takes the constant value of one on $\Omega(A)$ and thus represents a trivial measurement outcome that occurs with probability one in every state. This is an order unit for $\mathbf{V}^*(A)$ (to see this, let $a \in \mathbf{V}^*(A)$, and let $N$ be the maximum value of $|a(\alpha)|$ for $\alpha \in \Omega(A)$, remembering that the latter is compact: then $a \leq Nu$).
For both classical and quantum models, the ordered vector spaces V∗ ( A) and V( A) are naturally
isomorphic. If A( E) is the classical model associated with a finite set E, both are isomorphic to the
space RE of all real-valued functions on E, ordered pointwise. If A = A(H) is the quantum model
associated with a finite-dimensional Hilbert space H, V( A) and V∗ ( A) are both naturally isomorphic
to the space Lh (H) of Hermitian operators on H, ordered by its usual cone of positive semi-definite
operators. More generally, if $\mathbf{E}$ is a Euclidean Jordan algebra and $A = A(\mathbf{E})$ is the corresponding Jordan model, then $\mathbf{V}(A) \cong \mathbf{E} \cong \mathbf{V}^*(A)$, with $\mathbf{E}$ ordered as usual, i.e., by its cone of squares. The first of these isomorphisms is due to the definition of the model $A(\mathbf{E})$ and the second to $\mathbf{E}$'s self-duality.

The space E(A). It is going to be technically useful to introduce a third ordered vector space, which I will denote by $\mathbf{E}(A)$. This is the span of the evaluation-effects $\hat{x}$, associated with measurement outcomes $x \in X(A)$, in $\mathbf{V}^*(A)$, ordered by the cone:

\[ \mathbf{E}(A)_+ := \Big\{ \sum_i t_i \hat{x}_i \;\Big|\; t_i \geq 0 \Big\}. \]

That is, $\mathbf{E}(A)_+$ is the set of linear combinations of effects $\hat{x}$ having non-negative coefficients. It is important to note that this is, in general, a proper sub-cone of $\mathbf{V}^*(A)_+$. To see this, we can revisit the example of the "diamond bit" of Figure 2b. Letting $x$ and $y$ be the outcomes corresponding to the right face and the top face of the larger (full) state space pictured below in Figure 3a, consider the functional $f := \hat{x} + \hat{y} - \tfrac{1}{2}u$. This takes non-negative values on the smaller state space of the diamond bit, but is negative on, for example, the state $\gamma$ corresponding to the lower-left corner of the full state space (see Figure 3b). Thus, $f \in \mathbf{V}^*(A)_+$, but $f \notin \mathbf{E}(A)_+$.
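The diamond-bit counterexample can be checked numerically. The sketch below is an illustration under stated assumptions: a probability weight is stored as a dictionary on the outcomes x, x', y, y', and the helper names `weight` and `f` are hypothetical. Since f is affine on states, non-negativity at the four vertices of the diamond gives non-negativity on the whole diamond, whereas any non-negative combination of evaluation functionals would have to be non-negative at the corner γ as well.

```python
from fractions import Fraction as F


def weight(px, py):
    """Weight determined by the probabilities it assigns to x and to y."""
    return {"x": px, "x'": 1 - px, "y": py, "y'": 1 - py}


def f(w):
    """The functional f = x^ + y^ - (1/2) u, with u(w) = w(x) + w(x')."""
    u = w["x"] + w["x'"]
    return w["x"] + w["y"] - F(1, 2) * u


half = F(1, 2)
diamond_vertices = [
    weight(F(1), half),  # delta_x
    weight(F(0), half),  # delta_x'
    weight(half, F(1)),  # delta_y
    weight(half, F(0)),  # delta_y'
]
gamma = weight(F(0), F(0))  # lower-left corner of the full (square) state space

values_on_diamond = [f(w) for w in diamond_vertices]
print(min(values_on_diamond) >= 0)  # f is non-negative on the diamond bit
print(f(gamma) < 0)                 # but negative at the corner gamma
```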


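[Figure 3: panel (a) marks the lines $\hat{x} = 1$ and $\hat{y} = 1$ on the square; panel (b) marks the lines $f = 1$ and $f = 0$.]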

Figure 3. (a) Two outcome-effects for the square bit; (b) An effect for the diamond bit not positive on
the square bit.

Since we are working in finite dimensions, the outcome-effects $\hat{x}$ span $\mathbf{V}^*(A)$. Thus, as vector spaces, $\mathbf{E}(A)$ and $\mathbf{V}^*(A)$ are the same. However, as the diamond bit illustrates, they can have quite different positive cones and, thus, need not be isomorphic as ordered vector spaces.

Processes and subnormalized states. A subnormalized state of a model A is an element α of V(A)+ with u(α) < 1. These can be understood as states that allow a nonzero probability 1 − u(α) of some generic "failure" event (e.g., the destruction of the system), represented by the zero functional in V∗(A).
More generally, we may wish to regard two systems, represented by models A and B, as the input
to and output from some process, whether dynamical or purely information-theoretic, that has some
probability to destroy the system or otherwise “fail”. Since such a process should preserve probabilistic
mixtures, it should be represented mathematically by an afﬁne mapping T : Ω( A) → V( B)+ , taking
each normalized state α of A to a possibly sub-normalized state T (α) of B. One can show that such a
mapping extends uniquely to a positive linear mapping:

T : V ( A ) → V ( B ),

so from now on, this is how I represent processes.

Even if a process T has a nonzero probability of failure, it may be possible to reverse its effect
with nonzero probability.

Deﬁnition 5. A process T : A → B is probabilistically reversible iff there exists a process S such that, for all
α ∈ Ω( A), (S ◦ T )(α) = pα, where p ∈ (0, 1].

This means that there is a probability 1 − p of the composite process S ◦ T failing, but a
probability p that it will leave the system in its initial state (note that, since S ◦ T is linear, p must
be constant); where T preserves normalization, so that T (Ω( A)) ⊆ Ω( B), S can also be taken to be
normalization-preserving and will undo the result of T with probability one. This is the more usual
meaning of “reversible” in the literature.
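A minimal classical illustration of probabilistic reversibility, assuming a two-outcome classical model with outcomes labelled "a" and "b" (labels and helper names are hypothetical, introduced only for this sketch). The process T attenuates each outcome by a coefficient in (0, 1]; a second such process S restores the original proportions, so that S ∘ T = p·id with p the smallest coefficient of T.

```python
from fractions import Fraction as F


def scaling_process(coeffs):
    """Positive linear map on R^E multiplying the weight of outcome x by coeffs[x]."""
    def apply(alpha):
        return {x: coeffs[x] * alpha[x] for x in alpha}
    return apply


def compose(S, T):
    """The composite process S after T."""
    return lambda alpha: S(T(alpha))


# T attenuates outcome "a" by 1/2 and outcome "b" by 1/4.
t = {"a": F(1, 2), "b": F(1, 4)}
T = scaling_process(t)

# Choose S so that S o T = p * identity, with every coefficient of S in (0, 1].
p = min(t.values())
s_coeffs = {x: p / t[x] for x in t}
S = scaling_process(s_coeffs)

alpha = {"a": F(1, 3), "b": F(2, 3)}  # an arbitrary normalized state
result = compose(S, T)(alpha)
print(result == {x: p * alpha[x] for x in alpha})
```

Taking p to be the smallest coefficient keeps every coefficient of S at most one, so S never increases total probability and is itself a legitimate (subnormalizing) process.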
Given a process T : V( A) → V( B), there is a dual mapping T ∗ : V∗ ( B) → V∗ ( A), also positive,
given by T ∗ (b)(α) = b( T (α)) for all b ∈ V∗ ( B) and α ∈ V( A). The assumption that T takes normalized
states to subnormalized states is equivalent to the requirement that T ∗ (u B ) ≤ u A , that is that T ∗ maps
effects to effects.

Remark 1. Since we are attaching no special physical interpretation to the cone E+ ( A), we do not require a
physical process T : V( A) → V( B) to have a dual process T ∗ that maps E+ ( B) to E+ ( A). That is, we do not
require T ∗ to be positive as a mapping E( B) → E( A).


Joint probabilities and joint states. If M1 and M2 are two test spaces, with outcome-spaces X1 and
X2 , we can construct a space of product tests (note here the savage abuse of notation: M1 × M2 is
not the Cartesian product of M1 and M2 ):

M1 × M2 = { E × F | E ∈ M1 , F ∈ M2 }

This models a situation in which tests from M1 and from M2 can be performed separately,
and the results collated. Note that the outcome-space for M1 × M2 is X1 × X2 . A joint probability
weight on M1 and M2 is just a probability weight on M1 × M2 , that is a function ω : X1 × X2 →
[0, 1] such that ∑( x,y)∈E× F ω ( x, y) = 1 for all tests E ∈ M1 and F ∈ M2 . One says that ω is
non-signaling iff the marginal (or reduced) probability weights ω1 and ω2 , given by:

\[ \omega_1(x) = \sum_{y \in F} \omega(x, y) \quad \text{and} \quad \omega_2(y) = \sum_{x \in E} \omega(x, y) \]

are well-defined, i.e., independent of the choice of the tests E and F, respectively. One can understand
this to mean that the choice of which test to measure on M1 has no observable, i.e., no statistical,
influence on the outcome of tests made of M2 , and vice versa. In this case, one also has well-defined
conditional probability weights:

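\[ \omega_{2|x}(y) := \omega(x, y)/\omega_1(x) \quad \text{and} \quad \omega_{1|y}(x) := \omega(x, y)/\omega_2(y) \]

(with, say, $\omega_{2|x} = 0$ if $\omega_1(x) = 0$, and similarly for $\omega_{1|y}$). This gives us the following bipartite version of the law of total probability [23]: for any choice of $E \in \mathcal{M}_1$ or $F \in \mathcal{M}_2$,

\[ \omega_2 = \sum_{x \in E} \omega_1(x)\, \omega_{2|x} \quad \text{and} \quad \omega_1 = \sum_{y \in F} \omega_2(y)\, \omega_{1|y}. \tag{1} \]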

Definition 6. A joint state on a pair of probabilistic models A and B is a non-signaling joint probability weight $\omega$ on $\mathcal{M}(A) \times \mathcal{M}(B)$ such that, for every $x \in X(A)$ and every $y \in X(B)$, the conditional probability weights $\omega_{2|x}$ and $\omega_{1|y}$ belong to $\Omega(B)$ and $\Omega(A)$, respectively. It follows from (1) that the marginal weights $\omega_1$ and $\omega_2$ are also states of A and B, respectively.

This naturally suggests that one should define, for models A and B, a composite model AB,
the states of which would be precisely the joint states on A and B. If one takes M( AB) = M( A) ×
M( B), this is essentially the “maximal tensor product” of A and B [24]. However, this does not
coincide with the usual composite of quantum-mechanical systems. In Section 6, I will discuss
composite systems in more detail. Meanwhile, for the main results of this paper, the idea of a joint
state is sufficient.
For a simple example of a joint state that is neither classical nor quantum, let B denote the "square bit" model discussed above. That is, $B = (\mathcal{B}, \Omega)$, where $\mathcal{B} = \{\{x, x'\}, \{y, y'\}\}$ is a test space with two non-overlapping, two-outcome tests, and $\Omega$ is the set of all probability weights thereon, amounting to the unit square in $\mathbb{R}^2$. The joint state on $B \times B$ given by Table 1 (a variant of the "non-signaling box" of Popescu and Rohrlich [25]) is clearly non-signaling. Notice that it also establishes a perfect, uniform correlation between the outcomes of any test on the first system and its counterpart on the second.

Table 1. A joint state for two square bits.

      x     x'    y     y'
x     1/2   0     1/2   0
x'    0     1/2   0     1/2
y     0     1/2   1/2   0
y'    1/2   0     0     1/2
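The claims made about Table 1 can be verified mechanically. The following sketch assumes that rows index outcomes of the first system and columns those of the second; the function names are hypothetical. It checks normalization on every product test, the non-signaling condition (each marginal is independent of the test summed over on the other side), and the perfect correlation between each test and its counterpart.

```python
from fractions import Fraction as F
from itertools import product

OUTCOMES = ["x", "x'", "y", "y'"]
TESTS = [("x", "x'"), ("y", "y'")]

h = F(1, 2)
# Entries of Table 1; rows are taken to index the first system (an assumption).
ROWS = [
    [h, 0, h, 0],
    [0, h, 0, h],
    [0, h, h, 0],
    [h, 0, 0, h],
]
omega = {
    (a, b): F(ROWS[i][j])
    for i, a in enumerate(OUTCOMES)
    for j, b in enumerate(OUTCOMES)
}


def is_normalized():
    """Every product test E x G carries total probability one."""
    return all(
        sum(omega[a, b] for a in E for b in G) == 1
        for E, G in product(TESTS, TESTS)
    )


def marginal(outcome, system):
    """Marginal probability of `outcome` on `system` (1 or 2).

    Returns None if the value depends on which test is summed over on the
    other system, i.e. if the weight is signaling.
    """
    def joint(other):
        return omega[outcome, other] if system == 1 else omega[other, outcome]

    values = {sum(joint(o) for o in test) for test in TESTS}
    return values.pop() if len(values) == 1 else None


def is_non_signaling():
    return all(marginal(o, s) is not None for o in OUTCOMES for s in (1, 2))


def is_perfectly_correlated():
    """For the same test on both sides, mismatched outcomes never co-occur."""
    return all(omega[a, b] == 0 for E in TESTS for a in E for b in E if a != b)


print(is_normalized(), is_non_signaling(), is_perfectly_correlated())
```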

Conditioning maps. If $\omega$ is a joint state on A and B, define the associated conditioning maps $\hat{\omega} : X(A) \to \mathbf{V}(B)$ and $\hat{\omega}^* : X(B) \to \mathbf{V}(A)$ by:

\[ \hat{\omega}(x)(y) = \omega(x, y) = \hat{\omega}^*(y)(x) \]

for all $x \in X(A)$ and $y \in X(B)$. Note that $\hat{\omega}(x) = \omega_1(x)\,\omega_{2|x}$ for every $x \in X(A)$, i.e., $\hat{\omega}(x)$ can be understood as the un-normalized conditional state of B given the outcome $x$ on A. Similarly, $\hat{\omega}^*(y)$ is the un-normalized conditional state of A given outcome $y$ on B.
The conditioning map $\hat{\omega}$ extends uniquely to a positive linear mapping $\mathbf{E}(A) \to \mathbf{V}(B)$, which I also denote by $\hat{\omega}$, such that $\hat{\omega}(\hat{x}) = \hat{\omega}(x)$ for all outcomes $x \in X(A)$. To see this, consider the linear mapping $T : \mathbf{V}^*(A) \to \mathbb{R}^{X(B)}$ defined, for $f \in \mathbf{V}^*(A)$, by $T(f)(y) = f(\hat{\omega}^*(y))$ for all $y \in X(B)$. If $f = \hat{x}$, we have $T(\hat{x}) = \omega_1(x)\,\omega_{2|x} \in \mathbf{V}(B)_+$, whence, for all $y \in X(B)$, $T(\hat{x})(y) = \omega(x, y) = \hat{\omega}(x)(y)$. Since the evaluation functionals $\hat{x}$ span $\mathbf{E}(A)$, the range of $T$ lies in $\mathbf{V}(B)$, and moreover, $T$ is positive on the cone $\mathbf{E}(A)_+$. Hence, as advertised, $T$ defines a positive linear mapping $\mathbf{E}(A) \to \mathbf{V}(B)$, extending $\hat{\omega}$. In the same way, $\hat{\omega}^*$ defines a positive linear mapping $\hat{\omega}^* : \mathbf{E}(B) \to \mathbf{V}(A)$.
An immediate and important corollary is that any joint state $\omega$ on A and B defines a bilinear form, which by abuse of notation I also call $\omega$, on $\mathbf{E}(A) \times \mathbf{E}(B)$, given by $\omega(a, b) := \hat{\omega}(a)(b)$ for all $a \in \mathbf{E}(A)$ and $b \in \mathbf{E}(B)$. Note that $\omega(\hat{x}, \hat{y}) = \omega(x, y)$ for all $x \in X(A)$, $y \in X(B)$ and also that the bilinear form $\omega$ is positive, in the sense that $\omega(a, b) \geq 0$ for all $a \in \mathbf{E}(A)_+$ and all $b \in \mathbf{E}(B)_+$.

4. Conjugates and Filters

We are now in a position to abstract the two features of QM discussed earlier. Call a test space
( X, M) uniform iff all tests E ∈ M have the same size, which we then call the rank of the test space.
The test spaces associated with quantum models are uniform, and it is quite easy to generate many
other examples (see Appendix A).
A uniform test space of rank n always admits at least one probability weight, namely the
maximally-mixed probability weight ρ( x ) = 1/n for all x ∈ X. I will say that a probabilistic model A
is uniform if the test space M( A) is uniform and the maximally-mixed state ρ belongs to Ω( A).
By an isomorphism $\gamma : A \to B$ from a probabilistic model A to a probabilistic model B, I mean the obvious thing: a bijection $\gamma : X(A) \to X(B)$ taking $\mathcal{M}(A)$ onto $\mathcal{M}(B)$, and such that $\beta \mapsto \beta \circ \gamma$ maps $\Omega(B)$ onto $\Omega(A)$.

Definition 7. Let A be a uniform probabilistic model with tests of size $n$. A conjugate for A is a model $\bar{A}$, plus a chosen isomorphism $\gamma_A : A \cong \bar{A}$ and a joint state $\eta_A$ on A and $\bar{A}$ such that for all $x, y \in X(A)$,

(a) $\eta_A(x, \bar{x}) = 1/n$
(b) $\eta_A(x, \bar{y}) = \eta_A(y, \bar{x})$

where $\bar{x} := \gamma_A(x)$.

This corresponds to what is called a "weak conjugate" in [17]. Note that if $E \in \mathcal{M}(A)$, we have $\sum_{(x,y) \in E \times E} \eta_A(x, \bar{y}) = 1$ and $|E| = n$. Hence, $\eta_A(x, \bar{y}) = 0$ for $x, y \in E$ with $x \neq y$. Thus, $\eta_A$ establishes a perfect, uniform correlation between any test $E \in \mathcal{M}(A)$ and its counterpart, $\bar{E} := \{\bar{x} \mid x \in E\}$, in $\mathcal{M}(\bar{A})$.
The symmetry condition (b) is pretty harmless. If $\eta$ is a joint state on A and $\bar{A}$ satisfying (a), then so is $\eta^t(x, \bar{y}) := \eta(y, \bar{x})$; thus, $\tfrac{1}{2}(\eta + \eta^t)$ satisfies both (a) and (b). In fact, if A is sharp, (b) is automatic: if $\eta$ satisfies (a), then the conditional state $(\eta_A)_{1|\bar{x}}$ assigns probability one to the outcome $x$. If A is sharp, this implies that $\eta_{1|\bar{x}} = \delta_x$ is uniquely defined, whence $\eta(x, \bar{y}) = \tfrac{1}{n}\delta_y(x)$ is also uniquely defined. In other words, for a sharp model A and a given isomorphism $\gamma : A \cong \bar{A}$, there exists at most one joint state $\eta$ satisfying (a); whence, in particular, $\eta = \eta^t$.


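If $A = A(\mathcal{H})$ is the quantum-mechanical model associated with an $n$-dimensional Hilbert space $\mathcal{H}$, then we can take $\bar{A} = A(\bar{\mathcal{H}})$ and define $\eta_A(x, \bar{y}) = |\langle \Psi, x \otimes \bar{y} \rangle|^2$, where $\Psi$ is the EPR state on $\mathcal{H} \otimes \bar{\mathcal{H}}$, as discussed in Section 3.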
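A small numerical sketch of the quantum conjugate, under assumptions that are mine rather than the paper's: the conjugate space is modelled by entrywise complex conjugation in a fixed orthonormal basis, the EPR state is taken to be Ψ = n^{-1/2} Σ_i e_i ⊗ e_i in that basis, and the inner product is conjugate-linear in its first argument. With these conventions η(x, ȳ) = |⟨y, x⟩|²/n, so conditions (a) and (b) of Definition 7 can be checked on random unit vectors.

```python
import math
import random


def inner(u, v):
    """Inner product <u, v>, conjugate-linear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def kron(u, v):
    """Coordinates of u (x) v, with index i * len(v) + j."""
    return [a * b for a in u for b in v]


def conj(v):
    """Entrywise complex conjugate, standing in for the vector y-bar."""
    return [a.conjugate() for a in v]


def random_unit_vector(n, rng):
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    norm = math.sqrt(sum(abs(a) ** 2 for a in v))
    return [a / norm for a in v]


def epr_state(n):
    """Psi = n^{-1/2} * sum_i e_i (x) e_i in the computational basis."""
    psi = [0j] * (n * n)
    for i in range(n):
        psi[i * n + i] = complex(1 / math.sqrt(n))
    return psi


def eta(x, y):
    """eta(x, y-bar) = |<Psi, x (x) y-bar>|^2."""
    n = len(x)
    return abs(inner(epr_state(n), kron(x, conj(y)))) ** 2


rng = random.Random(0)
n = 3
x = random_unit_vector(n, rng)
y = random_unit_vector(n, rng)

print(math.isclose(eta(x, x), 1 / n))      # condition (a)
print(math.isclose(eta(x, y), eta(y, x)))  # condition (b)
```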
So much for conjugates. We generalize the ﬁlters associated with pure CP mappings as follows:

Definition 8. A filter associated with a test $E \in \mathcal{M}(A)$ is a positive linear mapping $\Phi : \mathbf{V}(A) \to \mathbf{V}(A)$ such that for every outcome $x \in E$, there is some coefficient $t_x \in [0, 1]$ with $\Phi(\alpha)(x) = t_x \alpha(x)$ for every state $\alpha \in \Omega(A)$.

Equivalently, $\Phi$ is a filter iff the dual process $\Phi^* : \mathbf{V}^*(A) \to \mathbf{V}^*(A)$ satisfies $\Phi^*(\hat{x}) = t_x \hat{x}$ for each $x \in E$. Just as in the quantum-mechanical case, a filter independently attenuates the "sensitivity" of the outcomes $x \in E$. (The extreme case is one in which the coefficient $t_x$ corresponding to a particular outcome is one, and the other coefficients are all zero. In that case, all outcomes other than $x$ are, so to say, blocked by the filter. Conversely, given such an "all or nothing" filter $\Phi_x$ for each $x \in E$, we can construct an arbitrary filter with coefficients $t_x$ by setting $\Phi = \sum_{x \in E} t_x \Phi_x$.)
Call a filter Φ reversible iff Φ is an order-automorphism of V( A); that is, iff it is probabilistically
reversible as a process. Evidently, this requires that all the coefficients t x be nonzero. We will eventually
see that the existence of a conjugate, plus the preparability of arbitrary nonsingular states by symmetric
reversible filters, will be enough to force A to be a Jordan model. Most of the work is done by the easy
Lemma 1, below. First, some terminology.

Deﬁnition 9. Suppose Δ = {δx | x ∈ X ( A)} is a family of states indexed by outcomes x ∈ X ( A) and such
that δx ( x ) = 1. Say that a state α is spectral with respect to Δ iff there exists a test E ∈ M( A) such that
α = ∑ x∈ E α( x )δx . Say that the model A itself is spectral with respect to Δ if every state of A is spectral with
respect to Δ.

If A has a conjugate $\bar{A}$, then the bijection $\gamma_A : X(A) \to X(\bar{A})$ extends to an order-isomorphism $\mathbf{E}(A) \cong \mathbf{E}(\bar{A})$. It follows that every non-signaling joint probability weight $\omega$ on A and $\bar{A}$ defines a bilinear form $a, b \mapsto \omega(a, \bar{b})$ on $\mathbf{E}(A)$.
The following is essentially proven in [17], but the presentation here is somewhat different.

Lemma 1. Let A have a conjugate $(\bar{A}, \eta_A)$. Suppose A is spectral with respect to the states $\delta_x := \eta_{1|\bar{x}}$, $x \in X(A)$. Then:
\[ \langle a, b \rangle := n\,\eta_A(a, \bar{b}), \]
where $n$ is the rank of A, defines a self-dualizing inner product on $\mathbf{E}(A)$, with respect to which $\mathbf{V}(A)_+ \cong \mathbf{E}(A)_+$. Moreover, A is sharp, and $\mathbf{E}(A)_+ = \mathbf{V}^*(A)_+$.

Proof. That $\langle\, ,\, \rangle$ is symmetric and bilinear follows from $\eta_A$'s being symmetric and non-signaling. Note that $\langle \hat{x}, \hat{x} \rangle = 1$ for every $x \in X(A)$ and $\langle \hat{x}, \hat{y} \rangle = 0$ for any distinct $x, y \in X(A)$ lying in a common test. We need to show that $\langle\, ,\, \rangle$ is positive-definite. Since $\bar{A} \cong A$ and the latter is spectral, so is the former. It follows that $\hat{\eta}$ takes $\mathbf{E}(A)_+$ onto $\mathbf{V}(A)_+$ and, hence, is an order-isomorphism. From this, it follows that every $a \in \mathbf{E}(A)_+$ has a "spectral" decomposition of the form $\sum_{x \in E} t_x \hat{x}$ for some coefficients $t_x \geq 0$ and some test $E \in \mathcal{M}(A)$. In fact, any $a \in \mathbf{E}(A)$, positive or otherwise, has such a decomposition (albeit with possibly negative coefficients). If $a \in \mathbf{E}(A)$ is arbitrary, with $a = a_1 - a_2$ for some $a_1, a_2 \in \mathbf{E}(A)_+$, we can find $N \geq 0$ with $a_2 \leq Nu$. Thus, $b := a + Nu = a_1 + (Nu - a_2) \geq 0$, and so, $b = \sum_{x \in E} t_x \hat{x}$ for some $E \in \mathcal{M}(A)$, and hence, $a = b - Nu = \sum_{x \in E} t_x \hat{x} - N\big(\sum_{x \in E} \hat{x}\big) = \sum_{x \in E} (t_x - N)\hat{x}$.
Now, let $a \in \mathbf{E}(A)$. Decomposing $a = \sum_{x \in E} t_x \hat{x}$ for some test $E$ and some coefficients $t_x$, we have:

\[ \langle a, a \rangle = \sum_{(x, y) \in E \times E} t_x t_y \langle \hat{x}, \hat{y} \rangle = \sum_{x \in E} t_x^2 \geq 0. \]


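This is zero only where all coefficients $t_x$ are zero, i.e., only for $a = 0$. Therefore, $\langle\, ,\, \rangle$ is an inner product, as claimed.
We need to show that $\langle\, ,\, \rangle$ is self-dualizing. Clearly $\langle a, b \rangle = n\,\eta_A(a, \bar{b}) \geq 0$ for all $a, b \in \mathbf{E}(A)_+$. Suppose $a \in \mathbf{E}(A)$ is such that $\langle a, b \rangle \geq 0$ for all $b \in \mathbf{E}(A)_+$. Then, $\langle a, \hat{y} \rangle \geq 0$ for all $y \in X$. Now, $a = \sum_{x \in E} t_x \hat{x}$ for some test $E$; thus, for all $y \in E$, we have $\langle a, \hat{y} \rangle = t_y \geq 0$, whence, $a \in \mathbf{E}(A)_+$.
Next, we want to show that $\mathbf{E}(A)_+ = \mathbf{V}^*(A)_+$. Since $\hat{\eta} : \mathbf{E}(A) \to \mathbf{V}(A)$ is an order-isomorphism, for every $\alpha \in \mathbf{V}(A)$, there exists a unique $a \in \mathbf{E}(A)$ with $\hat{\eta}(a) = \tfrac{1}{n}\bar{\alpha}$. In particular,

\[ \langle a, \hat{x} \rangle = n\,\eta_A(a, \bar{x}) = \bar{\alpha}(\bar{x}) = \alpha(x). \]

It follows that if $b \in \mathbf{E}(A) = \mathbf{V}^*(A)$,

\[ b(\alpha) = \bar{b}(\bar{\alpha}) = \bar{b}\big(n\,\hat{\eta}_A(a)\big) = n\,\eta(a, \bar{b}) = \langle a, b \rangle. \]

Since every $a \in \mathbf{E}(A)_+$ has the form $a = \hat{\eta}^{-1}(\tfrac{1}{n}\bar{\alpha})$ for some $\alpha \in \mathbf{V}(A)_+$, if $b \in \mathbf{V}^*(A)_+$, we have $\langle a, b \rangle \geq 0$ for all $a \in \mathbf{E}(A)_+$, whence, by the self-duality of the latter cone, $b \in \mathbf{E}(A)_+$. Thus, $\mathbf{V}^*(A)_+ = \mathbf{E}(A)_+$.
Finally, let us see that A is sharp. If $\alpha \in \Omega(A)$, let $a$ be the unique element of $\mathbf{E}(A)_+$ with $\langle a, \hat{x} \rangle = \alpha(x)$. In particular, $\langle a, u \rangle = 1$. If $a$ has spectral decomposition $a = \sum_{x \in E} t_x \hat{x}$, where $E \in \mathcal{M}(A)$, then for all $x \in E$, $\langle a, \hat{x} \rangle = t_x$; hence, $\sum_{x \in E} t_x = \sum_{x \in E} \langle a, \hat{x} \rangle = \langle a, u \rangle = 1$. Thus, $\|a\|^2 = \sum_{x \in E} t_x^2 \leq 1$, whence, $\|a\| \leq 1$. Now, suppose $\alpha(x) = 1$ for some $x \in X(A)$: then, $1 = \langle a, \hat{x} \rangle \leq \|a\|\|\hat{x}\|$; as $\|\hat{x}\| = 1$, we have $\|a\| = 1$. However, now $\langle a, \hat{x} \rangle = \|a\|\|\hat{x}\|$, whence, $a = \hat{x}$. Hence, there is only one weight $\alpha$ with $\alpha(x) = 1$, namely, $\alpha = \langle \hat{x}, \cdot\, \rangle$, so A is sharp.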

If A is sharp, then we say that A is spectral iff it is spectral with respect to the pure states $\delta_x$ defined by $\delta_x(x) = 1$. If A is sharp and has a conjugate $\bar{A}$, then, as noted earlier, the state $\eta_{1|\bar{x}}$ is exactly $\delta_x$, so the spectrality assumption in Lemma 1 is fulfilled if we simply say that A is spectral. Hence, a sharp, spectral model with a conjugate is self-dual.
For the simplest systems, this is already enough to secure the desired representation in terms of a
Euclidean Jordan algebra.

Deﬁnition 10. Call A a bit iff it has rank two (that is, all tests have two outcomes) and if every state α ∈ Ω( A)
can be expressed as a mixture of two sharply distinguishable states; that is, α = tδx + (1 − t)δy for some
t ∈ [0, 1] and states δx and δy with δx ( x ) = 1 and δy (y) = 1 for some test { x, y}.

Corollary 1. If A is a sharp bit, then Ω( A) is a ball of some ﬁnite dimension d.

The proof is given in Appendix C. If d is 2, 3 or 5, we have a real, complex or quaternionic bit. For d = 4 or d ≥ 6, we have a non-quantum spin factor.
For systems of higher rank (higher “information capacity”), we need to assume a bit more.
Suppose A satisﬁes the hypotheses of Lemma 1. Appealing to the Koecher–Vinberg theorem, we see
that if V( A) and, hence, V∗ ( A) are also homogeneous, then V∗ ( A) carries a canonical Jordan structure.
In fact, we can say something a little stronger.

Theorem 2. Let A be spectral with respect to a conjugate system $\bar{A}$. If $\mathbf{V}(A)$ is homogeneous, then there exists a canonical Jordan product on $\mathbf{E}(A)$ with respect to which $u_A$ is the Jordan unit. Moreover, with respect to this product, $X(A)$ is exactly the set of primitive idempotents, and $\mathcal{M}(A)$ is exactly the set of Jordan frames.

The first part is almost immediate from the Koecher–Vinberg theorem, together with Lemma 1. Lemma 1 gives us an isomorphism between the ordered vector spaces V(A) and E(A), so if one is homogeneous, so is the other. Since E(A) is also self-dual by Lemma 1, the KV theorem yields the requisite unique Euclidean Jordan structure having u as the Jordan unit. One can then show without much trouble that every outcome x ∈ X(A) is a primitive idempotent of E(A) with respect to this Jordan structure and that every test is a Jordan frame. The remaining claims (that every primitive idempotent belongs to X(A) and every Jordan frame to M(A)) take a little bit more work. I will not reproduce the proof here; the details (which are not especially difficult, but depend on some facts concerning Euclidean Jordan algebras) can be found in [17].
The homogeneity of $\mathbf{V}(A)$ can be understood as a preparability assumption: it is equivalent to saying that every state in the interior of $\Omega(A)$ can be obtained, up to normalization, from the maximally-mixed state by a reversible process. That is, if $\alpha$ lies in the interior of $\Omega(A)$, there is some such process $\varphi$ such that $\varphi(\rho) = p\alpha$, where $0 < p \leq 1$. One can think of the coefficient $p$ as the probability that the process $\varphi$ will yield a nonzero result (more dramatically: will not destroy the system). Thus, if we prepare an ensemble of identical copies of the system in the maximally-mixed state $\rho$ and subject them all to the process $\varphi$, the fraction that survives will be about $p$, and these will all be in state $\alpha$.
In fact, if the hypotheses of Lemma 1 hold, the homogeneity of $\mathbf{E}(A)$ follows directly from the mere existence of reversible filters with arbitrary non-zero coefficients. To see this, suppose $a \in \mathbf{E}(A)_+$ has a spectral decomposition $\sum_{x \in E} t_x \hat{x}$ for some $E \in \mathcal{M}(A)$, with $t_x > 0$ for all $x$ when $a$ belongs to the interior of $\mathbf{E}(A)_+$. Now, if we can find a reversible filter for $E$ with $\Phi(\hat{x}) = t_x \hat{x}$ for all $x \in E$, then applying this to the order-unit $u = \sum_{x \in E} \hat{x}$ yields $a$. Thus, $\mathbf{V}^*(A)$, whose positive cone coincides with $\mathbf{E}(A)_+$ by Lemma 1, is homogeneous.

Two paths to spectrality. Some axiomatic treatments of quantum theory have taken one or another
form of spectrality as an axiom [6,26]. If one is content to do this, then Lemma 1 above provides a
very direct route to the Jordan structure of quantum theory. However, spectrality can actually be
derived from assumptions that, on their face, seem a good deal weaker, or anyway more transparent (a
different path to spectrality is charted in a recent paper [27] by G. Chiribella and C. M. Scandolo).
I will call a joint state on models A and B correlating iff it sets up a perfect correlation between
some pair of tests E ∈ M( A) and F ∈ M( B). More exactly:

Deﬁnition 11. A joint state ω on probabilistic models A and B correlates a test E ∈ M( A) with a test
F ∈ M( B) iff there exist subsets E0 ⊆ E and F0 ⊆ F, and a bijection f : E0 → F0 such that ω ( x, y) = 0 for
( x, y) ∈ E × F unless y = f ( x ). In this case, say that ω correlates E with F along f . A joint state on A and B
is correlating iff it correlates some pair of tests E ∈ M( A), F ∈ M( B).

Note that $\omega$ correlates $E$ with $F$ along $f$ iff $\omega(x, f(x)) = \omega_1(x) = \omega_2(f(x))$, which, in turn, is equivalent to saying that $\omega_{2|x}(f(x)) = 1$ for $\omega_1(x) \neq 0$.

Lemma 2. Suppose A is sharp and that every state α of A arises as the marginal of a correlating joint state
between A and some model B. Then, A is spectral.

Proof. Suppose $\alpha = \omega_1$, where $\omega$ is a joint state correlating a test $E \in \mathcal{M}(A)$ with a test $F \in \mathcal{M}(B)$, say along a bijection $f : E_0 \to F_0$, where $E_0 \subseteq E$ and $F_0 \subseteq F$. Then, for any $x \in E$ with $\alpha(x) \neq 0$, $\omega_{1|f(x)}(x) = 1$, whence, as A is sharp, $\omega_{1|f(x)} = \delta_x$, the unique state making $x$ certain. It follows from the law of total probability that $\alpha = \sum_{x \in E} \alpha(x)\delta_x$.

In principle, the model B can vary with the state α. Lemma 2 suggests the following language:

Deﬁnition 12. A model A satisﬁes the correlation condition iff every state α ∈ Ω( A) is the marginal of some
correlating joint state of A and some model B.

This has something of the same ﬂavor as the puriﬁcation postulate of [8], which requires that all
states of a given system arise as marginals of a pure state on a larger, composite system, unique up to
symmetries on the purifying system. However, note that we do not require the correlating joint state
to be either pure (which, in classical probability theory, it will not be) or unique.


If A is sharp and satisfies the correlation condition, then every state of A is spectral. If, in addition, A has a conjugate, then for every $x \in X(A)$, we have $\eta_{1|\bar{x}} = \delta_x$. In this case, A is spectral with respect to the family of states $\eta_{1|\bar{x}}$, and the hypotheses of Lemma 1 are satisfied.
Here is another, superficially quite different, way of arriving at spectrality. Suppose A has a conjugate, $\bar{A}$. Call a transformation $\Phi$ symmetric with respect to $\eta_A$ iff, for all $x, y \in X(A)$,

\[ \eta_A(\Phi^* x, \bar{y}) = \eta_A(x, \bar{\Phi}^* \bar{y}). \]

Say that a state α is preparable by a ﬁlter Φ iff α = Φ(ρ), where ρ is the maximally-mixed state.

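Lemma 3. Let A have a conjugate, $\bar{A}$, and suppose every state of A is preparable by a symmetric filter. Then, A is spectral.

Proof. Let $\alpha = \Phi(\rho)$, where $\Phi$ is a filter on a test $E \in \mathcal{M}(A)$, say $\Phi^*(\hat{x}) = t_x \hat{x}$ for all $x \in E$. Then:

\[ \alpha = \Phi(\hat{\eta}^*(\bar{u})) = \eta(\Phi^*(\,\cdot\,), \bar{u}) = \eta(\,\cdot\,, \bar{\Phi}^*(\bar{u})) = \sum_{x \in E} \eta(\,\cdot\,, t_x \bar{x}) = \sum_{x \in E} t_x \tfrac{1}{n}\delta_x. \]

Since $\delta_x(y) = 0$ for distinct $x, y \in E$, evaluating at $x \in E$ gives $\alpha(x) = t_x/n$, so that $\alpha = \sum_{x \in E} \alpha(x)\delta_x$.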

Thus, the hypotheses of either Lemma 2 or Lemma 3 will supply the needed spectral assumption that makes Lemma 1 work (in fact, it is not hard to see that these hypotheses are actually equivalent, an exercise I leave for the reader).
To obtain a Jordan model, we still need homogeneity. This is obviously implied by the preparability
condition in Lemma 3, provided the preparing filters Φ can be taken to be reversible whenever the
state to be prepared is non-singular. On the other hand, as noted above, in the presence of spectrality, it
is enough to have arbitrary reversible filters, as these allow one to prepare the spectral decompositions
of arbitrary non-singular states. Thus, conditions (a) and (b) below both imply that A is a Jordan
model. Conversely, one can show that any Jordan model satisfies both (a) and (b), closing the loop [17]:

Theorem 3. The following are equivalent:

(a) A has a conjugate, and every non-singular state can be prepared by a reversible symmetric filter;
(b) A is sharp, has a conjugate, satisfies the correlation condition and has arbitrary reversible filters;
(c) A is a Jordan model.

5. Measurement, Memory and Correlation

Of the spectrality-underwriting conditions given in Lemmas 2 and 3, the one that seems less
transparent (to me, anyway) is the correlation condition, i.e., that every state arises as the marginal
of a correlating bipartite state. While surely less ad hoc than spectrality, this still calls for further
explanation. Suppose we hope to implement a measurement of a test $E \in \mathcal{M}(A)$ dynamically. This would involve bringing up an ancilla system B (which we may also suppose to be uniform and, by suitable coarse-graining if necessary, to have tests of the same cardinality as A's) in some "ready" state $\beta_0$. We would then subject the combined system AB to some physical process, at the end of which AB is in some final joint state $\omega$, and B is (somehow!) in one of a set of record states, $\beta_x$, each corresponding to an outcome $x \in X(A)$. (This way of putting things takes us close to the usual
formulation of the quantum-mechanical “measurement problem”, which I certainly do not propose
to discuss here. The point is only that, if any dynamical process, describable within the theory, can
account for measurement results, it should be consistent with this description.)
We would like to insist that:

(a) The states β x are distinguishable, or readable, by some test F ∈ M( B). This means that for each
x ∈ E, there is a unique y ∈ F such that β x (y) = 1. Note that this sets up an injection f : E → F.

Entropy 2018, 20, 227

(b) The record states must be accurate, in the sense that if we were to measure E on A, and secure
x ∈ E, the record state β x should coincide with the conditional state ω2| x (if this is not the case,
then a measurement of A cannot correctly calibrate the system B as a measuring device for E).

It follows from (a) and (b) that, for x ∈ E and y ∈ F with y ≠ f(x),

$$\omega(x, y) = \omega_1(x)\,\omega_{2|x}(y) = \omega_1(x)\,\beta_x(y) = 0.$$

In other words, ω must correlate E with F, along the bijection f : E → F_0 ⊆ F, where F_0 = f(E). If the measurement
process leaves α undisturbed, in the sense that ω1 = α, then α dilates to a correlating state. This suggests
the following non-disturbance principle: every state can be measured, by some test E ∈ M( A),
without disturbance. Lemma 2 then tells us that if A is sharp and satisﬁes the non-disturbance
principle, every state of A is spectral.
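The bookkeeping in this argument can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`marginal_on_a`, `conditional_on_b`, `correlates`) and the toy numbers are invented for the example, and the sketch looks at a single pair of tests (E, F) rather than a full joint state on all pairs of tests.

```python
from fractions import Fraction

def marginal_on_a(omega, E, F):
    """Marginal omega_1(x) = sum over y in F of omega(x, y)."""
    return {x: sum(omega[(x, y)] for y in F) for x in E}

def conditional_on_b(omega, x, E, F):
    """Conditional state omega_{2|x}(y) = omega(x, y) / omega_1(x); None if omega_1(x) = 0."""
    w1 = marginal_on_a(omega, E, F)[x]
    if w1 == 0:
        return None
    return {y: omega[(x, y)] / w1 for y in F}

def correlates(omega, E, F, f):
    """True iff omega(x, y) = 0 whenever y != f(x), i.e. omega correlates E with F along f."""
    return all(omega[(x, y)] == 0 for x in E for y in F if y != f[x])

# Hypothetical toy data: a two-outcome test E, a three-outcome test F,
# an injection f : E -> F, and a state alpha restricted to E.
E = ["x0", "x1"]
F = ["y0", "y1", "y2"]
f = {"x0": "y0", "x1": "y2"}
alpha = {"x0": Fraction(1, 3), "x1": Fraction(2, 3)}

# Joint weight supported on the graph of f, with first marginal alpha.
omega = {(x, y): (alpha[x] if y == f[x] else Fraction(0)) for x in E for y in F}
```

Here `marginal_on_a(omega, E, F)` recovers `alpha` (the measurement does not disturb α), and each conditional state `conditional_on_b(omega, x, E, F)` is concentrated on f(x), which is what makes the record states readable.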
Here is a slightly different, but possibly more compelling, version of this story. Suppose we can
perform a test E on A directly (setting aside, that is, any issue of whether or not this can be achieved
through some dynamical process): this will result in an outcome x occurring. To do anything with
this, we need to record its having occurred. This means we need a storage medium, B and a family
of states β x , one for each x ∈ E, such that if, on performing the test E, we obtain x, then B will be
in state β x . Moreover, these record states need to be readable at a later time, i.e., distinguishable by
a later measurement on B. To arrange this, we need A and B to be in a joint state, associated with
a joint probability weight ω, such that ω1 = α (because we want to have prepared A in the state α)
and β x = ω2| x for every x ∈ E. We then measure E on A; upon our obtaining outcome x ∈ E, B is in
the state β x . Since the ensemble of states β x is readable by some F ∈ M( B) with | F | ≥ | E|, we have
correlation, and α must also be spectral.
Of course, these desiderata cannot always be satisﬁed. What is true, in QM, is that for every
choice of state α, there will exist some test that is recordable in that state, in the foregoing sense. If we
promote this to the general principle, we again see that every state is the marginal of a correlating state,
and hence spectral, if A is sharp.

6. Composites and Categories

Thus far, we have been referring to the correlator $\eta_A$ as a joint state, but dodging the question:
state of what? Mathematically, nothing much hangs on this question: it is sufficient to regard $\eta_A$ as a
bipartite probability assignment on $A$ and $\bar{A}$. However, it would surely be more satisfactory to be able
to treat it as an actual physical state of some composite system $A\bar{A}$. How should this be chosen? As
mentioned above, one possibility is to take $A\bar{A}$ to be the maximal tensor product of the models $A$ and
$\bar{A}$ [24]. By definition, this has for its states all non-signaling probability assignments with conditional
states belonging to $A$ and $\bar{A}$. However, we might want composite systems, in particular $A\bar{A}$, to satisfy
the same conditions we are imposing on $A$ and $\bar{A}$, i.e., to be a Jordan model. If so, we need to work
somewhat harder: the maximal tensor product will be self-dual only if $A$ is classical.
In order to be more precise about all this, the first step is to decide what ought to count as a
composite of two probabilistic models. If we mean to capture the idea of two physical systems that can
be acted upon separately, but which cannot influence one another in any observable way (e.g., two
spacelike-separated systems), the following seems to capture the minimal requirements:

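Definition 13. A non-signaling composite of models A and B is a model AB, together with a mapping
$\pi : X(A) \times X(B) \to \mathbf{V}^{*}(AB)_{+}$ such that, for all tests $E \in \mathcal{M}(A)$ and $F \in \mathcal{M}(B)$,

$$\sum_{x \in E,\, y \in F} \pi(x, y) = u_{AB}$$

and, for $\omega \in \Omega(AB)$, $\omega \circ \pi$ is a joint state on A and B, as defined in Section 2.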
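To make the non-signaling requirement concrete, here is a minimal sketch that checks it for a joint probability weight on two systems with two two-outcome tests each. The encoding of outcomes as (setting, result) pairs, the function name `is_non_signaling` and both example weights are assumptions made for illustration; the first example is the familiar Popescu-Rohrlich box [25], the second a deliberately signaling weight.

```python
from fractions import Fraction
from itertools import product

def is_non_signaling(omega, tests_a, tests_b):
    """Check that each party's marginal is independent of the other party's test."""
    for E in tests_a:
        for x in E:
            # Alice's marginal probability of x, computed against every test F of Bob.
            if len({sum(omega[(x, y)] for y in F) for F in tests_b}) != 1:
                return False
    for F in tests_b:
        for y in F:
            # Bob's marginal probability of y, computed against every test E of Alice.
            if len({sum(omega[(x, y)] for x in E) for E in tests_a}) != 1:
                return False
    return True

# Outcomes are (setting, result) pairs; each test collects the results of one setting.
tests_a = [[(0, 0), (0, 1)], [(1, 0), (1, 1)]]
tests_b = [[(0, 0), (0, 1)], [(1, 0), (1, 1)]]
half, zero = Fraction(1, 2), Fraction(0)

# PR box: results satisfy a XOR b = s AND t, each allowed pair with weight 1/2.
pr_box = {((s, a), (t, b)): (half if (a ^ b) == (s & t) else zero)
          for s, a, t, b in product((0, 1), repeat=4)}

# A signaling weight: Alice's result simply copies Bob's setting.
copy_setting = {((s, a), (t, b)): (half if a == t else zero)
                for s, a, t, b in product((0, 1), repeat=4)}
```

Both weights sum to one on every product test E × F, but only `pr_box` passes `is_non_signaling`, so only it qualifies as a joint state in the sense used here.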


The idea here, expressed in Alice-and-Bob language (Alice controlling system A, Bob controlling
system B), is that π ( x, y) is an effect of the composite system AB, corresponding to x being observed
by Alice and y, by Bob. In many cases, π ( x, y) will actually be an outcome in X ( AB). Indeed,
we usually have π : X ( A) × X ( B) → X ( AB) injective, and for E ∈ M( A), F ∈ M( B), π ( E × F ) =
{π ( x, y)| x ∈ E, y ∈ F } a test in M( AB). The rank of AB will then be the product of the ranks of A
and B. Accordingly, let us call a non-signaling composite with these properties multiplicative.
Composites in real and complex quantum mechanics are multiplicative; in quaternionic quantum
mechanics, with the most plausible deﬁnition of tensor product, they are not [28].
Therefore, the question becomes: can one construct, for Jordan models A and B, a non-signaling
composite AB that is also a Jordan model? At present, and in this generality, this question seems to be
open, but some progress is made in [28]: if neither A nor B contains the exceptional Jordan algebra as a
summand, such a composite can indeed be constructed, and in multiple ways. Moreover, under a
considerably more restrictive deﬁnition of “Jordan composite”, no Jordan composite AB can exist if
either factor has an exceptional summand.

Categories of Self-Dual Probabilistic Models. It is natural to interpret a physical theory as a category,
in which objects represent physical systems and morphisms represent physical processes having these
systems (or their states) as inputs and outputs. In order to discuss composite systems, this should
be a symmetric monoidal category. That is, for every pair of objects $A, B$, there should be an object
$A \otimes B$, and for every pair of morphisms $f : A \to A'$ and $g : B \to B'$, there should be a morphism
$f \otimes g : A \otimes B \to A' \otimes B'$, representing the two processes $f$ and $g$ occurring “in parallel”. One requires
that $\otimes$ be associative and commutative, and have a unit object $I$, in the sense that there exist canonical
isomorphisms $\alpha_{A,B;C} : A \otimes (B \otimes C) \simeq (A \otimes B) \otimes C$, $\sigma_{A,B} : A \otimes B \simeq B \otimes A$, $\lambda_A : I \otimes A \simeq A$ and
$\rho_A : A \otimes I \simeq A$. These must satisfy various “naturality conditions”, guaranteeing that they interact
correctly; see [29] for details. One also requires that $\otimes$ be bifunctorial, meaning that $\mathrm{id}_A \otimes \mathrm{id}_B = \mathrm{id}_{A \otimes B}$,
and if $f : A \to A'$, $f' : A' \to A''$, $g : B \to B'$ and $g' : B' \to B''$, then:

$$(f' \otimes g') \circ (f \otimes g) = (f' \circ f) \otimes (g' \circ g).$$
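For linear maps between finite-dimensional real vector spaces, with ⊗ realized as the Kronecker product of matrices, bifunctoriality is the usual mixed-product property. The sketch below checks it on small integer matrices; the helper names `matmul`, `kron` and `identity` and the sample matrices are assumptions chosen for the example.

```python
def matmul(p, q):
    """Matrix product p q (composition 'p after q')."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def kron(p, q):
    """Kronecker product: rows indexed by (i, k), columns by (j, l)."""
    return [[p[i][j] * q[k][l] for j in range(len(p[0])) for l in range(len(q[0]))]
            for i in range(len(p)) for k in range(len(q))]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

# f : R^2 -> R^3, f2 : R^3 -> R^2, g : R^2 -> R^2, g2 : R^2 -> R^1 (as matrices).
f = [[1, 2], [0, 1], [3, -1]]
f2 = [[1, 0, 2], [-1, 1, 0]]
g = [[2, 1], [1, 1]]
g2 = [[1, -2]]

lhs = matmul(kron(f2, g2), kron(f, g))          # (f' ⊗ g') ∘ (f ⊗ g)
rhs = kron(matmul(f2, f), matmul(g2, g))        # (f' ∘ f) ⊗ (g' ∘ g)
```

The two sides agree entry by entry, and `kron(identity(2), identity(3))` is the 6 × 6 identity, matching the requirement on identity morphisms.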

By a probabilistic theory, I mean a category $\mathcal{C}$ of probabilistic models and processes; that is, objects
of $\mathcal{C}$ are models, and a morphism $A \to B$, where $A, B \in \mathcal{C}$, is a process $\mathbf{V}(A) \to \mathbf{V}(B)$. A monoidal
probabilistic theory is such a category, $\mathcal{C}$, carrying a symmetric monoidal structure $A, B \mapsto AB$, where
$AB$ is a non-signaling composite in the sense of the definition above. I also assume that the monoidal
unit, $I$, is the trivial model 1 with $\mathbf{V}(1) = \mathbb{R}$, and that, for all $A \in \mathcal{C}$,

(a) $\alpha \in \Omega(A)$ iff the mapping $\mathbb{R} \to \mathbf{V}(A)$ given by $1 \mapsto \alpha$ belongs to $\mathcal{C}(I, A)$;

(b) The evaluation functional $\hat{x}$ belongs to $\mathcal{C}(A, I)$ for all outcomes $x \in X(A)$.

Call C locally tomographic iff AB is a locally tomographic composite for all A, B ∈ C . Much of the
qualitative content of (ﬁnite-dimensional) quantum information theory can be formulated in purely
categorical terms [11,18,30]. In particular, in the work of Abramsky and Coecke [18], it is shown that a
range of quantum phenomena, notably gate teleportation, is available in any dagger-compact category.
For a review of this notion, as well as a proof of the following result, see Appendix D:

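Theorem 4. Let $\mathcal{C}$ be a locally-tomographic monoidal probabilistic theory, in which every object $A \in \mathcal{C}$ is sharp,
spectral and has a conjugate $\bar{A} \in \mathcal{C}$, with $\eta_A \in \Omega(A\bar{A})$. Assume also that, for all $A, B \in \mathcal{C}$,

(i) $\bar{\bar{A}} = A$, with $\eta_{\bar{A}}(\bar{a}, b) = \eta_A(a, \bar{b})$;

(ii) If $\phi \in \mathcal{C}(A, B)$, then $\bar{\phi} \in \mathcal{C}(\bar{A}, \bar{B})$.

Then, $\mathcal{C}$ has a canonical dagger-compact structure, in which $\bar{A}$ is the dual of $A$ with $\eta_A : \mathbb{R} \to \mathbf{V}(A\bar{A})$ as
the co-unit.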


Jordan composites. The local tomography assumption in Theorem 4 is a strong constraint. As is
well known, the standard composite of two real quantum systems is not locally tomographic,
yet the category of finite-dimensional real mixed-state quantum systems is certainly dagger-compact
and satisfies the other assumptions of Theorem 4, so local tomography is definitely not a necessary
condition for dagger-compactness.
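The failure of local tomography for real quantum systems can be seen from a simple parameter count: local tomography requires the dimension of the composite's effect space to equal the product of the local dimensions. The following sketch performs this count for the standard composites of real and complex systems; the function names are assumptions, and the check is only the necessary dimension condition, not a full verification.

```python
def dim_effects(field, n):
    """Real dimension of the self-adjoint n x n matrices over R or C."""
    if field == "R":
        return n * (n + 1) // 2   # real symmetric matrices
    if field == "C":
        return n * n              # complex Hermitian matrices
    raise ValueError("field must be 'R' or 'C'")

def passes_dimension_count(field, n, m):
    """Compare the nm-level system over the same field with the product of local dimensions."""
    return dim_effects(field, n * m) == dim_effects(field, n) * dim_effects(field, m)
```

For two real bits, the composite has dimension 10 while the product of the local dimensions is 9, so the count fails; for two complex qubits, both numbers are 16.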
This raises some questions. One is whether local tomography can simply be dropped in the
statement of Theorem 4. At any rate, at present, I do not know of any non-dagger-compact monoidal
probabilistic theory satisfying the other assumptions.
Another question is whether there exist examples other than real QM of non-locally-tomographic,
but still dagger-compact, monoidal probabilistic theories satisfying the assumptions of Theorem 2.
The answer to this is yes. Without going into detail, the main result of [28] is that one can construct
a dagger-compact category in which the objects are Hermitian parts of finite-dimensional real,
complex and quaternionic matrix algebras, that is the Euclidean Jordan algebras corresponding
to finite-dimensional real, complex or quaternionic quantum-mechanical systems, and morphisms
are certain completely positive mappings between enveloping complex ∗-algebras for these Jordan
algebras. The monoidal structure gives almost the expected results: the composite of two real quantum
systems is the real system corresponding to the usual (real) quantum-mechanical composite of the
two components (and, in particular, is not locally tomographic). The composite of two quaternionic
systems is a real system (see [11] for an account of why this is just what one wants). The composite of
a real and a complex, or a quaternionic and a complex, system is again complex. The one surprise is
that the composite of two standard complex quantum systems, in this category, is not the usual thing,
but rather, comes with an extra superselection rule. This functions to make time-reversal a legitimate
physical operation on complex systems, as it is for real and quaternionic systems. This is part of the
price one pays for the dagger-compactness of this category.

7. Conclusions
As promised, we have here an easy derivation of something close to orthodox, finite-dimensional
QM, from operationally or probabilistically transparent assumptions. As discussed earlier,
this approach offers, in addition to its relative simplicity, greater latitude than the locally-tomographic
axiomatic reconstructions of [7–10], putting us in the slightly less constrained realm of formally real
Jordan algebras. This allows for real and quaternionic quantum systems, superselection rules and even
theories, such as the ones discussed in Section 6, in which real, complex and quaternionic quantum
systems coexist and interact.
There remains some mystery as to the proper interpretation of the conjugate system $\bar{A}$.
Operationally, the situation is clear enough: if we understand $A$ as controlled by Alice and $\bar{A}$ by
Bob, then if Alice and Bob share the state $\eta_A$, they will always obtain the same result, as long as
they perform the same test. However, what does it mean physically that this should be possible (in
a situation in which Alice and Bob are still able to choose their tests independently)? In fact, there
is little consensus (that I can find, anyway) among physicists as to the proper interpretation of the
conjugate of the Hilbert space representing a given quantum-mechanical system. One popular idea
is that the conjugate is a time-reversed version of the given system; but why, then, should we expect
to find a state that perfectly correlates the two? At any rate, finding a clear physical interpretation of
conjugate systems, even (or especially!) in orthodox quantum mechanics, seems to me an urgently
important problem.
I would like to close with another problem, this one of mainly mathematical interest. The
hypotheses of Theorem 2 yield a good deal more structure than just a homogeneous, self-dual cone.
In particular, we have a distinguished set M( A) of orthonormal observables in V∗ ( A), with respect
to which every effect has a spectral decomposition. Moreover, with a bit of work, one can show that
this decomposition is essentially unique. More exactly, if $a = \sum_i t_i p_i$, where the coefficients $t_i$ are all
distinct and the effects $p_1, ..., p_k$ are associated with a coarse-graining of a test $E \in \mathcal{M}(A)$, then both
the coefficients and the effects are uniquely determined. The details are in Appendix B. Using this,
we have a functional calculus on $\mathbf{V}^{*}(A)$, i.e., for any real-valued function $f$ of a real variable and any
effect $a$ with spectral decomposition $\sum_i t_i p_i$ as above, we can define $f(a) = \sum_i f(t_i) p_i$. This gives us a
unique candidate for the Jordan product of effects $a$ and $b$, namely,

$$a \cdot b = \tfrac{1}{2}\left((a + b)^2 - a^2 - b^2\right).$$

We know from Theorem 2 (and thus, ultimately, from the KV theorem) that this is bilinear.
The challenge is to show this without appealing to the KV theorem (the fact that the state spaces of
“bits” are always balls, as shown in Appendix C, is perhaps relevant here).
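In the familiar special case of real symmetric matrices, where the square given by the functional calculus coincides with the matrix square, this candidate product can be checked directly. The sketch below is an illustration under that assumption only (it says nothing about the general, KV-free argument asked for above); the helper names and sample matrices are invented for the example.

```python
from fractions import Fraction

def mat_add(p, q):
    return [[p[i][j] + q[i][j] for j in range(len(p))] for i in range(len(p))]

def mat_sub(p, q):
    return [[p[i][j] - q[i][j] for j in range(len(p))] for i in range(len(p))]

def mat_scale(c, p):
    return [[c * entry for entry in row] for row in p]

def mat_mul(p, q):
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def jordan(a, b):
    """Candidate Jordan product a . b = ((a + b)^2 - a^2 - b^2) / 2."""
    s = mat_add(a, b)
    diff = mat_sub(mat_sub(mat_mul(s, s), mat_mul(a, a)), mat_mul(b, b))
    return mat_scale(Fraction(1, 2), diff)

# Three real symmetric 2 x 2 matrices (arbitrary sample values).
a = [[1, 2], [2, 0]]
b = [[0, 1], [1, 3]]
c = [[2, -1], [-1, 1]]

# For matrices the polarization formula reduces to the symmetrized product (ab + ba) / 2.
half_anticommutator = mat_scale(Fraction(1, 2), mat_add(mat_mul(a, b), mat_mul(b, a)))
```

In this setting `jordan(a, b)` equals `half_anticommutator`, it is additive in each argument, and `jordan(a, a)` returns the square of `a`.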

Acknowledgments: This paper is partly based on talks given in workshops and seminars in Amsterdam and Oxford
in 2014 and 2015, and was largely written while the author was a guest of the Quantum Group at the Oxford
Computing Laboratory, supported by a grant (FQXi-RFP3-1348) from the FQXi foundation. I would like to thank
Sonja Smets (in Amsterdam) and Bob Coecke (in Oxford) for their hospitality on these occasions. I also wish to
thank Carlo Maria Scandolo for his careful reading of, and useful comments on, two earlier drafts of this paper.
Conﬂicts of Interest: The author declares no conﬂict of interest.

Appendix A. Models with Symmetry

Recall that a probabilistic model A is sharp iff, for every measurement outcome x ∈ X ( A),
there exists a unique state δx ∈ Ω( A) with δx ( x ) = 1. While this is clearly a very strong condition,
it is not an unreasonable one. In fact, given the test space M( A), we can often choose the state space
Ω( A) in such a way as to guarantee that A is sharp. In particular, this is the case when M( A) enjoys
enough symmetry.

Definition A1. Let G be a group. A G-test space is a test space $(X, \mathcal{M})$ where X is a G-space, that is, where X
comes equipped with a preferred G-action $G \times X \to X$, $(g, x) \mapsto gx$, such that $gE \in \mathcal{M}$ for all $E \in \mathcal{M}$. A
G-model is a probabilistic model A such that (i) $\mathcal{M}(A)$ is a G-test space and (ii) $\Omega(A)$ is invariant under the
action of G on probability weights given by $\alpha \mapsto g\alpha := \alpha \circ g^{-1}$ for $g \in G$.

Lemma A1. Let A be a ﬁnite-dimensional G-model, and suppose G acts transitively on the outcome space
X ( A). Suppose also that A is unital, i.e., for every x ∈ X ( A), there exists at least one state α with α( x ) = 1.
Then, there exists a G-invariant convex subset $\Delta \subseteq \Omega(A)$ such that $A' = (\mathcal{M}(A), \Delta)$ is a sharp G-model.

Proof. For each x ∈ X ( A), let Fx denote the face of Ω( A) consisting of states α with α( x ) = 1. Let β x
be the barycenter of Fx . It is easy to check that Fgx = gFx for every g ∈ G. Thus, gβ x = β gx , i.e., the set
of barycenters β x is an orbit. Let Δ be the convex hull of these barycenters. Then, Δ is invariant under
G. If α ∈ Δ with α( x ) = 1, then α ∈ Fx ∩ Δ = { β x }, so (M( A), Δ) is sharp.

Appendix B. Uniqueness of Spectral Decompositions

Let A be a model satisfying the conditions of Lemma 1. In particular, every $a \in \mathbf{E}(A) = \mathbf{V}^{*}(A)$
has a spectral representation $a = \sum_{x \in E} t_x \hat{x}$ for some test $E \in \mathcal{M}(A)$. In general, this expansion is
highly non-unique. For instance, the unit $u_A$ can be expanded as $\sum_{x \in E} \hat{x}$ for any test $E \in \mathcal{M}(A)$.
The aim in this Appendix is to obtain a form of spectral expansion for effects that is unique.
Call a subset of a test an event. That is, $D \subseteq X(A)$ is an event iff there exists a test $E \in \mathcal{M}(A)$
with $D \subseteq E$. The probability of an event $D$ in a state $\alpha$ is $\alpha(D) = \sum_{x \in D} \alpha(x)$. Thus, any event gives rise
to an effect, $\hat{D}$, given by $\hat{D}(\alpha) = \alpha(D)$. Evidently,

$$\hat{D} = \sum_{x \in D} \hat{x}.$$

A test is a maximal event, and for any test $E \in \mathcal{M}(A)$, $\hat{E} = u$.


Definition A2. An effect $p \in \mathbf{V}^{*}(A)$ is sharp iff it has the form $p = \hat{D}$ for some event $D$. A set of sharp effects
$p_1, ..., p_n \in \mathbf{V}^{*}(A)$ is jointly orthogonal with respect to $\mathcal{M}(A)$ iff there exist a test $E \in \mathcal{M}(A)$ and pairwise
disjoint events $D_1, ..., D_n \subseteq E$ with $p_i = \hat{D}_i$ for $i = 1, ..., n$.

Given an arbitrary element $a \in \mathbf{V}^{*}(A)$ with spectral decomposition $a = \sum_{x \in E} t_x \hat{x}$, we can isolate
distinct values $t_0 > t_1 > ... > t_k$ of the coefficients $t_x$. Letting $E_i = \{x \in E \mid t_x = t_i\}$ and setting
$p_i = p(E_i) = \sum_{x \in E_i} \hat{x}$, we have $a = \sum_i t_i p_i$, with $p_0, ..., p_k$ jointly orthogonal. Suppose there is another
such decomposition, say $a = \sum_j s_j q_j$, with $q_j = \hat{F}_j = \sum_{y \in F_j} \hat{y}$, where $F_0, ..., F_l \subseteq F \in \mathcal{M}(A)$ are pairwise
disjoint, and again, with the coefficients in descending order, say $s_0 > s_1 > \cdots > s_l$.

Lemma A2. In the situation described above, t0 = s0 and p0 = q0 .

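Proof. Normalize the inner product on $\mathbf{E}(A)$ so that $\|\hat{x}\| = 1$ for all outcomes $x$. Then, for any sharp
effect $p = \hat{D}$, $D$ an event, we have $\|\hat{D}\|^2 = |D|$, the cardinality of $D$. Choosing any outcome $x_0 \in E_0$,
set $\alpha = |\hat{x}_0\rangle$, i.e., $\alpha(\hat{x}) = \langle \hat{x}, \hat{x}_0 \rangle$ for all $x \in X(A)$. Then, $\alpha \in \Omega(A)$, $\alpha(p_0) = 1$ and $\alpha(p_i) = 0$ for
$i > 0$. Thus,

$$t_0 = \alpha(a) = \sum_j s_j \alpha(q_j).$$

Since the coefficients $\alpha(q_j)$ are sub-convex, the right-hand side is no larger than the largest of the
values $s_j$, namely, $s_0$. Thus, $t_0 \leq s_0$. The same argument, with the roles of the two decompositions
reversed, shows that $s_0 \leq t_0$. Thus, $s_0 = t_0$.
Now again, let $x \in E_0$: then,

$$\langle \hat{x}, p_0 \rangle = \sum_{y \in E_0} \langle \hat{x}, \hat{y} \rangle = \langle \hat{x}, \hat{x} \rangle = 1,$$

whence, $\langle \hat{x}, a \rangle = t_0$. However, we then have (using the fact that $s_0 = t_0$):

$$t_0 = \langle \hat{x}, a \rangle = \Big\langle \hat{x},\; t_0 q_0 + \sum_{j=1}^{l} s_j q_j \Big\rangle = t_0 \langle \hat{x}, q_0 \rangle + \sum_{j=1}^{l} s_j \langle \hat{x}, q_j \rangle.$$

Since $\sum_{j=0}^{l} \langle \hat{x}, q_j \rangle \leq \langle \hat{x}, u \rangle = 1$, the last expression above is a sub-convex combination
of the distinct values $s_0 > \cdots > s_l$. This can equal $t_0 = s_0$, the maximum of these values, only if
$\langle \hat{x}, q_0 \rangle = 1$ and $\langle \hat{x}, q_j \rangle = 0$ for the remaining $q_j$. It follows that $\langle p_0, q_0 \rangle = \sum_{x \in E_0} \langle \hat{x}, q_0 \rangle = |E_0| = \|p_0\|^2$.
The same argument, with p’s and q’s interchanged, shows that $\langle p_0, q_0 \rangle = \|q_0\|^2$. Hence, $\|p_0\| = \|q_0\|$,
and $\langle p_0, q_0 \rangle = \|p_0\|^2 = \|p_0\| \|q_0\|$, whence, by the equality case of the Cauchy–Schwarz inequality, $p_0 = q_0$.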

Proposition A1. Every $a \in \mathbf{V}^{*}(A)$ has a unique expansion of the form $a = \sum_{i=0}^{k} t_i p_i$, where $t_0 > t_1 > ... > t_k$
are non-zero coefficients and $p_0, ..., p_k$ are jointly orthogonal sharp effects.

Proof. Suppose $a = \sum_{i=0}^{k} t_i p_i$, as above, and also $a = \sum_{j=0}^{l} s_j q_j$, with $s_0 > \cdots > s_l > 0$ and $q_j$ pairwise
orthogonal sharp effects. We shall show that $k = l$, and that $t_i = s_i$ and $p_i = q_i$ for each $i = 0, ..., k$.
Lemma A2 tells us that $t_0 = s_0$ and $p_0 = q_0$. Hence,

$$\sum_{i=1}^{k} t_i p_i = a - t_0 p_0 = a - s_0 q_0 = \sum_{j=1}^{l} s_j q_j.$$

Applying Lemma A2 recursively, we find that $t_i = s_i$ and $p_i = q_i$ for $i = 1, ..., \min(k, l)$. If $k \neq l$,
say $k < l$; we then have:

$$t_k p_k = s_k q_k + \sum_{j=k+1}^{l} s_j q_j = t_k p_k + \sum_{j=k+1}^{l} s_j q_j,$$

whence, $\sum_{j=k+1}^{l} s_j q_j = 0$, which is impossible since all $q_j$ are sharp and the coefficients $s_j$ are strictly
positive. Hence, $l = k$, and the proof is complete.
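The passage from an arbitrary spectral expansion to the canonical one amounts to grouping outcomes by coefficient. The following is a minimal sketch of that grouping step, assuming the expansion is handed over as a dictionary from outcomes of a single test to coefficients; the function name and the toy data are hypothetical.

```python
from fractions import Fraction

def canonical_decomposition(coeffs):
    """Group a = sum_x t_x x^ into pairs (t_i, E_i) with the t_i distinct, non-zero and decreasing.

    coeffs maps each outcome x of one test E to its coefficient t_x;
    E_i is the event {x in E : t_x = t_i}, so that p_i is the sharp effect of E_i.
    """
    groups = {}
    for x, t in coeffs.items():
        if t != 0:                      # zero coefficients contribute nothing
            groups.setdefault(t, set()).add(x)
    return [(t, frozenset(groups[t])) for t in sorted(groups, reverse=True)]

# Hypothetical expansion over a four-outcome test.
example = {"x": Fraction(1, 2), "y": Fraction(1, 2), "z": Fraction(1, 4), "w": 0}
reordered = {"w": 0, "z": Fraction(1, 4), "y": Fraction(1, 2), "x": Fraction(1, 2)}
```

Both `example` and `reordered` yield the same list, with coefficient 1/2 attached to the event {x, y} and 1/4 to {z}. Proposition A1 says more than this sketch can show: the result is also independent of which test was used to expand the effect.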

Appendix C. Bits Are Balls

In most other reconstructions of QM [8–10], the ﬁrst step is to show that the state space of a bit,
that is, a system in which every state is the mixture of two sharply-distinguishable pure states, is a ball.
In our approach, this fact is an easy consequence of Lemma 1. In our framework, we will define a bit to
be a sharp, uniform model A with rank two, in which every state has the form $t\delta_x + (1 - t)\delta_{x'}$, where
$\{x, x'\} \in \mathcal{M}(A)$. Note that this implies that A is spectral.

Lemma A3. Let A be a bit with conjugate $\bar{A}$. Then, $\Omega(A)$ is a Euclidean ball, the extreme points of which are
the states δx , x ∈ X ( A).

Proof. By Lemma 1, $\mathbf{E}(A)$ carries a self-dualizing inner product such that $\langle \hat{x}, \hat{y} \rangle = 0$ for $\{x, y\} \in
\mathcal{M}(A)$, and which we can normalize so that $\|\hat{x}\| = 1$ for each outcome $x \in X(A)$, so that $\langle u, \hat{x} \rangle =
\langle \hat{x}, \hat{x} \rangle = 1$ and $\|u\|^2 = 2$. Every state $\alpha \in \Omega(A)$ corresponds to a unique vector $a \in \mathbf{E}(A)_{+}$ with
$\langle a, u \rangle = 1$, where $\alpha(x) = \langle a, \hat{x} \rangle$ for all $x \in X(A)$; conversely, every vector $a \in \mathbf{E}(A)_{+}$ with $\langle a, u \rangle = 1$
corresponds in this way to a state. In particular, the state $\delta_x$ corresponds to the unit vector $\hat{x}$, and the
maximally-mixed state corresponds to the vector $\tfrac{1}{n} u$ (here, $n = 2$). To simplify the notation, let us agree for the
moment to write $\rho$ for this vector. Thus, $\langle \rho, \hat{x} \rangle = \tfrac{1}{2}$, $\|\rho\|^2 = \tfrac{1}{4} \langle u, u \rangle = \tfrac{1}{2}$, and hence,

$$\|\rho - \hat{x}\|^2 = \|\rho\|^2 - 2 \langle \rho, \hat{x} \rangle + \|\hat{x}\|^2 = \tfrac{1}{2}.$$

Thus, $\hat{X}(A) := \{ \hat{x} \mid x \in X(A) \}$ lies on the sphere of radius $1/\sqrt{2}$ about the state $\rho$. I now claim
that any $a \in \mathbf{E}(A)$ with $\langle a, u \rangle = 1$ (in effect, any state) such that $\|\rho - a\| \leq 1/\sqrt{2}$ belongs to the
positive cone $\mathbf{E}(A)_{+}$. To see this, use spectrality to decompose $a$ as $s\hat{x} + t\hat{y}$ where $\{x, y\} \in \mathcal{M}(A)$
and $s, t \in \mathbb{R}$. Consider now the two-dimensional subspace $\mathbf{E}_{x,y}$ spanned by $\hat{x}$ and $\hat{y}$. With respect
to the inner product inherited from $\mathbf{E}$, we can regard this as a two-dimensional Euclidean space, in
which $a$ is represented by the Cartesian coordinate pair $(s, t)$. Expanding $\rho$ as $\rho = \tfrac{1}{2}(\hat{x} + \hat{y})$, we see
that $\rho \in \mathbf{E}_{x,y}$ with coordinates $(1/2, 1/2)$. The point $(s, t)$ lies, therefore, in the disk of radius $1/\sqrt{2}$
centered at $(1/2, 1/2)$ in $\mathbf{E}_{x,y}$. Moreover, as $\langle a, u \rangle = 1$, we see that $s + t = 1$, i.e., $(s, t)$ lies on the line of
slope $-1$ through $(1/2, 1/2)$. On this line, the distance condition reads $2(s - \tfrac{1}{2})^2 \leq \tfrac{1}{2}$, i.e., $0 \leq s \leq 1$.
This puts $(s, t)$ in the positive quadrant of this plane, i.e., $s \geq 0$ and $t \geq 0$.
However, then $a \in \mathbf{E}(A)_{+}$, as claimed.

It follows that, for rank-two models, we do not even need to invoke homogeneity: they all
correspond to spin factors. Letting d denote the dimension of the state space (that is, d = dim(E) − 1),
we see that if d = 1, we have the classical bit; d = 2 gives the real quantum-mechanical bit, d = 3
gives the familiar Bloch sphere, i.e., the usual qubit of complex QM; while d = 5 corresponds to the
quaternionic unit sphere, giving us the quaternionic bit. The generalized bits with d = 4 and d ≥ 6 are
more exotic “post-quantum” possibilities.
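For the real quantum bit (d = 2), the geometry used in the proof can be checked numerically. The sketch below assumes the standard representation, which is not spelled out in this appendix: effects are real symmetric 2 × 2 matrices, outcomes are rank-one projectors, and the normalized inner product is the trace form ⟨a, b⟩ = Tr(ab). The function names and the sampled angles are invented for the example.

```python
import math

def projector(theta):
    """Rank-one projector onto the unit vector (cos theta, sin theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, c * s], [c * s, s * s]]

def inner(a, b):
    """Trace inner product Tr(ab) for symmetric 2 x 2 matrices."""
    return sum(a[i][j] * b[i][j] for i in range(2) for j in range(2))

def dist_sq(a, b):
    d = [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]
    return inner(d, d)

rho = [[0.5, 0.0], [0.0, 0.5]]                 # maximally mixed state, u / 2
angles = [k * math.pi / 7 for k in range(7)]   # a few sample outcomes
norms_sq = [inner(projector(t), projector(t)) for t in angles]
radii_sq = [dist_sq(rho, projector(t)) for t in angles]
```

Every sampled projector has unit norm and squared distance 1/2 from `rho`, in agreement with the radius 1/√2 computed in the proof.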

Appendix D. Local Tomography and Dagger-Compactness

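A dagger on a category $\mathcal{C}$ is a contravariant functor $\dagger : \mathcal{C} \to \mathcal{C}$ that is the identity on objects
and satisfies $\dagger \circ \dagger = \mathrm{id}_{\mathcal{C}}$. That is, if $f : A \to B$ is a morphism in $\mathcal{C}$, then $f^{\dagger} : B \to A$, with $f^{\dagger\dagger} = f$ and
$(f \circ g)^{\dagger} = g^{\dagger} \circ f^{\dagger}$ whenever $f \circ g$ is defined. An isomorphism $f : A \simeq B$ in $\mathcal{C}$ is then said to be unitary
iff $f^{\dagger} = f^{-1}$. One says that $\mathcal{C}$ is †-monoidal iff $\mathcal{C}$ is equipped with a symmetric monoidal structure $\otimes$
such that $(f \otimes g)^{\dagger} = f^{\dagger} \otimes g^{\dagger}$, and such that the canonical isomorphisms $\alpha_{A,B,C}$, $\sigma_{A,B}$, $\lambda_A$ and $\rho_A$ are
all unitary.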


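A dual for an object $A$ in a symmetric monoidal category $\mathcal{C}$ is a structure $(A', \eta, \epsilon)$ where $A' \in \mathcal{C}$
and $\eta : I \to A \otimes A'$ and $\epsilon : A' \otimes A \to I$, such that:

$$(\mathrm{id}_A \otimes \epsilon) \circ (\eta \otimes \mathrm{id}_A) = \mathrm{id}_A \quad \text{and} \quad (\epsilon \otimes \mathrm{id}_{A'}) \circ (\mathrm{id}_{A'} \otimes \eta) = \mathrm{id}_{A'}$$

up to the natural associator and unit isomorphisms. If $\mathcal{C}$ is †-monoidal and $\epsilon = \sigma_{A,A'} \circ \eta_A^{\dagger}$, then $(A', \eta, \epsilon)$
is a dagger-dual. A category in which every object $A$ has a specified dual $(A', \eta_A, \epsilon_A)$ is compact closed,
and a dagger-monoidal category in which every object has a given dagger-dual is dagger-compact.
See [18,30] for details.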
An important example of all this is the category $\mathbf{FdHilb}_{\mathbb{R}}$ of finite-dimensional real Hilbert
spaces and linear mappings. If $H$ and $K$ are two such spaces and $\phi : H \to K$, let $\phi^{\dagger}$ be the usual
adjoint of $\phi$ with respect to the given inner products. Letting $H \otimes K$ be the usual tensor product of
$H$ and $K$ (in particular, with $\langle x \otimes y, u \otimes v \rangle = \langle x, u \rangle \langle y, v \rangle$ for $x, u \in H$ and $y, v \in K$), $\mathbf{FdHilb}_{\mathbb{R}}$ is a
dagger-monoidal category with $\mathbb{R}$ as the monoidal unit.
Since any $H \in \mathbf{FdHilb}_{\mathbb{R}}$ is canonically isomorphic to its dual space, we have also a canonical
isomorphism $H \otimes H \simeq H^{*} \otimes H = \mathcal{L}(H)$ and a canonical trace functional $\mathrm{Tr}_H : H \otimes H \to \mathbb{R}$,
uniquely defined by $\mathrm{Tr}_H(x \otimes y) = \langle x, y \rangle$ for all $x, y \in H$. Taking $H' = H$, let $\eta_H \in H \otimes H$ be given
by $\eta_H = \sum_i x_i \otimes x_i$, where the sum is taken over any orthonormal basis $\{x_i\}$ for $H$; then, for any
$a \in H \otimes H$, $\langle \eta_H, a \rangle = \mathrm{Tr}(a)$. It is routine to show that $\mathrm{Tr}_H = \sigma_{H,H} \circ \eta_H^{\dagger}$, so that $\eta_H$ and $\mathrm{Tr}_H$ make $H$
its own dagger-dual.
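In any compact closed symmetric monoidal category $\mathcal{C}$, every morphism $\phi : A \to B$ yields a dual
morphism $\phi' : B' \to A'$ defined by:

$$\phi' = (\mathrm{id}_{A'} \otimes \epsilon_B) \circ (\mathrm{id}_{A'} \otimes \phi \otimes \mathrm{id}_{B'}) \circ (\eta_A \otimes \mathrm{id}_{B'})$$

(again, suppressing associators and left and right units). For $\phi : H \to K$ in $\mathbf{FdHilb}_{\mathbb{R}}$, one has, for any
$v \in K$ and any orthonormal basis $M$ of $H$,

$$\phi'(v) = \sum_{x \in M} \langle v, \phi(x) \rangle x = \sum_{x \in M} \langle \phi^{\dagger}(v), x \rangle x = \phi^{\dagger}(v),$$

i.e., $\phi' = \phi^{\dagger}$.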
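The last computation can be replayed concretely in coordinates, where the adjoint of a real matrix is its transpose. The sketch below evaluates the sum over an orthonormal basis directly and compares it with the transpose; the function names, the use of the standard basis and the sample matrix are assumptions made for the example.

```python
def apply(f, x):
    """Matrix-vector product f x."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in f]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def standard_basis(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def dual_morphism(f, v):
    """Evaluate sum over basis vectors x of <v, f(x)> x, for f : R^n -> R^m and v in R^m."""
    n = len(f[0])
    out = [0] * n
    for x in standard_basis(n):
        coeff = dot(v, apply(f, x))
        out = [o + coeff * xi for o, xi in zip(out, x)]
    return out

def transpose(f):
    return [list(col) for col in zip(*f)]

f = [[1, 2], [3, 4], [5, 6]]   # a map R^2 -> R^3
v = [1, -1, 2]                 # a vector in R^3
```

Here `dual_morphism(f, v)` and `apply(transpose(f), v)` both give `[8, 10]`, as the identity between the dual morphism and the adjoint predicts.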
Now, let $\mathcal{C}$ be a monoidal probabilistic theory; that is, a category of probabilistic models and
processes, with a symmetric monoidal structure $A, B \mapsto AB$, where $AB$ is a (non-signaling) composite
in the sense discussed in Section 6. Assume that $\mathcal{C}$ is multiplicative, so that for $A, B \in \mathcal{C}$, we have
$\pi_{AB} : X(A) \times X(B) \to X(AB)$. Henceforward, I will write $x \otimes y$ for $\pi(x, y)$ where $x \in X(A)$ and $y \in X(B)$.
I will further assume that $\mathcal{C}$’s tensor unit is $I = \mathbb{R}$, and that:
(a) Every $A \in \mathcal{C}$ has a conjugate, $\bar{A} \in \mathcal{C}$, with $\bar{\bar{A}} = A$;
(b) For all $A, B \in \mathcal{C}$ and $\phi \in \mathcal{C}(A, B)$, $\bar{\phi} \in \mathcal{C}(\bar{A}, \bar{B})$;
(c) $\bar{\bar{A}} = A$, with $\eta_{\bar{A}}(\bar{a}, b) := \eta_A(a, \bar{b})$.

Remark A1. (1) The chosen conjugate $\bar{A}$ for $A \in \mathcal{C}$ required by Condition (a) is equipped with a canonical
isomorphism $\gamma_A : A \simeq \bar{A}$, with $\bar{x} = \gamma_A(x)$ for every $x \in X(A)$. As discussed in Section 4, this extends to an
order-isomorphism $\mathbf{E}(A) \simeq \mathbf{E}(\bar{A})$, which we again write as $\gamma_A(a) = \bar{a}$ for $a \in \mathbf{E}(A)$. Notice, however, that $\gamma_A$
is not assumed to be a morphism in $\mathcal{C}$.
(2) In spite of this, Condition (b) requires that $\bar{\phi} = \gamma_B \circ \phi \circ \gamma_A^{-1}$ does belong to $\mathcal{C}(\bar{A}, \bar{B})$ for $\phi \in \mathcal{C}(A, B)$.
Notice here that $\phi \mapsto \bar{\phi}$ is functorial.
(3) The second part of Condition (c) is redundant if every model $A$ in $\mathcal{C}$ is sharp (since in this case, there is
at most one correlator between $A$ and $\bar{A}$). Notice, too, that Condition (c) implies that:

$$\langle \bar{x}, \bar{y} \rangle = \eta_{\bar{A}}(\bar{x}, y) = \eta_A(x, \bar{y}) = \langle x, y \rangle$$

for all $x, y \in \mathbf{E}(A)$.


We are now ready to prove Theorem 4. We continue to assume that $\mathcal{C}$ is a locally-tomographic,
multiplicative monoidal probabilistic theory, satisfying Conditions (a), (b) and (c) above. We wish to
show that if every $A \in \mathcal{C}$ is sharp and spectral, then $\mathcal{C}$ has a canonical dagger, with respect to which it
is dagger-compact.
Before proceeding, it will be convenient to dualize our representation of morphisms, so that
$\phi \in \mathcal{C}(A, B)$ means that $\phi$ is a positive linear mapping $\mathbf{E}(B) \to \mathbf{E}(A)$ (thus, our co-unit $\eta_A \in \mathcal{C}(I, A \otimes \bar{A})$
becomes a positive linear mapping $\eta_A : \mathbf{E}(A \otimes \bar{A}) \to \mathbb{R}$, and similarly, a unit $\epsilon_A \in \mathcal{C}(\bar{A} \otimes A, I)$
becomes a positive linear mapping $\mathbb{R} \to \mathbf{E}(\bar{A} \otimes A)$, i.e., an element of $\mathbf{E}(\bar{A} \otimes A)$). By Lemma 1,
for every $A \in \mathcal{C}$, the space $\mathbf{E}(A)$ carries a canonical self-dualizing inner product $\langle\,\cdot\,, \cdot\,\rangle_A$, with respect to
which $\mathbf{E}(A) \simeq \mathbf{V}(A)$.

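Lemma A4. For all models $A, B \in \mathcal{C}$, the inner product on $\mathbf{E}(AB)$ factors, in the sense that if $u, x \in \mathbf{E}(A)$
and $v, y \in \mathbf{E}(B)$, then $\langle u \otimes v, x \otimes y \rangle = \langle u, x \rangle \langle v, y \rangle$.

Proof. This follows from the sharpness of $A$, $B$ and $AB$. For $u \in X(A)$, $v \in X(B)$, let $\delta_u$, $\delta_v$ and $\delta_{u \otimes v}$
denote the unique states of $A$, $B$ and $AB$ such that $\delta_u(u) = \delta_v(v) = \delta_{u \otimes v}(u \otimes v) = 1$. Since
$(\delta_u \otimes \delta_v)(u \otimes v)$ is also one, we conclude that $\delta_{u \otimes v} = \delta_u \otimes \delta_v$. However, we also have $\delta_u(x) = n \langle \hat{u}, \hat{x} \rangle$,
$\delta_v(y) = m \langle \hat{v}, \hat{y} \rangle$ and $\delta_{u \otimes v}(x \otimes y) = nm \langle \hat{u} \otimes \hat{v}, \hat{x} \otimes \hat{y} \rangle$, where $n$, $m$ and $nm$ are the ranks, respectively, of
$A$, $B$ and $A \otimes B$. This establishes the claim.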

It follows that $\mathcal{C}$ is a monoidal subcategory of $\mathbf{FdHilb}_{\mathbb{R}}$. In effect, we are going to show that
$\mathcal{C}$ inherits a dagger-compact structure from $\mathbf{FdHilb}_{\mathbb{R}}$, with the minor twist that we will take $\bar{A}$,
rather than $A$, as the dual for $A \in \mathcal{C}$. We define the dagger of $\phi \in \mathcal{C}(A, B)$ to be the Hermitian adjoint
of $\phi : \mathbf{E}(A) \to \mathbf{E}(B)$ with respect to the canonical inner products on $\mathbf{E}(A)$ and $\mathbf{E}(B)$. At this point, it is
not obvious that $\phi^{\dagger}$ belongs to $\mathcal{C}$. In order to show that it does, we first need to show that $\mathcal{C}$ is compact
closed. To define the unit, let $e_A \in \mathbf{E}(A) \otimes \mathbf{E}(\bar{A}) = \mathbf{E}(A\bar{A})$ (note the use of local tomography here)
be the vector with $\langle e_A, \cdot\, \rangle = \eta_A$, i.e., for all $a, b \in \mathbf{E}(A)$,

$$\langle e_A, a \otimes \bar{b} \rangle = \eta_A(a \otimes \bar{b}) = \langle a, b \rangle.$$

Since $\mathbf{E}(A\bar{A})$ is self-dual, $e_A \in \mathbf{E}(A\bar{A})_{+}$.

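Lemma A5. With $\eta_A$ and $e_A$ defined as above, $\bar{A}$ is a dual for $A$ for every $A \in \mathcal{C}$. In particular, $\mathcal{C}$ is
compact closed.

Proof. Choose an orthonormal basis $M \subseteq \mathbf{E}(A)$. Local tomography and Lemma A4 tell us that
$M \otimes \bar{M} = \{ a \otimes \bar{b} \mid a, b \in M \}$ is then an orthonormal basis for $\mathbf{E}(A\bar{A})$ (note here that $a, b \in M$ are not
necessarily even positive, let alone in $X(A)$). If we expand $e_A$ with respect to this basis, we have:

$$e_A = \sum_{a, b \in M} \langle e_A, a \otimes \bar{b} \rangle \, a \otimes \bar{b}.$$

Since the basis is orthonormal, we have:

$$\langle e_A, a \otimes \bar{a} \rangle = \langle a, a \rangle = \|a\|^2 = 1$$

and for $a \neq b$, both in $M$,

$$\langle e_A, a \otimes \bar{b} \rangle = \langle a, b \rangle = 0.$$

Hence, $e_A = \sum_{a \in M} a \otimes \bar{a}$. Regarding $e_A$ as a morphism $I \to A \otimes \bar{A}$, we now have, for any $v \in \mathbf{E}(A)$,

$$(\eta_A \otimes \mathrm{id}_A) \circ (\mathrm{id}_A \otimes e_A)(v) = (\eta_A \otimes \mathrm{id}_A) \sum_{a \in M} v \otimes \bar{a} \otimes a = \sum_{a \in M} \eta_A(v \otimes \bar{a}) \, a = \sum_{a \in M} \langle v, a \rangle a = v.$$

Similarly, for $v \in \mathbf{E}(\bar{A})$,

$$(\mathrm{id}_{\bar{A}} \otimes \eta_A) \circ (e_A \otimes \mathrm{id}_{\bar{A}})(v) = (\mathrm{id}_{\bar{A}} \otimes \eta_A) \sum_{a \in M} \bar{a} \otimes a \otimes v = \sum_{a \in M} \bar{a} \, \eta_A(a, v) = \sum_{a \in M} \langle \bar{a}, v \rangle \bar{a} = \sum_{a \in M} \langle v, \bar{a} \rangle \bar{a} = v.$$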

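Lemma A6. If $\phi : \mathbf{E}(A) \to \mathbf{E}(B)$ belongs to $\mathcal{C}$, then so does $\phi^{\dagger} : \mathbf{E}(B) \to \mathbf{E}(A)$.

Proof. Using the compact structure on $\mathcal{C}$ defined above, if $\phi : A \to B$, we construct the dual of $\bar{\phi}$,

$$\bar{\phi}' := (\eta_B \otimes \mathrm{id}_A) \circ (\mathrm{id}_B \otimes \bar{\phi} \otimes \mathrm{id}_A) \circ (\mathrm{id}_B \otimes e_A) : \mathbf{E}(B) \to \mathbf{E}(A).$$

Applying this mapping to $b \in \mathbf{E}(B)$, we have:

$$b \mapsto (\eta_B \otimes \mathrm{id}_A) \sum_{a \in M} b \otimes \overline{\phi(a)} \otimes a = \sum_{a \in M} \eta_B(b, \overline{\phi(a)}) \, a = \sum_{a \in M} \langle b, \phi(a) \rangle a = \sum_{a \in M} \langle \phi^{\dagger}(b), a \rangle a = \phi^{\dagger}(b).$$

Thus, $\phi^{\dagger} = \bar{\phi}'$, which is evidently a morphism in $\mathcal{C}$.
Thus, $\mathcal{C}$ is a dagger-, as well as a monoidal, sub-category of $\mathbf{FdHilb}_{\mathbb{R}}$. Hence, the associator, swap
and left- and right-unit morphisms associated with an object $A \in \mathcal{C}$ are all unitary (since they are
unitary in $\mathbf{FdHilb}_{\mathbb{R}}$), whence $\mathcal{C}$ is dagger-monoidal. To complete the proof of Theorem 4, we need to
check that $\eta_A = e_A^{\dagger} \circ \sigma_{A,\bar{A}} : \mathbf{E}(A\bar{A}) \to \mathbb{R}$. In view of our local tomography assumption, it is enough to
check this on pure tensors, where a routine computation gives us

$$e_A^{\dagger}(\sigma_{A,\bar{A}}(a \otimes \bar{b})) = \langle e_A^{\dagger}(\bar{b} \otimes a), 1 \rangle = \langle \bar{b} \otimes a, e_A \rangle_{\bar{A}A} = \langle a, b \rangle = \eta_A(a \otimes \bar{b}).$$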

Remark A2. Given that $\mathcal{C}$ is compact closed, with $\bar{A}$ the dual of $A$, the functoriality of $\phi \mapsto \bar{\phi}$ makes $\mathcal{C}$ strongly
compact closed, in the sense of [18]. This is equivalent to dagger-compactness.

References
1. Von Neumann, J. Mathematical Foundations of Quantum Mechanics; Princeton University Press: Princeton, NJ,
USA, 1955.
2. Schwinger, J. The algebra of microscopic measurement. Proc. Natl. Acad. Sci. USA 1959, 45, 1542–1553.


3. Mackey, G.W. Mathematical Foundations of Quantum Mechanics; Dover Publications, Inc.: Mineola, NY,
USA, 2004.
4. Ludwig, G. Foundations of Quantum Mechanics I; Springer: New York, NY, USA, 1983.
5. Piron, C. Mathematical Foundations of Quantum Mechanics; Academic Press: Cambridge, MA, USA, 1978.
6. Barnum, H.; Müller, M.; Ududec, C. Higher-order interference and single-system postulates characterizing
quantum theory. New J. Phys. 2014, 16, 123029.
7. Hardy, L. Quantum theory from five reasonable axioms. arXiv 2001, arXiv:quant-ph/0101012.
8. Chiribella, G.; D’Ariano, M.; Perinotti, P. Informational derivation of quantum theory. Phys. Rev. A 2011, 84,
012311.
9. Dakic, B.; Brukner, C. Quantum theory and beyond: Is entanglement special? arXiv 2009, arXiv:0911.0695.
10. Masanes, L.; Müller, M. A derivation of quantum theory from physical requirements. New J. Phys. 2011,
13, 063001.
11. Baez, J. Division algebras and quantum theory. Found. Phys. 2012, 42, 819–855.
12. Janotta, P.; Lal, R. Generalized probabilistic theories without the no-restriction hypothesis. Phys. Rev. A 2013,
87, 052131.
13. Faraut, J.; Koranyi, A. Analysis on Symmetric Cones; Oxford University Press: London, UK, 1994.
14. Wilce, A. 4.5 axioms for finite-dimensional quantum probability. In Probability in Physics; Ben-Menahem, Y.,
Hemmo, M., Eds.; Springer: New York, NY, USA, 2012.
15. Wilce, A. Symmetry and composition in probabilistic theories. Electron. Notes Theor. Comput. Sci. 2011, 270,
191–207.
16. Wilce, A. Symmetry, self-duality and the Jordan structure of finite-dimensional quantum mechanics. arXiv
2011, arXiv:1110.6607.
17. Wilce, A. Conjugates, Filters and Quantum Mechanics. arXiv 2012, arXiv:1206.2897.
18. Abramsky, S.; Coecke, B. Abstract Physical Traces. Theor. Appl. Categories 2005, 14, 111–124.
19. Barnum, H.; Graydon, M.A.; Wilce, A. Some nearly quantum theories. arXiv 2015, arXiv:1507.06278.
20. Jordan, P. Über eine Klasse nichtassoziativer hyperkomplexer Algebren. Nachr. Akad. Wiss. Göttingen Math.
Phys. Kl. I. 1933, 33, 569–575. (In German)
21. Von Neumann, J. On an algebraic generalization of the quantum mechanical formalism (Part I). Ann. Math.
1936, 1, 415–484.
22. Aliprantis, C.D.; Tourky, R. Cones and Duality; American Mathematical Society: Providence, RI, USA, 2007.
23. Foulis, D.J.; Randall, C.H. Empirical logic and tensor products. In Interpretations and Foundations of Quantum
Theory; Neumann, H., Ed.; Bibliographisches Inst.: Mannheim, Germany, 1981.
24. Barnum, H.; Wilce, A. Post-classical probability theory. In Quantum Theory: Informational Foundations and
Foils; Chiribella, G., Spekkens, R., Eds.; Springer: Dordrecht, The Netherlands, 2016.
25. Popescu, S.; Rohrlich, D. Nonlocality as an axiom. Found. Phys. 1994, 24, 379–385.
26. Gunson, J. On the algebraic structure of quantum mechanics. Commun. Math. Phys. 1967, 6, 262–285.
27. Chiribella, G.; Scandolo, C.M. Operational axioms for state diagonalization. arXiv 2015, arXiv:1506.00380.
28. Barnum, H.; Graydon, M.; Wilce, A. Composites and categories of Euclidean Jordan algebras. arXiv 2016,
arXiv:1606.09331.
29. Mac Lane, S. Categories for the Working Mathematician; Springer: New York, NY, USA, 1978.
30. Selinger, P. Dagger compact closed categories and completely positive maps. Electron. Notes Theor. Comput. Sci.
2007, 170, 139–163.

entropy
Article
Agents, Subsystems, and the Conservation
of Information
Giulio Chiribella 1,2,3,4
1 Department of Computer Science, University of Oxford, Parks Road, Oxford OX1 3QD, UK;
[email protected]
2 Canadian Institute for Advanced Research, CIFAR Program in Quantum Information Science,
661 University Ave, Toronto, ON M5G 1M1, Canada
3 Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong, China
4 HKU Shenzhen Institute of Research and Innovation, Yuexing 2nd Rd Nanshan, Shenzhen 518057, China

Received: 12 March 2018; Accepted: 5 May 2018; Published: 10 May 2018

Abstract: Dividing the world into subsystems is an important component of the scientific method.
The choice of subsystems, however, is not defined a priori. Typically, it is dictated by experimental
capabilities, which may be different for different agents. Here, we propose a way to define
subsystems in general physical theories, including theories beyond quantum and classical mechanics.
Our construction associates every agent A with a subsystem S A , equipped with its set of states and its
set of transformations. In quantum theory, this construction accommodates the notion of subsystems
as factors of a tensor product, as well as the notion of subsystems associated with a subalgebra of
operators. Classical systems can be interpreted as subsystems of quantum systems in different ways,
by applying our construction to agents who have access to different sets of operations, including
multiphase covariant channels and certain sets of free operations arising in the resource theory of
quantum coherence. After illustrating the basic definitions, we restrict our attention to closed systems,
that is, systems where all physical transformations act invertibly and where all states can be generated
from a fixed initial state. For closed systems, we show that all the states of all subsystems admit a
canonical purification. This result extends the purification principle to a broader setting, in which
coherent superpositions can be interpreted as purifications of incoherent mixtures.

Keywords: subsystem; agent; conservation of information; purification; group representations;
commuting subalgebras

1. Introduction
The composition of systems and operations is a fundamental primitive in our modelling of
the world. It has been investigated in depth in quantum information theory [1,2], and in the
foundations of quantum mechanics, where composition has played a key role from the early days
of Einstein–Podolsky–Rosen [3] and Schrödinger [4]. At the level of frameworks, the most recent
developments are the compositional frameworks of general probabilistic theories [5–15] and categorical
quantum mechanics [16–20].
The mathematical structure underpinning most compositional approaches is that of a
monoidal category [18,21]. Informally, a monoidal category describes circuits, in which wires represent
systems and boxes represent operations, as in the following diagram:

Entropy 2018, 20, 358; doi:10.3390/e20050358; www.mdpi.com/journal/entropy

[Circuit diagram: wires labelled A, B, and C, with boxes U and V acting on them.] (1)

The composition of systems is described by a binary operation denoted by ⊗, and referred to as
the “tensor product” (note that ⊗ is not necessarily a tensor product of vector spaces). The system
A ⊗ B is interpreted as the composite system made of subsystems A and B. Larger systems are built in
a bottom-up fashion, by combining subsystems together. For example, a quantum system of dimension
d = 2^n can arise from the composition of n single qubits.
In some situations, having a rigid decomposition into subsystems is neither the most convenient
nor the most natural approach. For example, in algebraic quantum field theory [22], it is natural
to start from a single system—the field—and then to identify subsystems, e.g., spatial or temporal
modes. The construction of the subsystems is rather flexible, as there is no privileged decomposition
of the field into modes. Another example of flexible decomposition into subsystems arises in quantum
information, where it is crucial to identify degrees of freedom that can be treated as “qubits”. Viola,
Knill, and Laflamme [23] and Zanardi, Lidar, and Lloyd [24] proposed that the partition of a system
into subsystems should depend on which operations are experimentally accessible. This flexible
definition of subsystem has been exploited in quantum error correction, where decoherence-free
subsystems are used to construct logical qubits that are untouched by noise [25–30]. The logical qubits
are described by “virtual subsystems” of the total Hilbert space [31], and in general such subsystems
are spread over many physical qubits. In all these examples, the subsystems are constructed through
an algebraic procedure, whereby the subsystems are associated with algebras of observables [32].
However, the notion of “algebra of observables” is less appealing in the context of general physical
theories, because the multiplication of two observables may not be defined. For example, in the
framework of general probabilistic theories [5–15], observables represent measurement procedures,
and there is no notion of “multiplication of two measurement procedures”.
In this paper, we propose a construction of subsystems that can be applied to general physical
theories, even in scenarios where observables and measurements are not included in the framework.
The core of our construction is to associate subsystems to sets of operations, rather than observables.
To fix ideas, it is helpful to think that the operations can be performed by some agent. Given a set
of operations, the construction extracts the degrees of freedom that are acted upon only by those
operations, identifying a “private space” that only the agent can access. Such a private space
then becomes the subsystem, equipped with its own set of states and its own set of operations.
This construction is closely related to an approach proposed by Krämer and del Rio, in which the
states of a subsystem are identified with equivalence classes of states of the global system [33]. In this
paper, we extend the equivalence relation to transformations, providing a complete description of the
subsystems. We illustrate the construction in several examples, including

1. quantum subsystems associated with the tensor product of two Hilbert spaces,
2. subsystems associated with a subalgebra of self-adjoint operators on a given Hilbert space,
3. classical subsystems of quantum systems,
4. subsystems associated with the action of a group representation on a given Hilbert space.

The example of the classical systems has interesting implications for the resource theory
of coherence [34–41]. Our construction implies that different types of agents, corresponding to
different choices of free operations, are associated with the same subsystem, namely the largest
classical subsystem of a given quantum system. Speciﬁcally, classical systems arise from strictly
incoherent operations [41], physically incoherent operations [38,39], phase covariant operations [38–40],
and multiphase covariant operations (to the best of our knowledge, multiphase covariant operations
have not been considered so far in the resource theory of coherence). Notably, we do not obtain classical
subsystems from the maximally incoherent operations [34] and from the incoherent operations [35,36],
which are the first two sets of free operations proposed in the resource theory of coherence. For these
two types of operations, we find that the associated subsystem is the whole quantum system.
After examining the above examples, we explore the general features of our construction.
An interesting feature is that certain properties, such as the impossibility of instantaneous signalling
between two distinct subsystems, follow directly from the construction, rather than being postulated as physical requirements.
This fact is potentially useful for the project of finding new axiomatizations of quantum theory [42–48]
because it suggests that some of the axioms assumed in the usual (compositional) framework may turn
out to be consequences of the very definition of subsystem. Leveraging this fact, one could hope to
find axiomatizations with a smaller number of axioms that pinpoint exactly the distinctive features of
quantum theory. In addition, our construction suggests a desideratum that every truly fundamental
axiom should arguably satisfy: an axiom for quantum theory should hold for all possible subsystems of
quantum systems. We call this requirement Consistency Across Subsystems. If one accepts our broad
definition of subsystems, then Consistency Across Subsystems is a very non-trivial requirement, which
is not easily satisfied. For example, the Subspace Axiom [5], stating that all systems with the same
number of distinguishable states are equivalent, does not satisfy Consistency Across Subsystems
because classical subsystems are not equivalent to the corresponding quantum systems, even if they
have the same number of distinguishable states.
In general, proving that Consistency Across Subsystems is satisfied may require great effort.
Rather than inspecting the existing axioms and checking whether or not they are consistent across
subsystems, one can try to formulate the axioms in a way that guarantees the validity of this property.
We illustrate this idea in the case of the Purification Principle [8,12,13,15,49–51], which is the key
ingredient in the quantum axiomatization of Refs. [13,15,42] and plays a central role in the axiomatic
foundation of quantum thermodynamics [52–54] and quantum information protocols [8,15,55–57].
Specifically, we show that the Purification Principle holds for closed systems, defined as systems where
all transformations are invertible, and where every state can be generated from a fixed initial state by
the action of a suitable transformation. Closed systems satisfy the Conservation of Information [58],
i.e., the requirement that physical dynamics should send distinct states to distinct states. Moreover,
the states of the closed systems can be interpreted as “pure”. In this setting, the general notion of
subsystem captures the idea of purification, and extends it to a broader setting, allowing us to regard
coherent superpositions as the “purifications” of classical probability distributions.
The paper is structured as follows. In Section 2, we outline related works. In Section 3, we present
the main framework and the construction of subsystems. The framework is illustrated with five
concrete examples in Section 4. In Section 5, we discuss the key structures arising from our construction,
such as the notion of partial trace and the validity of the no-signalling property. In Section 6, we identify
two requirements, concerning the existence of agents with non-overlapping sets of operations, and
the ability to generate all states from a given initial state. We also highlight the relation between the
second requirement and the notion of causality. We then move to systems satisfying the Conservation
of Information (Section 7) and we formalize an abstract notion of closed systems (Section 8). For such
systems, we provide a dynamical notion of pure states, and we prove that every subsystem satisfies the
Purification Principle (Section 9). A macro-example, dealing with group representations in quantum
theory is provided in Section 10. Finally, the conclusions are drawn in Section 11.

2. Related Works
In quantum theory, the canonical route to the definition of subsystems is to consider commuting
algebras of observables, associated with independent subsystems. The idea of defining independence
in terms of commutation has a long tradition in quantum field theory and, more recently, quantum
information theory. In algebraic quantum field theory [22], the local subsystems associated with
causally disconnected regions of spacetime are described by commuting C*-algebras. A closely related
approach is to associate quantum systems to von Neumann algebras, which can be characterized as
double commutants [59]. In quantum error correction, decoherence-free subsystems are associated
with the commutant of the noise operators [28,29,31]. In this context, Viola, Knill, and Laflamme [23]
and Zanardi, Lidar, and Lloyd [24] made the point that subsystems should be defined operationally,
in terms of the experimentally accessible operations. The canonical approach of associating subsystems
to subalgebras was further generalized by Barnum, Knill, Ortiz, and Viola [60,61], who proposed the
notion of generalized entanglement, i.e., entanglement relative to a subspace of operators. Later, Barnum,
Ortiz, Somma, and Viola explored this notion in the context of general probabilistic theories [62].
The above works provided a concrete model of subsystems that inspired the present work.
An important difference, however, is that here we will not use the notions of observable and expectation
value. In fact, we will not use any probabilistic notion, making our construction usable also in
frameworks where no notion of measurement is present. This makes the construction appealingly
simple, although the flip side is that more work will have to be done in order to recover the probabilistic
features that are built into other frameworks.
More recently, del Rio, Krämer, and Renner [63] proposed a general framework for representing
the knowledge of agents in general theories (see also the Ph.D. theses of del Rio [64] and Krämer [65]).
Krämer and del Rio further developed the framework to address a number of questions related to
locality, associating agents to monoids of operations, and introducing a relation, called convergence
through a monoid, among states of a global system [33]. Here, we will extend this relation to
transformations, and we will propose a general definition of subsystem, equipped with its set of
states and its set of transformations.
Another related work is the work of Brassard and Raymond-Robichaud on no-signalling and
local realism [66]. There, the authors adopt an equivalence relation on transformations, stating that
two transformations are equivalent iff they can be transformed into one another through composition
with a local reversible transformation. Such a relation is related to the equivalence relation on
transformations considered in this paper, in the case of systems satisfying the Conservation of
Information. It is interesting to observe that, notwithstanding the different scopes of Ref. [66] and
this paper, the Conservation of Information plays an important role in both. Ref. [66], along with
discussions with Gilles Brassard during QIP 2017 in Seattle, provided inspiration for the present paper.

3. Constructing Subsystems
Here, we outline the basic deﬁnitions and the construction of subsystems.

3.1. A Pre-Operational Framework

Our starting point is to consider a single system S, with a given set of states and a given set of
transformations. One could think of S as the whole universe, or, more modestly, as our “universe of
discourse”, representing the fragment of the world of which we have made a mathematical model.
We denote by St(S) the set of states of the system (sometimes called the “state space”), and by Transf(S)
the set of transformations the system can undergo. We assume that Transf(S) is equipped with
a composition operation ◦, which maps a pair of transformations A and B into the transformation
B ◦ A. The transformation B ◦ A is interpreted as the transformation occurring when B happens
right after A. We also assume that there exists an identity operation I_S, satisfying the condition
A ◦ I_S = I_S ◦ A = A for every transformation A ∈ Transf(S). In short, we assume that the physical
transformations form a monoid.
We do not assume any structure on the state space St(S): in particular, we do not assume that
St(S) is convex. We do assume, however, that there is an action of the monoid Transf(S) on the
set St(S): given an input state ψ ∈ St(S) and a transformation T ∈ Transf(S), the action of the
transformation produces the output state T ψ ∈ St(S).
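The assumptions of this subsection can be checked mechanically on a finite toy model. The sketch below is only an illustration and is not taken from the paper: it assumes a system with three states whose transformations are all the functions on those states, and the names `STATES`, `compose`, `act`, `is_monoid`, and `is_action` are hypothetical.

```python
from itertools import product

# Toy model (an assumption for illustration): three states, all maps between them.
STATES = (0, 1, 2)

# A transformation is stored as a tuple t, with t[x] the image of state x.
TRANSFORMATIONS = list(product(STATES, repeat=len(STATES)))
IDENTITY = tuple(STATES)


def compose(b, a):
    """Return b ∘ a, i.e. the map that applies a first and b second."""
    return tuple(b[a[x]] for x in STATES)


def act(t, state):
    """Action of the transformation t on a state."""
    return t[state]


def is_monoid(elements, identity):
    """Check closure under composition and the identity law."""
    elems = set(elements)
    closed = all(compose(b, a) in elems for a in elems for b in elems)
    unital = identity in elems and all(
        compose(a, identity) == a == compose(identity, a) for a in elems
    )
    return closed and unital


def is_action(elements):
    """Check (b ∘ a) x == b (a x) for all transformations and states."""
    return all(
        act(compose(b, a), x) == act(b, act(a, x))
        for a in elements
        for b in elements
        for x in STATES
    )


print(is_monoid(TRANSFORMATIONS, IDENTITY), is_action(TRANSFORMATIONS))
```

Associativity is not tested separately because composition of functions is always associative. Nothing in the sketch uses convexity or probabilities, in line with the minimal structure assumed in the text.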

Example 1 (Closed quantum systems). Let us illustrate the basic framework with a textbook example,
involving a closed quantum system evolving under unitary dynamics. Here, S is a quantum system of dimension
d, and the state space St(S) is the set of pure quantum states, represented as rays in the complex vector space $\mathbb{C}^d$,
or equivalently, as rank-one projectors. With this choice, we have

\[
\mathrm{St}(S) = \big\{ |\psi\rangle\langle\psi| \;:\; |\psi\rangle \in \mathbb{C}^d , \ \langle\psi|\psi\rangle = 1 \big\} . \tag{2}
\]

The physical transformations are represented by unitary channels, i.e., by maps of the form $|\psi\rangle\langle\psi| \mapsto U |\psi\rangle\langle\psi| U^\dagger$,
where $U \in M_d(\mathbb{C})$ is a unitary d-by-d matrix over the complex field. In short, we have

\[
\mathrm{Transf}(S) = \big\{ U \cdot U^\dagger \;:\; U \in M_d(\mathbb{C}) , \ U^\dagger U = U U^\dagger = I \big\} , \tag{3}
\]

where I is the d-by-d identity matrix. The physical transformations form a monoid, with the composition
operation induced by matrix multiplication, $(U \cdot U^\dagger) \circ (V \cdot V^\dagger) := (UV) \cdot (UV)^\dagger$.
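The composition rule of Example 1 can be verified numerically for a single qubit. The following is a minimal sketch in plain Python; the choice of the Hadamard and phase gates, the input state, and the helper names `matmul`, `dagger`, `unitary_channel`, and `close` are assumptions made for illustration only.

```python
import math


def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def dagger(a):
    """Conjugate transpose of a matrix."""
    return [
        [complex(a[j][i]).conjugate() for j in range(len(a))]
        for i in range(len(a[0]))
    ]


def unitary_channel(u):
    """Return the map rho -> U rho U†."""
    return lambda rho: matmul(matmul(u, rho), dagger(u))


def close(a, b, tol=1e-12):
    """Entrywise comparison of two matrices."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


s = 1 / math.sqrt(2)
U = [[s, s], [s, -s]]            # Hadamard gate
V = [[1, 0], [0, 1j]]            # phase gate
rho = [[0.5, 0.5], [0.5, 0.5]]   # rank-one projector on (|0> + |1>)/sqrt(2)

# (U · U†) ∘ (V · V†): V acts first, then U.
sequential = unitary_channel(U)(unitary_channel(V)(rho))
# The single channel induced by the matrix product UV.
combined = unitary_channel(matmul(U, V))(rho)

print(close(sequential, combined))
```

The output state is again a rank-one projector, so the set of pure states in Equation (2) is mapped into itself, as required for an action of the monoid on the state space.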

Example 2 (Open quantum systems). Generally, a quantum system can be in a mixed state and can undergo
an irreversible evolution. To account for this scenario, we must take the state space St(S) to be the set of all
density matrices. For a system of dimension d, this means that the state space is

\[
\mathrm{St}(S) = \big\{ \rho \in M_d(\mathbb{C}) \;:\; \rho \ge 0 , \ \mathrm{Tr}[\rho] = 1 \big\} , \tag{4}
\]

where $\mathrm{Tr}[\rho] = \sum_{n=1}^{d} \langle n|\rho|n\rangle$ denotes the matrix trace, and $\rho \ge 0$ means that the matrix ρ is positive semidefinite.
Transf(S) is the set of all quantum channels [67], i.e., the set of all linear, completely positive, and trace-preserving
maps from $M_d(\mathbb{C})$ to itself. The action of the quantum channel $\mathcal{T}$ on a generic state ρ can be specified through
the Kraus representation [68]

\[
\mathcal{T}(\rho) = \sum_{i=1}^{r} T_i \rho T_i^\dagger , \tag{5}
\]

where $\{T_i\}_{i=1}^{r} \subseteq M_d(\mathbb{C})$ is a set of matrices satisfying the condition $\sum_{i=1}^{r} T_i^\dagger T_i = I$. The composition of two
transformations $\mathcal{T}$ and $\mathcal{S}$ is given by the composition of the corresponding linear maps.
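As a concrete instance of the Kraus representation (5), one can take the qubit amplitude damping channel. This channel is not discussed in the text; it is used here only as a standard example, with a hypothetical damping probability `GAMMA = 0.25`, and the helper names are illustrative.

```python
import math


def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def dagger(a):
    """Conjugate transpose of a matrix."""
    return [
        [complex(a[j][i]).conjugate() for j in range(len(a))]
        for i in range(len(a[0]))
    ]


def mat_sum(mats):
    """Entrywise sum of a collection of equally sized matrices."""
    mats = list(mats)
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(m[i][j] for m in mats) for j in range(cols)] for i in range(rows)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def apply_channel(kraus, rho):
    """Kraus form: rho -> sum_i T_i rho T_i†."""
    return mat_sum(matmul(matmul(t, rho), dagger(t)) for t in kraus)


GAMMA = 0.25  # hypothetical damping probability
KRAUS = [
    [[1, 0], [0, math.sqrt(1 - GAMMA)]],
    [[0, math.sqrt(GAMMA)], [0, 0]],
]

# Normalization condition: sum_i T_i† T_i should equal the identity.
completeness = mat_sum(matmul(dagger(t), t) for t in KRAUS)

rho_in = [[0, 0], [0, 1]]               # pure state |1><1|
rho_out = apply_channel(KRAUS, rho_in)  # mixed state diag(GAMMA, 1 - GAMMA)

print(completeness, rho_out, trace(rho_out))
```

The pure input is mapped to a mixed output, which is the kind of irreversible evolution that Example 2 is meant to accommodate, while the trace stays equal to one.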

Note that, at this stage, there is no notion of measurement in the framework. The sets St(S) and
Transf(S) are meant as a model of system S irrespective of anybody’s ability to measure it, or even to
operate on it. For this reason, we call this layer of the framework pre-operational. One can think of the
pre-operational framework as the arena in which agents will act. Of course, the physical description of
such an arena might have been suggested by experiments done earlier on by other agents, but this fact
is inessential for the scope of our paper.

3.2. Agents
Let us introduce agents into the picture. In our framework, an agent A is identified with a set of
transformations, denoted as Act(A; S) and interpreted as the possible actions of A on S. Since the
actions must be allowed physical processes, the inclusion Act( A; S) ⊆ Transf (S) must hold. It is
natural, but not strictly necessary, to assume that the concatenation of two actions is a valid action,
and that the identity transformation is a valid action. When these assumptions are made, Act( A; S) is
a monoid. Still, the construction presented in the following will hold not only for monoids, but also for
generic sets Act( A; S). Hence, we adopt the following minimal deﬁnition:

Deﬁnition 1 (Agents). An agent A is identiﬁed by a subset Act( A; S) ⊆ Transf (S).

Note that this deﬁnition captures only one aspect of agency. Other aspects—such as the ability
to gather information, make decisions, and interact with other agents—are important too, but not
necessary for the scope of this paper.

We also stress that the interpretation of the subset Act( A; S) ⊆ Transf (S) as the set of actions of an
agent is not strictly necessary for the validity of our results. Nevertheless, the notion of “agent” here is
useful because it helps explain the rationale of our construction. The role of the agent is somewhat
similar to the role of a “probe charge” in classical electromagnetism. The probe charge need not exist in
reality, but helps—as a conceptual tool—to give operational meaning to the magnitude and direction
of the electric ﬁeld.
In general, the set of actions available to agent A may be smaller than the set of all physical
transformations on S. In addition, there may be other agents that act on system S independently of
agent A. We deﬁne the independence of actions in the following way:

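Definition 2. Agents A and B act independently if the order in which they act is irrelevant, namely

\[
\mathcal{A} \circ \mathcal{B} = \mathcal{B} \circ \mathcal{A} , \qquad \forall \mathcal{A} \in \mathrm{Act}(A; S) , \ \forall \mathcal{B} \in \mathrm{Act}(B; S) . \tag{6}
\]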

In a very primitive sense, the above relation expresses the fact that A and B act on “different
degrees of freedom” of the system.

Remark 1 (Commutation of transformations vs. commutation of observables). Commutation conditions
similar to Equation (6) are of fundamental importance in quantum field theory, where they are known under
the names of “Einstein causality” [69] and “Microcausality” [70]. However, the similarity should not mislead
the reader. The field-theoretic conditions are expressed in terms of operator algebras: the requirement is that the
operators associated with independent systems commute. For example, a system localized in a certain region
could be associated with the operator algebra $\mathsf{A}$, and another system localized in another region could be associated
with the operator algebra $\mathsf{B}$. In this situation, the commutation condition reads

\[
C D = D C \qquad \forall C \in \mathsf{A} , \ \forall D \in \mathsf{B} . \tag{7}
\]

In contrast, Equation (6) is a condition on the transformations, and not on the observables, which are not
even described by our framework. In quantum theory, Equation (6) is a condition on the completely positive
maps, and not on the elements of the algebras $\mathsf{A}$ and $\mathsf{B}$. In Section 4, we will bridge the gap between our
framework and the usual algebraic framework, focussing on the scenario where $\mathsf{A}$ and $\mathsf{B}$ are finite-dimensional
von Neumann algebras.

3.3. Adversaries and Degradation

From the point of view of agent A, it is important to identify the degrees of freedom that no other
agent B can affect. In an adversarial setting, agent B can be viewed as an adversary that tries to control
as much of the system as possible.

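Definition 3 (Adversary). Let A be an agent and let Act(A; S) be her set of operations. An adversary of A
is an agent B that acts independently of A, i.e., an agent B whose set of actions satisfies

\[
\mathrm{Act}(B; S) \subseteq \mathrm{Act}(A; S)' := \big\{ \mathcal{B} \in \mathrm{Transf}(S) \;:\; \mathcal{B} \circ \mathcal{A} = \mathcal{A} \circ \mathcal{B} , \ \forall \mathcal{A} \in \mathrm{Act}(A; S) \big\} . \tag{8}
\]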

Like the agent, the adversary is a conceptual tool, which will be used to illustrate our notion of
subsystem. The adversary need not be a real physical entity, localized outside the agent’s laboratory,
and trying to counteract the agent’s actions. Mathematically, the adversary is just a subset of the
commutant of Act(A; S). The interpretation of B as an “adversary” is a way to “give life to the
mathematics”, and to illustrate the rationale of our construction.

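When B is interpreted as an adversary, we can think of his actions as a “degradation”, which
compromises states and transformations. We denote the degradation relation as $\succeq_B$, and write

\[
\phi \succeq_B \psi \quad \text{iff} \quad \exists\, \mathcal{B} \in \mathrm{Act}(B; S) : \ \psi = \mathcal{B} \phi , \tag{9}
\]

\[
\mathcal{S} \succeq_B \mathcal{T} \quad \text{iff} \quad \exists\, \mathcal{B}_1 , \mathcal{B}_2 \in \mathrm{Act}(B; S) : \ \mathcal{T} = \mathcal{B}_1 \circ \mathcal{S} \circ \mathcal{B}_2 \tag{10}
\]

for $\phi, \psi \in \mathrm{St}(S)$ or $\mathcal{S}, \mathcal{T} \in \mathrm{Transf}(S)$.

The states that can be obtained by degrading ψ will be denoted as

\[
\mathrm{Deg}_B(\psi) := \big\{ \mathcal{B} \psi \;:\; \mathcal{B} \in \mathrm{Act}(B; S) \big\} . \tag{11}
\]

The transformations that can be obtained by degrading $\mathcal{T}$ will be denoted as

\[
\mathrm{Deg}_B(\mathcal{T}) := \big\{ \mathcal{B}_1 \circ \mathcal{T} \circ \mathcal{B}_2 \;:\; \mathcal{B}_1 , \mathcal{B}_2 \in \mathrm{Act}(B; S) \big\} . \tag{12}
\]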

The more operations B can perform, the more powerful B will be as an adversary. The most
powerful adversary compatible with the independence condition (6) is the adversary that can
implement all transformations in the commutant of Act(A; S):

Definition 4. The maximal adversary of agent A is the agent A′ that can perform the actions
$\mathrm{Act}(A'; S) := \mathrm{Act}(A; S)'$.

Note that the actions of the maximal adversary automatically form a monoid, even if the set
Act(A; S) does not. Indeed,

• the identity map $\mathcal{I}_S$ commutes with all operations in Act(A; S), and
• if $\mathcal{B}$ and $\mathcal{B}'$ commute with every operation in Act(A; S), then their composition $\mathcal{B} \circ \mathcal{B}'$ also
commutes with all the operations in Act(A; S).
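The commutant construction behind Definitions 2 to 4 can be explored by brute force on a small classical toy model. The sketch below is an illustration under stated assumptions, not the paper's construction in full generality: the system consists of two classical bits, Transf(S) is taken to be the set of all 256 functions on the four states, and the agent A is assumed to perform every function on the first bit. All names (`on_first`, `on_second`, `commutant`, `is_monoid`) are hypothetical.

```python
from itertools import product

BITS = (0, 1)
STATES = list(product(BITS, BITS))             # states (a, b) of two classical bits
INDEX = {s: i for i, s in enumerate(STATES)}
N = len(STATES)

ALL_MAPS = list(product(range(N), repeat=N))   # toy Transf(S): every function on the states
IDENTITY = tuple(range(N))
BIT_MAPS = list(product(BITS, repeat=2))       # the four functions on a single bit


def compose(b, a):
    """Return b ∘ a (a is applied first)."""
    return tuple(b[x] for x in a)


def on_first(f):
    """Lift a one-bit function f to the first bit, leaving the second untouched."""
    return tuple(INDEX[(f[a], b)] for (a, b) in STATES)


def on_second(h):
    """Lift a one-bit function h to the second bit, leaving the first untouched."""
    return tuple(INDEX[(a, h[b])] for (a, b) in STATES)


def commutant(actions):
    """All transformations that commute with every element of `actions`."""
    return {g for g in ALL_MAPS if all(compose(g, f) == compose(f, g) for f in actions)}


def is_monoid(elements):
    """Identity and closure under composition."""
    return IDENTITY in elements and all(
        compose(b, a) in elements for a in elements for b in elements
    )


ACT_A = {on_first(f) for f in BIT_MAPS}   # actions of agent A
ACT_A_PRIME = commutant(ACT_A)            # actions of the maximal adversary A'

print(len(ACT_A_PRIME), is_monoid(ACT_A_PRIME))
```

In this toy model the maximal adversary turns out to be exactly the agent acting on the second bit, and taking the commutant once more returns the original set of actions. The commutant of a set that is not itself a monoid, such as the single bit flip on the first bit, is still a monoid, as the two bullet points above state.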

In the following, we will use the maximal adversary to deﬁne the subsystem associated
with agent A.

3.4. The States of the Subsystem

Given an agent A, we think of the subsystem $S_A$ as the collection of all degrees of freedom
that are unaffected by the action of the maximal adversary A′. Consistently with this intuitive picture,
we partition the states of S into disjoint subsets, with the interpretation that two states are in the same
subset if and only if they correspond to the same state of subsystem $S_A$.
We denote by $\Lambda_\psi$ the subset of St(S) containing the state ψ. To construct the state space of the
subsystem, we adopt the following rule:

Rule 1. If the state ψ is obtained from the state φ through degradation, i.e., if $\psi \in \mathrm{Deg}_{A'}(\phi)$, then ψ and φ
must correspond to the same state of subsystem $S_A$, i.e., one must have $\Lambda_\psi = \Lambda_\phi$.

Rule 1 imposes that all states in the set $\mathrm{Deg}_{A'}(\psi)$ must be contained in the set $\Lambda_\psi$. Furthermore,
we have the following fact:

Proposition 1. If the sets $\mathrm{Deg}_{A'}(\phi)$ and $\mathrm{Deg}_{A'}(\psi)$ have non-empty intersection, then $\Lambda_\phi = \Lambda_\psi$.

Proof. By Rule 1, every element of $\mathrm{Deg}_{A'}(\phi)$ is contained in $\Lambda_\phi$. Similarly, every element of $\mathrm{Deg}_{A'}(\psi)$
is contained in $\Lambda_\psi$. Hence, if $\mathrm{Deg}_{A'}(\phi)$ and $\mathrm{Deg}_{A'}(\psi)$ have non-empty intersection, then also $\Lambda_\phi$ and
$\Lambda_\psi$ have non-empty intersection. Since the sets $\Lambda_\phi$ and $\Lambda_\psi$ belong to a disjoint partition, we conclude
that $\Lambda_\phi = \Lambda_\psi$.

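Generalizing the above argument, it is clear that two states φ and ψ must be in the same subset
$\Lambda_\phi = \Lambda_\psi$ if there exists a finite sequence $(\psi_1, \psi_2, \dots, \psi_n) \subseteq \mathrm{St}(S)$ such that

\[
\psi_1 = \phi , \quad \psi_n = \psi , \quad \text{and} \quad \mathrm{Deg}_{A'}(\psi_i) \cap \mathrm{Deg}_{A'}(\psi_{i+1}) \neq \emptyset \quad \forall i \in \{1, 2, \dots, n-1\} . \tag{13}
\]

When this is the case, we write $\phi \simeq_A \psi$. Note that $\simeq_A$ is an equivalence relation: reflexivity holds
because the identity belongs to $\mathrm{Act}(A'; S)$, symmetry follows by reversing the sequence, and transitivity
by concatenating two sequences. When the relation $\phi \simeq_A \psi$ holds, we say that φ and ψ are equivalent
for agent A. We denote the equivalence class of the state ψ by $[\psi]_A$.
By Rule 1, the whole equivalence class $[\psi]_A$ must be contained in the set $\Lambda_\psi$, meaning that all
states in the equivalence class must correspond to the same state of subsystem $S_A$. Since we are not
constrained by any other condition, we make the minimal choice

\[
\Lambda_\psi := [\psi]_A . \tag{14}
\]

In summary, the state space of system $S_A$ is

\[
\mathrm{St}(S_A) := \big\{ [\psi]_A \;:\; \psi \in \mathrm{St}(S) \big\} . \tag{15}
\]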
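The passage from degradation sets to the state space of the subsystem can be sketched on a toy model of two classical bits. The model is an assumption made for illustration: Transf(S) is the set of all functions on the four states, and agent A is assumed to perform all functions on the first bit. The helper names are hypothetical, and the equivalence classes are computed as connected components of the relation "degradation sets overlap", which is one straightforward reading of Equation (13).

```python
from itertools import product

BITS = (0, 1)
STATES = list(product(BITS, BITS))             # states (a, b) of two classical bits
INDEX = {s: i for i, s in enumerate(STATES)}
N = len(STATES)
ALL_MAPS = list(product(range(N), repeat=N))   # toy Transf(S): every function on the states
BIT_MAPS = list(product(BITS, repeat=2))       # the four functions on a single bit


def compose(b, a):
    """Return b ∘ a (a is applied first)."""
    return tuple(b[x] for x in a)


def on_first(f):
    """Lift a one-bit function f to the first bit, leaving the second untouched."""
    return tuple(INDEX[(f[a], b)] for (a, b) in STATES)


def commutant(actions):
    """All transformations that commute with every element of `actions`."""
    return {g for g in ALL_MAPS if all(compose(g, f) == compose(f, g) for f in actions)}


def degradation_set(adversary, x):
    """Deg(x): every state reachable from state x by an action of the adversary."""
    return frozenset(b[x] for b in adversary)


def equivalence_classes(adversary):
    """Connected components of the relation 'degradation sets overlap'."""
    deg = [degradation_set(adversary, x) for x in range(N)]
    unassigned, classes = set(range(N)), []
    while unassigned:
        start = min(unassigned)
        component, frontier = {start}, [start]
        while frontier:
            x = frontier.pop()
            for y in range(N):
                if y not in component and deg[x] & deg[y]:
                    component.add(y)
                    frontier.append(y)
        classes.append(frozenset(STATES[i] for i in component))
        unassigned -= component
    return classes


ACT_A = {on_first(f) for f in BIT_MAPS}
MAX_ADVERSARY = commutant(ACT_A)
SUBSYSTEM_STATES = equivalence_classes(MAX_ADVERSARY)

print(SUBSYSTEM_STATES)
```

In this toy model the subsystem has two states, one for each value of the first bit: the value of the second bit is exactly the information that the maximal adversary can scramble.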

3.5. The Transformations of a Subsystem

The transformations of system $S_A$ can also be constructed through equivalence classes. Before
taking equivalence classes, however, we need a candidate set of transformations that can be interpreted
as acting exclusively on subsystem $S_A$. The largest candidate set is the set of all transformations that
commute with the actions of the maximal adversary A′, namely

\[
\mathrm{Act}(A'; S)' = \mathrm{Act}(A; S)'' . \tag{16}
\]

In general, $\mathrm{Act}(A; S)''$ could be larger than Act(A; S), in agreement with the fact that the set of physical
transformations of system $S_A$ could be larger than the set of operations that agent A can perform.
For example, agent A could have access only to noisy operations, while another, more technologically
advanced agent could perform more accurate operations on the same subsystem.
For two transformations $\mathcal{S}$ and $\mathcal{T}$ in $\mathrm{Act}(A; S)''$, the degradation relation $\succeq_{A'}$ takes the simple form

\[
\mathcal{S} \succeq_{A'} \mathcal{T} \quad \text{iff} \quad \mathcal{T} = \mathcal{B} \circ \mathcal{S} \ \text{ for some } \mathcal{B} \in \mathrm{Act}(A'; S) , \tag{17}
\]

because $\mathcal{S}$ commutes with every action of A′, so that $\mathcal{B}_1 \circ \mathcal{S} \circ \mathcal{B}_2 = (\mathcal{B}_1 \circ \mathcal{B}_2) \circ \mathcal{S}$ with $\mathcal{B}_1 \circ \mathcal{B}_2 \in \mathrm{Act}(A'; S)$.
As we did for the set of states, we now partition the set $\mathrm{Act}(A; S)''$ into disjoint subsets, with the
interpretation that two transformations act in the same way on the subsystem $S_A$ if and only if they
belong to the same subset.
Let us denote by $\Theta_{\mathcal{A}}$ the subset containing the transformation $\mathcal{A}$. To find the appropriate partition
of $\mathrm{Act}(A; S)''$ into disjoint subsets, we adopt the following rule:

Rule 2. If the transformation $\mathcal{T} \in \mathrm{Act}(A; S)''$ is obtained from the transformation $\mathcal{S} \in \mathrm{Act}(A; S)''$ through
degradation, i.e., if $\mathcal{T} \in \mathrm{Deg}_{A'}(\mathcal{S})$, then $\mathcal{T}$ and $\mathcal{S}$ must act in the same way on the subsystem $S_A$, i.e., they must
satisfy $\Theta_{\mathcal{T}} = \Theta_{\mathcal{S}}$.

Intuitively, the motivation for the above rule is that system $S_A$ is defined as the system that is not
affected by the action of the adversary.
Rule 2 implies that all transformations in $\mathrm{Deg}_{A'}(\mathcal{T})$ must be contained in $\Theta_{\mathcal{T}}$. Moreover, we have
the following:

Proposition 2. If the sets $\mathrm{Deg}_{A'}(\mathcal{S})$ and $\mathrm{Deg}_{A'}(\mathcal{T})$ have non-empty intersection, then $\Theta_{\mathcal{S}} = \Theta_{\mathcal{T}}$.

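Proof. By Rule 2, every element of $\mathrm{Deg}_{A'}(\mathcal{S})$ is contained in $\Theta_{\mathcal{S}}$. Similarly, every element of $\mathrm{Deg}_{A'}(\mathcal{T})$
is contained in $\Theta_{\mathcal{T}}$. Hence, if $\mathrm{Deg}_{A'}(\mathcal{S})$ and $\mathrm{Deg}_{A'}(\mathcal{T})$ have non-empty intersection, then also $\Theta_{\mathcal{S}}$ and
$\Theta_{\mathcal{T}}$ have non-empty intersection. Since the sets $\Theta_{\mathcal{S}}$ and $\Theta_{\mathcal{T}}$ belong to a disjoint partition, we conclude
that $\Theta_{\mathcal{S}} = \Theta_{\mathcal{T}}$.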

Using the above proposition, we obtain that the equality ΘT = ΘS holds whenever there exists a
ﬁnite sequence (A1 , A2 , . . . , An ) ⊆ Act( A; S) such that

A1 = S , An = T , and Deg A (Ai ) ∩ Deg A (Ai+1 )

= ∅ ∀i ∈ {1, 2, . . . , n − 1} . (18)

When the above relation is satisﬁed, we write S A T and we say that S and T are equivalent for
agent A. It is immediate to check that A is an equivalence relation. We denote the equivalence class
of the transformation T ∈ Act( A; S) as [T ] A .
By Rule 2, all the elements of [T ] A must be contained in the set ΘT , i.e., they should correspond
to the same transformation on S A . Again, we make the minimal choice: we stipulate that the set ΘT
coincides exactly with the equivalence class [T ] A . Hence, the transformations of subsystem S A are

$$ \mathsf{Transf}(S_A) := \left\{ [\mathcal{T}]_A : \mathcal{T} \in \mathsf{Act}(A; S) \right\} . \tag{19} $$

The composition of two transformations [T1 ] A and [T2 ] A is deﬁned in the obvious way, namely

[T1 ] A ◦ [T2 ] A := [T1 ◦ T2 ] A . (20)

Similarly, the action of the transformations on the states is deﬁned as

[T ] A [ψ] A := [T ψ] A . (21)

In Appendix A, we show that deﬁnitions (20) and (21) are well-posed, in the sense that their
right-hand sides are independent of the choice of representatives within the equivalence classes.

Remark 1. It is important not to confuse the transformation T ∈ Act( A; S) with the equivalence class
[T ] A : the former is a transformation on the whole system S, while the latter is a transformation only on
subsystem S A . To keep track of the distinction, we deﬁne the restriction of the transformation T ∈ Act( A; S)
to the subsystem S A via the map

π A (T ) := [T ] A . (22)

Proposition 3. The restriction map π A : Act( A; S) → Transf (S A ) is a monoid homomorphism, namely
π A (IS ) = IS A and π A (S ◦ T ) = π A (S) ◦ π A (T ) for every pair of transformations S , T ∈ Act( A; S) .

Proof. Immediate from the deﬁnition (20).

4. Examples of Agents, Adversaries, and Subsystems

In this section, we illustrate the construction of subsystems in ﬁve concrete examples.

4.1. Tensor Product of Two Quantum Systems

Let us start from the obvious example, which will serve as a sanity check for the soundness of our
construction. Let S be a quantum system with Hilbert space HS = H A ⊗ H B . The states of S are all the
density operators on the Hilbert space HS . The space of all linear operators from HS to itself will be
denoted as Lin(HS ), so that


$$ \mathsf{St}(S) = \left\{ \rho \in \mathsf{Lin}(\mathcal{H}_S) : \rho \ge 0 , \ \mathrm{Tr}[\rho] = 1 \right\} . \tag{23} $$

The transformations are all the quantum channels (linear, completely positive, and trace-preserving
maps) from Lin(H_S) to itself. We will denote the set of all channels on system S as Chan(S).
Similarly, we will use the notation Lin(H A ) [Lin(H B )] for the spaces of linear operators from H A
[H B ] to itself, and the notation Chan( A) [Chan( B)] for the quantum channels from Lin(H A ) [Lin(H B )]
to itself.
We can now deﬁne an agent A whose actions are all quantum channels acting locally on
system A, namely

$$ \mathsf{Act}(A; S) := \left\{ \mathcal{A} \otimes \mathcal{I}_B : \mathcal{A} \in \mathsf{Chan}(A) \right\} , \tag{24} $$

where I_B denotes the identity map on Lin(H_B). It is relatively easy to see that the commutant of
Act(A; S) is

$$ \mathsf{Act}(A; S)' = \left\{ \mathcal{I}_A \otimes \mathcal{B} : \mathcal{B} \in \mathsf{Chan}(B) \right\} \tag{25} $$

(see Appendix B for the proof). Hence, the maximal adversary of agent A is the adversary A′ = B that
has full control on the Hilbert space H_B. Note also that one has Act(A; S)′′ = Act(A; S).
Now, the following fact holds:

Proposition 4. Two states ρ, σ ∈ St(S) are equivalent for agent A if and only if TrB [ρ] = TrB [σ], where TrB
denotes the partial trace over the Hilbert space H B .

Proof. Suppose that the equivalence ρ ≃_A σ holds. By definition, this means that there exists a finite
sequence (ρ_1, ρ_2, . . . , ρ_n) such that

$$ \rho_1 = \rho , \quad \rho_n = \sigma , \quad \text{and} \quad \mathrm{Deg}_B(\rho_i) \cap \mathrm{Deg}_B(\rho_{i+1}) \neq \emptyset \quad \forall i \in \{1, 2, \dots, n-1\} . \tag{26} $$

In turn, the condition of non-trivial intersection implies that, for every i ∈ {1, 2, . . . , n − 1}, one has

$$ (\mathcal{I}_A \otimes \mathcal{B}_i)(\rho_i) = (\mathcal{I}_A \otimes \widetilde{\mathcal{B}}_i)(\rho_{i+1}) , \tag{27} $$

where B_i and B̃_i are two quantum channels in Chan(B). Since B_i and B̃_i are trace-preserving, Equation (27)
implies Tr_B[ρ_i] = Tr_B[ρ_{i+1}], as one can see by taking the partial trace over H_B on both sides. In conclusion,
we obtain the equality Tr_B[ρ] ≡ Tr_B[ρ_1] = Tr_B[ρ_2] = · · · = Tr_B[ρ_n] ≡ Tr_B[σ].
Conversely, suppose that the condition TrB [ρ] = TrB [σ] holds. Then, one has

(I A ⊗ B0 ) (ρ) = (I A ⊗ B0 ) (σ) , (28)

where B0 ∈ Chan( B) is the erasure channel deﬁned as B0 (·) = β 0 TrB [·], β 0 being a ﬁxed (but otherwise
arbitrary) density matrix in Lin(H B ). Since I A ⊗ B0 is an element of Act( B; S), Equation (28) shows
that the intersection between Deg B (ρ) and Deg B (σ) is non-empty. Hence, ρ and σ correspond to the
same state of system S A .

We have seen that two global states ρ, σ ∈ St(S) are equivalent for agent A if and only if they
have the same partial trace over B. Hence, the state space of the subsystem S A is

$$ \mathsf{St}(S_A) = \left\{ \mathrm{Tr}_B[\rho] : \rho \in \mathsf{St}(S) \right\} , \tag{29} $$

consistently with the standard prescription of quantum mechanics.
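
As a numerical illustration of Proposition 4, the following minimal sketch (assuming NumPy and two qubits; the helper names `partial_trace_B` and `erase_B` are hypothetical, not part of the paper) checks that two different global states with the same marginal on A are mapped to the same state by the erasure channel on B, so that their degradation sets intersect.

```python
import numpy as np

def partial_trace_B(rho, dA, dB):
    """Partial trace over the second factor of H_A (x) H_B."""
    return np.trace(rho.reshape(dA, dB, dA, dB), axis1=1, axis2=3)

def erase_B(rho, beta0, dA, dB):
    """Erasure channel on B: (I_A (x) B_0)(rho) = Tr_B[rho] (x) beta0."""
    return np.kron(partial_trace_B(rho, dA, dB), beta0)

dA = dB = 2
# Maximally entangled state (|00> + |11>)/sqrt(2)
phi = np.array([1, 0, 0, 1]) / np.sqrt(2)
rho = np.outer(phi, phi.conj())
# Maximally mixed state of the two qubits
sigma = np.eye(4) / 4
# Fixed (but otherwise arbitrary) state of B used by the erasure channel
beta0 = np.diag([1.0, 0.0])

same_marginal = np.allclose(partial_trace_B(rho, dA, dB),
                            partial_trace_B(sigma, dA, dB))
same_after_erasure = np.allclose(erase_B(rho, beta0, dA, dB),
                                 erase_B(sigma, beta0, dA, dB))
print(same_marginal, same_after_erasure)
```

The two global states differ, yet both flags come out true, which is the numerical counterpart of Equation (28).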


Now, let us consider the transformations. It is not hard to show that two transformations
T , S ∈ Act( A; S) are equivalent if and only if TrB ◦T = TrB ◦S (see Appendix B for the details).
Recalling that the transformations in Act( A; S) are of the form A ⊗ I B , for some A ∈ Chan( A),
we obtain that the set of transformations of S A is

Transf (S A ) = Chan( A) . (30)

In summary, our construction correctly identiﬁes the quantum subsystem associated with the
Hilbert space H A , with the right set of states and the right set of physical transformations.

4.2. Subsystems Associated with Finite Dimensional Von Neumann Algebras

In this example, we show that our notion of subsystem encompasses the traditional notion of
subsystem based on an algebra of observables. For simplicity, we restrict our attention to a quantum
system S with finite dimensional Hilbert space H_S ≃ C^d, d < ∞. With this choice, the state space St(S)
is the set of all density matrices in Md (C) and the transformation monoid Transf (S) is the set of all
quantum channels (linear, completely positive, trace-preserving maps) from Md (C) to itself.
We now define an agent A associated with a von Neumann algebra A ⊆ Md (C). In the finite
dimensional setting, a von Neumann algebra is just a matrix algebra that contains the identity operator
and is closed under the matrix adjoint. Every such algebra can be decomposed in a block diagonal form.
Explicitly, one can decompose the Hilbert space H_S as

$$ \mathcal{H}_S = \bigoplus_k \left( \mathcal{H}_{A_k} \otimes \mathcal{H}_{B_k} \right) , \tag{31} $$

for appropriate Hilbert spaces H_{A_k} and H_{B_k}. Relative to this decomposition, the elements of the
algebra A are characterized as

$$ C \in \mathsf{A} \iff C = \bigoplus_k \left( C_k \otimes I_{B_k} \right) , \tag{32} $$

where C_k is an operator in Lin(H_{A_k}), and I_{B_k} is the identity on H_{B_k}. The elements of the commutant
algebra A′ are characterized as

$$ D \in \mathsf{A}' \iff D = \bigoplus_k \left( I_{A_k} \otimes D_k \right) , \tag{33} $$

where I_{A_k} is the identity on H_{A_k} and D_k is an operator in Lin(H_{B_k}).

We grant agent A the ability to implement all quantum channels with Kraus operators in the
algebra A, i.e., all quantum channels in the set

$$ \mathsf{Chan}(\mathsf{A}) := \left\{ \mathcal{C} \in \mathsf{Chan}(S) : \mathcal{C}(\cdot) = \sum_{i=1}^{r} C_i \cdot C_i^\dagger , \ C_i \in \mathsf{A} \ \forall i \in \{1, \dots, r\} \right\} . \tag{34} $$

The maximal adversary of agent A is the agent B who can implement all the quantum channels
that commute with the channels in Chan(A), namely

$$ \mathsf{Act}(B; S) = \mathsf{Chan}(\mathsf{A})' . \tag{35} $$

In Appendix C, we prove that Chan(A)′ coincides with the set of quantum channels with Kraus
operators in the commutant of the algebra A: in formula,

$$ \mathsf{Chan}(\mathsf{A})' = \mathsf{Chan}(\mathsf{A}') . \tag{36} $$


As in the previous example, the states of subsystem S_A can be characterized as “partial traces”
of the states in S, provided that one adopts the right definition of “partial trace”. Denoting the
commutant of the algebra A by B := A′, one can define the “partial trace over the algebra B” as the
channel Tr_B : Lin(H_S) → ⊕_k Lin(H_{A_k}) specified by the relation

$$ \mathrm{Tr}_{\mathsf{B}}(\rho) := \bigoplus_k \mathrm{Tr}_{B_k}\left[ \Pi_k \rho \Pi_k \right] , \tag{37} $$

where Π_k is the projector on the subspace H_{A_k} ⊗ H_{B_k} ⊆ H_S, and Tr_{B_k} denotes the partial trace over
the space H_{B_k}. With definition (37), it is not hard to see that two states are equivalent for A if and only if
they have the same partial trace over B:

Proposition 5. Two states ρ, σ ∈ St(S) are equivalent for A if and only if TrB [ρ] = TrB [σ].

The proof is provided in Appendix C. In summary, the states of system S_A are obtained from
the states of S via partial trace over B, namely

$$ \mathsf{St}(S_A) = \left\{ \mathrm{Tr}_{\mathsf{B}}(\rho) : \rho \in \mathsf{St}(S) \right\} . \tag{38} $$

Our construction is consistent with the standard algebraic construction, where the states of system
S A are deﬁned as restrictions of the global states to the subalgebra A: indeed, for every element C ∈ A,
we have the relation
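$$
\begin{aligned}
\mathrm{Tr}[C \rho] &= \mathrm{Tr}\left[ \left( \bigoplus_k C_k \otimes I_{B_k} \right) \rho \right] \\
&= \sum_k \mathrm{Tr}\left[ (C_k \otimes I_{B_k}) \, \Pi_k \rho \Pi_k \right] \\
&= \sum_k \mathrm{Tr}\left[ C_k \, \mathrm{Tr}_{B_k}[\Pi_k \rho \Pi_k] \right] \\
&= \mathrm{Tr}\left[ \check{C} \, \mathrm{Tr}_{\mathsf{B}}[\rho] \right] , \qquad \check{C} := \bigoplus_k C_k ,
\end{aligned}
\tag{39}
$$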

meaning that the restriction of the state ρ to the subalgebra A is in one-to-one correspondence with the
state TrB [ρ].
Alternatively, the states of subsystem S_A can be characterized as density matrices of the block
diagonal form

$$ \sigma = \bigoplus_k p_k \, \sigma_k , \tag{40} $$

where (p_k) is a probability distribution, and each σ_k is a density matrix in Lin(H_{A_k}). In Appendix C,
we characterize the transformations of the subsystem S_A as quantum channels A of the form

$$ \mathcal{A} = \bigoplus_k \mathcal{A}_k , \tag{41} $$

where A_k : Lin(H_{A_k}) → Lin(H_{A_k}) is a linear, completely positive, and trace-preserving map.

In summary, the subsystem S A is a direct sum of quantum systems.
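
The “partial trace over the algebra” of Equation (37) and the consistency relation (39) can be checked numerically. The sketch below assumes NumPy and a hypothetical block structure with two sectors of dimensions (2, 2) and (1, 3); the function name `algebra_partial_trace` and the random test data are illustrative choices only.

```python
import numpy as np

def algebra_partial_trace(rho, dims):
    """Blocks Tr_{B_k}[Pi_k rho Pi_k] of Eq. (37), for dims = [(dA_k, dB_k), ...]."""
    blocks, start = [], 0
    for dA, dB in dims:
        n = dA * dB
        sub = rho[start:start + n, start:start + n]  # Pi_k rho Pi_k restricted to sector k
        blocks.append(np.trace(sub.reshape(dA, dB, dA, dB), axis1=1, axis2=3))
        start += n
    return blocks

rng = np.random.default_rng(0)
dims = [(2, 2), (1, 3)]          # hypothetical sectors H_{A_k} (x) H_{B_k}
d = sum(dA * dB for dA, dB in dims)

# Random density matrix on the global system
G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = G @ G.conj().T
rho = rho / np.trace(rho).real

# Random element C = (+)_k (C_k (x) I_{B_k}) of the algebra, as in Eq. (32)
C_blocks = [rng.normal(size=(dA, dA)) for dA, _ in dims]
C = np.zeros((d, d), dtype=complex)
start = 0
for Ck, (dA, dB) in zip(C_blocks, dims):
    n = dA * dB
    C[start:start + n, start:start + n] = np.kron(Ck, np.eye(dB))
    start += n

reduced = algebra_partial_trace(rho, dims)
lhs = np.trace(C @ rho)                                             # Tr[C rho]
rhs = sum(np.trace(Ck @ Rk) for Ck, Rk in zip(C_blocks, reduced))   # Tr[C-check Tr_B[rho]]
weights = [np.trace(Rk).real for Rk in reduced]                     # the p_k of Eq. (40)
print(np.isclose(lhs, rhs), sum(weights))
```

The traces of the reduced blocks play the role of the probabilities p_k in Equation (40), and they sum to one because the projectors Π_k resolve the identity.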


4.3. Coherent Superpositions vs. Incoherent Mixtures in Closed-System Quantum Theory

We now analyze an example involving only pure states and reversible transformations. Let S
be a single quantum system with Hilbert space H_S = C^d, d < ∞, equipped with a distinguished
orthonormal basis {|n⟩}_{n=1}^{d}. As the state space, we consider the set of pure quantum states: in formula,

$$ \mathsf{St}(S) = \left\{ |\psi\rangle\langle\psi| : |\psi\rangle \in \mathbb{C}^d , \ \langle\psi|\psi\rangle = 1 \right\} . \tag{42} $$

As the set of transformations, we consider the set of all unitary channels: in formula,

$$ \mathsf{Transf}(S) = \left\{ U \cdot U^\dagger : U \in M_d(\mathbb{C}) , \ U^\dagger U = U U^\dagger = I \right\} . \tag{43} $$

To agent A, we grant the ability to implement all unitary channels corresponding to diagonal
unitary matrices, i.e., matrices of the form

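$$ U_\theta = \sum_k e^{i\theta_k} \, |k\rangle\langle k| , \qquad \theta = (\theta_1, \dots, \theta_d) \in [0, 2\pi)^{\times d} , \tag{44} $$

where each phase θ_k can vary independently of the other phases. In formula, the set of actions of
agent A is

$$ \mathsf{Act}(A; S) = \left\{ U_\theta \cdot U_\theta^\dagger : U_\theta \in \mathsf{Lin}(\mathcal{H}_S) , \ U_\theta \text{ as in Equation (44)} \right\} . \tag{45} $$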

The peculiarity of this example is that the actions of the maximal adversary A′ are exactly the
same as the actions of A. It is immediate to see that Act(A; S) is included in Act(A′; S) because all
operations of agent A commute. With a bit of extra work, one can see that, in fact, Act(A; S) and
Act(A′; S) coincide.
Let us look at the subsystem associated with agent A. The equivalence relation among states
takes a simple form:

Proposition 6. Two pure states with unit vectors |φ⟩, |ψ⟩ ∈ H_S are equivalent for A if and only if |ψ⟩ = U|φ⟩
for some diagonal unitary matrix U.

Proof. Suppose that there exists a finite sequence (|ψ_1⟩, |ψ_2⟩, . . . , |ψ_n⟩) such that

$$ |\psi_1\rangle = |\phi\rangle , \quad |\psi_n\rangle = |\psi\rangle , \quad \text{and} \quad \mathrm{Deg}_{A'}(|\psi_i\rangle\langle\psi_i|) \cap \mathrm{Deg}_{A'}(|\psi_{i+1}\rangle\langle\psi_{i+1}|) \neq \emptyset \quad \forall i \in \{1, 2, \dots, n-1\} . $$

This means that, for every i ∈ {1, . . . , n − 1}, there exist two diagonal unitary matrices U_i and Ũ_i
such that U_i|ψ_i⟩ = Ũ_i|ψ_{i+1}⟩, or equivalently,

$$ |\psi_{i+1}\rangle = \widetilde{U}_i^\dagger \, U_i \, |\psi_i\rangle . \tag{46} $$

Using the above relation for all values of i, we obtain |ψ⟩ = U|φ⟩ with

$$ U := \widetilde{U}_{n-1}^\dagger U_{n-1} \cdots \widetilde{U}_2^\dagger U_2 \, \widetilde{U}_1^\dagger U_1 . $$

Conversely, suppose that the condition |ψ⟩ = U|φ⟩ holds for some diagonal unitary matrix U.
Then, the intersection Deg_{A′}(|φ⟩⟨φ|) ∩ Deg_{A′}(|ψ⟩⟨ψ|) is non-empty, which implies that |φ⟩⟨φ| and
|ψ⟩⟨ψ| are in the same equivalence class.


Using Proposition 6, it is immediate to see that the equivalence class [|ψ⟩⟨ψ|]_A is uniquely
identified by the diagonal density matrix ρ = ∑_k |ψ_k|² |k⟩⟨k|. Hence, the state space of system S_A is
the set of diagonal density matrices

$$ \mathsf{St}(S_A) = \left\{ \rho = \sum_k p_k \, |k\rangle\langle k| : p_k \ge 0 \ \forall k , \ \sum_k p_k = 1 \right\} . \tag{47} $$

The set of transformations of system S_A is trivial because the actions of A coincide with the actions
of the adversary A′, and therefore they are all in the equivalence class of the identity transformation.
In formula, one has

$$ \mathsf{Transf}(S_A) = \left\{ \mathcal{I}_{S_A} \right\} . \tag{48} $$
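
The content of Proposition 6 and Equation (47) can be illustrated with a short numerical sketch, assuming NumPy and a randomly chosen state in dimension d = 4 (the helper `class_label` is a hypothetical name for the vector of moduli squared that labels an equivalence class). Diagonal unitaries leave the label unchanged, while a generic non-diagonal unitary, here the discrete Fourier transform, does not.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4

# Random unit vector |phi>
phi = rng.normal(size=d) + 1j * rng.normal(size=d)
phi = phi / np.linalg.norm(phi)

def class_label(vec):
    """Diagonal entries |psi_k|^2 that identify the equivalence class, cf. Eq. (47)."""
    return np.abs(vec) ** 2

# A diagonal unitary as in Eq. (44), with independent random phases
theta = rng.uniform(0.0, 2.0 * np.pi, size=d)
U_theta = np.diag(np.exp(1j * theta))
psi = U_theta @ phi
same_class = np.allclose(class_label(phi), class_label(psi))

# Conversely, equal moduli yield a diagonal unitary connecting the two vectors
# (all components of phi are nonzero for this random choice)
U_rec = np.diag(psi / phi)
reconstructed = (np.allclose(U_rec @ phi, psi)
                 and np.allclose(U_rec.conj().T @ U_rec, np.eye(d)))

# A non-diagonal unitary (discrete Fourier transform) changes the class
F = np.fft.fft(np.eye(d)) / np.sqrt(d)
changes_class = not np.allclose(class_label(F @ phi), class_label(phi))
print(same_class, reconstructed, changes_class)
```

Since every action of agent A is a diagonal unitary, every such action fixes every class label, which is the numerical face of Equation (48).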

4.4. Classical Subsystems in Open-System Quantum Theory

This example is of the same ﬂavour as the previous one but is more elaborate and more interesting.
Again, we consider a quantum system S with Hilbert space H = Cd . Now, we take St(S) to be the
whole set of density matrices in Md (C) and Transf (S) to be the whole set of quantum channels from
Md (C) to itself.
We grant to agent A the ability to perform every multiphase covariant channel, that is, every
quantum channel M satisfying the condition

$$ \mathcal{U}_\theta \circ \mathcal{M} = \mathcal{M} \circ \mathcal{U}_\theta \qquad \forall \theta = (\theta_1, \theta_2, \dots, \theta_d) \in [0, 2\pi)^{\times d} , \tag{49} $$

where 𝒰_θ = U_θ · U_θ† is the unitary channel corresponding to the diagonal unitary U_θ = ∑_k e^{iθ_k} |k⟩⟨k|.
Physically, we can interpret the restriction to multiphase covariant channels as the lack of a reference
for the definition of the phases in the basis {|k⟩, k = 1, . . . , d}.
It turns out that the maximal adversary of agent A is the agent A′ that can perform every
basis-preserving channel B, that is, every channel satisfying the condition

$$ \mathcal{B}(|k\rangle\langle k|) = |k\rangle\langle k| \qquad \forall k \in \{1, \dots, d\} . \tag{50} $$

Indeed, we have the following:

Theorem 1. The monoid of multiphase covariant channels and the monoid of basis-preserving channels are the
commutant of one another.

The proof, presented in Appendix D.1, is based on the characterization of the basis-preserving
channels provided in [71,72].
We now show that states of system S A can be characterized as classical probability distributions.

Proposition 7. For every pair of states ρ, σ ∈ St(S), the following are equivalent:

1. ρ and σ are equivalent for agent A,

2. D(ρ) = D(σ), where D is the completely dephasing channel D(·) := ∑_k |k⟩⟨k| · |k⟩⟨k|.

Proof. Suppose that Condition 1 holds, meaning that there exists a sequence (ρ1 , ρ2 , . . . , ρn ) such that

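$$ \rho_1 = \rho , \quad \rho_n = \sigma , \quad \forall i \in \{1, \dots, n-1\} \ \ \exists \, \mathcal{B}_i , \widetilde{\mathcal{B}}_i \in \mathsf{Act}(B; S) : \ \mathcal{B}_i(\rho_i) = \widetilde{\mathcal{B}}_i(\rho_{i+1}) , \tag{51} $$

where B_i and B̃_i are basis-preserving channels. The above equation implies

$$ \langle k | \mathcal{B}_i(\rho_i) | k \rangle = \langle k | \widetilde{\mathcal{B}}_i(\rho_{i+1}) | k \rangle . \tag{52} $$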


Now, the relation ⟨k|B(ρ)|k⟩ = ⟨k|ρ|k⟩ is valid for every basis-preserving channel B and for every
state ρ [71]. Applying this relation on both sides of Equation (52), we obtain the condition

$$ \langle k | \rho_i | k \rangle = \langle k | \rho_{i+1} | k \rangle , \tag{53} $$

valid for every k ∈ {1, . . . , d}. Hence, all the density matrices (ρ_1, ρ_2, . . . , ρ_n) must have the same
diagonal entries, and, in particular, Condition 2 must hold.
Conversely, suppose that Condition 2 holds. Since the dephasing channel D is obviously
basis-preserving, we obtain the condition Deg_{A′}(ρ) ∩ Deg_{A′}(σ) ≠ ∅, which implies that ρ and
σ are equivalent for agent A. In conclusion, Condition 1 holds.
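
The two ingredients of the proof can be checked on a qubit. The sketch below assumes NumPy; the partial dephasing map `partial_dephase` is only one hypothetical example of a basis-preserving channel, chosen for simplicity. It verifies that the channel fixes the basis states, that it leaves diagonal entries untouched, and that two states with the same diagonal are identified by the completely dephasing channel.

```python
import numpy as np

def dephase(rho):
    """Completely dephasing channel D: keeps only the diagonal entries."""
    return np.diag(np.diag(rho))

def partial_dephase(rho, p):
    """A simple basis-preserving channel: mix the identity with D (0 <= p <= 1)."""
    return (1 - p) * rho + p * dephase(rho)

plus = np.full((2, 2), 0.5)   # |+><+|
mixed = np.eye(2) / 2         # maximally mixed state, same diagonal as |+><+|

# Condition 2 of Proposition 7: D(rho) = D(sigma)
same_diagonal = np.allclose(dephase(plus), dephase(mixed))

# Eq. (50): the channel fixes every basis state |k><k|
basis_preserved = all(
    np.allclose(partial_dephase(np.diag(e), 0.3), np.diag(e)) for e in np.eye(2)
)

# <k|B(rho)|k> = <k|rho|k> for this basis-preserving channel
diagonal_untouched = np.allclose(np.diag(partial_dephase(plus, 0.3)), np.diag(plus))
print(same_diagonal, basis_preserved, diagonal_untouched)
```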

Proposition 7 guarantees that the states of system S_A are in one-to-one correspondence with
diagonal density matrices, and therefore, with classical probability distributions: in formula,

$$ \mathsf{St}(S_A) = \left\{ (p_k)_{k=1}^{d} : p_k \ge 0 \ \forall k , \ \sum_k p_k = 1 \right\} . \tag{54} $$

The transformations of system S_A can be characterized as transition matrices, namely

$$ \mathsf{Transf}(S_A) = \left\{ [P_{jk}]_{j \le d, \, k \le d} : P_{jk} \ge 0 \ \forall j, k \in \{1, \dots, d\} , \ \sum_j P_{jk} = 1 \ \forall k \in \{1, \dots, d\} \right\} . \tag{55} $$

The proof of Equation (55) is provided in Appendix D.2.

In summary, agent A has control on a classical system, whose states are probability distributions,
and whose transformations are classical transition matrices.
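
To make the link between multiphase covariant channels and transition matrices concrete, the following sketch (assuming NumPy; the matrix `P`, the state `rho`, and the phases `theta` are arbitrary illustrative values) builds the measure-and-prepare channel associated with a column-stochastic matrix and checks the covariance condition (49) on one input. This is one simple family of multiphase covariant channels, not a characterization of all of them.

```python
import numpy as np

# Column-stochastic transition matrix: P[j, k] = probability of k -> j
P = np.array([[0.9, 0.2],
              [0.1, 0.8]])

def classical_channel(rho, P):
    """Measure in the basis {|k>} and prepare |j> with probability P[j, k]."""
    probs = np.real(np.diag(rho))
    return np.diag(P @ probs)

def phase_channel(rho, theta):
    """Unitary channel of the diagonal unitary U_theta = sum_k exp(i theta_k)|k><k|."""
    U = np.diag(np.exp(1j * np.asarray(theta)))
    return U @ rho @ U.conj().T

rho = np.array([[0.7, 0.3 - 0.1j],
                [0.3 + 0.1j, 0.3]])
theta = [0.4, 1.9]

# Covariance condition of Eq. (49), checked on this input state
covariant = np.allclose(
    phase_channel(classical_channel(rho, P), theta),
    classical_channel(phase_channel(rho, theta), P),
)

# Induced action on the probability distribution (diagonal of rho)
out_probs = np.real(np.diag(classical_channel(rho, P)))
print(covariant, out_probs)
```

On the diagonal, the channel acts exactly as the matrix P acts on the probability vector, in line with Equation (55).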

4.5. Classical Systems From Free Operations in the Resource Theory of Coherence
In the previous example, we have seen that classical systems arise from agents who have access
to the monoid of multiphase covariant channels. In fact, classical systems can arise in many other
ways, corresponding to agents who have access to different monoids of operations. In particular, we
find that several types of free operations in the resource theory of coherence [34–41] identify classical
systems. Specifically, consider the monoids of
1. Strictly incoherent operations [41], i.e., quantum channels T with the property that, for every
Kraus operator T_i, the map 𝒯_i(·) = T_i · T_i† satisfies the condition D ◦ 𝒯_i = 𝒯_i ◦ D, where D is the
completely dephasing channel.
2. Dephasing covariant operations [38–40], i.e., quantum channels T satisfying the condition
D ◦ T = T ◦ D.
3. Phase covariant channels [40], i.e., quantum channels T satisfying the condition T ◦ 𝒰_ϕ =
𝒰_ϕ ◦ T, ∀ϕ ∈ [0, 2π), where 𝒰_ϕ is the unitary channel associated with the unitary matrix
U_ϕ = ∑_k e^{ikϕ} |k⟩⟨k|.
4. Physically incoherent operations [38,39], i.e., quantum channels that are convex combinations of
channels T admitting a Kraus representation where each Kraus operator Ti is of the form

Ti = Uπi Uθi Pi , (56)

where Uπi is a unitary that permutes the elements of the computational basis, Uθi is a diagonal
unitary, and Pi is a projector on a subspace spanned by a subset of vectors in the computational basis.
For each of the monoids 1–4, our construction yields the classical subsystem consisting of diagonal
density matrices. The transformations of the subsystem are just the classical channels. The proof is
presented in Appendix E.1.
Notably, other choices of free operations, such as the maximally incoherent operations [34] and the
incoherent operations [35], do not identify classical subsystems. The maximally incoherent operations
are the quantum channels T that map diagonal density matrices to diagonal density matrices, namely
T ◦ D = D ◦ T ◦ D, where D is the completely dephasing channel. The incoherent operations are the
quantum channels T with the property that, for every Kraus operator T_i, the map 𝒯_i(·) = T_i · T_i† sends
diagonal matrices to diagonal matrices, namely 𝒯_i ◦ D = D ◦ 𝒯_i ◦ D.
In Appendix E.2, we show that incoherent and maximally incoherent operations do not identify
classical subsystems: the subsystem associated with these operations is the whole quantum system.
This result can be understood from the analogy between these operations and non-entangling
operations in the resource theory of entanglement [38,39]. Non-entangling operations do not generate
entanglement, but nevertheless they cannot (in general) be implemented with local operations
and classical communication. Similarly, incoherent and maximally incoherent operations do not
generate coherence, but they cannot (in general) be implemented with incoherent states and coherence
non-generating unitary gates. An agent that performs these operations must have access to more
degrees of freedom than just a classical subsystem.
At the mathematical level, the problem is that the incoherent and maximally incoherent operations
do not necessarily commute with the dephasing channel D . In our construction, commutation
with the dephasing channel is essential for retrieving classical subsystems. In general, we have
the following theorem:

Theorem 2. Every set of operations that

1. contains the set of classical channels, and

2. commutes with the dephasing channel

identiﬁes a d-dimensional classical subsystem of the original d-dimensional quantum system.

The proof is provided in Appendix E.1.
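
The gap between maximally incoherent operations and dephasing covariant operations can be seen on a single qubit. The sketch below, assuming NumPy, uses a hypothetical example channel that measures in the {|+⟩, |−⟩} basis and prepares |0⟩ or |1⟩ accordingly. Its output is always diagonal, so it satisfies T ◦ D = D ◦ T ◦ D, yet it fails to commute with the dephasing channel D, which is the second hypothesis of Theorem 2. Both checks are performed on one input state only.

```python
import numpy as np

def dephase(rho):
    """Completely dephasing channel D in the computational basis."""
    return np.diag(np.diag(rho))

plus = np.array([1.0, 1.0]) / np.sqrt(2)
minus = np.array([1.0, -1.0]) / np.sqrt(2)

def measure_x_prepare_z(rho):
    """Measure in {|+>, |->}; prepare |0> on outcome +, |1> on outcome -."""
    p_plus = np.real(plus.conj() @ rho @ plus)
    p_minus = np.real(minus.conj() @ rho @ minus)
    return np.diag([p_plus, p_minus])

rho = np.outer(plus, plus.conj())   # input state |+><+|

# Maximally incoherent condition T o D = D o T o D (output is always diagonal)
mio_condition = np.allclose(
    measure_x_prepare_z(dephase(rho)),
    dephase(measure_x_prepare_z(dephase(rho))),
)

# Dephasing covariance D o T = T o D fails on this input
commutes_with_dephasing = np.allclose(
    dephase(measure_x_prepare_z(rho)),
    measure_x_prepare_z(dephase(rho)),
)
print(mio_condition, commutes_with_dephasing)
```

On the input |+⟩⟨+|, applying the channel first gives |0⟩⟨0|, whereas dephasing first gives the maximally mixed state, so the two orders disagree.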

5. Key Structures: Partial Trace and No Signalling

In this section, we go back to the general construction of subsystems, and we analyse the main
structures arising from it. First, we observe that the deﬁnition of subsystem guarantees by ﬁat the
validity of the no-signalling principle, stating that operations performed on one subsystem cannot
affect the state of an independent subsystem. Then, we show that our construction of subsystems
allows one to build a category.

5.1. The Partial Trace and the No Signalling Property

We defined the states of system S_A as equivalence classes. In more physical terms, we can regard
the map ψ ↦ [ψ]_A as an operation of discarding, which takes system S and throws away the degrees
of freedom reachable by the maximal adversary A′. In our adversarial picture, “throwing away some
degrees of freedom” means leaving them under the control of the adversary, and considering only the
part of the system that remains under the control of the agent.

Definition 5. The partial trace over A′ is the function Tr_{A′} : St(S) → St(S_A), defined by Tr_{A′}(ψ) = [ψ]_A
for a generic ψ ∈ St(S).

The reason for the notation Tr_{A′} is that in quantum theory the operation Tr_{A′} coincides with the
partial trace of matrices, as shown in the example of Section 4.1. For subsystems associated with
von Neumann algebras, the partial trace is the “partial trace over the algebra” defined in Section 4.2.
For subsystems associated with multiphase covariant channels or dephasing covariant operations,
the partial trace is the completely dephasing channel, which “traces out” the off-diagonal elements of
the density matrix.


With the partial trace notation, the states of system S A can be succinctly written as

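$$ \mathsf{St}(S_A) = \left\{ \rho = \mathrm{Tr}_{A'}(\psi) : \psi \in \mathsf{St}(S) \right\} . \tag{57} $$

Denoting B := A′, we have the important relation

$$ \mathrm{Tr}_B \circ \mathcal{B} = \mathrm{Tr}_B \qquad \forall \mathcal{B} \in \mathsf{Act}(B; S) . \tag{58} $$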

Equation (58) can be regarded as the no signalling property: the actions of agent B cannot lead to
any change on the system of agent A. Of course, here the no signalling property holds by fiat, precisely
because of the way the subsystems are defined!
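
In the tensor-product example of Section 4.1, Equation (58) reduces to the familiar statement that a local channel on B does not change the marginal on A. The following sketch, assuming NumPy and taking an amplitude damping channel with an arbitrary parameter `gamma` as the hypothetical action of agent B, checks this on a maximally entangled state.

```python
import numpy as np

def partial_trace_B(rho, dA, dB):
    """Partial trace over the second factor of H_A (x) H_B."""
    return np.trace(rho.reshape(dA, dB, dA, dB), axis1=1, axis2=3)

def act_on_B(rho, kraus_ops, dA):
    """Apply the channel with the given Kraus operators to system B only."""
    out = np.zeros_like(rho, dtype=complex)
    for K in kraus_ops:
        L = np.kron(np.eye(dA), K)
        out += L @ rho @ L.conj().T
    return out

# Amplitude damping channel on a qubit (illustrative choice of Act(B; S) element)
gamma = 0.35
kraus_ops = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]]),
             np.array([[0, np.sqrt(gamma)], [0, 0]])]

# Maximally entangled state of two qubits
phi = np.array([1, 0, 0, 1]) / np.sqrt(2)
rho = np.outer(phi, phi.conj())

rho_after = act_on_B(rho, kraus_ops, dA=2)
no_signalling = np.allclose(partial_trace_B(rho_after, 2, 2),
                            partial_trace_B(rho, 2, 2))
state_changed = not np.allclose(rho_after, rho)
print(no_signalling, state_changed)
```

The global state changes under the action of B, while the state of S_A does not.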
The construction of subsystems has the merit of clarifying the status of the no-signalling principle.
No-signalling is often associated with space-like separation, and is heuristically justified through the
idea that physical influences should propagate within the light cones. However, locality is only a
sufficient condition for the no signalling property. Spatial separation implies no signalling, but the
converse is not necessarily true: every pair of distinct quantum systems satisfies the no-signalling
condition, even if the two systems are spatially contiguous. In fact, the no-signalling condition holds
even for virtual subsystems of a single, spatially localized system. Think for example of a quantum
particle localized in the xy plane. The particle can be regarded as a composite system, made of two
virtual subsystems: a particle localized on the x-axis, and another particle localized on the y-axis.
The no-signalling property holds for these two subsystems, even if they are not separated in space.
As Equation (58) suggests, the validity of the no-signalling property has more to do with the way
subsystems are constructed, rather than the way the subsystems are distributed in space.

5.2. A Baby Category

Our construction of subsystems defines a category, consisting of three objects, S, S_A, and S_B,
where S_B is the subsystem associated with the agent B = A′. The sets Transf(S), Transf(S_A), and
Transf(S_B) are the endomorphisms from S to S, S_A to S_A, and S_B to S_B, respectively. The morphisms
from S to S_A and from S to S_B are defined as

$$ \mathsf{Transf}(S \to S_A) = \left\{ \mathrm{Tr}_B \circ \mathcal{T} : \mathcal{T} \in \mathsf{Transf}(S) \right\} \tag{59} $$

and

$$ \mathsf{Transf}(S \to S_B) = \left\{ \mathrm{Tr}_A \circ \mathcal{T} : \mathcal{T} \in \mathsf{Transf}(S) \right\} , \tag{60} $$

respectively.
Morphisms from S A to S, from SB to S, from S A to SB , or from SB to S A , are not naturally deﬁned.
In Appendix F, we provide a mathematical construction that enlarges the sets of transformations,
making all sets non-empty. Such a construction allows us to reproduce a categorical structure known
as a splitting of idempotents [73,74].

6. Non-Overlapping Agents, Causality, and the Initialization Requirement

In the previous sections, we developed a general framework, applicable to arbitrary physical
systems. In this section, we identify some desirable properties that the global systems may enjoy.

6.1. Dual Pairs of Agents

So far, we have taken the perspective of agent A. Let us now take the perspective of the maximal
adversary A′. We consider A′ as the agent, and denote his maximal adversary as A′′. By definition,
A′′ can perform every action in the commutant of Act(A′; S), namely

$$ \mathsf{Act}(A''; S) = \mathsf{Act}(A'; S)' = \mathsf{Act}(A; S)'' . \tag{61} $$

Obviously, the set of actions allowed to agent A′′ includes the set of actions allowed to agent A.
At this point, one could continue the construction and consider the maximal adversary of agent A′′.
However, no new agent would appear at this point: the maximal adversary of agent A′′ is agent A′
again. When two agents have this property, we call them a dual pair:

Definition 6. Two agents A and B form a dual pair iff Act(A; S) = Act(B; S)′ and Act(B; S) = Act(A; S)′.

All the examples in Section 4 are examples of dual pairs of agents.

It is easy to see that an agent A is part of a dual pair if and only if the set Act(A; S) coincides with
its double commutant Act(A; S)′′.

6.2. Non-Overlapping Agents

Suppose that agents A and B form a dual pair. In general, the actions in Act( A; S) may have a
non-trivial intersection with the actions in Act( B; S). This situation does indeed happen, as we have
seen in Sections 4.3 and 4.4. Still, it is important to examine the special case where the actions of A and
B have only trivial intersection, corresponding to the identity action IS . When this is the case, we say
that the agents A and B are non-overlapping:

Deﬁnition 7. Two agents A and B are non-overlapping iff Act( A; S) ∩ Act( B; S) ⊆ {IS }.

Dual pairs of non-overlapping agents are characterized by the fact that the sets of actions have
trivial center:

Proposition 8. Let A and B be a dual pair of agents. Then, the following are equivalent:

1. A and B are non-overlapping,

2. Act( A; S) has trivial center,
3. Act( B; S) has trivial center.

Proof. Since agents A and B are dual to each other, we have Act(B; S) = Act(A; S)′ and Act(A; S) =
Act(B; S)′. Hence, the intersection Act(A; S) ∩ Act(B; S) coincides with the center of Act(A; S), and with
the center of Act(B; S). The non-overlap condition holds if and only if the center is trivial.

Note that the existence of non-overlapping dual pairs is a condition on the transformations of the
whole system S:

Proposition 9. The following are equivalent:

1. system S admits a dual pair of non-overlapping agents,

2. the monoid Transf (S) has trivial center.

Proof. Assume that Condition 1 holds for a pair of agents A and B. Let C(S) be the center of Transf(S).
By definition, C(S) is contained in Act(B; S) because Act(B; S) contains all the transformations that
commute with those in Act(A; S). Moreover, the elements of C(S) commute with all elements of
Act(B; S), and therefore they are in the center of Act(B; S). Since A and B are a non-overlapping
dual pair, the center of Act(B; S) must be trivial (Proposition 8), and therefore C(S) must be trivial.
Hence, Condition 2 holds.
Conversely, suppose that Condition 2 holds. In that case, it is enough to take A to be the maximal
agent, i.e., the agent A_max with Act(A_max; S) = Transf(S). Then, the maximal adversary of A_max is the
agent B = A′_max with Act(B; S) = Act(A_max; S)′ = C(S) = {I_S}. By definition, the two agents form a
non-overlapping dual pair. Hence, Condition 1 holds.


The existence of dual pairs of non-overlapping agents is a desirable property, which may be used
to characterize “good systems”:

Deﬁnition 8 (Non-Overlapping Agents). We say that system S satisﬁes the Non-Overlapping Agents
Requirement if there exists at least one dual pair of non-overlapping agents acting on S.

The Non-Overlapping Agents Requirement guarantees that the total system S can be regarded as
a subsystem: if Amax is the maximal agent (i.e., the agent who has access to all transformations on S),
then the subsystem S Amax is the whole system S. A more formal statement of this fact is provided in
Appendix G.

6.3. Causality
The Non-Overlapping Agents Requirement guarantees that the subsystem associated with a maximal
agent (i.e., an agent who has access to all possible transformations) is the whole system S. On the
other hand, it is natural to expect that a minimal agent, who has no access to any transformation, should
be associated with the trivial system, i.e., the system with a single state and a single transformation.
The fact that the minimal agent is associated with the trivial system is important because it is equivalent
to a property of causality [8,13,75,76]: indeed, we have the following:

Proposition 10. Let A_min be the minimal agent and let A_max be its maximal adversary, coinciding with the
maximal agent. Then, the following conditions are equivalent:

1. S Amin is the trivial system,

2. one has Tr Amax [ρ] = Tr Amax [σ] for every pair of states ρ, σ ∈ St(S).

Proof. 1 ⇒ 2: By deﬁnition, the state space of S Amin consists of states of the form Tr Amax [ρ], ρ ∈ St(S).
Hence, the state space contains only one state if and only if Condition 2 holds. 2 ⇒ 1: Condition 2
implies that every two states of system S are equivalent for agent Amax . The fact that S Amin has only
one transformation is true by deﬁnition: since the adversary of Amin is the maximal agent, one has
T ∈ Deg Amax (IS ) for every transformation T ∈ Transf (S). Hence, every transformation is in the
equivalence class of the identity.

With a little abuse of notation, we may denote the trace over Amax as TrS because Amax has access
to all transformations on system S. With this notation, the causality condition reads

TrS [ρ] = TrS [σ] ∀ρ, σ ∈ St(S) . (62)

It is interesting to note that, unlike no signalling, causality does not necessarily hold in the framework
of this paper. This is because the trace TrS is deﬁned as the quotient with respect to all possible
transformations, and having a single equivalence class is a non-trivial property. One possibility is
to demand the validity of this property, and to call a system proper, only if it satisﬁes the causality
condition (62). In the following subsection, we will see a requirement that guarantees the validity of
the causality condition.

6.4. The Initialization Requirement

The ability to prepare states from a ﬁxed initial state is important in the circuit model of quantum
computation, where qubits are initialized to the state |0⟩, and more general states are generated by
applying quantum gates. More broadly, the ability to initialize the system in a given state and to
generate other states from it is important for applications in quantum control and adiabatic quantum
computing. Motivated by these considerations, we formulate the following deﬁnition:


Deﬁnition 9. A system S satisﬁes the Initialization Requirement if there exists a state ψ0 ∈ St(S) from which
any other state can be generated, meaning that, for every other state ψ ∈ St(S), there exists a transformation
T ∈ Transf (S) such that ψ = T ψ0 . When this is the case, the state ψ0 is called cyclic.

The Initialization Requirement is satisfied in quantum theory, both at the pure state level and at the
mixed state level. At the pure state level, every unit vector |ψ⟩ ∈ H_S can be generated from a fixed unit
vector |ψ_0⟩ ∈ H_S via a unitary transformation U. At the mixed state level, every density matrix ρ can be
generated from a fixed density matrix ρ_0 via the erasure channel C_ρ(·) = ρ Tr[·]. By the same argument,
the initialization requirement is also satisfied when S is a system in an operational-probabilistic
theory [8,10–13] and when S is a system in a causal process theory [75,76].
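
Both ways of satisfying the Initialization Requirement in quantum theory can be sketched numerically. The example below assumes NumPy and a qutrit; the helpers `erasure_channel` and `unitary_from_reference` are hypothetical names, and the QR-based completion is just one of many ways to build a unitary that maps the reference vector |0⟩ to a target vector.

```python
import numpy as np

def erasure_channel(target):
    """Return the map C_target(X) = target * Tr[X]."""
    return lambda X: target * np.trace(X)

def unitary_from_reference(psi):
    """A unitary U with U|0> = |psi>, built by QR completion (one of many choices)."""
    d = len(psi)
    rng = np.random.default_rng(0)
    M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    M[:, 0] = psi
    Q, R = np.linalg.qr(M)
    # M[:, 0] = Q[:, 0] * R[0, 0] with |R[0, 0]| = 1 for a unit vector psi,
    # so rescaling Q by this phase keeps it unitary and sets its first column to psi.
    return Q * R[0, 0]

# Mixed state level: any rho is reached from a fixed rho0 by an erasure channel
rho0 = np.diag([1.0, 0.0, 0.0])
rho = np.diag([0.2, 0.5, 0.3])
mixed_ok = np.allclose(erasure_channel(rho)(rho0), rho)

# Pure state level: any unit vector is reached from |0> by a unitary
psi = np.array([1.0, 1.0j, -1.0]) / np.sqrt(3)
U = unitary_from_reference(psi)
e0 = np.array([1.0, 0.0, 0.0])
pure_ok = np.allclose(U @ e0, psi) and np.allclose(U.conj().T @ U, np.eye(3))
print(mixed_ok, pure_ok)
```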
The Initialization Requirement guarantees that minimal agents are associated with trivial systems:

Proposition 11. Let S be a system satisfying the Initialization Requirement, and let A_min be the minimal
agent, i.e., the agent that can only perform the identity transformation. Then, the subsystem S_{A_min} is trivial:
St(S_{A_min}) contains only one state and Transf(S_{A_min}) contains only one transformation.

Proof. By deﬁnition, the maximal adversary of Amin is the maximal agent Amax , who has access to
all physical transformations. Then, every transformation is in the equivalence class of the identity
transformation, meaning that system S Amin has a single transformation. Now, let ψ0 be the cyclic state.
By the Initialization Requirement, the set Deg_{A_max}(ψ_0) is the whole state space St(S). Hence, every
state is equivalent to the state ψ_0. In other words, St(S_{A_min}) contains only one state.

The Initialization Requirement guarantees the validity of causality, thanks to Proposition 10.
In addition, the Initialization Requirement is important independently of the causality property.
For example, we will use it to formulate an abstract notion of closed system.

7. The Conservation of Information

In this section, we consider systems where all transformations are invertible. In such systems,
every transformation can be thought of as the result of some deterministic dynamical law. The different
transformations in Transf (S) can be interpreted as different dynamics, associated with different values
of physical parameters, such as coupling constants or external control parameters.

7.1. Logically Invertible vs. Physically Invertible

Definition 10. A transformation $\mathcal{T} \in \mathsf{Transf}(S)$ is logically invertible iff the map

$$\widehat{\mathcal{T}} : \mathsf{St}(S) \to \mathsf{St}(S), \qquad \psi \mapsto \mathcal{T}\psi \tag{63}$$

is injective.
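
As a minimal sketch of Definition 10, one can model a hypothetical system with three states and represent each transformation as a lookup table. The state labels, the three example transformations, and the helper names below are assumptions made purely for illustration; they do not appear in the paper.

```python
def is_logically_invertible(transformation, states):
    """Return True if the map psi -> T(psi) is injective on the given states."""
    images = [transformation[psi] for psi in states]
    return len(set(images)) == len(images)

def compose(t2, t1):
    """Return the composite 't2 after t1' as a lookup table."""
    return {psi: t2[t1[psi]] for psi in t1}

# Hypothetical toy system with three states labelled 0, 1, 2.
states = [0, 1, 2]
identity = {0: 0, 1: 1, 2: 2}
cyclic_shift = {0: 1, 1: 2, 2: 0}  # injective: distinct states stay distinct
reset = {0: 0, 1: 0, 2: 0}         # not injective: all states are merged

checks = {
    "identity": is_logically_invertible(identity, states),
    "cyclic_shift": is_logically_invertible(cyclic_shift, states),
    "shift_twice": is_logically_invertible(compose(cyclic_shift, cyclic_shift), states),
    "reset": is_logically_invertible(reset, states),
}
print(checks)
```

In this toy model the identity and the composite of two injective maps are again injective, whereas the reset map, which is the finite analogue of an erasure, is not.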

Logically invertible transformations can be interpreted as evolutions of the system that preserve
the distinctness of states. At the fundamental level, one may require that all physical evolutions be
logically invertible, a requirement that is sometimes called the Conservation of Information [58]. In the
following, we will explore the consequences of such a requirement:

Definition 11 (Logical Conservation of Information). System $S$ satisfies the Logical Conservation of
Information if all transformations in $\mathsf{Transf}(S)$ are logically invertible.

The requirement is well-posed because the logically invertible transformations form a monoid. Indeed,
the identity transformation is logically invertible, and the composition of two logically invertible
transformations is logically invertible.
A special case of logical invertibility is physical invertibility, deﬁned as follows:

Entropy 2018, 20, 358

Definition 12. A transformation $\mathcal{T} \in \mathsf{Transf}(S)$ is physically invertible iff there exists another
transformation $\mathcal{T}' \in \mathsf{Transf}(S)$ such that $\mathcal{T}' \circ \mathcal{T} = \mathcal{I}_S$.

Physical invertibility is more than injectivity: not only should the map T be injective on the state
space, but also its inverse should be a physical transformation. In light of this observation, we state a
stronger version of the Conservation of Information, requiring physical invertibility:

Definition 13 (Physical Conservation of Information). System $S$ satisfies the Physical Conservation of
Information if all transformations in $\mathsf{Transf}(S)$ are physically invertible.

The difference between Logical and Physical Conservation of Information is highlighted by the
following example:

Example 3 (Conservation of Information in closed-system quantum theory). Let $S$ be a closed quantum
system described by a separable, infinite-dimensional Hilbert space $\mathcal{H}_S$, and let $\mathsf{St}(S)$ be the set of pure states,
represented as rank-one density matrices

$$\mathsf{St}(S) = \{\, |\psi\rangle\langle\psi| \;:\; |\psi\rangle \in \mathcal{H}_S ,\ \langle\psi|\psi\rangle = 1 \,\} . \tag{64}$$

One possible choice of transformations is the monoid of isometric channels

$$\mathsf{Transf}(S) = \{\, V \cdot V^\dagger \;:\; V \in \mathsf{Lin}(S) ,\ V^\dagger V = I \,\} . \tag{65}$$

This choice of transformations satisfies the Logical Conservation of Information, but violates the Physical
Conservation of Information because in general the map $V^\dagger \cdot V$ fails to be trace-preserving, and therefore fails to
be an isometric channel. For example, consider the shift operator

$$V = \sum_{n=0}^{\infty} |n+1\rangle\langle n| . \tag{66}$$

The operator $V$ is an isometry but its left-inverse $V^\dagger$ is not an isometry. As a result, the channel $V^\dagger \cdot V$ is
not an allowed physical transformation according to Equation (65).
An alternative choice of physical transformations is the set of unitary channels

$$\mathsf{Transf}(S) = \{\, V \cdot V^\dagger \;:\; V \in \mathsf{Lin}(S) ,\ V^\dagger V = V V^\dagger = I \,\} . \tag{67}$$

With this choice, the Physical Conservation of Information is satisfied: every physical transformation is
invertible and the inverse is a physical transformation.
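
The behaviour of the shift operator in Equation (66) can be illustrated numerically, with the caveat that an infinite-dimensional operator cannot be stored exactly. The sketch below assumes a finite rectangular truncation (a matrix from an N-dimensional space into an (N+1)-dimensional one) as a stand-in; the truncation size and the function name are illustrative choices, not part of the paper.

```python
import numpy as np

def truncated_shift(n_levels):
    """(n_levels + 1) x n_levels matrix sending basis vector |n> to |n+1>."""
    shift = np.zeros((n_levels + 1, n_levels))
    for n in range(n_levels):
        shift[n + 1, n] = 1.0
    return shift

N = 5  # hypothetical truncation size
V = truncated_shift(N)

# V^dagger V is the identity on the input space: V is an isometry.
is_isometry = np.allclose(V.conj().T @ V, np.eye(N))

# V V^dagger is a projector that misses |0>, so V is not unitary.
is_unitary = np.allclose(V @ V.conj().T, np.eye(N + 1))

# The map rho -> V^dagger rho V loses all the weight on |0>:
# it is not trace-preserving.
rho0 = np.zeros((N + 1, N + 1))
rho0[0, 0] = 1.0  # the state |0><0|
trace_after = np.trace(V.conj().T @ rho0 @ V)

print(is_isometry, is_unitary, trace_after)
```

The state $|0\rangle\langle 0|$ lies outside the range of $V$, which is why the candidate inverse map annihilates it instead of returning a normalized state.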

7.2. Systems Satisfying the Physical Conservation of Information

In a system satisfying the Physical Conservation of Information, the transformations are not only
physically invertible, but also physically reversible, in the following sense:

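Definition 14. A transformation $\mathcal{T} \in \mathsf{Transf}(S)$ is physically reversible iff there exists another
transformation $\mathcal{T}' \in \mathsf{Transf}(S)$ such that $\mathcal{T}' \circ \mathcal{T} = \mathcal{T} \circ \mathcal{T}' = \mathcal{I}_S$.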

With the above deﬁnition, we have the following:

Proposition 12. If system $S$ satisfies the Physical Conservation of Information, then every physical
transformation is physically reversible. The monoid $\mathsf{Transf}(S)$ is a group, hereafter denoted as $\mathsf{G}(S)$.


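Proof. Since $\mathcal{T}$ is physically invertible, there exists a transformation $\mathcal{T}'$ such that $\mathcal{T}' \circ \mathcal{T} = \mathcal{I}_S$. Since the
Physical Conservation of Information holds, $\mathcal{T}'$ must be physically invertible, meaning that there
exists a transformation $\mathcal{T}''$ such that $\mathcal{T}'' \circ \mathcal{T}' = \mathcal{I}_S$. Hence, we have

$$\mathcal{T}'' = \mathcal{T}'' \circ (\mathcal{T}' \circ \mathcal{T}) = (\mathcal{T}'' \circ \mathcal{T}') \circ \mathcal{T} = \mathcal{T} . \tag{68}$$

Since $\mathcal{T}'' = \mathcal{T}$, the invertibility condition $\mathcal{T}'' \circ \mathcal{T}' = \mathcal{I}_S$ becomes $\mathcal{T} \circ \mathcal{T}' = \mathcal{I}_S$. Hence, $\mathcal{T}$ is
reversible and $\mathsf{Transf}(S)$ is a group.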

7.3. Subsystems of Systems Satisfying the Physical Conservation of Information

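Imagine that an agent $A$ acts on a system $S$ satisfying the Physical Conservation of Information.
We assume that the actions of agent $A$ form a subgroup of $\mathsf{G}(S)$, denoted as $\mathsf{G}_A$. The maximal adversary
of $A$ is the adversary $B = A'$, who has access to all transformations in the set

$$\mathsf{G}_B := \mathsf{G}_A' = \{\, \mathcal{U}_B \in \mathsf{G}(S) \;:\; \mathcal{U}_B \circ \mathcal{U}_A = \mathcal{U}_A \circ \mathcal{U}_B ,\ \forall\, \mathcal{U}_A \in \mathsf{G}_A \,\} . \tag{69}$$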

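It is immediate to see that the set $\mathsf{G}_B$ is a group. We call it the adversarial group.
The equivalence relations used to define subsystems can be greatly simplified. Indeed, it is easy
to see that two states $\psi, \psi' \in \mathsf{St}(S)$ are equivalent for $A$ if and only if there exists a transformation
$\mathcal{U}_B \in \mathsf{G}_B$ such that

$$\psi' = \mathcal{U}_B \psi . \tag{70}$$

Hence, the states of the subsystem $S_A$ are orbits of the group $\mathsf{G}_B$: for every $\psi \in \mathsf{St}(S)$, we have

$$\mathrm{Tr}_B[\psi] := \{\, \mathcal{U}_B \psi \;:\; \mathcal{U}_B \in \mathsf{G}_B \,\} . \tag{71}$$

Similarly, the degradation of a transformation $\mathcal{U} \in \mathsf{G}(S)$ yields the orbit

$$\mathsf{Deg}_B(\mathcal{U}) = \{\, \mathcal{U}_{B,1} \circ \mathcal{U} \circ \mathcal{U}_{B,2} \;:\; \mathcal{U}_{B,1}, \mathcal{U}_{B,2} \in \mathsf{G}_B \,\} . \tag{72}$$

It is easy to show that the transformations of the subsystem $S_A$ are the orbits of the group $\mathsf{G}_B$:

$$\mathsf{Transf}(S_A) = \{\, \pi_A(\mathcal{U}) \;:\; \mathcal{U} \in \mathsf{G}_A \,\} , \qquad \pi_A(\mathcal{U}) := \{\, \mathcal{U}_B \circ \mathcal{U} \;:\; \mathcal{U}_B \in \mathsf{G}_B \,\} . \tag{73}$$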

8. Closed Systems
Here, we define an abstract notion of “closed systems”, which captures the essential features of
what is traditionally called a closed system in quantum theory. Intuitively, the idea is that all the states
of the closed system are “pure” and all the evolutions are reversible.
An obvious problem in defining closed systems is that our framework does not include a notion of
“pure state”. To circumvent the problem, we define closed systems in the following way:

Definition 15. System $S$ is closed iff it satisfies the Logical Conservation of Information and the Initialization
Requirement, that is, iff
1. every transformation is logically invertible,
2. there exists a state ψ0 ∈ St(S) such that, for every other state ψ ∈ St(S), one has ψ = V ψ0 for some
suitable transformation V ∈ Transf (S).

For a closed system, we nominally say that all the states in St(S) are “pure”, or, more precisely,
“dynamically pure”. This deﬁnition is generally different from the usual deﬁnition of pure states as
extreme points of convex sets, or from the compositional deﬁnition of pure states as states with only
product extensions [77]. First of all, dynamically pure states are not a subset of the state space: provided
that the right conditions are met, they are all the states. Other differences between the usual notion of
pure states and the notion of dynamically pure states are highlighted by the following example:

Example 4. Let $S$ be a system in which all states are of the form $U \rho_0 U^\dagger$, where $U$ is a generic 2-by-2 unitary
matrix, and $\rho_0 \in M_2(\mathbb{C})$ is a fixed 2-by-2 density matrix. For the transformations, we allow all unitary
channels $U \cdot U^\dagger$. By construction, system $S$ satisfies the Initialization Requirement, as one can generate every
state from the initial state $\rho_0$. Moreover, all the transformations of system $S$ are unitary and therefore the
Conservation of Information is satisfied, both at the physical and the logical level. Therefore, the states of system
$S$ are dynamically pure. Of course, the states $U \rho_0 U^\dagger$ need not be extreme points of the convex set of all density
matrices, i.e., they need not be rank-one projectors. They are so only when the cyclic state $\rho_0$ is rank-one.
On the other hand, consider a similar example, where

• system $S$ is a qubit,
• the states are pure states, of the form $|\psi\rangle\langle\psi|$ for a generic unit vector $|\psi\rangle \in \mathbb{C}^2$,
• the transformations are unitary channels $V \cdot V^\dagger$, where the unitary matrix $V$ has real entries.

Using the Bloch sphere picture, the physical transformations are rotations around the y axis. Clearly,
the Initialization Requirement is not satisfied because there is no way to generate arbitrary points on the sphere
using only rotations around the y-axis. In this case, the states of $S$ are pure in the convex set sense, but not
dynamically pure.
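
The second half of Example 4 can be checked numerically. The sketch below is an illustration under stated assumptions: it samples random real orthogonal 2-by-2 matrices through a QR decomposition (a sampling method chosen here for convenience) and tracks the Bloch component $\langle\sigma_y\rangle$. For a real unitary $V$ one has $V^T \sigma_y V = \det(V)\,\sigma_y$, so the component is preserved when $\det V = 1$ and flips sign when $\det V = -1$; the code therefore compares absolute values.

```python
import numpy as np

SIGMA_Y = np.array([[0, -1j], [1j, 0]])

def bloch_y(psi):
    """Expectation value <psi|sigma_y|psi> of a normalized qubit vector."""
    return float(np.real(np.vdot(psi, SIGMA_Y @ psi)))

def random_real_unitary(rng):
    """Random 2x2 real orthogonal matrix, obtained from a QR decomposition."""
    q, _ = np.linalg.qr(rng.normal(size=(2, 2)))
    return q

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

# Random normalized qubit state with complex amplitudes.
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi = psi / np.linalg.norm(psi)

y_before = bloch_y(psi)
y_after = [bloch_y(random_real_unitary(rng) @ psi) for _ in range(100)]

# |y| never changes, so states with a different |y| are unreachable from psi.
y_is_invariant = all(np.isclose(abs(y), abs(y_before)) for y in y_after)
print(y_before, y_is_invariant)
```

Since $|\langle\sigma_y\rangle|$ is conserved by every allowed transformation, no single state can serve as a cyclic state for the whole Bloch sphere.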

For closed systems satisfying the Physical Conservation of Information, every pair of pure states
is interconvertible:

Proposition 13 (Transitive action on the pure states). If system $S$ is closed and satisfies the Physical
Conservation of Information, then, for every pair of states $\psi, \psi' \in \mathsf{St}(S)$, there exists a reversible transformation
$\mathcal{U} \in \mathsf{G}(S)$ such that $\psi' = \mathcal{U}\psi$.

Proof. By the Initialization Requirement, one has $\psi = \mathcal{V}\psi_0$ and $\psi' = \mathcal{V}'\psi_0$ for suitable $\mathcal{V}, \mathcal{V}' \in \mathsf{Transf}(S)$.
By the Physical Conservation of Information, all the transformations in $\mathsf{Transf}(S)$ are
physically reversible. Hence, $\psi' = \mathcal{V}' \circ \mathcal{V}^{-1}\psi = \mathcal{U}\psi$, having defined $\mathcal{U} = \mathcal{V}' \circ \mathcal{V}^{-1}$.

The requirement that all pure states be connected by reversible transformations has featured in
many axiomatizations of quantum theory, either directly [5,44–46], or indirectly as a special case of
other axioms [42,48]. Comparing our framework with the framework of general probabilistic theories,
we can see that the dynamical definition of pure states refers to a rather specific situation, in which all
pure states are connected, either to each other (in the case of physical reversibility) or to a fixed
cyclic state (in the case of logical reversibility).

9. Puriﬁcation
Here, we show that closed systems satisfying the Physical Conservation of Information also
satisfy the puriﬁcation property [8,12,13,15,49–51], namely the property that every mixed state can be
modelled as a pure state of a larger system in a canonical way. Under a certain regularity assumption,
the same holds for closed systems satisfying only the Logical Conservation of Information.

9.1. Puriﬁcation in Systems Satisfying the Physical Conservation of Information

Proposition 14 (Purification). Let $S$ be a closed system satisfying the Physical Conservation of Information.
Let $A$ be an agent in $S$, and let $B = A'$ be its maximal adversary. Then, for every state $\rho \in \mathsf{St}(S_A)$, there
exists a pure state $\psi \in \mathsf{St}(S)$, called the purification of $\rho$, such that $\rho = \mathrm{Tr}_B[\psi]$. Moreover, the purification
of $\rho$ is essentially unique: if $\psi' \in \mathsf{St}(S)$ is another pure state with $\mathrm{Tr}_B[\psi'] = \rho$, then there exists a reversible
transformation $\mathcal{U}_B \in \mathsf{G}_B$ such that $\psi' = \mathcal{U}_B \psi$.

Proof. By construction, the states of system $S_A$ are orbits of states of system $S$ under the adversarial
group $\mathsf{G}_B$. By Equation (71), every two states $\psi, \psi' \in \mathsf{St}(S)$ in the same orbit are connected by an
element of $\mathsf{G}_B$.

Note that the notion of purification used here is more general than the usual notion of purification
in quantum information and quantum foundations. The most important difference is that system
S A need not be a factor in a tensor product. Consider the example of the coherent superpositions vs.
classical mixtures (Section 4.3). There, systems S A and SB coincide, their states are classical probability
distributions, and the purifications are coherent superpositions. Two purifications of the same classical
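state $p = (p_1, p_2, \ldots, p_d)$ are two rank-one projectors $|\psi\rangle\langle\psi|$ and $|\psi'\rangle\langle\psi'|$ corresponding to unit vectors
of the form

$$|\psi\rangle = \sum_n \sqrt{p_n}\, e^{i\theta_n}\, |n\rangle \qquad \text{and} \qquad |\psi'\rangle = \sum_n \sqrt{p_n}\, e^{i\theta_n'}\, |n\rangle . \tag{74}$$

One purification can be obtained from the other by applying a diagonal unitary matrix. Specifically,
one has

$$|\psi'\rangle = U_B |\psi\rangle \qquad \text{with} \qquad U_B = \sum_n e^{i(\theta_n' - \theta_n)}\, |n\rangle\langle n| . \tag{75}$$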
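
Equations (74) and (75) lend themselves to a direct numerical check. The following sketch assumes a hypothetical dimension $d = 4$ and randomly drawn probabilities and phases; it only verifies that the two vectors share the same classical state and that the diagonal unitary of Equation (75) maps one to the other.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, chosen arbitrarily
d = 4                           # hypothetical dimension

# A classical state p and two arbitrary phase assignments theta, theta'.
p = rng.random(d)
p = p / p.sum()
theta = rng.uniform(0, 2 * np.pi, d)
theta_prime = rng.uniform(0, 2 * np.pi, d)

# The two purifications of Equation (74).
psi = np.sqrt(p) * np.exp(1j * theta)
psi_prime = np.sqrt(p) * np.exp(1j * theta_prime)

# The diagonal unitary of Equation (75).
U_B = np.diag(np.exp(1j * (theta_prime - theta)))

same_classical_state = np.allclose(np.abs(psi) ** 2, np.abs(psi_prime) ** 2)
connected_by_U_B = np.allclose(U_B @ psi, psi_prime)
U_B_is_unitary = np.allclose(U_B.conj().T @ U_B, np.eye(d))
print(same_classical_state, connected_by_U_B, U_B_is_unitary)
```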

For finite dimensional quantum systems, the notion of purification proposed here encompasses
both the notion of entanglement and the notion of coherent superposition. The case of infinite
dimensional systems will be discussed in the next subsection.

9.2. Puriﬁcation in Systems Satisfying the Logical Conservation of Information

For infinite dimensional quantum systems, every density matrix can be purified, but not all
purifications are connected by reversible transformations. Consider for example the unit vectors

$$|\psi\rangle_{AB} = \sqrt{1 - x^2}\, \sum_{n=0}^{\infty} x^n\, |n\rangle_A \otimes |n\rangle_B \qquad \text{and} \qquad |\psi'\rangle_{AB} = \sqrt{1 - x^2}\, \sum_{n=0}^{\infty} x^n\, |n\rangle_A \otimes |n+1\rangle_B , \tag{76}$$

for some $x \in [0, 1)$.

For every fixed $x \neq 0$, there is one and only one operator $V_B$ satisfying the condition $|\psi'\rangle_{AB} =
(I_A \otimes V_B)\,|\psi\rangle_{AB}$, namely the shift operator $V_B = \sum_{n=0}^{\infty} |n+1\rangle\langle n|$. However, $V_B$ is only an isometry,
not a unitary. This means that, if we define the states of system $S_A$ as equivalence classes of pure
states under local unitary equivalence, the two states $|\psi\rangle\langle\psi|$ and $|\psi'\rangle\langle\psi'|$ would end up in two
different equivalence classes.
One way to address the problem is to relax the requirement of reversibility and to consider the
monoid of isometries, defining

Transf (S) := {V · V † : V ∈ Lin(S) , V † V = I } . (77)

Given two purifications of the same state, say $|\psi\rangle$ and $|\psi'\rangle$, it is possible to show that at least one
of the following possibilities holds:
1. $|\psi'\rangle = (I_A \otimes V_B)\,|\psi\rangle$ for some isometry $V_B$ acting on system $S_B$,
2. $|\psi\rangle = (I_A \otimes V_B)\,|\psi'\rangle$ for some isometry $V_B$ acting on system $S_B$.
Unfortunately, this uniqueness property is not automatically valid in every system satisfying the
Logical Conservation of Information. Still, we will now show a regularity condition, under which the
uniqueness property is satisﬁed:


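Definition 16. Let $S$ be a system satisfying the Logical Conservation of Information, let $\mathsf{M} \subseteq \mathsf{Transf}(S)$ be a
monoid, and let $\mathsf{Deg}_{\mathsf{M}}(\psi)$ be the set defined by

$$\mathsf{Deg}_{\mathsf{M}}(\psi) = \{\, \mathcal{V}\psi \;:\; \mathcal{V} \in \mathsf{M} \,\} . \tag{78}$$

We say that the monoid $\mathsf{M} \subseteq \mathsf{Transf}(S)$ is regular iff

1. for every pair of states $\psi, \psi' \in \mathsf{St}(S)$, the condition $\mathsf{Deg}_{\mathsf{M}}(\psi) \cap \mathsf{Deg}_{\mathsf{M}}(\psi') \neq \emptyset$ implies that there exists
a transformation $\mathcal{U} \in \mathsf{M}$ such that $\psi' = \mathcal{U}\psi$ or $\psi = \mathcal{U}\psi'$,
2. for every pair of transformations $\mathcal{V}, \mathcal{V}' \in \mathsf{M}$, there exists a transformation $\mathcal{W} \in \mathsf{M}$ such that $\mathcal{V} = \mathcal{W} \circ \mathcal{V}'$
or $\mathcal{V}' = \mathcal{W} \circ \mathcal{V}$.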

The regularity conditions are satisﬁed in quantum theory by the monoid of isometries.

Example 5 (Isometric channels in quantum theory). Let $S$ be a quantum system with separable Hilbert
space $\mathcal{H}$, of dimension $d \le \infty$. Let $\mathsf{St}(S)$ be the set of all pure quantum states, and let $\mathsf{Transf}(S)$ be the monoid of
all isometric channels.
We now show that the monoid $\mathsf{M} = \mathsf{Transf}(S)$ is regular. The first regularity condition is immediate
because for every pair of unit vectors $|\psi\rangle$ and $|\psi'\rangle$ there exists an isometry (in fact, a unitary) $U$ such that
$|\psi'\rangle = U|\psi\rangle$. Trivially, this implies the relation $|\psi'\rangle\langle\psi'| = U |\psi\rangle\langle\psi| U^\dagger$ at the level of quantum states and
isometric channels.
Let us see that the second regularity condition holds. Let $V, V' \in \mathsf{Lin}(\mathcal{H})$ be two isometries on $\mathcal{H}$, and let
$\{|i\rangle\}_{i=1}^{d}$ be the standard basis for $\mathcal{H}$. Then, the isometries $V$ and $V'$ can be written as

$$V = \sum_{i=1}^{d} |\phi_i\rangle\langle i| \qquad \text{and} \qquad V' = \sum_{i=1}^{d} |\phi_i'\rangle\langle i| , \tag{79}$$

where $\{|\phi_i\rangle\}_{i=1}^{d}$ and $\{|\phi_i'\rangle\}_{i=1}^{d}$ are orthonormal vectors (not necessarily forming bases for the whole Hilbert
space $\mathcal{H}$). Define the subspaces $\mathcal{S} = \mathrm{Span}\{|\phi_i\rangle\}_{i=1}^{d}$ and $\mathcal{S}' = \mathrm{Span}\{|\phi_i'\rangle\}_{i=1}^{d}$, and let $\{|\psi_j\rangle\}_{j=1}^{r}$ and $\{|\psi_j'\rangle\}_{j=1}^{r'}$
be orthonormal bases for the orthogonal complements $\mathcal{S}^\perp$ and $\mathcal{S}'^\perp$, respectively. If $r \le r'$, we define the isometry

$$W = \sum_{i=1}^{d} |\phi_i'\rangle\langle\phi_i| + \sum_{j=1}^{r} |\psi_j'\rangle\langle\psi_j| , \tag{80}$$

and we obtain the condition $V' = WV$. Alternatively, if $r' \le r$, we can define the isometry

$$W = \sum_{i=1}^{d} |\phi_i\rangle\langle\phi_i'| + \sum_{j=1}^{r'} |\psi_j\rangle\langle\psi_j'| , \tag{81}$$

and we obtain the condition $V = WV'$. At the level of isometric channels, we obtained the condition $\mathcal{V}' = \mathcal{W} \circ \mathcal{V}$
or the condition $\mathcal{V} = \mathcal{W} \circ \mathcal{V}'$, with $\mathcal{V}(\cdot) = V \cdot V^\dagger$, $\mathcal{V}'(\cdot) = V' \cdot V'^\dagger$, and $\mathcal{W}(\cdot) = W \cdot W^\dagger$.
The fact that the monoid of all isometric channels is regular implies that other monoids of isometric channels
are also regular. For example, if the Hilbert space $\mathcal{H}$ has the tensor product structure $\mathcal{H} = \mathcal{H}_A \otimes \mathcal{H}_B$, then the
monoid of local isometric channels, defined by isometries of the form $I_A \otimes V_B$, is regular. More generally, if the
Hilbert space is decomposed as

$$\mathcal{H} = \bigoplus_{k} \left( \mathcal{H}_{A,k} \otimes \mathcal{H}_{B,k} \right) , \tag{82}$$

then the monoid of isometric channels generated by isometries of the form

$$V = \bigoplus_{k} \left( I_{A,k} \otimes V_{B,k} \right) \tag{83}$$

is regular.

We are now in position to derive the puriﬁcation property for general closed systems:

Proposition 15. Let $S$ be a closed system. Let $A$ be an agent and let $B = A'$ be its maximal adversary.
If $\mathsf{Act}(B; S)$ is a regular monoid, the condition $\mathrm{Tr}_B[\psi] = \mathrm{Tr}_B[\psi']$ implies that there exists some invertible
transformation $\mathcal{V}_B \in \mathsf{Transf}(B; S)$ such that the relation $\psi' = \mathcal{V}_B \psi$ or the relation $\psi = \mathcal{V}_B \psi'$ holds.

The proof is provided in Appendix H. In conclusion, we obtained the following

Corollary 1 (Purification). Let $S$ be a closed system, let $A$ be an agent in $S$, and let $B = A'$ be its maximal
adversary. If the monoid $\mathsf{Act}(B; S)$ is regular, then every state $\rho \in \mathsf{St}(S_A)$ has a purification $\psi \in \mathsf{St}(S)$, i.e.,
a state such that $\rho = \mathrm{Tr}_B[\psi]$. Moreover, the purification is essentially unique: if $\psi' \in \mathsf{St}(S)$ is another state
with $\mathrm{Tr}_B[\psi'] = \rho$, then there exists a reversible transformation $\mathcal{V}_B \in \mathsf{Act}(B; S)$ such that the relation $\psi' = \mathcal{V}_B \psi$
or the relation $\psi = \mathcal{V}_B \psi'$ holds.

10. Example: Group Representations on Quantum State Spaces

We conclude the paper with a macro-example, involving group representations in closed-system
quantum theory. The point of this example is to illustrate the general notion of purification introduced
in this paper and to characterize the sets of mixed states associated with different agents.
As system S, we consider a quantum system with Hilbert space HS , possibly of infinite dimension.
We let St(S) be the set of pure quantum states, and let G(S) be the group of all unitary channels.
With this choice, the total system is closed and satisfies the Physical Conservation of Information.
Suppose that agent A is able to perform a group of transformations, such as e.g., the group of
phase shifts on a harmonic oscillator, or the group of rotations of a spin j particle. Mathematically,
we focus our attention on unitary channels arising from some representation of a given compact
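group $G$. Denoting the representation as $U : G \to \mathsf{Lin}(\mathcal{H}_S)$, $g \mapsto U_g$, the group of Alice’s actions is

$$\mathsf{G}_A = \{\, \mathcal{U}_g(\cdot) = U_g \cdot U_g^\dagger \;:\; g \in G \,\} . \tag{84}$$

The maximal adversary of $A$ is the agent $B = A'$ who is able to perform all unitary channels $\mathcal{V}$
that commute with those in $\mathsf{G}_A$, namely, the unitary channels in the group

$$\mathsf{G}_B := \{\, \mathcal{V} \in \mathsf{G}(S) \;:\; \mathcal{V} \circ \mathcal{U}_g = \mathcal{U}_g \circ \mathcal{V} \quad \forall g \in G \,\} . \tag{85}$$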

Speciﬁcally, the channels V correspond to unitary operators V satisfying the relation

VUg = ω (V, g) Ug V ∀g ∈ G , (86)

where, for every fixed V, the function ω(V, ·) : G → C is a multiplicative character, i.e., a one-dimensional
representation of the group G.
Note that, if two unitaries V and W satisfy Equation (86) with multiplicative characters ω (V, ·)
and ω (W, ·), respectively, then their product VW satisfies Equation (86) with multiplicative character
ω (VW, ·) = ω (V, ·) ω (W, ·). This means that the function ω : GB × G → C is a multiplicative
bicharacter: ω (V, ·) is a multiplicative character for G for every fixed V ∈ GB , and, at the same time,
ω (·, g) is a multiplicative character for GB for every fixed g ∈ G.
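
A small illustrative instance of Equation (86) with a non-trivial character can be built from the Pauli matrices. The choice below (the group $\mathbb{Z}_2$ represented by $\{I, Z\}$ on a qubit, with candidate adversary $V = X$) and the helper `character` are assumptions made for this sketch and are not taken from the paper's appendices.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# Hypothetical choice: G = Z_2 represented by {I, Z}; candidate adversary V = X.
representation = {0: I2, 1: Z}
V = X

def character(V, U_g):
    """Return omega such that V U_g = omega U_g V, or None if no phase works."""
    lhs, rhs = V @ U_g, U_g @ V
    idx = np.unravel_index(np.argmax(np.abs(rhs)), rhs.shape)
    omega = lhs[idx] / rhs[idx]
    return omega if np.allclose(lhs, omega * rhs) else None

omegas = {g: character(V, U_g) for g, U_g in representation.items()}

# The channels commute even though the operators only commute up to a phase.
rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])
channels_commute = all(
    np.allclose(V @ (U @ rho @ U.conj().T) @ V.conj().T,
                U @ (V @ rho @ V.conj().T) @ U.conj().T)
    for U in representation.values()
)
print(omegas, channels_commute)
```

Here $X$ does not belong to the commutant of the representation, yet the channel $X \cdot X^\dagger$ commutes with every channel in $\mathsf{G}_A$, so it belongs to the adversarial group in this toy example.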


The adversarial group $\mathsf{G}_B$ contains the commutant of the representation $U : g \mapsto U_g$, consisting of
all the unitaries $V$ such that

$$V U_g = U_g V \qquad \forall g \in G . \tag{87}$$

The unitaries in the commutant satisfy Equation (86) with the trivial multiplicative character
ω (V, g) = 1, ∀ g ∈ G. In general, the adversarial group may contain other unitary operators,
corresponding to non-trivial multiplicative characters. The full characterization of the adversarial
group is provided by the following theorem:

Theorem 3. Let $G$ be a compact group, let $U : G \to \mathsf{Lin}(\mathcal{H})$ be a projective representation of $G$, and let $\mathsf{G}_A$ be
the group of channels $\mathsf{G}_A := \{ U_g \cdot U_g^\dagger : g \in G \}$. Then, the adversarial group $\mathsf{G}_B$ is isomorphic to the semidirect
product $\mathsf{A} \ltimes U'$, where $U'$ is the commutant of the set $\{ U_g : g \in G \}$, and $\mathsf{A}$ is an Abelian subgroup of the
group of permutations of $\mathrm{Irr}(U)$, the set of irreducible representations contained in the decomposition of the
representation $U_g$.

The proof is provided in Appendix I, and a simple example is presented in Appendix J.

In the following, we will illustrate the construction of the state space $S_A$ in the prototypical
example where the group $G$ is a compact connected Lie group.

Compact Connected Lie Groups

When G is a compact connected Lie group, the characterization of the adversarial group is
simpliﬁed by the following theorem:

Theorem 4. If G is a compact connected Lie group, then the Abelian subgroup A of Theorem 3 is trivial, and all
the solutions of Equation (86) have ω (V, g) = 1 ∀ g ∈ G.

The proof is provided in Appendix K.

For compact connected Lie groups, the adversarial group coincides exactly with the
commutant of the representation $U : G \to \mathsf{Lin}(\mathcal{H}_S)$. An explicit expression can be obtained in
terms of the isotypic decomposition [78]

$$U_g = \bigoplus_{j \in \mathrm{Irr}(U)} \left( U_g^{(j)} \otimes I_{\mathcal{M}_j} \right) , \tag{88}$$

where $\mathrm{Irr}(U)$ is the set of irreducible representations (irreps) of $G$ contained in the decomposition of $U$,
$U^{(j)} : g \mapsto U_g^{(j)}$ is the irreducible representation of $G$ acting on the representation space $\mathcal{R}_j$, and $I_{\mathcal{M}_j}$ is
the identity acting on the multiplicity space $\mathcal{M}_j$. From this expression, it is clear that the adversarial
group $\mathsf{G}_B$ consists of unitary gates $V$ of the form

$$V = \bigoplus_{j \in \mathrm{Irr}(U)} \left( I_{\mathcal{R}_j} \otimes V_j \right) , \tag{89}$$

where $I_{\mathcal{R}_j}$ is the identity operator on the representation space $\mathcal{R}_j$, and $V_j$ is a generic unitary operator
on the multiplicity space $\mathcal{M}_j$.
In general, the agents $A$ and $B = A'$ do not form a dual pair. Indeed, it is not hard to see that the
maximal adversary of $B$ is the agent $C = A''$ that can perform every unitary channel $\mathcal{U}(\cdot) = U \cdot U^\dagger$,
where $U$ is a unitary operator of the form

$$U = \bigoplus_{j \in \mathrm{Irr}(U)} \left( U_j \otimes I_{\mathcal{M}_j} \right) , \tag{90}$$

$U_j$ being a generic unitary operator on the representation space $\mathcal{R}_j$. When $A$ and $B$ form a dual pair,
the groups $\mathsf{G}_A$ and $\mathsf{G}_B$ are sometimes called gauge groups [79].
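It is now easy to characterize the subsystem $S_A$. Its states are equivalence classes of pure states
under the relation $|\psi\rangle\langle\psi| \simeq_A |\psi'\rangle\langle\psi'|$ iff

$$\exists\, U_B \in \mathsf{G}_B \quad \text{such that} \quad |\psi'\rangle = U_B |\psi\rangle . \tag{91}$$

It is easy to see that two states in the same equivalence class must satisfy the condition

$$\mathrm{Tr}_B(|\psi'\rangle\langle\psi'|) = \mathrm{Tr}_B(|\psi\rangle\langle\psi|) , \tag{92}$$

where the “partial trace over agent B”, denoted $\mathrm{Tr}_B$, is the map

$$\mathrm{Tr}_B(\rho) := \bigoplus_{j \in \mathrm{Irr}(U)} \mathrm{Tr}_{\mathcal{M}_j}\!\left[ \Pi_j\, \rho\, \Pi_j \right] , \tag{93}$$

$\Pi_j$ being the projector on the subspace $\mathcal{R}_j \otimes \mathcal{M}_j$.

Conversely, it is possible to show that the state $\mathrm{Tr}_B(|\psi\rangle\langle\psi|)$ completely identifies the equivalence
class $[\,|\psi\rangle\langle\psi|\,]_A$.

Proposition 16. Let $|\psi\rangle, |\psi'\rangle \in \mathcal{H}_S$ be two unit vectors such that $\mathrm{Tr}_B(|\psi\rangle\langle\psi|) = \mathrm{Tr}_B(|\psi'\rangle\langle\psi'|)$. Then, there
exists a unitary operator $U_B \in \mathsf{G}_B$ such that $|\psi'\rangle = U_B |\psi\rangle$.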

The proof is provided in Appendix L.
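
The structure of Equations (89) and (93) can be illustrated on a small example. The sketch below assumes a hypothetical representation of $G = U(1)$ on two qubits, $U_\theta = \mathrm{diag}(1, e^{i\theta}, e^{i\theta}, e^{2i\theta})$, whose three irreps are one-dimensional with multiplicity spaces of dimensions 1, 2, and 1. The function names and the example itself are illustrative choices. Because every $\mathcal{R}_j$ is one-dimensional here, the map of Equation (93) reduces to the list of weights $\mathrm{Tr}[\Pi_j \rho]$.

```python
import numpy as np

# Hypothetical example: G = U(1) acting on two qubits as
# U_theta = diag(1, e^{i theta}, e^{i theta}, e^{2 i theta}).
# Irreps j = 0, 1, 2 are one-dimensional; multiplicity spaces have dims 1, 2, 1.
blocks = [[0], [1, 2], [3]]  # basis indices spanning each R_j (x) M_j

def U(theta):
    return np.diag(np.exp(1j * theta * np.array([0, 1, 1, 2])))

def random_unitary(dim, rng):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    q, _ = np.linalg.qr(rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim)))
    return q

def adversary_unitary(rng):
    """Block-diagonal unitary of the form (89): generic on each M_j."""
    V = np.zeros((4, 4), dtype=complex)
    for idx in blocks:
        V[np.ix_(idx, idx)] = random_unitary(len(idx), rng)
    return V

def partial_trace_B(rho):
    """Equation (93) for this example: one weight Tr[Pi_j rho] per irrep."""
    return np.array([np.real(np.trace(rho[np.ix_(idx, idx)])) for idx in blocks])

rng = np.random.default_rng(2)  # fixed seed, chosen arbitrarily
V = adversary_unitary(rng)
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi = psi / np.linalg.norm(psi)
psi_prime = V @ psi

commutes = all(np.allclose(V @ U(t), U(t) @ V) for t in np.linspace(0, 2 * np.pi, 7))
rho = np.outer(psi, psi.conj())
rho_prime = np.outer(psi_prime, psi_prime.conj())
same_reduced_state = np.allclose(partial_trace_B(rho), partial_trace_B(rho_prime))
print(commutes, same_reduced_state)
```

In this example the states of $S_A$ are simply probability distributions over the three charge sectors, and any two pure states with the same distribution differ by a block-diagonal unitary.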

We have seen that the states of system $S_A$ are in one-to-one correspondence with the density
matrices of the form $\mathrm{Tr}_B(|\psi\rangle\langle\psi|)$, where $|\psi\rangle \in \mathcal{H}_S$ is a generic pure state. Note that the rank of the
density matrices $\rho_j$ in Equation (A109) cannot be larger than the dimensions of the spaces $\mathcal{R}_j$ and $\mathcal{M}_j$,
denoted as $d_{\mathcal{R}_j}$ and $d_{\mathcal{M}_j}$, respectively. Taking this fact into account, we can represent the states of $S_A$ as

$$\mathsf{St}(S_A) \simeq \Big\{\, \rho = \bigoplus_{j \in \mathrm{Irr}(U)} p_j\, \rho_j \;:\; \rho_j \in \mathsf{QSt}(\mathcal{R}_j) ,\ \mathrm{Rank}(\rho_j) \le \min\{ d_{\mathcal{R}_j}, d_{\mathcal{M}_j} \} \,\Big\} , \tag{94}$$

where $\{p_j\}$ is a generic probability distribution. The state space of system $S_A$ is not convex, unless
the condition

$$d_{\mathcal{M}_j} \ge d_{\mathcal{R}_j} \qquad \forall j \in \mathrm{Irr}(U) \tag{95}$$

is satisfied. Basically, in order to obtain a convex set of density matrices, we need the total system S to
be “sufficiently large” compared to its subsystem S A . This observation is a clue suggesting that the
standard convex framework could be considered as the effective description of subsystems of “large”
closed systems.
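
The rank constraint in Equation (94) can be probed numerically in the simplest setting of a single irrep. The sketch below assumes a hypothetical representation space of dimension $d_{\mathcal{R}} = 3$ and compares a multiplicity space of dimension 1 with one of dimension 3; the helper names and the random sampling are illustrative choices.

```python
import numpy as np

def reduced_state_on_R(psi, d_R, d_M):
    """Trace out the multiplicity space M from a pure state on R (x) M."""
    amplitudes = psi.reshape(d_R, d_M)       # psi = sum_{r,m} a[r, m] |r>|m>
    return amplitudes @ amplitudes.conj().T  # d_R x d_R density matrix

def max_rank_seen(d_R, d_M, rng, samples=200):
    """Largest rank of the reduced state over randomly sampled pure states."""
    best = 0
    for _ in range(samples):
        psi = rng.normal(size=d_R * d_M) + 1j * rng.normal(size=d_R * d_M)
        psi = psi / np.linalg.norm(psi)
        rank = np.linalg.matrix_rank(reduced_state_on_R(psi, d_R, d_M))
        best = max(best, int(rank))
    return best

rng = np.random.default_rng(3)  # fixed seed, chosen arbitrarily

# Hypothetical single-irrep example with d_R = 3.
rank_small_M = max_rank_seen(d_R=3, d_M=1, rng=rng)  # condition (95) violated
rank_large_M = max_rank_seen(d_R=3, d_M=3, rng=rng)  # condition (95) satisfied
print(rank_small_M, rank_large_M)
```

With a one-dimensional multiplicity space only rank-one states on $\mathcal{R}$ arise, so a mixture of two distinct allowed states is not itself allowed; once the multiplicity space is as large as the representation space, full-rank states appear as well.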
Finally, note that, in agreement with the general construction, the pure states of system S are
“purifications" of the states of the system S A . Every state of system S A can be obtained from a pure
state of system S by “tracing out" system SB . Moreover, every two purifications of the same state are
connected by a unitary transformation in GB .

11. Conclusions
In this paper, we adopted a rather minimalistic framework, in which a single physical system was
described solely in terms of states and transformations, without introducing measurements. Or at least,
without introducing measurements in an explicit way: of course, one could always interpret certain
transformations as “measurement processes", but this interpretation is not necessary for any of the
conclusions drawn in this paper.


Our framework can be interpreted in two ways. One way is to think of it as a fragment of
the larger framework of operational-probabilistic theories [8,11–13], in which systems can be freely
composed and measurements are explicitly described. The other way is to regard our framework as
a dynamicist framework, meant to describe physical systems per se, independently of any observer.
Both approaches are potentially fruitful.
On the operational-probabilistic side, it is interesting to see how the definition of subsystem
adopted in this paper interacts with probabilities. For example, we have seen in a few examples that
the state space of a subsystem is not always convex: convex combinations of allowed states are not
necessarily allowed states. It is then natural to ask: under which condition is convexity retrieved?
In a different context, the non-trivial relation between convexity and the dynamical notion of system
has emerged in a work of Galley and Masanes [80]. There, the authors studied alternatives to
quantum theory where the closed systems have the same states and the same dynamics as closed
quantum systems, while the measurements are different from the quantum measurements. Among
these theories, they found that quantum theory is the only theory where subsystems have a convex
state space. These and similar clues are an indication that the interplay between dynamical notions
and probabilistic notions plays an important role in determining the structure of physical theories.
Studying this interplay is a promising avenue of future research.
On the opposite end of the spectrum, it is interesting to explore how far the measurement-free
approach can reach. An interesting research project is to analyze the notions of subsystem, pure
state, and purification, in the context of algebraic quantum field theory [22] and quantum statistical
mechanics [32]. This is important because the notion of pure state as an extreme point of the convex
set breaks down for type III von Neumann algebras [81], whereas the notions used in this paper
(commutativity of operations, cyclicity of states) would still hold. Another promising clue is the
existence of dual pairs of non-overlapping agents, which amounts to the requirement that the set
of operations of each agent has trivial center and coincides with its double commutant. A similar
condition plays an important role in the algebraic framework, where the operator algebras with trivial
center are known as factors, and are at the basis of the theory of von Neumann algebras [82,83].
Finally, another interesting direction is to enrich the structure of system with additional features,
such as a metric, quantifying the proximity of states. In particular, one may consider a strengthened
formulation of the Conservation of Information, in which the physical transformations are required
not only to be invertible, but also to preserve the distances. It is then interesting to consider how the
metric on the pure states of the whole system induces a metric on the subsystems, and to search for
relations between global metric and local metric. Also in this case, there is a promising precedent,
namely the work of Uhlmann [84], which led to the notion of fidelity [85]. All these potential avenues
of future research suggest that the notions investigated in this work may find application in a variety
of different contexts, and for a variety of interpretational standpoints.

Acknowledgments: It is a pleasure to thank Gilles Brassard and Paul Raymond-Robichaud for stimulating
discussions on their recent work [66], Adán Cabello, Markus Müller, and Matthias Kleinmann for providing
motivation to the problem of deriving subsystems, Mauro D’Ariano and Paolo Perinotti for the invitation to
contribute to this Special Issue, and Christopher Timpson and Adam Coulton for an invitation to present at the
Oxford Philosophy of Physics Seminar Series, whose engaging atmosphere stimulated me to think about extensions
of the Purification Principle. I am also grateful to the three referees of this paper for useful suggestions, and to
Robert Spekkens, Doreen Fraser, Lídia del Rio, Thomas Galley, John Selby, Ryszard Kostecki, and David Schmidt
for interesting discussions during the revision of the original manuscript. This work is supported by the
Foundational Questions Institute through grant FQXi-RFP3-1325, the National Natural Science Foundation
of China through grant 11675136, the Croucher Foundation, the Canadian Institute for Advanced Research
(CIFAR), and the Hong Kong Research Grant Council through grant 17326616. This publication was made possible
through the support of a grant from the John Templeton Foundation. The opinions expressed in this publication
are those of the authors and do not necessarily reflect the views of the John Templeton Foundation. The authors
also acknowledge the hospitality of Perimeter Institute for Theoretical Physics. Research at Perimeter Institute
is supported by the Government of Canada through the Department of Innovation, Science and Economic
Development Canada and by the Province of Ontario through the Ministry of Research, Innovation and Science.
Conflicts of Interest: The author declares no conflict of interest.


Appendix A. Proof That Deﬁnitions (20) and (21) Are Well-Posed

We give only the proof for deﬁnition (20), as the other proof follows the same argument.

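Proposition A1. If the transformations $\mathcal{S}, \widetilde{\mathcal{S}}, \mathcal{T}, \widetilde{\mathcal{T}} \in \mathsf{Act}(A; S)$ are such that $[\mathcal{S}]_A = [\widetilde{\mathcal{S}}]_A$ and $[\mathcal{T}]_A = [\widetilde{\mathcal{T}}]_A$,
then $[\mathcal{S} \circ \mathcal{T}]_A = [\widetilde{\mathcal{S}} \circ \widetilde{\mathcal{T}}]_A$.

Proof. Let $(\mathcal{S}_1, \mathcal{S}_2, \ldots, \mathcal{S}_m) \subset \mathsf{Act}(A; S)$ and $(\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_n) \subset \mathsf{Act}(A; S)$ be two finite sequences
such that

$$\begin{aligned}
\mathcal{S}_1 &= \mathcal{S} , & \mathcal{S}_m &= \widetilde{\mathcal{S}} , & \mathsf{Deg}_{A'}(\mathcal{S}_i) \cap \mathsf{Deg}_{A'}(\mathcal{S}_{i+1}) &\neq \emptyset \quad \forall i \in \{1, \ldots, m-1\} , \\
\mathcal{T}_1 &= \mathcal{T} , & \mathcal{T}_n &= \widetilde{\mathcal{T}} , & \mathsf{Deg}_{A'}(\mathcal{T}_j) \cap \mathsf{Deg}_{A'}(\mathcal{T}_{j+1}) &\neq \emptyset \quad \forall j \in \{1, \ldots, n-1\} .
\end{aligned} \tag{A1}$$

Without loss of generality, we assume that the two finite sequences have the same length $m = n$.
When this is not the case, one can always add dummy entries and ensure that the two sequences have
the same length: for example, if $m < n$, one can always define $\mathcal{S}_i := \mathcal{S}_m$ for all $i \in \{m+1, \ldots, n\}$.
Equation (A1) means that for every $i$ and $j$ there exist transformations $\mathcal{B}_i, \widetilde{\mathcal{B}}_i, \mathcal{C}_j, \widetilde{\mathcal{C}}_j \in \mathsf{Act}(A; S)'$
such that

$$\begin{aligned}
\mathcal{B}_i \circ \mathcal{S}_i &= \widetilde{\mathcal{B}}_i \circ \mathcal{S}_{i+1} , \\
\mathcal{C}_j \circ \mathcal{T}_j &= \widetilde{\mathcal{C}}_j \circ \mathcal{T}_{j+1} .
\end{aligned} \tag{A2}$$

Using the above equalities for $i = j$, and using the fact that transformations in $\mathsf{Act}(A; S)$ commute
with transformations in $\mathsf{Act}(A; S)'$, we obtain

$$\begin{aligned}
\left( \mathcal{B}_i \circ \mathcal{C}_i \right) \circ \left( \mathcal{S}_i \circ \mathcal{T}_i \right) &= \left( \mathcal{B}_i \circ \mathcal{S}_i \right) \circ \left( \mathcal{C}_i \circ \mathcal{T}_i \right) \\
&= \left( \widetilde{\mathcal{B}}_i \circ \mathcal{S}_{i+1} \right) \circ \left( \widetilde{\mathcal{C}}_i \circ \mathcal{T}_{i+1} \right) \\
&= \left( \widetilde{\mathcal{B}}_i \circ \widetilde{\mathcal{C}}_i \right) \circ \left( \mathcal{S}_{i+1} \circ \mathcal{T}_{i+1} \right) .
\end{aligned} \tag{A3}$$

In short, we proved that

$$\mathsf{Deg}_{A'}(\mathcal{S}_i \circ \mathcal{T}_i) \cap \mathsf{Deg}_{A'}(\mathcal{S}_{i+1} \circ \mathcal{T}_{i+1}) \neq \emptyset \qquad \forall i \in \{1, \ldots, n-1\} . \tag{A4}$$

To conclude, observe that the sequence $(\mathcal{S}_1 \circ \mathcal{T}_1, \mathcal{S}_2 \circ \mathcal{T}_2, \ldots, \mathcal{S}_n \circ \mathcal{T}_n)$ satisfies $\mathcal{S}_1 \circ \mathcal{T}_1 = \mathcal{S} \circ \mathcal{T}$,
$\mathcal{S}_n \circ \mathcal{T}_n = \widetilde{\mathcal{S}} \circ \widetilde{\mathcal{T}}$, and Equation (A4). By definition, this means that the transformations $\mathcal{S} \circ \mathcal{T}$ and
$\widetilde{\mathcal{S}} \circ \widetilde{\mathcal{T}}$ are in the same equivalence class.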

Appendix B. The Commutant of the Local Channels

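Here, we show that the commutant of the quantum channels of the form $\mathcal{A} \otimes \mathcal{I}_B$ consists of
quantum channels of the form $\mathcal{I}_A \otimes \mathcal{B}$.
Let $\mathcal{C} \in \mathsf{Chan}(S)$ be a quantum channel that commutes with all channels of the form $\mathcal{A} \otimes \mathcal{I}_B$,
with $\mathcal{A} \in \mathsf{Chan}(A)$. For a fixed unit vector $|\alpha\rangle \in \mathcal{H}_A$, consider the erasure channel $\mathcal{A}_\alpha \in \mathsf{Chan}(A)$
defined by

$$\mathcal{A}_\alpha(\rho) = |\alpha\rangle\langle\alpha|\, \mathrm{Tr}[\rho] \qquad \forall \rho \in \mathsf{Lin}(A) . \tag{A5}$$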


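Then, the commutation condition $\mathcal{C} \circ (\mathcal{A}_\alpha \otimes \mathcal{I}_B) = (\mathcal{A}_\alpha \otimes \mathcal{I}_B) \circ \mathcal{C}$ implies

\[
\begin{aligned}
\mathcal{C}(|\alpha\rangle\langle\alpha| \otimes |\beta\rangle\langle\beta|) &= \mathcal{C}\,(\mathcal{A}_\alpha \otimes \mathcal{I}_B)(|\alpha\rangle\langle\alpha| \otimes |\beta\rangle\langle\beta|) \\
&= (\mathcal{A}_\alpha \otimes \mathcal{I}_B)\,\mathcal{C}(|\alpha\rangle\langle\alpha| \otimes |\beta\rangle\langle\beta|) \\
&= |\alpha\rangle\langle\alpha| \otimes \operatorname{Tr}_A\big[\mathcal{C}(|\alpha\rangle\langle\alpha| \otimes |\beta\rangle\langle\beta|)\big] \qquad \forall |\beta\rangle \in \mathcal{H}_B .
\end{aligned}
\tag{A6}
\]

Tracing over $B$ on both sides of Equation (A6), we obtain

\[
\operatorname{Tr}_B\big[\mathcal{C}(|\alpha\rangle\langle\alpha| \otimes |\beta\rangle\langle\beta|)\big] = |\alpha\rangle\langle\alpha| .
\tag{A7}
\]

The above relation implies that the state $\mathcal{C}(|\alpha\rangle\langle\alpha| \otimes |\beta\rangle\langle\beta|)$ is of the form

\[
\mathcal{C}(|\alpha\rangle\langle\alpha| \otimes |\beta\rangle\langle\beta|) = |\alpha\rangle\langle\alpha| \otimes \mathcal{B}(|\beta\rangle\langle\beta|),
\tag{A8}
\]

for some suitable channel $\mathcal{B} \in \mathsf{Chan}(B)$. Since $|\alpha\rangle$ and $|\beta\rangle$ are arbitrary, we obtained $\mathcal{C} = \mathcal{I}_A \otimes \mathcal{B}$.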
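
The commutation argument above can be checked numerically on two qubits. The following is a minimal sketch, not part of the original proof: the helper names (`apply`, `commutes_on`), the bit-flip channel on $B$, and the SWAP unitary are illustrative assumptions. It tests commutation only on one product state, which is enough to show that a channel of the form $\mathcal{I}_A \otimes \mathcal{B}$ passes the test while a channel acting non-trivially on $A$ fails it.

```python
from math import sqrt

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Conjugate transpose."""
    return [[a[j][i].conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def kron(a, b):
    """Kronecker product; the first factor is system A."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def apply(kraus, rho):
    """Apply the channel with the given Kraus operators to rho."""
    dim = len(rho)
    out = [[0.0] * dim for _ in range(dim)]
    for k in kraus:
        term = matmul(matmul(k, rho), dagger(k))
        out = [[out[i][j] + term[i][j] for j in range(dim)] for i in range(dim)]
    return out

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def commutes_on(ch1, ch2, rho):
    """Check ch1(ch2(rho)) == ch2(ch1(rho)) for one input state."""
    return close(apply(ch1, apply(ch2, rho)), apply(ch2, apply(ch1, rho)))

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]

# Erasure channel A_alpha (x) I_B of Equation (A5), with |alpha> = |0>:
# Kraus operators |0><0| and |0><1| on A.
erase_A = [kron(k, I2) for k in ([[1, 0], [0, 0]], [[0, 1], [0, 0]])]

# I_A (x) B, with B a bit flip of probability p (assumed example).
p = 0.25
local_B = [kron(I2, [[sqrt(1 - p) * x for x in row] for row in I2]),
           kron(I2, [[sqrt(p) * x for x in row] for row in X])]

# SWAP unitary: acts non-trivially on A, so it is not of the form I_A (x) B.
swap = [[[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]]

rho = [[0.0] * 4 for _ in range(4)]
rho[1][1] = 1.0  # product state |0><0| (x) |1><1|

local_commutes = commutes_on(local_B, erase_A, rho)
swap_commutes = commutes_on(swap, erase_A, rho)
```

In this sketch `local_commutes` holds and `swap_commutes` does not, in line with the characterization $\mathcal{C} = \mathcal{I}_A \otimes \mathcal{B}$.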

Appendix C. Subsystems Associated to Finite Dimensional Von Neumann Algebras

Here, we prove the statements made in the main text about quantum channels with Kraus
operators in a given algebra.

Appendix C.1. The Commutant of Chan(A)

The purpose of this subsection is to prove the following theorem:

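Theorem A1. Let $\mathsf{A}$ be a von Neumann subalgebra of $M_d(\mathbb{C})$, $d < \infty$, and let $\mathsf{Chan}(\mathsf{A})$ be the set of quantum channels with Kraus operators in $\mathsf{A}$. Then, the commutant of $\mathsf{Chan}(\mathsf{A})$ is the set of channels with Kraus operators in the algebra $\mathsf{A}'$. In formula,

\[
\mathsf{Chan}(\mathsf{A})' = \mathsf{Chan}(\mathsf{A}') .
\tag{A9}
\]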

The proof consists of a few lemmas, provided in the following.

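Lemma A1. Every channel $\mathcal{D} \in \mathsf{Chan}(\mathsf{A})'$ must satisfy the condition

\[
\mathcal{P}_l \circ \mathcal{D} \circ \mathcal{P}_k = 0 \quad \forall l \neq k ,
\tag{A10}
\]

where $\mathcal{P}_k$ is the CP map $\mathcal{P}_k(\cdot) := \Pi_k \cdot \Pi_k$, and $\Pi_k$ is the projector on the subspace $\mathcal{H}_{A_k} \otimes \mathcal{H}_{B_k}$ in Equation (31).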

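Proof. Consider the quantum channel $\mathcal{C} \in \mathsf{Chan}(\mathsf{A})$ defined as

\[
\mathcal{C} := \bigoplus_k \big( |\alpha_k\rangle\langle\alpha_k| \operatorname{Tr}_{A_k} \otimes \, \mathcal{I}_{B_k} \big) \circ \mathcal{P}_k ,
\tag{A11}
\]

where each $|\alpha_k\rangle$ is a generic (but otherwise fixed) unit vector in $\mathcal{H}_{A_k}$ and $\mathcal{I}_{B_k}$ is the identity map on $\mathsf{Lin}(\mathcal{H}_{B_k})$. By definition, every channel $\mathcal{D} \in \mathsf{Chan}(\mathsf{A})'$ must satisfy the condition $\mathcal{C} \circ \mathcal{D} = \mathcal{D} \circ \mathcal{C}$. In particular, we must have

\[
\begin{aligned}
\mathcal{D}(|\alpha_k\rangle\langle\alpha_k| \otimes |\beta_k\rangle\langle\beta_k|) &= (\mathcal{D} \circ \mathcal{C})(|\alpha_k\rangle\langle\alpha_k| \otimes |\beta_k\rangle\langle\beta_k|) \\
&= (\mathcal{C} \circ \mathcal{D})(|\alpha_k\rangle\langle\alpha_k| \otimes |\beta_k\rangle\langle\beta_k|) \\
&= \bigoplus_l |\alpha_l\rangle\langle\alpha_l| \otimes \operatorname{Tr}_{A_l}\big[(\mathcal{P}_l \circ \mathcal{D})(|\alpha_k\rangle\langle\alpha_k| \otimes |\beta_k\rangle\langle\beta_k|)\big] .
\end{aligned}
\tag{A12}
\]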

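Applying the CP map $\mathcal{P}_l$ on both sides of the above equality, we obtain the relation

\[
(\mathcal{P}_l \circ \mathcal{D})(|\alpha_k\rangle\langle\alpha_k| \otimes |\beta_k\rangle\langle\beta_k|) = |\alpha_l\rangle\langle\alpha_l| \otimes \mathcal{M}_l(|\alpha_k\rangle\langle\alpha_k| \otimes |\beta_k\rangle\langle\beta_k|) ,
\tag{A13}
\]

where $\mathcal{M}_l$ is the map from $M_d(\mathbb{C})$ to $\mathsf{Lin}(\mathcal{H}_{B_l})$ defined as $\mathcal{M}_l := \operatorname{Tr}_{A_l} \circ \, \mathcal{P}_l \circ \mathcal{D}$.

Note that the right-hand side of Equation (A13) depends on the choice of the vector $|\alpha_l\rangle$, which is arbitrary. On the other hand, the left-hand side does not depend on $|\alpha_l\rangle$. Hence, the only way that the two sides of Equation (A13) can be equal for $k \neq l$ is that they are both equal to 0. Moreover, since $|\alpha_k\rangle$ and $|\beta_k\rangle$ are arbitrary vectors in $\mathcal{H}_{A_k}$ and $\mathcal{H}_{B_k}$, respectively, Equation (A13) implies the relation

\[
(\mathcal{P}_l \circ \mathcal{D})(\rho) = 0 \quad \forall \rho \in \mathsf{Lin}(\mathcal{H}_{A_k} \otimes \mathcal{H}_{B_k}) , \ \forall l \neq k .
\tag{A14}
\]

Since $\rho$ is an arbitrary operator in $\mathsf{Lin}(\mathcal{H}_{A_k} \otimes \mathcal{H}_{B_k})$, we conclude that the relation $\mathcal{P}_l \circ \mathcal{D} \circ \mathcal{P}_k = 0$ holds for every $l \neq k$.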

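Lemma A2. Every channel $\mathcal{D} \in \mathsf{Chan}(\mathsf{A})'$ must satisfy the conditions

\[
\mathcal{D} \circ \mathcal{P}_k = \mathcal{P}_k \circ \mathcal{D} \circ \mathcal{P}_k \quad \forall k
\tag{A15}
\]

and

\[
\mathcal{P}_k \circ \mathcal{D} = \mathcal{P}_k \circ \mathcal{D} \circ \mathcal{P}_k \quad \forall k .
\tag{A16}
\]

In short: $\mathcal{D} \circ \mathcal{P}_k = \mathcal{P}_k \circ \mathcal{D}$ for every $k$.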

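Proof. Define $\mathcal{D}_k := \mathcal{D} \circ \mathcal{P}_k$. Then, the Cauchy–Schwarz inequality yields

\[
\begin{aligned}
\big| \langle\phi| \Pi_i \mathcal{D}_k(\rho) \Pi_j |\phi\rangle \big| &\le \sqrt{ \langle\phi| \Pi_i \mathcal{D}_k(\rho) \Pi_i |\phi\rangle \, \langle\phi| \Pi_j \mathcal{D}_k(\rho) \Pi_j |\phi\rangle } \\
&\le \sqrt{ \langle\phi| (\mathcal{P}_i \circ \mathcal{D} \circ \mathcal{P}_k)(\rho) |\phi\rangle \, \langle\phi| (\mathcal{P}_j \circ \mathcal{D} \circ \mathcal{P}_k)(\rho) |\phi\rangle } .
\end{aligned}
\tag{A17}
\]

By Lemma A1, the right-hand side is 0 unless $i = j = k$. Hence, we obtain the relation

\[
\begin{aligned}
(\mathcal{D} \circ \mathcal{P}_k)(\rho) &= \mathcal{D}_k(\rho) \\
&= \sum_{i,j} \Pi_i \mathcal{D}_k(\rho) \Pi_j \\
&= \Pi_k \mathcal{D}_k(\rho) \Pi_k \\
&= (\mathcal{P}_k \circ \mathcal{D} \circ \mathcal{P}_k)(\rho) ,
\end{aligned}
\tag{A18}
\]

valid for arbitrary density matrices $\rho$, and therefore for arbitrary matrices in $M_d(\mathbb{C})$. In conclusion, Equation (A15) holds.

The proof of Equation (A16) is analogous to that of Equation (A15), with the only difference that it uses the adjoint map, which for a generic linear map $\mathcal{L} : \mathsf{Lin}(\mathcal{H}_S) \to \mathsf{Lin}(\mathcal{H}_S)$ is defined by the relation

\[
\operatorname{Tr}[\mathcal{L}^\dagger(O) \, \rho] := \operatorname{Tr}[O \, \mathcal{L}(\rho)] \quad \forall O \in M_d(\mathbb{C}) , \ \forall \rho \in M_d(\mathbb{C}) .
\tag{A19}
\]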

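Specifically, we define the map $\widetilde{\mathcal{D}}_k := \mathcal{P}_k \circ \mathcal{D}$. Then, we obtain the relation

\[
\begin{aligned}
\big| \langle\phi| \widetilde{\mathcal{D}}_k(\Pi_i \rho \Pi_j) |\phi\rangle \big| &= \Big| \operatorname{Tr}\big[ \widetilde{\mathcal{D}}_k^\dagger(|\phi\rangle\langle\phi|) \, \Pi_i \rho \Pi_j \big] \Big| \\
&= \Big| \operatorname{Tr}\Big[ \sqrt{\widetilde{\mathcal{D}}_k^\dagger(|\phi\rangle\langle\phi|)} \, \Pi_i \sqrt{\rho} \, \sqrt{\rho} \, \Pi_j \sqrt{\widetilde{\mathcal{D}}_k^\dagger(|\phi\rangle\langle\phi|)} \Big] \Big| \\
&\le \sqrt{ \operatorname{Tr}\big[ \widetilde{\mathcal{D}}_k^\dagger(|\phi\rangle\langle\phi|) \, \Pi_i \rho \Pi_i \big] \, \operatorname{Tr}\big[ \widetilde{\mathcal{D}}_k^\dagger(|\phi\rangle\langle\phi|) \, \Pi_j \rho \Pi_j \big] } \\
&= \sqrt{ \langle\phi| \widetilde{\mathcal{D}}_k(\Pi_i \rho \Pi_i) |\phi\rangle \, \langle\phi| \widetilde{\mathcal{D}}_k(\Pi_j \rho \Pi_j) |\phi\rangle } \\
&= \sqrt{ \langle\phi| (\mathcal{P}_k \circ \mathcal{D} \circ \mathcal{P}_i)(\rho) |\phi\rangle \, \langle\phi| (\mathcal{P}_k \circ \mathcal{D} \circ \mathcal{P}_j)(\rho) |\phi\rangle } ,
\end{aligned}
\tag{A20}
\]

where the right-hand side is 0 unless $i = j = k$ (cf. Lemma A1). Since the condition $|\langle\phi| \widetilde{\mathcal{D}}_k(\Pi_i \rho \Pi_j) |\phi\rangle| = 0$, $\forall |\phi\rangle \in \mathcal{H}_S$ implies the condition $\widetilde{\mathcal{D}}_k(\Pi_i \rho \Pi_j) = 0$, we obtained the relation

\[
\widetilde{\mathcal{D}}_k(\Pi_i \rho \Pi_j) = 0 \quad \text{unless } i = j = k .
\tag{A21}
\]

Using this fact, we obtain the equality

\[
\begin{aligned}
(\mathcal{P}_k \circ \mathcal{D})(\rho) &= \widetilde{\mathcal{D}}_k(\rho) \\
&= \sum_{i,j} \widetilde{\mathcal{D}}_k(\Pi_i \rho \Pi_j) \\
&= (\widetilde{\mathcal{D}}_k \circ \mathcal{P}_k)(\rho) \\
&= (\mathcal{P}_k \circ \mathcal{D} \circ \mathcal{P}_k)(\rho) .
\end{aligned}
\tag{A22}
\]

Since the equality holds for every $\rho$, this proves Equation (A16).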

Lemma A2 guarantees that the linear map D ◦ Pk sends Lin(Rk ⊗ Mk ) into itself. It is also easy
to see that the map D ◦ Pk has a simple form:

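Lemma A3. For every channel $\mathcal{D} \in \mathsf{Chan}(\mathsf{A})'$, one has

\[
\mathcal{D} \circ \mathcal{P}_k = (\mathcal{I}_{A_k} \otimes \mathcal{B}_k) \circ \mathcal{P}_k \quad \forall k ,
\tag{A23}
\]

where $\mathcal{I}_{A_k}$ is the identity map from $\mathsf{Lin}(\mathcal{H}_{A_k})$ to itself, and $\mathcal{B}_k$ is a quantum channel from $\mathsf{Lin}(\mathcal{H}_{B_k})$ to itself.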

Proof. Straightforward extension of the proof in Appendix B.

Using the notion of adjoint, we can now prove the following

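Lemma A4. For every channel $\mathcal{D} \in \mathsf{Chan}(\mathsf{A})'$, the adjoint $\mathcal{D}^\dagger$ preserves the elements of the algebra $\mathsf{A}$, namely $\mathcal{D}^\dagger(C) = C$ for all $C \in \mathsf{A}$.

Proof. Let $C$ be a generic element of $\mathsf{A}$. By Equation (31), one has the equality

\[
C = \bigoplus_k (C_k \otimes I_{B_k}) = \bigoplus_k \mathcal{P}_k(C) .
\tag{A24}
\]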


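Using Lemma A3 and the definition of adjoint, we obtain

\[
\begin{aligned}
\operatorname{Tr}[\mathcal{D}^\dagger(C) \, \rho] &= \operatorname{Tr}[C \, \mathcal{D}(\rho)] \\
&= \sum_k \operatorname{Tr}[\mathcal{P}_k(C) \, \mathcal{D}(\rho)] \\
&= \sum_k \operatorname{Tr}[C \, (\mathcal{P}_k \circ \mathcal{D})(\rho)] \\
&= \sum_k \operatorname{Tr}[C \, (\mathcal{P}_k \circ \mathcal{D} \circ \mathcal{P}_k)(\rho)] \\
&= \sum_k \operatorname{Tr}\big[\mathcal{P}_k(C) \, (\mathcal{D} \circ \mathcal{P}_k)(\rho)\big] \\
&= \sum_k \operatorname{Tr}\big[(C_k \otimes I_{B_k}) \, [(\mathcal{I}_{A_k} \otimes \mathcal{B}_k) \circ \mathcal{P}_k](\rho)\big] ,
\end{aligned}
\tag{A25}
\]

having used Lemma A3 in the last equality. Then, we use the fact that the channel $\mathcal{B}_k$ is trace-preserving, and therefore its adjoint $\mathcal{B}_k^\dagger$ preserves the identity. Using this fact, we can continue the chain of equalities as

\[
\begin{aligned}
\operatorname{Tr}[\mathcal{D}^\dagger(C) \, \rho] &= \sum_k \operatorname{Tr}\big[[C_k \otimes \mathcal{B}_k^\dagger(I_{B_k})] \, \mathcal{P}_k(\rho)\big] \\
&= \sum_k \operatorname{Tr}\big[(C_k \otimes I_{B_k}) \, \mathcal{P}_k(\rho)\big] \\
&= \sum_k \operatorname{Tr}\big[\mathcal{P}_k(C_k \otimes I_{B_k}) \, \rho\big] \\
&= \operatorname{Tr}\Big[\Big(\bigoplus_k C_k \otimes I_{B_k}\Big) \rho\Big] \\
&= \operatorname{Tr}[C \rho] ,
\end{aligned}
\tag{A26}
\]

having used Equation (A24) in the last equality. Since the equality holds for every density matrix $\rho$, we proved the equality $\mathcal{D}^\dagger(C) = C$.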

We are now in position to prove Theorem A1.

Proof of Theorem A1. Let $\mathcal{D}$ be a quantum channel in $\mathsf{Chan}(\mathsf{A})'$. Then, Lemma A4 guarantees that the adjoint $\mathcal{D}^\dagger$ preserves all operators in the algebra $\mathsf{A}$. Then, a result due to Lindblad [86] guarantees that all the Kraus operators of $\mathcal{D}$ belong to the algebra $\mathsf{A}'$. This proves the inclusion $\mathsf{Chan}(\mathsf{A})' \subseteq \mathsf{Chan}(\mathsf{A}')$.
The converse inclusion is immediate: if a channel $\mathcal{D}$ belongs to $\mathsf{Chan}(\mathsf{A}')$, it commutes with all channels in $\mathsf{Chan}(\mathsf{A})$ thanks to the block diagonal form of the Kraus operators (cf. Equations (32) and (33)).

Appendix C.2. States of Subsystems Associated to Finite Dimensional Von Neumann algebras
Here, we provide the proof of Proposition 5, adopting the notation $\mathsf{B} := \mathsf{A}'$.
The proof uses the following lemma:

Lemma A5 (No signalling condition). For every channel D ∈ Chan(B), one has TrB ◦D = TrB .

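Proof. By definition, the partial trace channel $\operatorname{Tr}_B$ can be written as

\[
\operatorname{Tr}_B = \bigoplus_k (\mathcal{I}_{A_k} \otimes \operatorname{Tr}_{B_k}) \circ \mathcal{P}_k .
\tag{A27}
\]

Hence, for every channel $\mathcal{D} \in \mathsf{Chan}(\mathsf{B})$, we have

\[
\begin{aligned}
\operatorname{Tr}_B \circ \, \mathcal{D} &= \bigoplus_k (\mathcal{I}_{A_k} \otimes \operatorname{Tr}_{B_k}) \circ \mathcal{P}_k \circ \mathcal{D} \\
&= \bigoplus_k (\mathcal{I}_{A_k} \otimes \operatorname{Tr}_{B_k}) \circ (\mathcal{I}_{A_k} \otimes \mathcal{B}_k) \circ \mathcal{P}_k \\
&= \bigoplus_k (\mathcal{I}_{A_k} \otimes \operatorname{Tr}_{B_k}) \circ \mathcal{P}_k \\
&= \operatorname{Tr}_B ,
\end{aligned}
\tag{A28}
\]

where the second equality follows from Lemma A3, and the third equality follows from the fact that $\mathcal{B}_k$ is trace-preserving.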

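Proof of Proposition 5. Suppose that $\rho$ and $\sigma$ are equivalent for $A$. By definition, this means that there exists a finite sequence $(\rho_1, \rho_2, \dots, \rho_n)$ such that

\[
\rho_1 = \rho , \quad \rho_n = \sigma , \quad \text{and} \quad \mathsf{Deg}_B(\rho_i) \cap \mathsf{Deg}_B(\rho_{i+1}) \neq \emptyset \quad \forall i \in \{1, 2, \dots, n-1\} .
\tag{A29}
\]

The condition of non-trivial intersection implies that, for every $i \in \{1, 2, \dots, n-1\}$, one has

\[
\mathcal{D}_i(\rho_i) = \widetilde{\mathcal{D}}_i(\rho_{i+1}) ,
\tag{A30}
\]

where $\mathcal{D}_i$ and $\widetilde{\mathcal{D}}_i$ are two quantum channels in $\mathsf{Chan}(\mathsf{B})$. Tracing over $B$ on both sides we obtain the relation

\[
(\operatorname{Tr}_B \circ \, \mathcal{D}_i)(\rho_i) = (\operatorname{Tr}_B \circ \, \widetilde{\mathcal{D}}_i)(\rho_{i+1}) ,
\tag{A31}
\]

and, thanks to Lemma A5, $\operatorname{Tr}_B[\rho_i] = \operatorname{Tr}_B[\rho_{i+1}]$. Since the equality holds for every $i \in \{1, \dots, n-1\}$, we obtained the condition $\operatorname{Tr}_B[\rho] = \operatorname{Tr}_B[\sigma]$. In summary, if two states $\rho$ and $\sigma$ are equivalent for $A$, then $\operatorname{Tr}_B[\rho] = \operatorname{Tr}_B[\sigma]$.
To prove the converse, it is enough to define the channel $\mathcal{D}_0 \in \mathsf{Chan}(\mathsf{B})$ as

\[
\mathcal{D}_0(\rho) := \bigoplus_k \operatorname{Tr}_{B_k}[\mathcal{P}_k(\rho)] \otimes \beta_k ,
\tag{A32}
\]

where each $\beta_k$ is a fixed (but otherwise generic) density matrix in $\mathsf{Lin}(\mathcal{H}_{B_k})$. Now, if the equality $\operatorname{Tr}_B[\rho] = \operatorname{Tr}_B[\sigma]$ holds, then also the equality $\mathcal{D}_0(\rho) = \mathcal{D}_0(\sigma)$ holds. This proves that the intersection between $\mathsf{Deg}_B(\rho)$ and $\mathsf{Deg}_B(\sigma)$ is non-empty, and therefore $\rho$ and $\sigma$ are equivalent for $A$.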
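
The role of the channel $\mathcal{D}_0$ can be illustrated on two qubits. The sketch below assumes a single block, so that Equation (A32) reduces to $\rho \mapsto \operatorname{Tr}_B[\rho] \otimes \beta$; the function names (`partial_trace_B`, `d0`, `basis_state`) and the choice $\beta = I/2$ are hypothetical and serve only as an example.

```python
def kron(a, b):
    """Kronecker product; the first factor is system A."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def partial_trace_B(rho, dim_a=2, dim_b=2):
    """Trace out the second tensor factor of a (dim_a*dim_b)-dimensional matrix."""
    return [[sum(rho[i * dim_b + b][j * dim_b + b] for b in range(dim_b))
             for j in range(dim_a)] for i in range(dim_a)]

def d0(rho, beta):
    """Single-block version of Equation (A32): rho -> Tr_B[rho] (x) beta."""
    return kron(partial_trace_B(rho), beta)

def basis_state(index, dim=4):
    """Projector on the computational basis vector with the given index."""
    rho = [[0.0] * dim for _ in range(dim)]
    rho[index][index] = 1.0
    return rho

beta = [[0.5, 0.0], [0.0, 0.5]]  # assumed fixed state of B

rho = basis_state(0)    # |0><0| (x) |0><0|
sigma = basis_state(1)  # |0><0| (x) |1><1|, same marginal on A as rho
tau = basis_state(2)    # |1><1| (x) |0><0|, different marginal on A

same_marginal = partial_trace_B(rho) == partial_trace_B(sigma)
merged = d0(rho, beta) == d0(sigma, beta)
separated = d0(rho, beta) != d0(tau, beta)
```

States with the same marginal on $A$ are mapped to the same output (`merged`), so their degradation sets intersect; states with different marginals remain distinguishable (`separated`).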

Appendix C.3. Transformations of Subsystems Associated to Finite Dimensional von Neumann algebras
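Here, we prove that all transformations of system $S_A$ are of the form $\mathcal{A} = \bigoplus_k \mathcal{A}_k$, where each $\mathcal{A}_k$ is a quantum channel from $\mathsf{Lin}(\mathcal{H}_{A_k})$ to itself. The proof is based on the following lemmas:

Lemma A6. For every channel $\mathcal{C} \in \mathsf{Chan}(\mathsf{A})$, one has the relation

\[
\mathcal{P}_k \circ \mathcal{C} = (\mathcal{A}_k \otimes \mathcal{I}_{B_k}) \circ \mathcal{P}_k ,
\tag{A33}
\]

where $\mathcal{A}_k$ is a quantum channel from $\mathsf{Lin}(\mathcal{H}_{A_k})$ to itself.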


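Proof. Let

\[
\mathcal{C}(\rho) = \sum_i C_i \, \rho \, C_i^\dagger , \qquad C_i = \bigoplus_k (C_{ik} \otimes I_{B_k})
\tag{A34}
\]

be a Kraus representation of channel $\mathcal{C}$. The preservation of the trace amounts to the condition

\[
I = \sum_i C_i^\dagger C_i = \bigoplus_k \Big( \sum_i C_{ik}^\dagger C_{ik} \otimes I_{B_k} \Big) ,
\tag{A35}
\]

which implies

\[
\sum_i C_{ik}^\dagger C_{ik} = I_{A_k} \quad \forall k .
\tag{A36}
\]

Now, we have

\[
\begin{aligned}
(\mathcal{P}_k \circ \mathcal{C})(\rho) &= \sum_i (C_{ik} \otimes I_{B_k}) \, \mathcal{P}_k(\rho) \, (C_{ik} \otimes I_{B_k})^\dagger \\
&= (\mathcal{A}_k \otimes \mathcal{I}_{B_k}) [\mathcal{P}_k(\rho)] ,
\end{aligned}
\tag{A37}
\]

where the channel $\mathcal{A}_k$ is defined as

\[
\mathcal{A}_k(\sigma) := \sum_i C_{ik} \, \sigma \, C_{ik}^\dagger \quad \forall \sigma \in \mathsf{Lin}(\mathcal{H}_{A_k}) .
\tag{A38}
\]

Since the density matrix $\rho$ in Equation (A37) is arbitrary, we proved the relation $\mathcal{P}_k \circ \mathcal{C} = (\mathcal{A}_k \otimes \mathcal{I}_{B_k}) \circ \mathcal{P}_k$.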

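Lemma A7. For two channels $\mathcal{C}, \mathcal{C}' \in \mathsf{Chan}(\mathsf{A})$, let $\mathcal{A}_k$ and $\mathcal{A}'_k$ be the quantum channels defined in Lemma A6. Then, the following are equivalent:

1. $\operatorname{Tr}_B \circ \, \mathcal{C} = \operatorname{Tr}_B \circ \, \mathcal{C}'$,
2. $\mathcal{A}_k = \mathcal{A}'_k$ for every $k$.

Proof. 2 $\Longrightarrow$ 1. Using Equation (A27) and Lemma A6, we have

\[
\operatorname{Tr}_B \circ \, \mathcal{C} = \bigoplus_k (\mathcal{A}_k \otimes \operatorname{Tr}_{B_k}) \circ \mathcal{P}_k .
\tag{A39}
\]

Similarly, for channel $\mathcal{C}'$, we have

\[
\operatorname{Tr}_B \circ \, \mathcal{C}' = \bigoplus_k (\mathcal{A}'_k \otimes \operatorname{Tr}_{B_k}) \circ \mathcal{P}_k .
\tag{A40}
\]

Clearly, if $\mathcal{A}_k$ and $\mathcal{A}'_k$ are equal for every $k$, then the partial traces $\operatorname{Tr}_B \circ \, \mathcal{C}$ and $\operatorname{Tr}_B \circ \, \mathcal{C}'$ are equal.
1 $\Longrightarrow$ 2. Suppose that the partial traces $\operatorname{Tr}_B \circ \, \mathcal{C}$ and $\operatorname{Tr}_B \circ \, \mathcal{C}'$ are equal. Then, Equations (A39) and (A40) imply the equality

\[
(\mathcal{A}_k \otimes \operatorname{Tr}_{B_k}) \circ \mathcal{P}_k = (\mathcal{A}'_k \otimes \operatorname{Tr}_{B_k}) \circ \mathcal{P}_k \quad \forall k .
\tag{A41}
\]

In turn, the above equality implies $\mathcal{A}_k = \mathcal{A}'_k$, $\forall k$, as one can easily verify by applying both sides of Equation (A41) to a generic product operator $X_k \otimes Y_k$, with $X_k \in \mathsf{Lin}(\mathcal{H}_{A_k})$ and $Y_k \in \mathsf{Lin}(\mathcal{H}_{B_k})$.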

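Lemma A8. Two channels $\mathcal{C}, \mathcal{C}' \in \mathsf{Chan}(\mathsf{A})$ are equivalent for $A$ if and only if $\operatorname{Tr}_B \circ \, \mathcal{C} = \operatorname{Tr}_B \circ \, \mathcal{C}'$.

Proof. Suppose that $\mathcal{C}$ and $\mathcal{C}'$ are equivalent for $A$. By definition, this means that there exists a finite sequence $(\mathcal{C}_1, \mathcal{C}_2, \dots, \mathcal{C}_n) \subset \mathsf{Chan}(\mathsf{A})$ such that

\[
\mathcal{C}_1 = \mathcal{C} , \quad \mathcal{C}_n = \mathcal{C}' , \quad \mathsf{Deg}_B(\mathcal{C}_i) \cap \mathsf{Deg}_B(\mathcal{C}_{i+1}) \neq \emptyset \quad \forall i \in \{1, \dots, n-1\} .
\tag{A42}
\]

This means that, for every $i$, there exist two channels $\mathcal{D}_i, \widetilde{\mathcal{D}}_i \in \mathsf{Chan}(\mathsf{B})$ such that

\[
\mathcal{D}_i \circ \mathcal{C}_i = \widetilde{\mathcal{D}}_i \circ \mathcal{C}_{i+1} .
\tag{A43}
\]

Tracing over $B$ on both sides, we obtain

\[
\operatorname{Tr}_B \circ \, \mathcal{D}_i \circ \mathcal{C}_i = \operatorname{Tr}_B \circ \, \widetilde{\mathcal{D}}_i \circ \mathcal{C}_{i+1} ,
\tag{A44}
\]

and, using the no signalling condition of Lemma A5,

\[
\operatorname{Tr}_B \circ \, \mathcal{C}_i = \operatorname{Tr}_B \circ \, \mathcal{C}_{i+1} .
\tag{A45}
\]

Since the above relation holds for every $i$, we obtained the equality $\operatorname{Tr}_B \circ \, \mathcal{C} = \operatorname{Tr}_B \circ \, \mathcal{C}'$.
Conversely, suppose that $\operatorname{Tr}_B \circ \, \mathcal{C} = \operatorname{Tr}_B \circ \, \mathcal{C}'$. Then, Lemma A7 implies the equality

\[
\mathcal{A}_k = \mathcal{A}'_k \quad \forall k ,
\tag{A46}
\]

where $\mathcal{A}_k$ and $\mathcal{A}'_k$ are the quantum channels defined in Lemma A6.
Now, let $\mathcal{D}_0$ be the channel in $\mathsf{Chan}(\mathsf{B})$ defined in Equation (A32). By definition, we have

\[
\begin{aligned}
\mathcal{D}_0 \circ \mathcal{C} &= \bigoplus_k (\mathcal{I}_{A_k} \otimes \beta_k \operatorname{Tr}_{B_k}) \circ \mathcal{P}_k \circ \mathcal{C} \\
&= \bigoplus_k (\mathcal{I}_{A_k} \otimes \beta_k \operatorname{Tr}_{B_k}) \circ (\mathcal{A}_k \otimes \mathcal{I}_{B_k}) \circ \mathcal{P}_k \\
&= \bigoplus_k (\mathcal{A}_k \otimes \beta_k \operatorname{Tr}_{B_k}) \circ \mathcal{P}_k .
\end{aligned}
\tag{A47}
\]

Similarly, we have

\[
\mathcal{D}_0 \circ \mathcal{C}' = \bigoplus_k (\mathcal{A}'_k \otimes \beta_k \operatorname{Tr}_{B_k}) \circ \mathcal{P}_k .
\tag{A48}
\]

Since $\mathcal{A}_k$ and $\mathcal{A}'_k$ are equal for every $k$, we conclude that $\mathcal{D}_0 \circ \mathcal{C}$ is equal to $\mathcal{D}_0 \circ \mathcal{C}'$. This means that the intersection between $\mathsf{Deg}(\mathcal{C})$ and $\mathsf{Deg}(\mathcal{C}')$ is non-empty, and, therefore, $\mathcal{C}$ is equivalent to $\mathcal{C}'$ modulo $B$.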

Combining Lemmas A7 and A8, we obtain the following corollary:

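Corollary A1. For two channels $\mathcal{C}, \mathcal{C}' \in \mathsf{Chan}(\mathsf{A})$, let $\mathcal{A}_k$ and $\mathcal{A}'_k$ be the quantum channels defined in Lemma A6. Then, the following are equivalent:

1. $\mathcal{C}$ and $\mathcal{C}'$ are equivalent for $A$,
2. $\bigoplus_k \mathcal{A}_k = \bigoplus_k \mathcal{A}'_k$.

Proof. By Lemma A8, $\mathcal{C}$ and $\mathcal{C}'$ are equivalent for $A$ if and only if the condition $\operatorname{Tr}_B \circ \, \mathcal{C} = \operatorname{Tr}_B \circ \, \mathcal{C}'$ holds. By Lemma A7, the condition $\operatorname{Tr}_B \circ \, \mathcal{C} = \operatorname{Tr}_B \circ \, \mathcal{C}'$ holds if and only if one has $\mathcal{A}_k = \mathcal{A}'_k$ for every $k$. In turn, the latter condition holds if and only if the equality $\bigoplus_k \mathcal{A}_k = \bigoplus_k \mathcal{A}'_k$ holds.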

In summary, the transformations of system $S_A$ are characterized as

\[
\mathsf{Transf}(S_A) = \bigoplus_k \mathsf{Chan}(A_k) ,
\tag{A49}
\]

where $\mathsf{Chan}(A_k)$ is the set of all quantum channels from $\mathsf{Lin}(\mathcal{H}_{A_k})$ to itself.
To conclude, we observe that the transformations of $S_A$ act in the expected way. To this purpose, we consider the restriction map

\[
\pi_A : \mathsf{Chan}(\mathsf{A}) \to \bigoplus_k \mathsf{Chan}(A_k) , \qquad \mathcal{C} \mapsto \bigoplus_k \mathcal{A}_k ,
\tag{A50}
\]

where $\mathcal{A}_k$ is defined as in Lemma A6.

Using the restriction map, we can prove the following propositions:

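Proposition A2. For every channel $\mathcal{C} \in \mathsf{Chan}(\mathsf{A})$, we have the relation

\[
\operatorname{Tr}_B \circ \, \mathcal{C} = \pi_A(\mathcal{C}) \circ \operatorname{Tr}_B .
\tag{A51}
\]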

In words, evolving system S with C and then computing the local state of system S A is the same as
computing the local state of system S A and then evolving it with πA (C).

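Proof. Using Lemma A6, the proof is straightforward:

\[
\begin{aligned}
\operatorname{Tr}_B \circ \, \mathcal{C} &= \bigoplus_k (\mathcal{I}_{A_k} \otimes \operatorname{Tr}_{B_k}) \circ \mathcal{P}_k \circ \mathcal{C} \\
&= \bigoplus_k (\mathcal{I}_{A_k} \otimes \operatorname{Tr}_{B_k}) \circ (\mathcal{A}_k \otimes \mathcal{I}_{B_k}) \circ \mathcal{P}_k \\
&= \bigoplus_k \mathcal{A}_k \circ (\mathcal{I}_{A_k} \otimes \operatorname{Tr}_{B_k}) \circ \mathcal{P}_k \\
&= \Big( \bigoplus_k \mathcal{A}_k \Big) \circ \Big( \bigoplus_l (\mathcal{I}_{A_l} \otimes \operatorname{Tr}_{B_l}) \circ \mathcal{P}_l \Big) \\
&= \pi_A(\mathcal{C}) \circ \operatorname{Tr}_B .
\end{aligned}
\tag{A52}
\]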
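
The intertwining relation of Proposition A2 can be verified numerically in the simplest case. The sketch below assumes a single block on two qubits, so that $\pi_A(\mathcal{C})$ is just the channel with Kraus operators $C_i$ when $\mathcal{C}$ has Kraus operators $C_i \otimes I_B$; the amplitude-damping channel and the parameter `g` are illustrative choices, not taken from the text.

```python
from math import sqrt

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    return [[a[j][i].conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def kron(a, b):
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def apply(kraus, rho):
    """Apply the channel with the given Kraus operators to rho."""
    dim = len(rho)
    out = [[0.0] * dim for _ in range(dim)]
    for k in kraus:
        term = matmul(matmul(k, rho), dagger(k))
        out = [[out[i][j] + term[i][j] for j in range(dim)] for i in range(dim)]
    return out

def partial_trace_B(rho, dim_a=2, dim_b=2):
    return [[sum(rho[i * dim_b + b][j * dim_b + b] for b in range(dim_b))
             for j in range(dim_a)] for i in range(dim_a)]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

g = 0.3  # assumed damping parameter
I2 = [[1.0, 0.0], [0.0, 1.0]]

# pi_A(C): amplitude damping on A (Kraus operators C_i).
kraus_A = [[[1.0, 0.0], [0.0, sqrt(1 - g)]],
           [[0.0, sqrt(g)], [0.0, 0.0]]]
# C: the same channel lifted to the composite system (Kraus operators C_i (x) I_B).
kraus_C = [kron(k, I2) for k in kraus_A]

# Maximally entangled input state (|00> + |11>)/sqrt(2).
bell = [[0.5, 0.0, 0.0, 0.5],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.5, 0.0, 0.0, 0.5]]

lhs = partial_trace_B(apply(kraus_C, bell))   # Tr_B after evolving with C
rhs = apply(kraus_A, partial_trace_B(bell))   # pi_A(C) after Tr_B
intertwines = close(lhs, rhs)
```

Both orders give the same local state, as Equation (A51) requires; for this input the local state is diagonal with entries $(1+g)/2$ and $(1-g)/2$.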

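Proposition A3. For every pair of channels $\mathcal{C}_1, \mathcal{C}_2 \in \mathsf{Chan}(\mathsf{A})$, we have the homomorphism relation

\[
\pi_A(\mathcal{C}_1 \circ \mathcal{C}_2) = \pi_A(\mathcal{C}_1) \circ \pi_A(\mathcal{C}_2) .
\tag{A53}
\]

Proof. Let us write the channels $\pi_A(\mathcal{C}_1)$, $\pi_A(\mathcal{C}_2)$, and $\pi_A(\mathcal{C}_1 \circ \mathcal{C}_2)$ as

\[
\pi_A(\mathcal{C}_1) = \bigoplus_k \mathcal{A}_{1k} , \qquad \pi_A(\mathcal{C}_2) = \bigoplus_k \mathcal{A}_{2k} , \qquad \text{and} \qquad \pi_A(\mathcal{C}_1 \circ \mathcal{C}_2) = \bigoplus_k \mathcal{A}_{12k} .
\tag{A54}
\]

With this notation, we have

\[
\begin{aligned}
(\mathcal{A}_{12k} \otimes \mathcal{I}_{B_k}) \circ \mathcal{P}_k &= \mathcal{P}_k \circ \mathcal{C}_1 \circ \mathcal{C}_2 \\
&= (\mathcal{A}_{1k} \otimes \mathcal{I}_{B_k}) \circ \mathcal{P}_k \circ \mathcal{C}_2 \\
&= (\mathcal{A}_{1k} \otimes \mathcal{I}_{B_k}) \circ (\mathcal{A}_{2k} \otimes \mathcal{I}_{B_k}) \circ \mathcal{P}_k \\
&= \big[ (\mathcal{A}_{1k} \circ \mathcal{A}_{2k}) \otimes \mathcal{I}_{B_k} \big] \circ \mathcal{P}_k \quad \forall k .
\end{aligned}
\tag{A55}
\]

From the above equation, we obtain the equality $\mathcal{A}_{12k} = \mathcal{A}_{1k} \circ \mathcal{A}_{2k}$ for all $k$. In turn, this equality implies the desired result:

\[
\begin{aligned}
\pi_A(\mathcal{C}_1) \circ \pi_A(\mathcal{C}_2) &= \Big( \bigoplus_k \mathcal{A}_{1k} \Big) \circ \Big( \bigoplus_l \mathcal{A}_{2l} \Big) \\
&= \bigoplus_k \mathcal{A}_{1k} \circ \mathcal{A}_{2k} \\
&= \bigoplus_k \mathcal{A}_{12k} \\
&= \pi_A(\mathcal{C}_1 \circ \mathcal{C}_2) .
\end{aligned}
\tag{A56}
\]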

Appendix D. Basis-Preserving and Multiphase-Covariant Channels

Appendix D.1. Proof of Theorem 1

Here, we prove that the monoid of multiphase covariant channels on S (denoted as MultiPCov(S)) and the monoid of basis-preserving channels on S (denoted as BPres(S)) are each the commutant of the other.
The proof uses a few lemmas, the ﬁrst of which is fairly straightforward:

Lemma A9. $\mathsf{BPres}(S)' \subseteq \mathsf{MultiPCov}(S)$.

Proof. Every unitary channel of the form $\mathcal{U}_\theta = U_\theta \cdot U_\theta^\dagger$ is basis-preserving, and therefore every channel $\mathcal{C}$ in the commutant of $\mathsf{BPres}(S)$ must commute with it. By definition, this means that $\mathcal{C}$ is multiphase covariant.

To prove the converse inclusion, we use the following characterization of multiphase covariant channels:

Lemma A10 (Characterization of MultiPCov(S)). A channel $\mathcal{M} \in \mathsf{Chan}(S)$ is multiphase covariant if and only if it has a Kraus representation of the form

\[
\mathcal{M}(\rho) = \sum_{i=1}^{r} M_i \rho M_i^\dagger + \sum_{k=1}^{d} \sum_{j \neq k} p(j|k) \, |j\rangle\langle k| \rho |k\rangle\langle j| ,
\tag{A57}
\]

where each operator $M_i$ is diagonal in the computational basis, and each $p(j|k)$ is non-negative.

Proof. Let $M \in \mathsf{Lin}(\mathcal{H}_S \otimes \mathcal{H}_S)$ be the Choi operator of channel $\mathcal{M}$. For a multiphase covariant channel, the Choi operator must satisfy the commutation relation [87,88]

\[
[M, U_\theta \otimes \overline{U}_\theta] = 0 \quad \forall \theta \in [0, 2\pi)^{\otimes d} .
\tag{A58}
\]

This condition implies that $M$ must have the form

\[
M = \sum_{s,t} M_{ss,tt} \, |s\rangle\langle t| \otimes |s\rangle\langle t| + \sum_{k} \sum_{j \neq k} M_{jk,jk} \, |j\rangle\langle j| \otimes |k\rangle\langle k| ,
\tag{A59}
\]

where the $d \times d$ matrix $[\Gamma_{s,t}] := [M_{ss,tt}]_{s,t \in \{1,\dots,d\}}$ is positive semidefinite and each coefficient $M_{jk,jk}$ is non-negative. Then, Equation (A57) follows from diagonalizing the matrix $\Gamma$ and using the relation $\mathcal{M}(\rho) = \operatorname{Tr}[M (I \otimes \rho^T)]$, where $\rho^T$ is the transpose of $\rho$ in the computational basis.

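From Equation (A57), one can show that every multiphase covariant channel commutes with every basis-preserving channel:

Lemma A11. $\mathsf{MultiPCov}(S) \subseteq \mathsf{BPres}(S)'$.

Proof. Let $\mathcal{B} \in \mathsf{BPres}(S)$ be a generic basis-preserving channel, and let $\mathcal{M} \in \mathsf{MultiPCov}(S)$ be a generic multiphase covariant channel. Using the characterization of Equation (A57), we obtain

\[
\begin{aligned}
\mathcal{M} \circ \mathcal{B}(\rho) &= \sum_i M_i \mathcal{B}(\rho) M_i^\dagger + \sum_k \sum_{j \neq k} p(j|k) \, |j\rangle\langle k| \mathcal{B}(\rho) |k\rangle\langle j| \\
&= \sum_i \mathcal{B}(M_i \rho M_i^\dagger) + \sum_k \sum_{j \neq k} p(j|k) \, |j\rangle\langle k| \mathcal{B}(\rho) |k\rangle\langle j| \\
&= \sum_i \mathcal{B}(M_i \rho M_i^\dagger) + \sum_k \sum_{j \neq k} p(j|k) \, |j\rangle\langle k| \rho |k\rangle\langle j| \\
&= \sum_i \mathcal{B}(M_i \rho M_i^\dagger) + \sum_k \sum_{j \neq k} p(j|k) \, \mathcal{B}(|j\rangle\langle j|) \, \langle k| \rho |k\rangle \\
&= \mathcal{B}\Big( \sum_i M_i \rho M_i^\dagger + \sum_k \sum_{j \neq k} p(j|k) \, |j\rangle\langle k| \rho |k\rangle\langle j| \Big) \\
&= \mathcal{B} \circ \mathcal{M}(\rho) \qquad \forall \rho \in \mathsf{Lin}(S) .
\end{aligned}
\tag{A60}
\]

The second equality used the fact that the Kraus operators of $\mathcal{B}$ are diagonal in the computational basis [71,72] and therefore commute with each operator $M_i$. The third equality uses the relation $\langle k| \mathcal{B}(\rho) |k\rangle = \langle k| \rho |k\rangle$, following from the fact that $\mathcal{B}$ preserves the computational basis [71,72].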
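
The commutation established in Lemma A11 can be checked on a qubit. The following sketch builds a channel of the form (A57) for $d = 2$ and a phase-damping channel with diagonal Kraus operators; the parameters `a`, `b`, `lam` and the Hadamard counterexample are assumptions made for illustration only.

```python
from math import sqrt

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    return [[a[j][i].conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def apply(kraus, rho):
    """Apply the channel with the given Kraus operators to rho."""
    dim = len(rho)
    out = [[0.0] * dim for _ in range(dim)]
    for k in kraus:
        term = matmul(matmul(k, rho), dagger(k))
        out = [[out[i][j] + term[i][j] for j in range(dim)] for i in range(dim)]
    return out

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def commutes_on(ch1, ch2, rho):
    return close(apply(ch1, apply(ch2, rho)), apply(ch2, apply(ch1, rho)))

a, b, lam = 0.3, 0.1, 0.4  # assumed parameters

# Channel of the form (A57) for d = 2, with basis vectors labelled 0 and 1.
multiphase = [
    [[sqrt(1 - a), 0.0], [0.0, sqrt(1 - b)]],  # diagonal Kraus operator M_1
    [[0.0, 0.0], [sqrt(a), 0.0]],              # sqrt(p(1|0)) |1><0|
    [[0.0, sqrt(b)], [0.0, 0.0]],              # sqrt(p(0|1)) |0><1|
]

# Basis-preserving channel: phase damping, diagonal Kraus operators.
phase_damping = [
    [[1.0, 0.0], [0.0, sqrt(1 - lam)]],
    [[0.0, 0.0], [0.0, sqrt(lam)]],
]

# Hadamard unitary channel: it does not preserve the computational basis.
h = 1 / sqrt(2)
hadamard = [[[h, h], [h, -h]]]

plus = [[0.5, 0.5], [0.5, 0.5]]
zero = [[1.0, 0.0], [0.0, 0.0]]

commutes_with_bp = all(commutes_on(multiphase, phase_damping, s) for s in (plus, zero))
commutes_with_hadamard = commutes_on(multiphase, hadamard, zero)
```

In this example the multiphase covariant channel commutes with the basis-preserving one on both test states, while it fails to commute with the Hadamard channel, which lies outside $\mathsf{BPres}(S)$.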

Summarizing, we have shown that the multiphase covariant channels are the commutant of the basis-preserving channels:

Corollary A2. $\mathsf{MultiPCov}(S) = \mathsf{BPres}(S)'$.

Note that Corollary A2 implies the relation

\[
\mathsf{MultiPCov}(S)' = \mathsf{BPres}(S)'' \supseteq \mathsf{BPres}(S) .
\tag{A61}
\]

To conclude the proof of Theorem 1, we prove the converse inclusion:

Lemma A12. $\mathsf{MultiPCov}(S)' \subseteq \mathsf{BPres}(S)$.

Proof. A special case of multiphase covariant channel is the erasure channel $\mathcal{M}_k$ defined by $\mathcal{M}_k(\rho) = |k\rangle\langle k| \operatorname{Tr}[\rho]$ for every $\rho \in \mathsf{Lin}(S)$. For a generic channel $\mathcal{C} \in \mathsf{MultiPCov}(S)'$, one must have

\[
\mathcal{C}(|k\rangle\langle k|) = \mathcal{C} \circ \mathcal{M}_k(|k\rangle\langle k|) = \mathcal{M}_k \circ \mathcal{C}(|k\rangle\langle k|) = |k\rangle\langle k| .
\tag{A62}
\]

Since the above condition must hold for every $k$, the channel $\mathcal{C}$ must be basis-preserving.

Combining Lemma A12 and Equation (A61), we obtain:

Corollary A3. $\mathsf{MultiPCov}(S)' = \mathsf{BPres}(S)$.

Putting Corollaries A2 and A3 together, we have an immediate proof of Theorem 1.

Appendix D.2. Proof of Equation (55)

Here, we show that the transformations on system S A are classical channels. To construct the
transformations of S A , we have to partition the double commutant of Act( A; S) = MultiPCov(S) into
equivalence classes.
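First, recall that $\mathsf{MultiPCov}(S)'' = \mathsf{MultiPCov}(S)$ (by Theorem 1). Then, note the following property:

Lemma A13. If two channels $\mathcal{M}, \widetilde{\mathcal{M}} \in \mathsf{MultiPCov}(S)$ satisfy the condition

\[
\langle k| \mathcal{M}(|j\rangle\langle j|) |k\rangle = \langle k| \widetilde{\mathcal{M}}(|j\rangle\langle j|) |k\rangle ,
\tag{A63}
\]

then $[\mathcal{M}]_A = [\widetilde{\mathcal{M}}]_A$.

Proof. Define the completely dephasing channel $\mathcal{D} = \sum_k |k\rangle\langle k| \cdot |k\rangle\langle k|$. Clearly, $\mathcal{D}$ is basis-preserving. Using the idempotence relation $\mathcal{D} \circ \mathcal{D} = \mathcal{D}$, we obtain

\[
\begin{aligned}
\mathcal{D} \circ \mathcal{M}(\rho) &= \mathcal{D} \circ \mathcal{D} \circ \mathcal{M}(\rho) \\
&= \mathcal{D} \circ \mathcal{M} \circ \mathcal{D}(\rho) \\
&= \mathcal{D} \circ \mathcal{M}\Big( \sum_j |j\rangle\langle j| \, \langle j| \rho |j\rangle \Big) \\
&= \sum_j \langle j| \rho |j\rangle \, \mathcal{D}\big( \mathcal{M}(|j\rangle\langle j|) \big) \\
&= \sum_{j,k} \langle j| \rho |j\rangle \, \langle k| \mathcal{M}(|j\rangle\langle j|) |k\rangle \, |k\rangle\langle k| .
\end{aligned}
\tag{A64}
\]

Likewise, we have

\[
\mathcal{D} \circ \widetilde{\mathcal{M}}(\rho) = \sum_{j,k} \langle j| \rho |j\rangle \, \langle k| \widetilde{\mathcal{M}}(|j\rangle\langle j|) |k\rangle \, |k\rangle\langle k| .
\tag{A65}
\]

If condition (A63) holds, then the equality $\mathcal{D} \circ \mathcal{M} = \mathcal{D} \circ \widetilde{\mathcal{M}}$ holds, meaning that $\mathsf{Deg}(\mathcal{M})$ and $\mathsf{Deg}(\widetilde{\mathcal{M}})$ have non-empty intersection. Hence, $\mathcal{M}$ and $\widetilde{\mathcal{M}}$ must be in the same equivalence class.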
The converse of Lemma A13 holds:

4 ∈ MultiPCov(S) are in the same equivalence class, then they must

Lemma A14. If two channels M, M
satisfy condition (A63).

Proof. If M and M 4 are in the same equivalence class, then there exists a ﬁnite sequence
(M1 , M2 , . . . , Mn ) such that

M1 = M , 4,
Mn = M ∀i ∈ {1, . . . , n − 1} ∃Bi , B)i ∈ BPres(S) : Bi ◦ Mi = B)i ◦ Mi+1 .

147
Entropy 2018, 20, 358

The above condition implies

l k | Mi (ρ) |k = Tr[Mi (ρ) |k k|] = k | Bi ◦ Mi (ρ) |k = k | B)i ◦ Mi+1 (ρ) |k
= k | M i +1 ( ρ ) | k , (A66)

for all i ∈ {1, . . . , n − 1} and for all ρ ∈ Lin(ρ). In particular, choosing ρ = | j j| we obtain

k| Mi (| j j|) |k = k| Mi+1 (| j j|) |k ∀i ∈ {1, . . . , n − 1} , ∀ j, k ∈ {1, . . . , d} . (A67)

Hence, Equation (A63) follows.
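
Lemmas A13 and A14 say that an equivalence class is labelled by the transition probabilities $\langle k| \mathcal{M}(|j\rangle\langle j|) |k\rangle$. The sketch below compares two qubit channels of the form (A57) that differ only in how much coherence they preserve; the two channels, the parameters, and the helper names (`transition_matrix`, `dephase`) are illustrative assumptions.

```python
from math import sqrt

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    return [[a[j][i].conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def apply(kraus, rho):
    dim = len(rho)
    out = [[0.0] * dim for _ in range(dim)]
    for k in kraus:
        term = matmul(matmul(k, rho), dagger(k))
        out = [[out[i][j] + term[i][j] for j in range(dim)] for i in range(dim)]
    return out

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def dephase(rho):
    """Completely dephasing channel: keep only the diagonal entries."""
    return [[rho[i][j] if i == j else 0.0 for j in range(len(rho))]
            for i in range(len(rho))]

def transition_matrix(kraus, dim=2):
    """Entry [k][j] is <k| M(|j><j|) |k>, as in Equation (A63)."""
    outputs = []
    for j in range(dim):
        proj = [[1.0 if r == c == j else 0.0 for c in range(dim)] for r in range(dim)]
        outputs.append(apply(kraus, proj))
    return [[outputs[j][k][k] for j in range(dim)] for k in range(dim)]

a, b = 0.3, 0.1  # assumed jump probabilities p(1|0) and p(0|1)
jumps = [[[0.0, 0.0], [sqrt(a), 0.0]], [[0.0, sqrt(b)], [0.0, 0.0]]]

# One diagonal Kraus operator: coherences partially survive.
coherent = [[[sqrt(1 - a), 0.0], [0.0, sqrt(1 - b)]]] + jumps
# Two diagonal Kraus operators: coherences are destroyed.
incoherent = [[[sqrt(1 - a), 0.0], [0.0, 0.0]],
              [[0.0, 0.0], [0.0, sqrt(1 - b)]]] + jumps

plus = [[0.5, 0.5], [0.5, 0.5]]

same_transitions = close(transition_matrix(coherent), transition_matrix(incoherent))
same_after_dephasing = close(dephase(apply(coherent, plus)),
                             dephase(apply(incoherent, plus)))
different_before = not close(apply(coherent, plus), apply(incoherent, plus))
```

The two channels act differently on the full system, yet they have the same transition matrix and coincide after dephasing, so they represent the same transformation of the subsystem $S_A$.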

Appendix E. Classical Systems and the Resource Theory of Coherence

Here, we consider agents who have access to various types of free operations in the resource
theory of coherence. We start from the types of operations that give rise to classical systems, and then
show two examples that do not have this property.

Appendix E.1. Operations That Lead to Classical Subsystems

Consider the following monoids of operations

1. Strictly incoherent operations [41], i.e., quantum channels $\mathcal{T}$ with the property that, for every Kraus operator $T_i$, the map $\mathcal{T}_i(\cdot) = T_i \cdot T_i^\dagger$ satisfies the condition $\mathcal{D} \circ \mathcal{T}_i = \mathcal{T}_i \circ \mathcal{D}$, where $\mathcal{D}$ is the completely dephasing channel.
2. Dephasing covariant operations [38–40], i.e., quantum channels $\mathcal{T}$ satisfying the condition $\mathcal{D} \circ \mathcal{T} = \mathcal{T} \circ \mathcal{D}$.
3. Phase covariant channels [40], i.e., quantum channels $\mathcal{T}$ satisfying the condition $\mathcal{T} \circ \mathcal{U}_\varphi = \mathcal{U}_\varphi \circ \mathcal{T}$, $\forall \varphi \in [0, 2\pi)$, where $\mathcal{U}_\varphi$ is the unitary channel associated with the unitary matrix $U_\varphi = \sum_k e^{ik\varphi} |k\rangle\langle k|$.
4. Physically incoherent operations [38,39], i.e., quantum channels that are convex combinations of channels $\mathcal{T}$ admitting a Kraus representation where each Kraus operator $T_i$ is of the form

\[
T_i = U_{\pi_i} U_{\theta_i} P_i ,
\tag{A68}
\]

We now show that all the above operations define classical subsystems according to our construction.
The ﬁrst ingredient in the proof is the observation that each of the monoids 1–5 contains the
monoid of classical channels. Then, we can apply the following lemma:

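Lemma A15. Let $\mathsf{M} \subseteq \mathsf{Chan}(S)$ be a monoid of quantum channels, and let $\mathsf{M}'$ be its commutant. If $\mathsf{M}$ contains the monoid of classical channels, then $\mathsf{M}'$ is contained in the set of basis-preserving channels.

Proof. Consider the erasure channel $\mathcal{C}_k$ defined by $\mathcal{C}_k(\rho) := |k\rangle\langle k| \operatorname{Tr}[\rho]$, $\forall \rho \in \mathsf{Lin}(\mathcal{H}_S)$. Clearly, the erasure channel is a classical channel. Then, every channel $\mathcal{B} \in \mathsf{M}'$ must satisfy the condition

\[
\mathcal{B}(|k\rangle\langle k|) = \mathcal{B} \circ \mathcal{C}_k(|k\rangle\langle k|) = \mathcal{C}_k \circ \mathcal{B}(|k\rangle\langle k|) = |k\rangle\langle k| .
\tag{A69}
\]

Since $k$ is generic, this implies that $\mathcal{B}$ must be basis-preserving.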

Furthermore, we have the following


Lemma A16. Let $\mathsf{Act}(A;S) \subseteq \mathsf{Chan}(S)$ be a set of quantum channels that contains the monoid of classical channels. If two quantum states $\rho, \sigma \in \mathsf{St}(S)$ are equivalent for $A$, then they must have the same diagonal entries. Equivalently, they must satisfy $\mathcal{D}(\rho) = \mathcal{D}(\sigma)$.

Proof. Same as the first part of the proof of Proposition 7. Suppose that Condition 1 holds, meaning that there exists a sequence $(\rho_1, \rho_2, \dots, \rho_n)$ such that

\[
\rho_1 = \rho , \quad \rho_n = \sigma , \quad \forall i \in \{1, \dots, n-1\} \ \exists \, \mathcal{B}_i, \widetilde{\mathcal{B}}_i \in \mathsf{Act}(B;S) : \ \mathcal{B}_i(\rho_i) = \widetilde{\mathcal{B}}_i(\rho_{i+1}) ,
\tag{A70}
\]

where $\mathcal{B}_i$ and $\widetilde{\mathcal{B}}_i$ are channels in the commutant $\mathsf{Act}(A;S)'$. The above equation implies

\[
\langle k| \mathcal{B}_i(\rho_i) |k\rangle = \langle k| \widetilde{\mathcal{B}}_i(\rho_{i+1}) |k\rangle .
\tag{A71}
\]

Now, we know that the commutant $\mathsf{Act}(A;S)'$ consists of basis-preserving channels (Lemma A15). Since every basis-preserving channel satisfies the relation $\langle k| \mathcal{B}(\rho) |k\rangle = \langle k| \rho |k\rangle$ [71,72], we obtain that all the density matrices $(\rho_1, \rho_2, \dots, \rho_n)$ must have the same diagonal entries, namely $\mathcal{D}(\rho_1) = \mathcal{D}(\rho_2) = \dots = \mathcal{D}(\rho_n)$.

Now, we observe that the completely dephasing channel D is contained in the commutant of
all the monoids 1–5. This fact is evident for the monoids 1, 2 and 5, where the commutation with D
holds by deﬁnition. For the monoid 3, the commutation with D has been proven in [38,39], and for the
monoid 4 it has been proven in [40].
Since D is contained in the commutant of all the monoids 1–5, we can use the following
obvious fact:

Lemma A17. Let $\mathsf{Act}(A;S) \subseteq \mathsf{Chan}(S)$ be a monoid of quantum channels and suppose that its commutant $\mathsf{Act}(A;S)'$ contains the dephasing channel $\mathcal{D}$. If two quantum states $\rho, \sigma \in \mathsf{St}(S)$ satisfy $\mathcal{D}(\rho) = \mathcal{D}(\sigma)$, then they are equivalent for $A$.

Proof. Trivial consequence of the deﬁnition.

Combining Lemmas A16 and A17, we obtain the following

Proposition A4. Let $\mathsf{Act}(A;S) \subseteq \mathsf{Chan}(S)$ be a monoid of quantum channels on system $S$. If $\mathsf{Act}(A;S)$ contains the monoid of classical channels, and if the commutant $\mathsf{Act}(A;S)'$ contains the completely dephasing channel $\mathcal{D}$, then two states $\rho, \sigma \in \mathsf{St}(S)$ are equivalent for $A$ if and only if $\mathcal{D}(\rho) = \mathcal{D}(\sigma)$.

Proof. Same as the proof of Proposition 7.

Proposition A4 implies that the states of the subsystem S A are in one-to-one correspondence with
diagonal density matrices. Since the conditions of the proposition are satisﬁed by all the monoids 1–5,
each of these monoids deﬁnes the same state space.
The same result holds for the transformations:

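Proposition A5. Let $\mathsf{Act}(A;S) \subseteq \mathsf{Chan}(S)$ be a monoid of quantum channels. If $\mathsf{Act}(A;S)$ contains the monoid of classical channels, and if the commutant $\mathsf{Act}(A;S)'$ contains the completely dephasing channel $\mathcal{D}$, then two transformations $\mathcal{S}, \mathcal{T} \in \mathsf{Transf}(S)$ are equivalent for $A$ if and only if $\mathcal{D} \circ \mathcal{S} \circ \mathcal{D} = \mathcal{D} \circ \mathcal{T} \circ \mathcal{D}$.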

Proof. Same as the proofs of Lemmas A13 and A14.

Proposition A5 implies that the transformations of subsystem S A can be identiﬁed with classical
channels. Hence, system S A is exactly the d-dimensional classical subsystem of the quantum system S.
In summary, each of the monoids 1–5 deﬁnes the same d-dimensional classical subsystem.


Appendix E.2. Operations That Do Not Lead to Classical Subsystems

Here, we show that our construction does not associate classical subsystems with the monoids
of incoherent and maximally incoherent operations. To start with, we recall the deﬁnitions of these
two subsets:

1. The maximally incoherent operations are the quantum channels $\mathcal{T}$ that map diagonal density matrices to diagonal density matrices, namely $\mathcal{T} \circ \mathcal{D} = \mathcal{D} \circ \mathcal{T} \circ \mathcal{D}$, where $\mathcal{D}$ is the completely dephasing channel.
2. The incoherent operations are the quantum channels $\mathcal{T}$ with the property that, for every Kraus operator $T_i$, the map $\mathcal{T}_i(\cdot) = T_i \cdot T_i^\dagger$ sends diagonal matrices to diagonal matrices, namely $\mathcal{T}_i \circ \mathcal{D} = \mathcal{D} \circ \mathcal{T}_i \circ \mathcal{D}$.

Note that each set of operations contains the set of classical channels. Hence, the commutant of each set of operations consists of (some subset of) basis-preserving channels (by Lemma A15).
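Moreover, both sets of operations 1 and 2 contain the set of quantum channels $\mathcal{C}_\psi$ defined by the relation

\[
\mathcal{C}_\psi(\rho) = |1\rangle\langle 1| \, \langle\psi| \rho |\psi\rangle + \frac{I - |1\rangle\langle 1|}{d-1} \operatorname{Tr}[(I - |\psi\rangle\langle\psi|) \, \rho] \quad \forall \rho \in \mathsf{Lin}(\mathcal{H}_S) ,
\tag{A72}
\]

where $|\psi\rangle \in \mathcal{H}_S$ is a fixed (but otherwise arbitrary) unit vector. The fact that both monoids contain the channels $\mathcal{C}_\psi$ implies a strong constraint on their commutants:

Lemma A18. The only basis-preserving quantum channel $\mathcal{B} \in \mathsf{BPres}(S)$ satisfying the property $\mathcal{B} \circ \mathcal{C}_\psi = \mathcal{C}_\psi \circ \mathcal{B}$ for every $|\psi\rangle \in \mathcal{H}_S$ is the identity channel.

Proof. The commutation property implies the relation

\[
\begin{aligned}
(\mathcal{C}_\psi \circ \mathcal{B})(|\psi\rangle\langle\psi|) &= (\mathcal{B} \circ \mathcal{C}_\psi)(|\psi\rangle\langle\psi|) \\
&= \mathcal{B}(|1\rangle\langle 1|) \\
&= |1\rangle\langle 1| ,
\end{aligned}
\tag{A73}
\]

where we used the fact that $\mathcal{B}$ is basis-preserving. Tracing both sides of the equality with the projector $|1\rangle\langle 1|$, we obtain the relation

\[
\begin{aligned}
1 &= \langle 1| (\mathcal{C}_\psi \circ \mathcal{B})(|\psi\rangle\langle\psi|) |1\rangle \\
&= \langle\psi| \mathcal{B}(|\psi\rangle\langle\psi|) |\psi\rangle ,
\end{aligned}
\tag{A74}
\]

the second equality following from the definition of channel $\mathcal{C}_\psi$. In turn, Equation (A74) implies the relation $\mathcal{B}(|\psi\rangle\langle\psi|) = |\psi\rangle\langle\psi|$. Since $|\psi\rangle$ is arbitrary, this means that $\mathcal{B}$ must be the identity channel.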
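
A quick numerical illustration of Lemma A18, for a qubit: the completely dephasing channel is basis-preserving, but it fails to commute with $\mathcal{C}_\psi$ when $|\psi\rangle$ is a superposition of basis vectors. The sketch below implements Equation (A72) directly; the function name `c_psi`, the choice $|\psi\rangle = (|1\rangle + |2\rangle)/\sqrt{2}$, and the convention that list index 0 stands for the basis vector $|1\rangle$ are assumptions made for the example.

```python
from math import sqrt

def c_psi(psi, rho):
    """Channel of Equation (A72); index 0 plays the role of the basis vector |1>."""
    d = len(psi)
    overlap = sum((psi[i].conjugate() * rho[i][j] * psi[j]).real
                  for i in range(d) for j in range(d))
    trace = sum(rho[k][k].real for k in range(d))
    out = [[0.0] * d for _ in range(d)]
    out[0][0] = overlap
    for k in range(1, d):
        out[k][k] = (trace - overlap) / (d - 1)
    return out

def dephase(rho):
    """Completely dephasing channel: keep only the diagonal entries."""
    return [[rho[i][j] if i == j else 0.0 for j in range(len(rho))]
            for i in range(len(rho))]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

psi = [1 / sqrt(2), 1 / sqrt(2)]
psi_state = [[0.5, 0.5], [0.5, 0.5]]  # |psi><psi|

# C_psi maps |psi><psi| to the first basis projector.
sends_psi_to_first = close(c_psi(psi, psi_state), [[1.0, 0.0], [0.0, 0.0]])

# Compare D after C_psi with C_psi after D on the input |psi><psi|.
dephasing_commutes = close(dephase(c_psi(psi, psi_state)),
                           c_psi(psi, dephase(psi_state)))
```

Here `dephasing_commutes` is false: applying $\mathcal{D}$ first turns $|\psi\rangle\langle\psi|$ into the maximally mixed state, which $\mathcal{C}_\psi$ maps to a different output. This is consistent with the lemma, which leaves only the identity channel in the commutant.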

In summary, the commutant of the set of incoherent channels consists only of the identity channel, and so does the commutant of the set of maximally incoherent channels. Since the commutant is trivial, the equivalence classes are trivial, meaning that the subsystem $S_A$ has exactly the same states and the same transformations as the original system $S$. In short, the subsystem associated with the incoherent (or maximally incoherent) channels is the full quantum system.

Appendix F. Enriching the Sets of Transformations

Here, we provide a mathematical construction that enlarges the sets of transformations in the “baby category” with objects $S$, $S_A$, and $S_B$. This construction provides a realization of a categorical structure known as splitting of idempotents [73,74].


As we have seen in the main text, our basic construction does not provide transformations
from the subsystem S A to the global system S. One could introduce such transformations by hand,
by deﬁning an embedding [63]:

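Definition A1. An embedding of $S_A$ into $S$ is a map $\mathcal{E}_A : \mathsf{St}(S_A) \to \mathsf{St}(S)$ satisfying the property

\[
\operatorname{Tr}_B \circ \, \mathcal{E}_A = \mathcal{I}_{S_A} .
\tag{A75}
\]

In other words, $\mathcal{E}_A$ associates a representative to every equivalence class $\rho \in \mathsf{St}(S_A)$.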

A priori, embeddings need not be physical processes. Consider the example of a classical system, viewed as a subsystem of a closed quantum system as in Section 4.3. An embedding would map each classical probability distribution $(p_1, p_2, \dots, p_d)$ into a pure quantum state $|\psi\rangle = \sum_k c_k |k\rangle$ satisfying the condition $|c_k|^2 = p_k$ for all $k \in \{1, \dots, d\}$. If the embedding were a physical transformation, there would be a way to physically transform every classical probability distribution into a corresponding pure quantum state, a fact that is impossible in standard quantum theory.
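When building a new physical theory, one could postulate that there exists an embedding $\mathcal{E}_A$ that is physically realizable. In that case, the transformations from $S_A$ to $S$ would be those in the set

\[
\mathsf{Transf}(S_A \to S) = \big\{ \mathcal{T} \circ \mathcal{E}_A \, : \, \mathcal{T} \in \mathsf{Transf}(S) \big\} ,
\tag{A76}
\]

and similarly for the transformations from $S_B$ to $S$. The transformations from $S_A$ to $S_B$ would be those in the set

\[
\mathsf{Transf}(S_A \to S_B) = \big\{ \operatorname{Tr}_A \circ \, \mathcal{T} \circ \mathcal{E}_A \, : \, \mathcal{T} \in \mathsf{Transf}(S) \big\} ,
\tag{A77}
\]

and similarly for the transformations from $S_B$ to $S_A$. In that new theory, the old set of transformations from $S_A$ to $S_A$ should be replaced by the new set

\[
\mathsf{Transf}(S_A) = \big\{ \operatorname{Tr}_B \circ \, \mathcal{T} \circ \mathcal{E}_A \, : \, \mathcal{T} \in \mathsf{Transf}(S) \big\} ,
\tag{A78}
\]

so that the structure of category is preserved. Similarly, the old set of transformations from $S_B$ to $S_B$ should be replaced by the new set

\[
\mathsf{Transf}(S_B) = \big\{ \operatorname{Tr}_A \circ \, \mathcal{T} \circ \mathcal{E}_B \, : \, \mathcal{T} \in \mathsf{Transf}(S) \big\} .
\tag{A79}
\]

When this is done, the embeddings define two idempotent morphisms $\mathcal{P}_A := \mathcal{E}_A \circ \operatorname{Tr}_B$ and $\mathcal{P}_B := \mathcal{E}_B \circ \operatorname{Tr}_A$, i.e., two morphisms satisfying the conditions

\[
\mathcal{P}_A \circ \mathcal{P}_A = \mathcal{P}_A \qquad \text{and} \qquad \mathcal{P}_B \circ \mathcal{P}_B = \mathcal{P}_B .
\tag{A80}
\]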

The partial trace and the embedding define a splitting of idempotents, in the sense of Refs. [73,74].
The splitting of idempotents was considered in the categorical framework as a way to define general
decoherence maps, and, more specifically, decoherence maps to classical subsystems [74,89].
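
The idempotent obtained from a restriction and an embedding can be sketched for a classical subsystem. In the snippet below, the restriction is taken to be the extraction of the diagonal of a density matrix and the embedding is taken to be the diagonal embedding of a probability vector. Both are illustrative assumptions (in particular, this is not the pure-state embedding discussed above), and the function names are hypothetical.

```python
def restrict(rho):
    """Map a density matrix to its diagonal, read as a probability vector."""
    return [rho[k][k] for k in range(len(rho))]

def embed(p):
    """Assumed embedding: probability vector -> diagonal density matrix."""
    return [[p[i] if i == j else 0.0 for j in range(len(p))]
            for i in range(len(p))]

def idempotent(rho):
    """The composite embedding-after-restriction, playing the role of P_A."""
    return embed(restrict(rho))

p = [0.2, 0.3, 0.5]
rho = [[0.5, 0.5], [0.5, 0.5]]

# Counterpart of Equation (A75): restricting after embedding gives back p.
section_property = restrict(embed(p)) == p

# Counterpart of Equation (A80): applying the composite twice changes nothing.
is_idempotent = idempotent(idempotent(rho)) == idempotent(rho)
```

With these choices the composite map is the completely dephasing channel, which is a simple instance of a decoherence map arising from a splitting of idempotents.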

Appendix G. The Total System as a Subsystem

For every system satisfying the Non-Overlapping Agents Requirement, the system S can be
regarded as a subsystem:


Proposition A6. Let $S$ be a system satisfying the Non-Overlapping Agents Requirement, let $A_{\max}$ be the maximal agent, and $S_{A_{\max}}$ be the associated subsystem. Then, one has $S_{A_{\max}} \simeq S$, meaning that there exist two isomorphisms $\gamma : \mathsf{St}(S) \to \mathsf{St}(S_{A_{\max}})$ and $\delta : \mathsf{Transf}(S) \to \mathsf{Transf}(S_{A_{\max}})$ satisfying the condition

\[
\gamma(\mathcal{T} \psi) = \delta(\mathcal{T}) \, \gamma(\psi) , \quad \forall \psi \in \mathsf{St}(S) , \ \forall \mathcal{T} \in \mathsf{Transf}(S) .
\tag{A81}
\]

Proof. The Non-Overlapping Agents Requirement guarantees that the commutant $\mathsf{Act}(A_{\max}; S)'$ contains only the identity transformation. Hence, the equivalence class $[\psi]_{A_{\max}}$ contains only the state $\psi$. Hence, the partial trace $\operatorname{Tr}_{A_{\max}} : \psi \mapsto [\psi]_{A_{\max}}$ is a bijection from $\mathsf{St}(S)$ to $\mathsf{St}(S_{A_{\max}})$. Similarly, the equivalence class $[\mathcal{T}]_{A_{\max}}$ contains only the transformation $\mathcal{T}$. Hence, the restriction $\pi_{A_{\max}} : \mathcal{T} \mapsto [\mathcal{T}]_{A_{\max}}$ is a bijective function between $\mathsf{Transf}(S)$ and $\mathsf{Transf}(S_{A_{\max}})$. Such a function is a homomorphism of monoids, by Equation (20). Setting $\delta := \pi_{A_{\max}}$ and $\gamma := \operatorname{Tr}_{A_{\max}}$, the condition (A81) is guaranteed by Equation (21).

Appendix H. Proof of Proposition 15

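By definition, the condition $\operatorname{Tr}_B[\psi] = \operatorname{Tr}_B[\psi']$ holds if and only if there exists a finite sequence $(\psi_1, \psi_2, \dots, \psi_n)$ such that

\[
\psi_1 = \psi , \quad \psi_n = \psi' , \quad \forall i \in \{1, \dots, n-1\} \ \exists \, \mathcal{V}_i, \widetilde{\mathcal{V}}_i \in \mathsf{Act}(B;S) : \ \mathcal{V}_i \psi_i = \widetilde{\mathcal{V}}_i \psi_{i+1} .
\tag{A82}
\]

Our goal is to prove that there exists an adversarial action $\mathcal{V}_B \in \mathsf{Act}(B;S)$ such that the relation $\psi' = \mathcal{V}_B \psi$ or $\psi = \mathcal{V}_B \psi'$ holds.
We will proceed by induction on $n$, starting from the base case $n = 2$. In this case, we have $\mathsf{Deg}_B(\psi) \cap \mathsf{Deg}_B(\psi') \neq \emptyset$. Then, the first regularity condition implies that there exists a transformation $\mathcal{V}_B \in \mathsf{Act}(B;S)$ such that at least one of the relations $\mathcal{V}_B \psi = \psi'$ and $\psi = \mathcal{V}_B \psi'$ holds. This proves the validity of the base case.
Now, suppose that the induction hypothesis holds for all sequences of length $n$, and suppose that $\psi$ and $\psi'$ are equivalent through a sequence of length $n + 1$, say $(\psi_1, \psi_2, \dots, \psi_n, \psi_{n+1})$. Applying the induction hypothesis to the sequence $(\psi_1, \psi_2, \dots, \psi_n)$, we obtain that there exists a transformation $\mathcal{V} \in \mathsf{Act}(B;S)$ such that at least one of the relations $\psi_n = \mathcal{V} \psi$ and $\psi = \mathcal{V} \psi_n$ holds. Moreover, applying the induction hypothesis to the pair $(\psi_n, \psi_{n+1})$ we obtain that there exists a transformation $\mathcal{V}' \in \mathsf{Act}(B;S)$ such that $\psi_{n+1} = \mathcal{V}' \psi_n$, or $\psi_n = \mathcal{V}' \psi_{n+1}$. Hence, there are four possible cases:

1. $\psi_n = \mathcal{V} \psi$ and $\psi_{n+1} = \mathcal{V}' \psi_n$. In this case, we have $\psi_{n+1} = (\mathcal{V}' \circ \mathcal{V}) \psi$, which proves the desired statement.
2. $\psi_n = \mathcal{V} \psi$ and $\psi_n = \mathcal{V}' \psi_{n+1}$. In this case, we have $\mathcal{V} \psi = \mathcal{V}' \psi_{n+1}$, or equivalently $\mathsf{Deg}_B(\psi) \cap \mathsf{Deg}_B(\psi_{n+1}) \neq \emptyset$. Applying the induction hypothesis to the sequence $(\psi, \psi_{n+1})$, we obtain the desired statement.
3. $\psi = \mathcal{V} \psi_n$ and $\psi_{n+1} = \mathcal{V}' \psi_n$. Using the second regularity condition, we obtain that there exists a transformation $\mathcal{W} \in \mathsf{Act}(B;S)$ such that at least one of the relations $\mathcal{V} = \mathcal{W} \circ \mathcal{V}'$ and $\mathcal{V}' = \mathcal{W} \circ \mathcal{V}$ holds. Suppose that $\mathcal{V} = \mathcal{W} \circ \mathcal{V}'$. In this case, we have

\[
\psi = \mathcal{V} \psi_n = (\mathcal{W} \circ \mathcal{V}') \psi_n = \mathcal{W} \psi_{n+1} .
\tag{A83}
\]

Alternatively, suppose that $\mathcal{V}' = \mathcal{W} \circ \mathcal{V}$. In this case, we have

\[
\psi_{n+1} = \mathcal{V}' \psi_n = (\mathcal{W} \circ \mathcal{V}) \psi_n = \mathcal{W} \psi .
\tag{A84}
\]

In both cases, we proved the desired statement.

4. $\psi = \mathcal{V} \psi_n$ and $\psi_n = \mathcal{V}' \psi_{n+1}$. In this case, we have $\psi = (\mathcal{V} \circ \mathcal{V}') \psi_{n+1}$, which proves the desired statement.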


Appendix I. Characterization of the Adversarial Group

Here, we provide the proof of Theorem 3, proving a canonical decomposition of the elements of
the adversarial group. The proof proceeds in a few steps:

Lemma A19 (Canonical form of the elements of the adversarial group). Let U : g → Ug be a projective
representation of the group G, let Irr(U ) be the set of irreducible representations contained in the isotypic
decomposition of U, and let ω : G → C be a multiplicative character of G. Then, the commutation relation

VUg = ω ( g) Ug V ∀g ∈ G (A85)

holds iff

1. The map U^(j) → ω U^(j) is a permutation of the set Irr(U), denoted as π : Irr(U) → Irr(U). In other
words, for every irrep U^(j) with j ∈ Irr(U), the irrep ω U^(j) is equivalent to an irrep U^(k) with k ∈ Irr(U), and the
correspondence between j and k is bijective.
2. The multiplicity spaces M_j and M_π(j) have the same dimension.
3. The unitary operator V has the canonical form V = U_π V_0, where V_0 is a unitary operator in the
commutant U′ and U_π is a permutation operator satisfying

\[
U_\pi \left( \mathcal R_j \otimes \mathcal M_j \right) = \mathcal R_{\pi(j)} \otimes \mathcal M_{\pi(j)} \qquad \forall j \in \mathrm{Irr}(U) . \tag{A86}
\]

Proof. Let us use the isotypic decomposition of U, as in Equation (88). We define

\[
V_{j,k} := \Pi_j V \Pi_k , \tag{A87}
\]

where Π_j (Π_k) is the projector onto R_j ⊗ M_j (R_k ⊗ M_k). Then, Equation (A85) is equivalent to
the condition

\[
V_{j,k} \left( U^{(k)}_g \otimes I_{\mathcal M_k} \right) = \omega(g) \left( U^{(j)}_g \otimes I_{\mathcal M_j} \right) V_{j,k} , \qquad \forall g \in G , \ \forall j, k , \tag{A88}
\]

which in turn is equivalent to the condition

\[
\langle \alpha | V_{j,k} | \beta \rangle \, U^{(k)}_g = \omega(g) \, U^{(j)}_g \, \langle \alpha | V_{j,k} | \beta \rangle , \qquad \forall g \in G , \ \forall j, k , \ \forall |\alpha\rangle \in \mathcal M_j , \ \forall |\beta\rangle \in \mathcal M_k , \tag{A89}
\]

where ⟨α|V_{j,k}|β⟩ is a shorthand for the partial matrix element (I_{R_j} ⊗ ⟨α|) V_{j,k} (I_{R_k} ⊗ |β⟩).
Equation (A89) means that each operator ⟨α|V_{j,k}|β⟩ intertwines the two representations U^(k) and
ω U^(j).
Recall that each representation is irreducible. Hence, the second Schur's lemma [78] implies
that ⟨α|V_{j,k}|β⟩ is zero if the two representations are not equivalent. Note that there can be at most
one value of j such that U^(k) is equivalent to ω U^(j). If such a value exists, we denote it as j = π(k).
By construction, the function π : Irr(U) → Irr(U) must be injective.
When j = π(k), the first Schur's lemma [78] guarantees that the operator ⟨α|V_{π(k),k}|β⟩ is
proportional to the partial isometry T_{π(k),k} that implements the equivalence of the two representations.
Let us write

\[
\langle \alpha | V_{\pi(k),k} | \beta \rangle = M^{(k)}_{\alpha,\beta} \, T_{\pi(k),k} , \tag{A90}
\]

for some M^{(k)}_{α,β} ∈ C. Note also that, since the left-hand side is sesquilinear in |α⟩ and |β⟩, the right-hand
side should also be sesquilinear. Hence, we can find an operator M_{π(k),k} : M_k → M_π(k) such that
M^{(k)}_{α,β} = ⟨α|M_{π(k),k}|β⟩. Putting everything together, the operator V can be written as

\[
V = \bigoplus_{k \in \mathrm{Irr}(U)} \left( T_{\pi(k),k} \otimes M_{\pi(k),k} \right) . \tag{A91}
\]

Now, the operator V must be unitary, and, in particular, it should satisfy the condition V V† = I,
which reads

\[
\bigoplus_{k \in \mathrm{Irr}(U)} \left( I_{\mathcal R_{\pi(k)}} \otimes M_{\pi(k),k} M^{\dagger}_{\pi(k),k} \right) = I . \tag{A92}
\]

The above condition implies that: (i) the function π must be surjective, and (ii) the operator
M_{π(k),k} must be a co-isometry. From the relation V†V = I, we also obtain that M_{π(k),k} must be an isometry.
Hence, M_{π(k),k} is unitary.
Summarizing, the condition (A85) can be satisfied only if there exists a permutation π : Irr(U) →
Irr(U) such that, for every k,
1. the irreps ω U^(k) and U^(π(k)) are equivalent,
2. the multiplicity spaces M_k and M_π(k) are unitarily isomorphic.
Fixing a unitary isomorphism S_{π(k),k} : M_k → M_π(k), we can write every element of the
adversarial group in the canonical form V = U_π V_0, where U_π is the permutation operator

\[
U_\pi = \bigoplus_{k \in \mathrm{Irr}(U)} \left( T_{\pi(k),k} \otimes S_{\pi(k),k} \right) , \tag{A93}
\]

and V_0 is an element of the commutant U′, i.e., a generic unitary operator of the form

\[
V_0 = \bigoplus_{k \in \mathrm{Irr}(U)} \left( I_{\mathcal R_k} \otimes V_{0,k} \right) . \tag{A94}
\]

Conversely, if a permutation π exists with the properties that for every k ∈ Irr(U)
1. ω U^(k) and U^(π(k)) are equivalent irreps,
2. M_k and M_π(k) are unitarily equivalent,
and if the operator V has the form V = U_π V_0, with U_π and V_0 as in Equations (A93) and (A94), then V
satisfies the commutation relation (A85).

We have seen that every element of the adversarial group can be decomposed into the product of
a permutation operator, which permutes the irreps, and an operator in the commutant of the original
group representation U : G → Lin(H). We now observe that the allowed permutations have an
additional structure: they must form an Abelian group, denoted as A.

Lemma A20. The permutations π arising from Equation (A85) with a generic multiplicative character ω (V, ·)
form an Abelian subgroup A of the group of all permutations of Irr(U ).

Proof. Let V and W be two elements of the adversarial group G_B, let ω(V, ·) and ω(W, ·) be the
corresponding characters, and let π_V and π_W be the permutations associated with ω(V, ·) and ω(W, ·)
as in Lemma A19, i.e., through the relations

\[
\begin{aligned}
j = \pi_V(k) \quad &\Longleftrightarrow \quad U^{(j)} \text{ is equivalent to } \omega(V, \cdot)\, U^{(k)} , \\
j = \pi_W(k) \quad &\Longleftrightarrow \quad U^{(j)} \text{ is equivalent to } \omega(W, \cdot)\, U^{(k)} .
\end{aligned} \tag{A95}
\]


Now, the element VW is associated with the permutation πV ◦ πW , while the element WV is
associated with the permutation πW ◦ πV . On the other hand, the characters obey the equality

ω (VW, g) = ω (V, g)ω (W, g) = ω (WV, g) ∀g ∈ G . (A96)

Hence, we conclude that πV ◦ πW and πW ◦ πV are, in fact, the same permutation. Hence,
the elements of the adversarial group must correspond to an Abelian subgroup of the permutations
of Irr(U ).

Combining Lemmas A19 and A20, we can now prove Theorem 3.

Proof of Theorem 3. For different permutations in A, we can choose the isomorphisms S_{π(k),k} : M_k →
M_π(k) such that the following property holds:

\[
S_{\pi_2 \circ \pi_1 (k), k} = S_{\pi_2(\pi_1(k)), \pi_1(k)} \, S_{\pi_1(k), k} , \qquad \forall\, \pi_1 , \pi_2 \in A . \tag{A97}
\]

When this is done, the unitary operators U_π defined in Equation (A93) form a faithful
representation of the Abelian group A. Using the canonical decomposition of Lemma A19, every
element V ∈ G_B is decomposed uniquely as V = U_π V_0, where V_0 is an element of the commutant U′.
Note also that the commutant U′ is a normal subgroup of the adversarial group: indeed, for every
element V ∈ G_B we have V U′ V† = U′. Since U′ is a normal subgroup and the decomposition
V = U_π V_0 is unique for every V ∈ G_B, it follows that the adversarial group G_B is the semidirect
product A ⋉ U′.

Appendix J. Example: The Phase Flip Group

Consider the Hilbert space H_S = C^2, and suppose that agent A can only perform the identity
channel and the phase flip channel Z, defined as

\[
\mathcal Z(\cdot) = Z \cdot Z , \qquad Z = |0\rangle\langle 0| - |1\rangle\langle 1| . \tag{A98}
\]

Then, the actions of agent A correspond to the unitary representation

\[
U : \mathbb Z_2 \to \mathrm{Lin}(S) , \qquad k \mapsto U_k = Z^k . \tag{A99}
\]

The representation can be decomposed into two irreps, corresponding to the one-dimensional
subspaces H_0 = Span{|0⟩} and H_1 = Span{|1⟩}. The corresponding irreps, denoted by

\[
\begin{aligned}
\omega_0 &: \mathbb Z_2 \to \mathbb C , \qquad \omega_0(k) = 1 , \\
\omega_1 &: \mathbb Z_2 \to \mathbb C , \qquad \omega_1(k) = (-1)^k ,
\end{aligned} \tag{A100}
\]

are the only two irreps of the group and are multiplicative characters.
The condition V U_k = U_k V yields the solutions

\[
V = e^{i\theta_0} |0\rangle\langle 0| + e^{i\theta_1} |1\rangle\langle 1| , \qquad \theta_0 , \theta_1 \in [0, 2\pi) , \tag{A101}
\]

corresponding to the commutant U′. The condition V U_k = (−1)^k U_k V yields the solutions

\[
V = e^{i\theta_0} |0\rangle\langle 1| + e^{i\theta_1} |1\rangle\langle 0| , \qquad \theta_0 , \theta_1 \in [0, 2\pi) . \tag{A102}
\]

It is easy to see that the adversarial group G_B acts irreducibly on H_S.
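The two families of solutions in Equations (A101) and (A102) can be checked numerically. The following is a minimal sketch, assuming NumPy and the computational basis ordering (|0⟩, |1⟩); the helper names `diagonal_solution` and `antidiagonal_solution` are hypothetical and introduced only for this illustration.

```python
import numpy as np

# Phase flip Z = |0><0| - |1><1| in the computational basis.
Z = np.diag([1.0, -1.0]).astype(complex)

def diagonal_solution(theta0, theta1):
    """Illustrative helper: V = e^{i theta0}|0><0| + e^{i theta1}|1><1|, as in (A101)."""
    return np.diag([np.exp(1j * theta0), np.exp(1j * theta1)])

def antidiagonal_solution(theta0, theta1):
    """Illustrative helper: V = e^{i theta0}|0><1| + e^{i theta1}|1><0|, as in (A102)."""
    return np.array([[0.0, np.exp(1j * theta0)],
                     [np.exp(1j * theta1), 0.0]])

rng = np.random.default_rng(0)
theta0, theta1 = rng.uniform(0.0, 2.0 * np.pi, size=2)

V_comm = diagonal_solution(theta0, theta1)
V_anti = antidiagonal_solution(theta0, theta1)

# Character omega_0: V U_1 = U_1 V with U_1 = Z.
commutes = np.allclose(V_comm @ Z, Z @ V_comm)
# Character omega_1: V U_1 = (-1)^1 U_1 V.
anticommutes = np.allclose(V_anti @ Z, -(Z @ V_anti))
# Both candidate operators are unitary.
both_unitary = all(np.allclose(V.conj().T @ V, np.eye(2)) for V in (V_comm, V_anti))
```

For k = 0 both conditions are trivially satisfied because U_0 is the identity, so only k = 1 needs to be checked.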


Let us consider now the subsystem S_A. The states of S_A are equivalence classes under the relation

\[
|\psi\rangle \simeq_A |\psi'\rangle \quad \Longleftrightarrow \quad \exists\, V \in G_B : \ |\psi'\rangle = V |\psi\rangle . \tag{A103}
\]

It is not hard to see that the equivalence class of the state |ψ⟩ is uniquely determined by the
unordered pair {|⟨0|ψ⟩|, |⟨1|ψ⟩|}. In other words, the state space of system S_A is

\[
\mathrm{St}(S_A) = \big\{ \{ p , 1 - p \} \, : \, p \in [0, 1] \big\} . \tag{A104}
\]

Note that, in this case, the state space is not a convex set of density matrices. Instead, it is the
quotient of the set of diagonal density matrices, under the equivalence relation that two matrices with
the same spectrum are equivalent.
Finally, note that the transformations of system S_A are trivial: since the adversarial group G_B
contains the group G_A, the group G(S_A) = π_A(G_A) is trivial, namely

\[
G(S_A) = \{ I_{S_A} \} . \tag{A105}
\]
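As a quick numerical illustration of the claim that the equivalence class depends only on the unordered pair of amplitude moduli, the sketch below applies one element of each family (A101) and (A102) to a random pure state and compares the sorted moduli before and after. It assumes NumPy; the function name `class_label` is hypothetical and not taken from the paper.

```python
import numpy as np

def class_label(psi):
    """Sorted amplitude moduli, standing in for the unordered pair {|<0|psi>|, |<1|psi>|}."""
    return np.sort(np.abs(psi))

rng = np.random.default_rng(0)

# Random normalized pure state of a qubit.
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi = psi / np.linalg.norm(psi)

theta0, theta1 = rng.uniform(0.0, 2.0 * np.pi, size=2)
V_diag = np.diag([np.exp(1j * theta0), np.exp(1j * theta1)])   # form (A101)
V_flip = np.array([[0.0, np.exp(1j * theta0)],
                   [np.exp(1j * theta1), 0.0]])                 # form (A102)

label_before = class_label(psi)
labels_after = [class_label(V @ psi) for V in (V_diag, V_flip)]

# The label is unchanged by both kinds of adversarial unitaries.
invariant = all(np.allclose(label_before, label) for label in labels_after)
```

The diagonal unitaries only change phases, while the antidiagonal ones also swap the two amplitudes, which is why the pair has to be treated as unordered.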

Appendix K. Proof of Theorem 4

Let G be a connected Lie group, and let g be the Lie algebra. Since G is connected, the exponential
map reaches every element of the group, namely G = exp[ig].
Let h ∈ G be a generic element of the group, written as h = exp[iX ] for some X ∈ g, and consider
the one-parameter subgroup H = {exp[iλX ] , λ ∈ R}. For a generic element g ∈ H, the corresponding
unitary operator can be expressed as Ug = exp[iλK ], where K ∈ Lin(S) is a suitable self-adjoint
operator. Similarly, the multiplicative character has the form ω ( g) = exp[iλμ], for some real number
μ ∈ R.
Now, every element V of the adversarial group must satisfy the relation

V exp[iλK ] = exp[iλ(K + μ IS )]V ∀λ ∈ R , (A106)

or equivalently,

exp[iλK ] = V † exp[iλ(K + μ IS )] V ∀λ ∈ R . (A107)

Since the operators exp[iλK ] and exp[iλ(K + μ IS )] are unitarily equivalent, they must have
the same spectrum. This is only possible if the operators K and K + μ IS have the same spectrum,
which happens only if μ = 0.
Now, recall that the one-parameter Abelian subgroup H is generic. Since every element of G is
contained in some one-parameter Abelian subgroup H, we showed that ω ( g) = 1 for every g ∈ G.
To conclude the proof, observe that the map U ( j) → ω U ( j) is the identity, and therefore induces
the trivial permutation on the set of irreps Irr(U ). Hence, the group of permutations A induced by
multiplication by ω contains only the identity element.
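The last step of the spectral argument, namely that K and K + μ I_S can share a spectrum only when μ = 0, can be illustrated numerically. This is a small sketch assuming NumPy, with a randomly generated Hermitian matrix standing in for the generator K; it illustrates the finite-dimensional statement and is not a substitute for the proof.

```python
import numpy as np

rng = np.random.default_rng(1)

# Random self-adjoint operator K on a 4-dimensional space (an arbitrary stand-in).
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
K = (A + A.conj().T) / 2

def same_spectrum(mu):
    """Compare the sorted eigenvalues of K and K + mu * I."""
    spectrum_K = np.linalg.eigvalsh(K)
    spectrum_shifted = np.linalg.eigvalsh(K + mu * np.eye(4))
    return np.allclose(spectrum_K, spectrum_shifted)

# Adding mu * I shifts every eigenvalue by mu, so the spectra agree only for mu = 0.
agrees_at_zero = same_spectrum(0.0)
agrees_at_shift = same_spectrum(0.3)
shift_is_uniform = np.allclose(
    np.linalg.eigvalsh(K + 0.3 * np.eye(4)), np.linalg.eigvalsh(K) + 0.3
)
```

Because a finite set of real numbers cannot be invariant under a nonzero translation, the uniform shift forces μ = 0, which is the conclusion used in the text.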

Appendix L. Proof of Proposition 16

Proof. It is enough to decompose the two states as

\[
|\psi\rangle = \bigoplus_{j \in \mathrm{Irr}(U)} \sqrt{p_j} \, |\psi_j\rangle \qquad \text{and} \qquad |\psi'\rangle = \bigoplus_{j \in \mathrm{Irr}(U)} \sqrt{p'_j} \, |\psi'_j\rangle , \tag{A108}
\]

where ρ_j (ρ′_j) is the marginal of |ψ_j⟩ (|ψ′_j⟩) on system R_j. It is then clear that the equality T_B(|ψ⟩⟨ψ|) =
T_B(|ψ′⟩⟨ψ′|) implies p_j = p′_j and ρ_j = ρ′_j for every j. Since the states |ψ_j⟩ and |ψ′_j⟩ have the same
marginal on system R_j, there must exist a unitary operator U_j : M_j → M_j such that

\[
|\psi'_j\rangle = \left( I_{\mathcal R_j} \otimes U_j \right) |\psi_j\rangle . \tag{A110}
\]

We can then define the unitary gate

\[
U_B = \bigoplus_{j \in \mathrm{Irr}(U)} \left( I_{\mathcal R_j} \otimes U_j \right) , \tag{A111}
\]

which satisfies the property U_B |ψ⟩ = |ψ′⟩. By the characterization of Equation (89), U_B is an element
of G_B.

References
1. Nielsen, M.; Chuang, I. Quantum information and computation. Nature 2000, 404, 247.
2. Kitaev, A.Y.; Shen, A.; Vyalyi, M.N. Classical and Quantum Computation; Number 47; American Mathematical
Society: Providence, RI, USA, 2002.
3. Einstein, A.; Podolsky, B.; Rosen, N. Can quantum-mechanical description of physical reality be considered
complete? Phys. Rev. 1935, 47, 777. [CrossRef]
4. Schrödinger, E. Discussion of probability relations between separated systems. In Mathematical Proceedings
of the Cambridge Philosophical Society; Cambridge University Press: Cambridge, UK, 1935; Volume 31,
pp. 555–563.
5. Hardy, L. Quantum theory from ﬁve reasonable axioms. arXiv 2001, arXiv:quant-ph/0101012.
6. Barnum, H.; Barrett, J.; Leifer, M.; Wilce, A. Generalized no-broadcasting theorem. Phys. Rev. Lett. 2007,
99, 240501. [CrossRef] [PubMed]
7. Barrett, J. Information processing in generalized probabilistic theories. Phys. Rev. A 2007, 75, 032304.
[CrossRef]
8. Chiribella, G.; D’Ariano, G.; Perinotti, P. Probabilistic theories with purification. Phys. Rev. A 2010, 81, 062348.
[CrossRef]
9. Barnum, H.; Wilce, A. Information processing in convex operational theories. Electron. Notes Theor. Comput. Sci.
2011, 270, 3–15. [CrossRef]
10. Hardy, L. Foliable operational structures for general probabilistic theories. In Deep Beauty: Understanding the
Quantum World through Mathematical Innovation; Halvorson, H., Ed.; Cambridge University Press: Cambridge,
UK, 2011; p. 409.
11. Hardy, L. A formalism-local framework for general probabilistic theories, including quantum theory.
Math. Struct. Comput. Sci. 2013, 23, 399–440. [CrossRef]
12. Chiribella, G. Dilation of states and processes in operational-probabilistic theories. In Proceedings of
the 11th workshop on Quantum Physics and Logic, Kyoto, Japan, 4–6 June 2014; Coecke, B., Hasuo, I.,
Panangaden, P., Eds.; Electronic Proceedings in Theoretical Computer Science; Volume 172, pp. 1–14.
13. Chiribella, G.; D’Ariano, G.M.; Perinotti, P. Quantum from principles. In Quantum Theory: Informational
Foundations and Foils; Springer: Dordrecht, The Netherlands, 2016; pp. 171–221.
14. Hardy, L. Reconstructing quantum theory. In Quantum Theory: Informational Foundations and Foils; Springer:
Dordrecht, The Netherlands, 2016; pp. 223–248.
15. Mauro D’Ariano, G.; Chiribella, G.; Perinotti, P. Quantum Theory from First Principles. In Quantum Theory
from First Principles; D’Ariano, G.M., Chiribella, G., Perinotti, P., Eds.; Cambridge University Press: Cambridge,
UK, 2017.


16. Abramsky, S.; Coecke, B. A categorical semantics of quantum protocols. In Proceedings of the 19th Annual
IEEE Symposium on Logic in Computer Science, Turku, Finland, 17 July 2004; pp. 415–425.
17. Coecke, B. Kindergarten quantum mechanics: Lecture notes. In Proceedings of the AIP Conference
Quantum Theory: Reconsideration of Foundations-3, Växjö, Sweden, 6–11 June 2005; American Institute of
Physics: Melville, NY, USA, 2006; Volume 810, pp. 81–98.
18. Coecke, B. Quantum picturalism. Contemp. Phys. 2010, 51, 59–83. [CrossRef]
19. Abramsky, S.; Coecke, B. Categorical quantum mechanics. In Handbook of Quantum Logic and Quantum
Structures: Quantum Logic; Elsevier Science: New York, NY, USA, 2008; pp. 261–324.
20. Coecke, B.; Kissinger, A. Picturing Quantum Processes; Cambridge University Press: Cambridge, UK, 2017.
21. Selinger, P. A survey of graphical languages for monoidal categories. In New Structures for Physics; Springer:
Berlin/Heidelberg, Germany, 2010; pp. 289–355.
22. Haag, R. Local Quantum Physics: Fields, Particles, Algebras; Springer: Berlin/Heidelberg, Germany, 2012.
23. Viola, L.; Knill, E.; Laﬂamme, R. Constructing qubits in physical systems. J. Phys. A Math. Gen. 2001, 34, 7067.
[CrossRef]
24. Zanardi, P.; Lidar, D.A.; Lloyd, S. Quantum tensor product structures are observable induced. Phys. Rev. Lett.
2004, 92, 060402. [CrossRef] [PubMed]
25. Palma, G.M.; Suominen, K.A.; Ekert, A.K. Quantum computers and dissipation. Proc. R. Soc. Lond. A 1996,
452, 567–584. [CrossRef]
26. Zanardi, P.; Rasetti, M. Noiseless quantum codes. Phys. Rev. Lett. 1997, 79, 3306. [CrossRef]
27. Lidar, D.A.; Chuang, I.L.; Whaley, K.B. Decoherence-free subspaces for quantum computation. Phys. Rev. Lett.
1998, 81, 2594. [CrossRef]
28. Knill, E.; Laﬂamme, R.; Viola, L. Theory of quantum error correction for general noise. Phys. Rev. Lett. 2000,
84, 2525. [CrossRef] [PubMed]
29. Zanardi, P. Stabilizing quantum information. Phys. Rev. A 2000, 63, 012301. [CrossRef]
30. Kempe, J.; Bacon, D.; Lidar, D.A.; Whaley, K.B. Theory of decoherence-free fault-tolerant universal quantum
computation. Phys. Rev. A 2001, 63, 042307. [CrossRef]
31. Zanardi, P. Virtual quantum subsystems. Phys. Rev. Lett. 2001, 87, 077901. [CrossRef] [PubMed]
32. Bratteli, O.; Robinson, D.W. Operator Algebras and Quantum Statistical Mechanics 1; Springer:
Berlin/Heidelberg, Germany, 1987.
33. Kraemer, L.; Del Rio, L. Operational locality in global theories. arXiv 2017, arXiv:1701.03280.
34. Åberg, J. Quantifying superposition. arXiv 2006, arXiv:quant-ph/0612146.
35. Baumgratz, T.; Cramer, M.; Plenio, M. Quantifying coherence. Phys. Rev. Lett. 2014, 113, 140401. [CrossRef]
[PubMed]
36. Levi, F.; Mintert, F. A quantitative theory of coherent delocalization. New J. Phys. 2014, 16, 033007. [CrossRef]
37. Winter, A.; Yang, D. Operational resource theory of coherence. Phys. Rev. Lett. 2016, 116, 120404. [CrossRef]
[PubMed]
38. Chitambar, E.; Gour, G. Critical examination of incoherent operations and a physically consistent resource
theory of quantum coherence. Phys. Rev. Lett. 2016, 117, 030401. [CrossRef] [PubMed]
39. Chitambar, E.; Gour, G. Comparison of incoherent operations and measures of coherence. Phys. Rev. A 2016,
94, 052336. [CrossRef]
40. Marvian, I.; Spekkens, R.W. How to quantify coherence: Distinguishing speakable and unspeakable notions.
Phys. Rev. A 2016, 94, 052324. [CrossRef]
41. Yadin, B.; Ma, J.; Girolami, D.; Gu, M.; Vedral, V. Quantum processes which do not use coherence. Phys. Rev. X
2016, 6, 041028. [CrossRef]
42. Chiribella, G.; D’Ariano, G.; Perinotti, P. Informational derivation of quantum theory. Phys. Rev. A 2011,
84, 012311. [CrossRef]
43. Hardy, L. Reformulating and reconstructing quantum theory. arXiv 2011, arXiv:1104.2066.
44. Masanes, L.; Müller, M.P. A derivation of quantum theory from physical requirements. New J. Phys. 2011,
13, 063001. [CrossRef]
45. Dakic, B.; Brukner, C. Quantum Theory and Beyond: Is Entanglement Special? In Deep Beauty: Understanding
the Quantum World through Mathematical Innovation; Halvorson, H., Ed.; Cambridge University Press:
Cambridge, UK, 2011; pp. 365–392.


46. Masanes, L.; Müller, M.P.; Augusiak, R.; Perez-Garcia, D. Existence of an information unit as a postulate of
quantum theory. Proc. Natl. Acad. Sci. USA 2013, 110, 16373–16377. [CrossRef] [PubMed]
47. Wilce, A. Conjugates, Filters and Quantum Mechanics. arXiv 2012, arXiv:1206.2897.
48. Barnum, H.; Müller, M.P.; Ududec, C. Higher-order interference and single-system postulates characterizing
quantum theory. New J. Phys. 2014, 16, 123029. [CrossRef]
49. Chiribella, G.; D’Ariano, G.; Perinotti, P. Quantum Theory, namely the pure and reversible theory of
information. Entropy 2012, 14, 1877–1893. [CrossRef]
50. Chiribella, G.; Yuan, X. Quantum theory from quantum information: The purification route. Can. J. Phys.
2013, 91, 475–478. [CrossRef]
51. Chiribella, G.; Scandolo, C.M. Conservation of information and the foundations of quantum mechanics.
In EPJ Web of Conferences; EDP Sciences: Les Ulis, France, 2015; Volume 95, p. 03003.
52. Chiribella, G.; Scandolo, C.M. Entanglement and thermodynamics in general probabilistic theories.
New J. Phys. 2015, 17, 103027. [CrossRef]
53. Chiribella, G.; Scandolo, C.M. Microcanonical thermodynamics in general physical theories. New J. Phys.
2017, 19, 123043. [CrossRef]
54. Chiribella, G.; Scandolo, C.M. Entanglement as an axiomatic foundation for statistical mechanics. arXiv
2016, arXiv:1608.04459.
55. Lee, C.M.; Selby, J.H. Generalised phase kick-back: The structure of computational algorithms from physical
principles. New J. Phys. 2016, 18, 033023. [CrossRef]
56. Lee, C.M.; Selby, J.H. Deriving Grover’s lower bound from simple physical principles. New J. Phys. 2016,
18, 093047. [CrossRef]
57. Lee, C.M.; Selby, J.H.; Barnum, H. Oracles and query lower bounds in generalised probabilistic theories.
arXiv 2017, arXiv:1704.05043.
58. Susskind, L. The Black Hole War: My Battle with Stephen Hawking to Make the World Safe for Quantum Mechanics;
Hachette UK: London, UK, 2008.
59. Takesaki, M. Theory of Operator Algebras I; Springer: New York, NY, USA, 1979.
60. Barnum, H.; Knill, E.; Ortiz, G.; Somma, R.; Viola, L. A subsystem-independent generalization of
entanglement. Phys. Rev. Lett. 2004, 92, 107902. [CrossRef] [PubMed]
61. Barnum, H.; Knill, E.; Ortiz, G.; Viola, L. Generalizations of entanglement based on coherent states and
convex sets. Phys. Rev. A 2003, 68, 032308. [CrossRef]
62. Barnum, H.; Ortiz, G.; Somma, R.; Viola, L. A generalization of entanglement to convex operational theories:
entanglement relative to a subspace of observables. Int. J. Theor. Phys. 2005, 44, 2127–2145. [CrossRef]
63. Del Rio, L.; Kraemer, L.; Renner, R. Resource theories of knowledge. arXiv 2015, arXiv:1511.08818.
64. Del Rio, L. Resource Theories of Knowledge. Ph.D. Thesis, ETH Zürich, Zürich, Switzerland, 2015.
[CrossRef]
65. Kraemer Gabriel, L. Restricted Agents in Thermodynamics and Quantum Information Theory. Ph.D. Thesis,
ETH Zürich, Zürich, Switzerland, 2016. [CrossRef]
66. Brassard, G.; Raymond-Robichaud, P. The equivalence of local-realistic and no-signalling theories. arXiv
2017, arXiv:1710.01380.
67. Holevo, A.S. Statistical Structure of Quantum Theory; Springer: Berlin/Heidelberg, Germany, 2003; Volume 67.
68. Kraus, K. States, Effects and Operations: Fundamental Notions of Quantum Theory; Springer: Berlin/Heidelberg,
Germany, 1983.
69. Haag, R.; Schroer, B. Postulates of quantum field theory. J. Math. Phys. 1962, 3, 248–256. [CrossRef]
70. Haag, R.; Kastler, D. An algebraic approach to quantum field theory. J. Math. Phys. 1964, 5, 848–861.
[CrossRef]
71. Buscemi, F.; Chiribella, G.; D’Ariano, G.M. Inverting quantum decoherence by classical feedback from the
environment. Phys. Rev. Lett. 2005, 95, 090501. [CrossRef] [PubMed]
72. Buscemi, F.; Chiribella, G.; D’Ariano, G.M. Quantum erasure of decoherence. Open Syst. Inf. Dyn. 2007,
14, 53–61. [CrossRef]
73. Selinger, P. Idempotents in dagger categories. Electron. Notes Theor. Comput. Sci. 2008, 210, 107–122.
[CrossRef]
74. Coecke, B.; Selby, J.; Tull, S. Two Roads to Classicality. Electron. Proc. Theor. Comput. Sci. 2018, 266, 104–118.
[CrossRef]


75. Coecke, B.; Lal, R. Causal categories: relativistically interacting processes. Found. Phys. 2013, 43, 458–501.
[CrossRef]
76. Coecke, B. Terminality implies no-signalling... and much more than that. New Gener. Comput. 2016, 34, 69–85.
[CrossRef]
77. Chiribella, G. Distinguishability and copiability of programs in general process theories. Int. J. Softw. Inform.
2014, 8, 209–223.
78. Fulton, W.; Harris, J. Representation Theory: A First Course; Springer: Berlin/Heidelberg, Germany, 2013;
Volume 129.
79. Marvian, I.; Spekkens, R.W. A generalization of Schur–Weyl duality with applications in quantum estimation.
Commun. Math. Phys. 2014, 331, 431–475. [CrossRef]
80. Galley, T.D.; Masanes, L. Impossibility of mixed-state puriﬁcation in any alternative to the Born Rule. arXiv
2018, arXiv:1801.06414.
81. Yngvason, J. Localization and entanglement in relativistic quantum physics. In The Message of Quantum
Science; Springer: Berlin/Heidelberg, Germany, 2015; pp. 325–348.
82. Murray, F.J.; Neumann, J.V. On rings of operators. Ann. Math. 1936, 37, 116–229. [CrossRef]
83. Murray, F.J.; von Neumann, J. On rings of operators. II. Trans. Am. Math. Soc. 1937, 41, 208–248. [CrossRef]
84. Uhlmann, A. The transition probability in the state space of a *-algebra. Rep. Math. Phys. 1976, 9, 273–279.
[CrossRef]
85. Jozsa, R. Fidelity for mixed quantum states. J. Mod. Opt. 1994, 41, 2315–2323. [CrossRef]
86. Lindblad, G. A general no-cloning theorem. Lett. Math. Phys. 1999, 47, 189–196. [CrossRef]
87. D’Ariano, G.M.; Presti, P.L. Optimal nonuniversally covariant cloning. Phys. Rev. A 2001, 64, 042308.
[CrossRef]
88. Chiribella, G.; D’Ariano, G.; Perinotti, P.; Cerf, N. Extremal quantum cloning machines. Phys. Rev. A 2005,
72, 042336. [CrossRef]
89. Coecke, B.; Selby, J.; Tull, S. Categorical Probabilistic Theories. Electron. Proc. Theor. Comput. Sci. 2018,
266, 367–385.

entropy
Article
Ruling out Higher-Order Interference from
Purity Principles
Howard Barnum 1,2, *, Ciarán M. Lee 3, *, Carlo Maria Scandolo 4, * and John H. Selby 4,5, *
1 Centre for the Mathematics of Quantum Theory (QMATH), Department of Mathematical Sciences,
University of Copenhagen, DK-2100 Copenhagen, Denmark
2 Department of Physics and Astronomy, University of New Mexico, Albuquerque, NM 87131, USA
3 Department of Physics, University College London, London WC1E 6BT, UK
4 Department of Computer Science, University of Oxford, Oxford OX1 3QD, UK
5 Department of Physics, Imperial College London, London SW7 2AZ, UK
* Correspondence: H.B.; C.M.L.; C.M.S.; J.H.S.

Academic Editors: Giacomo Mauro D’Ariano and Paolo Perinotti

Received: 21 April 2017; Accepted: 22 May 2017; Published: 1 June 2017

Abstract: As first noted by Rafael Sorkin, there is a limit to quantum interference. The interference
pattern formed in a multi-slit experiment is a function of the interference patterns formed between
pairs of slits; there are no genuinely new features resulting from considering three slits instead of two.
Sorkin has introduced a hierarchy of mathematically conceivable higher-order interference behaviours,
where classical theory lies at the first level of this hierarchy and quantum theory at the second.
Informally, the order in this hierarchy corresponds to the number of slits on which the interference
pattern has an irreducible dependence. Many authors have wondered why quantum interference
is limited to the second level of this hierarchy. Does the existence of higher-order interference
violate some natural physical principle that we believe should be fundamental? In the current work
we show that such principles can be found which limit interference behaviour to second-order,
or “quantum-like”, interference, but that do not restrict us to the entire quantum formalism. We work
within the operational framework of generalised probabilistic theories, and prove that any theory
satisfying Causality, Purity Preservation, Pure Sharpness, and Purification—four principles that
formalise the fundamental character of purity in nature—exhibits at most second-order interference.
Hence these theories are, at least conceptually, very “close” to quantum theory. Along the way we
show that systems in such theories correspond to Euclidean Jordan algebras. Hence, they are self-dual
and, moreover, multi-slit experiments in such theories are described by pure projectors.

Keywords: higher-order interference; generalised probabilistic theories; Euclidean Jordan algebras

1. Introduction
Described by Feynman as “impossible, absolutely impossible, to explain in any classical way” [1]
(volume 1, chapter 37), quantum interference is a distinctive signature of non-classicality. However, as
ﬁrst noted by Rafael Sorkin [2,3], there is a limit to this interference; in contrast to the case of two slits,
the interference pattern formed in a three slit experiment can be written as a linear combination of two
and one slit patterns. Sorkin has introduced a hierarchy of mathematically conceivable higher-order
interference behaviours, where classical theory lies at the first level of this hierarchy and quantum
theory at the second. Informally, the order in this hierarchy corresponds to the number of slits
on which the interference pattern has an irreducible dependence.
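To make the statement about three-slit patterns concrete, the sketch below evaluates the usual inclusion-exclusion combination of one-, two-, and three-slit probabilities for a quantum particle described by complex amplitudes. The amplitude values and the function names are assumptions chosen for illustration, and the combination used is the standard third-order expression attributed to Sorkin rather than the formal definition given later in the paper.

```python
def pattern(amplitudes, open_slits):
    """Born-rule probability at a fixed screen position with the given slits open."""
    return abs(sum(amplitudes[i] for i in open_slits)) ** 2

def second_order_term(amplitudes, i, j):
    """Two-slit interference: P_ij - P_i - P_j."""
    return (pattern(amplitudes, (i, j))
            - pattern(amplitudes, (i,))
            - pattern(amplitudes, (j,)))

def third_order_term(amplitudes):
    """Three-slit interference: P_012 - (P_01 + P_12 + P_02) + (P_0 + P_1 + P_2)."""
    a = amplitudes
    return (pattern(a, (0, 1, 2))
            - pattern(a, (0, 1)) - pattern(a, (1, 2)) - pattern(a, (0, 2))
            + pattern(a, (0,)) + pattern(a, (1,)) + pattern(a, (2,)))

# Arbitrary (unnormalized) amplitudes for reaching one screen position via each slit.
amps = [0.5 + 0.2j, 0.3 - 0.4j, -0.1 + 0.6j]

i2 = second_order_term(amps, 0, 1)   # generally nonzero in quantum theory
i3 = third_order_term(amps)          # vanishes up to rounding in quantum theory
```

The second-order term equals twice the real part of the product of one amplitude with the conjugate of the other, while the third-order term cancels identically because the squared modulus of a sum contains only pairwise cross terms.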
Many authors have wondered why quantum interference is limited to the second level of this
hierarchy [2,4–13]. Does the existence of higher-order interference violate some natural physical

Entropy 2017, 19, 253; doi:10.3390/e19060253; www.mdpi.com/journal/entropy

principle that we believe should be fundamental [14]? In the current work we show that such
natural principles can be found which limit interference behaviour to second-order, or “quantum-like”,
interference, but that do not restrict us to the entire quantum formalism.
We work in the framework of general probabilistic theories [15–28]. This framework is general
enough to accommodate essentially arbitrary operational theories, where an operational theory specifies
a set of laboratory devices which can be connected together in different ways, and assigns probabilities to
different experimental outcomes. Investigating how the structural and information-theoretic features of a
given theory in this framework depend on different physical principles deepens our physical and intuitive
understanding of such features. Indeed, many authors [20,22,23,28,29] have derived the entire structure
of finite-dimensional quantum theory from simple information-theoretic axioms—reminiscent of
Einstein’s derivation of special relativity from two simple physical principles. So far, ruling out
higher-order interference has required thermodynamic arguments. Indeed, by combining the results
and axioms of Refs. [30,31], higher-order interference could be ruled out in theories satisfying the
combined axioms. In this paper we show that we can prove this in a more direct way from first
principles, using only the axioms of Ref. [30].
Many experimental investigations have searched for divergences from quantum theory by looking
for higher-order interference [32–36]. These experiments involved passing a particle through a physical
barrier with multiple slits and comparing the interference patterns formed on a screen behind the
barrier when different subsets of slits are closed. Given this set-up, one would expect that the physical
theory being tested should possess transformations that correspond to the action of blocking certain
subsets of slits. Moreover, blocking all but two subsets of slits should not affect states which can pass
through either slit. This intuition suggests that these transformations should correspond to projectors.
Many operational probabilistic theories do not possess such a natural mathematical interpretation
of multi-slit experiments; indeed many theories do not admit well-defined projectors [9]. Here, we
show that there exist natural information-theoretic principles that both imply the existence of the
projector structure and rule out third- and higher-order interference. The principles that ensure
this structure are Causality, Purity Preservation, Pure Sharpness, and Purification. These formalise
intuitive ideas about the fundamental role of purity in nature. More formally, we show that such
theories possess a self-dualising inner product, and that there exist pure projectors which represent the
opening and closing of slits in a multi-slit experiment. Barnum, Müller and Ududec have shown that
in any self-dual theory in which such projectors exist for every face, if projectors map pure states to
pure states, then there can be at most second-order interference [4] (Proposition 29). The conjunction
of our new results and the principle of Purity Preservation implies the conditions of Barnum et al.’s
proposition. Hence sharp theories with purification do not exhibit higher-order interference. In fact
we prove a stronger result, that the systems in such theories are Euclidean Jordan algebras which have
been studied in quantum foundations [4,13,37].
This paper is organised as follows. In Section 2 we review the basics of the operational probabilistic
theory framework. In Section 3 we formally define higher-order interference. In Section 4 we define
sharp theories with purification and review relevant known results. In Section 5 we present and prove
our new results. Finally, in Section 6, we offer some suggestions on how new experiments might be
devised to observe higher-order interference.

2. Framework
We will describe theories in the framework of operational-probabilistic theories (OPTs) [19,20,24,29,38–40],
arising from the marriage of category theory [41–46] with probabilities. The foundation of this
framework is the idea that any successful physical theory must provide an account of experimental
data. Hence, such theories should have an operational description in terms of such experiments.
The OPT framework is based on the graphical language of circuits, describing experiments that
can be performed in a laboratory with physical systems connecting together physical processes, which
are denoted as wires and boxes respectively. The systems/wires are labelled with a type denoted A,

162
Entropy 2017, 19, 253

B, C, . . . . For example, the type given to a quantum system is the dimension of the Hilbert space
describing the system. The processes/boxes are then viewed as transformations with some input and
output systems/wires. For instance, in quantum theory these correspond to quantum instruments.
We now give a brief introduction to the important concepts in this formalism.

2.1. States, Transformations, and Effects

A fundamental tenet of the OPT framework is composition of systems and physical processes.
Given two systems A and B, they can be combined into a composite system, denoted by A ⊗ B.
Physical processes can be composed to build circuits, such as

\[
(a \otimes b)\,(\mathcal{A}' \otimes \mathcal{I})\,(\mathcal{A} \otimes \mathcal{B})\,\rho . \tag{1}
\]

Processes with no inputs (such as $\rho$ in Equation (1)) are called states, those with no outputs
(such as $a$ and $b$) are called effects, and those with both inputs and outputs (such as $\mathcal{A}$, $\mathcal{A}'$, and $\mathcal{B}$) are called
transformations. We define:

1. $\mathrm{St}(A)$ as the set of states of system A,
2. $\mathrm{Eff}(A)$ as the set of effects on A,
3. $\mathrm{Transf}(A, B)$ as the set of transformations from A to B, and $\mathrm{Transf}(A)$ as the set of transformations
from A to A,
4. $\mathcal{B} \circ \mathcal{A}$ (or $\mathcal{B}\mathcal{A}$, for short) as the sequential composition of two transformations $\mathcal{A}$ and $\mathcal{B}$, with the
input of $\mathcal{B}$ matching the output of $\mathcal{A}$,
5. $\mathcal{A} \otimes \mathcal{B}$ as the parallel composition (or tensor product) of the transformations $\mathcal{A}$ and $\mathcal{B}$.

OPTs include a particular system, the trivial system I, representing the lack of input or output for
a particular device.
Hence, states (resp. effects) are transformations with the trivial system as input (resp. output).
Circuits with no external wires, like the circuit in Equation (1), are called scalars and are associated
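with probabilities. We will often use the notation $(a|\rho)$ to denote the circuit in which the state
$\rho \in \mathrm{St}(A)$ is followed directly by the effect $a \in \mathrm{Eff}(A)$, and the notation $(a|\mathcal{C}|\rho)$ to denote the
circuit in which $\rho \in \mathrm{St}(A)$ is followed by the transformation $\mathcal{C} \in \mathrm{Transf}(A, B)$ and then by the
effect $a \in \mathrm{Eff}(B)$.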

The fact that scalars are probabilities, and so are real numbers, induces a notion of a sum of
transformations, so that the sets $\mathrm{St}(A)$, $\mathrm{Transf}(A, B)$, and $\mathrm{Eff}(A)$ become spanning sets of real vector
spaces, denoted by $\mathrm{St}_{\mathbb{R}}(A)$, $\mathrm{Transf}_{\mathbb{R}}(A, B)$, and $\mathrm{Eff}_{\mathbb{R}}(A)$. In this work we will restrict our attention to
finite systems, i.e., systems for which the vector space spanned by states is finite-dimensional.
Operationally this assumption means that one need not perform an infinite number of distinct
experiments to fully characterise a state. Restricting ourselves to non-negative real coefficients, we obtain
the convex cones of states and of effects, denoted by $\mathrm{St}_{+}(A)$ and $\mathrm{Eff}_{+}(A)$ respectively. We moreover
make the assumption that the set of states is closed. Operationally this is justified by the fact that, up to
any experimental error, a state space is indistinguishable from its closure.
The composition of states and effects leads naturally to a norm. This is defined for states $\rho$ as
$\|\rho\| := \sup_{a \in \mathrm{Eff}(A)} (a|\rho)$, and similarly for effects $a$ as $\|a\| := \sup_{\rho \in \mathrm{St}(A)} (a|\rho)$. The set of normalised
states (resp. effects) of system A is denoted by $\mathrm{St}_1(A)$ (resp. $\mathrm{Eff}_1(A)$).
Transformations are characterised by their action on states of composite systems: if $\mathcal{A}, \mathcal{A}' \in
\mathrm{Transf}(A, B)$, we have that $\mathcal{A} = \mathcal{A}'$ if and only if

\[
(\mathcal{A} \otimes \mathcal{I}_S)\,\rho = (\mathcal{A}' \otimes \mathcal{I}_S)\,\rho \tag{2}
\]

for every system S and every state $\rho \in \mathrm{St}(A \otimes S)$. However, it follows [19] that effects (resp. states) are
completely defined by their action on states (resp. effects) of a single system.
Equality on states of the single system A is, in general, not enough to discriminate between $\mathcal{A}$
and $\mathcal{A}'$, as is the case for quantum theory over real Hilbert spaces [47]. However, for the scope of the
present article, which focuses on single-system properties, we often concern ourselves with equality
on single system.

Definition 1. Two transformations $\mathcal{A}, \mathcal{A}' \in \mathrm{Transf}(A, B)$ are equal on single system, denoted by $\mathcal{A} \doteq \mathcal{A}'$,
if $\mathcal{A}\rho = \mathcal{A}'\rho$ for all states $\rho \in \mathrm{St}(A)$.

2.2. Tests and Channels

In general, the boxes corresponding to physical processes come equipped with classical pointers.
When used in an experiment, the final position of a given pointer indicates the particular
process which occurred for that box in that run. In general, this procedure can be non-deterministic.
These non-deterministic processes are described by tests [19,39]: a test from A to B is a collection
of transformations $\{\mathcal{C}_i\}_{i \in X}$ from A to B, where X is the set of outcomes. If A (resp. B) is the trivial
system, the test is called a preparation-test (resp. observation-test). If the set of outcomes X has a single
element, we say that the test is deterministic, because only one transformation can occur. Deterministic
transformations will be called channels.
A channel $\mathcal{U}$ from A to B is reversible if there exists another channel $\mathcal{U}^{-1}$ from B to A such that
$\mathcal{U}^{-1}\mathcal{U} = \mathcal{I}_A$ and $\mathcal{U}\mathcal{U}^{-1} = \mathcal{I}_B$, where $\mathcal{I}_S$ is the identity transformation on system S. If there exists
a reversible channel transforming A into B, we say that A and B are operationally equivalent, denoted
as $A \simeq B$. The composition of systems is required to be symmetric, meaning that $A \otimes B \simeq B \otimes A$.
Physically, this means that for every pair of systems there exists a reversible channel swapping them.
A state $\chi$ is called invariant if $\mathcal{U}\chi = \chi$ for all reversible channels $\mathcal{U}$.
A particularly useful class of observation-tests allows for the following.

Definition 2. The states $\{\rho_i\}_{i \in X}$ are called perfectly distinguishable if there exists an observation-test
$\{a_i\}_{i \in X}$ such that $(a_i|\rho_j) = \delta_{ij}$ for all $i, j \in X$.
Moreover, if there is no other state $\rho_0$ such that the states $\{\rho_i\}_{i \in X} \cup \{\rho_0\}$ are perfectly distinguishable,
the set $\{\rho_i\}_{i \in X}$ is said to be maximal.

2.3. Pure Transformations

There are various ways to define pure transformations, for example in terms of
resources [30,48–51] or “side information” [39,52]. Informally, pure transformations correspond to
an experimenter having maximal control of or information about a process. Here, we formalise this
idea through the notion of a coarse-graining [19]. Coarse-graining is the operation of joining two
or more outcomes of a test into a single outcome. More precisely, a test $\{\mathcal{C}_i\}_{i \in X}$ is a coarse-graining of
the test $\{\mathcal{D}_j\}_{j \in Y}$ if there is a partition $\{Y_i\}_{i \in X}$ of Y such that, for all $i \in X$,

\[
\mathcal{C}_i = \sum_{j \in Y_i} \mathcal{D}_j .
\]

In this case, we say that the test $\{\mathcal{D}_j\}_{j \in Y}$ is a refinement of the test $\{\mathcal{C}_i\}_{i \in X}$, and that the
transformations $\{\mathcal{D}_j\}_{j \in Y_i}$ are a refinement of the transformation $\mathcal{C}_i$. A transformation $\mathcal{C} \in \mathrm{Transf}(A, B)$
is pure if it has only trivial refinements, namely refinements $\{\mathcal{D}_j\}$ of the form $\mathcal{D}_j = p_j \mathcal{C}$, where $\{p_j\}$ is
a probability distribution. We denote the sets of pure transformations, pure states, and pure effects as
$\mathrm{PurTransf}(A, B)$, $\mathrm{PurSt}(A)$, and $\mathrm{PurEff}(A)$ respectively. Similarly, $\mathrm{PurSt}_1(A)$ and $\mathrm{PurEff}_1(A)$ denote
normalised pure states and effects respectively. Non-pure states are called mixed.
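
To make coarse-graining concrete, the following is a minimal sketch for a classical system, under the assumption that each effect is stored as a list of probabilities, one entry per perfectly distinguishable pure state. The function name `coarse_grain`, the outcome labels, and the numbers are illustrative and do not come from the framework itself.

```python
def coarse_grain(test, partition):
    """Join outcomes of `test` according to `partition`.

    `test` maps each fine outcome j to an effect D_j, stored as a list of
    probabilities (one entry per pure state of a classical system).
    `partition` maps each coarse outcome i to the list Y_i of fine outcomes.
    Returns the coarse test with C_i = sum_{j in Y_i} D_j.
    """
    fine_outcomes = sorted(j for block in partition.values() for j in block)
    if fine_outcomes != sorted(test):
        raise ValueError("partition must cover every fine outcome exactly once")
    size = len(next(iter(test.values())))
    return {
        i: [sum(test[j][k] for j in block) for k in range(size)]
        for i, block in partition.items()
    }


# A three-outcome observation-test on a classical trit (hypothetical numbers).
fine_test = {
    "a": [1.0, 0.0, 0.25],
    "b": [0.0, 0.5, 0.25],
    "c": [0.0, 0.5, 0.5],
}
# Join outcomes "b" and "c" into the single outcome "bc".
partition = {"a": ["a"], "bc": ["b", "c"]}
coarse_test = coarse_grain(fine_test, partition)
```

In this sketch the entries of the coarse-grained effects still sum to one for each pure state, so the coarse-grained collection is again normalised; it simply forgets which of the outcomes `b` and `c` occurred.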

Deﬁnition 3. Let ρ ∈ St1 (A). A normalised state σ is contained in ρ if we can write ρ = pσ + (1 − p) τ,

where p ∈ (0, 1] and τ is another normalised state.

Clearly, no other states are contained in a pure state. At the other end of the spectrum we have
complete states.

Deﬁnition 4. A state ω ∈ St1 (A) is complete if every state is contained in it.

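Definition 5. We say that two transformations $\mathcal{A}, \mathcal{A}' \in \mathrm{Transf}(A, B)$ are equal upon input of the state
$\rho \in \mathrm{St}_1(A)$ if $\mathcal{A}\sigma = \mathcal{A}'\sigma$ for every state $\sigma$ contained in $\rho$. In this case we will write $\mathcal{A} =_{\rho} \mathcal{A}'$.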

2.4. Causality
A natural requirement of a physical theory is that it is causal, that is, no signals can be sent from
the future to the past. In the OPT framework this is formalised as follows:

Axiom 1 (Causality [19,39]). The probability that a transformation occurs is independent of the choice of tests
performed on its output.

Causality is equivalent to the requirement that, for every system A, there exists a unique
deterministic effect $u_A$ on A (or simply $u$, when no ambiguity can arise) [19]. Owing to the uniqueness
of the deterministic effect, the marginals of a bipartite state can be uniquely defined as

\[
\rho_A := (\mathcal{I}_A \otimes u_B)\,\rho_{AB} .
\]

Moreover, this uniqueness forbids the ability to signal [19,53]. We will denote by $\mathrm{Tr}_B\,\rho_{AB}$ the
marginal on system A, in analogy with the notation used in the quantum case. We will stick to the
notation Tr in formulas where the deterministic effect is applied directly to a state, e.g., $\mathrm{Tr}\,\rho := (u|\rho)$.
In a causal theory it is easy to see that the norm of a state takes the form $\|\rho\| = \mathrm{Tr}\,\rho$, and that a
state can be prepared deterministically if and only if it is normalised.

3. Higher-Order Interference
The deﬁnition of higher-order interference we shall present in this section takes its motivation
from the set-up of multi-slit interference experiments. In such experiments a particle passes through
slits in a physical barrier and is detected at a screen. By repeating the experiment many times, one
builds up a pattern on the screen. To determine if this experiment exhibits interference one compares
this pattern to those produced when certain subsets of the slits are blocked. In quantum theory,
for example, the two-slit experiment exhibits interference as the pattern formed with both slits open is
not equal to the sum of the one-slit patterns.
Consider the state of the particle just before it passes through the slits. For every slit, there should
exist states such that the particle is deﬁnitely found at that slit, if measured. Mathematically, this means
that there is a face [4] of the state space, such that all states in this face give unit probability for the
“yes” outcome of the two-outcome measurement “is the particle at this slit?”. Recall that a face is a
convex set with the property that if $px + (1 - p)y$, for $0 < p < 1$, is an element then $x$ and $y$ are also
elements. These faces will be labelled $F_i$, one for each of the n slits $i \in \{1, \ldots, n\}$. As the slits should
be perfectly distinguishable, the faces associated with each slit should be perfectly distinguishable,
or orthogonal. One can additionally ask coarse-grained questions of the form “Is the particle found
among a certain subset of slits, rather than somewhere else?”. The set of states that give outcome “yes”
with probability one must contain all the faces associated with each slit in the subset. Hence the face
associated with the subset of slits $I \subseteq \{1, \ldots, n\}$ is the smallest face containing each face in this subset,
$F_I := \bigvee_{i \in I} F_i$, where the operation $\bigvee$ is the least upper bound of the lattice of faces, with the ordering
provided by subset inclusion of one face within another. The face $F_I$ contains all those states which
can be found among the slits contained in $I$. The experiment is “complete” if all states in the state space
(of a given system A) can be found among some subset of slits. That is, if $F_{12\cdots n} = \mathrm{St}(A)$.
An n-slit experiment requires a system that has n orthogonal faces $F_i$, with $i \in \{1, \ldots, n\}$.
Consider an effect E associated with ﬁnding a particle at a particular point on the screen. We now
formally deﬁne an n-slit experiment.

Definition 6. An n-slit experiment is a collection of effects $e_I$, where $I \subseteq \{1, \ldots, n\}$, such that

\[
(e_I|\rho) = (E|\rho), \quad \forall \rho \in F_I, \quad \text{and}
\]
\[
(e_I|\rho) = 0, \quad \forall \rho \text{ where } \rho \perp F_I .
\]

The effects introduced in the above definition arise from the conjunction of blocking off the slits
$\{1, \ldots, n\} \setminus I$ and applying the effect $E$. If the particle was prepared in a state such that it would be
unaffected by the blocking of the slits (i.e., $\rho \in F_I$) then we should have $(e_I|\rho) = (E|\rho)$. If instead the
particle is prepared in a state which is guaranteed to be blocked (i.e., $\rho \perp F_I$) then the particle should
have no probability of being detected at the screen, i.e., $(e_I|\rho) = 0$.
The relevant quantities for the existence of various orders of interference are [2,9,13,15]:

\begin{align}
I_1 &:= (E|\rho), \tag{3}\\
I_2 &:= (E|\rho) - (e_1|\rho) - (e_2|\rho), \tag{4}\\
I_3 &:= (E|\rho) - (e_{12}|\rho) - (e_{23}|\rho) - (e_{31}|\rho) + (e_1|\rho) + (e_2|\rho) + (e_3|\rho), \tag{5}\\
I_n &:= \sum_{\emptyset \neq I \subseteq \{1,\ldots,n\}} (-1)^{n-|I|}\,(e_I|\rho), \tag{6}
\end{align}

for some state $\rho$, and defining $e_{\{1,\ldots,n\}} := E$.

Definition 7. A theory has n-th order interference if there exists a state $\rho$ and an effect $E$ such that $I_n \neq 0$.

In a slightly different formal setting, it was shown in [2] that $I_n = 0 \implies I_{n+1} = 0$, so if there is no
nth order interference, there will be no (n + 1)th order interference; the argument of [2] applies here.
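
The quantities $I_n$ can be evaluated explicitly in quantum theory, which gives a quick numerical sanity check of the hierarchy. The sketch below assumes a pure state with one amplitude per slit, a rank-one screen effect $E = |\phi\rangle\langle\phi|$, and blocking modelled by projecting onto the open slits, so that $(e_I|\rho) = |\sum_{i \in I} \overline{\phi_i}\,\psi_i|^2$. These modelling choices, the function names, and the numerical amplitudes are assumptions made for illustration only.

```python
from itertools import combinations
from math import sqrt


def detection_probability(psi, phi, open_slits):
    """Probability (e_I|rho) in quantum theory for a pure state psi,
    a rank-one screen effect E = |phi><phi|, and the open slits I."""
    amplitude = sum(phi[i].conjugate() * psi[i] for i in open_slits)
    return abs(amplitude) ** 2


def interference_term(psi, phi):
    """I_n of Equation (6), with n = len(psi) slits."""
    n = len(psi)
    total = 0.0
    for size in range(1, n + 1):
        sign = (-1) ** (n - size)
        for open_slits in combinations(range(n), size):
            total += sign * detection_probability(psi, phi, open_slits)
    return total


# Two slits: equal superposition, screen effect aligned with it.
i2 = interference_term([1 / sqrt(2), 1 / sqrt(2)], [1 / sqrt(2), 1 / sqrt(2)])

# Three slits with arbitrary (hypothetical) complex amplitudes.
psi3 = [0.6, 0.48j, -0.64]
phi3 = [0.5, 0.5 + 0.5j, 0.5j]
i3 = interference_term(psi3, phi3)
```

Under these assumptions `i2` is non-zero while `i3` vanishes up to rounding, in line with quantum theory exhibiting second-order but no third-order interference.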
It should be noted that there appears to be a lot of freedom in choosing a set of effects {eI } to test
for the existence of higher-order interference. Indeed, in arbitrary generalised theories this appears to
be the case [9]. However, it is natural to ask whether there exist physical transformations $T_I$ in the
theory which correspond to leaving the subset of slits $I$ open and blocking the rest. Hence a unique $e_I$,
defined as $e_I = E T_I$, is assigned to each fixed $E$. Ruling out the existence of higher-order interference
then reduces to proving certain properties of the $T_I$. This will turn out to be the case in sharp theories
with purification.

4. Sharp Theories with Puriﬁcation

In this section we present the deﬁnition and important properties of sharp theories with
puriﬁcation. They were originally introduced in [30,49,54] for the analysis of the foundations of
thermodynamics and statistical mechanics.
Sharp theories with purification are causal theories defined by three axioms. The first axiom—Purity
Preservation—states that no information can leak when two pure transformations are composed:

Axiom 2 (Purity Preservation [55]). Sequential and parallel compositions of pure transformations yield
pure transformations.


The second axiom—Pure Sharpness—guarantees that every system possesses at least one
elementary property.

Axiom 3 (Pure Sharpness [54]). For every system there exists at least one pure effect occurring with unit
probability on some state.

These axioms are satisfied by both classical and quantum theory. Our third axiom—Purification—
signals the departure from classicality, and characterises when a physical theory admits a level of
description where all deterministic processes are pure and reversible.
Given a normalised state $\rho_A \in \mathrm{St}_1(A)$, a normalised pure state $\Psi \in \mathrm{PurSt}_1(A \otimes B)$ is a purification
of $\rho_A$ if

\[
(\mathcal{I}_A \otimes u_B)\,\Psi = \rho_A ;
\]

in this case B is called the purifying system. We say that a pure state $\Psi \in \mathrm{PurSt}(A \otimes B)$ is an essentially
unique purification of its marginal $\rho_A$ [39] if every other pure state $\Psi' \in \mathrm{PurSt}(A \otimes B)$ satisfying the
purification condition must be of the form

\[
\Psi' = (\mathcal{I}_A \otimes \mathcal{U})\,\Psi ,
\]

for some reversible channel $\mathcal{U}$.

Axiom 4 (Purification [19,39]). Every state has a purification. Purifications are essentially unique.
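
For orientation, the familiar quantum instance of Purification can be written out explicitly. The following is the standard textbook construction, included as an illustration under the assumption that states are density matrices and that $\{|i\rangle\}$ is an eigenbasis of $\rho_A$; it is not part of the general OPT formalism.

```latex
\rho_A = \sum_{i=1}^{d} p_i\,|i\rangle\langle i|_A
\;\;\longmapsto\;\;
|\Psi\rangle_{AB} = \sum_{i=1}^{d} \sqrt{p_i}\;|i\rangle_A \otimes |i\rangle_B ,
\qquad
\mathrm{Tr}_B\,|\Psi\rangle\langle\Psi|_{AB} = \rho_A .
```

In this quantum setting, any other purification with the same purifying system has the form $(\mathbb{1}_A \otimes U)|\Psi\rangle_{AB}$ for some unitary $U$ on B, which is the quantum version of essential uniqueness.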

Quantum theory, both on complex and real Hilbert spaces, satisfies Purification, as does Spekkens’
toy model [56]. Examples of sharp theories with purification besides quantum theory include fermionic
quantum theory [57,58], a superselected version of quantum theory known as doubled quantum
theory [49], and a recent extension of classical theory with the theory of codits [30].

Properties of Sharp Theories With Puriﬁcations

Sharp theories with puriﬁcations enjoy some nice properties, which were mainly derived in
Refs. [30,54]. The ﬁrst property is that every non-trivial system admits perfectly distinguishable
states [54], and that all maximal sets of pure states have the same cardinality [30].

Proposition 1. For every system A there is a positive integer $d_A$, called the dimension of A, such that all
maximal sets of pure states have $d_A$ elements.

Note that we will omit the subscript A when the context is clear.
In sharp theories with puriﬁcation every state can be diagonalised, i.e., written as a convex
combination of perfectly distinguishable pure states (cf. Refs. [30,54]).

Theorem 5. Every normalised state $\rho \in \mathrm{St}_1(A)$ of a non-trivial system can be decomposed as

\[
\rho = \sum_{i=1}^{d} p_i \alpha_i ,
\]

where $\{p_i\}_{i=1}^{d}$ is a probability distribution, and $\{\alpha_i\}_{i=1}^{d}$ is a pure maximal set. Moreover, given $\rho$, $\{p_i\}_{i=1}^{d}$ is
unique up to rearrangements.

Such a decomposition is called a diagonalisation of $\rho$, the $p_i$’s are the eigenvalues of $\rho$, and the $\alpha_i$’s
are the eigenstates. Theorem 5 implies that the eigenvalues of a state are unique, and independent
of its diagonalisation. Sharp theories with purification have a unique invariant state $\chi$ [19], which
can be diagonalised as $\chi = \frac{1}{d} \sum_{i=1}^{d} \alpha_i$, where $\{\alpha_i\}_{i=1}^{d}$ is any pure maximal set [30]. Furthermore, the
diagonalisation result of Theorem 5 can be extended to every vector in $\mathrm{St}_{\mathbb{R}}(A)$, but here the eigenvalues
will in general be real numbers [30].
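
For a single qubit the diagonalisation of Theorem 5 can be written in closed form, which makes a convenient worked example. The sketch below assumes the standard Bloch-vector parametrisation of qubit states (not used elsewhere in this text): a state with Bloch vector r has eigenvalues (1 ± |r|)/2 and antipodal pure eigenstates ±r/|r|. The function names and the sample state are illustrative.

```python
from math import sqrt


def diagonalise_qubit(r):
    """Diagonalise the qubit state with Bloch vector r = (x, y, z).

    Returns [(p_plus, alpha_plus), (p_minus, alpha_minus)], where the
    alphas are Bloch vectors of two antipodal (perfectly distinguishable)
    pure states and the p's are the eigenvalues.
    """
    length = sqrt(sum(c * c for c in r))
    if length > 1 + 1e-12:
        raise ValueError("Bloch vector must have length at most 1")
    if length == 0:
        # Invariant (maximally mixed) state: any antipodal pair works.
        axis = (0.0, 0.0, 1.0)
    else:
        axis = tuple(c / length for c in r)
    opposite = tuple(-c for c in axis)
    return [((1 + length) / 2, axis), ((1 - length) / 2, opposite)]


def recombine(decomposition):
    """Convex combination sum_i p_i alpha_i, in Bloch coordinates."""
    return tuple(
        sum(p * alpha[k] for p, alpha in decomposition) for k in range(3)
    )


rho = (0.3, 0.0, 0.4)              # a mixed state with |r| = 0.5
spectrum = diagonalise_qubit(rho)  # eigenvalues close to 0.75 and 0.25
```

Calling `recombine(spectrum)` returns the original Bloch vector up to rounding, and the zero vector (the invariant state of the qubit) yields the uniform eigenvalues 1/2 and 1/2.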
One of the most important consequences for this paper of the axioms deﬁning sharp theories with
puriﬁcation is a duality between normalised pure states and normalised pure effects.

Theorem 6 (States-effects duality [30,54]). For every system A, there is a bijective correspondence $\dagger :
\mathrm{PurSt}_1(A) \to \mathrm{PurEff}_1(A)$ such that if $\alpha \in \mathrm{PurSt}_1(A)$, $\alpha^{\dagger}$ is the unique normalised pure effect such
that $(\alpha^{\dagger}|\alpha) = 1$. Furthermore this bijection can be extended by linearity to an isomorphism between the vector
spaces $\mathrm{St}_{\mathbb{R}}(A)$ and $\mathrm{Eff}_{\mathbb{R}}(A)$.

With a little abuse of notation we will use $\dagger$ also to denote the inverse map $\mathrm{PurEff}_1(A) \to
\mathrm{PurSt}_1(A)$, by which, if $a \in \mathrm{PurEff}_1(A)$, $a^{\dagger}$ is the unique pure state such that $(a|a^{\dagger}) = 1$. Pure maximal
sets $\{\alpha_i\}_{i=1}^{d}$ have the property that $\sum_{i=1}^{d} \alpha_i^{\dagger} = u$ [30].
A diagonalisation result holds for vectors of $\mathrm{Eff}_{\mathbb{R}}(A)$ as well [30]: they can be written as
$X = \sum_{i=1}^{d} \lambda_i \alpha_i^{\dagger}$, where $\{\alpha_i\}_{i=1}^{d}$ is a pure maximal set. Again, the $\lambda_i$’s are uniquely defined given $X$.
Another result used in the following sections was shown to hold in Ref. [30], and expresses the
possibility of constructing non-disturbing measurements [20,59,60].

Proposition 2. Given a system A, let $a \in \mathrm{Eff}(A)$ be an effect such that $(a|\rho) = 1$, for some $\rho \in \mathrm{St}_1(A)$.
Then there exists a pure transformation $\mathcal{T} \in \mathrm{PurTransf}(A)$ such that $\mathcal{T} =_{\rho} \mathcal{I}$, with $(u|\mathcal{T}|\sigma) \leq (a|\sigma)$, for
every state $\sigma \in \mathrm{St}_1(A)$.

Note that the pure transformation T is non-disturbing on ρ because it acts as the identity on ρ and
on all states contained in it. In other words, whenever we have an effect occurring with unit probability
on some state ρ, we can always ﬁnd a transformation that does not disturb ρ (i.e., a non-disturbing,
non-demolition measurement) [30].
Finally, a property that we will use often is a sort of no-restriction hypothesis for tests, derived
in [20] (Corollary 4).

Proposition 3. A collection of transformations $\{\mathcal{A}_i\}_{i \in X}$ is a valid test if and only if $\sum_{i \in X} u\mathcal{A}_i = u$.
A collection of effects $\{a_i\}_{i \in X}$ is a valid observation-test if and only if $\sum_{i \in X} a_i = u$.

5. Sharp Theories with Puriﬁcation Have No Higher-Order Interference

Here we will show that sharp theories with purification do not exhibit higher-order interference.
Our proof strategy will be to show that results of [4], which rule out the existence of higher-order
interference from certain assumptions, hold in sharp theories with purification. To this end, we will
first prove that these theories are self-dual, and that they admit pure orthogonal projectors which
satisfy certain properties, compatible with the setting presented in Section 3.

5.1. Self-Duality
Now we will prove that sharp theories with purification are self-dual. Recall that a theory is
self-dual if for every system A there is an inner product $\langle \bullet, \bullet \rangle$ on $\mathrm{St}_{\mathbb{R}}(A)$ such that $\xi \in \mathrm{St}_{+}(A)$ if and
only if $\langle \xi, \eta \rangle \geq 0$ for every $\eta \in \mathrm{St}_{+}(A)$. To show that, we need to find a self-dualising inner product
on $\mathrm{St}_{\mathbb{R}}(A)$ for every system A. The dagger will provide us with a good candidate. First we need the
following lemma.

Lemma 1. Let $a \in \mathrm{Eff}_1(A)$ be a normalised effect. Then $a$ is of the form $a = \sum_{i=1}^{r} \alpha_i^{\dagger}$, with $r \leq d$, and the
pure states $\{\alpha_i\}_{i=1}^{r}$ are perfectly distinguishable.


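Proof. We know that every effect $a$ can be written as $a = \sum_{i=1}^{r} \lambda_i \alpha_i^{\dagger}$, where $r \leq d$, the pure states
$\{\alpha_i\}_{i=1}^{r}$ are perfectly distinguishable, and for every $i \in \{1, \ldots, r\}$, $\lambda_i \in (0, 1]$. Since the state space is
closed, and $a$ is normalised, there exists a (normalised) state $\rho$ such that $(a|\rho) = 1$. One has

\[
1 = (a|\rho) = \sum_{i=1}^{r} \lambda_i\,(\alpha_i^{\dagger}|\rho) .
\]

Now, $(\alpha_i^{\dagger}|\rho) \geq 0$, and $\sum_{i=1}^{r} (\alpha_i^{\dagger}|\rho) \leq 1$ because

\[
\sum_{i=1}^{r} (\alpha_i^{\dagger}|\rho) \leq \sum_{i=1}^{d} (\alpha_i^{\dagger}|\rho) = \mathrm{Tr}\,\rho = 1,
\]

where we have used the fact that $\sum_{i=1}^{d} \alpha_i^{\dagger} = u$. Then $\sum_{i=1}^{r} \lambda_i\,(\alpha_i^{\dagger}|\rho) \leq \lambda_{\max}$, where $\lambda_{\max}$ is the
maximum of the $\lambda_i$’s. Therefore, $\lambda_{\max} \geq 1$, which implies $\lambda_{\max} = 1$. Now, the condition

\[
\sum_{i=1}^{r} \lambda_i\,(\alpha_i^{\dagger}|\rho) = \lambda_{\max}
\]

means that $\lambda_i = \lambda_{\max} = 1$ for all $i \in \{1, \ldots, r\}$.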

In the above, we call r the rank of the normalised effect. We can use this result to prove
the following.

Lemma 2. For every system A, the map

\[
\langle \xi, \eta \rangle := (\xi^{\dagger}|\eta),
\]

for every $\xi, \eta \in \mathrm{St}_{\mathbb{R}}(A)$ is an inner product on $\mathrm{St}_{\mathbb{R}}(A)$.

Proof. The map $\langle \bullet, \bullet \rangle$ is clearly bilinear by construction, because the dagger is also linear. Let us show
that it is positive-definite. Take a non-null vector $\xi \in \mathrm{St}_{\mathbb{R}}(A)$, and diagonalise it as $\xi = \sum_{i=1}^{d} x_i \alpha_i$. Then

\[
\langle \xi, \xi \rangle = (\xi^{\dagger}|\xi) = \sum_{i,j=1}^{d} x_i x_j\,(\alpha_i^{\dagger}|\alpha_j) = \sum_{i=1}^{d} x_i^2 > 0,
\]

where we have used the fact that for perfectly distinguishable pure states $(\alpha_i^{\dagger}|\alpha_j) = \delta_{ij}$ [30].
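The hard part is to prove that this bilinear map is symmetric, namely $\langle \xi, \eta \rangle = \langle \eta, \xi \rangle$, for every
$\xi, \eta \in \mathrm{St}_{\mathbb{R}}(A)$. Let us define a new (double) dagger $\ddagger$. The double dagger of a normalised state $\rho$ is
an effect $\rho^{\ddagger}$ whose action on normalised states $\sigma$ is defined as

\[
(\rho^{\ddagger}|\sigma) := (\sigma^{\dagger}|\rho), \tag{7}
\]

where $\dagger$ is the dagger of Theorem 6. Note that Equation (7) is enough to characterise $\rho^{\ddagger}$ completely,
and it guarantees that $\rho^{\ddagger}$ is a mathematically well-defined effect, because it is linear and $(\sigma^{\dagger}|\rho) \in [0, 1]$.
Consider now both $\rho$ and $\sigma$ to be the same normalised pure state $\psi$. Then $(\psi^{\ddagger}|\psi) = (\psi^{\dagger}|\psi) = 1$, which means that $\psi^{\ddagger}$
is normalised. If we manage to show that $\psi^{\ddagger}$ is pure, then by Theorem 6 we can conclude that $\psi^{\ddagger} = \psi^{\dagger}$.
By Lemma 1, $\psi^{\ddagger}$ is of the form $\psi^{\ddagger} = \sum_{i=1}^{r} \alpha_i^{\dagger}$, with $r \leq d$, and the pure states $\{\alpha_i\}_{i=1}^{r}$ are perfectly
distinguishable. Clearly $\psi^{\ddagger}$ is pure if and only if $r = 1$. To prove it, first let us evaluate $\psi^{\ddagger}$ on $\chi$:

\[
(\psi^{\ddagger}|\chi) = (\chi^{\dagger}|\psi) = \frac{1}{d}\,\mathrm{Tr}\,\psi = \frac{1}{d}, \tag{8}
\]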


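as prescribed by Equation (7). Now, since $\psi^{\ddagger} = \sum_{i=1}^{r} \alpha_i^{\dagger}$, we have

\[
(\psi^{\ddagger}|\chi) = \sum_{i=1}^{r} (\alpha_i^{\dagger}|\chi) = \frac{r}{d}, \tag{9}
\]

because $(\alpha_i^{\dagger}|\chi) = \frac{1}{d}$ for every $i$ [30]. A comparison between Equations (8) and (9) shows that $r = 1$.
This means that $\psi^{\ddagger}$ is pure, whence $\psi^{\ddagger} = \psi^{\dagger}$. Now we can show that the double dagger $\ddagger$ actually
coincides with the dagger of Theorem 6. Indeed, given a state $\rho$, diagonalise it as $\rho = \sum_{i=1}^{d} p_i \alpha_i$.
One can easily show that the double dagger of Equation (7) is linear, so we have $\rho^{\ddagger} = \sum_{i=1}^{d} p_i \alpha_i^{\ddagger}$, but
we have just proved that $\alpha_i^{\ddagger} = \alpha_i^{\dagger}$ for pure states, so $\rho^{\ddagger} = \sum_{i=1}^{d} p_i \alpha_i^{\dagger} = \rho^{\dagger}$. This means that $\ddagger = \dagger$, and
that Equation (7) is nothing but a redefinition of the usual dagger. Hence for all normalised
states $\rho$ and $\sigma$ we have

\[
(\rho^{\dagger}|\sigma) = (\sigma^{\dagger}|\rho), \tag{10}
\]

and this extends linearly to all vectors $\xi, \eta \in \mathrm{St}_{\mathbb{R}}(A)$. We have proved that $\langle \bullet, \bullet \rangle$ is symmetric, and
this concludes the proof.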

Note that the above result immediately yields the “symmetry of transition probabilities” as
defined in Refs. [61,62].
Now we prove that this inner product is invariant under reversible transformations.

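Proposition 4. For every $\xi, \eta \in \mathrm{St}_{\mathbb{R}}(A)$ and every reversible channel $\mathcal{U}$ one has

\[
\langle \mathcal{U}\xi, \mathcal{U}\eta \rangle = \langle \xi, \eta \rangle .
\]

Proof. To prove the statement, let us first prove that for a normalised pure state $\alpha$ one has $(\mathcal{U}\alpha)^{\dagger} =
\alpha^{\dagger}\mathcal{U}^{-1}$, for every reversible channel $\mathcal{U}$. Indeed, $\alpha^{\dagger}\mathcal{U}^{-1}$ is a pure effect and one has $(\alpha^{\dagger}\mathcal{U}^{-1}|\mathcal{U}\alpha) = (\alpha^{\dagger}|\alpha) = 1$.
By the uniqueness of the dagger for normalised pure states, $\alpha^{\dagger}\mathcal{U}^{-1} = (\mathcal{U}\alpha)^{\dagger}$. This can be extended
by linearity to all vectors $\xi$ in $\mathrm{St}_{\mathbb{R}}(A)$, so $(\mathcal{U}\xi)^{\dagger} = \xi^{\dagger}\mathcal{U}^{-1}$. Therefore, when we compute $\langle \mathcal{U}\xi, \mathcal{U}\eta \rangle$,
we have

\[
\langle \mathcal{U}\xi, \mathcal{U}\eta \rangle = (\xi^{\dagger}\mathcal{U}^{-1}|\mathcal{U}\eta) = (\xi^{\dagger}|\eta) = \langle \xi, \eta \rangle .
\]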
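
As a concrete check, the inner product and its invariance can be tested numerically in quantum theory. The sketch below assumes the standard identification of states with density matrices, under which the dagger of a pure state $|a\rangle\langle a|$ is the effect $|a\rangle\langle a|$, so that $\langle \xi, \eta \rangle = \mathrm{Tr}(\xi\eta)$, and reversible channels act as $\rho \mapsto U\rho U^{\dagger}$. The helper names and the sample matrices are illustrative.

```python
from math import cos, sin


def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def adjoint(a):
    """Conjugate transpose of a square matrix."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


def inner(xi, eta):
    """<xi, eta> = Tr(xi eta): the dagger inner product in quantum theory."""
    product = matmul(xi, eta)
    return sum(product[i][i] for i in range(len(product))).real


def apply_channel(u, rho):
    """Reversible channel rho -> U rho U^dagger."""
    return matmul(matmul(u, rho), adjoint(u))


# Two hypothetical qubit density matrices and a rotation (a real unitary).
rho = [[0.75, 0.25j], [-0.25j, 0.25]]
sigma = [[0.5, 0.5], [0.5, 0.5]]
theta = 0.7
u = [[cos(theta), -sin(theta)], [sin(theta), cos(theta)]]
```

With these inputs, `inner(rho, sigma)` and `inner(sigma, rho)` agree, and applying `apply_channel(u, ...)` to both arguments leaves the value unchanged up to rounding, mirroring Lemma 2 and Proposition 4 in this special case.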

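The fact that $\langle \bullet, \bullet \rangle$ is an inner product allows us to define an additional norm in sharp theories
with purification: if $\xi \in \mathrm{St}_{\mathbb{R}}(A)$, define the dagger norm as

\[
\|\xi\|_{\dagger} := \sqrt{\langle \xi, \xi \rangle} .
\]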

See Appendix A.1 for an extended discussion on the properties of this norm.
Now we are ready to state the core of this subsection.

Proposition 5. Sharp theories with puriﬁcation are self-dual.

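Proof. Given a system A, we need to prove that $\xi \in \mathrm{St}_{\mathbb{R}}(A)$ is in $\mathrm{St}_{+}(A)$ if and only if $\langle \xi, \eta \rangle \geq 0$ for
all $\eta \in \mathrm{St}_{+}(A)$. Note that $\xi \in \mathrm{St}_{+}(A)$ if and only if it can be diagonalised as $\xi = \sum_{i=1}^{d} x_i \alpha_i$, where the
$x_i$’s are all non-negative.
Necessity. Suppose $\xi \in \mathrm{St}_{+}(A)$, and take any $\eta \in \mathrm{St}_{+}(A)$, diagonalised as $\eta = \sum_{i=1}^{d} y_i \beta_i$.
Then we have

\[
\langle \xi, \eta \rangle = \sum_{i,j=1}^{d} x_i y_j\,(\alpha_i^{\dagger}|\beta_j) \geq 0
\]

because all the terms $x_i$, $y_j$, and $(\alpha_i^{\dagger}|\beta_j)$ are non-negative.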


Sufficiency. Take $\xi \in \mathrm{St}_{\mathbb{R}}(A)$, and assume that $\langle \xi, \eta \rangle \geq 0$ for all $\eta \in \mathrm{St}_{+}(A)$. Assume $\xi$ is
diagonalised as $\xi = \sum_{i=1}^{d} x_i \alpha_i$, where the $x_i$’s are generic real numbers. We wish to prove that all the
$x_i$’s are non-negative. Then

\[
\langle \xi, \eta \rangle = \sum_{i=1}^{d} x_i\,(\alpha_i^{\dagger}|\eta) \geq 0.
\]

Recalling that for perfectly distinguishable pure states one has $(\alpha_i^{\dagger}|\alpha_j) = \delta_{ij}$ [30], it is enough to
take $\eta$ to be one of the states $\{\alpha_i\}_{i=1}^{d}$ to conclude that $x_i \geq 0$ for every $i \in \{1, \ldots, d\}$, meaning that
$\xi \in \mathrm{St}_{+}(A)$.

The self-dualising inner product, besides being a nice mathematical tool, has some operational
meaning, because it provides a measure of the distinguishability of states, as explained in Appendix A.2.
Moreover, it is the starting point for extending the dagger to all transformations. This is done in
Appendix B.

5.2. Existence of Pure Orthogonal Projectors

Now we show that we have orthogonal projectors on every face of the state space. A consequence
of diagonalisation is that all faces are generated by perfectly distinguishable pure states. Indeed, every
face $F$ is generated by a state $\omega$ in its relative interior. This $\omega$ can be diagonalised as $\omega = \sum_{i=1}^{r} p_i \alpha_i$, where
$r \leq d$, and $p_i > 0$ for $i \in \{1, \ldots, r\}$. By definition of face, this means that the states $\{\alpha_i\}_{i=1}^{r}$ are in $F$,
and therefore generate $F$. Consequently, there is an effect $a$ that picks out the whole face as the set of
states $\rho$ such that $(a|\rho) = 1$. In the specific case considered above, it is $a = \sum_{i=1}^{r} \alpha_i^{\dagger}$. Such faces are
called exposed.
Therefore the study of faces of sharp theories with purification reduces to the study of normalised
effects. Thanks to Lemma 1, it is enough to consider subsets of pure maximal sets. Pick a pure maximal
set $\{\alpha_i\}_{i=1}^{d}$, and consider a subset $I$ of $\{1, \ldots, d\}$. The subset $I$ flags the slits that are open in the
experiment. Setting $a_I := \sum_{i \in I} \alpha_i^{\dagger}$, we can define the two faces

1. $F_I := \{\rho \in \mathrm{St}_1(A) : (a_I|\rho) = 1\}$;
2. $F_I^{\perp} := \{\rho \in \mathrm{St}_1(A) : (a_I|\rho) = 0\}$,

in analogy with those of Definition 6. Clearly the effect $a_I^{\perp} := \sum_{i \notin I} \alpha_i^{\dagger}$ defines the orthogonal face $F_I^{\perp}$,
as it occurs with probability one on the states of $F_I^{\perp}$. Note that each of the effects $\{\alpha_i^{\dagger}\}_{i \notin I}$ occurs with
zero probability on the states of $F_I$.

Definition 8. An orthogonal projector (in the sense of [20]) on the face $F_I$ is a transformation $P_I \in \mathrm{Transf}(A)$
such that

• if $\rho \in F_I$, then $P_I \rho = \rho$;
• if $\rho \in F_I^{\perp}$, then $P_I \rho = 0$.

We can prove the existence of a projector at least in one case, when $I = \{1, \ldots, d\}$. In this case
$a_I = u$, so $F_I = \mathrm{St}_1(A)$, and $F_I^{\perp} = \emptyset$. Then it is enough to take $P_I \doteq \mathcal{I}$. However, sharp theories with
purification admit projectors on every face.

Proposition 6. Sharp theories with purification have pure projectors on every face $F_I$. Furthermore one has
$uP_I = a_I$.

Proof. Suppose $\rho$ is any state in $F_I$, then $(a_I|\rho) = 1$. By Proposition 2 we know that there is a pure
transformation $P_I$ such that $P_I \rho = \rho$ for every $\rho \in F_I$. We also have $(u|P_I|\sigma) \leq (a_I|\sigma)$, so if $\sigma \in F_I^{\perp}$,
we have $(u|P_I|\sigma) = 0$, whence $P_I \sigma = 0$.


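To prove that $uP_I = a_I$, first note that $\psi^{\dagger}P_I = \psi^{\dagger}$ for every pure state $\psi \in F_I$. Indeed $\psi^{\dagger}P_I$ is
pure by Purity Preservation, and we have $(\psi^{\dagger}|P_I|\psi) = (\psi^{\dagger}|\psi) = 1$ because $P_I\psi = \psi$ by definition.
By Theorem 6, we have $\psi^{\dagger}P_I = \psi^{\dagger}$. Furthermore, $\varphi^{\dagger}P_I = 0$ for a pure state $\varphi \in F_I^{\perp}$. Indeed, consider

\[
(\varphi^{\dagger}|P_I|\chi) = \frac{1}{d} \sum_{i \in I} (\varphi^{\dagger}|P_I|\alpha_i) + \frac{1}{d} \sum_{i \notin I} (\varphi^{\dagger}|P_I|\alpha_i) .
\]

The second term vanishes because $\alpha_i \in F_I^{\perp}$ for $i \notin I$. The first term vanishes because $P_I\alpha_i = \alpha_i$ for
$i \in I$, and $\varphi$ is perfectly distinguishable from any of the $\alpha_i$’s for $i \in I$ by means of the observation-test
$\{u - a_I, a_I\}$, implying $(\varphi^{\dagger}|\alpha_i) = 0$ [30]. This means that $\varphi^{\dagger}P_I$ occurs with zero probability on all states
contained in $\chi$, and since $\chi$ is complete [19], $\varphi^{\dagger}P_I = 0$. Now, when we calculate $uP_I$, we separate the
contribution arising from states in orthogonal faces:

\[
uP_I = \sum_{i \in I} \alpha_i^{\dagger}P_I + \sum_{i \notin I} \alpha_i^{\dagger}P_I = \sum_{i \in I} \alpha_i^{\dagger} = a_I .
\]

This concludes the proof.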

In other words, $P_I$ occurs with the same probability as $a_I$, thus satisfying one of the desiderata
of Section 3. Moreover, extending some of the results in the proof of Proposition 6 by linearity, we obtain the dual
statements of Definition 8, namely

• $\rho^{\dagger}P_I = \rho^{\dagger}$ if $\rho \in F_I$
• $\rho^{\dagger}P_I = 0$ if $\rho \in F_I^{\perp}$

Another consequence of Proposition 6 is that projectors actually project on their associated face, viz.
for every normalised state $\rho$, $P_I\rho = \lambda\sigma$, where $\sigma$ is in $F_I$, and $\lambda = (a_I|\rho)$. Indeed, $\lambda = (u|P_I|\rho) = (a_I|\rho)$.
If $\lambda \neq 0$, which means $\rho \notin F_I^{\perp}$, then $(a_I|\sigma) = \frac{1}{\lambda}(a_I|P_I|\rho)$. However, we know that $a_I P_I = a_I$, so
$(a_I|\sigma) = 1$, showing that $\sigma \in F_I$.
Furthermore, we can show that every projector $P_I$ has a complement $P_I^{\perp}$, which is the projector
associated with the effect $a_I^{\perp} = \sum_{i \notin I} \alpha_i^{\dagger}$, which defines the orthogonal face $F_I^{\perp}$. Clearly $P_I^{\perp}\rho = (a_I^{\perp}|\rho)\,\sigma$,
with $\sigma \in F_I^{\perp}$. In particular, $P_I^{\perp}\rho$ vanishes if and only if $\rho \in F_I$.
These properties are the starting point for proving the idempotence of projectors.

Proposition 7. Given a fixed pure maximal set $\{\alpha_i\}_{i=1}^{d}$ and $I \subseteq \{1, \ldots, d\}$, one has $P_I^2 \doteq P_I$. Moreover, if $J$ is
another subset of $\{1, \ldots, d\}$ disjoint from $I$, then $P_I P_J \doteq 0$.

Proof. Recall that for every state $\rho$, $P_I\rho = \lambda\sigma$, where $\sigma$ is in $F_I$. Now, $P_I$ leaves $\sigma$ invariant by definition, so

\[
P_I^2 \rho = \lambda P_I \sigma = \lambda\sigma,
\]

so $P_I^2 \doteq P_I$. To prove the other property, note that if $I$ and $J$ are disjoint, they define orthogonal faces.
Indeed, suppose $\rho \in F_I$, then

\[
1 = \mathrm{Tr}\,\rho = (a_I|\rho) + (a_J|\rho) + \sum_{i \notin I \cup J} (\alpha_i^{\dagger}|\rho),
\]

which implies $(a_J|\rho) = 0$ because $(a_I|\rho) = 1$. Hence $\rho \in F_J^{\perp}$. Now, given any normalised state $\rho$,
$P_I P_J \rho = 0$ because $P_J\rho$ is proportional to a state in $F_I^{\perp}$. This proves that $P_I P_J \doteq 0$.
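
The properties established so far can be checked in the quantum case. The sketch below assumes that states are density matrices written in a fixed orthonormal basis whose elements play the role of the pure maximal set, and that the projector acts as $\rho \mapsto \Pi_I \rho\, \Pi_I$ with $\Pi_I$ the diagonal projection onto the basis elements in $I$. The function names and the sample state are hypothetical.

```python
def project(rho, open_slits):
    """Quantum projector P_I: rho -> Pi_I rho Pi_I. Because Pi_I is diagonal
    in the fixed basis, entries outside I x I are simply set to zero."""
    n = len(rho)
    return [[rho[i][j] if i in open_slits and j in open_slits else 0
             for j in range(n)] for i in range(n)]


def trace(rho):
    """Deterministic effect u applied to rho."""
    return sum(rho[i][i] for i in range(len(rho)))


# A hypothetical normalised pure state of a three-level system, rho = |v><v|.
v = [0.6, 0.48j, -0.64]
rho = [[a * b.conjugate() for b in v] for a in v]

slits_i, slits_j = {0, 1}, {2}   # two disjoint subsets of {0, 1, 2}
p_i_rho = project(rho, slits_i)
```

Applying `project` twice with `slits_i` reproduces `p_i_rho`, composing the projectors for the disjoint subsets gives the zero matrix, and `trace(p_i_rho)` equals the probability of the effect $a_I$ on `rho`, matching Propositions 6 and 7 in this special case.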
This result shows that, once a pure maximal set $\{\alpha_i\}_{i=1}^{d}$ is fixed, whenever we have a partition
$\{I_j\}$ of $\{1, \ldots, d\}$, the test $\{P_{I_j}\}$ is a von Neumann measurement. The only thing left to check is that
$\sum_j uP_{I_j} = u$, which is a sufficient condition for a set of transformations to be a test in sharp theories
with purification. This is satisfied because, recalling Proposition 6,

\[
\sum_j uP_{I_j} = \sum_j a_{I_j} = \sum_{i=1}^{d} \alpha_i^{\dagger} = u.
\]

Because of the properties proved above, von Neumann measurements are repeatable and
minimally disturbing measurements in the sense of Refs. [59,63]. Indeed, $a_{I_j}P_{I_j} = a_{I_j}$, and

\[
a_{I_j} \sum_k P_{I_k} = a_{I_j}P_{I_j} + \sum_{k \neq j} a_{I_j}P_{I_k} = a_{I_j},
\]

because for $k \neq j$ the $P_{I_k}$’s project on faces orthogonal to $F_{I_j}$.
The next proposition concerns the interplay between orthogonal projectors and the dagger.

Proposition 8. For every normalised state $\rho$, and for every projector $P_I$ on a face $F_I$, one has $(P_I\rho)^{\dagger} = \rho^{\dagger}P_I$.

Proof. First of all, note that $0 \leq \|P_I\rho\| \leq 1$, and it vanishes if and only if $\rho \in F_I^{\perp}$. If $\rho \in F_I^{\perp}$, then
$\rho^{\dagger}P_I = 0$, so the statement is trivially true. Now suppose $\|P_I\rho\| > 0$. We will first prove the statement
for normalised pure states $\psi$; then it is sufficient to extend it by linearity to all states. We will make use
of the uniqueness of the dagger for normalised pure states. The statement is then equivalent to proving

\[
\left( \frac{P_I\psi}{\|P_I\psi\|} \right)^{\dagger} = \frac{\psi^{\dagger}P_I}{\|P_I\psi\|} .
\]

Noting that the term in brackets is a normalised pure state (by Purity Preservation), and that the RHS
is a pure effect (again by Purity Preservation), by the uniqueness of the dagger for normalised pure
states (cf. Theorem 6), it is enough to prove that

\[
\frac{(\psi^{\dagger}|P_I P_I|\psi)}{\|P_I\psi\|^2} = 1;
\]

in other words that $(\psi^{\dagger}|P_I P_I|\psi) = \|P_I\psi\|^2$. Recall that $P_I^2 \doteq P_I$ (Proposition 7), so $(\psi^{\dagger}|P_I P_I|\psi) =
(\psi^{\dagger}|P_I|\psi)$. Now, $P_I\psi = \|P_I\psi\|\,\psi'$, where $\psi'$ is a pure state in $F_I$. We have

\[
(\psi^{\dagger}|P_I|\psi) = \|P_I\psi\|\,(\psi^{\dagger}|\psi') .
\]

We only need to prove that $(\psi^{\dagger}|\psi') = \|P_I\psi\|$. Recall that $(\psi^{\dagger}|\psi') = (\psi'^{\dagger}|\psi)$ by Lemma 2, and that
$\psi'^{\dagger}P_I = \psi'^{\dagger}$ as $\psi' \in F_I$, thus

\[
(\psi^{\dagger}|\psi') = (\psi'^{\dagger}|P_I|\psi) = \|P_I\psi\|\,(\psi'^{\dagger}|\psi') = \|P_I\psi\| .
\]

By the uniqueness of the dagger for normalised pure states we conclude that $\left( \frac{P_I\psi}{\|P_I\psi\|} \right)^{\dagger} = \frac{\psi^{\dagger}P_I}{\|P_I\psi\|}$, namely
$(P_I\psi)^{\dagger} = \psi^{\dagger}P_I$.

A consequence of this proposition is that orthogonal projectors play nicely with the inner product
of Lemma 2, namely for every $\xi, \eta \in \mathrm{St}_{\mathbb{R}}(\mathrm{A})$ one has

\[ \langle P_I \xi, \eta \rangle = \langle \xi, P_I \eta \rangle. \qquad (11) \]

In other words, projections are symmetric with respect to the inner product.
The last property we need is a generalisation of the results of Proposition 7.

Proposition 9. Fixing a pure maximal set $\{\alpha_i\}_{i=1}^d$, and considering $I, J \subseteq \{1, \ldots, d\}$, we have $P_I P_J \doteq P_{I \cap J}$.


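Proof. First let us prove that

\[ P_I P_J \rho = \|P_I P_J \rho\| \, \rho' \qquad (12) \]

where we have used the fact that $\alpha_i^\dagger P_J = \alpha_i^\dagger$ if $i \in J$, and $\alpha_i^\dagger P_J = 0$ if $i \notin J$. If $\rho \in F_{I \cap J}^\perp$, both the LHS and
the RHS of Equation (12) vanish, and the statement is trivially satisfied. Now, let us assume $\rho \notin F_{I \cap J}^\perp$;
in this case $(a_{I \cap J}|\rho) > 0$. We wish to prove that $(a_{I \cap J}| P_I P_J |\rho) = (a_{I \cap J}|\rho)$. Recalling the expression of
$a_{I \cap J}$, we have

\[ \sum_{i \in I \cap J} (\alpha_i^\dagger| P_I P_J |\rho) = \sum_{i \in I \cap J} (\alpha_i^\dagger| P_J |\rho) = \sum_{i \in I \cap J} (\alpha_i^\dagger|\rho) = (a_{I \cap J}|\rho), \]

again by the properties of $P_I$ and $P_J$. This means that $P_I P_J$ maps every normalised state to a state of
$F_{I \cap J}$, up to normalisation.

Now let us prove that $(P_I P_J)^2 \doteq P_I P_J$. First note that $F_{I \cap J} \subseteq F_I$. Indeed, suppose $\rho \in F_{I \cap J}$, then

\[ (a_I|\rho) = \sum_{i \in I \cap J} (\alpha_i^\dagger|\rho) + \sum_{i \in I \setminus J} (\alpha_i^\dagger|\rho) = (a_{I \cap J}|\rho) = 1, \]

where we have used the fact that $(\alpha_i^\dagger|\rho) = 0$ if $i \notin I \cap J$. By a similar argument, $F_{I \cap J} \subseteq F_J$. Now,
$P_I P_J \rho = \|P_I P_J \rho\| \, \rho'$, with $\rho' \in F_{I \cap J}$. Then $(P_I P_J)^2 \rho = \|P_I P_J \rho\| \, P_I P_J \rho'$. However, $\rho' \in F_J$, so $P_J \rho' = \rho'$,
and, similarly, $\rho' \in F_I$, so $P_I \rho' = \rho'$. Consequently,

\[ (P_I P_J)^2 \rho = \|P_I P_J \rho\| \, \rho' = P_I P_J \rho, \]

proving that $(P_I P_J)^2 \doteq P_I P_J$.
Now let us prove that for every $\xi \in \mathrm{St}_{\mathbb{R}}(\mathrm{A})$, we have $(P_I P_J \xi)^\dagger = \xi^\dagger P_I P_J$. Following the lines of
proof of Proposition 8, let us show that this is true when $\xi$ is a normalised pure state $\psi$. This boils
down to showing that

\[ (\psi^\dagger| P_I P_J P_I P_J |\psi) = \|P_I P_J \psi\|^2. \]

The proof goes on as for Proposition 8, noting that if $\psi \in F_{I \cap J}$, then $\psi^\dagger P_I P_J = \psi^\dagger$ because
$\psi^\dagger P_I = \psi^\dagger$ as $\psi \in F_I$, and, similarly, $\psi^\dagger P_J = \psi^\dagger$ as $\psi \in F_J$. Eventually we find that for pure states
$(P_I P_J \psi)^\dagger = \psi^\dagger P_I P_J$, and by linearity this means that $(P_I P_J \xi)^\dagger = \xi^\dagger P_I P_J$.
A consequence of this property is that $\langle P_I P_J \xi, \eta \rangle = \langle \xi, P_I P_J \eta \rangle$, for all $\xi, \eta \in \mathrm{St}_{\mathbb{R}}(\mathrm{A})$. These linear
maps on $\mathrm{St}_{\mathbb{R}}(\mathrm{A})$ are such that $\mathrm{St}_{\mathbb{R}}(\mathrm{A}) = \operatorname{im} P_I P_J \oplus \ker P_I P_J$, and $\ker P_I P_J$ is the orthogonal subspace
to $\operatorname{im} P_I P_J$, hence it is uniquely defined once $\operatorname{im} P_I P_J$ is fixed. Note that for any projector $P_I$ we have
$\operatorname{im} P_I = \operatorname{span} F_I$, and we have just proved that $\operatorname{im} P_I P_J = \operatorname{span} F_{I \cap J} = \operatorname{im} P_{I \cap J}$. Having the same image,
and consequently the same kernel, $P_I P_J$ and $P_{I \cap J}$ agree on a basis of $\mathrm{St}_{\mathbb{R}}(\mathrm{A})$, therefore they agree also
on all states of A, meaning that $P_I P_J \doteq P_{I \cap J}$.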
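As a sanity check of Propositions 7 and 9, the sketch below works in a classical toy model. It assumes (an illustrative assumption, not a construction taken from the text) that a state is represented by its vector of coefficients on the fixed pure maximal set and that $P_I$ simply zeroes the entries outside $I$. The helper name `projector` and the 0-based indices are hypothetical.

```python
def projector(index_set):
    """Toy-model P_I: keep the entries labelled by I, zero all the others."""
    kept = frozenset(index_set)

    def apply(state):
        return [x if i in kept else 0.0 for i, x in enumerate(state)]

    return apply


# Indices are 0-based here, whereas the text labels the set {1, ..., d}.
I, J = {0, 1, 2}, {1, 2, 3}
P_I, P_J, P_IJ = projector(I), projector(J), projector(I & J)

rho = [0.4, 0.3, 0.2, 0.1]  # a normalised classical state with d = 4

composed = P_I(P_J(rho))  # P_I P_J rho
direct = P_IJ(rho)        # P_{I ∩ J} rho
idempotent = P_I(P_I(rho)) == P_I(rho)
disjoint = projector({0})(projector({3})(rho))  # disjoint index sets

print(composed == direct, idempotent, disjoint)
```

In this toy model the three checks correspond to $P_I P_J \doteq P_{I \cap J}$, $P_I^2 \doteq P_I$, and $P_I P_J \doteq 0$ for disjoint sets; it says nothing about non-classical sharp theories with purification, where the general proofs above are needed.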

5.3. Main Result

Proposition 29 of [4] asserts that theories satisfying two postulates, Strong Symmetry and
Projectivity, have no higher-order interference if and only if their projectors (in our terminology here)
preserve purity. A close examination of its proof, and those of all lemmas and propositions used in its
proof—notably Lemma 22 and Propositions 18, 25, 26, and 28 of [4]—reveals that only premises weaker
than the conjunction of Strong Symmetry and Projectivity are used: self-duality, the “spectral-like
decomposition” of effects as in Lemma 1 above, the fact that faces are determined by subsets of maximal
distinguishable sets of states as in Section 5.2 above, the existence of projectors onto each face in the
sense of Definition 8 above, and the fact that these are symmetric with respect to the self-dualising inner
product (i.e., orthogonal projectors), and satisfy Proposition 9 above. We have established these weaker
premises for sharp theories with puriﬁcation, and moreover, we have established in Proposition 6 that
their projectors preserve purity, so we have proved:

Theorem 7. In any sharp theory with puriﬁcation there can be no nth order interference for n ≥ 3.

5.4. Jordan-Algebraic Structure

Our results also imply that systems, and therefore also the “subsystems” associated with their
faces, are operationally equivalent to finite-dimensional Jordan-algebraic systems. These are systems
A for which St+ (A) is the cone of squares in a finite-dimensional Euclidean Jordan algebra (EJA) and
Eff + (A) is identified with the same cone, with evaluation of effects on states given by the inner product
and the Jordan unit as the deterministic effect. (See [37] for more on Jordan algebraic operational
systems, and [61] for a mathematical treatment.)

Theorem 8. In a sharp theory with purification, every system A has both $\mathrm{St}_+(\mathrm{A})$ and $\mathrm{Eff}_+(\mathrm{A})$ isomorphic to
the cone of squares in a Euclidean Jordan algebra (EJA) via isomorphisms $S$ and $T$ such that $(a|\rho) = \langle Ta, S\rho \rangle$,
where $\langle \bullet, \bullet \rangle$ is the canonical inner product on the EJA, and $T$ takes the deterministic effect to the Jordan unit.

Proof. The proof uses results of Alfsen and Shultz [64], for which we refer to [61]. Theorem 9.33
in [61] implies that finite-dimensional systems with symmetry of transition probabilities (STP), a type
of projection operator they call “compression” associated with every face, and whose compressions
preserve purity, have state spaces affinely isomorphic to the state spaces of Euclidean Jordan algebras.
Sharp theories with purification satisfy STP, as noted following Lemma 2 above. Our projectors are
easily shown to be examples of compressions by the same argument as in Theorem 17 of [4]; this
argument uses only properties satisfied by our projectors (the same ones needed in the proof of
Theorem 7, except for Purity Preservation) and does not need Strong Symmetry. As shown above, our
projectors also preserve purity.

Since faces of Jordan-algebraic systems are also Jordan-algebraic, so are the faces of state
spaces in sharp theories with purification. (To see the former, combine a result
of Iochum [65] (Theorem 5.32 in [61]), whose finite-dimensional case is that all faces of EJAs are the
positive part of the images of compressions, with the facts (cf. pp. 22–26 of [61]) that every face of the
cone of squares is the image of such a compression P ([61], Lemma 1.39), and also a Jordan subalgebra
whose unit is the image of the order unit under P ([61], Proposition 1.43).) However, it is not the case
that in sharp theories with purification each face of a system is necessarily isomorphic to a stand-alone
system of the theory (an object of the category, in the categorical formulation); nevertheless, it is always
possible to extend the theory
such that they are. Every category has a Cauchy completion: this is a minimal extension of the category
such that every idempotent morphism π : A → A can be written as a retraction-section pair, i.e., as the
composition π = σ ◦ ρ, with ρ : A → B and σ : B → A, such that the reverse composition ρ ◦ σ is the
identity morphism on B. When the idempotents are projectors P like the ones we consider here, B will
be a system isomorphic to the face im+ ( P). Of course, since there may be idempotents beyond the
projectors onto faces (for example, decoherence of a set of orthogonal subspaces, or damping to a fixed
state, in quantum theory), Cauchy completion of an operational theory T may add many objects in
addition to ones isomorphic to faces of systems of T; indeed, for many operational theories (e.g., ones
possessing idempotent decoherence maps) this will add some classical systems. This is indeed the
case for quantum theory where the Cauchy completion leads to the category of finite-dimensional
C*-algebras and completely positive maps [66]. The Cauchy completion can be thought of as adding
in all operationally accessible systems that can be simulated on the physical system via a consistent
restriction on the allowed states, effects and transformations. The Cauchy completion of a sharp theory
with purification will likely satisfy the Ideal Compression postulate by virtue of containing the faces
that are images of orthogonal projectors; but there are also non-Cauchy complete theories that satisfy
it, e.g., the category CPM of finite-dimensional quantum systems and CP maps, in which all systems,
and also all images of orthogonal projectors as defined above, are fully coherent quantum systems, but
there are no classical systems.
In [37], some categories, including dagger-compact-closed categories, of Jordan algebraic systems
were constructed; these categories are equivalent to operational theories as we use the term here.
Although sharp theories with purification also have Jordan algebraic state and effect spaces, it is
interesting to note that some of the explicit examples in [30,49] involve composites different from those
that would be obtained in the categories considered in [37] for systems with the same state spaces.
On the other hand, the category combining real and quaternionic systems in [37] does not satisfy
Purity Preservation by parallel composition and hence falls outside the class of sharp theories with
purification, although its filters do preserve purity. Of course, the failure of Purity Preservation by
parallel composition seems likely to allow phenomena like the nonextensiveness of entropy when
products of states are taken, which could warrant focusing on sharp theories with purification in
thermodynamically motivated work such as [30].
That Jordan-algebraic systems lack higher-order interference was shown by Barnum and
Ududec ([12]; announced in [67]) and by Niestegge [68]; combining this with Theorem 8 gives another
way to see that our results on sharp theories with purification imply the absence of higher-order
interference. Moreover, as not all EJAs satisfy our postulates, it is clear that our postulates are sufficient
but not necessary conditions for ruling out higher-order interference.

6. Discussion and Conclusions

We proved that in sharp theories with purification multi-slit experiments must have a pure projector
structure and, moreover, such theories exhibit at most second-order interference. Hence these theories
are, at least conceptually, very “close” to quantum theory. Moreover, recent work has shown that sharp
theories with purification are close to quantum theory in terms of other physical and information
processing features. Indeed, such theories possess quantum-like contextuality behaviour [59,63],
quantum-like computation [7,8], and quantum-like thermodynamic properties [30,49,54]. Recall from
Section 4 that quantum theory is not the only example of a generalised probabilistic theory satisfying
these principles. Hence Causality, Purity Preservation, Pure Sharpness, and Puriﬁcation do not recover
the entire quantum formalism.
However, if one were to introduce the Ideal Compression and Local Discriminability principles
of the reconstruction of quantum theory due to Chiribella, D'Ariano, and Perinotti [20], one would
indeed regain the entire quantum formalism. Indeed, both additional principles are necessary: Local
Discriminability to preclude real quantum theory and Ideal Compression to preclude the contrived—yet
admissible—example of the theory in which all systems are composites of qubits. Sharp theories with
purification thus serve as a fertile test-bed for physics that is conceptually quite close to that predicted
by the quantum world, but which may diverge from it in certain small, yet interesting, ways.

Finding Higher Order Interference

To date there has been no experiment that has found higher-order interference, at least, none
that cannot be explained by taking into account the fact that the “sets of histories are not mutually
exclusive” [2,35]. However, this might be due to the speciﬁc experimental set-up employed, rather
than a fundamental preclusion of higher-order interference in nature. We show here that many of the
properties needed to rule out observing higher-order interference are in fact quite natural assumptions
which appear to be suggested by the experimental set-up employed. This suggests that the experimental
set-up itself may implicitly rule out observing higher-order interference from the outset.
The main result of the current work is that sharp theories with puriﬁcation can never exhibit
higher-order interference in any experiment. However, in a wider class of theories, we still will not
observe higher-order interference in a particular experiment if the following three conditions are met;
hence, to have any chance of observing higher-order interference, experiments must be designed in
order to try to violate these conditions.

1. The transformations corresponding to blocking slits satisfy $T_I T_J = T_{I \cap J}$. By this we mean that
they share several properties with the projectors $P_I$ of Section 5: if we define the effects $a_I = u T_I$
and the faces $F_I$ and $F_I^\perp$ as in Section 5.2, i.e., as the 1-set and 0-set of $a_I$, then the $T_I$ are assumed
to be orthogonal projectors in the sense of Definition 8, and to be both idempotent and “orthogonal”
($T_I T_J = 0$) if $I$ and $J$ are disjoint (as in Proposition 7).
2. The $T_I$'s map pure states to pure states.
3. The $T_I$'s are self-adjoint.

The first of these is generally expected as only those slits belonging to both I and J will not be
blocked by either TI or TJ , and so should hold in this experimental set-up for any theory that can
describe it.
The second assumption, which is also natural given the multi-slit set-up, is that, in an idealised
scenario, the slits should not introduce fundamental noise. That is, if an input state ρ is pure, i.e., has
no classical noise associated with it, then TI ρ should also be pure. Hence it appears natural to assume
that TI maps pure states to pure states. Violating this principle by just adding noise to the experiment
does not seem likely to demonstrate higher-order interference. A more plausible way to violate it,
however, would be for the particle passing through the slits to become entangled with some degree
of freedom associated with them: if we do not have access to this degree of freedom, a pure input
would then be sent to a mixed state.
The final assumption is far less general than the others, as it places a constraint on the theory.
That is, to even discuss whether a transformation is self-adjoint (cf. also Appendix B), one requires
that the theory itself be self-dual. To fully understand what this assumption entails, one needs an
operational or physical interpretation of the self-dualising inner product (see [69] for an example
of such an interpretation). However, intuitively this notion reflects the inherent symmetry of the
experimental set-up. Here one could consider propagation from the source to the effect or from the
effect to the source as being “dual” to one another and, moreover, that the physical blocking of slits
has an equivalent effect in either situation. That is, the assumption of self-adjointness corresponds to
the statement that the projector has an equivalent action on the effects associated with a particular slit
as it does on the states which can pass through them.
If an experiment satisfies these assumptions then for any self-dual theory it was shown in [4]
(Proposition 29) that we will not see higher-order interference in this experiment. Hence any set
of physical principles which ensure these assumptions hold will rule out higher-order interference.
Because the mathematical assumptions involved in formalising a multi-slit experiment are so natural
when interpreted operationally, perhaps one should search for higher-order interference in set-ups that
don’t seem to preclude it from the outset. This could involve “asymmetric” multi-slit set-ups that are
not obviously time-symmetric in an arbitrary generalised probabilistic theory. One could also consider
experiments that search for higher-order phases [8], a reformulation of higher-order interference that
makes no reference to projectors and hence does not preclude certain generalised theories from the
outset. The assumption that nature is self-dual could also be rejected; this poses the question as to
whether it is possible to find a direct experimental test of this principle.
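To make the quantity under discussion concrete, the sketch below evaluates the standard second- and third-order Sorkin interference terms for a three-slit experiment in ordinary quantum theory. The section itself does not spell out these expressions; the formula $I_3 = p_{123} - p_{12} - p_{13} - p_{23} + p_1 + p_2 + p_3$, the toy amplitudes, and all function names are assumptions made for illustration. Blocking slits is modelled by the projector onto the open slits, which satisfies the three conditions listed above.

```python
import cmath
import math
from functools import partial


def detection_probability(psi, phi, open_slits):
    """Born-rule probability |<phi| P_I |psi>|^2 with only `open_slits` open."""
    amplitude = sum(phi[i].conjugate() * psi[i] for i in open_slits)
    return abs(amplitude) ** 2


def second_order_term(psi, phi, a, b):
    """I_2 = p_ab - p_a - p_b, the usual two-slit interference term."""
    p = partial(detection_probability, psi, phi)
    return p({a, b}) - p({a}) - p({b})


def third_order_term(psi, phi):
    """I_3 = p_123 - p_12 - p_13 - p_23 + p_1 + p_2 + p_3 for three slits."""
    p = partial(detection_probability, psi, phi)
    return (p({0, 1, 2})
            - p({0, 1}) - p({0, 2}) - p({1, 2})
            + p({0}) + p({1}) + p({2}))


# Hypothetical input state and detection effect, one amplitude per slit.
scale = 1 / math.sqrt(3)
psi = [complex(scale), scale * cmath.exp(0.7j), scale * cmath.exp(-1.3j)]
phi = [complex(scale), scale * cmath.exp(0.2j), complex(scale)]

i2 = second_order_term(psi, phi, 0, 1)
i3 = third_order_term(psi, phi)
print(i2, i3)
```

For these amplitudes the second-order term is clearly non-zero while the third-order term vanishes up to floating-point rounding, which is the behaviour Theorem 7 establishes for all sharp theories with purification.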

Acknowledgments: The authors thank J. Barrett for useful discussions and J. J. Barry for encouragement while
writing the current paper. This work was supported by EPSRC grants through the Controlled Quantum Dynamics
Centre for Doctoral Training, the UCL Doctoral Prize Fellowship (project number 534936), and an Oxford doctoral
training scholarship, and also by Oxford-Google DeepMind Graduate Scholarship. We also acknowledge ﬁnancial
support from the European Research Council (ERC Grant Agreement No. 337603), the Danish Council for
Independent Research (Sapere Aude) and VILLUM FONDEN via the QMATH Centre of Excellence (Grant
No. 10059). This work began while the authors were attending the “Formulating and Finding Higher-order
Interference” workshop at the Perimeter Institute. Research at Perimeter Institute is supported by the Government
of Canada through the Department of Innovation, Science and Economic Development Canada and by the
Province of Ontario through the Ministry of Research, Innovation and Science.


Author Contributions: All authors contributed equally to the present work.

Conﬂicts of Interest: The authors declare no conﬂict of interest.

Appendix A. Norms and Fidelity

Appendix A.1. Operational Norm and Dagger Norm

In Ref. [19] the operational norm for every vector $\xi \in \mathrm{St}_{\mathbb{R}}(\mathrm{A})$ was introduced:

\[ \|\xi\| := \sup_{a \in \mathrm{Eff}(\mathrm{A})} (a|\xi) - \inf_{a \in \mathrm{Eff}(\mathrm{A})} (a|\xi). \]

As pointed out in [19], in quantum theory the operational norm coincides with the trace norm.
The analogy is apparent also in sharp theories with puriﬁcation.

Proposition A1. Let $\xi \in \mathrm{St}_{\mathbb{R}}(\mathrm{A})$ be diagonalised as $\xi = \sum_{i=1}^d x_i \alpha_i$. Then $\|\xi\| = \sum_{i=1}^d |x_i|$.

Proof. Let us separate the terms with non-negative eigenvalues from the terms with negative
eigenvalues, so that we can write $\xi = \xi_+ - \xi_-$, where $\xi_+ := \sum_{x_i \ge 0} x_i \alpha_i$, and $\xi_- := \sum_{x_i < 0} (-x_i) \alpha_i$. Clearly,
$\xi_+, \xi_- \in \mathrm{St}_+(\mathrm{A})$. In order to achieve the supremum of $(a|\xi)$ we must have $(a|\xi_-) = 0$. Moreover,

\[ (a|\xi_+) = \sum_{x_i \ge 0} x_i \, (a|\alpha_i) \le \sum_{x_i \ge 0} x_i \]

since $(a|\alpha_i) \le 1$ for every $i$. The supremum of $(a|\xi_+)$ is achieved by $a = \sum_{x_i \ge 0} \alpha_i^\dagger$. Hence $\sup_a (a|\xi) = \sum_{x_i \ge 0} x_i$. By a similar argument, one shows that $\inf_a (a|\xi) = \sum_{x_i < 0} x_i$. Therefore

\[ \|\xi\| = \sum_{x_i \ge 0} x_i + \sum_{x_i < 0} (-x_i) = \sum_{i=1}^d |x_i|. \]

For $p \ge 1$, the $p$-norm of a vector $x \in \mathbb{R}^d$ is defined as $\|x\|_p := \left( \sum_{i=1}^d |x_i|^p \right)^{1/p}$, thus we have
$\|\xi\| = \|x\|_1$, where $x$ is the spectrum of $\xi$.
In sharp theories with purification we have an additional norm, the dagger norm, defined in
Section 5.1. The dagger norm of a vector $\xi \in \mathrm{St}_{\mathbb{R}}(\mathrm{A})$ is $\|\xi\|_\dagger = \sqrt{\sum_{i=1}^d x_i^2}$, where the $x_i$'s are the
eigenvalues of $\xi$. It is obvious from the very definition that $\|\xi\|_\dagger = \|x\|_2$. Thanks to these results
following from diagonalisation, we can derive the standard bounds between the two norms, by making
use of the well-known bounds $\|x\|_2 \le \|x\|_1 \le \sqrt{d} \, \|x\|_2$, which imply

\[ \|\xi\|_\dagger \le \|\xi\| \le \sqrt{d} \, \|\xi\|_\dagger. \qquad (A1) \]
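The two norms and the bounds (A1) depend only on the spectrum of $\xi$, so they are easy to check numerically. The sketch below is a minimal illustration; the function names and the sample spectrum are hypothetical, and the vector is assumed to be given already in diagonal form.

```python
import math


def operational_norm(spectrum):
    """Operational norm via Proposition A1: the 1-norm of the spectrum."""
    return sum(abs(x) for x in spectrum)


def dagger_norm(spectrum):
    """Dagger norm: the 2-norm of the spectrum."""
    return math.sqrt(sum(x * x for x in spectrum))


def satisfies_bounds(spectrum, tol=1e-12):
    """Check ||xi||_dagger <= ||xi|| <= sqrt(d) ||xi||_dagger, i.e. bounds (A1)."""
    d = len(spectrum)
    op, dg = operational_norm(spectrum), dagger_norm(spectrum)
    return dg <= op + tol and op <= math.sqrt(d) * dg + tol


xi = [0.5, -0.25, 0.125, 0.0]  # eigenvalues of a generic (non-positive) vector
print(operational_norm(xi), dagger_norm(xi), satisfies_bounds(xi))
```

The lower bound is saturated when a single eigenvalue is non-zero, and the upper bound when all eigenvalues have the same absolute value.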

Note that, unlike Ref. [70], here the bounds are derived without assuming Bit Symmetry [4,71].
If we take $\xi$ to be a normalised state $\rho$, its eigenvalues form a probability distribution, and we
have $\|\rho\|_\dagger \le 1$, with equality if and only if $\rho$ is pure. Note that $\|\rho\|_\dagger$ is a Schur-convex function [72] of
the eigenvalues of $\rho$, so it is a purity monotone [30]. As such, it attains its minimum on the invariant
state, which is $\|\chi\|_\dagger = \frac{1}{\sqrt{d}}$, so for every normalised state one has

\[ \frac{1}{\sqrt{d}} \le \|\rho\|_\dagger \le 1, \]

consistently with the bounds (A1). The square of the dagger norm, still a Schur-convex function,
was called purity in Refs. [70,73]. Consequently $1 - \|\rho\|_\dagger^2$ is a measure of mixedness, sometimes
called the impurity $I(\rho)$ of $\rho$. The impurity can be extended to subnormalised states by defining it as
$I(\rho) := (\operatorname{Tr} \rho)^2 - \|\rho\|_\dagger^2$ [4].
The two norms behave differently under channels applied to states. In Ref. [19] it was shown that
in causal theories the operational norm of a state $\rho$ is preserved by channels: $\|\mathcal{C} \rho\| = \|\rho\|$ for every
channel $\mathcal{C}$, because channels are such that $u \mathcal{C} = u$.
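The purity bounds and the impurity can be evaluated directly from the eigenvalues of a state. The following sketch assumes the state is given by its spectrum; the function names are hypothetical.

```python
import math


def dagger_norm(spectrum):
    """Dagger norm of a diagonalised state: 2-norm of its eigenvalues."""
    return math.sqrt(sum(p * p for p in spectrum))


def impurity(spectrum):
    """I(rho) = (Tr rho)^2 - ||rho||_dagger^2, also valid if subnormalised."""
    trace = sum(spectrum)
    return trace ** 2 - sum(p * p for p in spectrum)


d = 4
invariant = [1.0 / d] * d          # spectrum of the invariant state chi
pure = [1.0, 0.0, 0.0, 0.0]        # spectrum of a pure state
mixed = [0.5, 0.5, 0.0, 0.0]

for name, spectrum in [("chi", invariant), ("pure", pure), ("mixed", mixed)]:
    print(name, dagger_norm(spectrum), impurity(spectrum))
```

The invariant state reaches the lower bound $1/\sqrt{d}$ of the dagger norm and the largest impurity, $1 - 1/d$, whereas a pure state has dagger norm 1 and zero impurity.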
Instead the dagger norm shows a different behaviour. To describe it, it is useful to divide channels
into two classes: unital and non-unital channels [49].

Deﬁnition A1. A channel D ∈ Transf (A, B) is unital if D χA = χB .

Unital channels do not increase the dagger norm of states.

Proposition A2. If $\mathcal{D}$ is a unital channel, then $\|\mathcal{D} \rho\|_\dagger \le \|\rho\|_\dagger$, for every normalised state $\rho$.

Proof. Unital channels can be chosen as free operations for the resource theory of purity [49].
In Ref. [49] it was shown that the spectrum of $\mathcal{D} \rho$ is majorised by the spectrum of $\rho$ (see Ref. [72] for
a definition of majorisation and Schur-convex functions). Since the dagger norm is a Schur-convex
function, we have $\|\mathcal{D} \rho\|_\dagger \le \|\rho\|_\dagger$.

Clearly if D is reversible, the dagger norm is preserved, by Proposition 4.

For non-unital channels there is at least one state, the invariant state $\chi$, for which the dagger
norm increases. Indeed, if $\mathcal{C}$ is non-unital, $\chi$ is majorised by $\mathcal{C} \chi$, whence $\|\chi\|_\dagger \le \|\mathcal{C} \chi\|_\dagger$. Is it true,
then, that non-unital channels increase the dagger norm of all states? The answer is clearly negative.
Consider the non-unital channel mapping all states to a fixed mixed state $\rho_0 \neq \chi$. For some states,
e.g., the invariant state, the dagger norm will increase, for others, e.g., pure states, the dagger norm
will decrease because it is a purity monotone. In short, for non-unital channels there is no uniform
behaviour of the dagger norm.
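The contrast between unital and non-unital channels can be illustrated in a classical toy model, assuming that states are probability vectors, channels are column-stochastic matrices, and unital channels are doubly stochastic ones. This modelling choice, the matrices `mixing` and `reset`, and the helper names are assumptions made for illustration only.

```python
import math


def dagger_norm(state):
    """2-norm of a classical probability vector (its own spectrum)."""
    return math.sqrt(sum(p * p for p in state))


def apply_channel(matrix, state):
    """Matrix-vector product: the channel acting on a probability vector."""
    return [sum(row[j] * state[j] for j in range(len(state))) for row in matrix]


# Doubly stochastic matrix: a unital channel in the toy model.
mixing = [[0.5, 0.5, 0.0],
          [0.5, 0.25, 0.25],
          [0.0, 0.25, 0.75]]

# Non-unital channel sending every state to the fixed mixed state rho_0.
rho_0 = [0.6, 0.3, 0.1]
reset = [[rho_0[i]] * 3 for i in range(3)]

chi = [1.0 / 3.0] * 3
pure = [1.0, 0.0, 0.0]
rho = [0.7, 0.2, 0.1]

print(dagger_norm(rho), dagger_norm(apply_channel(mixing, rho)))
print(dagger_norm(chi), dagger_norm(apply_channel(reset, chi)))
print(dagger_norm(pure), dagger_norm(apply_channel(reset, pure)))
```

The first line shows the dagger norm decreasing under the unital channel, as in Proposition A2; the last two show the non-unital `reset` channel increasing the norm of $\chi$ but decreasing that of a pure state.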

Appendix A.2. Dagger Fidelity

The inner product defined in Section 5.1 allows us to define a fidelity-like quantity, called the
dagger fidelity.

Definition A2. Given two normalised states $\rho$ and $\sigma$, the dagger fidelity is defined as

\[ F_\dagger(\rho, \sigma) = \frac{\langle \rho, \sigma \rangle}{\|\rho\|_\dagger \, \|\sigma\|_\dagger}. \]

The dagger fidelity measures the overlap between two states. It shares some properties with the
fidelity in quantum theory (cf. for instance Ref. [74]), despite not coinciding with it. The first, obvious
one, is that F† (ρ, σ) = F† (σ, ρ).
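For two states that are diagonal on the same pure maximal set, the inner product reduces to the Euclidean inner product of their spectra, so the dagger fidelity becomes the cosine of the angle between the two probability vectors. The sketch below covers only this jointly diagonal case, which is an assumption of the example; the function name is hypothetical.

```python
import math


def dagger_fidelity(p, q):
    """Dagger fidelity of two states diagonal on the same pure maximal set."""
    inner = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return inner / (norm_p * norm_q)


rho = [0.7, 0.2, 0.1, 0.0]
sigma = [0.1, 0.2, 0.3, 0.4]
tau = [0.0, 0.0, 0.0, 1.0]  # perfectly distinguishable from rho

print(dagger_fidelity(rho, sigma))  # strictly between 0 and 1
print(dagger_fidelity(rho, rho))    # equal states
print(dagger_fidelity(rho, tau))    # disjoint supports
```

The three printed values illustrate, in this restricted setting, the range, the equality case, and the perfectly distinguishable case of the dagger fidelity.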
To prove the other properties we need the following lemma, generalising one of the results
of Ref. [30].

Lemma A1. Let $\{\rho_i\}_{i=1}^n$ be perfectly distinguishable states. Then $(\rho_i^\dagger|\rho_j) = \|\rho_i\|_\dagger^2 \, \delta_{ij}$.

Proof. Clearly what we need to prove is that $(\rho_i^\dagger|\rho_j) = 0$ if $i \neq j$. Let $\{a_i\}_{i=1}^n$ be the perfectly
distinguishing test, and let $\rho_i$ be diagonalised as $\rho_i = \sum_{k=1}^{r_i} p_{k,i} \, \alpha_{k,i}$, where $p_{k,i} > 0$ for all $k = 1, \ldots, r_i$.
We have $(a_i|\rho_i) = 1$, hence by Proposition 2 there exists a non-disturbing pure transformation $T_i$ such
that $T_i =_{\rho_i} \mathcal{I}$. Specifically, we have that $T_i \alpha_{k,i} = \alpha_{k,i}$. Moreover if $i \neq j$, we have $(u| T_i |\rho_j) \le (a_i|\rho_j) = 0$,
whence $(u| T_i |\rho_j) = 0$. This means that $T_i \rho_j = 0$ for all $j \neq i$.
Now, consider

\[ (\alpha_{k,i}^\dagger| T_i |\alpha_{k,i}) = (\alpha_{k,i}^\dagger|\alpha_{k,i}) = 1, \]

where we have used the fact that $T_i \alpha_{k,i} = \alpha_{k,i}$. Since $\alpha_{k,i}^\dagger T_i$ is a pure effect, it must be $\alpha_{k,i}^\dagger T_i = \alpha_{k,i}^\dagger$ by
Theorem 6. By linearity we have $\rho_i^\dagger T_i = \rho_i^\dagger$. Now, using this fact, for all $j \neq i$

\[ (\rho_i^\dagger|\rho_j) = (\rho_i^\dagger| T_i |\rho_j) = 0, \]

because $T_i \rho_j = 0$.

Recalling that $(\rho^\dagger|\sigma) = \langle \rho, \sigma \rangle$, this lemma means that perfectly distinguishable states form an
orthogonal set. Specifically, if the states are pure, the set is orthonormal.
The following proposition extends and generalises the properties of the self-dualising inner
product of Ref. [71].

Proposition A3. The dagger ﬁdelity has the following properties, for all normalised states ρ and σ.

1. 0 ≤ F† (ρ, σ) ≤ 1;
2. F† (ρ, σ ) = 0 if and only if ρ and σ are perfectly distinguishable;
3. F† (ρ, σ ) = 1 if and only if ρ = σ;
4. F† (U ρ, U σ) = F† (ρ, σ ), for every reversible channel U .

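Proof. Let us prove the various properties.

1. Recall that $\langle \rho, \sigma \rangle = (\rho^\dagger|\sigma) \ge 0$, whence $F_\dagger(\rho, \sigma) \ge 0$. Moreover, by Schwarz inequality, $\langle \rho, \sigma \rangle \le \|\rho\|_\dagger \, \|\sigma\|_\dagger$, so $F_\dagger(\rho, \sigma) \le 1$.
2. Suppose $\rho$ and $\sigma$ are perfectly distinguishable, then by Lemma A1 $\langle \rho, \sigma \rangle = 0$, implying $F_\dagger(\rho, \sigma) = 0$.
Now suppose $F_\dagger(\rho, \sigma) = 0$; then $\langle \rho, \sigma \rangle = 0$. Let $\rho = \sum_{i=1}^r p_i \alpha_i$ be a diagonalisation of $\rho$, with
$p_i > 0$, for all $i = 1, \ldots, r$, and $r \le d$. We have $\sum_{i=1}^r p_i \, (\alpha_i^\dagger|\sigma) = 0$, which means that $(\alpha_i^\dagger|\sigma) = 0$
for $i = 1, \ldots, r$. This means that we can build an observation-test that distinguishes $\rho$ and $\sigma$
perfectly by taking $\{a, u - a\}$, where $a = \sum_{i=1}^r \alpha_i^\dagger$.
3. Clearly, if $\rho = \sigma$, $\langle \rho, \sigma \rangle = \|\rho\|_\dagger^2$, whence $F_\dagger(\rho, \sigma) = 1$. Conversely, suppose $F_\dagger(\rho, \sigma) = 1$.
This means that $\langle \rho, \sigma \rangle = \|\rho\|_\dagger \, \|\sigma\|_\dagger$. By Schwarz inequality, this is true if and only if $\rho = \lambda \sigma$, for
some $\lambda \in \mathbb{R}$. Since both states are normalised, $\lambda = 1$, yielding $\rho = \sigma$.
4. This property follows by Proposition 4, because the inner product and the dagger norm are
invariant under reversible channels.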

Note that Property 3 captures the sharpness of the dagger for all normalised states [69].
A property involving tensor product of states is the following.

Proposition A4. For all normalised states ρ1 , ρ2 , σ1 , σ2 one has

F† (ρ1 ⊗ ρ2 , σ1 ⊗ σ2 ) = F† (ρ1 , σ1 ) F† (ρ2 , σ2 )

The proof needs the following easy lemma.

Lemma A2. Let ρ, σ ∈ St1 (A), then (ρ ⊗ σ )† = ρ† ⊗ σ† .

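Proof. Let us prove the result for $\rho$ and $\sigma$ pure, the general result will follow by linearity. By Purity
Preservation, $\rho \otimes \sigma$ and $\rho^\dagger \otimes \sigma^\dagger$ are pure, and one has $(\rho^\dagger \otimes \sigma^\dagger|\rho \otimes \sigma) = 1$. By Theorem 6,
$(\rho \otimes \sigma)^\dagger = \rho^\dagger \otimes \sigma^\dagger$.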

Now comes the actual proof.


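Proof of Proposition A4. We have

\[ F_\dagger(\rho_1 \otimes \rho_2, \sigma_1 \otimes \sigma_2) = \frac{\langle \rho_1 \otimes \rho_2, \sigma_1 \otimes \sigma_2 \rangle}{\|\rho_1 \otimes \rho_2\|_\dagger \, \|\sigma_1 \otimes \sigma_2\|_\dagger}. \]

Now, by Lemma A2,

\[ \langle \rho_1 \otimes \rho_2, \sigma_1 \otimes \sigma_2 \rangle = (\rho_1^\dagger \otimes \rho_2^\dagger|\sigma_1 \otimes \sigma_2) = (\rho_1^\dagger|\sigma_1)(\rho_2^\dagger|\sigma_2) = \langle \rho_1, \sigma_1 \rangle \langle \rho_2, \sigma_2 \rangle. \]

Furthermore,

\[ \|\rho_1 \otimes \rho_2\|_\dagger = \sqrt{\langle \rho_1 \otimes \rho_2, \rho_1 \otimes \rho_2 \rangle} = \sqrt{\langle \rho_1, \rho_1 \rangle \langle \rho_2, \rho_2 \rangle} = \|\rho_1\|_\dagger \, \|\rho_2\|_\dagger. \]

Putting everything together,

\[ F_\dagger(\rho_1 \otimes \rho_2, \sigma_1 \otimes \sigma_2) = \frac{\langle \rho_1, \sigma_1 \rangle}{\|\rho_1\|_\dagger \, \|\sigma_1\|_\dagger} \cdot \frac{\langle \rho_2, \sigma_2 \rangle}{\|\rho_2\|_\dagger \, \|\sigma_2\|_\dagger} = F_\dagger(\rho_1, \sigma_1) \, F_\dagger(\rho_2, \sigma_2). \]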
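Multiplicativity can be checked numerically for jointly diagonal states. The sketch assumes (as an illustrative simplification) that each pair of states is diagonal on a common pure maximal set and that the spectrum of a product state is the list of products of the local eigenvalues; the names `tensor` and `dagger_fidelity` are hypothetical.

```python
import math


def dagger_fidelity(p, q):
    """Dagger fidelity of two states diagonal on the same pure maximal set."""
    inner = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return inner / (norm_p * norm_q)


def tensor(p, q):
    """Spectrum of the product state: all pairwise products of eigenvalues."""
    return [a * b for a in p for b in q]


rho_1, sigma_1 = [0.8, 0.2], [0.3, 0.7]
rho_2, sigma_2 = [0.5, 0.4, 0.1], [0.2, 0.2, 0.6]

joint = dagger_fidelity(tensor(rho_1, rho_2), tensor(sigma_1, sigma_2))
product = dagger_fidelity(rho_1, sigma_1) * dagger_fidelity(rho_2, sigma_2)
print(joint, product)
```

The two printed numbers agree up to floating-point rounding, in line with Proposition A4.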

Appendix B. Dagger of All Transformations

Inspired by the results of Lemma 2, in sharp theories with purification, we can extend the dagger
to all transformations, a feature often present in process theories [44,45,69,75].

Definition A3. Given the transformation $\mathcal{A} \in \mathrm{Transf}(\mathrm{A}, \mathrm{B})$, its dagger (or adjoint) is a linear transformation
$\mathcal{A}^\dagger$ from B to A defined as

\[ (\mathcal{A}^\dagger \otimes \mathcal{I}_{\mathrm{S}}) \, \rho = \bigl( \rho^\dagger \, (\mathcal{A} \otimes \mathcal{I}_{\mathrm{S}}) \bigr)^\dagger, \qquad (A2) \]

for every system S, and every state $\rho \in \mathrm{St}_1(\mathrm{B} \otimes \mathrm{S})$.

This definition specifies the dagger of a transformation completely, thanks to Equation (2).
Note that Lemma 2 allows us to formulate Equation (10) in terms of effects and their dagger:

\[ (a|b^\dagger) = (b|a^\dagger) \]

for all effects $a$ and $b$. In this way, Definition A3 can be recast in equivalent terms by taking $b$ as the
term in round brackets in the RHS of Equation (A2). This yields

\[ (E| \, (\mathcal{A}^\dagger \otimes \mathcal{I}_{\mathrm{S}}) \, |\rho) = (\rho^\dagger| \, (\mathcal{A} \otimes \mathcal{I}_{\mathrm{S}}) \, |E^\dagger), \qquad (A3) \]

for every system S, every state $\rho \in \mathrm{St}_1(\mathrm{B} \otimes \mathrm{S})$, and every effect $E \in \mathrm{Eff}(\mathrm{A} \otimes \mathrm{S})$.
The dagger of a transformation may not be a physical transformation, i.e., it may send physical
states to non-physical ones. Indeed, the action of $\mathcal{A}^\dagger \otimes \mathcal{I}$ on a generic state (the LHS of Equation (A2))
is defined as the dagger of an effect. However, not all daggers of effects are physical states. For instance,
take the deterministic effect $u = \sum_{i=1}^d \alpha_i^\dagger$, where $\{\alpha_i\}_{i=1}^d$ is a pure maximal set. Its dagger is
$u^\dagger = \sum_{i=1}^d \alpha_i = d\chi$, which is a supernormalised (and hence non-physical) state.
For channels, we can give a necessary condition for the existence of a physical dagger of the channel.


Proposition A5. Let C ∈ Transf (A, B) be a channel. If C † is a physical transformation, then C is unital, and
C † itself is a unital channel.

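Proof. If $\mathcal{C}^\dagger$ is a physical transformation, then, for every normalised state $\rho \in \mathrm{St}_1(\mathrm{B})$, we have
$\|\mathcal{C}^\dagger \rho\| \le 1$, or in other words, $(u| \mathcal{C}^\dagger |\rho) \le 1$. By Equation (A3), $(u| \mathcal{C}^\dagger |\rho) = (\rho^\dagger| \mathcal{C} |u^\dagger)$, so the condition
$\|\mathcal{C}^\dagger \rho\| \le 1$ is equivalent to

\[ (\rho^\dagger| \mathcal{C} |\chi) \le \frac{1}{d}, \qquad (A4) \]

with equality if and only if $\mathcal{C}^\dagger$ is a channel. Suppose by contradiction that $\mathcal{C}$ is not unital, then
$\mathcal{C} \chi = \rho_0 \neq \chi$. Diagonalise $\rho_0$ as $\rho_0 = \sum_{i=1}^d p_i \alpha_i$, where $p_1 \ge p_2 \ge \ldots \ge p_d \ge 0$, and $p_1 > \frac{1}{d}$.
Then taking $\rho$ to be $\alpha_1$ in $(\rho^\dagger| \mathcal{C} |\chi)$ yields $p_1$, but $p_1 > \frac{1}{d}$, contradicting Equation (A4).
Since $\mathcal{C}$ is unital, we have that

\[ (\rho^\dagger| \mathcal{C} |\chi) = (\rho^\dagger|\chi) = \frac{1}{d} \operatorname{Tr} \rho = \frac{1}{d}, \]

showing that $\mathcal{C}^\dagger$ is itself a channel. Let us prove it is unital. The action of $\mathcal{C}^\dagger$ on $\chi$ is defined in
Equation (A2), so

\[ \mathcal{C}^\dagger \chi = \bigl( \chi^\dagger \mathcal{C} \bigr)^\dagger = \frac{1}{d} (u \mathcal{C})^\dagger = \frac{1}{d} u^\dagger = \chi, \]

where we have used the fact that $\mathcal{C}$ is a channel, so $u \mathcal{C} = u$. This proves that $\mathcal{C}^\dagger$ is unital.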
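A classical toy model gives a quick feel for Proposition A5. The sketch assumes (an illustrative assumption, not part of the text) that channels are column-stochastic matrices on probability vectors and that the dagger with respect to the Euclidean inner product is the matrix transpose. All names are hypothetical.

```python
def transpose(matrix):
    """Toy-model dagger: the transpose of a real matrix."""
    return [list(column) for column in zip(*matrix)]


def is_channel(matrix, tol=1e-12):
    """Column-stochastic: non-negative entries and every column sums to 1."""
    non_negative = all(x >= -tol for row in matrix for x in row)
    normalised = all(abs(sum(col) - 1.0) <= tol for col in zip(*matrix))
    return non_negative and normalised


def is_unital(matrix, tol=1e-12):
    """Square case: the uniform vector is preserved iff every row sums to 1."""
    return all(abs(sum(row) - 1.0) <= tol for row in matrix)


# A unital (doubly stochastic) channel and a non-unital "reset" channel.
mixing = [[0.5, 0.5, 0.0],
          [0.5, 0.25, 0.25],
          [0.0, 0.25, 0.75]]
rho_0 = [0.6, 0.3, 0.1]
reset = [[rho_0[i]] * 3 for i in range(3)]

for name, channel in [("mixing", mixing), ("reset", reset)]:
    dagger = transpose(channel)
    print(name, is_unital(channel), is_channel(dagger), is_unital(dagger))
```

For the unital `mixing` channel the transpose is again a unital channel, whereas for the non-unital `reset` channel the transpose fails to be column-stochastic, mirroring the statement that a physical dagger forces the channel to be unital.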

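We can prove that the dagger of a transformation has some nice properties.

Proposition A6. For every transformation $\mathcal{A} \in \mathrm{Transf}(\mathrm{A}, \mathrm{B})$, one has $(\mathcal{A}^\dagger)^\dagger = \mathcal{A}$.

Proof. By Equation (A3) given any system S, any state $\rho \in \mathrm{St}_1(\mathrm{A} \otimes \mathrm{S})$, and any effect $E \in \mathrm{Eff}(\mathrm{B} \otimes \mathrm{S})$,
we have

\[ (E| \, \bigl( (\mathcal{A}^\dagger)^\dagger \otimes \mathcal{I}_{\mathrm{S}} \bigr) \, |\rho) = (\rho^\dagger| \, (\mathcal{A}^\dagger \otimes \mathcal{I}_{\mathrm{S}}) \, |E^\dagger). \qquad (A5) \]

A linear extension of Equation (A3) to cover the case when $E^\dagger$ is not a physical state, applied to the
RHS of Equation (A5) yields

\[ (\rho^\dagger| \, (\mathcal{A}^\dagger \otimes \mathcal{I}_{\mathrm{S}}) \, |E^\dagger) = (E| \, (\mathcal{A} \otimes \mathcal{I}_{\mathrm{S}}) \, |\rho). \]

Comparing this with Equation (A5), we get the thesis.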

We can give a characterisation of the dagger of reversible channels, which are unital channels.

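Proposition A7. If $\mathcal{U} \in \mathrm{Transf}(\mathrm{A}, \mathrm{B})$ is a reversible channel, $\mathcal{U}^\dagger = \mathcal{U}^{-1}$.

Proof. We have

\[ (E| \, (\mathcal{U}^\dagger \otimes \mathcal{I}_{\mathrm{S}}) \, |\rho) = (\rho^\dagger| \, (\mathcal{U} \otimes \mathcal{I}_{\mathrm{S}}) \, |E^\dagger), \]

for any S, $\rho$, $E$. Recalling Lemma 2, the RHS is $\langle \rho, (\mathcal{U} \otimes \mathcal{I}) E^\dagger \rangle$. By Proposition 4 $\langle \rho, (\mathcal{U} \otimes \mathcal{I}) E^\dagger \rangle = \langle (\mathcal{U}^{-1} \otimes \mathcal{I}) \rho, E^\dagger \rangle$, and by symmetry of the inner product we have that

\[ \langle (\mathcal{U}^{-1} \otimes \mathcal{I}) \rho, E^\dagger \rangle = \langle E^\dagger, (\mathcal{U}^{-1} \otimes \mathcal{I}) \rho \rangle = (E| \, (\mathcal{U}^{-1} \otimes \mathcal{I}_{\mathrm{S}}) \, |\rho), \]

whence the thesis follows.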

In particular we have that the dagger of the SWAP channel between two systems is the SWAP with
the input and output systems reversed.
The orthogonal projectors of Section 5.2, on the other hand, are self-adjoint on single systems.

Proposition A8. Given the orthogonal projector $P_I$ on a face $F_I$, we have $P_I^\dagger \doteq P_I$.

Proof. For every $\rho$ and $E$, we have $(E| P_I^\dagger |\rho) = (\rho^\dagger| P_I |E^\dagger)$. The RHS is $\langle \rho, P_I E^\dagger \rangle$. By the properties
of projectors,

\[ \langle \rho, P_I E^\dagger \rangle = \langle P_I \rho, E^\dagger \rangle = \langle E^\dagger, P_I \rho \rangle = (E| P_I |\rho). \]

This shows that $P_I^\dagger \doteq P_I$.

Finally we prove some properties of the dagger with respect to compositions. We need an easy
lemma ﬁrst.

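Lemma A3. For every $\mathcal{A} \in \mathrm{Transf}(\mathrm{A}, \mathrm{B})$, every system S, and every vector $\xi \in \mathrm{St}_{\mathbb{R}}(\mathrm{A} \otimes \mathrm{S})$ we have

\[ \bigl( (\mathcal{A} \otimes \mathcal{I}_{\mathrm{S}}) \, \xi \bigr)^\dagger = \xi^\dagger \, (\mathcal{A}^\dagger \otimes \mathcal{I}_{\mathrm{S}}). \]

Proof. Recall that $\mathcal{A} = (\mathcal{A}^\dagger)^\dagger$; by Definition A3 we have $(\mathcal{A}^\dagger)^\dagger \xi = (\xi^\dagger \mathcal{A}^\dagger)^\dagger$, so

\[ (\mathcal{A} \otimes \mathcal{I}_{\mathrm{S}}) \, \xi = \bigl( (\mathcal{A}^\dagger)^\dagger \otimes \mathcal{I}_{\mathrm{S}} \bigr) \, \xi = \bigl( \xi^\dagger \, (\mathcal{A}^\dagger \otimes \mathcal{I}_{\mathrm{S}}) \bigr)^\dagger. \]

Taking the dagger of this equation yields the desired result.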

Now we can state the main results. The ﬁrst concerns sequential composition.

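Proposition A9. For all transformations $\mathcal{A} \in \mathrm{Transf}(\mathrm{A}, \mathrm{B})$, $\mathcal{B} \in \mathrm{Transf}(\mathrm{B}, \mathrm{C})$, one has $(\mathcal{B} \mathcal{A})^\dagger = \mathcal{A}^\dagger \mathcal{B}^\dagger$.

Proof. Take any system S, any state $\rho \in \mathrm{St}_1(\mathrm{C} \otimes \mathrm{S})$, and any effect $E \in \mathrm{Eff}(\mathrm{A} \otimes \mathrm{S})$. By Equation (A3)
we have

\[ (E| \, \bigl( (\mathcal{B} \mathcal{A})^\dagger \otimes \mathcal{I}_{\mathrm{S}} \bigr) \, |\rho) = (\rho^\dagger| \, (\mathcal{B} \mathcal{A} \otimes \mathcal{I}_{\mathrm{S}}) \, |E^\dagger) = (\rho^\dagger| \, (\mathcal{B} \otimes \mathcal{I}_{\mathrm{S}}) (\mathcal{A} \otimes \mathcal{I}_{\mathrm{S}}) \, |E^\dagger). \]

Define $\xi$ as $\xi := (\mathcal{A} \otimes \mathcal{I}) E^\dagger$, so

\[ (E| \, \bigl( (\mathcal{B} \mathcal{A})^\dagger \otimes \mathcal{I}_{\mathrm{S}} \bigr) \, |\rho) = (\rho^\dagger| \, (\mathcal{B} \otimes \mathcal{I}_{\mathrm{S}}) \, |\xi) = (\xi^\dagger| \, (\mathcal{B}^\dagger \otimes \mathcal{I}_{\mathrm{S}}) \, |\rho). \]

By Lemma A3 $\xi^\dagger = \bigl( (\mathcal{A} \otimes \mathcal{I}) E^\dagger \bigr)^\dagger = E \, (\mathcal{A}^\dagger \otimes \mathcal{I})$, then

\[ (E| \, \bigl( (\mathcal{B} \mathcal{A})^\dagger \otimes \mathcal{I}_{\mathrm{S}} \bigr) \, |\rho) = (E| \, (\mathcal{A}^\dagger \otimes \mathcal{I}_{\mathrm{S}}) (\mathcal{B}^\dagger \otimes \mathcal{I}_{\mathrm{S}}) \, |\rho), \]

therefore $(\mathcal{B} \mathcal{A})^\dagger = \mathcal{A}^\dagger \mathcal{B}^\dagger$.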

Finally the dagger respects parallel composition. Again we need a lemma.

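Lemma A4. For every $\mathcal{A} \in \mathrm{Transf}(\mathrm{A}, \mathrm{B})$, every systems S and S′, we have $(\mathcal{I}_{\mathrm{S}} \otimes \mathcal{A} \otimes \mathcal{I}_{\mathrm{S}'})^\dagger = \mathcal{I}_{\mathrm{S}} \otimes \mathcal{A}^\dagger \otimes \mathcal{I}_{\mathrm{S}'}$.

Proof. As a first step, let us prove that, for every system S, we have $(\mathcal{A} \otimes \mathcal{I}_{\mathrm{S}})^\dagger = \mathcal{A}^\dagger \otimes \mathcal{I}_{\mathrm{S}}$. Take any
system S′, any state $\rho \in \mathrm{St}_1(\mathrm{B} \otimes \mathrm{S} \otimes \mathrm{S}')$, and any effect $E \in \mathrm{Eff}(\mathrm{A} \otimes \mathrm{S} \otimes \mathrm{S}')$. Equation (A3) yields

\[ (E| \, \bigl( (\mathcal{A} \otimes \mathcal{I}_{\mathrm{S}})^\dagger \otimes \mathcal{I}_{\mathrm{S}'} \bigr) \, |\rho) = (\rho^\dagger| \, (\mathcal{A} \otimes \mathcal{I}_{\mathrm{S}} \otimes \mathcal{I}_{\mathrm{S}'}) \, |E^\dagger) = (\rho^\dagger| \, (\mathcal{A} \otimes \mathcal{I}_{\mathrm{S} \otimes \mathrm{S}'}) \, |E^\dagger). \]

Specialising Equation (A3) to the case of a composite system, we have

\[ (\rho^\dagger| \, (\mathcal{A} \otimes \mathcal{I}_{\mathrm{S} \otimes \mathrm{S}'}) \, |E^\dagger) = (E| \, (\mathcal{A}^\dagger \otimes \mathcal{I}_{\mathrm{S} \otimes \mathrm{S}'}) \, |\rho), \]

whence we conclude that $(\mathcal{A} \otimes \mathcal{I}_{\mathrm{S}})^\dagger = \mathcal{A}^\dagger \otimes \mathcal{I}_{\mathrm{S}}$.

Now let us prove that, for every system S, $(\mathcal{I}_{\mathrm{S}} \otimes \mathcal{A})^\dagger = \mathcal{I}_{\mathrm{S}} \otimes \mathcal{A}^\dagger$. Note that

\[ \mathcal{I}_{\mathrm{S}} \otimes \mathcal{A} = \mathrm{SWAP} \, (\mathcal{A} \otimes \mathcal{I}_{\mathrm{S}}) \, \mathrm{SWAP}. \]

By Proposition A9, and recalling what we have just proved, we have

\[ (\mathcal{I}_{\mathrm{S}} \otimes \mathcal{A})^\dagger = \mathrm{SWAP} \, (\mathcal{A}^\dagger \otimes \mathcal{I}_{\mathrm{S}}) \, \mathrm{SWAP} = \mathcal{I}_{\mathrm{S}} \otimes \mathcal{A}^\dagger. \]

To get the thesis, note that $(\mathcal{I}_{\mathrm{S}} \otimes \mathcal{A} \otimes \mathcal{I}_{\mathrm{S}'})^\dagger = [(\mathcal{I}_{\mathrm{S}} \otimes \mathcal{A}) \otimes \mathcal{I}_{\mathrm{S}'}]^\dagger$. We have just proved that

\[ [(\mathcal{I}_{\mathrm{S}} \otimes \mathcal{A}) \otimes \mathcal{I}_{\mathrm{S}'}]^\dagger = (\mathcal{I}_{\mathrm{S}} \otimes \mathcal{A})^\dagger \otimes \mathcal{I}_{\mathrm{S}'}, \]

and that $(\mathcal{I}_{\mathrm{S}} \otimes \mathcal{A})^\dagger = \mathcal{I}_{\mathrm{S}} \otimes \mathcal{A}^\dagger$, therefore we conclude that $(\mathcal{I}_{\mathrm{S}} \otimes \mathcal{A} \otimes \mathcal{I}_{\mathrm{S}'})^\dagger = \mathcal{I}_{\mathrm{S}} \otimes \mathcal{A}^\dagger \otimes \mathcal{I}_{\mathrm{S}'}$.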

Proposition A10. Let A ∈ Transf (A, B), and B ∈ Transf (C, D). We have (A ⊗ B)† = A† ⊗ B † .

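Proof. Take any system S, any state $\rho \in \mathsf{St}_1(\mathrm{B} \otimes \mathrm{D} \otimes \mathrm{S})$, and any effect
$E \in \mathsf{Eff}(\mathrm{A} \otimes \mathrm{C} \otimes \mathrm{S})$. We have
\[ (E|\,(\mathcal{A} \otimes \mathcal{B})^\dagger \otimes \mathcal{I}_S\,|\rho) = (\rho^\dagger|\,\mathcal{A} \otimes \mathcal{B} \otimes \mathcal{I}_S\,|E^\dagger) = (\rho^\dagger|\,(\mathcal{A} \otimes \mathcal{I}_D \otimes \mathcal{I}_S)(\mathcal{I}_A \otimes \mathcal{B} \otimes \mathcal{I}_S)\,|E^\dagger). \]
Now define $\xi := (\mathcal{I}_A \otimes \mathcal{B} \otimes \mathcal{I}_S)E^\dagger$, hence
\[ (E|\,(\mathcal{A} \otimes \mathcal{B})^\dagger \otimes \mathcal{I}_S\,|\rho) = (\rho^\dagger|\,\mathcal{A} \otimes \mathcal{I}_D \otimes \mathcal{I}_S\,|\xi) = (\xi^\dagger|\,\mathcal{A}^\dagger \otimes \mathcal{I}_D \otimes \mathcal{I}_S\,|\rho). \]
By Lemmas A3 and A4, we have that $\xi^\dagger = E\,(\mathcal{I}_A \otimes \mathcal{B}^\dagger \otimes \mathcal{I}_S)$, so
\[ (E|\,(\mathcal{A} \otimes \mathcal{B})^\dagger \otimes \mathcal{I}_S\,|\rho) = (E|\,\mathcal{A}^\dagger \otimes \mathcal{B}^\dagger \otimes \mathcal{I}_S\,|\rho), \]
whence the thesis.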

This means that the dagger respects the composition of diagrams, and corresponds to the action
of ﬂipping a diagram with respect to a vertical axis.
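
As an illustration only, consider the special case in which transformations are real matrices and the dagger is the matrix transpose. This is an assumption made for the sketch, not the general setting of this appendix, and the helper names are hypothetical. Under that assumption, the following Python sketch checks the statements of Propositions A9 and A10 on small integer matrices.

```python
def transpose(a):
    """Stand-in for the dagger in this sketch: the matrix transpose."""
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    """Sequential composition: 'a after b' as a matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Parallel composition: the Kronecker product of two matrices."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

# Hypothetical example data: A maps a 2-dim system to a 3-dim one, B maps it back.
A = [[1, 2], [0, 1], [3, -1]]
B = [[2, 0, 1], [1, 1, 0]]

# Proposition A9 in this special case: (BA)^T = A^T B^T.
sequential_ok = transpose(matmul(B, A)) == matmul(transpose(A), transpose(B))

# Proposition A10 in this special case: (A (x) B)^T = A^T (x) B^T.
parallel_ok = transpose(kron(A, B)) == kron(transpose(A), transpose(B))
```

Both flags evaluate to `True`; integer entries are used so that the comparisons are exact.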

References
1. Feynman, R.P.; Leighton, R.; Sands, M. The Feynman Lectures on Physics. The Deﬁnitive and Extended Edition;
Addison Wesley: Boston, MA, USA, 2005.
2. Sorkin, R.D. Quantum mechanics as quantum measure theory. Mod. Phys. Lett. A 1994, 9, 3119–3127.
3. Sorkin, R.D. Quantum Classical Correspondence: The 4th Drexel Symposium on Quantum Nonintegrability;
Chapter Quantum Measure Theory and Its Interpretation; International Press: Boston, MA, USA, 1997;
pp. 229–251.
4. Barnum, H.; Müller, M.P.; Ududec, C. Higher-order interference and single-system postulates characterizing
quantum theory. New J. Phys. 2014, 16, 123029.
5. Bolotin, A. On the ongoing experiments looking for higher-order interference: What are they really testing?
arXiv 2016, arXiv:1611.06461.
6. Dakić, B.; Paterek, T.; Brukner, Č. Density cubes and higher-order interference theories. New J. Phys. 2014,
16, 023028.
7. Lee, C.M.; Selby, J.H. Deriving Grover’s lower bound from simple physical principles. New J. Phys. 2016,
18, 093047.
8. Lee, C.M.; Selby, J.H. Generalised phase kick-back: The structure of computational algorithms from physical
principles. New J. Phys. 2016, 18, 033023.
9. Lee, C.M.; Selby, J.H. Higher-order interference in extensions of quantum theory. Found. Phys. 2017,
47, 89–112.
10. Niestegge, G. Three-slit experiments and quantum nonlocality. Found. Phys. 2013, 43, 805–812.
11. Ududec, C. Perspectives on the Formalism of Quantum Theory. Ph.D. Thesis, University of Waterloo,
Waterloo, ON, Canada, 2012.
12. Ududec, C.; Barnum, H.; Emerson, J. Probabilistic Interference in Operational Models. 2009, in preparation.
13. Ududec, C.; Barnum, H.; Emerson, J. Three slit experiments and the structure of quantum theory. Found. Phys.
2011, 41, 396–405.
14. Lee, C.M.; Selby, J.H. A no-go theorem for theories that decohere to quantum mechanics. arXiv 2017,
arXiv:1701.07449.


15. Barnum, H.; Barrett, J.; Leifer, M.; Wilce, A. Generalized no-broadcasting theorem. Phys. Rev. Lett. 2007,
99, 240501.
16. Barnum, H.; Wilce, A. Information processing in convex operational theories. Electron. Notes Theor. Comput. Sci.
2011, 270, 3–15.
17. Barrett, J. Information processing in generalized probabilistic theories. Phys. Rev. A 2007, 75, 032304.
18. Barrett, J.; de Beaudrap, N.; Hoban, M.J.; Lee, C.M. The computational landscape of general physical theories.
arXiv 2017, arXiv:1702.08483.
19. Chiribella, G.; D’Ariano, G.M.; Perinotti, P. Probabilistic theories with purification. Phys. Rev. A 2010,
81, 062348.
20. Chiribella, G.; D’Ariano, G.M.; Perinotti, P. Informational derivation of quantum theory. Phys. Rev. A 2011,
84, 012311.
21. Chiribella, G.; Spekkens, R.W. (Eds.) Quantum Theory: Informational Foundations and Foils; Fundamental
Theories of Physics; Springer: Dordrecht, The Netherlands, 2016; Volume 181.
22. Dakić, B.; Brukner, Č. Quantum Theory and Beyond: Is Entanglement Special; Cambridge University Press:
Cambridge, UK, 2011; pp. 365–392.
23. Hardy, L. Quantum Theory From Five Reasonable Axioms. arXiv 2001, arXiv:quant-ph/0101012.
24. Hardy, L. Foliable Operational Structures for General Probabilistic Theories; Cambridge University Press:
Cambridge, UK, 2011; pp. 409–442.
25. Lee, C.M.; Barrett, J. Computation in generalised probabilistic theories. New J. Phys. 2015, 17, 083001.
26. Lee, C.M.; Hoban, M.J. Bounds on the power of proofs and advice in general physical theories. Proc. R. Soc. A
2016, 472, 20160076.
27. Lee, C.M.; Hoban, M.J. The information content of systems in general physical theories. In Proceedings of
the 7th International Workshop on Physics and Computation, Manchester, UK, 14 July 2016; Volume 214,
pp. 22–28.
28. Masanes, L.; Müller, M.P. A derivation of quantum theory from physical requirements. New J. Phys. 2011,
13, 063001.
29. Hardy, L. Reformulating and reconstructing quantum theory. arXiv 2011, arXiv:1104.2066.
30. Chiribella, G.; Scandolo, C.M. Entanglement as an axiomatic foundation for statistical mechanics. arXiv
2016, arXiv:1608.04459.
31. Krumm, M.; Barnum, H.; Barrett, J.; Müller, M.P. Thermodynamics and the structure of quantum theory.
New J. Phys. 2017, 19, 043025.
32. Jin, F.; Liu, Y.; Geng, J.; Huang, P.; Ma, W.; Shi, M.; Duan, C.; Shi, F.; Rong, X.; Du, J. Experimental test of
Born’s rule by inspecting third-order quantum interference on a single spin in solids. Phys. Rev. A 2017,
95, 012107.
33. Kauten, T.; Keil, R.; Kaufmann, T.; Pressl, B.; Brukner, Č.; Weihs, G. Obtaining tight bounds on higher-order
interferences with a 5-path interferometer. New J. Phys. 2017, 19, 033017.
34. Park, D.K.; Moussa, O.; Laflamme, R. Three path interference using nuclear magnetic resonance: A test of
the consistency of Born’s rule. New J. Phys. 2012, 14, 113025.
35. Sinha, A.; Vijay, A.H.; Sinha, U. On the superposition principle in interference experiments. Sci. Rep. 2015,
5, 10304.
36. Sinha, U.; Couteau, C.; Jennewein, T.; Laflamme, R.; Weihs, G. Ruling out multi-order interference in
quantum mechanics. Science 2010, 329, 418–421.
37. Barnum, H.; Graydon, M.; Wilce, A. Composites and categories of Euclidean Jordan algebras. arXiv 2016,
arXiv:1606.09331.
38. Chiribella, G. Dilation of states and processes in operational-probabilistic theories. In Proceedings of the
11th workshop on Quantum Physics and Logic, Kyoto, Japan, 4–6 June 2014; Volume 172, pp. 1–14.
39. Chiribella, G.; D’Ariano, G.M.; Perinotti, P. Quantum Theory: Informational Foundations and Foils; Chapter
Quantum from Principles; Springer: Dordrecht, The Netherlands, 2016; pp. 171–221.
40. Hardy, L. Quantum Theory: Informational Foundations and Foils; Chapter Reconstructing Quantum Theory;
Springer: Dordrecht, The Netherlands, 2016; pp. 223–248.
41. Abramsky, S.; Coecke, B. A categorical semantics of quantum protocols. In Proceedings of the 19th Annual
IEEE Symposium on Logic in Computer Science, Turku, Finland, 13–17 July 2004; pp. 415–425.
42. Coecke, B. Kindergarten quantum mechanics: Lecture notes. AIP Conf. Proc. 2006, 810, 81–98.


43. Coecke, B. Quantum picturalism. Contemp. Phys. 2010, 51, 59.

44. Coecke, B.; Duncan, R.; Kissinger, A.; Wang, Q. Quantum Theory: Informational Foundations and Foils; Chapter
Generalised Compositional Theories and Diagrammatic Reasoning; Springer: Dordrecht, The Netherlands,
2016; pp. 309–366.
45. Coecke, B.; Kissinger, A. Picturing Quantum Processes: A First Course in Quantum Theory and Diagrammatic
Reasoning; Cambridge University Press: Cambridge, UK, 2017.
46. Selinger, P. A survey of graphical languages for monoidal categories. In New Structures for Physics; Coecke, B., Ed.;
Springer: Berlin, Germany, 2011; pp. 289–356.
47. Wootters, W.K. Local accessibility of quantum states. In Complexity, Entropy and the Physics of Information;
Zurek, W.H., Ed.; Westview Press: Boulder, CO, USA, 1990; pp. 39–46.
48. Chiribella, G.; Scandolo, C.M. Entanglement and thermodynamics in general probabilistic theories.
New J. Phys. 2015, 17, 103027.
49. Chiribella, G.; Scandolo, C.M. Purity in microcanonical thermodynamics: A tale of three resource theories.
arXiv 2016, arXiv:1608.0446.
50. Gour, G.; Müller, M.P.; Narasimhachar, V.; Spekkens, R.W.; Yunger Halpern, N. The resource theory of
informational nonequilibrium in thermodynamics. Phys. Rep. 2015, 583, 1–58.
51. Horodecki, M.; Horodecki, P.; Oppenheim, J. Reversible transformations from pure to mixed states and the
unique measure of information. Phys. Rev. A 2003, 67, 062104.
52. Selby, J.H.; Coecke, B. Leaks: Quantum, classical, intermediate, and more. Entropy 2017, 19, 174.
53. Coecke, B. Terminality implies non-signalling. In Proceedings of the 11th workshop on Quantum Physics
and Logic, Kyoto, Japan, 4–6 June 2014; Volume 172, pp. 27–35.
54. Chiribella, G.; Scandolo, C.M. Operational axioms for diagonalizing states. In Proceedings of the
12th International Workshop on Quantum Physics and Logic, Oxford, UK, 15–17 July 2015; Volume 195,
pp. 96–115.
55. Chiribella, G.; Scandolo, C.M. Conservation of information and the foundations of quantum mechanics.
EPJ Web Conf. 2015, 95, 03003.
56. Disilvestro, L.; Markham, D. Quantum protocols within Spekkens’ toy model. Phys. Rev. A 2017, 95, 052324.
57. D’Ariano, G.M.; Manessi, F.; Perinotti, P.; Tosini, A. Fermionic computation is non-local tomographic and
violates monogamy of entanglement. Europhys. Lett. 2014, 107, 20009.
58. D’Ariano, G.M.; Manessi, F.; Perinotti, P.; Tosini, A. The Feynman problem and fermionic entanglement:
Fermionic theory versus qubit theory. Int. J. Mod. Phys. A 2014, 29, 1430025.
59. Chiribella, G.; Yuan, X. Bridging the gap between general probabilistic theories and the device-independent
framework for nonlocality and contextuality. Inf. Comput. 2016, 250, 15–49.
60. Pﬁster, C.; Wehner, S. An information-theoretic principle implies that any discrete physical theory is classical.
Nat. Commun. 2013, 4, 1851.
61. Alfsen, E.M.; Shultz, F.W. Geometry of State Spaces of Operator Algebras; Mathematics Theory & Applications;
Birkhäuser: Basel, Switzerland, 2003.
62. Barnum, H.; Barrett, J.; Krumm, M.; Müller, M.P. Entropy, majorization and thermodynamics in general
probabilistic theories. In Proceedings of the 12th International Workshop on Quantum Physics and Logic,
Oxford, UK, 15–17 July 2015; Volume 195, pp. 43–58.
63. Chiribella, G.; Yuan, X. Measurement sharpness cuts nonlocality and contextuality in every physical theory.
arXiv 2014, arXiv:1404.3348.
64. Alfsen, E.M.; Shultz, F.W. State spaces of Jordan algebras. Acta Math. 1978, 140, 155–190.
65. Iochum, B. Cônes Autopolaires et Algèbres de Jordan; Lecture Notes in Mathematics; Springer: Berlin/Heidelberg,
Germany, 1984; Volume 1049, doi: 10.1007/BFb0071358. (In French)
66. Coecke, B.; Selby, J.; Tull, S. Two roads to classicality. arXiv 2017, arXiv:1701.07400.
67. Barnum, H. Spectrality as a Tool for Quantum Reconstruction: Higher-Order Interference, Jordan State
Space Characterizations, Aug. 2009. Talk Given at the Conference “Reconstructing Quantum Theory”,
August 9–11, Perimeter Institute for Theoretical Physics. Available online: http://pirsa.org/09080016/
(accessed on 26 May 2017).
68. Niestegge, G. Conditional probability, three-slit experiments, and the Jordan algebra structure of quantum
mechanics. Adv. Math. Phys. 2012, 2012, 156573.
69. Selby, J.H.; Coecke, B. Process-theoretic characterisation of the Hermitian adjoint. arXiv 2016, arXiv:1606.05086.


70. Müller, M.P.; Oppenheim, J.; Dahlsten, O.C.O. The black hole information problem beyond quantum theory.
J. High Energy Phys. 2012, 2012, 9.
71. Müller, M.P.; Ududec, C. Structure of reversible computation determines the self-duality of quantum theory.
Phys. Rev. Lett. 2012, 108, 130401.
72. Marshall, A.W.; Olkin, I.; Arnold, B.C. Inequalities: Theory of Majorization and Its Applications; Springer Series
in Statistics; Springer: New York, NY, USA, 2011.
73. Müller, M.P.; Dahlsten, O.C.O.; Vedral, V. Unifying typical entanglement and coin tossing: On randomization
in probabilistic theories. Commun. Math. Phys. 2012, 316, 441–487.
74. Wilde, M.M. Quantum Information Theory, 2nd ed.; Cambridge University Press: Cambridge, UK, 2017.
75. Selinger, P. Dagger compact closed categories and completely positive maps. Electron. Notes Theor. Comput. Sci.
2007, 170, 139–163.

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).

entropy
Article
Leaks: Quantum, Classical, Intermediate and More
John Selby 1 and Bob Coecke 2, *
1 Department of Physics, Imperial College London, Kensington, London SW7 2AZ, UK;
[email protected]
2 Department of Computer Science, University of Oxford, Oxford OX1 3PA, UK
* Correspondence: [email protected]; Tel.: +44-7881-333990

Academic Editors: Giacomo Mauro D’Ariano and Paolo Perinotti

Received: 26 January 2017; Accepted: 12 April 2017; Published: 19 April 2017

Abstract: We introduce the notion of a leak for general process theories and identify quantum
theory as a theory with minimal leakage, while classical theory has maximal leakage. We provide
a construction that adjoins leaks to theories, an instance of which describes the emergence of classical
theory by adjoining decoherence leaks to quantum theory. Finally, we show that defining a notion
of purity for processes in general process theories has to make reference to the leaks of that theory,
a feature missing in standard definitions; hence, we propose a refined definition and study the
resulting notion of purity for quantum, classical and intermediate theories.

Keywords: process theory; classical limit; purity

1. Introduction
Can we explain why the world is quantum by finding some sense in which quantum theory is an
optimal theory? Broadcasting distinguishes quantum theory from classical theory in that quantum
states cannot be broadcast [1], but neither can the states of many other theories [2,3]. Non-locality is
a measure of non-classicality, and quantum theory is non-local, but not maximally so [4]. Therefore,
is there some manner in which we can uniquely single out quantum theory? In this paper, we show
that quantum theory is a leak-free theory, whilst classical theory is maximally leaking. We formalise
the notion of a leak, which can roughly be thought of as a ‘one-sided broadcasting map’, within
the process-theoretic framework [3,5,6] as a particular type of process, which, as the name suggests,
accounts for leaking state-data into the environment.
Moreover, there is a natural way to introduce leaks to any theory, and by doing so, we obtain new
theories. We call this the leak construction. In particular, classical theory can be obtained from quantum
theory in this manner, where, in this example, the leaking is then nothing but decoherence [7,8]. Hence,
the concept of a leak allows us to generalise decoherence to arbitrary process theories. Besides classical
theory, any theory characterised by some finite-dimensional C*-algebra can be obtained in this manner
from quantum theory. In fact, as we show in a follow-up paper [9], only C*-algebras can be obtained
in this manner. Leaks therefore capture the operational content of finite-dimensional C*-algebras
on-the-nose, in a manner that does not involve any additive structure, nor a ∗-operation.
Finally, we observe that defining purity of processes in process theories with leaks is problematic;
in particular, this is the case for classical theory. Making explicit use of the concept of a leak, we
therefore propose a new definition that makes sense for arbitrary processes in arbitrary process theories.

Related Work
As explained in detail in the follow-up paper [9], the leak construction is related to the
“constructions of classical system types” in [10–12]. More specifically, in the case of quantum theory, we
obtain exactly the same result, but in a much simpler way, with much less use of structure and guided

Entropy 2017, 19, 174; doi:10.3390/e19040174 www.mdpi.com/journal/entropy


by a clear operational meaning. The notion of a leak is closely related to the decomposability
of a state-space [13] in the generalised probabilistic theory framework as, at least under some
standard assumptions, such as the “no-restriction hypothesis”, each is equivalent to the existence of a
non-disturbing measurement as discussed in [14].

2. Process Theories with Discarding...

A process theory [3,6] is a collection of systems that are represented by wires and processes that
are represented by boxes with wires as inputs (at the bottom) and outputs (at the top). Moreover, when
we plug these boxes together:
[Diagram: three boxes f, g and h wired together, with wires labelled by the systems A, B, C and D.]

the resulting diagram should also be a process. To be mathematically more precise, the data that make
up a diagram are:

• the boxes that appear in the diagram and

• how these boxes are wired together, including the overall ordering of inputs/outputs.

Hence, two diagrams are equal when these data match up.
By a circuit [3,6], we mean a diagram that can be constructed by means of the obvious operations
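of parallel composition ⊗ and sequential composition ◦ of boxes. For example, the following diagram
is a circuit:

[Diagram: the boxes f and h side by side, with g on top of part of their outputs next to a plain wire.]

\[ = (\mathrm{id} \otimes g) \circ (f \otimes h) \]

where id stands for the plain wire drawn next to g.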

Composite systems, denoted A ⊗ B, then simply arise by pairing wires:

A⊗B := A B

Remark 1. A process theory with circuits as diagrams can also be deﬁned as a strict symmetric monoidal
category. Strictness means that associativities and unit laws hold on-the-nose, unlike the symmetric monoidal
categories of concrete mathematical models where non-trivial associativity and unit natural isomorphisms are
required. Fortunately, by Mac Lane’s strictiﬁcation theorem [15], every such category is categorically equivalent
(although not isomorphic) to a strict one, which means that for all practical purposes, it can be thought of as
a strict one.

A state is a process without inputs; an effect is a process without outputs; and a number is a
process with neither inputs nor outputs. One special number is the empty diagram:


which in most theories coincides with the number one.

Throughout this paper, for each system in a process theory, we postulate the existence of
a discarding effect, which is interpreted just as its name indicates and which is denoted as:

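We also make the natural assumption that discarding effects compose:

[Diagram: the discarding effect on A ⊗ B is defined as the discarding effects on A and on B side by side.] (1)

A process f is causal if we have:

[Diagram: f followed by discarding of its outputs equals discarding of its inputs.] (2)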

and a theory is causal if all of the processes of the theory are causal. Therefore, except for the fact
that it composes, discarding is not subject to any deﬁning constraints. In a sense, its behaviour is
entirely implicit within its role within the deﬁning equation of causality. In particular, by Equation (2)
where f is taken to be an effect, it immediately follows that the only effects in a causal theory are the
discarding effects. In this form, the axiom of causality traces back to [16]. When restricting to causal
processes, a process theory is non-signalling [17]; hence, the causality of a theory is vital to guarantee
compatibility with relativity.

Example 1 (Classical probability theory). When viewing probability theory as a process theory, systems are
n-state classical systems and boxes are n × m stochastic matrices, and so, in particular, states are probability
distributions. Discarding is given by marginalisation, and so, causality boils down to the fact that the entries of
a probability distribution add up to one and that the entries in each column of a stochastic matrix add up to one.
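
The causality condition of Example 1 can be checked mechanically. The following Python sketch is a minimal illustration with hypothetical helper names and made-up example matrices: processes are column-stochastic matrices stored as lists of rows, discarding sums each column, and sequential composition is the matrix product. Exact fractions are used so that the comparisons with one are not affected by rounding.

```python
from fractions import Fraction as F

def compose(g, f):
    """Sequential composition 'g after f' as a matrix product."""
    return [[sum(g[i][k] * f[k][j] for k in range(len(f)))
             for j in range(len(f[0]))] for i in range(len(g))]

def discard(m):
    """Discarding (marginalisation) applied to the outputs: sum each column."""
    return [sum(row[j] for row in m) for j in range(len(m[0]))]

def is_causal(m):
    """Causality, Equation (2): discarding the outputs gives discarding of the inputs."""
    return all(total == 1 for total in discard(m))

# Hypothetical example data.
f = [[F(1, 2), F(0)],
     [F(1, 2), F(1, 3)],
     [F(0),    F(2, 3)]]          # 2-state input, 3-state output
g = [[F(1), F(1, 4), F(0)],
     [F(0), F(3, 4), F(1)]]       # 3-state input, 2-state output
state = [[F(1, 4)], [F(3, 4)]]    # a probability distribution as a 2 x 1 matrix

gf = compose(g, f)                # composite of causal processes
sub_normalised = [[F(1, 2)], [F(1, 4)]]  # entries sum to 3/4: not causal
```

In this sketch `is_causal` holds for `f`, `g`, `state` and the composite `gf`, while it fails for `sub_normalised`, which is the kind of non-causal process discussed later in this section.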

Example 2 (Quantum theory). Quantum theory as a process theory has ﬁnite dimensional Hilbert spaces H
as its systems and completely positive trace preserving (CPTP) maps:

ξ : B(H) → B(H′)

as its processes. Causality for density operators means having trace one and for completely positive maps means
being trace-preserving. One can also include classical data as additional systems, and then measurements and
controlled operations are also processes. If this is the case, we will often denote the classical systems as dotted
wires to distinguish them from quantum wires. Speciﬁcally, measurements are processes from quantum to
classical systems where the probabilities of obtaining the different outcomes are encoded in the classical system.
Causality then implies that, for projective measurements, the projectors form a resolution of the identity and,
for general measurements that the POVM elements sum to discarding. A full description and a pedagogical
introduction to this theory is in [3,6,18].

Typically, as will be the case in the examples below, we will want to describe both causal and
non-causal processes. We therefore will still, for each system, have a discarding map, which specifies
the causal processes, but there will also be other processes that will not satisfy Equation (2). There are
two main reasons for this. The first is to allow us to discuss events, i.e., processes that we cannot make
happen deterministically, but that can occur as a particular outcome in some experiment. This allows us
to obtain the probability of a specific outcome and, via suitable renormalisation, to describe
post-selection. The second reason is mathematical simplicity:
it is often much easier to define the process theory, or various structures within it, in the non-causal
setting and then to restrict to the causal sub-theory when necessary.


Example 3 (Non-causal extension of quantum theory). To describe non-causal processes in quantum theory,
rather than taking processes as completely positive trace-preserving maps, we instead just require that they are
completely positive. It is very standard within quantum theory to consider such processes, for example Dirac
bras are non-causal or, more generally, individual POVM elements are non-causal.
An important tool is the Choi–Jamiolkowski isomorphism between transformations and bipartite states.
One direction of this isomorphism can be realised causally, using the Bell state, which we represent, up to
a normalisation factor, with a cup-shaped wire:

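[Diagram: the Bell state drawn as 1/D times a cup-shaped wire, where D is defined by a diagram that is not recoverable here.]

which allows us to “bend wires up”:

[Diagram: f ↦ 1/D times f applied to one side of the cup-shaped wire.]

This associates with each (causal) process a (causal) bipartite state. The other direction is however not realisable
causally, as it relies on the Bell effect, which we represent with a cap-shaped wire:

[Diagram: ρ ↦ D times ρ with one of its outputs bent down by the cap-shaped wire.]

The fact that this is an isomorphism provides us with the following intuitive diagrammatic rule (justifying the
representation of these as a cup and cap):

[Diagram: a wire bent by a cup and then by a cap equals a straight wire.] (3)

It is then clear that the cap cannot be causal (even up to a rescaling) as, if it were, then the identity transformation
would be separable, i.e.:

[Diagram: a chain of equalities rewriting the identity wire, via Equation (3), as a discarding effect followed by a state ρ.]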

where in the second step we relied on the fact that by causality, all effects must be discarding, so in particular,
the cap, as well as Equation (1).

Example 4 (Non-causal extension of classical theory). We can similarly extend classical theory, taking
processes as n × m matrices with positive real elements as opposed to stochastic matrices. This again allows us
to discuss particular outcomes of measurements, which may not happen with certainty, and moreover, gives us
a classical equivalent of the Choi–Jamiolkowski isomorphism where rather than using the Bell state and effect,
we use the perfectly correlated state and effect, again denoted by a cup and a cap. These can be deﬁned in terms of
the orthonormal basis states and effects as:

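[Diagram: 1/n times the cup-shaped wire, composed with the basis effects i and j, equals $\delta_{ij}/n$, and the
cap-shaped wire, composed with the basis states i and j, equals $\delta_{ij}$,]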

respectively. It is simple to check that these also satisfy Equation (3) as we would expect from the choice of the
diagrammatic representation.


This forms the basic structures needed to describe the physical content of process theories;
however, we will need some further tools for the proofs. These are all deﬁned in the standard way for
categorical quantum mechanics and surveyed in Appendix A for those unfamiliar with the ﬁeld.

3. ... and Leaks

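Definition 1. A leak is a process:

[Diagram: a box with input A and outputs A and L.] (4)

which has discarding as a right counit, that is:

[Diagram: the leak with its L output discarded equals the identity wire on A.] (5)

Proposition 1. All leaks are causal.

Proof. Causality of a leak means:

[Diagram: the leak with both of its outputs discarded equals discarding of its input.]

and this equation is obtained by discarding the outputs in (5).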

When we have multiple leaks around, we may often represent them with different colours to
distinguish them.

Proposition 2. Leaks compose to give leaks.

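Proof. Sequential composition of leaks is again a leak:

[Diagram: a leak with leaked system L1 followed by a leak with leaked system L2 is defined to be a leak with leaked system L1 ⊗ L2.]

since we have:

[Diagram: discarding L1 ⊗ L2 amounts to discarding L1 and L2 separately, which gives the identity wire.]

and the same goes for parallel composition:

[Diagram: a leak on A with leaked system L1 beside a leak on B with leaked system L2 is defined to be a leak on A ⊗ B with leaked system L1 ⊗ L2.]

since we have:

[Diagram: discarding L1 ⊗ L2 amounts to discarding L1 and L2 separately, which gives the identity wires on A and B, that is, the identity wire on A ⊗ B.]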

For classical probability theory, copying of support elements provides a leak:

[copying leak] : X → X × X :: x ↦ (x, x)

since if we discard a copy, we are back with what we started off with. In fact, strictly speaking, what
we are dealing with here is not a copying operation: while it copies pure classical states, it does
not do so for impure ones. Instead, it is broadcasting, that is, besides Equation (5), discarding
is also a left counit for the leaking process:

[Diagram: the leak with its A output discarded equals the identity wire.] (6)

Note that this requires L := A in Equation (4). This is the maximal possible leak for any system, as all
of the information about the ingoing state is leaked out.
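
The distinction between copying and broadcasting can be made concrete. The Python sketch below (hypothetical helper names, made-up example data) extends the map x ↦ (x, x) linearly to probability distributions and checks that discarding either output returns the input distribution, as in Equations (5) and (6), while the output agrees with two independent copies only for a pure state.

```python
from fractions import Fraction as F

def copy_leak(p):
    """Linear extension of x -> (x, x): a joint distribution on pairs (x, y)."""
    n = len(p)
    return {(x, y): (p[x] if x == y else F(0)) for x in range(n) for y in range(n)}

def discard_second(joint, n):
    """Marginalise over the second (leaked) output."""
    return [sum(pr for (x, y), pr in joint.items() if x == i) for i in range(n)]

def discard_first(joint, n):
    """Marginalise over the first output."""
    return [sum(pr for (x, y), pr in joint.items() if y == i) for i in range(n)]

def product(p, q):
    """Two independent systems: the product distribution."""
    return {(x, y): px * qy for x, px in enumerate(p) for y, qy in enumerate(q)}

mixed = [F(1, 3), F(2, 3)]   # an impure classical state
pure = [F(1), F(0)]          # a pure classical state

broadcast_right = discard_second(copy_leak(mixed), 2) == mixed   # Equation (5)
broadcast_left = discard_first(copy_leak(mixed), 2) == mixed     # Equation (6)
copies_pure = copy_leak(pure) == product(pure, pure)
copies_mixed = copy_leak(mixed) == product(mixed, mixed)
```

The first three flags are `True` and `copies_mixed` is `False`: the mixed state is broadcast (both marginals are correct) but not copied.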
On the other hand, quantum theory does not allow for broadcasting [1]. In fact, the only kind
of leak quantum theory admits is constant leaking. This immediately follows from the following
fact about quantum processes, which states that any dilation of a pure process, i.e., representation as
a process with an extra output that is discarded, must separate:

Proposition 3. For pure quantum processes f , we have:

[Diagram: if f equals g with its extra output discarded, then g equals f placed beside a state ρ.] (7)

with ρ causal. That is, if a reduced process f is pure, then the process g we started from must separate.

Proof. See, e.g., [3,6].

Hence, since the identity is pure, by Proposition 3, it follows from the deﬁning equation of a
leak (5) that any leak for quantum theory must be constant, that is of the form:

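[Diagram: the identity wire placed beside a state ρ.] (8)

where we need to take the state to be causal:

[Diagram: ρ followed by discarding equals the empty diagram.] (9)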


Remark 2. In quantum theory, Proposition 3 can actually be taken as a deﬁnition of the purity of processes, that
is a quantum process f is pure if and only if all dilations of f separate. However, in theories with non-constant
leaks, this deﬁnition must be revised as we discuss in detail in Section 7.

Of course, (8) is also a leak for classical probability theory, and another example arises by
combining broadcasting and a constant:

[Diagram: the broadcasting leak combined with a constant state ρ.] (10)

At least qualitatively, quantum theory can therefore be described as a minimally-leaking theory,
as all leaks are constant leaks, whilst classical theory is maximally leaking, as for each system, there is
a maximal leak. We will now provide quantitative substance to this claim.

4. Quality of a Leak
For the sake of simplicity of the argument, we will restrict ourselves to a special kind of
process theories that admit the notion of a feedback wire. Explicitly spelling out the process-theoretic
characterisation of a feedback wire as in [19] goes beyond the scope of this paper. It sufﬁces to know
that they exist in both quantum and classical theory, where they can be constructed in the obvious way
using the cups and caps of Examples 3 and 4. The behaviour of such a feedback wire is that of a wire
of the shape:

for which we have the obvious equations, such as:

In particular, by means of such a wire, we can feed an output of a process back into it as an input:

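[Diagram: a process f, with wires labelled A, B and C, one of whose outputs is fed back into it as an input through the feedback wire.]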

i.e., we create a feedback-loop.

Feedback-loops allow us to ask questions such as how closely some outgoing data will match
some ingoing data. In particular, for the case of leaks where L = A, we can measure how closely the
leaked data matches the original (while ignoring the output) via the following diagram:

[Diagram: the leak with its A output discarded and its leaked output fed back to its input through the feedback wire.]

However, what tends to be more useful, particularly in the case where L ≠ A, is not asking precisely
how well the outgoing data matches the ingoing data, but how well the outgoing data encodes
the ingoing data. For example, all of the information could be there just scrambled up or encoded in
some other system type. We therefore want to consider maximising over potential restoration maps
r : L → A, where r is taken to be causal. We call this notion the quality of a leak:

[Diagram: Q := Max_r of the previous diagram with r applied to the leaked output before it is fed back.]

If the structure of the numbers in a process theory is sufﬁciently rich, e.g., they are the real
numbers or probabilities, one can moreover renormalise this quantity as follows:
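[Diagram: the renormalised quality, a ratio whose numerator is Q minus a diagram and whose denominator is a
difference of two diagrams; the diagrams themselves are not recoverable here.] (11)

where the circle indicates the feedback-loop applied to the identity. As a leak, the quality of
broadcasting is one, since we have:

[Diagram: an equality obtained using Equation (6).]

while for constant leaks, it is zero, since we have:

[Diagram: a chain of equalities involving the state ρ, obtained using Equation (9).]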

We therefore see that quantum theory is a minimally leaking theory as the renormalised quality
for any leak is zero, whilst classical theory is maximal as every system has a leak with renormalised
quality of one. In the next section, we consider how to increase the amount of leaking for a theory,
providing a process-theoretic perspective on the quantum to classical transition.

Example 5. If a process theory admits sums (cf. [6] or Appendix A), then set:

[Diagram: a leak labelled c, defined as c times one leak plus q times the constant leak with state ρ.]

with c + q = 1. Now, quality in the form Equation (11) is c.

5. A Representation for All Classical-Quantum Leaks

We already characterised all quantum leaks as being constant leaks; we next characterise all
classical leaks.

Proposition 4. All classical leaks are of the form:

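[Diagram: the leak with input A and outputs A and L equals the copying leak on A followed by l applied to the leaked copy.] (12)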

where l is any causal classical process.


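Proof. First, let us define, using the non-causal “cap” of Example 4:

[Diagram: the definition of l in terms of the leak and the cap.]

Despite the fact that this is defined using a non-causal process, the composite process l is actually causal:

[Diagram: a chain of equalities showing that l followed by discarding equals discarding.]

We can then use the matrix representation of the leak (see Appendix A), in which the leak is written as

\[ \sum_{ijk} \Delta^{ij}_{k}\,(i \otimes j) \circ k \]

with i and j basis states on the two outputs, k a basis effect on the input, and
$\Delta^{ij}_{k} \in \mathbb{R}^{+}$. The leak condition then implies that:

\[ \sum_{j} \Delta^{ij}_{k} = \delta_{ki} \]

and so, since all of the entries are non-negative:

\[ \Delta^{ij}_{k} = \Delta^{ij}_{k}\,\delta_{ki} \]

Then, we can check that Equation (12) is indeed satisfied:

[Diagram: the right-hand side of Equation (12) equals $\sum_{ijk} \Delta^{ij}_{k}\,\delta_{ki}\,(i \otimes j) \circ k = \sum_{ijk} \Delta^{ij}_{k}\,(i \otimes j) \circ k$, which is the leak.]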
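
The content of Proposition 4 can be illustrated numerically. In the Python sketch below, the index convention `delta[i][j][k]` for the entries Δ with outputs i, j and input k, the helper names, and the example matrix `l` are all assumptions made for illustration. The sketch builds a leak by copying the input and applying a causal process l to the leaked copy, checks the leak condition of Equation (5), and recovers l from the diagonal entries.

```python
from fractions import Fraction as F

def leak_from(l, n):
    """delta[i][j][k] = delta_{ik} * l[j][k]: copy the input k, then apply l to the copy."""
    m = len(l)
    return [[[l[j][k] if i == k else F(0) for k in range(n)]
             for j in range(m)] for i in range(n)]

def discard_leak(delta):
    """Discard the leaked system: sum over j, leaving a matrix indexed [i][k]."""
    n, m = len(delta), len(delta[0])
    return [[sum(delta[i][j][k] for j in range(m)) for k in range(n)]
            for i in range(n)]

def extract_l(delta):
    """Recover l[j][k] from the entries of the leak with i = k."""
    n, m = len(delta), len(delta[0])
    return [[delta[k][j][k] for k in range(n)] for j in range(m)]

def identity(n):
    return [[F(int(i == k)) for k in range(n)] for i in range(n)]

# Hypothetical causal process l from a 2-state system to a 3-state leaked system
# (each column sums to one).
l = [[F(1, 2), F(0)],
     [F(1, 4), F(1, 3)],
     [F(1, 4), F(2, 3)]]

delta = leak_from(l, 2)
satisfies_leak_condition = discard_leak(delta) == identity(2)
recovers_l = extract_l(delta) == l
```

Taking l to be the identity gives the copying leak, while an l whose columns are all equal gives a constant leak.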

We can now also characterise all leaks for composite classical-quantum systems:

Proposition 5. Denoting the classical system by a dotted line and the quantum system by a solid line;
all composite classical quantum systems have leaks of the form:

[Diagram: a leak on the composite classical-quantum system, built from the box L.] (13)

where L is any causal process from classical to quantum systems.


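Proof. Note that any composite leak defines a quantum leak as:

[Diagram: a quantum leak built from the composite leak, with normalisation factor 1/D.]

and therefore, as we know all quantum leaks separate:

[Diagram: this quantum leak equals the identity wire placed beside a state ρ.]

where ρ defines a classical leak as:

[Diagram: a definition relating L and ρ.]

and so putting this together, we have:

[Diagram: a chain of equalities establishing Equation (13).]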

The bottom line is that all of these leaks involve the copying leak as the fundamental ingredient.
This is not all too surprising, since, as we showed in the previous section, it stands for maximal leakage.
The processes l and L then play the role of reducing the leakage, with the extremal case being that l and L
are constant, which produces a constant leak.

6. The Leak-Construction
We now show how one can construct new process theories from old ones by introducing leaks.
This is done by inserting particular processes of the old theory of the form (15) on all of the wires.
The processes (14), to which we refer in the old theory as pre-leaks, then become leaks in the new
theory. Hence, the leak construction turns pre-leaks into leaks.

Theorem 1. Given any process theory and for each system a causal process:

A LA

(14)
A

which is such that the following process is idempotent:

(15)

198
Entropy 2017, 19, 174

and which are chosen coherently for composite systems:

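[pre-leak on A ⊗ B, with leaked system $L_{A \otimes B}$] := [pre-leak on A, with leaked system $L_A$] ⊗ [pre-leak on B, with leaked system $L_B$] (16)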

we can construct a new process theory in which each process (14) is a leak for the system A. This construction
goes as follows:

• systems stay the same;

• one restricts processes to those of the form:

f (17)
A

Proof. By causality of (14):

= (18)

discarding is preserved by the leak-construction. Given the form Equation (17) of the processes in the
theory and due to the idempotence of Equation (15), plain wires have taken the form Equation (15),
so the deﬁning equation of a leak Equation (5) is satisﬁed. To consider the pre-leak in the new theory, we
must apply the leak construction Equation (17), and using the condition for composites Equation (16),
we get the following process in the new theory:

A LA

LA LLA

A LA

A
LA

which is indeed a leak in the new theory:

(18) (15)
= =


which is the form of a plain wire in the new theory, and so, this construction does turn pre-leaks into
leaks. It is moreover straightforward to see that we again obtain a process theory.

Sometimes the leak-construction does nothing, in particular, when the pre-leaks are already leaks:

Example 6 (Trivial). A simple example of the leak construction is the one where the pre-leaks are taken to
already be leaks, since then (17) will reduce to the processes f themselves.

The main motivating example for this construction is of course the following:

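Example 7 (Decoherence). Applying the leak construction with the pre-leak:

[pre-leak] : $\mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H} \otimes \mathcal{H}) :: |i\rangle\langle i| \mapsto |i\rangle\langle i| \otimes |i\rangle\langle i|$

to the process theory of quantum processes (i.e., Example 2), we obtain classical probability theory
(i.e., Example 1).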
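
As a hedged illustration of this example, the sketch below assumes a single qubit, represents density matrices as nested Python lists, and takes the idempotent obtained from the copying pre-leak to be the map that keeps only the diagonal of a density matrix. The function names and the choice of the Hadamard channel are illustrative assumptions, not the authors' notation.

```python
import math


def dagger(m):
    """Conjugate transpose of a square matrix given as nested lists."""
    n = len(m)
    return [[m[j][i].conjugate() for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def decohere(rho):
    """Idempotent from the copying pre-leak: keep only the diagonal."""
    n = len(rho)
    return [[rho[i][j] if i == j else 0j for j in range(n)] for i in range(n)]


def hadamard_channel(rho):
    """A quantum process: conjugation by the Hadamard unitary."""
    s = 1 / math.sqrt(2)
    h = [[s, s], [s, -s]]
    return matmul(matmul(h, rho), dagger(h))


def restricted(process):
    """Sandwich a process between two copies of the idempotent."""
    def sandwiched(rho):
        return decohere(process(decohere(rho)))
    return sandwiched


rho = [[0.7 + 0j, 0.2 - 0.1j], [0.2 + 0.1j, 0.3 + 0j]]

once = decohere(rho)
twice = decohere(once)            # idempotence: same as applying it once
classical_output = restricted(hadamard_channel)(rho)
```

In this sketch the restricted Hadamard channel only ever sees and returns diagonal matrices, so it acts on the probability vector (0.7, 0.3) as a stochastic matrix, which is the sense in which the restricted processes form classical probability theory.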

In the above construction, it is really the idempotents rather than the speciﬁc pre-leaks that
determine the theory that is obtained. We can therefore have several different perspectives on the
“cause” of this idempotent, by considering different pre-leaks from which it could be obtained. Firstly,
we can always take the trivial case, where the pre-leak is just the idempotent itself, i.e., taking the
leaked system as the empty system. There are, however, three alternative forms that always exist in
quantum theory and are more insightful.

Example 8. Firstly we can consider the puriﬁcation f of the idempotent, in the sense of [16]:

= f

This corresponds to the idea that information can never be fundamentally destroyed, only discarded, and so,
we can see this leaking of information into some causally-separated system leading to decoherence. Another
standard way to represent a general process is, via Stinespring dilation [20], as a reversible interaction with
an environment:

U
=
s

and so, we can equivalently view decoherence as arising due to a reversible interaction with some uncontrolled
environment [8]. A ﬁnal example, suggested to us by Rob Spekkens, is that the idempotent can be viewed as
describing a system that lacks a reference frame [21]; the leaked system would then correspond to the reference
system itself. This is the subject of ongoing work and is discussed in the Conclusion.

Example 7 leaves open the question of whether any theories lying in between classical and quantum
theory can be obtained from this leak construction. This question is answered in a forthcoming
paper where the key result is the following theorem:

Theorem 2. The leak construction applied to quantum processes (i.e., Example 2) gives all C*-algebras and
C*-algebras only.


Therefore, despite the weak structure of a leak, for the speciﬁc case of quantum theory, we obtain
precisely the C*-algebras via the leak construction. This leads one to contemplate the view that the
operational essence of (ﬁnite dimensional) C*-algebras is entirely captured by leaks and that the
additional structure of C*-algebras is merely an artefact of the Hilbert space representation.

Remark 3. The leak-construction does not apply to Example 5, since only for c = 0, 1, we have idempotence of (15).

Remark 4. For a process theory in which all systems are compositions $A^{\otimes n}$ of one atomic system A, it suffices
to pick a single process (14) for the system A (where $L_A$ will be of the form $A^{\otimes n}$), since all other such processes
then arise by coherence (16).

Remark 5. If a pre-leak with L := A is co-associative:

then the idempotence of (15) follows from causality of the pre-leak.

Remark 6. The construction in Theorem 1, when modiﬁed by not ﬁxing a pre-leak for each type, but rather
considering all pairs of a system and a corresponding pre-leak, is known as the Karoubi envelope, or Cauchy
completion, or splitting of idempotents. More details on this can be found in [9].

7. Process-Purity from Leaks

In this section, we consider how leaks relate to purity in process theories. The purity (or lack
of purity) of a state is a fundamental concept in quantum theory and is equally important in most
approaches to generalised physical theories. However, there is no reason to regard purity solely as
a property of states; it should be considered for all processes in a theory. Indeed, lack of knowledge
about a process, the noisiness of a channel and detection errors on a POVM-element all correspond
to process-impurities. We will show that deﬁning such a property for general theories, and classical
theory in particular, requires leaks.
In Reference [22], Chiribella et al. introduce the notion of side-information; this can be thought
of as information that is lost during a process that, in principle, could be possessed by some other
agent. The use of this in cryptographic scenarios is clear, where the side-information can be thought
of as being possessed by an eavesdropper attempting to inﬂuence or gain information about some
cryptographic protocol. Diagrammatically, this side information is depicted as:

Side information
about process f
f = g

Lack of side-information for a process would imply that g must separate such that the
side-information is independent of the process f . Indeed, this must be the case for any such g, i.e.:

f = [g with its side-information output discarded] ⟹ g = f ⊗ ρ (19)

or in other words, all dilations of f must separate. As mentioned in Remark 2, the separability of
dilations (cf. Proposition 3) has been proposed as a deﬁnition of process-purity. Indeed for the case of


quantum theory, this corresponds to the expected notion of purity, namely that the CPTP map must
have Kraus rank 1. Remarkably, however, in the form of (19), this definition does not extend to general
processes of classical probability theory, nor indeed to any theory that has broadcasting:

Proposition 6. If a non-trivial theory has broadcasting and one deﬁnes purity by means of (19), then plain
wires (i.e., identity processes) are not pure.

Proof. Assuming identities are pure and applying (19) to the deﬁning equation of a leak (5), we obtain:

= ρ (20)

that is, it is a constant leak. However, then, from the second deﬁning equation of broadcasting,
we obtain:
ρ
(6) (20)
= =

that is, each plain wire is a constant process, and hence, the theory is trivial, since as a consequence,
all processes must then be constant since for (causal) processes, we have:

ρ
ρ
f = = =

f f

Hence, in a non-trivial theory with broadcasting, identities cannot be pure in the sense of (19).

From the ﬁrst part of this proof, namely that this deﬁnition of purity implies that leaks must be
constant, it follows that this issue arises in any theories with non-constant leaks. We can think of this
as the fact that, if a system has a leak, then there is irreducible side-information contained within the
system itself:

Side information
about system A
=
A A

Fortunately, leaks also allow us to ﬁx this problem. Firstly, let us suppose that a theory has leaks
and also has a pure process f . Then, clearly, the following is a dilation of f :


where l is causal. One may therefore consider explicitly bringing leaks into play in the deﬁnition of
purity. A ﬁrst step in this direction is to weaken (19) as follows:

f = g =⇒ ∃ , & l : g = f (21)

However, now, we have the opposite problem: all classical processes, including all states, are pure!
(See Appendix B). It is clear that we are missing a constraint. The original idea was that for a process to
be pure, it should have no side-information that some eavesdropper could take advantage of. However,
we have shown that for some systems, there is irreducible side-information represented by leakage.
Therefore, to ensure that the eavesdropper cannot gain information or inﬂuence the process, we must
demand that the process does not interact with this irreducible side-information, such that leaking
before or after is equivalent:

f
∀ ∃ and ∀ ∃ such that = (22)
f

Hence, we propose the following definition of process-purity, which packages these two conditions,
(21) and (22), into a neat form:

Deﬁnition 2. f is pure if and only if:

f
f = g =⇒ ∃ & : g = = (23)
f

This ensures that the only side-information is this irreducible kind, i.e., system leakage,
and moreover, that pure processes do not interact with this irreducible side information. To further
motivate this definition, we will show that it provides a sensible definition for quantum, classical and
composite systems. However, first, note that for states, this definition reduces to:

Example 9. A state ψ is pure if we have:

ψ = [σ with its second output discarded] ⟹ σ = ψ ⊗ ρ

This is the same as the original definition, and so, we see that it is only for general processes that
this new definition is necessary. Similarly, in the case of quantum theory, it is only the first condition
that provides a non-trivial constraint:

Example 10 (quantum purity). Since, for quantum theory, the only leaks are constant leaks, Condition (21) in
Definition 2 reduces to (19), while Condition (22) becomes trivial.
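
For quantum processes, the text earlier identifies purity in the sense of (19) with the CPTP map having Kraus rank 1. The sketch below is one way to test this from any Kraus decomposition; it relies on the standard facts that the Kraus rank equals the rank of the Choi matrix J and that a positive semidefinite J has rank at most one exactly when Tr(J^2) = Tr(J)^2. The function names and example channels are illustrative assumptions.

```python
def hs_inner(a, b):
    """Hilbert-Schmidt inner product Tr(a^dagger b) of two matrices."""
    return sum(a[i][j].conjugate() * b[i][j]
               for i in range(len(a)) for j in range(len(a[0])))


def is_kraus_rank_one(kraus_ops, tol=1e-9):
    """True when the Choi matrix J = sum_a vec(K_a) vec(K_a)^dagger has rank one.

    J is positive semidefinite, so rank(J) <= 1 exactly when
    Tr(J^2) == Tr(J)^2. Both traces are computed directly from the
    Kraus operators: Tr(J) = sum_a <K_a, K_a> and
    Tr(J^2) = sum_{a,b} |<K_a, K_b>|^2.
    """
    tr_j = sum(hs_inner(k, k).real for k in kraus_ops)
    tr_j2 = sum(abs(hs_inner(a, b)) ** 2 for a in kraus_ops for b in kraus_ops)
    return abs(tr_j2 - tr_j ** 2) <= tol * max(1.0, tr_j ** 2)


s = 2 ** -0.5
pauli_x = [[0, 1], [1, 0]]
proj_0 = [[1, 0], [0, 0]]
proj_1 = [[0, 0], [0, 1]]

unitary_is_pure = is_kraus_rank_one([pauli_x])
dephasing_is_pure = is_kraus_rank_one([proj_0, proj_1])
# A redundant decomposition of the same unitary channel is still rank one.
redundant_is_pure = is_kraus_rank_one([[[0, s], [s, 0]], [[0, s], [s, 0]]])
```

The third case shows why the test is phrased in terms of the Choi matrix rather than the number of Kraus operators supplied: the answer does not depend on the decomposition chosen.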


In the classical case, by contrast, as we have mentioned above, (21) is satisfied by all classical processes,
and so, it is only (22) that needs to be considered:

Example 11 (classical purity). All pure classical processes, between an n-state and an m-state system, are of the form:

n (24)
n
r
n

where we can deﬁne the ‘upside-down broadcasting map’ by:

and the black/white dot is any process that satisﬁes:

= and =

Proof. We prove here that pure classical processes must be of this form and leave the proof that any
process of this form is pure to Appendix C.
First consider the condition:

f
∀ ∃ such that =
f

for the special case where:

and using the standard form for classical leaks to write:

l
:=

Then, we can show that:

l
f
f = f = =
f


This implies that, for all i and j:

$$f_i^j = f_i^j \, l_i^j$$

so, for each i and j:

$$f_i^j = 0 \quad \text{or} \quad l_i^j = 1$$

where $f_i^j$ and $l_i^j$ denote the matrix elements of f and l. Causality of l then implies that, for each j, there can only be a single i where $l_i^j = 1$, and so, for all other
i, we must have $f_i^j = 0$. This means that in each row of $f_i^j$, there is at most a single non-zero element.
We can run through this argument in the opposite direction using the condition:

f
∀ ∃ such that =
f

which shows that $f_i^j$ can have at most a single non-zero element in each column. This is precisely
what is enforced by the black/white dot in the above form; the value of the non-zero elements is then
determined by the state r. Hence, we can write f in the desired form.

Example 12. If we consider purity for causal classical processes, then we ﬁnd that the pure processes are those
that are reversible (i.e., are isometries).

Proof. The deﬁnition of purity, and the standard form for classical leaks, requires that:

l
f =
f

and so, we have:

l
l
= = f = =
f f

Therefore, f is reversible in the sense that it has a left-inverse, i.e., l.
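
The classical statements above can be made concrete with stochastic matrices. This is a minimal sketch, assuming the convention `f[j][i]` = probability of output j given input i (so causal processes have columns summing to one); the helper names are hypothetical. It checks the "at most one non-zero entry per row and column" condition and builds a causal left-inverse for a causal pure process.

```python
def matmul(a, b):
    """Matrix product of nested-list matrices (a after b)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def is_causal(f):
    """Entries are non-negative and every column sums to one."""
    cols = range(len(f[0]))
    return all(all(row[c] >= 0 for row in f) and
               abs(sum(row[c] for row in f) - 1) < 1e-12 for c in cols)


def has_single_nonzero_per_row_and_column(f):
    """The condition derived in the proof of Example 11."""
    rows_ok = all(sum(1 for x in row if x != 0) <= 1 for row in f)
    cols_ok = all(sum(1 for row in f if row[c] != 0) <= 1
                  for c in range(len(f[0])))
    return rows_ok and cols_ok


def causal_left_inverse(f):
    """Transpose of f, with unused outputs of f sent (arbitrarily) to state 0.

    Only meaningful when f is causal and passes the check above, in which
    case its non-zero entries are all equal to 1.
    """
    n_out, n_in = len(f), len(f[0])
    l = [[f[j][i] for j in range(n_out)] for i in range(n_in)]
    for j in range(n_out):
        if all(l[i][j] == 0 for i in range(n_in)):
            l[0][j] = 1
    return l


f = [[0, 1], [0, 0], [1, 0]]      # causal and pure: 2 states -> 3 states
g = [[0.5, 0], [0.5, 1]]          # causal but not pure

l = causal_left_inverse(f)
l_after_f = matmul(l, f)
```

Here `l_after_f` is the 2 × 2 identity, so `f` is reversible in the sense of having a causal left-inverse, whereas `g` fails the row condition.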

Finally, we consider the composite case, where the conjunction of (21) and (22) is necessary:

Example 13 (Composite classical-quantum purity). Pure processes are:

f
(25)


where we denote the classical system with a dotted line, and:

f (26)
i

is pure for all i.

Proof. Again, we prove the interesting direction here that pure processes on composite systems must
be of this form and, again, leave the other direction to Appendix C.
Note that a generic process can be written as:

An (almost) identical argument to the classical case shows that if this is pure, it can be written as:

We therefore move on to considering the other part of the definition of purity, namely that any dilation
can be written as a leak. This means that any dilation of this process can be written as:

f l f l
= ∑ i
i i i

r
r

Now, note that any collection of dilations of the processes:

f := fi = gi
i

deﬁnes a dilation of the whole process, which must be able to be written as a leak:

f l
∑ gi
i
= ∑ i
i i i i

r r

Therefore, each $g_i$ must separate, and hence, the $f_i$ are each pure quantum processes.


An immediate consequence of this is the following.

Proposition 7. The pure quantum to classical or classical to quantum maps are separable.

Proof. First note that,

i
ĩ
= π2 :=

Then, using the above result regarding pure maps for composite quantum classical systems, we have,

f f f
ĩ
= = ĩ ĩ

r r r
ĩ

Similarly, we obtain separability for pure quantum to classical maps.

This means that there is no pure way to transform between classical and quantum information.

8. Conclusions
In this paper, we introduce the concept of leaks to generalised process theories. A leak
can be thought of as a “one-way” broadcasting map. Leaks prove to be very useful for
understanding various aspects of quantum theory from a physically well-motivated perspective.
In particular, we show:

• that quantum theory is a leak-free theory, whilst classical theory is maximally leaking, giving
a clear separation between the theories for which quantum theory is optimal.
• how to construct sub-theories via a “leak construction”, which can be thought of as the sub-theories
that can be obtained from a dynamical decoherence mechanism. For quantum theory, we can
obtain classical theory, composite quantum classical theory and, generally, finite dimensional
C*-algebras from this construction [9].
• a characterisation of the leaks and pure processes for quantum, classical and composite systems; in
particular, we demonstrate that there is no pure way to transform quantum systems into classical
systems or vice versa.
• that leaks are essential to deﬁning purity of processes; we therefore introduce a novel deﬁnition
of purity of processes, which makes sense both for quantum theory and for classical theory.

Future Work
In this paper, we have shown how classical theory emerges from quantum theory due to the leak
construction, providing a process-theoretic perspective on why the world on a large scale appears to
us to be classical. It is natural to ask: Is there some deeper theory of nature than quantum theory from
which quantum theory emerges in an analogous way? This is the subject of a forthcoming paper [23].
A second, related question, would be to ask: What does it imply about a theory if it can obtain classical
theory via a leak construction; is the ability for this to happen in quantum theory special or is this a
generic feature of general theories?
We have also shown that quantum theory is minimally leaking and classical theory maximally;
moreover, if we start from a process theory describing ﬁnite dimensional C*-algebras, then quantum
theory is singled out as the unique minimally-leaking theory. Can this idea lead to a complete
reconstruction of quantum theory [24]?


As mentioned in Example 8, one interpretation of the leak construction is as a way to represent
systems for which there is a missing reference frame; that is, we can write the pre-leak as ([21],
Section IVB):

[pre-leak] = $\int_G \mathrm{d}g \; U_g \otimes g$

where G is a group associated with a reference frame for a particular degree of freedom, Ug is the
representation of G on the system of interest and g the state of the reference system. Note, however,
that making sense of this integral for general symmetry groups requires the reference be an infinite
dimensional quantum system and so is beyond the scope of this paper. One could replace, at least
for compact groups, the integral by a finite convex mixture (using the results of [16], Corollary 33
from Caratheodory’s theorem), for which the resulting idempotent would be the same. This can be
thought of as there only being a finite set of possible orientations for the reference frame. However,
a comprehensive understanding of the connections here would require consideration of the infinite
dimensional case. Moreover, we know that in the finite dimensional case, the leak construction leads
to C*-algebraic systems only; however, it remains an interesting open question as to what the leak
construction leads to for infinite dimensional systems.
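
To illustrate the remark about replacing the group integral by a finite mixture, the sketch below makes several assumptions that are not in the text: the group is U(1), the system is a qutrit on which the group acts as diag(1, e^{iθ}, e^{2iθ}), and the idempotent is taken to be the group average of conjugation by U_θ (the pre-leak with the reference system discarded). Under these assumptions, an average over three equally spaced angles already reproduces the full dephasing obtained from the continuous average, while two angles do not.

```python
import cmath
import math


def twirl(rho, angles):
    """Uniform mixture over `angles` of rho -> U rho U^dagger.

    U = diag(exp(i*m*theta)) for m = 0..d-1, so the (m, n) entry of rho
    is multiplied by exp(i*(m - n)*theta).
    """
    d = len(rho)
    out = [[0j] * d for _ in range(d)]
    for theta in angles:
        for m in range(d):
            for n in range(d):
                phase = cmath.exp(1j * (m - n) * theta)
                out[m][n] += phase * rho[m][n] / len(angles)
    return out


def max_off_diagonal(m):
    """Largest modulus among the off-diagonal entries."""
    d = len(m)
    return max(abs(m[i][j]) for i in range(d) for j in range(d) if i != j)


rho = [[0.5, 0.2 + 0.1j, 0.1j],
       [0.2 - 0.1j, 0.3, 0.05],
       [-0.1j, 0.05, 0.2]]

three_angles = [2 * math.pi * k / 3 for k in range(3)]
two_angles = [0.0, math.pi]

dephased = twirl(rho, three_angles)        # all coherences removed
partially = twirl(rho, two_angles)         # the (0, 2) coherence survives
dephased_twice = twirl(dephased, three_angles)
```

The number of angles needed depends on the representation: in this sketch it must exceed the largest difference between the integer charges labelling the basis states.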

Acknowledgments: We thank Aleks Kissinger, Dan Marsden, Rob Spekkens and Sean Tull for useful feedback.
John Selby was supported by the EPSRC (Engineering and Physical Sciences Research Council) through the
Controlled Quantum Dynamics Centre for Doctoral Training, and Bob Coecke is supported by the U.S. Air Force
Office of Scientific Research.
Author Contributions: Both authors contributed equally to all aspects of this work. Both authors have read and
approved the final manuscript.
Conflicts of Interest: The authors declare no conflict of interest.

Appendix A. Mathematical Tools for Proofs

Deﬁnition A1. Sums can be deﬁned by the fact that they distribute over diagrams, that is:

$$c \circ \Big(\sum_i b_i\Big) \circ a \;=\; \sum_i \big(c \circ b_i \circ a\big)$$

In particular, in classical probability theory, we can take sums of diagrams where the sum is
the standard sum of matrices. In fact, this provides us with a matrix calculus for our diagrams.
In particular, we have a basis and co-basis for each system, denoted:
$\{\text{state } i\}_{i=1}^{n}$ and $\{\text{effect } j\}_{j=1}^{n}$

respectively, such that they are orthonormal:

$$(\text{effect } j) \circ (\text{state } i) = \delta_{ij}$$


Then, this provides a decomposition of the identity

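[identity wire] = $\sum_i (\text{state } i) \circ (\text{effect } i)$

which allows us to write any process as:

f = [f with the decomposition of the identity inserted on its input and on its output] := $\sum_{ij} f_{ji}$ [basis state and basis effect labelled by i and j]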

where it is simple to check that sequential composition then coincides with a matrix multiplication,
parallel composition with the matrix tensor product and diagrammatic sum with the sum of matrices.
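
The correspondence between diagrams and matrices can be sketched in a few lines. The helper names below are hypothetical, and exact rational arithmetic is used only so that the equalities can be checked without rounding; the block checks that sequential and parallel composition are compatible, and that summing state-after-effect over a basis gives the identity.

```python
from fractions import Fraction as Fr


def compose(g, f):
    """Sequential composition g after f: matrix multiplication."""
    return [[sum(g[i][k] * f[k][j] for k in range(len(f)))
             for j in range(len(f[0]))] for i in range(len(g))]


def tensor(a, b):
    """Parallel composition: the matrix tensor (Kronecker) product."""
    return [[a[i][j] * b[k][m]
             for j in range(len(a[0])) for m in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def add(a, b):
    """Diagrammatic sum: entrywise sum of matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def basis_state(i, n):
    """Column vector (n x 1 matrix) with a one in position i."""
    return [[Fr(1) if r == i else Fr(0)] for r in range(n)]


def basis_effect(i, n):
    """Row vector (1 x n matrix) with a one in position i."""
    return [[Fr(1) if c == i else Fr(0) for c in range(n)]]


f = [[Fr(1, 2), Fr(0)], [Fr(1, 2), Fr(1)]]
g = [[Fr(0), Fr(1)], [Fr(1), Fr(0)]]

# Composing in parallel then in sequence equals the other order.
lhs = compose(tensor(g, g), tensor(f, f))
rhs = tensor(compose(g, f), compose(g, f))

# Decomposition of the identity: sum over i of (state i) after (effect i).
resolution = add(compose(basis_state(0, 2), basis_effect(0, 2)),
                 compose(basis_state(1, 2), basis_effect(1, 2)))
```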

Definition A2. For each classical system type, we have a family of spiders diagrammatically defined by, firstly:

···
···
··· ···
··· ··· =
··· ···
···

and secondly, that the symmetries of the representation as spiders are respected. Alternately, spiders can be
deﬁned via the matrix representation as:

···
··· i i
:= ∑i
··· i i
···

This family of maps is particularly important as, for classical theory at least, they allow us to deﬁne
various concepts that we have used throughout the paper in a uniﬁed way. Firstly, the broadcasting
map can now be seen as just an example spider with one input and two outputs, but moreover,
we have:
= = =

The feedback-loop we introduced can also now be interpreted as the composite of two spiders:

We moreover want to consider a way to join spiders of different dimensionality (denoted by using
a different colour), which is exactly what the black/white dots achieve.


Deﬁnition A3. Diagrammatically, the black/white dots are any process satisfying:

··· ···
···
··· = = ···
··· ···
···
···
···
···
···

which is equivalent to how they were introduced in Example 11. Alternatively, their matrix representation is:

π1
m i
l
:= ∑ (A1)
i =1 i
n
π2

requiring that l ≤ min(n, m) and that the $\pi_i$ are arbitrary permutations of the basis elements. These are then just
matrices with elements in {0, 1} with at most a single one in each row and column.

Appendix B. Dilations of Classical Processes

Any dilation of a classical process f can be written as:

f = F =⇒ F = f

To check this, define l by its matrix elements as:

$$l^{j}_{ik} := \begin{cases} 1 & \text{if } f_{ik} = 0 \\ F^{kj}_{i} / f_{ik} & \text{otherwise} \end{cases}$$

and then, it is simple to check this satisfies the above equation and, moreover, is causal.

Appendix C. Pure Quantum-Classical Composite Processes

We need to prove that our deﬁnition of purity, i.e., Conditions (22) and (21), is satisﬁed by any
process of the form:


where:

fi := f
i

is pure for all i.

That (22) is satisfied is straightforward to prove once the following observation, easily verified by
direct calculation, is made:

∀ l ∃ l˜

such that l and l˜ are both causal and:

l l˜
=

To check this, simply deﬁne l˜ as:

s
l
l˜ := + ∑
j∈ J j

where J = Ker .
That (21) is satisﬁed is also simple if, using the purity of the f i , we can write the dilation as:

fi si
∑ i
i

and note that this can be written as a leak by deﬁning:

l si
:= where l := ∑
i i

References
1. Barnum, H.; Caves, C.M.; Fuchs, C.A.; Jozsa, R.; Schumacher, B. Noncommuting mixed states cannot be
broadcast. Phys. Rev. Lett. 1996, 76, 2818.
2. Barnum, H.; Barrett, J.; Leifer, M.; Wilce, A. A generalized no-broadcasting theorem. Phys. Rev. Lett. 2007,
99, 240501.
3. Coecke, B.; Kissinger, A. Categorical quantum mechanics I: Causal quantum processes. In Categories for the
Working Philosopher; Landry, E., Ed.; Oxford University Press: Oxford, UK, 2016.
4. Popescu, S.; Rohrlich, D. Quantum nonlocality as an axiom. Found. Phys. 1994, 24, 379–385.
5. Abramsky, S.; Coecke, B. A categorical semantics of quantum protocols. In Proceedings of the 19th Annual
IEEE Symposium on Logic in Computer Science (LICS), Washington, DC, USA, 13–17 July 2004; pp. 415–425.


6. Coecke, B.; Kissinger, A. Picturing Quantum Processes. A First Course in Quantum Theory and Diagrammatic
Reasoning; Cambridge University Press: Cambridge, UK, 2016.
7. Kuperberg, G. The capacity of hybrid quantum memory. IEEE Trans. Inf. Theory 2003, 49, 1465–1473.
8. Zurek, W.H. Quantum darwinism. Nat. Phys. 2009, 5, 181–188.
9. Coecke, B.; Selby, J.; Tull, S. Two roads to classicality. arXiv 2017, arXiv:1701.07400.
10. Selinger, P. Idempotents in Dagger Categories (Extended Abstract). Electron. Notes Theor. Comput. Sci. 2008,
210, 107–122.
11. Heunen, C.; Kissinger, A.; Selinger, P. Completely positive projections and biproducts. In Proceedings of the
10th International Workshop on Quantum Physics and Logic, Barcelona, Spain, 17–19 July 2013.
12. Cunningham, O.; Heunen, C. Axiomatizing complete positivity. arXiv 2015, arXiv:1506.02931.
13. Barrett, J. Information processing in generalized probabilistic theories. Phys. Rev. A 2007, 75, 032304.
14. Richens, J.; Selby, J.; Al-Safi, S. Entanglement is an inevitable feature of any non-classical theory. arXiv 2016,
arXiv:1610.00682.
15. Mac Lane, S. Categories for the Working Mathematician; Springer: Berlin/Heidelberg, Germany, 1998.
16. Chiribella, G.; D’Ariano, G.M.; Perinotti, P. Probabilistic theories with puriﬁcation. Phys. Rev. A 2010,
81, 062348.
17. Coecke, B. Terminality implies non-signalling. arXiv 2014, arXiv:1405.3681.
18. Coecke, B.; Kissinger, A. Categorical quantum mechanics II: Classical-quantum interaction. arXiv 2016,
arXiv:1605.08617.
19. Joyal, A.; Street, R.; Verity, D. Traced monoidal categories. In Mathematical Proceedings of the Cambridge
Philosophical Society; Cambridge University Press: Cambridge, UK, 1996; Volume 119, pp. 447–468.
20. Stinespring, W.F. Positive functions on C*-algebras. Proc. Am. Math. Soc. 1955, 6, 211–216.
21. Bartlett, S.D.; Rudolph, T.; Spekkens, R.W. Reference frames, superselection rules, and quantum information.
Rev. Mod. Phys. 2007, 79, 555.
22. Chiribella, G.; D’Ariano, G.M.; Perinotti, P. Quantum from principles. In Quantum Theory: Informational
Foundations and Foils; Springer: Berlin/Heidelberg, Germany, 2016; pp. 171–221.
23. Lee, C.M.; Selby, J.H. A no-go theorem for post-quantum theories that decohere to quantum theory. arXiv
2017, arXiv:1701.07449.
24. Selby, J.; Scandolo, C.M.; Coecke, B. Quantum theory from diagrammatic postulates. Forthcoming submitted.

entropy
Article
Measurement Uncertainty Relations for Position and
Momentum: Relative Entropy Formulation
Alberto Barchielli 1,2,3 , Matteo Gregoratti 1,2, * and Alessandro Toigo 1,3
1 Dipartimento di Matematica, Politecnico di Milano, Piazza Leonardo da Vinci 32, I-20133 Milano, Italy;
[email protected] (A.B.); [email protected] (A.T.)
2 Istituto Nazionale di Alta Matematica (INDAM-GNAMPA), 00185 Roma, Italy
3 Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Milano, 20133 Milano, Italy
* Correspondence: [email protected]; Tel.: +39-0223994569

Received: 26 May 2017; Accepted: 21 June 2017; Published: 24 June 2017

Abstract: Heisenberg’s uncertainty principle has recently led to general measurement uncertainty
relations for quantum systems: incompatible observables can be measured jointly or in sequence only
with some unavoidable approximation, which can be quantiﬁed in various ways. The relative entropy
is the natural theoretical quantiﬁer of the information loss when a ‘true’ probability distribution
is replaced by an approximating one. In this paper, we provide a lower bound for the amount
of information that is lost by replacing the distributions of the sharp position and momentum
observables, as they could be obtained with two separate experiments, by the marginals of any
smeared joint measurement. The bound is obtained by introducing an entropic error function,
and optimizing it over a suitable class of covariant approximate joint measurements. We fully exploit two
cases of target observables: (1) n-dimensional position and momentum vectors; (2) two components
of position and momentum along different directions. In (1), we connect the quantum bound to the
dimension n; in (2), going from parallel to orthogonal directions, we show the transition from highly
incompatible observables to compatible ones. For simplicity, we develop the theory only for Gaussian
states and measurements.

Keywords: measurement uncertainty relations; relative entropy; position; momentum

PACS: 03.65.Ta, 03.65.Ca, 03.67.-a, 03.65.Db

MSC: 81P15, 81P16, 94A17, 81P45

1. Introduction
Uncertainty relations for position and momentum [1] have always been deeply related to the
foundations of Quantum Mechanics. For several decades, their axiomatization has been of ‘preparation’
type: an inviolable lower bound for the widths of the position and momentum distributions, holding in
any quantum state. Such uncertainty relations, which are now known as preparation uncertainty
relations (PURs), were later extended to arbitrary sets of n ≥ 2 observables [2–5]. All PURs
trace back to Robertson’s celebrated formulation [6] of Heisenberg’s uncertainty principle:
for any two observables, represented by self-adjoint operators A and B, the product of the variances
of A and B is bounded from below by the expectation value of their commutator; in formulae,
$\mathrm{Var}_\rho(A)\,\mathrm{Var}_\rho(B) \geq \frac{1}{4}\,|\mathrm{Tr}\{\rho [A, B]\}|^2$, where $\mathrm{Var}_\rho$ is the variance of an observable measured in any
system state ρ. In the case of position Q and momentum P, this inequality gives Heisenberg’s relation
$\mathrm{Var}_\rho(Q)\,\mathrm{Var}_\rho(P) \geq \hbar^2/4$. About 30 years after Heisenberg and Robertson’s formulation, Hirschman
attempted a ﬁrst statement of position and momentum uncertainties in terms of informational
quantities. This led him to a formulation of PURs based on Shannon entropy [7]; his bound was

Entropy 2017, 19, 301; doi:10.3390/e19070301 213 www.mdpi.com/journal/entropy


later refined [8,9], and extended to discrete observables [10]. Other entropic quantities have also been
used [11]. We refer to [12,13] for an extensive review of entropic PURs.
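
Robertson’s inequality quoted above can be checked numerically in a finite-dimensional stand-in. The sketch below is an illustration under explicit assumptions: it uses a qubit with the Pauli operators σx and σy in place of position and momentum (which would require an infinite-dimensional space), restricts to pure states, and uses hypothetical helper names.

```python
import cmath
import math

SIGMA_X = [[0, 1], [1, 0]]
SIGMA_Y = [[0, -1j], [1j, 0]]


def apply(op, psi):
    """Matrix-vector product op |psi>."""
    return [sum(op[i][k] * psi[k] for k in range(len(psi)))
            for i in range(len(op))]


def inner(u, v):
    """Inner product <u|v>, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


def variance(op, psi):
    """Variance of a Hermitian operator in the pure state psi."""
    op_psi = apply(op, psi)
    mean = inner(psi, op_psi).real
    return inner(op_psi, op_psi).real - mean ** 2


def robertson_gap(a, b, psi):
    """Var(a) Var(b) - |<[a, b]>|^2 / 4, which should be non-negative."""
    comm = (inner(psi, apply(a, apply(b, psi)))
            - inner(psi, apply(b, apply(a, psi))))
    return variance(a, psi) * variance(b, psi) - 0.25 * abs(comm) ** 2


def qubit_state(theta, phi):
    """Normalised pure qubit state parametrised by Bloch-sphere angles."""
    return [math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2)]


gaps = [robertson_gap(SIGMA_X, SIGMA_Y, qubit_state(0.1 * a, 0.3 * b))
        for a in range(32) for b in range(21)]
gap_at_north_pole = robertson_gap(SIGMA_X, SIGMA_Y, qubit_state(0.0, 0.0))
```

For the state at θ = 0 both variances equal one and the commutator term also equals one, so the inequality is saturated there; elsewhere on the grid the gap is non-negative up to rounding.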
However, Heisenberg’s original intent [1] was more focused on the unavoidable disturbance that
a measurement of position produces on a subsequent measurement of momentum [14–21]. Trying to
give a better understanding of his idea, more recently new formulations were introduced, based
on a ‘measurement’ interpretation of uncertainty, rather than giving bounds on the probability
distributions of the target observables. Indeed, with the modern development of the quantum theory
of measurement and the introduction of positive operator valued measures and instruments [3,22–26],
it became possible to deal with approximate measurements of incompatible observables and to
formulate measurement uncertainty relations (MURs) for position and momentum, as well as for more
general observables. The MURs quantify the degree of approximation (or inaccuracy and disturbance)
made by replacing the original incompatible observables with a joint approximate measurement of them.
A very rich literature on this topic flourished in the last 20 years, and various kinds of MURs have been
proposed, based on distances between probability distributions, noise quantifications, conditional
entropy, etc. [12,14–21,27–32].
In this paper, we develop a new information-theoretical formulation of MURs for position and
momentum, using the notion of the relative entropy (or Kullback-Leibler divergence) of two probabilities.
The relative entropy $S(p\|q)$ is an informational quantity which is precisely tailored to quantify the
amount of information that is lost by using an approximating probability q in place of the target
one p. Although classical and quantum relative entropies have already been used in the evaluation of
the performances of quantum measurements [24,27,30,33–40], their first application to MURs is very
recent [41].
In [41], only MURs for discrete observables were considered. The present work is a first attempt
to extend that information-theoretical approach to the continuous setting. This extension is not trivial
and reveals peculiar problems, that are not present in the discrete case. However, the nice properties of
the relative entropy, such as its scale invariance, allow for a satisfactory formulation of the entropic
MURs also for position and momentum.
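
Since the paper restricts to Gaussian distributions, the relative entropy and its scale invariance can be illustrated with the standard closed form for two one-dimensional Gaussians. This is a sketch under assumptions: the function name is hypothetical, and the result is expressed in bits (logarithm to base 2), a choice of unit made here for concreteness.

```python
import math


def relative_entropy_gauss(mean_p, var_p, mean_q, var_q):
    """S(p||q) in bits for p = N(mean_p, var_p) and q = N(mean_q, var_q)."""
    nats = 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mean_p - mean_q) ** 2) / var_q
                  - 1.0)
    return nats * math.log2(math.e)


# Rescaling the measured quantity by c rescales means by c and
# variances by c**2, and leaves the relative entropy unchanged.
c = 3.7
original = relative_entropy_gauss(0.3, 1.5, -0.2, 2.0)
rescaled = relative_entropy_gauss(c * 0.3, c ** 2 * 1.5, c * -0.2, c ** 2 * 2.0)

# The relative entropy grows without bound as the target variance shrinks
# while the approximating variance stays fixed.
sharpening = [relative_entropy_gauss(0.0, v, 0.0, 1.0)
              for v in (1.0, 1e-3, 1e-6)]
```

The second check previews one of the peculiarities of the continuous case: a very sharp target distribution makes any fixed approximation arbitrarily costly.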
We deal with position and momentum in two possible scenarios. Firstly, we consider the case
of n-dimensional position and momentum, since it allows one to treat scalar particles, vector
ones, or even multi-particle systems. This is the natural level of generality, and our
treatment extends without difficulty to it. Then, we consider a couple made up of one position and
one momentum component along two different directions of the n-space. In this case, we can see
how our theory behaves when one moves with continuity from a highly incompatible case (parallel
components) to a compatible case (orthogonal ones).
The continuous case needs much care when dealing with arbitrary quantum states and
approximating observables. Indeed, it is difficult to evaluate or even bound the relative entropy
if some assumption is not made on probability distributions. In order to overcome these technicalities
and focus on the quantum content of MURs, in this paper we consider only the case of Gaussian
preparation states and Gaussian measurement apparatuses [2,4,5,42–45]. Moreover, we identify the
class of the approximate joint measurements with the class of the joint POVMs satisfying the same
symmetry properties as their target position and momentum observables [3,23]. We are supported in
this assumption by the fact that, in the discrete case [41], symmetry covariant measurements turn out to
be the best approximations without any hypothesis (see also [17,19,20,29,32] for a similar appearance
of covariance within MURs for different uncertainty measures).
We now sketch the main results of the paper. In the vector case, we consider approximate joint
measurements M of the position Q ≡ ( Q1 , . . . , Qn ) and the momentum P ≡ ( P1 , . . . , Pn ). We find
the following entropic MUR (Theorem 5, Remark 14): for every choice of two positive thresholds


1 , 2 , with 1 2 ≥ h̄2 /4, there exists a Gaussian state ρ with position variance matrix Aρ ≥ 1 1 and
momentum variance matrix Bρ ≥ 2 1 such that
= >
h̄ h̄
S(Qρ M1,ρ ) + S(Pρ M2,ρ ) ≥ n (log e) ln 1 + √ − √ (1)
2 1 2 h̄ + 2 1 2

for all Gaussian approximate joint measurements M of Q and P. Here Qρ and Pρ are the distributions
of position and momentum in the state ρ, and Mρ is the distribution of M in the state ρ, with marginals
M1,ρ and M2,ρ ; the two marginals turn out to be noisy versions of Qρ and Pρ . The lower bound is
strictly positive and it increases linearly with the dimension n. The thresholds $\epsilon_1$ and $\epsilon_2$ are peculiar to
the continuous case and they have a classical explanation: the relative entropy $S(p\|q) \to +\infty$ if the
variance of p vanishes faster than the variance of q, so that, given M, it is trivial to find a state ρ satisfying
(1) if arbitrarily small variances are allowed. What is relevant in our result is that the total loss of
information $S(\mathsf{Q}_\rho \| \mathsf{M}_{1,\rho}) + S(\mathsf{P}_\rho \| \mathsf{M}_{2,\rho})$ exceeds the lower bound even if we forbid target distributions
with small variances.
The MUR (1) shows that there is no Gaussian joint measurement which can approximate arbitrarily
well both Q and P. The lower bound (1) is a consequence of the incompatibility between Q and P and,
indeed, it vanishes in the classical limit h̄ → 0. Both the relative entropies and the lower bound in (1)
are scale invariant. Moreover, for fixed $\epsilon_1$ and $\epsilon_2$, we prove the existence and uniqueness of an optimal
approximate joint measurement, and we fully characterize it.
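
As a numerical illustration of the lower bound in (1), the following minimal sketch evaluates its right-hand side in bits. The function name `vector_mur_bound` and the default ħ = 1 are assumptions made for this example only; they do not come from the paper.

```python
import math

def vector_mur_bound(n, eps1, eps2, hbar=1.0):
    """Right-hand side of the entropic MUR (1), in bits.

    n is the number of degrees of freedom; eps1 and eps2 are the variance
    thresholds for position and momentum (names chosen for this sketch).
    """
    if eps1 <= 0 or eps2 <= 0:
        raise ValueError("thresholds must be positive")
    if eps1 * eps2 < hbar**2 / 4:
        raise ValueError("the MUR (1) requires eps1 * eps2 >= hbar^2 / 4")
    t = hbar / (2.0 * math.sqrt(eps1 * eps2))   # dimensionless ratio
    # ln(1 + t) - hbar / (hbar + 2 sqrt(eps1 eps2)) equals ln(1 + t) - t / (1 + t)
    nats = math.log1p(t) - t / (1.0 + t)
    return n * math.log2(math.e) * nats

# Smallest admissible thresholds for hbar = 1: eps1 * eps2 = 1/4.
bound_one_mode = vector_mur_bound(1, 0.5, 0.5)
bound_three_modes = vector_mur_bound(3, 0.5, 0.5)
```

The bound depends on the thresholds only through the dimensionless ratio ħ/√(ε1 ε2), which is why it is scale invariant, and it tends to zero as ħ → 0 with fixed thresholds.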
In the scalar case, we consider approximate joint measurements M of the position Qu = u · Q
along the direction u and the momentum Pv = v · P along the direction v, where u · v = cos α. We find
two different entropic MURs. The first entropic MUR in the scalar case is similar to the vector case
(Theorem 3, Remark 11). The second one is (Theorem 1):

$$ S(\mathsf{Q}_{u,\rho} \| \mathsf{M}_{1,\rho}) + S(\mathsf{P}_{v,\rho} \| \mathsf{M}_{2,\rho}) \geq c_\rho(\alpha), \tag{2} $$

for all Gaussian states ρ and all Gaussian joint approximate measurements M of Qu and Pv . This lower
bound holds for every Gaussian state ρ without constraints on the position and momentum variances
$\mathrm{Var}(\mathsf{Q}_{u,\rho})$ and $\mathrm{Var}(\mathsf{P}_{v,\rho})$; it is strictly positive unless u and v are orthogonal, but it is state dependent.
Again, the relative entropies and the lower bound are scale invariant.
The paper is organized as follows. In Section 2, we introduce our target position and momentum
observables, we discuss their general properties and define some related quantities (spectral measures,
mean vectors and variance matrices, PURs for second order quantum moments, Weyl operators,
Gaussian states). Section 3 is devoted to the definitions and main properties of the relative and
differential (Shannon) entropies. Section 4 is a review on the entropic PURs in the continuous
case [7–9,46], with a particular focus on their lack of scale invariance. This is a flaw due to the
very definition of differential entropy, and one of the reasons that lead us to introduce relative entropy
based MURs. In Section 5 we construct the covariant observables which will be used as approximate
joint measurements of the position and momentum target observables. Finally, in Section 6 the main
results on MURs that we sketched above are presented in detail. Some conclusions are discussed in
Section 7.

2. Target Observables and States

Let us start with the usual position and momentum operators, which satisfy the canonical
commutation rules:

$$ Q \equiv (Q_1, \ldots, Q_n), \qquad P \equiv (P_1, \ldots, P_n), \qquad [Q_i, P_j] = i\hbar\delta_{ij}. \tag{3} $$


Each of the vector operators has n components; it could be the case of a single particle in one or
more dimensions (n = 1, 2, 3), or several scalar or vector particles, or the quadratures of n modes of the
electromagnetic field. We assume the Hilbert space H to be irreducible for the algebra generated by
the canonical operators Q and P. An observable of the quantum system H is identified with a positive
operator valued measure (POVM); in the paper, we shall consider observables with outcomes in Rk
endowed with its Borel σ-algebra B(Rk ). The use of POVMs to represent observables in quantum
theory is standard and the definition can be found in many textbooks [22,23,26,47]; the alternative
name “non-orthogonal resolutions of the identity” is also used [3–5]. Following [5,23,26,31], a sharp
observable is an observable represented by a projection valued measure (pvm); it is standard to identify
a sharp observable on the outcome space Rk with the k self-adjoint operators corresponding to it by
spectral theorem. Two observables are jointly measurable or compatible if there exists a POVM having
them as marginals. Because of the non-vanishing commutators, each couple Qi , Pi , as well as the
vectors Q, P, are not jointly measurable.
We denote by T (H) the trace class operators on H, by S ⊂ T (H) the subset of the statistical
operators (or states, preparations), and by L(H) the space of the linear bounded operators.

2.1. Position and Momentum

Our target observables will be either n-dimensional position and momentum (vector case) or
position and momentum along two different directions of Rn (scalar case). The second case allows us to
give an example ranging continuously from maximally incompatible observables to compatible ones.

2.1.1. Vector Observables

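As target observables we take Q and P as in (3) and we denote by $\mathsf{Q}(A)$, $\mathsf{P}(B)$, $A, B \in \mathcal{B}(\mathbb{R}^n)$,
their pvm's, that is

$$ Q_i = \int_{\mathbb{R}^n} x_i \, \mathsf{Q}(dx), \qquad P_i = \int_{\mathbb{R}^n} p_i \, \mathsf{P}(dp). \tag{4} $$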
Then, the distributions in the state ρ ∈ S of a sharp position measurement and a sharp momentum measurement
(denoted by $\mathsf{Q}_\rho$ and $\mathsf{P}_\rho$) are absolutely continuous with respect to the Lebesgue measure; we denote by
$f(\bullet|\rho)$ and $g(\bullet|\rho)$ their probability densities: $\forall A, B \in \mathcal{B}(\mathbb{R}^n)$,

$$ \mathsf{Q}_\rho(A) = \mathrm{Tr}\{\rho \mathsf{Q}(A)\} = \int_A f(x|\rho)\,dx, \qquad \mathsf{P}_\rho(B) = \mathrm{Tr}\{\rho \mathsf{P}(B)\} = \int_B g(p|\rho)\,dp. \tag{5} $$

In the Dirac notation, if $|x\rangle$ and $|p\rangle$ are the improper position and momentum eigenvectors,
these densities take the expressions $f(x|\rho) = \langle x|\rho|x\rangle$ and $g(p|\rho) = \langle p|\rho|p\rangle$, respectively. The mean
vectors and the variance matrices of these distributions will be given in (7) and (8).

2.1.2. Scalar Observables

As target observables we take the position along a given direction u and the momentum along
another given direction v:

Qu = u · Q, Pv = v · P, with u, v ∈ Rn , |u| = |v| = 1, u · v = cos α. (6)

In this case we have [ Qu , Pv ] = ih̄ cos α, so that Qu and Pv are not jointly measurable, unless the
directions u and v are orthogonal.
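Their pvm's are denoted by $\mathsf{Q}_u$ and $\mathsf{P}_v$, their distributions in a state ρ by $\mathsf{Q}_{u,\rho}$ and $\mathsf{P}_{v,\rho}$, and their
corresponding probability densities by $f_u(\bullet|\rho)$ and $g_v(\bullet|\rho)$: $\forall A, B \in \mathcal{B}(\mathbb{R})$,

$$ \mathsf{Q}_{u,\rho}(A) = \mathrm{Tr}\{\mathsf{Q}_u(A)\rho\} = \int_A f_u(x|\rho)\,dx, \qquad \mathsf{P}_{v,\rho}(B) = \mathrm{Tr}\{\mathsf{P}_v(B)\rho\} = \int_B g_v(p|\rho)\,dp. $$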


Of course, the densities in the scalar case are marginals of the densities in the vector case.
Means and variances will be given in (11).

2.2. Quantum Moments

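Let $\mathcal{S}_2$ be the set of states for which the second moments of position and momentum are finite:

$$ \mathcal{S}_2 := \left\{ \rho \in \mathcal{S} : \int_{\mathbb{R}^n} |x|^2 f(x|\rho)\,dx < +\infty, \ \int_{\mathbb{R}^n} |p|^2 g(p|\rho)\,dp < +\infty \right\}. $$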

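Then, the mean vector and the variance matrix of the position Q in the state $\rho \in \mathcal{S}_2$ are

$$ a_i^\rho := \int_{\mathbb{R}^n} x_i f(x|\rho)\,dx \equiv \mathrm{Tr}\{\rho Q_i\}, \qquad A_{ij}^\rho := \int_{\mathbb{R}^n} (x_i - a_i^\rho)(x_j - a_j^\rho) f(x|\rho)\,dx \equiv \mathrm{Tr}\{\rho (Q_i - a_i^\rho)(Q_j - a_j^\rho)\}, \tag{7} $$

while for the momentum P we have

$$ b_i^\rho := \int_{\mathbb{R}^n} p_i g(p|\rho)\,dp \equiv \mathrm{Tr}\{\rho P_i\}, \qquad B_{ij}^\rho := \int_{\mathbb{R}^n} (p_i - b_i^\rho)(p_j - b_j^\rho) g(p|\rho)\,dp \equiv \mathrm{Tr}\{\rho (P_i - b_i^\rho)(P_j - b_j^\rho)\}. \tag{8} $$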

For $\rho \in \mathcal{S}_2$ it is possible to introduce also the mixed 'quantum covariances'

$$ C_{ij}^\rho := \mathrm{Tr}\left\{ \rho \, \frac{(Q_i - a_i^\rho)(P_j - b_j^\rho) + (P_j - b_j^\rho)(Q_i - a_i^\rho)}{2} \right\}. \tag{9} $$

Since there is no joint measurement for the position Q and momentum P, the quantum covariances
$C_{ij}^\rho$ are not covariances of a joint distribution, and thus they do not have a classical probabilistic
interpretation.
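By means of the moments above, we construct the three real n × n matrices $A^\rho$, $B^\rho$, $C^\rho$,
the 2n-dimensional vector $\mu^\rho$ and the symmetric 2n × 2n matrix $V^\rho$, with

$$ \mu^\rho := \begin{pmatrix} a^\rho \\ b^\rho \end{pmatrix}, \qquad V^\rho := \begin{pmatrix} A^\rho & C^\rho \\ (C^\rho)^T & B^\rho \end{pmatrix}. \tag{10} $$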

We say V ρ is the quantum variance matrix of position and momentum in the state ρ. In [2]
dimensionless canonical operators are considered, but apart from this, our matrix V ρ corresponds to
their “noise matrix in real form”; the name “variance matrix” is also used [44,48].
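In a similar way, we can introduce all the moments related to the position $Q_u$ and momentum $P_v$
introduced in (6). For $\rho \in \mathcal{S}_2$, the means and variances are respectively

$$ u \cdot a^\rho, \quad \mathrm{Var}(\mathsf{Q}_{u,\rho}) = u \cdot A^\rho u, \qquad v \cdot b^\rho, \quad \mathrm{Var}(\mathsf{P}_{v,\rho}) = v \cdot B^\rho v. \tag{11} $$

Similarly to (9), we have also the 'quantum covariance' $u \cdot C^\rho v \equiv v \cdot (C^\rho)^T u$. Then, we collect the
two means in a single vector and we introduce the variance matrix:

$$ \mu_{u,v}^\rho := \begin{pmatrix} u \cdot a^\rho \\ v \cdot b^\rho \end{pmatrix}, \qquad V_{u,v}^\rho := \begin{pmatrix} u \cdot A^\rho u & u \cdot C^\rho v \\ u \cdot C^\rho v & v \cdot B^\rho v \end{pmatrix}. \tag{12} $$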


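Proposition 1. Let $V = \begin{pmatrix} A & C \\ C^T & B \end{pmatrix}$ be a real symmetric 2n × 2n block matrix with the same dimensions of
a quantum variance matrix. Define

$$ V_\pm := \begin{pmatrix} A & C \pm \frac{i\hbar}{2}\mathbb{1} \\ C^T \mp \frac{i\hbar}{2}\mathbb{1} & B \end{pmatrix} \equiv V \pm \frac{i}{2}\Omega, \qquad \text{with} \quad \Omega := \begin{pmatrix} 0 & \hbar\mathbb{1} \\ -\hbar\mathbb{1} & 0 \end{pmatrix}. \tag{13} $$

Then

$$ V = V^\rho \ \text{for some state } \rho \in \mathcal{S}_2 \iff V_+ \geq 0 \iff V_- \geq 0. \tag{14} $$

In this case we have: V ≥ 0, A > 0, B > 0, and

$$ (u \cdot A u)(v \cdot B v) \geq (u \cdot C v)^2 + \frac{\hbar^2}{4} (u \cdot v)^2, \qquad \forall u \in \mathbb{R}^n, \ \forall v \in \mathbb{R}^n. \tag{15} $$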

The inequalities (14) for V± tell us exactly when a (positive semi-definite) real matrix V is the quantum
variance matrix of position and momentum in a state ρ. Moreover, they are the multidimensional
version of the usual uncertainty principle expressed through the variances [2,3,5], hence they represent
a form of PURs. The block matrix Ω in the definition of $V_\pm$ is useful to compress formulae involving
position and momentum; moreover, it makes it simpler to compare our equations with their frequent
dimensionless versions (with ħ = 1) in the literature [43,44].
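
For a single degree of freedom (n = 1) the criterion (14) can be checked by hand: $V_+$ is a 2 × 2 Hermitian matrix, which is positive semi-definite exactly when its diagonal entries and its determinant are non-negative. The sketch below encodes this test; the function name and the tolerance parameter are hypothetical choices for this illustration.

```python
def is_quantum_variance_1d(a, b, c, hbar=1.0, tol=1e-12):
    """Test V_+ >= 0 for n = 1, where V = [[a, c], [c, b]].

    V_+ = [[a, c + i*hbar/2], [c - i*hbar/2, b]] is Hermitian, so it is
    positive semi-definite iff a >= 0, b >= 0 and det V_+ >= 0, with
    det V_+ = a*b - c**2 - hbar**2/4.
    """
    det_plus = a * b - c * c - hbar**2 / 4.0
    return a >= 0.0 and b >= 0.0 and det_plus >= -tol

# A classically valid covariance matrix (a*b - c**2 > 0) need not be quantum.
classical_only = is_quantum_variance_1d(1.0, 1.0, 0.9)   # 1 - 0.81 < 1/4
quantum_ok = is_quantum_variance_1d(1.0, 1.0, 0.8)       # 1 - 0.64 >= 1/4
```

The first example shows that positivity of V alone is not enough: the extra term ħ²/4 in the determinant is the quantum constraint.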

Proof. Equivalences (14) are well known, see e.g., [3] (Section 1.1.5), [5] (Equation (2.20)), and [2]
(Theorem 2). Then $V = \frac{1}{2} V_+ + \frac{1}{2} V_- \geq 0$.

By using the real block vector $\begin{pmatrix} \alpha u \\ \beta v \end{pmatrix}$, with arbitrary α, β ∈ R and given $u, v \in \mathbb{R}^n$,
the semi-positivity (14) implies

$$ \begin{pmatrix} u \cdot A u & u \cdot C v \pm \frac{i\hbar}{2} u \cdot v \\ v \cdot C^T u \mp \frac{i\hbar}{2} v \cdot u & v \cdot B v \end{pmatrix} \geq 0, \qquad \forall u \in \mathbb{R}^n, \ \forall v \in \mathbb{R}^n, $$

which in turn implies A ≥ 0, B ≥ 0 and (15). Then, by choosing $u = v = u_i$, where $u_1, \ldots, u_n$ are the
eigenvectors of A (since A is a real symmetric matrix, $u_i \in \mathbb{R}^n$ for all i), one gets the strict positivity of
all the eigenvalues of A; analogously, one gets B > 0.

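Inequality (15), evaluated at the two directions u and v of (6), becomes the uncertainty rule à la Robertson [6] for the
observables in (6) (a position component and a momentum component spanning an arbitrary angle α):

$$ \mathrm{Var}(\mathsf{Q}_{u,\rho}) \, \mathrm{Var}(\mathsf{P}_{v,\rho}) \geq (u \cdot C^\rho v)^2 + \frac{\hbar^2}{4} (\cos\alpha)^2. \tag{16} $$

Inequality (16) is equivalent to

$$ V_{u,v}^\rho \pm \frac{i\hbar}{2} \cos\alpha \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \geq 0. \tag{17} $$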
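
To see how the Robertson-type bound (16) interpolates between the incompatible and the compatible case, the sketch below computes the gap between its two sides for given blocks A, B, C and unit vectors u, v. The helper names and the example state (A = B = ħ/2 times the identity, C = 0, ħ = 1, n = 2) are assumptions chosen for illustration.

```python
import math

def quad(x, M, y):
    """Bilinear form x . M y for nested-list matrices."""
    return sum(x[i] * M[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

def robertson_gap(A, B, C, u, v, hbar=1.0):
    """Left-hand side minus right-hand side of (16); non-negative for a quantum state."""
    cos_alpha = sum(ui * vi for ui, vi in zip(u, v))
    lhs = quad(u, A, u) * quad(v, B, v)
    rhs = quad(u, C, v) ** 2 + (hbar**2 / 4.0) * cos_alpha**2
    return lhs - rhs

def direction(alpha):
    """Unit vector in the plane at angle alpha from the first axis."""
    return [math.cos(alpha), math.sin(alpha)]

A = [[0.5, 0.0], [0.0, 0.5]]   # position variances (hbar = 1)
B = [[0.5, 0.0], [0.0, 0.5]]   # momentum variances
C = [[0.0, 0.0], [0.0, 0.0]]   # no position-momentum correlations
u = direction(0.0)
gaps = [robertson_gap(A, B, C, u, direction(alpha)) for alpha in (0.0, math.pi / 3, math.pi / 2)]
```

For this state the gap equals (ħ²/4)(sin α)²: the bound is saturated for parallel directions and becomes trivial for orthogonal ones.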

Since V± are block matrices, their positive semi-deﬁniteness can be studied by means of the Schur
complements [49–51]. However, as V± are complex block matrices with a very peculiar structure,
special results hold for them. Before summarizing the properties of V± in the next proposition, we need
a simple auxiliary algebraic lemma.

Lemma 1. Let A and B be complex self-adjoint matrices such that A ≥ B ≥ 0. Then det A ≥ det B ≥ 0,
and the equality det A = det B holds iff A = B.


Proof. Let $\lambda_i^\downarrow(A)$ and $\lambda_i^\downarrow(B)$ be the decreasingly ordered sequences of the eigenvalues of A and B,
respectively. Then, by Weyl's inequality, A ≥ B ≥ 0 implies $\lambda_i^\downarrow(A) \geq \lambda_i^\downarrow(B) \geq 0$ for every i [52]
(Section III.2). This gives the first statement. Moreover, if A ≥ B ≥ 0 and det A = det B, we get
$\lambda_i^\downarrow(A) = \lambda_i^\downarrow(B)$ for every i. Then A = B because A − B ≥ 0 and Tr{A − B} = 0.

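Proposition 2. Let $V = \begin{pmatrix} A & C \\ C^T & B \end{pmatrix}$ be a real symmetric 2n × 2n matrix with the same dimensions of
a quantum variance matrix. Then $V_+ \geq 0$ (or, equivalently, $V_- \geq 0$) if and only if A > 0 and

$$ B \geq \left( C^T \mp \frac{i\hbar}{2}\mathbb{1} \right) A^{-1} \left( C \pm \frac{i\hbar}{2}\mathbb{1} \right) \equiv C^T A^{-1} C + \frac{\hbar^2}{4} A^{-1} \mp \frac{i\hbar}{2} \left( A^{-1} C - C^T A^{-1} \right). \tag{18} $$

In this case we have

$$ B \geq C^T A^{-1} C + \frac{\hbar^2}{4} A^{-1} \geq \frac{\hbar^2}{4} A^{-1} > 0. \tag{19} $$

Moreover, we have also the following properties for the various determinants:

$$ (\det A)(\det B) \geq \det V = (\det A) \det\left( B - C^T A^{-1} C \right) \geq \left( \frac{\hbar}{2} \right)^{2n}, \tag{20} $$

$$ \det V = \left( \frac{\hbar}{2} \right)^{2n} \iff B = C^T A^{-1} C + \frac{\hbar^2}{4} A^{-1} \implies CA = AC^T, \tag{21} $$

$$ (\det A)(\det B) = \left( \frac{\hbar}{2} \right)^{2n} \iff B = \frac{\hbar^2}{4} A^{-1}, \quad C = 0. \tag{22} $$

By interchanging A with B and C with $C^T$ in (18)–(22) equivalent results are obtained.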

Proof. Since we already know that $V_+ \geq 0$ implies the invertibility of A, the equivalence between (14)
and (18) with A > 0 follows from [49] (Theorem 1.12 p. 34) (see also [50] (Theorem 11.6) or [51] (Lemma 3.2)).
In (19), the first inequality follows by summing up the two inequalities in (18). The last two ones
are immediate by the positivity of $A^{-1}$.
The equality in (20) is Schur's formula for the determinant of block matrices ([49], Theorem 1.1 p. 19).
Then, the first inequality is immediate by the lemma above and the trivial relation $B \geq B - C^T A^{-1} C$;
the second one follows from (19):

$$ B - C^T A^{-1} C \geq \frac{\hbar^2}{4} A^{-1} \implies \det\left( B - C^T A^{-1} C \right) \geq \det\left( \frac{\hbar^2}{4} A^{-1} \right) = \frac{(\hbar/2)^{2n}}{\det A}. $$

The equality $\det V = (\hbar/2)^{2n}$ is equivalent to $\det\left( B - C^T A^{-1} C \right) = \det\left( \frac{\hbar^2}{4} A^{-1} \right)$; since the latter
two determinants are evaluated on ordered positive matrices by (19), they coincide if and only if
the respective arguments are equal (Lemma 1); this shows the equivalence in (21). Then, by (18),
the self-adjoint matrix $\frac{i\hbar}{2} \left( A^{-1} C - C^T A^{-1} \right)$ is both positive semi-definite and negative semi-definite;
hence it is null, that is, $CA = AC^T$.
Finally, $B = \frac{\hbar^2}{4} A^{-1}$ gives $(\det A)(\det B) = (\hbar/2)^{2n}$ trivially. Conversely, $(\det A)(\det B) = (\hbar/2)^{2n}$
implies $\det B = \det\left( B - C^T A^{-1} C \right)$ by (20); since $B \geq B - C^T A^{-1} C \geq 0$ by (19), Lemma 1 then implies
$C^T A^{-1} C = 0$ and so C = 0.

By (18) and (19), every time three matrices A, B, C define the quantum variance matrix of a state
ρ, the same holds for A, B, $\tilde C = 0$. This fact can be used to characterize when two positive matrices
A and B are the diagonal blocks of some quantum variance matrix, or two positive numbers $c_Q$ and $c_P$
are the position and momentum variances of a quantum state along the two directions u and v.

Proposition 3. Two real matrices A > 0 and B > 0, having the dimension of the square of a length and of a
momentum, respectively, are the diagonal blocks of a quantum variance matrix $V^\rho$ if and only if

$$ B \geq \frac{\hbar^2}{4} A^{-1}. $$

Two real numbers $c_Q > 0$ and $c_P > 0$, having the dimension of the square of a length and of a momentum,
respectively, are such that $c_Q = \mathrm{Var}(\mathsf{Q}_{u,\rho})$ and $c_P = \mathrm{Var}(\mathsf{P}_{v,\rho})$ for some state ρ if and only if

$$ c_Q c_P \geq \left( \frac{\hbar}{2} \cos\alpha \right)^2. $$

Proof. For A and B, the necessity follows from (19). The sufficiency comes from (18) by choosing
$V^\rho = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}$.
For $c_Q$ and $c_P$, the necessity follows from (15). The sufficiency comes from (18) with
$V^\rho = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}$ and for example the following choices of A and B:

• if cos α = ±1, we take $A = c_Q \mathbb{1}$ and $B = c_P \mathbb{1}$;

• if cos α = 0, we let

$$ A = c_Q \, u u^T + \frac{\hbar^2}{4 c_P} \, v v^T + A', \qquad B = \frac{\hbar^2}{4 c_Q} \, u u^T + c_P \, v v^T + B', $$

where A′ and B′ are any two scalar multiples of the orthogonal projection onto $\{u, v\}^\perp$ satisfying
$B' \geq \frac{\hbar^2}{4} A'^{-1}$ when restricted to $\{u, v\}^\perp$;

• if cos α ∉ {0, ±1}, we choose

$$ A = c_Q \left[ u u^T - \frac{1}{\cos\alpha} (u v^T + v u^T) + \frac{2}{(\cos\alpha)^2} \, v v^T \right] + A', $$

$$ B = \frac{c_P}{(\sin\alpha)^4} \left[ \frac{2(\sin\alpha)^2 + (\cos\alpha)^4}{(\cos\alpha)^2} \, u u^T - \frac{1}{\cos\alpha} (u v^T + v u^T) + v v^T \right] + B', $$

where A′ and B′ are as in the previous item.

In the last two cases, we chose A and B in such a way that $B = \frac{c_Q c_P}{(\cos\alpha)^2} A^{-1}$ when restricted to the
linear span of {u, v}.
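
The construction used for cos α ∉ {0, ±1} can be checked numerically. The sketch below works in n = 2, where the span of {u, v} is the whole plane, so the extra blocks on $\{u, v\}^\perp$ are absent. It assumes that the coefficient of $u u^T$ in B is $(2(\sin\alpha)^2 + (\cos\alpha)^4)/(\cos\alpha)^2$, equivalently $(1 + (\sin\alpha)^4)/(\cos\alpha)^2$, which is the value for which $v \cdot B v = c_P$ and $B = \frac{c_Q c_P}{(\cos\alpha)^2} A^{-1}$ hold. All function names are hypothetical.

```python
import math

def outer(x, y):
    return [[xi * yj for yj in y] for xi in x]

def combine(terms):
    """Sum of coefficient * 2x2 matrix over (coefficient, matrix) pairs."""
    return [[sum(c * m[i][j] for c, m in terms) for j in range(2)] for i in range(2)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def quad(x, M):
    return sum(x[i] * M[i][j] * x[j] for i in range(2) for j in range(2))

def prop3_blocks(c_q, c_p, alpha):
    """Blocks A and B for directions u = e_1 and v at angle alpha from u (n = 2)."""
    c, s = math.cos(alpha), math.sin(alpha)
    u, v = [1.0, 0.0], [c, s]
    uu, vv = outer(u, u), outer(v, v)
    uv = combine([(1.0, outer(u, v)), (1.0, outer(v, u))])   # u v^T + v u^T
    A = combine([(c_q, uu), (-c_q / c, uv), (2.0 * c_q / c**2, vv)])
    k = c_p / s**4
    B = combine([(k * (2.0 * s**2 + c**4) / c**2, uu), (-k / c, uv), (k, vv)])
    return u, v, A, B

u, v, A, B = prop3_blocks(2.0, 3.0, math.pi / 3)
BA = matmul(B, A)   # should equal (c_Q c_P / cos^2 alpha) times the identity
```

With this choice, the condition $B \geq \frac{\hbar^2}{4} A^{-1}$ of Proposition 3 reduces to $c_Q c_P \geq (\hbar \cos\alpha / 2)^2$.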

2.3. Weyl Operators and Gaussian States

In the following, we shall introduce Gaussian states, Gaussian observables and covariant
observables on the phase-space. In all these instances, the Weyl operators are involved; here we recall
their deﬁnition and some properties (see e.g., [4] (Section 5.2) or [5] (Section 12.2), where, however, the
deﬁnition differs from ours in that the Weyl operators are composed with the map Ω−1 of (13)).

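Definition 1. The Weyl operators are the unitary operators defined by

$$ W(x, p) := \exp\left\{ \frac{i}{\hbar} (p \cdot Q - x \cdot P) \right\} = \prod_{j=1}^n e^{\frac{i}{\hbar} (p_j Q_j - x_j P_j)} = \prod_{j=1}^n e^{\frac{i}{\hbar} p_j Q_j} \, e^{-\frac{i}{\hbar} x_j P_j} \, e^{-\frac{i x_j p_j}{2\hbar}}. \tag{23} $$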


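The Weyl operators (23) satisfy the composition rule

$$ W(x_1, p_1) W(x_2, p_2) = \exp\left\{ -\frac{i}{2\hbar} (x_1 \cdot p_2 - x_2 \cdot p_1) \right\} W(x_1 + x_2, p_1 + p_2); $$

in particular, applying it in both orders gives the commutation relation

$$ W(x_1, p_1) W(x_2, p_2) = \exp\left\{ -\frac{i}{\hbar} (x_1 \cdot p_2 - x_2 \cdot p_1) \right\} W(x_2, p_2) W(x_1, p_1) \equiv \exp\left\{ i \begin{pmatrix} x_1^T & p_1^T \end{pmatrix} \Omega^{-1} \begin{pmatrix} x_2 \\ p_2 \end{pmatrix} \right\} W(x_2, p_2) W(x_1, p_1). \tag{24} $$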

These commutation relations imply the translation property

W ( x, p)∗ Qi W ( x, p) = Qi + xi , W ( x, p)∗ Pi W ( x, p) = Pi + pi , i = 1, . . . , n; (25)

due to this property, the Weyl operators are also known as displacement operators.
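
The phase factors appearing when Weyl operators are multiplied can be explored without building any operator. The sketch below, for n = 1, encodes the phase of the composition rule and derives from it the phase picked up when two Weyl operators are exchanged; the function names are hypothetical and ħ = 1 is an assumed default.

```python
import cmath

def composition_phase(z1, z2, hbar=1.0):
    """Phase c(z1, z2) in W(z1) W(z2) = c(z1, z2) W(z1 + z2), with z = (x, p)."""
    (x1, p1), (x2, p2) = z1, z2
    return cmath.exp(-1j * (x1 * p2 - x2 * p1) / (2.0 * hbar))

def commutation_phase(z1, z2, hbar=1.0):
    """Phase in W(z1) W(z2) = phase * W(z2) W(z1), from the composition rule used twice."""
    return composition_phase(z1, z2, hbar) / composition_phase(z2, z1, hbar)

def add(z1, z2):
    return (z1[0] + z2[0], z1[1] + z2[1])

z1, z2, z3 = (0.3, -1.2), (2.0, 0.7), (-0.5, 0.4)
# Associativity of the operator product requires the cocycle identity below.
left = composition_phase(z1, z2) * composition_phase(add(z1, z2), z3)
right = composition_phase(z2, z3) * composition_phase(z1, add(z2, z3))
```

The two products `left` and `right` agree because the exponent is bilinear and antisymmetric in the phase-space points, which is what makes the composition rule consistent with associativity.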
With a slight abuse of notation, we shall sometimes use the identification

$$ W(x, p) \equiv W\begin{pmatrix} x \\ p \end{pmatrix}, \tag{26} $$

where $\begin{pmatrix} x \\ p \end{pmatrix}$ is a block column vector belonging to the phase-space $\mathbb{R}^n \times \mathbb{R}^n \equiv \mathbb{R}^{2n}$; here, the first block
x is a position and the second block p is a momentum.
By means of the Weyl operators, it is possible to define the characteristic function of any
trace-class operator.

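Definition 2. For any operator $\rho \in \mathcal{T}(\mathcal{H})$, its characteristic function is the complex valued function
$\hat\rho : \mathbb{R}^{2n} \to \mathbb{C}$ defined by

$$ \hat\rho(w) := \mathrm{Tr}\{\rho W(-\Omega w)\}, \qquad w \equiv \begin{pmatrix} k \\ l \end{pmatrix}. \tag{27} $$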

Note that k is the inverse of a length and l is the inverse of a momentum, so that w is a block
vector living in the space R2n ≡ Rn × Rn regarded as the dual of the phase-space.
Instead of the characteristic function, sometimes the so called Weyl transform Tr {W ( x, p)ρ} is
introduced [4,44].
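By [4] (Proposition 5.3.2, Theorem 5.3.3), we have $\hat\rho \in L^2(\mathbb{R}^{2n})$ and the following trace formula
holds: $\forall \rho, \sigma \in \mathcal{T}(\mathcal{H})$,

$$ \mathrm{Tr}\{\sigma^* \rho\} = \left( \frac{\hbar}{2\pi} \right)^n \int_{\mathbb{R}^{2n}} \overline{\hat\sigma(w)} \, \hat\rho(w) \, dw. \tag{28} $$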

As a corollary [4] (Corollary 5.3.4), we have that a state ρ ∈ S is pure if and only if

$$ \left( \frac{\hbar}{2\pi} \right)^n \int_{\mathbb{R}^{2n}} |\hat\rho(w)|^2 \, dw = 1. $$

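By [53] (Lemma 3.1) or [26] (Proposition 8.5.(e)), the trace formula also implies

$$ \frac{1}{(2\pi\hbar)^n} \int_{\mathbb{R}^{2n}} W(x, p) \, \rho \, W(x, p)^* \, dx\,dp = \mathrm{Tr}\{\rho\} \mathbb{1}, \qquad \forall \rho \in \mathcal{T}(\mathcal{H}). \tag{29} $$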


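Moreover, the following inversion formula ensures that the characteristic function $\hat\rho$ completely
characterizes the state ρ [4] (Corollary 5.3.5):

$$ \rho = \left( \frac{\hbar}{2\pi} \right)^n \int_{\mathbb{R}^{2n}} W(\Omega w) \, \hat\rho(w) \, dw, \qquad \forall \rho \in \mathcal{T}(\mathcal{H}). $$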

The last two integrals are deﬁned in the weak operator topology.
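Finally, for $\rho \in \mathcal{S}_2$, the moments (7)–(10) can be expressed as in [4] (Section 5.4):

$$ -i \left. \frac{\partial \hat\rho(w)}{\partial w_i} \right|_{0} = \mu_i^\rho, \qquad - \left. \frac{\partial^2 \hat\rho(w)}{\partial w_i \partial w_j} \right|_{0} = V_{ij}^\rho + \mu_i^\rho \mu_j^\rho. \tag{30} $$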

Definition 3 ([2–5,44,48]). A state ρ is Gaussian if

$$ \hat\rho(w) = \exp\left\{ i w^T \mu^\rho - \frac{1}{2} w^T V^\rho w \right\} = \exp\left\{ i (k \cdot a^\rho + l \cdot b^\rho) - \frac{1}{2} (k \cdot A^\rho k + l \cdot B^\rho l) - k \cdot C^\rho l \right\}, \tag{31} $$

for a vector $\mu^\rho \in \mathbb{R}^{2n}$ and a real 2n × 2n matrix $V^\rho$ such that $V_+^\rho \geq 0$.

The condition $V_+^\rho \geq 0$ is necessary and sufficient in order that the function (31) defines the
characteristic function of a quantum state [4] (Theorem 5.5.1), [5] (Theorem 12.17). Therefore,
Gaussian states are exactly the states whose characteristic function is the exponential of a second order
polynomial [4] (Equation (5.5.49)), [5] (Equation (12.80)).
We shall denote by G the set of the Gaussian states; we have G ⊂ S2 ⊂ S. By (30), the vectors
aρ , bρ and the matrices Aρ , Bρ , C ρ characterizing a Gaussian state ρ are just its first and second order
quantum moments introduced in (7)–(9). By (31), the corresponding distributions of position and
momentum are Gaussian, namely

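$$ \mathsf{Q}_\rho = \mathcal{N}(a^\rho; A^\rho), \quad \mathsf{Q}_{u,\rho} = \mathcal{N}(u \cdot a^\rho; u \cdot A^\rho u), \quad \mathsf{P}_\rho = \mathcal{N}(b^\rho; B^\rho), \quad \mathsf{P}_{v,\rho} = \mathcal{N}(v \cdot b^\rho; v \cdot B^\rho v). \tag{32} $$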

Proposition 4 (Pure Gaussian states). For ρ ∈ G, we have $\det V^\rho = (\hbar/2)^{2n}$ if and only if ρ is pure.

Proof. The trace formula (28) and (31) give $\mathrm{Tr}\{\rho^2\} = (\hbar/2)^n / \sqrt{\det V^\rho}$, and this implies the statement.
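
The purity formula used in the proof is easy to evaluate for a single mode. The following sketch computes $\mathrm{Tr}\{\rho^2\}$ from the entries of the 2 × 2 variance matrix; the function name is hypothetical, and the example family a = b = (ħ/2)(2N + 1), c = 0 is an assumption made only to have a one-parameter set of mixed states.

```python
import math

def gaussian_purity_1d(a, b, c, hbar=1.0, tol=1e-12):
    """Tr{rho^2} = (hbar/2) / sqrt(det V) for n = 1, with V = [[a, c], [c, b]]."""
    det_v = a * b - c * c
    if det_v < hbar**2 / 4.0 - tol:
        raise ValueError("V is not the variance matrix of a quantum state")
    return (hbar / 2.0) / math.sqrt(det_v)

def purity_of_family(N, hbar=1.0):
    """Purity for a = b = (hbar/2)(2N + 1), c = 0; equals 1/(2N + 1)."""
    variance = (hbar / 2.0) * (2.0 * N + 1.0)
    return gaussian_purity_1d(variance, variance, 0.0, hbar)
```

The purity equals 1 exactly when det V reaches its minimum value (ħ/2)², in agreement with Proposition 4 for n = 1.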

Proposition 5 (Minimum uncertainty states). For $\rho \in \mathcal{S}_2$, we have $(\det A^\rho)(\det B^\rho) = (\hbar/2)^{2n}$ if and only if
ρ is a pure Gaussian state and it factorizes into the product of minimum uncertainty states up to a rotation of $\mathbb{R}^n$.

Proof. If $(\det A^\rho)(\det B^\rho) = (\hbar/2)^{2n}$, then the equivalence (22) gives $B^\rho = \frac{\hbar^2}{4} (A^\rho)^{-1}$, so that the
variance matrices $A^\rho$ and $B^\rho$ have a common eigenbasis $u_1, \ldots, u_n$. Thus, all the corresponding
couples of position $Q_{u_i}$ and momentum $P_{u_i}$ have minimum uncertainties: $\mathrm{Var}(\mathsf{Q}_{u_i}) \, \mathrm{Var}(\mathsf{P}_{u_i}) = \hbar^2/4$.
Therefore, if we consider the factorization of the Hilbert space $\mathcal{H} = \mathcal{H}_1 \otimes \cdots \otimes \mathcal{H}_n$ corresponding
to the basis $u_1, \ldots, u_n$, all the partial traces of the state ρ on each factor $\mathcal{H}_i$ are minimum uncertainty
states. Since for n = 1 the minimum uncertainty states are pure and Gaussian, the state ρ is a pure
product Gaussian state.
The converse is immediate.


3. Relative and Differential Entropies

In this paper, we will be concerned with entropic quantities of classical type [54–56]. We express
them in 'bits', that is, we use base-2 logarithms: $\log a \equiv \log_2 a$.
We deal only with probabilities on the measurable space $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ which admit densities with
respect to the Lebesgue measure. So, we deﬁne the relative entropy and differential entropy only for
such probabilities; moreover, we list only the general properties used in the following.

3.1. Relative Entropy or Kullback-Leibler Divergence

The fundamental quantity is the relative entropy, also called information divergence, discrimination
information, Kullback-Leibler divergence or information or distance or discrepancy. The relative entropy of
a probability p with respect to a probability q is deﬁned for any couple of probabilities p, q on the same
probability space.
Given two probabilities p and q on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ with densities f and g, respectively, the relative
entropy of p with respect to q is

$$ S(p\|q) = \int_{\mathbb{R}^n} f(x) \log \frac{f(x)}{g(x)} \, dx. \tag{33} $$

The value +∞ is allowed for $S(p\|q)$; the usual convention 0 log(0/0) = 0 is understood.
The relative entropy (33) is the amount of information that is lost when q is used to approximate
p [54] (p. 51). Of course, if x is dimensioned, then the densities f and g have the same dimension
(that is, the inverse of the dimension of x), and the argument of the logarithm is dimensionless, as it must be.

Proposition 6 ([56], Theorem 8.6.1). The following properties hold.

(i) $S(p\|q) \geq 0$.
(ii) $S(p\|q) = 0 \iff p = q \iff f = g$ a.e.
(iii) $S(p\|q)$ is invariant under a change of the unit of measurement.
(iv) If $p = \mathcal{N}(a; A)$ and $q = \mathcal{N}(b; B)$ with invertible variance matrices A and B, then

$$ 2 S(p\|q) = (\log e) \left\{ (a - b) \cdot B^{-1} (a - b) + \mathrm{Tr}\left\{ B^{-1} A - \mathbb{1} \right\} \right\} + \log \frac{\det B}{\det A}. \tag{34} $$

As $S(p\|q)$ is scale invariant, it quantifies a relative error for the use of q as an approximation of p,
not an absolute one.
Let us employ the relative entropy to evaluate the effect of an additive Gaussian noise $\nu \sim \mathcal{N}(b; \beta^2)$
on an independent Gaussian random variable X. If $X \sim \mathcal{N}(a; \alpha^2)$, then $X + \nu \sim \mathcal{N}(a + b; \alpha^2 + \beta^2)$,
and the relative entropy of the true distribution of X with respect to its disturbed version X + ν is

$$ S(X \| X + \nu) = \frac{\log e}{2} \, \frac{b^2 - \beta^2}{\alpha^2 + \beta^2} + \frac{1}{2} \log \frac{\alpha^2 + \beta^2}{\alpha^2}. $$

This expression vanishes if the noise becomes negligible with respect to the true distribution, that
is if $\beta^2/\alpha^2 \to 0$ and $b^2/\alpha^2 \to 0$. On the other hand, $S(X \| X + \nu)$ diverges if the noise becomes too
strong with respect to the true distribution, or, in other words, if the true distribution becomes too
peaked with respect to the noise, that is, $\beta^2/\alpha^2 \to +\infty$ or $b^2/\alpha^2 \to +\infty$.
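
The Gaussian formula (34) and the noise example can be reproduced with a few lines of code. The sketch below specializes (34) to one dimension and expresses the result in bits; the function names are hypothetical.

```python
import math

LOG2E = math.log2(math.e)

def rel_entropy_gauss_1d(a, var_p, b, var_q):
    """S(p||q) in bits for p = N(a; var_p) and q = N(b; var_q), i.e. (34) with n = 1."""
    return 0.5 * (LOG2E * ((a - b) ** 2 / var_q + var_p / var_q - 1.0)
                  + math.log2(var_q / var_p))

def noise_loss(alpha2, b, beta2):
    """S(X || X + nu) for X ~ N(0; alpha2) and independent noise nu ~ N(b; beta2)."""
    return rel_entropy_gauss_1d(0.0, alpha2, b, alpha2 + beta2)

# Changing the unit of measurement by a factor s rescales means by s and variances by s**2.
s = 10.0
original = rel_entropy_gauss_1d(0.3, 2.0, -0.4, 3.0)
rescaled = rel_entropy_gauss_1d(s * 0.3, s**2 * 2.0, s * (-0.4), s**2 * 3.0)
```

Here `original` and `rescaled` coincide up to rounding, which is the scale invariance stated in item (iii), while `noise_loss` grows without bound when the noise variance dominates the variance of X.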

3.2. Differential Entropy

The differential entropy of an absolutely continuous random vector X with a probability density f is

$$ H(X) := -\int_{\mathbb{R}^n} f(x) \log f(x) \, dx. $$


This quantity is commonly used in the literature, even if it lacks many of the nice properties of
the Shannon entropy for discrete random variables. For example, H ( X ) is not scale invariant, and it
can be negative [56] (p. 244).
Since the density f enters in the logarithm argument, the definition of H(X) is meaningful only
when f is dimensionless, which is the same as X being dimensionless. Note that, if X is dimensioned
and c > 0 is a real parameter making $\tilde X = cX$ a dimensionless random variable, then

$$ H(\tilde X) = -\int_{\mathbb{R}^n} \frac{f(u/c)}{c^n} \log \frac{f(u/c)}{c^n} \, du = -\int_{\mathbb{R}^n} f(x) \log \frac{f(x)}{c^n} \, dx. $$

In the following, we shall consider the differential entropy only for dimensionless random vectors X.

Proposition 7 ([56], Section 8.6). The following properties hold.

(i) If X is an absolutely continuous random vector with variance matrix A, then

$$ H(X) \leq \frac{1}{2} \log\left( (2\pi e)^n \det A \right) = \frac{n}{2} \log(2\pi e) + \frac{1}{2} \mathrm{Tr} \log A. $$

The equality holds iff X is Gaussian with variance matrix A and arbitrary mean vector a.
(ii) If $X = (X_1, \ldots, X_n)$ is an absolutely continuous random vector, then

$$ H(X) \leq \sum_{i=1}^n H(X_i). $$

The equality holds iff the components $X_1, \ldots, X_n$ are independent.

Remark 1. In property (i) we have used the following well-known matrix identity, which follows by diagonalization:

log det A = Tr log A, ∀ A > 0.

Remark 2. Property (i) yields that the differential entropy of a Gaussian random variable $X \sim \mathcal{N}(a; \alpha^2)$ is

$$ H(X) = \frac{1}{2} \log\left( 2\pi e \alpha^2 \right), $$

which is an increasing function of the variance α2 , and thus it is a measure of the uncertainty of X. Note that
H ( X ) ≥ 0 iff α2 ≥ 1/(2πe).
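
The lack of scale invariance of the differential entropy is visible already in the Gaussian formula of Remark 2. The short sketch below (hypothetical function name) evaluates it in bits for a dimensionless variance.

```python
import math

def gaussian_entropy_bits(variance):
    """Differential entropy, in bits, of a one-dimensional Gaussian with the given variance."""
    return 0.5 * math.log2(2.0 * math.pi * math.e * variance)

# Rescaling X -> cX multiplies the variance by c**2 and shifts H by log2(c).
shift_for_c_equal_2 = gaussian_entropy_bits(4.0) - gaussian_entropy_bits(1.0)
zero_crossing = gaussian_entropy_bits(1.0 / (2.0 * math.pi * math.e))
```

Doubling the scale adds exactly one bit, and the entropy changes sign at variance 1/(2πe), so neither its value nor its sign has an intrinsic meaning.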

4. Entropic PURs for Position and Momentum

The idea of having an entropic formulation of the PURs for position and momentum goes back
to [7–9]. However, we have just seen that, due to the presence of the logarithm, the Shannon differential
entropy needs dimensionless probability densities. So, this leads us to introduce dimensionless versions
of position and momentum.
Let λ > 0 be a dimensionless parameter and κ a second parameter with the dimension of a mass
times a frequency. Then, we introduce the dimensionless versions of position and momentum:

$$ \tilde Q := \sqrt{\frac{\kappa}{\hbar}} \, Q, \qquad \tilde P := \frac{\lambda}{\sqrt{\hbar\kappa}} \, P \quad \Longrightarrow \quad [\tilde Q_i, \tilde P_j] = i\lambda\delta_{ij}. \tag{35} $$

We use a single dimensional constant κ, in order to respect rotation symmetry and not to
distinguish different particles. Anyway, there is no natural link between the parameter multiplying Q
and the parameter multiplying P; this is the reason for introducing λ. As we see from the commutation
rules, the constant λ plays the role of a dimensionless version of ħ; in the literature on PURs, often λ = 1
is used [8,9,12,46].


4.1. Vector Observables

Let $\tilde{\mathsf{Q}}$ and $\tilde{\mathsf{P}}$ be the pvm's of $\tilde Q$ and $\tilde P$; then, $\tilde{\mathsf{Q}}_\rho$ and $\tilde{\mathsf{P}}_\rho$ are their probability distributions in the
state ρ. The total preparation uncertainty is quantified by the sum of the two differential entropies
$H(\tilde{\mathsf{Q}}_\rho) + H(\tilde{\mathsf{P}}_\rho)$. For ρ ∈ G, by Proposition 7 we get

$$ H(\tilde{\mathsf{Q}}_\rho) + H(\tilde{\mathsf{P}}_\rho) = n \log(\pi e \lambda) + \frac{1}{2} \log\left[ \left( \frac{4}{\hbar^2} \right)^n (\det A^\rho)(\det B^\rho) \right]. \tag{36} $$

In the case of product states of minimum uncertainty, we have $(\det A^\rho)(\det B^\rho) = (\hbar^2/4)^n$;
then, by taking (20) into account, we get

$$ \inf_{\rho \in \mathcal{G}} \left\{ H(\tilde{\mathsf{Q}}_\rho) + H(\tilde{\mathsf{P}}_\rho) \right\} = n \log(\pi e \lambda). \tag{37} $$

Thus, the bound (37) arises from quantum relations between Q and P; indeed, there would be no
lower bound for (36) if we could take both $\det A^\rho$ and $\det B^\rho$ arbitrarily small.
By item (ii) of Proposition 7, the differential entropy for the distribution of a random vector is
smaller than the sum of the entropies of its marginals; however, the final bound (37) is a tight bound
for both $H(\tilde{\mathsf{Q}}_\rho) + H(\tilde{\mathsf{P}}_\rho)$ and $\sum_{i=1}^n H(\tilde{\mathsf{Q}}_{i,\rho}) + \sum_{i=1}^n H(\tilde{\mathsf{P}}_{i,\rho})$.
By the results of [8,9], the same bound (37) is obtained even if the minimization is done over all
the states, not only the Gaussian ones.
The uncertainty result (37) depends on λ, this being a consequence of the lack of scale invariance
of the differential entropy; note that the bound is positive if and only if λ > 1/(πe). Sometimes in
the literature the parameter h̄ appears in the argument of the logarithm [27,30]; this fact has to be
interpreted as the appearance of a parameter with the numerical value of h̄, but without dimensions.
In this sense the formulation (37) is consistent with both the cases with λ = 1 or λ = h̄. Sometimes the
smaller bound ln 2π appears in place of log πe [10]; this is connected to a state dependent formulation
of the entropic PUR [12] (Section V.B).
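
The dependence of the entropic PUR on λ can be made concrete with a small calculation. The sketch below evaluates the sum of the two differential entropies for a Gaussian state, written as n log(πeλ) plus one half of the logarithm of (4/ħ²)ⁿ (det A)(det B), which is the form (36) obtained from Proposition 7. The function name and the default values λ = ħ = 1 are assumptions for this example.

```python
import math

def preparation_entropy_sum(det_a, det_b, n, lam=1.0, hbar=1.0):
    """Sum of the position and momentum differential entropies (in bits) of a Gaussian state."""
    scaled = (4.0 / hbar**2) ** n * det_a * det_b
    return n * math.log2(math.pi * math.e * lam) + 0.5 * math.log2(scaled)

# Minimum uncertainty product state for n = 1 and hbar = 1: det A * det B = 1/4.
tight_value = preparation_entropy_sum(0.5, 0.5, 1)
# The same state with lam = 1/(pi e): the bound is exactly zero.
zero_value = preparation_entropy_sum(0.5, 0.5, 1, lam=1.0 / (math.pi * math.e))
```

The numerical value of the bound, about 3.09 bits per degree of freedom for λ = 1, therefore reflects the chosen normalization and not only the incompatibility of Q and P.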

4.2. Scalar Observables

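The dimensionless versions of the scalar observables introduced in (6) are

$$ \tilde Q_u = \sqrt{\frac{\kappa}{\hbar}} \, Q_u, \qquad \tilde P_v = \frac{\lambda}{\sqrt{\hbar\kappa}} \, P_v \quad \Longrightarrow \quad [\tilde Q_u, \tilde P_v] = i\lambda \cos\alpha. \tag{38} $$

We denote by $\tilde{\mathsf{Q}}_{u,\rho}$ and $\tilde{\mathsf{P}}_{v,\rho}$ the associated distributions in the state ρ. For $\rho \in \mathcal{S}_2$, the respective
means and variances are

$$ \sqrt{\frac{\kappa}{\hbar}} \, u \cdot a^\rho, \qquad \frac{\lambda}{\sqrt{\hbar\kappa}} \, v \cdot b^\rho, \qquad \mathrm{Var}(\tilde{\mathsf{Q}}_{u,\rho}) = \frac{\kappa}{\hbar} \, u \cdot A^\rho u, \qquad \mathrm{Var}(\tilde{\mathsf{P}}_{v,\rho}) = \frac{\lambda^2}{\hbar\kappa} \, v \cdot B^\rho v, $$

with $\sqrt{\mathrm{Var}(\tilde{\mathsf{Q}}_{u,\rho}) \, \mathrm{Var}(\tilde{\mathsf{P}}_{v,\rho})} \geq \lambda |\cos\alpha| / 2$.
As in the vector case, the total preparation uncertainty is quantified by the sum of the two
differential entropies $H(\tilde{\mathsf{Q}}_{u,\rho}) + H(\tilde{\mathsf{P}}_{v,\rho})$. For ρ ∈ G, Proposition 7 gives

$$ H(\tilde{\mathsf{Q}}_{u,\rho}) + H(\tilde{\mathsf{P}}_{v,\rho}) = \log\left( 2\pi e \sqrt{\mathrm{Var}(\tilde{\mathsf{Q}}_{u,\rho}) \, \mathrm{Var}(\tilde{\mathsf{P}}_{v,\rho})} \right). \tag{39} $$

Then, we have the lower bound

$$ \inf_{\rho \in \mathcal{G}} \left\{ H(\tilde{\mathsf{Q}}_{u,\rho}) + H(\tilde{\mathsf{P}}_{v,\rho}) \right\} = \log(\pi e \lambda |\cos\alpha|) = \frac{1 + \ln(\pi |\lambda \cos\alpha|)}{\ln 2}, \tag{40} $$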


which depends on λ, but not on κ. Of course, because of (39), for Gaussian states a lower bound
for the sum $H(\tilde{\mathsf{Q}}_{u,\rho}) + H(\tilde{\mathsf{P}}_{v,\rho})$ is equivalent to a lower bound for the product $\mathrm{Var}(\tilde{\mathsf{Q}}_{u,\rho}) \, \mathrm{Var}(\tilde{\mathsf{P}}_{v,\rho})$.
By the generalization of the results of [8,9] given in [46], the bound (40) is obtained also when the
minimization is done over all the states.
Let us note that the bound in (40) is positive for |λ cos α| > 1/(πe), and it goes to −∞ for
α → π/2, which is the case of compatible Qu,ρ and Pv,ρ . In the case α = 0, the bound (40) is the same
as (37) for n = 1.
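
The behaviour of the scalar bound as the angle varies can be tabulated directly from its closed form log(πeλ|cos α|). The sketch below uses a hypothetical function name and assumes λ = 1 by default.

```python
import math

def scalar_pur_bound(alpha, lam=1.0):
    """Lower bound log2(pi * e * lam * |cos(alpha)|), in bits, for the scalar entropic PUR."""
    c = abs(math.cos(alpha))
    if c == 0.0:
        return -math.inf
    return math.log2(math.pi * math.e * lam * c)

angles = [0.0, math.pi / 6, math.pi / 3, 0.49 * math.pi]
bounds = [scalar_pur_bound(alpha) for alpha in angles]
```

The bound decreases monotonically as the two directions move from parallel towards orthogonal, and it becomes negative once |λ cos α| drops below 1/(πe).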

5. Approximate Joint Measurements of Position and Momentum

In order to deal with MURs for position and momentum observables, we have to introduce
the class of approximate joint measurements of position and momentum, whose marginals we will
compare with the respective sharp observables. As done in [3,4,18,57], it is natural to characterize such
a class by requiring suitable properties of covariance under the group of space translations and velocity
boosts: namely, by approximate joint measurement of position and momentum we will mean any POVM on
the product space of the position and momentum outcomes sharing the same covariance properties of
the two target sharp observables. As we have already discussed, two approximation problems will be
of our concern: the approximation of the position and momentum vectors (vector case, with outcomes
in the phase-space Rn × Rn ), and the approximation of one position and one momentum component
along two arbitrary directions (scalar case, with outcomes in R × R). In order to treat the two cases
altogether, we consider POVMs with outcomes in Rm × Rm ≡ R2m , which we call bi-observables; they
correspond to a measurement of m position components and m momentum components. The specific
covariance requirements will be given in the Definitions 5–7.
In studying the properties of probability measures on Rk , a very useful notion is that of the
characteristic function, that is, the Fourier cotransform of the measure at hand; the analogous
quantity for POVMs turns out to have the same relevance. Different names have been used in
the literature to refer to the characteristic function of POVMs, or, more generally, quantum instruments,
such as characteristic operator or operator characteristic function [3,24,34,44,58–62]. As a variant,
also the symplectic Fourier transform quite often appears [5] (Section 12.4.3). The characteristic
function has been used, for instance, to study the quantum analogues of the infinite-divisible
distributions [3,34,58–60,62] and measurements of Gaussian type [5,44,61]. Here, we are interested
only in the latter application, as our approximating bi-observables will typically be Gaussian. Since
we deal with bi-observables, we limit our definition of the characteristic function only to POVMs on
Rm × Rm , which have the same number of variables of position and momentum type.
Being measures, POVMs can be used to construct integrals, whose theory is presented e.g., in [26]
(Section 4.8) and [4] (Section 2.9, Proposition 2.9.1).

Definition 4. Given a bi-observable $\mathsf{M} : \mathcal{B}(\mathbb{R}^{2m}) \to \mathcal{L}(\mathcal{H})$, the characteristic function of M is the operator
valued function $\hat{\mathsf{M}} : \mathbb{R}^{2m} \to \mathcal{L}(\mathcal{H})$, with

$$ \hat{\mathsf{M}}(k, l) = \int_{\mathbb{R}^{2m}} e^{i(k \cdot x + l \cdot p)} \, \mathsf{M}(dx\,dp). \tag{41} $$

In this definition the dimensions of the vector variables k and l are the inverses of a length and
momentum, respectively, as in the definition of the characteristic function of a state (27). This definition
is given so that $\mathrm{Tr}\{\hat{\mathsf{M}}(k, l)\rho\}$ is the usual characteristic function of the probability distribution $\mathsf{M}_\rho$
on $\mathbb{R}^{2m}$.

5.1. Covariant Vector Observables

In terms of the pvm’s (4), the translation property (25) is equivalent to the symmetry properties

W ( x, p)Q( A)W ( x, p)∗ = Q( A + x), W ( x, p)P( B)W ( x, p)∗ = P( B + p), ∀ A, B ∈ B(Rn ),

Entropy 2017, 19, 301

and they are taken as the transformation property deﬁning the following class of POVMs on
R2n [23,26,44,53,57].

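Definition 5. A covariant phase-space observable is a bi-observable $\mathsf{M} : \mathcal{B}(\mathbb{R}^{2n}) \to \mathcal{L}(\mathcal{H})$ satisfying the covariance relation

$$W(x,p)\,\mathsf{M}(Z)\,W(x,p)^* = \mathsf{M}\!\left(Z + \begin{pmatrix} x \\ p \end{pmatrix}\right), \qquad \forall Z \in \mathcal{B}(\mathbb{R}^{2n}),\ \forall x,p \in \mathbb{R}^n.$$

We denote by $\mathcal{C}$ the set of all the covariant phase-space observables.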

The interpretation of covariant phase-space observables as approximate joint measurements of position and momentum is based on the fact that their marginal POVMs

$$\mathsf{M}_1(A) = \mathsf{M}(A \times \mathbb{R}^n), \qquad \mathsf{M}_2(B) = \mathsf{M}(\mathbb{R}^n \times B), \qquad A, B \in \mathcal{B}(\mathbb{R}^n),$$

have the same symmetry properties as $\mathsf{Q}$ and $\mathsf{P}$, respectively. Although $Q$ and $P$ are not jointly measurable, the following well-known result says that there are plenty of covariant phase-space observables [4] (Theorem 4.8.3), [63,64]. In (43) below, we use the parity operator $\Pi$ on $\mathcal{H}$, which is such that

$$\Pi\, W(x,p)\, \Pi = W(-x,-p) = W(x,p)^*. \tag{42}$$

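Proposition 8. The covariant phase-space observables are in one-to-one correspondence with the states on $\mathcal{H}$, so that we have the identification $\mathcal{S} \sim \mathcal{C}$; such a correspondence $\sigma \leftrightarrow \mathsf{M}^\sigma$ is given by

$$\mathsf{M}^\sigma(B) = \int_B M^\sigma(x,p)\, dx\,dp, \quad \forall B \in \mathcal{B}(\mathbb{R}^{2n}), \qquad M^\sigma(x,p) = \frac{1}{(2\pi\hbar)^n}\, W(x,p)\,\Pi\sigma\Pi\, W(x,p)^*. \tag{43}$$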

The characteristic function (41) of a measurement Mσ ∈ C has a very simple structure in terms of
the characteristic function (27) of the corresponding state σ ∈ S.

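Proposition 9. The characteristic function of $\mathsf{M}^\sigma \in \mathcal{C}$ is given by

$$\widehat{\mathsf{M}}^\sigma(k,l) = W(-\Omega w)\, \widehat{\sigma}(w), \qquad w \equiv \begin{pmatrix} k \\ l \end{pmatrix} \in \mathbb{R}^{2n}, \tag{44}$$

and the characteristic function of the probability $\mathsf{M}^\sigma_\rho$ is

$$\mathrm{Tr}\{\widehat{\mathsf{M}}^\sigma(k,l)\rho\} = \widehat{\rho}(w)\, \widehat{\sigma}(w). \tag{45}$$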

In (44) we have used the identification (26). The characteristic function of a state is introduced in (27).

Proof. By the commutation relations (24) we have

W (−h̄l, h̄k)W ( x, p)W (−h̄l, h̄k)∗ = ei(k· x+l · p) W ( x, p).


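Then, we get

$$\begin{aligned}
\widehat{\mathsf{M}}^\sigma(k,l) &= \frac{1}{(2\pi\hbar)^n} \int_{\mathbb{R}^{2n}} e^{i(k\cdot x + l\cdot p)}\, W(x,p)\,\Pi\sigma\Pi\, W(x,p)^*\, dx\,dp \\
&= \frac{1}{(2\pi\hbar)^n} \int_{\mathbb{R}^{2n}} W(-\hbar l, \hbar k)\, W(x,p)\, W(-\hbar l, \hbar k)^*\, \Pi\sigma\Pi\, W(x,p)^*\, dx\,dp \\
&= W(-\hbar l, \hbar k)\, \mathrm{Tr}\{W(-\hbar l, \hbar k)^*\, \Pi\sigma\Pi\},
\end{aligned}$$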

where we used the formula (29). By (42) and the definition (27), we get (44). Again by (27), we get (45).

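In terms of probability densities, measuring $\mathsf{M}^\sigma$ on the state $\rho$ yields the density function $h^\sigma(x,p|\rho) = \mathrm{Tr}\{M^\sigma(x,p)\rho\}$. Then, by (45), the densities of the marginals $\mathsf{M}^\sigma_{1,\rho}$ and $\mathsf{M}^\sigma_{2,\rho}$ are the convolutions

$$h^\sigma_1(\bullet|\rho) = f(\bullet|\rho) * f(\bullet|\sigma), \qquad h^\sigma_2(\bullet|\rho) = g(\bullet|\rho) * g(\bullet|\sigma), \tag{46}$$

where $f$ and $g$ are the sharp densities introduced in (5). By the arbitrariness of the state $\rho$, the marginal POVMs of $\mathsf{M}^\sigma$ turn out to be the convolutions (or 'smearings')

$$\mathsf{M}^\sigma_1(A) = \int_A dx \int_{\mathbb{R}^n} f(x - x'|\sigma)\, \mathsf{Q}(dx'), \qquad \mathsf{M}^\sigma_2(B) = \int_B dp \int_{\mathbb{R}^n} g(p - p'|\sigma)\, \mathsf{P}(dp')$$

(see e.g., [23] (Section III, Equations (2.48) and (2.49))).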

Let us remark that the distribution of the approximate position observable $\mathsf{M}^\sigma_1$ in a state $\rho$ is the distribution of the sum of two independent random vectors: the first one is distributed as the sharp position $\mathsf{Q}$ in the state $\rho$, the second one is distributed as the sharp position $\mathsf{Q}$ in the state $\sigma$. In this sense, the approximate position $\mathsf{M}^\sigma_1$ looks like a sharp position plus an independent noise given by $\sigma$. Of course, a similar fact holds for the momentum. However, this statement about the distributions cannot be extended to a statement involving the observables. Indeed, since $Q$ and $P$ are incompatible, nobody can jointly observe $\mathsf{M}^\sigma$, $Q$ and $P$, so that the convolutions (46) do not correspond to sums of random vectors that actually exist when measuring $\mathsf{M}^\sigma$.

5.2. Covariant Scalar Observables

Now we focus on the class of approximate joint measurements of the observables Qu and Pv
representing position and momentum along two possibly different directions u and v (see Section 2.1.2).
As in the case of covariant phase-space observables, this class is deﬁned in terms of the symmetries of
its elements: we require them to transform as if they were joint measurements of $Q_u$ and $P_v$. Recall
that $\mathsf{Q}_u$ and $\mathsf{P}_v$ denote the spectral measures of the operators $Q_u$, $P_v$.
Due to the commutation relation (24), the following covariance relations hold

W ( x, p)Qu ( A)W ( x, p)∗ = Qu ( A + u · x), W ( x, p)Pv ( B)W ( x, p)∗ = Pv ( B + v · p),

for all A, B ∈ B(R) and x, p ∈ Rn . We employ covariance to deﬁne our class of approximate joint
measurements of Qu and Pv .

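Definition 6. A $(u,v)$-covariant bi-observable is a POVM $\mathsf{M} : \mathcal{B}(\mathbb{R}^2) \to \mathcal{L}(\mathcal{H})$ such that

$$W(x,p)\,\mathsf{M}(Z)\,W(x,p)^* = \mathsf{M}\!\left(Z + \begin{pmatrix} u\cdot x \\ v\cdot p \end{pmatrix}\right), \qquad \forall Z \in \mathcal{B}(\mathbb{R}^2),\ \forall x,p \in \mathbb{R}^n.$$

We denote by $\mathcal{C}_{u,v}$ the class of such bi-observables.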

So, our approximate joint measurements of Qu and Pv will be all the bi-observables in the class Cu,v .


Example 1. The marginal of a covariant phase-space observable $\mathsf{M}^\sigma$ along the directions $u$ and $v$ is a $(u,v)$-covariant bi-observable. Actually, it can be proved that, if $\cos\alpha \neq 0$, all $(u,v)$-covariant bi-observables can be obtained in this way.

It is useful to work with a little more generality, and merge Deﬁnitions 5 and 6 into a single notion
of covariance.

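Definition 7. Suppose $J$ is a $k \times 2n$ real matrix. A POVM $\mathsf{M} : \mathcal{B}(\mathbb{R}^k) \to \mathcal{L}(\mathcal{H})$ is a $J$-covariant observable on $\mathbb{R}^k$ if

$$W(x,p)\,\mathsf{M}(Z)\,W(x,p)^* = \mathsf{M}\!\left(Z + J\begin{pmatrix} x \\ p \end{pmatrix}\right), \qquad \forall Z \in \mathcal{B}(\mathbb{R}^k),\ \forall x,p \in \mathbb{R}^n.$$

Thus, approximate joint observables of $Q_u$ and $P_v$ are just $J$-covariant observables on $\mathbb{R}^2$ for the choice of the $2 \times 2n$ matrix

$$J = \begin{pmatrix} u^T & 0^T \\ 0^T & v^T \end{pmatrix}. \tag{47}$$

On the other hand, covariant phase-space observables constitute the class of $\mathbb{1}_{2n}$-covariant observables on $\mathbb{R}^{2n}$, where $\mathbb{1}_{2n}$ is the identity map of $\mathbb{R}^{2n}$.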

5.3. Gaussian Measurements

When dealing with Gaussian states, the following class of bi-observables quite naturally arises.

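Definition 8. A POVM $\mathsf{M} : \mathcal{B}(\mathbb{R}^{2m}) \to \mathcal{L}(\mathcal{H})$ is a Gaussian bi-observable if

$$\widehat{\mathsf{M}}(k,l) = W\!\left(-\Omega (J^{\mathsf{M}})^T \begin{pmatrix} k \\ l \end{pmatrix}\right) \exp\left\{ i \begin{pmatrix} k^T & l^T \end{pmatrix} \begin{pmatrix} a^{\mathsf{M}} \\ b^{\mathsf{M}} \end{pmatrix} - \frac{1}{2} \begin{pmatrix} k^T & l^T \end{pmatrix} V^{\mathsf{M}} \begin{pmatrix} k \\ l \end{pmatrix} \right\} \tag{48}$$

for two vectors $a^{\mathsf{M}}, b^{\mathsf{M}} \in \mathbb{R}^m$, a real $2m \times 2n$ matrix $J^{\mathsf{M}}$ and a real symmetric $2m \times 2m$ matrix $V^{\mathsf{M}}$ satisfying the condition

$$V^{\mathsf{M}} \pm \frac{i}{2}\, J^{\mathsf{M}}\, \Omega\, (J^{\mathsf{M}})^T \geq 0. \tag{49}$$

We set $\mu^{\mathsf{M}} = \begin{pmatrix} a^{\mathsf{M}} \\ b^{\mathsf{M}} \end{pmatrix}$. The triple $(\mu^{\mathsf{M}}, V^{\mathsf{M}}, J^{\mathsf{M}})$ is the set of the parameters of the Gaussian observable $\mathsf{M}$.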

In this definition, the vector aM has the dimension of a length, and bM of a momentum; similarly,
the matrices J M , V M decompose into blocks of different dimensions. The condition (49) is necessary
and sufficient in order that the function (48) defines the characteristic function of a POVM.
For unbiased Gaussian measurements, i.e., Gaussian bi-observables with aM = bM = 0,
the previous definition coincides with the one of [5] (Section 12.4.3). It is also a particular case of the
more general definition of Gaussian observables on arbitrary (not necessarily symplectic) linear spaces
that is given in [43,44]. We refer to [5,44] for the proof that Equation (48) is actually the characteristic
function of a POVM.


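Measuring the Gaussian observable $\mathsf{M}$ on the Gaussian state $\rho$ yields the probability distribution $\mathsf{M}_\rho$ whose characteristic function is

$$\begin{aligned}
\mathrm{Tr}\{\widehat{\mathsf{M}}(k,l)\rho\} &= \widehat{\rho}\!\left((J^{\mathsf{M}})^T \begin{pmatrix} k \\ l \end{pmatrix}\right) \exp\left\{ i \begin{pmatrix} k^T & l^T \end{pmatrix} \begin{pmatrix} a^{\mathsf{M}} \\ b^{\mathsf{M}} \end{pmatrix} - \frac{1}{2} \begin{pmatrix} k^T & l^T \end{pmatrix} V^{\mathsf{M}} \begin{pmatrix} k \\ l \end{pmatrix} \right\} \\
&= \exp\left\{ i \begin{pmatrix} k^T & l^T \end{pmatrix} \left[ \begin{pmatrix} a^{\mathsf{M}} \\ b^{\mathsf{M}} \end{pmatrix} + J^{\mathsf{M}} \begin{pmatrix} a^{\rho} \\ b^{\rho} \end{pmatrix} \right] - \frac{1}{2} \begin{pmatrix} k^T & l^T \end{pmatrix} \left[ V^{\mathsf{M}} + J^{\mathsf{M}} V^{\rho} (J^{\mathsf{M}})^T \right] \begin{pmatrix} k \\ l \end{pmatrix} \right\};
\end{aligned}$$

hence the output distribution is Gaussian,

$$\mathsf{M}_\rho = \mathcal{N}\!\left( J^{\mathsf{M}} \mu^{\rho} + \mu^{\mathsf{M}};\ J^{\mathsf{M}} V^{\rho} (J^{\mathsf{M}})^T + V^{\mathsf{M}} \right). \tag{50}$$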

5.3.1. Covariant Gaussian Observables

For Gaussian bi-observables, J-covariance has a very easy characterization.

Proposition 10. Suppose M is a Gaussian bi-observable on R2m with parameters (μM , V M , J M ). Let J be any
2m × 2n real matrix. Then, the POVM M is a J-covariant observable if and only if J M = J.

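Proof. For $x, p \in \mathbb{R}^n$, we let $\mathsf{M}'$ and $\mathsf{M}''$ be the two POVMs on $\mathbb{R}^{2m}$ given by

$$\mathsf{M}'(Z) = W(x,p)\,\mathsf{M}(Z)\,W(x,p)^*, \qquad \mathsf{M}''(Z) = \mathsf{M}\!\left(Z + J\begin{pmatrix} x \\ p \end{pmatrix}\right), \qquad \forall Z \in \mathcal{B}(\mathbb{R}^{2m}).$$

By the commutation relations (24) for the Weyl operators, we immediately get

$$\begin{aligned}
\widehat{\mathsf{M}}'(k,l) = W(x,p)\,\widehat{\mathsf{M}}(k,l)\,W(x,p)^* &= \exp\left\{ -i \begin{pmatrix} x^T & p^T \end{pmatrix} \Omega^{-1} \left[ -\Omega (J^{\mathsf{M}})^T \begin{pmatrix} k \\ l \end{pmatrix} \right] \right\} \widehat{\mathsf{M}}(k,l) \\
&= \exp\left\{ -i \begin{pmatrix} k^T & l^T \end{pmatrix} J^{\mathsf{M}} \begin{pmatrix} x \\ p \end{pmatrix} \right\} \widehat{\mathsf{M}}(k,l);
\end{aligned}$$

we have also

$$\begin{aligned}
\widehat{\mathsf{M}}''(k,l) &= \int_{\mathbb{R}^{2m}} \exp\left\{ i \begin{pmatrix} k^T & l^T \end{pmatrix} \left[ \begin{pmatrix} x' \\ p' \end{pmatrix} - J \begin{pmatrix} x \\ p \end{pmatrix} \right] \right\} \mathsf{M}(dx'\,dp') \\
&= \exp\left\{ -i \begin{pmatrix} k^T & l^T \end{pmatrix} J \begin{pmatrix} x \\ p \end{pmatrix} \right\} \widehat{\mathsf{M}}(k,l).
\end{aligned}$$

Since $\widehat{\mathsf{M}}(k,l) \neq 0$ for all $k, l$, by comparing the last two expressions we see that $\mathsf{M}' = \mathsf{M}''$ if and only if

$$\exp\left\{ -i \begin{pmatrix} k^T & l^T \end{pmatrix} J^{\mathsf{M}} \begin{pmatrix} x \\ p \end{pmatrix} \right\} = \exp\left\{ -i \begin{pmatrix} k^T & l^T \end{pmatrix} J \begin{pmatrix} x \\ p \end{pmatrix} \right\}, \qquad \forall x,p \in \mathbb{R}^n,\ \forall k,l \in \mathbb{R}^m,$$

which in turn is equivalent to $J^{\mathsf{M}} = J$.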

Vector Observables
Let us point out the structure of the Gaussian approximate joint measurements of Q and P.


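Proposition 11. A bi-observable $\mathsf{M}^\sigma \in \mathcal{C}$ is Gaussian if and only if the state $\sigma$ is Gaussian. In this case, the covariant bi-observable $\mathsf{M}^\sigma$ is Gaussian with parameters

$$\mu^{\mathsf{M}^\sigma} = \mu^\sigma, \qquad V^{\mathsf{M}^\sigma} = V^\sigma, \qquad J^{\mathsf{M}^\sigma} = \mathbb{1}_{2n}.$$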

Proof. By comparing (31), (44) and (48), and using the fact that W ( x1 , p1 ) ∝ W ( x2 , p2 ) if and only if
x1 = x2 and p1 = p2 , we have the ﬁrst statement. Then, for σ ∈ G, we see immediately that Mσ is
a Gaussian observable with the above parameters.

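We call $\mathcal{C}^G$ the class of the Gaussian covariant phase-space observables. By (50), observing $\mathsf{M}^\sigma$ on a Gaussian state $\rho \in \mathcal{G}$ yields the normal probability distribution $\mathsf{M}^\sigma_\rho = \mathcal{N}(\mu^\rho + \mu^\sigma;\ V^\rho + V^\sigma)$, with marginals

$$\mathsf{M}^\sigma_{1,\rho} = \mathcal{N}(a^\rho + a^\sigma;\ A^\rho + A^\sigma), \qquad \mathsf{M}^\sigma_{2,\rho} = \mathcal{N}(b^\rho + b^\sigma;\ B^\rho + B^\sigma). \tag{51}$$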

When aσ = 0 and bσ = 0, we have an unbiased measurement.

Scalar Observables
We now study the Gaussian approximate joint measurements of the target observables $Q_u$ and $P_v$
defined in (6).

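Proposition 12. A Gaussian bi-observable $\mathsf{M}$ with parameters $(\mu^{\mathsf{M}}, V^{\mathsf{M}}, J^{\mathsf{M}})$ is in $\mathcal{C}_{u,v}$ if and only if $J^{\mathsf{M}} = J$, where $J$ is given by (47). In this case, the condition (49) is equivalent to

$$V^{\mathsf{M}}_{11} \geq 0, \qquad V^{\mathsf{M}}_{22} \geq 0, \qquad V^{\mathsf{M}}_{11} V^{\mathsf{M}}_{22} \geq \frac{\hbar^2}{4} (\cos\alpha)^2 + (V^{\mathsf{M}}_{12})^2. \tag{52}$$

Proof. The first statement follows from Proposition 10. Then, the matrix inequality (49) reads

$$V^{\mathsf{M}} \pm \frac{i\hbar}{2} \begin{pmatrix} 0 & \cos\alpha \\ -\cos\alpha & 0 \end{pmatrix} \geq 0,$$

which is equivalent to (52).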
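The equivalence between the matrix inequality and the scalar conditions (52) can be checked numerically. The following is a minimal sketch, assuming units with ħ = 1 by default; the function names `satisfies_52` and `min_eigenvalue_49` are illustrative and not part of the original text.

```python
import math

def satisfies_52(v11, v12, v22, cos_alpha, hbar=1.0):
    """Scalar inequalities (52) for a real symmetric 2x2 matrix V."""
    bound = (hbar ** 2 / 4.0) * cos_alpha ** 2 + v12 ** 2
    return v11 >= 0 and v22 >= 0 and v11 * v22 >= bound

def min_eigenvalue_49(v11, v12, v22, cos_alpha, hbar=1.0):
    """Smallest eigenvalue of V + (i*hbar/2) [[0, c], [-c, 0]], c = cos(alpha).

    The matrix is 2x2 Hermitian, so its eigenvalues follow from the trace and
    the determinant. The opposite sign in (49) gives the complex-conjugate
    matrix, which has the same spectrum.
    """
    c = hbar * cos_alpha / 2.0
    trace = v11 + v22
    # Discriminant trace^2 - 4*det, written in a manifestly non-negative form.
    disc = math.sqrt((v11 - v22) ** 2 + 4.0 * (v12 ** 2 + c ** 2))
    return (trace - disc) / 2.0

# An admissible V, an inadmissible V, and a V on the boundary of (52).
print(satisfies_52(1.0, 0.0, 1.0, 0.5), min_eigenvalue_49(1.0, 0.0, 1.0, 0.5))
print(satisfies_52(0.1, 0.0, 0.1, 1.0), min_eigenvalue_49(0.1, 0.0, 0.1, 1.0))
print(min_eigenvalue_49(0.3, 0.0, 1.0 / (4 * 0.3), 1.0))
```

The smallest eigenvalue is non-negative exactly when (52) holds, and it vanishes when the product of the diagonal entries saturates the bound with a vanishing off-diagonal entry.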

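We write $\mathcal{C}^G_{u,v}$ for the class of the Gaussian $(u,v)$-covariant phase-space observables. An observable $\mathsf{M} \in \mathcal{C}^G_{u,v}$ is thus characterized by the couple $(\mu^{\mathsf{M}}, V^{\mathsf{M}})$. From (50) with $J^{\mathsf{M}} = J$ given by (47), we get that measuring $\mathsf{M} \in \mathcal{C}^G_{u,v}$ on a Gaussian state $\rho$ yields the probability distribution $\mathsf{M}_\rho = \mathcal{N}\!\left(\mu^\rho_{u,v} + \mu^{\mathsf{M}};\ V^\rho_{u,v} + V^{\mathsf{M}}\right)$, with $\mu^\rho_{u,v}$ and $V^\rho_{u,v}$ given by (12). Its marginals with respect to the first and second entry are, respectively,

$$\mathsf{M}_{1,\rho} = \mathcal{N}\!\left(u\cdot a^\rho + a^{\mathsf{M}};\ \mathrm{Var}(\mathsf{Q}_{u,\rho}) + V^{\mathsf{M}}_{11}\right), \qquad \mathsf{M}_{2,\rho} = \mathcal{N}\!\left(v\cdot b^\rho + b^{\mathsf{M}};\ \mathrm{Var}(\mathsf{P}_{v,\rho}) + V^{\mathsf{M}}_{22}\right). \tag{53}$$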

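Example 2. Let us construct an example of an approximate joint measurement of $Q_u$ and $P_v$, by using a noisy measurement of position along $u$ followed by a sharp measurement of momentum along $v$. Let $\Delta$ be a positive real number yielding the precision of the position measurement, and consider the POVM $\mathsf{M}$ on $\mathbb{R}^2$ given by

$$\mathsf{M}(A \times B) = \frac{1}{\sqrt{2\pi\Delta}} \int_A \exp\left\{ -\frac{(x - Q_u)^2}{4\Delta} \right\} \mathsf{P}_v(B) \exp\left\{ -\frac{(x - Q_u)^2}{4\Delta} \right\} dx, \qquad \forall A, B \in \mathcal{B}(\mathbb{R}).$$

The characteristic function of $\mathsf{M}$ is

$$\begin{aligned}
\widehat{\mathsf{M}}(k,l) &= \frac{1}{\sqrt{2\pi\Delta}} \int_{\mathbb{R}} e^{ikx} \exp\left\{ -\frac{(x - Q_u)^2}{4\Delta} \right\} \left[ \int_{\mathbb{R}} e^{ilp}\, \mathsf{P}_v(dp) \right] \exp\left\{ -\frac{(x - Q_u)^2}{4\Delta} \right\} dx \\
&= \frac{1}{\sqrt{2\pi\Delta}} \int_{\mathbb{R}} \exp\left\{ ikx - \frac{(x - Q_u)^2}{4\Delta} \right\} e^{ilP_v} \exp\left\{ -\frac{(x - Q_u)^2}{4\Delta} \right\} dx \\
&= \frac{e^{ilP_v}}{\sqrt{2\pi\Delta}} \int_{\mathbb{R}} \exp\left\{ ikx - \frac{(x - Q_u + \hbar l\, u\cdot v)^2}{4\Delta} \right\} \exp\left\{ -\frac{(x - Q_u)^2}{4\Delta} \right\} dx \\
&= \frac{1}{\sqrt{2\pi\Delta}} \exp\left\{ ilP_v - \frac{(\hbar l \cos\alpha)^2}{8\Delta} \right\} \int_{\mathbb{R}} \exp\left\{ ikx - \frac{(x - Q_u + \hbar l \cos\alpha / 2)^2}{2\Delta} \right\} dx \\
&= \exp\left\{ ilP_v + ik\left( Q_u + \frac{\hbar l \cos\alpha}{2} \right) - \frac{\Delta}{2} k^2 - \frac{(\hbar\cos\alpha)^2}{8\Delta} l^2 \right\} \\
&= W(-\hbar l v, \hbar k u) \exp\left\{ -\frac{\Delta}{2} k^2 - \frac{(\hbar\cos\alpha)^2}{8\Delta} l^2 \right\}.
\end{aligned}$$

Therefore, $\mathsf{M}$ is a Gaussian bi-observable with parameters $a^{\mathsf{M}} = 0$, $b^{\mathsf{M}} = 0$ and $J^{\mathsf{M}} = J$, where $J$ is given by (47) and $V^{\mathsf{M}}_{11} = \Delta$, $V^{\mathsf{M}}_{22} = \frac{(\hbar\cos\alpha)^2}{4\Delta}$ and $V^{\mathsf{M}}_{12} = 0$. This implies $\mathsf{M} \in \mathcal{C}^G_{u,v}$; in particular, the set $\mathcal{C}^G_{u,v}$ is non-empty. Moreover, the lower bound $V^{\mathsf{M}}_{11} V^{\mathsf{M}}_{22} = \frac{\hbar^2}{4} (\cos\alpha)^2$ is attained, cf. (52).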

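Example 3. Let us consider the case $\alpha = \pm\pi/2$; now the target observables $Q_u$ and $P_v$ are compatible and we can define a pvm $\mathsf{M}$ on $\mathbb{R}^2$ by setting $\mathsf{M}(A \times B) = \mathsf{Q}_u(A)\mathsf{P}_v(B)$ for all $A, B \in \mathcal{B}(\mathbb{R})$. Its characteristic function is

$$\widehat{\mathsf{M}}(k,l) = \int_{\mathbb{R}} e^{ikx}\, \mathsf{Q}_u(dx) \int_{\mathbb{R}} e^{ilp}\, \mathsf{P}_v(dp) = e^{i(kQ_u + lP_v)} = W(-\hbar l v, \hbar k u).$$

Then, $\mathsf{M} \in \mathcal{C}^G_{u,v}$ with parameters $a^{\mathsf{M}} = 0$, $b^{\mathsf{M}} = 0$, $V^{\mathsf{M}} = 0$ and $J^{\mathsf{M}} = J$ given by (47). Note that $\mathsf{M}$ can be regarded as the limit case of the observables of the previous example when $\cos\alpha = 0$ and $\Delta \downarrow 0$.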

6. Entropic MURs for Position and Momentum

In the case of two discrete target observables, in [41] we found an entropic bound for the precision of
their approximate joint measurements, which we named entropic incompatibility degree. Its definition
followed a three-step procedure. Firstly, we introduced an error function: when the system is in a given
state ρ, such a function quantifies the total amount of information that is lost by approximating the
target observables by means of the marginals of a bi-observable; the error function is nothing else
than the sum of the two relative entropies of the respective distributions. Then, we considered the
worst possible case by maximizing the error function over ρ, thus obtaining an entropic divergence
quantifying the approximation error in a state independent way. Finally, we got our index of
the incompatibility of the two target observables by minimizing the entropic divergence over all
bi-observables. In particular, when symmetries are present, we showed that the minimum is attained
at some covariant bi-observables. So, the covariance followed as a byproduct of the optimization
procedure, and was not a priori imposed upon the class of approximating bi-observables.
As we shall see, the extension of the previous procedure to position and momentum target
observables is not straightforward, and peculiar problems of the continuous case arise. In order to
overcome them, in this paper we shall fully analyse only a case in which explicit computations can be
done: Gaussian preparations, and Gaussian bi-observables, which we a priori assume to be covariant.
We conjecture that the final result should be independent of these simplifications, as we shall discuss
in Section 7.
As we said in Section 5, by “approximate joint measurement” we mean “a bi-observable with the
‘right’ covariance properties”.


6.1. Scalar Observables

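Given the directions $u$ and $v$, the target observables are $Q_u$ and $P_v$ in (6) with pvm's $\mathsf{Q}_u$ and $\mathsf{P}_v$. For $\rho \in \mathcal{G}$ with parameters $(\mu^\rho, V^\rho)$ given in (10), the target distributions $\mathsf{Q}_{u,\rho}$ and $\mathsf{P}_{v,\rho}$ are normal with means and variances (11).
An approximate joint measurement of $Q_u$ and $P_v$ is given by a covariant bi-observable $\mathsf{M} \in \mathcal{C}_{u,v}$; then, we denote its marginals with respect to the first and second entry by $\mathsf{M}_1$ and $\mathsf{M}_2$, respectively. For a Gaussian covariant bi-observable $\mathsf{M} \in \mathcal{C}^G_{u,v}$ with parameters $(\mu^{\mathsf{M}}, V^{\mathsf{M}})$, the distribution of $\mathsf{M}$ in a Gaussian state $\rho$ is normal,

$$\mathsf{M}_\rho = \mathcal{N}\!\left(\mu^\rho_{u,v} + \mu^{\mathsf{M}};\ V^\rho_{u,v} + V^{\mathsf{M}}\right),$$

so that its marginal distributions $\mathsf{M}_{1,\rho}$ and $\mathsf{M}_{2,\rho}$ are normal with means $u\cdot a^\rho + a^{\mathsf{M}}$ and $v\cdot b^\rho + b^{\mathsf{M}}$ and variances

$$\mathrm{Var}(\mathsf{M}_{1,\rho}) = \mathrm{Var}(\mathsf{Q}_{u,\rho}) + V^{\mathsf{M}}_{11}, \qquad \mathrm{Var}(\mathsf{M}_{2,\rho}) = \mathrm{Var}(\mathsf{P}_{v,\rho}) + V^{\mathsf{M}}_{22}. \tag{54}$$

Let us recall that $|u| = 1$, $|v| = 1$, $u\cdot v = \cos\alpha$, and that by (16) and (52), we have

$$\mathrm{Var}(\mathsf{Q}_{u,\rho})\, \mathrm{Var}(\mathsf{P}_{v,\rho}) \geq \frac{\hbar^2}{4} (\cos\alpha)^2, \qquad V^{\mathsf{M}}_{11} V^{\mathsf{M}}_{22} \geq \frac{\hbar^2}{4} (\cos\alpha)^2. \tag{55}$$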

6.1.1. Error Function

The relative entropy is the amount of information that is lost when an approximating distribution
is used in place of a target one. For this reason, we use it to give an informational quantiﬁcation of
the error made in approximating the distributions of sharp position and momentum by means of the
marginals of a joint covariant observable.

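Definition 9. Given the preparation $\rho \in \mathcal{S}$ and the covariant bi-observable $\mathsf{M} \in \mathcal{C}_{u,v}$, the error function for the scalar case is the sum of the two relative entropies:

$$S(\rho, \mathsf{M}) := S(\mathsf{Q}_{u,\rho} \| \mathsf{M}_{1,\rho}) + S(\mathsf{P}_{v,\rho} \| \mathsf{M}_{2,\rho}). \tag{56}$$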

The relative entropy is invariant under a change of the unit of measurement, so that the error
function is scale invariant, too; indeed, it quantiﬁes a relative error, not an absolute one. In the Gaussian
case the error function can be explicitly computed.

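Proposition 13 (Error function for the scalar Gaussian case). For $\rho \in \mathcal{G}$ and $\mathsf{M} \in \mathcal{C}^G_{u,v}$, the error function is

$$S(\rho, \mathsf{M}) = \frac{\log e}{2} \left[ s(x) + s(y) + \Delta(\rho, \mathsf{M}) \right], \tag{57}$$

where

$$x := \frac{V^{\mathsf{M}}_{11}}{\mathrm{Var}(\mathsf{Q}_{u,\rho})}, \qquad y := \frac{V^{\mathsf{M}}_{22}}{\mathrm{Var}(\mathsf{P}_{v,\rho})}, \qquad \Delta(\rho, \mathsf{M}) := \frac{(a^{\mathsf{M}})^2}{\mathrm{Var}(\mathsf{M}_{1,\rho})} + \frac{(b^{\mathsf{M}})^2}{\mathrm{Var}(\mathsf{M}_{2,\rho})},$$

and $s : [0, +\infty) \to [0, +\infty)$ is the following $C^\infty$ strictly increasing function with $s(0) = 0$:

$$s(x) := \ln(1 + x) - \frac{x}{1 + x}. \tag{58}$$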

Proof. The statement follows by a straightforward combination of (32), (34), (53) and (56).
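Formula (57) can be cross-checked against the relative entropies of the two pairs of normal marginals in (53). The sketch below is illustrative only: it assumes that the logarithm is in base 2, that (34) reduces in one dimension to the standard closed form of the relative entropy between normal distributions, and that the target means and variances are supplied directly as numbers. The function names are hypothetical.

```python
import math

LOG2E = math.log2(math.e)  # "log e" with base-2 logarithms (an assumption)

def s(x):
    """The function s of (58)."""
    return math.log1p(x) - x / (1.0 + x)

def rel_entropy_gauss(mean_p, var_p, mean_q, var_q):
    """S(p || q) in bits for p = N(mean_p; var_p) and q = N(mean_q; var_q)."""
    return 0.5 * LOG2E * (math.log(var_q / var_p) + var_p / var_q - 1.0
                          + (mean_p - mean_q) ** 2 / var_q)

def error_function(var_q, var_p, v11, v22, a_m=0.0, b_m=0.0):
    """Formula (57): var_q, var_p are the target variances; v11, v22, a_m, b_m
    are the noise variances and biases of the Gaussian bi-observable."""
    x = v11 / var_q
    y = v22 / var_p
    bias = a_m ** 2 / (var_q + v11) + b_m ** 2 / (var_p + v22)
    return 0.5 * LOG2E * (s(x) + s(y) + bias)

# Direct sum of the two relative entropies, using the marginals (53).
direct = (rel_entropy_gauss(0.3, 2.0, 0.3 + 0.1, 2.0 + 0.5)
          + rel_entropy_gauss(-1.0, 0.7, -1.0 - 0.2, 0.7 + 0.4))
print(direct, error_function(2.0, 0.7, 0.5, 0.4, a_m=0.1, b_m=-0.2))
```

The two printed numbers agree up to rounding, and the means of the target distributions drop out, as expected from (57).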

Note that the error function does not depend on the mixed covariances $u\cdot C^\rho v$ and $V^{\mathsf{M}}_{12}$. Note also that, if we select a possible approximation $\mathsf{M}$, then the error function $S(\rho, \mathsf{M})$ decreases for states $\rho$ with increasing sharp variances $\mathrm{Var}(\mathsf{Q}_{u,\rho})$ and $\mathrm{Var}(\mathsf{P}_{v,\rho})$: the loss of information decreases when the sharp distributions make the approximation error negligible. Finally, note that

$$s(x) + s(y) = \ln[(1 + x)(1 + y)] + (1 + x)^{-1} + (1 + y)^{-1} - 2,$$

$$1 + x = \frac{\mathrm{Var}(\mathsf{M}_{1,\rho})}{\mathrm{Var}(\mathsf{Q}_{u,\rho})}, \qquad 1 + y = \frac{\mathrm{Var}(\mathsf{M}_{2,\rho})}{\mathrm{Var}(\mathsf{P}_{v,\rho})}.$$
This means that, apart from the term Δ(ρ, M) due to the bias, our error function S(ρ, M) only
depends on the two ratios “variance of the approximating distribution over variance of the target
distribution”. Thus, in order to optimize the error function, one has to optimize these two ratios.
We use formula (57) to ﬁrstly give a state dependent MUR, and then, following the scheme of [41],
a state independent MUR.
A lower bound for the error function can be found by minimizing it over all possible approximate
joint measurements of Qu and Pv . First of all, let us remark that this minimization makes sense because
we consider only (u, v)-covariant bi-observables: if we minimized over all possible bi-observables,
then the minimum would be trivially zero for every given preparation $\rho$. Indeed, the trivial bi-observable
$\mathsf{M}(A \times B) = \mathsf{Q}_{u,\rho}(A)\,\mathsf{P}_{v,\rho}(B)\,\mathbb{1}$ yields $S(\rho, \mathsf{M}) = 0$.
When minimizing the error function over all (u, v)-covariant bi-observables, both the minimum
and the best measurement attaining it are state dependent. When α = ±π/2, the two target
observables are compatible, so that their joint measurement trivially exists (see Example 3) and
we get infM∈Cu,v S(ρ, M) = 0. In order to have explicit results for any angle α, we consider only the
Gaussian case.

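Theorem 1 (State dependent MUR, scalar observables). For every $\rho \in \mathcal{G}$ and $\mathsf{M} \in \mathcal{C}^G_{u,v}$,

$$S(\mathsf{Q}_{u,\rho} \| \mathsf{M}_{1,\rho}) + S(\mathsf{P}_{v,\rho} \| \mathsf{M}_{2,\rho}) \geq c_\rho(\alpha), \tag{59}$$

where the lower bound is

$$c_\rho(\alpha) = s(z_\rho) \log e = (\log e) \left[ \ln(1 + z_\rho) - \frac{z_\rho}{1 + z_\rho} \right], \tag{60}$$

with

$$z_\rho := \frac{\hbar\, |\cos\alpha|}{2 \sqrt{\mathrm{Var}(\mathsf{Q}_{u,\rho})\, \mathrm{Var}(\mathsf{P}_{v,\rho})}} \in [0, 1]. \tag{61}$$

The lower bound is tight and the optimal measurement is unique: $c_\rho(\alpha) = S(\rho, \mathsf{M}_*)$, for a unique $\mathsf{M}_* \in \mathcal{C}^G_{u,v}$; such a Gaussian $(u,v)$-covariant bi-observable is characterized by

$$\mu^{\mathsf{M}_*} = 0, \qquad V^{\mathsf{M}_*}_{12} = 0, \qquad V^{\mathsf{M}_*}_{11} = \frac{\hbar}{2} \sqrt{\frac{\mathrm{Var}(\mathsf{Q}_{u,\rho})}{\mathrm{Var}(\mathsf{P}_{v,\rho})}}\, |\cos\alpha|, \qquad V^{\mathsf{M}_*}_{22} = \frac{\hbar}{2} \sqrt{\frac{\mathrm{Var}(\mathsf{P}_{v,\rho})}{\mathrm{Var}(\mathsf{Q}_{u,\rho})}}\, |\cos\alpha|. \tag{62}$$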

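Proof. As already discussed, the case $\cos\alpha = 0$ is trivial. If $\cos\alpha \neq 0$, we have to minimize the error function (57) over $\mathsf{M}$. First of all we can eliminate the positive term $\Delta(\rho, \mathsf{M})$ by taking an unbiased measurement. Then, since $s$ is an increasing function, by the second condition in (55) we can also take $V^{\mathsf{M}_*}_{11} V^{\mathsf{M}_*}_{22} = \frac{\hbar^2}{4} (\cos\alpha)^2$. This implies $V^{\mathsf{M}_*}_{12} = 0$ by (52). In this case the error function (57) reduces to

$$S(\rho, \mathsf{M}_*) = \frac{\log e}{2} \left[ s(x) + s(z_\rho^2 / x) \right], \qquad x = \frac{V^{\mathsf{M}_*}_{11}}{\mathrm{Var}(\mathsf{Q}_{u,\rho})},$$

with $z_\rho$ given by (61); by the first of (55), we have $z_\rho \in (0, 1]$.

Now, we can minimize the error function with respect to $x$ by studying its first derivative:

$$\frac{d}{dx} \left[ s(x) + s(z_\rho^2 / x) \right] = \frac{x}{(1 + x)^2} - \frac{z_\rho^4}{x\, (z_\rho^2 + x)^2} = \frac{\left( x^2 - z_\rho^2 \right) \left( x^2 + 2 z_\rho^2 x + z_\rho^2 \right)}{x\, (z_\rho^2 + x)^2 (1 + x)^2}.$$

Having $x > 0$, we immediately get that $x = z_\rho$ gives the unique minimum. Thus

$$S(\rho, \mathsf{M}) \geq S(\rho, \mathsf{M}_*) = s(z_\rho) \log e = \log(1 + z_\rho) - \frac{z_\rho}{1 + z_\rho} \log e,$$

and

$$V^{\mathsf{M}_*}_{11} = z_\rho \mathrm{Var}(\mathsf{Q}_{u,\rho}) \equiv \frac{\hbar}{2} \sqrt{\frac{\mathrm{Var}(\mathsf{Q}_{u,\rho})}{\mathrm{Var}(\mathsf{P}_{v,\rho})}}\, |\cos\alpha|, \qquad V^{\mathsf{M}_*}_{22} = z_\rho \mathrm{Var}(\mathsf{P}_{v,\rho}) \equiv \frac{\hbar}{2} \sqrt{\frac{\mathrm{Var}(\mathsf{P}_{v,\rho})}{\mathrm{Var}(\mathsf{Q}_{u,\rho})}}\, |\cos\alpha|,$$

which concludes the proof.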
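The minimization in the proof can be reproduced numerically with a crude grid search. This is only a sketch under stated assumptions: ħ = 1 by default, base-2 logarithms, and arbitrary sample values for the two target variances; all function names are illustrative.

```python
import math

LOG2E = math.log2(math.e)  # "log e" with base-2 logarithms (an assumption)

def s(x):
    """The function s of (58)."""
    return math.log1p(x) - x / (1.0 + x)

def z_rho(var_q, var_p, cos_alpha, hbar=1.0):
    """The parameter z_rho of (61)."""
    return hbar * abs(cos_alpha) / (2.0 * math.sqrt(var_q * var_p))

def error_on_boundary(x, z):
    """(log e / 2) [s(x) + s(z^2 / x)]: unbiased measurement with
    V11 * V22 = (hbar^2 / 4) cos^2(alpha), parametrized by x = V11 / Var(Q)."""
    return 0.5 * LOG2E * (s(x) + s(z * z / x))

def lower_bound(z):
    """The value s(z) log e found at the end of the proof."""
    return LOG2E * s(z)

z = z_rho(var_q=1.0, var_p=0.5, cos_alpha=0.8)
grid = [0.001 * k for k in range(1, 5001)]
x_best = min(grid, key=lambda x: error_on_boundary(x, z))
print(z, x_best, error_on_boundary(x_best, z), lower_bound(z))
```

On this grid the minimizer sits next to `z`, and the minimum agrees with `lower_bound(z)` up to the grid resolution. Evaluating `lower_bound(1.0)` gives the largest value the bound can take for z in (0, 1].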

Remark 3. The minimum information loss $c_\rho(\alpha)$ depends on both the preparation $\rho$ and the angle $\alpha$. When $\alpha \neq \pm\pi/2$, that is when the target observables are not compatible, $c_\rho(\alpha)$ is strictly greater than zero. This is a peculiar quantum effect: given $\rho$, $u$ and $v$, there is no Gaussian approximate joint measurement of $Q_u$ and $P_v$ that can approximate them arbitrarily well. On the other hand, in the limit $\alpha \to \pm\pi/2$, the lower bound $c_\rho(\alpha)$ goes to zero; so, the case of commuting target observables is approached with continuity.

Remark 4. The lower bound cρ (α) goes to zero also in the classical limit h̄ → 0. This holds for every angle
α and every Gaussian state ρ.

Remark 5. Another case in which $c_\rho(\alpha) \to 0$ is the limit of large uncertainty states, that is, if we let the product $\mathrm{Var}(\mathsf{Q}_{u,\rho})\, \mathrm{Var}(\mathsf{P}_{v,\rho}) \to +\infty$: our entropic MUR disappears because, roughly speaking, the variance of (at least) one of the two target observables goes to infinity, its relative entropy vanishes by itself, and an optimal covariant bi-observable $\mathsf{M}_*$ has to take care of (at most) only the other target observable.

Remark 6. Actually, something similar to the previous remark happens also at the macroscopic limit, and does not require the measuring instrument to be an optimal one; indeed, unbiasedness is enough in this case. This happens because the error function $S(\rho, \mathsf{M})$ quantifies a relative error; even if the measurement approximation $\mathsf{M}$ is fixed, such an error can be reduced by suitably changing the preparation $\rho$. Indeed, if we consider the position and momentum of a macroscopic particle, for instance the center of mass of many particles, it is natural that its state has much larger position and momentum uncertainties than the intrinsic uncertainties of the measuring instrument; that is, $\frac{V^{\mathsf{M}}_{11}}{\mathrm{Var}(\mathsf{Q}_{u,\rho})} \ll 1$ and $\frac{V^{\mathsf{M}}_{22}}{\mathrm{Var}(\mathsf{P}_{v,\rho})} \ll 1$, implying that the error function (57) is negligible. In practice, this is a classical case: the preparation has large position and momentum uncertainties and the measuring instrument is relatively good. In this situation we do not see the difference between the joint measurement of position and momentum and their separate sharp observations.

Remark 7. The optimal approximating joint measurement $\mathsf{M}_* \in \mathcal{C}^G_{u,v}$ is unique; by (62) it depends on the preparation $\rho$ one is considering, as well as on the directions $u$ and $v$. A realization of $\mathsf{M}_*$ is the measuring procedure of Example 2.

Remark 8. The MUR (59) is scale invariant, as both the error function S(ρ, M) and the lower bound cρ (α) are such.


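Remark 9. For $\cos\alpha \neq 0$, we get $\inf_{\mathsf{M} \in \mathcal{C}^G_{u,v}} S(\rho, \mathsf{M}) = s(z_\rho) \log e$, where $z_\rho$ is defined by (61). As $z_\rho$ ranges in the interval $(0, 1]$, the quantity $\inf_{\mathsf{M} \in \mathcal{C}^G_{u,v}} S(\rho, \mathsf{M})$ takes all the values in the interval $\left(0,\ 1 - \frac{\log e}{2}\right]$, so that

$$\sup_{\rho \in \mathcal{G}}\ \inf_{\mathsf{M} \in \mathcal{C}^G_{u,v}} S(\rho, \mathsf{M}) = 1 - \frac{\log e}{2}. \tag{63}$$

In order to get this result, we needed $\cos\alpha \neq 0$; however, the final result does not depend on $\alpha$. Therefore, in the $\sup_\rho \inf_{\mathsf{M}}$-approach of (63), the continuity from quantum to classical is lost.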

6.1.2. Entropic Divergence of Qu , Pv from M

Now we want to find an entropic quantification of the error made in observing M ∈ Cu,v as
an approximation of Qu and Pv in an arbitrary state ρ. The procedure of [41], already suggested
in [19] (Section VI.C) for a different error function, is to consider the worst case by maximizing the
error function over all the states. However, in the continuous framework this is not possible for the
error function (56); indeed, from (57) we get supρ∈G S(ρ, M) = +∞ even if we restrict to unbiased
covariant bi-observables.
Anyway, the reason for $S(\rho, \mathsf{M})$ to diverge is classical: it depends only on the continuous nature of $Q_u$ and $P_v$, without any relation to their (quantum) incompatibility. Indeed, as we noted in Section 3.1, if an instrument measuring a random variable $X \sim \mathcal{N}(a; \alpha^2)$ adds an independent noise $\nu \sim \mathcal{N}(b; \beta^2)$, thus producing an output $X + \nu \sim \mathcal{N}(a + b; \alpha^2 + \beta^2)$, then the relative entropy $S(X \| X + \nu)$ diverges for $\alpha^2 \to 0$; this is what happens if we fix the noise and we allow for arbitrarily peaked preparations. Thus, the sum $S(\mathsf{Q}_{u,\rho} \| \mathsf{M}_{1,\rho}) + S(\mathsf{P}_{v,\rho} \| \mathsf{M}_{2,\rho})$ diverges if, fixed $\mathsf{M}$, we let $\mathrm{Var}(\mathsf{Q}_{u,\rho})$ or $\mathrm{Var}(\mathsf{P}_{v,\rho})$ go to 0.
The difference between the classical and quantum frameworks emerges if we bound from below the variances of the sharp position and momentum observables. Indeed, in the classical framework we have $\inf_{b, \beta^2} \sup_{\alpha^2 \geq \epsilon} S(X \| X + \nu) = 0$ for every $\epsilon > 0$; the same holds for the sum of two relative entropies if no relation exists between the two noises. On the contrary, in the quantum framework the entropic MURs appear due to the relation between the position and momentum errors occurring in any approximate joint measurement.
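The classical divergence described above is easy to see numerically. The sketch below assumes base-2 logarithms and uses the standard closed form of the relative entropy between one-dimensional normal distributions; the function name `rel_entropy_noise` is hypothetical.

```python
import math

LOG2E = math.log2(math.e)  # "log e" with base-2 logarithms (an assumption)

def rel_entropy_noise(alpha2, b, beta2):
    """S(X || X + nu) in bits for X ~ N(a; alpha2) and an independent
    noise nu ~ N(b; beta2); the mean a of X drops out."""
    r = beta2 / alpha2
    return 0.5 * LOG2E * (math.log1p(r) - r / (1.0 + r)
                          + b * b / (alpha2 + beta2))

# Fixed unbiased noise, increasingly peaked preparations: the error grows
# without bound as alpha2 -> 0.
values = [rel_entropy_noise(10.0 ** (-k), 0.0, 1.0) for k in range(0, 7)]
print(values)

# Variance bounded from below (alpha2 >= 0.5): for fixed noise the error is
# largest at alpha2 = 0.5, and it can be made as small as desired by
# shrinking the unbiased noise.
print(rel_entropy_noise(0.5, 0.0, 1e-9))
```

The first list is strictly increasing, while the last value is negligible: in the classical setting nothing prevents both noises from being made small at once, which is exactly what the quantum constraint (52) forbids.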
In order to avoid that $S(\rho, \mathsf{M}) \to +\infty$ due to merely classical effects, we thus introduce the following subset of the Gaussian states:

$$\mathcal{G}^\epsilon_{u,v} := \left\{ \rho \in \mathcal{G} : \mathrm{Var}(\mathsf{Q}_{u,\rho}) \geq \epsilon_1,\ \mathrm{Var}(\mathsf{P}_{v,\rho}) \geq \epsilon_2 \right\}, \qquad \epsilon_i > 0, \tag{64}$$

and we evaluate the error made in approximating $Q_u$ and $P_v$ with the marginals of a $(u,v)$-covariant bi-observable by maximizing the error function over all these states.

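Definition 10. The Gaussian $\epsilon$-entropic divergence of $\mathsf{Q}_u, \mathsf{P}_v$ from $\mathsf{M} \in \mathcal{C}_{u,v}$ is

$$D^G_\epsilon(\mathsf{Q}_u, \mathsf{P}_v \| \mathsf{M}) := \sup_{\rho \in \mathcal{G}^\epsilon_{u,v}} S(\rho, \mathsf{M}). \tag{65}$$

For Gaussian $\mathsf{M}$, depending on the choice of the thresholds $\epsilon_1$ and $\epsilon_2$, the divergence $D^G_\epsilon(\mathsf{Q}_u, \mathsf{P}_v \| \mathsf{M})$ can be easily computed or at least bounded.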

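Theorem 2. Let the bi-observable $\mathsf{M} \in \mathcal{C}^G_{u,v}$ be fixed.

(i) For $\epsilon_1 \epsilon_2 \geq \frac{\hbar^2}{4} (\cos\alpha)^2$, the divergence $D^G_\epsilon(\mathsf{Q}_u, \mathsf{P}_v \| \mathsf{M})$ is given by

$$D^G_\epsilon(\mathsf{Q}_u, \mathsf{P}_v \| \mathsf{M}) = S(\rho_\epsilon(u,v), \mathsf{M}) = \frac{\log e}{2} \left[ s(x_\epsilon) + s(y_\epsilon) + \Delta(\epsilon; \mathsf{M}) \right], \tag{66}$$

where $\rho_\epsilon(u,v)$ is any Gaussian state with $\mathrm{Var}(\mathsf{Q}_{u,\rho_\epsilon(u,v)}) = \epsilon_1$ and $\mathrm{Var}(\mathsf{P}_{v,\rho_\epsilon(u,v)}) = \epsilon_2$, and

$$x_\epsilon := \frac{V^{\mathsf{M}}_{11}}{\epsilon_1}, \qquad y_\epsilon := \frac{V^{\mathsf{M}}_{22}}{\epsilon_2}, \qquad \Delta(\epsilon; \mathsf{M}) := \frac{(a^{\mathsf{M}})^2}{V^{\mathsf{M}}_{11} + \epsilon_1} + \frac{(b^{\mathsf{M}})^2}{V^{\mathsf{M}}_{22} + \epsilon_2}.$$

(ii) For $\epsilon_1 \epsilon_2 < \frac{\hbar^2}{4} (\cos\alpha)^2$, the divergence $D^G_\epsilon(\mathsf{Q}_u, \mathsf{P}_v \| \mathsf{M})$ is bounded from below by

$$D^G_\epsilon(\mathsf{Q}_u, \mathsf{P}_v \| \mathsf{M}) \geq S(\rho_\epsilon(u,v), \mathsf{M}) = \frac{\log e}{2} \left[ s(x_\epsilon) + s(y_\epsilon) + \Delta(\epsilon; \mathsf{M}) \right], \tag{67}$$

where $\rho_\epsilon(u,v)$ is any Gaussian state with $\mathrm{Var}(\mathsf{Q}_{u,\rho_\epsilon(u,v)}) = \epsilon_1$ and $\mathrm{Var}(\mathsf{P}_{v,\rho_\epsilon(u,v)}) = \frac{\hbar^2}{4\epsilon_1} (\cos\alpha)^2$, and

$$x_\epsilon := \frac{V^{\mathsf{M}}_{11}}{\epsilon_1}, \qquad y_\epsilon := \frac{4 \epsilon_1 V^{\mathsf{M}}_{22}}{\hbar^2 (\cos\alpha)^2}, \qquad \Delta(\epsilon; \mathsf{M}) := \frac{(a^{\mathsf{M}})^2}{V^{\mathsf{M}}_{11} + \epsilon_1} + \frac{(b^{\mathsf{M}})^2}{V^{\mathsf{M}}_{22} + \frac{\hbar^2}{4\epsilon_1} (\cos\alpha)^2}.$$

The existence of the above states $\rho_\epsilon(u,v)$ is guaranteed by Proposition 3.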

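Proof. By Proposition 3, maximizing the error function over the states in $\mathcal{G}^\epsilon_{u,v}$ is the same as maximizing (57) over the parameters $\mathrm{Var}(\mathsf{Q}_{u,\rho})$ and $\mathrm{Var}(\mathsf{P}_{v,\rho})$ satisfying (55) and (64) (note that in the bias $\Delta(\rho, \mathsf{M})$, the variances $\mathrm{Var}(\mathsf{M}_{1,\rho})$ and $\mathrm{Var}(\mathsf{M}_{2,\rho})$ depend on $\mathrm{Var}(\mathsf{Q}_{u,\rho})$ and $\mathrm{Var}(\mathsf{P}_{v,\rho})$ by (54)).

(i) In the case $\epsilon_1 \epsilon_2 \geq \frac{\hbar^2}{4} (\cos\alpha)^2$, the thresholds themselves satisfy the Heisenberg uncertainty relation, and so equality (66) follows from the expression (57) and the fact that the functions $s(x)$, $s(y)$, $\Delta(\rho, \mathsf{M})$ are decreasing in $\mathrm{Var}(\mathsf{Q}_{u,\rho})$ and $\mathrm{Var}(\mathsf{P}_{v,\rho})$.

(ii) In the case $\epsilon_1 \epsilon_2 < \frac{\hbar^2}{4} (\cos\alpha)^2$, we have to take into account the relation (55) for $\mathrm{Var}(\mathsf{Q}_{u,\rho})$ and $\mathrm{Var}(\mathsf{P}_{v,\rho})$: the supremum of $S(\rho, \mathsf{M})$ is achieved when $\mathrm{Var}(\mathsf{Q}_{u,\rho})\, \mathrm{Var}(\mathsf{P}_{v,\rho}) = \frac{\hbar^2}{4} (\cos\alpha)^2$, with $\mathrm{Var}(\mathsf{Q}_{u,\rho}) \geq \epsilon_1$ and $\mathrm{Var}(\mathsf{P}_{v,\rho}) \geq \epsilon_2$. Then inequality (67) follows by choosing $\mathrm{Var}(\mathsf{Q}_{u,\rho}) = \epsilon_1$ and $\mathrm{Var}(\mathsf{P}_{v,\rho}) = \frac{\hbar^2}{4\epsilon_1} (\cos\alpha)^2$.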
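Item (i) can be illustrated by sampling admissible target variances and checking that none of them beats the thresholds themselves. The following is a minimal sketch, assuming ħ = 1 by default, base-2 logarithms, and Gaussian states described only through their two target variances; the names `error_function` and `divergence_case_i` are hypothetical.

```python
import math
import random

LOG2E = math.log2(math.e)  # "log e" with base-2 logarithms (an assumption)

def s(x):
    """The function s of (58)."""
    return math.log1p(x) - x / (1.0 + x)

def error_function(var_q, var_p, v11, v22, a_m=0.0, b_m=0.0):
    """Formula (57) for target variances var_q, var_p."""
    bias = a_m ** 2 / (var_q + v11) + b_m ** 2 / (var_p + v22)
    return 0.5 * LOG2E * (s(v11 / var_q) + s(v22 / var_p) + bias)

def divergence_case_i(eps1, eps2, v11, v22, a_m=0.0, b_m=0.0,
                      cos_alpha=1.0, hbar=1.0):
    """Formula (66): the error function evaluated at the thresholds.
    Only valid when eps1 * eps2 >= (hbar^2 / 4) cos^2(alpha)."""
    if eps1 * eps2 < (hbar ** 2 / 4.0) * cos_alpha ** 2:
        raise ValueError("case (ii) of Theorem 2 applies: only a bound holds")
    return error_function(eps1, eps2, v11, v22, a_m, b_m)

rng = random.Random(0)
d = divergence_case_i(1.0, 0.5, v11=0.4, v22=0.9, a_m=0.2, b_m=-0.1)
# Random target variances above the thresholds eps1 = 1.0 and eps2 = 0.5.
samples = [error_function(1.0 + rng.expovariate(1.0),
                          0.5 + rng.expovariate(1.0),
                          0.4, 0.9, 0.2, -0.1)
           for _ in range(1000)]
print(d, max(samples))
```

Every sampled value stays below `d`, in line with the monotonicity argument used in the proof.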

Remark 10. The conditions on the states $\rho_\epsilon(u,v)$ do not depend on $\mathsf{M}$, but only on the parameters defining $\mathcal{G}^\epsilon_{u,v}$. Thus, in the case $\epsilon_1 \epsilon_2 \geq \frac{\hbar^2}{4} (\cos\alpha)^2$, any choice of $\rho_\epsilon(u,v)$ yields a state which is the worst one for every Gaussian approximate joint measurement $\mathsf{M}$.

6.1.3. Entropic Incompatibility Degree of Qu and Pv

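The last step is to optimize the state independent $\epsilon$-entropic divergence (65) over all the approximate joint measurements of $Q_u$ and $P_v$. This is done in the next definition.

Definition 11. The Gaussian $\epsilon$-entropic incompatibility degree of $\mathsf{Q}_u, \mathsf{P}_v$ is

$$c^G_{inc}(\mathsf{Q}_u, \mathsf{P}_v; \epsilon) := \inf_{\mathsf{M} \in \mathcal{C}^G_{u,v}} D^G_\epsilon(\mathsf{Q}_u, \mathsf{P}_v \| \mathsf{M}) \equiv \inf_{\mathsf{M} \in \mathcal{C}^G_{u,v}}\ \sup_{\rho \in \mathcal{G}^\epsilon_{u,v}} S(\rho, \mathsf{M}). \tag{68}$$

Again, depending on the choice of the thresholds $\epsilon_1$ and $\epsilon_2$, the entropic incompatibility degree $c^G_{inc}(\mathsf{Q}_u, \mathsf{P}_v; \epsilon)$ can be easily computed or at least bounded.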

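Theorem 3. (i) For $\epsilon_1 \epsilon_2 \geq \frac{\hbar^2}{4} (\cos\alpha)^2$, the incompatibility degree $c^G_{inc}(\mathsf{Q}_u, \mathsf{P}_v; \epsilon)$ is given by

$$c^G_{inc}(\mathsf{Q}_u, \mathsf{P}_v; \epsilon) = (\log e) \left[ \ln\left( 1 + \frac{\hbar\, |\cos\alpha|}{2 \sqrt{\epsilon_1 \epsilon_2}} \right) - \frac{\hbar\, |\cos\alpha|}{\hbar\, |\cos\alpha| + 2 \sqrt{\epsilon_1 \epsilon_2}} \right]. \tag{69}$$

The infimum in (68) is attained and the optimal measurement is unique, in the sense that

$$c^G_{inc}(\mathsf{Q}_u, \mathsf{P}_v; \epsilon) = D^G_\epsilon(\mathsf{Q}_u, \mathsf{P}_v \| \mathsf{M}_\epsilon) \tag{70}$$

for a unique $\mathsf{M}_\epsilon \in \mathcal{C}^G_{u,v}$; such a bi-observable is characterized by

$$a^{\mathsf{M}_\epsilon} = 0, \qquad b^{\mathsf{M}_\epsilon} = 0, \qquad V^{\mathsf{M}_\epsilon}_{11} = \frac{\hbar}{2} \sqrt{\frac{\epsilon_1}{\epsilon_2}}\, |\cos\alpha|, \qquad V^{\mathsf{M}_\epsilon}_{22} = \frac{\hbar}{2} \sqrt{\frac{\epsilon_2}{\epsilon_1}}\, |\cos\alpha|, \qquad V^{\mathsf{M}_\epsilon}_{12} = 0. \tag{71}$$

(ii) For $\epsilon_1 \epsilon_2 < \frac{\hbar^2}{4} (\cos\alpha)^2$, the incompatibility degree $c^G_{inc}(\mathsf{Q}_u, \mathsf{P}_v; \epsilon)$ is bounded from below by

$$c^G_{inc}(\mathsf{Q}_u, \mathsf{P}_v; \epsilon) \geq (\log e) \left[ \ln(2) - \frac{1}{2} \right]. \tag{72}$$

The latter bound is

$$(\log e) \left[ \ln(2) - \frac{1}{2} \right] = S\left( \rho_\epsilon(u,v), \mathsf{M}_\epsilon \right) = \inf_{\mathsf{M} \in \mathcal{C}^G_{u,v}} S\left( \rho_\epsilon(u,v), \mathsf{M} \right), \tag{73}$$

where the state $\rho_\epsilon(u,v)$ is defined in item (ii) of Theorem 2 and $\mathsf{M}_\epsilon$ is the bi-observable in $\mathcal{C}^G_{u,v}$ such that

$$a^{\mathsf{M}_\epsilon} = 0, \qquad b^{\mathsf{M}_\epsilon} = 0, \qquad V^{\mathsf{M}_\epsilon}_{11} = \epsilon_1, \qquad V^{\mathsf{M}_\epsilon}_{22} = \frac{\hbar^2}{4\epsilon_1} (\cos\alpha)^2, \qquad V^{\mathsf{M}_\epsilon}_{12} = 0. \tag{74}$$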

h̄2
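Proof. (i) In the case $\epsilon_1 \epsilon_2 \geq \frac{\hbar^2}{4} (\cos\alpha)^2$, due to (66), the proof is the same as that of Theorem 1 with the replacements $\mathrm{Var}(\mathsf{Q}_{u,\rho}) \to \epsilon_1$ and $\mathrm{Var}(\mathsf{P}_{v,\rho}) \to \epsilon_2$.
(ii) In the case $\epsilon_1 \epsilon_2 < \frac{\hbar^2}{4} (\cos\alpha)^2$, starting from (67), the proof is the same as that of Theorem 1 with the replacements $\mathrm{Var}(\mathsf{Q}_{u,\rho}) \to \epsilon_1$ and $\mathrm{Var}(\mathsf{P}_{v,\rho}) \to \frac{\hbar^2}{4\epsilon_1} (\cos\alpha)^2$.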
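The two regimes can be collected in a small helper. This is a sketch under stated assumptions: ħ = 1 by default, base-2 logarithms, and the case (i) value taken to be s(z) log e with z = ħ|cos α| / (2√(ε₁ε₂)), which is what the replacements in the proof give when applied to Theorem 1. The function names are illustrative.

```python
import math

LOG2E = math.log2(math.e)  # "log e" with base-2 logarithms (an assumption)

def s(x):
    """The function s of (58)."""
    return math.log1p(x) - x / (1.0 + x)

def incompatibility_degree(eps1, eps2, cos_alpha, hbar=1.0):
    """Return (value, is_exact).

    Case (i), z <= 1: the value s(z) log e is exact.
    Case (ii), z > 1: only the lower bound (72) is returned.
    """
    z = hbar * abs(cos_alpha) / (2.0 * math.sqrt(eps1 * eps2))
    if z <= 1.0:
        return LOG2E * s(z), True
    return LOG2E * (math.log(2.0) - 0.5), False

def optimal_variances(eps1, eps2, cos_alpha, hbar=1.0):
    """(V11, V22) of the optimal bi-observable (71), valid in case (i)."""
    c = hbar * abs(cos_alpha) / 2.0
    return c * math.sqrt(eps1 / eps2), c * math.sqrt(eps2 / eps1)

print(incompatibility_degree(1.0, 1.0, 0.0))    # compatible targets
print(incompatibility_degree(0.5, 0.5, 1.0))    # thresholds at the Heisenberg bound
print(incompatibility_degree(0.01, 0.01, 1.0))  # case (ii): lower bound only
print(optimal_variances(2.0, 0.5, 1.0))
```

At z = 1 the exact value of case (i) coincides with the bound of case (ii), so the helper is continuous across the two regimes; the optimal variances always saturate the second inequality in (55).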

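Remark 11 (State independent MUR, scalar observables). By means of the above results, we can formulate a state independent entropic MUR for the position $Q_u$ and the momentum $P_v$ in the following way. Chosen two positive thresholds $\epsilon_1$ and $\epsilon_2$, there exists a preparation $\rho_\epsilon(u,v) \in \mathcal{G}^\epsilon_{u,v}$ (introduced in Theorem 2) such that, for all Gaussian approximate joint measurements $\mathsf{M}$ of $Q_u$ and $P_v$, we have

$$S(\mathsf{Q}_{u,\rho_\epsilon(u,v)} \| \mathsf{M}_{1,\rho_\epsilon(u,v)}) + S(\mathsf{P}_{v,\rho_\epsilon(u,v)} \| \mathsf{M}_{2,\rho_\epsilon(u,v)}) \geq \begin{cases} (\log e) \left[ \ln\left( 1 + \dfrac{\hbar\, |\cos\alpha|}{2 \sqrt{\epsilon_1 \epsilon_2}} \right) - \dfrac{\hbar\, |\cos\alpha|}{\hbar\, |\cos\alpha| + 2 \sqrt{\epsilon_1 \epsilon_2}} \right] & \text{if } \epsilon_1 \epsilon_2 \geq \frac{\hbar^2}{4} (\cos\alpha)^2, \\[2ex] (\log e) \left[ \ln(2) - \dfrac{1}{2} \right] & \text{if } \epsilon_1 \epsilon_2 < \frac{\hbar^2}{4} (\cos\alpha)^2. \end{cases} \tag{75}$$

The inequality follows by (66) and (69) in the case $\epsilon_1 \epsilon_2 \geq \frac{\hbar^2}{4} (\cos\alpha)^2$, and (73) in the case $\epsilon_1 \epsilon_2 < \frac{\hbar^2}{4} (\cos\alpha)^2$.
What is relevant is that, for every approximate joint measurement $\mathsf{M}$, the total information loss $S(\rho, \mathsf{M})$ does exceed the lower bound (75) even if the set of states $\mathcal{G}^\epsilon_{u,v}$ forbids preparations $\rho$ with too peaked target distributions. Indeed, without the thresholds $\epsilon_1, \epsilon_2$, it would be trivial to exceed the lower bound (75), as we noted in Section 6.1.2.
We also remark that, chosen $\epsilon_1$ and $\epsilon_2$, we found a single state $\rho_\epsilon(u,v)$ in $\mathcal{G}^\epsilon_{u,v}$ that satisfies (75) for every $\mathsf{M}$, so that $\rho_\epsilon(u,v)$ is a 'bad' state for all Gaussian approximate joint measurements of position and momentum.
When $\epsilon_1 \epsilon_2 \geq \frac{\hbar^2}{4} (\cos\alpha)^2$, the optimal approximate joint measurement $\mathsf{M}_\epsilon$ is unique in the class of Gaussian $(u,v)$-covariant bi-observables; it depends only on the class of preparations $\mathcal{G}^\epsilon_{u,v}$: it is the best measurement for the worst choice of the preparation in the class $\mathcal{G}^\epsilon_{u,v}$.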

Remark 12. The entropic incompatibility degree $c^G_{inc}(\mathsf{Q}_u, \mathsf{P}_v; \epsilon)$ is strictly positive for $\cos\alpha \neq 0$ (incompatible target observables) and it goes to zero in the limits $\alpha \to \pm\pi/2$ (compatible observables), $\hbar \to 0$ (classical limit), and $\epsilon_1 \epsilon_2 \to \infty$ (large uncertainty states).

Remark 13. The scale invariance of the relative entropy extends to the error function $S(\rho, \mathsf{M})$, hence to the divergence $D^G_\epsilon(\mathsf{Q}_u, \mathsf{P}_v \| \mathsf{M})$ and the entropic incompatibility degree $c^G_{inc}(\mathsf{Q}_u, \mathsf{P}_v; \epsilon)$, as well as the entropic MUR (75).

6.2. Vector Observables

Now the target observables are Q and P given in (3), with pvm’s Q and P; the approximating
bi-observables are the covariant phase-space observables C of Deﬁnition 5. Each bi-observable M ∈ C
is of the form M = Mσ for some σ ∈ S, where Mσ is given by (43). CG is the subset of the Gaussian
bi-observables in C, and Mσ ∈ CG if and only if σ is a Gaussian state.
We proceed to deﬁne the analogues of the scalar quantities introduced in Sections 6.1.1–6.1.3.
In order to do it, in the next proposition we recall some known results on matrices.

Proposition 14 ([50–52,65]). Let $M_1$ and $M_2$ be $n \times n$ complex matrices such that $M_1 > M_2 > 0$. Then, we have $0 < M_1^{-1} < M_2^{-1}$. Moreover, if $s : \mathbb{R}_+ \to \mathbb{R}$ is a strictly increasing continuous function, we have $\mathrm{Tr}\{s(M_1)\} > \mathrm{Tr}\{s(M_2)\}$.

6.2.1. Error Function

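Definition 12. Given the preparation $\rho \in \mathcal{S}$ and the covariant phase-space observable $\mathsf{M}^\sigma$, with $\sigma \in \mathcal{S}$, the error function for the vector case is the sum of the two relative entropies:

$$S(\rho,\mathsf{M}^\sigma) := S(\mathsf{Q}_\rho\|\mathsf{M}^\sigma_{1,\rho}) + S(\mathsf{P}_\rho\|\mathsf{M}^\sigma_{2,\rho}). \tag{76}$$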

As in the scalar case, the error function is scale invariant, it quantifies a relative error, and we always have $S(\rho,\mathsf{M}^\sigma) > 0$ because position and momentum are incompatible. Indeed, since the marginals of a bi-observable $\mathsf{M}^\sigma \in \mathcal{C}$ turn out to be convolutions of the respective sharp observables $\mathsf{Q}$ and $\mathsf{P}$ with some probability densities on $\mathbb{R}^n$, we have $\mathsf{Q}_\rho \neq \mathsf{M}^\sigma_{1,\rho}$ and $\mathsf{P}_\rho \neq \mathsf{M}^\sigma_{2,\rho}$ for all states $\rho$; this is an easy consequence, for instance, of Problem 26.1, p. 362, in [66].
In the Gaussian case the error function can be explicitly computed.

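Proposition 15 (Error function for the vector Gaussian case). For $\rho, \sigma \in \mathcal{G}$, the error function has the two equivalent expressions:

$$S(\rho,\mathsf{M}^\sigma) = \frac{\log e}{2}\Big[\operatorname{Tr}\big\{s(E_{\rho,\sigma}) + s(F_{\rho,\sigma})\big\} + a^\sigma\cdot(A^\rho+A^\sigma)^{-1}a^\sigma + b^\sigma\cdot(B^\rho+B^\sigma)^{-1}b^\sigma\Big] \tag{77a}$$

$$= \frac{\log e}{2}\Big[\operatorname{Tr}\big\{s(N_{\rho,\sigma}^{-1}) + s(R_{\rho,\sigma}^{-1})\big\} + a^\sigma\cdot(A^\rho+A^\sigma)^{-1}a^\sigma + b^\sigma\cdot(B^\rho+B^\sigma)^{-1}b^\sigma\Big], \tag{77b}$$

where the function $s$ is defined in (58), and

$$E_{\rho,\sigma} := (A^\rho)^{-1/2}A^\sigma(A^\rho)^{-1/2}, \qquad F_{\rho,\sigma} := (B^\rho)^{-1/2}B^\sigma(B^\rho)^{-1/2}, \tag{78a}$$

$$N_{\rho,\sigma} := (A^\sigma)^{-1/2}A^\rho(A^\sigma)^{-1/2}, \qquad R_{\rho,\sigma} := (B^\sigma)^{-1/2}B^\rho(B^\sigma)^{-1/2}. \tag{78b}$$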

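Proof. First of all, recall that

$$\mathsf{Q}_\rho = \mathcal{N}(a^\rho; A^\rho), \qquad \mathsf{M}^\sigma_{1,\rho} = \mathcal{N}(a^\rho + a^\sigma; A^\rho + A^\sigma),$$
$$\mathsf{P}_\rho = \mathcal{N}(b^\rho; B^\rho), \qquad \mathsf{M}^\sigma_{2,\rho} = \mathcal{N}(b^\rho + b^\sigma; B^\rho + B^\sigma).$$

A direct application of (34) yields

$$S(\mathsf{Q}_\rho\|\mathsf{M}^\sigma_{1,\rho}) = \frac{1}{2}\log\frac{\det(A^\rho+A^\sigma)}{\det A^\rho} + \frac{\log e}{2}\Big[\operatorname{Tr}\big\{(A^\rho+A^\sigma)^{-1}A^\rho - \mathbb{1}\big\} + a^\sigma\cdot(A^\rho+A^\sigma)^{-1}a^\sigma\Big].$$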
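We can transform this equation by using

$$\frac{\det(A^\sigma+A^\rho)}{\det A^\rho} = \det\big[(A^\rho)^{-1/2}(A^\sigma+A^\rho)(A^\rho)^{-1/2}\big] = \det\big(\mathbb{1}+E_{\rho,\sigma}\big),$$

$$\ln\det\big(\mathbb{1}+E_{\rho,\sigma}\big) = \operatorname{Tr}\big\{\ln\big(\mathbb{1}+E_{\rho,\sigma}\big)\big\},$$

$$\operatorname{Tr}\big\{(A^\rho+A^\sigma)^{-1}A^\rho - \mathbb{1}\big\} = \operatorname{Tr}\big\{(A^\rho)^{1/2}(A^\rho+A^\sigma)^{-1}(A^\rho)^{1/2} - \mathbb{1}\big\} = -\operatorname{Tr}\big\{(\mathbb{1}+E_{\rho,\sigma})^{-1}E_{\rho,\sigma}\big\}.$$

This gives

$$S(\mathsf{Q}_\rho\|\mathsf{M}^\sigma_{1,\rho}) = \frac{\log e}{2}\Big[\operatorname{Tr}\{s(E_{\rho,\sigma})\} + a^\sigma\cdot(A^\rho+A^\sigma)^{-1}a^\sigma\Big].$$

In the same way a similar expression is obtained for $S(\mathsf{P}_\rho\|\mathsf{M}^\sigma_{2,\rho})$ and (77a) is proved.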
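On the other hand, by using

$$\ln\frac{\det(A^\sigma+A^\rho)}{\det A^\rho} = \ln\frac{\det\big(\mathbb{1}+N_{\rho,\sigma}\big)}{\det N_{\rho,\sigma}} = \ln\det\big(\mathbb{1}+N_{\rho,\sigma}^{-1}\big) = \operatorname{Tr}\big\{\ln\big(\mathbb{1}+N_{\rho,\sigma}^{-1}\big)\big\},$$

$$\operatorname{Tr}\big\{(A^\rho+A^\sigma)^{-1}A^\rho - \mathbb{1}\big\} = -\operatorname{Tr}\big\{(A^\rho+A^\sigma)^{-1}A^\sigma\big\} = -\operatorname{Tr}\big\{\big(\mathbb{1}+N_{\rho,\sigma}^{-1}\big)^{-1}N_{\rho,\sigma}^{-1}\big\},$$

and the analogous expressions involving $B^\rho$ and $R_{\rho,\sigma}$, one gets (77b).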
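The identities used in this proof can be checked numerically. The sketch below is an illustration only: it works in nats (so $\log e = 1$), assumes $s(x) = \ln(1+x) - x/(1+x)$ as the form of the function in (58), and uses the standard closed-form relative entropy between two multivariate normal laws in place of (34). It compares that direct value for the position marginal with the trace expressions in terms of $E_{\rho,\sigma}$ and $N_{\rho,\sigma}^{-1}$; the momentum marginal works identically with $B$ in place of $A$.

```python
import numpy as np

def s(x):
    # Assumed form of the function s from (58)
    return np.log1p(x) - x / (1.0 + x)

def random_spd(rng, n):
    # Random symmetric positive-definite matrix (stand-in for a covariance block)
    X = rng.normal(size=(n, n))
    return X @ X.T + 0.5 * np.eye(n)

def inv_sqrt(M):
    # M^{-1/2} for a symmetric positive-definite M
    w, V = np.linalg.eigh(M)
    return (V / np.sqrt(w)) @ V.T

def kl_gaussian(mu0, S0, mu1, S1):
    # Relative entropy S(N(mu0, S0) || N(mu1, S1)) in nats
    n = len(mu0)
    d = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (logdet1 - logdet0 + np.trace(np.linalg.solve(S1, S0)) - n
                  + d @ np.linalg.solve(S1, d))

rng = np.random.default_rng(1)
n = 3
A_rho, A_sigma = random_spd(rng, n), random_spd(rng, n)
a_rho, a_sigma = rng.normal(size=n), rng.normal(size=n)

# Direct value of S(Q_rho || M^sigma_{1,rho})
direct = kl_gaussian(a_rho, A_rho, a_rho + a_sigma, A_rho + A_sigma)

# Position part of (77a), written with E_{rho,sigma}
shift = a_sigma @ np.linalg.solve(A_rho + A_sigma, a_sigma)
E = inv_sqrt(A_rho) @ A_sigma @ inv_sqrt(A_rho)
via_E = 0.5 * (np.sum(s(np.linalg.eigvalsh(E))) + shift)

# Position part of (77b), written with N_{rho,sigma}^{-1}
N = inv_sqrt(A_sigma) @ A_rho @ inv_sqrt(A_sigma)
via_N = 0.5 * (np.sum(s(1.0 / np.linalg.eigvalsh(N))) + shift)
```

The three numbers `direct`, `via_E` and `via_N` should agree up to rounding, reflecting the fact that $E_{\rho,\sigma}$ and $N_{\rho,\sigma}^{-1}$ have the same spectrum.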

State Dependent Lower Bound

In principle, a state dependent lower bound for the error function could be found by analogy with Theorem 1, by taking again the infimum over all joint covariant measurements, that is $\inf_\sigma S(\rho,\mathsf{M}^\sigma)$. By considering only Gaussian states $\rho$ and measurements $\mathsf{M}^\sigma$, from (18), (77a) and (78a), the infimum over $\sigma \in \mathcal{G}$ can be reduced to an infimum over the matrices $A^\sigma$:

$$\inf_{\sigma\in\mathcal{G}} S(\rho,\mathsf{M}^\sigma) = \frac{\log e}{2}\,\inf_{A^\sigma}\operatorname{Tr}\Big\{s\big((A^\rho)^{-1/2}A^\sigma(A^\rho)^{-1/2}\big) + s\Big(\frac{\hbar^2}{4}(B^\rho)^{-1/2}(A^\sigma)^{-1}(B^\rho)^{-1/2}\Big)\Big\}.$$

The above equality follows since the monotonicity of $s$ (Proposition 14) implies that the trace term in (77a) attains its minimum when $B^\sigma = \frac{\hbar^2}{4}(A^\sigma)^{-1}$. However, it remains an open problem to explicitly compute the infimum over the matrices $A^\sigma$ when the preparation $\rho$ is arbitrary.
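Nevertheless, the computations can be done at least for a preparation $\rho_*$ of minimum uncertainty (Proposition 5). Indeed, by (22) we get

$$\inf_{\sigma\in\mathcal{G}} S(\rho_*,\mathsf{M}^\sigma) = \frac{\log e}{2}\,\inf_{A^\sigma}\operatorname{Tr}\big\{s\big(E_{\rho_*,\sigma}\big) + s\big(E_{\rho_*,\sigma}^{-1}\big)\big\}.$$

Now we can diagonalize $E_{\rho_*,\sigma}$ and minimize over its eigenvalues; since $s(x) + s(x^{-1})$ attains its minimum value at $x = 1$, this procedure gives $E_{\rho_*,\sigma} = \mathbb{1}$. So, by denoting by $\sigma_*$ the state giving the minimum, we have

$$A^{\sigma_*} = A^{\rho_*}, \qquad B^{\sigma_*} = B^{\rho_*} = \frac{\hbar^2}{4}(A^{\rho_*})^{-1}, \tag{79}$$

$$\inf_{\sigma\in\mathcal{G}} S(\rho_*,\mathsf{M}^\sigma) = S(\rho_*,\mathsf{M}^{\sigma_*}) = n\,s(1)\log e. \tag{80}$$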

For an arbitrary $\rho \in \mathcal{G}$, we can use the last formula to deduce an upper bound for $\inf_{\sigma\in\mathcal{G}} S(\rho,\mathsf{M}^\sigma)$. Indeed, if $\rho_*$ is a minimum uncertainty state with $A^{\rho_*} = A^\rho$, then $B^\rho \ge \frac{\hbar^2}{4}(A^\rho)^{-1} = B^{\rho_*}$ by (19), and, using again the state $\sigma_*$ of (79), we find

$$\inf_{\sigma\in\mathcal{G}} S(\rho,\mathsf{M}^\sigma) \le S(\rho,\mathsf{M}^{\sigma_*}) \le S(\rho_*,\mathsf{M}^{\sigma_*}) = n\,s(1)\log e.$$

The second inequality in the last formula follows from (77b), (78b) and the monotonicity of $s$ (Proposition 14).
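The minimization step behind (79) and (80) can be spelled out in one line. The short derivation below is a sketch that assumes the explicit form $s(x) = \ln(1+x) - x/(1+x)$ for the function defined in (58), which is the form consistent with the identities in the proof of Proposition 15.

```latex
\begin{aligned}
f(x) &:= s(x) + s(x^{-1})
      = \ln(1+x) + \ln\frac{1+x}{x} - \frac{x}{1+x} - \frac{1}{1+x}
      = 2\ln(1+x) - \ln x - 1, \\
f'(x) &= \frac{2}{1+x} - \frac{1}{x} = \frac{x-1}{x\,(1+x)},
\end{aligned}
```

so $f$ decreases on $(0,1)$, increases on $(1,+\infty)$, and its unique minimum is $f(1) = 2\ln 2 - 1 = 2\,s(1)$. Summing over the $n$ eigenvalues and multiplying by $\frac{\log e}{2}$ gives the value $n\,s(1)\log e$ in (80).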

6.2.2. Entropic Divergence of Q, P from Mσ

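In order to define a state independent measure of the error made in regarding the marginals of $\mathsf{M}^\sigma$ as approximations of $\mathsf{Q}$ and $\mathsf{P}$, we can proceed along the lines of the scalar case in Section 6.1.2. To this end, we introduce the following vector analogue of the Gaussian states defined in (64):

$$\mathcal{G}_\epsilon := \{\rho \in \mathcal{G} : A^\rho \ge \epsilon_1\mathbb{1},\ B^\rho \ge \epsilon_2\mathbb{1}\}, \qquad \epsilon \equiv (\epsilon_1,\epsilon_2), \quad \epsilon_i > 0. \tag{81}$$

In the vector case, Definition 10 then reads as follows.

Definition 13. The Gaussian $\epsilon$-entropic divergence of $\mathsf{Q}$, $\mathsf{P}$ from $\mathsf{M}^\sigma \in \mathcal{C}$ is

$$D^G_\epsilon(\mathsf{Q},\mathsf{P}\|\mathsf{M}^\sigma) := \sup_{\rho\in\mathcal{G}_\epsilon} S(\rho,\mathsf{M}^\sigma). \tag{82}$$

As in the scalar case, when $\mathsf{M}^\sigma$ is Gaussian, depending on the choice of the product $\epsilon_1\epsilon_2$, we can compute the divergence $D^G_\epsilon(\mathsf{Q},\mathsf{P}\|\mathsf{M}^\sigma)$ or at least bound it from below.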

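Theorem 4. Let the bi-observable $\mathsf{M}^\sigma \in \mathcal{C}^G$ be fixed.

(i) For $\epsilon_1\epsilon_2 \ge \frac{\hbar^2}{4}$, the divergence $D^G_\epsilon(\mathsf{Q},\mathsf{P}\|\mathsf{M}^\sigma)$ is given by

$$D^G_\epsilon(\mathsf{Q},\mathsf{P}\|\mathsf{M}^\sigma) = S(\rho_\epsilon,\mathsf{M}^\sigma) = \frac{\log e}{2}\Big[\operatorname{Tr}\big\{s(A^\sigma/\epsilon_1) + s(B^\sigma/\epsilon_2)\big\} + a^\sigma\cdot(A^\sigma+\epsilon_1\mathbb{1})^{-1}a^\sigma + b^\sigma\cdot(B^\sigma+\epsilon_2\mathbb{1})^{-1}b^\sigma\Big], \tag{83}$$

where $\rho_\epsilon$ is any Gaussian state with $A^{\rho_\epsilon} = \epsilon_1\mathbb{1}$ and $B^{\rho_\epsilon} = \epsilon_2\mathbb{1}$.

(ii) For $\epsilon_1\epsilon_2 < \frac{\hbar^2}{4}$, the divergence $D^G_\epsilon(\mathsf{Q},\mathsf{P}\|\mathsf{M}^\sigma)$ is bounded from below by

$$D^G_\epsilon(\mathsf{Q},\mathsf{P}\|\mathsf{M}^\sigma) \ge S(\rho_\epsilon,\mathsf{M}^\sigma) = \frac{\log e}{2}\Big[\operatorname{Tr}\big\{s(A^\sigma/\epsilon_1) + s\big(4\epsilon_1 B^\sigma/\hbar^2\big)\big\} + a^\sigma\cdot(A^\sigma+\epsilon_1\mathbb{1})^{-1}a^\sigma + b^\sigma\cdot\Big(B^\sigma+\frac{\hbar^2}{4\epsilon_1}\mathbb{1}\Big)^{-1}b^\sigma\Big], \tag{84}$$

where $\rho_\epsilon$ is any Gaussian state with $A^{\rho_\epsilon} = \epsilon_1\mathbb{1}$ and $B^{\rho_\epsilon} = \frac{\hbar^2}{4\epsilon_1}\mathbb{1}$.

Proof. (i) In the case $\epsilon_1\epsilon_2 \ge \frac{\hbar^2}{4}$, for $\rho \in \mathcal{G}_\epsilon$ we have $N_{\rho,\sigma} \ge \epsilon_1(A^\sigma)^{-1}$ and $R_{\rho,\sigma} \ge \epsilon_2(B^\sigma)^{-1}$; by Proposition 14 we get

$$\operatorname{Tr}\{s(N_{\rho,\sigma}^{-1})\} \le \operatorname{Tr}\{s(A^\sigma/\epsilon_1)\}, \qquad \operatorname{Tr}\{s(R_{\rho,\sigma}^{-1})\} \le \operatorname{Tr}\{s(B^\sigma/\epsilon_2)\},$$

$$(A^\rho+A^\sigma)^{-1} \le (\epsilon_1\mathbb{1}+A^\sigma)^{-1}, \qquad (B^\rho+B^\sigma)^{-1} \le (\epsilon_2\mathbb{1}+B^\sigma)^{-1}.$$

By using these inequalities in the expression (77b), we get (83).
(ii) In the case $\epsilon_1\epsilon_2 < \frac{\hbar^2}{4}$, the lower bound (84) follows by evaluating $S(\rho,\mathsf{M}^\sigma)$ at the state $\rho = \rho_\epsilon \in \mathcal{G}_\epsilon$ with $A^{\rho_\epsilon} = \epsilon_1\mathbb{1}$ and $B^{\rho_\epsilon} = \frac{\hbar^2}{4\epsilon_1}\mathbb{1}$.

Note that $\rho_\epsilon$ does not depend on $\sigma$, but only on the parameters defining $\mathcal{G}_\epsilon$: again, in the case $\epsilon_1\epsilon_2 \ge \frac{\hbar^2}{4}$, the error attains its maximum at a state which is independent of the approximate measurement.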
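To illustrate item (i) of Theorem 4, the following sketch samples preparations in the scalar case $n = 1$ and compares their error function with the value at $\rho_\epsilon$. It is an illustration under stated assumptions, not part of the proof: units with $\hbar = 1$ and $\log e = 1$ (nats), the assumed form $s(x) = \ln(1+x) - x/(1+x)$ of the function in (58), and arbitrary illustrative values for the parameters of $\sigma$. For $n = 1$ the error function (77a) depends on $\rho$ only through the variances $A^\rho$ and $B^\rho$, so sampling these two numbers is enough.

```python
import numpy as np

def s(x):
    # Assumed form of the function s from (58)
    return np.log1p(x) - x / (1.0 + x)

def error_function(A_rho, B_rho, A_sig, B_sig, a_sig=0.0, b_sig=0.0):
    # Scalar (n = 1) instance of (77a), in nats (log e = 1)
    return 0.5 * (s(A_sig / A_rho) + s(B_sig / B_rho)
                  + a_sig**2 / (A_rho + A_sig) + b_sig**2 / (B_rho + B_sig))

hbar, eps1, eps2 = 1.0, 1.0, 0.5                   # eps1 * eps2 >= hbar^2 / 4
A_sig, B_sig, a_sig, b_sig = 0.8, 0.9, 0.3, -0.2   # illustrative; A_sig * B_sig >= hbar^2 / 4

rng = np.random.default_rng(2)
A_rho = eps1 * (1.0 + rng.exponential(size=10000))  # variances with A_rho >= eps1
B_rho = eps2 * (1.0 + rng.exponential(size=10000))  # variances with B_rho >= eps2

sampled = error_function(A_rho, B_rho, A_sig, B_sig, a_sig, b_sig)
at_rho_eps = error_function(eps1, eps2, A_sig, B_sig, a_sig, b_sig)
largest_sampled = float(sampled.max())
```

In this sample `largest_sampled` never exceeds `at_rho_eps`, in line with the statement that the supremum in (82) is reached at the boundary state $\rho_\epsilon$ whatever the measurement.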

6.2.3. Entropic Incompatibility Degree of Q and P

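By analogy with Section 6.1.3, we can optimize the $\epsilon$-entropic divergence over all the approximate joint measurements of $\mathsf{Q}$ and $\mathsf{P}$.

Definition 14. The Gaussian $\epsilon$-entropic incompatibility degree of $\mathsf{Q}$ and $\mathsf{P}$ is

$$c^G_{\rm inc}(\mathsf{Q},\mathsf{P};\epsilon) := \inf_{\sigma\in\mathcal{G}} D^G_\epsilon(\mathsf{Q},\mathsf{P}\|\mathsf{M}^\sigma) \equiv \inf_{\sigma\in\mathcal{G}}\,\sup_{\rho\in\mathcal{G}_\epsilon} S(\rho,\mathsf{M}^\sigma). \tag{85}$$

Again, depending on the product $\epsilon_1\epsilon_2$, we can compute or at least bound $c^G_{\rm inc}(\mathsf{Q},\mathsf{P};\epsilon)$ from below.

Theorem 5. (i) For $\epsilon_1\epsilon_2 \ge \frac{\hbar^2}{4}$, the incompatibility degree $c^G_{\rm inc}(\mathsf{Q},\mathsf{P};\epsilon)$ is given by

$$c^G_{\rm inc}(\mathsf{Q},\mathsf{P};\epsilon) = n\,(\log e)\left[\ln\left(1+\frac{\hbar}{2\sqrt{\epsilon_1\epsilon_2}}\right) - \frac{\hbar}{2\sqrt{\epsilon_1\epsilon_2}+\hbar}\right]. \tag{86}$$

The infimum in (85) is attained and the optimal measurement is unique, in the sense that

$$c^G_{\rm inc}(\mathsf{Q},\mathsf{P};\epsilon) = D^G_\epsilon(\mathsf{Q},\mathsf{P}\|\mathsf{M}^{\sigma_\epsilon}) \tag{87}$$

for a unique $\sigma_\epsilon \in \mathcal{G}$; such a state is the minimal uncertainty state characterized by

$$a^{\sigma_\epsilon} = 0, \qquad b^{\sigma_\epsilon} = 0, \qquad A^{\sigma_\epsilon} = \frac{\hbar}{2}\sqrt{\frac{\epsilon_1}{\epsilon_2}}\,\mathbb{1}, \qquad B^{\sigma_\epsilon} = \frac{\hbar}{2}\sqrt{\frac{\epsilon_2}{\epsilon_1}}\,\mathbb{1}, \qquad C^{\sigma_\epsilon} = 0. \tag{88}$$

(ii) For $\epsilon_1\epsilon_2 < \frac{\hbar^2}{4}$, the incompatibility degree $c^G_{\rm inc}(\mathsf{Q},\mathsf{P};\epsilon)$ is bounded from below by

$$c^G_{\rm inc}(\mathsf{Q},\mathsf{P};\epsilon) \ge n\,(\log e)\left[\ln 2 - \frac{1}{2}\right]. \tag{89}$$

The latter bound is

$$n\,(\log e)\left[\ln 2 - \frac{1}{2}\right] = S(\rho_\epsilon,\mathsf{M}^{\sigma_\epsilon}) = \inf_{\sigma\in\mathcal{G}} S(\rho_\epsilon,\mathsf{M}^\sigma), \tag{90}$$

where the preparation $\rho_\epsilon$ is defined in item (ii) of Theorem 4 and $\sigma_\epsilon$ is the state in $\mathcal{G}$ such that

$$a^{\sigma_\epsilon} = 0, \qquad b^{\sigma_\epsilon} = 0, \qquad A^{\sigma_\epsilon} = \epsilon_1\mathbb{1}, \qquad B^{\sigma_\epsilon} = \frac{\hbar^2}{4\epsilon_1}\mathbb{1}, \qquad C^{\sigma_\epsilon} = 0. \tag{91}$$

Proof. (i) In the case $\epsilon_1\epsilon_2 \ge \frac{\hbar^2}{4}$, from the expression (83) we get immediately $a^\sigma = 0$, $b^\sigma = 0$ and by (19) we have $B^\sigma \ge \frac{\hbar^2}{4}(A^\sigma)^{-1}$. So, by (83) and Propositions 3 and 14, we get $B^\sigma = \frac{\hbar^2}{4}(A^\sigma)^{-1}$, and

$$\inf_{\sigma\in\mathcal{G}}\,\sup_{\rho\in\mathcal{G}_\epsilon} S(\rho,\mathsf{M}^\sigma) = \frac{\log e}{2}\,\inf_{A^\sigma}\operatorname{Tr}\Big\{s(A^\sigma/\epsilon_1) + s\Big(\frac{\hbar^2}{4\epsilon_2}(A^\sigma)^{-1}\Big)\Big\}.$$

By minimizing over all the eigenvalues of $A^\sigma$, we get the minimum (86), which is attained if and only if $A^\sigma$ is as in (88). Hence, $A^\sigma$ and $B^\sigma$ are as in (88). This implies that any optimal state $\sigma$ is a minimum uncertainty state; so, $C^\sigma = 0$ and the state $\sigma$ is unique.
(ii) In the case $\epsilon_1\epsilon_2 < \frac{\hbar^2}{4}$, by (19) and Proposition 14, inequality (84) implies

$$\inf_{\sigma\in\mathcal{G}}\,\sup_{\rho\in\mathcal{G}_\epsilon} S(\rho,\mathsf{M}^\sigma) \ge \frac{\log e}{2}\,\inf_{A^\sigma}\operatorname{Tr}\big\{s(A^\sigma/\epsilon_1) + s\big(\epsilon_1(A^\sigma)^{-1}\big)\big\}.$$

By minimizing over all the eigenvalues of $A^\sigma$, we get (89). Then (90) holds for $\rho_\epsilon$ as in item (ii) of Theorem 4 and $\sigma_\epsilon$ in (91).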
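The eigenvalue minimization in item (i) of the proof can be reproduced numerically for a single eigenvalue $a$ of $A^\sigma$. The sketch below is an illustration only: it assumes the form $s(x) = \ln(1+x) - x/(1+x)$ for the function in (58), works in nats with $\hbar = 1$, picks arbitrary thresholds, and uses a brute-force search on a logarithmic grid rather than any method from the paper.

```python
import numpy as np

def s(x):
    # Assumed form of the function s from (58)
    return np.log1p(x) - x / (1.0 + x)

def objective(a, eps1, eps2, hbar):
    # One-eigenvalue version of the trace minimized in the proof of Theorem 5(i)
    return s(a / eps1) + s(hbar**2 / (4.0 * eps2 * a))

def closed_form_minimum(eps1, eps2, hbar):
    # Per-eigenvalue value behind (86): 2 * s(hbar / (2 * sqrt(eps1 * eps2)))
    return 2.0 * s(hbar / (2.0 * np.sqrt(eps1 * eps2)))

hbar, eps1, eps2 = 1.0, 2.0, 0.5    # illustrative thresholds, eps1 * eps2 >= hbar^2 / 4
grid = np.exp(np.linspace(np.log(1e-3), np.log(1e3), 200001))
values = objective(grid, eps1, eps2, hbar)

a_numeric = float(grid[np.argmin(values)])
a_claimed = 0.5 * hbar * np.sqrt(eps1 / eps2)   # eigenvalue of A^sigma in (88)
min_numeric = float(values.min())
min_claimed = float(closed_form_minimum(eps1, eps2, hbar))
```

Multiplying `min_claimed` by $\frac{\log e}{2}$ and by the number $n$ of eigenvalues gives the right-hand side of (86).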

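Remark 14 (State independent MUR, vector observables). By means of the above results, we can formulate the following state independent entropic MUR for the position $Q$ and momentum $P$. Chosen two positive thresholds $\epsilon_1$ and $\epsilon_2$, there exists a preparation $\rho_\epsilon \in \mathcal{G}_\epsilon$ (introduced in Theorem 4) such that, for all Gaussian approximate joint measurements $\mathsf{M}^\sigma$ of $\mathsf{Q}$ and $\mathsf{P}$, we have

$$S(\mathsf{Q}_{\rho_\epsilon}\|\mathsf{M}^\sigma_{1,\rho_\epsilon}) + S(\mathsf{P}_{\rho_\epsilon}\|\mathsf{M}^\sigma_{2,\rho_\epsilon}) \ge \begin{cases} n\,(\log e)\left[\ln\left(1+\dfrac{\hbar}{2\sqrt{\epsilon_1\epsilon_2}}\right) - \dfrac{\hbar}{2\sqrt{\epsilon_1\epsilon_2}+\hbar}\right], & \text{if } \epsilon_1\epsilon_2 \ge \dfrac{\hbar^2}{4},\\[2ex] n\,(\log e)\left[\ln 2 - \dfrac{1}{2}\right], & \text{if } \epsilon_1\epsilon_2 < \dfrac{\hbar^2}{4}. \end{cases} \tag{92}$$

The inequality follows by (83) and (86) for $\epsilon_1\epsilon_2 \ge \frac{\hbar^2}{4}$, and (90) for $\epsilon_1\epsilon_2 < \frac{\hbar^2}{4}$.
Thus, also in the vector case, for every approximate joint measurement $\mathsf{M}^\sigma$, the total information loss $S(\rho,\mathsf{M}^\sigma)$ does exceed the lower bound (92) even if $\mathcal{G}_\epsilon$ forbids preparations $\rho$ with too peaked target distributions. Moreover, chosen $\epsilon_1$ and $\epsilon_2$, one can fix again a single ‘bad’ state $\rho_\epsilon$ in $\mathcal{G}_\epsilon$ that satisfies (92) for all Gaussian approximate joint measurements $\mathsf{M}^\sigma$ of $\mathsf{Q}$ and $\mathsf{P}$.
Whenever $\epsilon_1\epsilon_2 \ge \frac{\hbar^2}{4}$, the optimal approximating joint measurement $\mathsf{M}^{\sigma_\epsilon}$ is unique in the class of Gaussian covariant bi-observables; it corresponds to a minimum uncertainty state $\sigma_\epsilon$ which depends only on the chosen class of preparations $\mathcal{G}_\epsilon$, that is, on the thresholds $\epsilon_1$ and $\epsilon_2$: $\mathsf{M}^{\sigma_\epsilon}$ is the best measurement for the worst choice of the preparation in that class.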
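The right-hand side of (92) is easy to evaluate. The helper below is a minimal sketch with hypothetical names (`entropic_mur_bound`, `log_e`); it assumes $s(x) = \ln(1+x) - x/(1+x)$ for the function in (58), so that the first branch of (92) is $n\,(\log e)\,s\big(\hbar/(2\sqrt{\epsilon_1\epsilon_2})\big)$ and the second is $n\,(\log e)\,s(1)$.

```python
import math

def s(x):
    # Assumed form of the function s from (58)
    return math.log1p(x) - x / (1.0 + x)

def entropic_mur_bound(n, eps1, eps2, hbar=1.0, log_e=1.0):
    """Right-hand side of (92).

    log_e = 1 gives the bound in nats; log_e = math.log2(math.e) gives bits.
    """
    if eps1 <= 0 or eps2 <= 0:
        raise ValueError("thresholds must be positive")
    if eps1 * eps2 >= hbar**2 / 4.0:
        return n * log_e * s(hbar / (2.0 * math.sqrt(eps1 * eps2)))
    return n * log_e * (math.log(2.0) - 0.5)

# Illustrative values in units with hbar = 1
at_threshold = entropic_mur_bound(3, 0.5, 0.5)   # eps1 * eps2 = hbar^2 / 4
below = entropic_mur_bound(3, 0.1, 0.5)          # eps1 * eps2 < hbar^2 / 4
wide = entropic_mur_bound(3, 1e6, 1e6)           # eps1 * eps2 >> hbar^2 / 4
```

The two branches agree at $\epsilon_1\epsilon_2 = \hbar^2/4$, so the bound is continuous in the thresholds, and it becomes very small when $\epsilon_1\epsilon_2 \gg \hbar^2/4$.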

Remark 15. For n = 1, the vector lower bound in (92) reduces to the scalar lower bound found in (75) for two
parallel directions u and v; for n ≥ 1, the bound linearly increases with n.


Remark 16. The entropic incompatibility degree $c^G_{\rm inc}(\mathsf{Q},\mathsf{P};\epsilon)$ is strictly positive (the target observables are incompatible) and, by (86), it goes to zero in the limits $\hbar \to 0$ (classical limit) and $\epsilon_1\epsilon_2 \to \infty$ (large uncertainty states).

Remark 17. Similarly to Remark 6 for scalar target observables, also the MUR (92) is actually ineffective for macroscopic systems. Indeed, suppose we are concerned with position and momentum of a macroscopic particle, say the center of mass of a multi-particle system (in this case $n = 3$). The states $\rho$ which can be prepared in practice have macroscopic widths, say $\rho \in \mathcal{G}_\epsilon$ with ‘large’ thresholds $\epsilon_i$ and $\epsilon_1\epsilon_2 \gg \hbar^2/4$. Then, we consider a measuring instrument $\mathsf{M}^{\sigma_*}$ having a high precision with respect to this class of states, but not necessarily attaining a precision near the quantum limits. For instance, let us take $\mathsf{M}^{\sigma_*} \in \mathcal{C}^G$ with $A^{\sigma_*} = \delta_1\mathbb{1}$, $B^{\sigma_*} = \delta_2\mathbb{1}$, and $0 < \delta_1 \ll \epsilon_1$, $0 < \delta_2 \ll \epsilon_2$; we assume $\mathsf{M}^{\sigma_*}$ is also unbiased: $a^{\sigma_*} = 0$, $b^{\sigma_*} = 0$. Obviously, $\delta_1\delta_2 \ge \hbar^2/4$ must hold. Then, $\forall \rho \in \mathcal{G}_\epsilon$ by (77a) and (78a) we have

$$E_{\rho,\sigma_*} = \delta_1 (A^\rho)^{-1} \le \frac{\delta_1}{\epsilon_1}\mathbb{1}, \qquad F_{\rho,\sigma_*} = \delta_2 (B^\rho)^{-1} \le \frac{\delta_2}{\epsilon_2}\mathbb{1},$$

$$0 < S(\rho,\mathsf{M}^{\sigma_*}) = \frac{\log e}{2}\operatorname{Tr}\big\{s(E_{\rho,\sigma_*}) + s(F_{\rho,\sigma_*})\big\} \le \frac{n\log e}{2}\big[s(\delta_1/\epsilon_1) + s(\delta_2/\epsilon_2)\big].$$

By (58) the function $s$ is increasing and it behaves as $s(x) \simeq x^2/2$ in a neighborhood of zero; in the present case $\delta_1/\epsilon_1 \ll 1$ and $\delta_2/\epsilon_2 \ll 1$, thus implying that the error function is negligible. This is practically a ‘classical’ case: the preparation has ‘large’ position and momentum uncertainties and the measuring instrument is ‘relatively good’. In this situation we do not see the difference between the joint measurement of position and momentum and their separate sharp distributions. Of course the bound (92) continues to hold, but it is also negligible since $\epsilon_1\epsilon_2 \gg \hbar^2/4$.
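The orders of magnitude in Remark 17 can be made concrete with a few lines. The sketch below assumes $s(x) = \ln(1+x) - x/(1+x)$ for the function in (58) and uses a purely hypothetical precision ratio $\delta_i/\epsilon_i = 10^{-3}$; it checks the small-$x$ behaviour $s(x) \simeq x^2/2$ and evaluates the upper bound on the error function in nats.

```python
import math

def s(x):
    # Assumed form of the function s from (58)
    return math.log1p(x) - x / (1.0 + x)

# Relative deviation of s(x) from x^2 / 2 for decreasing x
ratios = [1e-1, 1e-2, 1e-3]
relative_gap = [abs(s(x) - x**2 / 2) / (x**2 / 2) for x in ratios]

# Upper bound (n log e / 2) [s(delta1/eps1) + s(delta2/eps2)] with log e = 1
n = 3
d1 = d2 = 1e-3      # hypothetical ratios delta_i / eps_i
upper_bound = 0.5 * n * (s(d1) + s(d2))
```

The relative gap shrinks roughly linearly in $x$, and with these ratios the error function is bounded by about $1.5 \times 10^{-6}$ nats, which is the sense in which the information loss is negligible in the macroscopic regime.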

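Remark 18. Also in the vector case, the scale invariance of the relative entropy extends to the error function $S(\rho,\mathsf{M}^\sigma)$, the divergence $D^G_\epsilon(\mathsf{Q},\mathsf{P}\|\mathsf{M}^\sigma)$ and the entropic incompatibility degree $c^G_{\rm inc}(\mathsf{Q},\mathsf{P};\epsilon)$, as well as the entropic MUR (92). Indeed, let us consider the dimensionless versions of position and momentum (35) and their associated projection valued measures $\widetilde{\mathsf{Q}}$, $\widetilde{\mathsf{P}}$ introduced in Section 4. Accordingly, we rescale the joint measurement $\mathsf{M}^\sigma$ of (43) in the same way, obtaining the POVM

$$\widetilde{\mathsf{M}}^\sigma(B) = \int_B \widetilde{M}^\sigma(\tilde{x},\tilde{p})\,d\tilde{x}\,d\tilde{p},$$

$$\widetilde{M}^\sigma(\tilde{x},\tilde{p}) = \frac{1}{(2\pi\lambda)^n}\exp\left\{\frac{i}{\lambda}\big(\tilde{p}\cdot\widetilde{Q} - \tilde{x}\cdot\widetilde{P}\big)\right\}\Pi\sigma\Pi\,\exp\left\{-\frac{i}{\lambda}\big(\tilde{p}\cdot\widetilde{Q} - \tilde{x}\cdot\widetilde{P}\big)\right\}.$$

Here, both the vector variables $\tilde{x}$ and $\tilde{p}$, as well as the components of the Borel set $B$, are dimensionless. By the scale invariance of the relative entropy, the error function takes the same value as in the dimensioned case:

$$S(\widetilde{\mathsf{Q}}_\rho\|\widetilde{\mathsf{M}}^\sigma_{1,\rho}) + S(\widetilde{\mathsf{P}}_\rho\|\widetilde{\mathsf{M}}^\sigma_{2,\rho}) = S(\mathsf{Q}_\rho\|\mathsf{M}^\sigma_{1,\rho}) + S(\mathsf{P}_\rho\|\mathsf{M}^\sigma_{2,\rho}). \tag{93}$$

Then, the scale invariance holds for the entropic divergence and incompatibility degree, too:

$$\widetilde{D}^G_{\tilde\epsilon}(\widetilde{\mathsf{Q}},\widetilde{\mathsf{P}}\|\widetilde{\mathsf{M}}^\sigma) = D^G_\epsilon(\mathsf{Q},\mathsf{P}\|\mathsf{M}^\sigma), \qquad c^G_{\rm inc}(\widetilde{\mathsf{Q}},\widetilde{\mathsf{P}};\tilde\epsilon) = c^G_{\rm inc}(\mathsf{Q},\mathsf{P};\epsilon),$$

where $\tilde\epsilon_1 := \dfrac{\kappa\,\epsilon_1}{\hbar}$ and $\tilde\epsilon_2 := \dfrac{\lambda^2\epsilon_2}{\kappa\,\hbar}$. In particular $\tilde\epsilon_1\tilde\epsilon_2 \ge \dfrac{\lambda^2}{4} \iff \epsilon_1\epsilon_2 \ge \dfrac{\hbar^2}{4}$ and, in this case, we have

$$n\,(\log e)\,s\!\left(\frac{\lambda}{2\sqrt{\tilde\epsilon_1\tilde\epsilon_2}}\right) = c^G_{\rm inc}(\widetilde{\mathsf{Q}},\widetilde{\mathsf{P}};\tilde\epsilon) = c^G_{\rm inc}(\mathsf{Q},\mathsf{P};\epsilon) = n\,(\log e)\,s\!\left(\frac{\hbar}{2\sqrt{\epsilon_1\epsilon_2}}\right).$$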
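The last chain of equalities can be checked with concrete numbers. The sketch below is only an illustration: it assumes $s(x) = \ln(1+x) - x/(1+x)$ for the function in (58), reads the rescaled thresholds as $\tilde\epsilon_1 = \kappa\epsilon_1/\hbar$ and $\tilde\epsilon_2 = \lambda^2\epsilon_2/(\kappa\hbar)$, and uses arbitrary illustrative values for $\kappa$, $\lambda$ and the thresholds.

```python
import math

def s(x):
    # Assumed form of the function s from (58)
    return math.log1p(x) - x / (1.0 + x)

hbar = 1.054e-34               # approximate value in J s
kappa, lam = 3.0e-20, 2.0      # arbitrary illustrative scale parameters
eps1, eps2 = 4.0e-20, 5.0e-48  # illustrative thresholds, eps1 * eps2 >= hbar^2 / 4

# Rescaled (dimensionless) thresholds, as assumed in the lead-in
eps1_t = kappa * eps1 / hbar
eps2_t = lam**2 * eps2 / (kappa * hbar)

# Per-dimension bound in nats, in dimensioned and dimensionless form
dimensioned = s(hbar / (2.0 * math.sqrt(eps1 * eps2)))
dimensionless = s(lam / (2.0 * math.sqrt(eps1_t * eps2_t)))

# The two threshold conditions select the same regime
same_regime = (eps1_t * eps2_t >= lam**2 / 4.0) == (eps1 * eps2 >= hbar**2 / 4.0)
```

The product $\tilde\epsilon_1\tilde\epsilon_2 = \lambda^2\epsilon_1\epsilon_2/\hbar^2$ does not depend on $\kappa$, which is why the bound is insensitive to the choice of units.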


7. Conclusions
We have extended the relative entropy formulation of MURs given in [41] from the case of discrete
incompatible observables to a particular instance of continuous target observables, namely the position
and momentum vectors, or two components of them along two possibly non parallel directions.
The entropic MURs we found share the nice property of being scale invariant and well-behaved in the
classical and macroscopic limits. Moreover, in the scalar case, when the angle spanned by the position
and momentum components goes to ±π/2, the entropic bound correctly reﬂects their increasing
compatibility by approaching zero with continuity.
Although our results are limited to the case of Gaussian preparation states and covariant Gaussian
approximate joint measurements, we conjecture that the bounds we found still hold for arbitrary states
and general (not necessarily covariant or Gaussian) bi-observables. Let us see with some more detail
how this should work in the case when the target observables are the vectors Q and P.
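The most general procedure should be to consider the error function $S(\mathsf{Q}_\rho\|\mathsf{M}_{1,\rho}) + S(\mathsf{P}_\rho\|\mathsf{M}_{2,\rho})$ for an arbitrary POVM $\mathsf{M}$ on $\mathbb{R}^n \times \mathbb{R}^n$ and any state $\rho \in \mathcal{S}$. First of all, we need states for which neither the position nor the momentum dispersion is too small; the obvious generalization of the test states (81) is

$$\mathcal{S}_\epsilon := \{\rho \in \mathcal{S}_2 : A^\rho \ge \epsilon_1\mathbb{1},\ B^\rho \ge \epsilon_2\mathbb{1}\}, \qquad \epsilon_i > 0.$$

Then, the most general definitions of the entropic divergence and incompatibility degree are:

$$D_\epsilon(\mathsf{Q},\mathsf{P}\|\mathsf{M}) := \sup_{\rho\in\mathcal{S}_\epsilon}\big[S(\mathsf{Q}_\rho\|\mathsf{M}_{1,\rho}) + S(\mathsf{P}_\rho\|\mathsf{M}_{2,\rho})\big], \tag{94}$$

$$c_{\rm inc}(\mathsf{Q},\mathsf{P};\epsilon) := \inf_{\mathsf{M}} D_\epsilon(\mathsf{Q},\mathsf{P}\|\mathsf{M}). \tag{95}$$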

It may happen that Qρ is not absolutely continuous with respect to M1,ρ , or Pρ with respect to
M2,ρ ; in this case, the error function and the entropic divergence take the value +∞ by deﬁnition.
So, we can restrict to bi-observables that are (weakly) absolutely continuous with respect to the
Lebesgue measure. However, the true difﬁculty is that, even with this assumption, here we are not
able to estimate (94), hence (95). It could be that the symmetrization techniques used in [17,19] can be
extended to the present setting, and one can reduce the evaluation of the entropic incompatibility index
to optimizing over all covariant bi-observables. Indeed, in the present paper we a priori selected only
covariant approximating measurements; we would like to understand if, among all approximating
measurements, the relative entropy approach selects covariant bi-observables by itself. However, even
if M is covariant, there remains the problem that we do not know how to evaluate (94) if ρ and M
are not Gaussian. It is reasonable to expect that some continuity and convexity arguments should
apply, and the bounds in Theorem 5 might be extended to the general case by taking dense convex
combinations. Also the techniques used for the PURs in [8,9] could be of help in order to extend what
we did with Gaussian states to arbitrary states. This leads us to conjecture:

$$c_{\rm inc}(\mathsf{Q},\mathsf{P};\epsilon) = c^G_{\rm inc}(\mathsf{Q},\mathsf{P};\epsilon). \tag{96}$$

Conjecture (96) is also supported since the uniqueness of the optimal approximating bi-observable
in Theorem 5(i) is reminiscent of what happens in the discrete case of two Fourier conjugated mutually
unbiased bases (MUBs); indeed, in the latter case, the optimal bi-observable is actually unique among
all the bi-observables, not only the covariant ones (see [41] (Theorem 5)).
Similar considerations obviously apply also to the case of scalar target observables. We leave
a deeper investigation of equality (96) to future work.
As a final consideration, one could be interested in finding error/disturbance bounds involving sequential measurements of position and momentum, rather than considering all their possible approximate joint measurements. As sequential measurements are a proper subset of the set of all the bi-observables, optimizing only over them should lead to bounds that are not smaller than $c_{\rm inc}$. This is the reason for which in [41] an error/disturbance entropic bound, denoted by $c_{\rm ed}$ and distinct from $c_{\rm inc}$, was introduced. However, it was also proved that the equality $c_{\rm inc} = c_{\rm ed}$ holds when one of the target observables is discrete and sharp. Now, in the present paper, only sharp target observables are involved; although the argument of [41] cannot be extended to the continuous setting,
the optimal approximating joint observables we found in Theorems 3(i) and 5(i) actually are sequential
measurements. Indeed, the optimal bi-observable in Theorem 3(i) is one of the POVMs described in
Examples 2 and 3 (see (74)); all these bi-observables have a (trivial) sequential implementation in terms
of an unsharp measurement of Qu followed by sharp Pv . On the other hand, in the vector case, it was
shown in ([67], Corollary 1) that all covariant phase-space observables can be obtained as a sequential
measurement of an unsharp version of the position Q followed by the sharp measurement of the
momentum P. Therefore, cinc = ced also for target position and momentum observables, in both the
scalar and vector case.

Author Contributions: The three authors equally contributed to the paper.

Conﬂicts of Interest: The authors declare no conﬂict of interest.

References
1. Heisenberg, W. Über den anschaulichen Inhalt der quantentheoretischen Kinematik und Mechanik.
Zeitschr. Phys. 1927, 43, 172–198.
2. Simon, R.; Mukunda, N.; Dutta, B. Quantum-noise matrix for multimode systems: U (n) invariance, squeezing,
and normal forms. Phys. Rev. A 1994, 49, 1567–1583.
3. Holevo, A.S. Statistical Structure of Quantum Theory; Lecture Notes in Physics Monographs 67; Springer:
Berlin, Germany, 2001.
4. Holevo, A.S. Probabilistic and Statistical Aspects of Quantum Theory; Quaderni della Normale; Edizioni della
Normale: Pisa, Italy, 2011.
5. Holevo, A.S. Quantum Systems, Channels, Information; De Gruyter: Berlin, Germany, 2012.
6. Robertson, H. The uncertainty principle. Phys. Rev. 1929, 34, 163–164.
7. Hirschman, I.I. A note on entropy. Am. J. Math. 1957, 79, 152–156.
8. Beckner, W. Inequalities in Fourier analysis. Ann. Math. 1975, 102, 159–182.
9. Białynicki-Birula, I.; Mycielski, J. Uncertainty relations for information entropy in wave mechanics.
Commun. Math. Phys. 1975, 44, 129–132.
10. Maassen, H.; Ufﬁnk, J.B.M. Generalized entropic uncertainty relations. Phys. Rev. Lett. 1988, 60, 1103–1106.
11. Gibilisco, P.; Isola, T. On a reﬁnement of Heisenberg uncertainty relation by means of quantum Fisher
information. J. Math. Anal. Appl. 2011, 375, 270–275.
12. Coles, P.J.; Berta, M.; Tomamichel, M.; Wehner, S. Entropic uncertainty relations and their applications.
Rev. Mod. Phys. 2017, 89, 015002.
13. Wehner, S.; Winter, A. Entropic uncertainty relations—A survey. New J. Phys. 2010, 12, 025009.
14. Ozawa, M. Position measuring interactions and the Heisenberg uncertainty principle. Phys. Lett. A 2002,
299, 1–7.
15. Ozawa, M. Physical content of Heisenberg’s uncertainty relation: Limitation and reformulation. Phys. Lett. A
2003, 318, 21–29.
16. Ozawa, M. Universally valid reformulation of the Heisenberg uncertainty principle on noise and disturbance
in measurement. Phys. Rev. A 2003, 67, 042105.
17. Werner, R.F. The uncertainty relation for joint measurement of position and momentum. Quantum Inf. Comput.
2004, 4, 546–562.
18. Busch, P.; Heinonen, T.; Lahti, P. Heisenberg’s Uncertainty Principle. Phys. Rep. 2007, 452, 155–176.
19. Busch, P.; Lahti, P.; Werner, R. Measurement uncertainty relations. J. Math. Phys. 2014, 55, 042111.
20. Busch, P.; Lahti, P.; Werner, R. Quantum root-mean-square error and measurement uncertainty relations.
Rev. Mod. Phys. 2014, 86, 1261–1281.
21. Ozawa, M. Heisenberg’s original derivation of the uncertainty principle and its universally valid reformulations.
Curr. Sci. 2015, 109, 2006–2016.
22. Davies, E.B. Quantum Theory of Open Systems; Academic: London, UK, 1976.


23. Busch, P.; Grabowski, M.; Lahti, P. Operational Quantum Physics; Springer: Berlin, Germany, 1997.
24. Barchielli, A.; Gregoratti, M. Quantum Trajectories and Measurements in Continuous Time: The Diffusive Case;
Lecture Notes in Physics; Springer: Berlin/Heidelberg, Germany, 2009; Volume 782.
25. Heinosaari, T.; Ziman, M. The Mathematical Language of Quantum Theory: From Uncertainty to Entanglement;
Cambridge University Press: Cambridge, UK, 2012.
26. Busch, P.; Lahti, P.; Pellonpää, J.-P.; Ylinen, K. Quantum Measurement; Springer: Berlin, Germany, 2016.
27. Buscemi, F.; Hall, M.J.W.; Ozawa, M.; Wilde, M.M. Noise and disturbance in quantum measurements:
An information-theoretic approach. Phys. Rev. Lett. 2014, 112, 050401.
28. Busch, P.; Heinosaari, T.; Schultz, J.; Stevens, N. Comparing the degrees of incompatibility inherent in
probabilistic physical theories. Europhys. Lett. 2013, 103, 10002.
29. Busch, P.; Lahti, P.; Werner, R. Proof of Heisenberg’s error-disturbance relation. Phys. Rev. Lett. 2013, 111, 160405.
30. Coles, P.J.; Furrer, F. State-dependent approach to entropic measurement-disturbance relations. Phys. Lett. A
2015, 379, 105–112.
31. Heinosaari, T.; Schultz, J.; Toigo, A.; Ziman, M. Maximally incompatible quantum observables. Phys. Lett. A
2014, 378, 1695–1699.
32. Werner, R.F. Uncertainty relations for general phase spaces. Front. Phys. 2016, 11, 110305.
33. Buscemi, F.; Das, S.; Wilde, M.M. Approximate reversibility in the context of entropy gain, information gain,
and complete positivity. Phys. Rev. A 2016, 93, 062314.
34. Barchielli, A.; Lupieri, G. Instrumental processes, entropies, information in quantum continual measurements.
Quantum Inf. Comput. 2004, 4, 437–449.
35. Barchielli, A.; Lupieri, G. Instruments and channels in quantum information theory. Opt. Spectrosc. 2005, 99,
425–432.
36. Barchielli, A.; Lupieri, G. Quantum measurements and entropic bounds on information transmission.
Quantum Inf. Comput. 2006, 6, 16–45.
37. Barchielli, A.; Lupieri, G. Instruments and mutual entropies in quantum information. Banach Center Publ.
2006, 73, 65–80.
38. Barchielli, A.; Lupieri, G. Entropic bounds and continual measurements. In Quantum Probability and Infinite
Dimensional Analysis; QP-PQ: Quantum Probability and White Noise Analysis; Accardi, L., Freudenberg, W.,
Schürmann, M., Eds.; World Scientific: Singapore, 2007; Volume 20, pp. 79–89.
39. Barchielli, A.; Lupieri, G. Information gain in quantum continual measurements. In Quantum Stochastic and
Information; Belavkin, V.P., Guţǎ, M., Eds.; World Scientific: Singapore, 2008; pp. 325–345.
40. Maccone, L. Entropic information-disturbance tradeoff. EPL 2007, 77, 40002.
41. Barchielli, A.; Gregoratti, M.; Toigo, A. Measurement uncertainty relations for discrete observables: Relative
entropy formulation. arXiv 2016, arXiv:1608.01986.
42. Braunstein, S.L.; van Loock, P. Quantum information with continuous variables. Rev. Mod. Phys. 2005, 77,
513–577.
43. Heinosaari, T.; Kiukas, J.; Schultz, J. Breaking Gaussian incompatibility on continuous variable quantum
systems. J. Math. Phys. 2015, 56, 082202.
44. Kiukas, J.; Schultz, J. Informationally complete sets of Gaussian measurements. J. Phys. A Math. Theor. 2013,
46, 485303.
45. Weedbrook, C.; Pirandola, S.; García-Patrón, R.; Cerf, N.J.; Ralph, T.C.; Shapiro, J.H.; Lloyd, S. Gaussian
quantum information. Rev. Mod. Phys. 2012, 84, 621–669.
46. Huang, Y. Entropic uncertainty relations in multidimensional position and momentum spaces. Phys. Rev. A
2011, 83, 052124.
47. Heinosaari, T.; Miyadera, T.; Ziman, M. An invitation to quantum incompatibility. J. Phys. A Math. Theor.
2016, 49, 123001.
48. Simon, R.; Sudarshan, E.C.G.; Mukunda, N. Gaussian-Wigner distributions in quantum mechanics and
optics. Phys. Rev. A 1987, 36, 3868–3880.
49. Horn, R.A.; Zhang, F. Basic Properties of the Schur Complement. In The Schur Complement and Its Applications;
Zhang, F., Ed.; Numerical Methods and Algorithms; Springer: Berlin, Germany, 2005; pp. 17–46.
50. Petz, D. Quantum Information Theory and Quantum Statistics; Springer: Berlin, Germany, 2008.


51. Carlen, E. Trace Inequalities and Quantum Entropy: An Introductory Course. In Entropy and the Quantum;
Contemporary Mathematics; American Mathematical Society: Providence, RI, USA, 2010; Volume 529,
pp. 73–140.
52. Bhatia, R. Matrix Analysis; Springer: New York, NY, USA, 1997.
53. Werner, R.F. Quantum harmonic analysis on phase spaces. J. Math. Phys. 1983, 25, 1404–1411.
54. Burnham, K.P.; Anderson, D.R. Model Selection and Multimodel Inference—A Practical Information—Theoretic
Approach; Springer: New York, NY, USA, 2002.
55. Topsøe, F. Basic concepts, identities and inequalities—The toolkit of Information Theory. Entropy 2011, 3,
162–190.
56. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley: Hoboken, NJ, USA, 2006.
57. Carmeli, C.; Heinonen, T.; Toigo, A. Position and momentum observables on R and on R3 . J. Math. Phys.
2004, 45, 2526–2539.
58. Barchielli, A.; Lupieri, G. Quantum stochastic calculus, operation valued stochastic processes and continual
measurements in quantum mechanics. J. Math. Phys. 1985, 26, 2222–2230.
59. Barchielli, A.; Lupieri, G. A quantum analogue of Hunt’s representation theorem for the generator of
convolution semigroups on Lie groups. Probab. Theory Rel. Fields 1991, 88, 167–194.
60. Barchielli, A.; Holevo, A.S.; Lupieri, G. An analogue of Hunt’s representation theorem in quantum probability.
J. Theor. Probab. 1993, 6, 231–265.
61. Holevo, A.S. Investigations in the General Theory of Statistical Decisions. Proc. Steklov Inst. Math. 1978, 124,
1–140.
62. Holevo, A.S. Inﬁnitely divisible measurements in quantum probability theory. Theory Probab. Appl. 1986, 31,
493–497.
63. Cassinelli, G.; De Vito, E.; Toigo, A. Positive operator valued measures covariant with respect to an irreducible
representation. J. Math. Phys. 2003, 44, 4768–4775.
64. Kiukas, J.; Lahti, P.; Ylinen, K. Normal covariant quantization maps. J. Math. Anal. Appl. 2006, 319, 783–801.
65. Ohya, M.; Petz, D. Quantum Entropy and Its Use; Springer: Berlin, Germany, 1993.
66. Billingsley, P. Probability and Measure, 2nd ed.; Wiley: New York, NY, USA, 1986.
67. Carmeli, C.; Heinonen, T.; Toigo, A. Sequential measurements of conjugate observables. J. Phys. A Math. Theor.
2011, 44, 285304.

Article
Planck-Scale Soccer-Ball Problem: A Case of
Mistaken Identity
Giovanni Amelino-Camelia 1,2
1 Dipartimento di Fisica, Università di Roma “La Sapienza”, P.le A. Moro 2, 00185 Roma, Italy;
[email protected]
2 Istituto Nazionale di Fisica Nucleare (INFN), Sez. Roma1, P.le A. Moro 2, 00185 Roma, Italy

Received: 23 April 2017; Accepted: 18 July 2017; Published: 2 August 2017

Abstract: Over the last decade, it has been found that nonlinear laws of composition of momenta are
predicted by some alternative approaches to “real” 4D quantum gravity, and by all formulations of
dimensionally-reduced (3D) quantum gravity coupled to matter. The possible relevance for rather
different quantum-gravity models has motivated several studies, but this interest is being tempered
by concerns that a nonlinear law of addition of momenta might inevitably produce a pathological
description of the total momentum of a macroscopic body. I here show that such concerns are
unjustified, finding that they are rooted in failure to appreciate the differences between two roles
for laws of composition of momentum in physics. Previous results relied exclusively on the role of a
law of momentum composition in the description of spacetime locality. However, the notion of total
momentum of a multi-particle system is not a manifestation of locality, but rather reflects translational
invariance. By working within an illustrative example of quantum spacetime, I show explicitly that
spacetime locality is indeed reflected in a nonlinear law of composition of momenta, but translational
invariance still results in an undeformed linear law of addition of momenta building up the total
momentum of a multi-particle system.

Keywords: quantum foundations; relativity; quantum gravity

1. Introduction
An emerging characteristic of quantum-gravity research over the last decade has been a gradual
shift of focus toward manifestations of the Planck scale on momentum space, particularly pronounced
in some approaches to quantum gravity. For some research lines based on spacetime noncommutativity,
several momentum-space structures have been in focus, including the possibility of deformed laws
of composition of momenta, which shall be here of interest. While deformed laws of composition
of momenta are found to be inevitable in some approaches based on spacetime noncommutativity
(e.g., [1–6]), the situation is less certain in the loop-quantum-gravity approach. For “real” 4D loop
quantum gravity, the relevant issues are partly obscured by our present limited understanding of
the semiclassical limit of that theory [7], but some indirect arguments suggest that a nonlinear law of
composition of momenta might arise [8,9]. These arguments find further strength in results on 3D loop
quantum gravity, where the simplifications afforded by that dimensionally-reduced model allow one
to rigorously show that indeed the nonlinearities on momentum space are present (e.g., [10]). Actually,
evidence is growing that in all alternative formulations of 3D quantum gravity coupled to matter there
are nonlinearities in momentum space, including nonlinear laws of composition of momenta (e.g., [11]).
The role played by nonlinearities on momentum space is also noteworthy in two recently-proposed
approaches to the quantum-gravity problem: the one based on group field theory [12] and the one
based on the relative-locality framework [13].
Due to the lack of experimental guidance, a variety of approaches to quantum gravity are
being developed, and in most cases the different approaches have very little in common. This of

Entropy 2017, 19, 400; doi:10.3390/e19080400; www.mdpi.com/journal/entropy


course endows with additional reasons of interest any result which is found to apply to more than
one approach. Indeed, there has been growing interest in the conceptual implications and possible
phenomenological implications [14] of nonlinear laws on momentum space and particularly nonlinear
laws of composition of momenta. However, this interest is being tempered by concerns that a nonlinear
law of addition of momenta might inevitably produce a pathological description of the total momentum
of a macroscopic body [15–23] (also see References [24–26] for a related discussion focused within the
novel relative-locality framework). This issue has often been labelled as the “soccer-ball problem” [17]:
the quantum-gravity pictures lead one to expect nonlinearities of the law of composition of momenta
which are suppressed by the Planck scale ($\sim 10^{28}$ eV) and would be unobservably small for particles
at energies we presently can access, but in the analysis of a macroscopic body (e.g., a soccer ball),
one might have to add up very many of such minute nonlinearities, ultimately obtaining results in
conflict with observations [15–23].
If this so-called “soccer-ball problem” really was a scientific problem (a case of actual conflict with
experimental data), we could draw rather sharp conclusions about several areas of quantum-gravity
research. Perhaps most notably we should consider as ruled out large branches of research on
quantum-gravity based on spacetime noncommutativity and we should consider the whole effort
of research on dimensionally-reduced 3D quantum gravity as completely unreliable in forming
an intuition for “real” 4D quantum gravity. However, I here show that previous discussions of
this soccer-ball problem [15–26] failed to appreciate the differences between two roles for laws of
composition of momentum in physics. Previous results supporting a nonlinear law of addition of
momenta relied exclusively on the role of a law of momentum composition in the description of
spacetime locality. The notion of total momentum of a multi-particle system is not a manifestation
of locality, but rather reflects translational invariance in interacting theories. After being myself
confused about these issues for quite some time [17] I feel I am now in a position to articulate the
needed discussion at a completely general level. However, considering the tone and content of the
bulk of literature that precedes this contribution of mine I find it is best to opt here instead for a very
explicit discussion based on illustrative examples of calculations performed within a specific simple
model affected by nonlinearities for a law of composition of momenta. The model I focus on has
2 + 1-dimensional pure-spatial κ-Minkowski noncommutativity [1–6], with the time coordinate left
unaffected by the deformation and the two spatial coordinates, $x_1$ and $x_2$, governed by

$$[x_1, x_2] = i\ell x_1 \qquad (1)$$

(with the deformation scale $\ell$ expected to be of the order of the inverse of the Planck scale).
In the next section I brieﬂy review within this example of quantum spacetime previous
arguments showing that spacetime locality is reﬂected in a nonlinear law of composition of momenta.
Then, Section 3 takes off from known results on translational invariance for κ-Minkowski noncommutative
spacetimes and builds on those to achieve the first ever example of a translationally-invariant interacting
two-particle system in κ-Minkowski. This allows me to explicitly verify that the conserved charge
associated with that translational invariance (the total momentum of the two-particle system) adds
linearly the momenta of the two particles involved. Section 4 offers some closing remarks.

2. Soccer-Ball Problem and Sum of Momenta from Locality

The ingredients needed for seeing a nonlinear law of composition of momenta emerging from
noncommutativity of type (1) are very simple. Essentially, one needs only to rely on results establishing
that functions of coordinates governed by (1) still admit a rather standard Fourier expansion (e.g., [1,2])

$$\Phi(x) = \int d^4k\, \tilde{\Phi}(k)\, e^{i k_\mu x^\mu}$$

and that the notion of integration on such a noncommutative space preserves many of the standard
properties including [1,3]

$$\int d^4x\, e^{i k_\mu x^\mu} = (2\pi)^4\, \delta^{(4)}(k). \qquad (2)$$

It is a rather standard exercise for practitioners of spacetime noncommutativity to use these tools
in order to enforce locality within actions describing classical fields. For example, one might want to
introduce in the action the product of three (possibly identical, but in general different) fields, Φ, Ψ,
Υ, insisting on locality in the sense that the three fields be evaluated “at the same quantum point x”;
i.e., Φ( x ) Ψ( x ) Υ( x ). There is still no consensus on how one should formulate the more interesting
quantum-field version of such theories, and it remains unclear to what extent and in which way our
ordinary notion of locality is generalized by the requirement of evaluating “at the same quantum point
x” fields intervening in a product such as Φ(x) Ψ(x) Υ(x). Nonetheless, for the classical-field case
there is a sizable body of literature consistently adopting this prescription for locality. Important for
my purposes here is the fact that with such a prescription, locality inevitably leads to a nonlinear law
of composition of momenta, as I show explicitly in the following example:

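$$
\begin{aligned}
\int d^4x\, \Phi(x)\,\Psi(x)\,\Upsilon(x)
&= \int d^4x\, d^4k\, d^4p\, d^4q\; \tilde{\Phi}(k)\,\tilde{\Psi}(p)\,\tilde{\Upsilon}(q)\; e^{i k_\mu x^\mu}\, e^{i p_\nu x^\nu}\, e^{i q_\rho x^\rho} \\
&= \int d^4x\, d^4k\, d^4p\, d^4q\; \tilde{\Phi}(k)\,\tilde{\Psi}(p)\,\tilde{\Upsilon}(q)\; e^{i (k \oplus p \oplus q)_\mu x^\mu} \\
&= (2\pi)^4 \int d^4k\, d^4p\, d^4q\; \tilde{\Phi}(k)\,\tilde{\Psi}(p)\,\tilde{\Upsilon}(q)\; \delta^{(4)}(k \oplus p \oplus q)
\end{aligned}
\qquad (3)
$$

where ⊕ is such that

$$(k \oplus p)_0 = k_0 + p_0 \qquad (4)$$

$$(k \oplus p)_2 = k_2 + p_2 \qquad (5)$$

$$(k \oplus p)_1 = \frac{k_2 + p_2}{1 - e^{\ell (k_2 + p_2)}} \left[ \frac{1 - e^{\ell k_2}}{k_2\, e^{\ell p_2}}\, k_1 + \frac{1 - e^{\ell p_2}}{p_2}\, p_1 \right] \qquad (6)$$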

This result is rooted in one of the most studied aspects of such noncommutative spacetimes,
which is their “generalized star product” [1–3]. This is essentially a characterization of the properties
of products of exponentials induced by rules of noncommutativity of type (1). Specifically, one
easily arrives at (3) (with ⊕ such that, in particular, (6) holds) by just observing that from the
defining commutator (1) it follows that [2,3]

$$
\begin{aligned}
&\log\left[\exp(i k_2 x_2 + i k_1 x_1)\, \exp(i p_2 x_2 + i p_1 x_1)\right] = \\
&\quad = i x_2 (p_2 + k_2) + i x_1\, \frac{k_2 + p_2}{1 - e^{\ell (k_2 + p_2)}} \left[ \frac{1 - e^{\ell k_2}}{k_2\, e^{\ell p_2}}\, k_1 + \frac{1 - e^{\ell p_2}}{p_2}\, p_1 \right]
\end{aligned}
\qquad (7)
$$

Equation (7) is a particular example of application of the Baker–Campbell–Hausdorff formula for
products of exponentials of noncommuting variables. In general, the Baker–Campbell–Hausdorff
formula involves an infinite series of nested commutators, but the case of noncommutativity (1) is one
of the cases for which the series of nested commutators can be resummed explicitly [2,3].

The so-called soccer-ball problem concerns the acceptability of laws of composition of type (6).
Since one assumes that the deformation scale $\ell$ is on the order of the inverse of the Planck scale,
applying (6) to microscopic/fundamental particles has no sizable consequences: of course (6) gives us
back to good approximation $(k \oplus p)_1 \simeq k_1 + p_1$ whenever $|\ell k_2| \ll 1$ and $|\ell p_2| \ll 1$. However, if a law
of composition such as (6) should be used also when we add very many microparticle momenta in
obtaining the total momentum of a multiparticle system (such as a soccer ball), then the final result
could be pathological [15–26] even when each microparticle in the system has momentum much
smaller than $1/\ell$.
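
To make the concern concrete, the following Python sketch composes momenta with a nonlinear law of the kind discussed above and shows how minute per-particle nonlinearities could add up if such a law were applied to a many-particle system. It is an illustration under stated assumptions, not a reproduction of Eq. (6): the Baker–Campbell–Hausdorff series depends only on the commutation relations, so the code evaluates it in a 2×2 matrix representation chosen here for convenience (A = ℓE11 and B = E12, standing for ix2 and ix1, so that [A, B] = ℓB); the function names, the value of ℓ, and the arrangement of the particle momenta are hypothetical; and the resulting closed form depends on ordering and sign conventions, so it should be compared with Eq. (6) only qualitatively.

```python
import math


def phi(z):
    """Return (e^z - 1)/z, using the limit 1 at z = 0."""
    return math.expm1(z) / z if z != 0.0 else 1.0


def compose(k, p, ell):
    """Compose k = (k1, k2) and p = (p1, p2) via log(exp(K) exp(P)).

    In the assumed representation, exp(k2*A + k1*B) is the matrix
    [[e^{ell*k2}, k1*phi(ell*k2)], [0, 1]] with [A, B] = ell*B.
    """
    a1, b1 = math.exp(ell * k[1]), k[0] * phi(ell * k[1])
    b2 = p[0] * phi(ell * p[1])
    top_right = a1 * b2 + b1      # off-diagonal entry of the matrix product
    total2 = k[1] + p[1]          # second components add linearly
    return (top_right / phi(ell * total2), total2)


def compose_many(momenta, ell):
    """Fold a list of momenta with the nonlinear composition, left to right."""
    total = (0.0, 0.0)
    for k in momenta:
        total = compose(total, k, ell)
    return total


if __name__ == "__main__":
    ell = 1e-3  # hypothetical deformation scale; each momentum below is << 1/ell
    for m in (10, 1000, 10000):
        # m particles moving along direction 2, then m along direction 1
        momenta = [(0.0, 1.0)] * m + [(1.0, 0.0)] * m
        total1, _ = compose_many(momenta, ell)
        print(m, total1 / m)  # a linear sum would give exactly 1.0
```

With these hypothetical numbers the printed ratio stays close to 1 for a handful of particles and departs from it once the product of ℓ and the accumulated momentum is of order one, which is the kind of behaviour the soccer-ball concern refers to.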

3. Sum of Momenta from Translational Invariance

As clarified in the brief review of known results given in the previous section, a nonlinear law of
composition of momenta arises in characterizations of locality, as a direct consequence of the form of
some star products. My main point here is that a different law of composition of momenta is produced
by the analysis of translational invariance, and it is this other law of composition of momenta which is
relevant for the characterization of the total momentum of a multi-particle system. Here too I shall use
only known facts about the peculiarities of translation transformations in certain noncommutative
spacetimes, but exploit them to obtain results that had not been derived before—indeed, results
relevant for the description of the total momentum of a multi-particle system.
A first hint that translation transformations should be modified [4–6] in certain noncommutative
spacetimes comes from noticing that (1) is incompatible with the standard Heisenberg relations
$[p_j, x_k] = i\delta_{jk}$. Indeed, if one adopts (1) and $[p_j, x_k] = i\delta_{jk}$, one then easily finds that some Jacobi
identities are not satisfied. The relevant Jacobi identities are satisfied if one allows for a modification of
the Heisenberg relations which balances for the noncommutativity of the coordinates:

$$[p_1, x_1] = i, \quad [p_2, x_1] = 0, \quad [p_2, x_2] = i, \qquad (8)$$

$$[p_1, x_2] = -i\ell\, p_1. \qquad (9)$$

One easily finds that by combining (1), (8), and (9), all Jacobi identities are satisfied [4–6].
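
The Jacobi-identity statement can be checked mechanically. The sketch below treats x1, x2, p1, p2 and the identity as the basis of a five-dimensional Lie algebra, since every bracket in (1), (8) and (9) is linear in these generators, and evaluates the Jacobi sum on all triples of basis elements. The encoding, the function names, and the assumption that [p1, p2] = 0 are illustrative choices and are not taken from the references.

```python
import itertools

# Basis of the (assumed) Lie algebra: x1, x2, p1, p2 and the identity operator.
X1, X2, P1, P2, ONE = range(5)


def brackets(ell, deformed):
    """Nonzero brackets [e_a, e_b] = sum_c coeff * e_c, keyed by (a, b)."""
    table = {
        (X1, X2): {X1: 1j * ell},   # Eq. (1)
        (P1, X1): {ONE: 1j},        # Eq. (8)
        (P2, X2): {ONE: 1j},        # Eq. (8)
    }
    if deformed:
        table[(P1, X2)] = {P1: -1j * ell}   # Eq. (9); absent in the standard case
    return table


def commutator(u, v, table):
    """Bilinear, antisymmetric extension of the bracket table to 5-vectors."""
    out = [0j] * 5
    for (a, b), result in table.items():
        weight = u[a] * v[b] - u[b] * v[a]
        for c, coeff in result.items():
            out[c] += weight * coeff
    return out


def max_jacobi_violation(table):
    """Largest coefficient of [a,[b,c]] + [b,[c,a]] + [c,[a,b]] over basis triples."""
    basis = [[1 if i == j else 0 for j in range(5)] for i in range(5)]
    worst = 0.0
    for a, b, c in itertools.combinations(basis, 3):
        terms = (
            commutator(a, commutator(b, c, table), table),
            commutator(b, commutator(c, a, table), table),
            commutator(c, commutator(a, b, table), table),
        )
        worst = max(worst, max(abs(sum(t)) for t in zip(*terms)))
    return worst


if __name__ == "__main__":
    print("deformed relations:", max_jacobi_violation(brackets(0.1, True)))
    print("standard Heisenberg:", max_jacobi_violation(brackets(0.1, False)))
```

In this encoding the violation for the standard Heisenberg relations comes from the triple (x1, x2, p1) and is proportional to ℓ, whereas the deformed relations give a vanishing Jacobi sum on every triple.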
Additional intuition for these nonstandard properties of the momenta $p_j$ comes from actually looking
at which formulation of translation transformations preserves the form of the noncommutativity of
coordinates (1). Evidently, the standard description

$$x_2 \to x_2' = x_2 + a_2, \quad x_1 \to x_1' = x_1 + a_1$$

is not a symmetry of (1):

$$[x_1', x_2'] = [x_1 + a_1, x_2 + a_2] = i\ell\, x_1 = i\ell\, (x_1' - a_1) \qquad (10)$$

Unsurprisingly, what does work is the description of translation transformations using as
generators the $p_j$ of (8) and (9), which as stressed above satisfy the Jacobi-identity criterion. These
deformed translation transformations take the form

$$
\begin{aligned}
x_1' &= x_1 - i a_1 [p_1, x_1] - i a_2 [p_2, x_1] = x_1 + a_1, \\
x_2' &= x_2 - i a_1 [p_1, x_2] - i a_2 [p_2, x_2] = x_2 + a_2 - \ell a_1 p_1
\end{aligned}
\qquad (11)
$$

and indeed are symmetries of the commutation rules (1):

$$[x_1', x_2'] = [x_1 + a_1, x_2 + a_2 - \ell a_1 p_1] = i\ell\, x_1 - \ell a_1 [x_1, p_1] = i\ell\, (x_1 + a_1) = i\ell\, x_1' \qquad (12)$$

All this about translation transformations in certain noncommutative spacetimes is well known
(e.g., [4–6]). The part which I am here going to contribute is to show how this is relevant for the mentioned
much-debated issue about the total momentum of a multi-particle system. My starting point is that in
order for us to be able to even contemplate the total momentum of a multiparticle system, we must be
dealing with a case where translational invariance is ensured: total momentum is the conserved charge
for a translationally invariant multi-particle system. Surely the introduction of translationally invariant
multi-particle systems must involve some subtleties due to the noncommutativity of coordinates,
and these subtleties are directly connected to the new properties of translation transformations (9),
but they are not directly connected to the properties of the star product (7) and the associated law
of composition of momenta (6). For my purposes, also considering the heated debate that precedes
this contribution of mine, it is best to show the implications of this point very simply and explicitly,
focusing on a system of two particles interacting via a harmonic potential.
I start by noticing that evidently one does not achieve translational invariance through a
description of the form

$$H_{\text{non-transl}} = \frac{(p_1^A)^2}{2m} + \frac{(p_2^A)^2}{2m} + \frac{(p_1^B)^2}{2m} + \frac{(p_2^B)^2}{2m} + \frac{1}{2}\rho \left[ (x_1^A - x_1^B)^2 + (x_2^A - x_2^B)^2 \right] \qquad (13)$$

where indices A and B label the two particles involved in the interaction via the harmonic potential.
As stressed above, translation transformations consistent with the coordinate noncommutativity (1)
must be such that (see (11)) $x_1 \to x_1 + a_1$ and $x_2 \to x_2 + a_2 - \ell a_1 p_1$, and as a result by writing the
harmonic potential with $(x_1^A - x_1^B)^2 + (x_2^A - x_2^B)^2$, one does not achieve translational invariance.
One does get translational invariance by adopting instead

$$H = \frac{(p_1^A)^2}{2m} + \frac{(p_2^A)^2}{2m} + \frac{(p_1^B)^2}{2m} + \frac{(p_2^B)^2}{2m} + \frac{1}{2}\rho \left[ (x_1^A - x_1^B)^2 + (x_2^A + \ell x_1^A p_1^A - x_2^B - \ell x_1^B p_1^B)^2 \right] \qquad (14)$$

This is trivially invariant under translations generated by $p_2$, which simply produce $x_1 \to x_1$ and
$x_2 \to x_2 + a_2$. It is also invariant under translations generated by $p_1$, since they produce $x_1 \to x_1 + a_1$
and $x_2 \to x_2 - \ell a_1 p_1$, so that $x_2 + \ell x_1 p_1$ is left unchanged:

$$x_2 + \ell x_1 p_1 \to x_2 - \ell a_1 p_1 + \ell (x_1 + a_1) p_1 = x_2 + \ell x_1 p_1$$

It is interesting for my purposes to see which conserved charge is associated with this invariance
under translations of the Hamiltonian H. This conserved charge will describe the total momentum
of the two-particle system governed by H (i.e., the center-of-mass momentum). It is easy to see that
this conserved charge is just the standard $p^A + p^B$. For the second component, one trivially finds
that indeed

$$[p_2^A + p_2^B, H] = 0$$

and the same result also applies to the first component:

$$
\begin{aligned}
[p_1^A + p_1^B, H] &\propto [p_1^A + p_1^B, (x_1^A - x_1^B)^2] + [p_1^A + p_1^B, (x_2^A + \ell x_1^A p_1^A - x_2^B - \ell x_1^B p_1^B)^2] \\
&= [p_1^A + p_1^B, (x_2^A + \ell x_1^A p_1^A - x_2^B - \ell x_1^B p_1^B)^2] \\
&\propto [p_1^A + p_1^B, x_2^A + \ell x_1^A p_1^A - x_2^B - \ell x_1^B p_1^B] \\
&= -i\ell\, p_1^A + i\ell\, p_1^A + i\ell\, p_1^B - i\ell\, p_1^B = 0
\end{aligned}
\qquad (15)
$$

where the only non-trivial observation I have used is that (1) leads to $[p_1, x_2 + \ell x_1 p_1] = -i\ell\, p_1 + i\ell\, p_1 = 0$.
The result (15) shows that indeed $p^A + p^B$ is the momentum of the center of mass of my
translationally-invariant two-particle system; i.e., it is the total momentum of the system.
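
The commutator in (15) can also be checked on a computer. The sketch below is a consistency check under stated assumptions, not a proof: it uses a momentum-space representation chosen here for illustration, x1 = −i ∂/∂p1 and x2 = −i ∂/∂p2 + iℓ p1 ∂/∂p1, which reproduces (1), (8) and (9) but is not taken from the text; operators act on polynomial test functions of the four momenta; and the values of ℓ, m and ρ, the test polynomial, and all function names are hypothetical.

```python
ELL, MASS, RHO = 0.3, 1.0, 2.0     # illustrative values, not from the paper
P1A, P2A, P1B, P2B = range(4)      # positions of the momenta in an exponent tuple


def add(f, g, c=1.0):
    """Return f + c*g for polynomials stored as {exponent tuple: coefficient}."""
    out = dict(f)
    for mono, coeff in g.items():
        out[mono] = out.get(mono, 0) + c * coeff
    return {m: v for m, v in out.items() if abs(v) > 1e-12}


def times(f, i):
    """Multiply f by the i-th momentum."""
    return {m[:i] + (m[i] + 1,) + m[i + 1:]: v for m, v in f.items()}


def ddp(f, i, c=1.0):
    """Return c * (derivative of f with respect to the i-th momentum)."""
    return {m[:i] + (m[i] - 1,) + m[i + 1:]: c * v * m[i]
            for m, v in f.items() if m[i] > 0}


def x1(f, p1):
    return ddp(f, p1, -1j)                      # x1 = -i d/dp1


def x2(f, p1, p2):
    # x2 = -i d/dp2 + i*ell*p1 d/dp1 (assumed representation)
    return add(ddp(f, p2, -1j), times(ddp(f, p1, 1j * ELL), p1))


def y(f, p1, p2):
    return add(x2(f, p1, p2), x1(times(f, p1), p1), ELL)   # x2 + ell * x1 p1


def hamiltonian(f, invariant):
    """Apply Eq. (14) (invariant=True) or Eq. (13) (invariant=False) to f."""
    second = y if invariant else x2
    d1 = lambda g: add(x1(g, P1A), x1(g, P1B), -1.0)
    d2 = lambda g: add(second(g, P1A, P2A), second(g, P1B, P2B), -1.0)
    out = add(d1(d1(f)), d2(d2(f)))
    out = {m: 0.5 * RHO * v for m, v in out.items()}
    for i in range(4):
        out = add(out, times(times(f, i), i), 1.0 / (2.0 * MASS))
    return out


def commutator_size(f, invariant):
    """Sum of |coefficients| of [p1A + p1B, H] applied to f."""
    total_p1 = lambda g: add(times(g, P1A), times(g, P1B))
    c = add(total_p1(hamiltonian(f, invariant)),
            hamiltonian(total_p1(f), invariant), -1.0)
    return sum(abs(v) for v in c.values())


# Hypothetical test polynomial in (p1A, p2A, p1B, p2B).
TEST_FUNCTION = {(2, 1, 1, 0): 1.0, (1, 0, 0, 2): 3.0, (1, 1, 1, 1): 2.0}

if __name__ == "__main__":
    print("Eq. (14):", commutator_size(TEST_FUNCTION, True))
    print("Eq. (13):", commutator_size(TEST_FUNCTION, False))
```

In this representation the linear sum of the first momentum components commutes with the Hamiltonian of Equation (14) on the test function, while it fails to commute with the Hamiltonian of Equation (13), in line with the discussion above.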
The concerns about total momentum that had been voiced in discussions of the Planck-scale
soccer-ball problem were rooted in the different sum of momenta relevant for locality, the ⊕ sum
discussed in the previous section. It was feared that one should obtain the total momentum by
combining single-particle momenta with the nonlinear ⊕ sum. The result (15) shows that this
expectation was incorrect. One can also directly verify that indeed $p^A \oplus p^B$ is not a conserved
charge for my translationally-invariant two-particle system, and specifically, taking into account (6),
one finds that

$$[(p^A \oplus p^B)_1, H] \neq 0$$
This completes my thesis, but in closing this section I should warn readers of the fact that while the
picture emerging from my analysis is rather compelling, one should not forget that the interpretation
of the notion of total momentum in a noncommutative spacetime remains affected by some open issues
(see Reference [14] and references therein). Even the physical meaning of having noncommutative
spacetime coordinates is still being debated. In the shadow of these interpretational issues, we
cannot even be sure that the Hamiltonian of Equation (14) has physical (observable) consequences
different from an ordinary harmonic-oscillator theory. Nonetheless, my analysis contributes to this
ongoing debate by exposing two notions of momentum conservation: one connected to locality, and
one connected with translational invariance. Evidently, if interpreted in the standard way, these two
notions could be mutually incompatible: in the analysis of a chain of events one might naturally
want to insist on overall total-momentum conservation, but in some parts of the chain of events
the conserved quantity might be the one coming from locality, while in other parts of the chain of
events the conserved quantity might be the one coming from translational invariance. Addressing this
apparent puzzle might require a totally new interpretation of the notion of momentum of a particle
in a quantum spacetime, while failing to address it might be a mortal blow to the whole research
area. While in part my results are sub judice because of these interpretational issues, my analysis
nonetheless firmly establishes the main conceptual point I am making, which concerns the differences
between “composition of momentum appearing in locality analyses” and “composition of momentum
appearing in translational-invariance analyses”—two notions which are usually confused with each
other due to the fact that in a classical spacetime they coincide.

4. Implications and Outlook

The results here reported suggest that, at least within the framework of κ-Minkowski spacetime
noncommutativity, there might be no “soccer-ball problem”. I am confident that analogous results
will emerge in other similar formalisms, but of course dedicated analyses are needed. A case of
particular interest might be that of the Snyder model of spacetime noncommutativity [27], which is
already known to have a complicated interplay with translational invariance: the original model of
Reference [27] is not invariant under translations, but a variant with an extra dimension recovers
translational invariance [28].
The Hamiltonian of Equation (14) is the only one I managed to find which is invariant under the
translation transformations (11), but I do not have any proof of uniqueness. It would be interesting to
consider other Hamiltonians that are invariant under (11) and give the ordinary harmonic-oscillator
Hamiltonian in the $\ell \to 0$ limit.
As usual in physics, attempts to generalize a theory also help us understand the theory itself:
the analysis I here reported makes us appreciate how our current theories are built on a non-trivial
correspondence between the momentum-space manifestations of locality and translational invariance.
This can be viewed from a different perspective by reconsidering the fact that in Galilean relativity all
laws of composition of momenta and velocities are linear, and there is a linear relationship between
velocity and momentum. Within Galilean-relativistic theories, one could choose to never speak
of momentum and work exclusively in terms of velocities, with apparently a single linear law of
composition of velocities. In our current post-Galilean theories, the relationship between momentum
and velocity is non-linear, and we then manage to appreciate differences between composition
laws (in our current theories all laws of composition of momenta remain linear, but velocities are
composed non-linearly).
I must also comment on the fact that aspects of my analysis pertaining to translational invariance
were confined to a first-quantized system. This came out of necessity since several grey areas remain for
the formulation of second quantization with κ-Minkowski noncommutativity. As a matter of fact, I here
provided the first ever translationally-invariant formulation of an interacting theory in κ-Minkowski.
All previous attempts had been made within quantum field theory, and led to unsatisfactory results,
particularly concerning global translational invariance. Perhaps the results I here reported could
provide guidance for improving upon previous attempts at formulating interacting quantum field
theories in κ-Minkowski. In particular, it might be appropriate to make room for some novel notion
of “coincidence of points”—a possibility which had not been considered in previous attempts. I see
a hint pointing in this direction in the structure of my translationally-invariant harmonic potential:
unlike standard harmonic potentials, the potential in my Equation (14) does not vanish when the
coordinates of the particles coincide: the potential in Equation (14) vanishes for $x_1^A = x_1^B$ and $x_2^A = x_2^B$
only if the momenta also coincide ($p_1^A = p_1^B$). This is reminiscent of some results obtained within the
recently-proposed relative-locality framework [13], where the only meaningful notion of “coincidence”
is a phase-space notion (not a notion that could be formulated exclusively in spacetime). This suggests
that one could perhaps improve upon previous attempts to formulate interacting quantum field
theories in κ-Minkowski by exploiting quantum-field-theory results being developed [29] for the
relative-locality framework.
Another direction for future studies which might bring some enlightenment concerns building
interacting theories with full relativistic covariance. Herein I focused on translation transformations
because it was sufficient for the purposes of my study, but it would be interesting to ask what
additional constraints would arise if one insists on full relativistic covariance (including boosts and
spatial rotations) rather than just translational invariance. For the law of composition of momenta
based on locality, a fully consistent relativistic picture is already known [13,14,29], and its consistency
with κ-Minkowski noncommutativity is well established. Important insight might be gained by
establishing whether or not analogous results are available for the law of composition of momenta
based on translational invariance of my interacting Hamiltonian.

Conﬂicts of Interest: The author declares no conﬂict of interest.

References
1. Majid, S. Meaning of Noncommutative Geometry and the Planck-Scale Quantum Group. Lect. Notes Phys.
2000, 541, 227.
2. Kosinski, P.; Lukierski, J.; Maslanka, P. Local Field Theory on κ-Minkowski Space, Star Products and
Noncommutative Translations. Czechoslov. J. Phys. 2000, 50, 1283–1290.
3. Agostini, A.; Lizzi, F.; Zampini, A. Generalized Weyl systems and kappa-Minkowski space. Mod. Phys. Lett. A
2002, 17, 2105–2126.
4. Lukierski, J.; Ruegg, H.; Zakrzewski, W.J. Classical and Quantum Mechanics of Free κ Relativistic Systems.
Ann. Phys. 1995, 243, 90–116.
5. Amelino-Camelia, G.; Lukierski, J.; Nowicki, A. Distance Measurement and κ-Deformed Propagation of
Light and Heavy Probes. Int. J. Mod. Phys. A 1999, 14, 4575–4588.
6. Kowalski-Glikman, J.; Nowak, S. Doubly Special Relativity theories as different bases of κ-Poincaré algebra.
Phys. Lett. B 2002, 539, 126–132.
7. Rovelli, C. Loop Quantum Gravity. Living Rev. Relativ. 2008, 11, 5.
8. Smolin, L. Quantum gravity with a positive cosmological constant. arXiv 2002, arXiv:hep-th/0209079.
9. Amelino-Camelia, G.; Smolin, L.; Starodubtsev, A. Quantum symmetry, the cosmological constant and
Planck scale phenomenology. Class. Quant. Grav. 2004, 21, 3095–3110.
10. Noui, K. Three Dimensional Loop Quantum Gravity: Particles and the Quantum Double. J. Math. Phys. 2006,
47, 102501.
11. Freidel, L.; Livine, E.R. 3-D quantum gravity and non-commutative quantum field theory. Phys. Rev. Lett.
2006, 96, 221301.
12. Oriti, D.; Ryan, J. Group field theory formulation of 3D quantum gravity coupled to matter fields.
Class. Quant. Grav. 2006, 23, 6543–6576.

13. Amelino-Camelia, G.; Freidel, L.; Kowalski-Glikman, J.; Smolin, L. The principle of relative locality.
Phys. Rev. D 2011, 84, 084010.
14. Amelino-Camelia, G. Quantum Spacetime Phenomenology. Living Rev. Relativ. 2013, 16, 5.
15. Lukierski, J. From noncommutative space-time to quantum relativistic symmetries with fundamental mass
parameter. In Proceedings of the Second International Symposium on Quantum Theory (QTS2), Krakow,
Poland, 18–21 July 2001.
16. Maggiore, M. The Atick-Witten free energy, closed tachyon condensation and deformed Poincare’ symmetry.
Nucl. Phys. B 2002, 69, 647.
17. Amelino-Camelia, G. Doubly-Special Relativity: First Results and Key Open Problems. Int. J. Mod. Phys. D
2002, 11, 1643.
18. Kowalski-Glikman, J. Introduction to Doubly Special Relativity. Lect. Notes Phys. 2005, 669, 131–159.
19. Girelli, F.; Livine, E.R. Physics of Deformed Special Relativity. Braz. J. Phys. 2005, 35, 432–438.
20. Jacobson, T.; Liberati, S.; Mattingly, D. Lorentz violation at high energy: Concepts, phenomena and
astrophysical constraints. Ann. Phys. 2006, 321, 150–196.
21. Hossenfelder, S. Multi-Particle States in Deformed Special Relativity. Phys. Rev. D 2007, 75, 105005.
22. Mignemi, S. Doubly special relativity and translation invariance. Phys. Lett. B 2009, 672, 186–189.
23. Magpantay, J.A. Dual doubly special relativity. Phys. Rev. D 2011, 84, 024016.
24. Amelino-Camelia, G.; Freidel, L.; Kowalski-Glikman, J.; Smolin, L. Relative locality and the soccer ball
problem. Phys. Rev. D 2011, 84, 087702.
25. Hossenfelder, S. Comment on “Relative locality and the soccer ball problem”. Phys. Rev. D 2013, 88, 028701.
26. Amelino-Camelia, G.; Freidel, L.; Kowalski-Glikman, J.; Smolin, L. Noisy soccer balls. Phys. Rev. D 2013,
88, 028702.
27. Snyder, H.S. Quantized Space-Time. Phys. Rev. 1947, 71, 38.
28. Yang, C.N. On Quantized Space-Time. Phys. Rev. 1947, 72, 874.
29. Freidel, L.; Rempel, T. Scalar Field Theory in Curved Momentum Space. arXiv 2013, arXiv:1312.3674.

entropy
Article
Structure of Multipartite Entanglement in Random
Cluster-Like Photonic Systems
Mario Arnolfo Ciampini 1, *, Paolo Mataloni 1 and Mauro Paternostro 2
1 Dipartimento di Fisica, Sapienza Università di Roma, Piazzale Aldo Moro 5, Rome 00185, Italy;
[email protected]
2 Centre for Theoretical Atomic, Molecular and Optical Physics, School of Mathematics and Physics,
Queen’s University Belfast, Belfast BT7 1NN, UK; [email protected]
* Correspondence: [email protected]; Tel.: +39-06-4991-3526

Received: 24 July 2017; Accepted: 2 September 2017; Published: 5 September 2017

Abstract: Quantum networks are natural scenarios for the communication of information among
distributed parties, and the arena of promising schemes for distributed quantum computation.
Measurement-based quantum computing is a prominent example of how quantum networking,
embodied by the generation of a special class of multipartite states called cluster states, can be used
to achieve a powerful paradigm for quantum information processing. Here we analyze randomly
generated cluster states in order to address the emergence of correlations as a function of the density
of edges in a given underlying graph. We find that the most widespread multipartite entanglement
does not correspond to the highest amount of edges in the cluster. We extend the analysis to higher
dimensions, finding similar results, which suggest the establishment of small world structures in
the entanglement sharing of randomised cluster states, which can be exploited in engineering more
efficient quantum information carriers.

Keywords: cluster states; multipartite entanglement; percolation

1. Introduction
In 1929, the Hungarian author Karinthy famously set out the concept of six degrees of separation [1],
the conjecture according to which any two living entities on Earth are distant by no more than ﬁve
intermediate steps. This concept was reprised and developed later on more rigorous sociological and
statistical grounds. Remarkably, for instance, a variation of the six degrees was unveiled by the group
of Barabasi in 1999 [2], who predicted that any page in the World Wide Web can be reached from any
other one with only nineteen intermediate steps (or clicks) on average.
As counterintuitive as these results might look, they are actually based on a very solid concept in
graph theory, namely the emergence of small worlds from connected networks. A small-world network
is a type of mathematical graph in which most nodes are not neighbours of one another, but can be
reached from every other one by a small number of steps that actually grows logarithmically with
the number of nodes themselves. The six and nineteen degrees of separation highlighted above are
different yet similar manifestations of the emergence of small worlds in a network.
Can these concepts be exported to the quantum domain? While the theory of quantum networks
has found fertile applications in quantum communication [3] and ground-breaking results in the
proposal of quantum repeaters for the faithful long-haul transport of quantum information [4,5], the
implications of the emergence of small worlds have been far less explored, and mostly conﬁned to
studies of excitation-transport and the analysis of the transition from localised to delocalised regimes
in spatially extended interacting-particle models [6,7].
Here, inspired by the analogy between classical network bonds and the correlations set between
two elements of a given network of quantum particles, we aim at exploring different aspects.

Entropy 2017, 19, 473; doi:10.3390/e19090473 257 www.mdpi.com/journal/entropy

In particular, motivated by the current experimental state-of-the-art in linear optics, which makes
available controllable networks of interconnected information carriers, we address the emergence
of typical lengths in the entanglement established by a random set of unitary gates applied to the
elements of a given graph. Specifically, we focus on a particular class of operations and networks,
i.e., those typically put in place in the procedure for the creation of so-called cluster states, which are
resources for measurement-based quantum computing [8].
This computational paradigm, which has been shown to be equivalent to any circuit-based quantum
computing protocol, is of fundamental importance in quantum information processing. Linear-optics
measurement-based quantum information processing has emerged as a promising avenue for the
exploration of controllable quantum protocols. Encoding and entangling qubits in more than one
degree of freedom of photons is a viable route to the generation of medium-to-large scale
photonic cluster states: hyperentanglement-based protocols have so far allowed for the creation of
cluster states of up to 6 qubits [9], which have been used to validate fundamental one-way quantum
algorithms [10,11].
In this paper, by randomising the application of the elementary gates needed to engineer a cluster
state of a given size, we induce the establishment of small worlds in the underlying network of a given
physical system, and address how the spreading of entanglement across the network itself is affected by
the degree of stochasticity of such gates. We unveil an interesting hierarchy with which entanglement
appears in subnetworks of growing size: only a sufﬁcient degree of determinism allows for the settling
of multipartite entanglement within a given cluster lattice, the threshold for k-element entanglement
depending neatly on the number of elements k itself. Moreover, we illustrate a fundamental difference
between the phenomenology illustrated in this paper and recently introduced concepts of classical
entanglement percolation [12].
The signiﬁcance of this study goes beyond the context set by cluster states and measurement-based
quantum information processing and addresses the fundamental concept of entanglement [13]. In fact,
the emergence of different lengths at which bipartite and multipartite entanglement arise from
a set of entangling transformations applied to the elements of a given network provides insightful
information on the entanglement sharing structure. In turn, such information could be used to design
better resources for quantum information protocols, obtained by applying only a small subset of
entangling operations rather than the whole set determined by the size of the network itself, and nevertheless
bearing entanglement-sharing properties very close to those of the fully connected network.
The remainder of this paper is organised as follows. In Section 2.1 we present randomly generated
cluster states as the platform for our investigation; in Section 2.2 we focus our attention on four-qubit
cluster states, presenting a rich analysis on the interplay between stochasticity of the gates used to set
the network and the settling of bipartite and multipartite entanglement. In Section 2.3 we extend our
analysis to larger networks.

2. Results
2.1. Theoretical Framework
The approach that we use in order to investigate the core question of our work can be schematised
as follows:

1. We set the value of the threshold q and generate a suitable number of random variables $p_{ij} \in [0, 1]$,
which embody the probabilities to apply the gate $\mathrm{CPHASE}_{i,j}(\pi)$ to the pair of qubits $(e_i, e_j)$.
2. We compare $p_{ij}$ to q. Should it be $p_{ij} < q$ ($p_{ij} > q$), $\mathrm{CPHASE}_{i,j}(\pi)$ is (not) applied. We exhaust the
number of all inequivalent pairs of qubits in the network. This produces the network state $|\psi\rangle_\Sigma$,
where $\Sigma = \{e_1, \ldots, e_N\}$ is the set of qubits of the register.
3. We compute the reduced density matrices $\rho_\sigma = \mathrm{Tr}_{\Sigma \setminus \sigma}\left[|\psi\rangle\langle\psi|_\Sigma\right]$ that are obtained upon tracing the
overall state over all qubits but those in the subset $\sigma \subset \Sigma$.
4. We calculate the percent fraction of such reductions that are entangled at the set value of q.
5. In order to eliminate any dependence on the specific random pattern of applications of the joint
gate, we repeat the procedure above for a number $Q \gg 1$ of instances.
6. When Q is reached, we change q and repeat the protocol from point 1 to 5.

Needless to say, the number of applications of CPHASEi,j (π ) at a set value of the threshold
depends strongly on the actual value of q itself: the larger the chosen value of q, the higher the number
of gate applications. This is illustrated in Figure 1, where we show the different conﬁgurations achieved
for a network of N = 8 elements for q = 0.2, 0.5 and 1, which is associated with a fully connected graph.
It is important to remark that, in our notation as well as in Figure 1, a bond connecting elements ei and
e j only means that gate CPHASEi,j (π ) was applied, and does not imply the existence of entanglement
between such elements.
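
The dependence of the number of gate applications on q can be illustrated with a few lines of Python covering steps 1 and 2 of the protocol. This is a minimal sketch under assumptions: the random variables are taken to be uniformly distributed, and the function names, the seed, and the number of instances are hypothetical choices made for the example.

```python
import itertools
import random


def sample_edges(n, q, rng):
    """Steps 1-2: draw p_ij uniformly in [0, 1) and keep the pair when p_ij < q."""
    return [(i, j) for i, j in itertools.combinations(range(n), 2)
            if rng.random() < q]


def mean_edge_count(n, q, instances, seed=0):
    """Average number of CPHASE applications over many random instances."""
    rng = random.Random(seed)
    counts = (len(sample_edges(n, q, rng)) for _ in range(instances))
    return sum(counts) / instances


if __name__ == "__main__":
    n = 8
    pairs = n * (n - 1) // 2
    for q in (0.2, 0.5, 1.0):
        # Under the uniform assumption the mean should approach q * pairs.
        print(q, mean_edge_count(n, q, 2000), q * pairs)
```

Under the uniform assumption each of the N(N − 1)/2 pairs receives a gate with probability q, so the average number of bonds grows linearly with q and reaches the fully connected graph at q = 1.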

Figure 1. Examples of instances of N = 8 qubit random cluster states. For panels (a–c) we have taken
q = 0.2, q = 0.5, and q = 1, respectively.

Scope of our investigation is ascertaining the phenomenology of distribution of (in general)

multipartite entanglement across a given network. In particular, we will focus on the possible
emergence of special values of q that are associated with the onset of multipartite entanglement, and
the characterisation of such quantum correlations. The inherently random nature of the resource
states that we consider makes any analytical prediction difficult to draw and provides the
necessary motivation for the statistical approach that, instead, will be used in the analysis that
follows. Notwithstanding its limited analytical power, we find such an investigation both powerful
and insightful.
As a side remark we mention that, as we have in mind a linear-optics implementation, which
to date is one of the most promising and successful platforms for the engineering of cluster-state
resources, in our analysis we will not account for any effect of dissipation on the random states that
are generated using the protocol illustrated above, as photon losses are negligible in such a setting.

2.2. Analysis of the Entanglement Structure in a Random Four-Qubit State

We start our analysis by focusing on an intuitive ﬁgure of merit that is nevertheless able to
provide crucial information on the distribution of entanglement across one of the random graph states
discussed above, namely state purity. We thus proceed to compute the purity

$$P_\sigma = \mathrm{Tr}_\sigma[\rho_\sigma^2] \in [0, 1] \qquad (1)$$

of the reduced density matrix $\rho_\sigma$, and use the fact that, given the overall pure nature of $|\psi\rangle_\Sigma$, a value of
$P_\sigma < 1$ necessarily implies entanglement in the bipartition $(\Sigma\setminus\sigma)|\sigma$. We have thus implemented the
protocol illustrated in Section 2.1 by calculating, in step 4, the percentage of reductions with $P_\sigma < 1$.


In order to illustrate the salient features of our analysis, we now address explicitly the case of
N = 4, for which Σ = {e1 , . . . , e4 }. The state that would be produced by applying CPHASEi,j (π ) gates
to every pair of qubits in the network, which would correspond to choosing q = 1, reads

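$$
\begin{aligned}
|\psi\rangle_\Sigma &= \frac{1}{\sqrt{2}}\, \hat{H}_{e_4} \left( |\phi_+\rangle_{e_1 e_4} |\phi_-\rangle_{e_2 e_3} + |\psi_+\rangle_{e_1 e_4} |\psi_-\rangle_{e_2 e_3} \right) \\
&= \frac{1}{\sqrt{2}}\, \hat{H}_{e_3} \left( |\phi_-\rangle_{e_1 e_2} |\phi_+\rangle_{e_3 e_4} - |\psi_+\rangle_{e_1 e_2} |\psi_-\rangle_{e_3 e_4} \right) \\
&= \frac{1}{\sqrt{2}}\, \hat{H}_{e_2} \left( |\phi_-\rangle_{e_1 e_3} |\phi_+\rangle_{e_2 e_4} - |\psi_+\rangle_{e_1 e_3} |\psi_-\rangle_{e_2 e_4} \right) \\
&= \frac{1}{\sqrt{2}}\, \hat{H}_{e_1} \left( |\phi_-\rangle_{e_1 e_2} |\phi_+\rangle_{e_3 e_4} - |\psi_+\rangle_{e_1 e_2} |\psi_-\rangle_{e_3 e_4} \right)
\end{aligned}
\qquad (2)
$$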

where $\hat{H}_{e_j}$ is the Hadamard gate on qubit $e_j$ and we have introduced the Bell states
$|\phi_\pm\rangle_{e_i e_j} = (|00\rangle \pm |11\rangle)_{e_i e_j}/\sqrt{2}$ and $|\psi_\pm\rangle_{e_i e_j} = (|01\rangle \pm |10\rangle)_{e_i e_j}/\sqrt{2}$. The orthogonality of Bell states ensures that entanglement
exists in the three inequivalent bipartitions $(e_i, e_j)|(e_k, e_l)$. Moreover, it is equally straightforward to
check that any single-qubit reduction is maximally mixed. Therefore, the bipartitions $e_i|(e_j, e_k, e_l)$
are also entangled. This implies that for $q = 1$ we expect all six bipartitions that can be identified to be
inseparable and the state to be genuinely multipartite entangled. The purity of the associated reduced
states is thus necessarily smaller than one. However, for $q < 1$ the number of mixed-state reductions is
not necessarily as large as six, and our calculations aim at quantifying the percentage of such reduced
states as $q$ is varied.
The results of such calculations are presented in Figure 2 (blue and red dots), where each data
point is the result of an average over Q = 5000 random instances, a sample-size that was large
enough to ensure convergence of the numerics. The error bars attached to each point show the
uncertainty associated with the averages, calculated as the standard deviation of each $Q$-sized sample
divided by $\sqrt{Q}$. Clearly, for $q = 0$ the state of the network is deterministically found to be the
factorised initial state $\bigotimes_{j=1}^{4} |+\rangle_{e_j}$, while for $q = 1$ we retrieve the result anticipated above (Equation (2)).
In between such extreme situations, the number of inseparable two-vs.-two and one-vs.-three qubits
bipartitions (equivalently, mixed two-qubit and one-qubit states) grows monotonically with q, albeit at
slightly different rates. In particular, we ﬁnd that the percentage fraction of inseparable two-vs.-two
(three-vs.-one) qubits bipartitions exceeds 99.9% at q = 0.82 ± 0.01 (q = 0.89 ± 0.01), as shown by the
vertical dashed line marked as $T_2$ ($T_3$) in Figure 2. The nominal positions (uncertainties) of $T_{2,3}$ have
been obtained as the average (standard deviation) over 100 analytical non-linear interpolations of
the results of our simulations, each producing the functions $f_{2,3}(q)$ (whose averages are shown by the
blue and red lines in Figure 2) that have been used to solve numerically the equations $f_{2,3}(q) = 99.9$.
Quite clearly, $T_2 \neq T_3$ beyond statistical errors, which implies that the random network at hand requires
a higher threshold in $q$ to produce a complete set of inseparable one-vs.-three qubits bipartitions.
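
The two statistical ingredients of this paragraph, the error bar on each average and the location of the 99.9% crossing, can be illustrated with a short sketch. It rests on assumptions that are not taken from the paper: the helper names `mean_and_sem` and `crossing` are hypothetical, the sample standard deviation uses `ddof=1`, and a plain piecewise-linear interpolation stands in for the non-linear interpolating functions $f_{2,3}(q)$ used by the authors.

```python
import numpy as np

def mean_and_sem(samples):
    """Mean of a sample and its standard error (standard deviation divided by sqrt(Q))."""
    samples = np.asarray(samples, dtype=float)
    return samples.mean(), samples.std(ddof=1) / np.sqrt(samples.size)

def crossing(q_grid, values, level=99.9):
    """First q at which the piecewise-linear curve through (q_grid, values) reaches `level`.

    Returns None if the level is never reached on the grid.
    """
    q_grid = np.asarray(q_grid, dtype=float)
    values = np.asarray(values, dtype=float)
    above = np.nonzero(values >= level)[0]
    if above.size == 0:
        return None
    k = above[0]
    if k == 0:
        return float(q_grid[0])
    q0, q1 = q_grid[k - 1], q_grid[k]
    v0, v1 = values[k - 1], values[k]            # v0 < level <= v1, so v1 != v0
    return float(q0 + (level - v0) * (q1 - q0) / (v1 - v0))
```

Repeating `crossing` over resampled data sets and taking the mean and standard deviation of the results would give a nominal threshold and its uncertainty in the spirit of the procedure described above.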


Figure 2. We study the percentage fraction of mixed-state reductions that can be identiﬁed in a network
of N = 4 elements, against the threshold parameter q. The blue (red) dots show the results of the
numerical experiment aimed at quantifying the fraction of mixed two-qubit (one-qubit) reductions.
The orange points identify the values of the percentage fraction F2 of two-qubit reductions whose
purity is exactly 1/4. The solid lines are non-linear interpolations of the data points. Each point is the
result of an average over a sample of Q = 5000 elements. Error bars show the standard deviations
associated with such averages. Dashed lines T2,3 identify the value of q at which the number of mixed
two- and one-qubit reductions is at least 99.9% of the possible ones. The line labelled max[F2 ] identiﬁes
the value of q at which the maximum of F2 occurs.

Needless to say, the empirical rule of “no free lunch” applies here as well: the establishment of
multipartite entanglement in the network under scrutiny has to come at the expense of something
else, in light of the monogamy of entanglement. The specific algorithm at hand allows us to explore
who pays the toll represented by the establishment of genuine multipartite entanglement in the
random network.
In particular, we expect bipartite entanglement to be affected by the emergence of multipartite entanglement.
Such expectation is corroborated by the analysis summarized by the orange dots and curve in Figure 2,
which show the percentage fraction F2 of two-vs.-two qubits reductions of random states at a given
value of q that have purity exactly equal to 1/4, which is the lowest a two-qubit state can achieve and
witnesses maximum entanglement across the (ei , e j )|(ek , el ) bipartition. Quite intuitively, F2 grows at
small values of q: a low threshold implies very small probability to apply multiple CPHASE gates,
which inevitably favours the construction of maximally entangled two-qubit states. For q close to 1, we have
a large probability that one qubit is affected by multiple CPHASE gates. Intuitively, this should be
able to set strong multipartite entanglement and deplete the degree of bipartite entanglement, and we expect F2
to decrease accordingly. Indeed, we know that at q = 1 we have a genuinely multipartite entangled state.
The orange dots in Figure 2 confirm such expectation, and show the occurrence of a maximum of F2
that is close, yet not identical, to the chosen thresholds T2,3 discussed above (we have that max[F2 ]
occurs at q = 0.72 ± 0.01).
Of course, counting the number of reductions that are in mixed states does not provide full
information about the multipartite nature of the entanglement that is established among the elements of
the network. We recall that a pure N-partite state is called genuinely multipartite entangled if it is
not separable with respect to any of the possible bipartitions of its N elements. One can thus check
the multipartite nature of the entanglement of a given pure state by counting the number of separable
bipartitions that can be drawn. As each instance of our random sample is a pure state, we have decided
to approach this task by using the N-partite generalisation of negativity defined as
$$E_N = \sqrt[N]{\prod_{\{\sigma\}} E_{\sigma|\Sigma\setminus\sigma}}, \qquad (3)$$

where $E_{\sigma|\Sigma\setminus\sigma}$ is the negativity of the partially transposed density matrix of the bipartition $\sigma|\Sigma\setminus\sigma$ and
the product extends to all the bipartitions. We recall the definition of negativity as

$$E_{\sigma|\Sigma\setminus\sigma} = \max\Big[0, -2 \sum_j \lambda_j^-\Big] \qquad (4)$$

with $\{\lambda_j^-\}$ the set of negative eigenvalues of the partially transposed (with respect to any of the
subparties) density matrix of the bipartition $\sigma|\Sigma\setminus\sigma$. The geometric average upon which Equation (3)
is built is null whenever at least one of the bipartitions of the network is positive under partial
transposition. Therefore, for pure states, only if all bipartitions are certified inseparable according
to the partial transposition criterion is the state of the network genuinely multipartite entangled.
The situation is much more difficult when mixed states are considered, for which the non-nullity of
the quantity in Equation (3) is no guarantee of the existence of genuine multipartite entanglement in
a given state [14].
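
As an illustration of Equations (3) and (4), the sketch below computes the negativity of one bipartition from the partial transpose and then combines all bipartitions of a pure state. It is a sketch under assumptions rather than the authors' implementation: the function names, the tolerance `tol`, and the enumeration of bipartitions through the side containing qubit 0 are choices made here, and the root index is taken to be the number of qubits $N$, following Equation (3) as printed (a geometric mean in the strict sense would instead use the number of bipartitions).

```python
import itertools
import numpy as np

def negativity(rho, n, part, tol=1e-12):
    """Equation (4): max[0, -2 * sum of negative eigenvalues] of the partial transpose over `part`."""
    t = rho.reshape([2] * (2 * n))               # axes 0..n-1: row bits, n..2n-1: column bits
    axes = list(range(2 * n))
    for k in part:
        axes[k], axes[n + k] = axes[n + k], axes[k]   # exchange row and column index of qubit k
    pt = np.transpose(t, axes).reshape(2 ** n, 2 ** n)
    eig = np.linalg.eigvalsh(pt)                 # the partial transpose is Hermitian
    return max(0.0, -2.0 * float(eig[eig < -tol].sum()))

def multipartite_negativity(psi, n):
    """N-th root of the product of the negativities of all bipartitions of a pure state."""
    rho = np.outer(psi, psi.conj())
    product = 1.0
    for size in range(1, n):
        for part in itertools.combinations(range(n), size):
            if 0 in part:                        # each bipartition is counted exactly once
                product *= negativity(rho, n, part)
    return product ** (1.0 / n)
```

Since the product vanishes as soon as a single bipartition has a positive partial transpose, the function returns zero for any pure state that is separable across some cut.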
Figure 3 shows the behavior of E4 against q. While for q > 0 we always have four-partite
entanglement (in line with the finding in Figure 2), it is remarkable that q = 1 is not associated with
the largest degree of four-partite negativity, which actually occurs at q = 0.72 ± 0.01.

Figure 3. Average four-partite negativity E4 plotted against q obtained for a sample of Q = 5000 random
network states. The error bars are the standard deviations associated with the averages. The orange
solid line is a non-linear interpolating function whose maximum is achieved at q = 0.72 ± 0.01 (vertical
dashed line).

We continue the assessment of the four-partite case by pointing out the differences between the
average behavior of the ﬁgures of merit addressed herein and the values taken by such indicators over
the average state of the network. The latter is defined as the state obtained upon averaging over Q
random instances of network states. Formally, by assuming all instances to be equally likely to occur
(which is entailed by choosing the probabilities to apply gates CPHASEi,j (π ) uniformly), the physical
state of the system is described by the density matrix

$$\rho_\Sigma = \frac{1}{Q} \sum_{j=1}^{Q} |\psi\rangle\langle\psi|_{\Sigma,j}, \qquad (5)$$

where $|\psi\rangle\langle\psi|_{\Sigma,j}$ is the $j$th random state of the $Q$-sized sample.
With the exception of the cases associated with q = 0, 1 (when we sum identically prepared states),
by averaging we lose the purity of the network state: PΣ reaches values as low as 0.14 for q = 0.5
(cf. Inset (a) of Figure 4), which is however larger than the minimum purity 1/16 achievable by a
four-qubit state. Despite being mixed, the average state of the network preserves signiﬁcant quantum
coherences as quantiﬁed by the measure proposed in [15] and formalised as

$$C = \sum_{i \neq j} |(\rho_\Sigma)_{ij}| \qquad (6)$$

with $|(\rho_\Sigma)_{ij}|$ the moduli of the off-diagonal elements of the density matrix $\rho_\Sigma$. The behavior of $C$ against $q$ is shown
in Inset (b) in Figure 4: a minimum of the measure of coherence is achieved in correspondence with the
minimum purity. However, such a minimum is strictly non-null, thus leaving open the possibility of
dealing with a (mixed) state of the network exhibiting a non-trivial entanglement structure. Such a
possibility is confirmed by the analysis of $E_4$ (cf. main panel of Figure 4), which is a growing function
of $q$ (similar trends are exhibited by both the two-vs.-two qubits entanglement $E_{(e_i,e_j)|(e_k,e_l)}$ and the
one-vs.-three qubits one $E_{(e_i)|(e_j,e_k,e_l)}$). Nothing remarkable in the behavior of $E_4$ appears to be related
to the value $q = 0.5$, although the function changes concavity in correspondence with such a value
of the probability threshold. It should be noticed that, as anticipated, in such an average-state case
$E_N$ cannot be interpreted as a quantifier of genuine multipartite entanglement. Indeed, the detection
of multipartite entanglement in general multiparty mixed states requires a more refined approach
(see [16] for a recent assessment of this point and the provision of useful criteria). Nevertheless, this
figure of merit is still very useful for our analysis, as it provides valuable information on the average
amount of bipartite entanglement within the statistically average state of the network, and we will thus
make further use of $E_N$ in the remainder of this work. Finally, the non-nullity of either the $E_{(e_i)|(e_j,e_k,e_l)}$
or the $E_{(e_i,e_j)|(e_k,e_l)}$ does not exclude the possibility of facing bound entanglement (i.e., non-distillable
entanglement) of the negative-partial-transposition nature [17] in those bipartitions, an issue that goes
beyond the scope of this work.
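
The construction of the average state in Equation (5), together with its purity and the coherence measure of Equation (6), can be sketched as below. The code is illustrative only and makes assumptions not found in the text: the function names are hypothetical, NumPy's random generator replaces whatever sampling the authors used, and the sample size in the usage lines is far smaller than the Q = 5000 of the paper.

```python
import itertools
import numpy as np

def random_graph_state(n, q, rng):
    """Flat state vector of a random cluster state: CPHASE(pi) on each pair with probability q."""
    psi = np.full([2] * n, 2.0 ** (-n / 2))      # |+>^n as a tensor with one axis per qubit
    for i, j in itertools.combinations(range(n), 2):
        if rng.random() < q:
            index = [slice(None)] * n
            index[i] = index[j] = 1
            psi[tuple(index)] *= -1.0            # phase flip on the |11> component
    return psi.reshape(-1)

def average_state(n, q, samples, rng):
    """Equal-weight mixture of `samples` random instances, as in Equation (5)."""
    rho = np.zeros((2 ** n, 2 ** n))
    for _ in range(samples):
        psi = random_graph_state(n, q, rng)
        rho += np.outer(psi, psi)                # amplitudes are real here, so no conjugation is needed
    return rho / samples

def purity(rho):
    """Tr[rho^2] of the full network state."""
    return float(np.trace(rho @ rho).real)

def l1_coherence(rho):
    """Sum of the moduli of the off-diagonal elements, as in Equation (6)."""
    return float(np.abs(rho).sum() - np.abs(np.diag(rho)).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for q in (0.0, 0.5, 1.0):
        rho = average_state(4, q, 500, rng)
        print(q, purity(rho), l1_coherence(rho))
```

At q = 0 and q = 1 every instance is the same pure state, so the purity stays at one; in between, the mixture over different gate patterns lowers it.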


Figure 4. Main panel: Logarithmic plot of the entanglement within the average state ρΣ of an
N = 4 random network against the threshold probability q. The red dots show the value taken
by the four-partite negativity E4 , while the blue and orange ones are for the entanglement within
the bipartitions (ei , e j )|(ek , el ) and (ei )|(e j , ek , el ). The lines connecting the dots are simply guides to
the eye. Inset (a): Purity PΣ of the average state against q. The dashed horizontal line shows the
minimum purity of a four-qubit state. Inset (b): Values taken by the measure of coherence C against
the threshold probability.

To ﬁnish the study of this paradigmatic case, we report in the main panel of Figure 5 the behavior
of E3 in the four three-qubit reduced states that can be singled out from our network. We have used
the tripartite version of Equation (3) to quantify the entanglement and changed our notation so as to
make explicit the triplets of elements of the network that we have considered. Moreover, by tracing
out two elements, we have evaluated the residual two-qubit entanglement, whose average across the
six two-qubit reductions is displayed in the inset of Figure 5. The general trend of such figures of
merit follows the expectation that, in the large-q region, the entanglement in the reductions is depleted
to favour the emergence of multipartite entanglement. Moreover, their quantitative value is, in general, very
small. A point of note is that the peaks of three- and two-qubit negativity do not occur at the same
value of q, thus suggesting an interesting hierarchy of values of q at which the various structures of
entanglement across the system are triggered or destroyed.


Figure 5. Main panel: E3 in the three-qubit reductions (extracted from an N = 4 network) identiﬁed in
the legend, plotted against q. Each plot is an average over Q = 5000 realisations of the random network
state (we omit the error bars for clarity of presentation). Inset: Mean bipartite negativity E bip averaged
over the six two-qubit reduced states that can be singled out from our network. Same conditions as in
the main panel.

2.3. Enlarging the Size of the Network

We now assess the features of larger networks of qubits, addressing questions that are akin to
those assessed in Section 2.2. Features similar to those showcased in the four-qubit network are present
in all the higher-dimensional systems that we have studied through our simulations. For instance,
Figures 6 and 7 display the same behaviors highlighted in Figures 2 and 4, respectively. Rather than
reporting qualitatively similar plots for larger networks, in Table 1 we present the threshold values of
q at which progressively larger reductions of the state of the network are mixed.

Figure 6. We study the percentage fraction of mixed-state reductions that can be identiﬁed in a network
of N = 5 elements, against the threshold parameter q. The red dots show the results of the numerical
experiment aimed at quantifying the fraction of mixed two- and three-qubit reductions, which actually
coincide. The purple dots show the results for the one-qubit reductions. The orange points identify the
values of the percentage fraction F2 of two-qubit reductions whose purity is exactly 1/4. The solid lines
are non-linear interpolations of the data points. Each point is the result of an average over a sample of
$Q = 10^4$ elements. Error bars show the standard deviations associated with such averages. Dashed
lines T2,3 (T4 ) identify the value of q at which the number of mixed two- and three-qubit (one-qubit)
reductions is at least 99.9% of the possible ones. The line labelled max[F2 ] identiﬁes the value of q at
which the maximum of F2 occurs.


Figure 7. Main panel: Logarithmic plot of the entanglement within the average state ρΣ of an
N = 5 random network against the threshold probability q. The red dots show the value taken by E5 ,
while the blue and orange ones are for the entanglement within the bipartitions (ei , e j )|(ek , el , em ) and
(ei )|(e j , ek , el , em ). The lines connecting the dots are simply guides to the eye. Inset (a): Purity PΣ of
the average state against q. The dashed horizontal line shows the minimum purity of a four-qubit state.
Inset (b): Values taken by the measure of coherence C against the threshold probability.

The trend is clear: as we look into larger networks, the value of Tk (k = 2, 3, . . . ) decreases.

Table 1. The table shows the threshold value of q at which the fraction of mixed reductions of progressively
larger size in an N-element random network is at least 99.9%. Black squares stand for unavailable data at that
size of the network. As before, max[F2 ] is the value of q at which the maximum of F2 occurs.

N         4       5       6       ···     9
max F2    0.72    0.66    0.64    ···     0.40
T2        0.82    0.67    0.57    ···     0.39
T3        0.89    0.67    0.54    ···     0.31
T4        ■       0.818   0.57    ···     0.27
T5        ■       ■       0.75    ···     0.27
T6        ■       ■       ■       ···     0.31
T7        ■       ■       ■       ···     0.39
T8        ■       ■       ■       ···     0.40

2.4. Entanglement Percolation

It is interesting to compare our analysis to entanglement percolation, a concept akin to classical
bond percolation introduced in [12]. Consider a graph of particles akin to one of those addressed
in this paper. This time, though, a link between two elements implies the presence of entanglement
between them. Ref. [12] shows the existence of a minimum amount of entanglement between any two
elements of the network needed to establish a perfect quantum channel between distant (not directly
connected) elements, with signiﬁcant (non-exponentially decaying) probability.
This is fundamentally different from our situation, where instead we point out the existence
of a minimum probability to randomly apply a two-qubit gate in a network associated with the
establishment of a genuinely multipartite entangled state of the network. Our threshold does not
guarantee the existence of a long-distance entangled channel between arbitrarily chosen elements of
the network. In fact, non-nearest-neighbour elements of a cluster state are not necessarily entangled,
their entanglement being in general dependent on the geometry of the underlying network.
In order to ascertain if a value of q exists above which long-haul entanglement is set in the
network, we computed the negativity of the reduced state of the qubits that have the largest number
of intermediate sites between them, at a given value of N. This is analogous to the study presented in
the inset of Figure 5, although instead of an average over all the possible two-qubit reductions, here
we consider only a specific reduction. Figure 8 shows the results for the case of N = 6, for
which we address the entanglement between elements e1 and e4 . We have considered the percentage
of reductions of such elements with a non-zero value of negativity against the value of q. Quite clearly,
such a percentage always remains very small, regardless of q, showing that no classical entanglement
percolation effect occurs, as there is no value of q at which long-distance entanglement within the
network is set deterministically. The results should be considered as canonical, qualitatively valid
regardless of the actual choice of N, and indicative of the profound differences between the situation
addressed here and the study in [12].
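
A check of this kind can be sketched as follows: for each random instance, the two chosen qubits are isolated by tracing out the rest, and the negativity of the resulting two-qubit state is tested for being non-zero. The sketch is an illustration under assumptions, not the authors' code: the function names and tolerances are hypothetical, and qubit indices 0 and 3 stand in for the elements e1 and e4.

```python
import itertools
import numpy as np

def graph_state(n, edges):
    """State obtained from |+>^n by applying CPHASE(pi) on every pair listed in `edges`."""
    psi = np.full([2] * n, 2.0 ** (-n / 2))
    for i, j in edges:
        index = [slice(None)] * n
        index[i] = index[j] = 1
        psi[tuple(index)] *= -1.0                # phase flip on the |11> component
    return psi

def pair_negativity(psi, a, b, tol=1e-12):
    """Negativity of the two-qubit reduction onto qubits a and b."""
    rest = [k for k in range(psi.ndim) if k not in (a, b)]
    m = np.transpose(psi, [a, b] + rest).reshape(4, -1)
    rho = m @ m.conj().T                         # reduced state of the pair (a, b)
    pt = rho.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)  # transpose qubit b only
    eig = np.linalg.eigvalsh(pt)
    return max(0.0, -2.0 * float(eig[eig < -tol].sum()))

def entangled_pair_percentage(n, q, a, b, samples, rng):
    """Percentage of random instances in which qubits a and b have non-zero negativity."""
    pairs = list(itertools.combinations(range(n), 2))
    hits = 0
    for _ in range(samples):
        edges = [e for e in pairs if rng.random() < q]
        hits += int(pair_negativity(graph_state(n, edges), a, b) > 1e-9)
    return 100.0 * hits / samples

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for q in (0.1, 0.5, 0.9):
        print(q, entangled_pair_percentage(6, q, 0, 3, 2000, rng))
```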

Figure 8. Percentage of reduced states of elements e1 and e4 of an N = 6 random network exhibiting a
non-zero negativity, plotted against q, for a sample of $5 \times 10^4$ states.

3. Discussion
We have studied the entanglement sharing structure among the elements of a qubit network
subjected to probabilistic CPHASE gates. We have highlighted the existence of statistically inequivalent
thresholds in the probability of application of the gates for the settling of entanglement in various
subsets of network elements, thus unveiling an interesting hierarchy in the entanglement distribution
pattern of a given network. The phenomenology that we have highlighted cannot be understood
in terms of the statistical properties of an intuitive, yet too naive, reference state such as the one
obtained by averaging over all the elements of the random set of states generated in our numerical
experiments: the above-mentioned hierarchy is a statistical feature of random networks rather than a
property of the statistically average state of the network. Remarkably, small-world structures in the
entanglement sharing of the random set of network states appear to emerge. This is an interesting
feature that deserves more attention and upon which we plan to focus our forthcoming (theoretical
and experimental) efforts.

Acknowledgments: Mario Arnolfo Ciampini acknowledges support from QUCHIP-Quantum Simulation on
a Photonic Chip, FETPROACT-3-2014, Grant agreement no: 641039. Mauro Paternostro acknowledges support
from the SFI-DfE Investigator Programme (grant 15/IA/2864), and the Royal Society.
Author Contributions: Mario Arnolfo Ciampini conceived the idea; Mario Arnolfo Ciampini and Mauro
Paternostro performed the simulation and analysed the data; Mauro Paternostro and Paolo Mataloni interpreted
the results; all authors contributed to drafting and revising the manuscript.
Conﬂicts of Interest: The authors declare no conﬂict of interest.

References
1. Karinthy, F. Láncszemek. In Minden Masképpen van, 1929. Available online:
http://mek.oszk.hu/15500/15588/15588.pdf (accessed on 5 September 2017). (In Hungarian)
2. Albert, R.; Jeong, H.; Barabasi, A.-L. Internet: Diameter of the World-Wide Web. Nature 1999, 401, 130–131.
3. Kimble, H.J. The quantum internet. Nature 2008, 453, 1023–1030.
4. Munro, W.J.; Harrison, K.A.; Stephens, A.M.; Devitt, S.J.; Nemoto, K. From quantum multiplexing to
high-performance quantum networking. Nat. Photonics 2010, 4, 792–796.
5. Epping, M.; Kampermann, H.; Bruß, D. Robust entanglement distribution via quantum network coding.
New J. Phys. 2016, 18, 103052.
6. Zhu, C. P.; Xiong, S.-J. Localization-delocalization transition of electron states in a disordered quantum
small-world network. Phys. Rev. B 2000, 62, 14780.
7. Giraud, O.; Georgeot, B.; Shepelyansky, D.L. Tuning clustering in random networks with arbitrary degree
distributions. Phys. Rev. E 2005, 72, 036203.


8. Briegel, H.J.; Browne, D.E.; Dür, W.; Raussendorf, R.; Van den Nest, M. Measurement-based quantum
computation. Nat. Phys. 2009, 5, 19.
9. Vallone, G.; Donati, G.; Ceccarelli, R.; Mataloni, P. Six-qubit two-photon hyperentangled cluster states:
Characterization and application to quantum computation. Phys. Rev. A 2010, 81, 052301.
10. Vallone, G.; Pomarico, E.; De Martini, F.; Mataloni, P. One-way quantum computation with two-photon
multiqubit cluster states. Phys. Rev. A 2008, 78, 042335.
11. Ciampini, M.A.; Orieux, A.; Paesani, S.; Sciarrino, F.; Corrielli, G.; Crespi, A.; Ramponi, R.; Osellame, R.;
Mataloni, P. Path-polarization hyperentangled and cluster states of photons on a chip. Light Sci. Appl. 2016,
5, e16064.
12. Acín, A.; Cirac, J.I.; Lewenstein, M. Entanglement Percolation in Quantum Networks. Nat. Phys. 2007, 3, 256.
13. Horodecki, R.; Horodecki, P.; Horodecki, M.; Horodecki, K. Quantum entanglement. Rev. Mod. Phys. 2009,
81, 865.
14. Huber, M.; Mintert, F.; Gabriel, A.; Hiesmayr, B.C. Detection of high-dimensional genuine multipartite
entanglement of mixed states. Phys. Rev. Lett. 2010, 104, 210501.
15. Baumgratz, T.; Cramer, M.; Plenio, M.B. Quantifying Coherence. Phys. Rev. Lett. 2014, 113, 140401.
16. Lancien, C.; Gühne, O.; Sengupta, R.; Huber, M. Relaxations of separability in multipartite systems:
Semideﬁnite programs, witnesses and volumes. J. Phys. A Math. Theor. 2015, 48, 505302.
17. Horodecki, P.; Horodecki, R. Distillation and bound entanglement. Quant. Inf. Comp. 2001, 1, 45.

entropy
Article
Non-Causal Computation
Ämin Baumeler 1 and Stefan Wolf 2, *
1 Faculty of Informatics, Università della Svizzera italiana, 6900 Lugano, Switzerland; [email protected]
2 Facoltà indipendente di Gandria, 6978 Gandria, Switzerland
* Correspondence: [email protected]; Tel.: +41-58-666-4000

Received: 11 May 2017; Accepted: 30 June 2017; Published: 2 July 2017

Abstract: Computation models such as circuits describe sequences of computation steps that
are carried out one after the other. In other words, algorithm design is traditionally subject to the
restriction imposed by a ﬁxed causal order. We address a novel computing paradigm beyond quantum
computing, replacing this assumption by mere logical consistency: We study non-causal circuits, where
a ﬁxed time structure within a gate is locally assumed whilst the global causal structure between the
gates is dropped. We present examples of logically consistent non-causal circuits outperforming
all causal ones; they imply that suppressing loops entirely is more restrictive than just avoiding
the contradictions they can give rise to. That fact is already known for correlations as well as for
communication, and we here extend it to computation.

Keywords: physical computing models; complexity classes; causality

1. Introduction
Computations, understood as realized through Turing machines, billiard or ballistic computers [1],
circuits, lists of computer instructions, or otherwise, are often designed to have a linear (i.e., causal)
time flow: After a fundamental operation is carried out, the program counter moves to the next
operation, and so forth. Surely, this is in agreement with our everyday experience; after you finish
reading this sentence, you continue to the next (hopefully), or do something else (in that case: goodbye!).
What sorts of computation become admissible if one drops the assumption of a linear time flow and reduces it to
mere logical consistency? One could imagine that a linear time flow restricts computation strictly beyond
what would be allowed from a purely logical point of view. Indeed, we show this to be true. If the assumption
of a linear time flow is dropped, a variable of the computational device could depend on “past” as
well as “future” computation steps. Such a dependence can be interpreted as loops in the time flow,
e.g., generated by a closed timelike curve [2]. There are two fundamental issues that might make loops
logically inconsistent. One is the liability to the grandfather antinomy. In a loop-like information flow,
multiple contradicting values could potentially be assigned to a variable—the variable is overdetermined.
The other issue is underdetermination: a variable could take multiple consistent values, yet the model
of computation cannot predict which actual value it takes. This underdetermination is also known
as the information antinomy. To overcome both issues, we restrict ourselves to models of computation
where the assumption of a linear time flow is dropped and replaced by the assumption of logical
consistency: All variables are neither overdetermined nor underdetermined. We call such models
of computation non-causal. Our main result is that non-causal models of computation are strictly more
powerful than the traditional causal ones. Therefore, causality is a stronger assumption than logical
consistency in the context of computation. Similar results are also known with respect to quantum
computation [3–7], correlations [5,8–11] as well as communication [12]. As we will show later, such
circuits are “programmed” by introducing a contradiction if an undesired result is found. This is like
guessing the solution to a problem and killing one's own grandfather in the event that the guess was
wrong (similar to “quantum suicide” [13] or “anthropic computing” [14]).

Entropy 2017, 19, 326; doi:10.3390/e19070326 269 www.mdpi.com/journal/entropy


The article is structured as follows. First, we discuss the assumption of logical consistency
in more depth, then we describe a non-causal circuit model of computation and give a few examples
of problems that can be solved more efficiently. We continue by describing other non-causal models
of computations: the non-causal Turing machine and non-causal billiard computer. We conclude
by showing how to efficiently find a satisfying assignment to a SAT formula if the number of satisfying
assignments is previously known.

2. Logical Consistency
Let ρt be the ensemble of all variables (also called state) of a computational model at a time t.
In general, ρt depends on ρt−1 , ρt−2 , . . . . Without loss of generality, assume that ρt depends on ρt−1
only (i.e., the computation is described by a Markov chain). These dependencies are depicted
in Figure 1a. In a non-causal model, however, the values that are assigned to the variables at time t could
in principle depend on “future” time-steps; e.g., the assignment ρ0 could depend on ρm , which results
in a Markovian “bracelet” or circle (see Figure 1b).

Figure 1. Causal and non-causal computation. The arrows point in the direction of computation.
(a) The values that are assigned to the variables of a computational model at time t depend on ρt−1 .
(b) Cyclic dependencies of the values that are assigned to the variables at different steps during the
computation.

A computational model is not overdetermined if and only if the values that are assigned to the
variables do not contradict each other. This is equivalent to the existence of a ﬁxed point [15]
of the Markov chain that results from cutting the “bracelet” at an arbitrary position (see Figure 1b).
Let f be a function that describes the behaviour of this Markov chain. Then, the computational model
is not overdetermined if and only if ∃ x : f ( x ) = x.
A computational model is not underdetermined if and only if there exists at most one ﬁxed point [15]:

|{ x | x = f ( x )}| ≤ 1 .

Logical consistency is identified [15] with no overdetermination and no underdetermination; i.e.,
the existence of a unique fixed point:

∃!x : f(x) = x.
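
For a finite state space, the three cases above can be told apart by simply counting fixed points. The following sketch is an illustration only; the function name `classify` and the string labels are hypothetical, and the three example maps on a single bit (bit flip, identity, constant) are chosen here for demonstration.

```python
def classify(f, domain):
    """Classify a loop described by the map f according to its number of fixed points."""
    fixed_points = [x for x in domain if f(x) == x]
    if not fixed_points:
        return "overdetermined"      # no consistent assignment exists
    if len(fixed_points) > 1:
        return "underdetermined"     # several consistent assignments, none singled out
    return "consistent"              # exactly one fixed point

if __name__ == "__main__":
    bits = (0, 1)
    print(classify(lambda x: 1 - x, bits))   # bit flip
    print(classify(lambda x: x, bits))       # identity
    print(classify(lambda x: 0, bits))       # constant map
```

The exhaustive search is only meant to make the definitions concrete; its cost grows with the size of the state space.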


3. Non-Causal Circuit Model

A circuit consists of gates that are interconnected with wires. In the traditional circuit model,
back-connections (i.e., a cyclic path through a graph where gates are identiﬁed with nodes and
wires are identiﬁed with edges) are either forbidden or interpreted as feedback channels. An example
of a feedback channel is an autopilot system in an aircraft that, depending on the measured altitude,
adjusts the rudder and the power setting to maintain the desired altitude, at the same time avoiding
a stall. Here, we interpret back-connections or loops differently. Whilst in the above scenario the
feedback gets introduced at a later point in the computation, the back-action in a non-causal circuit
affects the system at an earlier point. Such a back-action can be interpreted as acting into the past.
Another interpretation is that every gate has its own time (clock), but no global time is assumed—this
interpretation stems from the studies of correlations without causal order [5,8]. Such an interpretation
might be more pleasing: Here, “earlier” is understood logically, and the assumption of a global causal
order is simply replaced by logical consistency.
A non-causal circuit consists of gates that can be interconnected arbitrarily by wires, as long
as the circuit as a whole remains logically consistent. An example of a circuit that is overdetermined
and an example of a circuit that leads to the information antinomy (under-determined) are given
in Figure 2.

Figure 2. (a) Overdetermined circuit: The bit 0 is mapped to 1 and vice versa; i.e., there is no consistent
assignment of a value that travels on the wire. (b) Information antinomy: Both 0 and 1 could potentially
travel on the wire, yet the circuit does not specify which.

We model a gate G by a Markov matrix Ĝ with 0–1 entries. Without loss of generality, assume
that the input and output dimensions of a gate are equal. The Markov matrix of the ID gate on a single
bit (see Figure 2b) is

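\[ \mathbb{1} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} , \]

and the Markov matrix of the NOT gate on a single bit (see Figure 2a) is

\[ \hat{N} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} . \]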

Values are modeled by vectors; e.g., in a binary setting, the value 0 is represented by the
vector $(1, 0)^T$ and the value 1 is represented by the vector $(0, 1)^T$. In general, an n-dimensional
variable with value i is modeled by the n-dimensional vector i with a 1 at position i, and where all
other entries are 0. A gate is applied to a value via the matrix-vector multiplication; i.e., the output
of G on input a is x = Ĝa. Let F and G be two gates. The Markov matrix of the parallel composition
of both gates is F̂ ⊗ Ĝ. They are composed sequentially with a wire that takes the d-dimensional output
of F and forwards it as input to G. By this, we obtain a new gate H = G ◦ F which represents the
sequential composition. The sequentially composed gate is

\[ \hat{H} = \sum_{v=0}^{d-1} \hat{G} v v^T \hat{F} = \hat{G} \hat{F} . \]
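The composition rules can be tried out on small examples. The sketch below is a minimal illustration in plain Python, assuming column-stochastic 0–1 matrices (entry `G[r][c]` is 1 when input `c` is mapped to output `r`); the helper names `one_hot`, `apply`, `compose`, and `kron` are hypothetical and chosen here for readability.

```python
from typing import List

Matrix = List[List[int]]
Vector = List[int]


def one_hot(i: int, d: int) -> Vector:
    """Vector with a 1 at position i and 0 elsewhere."""
    return [1 if k == i else 0 for k in range(d)]


def apply(G: Matrix, a: Vector) -> Vector:
    """Matrix-vector product x = G a."""
    return [sum(row[k] * a[k] for k in range(len(a))) for row in G]


def compose(G: Matrix, F: Matrix) -> Matrix:
    """Sequential composition H = G F (F acts first, then G)."""
    d = len(F)
    return [[sum(G[r][v] * F[v][c] for v in range(d))
             for c in range(len(F[0]))] for r in range(len(G))]


def kron(F: Matrix, G: Matrix) -> Matrix:
    """Parallel composition as the Kronecker product of F and G."""
    return [[F[i][j] * G[k][l]
             for j in range(len(F[0])) for l in range(len(G[0]))]
            for i in range(len(F)) for k in range(len(G))]


ID = [[1, 0], [0, 1]]
NOT = [[0, 1], [1, 0]]

print(apply(NOT, one_hot(0, 2)))         # NOT applied to the value 0
print(compose(NOT, NOT) == ID)           # two NOT gates in sequence
print(apply(kron(ID, NOT), one_hot(2, 4)))  # ID on first bit, NOT on second
```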


By using these rules of composition, a causal circuit can always be modeled by a single gate.
A closed circuit is a circuit where all wires are connected to gates on both sides. Let H be the gate that
describes the composition of all gates for a given causal circuit. We can transform any such circuit into
a closed non-causal circuit by connecting all outputs from H with all inputs to H. A logically consistent
closed circuit is thus a circuit where a unique assignment of a value c to the looping wire exists:

\[ c = \hat{H} c \iff c^T \hat{H} c = 1 . \tag{1} \]

In other words, the described closed circuit is logically consistent if and only if the diagonal
of Ĥ consists of 0’s with a single 1. The position of the 1-entry represents the ﬁxed point and the
value c on the looping wire. Note that for a given closed circuit, the gate H is not unique, but might
depend on where the “cut” is introduced. An open circuit is a circuit where some wires are not
connected to a gate on one side. Thus, such a circuit has either an input a, an output x, or both.
A logically consistent open circuit, therefore, is a circuit where for any choice of input a, a unique
assignment of a value c to the looping wire and to the output x exists, such that

\[ (x \otimes c)^T \hat{H} (a \otimes c) = 1 , \]

where the second output from H is looped to the second input to H.
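Both consistency conditions can be tested by brute force on small instances. The following sketch assumes a deterministic closed circuit given as a column-stochastic 0–1 matrix, and an open circuit given as a Python function `h(a, c)` returning the pair `(x, c_out)`; the names `closed_is_consistent` and `solve_open` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def closed_is_consistent(H: List[List[int]]) -> bool:
    """True iff the diagonal of H consists of 0's with a single 1."""
    diag = [H[i][i] for i in range(len(H))]
    return sorted(diag) == [0] * (len(H) - 1) + [1]


def solve_open(h: Callable[[int, int], Tuple[int, int]],
               d_a: int, d_c: int) -> Dict[int, Tuple[int, int]]:
    """For every input a, find the unique (x, c) with h(a, c) == (x, c).

    Raises ValueError if some input admits no or several loop values.
    """
    solution = {}
    for a in range(d_a):
        candidates = []
        for c in range(d_c):
            x, c_out = h(a, c)
            if c_out == c:          # the looping wire must reproduce itself
                candidates.append((x, c))
        if len(candidates) != 1:
            raise ValueError(f"input {a}: {len(candidates)} consistent loop values")
        solution[a] = candidates[0]
    return solution


CONST0 = [[1, 1], [0, 0]]   # every input is mapped to 0
NOT = [[0, 1], [1, 0]]
print(closed_is_consistent(CONST0), closed_is_consistent(NOT))

# Open circuit that writes the input onto the loop and outputs the loop value.
print(solve_open(lambda a, c: (c, a), 2, 2))
```

The second example is consistent because the loop value is forced to equal the input; replacing it by `lambda a, c: (a, 1 - c)` makes `solve_open` raise, since the loop then has no fixed point.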

Let $c_a$ be the value on the looping wire of a logically consistent open circuit C with input a. We can
transform C into a family $\{C_i\}_{0 \le i < d}$ of logically consistent closed circuits such that the value on the
same looping wire of $C_i$ is $c_i$. The circuit $C_i$ is constructed by attaching the gate

\[ \hat{D}_i = \sum_{v=0}^{d-1} i \, v^T \]

to the input and output wires of C (see Figure 3a,b). The gate $D_i$ unconditionally outputs the value i.
There is an ambiguity on which wires are regarded as “looping”. We show that two different
representations H and H′ of the same closed non-causal circuit C yield the same computation
(the difference between H and H′ is the identification of the looping wires). Different H and H′
that represent the same non-causal circuit C can be written as H = Q ◦ R and H′ = R ◦ Q. For H,
the looping wires are those that exit Q and enter R, and for H′, vice versa. From Equation (1), we have

\[ \exists! c : c^T \hat{H} c = c^T \hat{Q} \hat{R} c = c^T \hat{Q} \sum_{e} e e^T \hat{R} c = 1 . \]

Since R is deterministic, the value of e is uniquely determined. Thus, we obtain

\[ \exists! c : c^T \hat{Q} e^* e^{*T} \hat{R} c = 1 , \]

where $e^*$ is the specific value on the wire exiting R and entering Q. Conversely,

\[ \begin{aligned} \exists! e' : e'^T \hat{H}' e' &= e'^T \hat{R} \hat{Q} e' \\ &= e'^T \hat{R} \sum_{c'} c' c'^T \hat{Q} e' \\ &= e'^T \hat{R} c'^* c'^{*T} \hat{Q} e' = 1 \end{aligned} \]

holds. The only way H and H′ each have a unique fixed point is with the identification $e^* = e'$.
Therefore, both representations H and H′ assign the same values to the wires. By the above translation
from open to closed circuits, we see that the same reasoning can be applied to open circuits.


Figure 3. (a) Open circuit C with input a. (b) Closed circuit $C_i$ with $a = i \to c_a = c_i$. (c) The big
box represents a non-causal comb (note that combs obey causality; the higher-order transformations
described here are equivalent to combs, yet where the causality assumption is dropped) that transforms
a gate (H′) to a new gate, the composition.

Above, we considered deterministic Markov processes. It is natural to extend this model
to probabilistic processes (i.e., stochastic matrices). The logical consistency condition in that
case, as studied in Ref. [15], is

\[ \operatorname{Tr} \hat{H} = 1 , \tag{2} \]
\[ \forall i, j : \hat{H}_{i,j} \ge 0 , \]

that is, the diagonal of $\hat{H}$ consists of non-negative numbers (probabilities) that add up to 1. Equation (2)
can be interpreted as “the average number of fixed points is 1”. To see this, we decompose H as a convex
combination of deterministic matrices

\[ \hat{H} = \sum_{i} p_i \hat{H}_i , \]

where for all i, $\hat{H}_i$ is deterministic. Then, Equation (2) states

\[ \operatorname{Tr} \hat{H} = \sum_{i} p_i \operatorname{Tr} \hat{H}_i = 1 . \]

For an arbitrary deterministic matrix D̂, the expression Tr D̂ represents the number of ﬁxed points,
with which we arrive at the stated interpretation.
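The interpretation of the trace as an average number of fixed points can be verified on a small mixture. The sketch below uses exact fractions and a hypothetical decomposition (weights 1/4, 1/4, 1/2 on the ID, NOT, and constant-0 gates) chosen here purely for illustration; the helper names `trace` and `mix` are assumptions.

```python
from fractions import Fraction
from typing import List, Sequence


def trace(M: Sequence[Sequence]) -> Fraction:
    """Sum of the diagonal entries of a square matrix."""
    return sum((Fraction(M[i][i]) for i in range(len(M))), Fraction(0))


def mix(weights: Sequence[Fraction], mats: Sequence[Sequence[Sequence[int]]]) -> List[List[Fraction]]:
    """Convex combination sum_i p_i H_i of equally sized matrices."""
    d = len(mats[0])
    return [[sum((w * M[r][c] for w, M in zip(weights, mats)), Fraction(0))
             for c in range(d)] for r in range(d)]


ID = [[1, 0], [0, 1]]       # two fixed points
NOT = [[0, 1], [1, 0]]      # no fixed point
CONST0 = [[1, 1], [0, 0]]   # one fixed point

weights = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
parts = [ID, NOT, CONST0]

H = mix(weights, parts)
average_fixed_points = sum(w * trace(M) for w, M in zip(weights, parts))

print(trace(H), average_fixed_points)
```

Although the ID and NOT gates are individually inconsistent, this particular mixture satisfies Equation (2) because their contributions average out.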
An open non-causal circuit can be represented by a non-causal comb [5] G which is a higher-order
transformation: G transforms the gate H′ to a new gate (see Figure 3c). The non-causal comb G,
for instance, could connect the output from H′ with the input of H′, as long as the composition remains
logically consistent.

4. Computational Advantage
The logical consistency requirement forces the value on a looping wire to be the unique fixed
point of the transformation. This can be exploited for finding fixed points of a black box, which yields
an advantage in higher-order computation. Suppose we are given a black box B that takes (produces)
a d-dimensional input (output) and has a unique fixed point x previously unknown to us. As a Markov
matrix, B is

\[ \hat{B} = \sum_{i=0}^{d-1} e_i \, i^T , \quad \text{with } |\{ i \mid e_i = i \}| = 1 . \]

Our task is to find the fixed point x in as few queries as possible. If we solve this task with a causal
circuit, then, in the worst case, d − 1 queries are needed. In contrast, with a non-causal circuit, a single
query suffices. The reason for this is that the black box is queried with the fixed point only. Any other
query would lead to a logical contradiction, and therefore does not occur. For that purpose, we just
connect the output of B with the input of B and use a second wire to read out the value (see Figure 4a).
This circuit is logically consistent because

\[ \forall a, \exists! c, x : (x \otimes c)^T \hat{C} (\mathbb{1} \otimes \hat{B})(a \otimes c) = (x \otimes c)^T \hat{C} (a \otimes \hat{B} c) = 1 , \]

where $\hat{C}$ is the CNOT gate and $\mathbb{1}$ is the identity. However, this construction only works if B has a unique
fixed point. Suppose $B_2$ has two fixed points. In that case, the circuit from Figure 4b can be used to find
both fixed points with two queries. In addition to short-cutting the black boxes, we need to introduce
a gate G that ensures a unique fixed point of the whole circuit. The gate G works in the following way:

\[ \hat{G} = \sum_{e,\; c - a < c' - b} (a \otimes b \otimes c \otimes c' \otimes 0)(a \otimes b \otimes c \otimes c' \otimes e)^T + \sum_{e,\; c - a \ge c' - b} (a \otimes b \otimes c \otimes c' \otimes \bar{e})(a \otimes b \otimes c \otimes c' \otimes e)^T , \]

where e is binary, $\bar{e} = e \oplus 1$, the addition is carried out modulo 2, and 0 is a 2-dimensional vector
representing the value 0. In words, if the value c on the upper wire is less than the value c′ on the
lower wire, and e is 0, then we get a fixed point on the third wire of G (variable e in Figure 4b).
Otherwise, the bit on the third wire gets flipped—no fixed point. This guarantees that all loops
together have a unique fixed point. Ironically, the gate G suppresses certain fixed points on the previous
loops by introducing a logical inconsistency at a later point in the circuit. This resembles “anthropic
computing” [14], where one guesses the solution to a problem and commits suicide if the guess
was wrong—a recipe to solve NP-complete problems in the relative-state interpretation of quantum
mechanics [16] and where consciousness follows only those branches where the programmer remains
alive. Such a construction can be used to find the fixed points of a black box with a few fixed points
and where the number of fixed points is known. For a large number n of fixed points (e.g., n = d/2),
we can use the probabilistic approach to non-causal circuits. Let $B_n$ be a black box with n fixed points
and input and output spaces of dimension d. The Markov matrix of $B_n$ is

\[ \hat{B}_n = \sum_{i=0}^{d-1} e_i \, i^T , \quad \text{with } |\{ i \mid e_i = i \}| = n . \]

We construct a randomized gate where the average number of fixed points is one:

\[ \hat{B} = \frac{1}{n} \hat{B}_n + \frac{n-1}{n} \hat{N} , \]

with

\[ \hat{N} = \sum_{i=0}^{d-1} \bar{\imath} \, i^T , \qquad \bar{\imath} = i \oplus 1 . \]


The gate $\hat{N}$ can be understood as a d-dimensional generalization of the NOT gate for bits:
The input is increased by one modulo d. Such an $\hat{N}$ has no fixed points. The mixture $\hat{B}$ is logically
consistent, because

\[ \operatorname{Tr}\left( \frac{1}{n} \hat{B}_n + \frac{n-1}{n} \hat{N} \right) = \frac{1}{n} \operatorname{Tr} \hat{B}_n + \frac{n-1}{n} \operatorname{Tr} \hat{N} = 1 . \]

This means that we can use the circuit from Figure 4a to find a random fixed point of $B_n$.
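To see why the fixed point found this way is random, one can compute the diagonal of the randomized gate, whose entries are the probabilities of the values on the looping wire. The sketch below is a hedged illustration: the function names `matrix_of` and `randomized_gate_diagonal` and the example box are assumptions made here, and the dimension is assumed to satisfy d ≥ 2 so that the shift gate has no fixed point.

```python
from fractions import Fraction
from typing import Callable, List


def matrix_of(f: Callable[[int], int], d: int) -> List[List[int]]:
    """Column-stochastic 0-1 matrix of a function on {0, ..., d-1}."""
    return [[1 if f(c) == r else 0 for c in range(d)] for r in range(d)]


def randomized_gate_diagonal(box: Callable[[int], int], d: int) -> List[Fraction]:
    """Diagonal of (1/n) B_n + ((n-1)/n) N, where n counts the fixed points of box."""
    B = matrix_of(box, d)
    N = matrix_of(lambda i: (i + 1) % d, d)   # shift by one modulo d
    n = sum(B[i][i] for i in range(d))
    if n == 0:
        raise ValueError("the box has no fixed point")
    return [Fraction(1, n) * B[i][i] + Fraction(n - 1, n) * N[i][i]
            for i in range(d)]


# Hypothetical box on d = 4 values with the two fixed points 1 and 3.
table = {0: 1, 1: 1, 2: 3, 3: 3}
diagonal = randomized_gate_diagonal(lambda i: table[i], 4)
print(diagonal, sum(diagonal))
```

Each fixed point of the box receives weight 1/n and all other values receive weight 0, so the diagonal is the uniform distribution over the fixed points.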

Figure 4. Fixed point search for a black box with one and a black box with two fixed points. (a) The
output x is the fixed point c added to the input a. (b) Circuit for finding a fixed point for a black box
with two fixed points.
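The behaviour of the circuit in Figure 4a can be mimicked classically by enumerating the loop values that are logically consistent. Note that this simulation queries the box d times and therefore only illustrates the consistency condition, not the query advantage. The function name `run_loop_circuit` is hypothetical, and for d > 2 the CNOT gate is assumed to generalize to addition modulo d.

```python
from typing import Callable


def run_loop_circuit(box: Callable[[int], int], d: int, a: int) -> int:
    """Return the output x of the single-loop circuit for input a.

    The loop value c must satisfy box(c) == c; the read-out wire
    carries x = (a + c) mod d.
    """
    consistent = [c for c in range(d) if box(c) == c]
    if len(consistent) != 1:
        raise ValueError(f"{len(consistent)} fixed points: circuit is not logically consistent")
    c = consistent[0]
    return (a + c) % d


# Hypothetical black box on d = 4 values with the unique fixed point 1.
table = {0: 2, 1: 1, 2: 0, 3: 2}
box = lambda c: table[c]

# With input a = 0 the output equals the fixed point itself.
print(run_loop_circuit(box, 4, 0))
```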

We apply these tools to find solutions to instances of search problems with a known number
of solutions, and where a guess for a solution can be verified efficiently by a verifier V. In other words,
we can find solutions to NP search problems, yet where the number of solutions to an instance must
be known to us in advance. Note that the following construction does not solve a decision problem,
but rather finds the solution. Suppose an instance I to a problem Π has a unique solution. We replace
the gate B of Figure 4a with a new gate V′ that acts in the following way: it takes a guess c for a solution
to Π(I) as input, and runs V to verify c. If V accepts c, then V′ outputs c, and otherwise, V′ outputs c ⊕ 1,
where the addition is carried out modulo d. Such a circuit has a unique fixed point c which equals the
solution of Π(I). This, for instance, could be applied to a SAT formula for which a unique assignment
of values to variables exists that makes the formula true. Note that this approach does not prove
an advantage in finding satisfying assignments for SAT formulas, even if the number of these satisfying
assignments is previously known; currently, we do not know how difficult or easy it is to solve such
instances causally.
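The gate built from the verifier can be sketched for a small SAT instance. The encoding below is an assumption made for illustration: clauses are lists of non-zero integers (literal k refers to variable |k|, negative meaning negated), and a guess c is read as a bit string with variable 1 in the least significant bit. The names `verifier` and `make_gate` are hypothetical, and the fixed points are listed by brute force rather than by a non-causal circuit.

```python
from typing import Callable, List

Clause = List[int]


def verifier(clauses: List[Clause], bits: List[int]) -> bool:
    """Efficient check that the assignment bits satisfies every clause."""
    return all(any((bits[abs(lit) - 1] == 1) == (lit > 0) for lit in clause)
               for clause in clauses)


def make_gate(clauses: List[Clause], n_vars: int) -> Callable[[int], int]:
    """Gate that outputs c if the verifier accepts c, and c + 1 mod d otherwise."""
    d = 2 ** n_vars

    def gate(c: int) -> int:
        bits = [(c >> k) & 1 for k in range(n_vars)]
        return c if verifier(clauses, bits) else (c + 1) % d

    return gate


# (x1) AND (NOT x2) AND (x1 OR x2): the only model is x1 = 1, x2 = 0.
clauses = [[1], [-2], [1, 2]]
gate = make_gate(clauses, 2)
fixed = [c for c in range(4) if gate(c) == c]
print(fixed)
```

The unique fixed point encodes the unique satisfying assignment, which is the value that a logically consistent loop around this gate would have to carry.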

5. Other Non-Causal Computational Models

We brieﬂy discuss non-causal Turing machines and non-causal billiard computers. A Turing
machine T has a tape, a read/write head, and an internal state machine. After every read instruction,
the state machine moves to the next internal state, and thereby decides what to write and where
to move the head to. A non-causal Turing machine is a machine where parts of the tape are not “within
time”: “Future” (from the head’s point of view) write instructions inﬂuence “past” read instructions.
A symbol that is written at time t to position j could be read at time t′ < t from position j; i.e., symbols
can be read “before” they are written. As with other self-referential systems, this leads to problems
that can be solved if we enforce the condition of logical consistency, as discussed above. Another issue
is that multiple write instructions could overwrite the value on position j. This leaves open the question
of what value is read. We can overcome this issue by running the Turing machine in a reversible
fashion and by generating a history tape [17], where no memory position gets overwritten. An example
of a non-causal Turing machine is where the history tape is non-causal in the sense that symbols can
be read “before” they are written.
The billiard computer is a model of computation on a billiard table [1]. Before the computation
starts, obstacles are placed on the table in such a way that the induced reﬂections of the balls and
the collisions among the balls result in the desired computation. A non-causal version of a billiard
computer is a billiard table where the holes are connected with closed timelike curves (CTCs) [2] that
are logically consistent. Now, a billiard ball could also collide with its younger self; this introduces
a non-causal effect. Echeverria, Klinkhammer, and Thorne [2] showed that solutions to CTC-dynamics
that are not overdetermined exist. However, all solutions that they found are underdetermined.
The non-causal circuits presented in this work indicate that logically consistent non-causal billiard
computers are also admissible.

6. Conclusions and Open Questions

We show that models of computation where parts of the output of a computation are (re)used
as input to the same computation are logically possible. Furthermore, such a model of computation
helps to solve certain tasks more efficiently. The question is how much more powerful this new
model of computation is, and whether uncomputable tasks become computable when compared
to the standard circuit model. A strong restriction of the model is that before one can find a fixed
point, one needs to know the number of fixed points. For instance, if we want to find a satisfying
assignment for a SAT formula F with variables $x_0, x_1, \ldots$, we first need to know the number of satisfying
assignments; otherwise, we do not know how to construct the circuit. Ironically, this means that
to solve a SAT problem without any promise, we first need to solve a problem that is believed
to be much harder: a #SAT problem. One might want to apply the Valiant–Vazirani [18] method
to $F' = F \lor (x_0 \land x_1 \land \ldots)$ to reduce the number of satisfying assignments to 1 (the reason why
we modify F to F′ is to guarantee satisfiability). The problem that we are left with is that we do not
know whether the output F″ of the Valiant–Vazirani method has a unique satisfying assignment or
not, because the reduction is probabilistic. Therefore, we cannot plug F″ into a circuit like the one shown in
Figure 4a to find the fixed point.
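The effect of adding the all-true clause to a formula can be checked by brute-force counting on toy formulas. The sketch below covers only this preprocessing step and does not implement the Valiant–Vazirani reduction itself; formulas are modeled as Python predicates on bit tuples, and the names `count_models` and `add_all_true` are hypothetical.

```python
from itertools import product
from typing import Callable, Tuple

Formula = Callable[[Tuple[int, ...]], bool]


def count_models(formula: Formula, n_vars: int) -> int:
    """Number of satisfying assignments, by exhaustive enumeration."""
    return sum(1 for bits in product((0, 1), repeat=n_vars) if formula(bits))


def add_all_true(formula: Formula) -> Formula:
    """Return the disjunction of formula with (x0 AND x1 AND ...)."""
    return lambda bits: bool(formula(bits)) or all(bits)


unsat = lambda bits: bool(bits[0]) and not bits[0]   # x0 AND NOT x0
sat = lambda bits: bits[0] == 1                      # x0

print(count_models(unsat, 2), count_models(add_all_true(unsat), 2))
print(count_models(sat, 2), count_models(add_all_true(sat), 2))
```

An unsatisfiable formula gains exactly one model (all variables true), while a formula that the all-true assignment already satisfies keeps its model count.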
A model of computation similar to but more general than ours is based on Deutsch’s [19] CTCs.
Aaronson and Watrous [20] showed that the classical special case of Deutsch’s model can solve
problems in PSPACE efficiently. However, in Deutsch’s model, in contrast to ours, the information
antinomy arises. Deutsch mitigates this issue by defining that the value on the looping wire is
the uniform mixture of all solutions. This introduces a non-linearity into Deutsch’s model: the
output of a circuit depends non-linearly on the input. A consequence of this is that—in the quantum
version—quantum states can be cloned [21]. As it is linear, the model studied here is not exposed
to such consequences.
Acknowledgments: We thank Mateus Araújo, Veronika Baumann, Cyril Branciard, Časlav Brukner, Fabio Costa,
Paul Erker, Adrien Feix, Arne Hansen, Alberto Montina, Christopher Portmann, and Benno Salwey for helpful
discussions. This work was supported by the Swiss National Science Foundation (SNF), the National Centre
of Competence in Research “Quantum Science and Technology” (QSIT) and the COST action on Fundamental
Problems in Quantum Physics.
Author Contributions: Ämin Baumeler carried out this research and wrote this article, which is also part of his
Ph.D., based on discussions with Stefan Wolf. Both authors have read and approved the final manuscript.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Fredkin, E.; Toffoli, T. Conservative logic. Int. J. Theor. Phys. 1982, 21, 219–253.
2. Echeverria, F.; Klinkhammer, G.; Thorne, K.S. Billiard balls in wormhole spacetimes with closed timelike
curves: Classical theory. Phys. Rev. D 1991, 44, 1077–1099.


3. Chiribella, G. Perfect discrimination of no-signalling channels via quantum superposition of causal structures.
Phys. Rev. A 2012, 86, 040301.
4. Colnaghi, T.; D’Ariano, G.M.; Facchini, S.; Perinotti, P. Quantum computation with programmable
connections between gates. Phys. Lett. A 2012, 376, 2940–2943.
5. Chiribella, G.; D’Ariano, G.M.; Perinotti, P.; Valiron, B. Quantum computations without deﬁnite causal
structure. Phys. Rev. A 2013, 88, 022318.
6. Araújo, M.; Costa, F.; Brukner, Č. Computational Advantage from Quantum-Controlled Ordering of Gates.
Phys. Rev. Lett. 2014, 113, 250402.
7. Procopio, L.M.; Moqanaki, A.; Araújo, M.; Costa, F.; Alonso Calafell, I.; Dowd, E.G.; Hamel, D.R.;
Rozema, L.A.; Brukner, Č.; Walther, P. Experimental superposition of orders of quantum gates. Nat. Commun.
2015, 6, 7913.
8. Oreshkov, O.; Costa, F.; Brukner, Č. Quantum correlations with no causal order. Nat. Commun. 2012, 3, 1092.
9. Baumeler, Ä.; Feix, A.; Wolf, S. Maximal incompatibility of locally classical behavior and global causal order
in multiparty scenarios. Phys. Rev. A 2014, 90, 042106.
10. Baumeler, Ä.; Wolf, S. The space of logically consistent classical processes without causal order. New J. Phys.
2016, 18, 013036.
11. Branciard, C.; Araújo, M.; Feix, A.; Costa, F.; Brukner, Č. The simplest causal inequalities and their violation.
New J. Phys. 2016, 18, 013008.
12. Feix, A.; Araújo, M.; Brukner, Č. Quantum superposition of the order of parties as a communication
resource. Phys. Rev. A 2015, 92, 052326.
13. Tegmark, M. The Interpretation of Quantum Mechanics: Many Worlds or Many Words? Fortschr. Phys.
1998, 46, 855–862.
14. Aaronson, S. Guest Column: NP-complete problems and physical reality. ACM SIGACT News 2005, 36, 30–52.
15. Baumeler, Ä.; Wolf, S. Device-independent test of causal order and relations to ﬁxed-points. New J. Phys.
2016, 18, 035014.
16. Everett, H. “Relative State” Formulation of Quantum Mechanics. Rev. Mod. Phys. 1957, 29, 454–462.
17. Bennett, C.H. Logical Reversibility of Computation. IBM J. Res. Dev. 1973, 17, 525–532.
18. Valiant, L.G.; Vazirani, V.V. NP is as easy as detecting unique solutions. Theor. Comput. Sci. 1986, 47, 85–93.
19. Deutsch, D. Quantum mechanics near closed timelike lines. Phys. Rev. D 1991, 44, 3197–3217.
20. Aaronson, S.; Watrous, J. Closed timelike curves make quantum and classical computing equivalent.
Proc. R. Soc. A Math. Phys. Eng. Sci. 2009, 465, 631–647.
21. Brun, T.A.; Wilde, M.M.; Winter, A. Quantum State Cloning Using Deutschian Closed Timelike Curves.
Phys. Rev. Lett. 2013, 111, 190401.

entropy
Article
The Many Classical Faces of Quantum Structures
Chris Heunen
School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9AB, UK;
[email protected]; Tel.: +44-131-650-5132

Academic Editors: Giacomo Mauro D’Ariano, Paolo Perinotti, Jay Lawrence and Giorgio Kaniadakis
Received: 9 January 2017; Accepted: 23 March 2017; Published: 29 March 2017

Abstract: Interpretational problems with quantum mechanics can be phrased precisely by only
talking about empirically accessible information. This prompts a mathematical reformulation of
quantum mechanics in terms of classical mechanics. We survey this programme in terms of algebraic
quantum theory.

Keywords: algebraic quantum theory; C*-algebra; Gelfand duality; classical context; Bohrification

1. Introduction
The mathematical formalism of quantum mechanics is open to interpretation. For example,
the possibility of deterministic hidden variables, the uncertainty principle, the measurement problem,
and the reality of the wave function, are all up for debate. (The ﬁrst and the last of course
have rigorous restrictions: hidden variables by the Bell inequalities [1] and the Kochen–Specker
theorem [2], discussed below, and reality of the wave function by the Pusey–Barrett–Rudolph
theorem [3].) Classical mechanics shares none of those interpretational questions. This article surveys
a mathematical reformulation of quantum mechanics in terms of classical mechanics, intended to
bring the interpretational issues with the former to a head. This programme proposes to replace the
usual notion of state space of a quantum-mechanical system by a new one, in a way that avoids the
interpretational questions above and leaves classical systems unaffected:

• known obstructions to hidden variable interpretations merely say that states cannot be located
with exact precision in the state space, and are circumvented via open regions of states;
• the uncertainty principle cannot be expressed and therefore poses no interpretational problem;
• the measurement problem is obviated because the new notion of state space incorporates all
classical data resulting from possible measurements.

If we also take dynamics into account, the new notion of conﬁguration space, called an active lattice:

• yields the same predictions as traditional quantum mechanics.

This programme branches into a number of related themes, spread over the literature; see the extensive
bibliography. The aim of this article is to bring all these active developments together to give an
overview. There are hardly any new results. Instead, the novelty lies in rephrasing foundations to
give an accessible, coherent, and complete overview of the current state-of-the-art. To do so, we will
have to be rather brief and refer to references for many technical details. Nevertheless, there is a
novel contribution regarding topological structure of the new notion of conﬁguration space. We will
use an n-level physical system as a running example to illustrate new notions (though many results
have exceptions for n ≤ 2, and most interesting features occur in inﬁnite dimension). The rest of this
introduction summarizes the framework and discusses four salient features, before giving an overview
of the rest of this article.

Entropy 2017, 19, 144; doi:10.3390/e19040144; www.mdpi.com/journal/entropy


1.1. Algebraic Quantum Theory

The traditional formalism of quantum theory holds that the (pure) state space is a Hilbert space H,
that (sharp) observables correspond to self-adjoint operators on that Hilbert space, and that (undisturbed)
evolution corresponds to unitary operators. Algebraic quantum theory instead takes the observables
as primitive, and the state space is a derived notion. Self-adjoint operators combine with unitaries
to give all bounded operators, and these form a so-called C*-algebra B( H ). However, superselection
rules mandate that not all self-adjoint operators correspond to valid observables. Thus, one considers
arbitrary C*-algebras, rather than only those of the form B( H ). Nevertheless, it turns out that any
C*-algebra A embeds into B( H ) for some Hilbert space H, and in that sense C*-algebra theory faithfully
captures quantum theory. Finally, one could impose extra conditions on a C*-algebra, leading to
so-called AW*-algebras, and W*-algebras, also known as von Neumann algebras. A good example to
keep in mind is the algebra $M_n(\mathbb{C})$ of n-by-n complex matrices, that models (the observables of) an
n-level system, or direct sums $M_{n_1}(\mathbb{C}) \oplus \cdots \oplus M_{n_k}(\mathbb{C})$.
To pass from pure to mixed states (density matrices), from sharp to unsharp observables (positive
operator valued measurements), and from undisturbed evolution to including measurement (quantum
channels), the traditional formalism prescribes completely positive maps. These ﬁnd their natural
home in the algebraic formulation. States of a C*-algebra A can then be recovered as unital (completely)
positive maps $A \to \mathbb{C}$. Observables with n outcomes are unital (completely) positive maps $\mathbb{C}^n \to A$;
sharp observables correspond to homomorphisms. Evolution is described by a completely positive
map $A \to A$; undisturbed evolution corresponds to a homomorphism. Indeed, if $A = M_n(\mathbb{C})$, then
states $A \to \mathbb{C}$ are precisely density matrices; observables $\mathbb{C}^n \to A$ are precisely positive operator
valued measurements with n outcomes; completely positive maps $A \to A$ are precisely those that map
density matrices to density matrices; and homomorphisms $A \to A$ are precisely the linear functions
that map pure states to pure states.
For more information on algebraic quantum theory, see [4–13].

1.2. Gelfand Duality

The advantage of algebraic quantum theory is that it places quantum mechanics on the same
footing as classical mechanics. The (pure) state space in classical mechanics can be any locally compact
Hausdorff topological space X, (sharp) observables are continuous functions $X \to \mathbb{R}$, and evolution is
given by homeomorphisms $X \to X$. This leads to the C*-algebra $C_0(X)$ of continuous complex-valued
functions on X vanishing at infinity; for compact X, we write $C(X)$. A simple example is the algebra
$\mathbb{C}^n$, where X is a discrete space with n points. Indeed, in that case there are n (pure) states; (sharp)
observables are precisely vectors in $\mathbb{R}^n$; and (deterministic) evolutions are just functions $n \to n$.
Again, we can pass from classical mechanics to the probabilistic setting of statistical mechanics
by considering completely positive maps. States of $C(X)$ can be recovered as unital (completely)
positive maps $C(X) \to \mathbb{C}$ as before; pure states $x \in X$ correspond to homomorphisms. Observables
with m outcomes are (completely) positive maps $\mathbb{C}^m \to C(X)$, and sharp observables correspond to
homomorphisms. Stochastic evolution is described by a (completely) positive map $C(X) \to C(X)$;
deterministic evolution corresponds to a homomorphism. Indeed, for X, the discrete space with n
points, states $C(X) \to \mathbb{C}$ are precisely probability distributions on n points; observables $\mathbb{C}^m \to C(X)$
with m outcomes are precisely m-tuples of probability distributions on n points summing to one; sharp
observables $\mathbb{C}^m \to C(X)$ are just functions $m \to n$; and evolutions $C(X) \to C(X)$ are simply stochastic
n-by-n matrices.
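For the discrete space with n points, the relation between evolving states and evolving observables can be made concrete. The sketch below is a minimal illustration under assumptions made here: a stochastic evolution is stored as a column-stochastic matrix `T` (entry `T[i][j]` is the probability of moving from point j to point i), and the function names are hypothetical.

```python
from fractions import Fraction
from typing import List

F = Fraction


def evolve_state(T: List[List[Fraction]], p: List[Fraction]) -> List[Fraction]:
    """Push a probability distribution forward: (T p)_i = sum_j T[i][j] p[j]."""
    return [sum((T[i][j] * p[j] for j in range(len(p))), F(0)) for i in range(len(T))]


def evolve_observable(T: List[List[Fraction]], f: List[Fraction]) -> List[Fraction]:
    """Pull an observable back: (f T)_j = sum_i f[i] T[i][j]."""
    return [sum((f[i] * T[i][j] for i in range(len(f))), F(0)) for j in range(len(T[0]))]


def expectation(p: List[Fraction], f: List[Fraction]) -> Fraction:
    """Expected value of the observable f in the state p."""
    return sum((pi * fi for pi, fi in zip(p, f)), F(0))


# Hypothetical 3-point system: point 0 moves to 0 or 1 with equal probability.
T = [[F(1, 2), F(0), F(0)],
     [F(1, 2), F(1), F(0)],
     [F(0),    F(0), F(1)]]
p = [F(1, 2), F(1, 4), F(1, 4)]   # a state: probability distribution on 3 points
f = [F(3), F(-1), F(2)]           # a sharp observable: a vector in R^3

print(expectation(evolve_state(T, p), f) == expectation(p, evolve_observable(T, f)))
```

Both pictures give the same expectation values, which is why the evolution may be described either on states or, as in the algebraic formulation, on the algebra of observables.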
Note that multiplication in C(X) is commutative, whereas B( H ) was noncommutative. Gelfand duality
says that any commutative C*-algebra C is of the form $C(X)$ for some compact Hausdorff space
X, called its spectrum and written as Spec(C). That is, $C \cong C(\mathrm{Spec}(C))$ and $X \cong \mathrm{Spec}(C(X))$.
Moreover, this gives a dual equivalence of categories: if f : X → Y is a continuous function then
C ( f ) : C (Y ) → C ( X ) is a homomorphism, and conversely, if f : C → D is a homomorphism, then
$\mathrm{Spec}(f) : \mathrm{Spec}(D) \to \mathrm{Spec}(C)$ is a continuous function. Thus, C*-algebra theory is often regarded as
noncommutative topology. In the case of a discrete space X with n points, this simply says that up to
isomorphism $\mathbb{C}^n$ is the only commutative C*-algebra of dimension n, and that functions $n \to n$ are the
only way to describe deterministic evolutions.
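In the finite discrete case, the contravariant direction of Gelfand duality amounts to precomposition. The following sketch assumes that a function f from an n-point space X to an m-point space Y is stored as a list of indices and that elements of the algebras are lists of complex numbers; the names `pullback` and `pointwise` are hypothetical.

```python
from typing import List


def pullback(f: List[int], g: List[complex]) -> List[complex]:
    """C(f): C(Y) -> C(X), sending g to g composed with f."""
    return [g[y] for y in f]


def pointwise(g: List[complex], h: List[complex]) -> List[complex]:
    """Multiplication in the commutative algebra of functions."""
    return [a * b for a, b in zip(g, h)]


f = [0, 0, 1]            # X has 3 points, Y has 2 points
g = [2 + 1j, -1j]        # elements of C(Y) = C^2
h = [3 + 0j, 1 - 2j]

# C(f) preserves products, the unit, and complex conjugation.
print(pullback(f, pointwise(g, h)) == pointwise(pullback(f, g), pullback(f, h)))
print(pullback(f, [1, 1]) == [1, 1, 1])
print(pullback(f, [z.conjugate() for z in g]) == [z.conjugate() for z in pullback(f, g)])
```

The map goes from functions on Y to functions on X, that is, in the direction opposite to f, which is the "dual" in dual equivalence.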
For more information, we refer to [14–17] in addition to references above.

1.3. Bohr’s Doctrine of Classical Concepts

To summarize, both classical systems and quantum systems are ﬁrst-class citizens that can interact
in the algebraic framework. Classical systems are commutative algebras C, and quantum systems are
noncommutative ones A. An example interaction is measurement, given by maps $C \to A$. For n-level
systems, a measurement with m outcomes is a map $\mathbb{C}^m \to M_n(\mathbb{C})$. Having no superfluous outcomes
in Spec(C) of the measurement corresponds to the injectivity of these maps. So the information that all
possible measurements can give us about a possibly noncommutative algebra A is its collection $\mathcal{C}(A)$
of commutative subalgebras C. In other words, all empirically accessible information in a quantum
system is encoded in its family of classical subsystems. This observation is known as the doctrine of
classical concepts and dates back to Bohr [18,19]. For an n-level system $A = M_n(\mathbb{C})$, elements of $\mathcal{C}(A)$
indeed correspond to all possible measurement setups: the ways of choosing an orthonormal basis of
$\mathbb{C}^n$ and a partition of an n-element set with m equivalence classes for outcomes.
The main aim of this paper is to survey what can be said about the quantum structure A based on
its many classical faces C( A), explaining the title.

1.4. The Kadison–Singer Problem

A case in point is the long-standing but recently solved Kadison–Singer problem [20,21].
In a noncommutative C*-algebra, not all observables are compatible, in the sense that they can
be measured simultaneously (without uncertainty). What can at most be measured in an experiment
are those observables in a single commutative subalgebra. The best an experimenter can do is repeat
the experiment to determine the values of those observables, giving a pure state of that commutative
subalgebra. Ideally, this tomography procedure should determine the state of the entire system.
Indeed, there are various protocols for performing such tomography on n-level systems that have been
experimentally veriﬁed [22].
The Kadison–Singer result says that this procedure indeed works in the discrete case. Let H be a
Hilbert space of countable dimension. Then B(H) has a discrete maximal commutative subalgebra
$\ell^\infty(\mathbb{N})$ consisting of operators that are diagonal in a fixed basis. The precise result is that a pure state of
$\ell^\infty(\mathbb{N})$ extends uniquely to a pure state of B(H). Thus, (the state of) a quantum system is characterized
by what we can learn about it from experiments, giving a positive outlook on Bohr’s doctrine of
classical concepts.

1.5. The Kochen–Specker Theorem

Nevertheless, Bohr’s doctrine of classical concepts should be interpreted carefully. It does not
say that collections of states of each classical subsystem assemble to a state of the quantum system.
That is ruled out by the Kochen–Specker theorem. In physical terms, local deterministic hidden
variables are impossible; one cannot assign deﬁnite values to all observables of a quantum system
in a noncontextual way, i.e., giving coherent states on classical subsystems. In mathematical terms,
Gelfand duality does not extend to noncommutative algebras via C( A); this will be discussed in more
detail in Section 2. More precisely, the zero map is the only function $M_n(\mathbb{C}) \to C(X)$ that restricts to
homomorphisms $C \to C(X)$ for each $C \in \mathcal{C}(M_n(\mathbb{C}))$ when n ≥ 3. That is, there is no way to assign
measurement outcomes in $\mathbb{R}^m$ to all possible positive operator valued measures on an n-level system
with m outcomes in a consistent way. This extends to more general noncommutative A that do not
contain a subalgebra $M_2(\mathbb{C})$. See [2,11,23].


1.6. Overview of This Article

Section 2 continues in more depth the discussion of the structure of quantum systems from the
perspective just sketched. In particular, it covers exactly how much of A can be reconstructed from
C( A), and makes precise the link between the Kochen–Specker theorem and noncommutative Gelfand
duality. Section 3 shows how to interpret a quantum system A as a classical system via C( A) by
changing the rules of the ambient set theory, and discusses the surrounding interesting interpretational
issues. Section 4 considers fine-graining. Increasing chains of classical subsystems give more and more
information about the quantum system. We discuss C( A) from this information-theoretic point of
view, called domain theory. Section 5 explains how to incorporate dynamics into C( A), turning it into a
so-called active lattice. It turns out that this extra information does make C( A) into a full invariant, from
which one can reconstruct A. This raises interesting interpretational questions: its active lattice can be
regarded as a configuration space that completely determines a quantum system. By encoding more
than static hidden variables, it circumvents the obstructions of Section 2. To obtain an equivalence for
quantum systems like Gelfand duality did for classical ones, it thus suffices to characterize the active
lattices arising this way. This is examined in Section 6. Finally, Section 7 considers to what extent the
successes of the doctrine of classical concepts in the previous sections are due to the use of algebraic
quantum theory, and to what extent they generalize to other formulations.

2. Invariants
Bohr’s doctrine of classical concepts teaches that a quantum system can only be empirically
understood through its classical subsystems. These classical subsystems should therefore contain all
the physically relevant information about the quantum system.

Definition 1. For a unital C*-algebra A, write C(A) for its family of commutative unital C*-subalgebras C
(with the same unit as A). We may think of it either as a partially ordered set under inclusion, or as a diagram that
remembers that the points of the partially ordered set are C*-algebras C.

For example, the partially ordered set C(A) of a 2-level system A = M2(C) has Hasse diagram

[Figure: Hasse diagram with a row of points • • • ··· on the upper level, each lying above a single bottom point]

with a point on the upper level for each unitary in U(2).

The question is then: how does the mathematical formalism of the quantum theory of A translate
into terms of C( A)? For example, it turns out that the entropy of a state of A can be reconstructed
from the entropies of its restrictions to the elements of C(A) [24], see also [25]. Ideally, we would like to completely
reconstruct A from C(A). A priori, C(A) is merely an invariant of A. This section investigates how
strong an invariant it is. The first step is to realize that, from C(A), we can reconstruct A as a set,
as well as operations between commuting elements. This can be made precise by the notion of a
piecewise C*-algebra, which is basically a C*-algebra that forgot how to add or multiply noncommuting
operators.
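The entropy statement above can be illustrated numerically for an n-level system. The following sketch does not reproduce the construction of [24]. It only checks a standard fact in the same spirit: the von Neumann entropy of a density matrix is the smallest Shannon entropy among its restrictions to maximal commutative subalgebras, each of which is given by an orthonormal basis. All function and variable names are illustrative assumptions.

```python
import numpy as np

def shannon_entropy(probabilities):
    """Shannon entropy (natural logarithm) of a probability vector."""
    p = np.asarray(probabilities, dtype=float)
    p = p[p > 1e-12]  # treat 0 log 0 as 0
    return float(-np.sum(p * np.log(p)))

def von_neumann_entropy(rho):
    """Entropy of a density matrix, computed from its eigenvalues."""
    return shannon_entropy(np.linalg.eigvalsh(rho))

def restricted_entropy(rho, basis):
    """Entropy of rho restricted to the context that is diagonal in `basis`.

    The columns b_j of the unitary `basis` span the minimal projections
    of the commutative subalgebra; the restricted state has weights <b_j|rho|b_j>.
    """
    weights = np.real(np.einsum("ij,ik,kj->j", basis.conj(), rho, basis))
    return shannon_entropy(weights)

rng = np.random.default_rng(0)
n = 3

# A random density matrix: positive semidefinite with unit trace.
g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
rho = g @ g.conj().T
rho = rho / np.trace(rho).real

# The context given by the eigenbasis of rho attains the von Neumann entropy.
_, eigvecs = np.linalg.eigh(rho)

# Randomly sampled contexts can only give larger (or equal) entropies.
sampled = []
for _ in range(200):
    m = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    basis, _ = np.linalg.qr(m)  # the columns form an orthonormal basis
    sampled.append(restricted_entropy(rho, basis))

print(von_neumann_entropy(rho), restricted_entropy(rho, eigvecs), min(sampled))
```

Sampling contexts at random only approximates the minimum over all of C(A); the eigenbasis context is what realizes it exactly.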

Definition 2. A piecewise C*-algebra consists of a set A with

• a reflexive and symmetric binary (commeasurability) relation ⊙ ⊆ A × A;
• elements 0, 1 ∈ A;
• a (total) involution ∗ : A → A;
• a (total) function · : C × A → A;
• a (total) function ‖−‖ : A → R;
• (partial) binary operations +, · : ⊙ → A;

such that every set S ⊆ A of pairwise commeasurable elements is contained in a set T ⊆ A of pairwise
commeasurable elements that forms a commutative C*-algebra under the above operations.

Of course, any commutative C*-algebra is a piecewise C*-algebra. More generally, the normal
elements (those commuting with their own adjoint) of any C*-algebra A form a piecewise C*-algebra.
For an n-level system A = Mn (C), the piecewise C*-algebra consists of all normal n-by-n matrices,
together with their norms and adjoints, as well as the knowledge of how commuting elements
add and multiply. Notice that C( A) makes perfect sense for any piecewise C*-algebra A. To make
precise how we can reconstruct the piecewise structure of A from C( A), we will use the language of
category theory [26]. C*-algebras, with ∗-homomorphisms between them, form a category. We can also
make piecewise C*-algebras into a category with the following arrows: (total) functions f : A → B that
preserve commeasurability and the algebraic operations, whenever defined.
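For the n-level system, the partial nature of addition in the piecewise C*-algebra of normal matrices can be checked numerically. The sketch below assumes that commeasurability means commuting, as in the description above; the helper names and the particular matrices are illustrative choices, not taken from the cited references.

```python
import numpy as np

def is_normal(m):
    """A matrix is normal when it commutes with its own adjoint."""
    return np.allclose(m @ m.conj().T, m.conj().T @ m)

def commeasurable(a, b):
    """Assumed commeasurability relation on normal matrices: a and b commute."""
    return np.allclose(a @ b, b @ a)

a = np.diag([1.0, 0.0]).astype(complex)          # self-adjoint, hence normal
b = 1j * np.array([[0, 1], [1, 0]], dtype=complex)  # skew-adjoint, hence normal
c = np.diag([1j, 3.0])                            # diagonal, hence normal

# Commuting normal matrices stay inside the piecewise C*-algebra.
print(commeasurable(a, c), is_normal(a + c), is_normal(a @ c))

# Noncommuting normal matrices may leave it: a + b is not normal.
print(commeasurable(a, b), is_normal(a + b))
```

This is why the sum and product are only required to be defined on commeasurable pairs.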
The precise notion we need is that of a colimit. Suffice it to say here that a colimit, when it exists, is a
universal solution that compatibly pastes together a given diagram into a single object. Thinking of A
as the whole and C( A) as its parts, we would like to know whether the whole is determined by the
parts. The following theorem says that C( A) indeed contains enough information to reconstruct A as a
piecewise C*-algebra.

Theorem 1 ([27]). Every piecewise C*-algebra is the colimit of its commutative C*-subalgebras in the category
of piecewise C*-algebras.

This means that the diagram C( A) determines the piecewise C*-algebra A: if C( A) and C( B) are
isomorphic diagrams, then A and B are isomorphic piecewise C*-algebras. Moreover, the previous
theorem gives a concrete way to reconstruct A from C( A). For the n-level system A = Mn (C),
this means we can reconstruct from C( A) the normal n-by-n matrices, as well as sums and products
of commuting ones. An important point to note here is that the reconstruction is happening in the
setting of piecewise C*-algebras. We could not have taken the colimit in the category of commutative
C*-algebras instead. Indeed, the Kochen–Specker theorem can be reformulated in terms of colimits as
follows. The reformulation might not look much like the original, but it is nevertheless
equivalent, and more suited to our purposes; see also ([2], p. 66).

Theorem 2 ([2,28]). If n ≥ 3, then the colimit of C(Mn (C)) in the category of commutative C*-algebras is the
degenerate, 0-dimensional, C*-algebra.

In fact, the colimit of C( A) degenerates for many more C*-algebras A than just Mn (C), such as any
C*-algebra of the form Mn ( B) for some C*-algebra B, or any W*-algebra that has no direct summand
C or M2 (C) [29,30].
As mentioned in the introduction, Gelfand duality is a functor from the category of commutative
C*-algebras to the category of compact Hausdorff topological spaces. That is, it is a systematic way
to assign a space to a C*-algebra that respects functions. Interpreted physically: any classical
system is determined by a configuration space in a way that respects operations on the system.
The previous theorem can be used to show that there is no such configuration space determining
quantum systems—at least, if the notion of configuration space is to be a conservative extension of the
classical notion. The latter can be made precise as a continuous functor from the category of compact
Hausdorff spaces to some category with a degenerate space like the empty set, more precisely, a strict
initial object 0.

Theorem 3 ([29]). Suppose there exist a category conservatively extending that of compact Hausdorff spaces
and a functor F completing the following square.

    commutative C*-algebras ----Spec----> compact Hausdorff spaces
              |                                     |
              ⊆                                     |
              v                                     v
         C*-algebras ------------F------------>     ?

Then F (Mn (C)) = 0 for n ≥ 3. In particular, F cannot be a dual equivalence.

Asking the functor on the right to be continuous is appropriate to model the classical limit
of quantum systems converging to a classical one, because then the state space of the product of
two limiting classical systems should be computed as the classical limit of the joint quantum systems.
In fact, the proof in [29] holds if the category on the bottom right has limits, and the functor on the
right reflects them. However, one might still wonder if it is reasonable to ask the diagram to commute
on the nose. Instead, we could ask it to commute up to a natural isomorphism. This is precisely the
way out we will explore in Sections 3 and 5.
Theorem 3 rules out many possible quantum configuration spaces that have been proposed for the
bottom right role in the square; in particular many generalized notions of topological spaces, such as
sets, topological spaces themselves, pointfree topological spaces, ringed spaces, quantales, toposes,
categories of sheaves, and many more [28,29,31]. In particular, the state space of a C*-algebra,
as discussed in the introduction, will not do for us, even though it is one of the most important
tools associated with a C*-algebra [32]. That explains why we deliberately talk about “configuration
spaces”. In the classical case, the two notions coincide. The previous theorem shows that serious
notions of quantum configuration space must be less conservative. This points the way towards good
candidates: Sections 3 and 5 will cover two that do fit the bill.
The question of noncommutative extensions of Gelfand duality is also very interesting from a
purely mathematical perspective. As mentioned in the introduction, C*-algebra theory can be regarded
as noncommutative topology. Adding more structure than mere topology leads to noncommutative
geometry, which is a rich field of study [33]. However, it takes place entirely on the algebraic side.
Finding the right notion of quantum configuration space could reintroduce geometric intuition, which
is usually very powerful [34,35]. For example, in certain cases, extensions of C( A) can be used to
compute the K-theory of A, which is a way to study homotopies of the configuration space underlying
A, that includes many local-to-global principles [36]. Similarly, closed ideals of a W*-algebra A, that are
important because they correspond to open subsets in the classical case, are in bijection with certain
piecewise ideals of C( A) [37].
So far, we have considered C( A) as a diagram of parts of the whole. We finish this section by
considering it as a mere partially ordered set, where we forget that elements have the structure
of commutative C*-algebras. That is, we only consider the shape of how the parts fit together.
This information is already enough to determine the piecewise structure of A, but as a Jordan algebra.
(In fact, considering C( A) as a mere partially ordered set gives precisely the same information as
considering it as a diagram [38]. This justifies Definition 1.) The self-adjoint elements of a C*-algebra
form a Jordan algebra under the product a ◦ b = ½(ab + ba); this even gives a so-called JB-algebra.
In fact, any JB-algebra is a subalgebra of the direct sum of one of this form and an exceptional one, such
as the hermitian octonionic matrices H3(O) [39]. For example, the n-level system gives the JB-algebra of hermitian
n-by-n matrices multiplied via anticommutators. Piecewise Jordan algebras and their homomorphisms
are defined analogously to Definition 2. The structure of quantum observables leads naturally to
the axioms of Jordan algebras [8]. (Modern mathematical physics tends to prefer C*-algebras, as their
theory is slightly less complicated, and the connections to Jordan algebras are so tight anyway [39].)
The following theorem justifies that point of view.

Theorem 4 ([40]). Let A and B be C*-algebras. If C( A) and C( B) are isomorphic partially ordered sets, then
A and B are isomorphic as piecewise Jordan algebras.

A little more can be said. Any isomorphism f : C(A) → C(B) is implemented by an isomorphism
g : A → B of piecewise Jordan algebras, in the sense that f(C) = { g(c) | c ∈ C }. In fact, this g is
unique, unless A is either C² or M2(C). For AW*-algebras more is true: because of Gleason’s theorem,
which we will meet in Section 5, we can actually reconstruct the full linear structure rather than just
the piecewise linear structure. (An AW*-algebra is a C*-algebra A that has enough projections, in the
sense that every C ∈ C( A) is the closed linear span of its projections, and those projections work
together well, in the sense that orthogonal families in the partially ordered set of projections have least
upper bounds [7,41]. See also Section 5. They are more general than W*-algebras, and much of the
theory of W*-algebras generalizes to AW*-algebras, such as the type decomposition. An n-level system
A = Mn (C) forms a W*-algebra, and hence also an AW*-algebra.) Type I2 AW*-algebras are those
of the form M2 (C ) for a commutative AW*-algebra C. AW*-algebras with a type I2 direct summand
correspond to the exceptional case n = 2 in the Kochen–Specker Theorem 2. We will call them atypical,
and algebras without a type I2 direct summand typical, as we will meet this exception often. An n-level
system is typical when n ≥ 3.

Corollary 1 ([42,43]). Let A and B be typical AW*-algebras. If C( A) and C( B) are isomorphic partially ordered
sets, then A and B are isomorphic as Jordan algebras.

Whereas the C*-algebra product is associative but need not be commutative, the Jordan product
is commutative but need not be associative; commutative C*-subalgebras correspond to associative
Jordan subalgebras. Indeed, the previous theorem generalizes to Jordan algebras in those terms [44].
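The contrast between the two products can be made concrete for the 2-level system. The sketch below uses the Pauli matrices x and z as an illustrative choice and the Jordan product a ◦ b = ½(ab + ba) from Section 2; the function name `jordan` is an assumption made for this example.

```python
import numpy as np

def jordan(a, b):
    """Jordan product: half the anticommutator of a and b."""
    return (a @ b + b @ a) / 2

x = np.array([[0, 1], [1, 0]], dtype=complex)
z = np.array([[1, 0], [0, -1]], dtype=complex)
zero = np.zeros((2, 2), dtype=complex)

# The Jordan product is commutative ...
print(np.allclose(jordan(x, z), jordan(z, x)))

# ... but not associative: (x ◦ x) ◦ z = z, whereas x ◦ (x ◦ z) = 0,
# because x ◦ x is the identity and x anticommutes with z.
left = jordan(jordan(x, x), z)
right = jordan(x, jordan(x, z))
print(np.allclose(left, z), np.allclose(right, zero))

# The Jordan product of self-adjoint matrices is self-adjoint,
# unlike the ordinary matrix product x z.
xz = x @ z
print(np.allclose(jordan(x, z), jordan(x, z).conj().T), np.allclose(xz, xz.conj().T))
```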

3. Toposes
In this section, we consider C( A) as a diagram. That is, we regard it as an operation that assigns
to each classical subsystem C ∈ C( A) of the quantum system A a classical system C. What kind of
operation is this diagram C → C? We can think of it as a set S(C ) that varies with the context C ∈ C( A).
Moreover, this contextual set respects coarse-graining: if C ⊆ D, then S(C ) ⊆ S( D ). That is, when the
measurement context C grows to include more observables, the information contained in the set S(C )
assigned to it grows along accordingly. For example, for a 2-level system A = M2(C), this comes
down to a choice of set S(u) for each unitary u ∈ U(2), that all include a fixed set S(0). Hence, these
contextual sets are functors S from C( A), now regarded as a partially ordered set, to the category of
sets and functions. The totality of all such functors forms a category. In fact, contextual sets form a
particularly nice category, namely a topos.
A topos is a category that shares a lot of the properties of the category of sets and functions.
In particular, one can do mathematics inside a topos: we may think about objects of a topos as sets, that
we may specify and manipulate using logical formulae. Of course, this internal perspective comes
with some caveats. Most notably, if a proof is to hold in the internal language of any topos, it has to be
constructive: we are not allowed to use the axiom of choice or proofs by contradiction, and have to be
careful about real numbers. We cannot go into more detail here, but for more information on topos
theory, see [45].
One particular object of interest in the topos of contextual sets over C( A) is our canonical
contextual set C → C. It turns out that, according to the logic of the topos of contextual sets, this object
is a commutative C*-algebra.

Theorem 5 ([19]). Let A be a C*-algebra. In the topos of contextual sets over C( A), the canonical contextual
set C → C is a commutative C*-algebra.

This procedure is called Bohrification:

1. Start with a quantum system A.

2. Change the logical rules of set theory by moving to the topos of contextual sets over C( A).

3. The quantum system A turns into a classical one given by the canonical contextual set C → C.

Theorem 6 ([23,51,65]). There is a bijective correspondence between piecewise states on an AW*-algebra A,
and states of the canonical contextual set C → C inside the topos of contextual sets over C(A).

(The cited references consider W*-algebras, but the proof holds for AW*-algebras because
Corollary 5 does so, see Section 5. The same goes for the references in Corollary 3.) By Gleason’s
theorem (see Section 5), we can say more for AW*-algebras. See also [25].

Corollary 3 ([66,67]). There is a bijective correspondence between states of a typical AW*-algebra A, and states
of the canonical contextual set C → C inside the topos of contextual sets over C( A).

In the n-level system A = Mn (C) for n ≥ 3, this means that n-by-n density matrices correspond
precisely to a choice of probability distribution over m points that is consistent over all unitaries
u ∈ U (n) and partitions of n points into m equivalence classes.
Combining daseinisation with the above results gives rise to a contextual Born rule, justifying
the Bohrification procedure of Theorem 5 [50]. Summarizing, we can formulate the physics of the
quantum system A completely in terms of C( A) and its topos of contextual sets, and work within there
as if dealing with a classical system.
To end this section, let us mention some other related work. The “amount of nonclassicality” of
the contextual logic of A measures the computational power of the quantum system A [68].
For philosophical aspects of Bohrification and related constructions, see [69,70]. Similar contextual
ideas have been used to model quantum numbers [71]. Transferring C*-algebras between different
toposes has been used successfully before in so-called Boolean-valued analysis [72–74]. Finally,
contextuality and the Kochen–Specker theorem can be formulated more generally than in algebraic
quantum theory [75].

4. Domains
The partially ordered set C( A) of empirically accessible classical contexts C of a quantum system
A embodies coarse-graining. As in the introduction, we think of each C ∈ C( A) as consisting of
compatible observables that we can measure together in a single experiment. Larger experiments,
involving more observables, should give us more information, and this is reflected in the partial order:
if C ⊆ D, then D contains more observables, and hence provides more information. If A itself is
noncommutative, the best we can do is approximate it with larger and larger commutative subalgebras
C. This sort of informational approximation is studied in computer science under the name domain
theory [76,77]. This section discusses the domain-theoretic properties of C( A). Domain theory is mostly
concerned with partial orders where every element can be approximated by finite ones, as those are
the ones we can measure in practice, leading to the following definitions.

Definition 3. A partially ordered set (C, ≤) is directed complete when every ascending chain {D_i} has a least
upper bound ⋁_i D_i. An element C approximates D, written C ≪ D, when D ≤ ⋁_i D_i implies C ≤ D_i for
some i, for any chain {D_i}. An element C is finite when C ≪ C. A continuous domain is a directed complete
partially ordered set, every element of which satisfies D = ⋁{C | C ≪ D}. An algebraic domain is a directed
complete partially ordered set, every element of which is approximated by finite ones: D = ⋁{C | C ≪ C ≤ D}.

Lemma 1 ([65,78]). If A is a C*-algebra, then C(A) is a directed complete partially ordered set, in which ⋁_i C_i
is the norm-closure of ⋃_i C_i.

We saw in Section 2 that C(A) captures precisely the structure of A as a (piecewise) Jordan
algebra. Order-theoretic techniques give an alternative proof of Corollary 1. First, we can recognize
the dimension of A from C(A). Recall that a partially ordered set is Artinian when it satisfies the following
equivalent conditions: every nonempty subset has a minimal element; every nonempty filtered subset has a
least element; every descending sequence C1 ≥ C2 ≥ · · · eventually becomes constant. The dual notion,
satisfying an ascending chain condition, is called Noetherian.

Proposition 3 ([79]). A C*-algebra A is finite-dimensional if and only if C(A) is Artinian, if and only if C(A)
is Noetherian.

Indeed, in an n-level system A = Mn(C), elements C ∈ C(A) correspond to a choice of unitary
u ∈ U(n) and a partition of n points into m equivalence classes. Because C ⊆ D when the partition
for D is finer than that for C, the partially ordered set C(A) can only have strictly increasing chains of
length at most n.
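The chain-length bound can be checked by brute force for small n. The sketch below fixes one unitary, so that the contexts are just the partitions of n points ordered from coarse to fine, and computes the number of elements in a longest strictly increasing chain. The function names are illustrative assumptions, and the enumeration is only practical for small n.

```python
def set_partitions(items):
    """Yield all partitions of the list `items` as lists of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Either put `first` into a new block of its own ...
        yield [[first]] + partition
        # ... or add it to one of the existing blocks.
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]

def refines(fine, coarse):
    """True when every block of `fine` lies inside some block of `coarse`."""
    return all(any(block <= other for other in coarse) for block in fine)

def longest_chain(n):
    """Number of elements in a longest strictly refining chain of partitions of n points."""
    partitions = [frozenset(frozenset(b) for b in p)
                  for p in set_partitions(list(range(n)))]
    partitions.sort(key=len)  # coarse partitions (few blocks) come first
    best = {}
    for p in partitions:
        # A strictly coarser partition has strictly fewer blocks,
        # so its entry in `best` has already been computed.
        below = [best[q] for q in partitions
                 if len(q) < len(p) and refines(p, q)]
        best[p] = 1 + max(below, default=0)
    return max(best.values())

for n in range(1, 6):
    print(n, longest_chain(n))
```

Each strict refinement increases the number of blocks by at least one, and the number of blocks ranges from 1 to n, which is why the longest chain has exactly n elements.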
By the Artin–Wedderburn theorem, we know that any finite-dimensional C*-algebra A is a finite
direct sum of matrix algebras Mni (C). It is therefore specified up to isomorphism by the numbers {ni },
which we can extract from the partially ordered set C( A). A partially ordered set C is called directly
indecomposable when C = C1 × C2 implies that either C1 or C2 is a singleton set.
Proposition 4 ([79,80]). If A = ⊕_{i=1}^n M_{n_i}(C), then the C*-subalgebras M_{n_i}(C) correspond to directly
indecomposable partially ordered subsets C_i of C(A), and furthermore n_i is the length of a maximal chain in C_i.

The previous proposition does not generalize to arbitrary C*-algebras, which need not have
a decomposition as a direct sum of factors. One might expect that C( A) is a domain when A
is approximately finite-dimensional, as this would match with the intuition of approximation using
practically obtainable information. However, there also needs to be a large enough supply of projections
for this to work; see also Section 3. It turns out that the correct notion is that of scattered C*-algebras [81],
that is, C*-algebras A for which every positive map A → C is a sum of pure ones. The n-level system
A = Mn (C ) is scattered.

Theorem 7 ([38]). A C*-algebra A is scattered if and only if C( A) is a continuous domain if and only if C( A)
is an algebraic domain.

Compare this to the situation using commutative W*-subalgebras V(A) of a W*-algebra A: V(A)
is a continuous or algebraic domain only when A is finite-dimensional [78]. Connecting back to
Theorem 6 and Corollary 3, let us notice that C can also be regarded as a domain using the interval
topology: smaller intervals approximate an ideal complex number better than larger ones. Moreover,
(piecewise) states A → C respect such approximations: the induced functions from C( A) to the interval
domain on C are Scott continuous [65,78].
There are several topologies with which one could adorn C(A). Like any partially ordered set, it
carries the order topology. We have just mentioned the Scott topology on directed complete partially
ordered sets. For the purposes of information approximation that we are interested in, there is
the Lawson topology, which refines both the Scott topology and the order topology. If the domain
is continuous, the topological space will be Hausdorff. The topological space will be compact for
so-called FS-domains, which C( A) happens to be.

Corollary 4 ([77]). For a scattered C*-algebra A, the Lawson topology makes X = C( A) compact Hausdorff.
Hence to each scattered C*-algebra A we may assign a commutative C*-algebra C ( X ).

The assignment A → C (C( A)) is not functorial, does not leave commutative C*-algebras invariant,
and of course only works for scattered C*-algebras A in the first place [38]. Hence there is no
contradiction with Theorem 3.
One can also furnish C( A) with a topology inspired by the topology of A itself. We will use the
topology induced by the following variation on the Hausdorff metric; similar variations are named
after Banach–Mazur, Kadets [82], Gromov–Hausdorff, Effros–Maréchal [83], and Kadison–Kastler [84].
See also [85]. Define the distance between C, D ∈ C(A) to be

\[
d(C, D) = \max\Bigl\{ \sup_{c \in C,\ \|c\| \le 1}\ \inf_{d \in D,\ \|d\| \le 1} \|c - d\|,\ \
\sup_{d \in D,\ \|d\| \le 1}\ \inf_{c \in C,\ \|c\| \le 1} \|c - d\| \Bigr\}.
\]

Now if C and D are generated by projections p and q, and A is represented on a Hilbert space H, then

\[
\|p - q\| = \sup_{x \in H,\ \|x\| \le 1} \|p(x) - q(x)\|
          = \sup_{x \in p(H),\ \|x\| \le 1} \|x - q(x)\|
          = \sup_{x \in p(H),\ \|x\| \le 1}\ \inf_{y \in q(H),\ \|y\| \le 1} \|x - y\|
\]

is the Hausdorff distance between p(H) and q(H). It follows that the distance between C and D
is max(‖p − q‖, ‖(1 − p) − q‖, ‖p − (1 − q)‖, ‖(1 − p) − (1 − q)‖) = max(‖p − q‖, ‖(1 − p) − q‖).
This topology on C(A) matches the case of the 2-level system A = M2(C), where C(A) is in bijection
with the one-point compactification of the real projective plane RP² [50].
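The simplification of the four-term maximum above rests on two norm identities, which can be checked numerically for the 2-level system. The sketch below uses rank-one projections onto two real unit vectors at angle theta, an illustrative choice, and the spectral norm as the operator norm; the helper names are assumptions made for this example. It does not compute the full distance d(C, D).

```python
import numpy as np

def projection_onto(v):
    """Rank-one projection onto the line spanned by the vector v."""
    v = np.asarray(v, dtype=complex)
    v = v / np.linalg.norm(v)
    return np.outer(v, v.conj())

def operator_norm(m):
    """Operator norm of a matrix, that is, its largest singular value."""
    return np.linalg.norm(m, 2)

theta = 0.3
one = np.eye(2)
p = projection_onto([1, 0])
q = projection_onto([np.cos(theta), np.sin(theta)])

terms = [
    operator_norm(p - q),
    operator_norm((one - p) - q),
    operator_norm(p - (one - q)),
    operator_norm((one - p) - (one - q)),
]
print(terms)

# For rank-one projections at angle theta, the norm of p - q is sin(theta).
print(terms[0], np.sin(theta))
```

The first and last terms agree because (1 − p) − (1 − q) = −(p − q), and the two middle terms agree for the same reason, so only two of the four norms need to be computed.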

5. Dynamics
So far, we have only considered kinematics of the quantum system A, by looking for configuration
spaces based on C(A). It is clear, however, that C(A) in itself is not enough to reconstruct all of A.
For a counterexample, observe that any C*-algebra A has an opposite C*-algebra A^op in which the
multiplication is reversed. Clearly, C(A) and C(A^op) are isomorphic as partially ordered sets, but
there exist C*-algebras A that are not isomorphic to A^op as C*-algebras [86]. So we need to add more
information to C(A) to be able to reconstruct A as a C*-algebra, which is the topic of this section. To do
so, we bring dynamics into the picture. For motivation of why dynamics and configuration spaces
should go together, see also [87].
We begin by viewing dynamics as a time-dependent group of evolutions. The traditional view is
that the 1-parameter group consists of unitary evolutions of the Hilbert space. For an n-level system,
these 1-parameter groups are continuous homomorphisms R → U(n). In algebraic quantum theory,
this becomes a 1-parameter group of isomorphisms A → A of the C*-algebra.
The group Aut( A) inherits the pointwise norm topology from A, that has subbasis

{ g ∈ Aut( A) | ∀ a ∈ S : f ( a) − g( a) < ε > f ( a) − g(1 − a)}

for f ∈ Aut(A), ε > 0, and S ⊆ A finite, and makes conjugation U(A) → Aut(A) continuous [88].
We can similarly consider 1-parameter groups of isomorphisms C(A) → C(A) of partially ordered
sets. Accordingly, Aut(C(A)) becomes a topological group with subbasis

{ g ∈ Aut(C(A)) | ∀C ∈ S : d(f(C), g(C)) < ε }

for f ∈ Aut(C(A)), ε > 0, and finite sets S of atoms of C(A).

Definition 4. Let A be a C*-algebra. A 1-parameter group on A is a continuous injection ϕ : R → Aut(A),
that assigns to each t ∈ R an isomorphism ϕ_t : A → A of C*-algebras, satisfying ϕ_0 = 1 and ϕ_{t+s} = ϕ_t ◦ ϕ_s.
A 1-parameter group on C(A) is a continuous injection α : R → Aut(C(A)), that assigns to each t ∈ R an
isomorphism α_t : C(A) → C(A) of partially ordered sets, satisfying α_{t+s} = α_t ◦ α_s.

The following theorem shows that both notions in fact coincide. A factor is an algebra with trivial
center, that is, a single superselection sector: the n-level system Mn (C) is a factor, but Mm (C) ⊕ Mn (C)
is not, because its center is two-dimensional. More precisely, the following theorem shows that the
only freedom between the two notions in the previous definition lies in permutations of the center,
because Aut(A) ≅ Aut(C(A)) for typical AW*-factors.

Theorem 8 ([89,90]). Let A be a typical AW*-factor. Any 1-parameter group on C(A) is induced by a
1-parameter group on A, and vice versa.

So C*-dynamics of A can be completely justified in terms of C(A). This also justifies our choice of
the topology on C(A) induced by the Hausdorff metric. See also [91]. Equilibrium states are described
in algebraic quantum theory by Kubo–Martin–Schwinger states, and these can be described in terms of
C( A) as well, see [92].
We now switch gears. By Stone’s theorem, 1-parameter groups of unitaries e^{ith} in certain W*-algebras
correspond to self-adjoint (possibly unbounded) observables h. Thus, we may forget about the explicit
dependence on a time parameter and consider single self-adjoint elements of C*-algebras. In fact, we
will mostly be interested in symmetries: self-adjoint unitary elements s = s∗ = s⁻¹.
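In finite dimensions, the passage from a self-adjoint h to the unitaries e^{ith}, and from there to a 1-parameter group of automorphisms by conjugation, can be sketched directly. The code below is a minimal illustration for the 2-level system; the matrix h, the test matrices, and the function names are arbitrary assumptions, and the unbounded case covered by Stone's theorem is out of reach of such a sketch.

```python
import numpy as np

def unitary_group(h, t):
    """The unitary e^{ith} for a self-adjoint matrix h, via its spectral decomposition."""
    eigenvalues, eigenvectors = np.linalg.eigh(h)
    phases = np.diag(np.exp(1j * t * eigenvalues))
    return eigenvectors @ phases @ eigenvectors.conj().T

def evolve(h, t, a):
    """The automorphism a -> e^{ith} a e^{-ith} induced by conjugation."""
    u = unitary_group(h, t)
    return u @ a @ u.conj().T

h = np.array([[1, 1j], [-1j, 2]])  # a self-adjoint generator
rng = np.random.default_rng(1)
a = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
b = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
t, s = 0.7, -1.3

# Group law: time zero is the identity, and times add under composition.
print(np.allclose(evolve(h, 0.0, a), a))
print(np.allclose(evolve(h, t + s, a), evolve(h, t, evolve(h, s, a))))

# Each evolution preserves products and adjoints.
print(np.allclose(evolve(h, t, a @ b), evolve(h, t, a) @ evolve(h, t, b)))
print(np.allclose(evolve(h, t, a.conj().T), evolve(h, t, a).conj().T))
```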
Symmetries are tightly linked to projections. Every projection p gives rise to a symmetry 1 − 2p,
and every symmetry s comes from a projection (1 − s)/2. As they are unitary, the symmetries of
a C*-algebra A generate a subgroup Sym( A) of the unitary group. For a commutative C*-algebra
A = C ( X ), symmetries compose, so that Sym( A) consists of symmetries only. For an n-level system
A = Mn (C), it turns out that Sym( A) consists of those unitaries u ∈ U (n) whose determinant is 1
or −1. This ‘orientation’ is what we will add to C( A) to make it into a full invariant of A. See also [93].
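The link between projections and symmetries, and the determinant condition on Sym(A), can be checked for a 3-level system. The sketch below picks two rank-one projections that do not commute, an illustrative choice, and uses hypothetical helper names.

```python
import numpy as np

def rank_one_projection(v):
    """Projection onto the line spanned by the vector v."""
    v = np.asarray(v, dtype=complex)
    v = v / np.linalg.norm(v)
    return np.outer(v, v.conj())

def symmetry(p):
    """The symmetry 1 - 2p associated with a projection p."""
    return np.eye(p.shape[0]) - 2 * p

def projection(s):
    """The projection (1 - s)/2 associated with a symmetry s."""
    return (np.eye(s.shape[0]) - s) / 2

identity = np.eye(3)
p = rank_one_projection([1, 1j, 0])
q = rank_one_projection([0, 1, 1])
s, t = symmetry(p), symmetry(q)

# s is self-adjoint and unitary, and the two constructions are mutually inverse.
print(np.allclose(s, s.conj().T), np.allclose(s @ s, identity))
print(np.allclose(projection(s), p))

# A product of symmetries is unitary with determinant 1 or -1,
# but it need not be a symmetry itself when the factors do not commute.
u = s @ t
print(np.linalg.det(s), np.linalg.det(u))
print(np.allclose(u, u.conj().T))
```

Here det(s) = −1 because s negates a one-dimensional subspace, so the product of two such symmetries has determinant 1.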
Having enough symmetries means having enough projections. Therefore, we now consider
AW*-algebras rather than general C*-algebras. For commutative AW*-algebras C ( X ), the Gelfand
spectrum X is not just compact Hausdorff, but Stonean, or extremally disconnected, in the sense that
the closure of an open set is still open. (For comparison, the Lawson topology in Corollary 4 is
totally disconnected, in the sense that connected components are singleton sets, which is weaker
than Stonean).
Gelfand duality restricts to commutative AW*-algebras and Stonean spaces. Another way to
put this is to say that the projections Proj( A) of a commutative AW*-algebra A form a complete

Boolean algebra, and vice versa, every complete Boolean algebra gives a commutative AW*-algebra.
The appropriate homomorphisms between AW*-algebras are normal, meaning that they preserve
least upper bounds of projections [94]. There are versions of Definition 2 for piecewise AW*-algebras,
and piecewise complete Boolean algebras, too [94]. One could also define a piecewise Stonean space,
but the following lemma suffices here.

Lemma 2 ([94]). The category of piecewise complete Boolean algebras and the category of piecewise AW*-algebras
are equivalent.

The orthocomplement p → 1 − p makes sense for the projections Proj( A) of any C*-algebra A.
We can now make precise what equivariance under symmetries achieves: it makes the difference
between being able to recover Jordan structure and C*-algebra structure.

Proposition 5 ([43,94]). Let A and B be typical AW*-algebras, and suppose that f : Proj(A) → Proj(B)
preserves least upper bounds and orthocomplements. Then f extends to a Jordan homomorphism A → B.
It extends to a homomorphism if additionally f((1 − 2p)(1 − 2q)) = (1 − 2f(p))(1 − 2f(q)).

To arrive at a good configuration space for A, we can package all this information up. We saw that
Proj(A) embeds in Sym(A). Conversely, Sym(A) acts on Proj(A): a symmetry s and a projection p
give rise to a new projection sps. In this way, Proj( A) acts on itself, and we may forget about Sym( A).
Including this action leads to the notion of an active lattice AProj( A). More precisely, an active lattice
consists of a complete orthomodular lattice P, a group G generated by 1 − 2p for p ∈ P within the
unitary group of the piecewise AW*-algebra A( P) with projections P, and an action of G on P that
becomes conjugation on A(P). The active lattice of an n-level system A = Mn(C) has, for P, the lattice
of subspaces of C^n; for G, the group {u ∈ U(n) | det(u) = ±1}; the injection P → G sends V ⊆ C^n
to the reflection in V; and u ∈ G acts on V ∈ P as uVu∗ = {uvu∗ | v ∈ V} ⊆ C^n. For morphisms
of active lattices, we refer to [94], but let us point out that thanks to Lemma 2 they can be phrased
in terms of projections alone, just like the above definition of the active lattice itself. See also [95].
We can now make precise that we can reconstruct an AW*-algebra A from its active lattice AProj( A).
Up to now, we have mostly considered reconstructions of the form “if some structures based on A
and B are isomorphic, then so are A and B”. The following theorem gives a much stronger form of
reconstruction. Recall that a functor F is fully faithful when it gives a bijection between morphisms
A → B and F ( A) → F ( B).

Theorem 9 ([94]). The functor that assigns to an AW*-algebra A its active lattice AProj( A) is fully faithful.

It follows immediately that if A and B are AW*-algebras with isomorphic active lattices
AProj(A) ≅ AProj(B), then A ≅ B are isomorphic AW*-algebras. That is, its active lattice completely
determines an AW*-algebra. We can therefore think of active lattices as configuration spaces. As mentioned
before, Proj( A) contains precisely the same information as C( A), so we could phrase active lattices in
terms of C( A) as well. This configuration space circumvents the obstruction of Theorem 3, because
active lattices are not a conservative extension of the “passive lattices” coming from compact Hausdorff
spaces. Another thing to note about the previous theorem is that it has no need to except atypical cases
such as M2 (C). Finally, let us point out that functoriality of A → AProj( A) is nontrivial [96].
To get a good notion of configuration space for general quantum systems, we would eventually
like to pass from AW*-algebras to C*-algebras. One way to think about this step is as refining an
underlying carrying set to a topological space, that is, moving from algebras ℓ∞(X) of all (bounded)
functions on the set X to algebras C ( X ) of continuous functions on the topological space X. One might
hope that AW*-algebras or W*-algebras play the former role in a noncommutative generalization, and to
some extent this works [97,98]. Unfortunately, the Kadison–Singer problem raises rigorous obstructions
to the most obvious noncommutative generalization of such a “discretization” of C*-algebras to
AW*-algebras [99].
Nevertheless, AW*-algebras are pleasant to work with. Their theory is entirely algebraic, whereas
the theory of (commutative) W*-algebras involves a good deal of measure theory. For example, Gelfand
spectra of commutative AW*-algebras are Stonean spaces, whereas Gelfand spectra of commutative
W*-algebras are so-called hyperstonean spaces; they additionally have to satisfy a measure-theoretic
condition that seems divorced from topology. A similar downside occurs with projections:
the projection lattice of a commutative W*-algebra is not just a complete Boolean algebra, it additionally
has to satisfy a measure-theoretic condition. In particular, projections of an enveloping AW*-algebra
should correspond to certain ideals in a C*-algebra, without needing measure-theoretic intricacies.
Much of the theory of W*-algebras finds its natural home in AW*-algebras at any rate. As a case
in point, consider Gleason’s theorem. It states that any probability measure on Proj(Mn (C)) extends
to a positive linear function Mn (C) → C when n > 2. Roughly speaking, any quantum probability
measure μ is of the form μ( p) = Tr(ρp) for some density matrix ρ. In the algebraic formulation, any
probability measure Proj( A) → C extends to a state A → C, for an n-level system A = Mn (C) [100].
One can replace A by an arbitrary W*-algebra, and one can even replace C by an arbitrary operator
algebra B [101,102]. Thanks to Proposition 5, Gleason’s theorem generalizes to many typical
AW*-algebras A, such as those of so-called homogeneous type I, and those generated by two projections,
which leads to the following corollary, which supports many results in Sections 2 and 3.

Corollary 5 ([43]). Any normal piecewise Jordan homomorphism between typical AW*-algebras is a
Jordan homomorphism.
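
The easy direction of Gleason's theorem discussed above can be checked numerically: a density matrix ρ always yields a probability measure μ(p) = Tr(ρp) on projections. The following is only a minimal sketch for the 3-level system M3(C), using NumPy; the helper names `random_density_matrix` and `mu` are illustrative assumptions and do not come from the text. It does not prove the theorem itself, which is the converse statement.

```python
import numpy as np

def random_density_matrix(n, rng):
    """Return a random n-by-n density matrix (positive, unit trace)."""
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = a @ a.conj().T              # positive semidefinite by construction
    return rho / np.trace(rho).real

def mu(rho, p):
    """Probability that the state rho assigns to the projection p."""
    return np.trace(rho @ p).real

rng = np.random.default_rng(0)
n = 3
rho = random_density_matrix(n, rng)

# Rank-one projections onto an orthonormal basis (columns of a unitary from QR).
q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
projections = [np.outer(q[:, k], q[:, k].conj()) for k in range(n)]

# Normalization: the projections sum to the identity, so the values sum to 1.
total = sum(mu(rho, p) for p in projections)

# Additivity on orthogonal projections.
additive = np.isclose(
    mu(rho, projections[0] + projections[1]),
    mu(rho, projections[0]) + mu(rho, projections[1]),
)
```

Here `total` is close to 1 and `additive` holds for any choice of orthonormal basis, which is exactly what being a probability measure on Proj(Mn (C)) requires.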

6. Characterization
Now that we have seen that most of the algebraic quantum theory of A can be phrased in terms
of C( A) only, let us try to axiomatize C( A) itself. Given any partially ordered set, when is it of the
form C( A) for some quantum system A? An answer to this question would, for example, make
Theorem 9 into an equivalence of categories, bringing configuration spaces for quantum systems on a
par with Gelfand duality for classical systems. An axiomatization would also open up the possibility
of generalizations that might go beyond algebraic quantum theory.
We start with the classical case, of commutative C*-algebras C ( X ). By Gelfand duality, any
C ∈ C(C ( X )) corresponds to a quotient X/∼. In turn, the equivalence relation corresponds to a
partition of X into equivalence classes. Partitions are partially ordered by refinement: if C ⊆ D, then
any equivalence class in the partition corresponding to D is contained in an equivalence class of the
partition corresponding to C. Hence axiomatizing C(C ( X )) comes down to axiomatizing partition
lattices, and this has been well-studied, both in the finite-dimensional case [103,104], and in the general
case [105]. The list of axioms is too long to reproduce here, but let us remark that it is based on a
definition of points of the partition lattice. In the case of a finite partition lattice, the points are simply
the atoms, that is, the minimal nonzero elements. So for a classical system Cn with n states, the elements
of the partition lattice C(Cn )op are the ways to partition a set of n points into m equivalence classes;
the atoms put two of the n points in an equivalence class and all the others in their own equivalence
class of one point each. The other axioms are geometric in nature.
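
The description of atoms above can be verified by brute force for small n. The sketch below enumerates all partitions of a set of n = 4 points, orders them by refinement with the all-singletons partition at the bottom, and picks out the atoms. All function names are illustrative assumptions, and the enumeration is only feasible for small n.

```python
from itertools import combinations

def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition

def canonical(partition):
    """Hashable form of a partition: a frozenset of frozenset blocks."""
    return frozenset(frozenset(block) for block in partition)

def refines(fine, coarse):
    """True if every block of `fine` lies inside some block of `coarse`."""
    return all(any(b <= c for c in coarse) for b in fine)

n = 4
partitions = {canonical(p) for p in set_partitions(list(range(n)))}
bottom = canonical([[i] for i in range(n)])   # every point in its own class

def is_atom(p):
    """An atom is a minimal element strictly above the bottom partition."""
    if p == bottom:
        return False
    below = [q for q in partitions if q != p and refines(q, p)]
    return below == [bottom]

atoms = [p for p in partitions if is_atom(p)]

# The partitions that merge exactly two points and leave the rest alone.
expected = {
    canonical([[i, j]] + [[k] for k in range(n) if k not in (i, j)])
    for i, j in combinations(range(n), 2)
}
```

For n = 4, the atoms found this way coincide with the six partitions in `expected`, one for each pair of points.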

Lemma 3 ([64]). A partially ordered set is isomorphic to C(C ( X )) for a compact Hausdorff space X if and only
if it is opposite to a partition lattice whose points are in bijection with X.

Thanks to (a variation of) Lemma 2, the same strategy applies to piecewise Boolean algebras B.
Write C( B) for the partially ordered set of Boolean subalgebras of B. The downset of an element D of a
partially ordered set consists of all elements C ≤ D. In fact, the idea that any quantum logic (piecewise
Boolean algebra) should be seen as many classical sublogics (Boolean algebras) pasted together is not
new, and drives much of the research in that area [27,106–109].

Theorem 10 ([110]). A partially ordered set is isomorphic to C( B) for a piecewise Boolean algebra B if
and only if:

• it is an algebraic domain;
• any nonempty subset has a greatest lower bound;
• a set of atoms has an upper bound whenever each pair of its elements does;
• the downset of each compact element is isomorphic to the opposite of a finite partition lattice.

In the case of a classical system with n states, B is the powerset of n points, and the above
conditions merely say that C( B)op is a partition lattice.
Just like in Section 3, if we consider C( B) as a diagram rather than a mere partially ordered set,
we can reconstruct B. Starting from just the partially ordered set C( B), the same issues surface as in
Sections 2 and 5, about Jordan structure versus full algebra structure. In the current piecewise Boolean
setting, it can be solved neatly by adding an orientation to C( B) [110]. This comes down to making a
consistent choice of atom in the Boolean subalgebras with two atoms, corresponding to the atypical
cases for AW*-algebras before.
Returning to C*-algebras, Lemma 3 reduces the question of characterizing C( A) for a C*-algebra A
to finding relationships between C( A) and C(C ) for C ∈ C( A). One prototypical case where we know
such a relationship is for the n-level system A = Mn (C). Namely, inspired by the previous section,
there is an action of the unitary group U (n) on C( A): if u ∈ U (n) is some rotation, and C ∈ C( A)
is diagonal in some basis, then also the rotation uCu∗ is diagonal in the rotated basis and therefore
is in C( A) again. In fact, any C ∈ C( A) will be a rotation of an element of C( A) that is diagonal in
the standard basis. Therefore, we can recognize C(Mn (C)) as a semidirect product of C(Cn ) and U (n).
Such semidirect products can be axiomatized; for details, we refer to [64]. This can be generalized
to C*-algebras A that have a weakly terminal commutative C*-subalgebra D, in the sense that any
C ∈ C( A) allows an injection C → D. This includes all finite-dimensional C*-algebras, as well as
algebras of all bounded operators on a Hilbert space. For example, for the n-level system A = Mn (C),
the matrices that are diagonal in the standard basis form a terminal subalgebra Cn .
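
The action of U (n) on commutative subalgebras of Mn (C) described above can be illustrated numerically. This is a minimal sketch assuming n = 3; the names `d1`, `d2`, `u` and `rotate` are illustrative only. It checks that conjugating the diagonal subalgebra by a unitary gives matrices that still commute and that are diagonal in the rotated basis.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3

# Two elements of the diagonal subalgebra C^n inside M_n(C).
d1 = np.diag([1.0, 2.0, 3.0]).astype(complex)
d2 = np.diag([0.5, -1.0, 4.0]).astype(complex)

# A unitary u obtained from the QR decomposition of a random complex matrix.
u, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))

def rotate(a):
    """Conjugate a by u, that is, a -> u a u*."""
    return u @ a @ u.conj().T

r1, r2 = rotate(d1), rotate(d2)

# The rotated elements still commute ...
commutes = np.allclose(r1 @ r2, r2 @ r1)
# ... conjugation respects products ...
multiplicative = np.allclose(rotate(d1 @ d2), r1 @ r2)
# ... and they are diagonal in the rotated basis (the columns of u).
back = u.conj().T @ r1 @ u
diagonal_in_rotated_basis = np.allclose(back, np.diag(np.diag(back)))
```

All three flags hold, so u maps the commutative subalgebra of diagonal matrices to another commutative subalgebra.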
However, the mere partially ordered set C( A) cannot detect this unitary action. For this we
need injections rather than inclusions. Therefore, we now switch to a category C ( A) of commutative
C*-subalgebras, with injective ∗-homomorphisms between them. For A = Mn (C), these morphisms
consist of a rotation in U (n) followed by an inclusion Ck → C l with k ≤ l. The following theorem
characterizes this category C ( A) up to equivalence. This is the same as characterizing C( A) up
to Morita equivalence, meaning that it determines the topos of contextual sets on C( A) discussed
in Section 3 up to categorical equivalence, rather than determining C( A) itself up to equivalence.
To phrase the following theorem, we introduce the monoid S( X ) of continuous surjections X → X on
a compact Hausdorff space X. In the finite-dimensional case, this is just the symmetric group S(n).
Because of our switch from C( A) to C ( A), it plays the role of the unitary group we need.

Theorem 11 ([64]). Suppose that a C*-algebra A has a weakly terminal commutative C*-subalgebra C ( X ).
A category is equivalent to C ( A) if and only if it is equivalent to a semidirect product of C(C ( X )) and S( X ).

Using this concrete parametrization of C( A) for A = Mn (C), to characterize C( A) it would
suffice to characterize the unitary group U ( A). Surprisingly, this question is open, even in the
finite-dimensional case. All that seems to be known is that, up to isomorphism, U (1) is the unique
nondiscrete locally compact Hausdorff group all of whose proper closed subgroups are finite [112].
This characterization does not generalize to finite dimensions higher than one, although closed
subgroups have received study in the infinite-dimensional case [113]. The unitary group U (n) is
also, up to isomorphism, the unique irreducible subgroup of GL(n) the trace of whose elements is
bounded [114]. It is known that unitary groups of C*-algebras cannot be countably classified [115].
Finally, the characterization of C( B( H )) for Hilbert spaces H could give rise to a description of the
category of Hilbert spaces in terms of generators and relations [116].

7. Generalizations
As mentioned in the introduction, the idea to describe quantum structures in terms of their
classical substructures applies very generally. This final section discusses to what extent algebraic
quantum theory is special, by considering a generalization as an example of another framework.
Namely, we consider categorical quantum mechanics [117]. This approach formulates quantum
theory in terms of the category of Hilbert spaces, and then abstracts away to more general categories
with the same structures. Specifically, what is retained is the notion of a tensor product to be able
to build compound systems, the notion of entanglement in the form of objects that form a duality
under the tensor product, and the notion of reversibility in the sense that every map between Hilbert
spaces has an adjoint in the reverse direction. It turns out that these primitives suffice to derive a lot
of quantum-mechanical features, such as scalars, the Born rule, no-cloning, quantum teleportation,
and complementarity. As a case in point, one can define so-called Frobenius algebras in any category
with this structure, which is important because of the following theorem.

Theorem 12 ([118,119]). Finite-dimensional C*-algebras correspond to Frobenius algebras in the category of
Hilbert spaces.

The point is that these notions make sense in any category with a tensor product, entanglement,
and reversibility. A different example of such a category is that of sets with relations between them.
That is, objects are sets X, and arrows X → Y are relations R ⊆ X × Y. For the tensor product, we
take the Cartesian product of sets, which makes every object dual to itself, thereby fulfilling the
structure of entanglement, and time reversibility is given by taking the opposite relation R† ⊆ Y × X.
Two relations R ⊆ X × Y and S ⊆ Y × Z compose to S ◦ R = {( x, z) | ∃y : ( x, y) ∈ R, (y, z) ∈ S}.
We may regard this as a toy example of possibilistic quantum theory: rather than complex matrices,
we now care about entries ranging over {0, 1}. A groupoid is a small category, every arrow of which is
an isomorphism; they may be considered as a multi-object generalization of groups.
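
The composition and opposite of relations described above are easy to model with finite sets of pairs. The following sketch uses small hypothetical sets X, Y, Z and relations R, S chosen purely for illustration; the function names `compose`, `dagger` and `identity` are likewise assumptions.

```python
def compose(s, r):
    """Composite S ∘ R = {(x, z) | some y has (x, y) in R and (y, z) in S}."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}

def dagger(r):
    """Opposite relation R† ⊆ Y × X of a relation R ⊆ X × Y."""
    return {(y, x) for (x, y) in r}

def identity(xs):
    """Identity relation on the set xs."""
    return {(x, x) for x in xs}

X, Y, Z = {0, 1}, {"a", "b", "c"}, {"yes", "no"}
R = {(0, "a"), (0, "b"), (1, "c")}                 # R ⊆ X × Y
S = {("a", "yes"), ("b", "yes"), ("c", "no")}      # S ⊆ Y × Z

SR = compose(S, R)                                 # S ∘ R ⊆ X × Z
```

With these choices, S ∘ R relates 0 to "yes" and 1 to "no". Taking opposites reverses composition, (S ∘ R)† = R† ∘ S†, which is the sense in which the dagger models reversibility.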

Theorem 13 ([120]). Frobenius algebras in the category of sets and relations correspond to groupoids.

Algebraic quantum theory, as set out in the introduction, makes perfect sense in categories such
as sets and relations as well [121]. However, in this generality, it is not true that all classical subsystems
determine a quantum system at all. The previous theorem provides a counterexample. In commutative
groupoids, there can only be arrows X → X, for arrows g : X → Y between different objects cannot
commute with their inverse, as g ◦ g^{-1} = 1_Y and g^{-1} ◦ g = 1_X. Therefore, any arrow between different
objects in a groupoid can never be recovered from any commutative subgroupoid.
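
The counterexample can be made concrete in the smallest possible case. The sketch below assumes the pair groupoid on two objects named "X" and "Y", where an arrow is just a (source, target) pair; this encoding and the function names are illustrative assumptions. It shows that an arrow g : X → Y does not commute with its inverse.

```python
def compose(g, f):
    """Composite g ∘ f in the pair groupoid; arrows are (source, target) pairs."""
    if f[1] != g[0]:
        raise ValueError("arrows are not composable")
    return (f[0], g[1])

def inverse(f):
    """Inverse arrow, obtained by swapping source and target."""
    return (f[1], f[0])

g = ("X", "Y")              # an arrow g : X -> Y between different objects
g_inv = inverse(g)          # its inverse, an arrow Y -> X

left = compose(g, g_inv)    # g ∘ g^{-1}, the identity on Y
right = compose(g_inv, g)   # g^{-1} ∘ g, the identity on X
```

Since `left` and `right` are identities on different objects, they differ, so g cannot lie in any commutative subgroupoid.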
Similarly, quantum logic, as discussed in Section 3, makes perfect sense in this general categorical
setting [122]. Moreover, it matches neatly with algebraic quantum theory via taking projections [123].
However, it is no longer true that commutative subalgebras correspond to Boolean sublattices. Again,
a counterexample can be found using Theorem 13 [124].


One could object that commutativity might be too narrow a notion of classicality. However,
consider broadcastability instead: classical information can be broadcast, but quantum information
cannot. More precisely, a Frobenius algebra A is broadcastable when there exists a completely positive
map A → A ⊗ A such that both partial traces are the identity A → A. Again, this makes perfect
sense in general categories. It turns out that the broadcastable objects in the category of sets and
relations are the groupoids that are totally disconnected, in the sense that there are no arrows g : X → Y
between different objects [117]. So even with this more liberal operational notion of classicality, classical
subsystems do not determine a quantum system.
This breaks a well-known information-theoretic characterization of quantum theory, that is
phrased in terms of C*-algebras [125,126]. Hence there is something about (algebraic) quantum
theory beyond the categorical properties of having tensor products, entanglement, and reversibility,
that underwrites Bohr’s doctrine of classical concepts. It relates to characterizing unitary groups,
as discussed in Section 6. We close this overview by raising the interesting interpretational question of
just what this defining property is.

Acknowledgments: Supported by EPSRC Fellowship EP/L002388/1.

Conflicts of Interest: The author declares no conflict of interest.

References
1. Bell, J.S. On the Einstein Podolsky Rosen paradox. Physics 1964, 1, 195–200.
2. Kochen, S.; Specker, E. The problem of hidden variables in quantum mechanics. J. Math. Mech. 1967, 17, 59–87.
3. Pusey, M.; Barrett, J.; Rudolph, T. On the reality of the quantum state. Nat. Phys. 2012, 8, 475–478.
4. Busch, P.; Grabowski, M.; Lahti, P.J. Operational Quantum Physics; Springer: Berlin/Heidelberg, Germany, 1995.
5. Keyl, M. Fundamentals of quantum information theory. Phys. Rep. 2002, 369, 431–548.
6. Kadison, R.V.; Ringrose, J.R. Fundamentals of the Theory of Operator Algebras; Number 15–16 in Graduate
Studies in Mathematics; Academic Press: Cambridge, MA, USA, 1983.
7. Berberian, S.K. Baer ∗ -Rings; Springer: Berlin/Heidelberg, Germany, 1972.
8. Emch, G.G. Mathematical and Conceptual Foundations of 20th-Century Physics, 1st ed.; North-Holland: Amsterdam,
The Netherlands, 1984.
9. Davies, E.B. Quantum Theory of Open Systems; Academic Press: Cambridge, MA, USA, 1976.
10. Earman, J. Superselection rules for philosophers. Erkenn 2008, 69, 377–414.
11. Rédei, M. Quantum Logic in Algebraic Approach; Springer: Cham, The Netherlands, 1998.
12. Haag, R. Local Quantum Physics; Texts and Monographs in Physics; Springer: Berlin/Heidelberg, Germany, 1996.
13. Strocchi, F. An Introduction to the Mathematical Structure of Quantum Mechanics; World Scientific: Singapore, 2008.
14. Emch, G.G. Algebraic Methods in Statistical Mechanics and Quantum Field Theory; Wiley: Hoboken, NJ, USA, 1972.
15. Alberti, P.M.; Uhlmann, A. Existence and density theorems for stochastic maps on commutative C*-algebras.
Math. Nachr. 1980, 97, 279–295.
16. Landsman, N.P. Mathematical Topics between Classical and Quantum Mechanics; Springer: Berlin/Heidelberg,
Germany, 1998.
17. Weaver, N. Mathematical Quantization; Chapman & Hall: London, UK, 2001.
18. Bohr, N. Chapter Discussion with Einstein on epistemological problems in atomic physics. In Albert Einstein:
Philosopher-Scientist; Cambridge University Press: Cambridge, UK, 1949.
19. Heunen, C.; Landsman, N.P.; Spitters, B. A topos for algebraic quantum theory. Commun. Math. Phys. 2009,
291, 63–110.
20. Kadison, R.V.; Singer, I.M. Extensions of pure states. Am. J. Math. 1959, 81, 383–400.
21. Marcus, A.; Spielman, D.A.; Srivastava, N. Interlacing families II: Mixed characteristic polynomials and the
Kadison–Singer problem. Ann. Math. 2015, 182, 327–350.
22. Altepeter, J.B.; James, D.F.V.; Kwiat, P.G. Qubit quantum state tomography. In Quantum State Estimation;
Springer: Berlin/Heidelberg, Germany, 2004.
23. Butterfield, J.; Isham, C.J. A topos perspective on the Kochen–Specker theorem: I. Quantum States as
Generalized Valuations. Int. J. Theor. Phys. 1998, 37, 2669–2733.


24. Constantin, C.M.; Döring, A. Contextual entropy and reconstruction of quantum states. arXiv 2012,
arXiv:1208.2046.
25. Hamhalter, J.; Turilova, E. Orthogonal measures on state spaces and context structure of quantum theory.
Int. J. Theor. Phys. 2016, 55, 3353–3365.
26. Mac Lane, S. Categories for the Working Mathematician, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 1971.
27. Berg, B.; Heunen, C. Noncommutativity as a colimit. Appl. Categorical Struct. 2012, 20, 393–414.
28. Reyes, M.L. Obstructing extensions of the functor Spec to noncommutative rings. Isr. J. Math. 2012, 192,
667–698.
29. Berg, B.; Heunen, C. Extending obstructions to noncommutative functorial spectra. Theory Appl. Categories
2014, 29, 457–474.
30. Döring, A. Kochen–Specker theorem for von Neumann algebras. Int. J. Theor. Phys. 2005, 44, 139–160.
31. Reyes, M.L. Sheaves that fail to represent matrix rings. In Ring theory and Its Applications; American Mathematical
Society: Providence, RI, USA, 2014; Volume 609, pp. 285–297.
32. Alfsen, E.M.; Shultz, F.W. State Spaces of Operator Algebras: Basic Theory, Orientations, and C*-Products;
Birkhäuser: Basel, Switzerland, 2001.
33. Connes, A. Noncommutative Geometry; Academic Press: Cambridge, MA, USA, 1994.
34. Akemann, C.A. The general Stone–Weierstrass problem. J. Funct. Anal. 1969, 4, 277–294.
35. Giles, R.; Kummer, H. A non-commutative generalization of topology. Indiana Univ. Math. J. 1971, 21, 91–102.
36. De Silva, N. From topology to noncommutative geometry: K-theory. arXiv 2014, arXiv:1408.1170.
37. De Silva, N.; Soares Barbosa, R. Partial and total ideals of von Neumann algebras. arXiv 2014, arXiv:1408.1172.
38. Heunen, C.; Lindenhovius, A.J. Domains of commutative C*-subalgebras. In Proceedings of the 2015
30th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), Kyoto, Japan, 6–10 July 2015;
pp. 450–461.
39. Hanche-Olsen, H.; Størmer, E. Jordan Operator Algebras; Pitman Advanced Publishing Program: Boston, MA,
USA, 1984.
40. Hamhalter, J. Isomorphisms of ordered structures of abelian C*-subalgebras of C*-algebras. J. Math. Anal. Appl.
2011, 383, 391–399.
41. Kaplansky, I. Projections in Banach algebras. Ann. Math. 1951, 53, 235–249.
42. Döring, A.; Harding, J. Abelian subalgebras and the Jordan structure of von Neumann algebras. arXiv 2015,
arXiv:1009.4945.
43. Hamhalter, J. Dye’s theorem and Gleason’s theorem for AW*-algebras. J. Math. Anal. Appl. 2015, 422, 1103–1115.
44. Hamhalter, J.; Turilova, E. Structure of associative subalgebras of Jordan operator algebras. Q. J. Math. 2013,
64, 397–408.
45. Johnstone, P.T. Sketches of an Elephant: A Topos Theory Compendium; Clarendon Press: Oxford, UK, 2002.
46. Landsman, N.P. Bohrification: From Classical Concepts to Commutative Operator Algebras; Springer:
Berlin/Heidelberg, Germany, 2017.
47. Johnstone, P.T. Stone Spaces; Number 3 in Cambridge Studies in Advanced Mathematics; Cambridge
University Press: Cambridge, UK, 1982.
48. Banaschewski, B.; Mulvey, C.J. A globalisation of the Gelfand duality theorem. Ann. Pure Appl. Log. 2006,
137, 62–103.
49. Spitters, B.; Vickers, S.; Wolters, S. Gelfand spectra in Grothendieck toposes using geometric mathematics.
Electron. Proc. Theor. Comput. Sci. 2014, 158, 77–107.
50. Fauser, B.; Raynaud, G.; Vickers, S. The Born rule as structure of spectral bundles. Electron. Proc. Theor.
Comput. Sci. 2012, 95, 81–90.
51. Heunen, C.; Landsman, N.P.; Spitters, B. Bohrification. In Deep Beauty: Understanding the Quantum World
through Mathematical Innovation, Halvorson, H., Ed.; Cambridge University Press: Cambridge, UK, 2011;
pp. 217–313.
52. Caspers, M.; Heunen, C.; Landsman, N.P.; Spitters, B. Intuitionistic quantum logic of an n-level system.
Found. Phys. 2009, 39, 731–759.
53. Heunen, C.; Landsman, N.P.; Spitters, B. Bohrification of operator algebras and quantum logic. Synthese
2012, 186, 719–752.
54. Wolters, S. Topos models for physics and topos theory. J. Math. Phys. 2013, 55, 082110.
55. Nuiten, J. Bohrification of local nets. Electron. Proc. Theor. Comput. Sci. 2011, 95, 211–218.


56. Döring, A.; Isham, C.J. Topos Methods in the Foundations of Physics. In Deep Beauty: Understanding the
Quantum World through Mathematical Innovation, Halvorson, H., Ed.; Cambridge University Press: Cambridge,
UK, 2011.
57. Döring, A.; Isham, C.J. New Structure for Physics; Chapter What is a thing? Topos theory in the foundations
of physics. In Lecture Notes in Physics; Springer: Berlin/Heidelberg, Germany, 2011; Volume 813; pp. 753–940.
58. Döring, A.; Isham, C.J. A topos foundation for theories of physics. J. Math. Phys. 2008, 49, 053515.
59. Flori, C. A First Course in Topos Quantum Theory; Lecture Notes in Physics; Springer: Berlin/Heidelberg,
Germany, 2013; Volume 868.
60. Wolters, S. A comparison of two topos-theoretic approaches to quantum theory. Commun. Math. Phys. 2013,
317, 3–53.
61. Joyal, A.; Tierney, M. An Extension of the Galois Theory of Grothendieck (Memoirs of the American Mathematical
Society); Proquest Info & Learning: Ann Arbor, MI, USA, 1984; Volume 51.
62. Heunen, C.; Landsman, N.P.; Spitters, B.; Wolters, S. The Gelfand spectrum of a noncommutative C*-algebra:
A topos-theoretic approach. J. Aust. Math. Soc. 2011, 90, 39–52.
63. Berg, B.; Heunen, C. Erratum to: Noncommutativity as a colimit. Appl. Categorical Struct. 2013, 21, 103–104.
64. Heunen, C. Characterizations of categories of commutative C*-subalgebras. Commun. Math. Phys. 2014,
331, 215–238.
65. Spitters, B. The space of measurement outcomes as a spectral invariant for non-commutative algebras.
Found. Phys. 2012, 42, 896–908.
66. De Groote, H.F. Observables IV: The presheaf perspective. arXiv 2007, arXiv:0708.0677.
67. Döring, A. Quantum states and measures on the spectral presheaf. Adv. Sci. Lett. 2009, 2, 291–301.
68. Loveridge, L.; Dridi, R.; Raussendorf, R. Topos logic in measurement-based quantum computation. Proc. R.
Soc. A 2015, 471, 20140716.
69. Heunen, C.; Landsman, N.P.; Spitters, B. The principle of general tovariance. Int. Fall Workshop Geom. Phys.
2008, 1023, 93–102.
70. Epperson, M.; Zafiris, E. Foundations of Relational Realism: A Topological Approach to Quantum Mechanics and
the Philosophy of Nature; Lexington: Lanham, MD, USA, 2013.
71. Adelman, M.; Corbett, J.V. A sheaf model for intuitionistic quantum mechanics. Appl. Categorical Struct.
1995, 3, 79–104.
72. Takeuti, G. C*-algebras and Boolean-valued analysis. Jpn. J. Math. 1983, 9, 207–245.
73. Ozawa, M. A transfer principle from von Neumann algebras to AW*-algebras. J. Lond. Math. Soc. 1985,
32, 141–148.
74. Ozawa, M. A classification of type I AW*-algebras and Boolean-valued analysis. J. Math. Soc. Jpn. 1984,
36, 589–608.
75. Abramsky, S.; Brandenburger, A. The sheaf-theoretic structure of non-locality and contextuality. New J. Phys.
2011, 13, 113036.
76. Abramsky, S.; Jung, A. Domain Theory. In Handbook of Logic in Computer Science; Oxford University Press:
Oxford, UK, 1994; Volume 3.
77. Gierz, G.; Hofmann, K.H.; Keimel, K.; Lawson, J.D.; Mislove, M.W.; Scott, D.S. Continuous Lattices and Domains;
Number 93 in Encyclopedia of Mathematics and its Applications; Cambridge University Press: Cambridge,
UK, 2003.
78. Döring, A.; Barbosa, R.S. Unsharp values, domains and topoi. In Quantum Field Theory and Gravity:
Conceptual and Mathematical Advances in the Search for a Unified Framework; Springer: Berlin/Heidelberg,
Germany, 2011; pp. 65–96.
79. Lindenhovius, A.J. Classifying finite-dimensional C*-algebras by posets of their commutative C*-subalgebras.
Int. J. Theor. Phys. 2015, 54, 4615–4635.
80. Lindenhovius, A.J. C( A). Ph.D. Thesis, Radboud University, Nijmegen, The Netherlands, 5 July 2016.
81. Jensen, H.E. Scattered C*-algebras. Math. Scand. 1977, 41, 308–314.
82. Kalton, N.J.; Ostrovskii, M.I. Distances between Banach spaces. Forum Math. 1999, 11, 17–48.
83. Haagerup, U.; Winsløw, C. The Effros–Maréchal topology in the space of von Neumann algebras. Am. J. Math.
1998, 120, 567–617.
84. Kadison, R.V.; Kastler, D. Perturbations of von Neumann algebras I: Stability of type. Am. J. Math. 1972,
94, 38–54.


85. Chetcuti, E.; Hamhalter, J.; Weber, H. The order topology for a von Neumann algebra. Stud. Math. 2015,
230, 95–120.
86. Connes, A. A factor not anti-isomorphic to itself. Ann. Math. 1975, 101, 536–554.
87. Spekkens, R.W. The paradigm of kinematics and dynamics must yield to causal structure. Foundational
Questions Institute essay contest winner. arXiv 2013, arXiv:1209.0023.
88. Moffat, J. Groups of Automorphisms of Operator Algebras. Ph.D. Thesis, University of Newcastle upon
Tyne, Newcastle, UK, 1974.
89. Hamhalter, J.; Turilova, E. Automorphisms of ordered structures of abelian parts of operator algebras and
their role in quantum theory. Int. J. Theor. Phys. 2014, 53, 3333–3345.
90. Döring, A. Flows on generalised Gelfand spectra of nonabelian unital C*-algebras and time evolution of
quantum systems. arXiv 2012, arXiv:1212.4882.
91. Heunen, C.; Lindenhovius, A.J. Domains of commutative C*-subalgebras. arXiv 2015, arXiv:1504.02730.
92. Geloun, J.B.; Flori, C. Topos analogues of the KMS state. arXiv 2012, arXiv:1207.0227.
93. Alfsen, E.M.; Shultz, F.W. Orientation in operator algebras. Proc. Natl. Acad. Sci. USA 1998, 95, 6596–6601.
94. Heunen, C.; Reyes, M.L. Active lattices determine AW*-algebras. J. Math. Anal. Appl. 2014, 416, 289–313.
95. Chevalier, G. Automorphisms of an orthomodular poset of projections. Int. J. Theor. Phys. 2005, 44, 985–998.
96. Heunen, C.; Reyes, M.L. Diagonalizing matrices over AW*-algebras. J. Funct. Anal. 2013, 264, 1873–1898.
97. Kornell, A. Quantum Collections. arXiv 2012, arXiv:1202.2994.
98. Kornell, A. V*-algebras. arXiv 2015, arXiv:1502.01516.
99. Heunen, C.; Reyes, M.L. On discretization of C*-algebras. J. Oper. Theory 2017, 77, 19–37.
100. Mackey, G.W. The Mathematical Foundations of Quantum Mechanics; W. A. Benjamin: New York, NY, USA, 1963.
101. Bunce, L.J.; Wright, J.D.M. The Mackey–Gleason problem. Bull. Am. Math. Soc. 1992, 26, 288–293.
102. Hamhalter, J. Quantum Measure Theory; Springer: Berlin/Heidelberg, Germany, 2004.
103. Birkhoff, G. Lattice Theory; American Mathematical Society: Providence, RI, USA, 1948.
104. Stonesifer, J.R.; Bogart, K.P. Characterizations of partition lattices. Algebra Univers. 1984, 19, 92–98.
105. Firby, P.A. Lattices and compactifications I. Proc. Lond. Math. Soc. 1973, 27, 22–50.
106. Gudder, S.P. Partial algebraic structures associated with orthomodular posets. Pac. J. Math. 1972, 41, 717–730.
107. Finch, P.D. On the structure of quantum logic. J. Symb. Log. 1969, 34, 415–425.
108. Hughes, R.I.G. Omnibus review. J. Symb. Log. 1985, 50, 558–566.
109. Scheibe, E. The Logical Analysis of Quantum Mechanics; Pergamon Press: Oxford, UK, 1973.
110. Heunen, C. Piecewise Boolean algebras and their domains. Lect. Notes Comput. Sci. 2014, 8573, 208–219.
111. Flori, C.; Fritz, T. Compositories and gleaves. Theory Appl. Categories 2016, 31, 928–988.
112. Morris, S.A. A characterization of the topological group of real numbers. Bull. Aust. Math. Soc. 1986,
34, 473–475.
113. Kadison, R.V. Infinite unitary groups. Trans. Am. Math. Soc. 1952, 72, 386–399.
114. Marcus, M.; Newman, M. Some results on unitary matrix groups. Linear Algebra Its Appl. 1970, 3, 173–178.
115. Kerr, D.; Lupini, M.; Phillips, N.C. Borel complexity and automorphisms of C*-algebras. J. Funct. Anal. 2015,
268, 3767–3789.
116. Heunen, C. On the functor ℓ2. In Computation, Logic, Games, and Quantum Foundations; Springer:
Berlin/Heidelberg, Germany, 2013; pp. 107–121.
117. Heunen, C.; Vicary, J. Categories for Quantum Theory: An Introduction; Oxford University Press: Oxford, UK, 2017.
118. Vicary, J. Categorical formulation of finite-dimensional quantum algebras. Commun. Math. Phys. 2011,
304, 765–796.
119. Abramsky, S.; Heunen, C. H*-algebras and nonunital Frobenius algebras: First steps in infinite-dimensional
categorical quantum mechanics. Clifford Lect. AMS Proc. Symp. Appl. Math. 2012, 71, 1–24.
120. Heunen, C.; Contreras, I.; Cattaneo, A.S. Relative Frobenius algebras are groupoids. J. Pure Appl. Algebra
2013, 217, 114–124.
121. Coecke, B.; Heunen, C.; Kissinger, A. Categories of quantum and classical channels. Quantum Inf. Process.
2016, 15, 5179–5209.
122. Heunen, C.; Jacobs, B. Quantum logic in dagger kernel categories. Order 2010, 27, 177–212.
123. Heunen, C. Complementarity in categorical quantum mechanics. Found. Phys. 2012, 42, 856–873.
124. Coecke, B.; Heunen, C.; Kissinger, A. Chapter Compositional Quantum Logic. In Computation, Logic, Games,
and Quantum Foundations; Springer: Berlin/Heidelberg, Germany, 2013; pp. 21–36.


125. Clifton, R.; Bub, J.; Halvorson, H. Characterizing quantum theory in terms of information-theoretic
constraints. Found. Phys. 2003, 33, 1561–1591.
126. Heunen, C.; Kissinger, A. Can quantum theory be characterized by information-theoretic constraints? arXiv
2016, arXiv:1604.05948.

Review
Quantum Theory from Rules on
Information Acquisition
Philipp Andres Höhn 1,2
1 Vienna Center for Quantum Science and Technology, University of Vienna, Boltzmanngasse 5, 1090 Vienna,
Austria; [email protected]
2 Institute for Quantum Optics and Quantum Information, Austrian Academy of Sciences, Boltzmanngasse 3,
1090 Vienna, Austria

Academic Editors: Giacomo Mauro D’Ariano and Paolo Perinotti

Received: 23 January 2017; Accepted: 17 February 2017; Published: 3 March 2017

Abstract: We summarize a recent reconstruction of the quantum theory of qubits from rules
constraining an observer’s acquisition of information about physical systems. This review is
accessible and fairly self-contained, focusing on the main ideas and results and not the technical
details. The reconstruction offers an informational explanation for the architecture of the theory
and specifically for its correlation structure. In particular, it explains entanglement, monogamy and
non-locality compellingly from limited accessible information and complementarity. As a by-product,
it also unravels new ‘conserved informational charges’ from complementarity relations that
characterize the unitary group and the set of pure states.

Keywords: reconstruction of quantum theory; entanglement; monogamy; quantum non-locality;
conserved informational charges; limited information; complementarity; characterization of unitary
group and state spaces

1. Introduction
Why is the physical world described by quantum theory? If we wish to sensibly address this
question, we have to step beyond quantum theory and consider it within a landscape of alternative
theories. This, after all, permits us to ponder how the world could have been different, possibly
described by modifications of quantum theory. Such an endeavor forces us to leave the usual textbook
formulation of quantum theory, and everything we take for granted about it, behind and to develop
a more general language that also applies to alternative theories. Ideally, this language should be
operational, encompassing the interactions of some observer with physical systems in a plethora of
conceivable, physically-distinct worlds.
If we wish to also provide a possible answer to the above question, we then have to find
physical properties of quantum theory that single it out, at least within the given landscape of
alternatives. In particular, the goal should be to find an operational justification for the textbook
axioms, i.e., ultimately for complex Hilbert spaces, unitary dynamics, tensor product structure for
composite systems, Born rule, and so on. The result would be a reconstruction of quantum theory from
operational axioms [1–10] and should ideally yield a better understanding of what quantum theory
tells us about Nature; and why it is the way it is.
In this manuscript, we shall review and summarize how the quantum formalism for arbitrarily
many qubits can be reconstructed from operational rules restricting an observer’s acquisition of
information about a set of observed systems [1,2]. The goal of this summary is to provide a
didactical and easily-accessible overview of this reconstruction. Its underlying framework is especially
engineered for unraveling the architecture of quantum theory, and so many reconstruction steps are
instructive for understanding the origin of quantum properties. As we shall see, this reconstruction

Entropy 2017, 19, 98; doi:10.3390/e19030098; www.mdpi.com/journal/entropy


provides a transparent, informational explanation for the structure of qubit quantum theory and
especially also for its paradigmatic features, such as entanglement, monogamy and non-locality.
The approach also produces novel ‘conserved informational charges’, indeed appearing in quantum
theory, that turn out to characterize the unitary group and the set of pure states and which might ﬁnd
practical applications in quantum information.
The premise of the summarized approach is to only speak about information that the
observer has access to. It is thus purely operational and survives without any ontological
commitments. This approach is inspired, in part, by Rovelli’s relational quantum mechanics [11]
and the Brukner–Zeilinger informational interpretation of quantum theory [12,13]; this successful
reconstruction can be viewed as a completion of these ideas for qubit systems.
The rest of the manuscript is organized as follows. In Section 2, we review the landscape of alternative
theories; in Section 3, we formulate the operational quantum axioms; in Section 4, we summarize the
key steps of the reconstruction itself and, ﬁnally, conclude in Section 5.

2. Overview of a Landscape of Theories

We shall begin with an overview of a landscape of alternative theories, which has been developed
in [1,2] to which we also refer for further details.

2.1. From Questions and Answers to Probabilities and States

Our first aim is to define a notion of a state both for a single system and an ensemble of systems.
Consider an observer O who interrogates an ensemble of (identically prepared [1]) systems
$\{S_a\}_{a=1}^{n}$, coming out of a preparation device, with binary questions Qi from some set Q. For example,
in the case of quantum theory, such a question could read “is the spin of the electron up in x-direction?”
This set Q shall only contain repeatable questions in the sense that O will receive m ∈ N times the
same answer whenever asking any Qi ∈ Q m times in immediate succession to a single system Sa .
We shall assume any Sa to always give a definite answer if asked some Qi ∈ Q, which moreover
is not independent of Sa ’s preparation. Accordingly, Q can only contain physically-implementable
questions, which are ‘answerable’ by the {Sa } and not arbitrary logically conceivable binary questions.
Furthermore, since we assume definite answers, we do not address the measurement problem.
The answers to the Qi ∈ Q given by the {Sa } shall follow a specific statistics for each way of preparing
the {Sa } (for n sufficiently large). The set of all the possible answer statistics for all Qi ∈ Q for all
preparations is denoted by Σ.
O, being a good experimenter, has developed, through his experiments, a theoretical model for
Q and Σ which he employs to interpret the outcomes of his interrogations (and to decide whether a
question is in Q or not). This permits O to assign, for the next Sa to be interrogated, a prior probability
yi that Sa ’s answer to Qi ∈ Q will be ‘yes’. Namely, O determines yi through a belief updating—in a
broadly Bayesian spirit—according to his model of Σ, any prior information on the way of preparation
and possibly to the frequencies of ‘yes’ answers to questions from Q, which he may have recorded in
previous interrogation runs on systems identically prepared to Sa . (We add “broadly” here as we also
consider the typical laboratory situation of an ensemble of systems.) In particular, O may also not have
carried out previous interrogations on systems identically prepared to Sa (e.g., if the ensemble contains
only the single Sa ) in which case, he will estimate the prior yi for the single Sa solely according to his
model of Σ and any prior information about the preparation (more on this and update rules will be
discussed in Sections 2.3 and 2.4).
While Q need not necessarily contain all binary measurements that O could, in principle, perform
on the {Sa }, we shall assume that Q is ‘tomographically complete’ in the sense that the {yi }∀ Qi ∈Q
are sufficient to compute the probabilities for all other physically realizable measurements possibly
not contained in the Q, as well. Hence, the yi encode everything O could possibly say about the
future outcomes to arbitrary experiments on the {Sa } in his laboratory. It will therefore be sufficient to
henceforth restrict O to acquire information about the Sa solely through the Qi ∈ Q. It is also natural


to identify O’s ‘catalog of knowledge’ about the given Sa , i.e., the collection of {yi }∀ Qi ∈Q , with the
state of Sa relative to O. This is a state of information and an element of Σ. Conversely, any element in
Σ assigns a probability yi to all Qi ∈ Q. Thus, we identify Σ with the state space of Sa .
The state {yi }∀ Qi ∈Q is the prior state for the single Sa to be interrogated next, but also coincides
with the state O assigns to the ensemble {Sa } (which may only contain a single member) given that its
members are identically prepared [1].

2.2. Time Evolution of O’s “Catalog of Knowledge”

We permit O to subject the {Sa} to interactions, which cause a state $\{y_i(t_0)\}_{\forall Q_i \in Q}$ at time $t_0$ to
evolve in time to another legitimate state. Any permitted time evolution shall be temporally translation
invariant, thus defining a one-parameter map $T_{\Delta t}(\{y_i(t_0)\}_{\forall Q_i \in Q}) = \{y_i(t_0 + \Delta t)\}_{\forall Q_i \in Q}$ from Σ to
itself, which only depends on the time interval Δt, but not on $t_0$. We denote by T the set of all time
evolutions to which we allow O to expose the {Sa }.
Clearly, T is a further crucial ingredient of O’s world model; his model for describing his
interrogations with the {Sa } is thus encoded in the triple (Q, Σ, T ).

2.3. Convexity and State of No Information

It will be our challenge to unravel what O’s world model is. This requires us to subject the triple
(Q, Σ, T ) to a number of further operational conditions that are ‘natural’ in the context of information
acquisition with a broadly Bayesian spirit. Upon imposing the quantum postulates, this will turn
out to restrict Q and T to incorporate only a ‘natural’ subset of all possible quantum measurements
and time evolutions, namely projective binary measurements and unitaries, respectively (rather than
arbitrary positive operator-valued measures (POVMs) and completely positive maps). However, this
sufﬁces for our purposes to reconstruct the textbook quantum formalism.
To account for the possibility of randomness in the method of preparation, we assume Σ to be
convex. Consider a collection of identical systems (i.e., with identical (Q, Σ, T )) that are not necessarily
in identical states and for which O uses a cascade of biased coin tosses to decide which system to
interrogate. Then O is enabled to assign a single prior state to this collection, which is a convex
combination of their individual states.
Next, we assume the existence of a special method of preparation, which generates
completely random answer statistics over all Qi ∈ Q. This preparation is described by a
special state in Σ, namely $y_i = \tfrac{1}{2}$, ∀ Qi ∈ Q, and shall be called the state of no information.
This distinguished state is a constraint on the pair (Q, Σ). (E.g., in quantum theory, the pair
({binary POVMs}, {density matrices}) does not satisfy this condition because there exist inherently
biased POVMs, while ({projective binary measurements}, {density matrices}) does.) It plays two
crucial roles: it deﬁnes (1) the prior state of Sa that O will start with in a Bayesian updating when he
has no ‘prior information’ about the {Sa } (except what his model (Q, Σ, T ) is); and (2) an unambiguous
notion of the (in-)dependence of questions (cf. Section 2.4), which otherwise would be state dependent.
(E.g., in quantum theory, the questions Q x1 = “Is the spin of Qubit 1 up in x-direction?” and Q x2 = “Is
the spin of Qubit 2 up in x-direction?” are independent relative to the completely mixed state, however
not relative to a state with entanglement in x-direction.)

2.4. State Updating and (In)Dependence and Compatibility of Questions

There are two kinds of state update rules, one for the state of the ensemble {Sa } (which coincides
with the prior state assigned to the next Sa to be interrogated) and one for the posterior state of a given
ensemble member Sa . In a single shot interrogation, O receives a single Sa , assigns a prior state to it
according to his prior information (cf. Section 2.1), interrogates it with some questions from Q (without
intermediate re-preparation) and, depending on the answers, updates the prior to a posterior state
valid for this speciﬁc Sa only. This requires a consistent posterior state update rule, which permits
O to update the probabilities yi for all Qi ∈ Q in a manner that respects the structure of Σ and the


repeatability of questions (i.e., an answer Qi = ‘yes’ or ‘no’ must have a posterior yi = 1 or 0 as a
consequence, respectively). This is also a belief updating, but about the single Sa, and is not the same
as in Sections 2.1 and 2.3. Specifically, the posterior state of Sa may differ significantly from its prior
state if O has experienced an information gain on at least some Qi ∈ Q (this will necessarily happen
when complementary questions are involved; see below). This is the ‘collapse’ of the state: it is merely
O’s update of information about the specific Sa [1].
By contrast, in a multiple shot interrogation, O carries out a single shot interrogation on each
member of an entire (identically prepared [1]) ensemble {Sa } to do ensemble state tomography and
estimate the state of the ensemble from his prior information about the preparation and the
collection of posterior states from the single shot interrogations. With every further interrogated Sa , O
updates the ensemble state, which coincides with the prior state of the next system from the ensemble
to be interrogated. Accordingly, this requires a prior state update rule. This is the belief updating
alluded to in Sections 2.1 and 2.3 about the ensemble {Sa }.
It will not be necessary to specify these two update rules in detail; we just assume O uses consistent
ones. Specifically, given a posterior state update rule, we shall call Qi, Qj ∈ Q

• (maximally) independent if, after having asked Qi to S in the state of no information, the posterior probability $y_j = \tfrac{1}{2}$. That is, if the answer to Qi relative to the state of no information tells O ‘nothing’ about the answer to Qj.
• dependent if, after having asked Qi to S in the state of no information, the posterior probability $y_j \neq \tfrac{1}{2}$ (if $y_j = 0$ or 1, they are maximally dependent). That is, if the answer to Qi relative to the state of no information gives O at least partial information about the answer to Qj.
• (maximally) compatible if O may know the answers to both Qi, Qj simultaneously, i.e., if there exists a state in Σ such that $y_i, y_j$ can be simultaneously zero or one.
• (maximally) complementary if every state in Σ which features $y_i = 0$ or 1 necessarily implies $y_j = \tfrac{1}{2}$. Notice that complementarity implies independence (but not vice versa).

(One can also define partial compatibility similarly [1].) These relations shall be symmetric; e.g., Qi is
independent of Qj if and only if Qj is independent of Qi, etc.
We impose a final condition on the posterior state update rule: if Qi, Qj are maximally compatible
and independent, then asking Qi shall not change yj, i.e., O’s information about Qj.

2.5. Informational Completeness

The fundamental building blocks of the theories in the landscape that we are constructing are to
be sets of pairwise independent questions. This will help to render the convoluted parametrization
of a state by {yi }∀ Qi ∈Q more economical. Consider a set of pairwise independent questions Q M :=
{ Q1 , . . . , Q D }; it is called maximal if no question from Q \ Q M can be added to Q M without destroying
the pairwise independence of its elements. We shall assume that any maximal Q M is informationally
complete in the sense that all $\{y_i\}_{\forall Q_i \in Q}$ can be computed from the corresponding probabilities $\{y_i\}_{i=1}^{D}$
for all states in Σ. Any such $Q_M$ features D elements [1] such that Σ becomes a D-dimensional convex
set and states become vectors:

$$\vec{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_D \end{pmatrix}.$$

2.6. Information Measure

Our focus is O’s acquisition of information, so we need to quantify O’s information about the
systems. Since Qi ∈ Q is binary, we quantify O’s information about Sa ’s answer to it by a function α(yi )


with 0 ≤ α(yi) ≤ 1 bit, α(y) = 0 bit ⇔ $y = \tfrac{1}{2}$, and α(1) = α(0) = 1 bit. O’s total information
about a Sa must be a function of the state; we make an additive ansatz:

$$I(\vec{y}) := \sum_{i=1}^{D} \alpha(y_i). \qquad (1)$$

The quantum postulates will single out the specific function α.

Consider a set { Q1 , . . . , Qn } of mutually (maximally) complementary questions. It is clear that
whenever O has maximal information α(yi ) = 1 bit about Qi from this set, he must have zero bits
of information about all other questions in the set. We require more generally that such a set cannot
support more than one bit of information, regardless of the state:

α(y1 ) + · · · + α(yn ) ≤ 1 bit (2)

for otherwise O could, for some states, reduce his total information about such a set by asking another
question from it. These complementarity inequalities represent informational uncertainty relations that
describe how the information gain about one question enforces an information loss about questions
complementary to it (see also the state ‘collapse’ in Section 2.4).
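The complementarity inequality (2) can be checked numerically for a single qubit. The sketch below rests on two assumptions that are not stated in this section: the quadratic measure α(y) = (2y − 1)², which is the form the cited reconstruction [1,2] arrives at, and the standard quantum relation y_i = (1 + r_i)/2 between the ‘yes’ probabilities for spin up along x, y, z and the Bloch vector r. The example states are hypothetical.

```python
import math


def alpha(y):
    """Assumed quadratic information measure (in bits) about one binary question."""
    return (2.0 * y - 1.0) ** 2


def probabilities_from_bloch(r):
    """'Yes' probabilities for spin up along x, y, z, given a Bloch vector r."""
    return [(1.0 + component) / 2.0 for component in r]


# Hypothetical example states, labelled by their Bloch vectors.
examples = {
    "spin up along x": (1.0, 0.0, 0.0),
    "pure, diagonal direction": (1 / math.sqrt(3),) * 3,
    "mixed": (0.3, 0.2, 0.1),
    "no information": (0.0, 0.0, 0.0),
}

totals = {}
for name, r in examples.items():
    y = probabilities_from_bloch(r)
    # Left-hand side of inequality (2) for the three complementary questions.
    totals[name] = sum(alpha(yi) for yi in y)
    print(f"{name}: total information = {totals[name]:.3f} bit")
```

Under these assumptions the sum equals the squared length of the Bloch vector, so it reaches one bit exactly for pure states and stays below it for mixed ones.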

2.7. Composite Systems and (Classical) Rules of Inference

O must be able to decompose a composite system into its constituents purely by means of the
information accessible to him through interrogation and thus ultimately by means of the question sets.
Let systems S A , SB have question sets Q A , Q B . It is then natural to say that they deﬁne a composite
system S AB if any Q a ∈ Q A is maximally compatible with any Qb ∈ Q B and if:

Q AB = Q A ∪ Q B ∪ Q̃ AB , (3)

where $\tilde{Q}_{AB}$ only contains composite questions, which are iterative compositions, $Q_a *_1 Q_b$, $Q_a *_2 (Q'_a *_3 Q_b)$, $(Q_a *_4 Q_b) *_5 Q'_b$, $(Q_a *_6 Q_b) *_7 (Q'_a *_8 Q'_b)$, …, via some logical connectives $*_1, *_2, *_3, \ldots$,
of individual questions $Q_a, Q'_a, \ldots \in Q_A$ about $S_A$ and $Q_b, Q'_b, \ldots \in Q_B$ about $S_B$. This definition is
extended recursively to composite systems with more than two subsystems.
Since O can never test the truthfulness of statements about the logical connectives of
complementary questions through interrogations and since all propositions must have operational
meaning, we shall permit O to logically connect two (possibly composite) questions directly with some
∗ only if they are compatible. For the same reason, O is allowed to apply classical rules of inference
(in terms of Boolean logic) exclusively to sets of mutually-compatible questions.
We stress that this deﬁnition of composite systems is distinct from the usual state tensor product
rule in generalized probabilistic theories coming from local tomography [3–5]. In particular, this
composition rule admits non-locally tomographic composites (see Section 4.3).

2.8. Computing Probabilities and Questions as Vectors

Thanks to informational completeness, the probability function Y ( Q|y) ∈ [0, 1] that Q = ‘yes’,
given the state y, exists for all Q ∈ Q and y ∈ Σ. As shown in [2], the exhibited structure yields:

$$Y(Q|\vec{y}) = Y(\vec{q}|\vec{y}) = \tfrac{1}{2}\left(\vec{q} \cdot (2\vec{y} - \vec{1}) + 1\right), \qquad (4)$$

where $\vec{q} \in \mathbb{R}^D$ is a question vector encoding Q ∈ Q and $\vec{1}$ is a vector with each coefficient equal to one
in the basis corresponding to $Q_M$. This equation gives rise to (part of) the Born rule.
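Equation (4) is simple enough to evaluate directly. The following is a minimal sketch; the function name `yes_probability`, the dimension D = 3 and the example state are illustrative assumptions, not part of the reconstruction.

```python
from fractions import Fraction


def yes_probability(q, y):
    """Evaluate Y(q|y) = (q . (2y - 1) + 1) / 2 for question vector q and state y."""
    assert len(q) == len(y)
    return (sum(qi * (2 * yi - 1) for qi, yi in zip(q, y)) + 1) / 2


D = 3  # illustrative dimension (assumption)
half = Fraction(1, 2)
no_information = [half] * D
y = [Fraction(9, 10), Fraction(1, 2), Fraction(2, 5)]  # hypothetical state

# Unit vectors encode the questions of the informationally complete set itself.
basis_questions = [[1 if i == j else 0 for j in range(D)] for i in range(D)]

recovered = [yes_probability(q, y) for q in basis_questions]
at_no_information = [yes_probability(q, no_information) for q in basis_questions]

# By (4), flipping the sign of q yields the complementary probability.
negated = yes_probability([-1, 0, 0], y)

print(recovered, at_no_information, negated)
```

The unit question vectors return the components y_i of the state, and every question has probability 1/2 in the state of no information, as the formula requires.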
Suppose Q, Q′ ∈ Q were both encoded by the same q. Then, by (4), they would be probabilistically
indistinguishable, and O must view them as logically equivalent. O is free to remove any such
redundancy from his description of Q upon which every permissible question vector q will encode


a unique Q ∈ Q. Finally, for every Q ∈ Q, there exists a state yQ , which is the updated posterior
state of Sa after O received a ‘yes’ answer to the single question Q from Sa in the (prior) state of no
information. O had zero bits of information before, and yQ encodes a single independent question
answer, so we naturally require that it encodes one independent bit. Hence, for every Q ∈ Q, there
exists yQ ∈ Σ with I(yQ) = 1 bit, such that Y(Q|yQ) = 1. (In quantum theory, the yQ will only turn
out to be pure states for a single qubit; e.g., for two qubits and Q = ‘Is the spin of Qubit 1 up in
z-direction?’, represented by the rank-two projector $P_{z_1} = \tfrac{1}{2}(\mathbf{1} + \sigma_z \otimes \mathbf{1}_{2\times 2})$, yQ corresponds to the
mixed state $\rho_{z_1} = \tfrac{1}{4}(\mathbf{1} + \sigma_z \otimes \mathbf{1}_{2\times 2})$. Clearly, $\mathrm{tr}(P_{z_1}\rho_{z_1}) = 1$.)
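The two-qubit example in the parenthesis can be verified with explicit matrices. The sketch below uses nested lists and exact fractions; the helper names (`kron`, `matmul`, `trace`, `scaled_sum`) are hypothetical and exist only for this illustration.

```python
from fractions import Fraction


def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matmul(a, b):
    """Product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def scaled_sum(scale, a, b):
    """Return scale * (a + b), entry by entry."""
    return [[scale * (x + y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


identity2 = [[1, 0], [0, 1]]
sigma_z = [[1, 0], [0, -1]]
identity4 = kron(identity2, identity2)
sigma_z_on_qubit_1 = kron(sigma_z, identity2)

P_z1 = scaled_sum(Fraction(1, 2), identity4, sigma_z_on_qubit_1)    # rank-two projector
rho_z1 = scaled_sum(Fraction(1, 4), identity4, sigma_z_on_qubit_1)  # mixed state

yes_probability = trace(matmul(P_z1, rho_z1))  # Born rule for the question Q
purity = trace(matmul(rho_z1, rho_z1))         # below 1 for a mixed state

print(yes_probability, purity)
```

The ‘yes’ probability comes out as one although the purity is only 1/2, which matches the remark that yQ carries one independent bit but is not a pure two-qubit state.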

3. The Quantum Principles as Rules Constraining O’s Information Acquisition

In the sequel, we consider the most elementary of information carriers. Within the introduced
landscape of theories, we now establish rules on O’s acquisition of information that single out the
quantum theory of a composite system S N of N ∈ N qubits, modeled in our language by a triple
(Q N , Σ N , T N ). Effectively, these rules constitute a set of ‘coordinates’ for quantum theory on this
landscape. The rules are spelled out ﬁrst colloquially, then mathematically and are motivated in more
detail in [1,2].
Empirically, the information accessible to an experimenter about (characteristic properties of)
elementary systems is limited. For example, an experimenter may know one binary proposition about
an electron (e.g., its spin in x-direction), but nothing fully independent of it (and similarly for a classical
bit). We shall characterize a composition of N elementary systems according to how much information
is, in principle, simultaneously available to O.

Rule 1. (Limited information) “The observer O can acquire maximally N ∈ N independent bits of
information about the system S N at any moment of time.”
There exists a maximal set Qi , i = 1, . . . , N, of N mutually maximally independent and compatible questions
in Q N .

O can thereby distinguish maximally $2^N$ states of $S_N$ in a single shot interrogation.

However, empirically, elementary systems admit more independent propositions than what,
due to the information limit, they are able to answer at a time. This is Bohr’s complementarity.
The unanswered properties must be random (and so ‘in superposition’) because the information
limit makes it impossible to ascribe deﬁnite outcomes to them. For example, an experimenter may
also inquire about the spin of the electron in y-direction. Yet doing so is at the total expense of his
information about its spin in the x- and z-directions, and subsequent such measurements have random
outcomes. For the N elementary systems, we assert the existence of complementarity.

Rule 2. (Complementarity) “The observer O can always get up to N new independent bits of
information about the system S N . However, whenever O asks S N a new question, he experiences no
net loss in his total amount of information about S N .”
There exists another maximal set $Q'_i$, i = 1, . . . , N, of N mutually maximally independent and compatible
questions in $Q_N$, such that $Q_i, Q'_i$ are maximally complementary and $Q_i, Q'_{j \neq i}$ are maximally compatible.

The peculiar mathematical form of Rule 2 becomes intuitive upon recalling that S N is a composite
system, such that complementarity should exist per elementary system [1].
Rules 1 and 2 are conceptually inspired by (non-technical) proposals made by Rovelli [11] and
Zeilinger and Brukner [12,13]. These rules say nothing about what happens in-between interrogations.
Naturally, we demand O not to gain or lose information without asking questions.

Rule 3. (Information preservation) “The total amount of information O has about (an otherwise
non-interacting) S N is preserved in-between interrogations.”
I (y) is constant in time in-between interrogations for (an otherwise non-interacting) S N .


Hence, O’s total information I (y) is a ‘conserved charge’ of any time evolution TΔt ∈ T N .
The more interactions O may subject $S_N$ to, the more ways in which any state
may, in principle, change in time and, thus, the more ‘interesting’ O’s world. We therefore demand
that any time evolution is physically realizable as long as it is consistent with the other rules (since
Σ N , T N are interdependent, this is distinct from ‘maximizing the number’ of states).

Rule 4. (Time evolution) “O’s ‘catalog of knowledge’ about S N evolves continuously in time in-between
interrogations, and every consistent such evolution is physically realizable.”
$T_N$ is the maximal set of transformations $T_{\Delta t}$ on states such that, for any fixed state y, $T_{\Delta t}(y)$ is continuous in
Δt and compatible with Rules 1–3 (and the structure of the theory landscape).

(If we did not require this ‘maximality’ of $T_N$, we would still ultimately obtain a linear, unitary
evolution, but not necessarily the full unitary group. This is the sole reason for demanding ‘maximality’.
Note that Rules 3 and 4 are not equivalent to the axiom of ‘continuous reversibility’ of generalized
probabilistic theories [3–5].)
We shall also allow O to ask any question to S N which ‘makes (probabilistic) sense’.

Rule 5. (Question unrestrictedness) “Every question that yields legitimate probabilities for every way of
preparing S N is physically realizable by O.”
Every question vector $\vec{q} \in \mathbb{R}^{D_N}$ that satisfies $Y(\vec{q}|\vec{y}) \in [0, 1]$ ∀ $\vec{y} \in \Sigma_N$ and for which there exists $\vec{y}_Q \in \Sigma_N$
with $I(\vec{y}_Q) = 1$ bit, such that $Y(\vec{q}|\vec{y}_Q) = 1$, corresponds to a $Q \in Q_N$.

(Without Rule 5, we would still obtain the structure of an informationally complete set $Q_{M_N}$,
finding that it encodes a basis of projective Pauli operator measurements [2]; Rule 5 legalizes all
such measurements.)
These five rules turn out to leave two solutions for the triple $(Q_N, \Sigma_N, T_N)$. Remarkably, they
cannot distinguish between complex and real numbers. Namely, the two solutions are qubit quantum theory and
rebit quantum theory (two-level systems over real Hilbert spaces) [1,2]. Since rebit theory is both
mathematically and physically a subcase of qubit theory, these five rules can be regarded as sufficient.
However, if one also wishes to discriminate rebits operationally, then an extra rule, adapted from [3–5]
and imposed solely for this purpose (it is partially redundant), succeeds.

Rule 6. (Tomographic locality) “O can determine the state of the composite system S N by interrogating only
its subsystems.”

As shown in [1,2], Rules 1–6 are equivalent to the textbook axioms. More precisely:

Claim. The only solution to Rules 1–6 is qubit quantum theory where:

• $\Sigma_N \simeq$ convex hull of $\mathbb{CP}^{2^N - 1}$ is the space of $2^N \times 2^N$ density matrices over $\mathbb{C}^{2^N}$,
• states evolve unitarily according to $T_N \simeq \mathrm{PSU}(2^N)$ and the equation describing the state dynamics is (equivalent to) the von Neumann evolution equation,
• $Q_N \simeq \mathbb{CP}^{2^N - 1}$ is (isomorphic to) the set of projective measurements onto the +1 eigenspaces of N-qubit Pauli operators (a Hermitian operator on $\mathbb{C}^{2^N}$ is a Pauli operator iff it has two eigenvalues ±1 of equal multiplicity), and the probability for $Q \in Q_N$ to be answered with ‘yes’ in some state is given by the Born rule for projective measurements.

4. Synopsis of the Reconstruction Steps and Key Results

Since the approach gives rise to a constructive derivation of the explicit architecture of qubit quantum theory,
it involves a large number of individual steps compared to the rather abstract reconstructions [3–10].
However, this is also rewarding as it offers novel informational explanations for typical features of


quantum theory, and so many reconstruction steps are actually quite instructive. We now provide a
summary of key results and reconstruction steps from [1,2] (to which we refer for technical details)
needed for proving the claim of the previous section.

4.1. Logical Connectives for Building Informationally Complete Sets

The first task is to build informationally complete sets $Q_{M_N}$ [1]. The conjunction of Rules 1 and 2
implies that $Q_{M_1} = \{Q_1, Q_2, \ldots, Q_{D_1}\}$ for a single elementary system must be a maximal mutually
complementary set with $D_1 \geq 2$. We changed notation slightly compared to Rules 1 and 2, labeling
complementary questions by numbers, not primes. Of course, in quantum theory, $D_1 = 3$; the more
involved N = 2 case will entail this. The structure (3) of a composite system implies that $Q_{M_2}$ should
contain individual questions about its subsystems. Continuing with a slight change of notation, we
denote $Q_{M_1}$ for System 1 by $\{Q_1, Q_2, \ldots, Q_{D_1}\}$ and for System 2 with a prime by $\{Q'_1, Q'_2, \ldots, Q'_{D_1}\}$.
Apart from these individual questions, $Q_{M_2}$ should contain composite questions $Q_i * Q'_j$ for some
connective ∗. Pairwise independence of $Q_{M_2}$ enforces that ∗ must satisfy the following truth table,
where ‘yes’ = 1 and ‘no’ = 0 ($Q_i, Q'_j$ are compatible) [1]:

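$$\begin{array}{cc|c} Q_i & Q'_j & Q_i * Q'_j \\ \hline 0 & 1 & a \\ 1 & 0 & a \\ 1 & 1 & b \\ 0 & 0 & b \end{array} \qquad a \neq b, \quad a, b \in \{0, 1\}. \qquad (5)$$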

Hence, ∗ is either the XNOR ↔ (for a = 0, b = 1) or its negation, the XOR ⊕ (for a = 1, b = 0). Up to
an overall negation ¬, the two connectives are logically equivalent, and so, we henceforth make the
convention to only build up composite questions (for informationally complete sets) using the XNOR.
The composite question $Q_{ij} := Q_i \leftrightarrow Q'_j$ is a ‘correlation question’, representing “are the answers to
$Q_i, Q'_j$ the same?” Ultimately, in quantum theory, ↔ will turn out to correspond to the tensor product
⊗ in σi ⊗ σj where σi is a Pauli matrix; $Q_{ij}$ will then correspond to “are the spins of Qubit 1 in the i-
and of Qubit 2 in the j-direction correlated?”
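The restriction to XNOR and XOR can be reproduced by brute force over all 16 binary connectives. This is a simplified classical sketch, not the argument of [1]: it assumes that the two inputs are answered uniformly at random and independently, as in the state of no information, and it models pairwise independence as the requirement that the composite answer stays unbiased once either input is known.

```python
from itertools import product

INPUTS = list(product((0, 1), repeat=2))  # possible answers to (Q_i, Q'_j)


def is_independent_of_inputs(table):
    """True if the output stays unbiased after learning either single input.

    Assumption: both inputs are uniformly random and independent.
    """
    for position in (0, 1):
        for value in (0, 1):
            outputs = [table[inp] for inp in INPUTS if inp[position] == value]
            if sum(outputs) != 1:  # need exactly one 'yes' among the two cases
                return False
    return True


solutions = []
for outputs in product((0, 1), repeat=4):
    table = dict(zip(INPUTS, outputs))
    if is_independent_of_inputs(table):
        solutions.append(table)

xnor = {inp: int(inp[0] == inp[1]) for inp in INPUTS}
xor = {inp: int(inp[0] != inp[1]) for inp in INPUTS}

print(len(solutions), xnor in solutions, xor in solutions)
```

Only two truth tables survive, and they are the a = 0, b = 1 and a = 1, b = 0 instances of table (5).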

4.2. Question Graphs, Independence and Compatibility for N = 2 and Entanglement

It is convenient to represent questions graphically: individual questions are represented as
vertices and bipartite correlation questions as edges between them. For instance, we may have:

[Figure: three example question graphs for system 1 and system 2. The individual questions $Q_1, \ldots, Q_{D_1}$ of system 1 and $Q'_1, \ldots, Q'_{D_1}$ of system 2 are drawn as vertices; correlation questions such as $Q_{11}$, $Q_{22}$, $Q_{31}$ and $Q_{23}$ are drawn as edges between them.]

Since O is only allowed to connect compatible questions logically, there can be no edge between
individual questions of the same system.
Using only Rules 1 and 2 and logical arguments, the following result is proven in [1]:

Lemma 1. $Q_i, Q'_j, Q_{ij}$ are pairwise independent for all $i, j = 1, \ldots, D_1$ and will thus be part of an
informationally complete set $Q_{M_2}$. Furthermore:
(i) $Q_i$ is compatible with $Q_{ij}$, ∀ $j = 1, \ldots, D_1$, and complementary to $Q_{kj}$, ∀ $k \neq i$ and ∀ $j = 1, \ldots, D_1$.
That is, graphically, an individual question $Q_i$ is compatible with a correlation question $Q_{ij}$ if and only if


its corresponding vertex is a vertex of the edge corresponding to $Q_{ij}$. By symmetry, the analogous result
holds for $Q'_j$.
(ii) $Q_{ij}$ and $Q_{kl}$ are compatible if and only if $i \neq k$ and $j \neq l$. That is, graphically, $Q_{ij}$ and $Q_{kl}$ are compatible
if their corresponding edges do not intersect in a vertex and complementary if they intersect in one vertex.

For example, Q1 in the third question graph above is compatible with Q11 and complementary to
Q22 , while Q11 and Q22 are compatible and Q11 and Q31 are complementary.
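The graphical rules of Lemma 1 translate directly into a few lines of code. In this sketch an edge (i, j) stands for the correlation question Q_ij; the function names and the choice D1 = 3 are assumptions made for illustration.

```python
from itertools import combinations

D1 = 3  # illustrative choice (assumption)
edges = [(i, j) for i in range(1, D1 + 1) for j in range(1, D1 + 1)]  # Q_ij


def edges_compatible(e, f):
    """Lemma 1 (ii): two distinct edges are compatible iff they share no vertex."""
    return e[0] != f[0] and e[1] != f[1]


def vertex_compatible_with_edge(system, index, edge):
    """Lemma 1 (i): a vertex is compatible with an edge iff it is one of its endpoints."""
    return edge[system - 1] == index


def is_compatible_set(subset):
    return all(edges_compatible(e, f) for e, f in combinations(subset, 2))


# A largest set of pairwise compatible correlation questions, found by brute force.
largest = max(
    (s for r in range(1, len(edges) + 1)
     for s in combinations(edges, r) if is_compatible_set(s)),
    key=len,
)

print(edges_compatible((1, 1), (2, 2)))           # Q11 and Q22
print(edges_compatible((1, 1), (3, 1)))           # Q11 and Q31 share a vertex
print(vertex_compatible_with_edge(1, 1, (2, 2)))  # Q1 of system 1 versus Q22
print(len(largest))
```

The brute-force search finds that no more than D1 correlation questions can be pairwise compatible, since compatible edges must not share a vertex.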
This lemma has a striking consequence: it implies entanglement. Indeed, since, e.g., Q11 and
Q22 are independent and compatible, O may spend his maximally accessible amount of N = 2
independent bits of information (Rule 1) over correlation questions only. Since non-intersecting edges
do not share a common vertex, the lemma implies that no individual question is simultaneously
compatible with two correlation questions that are compatible. Hence, when knowing the answers to
Q11 , Q22 , O will be entirely ignorant about the individual questions; O has then maximal information
about S2 , but purely composite information. This is entanglement in the very sense of Schrödinger
(“...the best possible knowledge of a whole does not necessarily include the best possible knowledge of all its
parts...” [14]). For example, in quantum theory, a state with Q11 = Q22 = ‘yes’ will coincide with a
Bell state having the spins of Qubits 1 and 2 correlated in x- and y-direction (and anti-correlated in
z-direction). Of course, there is nothing special about Q11 , Q22 , and the argument works similarly for
other composite question pairs and can be extended also to states with non-maximal entanglement
(see [1] for details).
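The Bell-state example can be checked with Pauli matrices. The sketch assumes the identification stated in the text, namely that Q_ij corresponds to σi ⊗ σj with ‘yes’ as the +1 eigenvalue; the helper functions and the unnormalized vector `bell` are illustrative.

```python
def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matmul(a, b):
    """Product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def apply(matrix, vector):
    """Matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


sigma_x = [[0, 1], [1, 0]]
sigma_y = [[0, -1j], [1j, 0]]
sigma_z = [[1, 0], [0, -1]]

xx = kron(sigma_x, sigma_x)  # assumed operator for Q11
yy = kron(sigma_y, sigma_y)  # assumed operator for Q22
zz = kron(sigma_z, sigma_z)  # assumed operator for Q33

product_xx_yy = matmul(xx, yy)
minus_zz = [[-entry for entry in row] for row in zz]

# Unnormalized Bell state |01> + |10> in the basis |00>, |01>, |10>, |11>.
bell = [0, 1, 1, 0]

print(product_xx_yy == minus_zz)
print(apply(xx, bell) == bell, apply(yy, bell) == bell)
print(apply(zz, bell) == [-v for v in bell])
```

Because the product of the first two operators is minus the third, answers ‘yes’ to Q11 and Q22 fix the z-correlation to ‘no’, which is the anti-correlation mentioned above.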
For systems with limited information content, entanglement is therefore a direct consequence of
complementarity; without it there would be no independent and compatible composite questions
sufficient to saturate the information limit [1]. For instance, two classical bits satisfy Rule 1, as well,
but admit no complementarity so that $Q^{\mathrm{cbit}}_{M_2} = \{Q_1, Q'_1, Q_{11}\}$ and the maximum amount of N = 2
independent bits cannot be spent on composite questions only.


We also note that Rules 1 and 2 offer a simple, intuitive explanation for monogamy of entanglement.
Consider, for a moment, N = 3 elementary systems S A , SB , SC , and suppose S A and SB are maximally
entangled (say, because O received the answer Q11 = Q22 = ‘yes’ from S AB ). Noting that S AB
is a composite bipartite system inside the tripartite S ABC , O has then already spent his maximal
amount of information of N = 2 independent bits, which he may know about S AB and can therefore
not know anything else that is independent, including non-trivial correlations with SC , about the
pair. To saturate the N = 3 independent bit limit for the tripartite system S ABC , he may then only
inquire about individual information about SC . This is monogamy in its extreme form: the maximally
entangled pair S AB cannot be entangled with any other system SC . This heuristic argument can be
made rigorous in terms of the compatibility and independence structure of questions for N ≥ 3 and
can be extended to the non-extremal case using informational monogamy inequalities [1].

4.3. A Logical Explanation for the Three-Dimensionality of the Bloch Ball

A key result of the reconstruction, proven in [1] is the following. Since its proof is instructive and
representative for this approach, we shall rephrase it here.

Theorem 1. D1 = 2 or 3.

Proof. Consider the N = 2 case. Lemma 1 implies that any maximal set of pairwise compatible
correlation questions has D1 elements. Indeed, there are maximally D1 non-intersecting edges between


the D1 vertices of System 1 and the D1 vertices of System 2; e.g., the D1 ‘diagonal’ questions $Q_{11}, Q_{22}, Q_{33}, \ldots, Q_{D_1 D_1}$
are pairwise independent and compatible. The constraints on the posterior state update rule in
Section 2.4 entail that they are also mutually compatible (Specker’s principle) [1] such that O may
simultaneously know the answers to all D1 Qii . Since O may not know more than N = 2 independent
bits (Rule 1), the D1 Qii cannot be mutually independent if D1 > 2. Thus, assuming the Qii are of
equivalent status, the answers to any pair of them, say Q11 , Q22 , must imply the answers to all others,
say $Q_{ii}$, $i = 3, \ldots, D_1$. Hence, $Q_{jj} = Q_{11} * Q_{22}$, $j \neq 1, 2$, for a connective ∗ that preserves pairwise
independence of $Q_{11}, Q_{22}, Q_{jj}$. Reasoning as in (5) implies that either:

Q jj = Q11 ↔ Q22 , or Q jj = ¬( Q11 ↔ Q22 ), j = 3, . . . , D1 (6)

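so that, for D1 > 3, the $Q_{jj}$ with $j = 3, \ldots, D_1$ would all coincide with $Q_{11} \leftrightarrow Q_{22}$ up to negation and hence could not be pairwise independent. Arguing identically for all
other sets of D1 pairwise independent and compatible $Q_{ij}$, we conclude that D1 ≤ 3.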

This theorem has several crucial repercussions. We may already suggestively call D1 = 2 and
D1 = 3 the ‘rebit’ (two-level systems over real Hilbert spaces) and ‘qubit’ case, respectively. Reasoning
as in (6) shows that the Qij are logically closed under ↔; as demonstrated in [1]:

Theorem 2. If D1 = 3, then $Q_{M_2} := \{Q_i, Q'_j, Q_{ij}\}_{i,j=1,2,3}$ is logically closed under ↔ and, thus, constitutes
an informationally complete set for N = 2 with D2 = 15.
If D1 = 2, then $Q_{M_2} = \{Q_i, Q'_j, Q_{ij}, Q_{11} \leftrightarrow Q_{22}\}_{i,j=1,2}$ is logically closed under ↔ and, thus, constitutes
an informationally complete set for N = 2 with D2 = 9. Furthermore, $Q_{11} \leftrightarrow Q_{22}$ is complementary to the
individual questions $Q_i, Q'_j$, i, j = 1, 2.

Indeed, D2 = 9, 15 are the correct numbers of degrees of freedom for N = 2 rebits and qubits,
respectively. However, since the composite question Q11 ↔ Q22 is complementary to all individual
questions in the rebit case (this is not true in the qubit case!), it is impossible for O to do ensemble state
tomography by asking only individual questions $Q_i, Q'_j$, thereby violating Rule 6. We are left with the
qubit case and shall henceforth ignore rebits (for rebits see [1]).

4.4. Ruling out Local Hidden Variables and the Correlation Structure for N = 2
Using (6) and repeating the argument leading to it for ‘non-diagonal’ Qij show that either:

Q11 ↔ Q22 = Q12 ↔ Q21 , or Q11 ↔ Q22 = ¬( Q12 ↔ Q21 ). (7)

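The first case (without relative negation) is the case of classical logic and is compatible with local hidden
variables for the individual questions $Q_i, Q'_j$. Namely, using $Q_{ij} = Q_i \leftrightarrow Q'_j$, note that $Q_{11} \leftrightarrow Q_{22} = Q_{12} \leftrightarrow Q_{21}$ can be
rewritten in terms of the individuals as:

$(Q_1 \leftrightarrow Q'_1) \leftrightarrow (Q_2 \leftrightarrow Q'_2) = (Q_1 \leftrightarrow Q'_2) \leftrightarrow (Q_2 \leftrightarrow Q'_1).$ (8)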


Suppose for a moment that $Q_1, Q'_1, Q_2, Q'_2$ had simultaneous definite values (although not accessible
to O). It is easy to convince oneself that any distribution of simultaneous truth values over the $Q_i, Q'_j$
satisfies (8) [1]. In fact, (8) is a classical logical identity and can be argued to follow from classical
rules of inference [1]. However, it involves complementary individual questions, thereby violating
our premise from Section 2.7 that O may apply classical rules of inference exclusively to mutually
compatible questions. This classical case is thus ruled out.
One can check that the second case, Q11 ↔ Q22 = ¬( Q12 ↔ Q21 ), does not admit a local hidden
variable interpretation, but is consistent with the structure of the theory landscape and rules [1].
Since one of the two cases (7) must be true, we conclude that this second case holds. In fact, for any
complementary pairs $Q, Q'$ and $Q'', Q'''$ such that both $Q$ and $Q'$ are compatible with both $Q'', Q'''$,
one finds similarly [1]:

$(Q \leftrightarrow Q'') \leftrightarrow (Q' \leftrightarrow Q''') = \neg\big((Q \leftrightarrow Q''') \leftrightarrow (Q' \leftrightarrow Q'')\big).$ (9)

This precludes reasoning classically about the distribution of truth values over O’s questions.
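That the first case of (7) is a classical identity, while the second case admits no classical assignment, can be checked by enumerating all 16 truth-value assignments to the four individual questions. The sketch below is only an illustration; the variable names (`q1p` standing for the system-2 question paired with `q1`, and so on) are hypothetical.

```python
from itertools import product

def xnor(a, b):
    """Truth value of a <-> b."""
    return a == b

def q11_xnor_q22(q1, q1p, q2, q2p):
    # (Q1 <-> Q1') <-> (Q2 <-> Q2'), with primed questions on system 2
    return xnor(xnor(q1, q1p), xnor(q2, q2p))

def q12_xnor_q21(q1, q1p, q2, q2p):
    # (Q1 <-> Q2') <-> (Q2 <-> Q1')
    return xnor(xnor(q1, q2p), xnor(q2, q1p))

assignments = list(product((False, True), repeat=4))

# First case of (7): equality holds for every classical assignment.
classical_case = [q11_xnor_q22(*v) == q12_xnor_q21(*v) for v in assignments]

# Second case of (7): equality with a relative negation.
negated_case = [q11_xnor_q22(*v) == (not q12_xnor_q21(*v)) for v in assignments]

print(all(classical_case), any(negated_case))
```

The first list contains only `True` and the second only `False`: no simultaneous assignment of definite values reproduces the relative negation, which is the sense in which the second case excludes local hidden variables for the individual questions.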
Equation (9) permits us to unravel the complete correlation structure for Q M2 . In fact, it turns
out that there are two distinct representations of this correlation structure: one corresponding to
quantum theory in its standard representation, the other to its ‘mirror’ representation, related by a
passive (not a physical) transformation, reassigning Q1 → ¬ Q1 (in quantum theory tantamount to a
partial transpose on qubit 1) [1]. The two distinct representations turn out to be physically equivalent,
and so, a convention has to be made. Choosing the ‘standard’ case and using (9), one finds that
the compatibility and correlation structure of $\mathcal{Q}_{M_2}$ can be represented graphically as in Figure 1.
For $Q, Q', Q''$ compatible, we shall henceforth distinguish between:

even correlation: if $Q = Q' \leftrightarrow Q''$ and

odd correlation: if $Q = \neg(Q' \leftrightarrow Q'')$.

Figure 1. The compatibility and correlation structure of the informationally complete set $\mathcal{Q}_{M_2}$ for the
N = 2 qubit case. Two questions are compatible if connected by a triangle edge and complementary
otherwise. Red and green triangles denote odd and even correlation, respectively; e.g., $Q_{33} = \neg(Q_{11} \leftrightarrow
Q_{22}) = Q_{12} \leftrightarrow Q_{21}$. (Taken from [1].)

One can easily check that quantum theory satisfies this correlation structure for projective spin
measurements if one replaces $i = 1, 2, 3$ by $x, y, z$. For instance, $Q_{11} = Q_{22} =$ ‘yes’ implies, by Figure 1,
the dependent $Q_{33} =$ ‘no’. In quantum theory, this corresponds to the (unnormalized) Bell state with
spin correlation in the x- and y-direction and anti-correlated spins in the z-direction:

$|x_+ x_+\rangle - |x_- x_-\rangle = -i\,|y_+ y_+\rangle + i\,|y_- y_-\rangle = |z_+ z_-\rangle + |z_- z_+\rangle.$
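The Bell-state identity above can be verified numerically. The following sketch uses plain Python complex numbers, with the standard conventions $|x_\pm\rangle = (|z_+\rangle \pm |z_-\rangle)/\sqrt{2}$ and $|y_\pm\rangle = (|z_+\rangle \pm i|z_-\rangle)/\sqrt{2}$ (an assumption about phase conventions, which the text does not spell out); all helper names are hypothetical.

```python
from math import sqrt

s = 1 / sqrt(2)
# Single-qubit states as amplitude pairs in the z basis.
zp, zm = (1, 0), (0, 1)
xp, xm = (s, s), (s, -s)
yp, ym = (s, 1j * s), (s, -1j * s)

def kron(a, b):
    """Tensor product of two single-qubit states (4 amplitudes)."""
    return [ai * bj for ai in a for bj in b]

def combine(c1, v1, c2, v2):
    """Linear combination c1*v1 + c2*v2."""
    return [c1 * p + c2 * q for p, q in zip(v1, v2)]

in_x = combine(1, kron(xp, xp), -1, kron(xm, xm))
in_y = combine(-1j, kron(yp, yp), 1j, kron(ym, ym))
in_z = combine(1, kron(zp, zm), 1, kron(zm, zp))

PAULI = {
    "x": [[0, 1], [1, 0]],
    "y": [[0, -1j], [1j, 0]],
    "z": [[1, 0], [0, -1]],
}

def correlation(axis, psi):
    """<psi| sigma_axis (x) sigma_axis |psi> / <psi|psi> for a two-qubit state."""
    m = PAULI[axis]
    total = 0
    for i in range(4):
        for j in range(4):
            element = m[i // 2][j // 2] * m[i % 2][j % 2]
            total += psi[i].conjugate() * element * psi[j]
    norm = sum(abs(c) ** 2 for c in psi)
    return complex(total / norm).real

for axis in "xyz":
    print(axis, round(correlation(axis, in_z), 12))
```

The three amplitude lists agree up to rounding, and the spin correlations come out as +1 along x and y and −1 along z, matching $Q_{11} = Q_{22} =$ ‘yes’ and $Q_{33} =$ ‘no’.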


4.5. Compatibility, Independence and Informational Completeness for Arbitrary N

Consider N elementary systems in the ‘qubit’ ($D_1 = 3$) case and the XNOR conjunction:

$Q_{\mu_1 \mu_2 \cdots \mu_N} := Q_{\mu_1} \leftrightarrow Q_{\mu_2} \leftrightarrow \cdots \leftrightarrow Q_{\mu_N}$ (10)

of individual questions, where $\mu_a = 0, 1, 2, 3$ and $Q_0 :=$ ‘yes’. The conjunction yields ‘yes’ if an even number
of the $Q_{\mu_a}$ are answered ‘no’ and ‘no’ if an odd number are; thus, it does not represent “are the answers to
all $Q_{\mu_a}$ the same?”. As shown in [1], these conjunctions are informationally complete:

Theorem 3. (Qubits) The $4^N - 1$ questions $Q_{\mu_1 \cdots \mu_N}$, $\mu_a = 0, 1, 2, 3$ (we deduct the trivial question $Q_{000\cdots000}$),
are pairwise independent and logically closed under ↔ and, thus, form an informationally complete set $\mathcal{Q}_{M_N}$
with $D_N = 4^N - 1$. Moreover, $Q_{\mu_1 \cdots \mu_N}$ and $Q_{\nu_1 \cdots \nu_N}$ are compatible if they differ by an even number (including
zero) of non-zero indices and complementary otherwise.

We note that an N-qubit density matrix has precisely $4^N - 1$ degrees of freedom.
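The parity behaviour of the XNOR conjunction (10) and the counting in Theorem 3 can be made concrete with a short sketch. The compatibility rule is read here as "the two index strings differ in an even number of positions at which both indices are non-zero"; this reading, like the function names, is an assumption of the illustration.

```python
from itertools import product

def xnor_chain(answers):
    """Answer of Q_mu1 <-> ... <-> Q_muN for a list of booleans (True = 'yes')."""
    result = True                      # Q_0 := 'yes' acts as the neutral element
    for a in answers:
        result = (result == a)
    return result

def questions(n):
    """All index strings (mu_1, ..., mu_N) except the trivial (0, ..., 0)."""
    return [mu for mu in product(range(4), repeat=n) if any(mu)]

def compatible(mu, nu):
    """Assumed reading of Theorem 3: even number of positions where both
    indices are non-zero and different."""
    differing = sum(1 for a, b in zip(mu, nu) if a and b and a != b)
    return differing % 2 == 0

# Three 'no' answers give 'no', although all three answers are the same.
print(xnor_chain([False, False, False]))
print(len(questions(2)), len(questions(3)))
```

For N = 2, each of the 15 questions turns out to be complementary to exactly eight others under this rule, in line with the count used later for the generators in Section 4.9.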

4.6. Linear, Reversible Time Evolution and a Quadratic Information Measure

Thus far, the summarized results invoked only Rules 1 and 2 (and in one instance, Rule 6). Rules 3
and 4, on the other hand, can be demonstrated to entail a linear and reversible evolution of the
generalized Bloch vector $\vec r = 2\vec y - \vec 1 \in \mathbb{R}^{4^N - 1}$ that already appeared in (4),

$\vec r(\Delta t + t_0) = T(\Delta t)\, \vec r(t_0),$ (11)

where $T(\Delta t) \subset \mathcal{T}_N$ defines a one-parameter matrix group [1]. Suppose $T(\Delta t), T'(\Delta t') \in \mathcal{T}_N$ correspond
to two distinct interactions to which O may subject $S_N$. By Rule 4, $T(\Delta t) \cdot T'(\Delta t')$ must likewise be
contained in $\mathcal{T}_N$, and since both $T, T'$ are invertible, the entire set $\mathcal{T}_N$ must also be a group. We shall
henceforth often represent states with Bloch vectors $\vec r$.
Rules 3 and 4, together with elementary operational conditions on the information measure,
force it to be quadratic, $\alpha(y_i) = (2 y_i - 1)^2$, so that O’s total information (1):

$I_N(\vec y) = \sum_{i=1}^{4^N - 1} (2 y_i - 1)^2 = |\vec r|^2$ (12)

is simply the square norm of the Bloch vector [1]. Interestingly, this derivation would not work
without the continuity of time evolution (Rule 4). Crucially, (12) is not the Shannon entropy (see [1]
for a discussion about why the Shannon entropy is also conceptually not suitable for quantifying O’s
information). This reconstruction thereby corroborates an earlier proposal for a quadratic information
measure for quantum theory by Brukner and Zeilinger [13,15,16].
This quadratic information measure becomes key for the remaining steps of the reconstruction.
Given that (12) is a ‘conserved charge’ of time evolution (Rule 3), we can already infer that $\mathcal{T}_N \subset \mathrm{SO}(4^N - 1)$
because time evolution must be connected to the identity.
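To see why a quadratic measure fits a rotation-type time evolution, one can compare it with a Shannon-type candidate on a single qubit. In the sketch below, the candidate $\sum_i (1 - H(y_i))$ is an assumption chosen here purely for comparison (it is not taken from [1]), and the rotation about the third axis stands in for a generic element of SO(3).

```python
from math import cos, sin, log2, pi

def quadratic_info(r):
    """Total information (12) for a Bloch vector r: its square norm."""
    return sum(c * c for c in r)

def shannon_type_info(r):
    """Comparison candidate (assumption): sum over questions of 1 - H(y_i),
    with y_i = (1 + r_i) / 2 and H the binary Shannon entropy."""
    total = 0.0
    for c in r:
        y = (1 + c) / 2
        h = -sum(p * log2(p) for p in (y, 1 - y) if p > 0)
        total += 1 - h
    return total

def rotate_about_third_axis(r, angle):
    x, y, z = r
    return (cos(angle) * x - sin(angle) * y, sin(angle) * x + cos(angle) * y, z)

r0 = (1.0, 0.0, 0.0)                       # a pure state
r1 = rotate_about_third_axis(r0, pi / 4)   # the same state after a rotation

print(quadratic_info(r0), quadratic_info(r1))
print(shannon_type_info(r0), shannon_type_info(r1))
```

The quadratic measure equals 1 bit before and after the rotation, whereas the Shannon-type sum drops below 1 bit for the rotated vector, so it would not be a conserved charge of such an evolution.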

4.7. Pure and Mixed States

Suppose O knows $S_N$’s answers to N mutually compatible questions from $\mathcal{Q}_{M_N}$, thereby saturating
the information limit of N independent bits (Rule 1). He will then also know the answers to each of
their bipartite, tripartite, ..., and N-partite XNOR conjunctions which, by Theorem 3, are also in $\mathcal{Q}_{M_N}$
(and compatible). In total, he then knows the answers to:

$\binom{N}{1} + \binom{N}{2} + \cdots + \binom{N}{N} = \sum_{i=1}^{N} \binom{N}{i} = 2^N - 1$


questions from $\mathcal{Q}_{M_N}$. Thus, O’s total information (12) is $2^N - 1$ bits in this case. It contains dependent
bits of information because the questions in $\mathcal{Q}_{M_N}$ are pairwise, but not all mutually, independent.
Thanks to Rule 3, this is invariant under time evolution.
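The count of known answers can be reproduced by listing the non-empty subsets of the N known questions, each subset standing for the XNOR conjunction of its members. This is only a counting sketch with hypothetical function names.

```python
from itertools import combinations
from math import comb

def known_questions(n):
    """Non-empty subsets of N mutually compatible questions; a subset of size k
    represents the k-partite XNOR conjunction of its members."""
    base = range(n)
    return [s for k in range(1, n + 1) for s in combinations(base, k)]

def count_by_binomials(n):
    """Sum of binomial coefficients C(N, 1) + ... + C(N, N)."""
    return sum(comb(n, k) for k in range(1, n + 1))

for n in range(1, 5):
    print(n, len(known_questions(n)), count_by_binomials(n), 2 ** n - 1)
```

For N = 2, for example, the three known questions are the two individual ones and their correlation, giving the 3 bits that reappear for pure two-qubit states in Section 4.9.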
This allows us to distinguish two kinds of states [1]; $\vec y$ is called a:

pure state: if it is a state of maximal information and, hence, of maximal length:

$I_N(\vec y) = \sum_{i=1}^{4^N - 1} (2 y_i - 1)^2 = (2^N - 1)$ bits, (13)

mixed state: if it is a state of non-maximal information,

$0 \text{ bit} \leq I_N(\vec y) = \sum_{i=1}^{4^N - 1} (2 y_i - 1)^2 < (2^N - 1)$ bits. (14)

The square length of the Bloch vector thus corresponds to the number of answered questions. The state
of no information $\vec y = \frac{1}{2} \vec 1$ has length zero bits.
As can be easily checked, quantum theory satisfies this characterization. In particular, an N-qubit
density matrix corresponding to a pure state has a Bloch vector with square norm equal to $2^N - 1$.
This peculiar mathematical fact now has a clear informational interpretation.
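This statement about quantum theory can be checked numerically for small N. The sketch below draws a random pure state, computes all Pauli expectation values $r_\mu = \langle\psi|\sigma_{\mu_1} \otimes \cdots \otimes \sigma_{\mu_N}|\psi\rangle$ and sums their squares. The dictionary-based state representation, the fixed seed and the helper names are choices made for this illustration only.

```python
import random
from itertools import product

def apply_pauli(mu, bits):
    """Action of sigma_mu1 (x) ... (x) sigma_muN on the basis state |bits>.
    Returns (new_bits, phase)."""
    phase, out = 1, []
    for m, b in zip(mu, bits):
        if m == 1:                     # sigma_x flips the bit
            out.append(1 - b)
        elif m == 2:                   # sigma_y flips the bit with phase +i or -i
            out.append(1 - b)
            phase *= 1j if b == 0 else -1j
        elif m == 3:                   # sigma_z contributes a sign
            out.append(b)
            phase *= 1 if b == 0 else -1
        else:                          # identity
            out.append(b)
    return tuple(out), phase

def bloch_vector(psi, n):
    """Components r_mu = <psi| sigma_mu |psi> for all non-trivial index strings."""
    r = {}
    for mu in product(range(4), repeat=n):
        if not any(mu):
            continue
        total = 0j
        for bits, amp in psi.items():
            new_bits, phase = apply_pauli(mu, bits)
            total += psi[new_bits].conjugate() * phase * amp
        r[mu] = total.real
    return r

def random_pure_state(n, seed=0):
    rng = random.Random(seed)
    amps = {bits: complex(rng.gauss(0, 1), rng.gauss(0, 1))
            for bits in product((0, 1), repeat=n)}
    norm = sum(abs(a) ** 2 for a in amps.values()) ** 0.5
    return {bits: a / norm for bits, a in amps.items()}

def information(psi, n):
    """Square norm of the Bloch vector, i.e., the total information (12)."""
    return sum(c * c for c in bloch_vector(psi, n).values())

for n in (1, 2, 3):
    print(n, round(information(random_pure_state(n), n), 9), 2 ** n - 1)
```

The underlying reason is that $\mathrm{Tr}\,\rho^2 = 2^{-N}(1 + |\vec r|^2)$, which equals 1 exactly for pure states.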

4.8. The Bloch Ball and Unitary Group for a Single Qubit from a Conserved Informational Charge
Since $D_1 = 3$ (cf. Section 4.3), we have that $\mathcal{Q}_{M_1} = \{Q_1, Q_2, Q_3\}$ is a maximal set of mutually
complementary questions, i.e., no further $Q \in \mathcal{Q}_1$ can be added to $\mathcal{Q}_{M_1}$ without destroying mutual
complementarity in the set (cf. Section 4.1). According to (13), a pure state satisfies:

$I_{N=1}(\vec y) = r_1^2 + r_2^2 + r_3^2 = (2 y_1 - 1)^2 + (2 y_2 - 1)^2 + (2 y_3 - 1)^2 = 1$ bit. (15)

For later, we thus observe: for pure states, the maximal mutually complementary set carries exactly 1 bit of
information, and this is a conserved charge of time evolution (Rule 3).
Rule 1 implies that, e.g., the pure state $\vec y^* = (1, 0, 0)$ exists in $\Sigma_1$, and we know $\mathcal{T}_1 \subset \mathrm{SO}(3)$.
However, it is clear that applying any $T \in \mathrm{SO}(3)$ to $\vec y^*$, according to (11), yields only states that are
also compatible with all Rules 1–3 (and the landscape). Hence, by Rule 4, we must actually have
$\mathcal{T}_1 = \mathrm{SO}(3) \simeq \mathrm{PSU}(2)$. Clearly, $\mathcal{T}_1$ then generates all quantum pure states from $\vec y^*$, i.e., it yields the
entire Bloch sphere (the image of any legal state under a legal time evolution is also a legal state).
Recalling that $\Sigma_1$ is convex, we obtain that $\Sigma_1 = B^3 \simeq$ convex hull of $\mathbb{CP}^1$ is the entire unit Bloch ball
with mixed states (14) lying inside; the completely mixed state equals the state of no information at the
center. $\Sigma_1, \mathcal{T}_1$ coincide exactly with the set of density matrices $\rho = \frac{1}{2}(\mathbf{1} + \vec r \cdot \vec\sigma)$ and the set of unitary
transformations $\rho \mapsto U \rho\, U^\dagger$, $U \in \mathrm{SU}(2)$, respectively, for a single qubit in its adjoint (i.e., Bloch vector)
representation, where $\vec\sigma = (\sigma_1, \sigma_2, \sigma_3)$ is the vector of Pauli matrices. Finally, from the assumptions in
Section 2.8 and Rule 5, it is also clear that $\mathcal{Q}_1 \simeq \{\vec q \in \mathbb{R}^3 \,|\, |\vec q|^2 = 1 \text{ bit}\} \simeq \mathbb{CP}^1$. This coincides with the
set of projectors $P_{\vec q} = \frac{1}{2}(\mathbf{1} + \vec q \cdot \vec\sigma)$ onto the +1 eigenspaces of the Pauli operators $\vec q \cdot \vec\sigma$. Noting that:

$\mathrm{Tr}(\rho\, P_{\vec q}) = \frac{1}{2}(1 + \vec r \cdot \vec q) \equiv Y(Q|\vec y)$ (16)
we also recover that (4) yields the Born rule for projective measurements. We thus have the claim of
Section 3 for N = 1 (for details see [1,2]).
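Relation (16) is easy to confirm with explicit 2 × 2 matrices. In the sketch below, the particular vectors `r` and `q` are arbitrary example values, and the helper names are hypothetical.

```python
def pauli_combination(v):
    """2x2 matrix v . sigma for a real 3-vector v."""
    x, y, z = v
    return [[z, x - 1j * y], [x + 1j * y, -z]]

def half_identity_plus(v):
    """(1 + v . sigma) / 2: a density matrix for |v| <= 1, a projector for |v| = 1."""
    m = pauli_combination(v)
    return [[(int(i == j) + m[i][j]) / 2 for j in range(2)] for i in range(2)]

def trace_of_product(a, b):
    return sum(a[i][k] * b[k][i] for i in range(2) for k in range(2))

def born_probability(r, q):
    """Tr(rho P_q) for the state with Bloch vector r and the question labelled by q."""
    return complex(trace_of_product(half_identity_plus(r), half_identity_plus(q))).real

r = (0.3, -0.2, 0.5)      # example mixed state, |r| < 1
q = (0.0, 0.6, 0.8)       # example unit vector labelling a question

dot = sum(a * b for a, b in zip(r, q))
print(born_probability(r, q), (1 + dot) / 2)
```

Both printed numbers agree, since the traceless Pauli terms drop out of the trace and $\mathrm{Tr}[(\vec r \cdot \vec\sigma)(\vec q \cdot \vec\sigma)] = 2\, \vec r \cdot \vec q$.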

4.9. Unitary Group and Density Matrices for Two Qubits from Conserved Informational Charges
Also for N = 2, it is rewarding to consider maximal mutually complementary sets within Q M2 .
Using Lemma 1, one can check that there are exactly six maximal complementarity sets containing five
questions and twenty containing three [2]; e.g., two representatives are:

$\mathrm{Pent}_1 = \{Q_{11}, Q_{12}, Q_{13}, Q_2, Q_3\}, \qquad \mathrm{Tri}_1 = \{Q_{11}, Q_{12}, Q'_3\}.$

The six maximal complementarity sets of five elements can be represented as a lattice of pentagons;
see Figure 2 (which also contains four green triangles, each representing one of the twenty maximal
complementarity sets of three questions) [2].
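The counts of six pentagons and twenty triangles can be reproduced by brute force. The sketch below labels the 15 questions by index pairs, with `(i, 0)` for an individual question of system 1, `(0, j)` for one of system 2 and `(i, j)` for a correlation question. Complementarity is taken from Theorem 3, read as "an odd number of positions at which both indices are non-zero and different"; this reading and all names are assumptions of the illustration.

```python
from itertools import combinations, product

questions = [p for p in product(range(4), repeat=2) if p != (0, 0)]

def complementary(p, q):
    """Assumed reading of Theorem 3 for N = 2."""
    differing = sum(1 for a, b in zip(p, q) if a and b and a != b)
    return differing % 2 == 1

def mutually_complementary(subset):
    return all(complementary(p, q) for p, q in combinations(subset, 2))

def maximal_complementarity_sets():
    """All mutually complementary sets that cannot be enlarged."""
    candidates = [frozenset(s)
                  for k in range(1, 7)
                  for s in combinations(questions, k)
                  if mutually_complementary(s)]
    return [c for c in candidates if not any(c < d for d in candidates)]

sets = maximal_complementarity_sets()
pentagons = [s for s in sets if len(s) == 5]
triangles = [s for s in sets if len(s) == 3]
print(len(pentagons), len(triangles))
```

In this labelling, $\mathrm{Pent}_1$ corresponds to `{(1,1), (1,2), (1,3), (2,0), (3,0)}` and $\mathrm{Tri}_1$ to `{(1,1), (1,2), (0,3)}`, and every question lies in exactly two of the pentagons.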

Figure 2. The six maximal complementarity sets represented as pentagons. Two questions are complementary
if they share a pentagon or are connected by an edge and compatible otherwise. Every pentagon is connected
to all of the other five because any $Q \in \mathcal{Q}_{M_2}$ is contained in precisely two pentagons. The red arrows represent
the information swap (21) between Pentagons 1 and 2 that preserves all pentagon equalities (18) and defines the
time evolution generator (22). (Figure adapted from [2]. Reprinted with permission from P. Höhn and C. Wever,
Phys. Rev. A 95, 012102 (2017). Copyright (2017) by the American Physical Society.)

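Each of these sets has to satisfy the complementarity inequalities (2); specifically, $0 \text{ bits} \leq
I(\mathrm{Pent}_a) := \sum_{i \in \mathrm{Pent}_a} r_i^2 \leq 1$ bit for the information carried by the five questions in pentagon a. Since
any $Q \in \mathcal{Q}_{M_2}$ is contained in precisely two pentagons (cf. Figure 2), we find:

$\sum_{a=1}^{6} I(\mathrm{Pent}_a) = 2 \Big( \sum_{i=1,2,3} (r_{i_1}^2 + r_{i_2}^2) + \sum_{i,j=1,2,3} r_{ij}^2 \Big) = 2\, I_{N=2}(\vec r),$ (17)

where $r_{i_1}$ and $r_{i_2}$ denote the Bloch components of the individual questions of systems 1 and 2, respectively.
For pure states, $I_{N=2}(\vec r_{\rm pure}) = 3$ bits, so the six pentagon informations sum to 6 bits; since each of them
is bounded by 1 bit, this produces the pentagon equalities [2]:

pure states: $I(\mathrm{Pent}_a) \equiv 1$ bit, $a = 1, \ldots, 6$. (18)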

Any pure state must satisfy (18), and T2 evolves pure states to pure states (Rule 3). Hence, in analogy
to N = 1: for pure states, these six maximal mutually complementary sets carry exactly one bit of information,
and these are six conserved charges of time evolution. There are further interesting constraints on the
distribution of O’s information over Q M2 [2].
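The pentagon equalities can be tested on a concrete two-qubit pure state in the quantum representation, where the Bloch components are Pauli expectation values. In the sketch below, the example state and the construction of the six pentagons (three labelled by a system-1 index, three by a system-2 index) are choices made for this illustration; index pairs `(i, 0)` and `(0, j)` stand for individual questions of systems 1 and 2.

```python
SIGMA = [
    [[1, 0], [0, 1]],        # identity
    [[0, 1], [1, 0]],        # sigma_1
    [[0, -1j], [1j, 0]],     # sigma_2
    [[1, 0], [0, -1]],       # sigma_3
]

def expectation(mu, nu, psi):
    """<psi| sigma_mu (x) sigma_nu |psi> for a normalized two-qubit state psi."""
    total = 0j
    for i in range(4):
        for j in range(4):
            element = SIGMA[mu][i // 2][j // 2] * SIGMA[nu][i % 2][j % 2]
            total += psi[i].conjugate() * element * psi[j]
    return total.real

def pentagons():
    pents = []
    for a in (1, 2, 3):
        others = [b for b in (1, 2, 3) if b != a]
        # Correlations sharing the system-1 index a, plus the other two
        # individual questions of system 1.
        pents.append([(a, 1), (a, 2), (a, 3)] + [(b, 0) for b in others])
        # The same construction with the roles of the systems exchanged.
        pents.append([(1, a), (2, a), (3, a)] + [(0, b) for b in others])
    return pents

def pentagon_information(psi):
    """I(Pent_a) for each of the six pentagons."""
    return [sum(expectation(m, n, psi) ** 2 for m, n in pent) for pent in pentagons()]

psi = [0.6, 0.0, 0.0, 0.8j]   # example: a normalized, partially entangled state

print([round(value, 12) for value in pentagon_information(psi)])
```

Each of the six values equals 1 for this state, and their sum is 6 = 2 · 3 bits, as required by (17) for a pure state.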


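It can be straightforwardly checked that quantum theory actually satisfies (18). Indeed, in the
case of quantum theory, the identity for $\mathrm{Pent}_1$ reads in more familiar language (pure states):

$I(\mathrm{Pent}_1) = \langle\sigma_2 \otimes \mathbf{1}\rangle^2 + \langle\sigma_3 \otimes \mathbf{1}\rangle^2 + \langle\sigma_1 \otimes \sigma_1\rangle^2 + \langle\sigma_1 \otimes \sigma_2\rangle^2 + \langle\sigma_1 \otimes \sigma_3\rangle^2 = 1,$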

etc. Remarkably, these identities of quantum theory seem not to have been reported before in
the literature. These novel conserved informational charges are a prediction of our reconstruction,
underscoring the benefits of taking this informational approach. Additionally, these informational
charges are indispensable for deriving the unitary group and the state space, as we shall now see.
Using that $I(\mathrm{Pent}_a)$ is conserved under $\mathcal{T}_2 \subset \mathrm{SO}(15)$ entails (with new index $i = 1, \ldots, 15$):

$\sum_{i \in \mathrm{Pent}_a,\ 1 \leq j \leq 15} r_i\, G_{ij}\, r_j = 0, \qquad a = 1, \ldots, 6,$ (19)

where $T(\Delta t) = \exp(\Delta t\, G)$ for $G \in \mathfrak{so}(15)$ [2]. The correlation structure of Figure 1 enforces [2]:

$G_{ij} = 0$, whenever $Q_i, Q_j$ are compatible. (20)

Each of the 15 $Q_i \in \mathcal{Q}_{M_2}$ is complementary to eight others, and since $G_{ij} = -G_{ji}$, there could be
at most $15 \cdot 8 / 2 = 60$ linearly independent $G_{ij}$ of $\mathcal{T}_2$.
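These are constructed as follows. For every pair of pentagons, there is a unique information swap
transformation that preserves (18). For instance, the red arrows in Figure 2 represent the complete
information swap between pentagons $\mathrm{Pent}_1$ and $\mathrm{Pent}_2$ (←→ is not the XNOR; individual components
carry a subscript labelling the system):

$r_{2_1}^2 \longleftrightarrow r_{31}^2\ (\mathrm{Pent}_5), \quad r_{3_1}^2 \longleftrightarrow r_{21}^2\ (\mathrm{Pent}_3), \quad r_{12}^2 \longleftrightarrow r_{3_2}^2\ (\mathrm{Pent}_4), \quad r_{13}^2 \longleftrightarrow r_{2_2}^2\ (\mathrm{Pent}_6)$ (21)

that keeps all other components fixed. The equalities (18) are preserved because every swap in (21) occurs within a
pentagon. The correlation structure of Figure 1 fixes the corresponding generator to [2]:

$G^{\mathrm{Pent}_1, \mathrm{Pent}_2}_{ij} = \delta_{i 2_1}\, \delta_{j(31)} - \delta_{i 3_1}\, \delta_{j(21)} + \delta_{i(12)}\, \delta_{j 3_2} - \delta_{i(13)}\, \delta_{j 2_2} - (i \longleftrightarrow j).$ (22)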

One can repeat the argument for all 15 pentagon pairs, producing 15 linearly independent generators [2].
Remarkably, they turn out to coincide exactly with the adjoint representation of the 15 fundamental
generators of SU(4) [2]. In particular, (22) is the generator of entangling unitaries leaving r11 invariant.
The other 45 independent generators satisfying (20) are ruled out by the correlation structure so
that $\mathcal{T}_2$ cannot be generated by anything other than these 15 pentagon swaps [2]. One can show that
the exponentiation of (linear combinations of) these 15 pentagon swaps generates PSU(4) and that
this group abides by all rules and forms a maximal subgroup of SO(15) [2]. Rule 4 then implies
$\mathcal{T}_2 \simeq \mathrm{PSU}(4)$, which is the correct set of unitary transformations $\rho \mapsto U \rho\, U^\dagger$, $U \in \mathrm{SU}(4)$, for
two qubits.
It turns out that the set of Bloch vectors satisfying all six pentagon equalities (18) and the
conservation equations (19) for the 15 pentagon swaps splits into two sets on each of which $\mathcal{T}_2 = \mathrm{PSU}(4)$
acts transitively [2]. These two sets correspond precisely to the two possible conventions of building
up composite questions either using the XNOR or XOR (cf. Section 4.1) and are therefore physically
equivalent. Adhering to the XNOR convention, we conclude that the surviving set of Bloch vectors
solving (18) and (19) is the set of N = 2 pure states admitted by the rules. Indeed, it coincides exactly
with the set of quantum pure states, which forms a $\mathbb{CP}^3$ of which PSU(4) is the isometry group [2].
Employing convexity of $\Sigma_2$, one finally finds:

$\Sigma_2 = \text{closed convex hull of } \mathbb{CP}^3,$

which is exactly the set of normalized 4 × 4 density matrices over $\mathbb{C}^2 \otimes \mathbb{C}^2$.


In conclusion, the new conserved informational charges (18), in analogy to (15) for N = 1, define
both the unitary group and the set of states for two qubits (for neglected details, see [2]).

4.10. Unitaries and States for N > 2 Elementary Systems

According to Theorem 3, $\Sigma_N$ is $(4^N - 1)$-dimensional and $\mathcal{T}_N \subset \mathrm{SO}(4^N - 1)$ (cf. Section 4.6).
The reconstruction of the unitary group uses a universality result from quantum computation:
two-qubit unitaries PSU(4) (between any pair) and single-qubit unitaries $\mathrm{PSU}(2) \simeq \mathrm{SO}(3)$ generate
the full projective unitary group $\mathrm{PSU}(2^N)$ for N qubits [17,18]. Given that $S_N$ is a composite system,
all of these bipartite and local unitaries must be in $\mathcal{T}_N$. One can check that $\mathrm{PSU}(2^N)$ again abides by all
rules and constitutes a maximal subgroup of $\mathrm{SO}(4^N - 1)$ [2]. Thanks to Rule 4, this yields $\mathcal{T}_N \simeq \mathrm{PSU}(2^N)$,
which coincides with the set of unitary transformations on N-qubit density matrices. In analogy
to the previous case, one obtains as the state space:

$\Sigma_N = \text{closed convex hull of } \mathbb{CP}^{2^N - 1},$

which agrees with the set of normalized N-qubit density matrices (for details, see [2]).

4.11. Questions as Projective Measurements and the Born Rule

The assumptions in Section 2.8 and Rule 5 yield the following question set characterization [2]:

$\mathcal{Q}_N \simeq \{\vec q \in \mathbb{R}^{4^N - 1} \,|\, Y(\vec q|\vec r) \in [0, 1]\ \forall\, \vec r \in \Sigma_N \text{ and } \vec q \text{ is a 1 bit quantum state}\}.$ (23)

As shown in [2], this set is isomorphic to the set of projectors $P_{\vec q} = \frac{1}{2}(\mathbf{1} + \vec q \cdot \vec\sigma)$ onto the +1 eigenspaces
of the Pauli operators $\vec q \cdot \vec\sigma = \sum_{\mu_1 \cdots \mu_N} q_{\mu_1 \cdots \mu_N} \sigma_{\mu_1 \cdots \mu_N}$, where $\sigma_{\mu_1 \cdots \mu_N} = \sigma_{\mu_1} \otimes \cdots \otimes \sigma_{\mu_N}$ and $\sigma_0 = \mathbf{1}$.
Noting that $q_{\mu_1 \cdots \mu_N}$ corresponds to (10) reveals that the XNOR at the question level corresponds to
the tensor product ⊗ at the operator level. One also finds that (16) again holds, such that (4) yields
the Born rule for projective measurements for arbitrary N (for the neglected details and many further
interesting properties of $\mathcal{Q}_N$, we refer to [2]).

4.12. The von Neumann Evolution Equation

We thus obtain qubit quantum theory in its adjoint (i.e., Bloch vector) representation. Lastly, we
note that $\vec r(t) = T(t)\, \vec r(0)$ with $T(t) = e^{tG} \in \mathrm{PSU}(2^N)$ is equivalent to the adjoint action:

$\rho(t) = U(t)\, \rho(0)\, U^\dagger(t),$ (24)

of $U(t) = e^{-iHt} \in \mathrm{SU}(2^N)$ for some Hermitian operator $H$ on $\mathbb{C}^{2^N}$, where $\rho(t) = \frac{1}{2^N}(\mathbf{1} + \vec r(t) \cdot \vec\sigma)$ [2].
(24), in turn, is equivalent to $\rho(t)$ solving the von Neumann evolution equation:

$i\, \frac{\partial \rho}{\partial t} = [H, \rho].$ (25)
We have therefore also recovered the correct time evolution equation for quantum states.
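For completeness, the step from (24) to (25) is a one-line computation, sketched here in units with $\hbar = 1$ and for a time-independent Hermitian $H$ (both assumptions of this worked example):

```latex
\begin{align}
\rho(t) &= U(t)\,\rho(0)\,U^\dagger(t), \qquad U(t) = e^{-iHt}, \quad U^\dagger(t) = e^{+iHt},\\
\frac{\partial \rho}{\partial t}
  &= (-iH)\,U(t)\,\rho(0)\,U^\dagger(t) + U(t)\,\rho(0)\,U^\dagger(t)\,(iH)
   = -i\,\bigl(H\rho(t) - \rho(t)H\bigr),\\
i\,\frac{\partial \rho}{\partial t} &= [H, \rho(t)].
\end{align}
```

Conversely, a solution of (25) with initial value $\rho(0)$ is unique, so it must coincide with the adjoint action (24).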

5. Conclusions
We have reviewed and summarized the key steps from [1,2] necessary to prove the claim of
Section 3. This yields a reconstruction of the explicit formalism of qubit quantum theory from rules
constraining an observer’s acquisition of information about a system [1,2]. The derivation corroborates
the consistency of interpreting the state as the observer’s ‘catalog of knowledge’ and shows that it
is sufficient to speak only about the information accessible to him for reproducing quantum theory.
In fact, for qubits, this derivation accomplishes an informational reconstruction of the type proposed in
Rovelli’s relational quantum mechanics [11] and in the Brukner-Zeilinger informational interpretation
of quantum theory [12,13].
As a key benefit, this reconstruction also provides a novel informational explanation for the
architecture of qubit quantum theory. In particular, it explains the logical structure of a basis of spin
measurements, the dimensionality and structure of quantum state spaces, the correlation structure
and the unitarity of time evolution from the perspective of information acquisition. This unravels
previously unknown structural properties: conserved ‘informational charges’ from complementarity
relations define and explain the unitary group and the set of pure states.

Acknowledgments: The author thanks Christopher S. P. Wever for an enjoyable collaboration on [2]. The project
leading to this publication has received funding from the European Union’s Horizon 2020 research and innovation
program under the Marie Sklodowska-Curie Grant Agreement No. 657661.
Conflicts of Interest: The author declares no conflict of interest.

References
1. Höhn, P.A. Toolbox for reconstructing quantum theory from rules on information acquisition. arXiv 2014,
arXiv:1412.8323.
2. Höhn, P.A.; Wever, C.S.P. Quantum theory from questions. Phys. Rev. A 2017, 95, 012102.
3. Hardy, L. Quantum Theory From Five Reasonable Axioms. arXiv 2001, arXiv:quant-ph/0101012.
4. Dakic, B.; Brukner, C. Quantum Theory and Beyond: Is Entanglement Special? In Deep Beauty; Halvorson, H., Ed.;
Cambridge University Press: Cambridge, UK, 2011; p. 365.
5. Masanes, L.; Müller, M.P. A derivation of quantum theory from physical requirements. New J. Phys. 2011,
13, 063001.
6. Chiribella, G.; D’Ariano, G.M.; Perinotti, P. Informational derivation of quantum theory. Phys. Rev. A 2011,
84, 012311.
7. Barnum, H.; Müller, M.P.; Ududec, C. Higher-order interference and single-system postulates characterizing
quantum theory. New J. Phys. 2014, 16, 123029.
8. De la Torre, G.; Masanes, L.; Short, A.J.; Müller, M.P. Deriving Quantum Theory from Its Local Structure and
Reversibility. Phys. Rev. Lett. 2012, 109, 090403.
9. Goyal, P. From information geometry to quantum theory. New J. Phys. 2010, 12, 023012.
10. Appleby, M.; Fuchs, C.A.; Stacey, B.C.; Zhu, H. Introducing the Qplex: A Novel Arena for Quantum Theory.
arXiv 2016, arXiv:1612.03234.
11. Rovelli, C. Relational quantum mechanics. Int. J. Theor. Phys. 1996, 35, 1637–1678.
12. Zeilinger, A. A Foundational Principle for Quantum Mechanics. Found. Phys. 1999, 29, 631–643.
13. Brukner, C.; Zeilinger, A. Information and fundamental elements of the structure of quantum theory. In Time,
Quantum and Information; Castell, L., Ischebeck, O., Eds.; Springer: Berlin/Heidelberg, Germany, 2003.
14. Schrödinger, E. Discussion of Probability Relations between Separated Systems. Math. Proc. Camb. Philos. Soc.
1935, 31, 555–563.
15. Brukner, C.; Zeilinger, A. Operationally Invariant Information in Quantum Measurements. Phys. Rev. Lett.
1999, 83, 3354.
16. Brukner, C.; Zeilinger, A. Conceptual inadequacy of the Shannon information in quantum measurements.
Phys. Rev. A 2001, 63, 022113.
17. Bremner, M.J.; Dawson, C.M.; Dodd, J.L.; Gilchrist, A.; Harrow, A.W.; Mortimer, D.; Nielsen, M.A.; Osborne, T.J.
Practical Scheme for Quantum Computation with Any Two-Qubit Entangling Gate. Phys. Rev. Lett. 2002,
89, 247902.
18. Harrow, A.W. Exact universality from any entangling gate without inverses. Quant. Inf. Comput. 2009,
9, 773–777.

entropy
Brief Report
Test of the Pauli Exclusion Principle in the VIP-2
Underground Experiment
Catalina Curceanu 1,2,3, *,‡,§ , Hexi Shi 1,4, *,§ , Sergio Bartalucci 1 , Sergio Bertolucci 5 ,
Massimiliano Bazzi 1 , Carolina Berucci 1,4 , Mario Bragadireanu 1,3 , Michael Cargnelli 1,4 ,
Alberto Clozza 1 , Luca De Paolis 1 , Sergio Di Matteo 6 , Jean-Pierre Egger 7 , Carlo Guaraldo 1 ,
Mihail Iliescu 1 , Johann Marton 1,4 , Matthias Laubenstein 8 , Edoardo Milotti 9 , Marco Miliucci 1 ,
Andreas Pichler 1,4 , Dorel Pietreanu 1,3 , Kristian Piscicchia 2,1 , Alessandro Scordo 1 ,
Diana Laura Sirghi 1,3 , Florin Sirghi 1,3 , Laura Sperandio 1 , Oton Vazquez Doce 1,10 ,
Eberhard Widmann 4 and Johann Zmeskal 1,4
1 Laboratori Nazionali di Frascati, INFN, I-00044 Frascati, Italy
2 CENTRO FERMI - Museo Storico della Fisica e Centro Studi e Ricerche ‘Enrico Fermi’, I-00184 Rome, Italy
3 Institutul National pentru Fizica si Inginerie Nucleara Horia Hulubei, IFIN-HH,
R-077125 Magurele, Romania
4 Stefan-Meyer-Institute for Subatomic Physics, Austrian Academy of Science, A-1090 Vienna, Austria
5 Dipartimento di Fisica e Astronomia, Universitá di Bologna, I-40127 Bologna, Italy
6 Institut de Physique UMR CNRS-UR1 6251, Université de Rennes1, F-35042 Rennes, France
7 Institut de Physique, Université de Neuchâtel, CH-2000 Neuenburg, Switzerland
8 Laboratori Nazionali del Gran Sasso, INFN, I-67100 Assergi L’Aquila, Italy
9 Dipartimento di Fisica, Universitá di Trieste and INFN-Sezione di Trieste, I-34127 Trieste, Italy
10 Excellence Cluster Universe, Technische Universität München, D-85748 Garching, Germany
* Correspondence: C.C. and H.S.; Tel.: +39-06-9403-2321 (C.C.)
† This paper is an extended version of our paper published in the XIV International Conference on Topics in
Astroparticle and Underground Physics (TAUP2015), 7–11 September 2015, Torino, Italy.
‡ Current address: Laboratori Nazionali di Frascati, INFN, Via E. Fermi 40, I-00044, Frascati, Italy.
§ These authors contributed equally to this work.

Received: 29 April 2017; Accepted: 22 June 2017; Published: 24 June 2017

Abstract: The validity of the Pauli exclusion principle—a building block of Quantum Mechanics—is
tested for electrons. The VIP (violation of Pauli exclusion principle) and its follow-up VIP-2
experiments at the Laboratori Nazionali del Gran Sasso search for X-rays from copper atomic
transitions that are prohibited by the Pauli exclusion principle. The candidate events—if they
exist—originate from the transition of a 2p orbit electron to the ground state which is already occupied
by two electrons. The present limit on the probability for Pauli exclusion principle violation for
electrons set by the VIP experiment is $4.7 \times 10^{-29}$. We report a first result from the VIP-2 experiment

Entropy 2017, 19, 300; doi:10.3390/e19070300 319 www.mdpi.com/journal/entropy


improving on the VIP limit, which solidifies the final goal of achieving a two orders of magnitude
gain in the long run.

Keywords: Pauli exclusion principle; quantum foundations; X-ray spectroscopy; underground
experiment; silicon drift detector

1. Introduction
The Pauli exclusion principle (PEP) states that in a system there cannot be two (or more) fermions
with all quantum numbers identical, and is a fundamental principle in physics. The validity of the PEP
is the basis of the periodic table of elements, electric conductivity in metals, the degeneracy pressure
which makes white dwarfs and neutron stars stable, as well as many other phenomena in physics,
chemistry, and biology. In quantum mechanics (QM), the states of particles are described in terms of
wave functions. For identical particles, with respect to their permutation, the states are necessarily
either symmetric for bosons, or antisymmetric for fermions. This “symmetrization postulate” [1]
excludes the mixing of different symmetrization groups, and it is at the basis of the PEP. Messiah
and Greenberg noted in [2] that this superselection rule “does not appear as a necessary feature of
the QM description of nature”. In this context, the violation of PEP is equivalent to the violation of
spin-statistics [3], and experimentally to the existence of states of particles that follow statistics other
than the fermionic or the bosonic ones.
Exhaustive reviews of the experimental and theoretical searches for a small violation of the PEP
or the violation of spin-statistics can be found, for example, in [3,4]. We first point out that there
is no established model in quantum field theory that can explicitly include small violations of the
PEP. Secondly, although many experimental searches present limits for the violation, the parameters
that quantify the limits are model/system-dependent and are not generally comparable. Moreover,
in order to search for states that are in a mixed symmetry, it is crucial to introduce new states into
the system, among which the PEP-violating states may be found. Ramberg and Snow [5] took this
argument into account by running a high electric DC current through a copper conductor, and they
searched for X-rays from transitions that are PEP-forbidden after electrons are captured by copper
atoms. In particular, they searched for PEP-violating transitions from the 2p level to the 1s level, which
is already occupied by two electrons. Due to the shielding effect of the additional electron in the
ground level, the energy of such anomalous transitions deviates from the copper Kα X-ray line at 8 keV
by about 300 eV [6], a shift that is distinguishable in precision spectroscopic measurements. Since the new
electrons from the current are supposed to have no a-priori established symmetry with the electrons
inside the copper atoms, the detection of the energy-shifted X-rays is an explicit indication of the
violation of spin-statistics, and thus the violation of the PEP for electrons.
We note that one known system in which the dichotomy of fermions and bosons does
not hold is two-dimensional condensed matter, as seen in the (fractional) quantum
Hall effect [7]. Particles that are neither fermions nor bosons, and that may exist in electronic
systems confined to two spatial dimensions, have been constructed theoretically and investigated
in the laboratory, with results in close agreement with the theories, as reviewed in [8]. The physics of this
special system is exciting in itself and may provide hints for searches for violations of the PEP in
other systems.
In Section 2, we will introduce the VIP (violation of Pauli exclusion principle) and VIP-2
experiments at Laboratori Nazionali del Gran Sasso (LNGS), and in Section 3 the first results from the
physics run of VIP-2 in 2016, which already improved the best result previously achieved by the VIP
experiment with 3 years of data collection. The paper ends with conclusions and future perspectives.


2. VIP-2 Experiment
The first experiment performed in the LNGS-INFN underground laboratory, the VIP
experiment, used a method similar to that of Ramberg and Snow, and the same definition of
the parameter representing the probability that the PEP is violated, to allow a direct comparison of the
experimental results. An improvement in sensitivity was achieved firstly by performing the experiment
in the low-radioactivity laboratory at LNGS, which offers excellent shielding against
cosmic rays. Secondly, the use of charge-coupled devices (CCDs) as X-ray detectors with a
typical energy resolution of 320 eV at 8 keV increased the precision in the definition of the region of
interest in which to search for anomalous X-rays. The VIP experiment set the limit on the probability of PEP
violation for electrons at $4.7 \times 10^{-29}$ [9–11].
By using new X-ray detectors and an active shielding of scintillators, the VIP-2 experiment plans
to further improve the sensitivity by two orders of magnitude. Firstly, a major improvement comes from
the change of the layout of the copper strip target and of the X-ray detectors, which allows a larger
acceptance for the X-ray detection. Secondly, a DC current of 100 amperes is applied instead of 40
amperes, which introduces more than twice as many new electrons into the copper strip. Finally, in addition to
the improved passive shielding surrounding the setup to reduce the background generated by the
environmental radiations, the use of silicon drift detectors (SDDs) as the X-ray detectors allows the
implementation of an active shielding using scintillators, as illustrated in Figure 1a, which removes
the background induced by the high-energy charged particles that are not shielded. More details of
the detectors and the VIP-2 setup are given in [12–15].

Figure 1. (a) The design of the core components of the VIolation of Pauli exclusion principle 2
(VIP-2) setup, including the silicon drift detectors (SDDs) as the X-ray detector, the scintillators as
active shielding with silicon photomultiplier readout; (b) a picture of the VIP-2 setup in operation at
the underground laboratory of Gran Sasso.

The VIP-2 trigger logic was implemented using standard Nuclear Instrumentation Module (NIM)
modules, and it is defined by either an event at any SDD or a coincidence between two layers
of the veto detector. A Versa Module Europa (VME) based data acquisition system for the detectors was
constructed. It records the energy deposit of the six SDDs from the output of a CAEN 568 spectroscopy
amplifier, which processes the analog signals of the SDD preamplifier output. The charge-to-digital
converter (QDC) signals of the 32 scintillator channels and the timing information of the SDDs with respect to
the main trigger are recorded in the data as well. The data acquisition computer transfers data from
the VME whenever there is one event ready in the memories of the modules, and clears the registers
of the VME when the data transfer is done. During the whole communication process between the
computer and the VME controller, the trigger logic is prohibited from receiving further events. The
user interface of the LabVIEW-based data-taking program can be remotely accessed and controlled
from computer terminals outside the Gran Sasso laboratory.
The temperatures of the SDDs, the copper conductor and the cooling system, as well as the ambient
temperature and the vacuum pressure of the setup, are monitored by a slow control system. The slow
control, which can be accessed from remote terminals, also controls the DC power supply to switch
the current applied to the copper strip on and off. A closed-circuit chiller coupled to a cooling pad
attached to the copper strips keeps the strips at a constant temperature below 25 °C when a DC
current of up to 100 A is applied. The temperature of the SDDs' holder frame changed by less than 2
K when the 100 A current was applied to the copper strip. At this level of temperature variation, the
effect on the energy resolution of the SDDs is negligible.
In November 2015, after exhaustive tests in the laboratory, the VIP-2 setup
was transported and mounted in the Gran Sasso underground laboratory, as shown in Figure 1b.
After tuning and optimization, we started the first campaign of data taking with the
complete detector system in October 2016. The energy calibration of the SDDs was performed in situ, by placing a
weak Iron-55 source covered by a 25 μm-thick titanium foil near the detectors. The manganese K-series
X-rays from the source partly go through the foil and partly irradiate the foil, generating titanium
K-series X-rays. These fluorescence X-rays are detected by the SDDs at an overall rate of about 2 Hz,
and provide reference energy peaks to calibrate the digitized SDD signals to an energy scale.
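
The calibration described above amounts to a straight-line fit of known line energies against measured peak positions. The following is a minimal sketch under stated assumptions, not the collaboration's analysis code: the line energies are approximate tabulated values, while the ADC peak positions (and the gain and offset they imply) are invented for illustration.

```python
# Approximate K-series line energies in eV (rounded tabulated values).
LINE_ENERGIES_EV = {"Ti Ka": 4510.8, "Ti Kb": 4931.8, "Mn Ka": 5898.8, "Mn Kb": 6490.4}

# Hypothetical fitted peak positions in ADC channels, made up for this example
# (generated from a gain of 2.5 eV/channel and an offset of 10 eV).
PEAK_CHANNELS = {"Ti Ka": 1800.32, "Ti Kb": 1968.72, "Mn Ka": 2355.52, "Mn Kb": 2592.16}


def fit_linear_calibration(channels, energies):
    """Least-squares fit of energy = gain * channel + offset."""
    n = len(channels)
    mean_x = sum(channels) / n
    mean_y = sum(energies) / n
    sxx = sum((x - mean_x) ** 2 for x in channels)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(channels, energies))
    gain = sxy / sxx
    offset = mean_y - gain * mean_x
    return gain, offset


names = sorted(LINE_ENERGIES_EV)
gain, offset = fit_linear_calibration(
    [PEAK_CHANNELS[k] for k in names],
    [LINE_ENERGIES_EV[k] for k in names],
)


def channel_to_energy(channel):
    """Convert a digitized SDD amplitude (ADC channel) to energy in eV."""
    return gain * channel + offset
```

With a calibration of this kind, the region of interest quoted in eV can be translated back into a channel window for each detector and each weekly data set.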

3. First VIP-2 Results

During the data collection from October to December 2016, the DC current was typically switched
on for one week and off for the next. The energy calibration of the SDDs was done for each data
set corresponding to a period of about one week, and the calibrated spectra were then summed separately
over the whole data collection period of over two months for the 100 A current-on and the current-off
data sets. The spectra
that correspond to 34 days of effective data acquisition with 100 A current on and 28 days with current
off are shown in Figure 2, in which the fluorescence lines of titanium and manganese are marked.
The environmental gamma radiation and high-energy charged particles can irradiate the copper
conductor or the strip inside the setup, and the normal K-series X-rays from the de-excitation of the
copper form the main background near the energy region of interest (ROI in Figure 2) from 7629 eV to
7829 eV, which is defined by the SDD energy resolution (200 eV full width at half maximum, FWHM) at
the Kα copper transition (8.04 keV) near the expected value of the PEP-violating transition. In order to
obtain the number of events violating PEP in the ROI, the current-off spectrum was normalized to the 34
days of current-on data collection time, and then subtracted from the current-on spectrum. The
numbers of X-rays in the region of interest were:

• with $I = 100$ A: $N_X = 2222 \pm 47$ (for 34 days of data collection);
• with $I = 0$ A: $N_X = 2181 \pm 47$ (28 days of data collection normalized to 34 days);
• numerical subtraction: $\Delta N_X = 41 \pm 66$ (normalized to 34 days of data collection time).
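
The subtraction above can be checked with a few lines of arithmetic. This sketch assumes that the two quoted uncertainties are independent and are combined in quadrature, which reproduces the quoted ± 66; the variable names are illustrative.

```python
import math

# Counts in the region of interest, as quoted in the text.
n_on, err_on = 2222.0, 47.0     # I = 100 A, 34 days
n_off, err_off = 2181.0, 47.0   # I = 0 A, normalized to 34 days

# Excess of current-on over current-off counts.
delta_n = n_on - n_off

# Independent uncertainties add in quadrature (assumption of this sketch).
delta_err = math.sqrt(err_on**2 + err_off**2)

print(f"Delta N_X = {delta_n:.0f} +/- {delta_err:.0f}")
```

The excess is well within one standard deviation of zero, which is why the analysis proceeds to an upper limit rather than a signal claim.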

Following notation similar to that used in the Ramberg–Snow and VIP experiment papers,
the number of possible PEP-violating events, $\Delta N_X$, is related to the $\beta^2/2$ parameter giving the
probability of PEP violation [16]:

$$\Delta N_X \geq \frac{1}{2}\beta^2 N_{\mathrm{new}} \frac{1}{10} N_{\mathrm{int}} \times (\text{detection efficiency factor}) = \frac{\beta^2 (\Sigma I \Delta t) D}{e\mu} \frac{1}{20} \times (\text{detection efficiency factor}). \tag{1}$$

Furthermore, the number of new electrons that pass through the conductor,

$$N_{\mathrm{new}} = (1/e)\,\Sigma I \Delta t, \tag{2}$$

is given by the electric charge $e$ of the electron, the intensity $I$ of the applied DC current, and the
duration $\Delta t$ of the measurement. The minimum number of internal scattering processes between
a new electron and the atoms of the copper lattice, $N_{\mathrm{int}}$, is of order $D/\mu$, where $D$ is the length of the
copper strip (10 cm), and $\mu$ is the mean free path of electrons in copper. We follow the same assumption
used in the VIP paper [17], that the capture probability of a new electron by an atom of the copper
lattice is greater than 1/10 of the scattering probability.

[Figure 2 plots. Upper panel: "Energy spectrum with 100 Ampere current" (34 days). Lower panel: "Energy spectrum without current" (28 days). Horizontal axis: Energy [eV], 3000 to 11000; vertical axis: number of events [32 eV / bin], logarithmic scale. The Ti, Mn and Cu lines and the ROI are marked; both panels are labelled "Preliminary".]

Figure 2. The energy spectra from all the SDDs, for data with and without applied DC current to
the copper strip, taken during the physics run in late 2016 at the Laboratori Nazionali del Gran
Sasso (LNGS).

The detection efficiency factor is evaluated with a Monte Carlo simulation based on Geant4.10
with a realistic detector configuration, taking into account: the transmission rate of a copper Kα X-ray
that originates at a random position inside the copper strip and reaches the surface; the geometrical
acceptance for photons coming from the surface of the copper strip and arriving at the six SDD detectors;
and the detection efficiency of a copper Kα X-ray by the 450 μm-thick SDD unit. The resulting value is
about 1%.
With $D = 10$ cm, $\mu = 3.9 \times 10^{-6}$ cm, $e = 1.602 \times 10^{-19}$ C, $I = 100$ A, and normalizing the
measurement time with current to 34 days, using the three-sigma upper bound of $\Delta N_X = 41 \pm 66$ to
give a 99.7% C.L., we get an upper limit for the $\beta^2/2$ parameter:

$$\frac{\beta^2}{2} \leq \frac{3 \times 66}{4.7 \times 10^{30}} = 4.2 \times 10^{-29}. \tag{3}$$
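
The denominator in Equation (3) follows from Equations (1) and (2) with the numbers quoted in the text. The sketch below retraces that arithmetic; it assumes a detection efficiency factor of exactly 1% (the text says "about 1%") and 34 full days of live time, and the function and constant names are illustrative.

```python
# Inputs quoted in the text.
ELECTRON_CHARGE_C = 1.602e-19    # e
STRIP_LENGTH_CM = 10.0           # D
MEAN_FREE_PATH_CM = 3.9e-6       # mu
CURRENT_A = 100.0                # I
LIVE_TIME_S = 34 * 86400         # 34 days of current-on data (assumed exact)
CAPTURE_FRACTION = 0.1           # capture probability > 1/10 of scattering probability
DETECTION_EFFICIENCY = 0.01      # "about 1%" (assumed exactly 1% here)
DELTA_N_SIGMA = 66.0             # uncertainty on Delta N_X


def sensitivity_factor():
    """Factor multiplying beta^2/2 in Equation (1)."""
    n_new = CURRENT_A * LIVE_TIME_S / ELECTRON_CHARGE_C   # Equation (2)
    n_int = STRIP_LENGTH_CM / MEAN_FREE_PATH_CM           # N_int ~ D / mu
    return n_new * CAPTURE_FRACTION * n_int * DETECTION_EFFICIENCY


def beta2_over_2_upper_limit(n_sigma=3.0):
    """Upper limit on beta^2/2 from an n-sigma bound on Delta N_X."""
    return n_sigma * DELTA_N_SIGMA / sensitivity_factor()


print(f"sensitivity factor = {sensitivity_factor():.2e}")
print(f"beta^2/2 <= {beta2_over_2_upper_limit():.1e}")
```

Because the limit scales inversely with the accumulated charge, a longer run at the same current tightens it only through the slowly shrinking statistical uncertainty and the growing denominator.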


4. Conclusions and Future Perspectives

The first VIP-2 physics run from two months of data collection already gave a better limit than
the VIP result obtained from three years of running.
In Figure 3, we show all the past experimental results of the PEP violation tests for electrons with
a copper conductor, together with this work. The new result shows that in the planned data collection
time of 3 to 4 years, the VIP-2 experiment can either set a new upper limit for the probability that the
PEP is violated at the level of $10^{-31}$, improving the VIP experiment result by two orders of magnitude,
or find the PEP violation, which would have profound implications in science and philosophy.

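[Figure 3 plot. Upper limits on $\beta^2/2$ (vertical axis, from $10^{-27}$ down to $10^{-31}$) versus year (horizontal axis, 1990 to 2020), with points for Ramberg and Snow, VIP 2006, VIP 2011, this work, and the VIP-2 goal.]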

Figure 3. All the past results from Pauli exclusion principle (PEP) violation tests for electrons with
a copper conductor, together with the result from this work and the anticipated goal of the VIP-2
experiment. Note that the result of this work comes from two months of data collection, and it is
already compatible with the VIP result from three years of operation.

We conclude with the words of Lev Okun from his 1987 paper [18]: “The special place enjoyed by
the Pauli principle in modern theoretical physics does not mean that this principle does not require further and
exhaustive experimental tests. On the contrary, it is specifically the fundamental nature of the Pauli principle
which would make such tests, over the entire periodic table, of special interest”.

Acknowledgments: We thank H. Schneider, L. Stohwasser, and D. Stückler from Stefan-Meyer-Institut for
their fundamental contribution in designing and building the VIP-2 setup. We acknowledge the very important
assistance of the INFN-LNGS laboratory staff during all phases of preparation, installation and data taking.
We thank the Austrian Science Foundation (FWF) which supports the VIP-2 project with the grant P25529-N20.
We acknowledge the support from the EU COST Action CA15220, and from Centro Fermi (“Problemi aperti nella
meccanica quantistica” project). Furthermore, this paper was made possible through the support of a grant from
the John Templeton Foundation (ID 58158). The opinions expressed in this publication are those of the authors
and do not necessarily reflect the views of the John Templeton Foundation.
Author Contributions: Sergio Bertolucci, Catalina Curceanu, Jean-Pierre Egger, Carlo Guaraldo, Edoardo Milotti,
Eberhard Widmann, Johann Zmeskal conceived and designed the experiment; Massimiliano Bazzi, Carolina
Berucci, Alberto Clozza, Mihail Iliescu, Andreas Pichler, Hexi Shi, Florin Sirghi, Johann Zmeskal prepared the
setup; Mihail Iliescu and Hexi Shi prepared the readout and data taking system; Andreas Pichler, Hexi Shi, Johann
Zmeskal performed the detector tests; Mario Bragadireanu, Alberto Clozza, Catalina Curceanu, Mihail Iliescu,
Matthias Laubenstein, Johann Marton, Marco Miliucci, Andreas Pichler, Dorel Pietreanu, Kristian Piscicchia,
Alessandro Scordo, Hexi Shi, Florin Sirghi, Johann Zmeskal contributed to the installation and data taking;
Hexi Shi, Andreas Pichler, Michael Cargnelli, Luca De Paolis performed the data analysis; Sergio Bartalucci,
Diana Laura Sirghi, Laura Sperandio, Oton Vazquez Doce provided details of the VIP analysis; Sergio Di Matteo
provided theoretical support for the data analyses; Catalina Curceanu and Hexi Shi wrote the paper.
Conflicts of Interest: The authors declare no conflict of interest.


Abbreviations
PEP Pauli Exclusion Principle
VIP(-2) experiment VIolation of Pauli principle (-2) experiment
CCD Charge-Coupled Device
SDD Silicon Drift Detector
NIM Nuclear Instrumentation Module
VME Versa Module Europa
QDC Charge-to-Digital Converter
LNGS Laboratori Nazionali del Gran Sasso
FWHM Full Width Half Maximum
ROI Region of Interest

References
1. Messiah, A.M.L. Quantum Mechanics, Volume II; North-Holland: Amsterdam, The Netherlands, 1962; p. 595.
2. Messiah, A.M.L.; Greenberg, O.W. Symmetrization Postulate and Its Experimental Foundation. Phys. Rev.
1964, 136, B248.
3. Greenberg, O.W. Theories of Violation of Statistics. AIP Conf. Proc. 2000, 545, 113, doi: 10.1063/1.1337721.
4. Elliott, S.R.; LaRoque, B.H.; Gehman, V.M.; Kidd, M.F.; Chen, M. An Improved Limit on
Pauli-Exclusion-Principle Forbidden Atomic Transitions. Found. Phys. 2012, 42, 1015–1030.
5. Ramberg, E.; Snow, G.A. Experimental Limit on a Small Violation of the Pauli Principle. Phys. Lett. B 1990,
238, 438–441.
6. Curceanu, C.; De Paolis, L.; Di Matteo, S.; Di Matteo, H.; Sperandio, S. Evaluation of the X-ray
Transition Energies for the Pauli-Principle-Violating Atomic Transitions in Several Elements by Using the
Dirac-Fock Method. Available online: http://www.lnf.infn.it/sis/preprint/detail.php?id=5330 (accessed on
23 June 2017).
7. Prange, R.; Girvin, S.M. The Quantum Hall Effect; Springer: New York, NY, USA, 1990.
8. Stern, A. Anyons and the quantum Hall effect—A pedagogical review. Ann. Phys. 2008, 323, 204–249.
9. Curceanu, C.; Bartalucci, S.; Bertolucci, S.; Bragadireanu, M.; Cargnelli, M.; Di Matteo, S.; Egger, J.-P.;
Guaraldo, C.; Iliescu, M.; Ishiwatari, T.; et al. Experimental tests of quantum mechanics—Pauli exclusion
principle violation (the VIP experiment) and future perspective. J. Phys. Conf. Ser. 2011, 306, 012036,
doi:10.1088/1742-6596/306/1/012036.
10. Bartalucci, S.; Bertolucci, S.; Bragadireanu, M.; Cargnelli, M.; Curceanu, C.; Di Matteo, S.; Egger, J.-P.;
Guaraldo, C.; Iliescu, M.; Ishiwatari, T.; et al. The VIP experimental limit on the Pauli exclusion principle
violation by electrons. Found. Phys. 2009, 40, 765–775.
11. Sperandio, L. New Experimental Limit on the Pauli Exclusion Principle Violation by Electrons From the VIP
Experiment. Ph.D. Thesis, Tor Vergata University, Rome, Italy, 2008.
12. Shi, H.; Bartalucci, S.; Bertolucci, S.; Berucci, C.; Bragadireanu, A.M.; Cargnelli, M.; Clozza, A.; Curceanu, C.;
De Paolis, L.; Di Matteo, S.; et al. Searches for the Violation of Pauli Exclusion Principle at LNGS in VIP(-2)
experiment. J. Phys. Conf. Ser. 2016, 718, 042055, doi:10.1088/1742-6596/718/4/042055.
13. Pichler, A.; Bartalucci, S.; Bazzi, M.; Bertolucci, S.; Berucci, C.; Bragadireanu, M.; Cargnelli, M.; Clozza, A.;
Curceanu, C.; De Paolis, L.; et al. Application of photon detectors in the VIP-2 experiment to test the Pauli
Exclusion Principle. J. Phys. Conf. Ser. 2016, 718, 052030, doi:10.1088/1742-6596/718/5/052030.
14. Shi, H.; Bartalucci, S.; Bertolucci, S.; Berucci, C.; Bragadireanu, A.M.; Cargnelli, M.; Clozza, A.; Curceanu, C.;
De Paolis, L.; Di Matteo, S.; et al. Testing the Pauli Exclusion Principle for electronics at LNGS. Phys. Procedia
2015, 62, 522–559.
15. Marton, J.; Bartalucci, S.; Bertolucci, S.; Berucci, C.; Bragadireanu, M.; Cargnelli, M.; Curceanu, C.;
Di Matteo, S.; Egger, J.-P.; Guaraldo, C.; et al. Testing the Pauli Exclusion Principle for Electrons. J. Phys.
Conf. Ser. 2013, 447, 012060, doi:10.1088/1742-6596/335/1/012060.
16. Greenberg, O.W.; Mohapatra, R.N. Local Quantum Field Theory of Possible Violation of the Pauli Principle.
Phys. Rev. Lett. 1987, 59, 2507.


17. VIP Collaboration; Bartalucci, S.; Bertolucci, S.; Bragadireanu, M.; Cargnelli, M.; Catitti, M.; Curceanu, C.;
Di Matteo, S.; Egger, J.-P.; Guaraldo, C.; et al. New experimental limit on the Pauli exclusion principle
violation by electrons. Phys. Lett. B 2006, 641, 18–22.
18. Okun, L. Possible violation of the Pauli principle in atoms. JETP Lett. 1987, 46, 529–532.

entropy
Article
CSL Collapse Model Mapped with the
Spontaneous Radiation
Kristian Piscicchia 1,2, *, Angelo Bassi 3,4 , Catalina Curceanu 2,1 , Raffaele Del Grande 2 ,
Sandro Donadi 5 , Beatrix C. Hiesmayr 6 and Andreas Pichler 7
1 CENTRO FERMI—Museo Storico della Fisica e Centro Studi e Ricerche “Enrico Fermi”, 00184 Rome, Italy
2 Istituto Nazionale di Fisica Nucleare (INFN), Laboratori Nazionali di Frascati, 00044 Frascati, Italy
3 Department of Physics, University of Trieste, 34151 Miramare-Trieste, Italy
4 Istituto Nazionale di Fisica Nucleare, Sezione di Trieste, Via Valerio 2, 34127 Trieste, Italy
5 Institute of Theoretical Physics, Ulm University, Albert-Einstein-Allee 11 D, 89069 Ulm, Germany
6 Faculty of Physics, University of Vienna, Boltzmanngasse 5, 1090 Vienna, Austria
7 Stefan-Meyer-Institut für Subatomare Physik, 1090 Vienna, Austria
* Correspondence: [email protected]; Tel.: +39-06-9403-2654

Received: 30 April 2017; Accepted: 25 June 2017; Published: 29 June 2017

Abstract: In this paper, new upper limits on the parameters of the Continuous Spontaneous
Localization (CSL) collapse model are extracted. To this end, the X-ray emission data collected by
the IGEX collaboration are analyzed and compared with the spectrum of the spontaneous photon
emission process predicted by collapse models. This study yields the most stringent limits
obtained by any method within a relevant range of the CSL model parameters.
The collapse rate λ and the correlation length $r_C$ are mapped, thus allowing the exclusion of
a broad range of the parameter space.

Keywords: quantum mechanics; the measurement problem; collapse models; X-rays

1. The CSL Collapse Model

Collapse models are phenomenological models introduced to solve the measurement problem of
quantum mechanics and explain the quantum-to-classical transition [1–6]. According to these models,
the linear and unitary evolution given by the Schrödinger equation is modified by adding a non-linear
term and the interaction with a stochastic noise field. These modifications have two very important
consequences: (i) they lead to the collapse of the wave function of the system in space (localization
mechanism) and (ii) the collapse effects get amplified with the mass of the system (amplification
mechanism). The combination of these two properties guarantees that macroscopic objects always
have well defined positions, explaining why we do not observe quantum behaviour at the macroscopic
level. On the other hand, for microscopic systems, the effect of the non-linear interaction with the noise
field is very small and their dynamics is dominated by the Schrödinger evolution. Due to the presence
of the non-linear interaction with the noise field, collapse models predict slight deviations from the
standard quantum mechanics predictions [7].
The analysis discussed in this work sets limits on the characteristic parameters of the Continuous
Spontaneous Localization (CSL) model [8–10], which is one of the most relevant and well-studied
collapse models in the literature. In the CSL model, the state vector evolution is described by a modified
Schrödinger equation which contains, besides the standard Hamiltonian, non-linear and stochastic
terms, characterized by the interaction with a continuous set of independent noises w(x, t) (one for
each point of the space, which is why this set is often referred to as “noise field”) having zero average
and white correlation in time, i.e., E[w(x, t)] = 0 and E[w(x, t)w(y, s)] = δ(x − y)δ(t − s) where E[...]
denotes the average over the noises. Two phenomenological parameters (λ and $r_C$) are introduced in
the model. The parameter λ has the dimensions of a rate and sets the strength of the collapse, while $r_C$ is
a correlation length which determines the spatial resolution of the collapse: for superpositions with size
much smaller than $r_C$, the collapse is much weaker than when the superposition has a
delocalization much larger than $r_C$. The originally proposed values for λ and $r_C$ are [8] $\lambda = 10^{-16}$ s$^{-1}$,
$r_C = 10^{-7}$ m. Higher values for λ were however put forward [11], up to $\lambda = 10^{-8 \pm 2}$ s$^{-1}$.

Entropy 2017, 19, 319; doi:10.3390/e19070319; www.mdpi.com/journal/entropy
The interaction with the noise field causes an extra emission of electromagnetic radiation for
charged particles [7], which is not predicted by standard quantum mechanics. Such an effect is known
as spontaneous radiation emission. We show that the measurement of the radiation allows for a mapping
of the two relevant parameters λ and rC (see also Ref. [12]) into a two-dimensional parameter space,
i.e., we can present an exclusion plot. This gives a considerable reduction of the possible values in the
parameter space of collapse models.

2. The Collapse Rate Parameter λ

The energy distribution of the spontaneous radiation, emitted as a consequence of the interaction
of free electrons with the collapsing stochastic field, was first calculated by Fu [7] and later on studied
in more detail in [13–15], in the framework of the non-relativistic CSL model. If the stochastic field
is assumed to be a white noise, coupled to the particle mass density (mass proportional CSL model),
the spontaneous emission rate is given by:

$$\frac{d\Gamma(E)}{dE} = \frac{e^2 \lambda}{4\pi^2 r_C^2 m_N^2 E}, \tag{1}$$

where $e$ is the charge of the proton, $m_N$ represents the nucleon mass and $E$ is the energy of the emitted
photon. In the non-mass proportional case, the rate takes the expression:

$$\frac{d\Gamma(E)}{dE} = \frac{e^2 \lambda}{4\pi^2 r_C^2 m_e^2 E}, \tag{2}$$

with $m_e$ the electron mass.

Using the measured radiation emitted in an isolated slab of Germanium [16] corresponding to an
energy of 11 keV, and comparing it with the predicted rate in Equations (1) and (2), Fu extracted the
following upper limits on λ for the two cases:

$$\lambda \leq 2.20 \cdot 10^{-10}\ \mathrm{s}^{-1} \quad \text{mass prop.}, \tag{3}$$

$$\lambda \leq 0.55 \cdot 10^{-16}\ \mathrm{s}^{-1} \quad \text{non-mass prop.}, \tag{4}$$

assuming that the correlation length value is $r_C = 10^{-7}$ m. In his estimate, Fu considered the
contribution to the spontaneous X-ray emission of the four valence electrons in the Germanium atoms.
Such electrons can be considered as quasi-free, since their binding energy (of the order of ∼10 eV) is
much less than the emitted photons' energy. In Ref. [11], the author argues that an erroneous value
for the fine structure constant is used in Ref. [7]. This correction is taken into account in the analysis
described in Section 3. Further, the preliminary TWIN data set [16] used by Fu to estimate the upper
limit on λ turned out to be underestimated by a factor of about 50 at 10 keV.
A new analysis was performed in Ref. [17]. Based on the improved data presented in Ref. [18],
the limits corresponding to the footnote [7] in Ref. [17], for the cases of mass proportional and non-mass
proportional CSL models, were:

$$\lambda \leq 8 \cdot 10^{-10}\ \mathrm{s}^{-1} \quad \text{mass prop.}, \tag{5}$$

$$\lambda \leq 2 \cdot 10^{-16}\ \mathrm{s}^{-1} \quad \text{non-mass prop.} \tag{6}$$


3. A New Limit on λ
In this work, the X-ray emission spectrum measured by the IGEX experiment [19] is analysed
in order to set a more stringent limit on the collapse rate parameter λ. IGEX is a low-background
experiment based on low-activity Germanium detectors, originally dedicated to the search for neutrinoless
double beta decay (ββ0ν). The published data set [20] refers to an exposure of 80 kg day, and was
taken to search for a signal from dark matter WIMPs scattering elastically and producing
Ge nuclear recoils.
For the measurement in Ref. [20], one of the IGEX detectors of 2.2 kg (active mass of about
2 kg) was used. The detector, the cryostat and the shielding were fabricated following ultra-low
background techniques, in order to minimize the emission from radionuclides, which represents the main
background source in the measured X-ray spectrum (shown in Figure 1 as a black distribution).
Moreover, a cosmic muon veto covered the top and the sides of the shield. The experiment had
an overburden of 2450 m.w.e., reducing the muon flux to the value of $2 \cdot 10^{-7}$ cm$^{-2}$ s$^{-1}$. The two
main sources of inefficiency are the muon veto anti-coincidence and the pulse shape
analysis. The probability of rejecting non-coincident events with the muon veto was found to be less
than 0.01. The loss of efficiency introduced by the pulse shape analysis proved to be negligible for
events above 4 keV.

Figure 1. Fit of the X-ray emission spectrum measured by the IGEX experiment [19,20], using the
theoretical fit function Equation (7). The black line corresponds to the experimental distribution; the red
dashed line represents the fit. See the text for more details.

The X-ray spectrum (Figure 1) ranges in the interval (4.5 ÷ 48.5) keV, which is compatible with
the non-relativistic assumption for electrons, used to derive Equations (1) and (2).

3.1. The Data Analysis: Procedure and Results

The experimental X-ray spectrum published in [20] is compared with the predicted rate of
Equations (1) and (2), taking into account the spontaneous emission of the 30 outermost electrons of
the Ge atoms, considered as quasi-free. We restricted our analysis to the energy range ΔE = (14.5 ÷ 48.5)
keV of the experimental spectrum [20], for which the binding energy of the lowest-lying of these electronic
orbits (the 2s orbit) is still one order of magnitude lower than 14.5 keV, justifying the quasi-free hypothesis.


The X-ray spectrum is fitted in the interval ΔE by minimising a $\chi^2$ function. The expected number of
counts in each 1 keV bin is assumed to be described by the theoretical prediction of Equations (1) and (2):

$$\frac{d\Gamma(E)}{dE} = \frac{\alpha(\lambda)}{E}. \tag{7}$$

The $\chi^2$ minimisation presumes that the bin contents $y_i$ (number of counts in the energy
bin $E_i$) follow Gaussian distributions. Strictly speaking, the $y_i$ are Poissonian stochastic variables;
nevertheless, the approximation is reasonable for $y_i \geq 5$; this constraint is then used for the fit.
The result of the fit is shown in Figure 1 (red dashed line). For the free parameter of the fit, the
minimization gives the value $\alpha(\lambda) = 115 \pm 17$, corresponding to a reduced $\chi^2/(\mathrm{n.d.f.} - \mathrm{n.p.}) = 0.9$,
where n.d.f. is the number of degrees of freedom and n.p. is the number of free parameters of the fit.
$\alpha(\lambda)$ is also considered to follow a Gaussian distribution to a good approximation. An upper limit
can then be set as $\alpha(\lambda) \leq 143$ with a probability of 95%. Correspondingly, an upper limit on the
parameter λ can be extracted using Equations (1) and (2):

$$\frac{d\Gamma(E)}{dE} = c\,\frac{e^2 \lambda}{4\pi^2 r_C^2 m^2 E} \leq \frac{143}{E}, \tag{8}$$

where the factor $c$ is given by:

$$c = \left(8.29 \times 10^{24}\ \frac{\text{atoms}}{\text{kg}}\right) \cdot (80\ \text{kg day}) \cdot \left(8.64 \times 10^{4}\ \frac{\text{n. of seconds}}{\text{day}}\right) \cdot (30). \tag{9}$$

The first bracket accounts for the particle density of Germanium, the second represents the amount of
emitting material expressed in kg day, the third term is the number of seconds in one day and 30 represents
the number of spontaneously emitting electrons for each Germanium atom. Applying Equation (8),
the following upper limits for the reduction rate parameter are obtained, with a probability of 95%:

$$\lambda \leq 8.1 \cdot 10^{-12}\ \mathrm{s}^{-1} \quad \text{mass prop.}, \tag{10}$$

$$\lambda \leq 2.4 \cdot 10^{-18}\ \mathrm{s}^{-1} \quad \text{non-mass prop.} \tag{11}$$
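
The step from the bound on α(λ) to the limits in Equations (10) and (11) can be retraced numerically. The sketch below is an illustration under explicit assumptions, not the authors' code: it takes the 95% bound as a one-sided Gaussian bound, works in natural units with $e^2 = 4\pi\alpha_{\mathrm{fs}}$ (so that $r_C m$ becomes the dimensionless $r_C m c/\hbar$), and uses the proton mass for the nucleon mass. With these choices the quoted values are recovered.

```python
import math
from statistics import NormalDist

# Physical constants (SI).
HBAR = 1.054571817e-34        # J s
C_LIGHT = 2.99792458e8        # m / s
M_NUCLEON = 1.67262192e-27    # kg, proton mass used as nucleon mass (assumption)
M_ELECTRON = 9.1093837e-31    # kg
ALPHA_FS = 1.0 / 137.035999   # fine structure constant
R_C = 1.0e-7                  # m, correlation length used in the text

# One-sided 95% Gaussian bound on the fit parameter (assumed convention).
ALPHA_FIT, ALPHA_ERR = 115.0, 17.0
alpha_95 = ALPHA_FIT + NormalDist().inv_cdf(0.95) * ALPHA_ERR

# Equation (9): atoms/kg * kg day * s/day * emitting electrons per atom.
c_factor = 8.29e24 * 80 * 8.64e4 * 30


def lambda_limit(mass_kg, alpha_max=143.0):
    """Invert Equation (8) for lambda in s^-1; the photon energy E cancels."""
    e_squared = 4 * math.pi * ALPHA_FS            # assumption: natural units
    rc_times_m = R_C * mass_kg * C_LIGHT / HBAR   # dimensionless r_C * m
    return alpha_max * 4 * math.pi**2 * rc_times_m**2 / (c_factor * e_squared)


print(f"alpha(95%) = {alpha_95:.1f}")
print(f"mass prop.:     lambda <= {lambda_limit(M_NUCLEON):.1e} s^-1")
print(f"non-mass prop.: lambda <= {lambda_limit(M_ELECTRON):.1e} s^-1")
```

The two limits differ only through the mass in Equation (8), so their ratio is $(m_e/m_N)^2$, which is why the non-mass proportional bound is about six orders of magnitude tighter.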

In order to obtain the limits in Equations (10) and (11), two implicit assumptions are made on
the experimental input [20]. First, the measured spectrum is assumed to be background free, that is
to say that the upper limit on λ corresponds to the case in which all the measured X-ray emission
is produced by spontaneous emission processes. This ansatz is conservative, and is imposed
by our ignorance regarding the contribution from known emission processes to the measured rate.
The second assumption, which is consistent with the analysis presented in Ref. [20], is that the detector
efficiency in the range ΔE is one, and that the inefficiencies introduced by the muon
veto anticoincidence and the pulse shape analysis, performed to extract the experimental spectrum in
Ref. [20], are very small for events above 4 keV.
With these assumptions in mind, the measured X-ray counts in the range ΔE can be re-analysed
in terms of their low-count Poissonian statistics. The numbers of counts $y_i$ in each energy bin $E_i$ can
be considered as independent stochastic variables following the distributions:

$$G(y_i | P, \Lambda_i) = \frac{\Lambda_i^{y_i} e^{-\Lambda_i}}{y_i!}, \tag{12}$$

where $P$ denotes the Poisson distribution function. The expected numbers of counts per bin $\Lambda_i$ are
indicated with capital letters, not to be confused with the spontaneous collapse rate λ. Let us define:

$$y = \sum_{i=1}^{n} y_i, \qquad \Lambda = \sum_{i=1}^{n} \Lambda_i, \tag{13}$$


where $n$ is the total number of 1 keV bins in the range ΔE, and $y$ and $\Lambda$ are the total number of counts and
the expected number of total counts, respectively. Here, $y$ is distributed according to a Poissonian of
parameter $\Lambda(\lambda)$, where the dependence on the collapse rate parameter, which follows the theoretical
input, is explicitly indicated.
According to the Bayes theorem, the probability distribution function of $\Lambda(\lambda)$, given the measured
$y$ and assuming a uniform prior, is given by:

$$G(\Lambda | G(y | P, \Lambda)) \propto \Lambda(\lambda)^{y} e^{-\Lambda(\lambda)}, \tag{14}$$

which means that $G(\lambda)$ is proportional to a gamma probability distribution. Due to the assumption
that the background is negligible, $\Lambda(\lambda)$ also represents the expected number of total signal counts $y_s$,
where $y_s$ is a Poissonian variable. Thus, according to Equation (8):

$$\Lambda(\lambda) = y_s + 1 = \sum_{i=1}^{n} c\,\frac{e^2 \lambda}{4\pi^2 r_C^2 m^2 E_i} + 1 = \sum_{i=1}^{n} \frac{\alpha(\lambda)}{E_i} + 1. \tag{15}$$

Substituting Equation (15) into Equation (14), the probability distribution function for the collapse
rate parameter can then be obtained:

$$G(\lambda | G(y | P, \Lambda)) \propto \left( \sum_{i=1}^{n} \frac{\alpha(\lambda)}{E_i} + 1 \right)^{y} e^{-\left( \sum_{i=1}^{n} \frac{\alpha(\lambda)}{E_i} + 1 \right)}, \tag{16}$$

where the measured total number of counts is $y = 130$. Calculating the cumulative distribution function:

$$\int_{0}^{\lambda_0} G(\lambda | G(y | P, \Lambda))\, d\lambda, \tag{17}$$

the following upper limits on the collapse rate parameter are obtained, setting $r_C$ to the value
$10^{-7}$ m, corresponding to a probability level of 95%:

$$\lambda \leq 6.8 \cdot 10^{-12}\ \mathrm{s}^{-1} \quad \text{mass prop.}, \tag{18}$$

$$\lambda \leq 2.0 \cdot 10^{-18}\ \mathrm{s}^{-1} \quad \text{non-mass prop.} \tag{19}$$
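
The Bayesian step in Equations (14) to (17) can be illustrated with a simple numerical integration. The sketch below is a hedged illustration rather than the authors' procedure: it computes only the 95% upper bound on the expected total counts Λ for y = 130 with a flat prior on Λ ≥ 1 (the lower edge implied by Equation (15) for λ ≥ 0). Converting that bound into a limit on λ through Equation (15) requires the actual bin energies used in the fit, which are not listed in the text, so that step is deliberately left out. The function name and grid settings are illustrative.

```python
import math


def poisson_upper_bound(y, cl=0.95, step=0.01):
    """Upper bound on the expected counts from the posterior
    proportional to L**y * exp(-L), flat prior on L >= 1."""
    lam_max = y + 20.0 * math.sqrt(y + 1.0) + 20.0   # far beyond the posterior peak
    n_points = int((lam_max - 1.0) / step) + 1
    grid = [1.0 + k * step for k in range(n_points)]

    # Work with logarithms and subtract the peak value to avoid overflow.
    log_peak = y * math.log(y) - y if y > 0 else 0.0
    weights = [math.exp(y * math.log(lam) - lam - log_peak) for lam in grid]

    # Cumulative sum on the grid approximates the integral in Equation (17).
    total = sum(weights)
    running = 0.0
    for lam, w in zip(grid, weights):
        running += w
        if running >= cl * total:
            return lam
    return grid[-1]


bound = poisson_upper_bound(130)
print(f"Lambda(95%) ~ {bound:.1f}")
```

Since Λ is linear in λ, a flat prior on Λ corresponds to a flat prior on λ, and the same cumulative fraction applies to both variables.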

4. Mapping CSL Parameters Space

In Figure 2, we present the mapping of the λ − rC parameters of the CSL model, where the
originally proposed theoretical values are shown, together with our results. The region excluded by
theoretical arguments is represented in gray. This theoretical bound (see Ref. [21]) is obtained by
requiring that a single-layered graphene disk of radius ∼0.01 mm is localized within ∼10 ms (these
are the minimum resolution and perception time of the human eye, respectively).
The region excluded by this analysis is shown in cyan for the non-mass proportional case and in
magenta for the mass proportional case. Figure 2 can be compared with Figure 2 in Ref. [22], where
the mapping is obtained using other measurements. It is interesting to note that, for a collapse induced
by a white noise, the allowed parameter space is confined to a drastically reduced region.


Figure 2. Mapping of the λ − rC Continuous Spontaneous Localization (CSL) parameters: the originally
proposed theoretical values (GRW, Adler) are shown as black points; the region excluded by theory
(theory) is represented in gray. The excluded region according to our analysis is shown in cyan for the
non-mass proportional case (n-m-p) and in magenta for the mass proportional case (m-p).

5. Conclusions and Perspectives

We have presented an analysis of the spontaneous radiation emitted and measured by the IGEX
Germanium detector, to obtain a mapping of the CSL collapse model parameters. The results shown in
Figure 2 can be summarized as follows:
• the non-mass proportional model for a white noise scenario can be excluded by our analysis,
• the higher value of λ [11] can be excluded for a white noise scenario, in both mass proportional
and non-mass proportional models,
• the measurement of the spontaneous radiation yields the most stringent limits
on the CSL collapse model parameters, compared with any other method, in a broad range of the
parameter space (see also Ref. [22] for comparison).
We are presently exploring the possibility of performing a new measurement that will allow an
improvement of at least one order of magnitude on the collapse rate parameter λ, exploring new
regions of CSL mapping.

Acknowledgments: We acknowledge the support of the CENTRO FERMI - Museo Storico della Fisica e Centro
Studi e Ricerche “Enrico Fermi” (Open Problems in Quantum Mechanics project) and gratefully acknowledge
the support from the EU COST Action CA 15220. Furthermore, this paper was made possible through the support of a
grant from the Foundational Questions Institute, FQXi (“Events” as we see them: experimental test of the collapse
models as a solution of the measurement problem) and a grant from the John Templeton Foundation (ID 58158).
The opinions expressed in this publication are those of the authors and do not necessarily reflect the views of the
John Templeton Foundation. Beatrix C. Hiesmayr gratefully acknowledges the support of the Austrian Science
Fund (FWF-P26783). S. Donadi acknowledges the support of Trieste University and Istituto Nazionale di Fisica
Nucleare (INFN).
Author Contributions: Kristian Piscicchia, Catalina Curceanu, Raffaele Del Grande and Andreas Pichler analyzed
the data; Angelo Bassi, Sandro Donadi and Beatrix C. Hiesmayr gave the theoretical support for data analyses and
interpretation; Kristian Piscicchia and Catalina Curceanu wrote the paper. All authors have read and approved
the final manuscript.

Conflicts of Interest: The authors declare no conflict of interest.

References
1. Bassi, A.; Ghirardi, G.C. Dynamical reduction models. Phys. Rep. 2003, 379, 257–426.
2. Pearle, P. Collapse Models Open Systems and Measurements in Relativistic Quantum Field Theory; Lecture Notes in
Physics; Breuer, H.-P., Petruccione, F., Eds.; Springer: Berlin/Heidelberg, Germany, 1999; Volume 526.
3. Diósi, L. Models for Universal Reduction of Macroscopic Quantum Fluctuations. Phys. Rev. A 1989, 40, 1165.
4. Bassi, A. Collapse Models: Analysis of the Free Particle Dynamics. Available online: https://arxiv.org/abs/quant-ph/0410222.pdf
(accessed on 25 March 2009).
5. Adler, S.L. Quantum Theory as an Emergent Phenomenon; Cambridge University Press: Cambridge, UK, 2004;
Chapter 6.
6. Weber, T. Quantum mechanics with spontaneous localization revisited. Il Nuovo Cimento B 1991, 106, 1111–1124.
7. Fu, Q. Spontaneous radiation of free electrons in a nonrelativistic collapse model. Phys. Rev. A 1997, 56, 1806.
8. Ghirardi, G.; Rimini, A.; Weber, T. Unified dynamics for microscopic and macroscopic systems. Phys. Rev. D
1986, 34, 470.
9. Pearle, P. Combining stochastic dynamical state-vector reduction with spontaneous localization. Phys. Rev. A
1989, 39, 2277.
10. Ghirardi, G.C.; Pearle, P.; Rimini, A. Markov processes in Hilbert space and continuous spontaneous
localization of systems of identical particles. Phys. Rev. A 1990, 42, 78.
11. Adler, S.L. Lower and Upper Bounds on CSL Parameters from Latent Image Formation and IGM Heating.
J. Phys. A 2007, 40, 2935–2958.
12. Curceanu, C.; Hiesmayr, B.C.; Piscicchia, K. X-rays help to unfuzzy the concept of measurement. J. Adv. Phys.
2015, 4, 263–266.
13. Adler, S.L.; Ramazanoglu, F.M. Photon emission rate from atomic systems in the CSL model. J. Phys. A 2007,
40, 13395–13406.
14. Adler, S.L.; Bassi, A.; Donadi, S. On spontaneous photon emission in collapse models. J. Phys. A 2013,
46, 245304.
15. Donadi, S.; Bassi, A.; Deckert, D.-A. On the spontaneous emission of electromagnetic radiation in the CSL
model. Ann. Phys. 2014, 340, 70–86.
16. Miley, H.S.; Avignone, F.T., III; Brodzinski, R.L.; Collar, J.I.; Reeves, J.H. Suggestive evidence for the two
neutrino double beta decay of Ge-76. Phys. Rev. Lett. 1990, 65, 3092.
17. Laloë, F.; Mullin, W.J.; Pearle, P. Heating of trapped ultracold atoms by collapse dynamics. Phys. Rev. A 2014,
90, 52119.
18. Collett, B.; Pearle, P.; Avignone, F.; Nussinov, S. Constraint on collapse models by limit on spontaneous X-ray
emission in Ge. Found. Phys. 1995, 25, 1399–1412.
19. Aalseth, C.E.; Avignone, F.T., III; Brodzinski, R.L.; Collar, J.I.; Garcia, E.; González, D.; Hasenbalg, F.;
Hensley, W.K.; Kirpichnikov, I.V.; Klimenko, A.A.; et al. Neutrinoless double-beta decay of Ge-76: First results
from the International Germanium Experiment (IGEX) with six isotopically enriched detectors. IGEX Collab.
Phys. Rev. C 1999, 59, 2108.
20. Morales, A.; Aalseth, C.E.; Avignone, F.T., III; Brodzinski, R.L.; Cebrian, S.; Garcia, E.; Irastorza, I.G.;
Kirpichnikov, I.V.; Klimenko, A.A.; Miley, H.S.; et al. Improved constraints on WIMPs from the international
Germanium experiment IGEX. IGEX Collab. Phys. Lett. B 2002, 532, 8–14.
21. Toroš, M.; Gasbarri, G.; Bassi, A. Bounds on Collapse Models from Matter-Wave Interferometry. Available
online: https://arxiv.org/pdf/1601.03672.pdf (accessed on 31 May 2017).
22. Carlesso, M.; Bassi, A.; Falferi, P.; Vinante, A. Experimental bounds on collapse models from gravitational
wave detectors. Phys. Rev. D 2016, 94, 124036.

Article
Quantum Information: What Is It All About?
Robert B. Griffiths
Department of Physics, Carnegie Mellon University, Pittsburgh, PA 15213, USA; [email protected]

Received: 23 October 2017; Accepted: 22 November 2017; Published: 29 November 2017

Abstract: This paper answers Bell’s question: What does quantum information refer to? It is about
quantum properties represented by subspaces of the quantum Hilbert space, or their projectors,
to which standard (Kolmogorov) probabilities can be assigned by using a projective decomposition
of the identity (PDI or framework) as a quantum sample space. The single framework rule of
consistent histories prevents paradoxes or contradictions. When only one framework is employed,
classical (Shannon) information theory can be imported unchanged into the quantum domain.
A particular case is the macroscopic world of classical physics whose quantum description needs
only a single quasiclassical framework. Nontrivial issues unique to quantum information, those with
no classical analog, arise when aspects of two or more incompatible frameworks are compared.

Keywords: Shannon information; quantum information; quantum measurements; consistent histories;
incompatible frameworks; single framework rule

1. Introduction
A serious study of the relationship between quantum information and quantum foundations
needs to address Bell’s rather disparaging question, “Quantum information ... about what?” found
in the third section of his polemic against the role of measurement in standard (textbook) quantum
mechanics [1]. The basic issue has to do with quantum ontology, “beables” in Bell’s language. I believe
a satisfactory answer to Bell’s question is available, indeed was already available (in a somewhat
preliminary form) at the time he was writing. (If he was aware of it, Bell did not mention it in any of
his publications.) Further developments have occurred since, and I have found this approach to be
of some value in addressing some of the foundational issues which have come up during my own
research on quantum information. So I hope the remarks which follow may assist others who ﬁnd the
textbook (both quantum and quantum information) presentations confusing or inadequate, and are
looking for something better.
Here is a summary of the remainder of this paper. The discussion begins in Section 2 by asking
Bell’s question about classical (Shannon) information: what is it all about? That theory works very
well in the world of macroscopic objects and properties. Hence if classical physics is fundamentally
quantum mechanical, as I and many others believe, and if Shannon’s approach is, as a consequence,
quantum information theory applied to the domain of macroscopic phenomena, we are already
half way to answering Bell’s question. The other half requires extending Shannon’s ideas into the
microscopic domain where classical physics fails and quantum theory is essential. This is possible,
Section 3, using a consistent formulation of standard (Kolmogorov) probability theory applied to
the quantum domain. Current quantum textbooks do not provide this, though their discussion of
measurements, Section 4, gives some useful hints. The basic approach in Section 3 follows von
Neumann: Hilbert subspaces, or their projectors, represent quantum properties, and a projective
decomposition of the identity (PDI) provides a quantum sample space. By not following Birkhoff
and von Neumann, but instead using a simplified form of quantum logic, Section 5, one has, in the
“single framework rule” of consistent histories, a means of escaping the well-known paradoxes that
inhabit the quantum foundations swamp. Section 6 argues that when quantum theory is equipped with

Entropy 2017, 19, 645; doi:10.3390/e19120645; www.mdpi.com/journal/entropy

(standard!) probabilities, quantum information theory is identical to Shannon’s theory in the domain of
macroscopic (classical) physics, as one might have expected, since only a single quasiclassical quantum
framework (PDI) is needed for a quantum mechanical description. However, classical information
theory also applies, unchanged, in the microscopic quantum domain if only a single framework is
needed. Section 7 provides a perspective on the highly nontrivial problems that are unique to quantum
information and lack any simple classical analog: they arise when one wants to compare (not combine!)
two or more incompatible frameworks applied to a particular situation.

2. Classical Information Theory

Let us start by asking Bell’s question about classical information theory, the discipline which
Shannon started. What is it all about? If you open any book on the subject, you will soon learn that it
is all about probabilities, and information measures expressed in terms of probabilities. So we need to
ask: probabilities of what? Standard (Kolmogorov) probability theory, the sort employed in classical
information theory, begins with a sample space of mutually exclusive possibilities, like the six faces
of a die. Next comes an event algebra made up of subsets of elements from the sample space, to which one
assigns probabilities, nonnegative numbers between 0 and 1, satisfying certain additivity conditions.
The simplest situation, quite adequate for the following discussion, is a sample space with a finite
number n of mutually exclusive possibilities; let them be labeled with an index j between 1 and n (or 0
and n − 1 if you’re a computer scientist). The event algebra consists of all $2^n$ subsets (including the
empty set) of elements from this sample space. Then for probabilities choose a collection of n nonnegative
real numbers $p_j$ lying between 0 and 1, which sum to 1. The probability of an element S in the event
algebra is the sum of the $p_j$ for j in S.
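
The finite construction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fair die and the names `sample_space`, `event_algebra` and `prob` are hypothetical choices, not part of the original discussion.

```python
from fractions import Fraction
from itertools import combinations

# Sample space: the six faces of a die (mutually exclusive possibilities).
sample_space = [1, 2, 3, 4, 5, 6]

# Probabilities p_j: nonnegative and summing to 1 (a fair die is assumed).
p = {j: Fraction(1, 6) for j in sample_space}


def event_algebra(space):
    """Return all 2^n subsets of the sample space, including the empty set."""
    return [frozenset(c)
            for r in range(len(space) + 1)
            for c in combinations(space, r)]


def prob(event):
    """Probability of an event S: the sum of p_j for j in S."""
    return sum((p[j] for j in event), Fraction(0))


events = event_algebra(sample_space)
even = frozenset({2, 4, 6})
print(len(events))   # size of the event algebra
print(prob(even))    # probability that the face is even
```

Exact fractions are used so that the additivity of the probabilities can be checked without rounding error.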
The mutually exclusive possibilities might be distinct letters of an alphabet used to send messages
through a communication channel, and in the actual physical world each letter will be represented
by some unique physical property(s) that identifies it and distinguishes it from the other letters of the
alphabet. One way to visualize this is to think of a classical phase space Γ in which each point γ
represents the precise state of a mechanical system, and a particular letter of the alphabet, say F,
is represented by some collection of points in Γ, the set of points where the property corresponding to
F is true, and where the corresponding indicator function F(γ) takes the value 1, whereas for all other
γ, F(γ) = 0. The different indicator functions associated with letters of the alphabet then split the
phase space up into tiles, regions in which a particular indicator function for a particular letter is equal
to 1, and indicators for the other letters are all equal to zero. If this tiling does not cover the entire
phase space, simply add another letter to the alphabet, call it “NONE”, and let its indicator be 1 on the
remaining points, and 0 elsewhere. In this manner, one can map the abstract notion of an alphabet of
mutually exclusive letters onto a collection of mutually exclusive physical properties, one and only
one of which will be true at any given time, because the point in phase space representing the actual
state of the mechanical system will be located in just one of the nonoverlapping tiles. Given the sample
space of tiles and some way of assigning probabilities, we have a setup to which the ideas of classical
information theory can be applied, with a fairly clear answer to the question of what the information is
all about.
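
The tiling by indicator functions can be made concrete with a toy model. The sketch below assumes a one-dimensional "phase space" (a real number γ in [0, 10)) and two hypothetical letters F and G occupying made-up intervals; the extra letter NONE covers whatever the other tiles leave out.

```python
# Toy one-dimensional "phase space": gamma is a real number in [0, 10).
# The letters and their regions are hypothetical, chosen for illustration.
regions = {"F": (0.0, 3.0), "G": (3.0, 7.0)}


def indicator(letter, gamma):
    """Return 1 if the property for `letter` is true at gamma, else 0."""
    if letter == "NONE":
        # NONE is 1 exactly where every other indicator is 0.
        return 1 - sum(indicator(name, gamma) for name in regions)
    low, high = regions[letter]
    return 1 if low <= gamma < high else 0


alphabet = list(regions) + ["NONE"]


def letter_at(gamma):
    """The unique letter whose tile contains the phase-space point gamma."""
    true_letters = [a for a in alphabet if indicator(a, gamma) == 1]
    assert len(true_letters) == 1  # tiles are nonoverlapping and complete
    return true_letters[0]


for gamma in (1.5, 3.0, 8.2):
    print(gamma, letter_at(gamma))
```

Because the tiles do not overlap and together cover the whole space, exactly one indicator equals 1 at every point, which is what makes the letters a sample space.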
In summary, classical information theory is all about probabilities, and in any specific application,
say to signals coming over an optical fiber, the probabilities are about, or make reference to,
physical events or properties of physical systems.

3. Quantum Probabilities
If we want quantum information theory to look something like Shannon’s theory, the first task is
to identify a quantum sample space of mutually-exclusive properties to which probabilities can be
assigned. The task will be simplest if these quantum probabilities obey the same rules as their classical
counterparts. In particular, since Shannon’s theory employs expressions like $p_j \log(p_j)$, it would
be nice if the quantum probabilities were nonnegative real numbers, in contrast to the negative
quasiprobabilities sometimes encountered in discussions of quantum foundations.
Can we identify a plausible sample space which relative to the quantum Hilbert space plays
a similar role to a tiling of a classical phase space? (In what follows, I will assume that the quantum
Hilbert space is a finite-dimensional complex vector space with an inner product. Thus, all subspaces
are closed, and we can ignore certain mathematical subtleties needed for a precise discussion of
infinite-dimensional spaces.) A useful beginning is suggested by the quantum textbook approach to
probabilities given by the Born rule. Let $A$ be an observable, a Hermitian operator on the quantum
Hilbert space, and let

\[ A = \sum_j a_j P_j \tag{1} \]

be its spectral representation: the $a_j$ are its eigenvalues and the $P_j$ are projectors, orthogonal projection
operators, which form a projective decomposition of the identity $I$ (PDI):

\[ I = \sum_j P_j; \quad P_j = P_j^\dagger; \quad P_j P_k = \delta_{jk} P_j. \tag{2} \]

If the eigenvalue $a_j$ is nondegenerate and $|\phi_j\rangle$ is the corresponding eigenvector, then

\[ P_j = |\phi_j\rangle\langle\phi_j| = [\phi_j], \tag{3} \]

where $[\phi]$ is a convenient abbreviation for the Dirac dyad $|\phi\rangle\langle\phi|$.
According to the textbooks, given a normalized ket $|\psi\rangle$, the probability that when $A$ is measured
the outcome is $a_j$ is given by the Born rule:

\[ p_j = \Pr(a_j) = \Pr(P_j) = \langle\psi|P_j|\psi\rangle = |\langle\psi|\phi_j\rangle|^2, \tag{4} \]

where the final equality applies only when $P_j$ is the rank one projector in (3). Now a measurement of
$A$ will yield just one eigenvalue, not many, so these eigenvalues correspond to the mutually-exclusive
properties $P_j$ in the PDI used in (4). The idea that a quantum property should be associated with
a subspace of the Hilbert space, or the corresponding projector, goes back at least to von Neumann,
see Section III.5 of his oft-cited (but little read) book [2].
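
As a small numerical check of the Born rule (4), the sketch below evaluates $\langle\psi|P_j|\psi\rangle$ for a spin-half particle using plain Python lists as kets. The choice of state (the $S_x = +1/2$ eigenvector), the choice of observable ($S_z$) and the helper names `dyad` and `expectation` are assumptions made for illustration.

```python
import math


def dyad(phi):
    """Projector [phi] = |phi><phi| for a normalized ket given as a list."""
    return [[a * b.conjugate() for b in phi] for a in phi]


def expectation(psi, op):
    """<psi|op|psi> for a ket psi and a square matrix op."""
    n = len(psi)
    return sum(psi[i].conjugate() * op[i][j] * psi[j]
               for i in range(n) for j in range(n))


# Eigenvectors of S_z for a spin-half particle (standard basis).
z_plus = [1 + 0j, 0j]
z_minus = [0j, 1 + 0j]

# Example state: the S_x = +1/2 eigenvector (an illustrative choice).
s = 1 / math.sqrt(2)
psi = [complex(s), complex(s)]

# Born rule (4): p_j = <psi|P_j|psi> with P_j = [phi_j].
probs = [expectation(psi, dyad(phi)).real for phi in (z_plus, z_minus)]

# For rank-one projectors the same numbers equal |<psi|phi_j>|^2.
overlaps = [abs(sum(a.conjugate() * b for a, b in zip(psi, phi))) ** 2
            for phi in (z_plus, z_minus)]

print(probs)
print(overlaps)
```

Both lists agree up to rounding, and the two probabilities are nonnegative and add to 1, as a sample space requires.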
The projector $P_j$ has eigenvalues 0 and 1, so it resembles an indicator function on the classical phase
space. In fact, a PDI divides up the Hilbert space into a set of mutually exclusive subspaces ($P_j P_k = 0$ for
$j \neq k$), somewhat like a tiling of the classical phase space, whereas $I = \sum_j P_j$ tells us this tiling is complete:
no part of the Hilbert space has been left out. Thus, the PDI is a plausible candidate for a quantum sample
space. The event algebra will then consist of the projectors in the PDI along with other projectors formed
from their sums, including $I$, along with the zero operator. The result is a commutative Boolean algebra.
We already have one scheme, (4), for assigning probabilities to elements of the PDI, and thus, by additivity,
to all the projectors in the event algebra. In particular, for $j \neq k$,

\[ \Pr(P_j \text{ OR } P_k) = \Pr(P_j) + \Pr(P_k) = \langle\psi|(P_j + P_k)|\psi\rangle, \tag{5} \]

and similarly for sums of three or more distinct projectors.
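
The conditions (2) and the additivity rule (5) can be verified numerically for a small example. The sketch below is a minimal illustration, assuming a three-dimensional Hilbert space, a hypothetical orthonormal basis and a hypothetical state; the helper names are not from the original text.

```python
import math


def dyad(phi):
    """Projector [phi] = |phi><phi| for a normalized ket (list of complex)."""
    return [[a * b.conjugate() for b in phi] for a in phi]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def dagger(a):
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol
               for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def expectation(psi, op):
    n = len(psi)
    return sum(psi[i].conjugate() * op[i][j] * psi[j]
               for i in range(n) for j in range(n)).real


def is_pdi(projectors):
    """Check the three conditions in (2): completeness, Hermiticity,
    and P_j P_k = delta_jk P_j."""
    n = len(projectors[0])
    identity = [[complex(i == j) for j in range(n)] for i in range(n)]
    zero = [[0j] * n for _ in range(n)]
    total = zero
    for p in projectors:
        total = add(total, p)
    if not close(total, identity):
        return False
    for j, pj in enumerate(projectors):
        if not close(pj, dagger(pj)):
            return False
        for k, pk in enumerate(projectors):
            if not close(matmul(pj, pk), pj if j == k else zero):
                return False
    return True


# Hypothetical orthonormal basis of a three-dimensional Hilbert space.
s = 1 / math.sqrt(2)
basis = [[s, s, 0], [s, -s, 0], [0, 0, 1]]
pdi = [dyad([complex(c) for c in v]) for v in basis]

# Hypothetical normalized state: (1, 2, 2)/3.
psi = [complex(1 / 3), complex(2 / 3), complex(2 / 3)]

pr = [expectation(psi, p) for p in pdi]
pr_or = expectation(psi, add(pdi[0], pdi[1]))  # right side of (5)

print(is_pdi(pdi))
print(pr, pr_or)
```

Dropping one projector from the list makes `is_pdi` fail, since the remaining projectors no longer sum to the identity.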

In summary, this looks like a plausible beginning for a theory of quantum information: use a PDI
on the Hilbert space as a sample space; then assign probabilities to the individual projectors,
not necessarily using (4), which is only a particular example, but by some scheme which yields
nonnegative real numbers adding to 1. Indeed, this strategy works very well, and I believe it covers all
legitimate uses of (standard) probability theory in quantum mechanics, at least for a Hilbert space of
finite dimension.

4. Quantum Measurements
There is, of course, more to be said, and it can be motivated by noting that a carefully written
quantum textbook is likely to assign the probability p j not to the microscopic property of the measured
system, represented by Pj , but instead to the macroscopic measurement outcome, the pointer position in the
picturesque, albeit archaic, language of quantum foundations. However, in the above presentation,
it looks as if the probability is assigned directly to the microscopic property. Was this a mistake? Not if
one believes, as I do, that a properly constructed and calibrated apparatus designed to measure some
quantum observable can actually do what it was designed to do. Furthermore, if there is a one-to-one
correspondence between prior properties and later pointer positions, the probability p j will be the
same for both.
In support of my belief that quantum measurements measure something, I note that this is
assumed by my colleagues who do experiments at accelerator laboratories. They think that when they
detect a fast muon emerging from an energetic collision, there really was a fast muon that approached
and triggered their detector. Are they being naive? I do not think so. In passing, I note that these
colleagues do not seem to worry about the “collapse” of the muon wavefunction produced by its
interaction with the detector; they are less interested in what happened to the muon after it left their
measuring device, and more interested in knowing what it was doing before it arrived there.
In addition, the notion that outcome j corresponds to the earlier property Pj can in certain cases
be tested by preparing a particle which has the property Pj (see Section IV C of [3] on the topic of
preparation), sending it into the measurement apparatus, and seeing whether the result is that the
pointer points to j. Given that the apparatus has been tested and calibrated in this way, is not the
experimenter justified in thinking that the particle had the property indicated by the pointer in a run
in which the particle was not prepared in one of the Pj states? Justified or not, this is how many
of my colleagues who carry out experiments do interpret things, and if they did not it would be
difficult to draw interesting conclusions from their data. Quantum physics can hardly be called an
experimental science if experiments designed to reveal prior microscopic properties do not actually do
so! For additional details on the topic of what quantum measurements measure, including POVM and
weak measurements, see [3].
There is, to be sure, a conceptual difficulty lurking in the background if we assume that
measurements reveal prior microscopic properties. A hint is provided by the (correct) statement
in textbooks that the x and z components of spin angular momentum, Sx and Sz, of a spin-half
particle cannot be measured simultaneously. True, but what principle lies behind this? If we assume
that experimenters really do understand something about what their devices measure, their inability
to carry out such a simultaneous measurement might plausibly be explained by the fact that there is
nothing there to be measured. Even very skilled experimenters cannot measure what is not there; indeed,
this could be one thing that distinguishes them from less capable colleagues.
The Hilbert space of a spin-half particle is two-dimensional, and while it contains two subspaces
corresponding to Sx = ±1/2 (in units of ħ), and another two corresponding to Sz = ±1/2, there is no
subspace which can plausibly be associated with, to take an example, “Sx = +1/2 AND Sz = −1/2”.
Hence if we assume that quantum measurements measure microscopic properties represented by
subspaces of the quantum Hilbert space (or their projectors), we have a ready explanation for what lies
behind the assertion that Sx and Sz cannot both be measured simultaneously. This is one way in which
quantum mechanics is very different from classical mechanics.

5. Incompatible Properties

5.1. Issues of Logic

The absence of a Hilbert subspace corresponding to “Sx = +1/2 AND Sz = −1/2” reflects
an important difference between the logic of indicator functions on the classical phase space and
quantum projectors on the Hilbert space. One analogy has already been noted: the indicator F(γ) for
a classical property F takes one of two values, 0 and 1, while a quantum projector P has eigenvalues
that are either 0 or 1. In addition, the negation “NOT F” of a classical property has an indicator function
I(γ) − F(γ), where I(γ) is the function which is equal to 1 everywhere on the phase space. Similarly,
the negation “NOT P” of a quantum projector P is the projector I − P, with I the quantum identity
operator. However, the analogy begins to break down when we consider the conjunction “F AND G” of
two classical properties: the property which is true if and only if both F and G are true. It corresponds
to the intersection of the two subsets of phase space points associated with F and G, and its indicator
is the product F(γ)G(γ) of the two indicators. So we might expect that the conjunction “P AND Q” of
two quantum properties P and Q would be represented by the product PQ. Indeed, this is the case if
the projectors P and Q commute, PQ = QP, in which case PQ is again a projector. However, if PQ is
not equal to QP, then neither product is a projector, and it is not obvious how to define “P AND Q”.
The point can be illustrated using Sx and Sz for a spin-half particle. The projectors representing
Sx = +1/2 and −1/2 are $[x^+] = |x^+\rangle\langle x^+|$ and $[x^-]$, where $|x^+\rangle$ and $|x^-\rangle$ are the eigenvectors
corresponding to Sx = +1/2 and −1/2. Since $\langle x^+|x^-\rangle = 0$ (distinct eigenvalues mean the
eigenvectors are orthogonal), $[x^+][x^-] = [x^-][x^+] = 0$. Thus, these projectors commute, and the
property “Sx = +1/2 AND Sx = −1/2” is represented by the zero operator on the Hilbert space:
the property that is always false and thus never occurs. Also $[x^+] + [x^-] = I$, so these two
mutually-exclusive properties constitute a PDI, a quantum sample space. Likewise the projectors $[z^+]$
and $[z^-]$ that correspond to Sz = +1/2 and −1/2 form a PDI.
However, neither $[x^+]$ nor $[x^-]$ commutes with either $[z^+]$ or $[z^-]$, so we cannot assign a quantum
property to “Sx = +1/2 AND Sz = −1/2” by taking the product of the projectors. Again, this is
consistent with the idea that the reason a simultaneous measurement of Sx and Sz is impossible is that
there is nothing there to be measured.
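
These statements about the spin-half projectors can be checked directly with 2 × 2 matrices. The sketch below uses the standard $S_z$ basis, in which the $S_x$ eigenvectors are $(1, \pm 1)/\sqrt{2}$; the helper names `commute` and `is_projector` are hypothetical.

```python
import math


def dyad(phi):
    """Projector [phi] = |phi><phi| for a normalized ket (list of complex)."""
    return [[a * b.conjugate() for b in phi] for a in phi]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def dagger(a):
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol
               for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def commute(p, q):
    return close(matmul(p, q), matmul(q, p))


def is_projector(p):
    """A projector is Hermitian and equal to its own square."""
    return close(p, dagger(p)) and close(matmul(p, p), p)


s = 1 / math.sqrt(2)
x_plus = dyad([complex(s), complex(s)])     # [x+]
x_minus = dyad([complex(s), complex(-s)])   # [x-]
z_plus = dyad([1 + 0j, 0j])                 # [z+]
z_minus = dyad([0j, 1 + 0j])                # [z-]

identity = [[1 + 0j, 0j], [0j, 1 + 0j]]
zero = [[0j, 0j], [0j, 0j]]

print(commute(x_plus, x_minus))                 # same PDI: they commute
print(commute(x_plus, z_minus))                 # different PDIs: they do not
print(is_projector(matmul(x_plus, z_minus)))    # the product is no projector
```

The product $[x^+][z^-]$ fails both tests: it is neither Hermitian nor idempotent, so it cannot represent a quantum property.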

5.2. Compatible and Incompatible

Thus, one way, perhaps the most essential way, quantum physics differs from classical physics is
that projectors representing different quantum properties need not commute. We will say that the projectors
$P$ and $Q$ are compatible provided $PQ = QP$, and incompatible if $PQ \neq QP$. Likewise a PDI $\{P_j\}$ and
another PDI $\{Q_k\}$ are compatible if every projector in one commutes with every projector in the other:
$P_j Q_k = Q_k P_j$ for every $j$ and $k$. Otherwise, they are incompatible. In the compatible case, there is
a common refinement consisting of all products of the form $P_j Q_k = Q_k P_j$, and every property in the event
algebra associated with $\{P_j\}$ or with $\{Q_k\}$ is also in the event algebra associated with this refinement.
Hence a very central issue in quantum foundations, and also for quantum information theory if one
wants to use PDI’s as sample spaces, is what to do when quantum projectors do not commute with
each other. There have been various approaches.
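
The notions of compatible PDIs and their common refinement can be illustrated with two spin-half particles a and b. The sketch below is an assumed example, not taken from the original text: in the basis ordered as ++, +−, −+, −−, the list `P` is the $S_z$ PDI of particle a, `Q` the $S_z$ PDI of particle b, and `R` the $S_x$ PDI of particle b.

```python
from fractions import Fraction


def diag(entries):
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


half = Fraction(1, 2)
identity = diag([1, 1, 1, 1])
zero = diag([0, 0, 0, 0])

P = [diag([1, 1, 0, 0]), diag([0, 0, 1, 1])]   # S_z of particle a
Q = [diag([1, 0, 1, 0]), diag([0, 1, 0, 1])]   # S_z of particle b
r_plus = [[half, half, 0, 0],
          [half, half, 0, 0],
          [0, 0, half, half],
          [0, 0, half, half]]
R = [r_plus, sub(identity, r_plus)]            # S_x of particle b


def compatible(pdi_a, pdi_b):
    """Two PDIs are compatible if every pair of projectors commutes."""
    return all(matmul(p, q) == matmul(q, p) for p in pdi_a for q in pdi_b)


def common_refinement(pdi_a, pdi_b):
    """All nonzero products P_j Q_k (only meaningful if compatible)."""
    products = [matmul(p, q) for p in pdi_a for q in pdi_b]
    return [r for r in products if r != zero]


refinement = common_refinement(P, Q)
print(compatible(P, Q), compatible(P, R), compatible(Q, R))
print(len(refinement))
```

Here `P` is compatible with both `Q` and `R`, but `Q` and `R` are incompatible with each other, so no single refinement contains all three. Exact fractions make the commutation tests exact.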
Von Neumann was well aware of this problem, and together with Birkhoff invented quantum
logic [4] to deal with it. In the case of a spin-half particle, quantum logic says that “Sx = +1/2 AND
Sz = −1/2” is the property represented by the zero operator; that is, it is meaningful, but it is always
false. This means its negation “Sx = −1/2 OR Sz = +1/2” is always true. Think about it: is that
reasonable? If you continue to try and apply ordinary logical reasoning in this situation, you will soon
end up in difficulty; see Section 4.6 of [5] for details. To prevent paradoxes, Birkhoff and von Neumann
modified some of the rules of ordinary logic. Alas, their quantum logic requires a revision of the rules
of ordinary (propositional) logic so radical that no one (known to me) has succeeded in using it to
think in a useful way about what is going on in the quantum world. Maybe we physicists are just too
stupid, and will have to wait for the day when clever quantum robots with intelligence vastly superior
to ours can use quantum logic to resolve the quantum mysteries. However, if they succeed, will they
be able to (or even want to) explain it to us?
A second approach to the incompatibility problem is employed in quantum textbooks and is also
widespread in the quantum foundations community. Instead of talking about the quantum properties
revealed by measurements, discussion is limited to measurement outcomes, the pointer positions that
are part of the macroscopic world where classical physics is an adequate approximation to quantum
physics, and noncommutation can be ignored for all practical purposes. (More in Section 6 below.) I call
this the “black box” approach to quantum foundations. One starts with the preparation of a microscopic
quantum state using a macroscopic apparatus, and then a later measurement of the state using another
macroscopic apparatus, and what lies in between—well, that is inside the black box, and we will say as
little as possible about it. A quantum $|\psi\rangle$? That is just a symbolic way of representing the preparation
procedure. A PDI { Pj }? That is nothing but a mathematical tool for calculating the probabilities of
measurement outcomes. The black box approach has the advantage that it avoids the problem of
noncommuting quantum projectors. Its disadvantage is that it provides no way of understanding in
physical terms what is going on at the microscopic level inside the box.
A third approach was popularized by Bell and his followers: replace the noncommuting Hilbert
space projectors with commuting hidden variables. In essence, assume that in some way classical
physics applies at the microscopic level. However, if, as I believe, noncommutation of projectors and
PDI’s marks the frontier between classical and quantum physics, one should not be surprised that
an approach which is fundamentally classical—assumes a classical sample space, as is evident from
the way the mysterious symbol λ is employed in formulas—results in the famous Bell inequality that
disagrees with both quantum mechanical calculations and experimental results. (Nonlocal influences
can be ignored, since they do not exist; see [6].)
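
The conflict between a classical sample space and the quantum predictions can be made concrete with a short calculation. The sketch below assumes the CHSH form of the Bell inequality and the textbook singlet-state correlation $E(a, b) = -\cos(a - b)$; neither is spelled out in the text above, and the measurement angles are a conventional illustrative choice.

```python
import math
from itertools import product


def chsh(e_ab, e_abp, e_apb, e_apbp):
    """CHSH combination S = E(a,b) + E(a,b') + E(a',b) - E(a',b')."""
    return e_ab + e_abp + e_apb - e_apbp


# Classical sample space: each value of the hidden variable lambda fixes
# the four outcomes A(a), A(a'), B(b), B(b') = +1 or -1 in advance.
# Averaging over lambda cannot exceed the largest single-lambda value.
classical_max = max(
    abs(chsh(A0 * B0, A0 * B1, A1 * B0, A1 * B1))
    for A0, A1, B0, B1 in product((-1, 1), repeat=4)
)


def singlet_correlation(a, b):
    """Assumed textbook result for spin measurements on a singlet state
    along directions at angles a and b in a common plane."""
    return -math.cos(a - b)


# Conventional angle choice (an assumption) that maximizes |S|.
a, ap = 0.0, math.pi / 2
b, bp = math.pi / 4, -math.pi / 4
quantum_value = chsh(singlet_correlation(a, b),
                     singlet_correlation(a, bp),
                     singlet_correlation(ap, b),
                     singlet_correlation(ap, bp))

print(classical_max)
print(abs(quantum_value))
```

The enumeration runs over all sixteen assignments available to a single classical sample space, and the quantum value exceeds that bound because the four correlations are computed in four different, mutually incompatible frameworks.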

5.3. The Single Framework Rule

The solution to the incompatibility problem that I favor can be viewed as a lowbrow form of
quantum logic, one that a physicist like me can actually make use of. Its essential idea is that as long as
one is dealing with a single PDI the rules of classical reasoning and classical probability theory can be
applied unaltered in the quantum domain. So let us do that. If two PDI’s are compatible, there is a PDI
which is a common refinement. So let us use it. However, if two PDI’s are incompatible, combining
them will lead to nonsense. So do not do it. These ideas have been worked out in considerable detail
in the consistent histories (CH) interpretation of quantum mechanics, where the prohibition against
combining incompatible PDI’s is known as the single framework rule. Here, the term framework refers
either to a PDI or to the associated event algebra.
The difference between CH and quantum logic can be illustrated using the
example “Sx = +1/2 AND Sz = −1/2” discussed earlier. In quantum logic, this is meaningful but
false, while in CH it is meaningless, neither true nor false. The negation of a false statement is a true
statement, so quantum logic has to say something about it. However, the negation of a meaningless
statement is equally meaningless, allowing CH to remain silent. See [7] for more details.
In order to discuss the time development of quantum systems, a similar approach can be used
(the “histories” part of consistent histories). Once again probabilities are assigned using PDI’s as
sample spaces, but in this case on an extended Hilbert space of histories [8]. In addition, in order to
assign probabilities to a family of histories (a PDI on the history sample space) using an extension of
the Born rule, it is necessary to impose certain consistency conditions (the “consistent” part of consistent
histories), if this family is to constitute an acceptable framework, so the single framework rule is
extended to incorporate the consistency conditions. For a short introduction to the CH interpretation
of quantum mechanics, see [9]. Various conceptual difficulties are discussed in [7], whereas [10] gives
a fairly thorough discussion of the ontology (Hilbert subspaces as “beables”). Finally, reference [5] is
a standard reference with lots of details.
One aspect of the CH approach has raised a lot of objections, so it deserves a comment. In a given
situation, it may be possible to describe what is going on using various different but incompatible
frameworks, so the question arises: “What is the right framework to use?” The right answer is that this
is the wrong question to ask in the quantum domain. In classical mechanics, the state of a mechanical
system at a particular instant of time can be exactly specified by a single point in its phase space,
the intersection of all properties (sets of points) which are “true” at that instant. This is consistent
with the idea, which I have elsewhere called unicity (Section 27.3 of [5]), that at every instant of time
there is a single unique “state of the universe” which, even if we do not know what it is, determines
all physical properties. What might be its quantum counterpart? A “wavefunction of the universe”?
If there really is something of that sort, it is likely to be a horrible, uninterpretable superposition
of different pointer positions at the end of a measurement, or some other form of Schrödinger cat.
The corresponding projector will then not commute with properties that might resemble something
in the ordinary macroscopic world, and the single framework rule will then prevent discussing the
world of everyday experience. I do not see any way in which a single quantum state could plausibly
represent the “true state of the world”, and I believe unicity must be abandoned in the transition from
classical to quantum physics.
In practice, the choice of which framework to use will depend on the problem one is interested
in. Consider, for example, a situation in which a spin-half particle is prepared in an eigenstate of Sx ,
say Sx = +1/2, before being sent through a magnetic field-free region (so its spin direction will not
change) into an Sz measuring device. The outcome of the measurement will be either Sz = +1/2
or Sz = −1/2; let us assume the latter. This means we can say that Sz was −1/2 just before the
measurement took place. However, is it possible that the particle had both Sx = +1/2 (because it
was prepared in this state) and Sz = −1/2 (the value measured later) at the same time, just before
the measurement was made? This makes no sense, as the properties are incompatible. There is
one framework in which at the intermediate time Sx = +1/2, reflecting its earlier preparation,
and a different, incompatible framework in which at the intermediate time Sz = −1/2, reflecting the
outcome of the later measurement. These frameworks cannot be combined, and each has its own
uses. If we are concerned about whether Sx was perturbed (say by a stray magnetic field), then the Sx
framework is helpful, while if we want to identify what the measurement measured, the Sz framework
is helpful. In textbook quantum mechanics, only the Sx framework is employed. Nothing wrong with
that, except that one cannot discuss in what way the measurement measures something, leaving the
poor student rather confused.
This example suggests that the liberty to choose different frameworks is not as dangerous as
it might at first appear. A particular choice yields some type of information, and a different choice
may yield something different. By looking at a coffee cup from above you can tell if it contains some
coffee, while to see if there is a crack in the bottom you need to look from below. The oddity about the
quantum world is not that different views, different frameworks, are possible. Instead, it is that certain
frameworks cannot be combined into a consistent quantum description, because they are incompatible.
For another, less trivial, example of a case in which choosing alternative frameworks proved useful,
see the end of Section 7.
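The incompatibility in the spin-half example above can be checked numerically. The following is a minimal sketch, assuming the usual 2x2 matrix representation of spin-half projectors in the Sz basis; the helper names are illustrative and do not come from the text. It shows that the two projectors of a single framework commute, that the projector for Sx = +1/2 does not commute with the one for Sz = −1/2, and that a particle prepared with Sx = +1/2 yields each Sz outcome with probability 1/2.

```python
# Projectors for a spin-half particle as 2x2 nested lists in the Sz basis.
# All helper names are illustrative assumptions, not notation from the text.

def matmul(a, b):
    """Multiply two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def commutator(a, b):
    """Return ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]

def is_zero(a, tol=1e-12):
    return all(abs(a[i][j]) < tol for i in range(2) for j in range(2))

def trace(a):
    return a[0][0] + a[1][1]

# Sz framework: projectors onto Sz = +1/2 and Sz = -1/2.
Z_PLUS = [[1.0, 0.0], [0.0, 0.0]]
Z_MINUS = [[0.0, 0.0], [0.0, 1.0]]
# Sx framework: projector onto Sx = +1/2.
X_PLUS = [[0.5, 0.5], [0.5, 0.5]]

# Born probabilities of the Sz outcomes for a particle prepared with Sx = +1/2.
p_z_plus = trace(matmul(X_PLUS, Z_PLUS))
p_z_minus = trace(matmul(X_PLUS, Z_MINUS))

print(is_zero(commutator(Z_PLUS, Z_MINUS)))  # same framework: commute
print(is_zero(commutator(X_PLUS, Z_MINUS)))  # different frameworks: do not commute
print(p_z_plus, p_z_minus)
```

A nonzero commutator is exactly the condition under which the single framework rule forbids assigning both properties at the same time.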

6. Quantum Information Theory I

Once a proper quantum sample space, a PDI or framework, has been defined, standard (Kolmogorov)
probability theory can be used, and this means that the whole machinery of classical (Shannon)
probability theory can be imported, unchanged, into the quantum domain. However, the reasoning
and the results are restricted to this single framework; in particular, they cannot be combined with the
analysis carried out in a separate, incompatible framework. Probabilities associated with incompatible
frameworks cannot be combined; paying attention to this eliminates a lot of well-known quantum
paradoxes (see Chapters 19 to 25 of [5]).
In particular, this provides a quantum justification for all the usual applications of classical
information theory to macroscopic properties and their time development. The reason is that from
a quantum perspective the classical mechanics of macroscopic objects can be discussed with quite
adequate precision using a single quasiclassical quantum framework, in which ordinary macroscopic
properties are represented by enormous subspaces (a dimension of 10 raised to the power $10^{16}$ should
be counted as relatively small) whose projectors commute with one another for all practical purposes;
and quantum dynamics, which is intrinsically stochastic, is well approximated by deterministic
classical dynamics. See [11]; Chapters 7, 17, 18 of [12]; Chapter 26 of [5]; and Section 4 of [10].
Consequently, we can immediately claim that all of classical information theory, all seventeen chapters
of Cover and Thomas [13], or name your favorite reference, are a valid part of quantum information
theory when it is applied to macroscopic properties and processes. In this domain, we understand
quite well what quantum information is all about: its probabilities refer to quasiclassical properties and
processes, all the things for which classical physics provides a satisfactory approximation to a more
exact quantum description.
It is worth remarking, in passing, that using a quasiclassical framework provides a solution to
the infamous measurement problem of quantum foundations: what to do with a wavefunction which is
a coherent superposition of states in which the pointer points in two (or more) directions. While in the
CH approach there is nothing inherently wrong with such a thing, it can be ignored if one wants to
describe the usual macroscopic outcomes of laboratory experiments. Use a quasiclassical framework,
and the problems represented by Schrödinger’s cat are absent—and, by the single framework rule,
they are excluded from the description.
In addition, Shannon’s theory can be employed, unchanged, in situations in which some or all of
the properties being discussed are microscopic, quantum properties, provided the discussion is restricted
to a single framework. This includes what I have elsewhere [14] referred to as the second measurement
problem: inferring from the measurement outcome (the pointer position) something about the earlier
microscopic state of the system being measured. It can be analyzed in a manner which demonstrates
that my colleagues who carry out experiments at accelerator laboratories are not being foolish when
they assert that a fast muon has triggered their detector. The measurement apparatus is, in effect,
an information channel leading from microscopic quantum properties at the input to macroscopic
quantum properties (pointer positions) at the output.
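To illustrate how Shannon's machinery applies unchanged within a single framework, the sketch below treats a measurement as a binary channel from the earlier value of Sz to the pointer reading and computes the mutual information between them. The prior probability and the pointer error rate are made-up parameters chosen for illustration; they are not taken from the text.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) in bits for a joint distribution given as {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return entropy(px.values()) + entropy(py.values()) - entropy(joint.values())

def measurement_joint(prior_up, error):
    """Joint distribution of (earlier Sz value, pointer reading).

    prior_up: assumed probability that Sz = +1/2 before the measurement.
    error:    assumed probability that the pointer shows the wrong value.
    """
    joint = {}
    for sz, p_sz in (("+", prior_up), ("-", 1.0 - prior_up)):
        for pointer in ("+", "-"):
            p_read = 1.0 - error if pointer == sz else error
            joint[(sz, pointer)] = p_sz * p_read
    return joint

ideal = mutual_information(measurement_joint(0.5, 0.0))
noisy = mutual_information(measurement_joint(0.5, 0.1))
print(ideal, noisy)
```

With an ideal apparatus the pointer carries one full bit about the earlier microscopic property; with a 10% error rate it carries roughly half a bit. Both numbers refer to one framework only, the one containing the Sz values and the pointer positions.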

7. Quantum Information Theory II

Does this mean that all problems of quantum information can be reduced to problems of classical
information? No, not at all, but it does provide some insight into the nature of the additional problems
which are unique to quantum information, and what is needed to attack them. These problems,
and there are a vast number, all have to do with comparing (but not combining!) situations involving
incompatible frameworks. But how can this be if a strict application of the single framework rule is
needed to avoid falling into nonsensical paradoxes? The answer will emerge from considering some
examples, starting with that of a noisy quantum channel.
Consider a one-qubit memoryless quantum channel whose input and output are each a two-dimensional
Hilbert space, the quantum analog of a classical one-bit channel. The classical channel is characterized
by two real parameters: the probability that a 0 entering the channel will emerge as a 1, and the
probability that a 1 entering the channel will emerge as a 0. If both are zero, the channel is perfect,
noiseless. I like to visualize a perfect one-qubit quantum channel as a pipe through which a spin-half
particle is propelled in such a way that its spin is left unchanged. If it enters with Sx = +1/2 it
exits with Sx = +1/2, if it enters with Sz = −1/2 it exits with Sz = −1/2, and so forth. Of course,
on any particular run the particle can only have a well-defined spin angular momentum in a particular
direction; e.g., it can be prepared in such a state, and when it comes out only one component of its spin
angular momentum can be measured. So to test whether the channel is perfect, it is necessary to carry
out many repeated measurements. This by itself is no different from a classical channel, where repeated
measurements are needed to estimate the probabilities of a bit flip when a signal passes through the
channel. However, in the quantum case, the probabilities that Sz gets flipped, either from +1/2 to
−1/2, or from −1/2 to +1/2, can be very different from those for Sx , so repeated measurements need
to be carried out using different components of the spin angular momentum. The single framework
rule does not prohibit a discussion of both Sx and of Sz provided these refer to different runs of the
experiment. There is no problem in supposing that in one run Sz = −1/2, on the next run Sx = +1/2,
and so forth. Of course, one has to assume that the channel continues to behave in the same way,
at least in a probabilistic sense, during successive runs, but the same is true for a classical channel.
Suppose Joe has built what he claims is a perfect channel, but we want to test it. This is
straightforward for a 1-bit classical channel: send in a series of 0s and 1s, and see if what emerges from
the channel is the same as what was sent in. A one-qubit quantum channel is more complicated. If we
test it using a sequence of states in which Sz = +1/2 or −1/2, and what emerges is the same as what
went in, this is not sufficient, as it could very well be the case that if one sends in Sx = +1/2 it will
emerge with Sx either +1/2 or −1/2 in a completely random fashion, uncorrelated with the input.
So we have to check something in addition to Sz . Does this mean we have to carry out experiments with
Sw = +1/2 and −1/2 for every possible spin component w? That would take a lot of time, and is not
necessary. It suffices to check both Sz = ±1/2 and Sx = ±1/2. This result is far from obvious, and to
derive it one must use principles of quantum mechanics which have no classical analog. Quantum
information theorists need not fear unemployment; we will be kept busy for a long time.
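The need to test more than one spin component can be made concrete with a small sketch. It assumes a hypothetical fully dephasing channel, one that preserves the Sz value but erases all information about Sx, and compares it with a perfect channel. The channel model and helper names are assumptions made for this example. The sketch illustrates why an Sz test alone is not enough; it does not prove that the Sz and Sx tests together suffice.

```python
# 2x2 density matrices as nested lists in the Sz basis; names are illustrative.

def matmul(a, b):
    """Multiply two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

Z_STATES = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]    # Sz = +1/2, -1/2
X_STATES = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, -0.5], [-0.5, 0.5]]]  # Sx = +1/2, -1/2

def perfect_channel(rho):
    """Output equals input."""
    return [row[:] for row in rho]

def dephasing_channel(rho):
    """Hypothetical noisy channel: keeps the diagonal, erases off-diagonal terms."""
    return [[rho[0][0], 0.0], [0.0, rho[1][1]]]

def survival_probability(channel, projector):
    """Probability that a property prepared at the input is found at the output."""
    return trace(matmul(projector, channel(projector)))

def passes(channel, states, tol=1e-12):
    """True if every test state emerges unchanged with probability 1."""
    return all(abs(survival_probability(channel, s) - 1.0) < tol for s in states)

print(passes(dephasing_channel, Z_STATES))   # True: looks perfect in the Sz test
print(passes(dephasing_channel, X_STATES))   # False: Sx emerges at random
print(passes(perfect_channel, Z_STATES) and passes(perfect_channel, X_STATES))
```

Each call to `survival_probability` refers to a separate set of runs with one spin component, so the Sx and Sz statistics are compared but never combined in a single description.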
As another example, consider teleportation, often presented as an instance of the mysterious
and almost magical way in which quantum mechanics goes beyond classical physics. A standard
textbook presentation of a protocol to teleport one qubit, e.g., Section 1.3.7 of [15], consists in applying
unitary time evolution to an initial quantum state, followed by a measurement which collapses it.
The measurement has four possible outcomes, and the result is communicated from A to B through
two uses of a perfect one-bit classical channel. The end result of the protocol is a quantum state
transmitted unchanged from A to B; in effect, a perfect one-qubit quantum channel. The student will
certainly learn something by working through the formulas in the textbook, but this is of limited
value in developing an intuition about microscopic quantum processes. My own approach [16]
to understanding teleportation employs two incompatible frameworks. One framework shows
how information about Sx is transmitted from Alice to Bob with the assistance of one use of the
classical channel, and the other how Sz information is transmitted with the help of the other use of
the classical channel. Similar ideas (but without referring to frameworks) will be found in [17,18].
This way of “opening the black box” should, I think, assist students in gaining a better intuition
for microscopic quantum processes, and I hope it will become more widespread in the quantum
information community, where research, or at least its publication, is still dominated by the “shut up
and calculate” mentality encouraged by textbooks.
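For readers who want to see the two classical bits at work, here is a small state-vector sketch of the standard one-qubit teleportation circuit. It follows the usual circuit convention (a CNOT and a Hadamard on Alice's side, then X and Z corrections on Bob's side), which is an assumption of this example rather than the framework analysis of [16]. The input amplitudes are made up. In this convention one bit undoes a flip of the Sz value and the other undoes a flip of the Sx value.

```python
import math

def initial_state(a, b):
    """Amplitudes indexed by 4*q0 + 2*q1 + q2: q0 holds a|0> + b|1>, while
    q1 (Alice) and q2 (Bob) share the entangled pair (|00> + |11>)/sqrt(2)."""
    s = 1 / math.sqrt(2)
    state = [0j] * 8
    for q0, amp in ((0, a), (1, b)):
        for q in (0, 1):
            state[4 * q0 + 2 * q + q] = amp * s
    return state

def cnot_q0_q1(state):
    """Flip q1 whenever q0 = 1."""
    new = [0j] * 8
    for idx, amp in enumerate(state):
        q0 = (idx >> 2) & 1
        new[idx ^ (q0 << 1)] += amp
    return new

def hadamard_q0(state):
    """Apply a Hadamard gate to q0."""
    s = 1 / math.sqrt(2)
    new = [0j] * 8
    for idx, amp in enumerate(state):
        q0, rest = (idx >> 2) & 1, idx & 3
        new[rest] += s * amp
        new[4 | rest] += s * amp * (-1 if q0 else 1)
    return new

def bob_after_outcome(state, m0, m1):
    """Probability of Alice's outcome (m0, m1) and Bob's corrected qubit."""
    base = 4 * m0 + 2 * m1
    c0, c1 = state[base], state[base + 1]
    prob = abs(c0) ** 2 + abs(c1) ** 2
    norm = math.sqrt(prob)
    c0, c1 = c0 / norm, c1 / norm
    if m1:   # bit m1: undo a flip of the Sz value (Pauli X)
        c0, c1 = c1, c0
    if m0:   # bit m0: undo a flip of the Sx value (Pauli Z)
        c1 = -c1
    return prob, (c0, c1)

a, b = 0.6, 0.8j   # made-up input amplitudes with |a|^2 + |b|^2 = 1
final = hadamard_q0(cnot_q0_q1(initial_state(a, b)))
results = {(m0, m1): bob_after_outcome(final, m0, m1)
           for m0 in (0, 1) for m1 in (0, 1)}
for outcome, (prob, qubit) in results.items():
    print(outcome, round(prob, 3), qubit)
```

Each of the four outcomes occurs with probability 1/4, and after the correction Bob's qubit has the original amplitudes in every case.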
The preceding example could be easily dismissed in that it did not lead (directly, at least) to
any new results in quantum information: the original teleportation protocol [19] appeared fourteen
years in advance of my analysis. Hence it may be worth mentioning another example. A student
and I were trying to understand Shor’s algorithm for factoring numbers, which ends with a quantum
Fourier transform followed by measurements of each of the qubits in the standard |0⟩, |1⟩ basis
(|z+⟩, |z−⟩ for a spin-half particle). We noted that if you suppose that the final measurement reveals
a property that the qubit possessed before the measurement, there is a way of looking at the problem
that leads to an alternative and simpler way to carry out the algorithm [20]. Our perspective required
using a framework incompatible with that employed in the standard textbook approach: unitary time
development right up to the moment when measurement “collapses” the wavefunction—which,
when done properly, leads to the same final answer. I was pleased that Nielsen and Chuang mentioned
our work (Exercise 4.35 on p. 188, and see p. 246 of [15]), but disappointed in that they presented it
as part of one more phenomenological principle, rather than as a way of gaining insight by using
measurement outcomes to infer something about what happened earlier.
In my opinion, the discipline of quantum information could benefit from paying attention to
the developments in quantum foundations mentioned above. If you open your favorite book on
quantum information you will discover that measurements are quite firmly embedded in the discussion,
and this in the manner of other textbooks in which measurements do not actually measure something,
but instead enter as a primitive concept without further definition, a rule for carrying out calculations
which requires no real physical understanding of processes at the microscopic quantum level. My guess
is that if quantum information texts were to provide a consistent discussion of microscopic properties
and processes, it could lead to some new and interesting advances, and perhaps even some new
insights into quantum foundations.

8. Conclusions
Bell’s question, “Quantum information ... about what?” can be given a quite definite answer.
It is about physical properties and processes, which in quantum theory are represented by subspaces
of the quantum Hilbert space, and to which standard (Kolmogorov) probabilities can be assigned,
using sample spaces constructed from projective decompositions of the identity operator (PDI’s).
The single framework rule of consistent histories forbids combining incompatible PDI’s or frameworks,
resulting in a consistent theory not troubled by unresolved quantum paradoxes. From a quantum
perspective, classical (Shannon) information theory is the application of quantum information theory
to the domain of macroscopic properties and processes, where a single quasiclassical quantum
framework is sufficient for all practical purposes, and therefore quantum incompatibilities can be
ignored. However, in addition, all the ideas of classical information, and in particular its probabilistic
formulation, can be imported unchanged into the microscopic quantum domain, as long as one is
considering only a single quantum framework.
That there are many distinct frameworks available in quantum theory, frameworks which
cannot be combined but can be compared, represents the new frontier of information theory that
is specifically quantum, where classical ideas no longer suffice. At this point, new, and sometimes
very difficult, problems arise in the process of comparing (but not combining) different incompatible
quantum frameworks. They have no analogs in classical information theory, and some of them
are quite challenging. Progress in this domain might well benefit were textbooks to abandon their
outdated “black box” approach to quantum theory, in which “measurement” is an undefined
primitive and measurements do not actually measure anything, but are simply a calculational tool to
collapse wavefunctions. It is past time to open the black box with tools that can consistently handle
noncommuting projectors. Consistent histories provide one approach for doing this; if the reader can
come up with something better, so much the better.

Acknowledgments: Major contributions to the consistent histories interpretation of quantum mechanics have been
made over the years by Roland Omnès, Murray Gell-Mann, James Hartle, and, more recently, Richard Friedberg
and Pierre Hohenberg. We may not agree about everything, but I have certainly reaped great benefit from
conversations with and publications by these colleagues, and it is a pleasure to thank them. I am also grateful for
comments from three anonymous referees.
Conflicts of Interest: The author declares no conflict of interest.

References
1. Bell, J.S. Against measurement. In Sixty-Two Years of Uncertainty; Miller, A.I., Ed.; Plenum Press: New York,
NY, USA, 1990; pp. 17–31. Reprinted in Speakable and Unspeakable in Quantum Mechanics, 2nd ed.; Cambridge
University Press: Cambridge, UK, 2004; pp. 213–231.
2. Von Neumann, J. Mathematical Foundations of Quantum Mechanics; Princeton University Press: Princeton, NJ,
USA, 1955.
3. Griffiths, R.B. What quantum measurements measure. Phys. Rev. A 2017, 96, 032110.
4. Birkhoff, G.; von Neumann, J. The logic of quantum mechanics. Ann. Math. 1936, 37, 823–843.
5. Griffiths, R.B. Consistent Quantum Theory; Cambridge University Press: Cambridge, UK, 2002.
6. Griffiths, R.B. Quantum locality. Found. Phys. 2011, 41, 705–733.
7. Griffiths, R.B. The New Quantum Logic. Found. Phys. 2014, 44, 610–640.
8. Isham, C.J. Quantum logic and the histories approach to quantum theory. J. Math. Phys. 1994, 35, 2157–2185.
9. Griffiths, R.B. The Consistent Histories Approach to Quantum Mechanics. Stanford Encyclopedia of
Philosophy. 2014. Available online: http://plato.stanford.edu/entries/qm-consistent-histories/ (accessed on
29 November 2017).
10. Griffiths, R.B. A consistent quantum ontology. Stud. Hist. Philos. Mod. Phys. 2013, 44, 93–114.
11. Gell-Mann, M.; Hartle, J.B. Classical equations for quantum systems. Phys. Rev. D 1993, 47, 3345–3382.
12. Omnès, R. Understanding Quantum Mechanics; Princeton University Press: Princeton, NJ, USA, 1999.
13. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley: New York, NY, USA, 2006.
14. Griffiths, R.B. Consistent quantum measurements. Stud. Hist. Philos. Mod. Phys. 2015, 52, 188–197.
15. Nielsen, M.A.; Chuang, I.L. Quantum Computation and Quantum Information; Cambridge University Press:
Cambridge, UK, 2000.
16. Griffiths, R.B. Types of quantum information. Phys. Rev. A 2007, 76, 062320.
17. Renes, J.M.; Dupuis, F.; Renner, R. Efficient polar coding of quantum information. Phys. Rev. Lett. 2012,
109, 050504.
18. Coles, P.J.; Piani, M. Complementary sequential measurements generate entanglement. Phys. Rev. A 2014,
89, 010302.
19. Bennett, C.H.; Brassard, G.; Crépeau, C.; Jozsa, R.; Peres, A.; Wootters, W.K. Teleporting an unknown
quantum state via dual classical and Einstein-Podolsky-Rosen channels. Phys. Rev. Lett. 1993, 70, 1895–1899.
20. Griffiths, R.B.; Niu, C.-S. Semiclassical Fourier transform for quantum computation. Phys. Rev. Lett. 1996, 76,
3228–3231.

Article
Entropic Phase Maps in Discrete Quantum Gravity
Benjamin F. Dribus
Department of Mathematics, William Carey University, 710 William Carey Parkway, Hattiesburg, MS 39401,
USA; Tel.: +1-985-285-5821

Received: 26 May 2017; Accepted: 25 June 2017; Published: 30 June 2017

Abstract: Path summation offers a flexible general approach to quantum theory, including quantum
gravity. In the latter setting, summation is performed over a space of evolutionary pathways in a
history configuration space. Discrete causal histories called acyclic directed sets offer certain advantages
over similar models appearing in the literature, such as causal sets. Path summation defined in terms
of these histories enables derivation of discrete Schrödinger-type equations describing quantum
spacetime dynamics for any suitable choice of algebraic quantities associated with each evolutionary
pathway. These quantities, called phases, collectively define a phase map from the space of evolutionary
pathways to a target object, such as the unit circle $S^1 \subset \mathbb{C}$, or an analogue such as $S^3$ or $S^7$. This paper
explores the problem of identifying suitable phase maps for discrete quantum gravity, focusing on a
class of $S^1$-valued maps defined in terms of “structural increments” of histories, called terminal states.
Invariants such as state automorphism groups determine multiplicities of states, and induce families
of natural entropy functions. A phase map defined in terms of such a function is called an entropic
phase map. The associated dynamical law may be viewed as an abstract combination of Schrödinger’s
equation and the second law of thermodynamics.

Keywords: quantum gravity; discrete spacetime; causal sets; path summation; entropic gravity

1. Introduction

1.1. Path Summation in Quantum Gravity

Feynman’s path summation approach to quantum theory [1], originally developed in the
non-relativistic context of four-dimensional Euclidean spacetime $\mathbb{R}^4$, has since been abstracted and
generalized to apply to a wide variety of situations in which quantum effects play a significant role,
including the study of fundamental spacetime structure and quantum gravity. In the latter setting,
the objects over which summation is performed are no longer spaces of paths in low-dimensional real
manifolds whose elements represent events, but spaces of evolutionary pathways in configuration
spaces whose elements represent histories, i.e., entire spacetimes. The distinction between summing
over evolutionary pathways for histories and summing over histories themselves becomes significant
in the background independent context, where each pathway represents a history together with a
generalized frame of reference, and where different pathways may encode identical physics. For both
conceptual and computational reasons, histories incorporating a version of discreteness and a notion
of causal structure are especially attractive for studying quantum gravity. Such histories include
“purely causal” objects such as causal sets [2] and causal networks [3–5], “mostly causal” objects such as
causal dynamical triangulations [6] and quantum causal histories [7], and objects incorporating a significant
degree of additional structure, such as spin foams [8,9], quantum cellular automata [10], causal fermion
systems [11,12], and tensor networks [13]. The histories studied in this paper, called acyclic directed sets,
resemble causal sets and causal networks, but with a few important distinctions [14–16].

Entropy 2017, 19, 322; doi:10.3390/e19070322 www.mdpi.com/journal/entropy

1.2. Path Summation Rudiments

I recall here a few basic notions regarding conventional path summation. In ordinary quantum
mechanics and quantum field theory, one considers directed paths γ representing possible particle
trajectories in a fixed spacetime manifold, such as Euclidean spacetime $\mathbb{R}^4$ or Minkowski spacetime $\mathbb{R}^{3+1}$.
Such paths are illustrated in the left-hand diagram in Figure 1, adapted from Figure 6.2.2 of [14].
One begins with a classical theory, whose dynamics is determined by a Lagrangian L encoding
information about motion-related or metric quantities. L may be regarded as an infinitesimal path
functional, i.e., a function of the particle motion whose value depends only on instantaneous information
along γ. This viewpoint generalizes naturally to more abstract settings. The classical action S(γ) is
given by integrating L along γ with respect to time. Hamilton’s principle states that the classical path
$\gamma_{\mathrm{CL}}$ renders the classical action stationary. Heuristically, this means that L “chooses” $\gamma_{\mathrm{CL}}$ from among
other alternatives by how S varies with γ. The classical equations of motion are the Euler–Lagrange
equations for L, derived via Hamilton’s principle.
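As a concrete illustration of Hamilton's principle, the sketch below discretizes the action of a free particle, an assumed example with Lagrangian L = (1/2) m v², and compares the straight-line path with deformed paths sharing the same endpoints. The bump shape, step count, and units are arbitrary choices made for this example.

```python
def action(path, dt, mass=1.0):
    """Discretized action: sum of L*dt with L = (1/2) m v^2 (free particle)."""
    total = 0.0
    for x0, x1 in zip(path, path[1:]):
        v = (x1 - x0) / dt
        total += 0.5 * mass * v * v * dt
    return total

def straight_path(x_start, x_end, steps):
    """Uniform motion between the endpoints: the classical path in this example."""
    return [x_start + (x_end - x_start) * k / steps for k in range(steps + 1)]

def perturbed(path, amplitude):
    """Same endpoints, with a parabolic bump of size `amplitude` added in between."""
    n = len(path) - 1
    return [x + amplitude * k * (n - k) / (n * n) for k, x in enumerate(path)]

dt = 0.1
classical = straight_path(0.0, 1.0, 10)
s_classical = action(classical, dt)
s_up = action(perturbed(classical, 0.1), dt)
s_down = action(perturbed(classical, -0.1), dt)

print(s_classical, s_up, s_down)
```

The two deformed paths give equal actions, both larger than the classical value. The first-order change in S vanishes and only a second-order change remains, which is what stationarity means.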

Figure 1. In a fixed spacetime background, the Lagrangian L “chooses” the classical path $\gamma_{\mathrm{CL}}$ via
Hamilton’s principle; in a background independent theory, different paths imply different spacetimes.
(Diagram labels: classical path $\gamma_{\mathrm{CL}}$, action stationary; “nearby” paths, action deviates.)

In the corresponding quantum theory, the behavior of the particle depends on contributions
from every possible path. To quantify this dependence, one defines a phase map Θ on a space of paths
in spacetime, given by Feynman’s formula

$$\Theta(\gamma) = e^{\frac{i}{\hbar} S(\gamma)}, \tag{1}$$

where $i = \sqrt{-1}$ and $\hbar$ is Planck’s reduced constant. For convenience, I use the term “phase” for the
value $e^{\frac{i}{\hbar} S(\gamma)}$ itself, rather than for the “angle” $\frac{1}{\hbar} S(\gamma)$ in the complex exponential. One then performs a
path integral to “sum together” these phases. Feynman’s path integral for paths in a subset R of $\mathbb{R}^4$ is
the prototypical example. Its value is interpreted as a complex quantum amplitude for R, encoding the
probability that the particle follows a path through R. Due to Hamilton’s principle, phases for paths
near the classical path $\gamma_{\mathrm{CL}}$ combine via constructive interference to yield relatively large amplitudes for
neighborhoods of $\gamma_{\mathrm{CL}}$, while phases for faraway paths destructively interfere. Schrödinger’s equation
for ordinary nonrelativistic quantum theory

$$i\hbar \frac{\partial \psi}{\partial t} = H\psi, \tag{2}$$
may be derived from Feynman’s path integral [1]. Here, ψ is the state function for the particle, and H
is the Hamiltonian operator.
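The interference argument can be illustrated with a toy calculation. The sketch below assumes a one-parameter family of free-particle paths from x = 0 to x = 1, labeled by a bump size eps, and averages the phases exp(iS/ħ) over two windows of eps: one around the classical path (eps = 0) and one far from it. The value of ħ, the path family, and the windows are arbitrary assumptions chosen to make the effect visible; this is not a full path integral.

```python
import cmath

HBAR = 0.01  # assumed small value, in the same arbitrary units as the action

def action(eps, steps=10, dt=0.1):
    """Discretized free-particle action (unit mass) for a path from x = 0 to
    x = 1 carrying a parabolic bump of size eps; eps = 0 is the classical path."""
    xs = [k / steps + eps * k * (steps - k) / steps ** 2 for k in range(steps + 1)]
    return sum(0.5 * ((b - a) / dt) ** 2 * dt for a, b in zip(xs, xs[1:]))

def mean_phase(eps_lo, eps_hi, samples=201):
    """Magnitude of the average of exp(i S / hbar) over evenly spaced bump sizes."""
    total = 0j
    for k in range(samples):
        eps = eps_lo + (eps_hi - eps_lo) * k / (samples - 1)
        total += cmath.exp(1j * action(eps) / HBAR)
    return abs(total) / samples

near = mean_phase(-0.2, 0.2)   # window centered on the classical path
far = mean_phase(1.0, 1.4)     # window of equal width, far from it

print(near, far)
```

Near the classical path the action is stationary, so the phases are nearly aligned and the average has magnitude close to 1. In the far window the phase winds around the circle several times and the average largely cancels.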

1.3. Effects of Gravity

Gravitation alters this picture by introducing interaction between spacetime and its material content.
It no longer suffices to consider particle paths in a fixed spacetime manifold, because different paths induce
different local responses in spacetime geometry. The right-hand diagram in Figure 1 illustrates this
complication, showing a region of spacetime “warping” around a path. Absence of a fixed spacetime
background in this context is called background independence. Einstein’s equation, conventionally
expressed in the form
$$R_{\mu\nu} - \frac{1}{2} R g_{\mu\nu} + \Lambda g_{\mu\nu} = \frac{8\pi G}{c^4} T_{\mu\nu}, \tag{3}$$

quantifies this coupling between geometry and matter under the framework of general relativity.
Here, $R_{\mu\nu}$ is the Ricci curvature tensor, R is the scalar curvature, $g_{\mu\nu}$ is the metric tensor, Λ is the
cosmological constant, G is Newton’s gravitational constant, c is the speed of light, and $T_{\mu\nu}$ is the
stress-energy tensor. Ultimately, one expects both geometry and matter to emerge from some deeper
structural substratum, and this has been a consistent theme of fundamental physics since the early
unification efforts of Einstein, Kaluza and Klein, Weyl, and a few others. Unification would offer a
perfect version of background independence by eliminating all distinction between a background
“arena” and foreground “objects”. Discrete causal theory [14] represents one specific effort toward
the goal of unification. More generally, any background independent adaptation of path summation
associates a different copy of spacetime with each possible distribution of matter and energy, and this
leads to sums involving entire configuration spaces of spacetimes. Each such spacetime is classically
self-contained, in the sense that it describes its own complete version of events, and has no ordinary
causal interaction with other possible spacetimes. In this context, a spacetime is often called a history,
and a configuration space S of spacetimes is called a history configuration space.
A subset of a history configuration space S equipped with a total order, such as the image of a
non-self-intersecting directed path γ in S, does not represent “classical dynamics”, since each history
contains its own complete description of events. However, certain special totally ordered subsets of S
may be interpreted as representing “growth” or “development” of one history into another, and such
subsets are called evolutionary pathways in S. Technical requirements for evolutionary pathways are
discussed below. Such pathways may or may not possess initial or terminal histories, depending on the
structure of S. However, any pair of pathways in S sharing a common terminal history, or a common
“limit” in more general settings, describe identical physics from different points of view. A familiar
example is given by partitioning Minkowski spacetime $\mathbb{R}^{3+1}$ via two different integer-indexed families
$\{\sigma_k\}$ and $\{\sigma'_k\}$ of spacelike sections, as illustrated in the left-hand diagram in Figure 2. This diagram
follows the usual convention of suppressing two spacelike dimensions, with time running vertically
up the page. Edges do not represent physical boundaries, but merely delimit the finite region shown.
Discrete evolutionary pathways for $\mathbb{R}^{3+1}$ may be defined via these partitions, as shown in the middle
and right-hand diagrams. One may completely foliate $\mathbb{R}^{3+1}$ by similar families, thereby defining
continuous pathways in a configuration space of Lorentzian manifolds. However, the simpler discrete
picture shown here, in which $\mathbb{R}^{3+1}$ is partitioned into increments of nontrivial causal extent, is more
illustrative of the discrete processes studied in this paper.
Both evolutionary pathways illustrated in Figure 2 describe the same empty, flat spacetime
represented by $\mathbb{R}^{3+1}$. However, they offer different perspectives regarding the evolution of this
spacetime. These may be identified with different inertial frames of reference on $\mathbb{R}^{3+1}$, since $\{\sigma_k\}$ and
$\{\sigma'_k\}$ are families of parallel spacelike hyperplanes. In more abstract settings, histories may not encode
recognizable geometry, so the relativistic idea of frames of reference must be generalized. However,
the conceptual content remains unchanged: each evolutionary pathway in a history configuration
space S describes a history together with a generalized frame of reference for this history. To qualify as
an evolutionary pathway, a totally ordered subset γ of S must satisfy the property that “later histories
in γ are evolutionary descendants of earlier histories”. Mathematically, this means that the total order
on γ must be derived naturally from the structure of S. The most convenient case is when S itself
possesses natural order-theoretic structure from which evolutionary relationships may be deduced in
a self-evident way. This is the case for discrete causal theory.

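Figure 2. $\mathbb{R}^{3+1}$ partitioned via sequences of spatial sections $\{\sigma_k\}$ and $\{\sigma'_k\}$; evolutionary pathways
defined by $\{\sigma_k\}$ and $\{\sigma'_k\}$. Both pathways share the same “limit history” $\mathbb{R}^{3+1}$.
(Diagram labels: numbered sections of each family; vertical axis: time.)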

1.4. Motivation for Entropic Phase Maps

Histories modeled by objects called countable star finite acyclic directed sets induce discrete causal
history configuration spaces called kinematic schemes, with properties superior in some ways to those
of similar spaces arising in causal set theory, causal dynamical triangulations, and related approaches.
These objects are formally defined in Section 2. Path summation over a kinematic scheme S,
together with other natural machinery, enables derivation of discrete causal Schrödinger-type equations
such as Equation (1.1.2) of [14]. This equation is reproduced here as Equation (4):

$$\psi^-_{R;\theta}(r) = \theta(r) \sum_{r^- \prec r} \psi^-_{R;\theta}(r^-). \tag{4}$$

The meaning of this equation is explained in Section 2, and more thoroughly in [14], but I briefly
describe its content here. The function $\psi^-_{R;\theta}$ is a generalized state function, called the past state function,
while R is a set of relations representing natural relationships between pairs of histories in S,
called co-relative histories. Sequences of co-relative histories fit together to define evolutionary pathways
in S, called co-relative kinematics. The relations r and $r^-$ are elements of R representing specific
co-relative histories. The precursor symbol ≺ in the expression $r^- \prec r$ indicates that the evolutionary
relationship represented by r is a possible sequel to the evolutionary relationship represented by $r^-$.
Remaining to be identified in Equation (4) is the relation function θ, which is the entity of principal
interest in this paper. This function assigns to each element r of R a phase θ (r ) belonging to some target
object T. The most obvious choice for T is the unit circle $S^1$, viewed as a subobject of the complex
field $\mathbb{C}$, and this is the target object focused on here. However, other choices may be studied in more
general contexts. For reasons explained in [14], the unit spheres $S^3$ and $S^7$, viewed as subobjects of the
quaternions $\mathbb{H}$ and octonions $\mathbb{O}$, respectively, are potentially interesting alternatives. At a finer level
of detail, it may be appropriate to consider discrete subobjects of $S^1$, $S^3$, or $S^7$, which possess interesting
algebraic properties. Alternatively, T might be an object at a higher level of algebraic hierarchy, such as
a monoidal category. In any case, T must possess a “multiplicative” operation, enabling the factor $\theta(r)$
to multiply the sum $\sum_{r^- \prec r} \psi^-_{R;\theta}(r^-)$ in Equation (4). Extending θ via this operation, as described below,
defines a phase map Θ on the space of co-relative kinematics in S. The form of Equation (4) assumes
that θ generates Θ in this way; otherwise, the equation must be generalized. Under this assumption,
θ provides specific dynamical content to the equation, and thereby defines a quantum dynamical law
governing fundamental spacetime structure.
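The recursive structure of Equation (4) can be sketched on a toy example. The code below assumes a hypothetical set of four relations with made-up precursor lists and made-up angles, and takes θ to be valued in the unit circle. It also assumes that a relation with no precursors has ψ(r) = θ(r), an initial condition chosen only to start the recursion and not taken from the text. The sketch then checks that the recursion reproduces a sum over chains of relations, each weighted by the product of its phases.

```python
import cmath
import math
from functools import lru_cache

# Hypothetical toy data: PRECURSORS[r] lists every r_minus with r_minus ≺ r.
PRECURSORS = {"r0": [], "r1": ["r0"], "r2": ["r0"], "r3": ["r1", "r2"]}
ANGLES = {"r0": 0.0, "r1": 0.5, "r2": -0.5, "r3": 1.0}  # made-up angles

def theta(r):
    """Relation function with values in the unit circle."""
    return cmath.exp(1j * ANGLES[r])

@lru_cache(maxsize=None)
def psi(r):
    """Past state function via the recursion of Equation (4). Relations without
    precursors get psi(r) = theta(r), an assumed initial condition."""
    preds = PRECURSORS[r]
    if not preds:
        return theta(r)
    return theta(r) * sum(psi(p) for p in preds)

def chains_ending_at(r):
    """All maximal chains of relations that end at r."""
    preds = PRECURSORS[r]
    if not preds:
        return [[r]]
    return [chain + [r] for p in preds for chain in chains_ending_at(p)]

def chain_phase(chain):
    """Product of the phases theta(r) along one chain."""
    result = 1 + 0j
    for r in chain:
        result *= theta(r)
    return result

path_sum = sum(chain_phase(c) for c in chains_ending_at("r3"))
print(abs(psi("r3") - path_sum) < 1e-12)
print(abs(psi("r3")), 2 * math.cos(0.5))
```

In this toy example the two chains reaching `r3` carry angles that differ by one radian, so they interfere to a magnitude of 2 cos(0.5) rather than 2.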

The elements of the relation set R in Equation (4) encode information up to first order at the
quantum level, in the sense that they represent individual stages of evolution in S. Hence, θ is
analogous to an infinitesimal path functional on S, i.e., a generalized Lagrangian. Similarly, Θ may
be regarded as a generalized action. However, to simplify the form of Equation (4), the appropriate
analogue of the exponentiation appearing in Feynman’s phase map (1) is “built in” to the definition
of θ. Hence, the quantities I call “phases” throughout the remainder of the paper are analogous
to Feynman’s complex exponentials e^{(i/ħ)S(γ)} themselves, not to the corresponding “angles” (1/ħ)S(γ).
The phase Θ(γ) of a co-relative kinematics γ is therefore a product of phases θ (r ) of individual relations
r along γ, rather than a sum or integral. More precisely, one may define a concatenation product 0 joining
co-relative kinematics “end-to-end”, under which γ may be factored into a product of individual
relations γ = ... 0 r0 0 r1 0 r2 0 ... Extending θ multiplicatively then means that Θ(γ) = ∏k θ (rk ),
where the product is in the target object T. Questions of convergence are important in general, but are
not examined here, since one may go quite far under finiteness assumptions.
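The multiplicative extension of θ to Θ may be illustrated by a minimal sketch, assuming T = S1 and a finite co-relative kinematics stored as a list of relation labels. The names extend_phase, theta, and angles, and the numerical values assigned to the relations, are hypothetical and serve only as an illustration.

```python
import cmath

def extend_phase(theta, gamma):
    """Return Theta(gamma) as the product of theta(r) over the relations r in gamma.

    theta: function assigning a unit complex number to each relation.
    gamma: finite sequence of relations (finiteness is assumed here).
    """
    total = 1 + 0j  # identity element of the unit circle
    for r in gamma:
        total *= theta(r)
    return total

# Hypothetical "angles" for three relations. The exponentiation is built
# in to theta, so theta returns phases rather than angles.
angles = {"r0": 0.3, "r1": 1.1, "r2": -0.4}
theta = lambda r: cmath.exp(1j * angles[r])

gamma = ["r0", "r1", "r2"]
Theta = extend_phase(theta, gamma)
```

Under these assumptions, the product of phases coincides with the exponential of the summed angles, which is the sense in which Θ plays the role of an exponentiated action.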
This paper explores the problem of identifying suitable phase maps for discrete quantum gravity,
focusing on a class of S1 -valued maps defined in terms of terminal states Δ of histories D along
evolutionary pathways γ in a history configuration space S. Here, S is a kinematic scheme of star
finite acyclic directed sets D, γ is a co-relative kinematics, and Δ encodes “recent” causes and effects
in D. Invariants such as state automorphism groups Aut(Δ) determine multiplicities of states, and induce
natural families of entropy functions. Resolution entropy is defined via a “coarse-graining” procedure
called causal atomic resolution, analogous to conventional partitioning of state space into families
of states sharing “macroscopic” properties. Superset entropy is defined by counting the number of
ways in which a terminal state Δ may embed into a larger state Δ′ called a superset of Δ. A large
state automorphism group Aut(Δ) corresponds to a small number of such supersets, and therefore
implies low entropy. Labeled entropy is defined by counting the number of ways to label elements
of Δ; again, large Aut(Δ) implies low entropy. Symmetry entropy, by contrast, is defined by counting
the elements of Aut(Δ) itself, so large Aut(Δ) implies high entropy in this context. A primitive
version of symmetry entropy is discussed in Section 8.2 of [14]. A phase map defined in terms of such
entropic quantities, or related quantities such as entropy per unit volume, is called an entropic phase map.
The resulting version of Equation (4) may be viewed as an abstract combination of Schrödinger’s
equation and the second law of thermodynamics, which arises entirely from the structure of S.
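The opposite dependence of labeled and symmetry counts on Aut(Δ) may be seen in a small brute-force sketch. It assumes a finite state given by a list of elements and a set of ordered pairs; the example state and the names automorphisms, n_aut, and n_labelings are hypothetical. The count n!/|Aut(Δ)| of inequivalent labelings is the standard orbit-stabilizer count, and is used here only to illustrate the trend, not as the precise definition given in Section 3.4.

```python
from itertools import permutations
from math import factorial

def automorphisms(elements, relations):
    """List all automorphisms of a finite directed set by brute force."""
    elements = list(elements)
    rel = set(relations)
    autos = []
    for image in permutations(elements):
        f = dict(zip(elements, image))
        # A bijection of a finite set that maps relations to relations maps
        # the relation set onto itself, so its inverse is also a morphism.
        if all((f[x], f[y]) in rel for (x, y) in rel):
            autos.append(f)
    return autos

# Hypothetical state: one element x with two maximal successors a and b.
elements = ["x", "a", "b"]
relations = [("x", "a"), ("x", "b")]

n_aut = len(automorphisms(elements, relations))   # order of the automorphism group
n_labelings = factorial(len(elements)) // n_aut   # inequivalent labelings
```

Enlarging the automorphism group raises n_aut and lowers n_labelings, matching the qualitative statements above.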
Section 2 presents the necessary background from discrete causal theory [14] to support
the development and description of these ideas. Section 2.1 briefly outlines the conceptual and
philosophical foundations of discrete causal theory. Section 2.2 describes the classical version of
the theory, expressed in terms of countable star finite acyclic directed sets. Section 2.3 sketches the
theory of relation space, which addresses certain technical difficulties in earlier versions of the theory
such as causal set theory. Section 2.4 describes the basics of discrete quantum causal theory. Section 3
examines entropy and the second law of thermodynamics in a broad context, introduces discrete causal
analogues of familiar thermodynamic ideas such as state space, and develops the specific notions of
entropy mentioned above. Section 3.1 discusses entropy in general terms under a broad framework
called entropy systems. Section 3.2 describes associated versions of the second law. Section 3.3 introduces
discrete causal state spaces. Section 3.4 defines resolution, superset, labeled, and symmetry entropies.
Section 4 introduces entropic phase maps, and examines some of their properties. Section 4.1 describes
some simple versions of these maps explicitly. Section 4.2 discusses the problem of obtaining suitable
interference effects analogous to those induced for Feynman’s phase map by Hamilton’s principle.
Section 4.3 discusses some possible objections to the idea of entropic phase maps, and briefly
examines an alternative approach involving a more conventional notion of action. Section 4.4 offers
concluding remarks, and mentions some mathematical problems whose solution would enhance the
study of entropic phase maps.

2. Discrete Causal Theory

2.1. Causal Metric Hypothesis

Discrete causal theory is a general approach to fundamental physics that emphasizes
discrete spacetime models equipped with directed structure encoding cause-and-effect relationships
between pairs of events. Included under this umbrella are causal set theory [2], causal dynamical
triangulations [6], and quantum causal histories [7]. Similar ideas contribute to loop quantum
gravity [8,9], information-related approaches involving causal networks or cellular automata [10,17,18],
causal fermion systems [11], and the theory of tensor networks [13]. The version of discrete causal
theory used in this paper is distinct from all these, but may be regarded as an enhanced version of causal
set theory [14]. Clean and appealing basic structure is an asset of discrete causal theory, but its principal
motivation derives from technical results called metric recovery theorems, discussed in Section 2.2,
which demonstrate that discrete causal models can reproduce relativistic spacetime geometry at
ordinary scales. Such models also avoid generic divergence problems, and offer potential explanatory
advantages by allowing “pre-geometric” notions such as spacetime dimension to emerge dynamically.
These models cannot yet replace relativistic geometry root and branch, because
relativity explains how geometry evolves via Einstein’s Equation (3), while discrete causal dynamics
remains primitive. This paper offers a modest contribution toward rectifying this deficiency.
A radical interpretation of the aforementioned metric recovery results is the causal metric
hypothesis [14–16], which states that the structural properties of the universe, particularly the metric
structure of spacetime, emerge from causal structure at the fundamental scale. This general idea forms
the philosophical basis for discrete causal theory, but may be accorded different weights in different
versions of the theory. The strong interpretation of the causal metric hypothesis ascribes all of physics,
including “nongravitational matter”, to causal structure. In the context of entropic phase maps,
the strong interpretation extends the thermodynamic hypothesis regarding gravitation [19] to treat
matter and energy in similar terms. Alternatively, one may choose to restrict attention to gravity,
leaving aside unification. In this context, matter and energy may be modeled by attaching auxiliary
algebraic structure to causal structure. In either case, quantum theory arises via generalized path
summation in a manner much simpler and more natural than conventional attempts to quantize
relativistic geometry. The directed structures of individual discrete causal histories combine to induce
higher-level multidirected structures on their history configuration spaces, analogous to higher-level
geometric structures of moduli spaces in algebraic geometry. This iteration of structure enables a natural
version of summation over evolutionary pathways, which leads to quantum dynamics governed by
discrete causal Schrödinger-type equations such as Equation (4).

2.2. Classical Theory

The mathematical objects used to model discrete causal histories in this paper are called
countable star finite acyclic directed sets. Before defining them formally, I make two clarifying remarks.
First, these objects are conventionally called “directed graphs” rather than “directed sets”, because the
latter term has a more specific conventional meaning. However, graph-theoretic terminology is
awkward here, and “directed set” ideally communicates the intended notion of a set D equipped
with directions between distinguished pairs of elements x and y. Such a direction is called a
relation between x and y, with initial element x and terminal element y, and is denoted by x ≺ y.
The precursor symbol ≺ generalizes the familiar less than symbol < on a totally ordered set such as Z.
The relation x ≺ y is represented graphically by a directed edge between nodes representing x and y.
A family of such relations is called a binary relation on D, denoted collectively by the same symbol ≺.
Mathematically, ≺ is a subset of the Cartesian product D × D. Dual usage of the word “relation” and
the symbol ≺ for individual relations x ≺ y and for the set ≺ of all such individual relations is a
standard convenience. Second, the choice to focus on acyclic directed sets rules out discrete causal
analogues of closed causal curves, but this is a simplifying assumption that may be relaxed. It does
not imply the view that quantum gravity necessarily forbids such structure. Countability and/or star
finiteness may also be relaxed, though in my opinion there is limited motivation for doing so.
The following definitions are adapted from Sections 3.6 and 3.7 of [14]:

Definition 1. A directed set (D, ≺) is a set D equipped with a binary relation ≺. A morphism from a
directed set (D, ≺) to a directed set (D′, ≺′) is a set map f : D → D′ such that f(x) ≺′ f(y) whenever x ≺ y.
The category of directed sets D is the category whose objects are directed sets and whose morphisms are
morphisms of directed sets. A subobject of a directed set (D, ≺) is a directed set (D′, ≺′), where D′ is a subset
of D, and where ≺′ is a subset of ≺ consisting of relations between pairs of elements of D′. The causal dual of
a directed set (D, ≺) is the directed set (D, ≺∗), where x ≺∗ y if and only if y ≺ x.
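For finite examples, the morphism condition and the causal dual may be checked directly. The following is a minimal sketch, assuming a binary relation is stored as a set of ordered pairs and a set map as a dictionary; the function names and the example sets are hypothetical.

```python
def is_morphism(f, source_relation, target_relation):
    """True if f(x) precedes f(y) in the target whenever x precedes y in the source."""
    return all((f[x], f[y]) in target_relation for (x, y) in source_relation)

def causal_dual(relation):
    """Reverse every relation: x precedes y in the dual iff y precedes x."""
    return {(y, x) for (x, y) in relation}

# Hypothetical example: a three-element chain mapped into a four-element,
# nontransitive chain.
source = {("x", "y"), ("y", "z")}
target = {(0, 1), (1, 2), (2, 3)}

f = {"x": 0, "y": 1, "z": 2}  # preserves both relations
g = {"x": 0, "y": 2, "z": 3}  # sends the relation (x, y) to (0, 2), which is not a relation

f_ok = is_morphism(f, source, target)
g_ok = is_morphism(g, source, target)
dual = causal_dual(source)
```

The map g fails because the target is not transitive: 0 precedes 2 only indirectly, so the pair (0, 2) is not a relation.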

Definition 2. A multidirected set (M, R, i, t) consists of a set of elements M, a set of relations R, and initial
and terminal element maps i : R → M and t : R → M. A morphism from a multidirected set
(M, R, i, t) to a multidirected set (M′, R′, i′, t′) consists of a map of elements f_ELT : M → M′ and a
map of relations f_REL : R → R′, such that f_ELT(i(r)) = i′(f_REL(r)) and f_ELT(t(r)) = t′(f_REL(r)) for
each r in R. The category of multidirected sets M is the category whose objects are multidirected sets and
whose morphisms are morphisms of multidirected sets. A subobject of a multidirected set (M, R, i, t) is a
multidirected set (M′, R′, i′, t′), where M′ and R′ are subsets of M and R, respectively, and where i′ and t′
are the restrictions of i and t to R′. The causal dual of a multidirected set (M, R, i, t) is the multidirected
set (M, R, t, i).

Deﬁnition 3. A chain in a multidirected set ( M, R, i, t) is a sequence of relations ..., rk , rk+1 , ... such that
t(rk ) = i (rk+1 ). The past of an element x of ( M, R, i, t) is the set of all elements w in M such that there exists a
chain r0 , ..., r N with i (r0 ) = w and t(r N ) = x. The future of x is the set of all elements y in M such that there
exists a chain r0 , ..., r N with i (r0 ) = x and t(r N ) = y. An antichain in ( M, R, i, t) is a subset σ of M with no
chain connecting any pair of its elements, distinct or otherwise. The past relation set R− ( x ) of an element x
in M is the set of all relations r in R such that t(r ) = x. The future relation set R+ ( x ) of x is the set of all
relations r in R such that i (r ) = x. The relation set R( x ) of x is the union R− ( x ) ∪ R+ ( x ).
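The notions in Definition 3 may be computed directly for a finite multidirected set. The sketch below assumes that the maps i and t are stored as dictionaries from relation labels to elements; the function names and the example, which contains two distinct relations from w to x, are hypothetical.

```python
def future(x, initial, terminal):
    """Elements reachable from x by a chain of one or more relations."""
    seen, stack = set(), [x]
    while stack:
        u = stack.pop()
        for r in initial:
            if initial[r] == u and terminal[r] not in seen:
                seen.add(terminal[r])
                stack.append(terminal[r])
    return seen

def past(x, initial, terminal):
    """The past of x is its future in the causal dual (M, R, t, i)."""
    return future(x, terminal, initial)

def is_antichain(sigma, initial, terminal):
    """No chain connects any pair of elements of sigma, distinct or otherwise."""
    return all(y not in future(x, initial, terminal) for x in sigma for y in sigma)

def past_relation_set(x, terminal):
    return {r for r in terminal if terminal[r] == x}

def future_relation_set(x, initial):
    return {r for r in initial if initial[r] == x}

# Hypothetical multidirected set: r0 and r1 both run from w to x.
initial = {"r0": "w", "r1": "w", "r2": "x", "r3": "w"}
terminal = {"r0": "x", "r1": "x", "r2": "z", "r3": "y"}

future_w = future("w", initial, terminal)
past_z = past("z", initial, terminal)
```

In this example, {x, y} is an antichain, while {w, z} is not, since the chain r0, r2 connects w to z.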

For both directed sets and multidirected sets, an isomorphism is an invertible morphism, and an
automorphism is a self-isomorphism. Isomorphic sets are usually considered to be equivalent. It is
often convenient to denote a directed set or multidirected set by just D or M, respectively, or to write
D = ( D, ≺) or M = ( M, R, i, t) to indicate that a set D or M is equipped with such structure. Similarly,
the causal dual of a directed set D may be denoted by D ∗ , and the causal dual of a multidirected set M
by M∗ . A directed set D = ( D, ≺) may be recognized as a multidirected set whose set of relations is
the binary relation ≺, and whose initial and terminal element maps are defined by setting i ( x ≺ y) = x
and t( x ≺ y) = y. For multidirected sets, the notation x ≺ y remains useful to indicate the existence of
a relation r such that i (r ) = x and t(r ) = y, even though no binary relation is involved. The necessity
to study multidirected sets arises at the quantum level, via iteration of structure.
A well-motivated version of discrete classical causal theory is defined by the axioms in Definition 4,
adapted from Definition 4.10.1 of [14]. Symbols and terms are further discussed below.

Deﬁnition 4. Five axioms for discrete classical causal theory are the following:

1. Binary axiom: Classical spacetime may be modeled as a directed set D = ( D, ≺), whose elements
represent events, and whose relations represent causal relationships between pairs of events.
2. Generalized measure axiom: D is equipped with a set function μ from the power set P( D ) of D to
the extended real numbers R ∪ {∞}, which assigns finite positive values to nonempty finite subsets of D,
and infinite values to infinite subsets of D.
3. Countability: D is countable.
4. Star finiteness: For every element x of D, the star St( x ) = { x } ∪ R( x ) of x is finite.
5. Acyclicity: D possesses no cycles, i.e., sequences of relations x0 ≺ ... ≺ x N with x0 = x N .
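For a finite directed set, the star of an element and the acyclicity axiom may be checked as in the following sketch. Star finiteness and countability hold automatically in the finite case, so only acyclicity is a genuine test here. The function names and the two example sets are hypothetical.

```python
def star(x, relations):
    """St(x): the element x together with every relation beginning or ending at x."""
    return {x} | {r for r in relations if x in r}

def is_acyclic(elements, relations):
    """Repeatedly strip elements with no remaining direct predecessors."""
    remaining = set(elements)
    rel = set(relations)
    while remaining:
        minimal = {x for x in remaining
                   if not any(b == x and a in remaining for (a, b) in rel)}
        if not minimal:
            # Every remaining element has a predecessor, so a cycle exists.
            return False
        remaining -= minimal
    return True

# Hypothetical examples: a nontransitive chain, and a set containing a
# self-relation and a pair of reciprocal relations.
chain = {("x", "y"), ("y", "z")}
cyclic = {("t", "t"), ("u", "v"), ("v", "u")}

chain_ok = is_acyclic({"x", "y", "z"}, chain)
cyclic_ok = is_acyclic({"t", "u", "v"}, cyclic)
star_y = star("y", chain)
```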

The binary axiom specifies both a mathematical structure and a physical interpretation of
this structure. The generalized measure axiom imposes no mathematical conditions on the remaining
axioms, so it is allowed a range of possible versions, each specified by a choice of μ. The most attractive
choices are similar to the counting measure used in early versions of causal set theory, which assigns
to each subset of D its number of elements in fundamental units. The function μ is unrelated to the
family of measures μ for an entropy system, introduced in Section 3.1. Since the star St( x ) of x is just
{ x } ∪ R( x ), star finiteness is equivalent to finiteness of relation sets R( x ). The physical meaning of
this condition is that every event has only a finite number of direct causes and effects. The reason for
using St( x ) rather than R( x ) involves topological bookkeeping that plays no direct role in this paper.
The meanings of countability and acyclicity are self-evident. The discreteness of D is encoded in the
generalized measure axiom and the axiom of star finiteness.
Figure 3, adapted from Figure 3.6.5 of [14], illustrates different types of directed sets and
multidirected sets. Elements are represented by nodes, and relations by directed edges. In the
third and fourth diagrams, directions of relations are indicated by arrows, while in the first and
second diagrams, directions are inferred via an “up the page” convention analogous to the convention
for the direction of time in Minkowski spacetime diagrams. This convention applies only to
acyclic directed sets. The first diagram illustrates a causal set, i.e., a countable, irreflexive, transitive,
interval finite directed set (C, ≺CS ). Irreflexivity means that C contains no “self-relations” x ≺CS x.
Transitivity means that if x ≺CS y and y ≺CS z, then x ≺CS z. Irreflexivity and transitivity together
imply acyclicity. Transitivity leads to trouble in distinguishing between direct and indirect causation
in causal set theory [14,20]. Interval finiteness means that only a finite number of elements y lie
between any two elements x and z of C, in the sense that x ≺CS y ≺CS z. Interval finiteness and
star finiteness are incomparable, i.e., neither condition implies the other. An important class of
causal sets that are generally not star finite are those induced by randomly “sprinkling” elements
into a Lorentzian manifold. These sets are useful to illustrate metric recovery results, but they are
not regarded as physically realistic, even in causal set theory. Star finite objects are preferred as
the actual workhorses for quantum gravity [2,21,22]. The second diagram in Figure 3 illustrates a
nontransitive acyclic directed set; in particular, the two relations x ≺ y and y ≺ z do not imply a
relation x ≺ z. The physical interpretation of this set still recognizes x as a cause of z, but not a direct
cause. This is analogous to the relationship between a grandparent and grandchild. The third diagram
illustrates a directed set D with cycles, including the “self-relation” t ≺ t and the “reciprocal relations”
u ≺ v ≺ u. Such sets are not studied in this paper, but remain interesting in more general contexts.
The fourth diagram illustrates a multidirected set M whose relation structure is more complicated than
any binary relation on its set of elements. For example, there are two distinct relations in M from x to
y. In discrete causal theory, multiple relations between pairs of elements arise at the quantum level,
where a given pair of histories may exhibit multiple direct evolutionary relationships.
Absent from Definition 4 is any specification of classical dynamics. This reflects the philosophy that
physics at the fundamental scale should be described in quantum-theoretic terms. Classical equations of
motion should emerge at larger scales from underlying quantum dynamics, according to a generalized
version of the correspondence principle. All histories obeying suitable axioms should contribute to
this dynamics, with contributions of “well-behaved” histories reinforced via constructive interference,
and contributions of “pathological” histories damped out. There should be no artificial distinction
between “on-shell” histories that obey preconceived classical dynamics, and “off-shell” histories that
do not. All permissible histories should begin on an equal footing, just as all permissible paths begin
on equal footing in conventional path integration.

Figure 3. Causal set; acyclic directed set; directed set; multidirected set. Panels, left to right: (C, ≺CS), (D, ≺), (D′, ≺′), (M, R, i, t).

Structurally attractive models need not be relevant to the actual universe. Genuinely interesting
models exhibit solid connections to established physics. For discrete causal theory, such connections
are provided by the metric recovery theorems of Hawking [23] and Malament [24], and their
generalizations [25–27]. Informally, these theorems state that the causal structure of relativistic spacetime
determines its geometric structure up to scale. The causal metric hypothesis [14–16] strengthens and
generalizes this statement by removing dependence on relativity and the caveat “up to scale”.
If spacetime is precisely smooth and Lorentzian to arbitrary scales, then the causal metric hypothesis
is not quite true, due to this missing scale data. Hence, the hypothesis relies on the assumption that
such data arises in the actual universe from some natural source other than a Lorentzian metric.
What Finkelstein [3,4], Myrheim [28], ’t Hooft [29], Sorkin [2], and others realized by around
1980 was that discrete causal structure supplies its own natural notion of scale via enumeration
of fundamental elements. Later, it became popular to admit fluctuations in the sizes of elements to
preserve systematic Lorentz invariance [30,31]. The generalized measure axiom in Definition 4 further
relaxes this picture to allow the possible contribution of relation structure in determining volume.
However, the basic lesson of metric recovery is unchanged by these modifications: discrete causal
structure supplies natural scale data absent in continuous causal structure. Hence, Lorentzian geometry
at large scales may be reasonably attributed to discrete causal structure at the fundamental scale.

2.3. Relation Space

A gem of structural philosophy from pure mathematics is Grothendieck’s relative viewpoint,
which emphasizes the study of objects together with their natural relationships. In discrete causal theory,
the relative viewpoint is a conceptual tool of tremendous power and scope. A natural relationship
between a pair of events in this setting is just a causal relationship, represented by a relation x ≺ y
between elements x and y of a directed set D = ( D, ≺). The collection of all such relations is just the
binary relation ≺. It is surprisingly useful to view ≺ as a directed set in its own right, by recognizing
“relations between pairs of relations”. The resulting object R( D ) is called the relation space over D.
Deﬁnition 5, adapted from Deﬁnition 5.1.1 of [14], generalizes this idea to multidirected sets.

Deﬁnition 5. Let M = ( M, R, i, t) be a multidirected set, and let r0 and r1 be elements of its relation set R.

1. The induced relation ≺ on R is deﬁned by setting r0 ≺ r1 if and only if t(r0 ) = i (r1 ).

2. The directed set R( M ) = ( R, ≺) is called the relation space over M.
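Definition 5 translates into a one-line construction for finite multidirected sets. The sketch below assumes the maps i and t are stored as dictionaries from relation labels to elements; the name relation_space and the example, with relations a and b both running from x to y, are hypothetical.

```python
def relation_space(initial, terminal):
    """Induced relation on R: r0 precedes r1 if and only if t(r0) = i(r1)."""
    return {(r0, r1) for r0 in terminal for r1 in initial
            if terminal[r0] == initial[r1]}

# Hypothetical multidirected set: a and b run from x to y, c from y to z,
# and d directly from x to z.
initial = {"a": "x", "b": "x", "c": "y", "d": "x"}
terminal = {"a": "y", "b": "y", "c": "z", "d": "z"}

induced = relation_space(initial, terminal)
```

Here both a and b precede c in the relation space, while d is related to nothing, since no relation ends at x or begins at z.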

The induced relation involves a new use of the precursor symbol ≺. Figure 4, adapted from
Figure 5.1.3 of [14], illustrates the relation space R( D ) over an acyclic directed set D. The left-hand
diagram shows the construction of an individual relation r0 ≺ r1 , while the right-hand diagram shows
R( D ) as a whole. More generally, R( M ) may be identiﬁed with the line digraph [32] over the directed
multigraph corresponding to M. Theorem 6 gives the essential properties of relation space.

Theorem 6. Passage to relation space deﬁnes a functor R from the category M of multidirected sets to the
category D of directed sets. This functor sends acyclic multidirected sets to irreducible acyclic directed sets,
and preserves star ﬁniteness.

Proof. See [14], Theorem 5.1.4.

Figure 4. Induced relation between relations r0 and r1 in a directed set D; global view of R(D).

An important application of relation space in discrete causal theory is to eliminate a technical
problem called permeability [33,34], which obstructs formulation and solution of initial value problems.
In such a problem, one begins by specifying information associated with a maximal antichain σ in
a directed set D, which is analogous to a spatial section of relativistic spacetime. One then attempts
to solve for corresponding data throughout the future of σ. In general relativity, a Cauchy surface σ
in a Lorentzian manifold X is an impermeable maximal antichain with respect to the causal structure
of X, meaning that every inextensible causal curve in X intersects σ. Cauchy surfaces are useful for
formulating initial value problems, because information cannot permeate a Cauchy surface σ to affect
its future without being “ﬁltered” by σ. Lorentzian manifolds containing Cauchy surfaces are called
globally hyperbolic. The left-hand diagram in Figure 5, adapted from Figure 5.4.1 of [14], illustrates two
causal curves intersecting a Cauchy surface in a globally hyperbolic manifold.

Figure 5. Cauchy surface σ in a globally hyperbolic manifold X, intersected by two causal curves;
maximal antichain σ in a directed set D, permeated by two chains.

In discrete causal theory, a typical maximal antichain σ in a typical directed set D is permeable,
meaning that chains in D may pass through σ from past to future without intersecting σ. In causal
set theory [33], this phenomenon is referred to as “missing links”; the antichain σ is compared to
a “sieve” [34], which is “by-passed” by a “large amount of geometric information”. “Thickened
antichains”, obtained by adding limited quantities of past and future elements to σ, typically suffer
from the same problem. Hence, maximal antichains are not good analogues of Cauchy surfaces in
causal set theory, and the same statement applies to discrete causal theory in general. The right-hand
diagram in Figure 5 illustrates a pair of chains permeating a maximal antichain σ in an acyclic
directed set. The dashed lines connecting the elements of σ are a visual aid, not part of the structure.
Permeability means that information can leak through σ, for example, from w to z. Besides posing a
general obstacle to discrete causal dynamics, this problem also has a specific bearing on the definition
and analysis of entropic quantities, again typified in the causal set context [35,36]. Fortunately, however,
this problem disappears upon passage to relation space.

Theorem 7. Maximal antichains in relation space are impermeable. That is, if σ is a maximal antichain in
the relation space R( M ) over a multidirected set M, and if γ is a chain of relations in R( M) beginning at an
element in the past of σ and terminating at an element in the future of σ, then γ intersects σ.

Proof. See [14], Theorem 5.4.3.
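The contrast between permeable antichains in a directed set and impermeable antichains in its relation space may be checked by brute force on a small example. The sketch below is an illustration under stated assumptions, not a proof: the six-element directed set, the antichain {x, y}, and all function names are hypothetical, and the enumeration of antichains is feasible only for very small sets.

```python
from itertools import combinations

def reachable(sources, relations, removed=frozenset()):
    """Elements reached from sources by chains of one or more relations
    that avoid every element in removed."""
    seen, stack = set(), [s for s in sources if s not in removed]
    while stack:
        u = stack.pop()
        for (a, b) in relations:
            if a == u and b not in removed and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

def dual(relations):
    return {(b, a) for (a, b) in relations}

def is_antichain(sigma, relations):
    return not (reachable(sigma, relations) & set(sigma))

def maximal_antichains(elements, relations):
    elements = list(elements)
    antichains = [set(c) for k in range(1, len(elements) + 1)
                  for c in combinations(elements, k)
                  if is_antichain(c, relations)]
    return [s for s in antichains if not any(s < t for t in antichains)]

def is_permeable(sigma, relations):
    """True if some chain runs from the past of sigma to its future
    without intersecting sigma."""
    past = reachable(sigma, dual(relations))
    future = reachable(sigma, relations)
    return bool(reachable(past, relations, removed=frozenset(sigma)) & future)

def relation_space(relations):
    """Induced relation on a binary relation: r0 precedes r1 iff t(r0) = i(r1)."""
    return {(r0, r1) for r0 in relations for r1 in relations if r0[1] == r1[0]}

# Hypothetical acyclic directed set; the relation (w, z) bypasses {x, y}.
elements = {"p", "w", "x", "y", "z", "q"}
D = {("p", "w"), ("w", "x"), ("w", "z"), ("y", "z"), ("z", "q")}

sigma = {"x", "y"}
R = relation_space(D)
sigmas_R = maximal_antichains(D, R)  # the elements of R(D) are the relations of D
```

In this example, the chain from p through w and z to q permeates the maximal antichain {x, y}, while each of the three maximal antichains in the relation space is impermeable, as Theorem 7 requires.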

Path summation in discrete causal theory is described in terms of impermeable antichains,
and therefore depends on the theory of relation space in an essential way.

2.4. Quantum Theory

Just as relations between pairs of events are central to discrete classical causal theory,
so directed relationships between pairs of histories are central to discrete quantum causal theory.
These relationships are called co-relative histories. The word “relative” refers to the relative viewpoint,
while the preﬁx “co” derives from covariant constructions in category theory. The physical
interpretation of a co-relative history is that it encodes the evolution of one history into another.
The left-hand diagram in Figure 6, adapted from Figure 6.4.6 of [14], illustrates a family of four
co-relative histories sharing a common initial history, called a cobase. The right-hand diagram illustrates
how these co-relative histories are represented by morphisms of directed sets.

Figure 6. Four co-relative histories sharing a common cobase with two elements x and y and one
relation x ≺ y; morphisms (transitions) representing these co-relative histories.

Individual morphisms in the category D of directed sets do not always uniquely represent
evolutionary relationships, due to symmetries. For example, the co-relative history h3 in Figure 6 is
represented by two different morphisms τ3 and τ3′, due to the symmetry interchanging the two
maximal elements of its target history. Hence, co-relative histories are deﬁned as equivalence
classes of morphisms. It is convenient to restrict attention to special morphisms called transitions,
which represent “growth” of directed sets. This idea is made precise in Definition 8, adapted from
Definition 6.3.4 of [14]. Co-relative histories are then introduced in Definition 9, adapted from
Definition 6.4.3 of [14].

Definition 8. A transition in the category D of directed sets is a monomorphism τ : D → D′, embedding its
source D into its target D′ as a proper, full, originary subobject. Here, “proper” means that τ(D) has
nontrivial complement in D′, “full” means that τ(x) ≺ τ(y) in D′ if and only if x ≺ y in D, and “originary”
means that the isomorphic image τ(D) of D in D′ contains its own past.

At a less-formal level, the condition that τ is a monomorphism means that τ does not “erase”
details of the source D. The “proper” condition means that τ encodes nontrivial change. The “full”
condition means that τ does not “edit” details of D. The “originary” condition means that τ does not
add “prehistory” to D. These conditions support the desired evolutionary interpretation.

Definition 9. A proper, full, originary co-relative history h : Di ⇒ Dt is an equivalence class of
transitions τ : Di → Dt, where two transitions τ and τ′ are equivalent if and only if there exists an
automorphism β of Dt mapping τ(Di) onto τ′(Di). The common source Di of the transitions representing h is
called the cobase of h, and the common target Dt of these transitions is called the target of h.
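Definitions 8 and 9 may be tested on finite examples as in the following sketch. It assumes directed sets are given by a set of elements and a set of ordered pairs, and that set maps are dictionaries. The function names and the example are hypothetical: the cobase is a single relation, and the target has two maximal elements interchanged by a symmetry, so that two distinct transitions represent the same co-relative history.

```python
from itertools import permutations

def is_transition(tau, src_elems, src_rel, tgt_elems, tgt_rel):
    image = {tau[x] for x in src_elems}
    injective = len(image) == len(src_elems)
    proper = image != set(tgt_elems)
    full = all(((tau[x], tau[y]) in tgt_rel) == ((x, y) in src_rel)
               for x in src_elems for y in src_elems)
    # Originary: every direct predecessor of an image element lies in the
    # image; by induction along chains, the image contains its own past.
    originary = all(a in image for (a, b) in tgt_rel if b in image)
    return injective and proper and full and originary

def equivalent(tau1, tau2, src_elems, tgt_elems, tgt_rel):
    """True if some automorphism of the target maps the image of tau1
    onto the image of tau2 (brute force over permutations)."""
    tgt = list(tgt_elems)
    image1 = {tau1[x] for x in src_elems}
    image2 = {tau2[x] for x in src_elems}
    for perm in permutations(tgt):
        beta = dict(zip(tgt, perm))
        is_auto = all((beta[a], beta[b]) in tgt_rel for (a, b) in tgt_rel)
        if is_auto and {beta[v] for v in image1} == image2:
            return True
    return False

# Hypothetical cobase and target.
src_elems, src_rel = {"x", "y"}, {("x", "y")}
tgt_elems, tgt_rel = {"x", "y", "y2"}, {("x", "y"), ("x", "y2")}

tau = {"x": "x", "y": "y"}
tau_prime = {"x": "x", "y": "y2"}
not_full = {"x": "y", "y": "x"}  # reverses the relation, so it is not full
```

A map whose image omits a direct predecessor fails the originary condition; for example, sending a one-element source to the element y of the target leaves the predecessor x outside the image.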

The subscripts i and t in the expression h : Di ⇒ Dt stand for “initial” and “terminal”.
This notation is different from the notation for arbitrary transitions in Definition 8, since Sections 3 and 4
feature auxiliary transitions related to h that do not belong to the equivalence class defining h.
The proper, full, and originary conditions in Definition 9 allow the unadorned term “co-relative history”
to mean something more general, but co-relative histories in this paper always satisfy these conditions,
except in the context of superset microstates in Definition 15, where they need not be full.
Each transition in the equivalence class defining h is said to represent h. The “double arrow” notation ⇒
emphasizes that h may be represented by more than one transition, but often h is uniquely represented
due to the rigidity of typical “large” directed sets [37], which plays an important role in Sections 3 and 4.
It is useful to think of h as “adding elements and relations to Di to produce Dt ”, but one cannot
always identify specific elements and relations as “the ones added” since h is an equivalence class.
Multiple inequivalent transitions, and hence multiple co-relative histories, may exist between a given
pair of directed sets, even a pair differing by a single element. This implies multidirected structure at
the quantum level.
Choosing a suitable family K of directed sets, together with a suitable family H of co-relative histories
between pairs of members of K, one obtains a structure S called a kinematic scheme, which serves as a history
configuration space. The word “kinematic” means that S encodes possible behavior, without identifying
what specific behavior is determined or favored under specific conditions. The latter question involves
dynamics. As an analogy, relativistic kinematics describes possible particle paths, e.g., ruling out
spacelike motion, but the paths of specific particles depend on dynamical information. S possesses
natural multidirected structure induced by H, elaborated below. Sequences of co-relative histories in
S define evolutionary pathways called co-relative kinematics, abstractly analogous to particle paths in
conventional path summation. The conditions that S must satisfy to qualify as a kinematic scheme are
that H must include enough co-relative histories to describe the evolution of any history in K, and
K must contain all “ancestors” of its members. These conditions are made precise in Definition 10,
adapted from Definitions 7.4.1 and 7.4.7 of [14]. An additional desirable property, called the generational
property, allows each co-relative history in H to be “factored into generations”. However, this property
is not studied in this paper, and it is preferable to omit it from the definition.

Deﬁnition 10. A kinematic scheme is a pair S = (K, H), where K is a class of directed sets, and H is a
class of co-relative histories between pairs of members of K satisfying the following properties:

1. Accessibility: If D is in K, then there exists a sequence of co-relative histories in H terminating at D.

2. Hereditary property: K is closed under the formation of proper, full, originary subobjects.

Figure 7, adapted from Figure 7.5.2 of [14], illustrates a portion of a kinematic scheme SPS called
the positive sequential kinematic scheme, which serves as a source of examples throughout the remainder
of the paper. SPS is modeled after a kinematic scheme of finite causal sets appearing implicitly in
Sorkin and Rideout’s theory of sequential growth dynamics [38]. Similar structures appear elsewhere in
the work of Sorkin [39], Isham [40–43], Markopoulou [7], and others. The objects illustrated inside
each large open node in the figure are members of the class K of directed sets of SPS , which is the
class of finite acyclic directed sets. This class is more restrictive than the class specified by Definition 4,
which requires only countability. The edges connecting the large open nodes represent members of the
class H of co-relative histories of SPS , which are those that “add a single new element to their targets”.
This means that if h : Di ⇒ Dt belongs to H, and if τ : Di → Dt is a transition representing h,
then the complement of τ ( Di ) in Dt is a singleton. The gray-colored nodes illustrate how the set of four
co-relative histories appearing in Figure 6 embeds into SPS . The thickened edges illustrate a co-relative
kinematics in SPS , whereby the empty set 1 evolves into a directed set D with four elements and
three relations. The specific transition or transitions representing each co-relative history illustrated
in the figure may be inferred in a straightforward manner from the directed structures of its cobase
and target; for example, there is a unique transition τ representing the final co-relative history in the
co-relative kinematics terminating at D. The “new element added by τ”, i.e., the complement of the
image of τ, is the top-right element indicated by the arrow.

Figure 7. Positive sequential kinematic scheme SPS (first four generations); gray nodes show the four
co-relative histories from Figure 6; thickened edges illustrate a co-relative kinematics.

Given a kinematic scheme S = (K, H), it is useful to associate an abstract multidirected set
M(S) with S, where each member D of K is represented by an element x ( D ) of M(S), and where
each member h : Di ⇒ Dt of H is represented by a relation r (h) from x ( Di ) to x ( Dt ) in M(S).

M(S) is called the underlying multidirected set of S. Chains in M(S) represent co-relative kinematics
in S. The left-hand diagram in Figure 8, adapted from Figure 7.5.4 of [14], illustrates a portion of the
underlying multidirected set M(SPS ) of the positive sequential kinematic scheme SPS . The chain from
x (1) to x ( D ) represents the co-relative kinematics from 1 to D illustrated in Figure 7. This diagram
illustrates the permeability problem in the context of kinematic schemes; the three nodes connected by
the auxiliary dashed lines represent a maximal antichain in M(SPS ), which is permeated by the chain
from x (1) to x ( D ). It is therefore necessary to work in relation space to properly formulate the theory
of path summation. The right-hand diagram in Figure 8 illustrates part of the relation space R(M(SPS )).
The dark square nodes represent a maximal antichain, which is impermeable by Theorem 7.

Figure 8. Left: portion of M(SPS ) illustrating the permeability problem, with the chain from x (1) to x ( D );
right: corresponding portion of R(M(SPS )) showing an impermeable maximal antichain.

While one could choose to perform path summation over a particular acyclic directed set,
the resulting theory would be background dependent, and hence unsuitable for quantum gravity.
Path summation in the background independent context involves summing phases Θ(γ) associated
with co-relative kinematics γ in a kinematic scheme S. As explained in Section 1.4, these phases are
analogous to Feynman’s phases $e^{\frac{i}{\hbar} S(\gamma)}$. Under modest assumptions, Θ(γ) is a product of phases θ (r )
of individual relations representing individual co-relative histories. The relation function θ determines
a specific form for Equation (4),
$$\psi^-_{R;\theta}(r) = \theta(r) \sum_{r^- \prec r} \psi^-_{R;\theta}(r^-),$$

reproduced here for convenience. The setup for deriving this equation is illustrated in Figure 9,
adapted from Figure 6.9.2 of [14], where the derivation is carried out in detail. The auxiliary shading
represents a finite subobject R of the relation space R(M(S)). A choice of maximal antichain σ
partitions R into a disjoint union $R = R^- \cup \sigma \cup R^+$, where σ represents a choice of “present”, and $R^\pm$
are the corresponding past and future regions. The function $\psi^-_{R;\theta}$ is called the past state function,
because it depends on all chains in $R^-$, which terminate at elements of σ. Here, one such chain
γ is shown, terminating at an element r ∈ σ, with penultimate element $r^-$. This chain may be
factored into a concatenation product $\gamma^- \sqcup r$, where $\gamma^-$ is the subchain of γ terminating at $r^-$,
and this factorization induces a factorization $\Theta(\gamma) = \Theta(\gamma^-)\theta(r)$ of phases. The value $\psi^-_{R;\theta}(r)$
is defined to be the sum $\sum_\gamma \Theta(\gamma)$ of the phases of all maximal chains γ in $R^-$ terminating at r.
Mathematically, Equation (4) merely organizes the factorizations $\Theta(\gamma) = \Theta(\gamma^-)\theta(r)$ for all such γ.
These chains represent co-relative kinematics in the corresponding region of S that lead to the target
history of the co-relative history represented by r. Generalizing to the case of infinite R raises
questions of convergence. From an abstract perspective, the function $\psi^-_{R;\theta}$ plays a role similar to
that of Feynman’s “wave function” ([1], Section 5), except that no limiting process is necessary to
define it, and no normalization constant is required. However, the structural context in which $\psi^-_{R;\theta}$

arises is much different than in Feynman’s original non-relativistic background dependent setup,
where evolutionary pathways are represented by paths in a fixed copy of $\mathbb{R}^4$. In the present discrete
background independent context, each step along a chain represents a co-relative history, interpreted as
the evolution of one spacetime into another. Equation (4) describes how the value of $\psi^-_{R;\theta}$ changes when
the evolutionary pathways involved are extended by one additional relation r, which corresponds
to multiplying the associated phases by θ (r ). Abstractly, it arises in almost the same manner as the
ordinary Schrödinger equation under Feynman’s derivation ([1], Section 6), in which segmented
paths approximating continuous evolutionary processes are extended via a time-stepping method.
For Equation (4), however, no approximation is involved, so no limiting process is necessary.
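The recursion in Equation (4) can be checked on a small example. The following Python sketch assumes a hypothetical five-element past region, encoded by a dictionary `PREDS` of direct predecessors, and a hypothetical relation function `theta` with arbitrarily chosen angles; neither is taken from the paper. It also assumes the base case that the past state function equals θ(r) when r has no predecessors, so that a one-element chain contributes its own phase. Under these assumptions, the recursive value agrees with the direct sum of Θ(γ) over maximal chains.

```python
import cmath
from functools import lru_cache

# Hypothetical toy past region: each key is a relation r, and each value
# lists its direct predecessors r^- (illustrative data only).
PREDS = {
    "a": [],
    "b": [],
    "c": ["a", "b"],
    "d": ["b"],
    "e": ["c", "d"],
}

# Hypothetical angles defining an S^1-valued relation function theta.
ANGLES = {"a": 0.3, "b": 1.1, "c": -0.7, "d": 2.0, "e": 0.5}


def theta(r):
    """Phase assigned to the single relation r (a point on the unit circle)."""
    return cmath.exp(1j * ANGLES[r])


@lru_cache(maxsize=None)
def psi(r):
    """Past state function computed via the recursion of Equation (4)."""
    preds = PREDS[r]
    if not preds:
        # Assumed base case: a minimal relation contributes its own phase.
        return theta(r)
    return theta(r) * sum(psi(p) for p in preds)


def maximal_chains(r):
    """All chains from a minimal relation up to r, as lists of relations."""
    preds = PREDS[r]
    if not preds:
        return [[r]]
    return [chain + [r] for p in preds for chain in maximal_chains(p)]


def psi_by_enumeration(r):
    """Sum over maximal chains of the product of phases along each chain."""
    total = 0
    for chain in maximal_chains(r):
        phase = 1
        for s in chain:
            phase *= theta(s)
        total += phase
    return total
```

In this toy region there are three maximal chains terminating at `"e"`, and `psi("e")` matches `psi_by_enumeration("e")` up to floating-point rounding. The recursion visits each relation once, whereas the enumeration grows with the number of chains.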

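Figure 9. Setup for deriving Equation (4): $\gamma = \gamma^- \sqcup r$ and $\Theta(\gamma) = \Theta(\gamma^-)\theta(r)$. Diagram labels:
R (all nodes in shaded region) within R(M(S)); maximal antichain σ (“present”); past region $R^-$
containing the subchain $\gamma^-$ and the element $r^-$; future region $R^+$.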

A few further remarks regarding Equation (4) may be helpful. First, it is illuminating to
spell out how the equation can describe quantum-theoretic behavior specifically. This depends partly
on the general properties of path summation, and partly on the choice of relation function θ that
determines the phase associated with each evolutionary pathway. Like virtually any formula involving
path summation over a history configuration space, Equation (4) combines contributions from many
distinct processes involving many distinct histories. This is a familiar feature of quantum-theoretic
superposition, but is not unique to the quantum realm. For example, classical stochastic models such
as Sorkin and Rideout’s theory of sequential growth dynamics [38] organize information in a similar
manner at an abstract level, but are decidedly non-quantum. The classical nature of the latter theory
arises from the assignment of real probabilities, rather than quantum amplitudes, to evolutionary
pathways. Similarly, Feynman’s derivation [1] could just as easily be used to produce a continuous
classical stochastic model, with real probabilities assigned to subspaces of a path space. What leads
to Schrödinger’s equation specifically under Feynman’s setup is Feynman’s choice of phase map, which
produces the type of interference effects necessary to describe quantum-theoretic behavior. Similar
considerations apply in the discrete causal context. For different choices of θ, Equation (4) could be
used to describe a classical stochastic model, or a quantum-theoretic model, or neither. This highlights
why the choice of phase map is so crucial to the theory. As described in Section 1.4, the most
obvious choice of target object for a quantum-theoretic phase map is the choice made by Feynman,
namely, $S^1$. Alternative choices can be interesting, but this paper focuses on $S^1$-valued phase maps
almost exclusively. Second, due to the quantum-gravity-related focus of this paper, it is worth
noting that Equation (4) shares certain similarities with the Wheeler–DeWitt equation, but these are not
explored here. Third, allowing cycles complicates the picture, and this generalization is not considered
here. Fourth, many different kinematic schemes typically share a given class K of directed sets, and
different schemes offer different perspectives regarding the evolution of families of histories. Physical

predictions must be independent of these choices, and this is expressed by saying that the theory must
be covariant. In practical terms, this means that if one changes S, then one generally must change θ to
compensate. This paper mostly ignores covariance issues.
Figure 10 illustrates a sequential growth process in $S_{PS}$, in which a history $D_7$ with seven elements
evolves into a history $D_{11}$ with eleven elements via a sequence of co-relative histories labeled $h_7$ to $h_{10}$.
These co-relative histories are represented by relations $r(h_7)$ to $r(h_{10})$ in $R(M(S_{PS}))$, abbreviated by $r_7$
to $r_{10}$. This growth process serves as a source of examples in Sections 3 and 4. Each pair of consecutive
histories in Figure 10 encodes the same type of information associated with a single square node
in Figure 9, since these nodes represent co-relative histories. Given such a process, the goal is to
define phases measuring the “favorabilities” of each co-relative history. The black nodes and edges
represent the first-degree terminal states $T^1(D_7)$ to $T^1(D_{11})$ of the histories $D_7$ to $D_{11}$, which encode the
first-order information in each history, i.e., the “physically new” information, consisting of only the
most recent causes and effects. First-degree terminal states are featured repeatedly in Chapters 7 and 8
of [14], where they are described via terminology such as “structural increments” or “generations”.
By definition, only one element in each history is “new” from the perspective of the sequential growth
process itself; these new elements are indicated by arrows. However, this process is merely one way of
describing the evolution of $D_{11}$, and therefore involves arbitrary extraphysical choices regarding the
order of appearance of elements. Terminal states $T^n(D)$ of degree n are introduced in Definition 13.
For n > 1, there is a distinction between degree and order; for example, second-degree terminal states
may encode information of arbitrarily high order. It is convenient to use the abbreviation $\Delta_k$ for $T^1(D_k)$,
which highlights the fact that $\Delta_k$ is a “structural increment” of $D_k$. To avoid clutter, only $\Delta_8$ is labeled
in the figure. The symbol Δ is used in later sections to denote states of arbitrary degree.

Figure 10. Sequence of co-relative histories $h_7$ to $h_{10}$ in $S_{PS}$ relating the histories $D_7$ to $D_{11}$; terminal
states indicated by dark nodes and edges (the first-degree terminal state $\Delta_8$ of $D_8$ is labeled);
“new elements” added by each co-relative history indicated by arrows. Each pair of consecutive
histories is represented by a square node in Figure 9.

First-degree terminal states are analogous to “present states” in conventional physics, involving data
up to first order, such as position and velocity. Familiar notions of entropy are associated with such
“present states”, not with entire histories. In particular, the second law of thermodynamics compares
the entropy of a “present state” to that of “previous states”; it does not involve a “higher-dimensional
entropy” associated with the entire history leading up to the present state. The evolution of physical
systems does not seem to be sensitive to details of the distant past; otherwise, one could not perform
reliable experiments without knowing the exact history of each piece of experimental equipment.
More formally, Lagrangians are typically assumed to depend on information only up to first order.
The form of Equation (4) imposes an analogous assumption at the level of kinematic schemes, since
the relation function θ is analogous to a Lagrangian on S. As discussed in Section 3.3, higher-order

information at the level of individual histories is not a priori irrelevant in discrete causal theory, but
contributions from the distant past likely play a negligible dynamical role. Hence, the simplest “serious”
entropic phase maps are defined in terms of first-degree terminal states, and more-sophisticated phase
maps may be regarded as refinements of such maps.

3. Entropy and the Second Law of Thermodynamics

3.1. Entropy
Entropy, in the statistical sense pioneered by Boltzmann, may be understood very generally in
terms of the distinguishability of objects described at two different levels of detail, one regarded
as fine, and the other regarded as coarse. The prototypical application of this idea occurs in
statistical thermodynamics, in which the fine level of detail for a system, such as a fixed quantity of ideal
gas, is described in terms of microscopic data, such as the positions and momenta of individual molecules,
while the coarse level of detail is described in terms of macroscopic data, such as pressure, volume,
and temperature. Each possible choice of macroscopic data defines a coarse description of the system,
called a macrostate, while each possible choice of microscopic data defines a fine description, called a
microstate. Each macrostate generally corresponds to many different microstates, since many different
choices of microscopic data may be approximated by identical macroscopic data. The entropy of
a macrostate measures the quantity of corresponding microstates in a manner that is additive for
composite systems. In more general terms, objects distinguishable at some fine level of detail may be
indistinguishable at some coarser level, and a notion of entropy may be associated with the two levels
to quantify this difference in distinguishability. In particular, generalizations of Boltzmann entropy
such as Gibbs, Shannon, and Rényi entropies fall under the same conceptual umbrella. Measures of
entropy familiar in ordinary quantum theory, such as von Neumann entropy, are less relevant, since
they depend on specific algebraic apparatus less general than the path summation approach.
In statistical thermodynamics, the state space for a system is an abstract space parameterizing the set
of possible microstates of the system for some choice of fine detail. A choice of coarse detail partitions
state space into a family of subsets representing the possible macrostates of the system, where the points
of each subset parameterize the microstates associated with the corresponding macrostate. Such a
partition is called a coarse-graining of the state space. The left-hand diagram in Figure 11 illustrates such
a coarse-graining, where the cells representing macrostates are separated by solid lines. Dotted lines and
labels are explained below. Such a planar diagram could be interpreted literally as encoding the possible
position and momentum of a single particle moving in one real dimension, but all such diagrams in
this paper are schematic. Conventional state spaces are real manifolds, and therefore exhibit notions
of proximity, volume, and other topological and metric structure. However, their dimensions are
typically quite large, and this implies properties that are not well-represented by planar diagrams;
for example, each region typically has very many neighbors. Even in 24-dimensional Euclidean
space, each sphere in the regular packing induced by the Leech lattice is tangent to 196,560 neighbors;
one may imagine the situation in $10^{24}$-dimensional space. Abstract metric-related ideas remain useful
for describing the properties of discrete causal state spaces, but planar diagrams only roughly represent
these notions.

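Figure 11. Left: partition of state space into coarse cells (solid lines) and fine cells (dotted lines), with a
coarse cell V and one of its fine cells $W^k$ labeled; middle: conventional state spaces exhibit regions of
very different sizes; right: state space inducing an “inverse second law of thermodynamics”.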

Generalizing the thermodynamic picture, any set S of objects may be partitioned into a family
of subsets P, where the objects belonging to each subset are regarded as equivalent at a coarse level
of detail. More generally still, one may consider a strictly partially ordered family Π := { Pα }α∈ A of
partitions Pα of S for some index set A, where by definition Pα ≺ P β if Pα ≠ P β and if every member
of Pα is a union of members of P β . In this case, P β is called a refinement of Pα . Here, ≺ does not
represent causal structure, and superscript indices are used to distinguish information filtering from
mere enumeration. One may define equivalence relations ∼α on S for each α in A, where s ∼α s′ if s
and s′ belong to the same subset under Pα . If Pα ≺ P β , then Pα induces a quotient partition Pαβ of the
quotient set S β := S/ ∼ β in an obvious way. Any such choice of Pα and P β may be used to define
notions of coarse and fine detail. Returning to Figure 11 in this more abstract setting, the large regions
bordered by solid lines in the left-hand diagram represent a choice Pα of coarse detail for a set S,
while the small regions bordered by dotted lines represent a choice P β of fine detail. Here, Pα and
P β each partition S into subsets of roughly equal size, but a typical coarse-graining in conventional
thermodynamics exhibits vast differences in the sizes of regions, and correlations exist involving
proximity and size. The middle diagram in Figure 11 illustrates such a coarse-graining. As emphasized
by Penrose [44], such details are crucial for understanding whether a typical system can be expected to
exhibit a systematic increase in entropy. For example, the right-hand diagram in Figure 11 illustrates a
state space that induces an “inverse second law of thermodynamics”, in the sense that a typical path in
this space moves from larger to smaller cells. If Pα ≺ P β , and if each member of Pα is a finite union of
members of P β , then one may define multiplicities and entropies via counting: if V ⊂ S is a member
of Pα , and if $V = \bigcup_{k=1}^{K} W^k$ for members $W^k$ of P β , then the multiplicity μαβ (V ) of V is K, and the
entropy eαβ (V ) of V is log K. The choice of notation for μαβ and eαβ is intended to emphasize the
relative viewpoint: multiplicities and entropies are properly understood in terms of natural relationships
between levels of detail, not in terms of any specific level of detail. For the set V shown in the left-hand
diagram in Figure 11, the entropy is eαβ (V ) = log 7, since P β subdivides V into seven regions. In more
general settings, it may be necessary to measure the sizes of members of Pαβ via some measure μαβ
other than the counting measure.

Definition 11. An entropy system (S, Π, μ) consists of a set S, a set Π := { Pα }α∈ A of partitions Pα of S for
some index set A, strictly partially ordered by refinement, and a family μ of measures μαβ on the quotient sets S β ,
one for each relation Pα ≺ P β in Π. Each such relation induces an entropy quadruple (S, Pα , P β , μαβ ).
The entropy of a member V of Pα is eαβ (V ) := log μαβ (V β ), where V β ⊂ S β is the image of V under the
quotient map S → S β , and where log ∞ is understood to mean ∞.

It is often convenient to denote an entropy quadruple by just S, or to write S = (S, Pα , P β , μαβ )

to indicate that a set S is equipped with such a structure. The functions μαβ are taken to be measures
here for simplicity, but the situation could be generalized further. In particular, the target object of

μαβ need only be a totally ordered set. One may also abstain from using logarithms to “rescale” μαβ .
However, it suffices here to consider only the counting measure on a finite set or the Lebesgue measure
on a finite-dimensional real manifold, and logarithms are useful for producing quantities that are
additive for composite systems. The symbol “e” is used for entropy instead of the familiar “h”
because “h” is used here to represent co-relative histories. Figure 12 illustrates a simple entropy system
(S, Π, μ) whose underlying set S is the unit interval [0, 1] in R. The set Π of partitions of S has members
P0 , P1 , P2 , and P3 , which subdivide S into segments of equal lengths 1, 1/2, 1/3, and 1/6, respectively.
P0 is the trivial partition, under which S represents a single macrostate. The strict partial order ≺
on Π consists of five individual relations P0 ≺ P1 , P0 ≺ P2 , P0 ≺ P3 , P1 ≺ P3 , and P2 ≺ P3 ,
each of which induces an entropy quadruple. The quotient sets S0 , S1 , S2 , and S3 have 1, 2, 3 and
6 elements, respectively. There are two nontrivial quotient partitions, P13 and P23 , which subdivide
the quotient set S3 into equal-sized subsets with 3 and 2 elements, respectively. Multiplicities and
entropies of some representative subsets of S with respect to different entropy quadruples are also listed.
For example, the subset U = (1/2, 1] of S has measure μ13 (U ) = 3 and entropy e13 (U ) = log 3 with
respect to the entropy quadruple (S, P1 , P3 , μ13 ).
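The entropy system on the unit interval can be reproduced by counting. The following Python sketch is an illustration under stated assumptions: the six cells of the finest partition are labelled 0 to 5 (an encoding chosen here, not in the paper), each partition is stored as a list of sets of these labels, and the helper names `precedes`, `multiplicity`, and `entropy` are hypothetical. The counting measure on the quotient set is used throughout.

```python
import math

# Cells of the finest partition P3 are labelled 0..5 (sixths of [0, 1]).
# Each partition is a list of frozensets of these labels.
PARTITIONS = {
    0: [frozenset(range(6))],
    1: [frozenset({0, 1, 2}), frozenset({3, 4, 5})],
    2: [frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5})],
    3: [frozenset({i}) for i in range(6)],
}


def precedes(coarse, fine):
    """True if the partitions differ and every coarse cell is a union of fine cells."""
    if set(coarse) == set(fine):
        return False
    return all(
        cell == frozenset().union(*[w for w in fine if w <= cell])
        for cell in coarse
    )


def multiplicity(cell, coarse, fine):
    """Number of fine cells contained in a coarse cell (counting measure)."""
    if cell not in coarse or not precedes(coarse, fine):
        raise ValueError("multiplicity undefined for this pair of partitions")
    return sum(1 for w in fine if w <= cell)


def entropy(cell, coarse, fine):
    """Logarithm of the multiplicity of a coarse cell."""
    return math.log(multiplicity(cell, coarse, fine))


# The strict partial order by refinement, as pairs of partition indices.
RELATIONS = sorted(
    (a, b)
    for a in PARTITIONS
    for b in PARTITIONS
    if precedes(PARTITIONS[a], PARTITIONS[b])
)
```

With this encoding, the upper half of the interval corresponds to the cell `{3, 4, 5}` of the second partition in the list, and its multiplicity with respect to the finest partition is 3. Requesting a multiplicity for the pair indexed 1 and 2 raises an error, since neither partition refines the other.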

Quadruples, measures, and entropies shown in the figure:

(S, P0 , P1 , μ01 ): μ01 (S) = 2, e01 (S) = log 2
(S, P0 , P2 , μ02 ): μ02 (S) = 3, e02 (S) = log 3
(S, P0 , P3 , μ03 ): μ03 (S) = 6, e03 (S) = log 6
(S, P1 , P3 , μ13 ): μ13 (U ) = 3, e13 (U ) = log 3
(S, P2 , P3 , μ23 ): μ23 (V ) = 2, e23 (V ) = log 2
μ12 , e12 undefined

Figure 12. A simple entropy system on the unit interval S = [0, 1] ⊂ R, showing the partitions P0 to P3
(with U a member of P1 and V a member of P2 ), the quotient sets S0 to S3 , the quotient partitions
P13 and P23 , and the partial order on the partitions.

The motivation for adopting such a general viewpoint is that multiple “levels” of entropy are
evident in discrete causal theory. An important example involves the nth-degree terminal states
$T^n(D)$ mentioned in Section 2.4 and formally introduced in Definition 13. Given two directed sets D
and D′, it may be the case that $T^n(D)$ and $T^n(D')$ are isomorphic, while $T^{n+1}(D)$ and $T^{n+1}(D')$ differ.
In this case, D and D′ are indistinguishable at the level of detail specified by the index value n,
but become distinguishable at the finer level of detail specified by the index value n + 1. On the level
of individual elements, two elements x and y belonging to a subobject Δ of a directed set D may be
“locally indistinguishable”, in the sense that they are interchanged by an automorphism of Δ, but may
be “globally distinguishable”, in the sense that no such automorphism extends to an automorphism
of D. More generally, one may consider chains of subobjects Δ = Δ1 ⊂ Δ2 ⊂ ... ⊂ Δn ⊂ D containing x
and y, some of which possess automorphism groups interchanging x and y, and some of which do not.
Of obvious interest is the case in which Δ1 is a low-order terminal state of a history, and Δn for n > 1
are progressive “thickenings” of Δ.
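The distinction between local and global indistinguishability can be made concrete by brute force on very small directed sets. The following Python sketch is illustrative only: the element names, the example sets, and the helper names `automorphisms` and `interchangeable` are hypothetical, and directed sets are encoded simply as a set of nodes with a set of ordered pairs. Enumerating all permutations is feasible only for a handful of elements.

```python
from itertools import permutations


def automorphisms(nodes, edges):
    """All bijections of `nodes` that map the edge set onto itself."""
    nodes = sorted(nodes)
    edge_set = set(edges)
    result = []
    for image in permutations(nodes):
        f = dict(zip(nodes, image))
        if {(f[u], f[v]) for u, v in edge_set} == edge_set:
            result.append(f)
    return result


def interchangeable(nodes, edges, x, y):
    """True if some automorphism swaps x and y."""
    return any(f[x] == y and f[y] == x for f in automorphisms(nodes, edges))


# Hypothetical subobject: a single element "a" directly preceding "x" and "y".
SUB_NODES = {"a", "x", "y"}
SUB_EDGES = {("a", "x"), ("a", "y")}

# Hypothetical larger directed set: the same structure plus a successor of "y".
FULL_NODES = SUB_NODES | {"z"}
FULL_EDGES = SUB_EDGES | {("y", "z")}
```

In this example, `x` and `y` are interchanged by an automorphism of the subobject, but the extra relation from `y` to `z` leaves the larger set with only the identity automorphism, so the two elements become distinguishable.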
While entropy is defined by associating entire families of “fine” states with individual
“coarse” states, it is sometimes interesting to compare the amount of detail encoded by specific pairs
of states. It is then natural to relate such “local comparisons” to the “global comparisons” leading to

entropy systems. In this context, one need not distinguish a priori between macrostates and microstates;
states are defined individually by specifying varying degrees and types of information about an object
or system, and are then compared and categorized. Given two such states Δ and Δ′, it is sometimes
possible to unambiguously identify Δ′ as more detailed than Δ, or vice versa. In other cases, Δ and
Δ′ are incomparable, in the sense that Δ contains more of one type of information, while Δ′ contains
more of another. In this setting, one may recognize a natural partial order ≺ on the family of states
under consideration, where Δ ≺ Δ′ if and only if Δ′ is unambiguously more detailed than Δ. This type
of partial order is different from the partial orders on sets of partitions in Definition 11, but the two
types of structure are related. For example, given an entropy quadruple (S, Pα , P β , μαβ ), the set Pα ∪ P β
is a subset of the power set P(S) of all subsets of S. The relation Pα ≺ P β means that every member V
of Pα is a union of members W of P β . One may define an induced relation on Pα ∪ P β , also denoted
by ≺, where V ≺ W if and only if V is a proper superset of W. Hence, a single relation between
two partitions induces a partial order on a corresponding family of subsets. This partial order is of a
special type, with maximal chain length 1, because its only relations are those of the form V ≺ W for
V ∈ Pα and W ∈ P β such that W ⊂ V. However, one may easily define partially ordered sets with
longer chains by considering sequences of partitions ... ≺ Pn ≺ Pn+1 ≺ ...
Working in the opposite direction, one may begin with a partial order ≺ on an arbitrary set Σ.
Here, Σ is viewed as an abstract analogue of a family of states encoding various types and quantities
of detail, while ≺ is viewed as an abstract analogue of the partial order relating pairs of states Δ and Δ′
whenever Δ′ is unambiguously more detailed than Δ. One may partition Σ into a family of antichains
σ with respect to ≺. There are generally many different choices of partition, each analogous to a frame
of reference in relativity. In the entropic setting, elements of a given antichain σ are viewed as abstract
analogues of states sharing an equal level of detail. In the simplest case, the antichains σ “foliate” Σ,
in the sense that each nonextremal antichain σk has an unambiguous maximal predecessor σk−1 and
minimal successor σk+1 . More generally, the antichains σ form a partially ordered family. In either case,
the partition defines an atomic decomposition of Σ with respect to ≺, an idea revisited in a different
context in Section 3.3. In many cases, detail may be quantified in a variety of different ways, and this
leads to the consideration of families {≺α }α∈ A of partial orders on Σ. Such families are themselves
partially ordered via the order-theoretic version of refinement, under which ≺α precedes ≺ β if and only
if Δ ≺ β Δ′ whenever Δ ≺α Δ′. An antichain with respect to ≺ β is then automatically an antichain with
respect to ≺α , so any partition of Σ induced by ≺ β refines at least one such partition induced by ≺α .
In this manner, the partial ordering by refinement of the family of partitions induced by {≺α }α∈ A
respects the partial ordering on {≺α }α∈ A itself. Hence, entropy systems defined in terms of such
partitions automatically respect the order-theoretic structure of Σ.
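One concrete way to partition a finite partially ordered set into antichains is by height, i.e., by the length of the longest chain below each element. The following Python sketch shows this single choice among the many partitions mentioned above; the function names and the example order are hypothetical, and the order is assumed to be given as a transitively closed set of pairs `(y, x)` meaning that y strictly precedes x.

```python
def height_partition(elements, less_than):
    """Partition a finite strict partial order into antichains by height."""
    height = {}

    def h(x):
        # Height 0 for minimal elements; otherwise one more than the
        # largest height among the elements strictly below x.
        if x not in height:
            below = [y for y in elements if (y, x) in less_than]
            height[x] = 1 + max((h(y) for y in below), default=-1)
        return height[x]

    levels = {}
    for x in elements:
        levels.setdefault(h(x), set()).add(x)
    return [levels[k] for k in sorted(levels)]


def is_antichain(cell, less_than):
    """True if no two members of `cell` are related."""
    return not any((x, y) in less_than for x in cell for y in cell)


# Hypothetical example order, transitively closed.
ELEMENTS = ["a", "b", "c", "d", "e"]
LESS_THAN = {
    ("a", "c"), ("b", "c"), ("b", "d"),
    ("c", "e"), ("d", "e"), ("a", "e"), ("b", "e"),
}
```

For this example the levels are `{a, b}`, `{c, d}`, and `{e}`, each of which is an antichain. Two elements of equal height cannot be related, because a relation between them would force the later element to have strictly greater height.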

3.2. The Second Law

The familiar intuition regarding the second law of thermodynamics is that “entropy increases
with time”. Generalizing this idea to apply to the broad framework of entropy systems introduced
in Section 3.1 requires suitable analogues of “time” and “increase”. Time evolution is conventionally
represented by a directed curve in state space, and in this context the second law says that motion along
such a curve tends to pass from smaller to larger cells in a specified coarse-graining. The left-hand
diagram in Figure 13 illustrates such a curve γ. A typical curve originating in one of the two shaded
areas is likely to exhibit a systematic increase in entropy, at least for early times, since such curves begin
in small cells whose borders are dominated by larger cells. A typical curve originating elsewhere in the
state space does not exhibit such an increase in entropy. This illustrates the fact that both the structure
of state space and the region of origin of the curve describing the system of interest are relevant
to the existence of a recognizable second law. In the cosmology of the early universe, for example,
the question of why specific measures of entropy were initially relatively low is just as important as
the question of why entropy increased thereafter [44].

Figure 13. Left: curve γ in state space along which entropy increases; right: map γ from a linearly
ordered set L into an entropy quadruple, showing no discernible second law.

The abstract analogue of a directed curve in state space is a map γ from a linearly ordered set L
into an entropy quadruple S = (S, Pα , P β , μαβ ). Such a map is illustrated in the right-hand diagram
in Figure 13. Here, L is drawn to suggest an interval in R, but in more general settings L may be
a non-continuous object such as an interval in Q, a discrete object such as an interval in Z, a finite
object such as the set {0, ..., N }, or even a transfinite object, such as the long line. The notion of an
increasing function requires similar generalization beyond the familiar setting of real analysis. Even in
conventional thermodynamics, the strict definition of an increasing function must be relaxed, since the
second law is understood not as a prescription that entropy must increase over any time interval, but as
a description of the fact that entropy does increase with overwhelming likelihood over sufficiently
long time intervals. The map γ in the figure passes through cells of multiplicities 5, 2, 3, 7, 6, 6,
7 (again), 4, 2, 4, and 6 (again). Hence, the associated system does not obey a discernible version of the
second law. In the general case, it seems preferable to describe a variety of ways to define a version of
the second law for such a system than to isolate a particular choice via formal definition. An individual
map γ from a totally ordered set L into an entropy quadruple S = (S, Pα , P β , μαβ ) obeys a strict
version of the second law if for every pair of subsets V and V′ of S belonging to Pα , and for every
pair of elements ℓ and ℓ′ in L with ℓ < ℓ′ such that γ(ℓ) ∈ V and γ(ℓ′) ∈ V′, it is true that μαβ (V ) ≤ μαβ (V′ ).
Intuitively, this means that γ never passes from a large cell into a smaller cell. There are various
ways to relax this strict description. If L possesses a metric, then one may specify a rule relating the
size of the interval (ℓ, ℓ′) to the probability that μαβ (V ) ≤ μαβ (V′ ). If the target object of μαβ also
possesses a metric, then one may define something like a derivative, i.e., a rule relating the sizes
of the intervals (μαβ (V ), μαβ (V′ )) to the sizes of the corresponding intervals (ℓ, ℓ′). More generally,
a region U of S obeys a version of the second law if a typical map γ : L → S originating in U obeys
an individual version of the second law. The word “typical” may be made precise in terms of a
generalized measure on the space of maps γ. It is sometimes necessary to restrict attention to special
maps to obtain a clear pattern; for example, some entropy quadruples exhibit entropy increases along
typical “short curves”, but not along typical “long curves”. In particular, some cosmological models
posit a reversal of the second law in the distant past and/or future.
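For a map from a finite linearly ordered set, the strict version of the second law reduces to a simple check on the sequence of multiplicities visited. The following Python sketch applies that check to the sequence quoted above for the right-hand diagram in Figure 13. The function names are hypothetical, and the second function is only one possible relaxed diagnostic chosen here for illustration, not a definition from the paper.

```python
# Multiplicities of the cells visited by the map in Figure 13, in order.
MULTIPLICITIES = [5, 2, 3, 7, 6, 6, 7, 4, 2, 4, 6]


def obeys_strict_second_law(multiplicities):
    """True if the multiplicity never decreases along the sequence.

    Comparing consecutive entries is enough for a finite sequence,
    because the relation <= is transitive.
    """
    return all(a <= b for a, b in zip(multiplicities, multiplicities[1:]))


def fraction_nondecreasing_steps(multiplicities):
    """Share of consecutive steps on which the multiplicity does not decrease."""
    steps = list(zip(multiplicities, multiplicities[1:]))
    if not steps:
        return 1.0
    return sum(1 for a, b in steps if a <= b) / len(steps)
```

The quoted sequence fails the strict check, and only six of its ten steps are nondecreasing, which is consistent with the absence of a discernible second law for this map.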

3.3. Discrete Causal State Spaces

In statistical thermodynamics, microstates are determined by information up to first order, e.g.,
by positions and momenta of individual molecules. Such information, together with the dynamical
laws of classical mechanics, is sufficient to recover higher-order information; one may uniquely evolve
a given state “backward in time”. Hence, if two states are indistinguishable up to first order, then they
are absolutely indistinguishable. In discrete causal theory, the situation is different. The analogue of
information up to first order in a finite acyclic directed set D is its first-degree terminal state T 1 ( D ),

which consists of all maximal elements of D, all relations terminating at these elements, and all
initial elements of these relations. Knowledge of T 1 ( D ) generally does not enable recovery of D.
One may propose a choice of classical dynamics implying such a relationship for very special classes of
directed sets, for example, by abstracting the Einstein–Hilbert action from general relativity, which takes
the form
$$S_{\mathrm{EH}} = \frac{c^4}{16\pi G} \int_X R \sqrt{-\det(g)}\, d^4x, \qquad (5)$$
in the simple vacuum case with zero cosmological constant. Here, g is a Lorentzian metric on a
4-dimensional manifold X, R is the curvature scalar arising from the metric connection, G is Newton’s
gravitational constant, and c is the speed of light. Yet despite interesting efforts in this direction,
for example, in causal set theory [45–47], such a strategy is dubious due to the amount of geometric
structure taken for granted in relativity. Geometric data such as metrics and curvature, and even
“pre-geometric” data such as dimension and topology, are emergent notions in discrete causal theory.
Action functionals in this context must be defined more fundamentally, and cannot be expected to
produce straightforward analogues of deterministic, time-symmetric Euler–Lagrange-type equations
that uniquely determine classical dynamics via information up to first order. In particular, elements of
a directed set D that are indistinguishable up to first order, i.e., permuted by an automorphism
of T 1 ( D ), may be distinguishable when one considers higher-order information. It is therefore necessary
to consider higher-degree terminal states in what follows. The form of Equation (4) does assume
that first-order information suffices at the level of kinematic schemes, in the sense that the phase of
an arbitrary co-relative kinematics is the product of the phases of its individual co-relative histories.
This picture may be generalized without leaving the general framework of path summation, but such
generalization is not undertaken here. In any case, the latter phases do generally depend nontrivially
on information above first order in the corresponding cobases and targets.
The simplest discrete causal analogues of familiar thermodynamic state spaces are nth-order state
spaces Dn , whose elements represent isomorphism classes of countable star finite acyclic directed sets Δ
with maximal chain length n. Equivalently, Rn (Δ) is a nonempty antichain. It is useful to preface formal
definitions involving Dn with some informal remarks. First, while the notion of order identifying a
state Δ as a member of Dn is intrinsic to Δ itself, the desired interpretation of Δ is as a terminal state of
a history D, containing information encoded by chains of length at most n terminating at maximal
elements of D. Second, it is usually impossible to choose a member of Dn that includes all such
information for n > 1, because chains of length at most n terminating at different maximal elements of
D may intersect to produce longer chains, thereby defining a higher-order state. One might consider
re-defining Dn to include such states, requiring only that each element be connected to a maximal
element by at least one chain of length at most n. In physical terms, such states are still composed of
elements exerting “recent influence”, but may contain chains of arbitrary length. However, such a
definition would not be ideal for the desired applications. For example, it would allow any countable
star finite acyclic directed set in which all chains are bounded above to be converted to a member
of D1 or D2 by adding new relations terminating at new maximal elements, thereby flouting the
intuition that low-order states should be “causally simple”. It is preferable to define a separate notion
called degree, which facilitates the definition of terminal states containing all information up to a
given order in a particular history. Following this idea, Definition 13 introduces special states T n ( D ),
called nth-degree terminal states, which include all information encoded in chains of length at most n
terminating at a maximal element in D. Third, as mentioned in Section 2.4, the distinction between
order and degree does not arise for n = 1; the first-degree terminal state T 1 ( D ) of D automatically
belongs to D1 . Fourth, the nth superset microstates introduced in Definition 18 are constructed by
adding n “prehistorical” elements to a state, which may not increase its maximal chain length at all.
These subtleties reflect the fact that more than one natural-number grading is useful in studying
discrete causal state spaces.

It is useful to define terminal states in terms of transitions between pairs of histories, using the
relative viewpoint. Though the ultimate goal is to use information encoded in terminal states to assign
phases to sequences of co-relative histories, i.e., co-relative kinematics, the states of principal interest
in studying a given co-relative history h : Di ⇒ Dt are typically not those induced by transitions
representing h. This is because the “physically new” structure associated with Di and Dt is more
meaningful than whatever structure h “adds to” Di to produce Dt . For example, each co-relative
history h : Di ⇒ Dt in SPS adds only one element to Di , so most of the physically new structure in Dt
is typically already present in Di . Yet what one is really interested in is whether or not the physically
new structure in Dt is “more favorable” than the physically new structure in Di ; i.e., one wishes to
compare terminal states of Di and Dt . These may be defined in terms of auxiliary transitions that are
determined by h, but do not represent h under Definition 9. First, however, one must define terminal
states associated with arbitrary transitions.

Definition 12. Let τ : D′ → D be a transition of acyclic directed sets. The subobject Δτ of D consisting of
all elements of D − τ ( D′ ), all relations terminating at such elements, and all initial elements of such relations,
is called the terminal state of τ. If Rn (Δτ ) is a nonempty antichain, then the order ord(Δτ ) of Δτ is n.
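
As an illustration of Definition 12, the following Python sketch computes the terminal state of a transition into a finite acyclic directed set. The encoding of a directed set as a set of elements plus a set of ordered pairs, the function names, and the example data are assumptions made for illustration. The order is computed as the maximal number of relations in a chain, relying on the equivalence with the antichain condition noted earlier in this section, and the input is assumed to be nonempty.

```python
from functools import lru_cache


def transition_terminal_state(elements, relations, image):
    """Terminal state of a transition into the directed set (elements, relations).

    `image` is the set of elements of the target reached by the transition.
    """
    new_elements = set(elements) - set(image)
    # Keep only relations terminating at a new element.
    kept_relations = {(a, b) for (a, b) in relations if b in new_elements}
    # Add the initial elements of those relations.
    kept_elements = new_elements | {a for (a, _) in kept_relations}
    return kept_elements, kept_relations


def order(elements, relations):
    """Maximal number of relations in a chain of a nonempty finite acyclic directed set."""
    successors = {x: [] for x in elements}
    for a, b in relations:
        successors[a].append(b)

    @lru_cache(maxsize=None)
    def longest_from(x):
        return max((1 + longest_from(y) for y in successors[x]), default=0)

    return max(longest_from(x) for x in elements)


# Hypothetical example: the transition adds elements 2 and 3 to {0, 1}.
elements = {0, 1, 2, 3}
relations = {(0, 1), (1, 2), (0, 3)}
state_elements, state_relations = transition_terminal_state(elements, relations, {0, 1})
state_order = order(state_elements, state_relations)
```

In this example the relation (0, 1) is dropped because it lies entirely inside the image of the transition, so the terminal state has order 1 even though the target set contains a chain with two relations.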

Despite the relative nature of Definition 12, it is convenient to refer to Δτ as a terminal state of
the target set D in many cases. Δτ does not include relations between elements of τ ( D′ ); it includes
only relations that are “new” with respect to τ. If the context is expanded to include cycles, a different
definition of order is necessary. For example, one may define ord(Δτ ) to be the maximal length
of non-self-intersecting chains in Δτ . Here, however, I focus almost exclusively on the acyclic case.
Any directed set D is itself the terminal state of the unique transition 1 → D . This transition may be
denoted by τ1 when the choice of target set D is obvious. As mentioned above, it is useful to define
special terminal states that encode all information up to order n in a given history.

Definition 13. Let D be an acyclic directed set in which every chain is bounded above.

1. The nth-degree terminal state T n ( D ) of D is the subobject of D consisting of all elements connected to
a maximal element of D by a chain of length at most n, together with all relations in such chains.
2. The nth-degree initial state I n ( D ) of D is the subobject of D constructed by deleting all non-minimal
elements of T n ( D ) from D, together with all relations in D terminating at such elements.
3. The nth-degree transition τDn : I n ( D ) → D associated with D is the inclusion map I n ( D ) → D.
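
The first item of Definition 13 can be sketched in Python for finite directed sets. The sketch assumes that the length of a chain counts its relations, which is the reading suggested by the description of T 1 ( D ) above; the function name and the example history are hypothetical.

```python
from collections import deque


def terminal_state(elements, relations, n):
    """nth-degree terminal state of a finite acyclic directed set.

    Returns the elements that reach a maximal element through at most n
    relations, together with the relations lying on such chains.
    """
    successors = {x: set() for x in elements}
    predecessors = {x: set() for x in elements}
    for a, b in relations:
        successors[a].add(b)
        predecessors[b].add(a)

    maximal = [x for x in elements if not successors[x]]
    # Backward breadth-first search: dist[x] is the minimal number of
    # relations in a chain from x to some maximal element.
    dist = {x: 0 for x in maximal}
    queue = deque(maximal)
    while queue:
        y = queue.popleft()
        for x in predecessors[y]:
            if x not in dist:
                dist[x] = dist[y] + 1
                queue.append(x)

    kept_elements = {x for x, d in dist.items() if d <= n}
    # A relation (a, b) lies on a qualifying chain exactly when b reaches
    # a maximal element through at most n - 1 further relations.
    kept_relations = {(a, b) for (a, b) in relations if dist.get(b, n) <= n - 1}
    return kept_elements, kept_relations


# Hypothetical history: a chain 0 -> 1 -> 2 -> 3 plus an isolated element 4.
elements = {0, 1, 2, 3, 4}
relations = {(0, 1), (1, 2), (2, 3)}
t0 = terminal_state(elements, relations, 0)
t1 = terminal_state(elements, relations, 1)
t2 = terminal_state(elements, relations, 2)
```

For n = 1 the result consists of the maximal elements 3 and 4, the relation terminating at 3, and its initial element 2, in agreement with the informal description of the first-degree terminal state.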

The boundedness hypothesis in Definition 13 is included to rule out situations in which D has
maximal elements but also has chains “extending to infinity”, since it is awkward to exclude such
chains from consideration when studying terminal behavior. Such histories are not considered here.

Definition 14. The nth-order state space Dn is the set of all isomorphism classes of countable star finite
acyclic directed sets Δ such that Rn (Δ) is a nonempty antichain. The finite-order state space D is the disjoint
union $\bigsqcup_{n=0}^{\infty} D_n$, and the (total, countable, acyclic) state space D is the set of all isomorphism classes of
countable acyclic directed sets, which may be viewed as limits of sequences in D.

Since the elements and relations in a member Δ of Dn are assumed to possess no internal structure,
one might expect Δ to be treated as a microstate. However, since discrete causal theory does not
rule out the dynamical relevance of information above order n at the level of individual histories,
data describing how Δ might fit into a larger history can be important in determining future behavior
influenced by Δ. Such data defines an even finer level of detail than Δ itself, permitting Δ to be viewed
as a macrostate. Ambiguity regarding the status of Δ is not surprising, due to the relative nature
of entropy. Figure 14 illustrates four different methods of defining coarse and fine levels of detail
using Dn . Informal discussion of these methods then precedes formal treatment in Definition 15.
The first diagram shows a third-order state Δ embedded in a history D. In this case, Δ does not
contain all the third-order information in D; in particular, it is not the third-degree terminal state T 3 ( D )
of D. The second diagram illustrates one way to treat Δ as a microstate, called a resolution microstate,
by approximating its structure via the method of causal atomic resolution, introduced in [14]. This method
involves choosing special subsets of Δ, called causal atoms, which serve as individual elements of
a coarser directed set. Such a choice defines a causal atomic decomposition of Δ. A sequence of such
decompositions is a causal atomic resolution, with each subsequence defining “initial” and “terminal”
levels of detail, and hence a notion of entropy. More generally, one may define partially ordered
families of decompositions, also called resolutions, which induce entropy systems. The resolution
in the figure involves a single decomposition, and hence just two levels of detail. Causal atomic
resolution provides perhaps the most obvious discrete causal analogue of conventional coarse-graining.
In particular, it involves actual approximation, meaning that the information contained in a causal
atomic decomposition is not only incomplete, but also imprecise. However, there is generally no
canonical choice of resolution for a given state, and different resolutions may be very dissimilar.
Further, resolutions reaching far above the fundamental scale can produce objects that are obviously
“too granular” to resemble physical spacetime. Members of Dn are usually treated as macrostates in
this paper, but methods such as causal atomic resolution remain worthy of further study in more
general entropic settings.

[Figure 14 image omitted; panels: history D with state Δ, atomic resolution, superset microstate, labeled microstate, symmetry microstate.]
Figure 14. History D and terminal state Δ; causal atomic resolution of Δ; superset microstate of Δ;
labeled microstate of Δ; symmetry microstate of Δ.

The third diagram in Figure 14 illustrates the most obvious way to treat a member Δ of Dn as
a macrostate, by adding “prehistory” to define larger states called superset microstates. Different superset
microstates of Δ impose different constraints on the family of histories of which Δ could be a
terminal state. In particular, the superset Δ̄ of Δ shown in the diagram is induced by the history D.
At a higher level of detail, Δ̄ may itself be viewed as a macrostate, with its own superset microstates
adding more prehistory. One may imagine “flipping over” this diagram to obtain a co-relative
history η : Δ∗ ⇒ Δ̄∗ between the causal duals Δ∗ and Δ̄∗ of Δ and Δ̄, and this is how superset
microstates are formalized in Definition 15. Hence, the convenient term “superset” is not quite precise,
because co-relative histories involve equivalence classes. Naïve amalgamation of superset microstates
produces a state space with an infinite number of elements in each cell, since one may always add more
prehistory to a directed set. This leads a priori to infinite multiplicities and entropies for finite states.
However, supersets adding “recent” data are expected to dominate dynamically, and families of
superset microstates may be filtered to reflect this expectation. In the case of finite states, one may
work with finite families of microstates defined in terms of numbers of elements and relations,
lengths of chains, sizes of antichains, and similar quantities. Here, I focus on families defined via the
number of prehistorical elements added to Δ. The quantity of superset microstates of a given type
is decreased by symmetries of Δ, which render equivalent different subsets of Δ. This meshes with
the intuition that high-entropy states should be “disordered”. For example, if Δ is an antichain of
cardinality K with automorphism group Aut(Δ) ≅ SK , then there is only one way to add a single
prehistorical element and k relations to Δ for any k ≤ K, since the terminal elements of these relations
in Δ may be exchanged for any other k elements of Δ under Aut(Δ). By contrast, there are $\binom{K}{k}$ ways to
add such an element and relations to Δ if Aut(Δ) is trivial.
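
The two extreme cases just described can be checked by brute force for small states. The Python sketch below counts, for each k, the k-element subsets of a finite state up to automorphism, that is, the inequivalent choices of elements lying in the direct future of one added prehistorical element. The encoding, the function names, and the two example states are assumptions made for illustration; the sketch does not address pseudosimilarity or the additional conditions imposed on superset microstates in Definition 15.

```python
from itertools import combinations, permutations


def automorphisms(elements, relations):
    """All relation-preserving bijections of a small finite directed set."""
    elements = sorted(elements)
    relations = set(relations)
    result = []
    for image in permutations(elements):
        f = dict(zip(elements, image))
        if {(f[a], f[b]) for (a, b) in relations} == relations:
            result.append(f)
    return result


def subset_orbit_counts(elements, relations):
    """Entry k is the number of k-element subsets up to automorphism."""
    elements = sorted(elements)
    autos = automorphisms(elements, relations)
    counts = []
    for k in range(len(elements) + 1):
        orbits = set()
        for subset in combinations(elements, k):
            # Canonical representative: the smallest image of the subset.
            orbits.add(min(tuple(sorted(f[x] for x in subset)) for f in autos))
        counts.append(len(orbits))
    return counts


# Antichain of cardinality 4: the automorphism group is the full symmetric group.
antichain_counts = subset_orbit_counts(range(4), set())
# Rigid example: the chain 0 -> 1 -> 2 has trivial automorphism group.
rigid_counts = subset_orbit_counts(range(3), {(0, 1), (1, 2)})
```

For the antichain each k contributes a single orbit, while for the rigid chain the counts are the binomial coefficients, which sum to 2^3.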
The fourth and fifth diagrams in Figure 14 illustrate contrasting ways to treat a member Δ of Dn as
a macrostate by focusing on its symmetries directly. Under the method illustrated in the fourth diagram,
a microstate of Δ is simply a copy of Δ labeled via a map ℓ : L → Δ, where L is a set of consecutive
natural numbers starting with zero, and where two labelings are regarded as equivalent if they are
related by an automorphism of Δ. Such a microstate is called a labeled microstate. The number of labeled
microstates associated with a state Δ of cardinality K ranges from 1 if Aut(Δ) ≅ SK to K! if Aut(Δ)
is trivial. This method agrees qualitatively with the superset approach in the sense that high-entropy
states are those for which Aut(Δ) is small. The method illustrated in the fifth diagram essentially
reverses this relationship. Here, one begins with an arbitrary labeling ℓ : L → Δ̃, where Δ̃ is the subset
of Δ not fixed by Aut(Δ). Automorphisms of Δ convert ℓ to other labelings, each of which represents
a symmetry microstate. Such a microstate may be viewed as a “mode of symmetry breaking”, since it
breaks the symmetries of Δ in a specific way. For a finite state Δ, the number of symmetry microstates is
just |Aut(Δ)|, so high-entropy states are those for which Aut(Δ) is large. More generally, one may work
with non-surjective partial labelings ℓ : L → Δ̃ that leave a subgroup of Aut(Δ) unbroken. The labeling
in the figure is of this type, since there remains an automorphism of Δ interchanging the elements
indicated by arrows. The set of such partial labelings is partially ordered by extension, which is
interesting from the perspective of state-specific detail discussed at the end of Section 3.1. While it
is counterintuitive to associate high entropy with symmetry, there are arguments for entertaining
such possibilities. Symmetry is central to the theory of “elementary” particles, so certain special
structures that are locally symmetric, at least at measurable scales, are favored by the actual dynamics
of the physical universe. Such structures may be “attached” to underlying causal structure via
auxiliary algebraic information, but the strong interpretation of the causal metric hypothesis demands
an emergent description of both spacetime symmetries and internal symmetries. The most obvious
way to satisfy this demand is to incorporate some type of symmetry data directly into Equation (4).
Notions of entropy associated with superset microstates and/or labeled microstates might accomplish
a similar purpose, since their enumeration depends largely on symmetry considerations. Regardless of
the type of entropy chosen, an attractive though speculative idea is that elementary particles might
arise via local entropic traps, whereby certain regular structures that are small by conventional measures
but large compared to the fundamental scale might be very stable from an entropic perspective.
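
The opposite dependence of labeled and symmetry microstate counts on Aut(Δ) can be made concrete for small finite states. The Python sketch below is illustrative only: the encoding and function names are assumptions, the automorphism group is found by brute force, and the labeled count is taken to be K!/|Aut(Δ)|, which follows from the fact that Aut(Δ) acts freely on complete labelings and agrees with the two extreme values quoted above.

```python
from itertools import permutations
from math import factorial


def automorphism_count(elements, relations):
    """Size of the automorphism group of a small finite directed set."""
    elements = sorted(elements)
    relations = set(relations)
    count = 0
    for image in permutations(elements):
        f = dict(zip(elements, image))
        if {(f[a], f[b]) for (a, b) in relations} == relations:
            count += 1
    return count


def microstate_counts(elements, relations):
    """Return (labeled, symmetry) microstate counts for a finite state."""
    elements = list(elements)
    aut = automorphism_count(elements, relations)
    labeled = factorial(len(elements)) // aut  # complete labelings modulo Aut
    symmetry = aut                             # one microstate per automorphism
    return labeled, symmetry


antichain4 = microstate_counts(range(4), set())          # full symmetric group
chain3 = microstate_counts(range(3), {(0, 1), (1, 2)})   # rigid state
vee3 = microstate_counts(range(3), {(0, 1), (0, 2)})     # one nontrivial symmetry
```

The antichain has a single labeled microstate but 24 symmetry microstates, while the rigid chain has six labeled microstates and only one symmetry microstate.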
A mathematical result important in the study of superset microstates, labeled microstates,
and symmetry microstates is Bender and Robinson’s proof [37] that a typical acyclic directed set
D has trivial automorphism group, i.e., is rigid. This result applies asymptotically under modest
assumptions about the number of relations in D. However, these assumptions fail to hold for a typical
low-order terminal state Δ, since such a state has unusually large “spatial size” and small “causal size”,
and typically lacks enough relations to “bind elements in place”. Hence, Aut(Δ) is often nontrivial
for such a state. The extreme case is a zeroth-order state, whose automorphism group is the entire
symmetric group permuting its elements transitively. However, states tend to become increasingly
rigid as their order increases. Bender and Robinson’s result enables rough enumerations of the
number of high-order superset microstates and labeled microstates for a state Δ of a given cardinality.
It also suggests a novel explanation for why the details of the distant past seem to be irrelevant to
future dynamics, namely, because relatively few additional generations of elements must be added to
a typical low-order state to break most of its symmetries.

Definition 15. Dn , D, and D may be used to define finer state spaces, for which their members are macrostates.
1. The nth-order superset state space $D^{\mathrm{SUP}}_n$ is the set of full, originary co-relative histories η : Δ∗ ⇒ Δ̄∗,
where Δ is a member of Dn and Δ̄ is a member of D. Its elements are called superset microstates.
The corresponding finite-order superset state space DSUP and (total, countable, acyclic) superset
state space DSUP are defined in the obvious ways.
2. The nth-order labeled state space $D^{\mathrm{LAB}}_n$ is the set of complete labelings of members Δ of Dn , where two
labelings of Δ are considered to be equivalent if they are related by an element of Aut(Δ). Its elements
are called labeled microstates. The corresponding finite-order labeled state space DLAB and
(total, countable, acyclic) labeled state space DLAB are defined in the obvious ways.
3. The nth-order symmetry state space $D^{\mathrm{SYM}}_n$ is the set of partial labelings of members Δ of Dn induced
by applying elements of Aut(Δ) to arbitrary initial labelings of the subsets Δ̃ of Δ not fixed by Aut(Δ).
Its elements are called symmetry microstates. The corresponding finite-order symmetry state space
DSYM and (total, countable, acyclic) symmetry state space DSYM are defined in the obvious ways.

The spaces $D^{\mathrm{SUP}}_n$, $D^{\mathrm{LAB}}_n$, and $D^{\mathrm{SYM}}_n$, together with their larger counterparts, offer many alternative
notions of states at many different levels of detail, and induce a variety of entropy systems. The
co-relative history η in the definition of $D^{\mathrm{SUP}}_n$ is not assumed to be proper because it is
sometimes convenient to view a state Δ as a superset microstate of itself, i.e., to take η to be the
co-relative history represented by the identity morphism Δ → Δ. The “full” and “originary” conditions
on η merely formalize the idea that η adds “prehistory” to Δ. It is sometimes convenient to refer to
a superset Δ̄ of Δ as a superset microstate of Δ if the choice of co-relative history η : Δ∗ ⇒ Δ̄∗ is
clear from context, for example, if there is only one such co-relative history. Using this convention,
Figure 15 illustrates some of the superset microstates of the first-degree terminal state Δ7 appearing in
the sequential growth process in Figure 10. Each of these microstates is constructed by adding a single
prehistorical element to Δ7 , along with a family of prehistorical relations. The 22 microstates shown in
the figure each involve one or two extra relations. Overall, there are 96 such microstates, with between
zero and seven extra relations.

[Figure 15 image omitted; labels: state Δ7, some superset microstates, possible prehistorical relations, prehistorical element.]
Figure 15. 22 of the 96 superset microstates of Δ7 given by adding one prehistorical element.

For a state Δτ of cardinality K, the number of superset microstates adding a single element
is “roughly” $2^K$, if one ignores the contribution of symmetries. This reflects the idea that one may
choose any family of elements in Δτ to be in the direct future of the single prehistorical element,
since $2^K$ is the sum of the binomial coefficients $\binom{K}{k}$ for 0 ≤ k ≤ K. Nontrivial symmetries of Δτ reduce
this number; in particular, the numbers of superset microstates of the first-degree terminal states Δ7 to
Δ11 in Figure 10 are 96, 64, 72, 144, and 132. Ignoring symmetries need not yield exactly $2^K$ microstates,
due to a curious graph-theoretic phenomenon called pseudosimilarity, whereby one directed set may
be a terminal state of another in multiple distinct ways, even if the two sets differ by only a single
element. Figure 16 illustrates this subtlety via an example provided by Brendan McKay, in which
augmenting two copies of a state Δτ by a single prehistorical element in two different ways produces
isomorphic supersets. The drawing emphasizes the latter isomorphism; the fact that the black nodes
and edges represent two copies of the same state Δτ may be seen by matching up the elements labeled
x and y.

[Figure 16 image omitted; labels: copies of Δτ, pseudosimilar elements, elements x and y.]

Figure 16. McKay’s example: a superset may induce multiple microstates via pseudosimilarity.

Figure 17 illustrates a small region of D1SYM whose macrostates are the first-degree terminal
states Δ7 to Δ11 appearing in the sequential growth process from Figure 10. The left-hand diagram
reproduces this process. In the middle diagram, Δ7 to Δ11 are represented by large cells labeled 7
to 11, subdivided into smaller cells representing symmetry microstates. Because the histories D7
to D11 are rigid, D1SYM accurately reflects relative distinguishability properties between terminal
states and their histories in this case, since every state symmetry is broken by its ambient history.
The figure highlights the fact that symmetry microstates of a given terminal state are isomorphic as
partially labeled directed sets, which raises the question of how they are distinct. The answer is that
there are multiple ways to break the automorphisms of the original states involved, even though
the resulting objects remain isomorphic. D1SYM generally has “too many microstates” for terminal
states of nonrigid histories, since it includes symmetry breaking information for symmetries that
remain unbroken. This issue may be addressed by restricting the class of permissible labelings.
The right-hand diagram represents the sequential growth process abstractly via a “curve” in D1SYM .
Since D1SYM encodes information only up to first order at the level of individual histories, the entire
curve is necessary to reconstruct the evolution of D11 . The corresponding regions of D1SUP and D1LAB
are much too large and cluttered to illustrate here, but the basic structural aspects are similar.

[Figure 17 image omitted; labels: histories D7 to D11 with co-relative histories h7 to h10, terminal states Δ7 and Δ8 (black), cells 7 to 11, other possible first-degree states.]

Figure 17. Sequential growth process from Figure 10; region of D1SYM through which this process moves;
abstract view of the process.

Definitions 14 and 15 identify discrete causal state spaces as sets, but one may recognize additional
“geometric” structure on these spaces defined in terms of discrete operations that convert one state
to another. It is useful to define such operations for multidirected sets in general.

Definition 16. Let M and M′ be multidirected sets. Elementary operations on such sets are defined as follows:

1. Add or delete an isolated element.
2. Add or delete a relation between two elements.

The absolute distance d( M, M′ ) between M and M′ is the minimal number of elementary operations required
to convert M to M′, if this number is finite. Otherwise, d( M, M′ ) = ∞.
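
For very small states, the absolute distance of Definition 16 can be computed by exhaustive search. The Python sketch below rests on several assumptions that are not part of the definition: the sets are finite and have at most one relation per ordered pair, they are compared up to isomorphism, and the minimum is attained by matching as many elements as possible and then adding or deleting the mismatched relations. The function name and examples are hypothetical.

```python
from itertools import permutations


def absolute_distance(elems_a, rels_a, elems_b, rels_b):
    """Brute-force edit distance between two small finite directed sets."""
    # Work with the smaller set as the source; the operations are reversible,
    # so the distance is symmetric.
    if len(elems_a) > len(elems_b):
        elems_a, rels_a, elems_b, rels_b = elems_b, rels_b, elems_a, rels_a
    elems_a, elems_b = list(elems_a), list(elems_b)
    rels_a, rels_b = set(rels_a), set(rels_b)

    # Isolated elements that must be added to match cardinalities.
    element_ops = len(elems_b) - len(elems_a)

    best = None
    for image in permutations(elems_b, len(elems_a)):
        f = dict(zip(elems_a, image))
        mapped = {(f[x], f[y]) for (x, y) in rels_a}
        # Relations to delete plus relations to add under this matching.
        cost = len(mapped ^ rels_b)
        if best is None or cost < best:
            best = cost
    return element_ops + best


chain3 = ([0, 1, 2], {(0, 1), (1, 2)})
antichain3 = (["a", "b", "c"], set())
relabeled_chain3 = (["a", "b", "c"], {("b", "c"), ("c", "a")})

d_chain_antichain = absolute_distance(*chain3, *antichain3)
d_chain_relabeled = absolute_distance(*chain3, *relabeled_chain3)
d_pair_antichain = absolute_distance([0, 1], {(0, 1)}, *antichain3)
```

The search is factorial in the number of elements, so it is only suitable for checking small examples by hand.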

Notions of distance between pairs of states facilitate useful analogues of familiar evolutionary
ideas. For example, in conventional thermodynamics, one may ask why every system does not
immediately transition to the cell in state space representing thermal equilibrium. The answer is
that curves in state space are continuous in this context, so a typical system beginning far from
thermal equilibrium must pass through a sequence of intervening macrostates before reaching it.
Although literal continuity does not apply in the discrete causal context, similar ideas may be
invoked whenever one can define notions of distance and neighbors. In particular, even if a given
co-relative history is “favored” from a purely entropic perspective, it may be “costly” in the sense
that it entails direct passage between widely separated regions of a discrete causal state space.
Similarly, “short” paths between a given pair of states might be favored over “long” paths that
involve drastic changes in structure. These ideas are revisited in Section 4.2 in the context of
spacetime expansion, and again in Section 4.3 in the context of discrete causal action principles.
Alternative, relative notions of distance between pairs of directed or multidirected sets may
be defined in terms of “ambient” structure from a configuration space. In the case of directed sets,
such structure may originate from a kinematic scheme.

Definition 17. Let S = (K, H) be a kinematic scheme, and let D be a member of K in which every chain is
bounded above. Let T n ( D ) be the nth-degree terminal state of D, and let Δ be any other element of D.

1. The directed distance dS,D ( T n ( D ), Δ) between T n ( D ) and Δ in S with respect to D is the minimal
length of chains x ( D ) ≺ x ( D1 ) ≺ ... ≺ x ( D N ) in M(S), where T n ( D N ) = Δ.
2. The undirected distance S,D ( T n ( D ), Δ) between T n ( D ) and Δ in S with respect to D is the minimal
length of undirected paths x ( D ), x ( D1 ), ..., x ( D N ) in M(S) with initial element x ( D ) and terminal
element x ( D N ), where T n ( D N ) = Δ.

The reason why dS,D and S,D depend on a choice of D is because T n ( D ) and Δ may appear as
terminal states of many different histories in S. If T n ( D ) = T n ( D1 ) = T n ( D2 ), then it may be easier to
reach a history with nth-degree terminal state Δ from D1 than from D2 . The distinction between a chain
x ( D ) ≺ x ( D1 ) ≺ ... ≺ x ( D N ) and an undirected path x ( D ), x ( D1 ), ..., x ( D N ) is that chains respect
the directions of relations in M(S), while undirected paths generally do not. States close together in
an undirected sense may be far apart in a directed sense, since undirected paths are more general
than chains. Dependence on D implies that dS,D and S,D are inherently asymmetric. It is reasonable to
expect that dS,D and S,D may closely approximate more conventional notions of distance for suitable
classes of “large” directed sets, but this topic is not further explored here.
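
The contrast between the two distances of Definition 17 can be sketched with two breadth-first searches over a toy stand-in for M(S). Everything concrete below is hypothetical: histories are represented by string names, relations between them by ordered pairs, and the nth-degree terminal state of each history by an arbitrary label supplied in a dictionary. The sketch returns None when no suitable history is reachable, a convention the definition itself does not specify.

```python
from collections import deque


def shortest_length(start, is_target, neighbors):
    """Breadth-first search; minimal number of steps to a target, or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, steps = queue.popleft()
        if is_target(node):
            return steps
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + 1))
    return None


def distances(relations, terminal_state_of, start, target_state):
    """Return (directed, undirected) distances from `start` to `target_state`."""
    successors, adjacent = {}, {}
    for a, b in relations:
        successors.setdefault(a, set()).add(b)
        adjacent.setdefault(a, set()).add(b)
        adjacent.setdefault(b, set()).add(a)

    def hit(history):
        return terminal_state_of[history] == target_state

    # Chains respect the direction of relations; undirected paths do not.
    directed = shortest_length(start, hit, lambda h: successors.get(h, ()))
    undirected = shortest_length(start, hit, lambda h: adjacent.get(h, ()))
    return directed, undirected


# Toy scheme: D0 -> D1 -> D2 and D3 -> D0, with terminal-state labels.
relations = {("D0", "D1"), ("D1", "D2"), ("D3", "D0")}
terminal_state_of = {"D0": "s", "D1": "t", "D2": "u", "D3": "u"}
directed, undirected = distances(relations, terminal_state_of, "D0", "u")
```

Here the state labeled "u" is two steps away along chains but only one step away along undirected paths, since the history D3 precedes D0.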

3.4. Multiplicities and Entropies

Four approaches to defining discrete causal microstates via terminal states of transitions were
introduced in Section 3.3. A preliminary step, given in Definition 14, was to define spaces Dn of
nth-order states, along with larger spaces D and D including states of arbitrary order. The first approach
was to treat the states making up these spaces as individual microstates, called resolution microstates,
and apply a discrete causal analogue of conventional coarse-graining, called causal atomic resolution,
to partition these spaces into cells. The remaining approaches treated such states as macrostates,
with finer state spaces of microstates introduced in Definition 15. The second approach was to add
detail to terminal states by specifying prehistorical information, leading to the spaces $D^{\mathrm{SUP}}_n$ of superset
microstates, together with their finite-order and total counterparts. The third approach was to add detail
to terminal states by labeling their elements, leading to the spaces $D^{\mathrm{LAB}}_n$ of labeled microstates, together
with their finite-order and total counterparts. The fourth approach
was to add detail to terminal states via partial labelings specifying symmetry breaking information,
leading to the spaces $D^{\mathrm{SYM}}_n$ of symmetry microstates, together with their finite-order and total counterparts.
Before explaining how discrete causal entropies may be defined via these four approaches,
I mention progress in the study of causal set entropy by Sorkin and collaborators [35,36]. This work
exhibits interesting relationships with analogous continuum-based notions, is supported by numerical
simulations involving “low-dimensional” causal sets, and incorporates covariance considerations.
However, it is very different in its assumptions and emphasis from the approaches examined in
this paper. First, the entropies involved are defined in terms of auxiliary fields on causal sets,
and are therefore not completely background independent quantities. Sorkin does consider causal set
“vacuum solutions”, whose entropies may be attributed solely to causal structure, but entropies associated
with nontrivial interactions typically involve large quantities of extra-causal data. Second, pre-packaged
quantum-theoretic machinery such as Hilbert spaces, operator algebras, density matrices, and von
Neumann-type entropy are applied to individual causal sets under this approach, rather than
emerging naturally from a history configuration space. Third, the permeability problem and other
technical obstructions arising in the absence of relation space methods render it difficult to define
terminal states or associated entropic data in this setting. The resulting measures of entropy are
a priori “higher-dimensional”, and can be associated only indirectly with conventional notions of
time-dependent entropy and the second law of thermodynamics. Fourth, many of the cases considered
under this approach involve special causal sets of the type mentioned in Section 2.2, induced by
sprinkling elements into relativistic spacetime manifolds. Such causal sets are naturally limited in their
potential to reveal structural features beyond the scope of general relativity.
I give only a brief sketch of how one may construct entropy systems via resolution microstates.
For simplicity, I describe this construction in terms of an individual nth-order state space Dn . The first
step is to choose a resolution of each state Δ in this space. In the simplest case, these resolutions may
be chosen to consist of single causal atomic decompositions. A choice of such decompositions defines
a coarse-graining of Dn , which induces an entropy quadruple, while a choice of resolutions involving
longer sequences of decompositions, or partially ordered families of decompositions, defines an
entropy system. In the general case, one may define a partially ordered family of equivalence relations
on Dn , specified by treating states as equivalent if their resolutions agree beyond a certain level
of detail. The associated equivalence classes then define partitions of Dn , and their cardinalities
define multiplicities. The resulting notion of entropy is called resolution entropy. One may choose to
define resolutions in such a way that each decomposition reduces the maximal length of chains in
each state by a specified quantity. For example, the decomposition illustrated in the second diagram in
Figure 14 converts a “fine” third-order state to a “rough” first-order state. An analogue of resolution
entropy appears in Sorkin’s approach to causal set entropy [35,36], but involves a random “decimation”
version of coarse-graining that does not incorporate causal structure in the same way that causal
atomic resolution does. It also involves “higher-dimensional” entropy, rather than entropy associated
with terminal states. However, numerical examples do hint at interesting universal behavior for this
type of entropy, and this evidence provides motivation for studying resolution entropy in more detail.
Numerous questions must be answered, however, before one may have confidence in the
resolution approach. The most basic is how sensitive resolution entropy is to changes of resolution,
since resolutions generally involve arbitrary extraphysical choices regarding the organization
of information. Another question, already mentioned in Section 3.3, is how one may reconcile
the increasing “granularity” produced by multi-level resolutions with the basic philosophy of
metric recovery, under which discrete causal structure at the fundamental scale should produce
effectively smooth structure at sufficiently large scales. A third issue arises from the empirical
dynamical irrelevance of details of the distant past. If only very low-order terminal states play
a substantial dynamical role in the future evolution of histories, then repeated causal atomic
decompositions of dynamically relevant states will produce antichains at relatively fine levels of detail.
Antichains possess no internal structure besides cardinality, which seems much too crude to determine

Entropy 2017, 19, 322

meaningful dynamics, especially locally. Therefore, the utility of resolution entropy seems to be
limited by the “causal depth” of relevant information. This issue does not necessarily disqualify the
resolution approach, however, due to the scales involved. In particular, the difference in magnitude
between the Planck scale and presently-measurable scales suggests that information up to order $10^{10}$
or $10^{15}$ could be relevant without producing noticeable deviations from the empirical obsolescence
of high-order information. A resolution involving decompositions similar to the one illustrated in
Figure 14 would require perhaps 30 decompositions to cover 10–15 orders of magnitude, and could
therefore contain a large quantity of information. However, such illustrations involving small histories
can be misleading; for example, it would not be surprising if each element in a typical physically
realistic history were directly related to $10^{10}$ or more other elements. Such large numbers of relations
would affect the qualitative properties of realistic resolutions.
Superset microstates offer a variety of different ways to define entropy systems via the state spaces
DSUP
n ,D
SUP , and DSUP . I begin by discussing simple notions of entropy involving individual partitions
of these spaces. For simplicity, I focus on the case of finite states. Let $\Delta$ be such a state, and consider
all superset microstates $\eta : \Delta^* \Rightarrow \Delta'^*$ adding a single prehistorical element to $\Delta$. The number of
such microstates is the cardinality of the future relation set $R^+(x(\Delta^*))$ in $M(S_{\mathrm{PS}})$, since the number
of different ways in which $\Delta$ can be the terminal state of a history with one additional element is
the same as the number of ways in which $\Delta^*$ can evolve into a history with one additional element.
As a reminder, $x(\Delta^*)$ is the element in the underlying multidirected set $M(S_{\mathrm{PS}})$ of $S_{\mathrm{PS}}$ representing $\Delta^*$,
and $R^+(x(\Delta^*))$ is the set of relations in $M(S_{\mathrm{PS}})$ beginning at $x(\Delta^*)$, each of which represents a co-relative
history with cobase $\Delta^*$. The first superset multiplicity $\mu^1_{\mathrm{SUP}}(\Delta)$ of $\Delta$ is then defined to be the number
$|R^+(x(\Delta^*))|$ of such microstates $\eta$, and the first superset entropy $e^1_{\mathrm{SUP}}(\Delta)$ is defined to be $\log \mu^1_{\mathrm{SUP}}(\Delta)$.
Following essentially the same reasoning, nth superset multiplicities and entropies may be defined.

Definition 18. The nth superset multiplicity $\mu^n_{\mathrm{SUP}}(\Delta)$ of a finite state $\Delta$ is the number of co-relative histories
$\eta : \Delta^* \Rightarrow \Delta'^*$, where the complement of the image of $\Delta^*$ under any transition representing $\eta$ has cardinality n.
The nth superset entropy $e^n_{\mathrm{SUP}}(\Delta)$ of $\Delta$ is $\log \mu^n_{\mathrm{SUP}}(\Delta)$.

An interesting entropy system on $D_{\mathrm{SUP}}$ is given by filtering superset microstates $\eta : \Delta^* \Rightarrow \Delta'^*$ by
both the number of prehistorical elements added to $\Delta$ by $\eta$, and the order of the resulting supersets $\Delta'$.
$D_{\mathrm{SUP}}$ has a natural partition whose members are the infinite sets $C_{\mathrm{SUP}}(\Delta)$ parameterizing all full,
originary co-relative histories $\eta$ with cobase $\Delta^*$ and target belonging to $D$. One may partition each set
$C_{\mathrm{SUP}}(\Delta)$ by numbers of elements added to $\Delta$, or by orders of supersets $\Delta'$, or by both. A general way to
formalize the idea that two superset microstates $\eta_1 : \Delta^* \Rightarrow \Delta_1^*$ and $\eta_2 : \Delta^* \Rightarrow \Delta_2^*$ of $\Delta$ are equivalent
up to a given level of detail is to specify a common interpolating microstate $\eta_3 : \Delta^* \Rightarrow \Delta_3^*$, characterized by
the property that $\eta_1$ and $\eta_2$ both factor through $\eta_3$. This means that there exist pairs of transitions

$$\Delta^* \xrightarrow{\ \tau_3\ } \Delta_3^* \xrightarrow{\ \tau_1\ } \Delta_1^* \quad \text{and} \quad \Delta^* \xrightarrow{\ \tau_3'\ } \Delta_3^* \xrightarrow{\ \tau_2\ } \Delta_2^*,$$

where $\tau_3$ and $\tau_3'$ both represent $\eta_3$, and where the compositions $\tau_1 \circ \tau_3$ and $\tau_2 \circ \tau_3'$ represent
$\eta_1$ and $\eta_2$, respectively. Informally, this means that besides
being supersets of $\Delta$, the states $\Delta_1$ and $\Delta_2$ also share common prehistorical elements. One may then
define equivalence relations $\sim_m$ and $\sim_n$ on $D_{\mathrm{SUP}}$, for each $m, n \in \mathbb{N}$, where $\eta_1 \sim_m \eta_2$ if $\eta_1$ and $\eta_2$
factor through a common interpolating microstate $\eta_3$ adding m prehistorical elements to $\Delta$, and where
$\eta_1 \sim_n \eta_2$ if $\eta_1$ and $\eta_2$ factor through a common interpolating microstate $\eta_3$ whose superset has order n.
Equivalence relations $\sim_{(m,n)}$ combine these two requirements. The corresponding partitions $P_{(m,n)}$ are
partially ordered lexicographically; i.e., $P_{(m,n)} \prec P_{(m',n')}$ if and only if $m < m'$, or $m = m'$ and $n < n'$.
It is convenient to denote the pair (m, n) by the single symbol $\alpha$, regarded as an element of $\mathbb{N}^2 = \mathbb{N} \times \mathbb{N}$.
Informally, the partition $P_\alpha$ groups together superset microstates that agree both up to a given number
of prehistorical elements and a given order.
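The lexicographic order on index pairs can be made concrete with a few lines of code. The following is a minimal sketch, assuming that each partition is identified only by its index pair (m, n); the names `precedes` and `indices` are illustrative and do not appear in the text.

```python
from itertools import product


def precedes(alpha, beta):
    """Return True if the index pair alpha = (m, n) strictly precedes
    beta = (m', n') lexicographically: m < m', or m = m' and n < n'."""
    (m, n), (m_prime, n_prime) = alpha, beta
    return m < m_prime or (m == m_prime and n < n_prime)


# All index pairs (m, n) with 0 <= m, n < 3, listed in lexicographic order.
# Python compares tuples lexicographically, so sorted() yields this order.
indices = sorted(product(range(3), repeat=2))

if __name__ == "__main__":
    for alpha in indices:
        print(alpha)
```

The first index (number of prehistorical elements) dominates the comparison, and the second index (order of the superset) only breaks ties.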

Definition 19. Let $\alpha = (m, n) \in \mathbb{N}^2$, and let $\Pi_{\mathrm{LEX}} := \{P_\alpha\}_{\alpha \in \mathbb{N}^2}$ be the set of partitions $P_\alpha$ of $D_{\mathrm{SUP}}$ defined
by taking superset microstates $\eta_1$ and $\eta_2$ of $\Delta$ to be equivalent if they factor through a common interpolating
microstate $\eta_3 : \Delta^* \Rightarrow \Delta_3^*$ of $\Delta$ represented by a transition $\tau_3 : \Delta^* \to \Delta_3^*$ such that $|\Delta_3^* - \tau_3(\Delta^*)| = m$
and $\mathrm{ord}(\Delta_3) = n$. Let $\sim_\alpha$ be the corresponding equivalence relation, and for any subset $V \subset D_{\mathrm{SUP}}$, let $V^\alpha$ be
the corresponding quotient set. For any relation $P_\alpha \prec P_\beta$ under the lexicographic order induced by $\mathbb{N}^2$, and for
any subset V belonging to $P_\alpha$, let $\mu_{\alpha\beta}(V^\beta)$ be the cardinality of $V^\beta$. Let $\mu_{\mathrm{LEX}}$ be the family of measures $\mu_{\alpha\beta}$.
Then the triple $(D_{\mathrm{SUP}}, \Pi_{\mathrm{LEX}}, \mu_{\mathrm{LEX}})$ is called the lexicographic superset entropy system.

The measures μαβ (V β ) may take on infinite values; for example, there are infinitely many ways
to add a single prehistorical element to N. Definition 19 does not specify the number of relations
added to Δ by each microstate, or the maximal sizes of antichains in the corresponding supersets,
or any of a variety of other basic combinatorial data that may be used to partition DSUP in
different ways. Using such quantities, one may define alternative entropy systems, involving,
for example, “higher-dimensional” lexicographic orders. This particular entropy system merely
formalizes some of the simpler properties that may be used to organize families of superset microstates.
Labeled microstates also induce a variety of entropic notions. The most obvious is given by simply
counting the number of equivalence classes of labelings of a state Δ. If Δ has cardinality K, then its total
number of labelings is K!. These labelings are partitioned by the action of Aut(Δ) into equivalence
classes of cardinality |Aut(Δ)|, so the number of such classes is K!/|Aut(Δ)|.

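Definition 20. The labeled multiplicity $\mu_{\mathrm{LAB}}(\Delta)$ of a state $\Delta$ of cardinality K is $K!/|\mathrm{Aut}(\Delta)|$. The labeled
entropy $e_{\mathrm{LAB}}(\Delta)$ of $\Delta$ is $\log \mu_{\mathrm{LAB}}(\Delta) = \log K! - \log |\mathrm{Aut}(\Delta)|$.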
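For very small states, the labeled multiplicity can be checked by brute force. The sketch below assumes that a state is given as a simple directed set on the elements 0, ..., K − 1 with at most one relation between any ordered pair (multidirected structure is ignored), and it counts automorphisms by testing every permutation. The function names and the three example states are illustrative choices, not taken from the text.

```python
from itertools import permutations
from math import factorial, log


def automorphism_count(k, relations):
    """Count permutations of range(k) that map the relation set onto itself."""
    rel = set(relations)
    count = 0
    for perm in permutations(range(k)):
        image = {(perm[a], perm[b]) for (a, b) in rel}
        if image == rel:
            count += 1
    return count


def labeled_multiplicity(k, relations):
    """K!/|Aut(Delta)|, the number of equivalence classes of labelings."""
    return factorial(k) // automorphism_count(k, relations)


def labeled_entropy(k, relations):
    """Logarithm of the labeled multiplicity."""
    return log(labeled_multiplicity(k, relations))


# Three toy states of cardinality 3 (hypothetical examples).
antichain = (3, [])                # no relations; every permutation is a symmetry
chain = (3, [(0, 1), (1, 2)])      # 0 -> 1 -> 2; rigid
vee = (3, [(0, 1), (0, 2)])        # 0 -> 1 and 0 -> 2; one nontrivial symmetry

if __name__ == "__main__":
    for name, state in [("antichain", antichain), ("chain", chain), ("vee", vee)]:
        print(name, labeled_multiplicity(*state), labeled_entropy(*state))
```

The search over all K! permutations is only feasible for small K, which is sufficient for checking toy cases such as these.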

It is sometimes desirable to decompose the subset CLAB (Δ) of DLAB consisting of all equivalence
classes of labelings of Δ. This may be accomplished via equivalence classes of partial labelings
of Δ, i.e., labelings of special subsets U of Δ. To yield a suitable version of equivalence, U must be a
union of orbits under Aut(Δ), and the labeling must be by consecutive natural numbers beginning
with zero. The set of equivalence classes of such partial labelings is partially ordered by extension of
class representatives. A labeling $\ell$ of U corresponds to a subset $C_{\mathrm{LAB}}(\ell)$ of $C_{\mathrm{LAB}}(\Delta)$ defined by labelings
of $\Delta$ extending $\ell$. Letting U and $\ell$ vary, one obtains a family of sets $\{C_{\mathrm{LAB}}(\ell)\}$ that cover $C_{\mathrm{LAB}}(\Delta)$,
generally in a highly redundant fashion. A partition of $C_{\mathrm{LAB}}(\Delta)$ induced by partial labelings of $\Delta$ is defined
to be a partition whose members are open sets in the topology on $C_{\mathrm{LAB}}(\Delta)$ generated by $\{C_{\mathrm{LAB}}(\ell)\}$,
i.e., unions of finite intersections of members of $\{C_{\mathrm{LAB}}(\ell)\}$. Choosing such a partition for each $\Delta$
defines a partition of DLAB , and the collection of all such partitions forms a “large” entropy system.
Smaller subsystems may be more convenient to work with in practice.

Definition 21. Let $\Delta$ be a member of $D$, and let $C_{\mathrm{LAB}}(\Delta)$ be the subset of $D_{\mathrm{LAB}}$ consisting of all equivalence
classes of labelings of $\Delta$. Let $\Pi_{\mathrm{LAB}}(\Delta)$ be the set of partitions of $C_{\mathrm{LAB}}(\Delta)$ induced by partial labelings of $\Delta$,
and let $\Pi_{\mathrm{LAB}}$ be the set of partitions of $D_{\mathrm{LAB}}$ constructed from the partitions $\Pi_{\mathrm{LAB}}(\Delta)$, partially ordered
by refinement. For any relation $P_\alpha \prec P_\beta$ in $\Pi_{\mathrm{LAB}}$, and for any subset V belonging to $P_\alpha$, let $\mu_{\alpha\beta}(V^\beta)$ be the
cardinality of the quotient set $V^\beta$ of V under the equivalence relation $\sim_\beta$ induced by $P_\beta$. Let $\mu_{\mathrm{LAB}}$ be the family
of measures $\mu_{\alpha\beta}$. Then the triple $(D_{\mathrm{LAB}}, \Pi_{\mathrm{LAB}}, \mu_{\mathrm{LAB}})$ is called the labeled entropy system.

Symmetry microstates share entropic similarities with labeling microstates, since both approaches
involve labelings. The principal differences are that symmetry microstates label only elements of a state
Δ that are not fixed by its automorphisms, and labelings related by automorphisms are not considered
to be equivalent. It is convenient to fix an arbitrary “initial” labeling on the set Δ̃ of elements of Δ not
fixed by Aut(Δ), i.e., the union of nonsingleton orbits under Aut(Δ). A labeling of Δ̃ is then considered
permissible if it is generated by applying an element of Aut(Δ) to this initial labeling. The number of
such labelings is just the order |Aut(Δ)| of Aut(Δ).

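Definition 22. The symmetry multiplicity $\mu_{\mathrm{SYM}}(\Delta)$ of a finite state $\Delta$ is $|\mathrm{Aut}(\Delta)|$. The symmetry
entropy $e_{\mathrm{SYM}}(\Delta)$ of $\Delta$ is $\log \mu_{\mathrm{SYM}}(\Delta) = \log |\mathrm{Aut}(\Delta)|$.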

By Definitions 20 and 22, μLAB (Δ)μSYM (Δ) = K! for a state Δ of cardinality K. Processes exhibiting
an increase in eLAB therefore exhibit a decrease in eSYM for a fixed state cardinality, and vice versa,
although “expanding universes” may exhibit simultaneous increases in both types of entropy. As in
the case of labeled microstates, it is sometimes desirable to decompose the subset CSYM (Δ) of DSYM
consisting of all permissible labelings of Δ̃. This may be accomplished by partially labeling Δ̃ in a
suitable manner; in particular, the set U of elements labeled must be a union of nonsingleton orbits
under Aut(Δ). Such a labeling $\ell$ defines a subset $C_{\mathrm{SYM}}(\ell)$ of $C_{\mathrm{SYM}}(\Delta)$ consisting of all labelings of Δ̃
extending $\ell$. The set of all such labelings for all such U is partially ordered by extension. The collection
of sets $\{C_{\mathrm{SYM}}(\ell)\}$ defines a family of partitions of $D_{\mathrm{SYM}}$, and hence an entropy system.

Definition 23. Let Δ be a member of D, and let CSYM (Δ) be the subset of DSYM consisting of all permissible
labelings of the set Δ̃ of elements of Δ not fixed by Aut(Δ), with respect to an arbitrary initial labeling.
Let ΠSYM (Δ) be the set of partitions of CSYM (Δ) induced by partial labelings of Δ̃, and let ΠSYM be the set of
partitions of DSYM constructed from the partitions ΠSYM (Δ), partially ordered by refinement. For any relation
Pα ≺ P β in ΠSYM , and for any subset V belonging to Pα , let μαβ (V β ) be the cardinality of the quotient set V β
of V under the equivalence relation ∼ β induced by P β . Let μSYM be the family of measures μαβ . Then the triple
(DSYM , ΠSYM , μSYM ) is called the symmetry entropy system.

It may often suffice on physical grounds to restrict attention to notions of entropy more specific
than those associated with the entropy systems of Definitions 19, 21 and 23, although it may be
necessary to supersede the simplistic notions of Definitions 18, 20 and 22. For superset microstates,
weighted sums of entropies can be useful to naturally distill finite entropic values from infinite families
of microstates. Abstractly, such sums are analogous to Gibbs or Shannon entropies. A practical
reason to study such sums is to quantify the degree to which prehistorical data of various orders is
dynamically relevant. A simple example of such a weighted sum is

$$e(\Delta) = \sum_{n=1}^{\infty} \frac{e^n_{\mathrm{SUP}}(\Delta)}{n^4}, \qquad (6)$$

where the denominator $n^4$ dominates the rapid growth of $e^n_{\mathrm{SUP}}(\Delta)$ as n increases. For both
labeled microstates and symmetry microstates, symmetry considerations are paramount. Interesting
generalizations of Definitions 20 and 22 include those involving the study of symmetries that are
broken or preserved by specific prehistorical information. This leads to the concept of extension groups,
which measure how many automorphisms of a terminal state extend to automorphisms of a specified
superset. One may formalize this idea in terms of pairs of transitions (τ1 , τ2 ), where τ1 specifies a
terminal state Δτ1 , and τ2 specifies a superset Δτ2 of Δτ1 that breaks some of the symmetries of Δτ1 .
Finiteness assumptions may be added as necessary.

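Definition 24. Let $\tau$, $\tau_1$ and $\tau_2$ be transitions of directed sets with sources $D$, $D_1$ and $D_2$, and common
target $D'$. Assume that $\tau_2(D_2) \subset \tau_1(D_1)$ in $D'$. Let $\Delta_\tau$, $\Delta_{\tau_1}$ and $\Delta_{\tau_2}$ be the terminal states of $\tau$, $\tau_1$, and $\tau_2$.

1. The state automorphism group of $\tau$ is $\mathrm{Aut}(\Delta_\tau)$.
2. The relative extension group $E^{\tau_1\tau_2}$ of $(\tau_1, \tau_2)$ is the subgroup of $\mathrm{Aut}(\Delta_{\tau_1})$ of automorphisms of $\Delta_{\tau_1}$ that
extend to automorphisms of $\Delta_{\tau_2}$.
3. The relative symmetry multiplicity $\mu^{\tau_1\tau_2}_{\mathrm{SYM}}$ of $(\tau_1, \tau_2)$ is $|\mathrm{Aut}(\Delta_{\tau_1})| - |E^{\tau_1\tau_2}|$.
4. The relative symmetry entropy $e^{\tau_1\tau_2}_{\mathrm{SYM}}$ of $(\tau_1, \tau_2)$ is $\log \mu^{\tau_1\tau_2}_{\mathrm{SYM}}$.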
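The relative extension group can be illustrated on toy examples by exhaustive search. The following sketch makes several simplifying assumptions that are not part of the definition: states are simple directed sets, the terminal state occupies the elements 0, ..., k − 1 of the superset, and an automorphism "extends" if it is the restriction of some automorphism of the superset. Returning −∞ for the entropy when every automorphism extends is a convention adopted here only so that the function is total.

```python
from itertools import permutations
from math import log


def automorphisms(k, relations):
    """List all permutations of range(k) mapping the relation set onto itself."""
    rel = set(relations)
    return [p for p in permutations(range(k))
            if {(p[a], p[b]) for (a, b) in rel} == rel]


def extension_group(k, relations, k_sup, relations_sup):
    """Automorphisms of the state on range(k) that are restrictions of
    automorphisms of the superset on range(k_sup)."""
    state_auts = set(automorphisms(k, relations))
    restricted = {p[:k] for p in automorphisms(k_sup, relations_sup)}
    return state_auts & restricted


def relative_symmetry_multiplicity(k, relations, k_sup, relations_sup):
    """|Aut(state)| - |extension group|, following item 3 of Definition 24."""
    total = len(automorphisms(k, relations))
    return total - len(extension_group(k, relations, k_sup, relations_sup))


def relative_symmetry_entropy(k, relations, k_sup, relations_sup):
    """log of the relative symmetry multiplicity (assumed -inf when it is 0)."""
    mult = relative_symmetry_multiplicity(k, relations, k_sup, relations_sup)
    return log(mult) if mult > 0 else float("-inf")


if __name__ == "__main__":
    # Two-element antichain {0, 1}; prehistorical element 2 related to 0 only.
    print(relative_symmetry_multiplicity(2, [], 3, [(2, 0)]))
    # Same antichain; element 2 related to both 0 and 1, so no symmetry breaks.
    print(relative_symmetry_multiplicity(2, [], 3, [(2, 0), (2, 1)]))
```

In the first toy case the added element breaks the swap symmetry of the antichain, while in the second case both automorphisms extend.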

The generational automorphism groups discussed in Section 8.2 of [14] are special cases of
state automorphism groups. The quantities $\mu^{\tau_1\tau_2}_{\mathrm{SYM}}$ and $e^{\tau_1\tau_2}_{\mathrm{SYM}}$ may be derived from the symmetry
entropy system, if desired. $E^{\tau_1\tau_2}$ is generally not a normal subgroup of $\mathrm{Aut}(\Delta_{\tau_1})$. The superset $\Delta_{\tau_2}$ may
acquire “new” symmetries that do not extend nontrivial symmetries of $\Delta_{\tau_1}$, but this is atypical due
to rigidity. Since the purpose of studying entropic phase maps is to assign quantum-theoretic phases
to co-relative kinematics, it is necessary to adapt the preceding notions to apply to co-relative histories
$h : D_i \Rightarrow D_t$ in a kinematic scheme S. The states of principal interest in this context are terminal states
of the cobase $D_i$ and target $D_t$ of h. For generality, it is convenient to work with an unspecified entropy
function on a subset of $D$. Again, finiteness assumptions may be added as necessary.

Definition 25. Let $h : D_i \Rightarrow D_t$ be a co-relative history. Let $\Delta_{\tau_i}$ and $\Delta_{\tau_t}$ be terminal states of $D_i$
and $D_t$, respectively. Let e be an entropy function on a subset of $D$.

1. The initial entropy $e_i^{\tau_i}(h)$ of h with respect to $\tau_i$ is $e(\Delta_{\tau_i})$.
2. The terminal entropy $e_t^{\tau_t}(h)$ of h with respect to $\tau_t$ is $e(\Delta_{\tau_t})$.
3. The relative entropy $e^{\tau_i\tau_t}(h)$ of h with respect to the pair $(\tau_i, \tau_t)$ is $e(\Delta_{\tau_t}) - e(\Delta_{\tau_i})$.

It is useful to specialize Definition 25 to the case where τi and τt are transitions of specific degrees,
as specified in Definition 13.

Definition 26. Let $h : D_i \Rightarrow D_t$ be a co-relative history, and let e be an entropy function on a subset of $D$.

1. The nth initial entropy $e_i^n(h)$ of h is $e(T^n(D_i))$.
2. The nth terminal entropy $e_t^n(h)$ of h is $e(T^n(D_t))$.
3. The nth relative entropy $e^n(h)$ of h is $e(T^n(D_t)) - e(T^n(D_i))$.

4. Entropic Phase Maps

4.1. Examples of Phase Maps

Given an entropy function e on a subset U of the state space $D$, one may assign relative entropies
$e^{\tau_i\tau_t}(h) = e(\Delta_{\tau_t}) - e(\Delta_{\tau_i})$ to each co-relative history $h : D_i \Rightarrow D_t$ in a kinematic scheme S whose
histories have terminal states in U, where $\Delta_{\tau_i}$ and $\Delta_{\tau_t}$ are terminal states of $D_i$ and $D_t$ with respect to
transitions $\tau_i$ and $\tau_t$. Abstracting Feynman’s approach, one may then associate a quantum-theoretic
phase $\theta_e(r(h)) = \exp\left(i e^{\tau_i\tau_t}(h)\right)$ with the relation r(h) representing h in R(M(S)). As explained in
Section 1.4, this approach may be generalized to allow for target objects other than the unit circle $S^1$,
but such generalization is not carried out here. The subscript e in the expression $\theta_e$ indicates that this
function is defined directly in terms of entropy, rather than multiplicity, entropy per unit volume,
or some other variant of entropic information. Of course, $\theta_e$ also depends on the choices of transitions $\tau_i$
and $\tau_t$, but this dependence is suppressed to avoid notational clutter. For a co-relative kinematics in S,
represented by a chain $\gamma = r(h_0) \prec \ldots \prec r(h_N)$ of relations $r(h_k)$ in R(M(S)) representing co-relative
histories $h_k : D_{i_k} \Rightarrow D_{t_k}$ for $0 \le k \le N$, one may extend $\theta_e$ multiplicatively to define a phase map

$$\Theta_e(\gamma) = \prod_{k=0}^{N} \exp\left(i e^{\tau_{i_k}\tau_{t_k}}(h_k)\right), \qquad (7)$$

where $\Delta_{\tau_{i_k}}$ and $\Delta_{\tau_{t_k}}$ are terminal states of $D_{i_k}$ and $D_{t_k}$ with respect to transitions $\tau_{i_k}$ and $\tau_{t_k}$.
This approach restricts attention to causal Schrödinger-type equations of the form given in Equation (4),
since this equation is defined in terms of a relation function $\theta$, rather than a possibly nonmultiplicative
phase map. Since the target of $h_k$ coincides with the cobase of $h_{k+1}$, it is often reasonable to choose
$\tau_{i_{k+1}} = \tau_{t_k}$. With these choices, the product in Equation (7) telescopes to yield the simpler expression

$$\Theta_e(\gamma) = \exp\left(i\left(e(\Delta_{\tau_{t_N}}) - e(\Delta_{\tau_{i_0}})\right)\right). \qquad (8)$$

This telescoping property implies that the value of $\Theta_e$ is independent of the choice of chain $\gamma$ in
R(M(S)) between $r(h_0)$ and $r(h_N)$, a feature revisited in Section 4.2. It is sometimes convenient to use
the shorthand $e^{\tau_{i_0}\tau_{t_N}}(\gamma)$ for the entropic quantity $e(\Delta_{\tau_{t_N}}) - e(\Delta_{\tau_{i_0}})$ multiplying i in the exponential in
Equation (8), which generalizes the expression $e^{\tau_i\tau_t}(h) = e(\Delta_{\tau_t}) - e(\Delta_{\tau_i})$ appearing in Definition 25
for a single co-relative history $h : D_i \Rightarrow D_t$. The simplest such phase maps $\Theta_e$ are given by choosing
$\Delta_{\tau_{i_k}}$ and $\Delta_{\tau_{t_k}}$ to be the mth-degree terminal states $T^m(D_{i_k})$ and $T^m(D_{t_k})$ defined via the mth-degree
transitions $\tau_{i_k} = \tau^m_{D_{i_k}}$ and $\tau_{t_k} = \tau^m_{D_{t_k}}$ under Definition 13, for some natural number m. I focus principally
on phase maps of this form in what follows. The primitive phase maps discussed in Section 8.2 of [14]
are defined exclusively in terms of terminal states of transitions representing the co-relative histories
$h_0, \ldots, h_N$. The approach described here is more general.
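The telescoping of the product in Equation (7) into Equation (8) is easy to confirm numerically. The sketch below assumes that the terminal-state entropies along a chain are supplied as a plain list of real numbers, with each history's terminal transition reused as the next history's initial transition; the entropy values themselves are arbitrary illustrative inputs.

```python
import cmath


def phase_product(entropies):
    """Multiply exp(i (e_next - e_prev)) over consecutive states of a chain,
    mirroring the product form of the phase map."""
    total = 1 + 0j
    for e_prev, e_next in zip(entropies, entropies[1:]):
        total *= cmath.exp(1j * (e_next - e_prev))
    return total


def telescoped_phase(entropies):
    """exp(i (e_last - e_first)), the telescoped form of the same product."""
    return cmath.exp(1j * (entropies[-1] - entropies[0]))


if __name__ == "__main__":
    # Two hypothetical chains sharing the same initial and terminal entropies.
    chain_a = [0.3, 1.7, 1.1, 4.2, 2.9]
    chain_b = [0.3, 5.0, 2.9]
    print(phase_product(chain_a), telescoped_phase(chain_a))
    print(phase_product(chain_b), telescoped_phase(chain_b))
```

Both chains produce the same phase up to floating-point rounding, since intermediate entropies cancel in pairs.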
Referring to Section 3.4, there are many possible ways to define an entropy function e to determine
specific content for Equation (7) or Equation (8). No specific examples involving resolution entropy are
computed here, since the details of this approach are outside the scope of this paper. In rough terms,
however, the multiplicities assigned to terminal states in this context are the numbers of such states
sharing common resolutions, and the corresponding entropies are the logarithms of these multiplicities.
An obvious qualitative conclusion that may be drawn in this context is that maximizing the entropic
quantity eτi0 τtN (γ) = e(ΔτtN ) − e(Δτi0 ) tends to favor “expanding universe” scenarios, in which the
cardinality of ΔτtN exceeds that of Δτi0 , provided that the sizes of causal atoms are roughly equal
in decompositions of states of different sizes. This qualitative relationship may be understood by
“inverting” the decomposition process, replacing each element in a directed set with a causal atom;
there are clearly more ways to do this for larger sets. Qualitative entropic preference for expanding
universe scenarios is in fact a generic feature of discrete causal notions of entropy; this is a posteriori
obvious on basic enumerative grounds. Cosmological observations do favor accelerating expansion
of spacetime, but the correspondence between large universes and high overall entropy is much too
general to favor discrete causal theory specifically. Conventional thermodynamic systems exhibit
increasing entropy without acquiring new degrees of freedom, and this suggests examining the notion
of entropy per unit volume to “correct” for differences in the sizes of states. This idea is revisited in more
detail below. It should also be emphasized that the quantity eτi0 τtN (γ) appears here in a role analogous
to that of the classical action S in Feynman’s phase map, which is typically minimized for favored
trajectories under Hamilton’s principle of stationary action. This suggests the possibility of adding a
minus sign to the exponents in Equations (7) and (8), thus treating eτi0 τtN (γ) as a “negative action”.
Regardless of this choice, the quantity eτi0 τtN (γ) must obey some analogue of stationary action to
produce suitable interference effects, for example, by exhibiting similar values for similar states of
high entropy. This nontrivial requirement is elaborated in Section 4.2.
A simple specific choice for the entropy function e in Equations (7) and (8) is the nth superset
entropy function $e^n_{\mathrm{SUP}}$ of Definition 18. Choosing $\Delta_{\tau_{i_0}} = T^m(D_{i_0})$ and $\Delta_{\tau_{t_N}} = T^m(D_{t_N})$ in Equation (8)
yields the phase map

$$\Theta_e(\gamma) = \exp\left(i\left(e^n_{\mathrm{SUP}}(T^m(D_{t_N})) - e^n_{\mathrm{SUP}}(T^m(D_{i_0}))\right)\right). \qquad (9)$$

Even this simple phase map is difficult to compute exactly for arbitrary values of m and n, since it
requires calculating all possible ways to add n prehistorical elements and an unspecified number
of relations to $T^m(D_{i_0})$ and $T^m(D_{t_N})$. However, a few special cases may be computed, and rough
qualitative conclusions may be drawn. Beginning with m = 0, $T^0(D_{i_0})$ and $T^0(D_{t_N})$ are just antichains
consisting of the maximal elements of $D_{i_0}$ and $D_{t_N}$, respectively. In the finite case, their cardinalities
are natural numbers $K_{i_0}$ and $K_{t_N}$. If also n = 0, then

$$\Theta_e(\gamma) = \exp\left(i\left(e^0_{\mathrm{SUP}}(T^0(D_{t_N})) - e^0_{\mathrm{SUP}}(T^0(D_{i_0}))\right)\right) = \exp\left(i(\log 1 - \log 1)\right) = e^0 = 1,$$

for any choice of $\gamma$, since there is exactly one way to add zero elements to each of the directed
sets $T^0(D_{i_0})$ and $T^0(D_{t_N})$. More generally, trivial supersets produce trivial superset entropies.
Taking m = 0 and n = 1 in Equation (9) still involves zeroth-degree terminal states, but adds nontrivial
information to these states. The first superset multiplicity $\mu^1_{\mathrm{SUP}}(T^0(D_{i_0}))$ of $T^0(D_{i_0})$ under Definition 18
is $K_{i_0} + 1$, because a superset of an antichain given by adding a single prehistorical element is
determined up to isomorphism by its number of relations, which may range from 0 to $K_{i_0}$ in this case.
Similarly, the multiplicity $\mu^1_{\mathrm{SUP}}(T^0(D_{t_N}))$ is $K_{t_N} + 1$, so with these choices

$$\Theta_e(\gamma) = \exp\left(i\left(\log(K_{t_N} + 1) - \log(K_{i_0} + 1)\right)\right).$$
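The count of K + 1 supersets can be verified by direct enumeration for small antichains. The sketch below assumes that the added prehistorical element is related to each antichain element by at most one relation, so that a superset is specified by the subset of antichain elements it precedes, and it identifies two supersets when some relabeling of the antichain carries one subset to the other. The function names and the sample cardinalities are illustrative.

```python
import cmath
from itertools import combinations, permutations
from math import log


def first_superset_multiplicity_antichain(k):
    """Count isomorphism classes of supersets of a k-element antichain
    obtained by adding one prehistorical element."""
    classes = set()
    for size in range(k + 1):
        for targets in combinations(range(k), size):
            # Canonical representative: the smallest sorted image of the
            # target subset under all relabelings of the antichain.
            canonical = min(tuple(sorted(perm[t] for t in targets))
                            for perm in permutations(range(k)))
            classes.add(canonical)
    return len(classes)


def antichain_phase(k_initial, k_terminal):
    """Phase exp(i (log mu_terminal - log mu_initial)) for m = 0, n = 1."""
    e_initial = log(first_superset_multiplicity_antichain(k_initial))
    e_terminal = log(first_superset_multiplicity_antichain(k_terminal))
    return cmath.exp(1j * (e_terminal - e_initial))


if __name__ == "__main__":
    for k in range(6):
        print(k, first_superset_multiplicity_antichain(k))
    print(antichain_phase(2, 5))
```

The enumeration visits all 2^K subsets and all K! relabelings, so it is intended only as a check for small K.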

Here, the entropic preference for “expanding universe” scenarios is quantitatively obvious, and the
same effect clearly extends to higher-order states and higher-index superset entropy functions,
since there are typically more ways to add families of prehistorical elements to large directed sets than to
small ones. Conventional thermodynamics suggests that working with zeroth-degree terminal states is
likely inadequate to determine relevant entropic quantities, so a more serious treatment involves states
of higher degree. Substituting first-degree terminal states T 1 ( Di0 ) and T 1 ( DtN ) into Equation (9) yields
the most obvious discrete causal analogue of conventional thermodynamic entropy in the superset
context. Zeroth superset entropies offer no useful information, so the first interesting case is given by
setting m = n = 1. This requires computing the number of ways to add a single prehistorical element
to a first-degree terminal state of cardinality K, an interesting enumerative problem. Referring to the
discussion following Figure 15, a very rough estimate of this number is $2^K$, assuming that the state is
nearly rigid. This produces an estimate of

$$\Theta_e(\gamma) \approx \exp\left(i (K_{t_N} - K_{i_0}) \log 2\right)$$

for the resulting phase map, which again suggests an entropic preference for “expanding universe”
scenarios. Applying higher-index entropy maps $e^n_{\mathrm{SUP}}$ in this context leads to further intricate
enumerations, but rough estimates may again be formulated. Ignoring symmetries, overcounting,
and multidirected structure of the type illustrated by McKay’s example in Figure 16, the nth superset
multiplicity $\mu^n_{\mathrm{SUP}}(\Delta)$ of a state $\Delta$ of cardinality K and arbitrary order is roughly

$$\mu^n_{\mathrm{SUP}}(\Delta) \approx \prod_{k=0}^{n-1} 2^{K+k} = 2^{\binom{n}{2} + Kn} = 2^{\frac{n^2}{2} + O(n)}, \qquad (10)$$

which corresponds to superset entropies of roughly $n^2 \log \sqrt{2} + O(n)$. This estimate is derived
by adding prehistorical elements sequentially, and naïvely multiplying together the estimated
multiplicities at each step, beginning with the factor $2^K$ for the first element. The factor $n^2$ explains
the choice of denominators $n^4$ in the summands in
Equation (6), which offers a simple way to ensure convergence of the series. Equation (10) yields better
estimates for higher-order states, which are typically more rigid. For zeroth-order states, it is a very
poor estimate, particularly for low-index superset entropies. For first-degree terminal states, its overall
accuracy depends on the asymptotic behavior of automorphism groups of states of increasing size.
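The rough multiplicity estimate and its effect on the weighted sum of Equation (6) can be explored numerically. The sketch below assumes that the kth prehistorical element is added to a rigid state of cardinality K + k − 1, so that the exponent of 2 accumulates to $\binom{n}{2} + Kn$ as in the closed form of Equation (10), and it substitutes this estimate for the exact superset entropies in the weighted sum. The function names and the sample values of K and n are illustrative.

```python
from math import log


def log_multiplicity_stepwise(n, k):
    """Estimated log multiplicity from adding n elements one at a time:
    the j-th step (j = 0, ..., n-1) contributes (k + j) log 2."""
    return sum((k + j) * log(2) for j in range(n))


def log_multiplicity_closed(n, k):
    """Closed form (n(n-1)/2 + k n) log 2 of the same estimate."""
    return (n * (n - 1) / 2 + k * n) * log(2)


def weighted_entropy_partial_sum(k, terms):
    """Partial sum of the n^-4 weighted series, using the rigid estimate
    in place of the exact nth superset entropies."""
    return sum(log_multiplicity_closed(n, k) / n**4
               for n in range(1, terms + 1))


if __name__ == "__main__":
    print(log_multiplicity_stepwise(4, 10), log_multiplicity_closed(4, 10))
    for terms in (10, 100, 1000, 2000):
        print(terms, weighted_entropy_partial_sum(5, terms))
```

Because the estimated entropies grow like $n^2$, the summands decay like $n^{-2}$, and the partial sums settle quickly.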
The mathematical interest of terminal states of low but nonzero degree arises largely from the
fact that their behavior is balanced between the rigidity of high-order states and the transitivity of
zeroth-order states in a group-theoretic sense. Estimates assuming rigidity, such as Equation (10),
are naturally rough in this context, but can nonetheless provide useful upper bounds. As in the case
of resolution entropy, conventional thermodynamic analogies suggest studying entropies per unit
volume in the superset context. The necessity of demonstrating suitable interference effects under path
summation also remains central. Since there is generally no natural limit to “how far back in time” one
may extend supersets, filtering methods associated with the lexicographic superset entropy system of
Definition 19, such as the weighted sum of entropies in Equation (6), are of interest for organizing
relevant information, while respecting the relative insignificance of the distant past, and producing
finite values for physically meaningful quantities.
The labeled entropy function eLAB of Definition 20 offers another choice for the entropy function e
in Equations (7) and (8). A trivial case is when Δτi0 = T 0 ( Di0 ) and ΔτtN = T 0 ( DtN ). Since these states
are antichains, they are transitive under their automorphism groups; i.e., each consists of a single orbit.
Hence, all labelings of these states are equivalent, so their labeled multiplicities are equal to 1, and their
labeled entropies are equal to zero. Thus, $\Theta_e(\gamma) = e^0 = 1$ for any choice of $\gamma$. For higher-degree states,
the situation is more interesting. Referring again to Definition 20, the labeled multiplicity $\mu_{\mathrm{LAB}}(\Delta)$ of
an arbitrary state $\Delta$ of cardinality K is $K!/|\mathrm{Aut}(\Delta)|$. In particular, the multiplicity of 1 for a zeroth-order
state may be interpreted as the ratio K!/K!. This ratio typically increases toward K! for a sequence of
states of increasing order, since such states tend to become increasingly rigid. For such a sequence
constructed by adding new levels of structure to an initial state, the state cardinality K in the ratio
$K!/|\mathrm{Aut}(\Delta)|$ is itself an increasing function, but this ratio is particularly interesting in the study of
entropy per unit volume, which corrects for increasing K. Low-order states often possess nontrivial
automorphism groups, and the computation of labeled entropies for such states leads to interesting
enumerative problems. The dynamical insignificance of the distant past suggests that these states
are also the most interesting from an evolutionary perspective. For high-degree states $T^m(D_{i_0})$ and
$T^m(D_{t_N})$ of cardinalities $K_{i_0}$ and $K_{t_N}$, abbreviated to $K$ and $K'$ for legibility, typical labeled multiplicities
are approximately $K!$ and $K'!$ by rigidity, and the corresponding entropies are approximately

$$e_{\mathrm{LAB}}(T^m(D_{i_0})) \approx \log K! = K \log K - K + O(\log K)$$

and

$$e_{\mathrm{LAB}}(T^m(D_{t_N})) \approx \log K'! = K' \log K' - K' + O(\log K'),$$

by Stirling’s approximation. These estimates lead to a phase map with values of roughly

$$\Theta_e(\gamma) \approx \exp\left(i \log(K'!/K!)\right) \approx \exp\left(i\left(K' \log K' - K \log K\right)\right), \qquad (11)$$

where the last expression omits the linear and logarithmic terms in Stirling’s approximation,
since rigidity is only generic and asymptotic. As in previous examples, maximizing the entropic
quantity $e^{\tau_{i_0}\tau_{t_N}}(\gamma) \approx K' \log K' - K \log K$ in this context favors “expanding universe” scenarios.
More sophisticated phase maps involving filtering methods such as weighted sums associated with
the labeled entropy system of Definition 21 are also of interest in this context.
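The size of the terms omitted from Stirling’s approximation in Equation (11) can be gauged numerically. The sketch below assumes perfectly rigid states, so that labeled multiplicities are exactly factorials, and compares the exact quantity log(K′!/K!) with the leading-order expression K′ log K′ − K log K; the sample cardinalities are arbitrary illustrative values.

```python
from math import lgamma, log


def log_factorial(k):
    """log k!, computed via the log-gamma function to avoid overflow."""
    return lgamma(k + 1)


def exact_angle(k, k_prime):
    """log(K'!/K!), the exponent for rigid states of sizes K and K'."""
    return log_factorial(k_prime) - log_factorial(k)


def leading_angle(k, k_prime):
    """K' log K' - K log K, the leading-order Stirling expression."""
    return k_prime * log(k_prime) - k * log(k)


def relative_gap(k, k_prime):
    """Relative difference between the leading-order and exact exponents."""
    exact = exact_angle(k, k_prime)
    return abs(leading_angle(k, k_prime) - exact) / exact


if __name__ == "__main__":
    for k in (10**2, 10**4, 10**6):
        print(k, 2 * k, exact_angle(k, 2 * k), leading_angle(k, 2 * k),
              relative_gap(k, 2 * k))
```

The relative gap shrinks only slowly as the cardinalities grow, because the omitted linear term is smaller than the leading term by a factor of order log K.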
Phase maps derived from symmetry entropies may be treated in a similar manner, although high
labeled entropies correspond to low symmetry entropies, and vice versa, after accounting for the
cardinalities of the states under consideration. If $e = e_{\mathrm{SYM}}$, then the symmetry multiplicities of the
zeroth-degree states $T^0(D_{i_0})$ and $T^0(D_{t_N})$ of cardinalities $K$ and $K'$ are $K!$ and $K'!$, so the corresponding
phase $\Theta_e(\gamma) = \exp\left(i \log(K'!/K!)\right)$ is the same as the estimate given in Equation (11) for the phase
induced by labeled entropies of nearly-rigid states $T^m(D_{i_0})$ and $T^m(D_{t_N})$ of the same cardinalities.
Conversely, for nearly-rigid states, phase values induced by symmetry entropies are near $e^0 = 1$.
Again, the most interesting behavior occurs for terminal states of relatively low but nonzero degree,
which possess limited but nontrivial causal structure, and have limited but nontrivial symmetries.
More sophisticated phase maps may be constructed in terms of the symmetry entropy system of
Definition 23. For example, it is interesting to compare entropies associated with terminal states of
different degrees for the same history, using the relative notions introduced in Definition 24.

4.2. Interference Effects

Feynman’s path integral reinforces the contributions of paths near the classical path γCL of
a particle, via constructive interference, while faraway paths are damped out via destructive interference.
Mathematically, this means that the phases assigned to paths near γCL tend to cluster near each other
on the unit circle $S^1$, inducing large amplitudes for neighborhoods of $\gamma_{\mathrm{CL}}$, while the phases assigned to
faraway paths tend to scatter around $S^1$, leading to cancellation. To produce this type of behavior, paths
near $\gamma_{\mathrm{CL}}$ must possess similar phases. As explained in Section 1.2, Feynman’s phase map $\Theta(\gamma) = e^{\frac{i}{\hbar} S(\gamma)}$
satisfies this condition due to Hamilton’s principle, i.e., because $\gamma_{\mathrm{CL}}$ renders the classical action S
stationary. In the discrete causal context, analogous relationships must be identified and exploited for
the path summation approach to succeed. Much of the appeal of entropic phase maps in this setting
arises from the fact that the idea of entropy is sufficiently general to produce a variety of discrete causal
quantities with interesting interference-related behavior that may resemble that of S, while remaining
sufficiently specific to offer meaningful physical interpretations. This is not to suggest that S is similar
to conventional entropy in other ways; indeed, S is a cumulative quantity that is typically minimized
by favored processes, which are typically time-symmetric, while entropy is conventionally understood
as an instantaneous quantity whose increase is observed to follow, and in some settings is believed to
possibly generate, the arrow of time. It is the role of discrete causal entropy in producing desirable
interference effects that must be “action-like” in the context of entropic phase maps. This is one reason
why it is reasonable to simultaneously entertain essentially opposite versions of entropy in this setting,
such as labeled entropy and symmetry entropy. In a similar manner, discrete causal action principles
need not closely resemble conventional motion-related or metric-related action principles in general,
provided that they play an analogous abstract role. The action principles discussed in Section 4.3 are
chosen with conventional definitions in mind, but many other choices are possible.
It is therefore interesting to explore which, if any, discrete causal notions of entropy can produce
“clustering effects” for phases that mimic stationary action in a suitable manner. I begin with a
simple “very early universe scenario” in $S_{PS}$, involving a toy co-relative kinematics represented by
a chain $\gamma = r(h_0) \prec \dots \prec r(h_N)$ of relations $r(h_k)$ in $R(M(S_{PS}))$ representing co-relative histories
$h_k : D_{i_k} \Rightarrow D_{t_k}$ for $0 \le k \le N$. In the general telescoping entropic phase map

$$\Theta_e(\gamma) = \exp\big( i\,( e(\Delta_{\tau_{t_N}}) - e(\Delta_{\tau_{i_0}}) ) \big)$$

of Equation (8), I choose $e$ to be the symmetry entropy function $e_{SYM}$ of Definition 22, and $\Delta_{\tau_{i_0}}$ and
$\Delta_{\tau_{t_N}}$ to be zeroth-degree terminal states $T^0(D_{i_0})$ and $T^0(D_{t_N})$ of cardinalities 5 and 10, respectively.
With these choices, $\Theta_e(\gamma) = \exp\big( i\,(\log 10! - \log 5!) \big) = e^{i(10.3169\ldots)}$. Phases determined by this
particular map are very unstable for small changes in the sizes of $T^0(D_{i_0})$ and $T^0(D_{t_N})$. For example,
adding one additional element to $T^0(D_{t_N})$ yields a phase of $e^{i(12.7148\ldots)}$, which is separated from
$\Theta_e(\gamma)$ by an angle of about $3\pi/4$ on $S^1$. More generally, since $\log (K+1)! - \log K! = \log(K+1)$,
adding even a single additional maximal element to an arbitrary zeroth-order terminal state
produces a much different symmetry multiplicity, and this behavior only increases for large histories.
Working with entropy per unit volume, instead of raw entropy, trades this instability for a profound,
and perhaps excessive, stability. By Stirling’s approximation, the entropy per unit volume of $T^0(D_{t_N})$
is roughly $\log |T^0(D_{t_N})|$ in this example, a quantity which is very stable under small changes in the
size of $T^0(D_{t_N})$. Using ballpark figures for fundamental units, the observable universe may possess
a spatial volume of about $10^{180}$ in a suitable frame of reference, and treating Hubble’s “constant” as
actually constant gives a doubling time of about $10^{60}$. Depending on the choice of kinematic scheme,
one may therefore imagine a chain of perhaps $10^{60}$ to $10^{180}$ co-relative histories leading to a change in
entropy per unit volume of about $\log 2$. Hence, this simplistic notion of entropy per unit volume does
not seem to change very rapidly in the actual universe.
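
The arithmetic in this toy scenario can be checked with a short numerical sketch. It assumes, as in the example above, that the symmetry entropy of a zeroth-degree terminal state with K elements is log K!; the function names are illustrative and do not come from the paper.

```python
import math

def log_factorial(n: int) -> float:
    """Natural logarithm of n!, computed via the log-gamma function."""
    return math.lgamma(n + 1)

def additive_exponent(initial_size: int, final_size: int) -> float:
    """Exponent e(final) - e(initial) of the additive map, assuming the
    symmetry entropy of a zeroth-degree terminal state of size K is log K!."""
    return log_factorial(final_size) - log_factorial(initial_size)

def circle_distance(a: float, b: float) -> float:
    """Smallest angle on the unit circle between the phases e^{ia} and e^{ib}."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

# Raw entropy: cardinalities 5 and 10, then one extra element in the final state.
base = additive_exponent(5, 10)    # log 10! - log 5!
bumped = additive_exponent(5, 11)  # log 11! - log 5!
gap = circle_distance(base, bumped)
print(f"exponents: {base:.4f}, {bumped:.4f}; separation: {gap:.4f} rad")

# Entropy per unit volume, log(K!)/K, for the same two final states.
per_volume_gap = log_factorial(11) / 11 - log_factorial(10) / 10
print(f"per-unit-volume change: {per_volume_gap:.4f}")
```

Under these assumptions the raw-entropy exponents differ by log 11, which is close to 3π/4, while the per-unit-volume quantities differ by less than a tenth of a radian.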
The chain independence property for the general telescoping entropic phase map Θe of
Equation (8) is at least superficially attractive in the path summation context, since it suggests large
amplitudes for processes possessing large numbers of evolutionary pathways. What is really needed,
however, is a stronger property that produces “nearly identical phases” for “nearly identical physics”,
rather than merely producing identical phases for alternative descriptions of identical physics. A class
of maps that often exhibits this type of behavior is the class of telescoping multiplicity phase maps
$$\Theta_\mu(\gamma) = \exp\big( i\,\mu(\Delta_{\tau_{t_N}})/\mu(\Delta_{\tau_{i_0}}) \big). \tag{12}$$

Even a modest increase in entropy between Δτi0 and ΔτtN corresponds to a ratio μ(ΔτtN )/μ(Δτi0 )
that is near zero. Phases Θμ (γ) for chains γ exhibiting large increases in entropy therefore
constructively interfere, clustering near the complex number $e^{i0} = 1$. Similar behavior is not evident in
Equation (8), because the entropic quantity $e_{\tau_{i_0}\tau_{t_N}}(\gamma) = e(\Delta_{\tau_{t_N}}) - e(\Delta_{\tau_{i_0}})$ in the exponent of $\Theta_e$ typically
has nonnegligible magnitude compared to the circumference $2\pi$ of $S^1$. Hence, two chains $\gamma$ and $\gamma'$
with “similar” final co-relative histories exhibiting large but distinct entropies may possess phases
$\Theta_e(\gamma)$ and $\Theta_e(\gamma')$ far apart on $S^1$, which does not suggest encouraging interference properties for $\Theta_e$.
For example, suppose that $\Delta_{\tau_{i_0}}$ is rigid, and compare two different chains $\gamma$ and $\gamma'$ with final co-relative
histories $h_N$ and $h'_N$ exhibiting symmetry multiplicities $\mu_{SYM}(\Delta_{\tau_{t_N}}) = K$ and $\mu_{SYM}(\Delta_{\tau'_{t_N}}) = 6K$.
Here, $\Delta_{\tau_{t_N}}$ and $\Delta_{\tau'_{t_N}}$ may be nearly identical first-degree terminal states, differing, for example, by a
single “trident-shaped” component contributing a symmetry factor of $S_3$. However, the difference
between the entropic quantities $e_{\tau_{i_0}\tau_{t_N}}(\gamma)$ and $e_{\tau_{i_0}\tau'_{t_N}}(\gamma')$ in $\Theta_e(\gamma)$ and $\Theta_e(\gamma')$ is $\log 6$, which translates
to an angular separation exceeding $\pi/2$. This example suggests that very similar processes can
destructively interfere under $\Theta_e$. In contrast, the angular separation between $\Theta_\mu(\gamma')$ and $\Theta_\mu(\gamma)$ in
this example is $1/6K$, so that both phases are very near $e^{i0} = 1$ for large $K$. Unfortunately, the map
$\Theta_\mu$ in Equation (12) seems to exhibit too much constructive interference, in the sense that it assigns
a phase near 1 to every chain involving a modest increase in entropy. The precedent of Feynman’s
phase map $\Theta(\gamma) = e^{\frac{i}{\hbar}S(\gamma)}$ suggests that the entropic quantities multiplying $i$ in a phase map should
not be uniformly small for “physically reasonable” chains. Indeed, by scaling the classical action $S$
by Planck’s reduced constant $\hbar$, Feynman’s map allows these multipliers to differ appreciably for
modestly different paths describing the behavior of systems for which quantum effects are noticeable,
such as the motion of individual electrons.
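
The comparison of the chains γ and γ′ under the additive map can be illustrated numerically. The following is a minimal sketch that assumes symmetry entropy is the logarithm of symmetry multiplicity and that a rigid initial state has multiplicity 1; the helper names are illustrative only.

```python
import math

def additive_exponent(mu_initial: int, mu_final: int) -> float:
    """Exponent of the additive map: log(mu_final) - log(mu_initial),
    assuming entropy is the logarithm of symmetry multiplicity."""
    return math.log(mu_final) - math.log(mu_initial)

def circle_distance(a: float, b: float) -> float:
    """Smallest angle on the unit circle between the phases e^{ia} and e^{ib}."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def trident_separation(K: int) -> float:
    """Angular separation between final multiplicities K and 6K,
    starting from a rigid initial state (multiplicity 1)."""
    return circle_distance(additive_exponent(1, K), additive_exponent(1, 6 * K))

for K in (10, 1_000, 1_000_000):
    print(K, round(trident_separation(K), 4))
```

In this sketch the separation equals log 6 for every K, so it does not shrink as the states grow, which is the sense in which very similar processes may fail to reinforce one another under the additive recipe.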
It seems, then, that the “additive recipe” of Equation (8) may produce too little constructive
interference, while the “multiplicative recipe” of Equation (12) may produce too much. There are many
possible ways to address this issue. It should be noted that the problem with Equation (12) seems to
be much more serious, producing an obviously wrong answer, whereas for Equation (8) it is merely
unclear what the interference behavior looks like for physically realistic histories. If one chooses,
then, to study modifications of Equation (8), there are at least two obvious methods to explore.
First, one may adjust Θe via a positive real-valued scale factor s, analogous to h̄. The resulting phase
map is of the form
$$\Theta_s(\gamma) = \exp\Big( \frac{i}{s}\,\big( e(\Delta_{\tau_{t_N}}) - e(\Delta_{\tau_{i_0}}) \big) \Big). \tag{13}$$
Choosing s > 1 produces more tightly-clustered phases, thereby increasing constructive interference
for similar processes. The obvious question then becomes how to choose s in a non-arbitrary manner.
This immediately suggests a second method of modifying Θe , by adjusting the entropies e(Δτi0 ) and
e(ΔτtN ) individually, via information derived in a natural manner from the co-relative histories h0
and h N . An interesting variant of this approach, foreshadowed above, is to focus on entropy per
unit volume, rather than raw entropy. This involves completely different considerations than does
the conventional thermodynamic study of a variable-volume system, such as a quantity of gas in
a chamber compressed by a piston. Such a system is background dependent and does not involve
spacetime expansion. In the present more-fundamental setting, the study of entropy per unit volume
is partly motivated by the idea that the production of “new spacetime” ought to involve some “cost”,
or obey some analogue of continuity. In particular, one does not observe immediate runaway expansion
of spacetime, even though this tends to produce a large increase in entropy. A general phase map for
finite states defined in terms of entropy per unit volume is the telescoping map
$$\Theta_{e/V}(\gamma) = \exp\big( i\,( e(\Delta_{\tau_{t_N}})/|\Delta_{\tau_{t_N}}| - e(\Delta_{\tau_{i_0}})/|\Delta_{\tau_{i_0}}| ) \big). \tag{14}$$

For an “early universe scenario” involving a version of this map, let $\Delta_{\tau_{i_0}}$ and $\Delta_{\tau_{t_N}}$ be first-degree
terminal states $T^1(D_{i_0})$ and $T^1(D_{t_N})$ of cardinalities 10 and 20, respectively, and suppose that
$|\mathrm{Aut}(\Delta_{\tau_{i_0}})| = 10^2$ and $|\mathrm{Aut}(\Delta_{\tau_{t_N}})| = 10^4$. Then using $e = e_{SYM}$ in Equation (14) yields

$$\Theta_{e/V}(\gamma) = \exp\big( i\,( \log(10^4)/20 - \log(10^2)/10 ) \big) = e^{i0} = 1.$$

A similar process represented by a chain $\gamma'$ whose final co-relative history has the same size
for its first-degree terminal state but twice the symmetry multiplicity produces a phase of
$\Theta_{e/V}(\gamma') \approx e^{i(0.0346\ldots)}$. The angular difference of $0.0346\ldots$ between these two values is much smaller
than the corresponding difference of $\log 2 = 0.6931\ldots$ produced by $\Theta_e$. Hence, $\Theta_{e/V}$ offers an example
of how one may increase constructive interference effects via natural information associated with
evolutionary processes. Precise characterization of these effects in physically realistic scenarios depends
on asymptotic behavior of large states. For example, working with symmetry entropy, states that
are “too rigid” will typically produce values near ei0 = 1 under Equation (14), regardless of the
process involved. On the other hand, states that are “too free” will produce phases for similar processes
insufficiently close to generate adequate constructive interference. Other state-specific modifications
of Equation (8) are also worth considering. For example, natural data associated with states may
be used to determine weights in more sophisticated phase maps involving weighted sums, such as
generalizations of the map given by Equation (6). This is analogous to assigning density functions to
state spaces or weights to individual outcomes in Gibbs or Shannon entropy.
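
The two modifications of Equation (8) discussed in this section can be compared on the example above. This is only a sketch under the stated assumptions: entropy is taken to be the logarithm of symmetry multiplicity, and the function names and the sample values of the scale factor are illustrative rather than taken from the paper.

```python
import math

def scaled_exponent(mu_initial: int, mu_final: int, s: float) -> float:
    """Exponent of the scaled additive map of Equation (13):
    (log mu_final - log mu_initial) / s, with s a positive scale factor."""
    return (math.log(mu_final) - math.log(mu_initial)) / s

def per_volume_exponent(mu_initial: int, size_initial: int,
                        mu_final: int, size_final: int) -> float:
    """Exponent of the per-unit-volume map of Equation (14):
    log(mu_final)/size_final - log(mu_initial)/size_initial."""
    return math.log(mu_final) / size_final - math.log(mu_initial) / size_initial

# Example from the text: cardinalities 10 and 20, multiplicities 10^2 and 10^4.
reference = per_volume_exponent(10**2, 10, 10**4, 20)
# Same sizes, but twice the final symmetry multiplicity.
doubled = per_volume_exponent(10**2, 10, 2 * 10**4, 20)
print(f"per-volume exponents: {reference:.4f}, {doubled:.4f}")

# The unscaled additive map (s = 1) separates the same two chains by log 2.
raw_gap = scaled_exponent(10**2, 2 * 10**4, 1.0) - scaled_exponent(10**2, 10**4, 1.0)
print(f"raw gap: {raw_gap:.4f}")

# Larger scale factors s shrink the gap proportionally.
for s in (1.0, 5.0, 20.0):
    gap_s = scaled_exponent(10**2, 2 * 10**4, s) - scaled_exponent(10**2, 10**4, s)
    print(f"s = {s:>4}: gap = {gap_s:.4f}")
```

In this particular example the per-unit-volume map behaves like the scaled map with s equal to the final state size of 20, which hints at one way a scale factor might arise from data attached to the states themselves rather than being chosen by hand.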

4.3. Objections and Alternatives

Entropic phase maps may be criticized in various ways, and alternative approaches are possible
under the general framework of path summation. Given a choice of dynamics favoring an increase
in a speciﬁed type of entropy, it is prudent to ask whether this dynamics obviously contradicts
established physics. If so, then it can be at best a toy model. Figure 18 illustrates one type of scenario
that may be considered in this context, involving a sequence of co-relative histories $h_7$ to $h_{11}$ beginning
with the initial history $D_7$ from the evolutionary process illustrated in Figure 10. Subsequent histories
in the present process are much different; each is constructed by adding a new element related to
all previously-existing elements. New elements are illustrated by large black nodes. This process
is visually suggestive of gravitational collapse, leading to a “black hole” represented by the chain of
new elements. This analogy is motivated by the fact that causal influence flows exclusively toward
the “black hole”. The automorphism groups $\mathrm{Aut}(T^1(D_k))$ are large symmetric groups; in fact, they are
the largest possible automorphism groups for states of cardinality $|T^1(D_k)|$ that are not antichains.
In particular, they are much larger than the corresponding groups associated with the process illustrated
in Figure 10. Hence, the present process maximizes symmetry entropy for first-degree terminal states.

[Figure 18: diagram of the histories D7, D8, D9, D10, D11, linked by the co-relative histories h8, h9, h10.]

Figure 18. Sequence of co-relative histories hk suggestive of gravitational collapse.

Since gravitational collapse is an important feature of general relativity, one should expect
such processes to be favored for certain histories that are large in ordinary terms but small on
cosmological scales. Similarly, one should expect “expanding universe” scenarios such as those
discussed in Section 4.1 to be favored in an appropriate cosmological sense. However, one should
not expect extreme versions of such processes to dominate all others in every situation, and such
behavior would disqualify any choice of dynamics producing it. Generalizing the present example,
it would discredit the entire idea of entropic phase maps if gravitational collapse scenarios were
found to entropically dominate all other evolutionary pathways combined. Rough computations
suggest that this is not the case. For example, beginning with a history D, one may estimate its
number of direct descendants in SPS , along with the possible sizes of their first-degree terminal state
automorphism groups. If $D$ has cardinality $K$, then there exists one direct descendant $D'$ of $D$ in $S_{PS}$ for
which $\mathrm{Aut}(T^1(D'))$ is isomorphic to $S_K$, with cardinality $K!$, namely, the directed set $D'$ with one new
element related to all elements of $D$. The co-relative history $D \Rightarrow D'$ represents the beginning of the
global gravitational collapse scenario for $D$. Similarly, there are typically about $K$ direct descendants
of $D$ constructed by adding one new element connected to $K - 1$ elements of $D$. There may be fewer
such descendants, due to symmetries, but this is atypical due to rigidity. The first-degree terminal
state automorphism groups of these direct descendants may be as large as $S_{K-1}$, with cardinalities as
large as $(K - 1)!$, though they may be smaller due to symmetry breaking by the “excluded element”.
Next, there are typically about $\binom{K}{2}$ direct descendants of $D$ in $S_{PS}$ constructed by adding one new
element connected to $K - 2$ elements of $D$, with first-degree terminal state automorphism groups
as large as $(K - 2)!$. Continuing this rough enumeration leads to an overestimate of the sum of the
symmetry multiplicities for first-degree terminal states over all direct descendants of $D$ in $S_{PS}$:

$$\text{multiplicity sum} \approx \sum_{k=0}^{K} \binom{K}{k} (K - k)! = \sum_{k=0}^{K} \frac{K!}{k!}.$$

The ratio of the individual multiplicity associated with the beginning of gravitational collapse to the
overall multiplicity sum is therefore roughly

$$K! \Big/ \sum_{k=0}^{K} \frac{K!}{k!} = 1 \Big/ \frac{1}{K!} \sum_{k=0}^{K} \frac{K!}{k!} = 1 \Big/ \sum_{k=0}^{K} \frac{1}{k!} \approx \frac{1}{e} = 0.3678\ldots$$

Though this ratio is actually somewhat larger due to symmetry considerations, as well as the tiny
effect of truncating the rapidly convergent series for e, this computation suggests that the gravitational
collapse scenario does not always entropically dominate all other evolutionary pathways in the case of
symmetry entropy.
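
The rough enumeration above is easy to evaluate exactly for small K. The sketch below takes the overestimated multiplicity sum at face value and uses exact rational arithmetic; the function names are illustrative.

```python
import math
from fractions import Fraction

def multiplicity_sum(K: int) -> int:
    """Overestimated sum of symmetry multiplicities over direct descendants:
    sum over k of C(K, k) * (K - k)!, which equals the sum of K!/k!."""
    return sum(math.comb(K, k) * math.factorial(K - k) for k in range(K + 1))

def collapse_ratio(K: int) -> Fraction:
    """Ratio of the collapse multiplicity K! to the overall multiplicity sum."""
    return Fraction(math.factorial(K), multiplicity_sum(K))

for K in (1, 2, 5, 10, 20):
    print(K, float(collapse_ratio(K)))

print("1/e =", 1 / math.e)
```

Because the partial sums of the series for e approach e from below, the ratio in this sketch approaches 1/e from above as K grows.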
A much more general objection to the idea of entropic phase maps, already mentioned
in Section 4.2, is that it forces together notions that are only distantly related in conventional
situations where the path summation approach to quantum theory is known to succeed and where
the second law of thermodynamics is known to hold. In particular, the interference behavior
of Feynman’s phase map for paths in $\mathbb{R}^4$ is not closely related to conventional entropic data.
As explained in Section 1.2, Feynman’s map $\Theta(\gamma) = e^{\frac{i}{\hbar}S(\gamma)}$ is determined by the classical action
$S(\gamma) = \int_\gamma L\,dt$, where $L$ is the Lagrangian. Hamilton’s principle states that the classical path $\gamma_{CL}$
renders $S(\gamma)$ stationary, and for “sufficiently short” paths, $S(\gamma)$ is generally minimized by $\gamma_{CL}$.
In this context, the Lagrangian L is symmetric under time reversal, so Hamilton’s principle certainly
does not imply the second law. While paths favored by Hamilton’s principle typically do exhibit
increases in entropy in realistic scenarios, this behavior may be attributed to auxiliary details such as
where these paths originate in state space. However, time reversal of a classical system, which generally
involves a systematic decrease in entropy, obeys the equations of motion determined by L just as well
as does the original system. Hence, an analogy between “high entropy” and “stationary action” is not
necessarily motivated by established physics in any compelling way. From this viewpoint, it is not at
all obvious that discrete causal analogues of Feynman’s phase map should depend directly on entropy.
The answer to this objection, already summarized in Section 4.2, is that discrete causal entropy
is neither expected, nor required, to play an “action-like” role in every sense. Nor must it resemble
conventional thermodynamic entropy in the sense of approximation, under which macrostates are
defined via imprecise, rather than merely incomplete, data. Indeed, the only version of entropy
introduced in Section 3 that fits this description is resolution entropy. The remaining versions all
differ from conventional thermodynamic entropy in at least two important respects: first, they do
not involve actual approximation; second, they depend nontrivially on information above first order
at the level of individual histories. More generally, discrete causal entropy must be “action-like”
only in that it produces desirable interference effects, and it must be “entropic” only in that it arises
via comparison of levels of detail under the basic framework of entropy systems. Regardless of
such conventional analogies, combinatorial data encoded in terminal states is likely, on basic
structural grounds, to determine discrete causal dynamics in the background independent setting.
The entropic notions introduced in Section 3.4 enjoy the additional benefits of possessing clear
physical meaning and suggesting effects that are known to be among the most universal in physics.
Hence, these notions stand out from among a relatively limited assortment of reasonable alternatives
for determining specific data for path summation.
Nevertheless, it is illuminating to briefly examine an alternative approach to path summation
in the discrete causal context, expressed via discrete causal action principles related more directly
to conventional motion-related or metric-related ideas. This involves defining discrete causal
“Lagrangians” and “actions” that mimic their conventional counterparts as closely as possible, in the
sense that they are defined in terms of specific “alterations” of individual histories. This is a much
narrower prescription than that of the relation function θ in Equation (4), which is “Lagrangian-like”
in an abstract sense regardless of its actual information content. An immediate difficulty with
this strategy is that notions such as energy, metric structure, and curvature, which are central to
conventional definitions of L and S, are themselves emergent in discrete causal theory. The same
is true of related quantities such as mass and momentum, which are often used to determine
these notions. In partially-background-dependent versions of discrete causal theory, such as quantum
causal set theory, “nongravitational matter” is ascribed to auxiliary fields and particles existing on
directed sets, and it is not too difficult to define reasonable analogues of L and S in this setting.
However, the situation is subtler in the perfectly-background-independent context under the strong
version of the causal metric hypothesis. As explained in Section 3.3, a popular problem in the
study of discrete gravity is how to abstract and generalize the Einstein–Hilbert action SEH [45–47].
However, the metric g and the scalar curvature R used to define SEH are unlikely to possess meaningful
direct analogues at the fundamental scale, where even primitive notions such as dimension and
topological structure are relatively obscure. Success in abstracting such quantities would accomplish
only part of the desired objective in any case, since a genuinely fundamental theory of spacetime
should explain the origins of more basic geometric and pre-geometric properties.
For these reasons, it seems preferable to work at a more conceptual level in defining discrete
causal analogues of L and S. The conceptual content of Hamilton’s principle is that nature is
basically conservative; it favors as little overall alteration as possible in evolving from one state
to another. Setting aside conventional ideas involving the conversion of one type of energy into another,
or the overall motion represented by a path between two points in a manifold, one may formulate discrete
causal action principles embodying this basic concept, hypothesizing that the resulting dynamics will
faithfully preserve the desired physical meaning as one works up from the fundamental scale. In this
context, the most natural discrete causal analogues of L and S are functionals that describe the extent
to which a given history or terminal state is altered in a process leading to another history or terminal
state. One way of describing such alteration is in terms of the elementary operations introduced in
Definition 16, which define the absolute distance between pairs of directed or multidirected sets. There
are at least two possible choices for how to quantify such an action: one may either count the number of
elementary operations necessary to convert one state $\Delta$ to another state $\Delta'$, ignoring ambient histories,
or one may count the number of operations involved in converting a history with terminal state $\Delta$ to a
history with terminal state $\Delta'$. The difference between these two notions of action is analogous to the
difference between absolute distance in Definition 16 and scheme-dependent distances in Definition 17.

Definition 27. Let $h : D_i \Rightarrow D_t$ be a co-relative history in a kinematic scheme S. Let $\Delta_{\tau_i}$ and $\Delta_{\tau_t}$ be terminal
states of $D_i$ and $D_t$ with respect to transitions $\tau_i$ and $\tau_t$, respectively.

1. The state-level Lagrangian quantity $L_{\tau_i \tau_t}(h)$ of $h$ with respect to the pair $(\tau_i, \tau_t)$ is the number of
elementary operations necessary to convert $\Delta_{\tau_i}$ to $\Delta_{\tau_t}$.
2. The history-level Lagrangian $L$ is the functional assigning to each co-relative history $h$ the number of
elementary operations involved in converting $D_i$ to $D_t$, i.e., the number of elements and relations added to
$D_i$ by $h$.
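
The history-level Lagrangian lends itself to a direct computational reading. The following sketch assumes a representative transition in which the initial history is literally a subset of the terminal history, and it stores only the relations that are listed explicitly; the `DirectedSet` type and the function name are hypothetical conveniences, not notation from the paper.

```python
from typing import FrozenSet, NamedTuple, Tuple

class DirectedSet(NamedTuple):
    """A finite directed set: elements plus relations (x, y), read as x precedes y."""
    elements: FrozenSet[int]
    relations: FrozenSet[Tuple[int, int]]

def history_level_lagrangian(d_initial: DirectedSet, d_terminal: DirectedSet) -> int:
    """Number of elements and relations added in passing from d_initial to
    d_terminal, assuming d_initial is included in d_terminal as given."""
    if not (d_initial.elements <= d_terminal.elements
            and d_initial.relations <= d_terminal.relations):
        raise ValueError("initial history must be contained in the terminal history")
    added_elements = d_terminal.elements - d_initial.elements
    added_relations = d_terminal.relations - d_initial.relations
    return len(added_elements) + len(added_relations)

# Growing a three-element chain by one element adds one element and one relation.
chain3 = DirectedSet(frozenset({0, 1, 2}), frozenset({(0, 1), (1, 2)}))
chain4 = DirectedSet(frozenset({0, 1, 2, 3}), frozenset({(0, 1), (1, 2), (2, 3)}))
print(history_level_lagrangian(chain3, chain4))
```

A state-level Lagrangian quantity would instead compare terminal states up to isomorphism, which requires a notion of elementary operations between states and is not attempted here.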

Both $L_{\tau_i \tau_t}(h)$ and $L$ may take on either finite or infinite values in this general setting, though it is
often useful and appropriate to impose finiteness conditions. $L_{\tau_i \tau_t}(h)$ is called a “Lagrangian quantity”
rather than a “Lagrangian” because it depends on choices of transitions $\tau_i$ and $\tau_t$. One may specialize
this definition to define standard Lagrangian functionals. For example, one might define the first-degree
state-level Lagrangian $L^1$ to be the functional assigning the state-level Lagrangian quantity $L_{\tau^1_{D_i} \tau^1_{D_t}}(h)$ to
each co-relative history $h : D_i \Rightarrow D_t$. The history-level quantity $L$ seems much more natural than the
state-level quantity $L_{\tau_i \tau_t}(h)$ in a structural sense. An unattractive aspect of $L_{\tau_i \tau_t}(h)$ is that a sequence
of elementary operations converting $\Delta_{\tau_i}$ to $\Delta_{\tau_t}$ typically identifies structural components of these
two sets that arise from different parts of their corresponding histories. For example, the first-degree
terminal state $\Delta_7$ of the history $D_7$ appearing in the evolutionary process illustrated in Figure 10 may
be converted into the first-degree terminal state $\Delta_8$ by a sequence of three elementary operations,
but only at the expense of identifying “early” structure in $D_7$ with “later” structure in $D_8$.
A good motivation to study state-level quantities such as Lτi τt (h) despite this awkwardness is
that they are related to conventional evolutionary ideas in certain important ways. For example,
one may imagine a history in which “nothing changes”, in the sense that each terminal state of a
given degree “exactly replicates itself”. The simplest example is given by sequential growth of a chain;
at each stage of evolution, the first-degree terminal state of this chain consists of a single relation
connecting its penultimate element to its terminal element. Such a “frozen” or “static” history exhibits
a value of zero at every stage of evolution for an appropriate uniform choice of state-level Lagrangian
quantities Lτi τt (h), such as those induced by the first-degree state-level Lagrangian L1 . This agrees
with the naïve idea of dynamical stasis for this history. By contrast, the value L(h) of the history-level
Lagrangian L at every stage h of the evolution of such a history is a nonzero constant, and a similar
average value for L(h) occurs in “non-static” histories adding roughly the same number of elements
and relations at each evolutionary stage. Such histories may exhibit extreme structural differences
among generations, which may be essentially invisible to L. More generally, state-level quantities
may often detect interesting changes that are invisible to history-level quantities. A closely-related
issue is the problem of how to obtain suitable analogues of conventional evolutionary continuity.
As explained in Section 3.3, the conventional entropic preference for thermal equilibrium is balanced
by the continuity of evolution curves in state space and the fact that such curves may not originate near
the cell representing thermal equilibrium. The same topic was revisited in Section 4.2 in the context of
entropy per unit volume and spacetime expansion. Dynamics that explicitly resists drastic changes in
state-level quantities seems a priori more likely to avoid serious pathologies along these lines than
dynamics defined in terms of history-level quantities.
Each discrete causal Lagrangian induces a corresponding discrete causal action by summing
Lagrangian quantities over sequences of co-relative histories.

Definition 28. Let S be a kinematic scheme, and let $\gamma = r(h_0) \prec \dots \prec r(h_N)$ be a chain in $R(M(S))$
representing a co-relative kinematics in S, where each relation $r(h_k)$ represents a co-relative history $h_k : D_{i_k} \Rightarrow
D_{t_k}$. Let $\Delta_{\tau_{i_k}}$ and $\Delta_{\tau_{t_k}}$ be terminal states of $D_{i_k}$ and $D_{t_k}$ with respect to transitions $\tau_{i_k}$ and $\tau_{t_k}$.

1. The state-level action quantity $S_{\{\tau_{i_k}\},\{\tau_{t_k}\}}(\gamma)$ along $\gamma$ with respect to the pair of sequences of transitions
$\{\tau_{i_k}\} = \{\tau_{i_0}, \dots, \tau_{i_N}\}$ and $\{\tau_{t_k}\} = \{\tau_{t_0}, \dots, \tau_{t_N}\}$ is the sum

$$S_{\{\tau_{i_k}\},\{\tau_{t_k}\}}(\gamma) = \sum_{k=0}^{N} L_{\tau_{i_k} \tau_{t_k}}(h_k).$$

2. The history-level action $S$ is the functional assigning to each chain $\gamma$ the number of elementary operations
involved in converting $D_{i_0}$ to $D_{t_N}$, i.e., the number of elements and relations added to $D_{i_0}$ by the sequence
of co-relative histories $h_0, \dots, h_N$.

As in the case of Lagrangians, the history-level action $S$ seems to be much more natural in a
basic structural sense than the state-level action quantity $S_{\{\tau_{i_k}\},\{\tau_{t_k}\}}(\gamma)$. One obvious complication
involving the latter quantity is that fewer elementary operations are typically required to convert
a state $\Delta$ directly to a state $\Delta''$ than to first convert $\Delta$ to an “interpolating state” $\Delta'$, then convert $\Delta'$
to $\Delta''$. However, the awkwardness of $S_{\{\tau_{i_k}\},\{\tau_{t_k}\}}(\gamma)$ may be ameliorated to some extent by specifying
a uniform choice of transitions $\{\tau_{i_k}\}$ and $\{\tau_{t_k}\}$, for example, first-degree transitions. The resulting
first-degree state-level action functional may be denoted by $S^1$. Again, a good motivation for considering
state-level functionals is that they are more closely related to conventional evolutionary ideas in
certain respects than are history-level functionals. In particular, the history-level functional S does not
distinguish between co-relative kinematics involving state-replicating “static histories” and co-relative
kinematics involving histories in which considerable state-level change occurs, provided that the same
total number of elements and relations are added over the course of each process.
Discrete causal Lagrangians and actions defined in terms of elementary operations on directed
sets supply dynamical alternatives to entropic phase maps under the path summation approach
to quantum theory. For example, one might define an action-induced phase map $\Theta(\gamma) = e^{iS^1(\gamma)}$
using the first-degree state-level action functional $S^1$ introduced above. This raises the obvious
question of how these two general types of dynamics compare. For example, one may consider
the gravitational collapse scenario illustrated in Figure 18. The value of the first-degree state-level
Lagrangian $L^1$ at the $k$th stage of evolution is 2, because the $k$th first-degree terminal state $\Delta_k$
differs from the $(k+1)$st first-degree terminal state $\Delta_{k+1}$ by a single element and a single relation,
up to isomorphism. However, the elements and relations that are identified under such a comparison
are completely different from the perspective of the entire terminal history $D_{k+1}$. The value of the
history-level Lagrangian $L$ at the $k$th stage of evolution is $k + 1$, because one new element and
$k$ new relations are added to the initial history $D_k$. The state automorphism group $\mathrm{Aut}(\Delta_k)$ of $\Delta_k$,
meanwhile, is typically isomorphic to $S_{k-1}$, of cardinality $(k - 1)!$, and the state automorphism
group $\mathrm{Aut}(\Delta_{k+1})$ of $\Delta_{k+1}$ is typically isomorphic to $S_k$, of cardinality $k!$. The ratio of the symmetry
multiplicities $\mu_{SYM}(\Delta_{k+1})/\mu_{SYM}(\Delta_k)$ is therefore typically $k$, and the corresponding increase in
symmetry entropy is typically $\log k$.
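
The bookkeeping for the collapse scenario can be traced with a short simulation. This is a sketch under explicit assumptions: the seven-element starting history is a hypothetical stand-in (the actual structure of D7 from Figure 10 is not reproduced), and the entropy increments simply use the typical value log k quoted above rather than computing automorphism groups.

```python
import math

def collapse_step(elements: set, relations: set):
    """Add one new element related to all existing elements.
    Returns the enlarged sets and the history-level Lagrangian of the step."""
    new_element = max(elements) + 1
    added_relations = {(x, new_element) for x in elements}
    lagrangian = 1 + len(added_relations)  # one element plus k relations
    return elements | {new_element}, relations | added_relations, lagrangian

# Hypothetical stand-in for D7: a seven-element chain (assumption, not Figure 10).
elements = set(range(7))
relations = {(j, j + 1) for j in range(6)}

lagrangians = []
entropy_gain = 0.0
for _ in range(4):  # four steps, taking a 7-element history to an 11-element one
    k = len(elements)
    elements, relations, step_lagrangian = collapse_step(elements, relations)
    lagrangians.append(step_lagrangian)
    entropy_gain += math.log(k)  # typical symmetry-entropy increase quoted in the text

history_level_action = sum(lagrangians)
print("history-level Lagrangians:", lagrangians)
print("history-level action:", history_level_action)
print("typical symmetry-entropy gain:", round(entropy_gain, 4))
```

In this sketch the history-level Lagrangian grows linearly with k while the typical entropy increment grows only logarithmically, which gives a concrete sense of how differently the two kinds of dynamics weigh the same process.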
Interesting structural relationships exist between the Lagrangians and actions introduced in this
section and the entropic notions developed in Section 3. Here, I can only offer vague sketches of a few
of these relationships. For example, the construction of superset microstates may be expressed via
“elementary operations” at the level of kinematic schemes. In particular, the first superset multiplicity
$\mu^1_{SUP}(\Delta)$ in Definition 18 is the number $|R^+(x(\Delta^*))|$ of relations in $M(S_{PS})$ beginning at the element
$x(\Delta^*)$ representing the causal dual $\Delta^*$ of a state $\Delta$. If this multiplicity is $N$, then one may imagine
a “growth process” for $S_{PS}$ that adds the $N$ co-relative histories represented by the elements of
$R^+(x(\Delta^*))$ at some stage of growth. This corresponds to a “history-level action” of roughly $2N$ for
the corresponding stage of growth of $M(S_{PS})$, ignoring multidirected structure, so in this case large
entropy corresponds to large action. However, since supersets encode “growth into the past”, one might
argue for associating a minus sign with this “action”, reversing this relationship. Relative notions
of symmetry entropy such as those introduced in Deﬁnition 24 also involve supersets, and may
therefore be related to such higher-level “action”. However, the most basic question in comparing
a “non-entropic” discrete causal action principle to a choice of discrete causal entropy is whether or
not such a principle, together with the structure of an appropriate discrete causal state space, at least
favors increasing entropy, regardless of whether or not it favors the maximal possible increase at each
evolutionary stage. In this context, an action principle applied to a state space may lead indirectly
to a version of the second law of thermodynamics, even if it is not derived from, or equivalent to,
such a law. This is certainly the case for conventional thermodynamics based on Newtonian physics
applied to ordinary state spaces. Corresponding relationships between discrete causal action principles
and discrete causal entropy remain mostly unexplored.

4.4. Summary and Conclusions

Entropic phase maps offer one possible method of supplying specific dynamical content for the
path summation approach to discrete quantum causal theory developed in [14]. Background and basics
of this approach are reviewed in Sections 1 and 2 of this paper. Such maps assign phases to evolutionary
pathways called co-relative kinematics in a discrete causal history configuration space called a
kinematic scheme. Their role is analogous to the role of Feynman’s phase map in the path summation
approach to ordinary quantum theory [1], which assigns phases to particle paths in a background
spacetime manifold. Each co-relative kinematics consists of a sequence of individual evolutionary
relationships between pairs of histories, called co-relative histories, mathematically represented
by equivalence classes of transitions between pairs of directed sets. A phase map whose values
are multiplicative for concatenation of co-relative kinematics is generated by a relation function θ,
which assigns phases to relations representing individual co-relative histories. Such a phase map
determines a specific version of the causal Schrödinger-type equation

$$\psi^-_{R;\theta}(r) = \theta(r) \sum_{r^- \prec r} \psi^-_{R;\theta}(r^-),$$

reproduced here from Equation (4). In physical terms, a suitable phase map must produce interference
effects that reinforce “reasonable” evolutionary processes, while damping out pathological processes.
In the case of entropic phase maps, this means that the entropic quantities defining these maps
should satisfy a property analogous to Hamilton’s principle of stationary action. In other respects,
these quantities need not resemble the classical action that determines Feynman’s phase map.
In particular, they need not be directly associated with familiar motion-related concepts such as
potential and kinetic energy, which define classical Lagrangians and actions in Newtonian mechanics,
or with metric structure, which determines the Einstein–Hilbert action in general relativity.
Entropy systems, introduced in Section 3.1, offer a general approach to entropy and the second law
of thermodynamics. Conventional versions of the second law involve notions of entropy associated
with “present states”, not with entire histories. In the discrete causal context, this suggests defining
entropies for terminal states of histories, which encode “recent” causes and effects. Such states are
defined in Section 3.3 in terms of transitions between pairs of directed sets. Aside from their evident
physical importance, such states are mathematically interesting due to their symmetry properties,
which exhibit a balance between the typical rigidity of general acyclic directed sets demonstrated
by Bender and Robinson [37], and the transitivity of antichains under their automorphism groups.
There are a variety of ways to define entropies for such states, all of which involve comparing
distinguishability properties of states at different levels of detail. Since multiple such levels merit
simultaneous consideration in discrete causal theory, a sufficiently general approach to discrete
causal entropy requires the use of entropy systems, which organize such levels in a systematic way.
Given two levels of detail, descriptions of a system at the coarser level are called macrostates,

Entropy 2017, 19, 322

while descriptions at the finer level are called microstates. The corresponding notion of entropy
measures the quantity of microstates corresponding to each macrostate in a manner that is additive
for composite systems. An important distinction between conventional thermodynamics and discrete
causal theory is that precise information up to first order typically suffices to determine future evolution
in the former setting, while higher-order information at the level of individual histories is a priori
relevant in the latter setting. In both cases, however, empirical evidence suggests that details of the
distant past should exert negligible influence on future events.
Four general methods of defining discrete causal macrostates and microstates, along with their
associated notions of entropy, and the resulting entropic phase maps, are examined in this paper.
Spaces of states are studied in Section 3.3, entropies in Section 3.4, and phase maps in Section 4.1.
The first method uses the theory of causal atomic resolution, whereby causal structure at the
fundamental scale is approximated by families of coarser causal structures constructed from special
subsets of directed sets, called causal atoms. This leads to the notion of resolution entropy.
This approach is very similar to coarse-graining of state space in conventional thermodynamics; in
particular, it involves actual approximation. The second method supplements the information encoded in
terminal states by describing how they may embed into larger states called supersets. This leads to the
notion of superset entropy. The level of detail in the original states is regarded as “coarse” because it is
incomplete, not because it is approximate. Supersets offer finer detail in the sense that they encode
more complete information. The third method measures distinguishability properties intrinsic to states
by counting the number of distinct ways in which they may be labeled. This leads to the notion of
labeled entropy. Labeled entropy is maximal for states lacking nontrivial symmetries, which meshes
with the intuition that high-entropy states should be “disordered”. The fourth method follows
essentially the opposite approach, by counting symmetries. This leads to the notion of symmetry
entropy. Like superset entropy, both labeled entropy and symmetry entropy involve organizing precise
but incomplete information, rather than actual approximation.
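
The contrast between the third and fourth methods can be made concrete for very small directed sets. The following Python sketch assumes, purely for illustration, that labeled entropy is the logarithm of the number of distinct labelings, n!/|Aut|, and that symmetry entropy is the logarithm of |Aut|; the paper's precise definitions are given in Section 3.4 and may differ in normalization. The function names are hypothetical, and the brute-force search over permutations is only feasible for a handful of elements.

```python
from itertools import permutations
from math import factorial, log


def automorphism_count(n, arcs):
    """Count permutations of {0, ..., n-1} mapping the arc set onto itself."""
    arc_set = set(arcs)
    count = 0
    for perm in permutations(range(n)):
        if {(perm[u], perm[v]) for (u, v) in arc_set} == arc_set:
            count += 1
    return count


def labeled_and_symmetry_entropy(n, arcs):
    """Return (log of the number of labelings, log of |Aut|) under the assumptions above."""
    aut = automorphism_count(n, arcs)
    labelings = factorial(n) // aut  # orbit-stabilizer theorem
    return log(labelings), log(aut)


chain = [(0, 1), (1, 2)]   # rigid: only the identity automorphism
antichain = []             # maximally symmetric: all 3! permutations
print(labeled_and_symmetry_entropy(3, chain))
print(labeled_and_symmetry_entropy(3, antichain))
```

In this toy setting the rigid chain maximizes the labeled quantity and minimizes the symmetry quantity, while the antichain does the opposite, matching the remark that the two methods follow essentially opposite approaches.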
Computation of entropic phase maps in physically realistic situations is analytically involved,
and most of the results in this paper involve toy examples or qualitative results. Many of these appear
in Sections 4.1, 4.2 and 4.3. Discrete causal versions of the second law of thermodynamics favor
expanding universe scenarios, but this conclusion is obvious on basic enumerative grounds, and does
not favor discrete causal theory over other theories in any specific way. There is some evidence that
raw measures of entropy may be too sensitive to minor changes in structure to produce desirable
interference effects. The notion of entropy per unit volume seems more stable in this regard, and is also
attractive in other respects. Since the theory of entropic phase maps is almost completely unexplored,
many versions of the approach can likely be eliminated without serious effort. Symmetry entropy is
doubtful on conventional grounds, and also seems to be vulnerable to pathological instabilities such as
universal gravitational collapse scenarios. However, the idea is not obviously unworkable, and the
desire to model symmetric structures in nature, such as “elementary” particles, renders such notions
worth entertaining. Discrete causal action principles involving elementary operations on directed sets
offer an alternative to entropic phase maps in the path summation context. Relationships exist between
these two approaches, but the details of these connections are unclear at present.
Problems that must be solved to further develop the theory of entropic phase maps
include the enumeration of certain classes of acyclic directed sets, and the computation of their
automorphism groups. These problems may be approached from a mathematical perspective via
the theory of random graphs, and interesting and important results of this nature may be found in
the graph-theoretic literature. However, most of these results are developed from a perspective very
different from the study of fundamental spacetime structure, and the perception of what problems
are interesting is different in this setting as well. Hence, it is not easy to mine the existing body of
graph theory for such results, and many physically relevant topics remain underdeveloped. This is
likely due both to the difficulty of the problems and to differences in emphasis. Particularly useful in this
context would be a thorough analysis of families of directed graphs corresponding to nth-order states.

For example, how would one compute the average number of superset microstates adding $10^3$ elements
to a first-order state of cardinality $10^4$? What is the average size of the automorphism group of a
first-order state with $10^9$ elements and $10^{12}$ relations? For a fixed degree n, how does the average size
of $\mathrm{Aut}(T^n(D))$ scale with the cardinality of D? For a fixed ratio of order to cardinality for states Δ,
how does the average size of Aut(Δ) scale with the cardinality of Δ? Going beyond average quantities,
how are the numbers of superset microstates, or the sizes of state automorphism groups, distributed for
certain classes of states? Are they randomly scattered, or do they tend to cluster around certain values?
Many questions of this nature must be answered before the physical implications of entropic phase
maps can be understood in any detail. Computational resources may also be used to compile numerical
evidence about the behavior of various entropic phase maps for relatively small histories. For example,
it would be very interesting to compute some of the entropic quantities examined in this paper for the
first few generations of the positive sequential kinematic scheme $S_{\mathrm{PS}}$.
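
As a small-scale illustration of the kind of numerical evidence mentioned above, the following Python sketch enumerates all acyclic directed graphs on n labeled nodes by brute force and tabulates how the sizes of their automorphism groups are distributed. It is only a toy under stated assumptions: it treats arbitrary labeled acyclic digraphs rather than the specific families of nth-order states discussed in the paper, all function names are hypothetical, and the exhaustive search is feasible only for very small n.

```python
from itertools import combinations, permutations


def is_acyclic(n, arcs):
    """Repeatedly strip nodes with no incoming arc; failure to strip means a cycle."""
    remaining = set(range(n))
    arcs = set(arcs)
    while remaining:
        free = [v for v in remaining
                if not any(w == v and u in remaining for (u, w) in arcs)]
        if not free:
            return False
        remaining -= set(free)
    return True


def labeled_dags(n):
    """Yield the arc sets of all acyclic digraphs on nodes 0, ..., n-1."""
    pairs = [(u, v) for u in range(n) for v in range(n) if u != v]
    for k in range(len(pairs) + 1):
        for arcs in combinations(pairs, k):
            if is_acyclic(n, arcs):
                yield arcs


def automorphism_count(n, arcs):
    """Count node permutations that map the arc set onto itself."""
    arc_set = set(arcs)
    return sum(1 for perm in permutations(range(n))
               if {(perm[u], perm[v]) for (u, v) in arc_set} == arc_set)


def aut_size_histogram(n):
    """Map each automorphism-group size to the number of labeled DAGs having it."""
    hist = {}
    for arcs in labeled_dags(n):
        size = automorphism_count(n, arcs)
        hist[size] = hist.get(size, 0) + 1
    return hist


print(aut_size_histogram(3))
```

Even at this scale most labeled acyclic digraphs are rigid, which is in line with the rigidity result of Bender and Robinson [37] cited earlier. Note that counting labeled graphs weights each isomorphism class by its number of labelings.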

Acknowledgments: The author thanks Brendan McKay, Johnny Feng, Jessica Garriga, Kiran Bist, and Stephanie
Dribus for useful discussions.
Conflicts of Interest: The author declares no conflict of interest.

References
1. Feynman, R. Space-Time Approach to Non-Relativistic Quantum Mechanics. Rev. Mod. Phys. 1948, 20, 367.
2. Bombelli, L.; Lee, J.; Meyer, D.; Sorkin, R. Space-Time as a Causal Set. Phys. Rev. Lett. 1987, 59, 521.
3. Finkelstein, D. Space-Time Code. Phys. Rev. 1969, 184, 1261.
4. Finkelstein, D. “Superconducting” Causal Nets. Int. J. Theor. Phys. 1988, 27, 473–519.
5. Knuth, K.H.; Bahreyni, N. A potential foundation for emergent space-time. J. Math. Phys. 2014, 55, 112501.
6. Ambjorn, J.; Dasgupta, A.; Jurkiewicz, J.; Loll, R. A Lorentzian cure for Euclidean troubles. Nucl. Phys. B
Proc. Suppl. 2002, 106, 977–979.
7. Markopoulou, F. Quantum Causal Histories. Class. Quantum Gravity 2000, 17, 2059.
8. Rovelli, C. Quantum Gravity. In Cambridge Monographs on Mathematical Physics; Cambridge University Press:
Cambridge, UK, 2004.
9. Thiemann, T. Modern Canonical Quantum General Relativity. In Cambridge Monographs on Mathematical Physics;
Cambridge University Press: Cambridge, UK, 2007.
10. D’Ariano, G.M.; Perinotti, P. Derivation of the Dirac Equation from Principles of Information Processing.
Phys. Rev. A 2014, 90, 062106.
11. Finster, F. Causal Fermion Systems: An Overview. In Quantum Mathematical Physics; Springer: Berlin,
Germany, 2016.
12. Finster, F. The Continuum Limit of Causal Fermion Systems: From Planck Scale Structures to Macroscopic
Physics. In Fundamental Theories of Physics; Springer: Berlin, Germany, 2016.
13. Chen, H.; Sasakura, N.; Sato, Y. Emergent Classical Geometries on Boundaries of Randomly Connected
Tensor Networks. arXiv 2016, arXiv:1601.04232.
14. Dribus, B.F. Discrete Causal Theory: Emergent Spacetime and the Causal Metric Hypothesis; Springer: Berlin,
Germany, 2017.
15. Dribus, B.F. On the Foundational Assumptions of Modern Physics. In Questioning the Foundations, the Frontiers
Collection; Springer: Berlin, Germany, 2015; pp. 45–60.
16. Dribus, B.F. On the Axioms of Causal Set Theory. arXiv 2013, arXiv:1311.2148.
17. D’Ariano, G.M.; Chiribella, G.; Perinotti, P. Quantum Theory From First Principles; Cambridge University Press:
Cambridge, UK, 2017.
18. Knuth, K.H. Information-based Physics: An observer-centric foundation. Contemp. Phys. 2014, 55, 12–32.
19. Verlinde, E. On the origin of gravity and the laws of Newton. J. High Energy Phys. 2011, 4, 29.
20. Kleitman, D.J.; Rothschild, B.L. Asymptotic Enumeration of Partial Orders on a Finite Set. Trans. Am.
Math. Soc. 1975, 205, 205–220.
21. Moore, C. Comment on “Space-Time as a Causal Set”. Phys. Rev. Lett. 1988, 60, 655.
22. Bombelli, L.; Lee, J.; Meyer, D.; Sorkin, R. Bombelli et al. Reply to Comment on “Space-Time as a Causal
Set”. Phys. Rev. Lett. 1988, 60, 656.

23. Hawking, S.W.; King, A.R.; McCarthy, P.J. A new topology for curved space-time which incorporates
the causal, differential, and conformal structures. J. Math. Phys. 1976, 17, 174–181.
24. Malament, D.B. The class of continuous timelike curves determines the topology of spacetime. J. Math. Phys.
1977, 18, 1399–1404.
25. Martin, K.; Panangaden, P. A Domain of Spacetime Intervals in General Relativity. Commun. Math. Phys.
2006, 267, 563–586.
26. Bombelli, L.; Meyer, D. Origin of Lorentzian geometry. Phys. Lett. A 1989, 141, 226–228.
27. Parrikar, O.; Surya, S. Causal topology in future and past distinguishing spacetimes. Class. Quantum Gravity
2011, 28, 155020.
28. Myrheim, J. Statistical Geometry. Available online: https://cds.cern.ch/record/293594/files/197808143.pdf
(accessed on 30 June 2017).
29. ’t Hooft, G. Quantum Gravity: A Fundamental Problem and some Radical Ideas. In Recent Developments in
Gravitation; Springer: New York, NY, USA, 1978; pp. 323–345.
30. Ahmed, M.; Dodelson, S.; Greene, P.B.; Sorkin, R. Everpresent Λ. Phys. Rev. D 2004, 69, 103523.
31. Bombelli, L.; Henson, J.; Sorkin, R. Discreteness without symmetry breaking: A theorem. Mod. Phys. Lett. A
2009, 24, 2579–2587.
32. Harary, F.; Norman, R.Z. Some Properties of Line Digraphs. Rendiconti del Circolo Matematico di Palermo
1960, 9, 161–168.
33. Major, S.A.; Rideout, D.; Surya, S. Spatial Hypersurfaces in Causal Set Cosmology. Class. Quantum Gravity
2006, 23, 4743–4751.
34. Surya, S. Directions in Causal Set Quantum Gravity. In Recent Research in Quantum Gravity; Dasgupta, A., Ed.;
Nova Science Publishing Incorporated: Hauppauge, NY, USA, 2012.
35. Sorkin, R. Expressing entropy globally in terms of (4D) field-correlations. J. Phys. Conf. Ser. 2014, 484, 012004.
36. Sorkin, R.; Yazdi, Y. Entanglement Entropy in Causal Set Theory. arXiv 2016, arXiv:1611.10281v1.
37. Bender, E.A.; Robinson, R.W. The Asymptotic Number of Acyclic Digraphs II. J. Comb. Theory Ser. B
1988, 44, 363–369.
38. Rideout, D.; Sorkin, R. Classical sequential growth dynamics for causal sets. Phys. Rev. D 2000, 61, 024002.
39. Sorkin, R. Toward a Fundamental Theorem of Quantal Measure Theory. Math. Struct. Comput. Sci.
2012, 22, 816–852.
40. Isham, C. Quantum Logic and the Histories Approach to Quantum Theory. J. Math. Phys. 1994, 35, 2157.
41. Isham, C. Topos Theory and Consistent Histories: The Internal Logic of the Set of all Consistent Sets. Int. J.
Theor. Phys. 1997, 36, 785.
42. Isham, C. Quantising on a Category. Found. Phys. 2005, 35, 271–297.
43. Isham, C. Topos Methods in the Foundations of Physics. In Deep Beauty: Understanding the Quantum World
through Mathematical Innovation; Halvorson, H., Ed.; Cambridge University Press: Cambridge, UK, 2011.
44. Penrose, R. Cycles of Time; Vintage Books: New York, NY, USA, 2010.
45. Benincasa, D.M.T.; Dowker, F. Scalar Curvature of a Causal Set. Phys. Rev. Lett. 2010, 104, 181301.
46. Glaser, L. A closed form expression for the causal set D’Alembertian. Class. Quantum Gravity 2014, 31, 5007.
47. Aslanbeigi, S.; Saravani, M.; Sorkin, R. Generalized Causal Set d’Alembertians. arXiv 2014, arXiv:1403.1622.

entropy
Article
Nonclassicality by Local Gaussian Unitary
Operations for Gaussian States
Yangyang Wang 1,† , Xiaofei Qi 1,2, *,† and Jinchuan Hou 1,3,†
1 Department of Mathematics, Shanxi University, Taiyuan 030006, China; [email protected] (Y.W.);
[email protected] (J.H.)
2 Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006, China
3 Department of Mathematics, Taiyuan University of Technology, Taiyuan 030024, China
* Correspondence: [email protected]; Tel.:+86-351-7010555
† These authors contributed equally to this work.

Received: 19 January 2018; Accepted: 6 April 2018; Published: 11 April 2018

Abstract: A measure of nonclassicality N in terms of local Gaussian unitary operations for bipartite
Gaussian states is introduced. N is a faithful quantum correlation measure for Gaussian states as
product states have no such correlation and every non product Gaussian state contains it. For any
bipartite Gaussian state ρ AB , we always have 0 ≤ N (ρ AB ) < 1, where the upper bound 1 is sharp.
An explicit formula of N for (1 + 1)-mode Gaussian states and an estimate of N for (n + m)-mode
Gaussian states are presented. A criterion of entanglement is established in terms of this correlation.
The quantum correlation N is also compared with entanglement, Gaussian discord and Gaussian
geometric discord.

Keywords: quantum correlations; Gaussian states; Gaussian unitary operations; continuous-variable systems

1. Introduction
The presence of correlations in bipartite quantum systems is one of the main features of quantum
mechanics. The most important among such correlations is entanglement [1]. Recently, however,
much attention has been devoted to the study and characterization of quantum correlations
that go beyond the paradigm of entanglement, being necessary but not sufficient for its presence.
Non-entangled quantum correlations also play important roles in various quantum communication
and quantum computing tasks [2–5].
Over the last two decades, various methods have been proposed to quantify quantum correlations,
such as quantum discord (QD) [6,7], geometric quantum discord [8,9], measurement-induced
nonlocality (MIN) [10] and measurement-induced disturbance (MID) [11] for discrete-variable systems.
It is also important to develop new simple criteria for witnessing correlations beyond entanglement for
continuous-variable systems. In this direction, Giorda and Paris [12] and Adesso and Datta [13] independently
introduced the definition of Gaussian QD for Gaussian states and discussed its properties. Adesso
and Girolami [14] proposed the concept of Gaussian geometric discord (GD) for Gaussian states.
Measurement-induced disturbance of Gaussian states was studied in [15], while MIN for Gaussian
states was discussed in [16]. For other related results, see [17,18] and the references therein. Note
that not every quantum correlation defined for discrete-variable systems has a Gaussian analogue for
continuous-variable systems [16]. On the other hand, the values of Gaussian QD and Gaussian GD are
very difficult to compute, and the known formulas cover only some (1 + 1)-mode Gaussian states,
so Gaussian QD and GD reveal little information in general. The purpose of this paper is to introduce a new

Entropy 2018, 20, 266; doi:10.3390/e20040266 395 www.mdpi.com/journal/entropy

measure of nonclassicality for (n + m)-mode quantum states in continuous-variable systems, which is
simpler to compute and can be applied to any (n + m)-mode Gaussian state.
Given a bipartite quantum state ρ acting on the Hilbert space $H_A \otimes H_B$, denote by $\rho_A = \mathrm{Tr}_B(\rho)$ the
reduced density operator of subsystem A. For finite-dimensional systems, the author
of [19] proposed the quantity

$$d_{U_A}(\rho) = \frac{1}{\sqrt{2}} \left\| \rho - (U_A \otimes I)\rho(U_A \otimes I)^{\dagger} \right\|_F,$$

where $\|A\|_F = \sqrt{\mathrm{Tr}(A^{\dagger}A)}$ denotes the Frobenius norm and $U_A$ is any unitary operator satisfying
$[\rho_A, U_A] = 0$. This condition demands that the reduced density matrix of subsystem A be invariant
under the unitary transformation. However, the global density matrix may be changed by such a
local unitary operation, and therefore $d_{U_A}(\rho)$ may be non-zero for some $U_A$. Then, Datta, Gharibian,
et al. discussed in [20,21], respectively, the properties of $d_{U_A}(\rho)$ and revealed that $\max_{U_A} d_{U_A}(\rho)$ can be
used to investigate the nonclassical effect.
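
For intuition, the finite-dimensional quantity can be evaluated by hand on two qubits. The following Python sketch is an illustration only, not code from [19–21]: the helper names are hypothetical, matrices are plain nested lists, and the caller is assumed to supply a unitary that commutes with the reduced state of A. For the Bell state the reduced state is I/2, so the Pauli matrix σ_z qualifies.

```python
from math import sqrt


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def dagger(x):
    return [[x[j][i].conjugate() for j in range(len(x))] for i in range(len(x[0]))]


def kron(x, y):
    # Row index (i, k) and column index (j, l) are flattened in row-major order.
    return [[x[i][j] * y[k][l] for j in range(len(x[0])) for l in range(len(y[0]))]
            for i in range(len(x)) for k in range(len(y))]


def frobenius_norm(x):
    return sqrt(sum(abs(e) ** 2 for row in x for e in row))


def d_local_unitary(rho, u_a, dim_b):
    """(1/sqrt(2)) * || rho - (U_A ⊗ I) rho (U_A ⊗ I)^† ||_F."""
    identity_b = [[1.0 if i == j else 0.0 for j in range(dim_b)] for i in range(dim_b)]
    u = kron(u_a, identity_b)
    rotated = matmul(matmul(u, rho), dagger(u))
    diff = [[rho[i][j] - rotated[i][j] for j in range(len(rho))] for i in range(len(rho))]
    return frobenius_norm(diff) / sqrt(2)


# Bell state (|00> + |11>)/sqrt(2) and the product state |00>.
bell = [[0.5, 0, 0, 0.5], [0, 0, 0, 0], [0, 0, 0, 0], [0.5, 0, 0, 0.5]]
product = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
sigma_z = [[1, 0], [0, -1]]
print(d_local_unitary(bell, sigma_z, 2), d_local_unitary(product, sigma_z, 2))
```

The local unitary leaves the reduced state of A untouched in both cases, but it maps the Bell state to an orthogonal state while leaving the product state unchanged, which is the effect the quantity is designed to detect.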
Motivated by the works in [19–21], we consider an analogue for continuous-variable systems.
In the present paper, we introduce a quantity N in terms of local Gaussian unitary operations for
(n + m)-mode quantum states in Gaussian systems. In contrast to the finite-dimensional case, besides
the local Gaussian unitary invariance property for quantum states, we also show that $N(\rho_{AB}) = 0$
if and only if $\rho_{AB}$ is a Gaussian product state. This reveals that N is a faithful
measure of nonclassicality for Gaussian states: a Gaussian state has this nonclassicality if and only
if it is not a product state. In addition, we show that $0 \le N(\rho_{AB}) < 1$ for each (n + m)-mode
Gaussian state $\rho_{AB}$ and that the upper bound 1 is sharp. An estimate of N for any (n + m)-mode Gaussian
state is provided and an explicit formula of N for any (1 + 1)-mode Gaussian state is obtained.
As an application, a criterion of entanglement for (1 + 1)-mode Gaussian states is established in terms
of N by numerical approaches. Finally, we compare N with Gaussian QD and Gaussian GD to
illustrate that it is a better measure of nonclassicality.

2. Gaussian States and Gaussian Unitary Operations

Recall that, for an arbitrary state ρ in an n-mode continuous-variable system, its characteristic
function $\chi_\rho$ is defined as

$$\chi_\rho(z) = \mathrm{Tr}(\rho W(z)),$$

where $z = (x_1, y_1, \ldots, x_n, y_n)^T \in \mathbb{R}^{2n}$ with $\mathbb{R}$ the field of real numbers and $(\cdot)^T$ the transposition,
and $W(z) = \exp(iR^T z)$ is the Weyl operator. Let $R = (R_1, R_2, \ldots, R_{2n})^T = (\hat{Q}_1, \hat{P}_1, \ldots, \hat{Q}_n, \hat{P}_n)^T$.
As usual, $\hat{Q}_i$ and $\hat{P}_i$ stand respectively for the position and momentum operators for each
$i \in \{1, 2, \ldots, n\}$. They satisfy the Canonical Commutation Relation (CCR) in natural units ($\hbar = 1$)

$$[\hat{Q}_i, \hat{P}_j] = \delta_{ij}\, iI \quad \text{and} \quad [\hat{Q}_i, \hat{Q}_j] = [\hat{P}_i, \hat{P}_j] = 0, \qquad i, j = 1, 2, \ldots, n.$$
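Gaussian states: ρ is called a Gaussian state if $\chi_\rho(z)$ is of the form

$$\chi_\rho(z) = \exp\left[-\frac{1}{4} z^T \Gamma z + i d^T z\right],$$

where

$$d = (\langle \hat{R}_1 \rangle, \langle \hat{R}_2 \rangle, \ldots, \langle \hat{R}_{2n} \rangle)^T = (\mathrm{Tr}(\rho R_1), \mathrm{Tr}(\rho R_2), \ldots, \mathrm{Tr}(\rho R_{2n}))^T \in \mathbb{R}^{2n}$$

is called the mean or the displacement vector of ρ and $\Gamma = (\gamma_{kl}) \in M_{2n}(\mathbb{R})$ is the covariance matrix
(CM) of ρ defined by $\gamma_{kl} = \mathrm{Tr}[\rho(\Delta\hat{R}_k \Delta\hat{R}_l + \Delta\hat{R}_l \Delta\hat{R}_k)]$ with $\Delta\hat{R}_k = \hat{R}_k - \langle \hat{R}_k \rangle$ ([22–24]). Here, $M_{l \times k}(\mathbb{R})$
stands for the set of all l-by-k real matrices and, when l = k, we write $M_{l \times k}(\mathbb{R})$ as $M_l(\mathbb{R})$. Note
that the CM Γ of a state is symmetric and must satisfy the uncertainty principle $\Gamma + i\Delta \ge 0$, where
$\Delta = \oplus_{i=1}^{n} \Delta_i$ with $\Delta_i = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ for each i. From the 2 × 2 diagonal blocks of the above inequality, one can
easily derive the usual Heisenberg uncertainty relation for position and momentum: the diagonal entries
of Γ belonging to mode i satisfy $\gamma_{2i-1,2i-1}\,\gamma_{2i,2i} \ge 1$, that is, $V(\hat{Q}_i)V(\hat{P}_i) \ge 1/4$
with $V(\hat{R}_k) = \langle (\Delta\hat{R}_k)^2 \rangle = \gamma_{kk}/2$ [25].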
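Now assume that $\rho_{AB}$ is any (n + m)-mode Gaussian state. Then, the CM Γ of $\rho_{AB}$ can be
written as

$$\Gamma = \begin{pmatrix} A & C \\ C^T & B \end{pmatrix}, \tag{1}$$

where $A \in M_{2n}(\mathbb{R})$, $B \in M_{2m}(\mathbb{R})$ and $C \in M_{2n \times 2m}(\mathbb{R})$. Particularly, if n = m = 1, by means of local
Gaussian unitary (symplectic at the CM level) operations, Γ has a standard form:

$$\Gamma_0 = \begin{pmatrix} A_0 & C_0 \\ C_0^T & B_0 \end{pmatrix}, \tag{2}$$

where $A_0 = \begin{pmatrix} a & 0 \\ 0 & a \end{pmatrix}$, $B_0 = \begin{pmatrix} b & 0 \\ 0 & b \end{pmatrix}$, $C_0 = \begin{pmatrix} c & 0 \\ 0 & d \end{pmatrix}$, $\Gamma_0 > 0$, $\det \Gamma_0 \ge 1$ and
$\det \Gamma_0 + 1 \ge \det A_0 + \det B_0 + 2 \det C_0$ ([26–29]).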
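
The three conditions on the standard form translate directly into a check on the four parameters a, b, c, d. The following Python sketch is an illustration only; the function name and the numerical tolerance are assumptions. It uses det Γ0 = (ab − c²)(ab − d²), det A0 = a², det B0 = b² and det C0 = cd, and the fact that Γ0 decouples into the two 2 × 2 blocks [[a, c], [c, b]] and [[a, d], [d, b]], so that Γ0 > 0 reduces to a > 0, ab > c² and ab > d².

```python
from math import cosh, sinh


def standard_form_is_physical(a, b, c, d, tol=1e-9):
    """Check Gamma_0 > 0, det Gamma_0 >= 1 and det Gamma_0 + 1 >= det A_0 + det B_0 + 2 det C_0."""
    positive = a > 0 and a * b > c * c and a * b > d * d
    det_gamma = (a * b - c * c) * (a * b - d * d)
    return (positive
            and det_gamma >= 1 - tol
            and det_gamma + 1 >= a * a + b * b + 2 * c * d - tol)


# Vacuum, a two-mode squeezed vacuum with r = 0.5, and an unphysical choice.
print(standard_form_is_physical(1, 1, 0, 0))
print(standard_form_is_physical(cosh(1.0), cosh(1.0), sinh(1.0), -sinh(1.0)))
print(standard_form_is_physical(1, 1, 0.9, -0.9))
```

The tolerance matters for pure states, which saturate det Γ0 = 1 and can fall slightly below 1 in floating-point arithmetic.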
Gaussian unitary operations. Let us consider an n-mode continuous-variable system with
$R = (\hat{Q}_1, \hat{P}_1, \ldots, \hat{Q}_n, \hat{P}_n)^T$. For a unitary operator U, the unitary operation $\rho \mapsto U\rho U^{\dagger}$ is said to
be Gaussian if its output is a Gaussian state whenever its input is a Gaussian state, and such U is called
a Gaussian unitary operator. It is known that a unitary operator U is Gaussian if and only if

$$U^{\dagger} R U = SR + m$$

for some vector m in $\mathbb{R}^{2n}$ and some $S \in \mathrm{Sp}(2n, \mathbb{R})$, the symplectic group of all 2n × 2n real matrices S
that satisfy

$$S \in \mathrm{Sp}(2n, \mathbb{R}) \Leftrightarrow S \Delta S^T = \Delta.$$

Thus, every Gaussian unitary operator U is determined by some affine symplectic map (S, m) acting
on the phase space, and can be denoted by $U = U_{S,m}$ ([23,24]).
The following well-known facts for Gaussian states and Gaussian unitary operations are useful
for our purpose.

Lemma 1 ([23]). For any (n + m)-mode Gaussian state $\rho_{AB}$, write its CM Γ as in Equation (1). Then, the CMs
of the reduced states $\rho_A = \mathrm{Tr}_B\, \rho_{AB}$ and $\rho_B = \mathrm{Tr}_A\, \rho_{AB}$ are the matrices A and B, respectively.

Denote by $S(H_A \otimes H_B)$ the set of all quantum states of $H_A \otimes H_B$, where $H_A$ and $H_B$ are
respectively the state spaces for n-mode and m-mode continuous-variable systems.

Lemma 2 ([30]). If $\rho_{AB} \in S(H_A \otimes H_B)$ is an (n + m)-mode Gaussian state, then $\rho_{AB}$ is a product state,
that is, $\rho_{AB} = \sigma_A \otimes \sigma_B$ for some $\sigma_A \in S(H_A)$ and $\sigma_B \in S(H_B)$, if and only if $\Gamma = \Gamma_A \oplus \Gamma_B$, where Γ, $\Gamma_A$ and
$\Gamma_B$ are the CMs of $\rho_{AB}$, $\sigma_A$ and $\sigma_B$, respectively.

Lemma 3 ([23,24]). Assume that ρ is any n-mode Gaussian state with CM Γ and displacement vector d,
and $U_{S,m}$ is a Gaussian unitary operator. Then, the characteristic function of the Gaussian state $\sigma = U\rho U^{\dagger}$ is
of the form $\exp(-\frac{1}{4} z^T \Gamma_\sigma z + i d_\sigma^T z)$, where $\Gamma_\sigma = S\Gamma S^T$ and $d_\sigma = m + Sd$.
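
At the level of covariance matrices and displacement vectors, Lemma 3 and the symplectic condition $S\Delta S^T = \Delta$ can be checked with a few lines of code. The Python sketch below is illustrative only: the function names are hypothetical, matrices are nested lists, and the single-mode examples (a phase-space rotation, a squeezer, a thermal CM) are chosen for this sketch rather than taken from the paper.

```python
from math import cos, sin, pi


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x):
    return [list(row) for row in zip(*x)]


def symplectic_form(n):
    """Delta: direct sum of n copies of [[0, 1], [-1, 0]]."""
    delta = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        delta[2 * i][2 * i + 1] = 1.0
        delta[2 * i + 1][2 * i] = -1.0
    return delta


def is_symplectic(s, tol=1e-12):
    """Test S Delta S^T = Delta entrywise."""
    n = len(s) // 2
    delta = symplectic_form(n)
    lhs = matmul(matmul(s, delta), transpose(s))
    return all(abs(lhs[i][j] - delta[i][j]) <= tol
               for i in range(2 * n) for j in range(2 * n))


def apply_gaussian_unitary(gamma, d, s, m):
    """Lemma 3: Gamma -> S Gamma S^T and d -> m + S d."""
    new_gamma = matmul(matmul(s, gamma), transpose(s))
    new_d = [m[i] + sum(s[i][k] * d[k] for k in range(len(d))) for i in range(len(d))]
    return new_gamma, new_d


theta = pi / 3
rotation = [[cos(theta), sin(theta)], [-sin(theta), cos(theta)]]
thermal_cm = [[3.0, 0.0], [0.0, 3.0]]   # v * I with symplectic eigenvalue v = 3
d = [1.0, -2.0]
m = [-sum(rotation[i][k] * d[k] for k in range(2)) for i in range(2)]  # m = -S d
new_cm, new_d = apply_gaussian_unitary(thermal_cm, d, rotation, m)
print(is_symplectic(rotation), new_cm, new_d)
```

The rotation leaves the thermal CM invariant, and the choice m = −Sd moves the mean to zero, which are the two facts used repeatedly in the arguments that follow.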

3. Quantum Correlation Introduced by Gaussian Unitary Operations

Now, we introduce a quantum correlation N by local Gaussian unitary operations in the
continuous-variable system.

Definition 1. For any (n + m)-mode quantum state $\rho_{AB} \in S(H_A \otimes H_B)$, the quantum correlation $N(\rho_{AB})$
of $\rho_{AB}$ by Gaussian unitary operations is defined by

$$N(\rho_{AB}) = \frac{1}{2} \sup_{U} \left\| \rho_{AB} - (I \otimes U)\rho_{AB}(I \otimes U^{\dagger}) \right\|_2^2, \tag{3}$$

where $\|\cdot\|_2$ denotes the Hilbert–Schmidt norm, the supremum is taken over all Gaussian unitary operators $U \in B(H_B)$ satisfying $U\rho_B U^{\dagger} = \rho_B$,
and $\rho_B = \mathrm{Tr}_A(\rho_{AB})$ is the reduced state. Here, $B(H_B)$ is the set of all bounded linear operators acting on $H_B$.

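Observe that $N(\rho_{AB}) = 0$ holds for every product state, since
$(I \otimes U)(\sigma_A \otimes \sigma_B)(I \otimes U^{\dagger}) = \sigma_A \otimes U\sigma_B U^{\dagger} = \sigma_A \otimes \sigma_B$ whenever $U\sigma_B U^{\dagger} = \sigma_B$.
Thus, product states contain no such correlation.

Remark 1. For any Gaussian state $\rho_{AB}$, there exist many Gaussian unitary operators U such that $U\rho_B U^{\dagger} = \rho_B$. This
ensures that the definition of the quantity $N(\rho_{AB})$ makes sense for each Gaussian state $\rho_{AB}$.

To see this, we need the Williamson Theorem ([31]), which states that, for any n-mode Gaussian state
$\rho \in S(H)$ with CM $\Gamma_\rho$, there exists a 2n × 2n symplectic matrix S such that $S\Gamma_\rho S^T = \oplus_{i=1}^{n} v_i I_2$ with
$v_i \ge 1$. The diagonal matrix $\oplus_{i=1}^{n} v_i I_2$ and the $v_i$ are called respectively the Williamson form and the
symplectic eigenvalues of $\Gamma_\rho$. By the Williamson Theorem, there exists a Gaussian unitary operator
$U = U_{S,m} = U_{S,-Sd}$ such that $U\rho U^{\dagger} = \otimes_{i=1}^{n} \rho_i$, where the $\rho_i$ are thermal states. Let $S_\theta = \oplus_{i=1}^{n} S_{\theta_i}$ with

$$S_{\theta_i} = \begin{pmatrix} \cos\theta_i & \sin\theta_i \\ -\sin\theta_i & \cos\theta_i \end{pmatrix}, \qquad \theta_i \in [0, \tfrac{\pi}{2}].$$

Then, $S_\theta$ is a symplectic matrix, and the corresponding
Gaussian unitary operator $U_{S_\theta,0} = U_{S_\theta}$ has the form $U_{S_\theta} = \otimes_{i=1}^{n} U_{S_{\theta_i}} = \otimes_{i=1}^{n} \exp(-i\theta_i \hat{a}_i^{\dagger} \hat{a}_i)$. It is easily
checked that $S_\theta (\oplus_{i=1}^{n} v_i I) S_\theta^T = \oplus_{i=1}^{n} v_i I$, and so $U_{S_\theta} (\otimes_{i=1}^{n} \rho_i) U_{S_\theta}^{\dagger} = \otimes_{i=1}^{n} \rho_i$. Now, write $W = U^{\dagger} U_{S_\theta} U$.
Obviously, W is Gaussian unitary and satisfies $W\rho W^{\dagger} = U^{\dagger} U_{S_\theta} U \rho U^{\dagger} U_{S_\theta}^{\dagger} U = \rho$.
We first prove that N is local Gaussian unitary invariant for all quantum states.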

Proposition 1 (Local Gaussian unitary invariance). If $\rho_{AB} \in S(H_A \otimes H_B)$ is an (n + m)-mode quantum
state, then $N((U \otimes V)\rho_{AB}(U^{\dagger} \otimes V^{\dagger})) = N(\rho_{AB})$ holds for any Gaussian unitary operators $U \in B(H_A)$
and $V \in B(H_B)$.

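Proof of Proposition 1. Let $\rho_{AB} \in S(H_A \otimes H_B)$ be an (n + m)-mode quantum state. For any Gaussian
unitary operators $U \in B(H_A)$ and $V \in B(H_B)$, denote $\sigma_{AB} = (U \otimes V)\rho_{AB}(U^{\dagger} \otimes V^{\dagger})$. Then,
$\sigma_B = V\rho_B V^{\dagger}$. For any Gaussian unitary operator $W \in B(H_B)$ satisfying $W\sigma_B W^{\dagger} = \sigma_B$, we have
$WV\rho_B V^{\dagger} W^{\dagger} = V\rho_B V^{\dagger}$. Let $W' = V^{\dagger} W V$. Then, $W'$ is also a Gaussian unitary operator and satisfies
$W' \rho_B W'^{\dagger} = V^{\dagger} W V \rho_B V^{\dagger} W^{\dagger} V = \rho_B$. It is clear that $W'$ runs over all Gaussian unitary operators that
commute with $\rho_B$ when W runs over all Gaussian unitary operators commuting with $\sigma_B$. Hence,
by Equation (3), we have

$$\begin{aligned}
N(\sigma_{AB}) &= \frac{1}{2} \sup_{W} \left\| \sigma_{AB} - (I \otimes W)\sigma_{AB}(I \otimes W^{\dagger}) \right\|_2^2 \\
&= \frac{1}{2} \sup_{W} \left\| (U \otimes V)\rho_{AB}(U^{\dagger} \otimes V^{\dagger}) - (I \otimes W)(U \otimes V)\rho_{AB}(U^{\dagger} \otimes V^{\dagger})(I \otimes W^{\dagger}) \right\|_2^2 \\
&= \sup_{W} \{ \mathrm{Tr}(\rho_{AB}^2) - \mathrm{Tr}(\rho_{AB}(I \otimes V^{\dagger} W V)\rho_{AB}(I \otimes V^{\dagger} W^{\dagger} V)) \} \\
&= \sup_{W'} \{ \mathrm{Tr}(\rho_{AB}^2) - \mathrm{Tr}(\rho_{AB}(I \otimes W')\rho_{AB}(I \otimes W'^{\dagger})) \} \\
&= \frac{1}{2} \sup_{W'} \left\| \rho_{AB} - (I \otimes W')\rho_{AB}(I \otimes W'^{\dagger}) \right\|_2^2 \\
&= N(\rho_{AB}),
\end{aligned}$$

as desired.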

The next theorem shows that $N(\rho_{AB})$ is a faithful nonclassicality measure for Gaussian states.

Theorem 1. For any (n + m)-mode Gaussian state $\rho_{AB} \in S(H_A \otimes H_B)$, $N(\rho_{AB}) = 0$ if and only if $\rho_{AB}$ is a
product state.

Proof of Theorem 1. By Definition 1, the “if” part is apparent. Let us check the “only if” part. Since the
mean of any Gaussian state can be transformed to zero under some local Gaussian unitary operation,
by Proposition 1 it is sufficient to consider those Gaussian states whose means are zero. In the sequel,
assume that $\rho_{AB}$ is an (n + m)-mode Gaussian state with zero mean vector and CM
$\Gamma = \begin{pmatrix} A & C \\ C^T & B \end{pmatrix}$ as in Equation (1), such that $N(\rho_{AB}) = 0$.
By Lemma 1, the CM of $\rho_B$ is B. According to the Williamson Theorem, there exists a
symplectic matrix $S_0$ such that $S_0 B S_0^T = \oplus_{i=1}^{m} v_i I$ and $U_0 \rho_B U_0^{\dagger} = \otimes_{i=1}^{m} \rho_i$, where $U_0 = U_{S_0,0}$ and
the $\rho_i$ are thermal states. Write $\sigma_{AB} = (I \otimes U_0)\rho_{AB}(I \otimes U_0^{\dagger})$. It follows from Proposition 1 that
$N(\sigma_{AB}) = N(\rho_{AB}) = 0$. By Lemma 3, $\sigma_{AB}$ has the CM of the form

$$\Gamma' = \begin{pmatrix} A & C' \\ C'^T & \oplus_{i=1}^{m} v_i I \end{pmatrix}, \qquad C' = C S_0^T,$$

and the mean 0.

For any $\theta_i \in [0, \frac{\pi}{2}]$, i = 1, 2, ..., m, let $S_\theta$ be the symplectic matrix as in Remark 1. Then,
$S_\theta (\oplus_{i=1}^{m} v_i I) S_\theta^T = \oplus_{i=1}^{m} v_i I$ and $U_{S_\theta,0}\, \sigma_B\, U_{S_\theta,0}^{\dagger} = \sigma_B = \mathrm{Tr}_A(\sigma_{AB})$. As $N(\sigma_{AB}) = 0$, by Equation (3),
$\sigma_{AB} = (I \otimes U_{S_\theta,0})\sigma_{AB}(I \otimes U_{S_\theta,0}^{\dagger})$, and hence they must have the same CMs, that is,

$$\begin{pmatrix} A & C' \\ C'^T & \oplus_{i=1}^{m} v_i I \end{pmatrix} = \begin{pmatrix} A & C' S_\theta^T \\ S_\theta C'^T & \oplus_{i=1}^{m} v_i I \end{pmatrix}.$$

Note that $I - S_\theta^T$ is an invertible matrix if we take $\theta_i \in (0, \frac{\pi}{2})$ for each i. Then, it follows from
$C' = C' S_\theta^T$, that is, $C'(I - S_\theta^T) = 0$, that we must have $C' = 0$. Thus, $\sigma_{AB}$ is a product state by Lemma 2, and, consequently,
$\rho_{AB} = (I \otimes U_0^{\dagger})\sigma_{AB}(I \otimes U_0)$ is also a product state.
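
The invertibility claim used in the last step of the proof can be verified by a short computation. The worked equation below is an added illustration; it writes $C'$ for the off-diagonal block of the CM of $\sigma_{AB}$ and uses only the block-diagonal structure of $S_\theta$ from Remark 1.

```latex
\begin{align*}
\det\bigl(I - S_\theta^{T}\bigr)
  &= \prod_{i=1}^{m} \det\begin{pmatrix} 1-\cos\theta_i & \sin\theta_i \\ -\sin\theta_i & 1-\cos\theta_i \end{pmatrix} \\
  &= \prod_{i=1}^{m} \bigl[(1-\cos\theta_i)^2 + \sin^2\theta_i\bigr]
   = \prod_{i=1}^{m} 2\,(1-\cos\theta_i)
   = \prod_{i=1}^{m} 4\sin^2\frac{\theta_i}{2} .
\end{align*}
% Each factor is strictly positive for 0 < theta_i < pi/2, so I - S_theta^T is invertible and
% C'(I - S_theta^T) = 0 forces C' = 0.
```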

We can give an analytic formula of N (ρ AB ) for (1+1)-mode Gaussian state ρ AB . Since N is locally
Gaussian unitary invariant, it is enough to assume that the mean vector of ρ AB is zero and the CM
is standard.

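Theorem 2. For any (1 + 1)-mode Gaussian state $\rho_{AB}$ with CM Γ whose standard form is
$\Gamma_0 = \begin{pmatrix} A_0 & C_0 \\ C_0^T & B_0 \end{pmatrix}$ as in Equation (2), we have

$$N(\rho_{AB}) = \frac{1}{\sqrt{(ab - c^2)(ab - d^2)}} - \frac{1}{\sqrt{(ab - \frac{c^2}{2})(ab - \frac{d^2}{2})}}. \tag{4}$$

Particularly, $N(\rho_{AB}) = 1 - \frac{1}{\sqrt{(ab - c^2/2)(ab - d^2/2)}} = 1 - \frac{2}{\sqrt{4 - 3c^2 d^2 + 2ab(c^2 + d^2)}}$ whenever $\rho_{AB}$ is pure.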

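Proof of Theorem 2. By Proposition 1, we may assume that the mean vector of $\rho_{AB}$ is zero. Let $U_{S,m}$
be a Gaussian unitary operator such that $U_{S,m}\, \rho_B\, U_{S,m}^{\dagger} = \rho_B$. Then, S and m meet the conditions
$S B_0 S^T = B_0$ and $S d_B + m = d_B = 0$. It follows that m = 0. Thus, we can denote
$U_{S,m}$ by $U_S$.
As $S\Delta S^T = \Delta$, there exists some $\theta \in [0, \frac{\pi}{2}]$ such that $S = S_\theta = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$. Thus, the CM of the
Gaussian state $(I \otimes U_S)\rho_{AB}(I \otimes U_S^{\dagger})$ is

$$\Gamma_\theta = \begin{pmatrix}
a & 0 & c\cos\theta & -c\sin\theta \\
0 & a & d\sin\theta & d\cos\theta \\
c\cos\theta & d\sin\theta & b & 0 \\
-c\sin\theta & d\cos\theta & 0 & b
\end{pmatrix},$$

and the mean of $(I \otimes U_S)\rho_{AB}(I \otimes U_S^{\dagger})$ is $(I \oplus S)d + 0 \oplus 0 = 0$ as d = 0. Hence, by Equation (3)
and the overlap formula in Equation (6), one gets

$$\begin{aligned}
N(\rho_{AB}) &= \frac{1}{2} \sup_{U_{S,m}} \left\| \rho_{AB} - (I \otimes U_{S,m})\rho_{AB}(I \otimes U_{S,m}^{\dagger}) \right\|_2^2 \\
&= \sup_{U_{S,m}} \{ \mathrm{Tr}(\rho_{AB}^2) - \mathrm{Tr}(\rho_{AB}(I \otimes U_{S,m})\rho_{AB}(I \otimes U_{S,m}^{\dagger})) \} \\
&= \sup_{\theta \in [0, \frac{\pi}{2}]} \left\{ \frac{1}{\sqrt{\det \Gamma}} - \frac{1}{\sqrt{\det[(\Gamma + \Gamma_\theta)/2]}} \right\} \\
&= \max_{\theta \in [0, \frac{\pi}{2}]} \left\{ \frac{1}{\sqrt{a^2 b^2 - ab(c^2 + d^2) + c^2 d^2}} - \frac{1}{\sqrt{[ab - c^2(1 + \cos\theta)/2][ab - d^2(1 + \cos\theta)/2]}} \right\} \\
&= \frac{1}{\sqrt{(ab - c^2)(ab - d^2)}} - \frac{1}{\sqrt{(ab - c^2/2)(ab - d^2/2)}},
\end{aligned}$$

where the maximum is attained at $\theta = \frac{\pi}{2}$, since the second term decreases in magnitude as $\cos\theta$ decreases.
Hence, Equation (4) is true.

Particularly, if $\rho_{AB}$ is a pure state, then, by [29], we have $1 = \mathrm{Tr}(\rho^2) = \frac{1}{\sqrt{\det \Gamma}} = \frac{1}{\sqrt{(ab - c^2)(ab - d^2)}}$.
Substituting $a^2 b^2 = 1 + ab(c^2 + d^2) - c^2 d^2$ into the second term of Equation (4) gives
$(ab - c^2/2)(ab - d^2/2) = [4 - 3c^2 d^2 + 2ab(c^2 + d^2)]/4$.
This entails that $N(\rho_{AB}) = 1 - \frac{2}{\sqrt{4 - 3c^2 d^2 + 2ab(c^2 + d^2)}}$.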
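
The maximization over θ in the proof can be cross-checked numerically. The following Python sketch is an illustration under stated assumptions: the function names, the grid resolution and the sample parameters (a, b, c, d) = (2, 3, 1, 0.5) are choices made here, not values from the paper. It evaluates the θ-dependent expression from the proof on a grid over [0, π/2] and compares the maximum with the closed form in Equation (4).

```python
from math import cos, pi, sqrt


def overlap_deficit(a, b, c, d, theta):
    """1/sqrt(det Gamma) - 1/sqrt(det[(Gamma + Gamma_theta)/2]) for the standard form."""
    det_gamma = (a * b - c * c) * (a * b - d * d)
    w = (1 + cos(theta)) / 2
    det_mean = (a * b - c * c * w) * (a * b - d * d * w)
    return 1 / sqrt(det_gamma) - 1 / sqrt(det_mean)


def nonclassicality_closed_form(a, b, c, d):
    """Right-hand side of Equation (4)."""
    return (1 / sqrt((a * b - c * c) * (a * b - d * d))
            - 1 / sqrt((a * b - c * c / 2) * (a * b - d * d / 2)))


def nonclassicality_grid(a, b, c, d, steps=1000):
    """Maximize the deficit over a uniform grid of theta in [0, pi/2]."""
    return max(overlap_deficit(a, b, c, d, k * pi / (2 * steps)) for k in range(steps + 1))


params = (2, 3, 1, 0.5)
print(nonclassicality_grid(*params), nonclassicality_closed_form(*params))
```

The two numbers agree up to rounding because the deficit increases monotonically in θ on [0, π/2], so the grid maximum sits at the endpoint θ = π/2.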

For the general (n + m)-mode case, it is difﬁcult to give an analytic formula of N (ρ AB ) for all
(n + m)-mode Gaussian states ρ AB . However, we are able to give an estimate of N (ρ AB ).

Theorem 3. For any (n + m)-mode Gaussian state $\rho_{AB}$ with CM $\Gamma = \begin{pmatrix} A & C \\ C^T & B \end{pmatrix}$ as in Equation (1),
we have

$$0 \le N(\rho_{AB}) \le \frac{1}{\sqrt{\det \Gamma}} - \frac{1}{\sqrt{(\det A)(\det B)}} < 1. \tag{5}$$

Particularly, when $\rho_{AB}$ is pure, $N(\rho_{AB}) \le 1 - \frac{1}{\sqrt{(\det A)(\det B)}}$. Moreover, the upper bound 1 in the inequality
(5) is sharp, that is, we have

$$\sup_{\rho_{AB}} N(\rho_{AB}) = 1.$$

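Proof of Theorem 3. By Proposition 1, without loss of generality, we may assume that the mean of
$\rho_{AB}$ is 0. Let $U_{S,m}$ be a Gaussian unitary operator such that $U_{S,m}\, \rho_B\, U_{S,m}^{\dagger} = \rho_B$. Then, the CM and the
mean of the Gaussian state $(I \otimes U_{S,m})\rho_{AB}(I \otimes U_{S,m}^{\dagger})$ are
$\Gamma_U = \begin{pmatrix} A & C S^T \\ S C^T & B \end{pmatrix}$ and 0, respectively.
Note that, for any n-mode Gaussian states ρ, σ with CMs $V_\rho$, $V_\sigma$ and means $d_\rho$, $d_\sigma$, respectively, it is
shown in [32] that

$$\mathrm{Tr}(\rho\sigma) = \frac{1}{\sqrt{\det[(V_\rho + V_\sigma)/2]}} \exp\left[-\frac{1}{2}\, \delta d^T \left[(V_\rho + V_\sigma)/2\right]^{-1} \delta d\right], \quad \text{where } \delta d = d_\rho - d_\sigma. \tag{6}$$

Hence,

$$\begin{aligned}
N(\rho_{AB}) &= \frac{1}{2} \sup_{U} \left\| \rho_{AB} - (I \otimes U)\rho_{AB}(I \otimes U^{\dagger}) \right\|_2^2 \\
&= \sup_{U} \{ \mathrm{Tr}(\rho_{AB}^2) - \mathrm{Tr}(\rho_{AB}(I \otimes U)\rho_{AB}(I \otimes U^{\dagger})) \} \\
&= \sup_{S} \left\{ \frac{1}{\sqrt{\det \Gamma}} - \frac{1}{\sqrt{\det[(\Gamma + \Gamma_U)/2]}} \right\}.
\end{aligned}$$

Since A > 0, B > 0 and $\frac{\Gamma + \Gamma_U}{2} = \begin{pmatrix} A & \frac{C + C S^T}{2} \\ \frac{C^T + S C^T}{2} & B \end{pmatrix}$, by Fischer’s inequality (p. 506, [33]), we have
$\det \frac{\Gamma + \Gamma_U}{2} \le (\det A)(\det B)$. Thus, we get $N(\rho_{AB}) \le \frac{1}{\sqrt{\det \Gamma}} - \frac{1}{\sqrt{(\det A)(\det B)}}$. If $\rho_{AB}$ is a pure state, then
$1 = \mathrm{Tr}(\rho_{AB}^2) = \frac{1}{\sqrt{\det \Gamma}}$, which gives $N(\rho_{AB}) \le 1 - \frac{1}{\sqrt{(\det A)(\det B)}}$.
Notice that, by Equation (6), we have $\frac{1}{\sqrt{\det \Gamma}} = \mathrm{Tr}(\rho_{AB}^2) \le 1$. This implies that
$N(\rho_{AB}) \le \frac{1}{\sqrt{\det \Gamma}} - \frac{1}{\sqrt{(\det A)(\det B)}} < 1$ since det A > 0 and det B > 0, that is, the inequality (5) is true.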
To see that the upper bound 1 is sharp, consider the two-mode squeezed vacuum state
$\rho(r) = S(r)|00\rangle\langle 00|S^{\dagger}(r)$, where $S(r) = \exp(-r\hat{a}_1 \hat{a}_2 + r\hat{a}_1^{\dagger} \hat{a}_2^{\dagger})$ is the two-mode squeezing
operator with squeezing parameter $r \ge 0$ and $|00\rangle$ is the vacuum state ([24]). The CM
of ρ(r) is $\frac{1}{2}\begin{pmatrix} A_0 & B_0 \\ B_0 & A_0 \end{pmatrix}$, where

$$A_0 = \begin{pmatrix} \exp(-2r) + \exp(2r) & 0 \\ 0 & \exp(-2r) + \exp(2r) \end{pmatrix} \quad \text{and} \quad B_0 = \begin{pmatrix} -\exp(-2r) + \exp(2r) & 0 \\ 0 & \exp(-2r) - \exp(2r) \end{pmatrix}.$$

By Theorem 2, it is easily calculated that

$$N(\rho(r)) = 1 - \frac{8}{6 + \exp(-4r) + \exp(4r)}.$$

Clearly, $N(\rho(r)) \to 1$ as $r \to \infty$, thus

$$\sup_{r} N(\rho(r)) = 1,$$

completing the proof.
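
The sharpness argument can be reproduced numerically. In the Python sketch below, which is illustrative and uses hypothetical function names, the two-mode squeezed vacuum is written in standard form with a = b = cosh 2r, c = sinh 2r and d = −sinh 2r, as read off from the CM above. The value of N is computed from Equation (4), compared with the closed form in r, and checked against the bound of Theorem 3, where $\sqrt{(\det A)(\det B)} = ab$ for the standard form.

```python
from math import cosh, exp, sinh, sqrt


def tmsv_standard_form(r):
    """Standard-form parameters (a, b, c, d) of the two-mode squeezed vacuum."""
    return cosh(2 * r), cosh(2 * r), sinh(2 * r), -sinh(2 * r)


def nonclassicality(a, b, c, d):
    """Equation (4)."""
    return (1 / sqrt((a * b - c * c) * (a * b - d * d))
            - 1 / sqrt((a * b - c * c / 2) * (a * b - d * d / 2)))


def theorem3_bound(a, b, c, d):
    """Upper bound of Equation (5) specialized to the standard form."""
    det_gamma = (a * b - c * c) * (a * b - d * d)
    return 1 / sqrt(det_gamma) - 1 / (a * b)


for r in (0.1, 0.5, 1.0, 2.0):
    params = tmsv_standard_form(r)
    closed_form = 1 - 8 / (6 + exp(-4 * r) + exp(4 * r))
    print(r, nonclassicality(*params), closed_form, theorem3_bound(*params))
```

The printed values increase with r and stay below both the Theorem 3 bound and 1. For much larger r the difference cosh² 2r − sinh² 2r suffers from floating-point cancellation, so the closed form in r is the more reliable expression there.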

4. Comparison with Other Quantum Correlations

Entanglement is one of the most important quantum correlations, being central in most quantum
information protocols [1]. However, it is an extremely difficult task to verify whether a given quantum
state is entangled or not. Recall that a quantum state $\rho_{AB} \in S(H_A \otimes H_B)$ is said to be separable if
it belongs to the closed convex hull of the set of all product states $\rho_A \otimes \rho_B \in S(H_A \otimes H_B)$. Note
that a state $\rho_{AB}$ is separable if and only if it admits a representation $\rho_{AB} = \int_X \rho_A(x) \otimes \rho_B(x)\, \pi(dx)$,
where $\pi(dx)$ is a Borel probability measure and $\rho_{A(B)}(x)$ is a Borel $S(H_{A(B)})$-valued function on some
complete, separable metric space X [34]. One of the most useful separability criteria is the positive
partial transpose (PPT) criterion, which can be found in [35,36]. The PPT criterion states that if a
state is separable, then its partial transposition is positive. For discrete systems, the positivity of the
partial transposition of a state is necessary and sufficient for its separability in the 2 ⊗ 2 and 2 ⊗ 3
cases. However, it is not true for higher dimensional systems [36]. For continuous systems, in [27,37],
the authors extended the PPT criterion to (n + m)-mode continuous systems. It is remarkable that
any (1 + n)-mode Gaussian state has PPT if and only if it is separable. Furthermore, for the
(1 + 1)-mode case, it is shown that a (1 + 1)-mode Gaussian state $\rho_{AB}$ is separable if and only if $\bar{v}_- \ge 1$,
where $\bar{v}_-$ is the smallest symplectic eigenvalue of the CM of the partial transpose $\rho_{AB}^{T_B}$ [24,29].
Comparing $\mathcal{N}$ with the entanglement, we conjecture that there exists some positive number d < 1
such that $\mathcal{N}(\rho_{AB}) \le d$ for any (n + m)-mode separable Gaussian state $\rho_{AB}$, that is,

$$\sup_{\rho_{AB}\ \text{is separable}} \mathcal{N}(\rho_{AB}) \le d < 1.$$

If this is true, then $\rho_{AB}$ is entangled whenever $\mathcal{N}(\rho_{AB}) > d$. This would give a criterion of entanglement
for (n + m)-mode Gaussian states in terms of the correlation $\mathcal{N}$. Though we cannot give a mathematical
proof, a numerical approach indicates that this is true for (1 + 1)-mode separable Gaussian states with $d \le \frac{1}{10}$
(Firstly, we randomly generated one million, five million, ten million, fifty million,
one hundred million, and five hundred million separable Gaussian states with a, b, |c|, |d| ranging from 1
to 2, respectively. We found that the maximum of $\mathcal{N}$ is smaller than 0.09. Secondly, we used the same
method and extended the range to 5. Then, the maximum of $\mathcal{N}$ is smaller than 0.1. Thirdly, using the
same method and extending the range to 10, 100, 1000, 10000, respectively, we found that the maximum
of $\mathcal{N}$ is still smaller than 0.1. We repeated the above computations ten times, and the result was
the same).

Proposition 2. N (ρ AB ) ≤ 0.1 for any (1 + 1)-mode separable Gaussian state ρ AB .

It follows from Theorem 1 that the quantum correlation $\mathcal{N}$ exists in all entangled Gaussian
states and in all separable Gaussian states that are not product states. In addition, Proposition 2 can be
viewed as a sufficient condition for the entanglement of two-mode Gaussian states: if $\mathcal{N}(\rho_{AB}) > 0.1$,
then $\rho_{AB}$ is entangled.
To gain insight into the behavior of the quantum correlation $\mathcal{N}$ and to compare it with the
entanglement and the discords, we consider a class of physically relevant states, the squeezed thermal
states (STSs). This kind of Gaussian state is used by many authors to illustrate the behavior of several
interesting quantum correlations [12,13]. Recall that a two-mode Gaussian state $\rho_{AB}$ is an STS if
$\rho_{AB} = S(r)\,\nu_1(\bar{n}_1) \otimes \nu_2(\bar{n}_2)\,S(r)^{\dagger}$, where $\nu_i(\bar{n}_i) = \sum_k \frac{\bar{n}_i^{k}}{(1+\bar{n}_i)^{k+1}}|k\rangle\langle k|$ is the thermal state with thermal
photon number $\bar{n}_i$ (i = 1, 2) and $S(r) = \exp\{r(\hat{a}_1^{\dagger}\hat{a}_2^{\dagger} - \hat{a}_1\hat{a}_2)\}$ is the
two-mode squeezing operator.
Particularly, when $\bar{n}_1 = \bar{n}_2 = 0$, $\rho_{AB}$ is a pure two-mode squeezed vacuum state, also known as an
Einstein–Podolsky–Rosen (EPR) state [24]. When $\bar{n}_1 > 0$ or $\bar{n}_2 > 0$, $\rho_{AB}$ is a mixed Gaussian state.

For fixed r, $\rho_{AB}$ is separable (not in product form) for large enough $\bar{n}_1$, $\bar{n}_2$. Notice that if $\rho_{AB}$ is an STS with
the CM $\Gamma_0$ in the standard form in Equation (2), then c = −d. In this case, by Theorem 2, we have

$$\mathcal{N}(\rho_{AB}) = \frac{1}{ab - c^2} - \frac{1}{ab - c^2/2}. \qquad (7)$$

In this parametrization, $a = 2\bar{n}_r + 1 + 2\bar{n}_1(1 + \bar{n}_r) + 2\bar{n}_2\bar{n}_r$, $b = 2\bar{n}_r + 1 + 2\bar{n}_2(1 + \bar{n}_r) + 2\bar{n}_1\bar{n}_r$
and $c = -d = 2(1 + \bar{n}_1 + \bar{n}_2)\sqrt{\bar{n}_r(1 + \bar{n}_r)}$, where $\bar{n}_r = \sinh^2 r$ ([12]). In particular, if $\bar{n}_1 = \bar{n}_2 = \bar{n}$,
then $\rho_{AB}$ is called a symmetric squeezed thermal state (SSTS). Now assume that $\rho_{AB}$ is an SSTS. Then,
$\rho_{AB}$ is a mixed state if and only if $\bar{n} > 0$. The global purity of $\rho_{AB}$ is $\mu = \mathrm{Tr}(\rho_{AB}^2) = \frac{1}{(1+2\bar{n})^2}$ and the
smallest symplectic eigenvalue $\bar{v}_-$ of the CM of $\rho_{AB}^{T_B}$ is $\bar{v}_- = \frac{1+2\bar{n}}{\exp(2r)}$. Moreover, $\rho_{AB}$ is entangled if and
only if $\bar{v}_- < 1$.
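The STS parametrization and Equation (7) translate directly into a few lines of code. The sketch below is illustrative only: the function names are ours, and the square root in c is taken from the standard STS covariance matrix. It evaluates $\mathcal{N}$ for an STS and the purity and $\bar{v}_-$ of an SSTS.

```python
import math


def sts_parameters(n1: float, n2: float, r: float):
    """Standard-form CM entries (a, b, c) of a squeezed thermal state; d = -c."""
    nr = math.sinh(r) ** 2
    a = 2 * nr + 1 + 2 * n1 * (1 + nr) + 2 * n2 * nr
    b = 2 * nr + 1 + 2 * n2 * (1 + nr) + 2 * n1 * nr
    c = 2 * (1 + n1 + n2) * math.sqrt(nr * (1 + nr))
    return a, b, c


def n_sts(n1: float, n2: float, r: float) -> float:
    """Equation (7): N = 1/(ab - c^2) - 1/(ab - c^2/2)."""
    a, b, c = sts_parameters(n1, n2, r)
    return 1 / (a * b - c * c) - 1 / (a * b - c * c / 2)


def ssts_purity(n: float) -> float:
    """Global purity mu = 1/(1 + 2n)^2 of a symmetric STS."""
    return 1 / (1 + 2 * n) ** 2


def ssts_nu_minus(n: float, r: float) -> float:
    """Smallest symplectic eigenvalue of the partially transposed CM of an SSTS."""
    return (1 + 2 * n) / math.exp(2 * r)


def ssts_is_entangled(n: float, r: float) -> bool:
    """An SSTS is entangled if and only if nu_minus < 1."""
    return ssts_nu_minus(n, r) < 1


print(n_sts(0.0, 0.0, 0.7))          # pure two-mode squeezed vacuum
print(n_sts(1.0, 2.0, 0.7))          # nonsymmetric mixed STS
print(ssts_is_entangled(2.0, 0.3))   # thermal noise dominates the squeezing
```

For $\bar{n}_1 = \bar{n}_2 = 0$, `n_sts` reproduces the two-mode squeezed vacuum value $1 - 8/(6 + \exp(-4r) + \exp(4r))$ used in the sharpness argument.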
We first discuss the relation between $\mathcal{N}$ and the entanglement by considering SSTSs. Regard
$\mathcal{N}(\rho_{AB})$ as a function of μ and $\bar{v}_-$. From Figure 1a, we see that the value of $\mathcal{N}$ at any
separable SSTS is always smaller than 0.06, which supports Proposition 2. From Figure 1b,
for fixed purity μ, $\mathcal{N}$ turns out to be a decreasing function of $\bar{v}_-$. However, for fixed $\bar{v}_-$, $\mathcal{N}$ tends to 0
when μ increases.
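The dependence of $\mathcal{N}$ on (μ, $\bar{v}_-$) for an SSTS can be explored numerically by inverting $\mu = 1/(1+2\bar{n})^2$ and $\bar{v}_- = (1+2\bar{n})/\exp(2r)$ for ($\bar{n}$, r) and then applying Equation (7). The sketch below makes several assumptions of our own: the function names, the grid resolution, and the small rounding tolerance are illustrative. It scans the separable region $\bar{v}_- \ge 1$ on a grid.

```python
import math


def ssts_from_mu_nu(mu: float, nu: float):
    """Invert mu = 1/(1+2n)^2 and nu = (1+2n)/exp(2r) for (n, r)."""
    k = 1.0 / math.sqrt(mu)                  # k = 1 + 2n
    r = 0.5 * math.log(k / nu)
    if r < -1e-12:                           # tolerance for rounding only
        raise ValueError("nu exceeds 1/sqrt(mu); no squeezing r >= 0 exists")
    return (k - 1.0) / 2.0, max(r, 0.0)


def n_ssts(mu: float, nu: float) -> float:
    """N of a symmetric STS via Equation (7), as a function of (mu, nu_minus)."""
    n, r = ssts_from_mu_nu(mu, nu)
    nr = math.sinh(r) ** 2
    a = 2 * nr + 1 + 2 * n * (1 + nr) + 2 * n * nr   # a = b for an SSTS
    c = 2 * (1 + 2 * n) * math.sqrt(nr * (1 + nr))
    return 1 / (a * a - c * c) - 1 / (a * a - c * c / 2)


def max_n_separable(steps_mu: int = 200, steps_nu: int = 50) -> float:
    """Grid maximum of N over separable SSTSs (1 <= nu_minus <= 1/sqrt(mu))."""
    best = 0.0
    for i in range(1, steps_mu + 1):
        mu = i / steps_mu
        nu_max = 1.0 / math.sqrt(mu)
        for j in range(steps_nu + 1):
            nu = 1.0 + (nu_max - 1.0) * j / steps_nu
            best = max(best, n_ssts(mu, nu))
    return best


print(max_n_separable())
```

On this grid the maximum stays below 0.06 and is attained on the separability boundary $\bar{v}_- = 1$, in line with the behavior described for Figure 1.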

Figure 1. (a) N (ρ AB ) for separable SSTSs as a function of μ and v̄− ; (b) from top to bottom,
v̄− = 1.0, 1.2, 1.5, 2.0.

For the entangled SSTS, one sees from Figure 2a,b that the value of $\mathcal{N}$ ranges from 0 to 1. This reveals
that, for some entangled SSTSs, $\mathcal{N}$ can be smaller than $\frac{1}{10}$. Thus, Proposition 2 is only a necessary
condition for a Gaussian state to be separable. For fixed purity μ, from Figures 1b and 2b, $\mathcal{N}(\rho_{AB})$
increases when entanglement increases (that is, $\bar{v}_- \to 0$) and $\lim_{\mu\to 1,\,\bar{v}_-\to 0} \mathcal{N} = 1$. However, for fixed
$\bar{v}_-$, the dependence of $\mathcal{N}$ on μ is more complex.

Figure 2. (a) N (ρ AB ) for entangled SSTS as a function of μ and v̄− ; (b) from top to bottom,
v̄− = 0.1, 0.2, 0.5, 0.8.


Regarding $\mathcal{N}$ as a function of r and $\bar{n}$, Figure 3 shows that $\mathcal{N}(\rho_{AB})$ is an increasing function of
r and a decreasing function of $\bar{n}$. The value of $\mathcal{N}(\rho_{AB})$ always attains its maximum at
$\bar{n} = 0$, that is, at pure states. Figure 3b also shows that $\mathcal{N}(\rho_{AB})$ depends almost only on $\bar{n}$ when r is
large enough, because the curves for r = 5, 10, 20 are almost the same.

Figure 3. N (ρ AB ) for SSTS as a function of n̄ and r. (a) from top to bottom n̄ = 0, 0.5, 1, 2, 3; (b) from
top to bottom r = 0.5, 1, 5, 10, 20.

Recall that an n-mode Gaussian positive operator-valued measure (GPOVM) is a collection
of positive operators Π = {Π(z)} satisfying $\int_z \Pi(z)\,dz = I$, where $\Pi(z) = W(z)\,\omega\,W^{\dagger}(z)$, $z \in \mathbb{R}^{2n}$,
with W(z) the Weyl operators and ω an n-mode Gaussian state, which is called the seed of the
GPOVM Π [38,39]. Let $\rho_{AB}$ be an (n + m)-mode Gaussian state and Π = {Π(z)} be a GPOVM of the
subsystem B. Denote by $\rho_A(z) = \frac{1}{p(z)}\mathrm{Tr}_B(\rho_{AB}\, I \otimes \Pi(z))$ the reduced state of system A after the
GPOVM Π is performed on system B, where $p(z) = \mathrm{Tr}(\rho_{AB}\, I \otimes \Pi(z))$. Write the von Neumann
entropy of a state ρ as S(ρ), that is, S(ρ) = −Tr(ρ log ρ). Then, the Gaussian QD of $\rho_{AB}$ is defined as
$D(\rho_{AB}) = S(\rho_B) - S(\rho_{AB}) + \inf_{\Pi} \int dz\, p(z)\, S(\rho_A(z))$ [12,13], where the infimum is taken over all GPOVMs
Π performed on system B. It is known that a (1 + 1)-mode Gaussian state has zero Gaussian QD if
and only if it is a product state; in addition, for all separable (1 + 1)-mode Gaussian states, $D(\rho_{AB}) \le 1$;
if the standard form of the CM of a (1 + 1)-mode Gaussian state $\rho_{AB}$ is as in Equation (2), then

$$D(\rho_{AB}) = f\big(\sqrt{\det B_0}\big) - f(v_-) - f(v_+) + f\Big(\sqrt{\inf_{\omega} \det E_\omega}\Big), \qquad (8)$$

where the infimum is taken over all one-mode Gaussian states ω, $f(x) = \frac{x+1}{2}\log\frac{x+1}{2} - \frac{x-1}{2}\log\frac{x-1}{2}$, $v_-$
and $v_+$ are the symplectic eigenvalues of the CM of $\rho_{AB}$, and $E_\omega = A_0 - C_0(B_0 + \Gamma_\omega)^{-1}C_0^{T}$ with $\Gamma_\omega$ the
CM of ω. Let $\alpha = \det A_0$, $\beta = \det B_0$, $\gamma = \det C_0$, $\delta = \det \Gamma_0$; then we have [13]

$$\inf_{\omega} \det E_\omega = \begin{cases}
\dfrac{2\gamma^2 + (\beta-1)(\delta-\alpha) + 2|\gamma|\sqrt{\gamma^2 + (\beta-1)(\delta-\alpha)}}{(\beta-1)^2} & \text{if } (\delta - \alpha\beta)^2 \le (1+\beta)\gamma^2(\alpha+\delta), \\[2ex]
\dfrac{\alpha\beta - \gamma^2 + \delta - \sqrt{\gamma^4 + (\delta-\alpha\beta)^2 - 2\gamma^2(\alpha\beta+\delta)}}{2\beta} & \text{otherwise.}
\end{cases} \qquad (9)$$
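To make Equations (8) and (9) concrete, the following sketch evaluates the Gaussian QD from the standard-form entries (a, b, c, d). Several points are assumptions on our part rather than statements of the paper: the natural logarithm is used, the symplectic eigenvalues are obtained from the standard two-mode formula $v_\pm^2 = [\Delta \pm \sqrt{\Delta^2 - 4\det\Gamma_0}]/2$ with $\Delta = \det A_0 + \det B_0 + 2\det C_0$, the terms $-f(v_\mp)$ carry the sign of $-S(\rho_{AB})$ in the definition of D, square-root arguments are clamped at zero to absorb rounding, and β > 1 is required in the first branch of (9). All function names are illustrative.

```python
import math


def f(x: float) -> float:
    """f(x) = (x+1)/2 log((x+1)/2) - (x-1)/2 log((x-1)/2), with 0 log 0 := 0."""
    plus, minus = (x + 1) / 2, (x - 1) / 2
    value = plus * math.log(plus)
    if minus > 0:                    # skip the 0 log 0 term (and rounding below 1)
        value -= minus * math.log(minus)
    return value


def inf_det_e(alpha: float, beta: float, gamma: float, delta: float) -> float:
    """Equation (9); assumes beta > 1 whenever the first branch is taken."""
    if (delta - alpha * beta) ** 2 <= (1 + beta) * gamma ** 2 * (alpha + delta):
        inner = max(gamma ** 2 + (beta - 1) * (delta - alpha), 0.0)
        numerator = (2 * gamma ** 2 + (beta - 1) * (delta - alpha)
                     + 2 * abs(gamma) * math.sqrt(inner))
        return numerator / (beta - 1) ** 2
    inner = max(gamma ** 4 + (delta - alpha * beta) ** 2
                - 2 * gamma ** 2 * (alpha * beta + delta), 0.0)
    return (alpha * beta - gamma ** 2 + delta - math.sqrt(inner)) / (2 * beta)


def gaussian_discord(a: float, b: float, c: float, d: float) -> float:
    """Gaussian QD for the standard-form CM with A0 = aI, B0 = bI, C0 = diag(c, d)."""
    alpha, beta, gamma = a * a, b * b, c * d
    delta = (a * b - c * c) * (a * b - d * d)          # det of the full CM
    seralian = alpha + beta + 2 * gamma
    root = math.sqrt(max(seralian ** 2 - 4 * delta, 0.0))
    v_minus = math.sqrt((seralian - root) / 2)
    v_plus = math.sqrt((seralian + root) / 2)
    e_min = inf_det_e(alpha, beta, gamma, delta)
    return f(math.sqrt(beta)) - f(v_minus) - f(v_plus) + f(math.sqrt(e_min))


r = 0.5
print(gaussian_discord(math.cosh(2 * r), math.cosh(2 * r),
                       math.sinh(2 * r), -math.sinh(2 * r)))
print(gaussian_discord(2.0, 3.0, 0.0, 0.0))   # product state
```

For a product state (c = d = 0) the four terms cancel and the sketch returns zero, consistent with the statement that the Gaussian QD vanishes exactly on product states.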

In [14], the quantum GD $D_G$ is proposed. For an (n + m)-mode Gaussian state $\rho_{AB}$,
its Gaussian GD is defined by $D_G(\rho_{AB}) = \inf_{\Pi} \|\rho_{AB} - \Pi(\rho_{AB})\|_2^2$, where the infimum is taken
over all GPOVMs Π performed on system B, $\|\cdot\|_2$ stands for the Hilbert–Schmidt norm and
$\Pi(\rho_{AB}) = \int dz\,(I \otimes \Pi(z))\rho_{AB}(I \otimes \Pi(z))$. If $\rho_{AB}$ is a (1 + 1)-mode Gaussian state with the CM Γ
as in Equation (1) and Π is a one-mode Gaussian POVM performed on mode B with seed $\omega_B$, then
$\Pi(\rho_{AB}) = \omega_A \otimes \omega_B$, where $\omega_A$ is a Gaussian state whose CM is $\Gamma_{\omega_A} = A - C(B + \Gamma_{\omega_B})^{-1}C^{T}$ with
$\Gamma_{\omega_B}$ the CM of $\omega_B$. It is known from [14] that

$$D_G(\rho_{AB}) = \inf_{\omega_B} \|\rho_{AB} - \omega_A \otimes \omega_B\|_2^2. \qquad (10)$$

Now it is clear that, for a (1 + 1)-mode Gaussian state $\rho_{AB}$, $D_G(\rho_{AB}) = 0$ if and only if $\rho_{AB}$ is a
product state.
By Theorem 1 and the results mentioned above, D, $D_G$ and $\mathcal{N}$ describe the same quantum
correlation for (1 + 1)-mode Gaussian states. However, by definition, D and $D_G$ use all GPOVMs,
while $\mathcal{N}$ only employs Gaussian unitary operations, which is simpler and may consume fewer physical
resources. Moreover, though an analytical formula of D is given for two-mode Gaussian states, the
expression is more complex and more difficult to calculate (Equations (8) and (9)). For $D_G$, no analytical
formula is known that covers all (1 + 1)-mode Gaussian states (Equation (10)).
As far as we know, no results on D or $D_G$ have been obtained for the general (n + m)-mode case.
To gain better insight into the behavior of $\mathcal{N}$ and $D_G$, we compare them in scale with the help
of two-mode STSs. Note that $D_G$ of any two-mode STS $\rho_{AB}$ is given by [14]

$$D_G(\rho_{AB}) = \frac{1}{ab - c^2} - \frac{9}{\big(\sqrt{4ab - 3c^2} + \sqrt{ab}\big)^2}. \qquad (11)$$

Clearly, our formula (7) for $\mathcal{N}$ is simpler than formula (11) for $D_G$.
Figures 4 and 5 are plotted in terms of the photon number $\bar{n}$ and the squeezing parameter r. Figure 4 shows
that, for the case of SSTS and for 0 < r ≤ 2.5, we have $D_G(\rho_{AB}) < \mathcal{N}(\rho_{AB})$. This means that $\mathcal{N}$ is
better than $D_G$ when they are used to detect the correlation that they describe in the SSTS with r ≤ 2.5.
Figure 5a reveals that, for the case of nonsymmetric STS and for r = 0.5, we have $D_G(\rho_{AB}) < \mathcal{N}(\rho_{AB})$;
that is, $\mathcal{N}$ is better in this situation too. However, for r = 5, $\mathcal{N}$ and $D_G$ cannot be compared with each
other globally, which suggests that one may use $\max\{\mathcal{N}(\rho_{AB}), D_G(\rho_{AB})\}$ to detect the correlation.
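The comparison for symmetric states can be reproduced by evaluating formulas (7) and (11) side by side. The sketch below is illustrative: the function names and the sampled values of $\bar{n}$ and r are our own choices, and it makes no claim about the large-r regime, where the cancellation in $ab - c^2$ also makes double-precision evaluation delicate.

```python
import math


def sts_parameters(n1: float, n2: float, r: float):
    """Standard-form CM entries (a, b, c) of a squeezed thermal state; d = -c."""
    nr = math.sinh(r) ** 2
    a = 2 * nr + 1 + 2 * n1 * (1 + nr) + 2 * n2 * nr
    b = 2 * nr + 1 + 2 * n2 * (1 + nr) + 2 * n1 * nr
    c = 2 * (1 + n1 + n2) * math.sqrt(nr * (1 + nr))
    return a, b, c


def n_sts(a: float, b: float, c: float) -> float:
    """Formula (7) for N."""
    return 1 / (a * b - c * c) - 1 / (a * b - c * c / 2)


def dg_sts(a: float, b: float, c: float) -> float:
    """Formula (11) for the Gaussian geometric discord D_G."""
    return 1 / (a * b - c * c) - 9 / (math.sqrt(4 * a * b - 3 * c * c) + math.sqrt(a * b)) ** 2


def compare(n1: float, n2: float, r: float):
    """Return the pair (N, D_G) for the STS with parameters (n1, n2, r)."""
    a, b, c = sts_parameters(n1, n2, r)
    return n_sts(a, b, c), dg_sts(a, b, c)


for n in (0.0, 0.5, 1.0, 2.0, 3.0):
    for r in (0.5, 1.0, 2.5):
        n_val, dg_val = compare(n, n, r)
        print(f"n = {n:3.1f}  r = {r:3.1f}  N = {n_val:.6f}  D_G = {dg_val:.6f}")
```

Both quantities vanish at r = 0, where the STS reduces to a product of thermal states.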

Figure 4. Comparison of N (ρ AB ) with DG (ρ AB ) for SSTS.

Figure 5. Comparison of N (ρ AB ) with DG (ρ AB ) for nonsymmetric STS. (a) and (b) correspond to nonsymmetric STS with
r = 0.5 and r = 5, respectively.


5. Conclusions
In conclusion, we introduce a measure of quantum correlation, N , for bipartite quantum states
in continuous-variable systems. This measure is defined by performing Gaussian unitary operations
on a subsystem, and its value is invariant for all quantum states under local Gaussian unitary
operations. N exists in all (n + m)-mode Gaussian states except product ones. In addition, N takes
values in [0, 1) and the upper bound 1 is sharp. An analytical formula of N for any (1 + 1)-mode
Gaussian state is obtained. Moreover, for any (n + m)-mode Gaussian state, an estimate of N
is established in terms of its covariance matrix. Numerical evidence shows that the inequality
N (ρ AB ) ≤ 0.1 holds for any (1 + 1)-mode separable Gaussian state ρ AB , which can be viewed as a
criterion of entanglement. It is worth noting that Gaussian QD, Gaussian GD and N measure the same
quantum correlation for (1 + 1)-mode Gaussian states. However, N is easier to calculate and can be
applied to any (n + m)-mode Gaussian state.

Acknowledgments: The authors would like to thank the anonymous referees for helpful comments and
suggestions that improved the original paper. This work is partially supported by the Natural Science Foundation
of China (11671006, 11671294) and the Outstanding Youth Foundation of Shanxi Province (201701D211001).
Author Contributions: Yangyang Wang completed the proofs of the main theorems. The rest of the work in this paper was
accomplished by Xiaofei Qi and Jinchuan Hou.
Conﬂicts of Interest: The authors declare no conﬂict of interest.

References
1. Horodecki, R.; Horodecki, P.; Horodecki, M.; Horodecki, K. Quantum entanglement. Rev. Mod. Phys. 2009,
81, 865.
2. Dakić, B.; Lipp, Y.O.; Ma, X.; Ringbauer, M.; Kropatschek, S.; Barz, S.; Paterek, T.; Vedral, V.; Zeilinger, A.;
Brukner, Č.; et al. Quantum discord as resource for remote state preparation. Nat. Phys. 2012, 8, 666–670.
3. Madhok, V.; Datta, A. Interpreting quantum discord through quantum state merging. Phys. Rev. A 2011,
83, 032323.
4. Cavalcanti, D.; Aolita, L.; Boixo, S.; Modi, K.; Piani, M.; Winter, A. Operational interpretations of quantum
discord. Phys. Rev. A 2011, 83, 032324.
5. Datta, A.; Shaji, A.; Caves, C.M. Quantum discord and the power of one qubit. Phys. Rev. Lett. 2008,
100, 050502.
6. Ollivier, H.; Zurek, W.H. Quantum Discord: A Measure of the Quantumness of Correlations. Phys. Rev. Lett.
2001, 88, 017901.
7. Dakić, B.; Vedral, V.; Brukner, Č. Necessary and Sufﬁcient Condition for Nonzero Quantum Discord.
Phys. Rev. Lett. 2010, 105, 190502.
8. Luo, S.; Fu, S. Geometric measure of quantum discord. Phys. Rev. A 2010, 82, 034302.
9. Miranowicz, A.; Horodecki, P.; Chhajlany, R.W.; Tuziemski, J.; Sperling, J. Analytical progress on symmetric
geometric discord: Measurement-based upper bounds. Phys. Rev. A 2012, 86, 042123.
10. Luo, S.; Fu, S. Measurement-induced nonlocality. Phys. Rev. Lett. 2011, 82, 120401.
11. Luo, S. Using measurement-induced disturbance to characterize correlations as classical or quantum.
Phys. Rev. A 2008, 77, 022301.
12. Giorda, P.; Paris, M.G.A. Gaussian Quantum Discord. Phys. Rev. Lett. 2010, 105, 020503.
13. Adesso, G.; Datta, A. Quantum versus Classical Correlations in Gaussian States. Phys. Rev. Lett. 2010,
105, 030501.
14. Adesso, G.; Girolami, D. Gaussian geometric discord. Int. J. Quantum Inf. 2011, 9, 1773–1786.
15. Mišta, L.; Tatham, R., Jr.; Girolami, D.; Korolkova, N.; Adesso, G. Measurement-induced disturbances and
nonclassical correlations of Gaussian states. Phys. Rev. A 2011, 83, 042325.
16. Ma, R.F.; Hou, J.C.; Qi, X.F. Measurement-induced nonlocality for Gaussian states. Int. J. Theor. Phys. 2017,
56, 1132–1140.
17. Farace, A.; de Pasquale, A.; Rigovacca, L.; Giovannetti, V. Discriminating strength: A bona ﬁde measure of
non-classical correlations. New J. Phys. 2014, 16, 073010.


18. Rigovacca, L.; Farace, A.; de Pasquale, A.; Giovannetti, V. Gaussian discriminating strength. Phys. Rev. A
2015, 92, 042331.
19. Fu, L. Nonlocal effect of a bipartite system induced by local cyclic operation. Europhys. Lett. 2006, 75, 1.
20. Datta, A.; Gharibian, S. Signatures of nonclassicality in mixed-state quantum computation. Phys. Rev. A
2009, 79, 042325.
21. Gharibian, S. Quantifying nonclassicality with local unitary operations. Phys. Rev. A 2012, 86, 042106.
22. Braunstein, S.L.; van Loock, P. Quantum information with continuous variables. Rev. Mod. Phys. 2005,
77, 513.
23. Wang, X.B.; Hiroshima, T.; Tomita, A.; Hayashi, M. Quantum information with Gaussian states. Phys. Rep.
2007, 448, 1–111.
24. Weedbrook, C.; Pirandola, S.; García-Patrón, R.; Cerf, N.J.; Ralph, T.C.; Shapiro, J.H.; Lloyd, S. Gaussian
quantum information. Rev. Mod. Phys. 2012, 84, 621.
25. Simon, R.; Mukunda, N.; Dutta, B. Quantum-noise matrix for multimode systems: U(n) invariance, squeezing,
and normal forms. Phys. Rev. A 1994, 49, 1567.
26. Duan, L.M.; Giedke, G.; Cirac, J.I.; Zoller, P. Inseparability Criterion for Continuous Variable Systems.
Phys. Rev. Lett. 2000, 84, 2722.
27. Simon, R. Peres-Horodecki Separability Criterion for Continuous Variable Systems. Phys. Rev. Lett. 2000,
84, 2726.
28. Serafini, A. Multimode Uncertainty Relations and Separability of Continuous Variable States. Phys. Rev. Lett.
2006, 96, 110402.
29. Pirandola, S.; Serafini, A.; Lloyd, S. Correlation matrices of two-mode bosonic systems. Phys. Rev. A 2009,
79, 052327.
30. Anders, J. Estimating the degree of entanglement of unknown Gaussian states. arXiv 2012, arXiv:quant-ph/
0610263v1.
31. Williamson, J. On the algebraic problem concerning the normal forms of linear dynamical systems.
Am. J. Math. 1936, 58, 141–163.
32. Marian, P.; Marian, T.A. Uhlmann fidelity between two-mode Gaussian states. Phys. Rev. A 2012, 86, 022340.
33. Horn, R.A.; Johnson, C.R. Matrix Analysis; Cambridge University Press: Cambridge, UK, 2012.
34. Holevo, A.S. Quantum Systems, Channels, Information: A Mathematical Introduction; De Gruyter: Berlin,
Germany, 2012.
35. Peres, A. Separability Criterion for Density Matrices. Phys. Rev. Lett. 1997, 77, 1413.
36. Horodecki, M.; Horodecki, P.; Horodecki, R. Separability of mixed states: necessary and sufficient conditions.
Phys. Lett. A 1996, 1, 223.
37. Werner, R.F.; Wolf, M.M. Bound Entangled Gaussian States. Phys. Rev. Lett. 2001, 86, 3658.
38. Giedke, G.; Cirac, J.I. Characterization of Gaussian operations and distillation of Gaussian states. Phys. Rev. A
2002, 66, 032316.
39. Fiurášek, J.; Mišta, L., Jr. Gaussian localizable entanglement. Phys. Rev. A 2007, 75, 060302.

entropy
Article
Entropic Updating of Probabilities and
Density Matrices
Kevin Vanslette
Department of Physics, University at Albany (SUNY), Albany, NY 12222, USA; [email protected]

Received: 2 November 2017; Accepted: 2 December 2017; Published: 4 December 2017

Abstract: We ﬁnd that the standard relative entropy and the Umegaki entropy are designed for the
purpose of inferentially updating probabilities and density matrices, respectively. From the same set
of inferentially guided design criteria, both of the previously stated entropies are derived in parallel.
This formulates a quantum maximum entropy method for the purpose of inferring density matrices
in the absence of complete information.

Keywords: probability theory; entropy; quantum relative entropy; quantum information; quantum
mechanics; inference

1. Introduction
We design an inferential updating procedure for probability distributions and density matrices
such that inductive inferences may be made. The inferential updating tools found in this derivation take
the form of the standard and quantum relative entropy functionals, and thus we find the functionals
are designed for the purpose of updating probability distributions and density matrices, respectively.
Previously formulated design derivations which found the entropy to be a tool for inference originally
required five design criteria (DC) [1–3]; this was reduced to four in [4–6], and then down to three in [7].
We reduce the number of required DC down to two while also providing the first design derivation of
the quantum relative entropy, using the same design criteria and inferential principles in both instances.
The designed quantum relative entropy takes the form of Umegaki’s quantum relative entropy,
and thus it has the “proper asymptotic form of the relative entropy in quantum (mechanics)” [8–10].
Recently, Wilming et al. [11] gave an axiomatic characterization of the quantum relative entropy that
“uniquely determines the quantum relative entropy”. Our derivation differs from theirs, again in
that we design the quantum relative entropy for a purpose, but also in that our DCs are imposed on
what turns out to be the functional derivative of the quantum relative entropy rather than on the
quantum relative entropy itself. The use of a quantum entropy for the purpose of inference has a long
history: Jaynes [12,13] invented the notion of the quantum maximum entropy method [14], while it
was perpetuated by [15–22] and many others. However, we find the quantum relative entropy to be the
suitable entropy for updating density matrices, rather than the von Neumann entropy [23], as is suggested
in [24]. I believe the present article provides the desired motivation for why the appropriate quantum
relative entropy for updating density matrices, from prior to posterior, should be logarithmic in form
while also providing a solution for updating non-uniform prior density matrices [24]. The relevant
results of these papers may be found using the quantum relative entropy with suitably chosen prior
density matrices.
It should be noted that because the relative entropies were reached by design, they may be
interpreted as such, “the relative entropies are tools for updating”, which means we no longer need to
attach an interpretation ex post facto—as a measure of disorder or amount of missing information. In this
sense, the relative entropies were built for the purpose of saturating their own interpretation [4,7], and,
therefore, the quantum relative entropy is the tool designed for updating density matrices.

Entropy 2017, 19, 664; doi:10.3390/e19120664 409 www.mdpi.com/journal/entropy


This article takes an inferential approach to probabilities and density matrices that is expected
to be notionally consistent with the Bayesian derivations of Quantum Mechanics, such as Entropic
Dynamics [7,25–27], as well as Bayesian interpretations of Quantum Mechanics, such as QBism [28].
The quantum maximum entropy method is, however, expected to be useful independent of one’s
interpretation of Quantum Mechanics because the entropy is designed at the level of density matrices
rather than being formulated from arguments about the “inner workings” of Quantum Mechanics.
This inferential approach is, at the very least, verbally convenient so we will continue writing in
this language.
A few applications of the quantum maximum entropy method are given in another article [29].
By maximizing the quantum relative entropy with respect to a “data constraint” and the appropriate
prior density matrix, the Quantum Bayes Rule [30–34] (a positive-operator valued measure (POVM)
measurement and collapse) is derived. The quantum maximum entropy method can reproduce the
density matrices in [35,36] that are cited as “Quantum Bayes Rules”, but the required constraints
are difficult to motivate; however, it is expected that the results of this paper may be useful for
further understanding Machine Learning techniques that involve the quantum relative entropy [37].
The Quantum Bayes Rule derivation in [29] is analogous to the standard Bayes Rule derivation from
the relative entropy given in [38], as was suggested to be possible in [24]. This article provides the
foundation for [29], and thus, the quantum maximum entropy method unifies a few topics in Quantum
Information and Quantum Measurement through entropic inference.
As is described in this article and in [29], the quantum maximum entropy method is able to
provide solutions even if the constraints and prior density matrix in question do not all mutually
commute. This might be useful for subjects as far reaching as [39], which seeks to use Quantum Theory
as a basis for building models for cognition. The immediate correspondence is that the quantum
maximum entropy method might provide a solution toward addressing the empirical evidence for
noncommutative cognition, which is how one’s cognition changes when addressing questions in
permuted order [39]. A simpler model for noncommutative cognition may also be possible by applying
sequential updates via the standard maximum entropy method with their order permuted. Sequential
updating does not, in general, give the same resultant probability distribution when the updating order
is permuted—this is argued to be a feature of the standard maximum entropy method [40]. Similarly,
sequential updating in the quantum maximum entropy method also has this feature, but it should be
noted that the noncommutativity of sequential updating is different in principle than simultaneously
updating with respect to expectation values of noncommuting operators.
The remainder of the paper is organized as follows: first, we will discuss some universally
applicable principles of inference and motivate the design of an entropy function able to rank
probability distributions. This entropy function will be designed such that it is consistent with
inference by applying a few reasonable design criteria, which are guided by the aforementioned
principles of inference. Using the same principles of inference and design criteria, we find the form
of the quantum relative entropy suitable for inference. Appendix B gives the solution to an example of
updating 2 × 2 prior density matrices, via the quantum maximum entropy method, with respect to
expectation values over spin matrices that do not commute with the prior. We end with
concluding remarks (I thank the reviewers for providing several useful references in this section).

2. The Design of Entropic Inference

Inference is the appropriate updating of probability distributions when new information is
received. Bayes rule and Jeffrey’s rule are both equipped to handle information in the form of data;
however, the updating of a probability distribution due to the knowledge of an expectation value was
realized by Jaynes [12–14] through the method of maximum entropy. The two methods for inference
were thought to be unrelated to one another until the work of [38,40], which showed Bayes Rule and
Jeffrey’s Rule to be consistent with the method of maximum entropy when the expectation values were
in the form of data [38,40]. In the spirit of the derivation, we will carry on as if the maximum entropy
method were not known and show how it may be derived as an application of inference.
Given a probability distribution ϕ( x ) over a general set of propositions x ∈ X, it is self-evident
that if new information is learned, we are entitled to assign a new probability distribution ρ( x ) that
somehow reﬂects this new information while also respecting our prior probability distribution ϕ( x ).
The main question we must address is: “Given some information, to what posterior probability
distribution ρ( x ) should we update our prior probability distribution ϕ( x )?”, that is,

$$\varphi(x) \xrightarrow{\;*\;} \rho(x)?$$

This specifies the problem of inductive inference. Since “information” has many colloquial,
yet potentially conflicting, definitions, we remove potential confusion by defining information
operationally (∗) as the rationale that causes a probability distribution to change (inspired by and
adapted from [7]). Directly from [7]:

Our goal is to design a method that allows a systematic search for the preferred posterior
distribution. The central idea, ﬁrst proposed in [4], is disarmingly simple: to select the
posterior, ﬁrst rank all candidate distributions in increasing order of preference and then pick
the distribution that ranks the highest. Irrespective of what it is that makes one distribution
preferable over another (we will get to that soon enough), it is clear that any ranking
according to preference must be transitive: if distribution ρ1 is preferred over distribution
ρ2 , and ρ2 is preferred over ρ3 , then ρ1 is preferred over ρ3 . Such transitive rankings are
implemented by assigning to each ρ( x ) a real number S[ρ], which is called the entropy of ρ,
in such a way that if ρ1 is preferred over ρ2 , then S[ρ1 ] > S[ρ2 ]. The selected distribution
(one or possibly many, for there may be several equally preferred distributions) is that
which maximizes the entropy functional.

Because we wish to update from prior distributions ϕ to posterior distributions ρ by ranking,
the entropy functional S[ρ, ϕ] is a real function of both ϕ and ρ. In the absence of new information,
there is no available rationale to prefer any ρ to the original ϕ, and thereby the relative entropy should
be designed such that the selected posterior is equal to the prior ϕ (in the absence of new information).
The prior information encoded in ϕ( x ) is valuable and we should not change it unless we are informed
otherwise. Due to our definition of information, and our desire for objectivity, we state the predominant
guiding principle for inductive inference:

The Principle of Minimal Updating (PMU):

A probability distribution should only be updated to the extent required by the new information.

This simple statement provides the foundation for inference [7]. If the updating of probability
distributions is to be done objectively, then possibilities should not be needlessly ruled out or
suppressed. Being informationally stingy, in that we only update probability distributions
when the information requires it, pushes inductive inference toward objectivity. Thus, using the PMU
helps formulate a pragmatic (and objective) procedure for making inferences using (informationally)
subjective probability distributions [41].
This method of inference is only as universal and general as its ability to apply equally well to
any specific inference problem. The notion of “specificity” is the notion of statistical independence;
a special case is only special in that it is separable from other special cases. The notion that systems
may be “sufficiently independent” plays a central and deep-seated role in science, and the idea that
some things can be neglected, and that not everything matters, is implemented by imposing criteria
that tell us how to handle independent systems [7]. Ironically, the property universally shared by all
specific inference problems is their ability to be independent of one another: they share independence.
Thus, a universal inference scheme based on the PMU permits:


Properties of Independence (PI):

Subdomain Independence: When information is received about one set of propositions, it should
not affect or change the state of knowledge (probability distribution) of the other propositions
(otherwise, information was also received about them);

And,

Subsystem Independence: When two systems are a priori believed to be independent and we only
receive information about one, then the state of knowledge of the other system remains unchanged.

The PIs are special cases of the PMU that ultimately take the form of design criteria in this design
derivation. The process of constraining the form of S[ρ, ϕ] by imposing design criteria may be viewed
as the process of eliminative induction, and after sufﬁcient constraining, a single form for the entropy
remains. Thus, the justiﬁcation behind the surviving entropy is not that it leads to demonstrably correct
inferences, but, rather, that all other candidate entropies demonstrably fail to perform as desired [7].
Rather than the design criteria instructing one how to update, they instruct in what instances one should
not update. That is, rather than justifying one way to skin a cat over another, we tell you when not to
skin it, which is operationally unique—namely you don’t do it—luckily enough for the cat.

The Design Criteria and the Standard Relative Entropy

The following design criteria (DC), guided by the PMU, are imposed and formulate the standard
relative entropy as a tool for inference. The form of this presentation is inspired by [7].
DC1: Subdomain Independence
We keep DC1 from [7] and review it below. DC1 imposes the ﬁrst instance of when one should
not update—the Subdomain PI. Suppose the information to be processed does not refer to a particular
subdomain D of the space X of xs. In the absence of new information about D , the PMU insists we do
not change our minds about probabilities that are conditional on D . Thus, we design the inference
method so that ϕ( x |D), the prior probability of x conditional on x ∈ D , is not updated and therefore
the selected conditional posterior is
P( x |D) = ϕ( x |D). (1)

(The notation will be as follows: we denote priors by ϕ, candidate posteriors by lower case ρ, and the
selected posterior by upper case P.) We emphasize the point is not that we make the unwarranted
assumption that keeping ϕ( x |D) unchanged is guaranteed to lead to correct inferences. It need not;
induction is risky. The point is, rather, that, in the absence of any evidence to the contrary, there is no
reason to change our minds and the prior information takes priority.
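A small numerical illustration of DC1 may help. It is only a sketch under stated assumptions: it takes the standard relative entropy $S[\rho, \varphi] = -\sum_i \rho_i \log(\rho_i/\varphi_i)$ as given (the form this design derivation is working toward), uses a four-state prior and a constraint $\rho(D) = 0.5$ chosen purely for illustration, and all function names are ours. The information concerns only the total probability of D, so the conditionals inside D and inside its complement are left at their prior values, and random candidates with the same total mass on D but other conditionals never rank higher.

```python
import math
import random


def relative_entropy(rho, phi):
    """S[rho, phi] = -sum_i rho_i log(rho_i / phi_i) (assumed form, see lead-in)."""
    return -sum(p * math.log(p / q) for p, q in zip(rho, phi) if p > 0)


def update_subdomain(phi, domain, new_mass):
    """Give `domain` total probability `new_mass`, keeping all conditionals fixed."""
    mass = sum(phi[i] for i in domain)
    return [q * new_mass / mass if i in domain else q * (1 - new_mass) / (1 - mass)
            for i, q in enumerate(phi)]


def random_candidate(phi, domain, new_mass, rng):
    """Random distribution with the same mass on `domain` but arbitrary conditionals."""
    weights = [rng.random() + 1e-9 for _ in phi]
    inside = sum(w for i, w in enumerate(weights) if i in domain)
    outside = sum(weights) - inside
    return [w * new_mass / inside if i in domain else w * (1 - new_mass) / outside
            for i, w in enumerate(weights)]


phi = [0.1, 0.2, 0.3, 0.4]          # hypothetical prior over four microstates
domain = {0, 1}                     # hypothetical subdomain D
posterior = update_subdomain(phi, domain, 0.5)

rng = random.Random(0)
best = relative_entropy(posterior, phi)
worse = sum(relative_entropy(random_candidate(phi, domain, 0.5, rng), phi) <= best + 1e-12
            for _ in range(1000))
print(posterior, best, worse)
```

Within each domain the ratios of the posterior probabilities equal those of the prior, which is the content of Equation (1) in this toy setting.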
DC1 Implementation
Consider the set of microstates $x_i \in X$ belonging to either of two non-overlapping domains $D$ or
its complement $D'$, such that $X = D \cup D'$ and $\emptyset = D \cap D'$. For convenience, let $\rho(x_i) = \rho_i$. Consider
the following constraints:

$$\rho(D) = \sum_{i \in D} \rho_i \quad\text{and}\quad \rho(D') = \sum_{i \in D'} \rho_i, \qquad (2)$$

such that $\rho(D) + \rho(D') = 1$, and the following “local” expectation value constraints over $D$ and $D'$,

$$\langle A \rangle = \sum_{i \in D} \rho_i A_i \quad\text{and}\quad \langle A' \rangle = \sum_{i \in D'} \rho_i A'_i, \qquad (3)$$

where $A = A(x)$ is a scalar function of x and $A_i \equiv A(x_i)$. As we are searching for the candidate
distribution which maximizes S while obeying (2) and (3), we maximize the entropy $S \equiv S[\rho, \varphi]$ with
respect to these expectation value constraints using the Lagrange multiplier method,

Entropy 2017, 19, 664

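$$0 = \delta\Big( S - \lambda\Big[\rho(\mathcal{D}) - \sum_{i\in\mathcal{D}}\rho_i\Big] - \mu\Big[\langle A\rangle - \sum_{i\in\mathcal{D}}\rho_i A_i\Big] - \lambda'\Big[\rho(\mathcal{D}') - \sum_{i\in\mathcal{D}'}\rho_i\Big] - \mu'\Big[\langle A'\rangle - \sum_{i\in\mathcal{D}'}\rho_i A_i\Big]\Big),$$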

and, thus, the entropy is maximized when the following differential relationships hold:

$$\frac{\delta S}{\delta\rho_i} = \lambda + \mu A_i \quad \forall\, i\in\mathcal{D}, \tag{4}$$

$$\frac{\delta S}{\delta\rho_i} = \lambda' + \mu' A_i \quad \forall\, i\in\mathcal{D}'. \tag{5}$$

Equations (2)–(5) are n + 4 equations we must solve to find the four Lagrange multipliers {λ, λ′, μ, μ′} and the n probability values {ρi} associated with the n microstates {xi}. If the subdomain constraint DC1 is imposed in the most restrictive case, then it will hold in general. The most restrictive case requires splitting X into a set of {Di} domains such that each Di singularly includes one microstate xi. This gives,

$$\frac{\delta S}{\delta\rho_i} = \lambda_i + \mu_i A_i \quad\text{in each } \mathcal{D}_i. \tag{6}$$

Because the entropy S = S[ρ1 , ρ2 , ...; ϕ1 , ϕ2 , ...] is a functional over the probability of each microstate’s
posterior and prior distribution, its variational derivative is also a function of said probabilities
in general,

$$\frac{\delta S}{\delta\rho_i} \equiv \varphi_i(\rho_1,\rho_2,\ldots;\phi_1,\phi_2,\ldots) = \lambda_i + \mu_i A_i \quad\text{for each } (i,\mathcal{D}_i). \tag{7}$$

DC1 is imposed by constraining the form of φi(ρ1, ρ2, ...; ϕ1, ϕ2, ...) = φi(ρi; ϕ1, ϕ2, ...) to ensure that changes in Ai → Ai + δAi have no influence over the value of ρj in domain Dj, through φi, for i ≠ j. If there is no new information about propositions in Dj, its distribution should remain equal to ϕj by the PMU. We further restrict φi such that an arbitrary variation of ϕj → ϕj + δϕj (a change in the prior state of knowledge of the microstate j) has no effect on ρi for i ≠ j, and therefore DC1 imposes φi = φi(ρi, ϕi), as is guided by the PMU. At this point, it is easy to generalize the analysis to continuous
microstates such that the indices become continuous i → x, sums become integrals, and discrete
probabilities become probability densities ρi → ρ( x ).
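The content of DC1 can be illustrated numerically. The following is a minimal sketch, assuming the canonical-like posterior that the design eventually arrives at (the prior tilted by an exponential of the constrained function); the six microstates, the function values `A`, the multiplier `mu`, and the mass `mass_D` are hypothetical numbers chosen only for illustration. New information refers to D alone, and the probabilities conditional on the complement remain those of the prior.

```python
import math

# Hypothetical six-microstate example; all numbers are illustrative.
prior = [0.10, 0.20, 0.10, 0.30, 0.20, 0.10]
A = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
D = [0, 1, 2]        # subdomain that the new information refers to
D_COMP = [3, 4, 5]   # complement, about which nothing new is learned


def conditional(p, domain):
    """Return the probabilities of p conditional on the given domain."""
    mass = sum(p[i] for i in domain)
    return [p[i] / mass for i in domain]


def update(prior, A, mu, mass_D):
    """Tilt the prior by exp(mu * A_i) inside D only.

    mass_D is the constrained total probability of D; the complement keeps
    the prior's shape and is rescaled to carry the remaining mass.
    """
    post = [0.0] * len(prior)
    weights = {i: prior[i] * math.exp(mu * A[i]) for i in D}
    z_D = sum(weights.values())
    for i in D:
        post[i] = mass_D * weights[i] / z_D
    z_comp = sum(prior[i] for i in D_COMP)
    for i in D_COMP:
        post[i] = (1.0 - mass_D) * prior[i] / z_comp
    return post


posterior = update(prior, A, mu=0.7, mass_D=0.6)
print(conditional(prior, D_COMP), conditional(posterior, D_COMP))
print(conditional(prior, D), conditional(posterior, D))
```

The first printed pair agrees (the conditional distribution on the complement is not updated), while the second pair differs because the constraint acts inside D.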
Remark
We are designing the entropy for the purpose of ranking posterior probability distributions (for the
purpose of inference); however, the highest ranked distribution is found by setting the variational
derivative of S[ρ, ϕ] equal to the variations of the expectation value constraints by the Lagrange
multiplier method,

$$\frac{\delta S}{\delta\rho(x)} = \lambda + \sum_i \mu_i A_i(x). \tag{8}$$

Therefore, the real quantity of interest is δS/δρ(x) rather than the specific form of S[ρ, ϕ]. All forms of S[ρ, ϕ] that give the correct form of δS/δρ(x) are equally valid for the purpose of inference. Thus, every design criterion may be imposed on the variational derivative of the entropy rather than on the entropy itself, which we do. When maximizing the entropy, for convenience, we will let,

$$\frac{\delta S}{\delta\rho(x)} \equiv \varphi_x(\rho(x),\phi(x)), \tag{9}$$


and further use the shorthand φx (ρ, ϕ) ≡ φx (ρ( x ), ϕ( x )), in all cases.
DC1’: In the absence of new information, our new state of knowledge ρ( x ) is equal to the old state of
knowledge ϕ( x ).
This is a special case of DC1, and is implemented differently than in [7]. The PMU is in principle a statement about informational honesty: one should not “jump to conclusions” in light of new information, and in the absence of new information one should not change one’s state of knowledge. If no new information is given, the prior probability distribution ϕ(x) does not change; that is, the posterior probability distribution ρ(x) = ϕ(x) is equal to the prior probability. If we maximize the entropy without applying constraints,

$$\frac{\delta S}{\delta\rho(x)} = 0, \tag{10}$$

then DC1’ imposes the following condition:

$$\frac{\delta S}{\delta\rho(x)} = \varphi_x(\rho,\phi) = \varphi_x(\phi,\phi) = 0, \tag{11}$$

for all x in this case. This special case of the DC1 and the PMU turns out to be incredibly constraining
as we will see over the course of DC2.
Comment
If the variable x is continuous, DC1 requires that information referring to points infinitely close to, but just outside, the domain D have no influence on probabilities conditional on D [7].
This may seem surprising as it may lead to updated probability distributions that are discontinuous.
Is this a problem? No.
In certain situations (e.g., physics) we might have explicit reasons to believe that conditions of
continuity or differentiability should be imposed and this information might be given to us in a variety
of ways. The crucial point, however—and this is a point that we keep and will keep reiterating—is
that unless such information is explicitly given, we should not assume it. If the new information leads
to discontinuities, so be it.
DC2: Subsystem Independence
DC2 imposes the second instance of when one should not update—the Subsystem PI.
We emphasize that DC2 is not a consistency requirement. The argument we deploy is not that both
the prior and the new information tells us the systems are independent, in which case consistency
requires that it should not matter whether the systems are treated jointly or separately. Rather, DC2
refers to a situation where the new information does not say whether the systems are independent
or not, but information is given about each subsystem. The updating is being designed so that the
independence reflected in the prior is maintained in the posterior by default via the PMU and the
second clause of the PIs [7].
The point is not that when we have no evidence for correlations we draw the firm conclusion that
the systems must necessarily be independent. They could indeed have turned out to be correlated and
then our inferences would be wrong. Again, induction involves risk. The point is rather that if the
joint prior reflects independence and the new evidence is silent on the matter of correlations, then the
prior independence takes precedence. As before in the case of subdomain independence, the probability distribution should not be updated unless the information requires it [7].
DC2 Implementation
Consider a composite system, x = (x1, x2) ∈ X = X1 × X2. Assume that all prior evidence led us to believe the subsystems are independent. This belief is reflected in the prior distribution: if the individual system priors are ϕ1(x1) and ϕ2(x2), then the prior for the whole system is their product ϕ1(x1)ϕ2(x2). Further suppose that new information is acquired such that ϕ1(x1) would by itself be updated to P1(x1) and that ϕ2(x2) would by itself be updated to P2(x2). By design, the implementation of DC2 constrains the entropy functional such that, in this case, the joint product prior ϕ1(x1)ϕ2(x2) updates to the selected product posterior P1(x1)P2(x2) [7].
The argument below is considerably simpliﬁed if we expand the space of probabilities to include
distributions that are not necessarily normalized. This does not represent any limitation because a
normalization constraint may always be applied. We consider a few special cases below:
Case 1: We receive the extremely constraining information that the posterior distribution for system 1 is completely specified to be P1(x1) while we receive no information at all about system 2. We treat the two systems jointly. Maximize the joint entropy S[ρ(x1, x2), ϕ1(x1)ϕ2(x2)] subject to the following constraints on ρ(x1, x2):

$$\int dx_2\,\rho(x_1,x_2) = P_1(x_1). \tag{12}$$

Notice that the probability of each x1 ∈ X1 within ρ(x1, x2) is being constrained to P1(x1) in the marginal. We therefore need one Lagrange multiplier λ1(x1) for each x1 ∈ X1 to tie each value of ∫dx2 ρ(x1, x2) to P1(x1). Maximizing the entropy with respect to this constraint gives

$$\delta\left( S - \int dx_1\,\lambda_1(x_1)\left[\int dx_2\,\rho(x_1,x_2) - P_1(x_1)\right]\right) = 0, \tag{13}$$

which requires that

λ1 ( x1 ) = φx1 x2 (ρ( x1 , x2 ), ϕ1 ( x1 ) ϕ2 ( x2 )) , (14)

for arbitrary variations of ρ( x1 , x2 ). By design, DC2 is implemented by requiring ϕ1 ϕ2 → P1 ϕ2 in this

case, therefore,
λ1 ( x1 ) = φx1 x2 ( P1 ( x1 ) ϕ2 ( x2 ), ϕ1 ( x1 ) ϕ2 ( x2 )) . (15)

This equation must hold for all choices of x2 and all choices of the prior ϕ2(x2), as λ1(x1) is independent of x2. Suppose we had chosen a different prior ϕ2′(x2) = ϕ2(x2) + δϕ2(x2) that disagrees with ϕ2(x2). For all x2 and δϕ2(x2), the multiplier λ1(x1) remains unchanged, as it constrains the independent marginal ρ(x1) → P1(x1). Any dependence that the right-hand side might potentially have had on x2 and on the prior ϕ2(x2) must therefore cancel out. This means that

φx1 x2 ( P1 ( x1 ) ϕ2 ( x2 ), ϕ1 ( x1 ) ϕ2 ( x2 )) = f x1 ( P1 ( x1 ), ϕ1 ( x1 )). (16)

Since ϕ2 is arbitrary in f , suppose further that we choose a constant prior set equal to one,
ϕ2 ( x2 ) = 1, therefore

f x1 ( P1 ( x1 ), ϕ1 ( x1 )) = φx1 x2 ( P1 ( x1 ) ∗ 1, ϕ1 ( x1 ) ∗ 1) = φx1 ( P1 ( x1 ), ϕ1 ( x1 )) (17)

in general. This gives

λ1 ( x1 ) = φx1 ( P1 ( x1 ), ϕ1 ( x1 )) . (18)

The left-hand side does not depend on x2 , and therefore neither does the right-hand side. An argument
exchanging systems 1 and 2 gives a similar result.
Case 1—Conclusion: When the system 2 is not updated the dependence on ϕ2 and x2 drops out,

φx1 x2 ( P1 ( x1 ) ϕ2 ( x2 ), ϕ1 ( x1 ) ϕ2 ( x2 )) = φx1 ( P1 ( x1 ), ϕ1 ( x1 )) , (19)

and vice-versa when system 1 is not updated,

φx1 x2 ( ϕ1 ( x1 ) P2 ( x2 ), ϕ1 ( x1 ) ϕ2 ( x2 )) = φx2 ( P2 ( x2 ), ϕ2 ( x2 )) . (20)


As we seek the general functional form of φx1 x2 , and because the x2 dependence drops out of (19)
and the x1 dependence drops out of (20) for arbitrary ϕ1 , ϕ2 and ϕ12 = ϕ1 ϕ2 , the explicit coordinate
dependence in φ consequently drops out of both such that,

φx1 x2 → φ, (21)

as φ = φ(ρ(x), ϕ(x)) must only depend on coordinates through the probability distributions themselves. (As a double check, explicit coordinate dependence was included in the following computations but inevitably dropped out due to the form of the functional equations and DC1’. By the argument above, and for simplicity, we drop the explicit coordinate dependence in φ here.)
Case 2: Now consider a different special case in which the marginal posterior distributions for systems 1 and 2 are both completely specified to be P1(x1) and P2(x2), respectively. Maximize the joint entropy S[ρ(x1, x2), ϕ1(x1)ϕ2(x2)] subject to the following constraints on ρ(x1, x2),

$$\int dx_2\,\rho(x_1,x_2) = P_1(x_1) \quad\text{and}\quad \int dx_1\,\rho(x_1,x_2) = P_2(x_2). \tag{22}$$

Again, this is one constraint for each value of x1 and one constraint for each value of x2, which, therefore, require the separate multipliers μ1(x1) and μ2(x2). Maximizing S with respect to these constraints is then,

$$0 = \delta\left( S - \int dx_1\,\mu_1(x_1)\left[\int dx_2\,\rho(x_1,x_2) - P_1(x_1)\right] - \int dx_2\,\mu_2(x_2)\left[\int dx_1\,\rho(x_1,x_2) - P_2(x_2)\right]\right), \tag{23}$$

leading to

μ1 ( x1 ) + μ2 ( x2 ) = φ (ρ( x1 , x2 ), ϕ1 ( x1 ) ϕ2 ( x2 )) . (24)

The updating is being designed so that ϕ1 ϕ2 → P1 P2 , as the independent subsystems are being updated
based on expectation values which are silent about correlations. DC2 thus imposes,

μ1 ( x1 ) + μ2 ( x2 ) = φ ( P1 ( x1 ) P2 ( x2 ), ϕ1 ( x1 ) ϕ2 ( x2 )) . (25)

Write (25) as,

μ1 ( x1 ) = φ ( P1 ( x1 ) P2 ( x2 ), ϕ1 ( x1 ) ϕ2 ( x2 )) − μ2 ( x2 ). (26)

The left-hand side is independent of x2, so we can perform a trick similar to the one we used before. Suppose we had chosen a different constraint P2′(x2) that differs from P2(x2) and a new prior ϕ2′(x2) that differs from ϕ2(x2) except at the value x̄2. At the value x̄2, the multiplier μ1(x1) remains unchanged for all P2′(x2), ϕ2′(x2), and thus x2. This means that any dependence that the right-hand side might potentially have had on x2 and on the choice of P2(x2), ϕ2(x2) must cancel out, leaving μ1(x1) unchanged. That is, the Lagrange multiplier μ2(x2) “pushes out” these dependences such that

$$\varphi\left(P_1(x_1)P_2(x_2),\,\phi_1(x_1)\phi_2(x_2)\right) - \mu_2(x_2) = g(P_1(x_1),\phi_1(x_1)). \tag{27}$$

Because g(P1(x1), ϕ1(x1)) is independent of arbitrary variations of P2(x2) and ϕ2(x2) on the left hand side (LHS) above, (27) is satisfied equally well for all choices. The form of g = φ(P1(x1), ϕ1(x1)) is apparent if P2(x2) = ϕ2(x2) = 1, as μ2(x2) = 0, similar to Case 1 as well as DC1’. Therefore, the Lagrange multiplier is

$$\mu_1(x_1) = \varphi\left(P_1(x_1),\phi_1(x_1)\right). \tag{28}$$


A similar analysis carried out for μ2 ( x2 ) leads to

μ2 ( x2 ) = φ ( P2 ( x2 ), ϕ2 ( x2 )) . (29)

Case 2—Conclusion: Substituting back into (25) gives us a functional equation for φ ,

φ ( P1 P2 , ϕ1 ϕ2 ) = φ ( P1 , ϕ1 ) + φ ( P2 , ϕ2 ) . (30)

The general solution for this functional equation is derived in the Appendix A.3, and is

φ(ρ, ϕ) = a1 ln(ρ( x )) + a2 ln( ϕ( x )), (31)

where a1 , a2 are constants. The constants are ﬁxed by using DC1’. Letting ρ1 ( x1 ) = ϕ1 ( x1 ) = ϕ1 gives
φ( ϕ, ϕ) = 0 by DC1’, and, therefore,

φ( ϕ, ϕ) = ( a1 + a2 ) ln( ϕ) = 0, (32)

so we are forced to conclude a1 = −a2 for arbitrary ϕ. Letting a1 ≡ A = −|A| such that we are really maximizing the entropy (although this is purely aesthetic) gives the general form of φ to be

$$\varphi(\rho,\phi) = -|A|\ln\left(\frac{\rho(x)}{\phi(x)}\right). \tag{33}$$

As long as A ≠ 0, the value of A is arbitrary as it can always be absorbed into the Lagrange multipliers.
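The functional equation (30) and DC1’ can be checked numerically for the logarithmic form. The sketch below is illustrative only: the sample values `P1`, `P2`, `q1`, `q2` and the alternative function `phi_alt` are hypothetical and not taken from the derivation. The alternative satisfies DC1’ but fails the additivity that DC2 demands, which shows that the two criteria together are what single out the logarithm.

```python
import math


def phi(rho, prior, A=1.0):
    """Logarithmic form: phi = -|A| ln(rho / prior)."""
    return -abs(A) * math.log(rho / prior)


def phi_alt(rho, prior):
    """A hypothetical alternative that vanishes at rho == prior (DC1')
    but is not additive over independent subsystems (violates DC2)."""
    return rho / prior - 1.0


# Arbitrary positive (not necessarily normalized) values for two subsystems.
P1, P2 = 0.3, 0.8     # posteriors at some points x1, x2
q1, q2 = 0.5, 0.25    # priors at the same points

lhs = phi(P1 * P2, q1 * q2)
rhs = phi(P1, q1) + phi(P2, q2)

alt_lhs = phi_alt(P1 * P2, q1 * q2)
alt_rhs = phi_alt(P1, q1) + phi_alt(P2, q2)

print(lhs, rhs)          # equal up to rounding
print(alt_lhs, alt_rhs)  # differ
print(phi(q1, q1))       # zero, as DC1' requires
```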
The general form of the entropy designed for the purpose of inference of ρ is found by integrating φ, and, therefore,

$$S(\rho(x),\phi(x)) = -|A|\int dx\left(\rho(x)\ln\frac{\rho(x)}{\phi(x)} - \rho(x)\right) + C[\phi]. \tag{34}$$

The constant in ρ, C[ϕ], will always drop out when varying ρ. The apparent extra term (|A| ∫ρ(x) dx) from integration cannot be dropped while simultaneously satisfying DC1’, which requires ρ(x) = ϕ(x) in the absence of constraints or when there is no change to one’s information. In previous versions where the integration term (|A| ∫ρ(x) dx) is dropped, one obtains solutions like ρ(x) = e^{−1} ϕ(x) (independent of whether ϕ(x) was previously normalized or not) in the absence of new information. This factor can be taken care of by normalization, and, in this way, both forms of the entropy are equally valid; however, this form of the entropy better adheres to the PMU through DC1’.

Given that we may regularly impose normalization, we may drop the extra ∫ρ(x) dx term and C[ϕ]. For convenience then, (34) becomes

$$S(\rho(x),\phi(x)) \rightarrow S^{*}(\rho(x),\phi(x)) = -|A|\int dx\,\rho(x)\ln\frac{\rho(x)}{\phi(x)}, \tag{35}$$

which is a special case when the normalization constraint is being applied. Given normalization is
applied, the same selected posterior ρ( x ) maximizes both S(ρ( x ), ϕ( x )) and S∗ (ρ( x ), ϕ( x )), and the
star notation may be dropped.
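The difference between (34) and (35) in the absence of constraints can be seen in a discrete sketch. The three-state prior below is a hypothetical example, |A| is set to one, and sums replace integrals. The gradient of (34) vanishes at ρ = ϕ, whereas the gradient of (35) vanishes at ρ = ϕ/e, which only returns to ϕ once normalization is imposed.

```python
import math

prior = [0.2, 0.5, 0.3]  # hypothetical normalized prior


def grad_S(rho, prior):
    """Gradient of S = -sum(rho ln(rho/prior) - rho), i.e. -ln(rho_i/prior_i)."""
    return [-math.log(r / p) for r, p in zip(rho, prior)]


def grad_S_star(rho, prior):
    """Gradient of S* = -sum(rho ln(rho/prior)), i.e. -ln(rho_i/prior_i) - 1."""
    return [-math.log(r / p) - 1.0 for r, p in zip(rho, prior)]


# Unconstrained stationary points of the two forms.
stationary_S = list(prior)
stationary_S_star = [p / math.e for p in prior]

# Normalizing the second stationary point recovers the prior.
total = sum(stationary_S_star)
renormalized = [r / total for r in stationary_S_star]

print(grad_S(stationary_S, prior))
print(grad_S_star(stationary_S_star, prior))
print(renormalized)
```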
Remarks
It can be seen that the relative entropy is invariant under coordinate transformations. This implies that a system of coordinates carries no information and it is the “character” of the probability distributions that are being ranked against one another rather than the specific set of propositions or microstates they describe.


The general solution to the maximum entropy procedure with respect to N linear constraints in ρ, ⟨Ai(x)⟩, and normalization gives a canonical-like selected posterior probability distribution,

$$\rho(x) = \phi(x)\exp\Big(\sum_i \alpha_i A_i(x)\Big). \tag{36}$$

The positive constant |A| may always be absorbed into the Lagrange multipliers, so we may let it equal unity without loss of generality. DC1’ is fully realized when we maximize with respect to a constraint on ρ(x) that is already held by ϕ(x), such as ⟨x²⟩ = ∫x² ρ(x) dx when it happens to have the same value as ⟨x²⟩ϕ = ∫x² ϕ(x) dx; its Lagrange multiplier is then forced to be zero, α1 = 0 (as can be seen in (36) using (34)), in agreement with Jaynes. This gives the expected result ρ(x) = ϕ(x), as there is no new information. Our design has arrived at a refined maximum entropy method [12] as a universal probability updating procedure [38].
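The statement about a vanishing multiplier can be illustrated with a discrete sketch. Everything named here is an assumption made for illustration: the three-state prior, the function values `A`, the bisection solver `solve_alpha`, and the choice to enforce normalization by dividing by the sum rather than through an explicit multiplier. When the constrained expectation equals the value already held by the prior, the solver returns α ≈ 0 and the posterior equals the prior; a different target yields a genuinely updated posterior.

```python
import math

prior = [0.5, 0.3, 0.2]   # hypothetical prior over three microstates
A = [1.0, 2.0, 3.0]       # hypothetical constrained function A(x)


def posterior(prior, A, alpha):
    """Canonical-like posterior prior_i * exp(alpha * A_i), normalized."""
    weights = [p * math.exp(alpha * a) for p, a in zip(prior, A)]
    z = sum(weights)
    return [w / z for w in weights]


def expectation(p, A):
    return sum(pi * ai for pi, ai in zip(p, A))


def solve_alpha(prior, A, target, lo=-50.0, hi=50.0):
    """Bisection on alpha; the expectation is increasing in alpha because
    its derivative with respect to alpha is the variance of A."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if expectation(posterior(prior, A, mid), A) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Constraint already satisfied by the prior: no new information.
alpha_same = solve_alpha(prior, A, expectation(prior, A))
rho_same = posterior(prior, A, alpha_same)

# Constraint that disagrees with the prior: the posterior must change.
alpha_new = solve_alpha(prior, A, 2.2)
rho_new = posterior(prior, A, alpha_new)

print(alpha_same, rho_same)
print(alpha_new, rho_new)
```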

3. The Design of the Quantum Relative Entropy

In the last section, we assumed that the universe of discourse (the set of relevant propositions or microstates) X = A × B × ... was known. In quantum physics, things are a bit more ambiguous because many probability distributions, or many experiments, can be associated with a given density matrix. In this sense, it is helpful to think of density matrices as “placeholders” for probability distributions rather than as probability distributions themselves. As any probability distribution from a given density matrix, ρ(·) = Tr(|·⟩⟨·| ρ̂), may be ranked using the standard relative entropy, it is unclear why we would choose one universe of discourse over another. So that no universe of discourse is given preferential treatment, we instead consider ranking entire density matrices against one another. Probability distributions of interest may be found from the selected posterior density matrix. This moves our universe of discourse from sets of propositions X → H to Hilbert space(s).
When the objects of study are quantum systems, we desire an objective procedure to update from
a prior density matrix ϕ̂ to a posterior density matrix ρ̂. We will apply the same intuition for ranking
probability distributions (Section 2) and implement the PMU, PI, and design criteria to the ranking
of density matrices. We therefore ﬁnd the quantum relative entropy S(ρ̂, ϕ̂) to be designed for the
purpose of inferentially updating density matrices.

3.1. Designing the Quantum Relative Entropy

In this section, we design the quantum relative entropy using the same inferentially guided design
criteria as were used in the standard relative entropy.
DC1: Subdomain Independence
The goal is to design a function S(ρ̂, ϕ̂) that is able to rank density matrices. This requires that S(ρ̂, ϕ̂) be a real scalar valued function of the posterior ρ̂ and prior ϕ̂ density matrices, which we will call the quantum relative entropy or simply the entropy. An arbitrary variation of the entropy with respect to ρ̂ is,

$$\delta S(\hat\rho,\hat\phi) = \sum_{ij}\frac{\delta S(\hat\rho,\hat\phi)}{\delta\rho_{ij}}\,\delta\rho_{ij} = \sum_{ij}\left(\frac{\delta S(\hat\rho,\hat\phi)}{\delta\hat\rho}\right)_{ij}\delta(\hat\rho)_{ij} = \sum_{ij}\left(\frac{\delta S(\hat\rho,\hat\phi)}{\delta\hat\rho^{T}}\right)_{ji}\delta(\hat\rho)_{ij} = \mathrm{Tr}\left(\frac{\delta S(\hat\rho,\hat\phi)}{\delta\hat\rho^{T}}\,\delta\hat\rho\right), \tag{37}$$

where Tr(...) is the trace. We wish to maximize this entropy with respect to expectation value constraints, such as ⟨A⟩ = Tr(Âρ̂) on ρ̂. Using the Lagrange multiplier method to maximize the entropy with respect to ⟨A⟩ and normalization, and setting the variation equal to zero,

$$\delta\Big( S(\hat\rho,\hat\phi) - \lambda\big[\mathrm{Tr}(\hat\rho) - 1\big] - \alpha\big[\mathrm{Tr}(\hat A\hat\rho) - \langle A\rangle\big]\Big) = 0, \tag{38}$$


where λ and α are the Lagrange multipliers for the respective constraints. Because S(ρ̂, ϕ̂) is a real number, we inevitably require δS to be real; without imposing this directly, we find that requiring δS to be real requires ρ̂ and Â to be Hermitian. At this point, it is simpler to allow for arbitrary variations of ρ̂ such that,

$$\mathrm{Tr}\left(\left(\frac{\delta S(\hat\rho,\hat\phi)}{\delta\hat\rho^{T}} - \lambda\hat{1} - \alpha\hat{A}\right)\delta\hat\rho\right) = 0. \tag{39}$$

For these arbitrary variations, the variational derivative of S must satisfy,

$$\frac{\delta S(\hat\rho,\hat\phi)}{\delta\hat\rho^{T}} = \lambda\hat{1} + \alpha\hat{A} \tag{40}$$

at the maximum. As in the remark earlier, all forms of S that give the correct form of δS(ρ̂, ϕ̂)/δρ̂^T under variation are equally valid for the purpose of inference. For notational convenience, we let

$$\frac{\delta S(\hat\rho,\hat\phi)}{\delta\hat\rho^{T}} \equiv \varphi(\hat\rho,\hat\phi), \tag{41}$$

which is a matrix valued function of the posterior and prior density matrices. The form of φ(ρ̂, ϕ̂) is already “local” in ρ̂ (the variational derivative is with respect to the whole density matrix), so we don’t need to constrain it further as we did in the original DC1.
DC1’: In the absence of new information, the new state ρ̂ is equal to the old state ϕ̂
Applied to the ranking of density matrices, in the absence of new information, the density matrix ϕ̂ should not change; that is, the posterior density matrix ρ̂ = ϕ̂ is equal to the prior density matrix. Maximizing the entropy without applying any constraints gives,

$$\frac{\delta S(\hat\rho,\hat\phi)}{\delta\hat\rho^{T}} = \hat{0}, \tag{42}$$

and, therefore, DC1’ imposes the following condition in this case:

$$\frac{\delta S(\hat\rho,\hat\phi)}{\delta\hat\rho^{T}} = \varphi(\hat\rho,\hat\phi) = \varphi(\hat\phi,\hat\phi) = \hat{0}. \tag{43}$$

As in the original DC1’, if ϕ̂ is known to obey some expectation value ⟨Â⟩, and if one then goes out of one’s way to constrain ρ̂ to that expectation value and nothing else, it follows from the PMU that ρ̂ = ϕ̂, as no information has been gained. This is not imposed directly but can be verified later.
DC2: Subsystem Independence
The discussion of DC2 is the same as the standard relative entropy DC2—it is not a consistency
requirement, and the updating is designed so that the independence reflected in the prior is maintained
in the posterior by default via the PMU when the information provided is silent about correlations.
DC2 Implementation
Consider a composite system living in the Hilbert space H = H1 ⊗ H2 . Assume that all prior
evidence led us to believe the systems were independent. This is reflected in the prior density matrix:
if the individual system priors are ϕ̂1 and ϕ̂2 , then the joint prior for the whole system is ϕ̂1 ⊗ ϕ̂2 .
Further suppose that new information is acquired such that ϕ̂1 would by itself be updated to ρ̂1 and that ϕ̂2 would by itself be updated to ρ̂2. By design, the implementation of DC2 constrains the entropy
functional such that in this case, the joint product prior density matrix ϕ̂1 ⊗ ϕ̂2 updates to the product
posterior ρ̂1 ⊗ ρ̂2 so that inferences about one do not affect inferences about the other.
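A small sketch may help make the product structure concrete. The helper names `kron`, `trace_out_2`, and `trace_out_1`, as well as the two real symmetric 2×2 matrices, are hypothetical choices for illustration. The partial traces of a product state return the individual density matrices, so information about one factor leaves the other untouched.

```python
def kron(a, b):
    """Kronecker (tensor) product of two square matrices given as lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]


def trace_out_2(rho12, n, m):
    """Partial trace over the second factor (dimension m); returns n x n."""
    return [[sum(rho12[i * m + k][j * m + k] for k in range(m))
             for j in range(n)] for i in range(n)]


def trace_out_1(rho12, n, m):
    """Partial trace over the first factor (dimension n); returns m x m."""
    return [[sum(rho12[k * m + i][k * m + j] for k in range(n))
             for j in range(m)] for i in range(m)]


# Hypothetical single-system posteriors (unit trace, positive definite).
rho1 = [[0.7, 0.1], [0.1, 0.3]]
rho2 = [[0.4, -0.2], [-0.2, 0.6]]

rho12 = kron(rho1, rho2)          # product posterior
marginal_1 = trace_out_2(rho12, 2, 2)
marginal_2 = trace_out_1(rho12, 2, 2)

print(marginal_1)
print(marginal_2)
```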


The argument below is considerably simpliﬁed if we expand the space of density matrices to
include density matrices that are not necessarily normalized. This does not represent any limitation
because normalization can always be easily achieved as one additional constraint. We consider a few
special cases below:
Case 1: We receive the extremely constraining information that the posterior distribution for system 1 is completely specified to be ρ̂1 while we receive no information about system 2 at all. We treat the two systems jointly. Maximize the joint entropy S[ρ̂12, ϕ̂1 ⊗ ϕ̂2], subject to the following constraints on ρ̂12,

$$\mathrm{Tr}_2(\hat\rho_{12}) = \hat\rho_1. \tag{44}$$

Notice all of the N² elements in H1 of ρ̂12 are being constrained. We therefore need a Lagrange multiplier which spans H1, and therefore it is a square matrix λ̂1. This is readily seen by observing the component form expressions of the Lagrange multipliers (λ̂1)ij = λij. Maximizing the entropy with respect to this H2 independent constraint is

$$0 = \delta\Big( S - \sum_{ij}\lambda_{ij}\big[\mathrm{Tr}_2(\hat\rho_{12}) - \hat\rho_1\big]_{ij}\Big), \tag{45}$$

but reexpressing this with its transpose, (λ̂1)ij = (λ̂1^T)ji, gives

$$0 = \delta\Big( S - \mathrm{Tr}_1\big(\hat\lambda_1\big[\mathrm{Tr}_2(\hat\rho_{12}) - \hat\rho_1\big]\big)\Big), \tag{46}$$

where we have relabeled λ̂1^T → λ̂1, for convenience, as the names of the Lagrange multipliers are arbitrary. For arbitrary variations of ρ̂12, we therefore have

λ̂1 ⊗ 1̂2 = φ (ρ̂12 , ϕ̂1 ⊗ ϕ̂2 ) . (47)

DC2 is implemented by requiring ϕ̂1 ⊗ ϕ̂2 → ρ̂1 ⊗ ϕ̂2 , such that the function φ is designed to reﬂect
subsystem independence in this case; therefore, we have

λ̂1 ⊗ 1̂2 = φ (ρ̂1 ⊗ ϕ̂2 , ϕ̂1 ⊗ ϕ̂2 ) . (48)

Had we chosen a different prior ϕ̂2′ = ϕ̂2 + δϕ̂2, for all δϕ̂2 the LHS λ̂1 ⊗ 1̂2 remains unchanged given that φ is independent of scalar functions of ϕ̂2, as those could be lumped into λ̂1 while keeping ρ̂1 fixed. (I would like to thank M. Krumm for pointing this out.) The potential dependence on scalar functions of ϕ̂2 can be removed by imposing DC2 in a subsystem independent situation where ρ̂1 in φ need not be fixed under variations of ϕ̂2. For instance, maximizing the entropy of an independent joint prior with respect to Tr(Â1 ⊗ 1̂2 · ρ̂12) = ⟨A⟩, facilitated by a scalar Lagrange multiplier λ, gives, after imposing DC2,

$$\lambda\,\hat{A}_1\otimes\hat{1}_2 = \varphi\left(\hat\rho_1\otimes\hat\phi_2,\ \hat\phi_1\otimes\hat\phi_2\right). \tag{49}$$

For subsystem independence to be imposed here, ρ̂1 must be independent of variations in ϕ̂2, and, therefore, in a general subsystem independent case, φ is independent of scalar functions of ϕ̂2. This means that any dependence that the right-hand side of (48) might potentially have had on ϕ̂2 must drop out, meaning,

$$\varphi\left(\hat\rho_1\otimes\hat\phi_2,\ \hat\phi_1\otimes\hat\phi_2\right) = f(\hat\rho_1,\hat\phi_1)\otimes\hat{1}_2. \tag{50}$$

Since ϕ̂2 is arbitrary, suppose further that we choose a unit prior, ϕ̂2 = 1̂2, and note that ρ̂1 ⊗ 1̂2 and ϕ̂1 ⊗ 1̂2 are block diagonal in H2. Because the LHS is block diagonal in H2,

$$f(\hat\rho_1,\hat\phi_1)\otimes\hat{1}_2 = \varphi\left(\hat\rho_1\otimes\hat{1}_2,\ \hat\phi_1\otimes\hat{1}_2\right). \tag{51}$$


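The RHS is block diagonal in H2 and, because the function φ is understood to be a power series expansion in its arguments,

$$f(\hat\rho_1,\hat\phi_1)\otimes\hat{1}_2 = \varphi\left(\hat\rho_1\otimes\hat{1}_2,\ \hat\phi_1\otimes\hat{1}_2\right) = \varphi\left(\hat\rho_1,\hat\phi_1\right)\otimes\hat{1}_2. \tag{52}$$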

This gives
λ̂1 ⊗ 1̂2 = φ (ρ̂1 , ϕ̂1 ) ⊗ 1̂2 , (53)

and, therefore, the 1̂2 factors out and λ̂1 = φ (ρ̂1 , ϕ̂1 ). A similar argument exchanging systems 1 and 2
shows λ̂2 = φ (ρ̂2 , ϕ̂2 ).
Case 1—Conclusion: The analysis leads us to conclude that when the system 2 is not updated,
the dependence on ϕ̂2 drops out,

φ (ρ̂1 ⊗ ϕ̂2 , ϕ̂1 ⊗ ϕ̂2 ) = φ (ρ̂1 , ϕ̂1 ) ⊗ 1̂2 , (54)

and, similarly,
φ ( ϕ̂1 ⊗ ρ̂2 , ϕ̂1 ⊗ ϕ̂2 ) = 1̂1 ⊗ φ (ρ̂2 , ϕ̂2 ) . (55)

Case 2: Now consider a different special case in which the marginal posterior distributions for systems
1 and 2 are both completely speciﬁed to be ρ̂1 and ρ̂2 , respectively. Maximize the joint entropy,
S[ρ̂12 , ϕ̂1 ⊗ ϕ̂2 ], subject to the following constraints on the ρ̂12 ,

Tr2 (ρ̂12 ) = ρ̂1 and Tr1 (ρ̂12 ) = ρ̂2 , (56)

where Tri(...) is the partial trace function, which is a trace over the vectors in Hi. Here, each expectation value constrains the entire space Hi, where ρ̂i lives. The Lagrange multipliers must span their respective spaces, so we implement the constraint with the Lagrange multiplier operator μ̂i, then,

0 = δ S − Tr1 (μ̂1 [Tr2 (ρ̂12 ) − ρ̂1 ]) − Tr2 (μ̂2 [Tr1 (ρ̂12 ) − ρ̂2 ]) . (57)

For arbitrary variations of ρ̂12 , we have

μ̂1 ⊗ 1̂2 + 1̂1 ⊗ μ̂2 = φ (ρ̂12 , ϕ̂1 ⊗ ϕ̂2 ) . (58)

By design, DC2 is implemented by requiring ϕ̂1 ⊗ ϕ̂2 → ρ̂1 ⊗ ρ̂2 in this case; therefore, we have

μ̂1 ⊗ 1̂2 + 1̂1 ⊗ μ̂2 = φ (ρ̂1 ⊗ ρ̂2 , ϕ̂1 ⊗ ϕ̂2 ) . (59)

Write (59) as
μ̂1 ⊗ 1̂2 = φ (ρ̂1 ⊗ ρ̂2 , ϕ̂1 ⊗ ϕ̂2 ) − 1̂1 ⊗ μ̂2 . (60)

The LHS is independent of changes that might occur in H2 on the RHS of (60). This means that any
variation of ρ̂2 and ϕ̂2 must be “pushed out” by μ̂2 —it removes the dependence of ρ̂2 and ϕ̂2 in φ.
Any dependence that the RHS might potentially have had on ρ̂2 , ϕ̂2 must cancel out in a general
subsystem independent case, leaving μ̂1 unchanged. Consequently,

φ (ρ̂1 ⊗ ρ̂2 , ϕ̂1 ⊗ ϕ̂2 ) − 1̂1 ⊗ μ̂2 = g(ρ̂1 , ϕ̂1 ) ⊗ 1̂2 . (61)


Because g(ρ̂1, ϕ̂1) is independent of arbitrary variations of ρ̂2 and ϕ̂2 on the LHS above, (61) is satisfied equally well for all choices. The form of g(ρ̂1, ϕ̂1) reduces to the form of f(ρ̂1, ϕ̂1) from Case 1 when ρ̂2 = ϕ̂2 = 1̂2 and, similarly, DC1’ gives μ̂2 = 0. Therefore, the Lagrange multiplier is

μ̂1 ⊗ 1̂2 = φ(ρ̂1 , ϕ̂1 ) ⊗ 1̂2 . (62)

A similar analysis is carried out for μ̂2 leading to

1̂1 ⊗ μ̂2 = 1̂1 ⊗ φ(ρ̂2 , ϕ̂2 ). (63)

Case 2—Conclusion: Substituting back into (59) gives us a functional equation for φ ,

φ(ρ̂1 ⊗ ρ̂2 , ϕ̂1 ⊗ ϕ̂2 ) = φ(ρ̂1 , ϕ̂1 ) ⊗ 1̂2 + 1̂1 ⊗ φ(ρ̂2 , ϕ̂2 ), (64)

which is

φ(ρ̂1 ⊗ ρ̂2 , ϕ̂1 ⊗ ϕ̂2 ) = φ(ρ̂1 ⊗ 1̂2 , ϕ̂1 ⊗ 1̂2 ) + φ(1̂1 ⊗ ρ̂2 , 1̂1 ⊗ ϕ̂2 ). (65)

The general solution to this matrix valued functional equation is derived in Appendix A.5 and is

$$\varphi(\hat\rho,\hat\phi) = \tilde{A}\ln(\hat\rho) + \tilde{B}\ln(\hat\phi), \tag{66}$$

where Ã is a “super-operator” having constant coefficients and twice the number of indices as ρ̂ and ϕ̂, as discussed in the Appendix (i.e., $(\tilde{A}\ln(\hat\rho))_{ij} = \sum_{k\ell} A_{ijk\ell}\,(\log(\hat\rho))_{k\ell}$, and similarly for B̃ ln(ϕ̂)). DC1’ imposes

$$\varphi(\hat\phi,\hat\phi) = \tilde{A}\ln(\hat\phi) + \tilde{B}\ln(\hat\phi) = \hat{0}, \tag{67}$$

which is satisfied in general when Ã = −B̃, and, now,

$$\varphi(\hat\rho,\hat\phi) = \tilde{A}\big(\ln(\hat\rho) - \ln(\hat\phi)\big). \tag{68}$$

We may fix the constant Ã by substituting our solution into the RHS of Equation (64), which is equal to the RHS of Equation (65),

$$\tilde{A}_1\big(\ln(\hat\rho_1) - \ln(\hat\phi_1)\big)\otimes\hat{1}_2 + \hat{1}_1\otimes\tilde{A}_2\big(\ln(\hat\rho_2) - \ln(\hat\phi_2)\big) = \tilde{A}_{12}\big(\ln(\hat\rho_1\otimes\hat{1}_2) - \ln(\hat\phi_1\otimes\hat{1}_2)\big) + \tilde{A}_{12}\big(\ln(\hat{1}_1\otimes\hat\rho_2) - \ln(\hat{1}_1\otimes\hat\phi_2)\big), \tag{69}$$

where Ã12 acts on the joint space of 1 and 2, and Ã1, Ã2 act on the single subspaces 1 or 2, respectively. Using the well known log tensor product identity in this case, ln(ρ̂1 ⊗ 1̂2) = ln(ρ̂1) ⊗ 1̂2 (the proof is demonstrated by taking the log of ρ̂1 ⊗ 1̂2 ≡ exp(ρ̂1′) ⊗ 1̂2 = exp(ρ̂1′ ⊗ 1̂2) and substituting ρ̂1′ = log(ρ̂1)), the RHS of Equation (69) becomes

$$= \tilde{A}_{12}\big(\ln(\hat\rho_1)\otimes\hat{1}_2 - \ln(\hat\phi_1)\otimes\hat{1}_2\big) + \tilde{A}_{12}\big(\hat{1}_1\otimes\ln(\hat\rho_2) - \hat{1}_1\otimes\ln(\hat\phi_2)\big). \tag{70}$$
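The exponential form of the log tensor product identity, exp(X ⊗ 1) = exp(X) ⊗ 1, can be checked numerically. This is a sketch only: the matrix `X` is a hypothetical stand-in for the logarithm of a 2×2 density matrix, and `expm_series` is a plain truncated Taylor series written for this illustration, adequate for small matrices of modest norm but not a general-purpose matrix exponential.

```python
def kron(a, b):
    """Kronecker (tensor) product of two square matrices given as lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm_series(x, terms=30):
    """Matrix exponential from a truncated Taylor series sum_k x^k / k!."""
    n = len(x)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, x)]
        total = [[total[i][j] + term[i][j] for j in range(n)]
                 for i in range(n)]
    return total


X = [[-0.4, 0.2], [0.2, -1.1]]     # hypothetical stand-in for ln(rho_1)
I2 = [[1.0, 0.0], [0.0, 1.0]]

lhs = expm_series(kron(X, I2))     # exp(X (x) 1)
rhs = kron(expm_series(X), I2)     # exp(X) (x) 1

max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(4) for j in range(4))
print(max_diff)
```

Taking the logarithm of both sides with X = ln(ρ̂1) gives the identity used in the text.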

Note that arbitrarily letting ρ̂2 = ϕ̂2 gives

$$\tilde{A}_1\big(\ln(\hat\rho_1) - \ln(\hat\phi_1)\big)\otimes\hat{1}_2 = \tilde{A}_{12}\big(\ln(\hat\rho_1)\otimes\hat{1}_2 - \ln(\hat\phi_1)\otimes\hat{1}_2\big), \tag{71}$$

or arbitrarily letting ρ̂1 = ϕ̂1 gives

$$\hat{1}_1\otimes\tilde{A}_2\big(\ln(\hat\rho_2) - \ln(\hat\phi_2)\big) = \tilde{A}_{12}\big(\hat{1}_1\otimes\ln(\hat\rho_2) - \hat{1}_1\otimes\ln(\hat\phi_2)\big). \tag{72}$$

As Ã12, Ã1, and Ã2 are constant tensors, inspecting the above equalities determines the form of the tensor to be Ã = A 1̃, where A is a scalar constant and 1̃ is the super-operator identity over the appropriate (joint) Hilbert space.
Because our goal is to maximize the entropy function, we let the arbitrary constant A = −|A| and distribute 1̃ identically, which gives the final functional form,

$$\varphi(\hat\rho,\hat\phi) = -|A|\big(\ln(\hat\rho) - \ln(\hat\phi)\big). \tag{73}$$

“Integrating” φ gives a general form for the quantum relative entropy,

$$S(\hat\rho,\hat\phi) = -|A|\,\mathrm{Tr}(\hat\rho\log\hat\rho - \hat\rho\log\hat\phi - \hat\rho) + C[\hat\phi] = -|A|\,S_U(\hat\rho,\hat\phi) + |A|\,\mathrm{Tr}(\hat\rho) + C[\hat\phi], \tag{74}$$

where S_U(ρ̂, ϕ̂) is Umegaki’s form of the relative entropy [42–44], the extra |A|Tr(ρ̂) from integration is an artifact present for the preservation of DC1’, and C[ϕ̂] is a constant in the sense that it drops out under arbitrary variations of ρ̂. This entropy leads to the same inferences as Umegaki’s form of the entropy, with the added bonus that ρ̂ = ϕ̂ in the absence of constraints or changes in information, rather than ρ̂ = e^{−1} ϕ̂, which would be given by maximizing Umegaki’s form of the entropy. In this sense, the extra |A|Tr(ρ̂) only improves the inference process, as it more readily adheres to the PMU through DC1’; however, now, because S_U ≥ 0, we have S(ρ̂, ϕ̂) ≤ Tr(ρ̂) + C[ϕ̂], which is only a minor nuisance. In the spirit of this derivation, we will keep the Tr(ρ̂) term there, but, for all practical purposes of inference, as long as there is a normalization constraint, it plays no role, and we find (letting |A| = 1 and C[ϕ̂] = 0),

$$S(\hat\rho,\hat\phi) \rightarrow S^{*}(\hat\rho,\hat\phi) = -S_U(\hat\rho,\hat\phi) = -\mathrm{Tr}(\hat\rho\log\hat\rho - \hat\rho\log\hat\phi), \tag{75}$$

Umegaki’s form of the relative entropy. S∗ (ρ̂, ϕ̂) is an equally valid entropy because, given normalization
is applied, the same selected posterior ρ̂ maximizes both S(ρ̂, ϕ̂) and S∗ (ρ̂, ϕ̂).
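Umegaki’s form can be evaluated directly for small examples. The following sketch is restricted, as an assumption made for brevity, to real symmetric 2×2 density matrices, for which functions of a matrix follow from the closed-form spectral decomposition; the helper `sym2_apply` and the sample matrices are hypothetical. When the two matrices commute (here, both diagonal) the value coincides with the classical relative entropy of the diagonal entries, and the quantity vanishes when posterior and prior coincide.

```python
import math


def sym2_apply(m, f):
    """Apply a scalar function f to a real symmetric 2x2 matrix
    through its spectral decomposition."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    mean = 0.5 * (a + d)
    r = math.hypot(0.5 * (a - d), b)      # half the eigenvalue gap
    if r == 0.0:                          # multiple of the identity
        return [[f(a), 0.0], [0.0, f(a)]]
    lam_plus, lam_minus = mean + r, mean - r
    # Projector onto the lam_plus eigenvector: (M - lam_minus * I) / (2 r).
    p = [[(a - lam_minus) / (2 * r), b / (2 * r)],
         [b / (2 * r), (d - lam_minus) / (2 * r)]]
    fp, fm = f(lam_plus), f(lam_minus)
    return [[fm * (1.0 if i == j else 0.0) + (fp - fm) * p[i][j]
             for j in range(2)] for i in range(2)]


def trace_of_product(x, y):
    return sum(x[i][k] * y[k][i] for i in range(2) for k in range(2))


def umegaki(rho, prior):
    """S_U = Tr(rho ln rho - rho ln prior) for positive definite inputs."""
    return (trace_of_product(rho, sym2_apply(rho, math.log))
            - trace_of_product(rho, sym2_apply(prior, math.log)))


rho_diag = [[0.7, 0.0], [0.0, 0.3]]
prior_diag = [[0.4, 0.0], [0.0, 0.6]]
classical = 0.7 * math.log(0.7 / 0.4) + 0.3 * math.log(0.3 / 0.6)

rho_x = [[0.5, 0.3], [0.3, 0.5]]   # does not commute with prior_diag

print(umegaki(rho_diag, prior_diag), classical)
print(umegaki(rho_x, prior_diag))
print(umegaki(rho_x, rho_x))
```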

3.2. Remarks
Due to the universality and the equal application of the PMU by using the same design criteria for both the standard and quantum case, the quantum relative entropy reduces to the standard relative entropy when [ρ̂, ϕ̂] = 0 or when the experiment being performed, ρ̂ → ρ(a) = Tr(ρ̂ |a⟩⟨a|), is known. The quantum relative entropy we derive has the correct asymptotic form of the standard relative entropy in the sense of [8–10]. Further connections will be illustrated in a follow up article that is concerned with direct applications of the quantum relative entropy. Because the two entropies are derived in parallel, we expect the well-known inferential results and consequences of the relative entropy to have a quantum relative entropy representation.
Maximizing the quantum relative entropy with respect to some constraints ⟨Âi⟩, where {Âi} are a set of arbitrary Hermitian operators, and normalization ⟨1̂⟩ = 1, gives the following general solution for the posterior density matrix:

$$\hat\rho = \exp\Big(\alpha_0\hat{1} + \sum_i \alpha_i\hat{A}_i + \ln(\hat\phi)\Big) = \frac{1}{Z}\exp\Big(\sum_i \alpha_i\hat{A}_i + \ln(\hat\phi)\Big) \equiv \frac{1}{Z}\exp\big(\hat{C}\big), \tag{76}$$

where αi are the Lagrange multipliers of the respective constraints and normalization may be factored
out of the exponential in general because the identity commutes universally. If ϕ̂ ∝ 1̂, it is well
known that the analysis arrives at the same expression for ρ̂ after normalization, as it would if the

423
Entropy 2017, 19, 664

von Neumann entropy were used, and thus one can find expressions for thermalized quantum states
ρ̂ = (1/Z) e^{−βĤ}. The remaining problem is to solve for the N Lagrange multipliers using their N associated
expectation value constraints. In principle, their solution is found by computing Z and using standard
methods from Statistical Mechanics,

$$\langle \hat{A}_i \rangle = -\frac{\partial}{\partial \alpha_i} \ln(Z), \tag{77}$$

and inverting to find αi = αi(⟨Âi⟩), which has a unique solution due to the joint concavity (convexity
depending on the sign convention) of the quantum relative entropy [8,9] when the constraints are
linear in ρ̂. The simple proof that (77) is monotonic in α, and therefore invertible, is that its derivative
satisfies ∂⟨Âi⟩/∂αi = ⟨Âi²⟩ − ⟨Âi⟩² ≥ 0. Between the Zassenhaus formula [45]

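$$e^{t(\hat{A}+\hat{B})} = e^{t\hat{A}}\, e^{t\hat{B}}\, e^{-\frac{t^2}{2}[\hat{A},\hat{B}]}\, e^{\frac{t^3}{6}\left(2[\hat{B},[\hat{A},\hat{B}]]+[\hat{A},[\hat{A},\hat{B}]]\right)} \cdots, \tag{78}$$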

and Horn’s inequality [46–48], the solutions to (77) lack a certain calculational elegance because it is
difficult to express the eigenvalues of Ĉ = log(ϕ̂) + ∑ αi Âi (in the exponential) in simple terms of the
eigenvalues of the Âi’s and ϕ̂, in general, when the matrices do not commute. The solution requires
solving the eigenvalue problem for Ĉ, such that the exponential of Ĉ may be taken and evaluated in
terms of the eigenvalues of the αi Âi’s and the prior density matrix ϕ̂. A pedagogical exercise is starting
with a prior that is a mixture of spin-z up and down, ϕ̂ = a|+⟩⟨+| + b|−⟩⟨−| (a, b ≠ 0), and maximizing
the quantum relative entropy with respect to an expectation of a general Hermitian operator with
which the prior density matrix does not commute. This example for spin is given in Appendix B.

4. Conclusions
This approach emphasizes the notion that entropy is a tool for performing inference and
downplays counter-notional issues that arise if one interprets entropy as a measure of disorder,
a measure of distinguishability, or an amount of missing information [7]. Because the same design
criteria, guided by the PMU, are applied equally well to the design of a relative and quantum relative
entropy, we find that both the relative and quantum relative entropy are designed for the purpose of
inference. Because the quantum relative entropy is the functional that fits the requirements of a tool
designed for the inference of density matrices, we now know what it is and how to use it—formulating
an inferential quantum maximum entropy method. This article provides the foundation for [29], which,
in particular, derives the Quantum Bayes Rule and collapse as special cases of the quantum maximum
entropy method, as was sought in [24], analogous to the treatment in [38,40] for deriving Bayes Rule using
the standard maximum entropy method. The quantum maximum entropy method thereby unifies
a few topics in Quantum Information and Quantum Measurement through entropic inference.

Acknowledgments: I must give ample acknowledgment to Ariel Caticha who suggested the problem of justifying
the form of the quantum relative entropy as a criterion for ranking of density matrices. He cleared up several
difficulties by suggesting that design constraints be applied to the variational derivative of the entropy rather
than the entropy itself. In addition, he provided substantial improvements to the method for imposing DC2 that
led to the functional equations for the variational derivatives (φ12 = φ1 + φ2 )—with more rigor than in earlier
versions of this article. His time and guidance are all greatly appreciated—thanks, Ariel. I would also like to
thank M. Krumm, the reviewers, as well as our information physics group at UAlbany for our many intriguing
discussions about probability, inference, and quantum mechanics.
Conflicts of Interest: The author declares no conflict of interest.

Appendix A
The Appendix loosely follows the relevant sections in [49], and then uses the methods reviewed to
solve the relevant functional equations for φ. The last section is an example of the quantum maximum
entropy method applied to a mixed spin state.


Appendix A.1. Simple Functional Equations

From [49] pages 31–44.

Theorem A1. If Cauchy’s functional equation

f ( x + y) = f ( x ) + f (y) (A1)

is satisfied for all real x, y, and if the function f(x) is (a) continuous at a point, (b) nonnegative for small positive
x’s, or (c) bounded in an interval, then,

f ( x ) = cx (A2)

is the solution to (A1) for all real x. If (A1) is assumed only over all positive x, y, then under the same
conditions, (A2) holds for all positive x.

Proof. The most natural assumption for our purposes is that f(x) is continuous at a point (which later
extends to continuity at all points as given by Darboux [50]). Cauchy solved the functional equation by
induction. In particular, Equation (A1) implies,

$$f\Big(\sum_i x_i\Big) = \sum_i f(x_i), \tag{A3}$$

and if we let each xi = x as a special case to determine f, we find

$$f(nx) = n f(x). \tag{A4}$$

We may let nx = mt such that

$$f(x) = f\Big(\frac{m}{n}\, t\Big) = \frac{m}{n} f(t). \tag{A5}$$

Letting lim_{t→1} f(t) = f(1) = c gives

$$f\Big(\frac{m}{n}\Big) = \frac{m}{n} f(1) = \frac{m}{n}\, c, \tag{A6}$$

and, because for t = 1, x = m/n above, we have

f ( x ) = cx, (A7)

which is the general solution of the linear functional equation. In principle, c can be complex.
The importance of Cauchy’s solution is that it can be used to give general solutions to the following
Cauchy equations:

f ( x + y) = f ( x ) f ( y ), (A8)
f ( xy) = f ( x ) + f ( y ), (A9)
f ( xy) = f ( x ) f ( y ), (A10)

by performing consistent substitution until they are the same form as (A1), as given by Cauchy. We will
briefly discuss the first two.

Theorem A2. The general solution of f(x + y) = f(x) f(y), for all real or for all positive x, y and within the
class of functions that are continuous at one point, is f(x) = e^{cx}. In addition to the exponential solution,
the solution f(0) = 1 and f(x) = 0 for (x > 0) is also in these classes of functions.


The first functional equation, f(x + y) = f(x) f(y), is solved by first noting that f(x) is strictly positive for real x
(once the trivial zero solution is excluded), which can be shown by considering x = y,

f (2x ) = f ( x )2 > 0. (A11)

If there exists an x0 with f(x0) = 0, then it follows that f(x) = f((x − x0) + x0) = f(x − x0) f(x0) = 0, a trivial solution, hence the reason
why the possibility of being equal to zero is excluded above. Given that f(x) is nowhere zero, we are justified in
taking the natural logarithm ln(f(x)), due to its positivity f(x) > 0. This gives,

ln( f ( x + y)) = ln( f ( x )) + ln( f (y)), (A12)

and letting g( x ) = ln( f ( x )) gives,

g ( x + y ) = g ( x ) + g ( y ), (A13)

which is Cauchy’s linear equation, and thus has the solution g(x) = cx. Because g(x) = ln(f(x)), one finds in
general that f(x) = e^{cx}.

Theorem A3. If the functional equation f ( xy) = f ( x ) + f (y) is valid for all positive x, y then its general
solution is f ( x ) = c ln( x ) given it is continuous at a point. If x = 0 (or y = 0) are valid, then the general
solution is f ( x ) = 0. If all real x, y are valid except 0, then the general solution is f ( x ) = c ln(| x |).
In particular, we are interested in the functional equation f(xy) = f(x) + f(y) when x, y are positive.
In this case, we can again follow Cauchy and substitute x = e^u and y = e^v to get,

$$f(e^u e^v) = f(e^u) + f(e^v), \tag{A14}$$

and letting g(u) = f(e^u) gives g(u + v) = g(u) + g(v). Again, the solution is g(u) = cu and, therefore,
the general solution is f(x) = c ln(x) when we substitute for u. If x could equal 0, then f(0) = f(x) + f(0),
which has the trivial solution f(x) = 0. The general solution for x ≠ 0, y ≠ 0 and x, y positive is therefore
f(x) = c ln(x).

Appendix A.2. Functional Equations with Multiple Arguments

From [49] pages 213–217. Consider the functional equation,

F ( x1 + y1 , x2 + y2 , ..., xn + yn ) = F ( x1 , x2 , ..., xn ) + F (y1 , y2 , ..., yn ), (A15)

which is a generalization of Cauchy’s linear functional Equation (A1) to several arguments.

Letting x2 = x3 = ... = xn = y2 = y3 = ... = yn = 0 gives

F ( x1 + y1 , 0, ..., 0) = F ( x1 , 0, ..., 0) + F (y1 , 0, ..., 0), (A16)

which is the Cauchy linear functional equation having solution F(x1, 0, ..., 0) = c1 x1, where F(x1, 0, ..., 0)
is assumed to be continuous or at least to have a measurable majorant. Similarly,

F (0, ..., 0, xk , 0, ..., 0) = ck xk , (A17)

and if you consider

F ( x1 + 0, 0 + y2 , 0, ..., 0) = F ( x1 , 0, ..., 0) + F (0, y2 , 0, ..., 0) = c1 x1 + c2 y2 , (A18)

and, as y2 is arbitrary, we could have let y2 = x2 such that in general

F ( x1 , x2 , ..., xn ) = ∑ ci xi , (A19)


formulating the general solution.

Appendix A.3. Relative Entropy

We are interested in the following functional equation:

φ ( ρ1 ρ2 , ϕ1 ϕ2 ) = φ ( ρ1 , ϕ1 ) + φ ( ρ2 , ϕ2 ). (A20)

This is an equation of the form,

F ( x1 y1 , x2 y2 ) = F ( x1 , x2 ) + F ( y1 , y2 ), (A21)

where x1 = ρ(x1), y1 = ρ(x2), x2 = ϕ(x1), and y2 = ϕ(x2). First, assume all values of ρ and ϕ are greater than
zero. Then, substitute xi = e^{x′i} and yi = e^{y′i}, and let F′(x′1, x′2) = F(e^{x′1}, e^{x′2}) and so on, such that

$$F'(x'_1 + y'_1, x'_2 + y'_2) = F'(x'_1, x'_2) + F'(y'_1, y'_2), \tag{A22}$$

which is of the form of (A15). The general solution for F′ is therefore

$$F'(x'_1 + y'_1, x'_2 + y'_2) = a_1 (x'_1 + y'_1) + a_2 (x'_2 + y'_2) = a_1 \ln(x_1 y_1) + a_2 \ln(x_2 y_2) = F(x_1 y_1, x_2 y_2), \tag{A23}$$

which means the general solution for φ is

φ ( ρ1 , ϕ1 ) = a1 ln(ρ( x1 )) + a2 ln( ϕ( x1 )). (A24)

In such a case, when ϕ(x0) = 0 for some value x0 ∈ X, we may let ϕ(x0) = ε, where ε is as close to
zero as we could possibly want; the trivial general solution φ = 0 is saturated by the special case
when ρ = ϕ from DC1’. Here, we return to the text.

Appendix A.4. Matrix Functional Equations

(This derivation is implied in [49] pages 347–349). First, consider a Cauchy matrix functional equation,

f ( X̂ + Ŷ ) = f ( X̂ ) + f (Ŷ ), (A25)

where X̂ and Ŷ are n × n square matrices. Rewriting the matrix functional equation in terms of its
components gives

f ij ( x11 + y11 , x12 + y12 , ..., xnn + ynn ) = f ij ( x11 , x12 , ..., xnn ) + f ij (y11 , y12 , ..., ynn ) (A26)

and is now in the form of (A15), and, therefore, the solution is

$$f_{ij}(x_{11}, x_{12}, \ldots, x_{nn}) = \sum_{\ell,k=1}^{n} c_{ij\ell k}\, x_{\ell k} \tag{A27}$$

for i, j = 1, ..., n. We find it convenient to introduce super indices, A = (i, j) and B = (ℓ, k), such that
the component equation becomes

fA = ∑ c AB xB , (A28)
B

and resembles the solution for the linear transformation of a vector from [49]. In general, we will be
discussing matrices X̂ = X̂1 ⊗ X̂2 ⊗ ... ⊗ X̂N which stem from tensor products of density matrices.
In this situation, X̂ can be thought of as a 2N index tensor or a z × z matrix, where z = ∏_{i=1}^{N} n_i is the
product of the ranks of the matrices in the tensor product, or even as a vector of length z². In such


a case, we may abuse the super index notation where A and B lump together the appropriate number
of indices such that (A28) is the form of the solution for the components in general. The matrix form of
the general solution is

$$f(\hat{X}) = \tilde{C} \hat{X}, \tag{A29}$$

where C̃ is a constant super-operator having components c_{AB}.

Appendix A.5. Quantum Relative Entropy

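The functional equation of interest is

$$\varphi\big(\hat{\rho}_1 \otimes \hat{\rho}_2,\, \hat{\phi}_1 \otimes \hat{\phi}_2\big) = \varphi\big(\hat{\rho}_1 \otimes \hat{1}_2,\, \hat{\phi}_1 \otimes \hat{1}_2\big) + \varphi\big(\hat{1}_1 \otimes \hat{\rho}_2,\, \hat{1}_1 \otimes \hat{\phi}_2\big). \tag{A30}$$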

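These density matrices are Hermitian, positive semi-definite, have positive eigenvalues, and are not
equal to 0̂. Because every invertible matrix can be expressed as the exponential of some other matrix,
we can substitute ρ̂1 = e^{ρ̂′1}, and so on for all four density matrices, giving

$$\varphi\big(e^{\hat{\rho}'_1} \otimes e^{\hat{\rho}'_2},\, e^{\hat{\phi}'_1} \otimes e^{\hat{\phi}'_2}\big) = \varphi\big(e^{\hat{\rho}'_1} \otimes \hat{1}_2,\, e^{\hat{\phi}'_1} \otimes \hat{1}_2\big) + \varphi\big(\hat{1}_1 \otimes e^{\hat{\rho}'_2},\, \hat{1}_1 \otimes e^{\hat{\phi}'_2}\big). \tag{A31}$$

Now, we use the following identities for Hermitian matrices:

$$e^{\hat{\rho}'_1} \otimes e^{\hat{\rho}'_2} = e^{\hat{\rho}'_1 \otimes \hat{1}_2 + \hat{1}_1 \otimes \hat{\rho}'_2} \tag{A32}$$

and

$$e^{\hat{\rho}'_1} \otimes \hat{1}_2 = e^{\hat{\rho}'_1 \otimes \hat{1}_2}, \tag{A33}$$

to recast the functional equation as,

$$\varphi\big(e^{\hat{\rho}'_1 \otimes \hat{1}_2 + \hat{1}_1 \otimes \hat{\rho}'_2},\, e^{\hat{\phi}'_1 \otimes \hat{1}_2 + \hat{1}_1 \otimes \hat{\phi}'_2}\big) = \varphi\big(e^{\hat{\rho}'_1 \otimes \hat{1}_2},\, e^{\hat{\phi}'_1 \otimes \hat{1}_2}\big) + \varphi\big(e^{\hat{1}_1 \otimes \hat{\rho}'_2},\, e^{\hat{1}_1 \otimes \hat{\phi}'_2}\big). \tag{A34}$$

Letting G(ρ̂′1 ⊗ 1̂2, ϕ̂′1 ⊗ 1̂2) = φ(e^{ρ̂′1 ⊗ 1̂2}, e^{ϕ̂′1 ⊗ 1̂2}), and the like, gives

$$G\big(\hat{\rho}'_1 \otimes \hat{1}_2 + \hat{1}_1 \otimes \hat{\rho}'_2,\, \hat{\phi}'_1 \otimes \hat{1}_2 + \hat{1}_1 \otimes \hat{\phi}'_2\big) = G\big(\hat{\rho}'_1 \otimes \hat{1}_2,\, \hat{\phi}'_1 \otimes \hat{1}_2\big) + G\big(\hat{1}_1 \otimes \hat{\rho}'_2,\, \hat{1}_1 \otimes \hat{\phi}'_2\big). \tag{A35}$$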

This functional equation is of the form

G ( X̂1 + Ŷ1 , X̂2 + Ŷ2 ) = G ( X̂1 , X̂2 ) + G (Ŷ1 , Ŷ2 ), (A36)

which has the general solution

$$G(\hat{X}, \hat{Y}) = \tilde{A} \hat{X} + \tilde{B} \hat{Y}, \tag{A37}$$

analogous to (A19), and finally, in general,

$$\varphi(\hat{\rho}, \hat{\phi}) = \tilde{A} \ln(\hat{\rho}) + \tilde{B} \ln(\hat{\phi}), \tag{A38}$$

where Ã, B̃ are super-operators having constant coefficients. Here, we return to the text.
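
The tensor-product identities (A32) and (A33) that drive this derivation can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names (`kron`, `expm`, and so on) and the two Hermitian test matrices are illustrative assumptions, and the truncated Taylor series is adequate only for matrices of small norm such as these.

```python
# Numerical check of exp(A) (x) exp(B) = exp(A (x) 1 + 1 (x) B), cf. (A32)-(A33).
# Matrices are plain lists of lists; all helpers here are illustrative.

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def kron(X, Y):
    """Kronecker (tensor) product of two matrices."""
    return [[x * y for x in rx for y in ry] for rx in X for ry in Y]

def expm(X, terms=40):
    """Matrix exponential by a truncated Taylor series (small norms only)."""
    result = identity(len(X))
    term = identity(len(X))
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, X)]
        result = add(result, term)
    return result

def max_abs_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

# Two arbitrary 2x2 Hermitian matrices (hypothetical test values).
A = [[0.3, 0.1 - 0.2j], [0.1 + 0.2j, -0.5]]
B = [[-0.2, 0.4j], [-0.4j, 0.7]]
I2 = identity(2)

# (A32): the exponential of a Kronecker sum factorizes into a Kronecker product.
a32_error = max_abs_diff(kron(expm(A), expm(B)),
                         expm(add(kron(A, I2), kron(I2, B))))
# (A33): exp(A) (x) 1 = exp(A (x) 1).
a33_error = max_abs_diff(kron(expm(A), I2), expm(kron(A, I2)))
```

Both errors should be at the level of floating-point round-off, because A ⊗ 1 and 1 ⊗ B commute and their exponentials therefore multiply without Zassenhaus-type corrections.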


Appendix B. Spin Example

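Consider an arbitrarily mixed prior (in the spin-z basis for convenience) with a, b ≠ 0,

$$\hat{\phi} = a|+\rangle\langle +| + b|-\rangle\langle -| \tag{A39}$$

and a general Hermitian matrix in the spin-1/2 Hilbert space,

$$c_\mu \hat{\sigma}_\mu = c_1 \hat{1} + c_x \hat{\sigma}_x + c_y \hat{\sigma}_y + c_z \hat{\sigma}_z \tag{A40}$$

$$= (c_1 + c_z)|+\rangle\langle +| + (c_x - i c_y)|+\rangle\langle -| + (c_x + i c_y)|-\rangle\langle +| + (c_1 - c_z)|-\rangle\langle -|, \tag{A41}$$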

having a known expectation value,

Tr(ρ̂cμ σ̂μ ) = c. (A42)

Maximizing the entropy with respect to this general expectation value and normalization is:

$$0 = \delta\Big(S - \lambda[\mathrm{Tr}(\hat{\rho}) - 1] - \alpha\big(\mathrm{Tr}(\hat{\rho}\, c_\mu \hat{\sigma}_\mu) - c\big)\Big), \tag{A43}$$

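which after varying gives the solution,

$$\hat{\rho} = \frac{1}{Z} \exp\big(\alpha c_\mu \hat{\sigma}_\mu + \log(\hat{\phi})\big). \tag{A44}$$

Letting

$$\hat{C} = \alpha c_\mu \hat{\sigma}_\mu + \log(\hat{\phi}) \tag{A45}$$

gives

$$\hat{\rho} = \frac{1}{Z} e^{\hat{C}} = \frac{1}{Z} U e^{U^{-1} \hat{C} U} U^{-1} = \frac{1}{Z} U e^{\hat{\lambda}} U^{-1} = \frac{e^{\lambda_+}}{Z} U|\lambda_+\rangle\langle\lambda_+|U^{-1} + \frac{e^{\lambda_-}}{Z} U|\lambda_-\rangle\langle\lambda_-|U^{-1}, \tag{A46}$$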

where λ̂ is the diagonalized matrix of Ĉ having real eigenvalues. They are

λ± = λ ± δλ, (A47)

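due to the quadratic formula, where explicitly:

$$\lambda = \alpha c_1 + \frac{1}{2} \log(ab), \tag{A48}$$

and

$$\delta\lambda = \frac{1}{2} \sqrt{\Big(2\alpha c_z + \log\big(\tfrac{a}{b}\big)\Big)^2 + 4\alpha^2 (c_x^2 + c_y^2)}. \tag{A49}$$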

Because λ± and a, b, c1, cx, cy, cz are real, δλ is real and ≥ 0. The normalization constraint specifies the
Lagrange multiplier Z,

$$1 = \mathrm{Tr}(\hat{\rho}) = \frac{e^{\lambda_+} + e^{\lambda_-}}{Z}, \tag{A50}$$


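so Z = e^{λ+} + e^{λ−} = 2e^{λ} cosh(δλ). The expectation value constraint specifies the Lagrange multiplier α,

$$c = \mathrm{Tr}(\hat{\rho}\, c_\mu \hat{\sigma}_\mu) = \frac{\partial}{\partial \alpha} \log(Z) = c_1 + \tanh(\delta\lambda) \frac{\partial}{\partial \alpha} \delta\lambda, \tag{A51}$$

which becomes

$$c = c_1 + \frac{\tanh(\delta\lambda)}{2\delta\lambda} \Big(2\alpha (c_x^2 + c_y^2 + c_z^2) + c_z \log\big(\tfrac{a}{b}\big)\Big),$$

or

$$c = c_1 + \tanh\Big(\frac{1}{2} \sqrt{\big(2\alpha c_z + \log(\tfrac{a}{b})\big)^2 + 4\alpha^2 (c_x^2 + c_y^2)}\Big)\, \frac{2\alpha (c_x^2 + c_y^2 + c_z^2) + c_z \log(\tfrac{a}{b})}{\sqrt{\big(2\alpha c_z + \log(\tfrac{a}{b})\big)^2 + 4\alpha^2 (c_x^2 + c_y^2)}}. \tag{A52}$$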

This equation is monotonic in α, and therefore α is uniquely specified by the value of c. Ultimately, this is
a consequence of the concavity of the entropy. The specific proof of (A52)’s monotonicity is below:

Proof. For ρ̂ to be Hermitian, Ĉ is Hermitian and δλ = ½ √f(α) is real; furthermore, because δλ
is real, f(α) ≥ 0 and thus δλ ≥ 0. Because f(α) is quadratic in α and positive, it may be written in
vertex form,

$$f(\alpha) = a(\alpha - h)^2 + k, \tag{A53}$$

where a > 0, k ≥ 0, and (h, k) are the (x, y) coordinates of the minimum of f(α). Notice that the form
of (A52) is

$$F(\alpha) = \frac{\tanh\big(\tfrac{1}{2}\sqrt{f(\alpha)}\big)}{\sqrt{f(\alpha)}} \times \frac{\partial f(\alpha)}{\partial \alpha}. \tag{A54}$$

Making the change of variables α′ = α − h centers the function such that f(α′) = f(−α′) is symmetric
about α′ = 0. We can then write

$$F(\alpha') = \frac{\tanh\big(\tfrac{1}{2}\sqrt{f(\alpha')}\big)}{\sqrt{f(\alpha')}} \times 2a\alpha', \tag{A55}$$

where the derivative has been computed. Because f(α′) is positive, symmetric, and monotonically
increasing on the (symmetric) half-plane (for α′ greater than or less than zero),
S(α′) ≡ tanh(½ √f(α′)) / √f(α′) is
also positive and symmetric, but it is unclear whether S(α′) is strictly monotonic in the half-plane or
not. We may restate

$$F(\alpha') = S(\alpha') \times 2a\alpha'. \tag{A56}$$

We are now in a convenient position to perform the derivative test for monotonic functions:

$$\begin{aligned}
\frac{\partial}{\partial \alpha'} F(\alpha') &= 2a S(\alpha') + 2a\alpha' \frac{\partial}{\partial \alpha'} S(\alpha') \\
&= 2a S(\alpha') \Big(1 - \frac{a\alpha'^2}{a\alpha'^2 + k}\Big) + a\, \frac{a\alpha'^2}{a\alpha'^2 + k} \Big(1 - \tanh^2\big(\tfrac{1}{2}\sqrt{a\alpha'^2 + k}\big)\Big) \\
&\geq 2a S(\alpha') \Big(1 - \frac{a\alpha'^2}{a\alpha'^2 + k}\Big) \geq 0
\end{aligned} \tag{A57}$$

because a and S(α′) are positive and 0 ≤ aα′²/(aα′² + k) ≤ 1. The function of interest F(α′) is therefore monotonic
for all α′, and therefore it is monotonic for all α, completing the proof that there exists a unique real
Lagrange multiplier α in (A52).
Although (A52) is monotonic in α, it is seemingly a transcendental equation. This can be solved
graphically for the given values c, c1 , c x , cy , cz , i.e., given the Hermitian matrix and its expectation value
are specified. Equation (A52) and the eigenvalues take a simpler form when a = b = 1/2 because, in this
instance, ϕ̂ ∝ 1̂ and commutes universally so it may be factored out of the exponential in (A44).
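
As an alternative to a graphical solution, the monotonicity of (A52) means that α can be located by simple bisection. The following is a minimal sketch, not the author's procedure: the function names, the bracket [−50, 50], the tolerance, and the example values (prior weights a = 0.7, b = 0.3 and the operator σ̂x with target expectation 0.5) are all illustrative assumptions.

```python
import math

def delta_lambda(alpha, a, b, cx, cy, cz):
    """Half the eigenvalue gap of C-hat, Equation (A49)."""
    log_ratio = math.log(a / b)
    return 0.5 * math.sqrt((2 * alpha * cz + log_ratio) ** 2
                           + 4 * alpha ** 2 * (cx ** 2 + cy ** 2))

def expectation(alpha, a, b, c1, cx, cy, cz):
    """Right-hand side of Equation (A52) as a function of alpha."""
    log_ratio = math.log(a / b)
    dl = delta_lambda(alpha, a, b, cx, cy, cz)
    numerator = 2 * alpha * (cx ** 2 + cy ** 2 + cz ** 2) + cz * log_ratio
    # tanh(x)/x tends to 1 as x -> 0; guard against division by zero.
    ratio = 1.0 if dl < 1e-12 else math.tanh(dl) / dl
    return c1 + 0.5 * ratio * numerator

def solve_alpha(c_target, a, b, c1, cx, cy, cz, lo=-50.0, hi=50.0, tol=1e-12):
    """Find alpha with expectation(alpha) == c_target by bisection.

    Relies on alpha -> expectation being monotonically increasing.
    The bracket [lo, hi] is an arbitrary illustrative choice.
    """
    args = (a, b, c1, cx, cy, cz)
    if not expectation(lo, *args) <= c_target <= expectation(hi, *args):
        raise ValueError("c_target is not bracketed; it may be unreachable")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expectation(mid, *args) < c_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical example: prior weights a, b and the operator sigma_x
# (c1 = cy = cz = 0, cx = 1) with a target expectation value of 0.5.
a, b = 0.7, 0.3
alpha_star = solve_alpha(0.5, a, b, 0.0, 1.0, 0.0, 0.0)
residual = abs(expectation(alpha_star, a, b, 0.0, 1.0, 0.0, 0.0) - 0.5)
```

At α = 0 the formula returns the prior expectation, for example c1 + cz(a − b) for an operator built from σ̂z when a + b = 1, which gives a quick consistency check on any implementation.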

References
1. Shore, J.E.; Johnson, R.W. Axiomatic derivation of the Principle of Maximum Entropy and the Principle of
Minimum Cross-Entropy. IEEE Trans. Inf. Theory 1980, 26, 26–37.
2. Shore, J.E.; Johnson, R.W. Properties of Cross-Entropy Minimization. IEEE Trans. Inf. Theory 1981, 27, 472–482.
3. Csiszár, I. Why least squares and maximum entropy: An axiomatic approach to inference for linear inverse
problems. Ann. Stat. 1991, 19, 2032.
4. Skilling, J. The Axioms of Maximum Entropy. In Maximum-Entropy and Bayesian Methods in Science and
Engineering; Erickson, G.J., Smith, C.R., Eds.; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1988.
5. Skilling, J. Classic Maximum Entropy. In Maximum-Entropy and Bayesian Methods in Science and Engineering;
Kluwer Academic Publishers: Dordrecht, The Netherlands, 1988.
6. Skilling, J. Quantiﬁed Maximum Entropy. In Maximum-Entropy and Bayesian Methods in Science and Engineering;
Fougére, P.F., Ed.; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1990.
7. Caticha, A. Entropic Inference and the Foundations of Physics (Monograph Commissioned by the 11th
Brazilian Meeting on Bayesian Statistics—EBEB-2012). Available online: http://www.albany.edu/physics/ACaticha-EIFP-book.pdf
(accessed on 30 November 2017).
8. Hiai, F.; Petz, D. The Proper Formula for Relative Entropy and its Asymptotics in Quantum Probability.
Commun. Math. Phys. 1991, 143, 99–114.
9. Petz, D. Characterization of the Relative Entropy of States of Matrix Algebras. Acta Math. Hung. 1992, 59,
449–455.
10. Ohya, M.; Petz, D. Quantum Entropy and Its Use; Springer: New York, NY, USA, 1993; ISBN 0-387-54881-5.
11. Wilming, H.; Gallego, R.; Eisert, J. Axiomatic Characterization of the Quantum Relative Entropy and Free
Energy. Entropy 2017, 19, 241.
12. Jaynes, E.T. Information Theory and Statistical Mechanics. Phys. Rev. 1957, 106, 620–630.
13. Jaynes, E.T. Probability Theory: The Logic of Science; Cambridge University Press: Cambridge, UK, 2003.
14. Jaynes, E.T. Information Theory and Statistical Mechanics II. Phys. Rev. 1957, 108, 171–190.
15. Balian, R.; Vénéroni, M. Incomplete descriptions, relevant information, and entropy production in collision
processes. Ann. Phys. 1987, 174, 229–244.
16. Balian, R.; Balazs, N.L. Equiprobability, inference and entropy in quantum theory. Ann. Phys. 1987, 179,
97–144.
17. Balian, R. Justiﬁcation of the Maximum Entropy Criterion in Quantum Mechanics. In Maximum Entropy
and Bayesian Methods; Skilling, J., Ed.; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1989;
pp. 123–129.
18. Balian, R. On the principles of quantum mechanics. Am. J. Phys. 1989, 57, 1019–1027.
19. Balian, R. Gain of information in a quantum measurement. Eur. J. Phys. 1989, 10, 208–213.
20. Balian, R. Incomplete descriptions and relevant entropies. Am. J. Phys. 1999, 67, 1078–1090.
21. Blankenbecler, R.; Partovi, H. Uncertainty, Entropy, and the Statistical Mechanics of Microscopic Systems.
Phys. Rev. Lett. 1985, 54, 373–376.
22. Blankenbecler, R.; Partovi, H. Quantum Density Matrix and Entropic Uncertainty. In Proceedings of the
Fifth Workshop on Maximum Entropy and Bayesian Methods in Applied Statistics, Laramie, WY, USA,
5–8 August 1985.
23. Von Neumann, J. Mathematische Grundlagen der Quantenmechanik; Springer: Berlin, Germany, 1932.
English Translation: Mathematical Foundations of Quantum Mechanics; Princeton University Press: Princeton,
NJ, USA, 1983.


24. Ali, S.A.; Cafaro, C.; Giffin, A.; Lupo, C.; Mancini, S. On a Differential Geometric Viewpoint of Jaynes’
Maxent Method and its Quantum Extension. AIP Conf. Proc. 2012, 1443, 120–128.
25. Caticha, A. Entropic Dynamics: Quantum Mechanics from Entropy and Information Geometry.
Available online: https://arxiv.org/abs/1711.02538 (accessed on 30 November 2017).
26. Reginatto, M.; Hall, M.J.W. Quantum-classical interactions and measurement: A consistent description using
statistical ensembles on configuration space. J. Phys. Conf. Ser. 2009, 174, 012038.
27. Reginatto, M.; Hall, M.J.W. Information geometry, dynamics and discrete quantum mechanics.
AIP Conf. Proc. 2013, 1553, 246–253.
28. Caves, C.; Fuchs, C.; Schack, R. Quantum probabilities as Bayesian probabilities. Phys. Rev. A 2002, 65, 022305.
29. Vanslette, K. The Quantum Bayes Rule and Generalizations from the Quantum Maximum Entropy Method.
Available online: https://arxiv.org/abs/1710.10949 (accessed on 30 November 2017).
30. Schack, R.; Brun, T.; Caves, C. Quantum Bayes rule. Phys. Rev. A 2001, 64, 014305.
31. Korotkov, A. Continuous quantum measurement of a double dot. Phys. Rev. B 1999, 60, 5737–5742.
32. Korotkov, A. Selective quantum evolution of a qubit state due to continuous measurement. Phys. Rev. B
2000, 63, 115403.
33. Jordan, A.; Korotkov, A. Qubit feedback and control with kicked quantum nondemolition measurements:
A quantum Bayesian analysis. Phys. Rev. B 2006, 74, 085307.
34. Hellmann, F.; Kamiński, W.; Kostecki, P. Quantum collapse rules from the maximum relative entropy
principle. New J. Phys. 2016, 18, 013022.
35. Warmuth, M. A Bayes Rule for Density Matrices. In Advances in Neural Information Processing Systems 18,
Proceedings of the Neural Information Processing Systems Conference, Montréal, QC, Canada, 7–12 December 2005;
Neural Information Processing Systems Foundation, Inc.: La Jolla, CA, USA, 2015.
36. Warmuth, M.; Kuzmin, D. A Bayesian Probability Calculus for Density Matrices. Mach. Learn. 2010, 78,
63–101.
37. Tsuda, K. Machine learning with quantum relative entropy. J. Phys. Conf. Ser. 2009, 143, 012021.
38. Giffin, A.; Caticha, A. Updating Probabilities. Presented at the 26th International Workshop on Bayesian
Inference and Maximum Entropy Methods (MaxEnt 2006), Paris, France, 8–13 July 2006.
39. Wang, Z.; Busemeyer, J.; Atmanspacher, H.; Pothos, E. The Potential of Using Quantum Theory to Build
Models of Cognition. Top. Cogn. Sci. 2013, 5, 672–688.
40. Giffin, A. Maximum Entropy: The Universal Method for Inference. Ph.D. Thesis, University at Albany
(SUNY), Albany, NY, USA, 2008.
41. Caticha, A. Toward an Informational Pragmatic Realism. Minds Mach. 2014, 24, 37–70.
42. Umegaki, H. Conditional expectation in an operator algebra, IV (entropy and information). Ködai Math.
Sem. Rep. 1962, 14, 59–85.
43. Uhlmann, A. Relative entropy and the Wigner-Yanase-Dyson-Lieb concavity in an interpolation theory.
Commun. Math. Phys. 1977, 54, 21–32.
44. Schumacher, B.; Westmoreland, M. Relative entropy in quantum information theory. In Proceedings of the
AMS Special Session on Quantum Information and Computation, Washington, DC, USA, 19–21 January 2000.
45. Suzuki, M. On the Convergence of Exponential Operators—The Zassenhaus Formula, BCH Formula and
Systematic Approximants. Commun. Math. Phys. 1977, 57, 193–200.
46. Horn, A. Eigenvalues of sums of Hermitian matrices. Pac. J. Math. 1962, 12, 225–241.
47. Bhatia, R. Linear Algebra to Quantum Cohomology: The Story of Alfred Horn’s Inequalities. Am. Math. Mon.
2001, 108, 289–318.
48. Knutson, A.; Tao, T. Honeycombs and Sums of Hermitian Matrices. Not. AMS 2001, 48, 175–186.
49. Aczél, J. Lectures on Functional Equations and Their Applications; Academic Press Inc.: New York, NY, USA,
1966; Volume 19, pp. 31–44, 141–145, 213–217, 301–302, 347–349.
50. Darboux, G. Sur le théorème fondamental de la géométrie projective. Math. Ann. 1880, 17, 55–61.

entropy
Article
Finding a Hadamard Matrix by Simulated
Quantum Annealing
Andriyan Bayu Suksmono
Telecommunication Engineering Scientiﬁc and Research Group (TESRG), School of Electrical Engineering and
Informatics and The Research Center on Information and Communication Technology (PPTIK-ITB),
Institut Teknologi Bandung, Jl. Ganesha No.10, Bandung 40132, Indonesia; [email protected]

Received: 2 January 2018; Accepted: 16 February 2018; Published: 22 February 2018

Abstract: Hard problems have recently become an important issue in computing. Various methods,
including a heuristic approach that is inspired by physical phenomena, are being explored. In this
paper, we propose the use of simulated quantum annealing (SQA) to find a Hadamard matrix,
which is itself a hard problem. We reformulate the problem as an energy minimization of spin
vectors connected by a complete graph. The computation is conducted based on a path-integral
Monte-Carlo (PIMC) SQA of the spin vector system, with an applied transverse magnetic field whose
strength is decreased over time. In the numerical experiments, the proposed method is employed to
find low-order Hadamard matrices, including the ones that cannot be constructed trivially by the
Sylvester method. The scaling property of the method and the measurement of residual energy after
a sufficiently large number of iterations show that SQA outperforms simulated annealing (SA) in
solving this hard problem.

Keywords: quantum annealing; adiabatic quantum computing; hard problems; Hadamard matrix;
binary optimization

1. Introduction

1.1. Background
Finding a solution to a hard problem is a challenging task in computing. Such a problem is
characterized by its complexity, which grows faster than any polynomial in the size of the input.
A particularly important class is that of NP (non-deterministic polynomial) problems, in which
verifying a solution can be conducted in polynomial time, whereas finding the solution may require
exponential time. Examples of such problems are, among others, the TSP (traveling salesman problem),
SAT (Boolean satisfiability), graph coloring, graph isomorphism, and subset sums.
An interesting approach to the hard problems is a method inspired by physical phenomena, such
as classical annealing (CA) or quantum annealing (QA). Both CA and QA are physical processes that
obtain an ordered (physical) system from an unordered one, which can be done either thermally (as is
the case in CA) or quantum-mechanically (as is the case in QA). To simulate the physical processes on
a (classical/non-quantum) computer, numerical methods, such as MC (Monte Carlo) for CA and PIMC
(path-integral Monte Carlo) for QA, have been developed. The algorithm or computational method
inspired by classical/thermal annealing is called simulated annealing (SA), whereas the one based
on quantum annealing is called simulated quantum annealing (SQA). Both of these methods make
use of the methods in numerical CA or numerical QA. They encode the problem into a Hamiltonian
of a spin system [1] and then evolve the system from a high energy state down to the ground state.
The annealing process enables the system to avoid local minima trapping and therefore is capable of
achieving a global optimum, which represents the best solution of the problem. The main difference

Entropy 2018, 20, 141; doi:10.3390/e20020141 433 www.mdpi.com/journal/entropy


between SA and SQA is in the evolution of the systems; whereas SA uses classical/thermal annealing,
SQA employs a quantum-mechanical process.
In SA [2–4], one starts the system in a completely random configuration, corresponding to a high-temperature state.
The temperature is then lowered and the system is evolved, which causes the energy to decrease so
that the system becomes increasingly ordered. To avoid local-optima trapping, a particular updating
rule, such as the Metropolis rule [2], is applied. The rule allows the system to (sometimes) move to a higher
energy state. Upon completion of the algorithm, the system achieves the ground state, at which point
a solution is found.
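
The SA procedure just described can be sketched on a toy problem. The example below is a minimal illustration under stated assumptions, not the method used in this paper: the ferromagnetic Ising ring, the geometric cooling schedule, and all parameter values and function names are hypothetical choices made only to show the Metropolis rule in action.

```python
import math
import random

def ring_energy(spins):
    """Energy of a ferromagnetic Ising ring: E = -sum_i s_i * s_{i+1}."""
    n = len(spins)
    return -sum(spins[i] * spins[(i + 1) % n] for i in range(n))

def simulated_annealing(n_spins=12, t_start=5.0, t_end=0.05,
                        n_steps=4000, seed=0):
    """Anneal a random spin configuration and return the best one seen."""
    rng = random.Random(seed)
    # Start from a random configuration (the high-temperature state).
    spins = [rng.choice((-1, 1)) for _ in range(n_spins)]
    energy = ring_energy(spins)
    best_spins, best_energy = spins[:], energy
    # Geometric cooling from t_start down to t_end (an assumed schedule).
    cooling = (t_end / t_start) ** (1.0 / (n_steps - 1))
    temperature = t_start
    for _ in range(n_steps):
        i = rng.randrange(n_spins)
        # Energy change from flipping spin i; only its two neighbours matter.
        delta = 2 * spins[i] * (spins[i - 1] + spins[(i + 1) % n_spins])
        # Metropolis rule: always accept downhill moves and accept uphill
        # moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            spins[i] = -spins[i]
            energy += delta
            if energy < best_energy:
                best_spins, best_energy = spins[:], energy
        temperature *= cooling
    return best_spins, best_energy

best_spins, best_energy = simulated_annealing()
```

The occasional uphill move is what lets the walk escape local minima at high temperature; as the temperature falls, such moves become exponentially rare and the configuration settles.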
In [5], Kadowaki and Nishimori introduced quantum fluctuations to replace the thermal
fluctuations in SA to accelerate the convergence. They applied the method on an Ising model, where a
transverse field plays the role of temperature in classical SA, enabling the system to achieve the ground
state with greater probability. Santoro et al. [6] compared classical and quantum Monte Carlo annealing
protocols on a two-dimensional Ising model. They found that the quantum Monte Carlo annealing is
superior to classical annealing. In [7], Boixo et al. show experimental results on a 108 qubit D-Wave One,
which is a kind of hardware implementation of QA. A strong correlation between D-Wave and SQA,
compared to the device with classical annealing, was found, which indicates that the D-Wave performs
quantum annealing. This result raised the important issue of whether QA actually outperforms SA [8].
Rønnow et al. [9] showed how quantum speedup should be defined and measured. In an experiment
with random spin glass instances on 503 qubits of D-Wave Two, they did not find any evidence of
such speedup.
Regardless of these issues, different results have been achieved via SQA. Isakov [10] performed
quantum Monte Carlo (QMC) simulations and found that the QMC tunneling rate scaled
according to system size. He also found quadratic speedup in QMC simulations when, instead of
periodic conditions, open boundary conditions were employed. In [11], Mazzola et al. demonstrated
that QMC simulations can recover the scaling of ground-state tunneling rates, which validates QA in
terms of solving combinatorial problems.
Some classes of hard problems, including ones with exponential or combinatorial complexity,
have been a subject of interest in SQA research. Martonak et al. [12] introduced an application of SQA
to solve the TSP problem. They found that a PIMC algorithm was more efficient than SA in terms of
finding an approximately minimal tour in a given graph. SQA has also been used to successfully address
other hard problems related to graphs, such as graph coloring [13] and graph isomorphism [14].
In this paper, we propose SQA as a means to find a Hadamard matrix (H-matrix). Previously, in [15],
we successfully employed SA to perform a similar task, in which low-order H-matrices were found.
Compared to existing H-matrix construction methods, an SA-based method is more general in terms
of its capability of finding (or constructing probabilistically) an H-matrix of order m = 4k, without any
further restriction on the order m, whereas the Sylvester method requires m = 2^n, where k and
n are positive integers. This paper extends this classical SA method to its quantum version, where
PIMC based on the Suzuki–Trotter formulation [16,17] is employed to simulate the quantum process.

1.2. Finding A Hadamard Matrix

A Hadamard matrix, or H-matrix, is an orthogonal binary {±1} matrix of size 4k × 4k, where k is
a positive integer. This matrix was discovered by J. J. Sylvester [18] in 1867 and then studied more
extensively by J. Hadamard [19] during his investigation of the maximal determinant problem.
The orthogonal property makes the H-matrix popular in applied areas, such as information coding
and signal transform. In the 1960s and 1970s, Hadamard code was used in space exploration for
information transmission [20,21]. In a recent technological case, CDMA (code-division multiple access),
which is widely used in cellular mobile phone systems, employs Walsh–Hadamard signals to reduce
interference between its users [22,23].


One of the most important issues in the theory of H-matrices is their existence. Any H-matrix of order 2^l,
with l a positive integer, can be constructed using Sylvester’s method. Furthermore, if there is an H-matrix
of order m > 2, it can be shown that m = 4k for a positive integer k. On the other hand, no one yet knows
whether an H-matrix of order 4k always exists [20,21]. The latter case is formulated as the Hadamard matrix
conjecture. Up to this writing, the smallest order 4k for which no H-matrix is known is 668.
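
Sylvester's doubling rule and the orthogonality test that defines an H-matrix are short enough to sketch directly. The following is a minimal illustration in plain Python; the function names `sylvester` and `is_hadamard` are hypothetical and are not taken from the paper.

```python
def sylvester(l):
    """Return the Hadamard matrix of order 2**l built by Sylvester's rule.

    Each step maps H to the block matrix [[H, H], [H, -H]].
    """
    H = [[1]]
    for _ in range(l):
        H = ([row + row for row in H]
             + [row + [-x for x in row] for row in H])
    return H

def is_hadamard(H):
    """Check that H is square with +1/-1 entries and orthogonal columns."""
    m = len(H)
    if any(len(row) != m or any(x not in (1, -1) for x in row) for row in H):
        return False
    for i in range(m):
        for j in range(m):
            dot = sum(H[r][i] * H[r][j] for r in range(m))
            # Columns must satisfy H^T H = m * I.
            if dot != (m if i == j else 0):
                return False
    return True

H8 = sylvester(3)  # order 8 = 2**3
```

Because the rule only doubles the order, it reaches 1, 2, 4, 8, 16, and so on, and never orders such as 12 or 20, which are multiples of four but not powers of two.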
Various construction methods have been proposed [24–29]. Nevertheless, these methods force
the order m to follow a particular rule. In [15], a general algorithm for order m = 4k employing SA is
proposed. The method works on a special H-matrix called a seminormalized Hadamard (SH) matrix,
in which the first column is the 4k-order all-ones vector $v_0 = (1, \cdots, 1)^T$, and the rest are 4k-order SH
vectors $v_i \in V$.
A brute-force method needs to verify all $N_B$ binary matrices of order 4k to find an H-matrix,
where $N_B(4k) = 2^{16k^2}$ [15]. Let all matrices constructed where $v_0$ is the first column and a combination
of vi ∈ V constitutes the remaining (4k − 1) columns be called quasi-SH (QSH) matrices. Since there are
4k 4k
NV = C (4k, 2k ) SH vectors, there are about NQU (4k) ≈ 8k23/2 unique QSH-matrices. Although the
number has been greatly reduced compared to NB , exhaustive checking still requires a great amount
of computational resources. The SA method proposed in [15] is capable of finding a few low-order SH
matrices in a more reasonable time.
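To give a feeling for the sizes involved, the two counts that can be read directly from the text, $N_B(4k) = 2^{16k^2}$ and $N_V = C(4k, 2k)$, can be evaluated for small k. This is only a worked-arithmetic sketch; the function names are illustrative.

```python
from math import comb


def n_binary(k):
    """Number of 4k x 4k binary {+1, -1} matrices: 2**(16 k^2)."""
    return 2 ** (16 * k * k)


def n_sh_vectors(k):
    """Number of 4k-order SH vectors (2k entries +1, 2k entries -1): C(4k, 2k)."""
    return comb(4 * k, 2 * k)


for k in (1, 2, 3):
    print(4 * k, n_sh_vectors(k), n_binary(k))
```

For order 12 (k = 3) there are 924 SH vectors, whereas the number of unrestricted binary matrices is 2^144.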
Following the convention in our previous paper [15], the role of the spin, i.e., its ±1 eigenvalues,
is replaced by SH spin vectors vi ∈ V. To find a 4k order SH-matrix, one needs (4k − 1) fully connected
SH spin vectors, which initially are set randomly. With a defined energy E( Q ), the SH spin vectors are
randomly changed in accordance with conditions whereby a transition into another SH spin vector is
allowed but a transition into a non-SH-spin-vector is forbidden.
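A minimal sketch of an allowed transition follows. It assumes that a move exchanges one +1 entry with one −1 entry inside an SH spin vector, which keeps the numbers of +1 and −1 entries equal; the helper names `random_sh_vector` and `flip_pair` are hypothetical.

```python
import random


def random_sh_vector(k, rng):
    """Random 4k-order SH vector: 2k entries equal to +1 and 2k equal to -1."""
    v = [1] * (2 * k) + [-1] * (2 * k)
    rng.shuffle(v)
    return v


def flip_pair(v, rng):
    """Return a copy of v with one +1 entry and one -1 entry exchanged."""
    plus = [i for i, x in enumerate(v) if x == 1]
    minus = [i for i, x in enumerate(v) if x == -1]
    i, j = rng.choice(plus), rng.choice(minus)
    w = list(v)
    w[i], w[j] = -1, 1
    return w


rng = random.Random(0)
v = random_sh_vector(3, rng)
w = flip_pair(v, rng)
print(sum(v), sum(w))  # both sums are zero, so w is again an SH vector
```

Flipping a single entry would break the balance and therefore correspond to a forbidden transition.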

2. Methods

2.1. Simulated Quantum Annealing

The Hamiltonian of an Ising system with spin configuration $\{\hat{\sigma}_k\}$, where $k \in K$ and $K = \{1, 2, \cdots, i, j, \cdots\}$
is the set of the lattice's indices, can be expressed as

$$\hat{H} = -\sum_{i \neq j} J_{ij} \hat{\sigma}_i^z \hat{\sigma}_j^z - \sum_i h_i \hat{\sigma}_i^z \tag{1}$$

where $J_{ij}$ is the coupling strength between the spin at site i and the spin at site j, $h_i$ is the magnetic
field strength at site i, and $\{\hat{\sigma}_i^z, \hat{\sigma}_i^x\}$ are Pauli matrices at site i. In SQA, quantum fluctuation is
introduced through a transverse magnetic field Γ. The Hamiltonian of the system takes the following
form [5]:

$$\hat{H}_{QA} = -\sum_{i \neq j} J_{ij} \hat{\sigma}_i^z \hat{\sigma}_j^z - \sum_i h_i \hat{\sigma}_i^z - \Gamma \sum_i \hat{\sigma}_i^x. \tag{2}$$

In Equation (2), the transverse field is changed (reduced) over time, i.e., Γ ≡ Γ(t). On the right-hand
side of the equation, the first two terms correspond to the potential energy $\hat{H}_{pot}$, while the third one
is the Hamiltonian introduced by the transverse field, which is related to the kinetic energy $\hat{H}_{kin}$; i.e., we
can define

$$\hat{H}_{pot} \equiv -\sum_{i \neq j} J_{ij} \hat{\sigma}_i^z \hat{\sigma}_j^z - \sum_i h_i \hat{\sigma}_i^z \tag{3}$$

$$\hat{H}_{kin} \equiv -\Gamma \sum_i \hat{\sigma}_i^x. \tag{4}$$


In general, $\hat{H}_{pot}$ and $\hat{H}_{kin}$ do not commute, so $[\hat{H}_{pot}, \hat{H}_{kin}] \neq 0$. Denoting the Hamiltonian of the
potential as a function of the spin configuration, $\hat{H}_{pot} \equiv \hat{H}(\{\hat{\sigma}_i^z\})$, we can also express Equation (2) in a
more general form as follows:

$$\hat{H}_{QA} = \hat{H}(\{\hat{\sigma}_i^z\}) - \Gamma \sum_i \hat{\sigma}_i^x. \tag{5}$$

To simulate a quantum system described by Equation (5) using a classical method, we have to
formulate PIMC by introducing imaginary time. It can then be approximated by the Suzuki–Trotter
transform by adding one dimension in the imaginary time direction, which, for (P × N) degrees of
freedom, takes the following form [13,30]:

$$H_{ST} = \frac{1}{P} \sum_{p=1}^{P} H_{pot}(\{S_{i,p}\}) - J_\Gamma \left( \sum_{p=1}^{P-1} \sum_{i}^{N} S_{i,p} S_{i,p+1} + \sum_{j}^{N} S_{j,1} S_{j,P} \right) \tag{6}$$

where N is the number of spins in the lattice, P is the number of Trotter replicas, $S_{i,p} = \pm 1$ are the
eigenvalues of the spin matrices, and

$$J_\Gamma = -\frac{PT}{2} \ln \tanh \frac{\Gamma}{PT} > 0 \tag{7}$$

is the nearest-neighbor coupling of the transverse magnetic field [30].
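The behavior of the replica coupling can be checked numerically. The sketch below assumes that Equation (7) reads $J_\Gamma = -(PT/2) \ln \tanh(\Gamma / (PT))$; the parameter values are arbitrary and chosen only for illustration.

```python
import math


def j_gamma(gamma, p, t):
    """Replica coupling J_Gamma = -(P*T/2) * ln(tanh(Gamma / (P*T)))."""
    pt = p * t
    return -0.5 * pt * math.log(math.tanh(gamma / pt))


# A large transverse field gives a weak coupling between neighbouring
# replicas; a small transverse field gives a strong one.
for gamma in (3.0, 1.0, 0.1, 0.01):
    print(gamma, j_gamma(gamma, p=5, t=0.1))
```

As Γ is reduced during annealing, $J_\Gamma$ grows, so the replicas are progressively forced to agree with each other.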

2.2. SQA Formulation of the SH Spin Vector

Similar to the previous paper [15], we employ a seminormalized Hadamard spin vector,
abbreviated here as an SH spin vector, instead of an ordinary spin. In a 4k order SH spin vector,
for a given positive integer k, 2k spins are −1 and another 2k spins are +1. Therefore, an SH spin
vector transition is allowed only if these balance numbers are conserved; otherwise, such a transition
is forbidden. We also treat the SH spin vector as a single entity, even though it consists of 4k spins,
and is denoted as vi ∈ V, where V is the set of all 4k-order SH vectors. We formulate the energy of a
particular conﬁguration of spin vectors {vi } as follows:

$$E(\{v_i\}) = \sum_{i \neq j} v_i \cdot v_j + \sum_i \mathbf{1} \cdot v_i - 16k^2 \tag{8}$$

where vi · v j denotes the inner product of the vector vi with v j .
Figure 1 shows an Ising system with four SH spin vectors with an additional Trotter’s dimension.
In the lower part of Figure 1a, each circle represents a binary spin, whereas the solid line represents the
connection among the spins. Interacting spin i with binary variable Si and spin j with binary variable
S j contributes the term Jij Si S j to the Hamiltonian. For a 4k order case, every 4k non-connected spins
are grouped into one SH vector vi , which is illustrated as a dashed line. To simplify the diagram, each
SH vector is represented by a ﬁlled circle; thus, we obtain the upper part of Figure 1a, which is called a
slice or a replica. In the PIMC, the slice is replicated P-times, and these slices are arranged as layers
in imaginary time. Each neighboring SH vector in a replica, i.e., vi,p with vi,p−1 and vi,p with vi,p+1 ,
interacts. The extension (in imaginary time) is illustrated in Figure 1b. The Hamiltonian in Equation (6)
becomes a Hamiltonian of an SH vector spin system HQV that can be rewritten as follows:

$$H_{QV} = \frac{1}{P} \sum_{p=1}^{P} H_{pot}(\{v_{i,p}\}) - J_\Gamma \left( \sum_{p=1}^{P-1} \sum_i v_{i,p} \cdot v_{i,p+1} + \sum_i v_{i,1} \cdot v_{i,P} \right) \tag{9}$$


where $J_\Gamma \equiv J_\Gamma(t)$ and $H_{pot}(\{v_{i,p}\})$ represents complete-graph connections among the SH spin vectors,
similar to Equation (8), which is given by

$$H_{pot}(\{v_{i,p}\}) = \sum_{i \neq j} v_{i,p} \cdot v_{j,p} + \sum_i \mathbf{1} \cdot v_{i,p} - 16k^2. \tag{10}$$

The evolution of HQV in Equation (9) leads to the solution to the H-matrix search problem.
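The structure of Equation (9) can be sketched as follows. The coupling part follows the equation as read here (nearest-neighbor slices plus a term closing the ring between the first and last slice). The potential `h_pot` is an assumption made for illustration only: it sums the absolute inner products of distinct columns, so that it vanishes exactly when the columns are mutually orthogonal, and it is not claimed to be the exact form of Equation (10).

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def h_pot(cols):
    """Illustrative potential (assumption): sum over i != j of |v_i . v_j|."""
    total = 0
    for i in range(len(cols)):
        for j in range(len(cols)):
            if i != j:
                total += abs(dot(cols[i], cols[j]))
    return total


def h_qv(replicas, j_gamma):
    """Energy of P coupled replicas in the spirit of Equation (9)."""
    p = len(replicas)
    potential = sum(h_pot(cols) for cols in replicas) / p
    # Coupling between neighbouring slices p and p + 1 ...
    coupling = sum(
        dot(replicas[q][i], replicas[q + 1][i])
        for q in range(p - 1)
        for i in range(len(replicas[q]))
    )
    # ... plus the term that couples the first slice to the last one.
    coupling += sum(dot(a, b) for a, b in zip(replicas[0], replicas[-1]))
    return potential - j_gamma * coupling


a, b, c = [1, 1, -1, -1], [1, -1, 1, -1], [1, -1, -1, 1]
print(h_pot([a, b, c]), h_pot([a, a, b]))
print(h_qv([[a, b, c], [a, b, c]], 0.5))
```

Identical replicas maximize the coupling sum, so a large $J_\Gamma$ rewards slices that agree with each other.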


Figure 1. Connection diagrams of the spins and spin vectors. We consider a four-order SH vector
in this example: (a) four SH spins are connected by a complete graph K4 , and each column is then
grouped into a single SH spin vector; (b) an extension of fully connected SH spin vectors into a Trotter
dimension (imaginary time) τ.

We will now formulate the SQA method for finding the H-matrix as an algorithm, which
is displayed as pseudo-code in Algorithm 1. It takes the matrix order, the number of replicas,
the initial temperature, the initial value of Γ, and the numbers of iterations and sub-iterations as inputs.
This algorithm yields either an SH-matrix or a QSH-matrix that has more orthogonal column vectors
than the initial one. The algorithm starts with a random initialization of the replicas with QSH-matrices,
each of which is a set of (4k − 1) SH vectors, and then calculates the initial energy. Following a
linear transverse-field schedule, a trial transition is performed for each replica. The acceptance or rejection
of the transition is based on the Metropolis criterion. The iteration stops when either the
maximum number of iterations is reached or an SH-matrix is found.


Algorithm 1 Finding an H-Matrix via Simulated Quantum Annealing

Input: Order of SH-matrix 4k, number of replicas P, T0, Γ0, MaxIter, SubIter.
Output: A 4k-order SH-matrix H_F or a partially orthogonal matrix Q.
Initialize T = T0, Γ = Γ0
Initialize all-replicas R with randomly generated QSH-matrices: R ← {Q_1, ..., Q_P}
idx ← 0
H_F ← 0
FLAG ← 0
while (idx < MaxIter) and (FLAG == 0) do
    Calculate JΓ(idx; Γ, P, T)
    Calculate current all-replicas energy: E_rep = H_QV(R, JΓ)
    r ← 0
    while r < P do
        Select a replica at position r: Q_r
        Calculate potential energy of the replica: E_pot = H_pot(Q_r)
        if E_pot > 0 then
            m ← 0
            while (m < SubIter) and (FLAG == 0) do
                Flip SH spin vector randomly: Q_r → Q'_r
                Calculate energy of the updated replica: E_pot1 = H_pot(Q'_r)
                if E_pot1 == 0 then
                    E_pot ← E_pot1
                    H_F ← Q'_r
                    FLAG ← 1
                    r ← P
                else
                    Update all-replicas: R → R'
                    Calculate energy of updated all-replicas: E_rep1 ← H_QV(R', JΓ)
                    ΔE_rep ← E_rep1 − E_rep
                    ΔE_pot ← E_pot1 − E_pot
                    Perform a transition if allowed (Metropolis update rule):
                    if (ΔE_pot < 0) or (ΔE_rep < 0) or (exp(−ΔE_rep / T) > rand) then
                        Accept the transition: R ← R', E_rep ← E_rep1
                    end if
                end if
                m ← m + 1
            end while
        else
            H_F ← Q_r
            FLAG ← 1
            r ← P
        end if
        r ← r + 1
    end while
    idx ← idx + 1
end while
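The overall flow of Algorithm 1 can be tried out with the compact sketch below. It is a simplified illustration under stated assumptions rather than the implementation used for the experiments: the potential is taken as the sum of absolute inner products of distinct columns, a trial move exchanges one +1 and one −1 entry in a random column, only the local change of the replica energy is evaluated, and the temperature is kept fixed while Γ decreases linearly. All function names and default parameter values are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def h_pot(cols):
    """Assumed potential: sum of |v_i . v_j| over pairs of distinct columns."""
    n = len(cols)
    return sum(abs(dot(cols[i], cols[j]))
               for i in range(n) for j in range(i + 1, n))


def random_sh(order, rng):
    """Random balanced +/-1 vector (an SH vector) of even length `order`."""
    v = [1] * (order // 2) + [-1] * (order // 2)
    rng.shuffle(v)
    return v


def sqa_search(order, p=4, t=0.5, gamma0=3.0, max_iter=2000, seed=0):
    """Return an SH-matrix as a list of columns, or None if none was found."""
    rng = random.Random(seed)
    ones = [1] * order
    replicas = [[random_sh(order, rng) for _ in range(order - 1)]
                for _ in range(p)]
    for step in range(max_iter):
        # Linear transverse-field schedule, kept strictly positive.
        gamma = gamma0 * (1.0 - step / max_iter) + 1e-3
        j_gamma = -0.5 * p * t * math.log(math.tanh(gamma / (p * t)))
        for r in range(p):
            cols = replicas[r]
            e_old = h_pot(cols)
            if e_old == 0:
                return [ones] + cols
            # Trial move: exchange one +1 and one -1 inside a random column.
            c = rng.randrange(order - 1)
            old = cols[c]
            i = rng.choice([k for k, x in enumerate(old) if x == 1])
            j = rng.choice([k for k, x in enumerate(old) if x == -1])
            new = list(old)
            new[i], new[j] = -1, 1
            cols[c] = new
            d_pot = h_pot(cols) - e_old
            # Change of the coupling term with the two neighbouring slices.
            up = replicas[(r + 1) % p][c]
            down = replicas[(r - 1) % p][c]
            d_kin = -j_gamma * (dot(new, up) + dot(new, down)
                                - dot(old, up) - dot(old, down))
            delta = d_pot / p + d_kin
            # Metropolis rule: always accept downhill, sometimes uphill.
            if delta > 0 and rng.random() >= math.exp(-delta / t):
                cols[c] = old  # reject the trial move
    for cols in replicas:
        if h_pot(cols) == 0:
            return [ones] + cols
    return None


h4 = sqa_search(4)
print(h4)
```

The all-ones column is added only at the end, since every balanced column is automatically orthogonal to it. Larger orders would need more iterations and tuned schedules, which this sketch does not attempt.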


3. Numerical Experiments and Analysis

3.1. Finding a 12-Order SH-Matrix Using SQA

We have performed numerical experiments to find low-order H-matrices. Here we present results
for the H-matrix of order 12 for detailed analysis, since it is the lowest-order H-matrix that cannot
be constructed by the Sylvester method. Initially, all of the slices (replicas) were filled with randomly
generated $v_i \in V$. Note that there are two nested iterations in Algorithm 1. The first one is an iteration
over all replicas with the maximum number set to k · M × M, where M = 12 is the H-order. The second one
is an iteration of flips within a slice of a replica, whose number is c · M, where c is a small constant.
The energy evolution during the iteration is shown in Figure 2. The figure shows curves of the
replica energy $E_{rep}$, the mean potential energy $E_{pmean}$, and the minimum potential energy $E_{pmin}$. The replica
energy is defined by Equation (9), i.e., $E_{rep} \equiv H_{QV}$, whereas the potential energy is given
by Equation (10), i.e., $E_{pot} \equiv H_{pot}$. The mean and minimum values have been taken across the replicas.
Based on the figure, both $E_{pmean}$ and $E_{rep}$ fluctuate over time, but they tend to decrease. The minimum
energy of a lattice in the replica, $E_{pmin}$, also tends to decrease. When $E_{pmin} = 0$, the H-matrix is found.

Figure 2. Energy evolution while the SQA algorithm runs to find an SH-matrix of order 12. Four curves
are drawn in the graph, which are the mean potential energy $E_{pmean}$, the minimum potential energy
$E_{pmin}$, the replica energy $E_{rep}$, and the standard deviation of the potential energy $E_{pstd}$. When $E_{pmin}$
equals zero, the iteration is stopped since an SH-matrix has been found. The $E_{pstd}$ curve indicates high
variation in the configuration of replicas at the initial stage, which is then reduced in later stages.

The degree of orthogonality of the matrix Q is displayed by the indicator matrix $D \equiv Q^T Q$.

Figure 3 shows the initial QSH-matrix and its related indicator matrix. We also show the initial and
final indicators for the first and last slices of the replica in Figure 4. It is expected that all of the
QSH-matrices become more orthogonal, indicated by a lower number of nonzero off-diagonal entries.
The panel showing the last slice of the replica after the iterations are completed clearly
shows this. The found H-matrix is shown in the left part of Figure 5, with its corresponding
indicator shown on the right, which is a diagonal matrix.
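The indicator matrix is straightforward to compute. The following sketch, with illustrative function names and hand-made 4-order examples, forms $D = Q^T Q$ for a matrix stored as a list of columns and counts the nonzero off-diagonal entries, which flag non-orthogonal column pairs.

```python
def indicator(q):
    """D = Q^T Q for a matrix Q given as a list of columns."""
    n = len(q)
    return [[sum(a * b for a, b in zip(q[i], q[j])) for j in range(n)]
            for i in range(n)]


def nonzero_off_diagonal(d):
    """Number of off-diagonal entries of D that are not zero."""
    return sum(1 for i, row in enumerate(d) for j, x in enumerate(row)
               if i != j and x != 0)


# Columns 1 and 2 of q_bad coincide, so they are not orthogonal.
q_bad = [[1, 1, 1, 1], [1, 1, -1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
q_good = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, 1, -1], [1, -1, -1, 1]]
print(nonzero_off_diagonal(indicator(q_bad)))   # 2
print(nonzero_off_diagonal(indicator(q_good)))  # 0
```

For an H-matrix of order n the indicator equals n times the identity matrix, which corresponds to the diagonal pattern described for Figure 5.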


Figure 3. The initial state of the found H-matrix: (a) the QSH-matrix, where white squares indicate +1 and black
squares indicate −1; (b) the orthogonality indicator, where gray squares mark pairs of vectors that are
not orthogonal.

Figure 4. Indicator matrices of the replica content: (a) the first replica at the initial stage; (b) the last
replica at the initial stage; (c) the first replica at the final stage, and (d) the last replica at the final stage.
The matrices at the initial stages show most of the vectors as non-orthogonal, whereas those at the final
stages show most of the vectors as orthogonal.


Figure 5. Final results: (a) the found H-matrix and (b) its orthogonality indicator. The diagonal form of
the indicator matrix indicates that all of the column vectors are now orthogonal.

3.2. The Number of The Replicas and Convergence Issue

In theory, the number of Trotter replicas P should be as large as possible. However, in practice,
we should also consider the convergence issue when a running time restriction (iteration number)
is given. As explained in [13,30], replicas provide diversity of solutions; with a greater P, the best
solution with minimum energy is selected from more candidates. On the other hand, SQA is not merely running Monte Carlo
on several replicas; the interactions between replicas $J_\Gamma(t)$ also define their behavior, i.e., a large value
of Γ at the initial stage implies a low value of $J_\Gamma$, which loosens the connections, so that the replicas
become nearly independent. A low Γ value at the end of an iteration implies a high $J_\Gamma$ value, which
tightens the replica connections such that they become similar. To measure these variations, we used the
standard deviation of the energy across the replicas. Figure 6 shows the curves of variation of the
energy evolution for P = 5, 10, 15, and 20 in finding a 12-order H-matrix.

Figure 6. The effect of the replica number P in the algorithm: although ideally a large P is desired,
it also needs to be adjusted to the problem. Variation in replica energy (in terms of the standard deviation
of the energy across the replicas) when searching for an H-matrix is shown. The numbers of replicas
P = 10, 15, 20 yield large variations up to the end of the iteration, whereas P = 5 yields a better result
with steady values at the end. In all of these cases, for the construction of a 12-order H-matrix, the
maximum number of iterations is set to 20,000 for each P.

Since initially the replicas were set randomly, they will have almost identical energy, so variation
in the energy will be very low. In later iterations, the value will increase as new configurations are
explored, and this will be followed by a decrease, which indicates that the replicas have become
homogeneous. This cycle of increasing and then decreasing variation should be observed if P is chosen properly
with respect to the dimension of the problem (H-order) and a sufficient number of iterations. When P
is too small, the system will perform akin to classical SA, whereas a P that is too large will keep the
system from converging within the available iterations. The figure shows that, for a maximum of 20,000 iterations, the number of
replicas P = 5 is the most suitable; anything higher is too high. This also shows that frequent updates
on a limited number of replicas, compared to less frequent updates on a larger number of replicas,
better achieve convergence.

3.3. Performance Comparison: SQA vs. SA

To compare performances, in the first experiment, we measured the residual error of both
algorithms. Since the ground state is achieved when the matrix becomes orthogonal, in which case
Equation (10) equals zero, the residual error is defined as the minimum $H_{pot}$ over all of the
replicas, i.e., $\epsilon = \min(H_{pot})$. We have chosen the order of the H-matrix to be sufficiently large so that
a residual error remains at the end of the execution of the algorithm, i.e., so that the H-matrix
is not found. We considered order M = 28 to be sufficient for this purpose, where we actually have
$28^3 = 21{,}952$ spins. We also chose P = 5 Trotter slices and plotted the curve for iteration counts from 50 up
to 5,000,000.
Following [30], the annealing schedule was linear; i.e., the temperature T was reduced linearly
in SA, as was the transverse magnetic field strength Γ in SQA. Even though T is reduced linearly, the threshold
probability $P_{thresh}$ changes exponentially. By using the function

$$P_{thresh}(t) = 1 - \frac{1}{2} e^{-1/T(t)} \tag{11}$$

the threshold starts a bit higher than 0.5 and asymptotically approaches 1.0 at the end of the iteration
time t. Figure 7 shows the curves of T(t), $P_{thresh}(t)$, Γ(t), and $J_\Gamma(t)$.
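The relation between a linear temperature schedule and the threshold can be illustrated numerically. The sketch assumes that Equation (11) reads $P_{thresh}(t) = 1 - \frac{1}{2} e^{-1/T(t)}$; the start and end temperatures (10 and 0.1) and the number of steps are hypothetical values chosen only for illustration.

```python
import math


def linear_schedule(start, end, step, n_steps):
    """Linearly interpolate from start (step 0) to end (step n_steps - 1)."""
    return start + (end - start) * step / (n_steps - 1)


def p_thresh(t):
    """Threshold 1 - 0.5 * exp(-1 / T), as read from Equation (11)."""
    return 1.0 - 0.5 * math.exp(-1.0 / t)


n_steps = 5
for s in range(n_steps):
    temp = linear_schedule(10.0, 0.1, s, n_steps)
    print(round(temp, 3), round(p_thresh(temp), 4))
```

At high temperature the exponential is close to one, so the threshold is slightly above 0.5; as T approaches zero the exponential vanishes and the threshold approaches 1.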


Figure 7. The annealing schedules in SA and SQA: (a) Linear temperature schedule and corresponding
threshold schedule in SA. (b) Linear transverse-ﬁeld Γ(t) and corresponding JΓ (t) in SQA.

The experiments were repeated 10 times for each case. The averages of the residual errors for each
iteration number are plotted in Figure 8 for both SA and SQA.
The figure shows that, although initially the residual error of SQA is larger than that of SA, its slope
is steeper. Beyond a sufficiently high number of iterations, which in this case is around 100,000, SQA is superior.
Considering that the SQA result is the least error among the replica slices, it seems that diversity
among the replicas benefits the search. In SA, once a solution is selected, the changes in spin configuration
become less significant as the system reaches a lower energy state. Therefore, in terms of
finding an H-matrix, SQA is superior to SA.


Figure 8. Residual energy left by the SA and SQA algorithms. The QAP curve shows the result when the
horizontal axis accounts for the MCS (Monte Carlo step); i.e., the number of iterations in the SQA
curve is divided by the number of slices P. The figure shows that SQA outperforms SA in finding an
H-matrix. Even when the number of steps is counted without the MCS, SQA eventually outperforms SA
at higher iteration counts, as demonstrated by the steeper slope of the SQA performance curve compared to SA.

In the second experiment, both SA and SQA were applied to matrices of increasing size
(order). Figure 9 shows a graph of computational gain, which is defined as the ratio of the number of
SA iterations to the number of SQA iterations needed to reduce the residual energy to 50 percent of the
initial mean energy of all replicas. The horizontal axis shows the order of the H-matrix, from 4 to 20,
whereas the vertical axis shows the computational gain. The gain grows with the order of the H-matrix,
which shows that the speedup increases with problem size. Based on this curve, we observe that SQA
outperforms SA for the Hadamard search problem.

Figure 9. Curve of computational gain, which is the ratio of the number of SA iterations to the
number of SQA iterations needed for the algorithm to achieve 50 percent of its initial residual error.
The horizontal axis represents the problem size, which is the order of the H-matrix. The figure shows
that the gain grows non-linearly with problem size, indicating that SQA outperforms SA.

4. Conclusions
We have proposed a new method of finding an H-matrix based on SQA. We have formulated the
method as an algorithm, which has been implemented, tested, and analyzed. Low-order H-matrices,
including one of order 12 that cannot be constructed via the Sylvester method, were found. We have
also discussed the advantages of the method over classical SA. Measurements of the residual error and
the relative running time for increasing orders of H-matrices indicate that SQA is superior to SA in
solving the Hadamard search problem.

Acknowledgments: This research was funded by ITB Grant of Research P3MI 2017. The author would like to thank
ITB (Institut Teknologi Bandung) for their continuous support to his research. He also thanks Donny Danudirjdo
and Andika Triwidada for their assistance in the manuscript layout and English editing.
Conﬂicts of Interest: The author declares no conﬂict of interest.

References
1. Lucas, A. Ising formulations of many NP problems. Front. Phys. 2014, 2, 5, doi:10.3389/fphy.2014.00005.
2. Metropolis, N.; Rosenbluth, A.W.; Rosenbluth, M.N.; Teller, A.H. Equation of state calculations by fast
computing machines. J. Chem. Phys. 1953, 21, 1087, doi:10.1063/1.1699114.
3. Kirkpatrick, S.; Gelatt, C.D., Jr.; Vecchi, M.P. Optimization by simulated annealing. Science 1983, 220, 671–680,
doi:10.1126/science.220.4598.671.
4. Cerny, V. Thermodynamical approach to the traveling salesman problem: An efﬁcient simulation algorithm.
J. Optim. Theory Appl. 1985, 45, 41–51, doi:10.1007/BF00940812.
5. Kadowaki, T.; Nishimori, H. Quantum annealing in the transverse Ising model. Phys. Rev. E 1998, 58, 5355,
doi:10.1103/PhysRevE.58.5355.
6. Santoro, G.E.; Martonak, R.; Tosatti, E.; Car, R. Theory of quantum annealing of an Ising spin glass. Science
2002, 295, 2427–2430, doi:10.1126/science.1068774.
7. Boixo, S.; Rønnow, T.F.; Isakov, S.V.; Wang, Z.; Wecker, D.; Lidar, D.A.; Martinis, J.M.; Troyer, M. Evidence for
quantum annealing with more than one hundred qubits. Nat. Phys. 2014, 10, 218–224, doi:10.1038/nphys2900.
8. Heim, B.; Rønnow, T.F.; Isakov, S.V.; Troyer, M. Quantum versus classical annealing of Ising spin glasses.
Science 2015, 348, 215–217, doi:10.1126/science.aaa4170.
9. Rønnow, T.F.; Wang, Z.; Job, J.; Boixo, S.; Isakov, S.V.; Wecker, D.; Martinis, J.M.; Lidar, D.A.; Troyer, M.
Deﬁning and detecting quantum speedup. Science 2014, 345, 420–424, doi:10.1126/science.1252319.
10. Isakov, S.V.; Mazzola, G.; Smelyanskiy, V.N.; Jiang, Z.; Boixo, S.; Neven, H.; Troyer, M. Understanding
Quantum Tunneling through Quantum Monte Carlo Simulation. Phys. Rev. Lett. 2016, 117, 180402,
doi:10.1103/PhysRevLett.117.180402.
11. Mazzola, G.; Smelyanskiy, V.N.; Troyer, M. Quantum Monte Carlo Tunneling from quantum chemistry to
quantum annealing. Phys. Rev. B 2017, 96, 134305, doi:10.1103/PhysRevB.96.134305.
12. Martonak, R.; Santoro, G.E.; Tosatti, E. Quantum annealing of the traveling-salesman problem. Phys. Rev. E
2004, 70, doi:10.1103/PhysRevE.70.057701.
13. Titiloye, O.; Crispin, A. Quantum annealing of the graph coloring problem. Discret. Optim. 2011, 8, 376–384,
doi:10.1016/j.disopt.2010.12.001.
14. Zick, K.M.; Shehab, O.; French, M. Experimental quantum annealing: Case study involving the graph
isomorphism problem. Sci. Rep. 2015, 5, 11168, doi:10.1038/srep11168.
15. Suksmono, A.B. Finding a Hadamard matrix by simulated annealing of spin-vectors. J. Phys. Conf. Ser. 2017,
856, 012012, doi:10.1088/1742-6596/856/1/012012.
16. Suzuki, M. Relationship between d-dimensional quantal spin systems and (d+1)-dimensional Ising systems:
Equivalence, critical exponents and systematic approximants of the partition function and spin correlations.
Prog. Theor. Phys. 1976, 56, 1454–1469, doi:10.1143/PTP.56.1454.
17. Trotter, H.F. On the product of semi-groups of operators. Proc. Am. Math. Soc. 1959, 10, 545–551,
doi:10.1090/S0002-9939-1959-0108732-6.
18. Sylvester, J.J. Thoughts on inverse orthogonal matrices, simultaneous sign successions, and tessellated
pavements in two or more colours, with applications to Newton’s Rule, ornamental tile-work, and the theory
of numbers. Lond. Edinb. Dublin Philos. Mag. J. Sci. 1867, 34, 461–475.
19. Hadamard, J. Résolution d'une question relative aux déterminants. Bull. Sci. Math. 1893, 17, 240–246.
20. Hedayat, A.; Wallis, W.D. Hadamard Matrices and Their Applications. Ann. Stat. 1978, 6, 1184–1238.
21. Horadam, K.J. Hadamard Matrices and Their Applications; Princeton University Press: Princeton, NJ, USA,
2007; ISBN 978-1-40-084290-2.


22. Garg, V. Wireless Communications and Networking; Morgan Kaufmann: San Francisco, CA, USA, 2007;
ISBN 978-0-12-373580-5.
23. Seberry, J.; Wysocki, B.J.; Wysocki, T.A. On some applications of Hadamard matrices. Metrika 2005, 62,
221–239, doi:10.1007/s00184-005-0415-y.
24. Paley, R.E.A.C. On Orthogonal Matrices. J. Math. Phys. 1933, 12, 311–320, doi:10.1002/sapm1933121311.
25. Dade, E.C.; Goldberg, K. The construction of Hadamard matrices. Mich. Math. J. 1959, 6, 247–250,
doi:10.1307/mmj/1028998229.
26. Williamson, J. Hadamard’s determinant theorem and the sum of four squares. Duke Math. J. 1944, 11, 65–81,
doi:10.1215/S0012-7094-44-01108-7.
27. Bush, K.A. Unbalanced Hadamard matrices and ﬁnite projective planes of even order. J. Comb. Theory Ser. A
1971, 11, 38–44, doi:10.1016/0097-3165(71)90005-7.
28. Bush, K.A. Atti del Convegno di Geometria Combinatoria e sue Applicazioni; University Perugia: Perugia, Italy,
1971, Volume 131.
29. Wallis, J.S. On the existence of Hadamard matrices. J. Comb. Theory A 1976, 21, 188–195,
doi:10.1016/0097-3165(76)90062-5.
30. Battaglia, D.A.; Santoro, G.E.; Tosatti, E. Optimization by quantum annealing: Lessons from hard satisﬁability
problems. Phys. Rev. E 2005, 71, 066707, doi:10.1103/PhysRevE.71.066707.

entropy
Article
Quantum Genetic Learning Control of Quantum
Ensembles with Hamiltonian Uncertainties
Ameneh Arjmandzadeh and Majid Yarahmadi *
Department of Mathematics and Computer sciences, Lorestan University, Khorramabad, Lorestan 465, Iran;
[email protected]
* Correspondence: [email protected]; Tel.: +98-916-665-3079

Received: 9 April 2017; Accepted: 19 July 2017; Published: 1 August 2017

Abstract: In this paper, a new method is designed for controlling a quantum ensemble whose members have
uncertainties in their Hamiltonian parameters. By combining sampling-based
learning control (SLC) with a new quantum genetic algorithm (QGA) method, control
of an ensemble of two-level quantum systems with Hamiltonian uncertainties is achieved.
To simultaneously transfer the ensemble members to a desired state, an SLC algorithm is designed.
To reduce the transfer error significantly, an optimization problem is defined. Considering the
advantages of QGA and the nature of the problem, the optimization problem is solved using the QGA
method. For this purpose, N samples are generated by sampling the uncertainty parameters
from a uniform distribution, and an augmented system is created. By using QGA
in the training step, the best control signal is obtained. To test and validate the performance of
the method, the obtained control is applied to some randomly selected samples. A couple of
examples are simulated to investigate the proposed model. The results of the simulations indicate
the effectiveness and the advantages of the proposed method.

Keywords: quantum control; quantum genetic algorithm; sampling-based learning control (SLC)

1. Introduction
In quantum phenomena, as in classical systems, the existence of uncertainties and noise is
unavoidable. For example, in superconducting qubits, the coupling energy of a Josephson junction may
fluctuate [1,2]. Noise and fluctuations may exist in the magnetic and electric fields in cavity
quantum electrodynamics (QED) [3,4]. In nuclear magnetic resonance (NMR) experiments, the strength of
the applied radio frequency field experienced by the spins of an ensemble may not be exactly known [5].
The classification of inhomogeneous quantum ensembles is a significant issue which has many
applications in the discrimination of atoms (or molecules), the separation of isotopic molecules,
and quantum information extraction. Thus, treating quantum systems with uncertainties is an
important and applicable subject which needs to be considered.
A quantum ensemble consists of a large number of single quantum systems. In practice,
some quantum systems exist in the form of quantum ensembles. Each single quantum system in
a quantum ensemble is referred to as a member of the ensemble [6]. Quantum ensembles have wide
applications in emerging quantum technology, including long-distance quantum communication [7],
quantum computation [8], and magnetic resonance imaging [9].
Control of inhomogeneous quantum ensembles is an important issue in practical applications.
Control of inhomogeneous quantum systems for discrimination between two or more similar systems,
for instance, is an attractive field of study [10]. In practical applications, the members of quantum
ensembles could have variations in some parameters of their dynamics. Such ensembles are referred
to as inhomogeneous quantum ensembles [6].

Entropy 2017, 19, 376; doi:10.3390/e19080376 447 www.mdpi.com/journal/entropy


There are many approaches which can be used for solving quantum control problems with
uncertainties. For instance, an optimal control for NMR pulse sequences is designed by applying
gradient algorithms [11]. Additionally, a sequential convex programming method is proposed for
designing robust quantum manipulations [12]. Dong and his colleagues have developed
a variable structure control approach with sliding modes to improve the robustness of quantum
systems, in which a sliding mode control method is presented for two-level quantum systems to treat
bounded uncertainties in the system Hamiltonian [13]. In addition to these works, a Lyapunov control
method is presented to attain a universal quantum control [14]. Sampling-based
learning control (SLC) of inhomogeneous quantum ensembles was first presented in [6] to
compensate for parameter dispersion. As an important application, the sampling-based learning
controller is used for the control design of superconducting quantum systems [15]. The construction
of universal quantum gates by using sampling-based learning control is presented in order to
find robust optimal control fields in the presence of different fluctuations and uncertainties [16].
Furthermore, an extended sampling-based learning control for designing a robust quantum unitary
transformation in quantum information processing is presented and implemented [17]. In other
applications, a sampling-based robust control is used to prevent a control field from failing in
laser-assisted collisions [18].
In [19], a systematic sampling-based learning control method with gradient-based learning
algorithms for steering the components of inhomogeneous quantum ensembles with uncertainties to
the same ideal state is investigated by Dong and coworkers. There are some challenges in gradient
algorithms. For instance, they may fall into a local optimum depending on the initial choices of
problem variables or, in complex situations, function derivatives may not be easily found.
Genetic-type algorithms (GAs) have been used in solving optimization problems. For this purpose,
by applying cross-over and mutation operators to current solutions, new solutions are generated
which, statistically, move toward optimal solutions in the search space. The set of solutions
converges to an optimum solution according to the principle of the Darwinian theory
of evolution.
The quantum genetic algorithm (QGA) was introduced by Narayanan and Moore [20]. The QGA,
even with a smaller population, presents a great ability of global optimization and good robustness.
Therefore, as compared with the common genetic algorithm, QGA has greater effectiveness [21,22].
QGAs are mostly constructed based on qubits (or quantum bits) and state superposition in quantum
mechanics. In contrast to classical representations of chromosomes (a binary string, for instance), here
they are represented by vectors of qubits (quantum registers).
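The qubit representation can be sketched in a few lines. The code below is a generic illustration of a qubit chromosome and is not the specific QGA of this paper: each qubit is stored as a pair of real amplitudes (α, β) with α² + β² = 1, a measurement yields the bit 1 with probability β², and a rotation gate (a common QGA update operator, assumed here) shifts probability between the two outcomes. All function names are hypothetical.

```python
import math
import random


def init_chromosome(n_qubits):
    """Each qubit starts in the equal superposition (1/sqrt(2), 1/sqrt(2))."""
    a = 1.0 / math.sqrt(2.0)
    return [(a, a) for _ in range(n_qubits)]


def measure(chromosome, rng):
    """Collapse each qubit to a classical bit: 1 with probability beta^2."""
    return [1 if rng.random() < beta ** 2 else 0 for _, beta in chromosome]


def rotate(qubit, theta):
    """Rotate the amplitude pair by the angle theta; the norm is preserved."""
    alpha, beta = qubit
    return (math.cos(theta) * alpha - math.sin(theta) * beta,
            math.sin(theta) * alpha + math.cos(theta) * beta)


rng = random.Random(1)
chromosome = init_chromosome(8)
bits = measure(chromosome, rng)
print(bits)
```

A single qubit chromosome thus encodes a probability distribution over all bit strings of its length, which is one reason a small population can still cover the search space.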
In this paper, for controlling quantum systems with uncertainties, a hybrid method based
on the SLC method and QGA is used. Specifically, artificial samples are generated by sampling the
uncertainty parameters in the system model, and an augmented system is constructed by using these
samples in the training step. Then, to train a control law with the desired performance for the
augmented system, QG (quantum genetic) learning and optimization algorithms are used. In the
process of testing, a set of selected uncertainty samples is tested to evaluate the control performance.
Additionally, an improvement of QGA is made to attain better results. In [22], a quantum
mutation operation is added to the conventional quantum genetic algorithm as an improvement.
Quantum mutation, by swapping the values of the probability amplitudes of qubits (α, β), can completely
reverse the individual's evolutionary direction. In this paper, the mutation operation is implemented
on measured qubits (bit strings), which is more effective than adding quantum mutation. A reduction of
the learning iterations, the test error, and the training error, as well as an increase in the fidelity index, are advantages
of the proposed method.
This paper is organized as follows: Section 2 presents the quantum control model and formulates
the control problem; a quantum genetic learning ensemble control algorithm is designed in Section 3;
simulation results and control performance are illustrated in Section 4; conclusions are presented in
Section 5.

Entropy 2017, 19, 376

2. Problem Formulation
In this paper, a finite-dimensional (N-level) closed quantum system with a state in an underlying
Hilbert space is considered. The state can be written as a superposition of eigenstates as follows:

$$|\psi(t)\rangle = \sum_{i=1}^{N} c_i(t)\,|\phi_i\rangle \tag{1}$$

where the complex numbers $c_i(t)$ satisfy $\sum_{i=1}^{N} |c_i(t)|^2 = 1$ and $\{|\phi_i\rangle\}_{i=1}^{N}$ are the eigenstates of the N-level
quantum system [23]. Usually, the states of two-level quantum systems are represented as arrows from
the origin to points on the Bloch sphere [24].
The dynamics are described by the following Schrödinger equation:

$$i\hbar\,\frac{d}{dt}|\psi(t)\rangle = H(t)\,|\psi(t)\rangle, \qquad |\psi(t=0)\rangle = |\psi_0\rangle \tag{2}$$

where $\hbar$ is the reduced Planck constant (we set $\hbar = 1$ in this paper), $H(t)$ is the system Hamiltonian and $i = \sqrt{-1}$.
The dynamics of the system are governed by the following Hamiltonian:

$$H(t) = H_0 + H_c(t) = H_0 + \sum_{m=1}^{M} u_m(t)\,H_m \tag{3}$$

where $H_0$ is the free Hamiltonian of the system and $H_c(t)$ is the time-dependent control Hamiltonian
that represents the interaction of the system with the external control fields $u_m(t)$, $m = 1, 2, \ldots, M$
(scalar functions). Additionally, $H_m$ for $m = 1, 2, \ldots, M$ are Hermitian operators.
In practical applications, there exist external disturbances affecting the control fields. Assume
that the system Hamiltonian is disturbed as follows:

$$H_\Theta(t) = f_0(\theta_0)\,H_0 + \sum_{m=1}^{M} f_m(\theta_m)\,u_m(t)\,H_m \tag{4}$$

where the functions $f_m(\theta_m)$, $m = 0, 1, \ldots, M$, characterize the uncertainties and $\Theta = (\theta_0, \theta_1, \ldots, \theta_M)$.

To compare with the method in [19] and show the advantages of the proposed method, the setting
and assumptions are taken to be the same as for the system described there. Therefore, let $f_m(\theta_m)$,
$m = 1, 2, \ldots, M$, be continuous functions, where the parameters $\theta_m \in [1 - E_m, 1 + E_m]$ may be time-dependent.
For simplicity, the uncertainty bounds are assumed to be equal in this paper,
$E_0 = \cdots = E_m = \cdots = E_M = E$. Additionally, the nominal value of each $\theta_m$ is 1 and the fluctuation
range of each uncertainty parameter $\theta_m$ is $2E$ (where $E \in [0, 1]$).
The objective is to design the controls $\{u_m(t), m = 1, 2, \ldots, M\}$ that steer the quantum system with
uncertainties from an initial state $|\psi_0\rangle$ to a target state $|\psi_{\mathrm{target}}\rangle$ with high fidelity. The fidelity between
two pure quantum states $|\psi_1\rangle$ and $|\psi_2\rangle$ is defined as [25]:

$$F(|\psi_1\rangle, |\psi_2\rangle) = |\langle\psi_1|\psi_2\rangle|. \tag{5}$$

Suppose that an ensemble of similar members with different Hamiltonians is given. The main
objective is to drive the members from an initial state to a desired state. To control the ensemble,
one can select a set of samples instead of all ensemble members and create an augmented system to
be controlled. Let $\{H_{\Theta_n}, n = 1, 2, \ldots, N\}$ be the Hamiltonians of the selected samples, where $N$ is the
number of training samples. The augmented system is constructed as follows:

$$\frac{d}{dt}\begin{pmatrix} |\psi_1(t)\rangle \\ |\psi_2(t)\rangle \\ \vdots \\ |\psi_N(t)\rangle \end{pmatrix} = -i \begin{pmatrix} H_{\Theta_1}(t)\,|\psi_1(t)\rangle \\ H_{\Theta_2}(t)\,|\psi_2(t)\rangle \\ \vdots \\ H_{\Theta_N}(t)\,|\psi_N(t)\rangle \end{pmatrix}, \tag{6}$$


where $\Theta_n \in \{(\theta_{0n_0}, \theta_{1n_1}, \ldots, \theta_{Mn_M})\}$, $n_0 = 1, 2, \ldots, N_0$, $\ldots$, $n_M = 1, 2, \ldots, N_M$, and $N = \prod_{j=0}^{M} N_j$
is the number of training samples. The task is to find the best control $u^*$ such that the
performance function

$$J(u) = \frac{1}{N}\sum_{n=1}^{N} \left|\langle\psi_n(t)|\psi_{n\,\mathrm{target}}\rangle\right|^2 \tag{7}$$

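for each control strategy $u = \{u_m(t), m = 1, 2, \ldots, M\}$ is maximized. Thus, the control problem can
be formulated as the following maximization problem:

$$\begin{aligned}
\max_u \; & J(u) = \frac{1}{N}\sum_{n=1}^{N} \left|\langle\psi_n(T)|\psi_{n\,\mathrm{target}}\rangle\right|^2 \\
\text{s.t.} \; & \frac{d}{dt}\begin{pmatrix} |\psi_1(t)\rangle \\ |\psi_2(t)\rangle \\ \vdots \\ |\psi_N(t)\rangle \end{pmatrix} = -i \begin{pmatrix} H_{\Theta_1}(t)\,|\psi_1(t)\rangle \\ H_{\Theta_2}(t)\,|\psi_2(t)\rangle \\ \vdots \\ H_{\Theta_N}(t)\,|\psi_N(t)\rangle \end{pmatrix}, \\
& |\psi_n(t=0)\rangle = |\psi_0\rangle, \quad H_{\Theta_n}(t) = f_{0,n}(\theta_{0,n})\,H_0 + \sum_{m=1}^{M} f_{m,n}(\theta_{m,n})\,u_m(t)\,H_m, \quad n = 1, 2, \ldots, N
\end{aligned} \tag{8}$$

where $\theta_{m,n} \in [1 - E, 1 + E]$, $t \in [0, T]$ and $n = 1, 2, \ldots, N$.

Note that $J(u)$ depends on the control signal $u$ implicitly, through the requirement that the
Schrödinger equation be satisfied.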

3. Quantum Genetic Learning Ensemble Control Algorithm

In this section, a systematic methodology for the control design of a quantum ensemble is presented
in two steps, training and testing. A quantum learning controller is designed by solving
Equation (8) with the QGA.

3.1. Solving Process

If $u_m(t) = u_m$, $m = 1, 2, \ldots, M$, for $t \in [0, \Delta t]$, where each $u_m$ is a constant, then according to the
Schrödinger equation and the time-evolution equation for each sample, Equation (6) gives:

$$|\psi_n(\Delta t)\rangle = e^{-i\left(H_0 f_0(\theta_{0n_0}) + \sum_{m=1}^{M} u_m f_m(\theta_{mn_m}) H_m\right)\Delta t}\,|\psi_n(0)\rangle, \quad n = 1, 2, \ldots, N. \tag{9}$$

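So, for $t \in [0, \Delta t]$ and considering Equation (9), the objective function of Equation (8) becomes:

$$\max_u J(u) = \frac{1}{N}\sum_{n=1}^{N} \left|\langle\psi_{n\,\mathrm{target}}|\, e^{-i\left(H_0 f_0(\theta_{0n_0}) + \sum_{m=1}^{M} u_m f_m(\theta_{mn_m}) H_m\right)\Delta t}\,|\psi_n(0)\rangle\right|^2. \tag{10}$$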

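Hence, $[0, T]$ is divided into $Q$ subintervals of equal length $\Delta t = T/Q$, and the controls $u_m(t)$,
$m = 1, 2, \ldots, M$, are taken to be constant on each subinterval. Let $|\psi_n^{j-1}(0)\rangle$ be the initial state of the control
system in the $j$-th subinterval; then for the $j$-th subinterval the following problem must be solved:

$$\max_u J_j(u) = \left|\langle\psi^{j}_{n\,\mathrm{target}}|\, e^{-i\left(H_0 f_0(\theta_{0n_0}) + \sum_{m=1}^{M} u_m f_m(\theta_{mn_m}) H_m\right)\Delta t}\,|\psi_n^{j-1}(0)\rangle\right|^2, \tag{11}$$

where

$$|\psi^{j}_{n\,\mathrm{target}}\rangle = \frac{|\psi_n(0)\rangle + j\,\dfrac{|\psi_{n\,\mathrm{target}}\rangle - |\psi_n(0)\rangle}{Q}}{\left\| \, |\psi_n(0)\rangle + j\,\dfrac{|\psi_{n\,\mathrm{target}}\rangle - |\psi_n(0)\rangle}{Q} \, \right\|} \tag{12}$$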


is the target state of the $j$-th subinterval for the $n$-th sample. In each subinterval, Equation (11) is
solved by the QGA and the best control $u_m^{*j}$, $m = 1, \ldots, M$, is obtained. Then, for $j = 1, \ldots, Q$,

$$|\psi_n^{j}\rangle = e^{-i\left(H_0 f_0(\theta_{0n_0}) + \sum_{m=1}^{M} u_m^{*j} f_m(\theta_{mn_m}) H_m\right)\Delta t}\,|\psi_n^{j-1}(0)\rangle \tag{13}$$

is the state reached under the optimal control $u_m^{*j}$, $m = 1, 2, \ldots, M$, in the $j$-th subinterval. It is
taken as the initial state of the next subinterval, that is, $|\psi_n^{j}(0)\rangle = |\psi_n^{j}\rangle$ is the initial state of the
$(j + 1)$-th subinterval, and the process continues.
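
The intermediate targets of Equation (12) can be sketched in a few lines of Python. This is a minimal illustration, assuming that the denominator of Equation (12) is the norm of the interpolated vector; the function name `intermediate_target` is hypothetical and not taken from the paper.

```python
import math


def intermediate_target(psi0, psi_target, j, Q):
    """Normalized state psi0 + j*(psi_target - psi0)/Q, cf. Equation (12).

    psi0 and psi_target are lists of complex amplitudes of equal length.
    Assumption: the interpolated vector is never the zero vector.
    """
    raw = [a + j * (b - a) / Q for a, b in zip(psi0, psi_target)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in raw))
    return [x / norm for x in raw]


# Halfway (j = 10 of Q = 20) between |0> = [1, 0] and |1> = [0, 1].
halfway = intermediate_target([1 + 0j, 0j], [0j, 1 + 0j], 10, 20)
```

For j = Q the function returns the final target state, so the last subinterval aims directly at the overall target.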

3.2. Structure of Quantum Chromosomes

The smallest unit of information stored in a two-state quantum system is called a quantum bit or
qubit, which can be in a superposition of states. The QGA is based on the concepts of the qubit
and the superposition of states in quantum mechanics [22]. A chromosome is a string
of qubits that forms a quantum register. The $j$-th individual chromosome of the $t$-th
generation can be written as

$$u_j^t = \begin{bmatrix}
\alpha_{11}^{jt} & \alpha_{12}^{jt} & \cdots & \alpha_{1k}^{jt} & \alpha_{21}^{jt} & \alpha_{22}^{jt} & \cdots & \alpha_{2k}^{jt} & \cdots & \alpha_{m1}^{jt} & \alpha_{m2}^{jt} & \cdots & \alpha_{mk}^{jt} \\
\beta_{11}^{jt} & \beta_{12}^{jt} & \cdots & \beta_{1k}^{jt} & \beta_{21}^{jt} & \beta_{22}^{jt} & \cdots & \beta_{2k}^{jt} & \cdots & \beta_{m1}^{jt} & \beta_{m2}^{jt} & \cdots & \beta_{mk}^{jt}
\end{bmatrix} \tag{14}$$

where $m$ indicates the number of genes in each chromosome and $k$ represents the number of qubits
encoding each gene. In the initial generation ($t = 0$), the quantum encoding $(\alpha, \beta)$ of each individual
in the population is initialized to $(1/\sqrt{2}, 1/\sqrt{2})$, which means that the superposed state collapses
into each basis state with equal probability.
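
As an illustration of this encoding, the following Python sketch initializes a chromosome of m genes with k qubits each and "measures" it into a bit string. The helper names `init_chromosome` and `measure` are hypothetical, amplitudes are assumed to be real, and the choice m = 4, k = 6 is only an example that yields 24 qubits.

```python
import math
import random


def init_chromosome(m, k):
    """m genes x k qubits, each qubit starting at (alpha, beta) = (1/sqrt(2), 1/sqrt(2))."""
    a = 1.0 / math.sqrt(2.0)
    return [(a, a) for _ in range(m * k)]


def measure(chromosome, rng):
    """Collapse each qubit to 0 with probability alpha**2 and to 1 otherwise."""
    return [0 if rng.random() < alpha * alpha else 1 for alpha, _ in chromosome]


chromosome = init_chromosome(4, 6)             # 24 qubits in total
bits = measure(chromosome, random.Random(0))   # one binary individual
```

Passing an explicit `random.Random` instance keeps the measurement reproducible, which helps when comparing runs.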

3.3. Quantum Rotating Gates

Unlike the conventional genetic algorithm, which uses a crossover operation, the quantum genetic
algorithm encodes chromosomes in the probability amplitudes of qubits and uses quantum
rotating gates to update generations. The genetic operation of the quantum genetic algorithm
consists mainly of applying quantum rotating gates to the superposition or entangled state
to change the probability amplitudes. Accordingly, the construction of the quantum rotating gates is the
key issue of the quantum genetic algorithm, and it directly affects the performance of the algorithm.
Quantum rotating gates can be designed according to the practical problem and are usually
defined as [26]

$$R(\xi_i) = \begin{bmatrix} \cos(\xi_i) & -\sin(\xi_i) \\ \sin(\xi_i) & \cos(\xi_i) \end{bmatrix}. \tag{15}$$

Therefore, the updating process is defined as follows:

$$\begin{bmatrix} \alpha_i' \\ \beta_i' \end{bmatrix} = R(\xi_i) \begin{bmatrix} \alpha_i \\ \beta_i \end{bmatrix} = \begin{bmatrix} \cos(\xi_i) & -\sin(\xi_i) \\ \sin(\xi_i) & \cos(\xi_i) \end{bmatrix} \begin{bmatrix} \alpha_i \\ \beta_i \end{bmatrix} \tag{16}$$

where $(\alpha_i, \beta_i)^T$ and $(\alpha_i', \beta_i')^T$ are the probability amplitudes of the $i$-th qubit in a chromosome before
and after the quantum rotating gate update, respectively. Additionally, $\theta_i$ (the argument $\xi_i$ of Equation (15)) is the rotating angle.
Table 1 presents the updating strategies for the chromosomes. The value and the sign of $\theta_i$ are
determined by the adjustment strategy. Here, $x_i$ is the $i$-th bit of the current chromosome; $\mathrm{Ref}_i$ is the
$i$-th bit of the current optimal binary solution, named the reference binary solution, toward whose
corresponding chromosome all quantum chromosomes should be steered; $f(x)$ is the fitness function;
$s(\alpha_i, \beta_i)$ is the rotation direction and $\Delta\theta_i$ is the increment of the $i$-th rotating
angle. The value of $\Delta\theta_i$ is a constant and is usually around $0.01\pi$. The overall process of the QGA is similar
to that of classical GAs, but differs in how one generation is changed into the next: a new
generation $P(t)$ is obtained by applying quantum rotating gates to every individual.

Table 1. Adjustment strategy of rotating angle.

                                        s(α_i, β_i)
x_i  Ref_i  f(x) > f(Ref)  Δθ_i   α_i β_i > 0  α_i β_i < 0  α_i = 0  β_i = 0
0    0      FALSE          0      0            0            0        0
0    0      TRUE           0      0            0            0        0
0    1      FALSE          Δθ_i   +1           −1           0        ±1
0    1      TRUE           Δθ_i   −1           +1           ±1       0
1    0      FALSE          Δθ_i   −1           +1           ±1       0
1    0      TRUE           Δθ_i   +1           −1           0        ±1
1    1      FALSE          0      0            0            0        0
1    1      TRUE           0      0            0            0        0
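
The lookup in Table 1 and the rotation of Equation (16) can be combined as in the following Python sketch. It is only an illustration under stated assumptions: amplitudes are real, the "±1" entries of Table 1 are resolved to +1, and the names `SIGNS`, `rotation_angle` and `rotate` are hypothetical.

```python
import math

DELTA_THETA = 0.01 * math.pi  # increment of the rotating angle

# (x_i, Ref_i, f(x) > f(Ref)) -> direction s for the four cases
# (alpha*beta > 0, alpha*beta < 0, alpha == 0, beta == 0).
# Assumption: the "+-1" entries of Table 1 are taken as +1.
SIGNS = {
    (0, 1, False): (+1, -1, 0, +1),
    (0, 1, True): (-1, +1, +1, 0),
    (1, 0, False): (-1, +1, +1, 0),
    (1, 0, True): (+1, -1, 0, +1),
}


def rotation_angle(x, ref, x_is_better, alpha, beta):
    """Signed rotating angle s(alpha, beta) * delta_theta read off Table 1."""
    row = SIGNS.get((x, ref, x_is_better))
    if row is None:          # rows with x_i == Ref_i: no rotation
        return 0.0
    if alpha == 0:
        s = row[2]
    elif beta == 0:
        s = row[3]
    elif alpha * beta > 0:
        s = row[0]
    else:
        s = row[1]
    return s * DELTA_THETA


def rotate(alpha, beta, xi):
    """Apply the rotating gate R(xi) of Equation (15) to (alpha, beta)."""
    c, s = math.cos(xi), math.sin(xi)
    return c * alpha - s * beta, s * alpha + c * beta
```

For a qubit at $(1/\sqrt{2}, 1/\sqrt{2})$ with $x_i = 0$ and $\mathrm{Ref}_i = 1$ (and $f(x) > f(\mathrm{Ref})$ false), the rotation increases $\beta_i$, so the next measurement is slightly more likely to yield the reference bit 1.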

A genetic-type iterative learning algorithm is shown in Algorithm 1. The algorithm follows
Section 3.1.

Algorithm 1. Genetic Type Based Iterative Learning Algorithm

set $j = 1$ (counter of subintervals)
set $|\psi_n^{j-1}(0)\rangle = |\psi_n(0)\rangle$
$|\psi^{j}_{n\,\mathrm{target}}\rangle = \dfrac{|\psi_n(0)\rangle + j\,\frac{|\psi_{n\,\mathrm{target}}\rangle - |\psi_n(0)\rangle}{Q}}{\left\| \, |\psi_n(0)\rangle + j\,\frac{|\psi_{n\,\mathrm{target}}\rangle - |\psi_n(0)\rangle}{Q} \, \right\|}$ (target state)
Repeat (for each subinterval)
    Choose a set of arbitrary controls $u_m^{j0}$, $m = 1, 2, \ldots, M$
    Solve problem (11) by using QGA and find $u_m^{*j}$, $m = 1, 2, \ldots, M$
    $|\psi_n^{j}\rangle = e^{-i\left(H_0 f_0(\theta_{0n_0}) + \sum_{k=1}^{M} u_k^{*j} f_k(\theta_{kn_k}) H_k\right)\Delta t}\,|\psi_n^{j-1}(0)\rangle$
    $|\psi_n^{j}(0)\rangle := |\psi_n^{j}\rangle$
    $j = j + 1$
Until $j > Q$
The optimal control $u_m^{*} = \{u_m^{*j}, j = 1, 2, \ldots, Q\}$, $m = 1, 2, \ldots, M$

Additionally, Figure 1 gives a schematic diagram of the proposed method. In this diagram,
first, a random population of quantum chromosomes P(t) is generated. A binary population Pb (t) is
obtained by measuring the present population. After evaluating Pb (t) and identifying the best solution
Ref, all quantum chromosomes are rotated toward the chromosome corresponding to
Ref, according to Table 1. This process generates a new population with better fitness. As indicated in
Figure 1, the above steps are repeated until the stop criterion is satisfied, for all j = 1, 2, . . . , Q.


Figure 1. Schematic diagram of the process for finding the control signals.

4. Simulation Results
In this section, two examples are simulated. Assume that all of the control signals are bounded in
a known interval $[u_{\min}, u_{\max}]$.
The objective and protocol of the simulations are as follows:
Let $[u_{\min}, u_{\max}] = [-4, 6]$ and let the initial state be $|\psi(0)\rangle = (0, 0, 1)$ in real (Bloch) coordinates
(i.e., $|\psi_0\rangle = [1\ 0]^T$). The time interval $[0, 5]$ ($T = 5$) is divided into $Q = 20$ time slices of length
$\Delta t = 0.25$. Additionally, the quantum genetic populations encode the input control signals. The number
of generations, the population size and the length of each quantum chromosome are 200, 100, and 24, respectively.
The mutation rate is 0.05 and the selection percentage of individuals is 50%. The stop condition for
the iterative algorithm is $|1 - J(u)| < \varepsilon$ ($\varepsilon = 0.001$). The objective is to transfer
all of the initial states to the target state $|\psi(T)\rangle = (0, 0, -1)$ (i.e., $|\psi_T\rangle = [0\ 1]^T$) with
maximum fidelity. The value of $\Delta\theta_i$ is set to $0.01\pi$.
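
The paper does not spell out how a measured bit string is mapped to a control value in $[u_{\min}, u_{\max}]$. A common choice in genetic algorithms, shown here purely as an assumption, is a linear mapping of the binary integer onto the interval; the function name `decode_gene` and the 12-bit gene length are hypothetical.

```python
def decode_gene(bits, u_min, u_max):
    """Map a list of 0/1 bits (most significant first) linearly onto [u_min, u_max]."""
    value = 0
    for b in bits:
        value = 2 * value + b
    return u_min + (u_max - u_min) * value / (2 ** len(bits) - 1)


# Example: a 12-bit gene decoded onto the control interval [-4, 6].
u_low = decode_gene([0] * 12, -4.0, 6.0)
u_high = decode_gene([1] * 12, -4.0, 6.0)
```

With this mapping the all-zero gene decodes to $u_{\min}$ and the all-one gene to $u_{\max}$, so every decoded control automatically respects the bounds.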

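Example 1: Consider the following two-level quantum system:

$$i\,\frac{d}{dt}|\psi(t)\rangle = \Big( f_0(\theta_0) H_0 + \sum_{m=1}^{2} f_m(\theta_m)\,u_m(t)\,H_m \Big)|\psi(t)\rangle, \qquad |\psi(t=0)\rangle = |\psi_0\rangle \tag{17}$$

where $H_0 = \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = \frac{1}{2}\sigma_z$ is the free Hamiltonian, $H_1 = \frac{1}{2}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \frac{1}{2}\sigma_x$, and
$H_2 = \frac{1}{2}\sigma_y = \frac{1}{2}\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}$. Additionally, $\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, $\sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}$, and $\sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$ are the Pauli matrices.
Assume that the system’s state is written as

$$|\psi(t)\rangle = c_1(t)|1\rangle + c_2(t)|2\rangle \tag{18}$$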


where $B = \{|1\rangle, |2\rangle\}$ is the orthonormal basis of the corresponding Hilbert space.
Let $C(t) = (c_1(t), c_2(t))$, where $c_i(t)$ are complex time-dependent coefficients. Therefore, Equation (17)
is equivalent to

$$i\,\dot{C}(t) = \Big( f_0(\theta_0) H_0 + \sum_{m=1}^{2} f_m(\theta_m)\,u_m(t)\,H_m \Big) C(t). \tag{19}$$

In this example, let $f_m(\theta_m) = \left(1 - 2\theta_m^2\right)\exp(-\theta_m^2/2)$ be the Mexican hat wavelet functions for
$m = 1, 2$ and $f_0(\theta_0) = 1$ on $[1 - E, 1 + E]$ for $E = 0.21$. After sampling the uncertainty parameters,
every sample can be described as follows:

$$\begin{pmatrix} \dot{c}_1(t) \\ \dot{c}_2(t) \end{pmatrix} = -i \begin{pmatrix} 0.5 f_0(\theta_0) & G(\theta_1, \theta_2) \\ G^*(\theta_1, \theta_2) & -0.5 f_0(\theta_0) \end{pmatrix} \begin{pmatrix} c_1(t) \\ c_2(t) \end{pmatrix} \tag{20}$$

where $G(\theta_1, \theta_2) = 0.5\,\big(f_1(\theta_1)u_1(t) - i f_2(\theta_2)u_2(t)\big)$ and $\theta_i \in [1 - E, 1 + E]$. Additionally, $G^*$ is the
complex conjugate of $G$. To construct an augmented system for the training step of the SLC method,
consider $N$ training samples that are selected through sampling the uncertainties, as follows:

$$\begin{pmatrix} \dot{c}_{1,n}(t) \\ \dot{c}_{2,n}(t) \end{pmatrix} = -i \begin{pmatrix} 0.5 f_0(\theta_{0,n}) & G(\theta_{1,n}, \theta_{2,n}) \\ G^*(\theta_{1,n}, \theta_{2,n}) & -0.5 f_0(\theta_{0,n}) \end{pmatrix} \begin{pmatrix} c_{1,n}(t) \\ c_{2,n}(t) \end{pmatrix}, \quad n = 1, 2, \ldots, N. \tag{21}$$
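
For a single sample of Equation (21) with piecewise-constant controls, one time step of Equation (9) has a closed form, because the Hamiltonian is a combination of Pauli matrices. The Python sketch below uses the identity $e^{-i\varphi\,\hat{n}\cdot\sigma} = \cos\varphi\, I - i \sin\varphi\,(\hat{n}\cdot\sigma)$; the function name `step_two_level` and its argument order are hypothetical, and the values passed for f0, f1, f2 stand for the already evaluated uncertainty functions.

```python
import math


def step_two_level(c, u1, u2, f0, f1, f2, dt):
    """Propagate (c1, c2) over dt under H = 0.5*(f0*sz + f1*u1*sx + f2*u2*sy), hbar = 1."""
    bx, by, bz = f1 * u1, f2 * u2, f0      # H = 0.5 * (b . sigma)
    norm = math.sqrt(bx * bx + by * by + bz * bz)
    if norm == 0.0:
        return list(c)                     # zero Hamiltonian: state unchanged
    nx, ny, nz = bx / norm, by / norm, bz / norm
    phi = 0.5 * norm * dt
    cs, sn = math.cos(phi), math.sin(phi)
    # U = cos(phi) * I - i * sin(phi) * (n . sigma)
    u00 = cs - 1j * sn * nz
    u01 = -1j * sn * (nx - 1j * ny)
    u10 = -1j * sn * (nx + 1j * ny)
    u11 = cs + 1j * sn * nz
    return [u00 * c[0] + u01 * c[1], u10 * c[0] + u11 * c[1]]


# One slice of length dt = 0.25 for a sample with f0 = 1.0, f1 = 0.9, f2 = 1.1.
state = step_two_level([1 + 0j, 0j], 1.3, -0.7, 1.0, 0.9, 1.1, 0.25)
```

Applying this step to every training sample with the same controls, slice after slice, propagates the augmented system without a general matrix exponential.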

The results of simulation are illustrated in Figure 2. Figure 2a illustrates the control signals um (t),
m = 1, 2 obtained in the training step.

Figure 2. Control of an ensemble of two-level quantum systems with uncertainties: (a) control signals
$u_m(t)$, $m = 1, 2$; (b) fidelity function (fitness function performance); (c) simultaneous steering of the
ensemble members to the desired state.


Figure 2b illustrates the mean fidelity over all states, which serves as the fitness function of the
QGA. Finally, Figure 2c illustrates the simultaneous steering of the ensemble members to the desired state.
As the simulation results indicate, 25 training samples are steered to the target state with a fidelity
of 0.9986 and an error of 0.001. When the control signals found in the training step are applied to
200 test samples, the fidelity is 0.9968 and the corresponding error is 0.003.

Example 2: The second example is a three-level quantum system with uncertainties in the Hamiltonian parameters.
Such systems are found widely in natural and artificial atoms, and some atoms can be described by a V-type
three-level quantum system model. A robust preparation of this class of states is important for practical
applications of quantum technology. The SLC method, combined with the QGA, is used for a V-type quantum control
system. Assume that the state of the system is:

$$|\psi(t)\rangle = c_1(t)|1\rangle + c_2(t)|2\rangle + c_3(t)|3\rangle \tag{22}$$

with $B = \{|1\rangle, |2\rangle, |3\rangle\}$ the orthonormal basis of the corresponding Hilbert space.
Let $C(t) = (c_1(t), c_2(t), c_3(t))$, where $c_i(t)$ are complex numbers. Then we have

$$i\,\dot{C}(t) = \Big( f_0(\theta_0) H_0 + \sum_{m=1}^{4} f_m(\theta_m)\,u_m(t)\,H_m \Big) C(t). \tag{23}$$

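We take $H_0 = \mathrm{diag}(1.5, 1, 0)$ as the free Hamiltonian and choose $H_1$, $H_2$, $H_3$, and $H_4$ as follows:

$$H_1 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad H_2 = \begin{pmatrix} 0 & -i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad H_3 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \quad H_4 = \begin{pmatrix} 0 & 0 & -i \\ 0 & 0 & 0 \\ i & 0 & 0 \end{pmatrix}. \tag{24}$$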

After sampling the uncertainty parameters, every sample can be described as follows:

$$\begin{pmatrix} \dot{c}_1(t) \\ \dot{c}_2(t) \\ \dot{c}_3(t) \end{pmatrix} = -i \begin{pmatrix} 1.5 f_0(\theta_0) & G(\theta_1, \theta_2) & G(\theta_3, \theta_4) \\ G^*(\theta_1, \theta_2) & f_0(\theta_0) & 0 \\ G^*(\theta_3, \theta_4) & 0 & 0 \end{pmatrix} \begin{pmatrix} c_1(t) \\ c_2(t) \\ c_3(t) \end{pmatrix}, \tag{25}$$

where $G(\theta_1, \theta_2) = f_1(\theta_1)u_1(t) - i f_2(\theta_2)u_2(t)$, $G(\theta_3, \theta_4) = f_3(\theta_3)u_3(t) - i f_4(\theta_4)u_4(t)$, and
$\theta_i \in [1 - E, 1 + E]$. $E \in [0, 1]$ is a given constant and $G^*$ is the complex conjugate of $G$. To allow
comparison with previous work, the uncertainty coefficients are chosen as in [19],
that is, $f_m(\theta_m) = \theta_m$ and $f_0(\theta_0) = \theta_0$, with uniform distributions over $[0.79, 1.21]$. To construct an
augmented system for the training step of the SLC design, we choose $N$ training samples (denoted as
$n = 1, 2, \ldots, N$) through sampling the uncertainties as follows:
$$\begin{pmatrix} \dot{c}_{1,n}(t) \\ \dot{c}_{2,n}(t) \\ \dot{c}_{3,n}(t) \end{pmatrix} = -i \begin{pmatrix} 1.5 f_0(\theta_{0,n}) & G(\theta_{1,n}, \theta_{2,n}) & G(\theta_{3,n}, \theta_{4,n}) \\ G^*(\theta_{1,n}, \theta_{2,n}) & f_0(\theta_{0,n}) & 0 \\ G^*(\theta_{3,n}, \theta_{4,n}) & 0 & 0 \end{pmatrix} \begin{pmatrix} c_{1,n}(t) \\ c_{2,n}(t) \\ c_{3,n}(t) \end{pmatrix} \tag{26}$$

where $G(\theta_{1,n}, \theta_{2,n}) = f_1(\theta_{1,n})u_1(t) - i f_2(\theta_{2,n})u_2(t)$ and $G(\theta_{3,n}, \theta_{4,n}) = f_3(\theta_{3,n})u_3(t) - i f_4(\theta_{4,n})u_4(t)$.
Now, the objective is to find a robust control strategy $u(t) = \{u_m(t), m = 1, 2, 3, 4\}$ to drive
the quantum system from $|\psi_0\rangle = |1\rangle$ (i.e., $C_0 = (1, 0, 0)$) to $|\psi_{\mathrm{target}}\rangle = \frac{1}{\sqrt{2}}(|2\rangle + |3\rangle)$
(i.e., $C_{\mathrm{target}} = (0, 1/\sqrt{2}, 1/\sqrt{2})$). The general conditions here are the same as in the previous
example, except that $Q = 10$. The results converge regardless of the initial values, and the method is more
precise than the gradient method, as shown in Table 2.


Table 2. Comparison between the results of QGA and gradient algorithm.

Method                           Training Error   Test Error
Gradient based learning control  0.004            0.08
Quantum Genetic algorithm        0.002            0.005

The training error is computed as $|1 - J(u^*(T))|$, in which $J(u^*(T))$ is the fidelity function for the
training samples. To calculate the test error, the optimal control $u^*$ is applied to the test samples,
which are selected randomly, and $|1 - J(u^*(T))|$ is computed for the test samples
as a test error index. The method presented in this paper always converges and does not depend
on the initial choice of $u = \{u_m(t), m = 1, 2, \ldots, M\}$. Figure 3a–c show the control signals $u_m(t)$,
$m = 1, 2, 3, 4$, and the fidelity function for steering the training samples simultaneously to the target state.
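
The error index described above can be sketched as follows, given the final states of a set of samples. This is a minimal illustration of Equation (7) evaluated at the final time; the function names `performance` and `error_index` are hypothetical, and states are plain lists of complex amplitudes.

```python
def performance(final_states, target):
    """J = (1/N) * sum over samples of |<psi_n(T)|psi_target>|^2."""
    total = 0.0
    for psi in final_states:
        overlap = sum(a.conjugate() * b for a, b in zip(psi, target))
        total += abs(overlap) ** 2
    return total / len(final_states)


def error_index(final_states, target):
    """Training or test error |1 - J|, depending on which samples are passed in."""
    return abs(1.0 - performance(final_states, target))


target_state = [0j, 1 + 0j]
perfect = error_index([target_state, target_state], target_state)
```

The same function serves for both errors: pass the final states of the training samples to obtain the training error, and those of the randomly selected test samples to obtain the test error.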

Figure 3. Control of an ensemble of three-level quantum systems with uncertainties: (a) control signals
$u_m(t)$, $m = 1, 2$; (b) control signals $u_m(t)$, $m = 3, 4$; (c) fidelity function.

The training samples are steered to the target state with a fidelity of 0.9982. The control values
found in the training step are applied to 200 test samples, for which a fidelity of 0.9954 is achieved
with a test error of 0.005. Figure 3a,b show the control signals $u_1(t)$, $u_2(t)$, $u_3(t)$, and $u_4(t)$
over the time interval $[0, 5]$, and Figure 3c illustrates the mean fidelity over all of the
states, which serves as the fitness function of the QGA.

5. Conclusions
In this paper, a new quantum genetic sampling-based learning controller is designed. For this
purpose, an unconstrained nonlinear optimization problem is formulated and solved by a new quantum
genetic algorithm. All of the members of an inhomogeneous quantum ensemble are transferred to a known
target state simultaneously. In this method, the controller performance is independent of the initial input
values, which is an important advantage of the proposed method over gradient-based
learning methods. Additionally, the transfer errors and the number of learning iterations are reduced
significantly. Two examples, for two- and three-level quantum systems, are simulated with
the proposed method. The simulation results indicate the advantages and efficiency of the
presented method.

Acknowledgments: The authors are very grateful to the editor and anonymous reviewers for their suggestions in
improving the quality of the paper.
Author Contributions: The authors conceived of the presented idea and developed the theory of the presented paper.
Both authors discussed the results and contributed to the final manuscript. Both authors have read and approved
the final manuscript.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Shnirman, A.; Schön, G.; Hermon, Z. Quantum manipulations of small Josephson Junctions. Phys. Rev. Lett.
1997, 79, 2371–2374. [CrossRef]
2. Makhlin, Y.; Schön, G.; Shnirman, A. Josephson junction quantum logic gates. Comput. Phys. Commun. 2000,
127, 156–164. [CrossRef]
3. Giovannetti, V.; Vitali, D.; Tombesi, P.; Ekert, A. Scalable quantum computation with cavity QED systems.
Phys. Rev. A 2000, 62, 032306. [CrossRef]
4. Shu, J.; Zou, X.; Xiao, Y.; Guo, G. Quantum phase gate of photonic qubits in a cavity QED system. Phys. Rev. A
2007, 74, 044302. [CrossRef]
5. Li, J.S.; Khaneja, N. Control of inhomogeneous quantum ensembles. Phys. Rev. A 2006, 73, 030302. [CrossRef]
6. Chen, C.; Dong, D.; Long, R.; Petersen, I.R.; Rabitz, H.A. Sampling-based learning control of inhomogeneous
quantum ensembles. Phys. Rev. A 2014, 89, 023402. [CrossRef]
7. Duan, L.M.; Lukin, M.D.; Cirac, J.I.; Zoller, P. Long-distance quantum communication with atomic ensembles
and linear optics. Nature 2001, 414, 413–418. [CrossRef] [PubMed]
8. Cory, D.G.; Fahmy, A.F.; Havel, T.F. Ensemble quantum computing by NMR spectroscopy. Proc. Natl. Acad.
Sci. USA 1997, 94, 1634–1639. [CrossRef] [PubMed]
9. Li, J.S.; Ruths, J.; Yu, T.Y.; Arthanari, H.; Wagner, G. Optimal pulse design in quantum control: A uniﬁed
computational method. Proc. Natl. Acad. Sci. USA 2011, 108, 1879–1884. [CrossRef] [PubMed]
10. Mitra, A.; Rabitz, H. Mechanistic Analysis of Optimal Dynamic Discrimination of Similar Quantum Systems.
J. Phys. Chem. A 2004, 108, 4778–4785. [CrossRef]
11. Khaneja, N.; Reiss, T.; Kehlet, C.; Schulte-Herbrüggen, T.; Glaser, S.J. Optimal control of coupled spin
dynamics: Design of NMR pulse sequences by gradient ascent algorithm. J. Magn. Reson. 2005, 172, 296–305.
[CrossRef] [PubMed]
12. Kosut, R.L.; Grace, M.D.; Brif, C. Robust control of quantum gates via sequential convex programming.
Phys. Rev. A 2013, 88, 1–12. [CrossRef]
13. Dong, D.; Petersen, I.R. Sliding mode control of two-level quantum systems. Automatica 2012, 48, 725–735.
[CrossRef]
14. Hou, S.C.; Wang, L.C.; Yi, X.X. Realization of quantum gates by Lyapunov control. Phys. Lett. A 2014, 378,
699–704. [CrossRef]
15. Dong, D.; Chen, C.; Qi, B.; Petersen, I.R.; Nori, F. Robust manipulation of superconducting qubits in the
presence of ﬂuctuations. Sci. Rep. 2015, 5, 7873. [CrossRef] [PubMed]
16. Dong, D.; Wu, C.; Chen, C.; Qi, B.; Petersen, I.R.; Nori, F. Learning robust pulses for generating universal
quantum gates. Sci. Rep. 2015, 6, 36090. [CrossRef] [PubMed]
17. Wu, C.; Qi, B.; Chen, C. Robust learning control design for quantum unitary transformations.
IEEE Trans. Cybern. 2016, 99, 1–13. [CrossRef] [PubMed]
18. Zhang, W.; Dong, D.; Petersen, I.R.; Rabitz, H.A. Sampling-based robust control in synchronizing collision
with shaped laser pulses: An application. RSC Adv. 2016, 6, 92962–92969. [CrossRef]
19. Dong, D.; Mabrok, M.A.; Petersen, I.R.; Qi, B.; Chen, C.; Rabitz, H. Sampling-Based Learning Control for
Quantum Systems with Uncertainties. IEEE Trans. Control Syst. Technol. 2015, 23, 2155–2166. [CrossRef]


20. Narayanan, A.; Moore, M. Quantum-inspired genetic algorithm. In Proceedings of the IEEE International
Conference on Evolutionary Computation, Nagoya, Japan, 20–22 May 1996.
21. Laboudi, Z.; Chikhi, S. Comparison of Genetic Algorithm and Quantum Genetic Algorithm. Int. Arab J.
Inf. Technol. 2012, 9, 243–249.
22. Wang, H.; Liu, J.; Zhi, J.; Fu, C. The Improvement of Quantum Genetic Algorithm and Its Application on
Function Optimization. Math. Probl. Eng. 2013, 2013, 1–10. [CrossRef]
23. Wu, C.; Chen, C.; Qi, B.; Dong, D. Robust quantum operation for two-level systems using sampling-based
learning control. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics,
Hong Kong, China, 9–12 October 2015.
24. Wang, L.C.; Hou, S.C.; Yi, X.X.; Dong, D.; Petersen, I.R. Optimal Lyapunov quantum control of two-level
systems: Convergence and extended techniques. Phys. Lett. A 2014, 378, 1074–1080. [CrossRef]
25. Nielsen, M.A.; Chuang, I.L. Distance Measures for Quantum Information; Cambridge University Press:
Cambridge, UK, 2000.
26. Lahoz-Beltra, R. Quantum Genetic Algorithms for Computer Scientists. Computers 2016, 5, 24. [CrossRef]

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).

entropy
Article
Discrete Wigner Function Derivation of the
Aaronson–Gottesman Tableau Algorithm
Lucas Kocia, Yifei Huang and Peter Love *
Department of Physics, Tufts University, Medford, MA 02155, USA; [email protected] (L.K.);
[email protected] (Y.H.)
* Correspondence: [email protected]; Tel.: +1-617-627-3029 (ext. 7-1065)

Received: 3 May 2017; Accepted: 4 July 2017; Published: 11 July 2017

Abstract: The Gottesman–Knill theorem established that stabilizer states and Clifford operations can
be efﬁciently simulated classically. For qudits with odd dimension three and greater, stabilizer states
and Clifford operations have been found to correspond to positive discrete Wigner functions and
dynamics. We present a discrete Wigner function-based simulation algorithm for odd-d qudits that
has the same time and space complexity as the Aaronson–Gottesman algorithm for qubits. We show
that the efﬁciency of both algorithms is due to harmonic evolution in the symplectic structure of
discrete phase space. The differences between the Wigner function algorithm for odd-d and the
Aaronson–Gottesman algorithm for qubits are likely due only to the fact that the Weyl–Heisenberg
group is not in SU (d) for d = 2 and that qubits exhibit state-independent contextuality. This may
provide a guide for extending the discrete Wigner function approach to qubits.

Keywords: quantum information; quantum computation; semiclassical physics

1. Introduction
The cost of brute-force classical simulation of the time evolution of n-qubit states grows
exponentially with n. An important exception to this involves the set of Clifford operators acting
on stabilizer states. This set of states plays an important role in quantum error correction [1] and is
closed under action by Clifford gates. Efficient simulation of such systems was demonstrated with the
tableau algorithm of Aaronson and Gottesman [1,2] for qubits (d = 2). Finding the underlying reason
for why such an efficient algorithm is possible for Clifford circuit simulation has since been the subject
of much study [3–5].
Recent progress has been the result of work by Wootters [6], Gross [7], Veitch et al. [8,9],
Mari et al. [4], and Howard et al. [5], who have formulated a new perspective based on the
discrete phase spaces of states and operators in finite Hilbert spaces using discrete Wigner functions.
In odd-dimensional systems, they have shown that stabilizer states have positive-definite discrete
Wigner functions and that Clifford operators are positive-definite maps. This implies that Clifford
circuits are non-contextual and are efficiently simulatable on classical computers. In odd-dimensional
systems, stabilizer states have been shown to be the discrete analogue to Gaussian states in continuous
systems [7] and Clifford group gates have been shown to have underlying harmonic Hamiltonians
that preserve the discrete Weyl phase space points [10]. This means Clifford circuits are expressible by
path integrals truncated at order $\hbar^0$ and are thus manifestly classical [10,11].
This poses the question: what is the relationship between past efficient algorithms for Clifford
circuits and the propagation of discrete Wigner functions of stabilizer states under Clifford operators?
In the present paper, we show that the original Aaronson–Gottesman tableau algorithm for qubit
stabilizer states is actually equivalent to such a discrete Wigner function propagation and that the
tableau matrix coincides with the discrete Wigner function of a stabilizer state. We accomplish this by

Entropy 2017, 19, 353; doi:10.3390/e19070353 459 www.mdpi.com/journal/entropy


first developing a Wigner function-based algorithm that classically simulates stabilizer state evolution
under Clifford gates and measurements in the $\hat{Z}$ Pauli basis for odd $d$. We then show its equivalence
to the well-known Aaronson–Gottesman tableau algorithm [2] for qubits ($d = 2$). Both algorithms
require $O(n^2)$ dits to represent $n$ stabilizer states, $O(n)$ operations per Clifford operator, and both
deterministic and random measurements require $O(n^2)$ operations.
The Aaronson–Gottesman tableau algorithm makes use of the Heisenberg representation.
This means that time evolution is accomplished by updating an associated tableau or matrix
representation of the Clifford operators instead of the stabilizer states themselves. The algorithm we
present is framed in the Schrödinger picture and involves evolving the Wigner function of stabilizer
states. By demonstrating that the two algorithms are equivalent, we show that the formulation of
Clifford simulation in the Heisenberg picture is a choice and not a necessity for its efﬁcient simulation.
Furthermore, by instead working in the Schrödinger picture we are able to more easily reveal the purely
classical basis of both algorithms and the physically intuitive phase space structures and symplectic
properties on which they rely.

2. Discrete Wigner Function for Odd d Qudits

Before we discuss the discrete Wigner function, we introduce a basic framework that defines how
a phase space behaves for odd d-dimensional Hilbert spaces. To begin, we associate the computational
basis with the position basis, such that the Pauli $\hat{Z}_j$ operator on the $j$th qudit for $n$ qudits acts as a
“boost” operator:

$$\hat{Z}_j\,|k_1, \ldots, k_j, \ldots, k_n\rangle = e^{\frac{2\pi i}{d} k_j}\,|k_1, \ldots, k_j, \ldots, k_n\rangle, \tag{1}$$

where $k_j \in \mathbb{Z}/d\mathbb{Z}$ for $1 \le j \le n$.

The discrete Fourier transform operator is defined by:

$$\hat{F}_j = \frac{1}{\sqrt{d}} \sum_{k_j, l_j \in \mathbb{Z}/d\mathbb{Z}} e^{-\frac{2\pi i}{d} k_j l_j}\,|k_1, \ldots, k_j, \ldots, k_n\rangle\langle l_1, \ldots, l_j, \ldots, l_n|.$$

This is the d-dimensional equivalent of the Hadamard gate and allows us to define the Pauli $\hat{X}_j$ operator
as follows:

$$\hat{X}_j \equiv \hat{F}_j \hat{Z}_j \hat{F}_j^\dagger. \tag{2}$$

While $\hat{Z}_j$ is a boost, $\hat{X}_j$ is a shift operator because

$$\hat{X}_j^{\delta q}\,|k_1, \ldots, k_j, \ldots, k_n\rangle \equiv |k_1, \ldots, k_j \oplus \delta q, \ldots, k_n\rangle, \tag{3}$$

where $\oplus$ denotes integer addition mod $d$.
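
The relation between the boost, the Fourier transform and the shift can be checked numerically for a single qudit. The Python sketch below builds the matrices from Equations (1) and (2) with plain lists; the helper names (`boost`, `fourier`, `dagger`, `matmul`, `shift`) are hypothetical and the single-qudit restriction is an assumption made for brevity.

```python
import cmath
import math


def boost(d):
    """Z: diagonal matrix with entries exp(2*pi*i*k/d)."""
    w = cmath.exp(2j * math.pi / d)
    return [[w ** k if k == l else 0j for l in range(d)] for k in range(d)]


def fourier(d):
    """F[k][l] = exp(-2*pi*i*k*l/d) / sqrt(d)."""
    return [[cmath.exp(-2j * math.pi * k * l / d) / math.sqrt(d) for l in range(d)]
            for k in range(d)]


def dagger(a):
    """Conjugate transpose of a square matrix."""
    n = len(a)
    return [[a[l][k].conjugate() for l in range(n)] for k in range(n)]


def matmul(a, b):
    """Product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def shift(d):
    """X = F Z F^dagger, cf. Equation (2)."""
    f = fourier(d)
    return matmul(matmul(f, boost(d)), dagger(f))


X = shift(3)  # column k of X has its only nonzero entry in row (k + 1) mod 3
```

With the sign conventions above, the resulting matrix maps the basis state $|k\rangle$ to $|k \oplus 1\rangle$, in agreement with Equation (3) for $\delta q = 1$.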

We can reexpress the boost $\hat{Z}_j$ and shift $\hat{X}_j$ operators in terms of their generators, which are the
conjugate $\hat{q}_j$ and $\hat{p}_j$ operators, respectively:

$$\hat{Z}_j = e^{\frac{2\pi i}{d}\hat{q}_j} \tag{4}$$

and

$$\hat{X}_j = e^{-\frac{2\pi i}{d}\hat{p}_j}. \tag{5}$$

Thus, we can refer to the $\hat{X}_j$ basis as the momentum ($p_j$) basis, which is equivalent to the Fourier
transform of the $q_j$ basis:

$$\hat{p}_j = \hat{F}_j\,\hat{q}_j\,\hat{F}_j^\dagger. \tag{6}$$

These bases form the discrete Weyl phase space $(p, q)$.


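The Wigner function $W_\Psi(p, q)$ of a pure state $|\Psi\rangle$ is defined on this discrete Weyl phase space:

$$W_\Psi(p, q) = d^{-n} \sum_{\xi_q \in (\mathbb{Z}/d\mathbb{Z})^n} e^{-\frac{2\pi i}{d}\,\xi_q \cdot p}\; \Psi\!\left(q + \frac{(d+1)\,\xi_q}{2}\right) \Psi^*\!\left(q - \frac{(d+1)\,\xi_q}{2}\right). \tag{7}$$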

This is equivalent to the discrete Wigner function introduced by Gross [7]. We will shortly be interested
in the discrete Wigner function of stabilizer states. However, ﬁrst, we introduce the effect that the
Clifford gates have in this discrete Weyl phase space.
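
For a single qudit (n = 1), Equation (7) can be evaluated directly. The Python sketch below is an illustration under that assumption; the function name `wigner` is hypothetical, and $(d+1)/2$ is used as the integer inverse of 2 modulo the odd dimension d, with all arguments of $\Psi$ reduced mod d.

```python
import cmath
import math


def wigner(psi):
    """Return W[p][q] for a single odd-d qudit state psi, cf. Equation (7) with n = 1."""
    d = len(psi)
    half = (d + 1) // 2          # multiplicative inverse of 2 modulo odd d
    w = [[0.0] * d for _ in range(d)]
    for p in range(d):
        for q in range(d):
            total = 0j
            for xi in range(d):
                phase = cmath.exp(-2j * math.pi * xi * p / d)
                total += (phase * psi[(q + half * xi) % d]
                          * psi[(q - half * xi) % d].conjugate())
            w[p][q] = (total / d).real   # the sum is real for any state
    return w


# Position eigenstate |q0 = 1> of a qutrit: weight 1/3 on the line q = 1.
W_position = wigner([0j, 1 + 0j, 0j])
```

For the position eigenstate the function is non-negative and supported on the line $q = q_0$, while a uniform superposition is supported on the line $p = 0$; both are stabilizer states.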

2.1. Clifford Gates

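A Clifford group gate $\hat{V}$ is related to a symplectic transformation on the discrete Weyl phase space,
governed by a symplectic matrix $M_{\hat{V}}$ and vector $\alpha_{\hat{V}}$ [7]:

$$\begin{pmatrix} p' \\ q' \end{pmatrix} = M_{\hat{V}} \left[ \begin{pmatrix} p \\ q \end{pmatrix} + \frac{1}{2}\alpha_{\hat{V}} \right] + \frac{1}{2}\alpha_{\hat{V}}. \tag{8}$$

Wigner functions $W_\Psi(x)$ of states evolve under Clifford operators $\hat{V}$ by

$$W_\Psi\!\left( M_{\hat{V}} \left( x + \alpha_{\hat{V}}/2 \right) + \alpha_{\hat{V}}/2 \right), \tag{9}$$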

where $x \equiv (p, q)$. When considering Clifford gate propagation, we can restrict ourselves to a set of gates
that generate the Clifford group. One such set of generators is made up of the phase-shift gate $\hat{P}_i$,
the Hadamard gate $\hat{F}_i$, and the controlled-not (CNOT) $\hat{C}_{ij}$ (which act on the $i$th and $j$th qudits).
The phase shift $\hat{P}_i$ is a one-qudit gate with the underlying Hamiltonian $H_{\hat{P}_i} = -\frac{d+1}{2} q_i^2 + \frac{d+1}{2} q_i$ [10].
Without loss of generality, we will instead consider

$$\hat{P}_i' = \hat{P}_i \hat{P}_i \hat{Z}_i, \tag{10}$$

which we will refer to as the phase-shift gate in this paper. We note that the usual phase-shift can be
obtained from the new one within the Clifford group:

$$\hat{P}_i = \hat{P}_i' \hat{P}_i' \hat{Z}_i, \tag{11}$$

where $[\hat{P}_i, \hat{Z}_i] = [\hat{P}_i', \hat{Z}_i] = 0$. Hence, $\hat{P}_i'$ is an adequate replacement generator for $\hat{P}_i$, and we will use
it instead of $\hat{P}_i$ from now on. Since its Hamiltonian has no linear term ($H_{\hat{P}_i'} = -q_i^2$), this leads to an
easier presentation ahead since $\alpha_{\hat{P}_i'} = 0$. The corresponding equations of motion for $\hat{P}_i'$ are $\dot{p}_i = 2q_i$
and $\dot{q}_i = 0$. Hence, for $\Delta t = 1$,

$$\left( M_{\hat{P}_i'} \right)_{j,k} = \delta_{j,k} + 2\,\delta_{i,j}\,\delta_{n+i,k}. \tag{12}$$

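The Hadamard gate $\hat{F}_i$ is a one-qudit gate and has the underlying Hamiltonian
$H_{\hat{F}_i} = -\frac{\pi}{4}(p_i^2 + q_i^2)$ [10]. The corresponding equations of motion are $\dot{p}_i = \frac{\pi}{2} q_i$ and $\dot{q}_i = -\frac{\pi}{2} p_i$.
Hence, for $\Delta t = 1$,

$$\left(M_{\hat{F}_i}\right)_{j,k} = \delta_{j,k} - \delta_{i,j}\delta_{i,k} - \delta_{n+i,j}\delta_{n+i,k} + \delta_{i,j}\delta_{n+i,k} - \delta_{n+i,j}\delta_{i,k}, \tag{13}$$

and $\alpha_{\hat{F}_i} = 0$.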


Finally, the two-qudit CNOT $\hat{C}_{ij}$ on control qudit $i$ and second qudit $j$ has the corresponding
Hamiltonian $H_{\hat{C}_{ij}} = p_i q_j$ [10]. The corresponding equations of motion are $(\dot{p}_i, \dot{p}_j) = -(0, p_i)$ and
$(\dot{q}_i, \dot{q}_j) = (q_j, 0)$. Hence, for $\Delta t = 1$,

$$\left(M_{\hat{C}_{ij}}\right)_{k,l} = \delta_{k,l} - \delta_{i,k}\delta_{j,l} + \delta_{n+j,k}\delta_{n+i,l}, \tag{14}$$

and $\alpha_{\hat{C}_{ij}} = 0$.

2.2. Wigner Functions of Stabilizer States

A discrete Wigner function for stabilizer states associated with the boost and shift operators
deﬁned in Equations (4) and (5) is given by the following theorem [10]:

Theorem 1. The discrete Wigner function $W_\Psi(x)$ of a stabilizer state $\Psi$ for any odd $d$ and $n$ qudits is $\delta_{\Phi \cdot x,\, r}$
for a $2n \times 2n$ matrix $\Phi$ and $2n$-vector $r$ with entries in $\mathbb{Z}/d\mathbb{Z}$.

An equivalent form was proven by Gross [7] who also showed that these discrete Wigner functions of
stabilizer states are non-negative. In particular, if we begin with a stabilizer state defined as $|\Psi_0\rangle = |q_0\rangle$,
then $W_{\Psi_0}(x) = \delta_{\Phi_0 \cdot x,\, r_0}$, where $\Phi_0 = \begin{pmatrix} 0 & 0 \\ 0 & I_n \end{pmatrix}$ for $I_n$ the $n \times n$ identity matrix, and $r_0 = (0, q_0)$.

3. Wigner Stabilizer Algorithm for Odd d Qudits

With the discrete Wigner function of a stabilizer state defined in Theorem 1 and the effect of the
Clifford group generators on discrete Wigner functions defined in Equation (9), we can now examine
the effect Clifford operators have on stabilizer states. We note that since the discrete Wigner functions
of stabilizer states are non-negative and Clifford operations take stabilizer states to stabilizer states,
it follows that Clifford operations (if associated positive-operator valued measures (POVMs) also have
non-negative Wigner functions) can always be efficiently classically simulated by sampling from these
Wigner functions as probability distributions [4]. However, here we pursue a description that is not
dependent on classical sampling.

3.1. Stabilizer Representation

From Theorem 1, propagation of the stabilizer state Ψ can be represented by considering the
state’s Wigner function: WΨ ( x) = δΦt · x,r t . In this way, Φt and r t specify a linear system of equations
in terms of pt and qt . The first n rows of Φt are the coefficients of ( pt , qt ) T in p0 ( pt , qt ) and the last n
rows of Φt are the coefficients of ( pt , qt ) T in q0 ( pt , qt ):

$$\begin{pmatrix} p_0 \\ q_0 \end{pmatrix} = \Phi_t \begin{pmatrix} p_t \\ q_t \end{pmatrix}. \tag{15}$$

The Kronecker delta function sets this linear system of equations equal to r t . In this way, an afﬁne
map—a linear transformation displaced from the origin by r t —is deﬁned. This system of equations
must be updated after every unitary propagation and measurement.
Since the Wigner functions WΨ ( x) of stabilizer states propagate under M as WΨ (M x),
it follows that
Φt → Φt Mt−1 . (16)

(The importance of vector r t and when it must be updated will become evident when we consider
random measurements.) Hence, after n operations M1 , M2 , . . ., Mn ,

$$M_t^{-1} = M_1^{-1} M_2^{-1} \cdots M_n^{-1}. \tag{17}$$


The matrices are ordered chronologically left-to-right instead of right-to-left.

Since $M$ is symplectic, $M_t^{-1} = -J M_t^T J$ where

$$J = \begin{pmatrix} 0 & -I_n \\ I_n & 0 \end{pmatrix}. \tag{18}$$

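Thus, the stability matrices $M$ for $\hat{F}_i$, $\hat{P}_i$ and $\hat{C}_{ij}$ given in Equations (12)–(14) differ from their
inverses only by sign changes in their off-diagonal elements:

$$\left(M^{-1}_{\hat{P}_i}\right)_{j,k} = \delta_{j,k} - 2\,\delta_{i,j}\,\delta_{n+i,k}, \tag{19}$$

$$\left(M^{-1}_{\hat{F}_i}\right)_{j,k} = \delta_{j,k} - \delta_{i,j}\delta_{i,k} - \delta_{n+i,j}\delta_{n+i,k} - \delta_{i,j}\delta_{n+i,k} + \delta_{n+i,j}\delta_{i,k}, \tag{20}$$

and

$$\left(M^{-1}_{\hat{C}_{ij}}\right)_{k,l} = \delta_{k,l} + \delta_{i,k}\delta_{j,l} - \delta_{n+j,k}\delta_{n+i,l}. \tag{21}$$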
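
As a sanity check on Equations (12)–(14) and (18)–(21), the sketch below builds the three stability matrices for small n and verifies numerically that $-J M^T J$ is their inverse modulo d. The helper names are hypothetical, the matrices are plain nested lists, and qudits are indexed from 0 here (the text indexes from 1).

```python
def identity(m):
    return [[int(r == c) for c in range(m)] for r in range(m)]

def matmul(A, B, d):
    """Matrix product reduced modulo d."""
    m = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(m)) % d for c in range(m)]
            for r in range(m)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def J_matrix(n):
    """Symplectic form of Eq. (18): upper-right block -I_n, lower-left block +I_n."""
    J = [[0] * (2 * n) for _ in range(2 * n)]
    for k in range(n):
        J[k][n + k] = -1
        J[n + k][k] = 1
    return J

def M_phase(n, i):
    """Eq. (12): identity plus 2 at (row i, column n+i)."""
    M = identity(2 * n)
    M[i][n + i] += 2
    return M

def M_hadamard(n, i):
    """Eq. (13)."""
    M = identity(2 * n)
    M[i][i] -= 1
    M[n + i][n + i] -= 1
    M[i][n + i] += 1
    M[n + i][i] -= 1
    return M

def M_cnot(n, i, j):
    """Eq. (14)."""
    M = identity(2 * n)
    M[i][j] -= 1
    M[n + j][n + i] += 1
    return M

def symplectic_inverse(M, d):
    """Inverse computed as -J M^T J (mod d), valid when M is symplectic."""
    J = J_matrix(len(M) // 2)
    JMJ = matmul(matmul(J, transpose(M), d), J, d)
    return [[(-v) % d for v in row] for row in JMJ]

d, n = 3, 2
for M in (M_phase(n, 0), M_hadamard(n, 1), M_cnot(n, 0, 1)):
    assert matmul(M, symplectic_inverse(M, d), d) == identity(2 * n)
```

Because each generator differs from the identity in only a fixed number of entries, the same holds for its inverse, which is what keeps the update of $\Phi_t$ cheap.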

We assume the quantum state is initialized in the computational basis state $\Psi_0 = \underbrace{|0\rangle \otimes \cdots \otimes |0\rangle}_{n}$
and so initially we should set $\Phi_0 = \begin{pmatrix} 0 & 0 \\ 0 & I_n \end{pmatrix}$ and $r_0 = 0$. The initial stabilizer state is $W_{\Psi_0} = \delta_{q_t, 0}$.
However, it will become clear when we discuss measurements that it is practically useful to instead set

$$\Phi_0 = \begin{pmatrix} I_n & 0 \\ 0 & I_n \end{pmatrix}, \tag{22}$$

thereby setting WΨ0 = δ( pt ,qt ),(0,0) —not a true Wigner function. This new matrix Φ0 is equivalent to
the last matrix if the first n rows in Φt x and r t are ignored—the same as ignoring p0 ( pt , qt ). In fact,
we have two Wigner functions here: one defined by the first n rows and another by the last n rows.
We proceed in this manner, ignoring the first n rows, until their usefulness becomes apparent to us.
For $n$ qudits, unitary propagation requires $O(n^2)$ dits of storage to track $\Phi_t$ and $r_t$. More precisely,
since $\Phi_t$ is a $2n \times 2n$ matrix and $r_t$ is a $2n$-vector, $2n(2n + 1)$ dits of storage are necessary.

3.2. Unitary Propagation

Φt contains the coefficients of the linear equations relating x0 to xt . Each row is one equation
relating q0 i or p0 i to xt . When manipulating rows of Φt we shall refer to the linear equations that these
rows define.
Examining Equations (19)–(21), we see that the inverse stability matrices of the generator gates
F̂i , P̂i and Ĉij are the sum of an identity matrix and a matrix with a finite number of non-zero
off-diagonal elements. The number of these off-diagonal elements is independent of the number
of qudits, n. Hence, multiplying Φt with a new stability matrix in Equation (16) and evaluating the
matrix multiplication is equivalent to performing a finite number of n-vector dot products and so
requires O(n) operations. Therefore, keeping track of propagation of stabilizer states by Clifford gates
can be simulated with O(n) operations.
Let us examine these unitary operations more closely. Defining $\oplus$ and $\ominus$ to be mod $d$ addition
and subtraction, respectively, we find:
Phase gate on qudit $i$ ($\hat{P}_i$). For all $j \in \{1, \ldots, 2n\}$, set $\Phi_{j,n+i} \to \Phi_{j,n+i} \ominus 2\Phi_{j,i}$.
Hadamard gate on qudit $i$ ($\hat{F}_i$). For all $j \in \{1, \ldots, 2n\}$, negate $\Phi_{j,i}$ mod $d$, and then swap $\ominus\Phi_{j,i}$ and
$\Phi_{j,n+i}$.


CNOT from control $i$ to target $j$ ($\hat{C}_{ij}$). For all $k \in \{1, \ldots, 2n\}$, set $\Phi_{k,j} \to \Phi_{k,j} \oplus \Phi_{k,i}$ and $\Phi_{k,n+i} \to
\Phi_{k,n+i} \ominus \Phi_{k,n+j}$.
This conﬁrms that unitary propagation in this scheme requires O(n) operations.
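
The three update rules above can be sketched as in-place column operations on $\Phi_t$. This is a minimal illustration under stated assumptions: $\Phi_t$ is stored as a list of 2n rows with entries already reduced mod d, qudits are indexed from 0 (the text indexes from 1), and the function names are hypothetical. The phase-gate rule subtracts twice column i from column n+i, as implied by Equations (16) and (19).

```python
def apply_phase(Phi, i, d):
    """Phase gate on qudit i: column n+i -> column n+i minus 2 * column i (mod d)."""
    n = len(Phi) // 2
    for row in Phi:
        row[n + i] = (row[n + i] - 2 * row[i]) % d

def apply_hadamard(Phi, i, d):
    """Hadamard on qudit i: negate column i, then swap it with column n+i."""
    n = len(Phi) // 2
    for row in Phi:
        row[i], row[n + i] = row[n + i], (-row[i]) % d

def apply_cnot(Phi, i, j, d):
    """CNOT with control i and target j."""
    n = len(Phi) // 2
    for row in Phi:
        row[j] = (row[j] + row[i]) % d
        row[n + i] = (row[n + i] - row[n + j]) % d

# Two qutrits, starting from Phi_0 = I_{2n} as in Eq. (22).
d = 3
Phi = [[int(r == c) for c in range(4)] for r in range(4)]
apply_hadamard(Phi, 0, d)
apply_cnot(Phi, 0, 1, d)
```

Each function touches a constant number of entries per row, so one gate costs O(n) operations, in line with the count given in the text.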

3.3. Measurement
The outcome of a measurement Ẑi on a stabilizer state can be either random or deterministic.
As described above, the bottom half of Φt defines q0 j for j ∈ {1, . . . , n}, each of which is a linear
combination of qt i and pt i . The entries in the (n + j)th row of Φt give the coefficient of pt i and qt i in
q0 j for j ∈ {1, . . . , n}. If the coefficient of pt i in any q0 j is non-zero then the measurement Ẑi will be
random. If all coefficients of pt i are zero for q0 j ∀ j, then the measurement of Ẑi will be deterministic.
This can be seen from the fact that if our stabilizer state $|\Psi\rangle$ is an eigenstate of $\hat{Z}_i$, then $\hat{Z}_i |\Psi\rangle = e^{i\phi} |\Psi\rangle$
for some $\phi \in \mathbb{R}$ and (discrete) Wigner functions do not change under a global phase. Thus, measuring
$\hat{Z}_i$ leaves the Wigner function of $|\Psi\rangle$ invariant if the measurement is deterministic. Since $\hat{Z}_i$ is a boost
operator that increments the momentum of a state by one, its effect on the linear system of equations
specified by the Wigner function is:

$$\Phi_t \begin{pmatrix} p_{t1} \\ \vdots \\ p_{ti} \\ \vdots \\ p_{tn} \\ q_t \end{pmatrix} = \begin{pmatrix} r_{tp} \\ r_{tq} \end{pmatrix} \;\xrightarrow{\;\hat{Z}_i\;}\; \Phi_t \begin{pmatrix} p_{t1} \\ \vdots \\ p_{ti} + 1 \\ \vdots \\ p_{tn} \\ q_t \end{pmatrix} = \begin{pmatrix} r_{tp} \\ r_{tq} \end{pmatrix}. \tag{23}$$

Thus, if the lower half of the ith column of Φt is zero, then Ẑi leaves the Wigner function invariant
(and so the measurement is deterministic). Verifying that these coefficients are all zero takes O(n)
operations for each Ẑi .
In other words, to see if a given measurement of $\hat{Z}_i$ is random or deterministic, a search must be
performed for non-zero $(\Phi_t)_{n+j,i}$ elements. If such a non-zero element exists, then the measurement
is random since it means that the final momentum of qudit $i$ affects the state of the stabilizer and so
its position must be undetermined (by Heisenberg’s uncertainty principle). If no such non-zero $(\Phi_t)_{n+j,i}$
element exists, then the measurement $\hat{Z}_i$ is deterministic. We now describe the algorithm in detail for
these two cases:

Case 1: Random Measurement

Let the $(n + j)$th row in the bottom half of $\Phi_t$ have a non-zero entry in the $i$th column, $(\Phi_t)_{n+j,i} \neq 0$.
Since the random measurement Ẑi will project qudit i onto a position state, we will replace the
(n + j)th row with q0 i = qi (the uniformly random outcome of this measurement). After this projection
onto a position state, none of the other qudits’ positions should depend on qudit i’s momentum, pt i .
To accomplish this, before we replace row (n + j), we solve its equation for pt i and substitute every
instance of pt i in the linear system of equations with this solution. As a result, every equation will no
longer depend on pt i and we can go ahead and replace the (n + j)th row with q0 i = qi .
There is one more thing to do, which will be important for deterministic measurements: replace
the jth row with the old (n + j)th row. This sets p0 i = q0 j ( pt , qt ), which becomes the only remaining
equation explicitly dependent on pt i . In other words, p0 i ∝ pt i , similar to the beginning when we set
p0 i = pt i by setting Φ = I2n . However, now we also preserve any dependence p0 i has on the other
qudits incurred during unitary propagation. In other words, we preserve pt i ’s dependence upon the
other qudits, but only in the Wigner function specified by the top n rows, which we ignore otherwise.
After replacing the equation specified by row (n + j) of Φt and r t with a randomly chosen
measurement outcome qi (i.e., q0 i = qi ), the identification of rows (n + i ) and (n + j) are exchanged,


so that the former now specifies q0 j ( pt , qt ) while the latter specifies q0 i ( pt , qt ). p0 i has also been
updated by replacing the jth row in the first half of Φt , with the (n + j)th row we just changed. Again,
this row now describes p0 i ( pt , qt ) while the ith row now specifies p0 j ( pt , qt ). Overall, this takes O(n2 )
operations since we are replacing O(n) rows with O(n) entries.
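
The random-measurement step can be sketched as follows. This is an illustration under several assumptions that go beyond the text: d is an odd prime (so that the pivot coefficient has an inverse mod d, computed with Python 3.8+ `pow(a, -1, d)`), qudits and rows are indexed from 0, the caller has already established that the outcome is random, and the relabeling of rows is left implicit because only the pairing between top row j and bottom row n+j matters. In addition, the saved row is rescaled by the inverse of the pivot coefficient so that Lemma 1 can later be applied unchanged; this normalization is an assumption of the sketch rather than a step stated in the text, and it is trivial when the coefficient is 1, as in the example of Section 6.

```python
import random

def measure_random(Phi, r, i, d, outcome=None):
    """Random Z measurement on qudit i; updates Phi and r in place, returns the outcome."""
    n = len(Phi) // 2
    # Bottom-half row whose coefficient of p_i is non-zero (row n+j in the text).
    piv = next(k for k in range(n, 2 * n) if Phi[k][i] % d != 0)
    inv = pow(Phi[piv][i], -1, d)  # needs prime d and Python >= 3.8
    # Solve the pivot equation for p_i and substitute it into every other equation.
    for k in range(2 * n):
        if k != piv and Phi[k][i] % d != 0:
            factor = (Phi[k][i] * inv) % d
            Phi[k] = [(a - factor * b) % d for a, b in zip(Phi[k], Phi[piv])]
            r[k] = (r[k] - factor * r[piv]) % d
    # Save the old pivot row (rescaled, see lead-in) in the paired top row j.
    Phi[piv - n] = [(inv * v) % d for v in Phi[piv]]
    r[piv - n] = (inv * r[piv]) % d
    # Replace the pivot row by the equation q_i = outcome.
    if outcome is None:
        outcome = random.randrange(d)
    Phi[piv] = [0] * (2 * n)
    Phi[piv][n + i] = 1
    r[piv] = outcome
    return outcome

# Bell state of two qutrits (entries mod 3), measuring qutrit 0 with outcome fixed to 1.
Phi = [[0, 0, 2, 0], [0, 1, 0, 0], [1, 1, 0, 0], [0, 0, 2, 1]]
r = [0, 0, 0, 0]
measure_random(Phi, r, 0, 3, outcome=1)
```

The `outcome` argument exists only to make the example reproducible; leaving it as `None` draws a uniformly random value. The loop touches O(n) rows of O(n) entries, matching the O(n²) cost stated above.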

Case 2: Deterministic Measurement

Since the measurement is deterministic, Φt and r t do not change. The n equations speciﬁed by
the bottom half of Φt xt = r t can be used to solve for qt i —the deterministic measurement outcome.
In general, this can also be done by inverting $\Phi_t$ and evaluating $x_t = \Phi_t^{-1} \cdot r_t$ for $q_i$. Aaronson and
Gottesman themselves noted that such a matrix inversion is possible, but practically takes O(n3 )
operations. (However, we are not certain if Aaronson and Gottesman were referring to the Φt matrix
corresponding to the 2n × 2n part of their tableau when they discuss matrix inversion in [2].)
Fortunately, there is another method that scales as O(n2 ) and requires use of the n equations
represented by the top n rows of Φt , which were included in our description by setting Φ0 = I2n .
The linear system of $2n$ equations represented by $\Phi_t x_t = r_t$ can be written as

$$\Phi_t x_t = r_t, \tag{24}$$

$$\begin{pmatrix} p_0(p_t, q_t) \\ q_0(p_t, q_t) \end{pmatrix} = \begin{pmatrix} r_{tp} \\ r_{tq} \end{pmatrix}, \tag{25}$$

where we are interested in linear combinations of the bottom half, $q_0(p_t, q_t)$, to solve for the
measurement outcome $q_{ti}$:

$$\sum_{j=1}^{n} c_{ij}\, q_{0j} = q_{ti}, \tag{26}$$

where cij ∈ Z/dZ.

Lemma 1. The coefficient in front of pt i in the row of Φt that specifies p0 j ( pt , qt ), Φt ji , is equal to the coefficient
cij in front of q0 j that makes up qt i in Equation (26). Equivalently,

cij = q0 j · qt i ( p0 , q0 ) = p0 j ( pt , qt ) · pt i = Φt ji . (27)

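Proof. Under evolution by the Clifford group operators,

$$\begin{pmatrix} p_t \\ q_t \end{pmatrix} = M_t \begin{pmatrix} p_0 \\ q_0 \end{pmatrix}. \tag{28}$$

Since $M_t$ is symplectic, $M_t^{-1} = -J M_t^T J$. This means that we can express the matrix inversion
as follows:

\begin{align}
\begin{pmatrix} p_0 \\ q_0 \end{pmatrix} &= M_t^{-1} \begin{pmatrix} p_t \\ q_t \end{pmatrix} \tag{29} \\
&= -J M_t^T J \begin{pmatrix} p_t \\ q_t \end{pmatrix} \tag{30} \\
&= -J \begin{pmatrix} (M_t)_{11} & (M_t)_{12} \\ (M_t)_{21} & (M_t)_{22} \end{pmatrix}^{T} J \begin{pmatrix} p_t \\ q_t \end{pmatrix} \tag{31} \\
&= \begin{pmatrix} (M_t)_{22} & (-M_t)_{12} \\ (-M_t)_{21} & (M_t)_{11} \end{pmatrix} \begin{pmatrix} p_t \\ q_t \end{pmatrix}. \tag{32}
\end{align}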


Therefore, $\left(M_t^{-1}\right)_{11\,i,j} = (M_t)_{22\,i,j}$, and so

$$c_{ij} = q_{0j} \cdot q_{ti}(p_0, q_0) = p_{0j}(p_t, q_t) \cdot p_{ti} = (\Phi_t)_{ji}. \tag{33}$$

This property can also be seen in the drawing of phase space shown in Figure 1. There, initial
perpendicular p0 j and q0 j manifolds are drawn along with harmonically evolved pt i and qt i manifolds,
which remain perpendicular to each other and make an angle α to the ﬁrst p0 j and q0 j manifolds,
respectively. The projection of $q_{ti}(p_0, q_0)$ onto $q_{0j}$ can be represented as the length $b$ of a right triangle’s
adjacent side to the angle $\alpha$, with an opposite side set to some length $a$. The projection of $p_{0j}(p_t, q_t)$
onto $p_{ti}$ is similarly represented by the length $b'$ of a right triangle’s adjacent side to the angle $\alpha$, with an
opposite side also set to length $a$. It follows that the third angle $\beta$ in both triangles must be the same,
and so by the law of sines

$$\frac{a}{\sin\alpha} = \frac{b}{\sin\beta} = \frac{b'}{\sin\beta}. \tag{34}$$

Therefore, $b = b'$ and so these two projections are equal to one another. In the discrete Weyl phase
space such manifolds must lie along grid phase points and obey the periodicity in x p and xq , but the
premise is the same.

Overall, the procedure outlined in Lemma 1 for deterministic measurements takes O(n2 )
operations since Equation (27) is a sum of O(n) vectors made up of O(n) components. Therefore, the
overall measurement protocol takes O(n2 ) operations. Note that this formulation of the algorithm
shows that it is the symplectic structure on phase space and the linear transformation under harmonic
evolution that allows the inversion (Equation (32)) to be performed efﬁciently.
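
A minimal sketch of the deterministic case follows, assuming 0-based indices, $\Phi_t$ and $r_t$ stored as nested lists with entries mod d, and top rows normalized so that Lemma 1 applies directly (as is the case for $\Phi_0 = I_{2n}$ followed by the gate updates). The function names are hypothetical. The outcome is the combination of Equation (26), with the weights $c_{ij}$ read off from column i of the top half of $\Phi_t$ and the values $q_{0j}$ read off from the bottom half of $r_t$.

```python
def is_deterministic(Phi, i, d):
    """True if no bottom-half row of Phi depends on the momentum of qudit i."""
    n = len(Phi) // 2
    return all(Phi[n + j][i] % d == 0 for j in range(n))

def measure_deterministic(Phi, r, i, d):
    """Outcome q_i = sum_j c_ij * q_0j with c_ij = Phi[j][i] (Lemma 1)."""
    n = len(Phi) // 2
    return sum(Phi[j][i] * r[n + j] for j in range(n)) % d

# State after the random measurement in the two-qutrit example (Eq. (50), entries mod 3).
Phi = [[1, 1, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 2, 1]]
r = [0, 0, 1, 0]
if is_deterministic(Phi, 1, 3):
    q2 = measure_deterministic(Phi, r, 1, 3)
```

Only the scalar outcome is needed, so the sketch sums n products instead of forming the combined row explicitly; checking determinism costs a further O(n) lookups.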

Figure 1. The initial perpendicular manifolds p0 j and q0 j and the harmonically evolved perpendicular
manifolds pt i and qt i . Description of the various lengths and angles are given in the text in the proof
of Lemma 1.

4. Aaronson–Gottesman Tableau Algorithm for Qubits (d = 2)

The Aaronson–Gottesman tableau algorithm was originally deﬁned for qubits (d = 2) [2].
Like the algorithm we presented in the previous section, it only requires overall O(n2 ) operations


for propagation and measurement for n qubits. The algorithm has been proven to be extendable
to d > 2 [12] and similar algorithms have been formulated in d > 2 [13]. Alternatives have also
been developed to the tableau formalism, though they prove to be equally efﬁcient in worst-case
scenarios [14]. However, we are not aware of any direct extension of the Aaronson–Gottesman tableau
algorithm to dimensions greater than two. In this and the next section, we will show that the Wigner
algorithm presented in Section 3 is equivalent to the Aaronson–Gottesman tableau algorithm extended
to odd d.

4.1. Stabilizer Representation

The Aaronson–Gottesman algorithm is deﬁned in the stabilizer formalism. It keeps track of the
evolution of a stabilizer state by updating the generators of the stabilizer group, elements of which are
deﬁned as follows:

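Definition 1. A set of operators that satisfies $S = \{\hat{g} \in P \text{ such that } \hat{g}\,|\psi\rangle = |\psi\rangle\}$ are called the stabilizers
of state $|\psi\rangle$, where $P$ is the set of Pauli operators, each of which has the form $e^{\frac{\pi i}{2}\alpha}\, \hat{P}_1 \otimes \cdots \otimes \hat{P}_n$ where
$\alpha \in \{0, 1, 2, 3\}$ for $n$ qubits with $\hat{P}_i \in \{\hat{I}_i, \hat{Z}_i, \hat{X}_i, \hat{Y}_i\}$.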

For the sake of completeness, we present here a summary of the qubit Aaronson–Gottesman
algorithm, in order to compare it to our odd d qudit algorithm. For more details, see [1,2].
Each $n$-qubit stabilizer state is uniquely determined by $2^n$ Pauli operators. There are only $n$
generators of this Abelian group of $2^n$ operators. Therefore, an $n$-qubit stabilizer state is defined by
the $n$ generators of its stabilizer group. Every element in this set of generators, $\{\hat{g}_1, \hat{g}_2, \ldots, \hat{g}_n\}$, is in the
Pauli group, and each generator has the form:

ĝi = ± P̂i1 . . . P̂in . (35)

Any unitary propagation by Clifford operators or measurement of the stabilizer state changes at least
some of the P̂ij elements of the n generators of the state’s stabilizer. This includes the ±1 phase in
Equation (35), which must also be kept track of in Aaronson–Gottesman’s algorithm.

4.2. Unitary Propagation

For each Clifford operation, Aaronson and Gottesman showed that only $O(n)$ operations are
necessary to update all generators [2]. Specifically, according to the update rules in Table 1, each
generator can be updated with a constant number of operations for every single Clifford gate, therefore
$O(n)$ in total. However, it is a little more complicated to update the generators after each measurement.
To do this efficiently, Aaronson and Gottesman introduced “destabilizers”:

Definition 2. Destabilizers $\{\hat{g}'_1, \ldots, \hat{g}'_n\}$ are the operators that generate the full Pauli group with the stabilizers
$\{\hat{g}_1, \ldots, \hat{g}_n\}$. They have the following properties:

(i) $\hat{g}'_1, \hat{g}'_2, \ldots, \hat{g}'_n$ commute.

(ii) Each destabilizer $\hat{g}'_h$ anti-commutes with the corresponding stabilizer $\hat{g}_h$, and commutes with all
other stabilizers.


Table 1. Transformation of stabilizer generators under Clifford operations.

Gate | Input | Output
Hadamard | X̂ | Ẑ
Hadamard | Ẑ | X̂
Phase | X̂ | Ŷ
Phase | Ẑ | Ẑ
CNOT | X̂ ⊗ Î | X̂ ⊗ X̂
CNOT | Î ⊗ X̂ | Î ⊗ X̂
CNOT | Ẑ ⊗ Î | Ẑ ⊗ Î
CNOT | Î ⊗ Ẑ | Ẑ ⊗ Ẑ

To incorporate the destabilizers, a tableau becomes useful to see how they play a role in updating
the stabilizer generators during measurement [2].
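Aaronson–Gottesman defined such a $2n \times (2n + 1)$ binary tableau matrix as:

$$\begin{pmatrix}
x_{11} & \cdots & x_{1n} & z_{11} & \cdots & z_{1n} & r_1 \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots \\
x_{n1} & \cdots & x_{nn} & z_{n1} & \cdots & z_{nn} & r_n \\
x_{(n+1)1} & \cdots & x_{(n+1)n} & z_{(n+1)1} & \cdots & z_{(n+1)n} & r_{n+1} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots \\
x_{(2n)1} & \cdots & x_{(2n)n} & z_{(2n)1} & \cdots & z_{(2n)n} & r_{2n}
\end{pmatrix}.$$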

This matrix contains $2n$ rows. The first $n$ rows denote the destabilizers $\hat{g}'_1$ to $\hat{g}'_n$ while rows $(n + 1)$ to
$2n$ represent the stabilizers $\hat{g}_1$ to $\hat{g}_n$. The $(2n + 1)$th bit in each row denotes the phase $(-1)^{r_i}$ for each
generator. We encode the $j$th Pauli operator in the $i$th row as shown in Table 2.

Table 2. Binary representation of the Pauli operators and the Pauli group phase used in their tableau representation.

x_ij | z_ij | P̂_j
0 | 0 | Î_j
0 | 1 | Ẑ_j
1 | 0 | X̂_j
1 | 1 | Ŷ_j

r_i | Phase
0 | +1
1 | −1

We can update the stabilizers and destabilizers as follows:

Hadamard gate on qubit i For all j ∈ {1, 2, ..., 2n}, r j → r j ⊕ x ji z ji , then swap x ji with z ji .
Phase gate on qubit i For all j ∈ {1, 2, ..., 2n}, r j → r j ⊕ x ji z ji , z ji → z ji ⊕ x ji .
CNOT gate on control qubit i and target qubit j For all k ∈ {1, 2, ..., 2n}, rk → rk ⊕ xki zkj ( xkj ⊕ zki ⊕ 1),
xkj → xkj ⊕ xki , zki → zki ⊕ zkj .
These actions correspond to those given in Table 1.
Notice the striking similarity of these tableau transformation rules under unitary propagation to
the Φ transformation rules in Section 3.2. The most notable difference is that the Aaronson–Gottesman
algorithm involves updates of the vector r. We will discuss this and its connection to the dimension
d = 2 of the system in Section 5. It is clear that these transformations also take O(n) operations each.
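
The qubit update rules above can be sketched in a few lines. The layout is an assumption of this illustration: each tableau row is a list of 2n + 1 bits ordered as (x_1, …, x_n, z_1, …, z_n, r), qubits are indexed from 0, and the function names are hypothetical.

```python
def ag_hadamard(tab, i):
    """Hadamard on qubit i: r ^= x*z, then swap x and z."""
    n = len(tab) // 2
    for row in tab:
        row[2 * n] ^= row[i] & row[n + i]
        row[i], row[n + i] = row[n + i], row[i]

def ag_phase(tab, i):
    """Phase gate on qubit i: r ^= x*z, then z ^= x."""
    n = len(tab) // 2
    for row in tab:
        row[2 * n] ^= row[i] & row[n + i]
        row[n + i] ^= row[i]

def ag_cnot(tab, i, j):
    """CNOT with control i and target j."""
    n = len(tab) // 2
    for row in tab:
        row[2 * n] ^= row[i] & row[n + j] & (row[j] ^ row[n + i] ^ 1)
        row[j] ^= row[i]
        row[n + i] ^= row[n + j]

# Two qubits in |00>: destabilizers X1, X2 (rows 0-1), stabilizers Z1, Z2 (rows 2-3).
tab = [[1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 1, 0]]
ag_hadamard(tab, 0)
ag_cnot(tab, 0, 1)
# The stabilizer rows now encode X⊗X and Z⊗Z, the generators of the Bell state.
```

Apart from the extra sign bit, the column operations mirror the odd-d updates of $\Phi_t$ with addition mod 2 in place of addition mod d.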


4.3. Measurement
To describe the measurement part of the algorithm, we need to first define a rowsum operation in
the tableau that corresponds to multiplying two Pauli operators together. As defined in [2]:
Rowsum: To sum rows $i$ and $j$, first update the bits that represent operators by $x_{ik} \oplus x_{jk}$ and $z_{ik} \oplus z_{jk}$
for $k = 1, \ldots, n$. To calculate the resultant phase, Aaronson and Gottesman first defined the
following function:

$$f(x_{ik}, x_{jk}, z_{ik}, z_{jk}) = \begin{cases}
0 & \text{if } x_{ik} = z_{ik} = 0, \\
z_{jk} - x_{jk} & \text{if } x_{ik} = z_{ik} = 1, \\
z_{jk}(2x_{jk} - 1) & \text{if } x_{ik} = 1,\ z_{ik} = 0, \\
x_{jk}(1 - 2z_{jk}) & \text{if } x_{ik} = 0,\ z_{ik} = 1.
\end{cases} \tag{36}$$

Since each stabilizer generator is the tensor product of $n$ single qubit Pauli operators (see Equation (35)),
they must be multiplied together to obtain the phase:

$$\begin{cases}
0 & \text{if } 2r_i + 2r_j + \sum_{k=1}^{n} f(x_{ik}, x_{jk}, z_{ik}, z_{jk}) \equiv 0 \pmod 4, \\
1 & \text{if } 2r_i + 2r_j + \sum_{k=1}^{n} f(x_{ik}, x_{jk}, z_{ik}, z_{jk}) \equiv 2 \pmod 4.
\end{cases} \tag{37}$$
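
A minimal sketch of the rowsum operation follows, with the sign bits weighted by 2 as in Aaronson and Gottesman's original definition [2]. The row layout (x bits, then z bits, then the sign bit r), the argument order of `f`, and the choice to overwrite row h with the product are assumptions of this illustration.

```python
def f(x1, z1, x2, z2):
    """Exponent of i picked up when the Pauli (x1, z1) multiplies (x2, z2), as in Eq. (36)."""
    if x1 == 0 and z1 == 0:
        return 0
    if x1 == 1 and z1 == 1:
        return z2 - x2
    if x1 == 1 and z1 == 0:
        return z2 * (2 * x2 - 1)
    return x2 * (1 - 2 * z2)

def rowsum(tab, h, i):
    """Replace row h of the tableau by the product of rows i and h."""
    n = (len(tab[0]) - 1) // 2
    total = 2 * tab[h][2 * n] + 2 * tab[i][2 * n]
    for k in range(n):
        total += f(tab[i][k], tab[i][n + k], tab[h][k], tab[h][n + k])
    total %= 4
    if total not in (0, 2):
        raise ValueError("rows anti-commute; the product is not a valid stabilizer row")
    tab[h][2 * n] = total // 2
    for k in range(2 * n):
        tab[h][k] ^= tab[i][k]

# (Z⊗Z)(X⊗X) = -(Y⊗Y): row 0 holds X⊗X, row 1 holds Z⊗Z.
rows = [[1, 1, 0, 0, 0],
        [0, 0, 1, 1, 0]]
rowsum(rows, 0, 1)
```

Each call visits every qubit once, so a single rowsum costs O(n) operations.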

Having deﬁned the rowsum function, let us now consider a measurement of Ẑi on qubit i.
For d = 2, Pauli group operators can only commute or anti-commute with each other. If Ẑi
anti-commutes with one or more of the generators, then the measurement is random. If Ẑi commutes
with all of the generators, then the measurement is deterministic. We consider these two cases:

Case 1: Random Measurement

Ẑi anti-commutes with one or more of the generators. If there is more than one, we can always pick
a single anti-commuting generator, ĝ j , and update the rest by replacing them with their product with ĝ j
(i.e., taking the rowsum of their corresponding rows) such that they commute with Ẑi . These updates
take O(n2 ) operations. Finally, we only need to replace ĝ j by Ẑi .
In other words, with respect to the tableau, there should exist at least one $j \in \{n + 1, n + 2, \ldots, 2n\}$
such that $x_{ji} = 1$. Replacing all rows where $x_{ki} = 1$ for $k \neq j$ with the sum of the $j$th and $k$th row
(using the rowsum function) sets all $x_{ki} = 0$ for $k \neq j$.
Finally, we replace the ( j − n)th row with the jth row and update the jth row by setting z ji = 1
and all other x jk s and z jk s to 0 for all k. We output r j = 0 or r j = 1 with equal probability for the
measurement result. This procedure takes O(n2 ) operations because each rowsum operation takes
O(n) operations and up to n − 1 rowsums may be necessary.

Case 2: Deterministic Measurement

Ẑi commutes with all generators. In this case, there is no j ∈ {n + 1, n + 2, ..., 2n} such that x ji = 1
and we don’t need to update any of the generators. However, we do need to do some work to retrieve
the measurement outcome.
Measurement $\hat{Z}_i$ commutes with all of the stabilizers; therefore, either $+\hat{Z}_i$ or $-\hat{Z}_i$ is a stabilizer of
the state. Therefore, it must be generated by the generators. The sign ±1 is the measurement outcome
we are looking for. This means that

$$\prod_{j=1}^{n} \hat{g}_j^{\,c_j} = \pm \hat{Z}_i, \tag{38}$$

where $c_j = 1$ or $0$.


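For those destabilizers $\hat{g}'_k$ that satisfy

$$\{\hat{g}'_k, \pm \hat{Z}_i\} = 0, \tag{39}$$

$c_k = 1$. Otherwise, $c_k = 0$. This can be seen from

$$\{\hat{g}'_k, \pm \hat{Z}_i\} = \Big\{\hat{g}'_k, \prod_{j=1}^{n} \hat{g}_j^{\,c_j}\Big\} = \prod_{\substack{j=1 \\ j \neq k}}^{n} \hat{g}_j^{\,c_j}\, \{\hat{g}'_k, \hat{g}_k^{\,c_k}\} = 0, \tag{40}$$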
where we used part (ii) of Deﬁnition 2 of the destabilizers and Equation (39). The last equality requires
ck = 1.
Therefore, to ﬁnd the deterministic measurement outcome, the stabilizers whose corresponding
destabilizer anti-commutes with the measurement operation Ẑi must be multiplied together. Every
row (n + j) in the bottom half of the tableau, such that x ji = 1 (for j ∈ {1, . . . , n}), can be added up
together and stored in a temporary register. The resultant phase ±1 of this sum is the measurement
result we are looking for.
Checking if each destabilizer commutes or anti-commutes with Ẑi takes a constant number of
operations. One multiplication takes O(n) operations, and there are O(n) multiplications needed.
Therefore, a measurement takes O(n2 ) operations overall.
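
Both measurement cases can be combined into one short routine. The sketch below is an illustration under assumptions: rows are lists of 2n + 1 bits ordered as (x bits, z bits, r), destabilizers occupy rows 0 to n−1 and stabilizers rows n to 2n−1, qubits are indexed from 0, and the helper names and the returned pair (outcome, was_random) are hypothetical. Sign bits of destabilizer rows are not meaningful and are simply truncated.

```python
import random

def g(x1, z1, x2, z2):
    """Exponent of i when the Pauli (x1, z1) multiplies (x2, z2), as in Eq. (36)."""
    if x1 == 0 and z1 == 0:
        return 0
    if x1 == 1 and z1 == 1:
        return z2 - x2
    if x1 == 1 and z1 == 0:
        return z2 * (2 * x2 - 1)
    return x2 * (1 - 2 * z2)

def rowsum(target, source, n):
    """Return the row representing the product of `source` and `target`."""
    total = 2 * target[2 * n] + 2 * source[2 * n]
    total += sum(g(source[k], source[n + k], target[k], target[n + k]) for k in range(n))
    bits = [a ^ b for a, b in zip(target[:2 * n], source[:2 * n])]
    return bits + [(total % 4) // 2]

def measure_z(tab, i, outcome=None):
    """Measure Z on qubit i; returns (outcome, was_random) and updates tab in place."""
    n = len(tab) // 2
    # A stabilizer row with x bit set on qubit i anti-commutes with Z_i.
    pivot = next((k for k in range(n, 2 * n) if tab[k][i]), None)
    if pivot is not None:
        # Case 1 (random): make every other row commute with Z_i.
        for k in range(2 * n):
            if k != pivot and tab[k][i]:
                tab[k] = rowsum(tab[k], tab[pivot], n)
        tab[pivot - n] = list(tab[pivot])   # old stabilizer becomes the destabilizer
        if outcome is None:
            outcome = random.randrange(2)
        tab[pivot] = [0] * (2 * n + 1)      # new stabilizer is +Z_i or -Z_i
        tab[pivot][n + i] = 1
        tab[pivot][2 * n] = outcome
        return outcome, True
    # Case 2 (deterministic): multiply the stabilizers whose destabilizer
    # anti-commutes with Z_i into a scratch row and read off its sign.
    scratch = [0] * (2 * n + 1)
    for j in range(n):
        if tab[j][i]:
            scratch = rowsum(scratch, tab[n + j], n)
    return scratch[2 * n], False

# Bell state: destabilizers Z1, X2; stabilizers X⊗X, Z⊗Z.
tab = [[0, 0, 1, 0, 0], [0, 1, 0, 0, 0], [1, 1, 0, 0, 0], [0, 0, 1, 1, 0]]
first = measure_z(tab, 0, outcome=1)   # random case, outcome fixed for the example
second = measure_z(tab, 1)             # now deterministic and correlated
```

Fixing `outcome` is only for reproducibility; with the default `None` the random case draws 0 or 1 with equal probability.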

5. Discussion
As we made clear throughout Section 4, the scaling of the number of required operations with
respect to number of qudits n is exactly the same in the (d = 2) Aaronson–Gottesman algorithm as
in the (odd d) Wigner algorithm presented in Section 3. The two algorithms also require the same
number of dits of temporary storage for performing the deterministic measurement. Moreover, there
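is a correspondence between the tableau employed by Aaronson–Gottesman and the matrix $\Phi_t$ and
vector $r_t$ we use. In particular, the tableau is equal to $\begin{pmatrix} \Phi_t & r_t \end{pmatrix}$:

$$\Phi_t = \begin{pmatrix} \dfrac{\partial p_0}{\partial p_t} & \dfrac{\partial p_0}{\partial q_t} \\[2ex] \dfrac{\partial q_0}{\partial p_t} & \dfrac{\partial q_0}{\partial q_t} \end{pmatrix} \equiv \begin{pmatrix}
x_{11} & \cdots & x_{1n} & z_{11} & \cdots & z_{1n} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
x_{n1} & \cdots & x_{nn} & z_{n1} & \cdots & z_{nn} \\
x_{(n+1)1} & \cdots & x_{(n+1)n} & z_{(n+1)1} & \cdots & z_{(n+1)n} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
x_{(2n)1} & \cdots & x_{(2n)n} & z_{(2n)1} & \cdots & z_{(2n)n}
\end{pmatrix} \tag{41}$$

and

$$r_t = \begin{pmatrix} r_p \\ r_q \end{pmatrix} \equiv \begin{pmatrix} r_1 \\ \vdots \\ r_n \\ r_{n+1} \\ \vdots \\ r_{2n} \end{pmatrix}. \tag{42}$$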
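This can be seen through the following equation:

$$\exp\!\left( \frac{2\pi i}{d} \sum_{j=1}^{2n} (\Phi_t)_{n+i,j}\, \hat{x}_j \right) |\Psi_t\rangle = \prod_{j=1}^{2n} \exp\!\left( \frac{2\pi i}{d} (\Phi_t)_{n+i,j}\, \hat{x}_j \right) |\Psi_t\rangle = \exp\!\left( \frac{2\pi i}{d} r_{ti} \right) |\Psi_t\rangle, \tag{43}$$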


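where $\hat{x} \equiv (\hat{p}, \hat{q})$. Multiplying the right-hand side of the first equation and the second equation by
$\exp\!\left(-\frac{2\pi i}{d} r_{ti}\right)$, it follows that

$$\exp\!\left( -\frac{2\pi i}{d} r_{ti} \right) \prod_{j=1}^{2n} \exp\!\left( \frac{2\pi i}{d} (\Phi_t)_{n+i,j}\, \hat{x}_j \right) |\Psi_t\rangle = \hat{g}_i |\Psi_t\rangle = |\Psi_t\rangle. \tag{44}$$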

In other words, $r_{ti}$ specifies the phase $\exp\!\left(-\frac{2\pi i}{d} r_{ti}\right)$ of the $i$th stabilizer, which is itself specified by
$(\Phi_t)_{n+i,j}$ for $j \in \{1, \ldots, 2n\}$. These are the same roles for $r$ and the tableau in the Aaronson–Gottesman
tableau algorithm [2].
Indeed, both algorithms check the bottom half of their matrices for non-zero elements $\Phi_{n+j,i}$ to
determine if a measurement on the $i$th qudit will be random or not. They also use a very similar
protocol to determine the outcome of deterministic measurements. The Wigner-based algorithm
motivates these manipulations in terms of the symplectic structure of Weyl phase space and the
relationship between the two Wigner functions specified by the top and bottom of Φ, providing a
strong physical intuition for their effects. Aaronson and Gottesman motivate these manipulations
using the anti-commutation relations between the stabilizer and destabilizer generators. In addition,
the latter half of both the Wigner function’s r t and Aaronson–Gottesman’s r are used to determine
measurement outcomes. The only fundamental algorithmic difference between the approaches is that
the Wigner-based algorithm does not require updates of r t during unitary propagation. The reason
for this lies in the fact that Aaronson–Gottesman’s algorithm deals with systems with d = 2 while the
Wigner-based algorithm is restricted to odd d.
In particular, for the one-qubit Clifford group gate operator $\hat{A} \in \{\hat{P}_i, \hat{F}_i\}$ $\forall i \in \{1, \ldots, n\}$, the
Aaronson–Gottesman algorithm specifies that for a $q$- or $p$-state, its Wigner function evolves by:

$$W_\Psi(M_{\hat{A}}\, x). \tag{45}$$

However, for $|r\rangle = \frac{1}{\sqrt{2}}(|0\rangle \pm i\,|1\rangle)$, a Y-state which is diagonal in the $pq$ plane, its Wigner function
must first be translated:

$$W_\Psi\!\left(M_{\hat{A}}\, x + \beta\right), \tag{46}$$

where the translation $\beta$ can be $(1, 0)$ or $(0, 1)$ equivalently. There is a similar state-dependence for the
two-qubit CNOT gate $\hat{C}_{ij}$.
This demonstrates that the Aaronson–Gottesman algorithm is state-dependent on the qubit
stabilizer state it is acting on. On the other hand, the Wigner function algorithm on odd d qudit stabilizer
states is state-independent. This likely is a consequence of the fact that the Weyl–Heisenberg group, which
is made up of the boost and shift operators defined in Equations (4) and (5) that underlie the discrete
Wigner formulation, is a subgroup of U(d) instead of SU(d) for d = 2 [15]. Furthermore, qubits
exhibit state-independent contextuality while odd d qudits do not [16]. Recent progress on this subject
relating non-contextuality to classical simulatability for qubits can be found here [17,18].

6. Example of Stabilizer Evolution

As a demonstration of what stabilizer state propagation looks like in the Wigner formalism, we
proceed to go through an example of Bell state preparation and measurement starting from the state
$|0\rangle \otimes |0\rangle$. To illustrate this process we decompose the two qutrit Wigner function of this state into nine
3 × 3 grids, as shown in Figure 2. The prepared Wigner function is denoted in Figure 3 with the color
black, and the Wigner function represented by setting $\Phi_0 = \begin{pmatrix} I_n & 0 \\ 0 & 0 \end{pmatrix}$ (i.e., considering the top $n$
rows of $\Phi$ to be a separate Wigner function, as discussed at the end of Section 3.1) is denoted with the
color gray.


Figure 2. A decomposition of the two qutrit Wigner function into nine 3 × 3 grids, where each 3 × 3
grid denotes the value of the Wigner function at all pt 1 and qt 1 for a ﬁxed value of pt 2 and qt 2 denoted
by the external axes. This organization is used in Figure 3 below.

Figure 3. The Wigner function of two qutrits initially prepared in (a) the state $|0\rangle \otimes |0\rangle$. (1) This is
evolved under $\hat{F}_1$ to produce (b) $\frac{1}{\sqrt{3}}(|0\rangle + |1\rangle + |2\rangle) \otimes |0\rangle$. (2) Subsequently, this state is evolved under
$\hat{C}_{12}$ producing (c) the Bell state $\frac{1}{\sqrt{3}}(|00\rangle + |11\rangle + |22\rangle)$. (3) Qutrit 1 is then measured producing the
random outcome 1, which collapses qutrit 2 into the same state, so that (d) $|1\rangle \otimes |1\rangle$ results. The black
color indicates the Wigner function specified by the lowest $n$ rows of $\delta_{\Phi_t x,\, r_t}$, and the gray color
indicates the Wigner function specified by the highest $n$ rows ($q_0(p_t, q_t)$ and $p_0(p_t, q_t)$, respectively).
The evolution and algorithmic implementation are explained in the text.

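We begin with

$$W_\Psi(x) = \delta_{\Phi_t x,\, r_t}, \qquad \Phi_t x = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_{t1} \\ p_{t2} \\ q_{t1} \\ q_{t2} \end{pmatrix} = \begin{pmatrix} p_{t1} \\ p_{t2} \\ q_{t1} \\ q_{t2} \end{pmatrix}, \qquad r_t = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \tag{47}$$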


denoting an initially prepared state of $|0\rangle \otimes |0\rangle$. This is clear in Figure 3a by the black band that lies
along all Weyl phase space points with $q_{t1} = 0$ and $q_{t2} = 0$. On the other hand, the gray manifold is
perpendicular to the black one, and lies along Weyl phase space points with $p_{t1} = 0$ and $p_{t2} = 0$.
Acting on this state with $\hat{F}_1$ produces $\frac{1}{\sqrt{3}}\left( e^{\frac{2\pi i}{3} 0 \times 0} |0\rangle + e^{\frac{2\pi i}{3} 1 \times 0} |1\rangle + e^{\frac{2\pi i}{3} 2 \times 0} |2\rangle \right) \otimes |0\rangle$. Applying
the algorithm specified at the end of Section 3.2, we find:

$$W_\Psi(x) = \delta_{\Phi_t x,\, r_t}, \qquad \Phi_t x = \begin{pmatrix} 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_{t1} \\ p_{t2} \\ q_{t1} \\ q_{t2} \end{pmatrix} = \begin{pmatrix} -q_{t1} \\ p_{t2} \\ p_{t1} \\ q_{t2} \end{pmatrix}, \qquad r_t = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}. \tag{48}$$

Thus, the momentum of qutrit 1 is now determined and is 0 while the second qutrit is unchanged.
This can be seen in Figure 3b, where the qt 2 values of the non-zero Weyl phase space points are the
same, while the state has rotated by −π/2 in ( pt 1 , qt 1 )-space. A similar transformation has occurred
for the perpendicular gray manifold.
Acting next with $\hat{C}_{12}$ produces the Bell state $\frac{1}{\sqrt{3}}(|00\rangle + |11\rangle + |22\rangle)$, which is represented by the
following Wigner function:

$$W_\Psi(x) = \delta_{\Phi_t x,\, r_t}, \qquad \Phi_t x = \begin{pmatrix} 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & -1 & 1 \end{pmatrix} \begin{pmatrix} p_{t1} \\ p_{t2} \\ q_{t1} \\ q_{t2} \end{pmatrix} = \begin{pmatrix} -q_{t1} \\ p_{t2} \\ p_{t1} + p_{t2} \\ -q_{t1} + q_{t2} \end{pmatrix}, \qquad r_t = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}. \tag{49}$$

The entanglement between the two qutrits is evident in both of their dependence on each other’s
momenta and positions, pt 1 = − pt 2 and qt 1 = qt 2 , specified by the last two rows. Figure 3c
shows that the state is still representable as lines in Weyl phase space, except they now traverse
through the different planes of (qt 1 , pt 1 ) associated with each value of (qt 2 , pt 2 ). However, if you
consider the left column in Figure 3c corresponding to qt 2 = 0, you can see that the only black Weyl
phase points are at qt 1 = 0. Similarly, the middle column corresponding to qt 2 = 1 shows that
qt 1 = 1, and the right column corresponding to qt 2 = 2 shows that qt 1 = 2 too, confirming that
$|\Phi\rangle = \frac{1}{\sqrt{3}}(|00\rangle + |11\rangle + |22\rangle)$. Thus, the entanglement of the two qutrits’ positions is clearly evident
in this Figure of the Wigner function.
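
The correlations read off from Figure 3c can also be checked by brute force. The sketch below, with a hypothetical helper `support`, enumerates all two-qutrit phase-space points x = (p1, p2, q1, q2) and keeps those satisfying the bottom two rows of Equation (49) modulo 3 (with −1 written as 2).

```python
from itertools import product

def support(Phi, r, d, rows):
    """Phase-space points x satisfying the selected rows of Phi x = r (mod d)."""
    points = []
    for x in product(range(d), repeat=len(Phi[0])):
        if all(sum(a * b for a, b in zip(Phi[k], x)) % d == r[k] % d for k in rows):
            points.append(x)
    return points

# Bell state of Eq. (49); rows 2 and 3 are the bottom half (the "black" Wigner function).
Phi_bell = [[0, 0, 2, 0], [0, 1, 0, 0], [1, 1, 0, 0], [0, 0, 2, 1]]
r_bell = [0, 0, 0, 0]
bell_points = support(Phi_bell, r_bell, 3, rows=(2, 3))
```

Every point returned has q1 = q2 and p1 + p2 ≡ 0 (mod 3), and there are d² = 9 of them. The enumeration is exponential in n and is meant only as a check on small examples, not as part of the algorithm.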
We then proceed to measure qutrit 1. Since the lower two equations involve pt 1 , we know that
this is a random measurement. Let us pick the outcome to be 1 and set the third row as such, replacing
the first row with the old third row. This collapses qutrit 2 into the same state:
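$$W_\Psi(x) = \delta_{\Phi_t x,\, r_t}, \qquad \Phi_t x = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & 1 \end{pmatrix} \begin{pmatrix} p_{t1} \\ p_{t2} \\ q_{t1} \\ q_{t2} \end{pmatrix} = \begin{pmatrix} p_{t1} + p_{t2} \\ p_{t2} \\ q_{t1} \\ -q_{t1} + q_{t2} \end{pmatrix}, \qquad r_t = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}. \tag{50}$$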

The lower two rows show that now $q_{t1} = 1$, as we chose, and $q_{t2} = q_{t1} = 1$. The collapse of qutrit 2
into $|1\rangle$ can also be seen in Figure 3c by the fact that $q_{t1} = 1$ only in the 3 × 3 grids that correspond
to $q_{t2} = 1$ too.
Finally, the fact that a measurement of $q_{t2}$ would be deterministic at this point can be seen from the absence of $p_{t2}$ in the last two rows of $\Phi_t$. Furthermore, it is clear, since the first row has a coefficient of 1 in front of $p_{t1}$, that the corresponding third row must be added with weight 1 to the fourth row to obtain this deterministic measurement outcome of $q_{t2} = 1$. This can also be seen in Figure 3 by finding the projection of $p_{01}$ onto $p_{t2}$, which are shown by the gray manifolds in panels (a) and (d), respectively. They are collinear and so the projection is equal to 1. (Perpendicular manifolds correspond to a projection of 0, and those that lie π/4 diagonally with respect to each other have a projection equal to 2 in this discrete geometry.)

7. Conclusions
In summary, we introduced an algorithm that efficiently simulates stabilizer state evolution
under Clifford gates and measurements in the Ẑ Pauli basis for odd d qudits. We accomplished
this by relying on the phase-space perspective of stabilizer states as discrete Gaussians and
Clifford operators as having underlying harmonic Hamiltonians. We showed the equivalence of
our algorithm, through Equations (43) and (44), to the well-known Aaronson–Gottesman tableau
algorithm [2] for qubits, revealing that Aaronson–Gottesman’s tableau corresponds to a discrete
Wigner function. As a consequence, we revealed the physically intuitive phase space perspective of
Aaronson–Gottesman’s algorithm, as well as its extension to higher odd d.
This work illustrates that no efficiency advantage is gained by using the Heisenberg representation
for stabilizer propagation. Equation (44) indicates that the Heisenberg representation is equivalent to
the Schrödinger representation in this context; evolving the operators is just as efficient as evolving the
states, as perhaps expected.
Lastly, the correspondence between the Wigner-based algorithm and the Aaronson–Gottesman
tableau algorithm may point toward a resolution of the long-standing issue of describing
the Wigner–Weyl–Moyal and center-chord formalism for d = 2 systems. We have shown that
the Aaronson–Gottesman algorithm is essentially a d = 2 treatment of the Wigner approach.
The salient difference appears to be the state-dependence of this evolution, and likely is related
to the state-independent contextuality that qubits exhibit, which odd d qudits do not. Exploring the
details of this state-dependence is a promising subject of future study.

Acknowledgments: This work was supported by the Air Force Office of Scientific Research (AFOSR) award
No. FA9550-12-1-0046.
Author Contributions: All authors contributed to the work presented here.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Gottesman, D. The Heisenberg Representation of Quantum Computers. arXiv 1998, arXiv:quant-ph/9807006.
2. Aaronson, S.; Gottesman, D. Improved simulation of stabilizer circuits. Phys. Rev. A 2004, 70, 052328.
3. Gottesman, D. Fault-tolerant quantum computation with higher-dimensional systems. In Quantum
Computing and Quantum Communications; Springer: Heidelberg, Germany, 1999; pp. 302–313.
4. Mari, A.; Eisert, J. Positive Wigner functions render classical simulation of quantum computation efficient.
Phys. Rev. Lett. 2012, 109, 230503.
5. Howard, M.; Wallman, J.; Veitch, V.; Emerson, J. Contextuality supplies the ‘magic’ for quantum computation.
Nature 2014, 510, 351–355.
6. Wootters, W.K. A Wigner-function formulation of finite-state quantum mechanics. Ann. Phys. 1987, 176, 1–21.
7. Gross, D. Hudson’s theorem for finite-dimensional quantum systems. J. Math. Phys. 2006, 47, 122107.
8. Veitch, V.; Ferrie, C.; Gross, D.; Emerson, J. Negative quasi-probability as a resource for quantum computation.
New J. Phys. 2012, 14, 113011.
9. Veitch, V.; Wiebe, N.; Ferrie, C.; Emerson, J. Efficient simulation scheme for a class of quantum optics
experiments with non-negative Wigner representation. New J. Phys. 2013, 15, 013037.
10. Kocia, L.; Love, P. Semiclassical Formulation of Gottesman–Knill and Universal Quantum Computation.
arXiv 2016, arXiv:1612.05649.
11. Koh, D.E.; Penney, M.D.; Spekkens, R.W. Computing quopit Clifford circuit amplitudes by the
sum-over-paths technique. arXiv 2017, arXiv:1702.03316.
12. De Beaudrap, N. A linearized stabilizer formalism for systems of finite dimension. arXiv 2011, arXiv:1102.3354.

13. Yoder, T.J. A Generalization of the Stabilizer Formalism for Simulating Arbitrary Quantum Circuits.
2012. Available online: https://pdfs.semanticscholar.org/b200/efe1709d07ffc1b5b7bd90e61c09e2729bdf.pdf
(accessed on 6 July 2017).
14. Anders, S.; Briegel, H.J. Fast simulation of stabilizer circuits using a graph-state representation. Phys. Rev. A
2006, 73, 022334.
15. Bengtsson, I.; Życzkowski, K. On discrete structures in finite Hilbert spaces. arXiv 2017, arXiv:1701.07902.
16. Mermin, N.D. Hidden variables and the two theorems of John Bell. Rev. Mod. Phys. 1993, 65, 803.
17. Raussendorf, R.; Browne, D.E.; Delfosse, N.; Okay, C.; Bermejo-Vega, J. Contextuality as a resource for qubit
quantum computation. arXiv 2015, arXiv:1511.08506.
18. Kocia, L.; Love, P. Discrete Wigner Formalism for Qubits and the Non-Contextuality of Clifford Operations
on Qubit Stabilizer States. arXiv 2017, arXiv:1705.08869.

entropy
Article
Concepts and Criteria for Blind Quantum Source
Separation and Blind Quantum Process Tomography
Alain Deville 1, * and Yannick Deville 2
1 Institut Matériaux Microélectronique et Nanosciences de Provence (IM2NP), Aix-Marseille Université,
13397 Marseille, France
2 Institut de Recherche en Astrophysique et Planétologie (IRAP), Université de Toulouse, 31400 Toulouse,
France; [email protected]
* Correspondence: [email protected]; Tel.: +33-5-61-33-28-24

Received: 6 April 2017; Accepted: 23 June 2017; Published: 6 July 2017

Abstract: Blind Source Separation (BSS) is an active domain of Classical Information Processing,
with well-identified methods and applications. The development of Quantum Information Processing
has made possible the appearance of Blind Quantum Source Separation (BQSS), with a recent
extension towards Blind Quantum Process Tomography (BQPT). This article investigates the use of
several fundamental quantum concepts in the BQSS context and establishes properties already used
without justification in that context. It mainly considers a pair of electron spins initially separately
prepared in a pure state and then submitted to an undesired exchange coupling between these spins.
Some consequences of the existence of the entanglement phenomenon, and of the probabilistic aspect
of quantum measurements, upon BQSS solutions, are discussed. An unentanglement criterion is
established for the state of an arbitrary qubit pair, expressed first with probability amplitudes and
secondly with probabilities. The interest of using the concept of a random quantum state in the
BQSS context is presented. It is stressed that the concept of statistical independence of the sources,
widely used in classical BSS, should be used with care in BQSS, and possibly replaced by some
disentanglement principle. It is shown that the coefficients of the development of any qubit pair pure
state over the states of an orthonormal basis can be expressed with the probabilities of results in the
measurements of well-chosen spin components.

Keywords: blind source separation (BSS); qubit pair; exchange coupling; entangled pure state;
unentanglement criterion; probabilities in quantum measurements; independence of random
quantum sources

1. Introduction
The book entitled “Do we really understand quantum mechanics?” [1] was published five years
ago. Some forty years earlier, its author, Laloë, had co-authored a treatise on quantum mechanics,
together with Cohen-Tannoudji, later a Nobel laureate, and Diu [2]. While this recent book illustrates
the present strong interest in the foundations of Quantum Theory (QT), already in 1929, Dirac could
claim: “The general theory of quantum mechanics is now almost complete” and “The underlying physical laws
necessary for the mathematical theory of a large part of physics and the whole of chemistry are thus completely
known” [3]. Since that time, the development of both telecommunications through electromagnetic
waves and solid state electronics favoured the appearance first of classical Information Theory, and then
of Quantum Information Theory and Processing (QIT, QIP).
This special issue, Quantum Information and Foundations, in the Quantum Information Section of Entropy, reflects the existence of links between QIP/QIT and the foundations of QT. An instance of such links is given by the approach adopted e.g., in Timpson’s Thesis [4]. This methodology, in the framework of Philosophy of Science, is difficult because of its rather general character. For the last decade, we have been following another approach. Starting from a problem in the domain of classical information processing, namely Source Separation (SS) with its more difficult so-called Blind version (BSS), introduced around 1985 and now a mature field [5,6], we are developing its quantum counterpart, which we proposed to call Blind Quantum Source Separation (BQSS). Each step of this more pedestrian approach may be checked, at present e.g., through simulations. This approach was followed in our 2007 paper introducing BQSS [7], and in those describing the solutions which we have built since then (see e.g., [6,8–14]), and which led to our recent introduction of Blind Quantum Process Tomography (cf. [12,14] and more explanations at the end of this section and in Part A.2 of the Appendix).

Entropy 2017, 19, 311; doi:10.3390/e19070311; www.mdpi.com/journal/entropy

A short presentation of the problem of classical (i.e., non quantum) or conventional BSS, and of
its interest, is needed here. In BSS, typically, at first, a set of users (the Writer) presents a set of
simultaneous signals (input signals, or sources) at the input of a multi-user communication system
(the Mixer). The sources, constrained to possess some general properties (e.g., mutual statistical
independence), are combined (mixed, in the SS sense) in the Mixer, often specified through a model,
e.g., the linear memoryless one (cf. Chapter 11 from [15]). Another set of users (the Reader) receives the
signals arriving at the Mixer output. The Writer possibly knows the sources, but the Reader does not
know them, and cannot access the inputs of the Mixer. That Mixer uses one or several parameter values,
unknown to the Reader, who only knows some of its general properties. The Reader’s final task is the
restoration of the sources (possibly up to some so-called acceptable indeterminacies) from the signals
at the Mixer output, during the inversion phase. An intermediate task is the determination of the
unknown parameters of the Mixer, or of its inverse. Before receiving the signals to be separated at the
Mixer output, derived from the sources sent by the Writer, the Reader therefore enters an “adaptation
phase”, during which he knows that the Writer is sending one (or possibly a limited number of)
signal(s) submitted to some definite, and known by the Reader, constraints. The particular signal sent
is not known by the Reader (blind separation problem), who knows the class of the input signal(s)
and the signal(s) at the Mixer output in the adaptation phase, and, of course, the mixed signals to be
separated in the inversion phase.
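The mixing and inversion phases described above can be illustrated with the linear memoryless model. The following is only a sketch: the mixing matrix `A`, the uniform law of the sources and the helper names are hypothetical, and the inverse of the Mixer is taken here as exactly known, whereas in the blind problem the Reader has to estimate it during the adaptation phase.

```python
import random


def mix(A, s):
    """Linear memoryless mixing of one pair of samples: x = A s."""
    return (A[0][0] * s[0] + A[0][1] * s[1],
            A[1][0] * s[0] + A[1][1] * s[1])


def inverse_2x2(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if det == 0:
        raise ValueError("mixing matrix is not invertible")
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]


rng = random.Random(0)
# Two independent sources (hypothetical law: uniform samples in [-1, 1]).
sources = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(5)]

A = [[1.0, 0.6], [0.4, 1.0]]  # Mixer parameters, unknown to the Reader
observations = [mix(A, s) for s in sources]

# Inversion phase with the exact inverse of the Mixer (for illustration only).
W = inverse_2x2(A)
restored = [mix(W, x) for x in observations]

# A separator with swapped and rescaled rows restores (2 * s2, -s1):
# the sources are recovered up to a permutation and scale factors.
W_alt = [[2.0 * W[1][0], 2.0 * W[1][1]], [-W[0][0], -W[0][1]]]
restored_alt = [mix(W_alt, x) for x in observations]
```

The second separator illustrates what is typically meant by acceptable indeterminacies in this model: each output still contains a single source, but its order and scale are not fixed.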
Conventional BSS is already used to extract some or all source signals in various application
fields, e.g., in some audio systems, or when using radio-frequency signals to transmit digital data, or
in the biomedical field, in the processing of signals such as electrocardiograms, electroencephalograms
or magnetoencephalograms, as explained in Part A.1 of the Appendix. More information on the
applications of conventional BSS may be found in our previous papers [11,14], in [6], and in the papers
or books they cite.
BSS is moreover closely linked to a well-known domain of signal processing technology called system
identification. More precisely, BSS is linked to Blind Mixture Identification (BMI), as briefly explained in
Part A.1 of the Appendix and developed in [6], and BSS may be used in the corresponding applications.
Conventional (B)SS has favoured the introduction of concepts and the development of specific
methods [5,6]. Its extension to the quantum domain seems suitable for at least three reasons. First,
the source concept may be extended from a classical to a quantum context. Secondly, as any classical
phenomenon, conventional (B)SS may be seen as the limit of a quantum phenomenon. When
developing solutions to the BQSS problem, it seems legitimate to try and import concepts and
methods from the classical to the quantum SS domain. However, the presence of entanglement
in a quantum approach should be clearly identified and the consequences of its existence should not
be underestimated. In addition, the concepts of quantum sources and of their statistical independence
deserve some discussion, and consequences of the probabilistic aspect of the results of measurements
in the quantum domain must be drawn. Finally, and not least, since some of the basic concepts of QT are still open to discussion, when e.g., using measurements, even in an abstract process, the adopted point of view should be made explicit once, in order to minimize confusion. The nature of this special issue gave us the opportunity to clarify concepts and justify properties already used in our previous papers on BQSS, a task postponed up to now, and which should be of use in the BQSS domain, and maybe in other fields. These two motivations stimulate a third natural one, namely the hope of extending the field of BSS applications toward the quantum world. In the following
sections, in order to illustrate our methods and help reading, some aspects or results of our previous
papers will be occasionally presented, but the building of any specific BQSS solution is outside their
scope. The reader interested in the results from simulations may consult [8,11], obtained through BQSS
methods with classical processing, and [14], with quantum processing in the forward path. This recent
paper moreover contains a table with a detailed comparison of the key features and performance from
the existing methods.
In all of our previous papers, we considered two distinguishable qubits numbered 1 and 2,
and we presently keep this situation. When it is meaningful to speak of the state of a quantum system,
and specifically if this system is a qubit, this state may be either pure or mixed. In order to avoid any
confusion with the meaning of a mixture in the SS context, if it is needed to speak of a (quantum)
mixed state in the following, we will systematically speak of a statistical mixture. A typical situation
is the following one: at an initial time $t_0$, the Writer prepares both qubits, each in a given pure state, described by some ket. This ket carries information, an idea contained in the expression “quantum source”. The initial state $|\Psi(t_0)\rangle$ of the qubit pair is then the tensor product of the corresponding kets. The time between $t_0$ (writing) and $t_1$ (reading) is supposed to be short enough for the qubit pair to be treated as isolated, a choice already made by Feynman [16,17] in the context of the quantum computer, and presently refined at the beginning of Section 4.1 for qubits physically realized with spins. At any time $t$ between $t_0$ and $t_1$, the state of the qubit pair may then be described by a ket $|\Psi(t)\rangle$. In the Schrödinger picture, this time evolution of the pair is described by a time-dependent unitary operator $U(t_0, t_1)$. It is assumed that an undesired coupling exists between these qubits. Because of this undesired coupling, as time goes on the state of the pair generally becomes entangled. Coupling is then interpreted as a mixing (in the SS sense), realized by an abstract Mixer depending upon one or several parameter values, unknown to the Reader, who only knows some general properties of that Mixer. It is said that the input of the Mixer receives state $|\Psi(t_0)\rangle$, and that its output provides state $|\Psi(t)\rangle$. It should be well appreciated that inverting $U(t_0, t_1)$ in order to get $|\Psi(t_0)\rangle$ from $|\Psi(t_1)\rangle$ is not that easy, because $U(t_0, t_1)$ is unknown (blind QSS). In Section 2, it is first explained why both state and process quantum tomography are unable to solve this BQSS problem, and secondly why the Schmidt criterion is ill-suited for following the degree of entanglement of $|\Psi(t_1)\rangle$ during
the adaptation phase. The Peres–Horodecki criterion [18,19] is valid for separable statistical mixtures
of bipartite systems, and not specifically for unentangled pure states. A better suited unentanglement
criterion is therefore established in Section 2.
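The way an undesired coupling turns a product state into an entangled one can be sketched numerically. The example below makes a simplifying assumption: it uses an isotropic exchange Hamiltonian $H = J\,\vec{s}_1 \cdot \vec{s}_2$ (with $\hbar = 1$) rather than the cylindrical Heisenberg coupling considered in this article, and the values of `J` and `t` are arbitrary. It relies on the identity $\vec{s}_1 \cdot \vec{s}_2 = (2\,\mathrm{SWAP} - I)/4$ for two spins 1/2, and on the fact that $c_1 c_4 - c_2 c_3$ vanishes for any product state.

```python
import cmath
import math


def product_state(q1, q2):
    """Tensor product of two single-qubit kets, ordered as (++, +-, -+, --)."""
    (a, b), (c, d) = q1, q2
    return [a * c, a * d, b * c, b * d]


def exchange_evolve(psi, J, t):
    """Apply U = exp(-i H t) for H = J s1.s2 (hbar = 1).

    Since s1.s2 = (2 SWAP - I) / 4 and SWAP^2 = I,
    U = exp(i J t / 4) * (cos(J t / 2) I - i sin(J t / 2) SWAP).
    """
    a = J * t / 2.0
    swapped = [psi[0], psi[2], psi[1], psi[3]]  # SWAP exchanges +- and -+
    phase = cmath.exp(1j * J * t / 4.0)
    return [phase * (math.cos(a) * x - 1j * math.sin(a) * y)
            for x, y in zip(psi, swapped)]


def witness(c):
    """|c1 c4 - c2 c3|, which is zero for every product state."""
    return abs(c[0] * c[3] - c[1] * c[2])


psi0 = product_state((1.0, 0.0), (0.0, 1.0))  # the product state |+-> at t0
for t in (0.0, 0.5, math.pi / 2):
    psi_t = exchange_evolve(psi0, J=1.0, t=t)
    norm = sum(abs(x) ** 2 for x in psi_t)
    print(f"t = {t:.3f}  norm = {norm:.3f}  |c1 c4 - c2 c3| = {witness(psi_t):.3f}")
```

For this initial state the quantity $|c_1 c_4 - c_2 c_3|$ equals $|\sin(Jt)|/2$, so the pair is unentangled at $t_0$ and generally entangled afterwards.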
In Section 3, a model situation, for a single spin and then for a pair of spins, in inhomogeneous
magnetic fields with random directions, allows us to speak of random and possibly independent
variables, in that quantum context. We explain why, although this random quantum state corresponds
to a statistical mixture, it is simpler, in the BQSS context, to speak of a random pure state than to
introduce a density operator. In Section 4, we first make brief comments about the description of
quantum states (including the existence of statistical mixtures as source states, in a more general
context), about the act of measurement and about the physical realization of qubits with electron
spins. We then discuss questions related to the probabilities of the possible results obtained in
measurements of spin components, in the context of spins 1/2 as qubits. We first present their use
when the Reader makes measurements at the Mixer output in order to restore the sources (cf. Figure 1).
These measurements establish a link between the output of the Mixer and the classical world. It is
stressed that while the macroscopic support of the results of measurements has a classical behaviour,
the probabilities of these results obey quantum laws. We then establish an unentanglement criterion using probabilities, equivalent to the one established in Section 2 for the probability amplitudes $c_i$. It is shown that the $c_i$ coefficients can be expressed as functions of the probabilities of results in the measurements of well-chosen spin components. In Section 5, we derive the expression of the above unentanglement criterion for all possible source states, at the output of the so-called separating system, with respect to the parameters of both the cylindrical Heisenberg coupling, an abstract Mixer largely used in our previous papers, and that separating system.

[Figure 1 block diagram: mixing stage ($|\psi(t_0)\rangle$ → mixing → $|\psi(t)\rangle$) followed by separating stage ($p$ → classical processing → $y$).]

Figure 1. Block diagram of a system using classical-processing BQSS.

In Part A.2 of the Appendix, the question of the applications of BQSS is addressed. Partly
because the appearance of BQSS is recent, the subject of its applications is presently largely speculative.
Two main subdomains should be distinguished. The first one is BQSS in a strict sense. It aims
at recovering the source states and is the quantum counterpart of conventional BSS. The second
subdomain focuses on an intermediate step possibly found in methods developed for BQSS and aiming
at the knowledge of the mixer function or of its inverse. The corresponding classical problem is known
as Blind Mixture Identification (BMI), a subfield of System Identification. The non-blind quantum
version of System Identification is that already mentioned and well-established field of QIP called
Quantum Process Tomography (as opposed to Quantum State Tomography). We recently introduced
the quantum version of BMI, which we proposed to call Blind Quantum Process Tomography (BQPT).

2. An Unentanglement Criterion for a Qubit Pair

A superficial look may suggest that it is possible to restore the initial product state through State or Process Tomography (ST, PT). ST aims at determining a quantum state if many copies of that state are available [20]. However, in BQSS, the Reader is unable to access the input of the Mixer, and ST is therefore of no use here. PT would here consist of placing (preparing) successive well-defined and known quantum states at the input of the Mixer, thus operating in the non-blind mode (cf. [15], p. 202) and observing the corresponding signals at its output. However, in the BQSS problem, the Reader is strictly unable to operate that way, as he is unable to ask the Writer to prepare for him the quite specific input states required by PT. Therefore, quantum tomography is unable to solve the BQSS problem, which needs dedicated methods (for more details, see [8]).
Up to now, in the BQSS problem, we developed two main approaches for both determination
of the unknown parameter(s) of the mixing or separating system and source separation. In the first
approach [7,8,11], the Reader measures observables, using the signals at the Mixer output (cf. Figure 1).
The results, and properties associated with them, e.g., the probabilities of their occurrences, are kept
upon a macroscopic device, e.g., the memory of a classical computer, and then used in a separating
system. Since this macroscopic device and the separating system have a classical behaviour, we
called this processing aimed at restoring the sources “classical-processing BQSS”. In the second, quite
different, and more recently introduced approach [9,10,14], the quantum state at the Mixer output is
sent to the input of a quantum-processing subsystem (cf. Figure 2), the inverting block of the separating
system. This block is so designed that its output provides a quantum pure state equal to $|\Psi(t_0)\rangle$
(possibly up to some acceptable indeterminacies), after the adaptation phase.

[Figure 2 block diagram: mixing stage ($|\psi(t_0)\rangle$ → mixing → $|\psi(t)\rangle$) followed by separating stage ($|\psi(t)\rangle$ → quantum processing → $|\phi\rangle$, with a classical processing block).]

Figure 2. Block diagram of a system using BQSS, with quantum processing in the forward path (no cloning [14], with permission from Elsevier).

From now on, the state spaces of two arbitrary qubits, called qubits 1 and 2, are denoted as $E_1$ and $E_2$, respectively. The possible (pure) states of the pair are the kets in $E_1 \otimes E_2$. We assume that the qubits are physically realized with spins 1/2, which, e.g., allows us to speak of the spin component $s_{1z}$ or $s_{2z}$, but many results established hereafter remain true without this assumption. We introduce the orthonormal basis $B_+$, $\{|++\rangle, |+-\rangle, |-+\rangle, |--\rangle\}$, where e.g., $|+-\rangle$ means $|1+\rangle \otimes |2-\rangle$ and $|i,+\rangle$, $|i,-\rangle$ are normed eigenkets of the $s_{iz}$ component of (reduced) spin $\vec{s}_i$ (with $i = 1, 2$), for the eigenvalues $+1/2$ and $-1/2$, respectively. Any pure pair state, entangled or not, may be expanded in $B_+$ as

$$|\Psi\rangle = c_1 |++\rangle + c_2 |+-\rangle + c_3 |-+\rangle + c_4 |--\rangle, \qquad (1)$$

where the complex coefficients $c_j$ ($j$ = 1 to 4) satisfy $\sum_j |c_j|^2 = 1$. If a pure state or a statistical mixture of a bipartite system $S_{12}$ (parts $S_1$ and $S_2$) is described by a density operator $\rho$, the corresponding partial traces $\rho_1 = \mathrm{Tr}_2\,\rho$ and $\rho_2 = \mathrm{Tr}_1\,\rho$ have all the mathematical properties of a density operator [2]. In addition, if $S_{12}$ is in a pure state, $\rho_1$ and $\rho_2$ have the same eigenvalues [21]. This pure state is unentangled if and only if its Schmidt number $N_S$ (the number of non-zero eigenvalues of $\rho_1$ and $\rho_2$) is equal to 1 [21]. We are particularly interested in the case when $|\Psi\rangle$ is the state found at the output of the inverting block. Then, any pure state may be expanded in the standard basis $B_+$ as in Equation (1), where the values of the $c_i$ coefficients are affected by both the coupling between the qubits and, during the adaptation phase, by the adaptation procedure. This adaptation phase typically consists of an iterative numerical algorithm, which aims at optimizing a continuous-valued function, traditionally called the “cost function”. For any given values of the adjustable parameters of the inverting block, the cost function measures a kind of “distance” between $|\Psi\rangle$ at the output of the inverting block and an unentangled pure state. The Schmidt unentanglement criterion cannot be used in our problem because the considered state remains (at least slightly) entangled throughout the adaptation procedure, and the Schmidt number thus remains higher than one. The Schmidt criterion provides a binary-valued unentanglement detector, with a Schmidt number equal to one or not and, if taking into account all possible integer values of $N_S$ beyond unentanglement detection, the Schmidt criterion provides a discrete-valued quantity. What we eventually need instead is a quantitative, continuous-valued measure of that “distance” of the considered state with respect to unentanglement, in order to retain the adjustable parameter values of the inverting block that yield the state closest to unentanglement. Moreover, even if the Schmidt approach could be modified to this end, it would yield high computational complexity, as it would require one to diagonalize $\rho_1$ or $\rho_2$ for each of the quite numerous steps of the iterative adaptation algorithm. We avoid these issues as follows. Since the qubit pair is in a pure state, its partial traces $\rho_1$ and $\rho_2$ satisfy

$$\mathrm{Tr}\,\rho_1^2 = \mathrm{Tr}\,\rho_2^2 \leq 1, \qquad (2)$$

and the common value for $\mathrm{Tr}\,\rho_1^2$ and $\mathrm{Tr}\,\rho_2^2$ is 1 if and only if the pure state is unentangled (cf. [21]). One could think of using $\mathrm{Tr}\,\rho_1^2 - 1$ as a cost function. However, $\mathrm{Tr}\,\rho_1^2$ depends upon the $c_i$, which suggests trying to establish an unentanglement criterion using the $c_i$ explicitly. To this end, we consider state $|\Psi\rangle$ defined through Equation (1). When it is assumed that $|\Psi\rangle$ is unentangled, i.e., that it can be written as

$$|\Psi\rangle = (a|+\rangle + b|-\rangle) \otimes (c|+\rangle + d|-\rangle), \qquad (3)$$

then, in Equation (1), $c_1 = ac$, $c_2 = ad$, $c_3 = bc$, $c_4 = bd$, so $c_1 c_4$ and $c_2 c_3$ are both equal to $abcd$:

$$c_1 c_4 = c_2 c_3. \qquad (4)$$

Conversely, when it is assumed that Equation (4) is satisfied, if $c_1 \neq 0$ then $c_4 = c_2 c_3 / c_1$ and $|\Psi\rangle$ may be written as

$$|\Psi\rangle = c_1 \left(|+\rangle + \frac{c_3}{c_1}|-\rangle\right) \otimes \left(|+\rangle + \frac{c_2}{c_1}|-\rangle\right), \qquad (5)$$

which means that $|\Psi\rangle$ is then unentangled. If Equation (4) is satisfied and $c_1 = 0$, then $c_2 = 0$ and $c_3 \neq 0$, or $c_3 = 0$ and $c_2 \neq 0$, or $c_2 = c_3 = 0$, and in each case $|\Psi\rangle$ is unentangled. Therefore, if the qubit pair is in a pure state $|\Psi\rangle$ written as in Equation (1), then:

$$|\Psi\rangle \text{ is unentangled} \iff c_1 c_4 = c_2 c_3. \qquad (6)$$
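The criterion can be compared numerically with the purity of the partial trace. The following sketch assumes a normalized pure state given by its four coefficients in the standard basis. It uses the identity $\mathrm{Tr}\,\rho_1^2 = 1 - 2\,|c_1 c_4 - c_2 c_3|^2$, which is not stated in the article but follows from $\det \rho_1 = |c_1 c_4 - c_2 c_3|^2$ for a 2 × 2 density matrix of unit trace; the function names are hypothetical.

```python
import math
import random


def normalize(c):
    norm = math.sqrt(sum(abs(x) ** 2 for x in c))
    return [x / norm for x in c]


def gap(c):
    """|c1 c4 - c2 c3|, zero if and only if the pure state is unentangled."""
    return abs(c[0] * c[3] - c[1] * c[2])


def purity_rho1(c):
    """Tr(rho1^2), with rho1 = Tr_2 |Psi><Psi| in the (|+>, |->) basis."""
    c1, c2, c3, c4 = c
    r00 = abs(c1) ** 2 + abs(c2) ** 2
    r11 = abs(c3) ** 2 + abs(c4) ** 2
    r01 = c1 * c3.conjugate() + c2 * c4.conjugate()
    return r00 ** 2 + r11 ** 2 + 2 * abs(r01) ** 2


rng = random.Random(3)
random_states = [
    normalize([complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)])
    for _ in range(100)
]
# Largest deviation from Tr(rho1^2) = 1 - 2 |c1 c4 - c2 c3|^2 over the samples.
max_error = max(abs(purity_rho1(c) - (1 - 2 * gap(c) ** 2)) for c in random_states)

a, b, c_, d = 0.6, 0.8, 1 / math.sqrt(2), 1j / math.sqrt(2)
product = [a * c_, a * d, b * c_, b * d]       # unentangled, as in Equation (3)
bell = [complex(1 / math.sqrt(2)), 0j, 0j, complex(1 / math.sqrt(2))]

print(f"max deviation over random states: {max_error:.2e}")
print(f"product state: gap = {gap(product):.3f}, purity = {purity_rho1(product):.3f}")
print(f"Bell state:    gap = {gap(bell):.3f}, purity = {purity_rho1(bell):.3f}")
```

The quantity $|c_1 c_4 - c_2 c_3|$ varies continuously between 0 and 1/2, and evaluating it requires no diagonalization, which is the point made above about the cost function.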

This unentanglement criterion for a qubit pair pure state was used without justification in [9,10]. In Equation (1), $|\Psi\rangle$ was expanded in the standard basis. It is possible instead to introduce e.g., the normed eigenvectors of $s_{1x}$ and $s_{2x}$, or more generally those of $s_{1u}$ and $s_{2v}$, the components of the spins along respective arbitrary directions $\vec{u}(\theta_{1E}, \varphi_{1E})$ and $\vec{v}(\theta_{2E}, \varphi_{2E})$, defined through their Euler angles. For each component, the possible results are again ±1/2. The possible results for the pair may be symbolically written as $(+u + v)$, $(+u - v)$, $(-u + v)$ and $(-u - v)$, and the corresponding probabilities as $P_{1uv}$, $P_{2uv}$, $P_{3uv}$, $P_{4uv}$. Equation (1) is replaced by

$$|\Psi\rangle = c_{1uv} |{+u}\,{+v}\rangle + c_{2uv} |{+u}\,{-v}\rangle + c_{3uv} |{-u}\,{+v}\rangle + c_{4uv} |{-u}\,{-v}\rangle. \qquad (7)$$

With the same reasoning within the new basis, (6) is replaced by

$$|\Psi\rangle \text{ is unentangled} \iff c_{1uv} c_{4uv} = c_{2uv} c_{3uv}. \qquad (8)$$
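The change of basis can also be sketched numerically. The example below rests on assumptions that are not taken from the article: the angles are read as polar and azimuthal angles, the eigenkets are chosen as $|{+u}\rangle = \cos(\theta/2)|+\rangle + e^{i\varphi}\sin(\theta/2)|-\rangle$ and $|{-u}\rangle = -\sin(\theta/2)|+\rangle + e^{i\varphi}\cos(\theta/2)|-\rangle$ (a phase convention among others), and the directions used are arbitrary.

```python
import cmath
import math


def spin_basis(theta, phi):
    """Rows are the bras <+u| and <-u| written in the (|+>, |->) basis."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    e = cmath.exp(-1j * phi)  # complex conjugate of e^{i phi}
    return [[c, e * s], [-s, e * c]]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(X):
    return [[X[0][0], X[1][0]], [X[0][1], X[1][1]]]


def coefficients_uv(c, dir_u, dir_v):
    """(c1uv, c2uv, c3uv, c4uv) of Equation (7) from (c1, c2, c3, c4)."""
    C = [[c[0], c[1]], [c[2], c[3]]]  # C[i][j]: index i for qubit 1, j for qubit 2
    A, B = spin_basis(*dir_u), spin_basis(*dir_v)
    Cuv = matmul(matmul(A, C), transpose(B))  # c_uv = sum_ij <u|i> <v|j> c_ij
    return [Cuv[0][0], Cuv[0][1], Cuv[1][0], Cuv[1][1]]


def gap(c):
    return abs(c[0] * c[3] - c[1] * c[2])


dir_u, dir_v = (0.7, 1.9), (2.1, 0.4)  # arbitrary (theta, phi) pairs
product = [0.6 * 0.8, 0.6 * 0.6j, 0.8 * 0.8, 0.8 * 0.6j]  # (0.6, 0.8) x (0.8, 0.6i)
bell = [1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2)]

product_uv = coefficients_uv(product, dir_u, dir_v)
bell_uv = coefficients_uv(bell, dir_u, dir_v)
print(f"product state: {gap(product):.3f} -> {gap(product_uv):.3f}")
print(f"Bell state:    {gap(bell):.3f} -> {gap(bell_uv):.3f}")
```

With this convention the two single-qubit transformations are unitary, so $|c_{1uv} c_{4uv} - c_{2uv} c_{3uv}| = |c_1 c_4 - c_2 c_3|$, and the criterion gives the same verdict in both bases.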

3. Random Quantum Sources and Their Independence

The qubits are again supposed to be physically realized with spins 1/2. Standard Electron
Spin and Nuclear Magnetic Resonance (ESR, NMR) use a non-microscopic number of resonant
spins, but methods have been proposed for more than twenty years in order to detect a single spin,
particularly with Optically Detected Magnetic Resonance (ODMR [22,23]) or with Magnetic Resonance
Force Microscopy (MRFM [24]), and more recently at low temperature (0.5 K) with Spin Excitation
Spectroscopy [25], or even with ESR, in extreme conditions [26]. These approaches are still under
development. Here, anticipating upon advances in spintronics, we rather consider a pair of spins, or
even a single spin, submitted to a static magnetic field.
When speaking e.g., of a microwave source for satellite television, one speaks of the device
emitting the microwave carrier. Similarly, the expression “laser source” generally refers to the device
creating the coherent radiation. In conventional SS, “source” is an abbreviation for “source signal”.

Furthermore, in Quantum SS with abstract qubits corresponding to physical spins 1/2, the word
“source” does not refer to some atomic beam delivering atoms carrying an electron or nuclear magnetic
moment, but still means “source signal”, then referring to some information from the quantum states
of these qubits.
In conventional SS, an important concept is that of statistical independence of the sources, at the
root of the frequent use of Independent Component Analysis (ICA) [27]. In [7,8,11], we postulated
the existence of statistically independent quantum sources when using the classical-processing SS
defined at the beginning of Section 2. Hereafter, we show that statistical independence may exist in
that context. Quantum Mechanics (QM) does e.g., consider random operators, the matrix elements
of which are random quantities (see the random lattice operators F (q) in the quantum description
of the motions of nuclear moments in liquids, in the study of Spin-Lattice Relaxation (SLR), in [28]).
As a simple model situation, a magnetic moment − →
μ associated with a single electron spin 1/2, with
−
→μ = −G − →s (isotropic g tensor), placed in a Stern–Gerlach device, is now introduced. The static field
−
→ −
→
is B0 = B0 Z , with amplitude B0 . The system of interest consists of this spin and the magnet. Writing
−
→
the Zeeman Hamiltonian as h = −− →
μ B 0 = GB0 s Z indicates that while the spin is a quantum object,
the magnetic field is treated classically. The Writer first prepares the spin in the | + Z eigenstate of s Z
(eigenvalue +1/2). The moment is then received by the Reader, supposed to ignore the direction of
$\vec{B}_0$, and who chooses some direction attached to the Laboratory as the quantization direction, called $z$
(unit vector $\vec{u}_z$), and introduces a Laboratory-tied cartesian reference frame $xyz$, used to define $\theta_E$ and
$\varphi_E$, the Euler angles of $\vec{Z}$. Since the field is treated classically, $\theta_E$ and $\varphi_E$ behave as classical variables,
while $s_Z$ is an operator. The Reader measures $s_z = \vec{s}\cdot\vec{u}_z$ (eigenstates: $|+\rangle$ and $|-\rangle$), and is interested
in the probability $p_{+z}$ of getting $+1/2$. An elementary calculation indicates that

$$|+Z\rangle = r\,|+\rangle + \sqrt{1-r^2}\,e^{i\varphi}\,|-\rangle, \tag{9}$$

with

$$r = \cos\frac{\theta_E}{2}, \qquad \varphi = \varphi_E, \tag{10}$$

and therefore $p_{+z} = \cos^2(\theta_E/2)$. Once the direction of the magnetic field has been chosen, state
$|+Z\rangle$ is then unambiguously defined. If this direction has a deterministic nature, $r$ and $\varphi$ are
deterministic variables, and $|+Z\rangle$ may then be called a deterministic quantum state. If $\theta_E$ and $\varphi_E$,
defining the direction of $\vec{B}_0$ chosen by the Writer, obey probabilistic laws, one may consider that
the quantum quantities $r$ and $\varphi$, which depend upon the classical Random Variables (RV) $\theta_E$ and
$\varphi_E$, do possess the properties of conventional, i.e., classical, RV. It may happen, e.g., that they are
uncorrelated, or even independent (which happens if $\theta_E$ and $\varphi_E$ are independent). In addition, if $\theta_E$
and $\varphi_E$ depend on time in a random way, $r$ and $\varphi$ are then random time functions. We are not strictly
facing the quantum equivalent of a classical situation here. Rather, the stochastic character of the field
direction, with classical nature, is reflected in the random behaviour of the quantum state expressed
through Equation (9). Therefore, rather than a random operator, we meet here a random quantum
state. The concept of a random state, if not the expression, was already used e.g., in the early and
canonical books [29,30]. The probability $p_{+z}$, presently a function of the RV $\theta_E$, is itself an RV. This
results from both the randomness of the field direction and the standard probabilistic interpretation of
QM. Probabilities of results of measurements for a qubit pair were treated as RV, without the present
justification, in most of our previous papers, including [7,8,11].
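The dependence of $p_{+z}$ on a random field direction may be illustrated numerically. The following minimal Python sketch evaluates Equations (9) and (10) for sampled directions; the function names, the seed and the uniform sampling law for $\theta_E$ and $\varphi_E$ are assumptions introduced only for this illustration.

```python
import cmath
import math
import random

def plus_Z_state(theta_E, phi_E):
    """Amplitudes of |+Z> on (|+>, |->), following Eqs. (9)-(10)."""
    r = math.cos(theta_E / 2.0)
    return r, math.sqrt(max(0.0, 1.0 - r * r)) * cmath.exp(1j * phi_E)

def prob_plus_z(theta_E, phi_E):
    """Probability p_{+z} of reading +1/2 when s_z is measured on |+Z>."""
    amp_plus, _ = plus_Z_state(theta_E, phi_E)
    return abs(amp_plus) ** 2

# Assumed sampling law for the field direction: independent uniform angles.
rng = random.Random(0)
directions = [(rng.uniform(0.0, math.pi), rng.uniform(0.0, 2.0 * math.pi))
              for _ in range(1000)]
# Each sampled direction yields one value of p_{+z}: the probability is itself an RV.
p_samples = [prob_plus_z(theta, phi) for theta, phi in directions]
mean_p = sum(p_samples) / len(p_samples)
```

In this sketch each draw of the field direction fixes one pure state and one value of $p_{+z}$, which depends on $\theta_E$ only.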
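If one measures the scalar observable $O$ when the spin is in the state $|\Psi\rangle = \alpha|+\rangle + \beta|-\rangle = \sum_k f_k |\varphi_k\rangle$
(where $k$ is associated with $+$ and $-$), had the $f_k$ been deterministic the mean value would have been:

$$\langle\Psi|O|\Psi\rangle = \sum_{k,l} f_k^* f_l\, O_{kl}, \qquad O_{kl} = \langle\varphi_k|O|\varphi_l\rangle. \tag{11}$$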

Entropy 2017, 19, 311

Since the $f_k$ are random, one must moreover calculate the statistical mean, denoted as $\overline{\langle\Psi|O|\Psi\rangle}$:

$$\overline{\langle\Psi|O|\Psi\rangle} = \sum_{k,l} \overline{f_k^* f_l}\, O_{kl} = \mathrm{Tr}\,\rho O, \tag{12}$$

where $\rho$ is the density operator, the matrix elements of which, in the $(|+\rangle, |-\rangle)$ basis, are $\rho_{l,k} = \overline{f_k^* f_l}$.
Therefore, it is in principle possible to introduce a density operator here, which is a non-random
operator (its matrix elements are not random quantities, but statistical averages). However, this is
of no interest, since in the BQSS problem examined up to now, the Reader knows that e.g.,
qubit 1 has been prepared in a pure state, but does not know the values of the $\rho_{ij}$ coefficients in any
basis, and is consequently unable to choose a basis in which $\rho$ would be diagonal. It is simpler to keep
speaking of a random pure state.
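The equality between the statistical mean and $\mathrm{Tr}\,\rho O$ in Equation (12) can be checked on sampled random pure states. The sketch below is only illustrative: the sampling law, the seed, the helper names and the choice $O = s_x$ are assumptions, and the density operator is estimated as an empirical average over the samples.

```python
import cmath
import math
import random

def random_pure_state(rng):
    """One random pure state [f_+, f_-] in the form of Eq. (9) (assumed uniform angles)."""
    theta = rng.uniform(0.0, math.pi)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.cos(theta / 2.0)
    return [complex(r), math.sqrt(max(0.0, 1.0 - r * r)) * cmath.exp(1j * phi)]

def quantum_mean(f, O):
    """<Psi|O|Psi> = sum_{k,l} f_k^* f_l O_kl, as in Eq. (11)."""
    return sum(f[k].conjugate() * f[l] * O[k][l] for k in range(2) for l in range(2))

def density_operator(states):
    """rho[l][k] = empirical average of f_k^* f_l over the sampled states."""
    n = len(states)
    return [[sum(f[k].conjugate() * f[l] for f in states) / n for k in range(2)]
            for l in range(2)]

def trace_rho_O(rho, O):
    """Tr(rho O) = sum_{k,l} rho[l][k] O[k][l]."""
    return sum(rho[l][k] * O[k][l] for k in range(2) for l in range(2))

rng = random.Random(1)
states = [random_pure_state(rng) for _ in range(2000)]
s_x = [[0.0, 0.5], [0.5, 0.0]]  # spin component s_x in the (|+>, |->) basis
rho = density_operator(states)
statistical_mean = sum(quantum_mean(f, s_x) for f in states) / len(states)
```

Both routes give the same number by linearity, while $\rho$ alone no longer identifies which pure state was prepared in a given realization.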
As a model situation, we now consider two spins 1/2 numbered 1 and 2, each with conditions
similar to the previous ones, with fields along directions with respective unit vectors $\vec{Z}_1(\theta_{1E}, \varphi_{1E})$ and
$\vec{Z}_2(\theta_{2E}, \varphi_{2E})$, and each spin initially prepared in the state

$$|\psi_i(t_0)\rangle = r_i\,|i+\rangle + \sqrt{1-r_i^2}\,e^{i\varphi_i}\,|i-\rangle, \qquad i = 1, 2, \tag{13}$$

where $|i+\rangle$ and $|i-\rangle$ are the eigenkets of $s_{iz}$, the component of $\vec{s}_i$ along the quantization direction,
for the eigenvalues $1/2$ and $-1/2$, respectively. For the same reason, if the field directions are
random, $r_1$, $\varphi_1$, $r_2$ and $\varphi_2$ have the properties of conventional RV. If $(\theta_{1E}, \varphi_{1E})$ and $(\theta_{2E}, \varphi_{2E})$ are
mutually statistically independent, the same is then true for the couples of RV $(r_1, \varphi_1)$ and $(r_2, \varphi_2)$.
In addition, if e.g., $\theta_{1E}$ and $\varphi_{1E}$ are independent, the same is true for $r_1$ and $\varphi_1$ (cf. Equation (10)). These
properties are of major importance for our quantum-source independent component analysis (QSICA)
methods described in [11]. We may then say that the initial state of each qubit is random, i.e., that in
Equation (13) $r_i$ and $\varphi_i$ are RV. When considering the preparation of a pair of qubits each in a pure state,
one may assume either a deterministic or a random direction for each magnetic field. This discussion
shows that the relevant concept, in the latter case, is that of random quantum states, rather than that of
random quantum operators mentioned earlier in this section.
Keeping our assumption of a pair of qubits each prepared in a pure state, we now consider
the second approach for the adaptation and inversion phases (cf. the beginning of Section 2 and
Figure 2), with a quantum state $|\Phi\rangle$ present at the output of the inverting block. The presence of $|\Phi\rangle$
and the Reader’s final aim, the recovery of the initial pure state, prompt the Reader: (1) to speak of
a deterministic or random pure state, rather than to use a density operator; (2) to consider that the
first constraint to be respected in BQSS is then the very existence of an unentangled state at the output
of this inverting block. If unentanglement has first been achieved, then and only then is it possible
to speak of a deterministic or random state for each part of that product state. While entanglement
has no classical counterpart, the following point may be noted here: if a bipartite system is in a pure
(deterministic) state $|\Phi\rangle$, to which a density operator $\rho = |\Phi\rangle\langle\Phi|$ corresponds, $|\Phi\rangle$ is unentangled
if and only if the partial traces $\rho_1$ and $\rho_2$ satisfy the equality $\rho = \rho_1 \otimes \rho_2$ [31]. This unentanglement
condition is reminiscent of the relation $\rho = \rho_1 \cdot \rho_2$ between $\rho$, the joint probability density function
of independent classical RV $X_1$ and $X_2$, and $\rho_1$ and $\rho_2$, the respective marginal probability density
functions. Presently, operators replace functions, a tensor product replaces the ordinary product,
and this reminiscence reflects the existence of a classical analogue to unentangled states. Condition (4)
for unentanglement was established using spins 1/2, but is valid for any pair of two-level systems.
This discussion suggests that, in the BQSS problem, when considering a pair of qubits prepared in a
pure state, and moreover using the second approach of Section 2 for adaptation and inversion, instead
of trying to directly import ICA methods into the BQSS context, one should focus on disentanglement
at the output of the inverting block, which recently led us to introduce a disentanglement-based
separation principle [9,10].
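The factorization condition $\rho = \rho_1 \otimes \rho_2$ can be tested directly for a two-qubit ket. The following Python sketch is a minimal illustration: the helper names, the tolerance and the two example kets are assumptions, and the basis ordering $(|++\rangle, |+-\rangle, |-+\rangle, |--\rangle)$ is taken to be that of Equation (1).

```python
import math

def partial_traces(c):
    """Reduced density matrices of qubits 1 and 2 for the ket
    c = [c1, c2, c3, c4] in the basis (|++>, |+->, |-+>, |-->)."""
    m = [[complex(c[0]), complex(c[1])], [complex(c[2]), complex(c[3])]]  # m[a][b]
    rho1 = [[sum(m[a][b] * m[a2][b].conjugate() for b in range(2))
             for a2 in range(2)] for a in range(2)]
    rho2 = [[sum(m[a][b] * m[a][b2].conjugate() for a in range(2))
             for b2 in range(2)] for b in range(2)]
    return rho1, rho2

def factorizes(c, tol=1e-12):
    """True if rho = |Phi><Phi| equals rho1 (x) rho2, element by element."""
    rho1, rho2 = partial_traces(c)
    for a in range(2):
        for b in range(2):
            for a2 in range(2):
                for b2 in range(2):
                    rho_elem = complex(c[2 * a + b]) * complex(c[2 * a2 + b2]).conjugate()
                    if abs(rho_elem - rho1[a][a2] * rho2[b][b2]) > tol:
                        return False
    return True

s = 1.0 / math.sqrt(2.0)
# (0.6|+> + 0.8|->) (x) (|+> + i|->)/sqrt(2): a product state
product_ket = [0.6 * s, 0.6j * s, 0.8 * s, 0.8j * s]
# (|++> + |-->)/sqrt(2): an entangled state
bell_ket = [s, 0.0, 0.0, s]
```

With these examples, `factorizes(product_ket)` holds while `factorizes(bell_ket)` does not, the reduced operators of the second ket being both equal to half the identity.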

In the next section, use will be made of the number of real independent parameters necessary to
define an arbitrary normed ket $|\Psi\rangle$ in $E_1 \otimes E_2$, written as in Equation (1), and a ket in $E_1 \otimes E_2$ forced to be
unentangled. These numbers are specified hereafter. An arbitrary normed ket $|\Psi\rangle$ in $E_1 \otimes E_2$ depends
upon the four complex quantities $c_1$ to $c_4$ linked through two relations between real numbers ($\sum_i |c_i|^2$
is equal to 1, and $|\Psi\rangle$ and $e^{i\varphi}|\Psi\rangle$, with $\varphi$ an arbitrary real quantity, should be considered identical).
An arbitrary normed ket $|\Psi\rangle$ in $E_1 \otimes E_2$ therefore depends upon six real independent parameters. If it
is forced to be unentangled, it has to satisfy the equality $c_1 c_4 = c_2 c_3$ between complex quantities.
An unentangled normed ket $|\Psi\rangle$ therefore depends upon four real parameters. This corresponds to the
fact that $|\Psi\rangle$ is then restricted to the form $|\Psi\rangle = |\psi_1\rangle \otimes |\psi_2\rangle$, where the normed kets $|\psi_1\rangle$ and $|\psi_2\rangle$,
describing the state of qubits 1 and 2, respectively, each depend upon two real parameters $(r_1, \varphi_1)$,
$(r_2, \varphi_2)$ (cf. Equation (13)).
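The four-parameter form of an unentangled ket can be made explicit with a short sketch. The code below builds $|\psi_1\rangle \otimes |\psi_2\rangle$ from $(r_1, \varphi_1, r_2, \varphi_2)$ as in Equation (13) and evaluates $c_1 c_4 - c_2 c_3$; the function names and the numerical values are assumptions chosen for illustration.

```python
import cmath
import math

def qubit(r, phi):
    """Single-qubit ket r|+> + sqrt(1 - r^2) e^{i phi}|-> of Eq. (13)."""
    return [complex(r), math.sqrt(max(0.0, 1.0 - r * r)) * cmath.exp(1j * phi)]

def product_ket(r1, phi1, r2, phi2):
    """Coefficients [c1, c2, c3, c4] of |psi_1> (x) |psi_2> in the standard basis."""
    a, b = qubit(r1, phi1), qubit(r2, phi2)
    return [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]

def unentanglement_defect(c):
    """c1*c4 - c2*c3, which vanishes exactly for unentangled kets."""
    return c[0] * c[3] - c[1] * c[2]

c_product = product_ket(0.6, 0.2, 0.8, 0.9)
c_entangled = [1.0 / math.sqrt(2.0), 0.0, 0.0, 1.0 / math.sqrt(2.0)]
```

Here the defect of `c_product` vanishes up to rounding, whereas the entangled example gives a defect of modulus 1/2.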

4. BQSS and Probabilities in Spin Component Measurements

4.1. Some General Considerations
Faced with the variety of existing interpretations of QM, Fuchs and Peres have argued that
“quantum theory needs no interpretation” [32]. Concerning the question of interpreting QM, one
may distinguish between claims that can be tested experimentally (i.e., confirmed or refuted)
and those which cannot. This may be illustrated by an instance from the early days of QM,
related to the measurement act. At first, Bohr apparently introduced some dichotomy between the
quantum system of interest and the classical behaviour of the apparatus. Chapter VI of Von Neumann’s
1932 book [30] was perhaps the first attempt to treat the system of interest and the apparatus (with a
so-called pointer) as a single system obeying the laws of QM. However, in his book, Von Neumann
also introduced a postulate (wave-function reduction) specifying the state of the system of interest at
the end of the measurement. Since then, this postulate has been criticized, first by Margenau, who
introduced the concept of preparation, to be distinguished from the one of measurement, and who
insisted that e.g., when a photon is absorbed, the measurement act does not bring the photon into a
new state, but destroys it [33,34]. The measurement act has been largely debated, including recent
discussions through the concept of decoherence (see e.g., [1,21]). When trying to develop the domain of
BQSS, we got some control of the proposed separation methods, through simulations, but we moreover
tried to avoid using ideas linked with some specific “interpretation” of QM. In [8], we did mention
Von Neumann’s book and the irreversible behaviour of the system during measurements, but, after
getting a result through some measurement upon a qubit pair, we never used the state of that qubit
pair at the end of that measurement. On the contrary, after such a measurement, the qubit pair was
often (in an abstract process) submitted to a new preparation, which is not linked to any specific
interpretation of QM.
In the previous sections, the concepts of a pure state and a statistical mixture were both
used. The concept of a statistical mixture may be introduced through a different and more general
situation [35] than the one used in Section 3. The system of interest $S$ and its environment $E$ are viewed
as a global quantum system $\Sigma$. If $S$ and $E$ are uncoupled, and isolated from the rest of the world,
and have been separately prepared in a pure state at time $t_a$, then they evolve separately, each in a
(time-dependent) pure state. If, after $t_a$, a coupling between $S$ and $E$ exists between some times $t_b$ and
$t_c$, then from $t_b$ on their state generally becomes entangled. In addition, if, starting from $t_c$, one focuses
upon the behaviour of $S$, use of the partial trace tool shows that everything then occurs as if $S$ were in
a state of statistical mixture described by a well-chosen density operator, obeying the Von Neumann
equation. Taking the qubit pair as $S$, we have not discussed up to now the BQSS problem found when
the Writer proposes the qubit pair in a state described by a statistical mixture resulting from some past
interaction with its environment.
In recent discussions about the measurement problem, the concept of decoherence [21] was used
for discussing the effect of a transfer of energy from the system to its environment, an irreversible
phenomenon corresponding to SLR in the ESR/NMR context (with, in the simplest situations,
a characteristic time called $T_1$) [28,36]. In our previous papers and in the present one, the qubit pair
is assumed to be isolated from its environment, at the chosen time scale, from the time $t_0$ at which
the Writer operates.
In the ESR/NMR domain, a well-known situation exists when a collection of identical (nuclear
or electron) spins placed in a fixed resonant magnetic field are transiently submitted to an intense,
oscillating magnetic field with a frequency equal to (or near) its resonant value, and with well-chosen
polarization. If each spin is coupled to the magnetic fields only, at the end of the pulse the density
matrix (written in the basis in which the static Zeeman Hamiltonian is diagonal) describing the state of
these spins possesses non-diagonal elements, called coherences. If a weak internal coupling (spin-spin
coupling) such as the dipolar magnetic coupling exists between the spins, and if it is able to manifest
itself at a time scale allowing one to neglect SLR, it progressively induces a decrease of the coherences,
a reversible phenomenon allowing spin echo techniques.
There is presently a second reason for referring to these behaviours in the MR domain, namely the
fact that DiVincenzo suggested the use of electron spins for the physical realization of qubits more than
twenty years ago [37]. Between two neighbouring electron spins, there may exist a strong exchange
interaction, a strictly quantum phenomenon historically first identified by Heisenberg in magnetically
ordered materials. This is the first reason for our choice of a Heisenberg coupling in the BQSS problem.
The second one is that, on the formal side, the version of the Heisenberg Hamiltonian with spherical or
cylindrical symmetry, simple enough to be used in theoretical works, may serve as a benchmark in
that BQSS problem. It should be recalled that an Ising coupling, simpler to manipulate theoretically
than the Heisenberg one, was present in the DiVincenzo 1995 paper, where it helped in the operating
process, while the presence of the Heisenberg coupling is undesired and should be compensated for in
the BQSS context.
It is well-known that the ESR lines of transition ions in insulators at moderate concentrations are
broadened by the dipolar magnetic coupling between the electron spins, the exchange interaction being
negligible then. In concentrated samples, exchange is stronger than dipolar coupling and produces a
narrowing of the lines [36]. Dipolar coupling is long ranged and anisotropic, which should lead to
heavy theoretical treatments if considering a three-dimensional configuration in the BQSS context.
Future technological developments could possibly make e.g., the consideration of a planar square
lattice of dipolar coupled spins meaningful in that context.

4.2. Probabilities in Measurements, Classical versus Quantum World

In this subsection, we are interested in our first approach as defined in Section 2, with measurements
at the Mixer output (cf. Figure 1). We specifically consider the solutions to BQSS discussed in [7,8,11],
with two spins 1/2, each prepared in a pure state at $t_0$, then submitted to an undesired Heisenberg
cylindrical coupling [28,38] (axial component: $J_z$, normal component: $J_{xy}$, cf. Equation (4) and
Appendix E of [8], and [36]), and measurements of $s_{1z}$ and $s_{2z}$ at the output of the formal Mixer at
$t_1$. The probabilities of obtaining $(+1/2, +1/2)$, $(+1/2, -1/2)$, $(-1/2, +1/2)$ and $(-1/2, -1/2)$ are
denoted, respectively, as $p_1$, $p_2$, $p_3$ and $p_4$ (as in [8], while in [7] e.g., our present $p_4$ was denoted as $p_2$).
We keep Equation (13) for both qubits, with the choice $\varphi_1 = 0$. One then gets [8]:

$$p_1 = r_1^2 r_2^2, \qquad p_4 = (1 - r_1^2)(1 - r_2^2). \tag{14}$$

$p_2$ depends upon a mixing parameter $v = \mathrm{sgn}(\cos\Delta_E)\sin\Delta_E$, with [8] $\Delta_E = -J_{xy}(t_1 - t_0)/\hbar$. This
expression for $\Delta_E$ may be visualized as the opposite of the phase rotation $\Delta\phi = \omega(t_1 - t_0)$ between
states coupled by a Hamiltonian term with energy $J_{xy}$, during the time interval $(t_1 - t_0)$, with $\omega$ given
by the Planck–Einstein relation $\omega = J_{xy}/\hbar$. Probability $p_2$ satisfies

$$p_2 = r_1^2(1 - r_2^2)(1 - v^2) + (1 - r_1^2)\,r_2^2\,v^2 - 2 r_1 r_2 \sqrt{1 - r_1^2}\,\sqrt{1 - r_2^2}\,\sqrt{1 - v^2}\;v \sin\Delta_I \tag{15}$$

and, with our choice for $\varphi_1$, $\Delta_I = \varphi_2$.
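Equations (14) and (15) may be compared with a direct numerical evolution of the qubit pair. The sketch below assumes $\hbar = 1$ and the Hamiltonian convention $H = -2J_{xy}(s_{1x}s_{2x} + s_{1y}s_{2y}) - 2J_z s_{1z}s_{2z}$, chosen to be consistent with the Ising form $-2Js_{1z}s_{2z}$ quoted in this subsection; this convention, the truncated Taylor series used for the matrix exponential and all function names are assumptions of the illustration, not taken from [8].

```python
import cmath
import math

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def expm(A, terms=80):
    """Matrix exponential by a truncated Taylor series (adequate for small norms)."""
    n = len(A)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def heisenberg_matrix(J_xy, J_z):
    """Assumed H in the basis (|++>, |+->, |-+>, |-->), with hbar = 1."""
    return [[-J_z / 2, 0, 0, 0],
            [0, J_z / 2, -J_xy, 0],
            [0, -J_xy, J_z / 2, 0],
            [0, 0, 0, -J_z / 2]]

def simulated_probabilities(r1, r2, phi2, J_xy, J_z, dt):
    """p1..p4 for s_1z, s_2z measured after evolving the product state of Eq. (13)."""
    q1 = [r1, math.sqrt(1 - r1 * r1)]                            # phi_1 = 0
    q2 = [r2, math.sqrt(1 - r2 * r2) * cmath.exp(1j * phi2)]
    c0 = [q1[0] * q2[0], q1[0] * q2[1], q1[1] * q2[0], q1[1] * q2[1]]
    U = expm([[-1j * dt * h for h in row] for row in heisenberg_matrix(J_xy, J_z)])
    c = [sum(U[i][j] * c0[j] for j in range(4)) for i in range(4)]
    return [abs(x) ** 2 for x in c]

def formula_probabilities(r1, r2, phi2, J_xy, dt):
    """p1, p2, p4 from Eqs. (14)-(15), with Delta_I = phi_2."""
    delta_E = -J_xy * dt
    v = math.copysign(1.0, math.cos(delta_E)) * math.sin(delta_E)
    p1 = r1 ** 2 * r2 ** 2
    p4 = (1 - r1 ** 2) * (1 - r2 ** 2)
    p2 = (r1 ** 2 * (1 - r2 ** 2) * (1 - v ** 2) + (1 - r1 ** 2) * r2 ** 2 * v ** 2
          - 2 * r1 * r2 * math.sqrt(1 - r1 ** 2) * math.sqrt(1 - r2 ** 2)
          * math.sqrt(1 - v ** 2) * v * math.sin(phi2))
    return p1, p2, p4
```

Under these assumptions the simulated $p_1$, $p_2$ and $p_4$ coincide with the closed-form expressions, including when $\cos\Delta_E$ is negative, and $J_z$ does not affect the probabilities.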

In Equation (13), which describes the initial state of the qubit pair, $r_1$, $r_2$, $\varphi_1$ and $\varphi_2$ are used to
define probability amplitudes, i.e., quantum quantities. Expressions (14) and (15) show that $p_1$, $p_4$ and
$p_2$ depend upon both $r_1$ and $r_2$, and that $p_2$ moreover depends upon $\Delta_I$; the probabilities therefore
clearly follow quantum laws. This instance illustrates the distinction to be made between the quantum
status of these probabilities and the validity of the classical approximation for the physical supports
that store them. In [7,8,11], once $r_1$, $r_2$ and $\Delta_I$ were known, the initially prepared qubit states were
completely known, and in the context of classical-processing BQSS, we called $r_1$, $r_2$ and $\Delta_I$ the sources
(cf. Section 3) in order to focus on the quantities used in the SS process.
The concept of RV is often used in a classical context. Since on the contrary probabilities
$p_1$, $p_4$ and $p_2$ follow quantum laws, treating them as RV does not go without saying. However,
Equations (14) and (15) establish that when $r_1$, $r_2$, $\varphi_2$ are RV (cf. Section 3) the same is true for $p_1$, $p_4$
and $p_2$. They also indicate that $p_1$, $p_4$ and $p_2$ depend upon both $r_1$ and $r_2$, and that $p_2$ also depends
upon $\Delta_I$. When $J_{xy} = 0$ (Ising Hamiltonian $-2Js_{1z}s_{2z}$), then $v = 0$ and, for the state at the Mixer
output, $p_1 p_4 = p_2 p_3$, which can be interpreted as follows. The four states defining the $B_+$ basis
are then eigenstates of the Hamiltonian, but time evolution introduces phase differences, and it can
be verified that the state at the Mixer output is entangled (except if, accidentally, $J(t_1 - t_0)/\hbar = k\pi$,
$k$ being an integer). However, when measuring $s_{1z}$ and $s_{2z}$, the probability of getting $(1/2, 1/2)$ is
then time-independent, which is also true for the probabilities of getting $(1/2, -1/2)$, $(-1/2, 1/2)$ or
$(-1/2, -1/2)$. Therefore, both products $p_1 p_4$ and $p_2 p_3$ are time-independent, and since $p_1 p_4 = p_2 p_3$ at
$t_0$, because the qubit pair is then in a product state, this equality is preserved as time goes on, although
the state has become entangled.
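The Ising case lends itself to a short numerical check. The sketch below, with $\hbar = 1$, applies the diagonal evolution generated by $-2Js_{1z}s_{2z}$ to a product state and compares the probabilities and the quantity $c_1c_4 - c_2c_3$ before and after; the function names and numerical values are assumptions made for this illustration.

```python
import cmath
import math

def ising_evolve(c0, J, dt):
    """exp(-i H dt) c0 with H = -2 J s1z s2z (hbar = 1), diagonal in the standard basis."""
    energies = [-J / 2, J / 2, J / 2, -J / 2]   # for |++>, |+->, |-+>, |-->
    return [cmath.exp(-1j * e * dt) * c for e, c in zip(energies, c0)]

def defect(c):
    """c1*c4 - c2*c3, zero if and only if the ket is unentangled."""
    return c[0] * c[3] - c[1] * c[2]

# Product state built from Eq. (13) with r1 = 0.6, r2 = 0.8, phi1 = 0, phi2 = 0.9.
q1 = [0.6, 0.8]
q2 = [0.8, 0.6 * cmath.exp(0.9j)]
c0 = [q1[0] * q2[0], q1[0] * q2[1], q1[1] * q2[0], q1[1] * q2[1]]

c_generic = ising_evolve(c0, J=1.0, dt=0.7)       # J dt not a multiple of pi
c_special = ising_evolve(c0, J=1.0, dt=math.pi)   # J dt = pi
p0 = [abs(x) ** 2 for x in c0]
p_generic = [abs(x) ** 2 for x in c_generic]
```

In this example the four probabilities are unchanged by the evolution, so that $p_1 p_4 = p_2 p_3$ persists, while the defect is nonzero at the generic time and vanishes again when $J\,dt = \pi$.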
In the end, these measurements made at the output of the Mixer establish a bridge between the
classical and the quantum worlds, the results being kept on macroscopic devices for which the classical
approximation is valid, while the probabilities of their occurrences follow quantum laws.

4.3. An Unentanglement Criterion Using Probabilities

The unentanglement criterion expressed through Equation (4) uses the $c_i$ coefficients, i.e.,
probability amplitudes. However, measurements give access to probabilities, not to probability
amplitudes, and the question of establishing whether this unentanglement criterion could be formulated
with probabilities (of the results from spin component measurements) therefore seems relevant. State $|\Phi\rangle$
being present at the output of the inverting block, and the components $s_{1u}$ and $s_{2u}$ being then measured,
we denote the probabilities of obtaining $(1/2, 1/2)$, $(1/2, -1/2)$, $(-1/2, 1/2)$ and $(-1/2, -1/2)$ as
$P_{1u}$, $P_{2u}$, $P_{3u}$, $P_{4u}$, respectively, and the corresponding eigenstates of $s_{1u}\,s_{2u}$ as $|+u, +u\rangle$, $|+u, -u\rangle$,
$|-u, +u\rangle$ and $|-u, -u\rangle$. If e.g., $s_{1x}$ and $s_{2x}$ are measured, the probabilities are denoted as $P_{ix}$, with
$i = 1$ to 4. In Section 3, it was said that an unentangled normed ket $|\Psi\rangle$ in $E_1 \otimes E_2$ possesses four
degrees of freedom. Taking the squared modulus of each member of the equality $c_1 c_4 = c_2 c_3$ leads to

$$P_{1z} P_{4z} = P_{2z} P_{3z}. \tag{16}$$

Then, taking $\vec{u}$ and $\vec{v}$ of Section 2 both along direction $x$, we know that $c_{1x} c_{4x} = c_{2x} c_{3x}$ for an
unentangled state (cf. Equation (8)), and therefore that

$$P_{1x} P_{4x} = P_{2x} P_{3x}. \tag{17}$$

Equations (16) and (17) taken together are however weaker than condition $c_1 c_4 = c_2 c_3$, as can be tested by
considering the following state:

$$|\Psi_{i-i11}\rangle = \frac{1}{2}\left(i\,|++\rangle - i\,|+-\rangle + |-+\rangle + |--\rangle\right). \tag{18}$$

$|\Psi_{i-i11}\rangle$ is entangled since $c_1 c_4 = -c_2 c_3$. It can be written

$$|\Psi_{i-i11}\rangle = \frac{1}{2}\left(|+x, +x\rangle + i\,|+x, -x\rangle - |-x, +x\rangle + i\,|-x, -x\rangle\right). \tag{19}$$

Equation (19) shows that the four probabilities $P_{ix}$ attached to $|\Psi_{i-i11}\rangle$ are all equal to 1/4. Therefore,
$|\Psi_{i-i11}\rangle$ satisfies (16) and (17), while being entangled.
The two qubits being in the state $|\Psi\rangle$ expressed through (1), one may decide to treat the three
orthogonal directions on the same footing, measuring successively $s_x$ for both spins, then, in a new
set of preparations/measurements, $s_y$ for both spins, and finally $s_z$ for both spins. The probabilities
of obtaining $(1/2, 1/2)$, $(1/2, -1/2)$, $(-1/2, 1/2)$, $(-1/2, -1/2)$, respectively, when measuring $s_{1k}$
and $s_{2k}$ (with $k$ successively equal to $x$, $y$, and $z$), will be denoted as $P_{1k}$, $P_{2k}$, $P_{3k}$ and $P_{4k}$. For e.g.,
the entangled state $|\Psi_{i-i11}\rangle$, as $P_{1z}P_{4z} = P_{2z}P_{3z}$ and $P_{1x}P_{4x} = P_{2x}P_{3x}$, the hope is that entanglement can
be detected thanks to $P_{1y}P_{4y} \neq P_{2y}P_{3y}$, but, in fact, the four $P_{iy}$ are equal to 1/4. Therefore, measuring
the same spin component for both qubits, successively for $x$, $y$ and $z$, fails to allow us to build up an
unentanglement criterion.
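The behaviour of the state of Equation (18) under same-axis measurements can be reproduced numerically. The sketch below projects the ket on the eigenstates of the chosen spin components; the function name, the dictionary of bras and the phase conventions $|\pm x\rangle = (|+\rangle \pm |-\rangle)/\sqrt{2}$, $|\pm y\rangle = (|+\rangle \pm i|-\rangle)/\sqrt{2}$ are assumptions of this illustration.

```python
import math

S = 1.0 / math.sqrt(2.0)
# Bras <+u| and <-u| for u = z, x, y, as (component on <+|, component on <-|).
BRAS = {
    "z": [(1.0, 0.0), (0.0, 1.0)],
    "x": [(S, S), (S, -S)],
    "y": [(S, -1j * S), (S, 1j * S)],
}

def probabilities(c, axis1, axis2):
    """Probabilities of (+,+), (+,-), (-,+), (-,-) when measuring s_1,axis1 and s_2,axis2
    on the ket c = [c1, c2, c3, c4] written in the standard basis."""
    out = []
    for b1 in BRAS[axis1]:
        for b2 in BRAS[axis2]:
            amp = sum(b1[a] * b2[b] * c[2 * a + b] for a in range(2) for b in range(2))
            out.append(abs(amp) ** 2)
    return out

psi = [0.5j, -0.5j, 0.5, 0.5]                      # state of Eq. (18)
same_axis = {k: probabilities(psi, k, k) for k in "xyz"}
defect = psi[0] * psi[3] - psi[1] * psi[2]         # c1*c4 - c2*c3
```

For this state every same-axis probability equals 1/4, so the three product equalities hold although the defect $c_1c_4 - c_2c_3$ has modulus 1/2.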
However, since two spins are present, there is still the possibility of not systematically measuring
the same spin component for both spins. One chooses to measure successively $s_z$ for both spins, then
$s_{1z}$ and $s_{2x}$ in a new set of preparations/measurements, and finally $s_{1z}$ and $s_{2y}$. The presence of the
$s_{1z}$ measurement in each of these sets corresponds to recognizing that (1) uses the standard basis.
The probabilities of obtaining $(1/2, 1/2)$, $(1/2, -1/2)$, $(-1/2, 1/2)$, $(-1/2, -1/2)$, respectively, when
measuring $s_{1i}$ and $s_{2j}$ (with $i = z$, $x$, or $y$, and $j = z$, $x$, or $y$) will be denoted as $P_{1ij}$, $P_{2ij}$, $P_{3ij}$ and $P_{4ij}$.
Denoting the $c_i$ introduced in Equation (1) as $c_i = \rho_i e^{i\psi_i}$, then from Equation (4) it is known that $|\Psi\rangle$ is
unentangled if and only if

$$\{\rho_1 \rho_4 = \rho_2 \rho_3 \ \text{and} \ \psi_1 + \psi_4 = \psi_2 + \psi_3 \bmod 2\pi\}. \tag{20}$$

Measuring $\{s_{1z}, s_{2z}\}$ allows us to know the moduli $|c_i|^2 = \rho_i^2$ in (1), and to express the first equality
in Equation (20) as

$$P_{1zz} P_{4zz} = P_{2zz} P_{3zz}. \tag{21}$$

The $P_{kzx}$ and $P_{kzy}$ (with $k = 1$ to 4), when expressed as functions of the moduli $\rho_l$ and angles $\psi_m$,
depend upon trigonometric functions of the $\psi_m$ angles. For instance, for any state $|\Psi\rangle$, entangled or not,

$$2P_{1zx} = (\rho_1^2 + \rho_2^2) + 2\rho_1 \rho_2 \cos(\psi_1 - \psi_2). \tag{22}$$

When expressing unentanglement through probabilities, one then has to try and respect both
$\cos\alpha = \cos\beta$ and $\sin\alpha = \sin\beta$ with $\alpha$ and $\beta$ values compatible with the equality $\psi_1 + \psi_4 = \psi_2 + \psi_3$, rather
than to respect the equality $\psi_1 + \psi_4 = \psi_2 + \psi_3$ (mod $2\pi$) itself. If it is first known that simultaneously
$P_{1zz}P_{4zz} = P_{2zz}P_{3zz}$ and $P_{1zx}P_{4zx} = P_{2zx}P_{3zx}$ are true, then one immediately deduces that
$\cos(\psi_1 - \psi_2) = \cos(\psi_3 - \psi_4)$. In addition, if $P_{1zy}P_{4zy} = P_{2zy}P_{3zy}$ replaces the second equality, one
deduces that $\sin(\psi_1 - \psi_2) = \sin(\psi_3 - \psi_4)$. Therefore, when the three equalities between probability
products are satisfied, then $\rho_1\rho_4 = \rho_2\rho_3$ and $\psi_1 + \psi_4 = \psi_2 + \psi_3$ (mod $2\pi$). Conversely, if $|\Psi\rangle$ is
unentangled, then Equation (8) implies that $P_{1zj}P_{4zj} = P_{2zj}P_{3zj}$, with $j = z, x, y$ respectively. Finally,

$$c_1 c_4 = c_2 c_3 \iff \{P_{1zj} P_{4zj} = P_{2zj} P_{3zj}, \ \text{with} \ j = x, y, z\}. \tag{23}$$

The equivalence therefore is between a single relation between probability amplitudes and a
triplet of relations between probabilities. This criterion, although established in the context of BQSS,
has the same general validity as Equation (4).
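Criterion (23) can be sketched as a numerical test on the three sets of probabilities $P_{izz}$, $P_{izx}$ and $P_{izy}$. In the code below the probabilities are computed exactly from a known ket, whereas an experiment would supply estimated frequencies; the function names, the tolerance, the eigenstate phase conventions and the example kets are assumptions of this illustration.

```python
import cmath
import math

S = 1.0 / math.sqrt(2.0)
# Bras <+u| and <-u| for u = z, x, y, as (component on <+|, component on <-|).
BRAS = {
    "z": [(1.0, 0.0), (0.0, 1.0)],
    "x": [(S, S), (S, -S)],
    "y": [(S, -1j * S), (S, 1j * S)],
}

def probabilities(c, axis1, axis2):
    """Probabilities of (+,+), (+,-), (-,+), (-,-) for s_1,axis1 and s_2,axis2."""
    out = []
    for b1 in BRAS[axis1]:
        for b2 in BRAS[axis2]:
            amp = sum(b1[a] * b2[b] * c[2 * a + b] for a in range(2) for b in range(2))
            out.append(abs(amp) ** 2)
    return out

def criterion_23(c, tol=1e-12):
    """True if P1zj*P4zj = P2zj*P3zj holds for j = z, x and y."""
    for j in "zxy":
        P = probabilities(c, "z", j)
        if abs(P[0] * P[3] - P[1] * P[2]) > tol:
            return False
    return True

# A product state built from Eq. (13), and the entangled state of Eq. (18).
q1 = [0.6, 0.8 * cmath.exp(0.2j)]
q2 = [0.8, 0.6 * cmath.exp(0.9j)]
c_product = [q1[0] * q2[0], q1[0] * q2[1], q1[1] * q2[0], q1[1] * q2[1]]
c_eq18 = [0.5j, -0.5j, 0.5, 0.5]
```

On these examples the product state passes the test, while the state of Equation (18), which same-axis measurements could not flag, fails it through the $\{s_{1z}, s_{2x}\}$ probabilities.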
Use of criterion (23) necessitates successive measurements first of $s_{1z}$ and $s_{2z}$, then (after new
preparations) of $s_{1z}$ and $s_{2x}$, and finally (again after new preparations) of $s_{1z}$ and $s_{2y}$, in order to
successively estimate first the $P_{izz}$ probabilities, then the $P_{izx}$ and finally the $P_{izy}$. One must measure
$s_{1z}$ each time, because (1) getting e.g., $(+1/2, -1/2)$ when measuring $s_{1z}$ and $s_{2z}$ is an event to be
distinguished from the one realized when measuring $s_{1z}$ and $s_{2x}$ and getting $(+1/2, -1/2)$, (2) results
of measurements of $s_{1z}$ and $s_{2x}$ are independent only if $|\Psi\rangle$ is unentangled, which precisely cannot be
assumed when Equation (23) is to be used.
The two distinguishable spins were made to play different roles in the process that led to
Equation (23) (systematic measurement of $s_{1z}$). This dissymmetry is only partial, as Equation (23) can
be replaced by a version obtained by exchanging the spin numbers. The next subsection makes a
symmetrical use of measurements of spin components, allowing one to get the values of both the $\rho_i$
moduli and the $\psi_i$ angles for the $c_i$ coefficients in Equation (1).

4.4. Knowing 2-Qubit Pure States from sij Measurements

If a qubit pair physically realized with spins 1/2 is known to be in an arbitrary pure state described
by $|\Psi\rangle$ written as in Equation (1), with $c_i = \rho_i e^{i\psi_i}$ and $i = 1$ to 4, then in order to know $|\Psi\rangle$, one should
know three moduli $\rho_i$ and three angles $\psi_i$. Accessing these six real quantities is more demanding
than testing the unentanglement of $|\Psi\rangle$, since once these quantities are known, it is always possible to know
whether $|\Psi\rangle$ is unentangled, by testing whether both equalities $\rho_1\rho_4 = \rho_2\rho_3$ and $\psi_1 + \psi_4 = \psi_2 + \psi_3$
are satisfied. On the contrary, when one focuses upon entanglement, these two equalities may be
found to be satisfied, while the values of the $\rho_i$ and $\psi_i$ are unknown. In the previous subsection,
an unentanglement criterion using only probabilities in the measurements of the $s_{ij}$ components,
equivalent to the $c_1 c_4 = c_2 c_3$ criterion, was given. Its existence suggests the following question: is
it possible to access these six real quantities using only probabilities of results in the measurements
of the spin components? We are going to show that the answer is yes. It is already known that
measurements of both $s_{1z}$ and $s_{2z}$ give access to the moduli $\rho_i$, through the probabilities $P_{izz}$ introduced
in Section 4.3. One is left with e.g., determining the three angle differences $(\psi_1 - \psi_3)$, $(\psi_2 - \psi_3)$ and
$(\psi_4 - \psi_3)$ from well-chosen probabilities. We first consider measurements of $s_{1z}$ and $s_{2i}$, with $i = x$ or $y$,
as in Section 4.3. When measuring $s_{1z}$ and $s_{2x}$, the probabilities of getting $(1/2, 1/2)$ and $(-1/2, 1/2)$
are, respectively,

$$P_{1zx} = \frac{1}{2}|c_1 + c_2|^2, \qquad P_{3zx} = \frac{1}{2}|c_3 + c_4|^2, \tag{24}$$

which leads to

$$\cos(\psi_1 - \psi_2) = \frac{2P_{1zx} - P_{1zz} - P_{2zz}}{2\sqrt{P_{1zz} P_{2zz}}}, \qquad \cos(\psi_3 - \psi_4) = \frac{2P_{3zx} - P_{3zz} - P_{4zz}}{2\sqrt{P_{3zz} P_{4zz}}}. \tag{25}$$

Similarly, when measuring $s_{1z}$ and $s_{2y}$, the probabilities of getting $(1/2, 1/2)$ and $(-1/2, 1/2)$ are,
respectively,

$$P_{1zy} = \frac{1}{2}|c_1 - ic_2|^2, \qquad P_{3zy} = \frac{1}{2}|c_3 - ic_4|^2, \tag{26}$$

which leads to

$$\sin(\psi_1 - \psi_2) = -\frac{2P_{1zy} - P_{1zz} - P_{2zz}}{2\sqrt{P_{1zz} P_{2zz}}}, \qquad \sin(\psi_3 - \psi_4) = -\frac{2P_{3zy} - P_{3zz} - P_{4zz}}{2\sqrt{P_{3zz} P_{4zz}}}. \tag{27}$$

Expressions (25) and (27) allow us to know both $(\psi_1 - \psi_2)$ and $(\psi_3 - \psi_4)$ (mod $2\pi$).
Now, exchanging the roles of spins 1 and 2, we successively measure $\{s_{1x}, s_{2z}\}$ and (after new
preparations) $\{s_{1y}, s_{2z}\}$. The probabilities of getting $(1/2, 1/2)$ in these measurements are, respectively,

$$P_{1xz} = \frac{1}{2}|c_1 + c_3|^2, \qquad P_{1yz} = \frac{1}{2}|c_1 - ic_3|^2, \tag{28}$$

which leads to

$$\cos(\psi_1 - \psi_3) = \frac{2P_{1xz} - P_{1zz} - P_{3zz}}{2\sqrt{P_{1zz} P_{3zz}}}, \qquad \sin(\psi_1 - \psi_3) = -\frac{2P_{1yz} - P_{1zz} - P_{3zz}}{2\sqrt{P_{1zz} P_{3zz}}}. \tag{29}$$

$(\psi_1 - \psi_3)$ is therefore known (mod $2\pi$).
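The procedure of Equations (24) to (29) may be sketched as a reconstruction routine. The code below computes exact probabilities for the five settings $\{s_{1z}, s_{2z}\}$, $\{s_{1z}, s_{2x}\}$, $\{s_{1z}, s_{2y}\}$, $\{s_{1x}, s_{2z}\}$, $\{s_{1y}, s_{2z}\}$ and rebuilds the ket up to a global phase, fixed here by $\psi_1 = 0$. The function names, the example ket and the assumption that the moduli entering the denominators are nonzero are illustrative choices.

```python
import cmath
import math

S = 1.0 / math.sqrt(2.0)
BRAS = {"z": [(1.0, 0.0), (0.0, 1.0)],
        "x": [(S, S), (S, -S)],
        "y": [(S, -1j * S), (S, 1j * S)]}

def probabilities(c, axis1, axis2):
    """Probabilities of (+,+), (+,-), (-,+), (-,-) for s_1,axis1 and s_2,axis2."""
    return [abs(sum(b1[a] * b2[b] * c[2 * a + b] for a in range(2) for b in range(2))) ** 2
            for b1 in BRAS[axis1] for b2 in BRAS[axis2]]

def angle(P_a, P_b, P_cos, P_sin):
    """Phase difference from Eqs. (25)/(27) or (29); assumes P_a * P_b > 0."""
    norm = 2.0 * math.sqrt(P_a * P_b)
    cos_val = (2.0 * P_cos - P_a - P_b) / norm
    sin_val = -(2.0 * P_sin - P_a - P_b) / norm
    return math.atan2(sin_val, cos_val)

def reconstruct(Pzz, Pzx, Pzy, Pxz, Pyz):
    """Ket [c1..c4] rebuilt from the five sets of probabilities, with psi_1 = 0."""
    rho = [math.sqrt(p) for p in Pzz]
    d12 = angle(Pzz[0], Pzz[1], Pzx[0], Pzy[0])   # psi_1 - psi_2
    d34 = angle(Pzz[2], Pzz[3], Pzx[2], Pzy[2])   # psi_3 - psi_4
    d13 = angle(Pzz[0], Pzz[2], Pxz[0], Pyz[0])   # psi_1 - psi_3
    psi = [0.0, -d12, -d13, -d13 - d34]
    return [r * cmath.exp(1j * p) for r, p in zip(rho, psi)]

raw = [0.5, 0.3 + 0.4j, -0.2 + 0.5j, 0.1 - 0.6j]          # arbitrary example ket
norm = math.sqrt(sum(abs(x) ** 2 for x in raw))
c_true = [x / norm for x in raw]
settings = [("z", "z"), ("z", "x"), ("z", "y"), ("x", "z"), ("y", "z")]
c_rec = reconstruct(*[probabilities(c_true, a1, a2) for a1, a2 in settings])
overlap = abs(sum(x.conjugate() * y for x, y in zip(c_true, c_rec)))
```

For this example the overlap between the original and the reconstructed ket is equal to 1 up to rounding, i.e., the two kets differ by a global phase only.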

If one wants to identify not the state at the Mixer input but a pure state at the Inverter output,
State Tomography (ST) may in principle be used. However, it is far simpler to make measurements
for the five $\{s_{1i}, s_{2j}\}$ pairs just considered and to access the corresponding probabilities, than to use ST.
The reason is that ST claims to be valid for any quantum state, and therefore does not take advantage
of the fact that the qubit pair is presently known to be in a pure state. The dimension of the state
space of the qubit pair being four, then, for ST, one has to introduce sixteen operators, namely the
Identity, the six operators $s_{1i}$ and $s_{2j}$ (with $i = x, y, z$, and $j = x, y, z$), and the nine products $s_{1i}s_{2j}$ [20].
One should determine experimentally fifteen mean values, giving access to fifteen independent real
values together defining the density operator describing the qubit pair state (three diagonal real
elements, and six non-diagonal complex elements).
The simpler state estimation procedure proposed in this section therefore opens the way to new
classes of BQSS methods, which we have just started to explore in [12,13] by applying this procedure to
the Mixer output.

5. Disentanglement and Cylindrical-Symmetry Heisenberg Coupling

In Section 4.2, we considered measurements made at the Mixer output. We now come to the
method for BQSS used, e.g., in [9], with classical processing in the adapting block of the separating
system, using the notations of [9]. $|\Psi(t_0)\rangle$, the initial product state of the qubit pair, is given by
Equation (1), with the values of the coefficients $c_i$ (in the $B_+$ basis) taken at $t_0$ and denoted as $c_i(t_0)$.
These components form the source vector

$$C_+(t_0) = [c_1(t_0), c_2(t_0), c_3(t_0), c_4(t_0)]^T, \qquad T: \text{transpose}. \tag{30}$$

Similarly, the state at the Mixer output at time $t$, here denoted as $|\Psi(t)\rangle$, is given by
Equation (1), with the values of the coefficients $c_i$ (in the $B_+$ basis) taken at $t$ and denoted as $c_i(t)$.
The coupling-induced transition from state $|\Psi(t_0)\rangle$ to $|\Psi(t)\rangle$ is interpreted as the transformation
induced by the Mixer, leading to the appearance of $|\Psi(t)\rangle$ at its output. In the same basis, $|\Psi(t)\rangle$
is described by the column vector $C_+(t)$ given by (30), with $t$ replacing $t_0$. In the matrix formalism,
the relation between $C_+(t_0)$ and $C_+(t)$ is written as

$$C_+(t) = M C_+(t_0), \tag{31}$$

where the square fourth-order matrix $M$ describes the effect of the coupling. In [8], it was shown that
when the coupling may be described by a Heisenberg cylindrical Hamiltonian, then $M = QDQ^{-1}$,
where $Q = Q^{-1}$ is a square matrix with the following non-zero matrix elements:

$$Q_{11} = Q_{44} = 1, \qquad Q_{22} = -Q_{33} = Q_{23} = Q_{32} = \frac{1}{\sqrt{2}}, \tag{32}$$

and $D$ is a diagonal square matrix with its diagonal elements equal to $D_{ii} = e^{-i\omega_i(t - t_0)}$ ($i = 1 \ldots 4$),
the $\omega_i$ being real quantities depending upon $J_z$ and $J_{xy}$, with generally unknown numerical values.
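The structure $M = QDQ^{-1}$ with $Q = Q^{-1}$ can be verified numerically from Equation (32). In the sketch below the $\omega_i$ values and the time interval are arbitrary placeholders (their actual dependence upon $J_z$ and $J_{xy}$ is not used), and the helper names are assumptions of this illustration.

```python
import cmath
import math

S = 1.0 / math.sqrt(2.0)
# Matrix Q of Eq. (32) in the basis (|++>, |+->, |-+>, |-->).
Q = [[1.0, 0.0, 0.0, 0.0],
     [0.0, S, S, 0.0],
     [0.0, S, -S, 0.0],
     [0.0, 0.0, 0.0, 1.0]]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def dagger(A):
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]

def mixing_matrix(omegas, dt):
    """M = Q D Q with D_ii = exp(-i * omega_i * dt); the omegas are placeholder values."""
    D = [[cmath.exp(-1j * omegas[i] * dt) if i == j else 0.0 for j in range(4)]
         for i in range(4)]
    return mat_mul(mat_mul(Q, D), Q)

def max_deviation_from_identity(A):
    return max(abs(A[i][j] - (1.0 if i == j else 0.0)) for i in range(4) for j in range(4))

QQ = mat_mul(Q, Q)
M = mixing_matrix([0.3, -1.1, 0.7, 0.3], dt=2.0)
MMdag = mat_mul(M, dagger(M))
```

Both `QQ` and `MMdag` are equal to the identity up to rounding, which confirms that $Q$ is its own inverse and that $M$ is unitary for real $\omega_i$.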
The input of the inverting block then receives this state $|\Psi(t)\rangle$. Its output provides a state $|\Phi\rangle$ described
in the $B_+$ basis by a column vector $C$, with

$$C = U C_+(t) = U M C_+(t_0), \tag{33}$$

where the square matrix $U$ (Unmixing matrix) describes the effect of the inverting block of the
separating system. If it is possible to choose $U$ in the form $U = M^{-1}$, then $|\Phi\rangle$ will be equal to
$|\Psi(t_0)\rangle$. However, strictly speaking, operating this way is impossible because $M = QDQ$, and $D$
is unknown. In [9], the inverting block was formally built using a chain of quantum gates globally
realizing matrix $U$ in the form $U = Q\tilde{D}Q$, where $\tilde{D}$ is a diagonal matrix with its four diagonal elements
$\tilde{D}_{ii}$ ($i = 1 \ldots 4$) equal to

$$\tilde{D}_{ii} = e^{i\gamma_i}, \qquad \gamma_i: \text{free real parameters}. \tag{34}$$

$\tilde{D}D = \Delta$ is therefore a diagonal matrix with diagonal elements $\Delta_{ii} = e^{i\delta_i}$, where

$$\delta_i = \gamma_i - \omega_i(t - t_0). \tag{35}$$

The $\tilde{D}$ matrix and the adaptation phase were introduced because it is not possible to modify the
values of the $D$ matrix. In the following discussion, it is assumed that the $\omega_i$ are time-independent and
that the adaptation phase has been successful with respect to unentanglement, i.e., that it has been
possible to adjust the $\gamma_i$ in such a way that, in the inversion phase, if the Writer has prepared each
qubit of the qubit pair in an arbitrary pure state at time $t_0$, we are then sure that state $|\Phi\rangle$ at the output
of the inverting block is unentangled. The column vectors $C_+(t_0)$ and $C$ are associated with $|\Psi(t_0)\rangle$
and $|\Phi\rangle$ respectively, and $C = Q\Delta Q C_+(t_0)$ is therefore the column vector

$$\begin{pmatrix} e^{i\delta_1} c_1(t_0) \\ \left[e^{i\delta_2}(c_2(t_0) + c_3(t_0)) + e^{i\delta_3}(c_2(t_0) - c_3(t_0))\right]/2 \\ \left[e^{i\delta_2}(c_2(t_0) + c_3(t_0)) - e^{i\delta_3}(c_2(t_0) - c_3(t_0))\right]/2 \\ e^{i\delta_4} c_4(t_0) \end{pmatrix}. \tag{36}$$

State $|\Phi\rangle$ is unentangled if and only if Equation (4) is fulfilled, i.e., if

$$e^{i(\delta_1 + \delta_4)} c_1 c_4 = \frac{1}{4}\left[2 c_2 c_3 \left(e^{i2\delta_2} + e^{i2\delta_3}\right) + (c_2^2 + c_3^2)\left(e^{i2\delta_2} - e^{i2\delta_3}\right)\right] \tag{37}$$

($c_i$ meaning $c_i(t_0)$, for $i = 1$ to 4). We want this relation to be satisfied for any unentangled $|\Psi(t_0)\rangle$.
Starting with a $|\Psi(t_0)\rangle$ state with $c_2(t_0)c_3(t_0) \neq 0$ and remembering that $c_1(t_0)c_4(t_0) = c_2(t_0)c_3(t_0)$,
Equation (37) may then be written

$$e^{i(\delta_1 + \delta_4)} - \frac{1}{2}\left(e^{i2\delta_2} + e^{i2\delta_3}\right) = \frac{c_2^2(t_0) + c_3^2(t_0)}{4 c_2(t_0) c_3(t_0)}\left(e^{i2\delta_2} - e^{i2\delta_3}\right). \tag{38}$$

Equation (38) is required to be fulfilled for all possible states $|\Psi(t_0)\rangle$ with $c_2(t_0)c_3(t_0) \neq 0$, and
for fixed $\delta_i$ values (defined once and for all during the adaptation phase). Its left-hand term does not
depend upon the $c_i(t_0)$, whereas its right-hand term does depend upon them. Therefore, Equation (38)
is satisfied only if

$$e^{i2\delta_2} - e^{i2\delta_3} = 0, \quad \text{i.e.,} \quad \delta_3 - \delta_2 = m\pi, \quad m: \text{integer}, \tag{39}$$

and then Equation (38) moreover imposes that

$$\delta_1 + \delta_4 = 2\delta_2 + 2k\pi, \quad k: \text{integer}. \tag{40}$$

If Equations (39) and (40) and relation $c_1(t_0)c_4(t_0) = c_2(t_0)c_3(t_0)$ are inserted into Equation (36),
it is easy to write $|\Phi\rangle$ as a product state, which confirms that if Equations (39) and (40) are fulfilled,
then $|\Phi\rangle$ is indeed unentangled.
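Conditions (39) and (40) can be illustrated by evaluating the column vector of Equation (36) for a product input. In the sketch below the $\delta_i$ are given directly (in practice they result from the adaptation of the $\gamma_i$), and the function names, tolerance and numerical values are assumptions of this illustration.

```python
import cmath
import math

def inverter_output(c, delta):
    """Column vector C of Eq. (36) for input coefficients c = [c1..c4] at t0."""
    d1, d2, d3, d4 = delta
    e2, e3 = cmath.exp(1j * d2), cmath.exp(1j * d3)
    return [cmath.exp(1j * d1) * c[0],
            (e2 * (c[1] + c[2]) + e3 * (c[1] - c[2])) / 2,
            (e2 * (c[1] + c[2]) - e3 * (c[1] - c[2])) / 2,
            cmath.exp(1j * d4) * c[3]]

def defect(C):
    """C1*C4 - C2*C3, zero if and only if the output ket is unentangled."""
    return C[0] * C[3] - C[1] * C[2]

def satisfies_39_40(delta, tol=1e-12):
    d1, d2, d3, d4 = delta
    cond39 = abs(cmath.exp(2j * d2) - cmath.exp(2j * d3)) < tol
    cond40 = abs(cmath.exp(1j * (d1 + d4)) - cmath.exp(2j * d2)) < tol
    return cond39 and cond40

# Product input built from Eq. (13) with (r1, phi1, r2, phi2) = (0.6, 0.2, 0.8, 0.9).
q1 = [0.6, 0.8 * cmath.exp(0.2j)]
q2 = [0.8, 0.6 * cmath.exp(0.9j)]
c_in = [q1[0] * q2[0], q1[0] * q2[1], q1[1] * q2[0], q1[1] * q2[1]]

delta_good = (0.4, 1.1, 1.1 + math.pi, 2 * 1.1 - 0.4)   # m = 1 in (39), k = 0 in (40)
delta_bad = (0.4, 1.1, 0.3, 0.2)                        # violates (39) and (40)
C_good = inverter_output(c_in, delta_good)
C_bad = inverter_output(c_in, delta_bad)
```

With `delta_good` the output defect vanishes up to rounding, and with $m$ odd the second and third components of the output are exchanged up to a phase, so that the output is unentangled without being identical to the input. With `delta_bad` the same product input yields an entangled output.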
If one now supposes e.g., a $|\Psi(t_0)\rangle$ with $c_3(t_0) = 0$, $c_2(t_0) \neq 0$, $c_4(t_0) \neq 0$, and therefore $c_1(t_0) = 0$,
then in order for $|\Phi\rangle$ to be unentangled Equation (37) has to be fulfilled. Putting $c_1(t_0) = c_3(t_0) = 0$
into Equation (37) leads to Equation (39), and the $\delta_i$ are then not submitted to another constraint.
The same behaviour is found if $c_4(t_0) = c_3(t_0) = 0$, and $c_1(t_0) \neq 0$, $c_2(t_0) \neq 0$, and this remains true if
$c_1(t_0) = c_2(t_0) = c_4(t_0) = 0$, $c_3(t_0) \neq 0$.
When one starts with an arbitrary initial unentangled state $|\Psi(t_0)\rangle$, the following property is a
consequence of the results of the previous discussion. If during the adaptation phase it has been
possible to rightly fix the $\gamma_i$ values, one may claim that the corresponding $|\Phi\rangle$ is unentangled if and
only if during that adaptation phase the choice of the $\gamma_i$ has allowed conditions (39) and (40) to be both
fulfilled. This, however, does not guarantee that $|\Phi\rangle$ is identical to $|\Psi(t_0)\rangle$. The latter identification
corresponds to source restoration itself, outside the scope of this article.

6. Conclusions
Conventional BSS is a mature field of Signal Processing, with various applications. Its extension
into a quantum context has been developing for a decade, first through the creation of theoretical
methods for Blind Quantum Source Separation (BQSS), with classical and/or quantum processing,
and recently through the use of BQSS in the exploration of Blind Quantum Process Tomography
(BQPT). The present paper examined in detail concepts (e.g., those of quantum sources and of
their independence) and established properties (e.g., an unentanglement criterion) introduced in
our previous papers. In the BQSS context, with qubits supposed to be realized with spins 1/2, one
has to face two major consequences of the quantum behaviour. First, if each qubit of a spin qubit
pair is initially prepared in a pure state, and the time evolution of the pair state is governed by some
undesired coupling between the spins, the Reader at the Mixer output accesses an unknown generally
entangled qubit pair quantum state. This entangled state may be sent to a quantum processing system
in order to restore the initially prepared state. Writing the output state of this processing system as
e.g., $|\Phi\rangle = \sum_i c_i |i\rangle$ in the standard basis, with well-ordered basis states, we showed that this state is
unentangled if and only if $c_1 c_4 = c_2 c_3$, a constraint between probability amplitudes. Secondly, results
of measurements of the qubit spin components have a probabilistic nature, and the corresponding
probabilities follow quantum properties even when processed with classical means. This article shows
precautions to be taken when trying to extend to Blind Quantum SS the concept of source statistical
independence used in conventional BSS. Using the probabilities $P_{izj}$ of getting the different possible
results when measuring $s_{1z}$ and $s_{2j}$, successively with $j = z$, $x$ and $y$, it is shown that the above
unentanglement criterion may be written as $\{P_{1zj} P_{4zj} = P_{2zj} P_{3zj}\}$, a set of three constraints between
probabilities. This unentanglement criterion has already been used in the adaptation phase of Blind
Quantum SS, through a disentanglement-based separation principle, before restoration of the initial
unentangled state. The already developed BQSS/BQPT methods do not depend on some specific
interpretation of Quantum Theory, while respecting its general postulates.
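
The two forms of the unentanglement criterion recalled above can be compared numerically. The Python sketch below is an illustration under stated assumptions: the amplitudes are ordered as (++, +-, -+, --), the probabilities $P_1, \ldots, P_4$ follow the same ordering of measurement outcomes, and all function names are hypothetical.

```python
import numpy as np

# Columns are the eigenvectors (+ first, then -) of the Pauli operator along j,
# written in the z basis of the second qubit.
MEASUREMENT_BASES = {
    "z": np.array([[1, 0], [0, 1]], dtype=complex),
    "x": np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2),
    "y": np.array([[1, 1], [1j, -1j]], dtype=complex) / np.sqrt(2),
}


def amplitude_criterion(c, tol=1e-9):
    """True if c1 c4 = c2 c3 for amplitudes ordered (++, +-, -+, --)."""
    c1, c2, c3, c4 = c
    return abs(c1 * c4 - c2 * c3) < tol


def outcome_probabilities(c, j):
    """Probabilities (P1, P2, P3, P4) when measuring s_1z and s_2j."""
    m = np.asarray(c, dtype=complex).reshape(2, 2)   # rows: qubit 1, columns: qubit 2
    amplitudes = m @ MEASUREMENT_BASES[j].conj()     # project qubit 2 on the j eigenvectors
    return (np.abs(amplitudes) ** 2).ravel()


def probability_criterion(c, tol=1e-9):
    """True if P1 P4 = P2 P3 holds for j = z, x and y."""
    for j in ("z", "x", "y"):
        p1, p2, p3, p4 = outcome_probabilities(c, j)
        if abs(p1 * p4 - p2 * p3) >= tol:
            return False
    return True


product = np.kron([0.6, 0.8j], [1, 1]) / np.sqrt(2)   # unentangled product state
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)             # entangled
flat = np.array([1, 1, 1, -1]) / 2                     # entangled, uniform z statistics
```

In this sketch the state `flat` satisfies the $j = z$ constraint although it is entangled, and it is the $j = x$ constraint that fails, which illustrates why the probability form involves a set of constraints rather than a single one.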

Acknowledgments: This theoretical study was performed without financial support. The costs to publish in open
access were handled by Yannick Deville, in the framework of the research activities and projects that he is heading
in his lab (Institut de Recherche en Astrophysique et Planétologie).
Author Contributions: This theoretical study was performed by Alain Deville and Yannick Deville, in connection
with the research activities about related topics that they also performed together (see above-mentioned papers).
Both authors participated in writing this paper.
Conflicts of Interest: The authors declare no conflict of interest.

Appendix A. About Applications of Blind Conventional and Quantum Source Separation

Appendix A.1. Conventional BSS
Some audio systems aim at automatic recognition of speech by a processing unit, e.g., in order to
control actuators (for instance, a car driver can thus control various car functions by speech). When
a speech signal is recorded by a set of microphones situated in a noisy environment, each recorded
signal is a mixture of speech and of various noise signals. In order to avoid a degraded recognition
performance in case these plain recordings were directly provided to an automatic speech recognition
(ASR) system, these recordings may be first pre-processed by means of a BSS system, so as to extract
the speech signal. The denoised speech output of this BSS system is then provided to the ASR system
(see [11] and references therein).
When using radio-frequency signals to transmit digital data, reception antennas may
simultaneously receive several mixed data streams. BSS is then applied to first unmix these signals.
Each extracted signal may then be separately used as required in the considered application. The use of BSS in
radio-frequency identification (RFID) systems is briefly presented in [11].
The biomedical field makes systematic use of signals such as electrocardiograms (ECGs) or
electroencephalograms (EEGs), processed by human experts or computers. This “main task” is often
difficult because each signal in the recorded set is a mixture of various contributions, and the information
of interest thus cannot be easily extracted from any such mixed signal. Again, a solution to this problem
consists of pre-processing the original recordings by means of BSS methods, so as to extract each
signal component of interest separately on each output of this BSS system. In [11], information
is given about the extraction of foetal heartbeats from ECG recordings which were mixtures of
large-magnitude maternal heartbeats, low-magnitude foetal heartbeats and noise components.
These foetal heartbeats were hardly visible in the original recordings.
BSS is closely related to the so-called Blind System Identification (BSI). The problem of
describing an unknown classical (i.e., non-quantum) system through a realistic model is called system
identification. When e.g., this system may be described by a matrix, the task is the determination of
its matrix elements. In Blind System Identification, some properties of the input signals are known,
but the input signals themselves are unknown. Methods for BSS often include the determination of
the unknown mixer function or of its inverse. This is a kind of BSI problem, called Blind Mixture
Identification (BMI).
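
As an illustration of blind separation for a linear instantaneous mixture, the Python sketch below whitens two mixtures and then searches for the rotation that makes the outputs most non-Gaussian. This kurtosis-based search is a textbook ICA-style technique chosen here for brevity, not the method of the cited references; the mixing matrix, the uniform sources and all names are assumptions.

```python
import numpy as np


def excess_kurtosis(y):
    y = y - y.mean()
    return np.mean(y ** 4) / np.mean(y ** 2) ** 2 - 3.0


def separate_two_sources(x, n_angles=360):
    """Blindly unmix two linear instantaneous mixtures (rows of x)."""
    x = x - x.mean(axis=1, keepdims=True)
    # Whitening: decorrelate the mixtures and normalise their variances.
    eigval, eigvec = np.linalg.eigh(x @ x.T / x.shape[1])
    z = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T @ x
    # After whitening only an orthogonal matrix remains; a rotation search
    # recovers it up to the sign and the order of the outputs.
    best_y, best_score = None, -np.inf
    for theta in np.linspace(0.0, np.pi / 2, n_angles, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        y = np.array([[c, s], [-s, c]]) @ z
        score = abs(excess_kurtosis(y[0])) + abs(excess_kurtosis(y[1]))
        if score > best_score:
            best_y, best_score = y, score
    return best_y


rng = np.random.default_rng(0)
sources = rng.uniform(-1.0, 1.0, size=(2, 5000))     # unknown to the separator
mixing_matrix = np.array([[1.0, 0.6], [0.4, 1.0]])   # unknown to the separator
outputs = separate_two_sources(mixing_matrix @ sources)

# Absolute correlation between each source (rows) and each output (columns).
match = np.abs(np.corrcoef(np.vstack([sources, outputs]))[:2, 2:])
```

The separator only sees the mixtures. The sources are recovered up to sign, scale and ordering, which corresponds to the usual indeterminacies of BSS.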

Appendix A.2. Blind Quantum Source Separation

The acronym BQSS describes the operations aimed at recovering the source state(s) (possibly up
to some accepted indeterminacies), in a context already described in this paper. BQSS with classical
processing can already be used, e.g., by physicists, in possible experiments requiring methods for
retrieving information about individual quantum states from measurements performed after undesired
coupling between these states, e.g., when dealing with quantum phenomena involving electron spins
1/2. BQSS with quantum processing keeps the quantum form of the available mixed data and processes
them by means of quantum circuits in order to retrieve the quantum sources. This version of our QSS
methods could be of interest for the core of future quantum computers, where both the data to be
processed and the processing means will have a quantum form. Quantum-processing BQSS may then
be used as a pre-processing stage, to remove undesired alterations (e.g., due to Heisenberg coupling
between physical qubits made with electron spins) of the data to be provided to the input of the main
processing stage, which then applies the final quantum algorithm to these pre-processed data. It was
explained in Appendix A.1 that such a two-stage system architecture is already used in
conventional BSS.
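
The effect of an undesired coupling on a product state, and its removal by the inverse of the mixing function, can be sketched numerically. The Python example below assumes an isotropic Heisenberg Hamiltonian $H = (J/4)(\sigma_x \otimes \sigma_x + \sigma_y \otimes \sigma_y + \sigma_z \otimes \sigma_z)$ with $\hbar = 1$ and a known value of $Jt$; both are illustrative assumptions, and in a blind setting the inverse would have to be estimated rather than known.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)


def heisenberg_evolution(j_times_t):
    """exp(-i H t) for H = (J/4)(XX + YY + ZZ), hbar = 1 (assumed model)."""
    h = sum(np.kron(s, s) for s in (SX, SY, SZ)) / 4.0
    eigval, eigvec = np.linalg.eigh(h)
    return eigvec @ np.diag(np.exp(-1j * j_times_t * eigval)) @ eigvec.conj().T


def entanglement_gap(c):
    """|c1 c4 - c2 c3|, which vanishes for unentangled two-qubit pure states."""
    return abs(c[0] * c[3] - c[1] * c[2])


initial = np.kron([1, 0], [0, 1]).astype(complex)   # product state |+->
mixer = heisenberg_evolution(np.pi / 2)              # undesired coupling, J t = pi / 2
mixed = mixer @ initial                              # state available at the mixer output
restored = mixer.conj().T @ mixed                    # inverse of the mixing function
```

For this choice of $Jt$ the mixed state has $|c_1 c_4 - c_2 c_3| = 1/2$, the largest possible value, while the restored state is again a product state.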
Independently from BQSS, the QIP community has already developed what is called Quantum
Process Tomography (QPT), the quantum version of system identification, which operates in a
non-blind way. It turns out that BQSS, by estimating the inverse of the mixing function, is also able
to estimate this function itself, i.e., the parameters of the considered coupling operator (possibly up
to some residual transforms, called indeterminacies as in classical BSS). BQSS therefore opens the
way to introducing the blind version of QPT (called BQPT), i.e., performing QPT essentially without
knowing the values of the input quantum states of the considered process (but e.g., requiring them to
be unentangled). The applications related to BQSS thus include applications of BQPT, as a spin-off.
In [14], it was recalled that QPT is considered the gold standard for fully characterising quantum
systems, and in particular for characterising the quantum logic gates that form the basic elements of a
quantum computer. Extending the standard QPT tool to BQPT, its blind version, should be of interest,
e.g., when the input states of the considered process indeed cannot be known, or when it is important
to benefit from the fact that BQSS avoids the intrinsic complexity of standard QPT methods. For more
details about the applications of BQSS and BQPT, the interested reader may refer to [11,14], and to
references therein.

References
1. Laloë, F. Comprenons-Nous Vraiment la Mécanique Quantique; EDP Sciences: Les Ulis, France, 2011;
English version: Do We Really Understand Quantum Mechanics? Cambridge University Press: Cambridge,
UK, 2012.
2. Cohen-Tannoudji, C.; Diu, B.; Laloë, F. Mécanique Quantique; Hermann: Paris, France, 1973; English version:
Quantum Mechanics; John Wiley: New York, NY, USA, 1977.
3. Dirac, P. Quantum Mechanics of Many-Electron Systems. Proc. R. Soc. A 1929, 123, 714–733.
4. Timpson, C.G. Quantum Information Theory and the Foundations of Quantum Mechanics. Ph.D. Thesis,
University of Oxford, Oxford, UK, 2004.
5. Comon, P.; Jutten, C. (Eds.) Handbook of Blind Source Separation: Independent Component Analysis and Applications;
Academic Press: Oxford, UK, 2010.
6. Deville, Y. Blind Source Separation and Blind Mixture Identification Methods. In Wiley Encyclopedia of Electrical
and Electronics Engineering; Webster, J., Ed.; Wiley: Hoboken, NJ, USA, 2016; pp. 1–33.
7. Deville, Y.; Deville, A. Blind separation of quantum states: Estimating two qubits from an isotropic Heisenberg
spin coupling model. In Proceedings of the 7th International Conference on Independent Component
Analysis and Signal Separation, London, UK, 9–12 September 2007; Davies, M.E., James, C.J., Abdallah, S.A.,
Plumbley, M.D., Eds.; Springer: Berlin, Germany, 2007; pp. 706–713.
8. Deville, Y.; Deville, A. Classical-processing and quantum-processing signal separation methods for qubit
uncoupling. Quantum Inf. Process. 2012, 11, 1311–1347.
9. Deville, Y.; Deville, A. A quantum-feedforward and classical-feedback separating structure adapted with
monodirectional measurements; blind qubit uncoupling capability and links with ICA. In Proceedings
of the 23rd IEEE International Workshop on Machine Learning for Signal Processing, Southampton, UK,
22–25 September 2013.
10. Deville, Y.; Deville, A. Blind qubit state disentanglement with quantum processing: Principle, criterion
and algorithm using measurements along two directions. In Proceedings of the 2014 IEEE International
Conference on Acoustics, Speech and Signal Processing, Florence, Italy, 4–9 May 2014; pp. 6262–6266.
11. Deville, Y.; Deville, A. Quantum-Source Independent Component Analysis and Related Statistical Blind Qubit
Uncoupling Methods. In Blind Source Separation: Advances in Theory, Algorithms and Applications; Naik, G.R.,
Wang, W., Eds.; Springer: Berlin, Germany, 2014; pp. 3–37.
12. Deville, Y.; Deville, A. From blind quantum source separation to blind quantum process tomography.
In Proceedings of the 12th International Conference on Latent Variable Analysis and Signal Separation,
Liberec, Czech Republic, 25–28 August 2015; Vincent, E., Yeredor, A., Koldovský, Z., Tichavský, P., Eds.;
Springer: Berlin, Germany, 2015; pp. 184–192.
13. Deville, Y.; Deville, A. Blind quantum computation: Blind quantum source separation and blind quantum
process tomography. In Proceedings of the 19th Conference on Quantum Information Processing, Banff, AB,
Canada, 10–15 January 2016.
14. Deville, Y.; Deville, A. Blind quantum source separation: Quantum-processing qubit uncoupling systems
based on disentanglement. Digit. Signal Process. 2017, 67, 30–51.
15. Deville, Y. Traitement du Signal: Signaux Temporels et Spatiotemporels—Analyse des Signaux, Théorie de
L’information, Traitement D’antenne, Séparation Aveugle de Sources; Ellipses Editions Marketing: Paris, France,
2011. (In French)
16. Feynman, R.P. Quantum Mechanical Computers. Opt. News 1985, 11, 11–20.
17. Feynman, R.P. Feynman Lectures on Computation; Perseus Publishing: Cambridge, MA, USA, 1996.
18. Peres, A. Separability Criterion for Density Matrices. Phys. Rev. Lett. 1996, 77, 1413–1415.
19. Horodecki, M.; Horodecki, P.; Horodecki, R. Separability of mixed states: Necessary and sufficient conditions.
Phys. Lett. A 1996, 223, 1–8.
20. Nielsen, M.A.; Chuang, I.L. Quantum Computation and Quantum Information; Cambridge University Press:
Cambridge, UK, 2000.

21. Buchleitner, A.; Viviescas, C.; Tiersch, M. (Eds.) Entanglement and Decoherence (Lecture Notes in Physics);
Springer: Berlin, Germany, 2009.
22. Köhler, J.; Disselhorst, J.A.J.M.; Donckers, M.C.J.M.; Groenen, E.J.J.; Schmidt, J.; Moerner, W.E. Magnetic
resonance of a single molecular spin. Nature 1993, 363, 242–244.
23. Gruber, A.; Dräbenstedt, A.; Tietz, C.; Fleury, L.; Wrachtrup, J.; von Borczyskowski, C. Scanning Confocal
Optical Microscopy and Magnetic Resonance on Single Defect Centers. Science 1997, 276, 2012–2014.
24. Rugar, D.; Budakian, R.; Mamin, H.J.; Chui, B.W. Single spin detection by magnetic resonance force microscopy.
Nature 2004, 430, 329–332.
25. Otte, A.F. Can data be stored in a single magnetic atom? Europhys. News 2008, 38, 31–34.
26. Bienfait, A.; Pla, J.J.; Kubo, Y.; Stern, M.; Zhou, X.; Lo, C.C.; Weis, C.D.; Schenkel, T.; Thewalt, M.L.W.;
Vion, D.; et al. Reaching the quantum limit of sensitivity in electron spin resonance. arXiv 2015, arXiv:1507.06831.
27. Hyvärinen, A.; Karhunen, J.; Oja, E. Independent Component Analysis; Wiley: New York, NY, USA, 2001.
28. Abragam, A. The Principles of Nuclear Magnetism; Oxford University Press: Oxford, UK, 1961.
29. Tolman, R.C. The Principles of Statistical Mechanics; Oxford University Press: Oxford, UK, 1938; p. 327.
30. Von Neumann, J. Les Fondements Mathématiques de la Mécanique Quantique; Alcan: Paris, France, 1946; Editions
Jacques Gabay: Paris, France, 1988. (In French)
31. Barnett, S.M. Quantum Information; Oxford University Press: Oxford, UK, 2009.
32. Fuchs, C.A.; Peres, A. Quantum theory needs no “interpretation”. Phys. Today 2000, 53, 70–71.
33. Margenau, H. Quantum-Mechanical description. Phys. Rev. 1936, 49, 240–242.
34. Margenau, H. Critical Points in Modern Physical Theory. Philos. Sci. 1937, 4, 337–370.
35. Feynman, R.P. Statistical Mechanics; Basic Books: New York, NY, USA, 1972.
36. Abragam, A.; Bleaney, B. Electron Paramagnetic Resonance of Transition Ions; Oxford University Press: Oxford,
UK, 1970.
37. DiVincenzo, D.P. Quantum Computation. Science 1995, 270, 255–261.
38. Fazekas, P. Electron Correlation and Magnetism; World Scientific: Hackensack, NJ, USA, 1999.
