Gabriele Carcassi, Christine A. Aidala

Assumptions of Physics
Ver 2.0 - October 1, 2023

This edition was finalized on October 1, 2023. Older and newer versions can be found at https://assumptionsofphysics.org/book.
Copyright © 2018-23 Gabriele Carcassi, Christine A. Aidala

assumptionsofphysics.org/book

Licensed under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) (the “License”). You may not use this file except in compliance with the License. You may obtain a copy of the License at https://creativecommons.org/licenses/by-nc-sa/4.0/. This work is distributed under the License “as is”, without warranties or conditions of any kind, either express or implied.
Preface

This work is part of a larger project, Assumptions of Physics (https://assumptionsofphysics.org), that aims to identify a handful of physical principles from which the basic laws can be
rigorously derived. The goal is to give physics (and science more in general) a renewed foun-
dation that is mathematically precise, physically meaningful and philosophically consistent.
Given the ambition and broad scope of the task, nothing would ever be written if one were
to wait for the complete picture. Therefore this work contains only the parts of the project
that are considered to be mature, and so it will be revised and expanded as progress is made.
We give here a brief overview of the project, which can be useful to better understand the
context of this work.

Overall goals of Assumptions of Physics


What do the basic laws of physics describe? Why is the state of a classical particle identi-
fied by position and momentum (i.e. a point on a symplectic manifold) while the state of
a quantum system is identified by a wave function (i.e. a vector in a Hilbert space)? What
assumptions are unique to each theory? What are, instead, the basic requirements that all
physical theories must satisfy? Could we have had different laws? A lack of clear answers to
these questions is, we believe, the biggest obstacle in the foundations of physics and prevents
the resolution of outstanding problems in the field. Our approach is to find a minimal set
of physical assumptions that are necessary and sufficient to derive the different
theories within a unified framework. If we are able to do so, then we are guaranteed that
all that the laws of physics say is encoded in those assumptions and we are able to answer
those questions.
We found this approach to be very fruitful. It provides new insights into physics as a whole and the role of mathematics in physical theories, and gives a more solid conceptual foundation to
both. It becomes clear why some mathematical structures are pervasive in science and what
exactly they are meant to represent, while others will never play a role. The downside is that
we have to touch many subjects in math (logic, topological spaces, measure theory, group
theory, vector spaces, differential geometry, symplectic geometry, statistics, probability the-
ory, ...), physics (Hamiltonian mechanics, Lagrangian mechanics, thermodynamics, quantum
mechanics, electromagnetism, ...) and science in general (computer science theory, informa-
tion theory, system theory, ...). In other words, the only way to properly achieve the
goal is to rebuild everything from the ground up: formal rigor, physical significance
and conceptual integrity are not something that can be added at the end, but they must be
present from the beginning.
The main takeaway for us is that the foundations of science are one: no real progress can
be made on the foundations of one subject without making progress on the foundations of others. What is needed is a general theory of experimental science: a theory that studies physical theories. This provides a standard framework that defines basic concepts
and requirements (e.g. experimental verifiability, granularity of descriptions, processes and
states) that serve as a common basis for all theories. Each theory, then, is recovered by
studying how these common objects become specialized under different assumptions. This
book aims to build over time, piece by piece, this framework.
While the topic is necessarily inter-disciplinary, this is still first and foremost a sci-
entific book. The material should be accessible to the mathematician and philosopher, but it needs to resonate first and foremost with the experimental physicist and
the engineer. The mathematical definitions and derivations are there to make the science pre-
cise, but they are not the main focus. In fact, the book is designed so that the mathematical
definitions and proofs, highlighted with a green side bar, can be skipped altogether without
loss of the big picture and the important details. Along the same line, the foundational dis-
cussions are there to articulate more precisely what it means to do science, so they will not
indulge in other questions which may be of interest to the philosopher but not to the scientist.

A living work
For the project to be successful, we need to depart from some of the norms of academic
research and academic publishing. For example, one typically develops a research program
as a series of articles published in a peer reviewed journal that caters to a specific community.
These articles are then typically collected as is or merged into a book. This does not work for
this project. As journals are specialized into sometimes very narrow fields, this would create
a set of disjoint articles that cater to different audiences, with no guarantee that they can fit
into a unified vision. For us, the overall picture, if and how the different perspectives combine,
is the most important feature. In this sense, the book comes first and the articles are
derivative works. We need to pool expertise and ideas from a wide range of disciplines and
make sure that the result makes sense from all angles.
As the goal is broad, the framework needs to evolve as new issues are solved and old ones
are better understood. If one part changes, we have to make sure that everything is updated
to keep conceptual consistency. This book, therefore, is an ongoing project. It will
continue to grow organically, adding and revising chapters. As is standard practice in open
source/free software communities, we need to “release early, release often” to gather feedback.
Each new version supersedes all prior ones and will be superseded by future ones. There is
therefore no “definitive version” in the near future as we don’t expect to “solve all of physics”
in the near future. However, the framework will tend to converge as different parts become
more settled.
The upshot is that one only needs to read the latest version of this work to be current.
That is, one does not need to read a scattered set of papers, which require previous knowledge
of a field, and follow how the ideas have changed. Just get the latest copy of the book, and if
you do find areas you can help us expand or improve, let us know!

Project overview
Here we present a summary of the whole project and the status of each part; this layout maps to the structure of this work. We divide the work based on the two main techniques we use.
The first, reverse physics, aims to identify the fundamental ideas and assumptions by reverse engineering them from the current physical theories. The second, physical mathematics,
aims to construct a rigorous mathematical framework from the ground up, based on the ideas
and assumptions found by reverse physics.

Reverse physics
Reverse physics looks at the main physical theories, like classical mechanics, thermodynamics
and quantum mechanics, to identify concepts that can be used to fully explain the common
and different aspects of those theories. The idea is to find physically meaningful assumptions
that can be shown to be equivalent to the physical laws.
The standard of rigor in this part is necessarily more relaxed as we do not have a guarantee
that sufficiently mature mathematical tools exist to carry out the argument in a precise
fashion. For example, we have found that the idea of a unit system is linked in a fundamental
way to the notion of state spaces, yet we lack a fully developed mathematical framework
to model units and their dependency. The goal is to test the ideas conceptually, find those
that are broad enough and necessary enough to then justify investing further time in a more
rigorous approach.
The following are examples of the type of assumptions we have found to be good starting
points to rederive the different theories.

Determinism and reversibility: “The system undergoes deterministic and reversible evo-
lution.” Mathematically, the physical properties of the system determine which category,
in the mathematical sense, is used to describe the state space, and deterministic and
reversible evolution will be an isomorphism in the category (i.e. a bijective map that
preserves the physical properties of the system). Therefore the law of evolution is not
just a bijective map, but is also a linear transformation, a differentiable map or an
isometry depending on the context.
Infinitesimal reducibility: “Specifying the state of the whole system is equivalent to spec-
ifying the state of all its infinitesimal parts.” For example, we can study the motion of
a ball, but we can also mark a spot in red and study the motion of the mark. Knowing
the evolution of the whole ball means knowing the evolution of any arbitrary spot and
vice-versa. Mathematically, the state of the whole will be a distribution over the state
space of the parts. It will need to be a distribution whose value is invariant under coor-
dinate transformations. The state space of the infinitesimal parts, then, comes equipped
with an invariant two-form upon which we can define such a distribution. The state
space is therefore a symplectic manifold, that is, the states of the infinitesimal parts are
described by pairs of conjugate variables, which recovers phase space. If the previous
assumption holds, deterministic and reversible evolution is a symplectomorphism, that is,
deterministic and reversible evolution follows classical Hamiltonian mechanics. Proper
handling of the time variable will give us a relativistic version of the framework without
extra assumptions.
Irreducibility: “Specifying the state of the whole system tells us nothing about its infinites-
imal parts.” For example, we can study the state of an electron by scattering photons
off of it. But whenever a photon interacts with the electron, it interacts with the whole
electron. There is no way to mark a part of an electron and study it independently from
the rest. Mathematically, the state of the electron will be a distribution that evolves de-
terministically where the motion of each infinitesimal part cannot be further described.
Kinematic equivalence: “Specifying the motion of the system is equivalent to specifying its state and evolution.” This means that we will have to be able to re-express a distri-
bution over kinematic variables (i.e. position and velocity) into a distribution over state
variables (i.e. position and momentum) and vice-versa. Mathematically, the symplectic
two-form will induce a symmetric tensor over the tangent space for position. This will
give us a metric and will also allow us to reformulate the laws of motion according to
Lagrangian mechanics. Because the transformation is linear, we are able to constrain the
Hamiltonian to the one for massive particles under scalar and vector potential forces.

Physical mathematics
The first step in physical mathematics is the development of a general mathematical theory
of experimental science, or simply a general theory, that lays down the basic axioms and
definitions that are required for any physical theory. We need a “theory of scientific
theories” that allows us to define and study the set of all well-formed scientific
theories. The core tenet is the principle of scientific objectivity: “Science is universal, non-
contradictory and evidence based.” This serves to define the subject and the basic formal
requirements of a scientific theory. We divide the general theory into three main parts.

Experimental verifiability. This part defines a scientific theory as a set of statements. It provides the basic axioms which simply characterize the rules of logic as applied to
experimentally verifiable statements. Mathematically, these requirements are captured
by topologies and σ-algebras over the space of the possible cases that can be identified
experimentally. This part is very well developed both conceptually and mathematically,
and it is the only part currently included in this book.
Informational granularity. This part adds the ability to compare the level of description
of different statements. Mathematically, this imposes a preorder on the algebra of state-
ments which is intended to provide a common foundation to geometry, measure theory,
probability theory and information theory. All those structures, in fact, provide ways to
compare the size of sets of possible cases and the description provided in terms of those
sets. This part is fairly developed conceptually but not yet well developed mathemati-
cally.
States and processes. This part defines the basic physical notions of processes, systems
and states. It serves to bring to light a series of hidden assumptions that are needed,
both physically and mathematically, to be able to talk about independent systems.
These will lead to notions of entropy, state spaces and the geometrical structures that
accompany them. This part is still being developed conceptually, though the progress is
encouraging.

Current plan and status


In this version we have added a reverse physics part with a chapter on classical mechanics.
We plan to follow that with a chapter on thermodynamics and statistical mechanics. For
physical mathematics, the work is slowed down by the fact that general mathematical tools
for the informational granularity appear to be missing. In particular, if we want state counting
in quantum mechanics to be consistent with statistical mechanics and information theory, a
more general notion of measure is needed.

Changelog
2023/10/1: Ver 2.0 - Divided the work into two main parts: Reverse Physics and Physical
mathematics. Added chapter on reversing classical mechanics. Minor updates on the
logic section.
2021/03/08: Ver 1.0 - Updated the first three chapters with minor changes: renamed tau-
tology to certainty and contradiction to impossibility as they characterize better their
role in the framework; made more formal justifications for the basic axioms and some
of the basic definitions; causal relationships are now proved to be continuous instead of
assumed to be continuous. Added Part II to include the results that are not yet fully
formalized, to give a sense of the future scope of the work.
2019/07/07: Ver 0.3 - Reviewed first two chapters to clarify the idea of possible assignments
and how contexts for function spaces are constructed.
2019/02/22: Ver 0.2 - Consolidated third chapter on properties, quantities and ordering.
2018/06/22: Ver 0.1 - Consolidated first two chapters that lay the foundation for the general
theory.
Contents

Preface
Project overview

I Reverse Physics

1 Classical mechanics
1.1 Formulations of classical mechanics
1.2 Inequivalence of formulations
1.3 Kinematics vs dynamics
1.4 Reversing Hamiltonian mechanics
1.5 Multiple degrees of freedom
1.6 Reversing differential topology
1.7 Reversing Lagrangian mechanics
1.8 Full kinematic equivalence and massive particles
1.9 Relativistic mechanics
1.10 Reversing phase space
1.11 Reversing Newtonian mechanics
1.12 Directional degree of freedom
1.13 Infinitesimal reducibility
1.14 Summary

II Physical Mathematics

1 Verifiable statements and experimental domains
1.1 Statements
1.2 Verifiable statements and experimental domains
1.3 Theoretical domains and possibilities
1.4 Topological spaces
1.5 Sigma-algebras
1.6 Decidable domains
1.7 Summary
1.8 Reference sheet

2 Domain combination and relationships
2.1 Dependence and equivalence between domains
2.2 Combining domains
2.3 Experimental domain for experimental relationships
2.4 Summary

3 Properties and quantities
3.1 Properties
3.2 Quantities and ordering
3.3 References and experimental ordering
3.4 Discrete quantities
3.5 Arbitrary precision and continuous quantities
3.6 When ordering breaks down
3.7 Summary

III Blueprints for the work ahead

1 Reverse Physics
1.1 Classical mechanics
1.2 Thermodynamics
1.3 Quantum mechanics and irreducibility

2 Physical mathematics
2.1 Experimental verifiability
2.2 Informational granularity
2.3 States and processes
2.4 Open questions and possible extensions

IV Appendix

A Reference sheets for math and physics
A.1 Set theory

Credits
Part I

Reverse Physics


Reverse physics is an approach to the foundations of physics that analyzes known theo-
ries to identify those physical principles and assumptions that can be taken as their conceptual
foundation. This is the analogue of reverse mathematics, a program in mathematical logic
that seeks to determine which axioms are required to prove mathematical theorems.
While some physical theories, such as Newtonian mechanics, thermodynamics and special
relativity, are indeed founded on laws or principles, many theories, such as Hamiltonian and
Lagrangian mechanics, quantum mechanics, and general relativity, are based on mathematical
relationships that are simply postulated without a strict physical justification. Most modern
theories are of the latter type, as theoretical physicists have increasingly focused on mathe-
matical ideas rather than physical starting points. Therefore we do not know why the state of
a quantum system can be represented by a ray in a Hilbert space, or what exact relationship
is represented by the Einstein field equations. The goal of reverse physics, then, is to find
suitable physical premises that can function as a more proper foundation for each theory. We
stress that the premises have to be of a physical nature, such as “the system under study
is isolated” or “the quantity is additive under system composition”. That is, they must be
principles or assumptions that express some physical idea, not some abstract mathematical
notion.
Another issue in modern physics is that it is currently a patchwork of different theories, and
one is trained to match patterns and examples to decide which problems (or what aspects of a
larger problem) should be treated with a particular theory. The goal here is to fully understand
what physical situation each mathematical structure can suitably describe, what connections
can be made across different theories and what exactly are the true limits of applicability of
the different physical theories. One of the main results of reverse physics is that there are
some core ideas that are shared by all physical theories, and that, in fact, physical theories
are not as disconnected from each other as generally thought.
While this work explores connections between different mathematical and physical disci-
plines, it would be impractical to provide even a concise introduction to all. Therefore when
encountering concepts like information entropy, Minkowski space-time, function spaces, quan-
tum observables, Lagrange multipliers, intensive quantities, and so on, we can only provide
the definition with very limited context, as most of the space must be dedicated to the con-
nections between these concepts. It is up to the reader to decide whether to invest additional
time on other sources, or to simply make a mental note and proceed further.
By nature, reverse physics does not aim to be mathematically precise, but rather concep-
tually precise. While the goal is to create a dictionary between physical concepts and their
mathematical representation, this mapping cannot be absolutely perfect for a simple reason:
we have no guarantee that the current mathematical structures are the correct ones to capture
the physical ideas. Conversely, we cannot give a precise mathematical characterization of the
physical ideas if these are not conceptually clear. Therefore conceptual clarity has to come
before formal precision. Yet, it is true that to reach full conceptual clarity the ideas have
to be sufficiently refined to allow a formal definition. This task is the purview of physical
mathematics, which aims to find physically meaningful mathematical structures starting
from definitions and axioms that can be fully supported by physical requirements and as-
sumptions. Yet, it would be premature to engage in such activity without having thoroughly
tested the conceptual framework beforehand.
Chapter 1

Classical mechanics

The standard view in physics is that classical mechanics is perfectly understood. It has three
different but equivalent formulations, the oldest of which, Newtonian mechanics, is based
on three laws. Classical mechanics is the theory of point particles that follow those laws.
Unfortunately, this view is incorrect.
We will see that the three formulations are not equivalent, in the sense that there are phys-
ical systems that are Newtonian but not Hamiltonian and vice-versa. There are also a number
of questions that have been left unanswered, such as the precise nature of the Hamiltonian
or the Lagrangian, and what exactly the principle of stationary action represents physically.
While shedding light on these issues, we will also find that classical mechanics already contains
elements that are typically associated with other theories, such as quantum mechanics/field
theories (uncertainty principle, anti-particles), thermodynamics/statistical mechanics (ther-
modynamic and information entropy conservation) or special relativity (energy as the time
component of a four-vector). In other words, the common understanding of classical mechan-
ics is quite shallow, and its foundations are, in fact, not separate from the ones of classical
statistical mechanics or special relativity.
What reverse physics shows is that the central assumption underneath classical mechanics
is that of infinitesimal reducibility (IR): a classical system can be thought of as made of
parts, which in turn are made of parts and so on; studying the whole system is equivalent to
studying all its infinitesimal parts. This assumption, together with the assumption of inde-
pendence of degrees of freedom (IND), is what gives us the structure of classical phase
space with conjugate variables. The additional assumption of determinism and reversibil-
ity (DR), the fact that the description of the system at one time is enough to predict its
future or reconstruct its past, leads us to Hamiltonian mechanics. On the other hand, assuming
kinematic equivalence (KE), the idea that trajectories in space are enough to reconstruct
the state of the system and vice-versa, leads to Newtonian mechanics. The combination of
all above assumptions, instead, leads to Lagrangian mechanics and, in particular, to massive
particles under (scalar and vector) potential forces.
As a guide to the chapter, here is the list of main points in the order in which they will
be presented, one for each section.

1. Review of classical formulations
2. Lagrangian mechanics is Hamiltonian mechanics and KE
3. Kinematics, in general, is not enough to reconstruct dynamics
4. Hamiltonian mechanics (one DOF) is equivalent to DR
5. Hamiltonian mechanics (multiple DOFs) is equivalent to DR plus IND
6. Differential calculus and its generalization, differential topology, study infinitesimally additive quantities that depend on geometric shapes (i.e. lines, surfaces, volumes)
7. The principle of least action is a consequence of DR, IND and KE
8. Massive particles under potential forces are a consequence of DR, IND and KE
9. Special relativity is a consequence of DR, IND and KE
10. Phase space is the only structure that makes distributions, state counting and entropy frame invariant
11. Newtonian mechanics is a consequence of KE
12. Three dimensional spaces are the only spaces for which distributions over directions are frame invariant
13. Classical particle states as points in phase space are equivalent to IR

1.1 Formulations of classical mechanics


In this section we will briefly review the three main formulations of classical mechanics. Our
task is not to present them in detail, but rather to provide a brief summary of the equations
so that we can proceed with the comparison. In particular, given that different conventions
are used across formulations, within the same formulation and among different contexts (e.g.
relativity, symplectic geometry), we will want to make the notation homogeneous to allow
easier comparisons.

Newtonian mechanics
For all formulations, the system is modeled as a collection of point particles, though we will mostly focus on the single particle case. For a Newtonian system, the state of the system at a particular time t is described by the position x^i and velocity v^i of all its constituents. Each particle has its mass m, not necessarily constant in time, and, for each particle, we define the kinetic momentum as Π^i = m v^i.¹
The evolution of our system is given by Newton's second law:²

F^i(x^j, v^k, t) = d_t Π^i .   (1.1)

Mathematically, if the forces F^i are locally Lipschitz³ continuous, then the solution x^i(t) is unique. That is, given position and velocity at a given time, we can predict the position and velocity at future times. We will assume a Newtonian system has this property.
¹ We will use the letter t for the time variable, x for position and v for velocity, which is a very common notation in Newtonian mechanics. However, we will keep using the same letters in Lagrangian mechanics as well, instead of q and q̇, for consistency. Given that the distinction between kinetic and conjugate momentum is an important one, we will denote the former Π and the latter p. The Roman letters i, j, k, ... will be used to span the spatial components (e.g. i ∈ {1, 2, 3} for a particle in 3 dimensional space and i ∈ {1, 2, ..., 3n} for n particles), while we will use the Greek letters α, β, γ, ... to span space-time components (e.g. α ∈ {0, 1, 2, 3} where the 0 value of the index is used for time). Unlike some texts, the x^i do not represent Cartesian coordinates, and therefore they should be understood already as generalized coordinates.
² For derivatives, we will use the shorthand d_t for d/dt and ∂_{x^i} for ∂/∂x^i. For functions that depend on multiple arguments we use a free index to note that it depends on all elements; each argument will have a different index to highlight that there is no relationship between arguments.
³ Lipschitz continuity means that the slope of the function is bounded. For example, √x in the neighborhood of 0 is not Lipschitz continuous as its slope diverges at that point. One can construct examples (e.g. Norton's dome) where the forces are not locally Lipschitz continuous, and therefore the initial position and velocity do not yield a unique solution (i.e. in Norton's dome, the body can stay on the top of the dome indefinitely, or it can fall down after an arbitrary amount of time). In this case, something else, outside the system, will necessarily determine what is the motion of the system, and therefore it is not true that the force and the state of the system fully determine the dynamics of the system.

An important aspect of Newtonian mechanics is that the equations are not invariant under
coordinate transformation. To distinguish between apparent forces (i.e. those dependent on
the choice of frame) and the real ones, we assume the existence of inertial frames. In an inertial
frame there are no apparent forces, and therefore a free system (i.e. no forces) with constant
mass proceeds in linear uniform motion, or stays still.⁴
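As a concrete illustration of the uniqueness property (our addition, not part of the original text; it assumes Python with NumPy and SciPy, and uses an illustrative harmonic force F = −kx with hypothetical constants), the following sketch integrates Newton's second law from a given initial position and velocity and recovers the unique closed-form trajectory:

    # Minimal sketch: integrate Newton's second law d_t(m v) = F(x, v, t)
    # for a Lipschitz-continuous force, here F = -k x (harmonic oscillator).
    # Given (x0, v0) at one time, the trajectory is unique.
    import numpy as np
    from scipy.integrate import solve_ivp

    m, k = 1.0, 4.0  # illustrative constant mass and spring constant

    def newton(t, y):
        x, v = y
        return [v, -k * x / m]  # d_t x = v, d_t v = F/m

    sol = solve_ivp(newton, (0.0, 5.0), [1.0, 0.0], rtol=1e-10, atol=1e-10)
    # Compare with the closed-form solution x(t) = x0 cos(sqrt(k/m) t)
    print(np.allclose(sol.y[0], np.cos(np.sqrt(k / m) * sol.t)))  # True

Changing the initial conditions changes the trajectory, but each choice of (x0, v0) yields exactly one solution, which is the property we assume for Newtonian systems.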

Lagrangian mechanics
The state for a Lagrangian system is also given by position x^i and velocity v^i. The dynamics is specified by a single function L(x^i, v^j, t) called the Lagrangian. For each spatial trajectory x^i(t) we define the action as A[x^i(t)] = ∫_{t₀}^{t₁} L(x^i(t), d_t x^i(t), t) dt. The trajectory taken by the system is the one that makes the action stationary:

δA[x^i(t)] = δ ∫_{t₀}^{t₁} L(x^i(t), d_t x^i(t), t) dt = 0   (1.2)

The evolution can equivalently be specified by the Euler-Lagrange equations:

∂_{x^i} L = d_t ∂_{v^i} L .   (1.3)

Note that not all Lagrangians lead to a unique solution. For example, L = 0 will give the
same action for all trajectories and therefore, strictly speaking, all trajectories are possible.
The stationary action leads to a unique solution if and only if the Lagrangian is hyperregular, which means the Hessian matrix ∂_{v^i} ∂_{v^j} L is invertible. Like in the Newtonian case, we will
assume Lagrangian systems satisfy this property.
Unlike Newton’s second law, both the Lagrangian and the Euler-Lagrange equations are
invariant under coordinate transformations. This means that Lagrangian mechanics is partic-
ularly suited to study the symmetries of the system.

Hamiltonian mechanics
In Hamiltonian mechanics, the state of the system is given by position q^i and conjugate momentum p_i. The dynamics is specified by a single function H(q^i, p_j, t) called the Hamiltonian.⁵ The evolution is given by Hamilton's equations:

d_t q^i = ∂_{p_i} H
d_t p_i = −∂_{q^i} H   (1.4)
⁴ Recall that linear motion simply means that it describes a line in space, while uniform motion means that the speed is constant. Therefore we can have linear non-uniform motion (e.g. an object accelerated along the same direction) or non-linear uniform motion (e.g. an object going around in a circle at constant speed).
⁵ We use a different symbol for position in Hamiltonian mechanics because, while it is true that q^i = x^i, it is also true that ∂_{q^i} ≠ ∂_{x^i}: the first derivative is taken at constant conjugate momentum while the second is taken at constant velocity. This creates absolute confusion when mixing and comparing Lagrangian and Hamiltonian concepts, which our notation avoids completely.

We will again want these equations to yield a unique solution, which means the Hamiltonian
must be at least differentiable, and the derivatives must at least be Lipschitz continuous.
Hamilton’s equations are likewise invariant under coordinate transformations. The Hamiltonian itself is a scalar function
which is often considered (mistakenly as we’ll see later) invariant. This formulation is the
most suitable for statistical mechanics as volumes of phase space correctly count the number
of possible configurations.
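As a sketch of Hamilton's equations in action (our addition, assuming Python with NumPy and SciPy; the harmonic oscillator H = p²/2m + kq²/2 is only an illustrative choice), one can integrate the flow and verify that H is a constant of motion:

    # Sketch: integrate d_t q = dH/dp, d_t p = -dH/dq for
    # H = p^2/(2m) + k q^2/2 and check that H is conserved along the flow.
    import numpy as np
    from scipy.integrate import solve_ivp

    m, k = 1.0, 4.0  # illustrative constants

    def H(q, p):
        return p**2 / (2 * m) + k * q**2 / 2

    def hamilton(t, y):
        q, p = y
        return [p / m, -k * q]  # dH/dp, -dH/dq

    sol = solve_ivp(hamilton, (0.0, 10.0), [1.0, 0.0], rtol=1e-10, atol=1e-10)
    energies = H(sol.y[0], sol.y[1])
    print(np.allclose(energies, energies[0]))  # True: H is constant in time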

1.2 Inequivalence of formulations


It is often stated in physics books that all three formulations of classical mechanics are equiv-
alent. We will look at this claim in detail, and conclude that this is not the case: there are
systems that can be described by one formulation and not another. More precisely, the set of
Lagrangian systems is exactly the intersection of Newtonian and Hamiltonian systems.

Testing equivalence
We will consider two formalisms equivalent if they can be applied to exactly the same sys-
tems. That is, Newtonian and Lagrangian mechanics are equivalent if any system that can
be described using Newtonian mechanics can also be described by Lagrangian mechanics and
vice-versa. In general, in physics great emphasis is put on systems that can indeed be studied
by all three, leaving the impression that this is always doable.⁶ However, just with a cursory
glance, we realize that this can’t possibly be the case.
The dynamics of a Newtonian system, in fact, is specified by three independently chosen
functions of position and velocity, the forces applied to each degree of freedom (DOF). On
the other hand, the dynamics of Lagrangian and Hamiltonian systems is specified by a single
function of position and velocity/momentum, the Lagrangian/Hamiltonian. Intuitively, there
are more choices in the dynamics for Newtonian systems than for Lagrangian and Hamiltonian.
Now, the reality is a bit trickier because the mathematical expression of the forces is not
enough to fully characterize the physical system. We need to know in which frame we are,
what coordinates are being used and the mass of the system, which is potentially a function
of time. On the Lagrangian side, note that the Euler-Lagrange equations are homogeneous in
L. This means that multiplying L by a constant leads to the same solutions, meaning that the
same system can be described by more than one Lagrangian. The converse is also true: if one
system is half as massive and is subjected to a force half as intense, the resulting Lagrangian
is also simply rescaled by a constant factor. Therefore the map between Lagrangians and
Lagrangian systems is not one-to-one: it is many-to-many. This is why we should never look
simply at mathematical structures if we want to fully understand the physics they describe.
Regardless, our task is at the moment much simpler: we only need to show that there are Newtonian systems not expressible by Lagrangian or Hamiltonian mechanics. We can therefore limit ourselves to systems with a specific constant mass m in an inertial frame and write a^i = F^i(x^j, v^k, t)/m. Given that the force is arbitrary, the acceleration can be an arbitrary function of position, velocity and time.
⁶ If one asks the average physicist whether Newtonian and Hamiltonian mechanics are equivalent, the answer most of the time will be enthusiastically positive. If one then asks for the Hamiltonian for a damped harmonic oscillator, the typical reaction is annoyance due to the nonsensical question (damped harmonic oscillators do not conserve energy), followed by a realization and partial retraction of the previous claim. The moral of the story is to never take these claims at face value.

Similarly, we can write the acceleration of a Lagrangian system as a^i = F^i[L]/m. That is, the acceleration is going to be some functional of the Lagrangian. Given the Euler-Lagrange equations 1.3, the map between the Lagrangian and the acceleration must be continuous in both directions: for small variations of the Lagrangian we must have small variations of the equations of motion and therefore of the acceleration, and for small variations of the equations of motion we must have small variations of the Lagrangian. But a continuous surjective map from the space of a single function (i.e. the Lagrangian) to the space of multiple functions (i.e. those that specify the acceleration in terms of position and velocity) does not exist,⁷ and therefore there must be at least one Newtonian system with constant mass expressed in an inertial frame that is not describable using Lagrangian mechanics. The same argument applies for Hamiltonian mechanics, since the dynamics in this case is also described by a single function in the same number of arguments. We therefore reach the following conclusion:

Insight 1.5. Not all Newtonian systems are Lagrangian and/or Hamiltonian.

Newtonian vs Lagrangian/Hamiltonian
We now want to understand whether all Lagrangian systems are Newtonian. Given what we
discussed, we cannot expect to reconstruct the mass and force uniquely from the expression of
the Lagrangian. We consider the mass and the frame fixed by the problem, together with the
Lagrangian, and therefore we must only see whether we can indeed find a unique expression
for the acceleration. From the Euler-Lagrange equations 1.3 we can write

∂_{x^i} L = d_t ∂_{v^i} L = ∂_{x^j} ∂_{v^i} L d_t x^j + ∂_{v^k} ∂_{v^i} L d_t v^k = ∂_{x^j} ∂_{v^i} L v^j + ∂_{v^k} ∂_{v^i} L a^k
∂_{v^k} ∂_{v^i} L a^k = ∂_{x^i} L − ∂_{x^j} ∂_{v^i} L v^j .   (1.6)

To be able to write the acceleration explicitly, we must be able to invert the Hessian matrix ∂_{v^k} ∂_{v^i} L. As we noted before, this is exactly the condition under which the principle of stationary action leads to a unique solution, and we can now better understand why. If the Hessian is not invertible at a point, its determinant is zero and therefore one eigenvalue is zero. The corresponding eigenvector identifies a direction about which the equation tells us nothing, and therefore a variation of the acceleration in that direction will not change the action. This is why the invertibility of the Hessian is required in order to obtain unique solutions.
What we find, then, is that for any Lagrangian system, which we assume to have a unique
solution, we can explicitly write the acceleration as a function of position, velocity and time.
Therefore

Insight 1.7. All Lagrangian systems are Newtonian.

Now we turn our attention to Hamiltonian mechanics and, similarly, we ask whether we
can express the acceleration as a function of position and velocity. We have

a^i = d_t v^i = d_t d_t q^i = d_t ∂_{p_i} H = ∂_{q^j} ∂_{p_i} H d_t q^j + ∂_{p_k} ∂_{p_i} H d_t p_k
    = ∂_{q^j} ∂_{p_i} H ∂_{p_j} H − ∂_{p_k} ∂_{p_i} H ∂_{q^k} H .   (1.8)
⁷ Mathematically, the spaces of continuous functions C(ℝ, ℝ) and C(ℝⁿ, ℝ) are not homeomorphic. Intuitively, the underlying reason is the same as why a map from a volume to a line can't be continuous: in a volume you have infinitely many directions along which you can move away from a point, while on a line you only have two.

Figure 1.1: On the left, the phase-space diagram for a photon treated as a point particle. The Hamiltonian H = c∣p∣, on the right, is proportional to the modulus of p. Since H is not differentiable when p = 0, those states are excluded, consistent with the physics. The displacement field has only a q component, which is +c above the horizontal axis and −c below the horizontal axis.

This tells us that the acceleration is always an explicit function, but it is, in general, an explicit function of position and momentum, not of position and velocity. To change the expression, we need to be able to write the momentum as a function of position and velocity. Note that Hamilton's equations already give a way to express the velocity in terms of position and momentum; we just need that expression to be invertible, which means the Jacobian must be invertible. We must have:

∣∂_{p_i} v^j∣ = ∣∂_{p_i} ∂_{p_j} H∣ ≠ 0 .   (1.9)

To be able to express momentum as a function of position and velocity, then, we need the Hessian of the Hamiltonian to be invertible (i.e. to have non-zero determinant).
Note that we had no such requirement for the Hamiltonian. For example, H = 0 leads to equations d_t q^i = 0 and d_t p_i = 0, which have unique solutions: both position q^i(t) = k_q^i and momentum p_i(t) = k_{p_i} are constants of motion. The Hessian, being the zero matrix, is not invertible, and in fact we cannot write momentum as a function of position and velocity: velocity d_t q^i is always zero in all cases while conjugate momentum can be any value k_{p_i}. Though this case may not be physically interesting, it is a perfectly valid Hamiltonian system and shows that we should always check the trivial mathematical case. However, let us go through a more physically meaningful case.
Photon as a particle. If we want to treat the photon as a classical particle, we can write the Hamiltonian by expressing the energy as a function of momentum

H = ħ∣ω∣ = cħ∣k∣ = c∣p∣ .   (1.10)

If we apply Hamilton's equations, we have

d_t q^i = c p^i/∣p∣
d_t p_i = 0 .   (1.11)

That is, the norm of the velocity is always c, the momentum decides its direction, and the
momentum itself does not change in time, as shown in fig. 1.1. This is indeed the motion
of a free photon. One can confirm, through tedious calculation, that the determinant of the
Hessian is indeed zero, yet it is easier and more physically instructive to see that we cannot
reconstruct the momentum from the velocity. Relativistically, all photons travel along the
geodesics at the same speed, therefore two photons that differ only by the magnitude of the
momentum will travel the same path.
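The "tedious calculation" can also be delegated to a computer algebra system. The following sketch (ours, assuming Python with SymPy, and using two momentum components for brevity) shows that the momentum Hessian of H = c∣p∣ is singular, in contrast with a massive particle H = ∣p∣²/2m:

    # Sketch: the momentum Hessian of H = c|p| has zero determinant,
    # while that of H = |p|^2/(2m) does not.
    import sympy as sp

    c, m = sp.symbols('c m', positive=True)
    p1, p2 = sp.symbols('p1 p2', real=True)

    H_photon = c * sp.sqrt(p1**2 + p2**2)
    H_massive = (p1**2 + p2**2) / (2 * m)

    def hessian_det(H):
        rows = [[sp.diff(H, a, b) for b in (p1, p2)] for a in (p1, p2)]
        return sp.simplify(sp.Matrix(rows).det())

    print(hessian_det(H_photon))   # 0: momentum cannot be recovered from velocity
    print(hessian_det(H_massive))  # m**(-2): kinematic equivalence holds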
Hamiltonian systems that are also Newtonian, then, need to satisfy this extra condition,
so let us give it a name.
Assumption KE (Kinematic Equivalence). The kinematics of the system is sufficient to
reconstruct its dynamics and vice-versa. That is, specifying the motion of the system is equiv-
alent to specifying its state and evolution.
By kinematics we mean the motion in space and time and by dynamics we mean the state
and its time evolution in phase space. We will need to analyze the difference between the two
more in detail, but we should first finish our comparison between the different formulations.
Summing up, we find that
Insight 1.12. Not all Hamiltonian systems are Newtonian: only those for which KE is valid.

Lagrangian vs Hamiltonian
We now need to compare Lagrangian and Hamiltonian systems. The task is a lot easier because we already have a precise way to connect the two. If we are given a Lagrangian L, we define the conjugate momentum p_i = ∂_{v^i} L and the Hamiltonian H = p_i v^i − L. If we are given a Hamiltonian H, we can define a Lagrangian L = p_i v^i − H and a velocity v^i = d_t q^i = ∂_{p_i} H. The only detail that needs to be understood is whether this can be done for all Lagrangian and Hamiltonian systems.
While these expressions are always defined, we need to check whether we can change vari-
ables; whether we can write the Lagrangian in terms of position and velocity and the Hamil-
tonian in terms of position and momentum. Going from a Hamiltonian to a Lagrangian, it
again means that we can write momentum as a function of position and velocity, and there-
fore assumption KE must hold. This makes sense: if all Lagrangian systems are Newtonian,
and KE was required for a Hamiltonian system to be Newtonian, then it is also required
for a Hamiltonian system to be Lagrangian. But the connection is stronger: KE is the only
additional assumption we need to be able to write a Lagrangian given a Hamiltonian.
Going from a Lagrangian to a Hamiltonian, it means that we can write velocity as a
function of position and momentum. Note that since we define conjugate momentum as the
derivative of the Lagrangian, we can already express momentum as a function of position and
velocity, which means we are simply asking that expression to be invertible. This is, again,
assumption KE, just in the opposite direction. We must have
0 ≠ ∣∂_{v^i} p_j∣ = ∣∂_{v^i} ∂_{v^j} L∣ .   (1.13)
This means that assumption KE is exactly the invertibility of the Hessian, the condition for
unique solution of the Lagrangian. All Lagrangian systems that admit unique solutions, then,
satisfy assumption KE. In fact, we can see that the Hessian determinants are related:

∣∂_{v^i} ∂_{v^j} L∣ = ∣∂_{v^i} p_j∣ = ∣∂_{p_i} v^j∣⁻¹ = ∣∂_{p_i} ∂_{p_j} H∣⁻¹ .   (1.14)

This means that every Lagrangian admits a Hamiltonian, but not every Hamiltonian admits
a Lagrangian. Only the Hamiltonian systems for which KE is valid will also be Lagrangian
systems, with a guaranteed unique solution given that KE is exactly the assumption needed
for that as well. Therefore we conclude that

Insight 1.15. Lagrangian systems are exactly those Hamiltonian systems for which KE is
valid.
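The determinant relation 1.14 behind this conclusion can be verified explicitly on a toy example (our sketch, assuming Python with SymPy). We use the unphysical but convenient L = e^v, chosen only because p = ∂_v L inverts in closed form:

    # Sketch: verify |d^2 L/dv^2| = |d^2 H/dp^2|^{-1} (eq. 1.14) in one
    # dimension for the toy Lagrangian L = exp(v).
    import sympy as sp

    v, p = sp.symbols('v p', positive=True)

    L = sp.exp(v)
    p_of_v = sp.diff(L, v)          # conjugate momentum p = e^v
    v_of_p = sp.log(p)              # inverted: v = log p
    H = sp.expand(p * v_of_p - L.subs(v, v_of_p))   # H = p log p - p

    L_hess = sp.diff(L, v, 2)       # e^v
    H_hess = sp.diff(H, p, 2)       # 1/p

    # Express the Hamiltonian Hessian in the velocity variable and compare
    print(sp.simplify(H_hess.subs(p, p_of_v) - 1 / L_hess))  # 0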

Relationship between formulations


The relationship between the different formulations, then, can be summarized with the Venn
diagram in fig. 1.2.

Figure 1.2: Not all Hamiltonian systems are Newtonian and not all Newtonian systems are Hamiltonian. All Lagrangian systems are both Newtonian and Hamiltonian.

We have found that KE is a constitutive assumption of Lagrangian mechanics, and that it clearly marks which Hamiltonian systems are Newtonian/Lagrangian. By constitutive as-
sumption we mean an assumption that must be taken, either explicitly or implicitly, for a
theory to be valid. But what makes a system Hamiltonian and what makes a system Newto-
nian? Can we find a full set of constitutive assumptions for classical mechanics?

1.3 Kinematics vs dynamics


We have seen the importance of the connection between kinematics and dynamics. In this sec-
tion we will explore this link more deeply and come to the following conclusion: the kinematics
of a system is not enough to reconstruct its dynamics.

Particle under linear drag


Let us first review exactly what the kinematics and dynamics are. Given a system, its kine-
matics is the description of its motion in space and time. Position, velocity, and acceleration
are kinematic variables because they describe the motion. Kinematics is what Galileo studied
and started to give a rigorous account of. The dynamics, instead, describes the cause of such
motion. Force, mass, momentum, energy are dynamic quantities as they are used to describe
why a body moves in a particular way. Dynamics is what Newton introduced and his second law, expressed as F = ma, clearly shows the link.

Figure 1.3: Evolution in time of position, velocity and acceleration for ma = −bv. Both acceleration and velocity will tend to zero as time increases. The position will tend to an equilibrium given by initial position and initial velocity.

The link between the two concepts seems important given the constitutive role of KE
in Lagrangian mechanics. Moreover, while both Newtonian and Hamiltonian mechanics are
dynamical theories, in the sense that quantities like force and momentum are intrinsic parts
of the respective theories, Lagrangian mechanics seems to be a purely kinematic theory, as it
is described only by kinematic variables like position and velocity. Therefore it seems useful
to characterize the kinematics-dynamics link as much as possible. Let’s analyze a concrete
example.
Suppose we are given the following equation:

ma = −bv. (1.16)

The equation is in terms of kinematic variables and, given initial conditions x0 and v0 , it
admits a unique solution, a unique trajectory. The solution, plotted in fig. 1.3, is
x(t) = x0 + (m/b) v0 (1 − e^{−(b/m)t})
v(t) = v0 e^{−(b/m)t}
a(t) = −(b/m) v0 e^{−(b/m)t}   (1.17)
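As a quick check (ours, assuming Python with SymPy), solving the equation symbolically with the stated initial conditions reproduces exactly this solution:

    # Sketch: solve m x'' = -b x' with x(0) = x0, x'(0) = v0 and compare
    # with the closed-form position in eq. 1.17.
    import sympy as sp

    t = sp.symbols('t', nonnegative=True)
    m, b, x0, v0 = sp.symbols('m b x0 v0', positive=True)
    x = sp.Function('x')

    ode = sp.Eq(m * x(t).diff(t, 2), -b * x(t).diff(t))
    sol = sp.dsolve(ode, x(t), ics={x(0): x0, x(t).diff(t).subs(t, 0): v0})

    expected = x0 + (m / b) * v0 * (1 - sp.exp(-b * t / m))
    print(sp.simplify(sol.rhs - expected))  # 0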
Can we reconstruct the forces acting on this system?
The obvious answer seems to be that the constant m represents the mass of the system
and F = −bv the force. This is the case of a particle under linear drag: the system is subjected
to a frictional force that is proportional and opposite to the velocity. If we set the Lagrangian

L = ½ m v² e^{(b/m)t}   (1.18)

and apply the Euler-Lagrange equation 1.3 we have

∂_x L = 0 = d_t ∂_v L = d_t (m v e^{(b/m)t}) = m a e^{(b/m)t} + b v e^{(b/m)t} = e^{(b/m)t} (m a + b v)
m a = −b v .   (1.19)

Therefore we have a Lagrangian for the system. We can also find a Hamiltonian

p = ∂_v L = m v e^{(b/m)t}
v = (p/m) e^{−(b/m)t}
H = p v − L = p (p/m) e^{−(b/m)t} − ½ m ((p/m) e^{−(b/m)t})² e^{(b/m)t} = (p²/m) e^{−(b/m)t} − (p²/2m) e^{−(b/m)t} = (p²/2m) e^{−(b/m)t}   (1.20)

and apply Hamilton's equations 1.4

d_t q = ∂_p H = (p/m) e^{−(b/m)t}
d_t p = −∂_q H = 0 .   (1.21)

The second equation tells us momentum is a constant p0. Substituting the constant in the first equation, we have the velocity as a function of time, which we can integrate. We have

q(t) = q0 + (p0/b)(1 − e^{−(b/m)t})
p(t) = p0 .   (1.22)
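These manipulations can be replicated symbolically (our sketch, assuming Python with SymPy): the Euler–Lagrange equation of L = ½mv²e^{(b/m)t} reproduces ma = −bv, and the Legendre transform yields H = (p²/2m)e^{−(b/m)t}:

    # Sketch: check that L = m v^2/2 exp(b t/m) yields m a = -b v, and that
    # its Legendre transform is H = p^2/(2m) exp(-b t/m).
    import sympy as sp
    from sympy.calculus.euler import euler_equations

    t = sp.symbols('t')
    m, b, p, v = sp.symbols('m b p v', positive=True)
    x = sp.Function('x')

    L_traj = sp.Rational(1, 2) * m * x(t).diff(t)**2 * sp.exp(b * t / m)
    print(euler_equations(L_traj, [x(t)], [t]))  # e^{bt/m} (m x'' + b x') = 0

    # Legendre transform, with v treated as an independent variable
    L = sp.Rational(1, 2) * m * v**2 * sp.exp(b * t / m)
    p_of_v = sp.diff(L, v)                       # p = m v e^{bt/m}
    v_of_p = sp.solve(sp.Eq(p, p_of_v), v)[0]    # v = (p/m) e^{-bt/m}
    H = sp.simplify(p * v_of_p - L.subs(v, v_of_p))
    print(H)  # p**2*exp(-b*t/m)/(2*m)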

The kinematics works perfectly, but the dynamics seems off, as shown in fig. 1.4. First of all, based on the physics, one would expect the momentum to be decreasing in time

p(t) = m v(t) = m v0 e^{−(b/m)t} .   (1.23)

However, conjugate momentum is a constant of motion. For the energy, we would expect the Hamiltonian to match the kinetic energy

E(t) = ½ m v²(t) = ½ m v0² e^{−2(b/m)t}   (1.24)

but if we express the Hamiltonian in terms of velocity we have

H(t) = (p²/2m) e^{−(b/m)t} = (1/2m) (m v(t) e^{(b/m)t})² e^{−(b/m)t} = ½ m v²(t) e^{(b/m)t} = ½ m v0² e^{−(b/m)t} .   (1.25)

That is, the energy decreases more slowly than it should. This is not good.
Now, it is true that conjugate momentum is not the same as kinetic momentum. But the difference, as we will see much more clearly later, is caused by non-inertial non-Cartesian coordinate systems and/or the presence of vector potential forces.⁸ We are not at all in that case. Also, note that at time t = 0 the momentum and the energy do match our expectation, but not after. Therefore imagine a situation where friction is non-negligible only in a particular region. We would expect p = mv to be valid before it enters, but not when it comes out. But wouldn't it come out in another region where we would expect p = mv to work? This is strange. How should we proceed?
⁸ The relationship is p_i = m g_{ij} v^j + q A_i. This reduces to p_i = m v^i if and only if we are in an inertial frame with Cartesian coordinates (i.e. g_{ij} = δ_{ij}) and there are no vector potential forces (A_i = 0).

Figure 1.4: Trying to interpret L = ½mv²e^{(b/m)t} and H = (p²/2m)e^{−(b/m)t} as respectively the Lagrangian and Hamiltonian of a particle under linear drag. While the evolution of the position matches, note how the conjugate momentum is constant while the kinetic momentum decreases. Also, the Hamiltonian and the energy do not decrease at the same rate.

Figure 1.5: Showing how L = ½mv²e^{(b/m)t} and H = (p²/2m)e^{−(b/m)t} can be interpreted as the Lagrangian and Hamiltonian of a variable mass system. The mass is increasing exponentially in time, while both conjugate and kinetic momentum remain constant. This means the velocity will need to decrease. The energy decreases at the same rate as the Hamiltonian.

Variable mass system


As it is typical in reverse physics, we will assume that things work in a reasonable way and
that we simply have the wrong connection between physics and math. Recall that we started
just with an equation, and we then interpreted m to be the mass of the system. Let’s just
assume that m is a constant with units of mass and define the actual mass of the system as
the ratio between conjugate momentum and velocity. Looking back at 1.20, as shown in fig. 1.5, we have

m̂(t) = p(t)/v(t) = m e^{(b/m)t}
p(t) = m v(t) e^{(b/m)t} = m̂(t) v(t)
H(t) = (p²(t)/2m) e^{−(b/m)t} = p²(t)/(2m̂(t)) = ½ m̂(t) v²(t) = E(t)   (1.26)
Now everything actually works perfectly: the relationship between velocity and conjugate
momentum is respected, the Hamiltonian matches the kinetic energy. We just have a variable
mass system. How and why does this work exactly?

Let us expand Newton's second law for a variable mass system.⁹ We have:

F^i = d_t (m̂ v^i) = d_t m̂ v^i + m̂ a^i
m̂ a^i = F^i − d_t m̂ v^i   (1.27)

In particular, for our one dimensional case, let us set F = 0 and substitute m̂

m e^{(b/m)t} a = 0 − d_t (m e^{(b/m)t}) v = −b e^{(b/m)t} v
m a = −b v .   (1.28)
Therefore the same equation, the same kinematics, applies to a variable mass system that
increases the mass over time. You can imagine, for example, a body that is absorbing mass
from all directions, so that the balance of forces on the body is zero. The body, then, is not
slowing down because of friction. It is slowing down because momentum is conserved, and if
the mass is increasing, the velocity must be decreasing at the same rate. The energy, on the
other hand, will decrease because the square of the velocity will decrease faster than the mass
increases.
In Newtonian mechanics, we can readily distinguish these two cases because we have
to be explicit about forces and masses. In Hamiltonian mechanics things are a bit more
difficult because, as we will see later more precisely, conjugate momentum is not exactly
kinetic momentum and the Hamiltonian is not exactly energy. Yet, conjugate momentum
and the Hamiltonian are not kinematic quantities but dynamic ones, and therefore
we can see that these would be different in different cases. In Lagrangian mechanics this is
even more difficult to see because it looks like a purely kinematic theory, while it is not: the
Lagrangian itself is not a purely kinematic entity. As we saw, Lagrangian mechanics implicitly
assumes KE, which is a condition on the dynamics as well, and the Lagrangian itself is used to
reconstruct conjugate momentum and the Hamiltonian. Moreover, if Lagrangian mechanics
were a purely kinematic theory, and told us nothing about forces, energy or momentum, it
would not be a complete formulation of classical mechanics.
So we have seen that the same kinematic equation can describe a constant mass dissipative
system or a variable mass system. Is that it? Not quite. Recall that we mentioned that kinetic
and conjugate momentum will differ in non-inertial frames. Note that we implicitly assumed
that x and t represented the variables for an inertial observer, in the same way that we
originally assumed m was the mass of the system. Could the same equation, then, be describing
yet another system but in a non-inertial frame?

Non-inertial motion
Let’s compare the motion of a particle traveling at constant velocity in an inertial frame,
using t as the time variable, and the motion of a particle decelerating exponentially, using t̂
as the time variable
$$x(t) = x_0 + v_0 t \qquad x(\hat t) = x_0 + v_0\,\frac{m}{b}\left(1 - e^{-\frac{b}{m}\hat t}\right). \tag{1.29}$$
⁹ Note that, in general, the variable mass system should take into account the momentum gained or lost by the system when the mass is acquired or ejected. In our case, we are assuming that no momentum is lost, which means that either the mass is acquired/ejected uniformly from all directions or it is just an apparent change that depends on the change of coordinates.

Note the striking similarity: we can simply set


$$t = \frac{m}{b}\left(1 - e^{-\frac{b}{m}\hat t}\right) \tag{1.30}$$
which clearly takes us to a non-inertial frame since uniform motion is no longer uniform in
the new frame.
Let’s study how Newton’s second law changes if we make a change of time variable while
keeping the position variables unchanged
$$\hat t = \hat t(t) \qquad F^i = d_t(m v^i) = d_t(m\, d_t x^i) = d_t\hat t\; d_{\hat t}(m\, d_t\hat t\; d_{\hat t} x^i). \tag{1.31}$$
If we set
$$\hat m = m\, d_t\hat t \tag{1.32}$$
we can express the previous equation in the following form
$$F^i = d_t\hat t\; d_{\hat t}(\hat m\, d_{\hat t} x^i) = d_t\hat t\; d_{\hat t}(\hat m\, \hat v^i) = d_t\hat t\, \hat F^i. \tag{1.33}$$
This tells us that the second observer will see an effective mass rescaled exactly by the ratio
between the time variables. Note that this is exactly what happens in special relativity: the
clock for a boosted observer is dilated by a factor of $\gamma$, which is exactly the factor used in the relativistic mass.¹⁰ If $t$ is the time variable for an inertial frame and $t(\hat t)$ is a non-linear
function, the resulting frame will be non-inertial and the observer will see an effective variable
mass system.
If we look at our problem this way, the rescaling of the mass, then, is not due to a truly
variable mass, but a variable effective mass due to the slowing down of the clock. The body
slows down because the non-inertial time is slowing down and the body appears to stop
because the clock becomes infinitely slow. While this might sound like a contrived case,¹¹
these are exactly the type of situations a fully relativistic theory (i.e. one that works for all
definitions of time and space variables) needs to take into account.
We can verify that this gives us the correct effective mass
$$d_{\hat t} t = d_{\hat t}\left(\frac{m}{b}\left(1 - e^{-\frac{b}{m}\hat t}\right)\right) = \frac{m}{b}\, d_{\hat t}\left(1 - e^{-\frac{b}{m}\hat t}\right) = -\frac{m}{b}\, d_{\hat t} e^{-\frac{b}{m}\hat t} = \frac{m}{b}\frac{b}{m}\, e^{-\frac{b}{m}\hat t} = e^{-\frac{b}{m}\hat t}$$
$$\hat m = m\, d_t\hat t = m\, (d_{\hat t} t)^{-1} = m e^{\frac{b}{m}\hat t}. \tag{1.34}$$
And we can verify that we get the same equation by plugging in the time transformation in
Newton’s second law with a zero force
$$\begin{aligned} 0 = d_t(m v) &= d_t(m\, d_t x) = d_t\hat t\; d_{\hat t}(m\, d_t\hat t\; d_{\hat t} x) \\ &= e^{\frac{b}{m}\hat t}\, d_{\hat t}\!\left(m e^{\frac{b}{m}\hat t}\, \hat v\right) = e^{\frac{b}{m}\hat t}\left(\frac{b}{m}\, m e^{\frac{b}{m}\hat t}\, \hat v + m e^{\frac{b}{m}\hat t}\, \hat a\right) = e^{2\frac{b}{m}\hat t}\,(b\hat v + m\hat a) \\ m\hat a &= -b\hat v. \end{aligned} \tag{1.35}$$
¹⁰ It may be surprising to see a proto-relativistic effect showing up given that no assumption on space-time has been made. As we will see, these types of connections between different theories come up often in reverse physics.
¹¹ On the surface, it sounds similar to what happens in general relativity with a black hole. An observer who sees someone falling into a black hole will see him gradually slowing down as he approaches the event horizon and asymptotically stop there. The observer falling into the black hole, instead, will perceive his time flowing uniformly, and nothing special will happen as the event horizon is crossed.

Note that the expressions for momentum and energy will match the previous case because
the system in the non-inertial frame looks like a variable mass system.
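The whole chain can also be verified symbolically. Here is a minimal Python sketch (ours, using sympy; the variable names are our own) that checks eqs. 1.29, 1.30, 1.34 and 1.35 in one go.

```python
import sympy as sp

m, b, v0, x0 = sp.symbols('m b v_0 x_0', positive=True)
t_hat = sp.symbols('t_hat', positive=True)

# Time change (1.30): t as a function of the non-inertial time t_hat.
t = m / b * (1 - sp.exp(-b / m * t_hat))

# Uniform motion x(t) becomes the decelerating trajectory (1.29).
x = x0 + v0 * t

# Effective mass (1.32): m_hat = m * d(t_hat)/dt = m / (dt/d(t_hat)).
m_hat = sp.simplify(m / sp.diff(t, t_hat))
assert sp.simplify(m_hat - m * sp.exp(b / m * t_hat)) == 0   # eq. (1.34)

# The hatted velocity obeys m * a_hat = -b * v_hat, eq. (1.35).
v_hat = sp.diff(x, t_hat)
a_hat = sp.diff(v_hat, t_hat)
assert sp.simplify(m * a_hat + b * v_hat) == 0
```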

The relationship between kinematics and dynamics


This last case highlights a more subtle issue. In the two previous cases we were in the same
inertial frame, we saw the same trajectory, the same kinematics, but we couldn’t tell whether
we were looking at a fixed mass system under linear drag or a variable mass system: we
couldn’t tell the dynamics. Now, we have the same system, a constant mass particle under no
forces, described in two different frames, one inertial and one not. The motion of the system
will naturally have different representations in the different frames, but this does not mean the
motion or the causes of motion are different: it’s the same object. Therefore we have the same
kinematics even though we have different expressions for the same trajectory. The expression
$x(t)$, then, is not enough to define the kinematics if we do not know exactly what $x$ and $t$ represent physically, i.e. if the frame is not given.
While typically one proceeds by defining the frame first and then the dynamics (i.e. the
forces acting on the system), here we have followed a different approach: we first defined
the dynamics (i.e. constant mass system under no forces) and then found the frame that
matched the given kinematics (i.e. the trajectory or the relationship between velocity and
acceleration). Given that Lagrangian and Hamiltonian mechanics are frame invariant, an
intrinsic characterization of the system itself is exactly what we should be looking for. Saying,
for example, that a system is subjected to no forces or to a linear drag is not frame invariant
because forces are not frame invariant.
It is clear that the type of apparent variable mass due to non-inertial frames is unavoidable
if we want to have a consistent theory with invariant laws. Therefore both Lagrangian and
Hamiltonian mechanics must include these cases. However, it is not exactly clear what to
do for true variable mass systems. From a cursory look, it would seem that everything is
fine and there is no harm in including them. Yet again, from a cursory look we seemed to
have a Lagrangian for a particle under linear drag. As we will see later, there are implicit
connections between Lagrangian/Hamiltonian mechanics on one side and thermodynamics,
statistical mechanics and special relativity on the other. Given that it is not clear to us whether
these connections hold or not,¹² we will concentrate on the constant mass case from now on.
Let’s recap what we learned. The biggest point is that we can’t simply look at the kine-
matics and understand the causes of motion. The different formulations have different ways
to relate the dynamics and the kinematics. Newtonian mechanics is the most clear about the
dynamics as it makes us clearly spell out what is going on. This, however, comes at a cost:
the equations are not covariant, meaning they have a different expression in different frames.
The second law, in fact, is valid only for inertial frames with Cartesian coordinates. It is only
in these frames, in fact, that a body will proceed in uniform motion if no forces are applied to
it. If we are in polar coordinates, for example, the trajectory expressed in radius r and angle θ
will not be linear. Even the notion of force is, if one looks closely, a bit ambiguous. In principle, we want to write both the second law $F = ma$ and the expression for work $dW = F dx$. If $dW$ is invariant under change of position variables, the force should be a covector and therefore $dW = F_i dx^i$. But since the acceleration $a$ will change like a vector, we also have $F^i = m a^i$.
¹² For example, areas of phase space are connected to entropy. Does this connection hold with a variable mass system?

The notions of force in the second law and in the infinitesimal work are slightly different, and they coincide only if we are in an inertial frame with Cartesian coordinates.
On the other side, Hamiltonian and Lagrangian mechanics are coordinate independent:
the laws remain the same if we change position variables. This makes them more useful in
many contexts. Lagrangian mechanics is more useful when trying to study the symmetries
of the system. Hamiltonian mechanics is more useful for statistical mechanics and to better
separate degrees of freedom. However, this comes at a price. Hamiltonian and Lagrangian
mechanics apply in fewer cases than Newtonian mechanics. As we saw, linear drag may look
like it has a valid Hamiltonian/Lagrangian, but it doesn’t. For quadratic drag or friction due
to normal force, one cannot find a suitable trick, and is forced to use Rayleigh’s dissipation
functions which modify the Euler-Lagrange equations. This is not a coincidence: while New-
tonian mechanics links kinematics and dynamics by choosing a particular frame, Hamiltonian
and Lagrangian mechanics do so by fixing a type of system. It is the implicit knowledge of the
type of system that allows us to reconstruct the dynamics just by looking at the kinematics
in an unknown frame. What we need to understand, then, is what exactly is this restriction.

1.4 Reversing Hamiltonian mechanics


We now turn our attention to Hamiltonian mechanics and try to understand exactly what
types of systems it focuses on. We will find twelve equivalent formulations of Hamiltonian
mechanics that link ideas from vector calculus, differential geometry, statistical mechanics,
thermodynamics, information theory and plain statistics. The overall result is that Hamilto-
nian mechanics focuses on systems that are assumed to be deterministic and reversible. We
will see how the physical significance of that assumption differs from mathematically naive
characterizations.

Mathematical characterizations
To simplify our discussion, we will first concentrate on a single degree of freedom. The first
characterization of Hamiltonian mechanics is naturally in terms of the equations
$$d_t q = \partial_p H \qquad d_t p = -\partial_q H. \tag{HM-1D}$$
We will want to treat phase space as a generic two-dimensional space (i.e. manifold), like
we would for a plane in physical space. We will reserve the term coordinate for the position
variable q, while we will refer to the collection of position and momentum as state variables
and will denote them as $\xi^a = [q, p]$. We can now define the displacement field
$$S^a = d_t\xi^a = [d_t q, d_t p] \tag{1.36}$$
which is a vector field that defines the evolution of the system in time. Hamilton’s equations,
then, can be expressed as
$$S^q = \partial_p H \qquad S^p = -\partial_q H. \tag{1.37}$$
To bring out the geometric meaning of the equations, we introduce the matrix
$$\omega_{ab} = \begin{bmatrix} \omega_{qq} & \omega_{qp} \\ \omega_{pq} & \omega_{pp} \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \tag{SF-1D}$$

[Figure: two panels of the $(q,p)$ plane ($-6 \le q \le 6$, $-4 \le p \le 4$) with level curves of $H$; the left panel overlays the gradient field, the right panel the displacement field.]
Figure 1.6: The surface plot shows the value of the Hamiltonian for a harmonic oscillator $H = \frac{p^2}{2m} + \frac{1}{2}kq^2$; red means higher value. The lines are the regions at constant energy $H$. On the left, the gradient of the Hamiltonian is shown. On the right, the displacement field is shown, which is the gradient rotated by a right angle. Note how the displacement is always parallel to the lines at constant energy.
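As a quick check of the geometry in fig. 1.6, the following Python sketch (ours, with arbitrary oscillator parameters) verifies that the rotated gradient is everywhere orthogonal to the gradient itself, and therefore tangent to the lines at constant $H$.

```python
import numpy as np

m, k = 1.0, 1.0  # hypothetical oscillator parameters

def grad_H(q, p):
    # Gradient of H = p^2/(2m) + k q^2/2.
    return np.array([k * q, p / m])

def S(q, p):
    # Displacement field: the gradient rotated by a right angle (HM-G).
    dHq, dHp = grad_H(q, p)
    return np.array([dHp, -dHq])     # S^q = dH/dp, S^p = -dH/dq

rng = np.random.default_rng(0)
for q, p in rng.normal(size=(100, 2)):
    # S is everywhere orthogonal to grad H, hence tangent to H = const.
    assert abs(S(q, p) @ grad_H(q, p)) < 1e-12
```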

which rotates a vector by a right angle.¹³ That is, if $v^a = [v^q, v^p]$, then $v_a = v^b\omega_{ba} = [-v^p, v^q]$.¹⁴
We can rewrite equation HM-1D as

$$S_a = S^b\omega_{ba} = \partial_a H \tag{HM-G}$$

which tells us that the displacement field is the gradient of the Hamiltonian rotated by a right
angle. Note that the gradient is perpendicular to the lines at constant energy. Therefore, as
we can see in fig. 1.6, a right angle rotation gives us a vector field tangent to those lines,
making it geometrically evident that the value of the Hamiltonian is a constant of motion.
Condition HM-G is just a re-expression of HM-1D. Though it is already useful, we want to
find different mathematical conditions which turn out to be equivalent to the equations.
We start by noting that the displacement field as expressed by 1.37 looks very similar to
a curl of H, except that it is a two dimensional version. In vector calculus, a vector field is
the curl of another field if and only if its divergence is zero.15 This holds here as well. First,
we can verify that

$$\partial_a S^a = \partial_q S^q + \partial_p S^p = \partial_q\partial_p H - \partial_p\partial_q H = 0. \tag{1.38}$$

Geometrically, this means that the flow of $S^a$ through a closed region is always zero, as shown in fig. 1.7. That is, $\oint(S^q dp - S^p dq) = 0$. Note that, since we are in a two dimensional space, a
¹³ The notion of angle is technically ill-defined in phase space, but this slight imprecision makes it easier to get the point across.
¹⁴ The notation is purposely similar to how indexes are raised and lowered in general relativity by the metric tensor $g_{\alpha\beta}$, since $\omega_{ab}$ plays a similar geometric role in phase space. One should be careful, however, that $\omega_{ab}$ is anti-symmetric (i.e. $\omega_{ab} = -\omega_{ba}$), so it matters which side is contracted. In terms of symplectic geometry, the rotated displacement field $S_a$ corresponds to the interior product of the displacement field with the symplectic form, usually noted as $\iota_S\omega$ or $S\lrcorner\,\omega$.
¹⁵ We will leave topological requirements aside for now, as they would be a distraction from the overall point.

[Figure: two panels of the $(q,p)$ plane; the left shows the flow of $S^a$ through a path from $O$ to $P$, the right the net flow through a closed region.]
Figure 1.7: The flow of the displacement field $S^a$ through a path, shown on the left, is equal to the difference of the Hamiltonian at the two points, $\Delta H = \int_O^P (S^q dp - S^p dq)$. The net flow of states through a region (i.e. the flow of the displacement field through the boundary) is zero, as shown on the right. This means that $S^a$ is divergenceless and will admit a stream function, a potential, which corresponds to the Hamiltonian $H$.

hyper-surface has dimension $n - 1 = 2 - 1 = 1$ and therefore hyper-surfaces are lines. Therefore we have
$$\oint (S^q dp - S^p dq) = \oint (\partial_p H\, dp + \partial_q H\, dq) = \oint dH = 0. \tag{1.39}$$

That is, the flow of the displacement field is the line integral of the gradient of H, which is
zero over a closed curve.
Conversely, we can see that each divergenceless field in two dimensions admits a stream
function H that satisfies HM-1D. Geometrically, we can construct H in the following way.
Take a reference state O in phase space and assign H(O) = 0. For any other state P , consider
the flow of S through any two lines that connect O and P . Given that the flow through
the region contoured by those lines must be zero, the flow through each line must be equal.
Therefore the flow through a line that connects O and P only depends on the states, it is
path independent. We can assign $H(P) = \int_O^P (S^q dp - S^p dq)$. If we expand the differential of $H$ we have
$$dH = \partial_q H\, dq + \partial_p H\, dp = -S^p dq + S^q dp. \tag{1.40}$$

If we equate the components, we recover HM-1D. Geometrically, at least for the one dimen-
sional case, we can understand the difference of the Hamiltonian between two states as the
flow of the displacement field between them.
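To make the construction concrete, the following sketch (ours, using sympy) recovers the stream function for the divergence-free field $S = [p/m, -kq]$ by integrating the flow along a straight path from the origin; the result is the familiar harmonic oscillator Hamiltonian, up to the choice $H(O) = 0$.

```python
import sympy as sp

q, p, s, m, k = sp.symbols('q p s m k', positive=True)

# Divergence-free displacement field for the harmonic oscillator.
Sq = lambda q_, p_: p_ / m      # S^q = p/m
Sp = lambda q_, p_: -k * q_     # S^p = -k q

# Straight path from O = (0,0) to P = (q,p): (s*q, s*p) with s in [0,1],
# so dq = q ds and dp = p ds along the path.
integrand = Sq(s * q, s * p) * p - Sp(s * q, s * p) * q   # S^q dp - S^p dq
H = sp.integrate(integrand, (s, 0, 1))
assert sp.simplify(H - (p**2 / (2 * m) + k * q**2 / 2)) == 0
print(H)  # p**2/(2*m) + k*q**2/2
```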
We conclude that the following condition

The displacement field is divergenceless: $\partial_a S^a = 0$. (DR-DIV)

is equivalent to HM-1D. Unlike HM-G, this is a truly different mathematical condition.


Having looked at the flow through a region, we turn our attention to how regions them-
selves are transported by the evolution. Liouville’s theorem states that volumes of phase space

are preserved during Hamiltonian evolution, which in our case will be areas over the $q$-$p$ plane. To see this, let us review how variables transform, together with infinitesimal volumes:
$$\begin{aligned} \hat\xi^a &= \hat\xi^a(\xi^b) \\ d\hat\xi^a &= \partial_b\hat\xi^a\, d\xi^b \\ d\hat\xi^1\cdots d\hat\xi^n &= \left|\partial_b\hat\xi^a\right| d\xi^1\cdots d\xi^n \\ d\hat q\, d\hat p &= \begin{vmatrix} \partial_q\hat q & \partial_p\hat q \\ \partial_q\hat p & \partial_p\hat p \end{vmatrix}\, dq\, dp \end{aligned} \tag{1.41}$$

This tells us that, mathematically, a transformation is volume preserving if the determinant of the Jacobian $\partial_b\hat\xi^a$ is unitary. If $\hat q$ and $\hat p$ represent the evolution of $q$ and $p$ after an infinitesimal time step $\delta t$, we have

$$\begin{aligned} \hat q &= q + S^q\,\delta t \\ \hat p &= p + S^p\,\delta t \\ \partial_b\hat\xi^a &= \begin{bmatrix} 1 + \partial_q S^q\,\delta t & \partial_p S^q\,\delta t \\ \partial_q S^p\,\delta t & 1 + \partial_p S^p\,\delta t \end{bmatrix} \\ \left|\partial_b\hat\xi^a\right| &= (1 + \partial_q S^q\,\delta t)(1 + \partial_p S^p\,\delta t) - \partial_p S^q\,\partial_q S^p\,\delta t^2 = 1 + (\partial_q S^q + \partial_p S^p)\,\delta t + O(\delta t^2). \end{aligned} \tag{1.42}$$

Note that the first order term is proportional to the divergence of the displacement field, there-
fore the Jacobian determinant is equal to one if and only if the displacement is divergenceless.
In other words, condition

The Jacobian of time evolution is unitary: $\left|\partial_b\hat\xi^a\right| = 1$. (DR-JAC)

and condition

Volumes are conserved through the evolution: $d\hat\xi^1\cdots d\hat\xi^n = d\xi^1\cdots d\xi^n$. (DR-VOL)

are equivalent to DR-DIV. We have found a third and a fourth way to characterize Hamiltonian
evolution.
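The contrast with a dissipative system can be checked symbolically. The sketch below (ours, using sympy) computes the Jacobian determinant of an infinitesimal time step for the harmonic oscillator field and for the linear drag field: the first deviates from one only at second order in $\delta t$, the second already at first order.

```python
import sympy as sp

q, p, dt, m, k, b = sp.symbols('q p delta_t m k b', positive=True)

def jacobian_det(Sq, Sp):
    # One explicit Euler step of the flow, then the Jacobian determinant.
    q_new, p_new = q + Sq * dt, p + Sp * dt
    J = sp.Matrix([[sp.diff(q_new, q), sp.diff(q_new, p)],
                   [sp.diff(p_new, q), sp.diff(p_new, p)]])
    return sp.expand(J.det())

# Harmonic oscillator: deviation from 1 only at second order in delta_t.
print(jacobian_det(p / m, -k * q))   # 1 + k*delta_t**2/m
# Linear drag: deviation already at first order, volume shrinks.
print(jacobian_det(p / m, -b * p))   # 1 - b*delta_t
```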
While condition DR-VOL is expressed in terms of areas, similar considerations will work
for densities because a density is a quantity divided by an infinitesimal area. In fact densities

$$\left|\partial_b\hat\xi^a\right|\, \hat\rho(\hat\xi^a) = \rho(\xi^b) \tag{1.43}$$

transform in an equal and opposite way with respect to areas (i.e. the Jacobian determinant
is on the other side of the equality). The unitarity of the Jacobian determinant, then, is
equivalent to requiring that the density at an initial state is always equal to the density at the
corresponding final state. Both areas and densities are transported unchanged by Hamiltonian
evolution, as shown in fig. 1.8. Therefore

Densities are conserved through the evolution: $\hat\rho(\hat\xi^a) = \rho(\xi^b)$. (DR-DEN)

is yet another equivalent characterization.


To get a yet different perspective, we can reframe these arguments in terms of $\omega_{ab}$ and $S_a$. Given two vectors $v^a$ and $w^a$, the area of the parallelogram they form is $v^q w^p - v^p w^q$. This can
[Figure: two panels of the $(q,p)$ plane; the left shows a region transported to one of equal area, the right a probability distribution transported point by point.]
Figure 1.8: On the left side, we see how the displacement field $S^a$ transports areas of phase space to equal areas of phase space. On the right, we see Hamiltonian evolution transports a probability distribution point by point. The value of the probability density remains the same as it moves over phase space.

be rewritten as $v^a\omega_{ab}w^b$, which means we can think of $\omega_{ab}$ as a tensor that, given two vectors, returns the area of the parallelogram they form.¹⁶ If we denote by $\hat v^a = \partial_b\hat\xi^a v^b$ and $\hat w^a = \partial_b\hat\xi^a w^b$ the transformed vectors, the invariance of the area can be written as
$$v^a\omega_{ab}w^b = \hat v^c\omega_{cd}\hat w^d. \tag{1.44}$$

Since

$$\hat v^c\omega_{cd}\hat w^d = v^a\,\partial_a\hat\xi^c\,\omega_{cd}\,\partial_b\hat\xi^d\, w^b = v^a\hat\omega_{ab}w^b \tag{1.45}$$

the previous equivalence means that $\omega_{ab} = \hat\omega_{ab}$, that is, $\omega_{ab}$ remains unchanged. In other words, preserving the area for all possible pairs of vectors is the same as preserving the tensor $\omega_{ab}$ that returns the areas. We now see that $\omega_{ab}$ plays such an important geometric role that

The evolution leaves $\omega_{ab}$ invariant: $\hat\omega_{ab} = \omega_{ab}$. (DI-SYMP)

is yet another equivalent characterization of Hamiltonian mechanics.


It is useful to look more closely at the definition of the Poisson bracket

$$\{f, g\} = \partial_q f\,\partial_p g - \partial_p f\,\partial_q g = \begin{vmatrix} \partial_q f & \partial_p f \\ \partial_q g & \partial_p g \end{vmatrix}. \tag{1.46}$$

For a single degree of freedom, the Poisson bracket coincides with the Jacobian determinant,
where f and g are the two new variables. It essentially tells us how the volume changes if we
change state variables from [q, p] to [f, g]. Canonical transformations, then, are those that
¹⁶ More properly, $\omega_{ab}$ is a two-form.

do not change the units of area. The Poisson bracket can be expressed¹⁷ as
$$\{f, g\} = -\partial_a f\,\omega^{ab}\,\partial_b g = \partial_b g\,\omega^{ba}\,\partial_a f \tag{1.47}$$
where
$$\omega^{ab} = \begin{bmatrix} \omega^{qq} & \omega^{qp} \\ \omega^{pq} & \omega^{pp} \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \tag{1.48}$$
is the inverse of $\omega_{ab}$. The invariance of the Poisson brackets is equivalent to the invariance of the inverse of $\omega_{ab}$, which is equivalent to DI-SYMP. Therefore

The evolution leaves the Poisson brackets invariant (DI-POI)

is yet another equivalent characterization. So, again, we see how ωab plays a fundamental
geometrical role.
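As a consistency check (ours, using sympy), one can verify symbolically that the index expression 1.47 reproduces the usual definition of the Poisson bracket:

```python
import sympy as sp

q, p = sp.symbols('q p')
f = sp.Function('f')(q, p)
g = sp.Function('g')(q, p)

omega_inv = sp.Matrix([[0, -1], [1, 0]])   # omega^{ab}, eq. (1.48)
grad = lambda h: sp.Matrix([sp.diff(h, q), sp.diff(h, p)])

# -d_a f omega^{ab} d_b g, eq. (1.47)
bracket_index = -(grad(f).T * omega_inv * grad(g))[0, 0]
# Direct definition, eq. (1.46)
bracket_direct = sp.diff(f, q) * sp.diff(g, p) - sp.diff(f, p) * sp.diff(g, q)
assert sp.simplify(bracket_index - bracket_direct) == 0
```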
We can also rewrite the flow of the displacement field

$$\int (S^q dp - S^p dq) = \int S^a\omega_{ab}\, d\xi^b = \int S_b\, d\xi^b \tag{1.49}$$

as the line integral of the rotated displacement field Sa . We can do that because in two
dimensions the flow through a boundary is effectively a line integral along the boundary with
the field rotated 90 degrees. This means that the following condition

The rotated displacement field is curl free: $\partial_a S_b - \partial_b S_a = 0$. (DI-CURL)

is equivalent to condition DR-DIV.¹⁸ In fact, we can read equation HM-G as saying that the
rotated displacement field is the gradient of the scalar potential H.
We can see that we have found plenty of alternative characterizations of Hamilton’s equa-
tions HM-1D (or HM-G). Conditions DR-DIV, DR-JAC, DR-VOL and DR-DEN relate more
directly to the displacement field S a , while conditions DI-SYMP, DI-POI and DI-CURL relate
more directly to ωab and the rotated displacement field Sa . Nonetheless, they are all in terms
of the mathematical description. While these are useful, the final goal of reverse physics is to
find physical assumptions, not just equivalent mathematical definitions. So it is time to step
back and try to understand what the math is really about.

Physical characterizations
Let us first reflect on what we just found out: the defining characteristic of Hamiltonian
mechanics is not the transport of points, but the transport of areas and densities. If classical
Hamiltonian mechanics were really about and only about point particles, there would be no
reason for it to be characterized by DR-DIV, DR-VOL or DR-DEN. In fact, there would be
¹⁷ To see how our definitions and notation map to that used in differential geometry, let us define $\partial^a H = \omega^{ab}\partial_b H$. Note that $\partial^a H$ corresponds to the Hamiltonian vector field of $H$, usually noted $X_H$. The Poisson bracket is usually defined as $\omega(X_f, X_g)$. In our notation this becomes $\partial^a f\,\omega_{ab}\,\partial^b g = \omega^{ac}\partial_c f\,\omega_{ab}\,\omega^{bd}\partial_d g = \omega^{ac}\partial_c f\,\delta_a^d\partial_d g = \omega^{ac}\partial_c f\,\partial_a g$. One can see how the notation mimics the Einstein notation of general relativity and avoids the introduction of ad-hoc symbols.
¹⁸ Those familiar with relativistic electromagnetism will recognize the expression $\partial_a S_b - \partial_b S_a$ as the generalization of the curl. More properly, it is the exterior derivative applied to a one-form.

no reason for the equations of motion HM-1D to be differentiable. Differentiable equations are
exactly needed if we need to define the Jacobian, the transport of areas, or of densities defined
on those areas. Classical point particles, then, are more aptly conceived not as points, but as
infinitesimal regions of phase space, as distributions so peaked that only the mean value is
important.
This, in retrospect, matches how classical mechanics is used in practice: planets, cannon-
balls, pendulums, beads on a wire, all the objects we study with classical mechanics are not
point-like objects. They can be considered point-like if their size is negligible compared to
the scale of the problem. If the distance between two celestial bodies is comparable to the sum of their radii, the point particle approximation clearly fails. This is also consistent with fluid
dynamics and continuum mechanics, where we are literally studying the motion of infinites-
imal parts of a material. It is interesting to see echoes of these considerations present in the
mathematics.¹⁹
If we look at physics more broadly, we realize that in statistical mechanics we already have
a physical interpretation for volumes of regions in phase space: they represent the number
of states. Hamiltonian mechanics, then, maps regions while preserving the number of states.
This means that, for each initial state there is one and only one final state, which leads to the
following condition:

The evolution is deterministic and reversible. (DR-EV)

Note that by reversible here we mean that given the final state we can reconstruct the initial
state. Given that areas measure the number of states, DR-EV is equivalent to DR-VOL,
which means this is another characterization of Hamiltonian mechanics. We can also see a
connection to DR-DEN. If we assign a density to an initial state, and we claim that all and only
the elements that start in that initial state will end in a particular final state, we will expect
the density of the corresponding final state to match. That is, if the evolution is deterministic
and reversible, it may shuffle around a distribution, but it will never be able to spread it or
concentrate it.
This makes us understand, at a conceptual level, why a dissipative system, like a particle
under linear drag, is not a Hamiltonian system. A dissipative system will have an attractor: a
point or a region to which the system will tend given enough time. This means that, in time,
the area around the attractor must shrink, the density will concentrate over the attractor, but
this is exactly what Hamiltonian systems cannot do. Therefore Hamiltonian systems cannot
have attractors, they cannot be dissipative. By the same argument, they can’t have unstable
points or regions from which the system always goes away.
What may be confusing is that the motion of a particle under linear drag may seem
reversible, in the sense that we are able to, given the final position and momentum, reconstruct
the initial values. Mathematically, it maps points one-to-one and would seem to satisfy DR-
EV, even though it is not a Hamiltonian system. This is a perfect example of how focusing
on just the points leads to the wrong physical intuition. Physically, we would say that a one
meter range of position allows for more configurations than a one centimeter range, even
though mathematically they have the same number of points. If we understand that states
are infinitesimal areas of phase space, we can see that a dissipative system, though it does
¹⁹ We will want to investigate this link in more detail later.

map the center points of infinitesimal areas one-to-one, does not map the full infinitesimal area one-to-one. In this sense, dissipative systems fail to be reversible.
Let that sink in: we found that, if the system is deterministic and reversible, it admits
a Hamiltonian, a notion of energy, and that energy is conserved over time. This may seem
like a surprising and unexpected result. In retrospect, we can make an argument for it based
on familiar physics considerations. If a system is deterministic and reversible it means that
its evolution only depends on the state of the system itself. This means that it does not
depend on the state of anything else. A system whose evolution does not depend on anything
else is an isolated system. Therefore a deterministic and reversible system is isolated, and
from thermodynamics we know that an isolated system conserves energy. It should not be
surprising, then, that a deterministic and reversible system conserves energy. However, we
found that not only does it conserve energy, it defines it. Therefore this link between mechanics
and thermodynamics is actually deeper than we may think at first, and we should explore it
further.
The idea that a dissipative system is not reversible sounds true on thermodynamic grounds.
But thermodynamic reversibility is not the ability to reconstruct the initial state, but rather
the existence of a process that can undo the change. Alternatively, a process is thermodynam-
ically reversible if it conserves thermodynamic entropy, which is a more precise characteriza-
tion.20 We should not, then, confuse the two notions of reversibility, but we can easily show
their relationship. The fundamental postulate of statistical mechanics tells us that the ther-
modynamic entropy S = kB log W is the logarithm of the count of states, which corresponds
to volume in phase space. Since the logarithm is a bijective function, conservation of areas of
phase space is equivalent to conservation of entropy. Therefore

The evolution is deterministic and thermodynamically reversible (DR-THER)

is yet another characterization of Hamiltonian mechanics.


There is another type of entropy that is also fundamental in both statistical mechanics
and information theory: the Gibbs/Shannon entropy $I[\rho(\xi^a)] = -\int \rho\log\rho\; d\xi^1\cdots d\xi^n$, which is defined for each distribution $\rho(\xi^a)$. Recalling the transformation rules for both volumes 1.41
and densities 1.43, we have

$$\begin{aligned} I[\rho(\xi^a)] &= -\int \rho(\xi^a)\log\rho(\xi^a)\; d\xi^1\cdots d\xi^n \\ &= -\int \hat\rho(\hat\xi^b)\left|\partial_a\hat\xi^b\right| \log\left(\hat\rho(\hat\xi^b)\left|\partial_a\hat\xi^b\right|\right) d\xi^1\cdots d\xi^n \\ &= -\int \hat\rho(\hat\xi^b)\log\left(\hat\rho(\hat\xi^b)\left|\partial_a\hat\xi^b\right|\right) d\hat\xi^1\cdots d\hat\xi^n \\ &= -\int \hat\rho(\hat\xi^b)\log\hat\rho(\hat\xi^b)\; d\hat\xi^1\cdots d\hat\xi^n - \int \hat\rho(\hat\xi^b)\log\left|\partial_a\hat\xi^b\right| d\hat\xi^1\cdots d\hat\xi^n \\ &= I[\hat\rho(\hat\xi^b)] - \int \hat\rho(\hat\xi^b)\log\left|\partial_a\hat\xi^b\right| d\hat\xi^1\cdots d\hat\xi^n. \end{aligned} \tag{1.50}$$

Information entropy, then, remains constant if and only if the logarithm of the Jacobian
determinant is zero, which means the Jacobian determinant is one. Therefore

The evolution conserves information entropy (DR-INFO)


²⁰ The actual existence of a reverse process is not something that can always be guaranteed.

is equivalent to DR-JAC and is yet another characterization of Hamiltonian mechanics.


The fact that determinism and reversibility is equivalent to conservation of information
entropy should not be, in retrospect, surprising. Given a distribution, its information entropy
quantifies the average amount of information needed to specify a particular element chosen
according to that distribution. If the evolution is deterministic and reversible, giving the initial
state is equivalent to giving the final state and therefore the information to describe one or
the other must be the same. Determinism and reversibility, then, can be understood as the
informational equivalence between past and future descriptions.
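For a linear map $\hat\xi = A\xi$ the Jacobian is constant, so eq. 1.50 reduces to $I[\hat\rho] = I[\rho] + \log|\det A|$. The following sketch (ours, with hypothetical matrices) checks this on Gaussian distributions, whose differential entropy is known in closed form: entropy is conserved exactly when the determinant is one.

```python
import numpy as np

def gaussian_entropy(cov):
    # Differential entropy of a multivariate Gaussian: (1/2) log((2*pi*e)^n |cov|).
    n = cov.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(cov))

cov = np.array([[1.0, 0.3], [0.3, 2.0]])       # hypothetical initial uncertainty
A_unit = np.array([[1.0, 0.1], [-0.1, 0.99]])  # det = 0.99 + 0.01 = 1: entropy conserved
A_drag = np.array([[1.0, 0.1], [0.0, 0.9]])    # det = 0.9 < 1: entropy decreases

for A in (A_unit, A_drag):
    dI = gaussian_entropy(A @ cov @ A.T) - gaussian_entropy(cov)
    print(dI, np.log(abs(np.linalg.det(A))))   # the two numbers match
```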
Lastly, given that entropy is often associated with uncertainty, it may be useful to under-
stand how Hamiltonian evolution affects uncertainty. Given a multivariate distribution, the
uncertainty is characterized by the covariance matrix

$$\mathrm{cov}(\xi^a, \xi^b) = \begin{bmatrix} \sigma_q^2 & \mathrm{cov}_{q,p} \\ \mathrm{cov}_{p,q} & \sigma_p^2 \end{bmatrix}. \tag{1.51}$$

The determinant of the covariance matrix gives us a coordinate independent quantity to characterize the uncertainty. If the distribution is narrow enough, we can use the linearized transformation to see how the uncertainty evolves after an infinitesimal time step $\delta t$. We have
$$\left|\mathrm{cov}(\hat\xi^c, \hat\xi^d)\right| = \left|\partial_a\hat\xi^c\; \mathrm{cov}(\xi^a, \xi^b)\; \partial_b\hat\xi^d\right| = \left|\partial_a\hat\xi^c\right| \left|\mathrm{cov}(\xi^a, \xi^b)\right| \left|\partial_b\hat\xi^d\right|, \tag{1.52}$$

which means the uncertainty remains unchanged if and only if the Jacobian is unitary. So

The evolution conserves the uncertainty of peaked distributions (DR-UNC)

is equivalent to DR-JAC and is another characterization of Hamiltonian mechanics.


This connection gives us yet another insight on the nature of determinism and reversibility
in physics. Given that all physically meaningful descriptions are finite precision, a system is
deterministic and reversible in a physically meaningful sense if and only if the past/future
descriptions can be reconstructed/predicted at the same level of precision. This gives us
another perspective as to why areas and densities must be conserved.

Assumption of determinism and reversibility


We have found twelve equivalent characterizations that link Hamiltonian mechanics, vector
calculus, differential geometry, statistical mechanics, thermodynamics, information theory and
plain statistics. Though we only talked about the case of a single degree of freedom, it gives
us a much better idea of what systems Hamiltonian mechanics is supposed to describe, those
that satisfy the following

Assumption DR (Determinism and Reversibility). The system undergoes deterministic and reversible evolution. That is, specifying the state of the system at a particular time is equivalent to specifying the state at a future (determinism) or past (reversibility) time.

We can see how this concept is implemented mathematically: it is not simply a one-to-one
map between points. Classical particles should be more properly thought of as infinitesimal
regions of phase space. Conceptually, the count of states, the thermodynamic entropy and
information entropy are all conserved, and are all equivalent characterizations of determinism

and reversibility. In terms of physical measurement, past and future states are given at the
same level of uncertainty. But the most important lesson is that the foundations of classical
mechanics are not disconnected from the foundations of all other disciplines we encountered.
A full understanding of classical mechanics means understanding those connections as well.

1.5 Multiple degrees of freedom


We have seen how DR is a constitutive assumption for Hamiltonian mechanics, and in fact is
equivalent to Hamiltonian mechanics for one degree of freedom. We now turn our attention to
the general case, and we will find that DR, by itself, is not enough to recover the equations.
We will need an additional assumption, that of the independence of degrees of freedom.
First, let’s take Hamilton’s equations for multiple degrees of freedom

$$d_t q^i = \partial_{p_i} H \qquad d_t p_i = -\partial_{q^i} H \tag{HM-ND}$$

and re-express them in terms of generalized state variables. These will be noted as $\xi^a = [q^i, p_i]$ and will span a $2n$-dimensional space (i.e. manifold). The displacement field will be
$$S^a = d_t\xi^a = [d_t q^i, d_t p_i] \tag{1.53}$$

which again is the vector field that defines the evolution of the system in time. Hamilton’s
equations, then, can be expressed as
$$S^{q^i} = \partial_{p_i} H \qquad S^{p_i} = -\partial_{q^i} H. \tag{1.54}$$

Similarly to the previous case, let’s introduce the following matrix

$$\omega_{ab} = \begin{bmatrix} \omega_{q^i q^j} & \omega_{q^i p_j} \\ \omega_{p_i q^j} & \omega_{p_i p_j} \end{bmatrix} = \begin{bmatrix} 0 & I_n \\ -I_n & 0 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \otimes I_n \tag{SF-ND}$$

which performs a 90 degree rotation within each degree of freedom, switching the components between position and momentum. That is, if $v^a = [v^{q^i}, v^{p_i}]$, then $v_a = v^b\omega_{ba} = [-v^{p_i}, v^{q^i}]$.²¹ We can rewrite equation HM-ND as
$$S_a = S^b\omega_{ba} = \partial_a H \tag{1.55}$$

which notationally is the same as HM-G. The insight that the displacement field is equal
to the gradient of H rotated 90 degrees still applies, except there are now multiple ways, in
principle, to do that rotation. It is only the one defined by ωab that works.
²¹ For those versed in symplectic geometry, $v^a\omega_{ab}$ are the components of the one-form $\omega(v, \cdot)$. However, we are not going to call it a one-form, as that assumes that the whole object is a map from a vector field to a scalar field, and we do not know whether that is the correct physical understanding. In other words, we simply want to understand what the quantities are doing without being tied, as much as possible, to a particular way to frame it. Full reverse engineering of differential geometry will be done in a later chapter, once the physics we need to describe is clear.

Conditions DR-DIV, DR-JAC, DR-VOL and DR-DEN are still satisfied and equivalent to
each other. In fact, the divergence of the displacement field is zero
$$\partial_a S^a = \partial_{q^i} S^{q^i} + \partial_{p_i} S^{p_i} = \partial_{q^i}\partial_{p_i} H - \partial_{p_i}\partial_{q^i} H = 0 \tag{1.56}$$

and the Jacobian is unitary


$$\begin{aligned} \hat q^i &= q^i + S^{q^i}\delta t \\ \hat p_i &= p_i + S^{p_i}\delta t \\ \partial_b\hat\xi^a &= \begin{bmatrix} \delta_j^i + \partial_{q^j} S^{q^i}\delta t & \partial_{p_j} S^{q^i}\delta t \\ \partial_{q^j} S^{p_i}\delta t & \delta_i^j + \partial_{p_j} S^{p_i}\delta t \end{bmatrix} \\ \left|\partial_b\hat\xi^a\right| &= \left|\delta_j^i + \partial_{q^j} S^{q^i}\delta t\right| \left|\delta_i^j + \partial_{p_j} S^{p_i}\delta t\right| - \left|\partial_{p_j} S^{q^i}\delta t\right| \left|\partial_{q^j} S^{p_i}\delta t\right| \\ &= 1 + \left(\partial_{q^i} S^{q^i} + \partial_{p_i} S^{p_i}\right)\delta t + O(\delta t^2) \end{aligned} \tag{1.57}$$

since the first-order term is again the divergence. The Jacobian is still the multiplicative factor
between past/future areas (and densities), and therefore they are conserved even in the case
of multiple degrees of freedom.
However, these conditions are not equivalent to HM-ND. The displacement field S a has
2n components and is therefore specified by 2n functions. Conditions DR-DIV, DR-JAC, DR-
VOL and DR-DEN specify the same single constraint, bringing down to 2n − 1 the number
of independent components. The choice of Hamiltonian provides another constraint, leaving
2n − 2 choices undetermined. In the single degree of freedom case, n = 1, no choices are left,
and therefore the displacement field is fully constrained. In the general case, however, this is
not enough to fully characterize the evolution. Therefore HM-ND implies DR-DIV, DR-JAC,
DR-VOL and DR-DEN, but the converse is not true.
Let’s see what happens to condition DI-SYMP, the invariance of ω in the general case.
We have

$$\begin{aligned} \hat\omega_{ab} &= \partial_a\hat\xi^c\,\omega_{cd}\,\partial_b\hat\xi^d \\ &= (\delta_a^c + \partial_a S^c\,\delta t)\,\omega_{cd}\,(\delta_b^d + \partial_b S^d\,\delta t) \\ &= \omega_{ab} + (\partial_a S^c\,\omega_{cb} + \omega_{ad}\,\partial_b S^d)\,\delta t + O(\delta t^2) \\ &= \omega_{ab} + (\partial_a(S^c\omega_{cb}) + \partial_b(S^d\omega_{ad}))\,\delta t + O(\delta t^2) \\ &= \omega_{ab} + (\partial_a(S^c\omega_{cb}) - \partial_b(S^d\omega_{da}))\,\delta t + O(\delta t^2). \end{aligned} \tag{1.58}$$

Therefore, the invariance of ωab is equivalent to

$$\partial_a(S^c\omega_{cb}) - \partial_b(S^c\omega_{ca}) = 0. \tag{1.59}$$

In terms of the rotated displacement field Sa we have the more compact form

$$\partial_a S_b - \partial_b S_a = 0. \tag{1.60}$$

This tells us that the rotated displacement field Sa is curl free, which is the same condition
as DI-CURL, therefore DI-CURL and DI-SYMP are equivalent conditions also in the general
case.

Note that Hamilton’s equations state that the rotated displacement field is the gradient
of the Hamiltonian, and therefore

$$\partial_a S_b - \partial_b S_a = \partial_a(S^c\omega_{cb}) - \partial_b(S^c\omega_{ca}) = \partial_a\partial_b H - \partial_b\partial_a H = 0, \tag{1.61}$$

which simply verifies that the curl of the gradient is zero. Conversely, if Sa is curl-free, then
it admits a scalar potential H such that

$$S_a = S^b\omega_{ba} = \partial_a H \tag{1.62}$$

which recovers Hamilton’s equations. Therefore HM-ND, DI-CURL and DI-SYMP are equiv-
alent.
The relationship between Poisson brackets and $\omega^{ab}$ is the same in the general case, therefore
DI-POI and DI-SYMP are equivalent as well.
To sum up, in the general case HM-ND, HM-G, DI-SYMP, DI-POI and DI-CURL are
all equivalent and therefore full characterizations of Hamiltonian mechanics in the general
case. These imply DR-DIV, DR-JAC, DR-VOL and DR-DEN, which are all equivalent to
one another, but weaker conditions that cannot recover Hamiltonian mechanics in full. For
the second set of conditions, we already have an intuitive geometrical picture: the net flow
of the displacement within a region of phase space is zero, volumes are preserved and so are
densities. We need to build a stronger geometrical intuition for the first set, which is actually
the more fundamental one.
Condition DI-SYMP tells us that $v^a\omega_{ab}w^b$ is a conserved quantity, no matter what vectors $v^a$ and $w^b$ we choose. In the case of a single degree of freedom, this represented the area of the parallelogram formed by the two vectors, which was also the volume of the region. In the general case, we still have two vectors, but the situation is a bit more complicated.
We can gain an understanding by looking at the outer product decomposition for $\omega_{ab}$ we saw in SF-ND. This tells us that what happens within a degree of freedom is different from what happens across degrees of freedom. If we pick a single degree of freedom $1 \le x \le n$ and two vectors $v = v^q e_{q^x} + v^p e_{p_x}$ and $w = w^q e_{q^x} + w^p e_{p_x}$ that stretch along that degree of freedom, then we have
$$v^a\omega_{ab}w^b = v^q w^p - v^p w^q. \tag{1.63}$$
That is, within each degree of freedom, $\omega_{ab}$ computes the area of the parallelogram. Since $\omega_{ab}$ is conserved, parallelograms within any degree of freedom will be mapped to parallelograms of the same size.
If we pick two different DOFs $x$ and $y$ and two corresponding vectors $v = v^q e_{q^x} + v^p e_{p_x}$ and $w = w^q e_{q^y} + w^p e_{p_y}$, then we have
$$v^a\omega_{ab}w^b = 0. \tag{1.64}$$

This defines a notion of orthogonality between different degrees of freedom. Since ωab is
conserved, this notion of orthogonality is preserved during the evolution: orthogonal degrees
of freedom are mapped to orthogonal degrees of freedom.
Those familiar with general relativity and/or Riemannian geometry may gain more insight from the following analogy. In those cases, the metric tensor $g_{ij}$ defines the geometry by defining the scalar product between vectors. That is, given two vectors $v^i$ and $w^j$,

v i gij wj = ∣v∣∣w∣ cos θvw . Therefore the metric tensor defines the length and angles for vectors.
In Cartesian coordinates, the metric tensor is a unitary matrix of the same dimension of
the space. The form ωab does something in some sense similar and in some sense different.
It defines areas within degrees of freedom and angles between them. Hamiltonian evolution
preserves these areas and angles.
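A small numerical sketch (ours) of this structure for $n = 2$: $\omega_{ab}$ returns the parallelogram area for two vectors in the same degree of freedom and zero for vectors in different ones.

```python
import numpy as np

n = 2
I, Z = np.eye(n), np.zeros((n, n))
# omega_ab in the ordering (q^1, q^2, p_1, p_2), eq. (SF-ND).
omega = np.block([[Z, I], [-I, Z]])

def vec(dof, vq, vp):
    # A vector stretching only along the given degree of freedom.
    v = np.zeros(2 * n)
    v[dof], v[n + dof] = vq, vp
    return v

v, w = vec(0, 1.0, 2.0), vec(0, 3.0, 5.0)
print(v @ omega @ w)          # 1*5 - 2*3 = -1: area within the same DOF

v, w = vec(0, 1.0, 2.0), vec(1, 3.0, 5.0)
print(v @ omega @ w)          # 0: different DOFs are orthogonal
```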
If areas and orthogonality are preserved, then volumes are preserved as well. The volume
of a parallelepiped formed by parallelograms on orthogonal degrees of freedom will simply be
the product of the areas of the parallelograms. Therefore we can understand why Hamiltonian
mechanics satisfies DR-VOL. We can also understand why DR-VOL is not enough to recover
Hamiltonian mechanics. An evolution could stretch one degree of freedom while shrinking
another by the same amount. The total volume would remain the same, even though the area
in each degree of freedom wouldn’t. For example, take the system of equations:
$$\begin{aligned} d_t q^1 &= S^{q^1} = \frac{p_1}{m} \\ d_t p_1 &= S^{p_1} = -b p_1 \\ d_t q^2 &= S^{q^2} = \frac{p_2}{m} \\ d_t p_2 &= S^{p_2} = b p_2 \end{aligned} \tag{1.65}$$

The first degree of freedom is a particle under linear drag, while the second is a particle
accelerated (not decelerated) proportionally to its momentum by the same coefficient. We can
verify that
$$\partial_a S^a = \partial_{q^1}\frac{p_1}{m} + \partial_{p_1}(-b p_1) + \partial_{q^2}\frac{p_2}{m} + \partial_{p_2}(b p_2) = -b + b = 0 \tag{1.66}$$
the divergence is zero and therefore DR-DIV and DR-VOL are satisfied. However
$$\partial_{q^1} S_{p_1} - \partial_{p_1} S_{q^1} = \partial_{q^1} S^{q^1}\omega_{q^1 p_1} - \partial_{p_1} S^{p_1}\omega_{p_1 q^1} = \partial_{q^1}\frac{p_1}{m}\,(1) - \partial_{p_1}(-b p_1)(-1) = -b. \tag{1.67}$$
The curl of $S_a$, then, is not zero, DI-CURL is not satisfied, nor are DI-SYMP and HM-ND. The system is not Hamiltonian precisely because we are not preserving the areas within each independent DOF: the first is shrunk and the second stretched.
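Both computations can be verified symbolically; the following sketch (ours, using sympy) confirms that the divergence 1.66 vanishes while the curl component 1.67 equals $-b$.

```python
import sympy as sp

q1, p1, q2, p2, m, b = sp.symbols('q_1 p_1 q_2 p_2 m b', positive=True)

# Displacement field of eq. (1.65) in the ordering (q1, q2, p1, p2).
xi = [q1, q2, p1, p2]
S = [p1 / m, p2 / m, -b * p1, b * p2]

divergence = sum(sp.diff(S[a], xi[a]) for a in range(4))
assert divergence == 0                       # DR-DIV holds, eq. (1.66)

# Rotated field S_a = S^b omega_{ba}: S_{q^i} = -S^{p_i}, S_{p_i} = S^{q^i}.
S_low = [b * p1, -b * p2, p1 / m, p2 / m]
curl_11 = sp.diff(S_low[2], q1) - sp.diff(S_low[0], p1)   # d_{q1} S_{p1} - d_{p1} S_{q1}
print(curl_11)   # -b: DI-CURL fails, eq. (1.67)
```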
Now that we have a more precise understanding of the mathematics and the geometry, we
should turn to the physics. Note that all the previous physical conditions DR-EV, DR-THER,
DR-INFO and DR-UNC are equivalent to DR-VOL and DR-JAC. Therefore determinism and
reversibility is clearly a constitutive assumption of Hamiltonian mechanics in the general case,
but it cannot be the only one. Ideally, we would like to find a condition that is independent of
DR. However, we saw that DI-SYMP implies DR-VOL, therefore the mathematics does not
already give us two independent conditions we can map to the physics.
This is an important aspect to understand for reverse physics: the mapping between phys-
ical and mathematical conditions need not necessarily be one to one. A single mathematical
condition can map to multiple physical ones, or the same physical condition can map to multi-
ple mathematical ones. We saw before that determinism and reversibility forces the evolution
map to be both bijective and volume preserving. Mathematically, these are two independent
conditions. We can have a bijection that is not volume preserving (e.g. a linear transformation
that stretches one side) or a volume preserving map that is not bijective (e.g. a map from

R to R that maps all rationals to 0 while leaving all the irrationals the same). Yet, a physi-
cally meaningful deterministic and reversible map must do both. Here we have the opposite:
Hamiltonian mechanics implies determinism and reversibility, but is also implying at least
another physical condition, and we need to understand which and whether it is physically
independent.
Let's start from what we have already established: the phase space volume quantifies the number of states in the region. It stands to reason that the area on each degree of freedom identifies the number of configurations for that degree of freedom. Therefore, given two vectors $v = v^q e_{q^x} + v^p e_{p_x}$ and $w = w^q e_{q^x} + w^p e_{p_x}$, constrained on a single degree of freedom $x$, the area of the parallelogram they identify, $v^q w^p - v^p w^q$, quantifies the number of configurations. Therefore $\omega_{ab}$ returns the number of configurations within each degree of freedom. What about between degrees of freedom?
As we saw, the conserved volume, which means the total number of states, is the product of the areas over those degrees of freedom that are orthogonal. In terms of $\omega_{ab}$, $x$ and $y$ are orthogonal if $\omega_{q^x q^y} = \omega_{q^x p_y} = \omega_{p_x q^y} = \omega_{p_x p_y} = 0$. In this case, the volume is simply the product of the phase-space areas on the $x$ and $y$ degrees of freedom. This means that the total number of states is the product of the configurations of each degree of freedom. Physically, it means that
a configuration choice for x does not constrain the configurations for y. This means that the
degrees of freedom are independent.
Given that there are different notions of independence, let us go through an example.
Suppose we have a rabbit farm and describe its state with the number of males and females.
These two variables are independent: if we say there are 231 females it doesn’t, in principle,
tell us anything about the number of males. Now, we may expect the population of both sexes
to be about equal, and we may even find that in most rabbit farms that is the case, but this
does not describe something about the nature of the variables themselves: it describes the
nature of rabbit farms. Chicken farms, for example, would be predominantly females, as those
are the ones that lay eggs.
Now we could choose to describe the rabbit farm with another set of variables: the number
of females and the total number of rabbits. In this case, the variables are not independent.
If we find that there are 231 females, it tells us, in principle, that there must be at least 231
rabbits. Conversely, if we find that there are 231 rabbits, there can only be up to 231 females.
This dependence is not a feature of the rabbit farms. It does not just happen to be that there
are no farms where the number of female rabbits exceeds the total number of rabbits. There
can’t be one.
This type of independence is very different from the notion of independence in terms
of statistics and probability. The latter is in terms of whether the probability distribution
factorizes. That is, if P (f, m) is the probability that a particular rabbit farm has f females
and m males, the distributions of males P (m) and females P (f ) are independent if P (f, m) =
P (f )P (m).
We therefore have two notions of independence. One is on the variables themselves and
whether they can allow (i.e. whether we can measure) different combinations. One is on the
probability distribution we may have in a particular case and whether it factorizes. The
orthogonal directions in phase space, then, are independent in the first, stronger, sense. The
degrees of freedom themselves are independent, regardless of what probability distribution
one may put on top.
One may ask whether there is a link between the two, and in fact there is. Going back to

our rabbits, we can easily see that, given any distribution P (f ) on the females, we can choose
any distribution P (m) on the males and set P (f, m) = P (f )P (m). However, this does not
happen for the total number. Suppose we chose P (f ) and we wanted to find a P (f + m) such
that P (f, f + m) = P (f )P (f + m). The probability of the total number of rabbits could not
change based on the number of females. If the probability of having 231 females is non-zero,
then the probability of having less than 231 total rabbits in that case must be zero. But
since we want the probability of having less than 231 rabbits independent of the number of
females, then it must be zero for all cases. That is, the probability P (f + m) must be zero
for all numbers smaller than the greatest value of f such that P (f ) ≠ 0. If there is no such
greatest value of f , for example f follows a geometric distribution, no P (f + m) can exist that
is independent of P (f ).
The conclusion is that only independent variables can support independent distributions.²²
It should be clear that this observation is something that goes beyond the physical underpin-
ning of Hamiltonian mechanics: it is something that applies to any variable to which we want
to assign a probability distribution. As such, we do not want to expand the scope too much
at this point, though clearly we will need to explore this more in full.
Here we limit ourselves to concluding that the following four conditions
The system is decomposable into independent DOFs. (IND-DOF)
The system allows statistically independent distributions over each DOF. (IND-STAT)
The system allows informationally independent distributions over each DOF. (IND-INFO)
The system allows peaked distributions where the uncertainty is the product of the uncertainty on each DOF. (IND-UNC)
are equivalent. The first means that the count of states factorizes, the second that probability
distributions can factorize, the third that the information entropy can sum, and the fourth that
the determinant of the covariance matrix can factorize. Since only independent variables can
support statistically independent distributions, IND-DOF is equivalent to IND-STAT. Statis-
tical independence of random variables coincides with independence of information entropy,
therefore IND-STAT is equivalent to IND-INFO. The uncertainty for peaked distributions
factorizes if and only if the joint distribution is the product of independent distributions,
therefore IND-STAT is equivalent to IND-UNC.
Clearly these conditions are independent from DR. We can imagine a deterministic and
reversible system that cannot be broken into separate independent degrees of freedom, and we
can imagine a system that can be broken into separate independent degrees of freedom that
does not evolve deterministically or reversibly. The question is whether assuming independent
degrees of freedom and deterministic and reversible evolution is enough to recover Hamiltonian
mechanics.
The first thing to check, then, is whether we have enough constraints to recover ωab .
Assuming that the system can be broken up into independent degrees of freedom, we must be
²² Formally, let $(\Omega, \mathcal{F}, P)$ be a probability space, let $X : \Omega \to E_X$ and $Y : \Omega \to E_Y$ be two random variables and $Z : \Omega \to E_X \times E_Y$ be their joint random variable (i.e. $Z(\omega) = (X(\omega), Y(\omega))$). Then $X$ and $Y$ are independent in this stronger sense if $Z(\Omega) = X(\Omega) \times Y(\Omega)$, and are statistically independent if the cumulative distribution function factorizes: $F_Z(x, y) = F_X(x)F_Y(y)$. Alternatively, they are independent if the $\sigma$-algebra generated by the joint distribution $Z$ is the product of the $\sigma$-algebras generated by $X$ and $Y$, and are statistically independent if the $\sigma$-algebras generated by $X$ and $Y$ are independent in the standard probability sense.

able to define the count of configurations for each degree of freedom, and independent degrees
of freedom must be orthogonal. The fact that ωab will return zero if it acts on directions
belonging to independent degrees of freedom is really telling us that ωab counts not just
configurations, but independent configurations. This, in retrospect, makes sense. But, so far
this seems to restrict ωab to be

$$\omega_{ab} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \otimes \begin{bmatrix} a_1 & 0 & \cdots & 0 \\ 0 & a_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_n \end{bmatrix}. \tag{1.68}$$

The fact that ωab must return the area in each degree of freedom constrains the left matrix of
the outer product to the one above. The fact that ωab needs to return zero across independent
degrees of freedom constrains the right matrix of the outer product to be diagonal. However,
there is nothing, at this point, that seems to constrain the area of each DOF to map to the
same count of configurations. Naturally, we could simply rescale conjugate momentum in each
DOF to homogenize the count, but this would be an arbitrary freedom. Is there something
forcing all the $a_i$ to be the same?
Note that $q^i$ and $p_i$ are not the only variables that form independent degrees of freedom. If we take two independent DOFs $x$ and $y$, then $x + y$ and $x - y$ will also form independent degrees of freedom. That is, $\omega_{ab}$ defines orthogonality for all DOFs, not just those of a particular basis. Changing $x$ and $y$ to $x + y$ and $x - y$ will effectively apply a rotation to the diagonal matrix, which will remain diagonal only if the coefficients on the diagonal are the same.
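This can be made concrete with a short symbolic sketch (ours, using sympy): mixing two DOFs applies a congruence transformation to the diagonal factor, and the off-diagonal terms it generates vanish only when the coefficients are equal.

```python
import sympy as sp

a1, a2 = sp.symbols('a_1 a_2', positive=True)
D = sp.diag(a1, a2)

# Change of DOF basis x, y -> x + y, x - y (the same mixing applied to the
# q's and the p's). A covariant 2-tensor picks up R^{-T} D R^{-1}.
R = sp.Matrix([[1, 1], [1, -1]])
D_new = sp.simplify(R.inv().T * D * R.inv())
print(D_new)
# Off-diagonal entry is (a1 - a2)/4: the new DOFs remain orthogonal
# (the matrix remains diagonal) only when a1 = a2.
```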
This tells us that if we want $\omega_{ab}$ to properly capture the independence of linear combinations of independent DOFs, the diagonal matrix must have the same coefficient throughout. This coefficient represents the freedom we have in choosing the units of $\omega_{ab}$ with respect to the units of everything else. In SI units, by convention, the product between $q^i$ and $p_i$ is in $\mathrm{J \cdot s}$ (i.e. the same units as $h$ and angular momentum) and we set the coefficient to 1. Therefore expressing the number of configurations with the same units for all DOFs is not an extra constraint, but it is necessary to keep track of the dependency relationship for all degrees of freedom.
Therefore the other constitutive assumption of Hamiltonian mechanics is

Assumption IND (Independent DOFs). The system is decomposable into independent de-
grees of freedom. That is, the variables that describe the state can be divided into groups that
have independent definition, units and count of states.

This assumption leads to conditions IND-DOF, IND-STAT, IND-INFO and IND-UNC, which we saw imply the existence of a form $\omega_{ab}$ that defines the independence of DOFs
together with the count of independent configurations for each DOF. Conversely, assuming
HM-ND means defining an ωab such that IND is satisfied. The last question is whether DR
and IND are enough to recover HM-ND.
As we saw, IND by itself means the existence of the counting form ωab . Meanwhile, DR
by itself means the conservation of the total number of states, the volume. These two math-
ematical conditions, by themselves, do not lead to Hamiltonian mechanics. We need that
the counting form itself is conserved, meaning that DOF independence and the configuration
count is preserved. This boils down to the following question: does it make sense, on physical

grounds, to have a deterministic and reversible evolution that takes a system decomposable
into independent degrees of freedom and turns it into a system that is no longer decomposable?
More specifically, can deterministic and reversible evolution take two independent degrees of
freedom and break their independence?
We should remind ourselves that independence here is not statistical independence. Clearly,
Hamiltonian evolution can add correlations, and therefore evolve the product of two indepen-
dent distributions into something that is no longer factorizable. This is not the question at
hand. The notion of independence on the table is the one that tells us that all the combi-
nations of configurations are at least possible. Recall the example of the rabbits: number of
females and total rabbits are not independent because we cannot have more females than rab-
bits. This is the type of independence we are interested in. Can deterministic and reversible
evolution lose this type of independence?
The answer is, as you may expect, no. This is easily seen in the finite case. Suppose we
have two integer variables, 1 ≤ x ≤ 3 and 1 ≤ y ≤ 3. If they are independent, we have a total
of 9 distinct cases. If the evolution is deterministic and reversible, we will still have 9 distinct
cases, which means the variables must remain independent. Note that we may introduce a
correlation between x and y. But still, we need 9 total cases.
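A toy sketch (ours) of the finite case: any deterministic and reversible map on the nine joint states realizes all nine combinations in its image, so the strong independence cannot be lost.

```python
import itertools, random

states = list(itertools.product([1, 2, 3], [1, 2, 3]))  # 9 joint (x, y) states
random.seed(0)
image = random.sample(states, len(states))   # a random deterministic, reversible map
evolve = dict(zip(states, image))

# Reversibility: distinct initial states go to distinct final states...
assert len(set(evolve.values())) == 9
# ...so every (x, y) combination is still realized after the evolution.
assert set(evolve.values()) == set(states)
```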
The issue here is that the case of independent variables is maximal as it posits that all
combinations of configurations are possible. Therefore, the only way to make variables no
longer independent is to decrease the number of distinct cases, which cannot happen during
deterministic and reversible evolution. The same must happen in the infinite or the continuous
case, because finite ranges must still be comparable with finite sizes which must hold the same
property. Therefore when we take both physical assumptions IND and DR, the physics tells
us that independence of degrees of freedom must be preserved, which means preserving ωab
as well. In other words:

Insight 1.69. Hamiltonian mechanics is exactly the deterministic and reversible evolution of
a system decomposable into a finite collection of independent degrees of freedom.

This conclusion is yet another example of why just looking at the math is not enough.
Two physical conditions taken separately may each impose one mathematical condition, but
it is not necessarily true that imposing them together will only impose the conjunction of the
two mathematical conditions. More than a problem with math in general, we believe it is an
indication that the math we currently use is not the “physically correct” one as it does not
seem to be capturing the entirety of the physical conditions.
Note that, in principle, we could ask the evolution to preserve the independence of DOFs
without requiring DR. As we saw, fixing all independent DOFs fixes ωab up to a scalar factor,
which could change during the evolution. The volume would stretch or shrink depending
on the factor, stretching or shrinking each DOF by the same amount. An example of this
would be a particle under linear drag in three dimensions. Since a faster moving object will
be subjected to a greater frictional force, a spread in momentum ∆p will become smaller in
time, and will tend to zero as time increases. Given that the friction coefficient is the same
for all directions, all degrees of freedom will shrink at the same rate. If we understand the
volume as entropy, this tells us that the only way we can add or remove entropy to/from a
system while preserving the independence of the degrees of freedom is by dividing that entropy
contribution equally among each DOF. In other words, preservation of DOF independence
gives us a sort of equipartition of entropy change. It is again striking to find these connections
between disciplines at such a basic level.
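As a numerical aside (a minimal sketch of our own, assuming the drag law dt p = −γp with an arbitrary coefficient γ), one can check that a cloud of momenta contracts by the same factor e^{−γt} along every direction, which is exactly the uniform stretching or shrinking of each DOF described above.

```python
import numpy as np

# Linear drag in three dimensions: dp/dt = -gamma * p, exact solution
# p(t) = p(0) * exp(-gamma * t), identical for every momentum direction.
gamma, t = 0.5, 2.0
rng = np.random.default_rng(0)
p0 = rng.uniform(-1.0, 1.0, size=(1000, 3))  # initial momentum cloud
pt = p0 * np.exp(-gamma * t)

spread0 = p0.max(axis=0) - p0.min(axis=0)
spreadt = pt.max(axis=0) - pt.min(axis=0)
print(spreadt / spread0)  # ~ exp(-1) = 0.3679 in each of the 3 directions
```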
To conclude, we now have a mathematically and physically precise way to characterize
Hamiltonian evolution. We had found that Hamiltonian mechanics did not apply to all cases,
and now we know exactly to which cases it applies: systems described by finitely many inde-
pendent degrees of freedom undergoing deterministic and reversible evolution.

1.6 Reversing differential topology


In the previous sections, we saw that elements of differential topology and differential geometry
started appearing: we employed a generalization of the curl and we used the form ωab to lower
indexes, much like one does in general relativity with the metric tensor gαβ . Since we will use
these tools more and more, we should take a detour and understand the physical significance
of the tools themselves. We will end up with a generalized notion of integral and of differential
operations that work with an arbitrary number of dimensions. We will also conclude that to
reach the full potential of reverse physics, we need to apply the same techniques not just to
the equations themselves, but to the mathematical tools we use to formulate them.
The main reason we are forced to abandon vector calculus in favor of differential topology
and geometry is that vector calculus works in three dimensions but does not generalize to
an arbitrary dimensional space. Special and general relativity, for example, live on a four-
dimensional space-time; Hamiltonian and Lagrangian mechanics live in phase space, which
can have an arbitrarily large number of degrees of freedom. Similarly to what we have done
with the equations, we will start from the expressions of vector calculus, see their limitations,
and construct generalized ones. We warn the reader who is already familiar with differential
topology that this process will give us notation and concepts that are slightly different from
what is used by mathematicians. We will discuss these differences at the end of the section.
The first tool we need to generalize is that of line, surface and volume integrals. For
example, the mass m within a region V can be understood as the sum of the contributions of
a mass density ρ within each infinitesimal volume dV :

m(V) = \iiint_V \rho\, dV.    (1.70)

Similarly, the magnetic flux Φ through a surface Σ can be understood as the sum of the contributions of the magnetic field B⃗ over each infinitesimal dΣ⃗:

\Phi(\Sigma) = \iint_\Sigma \vec{B} \cdot d\vec{\Sigma}.    (1.71)

Lastly, the work W over a path γ can be understood as the sum of the contributions of a force field f⃗ over each infinitesimal segment dγ⃗:

W(\gamma) = \int_\gamma \vec{f} \cdot d\vec{\gamma}.    (1.72)

Note the pattern: the functionals W, Φ and m all take a region of space while f⃗, B⃗ and ρ act on infinitesimal regions. However the pattern is not completely consistent. In the line integral
case, we have the product between a vector representing the force and a vector representing
the displacement along the line. In the surface integral case we have the product between a
pseudo-vector representing the magnetic force and a pseudo-vector representing the normal to
the surface element. In the volume integral case we have the product between a pseudo-scalar
representing the density and a pseudo-scalar representing the volume of the infinitesimal
region. Each operation is slightly ad-hoc. Moreover, a surface has a single perpendicular
direction only in three dimensions. In four dimensions, for example, there are multiple different
perpendiculars to the same plane. Lastly, they all require a notion of product between vectors,
an inner product, which in differential geometry is only defined on Riemannian spaces, those
that define a metric tensor. That is, those spaces in which angles and distances are well
defined. In physics we are so used to working with spaces that have an inner product that it
may seem that all spaces provide one, but that is not the case. In phase space, there is no
way to compare differences in position with differences in momentum, therefore we do not
have a metric to take a scalar product; there is no overall notion of distance and angle. In
general, if we imagine the space that represents all possible outcomes of a blood test, this will
form a manifold as each test can be fully described by a finite set of continuous quantities. In
this space, there is no natural notion of distance and angles between directions, there is no
notion of geometry. As we will see, the notion of integral, the idea of a quantity that can be
understood as the sum of infinitely many infinitesimally small contributions, does not require
either a particular number of dimensions or a notion of distance and angle.
Suppose we understood f⃗, B⃗ and ρ not as above, but as maps that for each infinitesimal region return an infinitesimal contribution dW, dΦ and dm. We could simply write

W(\gamma) = \int_\gamma dW = \int_\gamma f(d\gamma)
\Phi(\Sigma) = \iint_\Sigma d\Phi = \iint_\Sigma B(d\Sigma)
m(V) = \iiint_V dm = \iiint_V \rho(dV).

This pattern is straightforward and more easily generalized. We call these functions of in-
finitesimal regions k-forms, where k is the dimensionality of the infinitesimal region they take
as an argument. The force, in this notation, is a one-form (or covector) as it takes one di-
mensional infinitesimal regions (i.e. vectors); the magnetic field is a two-form; the density is
a three-form. We can also say that a scalar field, like the temperature, is a zero-form, as it
takes points, zero dimensional objects.
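To make the "form as a map on infinitesimal regions" view concrete, here is a small symbolic sketch (path and components chosen arbitrarily by us for illustration): the work is computed by feeding the tangent displacement of a parametrized path to a one-form and summing the infinitesimal contributions.

```python
import sympy as sp

# A one-form f = f_i dx^i applied to the displacement of a parametrized path.
u = sp.symbols('u')
x, y = sp.cos(u), sp.sin(u)            # quarter circle, u in [0, pi/2]
f = (-y, x)                            # components f_x, f_y of the one-form
dx, dy = sp.diff(x, u), sp.diff(y, u)  # infinitesimal displacement per du

W = sp.integrate(f[0] * dx + f[1] * dy, (u, 0, sp.pi / 2))
print(W)  # pi/2: the sum of f(dgamma) over the path
```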
Since k-forms act over infinitesimal regions, they will have some key properties. First, note
that each infinitesimal region can be understood as a parallelepiped, and a parallelepiped is
fully identified by its sides. Therefore, a k-form can be understood as acting on a set of in-
finitesimal displacements, the sides of the parallelepiped, whose number matches the dimen-
sionality of the form. A one-form will take one displacement, a two-form two displacements
and so on. Second, as they are linear functions of the infinitesimal regions, they will also be
linear functions of the vectors that define these infinitesimal regions. Lastly, all forms must be
anti-symmetric because switching the order of the sides does not change the parallelepiped,
but it changes its orientation.

We can write displacements and forms in terms of components and basis elements

dP = dx^i\, e_i
f = f_i\, e^i
B = B_{ij}\, e^i \otimes e^j    (1.73)
\rho = \rho_{ijk}\, e^i \otimes e^j \otimes e^k.
The anti-symmetry means that switching the indexes introduces a minus sign. For example,
Bij = −Bji and ρijk = −ρjik . Therefore, our integrals can be written as

W(\gamma) = \int_\gamma f_i\, dx^i
\Phi(\Sigma) = \iint_\Sigma B_{ij}\, dx_1^i\, dx_2^j    (1.74)
m(V) = \iiint_V \rho_{ijk}\, dx_1^i\, dx_2^j\, dx_3^k.
Each k-form, then, is a fully anti-symmetric covariant tensor, with one index for each dimen-
sion of the form. In this view, the B x component of the magnetic field, then, becomes the Byz
component. That is, instead of being the component that gives us the magnetic flux along the
x direction, it is the component that gives us the flux through the yz plane. Similarly, ρxyz is
better understood not as the value of the mass density at a point, but rather the value of the
density over the xyz volume. The indexes make unit dependency apparent as well: Byz must
be in units of magnetic flux divided by the units of y and z; ρxyz must be in units of mass
divided by the units of x, y and z; in spherical coordinates, Brθ must be in units of magnetic
flux divided by units of r and θ. In fact, one can argue that these tools are here exactly to
keep track of the units of the components.
We can go further, and rewrite the integrals in terms of parametrizations of the surfaces
and the components of the forms along said parametrization. We have
W(\gamma) = \int_\gamma f_i\, dx^i = \int_{a_u}^{b_u} f_i\, \partial_u x^i\, du = \int_{a_u}^{b_u} f_u\, du
\Phi(\Sigma) = \iint_\Sigma B_{ij}\, dx_1^i\, dx_2^j = \int_{a_u}^{b_u} \int_{a_v}^{b_v} B_{ij}\, \partial_u x_1^i\, du\; \partial_v x_2^j\, dv = \int_{a_u}^{b_u} \int_{a_v}^{b_v} B_{uv}\, du\, dv    (1.75)
m(V) = \iiint_V \rho_{ijk}\, dx_1^i\, dx_2^j\, dx_3^k = \int_{a_u}^{b_u} \int_{a_v}^{b_v} \int_{a_w}^{b_w} \rho_{ijk}\, \partial_u x_1^i\, du\; \partial_v x_2^j\, dv\; \partial_w x_3^k\, dw = \int_{a_u}^{b_u} \int_{a_v}^{b_v} \int_{a_w}^{b_w} \rho_{uvw}\, du\, dv\, dw.

These expressions are useful to write the integrals in terms of an integral of k variables. This
is particularly useful if the surfaces possess different symmetries than the forms, which allows
one to use the parametrization appropriately.
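As an illustration of this parametrized evaluation (surface, components and parametrization are our own arbitrary choices), the following sketch integrates a two-form by applying its antisymmetric components to the two sides ∂u x and ∂v x of each infinitesimal parallelogram.

```python
import sympy as sp

# A two-form B_ij with B_xy = -B_yx = 1 integrated over a tilted patch.
u, v = sp.symbols('u v')
X = sp.Matrix([u, v, u + v])                       # embedding (x, y, z)(u, v)
B = sp.Matrix([[0, 1, 0], [-1, 0, 0], [0, 0, 0]])  # antisymmetric B_ij
Xu, Xv = X.diff(u), X.diff(v)                      # sides of the parallelogram

integrand = (Xu.T * B * Xv)[0]                     # B_ij dx_1^i dx_2^j
Phi = sp.integrate(sp.integrate(integrand, (u, 0, 1)), (v, 0, 1))
print(Phi)  # 1: only the xy shadow of the patch contributes to the flux
```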
We now have expressions that are easier to generalize. If we denote S k the space of all
the k-dimensional subregions of an n-dimensional manifold, a k-functional Fk ∶ S k → R is an
additive functional that takes a k-surface σ k and returns a number. This can be expressed as

F_k(\sigma^k) = \int_{\sigma^k} \omega_k(d\sigma^k) = \int_{\sigma^k} \omega_{i_1 i_2 \cdots i_k}\, dx_1^{i_1} dx_2^{i_2} \cdots dx_k^{i_k} = \int_{a_{u_1}}^{b_{u_1}} \int_{a_{u_2}}^{b_{u_2}} \cdots \int_{a_{u_k}}^{b_{u_k}} \omega_{u_1 u_2 \cdots u_k}\, du_1\, du_2 \cdots du_k,    (1.76)

which is the integral of a k-form ωk ∶ V k → R. Here V is the space of vectors, of infinitesimal displacements, and an infinitesimal k-surface dσ k is defined by an ordered set of k vectors,
therefore is an element from the Cartesian product V k = V × V × ⋯ × V . The k-form takes an
infinitesimal k-surface dσ k and returns a number. We now have a way to express local objects
and integrals in a generalized way.
In the spirit of reverse physics, we ask: what are the physical assumptions that are required
to use these mathematical objects? Note that we do not measure mass density ρ directly: we
measure mass m within a finite region V and the size of the finite region V , and we calculate
the mass density by dividing the first by the second. The mass density ρ is the limit for
which the region V shrinks to a point.23 The same happens with the other quantities. We do
not measure a force f directly, but rather the work that a force performs, for example, by
deforming a spring. In other words, we measure the finite value of the functional F over a finite
region σ, and the form ω returns the limit of F for infinitesimal regions. Note that it is crucial
for F to be additive, that is, if σ1 and σ2 are two disjoint regions, F (σ1 ∪ σ2 ) = F (σ1 ) + F (σ2 ).
Physically, it means that the contribution of one sub-region is independent from the other,
because if this isn’t the case, we cannot assign a unique value to each region. While this
may seem a relatively harmless assumption, it may not hold in general. For example, suppose
we have a system distributed in space. Given the mass-energy equivalence, the mass is an
additive functional only if the interaction between the parts can be neglected. In fact, if the
interaction energy at the boundary is so large that it is of the same scale of the mass within
the regions, then it is not true that the total mass is simply the mass within each region.
Therefore, if we write the mass density ρ, we have implicitly assumed that the interaction
energy between parts can be neglected. While this is going to be true in a large number of
cases, it is something we have to keep in mind. The functional F , in fact, is a more general
physical object as it may exist even though infinitesimal additivity fails.
If the functional F respects the limits, in the sense that small variations of the region
result in small variations of the value of the functional, then we can express the functional F
as the integral of a form ω.24 Note that the standard mathematical definitions are inverted
with respect to what makes physical sense. Mathematically, we first define the form and then
its integration, and we may worry whether the integral exists or diverges; physically, we first
define the functional of finite regions and worry about the existence of an infinitesimal limit,
the form, under additional physical assumptions. This is important because, as we said before,
it tells us that it is the functional that survives when the assumption fails, not the form. It
also means that, if we want to get a robust physical intuition, we should concentrate on the
functionals, the finite objects, rather than the forms, the infinitesimal objects.
We therefore reach the following:

Insight 1.77. Differential k-forms represent the infinitesimal contributions of an infinitesimally additive quantity Fk that depends on a k-dimensional surface.

23. This is essentially the definition of the Radon-Nikodym derivative used in measure theory.
24. Mathematically, one needs to be precise as to what these small variations are, and what the space of regions is. Effectively, we need to define what it means for k-surfaces to be differentiable. This is more in the scope of physical mathematics than reverse physics.

The issue, however, is that while we have a sense that, mutatis mutandis, an infinitesimally additive functional corresponds to a differential form, we would need to show that differential
forms cannot represent anything else. We currently do not have such an argument, though
we suspect it should exist. One may first note that no measurement can really be conducted
at a point, and therefore it has to extend, at least in principle, over a k-dimensional region.
Therefore we could argue that all measurements are functionals, additive or not. This means
that we would need a map between the value of the components of the form at all points and
the value of the functional over all regions. To reach our goal, we would need to show that
this map can be established only if the functional is infinitesimally additive, which may not
be a strong enough condition by itself. While we are already implicitly assuming the map to
be differentiable, we could impose further requirements on the measurement functional. We
could impose locality, in the sense that changes of the form within a region must change the
value of the functional for said region; it would also mean that changes outside of the region
do not affect measurements in the region. While we do not know whether these requirements
are sufficient, these are the types of questions we would need to answer in order to have a
fully physically meaningful treatment of differential forms.
Having generalized the idea of integration, let us now turn to differential operators such
as gradient, curl and divergence. Suppose that T is a scalar field, like the temperature. Then
we have the following relationship

\int_\gamma \vec{\nabla} T \cdot d\vec{\gamma} = T(B) - T(A).    (1.78)

That is, the integral of the gradient of T along γ gives us the difference of T evaluated at the
endpoints, the boundary of the line. If f⃗ is a vector field, like the force, we have

\iint_\Sigma \vec{\nabla} \times \vec{f} \cdot d\vec{\Sigma} = \int_{\gamma = \partial\Sigma} \vec{f} \cdot d\vec{\gamma} = W(\gamma)    (1.79)

That is, the integral of the curl of f⃗ over a surface Σ is equal to the line integral of f⃗ over the
boundary. If B⃗ is a pseudo-vector field, like the magnetic field, we have

\iiint_V \vec{\nabla} \cdot \vec{B}\, dV = \iint_{\Sigma = \partial V} \vec{B} \cdot d\vec{\Sigma} = \Phi(\Sigma).    (1.80)

That is, the integral of the divergence of B⃗ over a region V is equal to the surface integral of B⃗ over the boundary. Note the pattern: the integral of the differential operator applied to the bulk becomes the integral of the original object over the boundary.
Let us understand how this works in terms of functionals. Given the temperature T (P ),
we can construct a line functional that for each line returns the difference in temperature at
the endpoints. Given the work line functional W (γ), we can construct a surface functional
that for each surface returns the work needed to go around the contour. Given the magnetic
flux surface functional Φ(Σ), we can construct a volume functional that for each volume
returns the magnetic flux over the boundary. In general, suppose we have a k-functional
Fk ∶ S k → R. To each k-surface σ k we can associate a quantity Fk (σ k ). Now, suppose we are
given a (k + 1)-surface σ k+1 . While Fk cannot act on σ k+1 , the boundary ∂σ k+1 is a k-surface,
therefore we can evaluate Fk (∂σ k+1 ). Therefore, we can define the (k + 1)-functional ∂Fk
such that ∂Fk (σ k+1 ) ↦ Fk (∂σ k+1 ), which we call the exterior functional of Fk.25

25. Mathematically, we would have to prove that ∂Fk is a functional. Again, these mathematical details are left for the physical mathematics section. For now, we are more interested in the conceptual understanding.

Since the
exterior functional is an additive functional, it will have a corresponding form that acts on
the infinitesimal region. Conceptually, the gradient of T is the one-form that corresponds to
the boundary functional of T ; the curl of f⃗ is the two-form that corresponds to the boundary
functional of the work W , the line integral of f⃗; the divergence of B⃗ is the three-form that
corresponds to the boundary functional of the magnetic flux Φ, the surface integral of B⃗. The
problem is, again, that the gradient, curl and divergence are not expressed in a way that is
easy to generalize.
Given that all three operators are in terms of ∇, which in components is written ∂i , we
would like to write something along the lines of

\partial F_k(\sigma^{k+1}) = \int_{\sigma^{k+1}} \partial_{i_0} \wedge \omega_{i_1 i_2 \cdots i_k}\, dx^{i_0} dx^{i_1} dx^{i_2} \cdots dx^{i_k} = \int_{\partial\sigma^{k+1}} \omega_{i_1 i_2 \cdots i_k}\, dx^{i_1} dx^{i_2} \cdots dx^{i_k} = F_k(\partial\sigma^{k+1}).    (1.81)

The operation ∧, which we call exterior product, must be such that we recover something consistent with the previous operations. That is, we need

∂i ∧ T = ∂i T
∂i ∧ Fj = ∂i Fj − ∂j Fi (1.82)
∂i ∧ Bjk = ∂i Bjk + ∂j Bki + ∂k Bij

These would be the expressions we need to recover the gradient, curl and divergence respec-
tively. Let us study them to see the pattern. First of all, each expression takes a k-form and
returns a (k + 1)-form by adding a derivation along each index. Given that we have a deriva-
tion for each index, the number of terms matches the number of indexes of the final form,
which is k + 1. Each term changes index by taking a cyclic permutation. Recall that the forms
are anti-symmetric, therefore each permutation of two indexes introduces a minus sign. A
cyclic permutation of k + 1 elements corresponds to k pair swaps. If k + 1 is odd, then, each
cyclic permutation will correspond to an even number of sign switches, which cancel out. The
pattern, then, generalizes in the following way

\partial_{i_0} \wedge \omega_{i_1 i_2 \cdots i_k} = \partial_{i_0} \omega_{i_1 i_2 \cdots i_k} + (-1)^k\, \partial_{i_1} \omega_{i_2 \cdots i_k i_0} + (-1)^{2 \cdot k}\, \partial_{i_2} \omega_{i_3 \cdots i_k i_0 i_1} + \cdots + (-1)^{k \cdot k}\, \partial_{i_k} \omega_{i_0 i_1 \cdots i_{k-1}}
= \sum_{j=0}^{k} (-1)^{j \cdot k}\, \partial_{i_{j \bmod (k+1)}}\, \omega_{i_{j+1 \bmod (k+1)}\, i_{j+2 \bmod (k+1)} \cdots\, i_{j+k \bmod (k+1)}}.    (1.83)

The use of mod k + 1 in the generalized expression makes sure that the index jumps from
ik to i0 . This gives us a fully anti-symmetric tensor which matches the gradient, curl and
divergence in the simple cases.26 This operation is called the exterior derivative.
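As a sanity check (a sketch of our own), one can verify symbolically that for k = 1 the cyclic-permutation formula reduces to the curl of an arbitrary field in three dimensions.

```python
import sympy as sp

# Exterior derivative of a one-form, (d f)_{ij} = d_i f_j - d_j f_i,
# compared against the familiar curl components.
x, y, z = sp.symbols('x y z')
coords = (x, y, z)
f = [sp.Function(n)(x, y, z) for n in ('fx', 'fy', 'fz')]

df = sp.Matrix(3, 3, lambda i, j: sp.diff(f[j], coords[i]) - sp.diff(f[i], coords[j]))
curl = (sp.diff(f[2], y) - sp.diff(f[1], z),
        sp.diff(f[0], z) - sp.diff(f[2], x),
        sp.diff(f[1], x) - sp.diff(f[0], y))

# The (y,z), (z,x), (x,y) components of df are the x, y, z curl components.
assert (df[1, 2], df[2, 0], df[0, 1]) == curl
print("curl recovered from the antisymmetrized derivative")
```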
While we have the expression for the exterior derivative, we would like to understand
why and how the expression works. Geometrically, we can imagine integrating along a paral-
lelepiped, which becomes

\int_{\sigma^{k+1}} (\partial \wedge \omega_k)(d\sigma^{k+1}) = \int_{\sigma^{k+1}} \partial_{i_0} \wedge \omega_{i_1 i_2 \cdots i_k}\, dx_0^{i_0} dx_1^{i_1} \cdots dx_k^{i_k}

= \int_{a_{u_0}}^{b_{u_0}} \int_{a_{u_1}}^{b_{u_1}} \cdots \int_{a_{u_k}}^{b_{u_k}} \left( \partial_{u_0} \omega_{u_1 u_2 \cdots u_k} + (-1)^k\, \partial_{u_1} \omega_{u_2 \cdots u_k u_0} + \cdots + (-1)^{k \cdot k}\, \partial_{u_k} \omega_{u_0 u_1 \cdots u_{k-1}} \right) du_0\, du_1\, du_2 \cdots du_k

= \int_{a_{u_1}}^{b_{u_1}} \cdots \int_{a_{u_k}}^{b_{u_k}} \int_{a_{u_0}}^{b_{u_0}} du_0\, \partial_{u_0} \omega_{u_1 u_2 \cdots u_k}\, du_1\, du_2 \cdots du_k
  + (-1)^k \int_{a_{u_0}}^{b_{u_0}} \int_{a_{u_2}}^{b_{u_2}} \cdots \int_{a_{u_k}}^{b_{u_k}} \int_{a_{u_1}}^{b_{u_1}} du_1\, \partial_{u_1} \omega_{u_2 \cdots u_k u_0}\, du_0\, du_2 \cdots du_k
  + \cdots + (-1)^{k \cdot k} \int_{a_{u_0}}^{b_{u_0}} \int_{a_{u_1}}^{b_{u_1}} \cdots \int_{a_{u_k}}^{b_{u_k}} du_k\, \partial_{u_k} \omega_{u_0 u_1 \cdots u_{k-1}}\, du_0\, du_1 \cdots du_{k-1}

= \left[ \int_{a_{u_1}}^{b_{u_1}} \cdots \int_{a_{u_k}}^{b_{u_k}} \omega_{u_1 u_2 \cdots u_k}\, du_1\, du_2 \cdots du_k \right]_{a_{u_0}}^{b_{u_0}}
  + (-1)^k \left[ \int_{a_{u_0}}^{b_{u_0}} \int_{a_{u_2}}^{b_{u_2}} \cdots \int_{a_{u_k}}^{b_{u_k}} \omega_{u_2 \cdots u_k u_0}\, du_0\, du_2 \cdots du_k \right]_{a_{u_1}}^{b_{u_1}}
  + \cdots + (-1)^{k \cdot k} \left[ \int_{a_{u_0}}^{b_{u_0}} \int_{a_{u_1}}^{b_{u_1}} \cdots \int_{a_{u_{k-1}}}^{b_{u_{k-1}}} \omega_{u_0 u_1 \cdots u_{k-1}}\, du_0\, du_1 \cdots du_{k-1} \right]_{a_{u_k}}^{b_{u_k}}

= \int_{\partial\sigma^{k+1}} \omega_k(d\partial\sigma^{k+1}).

26. In principle, we could write the expression in different equivalent ways. Here we used cyclic permutations as it gives a more elegant expression.

What happens is that each direction of integration will match a derivative in the same direc-
tion, and therefore will reduce to the integration of ω on opposing sides of the parallelepiped.
This happens for each direction, and therefore the whole integral will reduce to the integra-
tion of ω on the surface of the parallelepiped. We have verified that equation 1.81, which is
known as the generalized Stokes theorem, indeed works and it includes, as particular cases,
the gradient theorem 1.78, the curl theorem 1.79 and the divergence theorem 1.80.
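The k = 1 case is easy to check end to end on the unit square; in the following sketch (one-form components chosen arbitrarily by us), the bulk integral of the exterior derivative equals the boundary line integral.

```python
import sympy as sp

# Stokes check on the unit square: integral of (d_x f_y - d_y f_x) over the
# bulk versus the line integral of f over the counterclockwise boundary.
x, y, t = sp.symbols('x y t')
fx, fy = x**2 * y, x + y**3
bulk = sp.integrate(sp.diff(fy, x) - sp.diff(fx, y), (x, 0, 1), (y, 0, 1))

edges = [((t, 0), (1, 0)), ((1, t), (0, 1)),        # (point(t), tangent)
         ((1 - t, 1), (-1, 0)), ((0, 1 - t), (0, -1))]
boundary = sum(sp.integrate(fx.subs({x: px, y: py}) * tx
                            + fy.subs({x: px, y: py}) * ty, (t, 0, 1))
               for (px, py), (tx, ty) in edges)

assert sp.simplify(bulk - boundary) == 0
print(bulk, boundary)  # 2/3 2/3
```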
Another aspect of vector calculus is given by the following identities
\vec{\nabla} \times \vec{\nabla} T = 0
\vec{\nabla} \cdot \vec{\nabla} \times \vec{f} = 0.    (1.84)
To generalize them, we note that the exterior product ∧ is an anti-commutative and associative
operation. Therefore we have

∂i ∧ ∂j ∧ ωl1 l2 ⋯lk = (∂i ∂j − ∂j ∂i ) ∧ ωl1 l2 ⋯lk = 0 ∧ ωl1 l2 ⋯lk = 0. (1.85)

In other words, the exterior derivative applied twice returns zero, no matter on what form it
is applied. Given that the curl is the exterior derivative applied to one forms and the gradient
is the exterior derivative applied to zero forms, the fact that the curl of the gradient is zero is
simply an application of the more general property. The same applies for the divergence of the
curl. As we saw, mathematically the property is easy enough to verify, but we get no insight
into its meaning. To understand the geometrical significance of the generalized relationship,
recall that the exterior derivative of the form is associated with the exterior functional. We
should, then, look at what happens when we construct the exterior functional of an exterior
functional. We have

∂∂Fk (σ k+2 ) = ∂Fk (∂σ k+2 ) = Fk (∂∂σ k+2 ) = Fk (∅) = 0. (1.86)



In words, the exterior of the exterior functional of Fk equals Fk applied to the boundary of the
boundary. However, the boundary of a boundary is always the empty set, and any functional
applied to the empty set must be zero. In fact, since functionals are additive, we must have

Fk (σ k ) = Fk (σ k ∪ ∅) = Fk (σ k ) + Fk (∅). (1.87)

Therefore, we can see that the identity ∂ ∧ ∂ ∧ ω = 0 for every form ω is ultimately a direct consequence of the fact that ∂∂U = ∅ for every set U.27 This shows how studying differential relationships
in terms of finite functionals can give a more geometrically meaningful picture.
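The k = 1 instance of this identity is the familiar div(curl f⃗) = 0, which can be verified symbolically for an arbitrary field (a minimal sketch of our own):

```python
import sympy as sp

# Applying the exterior derivative twice returns zero: in three dimensions,
# div(curl f) = 0 for any smooth field, the analogue of F applied to the
# boundary of a boundary, i.e. to the empty set, being zero.
x, y, z = sp.symbols('x y z')
f = [sp.Function(n)(x, y, z) for n in ('fx', 'fy', 'fz')]
curl = [sp.diff(f[2], y) - sp.diff(f[1], z),
        sp.diff(f[0], z) - sp.diff(f[2], x),
        sp.diff(f[1], x) - sp.diff(f[0], y)]
div_curl = sp.diff(curl[0], x) + sp.diff(curl[1], y) + sp.diff(curl[2], z)
print(sp.simplify(div_curl))  # 0
```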
The last element of vector calculus we need to generalize is the idea of potentials. That
is,

\vec{\nabla} \times \vec{f} = 0 \Rightarrow \vec{f} = \vec{\nabla} V
\vec{\nabla} \cdot \vec{B} = 0 \Rightarrow \vec{B} = \vec{\nabla} \times \vec{A}.    (1.88)

These are generalized by the following formula

∂i ∧ ωl1 l2 ⋯lk = 0 ⇒ ωl1 l2 ⋯lk = ∂l1 ∧ θl2 ⋯lk . (1.89)

That is, if the exterior derivative of a k-form is zero, then there exists a (k − 1)-form whose
exterior derivative is the original k-form.28
To sum up, we have generalized the idea of integration over k-dimensional submanifolds
of an n-dimensional space, which leads to the idea of k-functionals over finite regions and k-
forms over the infinitesimal ones; we have seen that the finite functionals are physically more
fundamental than the infinitesimal forms. We have seen how k-functionals induce (k + 1)-
functionals by acting on the boundaries of the k + 1 dimensional regions. We have seen that
the exterior derivative gives us the form associated to the exterior functionals, and that this
operation generalizes the notion of gradient, curl and divergence. While this does not exhaust
all that can be done in differential topology and differential geometry, this is enough for what
we need to use in the following sections.
27. Note that we are talking about the boundary of a manifold, not the boundary of a set in the topological sense.
28. Technically, closed forms (i.e. those whose exterior derivative is zero) are not necessarily exact forms (i.e. those that are the exterior derivative of another form). This is true only on contractible regions (i.e. those regions that can be continuously shrunk to a point, that do not have holes). While this is a subtle mathematical point, we can understand it by looking at the corresponding functionals. An exact form corresponds to a functional that returns zero for any closed surface. A closed form, however, is guaranteed to return zero only on closed surfaces that are contractible, that can be continuously shrunk to a point. For example, the functional associated with a closed form may return non-zero over a closed surface that encloses a hole, something the functional associated with an exact form cannot do. Therefore, all exact forms are closed, but not all closed forms are exact. However, if we restrict ourselves to a contractible region, the two definitions are the same.

Those already familiar with differential topology will have noticed that our notation and definitions do not quite match the ones typically used in math textbooks. The issue is that said notation and definitions do not match what we need to physically capture, and therefore it would be a mistake to employ them. Let us briefly see why. First of all, in the context of differential topology vectors are defined as directional derivatives. That is, a vector v = v i ∂i is an operator that acts on scalar functions. A velocity, which in physics we think of as a vector, is not a directional derivative. In differential topology, a covector θ = θi dxi is defined
as a map from a vector to a scalar number. Conjugate momentum, whose components change
as a covector, is not a map. In differential topology, a differential dxi is a covector, and it
is the exterior derivative of the coordinate xi . This means that differentials are maps such
that dxi (∂xj ) = δji . In physics, this is not how we think of differentials and integration. In
fact, consider the expression ∫ fi dxi . To write it in terms of invariant objects, we would
have f = fi ei and dx = dxi ei , where ei and ei are the co-basis and basis respectively, and
therefore ei (ej ) = δji . So we obtain ∫ f (dx) = ∫ fi ei (dxj ej ) = ∫ fi dxi = ∫ df . Therefore, in
physics, the differential dx is more properly thought of as a vector, where the dxi are the
contravariant components, while f as a map from a vector dx to the differential df , which is
what is integrated. Therefore the differential is the vector, while the force is the covector. This
is in contrast to the use in differential topology. Moreover, in differential topology there is a
notion of a single tangent space where all vectors live, which is not compatible with the idea of
units. Consider the basis ∂i . Given that coordinates are expressed in different units, we cannot
simply sum derivatives along different directions. For example, in polar coordinates ∂r may
have units of inverse meters while ∂θ of inverse radians. Worst of all, a directional derivative
is taken with respect to a parameter, which will also have units and physical dimension. For
example, if v represents a velocity, the components v i would also depend on the units of space
and time. This would mean that units of vectors, the tangent space of a manifold, depend
not only on the physical dimensions of the space, but also on the physical dimensions of all
possible parameters along which we may want to define a directional derivative. This would
mean that the tangent space is not definable only in terms of the units of the manifold itself,
and therefore is not defined just in terms of the manifold itself.
The takeaway message here is the following: the mathematical tools we inherit from math-
ematics are not necessarily designed to capture the physical relationships we need to capture.
Mathematicians only care about formal definitions, regardless of what, or if, they represent
physically. In physics we do not have this luxury. If we want to have meaningful physical
theories, which is ultimately the goal of reverse physics, we need to revisit the mathematical
tools we use to formulate them.

1.7 Reversing Lagrangian mechanics


Now that we have a good geometric and physical feel for Hamiltonian mechanics, and that
we have a general understanding of what the tools of differential topology describe physically,
we will analyze Lagrangian mechanics more in detail. Conceptually, we already know that
Lagrangian mechanics is Hamiltonian mechanics plus assumption KE. We will see that the
flow of states in phase space admits a vector potential and the Lagrangian is the scalar product
between that potential and the displacement along the path. The principle of least action is
geometrically equivalent to asking for paths that are always tangent to the displacement field.
Moreover, the principle of least action is better understood as a property of Hamiltonian
evolution, and assumption KE is only required to express the product between potential and
displacement in terms of velocity instead of momentum.

Kinematic assumption revisited


As we saw in section 1.2, Lagrangian mechanics is the subset of Hamiltonian mechanics for
which assumption KE is valid, which means Lagrangian mechanics is equivalent to assuming
DR, IND and KE. For the first two assumptions, we found a host of equivalent mathematical,
geometric and physical formulations. Let’s see what we can find for KE.
First of all, let us summarize the conditions we have already found. Hamilton’s equations
always impose that the velocity is a function of the state variables:
v i = dt q i = ∂pi H. (1.90)
At fixed position, the Jacobian of the transformation is therefore the Hessian of the Hamilto-
nian
∂pi v j = ∂pi ∂pj H. (1.91)
Therefore condition

At every position, the relationship between momentum and velocity is invertible and differentiable (WKE-INV)

and condition

At every point, the Hessian of the Hamiltonian is non-singular (hyperregularity of H): ∣∂pi ∂pj H∣ ≠ 0 (WKE-HYP)
are equivalent to each other. The non-singularity of the Hessian can also be understood as
strict monotonicity of the derivative along momentum. Therefore
The Hamiltonian is twice differentiable and concave (or convex) in momentum (WKE-CONC)
is another equivalent condition.
Note that the relationship between velocity and momentum is not just invertible, but
differentiable as well. This may seem like an additional condition from KE, but recall that the
dynamics is not in terms of points, but rather cells of phase space and density distributions. In
order to express those geometric elements in terms of the kinematic variables, we must make
sure that the Jacobian determinant of the transformation from state variables to kinematic
variables is well-defined and non-zero. We have
|J| = \begin{vmatrix} \partial_{q^j} x^i & \partial_{p_j} x^i \\ \partial_{q^j} v^i & \partial_{p_j} v^i \end{vmatrix} = \begin{vmatrix} \delta_j^i & 0 \\ \partial_{q^j} v^i & \partial_{p_j} v^i \end{vmatrix} = |\delta_j^i|\, |\partial_{p_j} v^i| - |0|\, |\partial_{q^j} v^i| = |\partial_{p_j} v^i|.    (1.92)
The Jacobian determinant of the change of variables, then, coincides with the Jacobian deter-
minant of the relationship between velocity and momentum at constant position. Therefore
the following conditions are all equivalent to each other and to condition WKE-INV.
The Jacobian of the transformation between state variables and kinematic variables is non-singular. (WKE-NSIN)
Densities over phase space can be expressed in terms of position and velocity: ρ(xi , v j )∣J∣ = ρ(q i , pj ). (WKE-DEN)
Areas and volumes in phase space can be expressed in kinematic variables: dx1 ⋯dxn dv 1 ⋯dv n = ∣J∣ dq 1 ⋯dq n dp1 ⋯dpn . (WKE-VOL)
The symplectic form ωab can be expressed in kinematic variables. (WKE-SYMP)
The displacement field S a can be expressed in kinematic variables. (WKE-DISP)
The insight is that differentiability between state variables and kinematic variables is
required to be able to express the objects we used to characterize the deterministic and
reversible dynamics. The only physical requirement, then, is to be able to express the dynamics
in terms of the kinematics, which is exactly what assumption KE already requires.

On the physical meaning of the Lagrangian


A key problem is to understand what the Lagrangian represents physically. It is often intro-
duced as the difference between kinetic and potential energy. However, this does not work in
general. Take the Lagrangian for a particle with charge q under an electromagnetic field with
electric potential V and magnetic potential Ai .

L = \frac{1}{2} m |v^i|^2 + q v^i A_i - qV.    (1.93)
The first term is clearly kinetic energy, the last clearly potential energy, but what about
the middle term? It depends on velocity, so it would appear to be a kinetic term, but it
also depends on the potential. It’s both and neither. The characterization of the Lagrangian as the difference between kinetic and potential energy, then, works for some systems but not in general, and is best abandoned.
Another problem in understanding what the Lagrangian represents is that it is not unique.
Now, a similar problem exists for the Hamiltonian, in the sense that we can add an arbitrary constant to any Hamiltonian without changing the equations of motion. However, the de-
generacy for a Lagrangian is far worse. For example, let f (xi , t) be an arbitrary function of
position and time. We can set

L′ = L + ∂xi f (xi , t)v i + ∂t f (xi , t). (1.94)

We can see how the Euler-Lagrange equations 1.3 are affected

0 = \partial_{x^i} L' - d_t \partial_{v^i} L' = \partial_{x^i} L + \partial_{x^i}\partial_{x^j} f\, v^j + \partial_{x^i}\partial_t f - d_t(\partial_{v^i} L + \partial_{x^i} f)
  = \partial_{x^i} L - d_t \partial_{v^i} L + \partial_{x^i}\partial_{x^j} f\, v^j + \partial_{x^i}\partial_t f - \partial_{x^j}\partial_{x^i} f\, d_t x^j - \partial_t \partial_{x^i} f\, d_t t    (1.95)
  = \partial_{x^i} L - d_t \partial_{v^i} L + \partial_{x^i}\partial_{x^j} f\, v^j + \partial_{x^i}\partial_t f - \partial_{x^j}\partial_{x^i} f\, v^j - \partial_t \partial_{x^i} f
  = \partial_{x^i} L - d_t \partial_{v^i} L.

That is, the equations of motion given by L′ are the same as the ones given by L. Therefore
the actual value, or the difference of values, of the Lagrangian is physically meaningless. This
makes the question even more puzzling: what is the Lagrangian?
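Equation 1.95 can also be checked symbolically. The following sketch (with a concrete gauge function f(x, t) = t x², chosen by us for simplicity) confirms that adding a total time derivative to a Lagrangian leaves the Euler-Lagrange equations unchanged.

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

# L' = L + df/dt for f(x, t) = t*x**2 gives the same equations of motion.
t, m = sp.symbols('t m')
x = sp.Function('x')
V = sp.Function('V')

L = m * x(t).diff(t)**2 / 2 - V(x(t))
Lp = L + sp.diff(t * x(t)**2, t)          # add a total time derivative

eq = euler_equations(L, [x(t)], t)[0]
eqp = euler_equations(Lp, [x(t)], t)[0]
print(sp.simplify(eq.lhs - eqp.lhs))      # 0
```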

The extended phase space


Given that we have a good understanding of Hamiltonian mechanics, let’s work on the equa-
tions that link the two. We have

L = p_i v^i - H = p_i\, d_t q^i - H\, d_t t = \begin{bmatrix} p_i & 0 & -H \end{bmatrix} \begin{bmatrix} d_t q^i \\ d_t p_i \\ d_t t \end{bmatrix}.    (1.96)
The Lagrangian, then, can be understood as the scalar product of two vectors. The second
one is the displacement along the path, however it is the displacement not just in position and
momentum, but over time as well. The correct setting to understand Lagrangian mechanics,
then, is phase space extended by the time variable. If we redefine ξ a = [q i pi t] to include the
time variable, we have

S a = dt ξ a = [dt q i dt pi dt t] . (1.97)

We can then define the new vector θa in terms of its components

θa = [pi 0 −H] . (1.98)

We also need to generalize ωab to the extended phase space. Looking at equation HM-G,
the idea is to put the gradient of the Hamiltonian in the time component. If we set

\omega_{ab} = \begin{bmatrix} \omega_{q^i q^j} & \omega_{q^i p_j} & \omega_{q^i t} \\ \omega_{p_i q^j} & \omega_{p_i p_j} & \omega_{p_i t} \\ \omega_{t q^j} & \omega_{t p_j} & \omega_{t t} \end{bmatrix} = \begin{bmatrix} 0 & \delta_j^i & \partial_{q^i} H \\ -\delta_i^j & 0 & \partial_{p_i} H \\ -\partial_{q^j} H & -\partial_{p_j} H & 0 \end{bmatrix},    (SF-EPS)

we have
S^a \omega_{a q^j} = S^{q^i} \omega_{q^i q^j} + S^{p_i} \omega_{p_i q^j} + S^t \omega_{t q^j} = -S^{p_j} - S^t \partial_{q^j} H = -S^{p_j} - \partial_{q^j} H = 0
S^a \omega_{a p_j} = S^{q^i} \omega_{q^i p_j} + S^{p_i} \omega_{p_i p_j} + S^t \omega_{t p_j} = S^{q^j} - S^t \partial_{p_j} H = S^{q^j} - \partial_{p_j} H = 0    (1.99)
S^a \omega_{a t} = S^{q^i} \omega_{q^i t} + S^{p_i} \omega_{p_i t} + S^t \omega_{t t} = S^{q^i} \partial_{q^i} H + S^{p_i} \partial_{p_i} H = \partial_{p_i} H\, \partial_{q^i} H - \partial_{q^i} H\, \partial_{p_i} H = 0.

This means that, on the phase space extended by time, Hamilton’s equations become

Sa = S b ωba = 0. (HM-EPS)

Note that while the position and momentum components of Sa still perform a rotation, the
time component does not. So we can’t understand Sa as a rotated displacement. Recall that
v a ωab wb = vb wb quantified the number of states in the parallelepiped formed by v a and wb . If
vb wb = 0, then the parallelepiped does not identify states on an independent DOF. For each
vector v a , the covector va identifies the direction that forms an independent DOF with v a .
If we only have position and momentum, we can see that we get the direction rotated by ninety degrees along each DOF. If we extend phase space with time, time does not add a
new independent degree of freedom. In fact, the direction given by the displacement field S a
should give us no new independent states: there should be no direction v b in phase space such
that S a ωab v b ≠ 0. In other words, Sb = S a ωab must be zero and this is what equation HM-EPS
says.
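A minimal symbolic check of HM-EPS for a single degree of freedom (a sketch of our own, with an arbitrary Hamiltonian H(q, p)):

```python
import sympy as sp

# The extended displacement field S^a = (dH/dp, -dH/dq, 1) is the
# degenerate direction of the extended two-form: S^a omega_ab = 0.
q, p = sp.symbols('q p')
H = sp.Function('H')(q, p)
Hq, Hp = sp.diff(H, q), sp.diff(H, p)

S = sp.Matrix([[Hp, -Hq, 1]])              # row vector S^a in (q, p, t)
omega = sp.Matrix([[0, 1, Hq],
                   [-1, 0, Hp],
                   [-Hq, -Hp, 0]])         # SF-EPS for one degree of freedom
print(sp.simplify(S * omega))              # Matrix([[0, 0, 0]])
```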

Potential of the flow - 1 DOF case


We wrote the Lagrangian L = θa dt ξ a in terms of θa and dt ξ a , but we have yet to understand
what θa is. If we compare it to ωab , we note that the first has the Hamiltonian as a component,
while the second has its derivative. Given that ωab is anti-symmetric, it is a two-form, we may
want to calculate the anti-symmetrized derivative of θa , the exterior derivative. We have

\partial_a \theta_b - \partial_b \theta_a =
\begin{bmatrix}
\partial_{q^i}\theta_{q^j} - \partial_{q^j}\theta_{q^i} & \partial_{q^i}\theta_{p_j} - \partial_{p_j}\theta_{q^i} & \partial_{q^i}\theta_t - \partial_t\theta_{q^i} \\
\partial_{p_i}\theta_{q^j} - \partial_{q^j}\theta_{p_i} & \partial_{p_i}\theta_{p_j} - \partial_{p_j}\theta_{p_i} & \partial_{p_i}\theta_t - \partial_t\theta_{p_i} \\
\partial_t\theta_{q^j} - \partial_{q^j}\theta_t & \partial_t\theta_{p_j} - \partial_{p_j}\theta_t & \partial_t\theta_t - \partial_t\theta_t
\end{bmatrix}
= \begin{bmatrix}
\partial_{q^i} p_j - \partial_{q^j} p_i & \partial_{q^i} 0 - \partial_{p_j} p_i & \partial_{q^i}(-H) - \partial_t p_i \\
\partial_{p_i} p_j - \partial_{q^j} 0 & \partial_{p_i} 0 - \partial_{p_j} 0 & \partial_{p_i}(-H) - \partial_t 0 \\
\partial_t p_j - \partial_{q^j}(-H) & \partial_t 0 - \partial_{p_j}(-H) & \partial_t(-H) - \partial_t(-H)
\end{bmatrix}    (1.100)
= \begin{bmatrix}
0 - 0 & 0 - \delta_i^j & -\partial_{q^i} H - 0 \\
\delta_j^i - 0 & 0 - 0 & -\partial_{p_i} H - 0 \\
0 + \partial_{q^j} H & 0 + \partial_{p_j} H & -\partial_t H + \partial_t H
\end{bmatrix}
= \begin{bmatrix}
0 & -\delta_i^j & -\partial_{q^i} H \\
\delta_j^i & 0 & -\partial_{p_i} H \\
\partial_{q^j} H & \partial_{p_j} H & 0
\end{bmatrix}.
This means that the form ωab is minus the exterior derivative of θa :

\omega_{ab} = -(\partial_a \theta_b - \partial_b \theta_a) = -\partial_a \wedge \theta_b
\theta_a = [p_i \;\; 0 \;\; -H].    (1.101)

In other words, ωab has a null exterior derivative and θa is its potential.
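This is straightforward to verify symbolically; a minimal sketch, assuming the single-DOF extended phase space (q, p, t) with θa = [p 0 −H]:

```python
import sympy as sp

# omega_ab = -(d_a theta_b - d_b theta_a) for theta = (p, 0, -H)
# reproduces the extended symplectic form SF-EPS.
q, p, t = sp.symbols('q p t')
H = sp.Function('H')(q, p, t)
coords = (q, p, t)
theta = (p, sp.Integer(0), -H)

omega = sp.Matrix(3, 3, lambda a, b: -(sp.diff(theta[b], coords[a])
                                       - sp.diff(theta[a], coords[b])))
print(omega)  # [[0, 1, dH/dq], [-1, 0, dH/dp], [-dH/dq, -dH/dp, 0]]
```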
Given that the extended phase space for a single degree of freedom is three dimensional,
we can use standard vector calculus to gain more understanding. In this case, we have

\theta_a = [p \;\; 0 \;\; -H]
\omega_{ab} = \begin{bmatrix} 0 & 1 & \partial_q H \\ -1 & 0 & \partial_p H \\ -\partial_q H & -\partial_p H & 0 \end{bmatrix} = \begin{bmatrix} 0 & S^t & -S^p \\ -S^t & 0 & S^q \\ S^p & -S^q & 0 \end{bmatrix} = \epsilon_{abc} S^c    (1.102)

where ϵabc is the fully anti-symmetric Levi-Civita symbol which returns the sign of the per-
mutation of the variables (i.e. ϵqpt = ϵptq = ϵtqp = 1 while ϵtpq = ϵpqt = ϵqtp = −1). We then have
the following relationship

\epsilon_{abc} S^c = \omega_{ab} = -\partial_a \wedge \theta_b = -(\partial_a \theta_b - \partial_b \theta_a)
\vec{S} = -\vec{\nabla} \times \vec{\theta}    (1.103)

Apart from the minus sign, this is the same relationship we would have between the magnetic field
B i and its vector potential Ai . Therefore the same concepts from vector calculus apply: S a
is divergenceless, admits a vector potential, and the flow of the displacement over a closed
surface is zero. We can understand this as assumption DR implemented in the extended phase
space. Moreover, we have the following relationship:

\int_\Sigma \omega_{ab}\, dx_1^a\, dx_2^b = \int_\Sigma \epsilon_{abc} S^c\, dx_1^a\, dx_2^b    (1.104)

That is, the integral of the form ωab corresponds to the surface integral of the displacement
field S a .29 In phase space extended by time, then, ωab quantifies both the number of states
over a surface and the flow of S a through the surface. Under assumption DR this works
because we have one and only one evolution for each state, therefore quantifying the number
of evolutions that intersect a given surface is the same as quantifying the number of states
that flow through them.
One striking feature of the vector potential θa is that it is fully characterized by a single
arbitrary component, the time component −H. The magnetic vector potential Ai , instead,
is characterized by two arbitrary components. What are the exact physical conditions that
allow that to happen?
Suppose that S a is a divergenceless field, then we can write it as minus the curl of a
potential

θa = [θq θp θt ] . (1.105)

Vector potentials are defined up to the gradient of an arbitrary function (i.e. up to a gauge),
since ∇ × (θ + ∇f ) = ∇ × θ. We can choose f such that ∂p f = −θp , and therefore we can set,
without loss of generality, the momentum component to zero

θa = [θq 0 θt ] . (1.106)

So far, this procedure can be applied to any potential of a divergenceless field. However, the
displacement field S a is particular in that its time component S t = dt t = 1 is unitary. That is,
states flow at a uniform rate in time. Therefore we must have

S t = −(∂q θp − ∂p θq ) = ∂p θq = 1. (1.107)

Integrating the relationship we find that θq = p + c(q, t), where c(q, t) is an arbitrary function.
We can choose c(q, t) = 0 without loss of generality since that arbitrariness corresponds to a
choice of gauge. In fact, we can choose f (q, t) such that ∂q f = −c(q, t). Given that ∂p f (q, t) = 0,
this will not impact the previous arbitrary choice of θp = 0. We have

θa = [p 0 θt ] . (1.108)

At this point, we simply rename the last component to −H and have

θa = [p 0 −H] . (1.109)

The form of the vector potential, then, is set by the fact that the flow of S a is both diver-
genceless and uniform along the time direction.
This discussion tells us exactly what θa is in the case of a single degree of freedom: it is
the potential of the displacement field S a or, equivalently, of the form ωab . Since we are in the
single degree of freedom case, assumption DR is enough to recover Hamiltonian mechanics,
and this is the same in the extended-phase-space formulation.
29. Note that, strictly speaking, while ωab is a tensor, ϵabc and S a are not. Relationship 1.103, then, is valid only if we chose position, momentum and time as variables. Yet, as we will see much later, we can change the position variable and time variable and still have the same expression by redefining momentum and energy appropriately.

We also learned another thing: the Hamiltonian is the time component of the potential,
therefore it is not a scalar. That is, if we made a change in the time variable t̂ = t̂(t), we would
have

Ĥ = −θt̂ = −dt̂ t θt = dt̂ t H. (1.110)

It transforms like a scalar only under purely spatial change of variables.


Lagrangian and action
Now that we have seen what θa is, we can go back to the Lagrangian and the action. Suppose
that we have a path γ, not necessarily an actual evolution, that proceeds along the time
variable. That is, we can write

γ = [q i (t), pi (t), t]
(1.111)
dγ = dξ a = dt ξ a dt.

where we used the time variable as its affine parameter. Note that, in this case, the dis-
placement is along the generic path γ, which is not necessarily an actual evolution. Therefore
dt ξ a = S a if and only if γ is an actual evolution of the system. For a generic path we can write:

L = p_i v^i - H = \theta_a\, d_t \xi^a
A[\gamma] = \int_\gamma L\, dt = \int_\gamma \theta_a\, d_t \xi^a\, dt = \int_\gamma \theta_a\, d\xi^a = \int_\gamma \theta\, d\gamma    (1.112)

This tells us that


The action along a path γ is the line integral of the vector potential θa of the form ωab. (1.113)
We now have a precise geometric characterization of the action and of the Lagrangian. As for
the physics, it tells us conclusively that both the Lagrangian and the action, by themselves,
are unphysical. That is, the value of the Lagrangian at a given position, velocity and time,
or the value of the action for a given path is not a physically meaningful value. The vector
potential is not a physical quantity: it depends on a choice of gauge, which does not correspond
to a physically well-defined object.
We are now in the position to understand why, as we have seen before, the Lagrangian for
a given system is not unique: it depends on the vector potential, and the vector potential is
not uniquely defined. We can see how the change of Lagrangian 1.94 corresponds to redefining
the vector potential as

θa′ = θa + ∂a f (q i , t). (1.114)

We have
L = \theta_a\, d_t \xi^a
L' = \theta'_a\, d_t \xi^a = \theta_a\, d_t \xi^a + \partial_a f\, d_t \xi^a = L + \partial_{q^i} f\, d_t q^i + \partial_t f\, d_t t = L + \partial_{x^i} f(x^i, t)\, v^i + \partial_t f(x^i, t),    (1.115)

which is the same expression we had in 1.94.



But if the action and the Lagrangian are unphysical, how can we use them to derive the
laws of evolution, which are clearly physical? Note that the laws are not expressed in terms
of the action, but in terms of the variation of the action. Let γ ′ , then, be a small variation of
the path γ with the same endpoints. Note that γ and γ ′ form a closed loop. Take a surface Σ
that is enclosed by that boundary, we can use Stokes’ theorem:

\delta A[\gamma] = \delta \int_\gamma L\, dt = \delta \int_\gamma \theta_a\, d\xi^a = \int_\gamma \theta_a\, d\xi^a - \int_{\gamma'} \theta_a\, d\xi^a = \oint_{\partial\Sigma} \theta_a\, d\xi^a = \int_\Sigma \partial_a \wedge \theta_b\, d\xi^a d\eta^b = -\int_\Sigma \omega_{ab}\, d\xi^a d\eta^b,    (1.116)

where dη b is the displacement of a point from γ to the corresponding point on the variation γ ′ .
The variation of the action, then, corresponds to the surface integral of ωab , which is physical.
Geometrically, it corresponds to the flow of the evolutions through the surface enclosed by
the path and its variation. Note that, because the flow is divergenceless, it does not matter
which surface is chosen, since all surfaces that share the same boundaries will correspond to
the same flow. This is a striking observation: while the action is unphysical, its variation is
physical.
We are now in a perfect position to fully understand the principle of stationary action.
This states that actual evolutions are given by paths for which the variation is zero. That is,
∫Σ ωab dξ a dη b has to be zero for all possible dη b . This means we must have ωab dξ a = 0. Since S a is the only degenerate direction for ωab , we must have dξ a = S a dt. In other words, the
paths that make the action stationary are exactly the paths whose tangent vector applied to
ωab return zero.
Geometrically, if a path γ is an evolution, it will always be tangent to the displacement
field S a . If we make a small variation γ ′ , the two paths will enclose an infinitesimal strip Σ
which will be tangent to S a as well. Therefore the flow of S a through Σ will be zero. Given
that the flow through Σ is minus the variation of the action, the variation of the action for
all evolutions will be zero. Conversely, if a path is not an evolution, at some point its tangent
will be different from the displacement field. Therefore we will be able to find a variation
γ ′ for which the surface Σ will “catch” some flow. That is, we will have a variation of the
path for which the variation of the action is non-zero. In other words, the action principle
is a geometrically roundabout way to ask for those paths that are always tangent to the
displacement field S a , which is divergenceless and has constant flow in time.
Note that the geometric and physical interpretation we have given to the Lagrangian and
the action lives in the extended phase space, which is part of the Hamiltonian formulation.
In fact, the only place where we need assumption KE is when we want to write pi dt q i − H
as a function of velocity instead of momentum. The integral of the vector potential and its
variation, in fact, can still be written even in terms of momentum, and therefore the geometric
and physical characterization of the principle of stationary action applies for all Hamiltonian
systems, not just the Lagrangian subset. In other words, the principle of stationary action
is really more a property of Hamiltonian mechanics than Lagrangian mechanics. The only
difference is that in Lagrangian mechanics, since assumption KE applies, it can be expressed
purely in terms of kinematic variables.
Multiple DOFs
The above discussion captures all the important elements of Lagrangian mechanics and the
action principle even if it is limited to the single DOF case. Still, we should generalize to the
multiple DOFs case. Note that the equations we wrote for the Lagrangian and the action are
already in generalized form. The only thing that we need to do is derive the expression for
the vector potential θa in the general case.
If we compare SF-EPS with SF-ND, we find that the non-temporal components of the form
are identical. Therefore, we can still understand it as quantifying the number of configurations
within each DOF and the degree of independence across DOFs, while adding the idea that the
displacement field does not contribute new configurations. Mathematically, the displacement
field identifies the only direction in which the form ωab is degenerate. Let us see what the
non-temporal components tell us in terms of the potential. We have:
\omega(e_{q^i}, e_{p_j}) = (-\partial\theta)_{q^i p_j} = -(\partial_{q^i}\theta_{p_j} - \partial_{p_j}\theta_{q^i}) = \delta_j^i
\omega(e_{q^i}, e_{q^j}) = (-\partial\theta)_{q^i q^j} = -(\partial_{q^i}\theta_{q^j} - \partial_{q^j}\theta_{q^i}) = 0    (1.117)
\omega(e_{p_i}, e_{p_j}) = (-\partial\theta)_{p_i p_j} = -(\partial_{p_i}\theta_{p_j} - \partial_{p_j}\theta_{p_i}) = 0

We can use our gauge freedom to set θp1 = 0, much in the same way we did before. We now have
∂q1 θp1 = 0 and, by the first condition, ∂p1 θq1 = 1. Integrating, we have θq1 = p1 +g(q i , p2 , p3 , ..., t)
where g is an arbitrary function which we can set to zero, since it corresponds to a choice of
gauge. Therefore we have:

\theta_a = [p_1 \;\; \theta_{q^2} \;\; \cdots \;\; \theta_{q^n} \;\; 0 \;\; \theta_{p_2} \;\; \cdots \;\; \theta_{p_n} \;\; \theta_t].    (1.118)

Note that the components for the first degree of freedom do not depend on the other
degrees of freedom. That is, for all i > 1, ∂qi θq1 = ∂pi θq1 = ∂qi θp1 = ∂pi θp1 = 0. But by using
conditions 1.117, we find that the converse is true as well: the components of all other degrees
of freedom do not depend on the first. That is, for all i > 1, ∂q1 θqi = ∂p1 θqi = ∂q1 θpi = ∂p1 θpi = 0.
We can then use, again, our gauge freedom with a function that does not depend on the
first two variables to set θp2 = 0. And, with the same reasoning, we will be able to set θq2 = p2 .
And then, again, find that the first two degrees of freedom do not depend on the others, etc.
This will exhaust all DOFs and we can set θt = −H as before. This will find 1.101.
To recap, we have found that the expression SF-EPS is equivalent to assuming DR and IND, and is also equivalent to 1.101. Note that the expressions of ωab and θa are in terms
of specific coordinates, while the fact that ωab is the exterior derivative of θa , instead, is
coordinate independent. However, there is a characterization of ωab that is fully coordinate
independent.
Since ωab admits a potential, its exterior derivative is zero. We also have that the displace-
ment field identifies the only degenerate direction. That is, if v a ωab = 0 then v a = f S a for some
scalar function f . Physically, this corresponds to saying that temporal displacement is the
only direction that does not provide independent configurations: as we said before, time does
not provide new possible configurations for the system. It turns out that these are enough
conditions to find canonical coordinates [q i , pi , t] such that we can express ωab as SF-EPS.30
Therefore
The two-form ωab that quantifies independent configurations is closed (i.e. has zero exterior derivative) and its only direction of degeneracy is identified by the displacement field S a (DI-SYME)
is an equivalent characterization of Hamiltonian mechanics in the extended phase space.

30. This result is known as Darboux’s theorem. Unfortunately we haven’t been able to find a proof that is short and/or physically significant, though we suspect such a proof should exist.


The above condition must imply both DR and IND. We can break down the two contribu-
tions in the following way. Suppose we have a system with n independent degrees of freedom,
meaning the extended phase space is of dimension N = 2n + 1. Assumption DR tells us we
have a way to measure the flow of the evolutions over a hyper-surface, which corresponds to
an (N − 1)-form Ωa1 ⋯a2n such that:

S a1 Ωa1 ⋯a2n = 0. (1.119)

This is just stating that, given that Ωa1 ⋯a2n measures the flow through an infinitesimal 2n-
dimensional parallelepiped, if one of the sides is along the direction of flow S a then the flow
will be parallel to the parallelepiped. If the space is charted by position, momentum and time,
we can write:
\Omega_{a_1 \cdots a_{2n}} = \epsilon_{a_1 \cdots a_{2n} a_{2n+1}} S^{a_{2n+1}}
\int \epsilon_{a_1 \cdots a_{2n} a_{2n+1}} S^{a_{2n+1}}\, d\xi^{a_1} \cdots d\xi^{a_{2n}} = \int \Omega_{a_1 \cdots a_{2n}}\, d\xi^{a_1} \cdots d\xi^{a_{2n}}.    (1.120)

On the right side, we see that we are really integrating the flow S a through the hypersurface
defined by the differentials dξ ai , which corresponds to the integral of Ωa1 ⋯a2n through the hy-
persurface. Note that, under DR, we have one and only one evolution for each state, therefore
measuring the flow of evolutions is equivalent to measuring the number of states that travel
through the surface.
As we said, this setup was motivated by assumption DR alone. If we add IND, then we
can write:

Ωa1 ⋯a2n = ωa1 a2 ∧ ωa3 a4 ∧ ⋯ ∧ ωa2n−1 a2n . (1.121)

This tells us that the total state count becomes the product of the configurations along each
degree of freedom.
The geometric and physical understanding of the action and the Lagrangian are the same
as in the single DOF case: the action is the line integral of the vector potential θa and the
variation of the action is minus the surface integral of ωab between the path and its variation.
The only difference is that this corresponds not to the flow of total states, but to the flow of
configurations for a single degree of freedom. Still, the flow is zero only if the path is everywhere
tangent to the displacement field S a , and therefore if the path is an actual evolution of the
system.

1.8 Full kinematic equivalence and massive particles


In the previous section we recovered Lagrangian mechanics by formulating the principle of
stationary action in Hamiltonian form, and then used assumption KE in form WKE-INV
simply to express it in terms of the kinematic variables. This is the weakest use of the as-
sumption, as it only assumes we have an invertible map between velocity and momentum.
However, we will find that this is not enough to define a meaningful count of configurations
over kinematic variables, which requires states to be uniformly distributed over velocity. This
leads to a stronger version of assumption KE, which requires the map between velocity and
conjugate momentum to be linear, which in turn fixes the dynamics to the one of massive
particles under scalar and vector potential forces.

Linear map between linear spaces


Condition WKE-INV imposes, at every position, an invertible relationship pi = pi (q j , v j )
between conjugate momentum and velocity which can be arbitrary. Note that both velocity
and momentum are linear objects, and therefore it would seem natural to require a linear
relationship between the two. Though we do not have, at this point, a clear physical reason
to impose this restriction, let’s see what it would entail.
If the relationship between momentum and velocity at every point is linear, we have that
the Jacobian is only a function of position. Moreover, since it is a linear map between a vector
and a covector, the map must be a tensor. We have

∂vi pj = mgij (1.122)

where m is a constant that transforms units of velocity to units of momentum while gij is the
actual linear map in terms of spatial coordinates. Since pj = ∂vj L, we have

mgij = ∂vi pj = ∂vi ∂vj L = ∂vj ∂vi L = ∂vj pi = mgji (1.123)

and find that the tensor gij is symmetric.


If we integrate equation 1.122 we have

pi = mgij v j + qAi (1.124)

where Ai are arbitrary functions of position and q is an arbitrary constant. Since gij and Ai
are functions of position only, we can see that, if we fix position, the relationship is a line
where mgij is the slope and Ai the value of momentum for zero velocity. Note that
v^i = d_t q^i = \partial_{p_i} H = \frac{1}{m} g^{ij}(p_j - q A_j).    (1.125)
If we integrate again we find
H = \frac{1}{2m}(p_i - q A_i)\, g^{ij}\, (p_j - q A_j) + V    (1.126)
where V is yet another arbitrary function. This is exactly the Hamiltonian for a massive
particle under scalar and vector potential forces: m is the mass, gij is the metric tensor, q is
the charge, Ai is a vector potential and V is the scalar potential.31 Therefore, we saw that
imposing a linear relationship between velocity and conjugate momentum gives us massive
particles under potential forces.
Conversely, if we start by imposing the above Hamiltonian for massive particles under
potential forces, we find a linear relationship between conjugate momentum and velocity.
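A minimal symbolic sketch of this converse direction (one dimension with g = 1, names of our own choosing): starting from the Hamiltonian 1.126, Hamilton’s equation v = ∂p H returns the linear relationship 1.124.

```python
import sympy as sp

# H = (p - q A)^2 / 2m + V gives v = (p - q A)/m, i.e. p = m v + q A.
p, m, qc, v, xs = sp.symbols('p m q_c v x')
A, V = sp.Function('A')(xs), sp.Function('V')(xs)

H = (p - qc * A)**2 / (2 * m) + V
vel = sp.diff(H, p)                        # v = dH/dp
print(sp.solve(sp.Eq(vel, v), p))          # [m*v + q_c*A(x)]
```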
Therefore condition

There is a linear relationship between conjugate momentum and velocity (FKE-LIN)

and condition

The system under study is a massive particle under scalar and vector potential forces (FKE-POT)
are equivalent. At this point, having seen enough of reverse physics, it should be clear that this can’t simply be a coincidence and it warrants more exploration.

31. Note that we have not separated the charge from the potentials.
Recall that in 1.92 we saw that the Jacobian determinant of the transformation from
state variables to kinematic variables was equal to the Jacobian determinant of the relationship
between velocity and momentum. Since mgij is the Jacobian between velocity and momentum,
we find
1 ij
∂p i v j = g
m
(1.127)
∣g ij ∣ ∣gij ∣−1 1
∣J∣ = ∣∂pi v j ∣ = = = .
m m m ∣gij ∣

Recall that the existence of a non-singular Jacobian is what allowed us to express all differential
objects (e.g. densities, phase-space areas/volumes, the symplectic form) in terms of kinematic
variables. This forces those expressions to be in terms of position only. We therefore have
that the following conditions are equivalent to each other and equivalent to FKE-LIN and
FKE-POT.
The Jacobian of the transformation between state variables and kinematic variables is a non-singular function of position only. (FKE-NSIN)

Densities over phase space can be expressed in terms of position and velocity by rescaling the value at each point: ρ(xi , v j )∣J(xi )∣ = ρ(q i , pj ). (FKE-DEN)

Areas and volumes in phase space can be expressed in kinematic variables, and the transformation depends on position only: dx1 ⋯dxn dv 1 ⋯dv n = ∣J(xi )∣ dq 1 ⋯dq n dp1 ⋯dpn . (FKE-VOL)

The symplectic form ωab can be expressed in kinematic variables, and its components are a linear function of velocity. (FKE-SYMP)

To better understand these conditions, consider a density ρqp over phase space and its
expression ρxv over kinematic variables. Since there is a factor of a Jacobian determinant
between the two, if the density takes the same value between two states as expressed in
position and momentum, the density as defined over position and velocity will not necessarily
have the same value. Position and momentum have a special property in that the density
of states is uniform in terms of those variables, and therefore comparing densities in those
variables is simply a matter of comparing the value of the density. Canonical coordinates are
exactly those state variables for which this property holds. While kinematic variables are, in
general, not canonical, the linearity between velocity and momentum imposes a lesser version
of this property: it makes it so that, at least at the same position, we can compare areas and
densities. Therefore if the density for two different values of velocity at the same point matches, ρxv (xi , v1j ) = ρxv (xi , v2j ), then we really have the same density of states. Therefore

Density expressed in velocity at the same position is proportional to the density over states (FKE-PROP)

is another equivalent condition.


In this formulation, one starts to wonder whether the lack of this property would make
physical sense. We can understand that non-linear transformations of position, since they

stretch and shrink space differently at different points, would make it more complicated to
understand whether two densities at two different points are the same. On the other hand,
velocity is defined locally over infinitesimal changes of position, therefore non-linear changes
in velocity do not arise when changing units and coordinates.

Mass and inertial frames


Another way of looking at it is the following. Suppose we have a density ρ(q i , pj ) defined over
a finite region of phase space that is constant along momentum. That is, ρ(q i , p1j ) = ρ(q i , p2j )
for all p1j and p2j . Assuming we have a linear relationship between momentum and velocity we
have
$$\rho_{xv}(x^i, v_1^j) = m|g_{ij}|\, \rho_{qp}(q^i,\ m g_{ij} v_1^j + qA_i) = m|g_{ij}|\, \rho_{qp}(q^i,\ m g_{ij} v_2^j + qA_i) = \rho_{xv}(x^i, v_2^j). \tag{1.128}$$

Therefore if we have a linear map between velocity and momentum, uniform distributions
along momentum correspond to uniform distributions along velocity. The converse is also true:
if uniform distributions along momentum correspond to uniform distributions along velocity,
then, the Jacobian of the transformation cannot depend on either, and it must depend on
position only. Therefore
Uniform distributions along momentum correspond to uniform distributions along velocity (FKE-UNIF)
is another equivalent characterization of FKE-LIN. Failure of this condition, then, would mean
that physical states are not uniformly distributed over velocity at the same position, and some
velocities correspond to higher state densities than others. This does not sound like something
that would mesh well with the principle of relativity.
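The following small numerical contrast, our own construction rather than the text's, illustrates FKE-UNIF: under a linear map between momentum and velocity a uniform distribution stays uniform, while under a non-linear map it does not.

```python
# A numeric contrast (ours): uniform samples along momentum stay uniform in
# velocity only when the map p(v) is linear.
import numpy as np

rng = np.random.default_rng(1)
p = rng.uniform(-1.0, 1.0, 100_000)      # uniform distribution along momentum

v_linear = (p - 0.2) / 1.5               # invert the linear map p = 1.5*v + 0.2
v_cubic = np.cbrt(p)                     # invert the non-linear map p = v**3

for v in (v_linear, v_cubic):
    counts, _ = np.histogram(v, bins=8)
    # Ratios near 1 (a flat histogram) only in the linear case.
    print(np.round(counts / counts.mean(), 2))
```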
We can go one step further. Suppose we can find a coordinate system for which gij over a
finite region is the identity matrix δij . Physically, this corresponds to a Cartesian coordinate
system in an inertial frame. In this frame, ∣gij ∣ = 1 and therefore densities over kinematic
variables differ from densities over phase space by a constant of proportionality: the mass. We
have the following characterization of mass
Inertial mass tells us how many states there are per unit of area of position-velocity in an inertial (Cartesian) frame. (IM)
This characterization of mass works even for massless particles, while the standard one does
not. In fact, if we define mass to be the resistance of a body to acceleration, we would expect
a zero mass body to be extremely easy to accelerate, while this is not the case: a massless
particle travels at the same constant speed. On the other hand, as we saw before, massless
particles do not satisfy KE, and therefore the state cannot be reconstructed from position
and velocity: areas of position-velocity correspond to zero states. So why are more massive
bodies harder to accelerate? The more massive the body, the more states per unit of velocity,
and therefore reaching the same velocity means going through more states. If we understand
force as a change of state per unit time, then we have an intuitive explanation that matches
the new characterization.
Note that we focused on Cartesian coordinates, not just inertial frames. That is not a coincidence: the motion of a free particle appears linear and uniform only in these coordinates.

That is, trajectories obey the linear expression xi = v i t+xi0 only in Cartesian coordinates. The
first law of Newton, the fact that in an inertial frame a body travels in linear and uniform motion, implicitly assumes the ability to use Cartesian coordinates. But, in light of what we
have seen before, it also implies the existence of mass. In fact, we can, in true reverse physics
spirit, turn this on its head. Suppose that the motion of free particles is linear and uniform.
The velocity will remain the same along an evolution; the distance between two particles
moving at the same velocity along the same line will also remain the same. Therefore consider
the plane charted by position and velocity along a particular direction. Take a parallelogram
where two sides are at constant velocity: its area would be the length of those sides ∆x times
the height, which will be given by the difference in velocity ∆v. As time evolves, the sides
at constant velocity will move but will remain at the same velocity. The parallelogram will
remain a parallelogram with the same height and base: the area is conserved. Now, clearly this
is a deterministic and reversible system. Position and velocity identify all states and already
provide an area that is conserved over time. It must be, then, that the count of states is pro-
portional to the area in position-velocity, and therefore we have the relationship pi = mv i . In
other words, once we assumed the existence of inertial frames, we already implicitly introduced
the idea of inertial mass as characterized above.
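A small numerical illustration of the parallelogram argument (a sketch of ours, with arbitrary numbers): free evolution x → x + vt is a shear in the (x, v) plane, and a shear has unit determinant, so the area of any parallelogram is conserved.

```python
# A small numerical illustration (ours) of the parallelogram argument:
# free evolution x -> x + v*t is a shear in the (x, v) plane, so the area
# of any parallelogram is conserved.
import numpy as np

t = 2.7                                  # arbitrary evolution time
shear = np.array([[1.0, t],              # (x, v) -> (x + v*t, v)
                  [0.0, 1.0]])

# Two edge vectors of a parallelogram with two sides at constant velocity.
dx = np.array([1.5, 0.0])                # side at constant v: pure Delta-x
dv = np.array([0.3, 0.8])                # side with a velocity difference

area_before = abs(np.linalg.det(np.column_stack([dx, dv])))
area_after = abs(np.linalg.det(shear @ np.column_stack([dx, dv])))
print(area_before, area_after)           # equal: det(shear) = 1
```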
The above discussion links the linearity between velocity and momentum to the existence
of inertial frames. Therefore

At each position, there exists a local inertial frame (FKE-INER)

is equivalent to FKE-LIN. Note that here we are claiming that only local inertial frames are
needed. To see that, note that a change of position coordinates fully determines the change of
both position and velocity. Therefore a change of coordinate can neither introduce nor remove
a velocity dependency from the expression ∂vi pj . If ∂vi pj = mgij has no velocity dependency,
we can always find local spatial coordinates such that gij = δij . Therefore, ∂vi pj has no velocity
dependency if and only if we can find a local inertial (Cartesian) frame.
So we found that FKE-LIN is linked to inertia and the existence of inertial frames, but
in what sense is it linked to assumption KE? Suppose one gave us all the kinematics of a
system, which are fully defined by the reference frame and all possible trajectories. Velocity,
as a quantity, is fully defined in terms of position and time. Now, assumption KE tells us
we must be able to recover the full dynamics. We must be able, for example, to convert a
distribution over kinematic variables to one over state variables. This is essentially a change
of units of the density ρ from position-velocity to position-momentum. But if velocity is a
derived quantity, fully defined by position, then it should be the case that the transformation
rule of ρ is only a function of position. Therefore condition

The position fully defines the units of all state variables, and therefore the invertible transformation between momentum and velocity (FKE-UNIT)

is an equivalent characterization of FKE-LIN. Ultimately, this must be the case if KE holds: trajectories in space-time are fully specified by units of position, and since we must be able to reconstruct states from them, the units of momentum must depend on them.
While this argument works on physics grounds, we cannot carry it out in a mathematically
precise way. The issue is that current mathematical structures are devoid of the concept of
units and of the type of definitional dependence we used. What does it mean, in a precise way,

that velocity is a derived unit from position and time? What does it mean that temperature is
a derived unit from energy and entropy? Covariance and contravariance capture some hint in
terms of change of units (i.e. how do units of velocity (or momentum) change if I change units
of position?), but this is sufficient only when some initial relationship is assumed. Capturing
this initial relationship requires conceptual and mathematical tools that are not currently
available.

1.9 Relativistic mechanics


In this section we analyze what happens if we consider change of coordinates that mix space
and time. We will see that relativistic elements appear even if a notion of metric tensor is not
introduced, and the particle/anti-particle duality emerges. Moreover, under the full kinematic
equivalence special relativity is recovered.

Hamiltonian mechanics on the extended phase space


The Hamiltonian formalism is invariant under generic change of spatial variables, but if we
want to introduce generic changes of variables that mix space and time we have a problem. In
the standard formulation, time is the parameter of the evolution q i (t), meaning that we can
mix the spatial variables q i while leaving t alone. In the formulation extended by time, time
has a double role of parameter and variable. That is, we write q i (t) and t(t). Therefore
mixing q i with t will affect the evolution parameter as well. What we need to do is to separate
the time variable from the parameter of the evolution. We can group the space-time variables
as

q α = [t, q i ]. (1.129)

We introduce an affine parameter s, and therefore a trajectory in space-time will be denoted as


q α (s). Under a change of coordinate, the variables q α will mix but the affine parameter s will
not change.
Now that we have understood how to deal with time, we have to understand how to deal
with energy. As we saw before, the Hamiltonian is no longer invariant under transformations
that affect time. Moreover, even if we start with a time-independent Hamiltonian, it will not
remain so under a generic coordinate transformation that mixes time and space. This means
that, during the evolution, energy will not be constant but will need to increase and decrease.
Another, more physical, way to look at it is that energy, like momentum, is not an absolute
quantity but rather a relative quantity with respect to an observer. If we imagine an observer
that is accelerating and decelerating, in the same way that he will see momentum change, he
will see the energy change as well. In the same way that momentum is sensitive to changes of
units along the corresponding spatial direction, energy is sensitive to change of time units. To
be able to characterize these relationships better, we introduce an energy variable E which we
group with the momentum variables and, since −H was the time component of the potential
θa , we write

pα = [−E, pi ] (1.130)

so that pα is a covector.

What we did is add a temporal degree of freedom, and with that addition, the equations
look a lot like standard Hamiltonian mechanics for multiple degrees of freedom. We have

$$\xi^a = [q^\alpha \;\; p_\alpha], \qquad S^a = d_s \xi^a = [d_s q^\alpha \;\; d_s p_\alpha]. \tag{1.131}$$

We define the potential θa 32 and the form ωab

$$\theta_a = [p_\alpha \;\; 0], \qquad \omega_{ab} = \partial_a \wedge \theta_b = \begin{bmatrix} \omega_{q^\alpha q^\beta} & \omega_{q^\alpha p_\beta} \\ \omega_{p_\alpha q^\beta} & \omega_{p_\alpha p_\beta} \end{bmatrix} = \begin{bmatrix} 0 & I_n \\ -I_n & 0 \end{bmatrix}, \tag{SF-FEPS}$$

and the equations of motion will be

$$S^a \omega_{ab} = \partial_b \mathcal{H}. \tag{1.132}$$

The function $\mathcal{H}$, distinct from the standard Hamiltonian H, is called the Hamiltonian constraint. Typically, the constraint is chosen such that

$$\mathcal{H} = 0. \tag{1.133}$$

Relativistic free particle


Given that this formulation is probably unfamiliar to most, let us see how it works for a free
particle in a Cartesian (and inertial) reference frame. In this case, it is more convenient to use
q 0 = ct as the time variable, which, given 1.110, means using p0 = −E/c for the energy. The
Hamiltonian constraint is
$$\mathcal{H} = \frac{1}{2m}\left(p_\alpha \eta^{\alpha\beta} p_\beta + m^2 c^2\right) = \frac{1}{2m}\left(p_i \delta^{ij} p_j - (E/c)^2 + m^2 c^2\right), \tag{1.134}$$

where η αβ is the Minkowski metric. Hamilton’s equations give

$$\begin{aligned}
d_s q^0 &= d_s(ct) = \partial_{p_0}\mathcal{H} = \partial_{-E/c}\left(-\frac{1}{2m}(-E/c)^2\right) = -\frac{1}{2m}\, 2(-E/c) = \frac{E/c}{m}\\
d_s t &= \frac{E}{mc^2}\\
d_s q^i &= \partial_{p_i}\mathcal{H} = \partial_{p_i}\left(\frac{1}{2m}\, p_i \delta^{ij} p_j\right) = \frac{p^i}{m}\\
d_s p_0 &= d_s(-E/c) = -\partial_{q^0}\mathcal{H} = 0\\
d_s p_i &= -\partial_{q^i}\mathcal{H} = 0.
\end{aligned} \tag{1.135}$$

Therefore we have

$$p^\alpha = m\, d_s q^\alpha = m u^\alpha. \tag{1.136}$$


32 Recall that when phase space was extended with just time, we had $\theta_a = [\theta_{q^i}\ \theta_{p_i}\ \theta_t] = [p_i\ 0\ {-H}]$. Now that it is also extended with energy, we have $\theta_a = [\theta_t\ \theta_{q^i}\ \theta_{-E}\ \theta_{p_i}] = [-E\ p_i\ 0\ 0] = [\theta_{q^\alpha}\ \theta_{p_\alpha}] = [p_\alpha\ 0]$.

We recognize uα as the four-velocity, and the affine parameter s as proper time. With this in
mind, we can rewrite the Hamiltonian constraint as

$$\mathcal{H} = \frac{1}{2m}\left(m u^\alpha \eta_{\alpha\beta}\, m u^\beta + m^2 c^2\right) = \frac{1}{2} m|u|^2 + \frac{1}{2} mc^2, \tag{1.137}$$

which looks like a kinetic energy term constructed from the four-velocity. Setting $\mathcal{H}$ to zero, then, means setting the norm squared of the four-velocity to −c2 , which is consistent with special relativity. It also means that

$$(-E/c)^2 = m^2 c^2 + p_i p^i, \qquad E = \pm\sqrt{c^2 |p_i|^2 + (mc^2)^2}. \tag{1.138}$$

That is, the Hamiltonian constraint sets the relationship between energy and momentum.
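Two of these results can be checked mechanically. The following sympy sketch, in 1+1 dimensions and with variable names of ours (mE stands for the conjugate variable −E), recovers the rate of equation 1.135 and the energy-momentum relation of equation 1.138.

```python
# A sympy sketch (ours, 1+1 dimensions) of the free-particle constraint.
import sympy as sp

p, m, c = sp.symbols('p m c', positive=True)
mE = sp.symbols('mE', real=True)              # mE stands for the variable -E
E = sp.symbols('E', real=True)

Hc = (p**2 - (mE/c)**2 + m**2*c**2) / (2*m)   # constraint of eq. 1.134, with -E = mE

ds_t = sp.diff(Hc, mE)                        # d_s t = dHc/d(-E)
print(sp.simplify(ds_t.subs(mE, -E)))         # -> E/(c**2*m), as in eq. 1.135

# Mass shell: solving Hc = 0 for the energy gives the two roots
# +/- sqrt(c**2*p**2 + m**2*c**4) of eq. 1.138.
print(sp.solve(sp.Eq(Hc.subs(mE, -E), 0), E))
```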

The geometry of extended phase space


Note that while the mathematical structure of the extended phase space looks the same at
first glance as the one of standard phase space, it has important differences. Let’s concentrate
on the Hamiltonian constraint. In standard Hamiltonian mechanics, the Hamiltonian H gives a value of energy that is conserved by the system during the evolution, but the system can be in different states with different energy. The Hamiltonian constraint $\mathcal{H}$, instead, is a quantity that is not only conserved by the system during the evolution, it is always the same for the given system. In the example above, the Hamiltonian constraint is essentially a constraint on the mass of the system. Therefore the space of states is not really the full 2n + 2 dimensional manifold charted by time, position, energy and momentum, but it is the 2n + 1 dimensional sub-manifold that is given by the constraint $\mathcal{H} = 0$. The constraint lowers the dimensionality by imposing a relationship between energy and the other state variables. The Hamiltonian constraint, then, plays a double role: as a generator of the evolution over the affine parameter s and as an equation of state of the system.33
Since over valid states we will have both $\mathcal{H} = 0$ and $E = H$, we can write

$$\mathcal{H} = \lambda(H - E), \tag{1.139}$$

where λ is a function of the extended phase space. This expression provides a link between the Hamiltonian constraint and the standard Hamiltonian. To understand what λ is, note that

$$d_s t = \partial_{-E}\mathcal{H} = \lambda. \tag{1.140}$$

Therefore λ is the rate of change between time and the affine parameter. Additionally, given
that time must still be a possible parameter for the evolution, t(s) must be an invertible
strictly monotonic function. Therefore we must always have ds t ≠ 0 at least over the region
where E = H.

33 Note that to apply the constraint to a distribution ρ, one can simply write $\mathcal{H}\rho = 0$. This means that ρ can be different from zero only over the region where $\mathcal{H}$ is zero, and therefore ρ is non-zero only for those points that satisfy the Hamiltonian constraint. Moreover, note that in the case of a free particle in a Cartesian frame, the Hamiltonian constraint $\mathcal{H}\rho = 0$ is the classical analogue of the Klein-Gordon equation $\left(\frac{\hbar^2}{c^2}\partial_t^2 - \hbar^2\nabla^2 + m^2 c^2\right)\psi = 0$.

We can now verify that the new formulation recovers the old.
where E = H. We can now verify that the new formulation recovers the old.

H = λ(H − E) = 0
E=H
dt t = 1 = dt s ds t = dt s ∂−E H = dt s λ
1
dt s =
λ
1 (1.141)
dt q i = dt s ds q i = dt s ∂pi H = (∂pi λ(H − E) + λ∂pi H) = ∂pi H
λ
1
dt pi = dt s ds pi = −dt s ∂qi H = − (∂qi λ(H − E) + λ∂qi H) = −∂qi H
λ
1
dt E = dt s ds E = dt s ∂t H = (∂t λ(H − E) + λ∂t H) = ∂t H
λ
Therefore if we set H = H − E, the affine parameter will have to be equal to time up to an
arbitrary additive constant, and therefore the formulations are equivalent.
The expression above seems to imply that the Hamiltonian constraint always has to cor-
respond to one Hamiltonian. This is not the case. Suppose we have

$$\mathcal{H} = (H_1 - E)(H_2 - E), \tag{1.142}$$

so that H1 (ξ a ) ≠ H2 (ξ a ) at all points. This means that there are two regions of the extended
phase space that satisfy the Hamiltonian constraint, one for H1 and one for H2 . These
two regions are disconnected, therefore states from different regions cannot be connected by
Hamiltonian evolution. They essentially represent two different types of system encoded into
the same equation. In principle, the same idea can be extended to have more than two regions.
While this may seem just a mathematical artifact, note that this is exactly what happens
for the Hamiltonian constraint of a free particle. In fact we can write:
$$\mathcal{H} = \frac{1}{2mc^2}\left(\sqrt{c^2|p_i|^2 + (mc^2)^2} + E\right)\left(\sqrt{c^2|p_i|^2 + (mc^2)^2} - E\right). \tag{1.143}$$
Given that the solutions for energy have opposite sign, one finds

$$\lambda = \frac{1}{2mc^2}(E + E) = \frac{E}{mc^2}, \tag{1.144}$$
in agreement with what we found before.
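A quick sympy check (our own, in one spatial dimension) that the factorized constraint of equation 1.143 expands back to the quadratic form of equation 1.134:

```python
# A sympy check (ours): eq. 1.143 is the same constraint as eq. 1.134.
import sympy as sp

p, E, m, c = sp.symbols('p E m c', positive=True)
root = sp.sqrt(c**2 * p**2 + (m*c**2)**2)

H_factored = (root + E) * (root - E) / (2*m*c**2)       # eq. 1.143
H_quadratic = (p**2 - (E/c)**2 + m**2*c**2) / (2*m)     # eq. 1.134

print(sp.simplify(H_factored - H_quadratic))            # -> 0
```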
Note that for a free particle, similarly to what one finds in quantum field theory, we have
both positive and negative energy solutions. Given that ds t = E/mc2 , for the negative energy solutions the affine parameter is anti-aligned with respect to time: s will be minus proper time instead of proper time.34
In general, ds t may be positive or negative, which corresponds to the affine parameter
being aligned or anti-aligned with respect to time. Note that since ds t cannot be equal to
zero, states for which ds t > 0 can never be connected with states for which ds t < 0.
34 This is a much more precise version of the claim that anti-particles “travel backwards in time”. They do
not. What happens is that the function t(s) is parameterized in the opposite direction. However, the affine
parameter s is not physically significant.

Therefore these two types of states must always exist in different regions of the extended
phase space. In analogy with what happens in quantum field theory, we call particle states
those for which ds t > 0 and anti-particle states those for which ds t < 0. We therefore find the
following
Insight 1.145. A frame-invariant notion of determinism and reversibility (i.e. allowing gen-
eralized coordinate transformations in Hamiltonian mechanics) gives us the notion of anti-
particles, even in classical particle mechanics.
Now that we got a better feel for what the Hamiltonian constraint is and how it works,
let’s go back to explore the geometry and physical significance of the extended phase space.
Looking at ωab it may seem that adding the temporal degree of freedom means just adding
another independent DOF, but that is not the case. If we start with standard phase space
and add time, we are not adding new independent configurations: if we have deterministic
and reversible motion, given the state at one time, we know the state at all times. So, while
adding time allows us to talk about all states at all times, we are not really adding new states
because states are defined and counted at equal time. In phase space extended by time only,
this was captured by the fact that the form ωab applied to the displacement field always gives
zero. If we now add energy, the conjugate of time, things are even more constrained. The
energy, in fact, is a function of the state of the system at a particular time. The addition of
energy, then, adds no configurations at all.
Now that it is clear that (t, −E) do not constitute an independent degree of freedom, we
should understand the significance of the minus sign. As we saw, it was needed so that pα is
a covector, but that does not tell us anything in terms of the geometry. What we are really
saying is that an infinitesimal rectangle of size dt and dE has a negative contribution to the
count of states. Shouldn’t it have no contribution at all?
We saw that orthogonality in phase space means DOF independence. If we have two
independent DOFs we can write:
ω(dq 1 + dq 2 , dp1 + dp2 ) = dq 1 dp1 + dq 2 dp2 . (1.146)
The above expression, then, can be understood as an areal version of Pythagoras’s theorem:
instead of summing the square distances, we are summing areas directly. That is, given the
orthogonality of the independent DOFs, we have a right triangle-like structure where the two
sides are the areas in each DOF and the hypotenuse is the area determined by ω. Now, because
spatial and temporal degrees of freedom are not independent, i.e. they are not orthogonal,
the form of the above expression cannot be the correct one for the temporal DOF.
The naive consideration would simply be to disregard the temporal degree of freedom,
since time does not contribute new states, and set
ω(dq + dt, dp + dE) = dq dp. (1.147)
However, this does not work either. This would say that the temporal DOF does not identify
states at all, which is not the case. Suppose we study a free particle with one degree of freedom.
Let’s assume that for t = 0, q = 0. We have
$$q = \frac{p}{m}\, t, \qquad E = \frac{p^2}{2m}. \tag{1.148}$$

If we look at the region where momentum is positive, the relationship is bijective. Therefore
time and energy can be used to identify, and therefore count, states. The issue is that those
states are not new states, they are the same ones that are identified, and counted, by position
and momentum.
If we look at the expressions derived before, we have

ω(dq + dt, dp + dE) = dq dp + dt d(−E) = dq dp − dt dE. (1.149)

We can rearrange this expression as

dq dp = ω(dq + dt, dp + dE) + dt dE. (1.150)

Now, compare 1.146 with 1.150. In the second expression, the area in the spatial DOF is the
hypotenuse, while the area on the temporal DOF and the count of states given by ω are the
sides. That is, while the temporal DOF and the spatial DOF are not orthogonal, the temporal
DOF is orthogonal to the surface where states are counted. If we have a region defined at
equal time, for example, the count of states reduces to the count of spatial configurations,
which makes sense. In the general case, the differentials dt and dE are defined over surfaces at
constant (q, p), while the differentials dq and dp are defined over surfaces at constant (t, E).
These, as we said, are not orthogonal. This means that the area identified by dq and dp may
have a non-zero projection over the temporal degree of freedom. Given that the state count
should be defined at equal time, the area given by dq dp does not always properly count states.
If we want the count of states to be defined at equal time, this needs to be done on the surface
that is orthogonal to the surface where dt and dE are defined, which therefore forms a right
triangle-like structure with the spatial and temporal degrees of freedom.
The minus sign for the energy, then, has both a geometrical and physical meaning. If we
used E instead of −E as the variable, the minus sign would show up in the form ωab , which
would probably be better as it would make the geometrical feature more clear. However,
pα = [E pi ] would not match the definition used in relativity and it would not form a covector,
so it would be confusing in a different way. If there is a better overall notation and grouping,
we haven’t yet found it. At any rate, we have found that

Insight 1.151. A frame-invariant notion of determinism and reversibility (i.e. allowing gen-
eralized coordinate transformations in Hamiltonian mechanics) gives elements of special rela-
tivity (i.e. energy-momentum four-vector), even without the notion of a metric tensor.

What is interesting is that we didn’t need to add any further assumption: DR and IND
are sufficient. The only thing we needed to add was the ability to make generalized space-time
transformations.

Relativistic kinematics
It is time to add assumption KE. In this generalized setting, the trajectory, and therefore the
velocity, will be in terms of the affine parameter s. Therefore we set

ds q α = ds t dt q α = ds t [dt q 0 , dt q i ] = ds t [dt q 0 , v i ]. (1.152)

Given that the Hamiltonian constraint on the extended phase space plays the same role as the
Hamiltonian on the standard phase space, we will have an equivalent principle of stationary

action once KE is taken. The action can be written as

$$A[\gamma] = \int_\gamma L\, ds = \int_\gamma \left(p_\alpha u^\alpha - \mathcal{H}\right) ds. \tag{1.153}$$

Under the full kinematic assumption, we have


$$\begin{aligned}
\partial_{u^\alpha} p_\beta &= m g_{\alpha\beta}\\
p_\alpha &= m g_{\alpha\beta} u^\beta + qA_\alpha\\
u^\alpha &= d_s q^\alpha = \partial_{p_\alpha}\mathcal{H} = \frac{1}{m} g^{\alpha\beta}(p_\beta - qA_\beta)\\
\mathcal{H} &= \frac{1}{2m}(p_\alpha - qA_\alpha)\, g^{\alpha\beta}(p_\beta - qA_\beta) + U
\end{aligned} \tag{1.154}$$

If we set $U = \frac{1}{2} mc^2$, this is the Hamiltonian constraint for a massive particle under potential forces.35
As we saw, the best way to understand the geometry of phase space is to use the state
variables q α and pα . However, understanding the physics is tricky as conjugate momentum
is a gauge dependent quantity. One thing we can do is actually work on phase space with
kinematic quantities, and express elements like θa and ωab in those variables. We have:
$$\begin{aligned}
q^\alpha &= x^\alpha\\
p_\alpha &= m g_{\alpha\beta} u^\beta + qA_\alpha\\
x^\alpha &= q^\alpha\\
u^\beta &= \frac{1}{m} g^{\beta\alpha}(p_\alpha - qA_\alpha)\\
\mathcal{H} &= \frac{1}{2} m u^\alpha g_{\alpha\beta} u^\beta + \frac{1}{2} mc^2
\end{aligned} \tag{1.155}$$
Note how the Hamiltonian constraint, in kinematic variables, is always the same, regardless
of the forces acting on the particle.
To calculate the expressions of θa and ωab in kinematic variables, we first see how the
covector basis transforms. We have
$$\begin{aligned}
e^{q^\alpha} &= \partial_{x^\beta} q^\alpha\, e^{x^\beta} + \partial_{u^\gamma} q^\alpha\, e^{u^\gamma} = \delta^\alpha_\beta\, e^{x^\beta} + 0\, e^{u^\gamma} = e^{x^\alpha}\\
e^{p_\alpha} &= \partial_{x^\beta} p_\alpha\, e^{x^\beta} + \partial_{u^\gamma} p_\alpha\, e^{u^\gamma} = \left(m \partial_{x^\beta} g_{\alpha\gamma} u^\gamma + q \partial_{x^\beta} A_\alpha\right) e^{x^\beta} + m g_{\alpha\gamma}\, e^{u^\gamma}
\end{aligned} \tag{1.156}$$
Now we can express the forms in terms of state variables and perform variable substitution.
$$\begin{aligned}
\theta &= \theta_a e^a = p_\alpha e^{q^\alpha} = \left(m g_{\alpha\beta} u^\beta + qA_\alpha\right) e^{x^\alpha}\\
\omega &= \omega_{ab}\, e^a \otimes e^b = \omega_{q^\alpha p_\beta}\, e^{q^\alpha} \otimes e^{p_\beta} + \omega_{p_\alpha q^\beta}\, e^{p_\alpha} \otimes e^{q^\beta} = e^{q^\alpha} \otimes e^{p_\alpha} - e^{p_\alpha} \otimes e^{q^\alpha}\\
&= e^{x^\alpha} \otimes \left[\left(m \partial_{x^\beta} g_{\alpha\gamma} u^\gamma + q \partial_{x^\beta} A_\alpha\right) e^{x^\beta} + m g_{\alpha\gamma}\, e^{u^\gamma}\right] - \left[\left(m \partial_{x^\beta} g_{\alpha\gamma} u^\gamma + q \partial_{x^\beta} A_\alpha\right) e^{x^\beta} + m g_{\alpha\gamma}\, e^{u^\gamma}\right] \otimes e^{x^\alpha}\\
&= \left(m \partial_{x^\beta} g_{\alpha\gamma} u^\gamma + q \partial_{x^\beta} A_\alpha\right) e^{x^\alpha} \otimes e^{x^\beta} + m g_{\alpha\beta}\, e^{x^\alpha} \otimes e^{u^\beta} - \left(m \partial_{x^\beta} g_{\alpha\gamma} u^\gamma + q \partial_{x^\beta} A_\alpha\right) e^{x^\beta} \otimes e^{x^\alpha} - m g_{\alpha\beta}\, e^{u^\beta} \otimes e^{x^\alpha}\\
&= \left(m u^\gamma \left(\partial_{x^\beta} g_{\alpha\gamma} - \partial_{x^\alpha} g_{\beta\gamma}\right) + q\left(\partial_{x^\beta} A_\alpha - \partial_{x^\alpha} A_\beta\right)\right) e^{x^\alpha} \otimes e^{x^\beta} + m g_{\alpha\gamma}\, e^{x^\alpha} \otimes e^{u^\gamma} - m g_{\alpha\gamma}\, e^{u^\gamma} \otimes e^{x^\alpha}
\end{aligned} \tag{1.157}$$

35 An open problem is understanding whether U is constrained to be a constant or not.
We introduce the following two expressions
$$F_{\alpha\beta} = \partial_{x^\alpha} A_\beta - \partial_{x^\beta} A_\alpha, \qquad G_{\alpha\beta\gamma} = \partial_{x^\alpha} g_{\beta\gamma} - \partial_{x^\beta} g_{\alpha\gamma}. \tag{1.158}$$
We have

$$\theta_a = \left[m g_{\alpha\beta} u^\beta + qA_\alpha \;\; 0\right], \qquad \omega_{ab} = \begin{bmatrix} -m G_{\alpha\beta\gamma} u^\gamma - qF_{\alpha\beta} & m g_{\alpha\beta} \\ -m g_{\alpha\beta} & 0 \end{bmatrix}. \tag{1.159}$$
We recognize Fαβ as the electromagnetic field tensor. However, it is still an open problem
what Gαβγ represents. It has a direct relationship with the Christoffel symbols Γαβγ :36

$$\begin{aligned}
G_{\alpha\beta\gamma} &= \partial_\alpha g_{\beta\gamma} - \partial_\beta g_{\alpha\gamma} = \frac{1}{2}\left(\partial_\alpha g_{\beta\gamma} - \partial_\beta g_{\alpha\gamma}\right) - \frac{1}{2}\left(\partial_\beta g_{\alpha\gamma} - \partial_\alpha g_{\beta\gamma}\right)\\
&= \frac{1}{2}\left(\partial_\gamma g_{\alpha\beta} + \partial_\alpha g_{\beta\gamma} - \partial_\beta g_{\alpha\gamma}\right) - \frac{1}{2}\left(\partial_\gamma g_{\alpha\beta} + \partial_\beta g_{\alpha\gamma} - \partial_\alpha g_{\beta\gamma}\right)\\
&= \Gamma_{\beta\alpha\gamma} - \Gamma_{\alpha\beta\gamma}.
\end{aligned} \tag{1.160}$$
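Equation 1.160 can be spot-checked mechanically. The following sympy sketch, our own and using an arbitrary 2D symmetric metric, verifies that Gαβγ as defined in 1.158 equals the antisymmetrized Christoffel symbols of the first kind:

```python
# A sympy spot-check (ours, arbitrary 2D metric) of eq. 1.160.
import sympy as sp

x, y = sp.symbols('x y', real=True)
X = [x, y]
g = sp.Matrix([[1 + x**2, x*y], [x*y, 2 + y**2]])   # arbitrary symmetric metric

def dg(a, b, c):                       # partial_a g_bc
    return sp.diff(g[b, c], X[a])

def Gamma(a, b, c):                    # Christoffel symbol of the first kind
    return (dg(c, a, b) + dg(b, a, c) - dg(a, b, c)) / 2

ok = all(sp.simplify(dg(a, b, c) - dg(b, a, c)       # G_abc of eq. 1.158
                     - (Gamma(b, a, c) - Gamma(a, b, c))) == 0
         for a in range(2) for b in range(2) for c in range(2))
print(ok)   # -> True
```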
If we express the two tensors in contravariant components, we find37

$$\begin{aligned}
G^{\alpha\beta\gamma} &= g^{\alpha\delta} g^{\beta\epsilon} g^{\gamma\zeta} G_{\delta\epsilon\zeta} = g^{\alpha\delta} g^{\beta\epsilon} g^{\gamma\zeta} \partial_\delta g_{\epsilon\zeta} - g^{\alpha\delta} g^{\beta\epsilon} g^{\gamma\zeta} \partial_\epsilon g_{\delta\zeta}\\
&= -g^{\alpha\delta} \partial_\delta g^{\beta\gamma} + g^{\beta\epsilon} \partial_\epsilon g^{\alpha\gamma} = \partial^\beta g^{\alpha\gamma} - \partial^\alpha g^{\beta\gamma}
\end{aligned} \tag{1.161}$$

$$\begin{aligned}
F^{\alpha\beta} &= g^{\alpha\gamma} g^{\beta\delta} F_{\gamma\delta} = g^{\alpha\gamma} g^{\beta\delta} \partial_\gamma A_\delta - g^{\alpha\gamma} g^{\beta\delta} \partial_\delta A_\gamma\\
&= g^{\alpha\gamma} \partial_\gamma\left(g^{\beta\delta} A_\delta\right) - g^{\beta\delta} \partial_\delta\left(g^{\alpha\gamma} A_\gamma\right) - g^{\alpha\gamma} \partial_\gamma g^{\beta\delta}\, A_\delta + g^{\beta\delta} \partial_\delta g^{\alpha\gamma}\, A_\gamma\\
&= \partial^\alpha A^\beta - \partial^\beta A^\alpha + \partial^\beta g^{\alpha\gamma} A_\gamma - \partial^\alpha g^{\beta\delta} A_\delta\\
&= \partial^\alpha A^\beta - \partial^\beta A^\alpha + G^{\alpha\beta\gamma} A_\gamma
\end{aligned} \tag{1.162}$$
The link between F αβ and Gαβγ is not specific to the electromagnetic field. In fact

$$\begin{aligned}
\nabla^\alpha A^\beta - \nabla^\beta A^\alpha &= g^{\alpha\gamma}\nabla_\gamma A^\beta - g^{\beta\gamma}\nabla_\gamma A^\alpha\\
&= g^{\alpha\gamma}\left(\partial_\gamma A^\beta + \Gamma^\beta{}_{\gamma\delta} A^\delta\right) - g^{\beta\gamma}\left(\partial_\gamma A^\alpha + \Gamma^\alpha{}_{\gamma\delta} A^\delta\right)\\
&= \partial^\alpha A^\beta - \partial^\beta A^\alpha + g^{\alpha\delta} g^{\beta\epsilon} A^\gamma\left(\Gamma_{\epsilon\delta\gamma} - \Gamma_{\delta\epsilon\gamma}\right)\\
&= \partial^\alpha A^\beta - \partial^\beta A^\alpha + G^{\alpha\beta\gamma} A_\gamma
\end{aligned} \tag{1.163}$$
36 It is not the torsion, as it anti-symmetrizes the first two indexes of the Christoffel symbols, while the torsion uses the second two. In fact, we used the expression for a connection with no torsion to derive the expression.

37 The derivation uses the relationship $\partial_\alpha g^{\beta\gamma} = -g^{\beta\delta} g^{\gamma\epsilon} \partial_\alpha g_{\delta\epsilon}$, which can be derived from $0 = \partial_\alpha \delta^\beta_\epsilon = \partial_\alpha\left(g^{\beta\delta} g_{\delta\epsilon}\right) = g_{\delta\epsilon} \partial_\alpha g^{\beta\delta} + g^{\beta\delta} \partial_\alpha g_{\delta\epsilon}$; contracting with $g^{\gamma\epsilon}$ gives $0 = g^{\gamma\epsilon} g_{\delta\epsilon} \partial_\alpha g^{\beta\delta} + g^{\gamma\epsilon} g^{\beta\delta} \partial_\alpha g_{\delta\epsilon} = \partial_\alpha g^{\beta\gamma} + g^{\gamma\epsilon} g^{\beta\delta} \partial_\alpha g_{\delta\epsilon}$.

The expression Gαβγ is linked to a sort of covariant exterior derivative for vectors. Unfortu-
nately, we do not truly understand its geometrical significance.
To find the equations of motion, we first calculate the Poisson brackets between the kinematic variables.

{xα , xβ } = {q α , q β } = 0 (1.164)

$$\{x^\alpha, u^\beta\} = \left\{q^\alpha,\ \frac{1}{m} g^{\beta\gamma}(p_\gamma - qA_\gamma)\right\} = \frac{1}{m}\{q^\alpha, g^{\beta\gamma} p_\gamma\} - \frac{q}{m}\{q^\alpha, g^{\beta\gamma} A_\gamma\} = \frac{1}{m} g^{\alpha\beta} \tag{1.165}$$

$$\begin{aligned}
\{u^\alpha, u^\beta\} &= \frac{1}{m^2}\left\{g^{\alpha\gamma} p_\gamma - qA^\alpha,\ g^{\beta\delta} p_\delta - qA^\beta\right\}\\
&= \frac{1}{m^2}\left[\{g^{\alpha\gamma} p_\gamma, g^{\beta\delta} p_\delta\} - \{g^{\alpha\gamma} p_\gamma, qA^\beta\} - \{qA^\alpha, g^{\beta\delta} p_\delta\} + \{qA^\alpha, qA^\beta\}\right]\\
&= \frac{1}{m^2}\left[g^{\alpha\gamma}\{p_\gamma, p_\delta\} g^{\beta\delta} + p_\gamma\{g^{\alpha\gamma}, p_\delta\} g^{\beta\delta} + g^{\alpha\gamma}\{p_\gamma, g^{\beta\delta}\} p_\delta + p_\gamma\{g^{\alpha\gamma}, g^{\beta\delta}\} p_\delta\right.\\
&\qquad\left. -\, g^{\alpha\gamma}\{p_\gamma, qA^\beta\} - p_\gamma\{g^{\alpha\gamma}, qA^\beta\} - \{qA^\alpha, p_\delta\} g^{\beta\delta} - \{qA^\alpha, g^{\beta\delta}\} p_\delta\right]\\
&= \frac{1}{m^2}\left[G^{\alpha\beta\gamma} p_\gamma + q\left(\partial^\alpha A^\beta - \partial^\beta A^\alpha\right)\right]\\
&= \frac{1}{m^2}\left[G^{\alpha\beta\gamma}\left(m g_{\gamma\delta} u^\delta + qA_\gamma\right) + q\left(\partial^\alpha A^\beta - \partial^\beta A^\alpha\right)\right]\\
&= \frac{1}{m^2}\left[G^{\alpha\beta\gamma} m g_{\gamma\delta} u^\delta + q\left(G^{\alpha\beta\gamma} A_\gamma + \partial^\alpha A^\beta - \partial^\beta A^\alpha\right)\right]\\
&= \frac{1}{m^2}\left[G^{\alpha\beta\gamma} m g_{\gamma\delta} u^\delta + qF^{\alpha\beta}\right]
\end{aligned} \tag{1.166}$$
We find the evolution of the kinematic variables by calculating the Poisson bracket with
the Hamiltonian constraint.
$$d_s x^\alpha = \{x^\alpha, \mathcal{H}\} = \frac{1}{2} m\{x^\alpha, u^\beta g_{\beta\gamma} u^\gamma\} = m u^\beta g_{\beta\gamma}\{x^\alpha, u^\gamma\} = m u^\beta g_{\beta\gamma}\, \frac{1}{m}\, g^{\alpha\gamma} = u^\alpha \tag{1.167}$$

$$\begin{aligned}
d_s u^\alpha &= \{u^\alpha, \mathcal{H}\} = \frac{1}{2} m\{u^\alpha, u^\beta g_{\beta\gamma} u^\gamma\}\\
&= m u^\beta g_{\beta\gamma}\{u^\alpha, u^\gamma\} + \frac{1}{2} m u^\beta u^\gamma\{u^\alpha, g_{\beta\gamma}\}\\
&= m u^\beta g_{\beta\gamma}\, \frac{1}{m^2}\left(G^{\alpha\gamma\delta} m g_{\delta\epsilon} u^\epsilon + qF^{\alpha\gamma}\right) - \frac{1}{2} u^\beta u^\gamma g^{\alpha\delta}\partial_\delta g_{\beta\gamma}\\
&= u^\beta u^\gamma g^{\alpha\delta} G_{\delta\beta\gamma} - \frac{1}{2} u^\beta u^\gamma g^{\alpha\delta}\partial_\delta g_{\beta\gamma} + \frac{q}{m} F^{\alpha\gamma} g_{\gamma\beta} u^\beta\\
&= u^\beta u^\gamma g^{\alpha\delta}\left(\partial_\delta g_{\beta\gamma} - \partial_\beta g_{\delta\gamma} - \frac{1}{2}\partial_\delta g_{\beta\gamma}\right) + \frac{q}{m} F^{\alpha\gamma} g_{\gamma\beta} u^\beta\\
&= -\frac{1}{2} u^\beta u^\gamma g^{\alpha\delta}\left(\partial_\beta g_{\delta\gamma} + \partial_\gamma g_{\delta\beta} - \partial_\delta g_{\beta\gamma}\right) + \frac{q}{m} F^{\alpha\gamma} g_{\gamma\beta} u^\beta\\
&= -u^\beta u^\gamma g^{\alpha\delta}\, \Gamma_{\delta\beta\gamma} + \frac{q}{m} F^{\alpha\gamma} g_{\gamma\beta} u^\beta,
\end{aligned}$$

where in the next-to-last step we used the symmetry of $u^\beta u^\gamma$ to symmetrize the term $\partial_\beta g_{\delta\gamma}$.

$$D_s u^\alpha = d_s u^\alpha + u^\beta\, d_s x^\gamma\, \Gamma^\alpha{}_{\beta\gamma} = \frac{q}{m} F^{\alpha\gamma} g_{\gamma\beta} u^\beta \tag{1.168}$$
These are geodesic equations modified by the electromagnetic force, which are consistent with
general relativity.
We have found that the addition of assumption KE in the generalized case gives us relativistic mechanics and only relativistic mechanics. There was no choice at any point; therefore, in this sense, relativistic mechanics is the only option that works.

Insight 1.169. Relativistic mechanics is a consequence of DR, IND and KE.

Let’s see how the Minkowski metric and the speed of light emerges from what we have
already discussed. The strong kinematic assumptions imposes a linear relationship between
velocity which, in absence of forces, and be written as

pα = mgαβ uα . (1.170)

Given that gαβ is a symmetric tensor that depends only on space-time, it can be diagonalized
at a point P with a suitable coordinate choice. While the velocity is in terms of an affine
parameter s, we can express it in terms of time since ds xα = ds t dt xα = λdt xα . Since we can
set x0 = t, the above equation becomes

$$-E = m g_{00}\, \lambda\, d_t t = \lambda m g_{00}, \qquad p_i = m g_{ii}\, \lambda\, d_t x^i = \lambda m g_{ii} v^i. \tag{1.171}$$

Dimensional analysis on the spatial components tells us that the units already work, in the sense that gii must be pure numbers, under the assumption that s has dimensions of time. For the time component, instead, dimensional analysis tells us that g00 must be the square of a velocity. As we saw before, solutions with positive energy are those for which s and t are aligned, therefore g00 must be a negative quantity. Therefore we set g00 = −c2 , and we recognize c as the speed of light. Note that, technically, the constant does not play the role of a speed at all in this discussion, but just that of a conversion factor from the temporal component of the four-velocity to the energy.
We can change units so that g00 = −1 and gii = 1. For the spatial components, it is just a matter of rescaling the units. For time, units need to be changed as well by setting p0 = −E/c and x0 = ct. Note that the product of the two remains of the same unit, and the two are still conjugate as space variables. Therefore we have shown that, at every point, we can find a set of space-time coordinates such that gαβ = ηαβ , where ηαβ is the Minkowski metric. If we use proper time as the affine parameter s, then λ becomes $\gamma = \frac{1}{\sqrt{1 - \frac{v^2}{c^2}}}$.
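A final consistency check (a sketch of ours): with E = γmc2 and p = γmv for the free particle, the mass-shell relation of equation 1.138 holds identically.

```python
# A sympy check (ours) that E = gamma*m*c**2, p = gamma*m*v satisfy
# E**2 - c**2*p**2 = m**2*c**4, the mass shell of eq. 1.138.
import sympy as sp

m, c, v = sp.symbols('m c v', positive=True)
gamma = 1 / sp.sqrt(1 - v**2/c**2)            # the Lorentz factor found above
E, p = gamma*m*c**2, gamma*m*v
print(sp.simplify(E**2 - c**2*p**2 - m**2*c**4))   # -> 0
```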

The failure of Galilean relativity


Given that our assumptions led directly to relativistic mechanics, non-relativistic mechanics
must fail to satisfy the strong kinematic assumption. We have not identified the exact reason
for the failure, though we have identified an interesting issue.
Galilean space-time transformations correspond to

t̂ = t
x̂ = x + v0 t (1.172)
v̂ = v + v0

The canonical transformation induced by that space-time variable change is the following

t = t̂
q = q̂ − v0 t̂
(1.173)
p̂ = ∂q̂ q p + ∂q̂ t(−E) = p
−Ê = ∂t̂ q p + ∂t̂ t(−E) = −v0 p − E

which gives us different rules for how momentum and energy transform. Note, for example,
that the momentum remains unchanged. The expected rules would be

$$\begin{aligned}
\hat p &= m\hat v = mv + mv_0 = p + mv_0\\
\hat E &= \frac{1}{2} m\hat v^2 = \frac{1}{2} m(v + v_0)^2 = \frac{1}{2} mv^2 + mvv_0 + \frac{1}{2} mv_0^2 = E + v_0 p + \frac{1}{2} mv_0^2.
\end{aligned} \tag{1.174}$$
2
The difference between the two expressions is a constant, therefore Galilean change of vari-
ables together with non-relativistic expressions for momentum and energy are still canonical
transformations. However, accommodating that constant requires a change of gauge. That is,
if we start in an inertial frame where

pα = mλgαβ uβ (1.175)

we end up in another inertial frame where

$$p_\alpha = m\lambda g_{\alpha\beta} u^\beta + A_\alpha, \qquad A_\alpha = \left[\frac{1}{2} m|v_0|^2 \;\; m v_0^i\right]. \tag{1.176}$$
The new vector potential is still a constant field, therefore the forces do not change. Still,
the spatial components have a different value than the time ones, therefore the direction of
momentum is corrected.
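The mismatch can be checked directly. Here is a small sympy sketch of ours, in one dimension, comparing the canonical rules of equation 1.173 with the kinematic expectations of equation 1.174:

```python
# A sympy sketch (ours): the canonical transformation of eq. 1.173 and the
# kinematic expectation of eq. 1.174 differ by constants, which must be
# absorbed into a constant gauge term as in eq. 1.176.
import sympy as sp

m, v, v0 = sp.symbols('m v v0', real=True)

p, E = m*v, m*v**2/2              # non-relativistic momentum and energy
p_canonical = p                   # eq. 1.173: momentum is unchanged
E_canonical = v0*p + E            # eq. 1.173: Ehat = v0*p + E
p_expected = m*(v + v0)           # eq. 1.174
E_expected = m*(v + v0)**2 / 2    # eq. 1.174

print(sp.simplify(p_expected - p_canonical))  # -> m*v0: constant in v
print(sp.simplify(E_expected - E_canonical))  # -> m*v0**2/2: constant in v
```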
What we find, then, is that Galilean transformations do not preserve the relationship
between kinetic momentum and conjugate momentum. To recover the original relationship,
one has to perform a gauge transformation. But this is contrary to the expectation that the
laws are the same for all inertial frames. There are likely other deeper issues at play, which
we hope to uncover in the future.

Metric tensor revisited


As we saw, the metric tensor appears not as defining the distances in space-time but rather as
the linear relationship between velocity and conjugate momentum. It does, however, end up
playing an important geometric role: in 1.159 we see that the metric tensor times the mass is
the off diagonal component of the form ωab . Therefore, if we have a range of positions dxα and
a range of velocities dv β , the number of configurations they identify is given by dxα mgαβ dv β .
In other words,
Insight 1.177. The metric tensor allows us to count states in terms of the kinematic variables.
This corroborates what we saw in 1.127: the determinant of the metric tensor is the Jacobian determinant that allows us to express densities over phase space as densities over kinematic variables. In relativity, the square root of the determinant of the metric tensor appears to create an invariant volume element. That is, $\int_U \sqrt{|g_{\alpha\beta}|}\, dx^0 dx^1 dx^2 dx^3$ defines the volume of the space-time region U. However, we can also define an invariant volume element over the space of kinematic variables. That is, $\int_V |g_{\alpha\beta}|\, dx^0 \cdots dx^3\, du^0 \cdots du^3$ defines the volume of the position-velocity space. The determinant of the metric tensor, then, is more directly giving us the size of a volume taken in position-velocity space instead of space-time. This also connects to the ability to specify distributions in terms of kinematic variables since, as we saw in FKE-DEN, the Jacobian determinant of the transformation between position and velocity is proportional to ∣gαβ ∣.
Why is it, then, that defining volumes over kinematic quantities fixes the geometry of
space-time? Note that both velocities and differentials of space transform like vectors since
duα = dxα /dτ . Therefore if dxα gαβ duβ is invariant, then dxα gαβ dxβ will also be invariant. The
geometry of space-time, then, is set by the geometry of phase space.
There are other interesting open questions to explore to better understand this relationship. For example, in relativity, one puts spatial and time variables on the same footing. However, we saw that the primary role of the metric tensor is to let us handle distributions and integration, and these work differently in space and time. Consider, in fact, a distribution ρ(xi , v j , t). This would represent the evolution over time of a distribution over kinematic variables. At each time, the distribution has to integrate to one, assuming it is normalized, which means the units of ρ are $\left[\frac{1}{[x]^3 [v]^3}\right]$, the inverse of the cube of distance times velocity. Note that every observer has to see the same thing: a normalized distribution at each moment in time. Additionally, this has to happen no matter what ρ is. An open question in reverse physics, then, is whether imposing this requirement is already enough to recover parts of relativity.

1.10 Reversing phase space


In this section we will find the assumptions required to rederive the structure of phase space.
The principle of relativity will play a key role as the structure of classical phase space is
the only structure that allows us to define densities and entropy in a way that is coordinate
invariant.

Properties of phase space


We have seen that DR and IND are the constitutive assumptions of Hamiltonian mechanics,
in the sense that they fully characterize Hamiltonian evolution. The addition of KE in its

full version recovers both Lagrangian mechanics and massive particles under potential forces.
The relativistic version of the theories comes out without additional assumptions simply by
properly dividing the role of time as a variable from time as a parameter. But in all this
discussion we assumed, without questioning, that states are identified by pairs of variables:
position-velocity or position-momentum. Is this a coincidence or is there an underlying reason
for it? Can we find assumptions from which the structure of phase space itself can be recovered?
Since we are interested in the structure of phase space itself, let’s go back to its original
version, without the extension to time or the temporal DOF. Throughout the whole discussion,
conjugate momentum pi has been written with the index down, tacitly assuming it to be a
covector. This means it obeys the following transformation rules under coordinate changes:

q̂ i = q̂ i (q j )
(1.178)
p̂i = ∂q̂i q j pj

When a metric tensor is defined, that is when assumption KE is valid, the difference between
vector and covector blurs, as we can always transform one into the other, but this is not the
general case. We have, therefore, this condition

Conjugate momentum pi changes like a covector under changes of coordinates q i (PS-COV)

which characterizes the relationship between q i and pi . We want to stress that these changes
of coordinates do not mix space and time variables. Therefore we are only talking about different choices of spatial coordinates at equal time.
Given that the form ωab plays a fundamental role, as it defines the geometry of phase space
in terms of state count, let’s see how it transforms during a coordinate change. We have:

$$\begin{aligned}
\hat\omega_{ab} &= \begin{bmatrix} \omega_{\hat q^i \hat q^j} & \omega_{\hat q^i \hat p_j} \\ \omega_{\hat p_i \hat q^j} & \omega_{\hat p_i \hat p_j} \end{bmatrix} = \begin{bmatrix} \partial_{q^k}\hat q^i & \partial_{p_k}\hat q^i \\ \partial_{q^k}\hat p_i & \partial_{p_k}\hat p_i \end{bmatrix} \begin{bmatrix} \omega_{q^k q^l} & \omega_{q^k p_l} \\ \omega_{p_k q^l} & \omega_{p_k p_l} \end{bmatrix} \begin{bmatrix} \partial_{q^l}\hat q^j & \partial_{q^l}\hat p_j \\ \partial_{p_l}\hat q^j & \partial_{p_l}\hat p_j \end{bmatrix}\\
&= \begin{bmatrix} \partial_{q^k}\hat q^i & 0 \\ \partial_{q^k}\hat p_i & \partial_{\hat q^i} q^k \end{bmatrix} \begin{bmatrix} 0 & \delta_k^l \\ -\delta_k^l & 0 \end{bmatrix} \begin{bmatrix} \partial_{q^l}\hat q^j & \partial_{q^l}\hat p_j \\ 0 & \partial_{\hat q^j} q^l \end{bmatrix}\\
&= \begin{bmatrix} 0 & \partial_{q^l}\hat q^i \\ -\partial_{\hat q^i} q^l & \partial_{q^l}\hat p_i \end{bmatrix} \begin{bmatrix} \partial_{q^l}\hat q^j & \partial_{q^l}\hat p_j \\ 0 & \partial_{\hat q^j} q^l \end{bmatrix}\\
&= \begin{bmatrix} 0 & \partial_{q^l}\hat q^i\, \partial_{\hat q^j} q^l \\ -\partial_{\hat q^i} q^l\, \partial_{q^l}\hat q^j & -\partial_{\hat q^i} q^l\, \partial_{q^l}\hat p_j + \partial_{q^l}\hat p_i\, \partial_{\hat q^j} q^l \end{bmatrix}\\
&= \begin{bmatrix} 0 & \partial_{\hat q^j}\hat q^i \\ -\partial_{\hat q^i}\hat q^j & -\partial_{\hat q^i}\hat p_j + \partial_{\hat q^j}\hat p_i \end{bmatrix} = \begin{bmatrix} 0 & \delta^i_j \\ -\delta^i_j & 0 \end{bmatrix}
\end{aligned} \tag{1.179}$$

For the last step, we are taking partial derivatives of the new variables with respect to the new
variables. The derivative is equal to one if we are taking the partial derivative with respect
to the same variable and zero otherwise. We find that condition

The form ωab is invariant under changes of coordinates q i (PS-SYMP)

is implied by PS-COV.

As usual, we ask whether the converse is true. That is, suppose we perform a change of
coordinates q̂ i = q̂ i (q j ) for which ωab is invariant. Does this pose restrictions on how conjugate
momentum changes? We have

$$\begin{aligned}
\hat\omega_{ab} &= \begin{bmatrix} \omega_{\hat q^i \hat q^j} & \omega_{\hat q^i \hat p_j} \\ \omega_{\hat p_i \hat q^j} & \omega_{\hat p_i \hat p_j} \end{bmatrix} = \begin{bmatrix} \partial_{q^k}\hat q^i & \partial_{p_k}\hat q^i \\ \partial_{q^k}\hat p_i & \partial_{p_k}\hat p_i \end{bmatrix} \begin{bmatrix} \omega_{q^k q^l} & \omega_{q^k p_l} \\ \omega_{p_k q^l} & \omega_{p_k p_l} \end{bmatrix} \begin{bmatrix} \partial_{q^l}\hat q^j & \partial_{q^l}\hat p_j \\ \partial_{p_l}\hat q^j & \partial_{p_l}\hat p_j \end{bmatrix}\\
&= \begin{bmatrix} \partial_{q^k}\hat q^i & 0 \\ \partial_{q^k}\hat p_i & \partial_{p_k}\hat p_i \end{bmatrix} \begin{bmatrix} 0 & \delta_k^l \\ -\delta_k^l & 0 \end{bmatrix} \begin{bmatrix} \partial_{q^l}\hat q^j & \partial_{q^l}\hat p_j \\ 0 & \partial_{p_l}\hat p_j \end{bmatrix}\\
&= \begin{bmatrix} 0 & \partial_{q^l}\hat q^i \\ -\partial_{p_l}\hat p_i & \partial_{q^l}\hat p_i \end{bmatrix} \begin{bmatrix} \partial_{q^l}\hat q^j & \partial_{q^l}\hat p_j \\ 0 & \partial_{p_l}\hat p_j \end{bmatrix}\\
&= \begin{bmatrix} 0 & \partial_{q^l}\hat q^i\, \partial_{p_l}\hat p_j \\ -\partial_{p_l}\hat p_i\, \partial_{q^l}\hat q^j & -\partial_{p_l}\hat p_i\, \partial_{q^l}\hat p_j + \partial_{q^l}\hat p_i\, \partial_{p_l}\hat p_j \end{bmatrix} = \omega_{ab} = \begin{bmatrix} 0 & \delta^i_j \\ -\delta^i_j & 0 \end{bmatrix}
\end{aligned} \tag{1.180}$$

The two off diagonal terms impose the same constraint

∂ql q̂ i ∂pl p̂j = δij . (1.181)

In matrix terms, we are taking the product of two matrices and equating it to the identity
matrix. Therefore the two matrices are the inverse of each other:

∂pl p̂j = (∂ql q̂ j )−1 = ∂q̂j q l . (1.182)

This means that the change of position variables induces a change of momentum

p̂j = ∂q̂j q i pi + Aj (q k ), (1.183)

where Aj are arbitrary functions, which we can set to zero without loss of generality.38 This
makes momentum a covector.
The second constraint comes from the bottom diagonal term. Using the newly found
transformation rules, we have

−∂pl p̂i ∂ql p̂j + ∂ql p̂i ∂pl p̂j = −∂q̂i q l ∂ql p̂j + ∂ql p̂i ∂q̂j q l = −∂q̂i p̂j + ∂q̂j p̂i = 0. (1.184)

This constraint is satisfied simply because position and momentum are different state vari-
ables, and partial derivatives along one variable are taken keeping the others constant. By
imposing the preservation of the form ωab under an arbitrary change of coordinates, we recov-
ered the transformation law of momentum as a covector. Therefore conditions PS-COV and
PS-SYMP are equivalent.
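This equivalence can be probed numerically. The following sketch, our own construction with a hypothetical non-linear change q̂ = q³, lets momentum transform as a covector and checks that the induced phase-space transformation has unit Jacobian determinant, so areas, and hence ωab, are preserved:

```python
# A numeric spot-check (ours): with the covector rule, the phase-space
# Jacobian of an arbitrary non-linear coordinate change has determinant 1.
import numpy as np

def transform(q, p):
    qh = q**3                 # arbitrary non-linear coordinate change
    ph = p / (3 * q**2)       # covector rule: p_hat = (dq/dq_hat) * p
    return qh, ph

q, p, eps = 1.7, 0.4, 1e-6
# Numerical Jacobian of the induced transformation (q, p) -> (qh, ph).
J = np.array([
    [(transform(q+eps, p)[0] - transform(q-eps, p)[0]) / (2*eps),
     (transform(q, p+eps)[0] - transform(q, p-eps)[0]) / (2*eps)],
    [(transform(q+eps, p)[1] - transform(q-eps, p)[1]) / (2*eps),
     (transform(q, p+eps)[1] - transform(q, p-eps)[1]) / (2*eps)],
])
print(np.linalg.det(J))       # -> 1.0 (up to numerical error)
```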
The invariance of the form ωab under coordinate changes means that all those properties
that were invariant under Hamiltonian evolution are also invariant under equal-time coordi-
nate changes. Therefore condition

The Poisson brackets are invariant under equal-time coordinate changes (PS-POI)

is equivalent to PS-SYMP.

38 The arbitrary functions change the value of zero momentum, potentially at every point in a different way. While this is mathematically possible, it would make no physical sense that a coordinate change would induce a change in the zero reference for momentum. We will see later how this is related to gauge transformations.
The following conditions are equivalent to each other and are implied by PS-SYMP, but
do not imply it.

The system allows statistically independent distributions over each DOF under any choice of coordinates q i (PSI-DEN)

The system allows informationally independent distributions over each DOF under any choice of coordinates q i (PSI-INFO)

The system allows peaked distributions where the uncertainty is the product of the uncertainty on each DOF under any choice of coordinates q i (PSI-UNC)

Physically, these correspond to the independence of DOFs, which is assumption IND. The
only difference is that the assumption must be valid for all equal-time coordinate changes,
which is just an application of the principle of relativity.
Lastly, the following conditions are all equivalent but independent of the PSI conditions
and are implied by PS-SYMP, but do not imply it.

Phase space volumes are invariant under equal-time changes of coordinates q i (PSV-VOL)

The Jacobian for the transformation induced by equal-time changes of coordinates q i is unitary (PSV-JAC)

Densities over phase space are invariant under equal-time changes of coordinates q i (PSV-DEN)

Thermodynamic entropy is invariant under equal-time changes of coordinates q i (PSV-THER)

Information entropy is invariant under equal-time changes of coordinates q i (PSV-INFO)

Uncertainty of peaked distributions is invariant under equal-time changes of coordinates q i (PSV-UNC)

Physically, these correspond to requiring that the count of states is the same under all choices
of coordinates at a given time. In the same way that conservation of ωab in time was equivalent
to assumptions DR and IND combined, any PSV condition together with any PSI condition
will be equivalent to any PS condition. That is, the phase space structure corresponds to
invariance of state count plus independence of DOFs.
It should be evident that all these properties are necessary if we want to have a physically
meaningful state space. If state count, densities, entropy, independence of DOFs were not
properties that all equal-time observers could agree on, there would be no notion of an objec-
tive state to begin with, and it would be pointless to even talk about isolation, determinism,
thermodynamics and so on. All these properties, then, are constitutive assumptions for any
state space, as without them there are no well defined states. Having established that phase
space has all these properties, do these properties define phase space? That is, suppose the
states of our system are identified by a finite number of continuous quantities, meaning that
the state space is a manifold; does the invariance of those properties constrain that space to
be phase space?

Invariance over the continuum


To simplify the matter, let us consider the case of a single DOF. One way to define the problem
is to ask that if a distribution is uniform for one observer, it should be uniform for all equal-time
observers. Uniform distributions are important in statistical mechanics since the macrostate
of an isolated system is assumed to be a uniform distribution over all possible microstates
that satisfy a few constraints, such as the value of the energy, the number of particles and so
on. This is very similar to the “principle of indifference” in classical probability, which assigns
equal probabilities to outcomes for which there is no justifiable preference.
For example, if we have a fair die, meaning that the die itself and the mechanism for
throwing it do not have a preference for any side, the principle tells us to assign equal prob-
ability to all sides.39 If the die has 6 sides, we assign probability 1/6 for each side. The part
of probability theory that is linked to combinatorics works like this. Note that what number
we use to label each side of the die, or whether we use something else (e.g. for poker dice,
images of playing cards) is irrelevant for the probability assignment. Therefore, when applied
to discrete variables, the principle of indifference is invariant under relabeling.
If we try to apply the same idea on the continuum, however, things are not as simple.
Suppose we have a factory that produces boxes at random, with side from 1m to 3m. Fur-
thermore, let’s assume that the manufacturing procedure does not have a preference for the
size of the boxes. Using the principle of indifference, we assume a uniform distribution for the
side from 1m to 3m. However, we could have equally said that the factory does not have a
preference for the total volume, and assign a uniform distribution for the volume from 1m3 to
27m3 . The problem is that x → x3 is a non-linear transformation, and uniform distributions
do not remain uniform under non-linear transformations. Another way to frame it: on the continuum we do not have a probability, but a probability density, which has units of probability over units of the variable. In the first case the units were probability over meters, while in the second probability over meters cubed. The density, then, is unit dependent: it depends on the variable.
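A numeric version of the box-factory problem (our own illustration):

```python
# A numeric illustration (ours): a distribution that is uniform in the side L
# is not uniform in the volume L**3.
import numpy as np

rng = np.random.default_rng(0)
side = rng.uniform(1.0, 3.0, 100_000)       # uniform density over the side
volume = side**3                            # same samples, expressed as volume

# Compare the empirical density of 'volume' near 2 m^3 and near 26 m^3:
# a uniform-in-volume distribution would give equal counts per unit volume.
low = np.sum((volume > 1.5) & (volume < 2.5))
high = np.sum((volume > 25.5) & (volume < 26.5))
print(low, high)    # very different: the density depends on the variable
```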
This is essentially our problem:

Insight 1.185. Densities over continuum variables depend on the choice of variable and
therefore do not, in general, satisfy the principle of relativity.

If we have a uniform distribution over a variable, it will only remain uniform under linear
changes of variable. This means that density, thermodynamic entropy, information entropy, all
the properties we looked at before, are not the same under variable changes. Mathematically,
the issue is that only transformations with unitary Jacobian preserve these properties, which
is not the general case.
How does phase space solve the problem? Phase space is defined by two variables, q and
p. A change of coordinates only changes q arbitrarily. Once the q change is determined, p
changes as a covector. Around each point, we have:

dq̂ = dq q̂ dq
(1.186)
dp̂ = dq̂ q dp
39 The principle of indifference is often stated in terms of degree of belief: it is the agent that has no reason to
prefer one outcome over the other. We state it in more objective terms: the system and preparation procedure
do not have a preference. If an agent believes a die to be fair, while in fact the die is loaded, the principle of
indifference gives wrong empirical results, and therefore, for us, it does not apply.

which means

dq̂dp̂ = dq q̂ dq dq̂ q dp = dq q dq dp = dq dp. (1.187)

The area is conserved under a generic change of q precisely because p changes in the opposite
way.40
While the math is clear, what is the physics behind this? Why does conjugate momentum
magically change when we change position? This is better understood if we think in kinematic
variables: position and velocity. Why would velocity change when we change position? Because
units of velocity are units of position over time. If we redefine units of position, we will
redefine units of velocity as well. As we said before, the mathematical tools we currently use
do not capture all the physical elements, and unit dependence is something that mathematics
currently fails to capture. The defining role of coordinate variables q i , then, is not that they
describe position, but rather the following:
Insight 1.188. The coordinate variables q i define the unit system.
What are the units of conjugate momentum, then? Given that the product dqdp is invariant
and it measures the count of configurations for one DOF, pi must have units of configurations
divided by units of the corresponding q i . By convention, we measure phase-space areas in
units of angular momentum, in units of action h, but why do we do that? Suppose that we
track position by an angle in radians q θ . Conjugate momentum pθ will be expressed in units of
area of phase space, and therefore angular momentum. This is the connection between angular
momentum and phase-space areas: the conjugate of a dimensionless quantity must have the
same physical dimensions of areas of phase space. However, if we change the angle from radians
to degrees, conjugate momentum will be units of h over degrees. That is, it is improper to
say the count of states is expressed in units of angular momentum. Moreover, while two
quantities that represent the same physical quantity must have the same physical dimension,
the converse is not true.41 Ultimately, areas in phase space are measured with those units
because, in an inertial Cartesian coordinate frame, linear kinetic momentum times distance
can be used to count the configurations of the system, as we saw when discussing the full
kinematic equivalence.
To recap, we saw that we cannot define a coordinate invariant density over a single continuous variable, but we can do that over two variables, if one has inverse units with respect to the first.

40 In statistical mechanics, it is often said that Liouville's theorem justifies the use of phase space volume as state count. Liouville's theorem states that under Hamiltonian evolution the density, or equivalently the phase space volume, is conserved, which corresponds to assumption DR. As we mentioned before, the conservation of volume in time would not be physically meaningful if the volume were not first an objective quantity. Therefore Liouville's theorem does not justify the use of phase space volume as state count: it is the invariance under equal-time coordinate changes that does.

41 In general, units and physical dimensions keep track of some aspects of physical quantities, but not everything. For example, one can show that, dimensionally, pressure is equal to energy over volume. This relationship is actually useful when describing fluids. On the other hand, energy density is another useful quantity to, for example, measure the capacity of batteries. There is no relationship between the energy density of a battery and its pressure. Another puzzle: if one multiplies radians and meters, what is the resulting unit? Consider the area of the side of a cylinder section θrh, where θ is the angular size of the section, r is the radius and h is the height of the cylinder. If we multiply θ and r, we get the length of the arc, which is a distance: we get meters. However, if we multiply θ and h, we do not get a length, we have an angle times length: we get meters times radians. The issue is that an angle is really a ratio between the length of the arc and the length of the radius. Only a multiplication by the correct radius will give back a distance.

Is this the only case? What happens if we use three or more variables? Suppose
to the first. Is this the only case? What happens if we use three or more variables? Suppose
that q is the only variable that defines the unit system for a set of states, meaning that all
other variables have derived units. Then under unit change q̂ = q̂(q) the units of all other
variables must be uniquely defined. Suppose that we add a single additional constraint, that
the measure of states is invariant. How many total variables can the state space have? We
saw there must be more than one variable, or we cannot change the unit arbitrarily. Now
suppose that we have three or more variables. Given that we have only two constraints, the
choice of unit change and the area conservation, these are not enough to fully determine the
transformation of all variables, and therefore the units of all variables. Therefore it would not
be true that the units of q fully determine the units of all variables. This tells us that, to have den-
sities that are invariant under equal-time coordinate changes, a state space for a system that
is fully characterized by a single unit must have exactly one additional quantity, conjugate
momentum, that changes covariantly under unit change. We reach the following

Insight 1.189. Degrees of freedom are two dimensional because only these allow coordinate
invariant densities and count of configurations.

Invariance over multiple degrees of freedom


We saw how things work for a single DOF, let’s see how they generalize for multiple inde-
pendent DOFs. Suppose states are identified by m variables ξ a , but the unit system is fully
identified by a subset q i of n variables. This would be the case, for example, if we are studying
particle trajectories as the spatial variables define the unit system. Moreover, suppose we
assume IND, that the system is decomposable into independent DOFs. In this case, indepen-
dence of the variables ties in with the independence of the DOFs. That is, if the unit of one
variable depends on the unit of another, they cannot belong to independent DOFs; on the
other hand, if two q i define independent units, they must belong to independent DOFs by
definition.
For example, suppose we have three degrees of freedom. Then we must have three variables
q x , q y and q z that define the units of each degree. Suppose we change q x to q̂ x while leaving
the others unchanged. Given the premise and what we discussed above, there must be another
variable px with inverse units. This must be an additional variable given that q y and q z must
have different units. Given that the DOFs are independent, we should also be able to fix the
value of both q x and px and obtain a subspace identified by all the remaining variables. Now
suppose that we change q y by itself. By the same logic, there must be another variable py .
This must have units of inverse q y , and therefore cannot be any of the variables we already
have. We repeat the logic and we find another variable pz . At this point, there cannot be
other variables: we would have to introduce units that do not depend on q x , q y and q z , but
this contradicts the assumption that q i define the unit system. In general, then, if we have
n degrees of freedom, we must have 2n state variables: the q i that define the units and the
conjugate quantities pi expressed in inverse units.
We now want to introduce a form ωab that returns the count of independent configura-
tions for an infinitesimal parallelogram. Here we simply use the same arguments that we gave
discussing assumption IND. The number of independent configurations identified by a paral-
lelogram within the same degree of freedom will be given by the area of the parallelogram. On
the other hand, a parallelogram across degrees of freedom, for example formed by ∆q x and
∆py , will not properly identify independent configurations. Note, in fact, that their product
is not unit invariant. This means ωab must match the familiar symplectic form. In short, assumption IND
plus requiring that the count of states is the same for all equal-time observers gives us the
structure of phase space. That is, condition

The space allows coordinate invariant distributions over equal-time
independent DOFs (PS-INV)

is equivalent to PS-SYMP and therefore to PS-COV and PS-POI.

Phase space and its structure, then, are neither a coincidence nor a choice. Phase space is
the only space that allows us to define key concepts, like uniform distributions or reversibility,
in an invariant way, which would otherwise be ill-defined over the continuum. In other words:

Insight 1.190. The structure of phase space is exactly the structure needed to define state
densities, thermodynamic entropy, information entropy and statistical uncertainty over con-
tinuous quantities in a way that satisfies the principle of relativity for equal-time observers.

The structure of phase space, then, comes out from simply keeping track of unit depen-
dence between variables. This should sound familiar, as unit dependence was used in condition
FKE-UNIT to motivate why assumption KE should be implemented in its full form FKE-LIN.
The existence of a metric tensor, of inertial mass, and more could be seen as stemming from
assuming that the change of variables from dynamical to kinematic variables depended only
on position, on q i . Now we see that the very structure of phase space rests precisely on the
fact that coordinates are those variables that define the units. This leads to the following ob-
servation: while taking the weak form WKE-INV of assumption KE leads to no mathematical
inconsistencies, it would be physically inconsistent. On one side, when defining phase space,
we are saying that the units of the coordinates q i determine both the units of velocity and of
momentum; on the other side, when introducing WKE, we claim that the coordinates q i do
not determine by themselves the unit transformation between velocity and momentum. The
fact that we can write a mathematically consistent model that is physically inconsistent is
evidence that we need a better mathematical specification of our physical theories. Namely,
we need a way to encode unit relationships between variables.
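As a toy illustration of what such an encoding could look like (this is our own sketch in Python, not an existing formalism), we can attach to each quantity the powers of the position unit and of the phase-space-area unit it carries, so that the requirement that the product q p be a pure count of configurations is tracked mechanically:

    from dataclasses import dataclass

    @dataclass
    class Quantity:
        value: float
        L: int  # power of the position unit carried by the quantity
        A: int  # power of the phase-space-area unit carried by the quantity

        def __mul__(self, other):
            # Multiplying quantities adds their unit dependencies
            return Quantity(self.value * other.value, self.L + other.L, self.A + other.A)

    q = Quantity(2.0, L=1, A=0)   # position: one power of the position unit
    p = Quantity(3.0, L=-1, A=1)  # conjugate momentum: one area per unit position
    print(q * p)                  # Quantity(value=6.0, L=0, A=1): a pure phase-space area

A full mathematical specification of a physical theory would track such unit dependencies for every variable, which is precisely what the standard formulation leaves implicit.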
We are also now in a position to answer another question: why are the laws of physics
second order? That is, why is it that external forces specify the acceleration of the system,
and not the velocity or the jerk (i.e. the derivative of the acceleration)? Note that, once
the structure of phase space is derived, we know that, under KE, position and velocity fully
determine the state, and, during the evolution, the change of state is fully determined by the
acceleration. In other words, the laws of physics are second order exactly because the state
is given by variable pairs. Since we have an explanation as to why the state space must obey
that structure, that same explanation tells us why the laws are second order. Therefore, we
have

Insight 1.191. The principle of relativity ultimately requires the laws of motion to be second
order.

The principle of relativity, then, is responsible for much more than the invariance of the laws:
it is also responsible for the structure of the space and the nature of the laws themselves.
Invariance under relative motion


So far, we have applied the principle of relativity only to transformations that leave time
unchanged. For completeness, let us apply it to the more usual case, when space and time
coordinates are mixed. Consider phase space extended by time only. If we do not mix space
and time variables, we keep the equal-time surfaces the same. Therefore we are only requiring
that areas, density and entropy over each time slice remain the same. Moreover, we have no
requirement relating what happens on two different equal-time surfaces. That is, we will
have the structure of phase space at each time, but there is no connection between these
structures at different times. However, if we mix space and time variables, equal-time surfaces
for one observer are not necessarily equal-time surfaces for another.
Now suppose we have a distribution that is uniform for one observer. The only way that
it is uniform for all other observers is if it remains constant in time as well. In other words,
the principle of relativity will imply the invariance of the form ωab even under space-time
transformations, which means, as we saw, Hamiltonian mechanics. The only way that we can
satisfy the principle of relativity, then, seems to be to assume deterministic and reversible motion.
That is, condition
The space allows coordinate invariant distributions over independent
and temporal DOFs (DI-INV)

is equivalent to DI-SYMP, and is therefore equivalent to Hamiltonian mechanics.
This may be surprising at first, but in retrospect it makes sense. The invariance among
equal-time observers imposed that the density at each state was the same for everybody. In
general, however, the equal-time surface for two generic observers will not be the same, but
will only intersect in a region. Clearly, on that region the density must be the same for both,
but what happens in the regions that are different? Well, if they still need to see the same
density distribution, with the same entropy, then we must be able to map one region to the
other in a bijective way. That is, the evolution must map each state of one surface to a state
of the other surface while preserving the density. But this is exactly what deterministic and
reversible evolution does. If we don’t have this, if densities spread or if states are not mapped
one-to-one, the two observers will see different states.
As another way to see this, suppose we have a box of gas at equilibrium. Now suppose that
at time t0 we start heating it very slowly, such that we can assume the gas is at equilibrium at
each time. The system, then, will increase its entropy. Suppose we stop the process at time t1 ,
so that the temperature of the gas remains constant after that. Now, an equal-time surface of
an observer boosted with respect to the gas will cross the equal-time surfaces of the original
frame at different times, and therefore that observer will see the gas at different temperatures
in different places at the same time. The
moving observer will see a system out of equilibrium. So, again, processes that are completely
isolated, for which entropy is conserved, are the only processes that are truly relativistic.
This tells us that if we want to study a non-deterministic and/or non-reversible process,
in a way that is frame independent, we need to be able to compare it to other processes that
can be assumed to be deterministic and reversible. It is only by comparing the two that we
are going to be able to ascertain precisely by how much the first one fails to be deterministic
or reversible. Suppose, in fact, that we want to construct a coordinate system to study our
non-deterministic and non-reversible process. Operationally, this means devising a system of
rods and clocks so that we can correlate the quantities measured with our spatial and time
references. Therefore we need some guarantee that, as time evolves, the spatial and temporal
references will have set spatial and temporal relationships. For example, if we leave a mark at
a particular position, we may expect the mark to remain there; if the clock ticks uniformly in
time, we may expect it to do so going forward. But this implicitly requires the existence of at
least some deterministic and reversible process that can be used to distribute our references in
space and time. If we do not have access to any deterministic process, no reliable coordinate
system can be constructed. If the processes are not reversible, references will drift and the
precision of our reference system will degrade since DR-UNC is not satisfied.
We close, then, with the following.

Insight 1.192. Hamiltonian evolution over phase space is exactly the structure needed to
define state densities, thermodynamic entropy, information entropy and statistical uncertainty
over continuous quantities in a way that satisfies the principle of relativity.

Gauge transformations
Throughout all this work we have seen that position q i and momentum pi are the true state
variables: they allow us to identify states uniquely in all circumstances, properly count them
and define densities that are invariant under coordinate changes and are defined at a particular
instant in time. On the other hand, kinematic variables xi and v i identify states only under KE,
they properly count states and define densities only in Cartesian inertial frames for systems
subject to no forces, and velocity is technically defined over two infinitesimally close instants
in time. Yet, under assumption KE, the information content of the two sets of variables is
the same. Moreover, position and velocity are the variables that are more directly linked to
experimentation: while we can measure velocity and kinetic momentum directly, conjugate
momentum is not a direct observable.
We now pose the question: if velocity is the variable we actually measure, is conjugate
momentum even uniquely defined? That is, is it possible to change momentum in a way
that leaves velocity, and therefore the trajectories, unchanged? Since the position does not
change, the new state variables can be written as (q, p̂) where the new conjugate momentum
p̂i = p̂i (q j , pk ) is a function of the old variables. We want the new variables to be conjugate,
therefore the following relationships in terms of the Poisson brackets must hold:

\{q^i, q^j\} = 0, \qquad \{q^i, \hat{p}_j\} = \delta^i_j, \qquad \{\hat{p}_i, \hat{p}_j\} = 0. \qquad (1.193)

Using the second equation we find

\delta^i_j = \{q^i, \hat{p}_j\} = \partial_{q^k} q^i \, \partial_{p_k} \hat{p}_j - \partial_{p_k} q^i \, \partial_{q^k} \hat{p}_j = \delta^i_k \, \partial_{p_k} \hat{p}_j - 0 \cdot \partial_{q^k} \hat{p}_j, \quad \text{that is,} \quad \partial_{p_i} \hat{p}_j = \delta^i_j. \qquad (1.194)

Integrating this last equation yields

\hat{p}_j = p_j + G_j(q^i) \qquad (1.195)
where Gj are arbitrary functions of position. Using the third Poisson bracket, we have:

0 = \{\hat{p}_i, \hat{p}_j\} = \{p_i + G_i, p_j + G_j\}
= \{p_i, p_j\} + \{p_i, G_j\} + \{G_i, p_j\} + \{G_i, G_j\}
= 0 - \{G_j, p_i\} + \{G_i, p_j\} + 0 \qquad (1.196)
= -\partial_{q^i} G_j + \partial_{q^j} G_i,

which is, up to sign, the (i, j) component of the curl of G.

This Gj is a curl-free field, so it admits a scalar potential f (q i ). Therefore the general transfor-
mation is of the form

\hat{p}_j = p_j + \partial_j f(q^i). \qquad (1.197)

In other words, conjugate momentum is defined up to a gauge, in the same way that the
electromagnetic potential is defined up to a gauge. In fact, these are essentially the same
gauges.
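This can be checked symbolically. The following minimal sketch (in Python with sympy; the gauge function f below is an arbitrary illustrative choice) verifies that p̂j = pj + ∂j f (q) satisfies the canonical brackets of equation 1.193:

    import sympy as sp

    q1, q2, p1, p2 = sp.symbols('q1 q2 p1 p2')
    qs, ps = [q1, q2], [p1, p2]

    def poisson(f, g):
        # Canonical Poisson bracket {f, g} = sum_k df/dq^k dg/dp_k - df/dp_k dg/dq^k
        return sum(sp.diff(f, qk) * sp.diff(g, pk) - sp.diff(f, pk) * sp.diff(g, qk)
                   for qk, pk in zip(qs, ps))

    f = sp.sin(q1) * q2**2  # arbitrary gauge function f(q)
    p_hat = [p1 + sp.diff(f, q1), p2 + sp.diff(f, q2)]

    assert sp.simplify(poisson(p_hat[0], p_hat[1])) == 0  # {p_hat_1, p_hat_2} = 0
    assert sp.simplify(poisson(q1, p_hat[0])) == 1        # {q^1, p_hat_1} = 1
    assert sp.simplify(poisson(q1, p_hat[1])) == 0        # {q^1, p_hat_2} = 0

Any curl-free G_j, and only a curl-free G_j, passes the first check, in line with the derivation above.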
Recall the expression

u^\alpha = \frac{1}{m} g^{\alpha\beta} \left( p_\beta - q A_\beta \right). \qquad (1.198)

If the four-velocity uα and metric tensor g αβ are gauge independent, and the electromagnetic
four-potential Aβ is gauge dependent, then it must be that conjugate momentum pβ must be
gauge dependent in the exact opposite way such that their difference is gauge independent.
As another way to see this, suppose we have a charged particle in a field free region. In an
inertial frame, we would write

p_i = m v^i. \qquad (1.199)

But this not only requires us to use Cartesian coordinates, it also requires us to have chosen
the gauge for the magnetic field such that Ai = 0. Choosing a gauge for momentum, then, is
the same as choosing a gauge for the magnetic field.
If we look again at equation 1.195, note that what the gauge does is redefine the zero
momentum at each point. The Gj field represents the new value of the old zero. The new
zeros must form a curl free field because the new conjugate momenta still need to form
independent DOFs. Recall that when recovering phase space, in equation 1.183, we saw that a
change of units of position left unspecified a possible redefinition of the zero of momentum.
The gauge freedom is exactly that redefinition.
This tells us, in a more direct way, why assumption KE requires gauge theories. Given
that the true observables are the kinematic variables, conjugate momentum is defined only
up to a curl-free field that represents the arbitrary definition of zero momentum. Given that
the vector potential of the interaction field tells us how the zero momentum is mapped to
zero velocity, this inherits the same arbitrariness. The same relationship will be there in
quantum mechanics, where the arbitrariness of the zero momentum will be implemented in
the arbitrariness of absolute phases.
Entropy and dissipative evolution


Having seen the link between the principle of relativity and deterministic and reversible evo-
lution, let’s look a bit more closely at what happens during non-reversible evolution. We saw
that a damped harmonic oscillator is deterministic but not reversible, as it does not preserve
the count of states. The evolution, in fact, has an equilibrium, and regions around it get
smaller and smaller. This presents a problem: if areas get smaller, the entropy goes down. But
this is a dissipative system: shouldn’t the entropy go up?
To understand the problem, suppose Alice takes a pendulum at rest, displaces the weight
and lets it go. Suppose Alice told Bob, “In an hour, the position of the pendulum will be within
1 mm of the equilibrium.” This would not give Bob much information as, after an hour, all the
energy will have likely dissipated, no matter how it was initialized. Suppose Alice told Bob, “I
let the weight go within 1 mm of the equilibrium.” This gives Bob a lot more information, as
the range of possible trajectories of the pendulum has greatly decreased. Because the motion
is irreversible, statements at the same level of precision give more information in the past
than in the future.
In terms of information, then, the difference is whether we are asking about how much we
know about the position and momentum at a specific time, or how much we know about the
system in terms of its overall evolution. If the system satisfies DR, the two are the same. If
it doesn’t, we have a problem. Intuitively, we would want the state of the system to always
provide the same amount of information about the system, but under non-deterministic or
non-reversible evolution, this does not work. The amount of information changes in time and,
for a boosted observer, it will also change in space and momentum. This brings to light a
fundamental problem when defining states and systems: it would seem that such definition
can only be given if, at least in some cases, the system can be studied under deterministic
and reversible motion.
While the notion of state starts being problematic, the notion of evolution, however, is
automatically invariant. A damped harmonic oscillator may not satisfy DR, but it does satisfy
KE, which means that each state at a particular time will correspond to a particular trajectory
and, given a region of phase space at a particular time, we will be able to quantify the flow of
evolutions through that region. As time goes on, the evolutions become more concentrated,
and therefore it is more difficult to tell them apart with measurements of the same precision.
This is the indication that the system is not reversible. The proper indicator of irreversibility,
then, is not the number of states within the region, but the number of evolutions over the
region of phase space. This will increase for dissipative systems around the equilibria, and will
remain unchanged over reversible evolution.
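To make the contraction concrete, here is a minimal numerical sketch (the mass, frequency and damping values are arbitrary illustrative choices). For the linear damped oscillator the flow map over a time t is a matrix exponential, so any phase-space area scales by its determinant exp(tr A t) = exp(−γt); setting γ = 0 recovers the area-preserving, reversible case:

    import numpy as np
    from scipy.linalg import expm

    m, omega, gamma = 1.0, 2.0, 0.5
    # d/dt (x, p) = A (x, p) with x' = p/m and p' = -m omega^2 x - gamma p
    A = np.array([[0.0, 1.0 / m],
                  [-m * omega**2, -gamma]])

    for t in [0.0, 1.0, 2.0]:
        Phi = expm(A * t)                                  # flow map over time t
        print(t, np.linalg.det(Phi), np.exp(-gamma * t))   # the two columns agree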
Note that the flow of evolutions through a surface was exactly the geometric interpretation
we found for the principle of stationary action. In the extended phase space, the symplectic
form could be understood as quantifying the flow of the displacement field instead of the count
of states. Again, this seems to be hinting that the true nature of entropy is not the count of
states, but the count of evolutions. We will explore these ideas when applying reverse physics
to thermodynamics and statistical mechanics.

1.11 Reversing Newtonian mechanics


In this section we return to Newtonian mechanics, and see that assumption KE is the only
assumption that characterizes Newtonian mechanics.
Inertia and forces


We have seen that all systems that satisfy assumptions DR, IND and KE are Lagrangian
systems and, therefore, Newtonian given that the acceleration is a function of position and
velocity. We also saw that some dissipative systems, like a particle under friction, do not
satisfy DR though they are Newtonian systems. So now the question is whether IND and KE, by
themselves, are enough to characterize Newtonian mechanics.
The first task is to verify that all Newtonian systems satisfy the assumptions. The second
law F i = mai in inertial frames makes a link between the dynamics, the force, and the kine-
matics, the acceleration. Moreover, the force is specified through kinematic variables, which
makes it clear that the kinematics is enough to reconstruct the dynamics. Therefore assump-
tion KE is satisfied somewhat trivially because Newtonian systems specify both kinematics
and dynamics. Translating the dynamics from forces and masses into energy and conjugate
momentum is complicated by the fact that there is no single way to decompose the total
force into a conservative and non-conservative part. However, we can always set conjugate
momentum equal to kinetic momentum in a Cartesian inertial frame.
As for IND, note that each new variable introduces its own force term, its own velocity
and acceleration. The difficulty here is that, since the dynamics does not satisfy DR, this
independence may not be kept over time. For example, the forces may be dissipative in such
a way that the region with equilibria has lower dimensionality, introducing correlations that
are effectively variable constraints. In other words, we do have a notion of independence at
each time, but it is not the same notion throughout the evolution. Mathematically, if we are
given the map from velocity to conjugate momentum at each time, we would be able to write
a form ωab at each time. Given that the evolution is still differentiable, the Jacobian exists
allowing us to map the count of configurations, and the form, back and forth in time. Since
the evolution does not, in general, satisfy DR, the form is not conserved over time.
Having seen that Newtonian systems satisfy IND and KE, we have to show the converse,
that all systems that satisfy those assumptions are Newtonian systems. We will need all the
insights we gained in the previous section. We saw that the principle of relativity mixed with
assumption IND recovers the structure of phase space. Assumption KE requires that the
kinematics is specified by the dynamics, therefore the state at each time is enough to identify
the trajectory of the system. This means that position and velocity must be invertible functions
of position and momentum. This also means that the count of configurations can be calculated
in terms of regions of position and velocity, which therefore must be well defined over time.
Given that finite regions are mapped to finite regions, the evolution must be differentiable,
and an acceleration must be well defined and must be a function of position and velocity.
Given that position determines the unit system, condition FKE-UNIT is satisfied which is
equivalent to FKE-INER. Therefore we find the existence of locally inertial systems, in which
a system not subjected to forces will travel in uniform linear motion. This recovers the first
law. We already saw that acceleration was fully specified by position and velocity. The mass
is the coefficient that recovers the correct count of states, which will also fix the units for the
force. This recovers the second law.
Note that, in principle, a more complete account of Newtonian mechanics could be given.
This has not been pursued given how Lagrangian and Hamiltonian mechanics play a much
bigger role in field theories and in quantum mechanics.
1.12 Directional degree of freedom


When recovering the structure of phase space, we only relied on the premise that q i define the
units. We are now going to use that premise to define a directional degree of freedom, that is
a degree of freedom that identifies a direction in space. We will see that, since a DOF must
be composed of two variables, the only directional DOFs that are possible are those in three
dimensional space.
Magnetic dipole
Spatial and temporal DOFs are of fundamental importance given that every object must be
located in space and time. We now want to focus our attention on a different type of DOF, one
that captures a direction in space. In general, objects are not completely spherically symmetric
and can be distinguished by their orientation. This is also true for fundamental particles, as
their intrinsic angular momentum, their spin, gives them an orientation that can be detected
and manipulated through magnetic forces.
To study this case, let’s consider a magnetic dipole. The magnetic moment µi can be
written as
\mu^i = \frac{q}{2m} L^i, \qquad (1.200)
where q is the electric charge, m is the mass and Li the angular momentum. The Hamiltonian
for a magnetic dipole subjected to a magnetic field B i is given by
H = -\mu_i B^i = -\frac{q}{2m} L_i B^i. \qquad (1.201)
To write the equations of motion, we need to know what the conjugate variables of Li are or,
equivalently, the form ωab or the Poisson brackets. Typically, one is given the Poisson brackets,
therefore we will start from those. They are

\{L^i, L^j\} = L^k \epsilon^{ijk}. \qquad (1.202)

In Hamiltonian mechanics, we can write the evolution of any quantity f (q i , pi ) as

d_t f = \partial_{q^i} f \, d_t q^i + \partial_{p_i} f \, d_t p_i = \partial_{q^i} f \, \partial_{p_i} H + \partial_{p_i} f \left( -\partial_{q^i} H \right) = \{f, H\}. \qquad (1.203)

Therefore we have
d_t L^i = \{L^i, H\} = \left\{ L^i, -\frac{q}{2m} L^j B^j \right\} = -\frac{q}{2m} B^j \{L^i, L^j\} = -\frac{q}{2m} B^j L^k \epsilon^{ijk} \qquad (1.204)

d_t \vec{L} = -\frac{q}{2m} \vec{B} \times \vec{L}.
2m
Without loss of generality, we can assume that the magnetic field is oriented along the z axis.
We have
d_t L^x = -\frac{q}{2m} B^z L^y \epsilon^{xzy} = \frac{q}{2m} B^z L^y

d_t L^y = -\frac{q}{2m} B^z L^x \epsilon^{yzx} = -\frac{q}{2m} B^z L^x \qquad (1.205)

d_t L^z = -\frac{q}{2m} B^z L^i \epsilon^{zzi} = 0.
If we integrate, assuming L^i = L^i_0 at time t = 0, we have

\omega = -\frac{q}{2m} B^z

L^x(t) = L^x_0 \cos\omega t - L^y_0 \sin\omega t \qquad (1.206)

L^y(t) = L^x_0 \sin\omega t + L^y_0 \cos\omega t

L^z(t) = L^z_0

which gives us the Larmor precession of the magnetic moment. This is clearly a Hamiltonian
system, as it obeys Hamilton's equations, but it seems to have three variables, the components
of Li , instead of an even number. Shouldn’t we have conjugate pairs?
Note that the norm L of the vector is a constant of motion; only the angle changes. We
can then write Li = Lni where ni is a unit vector. The state space can be understood as the
space of all possible vectors with the same magnitude, which is the surface of a 2-sphere. The
size of an area A over the surface of a sphere of radius r can be measured using the solid angle

\Omega = \frac{A}{r^2}, \qquad (1.207)
which is already a unit independent quantity. The infinitesimal solid angle can be written in
terms of the polar angle φ and the azimuthal angle θ

d\Omega = \sin\varphi \, d\varphi \, d\theta. \qquad (1.208)

The two variables are not conjugate, as the area is not simply the product of the differentials.
However, we can write

d\Omega = d(-\cos\varphi) \, d\theta. \qquad (1.209)

The cos φ is simply the component along the z direction, while θ is the angle on the x-y plane.
Changing the orientation of the solid angle, and relabeling for clarity θ = θxy , we have

d\Omega = dn^z \, d\theta^{xy}, \qquad (1.210)

which means that the angle on a plane and the component orthogonal to the plane are
conjugate. If we pick the variable θxy as the coordinate q and Lz as the conjugate p, we have
two variables with the right geometric relationship and with the right units. Therefore

\omega_{\theta^{xy} L_z} = -\omega_{L_z \theta^{xy}} = 1 \qquad\qquad \{\theta^{xy}, L_z\} = -\{L_z, \theta^{xy}\} = 1 \qquad (1.211)
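As a quick sanity check on this conjugate pair (a minimal numerical sketch; the grid size is an arbitrary choice), integrating the area element dnz dθxy over nz ∈ [−1, 1] and θxy ∈ [0, 2π) recovers the same total solid angle, 4π, as the polar-angle element of equation 1.208:

    import numpy as np

    n = 2000
    # Midpoint grids for the two charts of the sphere
    phi = (np.arange(n) + 0.5) * (np.pi / n)       # polar angle in [0, pi]
    n_z = -1.0 + (np.arange(n) + 0.5) * (2.0 / n)  # component n_z in [-1, 1]

    omega_polar = np.sum(np.sin(phi)) * (np.pi / n) * (2 * np.pi)  # integral of sin(phi) dphi dtheta
    omega_conj = np.sum(np.ones(n)) * (2.0 / n) * (2 * np.pi)      # integral of dn_z dtheta
    print(omega_polar, omega_conj, 4 * np.pi)                      # all approximately 4 pi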

We can recover L^i with the following expressions

L^x = \cos\theta^{xy} \sqrt{L^2 - L_z^2}

L^y = \sin\theta^{xy} \sqrt{L^2 - L_z^2} \qquad (1.212)

L^z = L_z.
We can also recover the Poisson brackets for the components


\{L^x, L^y\} = \partial_{\theta^{xy}} L^x \, \partial_{L_z} L^y - \partial_{L_z} L^x \, \partial_{\theta^{xy}} L^y
= -\sin\theta^{xy} \sqrt{L^2 - L_z^2} \left( \sin\theta^{xy} \frac{-L_z}{\sqrt{L^2 - L_z^2}} \right) - \left( \cos\theta^{xy} \frac{-L_z}{\sqrt{L^2 - L_z^2}} \right) \cos\theta^{xy} \sqrt{L^2 - L_z^2} \qquad (1.213)
= \sin^2\theta^{xy} \, L_z + \cos^2\theta^{xy} \, L_z = L_z

\{L^y, L^z\} = \partial_{\theta^{xy}} L^y \, \partial_{L_z} L^z - \partial_{L_z} L^y \, \partial_{\theta^{xy}} L^z = \cos\theta^{xy} \sqrt{L^2 - L_z^2} - 0 = L^x \qquad (1.214)

\{L^z, L^x\} = \partial_{\theta^{xy}} L^z \, \partial_{L_z} L^x - \partial_{L_z} L^z \, \partial_{\theta^{xy}} L^x = 0 - \left( -\sin\theta^{xy} \sqrt{L^2 - L_z^2} \right) = L^y \qquad (1.215)
We can verify that Hamilton’s equations work as before. If the magnetic field is aligned
along the z direction, we have
H = -\frac{q}{2m} L_z B^z

d_t \theta^{xy} = \partial_{L_z} H = -\frac{q}{2m} B^z \qquad (1.216)

d_t L_z = -\partial_{\theta^{xy}} H = 0.
Given initial conditions \theta^{xy}_0 and L_{z,0}, we have

L_z(t) = L_{z,0}

\theta^{xy}(t) = \omega t + \theta^{xy}_0

L^x_0 = \cos(\theta^{xy}_0) \sqrt{L^2 - L_{z,0}^2}

L^y_0 = \sin(\theta^{xy}_0) \sqrt{L^2 - L_{z,0}^2}

L^x(t) = \cos\theta^{xy}(t) \sqrt{L^2 - L_z^2(t)} = \cos(\omega t + \theta^{xy}_0) \sqrt{L^2 - L_{z,0}^2}
= \cos(\omega t)\cos(\theta^{xy}_0)\sqrt{L^2 - L_{z,0}^2} - \sin(\omega t)\sin(\theta^{xy}_0)\sqrt{L^2 - L_{z,0}^2} \qquad (1.217)
= L^x_0 \cos\omega t - L^y_0 \sin\omega t

L^y(t) = \sin\theta^{xy}(t) \sqrt{L^2 - L_z^2(t)} = \sin(\omega t + \theta^{xy}_0) \sqrt{L^2 - L_{z,0}^2}
= \sin(\omega t)\cos(\theta^{xy}_0)\sqrt{L^2 - L_{z,0}^2} + \cos(\omega t)\sin(\theta^{xy}_0)\sqrt{L^2 - L_{z,0}^2}
= L^x_0 \sin\omega t + L^y_0 \cos\omega t
which recovers the precession.
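The equivalence of the two descriptions can also be checked numerically. The following minimal sketch (with arbitrary illustrative values for q, m, B z and the initial angular momentum) evolves the conjugate pair (θxy , Lz ) and verifies that the reconstructed components match the vector solution of equation 1.206:

    import numpy as np

    q, m, Bz, L = 1.0, 1.0, 2.0, 1.0          # illustrative charge, mass, field, |L|
    omega = -q * Bz / (2 * m)                 # precession frequency, eq. 1.216

    L0 = np.array([0.6, 0.0, 0.8])            # initial angular momentum, |L0| = L
    theta0 = np.arctan2(L0[1], L0[0])         # initial angle on the x-y plane

    for t in np.linspace(0.0, 3.0, 7):
        # Evolution of the conjugate pair (theta_xy, L_z)
        theta = omega * t + theta0
        Lz = L0[2]
        Lx = np.cos(theta) * np.sqrt(L**2 - Lz**2)
        Ly = np.sin(theta) * np.sqrt(L**2 - Lz**2)
        # Vector solution of eq. 1.206
        Lx_vec = L0[0] * np.cos(omega * t) - L0[1] * np.sin(omega * t)
        Ly_vec = L0[0] * np.sin(omega * t) + L0[1] * np.cos(omega * t)
        assert np.allclose([Lx, Ly, Lz], [Lx_vec, Ly_vec, L0[2]])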
Let us sum up what we learned. The magnetic dipole is described by a single DOF,
where the conjugate variables are the angle on a plane and the component of the angular
momentum perpendicular to the plane. A constant force does not correspond to a constant
angular acceleration, but to a constant angular velocity. Now the question is how and to what
extent this can all be generalized.
Generalizing to directional quantities


The above treatment works without modification regardless of whether the magnetic dipole
describes a small solenoid, a small rotating charge distribution, or the spin of a single particle.
Spin was originally thought to be due to a rotational motion of a particle, hence the name.
However, this would require, for example, an electron to be spinning at velocities that would
exceed the speed of light, therefore this view is no longer favored. But if spin is not due to a
rotation, why does it have the same properties as angular momentum? Is it a coincidence?
Let us call directional quantity a quantity li defined only by a direction in space, meaning
that the magnitude of the quantity ∣li ∣ = l is fixed, an intrinsic feature of the object, while
the direction can change. A magnetic dipole, generated by either spin or rotating charge, can
be considered a directional quantity. The question now is whether the directional nature is
enough to recover its properties.
If we want to keep track of a direction in space, the angle θxy of the component of the
direction on a plane is a natural variable to use. Therefore θxy is a natural choice for a unit
variable. Because of the structure of phase space, we will have a conjugate quantity Lz , that
we will call directional momentum. Given that a directional quantity is fully determined by
a direction in space, the count of possible configurations is proportional to a solid angle. As
we saw before in the context of a magnetic dipole, this means that the conjugate quantity Lz
must be proportional to the component of the directional quantity perpendicular to the plane
where θxy is defined. Therefore lz = kLz for some constant k.
Recall now that the product of conjugate quantities must always be in units of phase space.
We already noted, in fact, that if a coordinate is dimensionless, then its conjugate must have
dimensions of angular momentum. Therefore directional momentum Lz must have dimension
of an angular momentum, regardless of what the directional quantity li is about. Thus, even
if the directional quantity is not “an amount of rotation” defined by an angular momentum,
we can still write li = kLi , where Li is expressed in units of angular momentum. That is, a
directional momentum Li arises naturally every time we have a directional quantity. In the
case of a magnetic dipole, k = µi /Li = q/2m is the gyromagnetic ratio.
We want to stress that for a directional quantity both variables, the directional momentum
Lz along a given direction z and its angle θxy on the perpendicular plane, are effectively
tracking a single quantity. This is different from what happens in a spatial DOF, where the
coordinate q determines position while the conjugate p determines conjugate momentum, and
therefore the velocity, which are different properties of the system. This fact tells us that
a force acts differently in the two cases. A force for a spatial DOF will, in general, affect
momentum, and therefore velocity, which means it will impart an acceleration on position.
However, for a directional quantity the only thing that can be changed is a direction, therefore
a force can only impart a change in direction, a directional velocity. To sum up, a force over
position will impart a constant spatial acceleration while a force over a direction will impart
a constant directional velocity.
For a rotating rigid body, the situation may seem different. We do have an angle and an
angular velocity, and the angular momentum is the angular velocity times the moment of
inertia, much like kinetic momentum is mass times velocity. Given that angular momentum
is conserved, angular velocity is conserved and a force needs to be used to change the angular
velocity. Therefore, at first glance, the situation seems different from the magnetic dipole.
On the other hand, we are dealing with an angular momentum in both cases, therefore the
situation can’t be that different.


The issue here is that there are two directions: one is the direction of the axis of rotation,
the other is the orientation of the object on the plane of rotation. While the second one
constantly changes, the first may be fixed. Note that the ability to discern the second assumes
the ability to discern parts of the rigid body. If the object is too small with respect to our
resolution, the orientation angle cannot be defined. On the other hand, even if we study the
overall object, we would still be able to discern the direction and magnitude of the angular
momentum by performing a rotation. In fact, if the rotation is performed along a direction
different from the direction of the angular momentum, a torque will need to be applied to
perform the rotation. The greater the angular momentum, the greater the torque. We are now
in a case similar to the magnetic dipole, where a constant force is needed to impart a constant
velocity.
We have seen that if we want to describe a directional quantity, and only a directional
quantity, with state variables, there is only one way to do it, and it will lead to a notion
of directional momentum. Spin and angular momentum are, in our definition, two types of
directional momentum. Is this the only way to do it? That is, why does a directional DOF
only include the directional quantity, without the directional velocity?
Suppose we want to describe a particle, in the sense of an infinitesimally small part of an
object, which is not only characterized by a position, but also by a directional quantity. If
we pick a specific time t, the position and direction at that particular time are independent
variables, in all senses: we can choose units independently, we can choose from all possible
configurations of each variable and a choice for one variable does not constrain the choice for
another variable. It is true that we will need to relate the direction identified by the directional
variable to the reference frame used by the position coordinates. Yet, we are not forced to
choose a particular plane for the directional angle θ.
The situation between velocity and the directional velocity, however, is different. Contrast
the following two cases. In the first, we are in an inertial frame with a particle at rest,
whose directional quantity changes with a constant directional velocity. In the second, we
are in a rotating frame with a particle at the center, whose directional quantity changes
with a constant directional velocity. If we look only at the directional quantity, these two
cases are indistinguishable: we need to know which frame we are in. But knowing the frame,
ultimately, means knowing the definition of our spatial variables. That is, there is a definitional
dependency between the spatial variables and the directional variables, which are, in the end,
defined in space. We are free to choose the relationship between the variables at a specific
time. But to relate the directional quantity at different times, we must know how the spatial
coordinates change in time. Therefore, the directional quantity is an independent quantity
with respect to the spatial DOFs, but the directional velocity is not. We reach the following:

Insight 1.218. An independent directional DOF must be restricted to a directional quantity.

We concluded that an independent directional DOF only includes a directional quantity,
so the structure we have for the magnetic dipole is an instance of a general structure. However,
note how the directional quantity fits exactly in one DOF. Is this necessary? That is, could we
have a directional quantity that fits in more than one pair of conjugate quantities? In other
words, is it possible to have directional quantities that break up into multiple independent
DOFs?
As we said before, identifying a direction in three dimensions is equivalent to identifying a
point on a 2-sphere. If we had an n-dimensional space, a direction would correspond to a point
on an (n − 1)-sphere. That sphere would have to allow a description in terms of conjugate
variables with a suitable ωab . It is a result of symplectic geometry that the 2-sphere is the
only sphere to allow that structure. Therefore

Insight 1.219. An independent directional DOF can only exist in a three dimensional space.

In other words, three dimensional space is special as it is the only space that allows us
to talk about directions as forming an independent degree of freedom. This is yet another
example of how simple unassuming premises have significant consequences.

1.13 Infinitesimal reducibility


We have seen that the structure of phase space (i.e. the symplectic structure) is exactly the
structure needed to describe density, count of states and entropy in a coordinate invariant
way. This implicitly assumes that states are points on a differentiable manifold, that is they
are fully identified by real variables that only allow differentiable changes of variables. We will
see that this corresponds to the assumption of infinitesimal reducibility.

Particles as infinitesimal parts


At this point, we have ample evidence that the correct fundamental object that describes
classical systems is not a point on phase space. All physical objects have finite spatial extent
and assuming otherwise leads to problems. A finite mass concentrated at a point is incompat-
ible with general relativity, as it would puncture space-time and not travel along geodesics,42
and with quantum mechanics, as it would imply uniform spread in momentum thus requir-
ing infinite kinetic energy.43 The correct physical object, instead, is a distribution over phase
space. It explains the very structure of phase space (i.e. the distribution has to be frame
invariant), it explains why the laws of motion are differentiable (i.e. they transport a density,
not just points) and why they follow Hamiltonian mechanics for isolated, i.e. deterministic
and reversible, systems. We can therefore conclude that

The state of a classical system is given by a distribution over phase
space. (IR-DIST)

As usual in reverse physics, we want to find equivalent physical assumptions for the same
statement.
While point particles should not be considered fundamental classical objects, the whole
of classical mechanics assumes them and uses them. Therefore we should understand exactly
what they represent.
One way to understand point particles is as an approximation. When we calculate the
trajectory of the earth around the sun, the motion of a cannonball or the average displacement
of a molecule under Brownian motion, we know that the object is not actually point-like. But
42. Mathematically, it would mean that all mass-energy would be concentrated in points of infinite curvature,
leaving the rest of space-time flat with no mass.
43. Mathematically, the wave-function would be a δ-function, which is not even an element of the Hilbert
space L2 .
since the body is rigid, and assuming that the effect of external forces on the different parts of
the body can be neglected, we can get away with studying the motion of the center of mass.
Clearly, we need to check that the assumption holds during the whole motion: if the distance
between a comet and the sun becomes smaller than the radius of the sun, the sun cannot be
assumed to be point-like anymore. The approximation will also fail if the force exerted on
different parts of a body is greater than the force that keeps it together.
The point particle approximation is not unique to classical mechanics. Quantum mechan-
ics, through the Ehrenfest theorem, also allows for a point particle approximation, which
will fail if the wave-function is spread over a region where the potential changes are non-
negligible. Since the approximation is not unique to classical physics, this characterization
of a point particle does not help us understand the fundamental constitutive assumptions of
classical mechanics.
When discussing the nature of Hamiltonian evolution, however, we saw that states were
better understood as infinitesimal regions of phase space. This means that classical particles,
at least this more fundamental version of them, can’t be understood as a standalone physical
object: they should really be thought of as the limit of some recursive subdivision. That is,
classical particles should really be thought of as an infinitesimal part of something bigger, and
the ‘part’ in ‘particle’ should be understood literally.
The actual classical object, then, is not the infinitesimal part, but the whole object that
can be, in principle, divided into infinitesimal parts. In the same way that, when discussing
differential topology, we had quantities over finite regions that could be broken into the in-
finitesimal contributions, the whole object, together with its notion of state, can be understood
as made up of infinitesimal parts.

A classical system can be thought of as being made of infinitesimal
parts, called particles. (IR-INF)

All classical systems follow the assumption, not just mechanical systems. Continuum mechan-
ics and fluid dynamics assume that the materials are a continuum of infinitesimally small
parts; classical electromagnetism assumes that the electromagnetic radiation can be decom-
posed into arbitrarily small signals at all frequencies. Moreover, the assumption clearly fails
in quantum mechanics. In that case, we cannot talk about the state of a part of an electron;
materials are not a continuum of infinitesimally small parts; the intensity of electromagnetic
radiation cannot be made arbitrarily small.

Divisible vs reducible
Before going forward, we need to be clear as to what “being made of” actually means. For
example, if we say that a table is made of a horizontal top and four legs, we may mean that we
can take the table apart and study its components independently, or that we can describe the
whole table by describing the top and the legs. While often one can do both, these are actually
different properties. That is, divisibility, the ability to “cut” an object into independent parts,
is not the same as reducibility, the ability to describe an object in terms of parts. Let us go
through some examples.
Suppose we have a planarian worm and we divide it in half. After some time, the tail will
regrow a head, and the head will regrow a tail. We will be left with two worms. A planarian
worm is divisible into two worms, in the sense that we have a process by which a worm is
divided into two worms, but it is not reducible to two worms, in the sense that describing one
worm is not the same as describing two worms.
Now, suppose we have a magnet. We can describe it by describing its north and south
pole. Now suppose we divide it in half. Each half will be a new magnet with its own north
and south pole. A magnet is reducible to a north and a south pole, meaning that describing
a magnet is the same as describing its poles, but it is not divisible into a north and a south
pole, meaning that we do not have a process that can separate the two.
We can also find differences between divisibility and reducibility with fundamental parti-
cles. Suppose we have a muon. If we wait some time, it will decay into an electron, a neutrino
and an antineutrino. A muon divides itself into the three particles, but it is not reducible to
the three particles: the state of a muon is not equivalent to the state of an electron and two
neutrinos.
Suppose we have a proton. This is not a fundamental particle and it is described in terms
of quarks and gluons. However, if we try to separate one of its quarks, the interaction energy
will create new quark-antiquark pairs and the proton will divide into multiple hadrons. A
proton is reducible to quarks and gluons, but it is not divisible into them.
Divisibility here means that we have a physical process that starts with the overall system
and ends with parts that can now be independently manipulated. Mathematically, if SC is the
state space of the overall system and SA and SB are the state spaces of the parts, we have a
time evolution map Ut ∶ SC → SA × SB that takes an initial state of the composite and returns
a state for each part. This is not what we want.
Reducibility means that describing the whole is the same as describing the parts. Mathe-
matically, the state space of the whole system SC = SA × SB is exactly the Cartesian product
of the parts. In other words, what we are taking apart is not really the object itself, but its
state. This is the type of partitioning we are going to use.
To make reducibility more concrete and intuitive, suppose we have a ball. We can throw
the ball, study the motion of the whole ball. We can also take a red marker, make a red dot on
the ball, and study the motion of the red dot. We say the ball is reducible because studying
the motion of the whole ball is equivalent to studying the motion of all possible red dots.
It is infinitesimally reducible if we assume that we can make the red dot arbitrarily small.
This is the property we are interested in. Conversely, suppose we have an electron. We can
study the motion of the electron, but we cannot make a red dot on the electron. We have no
process at our disposal that can tag part of an electron so that we interact with and study
only that part. Whenever we interact, we interact with the whole electron. This means that
the assumption does not hold in that case.
Condition IR-INF, then, is a property of classical systems and only classical systems,
and it is clear that IR-DIST implies IR-INF. But is the converse true? That is, if something
can be thought of as being made of infinitesimal parts, does it follow that the state of those
infinitesimal parts should be represented by points in phase space? In other words, is condition
IR-INF enough to characterize classical systems?

Classical systems and infinitesimal reducibility


Suppose we have a system that satisfies condition IR-INF. Under this premise, we have two
state spaces: SC for the full system and SP for the infinitesimal parts, the particles. Each state
for the full system must tell us exactly how much of the system is in each region of SP . That
is, for each state s ∈ SC we have an associated real valued set function44 f (U ) ∈ [0, 1] that for
every region U ⊆ SP returns the fraction of the system in that region of particle state space.45
Moreover, reducibility implies f to be additive, meaning that if U is the disjoint union of
two regions U1 and U2 , then f (U ) = f (U1 ) + f (U2 ). Because we are assuming infinitesimal
reducibility, this must hold not just in the finite case, but also in the countable case. Mathe-
matically, f is a bounded measure. Since the state of the whole system is fully identified by
the state of the parts, for each state s ∈ SC there is one and only one f . We can understand
SC as the space of such functions.
Depending on whether we are considering a specific instance of a particular system or
an ensemble of similarly prepared systems, the value of the function f will have a different
physical meaning. It could be understood as the fraction of a single system that has a particular
property (i.e. half of the ball is to the right of the line), or it could be understood as the
probability that a particular instance of an ensemble has a particular property (i.e. half the
time the ball is to the right of the line). From now on, we will assume we are talking about
an actual system, but all that we say will apply to the ensemble case as well.
The fact that f (U ) is real valued is also implied by condition IR-INF since the parts
are infinitesimal and therefore there can’t be a smallest increment.46 It also implies that the
fraction f ({x}) associated to a single particle state x ∈ SP must be zero. Otherwise we would
be associating a finite non-zero fraction to an infinitesimal part x, which would make it not
infinitesimal. The particle state space SP of an infinitesimally reducible system, then, must
be charted by continuous variables.47
So far, we have found that, under condition IR-INF, the state space SC of a system is the
space of all possible functions f (U ) ∈ [0, 1] where U ⊆ SP is a subset of all the possible particle
states. Furthermore, the particle state space SP must be charted by continuous variables, it
must be a manifold. We now need to recover the notion of differentiability and of invariant
density.
To be able to fully characterize the state of the whole system by the state of the parts,
we need to be able to quantify the fraction of the system for each particle state. We saw
that the fraction for each particle state is zero, which makes sense because particle states are
limits. Therefore the fraction associated to a particle state should be a limit as well: it should
be a fraction density, similar to mass or charge density at a point. A fraction density will
quantify the fraction present over a region of particle states, meaning that it is expressed as
a fraction over count of particle states. We therefore need to be able to quantify how many
states there are in each region of SP ; we must be able to say which regions have equal number
of states. Over a discrete space this would be trivial, we would just count the number of
points, but over the continuum finite ranges have infinitely many points and counting points
44. A set function is a function that takes sets as an argument.
45. Technically, the region U must be a Borel set, as these are the regions that are associated with an
experimental procedure. These details are established in the physical mathematics part of the book.
46. Technically, this only limits the output of f (U ) to be a bounded dense linear order. However, the function
f must be closed under arbitrary countable addition, bringing in all the limits, meaning that the ordering must
be complete. Lastly, the whole system can always be divided into countably many pieces, which tells us that
the order must have a countable dense subset. Since the order is bounded, dense, complete and has a countable
dense subset, it is order isomorphic to [0, 1] ⊂ R.
47. Again, technically we only found that the space must be, in some sense, dense. To show that it is “dense
like the reals”, we need the link between topology and experimental verifiability that we establish in physical
mathematics.
doesn’t work. We must have another set function µ(U ), which will also be countably additive
as disjoint regions will identify entirely different states, making µ another measure. However,
µ will be bounded only from below (i.e. no set can have fewer than zero states) as SP can
have potentially infinitely many possible states.
Note that, because of the infinitesimal reducibility assumption, we will be able to find
smaller and smaller subsets of SP . The value of µ will keep decreasing and, in the limit of
infinitesimal subdivision, we will have µ({x}) = 0. That is, a single particle state counts as
zero in terms of number of states. In a way, the math is already telling us that particle states
do not exist literally and that the assumption of infinitesimal reducibility shouldn’t be taken
literally. It is a simplifying assumption and should be understood as such. In the same vein, as
we can define distributions over phase space that have arbitrarily narrow spread, the entropy
of those distributions can be an arbitrarily low number, which will tend to −∞ in the limit
of a δ-distribution. This is clearly a problem, since thermodynamics requires the entropy to
be non-negative. These two problems are in fact the same problem, since the entropy for a
uniform distribution is the logarithm of the count of states. In other words, we shouldn’t
be surprised that classical mechanics fails for small objects (i.e. narrow distributions), or for
ensembles with low entropy: it is exactly when condition IR-INF fails.
Continuing our derivation, we have seen that both the measure f for the fraction of the
system and the measure µ for the count of particle states are zero for sets with single points.
More in general, we must have that whenever µ(U ) = 0, then f (U ) = 0. That is, we cannot
assign a finite fraction to a set of states that has no finite state count. That would, again, mean
that we have a finite fraction associated to an infinitesimal subdivision, which is physically
untenable. Mathematically, this means that f is absolutely continuous with respect to µ, and
therefore we can define a density ρ = df /dµ such that f (U ) = ∫U ρ dµ.48 That is, the state of the
whole system can be described by a density over the particle states.
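As a small numerical illustration (our own sketch; the Gaussian density and the region are arbitrary choices), once such a density ρ is given, the fraction f (U ) is just its integral over the region U :

    import numpy as np

    sigma_q, sigma_p = 1.0, 0.5
    q, p = np.meshgrid(np.linspace(-6, 6, 801), np.linspace(-6, 6, 801))
    rho = np.exp(-q**2 / (2 * sigma_q**2) - p**2 / (2 * sigma_p**2)) / (2 * np.pi * sigma_q * sigma_p)
    dA = (q[0, 1] - q[0, 0]) * (p[1, 0] - p[0, 0])  # area of a grid cell

    U = q > 0                    # region U: the right half of particle state space
    print(np.sum(rho * dA))      # fraction over the whole space: approximately 1
    print(np.sum(rho[U] * dA))   # f(U): approximately 0.5 by symmetry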
As we said, SP is a manifold. The measure µ is a feature of that manifold, in the sense that
it depends only on the properties of the particle states, and it will be the same for all states of
the whole system. Of all possible state variables that we can use to chart the manifold, then,
it is convenient to choose those that can express µ as a density of states over units of the state
variables. Mathematically, we are asking that the Lebesgue measure induced by the variable
ξ ∶ SP → R is absolutely continuous with respect to µ. This will also require that the variables
are differentiable with respect to each other: it will lead to the differentiable structure. That
is
Insight 1.220. The differentiable structure of a state space is exactly the ability to express
state count and fractions as densities over said states.
Lastly, both f and µ must not depend on the choice of state variables. Therefore the
proper expression of ρ = df /dµ must also not depend on the choice of state variables. This, plus
assumption IND, as we have seen, requires the existence of the form ω, and recovers phase
space, the symplectic manifold. Therefore we have recovered IR-DIST.
Condition IR-DIST is therefore equivalent to condition IR-INF. Classical mechanics de-
scribes exactly those systems that follow
Assumption IR (Infinitesimal Reducibility). The state of the system is reducible to the
state of its infinitesimal parts. That is, specifying the state of the whole system is equivalent
48. Mathematically, ρ is the Radon-Nikodym derivative.
to specifying the state of its parts, which in turn is equivalent to specifying the state of its
subparts and so on.

Assumption IR is, therefore, the constitutive assumption of classical mechanics. Assumptions
IR, IND and DR are the constitutive assumptions of the Hamiltonian formulation.
Assumptions IR, IND and KE are the constitutive assumptions of Newtonian mechanics. As-
sumptions IR, IND, DR and KE are the constitutive assumptions of Lagrangian mechanics.
We have found all the constitutive assumptions of all formulations of classical mechanics.
We now are guaranteed that every result of classical mechanics is, one way or the other,
explained by those assumptions. These are the only physical ideas that are strictly required
to understand all the general aspects of classical mechanics. There is nothing else. Therefore
the goal of reverse physics for classical mechanics is reached.

Classical uncertainty principle


Before concluding, let us turn again to the problem of zero measure on individual particle
states. As we said, a distribution over phase space can be made arbitrarily narrow and, in the
limit of a δ-function, the support will be a single point. Since the entropy is the logarithm of
the phase-space volume, which is zero for a single state, the entropy of a δ-function is minus
infinity.
Negative entropy is not compatible with the third law of thermodynamics, which states
that the entropy must be non-negative. Therefore, on thermodynamics grounds, the spread of
a distribution cannot be made infinitesimal. This means that any distribution that is physically
meaningful must have a finite spread in position and momentum. Conceptually, this sounds
very close to the uncertainty principle of quantum mechanics, so it’s worth looking at it more
closely.
First of all, since we saw that units have an important role, consider the expression for ther-
modynamic entropy S = kB log W where kB is the Boltzmann constant and W is a volume of
phase space. Given that a logarithm must take pure numbers, the expression is dimensionally
incorrect. To correct it, we should take
S = k_B \log \frac{W}{W_0} \qquad (1.221)
where W0 can be understood as the volume for which the thermodynamic entropy is zero. As
we saw, units of momentum are units of configurations, of areas in phase space, over units of
position. Therefore we can choose units of momentum such that W0 is, numerically, one, but
not dimensionally. To keep track of the value and units of W0 separately, then, let’s write

W_0 = w_0 \, 1_{qp} \qquad (1.222)

where w0 is the pure numeric value and 1qp is the product of the units of q and p. If we
substitute in the above expression, we find
S = k_B \log \frac{W}{w_0 1_{qp}} = k_B \log \frac{W}{1_{qp}} - k_B \log w_0 = k_B \log W - k_B \log w_0 \qquad (1.223)

where log W is the logarithm of the numeric value of W . This clearly shows that changing w0 ,
the numeric value of W0 , changes the zero for the entropy.
To make the Gibbs-Shannon entropy consistent with the previous definition, we must set

I[\rho] = -k_B \int \rho \log(W_0 \rho) \, dq \, dp. \qquad (1.224)

If we take a uniform distribution ρU over a region U with volume W , we find


I[\rho_U] = -k_B \int \rho_U \log(W_0 \rho_U) \, dq \, dp = -k_B \int_U \frac{1}{W} \log \frac{W_0}{W} \, dq \, dp = k_B \log \frac{W}{W_0} \, \frac{\int_U dq \, dp}{W} = k_B \log \frac{W}{W_0} \qquad (1.225)
Similarly to before, we find

I[\rho] = -k_B \int \rho \log(W_0 \rho) \, dq \, dp = -k_B \int \rho \log(\rho \, 1_{qp}) \, dq \, dp - k_B \int \rho \log w_0 \, dq \, dp
= -k_B \int \rho \log(\rho \, 1_{qp}) \, dq \, dp - k_B \log w_0 \qquad (1.226)
= -k_B \int \rho \log \rho \, dq \, dp - k_B \log w_0

which again shows that w0 changes the zero for entropy.


We are now ready to study the relationship between spreads in classical phase space and
entropy. We want to find the distribution ρ with zero entropy that minimizes the uncertainty.
To do that, we set up a minimization problem using Lagrange multipliers. We want to minimize
the product of the variances σq2 σp2 ≡ ∫ (q − µq )2 ρ dqdp ∫ (p − µp )2 ρ dqdp, where µq and µp are
the mean position and momentum, under two constraints: ρ integrates to 1 and its entropy is
zero. We have:

\begin{aligned}
L ={} & \int (q-\mu_q)^2 \rho\, dqdp \int (p-\mu_p)^2 \rho\, dqdp \\
& + \lambda_1 \left( \int \rho\, dqdp - 1 \right) + \lambda_2' \left( -k_B \int \rho \ln(W_0 \rho)\, dqdp - 0 \right) \\
={} & \int (q-\mu_q)^2 \rho\, dqdp \int (p-\mu_p)^2 \rho\, dqdp \\
& + \lambda_1 \left( \int \rho\, dqdp - 1 \right) + \lambda_2 \left( -\int \rho \ln \rho\, dqdp - \ln w_0 \right) \\
\delta L ={} & \int \delta\rho \left[ (q-\mu_q)^2 \sigma_p^2 + \sigma_q^2 (p-\mu_p)^2 + \lambda_1 - \lambda_2 \ln \rho - \lambda_2 \right] dqdp = 0 \\
\lambda_2 \ln \rho ={} & \lambda_1 - \lambda_2 + (q-\mu_q)^2 \sigma_p^2 + \sigma_q^2 (p-\mu_p)^2 \\
\rho ={} & e^{\frac{\lambda_1 - \lambda_2}{\lambda_2}}\, e^{\frac{(q-\mu_q)^2 \sigma_p^2}{\lambda_2}}\, e^{\frac{\sigma_q^2 (p-\mu_p)^2}{\lambda_2}}
\end{aligned}

We recognize the distribution as the product of two independent Gaussians. We can therefore
use the standard expression and calculate the entropy, which must be zero. We find:
\begin{aligned}
\rho &= \frac{1}{2\pi \sigma_q \sigma_p}\, e^{-\frac{(q-\mu_q)^2}{2\sigma_q^2}}\, e^{-\frac{(p-\mu_p)^2}{2\sigma_p^2}} \\
I[\rho] &= k_B \ln \left( 2\pi e\, \frac{\sigma_q \sigma_p}{W_0} \right) = 0 = k_B \ln 1 \\
\sigma_q \sigma_p &= \frac{W_0}{2\pi e}
\end{aligned}

We have found that any distribution ρ(q, p) with non-negative entropy (i.e. that satisfies the
third law of thermodynamics) satisfies the inequality

\sigma_q \sigma_p \geq \frac{W_0}{2\pi e}. \qquad (1.227)
Moreover, the inequality is saturated (i.e. becomes an equality) for Gaussian distributions.
Compare this with the Heisenberg uncertainty principle of quantum mechanics, which tells us that every state satisfies the inequality

\sigma_q \sigma_p \geq \frac{\hbar}{2}, \qquad (1.228)
which is saturated by Gaussian states. Given the parallel, we call equation 1.227 the classical
uncertainty principle. Note that the classical uncertainty principle has a clear physical expla-
nation: the third law of thermodynamics. Given that all quantum states have non-negative
entropy, the same explanation can be taken for quantum mechanics as well.
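
Again as a numerical illustration (our own sketch, with arbitrarily chosen values for kB, W0 and σq), we can check that a Gaussian saturating equation 1.227 has zero entropy:

```python
# A sketch checking that a Gaussian with sigma_q * sigma_p = W0 / (2*pi*e)
# has I[rho] = 0, saturating the classical uncertainty principle (1.227).
import numpy as np

kB, W0 = 1.0, 1.0                          # assumed values
sq = 0.3                                   # pick sigma_q freely...
sp = W0 / (2 * np.pi * np.e * sq)          # ...and choose sigma_p to saturate (1.227)

q = np.linspace(-8 * sq, 8 * sq, 1501)
p = np.linspace(-8 * sp, 8 * sp, 1501)
Q, P = np.meshgrid(q, p, indexing="ij")
rho = np.exp(-Q**2 / (2 * sq**2) - P**2 / (2 * sp**2)) / (2 * np.pi * sq * sp)

dq, dp = q[1] - q[0], p[1] - p[0]
I = -kB * np.sum(rho * np.log(W0 * rho)) * dq * dp
print(I)   # approximately 0: I = kB*ln(2*pi*e*sq*sp/W0) vanishes at saturation
```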
This result also tells us the problem with Maxwell’s demon. In this setup, a gas is sepa-
rated into two chambers with a door. A demon that sees the exact position and momentum
of all particles can open the door letting a particle go only from one specific side to the other,
decreasing the entropy of the system. But knowing the exact position of all particles corre-
sponds to having access to a state of minus infinite entropy, which is forbidden by the third
law. Naturally, if we have access to a source of minus infinite entropy, we can always decrease
the entropy of any system by a finite amount without violating the second law: given that
minus infinity minus a finite amount is equal to minus infinity, the entropy is technically not
decreasing.
This final result should really make it clear that the foundations of statistical mechanics
and thermodynamics are not separate from the foundations of classical mechanics and quan-
tum mechanics. In any theory, states are better understood as ensembles, and pure states
should be understood as the most precise ensemble that can be prepared in the theory.

1.14 Summary
Let’s step back and sum up all that we learned by applying the reverse physics approach
to classical mechanics. If we start from scratch, the first assumption that we need to set is
Infinitesimal Reducibility IR. This tells us that the domain of classical physics is those objects
that can be thought of as being made of arbitrarily small parts. For these objects, specifying
the state of the whole system is equivalent to specifying the state of all the infinitesimal parts.
In this context, a particle is an infinitesimal part, the limit of recursive reduction of parts into
smaller parts. Mathematically, the assumption tells us that the state space of the particles is
a differentiable manifold, together with a volume measure that defines the count of states for
each region. The state of a classical object is a distribution over such a manifold.
We then add assumption IND that tells us that the system is decomposable into indepen-
dent degrees of freedom. This means that not only are we able to count states over regions
of state space, but we must be able to count configurations along each DOF. Moreover, the
count of configurations over multiple DOFs must be the product of the count over each DOF.
The only way to obtain this structure, so that the count is defined independently of the units
used to describe states, is for the manifold to be even dimensional, and for any state variable
[Figure 1.9 diagram: nested regions of the constitutive assumptions — IR: classical systems, containing IND: classical phase space, which in turn contains KE: Newtonian systems and DR: Hamiltonian systems, whose overlap KE+DR gives Lagrangian systems.]

Figure 1.9: Relationship between the constitutive assumptions and the different formulations of classical mechanics. All classical systems satisfy infinitesimal reducibility. The independence of DOF recovers classical phase space. An ideal gas, with state variables [P V T], is an example of a classical system that does not satisfy IND. Another example is a system with non-holonomic constraints. Newtonian systems are fully characterized by spatial trajectories. Hamiltonian systems are characterized by deterministic and reversible dynamics. Lagrangian systems are characterized by both.

q^i that defines an independent unit, there is a conjugate state variable p_i whose units are counts of states over units of the corresponding q^i. This recovers the structure of classical
phase space. Mathematically, this gives us the structure of a symplectic manifold, where the
symplectic form ωab counts the configurations over the infinitesimal parallelogram defined by
two vectors.
If we add assumption DR that tells us that the evolution is deterministic and reversible,
we recover Hamiltonian mechanics. For a system to be deterministic and reversible, in fact, we
must map not only each initial state to one and only one final state, but we must map initial to
final regions while preserving the state count. Moreover, the notion of independence must be
preserved, and therefore the way configurations are counted over independent DOFs must also
be preserved. Under these conditions, the displacement vector field, which tells us how states
move in phase space, admits a potential H, which corresponds to the Hamiltonian. This can
be generalized to the time-dependent case, which recovers relativistic features even without
the notion of a metric tensor. Mathematically, the form ωab must be preserved, meaning
deterministic and reversible evolution is a symplectomorphism. Each symplectomorphism can
be characterized by a function H, the Hamiltonian.
If, instead, we add assumption KE that tells us that the dynamics is recoverable from the
kinematics, we recover Newtonian mechanics. If the dynamics is recoverable from the kinematics, the momentum must be a function of position and velocity. The units of position are the
only independent units, and therefore define the units of momentum and also the transforma-
tion between densities over dynamic and kinematic variables. This forces the transformation
between momentum and velocity to be linear. If no forces are present, we can find coordinates
such that the linear transformation is simply a multiplication by a constant. That is, pi = mv i .
These are the inertial frames in Cartesian coordinates and m is the inertial mass. In this case,
assumption DR applies as well, and, since momentum is constant, the velocity is constant. If
a force is present, this can be expressed in terms of position and velocity, recovering F = ma.
If we take both assumptions DR and KE, Lagrangian mechanics is recovered. The action is the line integral over the potential θ_a of the form ω_{ab} = −(dθ)_{ab}, and since the vector potential
is unphysical, so is the action. The variation of the action is physical as, by Stokes’ theorem,
it will give the surface integral of ωab , which is zero if and only if the path is always tangent to
the direction of motion, which happens only if the path is an actual evolution of the system.
All the elements of classical mechanics, both physically and mathematically, can be un-
derstood just in terms of these assumptions and the concepts that they require.
Part II

Physical Mathematics


Physical mathematics is an approach to the mathematical foundations of physics that seeks to construct mathematical structures strictly from axioms and definitions that can
be rigorously justified from physical requirements. This is in contrast to current approaches
typically followed in mathematical physics, which take tools developed within mathematics
and apply them to physics or physics-inspired problems.
If our goal is to fully rederive physical theories from physical assumptions, we need to
have a precise mapping between physical objects and mathematical ones. Understanding the
axioms and definitions of the mathematical tools used in a physical theory, then, is not
just “mathematical detail” of no concern to the physicist, but rather the precise stipulation
of properties that certain physical objects must have under suitable, possibly simplifying,
assumptions. In this sense, there is no “correct” structure in a mathematical sense, because
the correct structure is the one suited to the physical problem at hand.
It should be clear that mathematicians are generally ill-equipped to determine whether
mathematical structures are physically significant. As David Hilbert stated, “Mathematics is
a game played according to certain simple rules with meaningless marks on paper.” Regarding
mathematical axioms, Bertrand Russell claimed, “It is essential not to discuss whether the first
proposition is really true, and not to mention what the anything is, of which it is supposed
to be true.” Mathematics knows the rules of everything but the meaning of nothing. It is
therefore unreasonable to expect that the foundations of mathematics, by themselves, can
provide any foundation for physics.
In the same way that elaborate correct mathematical theories stem from minimal correct mathematical theories (not from elaborate incorrect ones), and large living creatures grow from small living creatures (not from large dead ones), sophisticated physically meaningful theories come from simple physically meaningful theories (and not from sophisticated meaningless ones). Meaningfulness, like correctness or aliveness, is not something
that can be imposed after the fact. Therefore the only way to develop physically meaningful
mathematical structures is to develop them from scratch: we cannot simply take higher level
mathematical objects and “sprinkle meaning”, an interpretation, on top.
The goal of physical mathematics, then, is to find how to turn physical assumptions into
precise mathematical requirements, such that we are guaranteed to know what exactly each
mathematical object represents and under which physical conditions.

A new standard for scientific rigor


From the above discussion, it follows that the standard of rigor mathematicians have developed
for their field is not sufficient for the purpose of physical mathematics. Mathematics only deals
with formal systems, whose starting points are a set of definitions and rules that are taken as is.
At that point, correctness of the premise cannot be established, only self-consistency. Therefore
mathematics fails to deal with the most delicate and interesting parts of the foundations of
physics: the physical assumptions and how they are encoded into the formal framework. We
therefore need rules and standards for rigorously handling the informal parts of the framework
and, since there are no guidelines for this, we set our own standard.
We call an axiom a proposition that brings new objects or new properties of established
objects within the formal framework. A definition, instead, is a proposition that further
characterizes objects and properties already present in the formalism. An axiom or a def-
inition is well posed only when it is clear what the objects represent physically
and what aspects are captured mathematically. Therefore each axiom and definition
is composed of two parts. The first characterizes the objects and properties within the informal system, and tells us what they represent physically. The second part, typically preceded
by “Formally”, characterizes the part that is captured by the formal system. Axioms and
definitions are followed by a justification when it is necessary to explain why the elements
in the informal system must be mapped into the formal system in the way proposed. Some
definitions are purely formal and as such do not require justifications. As this argument spans
both the formal and informal systems, this cannot be a mathematical proof in the modern
sense. In particular, the justification for an axiom must argue why those objects must exist.
The above standard makes sure we have a perfect identification between formal and in-
formal objects. All mathematical symbols correspond to physical objects and all the relevant
physical concepts are captured by the math. All subsequent propositions and proofs, then,
can be carried out in the formal system, where it is easier to check for consistency and cor-
rectness. However, all the proofs can, if needed, be translated into the informal language and
given physical meaning.
Chapter 1

Verifiable statements and experimental domains

In this chapter we lay the foundations for our general mathematical theory of experimental
science: a formalism that is broad enough to be applied to any area of scientific investigation.
It is based on the idea of verifiable statements, assertions that can be experimentally shown
to be true. Whether it is physics or chemistry, economics or psychology, medicine or biology,
the goal is always to find some objective truth about the natural world that is supported by
experimental data.
We group verifiable statements into experimental domains which represent the list
of all possible verifiable answers to a particular scientific question. From those, we define
theoretical domains which add those statements that, though not directly verifiable, are
associated to an experimental test with no guarantee of termination. Within each theoretical
domain, we find those particular statements that, if true, imply the truthfulness or falsehood
of all other statements. We call these the possibilities of the domain as they identify the
complete descriptions that are admissible. To answer a scientific question, then, is to find
which possibility is consistent with experimental data: the one that correctly predicts the
result of all experimental tests.
We’ll see how the above organization always exists on any given set of verifiable state-
ments. That is, it is a fundamental structure for all sciences. We’ll also see that these concepts
are deeply intertwined with fundamental mathematical tools: experimental domains map to
topologies while theoretical domains map to σ-algebras. These two core mathematical struc-
tures provide the foundation for differential geometry, Lie algebras, measure theory, proba-
bility theory and other mathematical branches that are heavily used in physics and other
sciences.
As a consequence of this connection, we can build a more precise, intuitive and insightful
understanding of what these mathematical structures are meant to represent in the scientific
world. It also reveals why these mathematical tools are so pervasive and successful in science.

1.1 Statements
Statements, like “the mass of the electron is 511 ± 0.5 keV” or “that animal is a cat”, will be
the cornerstone of our general mathematical theory of experimental science. In this section we
will outline the basic definitions that allow us to combine statements into other statements


(e.g. “that animal is black” and “that animal is a cat” gives “that animal is a black cat”) and
to compare their content (e.g. “the mass of the electron is 511 ± 0.5 keV” and “the mass of
the electron is 0.511 ± 0.0005 MeV” are equivalent).
We will start from a somewhat different point than is customary in order
to address a few issues. The first is that we need to develop a formal framework to handle the
relationship of statements that are not themselves formally defined.1 The second issue is that
the truth values in our context are in general found experimentally, not through deduction.
The role of the logic framework is to keep track of the consistency between the possible
hypothetical cases. That is, we need relationships that capture the idea of causal relationship,
that one is true because the other is true, and not merely by coincidence.2 We remind the
reader that the mathematical sections, highlighted with a green bar on the side, can be skipped
without loss of conceptual understanding in case one is not interested in all the details.
As a starting point, we need to define what science is: the systematic study of the physical
world through observation and experimentation. We therefore introduce the principle of scien-
tific objectivity that will guide us throughout this work. This states that science is universal,
non-contradictory and evidence based.
Principle of scientific objectivity. Science is universal, non-contradictory and evidence
based.
Consider assertions like “jazz is marvelous” or “green and red go well together”. These are
not objective: there is no agreed upon definition or procedure for what constitutes marvelous
music or good color combination. Because of their nature, they can’t be the subject of scientific
inquiry. This does not mean that marvelous music or good color combinations do not exist
or are not worth studying.3 What the principle tells us is simply that if we choose to do
science, we are limiting ourselves to those assertions that are either true or false (i.e. non-
contradictory) for everybody (i.e. universal): assertions that have a single truth value. We call
these assertions statements: they are the basic building blocks of a scientific description as
only these can be studied scientifically.
Logical consistency, though, is not just a property of individual statements. Consider the
following two sentences:

“the next sentence is true”


“the previous sentence is false”

Each assertion, by itself, would be fine but their combination makes it impossible to assign
truth values to both in a way that is logically consistent. For this reason we group statements
into logical contexts, sets of statements for which it is possible to assign truth values to all in
a way that is logically consistent.

Definition 1.1. The Boolean domain is the set B = {false, true} of all possible truth
values.

1. This is the main reason we cannot simply use the tools of mathematical logic.
2. In contrast, in standard logic “the moon is a satellite of earth” implies “5 is prime”.
3. In fact, one can argue that most of the things that make life worth living (e.g. love, friendship, arts, purpose and so on) defy objective characterization and, therefore, that science gives us certain truth about trifling matters.

Axiom 1.2 (Axiom of context). A statement s is an assertion that is either true or false.
A logical context S is a collection of statements with well defined logical relationships.
Formally, a logical context S is a collection of elements called statements upon which is
defined a function truth ∶ S → B.

Justification. As science is universal and non-contradictory, it must deal with assertions that have clear meaning, well-defined logical relationships and are associated with a unique
truth value. A priori, we only assume these objects exist, simply because we cannot proceed
if we do not. A posteriori, we see that a particular set of statements works in practice, which
shows that science could indeed be done with those statements. We are therefore justified
to assume the existence of sets of assertions that have the aforementioned properties. We
call a logical context such a group of assertions and we represent it formally as a set S.
We call statement each assertion within a logical context and represent it formally with an
element s. We say s ∈ S if the statement s belongs to the logical context S.
Given a context, we are also justified to assume the existence of a function truth ∶ S → B
such that truth(s) = true for all s ∈ S that are true for everybody and truth(s) = false for
all s ∈ S that are false for everybody. In fact, note that no s ∈ S can be both true and false
for everybody as it would be contradictory. Note that no s ∈ S can be true for some and false for others as it would not be universal. Note that no s ∈ S can be neither true nor
false for everybody. Suppose it were. Then its truth can never be, even potentially, settled
with experimental evidence. Therefore either truth(s) = true or truth(s) = false for all
s ∈ S.
Note that the statement is the concept asserted, not the words used to express the
concept. A sentence in a particular language is neither necessary nor sufficient to define a
statement. Consider the English language and take the sentence “Snoopy is a dog”. The
truth will depend on whether or not a fictional character qualifies as a dog. The sentence,
by itself, is not enough to determine the truth value. Conversely, take the idea that a
particular animal is a dog. We can express the concept in Italian as “quell’animale è un
cane” without using an English sentence. Therefore English sentences are neither necessary
nor sufficient to express a statement. This will be true of any other language. Therefore,
the basic notion of statement is considered prime, independent of its expression. Moreover,
all statements are considered equally prime.
Also note that logical consistency is not a property of an individual statement but
rather of a set of statements. Consider the two statements: “the next sentence is true” and
“the previous sentence is false”. The two statements together are not logically consistent
as assuming one to be true leads to a proof for it to be false and vice-versa. If we changed
the latter statement to “the previous sentence is true” then both statements would be true
and logically consistent. Therefore logical consistency is defined on the logical context, and
each statement has to be defined as belonging to a context.
Finally, note that the existence of a truth function on the context imposes logical con-
sistency on the logical context without worrying about the details of how this is achieved.

The idea of statements has their origin in the philosophical tradition of classical logic,
“Socrates is a man” being a classic example. Any language can be used to form them, formal
or natural, as indeed any language is used in practice. This means we are not going to care what
particular syntax (i.e. symbols and grammar rules) is used.4 In fact, even a grammatically
incorrect statement is fine as long as the intent is clear. On the other hand, we are going to
care about the semantics of the statements (i.e. their content and meaning). Therefore we will
consider “Socrates is a man” and “Socrate è un uomo” to be the same statement because
they provide the same assertion but in different languages.
Moreover, when we say “Socrates is a man” it has to be clear who Socrates is and what
a man is. If it weren’t, we would have no idea what to experimentally test and how. This is
also important because the mere content of a set of statements puts constraints on what can
be found to be true or false. Consider the statement “that cat is a swan”. There is nothing
to experimentally test here: based on the definitions of cat and swan we know the statement
can never be true, no matter what particular cat we are considering. The statement provides
no new information. Consider, instead, the statements “that animal is a mammal” and “that
animal is a bird”. Based on the content, each of the statements can be found true or false
separately, but they can’t be found true together simply by how mammals and birds are
defined.
We want each logical context to keep track not only of the truth value of each statement,
but also of which truth combinations are possible merely because of the content. Given a
logical context S we will call an assignment for S a map a ∶ S → B that for each statement
gives a truth value. For example, consider the following table:

“that animal is a cat” “that animal is a mammal” “that animal is a bird” ...
T T T ...
T T F ...
F F T ...
F T F ...

Each line represents an assignment for all statements, each represented by a column. The set
of all assignments is the set of all functions from S to B, which in set theory notation is BS ,
and corresponds to all possible permutations. Some assignments, like the one in the first line,
are not consistent with the content of the statements. We will call possible the ones that are
allowed and impossible the ones that aren’t. Naturally, the truth must be one of the possible
assignments. As the context itself needs to tell us which assignments are possible, we can
imagine it comes equipped with the set AS ⊆ BS of all possible assignments for that context.5
Some statements may be allowed to either be true or false while others may only be
allowed one option. We will call certainty a statement that can only be true, like “that cat is
an animal”. We will call impossibility a statement that can only be false, like “that cat is a
4. In general, statements in this context are not necessarily well-formed formulas, predicates or similar concepts in the context of mathematical or propositional logic. Scientific investigation in the broad sense of learning from experimentation predates math and formal languages: information about agriculture, astronomy, metallurgy, botany and the like was collected and used even before the written word. Moreover, cognitive scientists have shown that children start using deliberate experimentation at a very young age to understand the world around them, even before their speech is fully developed. Ultimately, that knowledge is encoded in the language of electrical and biochemical signals. Formal languages are indeed extremely helpful in that they allow us to be more precise and to better keep track of possible inconsistencies, but ultimately one always needs natural language to give meaning and context to the mathematical symbols.
5. In the same way that we do not try to capture how statements are constructed, we are not going to capture how logical consistency is established. We just assume a mechanism is available and therefore one can check whether an assignment is possible or not.

swan”. We will call a contingent statement one that can be either true or false as its truth is
contingent on the assignment, like “that animal is a cat”.
Note that the semantics, the meaning of the statements, plays an important role in defining
the possible assignments but, in general, does not define the truth values.6 For example, even
if the meaning of “the next race is going to be won by Secretariat” is clear, and so is its logical
relationship to “the next race is going to be lost by Secretariat”, we may be none the wiser
about its truthfulness.

Definition 1.3. Given a set of statements, an assignment associates a truth value with
each statement. Formally, an assignment for a logical context S is a map a ∶ S → B, an
element of BS . An assignment for a set of statements S ⊆ S is a map a ∶ S → B while an
assignment for a statement s ∈ S is a truth value t ∈ B.
Justification. Given that an assignment for a set S ⊆ S associates a truth value to each
statement, it is identified by a map a ∶ S → B. The assignment for a single statement is
given by a single truth value t ∈ B.

Axiom 1.4 (Axiom of possibility). A possible assignment for a logical context S is a map a ∶ S → B that assigns a truth value to each statement in a way consistent with the
content of the statements. Formally, each logical context comes equipped with a set AS ⊆ BS
such that truth ∈ AS . A map a ∶ S → B is a possible assignment for S if a ∈ AS .

Justification. The meaning associated to each statement in the context may prevent
some assignments from being logically consistent. Consider, for example, the statements
“that animal is a cat” and “that animal is a dog”. Given that an animal cannot be both
a cat and a dog, an assignment that associated true to both statements would not be
logically consistent. Given that the context must give clear meaning to all statements, it
must be able to clarify whether an assignment a ∈ BS is consistent with the meaning of all
statements. We call these possible assignments.
For each logical context, then, we are justified to assume the existence of a set AS ⊆ BS
such that a ∈ AS if and only if a is a possible assignment.

Definition 1.5. Let S ⊆ S be a set of statements. Then a ∶ S → B is a possible assignment for S if there exists ā ∈ AS such that ā(s) = a(s) for all s ∈ S. Let s ∈ S be a statement.
Then t ∈ B is a possible assignment for s if there exists ā ∈ AS such that ā(s) = t.

Justification. Let S ⊆ S be a set of statements. If a ∶ S → B is a possible assignment then there must be a way to assign the remaining statements in the domain in a way that
is logically consistent. Therefore there must be an ā ∈ AS such that ā(s) = a(s) for all s ∈ S.
Similarly, t ∈ B is a possible assignment for s if there exists ā ∈ AS such that ā(s) = t. This
justifies the definition.

Definition 1.6. Statements are categorized based on their possible assignments.

• A certain statement, or certainty, is a statement ⊤ that must be true simply because of its content. Formally, a(⊤) = true for all possible assignments a ∈ AS.
• An impossible statement, or impossibility, is a statement ⊥ that must be false simply because of its content. Formally, a(⊥) = false for all possible assignments a ∈ AS.
• A statement is contingent if it is neither certain nor impossible.

6. In other formalisms the semantics is said to define the truth values, not in ours.

Justification. Since a certainty must be true, no possible assignment can assign it to be false. Therefore a(⊤) = true for all a ∈ AS. Similarly, an impossibility must be false and no possible assignment can assign it to be true. Therefore a(⊥) = false for all a ∈ AS. This justifies the definitions.

Corollary 1.7. A statement s ∈ S can only be exactly one of the following: impossible,
contingent, certain.

Proof. Let s ∈ S be a statement. If it is contingent, by definition, it is neither certain nor impossible. If it is not contingent, it is either certain or impossible. If s is certain, then
a(s) = true for all possible assignments a ∈ AS . This means a(s) ≠ false for all possible
assignments a ∈ AS and therefore s is not impossible. If s is impossible, then a(s) = false
for all possible assignments a ∈ AS . This means a(s) ≠ true for all possible assignments
a ∈ AS and therefore s is not certain.
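
To make the formal objects concrete, here is a minimal sketch (our own illustration; the statement labels and the constraints among them are invented for the example) of a logical context as a set of statements together with its possible assignments AS, from which certainties, impossibilities and contingent statements can be read off:

```python
# A minimal sketch of a logical context: statements are labels, and the
# context carries the set A_S of possible assignments.
from itertools import product

statements = ["cat", "mammal", "bird"]

# Possible assignments: combinations consistent with the content of the
# statements (a cat must be a mammal; nothing is both a mammal and a bird).
A_S = [dict(zip(statements, bits))
       for bits in product([True, False], repeat=3)
       if (not bits[0] or bits[1]) and not (bits[1] and bits[2])]

def classify(s):
    values = {a[s] for a in A_S}
    if values == {True}:
        return "certainty"
    if values == {False}:
        return "impossibility"
    return "contingent"

print([(s, classify(s)) for s in statements])  # all three are contingent
```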
Next we want to keep track of statements whose truth depends on the truth of other
statements. Consider “that animal is a cat” and “that animal is not a cat”: if the first one
is true then the second is false and vice-versa. In this sense, the second statement depends
on the first. Therefore, in general, a statement depends on other statements if its truth is
determined by the truth values of the other statements in every possible assignment.
Since statements are intangible, there are no limits to the number of arguments one state-
ment may depend on. For example, consider the statement “the mass of the electron is 511±0.5
keV” and the set of all the statements of the form “the mass of the electron is exactly x keV”
with 510.5 < x < 511.5. If any of the latter is true then the original statement is true as well.
Given that x is a real number, that is uncountably many statements so the original statement
can be seen as a function of uncountably many statements. Therefore we will assume we can
always create a statement that arbitrarily depends on an arbitrary set of statements.

Definition 1.8. Let s̄ ∈ S be a statement and S ⊆ S be a set of statements. Then s̄ depends on S (or it is a function of S) if we can find an fB ∶ B^S → B such that

a(s̄) = fB({a(s)}s∈S)

for every possible assignment a ∈ AS. We say s̄ depends on S through fB. The relationship is illustrated by the following diagram, where each row is a possible assignment in AS and, as an example, fB computes s1 AND (s2 OR s3):

        s1   s2   s3   ...     s̄
  a1    T    T    F    ...     a1(s̄) = fB(T, T, F) = T
  a2    T    F    T    ...     a2(s̄) = fB(T, F, T) = T
  a3    T    F    F    ...     a3(s̄) = fB(T, F, F) = F
  ...   ...  ...  ...  ...     ...


Axiom 1.9 (Axiom of closure). We can always find a statement whose truth value arbi-
trarily depends on an arbitrary set of statements. Formally, let S ⊆ S be a set of statements
and fB ∶ BS → B an arbitrary function from an assignment of S to a truth value. Then we
can always find a statement s̄ ∈ S that depends on S through fB .

Justification. Let S ⊆ S be a set of statements and let fB ∶ B^S → B be an arbitrary function. Consider the statement “this statement is true if the function fB applied to the
truth values of S is true”. This statement has a well defined meaning that assigns a truth
value that is the same for everybody. It is therefore logically consistent with all other
statements in S. We are therefore justified to assume it is in the context, for, if it is not,
it can be added without problem.

Corollary 1.10. Functions on truth values induce functions on statements. Formally, let
I be an index set and fB ∶ BI → B be a function. There exists a function f ∶ S I → S such
that
a(f ({si }i∈I )) = fB ({a(si )}i∈I )
for every indexed set {si }i∈I ⊆ S and possible assignment a ∈ AS .

Proof. Given I and fB ∶ B^I → B we can construct f ∶ S^I → S as follows. Given an indexed set {si}i∈I ∈ S^I, let S ⊆ S be the set of elements in the indexed set and define
f¯B ∶ BS → B such that f¯B ({a(s)}s∈S ) = fB ({a(si )}i∈I ). Then by axiom 1.9 we can find
a statement s̄ that depends on S through f¯B and we can set f ({si }i∈I ) = s̄. We have
a(f ({si }i∈I )) = fB ({a(si )}i∈I ) for all indexed sets and for all possible assignments.
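
Continuing the toy sketch above, the closure axiom can be illustrated by adjoining a statement whose truth value is computed, in every possible assignment, by an arbitrary Boolean function of existing statements:

```python
# Axiom 1.9 in the toy context: adjoin a statement whose truth is an
# arbitrary function f_B of the truth values of a set of dependencies.
def adjoin(A_S, name, deps, f_B):
    """Extend every possible assignment with name := f_B(truth values of deps)."""
    return [{**a, name: f_B(*(a[d] for d in deps))} for a in A_S]

# e.g. a statement that depends on {cat, bird} through the Boolean OR
A_S = adjoin(A_S, "cat or bird", ["cat", "bird"], lambda x, y: x or y)
print({a["cat or bird"] for a in A_S})  # both values occur: it is contingent
```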

To better characterize truth functions, we borrow ideas and definitions from Boolean
algebra which is the branch of algebra that operates on truth values. Boolean algebra is fun-
damental in logic and computer science, since every digital circuit ultimately is implemented
on two-state systems (e.g. high/low voltage, up/down magnetization). The most fundamental
elements in that algebra are the following three simple operations: negation (i.e. logical NOT),
conjunction (i.e. logical AND) and disjunction (i.e. logical OR).
Suppose s1 = “the sauce is sweet” and s2 = “the sauce is sour”. We can apply the three
operations to make this table:

Operator Gate Symbol Example


Negation NOT ¬s1 “the sauce is not sweet”
Conjunction AND s1 ∧ s2 “the sauce is sweet and sour”
Disjunction OR s1 ∨ s2 “the sauce is at least sweet or sour”
Table 1.1: Boolean operations on statements.

Most languages, natural or symbolic, typically already provide similar operations, as the examples show. Technically, though, we should consider the ones defined here as meta-operations that are defined outside the language of the statements. For example, “x is the position of a ball” ∧ “d²x/dt² = −g” stitches together an English statement with a calculus statement into a new statement that is neither. This kind of mix should be allowed as it does
happen in practice.

Definition 1.11. The negation or logical NOT is the function ¬ ∶ B → B that takes a
truth value and returns its opposite. That is: ¬true = false and ¬false = true. We also
call negation ¬ ∶ S → S the related function on statements.

Definition 1.12. The conjunction or logical AND is the function ∧ ∶ B × B → B that returns true only if all the arguments are true. That is: true ∧ true = true and
true∧false = false∧true = false∧false = false. We also call conjunction ∧ ∶ S×S → S
the related function on statements.

Definition 1.13. The disjunction or logical OR is the function ∨ ∶ B × B → B that returns false only if all the arguments are false. That is: false ∨ false = false and
true ∨ false = false ∨ true = true ∨ true = true. We also call disjunction ∨ ∶ S × S → S
the related function on statements.

Proposition 1.14. A logical context S is closed under negation, arbitrary conjunction and
arbitrary disjunction.

Proof. Negation, arbitrary conjunction and arbitrary disjunction are particular truth
functions. Their output always exists by axiom (1.9).

Now we have all the elements to define when two statements have the same logical content.
Consider the two statements “that animal is a bird” and “that animal has feathers”: since
all birds and only birds have feathers they give us the same information. Consider “the mass
of the electron is 511 ± 0.5 keV” and “the mass of the electron is 0.511 ± 0.0005 MeV”: they
represent the same measurement but in different units. So, how can we express the fact that
two statements s1 and s2 give us the same information? The idea is that they can never be
assigned opposite truth values. If we assigned true to the first, then the second must be true
as well. If we assigned false to the first, then the second must be false.7

Definition 1.15. Two statements s1 and s2 are equivalent s1 ≡ s2 if they must be equally
true or false simply because of their content. Formally, s1 ≡ s2 if and only if a(s1 ) = a(s2 )
for all possible assignments a ∈ AS .
Justification. If two statements must be equally true or false simply because of their
content, then their value must be the same in all possible assignments, which justifies the
definition.
Again, we want to stress that this notion of equivalence is not based on the truth value
(i.e. whether the statements happen to be both true or false) or on whether they are the same
statement (i.e. whether they assert the same thing): it is based on the possible assignment
(i.e. whether there exists a possible assignment in which the two statements have a different
truth value) and therefore on the content of the statements. We sum up the difference in the
following table.
7. This technique allows us to do something analogous to model theory. Two statements are equivalent if their truth is equal for all consistent truth assignments, which in our framework play the part of the models of model theory. But in our context the assignments are only hypothetical: there isn’t a model in which “this is a cat” and another in which “this is a dog”. There is only one truth value, the one we find experimentally.

Relationship             Symbol                   First statement      Second statement          Also known as
Statement equality       s1 = s2                  “Swans are birds”    “I cigni sono uccelli”    Semantic equivalence
Statement equivalence    s1 ≡ s2                  “Swans are birds”    “Swans have feathers”     Logical equivalence
Truth equality           truth(s1) = truth(s2)    “Swans are birds”    “The earth is round”      Material equivalence

Table 1.2: Different types of statement relationships.

Note that the relationships in the table are ordered by strength: if two statements are equal, they are also equivalent; if two statements are equivalent, they have equal truth. The
reverse is not true in general: two statements with equal truth may not be equivalent; two
equivalent statements may not be equal.
Intuitively, statement equivalence answers the question: do these two statements carry the
same information? Is experimentally testing the first the same as experimentally testing the
second? If that’s the case, they are essentially equivalent to us. So much so, that from now
on we will implicitly assume two different statements to be inequivalent.8
There are a number of useful properties that statement equivalence satisfies.

Corollary 1.16. All certainties are equivalent. All impossibilities are equivalent.
Proof. Let ⊤1, ⊤2 ∈ S be two certainties. Then for every possible assignment a ∈ AS we have a(⊤1) = true = a(⊤2) and therefore ⊤1 ≡ ⊤2 by definition.
Let ⊥1, ⊥2 ∈ S be two impossibilities. Then for every possible assignment a ∈ AS we have a(⊥1) = false = a(⊥2) and therefore ⊥1 ≡ ⊥2 by definition.

Corollary 1.17. Two statements s1 and s2 are equivalent if and only if (s1 ∧ s2) ∨ (¬s1 ∧ ¬s2) ≡ ⊤.

Proof. Let a ∈ AS and s = (s1 ∧s2 )∨(¬s1 ∧¬s2 ). We have a(s) = (a(s1 )∧a(s2 ))∨(¬a(s1 )∧
¬a(s2 )). We have the following truth table

s1 s2 s
T T T
T F F
F T F
F F T

Note that the assignments for which a(s1 ) = a(s2 ) are exactly the assignments for which
a(s) = true. Therefore s1 ≡ s2 if and only if s is a certainty.

Corollary 1.18. Statement equivalence satisfies the following properties:

8. Technically, when we say that s is a statement we actually mean s is an equivalence class of statements. We are not going to be explicit about the distinction, though, as we feel it simply distracts without adding greater clarity. We’ll let the context determine what is meant.

• reflexivity: s ≡ s
• symmetry: if s1 ≡ s2 then s2 ≡ s1
• transitivity: if s1 ≡ s2 and s2 ≡ s3 then s1 ≡ s3

and is therefore an equivalence relationship.

Proof. Statement equivalence is defined in terms of truth equality within all possible
assignments and will inherit reflexivity, symmetry and transitivity from it.

Corollary 1.19. A logical context S satisfies the following properties:

• associativity: a ∨ (b ∨ c) ≡ (a ∨ b) ∨ c, a ∧ (b ∧ c) ≡ (a ∧ b) ∧ c
• commutativity: a ∨ b ≡ b ∨ a, a ∧ b ≡ b ∧ a
• absorption: a ∨ (a ∧ b) ≡ a, a ∧ (a ∨ b) ≡ a
• identity: a ∨ ⊥ ≡ a, a ∧ ⊤ ≡ a
• distributivity: a ∨ (b ∧ c) ≡ (a ∨ b) ∧ (a ∨ c), a ∧ (b ∨ c) ≡ (a ∧ b) ∨ (a ∧ c)
• complements: a ∨ ¬a ≡ ⊤, a ∧ ¬a ≡ ⊥
• De Morgan: ¬a ∨ ¬b ≡ ¬(a ∧ b), ¬a ∧ ¬b ≡ ¬(a ∨ b)

Therefore S is a Boolean algebra by definition.

Proof. The left and right expressions for each equivalence correspond to the same truth
function applied to the same statements. Therefore, the left side is true if and only if the
right side is true and they are therefore equivalent by definition.
These operations and properties define the algebra of statements. While we started from
a slightly different premise, the relationships we found are the logical identities of classical
logic. These are exactly what we need to make sure the truth values of all our statements are
consistent.
Equivalence is not the only semantic relationship that we want to capture. Consider the
contents of the following:

s1 =“that animal is a cat”


s2 =“that animal is a mammal”
s3 =“that animal is a dog”
s4 =“that animal is black”

The second will be true whenever the first is true. In this case we say the first statement is
narrower than the second (s1 ≼ s2 ) because “that animal is a cat” is more specific than “that
animal is a mammal”. The third will be false whenever the first is true. In this case we say
that they are incompatible (s1 ̸ s3 ) because “that animal is a dog” and “that animal is a
cat” can never be true at the same time. The fourth will be true or false regardless of whether
the first is true. In this case we say that they are independent (s1 ⫫ s4) because knowing
whether “that animal is a cat” tells us nothing about whether “that animal is black”. As for
equivalence, we can define these relationships upon the previous definitions.

Definition 1.20. Given two statements s1 and s2 , we say that:


1.1. STATEMENTS 111

• s1 is narrower than s2 (noted s1 ≼ s2) if s2 is true whenever s1 is true simply because of their content. That is, for all a ∈ AS if a(s1) = true then a(s2) = true.
• s1 is broader than s2 (noted s1 ≽ s2) if s2 ≼ s1.
• s1 is compatible with s2 if their content allows them to be true at the same time. That is, there exists a ∈ AS such that a(s1) = a(s2) = true.

The negations of these properties will be noted by ⋠ and ⋡; when s1 and s2 are not compatible we say they are incompatible.

Definition 1.21. The elements of a set of statements S ⊆ S are said to be independent (noted s1 ⫫ s2 for a set of two) if the assignment of any subset of statements does not
depend on the assignment of the others. That is, a set of statements S ⊆ S is independent
if given a family {ts }s∈S such that each ts ∈ B is a possible assignment for the respective s
we can always find a ∈ AS such that a(s) = ts for all s ∈ S.

Proposition 1.22. The above operations obey the following relationships:

(i) s1 ≼ s2 if and only if s1 ∧ ¬s2 ≡ ⊥
(ii) s1 ≼ s2 if and only if s1 ∧ s2 ≡ s1
(iii) s1 ≼ s2 if and only if s1 ∨ s2 ≡ s2
(iv) s1 is compatible with s2 if and only if s1 ∧ s2 ≢ ⊥
(v) s1 is incompatible with s2 if and only if s1 ∧ ¬s2 ≡ s1
(vi) s1 ≼ s1 ∨ s2
(vii) s1 ∧ s2 ≼ s1
(viii) s1 ≼ s2 if and only if ¬s1 ∨ s2 ≡ ⊤

Proof. For (i), consider the following truth table

s1 s2 s1 ∧ ¬s2
T T F
T F T
F T F
F F F

The assignment for which a(s1 ) = true and a(s2 ) = false is exactly the assignment for
which a(s1 ∧ ¬s2 ) = true. Therefore s1 ≼ s2 if and only if s1 ∧ ¬s2 is impossible.
For (ii), consider s1 ∧ s2 ≡ (s1 ∧ s2) ∨ ⊥. Since s1 ≼ s2, s1 ∧ ¬s2 ≡ ⊥. We have (s1 ∧ s2) ∨ ⊥ ≡ (s1 ∧ s2) ∨ (s1 ∧ ¬s2) ≡ s1 ∧ (s2 ∨ ¬s2) ≡ s1 ∧ ⊤ ≡ s1. Therefore s1 ∧ s2 ≡ s1. The same logic can be applied in reverse.
For (iii), consider s1 ∨ s2 ≡ (s1 ∨ s2) ∧ ⊤ ≡ (s1 ∨ s2) ∧ (¬s2 ∨ s2) ≡ (s1 ∧ ¬s2) ∨ s2. Since s1 ≼ s2, s1 ∧ ¬s2 ≡ ⊥. We have (s1 ∧ ¬s2) ∨ s2 ≡ ⊥ ∨ s2 ≡ s2. Therefore s1 ∨ s2 ≡ s2. The same logic can be applied in reverse.
For (iv), consider the following truth table
For (iv), consider the following truth table

s1 s2 s1 ∧ s2
T T T
T F F
F T F
F F F

The assignment for which a(s1) = a(s2) = true is exactly the assignment for which a(s1 ∧ s2) = true. Therefore s1 is compatible with s2 if and only if s1 ∧ s2 is not impossible.
For (v), consider s1 ∧ ¬s2 ≡ (s1 ∧ ¬s2) ∨ ⊥. Since s1 is incompatible with s2, s1 ∧ s2 ≡ ⊥. We have (s1 ∧ ¬s2) ∨ ⊥ ≡ (s1 ∧ ¬s2) ∨ (s1 ∧ s2) ≡ s1 ∧ (¬s2 ∨ s2) ≡ s1 ∧ ⊤ ≡ s1. Therefore s1 ∧ ¬s2 ≡ s1. The same logic can be applied in reverse.
For (vi), we have s1 ∧ ¬(s1 ∨ s2) ≡ s1 ∧ ¬s1 ∧ ¬s2 ≡ ⊥ ∧ ¬s2 ≡ ⊥. Therefore s1 ≼ s1 ∨ s2.
For (vii), we have s1 ∧ s2 ∧ ¬s1 ≡ s1 ∧ ¬s1 ∧ s2 ≡ ⊥ ∧ s2 ≡ ⊥. Therefore s1 ∧ s2 ≼ s1.
For (viii), suppose s1 ≼ s2. We have ¬s1 ∨ s2 ≡ ¬s1 ∨ (s1 ∨ s2) ≡ (¬s1 ∨ s1) ∨ s2 ≡ ⊤ ∨ s2 ≡ ⊤. Conversely, suppose ¬s1 ∨ s2 ≡ ⊤. We have s1 ∨ s2 ≡ (s1 ∨ s2) ∧ ⊤ ≡ (s1 ∨ s2) ∧ (¬s1 ∨ s2) ≡ (s1 ∧ ¬s1) ∨ s2 ≡ ⊥ ∨ s2 ≡ s2 and therefore s1 ≼ s2.
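
These relationships can be checked mechanically against the possible assignments. A sketch, reusing the toy context from the earlier snippets:

```python
# The semantic relationships of definition 1.20, checked directly against
# the set of possible assignments A_S.
def equivalent(A_S, s1, s2):      # s1 ≡ s2: equal truth in every assignment
    return all(a[s1] == a[s2] for a in A_S)

def narrower(A_S, s1, s2):        # s1 ≼ s2: s2 is true whenever s1 is
    return all(a[s2] for a in A_S if a[s1])

def compatible(A_S, s1, s2):      # s1, s2 can be true at the same time
    return any(a[s1] and a[s2] for a in A_S)

print(narrower(A_S, "cat", "mammal"))      # True: cats are mammals
print(compatible(A_S, "mammal", "bird"))   # False: they are incompatible
```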

Proposition 1.23. Statement narrowness satisfies the following properties:

• reflexivity: s ≼ s
• antisymmetry: if s1 ≼ s2 and s2 ≼ s1 then s1 ≡ s2
• transitivity: if s1 ≼ s2 and s2 ≼ s3 then s1 ≼ s3

and is therefore a partial order.

Proof. For reflexivity, s ∧ ¬s ≡ ⊥ and therefore s ≼ s by 1.22.
For antisymmetry, s1 ∧ s2 ≡ s1 since s1 ≼ s2 and s1 ∧ s2 ≡ s2 since s2 ≼ s1. Therefore s1 ≡ s2.
Conversely, suppose s1 ≡ s2 . Then s1 ∧ s2 ≡ s1 ≡ s2 . Therefore s1 ≼ s2 and s1 ≽ s2 .
For transitivity, we have s1 ≡ s1 ∧ s2 ≡ s1 ∧ s2 ∧ s3 ≡ s1 ∧ s3 and therefore s1 ≼ s3 .

Proposition 1.24. Every subset S ⊆ S of statements has a supremum. That is, there
exists an element s̄ ∈ S such that s ≼ s̄ for all s ∈ S. This, by definition, means the S is a
complete Boolean algebra and, as a consequence, that the distributivity and De Morgan
laws in 1.19 hold in the infinite case.

Proof. Let S ⊆ S be an arbitrary set of statements. Consider s̄ = ⋁_{e∈S} e. This statement exists by 1.9. Let s ∈ S. Using the properties in 1.19 we have s ∨ s̄ ≡ s ∨ (⋁_{e∈S} e) ≡ s ∨ s ∨ (⋁_{e∈S∖{s}} e) ≡ s ∨ (⋁_{e∈S∖{s}} e) ≡ ⋁_{e∈S} e = s̄. By 1.22 we have s ≼ s̄ for any s ∈ S. Therefore any S ⊆ S admits a supremum s̄.
A complete Boolean algebra is one such that every subset admits a supremum, therefore
the algebra of statements is complete by definition. For a complete Boolean algebra infinite
distributivity and De Morgan laws hold, therefore they will hold in the algebra of statements
as well.

It should be noted that statement narrowness captures more than just the idea of one
statement being more specific than another. Consider “this harp seal is white” and “this harp
seal is less than one year old”. Since harp seals have white fur only for their first month, the
first one can never be true while the second is not. Therefore “this harp seal is white” ≼ “this
harp seal is less than one year old”. By the same token, we also have “I lighted the fuse” ≼ “the
bomb will go off”. That is, narrowness can also capture causal relationships, which is essential
if we want to develop a basic theory of scientific investigation.9 Intuitively, a statement is
narrower than another if it provides at least as much or more information when true. If we
experimentally verified the narrower statement, then we already know that the broader one
is also verified.
It should also be noted that independence is not transitive and pair-wise independence is
not sufficient. Consider the following statements for an ideal gas:

1. “the pressure is 101 ± 1 kPa”
2. “the volume is 1 ± 0.1 m³”
3. “the temperature is 293 ± 1 Kelvin”

Since the three quantities are linked by the equation of state P V = nRT , any two statements
are independent but the three together aren’t. This notion of independence is similar, and in
fact related, to statistical independence and linear independence.
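
The following toy sketch (our own, simpler than the ideal gas example) exhibits the same phenomenon with three statements constrained by c = (a XOR b): every pair is independent, but the triple is not. Here independence is tested, as in definition 1.21, by checking that every combination of truth values occurs in some possible assignment:

```python
# Pairwise independence without joint independence: possible assignments
# are constrained by c = (a XOR b).
from itertools import product

A_S = [{"a": x, "b": y, "c": (x != y)} for x, y in product([True, False], repeat=2)]

def independent(A_S, names):
    """Every combination of truth values occurs in some possible assignment."""
    combos = {tuple(a[n] for n in names) for a in A_S}
    return len(combos) == 2 ** len(names)

print(independent(A_S, ["a", "b"]),
      independent(A_S, ["a", "c"]),
      independent(A_S, ["b", "c"]))       # True True True
print(independent(A_S, ["a", "b", "c"]))  # False: the triple is constrained
```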
We finish this section by showing how every logical dependence among statements can be
naturally expressed in terms of negation, conjunction and disjunction. Consider the statement
s =“the sauce is sweet and sour or it is neither”. This depends on the two statements s1 =“the
sauce is sweet” and s2 =“the sauce is sour” defined before: if we know whether s1 and s2 are
true, we can tell whether s is true as well. The idea is that we can express the dependence as
all possible assignments for {s1 , s2 } that make the result true. For example, s will be true if
the sauce is sweet and sour or if it is not sweet and not sour. That is: (s1 ∧ s2 ) ∨ (¬s1 ∧ ¬s2 ).
Similarly, the statement “the sauce is not sweet-and-sour” can be expressed as (s1 ∧ ¬s2 ) ∨
(¬s1 ∧ s2 ) ∨ (¬s1 ∧ ¬s2 ) since it is going to be true in all cases except the one where the sauce
is sweet and sour.
Each of the cases is the conjunction of all independent statements where each one appears
only once, negated or not. We call these expressions minterms. A function can always be
expressed as the disjunction of all the minterms for which the function is true. This is called
its disjunctive normal form because it is a canonical way to express the function in terms of
disjunctions.

Definition 1.25. Let S ⊆ S be a set of statements. A minterm of S is a conjunction where each element appears once and only once, either negated or not. That is, it is a statement m ∈ S that can be written as m ≡ ⋀_{s∈S} (¬)^{a(s)} s where a ∶ S → B, (¬)^{true} s = s and (¬)^{false} s = ¬s. In this notation, for a given a0 ∈ B^S we have a0(m) = true if and only if a0(s) = a(s) for all s ∈ S.

Proposition 1.26. Let s̄ ∈ S be a statement that depends on a set of statements S ⊆ S through fB ∶ B^S → B. Then we can express s̄ in its disjunctive normal form as a disjunction of minterms of S, that is s̄ ≡ ⋁_{a∈A} (⋀_{s∈S} (¬)^{a(s)} s) where A ⊆ B^S is a set of assignments for S. In this notation, for a given a0 ∈ B^S we have a0(s̄) = true if and only if there is an a ∈ A such that a0(s) = a(s) for all s ∈ S.

9. We considered using the term implication directly, but it seems that it leads to confusion. Implication in classical logic is something different: it is simply another truth function. Moreover, saying that an impossibility is narrower than all other statements sounds better than saying that an impossibility implies all other statements.

Proof. We first show that this can be done for a function fB that returns true for only one assignment. Let a ∶ S → B be an assignment for S and suppose fB ∶ B^S → B is such that fB(ā) = true if and only if ā = a. Now consider the minterm m_a = ⋀_{s∈S} (¬)^{a(s)} s and an assignment ā ∶ S → B for the whole context. We have that ā(m_a) = true if and only if ā(s) = a(s) for all s ∈ S. Then ā(s̄) = ā(m_a) for all possible assignments ā ∈ AS and therefore s̄ ≡ ⋀_{s∈S} (¬)^{a(s)} s.
Now we generalize the result for arbitrary functions. Let fB ∶ B^S → B be a generic function. Let A = {a ∈ B^S ∣ fB(a) = true}. Let m_a = ⋀_{s∈S} (¬)^{a(s)} s be the minterm associated with an assignment a ∈ A. Consider m̄ = ⋁_{a∈A} m_a and an assignment ā ∶ S → B for the whole context. We have that ā(m̄) = true if and only if ā(s) = a(s) for some a ∈ A and for all s ∈ S. Then ā(s̄) = ā(m̄) for all possible assignments ā ∈ AS and therefore s̄ ≡ ⋁_{a∈A} (⋀_{s∈S} (¬)^{a(s)} s).
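
As an illustration (our own sketch, for finitely many statements), the construction in the proof can be carried out mechanically: enumerate the assignments on which fB is true and take the disjunction of the corresponding minterms:

```python
# Proposition 1.26 in code: build the disjunctive normal form of a truth
# function by collecting the minterms on which it returns true.
from itertools import product

def dnf(names, f_B):
    minterms = []
    for bits in product([True, False], repeat=len(names)):
        if f_B(*bits):
            minterms.append(" AND ".join(n if b else f"NOT {n}"
                                         for n, b in zip(names, bits)))
    return " OR ".join(f"({m})" for m in minterms)

# "the sauce is sweet and sour or it is neither": true when s1 == s2
print(dnf(["sweet", "sour"], lambda s1, s2: s1 == s2))
# (sweet AND sour) OR (NOT sweet AND NOT sour)
```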

With these tools in place we are in a position to formulate models that are universal and
non-contradictory. These models will be a collection of statements with a well defined content
and well defined possible cases. Each statement’s truth value will be discovered experimentally.

1.2 Verifiable statements and experimental domains


We now focus on those statements that are verifiable: we have a way to experimentally confirm
that the statement is true. The main result of this section is that not all functions of verifiable
statements are verifiable statements. For example, since a test has to finish in a finite amount
of time we are not going to be able to verify a statement that is the conjunction (i.e. logical
AND) of infinitely many statements. We are also going to group verifiable statements into
experimental domains which represent all the experimental evidence about a scientific subject
that can be acquired in an indefinite amount of time.
The previous section took care of universality and non-contradiction, but the principle of
scientific objectivity requires science to be evidence based. Consider the statements “23 is
a prime number”, “it is immoral to kill a person exclusively for monetary gain” or “God is
eternal”. They deal with abstract concepts that cannot be defined operationally and therefore
cannot be experimentally verified conclusively. Again, this does not mean these concepts are
of less significance, just that they cannot be the subject of scientific inquiry.10
Limiting the scope of our discussion to objects and properties that are well defined physi-
cally is also not enough. For example, “the electron is green” or “1 meter is equal to 5 Kelvin”
are still not suitable scientific statements as the relationships established are not physically
meaningful. Even when the relationship is meaningful, we may still not be able to validate it
experimentally. For example, “there is no extra-terrestrial life” or “the mass of the electron is
exactly 9.109 × 10−31 kg” are not statements that can be verified in practice. In the first case,
10. In fact, one may be more interested in them precisely because of their abstract, and therefore less transient, nature.

we would need to check every corner of the universe and find none, with the closest galaxy
like ours, Andromeda, being 2.5 million light-years away; in the second case, we will always
have an uncertainty associated with the measurement, however small.
So we have to narrow the scope to those and only those statements that can be verified
experimentally. That is, first we have to provide an experimental test: a repeatable experi-
mental procedure (i.e. evidence based) that anyone (i.e. universal) can in principle execute
and obtain consistent results (i.e. non-contradictory). Second, we must guarantee that the
test always terminates successfully if and only if the statement is true. This is both the power
and the limit of scientific inquiry: it gives us a way to construct a coherent description of the
physical world but it is limited to those aspects that can be reliably studied experimentally.
Note that certainty and impossibility are trivially verifiable since we know a priori that they
are true and false respectively. Also note that if we have two statements that are equivalent,
having a test that verifies one means we have a test that verifies the other as well. For example,
since “that animal is a bird” is equivalent to “that animal has feathers”, checking whether
the animal has feathers is equivalent to checking whether the animal is a bird. The subtlety
here is that the evidence may be indirect and it is only the relationship between statements
(i.e. the theoretical model) that guarantees the validity of the verification. This should not
worry us, though, because in this strict sense most experimental data is indirect. It comes
from a chain of inductions (e.g. the pulse of light is produced, bounces off a moving target and
changes frequency due to the Doppler effect, the light signal is transduced to an electronic
signal which finally is displayed on the device) and therefore needs a theoretical framework
to be properly understood.

Axiom 1.27 (Axiom of verifiability). A verifiable statement is a statement that, if true, can be shown to be so experimentally. Formally, each logical context S contains a set of statements Sv ⊆ S whose elements are said to be verifiable. Moreover, we have the following properties:

• every certainty ⊤ ∈ S is verifiable
• every impossibility ⊥ ∈ S is verifiable
• a statement equivalent to a verifiable statement is verifiable

Justification. To give a better justification for this and later axioms, we introduce the
following pseudo-mathematical concepts. As science is evidence based, for each logical
context S we must have at our disposal a set of experimental tests, which we indicate
with E. Each element e ∈ E is a repeatable procedure (i.e. it can be restarted and stopped
at any time) that anybody can execute and will always terminate successfully, terminate
unsuccessfully or never terminate. As the tests must provide evidence, the output of the
test must depend on the truth values of the statements; therefore we can assume we have
a function result ∶ E × AS → {success, failure, undefined}. For a statement s to be
verifiable, there must be an experimental test e that succeeds if and only if the statement
is true. That is, for all a ∈ AS , result(e, a) = success if and only if a(s) = true.
Certainties and impossibilities can be associated with trivial tests that always terminate
successfully or unsuccessfully respectively. This justifies assuming them to be verifiable. As
for equivalence and verifiability, let s1 , s2 ∈ S and suppose s1 is verifiable. This means there is
a test e ∈ E such that result(e, a) = success if and only if a(s1 ) = true. Since the statements
are equivalent, a(s1 ) = true if and only if a(s2 ) = true. Therefore result(e, a) = success
if and only if a(s2 ) = true. We are therefore justified to assume s2 is verifiable.
Precisely defining experimental tests as procedures would present the same type of problems as defining statements or logical consistency; therefore we leave them as primary concepts. But as primary concepts they would only complicate the formal framework without adding insight, so we use them only as part of the justifications for the axioms.

Experimental tests are the second and last building block of our general mathematical
theory of experimental science. As with statements, any language (e.g. natural, formal, en-
gineering drawings, computer programs, ...) can in principle be used to describe the proce-
dure, which can be arbitrarily complicated. It may require building detectors, gathering large
amounts of data and performing complicated computations. We are not going to care how
these procedures are described, just that it is done in a way that allows us to execute the
test.11
As an example, consider a procedure along the lines of:

1. find a swan
2. if it’s black terminate successfully
3. go to step 1

If a black swan exists, at some point we’ll find it and the test will be successful. If a black
swan does not exist, then the procedure will never terminate and the result is undefined.
This is something anybody can do and will eventually always provide the same result: it is
an experimental test. It also terminates successfully if and only if a black swan exists, so the
statement “black swans exist” is verifiable.
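To make the asymmetry concrete, the following Python sketch (a toy model of our own; the swan stream and the budget cap are illustrative assumptions, not part of the formal framework) models this test. Note that it can only ever report success or undefined: without exhausting the possibly unbounded stream of swans, failure can never be reported.

    from itertools import islice

    def find_black_swan(swans, budget):
        """Run the black swan search for at most `budget` observations.
        Returns 'success' if a black swan is found, 'undefined' otherwise,
        since an unsuccessful search so far proves nothing; the budget cap
        stands in for the fact that any actual run is cut short."""
        for swan in islice(swans, budget):
            if swan == "black":
                return "success"
        return "undefined"

    print(find_black_swan(iter(["white", "white", "black"]), budget=10))  # success
    print(find_black_swan(iter(["white"] * 100), budget=10))              # undefined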
Note that, in principle, science can also study statements that can be refuted experimen-
tally. But the negation of those is a statement that can be verified experimentally. Therefore
we lose nothing by only focusing on verification.12
In the previous section we saw that we can combine statements into new statements. How
about verifiable statements? Can we always combine verifiable statements into other verifiable
statements? Since all truth functions can be constructed from the three basic Boolean opera-
tions, the question becomes: can we construct experimental tests for the negation, conjunction
and disjunction of verifiable statements?
The first important result is that the negation of an experimental test, an experimental test
that is successful when the first is not successful, does not necessarily exist. Consider our black
swan example, an experimental test for the negation would be a procedure that terminates
successfully if black swans do not exist. But the given procedure never finishes in that case,
so it is not just a matter of switching success with failure. Because of non-termination, not-
successful does not necessarily mean failure.13 Moreover, it is a result of computability theory
that some problems are undecidable: they do not allow the construction of an algorithm that
always terminates with a correct yes-or-no answer. So we know that in some cases this is not actually possible.

11 Trying to formalize a universal language for experimental tests is not only impractical but also conceptually problematic. To know what we can test experimentally is to know what is physically possible, which is equivalent to knowing the laws of physics, which is what we are trying to construct a framework for.
12 Mathematically, the spaces of verifiable statements and refutable statements are dual to each other.
13 In this case, the old adage “absence of evidence is not evidence of absence” applies.
In the same vein we are able to confirm experimentally that “the mass of this particle is not
zero” but not that “the mass of this particle is exactly zero” since we always have uncertainty
in our measurements of mass. Even if we could continue shrinking the uncertainty arbitrarily,
we would ideally need infinite time to shrink it to zero. What this means is that not all answers
to the same question can be equally verified. Is the mass of the photon exactly zero? We can
either give a precise “no” or an imprecise “it’s within this range.” Is there extra-terrestrial
life? We can either give a precise “yes” or an imprecise “we haven’t found it so far.”14
Remark. The negation or logical NOT of a verifiable statement is not necessarily a
verifiable statement.
Justification. A statement s ∈ S is verifiable if we can find e ∈ E such that result(e, a) = success if and only if a(s) = true for all a ∈ AS. This means that for some a ∈ AS we may have a(s) = false and result(e, a) = undefined. That is, the test is not guaranteed to terminate if the statement is false. Therefore we cannot, in general, use e to construct a test that terminates whenever s is false, and we are not justified, in general, to assume that the negation of a verifiable statement is verifiable.

While this is true in general, we can still test the negation of many verifiable statements.
Consider the statement “this swan is black”. It allows the following experimental test:

1. look at the swan
2. if it’s black terminate successfully
3. terminate unsuccessfully

Note that, since the test always terminates, we can switch failure to success and vice-versa.
In this case we can test the negation and we say that the statement is decidable: we can
decide experimentally whether it is true or false. It is precisely when, and only when, the test is guaranteed to terminate that we can test the negation.

Definition 1.28. A falsifiable statement is a statement that, if false, can be shown to be so experimentally. Formally, a statement s is falsifiable if its negation ¬s ∈ Sv is a verifiable statement.
Justification. Note that the informal definition is based on experimentally showing that
the statement is false, while the formal definition is based on the falsifiable statement to
be the negation of a verifiable one. We have to show they are equivalent.
For a statement s ∈ S to be falsifiable, there must be an experimental test e ∈ E that
fails if and only if the statement is false. That is, for all a ∈ AS , result(e, a) = failure if
and only if a(s) = false.
Let e ∈ E be an experimental test and consider e¬(e) defined as follows:

1. run test e
2. if e is unsuccessful terminate successfully
3. if e is successful terminate unsuccessfully.

Since e is repeatable and can be executed by anybody, e¬(e) is also repeatable and can be executed by anybody. Therefore we are justified to assume e¬(e) ∈ E.
Now let s be a verifiable statement. Then we can find e ∈ E such that result(e, a) = success if and only if a(s) = true. We also have result(e¬(e), a) = failure if and only if result(e, a) = success, and a(¬s) = false if and only if a(s) = true. Therefore, for all a ∈ AS, result(e¬(e), a) = failure if and only if a(¬s) = false. This means ¬s is falsifiable if and only if s is verifiable. This justifies the definition.

14 Note that we are on purpose avoiding induction. It does not play any role in our general mathematical theory of experimental science since the decision of when and how to apply induction violates the principle of scientific objectivity.

Definition 1.29. A decidable statement is a statement that can be shown to be either true or false experimentally. Formally, a statement s is decidable if s ∈ Sv and ¬s ∈ Sv. We denote Sd ⊆ Sv the set of all decidable statements.

Justification. Note that the informal definition is based on experimentally showing that
the statement is either true or false, while the formal definition is based on both the
decidable statement and its negation to be verifiable. We have to show they are equivalent.
Let s ∈ S be a decidable statement and e ∈ E an experimental test that determines whether the statement is true or false. That is, result(e, a) = success if and only if a(s) = true and result(e, a) = failure if and only if a(s) = false for all a ∈ AS. Then s is verifiable. We also have result(e¬(e), a) = success if and only if a(¬s) = true. Therefore ¬s is also verifiable.
Conversely, let s ∈ S be a verifiable statement such that ¬s is also verifiable. Let e, e¬ ∈ E
be their respective experimental tests. We have to be careful as e and e¬ may not terminate.
Consider the procedure ê(e, e¬ ) defined as follows:

1. initialize n to 1
2. run the test e for n seconds
3. if e is successful, terminate successfully
4. run the test e¬ for n seconds
5. if e¬ is successful, terminate unsuccessfully
6. increment n and go to step 2

The procedure is repeatable and can be executed by anybody, therefore ê(e, e¬) ∈ E. Both e and e¬ are eventually run for an arbitrarily long amount of time, and exactly one of s and ¬s is true, so one of the two tests must eventually succeed: result(ê(e, e¬), a) ∈ {success, failure}, that is, the test always terminates. We have result(ê(e, e¬), a) = success if and only if result(e, a) = success if and only if a(s) = true. We also have result(ê(e, e¬), a) = failure if and only if result(e¬, a) = success if and only if a(¬s) = true if and only if a(s) = false. Therefore s is decidable. This justifies the definition.
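The interleaving at the heart of ê(e, e¬) can be sketched in Python as follows (a toy model of ours: tests are generators that yield None while running and “success” when they succeed, resuming a paused generator stands in for re-running a repeatable test for longer, and the round cap exists only so the sketch itself terminates).

    def run_for(test, steps):
        """Advance `test` by at most `steps` yields; report 'success'
        if it succeeds within this time slice, None otherwise."""
        for _ in range(steps):
            try:
                if next(test) == "success":
                    return "success"
            except StopIteration:
                return None
        return None

    def verify_after(k):  # a test that succeeds after k steps
        for _ in range(k):
            yield None
        yield "success"

    def never():  # a test that never terminates
        while True:
            yield None

    def decide(e, e_not, max_rounds=1000):
        """Mirror the procedure ê(e, e¬): alternate between the two tests
        with growing time slices; terminate successfully if e succeeds,
        unsuccessfully if e_not succeeds."""
        for n in range(1, max_rounds + 1):
            if run_for(e, n) == "success":
                return "success"
            if run_for(e_not, n) == "success":
                return "failure"
        return "undefined"  # cap reached; a real run would keep going

    print(decide(verify_after(7), never()))  # success: the statement is true
    print(decide(never(), verify_after(3)))  # failure: the statement is false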

Corollary 1.30. Certainties and impossibilities are decidable statements.

Proof. Let ⊤ ∈ S be a certainty and ⊥ ∈ S be an impossibility. By 1.27 they are verifiable statements. We also have ⊤ ≡ ¬⊥, therefore they are decidable by definition.

We introduce decidable statements here because their definition and related properties clarify what happens during negation, but they do not play a major role in our framework. They represent a special case to which we will turn time and time again over the course of this work.
Combining verifiable statements with conjunction (i.e. the logical AND) is more straight-
forward. If we are able to verify that “that animal is a swan” and that “that animal is black”,
we can verify that “that animal is a black swan” by verifying both. If the tests for both are
successful, then the test for the conjunction is successful. That is, if we have two or more
verifiable statements, we can always construct an experimental test for the logical AND by
running all tests one at a time and checking that they are successful. Yet, the number of tests needs
to be finite or we would never terminate, so we are limited to the conjunction of a finite
number of verifiable statements.

Axiom 1.31 (Axiom of finite conjunction verifiability). The conjunction of a finite collection of verifiable statements is a verifiable statement. Formally, let {s_i}_{i=1}^n ⊆ Sv be a finite collection of verifiable statements. Then the conjunction ⋀_{i=1}^n s_i ∈ Sv is a verifiable statement.

Justification. Let {s_i}_{i=1}^n ⊆ Sv be a finite collection of verifiable statements. Then we can find a corresponding set of experimental tests {e_i}_{i=1}^n ⊆ E such that result(e_i, a) = success if and only if a(s_i) = true for all a ∈ AS.
Let ⋀_{i=1}^n e_i be the experimental procedure defined as follows:

1. for each i = 1..n run the test e_i
2. if all tests terminate successfully then terminate successfully
3. terminate unsuccessfully.

The experimental procedure so defined is repeatable and can be executed by anybody, therefore ⋀_{i=1}^n e_i ∈ E. We have, for every a ∈ AS, result(⋀_{i=1}^n e_i, a) = success if and only if result(e_i, a) = success for all i, if and only if a(s_i) = true for all i, if and only if a(⋀_{i=1}^n s_i) = true. Therefore ⋀_{i=1}^n s_i is a verifiable statement. We are therefore justified to assume that the finite conjunction of verifiable statements is a verifiable statement.
Note that this cannot be generalized to infinite collections as the procedure would not be guaranteed to terminate. In fact, if the only way to test the infinite conjunction is to test each statement individually, then we are guaranteed to take infinite time and the test will never terminate. Therefore we are not justified to assume the infinite conjunction of verifiable statements is verifiable.
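As a minimal illustration (the names and callables are our own; each sub-test here is total, i.e. always terminates), the conjunction test can be sketched as:

    def conjunction_test(tests):
        """Run each test to completion, one at a time; succeed iff all do.
        A sub-test that never terminated would hang the whole procedure,
        which is why only finite conjunctions are covered by the axiom."""
        for e in tests:
            if e() != "success":
                return "failure"
        return "success"

    is_swan = lambda: "success"
    is_black = lambda: "success"
    print(conjunction_test([is_swan, is_black]))  # success: a black swan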

Combining verifiable statements with disjunction (i.e. the logical OR) is also straightfor-
ward. To verify that “the swan is black or white” we can first test that “the swan is black”.
If that is verified that’s enough: the swan is black or white. If not, we test that “the swan
is white”. That is, if we have two or more verifiable statements we can always construct an
experimental test for the logical OR by running all tests and stopping at the first one that is
successful. Because we stop at the first success, the number of tests can be countably infinite.
As long as one test succeeds, which will always be the case when the overall test succeeds, it
does not matter how many elements we are not going to verify later. But it cannot be more
than countably infinite since the only way we have to find if one experimental test in the set
is successful is testing them all one by one. Therefore we are limited to the disjunction of a
countable number of verifiable statements.

Axiom 1.32 (Axiom of countable disjunction verifiability). The disjunction of a countable collection of verifiable statements is a verifiable statement. Formally, let {s_i}_{i=1}^∞ ⊆ Sv be a countable collection of verifiable statements. Then the disjunction ⋁_{i=1}^∞ s_i ∈ Sv is a verifiable statement.

Justification. Let {s_i}_{i=1}^∞ ⊆ Sv be a countable collection of verifiable statements. Then we can find a corresponding set of experimental tests {e_i}_{i=1}^∞ ⊆ E such that result(e_i, a) = success if and only if a(s_i) = true for all a ∈ AS.
In this case, we have to be careful to handle tests that may not terminate. Let ⋁_{i=1}^∞ e_i be the experimental procedure defined as follows:

1. initialize n to 1
2. for each i = 1..n
   a) run the test e_i for n seconds
   b) if e_i terminates successfully then terminate successfully
3. increment n and go to step 2

The experimental procedure so defined is repeatable and can be executed by anybody, therefore ⋁_{i=1}^∞ e_i ∈ E. The procedure will eventually run all tests for an arbitrarily long amount of time. Therefore, for every a ∈ AS, result(⋁_{i=1}^∞ e_i, a) = success if and only if result(e_i, a) = success for some i, if and only if a(s_i) = true for some i, if and only if a(⋁_{i=1}^∞ s_i) = true. Therefore ⋁_{i=1}^∞ s_i is a verifiable statement. We are therefore justified to assume that the countable disjunction of verifiable statements is a verifiable statement.
Note that this cannot be generalized to uncountable collections as the procedure would not be guaranteed to eventually run all tests. In fact, suppose the only way to test the uncountable disjunction is to test each statement individually. We would then need to, at least, run each of them for a finite time, say one minute. Even assuming arbitrarily long time, we would only have countably many minutes at our disposal. Since the set of tests is uncountable, we are not going to be able to run each test for one minute. Therefore we are not justified to assume the uncountable disjunction of verifiable statements is verifiable.
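The dovetailing schedule above can be sketched as follows (a toy model of ours, with tests as generators that yield None until a possible “success”, as in the earlier sketch; the round cap is an artifact of the demonstration). New tests are brought into play one per round while every test already in play gets an ever longer time slice, so a success anywhere in the countable collection is eventually found.

    from itertools import count

    def run_for(test, steps):
        """Advance `test` by at most `steps` yields; report a success."""
        for _ in range(steps):
            try:
                if next(test) == "success":
                    return "success"
            except StopIteration:
                return None
        return None

    def verify_after(k):
        for _ in range(k):
            yield None
        yield "success"

    def never():
        while True:
            yield None

    def disjunction_test(tests, max_rounds=50):
        """Dovetail countably many tests: at round n, run e_1..e_n for
        n steps each; terminate successfully at the first success."""
        started = []
        for n in range(1, max_rounds + 1):
            while len(started) < n:
                started.append(next(tests))  # bring test e_n into play
            for e in started:
                if run_for(e, n) == "success":
                    return "success"
        return "undefined"  # cap reached; a real run would keep going

    # Only the sixth test ever succeeds; all the others never terminate.
    print(disjunction_test(verify_after(3) if k == 5 else never() for k in count()))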

Proposition 1.33. The conjunction and disjunction of a finite collection of decidable statements are decidable. Formally, let {s_i}_{i=1}^n ⊆ Sd be a finite collection of decidable statements. Then the conjunction ⋀_{i=1}^n s_i ∈ Sd and the disjunction ⋁_{i=1}^n s_i ∈ Sd are decidable statements.

Proof. Let {s_i}_{i=1}^n ⊆ Sd be a finite collection of decidable statements. Then {s_i}_{i=1}^n ⊆ Sv and {¬s_i}_{i=1}^n ⊆ Sv are verifiable. Consider ⋀_{i=1}^n s_i: this is the finite conjunction of verifiable statements and is therefore a verifiable statement by 1.31. Consider its negation ¬⋀_{i=1}^n s_i ≡ ⋁_{i=1}^n ¬s_i: this is the finite disjunction of verifiable statements and is therefore a verifiable statement by 1.32. The finite conjunction of decidable statements is decidable by definition.
Similarly, consider ⋁_{i=1}^n s_i: this is the finite disjunction of verifiable statements and is therefore a verifiable statement by 1.32. Consider its negation ¬⋁_{i=1}^n s_i ≡ ⋀_{i=1}^n ¬s_i: this is the finite conjunction of verifiable statements and is therefore a verifiable statement by 1.31. The finite disjunction of decidable statements is decidable by definition.
Note that this cannot be generalized to infinite collections as it would require closure under infinite conjunction. Also note that this result is consistent with the experimental tests given in 1.31 and 1.32.

Taken as a whole, finite conjunction and countable disjunction define the algebra of
verifiable statements. It is limited compared to the algebra of statements and it tells us
that, in practice, we are not going to be able in general to construct an experimental test
whose success is an arbitrary function of the success of other tests.

Operator      Gate   Statement   Verifiable Statement   Decidable Statement
Negation      NOT    allowed     disallowed             allowed
Conjunction   AND    arbitrary   finite                 finite
Disjunction   OR     arbitrary   countable              finite

Table 1.3: Comparing algebras of statements.

Before we continue, it is interesting and useful to stop and understand the interplay be-
tween scientific and mathematical constructs.15 Technically, 1.2, 1.4, 1.9, 1.27, 1.31 and 1.32
are the axioms of our mathematical formalism for statements. Note that the actual content of
the statements, the methodology for which an assignment is deemed possible and the proce-
dures for experimental verification are not formally defined: the math simply uses symbols to
label and identify them. The only assumptions are that statements exist (organized in logical
contexts with possible assignments, one of which is the true one), that some of them are
verifiable and that they admit the associated algebra. The mathematical formalism does not
know what the symbols actually represent: they may as well be pieces of cardboard painted
black or white. The math can only derive consequences given the premises, but it does not know whether the premises make actual sense. In other words, the way that we are making the framework mathematically precise is not by making everything mathematically precise: it is by omitting the details that are not amenable to a precise specification.

15 It took us many, many confusing years to fully understand where the scientific argument ends and the mathematical argument begins, what makes sense to assume physically and what makes sense to prove rigorously. Part of the confusion is that this line is not objective but is based on what is considered “precise” by a mathematician, which has evolved considerably through the centuries. The rule we follow is: in the mathematical formalism the only objects that can be left unspecified are the elements of a set.
We should stress this for a couple of reasons. First, the part that is not formalized is the
most important part. Discovering new science is exactly finding new things to study (i.e. new
statements), new connections between known objects (i.e. new logical relationships) or devising new measurement techniques (i.e. new experimental tests). The content of the statements,
their semantic relationships and the procedure of the experimental tests is the actual science.
Everything that follows is, in a sense, the trivial bit and that is why it can be done generally.
Which leads to the second reason: understanding whether statements and verifiable state-
ments actually follow the algebras we defined is crucial. The math just takes it at face value,
it does not prove it. The justifications for our axioms, then, are the most critical part of this
work and they are not mathematical proofs. If we botch them, we’ll have a nice, consistent,
rich but meaningless mathematical framework. Lastly, it has to be clear that something gets
lost in the formalization. The mathematical framework cannot carry all the physics content:
we removed the most important part! Different systems may have the same mathematical
description, so the scientific content can never be entirely reconstructed from the math. That
is why we always have to carefully bring it along.
Now that we have characterized verifiable statements we want to understand how to
characterize groups of them. Consider the verifiable statements
“that animal is a duck”
“that animal is a swan”
“that animal is white”
“that animal is black”
“that animal is a black swan”
“that animal is a white duck”
“that animal is a duck or a swan”
Since some depend on others, we do not need to actually run all the tests. Once we have
tested the first four we have gathered enough information for the others. We call this set a
basis.16

16 The term basis is used in general to define a set of objects from which, through a series of operations, one can construct the full space. It is the same for a vector space: from a basis one can construct any other vector through linear combination. What changes is what objects are combined and what operations are used.

Definition 1.34. Given a set D of verifiable statements, B ⊆ D is a basis if the truth values of B are enough to deduce the truth values of the whole set. Formally, all elements of D can be generated from B using finite conjunction and countable disjunction.
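A toy computational picture may help here (our own construction, purely illustrative: the animal cases and truth sets are made up). Identify each statement with the set of possible assignments in which it is true; conjunction becomes intersection and disjunction becomes union, so the set generated by a basis can be computed as a closure.

    everything = frozenset({"duck", "swan", "goose"})  # toy possible assignments

    basis = [
        frozenset({"duck"}),           # "that animal is a duck"
        frozenset({"swan"}),           # "that animal is a swan"
        frozenset({"duck", "goose"}),  # "that animal is white" (hypothetical)
    ]

    def generated(basic_sets, everything):
        """Close the basic truth sets under intersection and union,
        adding the impossibility (empty set) and the certainty."""
        sets = set(basic_sets) | {frozenset(), everything}
        changed = True
        while changed:
            changed = False
            for a in list(sets):
                for b in list(sets):
                    for c in (a & b, a | b):
                        if c not in sets:
                            sets.add(c)
                            changed = True
        return sets

    print(len(generated(basis, everything)))  # 6: all deducible statements

Testing the three basic statements determines the truth of all six generated ones, which is exactly what makes the three a basis for this toy domain.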

Note that once we have tested the basis, we have tested any other verifiable statement
that can be constructed from it. In the example before, once we tested the first four we
have implicitly tested “that animal is a black duck”. It is also true that impossibilities and
certainties don’t really need to be tested. We already know that “that animal is a duck and
a swan” is false. The idea, then, is to group verifiable statements into experimental domains
that can be seen as all the experimental information one can gather for a particular subject.
These will include the certainty, the impossibility and any other verifiable statement that can
be constructed from a basis. The basis, though, has to be countable so that, by running one
test at a time, we can hope to eventually reach any element.

Definition 1.35. An experimental domain D represents a set of verifiable statements that can be tested and possibly verified in an indefinite amount of time. Formally, it is a set of statements, closed under finite conjunction and countable disjunction, that includes precisely the certainty, the impossibility, and a set of verifiable statements that can be generated from a countable basis.
Justification. In principle, indefinite is different from infinite. Having infinite time at our
disposal literally means that we can go on forever, which we cannot do. Having indefinite
time means that, while at some point we have to stop, we have to be prepared to keep
going on because we do not know when we will stop. In practice, in both cases we have
to make a plan for an infinite amount of time. In the indefinite case, our plan will be cut
short.
As we have already argued, we cannot run uncountably many tests in infinite time. Let E′ ⊂ E be an uncountable set of experimental tests. Let t0 be the amount of time that the shortest test takes to run. Each test must run for at least t0 time. Given that each test will succeed in finite non-zero time, t0 is at best a finite non-zero number. We can divide time into slots of duration t0. The slots are countable. As E′ is uncountable, we cannot associate each test to a slot: we cannot run uncountably many tests.
We can, however, run countably many tests. Let B = {e_i}_{i=1}^∞ ⊆ E be a countable set of tests. We can proceed as follows:

1. initialize n to 1
2. for each i = 1..n
   a) run the test e_i for n seconds
3. increment n and go to step 2

This will run all tests for an indefinite amount of time.


Now let D ⊆ S be the set of statements generated by B using finite conjunction and
countable disjunction. Then D ⊆ Sv is a set of verifiable statements. Furthermore, testing
B will eventually verify each true statement in D.
Therefore we are justified to assume that a set D ⊆ Sv of verifiable statements that can
be tested in an indefinite amount of time must have a countable basis. We are also justified
to assume that it contains the certainty and the impossibility as these are two verifiable
statements, and two verifiable statements can always be added to a countable basis and
keep it countable. This justifies the definition.

We can think of an experimental domain as the enumeration of all possible verifiable
answers to a scientific question. For example, the domain related to the question “what is
that animal?” would include “it is a mammal”, “it is a dog”, “it is an animal with feathers”
and so on. If two statements are possible answers to that question, then their conjunction
and disjunction will also be possible answers. For example: “it is a dog or a cat” or “it is a
mammal and it lays eggs”.
While each statement only needs finite time to be verified, we allow indefinite time for the
domain because we want to capture those questions that can be answered only approximately.
The idea is that, given more time, we can always get a better answer so, in principle, we have
an infinite sequence of tests to perform and continue indefinitely.
The basis not only serves as a way to constrain the size of the experimental domain, but
most of the time it will also serve to define the experimental domain itself. We will typically
start by characterizing a set of verifiable statements (e.g. a set of characteristics of animals and
how to identify them) and then consider the domain of all the verifiable statements that can
be constructed from them (e.g. the set of all animals and groups of animals we can identify).

1.3 Theoretical domains and possibilities


The basis for an experimental domain allows us to create a procedure that will eventually test
any verifiable statement. But to fully characterize a domain we want to find those statements,
like “that animal is a cat” or “the mass of the photon is exactly 0 eV”, that, if true, determine
the truth value of all verifiable statements in the domain. The main result of this section is
that these statements, which we call possibilities for the domain, are not necessarily verifiable
themselves. We will therefore need to introduce theoretical domains which consist of those
statements that can be associated to a test. These tests are constructed from those associated
to verifiable statements from an experimental domain. We will also be able to conclude that
the set of possibilities for an arbitrary experimental domain has at most the cardinality of the
continuum, thus putting a hard constraint on what type of mathematical objects are useful
in science.
Suppose DX is the domain of animal species identification. It will contain verifiable state-
ments such as “that animal has feathers”, “that animal has claws”, “that animal has paws”.
Some statements are broader and some are narrower. But some statements, like “that animal
is a mute swan (Cygnus olor)” or “that animal is a mallard duck (Anas platyrhynchos)”, are
special because if we verify those then we are able to know which other statements are true
or false. Once we verify those we are essentially done. These are what we call the possibilities
of the domain and enumerating them means characterizing the experimental domain.
Unfortunately, not all possibilities are verifiable statements. Consider the statements
s1 =“there is extra-terrestrial life” and s2 =“there is no extra-terrestrial life”. We can cre-
ate an experimental test to verify the first (i.e. find extra-terrestrial life somewhere) but not
the second (it would require us to check every place in the universe which is something we
cannot do). So, for this question, the experimental domain DX = {s1, ⊤, ⊥} is composed of the first statement, the certainty and the impossibility. But s2 is conceptually still one of the possibilities: if true, we have a complete answer for the domain.
What happens is that while the negation of a verifiable statement is not always a verifiable
statement, it is still a statement that can be associated to an experimental test. As such, while
we cannot verify the statement, we can still predict the outcome of its test, including non-
termination. To be able to find all possibilities, then, we have to create the set of statements
that can be associated with experimental tests regardless of termination, which will include
negations. We call this set the theoretical domain and theoretical statements its elements.

Definition 1.36. The theoretical domain D̄ of an experimental domain D is the set of statements constructed from D to which we can associate a test regardless of termination. We call theoretical statement a statement that is part of a theoretical domain. More formally, D̄ is the set of all statements generated from D using negation, finite conjunction and countable disjunction.

Justification. The theoretical domain is defined to contain all statements that depend on the verifiable statements and to which we can associate an experimental test. A statement in the theoretical domain, therefore, must be associated to a procedure that can be constructed in terms of the tests of the verifiable statements, regardless of whether the procedure is guaranteed to terminate.
First of all, any verifiable statement is a theoretical statement. In fact, let s ∈ D be a verifiable statement. Then there is a test e ∈ E associated to s. Therefore s ∈ D̄ is a theoretical statement.
In the previous justifications, we saw how to construct tests for negation, finite conjunction and countable disjunction. We can use these to construct tests of theoretical statements from other theoretical statements. Let s ∈ D̄ be a theoretical statement. Then there is a test e ∈ E associated to s. Consider ¬s: this can be associated with the test e¬(e) ∈ E, therefore ¬s is a theoretical statement. Let {s_i}_{i=1}^n ⊆ D̄ be a finite collection of theoretical statements. Then there is a finite collection of tests {e_i}_{i=1}^n ⊆ E, each associated to the respective statement. Consider ⋀_{i=1}^n s_i: this can be associated with the test ⋀_{i=1}^n e_i ∈ E, therefore ⋀_{i=1}^n s_i is a theoretical statement. Let {s_i}_{i=1}^∞ ⊆ D̄ be a countable collection of theoretical statements. Then there is a countable collection of tests {e_i}_{i=1}^∞ ⊆ E, each associated to the respective statement. Consider ⋁_{i=1}^∞ s_i: this can be associated with the test ⋁_{i=1}^∞ e_i ∈ E, therefore ⋁_{i=1}^∞ s_i is a theoretical statement.
Therefore we are justified to assume the theoretical domain contains the closure of the experimental domain under negation, finite conjunction and countable disjunction. As shown later, this will automatically include countable conjunction as well.
Note that, as before, this cannot be generalized to uncountable operations, as a procedure can be composed only of countably many non-zero-time operations. Therefore we are not justified to close under uncountable operations.

Because of its construction, the theoretical domain will also include all the limits of all the sequences of verifiable statements. Consider the experimental domain for the mass of the photon. It will contain verifiable statements such as

s1 = “the mass of the photon is smaller than 10⁻¹ eV”
s2 = “the mass of the photon is smaller than 10⁻² eV”
s3 = “the mass of the photon is smaller than 10⁻³ eV”
...

It will not contain the statement s = “the mass of the photon is exactly 0 eV”, though, as we cannot measure a continuous quantity with infinite precision.
Note that s can be seen as the limit of the sequence of ever increasing precision, but it can also be seen as the conjunction of all those statements: s = ⋀_{i=1}^∞ s_i. In fact, the mass of the photon is exactly 0 if and only if all the finite precision measurements contain 0 in their range. It makes sense, then, that it is not part of the experimental domain, because only finite conjunctions of verifiable statements are verifiable. But we expect s to be a possibility for the mass of the photon. Why should it be in the theoretical domain?
Because of the De Morgan properties in 1.19, we can express conjunctions in terms of negation and disjunction. So we have s = ⋀_{i=1}^∞ s_i = ¬⋁_{i=1}^∞ ¬s_i. Therefore by allowing negation we are also allowing countable conjunction, and therefore we are including all the limits of sequences of verifiable statements.

Proposition 1.37. All theoretical domains are closed under countable conjunction.

Proof. Any countable conjunction s = ⋀_{i=1}^∞ s_i is equivalent to the negation of the disjunction of the negations: s = ¬⋁_{i=1}^∞ ¬s_i. As the theoretical domain is closed under negation and countable disjunction, it is closed under countable conjunction.

Definition 1.38. A theoretical statement s ∈ D̄ is approximately verifiable if it is the limit of some sequence of verifiable statements. Formally, s ∈ D̄ is approximately verifiable if there exists a sequence {s_i}_{i=1}^∞ ⊆ D such that s = ⋀_{i=1}^∞ s_i.

Note that we are closed under countable operations and not arbitrary (e.g. uncountable).
Therefore there could be statements that can be constructed from verifiable statements that
are not even theoretical statements. One such statement, for example, could be constructed
given a set U of possible mass values for a particle that is uncountable, has an uncountable
complement, and where the elements are picked arbitrarily and not according to a simple
rule.17 The statement “the mass of the particle expressed in eV is in the set U ” can only be
tested by checking each value individually. But since the set is uncountable and a procedure
can only be made of countably many steps, it will be impossible to construct a test for such
a statement.
A theoretical statement, then, is one for which we can at least conceive an experimental
test. This may not always terminate if the statement is true or it may not always terminate
if the statement is false, but at least we have one. The statements that depend on the exper-
imental domain but are not part of the theoretical domain do not even hypothetically allow
for a procedure, regardless of whether it terminates, and therefore we do not consider
them part of our scientific discourse, even theoretically.
In general, given a theoretical statement s̄, we would like to characterize what experimen-
tal test can be associated to it. Ideally, we want the experimental test for that statement
that terminates, successfully or unsuccessfully, in the most cases. Consider all the verifiable
statements that are narrower than s̄. If we take their disjunction we get the broadest verifiable
statement that is still narrower than s̄. We call this the verifiable part of s̄, denoted ver(s̄). Test-
ing ver(s̄) means running the test that is guaranteed to terminate successfully in the broadest
situations in which s̄ is true. In fact, if s̄ is itself verifiable then ver(s̄) will be exactly s̄.
Conversely, consider all the verifiable statements that are incompatible with s̄. If we take
their disjunction we get the broadest verifiable statement that is still incompatible with s̄. We
call this the falsifiable part of s̄, denoted fal(s̄). Testing fal(s̄) means running the test that is
guaranteed to terminate successfully in the broadest situations in which s̄ is false. In fact, if
s̄ is itself falsifiable then fal(s̄) will be exactly ¬s̄.
To each theoretical statement, then, we associate the experimental test constructed by
returning successfully if the test for the verifiable part succeeds and returning unsuccessfully
if the test for the falsifiable part succeeds. We will not be able to terminate if neither of those succeeds, which corresponds to the statement ¬ver(s̄) ∧ ¬fal(s̄) being true. We call this the undecidable part of s̄, denoted und(s̄).

17 Mathematically, we are looking for a set of real numbers that is not a Borel set.
In light of this, consider s =“the mass of the photon is rational as expressed in eV”. It is
the disjunction of all possibilities with rational numbers, which is countable, and therefore is a
theoretical statement. Since we can only experimentally verify finite precision intervals, each
verifiable statement will include infinitely many rational (and irrational) numbers. Therefore
no verifiable statement is narrower than s, so ver(s) ≡ ⊥. But for the same reason no verifiable statement is incompatible with s, so fal(s) ≡ ⊥. This means und(s) ≡ ⊤: the experimental test for s will never terminate either successfully or unsuccessfully. We call this type of statement undecidable, as we will never be able to experimentally test anything about it.

Definition 1.39. Let s̄ ∈ D̄ be a theoretical statement. We call the verifiable part ver(s̄) = ⋁_{s∈D | s≼s̄} s the broadest verifiable statement that is narrower than s̄. We call the falsifiable part fal(s̄) = ⋁_{s∈D | s≁s̄} s the broadest verifiable statement that is incompatible with s̄. We call the undecidable part und(s̄) = ¬ver(s̄) ∧ ¬fal(s̄) the broadest statement incompatible with both the verifiable and the falsifiable part.

Justification. Let e_s̄ be the optimal test for s̄ ∈ D̄, that is, the experimental test constructible from those associated to D that terminates under the most conditions. Let ver(s̄) be the verifiable statement associated with that test. We must have ver(s̄) ∈ D since it is constructible from elements of the domain. Consider s∨ = ⋁_{s∈D | s≼s̄} s. We have s∨ ≼ s̄, and the test e associated with s∨ terminates successfully only if s̄ is true. Since e_s̄ terminates under the most conditions, it must terminate successfully whenever e terminates successfully: if it didn't, we could construct the test e_s̄ ∨ e, which would terminate in more cases, and therefore e_s̄ would not be optimal. Therefore we must have ver(s̄) ≽ s∨. However, we cannot have ver(s̄) ≻ s∨: since ver(s̄) ∈ D and ver(s̄) ≼ s̄, ver(s̄) is itself one of the disjuncts that define s∨, so s∨ ≽ ver(s̄) and ver(s̄) ≻ s∨ would imply ver(s̄) ≻ ver(s̄). Therefore we must have ver(s̄) = ⋁_{s∈D | s≼s̄} s.
Along the same lines, let fal(s̄) be the verifiable statement associated with e¬(e_s̄). Noting that s ≼ ¬s̄ if and only if s ≁ s̄, we find fal(s̄) = ⋁_{s∈D | s≁s̄} s. This justifies the definitions.
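In the same toy truth-set picture used earlier (assignments as elements, statements as the sets of assignments on which they hold; the interval-like sets below are illustrative assumptions), the three parts can be computed directly from the definition:

    everything = frozenset(range(5))  # toy assignments, e.g. coarse mass values
    D = [frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})]

    def ver(s_bar, D):
        """Union of all verifiable sets narrower than s̄ (contained in it)."""
        return frozenset().union(*(s for s in D if s <= s_bar))

    def fal(s_bar, D):
        """Union of all verifiable sets incompatible with s̄ (disjoint from it)."""
        return frozenset().union(*(s for s in D if not (s & s_bar)))

    def und(s_bar, D, everything):
        """Whatever is in neither the verifiable nor the falsifiable part."""
        return everything - (ver(s_bar, D) | fal(s_bar, D))

    s_bar = frozenset({2})  # "the value is exactly 2": no basic set fits inside
    print(set(ver(s_bar, D)))              # set(): never verifiable
    print(set(fal(s_bar, D)))              # {0, 1, 3, 4}
    print(set(und(s_bar, D, everything)))  # {2}

As in the photon-mass example, a statement of infinite precision has an empty verifiable part: it can only be excluded, never confirmed.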

Corollary 1.40. The verifiable, falsifiable and undecidable parts partition the certainty. That is, for every s̄ ∈ D̄ we have ⊤ ≡ ver(s̄) ∨ und(s̄) ∨ fal(s̄) while ver(s̄) ≁ und(s̄), ver(s̄) ≁ fal(s̄) and und(s̄) ≁ fal(s̄).

Proof. We have und(s̄) = ¬ver(s̄) ∧ ¬fal(s̄) ≡ ¬(ver(s̄) ∨ fal(s̄)). Therefore ⊤ ≡ und(s̄) ∨ ¬und(s̄) ≡ und(s̄) ∨ ver(s̄) ∨ fal(s̄).
Since ver(s̄) ≼ s̄ and fal(s̄) ≁ s̄, we have ver(s̄) ≁ fal(s̄). Since und(s̄) ≡ ¬ver(s̄) ∧ ¬fal(s̄) ≼ ¬ver(s̄), we have und(s̄) ≁ ver(s̄). Similarly, und(s̄) ≼ ¬fal(s̄) and therefore und(s̄) ≁ fal(s̄).

Corollary 1.41. A theoretical statement s̄ ∈ D̄ is verifiable if and only if s̄ ≡ ver(s̄). It is falsifiable if and only if ¬s̄ ≡ fal(s̄). It is decidable if and only if und(s̄) ≡ ⊥.

Proof. Let s̄ be a verifiable statement. Then s̄ itself is the broadest verifiable statement narrower than itself and therefore s̄ ≡ ver(s̄). Conversely, let s̄ ≡ ver(s̄); then s̄ is equivalent to a verifiable statement and is therefore verifiable.
Let s̄ be a falsifiable statement. Then ¬s̄ itself is the broadest verifiable statement incompatible with s̄ and therefore ¬s̄ ≡ fal(s̄). Conversely, let ¬s̄ ≡ fal(s̄); then ¬s̄ is equivalent to a verifiable statement and s̄ is therefore falsifiable.
Let s̄ be a decidable statement. Then und(s̄) = ¬ver(s̄) ∧ ¬fal(s̄) ≡ ¬s̄ ∧ ¬¬s̄ ≡ ¬s̄ ∧ s̄ ≡ ⊥. Conversely, let und(s̄) ≡ ⊥. Then ver(s̄) and fal(s̄) are two incompatible statements whose disjunction is a certainty, which means ver(s̄) ≡ ¬fal(s̄). Since ¬fal(s̄) ≡ ver(s̄) ≼ s̄ ≼ ¬fal(s̄) ≡ ver(s̄), then s̄ ≡ ver(s̄) and ¬s̄ ≡ fal(s̄). Therefore s̄ is decidable.

Corollary 1.42. For every theoretical statement s̄, the undecidable part und(s̄) is a falsi-
fiable statement.

Proof. Since und(s̄) ≡ ¬(ver(s̄) ∨ fal(s̄)), it is the negation of a verifiable statement (ver(s̄) ∨ fal(s̄) is the finite disjunction of two verifiable statements) and is therefore falsifiable.
Definition 1.43. A theoretical statement s̄ ∈ D̄ is undecidable if und(s̄) ≡ ⊤.

The theoretical domain, then, contains all sorts of edge cases. Similar to undecidable statements, we could have statements that can just never be verified experimentally (i.e. ver(s) ≡ ⊥) or just never falsified (i.e. fal(s) ≡ ⊥). The possible presence of these types of statements means we have to be cautious in giving physical significance to all theoretical statements, as the only thing we may be able to say about some is that nothing can be said about them.18
In particular, we want to be able to understand when two statements represent two distinct
situations that we can tell apart experimentally. If the two statements are verifiable (e.g. “the
mass is between 1 and 2 kg” and “the mass is between 2 and 3 kg”), incompatibility will be enough to tell us that the two cases are distinct (only one can happen at a time) since verifying one will be enough to exclude the other. If the statements are not both verifiable (e.g. “the mass is between 1 and 2 kg” and “the mass is exactly 2 kg”), being incompatible
is not enough: one may lie in the undecidable part of the other in which case we cannot verify
one and exclude the other at the same time. In that case, we would not be able to distinguish
the two cases experimentally.
For two statements s̄1 and s̄2 to be experimentally distinguishable, then, we need to have
an experimental test that always terminates successfully in one case and always terminates
unsuccessfully in the other. That is, we need to have a third statement s̄ whose verifiable
part is broader than one (e.g. ver(s̄) ≽ s̄1 ) and whose falsifiable part is broader than the other
(e.g. fal(s̄) ≽ s̄2 ). This way if s̄ is found to be true experimentally, we know that s̄1 may be
true and s̄2 must be false; if s̄ is found to be false experimentally, we know that s̄1 must be
false and s̄2 may be true; if the test for s̄ does not terminate, then both statements must be
false so we are in neither of those cases.

Definition 1.44. Let s̄1, s̄2 ∈ D̄ be two theoretical statements. We say they are experimentally distinguishable if there is an experimental test that can tell them apart. Formally, we can find a theoretical statement s̄ ∈ D̄ such that s̄1 ≼ ver(s̄) and s̄2 ≼ fal(s̄).

18 It will be a recurrent theme of this work to make a precise distinction between mathematical objects that represent physical entities (e.g. verifiable statements), those that represent idealizations of physical entities (e.g. theoretical statements) and those that do not have scientific standing (e.g. statements that are neither verifiable nor theoretical).
Justification. Suppose we have an experimental test e, constructible from those associated to D, that can tell s̄1 and s̄2 apart. Then we must have that, without loss of generality, result(e, a) = success for all a ∈ AS for which a(s̄1) = true and result(e, a) = failure for all a ∈ AS for which a(s̄2) = true. We can also take e to be an optimal test, since we could make it optimal by combining it with other tests. Let s̄ be a statement associated with that optimal test. Then we have s̄1 ≼ ver(s̄) and s̄2 ≼ fal(s̄). This justifies the definition.

Proposition 1.45. Two theoretical statements s̄1, s̄2 ∈ D̄ are experimentally distinguishable if and only if we can find two verifiable statements s1, s2 ∈ D such that s1 ≁ s2, s̄1 ≼ s1 and s̄2 ≼ s2.

Proof. Let s̄1, s̄2 ∈ D̄ be experimentally distinguishable. Then we can find s̄ such that s̄1 ≼ ver(s̄) and s̄2 ≼ fal(s̄). Moreover, ver(s̄) ≁ fal(s̄). Therefore ver(s̄) and fal(s̄) are verifiable statements that satisfy the condition in the proposition.
Conversely, let s1, s2 ∈ D be such that s1 ≁ s2, s̄1 ≼ s1 and s̄2 ≼ s2. We have s̄1 ≼ s1 ≡ ver(s1). We also have s̄2 ≼ s2 ≼ fal(s1), because fal(s1) by definition is broader than all verifiable statements incompatible with s1. Therefore s̄1 and s̄2 are experimentally distinguishable.
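In the toy truth-set picture, proposition 1.45 becomes a direct search for a pair of disjoint verifiable sets covering the two statements (the sets below are illustrative; in general D would also contain everything generated from them, which does not change these two answers):

    def distinguishable(s1_bar, s2_bar, D):
        """True iff some pair of disjoint verifiable sets covers the two
        statements, i.e. a single test can tell them apart."""
        return any(s1_bar <= s1 and s2_bar <= s2 and not (s1 & s2)
                   for s1 in D for s2 in D)

    D = [frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})]
    print(distinguishable(frozenset({0}), frozenset({3}), D))     # True
    print(distinguishable(frozenset({2}), frozenset({1, 2}), D))  # False: compatible statements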

To sum up, verifiable statements are the only ones that, perhaps under some simplifying
assumption, we can think of as tangible scientific objects (e.g. the idea that we can measure
mass with finite precision). From these we construct theoretical statements that represent our
idealizations (e.g. the infinitely precise value for the mass of a particle or the idea that said
value is a rational number expressed in a particular unit). And then there are statements that
are not even physically meaningful as they have no well defined experimental consequences.
We have seen that the theoretical domain may contain more statements, but we want to
make it clear that it never contains more information. That is, if we knew which verifiable
statements were true and which weren’t, we would automatically know which theoretical
statements would be true or not. So it is not adding extra cases. It is essentially completing
the list of answers by adding a “no” if only “yes” can be experimentally verified and vice-
versa. To see that this is the case, we can show that a basis for an experimental domain D
¯ That is, all verifiable and theoretical statements
is also a basis for its theoretical domain D.
can be expressed as functions of the same basis. The difference is that a theoretical statement
can also be a function of the negations of the elements of the basis.

Proposition 1.46. The truth values of the statements of a basis B for an experimental domain D are enough to determine the truth values for all statements in the associated theoretical domain D̄. More formally, all statements in the theoretical domain D̄ can be generated by negation, countable conjunction and countable disjunction from a basis B of D.

Proof. By definition of basis, any verifiable statement within the experimental domain D can be generated from B using only finite conjunction and countable disjunction. The certainty may be generated through negation and disjunction from any verifiable statement. Therefore B generates D, which in turn generates D̄ by definition. This means that B, through negation, countable conjunction and countable disjunction, generates all of D̄.
Having defined what a theoretical domain is, we can finally define what the possibilities
of a domain are: those statements that if known to be true determine the truth value of all
other statements.

Definition 1.47. A possibility for an experimental domain D is a statement x ∈ D̄ that, when true, determines the truth value for all statements in the theoretical domain. Formally, x ≢ ⊥ and for each s ∈ D̄, either x ≼ s or x ≁ s. The full possibilities, or simply the possibilities, X for D are the collection of all possibilities.

A possibility represents a complete answer for a scientific question. Only one of them can
be true and one of them must be true since the theoretical domain contains all negations.
But how can we construct them? Suppose D is the experimental domain for animal species
identification. There will be a set of possible truth assignments for it. Suppose B ⊂ D is a
basis of statements, like “that animal has feathers”, “that animal has claws”, “that animal
has paws” and so on, that allows us to fully identify the animal species. Then each possible
assignment for the basis will correspond to one and only one assignment for the domain.
Consider a minterm, a conjunction where each statement appears once either negated or
not. For example, “that animal has feathers”∧“that animal has claws”∧¬“that animal has
paws”∧... . If that statement is true, it will determine the truth value of all the basis, it will
select one possible truth assignment for the whole domain. Then a possibility for the domain
is simply a minterm of a basis that corresponds to a possible assignment for the domain.

Proposition 1.48. Let D be an experimental domain. A possibility for D is any minterm of a basis that is not impossible.

Proof. Let B ⊆ D be a basis for D. Let x be a minterm of B. Any theoretical statement s ∈ D̄ can be expressed as the disjunction of minterms of B by 1.26 and by 1.46. Either x is among the minterms needed to express s, or it is not. If it is, x ∧ s ≡ x and therefore x ≼ s. If it is not, x ∧ ¬s ≡ x and therefore x ≁ s. Therefore a minterm is either narrower than or incompatible with every theoretical statement and, if it is not impossible, it is a possibility by definition.
Conversely, suppose x ∈ D̄ is a possibility. As it is a theoretical statement, it can be expressed as a disjunction of minterms of a basis B. Suppose it were the disjunction of more than one non-impossible minterm. Then each such minterm would be strictly narrower than x; but x must be either narrower than or incompatible with each theoretical statement, and neither can hold for a non-impossible minterm strictly narrower than x. Therefore x must be expressed by a single minterm, and any possibility is a minterm.
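Continuing the toy picture, the possibilities of a finite basis can be enumerated directly as its non-impossible minterms, with negation as set complement (the basis sets are illustrative):

    from itertools import product

    everything = frozenset(range(5))
    basis = [frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3})]

    def possibilities(basis, everything):
        """Yield each non-impossible minterm: one conjunct per basis
        statement, taken either as-is or negated (complemented)."""
        for signs in product([True, False], repeat=len(basis)):
            m = everything
            for keep, b in zip(signs, basis):
                m = m & b if keep else m & (everything - b)
            if m:  # discard impossible minterms
                yield signs, m

    for signs, m in possibilities(basis, everything):
        print(signs, sorted(m))  # five possibilities: {1}, {0}, {2}, {3}, {4}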

Proposition 1.49 (No other possibilities). All statements that determine, and only determine, the truth value of all statements in a theoretical domain D̄ are possibilities of D. Formally, there is no statement s ∈ S that has all the properties of a possibility except that s ∉ D̄.

Proof. Let x ∈ S be a statement that determines, and only determines, the truth values of all statements in a theoretical domain D̄. This is equivalent to determining the truth values, and only the truth values, of all elements of a basis B ⊆ D. As we can find a countable basis, the statement x is equivalent to the countable conjunction of statements of B or their negations. Therefore x ∈ D̄, as it is generated by the statements of the basis through negation and countable conjunction. But a statement in D̄ that determines all truth values of the statements in D̄ is a possibility by definition. Therefore x is a possibility.
There is one possibility that is often forgotten and sometimes needs special handling.
Suppose one is trying to identify an illness by going through a series of known markers. It
may happen that no match for the disease is found because we are dealing with a new kind of
illness. In the same way, we may fail to measure the value of a quantity because it lies outside
the sensitive range of our equipment. In other words, it may be possible that none of our tests
succeed and none of the verifiable statements is verified. We call this possibility the residual
because it’s what remains after we went through all the cases we already know.
Note, though, that the residual possibility does not exist for all domains. Suppose we have
a basket of fruit and we want to count how many whole apples there are. There can only
be a finite number of them, and we can successfully identify all finite numbers: there is no
“something else” to be found in this case.

Definition 1.50. The residual possibility x̊ for an experimental domain D is, if it exists, the possibility that predicts that no test will be successful. Formally, let B ⊆ D be a basis; then x̊ = ⋀_{e∈B} ¬e if it is not impossible. An experimental domain is complete if it does not admit a residual possibility.
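In the toy truth-set picture the residual possibility is simply whatever the basis leaves uncovered (the sets are illustrative):

    def residual(basis, everything):
        """Assignments on which every basic test fails: the complement of
        the union of the basis truth sets. Empty iff the domain is complete."""
        return everything - frozenset().union(*basis)

    print(set(residual([frozenset({0, 1}), frozenset({2, 3})], frozenset(range(5)))))  # {4}
    print(set(residual([frozenset({0, 1}), frozenset({1, 2})], frozenset(range(3)))))  # set(): complete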

Definition 1.51. The established possibilities Ẋ = X ∖ {x̊} for an experimental domain are the set of all possibilities excluding the residual possibility.

As we refine our understanding and techniques for a domain of knowledge, we may find
that the residual possibility actually corresponds to multiple cases that weren’t previously
cataloged. So, intuitively, it is better thought of as a bucket that contains all that is yet to be
experimentally discovered within the particular domain of knowledge. Because of its special
nature, the residual possibility sometimes behaves differently than the other possibilities.
Therefore we will find that some theorems are more elegantly expressed in terms of the
established possibilities and some in terms of the full possibilities.
To sum up the mathematical structure of the domain, consider the following table:

  ┌─────────────────────── D̄ ───────────────────────┐
  ┌──────── D ────────┐
  ┌── B ──┐                       ┌────── X ──────┐
  b1 b2 ...  s1 s2 ...  s̄1 s̄2 ...  x1 x2 x3 x4 ...
  T  F  ...  T  F  ...  F  F  ...  T  F  F  F  ...
  T  T  ...  T  F  ...  F  T  ...  F  T  F  F  ...
  F  F  ...  T  T  ...  F  F  ...  F  F  T  F  ...
  T  F  ...  F  F  ...  T  T  ...  F  F  F  T  ...

On the left we find the basis elements b ∈ B. From these, through finite conjunction and countable disjunction, we construct all verifiable statements s ∈ D. From these, through negation, countable conjunction and countable disjunction, we construct all theoretical statements s̄ ∈ D̄. Within these we find the possibilities x ∈ X. Each row corresponds to a possible truth assignment for the domain. Each possibility is true in one and only one possible assignment.
The mathematical structure, then, is simply there to keep track of the logical relationships
between all the statements, which assignments are possible and which statements are verifi-
able.
With our definitions in mind, we can answer the following fundamental question: what is
the maximum number of possibilities that an experimental domain can have? In other words:
what is the maximum number of cases among which we can distinguish experimentally? As
we saw before, a possibility is a statement that defines the truth values of a basis. Since a
basis is countable, we can uniquely identify a possibility by a countable sequence of true or
false. Note that a real number expressed in a binary basis (e.g. 0.10110001...) will also be
uniquely identified by such a sequence: the cardinality of the possibilities is at most that of
the continuum.

Theorem 1.52. The possibilities X for an experimental domain D have at most the cardinality of the continuum.

Proof. Let B = {e_i}_{i=1}^∞ ⊆ D be a countable basis. Let 2^N denote the set of infinite binary sequences. We define the function F : X → 2^N such that F(x) = {F(x)_i}_{i=1}^∞ is given by:

F(x)_i = true if x ≼ e_i, and F(x)_i = false if x ≁ e_i.

For each x ∈ X we have x = ⋀_{i=1}^∞ c_i, where c_i = e_i if F(x)_i = true and c_i = ¬e_i if F(x)_i = false. Suppose x1 ≠ x2; then F(x1)_i ≠ F(x2)_i for some i, and therefore F is injective. We then have |X| ≤ |2^N| = |R|: X has at most the cardinality of the continuum.
This means we have an upper bound on how many cases can be distinguished experimen-
tally: only up to the continuum. We are not going to be able to tell apart experimentally
more possibilities than those. This result gives us a basic requirement for any mathematical
object we want to use in a scientific theory: if the cardinality is greater than the continuum, it
cannot have a well defined experimental meaning. For example, while the set of all continuous
functions between real numbers has the cardinality of the continuum, the set of all functions
(including discontinuous ones) has greater cardinality. We can already conclude that the first
set may be useful to represent physical objects and the second may not.
As we are looking into the cardinality of the possibilities of a domain, it should not be
surprising that the only way we can have infinitely many possibilities is if we are given infinitely
many verifiable statements. As each verifiable statement can distinguish between two cases
(i.e. true or false), we need infinitely many of these distinctions to reach infinitely many cases.

Proposition 1.53. Let DX be an experimental domain. The following are equivalent:

1. the set of possibilities is finite


2. the experimental domain contains finitely many verifiable statements
3. there exists a finite basis

Proof. To prove 2 from 1, let DX be an experimental domain and X its possibilities.
Suppose X is finite. Since any statement in DX can be expressed as a disjunction of
possibilities, and X is finite, DX can contain at most 2^∣X∣ statements and must be finite
as well.
To prove 3 from 2, let DX contain finitely many verifiable statements. Then a basis will
consist of finitely many statements as it is a subset of the domain.
To prove 1 from 3, suppose B is a finite basis for the domain. Since the possibilities are
minterms of the basis, we have ∣X∣ ≤ 2^∣B∣ . Since the basis is finite, X is finite as well.

We have now presented the fundamental objects of our general mathematical theory of
experimental science. In our framework, a “scientific theory” or a “scientific model” is an
experimental domain: a set of statements, what they mean and how to verify them experi-
mentally.
The starting point is often a basis: a set of verifiable statements which defines all the do-
main knowledge that we can gather experimentally. The content of the statements determines
what combinations can be true at the same time, which defines the possibilities for our do-
main. Everything in the domain is grounded within the verifiable statements: there is nothing
else in it. This also maps well to the practice of most scientific fields, where one defines states
and other physical objects based on what can be measured.
As we’ll see later, some verifiable statements may be idealizations. For example, we may
assume that a quantity can be measured with an arbitrary level of precision, which we know
not to be strictly true. We may assume a volume of gas to have a well defined temperature,
which we know not to be true if it is not at equilibrium. This type of simplification makes a domain
applicable only within the realm of validity of those idealizations, but does not change the
formal structure we have identified here. In fact, the focus of much of this work will be deriving
the details of different experimental domains under different physical assumptions.
The main point of our framework is that this conceptual structure is inescapable once we
set the principle of scientific objectivity. We will always need a set of statements and a way
to test them and, if we are given those, we have all we need. These elements are necessary
and sufficient to be able to do science. And by being clear about which statements can be
considered physically meaningful, and which are an idealization, we can then be more precise
on the physical status of each component of a particular scientific theory.

1.4 Topological spaces


Now that we have defined what experimental domains are, we want to explore the link between
them and some fundamental mathematical structures. The main result of this section is that an
experimental domain provides a natural topology for its possibilities. Each verifiable statement
can be seen as the disjunction of a set of possibilities. Performing finite conjunction and
countable disjunction of verifiable statements means performing finite union and countable
intersection on those sets of possibilities.
Topological spaces were developed in the first half of the 1900s as a generalization of
metric spaces. The idea is to define a notion of closeness without having to define an actual
distance.19 Other branches of math (e.g. metric spaces, differential geometry, Lie algebras)
now see their foundation on topological spaces, which therefore play a very important role in
mathematics as a whole. In our case, this notion of closeness will map to how hard it is to
tell possibilities apart. That is, possibilities that are topologically closer are more difficult to
distinguish experimentally.

19. In science and engineering, one talks about topology also when discussing the structure of a molecule,
an electronic circuit or a computer network. These types of structures (i.e. nodes connected by vertices) are
studied by graph theory and should not be confused with point-set topology.
Let’s first review what a topology is. The general idea is that we have a set X of ele-
ments, which we call points, and a collection of subsets of X such that it is closed under
finite intersection and arbitrary union, contains the empty set and contains the whole set
X. For example, suppose X = {1, 2, 3} then {{}, {1}, {2}, {1, 2, 3}} is not a topology while
{{}, {1}, {2}, {1, 2}, {1, 2, 3}} is. The first one is missing the union of {1} and {2}.

Definition 1.54. Let X be a set. A topology on X is a collection TX of subsets of X


closed under finite intersection and arbitrary union such that it contains X and ∅. A
topological space is a tuple (X, TX ) of a set and a topology defined on it.
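
To make the definition concrete, here is a minimal Python sketch (ours, not part of the text) that checks the topology axioms on the finite example above. On a finite set, closure under pairwise intersection and union is equivalent to the full axioms.

```python
# Illustrative sketch only: check the topology axioms on a finite set.
from itertools import combinations

def is_topology(X, T):
    """T is a topology on X if it contains X and the empty set and is
    closed under pairwise intersection and union (on a finite set this
    is equivalent to finite intersection and arbitrary union)."""
    if frozenset() not in T or frozenset(X) not in T:
        return False
    return all(A & B in T and A | B in T for A, B in combinations(T, 2))

X = {1, 2, 3}
T_bad  = {frozenset(s) for s in [(), (1,), (2,), (1, 2, 3)]}
T_good = {frozenset(s) for s in [(), (1,), (2,), (1, 2), (1, 2, 3)]}
print(is_topology(X, T_bad))   # False: the union {1} ∪ {2} = {1, 2} is missing
print(is_topology(X, T_good))  # True
```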

Mathematicians designed this abstract mathematical structure because it is a useful and


general tool to study the notion of continuity. It also happens that all the mathematical
structures used in science are topological spaces. Why is that? What is it that topological
spaces capture?
Let’s go back to our verifiable statements and possibilities. For example, consider s1 = “the
mass of the photon is less than 10⁻¹⁰ eV”. This can be expressed as s1 = ⋁_{0≤x<10⁻¹⁰} “the mass
of the photon is precisely x eV”: the precise value must be in the given range of possibilities.
Consider s2 = “the mass of the photon is greater than 10⁻²⁰ eV” = ⋁_{x>10⁻²⁰} “the mass of the photon
is precisely x eV”. The conjunction is the intersection of the possible values: s1 ∧ s2 = “the mass
of the photon is between 10⁻²⁰ and 10⁻¹⁰ eV” = ⋁_{10⁻²⁰<x<10⁻¹⁰} “the mass of the photon is precisely
x eV”. The disjunction is the union of the possible values: s1 ∨ s2 = “the mass of the photon
can be anything” = ⋁_{x≥0} “the mass of the photon is precisely x eV”.
This is something that works in general. In Proposition 1.26 we saw that, if a statement
is a function of other statements, it can be expressed as the disjunction of minterms of the
arguments. A verifiable statement is a function of a basis, so it can be expressed as the dis-
junction of minterms of a basis. But we have also seen that the minterms of a basis are the
possibilities, so each verifiable statement can be expressed as the disjunction of possibilities.
Therefore each statement in the experimental domain defines a set of possibilities, which
we call a verifiable set. Since certainty and impossibility are in the domain, the empty set
and the full set of possibilities are verifiable sets. Since we can take finite conjunction and
countable disjunction of verifiable statements, we can take finite intersection and countable
union of verifiable sets. The collection of all verifiable sets forms a topology on the set of
possibilities.

Definition 1.55. Let D be an experimental domain and X its possibilities. We define the
map U ∶ D → 2^X that for each statement s ∈ D returns the set of possibilities compatible
with it. That is: U(s) ≡ {x ∈ X ∣ x is compatible with s}. We call U(s) the verifiable set
of possibilities associated with s.

Proposition 1.56. A statement s ∈ D is equivalent to the disjunction of the possibilities
in its verifiable set U(s). That is, s ≡ ⋁_{x∈U(s)} x.

Proof. First we show each statement is the disjunction of some set of possibilities. Let
D be an experimental domain, s ∈ D a verifiable statement and B ⊆ D a basis. Since s is a
function of the basis, it can be expressed as a disjunction of minterms of B. The minterms
of B that are impossible can be ignored since s ∨ – ≡ s. But the minterms of B that are not
impossible are possibilities of D so s = ⋁_{x∈U} x for some U ⊆ X.
Now we show it is the disjunction of its verifiable set. Let x ∈ X be a possibility and
consider x ∧ s. If x ∈ U then x ∧ s = x ∧ ⋁_{x̂∈U} x̂ = ⋁_{x̂∈U} (x ∧ x̂) ≡ x ≢ –. Therefore x is
compatible with s. If x ∉ U then x ∧ s ≡ –. Therefore x is incompatible with s. This means
U = U(s) as it contains and only contains all the possibilities compatible with s.

Proposition 1.57. Let X be the set of possibilities for an experimental domain D. X has
a natural topology given by the collection of all verifiable sets TX = U (D).

Proof. The verifiable sets for the certainty and the impossibility correspond to the full
set and empty set respectively. Formally, U(⊺) = {x ∈ X ∣ x is compatible with ⊺} = X while
U(–) = {x ∈ X ∣ x is compatible with –} = ∅. Therefore X, ∅ ∈ U(D) since ⊺, – ∈ D.
The finite intersection of verifiable sets corresponds to the verifiable set of the finite
conjunction and therefore it is a verifiable set. Formally, U(s1 ∧ s2) = {x ∈ X ∣ x is
compatible with s1 ∧ s2} = {x ∈ X ∣ x is compatible with s1 and with s2} = U(s1) ∩ U(s2).
The countable union of verifiable sets corresponds to the verifiable set of the countable
disjunction and therefore it is a verifiable set. Formally, U(s1 ∨ s2) = {x ∈ X ∣ x is
compatible with s1 ∨ s2} = {x ∈ X ∣ x is compatible with s1 or with s2} = U(s1) ∪ U(s2).
This generalizes to countable disjunctions. Arbitrary disjunctions can be re-expressed as
countable disjunctions, since any verifiable statement can always be expressed in terms of
a countable basis.
The collection TX = U(D) is therefore a topology by definition since it satisfies all its
properties.

Mainly for historical reasons, the sets in a topology are called open sets. The complements
of open sets are called closed sets. In metric spaces, such as the Euclidean space with the
standard topology, these will map to the standard notion of open and closed intervals. But,
in general, they do not and this may lead to confusion. For example, if we take the integers
with their standard topology, any subset is both open and closed.
Given that we are only interested in the natural topologies of possibilities, we are going
to refer to the sets in our topology as verifiable sets and we will occasionally call falsifiable
sets their complements. For example, when counting apples a subset of the integers is both
verifiable and falsifiable: we can test whether the apple count is within or outside that set of
possible numbers. While this terminology does not follow math convention, we find it more
intuitive and meaningful in the context of this work.
We can also re-express the semantic relationships between statements in terms of set
operations on the verifiable sets. For example, “that animal is a cat” is narrower than “that
animal is a mammal” because the set of possibilities for which the first is true is a subset
of the possibilities for which the second is true. Conversely, “that animal is a cat” and “that
animal is a dog” are incompatible because the set of possibilities in which both are true is
empty.
In the following table we summarize how statement operations and relationships are ex-
pressed in terms of operations and relationships between sets of possibilities.

Statement relationship Set relationship


s1 ∧ s2 (Conjunction) U (s1 ) ∩ U (s2 ) (Intersection)
s1 ∨ s2 (Disjunction) U (s1 ) ∪ U (s2 ) (Union)
¬s (Negation) U (s)C (Complement)
ver(s) (Verifiable part) int(U (s)) (Interior)
fal(s) (Falsifiable part) ext(U (s)) (Exterior)
und(s) (Undecidable part) ∂U (s) (Boundary)
s1 ≡ s2 (Equivalence) U (s1 ) = U (s2 ) (Equality)
s1 ≼ s2 (Narrower than) U (s1 ) ⊆ U (s2 ) (Subset)
s1 ≽ s2 (Broader than) U (s1 ) ⊇ U (s2 ) (Superset)
s1  s2 (Compatibility) U (s1 ) ∩ U (s2 ) ≠ ∅ (Intersection not empty)
Table 1.4: Correspondence between statement operators and set operators.
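
As a small illustration of Table 1.4 (ours, not from the text), once statements are identified with their sets of possibilities the whole logical calculus becomes set manipulation; the animal examples above translate directly:

```python
# Illustrative sketch only: statements as sets of possibilities.
X = frozenset({'cat', 'dog', 'swan'})    # the possibilities

mammal = frozenset({'cat', 'dog'})       # U("that animal is a mammal")
cat    = frozenset({'cat'})              # U("that animal is a cat")
dog    = frozenset({'dog'})              # U("that animal is a dog")

print(cat & mammal)     # conjunction   = intersection: frozenset({'cat'})
print(cat | dog)        # disjunction   = union
print(X - mammal)       # negation      = complement: frozenset({'swan'})
print(cat <= mammal)    # narrower than = subset: True
print(bool(cat & dog))  # compatibility = non-empty intersection: False
```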

Let’s review the definition of basis and sub-basis for a topology: a collection of sets from
which we can generate the whole topology through finite intersection and countable union
(for a sub-basis) or just through countable union (for a basis). Bases are important since
they are often used in proofs and calculations. Moreover, many properties of topologies can
be shown to be equivalent to properties of one of their bases. Countability, and in particular
second-countability, is one such property which characterizes the number of verifiable sets in
the topology.

Definition 1.58. A collection B ⊆ TX of verifiable sets of X is a sub-basis if every


verifiable set in X is the union of finite intersections of elements of B. It is a basis if
every verifiable set in X is the union of elements of B.

Definition 1.59. A topology for X is second-countable if it admits a countable basis.

There is a link between the basis of an experimental domain and a sub-basis of a topology.
If every statement in the experimental domain can be constructed from a basis through finite
conjunction and countable disjunction, then each corresponding verifiable set can be generated
through intersection and union of the verifiable set corresponding to the basis. Therefore the
verifiable set corresponding to the basis of the experimental domain forms a sub-basis in
the topology. Since experimental domains must have a countable basis to make sure we can
test any verifiable statement given enough time, the topologies we’ll be interested in must be
second-countable.

Proposition 1.60. Let X be the set of possibilities of an experimental domain D. Let


B ⊆ D be a basis for the domain, then the collection of verifiable sets U (B) ∪ {X} forms a
sub-basis for the natural topology of X.
Proof. Since every verifiable statement of a domain can be generated by finite con-
junction and countable disjunction from a basis B ⊆ D, its corresponding verifiable set
can be generated by finite intersection and countable union from the verifiable sets U (B)
corresponding to the basis. Note that the certainty, though, is not necessarily the union
of U (B): if the domain is not complete, the residual possibility is not contained in any
verifiable sets. Therefore U (B) ∪ {X} can generate all verifiable sets, including the one for
the certainty, and is a sub-basis.

Proposition 1.61. The natural topology for the possibilities of an experimental domain is
second-countable.

Proof. Since each experimental domain admits a countable basis, its verifiable sets
form a countable sub-basis for the natural topology. We can close the sub-basis over finite
intersection, forming a countable basis for the topology. The natural topology is therefore
second-countable as it admits a countable basis.
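The constructions in Propositions 1.60 and 1.61 can be sketched concretely on a finite set: close the sub-basis under finite intersection to obtain a basis, then take all unions to obtain the verifiable sets. The following Python snippet (illustrative only; the example sub-basis is made up) does exactly that.

```python
# Illustrative sketch only: generate a topology from a sub-basis.
from itertools import chain, combinations

def topology_from_subbasis(X, subbasis):
    sets = {frozenset(s) for s in subbasis} | {frozenset(X)}
    # Close under finite intersection: this yields a basis (Prop. 1.61).
    basis = set(sets)
    for r in range(2, len(sets) + 1):
        for combo in combinations(sets, r):
            basis.add(frozenset.intersection(*combo))
    # Close under union: every union of basis elements is a verifiable set.
    topology = {frozenset()}
    for r in range(1, len(basis) + 1):
        for combo in combinations(basis, r):
            topology.add(frozenset(chain.from_iterable(combo)))
    return topology

T = topology_from_subbasis({1, 2, 3}, [{1, 2}, {2, 3}])
for U in sorted(T, key=lambda U: (len(U), sorted(U))):
    print(set(U))  # set(), {2}, {1, 2}, {2, 3}, {1, 2, 3}
```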
Another important property to classify topological spaces is the degree of separation of
their elements: how well one can use verifiable sets to tell points and sets apart. A Kolmogorov,
or T0 , space is one in which for every pair of points there is always a verifiable set that
contains one but not the other. This property is significant because it allows all points to be
distinguished through verifiable sets. A Fréchet, or T1 , space is one in which for every pair of
points we can find two verifiable sets each containing only one. In a Hausdorff, or T2 , space
the two verifiable sets containing the points are disjoint. This implies the uniqueness of limits
of sequences of points.

[Figure: pairs of points in a T0 , T1 and T2 space, separated by progressively stronger arrangements of verifiable sets.]

Definition 1.62. A topology for X is Kolmogorov (or T0 ) if for every two elements
x1 , x2 ∈ X there exists a verifiable set U ∈ TX containing one element but not the other.
That is: either x1 ∈ U while x2 ∉ U or x1 ∉ U while x2 ∈ U .

Definition 1.63. A topology for X is Fréchet (or T1 ) if for every two elements x1 , x2 ∈ X
there exist two, not necessarily disjoint, verifiable sets U1 , U2 ∈ TX each containing only
one element. That is: x1 ∈ U1 and x2 ∈ U2 .

Definition 1.64. A topology for X is Hausdorff (or T2 ) if for every two elements x1 , x2 ∈
X there exist two disjoint verifiable sets U1 , U2 ∈ TX each containing one element. That is:
U1 ∩ U2 = ∅, x1 ∈ U1 and x2 ∈ U2 .

Remark. Note that T2 implies T1 which in turn implies T0 .
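
For finite spaces the separation axioms can be tested by direct transcription of Definitions 1.62–1.64. Here is a Python sketch (ours, illustrative only); the two-point space with verifiable sets ∅, {1}, {1, 2} is T0 but not T1, since no verifiable set isolates the second point:

```python
# Illustrative sketch only: separation axioms on a finite topology.
from itertools import combinations

def is_T0(X, T):
    return all(any((x1 in U) != (x2 in U) for U in T)
               for x1, x2 in combinations(X, 2))

def is_T1(X, T):
    return all(any(x1 in U and x2 not in U for U in T) and
               any(x2 in U and x1 not in U for U in T)
               for x1, x2 in combinations(X, 2))

def is_T2(X, T):
    return all(any(x1 in U1 and x2 in U2 and not (U1 & U2)
                   for U1 in T for U2 in T)
               for x1, x2 in combinations(X, 2))

X = [1, 2]
T = [frozenset(), frozenset({1}), frozenset({1, 2})]
print(is_T0(X, T), is_T1(X, T), is_T2(X, T))  # True False False
```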

How do these properties relate to experimental domains? Consider two possibilities for a
domain, for example “that is a cat” and “that is a swan”. We can always find a verifiable
statement, such as “that animal has feathers”, that we can use to distinguish one possibility
from the other. This means that, given two different possibilities, we can always find a verifiable
set that contains one and not the other: the natural topology for any set of possibilities is
always T0 .
Now suppose two possibilities are approximately verifiable as we defined in Definition
1.38. For example, “the mass of the photon is exactly 0 eV” or “the mass of the photon is
exactly 10−20 eV”. We can find two verifiable statements “the mass of the photon is less than
10−25 eV” and “the mass of the photon is more than 10−25 eV” each compatible with only
one possibility. This means that, given two approximately verifiable possibilities, we can find
two verifiable sets each containing only one possibility: if all possibilities are approximately
verifiable then the natural topology is T1 .
Now suppose two possibilities are experimentally distinguishable as we defined in Defi-
nition 1.44. Then, by 1.45, we can find two disjoint approximations. In the example before,
the two verifiable statements were in fact incompatible. This means that, given two experimentally
distinguishable possibilities, we can find two disjoint verifiable sets each containing only one
possibility: if all possibilities are pairwise experimentally distinguishable then the natural topology is T2 .

Proposition 1.65. The natural topology of a set of possibilities is Kolmogorov (or T0 ).


Proof. Let X be the set of possibilities for an experimental domain D. Let x1 , x2 ∈ X
be two distinct possibilities. Each of them can be expressed as a minterm of a basis B ⊆ D.
Since the two possibilities are distinct, there must exist a verifiable statement e ∈ B that
appears negated in one conjunction but not the other. That is, e is compatible with only
one possibility. Since the verifiable set associated with a verifiable statement contains only
the possibilities compatible with said statement, the verifiable set of e either contains x1
or x2 but not both. The topology is therefore Kolmogorov (or T0 ).

Proposition 1.66. The natural topology of a set of possibilities is Fréchet (or T1 ) if and
only if all possibilities are approximately verifiable.

Proof. Suppose all possibilities in X for an experimental domain DX are approximately
verifiable. Let x1 , x2 ∈ X be two possibilities, then we can find two sequences of verifiable
statements {s^1_i}_{i=1}^∞ , {s^2_j}_{j=1}^∞ ⊆ DX such that x1 = ⋀_{i=1}^∞ s^1_i and
x2 = ⋀_{j=1}^∞ s^2_j . We can assume the sequences are monotone with respect to narrowness,
that is s^1_{i+1} ≼ s^1_i , as we can always create a monotone sequence from one that is not by
taking the sequence of finite conjunctions ŝ^1_k = ⋀_{i=1}^k s^1_i . If x1 ≠ x2 , then x1 ∧ x2 ≡ –
since different possibilities are incompatible. Therefore we must have x1 ∧ s^2_j ≡ – and
s^1_i ∧ x2 ≡ – for some i, j ≥ 1 or the limits would not be incompatible. In terms of verifiable
sets we have x2 ∉ U(s^1_i) and x1 ∉ U(s^2_j). For any two distinct possibilities we can find
two verifiable sets each containing one: the natural topology is T1.
Conversely, suppose the natural topology TX for the possibilities X of an experimental
domain DX is T1. Let x ∈ X be a possibility. Consider the collection, not necessarily
countable, of all verifiable sets {Ui}_{i∈I} ⊆ TX that contain x. Consider their intersection
Ux = ⋂_{i∈I} Ui . It will contain x since all Ui contain x. It will not contain anything
else: since the topology is T1, for every other possibility x̂ there is always an open set Ui that
does not contain it. Therefore Ux = {x}. Because the natural topology is second countable,
we can find a countable basis B and rewrite the arbitrary intersection as a countable
intersection {x} = ⋂_{i=1}^∞ Vi of elements Vi ∈ B of the basis. Let {si}_{i=1}^∞ be the
sequence of verifiable statements such that U(si) = Vi for every i. Then x = ⋀_{i=1}^∞ si ,
which means x is approximately verifiable.

Proposition 1.67. The natural topology of a set of possibilities is Hausdorff (or T2 ) if


and only if all possibilities are pairwise experimentally distinguishable.

Proof. Suppose all possibilities in X for an experimental domain DX are pairwise ex-
perimentally distinguishable. Then, by 1.45, given two possibilities x1 , x2 ∈ X we can find
two verifiable statements s1 , s2 ∈ DX such that s1 ̸ s2 , x1 ≼ s1 and x2 ≼ s2 . In terms of
verifiable sets we have U (s1 ) ∩ U (s2 ) = ∅, x1 ∈ U (s1 ) and x2 ∈ U (s2 ). The topology is T2 .
Conversely, suppose the natural topology TX for the possibilities X of an experimental
domain DX is T2 . Given two possibilities x1 , x2 ∈ X we can find two verifiable sets U1 , U2 ∈
TX such that U1 ∩ U2 = ∅, x1 ∈ U1 and x2 ∈ U2 . Since U1 , U2 ∈ TX , we can find two
corresponding verifiable statements s1 , s2 ∈ DX such that U (s1 ) = U1 and U (s2 ) = U2 . We
have s1 ̸ s2 , x1 ≼ s1 and x2 ≼ s2 and by 1.45 the possibilities are pairwise experimentally
distinguishable.

1.5 Sigma-algebras
In the same way that experimental domains find a natural mathematical representation as
topological spaces, theoretical domains find a natural mathematical representation in σ-
algebras. The main result of this section is that a theoretical domain provides a natural
σ-algebra on its possibilities.
Like topologies, σ-algebras are fundamental in mathematics as they allow us to construct
measures (i.e. assigning sizes to sets), limits for sequences of sets and probability spaces. It
is again fitting that theoretical domains are associated to such a fundamental mathematical
structure.
Let’s first review what a σ-algebra is. The general idea is that we have a set X of ele-
ments which we call points, and we have a collection of subsets of X such that it is closed
under complement and countable union, contains the empty set and contains the whole
set X. For example, suppose X = {1, 2, 3} then {{}, {1}, {1, 2, 3}} is not a σ-algebra while
{{}, {1}, {2, 3}, {1, 2, 3}} is. The first one is missing the complement of {1}.

Definition 1.68. Let X be a set. A σ-algebra on X is a collection ΣX of subsets of X


closed under complement and countable union such that it contains X.

Note that σ-algebras are also closed under countable intersections, since these can be
expressed in terms of complements and countable unions.
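
As with topologies, the axioms can be checked mechanically on finite examples, where countable union reduces to pairwise union. A minimal Python sketch (ours, not from the text), applied to the example above:

```python
# Illustrative sketch only: check the sigma-algebra axioms on a finite set.
from itertools import combinations

def is_sigma_algebra(X, S):
    X = frozenset(X)
    if X not in S:
        return False
    if any(X - A not in S for A in S):   # closed under complement?
        return False
    return all(A | B in S for A, B in combinations(S, 2))  # under union?

X = {1, 2, 3}
S_bad  = {frozenset(s) for s in [(), (1,), (1, 2, 3)]}
S_good = {frozenset(s) for s in [(), (1,), (2, 3), (1, 2, 3)]}
print(is_sigma_algebra(X, S_bad))   # False: the complement of {1} is missing
print(is_sigma_algebra(X, S_good))  # True
```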
In the previous section we saw how each verifiable statement can be expressed as the
disjunction of a set of possibilities, how the operations on statements can be expressed as
operations on the verifiable sets and how all the verifiable sets form a topology. The same
is true for theoretical statements, with the only difference being that we will end up with a
collection of sets that is closed under complement and countable union since the theoretical
domain is closed under negation and countable disjunction.

Definition 1.69. Let D̄ be a theoretical domain and X its possibilities. We define the
map A ∶ D̄ → 2^X that for each theoretical statement s ∈ D̄ returns the set of possibilities
compatible with it. That is, A(s) ≡ {x ∈ X ∣ x is compatible with s}. We call A(s) the
theoretical set of possibilities associated with s.
Proposition 1.70. Let X be the set of possibilities for a theoretical domain D̄. X has a
natural σ-algebra given by the collection of all theoretical sets ΣX = A(D̄).

Proof. The theoretical sets for the certainty and the impossibility correspond to the
full set and empty set respectively. Formally, A(⊺) = {x ∈ X ∣ x is compatible with ⊺} = X
while A(–) = {x ∈ X ∣ x is compatible with –} = ∅. Therefore X, ∅ ∈ A(D̄) since ⊺, – ∈ D̄.
The complement of a theoretical set corresponds to the theoretical set of the negation
and therefore it is a theoretical set. Formally, A(s)^C = {x ∈ X ∣ x is incompatible with s} =
{x ∈ X ∣ x is compatible with ¬s} = A(¬s).
The countable union of theoretical sets corresponds to the theoretical set of the countable
disjunction and therefore it is a theoretical set. Formally, A(s1 ∨ s2) = {x ∈ X ∣ x is
compatible with s1 ∨ s2} = {x ∈ X ∣ x is compatible with s1 or with s2} = A(s1) ∪ A(s2).
This generalizes to countable disjunctions.
The collection ΣX = A(D̄) is therefore a σ-algebra by definition since it satisfies all its
properties.

There is also a special link between topologies and σ-algebras. As one may want to con-
struct measures and probability spaces on topological spaces, there is a standard way to
construct a σ-algebra from a topology. This object, called Borel algebra, is the smallest σ-
algebra that contains all verifiable sets defined by the topology. The σ-algebra defined by
a theoretical domain is none other than the Borel algebra of the topology defined by the
corresponding experimental domain.

Definition 1.71. Let (X, T) be a topological space. Its Borel algebra is the collection
ΣX of subsets of X generated by countable union, countable intersection and complement
from the verifiable sets.

Proposition 1.72. The natural σ-algebra for a set of possibilities is the Borel algebra of
its natural topology.

Proof. Since the theoretical domain can be generated by a basis of the experimental
domain, the natural σ-algebra can be generated by a sub-basis of the natural topology. This
means that it is also generated by countable union, countable intersection and complement
from the verifiable sets of the natural topology.
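
On a finite space the Borel algebra can be computed by brute force: start from the verifiable sets and close under complement and union until a fixed point is reached. Here is a Python sketch (ours, illustrative only; infinite spaces require transfinite iteration and cannot be handled this way):

```python
# Illustrative sketch only: Borel algebra of a finite topological space.
def borel_algebra(X, topology):
    X = frozenset(X)
    sigma = {frozenset(U) for U in topology} | {X, frozenset()}
    while True:  # close under complement and pairwise union
        new = {X - A for A in sigma}
        new |= {A | B for A in sigma for B in sigma}
        if new <= sigma:
            return sigma
        sigma |= new

# The two-point space with opens {}, {1}, {1, 2}: the Borel algebra also
# contains the closed set {2}, which is not verifiable.
for A in sorted(borel_algebra({1, 2}, [set(), {1}, {1, 2}]), key=sorted):
    print(set(A))  # set(), {1}, {1, 2}, {2}
```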

This fundamental link between experimental domains and topology on one side and the-
oretical domains and σ-algebra on the other is important for multiple reasons.
From a practical standpoint, it guarantees that these mathematical tools can always be
used in science. Since experimental and theoretical domains are general constructs, any branch
of scientific investigation can use techniques and results from topology and σ-algebras for
calculations or for characterizing the domain at hand.
From a conceptual standpoint it provides a Rosetta stone, i.e. a way to translate between
the mathematical concepts and the scientific ones. It gives a precise scientific meaning to the
mathematical tools and everything built on top of them. Every single step in a calculation,
every single argument in a proof can be given a clear, and possibly insightful, physical meaning.
It grounds the abstract mathematical language in more concrete scientific objects. This in turn
helps clarify the science described by common mathematical tools, unearthing possible hidden
assumptions or simplifications about the physical systems being studied.
This connection explains why these mathematical tools have found such successful appli-
cation in the physical sciences.

1.6 Decidable domains


We conclude this chapter by analyzing decidable domains, those for which we can experimen-
tally test both the truth and the falsehood of all statements. For example, the domain for
animal identification and the domain for the amount of money in my pocket can be considered
decidable as we can typically tell experimentally whether “this animal has whiskers” or not,
or whether “there is more than one dollar and fifty cents in my pocket” or not.
Decidable domains have special characteristics and are easier to study. Since we can verify
the negation, any theoretical statement is also verifiable. And since all theoretical statements
are verifiable, so are the possibilities. That is, we can verify that “this animal is a cat”
and that “there are two dollars and thirty cents in my pocket”. As all statements can be
expressed as a disjunction of possibilities, the possibilities themselves form a countable basis.
For example, “there is more than one dollar and fifty cents in my pocket” can be expressed
as the disjunction of the appropriate statements of the form “there are x dollars and y cents
in my pocket”.

Definition 1.73. An experimental domain DX is decidable if all statements in the domain


are decidable. Formally, for every s ∈ DX we have ¬s ∈ DX .

Proposition 1.74. Let DX be an experimental domain. The following are equivalent:

1. the experimental domain is decidable


2. the experimental domain and its theoretical domain coincide
3. all possibilities are verifiable
4. the possibilities form a countable basis.

Proof. To prove 2 from 1, suppose DX is a decidable experimental domain. As DX is


decidable, it is already closed under negation and therefore all statements in its theoretical
domain D̄X are already in DX .
To prove 3 from 2, suppose DX coincides with its theoretical domain D̄X . As each
possibility is a theoretical statement, it is also a verifiable statement.
To prove 4 from 3, suppose the possibilities are verifiable. Note that the possibilities
can generate all other statements through disjunction. To show X is countable, consider
a countable basis B ⊆ DX . Because the possibilities are verifiable statements, they can
be generated from B by finite conjunction and countable disjunction. Moreover, since the
possibilities are the narrowest statements that are not impossible, they can be generated
from B using finite conjunction only. Since B is countable and X is generated by B through
finite conjunction, X can be at most countable. Therefore X is a countable basis.
To prove 1 from 4, suppose the possibilities X form a countable basis. Then each pos-
sibility is verifiable and so is their countable union. The negation of a verifiable statement
can be expressed as the countable union of possibilities, and is therefore verifiable. All
statements in the experimental domain are decidable and therefore the domain is decid-
able.
As the possibilities for a decidable domain must form a countable basis, their cardinal-
ity can’t be greater than countable. That is: only domains that are non-decidable can have
possibilities with cardinality of the continuum. In this sense they are more constrained and
simpler to study.
Mathematically, the natural topology corresponds to the discrete topology: the one for
which any subset of the possibilities is a verifiable set. That is, the topology is simply the
set of all possible sets of possibilities. The cardinality of the possibilities is therefore enough to
determine the topology of the space, which means that one number is enough to characterize
the space.

Definition 1.75. A topology TX on a set X is called discrete if it contains every subset


of X.

Theorem 1.76 (Decidability is discreteness). The natural topology of the possibilities X


for a domain DX is discrete if and only if the domain is decidable.

Proof. Suppose DX is decidable. Let U ⊆ X be a subset of possibilities. The statement
s = ⋁_{x∈U} x is generated from X through countable disjunction. Since DX is decidable, X is
a countable basis and s is verifiable. Therefore U is a verifiable set and it is contained in
the natural topology. The natural topology of X is discrete by definition.
Now suppose DX is such that the natural topology for its possibilities X is discrete.
Let s = ⋁_{x∈U} x be a statement. Since the topology is discrete, U is part of the topology and
s is verifiable. Consider its negation ¬s = ⋁_{x∈U^C} x. Since the topology is discrete, U^C is also
part of the topology and ¬s is verifiable. This means s is decidable. Since every statement
in DX is decidable, the domain is decidable.

Note, though, that discrete does not imply finite or vice-versa. The domain for extra-
terrestrial life is finite but is not decidable as we cannot verify that “there is no extra-terrestrial
life”. The domain for the amount of money in my pocket, instead, is decidable but not nec-
essarily finite as I could potentially have an arbitrarily large amount.
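
In the finite or countable case, the content of Theorem 1.76 is simply that the natural topology of a decidable domain is the full powerset of the possibilities. A trivial Python sketch (ours, not from the text):

```python
# Illustrative sketch only: the discrete topology is the powerset.
from itertools import chain, combinations

def discrete_topology(X):
    xs = list(X)
    return {frozenset(c)
            for c in chain.from_iterable(combinations(xs, r)
                                         for r in range(len(xs) + 1))}

print(len(discrete_topology({1, 2, 3})))  # 2**3 = 8 verifiable sets
```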

1.7 Summary
In this first chapter we have laid down the foundations for our general mathematical theory
of experimental science. We have seen how it is grounded in the logic of verifiable statements,
which is more limited than the logic of pure statements as it has to deal with the practical
constraints introduced by the termination of the tests.

[Diagram: from statements (∧ ∨ ¬ – ⊺) to sets (∪ ∩ ∁ X ∅). The theoretical statements D̄X (statements associated with an experimental test) correspond to the Borel sets ΣX ; the verifiable statements DX (statements whose test, if true, always succeeds) correspond to the open sets TX ; the possibilities X (experimentally distinguishable cases) correspond to the points of the space.]

We saw that we can group verifiable statements into experimental domains which must
have a countable basis to allow us to test any statement within an indefinite amount of time.
We saw how to construct theoretical domains to find all the theoretical statements that can
be associated to an experimental test. And we saw how the possibilities are those statements
that, if true, give a complete prediction for all statements in the domain.
We have seen that, because of the disjunctive normal form, each verifiable and theoretical
statement is equivalent to a set of possibilities and how logic operations and relationships
become set operations and relationships. As such, the experimental and theoretical domains
respectively provide a natural topology and σ-algebra for the possibilities.
What we have ended up with is a conceptual framework that captures the necessary
elements of scientific practice and codifies them into a symbolic representation with a well
defined meaning. There is no guesswork as to what the points of our spaces are: they are the
possibilities, statements that provide a complete description for the domain. We do not have
to provide an “interpretation” as to what the sets of a topology represent: they correspond to
verifiable statements. All the objects have a clear definition and meaning from the start; we
know which ones are necessary and to what extent they are physical or idealized. This will
provide a much more solid foundation to the rest of the work, which will ultimately allow us
to understand much better the fundamental physical theories and the connections between
them and to other areas of scientific thought.

1.8 Reference sheet


Name Meaning
B Boolean domain the set of possible truth values
i.e. B = {true, false}
S logical context a set of statements with well defined logical
relationships
s∈S statement an assertion with a well defined truth value
truth ∶ S → B the truth function returns whether a statement is true or not
AS ⊆ B^S possible assignments the logically consistent truth value combina-
tions that can be assigned to the statements
⊺ certainty a statement that can only be true (i.e. it is true
in all possible assignments)
– impossibility a statement that can never be true (i.e. it is
false in all possible assignments)
contingent statement a statement that can be either true or false
depending on the possible assignment
¬s negation (logical NOT) the statement whose truth value is always op-
posite
s1 ∧ s2 conjunction (logical AND) the statement that is true only if all statements
are true
s1 ∨ s2 disjunction (logical OR) the statement that is true if any of the state-
ments is true
s1 ≡ s2 equivalence whether each statement is a logical conse-
quence of the other (i.e. they must have the
same value in every possible assignment)
s1 ≼ s2 narrower than whether the first statement is more specific
than the second (i.e. in every possible assign-
ment, if the first is true than the second must
be also true)
s1 ≽ s2 broader than whether the second statement is narrower than
the first
s1  s2 compatibility whether both statements can be true at the
same time (i.e. there is a possible assignment
in which they are both true)
s1 á s2 independence whether the truth value of one statement does
not constrain that of the other (i.e. there is a
possible assignment for each combination of their
possible truth values)
minterm a conjunction where each statement appears
once, either negated or not
s ∈ Sv verifiable statement a statement that can be validated experimen-
tally
D experimental domain a set of verifiable statements that can be tested
in an indefinite amount of time (i.e. a set of
statements closed under finite conjunction and
countable disjunction, that precisely contains
the certainty, the impossibility and a set of ver-
ifiable statements generated by a countable ba-
sis)
B ⊆ D basis a set of verifiable statements from which all
others can be constructed
D̄ theoretical domain the set of all statements constructed from an
experimental domain that can be associated
with an experimental test
approximately verifiable when a statement is not verifiable but is the
limit of a sequence of statements that are
X possibilities of a domain those statements that, if true, determine the
value of all verifiable statements of a domain
ẋ established possibility a possibility for which at least one verifiable
statement is true (i.e. it can be established experi-
mentally)
x̊ residual possibility if it exists, the possibility for which all ver-
ifiable statements are false (i.e. the remain-
ing case that cannot be established experimen-
tally)
Chapter 2

Domain combination and relationships

We continue our investigation of the fundamental mathematical structures for experimental


science by studying what happens when we have more than one experimental domain. We
will define experimental relationships between experimental domains, which capture either
causal or inference relationships between them. We will see that these correspond to continuous
functions in the natural topology.
We will take two or more domains and merge all the experimental information that can be
gathered through them into a combined domain. We will study how the set of possibilities of
the combined domain depends not only on the original domains, but also on the relationships
between them. These will also determine the natural topology that can vary from the product
topology all the way to the disjoint union topology.
We will also show that experimental relationships, under suitable conditions, can them-
selves be verified experimentally by constructing the relationship domain, whose
possibilities correspond to the possible relationships.

2.1 Dependence and equivalence between domains


The first thing we want to be able to characterize, when dealing with more than one domain,
is when there exists a relationship between them. For example, consider the domains for the
temperature and height of a mercury column or the domains for the temperature and density
of water. How do we express, in this framework, the fact that these domains are connected?
We have two ways to define these relationships between domains. The first is in terms of
inference: any measurement on the height of a mercury column is an indirect measurement
on its temperature; any experimental test on the density of water is an indirect experimental
test on its temperature. The second is in terms of causes: the height of the mercury column
depends on its temperature; the density of water is a function of its temperature. The main
result of this section is to show that these definitions are equivalent and that the dependent
domain can be seen as a sub-domain of the other.
Suppose DX represents the domain for the temperature of a mercury column while DY
represents the domain for its height. Since we know that an increase in temperature makes
the metal expand, we can infer the temperature of the mercury column by looking at its
height. For example, if we verify that “the height of the mercury column is between 24 and


25 millimeters” we will be able to infer that “the temperature is between 24 and 25 Celsius”.
That is, given a verifiable statement sY we have another verifiable statement sX that is going
to be true if and only if the first one is, that is sY ≡ sX .
Note that the inference is between verifiable statements and not intervals. For example, the
verifiable statement “the water density is between 999.8 and 999.9 kg/m3 ” will correspond to
“the water temperature is between 0 and 0.52 Celsius”∨“the water temperature is between 7.6
and 9.12 Celsius” as water is most dense at 4 Celsius. The disjunction of verifiable statements
is still a verifiable statement so we are still inferring one verifiable statement from the other.
For each verifiable statement in DY we can find a verifiable statement in DX that is verified
if and only if the first is. That is: an inference relationship is a map from DY to DX that
preserves equivalence.

Definition 2.1. An inference relationship between two experimental domains estab-


lishes that testing a verifiable statement in one means testing a verifiable statement in the
other. Formally, an inference relationship between two experimental domains DX and DY
is a map r ∶ DY → DX such that r(sY ) ≡ sY . In other words: it is an equivalence-preserving
map between experimental domains.

An inference relationship is essentially an injection that preserves equivalence instead


of identity. In terms of equality, the two statements “the height of the mercury column is
between 24 and 25 millimeters” and “the temperature is between 24 and 25 Celsius” are
different, but they are the same in terms of equivalence. In this sense, the dependent domain
is already contained within the other domain. This means we can define domain inclusion and
equivalence based on inference relationships.

Definition 2.2. An experimental domain DY is dependent on another experimental do-


main DX , noted DY ⊆ DX , if there exists an inference relationship r ∶ DY → DX .

Corollary 2.3. Let DX be an experimental domain. Let DY be a subset of statements of


DX that form an experimental domain (i.e. contains impossibility, certainty and is closed
under finite conjunction and countable disjunction). Then DY ⊆ DX .

Proof. Let ι ∶ DY → DX be the inclusion map. This is an inference relationship since


ι(sY ) = sY ≡ sY therefore DY depends on DX .

Definition 2.4. Two experimental domains DX and DY are equivalent DX ≡ DY if DX


depends on DY and vice-versa.

Corollary 2.5. Domain equivalence satisfies the following properties:

• reflexivity: D ≡ D
• symmetry: if DX ≡ DY then DY ≡ DX
• transitivity: if DX ≡ DY and DY ≡ DZ then DX ≡ DZ

and is therefore an equivalence relationship.


Proof. For reflexivity, D is a subset of D that is an experimental domain, therefore


D ⊆ D by 2.3. Equivalence follows by symmetry.
For symmetry, suppose DX ≡ DY , then DY ⊆ DX and DX ⊆ DY and therefore DY ≡ DX .
For transitivity, suppose DX ≡ DY and DY ≡ DZ . Then we have the following inference
relationships: rXY ∶ DX → DY , rY X ∶ DY → DX , rY Z ∶ DY → DZ , rZY ∶ DZ → DY . We can
define the function compositions rXZ = rY Z ○rXY and rZX = rY X ○rZY . These are inference
relationships since sX ≡ rXY (sX ) ≡ rY Z (rXY (sX )) and sZ ≡ rZY (sZ ) ≡ rY X (rZY (sZ )).
Therefore DX ≡ DZ .

[Figure: two plots, “Water Temperature vs. Density” (ρ in mg/cm³ against T in °C, peaking near 4 °C) and “Steel Temperature vs. Density” (ρ in g/cm³ against T in °C, monotonically decreasing). For water, the inverse image f⁻¹ of a density value can contain two temperatures; for steel it contains exactly one.]

It should be evident that we cannot impose inference relationships between any two do-
mains: it’s something that the domains allow or not. The domains for the temperature of two
different mercury columns are in general not related: testing the value of one does not tell
us anything about the other. The topologies of the two domains, however, are going to be
the same because we’ll have the same possible values and the same way to experimentally
test them. Equivalence between experimental domains is a much stronger relationship than
equivalence of the natural topology. It carries enough of the semantics to be able to tell which
spaces are truly scientifically equivalent.
Let’s continue with our example. We can re-express a relationship between domains in
terms of causal relationship between the two domains. If x is the value of temperature of
the mercury column (i.e. a possibility for DX ) and y is the height of the mercury column
(i.e. a possibility for DY ), then we can write y = f (x) since the height is determined by the
temperature.
Note that the direction of the causal relationship is the opposite of the inference. X causes
Y and from DY we can infer DX . Chains of events are in terms of possibilities and start with
the cause and end with the effect. Chains of inferences are in terms of verifiable statements
and start with the result and end with the origin.
The other directions do not work in general. Even if we know the final possibility, we may
not be able to reconstruct the initial possibility: if the water density is exactly 999.9 kg/m3 ,
the temperature could be either 0.52 or 7.6 Celsius because density peaks at 4 Celsius. For
the same reason, a measurement of the cause is not always equivalent to a measurement of
the effect: verifying that “the water temperature is between 0 and 0.52 Celsius” will mean
that “the water density is between 999.8 and 999.9 kg/m3 ” but not the other way around.
Because of the peak in density, the statement about the temperature tells us more (i.e. it is
narrower) than the statement about the density and therefore they are not equivalent. That
is: we can learn more about the temperature by measuring it directly than indirectly through
the density.
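
A toy numerical sketch (ours, with made-up numbers rather than real water data) makes the asymmetry concrete: with a density curve that peaks inside the range, the inverse image of a verified density interval is a union of two disjoint temperature intervals, not a single interval.

```python
# Illustrative sketch only: inference through the inverse image of f.
import numpy as np

temps = np.linspace(0, 8, 8001)               # temperature grid (°C)
density = 999.97 - 0.008 * (temps - 4.0)**2   # toy ρ(T), peaked at 4 °C

# Verifying "the density is between 999.85 and 999.95" only locates the
# temperature inside the preimage of that interval...
verified = (density > 999.85) & (density < 999.95)
inferred = temps[verified]
print(round(inferred.min(), 2), round(inferred.max(), 2))  # ≈ 0.13 7.87

# ...and that preimage is not an interval: temperatures around the peak
# are excluded, so the verified set splits into two disjoint pieces.
print(verified[(temps > 3.0) & (temps < 5.0)].any())  # False
```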
Another important consideration is that, in order to be consistent, the function y = f (x)
has to be continuous. The general idea is the following: if we say that we can only measure both
the temperature and height of a mercury column with finite precision, we have to make sure
that we can use the causal relationship for inference. Therefore a finite precision measurement
of height will correspond to a finite precision measurement of temperature. This means a
small change in height has to correspond to a small change in temperature: the function is
continuous.
More precisely, consider the verifiable statement sY =“the height of the mercury column is
between 24 and 25 millimeters”. The height of the mercury column y is within the verifiable
set U (sY ) = (24, 25) millimeters. We can then infer that the temperature must be in the
reverse image of the possible heights f −1 (UY (sY )) = (24, 25) Celsius. But this means that,
indirectly, we have experimentally verified that x is in f −1 (UY (sY )). And if DX is really
the domain of the verifiable statements for the temperature, then it must contain one that
matches “the temperature of the mercury column is between 24 and 25 Celsius”. In other
words, f −1 (UY (sY )) must be a set in the topology of X and the function is continuous.1

Definition 2.6. Let (X, TX ) and (Y, TY ) be two topological spaces. A continuous func-
tion is a map f ∶ X → Y such that given any verifiable set UY ∈ TY its reverse image
f −1 (UY ) ∈ TX is a verifiable set. A homeomorphism is a continuous bijective map such
that its inverse is also continuous.
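
For finite spaces, Definition 2.6 can be transcribed directly: a map is continuous when the preimage of every verifiable set is verifiable. A Python sketch (ours, illustrative only):

```python
# Illustrative sketch only: continuity check via preimages.
def preimage(f, U, X):
    return frozenset(x for x in X if f(x) in U)

def is_continuous(f, X, TX, Y, TY):
    return all(preimage(f, U, X) in TX for U in TY)

X, TX = {1, 2}, {frozenset(), frozenset({1}), frozenset({1, 2})}
Y, TY = {'a', 'b'}, {frozenset(), frozenset({'a'}), frozenset({'a', 'b'})}

f = {1: 'a', 2: 'b'}.get   # preimage of {'a'} is {1}: a verifiable set
g = {1: 'b', 2: 'a'}.get   # preimage of {'a'} is {2}: not verifiable
print(is_continuous(f, X, TX, Y, TY))  # True
print(is_continuous(g, X, TX, Y, TY))  # False
```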

Definition 2.7. A causal relationship between two experimental domains establishes


that determining which possibility is true in the first domain also determines which pos-
sibility is true in the second. Formally, a causal relationship between two experimental
domains DX and DY is a function f ∶ X → Y between the possibilities of the respective
domains such that x ≼ f (x) for all x ∈ X.

Corollary 2.8. All causal relationships are continuous functions over the respective natural
topologies.

Proof. Let f ∶ X → Y be a causal relationship between DX and DY . Let y ∈ Y be
a possibility for DY . Consider f⁻¹(y): this is the set of all possibilities in X that are
compatible with y which is, by definition, the theoretical set of y. Therefore we have
y ≡ ⋁_{x∈f⁻¹(y)} x. Now consider a verifiable statement sY ∈ DY and the associated verifiable
set UY(sY) ⊆ Y . We have sY ≡ ⋁_{y∈UY(sY)} y ≡ ⋁_{y∈UY(sY)} ⋁_{x∈f⁻¹({y})} x ≡ ⋁_{x∈f⁻¹(UY(sY))} x. Given
that sY is verifiable, ⋁_{x∈f⁻¹(UY(sY))} x is also verifiable because it is equivalent to a verifiable
statement. Since f⁻¹(UY(sY)) is the set of possibilities compatible with that statement,
f⁻¹(UY(sY)) must be a verifiable set and therefore f⁻¹(UY(sY)) ∈ TX must be in the
natural topology of DX . Therefore f is a continuous function over the respective natural
topologies.

1. In topology, continuity is defined in terms of the sets in the topology and not in terms of small changes
as in analysis. When using the standard topology on real numbers, the two coincide but not in general.

Corollary 2.9. A causal relationship between two domains is unique if it exists.

Proof. Suppose f1 ∶ X → Y and f2 ∶ X → Y are two causal relationships. Let x ∈ X. We


have x ≼ f1 (x) and x ≼ f2 (x). This means f1 (x)  f2 (x). But these are two possibilities of
the same domain, so they are either incompatible or are the same possibilities. Therefore
f1 (x) ≡ f2 (x) for all x ∈ X. The causal relationships are the same.

Theorem 2.10 (Experimental Relationship Theorem). Inference and causal relationships


are equivalent. More formally, let DX and DY be two experimental domains. An inference
relationship r ∶ DY → DX exists between them if and only if a causal relationship f ∶ X → Y
also exists.

Proof. First we show that a causal relationship exists between the independent and
the dependent domain. Suppose DY depends on DX . Given that for each statement in DY
there exists an equivalent statement in DX , DY is effectively a subset of DX . Moreover,
since the theoretical domains for both experimental domains are generated by completing
under negation, the theoretical domain D̄Y will effectively be a subset of D̄X . This means
that a possibility x ∈ D̄X , if true, will determine the truth values of all statements in
D̄Y , including its possibilities. Because one possibility of Y must be true and because all
possibilities are incompatible with each other, there must be one and only one possibility
y ∈ D̄Y compatible with x. Therefore we can define f ∶ X → Y the function that given a
possibility x ∈ X returns the only possibility y = f (x) ≽ x that is compatible with it.
We still need to show that f is continuous. Consider a verifiable statement sY ∈ DY .
Let UY (sY ) ∈ TY be its verifiable set. Since DY depends on DX , we can find sX ∈ DX such
that sX ≡ sY . Let UX (sX ) ∈ TX be its verifiable set. This is also the set of all possibilities
in X that are compatible with sY , which means UX (sX ) contains all the possibilities that
are compatible with a possibility in UY (sY ). Since f returns the only possibility in Y
compatible with a possibility in X, f −1 (UY (sY )) will return all the possibilities in X that
are compatible with a possibility in UY (sY ). That means f −1 (UY (sY )) = UX (sX ) and that
f −1 maps verifiable sets to verifiable sets. Therefore f is continuous.
Now we show that a causal relationship implies dependence between domains. Suppose
we have a causal relationship f ∶ X → Y between DX and DY . Let y ∈ Y be a possibility
for DY . Consider f⁻¹({y}): this is the set of all possibilities in X that are compatible with
y which is, by definition, the theoretical set of y. Therefore we have y ≡ ⋁_{x∈f⁻¹({y})} x. Now
consider a verifiable statement sY ∈ DY . We have sY ≡ ⋁_{y∈UY(sY)} y ≡ ⋁_{y∈UY(sY)} ⋁_{x∈f⁻¹({y})} x ≡
⋁_{x∈f⁻¹(UY(sY))} x. Because f is continuous, the reverse image of a verifiable set is a verifiable
set. Therefore there is an sX ∈ DX such that UX(sX) = f⁻¹(UY(sY)). The two verifiable
statements sX ≡ ⋁_{x∈UX(sX)} x ≡ sY are equivalent. For each sY ∈ DY we can find an equivalent
sX ∈ DX so DY depends on DX .
Corollary 2.11. Two experimental domains DX and DY are equivalent if and only if there
exists a homeomorphism f ∶ X → Y between the possibilities such that x ≡ f (x).

Proof. Let DX and DY be two equivalent experimental domains. Then we can find
the causal relationship f ∶ X → Y and g ∶ Y → X. We have x ≼ f (x) ≼ g(f (x)). Since
g(f (x)) ∈ X is a possibility, and since x is the only possibility compatible with itself, we
must have x ≡ g(f (x)). Therefore g is the inverse of f and it is continuous. Therefore f is
a homeomorphism. We also have x ≼ f (x) ≼ x, therefore f (x) ≡ x.
Now let f ∶ X → Y be a homeomorphism between the possibilities of DX and DY such
that x ≡ f (x). Then f is a causal relationship and DY depends on DX . Let g be the inverse
of f , which is continuous since f is a homeomorphism. We have y ≡ f (g(y)) ≡ g(y). Then
g is also a causal relationship and DX depends on DY . Therefore DX ≡ DY .

Since for each inference relationship we have a causal relationship and vice-versa, we
will simply use the term experimental relationship to describe the link between the two
domains.
We should stress that causal relationships and inference relationships are defined on spaces
that have, in a sense, a different status in a physical theory. Inference relationships are de-
fined on verifiable statements, on finite precision measurements, which are the objects that
are directly defined experimentally. In this sense, inferences have a higher status as they
are more directly related to experimental verification. However, the map r ∶ DY → DX is
over-complicated and redundant precisely because it maps all possible finite precision mea-
surements from one domain to the other. Conversely, the causal relationship f ∶ X → Y is
only defined on the possibilities, on the points, therefore there is no redundancy. However, the
possibilities are not verifiable statements in general and are often the product of idealizations.
In this sense, causal relationships have a lower status as they are only indirectly defined by
experimental verification.
The perfect correspondence between causal and inference relationships is what rescues and
justifies the predominant focus on causal relationships to describe experimental relationships:
since studying one is mathematically the same as studying the other, why should we use
the more complicated object? Therefore, while the inference relationship is more directly
physically meaningful, the causal relationship is a much more convenient object to study and
characterize. That is why, in the end, all relationships will be predominantly defined by a
function over the possibilities.
We now turn our attention back to domain equivalence. We have seen that two experi-
mental domains DX and DY are equivalent if they consist of equivalent statements: if they
allow a one to one correspondence that preserves the equivalence of their statements. This
implies, for example, that the possibilities are also equivalent but there is more to it.
Suppose we define some type of operation on one domain. For example, on the experimental
domain for the temperature of a mercury column we define an increase by one Celsius; or on
the experimental domain for the amount of gasoline in a tank we define the sum of two possible
amounts. These will correspond either to operations on the domain (e.g. f ∶ DX → DX ) or
on its possibilities (e.g. + ∶ X × X → X). But by doing so we are also defining them on all
equivalent domains as well: in the end, they are made of equivalent statements. Therefore we
are also defining an increase of the height of the mercury column and the sum of the monetary
value of the gasoline.

This means that if we capture some physical feature using some mathematical structure on
one domain, then all equivalent domains will inherit the same structure. Moreover, the causal
relationship is a function that preserves that structure. If the possibilities of one domain form a
vector space, then the possibilities of an equivalent domain form a vector space and the causal
relationship is an invertible linear transformation. If the possibilities of one domain form a
group, then the possibilities of an equivalent domain form a group and the causal relationship
is an isomorphism.2 As we’ll see much later, this is fundamental since deterministic and
reversible evolution means equivalence of the domains describing the past, present and future
states. Therefore deterministic and reversible motion is not “just” a one-to-one map.

Theorem 2.12 (Domain Equivalence is Isomorphism). Let DY ≡ DX be two equivalent


experimental domains. Suppose DX is endowed with some mathematical structure. Then
DY is also endowed with an equivalent structure and the experimental relationship preserves
said structure.
Proof. Since an experimental domain is really defined not on the statements themselves
but on their equivalence classes, the mathematical structure will also be defined on the
equivalence classes. But this means that a structure defined on DX is also defined on DY
since they contain the same equivalence classes. Therefore DY is also endowed with an
equivalent structure.
The experimental relationship can be expressed either as a map between possibilities f ∶ X → Y or as a map between verifiable statements r ∶ DX → DY . This means that the
mathematical structure defined on DY can be transported to DX using the experimental
relationship. But since the mathematical structure defined on DX already contains the
mathematical structure defined on DY , the transported mathematical structure has to be
the same. That is, the experimental relationship must preserve the mathematical structure
defined on DX .

Note that the converse is not true: two domains that are endowed with the same mathemat-
ical structure are not necessarily equivalent. Consider two similarly constructed thermometers:
their respective experimental domains are not equivalent since knowing something about one
tells us nothing about the other. Yet, their natural topologies are equivalent because the way
we can measure temperature for both is the same. One way to look at it is that the mathemat-
ical structures “forget” the full equivalence between statements and only look at a particular
aspect. Topological spaces capture how the possibilities are distinguished in terms of verifiable
statements. Therefore, while the domains for temperature of two different thermometers are
not equivalent, their natural topologies are equivalent because the way we characterize all
possible measurements is the same (i.e. the value is within a finite precision interval). Simi-
larly, the σ-algebra only cares about what statements can be associated to experimental tests.
We’ll see that, in some cases, Abelian groups will capture how distributions can be composed
into other distributions, that non-Abelian groups will capture how transformations can be
composed into other transformations, and so on.

² Domain equivalence is an isomorphism in whatever category (e.g. topological space, group, vector space, ...) used to model the experimental domain.

2.2 Combining domains


In this section we want to understand what happens when we combine statements from
different domains. For example, suppose we have the experimental domain for the pressure
of an ideal gas and the experimental domain for its temperature. We can mix and match
verifiable statements with conjunction and disjunction as in “the pressure is between 1 and 1.1 kPa” ∧ “the temperature is between 20 and 21 C”, creating a new domain. How can we
characterize this combined experimental domain?
The main result of this section is that the possibilities of the combined domain depend on
how compatible the verifiable statements of the domains are. In particular, if the verifiable
statements are independent across domains (e.g. the horizontal and vertical position of an
object), then the possibilities of the combined domain will be the Cartesian product of those
for the individual domains. On the other hand, if the verifiable statements are incompatible
across domains (e.g. plant identification and animal identification), then the possibilities for
the combined domain will be the disjoint union of the possibilities of the individual domains.
Suppose DX is the experimental domain generated by the two verifiable statements “the
patient is dead” and “the patient is alive” and a second one DY generated by the two verifiable
statements “the patient is not in a coma” and “the patient is in a coma”. The given verifiable
statements also correspond to the possibilities for the respective domains.
We can construct the combined domain DX × DY by taking all possible disjunctions and
conjunctions. What are the possibilities for the new domain? Since by 1.48 the possibilities
are minterms, we have the following cases to consider:

• “the patient is alive” ∧ “the patient is in a coma”
• “the patient is alive” ∧ “the patient is not in a coma”
• “the patient is dead” ∧ “the patient is in a coma”
• “the patient is dead” ∧ “the patient is not in a coma”

The third one is impossible: the patient cannot be dead and in a coma. Therefore the combined
domain has only three possibilities. The possibilities of the combined domain are, in general,
the subset of all possible combinations (i.e. the Cartesian product) of the possibilities of the
domains we are combining, those that are not impossible.
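To make the construction concrete, here is a minimal Python sketch (our own illustration, not part of the formal development) that enumerates the conjunctions for the patient example and discards the impossible one. The set of “worlds” and its labels are hypothetical stand-ins for the semantics that decides which conjunctions are impossible.

```python
from itertools import product

# A toy semantic model: each "world" is a consistent way the patient can be.
# The world labels are illustrative, not part of the text's formalism.
worlds = {"alive_awake", "alive_coma", "dead"}

# Each possibility is modeled by the set of worlds in which it holds.
X = {"alive": {"alive_awake", "alive_coma"}, "dead": {"dead"}}
Y = {"in_coma": {"alive_coma"}, "not_in_coma": {"alive_awake", "dead"}}

# Possibilities of the combined domain (Proposition 2.14): conjunctions,
# i.e. set intersections, that are not impossible (non-empty).
combined = {
    (x, y): X[x] & Y[y]
    for x, y in product(X, Y)
    if X[x] & Y[y]  # drops the impossible "dead ∧ in a coma"
}

print(combined)  # three possibilities survive out of four combinations
```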

Definition 2.13. Let {DXi}∞i=1 be a countable set of experimental domains. The combined experimental domain DX = ⨉∞i=1 DXi is the experimental domain generated from all statements in {DXi}∞i=1 by finite conjunction and countable disjunction.
Proof. We need to show that the combined experimental domain is indeed an experimental domain. It will contain the certainty and the impossibility since any of the original experimental domains contains them. It is closed under finite conjunction and countable disjunction by construction. To show that it has a countable basis, for each i = 1..∞ let Bi ⊆ DXi be a countable basis for the respective domain. Consider B = ⋃∞i=1 Bi. From this set we can generate any DXi and therefore we can also generate all of DX . B is a basis and it is countable since it is the union of a countable collection of countable sets. Note that it is precisely because the basis needs to remain countable that we cannot extend the operation to an uncountable set of domains.

Proposition 2.14. The possibilities for a combined domain are a subset of the Cartesian product of the possibilities for the individual domains. Formally, let {DXi}∞i=1 be a countable set of experimental domains and {Xi}∞i=1 their respective possibilities. Let X be the set of possibilities for the combined domain DX = ⨉∞i=1 DXi. Then X = {x = ⋀∞i=1 xi ∣ {xi}∞i=1 ∈ ⨉∞i=1 Xi , x ≢ –}.

Proof. A possibility x of the combined domain is a minterm of a basis B ⊆ DX . Since we can choose B = ⋃∞i=1 Bi where Bi ⊆ DXi is a countable basis for each domain, x is the conjunction x ≡ ⋀∞i=1 xi of minterms xi of Bi. Since x is a possibility, it is not impossible and therefore none of the xi can be impossible. Since each xi is a minterm of the respective basis Bi that is not impossible, it is a possibility by 1.48. Therefore a possibility x of the combined domain is a conjunction of possibilities xi of the original domains that is not impossible.

Proposition 2.15. Let {DXi}∞i=1 be a countable set of incomplete experimental domains and DX their combined experimental domain. Let {x̊i}∞i=1 be the residual possibility of each respective domain. Let x̊ = ⋀∞i=1 x̊i. The combined domain DX is incomplete if and only if x̊ ≢ –, in which case x̊ is the residual possibility.

Proof. For each domain, the residual possibility is the conjunction of the negation of its basis. Therefore we have x̊ ≡ ⋀e∈B ¬e ≡ ⋀∞i=1 ⋀e∈Bi ¬e ≡ ⋀∞i=1 x̊i. If x̊ ≢ – then it is the residual possibility.

Corollary 2.16. The residual possibility of the combined domain is narrower than the
ones of the original domains. If one of the domains is complete then the combined domain
is also complete.

Proof. Let x̊ be the residual possibility for the combined domain and x̊j the one for one of the original domains. We have x̊j ∧ x̊ ≡ x̊j ∧ ⋀∞i=1 x̊i ≡ ⋀∞i=1 x̊i ≡ x̊. Therefore, by 1.22, x̊ ≼ x̊j .
Suppose that one of the original domains DXj is complete. Then the conjunction of the negation of its basis x̊j is an impossibility. This means the conjunction of the negation of the basis of the combined domain x̊ is also an impossibility and the combined domain is complete.
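In set terms, the residual possibility corresponds to whatever the basis leaves uncovered: modeling each basis (verifiable) statement by its set of possibilities, the residual is the complement of their union. The following Python sketch, with made-up finite domains, illustrates Proposition 2.15 and Corollary 2.16 under the simplifying assumption that the conjunction of the residuals is not otherwise impossible (as for independent domains).

```python
def residual(possibilities, basis):
    """The possibilities left uncovered by the basis sets.

    Non-empty exactly when the domain is incomplete; the leftover
    region plays the role of the residual possibility.
    """
    covered = set().union(*basis) if basis else set()
    return set(possibilities) - covered

# Domain 1 is incomplete: possibility "c" is not covered by any basis set.
r1 = residual({"a", "b", "c"}, [{"a"}, {"b"}])
# Domain 2 is complete: the basis covers every possibility.
r2 = residual({"u", "v"}, [{"u"}, {"v"}])

# Corollary 2.16: the combined domain is incomplete only if every factor
# is, since the combined residual is the conjunction of the residuals.
combined_incomplete = bool(r1) and bool(r2)
print(r1, r2, combined_incomplete)  # {'c'} set() False
```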

Independent domains

A special case is when combining two independent domains. For example, the domain for the
pressure and the domain for the volume of an ideal gas are independent because a measurement
on one tells us nothing about the other. Similarly, the domain for the shape and the domain
for the color of an object are independent. In these cases, we can have any combination of
possibilities: any pressure with any volume or any color with any shape.
In terms of topology, the possibilities of the combined domain are the Cartesian product of
the possibilities of the original domains and their natural topology is the product topology.3

[Figure 2.1: two panels, “Number of Males vs. Females” (axes M and F) and “Number of Males vs. Total Population” (axes M and P).]

Figure 2.1: Domain independence and projections. Given a population, the number of male
and female members form independent domains: knowing something about one value tells us
nothing about the other. Pictorially, a constraint on one side gives us a vertical or horizontal
band, which projected on the other axis, gives us the full axis (i.e. the certainty). On the other
hand, the total population and the number of males are not independent domains: given the
total population, the number of males cannot exceed that number; given the number of males,
the total population cannot be lower than that number. Pictorially, we see that the combined
domain does not span the whole plane. A constraint on one domain gives us, when projected,
a constraint on the other domain.

Definition 2.17. The experimental domains of a countable set {DXi }∞i=1 are independent
if taking one verifiable statement si ∈ DXi from each domain always gives an independent
set of statements.

Proposition 2.18. Let {DXi}∞i=1 be a countable set of independent experimental domains and Xi their respective possibilities. The set of possibilities X of the combined experimental domain ⨉∞i=1 DXi consists of all the possible conjunctions of the possibilities of each domain. That is: X = {⋀∞i=1 xi ∣ xi ∈ Xi}. Notationally, we write DX = D⨉∞i=1 Xi .


Proof. A possibility x ≡ ⋀∞i=1 xi for the combined domain is the conjunction of possibilities of each individual domain by 2.14. Since the domains are independent and since possibilities are neither certainties nor impossibilities, by 1.21 there exists an assignment a ∈ AS such that a(xi) = true for all i = 1..∞ and therefore a(x) = true. That is, each conjunction x = ⋀∞i=1 xi is not impossible and therefore is a possibility.

³ Note that the topology is quite naturally the product topology and not the box topology. The box topology would require countable conjunction and is therefore discarded. The fact that the correct topology is the one most natural to define confirms again the appropriateness of our framework.

Definition 2.19. Let {(Xi, Ti)}∞i=1 be a countable set of topological spaces. Let X = ⨉∞i=1 Xi be the Cartesian product of the points. Let B be the collection of sets of the form ⨉∞i=1 Ui, with Ui ∈ Ti and Ui ≠ Xi only finitely many times. The topology generated by B is called the product topology.

Proposition 2.20. Let {DXi}∞i=1 be a countable set of independent experimental domains. The natural topology for the possibilities of the combined experimental domain D⨉∞i=1 Xi is the product topology of the natural topologies for the possibilities of each domain.

Proof. Let Ui ∶ DXi → TXi be the map from a verifiable statement of a domain to
its verifiable set in the respective topology. Let U ∶ DX → TX be the same map for the
combined domain. Let si ∈ DXi be a verifiable statement from a particular domain and
Ui (si ) its verifiable set in that domain. Since we also have si ∈ DX , the statement is also
associated with the verifiable set U (si ) in the combined domain. Because the domains are
independent, every possibility in Ui (si ) is compatible with any possibility xj ∈ Xj for all
j ≠ i. This means that U (si ) = X1 × ... × Xi−1 × Ui (si ) × Xi+1 × ... . Given that a verifiable
statement in the combined domain can be generated using finite conjunction and countable
disjunction from the verifiable statements of the independent domains, the topology of the

combined space can be generated by all sets of the form ⨉∞i=1 Ui, with Ui ∈ Ti and Ui ≠ Xi only once. Using finite conjunction, this includes those sets where Ui ≠ Xi finitely many times.
The natural topology of the combined domain is the product topology by definition.
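A finite sketch of Definition 2.19: with only two factors the “finitely many times” restriction is automatic, and each basis element U1 × U2 is just the verifiable set of a conjunction of verifiable statements from the two independent domains. The spaces and open sets below are toy data of our own choosing.

```python
from itertools import product

# Two toy finite topological spaces (each a Sierpinski-like space).
X1, T1 = {1, 2}, [frozenset(), frozenset({1}), frozenset({1, 2})]
X2, T2 = {"a", "b"}, [frozenset(), frozenset({"a"}), frozenset({"a", "b"})]

# Basis of the product topology: all products U1 x U2 of open sets.
# A set comprehension deduplicates the empty products.
basis = {frozenset(product(U1, U2)) for U1 in T1 for U2 in T2}

for B in sorted(basis, key=len):
    print(set(B) or "{}")
```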

Dependent domains
Another special case is combining a domain DX with another DY ⊆ DX that is dependent
on it. For example, combining the domain for the temperature of a mercury column with
the domain for its height. Since the height can be determined by the temperature, no new
possibilities are added. The combined domain is equivalent to the original domain DX
since all the verifiable statements in DY have equivalents in it.

Proposition 2.21. Let DX and DY be two experimental domains such that DY ⊆ DX


depends on the first. Then DX × DY ≡ DX .
Proof. Since DY is dependent on DX , any statement in DY is equivalent to one in DX .
Therefore no statement can be generated from them that is not equivalent to one already
contained in DX . Therefore DX × DY is equivalent to DX .

Corollary 2.22. Let DX and DY be two experimental domains such that DY ⊆ DX depends
on the first. The possibilities of DX × DY are the possibilities of DX .

Proof. The possibilities of DX × DY are the possibilities of DX since they are equivalent
domains. To be consistent with 2.14, we additionally show that they are also the subset
of the scalar product. Let f ∶ X → Y be the causal relationship between the domains, let
x and y be two possibilities of the respective domains. We have x ∧ y ≢ – if and only if
y = f (x), so the possibilities of DX ×DY are x∧f (x) for all x ∈ X. We also have x∧f (x) ≡ x
since x ≼ f (x).
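A minimal Python sketch of this situation (the temperature and height values are arbitrary illustrative numbers): when one domain depends on the other through a causal relationship f, the only surviving conjunctions are x ∧ f(x), one per original possibility.

```python
# Dependent domains (Proposition 2.21): combining a domain with one that
# depends on it adds no possibilities. Toy temperature/height data.
X = [20, 21, 22]                       # temperatures (arbitrary units)
f = {20: 120, 21: 121, 22: 122}        # causal relationship: height(temp)

# Only conjunctions x ∧ y with y = f(x) are possible; each is ≡ x.
combined = [(x, f[x]) for x in X]
print(combined)   # one combined possibility per original possibility
```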

Incompatible domains
The last special case we consider is when the domains are incompatible, that is, when all verifiable statements of one are incompatible with all verifiable statements of the others. This is one
case where the residual possibility behaves differently from all the others.4 Suppose DX is the
domain to classify a particular specimen as an animal and DY is the domain to classify it as
a plant. If we take a verifiable statement from the first, such as “that specimen has fur”, then
it will be incompatible with a verifiable statement from the other, such as “that specimen
has lobed leaves”. The only way we can combine the possibilities is to take an established
possibility of one (e.g. “this specimen is a cat”) and combine it with the residual possibility
of the other (e.g. “this specimen is not a plant”). In other words, the combined possibilities
are the union of the possibilities of the two domains (e.g. all possible plants and all possible
animals).
In terms of the topology, the established possibilities of the combined domain are the
disjoint union of the established possibilities of the original domains and their natural topology
is the disjoint union topology (or co-product topology).

Definition 2.23. Two experimental domains DX and DY are incompatible if all verifi-
able statements in one are incompatible with all verifiable statements of the other. Formally,
sX ⌣̸ sY for each pair of verifiable statements sX ∈ DX and sY ∈ DY .

Corollary 2.24. Let DX and DY be two incompatible experimental domains. Then they
must be incomplete and admit a residual possibility.

Proof. Let BX and BY be countable bases for the respective domains. Since we have eX ⌣̸ eY for all choices of eX ∈ BX and eY ∈ BY , we also must have ⋁eX∈BX eX ⌣̸ ⋁eY∈BY eY . Therefore ⋁eX∈BX eX ≢ ⊺, which means the residual possibility x̊ = ⋀eX∈BX ¬eX = ¬⋁eX∈BX eX ≢ –. Therefore DX is not complete and, by symmetry, neither is DY .

Proposition 2.25. Let {DXi}∞i=1 be a countable set of pair-wise incompatible experimental domains and Ẋi their respective established possibilities. The set of established possibilities Ẋ of the combined experimental domain ⨉∞i=1 DXi consists of the disjoint union of the established possibilities of each domain. That is: Ẋ = ∐∞i=1 Ẋi = ⋃∞i=1 Ẋi with Ẋi ∩ Ẋj = ∅ for all i, j ≥ 1 and i ≠ j. Notationally, we write DX = D∐∞i=1 Xi .

⁴ In fact, this is what prompted us to introduce the residual possibility.

Proof. Consider two incompatible domains DX and DY . Let ẋ ∈ X and ẏ ∈ Y be two established possibilities. Then they both correspond to a minterm of the respective basis where at least one element is taken without negation. This also means that their conjunction will include the conjunction of one element of each of the bases. Since the elements of one basis are incompatible with the elements of the other, we have ẋ ⌣̸ ẏ.
Now let ẋ ∈ X be an established possibility, ẙ ∈ Y the residual possibility and Ẏ = Y ∖ {ẙ} the established possibilities. We have ẋ ≡ ẋ ∧ ⊺ ≡ ẋ ∧ ⋁y∈Y y ≡ ẋ ∧ (⋁ẏ∈Ẏ ẏ ∨ ẙ) ≡ ⋁ẏ∈Ẏ (ẋ ∧ ẏ) ∨ (ẋ ∧ ẙ) ≡ – ∨ (ẋ ∧ ẙ) ≡ ẋ ∧ ẙ. By symmetry, ẏ ≡ x̊ ∧ ẏ.
To conclude, let x̊ ∈ X and ẙ ∈ Y be the two residual possibilities. The conjunction x̊ ∧ ẙ corresponds to a minterm where all the elements of the basis are negated. Therefore x̊ ∧ ẙ is the residual possibility of the combined domain, if it is not impossible.
Generalizing to a countable set of incompatible domains, the conjunction of all the residual possibilities x̊ = ⋀∞i=1 x̊i is the residual possibility of the combined domain, if it is not impossible. The only other conjunctions of the form x = ⋀∞i=1 xi with xi ∈ Xi for all i ≥ 1 that are not impossible are those where only one element is not a residual possibility. Those correspond to the established possibilities. But each of those conjunctions will be equivalent to the only element that is not a residual possibility. Therefore each established possibility of the combined domain is equivalent to an established possibility of one of the original domains: Ẋ = ⋃∞i=1 Ẋi. Given that the established possibilities of two incompatible domains are incompatible and therefore different, we have Ẋi ∩ Ẋj = ∅ for all i, j ≥ 1 and i ≠ j. The established possibilities of the combined domain are the disjoint union of the established possibilities of the individual domains.

Definition 2.26. Let {(Xi, Ti)}∞i=1 be a countable set of topological spaces. Let X = ∐∞i=1 Xi be the disjoint union of the points. The disjoint union topology T is the topology for which U ∈ T if and only if U ∩ Xi ∈ Ti for all i ≥ 1.

Proposition 2.27. The disjoint union topology is generated by closing the topologies of
the initial spaces under disjoint union.

Proof. First we show that all disjoint unions of verifiable sets are part of the disjoint union topology. Let {(Xi, Ti)}∞i=1 be a countable set of topological spaces and (X, T) their disjoint union with the disjoint union topology. Any disjoint union of verifiable sets can be put in the form U = ∐∞i=1 Ui with Ui ∈ Ti for all i ≥ 1. We have U ∩ Xi = (∐∞j=1 Uj) ∩ Xi = Ui for all
i ≥ 1. Therefore U ∈ T by definition.
Now we show that any set in the disjoint union topology is the disjoint union of verifiable sets of the individual topologies. Let U ∈ T. Since U ⊆ X and X is the disjoint union of all Xi, we can write U = ∐∞i=1 Ui. As before, U ∩ Xi = Ui for all i ≥ 1 and, since U ∩ Xi ∈ Ti for all i ≥ 1 by definition of the disjoint union topology, Ui ∈ Ti for all i ≥ 1. Therefore any
U ∈ T is the disjoint union of verifiable sets.
This also shows that the disjoint union topology is a topology since it is closed under
arbitrary union by definition, it is closed under finite intersection within each topology and
it is closed under finite intersection across topologies since their intersection is always the
empty set.
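The membership test in Definition 2.26 is easy to state computationally: a set is open in the disjoint union exactly when its trace on each component is open there. A minimal Python sketch with hypothetical animal/plant components:

```python
# Each component: (point set, list of open sets). Toy data; the point
# sets are assumed disjoint, as the construction requires.
spaces = [
    ({"cat", "dog"}, [set(), {"cat"}, {"cat", "dog"}]),     # animal species
    ({"oak", "fern"}, [set(), {"fern"}, {"oak", "fern"}]),  # plant species
]

def is_open(U):
    """U is open in the disjoint union iff U ∩ Xi is open in each Ti."""
    return all((U & Xi) in Ti for Xi, Ti in spaces)

print(is_open({"cat", "fern"}))  # True: both traces are open
print(is_open({"cat", "oak"}))   # False: {"oak"} is not open in the plants
```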

Proposition 2.28. Let {DXi}∞i=1 be a countable set of pair-wise incompatible domains. The natural topology for the established possibilities of the combined experimental domain D∐∞i=1 Xi is the disjoint union topology of the natural topologies for the established possibilities of each domain.

Proof. Given that the domains are incompatible, they are also incomplete. The only statement in each domain DXi compatible with the respective residual possibility x̊i will be the certainty ⊺. This means the natural topology restricted to the established possibilities contains all the verifiable sets associated to the impossibilities and to all verifiable statements. This will also be true for the combined domain D∐∞i=1 Xi , if it is incomplete. If it is complete, then the established possibilities are the full possibilities and their natural topology coincides. Note that all verifiable statements in the combined domain can be generated from the verifiable statements of the individual domains. Also note that the conjunction between the different domains is an impossibility and therefore does not yield a new statement. Therefore all the statements whose verifiable sets form the topology on the established possibilities of the combined domain can be generated by the disjunction of all the statements whose verifiable sets form the topology on the established possibilities of the individual domains. Therefore the topology of the combined space is obtained by closing the individual topologies under disjoint union; it is therefore the disjoint union topology by 2.27.
For topologies, as well as for other mathematical structures, we can choose between two
types of products. If we have two one-dimensional Euclidean spaces we can decide whether to
take their product (i.e. the plane) or their co-product (i.e. the disjoint union of the two lines).
For experimental domains we do not choose: it is what it is. There is no way to combine the
two experimental domains for temperature and height of the same mercury column and get the
Cartesian product of their possibilities. Though combining two independent domains mimics
the categorical product and combining two incompatible domains mimics the categorical co-
product, it is the semantic (and ultimately physical) relationship between them that decides
which product we have. This is another case where the mathematical structures “forget” the
full equivalence. The topology only captures how the temperature and height can be measured,
and not whether they are independent, dependent or incompatible. That information lies
outside of the topology and therefore needs to be added by choosing the correct combination.

2.3 Experimental domain for experimental relationships


Now that we have seen how to describe relationships between domains, we should ask: are
experimental relationships themselves something we can experimentally verify? We may know
that there is a relationship between the temperature of a mercury column and its height, but
how can we confirm experimentally which one it is?
The main result of this section is to show that, given two related experimental domains, we
can always mathematically construct from them another experimental domain for which the
possibilities are continuous functions between the possibilities of the original domains. This
means that, since we can recursively create relationship domains about relationship domains,
the universe of discourse of our mathematical framework is closed. Yet, the availability of the
experimental tests (i.e. whether the statements we construct are actually verifiable) is not
guaranteed.
First of all we have to clarify within our framework what it means to experimentally
verify a relationship. Suppose DX is the domain for the state of a light switch, up or down
being the two possibilities, and DY is the domain for the state of the light, which can be
on or off. Suppose we could have three cases: the one where the switch is wired correctly,
up corresponding to on, the one where the switch is wired incorrectly, up corresponding to
off, and the one where the switch is not actually wired to the light, in which case the two domains are
independent. Each of these cases is represented by a different set of logical relationships, a
different set of possible assignments.
       X                Y
  up    down      on     off
   T      F        T       F
   F      T        F       T
Table 2.1: Case 1: switch wired correctly to the light

       X                Y
  up    down      on     off
   T      F        F       T
   F      T        T       F
Table 2.2: Case 2: switch wired incorrectly to the light

       X                Y
  up    down      on     off
   T      F        T       F
   T      F        F       T
   F      T        T       F
   F      T        F       T
Table 2.3: Case 3: switch not wired to the light

Each case, then, is a different model: a different logical context. To experimentally verify
which model is the right one means verifying statements of the type “it is possible for the
switch to be up while the light is on”. This does not correspond to “the switch is up” ∧ “the
light is on” but to “the switch is up” ⌣ “the light is on”. These are statements about the table
and they cannot be columns of the table itself. What we need to create is a metacontext, in
which each line corresponds to one choice of model, to one context.
Each line will tell us what statements are compatible and given the truth values of the
line we can reconstruct the model. The verifiable statements that distinguish between differ-
ent models, therefore, live in this metacontext and so will the experimental domain for the
relationships.
162 CHAPTER 2. DOMAIN COMBINATION AND RELATIONSHIPS

           up ⌣ on   down ⌣ on   up ⌣ off   down ⌣ off
Case 1        T          F           F           T
Case 2        F          T           T           F
Case 3        T          T           T           T
Table 2.4: Possible assignments for the metacontext
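Table 2.4 can be computed mechanically from Tables 2.1-2.3: each case is a set of possible assignments, and “s1 ⌣ s2” holds in a case exactly when some assignment makes both statements true. A small Python sketch of that computation (the tuple encoding is our own):

```python
# Possible assignments over (up, down, on, off) for each case, taken
# from Tables 2.1-2.3; 1 stands for T and 0 for F.
cases = {
    "Case 1": [(1, 0, 1, 0), (0, 1, 0, 1)],              # wired correctly
    "Case 2": [(1, 0, 0, 1), (0, 1, 1, 0)],              # wired incorrectly
    "Case 3": [(1, 0, 1, 0), (1, 0, 0, 1),
               (0, 1, 1, 0), (0, 1, 0, 1)],              # not wired
}
idx = {"up": 0, "down": 1, "on": 2, "off": 3}

def compatible(case, s1, s2):
    """s1 ⌣ s2: some possible assignment makes both statements true."""
    return any(a[idx[s1]] and a[idx[s2]] for a in cases[case])

pairs = [("up", "on"), ("down", "on"), ("up", "off"), ("down", "off")]
for case in cases:
    row = ["T" if compatible(case, s1, s2) else "F" for s1, s2 in pairs]
    print(case, row)  # reproduces the rows of Table 2.4
```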

Definition 2.29. Let S1 and S2 be two logical contexts. Let S1 ⊆ S1 and S2 ⊆ S2 be sets of statements. We say f ∶ S1 → S2 is a logic homomorphism if it preserves the logical structure. That is:

• possible assignments on S2 correspond to possible assignments on S1 : if a2 ∈ AS2 then there exists a1 ∈ AS1 such that a1 (s) = a2 (f (s)) for all s ∈ S1
• a verifiable statement of S1 is mapped to a verifiable statement in S2

Additionally, if there exists a logic homomorphism g ∶ S2 → S1 such that g ○ f = IdS1 and f ○ g = IdS2 then f is a logic isomorphism and g is the inverse of f . Two logic isomorphisms f ∶ S1 → S2 and g ∶ S1 → S2 are equivalent if f (s) ≡ g(s) for all s ∈ S1 .

Corollary 2.30. Let h ∶ S1 → S2 be a logic homomorphism. Then for every ŝ ∈ S1 , Ŝ ⊆ S1 and fB ∶ BŜ → B such that a1 (ŝ) = fB ({a1 (s)}s∈Ŝ ) for all a1 ∈ AS1 , we have a2 (h(ŝ)) = fB ({a2 (h(s))}s∈Ŝ ) for all a2 ∈ AS2 . In particular:

• h(¬s) ≡ ¬h(s) for all s ∈ S1
• h(⋀s∈S s) ≡ ⋀s∈S h(s) for all S ⊆ S1
• h(⋁s∈S s) ≡ ⋁s∈S h(s) for all S ⊆ S1

Proof. If a1 (ŝ) = fB ({a1 (s)}s∈Ŝ ) for all a1 ∈ AS1 then, by the first property of logic
homomorphisms, there is no a2 ∈ AS2 such that a2 (h(ŝ)) ≠ fB ({a2 (h(s))}s∈Ŝ ). Therefore
a2 (h(ŝ)) = fB ({a2 (h(s))}s∈Ŝ ) for all a2 ∈ AS2 .
In particular, since the relationship holds for any arbitrary fB , it will hold for negation,
arbitrary conjunction and arbitrary disjunction.

Proposition 2.31. Let S1 and S2 be two logical contexts. Let S1 ⊆ S1 and S2 ⊆ S2 be sets
of statements and f ∶ S1 → S2 a logic homomorphism. Let S̄1 ⊆ S1 and S̄2 ⊆ S2 be the
closure of S1 and S2 respectively under arbitrary conjunction, arbitrary disjunction and
negation. Then if there exists a logic homomorphism f̄ ∶ S̄1 → S̄2 such that f̄(s1 ) ≡ f (s1 ) for all s1 ∈ S1 , it is unique.

Proof. Let s̄1 ∈ S̄1 but s̄1 ∉ S1 . Then s̄1 depends on S1 through some gB ∶ BS1 → B. Let f̄1 ∶ S̄1 → S̄2 and f̄2 ∶ S̄1 → S̄2 be two logic homomorphisms such that f̄1 (s1 ) ≡ f̄2 (s1 ) ≡ f (s1 ) for all s1 ∈ S1 . Then by 2.30 both f̄1 (s̄1 ) and f̄2 (s̄1 ) depend on S2 through gB . This means that in every possible assignment a ∈ AS̄2 we have a(f̄1 (s̄1 )) = a(f̄2 (s̄1 )) and therefore f̄1 (s̄1 ) ≡ f̄2 (s̄1 ). Then f̄1 and f̄2 are equivalent.

Definition 2.32. Let DX , DY ⊆ S be two experimental domains. The relationship metacontext SC(X,Y ) between DX and DY over {Si }i∈I is defined by the following construction. Let I be an index set. Let {Si }i∈I be an indexed set of logical contexts, let {DXi }i∈I and {DYi }i∈I be two indexed sets of experimental domains such that DXi , DYi ⊆ Si for all i ∈ I. Let {fi }i∈I be a set of logic isomorphisms such that fi ∶ DX → DXi . Let {gi }i∈I be a set of logic isomorphisms such that gi ∶ DY → DYi .
Let S be the set of ordered pairs of DX and DY and “sx ⌣ sy ” be the notation for an element of S for some sx ∈ DX and sy ∈ DY . For each i ∈ I, let ai ∈ BS be such that ai (“sx ⌣ sy ”) = true if and only if fi (sx ) ⌣ gi (sy ). Define AS = {ai ∣ i ∈ I}. Assume ai ≠ aj if i ≠ j; if not, given a subset of I that leads to the same possible assignment, remove all elements but one. Pick t ∈ I and define truth = at .
Given an arbitrary function fB ∶ BS → B, we can extend the set S with a new element for which the possible assignments are calculated according to the function. The relationship metacontext SC(X,Y ) is the closure of S under all possible such functions fB .
Let Sv ⊆ S be the set of statements “sx ≼ sy ” = “sx ⌣̸ ¬sy ” = ¬“sx ⌣ ¬sy ” such that sx ∈ DX and sy ∈ DY are verifiable. The set of verifiable statements Sv ⊆ S for the relationship context is the set of statements generated by Sv using finite conjunction and countable disjunction.

Justification. By construction, the relationship context satisfies the axioms for a logical context. We have a set of elements and a set of possible assignments of which one is the truth; this satisfies axioms 1.2 and 1.4. Some of the statements are verifiable and satisfy 1.27. Closure for axioms 1.9, 1.31 and 1.32 is satisfied by construction. We need to clarify why and in what cases this construction makes physical sense.
The set {Si }i∈I consists of different contexts, each containing a copy of the domains DX and DY with a different relationship between them. For example, if DX is the domain for a light switch, up or down being the possibilities, and DY is the domain for the light, being on or off, in one context the two domains may have a causal relationship (i.e. the switch is wired to the light), in another they may have a different causal relationship (i.e. the switch is wired in the wrong way), and in yet another they may be independent (i.e. the switch is not wired to the light). The logic isomorphisms {fi }i∈I and {gi }i∈I allow us to relate the statements that represent the same assertion in the different contexts.
Note that while mathematically we can always construct the relationship metacontext,
it may not exist in reality. Specifically, there is nothing that guarantees that statements of
the form “sx ≼ sy ” are actually verifiable.

Now that we have the right context, we have to understand what is the right basis to
generate the experimental domain. Suppose we have a way to verify experimentally statements
of the type “the temperature of the mercury column is between 24 and 25 Celsius” ≼ “the height of the mercury column is between 24 and 25 millimeters”. That is, we can verify that whenever
sX =“the temperature of the mercury column is between 24 and 25 Celsius” is verified, then
sY =“the height of the mercury column is between 24 and 25 millimeters” is verified. In that
case, we can explore the connection between the two domains within different ranges at ever
increasing precision. This means we can narrow the range of possible functions in the same
way that we can narrow the range of possible values for a quantity. These types of statements,
then, form an experimental domain where each possibility corresponds to a possible continuous
function between the two initial experimental domains.

Definition 2.33. Let SC(X,Y ) be the relationship metacontext between DX and DY over {Si }i∈I . Let DYi ⊆ DXi for all i ∈ I. Let BX ⊆ DX , BY ⊆ DY be two countable bases of the respective domains and let B = {“ex ≼ ey ” ∣ ex ∈ BX , ey ∈ BY }. Then the relationship domain DC(X,Y ) is the experimental domain generated by B.

Proof. The only thing that needs to be checked is that the basis that generates DC(X,Y ) is countable. Since BX and BY are countable, the set of statements of the form “ex ≼ ey ” with ex ∈ BX and ey ∈ BY is also countable.

Proposition 2.34. The possibilities for a relationship domain DC(X,Y ) coincide with the possible experimental relationships between X and Y . That is, for each possibility z of DC(X,Y ) there exists a continuous function fz ∶ X → Y such that z ≡ ⋀x∈X “x ≼ fz (x)” and an i ∈ I such that ḡi (fz (f̄i−1 (xi ))) ≡ fc (xi ) for all xi ∈ Xi , where fc is the causal relationship between DXi and DYi .

Proof. First we show that for each possibility z of the relationship domain there exists a set Pz ⊆ X × Y such that z ≡ ⋀(x,y)∈Pz “x ⌣ y” ∧ ⋀(x,y)∉Pz ¬“x ⌣ y”. Let z be a possibility of a relationship domain. This will be a minterm of the basis, therefore z ≡ minterm(B). Let x ∈ X and y ∈ Y be two possibilities and consider the statement “x ⌣ y”. If z is true in a possible assignment, then the truth value for all statements of the form “ex ≼ ey ” where (ex , ey ) ∈ BX × BY will be set. But x ⌣̸ y if and only if there exists at least one pair (ex , ey ) ∈ BX × BY such that x ≼ ex ≼ ey and y ⌣̸ ey . Therefore if z is true in an assignment it will tell us whether x ⌣̸ y is true or not, so we either have z ≼ “x ⌣ y” or z ≼ ¬“x ⌣ y”. Let Pz = {(x, y) ∈ X × Y ∣ z ≼ “x ⌣ y”} and ẑ = ⋀(x,y)∈Pz “x ⌣ y” ∧ ⋀(x,y)∉Pz ¬“x ⌣ y”. Then z ≼ ẑ.
Conversely, if we suppose ẑ to be true in an assignment, then we will know the truth assignment for all statements of the form “x ⌣ y” where (x, y) ∈ X × Y . Let (ex , ey ) ∈ BX × BY and consider the statement “ex ≼ ey ”. We have ex ≼ ey if and only if there is no pair (x, y) such that x ≼ ex , y ⌣̸ ey and x ⌣ y. Therefore if ẑ is true in an assignment it will tell us whether ex ≼ ey is true or not, so we either have ẑ ≼ “ex ≼ ey ” or ẑ ≼ ¬“ex ≼ ey ”. As z is a minterm of statements of the form “ex ≼ ey ”, we’ll either have ẑ ≼ z or ẑ ⌣̸ z. Since z ≼ ẑ, ẑ ⌣ z and therefore ẑ ≼ z. Since we also have z ≼ ẑ, we have z ≡ ẑ.
Now we want to show that for each possibility z there is a function fz ∶ X → Y such that z ≡ ⋀x∈X “x ≼ fz (x)”. We note that statements of the form “sx ⌣ sy ” are precisely the statements used to generate the relationship metacontext. Statements of the form “x ⌣ y”, where x ∈ X and y ∈ Y , are enough to determine the truth value of statements of the form “sx ⌣ sy ”. Therefore for each possibility z there is one and only one possible truth assignment in which z is true. That is, each z corresponds to a particular logical context Si . Since in that context DYi ⊆ DXi , there will be a causal relationship fi ∶ Xi → Yi such that xi ≼ fi (xi ), which means xi ⌣ yi if and only if yi = fi (xi ). Since causal relationships are unique and the possible assignments are distinct, to each z will correspond a unique fz ∶ X → Y such that ḡi ○ fz ○ f̄i−1 = fi and z ≡ ⋀x∈X “x ≼ fz (x)”. Moreover fz is continuous since f̄i and ḡi are logic isomorphisms.
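Proposition 2.34 says the possibilities of the relationship domain pick out continuous functions. For finite spaces the continuity requirement can be checked by brute force, as in the following Python sketch; the two Sierpinski-like spaces are invented for illustration, and only the continuity filter is modeled, not the full metacontext.

```python
from itertools import product

X = ["x1", "x2"]
TX = [frozenset(), frozenset({"x1"}), frozenset({"x1", "x2"})]
Y = ["y1", "y2"]
TY = [frozenset(), frozenset({"y1"}), frozenset({"y1", "y2"})]

def continuous(f):
    """The preimage of every open set of Y must be open in X."""
    return all(
        frozenset(x for x in X if f[x] in V) in TX
        for V in TY
    )

# Enumerate every function f: X -> Y and keep the continuous candidates,
# i.e. the only functions a possibility z of D_C(X,Y) could pick out.
for values in product(Y, repeat=len(X)):
    f = dict(zip(X, values))
    if continuous(f):
        print(f)  # three of the four functions survive
```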

Note that the definition for the metacontext simply declares some statements to be verifiable: it does not guarantee that this is actually possible. So when can we do it? Or better,
what do we need in practice to be able to build the necessary confidence? Let’s think how we
would test the relationship between temperature and height of a mercury column. We would
prepare different mercury samples with different values of temperatures in many different
conditions, measure height and temperature with different devices, repeat many times, ask
someone else to do it independently, compare results, and so on. At some point we will have
explored enough of the possible cases, checked and tried anything that could invalidate the
result and we will have the confidence to say that “if the temperature of the mercury column
is between 24 and 25 Celsius then its height is between 24 and 25 millimeters”. We should
stress that the procedure is not at all to observe a few values and then generalize. It is not
mere induction. It is the ability to prepare and control the system in many different conditions
and our inability to violate the relationship that gives us the confidence needed to reach the
conclusion.
Suppose, in fact, that we want to experimentally verify the link between inflation and
money supply. As long as we cannot create new countries in different economic conditions,
the only thing we can do is gather data for as many nations as we can throughout history.5
Since we cannot purposely explore different conditions and we can’t even replicate older ones,
the best we can do is show that there was a correlation between those specific values. It is the
inability to freely and fully explore the problem space that may not enable us to experimentally
verify the causal relationship.6
The point is that we cannot give a general purpose algorithm for how to construct ex-
perimental tests for the relationships starting from the experimental tests of the individual
domains. We cannot formalize in general what it means that we have explored the space
“enough” to consider the relationship verified, no more than we can formalize when the data
collected is “enough” to consider a statement verified or when a statement is specific “enough”
to consider the semantics well defined. This is where the practice of experimental science comes
in.
But, while we may not know in general whether we can experimentally verify a rela-
tionship between two specific domains, we do know that the relationship domain can always
be constructed in principle and therefore our mathematical framework is complete. That is:
we can take two experimental domains (e.g. Dt for time and Dx for position), construct a
relationship domain between them (e.g. Dx(t) for trajectories in space), then take another
experimental domain (e.g. D(q,p) for the states) and construct another relationship domain
between the two (e.g. D(q,p)→x(t) for the relationship between states and trajectories). Our gen-
eral mathematical theory of experimental science is therefore closed, since we can recursively
create relationship domains about relationship domains indefinitely while remaining within
its bounds. In other words, experimentally distinguishable objects and their relationships will
never lie outside of the theory.

⁵ Computer simulations can sometimes alleviate this problem, though they are only as good as the model one uses.
⁶ In this sense, experimental sciences allow for more rigorous results than observational sciences precisely because it is more feasible to experimentally test relationships between domains.

2.4 Summary
In this chapter we have seen the first important set of consequences of our general mathemat-
ical theory of experimental science. We have seen that experimental relationships between
domains can be defined in terms of inference between verifiable statements or equivalently
in terms of causal relationship between possibilities. The causal relationship always corre-
sponds to a continuous function in the natural topology. Moreover, it will need to preserve
any additional mathematical structure that formalizes a physical characteristic of the domain.
We have also seen how to combine a set of countably many experimental domains and how
the combined possibilities depend on the logical and semantic relationships that exist across
domains. Depending on the case, we can have the Cartesian product of the possibilities (with the
product topology) all the way to the disjoint union of the established possibilities (with the
disjoint union topology).
We have shown that it is possible to construct experimental domains for the relationships
themselves. This means that within our theory we can describe relationships of arbitrarily
higher order (i.e. relationships about relationships) and therefore we never go outside of our
framework.
As we have only explored simple constructions based on the original concepts, these con-
clusions are of a general nature and therefore must apply to any area of science.
Chapter 3

Properties and quantities

In this chapter we will introduce the idea of properties, which we use to label the possibilities
of an experimental domain. For example, we may use names to distinguish among people, fur
color to distinguish among cats, pressure to distinguish the state of a gas. In particular, we’ll
define a quantity as a property that has a magnitude: we can compare two different values
and determine which one is greater or smaller.
Instead of simply assuming we have quantities, we will need to construct them from the
verifiable statements of an experimental domain. To do that, we will prove three theorems that
will give necessary and sufficient conditions for that construction, clarifying how quantities are
defined experimentally. These theorems will allow us to understand how the ordering of the
values for a quantity are linked to the logical relationships between the verifiable statements,
and in particular how the ordering of values is linked to the ordering defined by narrowness.
After characterizing quantities in general, we will see the cases of discrete and contin-
uous quantities. We will see how discrete quantities are linked to decidability, the ability
to verify both a statement and its negation, while continuous quantities are linked to a lack
thereof. We will then discuss how the requirements for continuous quantities cannot be truly
realized in practice and what happens when that idealization fails.

3.1 Properties
Whether the result of direct observation or not, a physical object is most often identified by
its properties and their values. A person may be identified by name and date of birth; a bird
may be characterized by the color of its beak; the state of a particle may be qualified by its
position and momentum.
In this section we define how in general we can use properties to label the possibilities of an
experimental domain. Each property will specify a continuous function from the possibilities
to the topological space of the possible property values. Some properties may fully identify
each possibility, as they assign distinct values, and some may also fully characterize which
statements are verifiable.
Suppose DX is a domain for animal identification and X, its possibilities, are all animal
species. Providing good names and definitions for species is a whole scientific subject by itself
(i.e. taxonomy) with its own rules (i.e. the International Code of Zoological Nomenclature).
The ICZN assigns each species a name composed of two Latin words. For example, “Passer
domesticus” is the official name for the house sparrow while “Passer italiae” is the one for

the Italian sparrow. This means that identifying an animal species is equivalent to identifying
its name. Therefore the domain to identify the species name DQ ≡ DX is equivalent to the
one for identifying the actual species and we have an experimental relationship q ∶ X → Q
between them. As each species is given a unique name this is the most discriminating type
of property: one that covers the whole range of possibilities and fully identifies each of them.
But this is a special case.
The ICZN also assigns a genus (pl. genera) to each species, which corresponds to the
first part of the species name. For example, both aforementioned sparrows are of the genus
“Passer” while all swans, black or white, are of the genus “Cygnus”. Now suppose that Q
is the set of all names for all genera and q ∶ X → Q is the function that gives us the genus
name for each species. We can still use this as a label for our possibilities, but it does not
fully identify them.
On a different note, we could decide to distinguish species by their morphological at-
tributes. For example, if Q is the set of all colors, we can imagine a function that gives us
the beak color for each species. The function is partial, not defined on all elements, as not all
species have a beak or its color may not be unique. We have q ∶ U → Q where U ⊆ X is the
set where the property is defined. To be consistent, though, we must at least be able to tell
in what cases the property is actually defined. For example, we must be able to tell whether
an animal has a beak. Therefore there must be a verifiable statement that is true if and only
if the property is well defined. The set U , then, must correspond to the verifiable set of said
statement.
Other examples of properties include: postal addresses for buildings, tax ID numbers for
people, generation for fundamental particles, position of the center of mass of a body. In all
these cases, the general pattern is that we assign a label to a possibility from an established
set.

Definition 3.1. A property for an experimental domain DX is any attribute we can use
to distinguish between its possibilities. Formally, it is a tuple (Q, q) where Q is a topological
space and q ∶ U → Q is a continuous function where U ∈ TX is a verifiable set of possibilities.
Justification. Suppose q is a physical property that can be assigned to distinguish the
set of possibilities U ⊆ X. We must be able to tell experimentally whether the property is
defined for each possibility. Therefore we are justified to assume that U is a verifiable set.
Let Q be the set of possible values for the property. Each element of U will be matched with a possible value. Therefore we are justified to assume there is a function q ∶ U → Q that specifies that mapping.
As we must be able to experimentally distinguish between the values in Q, there must
be a collection TQ of verifiable sets that corresponds to the possible verifiable statements
associated to the quantity. Therefore we are justified to assume Q is a topological space.
Moreover, since the property value is determined by each possibility, the function q estab-
lishes a causal relationship between the two and therefore we are justified to assume q is a
continuous function.
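As a minimal illustration of Definition 3.1, the beak-color example can be sketched as a partial function q ∶ U → Q, where U is the verifiable set on which the property is defined. The species names and color values below are placeholders of our own choosing.

```python
# All possibilities of the (hypothetical) identification domain.
X = {"Passer domesticus", "Passer italiae", "Cygnus olor", "Felis catus"}

# U: the verifiable set where the property is defined ("has a beak").
U = {"Passer domesticus", "Passer italiae", "Cygnus olor"}

# q : U -> Q, the (made-up) beak color assigned to each species in U.
q = {
    "Passer domesticus": "black",
    "Passer italiae": "black",
    "Cygnus olor": "orange",
}

def beak_color(x):
    """The property value, defined only on the verifiable set U."""
    if x not in U:
        raise ValueError(f"beak color is undefined for {x}")
    return q[x]

print(beak_color("Cygnus olor"))  # 'orange'
# Note: this property does not fully identify the possibilities, since
# both sparrows are mapped to the same value "black".
```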

Note that we have not defined a property as an experimental domain. The possibility of
the domain for a property would be a statement like “the species name for this animal is
Passer domesticus” while the value is just the object “Passer domesticus”. The idea is that
the property and its values are defined independently of the particular object. For example,
distance in meters is defined independently of whether we later want to measure the earth
diameter or its distance to the moon.¹
¹ While in principle we could envision a formal construction of properties and their values purely built on top of the notion of statement (e.g. define the value red to be the set of all statements that claim that something is of that color) we are not going to do so. While appealing in some respects, it is not clear whether such a construction would actually help to clarify underlying assumptions while it is clear it would distract from other more critical issues.
The fact that the property and its values are defined more generally leads to some subtle
issues: even if each possibility of the domain corresponds to one property value and each
verifiable statement of the property is equivalent to a verifiable statement of the domain, the
reverse may not be true. Consider the experimental domain for identifying negatively charged
fundamental particles. In this case, each particle will correspond to a unique value of mass,
and by measuring the mass we can infer the particle. But for each possible value of mass
we will not have a possible fundamental particle. Moreover, the verifiable statement “this
particle is an electron” would correspond to an infinitely precise statement of the form “the
mass of the particle is exactly 510998.9461... eV” which is not verifiable. The reason is that
identifying a negatively charged particle is not the same as measuring its mass with arbitrary
precision: in the first case we can stop when there is only one possible particle within the
finite range of our measurement. Now consider the experimental domain for the position of
the center of mass of a particle along a particular direction. Not only is the distance from a
fixed point enough to identify the position, but for each value of the distance we also have a
possible position of the particle.
We say that a property fully characterizes an experimental domain if each possible value
of a property corresponds to a possibility of the domain, like in the case of the distance from
a reference and position of a particle. The idea is that it tells us what cases are possible
and how they are experimentally verified. Instead, we say that a property fully identifies an
experimental domain if each possibility corresponds to a unique value, like in the case of the
mass of a negatively charged particle. The idea is that the value of the property allows us to
uniquely identify the possibility but nothing more.

Definition 3.2. The possibilities of an experimental domain are fully identified by a


property if its value is enough to uniquely determine a possibility. Formally, let (Q, q) be
a property for an experimental domain DX . Then the possibilities X are fully identified if
q ∶ X → Q is injective.

Definition 3.3. An experimental domain is fully characterized by a property if all


verifiable statements in the domain correspond to verifiable statements on the property and
vice-versa. Formally, let (Q, q) be a property for an experimental domain DX . Then it is
fully characterized if q ∶ X → Q is a homeomorphism.
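Full identification (Definition 3.2) can be checked mechanically on finite toy data: it is just injectivity of the property map. A Python sketch with approximate, illustrative particle masses (in MeV); full characterization (Definition 3.3) would additionally require the topologies to match, which this sketch does not model.

```python
# Approximate masses of negatively charged leptons, in MeV (illustrative).
q = {"electron": 0.511, "muon": 105.7, "tau": 1776.9}

def fully_identifies(prop):
    """Definition 3.2: no two possibilities share a property value."""
    values = list(prop.values())
    return len(values) == len(set(values))

print(fully_identifies(q))  # True: the mass singles out the particle
```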

3.2 Quantities and ordering


We now focus our attention on those properties that can be quantified. The number of people
in a group can be quantified by an integer, the distance between two objects can be quantified
in meters, the force acting on a body can be quantified by the magnitude of a vector expressed

in Newtons. The defining characteristic for a quantity is that a value can be greater, smaller
or equal to another.
In this section we will define a quantity as a property with a linear order which we assume
given a priori, and study its relationship with the experimental domain. We will define the
order topology, which assumes one can experimentally verify whether a value is before or
after another one. We will see that a domain is fully characterized by a quantity only if
the possibilities themselves can be ordered, and how this ordering, in the end, is uniquely
characterized by statement narrowness: 10 is less than 42 because “the quantity is less than
10” is narrower than “the quantity is less than 42”.
As the defining characteristic for a quantity is the ability to compare its values, then the
values must be ordered in some fashion from smaller to greater. Therefore, given two different
values, one must be before the other. Mathematically, we call linear order an order with such
a characteristic as we can imagine the elements positioned along a line. Note that vectors are
not linearly ordered: no direction is greater than the other. Therefore, in this context, a vector
will not strictly be a quantity but a collection of quantities.2
We also have to define how this order can be experimentally verified. The idea is that we
should, at least, be able to verify that the value of a given quantity is before or after a set
value. This allows us to construct bounds such as “the mass of the electron is 511 ± 0.5 keV”
which we take to be equivalent to “the mass of the electron is more than 510.5 keV but less
than 511.5 keV”.3 For integers, this also allows us to verify particular numbers as “the earth
has one natural satellite” is equivalent to the “the earth has more than zero natural satellites
and fewer than two”. Therefore we will define the order topology as the one generated by sets
of the type (a, ∞) and (−∞, b).
A quantity, then, is an ordered property with the order topology.

Definition 3.4. A linear order on a set Q is a relationship ≤∶ Q × Q → B such that:

1. (antisymmetry) if q1 ≤ q2 and q2 ≤ q1 then q1 = q2


2. (transitivity) if q1 ≤ q2 and q2 ≤ q3 then q1 ≤ q3
3. (total) at least q1 ≤ q2 or q2 ≤ q1

A set together with a linear order is called a linearly ordered set.

Definition 3.5. Let (Q, ≤) be a linearly ordered set. The order topology is the topology
generated by the collections of sets of the form:

(a, ∞) = {q ∈ Q ∣ a < q} , (−∞, b) = {q ∈ Q ∣ q < b}.

² In other languages, there are two words to differentiate quantity as in “physical quantity” (e.g. grandezza, Grösse, grandeur) and as in “amount” (e.g. quantità, Menge, quantité). It is the second meaning of quantity that is captured here.
³ The sentence “the mass of the electron is 511 ± 0.5 keV” could be referring to statistical uncertainty instead of an accuracy bound, and would constitute a different statement when that different meaning is attached. We will be treating these types of statistical statements later in the book, but suffice it to say that they cannot be defined before statements that identify bounds.
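On a finite linearly ordered set the generating sets of Definition 3.5 can be listed directly; open intervals then arise as finite intersections of the two kinds of rays. A short Python sketch with toy values:

```python
Q = [0, 1, 2, 3]  # toy quantity values, already sorted by <=

# Generating sets of the order topology: the two families of rays.
rays = []
for a in Q:
    rays.append(frozenset(q for q in Q if q > a))  # (a, +inf)
    rays.append(frozenset(q for q in Q if q < a))  # (-inf, a)

# Finite intersections of rays yield open intervals; together with
# arbitrary unions they generate the whole order topology.
interval = frozenset(q for q in Q if q > 0) & frozenset(q for q in Q if q < 3)
print(sorted(set(rays), key=len))  # the distinct rays
print(set(interval))               # {1, 2}, i.e. the interval (0, 3)
```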

Definition 3.6. A quantity for an experimental domain DX is a linearly ordered property.


Formally, it is a tuple (Q, ≤, q) where (Q, q) is a property, ≤∶ Q × Q → B is a linear order
and Q is a topological space with the order topology with respect to ≤.
As for properties, the quantity values are just symbols used to label the different cases. The
set Q may correspond to the integers, real numbers or a set of words ordered alphabetically.4
The units are not captured by the numbers themselves: they are captured by the function q
that allows us to map statements to numbers and vice-versa.
⁴ When consulting the dictionary, we use the fact that we can experimentally tell whether the word we are looking for is before or after the one we randomly selected.
As we want to understand quantities better, we concentrate on those experimental domains
that are fully characterized by a quantity. For example, the domain for the mass of a system
will be fully characterized by a real number greater than or equal to zero. Each possibility
will be identified by a number which will correspond to the mass expressed in a particular
unit, say in kg. As the values of the mass are ordered, we can also say that the possibilities themselves are ordered. That is, “the mass of the system is 1 kg” precedes “the mass of the system is 2 kg”. This ordering of the possibilities will be linked to the natural topology, as “the mass of the system is less than 2 kg”, the disjunction of all possibilities that come before a particular possibility, is a verifiable statement.
We call a natural order for the possibility a linear order on them such that the order
topology is the natural topology. An experimental domain is fully characterized by a quantity
if and only if it is naturally ordered and that quantity is ordered in the same way: it is order
isomorphic. In other words, we can only assign a quantity to an experimental domain if it
already has a natural ordering of the same type.

Definition 3.7. Let DX be an experimental domain and X its possibilities. We say ≤∶


X × X → B is a natural order for the possibilities of the domain if the order topology
constructed from that ordering coincides with the natural topology. We say the domain and
the possibilities are naturally ordered if they admit a natural order.

Definition 3.8. Let (Q, ≤) and (P, ≤) be two ordered sets. A function f ∶ Q → P is in-
creasing if for every q1 , q2 ∈ Q, q1 ≤ q2 implies f (q1 ) ≤ f (q2 ). It is an order isomorphism
if it is bijective and, for every q1 , q2 ∈ Q, q1 ≤ q2 if and only if f (q1 ) ≤ f (q2 ).
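
A minimal sketch of these two notions on finite sets (the sets Q and P and the map f below are our own illustrative choices):

    # Sketch: checking "increasing" and "order isomorphism" mechanically.
    def is_increasing(f, Q, leq_Q, leq_P):
        return all(leq_P(f[q1], f[q2])
                   for q1 in Q for q2 in Q if leq_Q(q1, q2))

    def is_order_isomorphism(f, Q, P, leq_Q, leq_P):
        bijective = len(set(f.values())) == len(Q) == len(P)
        both_ways = all(leq_Q(q1, q2) == leq_P(f[q1], f[q2])
                        for q1 in Q for q2 in Q)
        return bijective and both_ways

    Q, P = [0, 1, 2], ["a", "b", "c"]
    f = {0: "a", 1: "b", 2: "c"}                    # order preserving bijection
    leq = lambda a, b: a <= b
    print(is_increasing(f, Q, leq, leq))            # True
    print(is_order_isomorphism(f, Q, P, leq, leq))  # True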

Theorem 3.9 (Property ordering theorem). An experimental domain DX is fully charac-


terized by a quantity (Q, ≤, q) if and only if it is naturally ordered and the possibilities are
order isomorphic to the quantity.

Proof. Suppose DX is fully characterized by a quantity (Q, ≤q , q). Since q is a bijection,


we can define an ordering ≤x ∶ X × X → B such that x1 ≤x x2 if and only if q(x1 ) ≤q q(x2 ).
As q is now an order isomorphism, sets of the type (q1 , ∞) will be mapped to (q⁻¹(q1 ), ∞),
and sets of the type (−∞, q2 ) will be mapped to (−∞, q⁻¹(q2 )) for all q1 , q2 ∈ Q. As q is also
a homeomorphism, it will map a basis of one space to and only to a basis of the other. This
means the collection of sets of the type (x1 , ∞) and (−∞, x2 ) form a basis, and therefore
the ordering ≤x is a natural order.

4. When consulting the dictionary, we use the fact that we can experimentally tell whether the word we are looking for is before or after the one we randomly selected.

We can run the argument in reverse, by assuming we have a naturally ordered exper-
imental domain, finding a set Q that is order isomorphic to the possibilities and showing
that the topology induced by the isomorphism is the order topology.
Now that we have established that the linear ordering is something already present in the
domain, we can show it is actually based on the ordering of verifiable statements in terms
of narrowness. That is, “the mass of the system is 1 Kg” is before “the mass of the system
is 2 Kg” precisely because “the mass of the system is less than 1 Kg” is narrower than “the
mass of the system is less than 2 Kg”. The set of statements Bb of the form “the mass of the
system is less than q1 Kg” are linearly ordered by narrowness, and their ordering is the same
as the ordering of the possibilities/values. Similarly, the set of statements Ba of the form “the
mass of the system is more than q1 Kg” is also ordered by narrowness but with the reverse
ordering of the possibilities/values. These are the very statements whose verifiable sets define
the order topology and therefore jointly constitute a basis for the experimental domain.
Now consider the statement s1 =“the mass of the system is less than or equal to 1 Kg”
with s2 =“the mass of the system is less than 1 Kg”. We have s2 ≼ s1 . In fact, if we replace
the value in s2 with anything less than 1 Kg we’ll still have s2 ≼ s1 . Instead if we use a value
greater than 1 Kg we’d have s1 ≼ s2 . In other words, if we call B the set that includes both
the less-than-or-equal and less-than statements, this is also linearly ordered by narrowness.
But “the mass of the system is less than or equal to 1 Kg” is equivalent to ¬“the mass of the
system is greater than 1 Kg”. In other words, B = Bb ∪ ¬(Ba ) contains all the statements like
“the mass of the system is less than q1 Kg” and ¬“the mass of the system is more than q1
Kg” and these are all linearly ordered by narrowness.
The ordering of B can be further characterized. Note that s1 =“the mass of the system
is less than or equal to 1 Kg” is the immediate successor of s2 =“the mass of the system is
less than 1 Kg”. That is, they are different and there can’t be any other statement in B that
is broader than s2 but narrower than s1 since they differ by a single case. This will happen
for any mass value. So B is composed of two exact copies of the ordering of X, where each
element of one copy is immediately followed by an element of the other copy. Moreover, if a
statement in B has an immediate successor, there must be only one case that separates the
two. If we call q1 the value of that case, then the statement must be of the form “the mass
of the system is less than q1 Kg” while its immediate successor is of the form “the mass of
the system is less than or equal to q1 Kg”: the successor is broader by just the possibility
associated with q1 . Therefore statements in B that have an immediate successor must be in
Bb as well.
The main result is that the above characterization of the basis of the domain is necessary
and sufficient to order the possibilities. If an experimental domain has a basis composed of
two parts Bb and Ba such that B = Bb ∪ ¬(Ba ) is linearly ordered by narrowness with those
characteristics, then the experimental domain is naturally ordered. The possibilities can be
written as x ≡ ¬sb ∧ ¬sa where sb ∈ Bb and ¬sa is the immediate successor. In fact, “the mass
of the system is 1 Kg” is equivalent to “the mass of the system is not less than 1 Kg and
not more than 1 Kg”. Note that, because of negation, the possibilities may not in general be
verifiable.

Definition 3.10. Let DX be a naturally ordered experimental domain and X its possibilities. We define the following notation:

“x < x1 ” = ⋁_{x∈X ∣ x<x1} x,  “x > x1 ” = ⋁_{x∈X ∣ x>x1} x,  “x ≤ x1 ” = ⋁_{x∈X ∣ x≤x1} x,  “x ≥ x1 ” = ⋁_{x∈X ∣ x≥x1} x.

That is, they represent statements like “the value of the quantity x is less than x1 ”.

Corollary 3.11. Based on the previous notation, “x ≤ x1 ” ≡ ¬“x > x1 ” and “x ≥ x1 ” ≡


¬“x < x1 ”.

Proof. Note that “x ≤ x1 ” ∨ “x > x1 ” ≡ ⋁_{x∈X} x ≡ ⊺ while “x ≤ x1 ” ∧ “x > x1 ” ≡ ⋁_{x∈∅} x ≡ –.
Therefore one is the negation of the other. Similarly “x < x1 ” ∨ “x ≥ x1 ” ≡ ⋁_{x∈X} x ≡ ⊺ while
“x < x1 ” ∧ “x ≥ x1 ” ≡ ⋁_{x∈∅} x ≡ –.
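
The corollary can also be checked mechanically in the set picture, where a statement is modeled by the set of possibilities compatible with it and negation is the complement in X (the toy possibility set below is our own choice):

    # Sketch: Corollary 3.11 in set form.
    X = frozenset(range(5))                     # toy possibilities, integer order

    def stmt_leq(x1):  return frozenset(x for x in X if x <= x1)   # "x <= x1"
    def stmt_gt(x1):   return frozenset(x for x in X if x > x1)    # "x > x1"
    def stmt_geq(x1):  return frozenset(x for x in X if x >= x1)   # "x >= x1"
    def stmt_lt(x1):   return frozenset(x for x in X if x < x1)    # "x < x1"

    for x1 in X:
        assert stmt_leq(x1) == X - stmt_gt(x1)  # "x <= x1" ≡ ¬"x > x1"
        assert stmt_geq(x1) == X - stmt_lt(x1)  # "x >= x1" ≡ ¬"x < x1"
    print("both equivalences hold on the toy set")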

Definition 3.12. Let DX be a naturally ordered experimental domain and X its possibili-
ties. Define Bb = {“x < x1 ” ∣ x1 ∈ X}, Ba = {“x > x1 ” ∣ x1 ∈ X} and B = Bb ∪ ¬(Ba ).

Definition 3.13. Let (Q, ≤) be an ordered set. Let q1 , q2 ∈ Q. Then q2 is an immediate


successor of q1 and q1 is an immediate predecessor of q2 if there is no element strictly
between them in the ordering. That is, q1 < q2 and there is no q ∈ Q such that q1 < q < q2 .
Two elements are consecutive if one is the immediate successor of the other.

Proposition 3.14. Let DX be a naturally ordered experimental domain. Then (Bb , ≼),
(Ba , ≽) and (B, ≼) are linearly ordered sets. Moreover (Bb , ≼), (Ba , ≽) are order isomorphic
to (X, ≤).

Proof. Let f ∶ X → Bb be defined such that f (x1 ) = “x < x1 ”. As there is one and
only one statement “x < x1 ” for each x1 ∈ X, f is a bijection. Suppose x1 ≤ x2 ; we have
f (x2 ) ≡ ⋁_{x∈X ∣ x<x2} x ≡ (⋁_{x∈X ∣ x<x1} x) ∨ (⋁_{x∈X ∣ x<x2} x) ≡ f (x1 ) ∨ f (x2 ) and therefore
f (x1 ) ≼ f (x2 ). On the other hand, if f (x1 ) ≼ f (x2 ) then as sets (−∞, x1 ) ⊆ (−∞, x2 ), which
means x1 ≤ x2 . This means that f is an order isomorphism between (Bb , ≼) and (X, ≤).
Similarly, let g ∶ X → Ba be defined such that g(x1 ) = “x > x1 ”. As there is one
and only one statement “x > x1 ” for each x1 ∈ X, g is a bijection. Suppose x1 ≤ x2 ; we have
g(x1 ) ≡ ⋁_{x∈X ∣ x>x1} x ≡ (⋁_{x∈X ∣ x>x1} x) ∨ (⋁_{x∈X ∣ x>x2} x) ≡ g(x1 ) ∨ g(x2 ) and therefore
g(x1 ) ≽ g(x2 ). On the other hand, if g(x1 ) ≽ g(x2 ) then as sets (x1 , ∞) ⊇ (x2 , ∞), which
means x1 ≤ x2 . This means that g is an order isomorphism between (Ba , ≽) and (X, ≤).
To show that B is linearly ordered, let x1 , x2 ∈ X. If they both come from either Bb or
¬(Ba ) then they are already ordered by narrowness. If not, consider the two statements
“x < x1 ” and “x ≤ x2 ” ≡ ¬“x > x2 ”. As X is linearly ordered, either {x ∈ X ∣ x < x1 } ⊆ {x ∈
X ∣ x ≤ x2 } or {x ∈ X ∣ x ≤ x2 } ⊆ {x ∈ X ∣ x < x1 }. Therefore either “x < x1 ” ≼ “x ≤ x2 ” or
“x ≤ x2 ” ≼ “x < x1 ”. Which means B = Bb ∪ ¬(Ba ) is linearly ordered by ≼.

Proposition 3.15. Let Bb and Ba be two sets of verifiable statements such that B =
Bb ∪ ¬(Ba ) is linearly ordered by narrowness. Let Db and Da be the experimental domains
they respectively generate and D = Db ∪¬(Da ). Then (Db , ≼), (Da , ≽) and (D, ≼) are linearly
ordered.

Proof. First we show that (Db , ≼) is linearly ordered. We have that Bb is linearly or-
dered by narrowness because it is a subset of B which is linearly ordered by narrowness.
Note that the conjunction of a finite set of statements linearly ordered by narrowness will
return the narrowest element and the disjunction of a finite set of statements linearly or-
dered by narrowness will return the broadest element. The countable disjunction, instead,
can introduce new elements. But using those elements again will not introduce new ones:
the disjunction of countable disjunctions will still be a countable disjunction; the finite
conjunction of countable disjunctions is the countable disjunction of finite conjunctions,
which returns elements of the original set and therefore reduces to a countable disjunction.
Therefore, when forming Db the only new elements will be the countable disjunctions.
Consider two countable sets B1 , B2 ⊆ Bb . Their disjunctions b1 = ⋁_{b∈B1} b and b2 = ⋁_{b∈B2} b
represent the narrowest statement that is broader than all elements of the respective set.
Suppose that for each element of B1 we can find a broader element in B2 . Then b2 , being
broader than all elements of B2 , will be broader than all elements of B1 . But since b1 is
the narrowest element that is broader than all elements in B1 , we have b2 ≽ b1 . Conversely,
suppose there is some element in B1 for which there is no broader element in B2 . Since
the initial set is fully ordered, it means that that element of B1 is broader than all the
elements in B2 . This means that element is broader than b2 and since b1 is broader than
all elements in B1 we have b1 ≽ b2 . Therefore the domain Db generated by Bb is linearly
ordered by narrowness.
Now we show that (Da , ≽) is linearly ordered. The basis Ba is linearly ordered by
broadness because the negations of its elements are part of B and are ordered by narrowness.
Note that broadness is the opposite order of narrowness and therefore a set linearly ordered
by one is linearly ordered by the other. Therefore Ba is also linearly ordered by narrowness
and so is Da by the previous argument. Therefore Da is ordered by broadness.
To show that D = Db ∪ ¬(Da ) is linearly ordered by narrowness, we only need to show
that the countable disjunctions of elements of Bb are either narrower or broader than the
countable conjunctions of the negations of elements of Ba . Let B1 ⊂ Bb and A2 ⊂ Ba . The
disjunction b1 = ⋁_{b∈B1} b represents the narrowest statement that is broader than all elements
of B1 while the conjunction ¬a2 = ¬⋁_{a∈A2} a = ⋀_{a∈A2} ¬a represents the broadest statement that
is narrower than all elements of ¬(A2 ). Suppose that for one element of ¬(A2 ) we can find
a broader statement in B1 . Then b1 , being broader than all elements in B1 , will be broader
than that one element in ¬(A2 ). But since ¬a2 is narrower than all elements in ¬(A2 ), we
have ¬a2 ≼ b1 . Conversely, suppose that for no element of ¬(A2 ) can we find a broader
statement in B1 . As B is linearly ordered, it means that all elements in ¬(A2 ) are broader
than all elements in B1 . This means that all elements in ¬(A2 ) are broader than b1 and
therefore b1 ≼ ¬a2 . Therefore D is linearly ordered by narrowness.

Theorem 3.16 (Domain ordering theorem). An experimental domain DX is naturally


ordered if and only if it is the combination of two experimental domains DX = Da × Db such
that:

(i) D = Db ∪ ¬(Da ) is linearly ordered by narrowness



(ii) all elements of D are part of a pair (sb , ¬sa ) such that sb ∈ Db , sa ∈ Da and ¬sa is
either the immediate successor of sb in D or sb ≡ ¬sa
(iii) if s ∈ D has an immediate successor, then s ∈ Db

Proof. Let DX be a naturally ordered experimental domain. Let Bb and Ba be defined


as in 3.12, which means Bb ∪ Ba is the basis that generates the order topology. Let
Db be the domain generated by Bb and Da be the domain generated by Ba . Then DX is
generated from Db and Da by finite conjunction and countable disjunction and therefore
DX = Db × Da .
To prove (i), we have that Bb and Ba are linearly ordered by 3.14. We need to show that
the linear ordering holds across the sets. Let x1 , x2 ∈ X and consider the two statements
“x < x1 ” and “x ≤ x2 ” ≡ ¬“x > x2 ”. As X is linearly ordered, either {x ∈ X ∣ x < x1 } ⊆ {x ∈
X ∣ x ≤ x2 } or {x ∈ X ∣ x ≤ x2 } ⊆ {x ∈ X ∣ x < x1 }. Therefore either “x < x1 ” ≼ “x ≤ x2 ” or
“x ≤ x2 ” ≼ “x < x1 ”. Which means B = Bb ∪ ¬(Ba ) is linearly ordered by ≼. By 3.15 the set
D = Db ∪ ¬(Da ) is also linearly ordered.
To prove (ii), let sb ∈ Db . Take sa ∈ Da such that ¬sa is the narrowest statement in
¬(Da ) that is broader than sb . This exists because Da is closed under countable disjunction. As
¬sa ≽ sb , let X1 be the set of possibilities compatible with ¬sa but not compatible with sb .
The set cannot have more than one element, or we could find an element x1 ∈ X1 such that
sb ≼ “x ≤ x1 ” ≺ ¬sa . If X1 contains one possibility, then ¬sa is the immediate successor. If
X1 is empty then sb ≡ ¬sa . Similarly, we can start with sa ∈ Da and find sb ∈ Db such that sb
is the broadest statement in Db that is narrower than ¬sa . Let X1 be the set of possibilities
compatible with ¬sa but not compatible with sb . If X1 contains one possibility, then ¬sa is
the immediate successor and if X1 is empty then sb ≡ ¬sa .
To prove (iii), let s1 , s2 ∈ D such that s2 is the immediate successor of s1 . This means
we can write s2 ≡ s1 ∨ x1 for some x1 ∈ X. This means s1 ≡ “x < x1 ” while s2 ≡ “x ≤ x1 ” and
therefore s1 ∈ Bb .
Now we need to prove the opposite direction. Let DX = Db × Da be an experimental
domain as described in the theorem. Let X be the set of all statements of the form x =
¬sb ∧ ¬sa for which sb ∈ Db , sa ∈ Da and ¬sa is the immediate successor of sb in D. All
statements in X are possibilities. In fact, take x = ¬sb ∧ ¬sa ∈ X and s ∈ D. It is not
impossible because sb ≺ ¬sa . Since ¬sa is the immediate successor of sb , we either have
s ≼ sb ≼ sb ∨ sa ≡ ¬x or s ≽ ¬sa ≽ x. That is, x ̸ s or x ≼ s. And since the theoretical domain
of DX can be generated from D, x is either narrower than or incompatible with any other
statement in the theoretical domain. Therefore x is a possibility.
Now we show that X, as defined, covers all possibilities. Let x be a possibility for DX .
Let Fx = {s ∈ D ∣ x ̸ s} and Tx = {s ∈ D ∣ x ≼ s}. Since x is a possibility Fx ∪ Tx = D and since
D is linearly ordered, s1 ≺ s2 for all s1 ∈ Fx and s2 ∈ Tx . Let fx = ⋁_{s∈Fx} s and tx = ⋀_{s∈Tx} s.
To see that fx ∈ D, let fx′ = ⋁_{s∈Fx ∩Db} s. This will be in Fx as it is in Db and will be false if x is
true. Because of (ii), there can be only one statement in Fx ∩ ¬(Da ) that is broader than
fx′ but narrower than the other elements of Db that are not in Fx , therefore fx is in D as
it has been reduced to a finite disjunction. Similarly, tx ∈ D. Let tx′ = ⋀_{s∈Tx ∩¬(Da )} s. This will
be in Tx as it is in ¬(Da ) and will be true if x is true. Because of (ii), there can be only
one statement in Tx ∩ Db that is narrower than t′x but broader than the other elements of
Da that are not in Tx , therefore tx is in D as it has been reduced to a finite conjunction.
Consider ¬fx ∧ tx : if true then fx will be false, and so will all statements in Fx since they
are narrower; also tx will be true, and so will all statements in Tx since they are broader.
We have x ≡ ¬fx ∧ tx . Since x is not impossible, fx ≢ tx . Since all statements in D are either
in Fx or Tx , tx is the immediate successor of fx . Therefore by (iii) fx ∈ Db , ¬tx ∈ Da and
x ∈ X.
Now we show that X can be given a natural ordering. Let Bb ⊆ Db be the set of
statements that have an immediate successor in D and Ba ⊆ Da be the set of the negations
of the immediate successors. Let (⋅)++ ∶ Bb → Ba be the function such that ¬(b++ ), which we
abbreviate as ¬b++ , is the immediate successor of b. Let b ∶ X → Bb be the function such that
x ≡ ¬b(x) ∧ ¬b(x)++ .
On X define the ordering ≤ such that x1 ≤ x2 if and only if b(x1 ) ≼ b(x2 ). Since (Bb , ≼)
is linearly ordered so is (X, ≤). To show that the ordering is natural, suppose x1 < x2
then b(x1 ) ≺ ¬b(x1 )++ ≼ b(x2 ) and therefore x1 ≼ b(x2 ). We also have ¬b(x1 )++ ≼ b(x2 ) ≺
¬b(x2 )++ and therefore x2 ≼ b(x1 )++ . This means that given a possibility x1 ∈ X, all and
only the possibilities lower than x1 are compatible with b(x1 ) and therefore b(x1 ) ≡ “x <
x1 ”, while all and only the possibilities greater than x1 are compatible with b(x1 )++ and
therefore b(x1 )++ ≡ “x > x1 ”. The topology is the order topology and the domain has a
natural ordering.

3.3 References and experimental ordering


In the previous section we have characterized what a quantity is and how it relates to an
experimental domain. But as we saw in the first chapters, the possibilities of a domain are
not objects that exist a priori: they are defined based on what can be verified experimentally.
Therefore simply assigning an ordering to the possibilities of a domain does not answer the
more fundamental question: how are quantities actually constructed? How do we, in practice,
create a system of references that allows us to measure a quantity at a given level of precision?
What are the assumptions we make in that process?
In this section we construct ordering from the idea of a reference that physically defines a
boundary between a before and an after. In general, a reference has an extent and may overlap
with others. We define ordering in terms of references that are clearly before and after others.
We see that the possibilities have a natural ordering only if they are generated from a set of
references that is refinable (we can always find finer ones that do not overlap) and for which
before/on/after are mutually exclusive cases. The possibilities, then, are the finest references
possible.
We are by now so used to the ideas of real numbers, negative numbers and the number zero
that it is difficult to realize that these are mental constructs that are, in the end, somewhat
recent in the history of humankind. Yet geometry itself started four thousand years ago as
an experimentally discovered collection of rules concerning lengths, areas and angles. That is,
human beings were measuring quantities well before the real numbers were invented. So, how
does one construct instruments that measure values?
To measure position, we can use a ruler, which is a series of equally spaced marks. We
give a label to each mark (e.g. a number) and note which two marks are closest to the target
position (e.g. between 1.2 and 1.3 cm). To measure weight, we can use a balance and a set of
equally prepared reference weights. The balance can clearly tell us whether one side is heavier
than the other, so we use it to compare the target with a number of reference weights and
note the two closest (e.g. between 300 and 400 grams). A clock gives us a series of events to
compare to (e.g. earth’s rotation on its axis, the ticks of a clock). We can pour water from
a reference container into another as many times as are needed to measure its volume. In all
these cases what actually happens is similar: we have a reference (e.g. a mark on a ruler, a set
of equally prepared weights, a number of ticks of a clock) and it is fairly easy to tell whether
what we are measuring is before (e.g. shorter, lighter, sooner) or after (e.g. longer, heavier,
later) the reference.
Note that determining whether the quantity is exactly equal to the reference is not as
easy: the mark on the ruler has a width, the balance has friction, the tick of our clock will last
a finite amount of time. That is, the reference itself can only be compared up to a finite level
of precision. This may be a problem when constructing the references themselves: how do we
know that the marks on our ruler are equally spaced, or that the weights are equally prepared,
or that ticks of our clock are equally timed? It is a circular problem in the sense that, in a
way, we need instruments of measurement to be able to create instruments of measurement.
Yet, even if our references can’t be perfectly compared and are not perfectly equal, we can
still say whether the value is well before or well after any of them.
To make matters worse, the object we are measuring may itself have an extent. If we are
measuring the position of a tiny ball, it may be clearly before or clearly after the nearest
mark, but it may also be partly before, partly on and partly after. One may try to sidestep
the problem by measuring part of the object, say the position of the center of mass or of its
closest part. But this assumes we have a process to interact with only part of the object, and
that part can only be before, on or after the reference. It may be a reasonable assumption in
many cases but we have to be mindful that we made that assumption: our general definition
will have to be able to work in the less ideal cases.
In our general mathematical theory of experimental science, we can capture the above
discussion with the following definitions. A reference is represented by a set of three state-
ments: they tell us whether the object is before, on or after a specific reference. To make sense,
these have to satisfy the following minimal requirements. The before and the after statements
must be verifiable, as otherwise they would not be usable as references. As the reference must
be somewhere, the on statement cannot be an impossibility. If the object is not before and
not after the reference, then it must be on the reference. If the object is before and after the
reference, then it must also be on the reference. These requirements recognize that, in general,
a reference has an extent and so does the object being measured.
We can compare the extent of two references and say that one is finer than the other if
the on statement is narrower than the other’s, and the before and after statements are broader.
This corresponds to a finer tick of a ruler or a finer pulse in our timing system. We say that
a reference is strict if the before, on and after statements are incompatible. That is, the three
cases are distinct and can’t be true at the same time.

Definition 3.17. A reference defines a before, an on and an after relationship between


itself and another object. Formally a reference r = (b, o, a) is a tuple of three statements
such that:

1. we can verify whether the object is before or after the reference: b and a are verifiable statements
2. the object can be on the reference: o ≢ –
3. if it’s not before or after, it’s on the reference: ¬b ∧ ¬a ≼ o
4. if it’s before and after, it’s also on the reference: b ∧ a ≼ o

A beginning reference has nothing before it. That is, b ≡ –. An ending reference has
nothing after it. That is, a ≡ –. A terminal reference is either beginning or ending.
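
As a minimal sketch of this definition, we can model each of the three statements by the set of possibilities compatible with it; requirement 1 (verifiability) is not visible at this level of modeling, and the space X and the mark below are our own toy choices:

    # Sketch: requirements 2-4 of the definition of a reference, in set form.
    X = frozenset(range(10))                    # e.g. positions along a ruler

    def is_reference(b, o, a):
        on_not_impossible = len(o) > 0                       # requirement 2
        not_before_not_after_is_on = (X - b) & (X - a) <= o  # requirement 3
        before_and_after_is_on = (b & a) <= o                # requirement 4
        return (on_not_impossible and not_before_not_after_is_on
                and before_and_after_is_on)

    # A mark with an extent: it covers positions 4 and 5.
    b = frozenset(x for x in X if x < 4)
    o = frozenset({4, 5})
    a = frozenset(x for x in X if x > 5)
    print(is_reference(b, o, a))                # True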

Corollary 3.18. Let (b, o, a) be a reference. Then b ∨ o ∨ a ≡ ⊺.

Proof. By definition, we have ¬b ∧ ¬a ≼ o and by 1.22 ¬(¬b ∧ ¬a) ∨ o ≡ ⊺ ≡ b ∨ a ∨ o.

Definition 3.19. A reference r1 = (b1 , o1 , a1 ) is finer than another reference r2 = (b2 , o2 , a2 )


if b1 ≽ b2 , o1 ≼ o2 and a1 ≽ a2 .

Corollary 3.20. The finer relationship between references is a partial order.

Proof. As the finer relationship is directly based on narrowness, it inherits its reflexivity,
antisymmetry and transitivity properties and is therefore a partial order.

Definition 3.21. A reference is strict if its before, on and after statements are incom-
patible. Formally, r = (b, o, a) is such that b ̸ a and o ≡ ¬b ∧ ¬a. A reference is loose if it
is not strict.

Remark. In general, we can’t turn a loose reference into a strict one. The on statement
can be made strict by replacing it with ¬b∧¬a. This is possible because o is not required to
be verifiable. The before (and after) statements would need to be replaced with statements
like b ∧ ¬a, which are not in general verifiable because of the negation.
To measure a quantity we will have many references one after the other: a ruler will have
many marks, a scale will have many reference weights, a clock will keep ticking. What does
it mean that a reference comes after another in terms of the before/on/after statements?
If reference r1 is before reference r2 we expect that if the value measured is before the first
it will also be before the second, and if it is after the second it will also be after the first. Note
that this is not enough, though, because as references have an extent they may overlap. And
if they overlap one can’t be after the other. To have an ordering properly defined we must
have that the first reference is entirely before the second. That is, if the value measured is on
the first it will be before the second and if it is on the second it will be after the first.
Mathematically, this type of ordering is strict in the sense that it defines what is strictly
before and strictly after. It does not define what happens in the overlap, in between cases.
One may be tempted to define the ordering based on how the references overlap, but that
requires refining the references and, in the end, it means we are defining an ordering on those
refined references, not the original ones.

Definition 3.22. A reference is before another if whenever an object is found before or on


the first it cannot be on or after the second. Formally, r1 < r2 if and only if b1 ∨ o1 ̸ o2 ∨ a2 .

Proposition 3.23. Reference ordering satisfies the following properties:



• irreflexivity: not r < r

• transitivity: if r1 < r2 and r2 < r3 then r1 < r3

and is therefore a strict partial order.

Proof. For irreflexivity, since the on statement can’t be impossible, o is compatible with
itself and therefore b ∨ o is compatible with o ∨ a. Therefore a reference is not before itself
and the relationship is irreflexive.
For transitivity, if r1 < r2 , we have b1 ∨ o1 ̸ o2 ∨ a2 and therefore ¬(b1 ∨ o1 ) ≽ o2 ∨ a2
by 1.22. Since b1 ∨ o1 ∨ a1 ≡ ⊺, we have a1 ≽ ¬(b1 ∨ o1 ). Similarly if r2 < r3 we’ll have
a2 ≽ ¬(b2 ∨ o2 ) ≽ o3 ∨ a3 . Putting it all together ¬(b1 ∨ o1 ) ≽ o2 ∨ a2 ≽ a2 ≽ ¬(b2 ∨ o2 ) ≽ o3 ∨ a3 ,
which means b1 ∨ o1 ̸ o3 ∨ a3 .

Corollary 3.24. The relationship r1 ≤ r2 , defined to be true if r1 < r2 or r1 = r2 , is a


partial order.
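
Continuing the same set model (the two toy references below are our own), the before relationship of Definition 3.22 reduces to a disjointness check between b1 ∨ o1 and o2 ∨ a2:

    # Sketch: Definition 3.22 in set form; incompatibility = empty intersection.
    def before(r1, r2):
        b1, o1, a1 = r1
        b2, o2, a2 = r2
        return not ((b1 | o1) & (o2 | a2))      # disjoint sets => r1 < r2

    r1 = (frozenset({0, 1}), frozenset({2, 3}), frozenset(range(4, 10)))
    r2 = (frozenset(range(6)), frozenset({6, 7}), frozenset({8, 9}))
    print(before(r1, r2), before(r2, r1))       # True False
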
As we saw, two references may overlap and therefore an ordering between them cannot be
defined. But references can overlap in different ways.
Suppose we have a vertical line one millimeter thick and call the left side the part before
the line and the right side the part after. We can have another vertical line of the same
thickness overlapping but we can also have a horizontal line which will also, at some point,
overlap. The case of the two vertical lines is something that, through finding finer references,
can be given a linear order. The case of the vertical and horizontal line, instead, cannot.
Intuitively, the vertical lines are aligned while the horizontal and vertical are not.
Conceptually, the overlapping vertical lines are aligned because we can imagine narrower
lines around the borders, and those lines will be ordered references in the above sense: each
line would be completely before or after, without intersection. This means that the before and
not-after statements of one reference are either narrower or broader than the before and not-
after statements of the other. That is, alignment can also be defined in terms of narrowness
of statements.
Note that if a reference is strict, before and after statements are not compatible and
therefore the before statement is narrower than the not-after statement. This means that,
given a set of aligned strict references, the set of all before and not-after statements is linearly
ordered by narrowness. As we saw in the previous section, this was a necessary condition for
the possibilities of a domain to be ordered and therefore aligned strict references play a crucial
role.

Definition 3.25. Two references r1 = (b1 , o1 , a1 ) and r2 = (b2 , o2 , a2 ) are aligned if for
any s1 ∈ {b1 , ¬a1 } and s2 ∈ {b2 , ¬a2 } we have s1 ≼ s2 or s2 ≼ s1 . A set R of references is
aligned if any pair of references is aligned.

Proposition 3.26. Let R = {(bi , oi , ai )}i∈I be a set of aligned strict references. Let Bb =
{bi }i∈I and Ba = {ai }i∈I be respectively the sets of before and after statements. Then the set
B = Bb ∪ ¬(Ba ) is linearly ordered by narrowness.

Proof. Let R be a set of aligned strict references. Let s1 , s2 ∈ B. Suppose they are taken
from the same reference. If they are both before statements or both after statements, we
have s1 ≡ s2 and therefore s1 ≼ s2 . If one is the before statement and the other is the not-
after statement, since the reference is strict, we have b ̸ a and b ≼ ¬a by 1.22 and therefore
s1 ≼ s2 or s2 ≼ s1 . Now suppose they are taken from different references. Since they are
aligned we have s1 ≼ s2 or s2 ≼ s1 by definition.

Proposition 3.27. Let r1 = (b1 , o1 , a1 ) and r2 = (b2 , o2 , a2 ) be two references. If r1 < r2


then b1 ≼ b2 , a2 ≼ a1 , b1 ≼ ¬a2 , ¬a1 ≼ b2 and therefore the two references are aligned.

Proof. We have b1 ∨ o1 ≡ (b1 ∨ o1 ) ∧ ⊺ ≡ (b1 ∨ o1 ) ∧ (b2 ∨ o2 ∨ a2 ) ≡ ((b1 ∨ o1 ) ∧ b2 ) ∨


((b1 ∨ o1 ) ∧ (o2 ∨ a2 )) ≡ ((b1 ∨ o1 ) ∧ b2 ) ∨ – ≡ (b1 ∨ o1 ) ∧ b2 . Therefore b1 ∨ o1 ≼ b2 . And since
b1 ≼ b1 ∨ o1 , we have b1 ≼ b2 .
Similarly, we have o2 ∨ a2 ≡ (o2 ∨ a2 ) ∧ ⊺ ≡ (o2 ∨ a2 ) ∧ (b1 ∨ o1 ∨ a1 ) ≡ ((o2 ∨ a2 ) ∧ (b1 ∨
o1 )) ∨ ((o2 ∨ a2 ) ∧ a1 ) ≡ – ∨ ((o2 ∨ a2 ) ∧ a1 ) ≡ (o2 ∨ a2 ) ∧ a1 . Therefore a2 ∨ o2 ≼ a1 . And since
a2 ≼ o2 ∨ a2 , we have a2 ≼ a1 .
Since b1 ∨ o1 ̸ o2 ∨ a2 , we have b1 ̸ a2 which means b1 ≼ ¬a2 .
Since b1 ∨o1 ∨a1 ≡ ⊺, we have ¬a1 ≼ b1 ∨o1 . Similarly ¬b2 ≼ o2 ∨a2 . Since b1 ∨o1 ̸ o2 ∨a2 ,
¬a1 ̸ ¬b2 and therefore ¬a1 ≼ b2 .
Since b1 ≼ b2 , a2 ≼ a1 , b1 ≼ ¬a2 and ¬a1 ≼ b2 , the two references are aligned.

Proposition 3.28. Let r1 = (b1 , o1 , a1 ) and r2 = (b2 , o2 , a2 ) be two strict references. Then
r1 < r2 if and only if ¬a1 ≼ b2 .

Proof. Let r1 < r2 . By 3.27, we have ¬a1 ≼ b2 . Conversely, let ¬a1 ≼ b2 . Then ¬a1 ̸ ¬b2 .
Because the references are strict, ¬a1 ≡ b1 ∨ o1 and ¬b2 ≡ o2 ∨ a2 . Therefore b1 ∨ o1 ̸ o2 ∨ a2
and r1 < r2 by definition.

Definition 3.29. A reference is the immediate predecessor of another if nothing can be


found before the second and after the first. Formally, r1 < r2 and a1 ̸ b2 . Two references
are consecutive if one is the immediate successor of the other.

Proposition 3.30. Let r1 = (b1 , o1 , a1 ) and r2 = (b2 , o2 , a2 ) be two references. If r1 is


immediately before r2 then b2 ≡ ¬a1 .

Proof. Let r1 be immediately before r2 . Then a1 ̸ b2 which means b2 ≼ ¬a1 . By 3.27


we also have ¬a1 ≼ b2 . Therefore b2 ≡ ¬a1 .

Proposition 3.31. Let r1 = (b1 , o1 , a1 ) and r2 = (b2 , o2 , a2 ) be two strict references. Then
r1 is immediately before r2 if and only if b2 ≡ ¬a1 .

Proof. Let r1 be immediately before r2 . Then b2 ≡ ¬a1 by 3.30. Conversely, let b2 ≡ ¬a1 .
Then r1 < r2 by 3.28. We also have a1 ̸ ¬a1 , therefore a1 ̸ b2 and r1 is immediately before
r2 by definition.
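
In the same set model (toy data ours), these two propositions for strict references reduce to a subset check and an equality check:

    # Sketch: for strict references, r1 < r2 iff ¬a1 ≼ b2 (Prop. 3.28), and
    # r1 is immediately before r2 iff b2 ≡ ¬a1 (Prop. 3.31); ¬ is complement in X.
    X = frozenset(range(10))

    def is_strict(b, o, a):
        return not (b & a) and o == (X - b) & (X - a)

    b1, o1, a1 = frozenset({0, 1}), frozenset({2}), frozenset(range(3, 10))
    b2, o2, a2 = frozenset({0, 1, 2}), frozenset({3}), frozenset(range(4, 10))
    assert is_strict(b1, o1, a1) and is_strict(b2, o2, a2)
    print((X - a1) <= b2)    # True: r1 < r2
    print(b2 == X - a1)      # True: r2 is the immediate successor of r1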

If we have a set of references, we can generate an experimental domain by using the before
and after statements as the basis. More specifically we can take all the before statements and
generate the before domain Db and all the after statements and generate the after domain
Da . If the references are all strict and aligned, the set D = Db ∪ ¬(Da ) of all the before and
not-after statements will be linearly ordered by narrowness. We recognize this as the first
requirement of the domain ordering theorem 3.16.

Definition 3.32. Let R = {(bi , oi , ai )}i∈I be a set of references. Let Bb = {bi }i∈I be the set
of all before statements and Ba = {ai }i∈I be the set of all after statements. The experimental
domain D generated by R is the one generated by all before and after statements Bb ∪ Ba .
The before domain Db is the domain generated only by the before statements Bb and the
after domain Da is the domain generated only by the after statements Ba .

Definition 3.33. Let D be a domain generated by a set of references R. A reference


r = (b, o, a) is said to be aligned with D if b ∈ Db and a ∈ Da .

Proposition 3.34. Let D be an experimental domain generated by a set of aligned strict


references R and let D = Db ∪ ¬(Da ). Then (D, ≼) is linearly ordered.

Proof. By 3.26 we have that B = Bb ∪ ¬(Ba ) is linearly ordered by narrowness. By 3.15 the


ordering extends to D.

Having a set of aligned references is not necessarily enough to cover the whole space at all
levels of precision. To do that we need to make sure that, for example, between two references
that are not consecutive we can at least put a reference in between. Or that if we have two
references that overlap, we can break them apart into finer ones that do not overlap and one
is after the other.
We call a set of references refinable if the domain they generate has the above-mentioned
properties. This allows us to break up the whole domain into a sequence of references that
do not overlap, are linearly ordered and that cover the whole space. As we get to the finest
references, their before statements will be immediately followed by the negation of their after
statements, since there can’t be any reference in between. Conceptually, this will give us the
second and the third condition of the domain ordering theorem 3.16.

Definition 3.35. Let D be an experimental domain generated by a set of aligned references


R. The set of references is refinable if, given two strict references r1 = (b1 , o1 , a1 ) and
r2 = (b2 , o2 , a2 ) aligned with D, we can always:

• find an intermediate one if they are not consecutive; that is, if r1 < r2 but r2 is not
the immediate successor of r1 , then we can find a strict reference r3 aligned with D
such that r1 < r3 < r2 .

• refine overlapping references if one is finer than the other; that is, if o2 ≺ o1 , we can
find a strict reference r3 aligned with D such that o3 ≼ o1 and either b3 ≡ b1 and
r3 < r2 or a3 ≡ a1 and r2 < r3 .

Proposition 3.36. Let D be an experimental domain generated by a set of refinable aligned


strict references R.

1. If b1 , b2 ∈ Db such that b1 ≺ b2 , then there exists a ∈ Da such that b1 ≺ ¬a ≼ b2 .


2. If a1 , a2 ∈ Da such that ¬a1 ≺ ¬a2 , then there exists b ∈ Db such that ¬a1 ≼ b ≺ ¬a2 .
3. If a1 ∈ Da and b2 ∈ Db such that ¬a1 ≺ b2 , then there exists b ∈ Db and a ∈ Da such
that ¬a1 ≼ b ≺ ¬a ≼ b2 .

Proof. For the first, suppose b1 , b2 ∈ Db such that b1 ≺ b2 . Then r1 = (b1 , ¬b1 , –) and
r2 = (b2 , ¬b2 , –) are strict references aligned with the domain such that ¬b2 ≺ ¬b1 . This
means we can find r3 = (b1 , ¬b1 ∧ ¬a, a) for some a ∈ Da such that r3 < r2 and therefore
b1 ≺ ¬a ≼ b2 .
For the second, suppose a1 , a2 ∈ Da such that ¬a1 ≺ ¬a2 . Then r1 = (–, ¬a2 , a2 ) and
r2 = (–, ¬a1 , a1 ) are strict references aligned with the domain such that ¬a1 ≺ ¬a2 . This
means we can find r3 = (b, ¬b ∧ ¬a2 , a2 ) for some b ∈ Db such that r2 < r3 and therefore
¬a1 ≼ b ≺ ¬a2 .
For the third, suppose a1 ∈ Da and b2 ∈ Db such that ¬a1 ≺ b2 . Then r1 = (–, ¬a1 , a1 )
and r2 = (b2 , ¬b2 , –) are strict references aligned with the domain such that r1 < r2 but r2
is not an immediate successor of r1 . This means we can find r3 = (b, ¬b ∧ ¬a, a) such that
r1 < r3 < r2 and therefore ¬a1 ≼ b ≺ ¬a ≼ b2 .

Proposition 3.37. Let D be an experimental domain generated by a set of refinable aligned


strict references. Then all elements of D are part of a pair (sb , ¬sa ) such that sb ∈ Db ,
sa ∈ Da and ¬sa is the immediate successor of sb in D or sb ≡ ¬sa . Moreover if s ∈ D has an
immediate successor, then s ∈ Db .

Proof. Let D be an experimental domain generated by a set of refinable aligned strict
references. Let sb ∈ Db . Let A = {a ∈ Da ∣ a ∨ sb ≢ ⊺} and let sa = ⋁_{a∈A} a. First we show that
sb ≼ ¬sa . We have sb ∧ ¬sa ≡ sb ∧ ¬⋁_{a∈A} a ≡ sb ∧ ⋀_{a∈A} ¬a ≡ ⋀_{a∈A} (sb ∧ ¬a). For all a ∈ A we have
a ∨ sb ≢ ⊺, so ¬a ⋠ sb , which means sb ≼ ¬a because of the total order of D. This means that
sb ∧ ¬a ≡ sb for all a ∈ A, therefore sb ∧ ¬sa ≡ sb and sb ≼ ¬sa .
Next we show that no statement s ∈ D is such that sb ≺ s ≺ ¬sa . Let a ∈ Da such that
sb ≺ ¬a. By construction a ∈ A and therefore ¬sa ≼ ¬a. Therefore we can’t have sb ≺ ¬a ≺ ¬sa .
We also can’t have b ∈ Db such that sb ≺ b ≺ ¬sa : by 3.36 we’d find a ∈ Da such that
sb ≺ ¬a ≼ b ≺ ¬sa , which was ruled out. So there are two cases. Either sb ≢ ¬sa , and then
sb ≺ ¬sa : ¬sa is the immediate successor of sb . Or sb ≡ ¬sa .
The same reasoning can be applied starting from sa ∈ Da to find sb ∈ Db such that sb is
the immediate predecessor of ¬sa or an equivalent statement. This shows that all elements
of D are paired.
To show that if a statement in D has a successor then it must be a before statement, let
s1 , s2 ∈ D such that s2 is the immediate successor of s1 . By 3.36, in all cases except when
s1 ∈ Db and s2 ∈ ¬(Da ) we can always find another statement between the two. Then we must
have that s1 ∈ Db and s2 ∈ ¬(Da ).

Theorem 3.38 (Reference ordering theorem). An experimental domain is naturally or-


dered if and only if it can be generated by a set of refinable aligned strict references.

Proof. Suppose DX is an experimental domain generated by a set of refinable aligned


strict references. Then by 3.34 and 3.37 the domain satisfies the requirement of theorem
3.16 and therefore is naturally ordered.
Now suppose DX is naturally ordered. Define the set Bb , Ba and D as in 3.12. Let
R = {(b, ¬b ∧ ¬a, a) ∣ b ∈ Bb , a ∈ Ba , b ≺ ¬a} be the set of all references constructed from the
basis. First let us verify they are references. The before and after statements are verifiable
since they are part of the basis. The on statement ¬b ∧ ¬a is not impossible since b ≺ ¬a
means b ̸ a and b ≢ ¬a. The on statement is broader than ¬b∧¬a as they are equivalent and
it is broader than b ∧ a as that is impossible since b ≺ ¬a. Therefore R is a set of references.
Since the before and after statements of R coincide with the basis of the domain, DX is
generated by R.
Now we show that R consists of aligned strict references. We already saw that b ̸ a
and we also have ¬b ∧ ¬a is incompatible with both b and a. The references are strict. To
show they are aligned, take two references. The before and not-after statements are linearly
ordered by 3.14 which means the references are aligned.
To show R is refinable, note that each reference can be expressed as (“x < x1 ”, “x1 ≤
x ≤ x2 ”, “x > x2 ”) where x1 , x2 ∈ X and “x1 ≤ x ≤ x2 ” ≡ “x ≥ x1 ” ∧ “x ≤ x2 ”. That is,
every reference is identified by two possibilities x1 , x2 such that x1 ≤ x2 . Therefore take
two references r1 , r2 ∈ R and let (x1 , x2 ) and (x3 , x4 ) be the respective pair of possibilities
we can use to express the references as we have shown. Suppose r1 < r2 but they are not
consecutive. Then “x ≤ x2 ” ≺ “x < x3 ”. That is, we can find x5 ∈ X such that x2 < x5 < x3
which means “x ≤ x2 ” ≼ “x < x5 ” and “x ≤ x5 ” ≼ “x < x3 ”. Therefore the reference r3 ∈ R
identified by (x5 , x5 ) is between the two references. On the other hand, assume the second
reference is finer than the first. Then x1 ≤ x3 and x4 ≤ x2 with either x1 ≠ x3 or x4 ≠ x2 .
Consider the references r3 , r4 ∈ R identified by (x1 , x1 ) and (x2 , x2 ). Either r3 < r2 or
r2 < r4 . Also note that the before statements of r1 and r3 are the same and the after
statements of r1 and r4 are the same. Therefore we satisfy all the requirements and the set
R is refinable by definition.

To recap, experimentally we construct ordering by placing references and being able to


tell whether the object measured is before or after. We can define a linear order on the
possibilities, and therefore a quantity, only when the set of references meets special conditions.
The references must be strict, meaning that before, on and after are mutually exclusive.
They must be aligned, meaning that the before and not-after statements must be ordered by
narrowness. They must be refinable, meaning that when they overlap we can always find finer
references with well-defined before/after relationships. If all these conditions apply, we have
a linear order. If any of these conditions fail, a linear order cannot be defined.
The possibilities, then, correspond to the finest references we can construct within the
domain. That is, given a value q0 , we have the possibility “the value of the property is q0 ” and
we have the reference (“the value of the property is less than q0 ”, “the value of the property
is q0 ”, “the value of the property is more than q0 ”).

3.4 Discrete quantities


Now that we have seen the general conditions to have a naturally ordered experimental do-
main, we study common types of quantities and under what conditions they arise. We start
with discrete ones: the number of chromosomes for a species, the number of inhabitants of a
country or the atomic number for an element are all discrete quantities. These are quantities
that are fully characterized by integers (positive or negative).
We will see that discrete quantities have a simple characterization: between two references
there can only be a finite number of other references.
The first thing we want to do is characterize the ordering of the integers. That is, we want
to find necessary and sufficient conditions for an ordered set of elements to be isomorphic to a
subset of the integers. First we note that between any two integers there are always finitely many
elements. Let’s call sparse an ordered set that has that property: that between two elements
there are only finitely many. This is enough to say that the order is isomorphic to the integers.
In fact, if an ordered set is sparse we can always go from any element to another in finitely
many steps. Therefore we can pick one element, call it zero, go forward one element at a time
and assign a positive integer to all the following elements or go backward one element at a
time and assign a negative integer to all the preceding elements.

Definition 3.39. A chain is a linearly ordered subset of an ordered set. A chain between
two elements is a chain where the two elements are the greatest and smallest elements.

Definition 3.40. An ordered set is sparse if every chain between any two elements is
finite.

Corollary 3.41. Every element in a sparse linearly ordered set that has a predecessor (or
successor) has an immediate predecessor (or successor).

Proof. Let Q be a sparse ordered set. Suppose q1 ∈ Q has a predecessor (or successor) q0 .
Then we have a finite chain between the two, and the immediate predecessor (or successor)
of q1 is the second greatest (or smallest) element.
Remark. The converse of the corollary, that a linearly ordered set in which every element
that has a predecessor (or successor) has an immediate predecessor (or successor) is sparse,
is not true. Consider the integers with the following ordering {0, 1, 2, 3, ..., −3, −2, −1}. All
elements have an immediate predecessor/successor or no predecessor/successor, yet 3 and
−3 have infinitely many elements in between.

Proposition 3.42. A linear order is sparse if and only if it is isomorphic to a contiguous


subset of the integers.

Proof. Let Q be a sparse linearly ordered set. Pick an element q0 ∈ Q. Let q ∶ Q → Z


such that it returns 0 for q0 , 1 for its immediate successor (if it exists), −1 for its immediate
predecessor (if it exists), 2 for the immediate successor of the immediate successor (if it
exists) and so on. Since the order is sparse, all elements will eventually be reached through
a chain of immediate successors/predecessors and will be assigned a value. The function is
injective and order preserving, so it is an isomorphism over q(Q), which, by construction,
is a contiguous subset of the integers.
Conversely, let Q be an ordered set isomorphic to a contiguous subset of the integers
and let q ∶ Q → Z be the isomorphism. The number of elements between two elements of
the set will be equal to the number of elements between the two corresponding integers,
which is always finite.
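
The construction in the proof can be sketched directly when the sparse ordered set is given as a finite list sorted by its order, so that the immediate successor is simply the next element (the word list is our own example):

    # Sketch: assigning contiguous integers by walking from a chosen q0.
    def integer_labels(chain, q0):
        """chain: a list sorted by the order; returns the isomorphism q : Q -> Z."""
        i0 = chain.index(q0)
        return {element: position - i0 for position, element in enumerate(chain)}

    words = ["apple", "banana", "cherry", "date"]   # ordered alphabetically
    print(integer_labels(words, "banana"))
    # {'apple': -1, 'banana': 0, 'cherry': 1, 'date': 2}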

We can now define a discrete quantity as one for which the ordering is sparse. While this
may typically correspond to a set of contiguous integers, it is not necessary. For example, a set
of names ordered alphabetically, the set of orbitals for a hydrogen atom, the possible energies
for a quantum harmonic oscillator: these are all discrete quantities even if the label we use is
not an integer.

The natural question now is: under what conditions are the possibilities of a domain or-
dered like the integers? The answer is straightforward: when the references within the domain
have a sparse order. That is, between two references we can only put finitely many ordered
references.

Definition 3.43. A discrete quantity for an experimental domain DX is a quantity


(Q, ≤, q) for which the ordering is sparse.

Theorem 3.44 (Discrete ordering theorem). Let DX be an experimental domain. Then


the following are equivalent:

1. the domain has a natural sparse order


2. the domain is fully characterized by a discrete quantity
3. the domain is generated by a set of refinable aligned strict references with a sparse
order

Proof. For (1) to (2), let DX be an experimental domain with a natural sparse order
and let X be its possibilities. Pick an ordered set (Q, ≤) that is order isomorphic to the
possibilities and let q ∶ X → Q be an order isomorphism. By 3.9 DX is fully characterized
by (Q, ≤, q). Since the order on X is sparse, X will be order isomorphic to a contiguous set
of integers and so will Q. Therefore Q has a sparse order as well and is therefore a discrete
quantity.
For (2) to (3), let DX be an experimental domain fully characterized by Q. Then by
3.9 and by 3.38 it is generated by a set of refinable aligned strict references. Let r1 and
r2 be two references aligned with the domain such that r1 < r2 . Then the after statement
of r1 will be of the form “x > q⁻¹(q1 )”, the before statement of r2 will be of the form
“x < q⁻¹(q2 )” for some q1 , q2 ∈ Q such that q1 < q2 . Let r ∶ Q → DX × D̄X × DX be the
function such that r(qi ) = (“x < q⁻¹(qi )”, “x ≥ q⁻¹(qi )” ∧ “x ≤ q⁻¹(qi )”, “x > q⁻¹(qi )”). Let
C = {r1 } ∪ {r(qi ) ∣ qi ∈ Q, q1 < qi < q2 } ∪ {r2 }. This is the longest chain between r1 and r2
and it is finite because Q has a sparse ordering. The set of references that generate the
domain, then, must have a sparse ordering.
For (3) to (1), let DX be an experimental domain generated by a set of refinable
aligned strict references with a sparse order. By 3.38 DX has a natural order. Let r ∶ X →
DX × D̄X × DX be the function such that r(xi ) = (“x < xi ”, “x ≥ xi ” ∧ “x ≤ xi ”, “x > xi ”).
Let R = {r(xi ) ∣ xi ∈ X}. Then R is order isomorphic to X. As the order on R is sparse
then the order on X is sparse as well.

Now, consider the examples above of discrete quantities: in each case we can experimentally
test whether we have a particular value or not. For example, we are always able to tell whether
there are exactly three apples on the table or not.5 This is not a coincidence: there is a direct
link between the ability to have consecutive references and decidability.
As we saw before in 3.31, two consecutive references are such that the before statement
of one is equal to the negation of the after statement of the other. But since before and after
statements are both verifiable, it means their negation is also verifiable: they are decidable.
5. Recall that this is not the case with continuous quantities. Because of finite precision, we are able to exclude that a given particle has exactly zero mass but it is not possible to conclusively show that it has zero mass.

And since before and after statements generate the domain, all statements in the domain are
decidable. It turns out that this will work in reverse as well: whenever we have a domain
consisting of only decidable statements, we can always create a discrete quantity that fully
characterizes the experimental domain.

Proposition 3.45. The order topology for the integers is discrete.


Proof. Each singleton {z} ⊆ Z is in the order topology since {z} = (z − 1, ∞) ∩ (−∞, z + 1).
Each arbitrary set of integers is the union of singletons and is therefore in the order topology
as well. The order topology on the integers is discrete.
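
The proof’s construction can be mirrored directly (the finite window of integers is our own illustrative restriction):

    # Sketch: each singleton {z} is the intersection of two rays.
    def ray_gt(a, Z):  return {z for z in Z if z > a}
    def ray_lt(b, Z):  return {z for z in Z if z < b}

    Z = set(range(-5, 6))      # a finite window of the integers
    z = 2
    print(ray_gt(z - 1, Z) & ray_lt(z + 1, Z))   # {2}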

Proposition 3.46. An experimental domain is decidable if and only if it is fully charac-


terized by a discrete quantity.

Proof. Let DX be a decidable domain. Then by 1.74 the set of possibilities X is countable
and by 1.76 the natural topology is discrete. Since X is countable, there exists a bijective
map q ∶ X → Z. The map is a homeomorphism since the topology on both X and Z is
discrete. The domain DX is fully characterized by (Z, ≤, q).
Conversely, let DX be fully characterized by (Z, ≤, q). This means that q ∶ X → Z is a homeomor-
phism. The natural topology for X is therefore discrete and the domain is decidable by
1.76.

Note that, since any discrete order leads to a decidable domain and a discrete topology,
any reordering of the possibilities will give the same exact domain. This means that, while
we can always assign a natural order, the order itself may not necessarily be meaningful. For
example, we can always take a finite group of objects and arbitrarily assign each a unique
integer to identify it. In the discrete case the domain itself is not enough to pick a unique
order, though the set of aligned references that are used to generate the domain is.
Also note that for the link between decidability and discrete quantities to apply, it is
crucial that the quantity is measurable: that we can actually experimentally ascertain its values.
Consider the domain with the possibilities “there is no extra-terrestrial life” and “there is
extra-terrestrial life”. We can arbitrarily label 0 the former and 1 the latter. But since we
cannot verify the first statement, we cannot really “measure” 0. In that case, the domain is
fully identified by the discrete quantity, but not fully characterized.

3.5 Arbitrary precision and continuous quantities


The second type of quantities we want to consider are continuous ones: the average wingspan
for a species, the population density of a country or the mass of a proton are all continuous
quantities. These are quantities that are fully characterized by real numbers.
We will see that continuous quantities also have a simple characterization: between two
references there can always be an infinite number of other references.
Similar to what we did for the integers, we want to characterize the ordering of the real
numbers. This will be a little bit more involved as we will need a few more requirements. First
of all we note that between two real numbers there are always infinitely many real numbers.
Let’s call dense an ordered set that has that property. This is not enough to identify the real
numbers, though; the rational numbers are also dense.

One reason real numbers are used over the rationals is that they contain all the limits. This
property can be restated in terms of ordering in the following way. Suppose (Q, ≤) is a linearly
ordered set. Take a set A ⊂ Q that is bounded. That is, the set B = {q ∈ Q ∣ a ≤ q ∀a ∈ A} of
elements that are greater than all the elements of A is not empty. Then we say that (Q, ≤) is
complete if B has a smallest element. One can show that the rationals are not complete: if
A contains all rationals less than π, then there is no smallest rational value that is greater
than all elements of A.
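
As a small illustration, not a proof (the digits of π are listed by hand below): truncating the decimal expansion and adding one unit in the last place produces rational upper bounds of A that keep decreasing, which suggests why no smallest one can exist.

    # Sketch: ever smaller rational upper bounds for {q in Q | q < pi}.
    from fractions import Fraction

    PI = "3.14159265358979"            # decimal digits of pi, truncated
    bounds = [Fraction(PI[: 2 + k]) + Fraction(1, 10 ** k) for k in range(1, 5)]
    print(bounds)    # 16/5 > 63/20 > 1571/500 > 3927/1250, all above pi
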
Dense and complete linear orders exclude both the integers and the rationals, but they
don’t pick out only the reals. To do that we take advantage of two results of order theory.
The first is that all dense countable ordered sets are isomorphic to the rational numbers.
The second is that from any linear order one can construct its completion (i.e. you add all
the missing limits), which is unique up to an order isomorphism. Suppose that (Q, ≤) has a
countable subset Q̂ ⊂ Q that is dense in Q. That is, for every two distinct elements q1 , q2 ∈ Q
where q1 < q2 we can find an element q ∈ Q̂ such that q1 < q < q2 . If Q is dense, Q̂ will also be
dense and therefore will be order isomorphic to the rationals, since it is countable. If Q is
complete, then it will be the completion of its dense subset Q̂, and therefore it will be order
isomorphic to the reals.

Definition 3.47. An ordered set is said to be dense if between any two elements there
exists an infinite chain.

Corollary 3.48. An ordered set is dense if and only if between two elements we can always
find another one.

Proof. We give a definition of dense that is different from the typical definition because
we want it to be formally similar to our definition of sparse. Here we show that our definition
of dense is equivalent to the standard one.
Let (Q, ≤) be a dense ordered set. Let q1 , q2 ∈ Q then we can find an infinite chain
between them. Take an element q within that chain that is not an endpoint. We have
q1 < q < q2 .
Now let (Q, ≤) be a linearly ordered set such that between two elements we can always
find another one. Let qa , qb ∈ Q and let C ⊂ Q contain qa , qb , an element q1 such that
qa < q1 < qb , an element q2 such that q1 < q2 < qb , an element q3 such that q2 < q3 < qb and
so on. Then C is a chain between qa and qb that contains infinitely many elements.
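
A sketch of this characterization with exact rational arithmetic (the endpoints 0 and 1 and the stopping threshold are our own choices): the midpoint always lies strictly between two distinct rationals, so repeatedly inserting midpoints builds a chain that could be extended indefinitely.

    # Sketch: an (arbitrarily extendable) chain between two rationals.
    from fractions import Fraction

    def between(q1, q2):
        return (q1 + q2) / 2                     # exact rational midpoint

    a, b = Fraction(0), Fraction(1)
    chain = [a]
    while b - chain[-1] > Fraction(1, 100):      # stop once within 1/100 of b
        chain.append(between(chain[-1], b))
    chain.append(b)
    print(chain)     # 0, 1/2, 3/4, 7/8, ..., 127/128, 1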

Definition 3.49. A subset Q̂ ⊂ Q of a linearly ordered set is dense in Q if given q1 , q2 ∈ Q
such that q1 < q2 we can find q ∈ Q̂ such that q1 ≤ q ≤ q2 .

Definition 3.50. A linearly ordered set Q is complete if every non-empty bounded subset
of Q has a supremum. That is, given A ⊂ Q such that B = {q ∈ Q ∣ a ≤ q ∀a ∈ A} is not
empty, then B has a smallest element.

Theorem 3.51. A linearly ordered set is dense, complete and has a countable dense subset
if and only if it is order isomorphic to a contiguous subset of real numbers.

Remark. Proving this theorem would go beyond the scope of this book so we will take
it as a given. The general idea is as follows. Show that the completion of an ordered set is
unique up to an isomorphism. Show that the given set is the completion of the countable
dense subset. Show that the countable dense subset is order isomorphic to a subset of the
rational numbers. Show that the real numbers are the completion of the rational numbers.
Then the set is isomorphic to a subset of the real numbers.
We can now define a continuous quantity as one for which the values are a contiguous
subset of the real numbers. While in principle we could define it on a generic set that is order
isomorphic to the real numbers, we do not have any other example of such a set.
The natural question now is: under what conditions are the possibilities of a domain
ordered like the real numbers? The answer is straightforward: when the references within
the domain have a dense order. That is, between two references we can put infinitely many
ordered references. Intuitively, between two marks of a ruler we can keep putting finer and
finer marks.
Note that the dense order on the references is enough to get a dense complete order on the
possibilities that has a countable dense subset. The completion comes without extra conditions
because experimental domains are closed under countable disjunctions. New references, in fact,
can be constructed as limits of others and, since they are just other references, they will
have the same properties. The countable dense subset simply corresponds to the countable
basis of the domain.

Definition 3.52. A continuous quantity for an experimental domain DX is a quantity


(U, ≤, q) where U ⊆ R is a contiguous subset of the real numbers.

Theorem 3.53 (Continuous ordering theorem). Let DX be an experimental domain. Then


the following are equivalent:

1. the domain has a natural dense complete order that has a countable dense subset
2. the domain is fully characterized by a continuous quantity
3. the domain is generated by a set of refinable aligned strict references with a dense
order

Proof. For (1) to (2). Let DX be a domain with a natural dense complete order with
a countable dense subset. Then by 3.51 it is order isomorphic to a contiguous subset of
the real numbers. Then by 3.9 the domain is fully characterized by a continuous quantity
(U, ≤, q) where U ⊆ R.
For (2) to (3). Let DX be a domain fully characterized by a continuous quantity. Then
by 3.9 and by 3.38 it is generated by a set of refinable aligned strict references. Let r1 and
r2 be two references aligned with the domain such that r1 < r2 . Then the after statement
of r1 will be of the form “x > q⁻¹(q1)”, the before statement of r2 will be of the form “x < q⁻¹(q2)” for some q1, q2 ∈ Q such that q1 < q2. Let r ∶ Q → DX × D̄X × DX be the function such that r(qi) = (“x < q⁻¹(qi)”, “x ≥ q⁻¹(qi)” ∧ “x ≤ q⁻¹(qi)”, “x > q⁻¹(qi)”). Let C = {r1} ∪ {r(qi) ∣ qi ∈ Q, q1 < qi < q2} ∪ {r2}. As Q is dense, this chain will be infinite.
Therefore the domain can be generated by a set of refinable aligned strict references with
a dense order.
For (3) to (1). Let DX be an experimental domain generated by a set of refinable
aligned strict references with a dense order. By 3.38 DX has a natural order. Let r ∶ X →
DX × D̄X × DX be the function such that r(xi) = (“x < xi”, “x ≥ xi” ∧ “x ≤ xi”, “x > xi”). Let
R = {r(xi ) ∣ xi ∈ X}. Then R is order isomorphic to X. As the order on R is dense then the
order on X is dense as well. Let Db be the before domain. By 3.14 it is order isomorphic
to X. Let Bb ⊆ Db be a countable basis for the domain. As it is a subset of a linearly
ordered set, it will also be a linearly ordered set. As the basis is ordered by narrowness,
finite conjunctions and disjunctions of basis elements will return a basis element. Therefore
every element in Db is equivalent to the disjunction of a countable set of elements of Bb .
Let b1 , b2 ∈ Db be two statements such that b1 ≺ b2 . Let B1 , B2 ⊂ Bb be the set of basis
elements such that b1 = ⋁_{s∈B1} s and b2 = ⋁_{s∈B2} s. Since b1 ≢ b2, there must be a b ∈ B2 such that b ∉ B1. Because of the ordering of the basis, we have b1 ≺ b ≼ b2. The basis Bb is dense in Db. Moreover, Db is complete. Let B ⊆ Db; then ⋁_{s∈B} s is in Db and, since it is the narrowest
statement that is broader than any statement in B, it is the supremum of B. As Db is
complete and has a countable dense subset so does X and therefore it has a continuous
order.
Another way to think about the verifiable statements of a domain characterized by a
continuous quantity is in terms of finite but arbitrarily small precision. That is, when we
measure a continuous quantity we can verify statements of the form “the value is 1 ± 0.5”.
The verifiable statement is in terms of a range and the extremes are rational numbers. It is
instructive to know, then, that the topology for the real numbers can be generated by those
types of statements. In terms of references this means that all our verifiable statements could
be expressed as verifying that the value is between two references from a countable set of
possible references. This, again, maps well to what one can and does do in scientific practice.
Mathematically, we call the standard topology on the real numbers the one generated by
the open intervals between rational numbers, and we can show that this is exactly the order
topology. Moreover, every verifiable set is the disjoint union of open intervals.

Definition 3.54. We call standard topology on the real numbers R the one generated by the collection of sets B = {(a, b) ⊂ R ∣ a, b ∈ Q} of all open intervals between rational numbers Q.

Proposition 3.55. The order topology on the real numbers is the standard topology.

Proof. To show that they are equivalent, we show that the basis of one generates the
basis of the other. Let a, b ∈ Q be two rationals. The sets (−∞, b) and (a, ∞) are in the
basis of the order topology. Their intersection is the set (a, b) of the standard topology.
The basis of the standard topology can be generated by the order topology.
Conversely, let a ∈ R be a real number. Let {Ui }i∈I be the collection of all sets Ui =
(ai , bi ) such that ai , bi ∈ Q and a < ai . These sets are in the basis of the standard topology.
We have ⋃_{i∈I} Ui = (a, ∞). In the same way, let b ∈ R be a real number. Let {Vj}_{j∈J} be the collection of all sets Vj = (aj, bj) such that aj, bj ∈ Q and bj < b. These sets are in the basis of the standard topology. We have ⋃_{j∈J} Vj = (−∞, b). The basis of the order topology can be
generated by the standard topology.

Proposition 3.56. Let U ∈ TR be a set in the standard topology on the reals. Then U = ⋃_{i=1}^∞ Vi is the countable disjoint union of open intervals Vi = (ai, bi) where ai, bi ∈ R ∪ {−∞, ∞}.

Proof. The set U can be expressed as the union of some collection B̃ ⊆ B of open rational intervals. From B̃ construct B1 ⊆ B̃ by picking an element of B̃ and keep adding any element that is not disjoint from an element already in B1. If there are elements left over, construct B2 ⊆ B̃ with the same procedure and continue until there are no elements left. Take V1 = ⋃_{V∈B1} V, the union of all elements of B1. Since we are taking the countable union of open rational intervals, and because of the overlap of these intervals, the result will be an open interval over the real numbers. Repeating for all Bi gives us a collection of disjoint open intervals {Vi}_{i=1}^∞ such that U = ⋃_{i=1}^∞ Vi.
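The merging procedure in this proof can be sketched concretely. The following illustration of ours groups a finite family of intervals into disjoint ones (assuming open intervals, so only strictly overlapping intervals are merged):

def disjoint_decomposition(intervals):
    out = []
    for a, b in sorted(intervals):
        if out and a < out[-1][1]:        # strictly overlaps the current group
            out[-1][1] = max(out[-1][1], b)
        else:
            out.append([a, b])            # start a new disjoint group
    return [tuple(v) for v in out]

print(disjoint_decomposition([(0, 1), (0.5, 2), (3, 4), (3.5, 5)]))
# [(0, 2), (3, 5)] — a union of rational intervals as disjoint open intervals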

In the previous section we saw how the ability to have consecutive references is linked to
decidability. In the case of continuous quantities, the inability to have consecutive references
means we cannot have decidable before or after statements, otherwise we could use them to
create consecutive references. So, while for integer quantities references have immediate suc-
cessors and predecessors and all statements are decidable, for a continuous quantity references
do not have immediate successors and predecessors and the statements are only verifiable.
We stress here that it is the immediate successors of the references that matter, not the
immediate successors of possibilities/values. Take the rationals, for example. As values, they
do not have immediate successors or predecessors: their order is dense. But we can construct
a reference with “the rational quantity is more than π” as an after statement and a reference
that has “the rational quantity is less than π” as a before statement. Since π is not a rational
number, there are no values in between: the two references are consecutive. This can never
happen with the reals, because all limit values are possible values. So, while the rationals do
not admit consecutive values, it is only the reals that never admit consecutive references.6

Proposition 3.57. A naturally ordered experimental domain DX is characterized by a continuous quantity if and only if none of the verifiable statements are decidable except for the certainty and the impossibility.
Proof. Let DX be a naturally ordered domain for which none of the verifiable statements
are decidable except for the certainty and the impossibility. Then it is generated by a
refinable set of aligned strict references R. Let r1 = (b1 , ¬b1 ∧ ¬a1 , a1 ) be a reference aligned
with the domain that admits a successor. Then a1 is a verifiable statement that is not
impossible and therefore it is not decidable given the properties of DX . This means there
can’t be an immediate successor for r1 : by 3.31 the before statement of the immediate
successor would have to be equivalent to ¬a1 which is not verifiable. Therefore if r3 is a
reference such that r1 < r3 , we can find yet another reference r2 aligned with the domain
such that r1 < r2 < r3 . The order on R is dense and by 3.53 the domain is fully characterized
by a continuous quantity.

6. Mathematically, this construction corresponds to the Cauchy limits or the Dedekind cuts that one uses
to construct the reals. The idea is that experimental domains already have them built into their structure in
a way that corresponds better to physical concepts.

Conversely, let DX be a domain characterized by a continuous quantity (U, ≤, q). Take a contingent verifiable statement s ∈ DX and a possibility x0 ∈ X compatible with s. Consider
the value q0 ∈ R such that q(x0 ) = q0 . Because s is verifiable, it will correspond to a set V
of the topology of the quantity, which by 3.56 is the union of open intervals. Since x0 is compatible with s
then q0 ∈ V . Find the interval (q1 , q2 ) that contains q0 . We can find q4 , q5 ∈ (q1 , q2 ) such
that q4 < q0 < q5 . We’ll also have q̂ ∈ V for all q4 < q̂ < q5 . That is, for any value that is
compatible with s we can find a contiguous finite interval surrounding the value with all
elements compatible with s.
Consider now the statement s. Because it is contingent, it will not be compatible with
at least one possibility and therefore one value in U . The set V will contain some interval
with at least one finite endpoint q0 . Since it is the endpoint of an open interval, we cannot
find q1 < q0 < q2 such that q̂ ∉ V for all q1 < q̂ < q2 . That is, there is at least one value
incompatible with s that does not admit a finite interval surrounding it with all elements
incompatible with s.
Now consider ¬s. There will be at least one value compatible with ¬s that does not admit
a finite interval surrounding it that is all compatible with ¬s. But since this cannot happen
for a verifiable statement, then ¬s is not a verifiable statement and s is not decidable.

The integers and the reals, then, are the only two possible orders defined experimentally
that are, in a sense, regular. That is, the order relationships look the same no matter where
you are in the order. You have an immediate successor (or not) regardless of what reference
you have. You can only put finitely many references (or not) between any two references.
For any other ordering, instead, some references will have an immediate successor and some
won’t. In this sense integers and reals are very special and that is why they are fundamental
in physics.

3.6 When ordering breaks down


As we have identified the necessary and sufficient conditions to define an order from exper-
imental verification, we can reflect on whether these are achievable in practice or they are
idealizations. We need to be able to create references that are strict (before/on/after mutually exclusive), aligned (references can be fully before or after each other) and refinable (overlapping references can be divided into finer sequential ones). Are these always reasonable
(overlapping references can be divided into finer sequential ones). Are these always reasonable
expectations?
In the case of discrete quantities, they indubitably are. The domain is decidable so the
possibilities themselves are verifiable statements. We can actually confirm that “there are 3
ducks on the table”. Therefore it is clear that this can be and is achieved.
In the case of continuous quantities, instead, things are a lot more problematic. In some
cases, you start with what you think is a continuous quantity but then you realize that it was
a discrete one. For example, an amount of water seems continuous but if you keep refining
your references you see that it consists of discrete molecules. That is, you can’t really go on
refining references indefinitely. But that’s not the only way the continuous order can break
down.
Suppose you want to refine the marks on a ruler over and over, getting more and more
precise position measurements. You start with 1 mm thick lines, you reduce them to 0.1 mm
and make more of them and so on. At some point you’ll reach single molecules or single atoms,
but those have spatial extension as well. We can reach fundamental particles, but those too
have an extent (i.e. their wavefunctions can overlap). We can imagine using more wavelike
features, but the spatial resolution will be linked to higher and higher energies through higher
wave-numbers. So it is not clear that we can keep constructing ever finer references.
On a similar note, if a reference is, in the end, realized through a set of particles, it stands
to reason that if we reduce the number of particles the reference will become finer. If single
particles are the finest reference, we don’t have that many choices. But how are we going to
be able to tell our references apart, since, for example, all electrons are indistinguishable from
each other? How can we place them ever increasingly close to each other so that they don’t
scatter and switch places?
Moreover, it is not clear how our references can be strict. That is, that the object we
measure is always either before, on or after the reference. If the object we are measuring is
of smaller extent than our references then we can reasonably pretend they are strict. But if
both the references and what we are measuring are single particles, it would seem we have a
problem.
It would appear, then, that the conditions for a continuous quantity can never really
be ultimately met. And if those conditions can’t be met, it’s not that we don’t have the real
numbers: we don’t have ordering at all. Strictness, alignment and refinability are requirements
for order in general. And if space and time can’t be truly given an ordering, other derived
quantities, like velocity, acceleration, mass, energy and so on would inherit the problems.
In both cases the real numbers are just an approximation we can make by pretending we
can get finer and finer strict and aligned references. This may be contrary to the way many
people see the relationship between mathematical and physical objects. Some may feel that
the geometric description, with its infinite precision, is the perfect one while the physical
one, with the inherent measurement bounds, is the less precise one. Actually, it is quite the
opposite: the bounds of a measurement better qualify our description and knowledge while
the geometrical description provides a simplified, idealized and therefore less precise account.
In other words, 3.14 ± 0.005 is an exact physical description while π is the approximation.
We again stress the fact that the approximation can break down in different ways. It
may break down because we reach a finest element or it may break down because we do
not have finer, strict and aligned references anymore, which is what we expect to happen for
space and time. We’d have a structure where coarse references, at some point, are to a good
approximation refinable/strict/aligned and therefore approximately ordered while the really
fine ones are not. Note that a lot of physical ideas and mathematical tools rest on the idea
that there is a well defined ordering. Causality and deterministic motion require that time is
linearly ordered. Differentiation and integration also require the reals to be linearly ordered.
All these tools, then, need to be fundamentally reshaped. The general theory, then, is telling
us that there is a lot of work that needs to be redone if we are to construct a physical theory
that works in those regimes.

3.7 Summary
In this chapter we have seen how our general mathematical theory of experimental science
handles properties and quantities. These are what we typically use in practice to distinguish
between the possible cases and are what we measure experimentally.

Mathematically, each possibility is mapped to an element of a topological space whose verifiable sets correspond to verifiable statements. This construction maps to what happens
in manifolds, where the points of the space are in one-to-one correspondence with a suitable
Euclidean space. Our construction is more general and works for non-numeric properties as
well.
We have seen that quantities are particular types of properties characterized by a linear
order. In this case the topology is the order topology given by the linear order, which represents
the ability to experimentally compare two different values and tell which one is greater.
The property ordering theorem and the domain ordering theorem give us the necessary and
sufficient conditions under which an experimental domain is fully characterized by a quantity.
To construct a system of measurement for a quantity, we saw that all we need is to
define a set of references: objects that partition the cases into a before and after. In general,
though, references can overlap as they will have some physical extent and may not be aligned.
The reference ordering theorem tells us that an ordering on the possibilities emerges only if
the references are refinable (we can always break apart overlapping references), aligned (the
before and not-after statements are ordered by narrowness) and strict (the value is always
either before, on or after the reference).
We defined discrete quantities as the ones that can be associated with integers and contin-
uous quantities as the ones that can be associated with real numbers. Physically, the defining
characteristic of the first is that between two references we can only put finitely many ref-
erences while the defining characteristic of the second is that between two references we can
always put infinitely many. Mathematically, the ordering in the second case is automatically
complete because experimental domains will already contain all the limits in the form of
countable disjunctions.
It is important to note that the requirements for continuous quantities cannot really be
physically realized. Continuous quantities, then, should really be thought of as an idealization:
the limit of an infinite process of subdivision. To go past the idealization, either we lose the
idea of having infinitely many references between two, in which case we revert to discrete
quantities, or we lose the ordering altogether.
Part III

Blueprints for the work ahead

Chapter 1

Reverse Physics

1.1 Classical mechanics


The work on classical mechanics is considered mostly concluded, in the sense that suitable
initial assumptions have been identified. There are still a few open issues, such as the case of
variable mass, the generalization of the directional DOF to the relativistic case, or clarifying the nature of the generalization to infinite DOFs (i.e. field theories).

Curvature for particle dynamics


The assumption of kinematic equivalence already gives us relativistic Hamiltonians. Does it
also give us a relationship between the curvature of the metric tensor and the forces acting
on the particles?
The setup is the following. Suppose we have two vectors in the extended phase space
dξ^a = {dq^α, 0} and dν^a = {0, dp_α}. Using the symplectic form we have the invariant dξ^a ω_{ab} dν^b = dq^α dp_α. Under the kinematic assumption we have dq^α = dx^α and dp_α = m g_{αβ} du^β + q dA_α. We have dξ^a ω_{ab} dν^b = dx^α m g_{αβ} du^β + dx^α q dA_α.
Since the two terms have to match at each point and the symplectic form has the same
components at each point, can we constrain the change of the components of gαβ ? The general
idea would be that components of gαβ may have to change in space/time coordinately with
Aα as to make dξ a ωab dν b = dq α dpα remain the same. Note that derivatives in q α are taken at
constant pα while derivatives in xα are taken at constant uα .

1.2 Thermodynamics
Process entropy
The key to recover thermodynamics is finding a definition of entropy that applies in very
general cases and recovers the usual definition. Instead of using the logarithm of the count of
states, we use the logarithm of the count of possible evolutions. That is, the ways a system
can evolve under a specific process. The entropy of the system is automatically relativistic
(i.e. we are essentially counting “worldlines” of the overall system in its state space) and is
process dependent (i.e. contextual).
In the case of deterministic and reversible evolution, the count of states is equivalent to the
count of evolutions, and therefore the usual definition is recovered. In the case of stochastic
steady state over continuous time, that is when the probability distribution stabilizes, the system will traverse infinitely many states within a small time difference dt. The count of
evolutions, then, can be shown to reduce to the permutations of infinite sequences which
recovers the Gibbs/Shannon entropy.
As for the behavior of entropy, the idea is that for a specific process, the state at a
particular time identifies a set of possible evolutions. This would be the entropy of that state.
Over the continuum, where states are points, the entropy would become a density of the count
of evolutions. In essence, the entropy of a system at a particular time tells us how much or
how little the evolution is constrained. In other words, it tells us how much the system is
expected to fluctuate. As time evolves, the state changes, and the count of evolutions changes
as well. If the evolution is deterministic, the evolutions can never split, in the sense that all
the evolutions that end up in a particular state must all go to another state. This means that
for a deterministic process the count can never decrease. If the evolution is deterministic and
reversible, then the count must stay the same. This recovers the feature of entropy to be a
non-decreasing quantity, which is conserved during reversible processes.
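As a minimal sketch of this behavior (our toy example, not part of the theory itself), take a finite deterministic map and count the histories that end in the currently occupied state; the count never decreases along a trajectory:

states = [0, 1, 2, 3]
step = {0: 1, 1: 2, 2: 2, 3: 2}       # deterministic; state 2 is an equilibrium

N = {x: 1 for x in states}            # histories ending in x (one each at t = 0)
x = 0                                 # follow the trajectory that starts at 0
for t in range(4):
    print(t, x, N[x])                 # counts 1, 1, 4, 4: never decrease
    N = {y: sum(n for z, n in N.items() if step[z] == y) for y in states}
    x = step[x]

Note how the count jumps when distinct histories merge at the equilibrium and stays constant while the evolution is effectively reversible.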
If the evolution allows equilibria, the evolutions will concentrate around states of equilibria.
Given that a state cannot go out of equilibrium once equilibrium has been reached, the count of evolutions
is maximized at equilibrium. This recovers another feature of entropy.
Lastly, if two systems are independent, the way one evolves does not constrain the other.
The total count of evolutions, then, is simply the product of the count of evolutions of the two
systems. Since the entropy is the logarithm of the count of evolutions, it sums over independent
systems. This recovers the last property of entropy.

Equation of state
If we study the space of equilibria, each state will have a well defined entropy. Therefore
we have an equation of state S(ξ a ) where ξ a form a set of variables that fully identify the
state. Moreover, as noted before, entropy is additive under system composition of independent
systems.
In a process with equilibria different evolutions must converge to the same final state,
which means the process entropy increases and is maximized at equilibria. This gives the
general idea that entropy increases during an irreversible process. These results are therefore
valid in general, no matter what type of system is being described.
To find thermodynamics specifically, we need an additional set of assumptions. First, all
states are equilibria. Second, all state variables ξ a are additive under system composition.
Third, one of the them, which we call internal energy U , is conserved under any evolution,
including irreversible evolution. We can then write the equation of state as S(U, xi ) and define
the following quantities:
∂S/∂U = β = 1/(kB T)
∂S/∂xⁱ = −β Xᵢ        (1.1)

We can then express the differentials as:

dS = (∂S/∂U) dU + (∂S/∂xⁱ) dxⁱ = β dU − β Xᵢ dxⁱ
dU = T kB dS + Xᵢ dxⁱ        (1.2)
This is essentially Gibbs’ approach to thermodynamics.
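As a quick consistency check of definitions (1.1)-(1.2), we can plug in a standard equation of state (our example: the dimensionless entropy S(U, V) = N ln(U^{3/2} V) of a monatomic ideal gas) and recover the familiar relations:

import sympy as sp

U, V, N, kB = sp.symbols('U V N k_B', positive=True)
S = N * sp.log(U**sp.Rational(3, 2) * V)

beta = sp.diff(S, U)                      # (1.1): beta = 1/(k_B T)
T = sp.simplify(1 / (kB * beta))          # 2U/(3 N k_B), i.e. U = (3/2) N k_B T
X_V = sp.simplify(-sp.diff(S, V) / beta)  # (1.1): X_V = -P for the volume variable
print(T, sp.simplify(-X_V))               # P = 2U/(3V) = N k_B T / V, the ideal gas law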

Thermodynamic laws
To recover the laws, we need a few more definitions. We define a reservoir R as a system for
which the internal energy UR is the only state variable and the state entropy SR is a linear
function of UR. That is, ∂SR/∂UR = βR = 1/(kB TR) is a constant. We call heat Q = −∆UR the energy
lost by the reservoir during a transition.
We define a purely mechanical system M as a system for which the state entropy is zero
for each state. That is, SM (UM , xiM ) = 0. We call work W = ∆UM the energy acquired by a
purely mechanical system during a transition.
Now, consider a composite system made of a generic system A, a reservoir R and a purely
mechanical system M . Consider a transition where we go to a new equilibrium. Since energy
is additive under system composition, let us call U the total energy. Since energy is conserved
we have:

∆U = 0 = ∆UA + ∆UR + ∆UM = ∆UA − Q + W
∆UA = Q − W        (1.3)

Since entropy is extensive, let us call S the total entropy. Since the process is going to an
equilibrium, the entropy can only increase. We have:

0 ≤ ∆S = ∆SA + ∆SR + ∆SM = ∆SA + βR ∆UR + 0 = ∆SA − Q/(kB TR)
kB ∆SA ≥ Q/TR        (1.4)

1.3 Quantum mechanics and irreducibility


Quantum mechanics can be recovered by swapping reducibility with irreducibility as shown
in diagram 1.1, which can be used as a guide throughout this section.
The assumptions lie on the left column. Each assumption leads to one or two key insights
that progressively lead to the physical concepts in the middle column. Each of these is then
mapped to its corresponding formal framework on the right. Note that “quasi-static process”
and “conserved density” both independently lead to the same result of “unitary evolution”.

Irreducibility
The state space of quantum mechanics can be recovered under the:

Assumption V (Irreducibility). The state of the system is irreducible. That is, giving the
state of the whole system says nothing about the state of its parts.

Under this assumption the state of the system is automatically an ensemble over the state
of the parts as preparation of the whole leaves the parts unspecified. For the same reason, the
entropy of these ensembles must be the same, or some ensembles would provide more or less
information about the parts. The whole task, then, is to characterize these ensembles without
making specific assumptions on the parts.
Let C be the state space of the irreducible system. Let us call fragment a part of the
irreducible system. The state of a fragment will be associated with a random variable uni-
formly distributed over the possible fragment states.

[Figure 1.1: Assumptions for quantum mechanics]

As discussed in the context of classical mechanics, distributions over states must be invariant and symplectic manifolds are the only
manifolds over which invariant distributions can be defined. As we cannot say anything about
the state of the fragments, the dimensionality of this manifold must be irrelevant as long as
it is even dimensional. For simplicity, we can choose a two-dimensional one. Therefore we
are interested in the space of bi-dimensional uniform distributions formed by a pair of two
random variables A and B.
The values of the variables themselves are not relevant, as they are not physically accessible
by assumption. However, the size of the system µ = ∫ ρdA ∧ dB is relevant. Without loss of
generality, we can rescale A and B such that the density ρ is not only uniform but unitary:
ρ = 1. This way the size of the system is directly proportional to the area covered by the
random variables. In other words: the more fragments there are, the more each fragment can
swap its state with another without changing the whole, the more uncertainty there is on the
state of the fragment, the higher the variance of the random variables.
Since only linear transformations will preserve the uniform distribution, we look to those.
These are translations, stretches and rotations. Translations do not lead to other physically
distinguishable states since the exact values of A and B are not physically accessible. Stretch-
ing of the distribution will correspond to an increase of the size of the system, which is
physically accessible. However, only the stretching of the area is of interest. So, without loss
of generality, we can set σA = σB = σ and we have µ ∝ σ 2 . Rotations just change the cor-
relations which, by themselves, are not physically accessible. However, under addition the
correlations still result in differences in variance and, indirectly, the size of the system, and
therefore are physically interesting. The space of transformations is therefore given by two
parameters a and b such that:

C = aA + bB
(1.5)
D = −bA + aB

Equivalently, we can use the complex number c = a + ıb to characterize the transformation,


which we can note as τ(c). The increase/decrease in size is given by a² + b² = (a − ıb)(a + ıb) = c*c
and the change in correlation is given by the Pearson correlation coefficient ρA,τ (c)A = cos arg c.
Putting it all together, we can characterize the state space C with a complex vector space.
The linear combination represents the mixing of the different stochastic descriptions. Two
vectors that only differ by a total phase are physically equivalent since a global change of
correlation does not change the distribution.
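These properties are easy to verify numerically. The following check (our illustration; the specific numbers are arbitrary) samples two uniform random variables and confirms that τ(c) scales the variance by c*c, sets the correlation between A and τ(c)A to cos arg c, and leaves the pair (C, D) uncorrelated:

import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(-1, 1, 1_000_000)
B = rng.uniform(-1, 1, 1_000_000)

a, b = 1.2, 0.5                          # c = a + ıb
C = a * A + b * B
D = -b * A + a * B

print(np.var(C) / np.var(A))             # ≈ a² + b² = c*c
print(np.corrcoef(A, C)[0, 1],           # ≈ a/√(a² + b²) = cos(arg c)
      a / np.hypot(a, b))
print(np.cov(C, D)[0, 1])                # ≈ 0: C and D stay uncorrelated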
We can define a scalar product ⟨⋅∣⋅⟩ where the square norm induced corresponds to the
size of the system (or equivalently to the strength of the random variable) and the phase
difference corresponds to the correlations (the Pearson correlation coefficient). To see this,
note the formal equivalence between the variance and norm rules under linear composition:
σ²_{X+Y} = σ²_X + σ²_Y + 2 σ_X σ_Y ρ_{X,Y}
∣ψ + ϕ∣² = ∣ψ∣² + ∣ϕ∣² + 2∣ψ∣∣ϕ∣ cos(∆θ)        (1.6)

The quadratic form, again, reflects the fact that the size of the system is proportional to the
variance of a random variable. Since the size of the system is fixed, we use unitary vectors to
represent actual states. The state of the system, then, is represented by a ray in a complex
inner product space.
Lastly, we need to define an expectation operator that returns the average value for each
physical quantity. This operator will have to be linear under linear combination of quantities:

E[aX + bY ∣ψ] = aE[X∣ψ] + bE[Y ∣ψ]. (1.7)

It will not be linear under linear combination of states:

E[X∣ψ + ϕ] ≠ E[X∣ψ] + E[X∣ϕ]. (1.8)

Yet, it will have to be proportional to the increase in size and invariant under a total change
in correlation: E[X∣τ (c)ψ] = c∗ cE[X∣ψ]. This leads us to associate to each physical quantity
a linear Hermitian operator X where E[X∣ψ] = ⟨ψ∣X∣ψ⟩. An eigenstate ψ0 of X corresponds
to a state where all the elements of the ensemble have exactly the same value. That is,
E[(X − x̄)2 ∣ψ0 ] = 0.
Note that an inner product space can always be completed into a Hilbert space. This may,
however, bring in objects that may not correspond to physical objects (i.e. infinite expectation
for some quantities). In general, we believe it is better to regard the (possibly incomplete)
inner product space as the physical state space and regard the completion as a mathematical
device for calculation. For example, the Schwartz space seems more physically meaningful
than the standard L2 space as it gives finite expectation of all polynomials of position and
momentum and, moreover, it is closed under Fourier transform.

Process with equilibria


The first type of process we consider is one with equilibria. The measurement process is
recovered as a special case.

Assumption VI (Process with equilibria). Given an initial ensemble (i.e. mixed state), the
final ensemble is uniquely determined and remains the same if the process is applied again.

Under this assumption, the process can be characterized by a projection operator. Let ρ1
be the density matrix that characterizes a mixed state. Since the final mixed state must be
uniquely determined by ρ1 , it will be P(ρ1 ) for some operator P. Similarly, if ρ2 is another
initial mixed state, its final operator will be P(ρ2 ). Note that, given any observable X the
expectation E[X∣ρ1 ] = tr(Xρ1 ) is the trace of Xρ1 . Similarly E[X∣P(ρ1 )] = tr(XP(ρ1 )).
We can always create statistical mixtures of the ensembles and we must have E[X∣aρ1 +
bρ2 ] = aE[X∣ρ1 ] + bE[X∣ρ2 ] since these are classical mixtures. But since these are classical
mixtures, the final state will also need to obey E[X∣aP(ρ1 ) + bP(ρ2 )] = aE[X∣P(ρ1 )] +
bE[X∣P(ρ2 )] for all possible X. This means P(aρ1 + bρ2 ) = aP(ρ1 ) + bP(ρ2 ). Therefore the
operator P is a linear operator. Moreover, the process applied twice must lead to the same
result, which means P(P(ρ)) = P(ρ) for any ρ. That is, P 2 = P. Therefore P is a projection.
Suppose, now, that we want to measure a quantity X. We want the final outcome, the
final ensemble, to be determined by the initial state, the initial ensemble. We also want
the measurement to be consistent in the sense that, if it is repeated immediately after, it
should yield the same result. Therefore the process will be a projection. We will also want
that the process does not distort the quantity. That is, E[X∣ρ] = E[X∣P(ρ)]. This means
that the eigenstates of X will correspond to equilibria of the process. Moreover, subsequent
measurements must give the same value, not just the same mixture. That is, if X1 is the
random variable after the first instance of the process and X2 is the random variable after the
second instance, P (X2 = x∣X1 = x) = 1. This means that E[(X − x̄)2 ∣P(ρ)] = 0 which means
the eigenstates of X are the only equilibria.
The measurement process is therefore simply a special case of a process with equilibria.
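A minimal concrete instance of such a process (our construction, with an arbitrary three-level system) is full decoherence in the eigenbasis of X, which is linear, idempotent and leaves E[X] undistorted:

import numpy as np

rng = np.random.default_rng(2)
X = np.diag([0.0, 1.0, 2.0])              # observable in its eigenbasis

def P(rho):
    return np.diag(np.diag(rho))          # keep only the diagonal (the equilibria)

M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
rho = M @ M.conj().T                      # a random density matrix...
rho /= np.trace(rho).real                 # ...normalized to unit trace

assert np.allclose(P(P(rho)), P(rho))                        # P² = P
assert np.isclose(np.trace(X @ rho), np.trace(X @ P(rho)))   # E[X] undistorted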

Deterministic and reversible evolution


The second type of process we consider is one that is deterministic and reversible, which is
the same as assumption DR.
Under this assumption, the process can be characterized by unitary evolution (i.e. the
Schrödinger equation). There are multiple different ways to see this. The first relates to the
more general idea that all deterministic and reversible processes must be isomorphisms in the
category of states. Since the state space is an inner product space, the isomorphism is unitary
evolution.
The second, is that if there is a set of quantities X0 at time t0 that fully identify the state
(i.e. the state is the only eigenstate of those quantities), then there must be a corresponding
set of quantities X1 that fully identify the state at time t1 . This means that the evolution maps
basis to basis. Moreover, given the linearity of statistical mixtures, this will also mean that a
statistical distribution over X0 will have to map to the same distribution over X1 . Therefore
the evolution must map linear combinations of that basis to the same linear combination. The
evolution is a linear operator. Since the total size of the irreducible system cannot change,
the operator must be unitary.

The third, is by constructing a quasi-static process from processes with equilibria, much
like one does in thermodynamics. The idea is that we have an infinitesimal time step, an initial
state ψt and a final state ψt+dt. We want P(ψt+dt∣ψt) = 1. This means that ∣⟨ψt+dt∣ψt⟩∣² = 1. This can happen only if the difference between initial and final states is infinitesimal. That is, ⟨ψt+dt∣ψt⟩ = 1 + ıϵdt where ϵ is a real number. Therefore, by convention, we can write ∣ψt+dt⟩ = (I + H dt/(ıℏ))∣ψt⟩ where H is a Hermitian operator.
Putting these perspectives together, time evolution is a unitary operator which can be written as U = e^{H∆t/(ıℏ)}. If we start in an eigenstate of X, that is X∣ψt⟩ = x0∣ψt⟩, we will end in an eigenstate X̂∣ψt+∆t⟩ = x0∣ψt+∆t⟩ of another operator X̂ = e^{H∆t/(ıℏ)} X e^{−H∆t/(ıℏ)}. In fact:

e^{H∆t/(ıℏ)} X e^{−H∆t/(ıℏ)} ∣ψt+∆t⟩ = e^{H∆t/(ıℏ)} X e^{−H∆t/(ıℏ)} U∣ψt⟩
    = e^{H∆t/(ıℏ)} X e^{−H∆t/(ıℏ)} e^{H∆t/(ıℏ)} ∣ψt⟩
    = e^{H∆t/(ıℏ)} X∣ψt⟩
    = e^{H∆t/(ıℏ)} x0∣ψt⟩        (1.9)
    = x0 U∣ψt⟩
    = x0∣ψt+∆t⟩
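This identity is straightforward to check numerically (our illustration, with an arbitrary 2×2 Hermitian H and X, and ℏ set to 1):

import numpy as np
from scipy.linalg import expm

H = np.array([[1.0, 0.3 - 0.2j], [0.3 + 0.2j, -0.5]])
X = np.array([[2.0, 1.0 + 1.0j], [1.0 - 1.0j, 0.0]])
dt = 0.7

U = expm(H * dt / 1j)                     # U = e^{HΔt/(ıℏ)} with ℏ = 1
Xhat = U @ X @ U.conj().T                 # X̂ = e^{HΔt/(ıℏ)} X e^{−HΔt/(ıℏ)}

vals, vecs = np.linalg.eigh(X)
psi = vecs[:, 0]                          # eigenstate of X with eigenvalue vals[0]
psi_dt = U @ psi
assert np.allclose(Xhat @ psi_dt, vals[0] * psi_dt)   # still an eigenstate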

This is consistent with assuming there is a quasi-static process that, at every t, has equilibria
identified by e^{H(t−t0)/(ıℏ)} X e^{−H(t−t0)/(ıℏ)}. Note that, unlike thermodynamics, the equilibria during the
evolution are not set by external constraints but by the system itself. That is, X depends on
the initial state of the system.
In this light, the measurement processes and the unitary processes can be seen as particular
cases of the same type of processes, those with equilibria, which are defined as a black-box
from initial to final state. This is consistent with the irreducibility assumption as the inability
to describe the dynamics of the parts implicitly assumes that the dynamics of the parts is
at equilibrium and sets a time-scale under which the further description of the system (i.e.
non-equilibrium dynamics) would require describing the internal dynamics.
Chapter 2

Physical mathematics

This chapter presents the areas that still need to be covered to conclude the general math-
ematical theory of experimental science and a summary of the preliminary work done on
them.

2.1 Experimental verifiability


This first part is already well developed and has been presented in chapters one to three.
Possible improvements are discussed in section 2.4.

2.2 Informational granularity


The general goal of this part is to recover elements of measure theory, differential geometry,
probability theory and information theory. The central theme is the ability to compare and
then quantify the granularity of the description provided by different statements. The idea is
to have a single unified structure which can be, in some cases, reduced to the more familiar
mathematical structures.

Statement fineness
Conceptually, we want to be able to compare two statements to see which one provides a
more refined description, which one provides more information. For this, we need to establish
a new axiom.
Note that a theoretical domain D̄ comes with a partial order ≼ that indicates whether one
statement gives a narrower, more specific, description than the other. For example:

• “The position of the object is between 0 and 1 meters” ≼ “The position of the object is between 0 and 1 kilometers”
• “The fair die landed on 1” ≼ “The fair die landed on 1 or 2”
• “The first bit is 0 and the second bit is 1” ≼ “The first bit is 0”

In these cases, the first statements are “contained” in the second ones, which are more general.
We need to define an additional preorder ⊴ ∶ D̄ × D̄ → B that compares two statements and tells us if the first provides a description with finer granularity than the second. Saying s1 ⊴ s2
means that the description provided by s1 is finer, gives more information, is more precise,
than the description provided by s2 . For example:


• “The position of the object is between 0 and 1 meters” ⊴ “The position of the object is between 2 and 3 kilometers”
• “The fair die landed on 1” ⊴ “The fair die landed on 3 or 4”
• “The first bit is 0 and the second bit is 1” ⊴ “The third bit is 0”

In these cases, the first statement may not be contained or overlap with the second. The exis-
tence of this operator and its properties would be an additional axiom. Fineness is a preorder, rather than an order, because it does not satisfy antisymmetry: if s1 ⊴ s2 and s2 ⊴ s1 then
it is not necessarily true that s1 ≡ s2 . In that case, we will say that the two statements are
equigranular, noted s1 ≐ s2 .
Note how statements about geometry, probability and information all satisfy the same
concept. In fact, each of these structures will generate a preorder on the statements. The
general question is what are the necessary and sufficient conditions on the preorder to be able
to recover those structures.

Measure theory
Conceptually, a measure allows one to assign a size to a set. For us, a theoretical set is really
a statement, so we want to assign sizes to statements that represent the coarseness of the
description they provide.
The construction should, roughly, proceed as follows. Let D̄X be a theoretical domain. We select a unit statement u ∈ D̄X. We define, in some way, the set D̄u ⊆ D̄X which contains all statements that are comparable to u. We then try and construct a measure µu ∶ D̄u → R
such that µu (u) = 1. By a measure, we mean that µu is additive over incompatible statements
(i.e. disjoint sets of possibilities). That is, if s1 and s2 are incompatible, we have µu(s1 ∨ s2) = µu(s1) + µu(s2). We want the measure to respect the fineness preorder, to be monotonic. That is, if s1 ⊴ s2 then
µu (s1 ) ≤ µu (s2 ).
Originally, we thought that these measures would have to be always additive and therefore
we started adding suitable axioms on fineness. However, we realized that, in the context of
quantum mechanics, the measure cannot be additive if it has to agree with the von Neumann
entropy. Worse, it is not even monotonic (i.e. a broader statement is not necessarily coarser).
More conceptual work needs to be done to understand the issue.
Note that we have essentially one measure for each equivalence class defined by fineness.
This is intended. One reason a single measure is not sufficient for our work is because we
need to compare statements of “different infinities”. If we have a single measure, we can only
compare objects with a finite measure. All objects with zero measure (or infinite measure)
are indistinguishable. For example, we want to say:

• s1 = “The horizontal position of the object is exactly 0 meters”
• s2 = “The horizontal position of the object is exactly 1 or 2 meters”
• s3 = “The horizontal position of the object is between 0.5 and 1.5 meters”
• s4 = “The horizontal position of the object is between 1.5 and 3.5 meters”
• s1 ⊴ s2 ⊴ s3 ⊴ s4
• s1 ≐̸ s2 ≐̸ s3 ≐̸ s4 (no two consecutive statements are equigranular)

Fineness may also capture the concept of physical dimension. In fact, two descriptions in
the same units are “finitely comparable” in the sense that one gives a finer description than
the other by a finite factor. Descriptions of different units are either “infinitely comparable”
(e.g. areas are always bigger than lengths) or not comparable (e.g. position and momentum).
Consider a two dimensional phase space of a classical system. Points should be comparable and
in fact should be equigranular ≐ so that we can compare sets of finitely many points. Areas are
also comparable to each other, and are comparable to points (i.e. they are infinitely bigger).
However, vertical lines (i.e. ranges in momentum alone) are not comparable to horizontal lines
(i.e. ranges in position alone). Symplectic geometry, in fact, gives a size to areas and not to
lines. Mathematically, this should be clarified when one is trying to define the domain of the
measure D̄u.

Probability
Conceptually, probability is recovered as a measure restricted to a particular subset. The idea
is that you take two statements, such as “the die landed on 2” given that “the die has 6 sides
and it is fair”, and you ask what fraction of the possibilities compatible with the second is
also compatible with the first. This defines the conditional probability.
Let s1, s2 ∈ D̄ be two theoretical statements. Then the probability of s2 given s1 is

P(s2∣s1) = µ_{s1}(s1 ∧ s2) = µu(s1 ∧ s2) / µu(s1)        (2.1)
which quantifies the fraction of possibilities compatible with s1 that are also compatible with
s2 .
If we take the certainty ⊤ as a unit, we have a probability measure for the whole space.
However, since we can take different statements as a unit, we will be able to distinguish
between the following cases:

• P(“n is odd” ∣ “n is picked fairly from all integers”) = 1/2
• P(“n is between 0 and 9” ∣ “n is picked fairly from all integers”) = 0
• P(“n is 3” ∣ “n is picked fairly from all integers”) = 0
• P(“n is 3” ∣ “n is between 0 and 9” ∧ “n is picked fairly from all integers”) = 1/10
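A finite toy version of (2.1) (ours; it uses a counting measure, so it only captures the finite cases in the list above) is:

def prob(s2, s1):
    s1, s2 = set(s1), set(s2)
    return len(s1 & s2) / len(s1)        # μ(s1 ∧ s2) / μ(s1)

n_0_to_9 = range(10)
print(prob(range(1, 100, 2), n_0_to_9))  # P("n is odd" | "n between 0 and 9") = 1/2
print(prob([3], n_0_to_9))               # P("n is 3" | "n between 0 and 9") = 1/10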

Differentiability
We want to construct a notion of differentials and differentiability that is the same for all
spaces, even infinite dimensional ones. When introducing derivatives, this is typically done
by taking limits of differences, and therefore differentiability is the existence of those limits.
In differential topology, this notion is used to define differentiability of manifolds in terms of
differentiability of coordinates, and then differentials are defined as linear functions of vectors.
That is, the differentials defined on the coordinates of a particular chart are technically not
the same objects as the differentials defined on the space.
The idea is to define differentiability on the vector space structure alone. That is, given
two vector spaces V and W , a map f ∶ V → W is differentiable if it becomes linear in the
neighborhood. We would first define a differential as a sequence of vectors {vi}_{i=1}^∞ ∈ V such that there exists a vector t ∈ V and a sequence of non-zero elements {ai}_{i=1}^∞ ∈ R that converges to 0 for which

lim_{i→∞} vi/ai = t.

We call t the tangent vector of the differential and {ai}_{i=1}^∞ its convergence envelope.
Note that, given a sequence vi , these are not unique. We note dv[ai t] the differential with its
tangent vector and convergence envelope. One can show that every differential can be written
as vi = ai ti where ti converges to t.
We can now study how a map f ∶ V → W maps differentials. Given a sequence {vi}_{i=1}^∞ ∈ V, we can define wi = f(vi). If, additionally, we have a differential dv[ai t], we can define the sequence {wi}_{i=1}^∞ = {f(vi + ai ti) − f(vi)}_{i=1}^∞. Now, the observation here is that if the map is linear, the sequence {wi}_{i=1}^∞ will be a differential with tangent vector f(t) and convergence envelope ai. But any map that is locally linear will have the same property, given that differentials are local objects. Therefore we say f is differentiable at v0 ∈ V if there exists a map dv f∣_{v0} ∶ V → W such that {wi}_{i=1}^∞ = dw[ai dv f∣_{v0}(t)].
From a preliminary study, this would work on any vector space, regardless of dimension
or field (i.e. real, complex, rational, ...).
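As a numeric rendering of the definition (our sketch, for the map f(x, y) = (x², xy) on V = R²), the sequence wi = f(v0 + ai t) − f(v0) divided by the envelope ai converges to the tangent df∣_{v0}(t):

import numpy as np

def f(v):
    x, y = v
    return np.array([x**2, x * y])

v0 = np.array([1.0, 2.0])
t = np.array([0.5, -1.0])

for a in [1e-1, 1e-2, 1e-3]:             # a_i -> 0: a convergence envelope
    w = f(v0 + a * t) - f(v0)
    print(w / a)                          # -> df|_{v0}(t) = (1.0, 0.0)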

Differential geometry/geometric measure theory


In the reverse physics chapter about classical mechanics we have seen that forms can be
understood as modeling additive functionals of subregions. We need to connect those ideas to
the rest of the formal framework.
Conceptually, we want to assign quantities to regions instead of points. If we assume these
quantities are additive, the idea is that we can decompose them into the sum of infinitesimal
contributions at each point. Therefore the differential objects exist as the limit of infinitesimal
decomposition. This, again, reflects the overall spirit of the project that compels us to start
from physically well defined entities (in this case the quantities associated with finite regions)
and derive the theoretical ones (in this case the infinitesimal contributions that are integrated).
Let D̄X be a theoretical domain and U ∈ ΣX a theoretical set. This represents the region associated to our measurement. Let D̄Y be a theoretical domain and R ∈ ΣY a theoretical set.
This represents the possible values found. Our starting point consists of statements like:

• “the amount of mass inside volume U is within range R”
• “the force applied to surface U is within range R”
• “the energy used to move the object along the line U is within range R”

These are finite precision statements of a quantity associated to a region of finite size.
The first step is to group statements within the same region U into subdomains D̄_{U→Y}. We can then show how the possibilities for each D̄_{U→Y} reduce to statements like:

• “the amount of mass inside volume U is precisely y”
• “the force applied to surface U is precisely y”
• “the energy used to move the object along the line U is precisely y”

These are infinite precision statements of a quantity associated to a region of finite size. We
define S ⊆ ΣX as the type of region (i.e. volumes vs surfaces vs lines) upon which the functional
is defined and therefore we have a functional f ∶ S → Y which tells us the exact value of the
quantity in each region.
Then we study the case where f is a real linear k-functional, meaning:

• the possibilities X are identified by a set of real values; that is, X with the natural topology is a manifold
• the domain is all k-dimensional surfaces S^k; that is, the submanifolds of dimension k
• the co-domain is the reals; so we have f ∶ S^k → R
• the functional is additive over disjoint sets; that is, F(U1 ∪ U2) = F(U1) + F(U2) if U1 ∩ U2 = ∅
• the functional commutes with the limit; that is, lim_{i→∞} F(Ui) = F(lim_{i→∞} Ui)

Under these conditions (and possibly others) one can express the functional as a sum of
infinitesimal contributions. That is, f (U ) = ∫U ω(dU ), where ω represents a suitable k-form.
Note that there is not a unique way to perform this decomposition. For example, if f (U )
is the total mass in the volume, ω(dU ) is the density in the infinitesimal volume. If we change
the density at a single point, the integral does not change and only the integral is physical.
These are the types of issues that still need to be solved.
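The one-dimensional case already illustrates the intended decomposition (our toy example, with a hypothetical density ρ): an additive interval functional can be realized as the integral of a density, which reproduces its additivity over disjoint regions:

from scipy.integrate import quad

rho = lambda x: 1.0 + x**2               # the "1-form" here is ρ(x) dx
f = lambda a, b: quad(rho, a, b)[0]      # f([a, b]) = ∫ₐᵇ ρ(x) dx

print(f(0, 2), f(0, 1) + f(1, 2))        # equal: additive over disjoint pieces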
Stokes’ theorem and exterior derivatives. One interesting application of this view-
point is that we can understand things like Stokes’ theorem, exterior derivative and the dif-
ference between closed and exact forms directly on the finite functionals.
Let ∂ ∶ S^k → S^{k−1} be the boundary operator that, given a surface σ^k, returns the boundary ∂σ^k which is of dimension k − 1. We have ∂∂σ = ∅ for any surface of any dimensionality.
Let Fk be the space of linear k-functionals. We can define the boundary functional operator
∂ ∶ Fk → Fk+1 such that ∂f (σ) = f (∂σ). That is, given a functional that acts on k-surfaces we
can always construct one that acts on k +1-surfaces by taking the boundary of the k +1-surface
and giving it to the first functional. Note that ∂∂f (σ) = ∂f (∂σ) = f (∂∂σ) = f (∅) = 0, so the
boundary functional of the boundary functional is the null functional, the one that returns
zero for every k-surface. What we should be able to prove is that if ω is the k-form associated
with f , dω is the k + 1-form associated with ∂f . In other words, Stokes’ theorem essentially
becomes a definition of the boundary functional and the calculation of the expression for dω.
We say a surface is contractible if it can be reduced to a point with a continuous transfor-
mation. A functional is closed if it is zero for all closed contractible surfaces. It is exact if it
is zero for all closed surfaces. All boundary functionals are exact since ∂f (∂σ) = f (∂∂σ) = 0.
The form associated to a closed functional will be closed while the form associated to an exact
functional will be exact.
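A concrete instance (ours, essentially Green's theorem on the unit square) shows the boundary functional at work: for the 1-functional f(γ) = ∫_γ −y dx + x dy we have dω = 2 dx ∧ dy, so ∂f on the unit square should return 2:

import numpy as np

ts = np.linspace(0.0, 1.0, 20001)

def f_segment(p0, p1):
    # ∫ -y dx + x dy along the straight segment p0 -> p1
    x = p0[0] + (p1[0] - p0[0]) * ts
    y = p0[1] + (p1[1] - p0[1]) * ts
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    return np.trapz(-y * dx + x * dy, ts)

square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(sum(f_segment(square[i], square[i + 1]) for i in range(4)))   # ≈ 2.0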

2.3 States and processes


The general goal of this part is to give general definitions of states and processes that are
always valid and are captured by a fundamental mathematical framework. Different theories
would then specialize these basic definitions for different circumstances.

Processes
A process is an experimental domain P that contains all the possible statements of the systems
under study for all possible times. We call evolutions the possibilities E of the domain, as
they represent the complete description of all systems at all times.
We define a time parameter t ∈ T ⊆ R. We group all statements relative to a system of
interest at a particular time into a time domain Dt . We call snapshots the possibilities Xt of
each time domain. A possible trajectory is a sequence {xt }t∈T such that xt ∈ Xt for all t ∈ T
and e ≼ ⋀ xt for some e ∈ E. That is, there is an evolution for which the system will be
t∈T
described by that sequence of snapshots.

A process is deterministic if for all possible trajectories xt0 ≼ xt1 for all t0 ≤ t1 . A process is
reversible if for all possible trajectories xt1 ≼ xt0 for all t0 ≤ t1 . Recall that narrowness between
the possibilities of two domains means there is an experimental relationship. Therefore, if the
process is deterministic, we can write a causal relationship f ∶ Xt0 → Xt1 such that x0 ≼ f (x0 ).
Once we derive a measure µu ∶ P̄u → R, we can define the evolution entropy as log µu. As
the measure is multiplicative for independent systems, the evolution entropy will be additive
making it an extensive property. The evolution entropy of a system at a time is defined to be
the evolution entropy log µu (xt ) of the snapshot at that time. Under a deterministic process,
the evolution entropy can never decrease: log µu (xt0 ) ≤ log µu (xt1 ) since xt0 ≼ xt1 for all t0 ≤ t1
and therefore µu (xt0 ) ≤ µu (xt1 ). If the process is also reversible, then log µu (xt0 ) = log µu (xt1 ).
These definitions give a very general setting to describe a process and already find a
quantity that cannot decrease during deterministic evolution.

States
Conceptually, states represent description of the system, and only of the system, regardless
of time. Therefore the state space is not a set of statements, but a “template” for a set of
statements that can be “instantiated” at different times.
The idea is that a state space S comes equipped with a function ι ∶ S × T → P̄ such that ι(S, t) = D̄t. That is, it maps the state space and its statements to the particular time domain
that represents the system at that particular time. Specifically, states of the system will be
mapped to snapshots of the system.
The structure of the state space will not be, in general, isomorphic to each particular time
domain. In a particular process at a particular time some states may not be accessible, so some
states will be mapped to an impossibility. Or there may be correlations with other systems, so
the snapshot will provide more information (will be narrower) than the states themselves.
The relationships defined on the state space will be equivalent to the ones in the time
domain if and only if the time domain of the system is independent from the time domain of
the other systems. In other words: the state space represents the system and its properties
when the system is independent. This also means that, to be able to define a system, we need
to have a process that renders it independent from other systems.
When the system is independent from all others, the description is coarser than in the case
of when there are correlations. Note that to a coarser description is associated a higher process
entropy. Processes that render the system independent are exactly the ones that maximize the
process entropy. We can associate a state entropy to each state, which is the process entropy
associated to that description when the system is independent.
While it is still not clear what can be derived and what must be imposed, the overall
goal is to understand what assumptions are needed to construct state spaces. One result
should be that processes that isolate the system are implicitly needed, which forms the basis
of requiring entropy maximization. All states are therefore equilibria of those processes (i.e.
symmetries of the group of processes). Conceptually, this maps well with all branches of
physics as all state spaces come equipped with some structure which, in the end, is connected
to entropy/probability/measure.

2.4 Open questions and possible extensions


Here we note some thoughts and ideas about open problems and possible extensions to the
general theory.

Homogeneity of an experimental domain


It may be interesting to characterize some notion of homogeneity that makes all possibilities
in a domain “equally verifiable”, that no possibility is “special” compared to the others in
terms of experimental verifiability. For example:

• the “extra-terrestrial life” domain is not homogeneous because one possibility can be verified while the other cannot
• the integers and reals are the only linearly ordered quantities where all contingent statements are the same experimentally: all decidable and none decidable
• phase transitions are special, as knowing whether a system is in a mixed state is decidable, so a domain with phase transitions is not homogeneous

It is not clear how this notion should be implemented and how exactly it would be useful.
It may give a reason to expect a complete domain (the residual possibility is the only one
that is not compatible with any contingent verifiable statement, so the domain would not
be homogeneous) and also that all possibilities are approximately verifiable (if one is able to
prove that, in any domain, at least one possibility is approximately verifiable).

Predictive relationships
Another way to characterize relationships between domains could be in terms of predictions,
what statements of one domain can tell about the other. That is, we give a theoretical state-
ment on one and look for the best prediction (i.e. narrowest theoretical statement broader
than the original) for the other.
For example, if a domain is independent from another, any theoretical statement should
predict the certainty on the other. If a domain is dependent on another, any theoretical
statement should predict an equivalent statement.
A possible approach. Let DX and DY be two experimental domains and D̄X and D̄Y their respective theoretical domains. Now we construct the function π ∶ D̄X → D̄Y such that given sX ∈ D̄X and sY ∈ D̄Y such that sX ≼ sY we always have sX ≼ π(sX) ≼ sY. In other words, it should map to the narrowest broader statement in D̄Y.
In principle, we can even extend π ∶ S → D̄Y to be defined on the whole context. In that case, π can be proven to be a projection. This map should be able to characterize the relationship between domains. For example, if π(D̄X) = {⊤, ⊥} then the domains should be independent. If π(D̄X) = D̄Y the domains should be dependent.
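A finite toy model of π (ours; the domain D̄_Y below is hypothetical and chosen closed under intersection so the narrowest broader statement exists) looks like this:

omega = {1, 2, 3, 4}
DY = [set(), {1, 2}, {3, 4}, omega]       # hypothetical theoretical domain of Y

def pi(s):
    # narrowest statement of D̄_Y broader than s
    return min((y for y in DY if s <= y), key=len)

print(pi({1}))        # {1, 2}: knowing s_X = {1} predicts {1, 2} about Y
print(pi({1, 3}))     # {1, 2, 3, 4} = ⊤: this s_X predicts nothing about Y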

Defining structures on experimental domains


Some mathematical structures are defined on points (e.g. vector spaces, orderings) and others on
their σ-algebras. In our context, verifiable statements are the only elements that are actually
physical, therefore it would be nice to always define the structures on the experimental domain
(i.e. on the topology) and show that each induces a unique structure on the theoretical
statements and possibilities (and vice-versa).

We have already implemented this approach in a couple of areas. Theoretical domains
are constructed from experimental domains (see 1.36) and so are the possibilities (see 1.47).
Theorem 2.10 shows that a causal relationship on the possibilities is equivalent to an inference
relationship on the verifiable domain. Theorem 3.16 shows that the ordering of the possibilities
is equivalent to the ordering of the basis according to narrowness.
We need to understand how this can be achieved for other structures, such as measures,
metrics, groups, vector spaces, inner products, ...
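As a small illustration of the first step (a finite-set sketch of our own, not a construction from the book), one can generate the statements induced by a handful of verifiable statements by closing under union and intersection, mirroring how a topology is generated from a subbasis; the real construction would need arbitrary, not just finite, unions.

```python
# Finite sketch: given a set of possibilities and some verifiable
# statements (as sets), close under union and intersection to obtain
# the induced theoretical statements.
from itertools import combinations

def generate(possibilities, verifiable):
    opens = {frozenset(), frozenset(possibilities)} | set(verifiable)
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(opens), 2):
            for c in (a & b, a | b):
                if c not in opens:
                    opens.add(c)
                    changed = True
    return opens

X = {1, 2, 3}
verifiable = {frozenset({1}), frozenset({1, 2})}
for s in sorted(generate(X, verifiable), key=len):
    print(set(s))  # set(), {1}, {1, 2}, {1, 2, 3}
```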

Space of possible combined domains


It should be possible to better characterize the space of all possible combined domains. As
we show in 2.34, the space of possible experimental relationships is the space of topologically
continuous functions; there should be an analogue for the space of all possible combined
domains. For example, one should be able to show that the combined domain is an immersion
within the product topology. Is that the only constraint? How can it be characterized? Can
we create an experimental domain to distinguish the possible combined domains?

Limited precision
One area we could explore for new physical ideas is what happens if we assume that the
precision cannot be arbitrarily decreased. How is this different from the continuous case? Here
are some preliminary ideas.
The limited precision case cannot simply lead to a discrete topology: the standard topology
of the reals cannot be the limit of the integer topology, since the former is not discrete. Most
likely, the limited precision case will need to have uncountably many possibilities so that the
limit to arbitrary precision can work well.
The main cause of confusion is that, in the continuous case, whether the precisions of two
statements overlap determines whether the statements are compatible. For example, “the po-
sition is between 0 and 1 meters” and “the position is between 2 and 3 meters” are both
incompatible and non-overlapping. This cannot be the case for limited precision. The possi-
bilities themselves must be incompatible with each other, but some of them must overlap, or
we would simply have a discrete topology. That is, suppose that 1 unit is the precision limit:
the statements “the position is between 0 and 1” and “the position is between 0.5 and 1.5”
are incompatible because if we verify one we cannot verify the other. If we could verify both,
we would be measuring at a smaller precision. So, overlap cannot be defined in terms of
incompatibility.
Whether two statements overlap cannot be determined through incompatibility, but must
be recovered from the precision of the disjunction. Suppose we have the following arbitrary-
precision statements:

s_1 = “the position is between 0 and 1 meters”
s_2 = “the position is between 0.5 and 1.5 meters”
s_3 = “the position is between 2 and 3 meters”

The precision associated to each statement is one meter. The precision for s_1 ∨ s_2 will be
one and a half meters while the precision for s_1 ∨ s_3 will be two meters. That is: the precision
of non-overlapping statements sums. It may even be the case that if the precisions sum, the
statements must be incompatible, while the converse is what fails.
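A minimal sketch of this bookkeeping, modeling each statement as an interval of positions and the precision of a disjunction as the measure (total length) of the union of the intervals; the representation and function name are illustrative:

```python
# Sketch: the precision of a disjunction is the total length covered
# by the union of the corresponding intervals.

def union_length(intervals):
    """Total length covered by the union of closed intervals (lo, hi)."""
    total, covered_up_to = 0.0, float("-inf")
    for lo, hi in sorted(intervals):
        if lo > covered_up_to:        # disjoint piece: add its full length
            total += hi - lo
        elif hi > covered_up_to:      # overlapping piece: add only the excess
            total += hi - covered_up_to
        if hi > covered_up_to:
            covered_up_to = hi
    return total

s1, s2, s3 = (0.0, 1.0), (0.5, 1.5), (2.0, 3.0)

print(union_length([s1]))      # 1.0: each statement has precision one meter
print(union_length([s1, s2]))  # 1.5: overlapping, precisions do not sum
print(union_length([s1, s3]))  # 2.0: non-overlapping, precisions sum
```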

This means that the measure we put on the possibilities can no longer represent the precision.
That is, dµ ≠ dx. We can imagine a relationship like dx² = dµ² + 1. This would both make the
precision go to 1 when the measure goes to 0 and give dx ≃ dµ for large dµ.
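Spelling out the two limits of this guessed relationship (the specific functional form is, of course, only one possibility):

```latex
% dx^2 = d\mu^2 + 1 implies dx = \sqrt{d\mu^2 + 1}, so
% dx -> 1 as d\mu -> 0 (the precision saturates at the limit), and
% dx = d\mu \sqrt{1 + 1/d\mu^2} ~ d\mu for d\mu >> 1 (continuum recovered).
\[
  dx = \sqrt{d\mu^2 + 1}, \qquad
  \lim_{d\mu \to 0} dx = 1, \qquad
  dx \simeq d\mu \ \text{for } d\mu \gg 1.
\]
```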
Part IV

Appendix

Appendix A

Reference sheets for math and physics

A.1 Set theory

Symbol                    Name                Meaning
A = {1, 2, 3}             set                 a collection of elements
N = {0, 1, 2, ...}        natural numbers     the set of numbers one uses to count
Z = {..., −1, 0, 1, ...}  integers            the set of all whole numbers
Q                         rationals           the set of all fractions
R                         reals               the set of numbers with infinite precision
C                         complex             the set of numbers that represent a two-dimensional vector or rotation
a ∈ A                     in                  whether the element a is contained in A
A ⊆ B                     subset              a set that only contains elements of the other set
A ⊂ B                     proper subset       a set that only contains elements of the other set but not all of them; it is a subset but is not the same set
A ⊇ B                     superset            a set that contains all elements of the other set
A ⊃ B                     proper superset     a set that contains all elements of the other set but not just them; it is a superset but is not the same set
A ∪ B                     union               the set of all elements contained in either set
A ∩ B                     intersection        the set of all elements contained in both sets
A ∖ B                     subtraction         the set of elements in A that are not in B
A^C                       complement          the set of all elements that are not in A; it is equal to U ∖ A, where U is the set of all elements, which depends on context
A × B                     Cartesian product   the set of all ordered pairs (a, b) with a ∈ A and b ∈ B
2^A                       power set           the set of all possible subsets of A

Symbol                    Name                Meaning
f : A → B                 function            a map that for every element of A returns an element of B
                          injective function  a function that maps distinct elements of A to distinct elements of B
B^A                                           the set of all possible functions f : A → B
C(A, B)                                       the set of all continuous functions f : A → B
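Most of these operations map directly onto Python's built-in set type, which makes for a quick sanity check of the definitions; the sets A, B and the universe U below are arbitrary examples.

```python
# Sanity check of the table entries using Python's built-in sets; the
# universe U is an arbitrary choice, since it depends on context.
from itertools import chain, combinations, product

A, B = {1, 2, 3}, {3, 4}
U = {1, 2, 3, 4, 5}

print(A | B)   # union: {1, 2, 3, 4}
print(A & B)   # intersection: {3}
print(A - B)   # subtraction: {1, 2}
print(U - A)   # complement of A within U: {4, 5}
print(set(product(A, B)))  # Cartesian product: all ordered pairs (a, b)

# Power set 2^A: subsets of every size, from 0 up to len(A).
power_set = list(chain.from_iterable(
    combinations(sorted(A), r) for r in range(len(A) + 1)))
print(power_set)  # [(), (1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]
```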
Credits

Created by: Gabriele Carcassi

Written by: Gabriele Carcassi and Christine A. Aidala

Subject-matter advisors (Math), whose review prompted significant technical changes: Mark Greenfield (Ch. II.1, II.2, II.3)

Additional subject-matter advisors (Phil), whose review prompted significant technical improvements: Josh Hunt (Ch. II.1)

Diagrams and figures, contributed one or more: Matteo Carcassi (Ch. I.1), Saja Gherri (Ch. II.1, II.2)

Test readers, reviewed a full chapter or more: Chami Amarasinghe, Andre Antoine, Saja Gherri, Uriah Israel, Micah Johnson, Sean Kelly, Dan McCusker, Everardo Olide

Additional test readers, whose review prompted corrections and clarifications: Josce Kooistra, Armin Nikkhah Shirazi, Ayla Rodriguez, Alex Takla, Tobias Thrien, Allan Vanzandt