
ARCoSS
LNCS 12075

Peter Müller (Ed.)

Programming
Languages
and Systems

29th European Symposium on Programming, ESOP 2020
Held as Part of the European Joint Conferences
on Theory and Practice of Software, ETAPS 2020
Dublin, Ireland, April 25–30, 2020, Proceedings
Lecture Notes in Computer Science 12075
Founding Editors
Gerhard Goos, Germany
Juris Hartmanis, USA

Editorial Board Members


Elisa Bertino, USA
Wen Gao, China
Bernhard Steffen, Germany
Gerhard Woeginger, Germany
Moti Yung, USA

Advanced Research in Computing and Software Science


Subline of Lecture Notes in Computer Science

Subline Series Editors


Giorgio Ausiello, University of Rome ‘La Sapienza’, Italy
Vladimiro Sassone, University of Southampton, UK

Subline Advisory Board


Susanne Albers, TU Munich, Germany
Benjamin C. Pierce, University of Pennsylvania, USA
Bernhard Steffen, University of Dortmund, Germany
Deng Xiaotie, Peking University, Beijing, China
Jeannette M. Wing, Microsoft Research, Redmond, WA, USA
More information about this series at http://www.springer.com/series/7407
Peter Müller (Ed.)

Programming
Languages
and Systems
29th European Symposium on Programming, ESOP 2020
Held as Part of the European Joint Conferences
on Theory and Practice of Software, ETAPS 2020
Dublin, Ireland, April 25–30, 2020
Proceedings
Editor
Peter Müller
ETH Zurich
Zurich, Switzerland

ISSN 0302-9743 ISSN 1611-3349 (electronic)


Lecture Notes in Computer Science
ISBN 978-3-030-44913-1 ISBN 978-3-030-44914-8 (eBook)
https://doi.org/10.1007/978-3-030-44914-8

LNCS Sublibrary: SL1 – Theoretical Computer Science and General Issues

© The Editor(s) (if applicable) and The Author(s) 2020. This book is an open access publication.
Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International
License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution
and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and
the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this book are included in the book’s Creative Commons license,
unless indicated otherwise in a credit line to the material. If material is not included in the book’s Creative
Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use,
you will need to obtain permission directly from the copyright holder.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, expressed or implied, with respect to the material contained herein or for any errors or
omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
ETAPS Foreword

Welcome to the 23rd ETAPS! This was the first time that ETAPS took place in Ireland,
in its beautiful capital Dublin.
ETAPS 2020 was the 23rd instance of the European Joint Conferences on Theory
and Practice of Software. ETAPS is an annual federated conference established in
1998, and consists of four conferences: ESOP, FASE, FoSSaCS, and TACAS. Each
conference has its own Program Committee (PC) and its own Steering Committee
(SC). The conferences cover various aspects of software systems, ranging from
theoretical computer science to foundations of programming language developments,
analysis tools, and formal approaches to software engineering. Organizing these
conferences in a coherent, highly synchronized conference program enables researchers
to participate in an exciting event, having the possibility to meet many colleagues
working in different directions in the field, and to easily attend talks of different
conferences. On the weekend before the main conference, numerous satellite
workshops took place that attracted many researchers from all over the globe. Also, for
the second time, an ETAPS Mentoring Workshop was organized. This workshop is
intended to help students early in the program with advice on research, career, and life
in the fields of computing that are covered by the ETAPS conference.
ETAPS 2020 received 424 submissions in total, 129 of which were accepted,
yielding an overall acceptance rate of 30.4%. I thank all the authors for their interest in
ETAPS, all the reviewers for their reviewing efforts, the PC members for their
contributions, and in particular the PC (co-)chairs for their hard work in running this
entire intensive process. Last but not least, my congratulations to all authors of the
accepted papers!
ETAPS 2020 featured the unifying invited speakers Scott Smolka (Stony Brook
University) and Jane Hillston (University of Edinburgh) and the conference-specific
invited speakers (ESOP) Işıl Dillig (University of Texas at Austin) and (FASE) Willem
Visser (Stellenbosch University). Invited tutorials were provided by Erika Ábrahám
(RWTH Aachen University) on the analysis of hybrid systems and Madhusudan
Parthasarathy (University of Illinois at Urbana-Champaign) on combining Machine
Learning and Formal Methods. On behalf of the ETAPS 2020 attendants, I thank all the
speakers for their inspiring and interesting talks!
ETAPS 2020 took place in Dublin, Ireland, and was organized by the University of
Limerick and Lero. ETAPS 2020 is further supported by the following associations and
societies: ETAPS e.V., EATCS (European Association for Theoretical Computer
Science), EAPLS (European Association for Programming Languages and Systems),
and EASST (European Association of Software Science and Technology). The local
organization team consisted of Tiziana Margaria (general chair, UL and Lero),
Vasileios Koutavas (Lero@UCD), Anila Mjeda (Lero@UL), Anthony Ventresque
(Lero@UCD), and Petros Stratis (Easy Conferences).

The ETAPS Steering Committee (SC) consists of an Executive Board, and


representatives of the individual ETAPS conferences, as well as representatives of
EATCS, EAPLS, and EASST. The Executive Board consists of Holger Hermanns
(Saarbrücken), Marieke Huisman (chair, Twente), Joost-Pieter Katoen (Aachen and
Twente), Jan Kofron (Prague), Gerald Lüttgen (Bamberg), Tarmo Uustalu (Reykjavik
and Tallinn), Caterina Urban (Inria, Paris), and Lenore Zuck (Chicago).
Other members of the SC are: Armin Biere (Linz), Jordi Cabot (Barcelona), Jean
Goubault-Larrecq (Cachan), Jan-Friso Groote (Eindhoven), Esther Guerra (Madrid),
Jurriaan Hage (Utrecht), Reiko Heckel (Leicester), Panagiotis Katsaros (Thessaloniki),
Stefan Kiefer (Oxford), Barbara König (Duisburg), Fabrice Kordon (Paris), Jan
Kretinsky (Munich), Kim G. Larsen (Aalborg), Tiziana Margaria (Limerick), Peter
Müller (Zurich), Catuscia Palamidessi (Palaiseau), Dave Parker (Birmingham),
Andrew M. Pitts (Cambridge), Peter Ryan (Luxembourg), Don Sannella (Edinburgh),
Bernhard Steffen (Dortmund), Mariëlle Stoelinga (Twente), Gabriele Taentzer
(Marburg), Christine Tasson (Paris), Peter Thiemann (Freiburg), Jan Vitek (Prague),
Heike Wehrheim (Paderborn), Anton Wijs (Eindhoven), and Nobuko Yoshida
(London).
I would like to take this opportunity to thank all speakers, attendants, organizers
of the satellite workshops, and Springer for their support. I hope you all enjoyed
ETAPS 2020. Finally, a big thanks to Tiziana and her local organization team for all
their enormous efforts enabling a fantastic ETAPS in Dublin!

February 2020 Marieke Huisman


ETAPS SC Chair
ETAPS e.V. President
Preface

Welcome to the European Symposium on Programming (ESOP 2020)! The 29th


edition of this conference series was initially planned to be held April 27–30, 2020, in
Dublin, Ireland, but was then moved to fall 2020 due to the COVID-19 outbreak.
ESOP is one of the European Joint Conferences on Theory and Practice of Software
(ETAPS). It is devoted to fundamental issues in the specification, design, analysis, and
implementation of programming languages and systems.
This volume contains 27 papers, which the Program Committee (PC) selected
among 87 submissions. Each submission received between three and six reviews. After
an author response period, the papers were discussed electronically among the PC
members and external reviewers. The one paper for which the PC chair had a conflict of
interest was kindly handled by Sasa Misailovic.
Submissions authored by a PC member were held to slightly higher standards: they
received at least four reviews, had an external reviewer, and were accepted only if they
were not involved in comparisons of relative merit with other submissions. We
accepted two out of four PC submissions.
The final program includes a keynote by Işıl Dillig on “Formal Methods for
Evolving Database Applications.”
Any conference depends first and foremost on the quality of its submissions. I would
like to thank all the authors who submitted their work to ESOP 2020! I am truly
impressed by the members of the PC. They produced insightful and constructive
reviews, contributed very actively to the online discussions, and were extremely
helpful. It was an honor to work with all of you! I am also grateful to the external
reviewers, who provided their expert opinions and helped tremendously to reach
well-informed decisions. I would like to thank everybody who contributed to the
organization of ESOP 2020, especially the ESOP 2020 Steering Committee and its
chair Peter Thiemann as well as the ETAPS 2020 Steering Committee and its chair
Marieke Huisman, who provided help and guidance on numerous occasions. Finally,
I’d like to thank Linard Arquint and Vasileios Koutavas for their help with the
proceedings.

February 2020 Peter Müller


Organization

Program Committee
Elvira Albert Universidad Complutense de Madrid, Spain
Sophia Drossopoulou Imperial College London, UK
Jean-Christophe Filliatre LRI, CNRS, France
Arie Gurfinkel University of Waterloo, Canada
Jan Hoffmann Carnegie Mellon University, USA
Ranjit Jhala University of California at San Diego, USA
Woosuk Lee Hanyang University, South Korea
Rustan Leino Amazon Web Services, USA
Rupak Majumdar MPI-SWS, Germany
Roland Meyer Technische Universität Braunschweig, Germany
Antoine Miné LIP6, Sorbonne Université, France
Sasa Misailovic University of Illinois at Urbana-Champaign, USA
Toby Murray University of Melbourne, Australia
Peter Müller ETH Zurich, Switzerland
David Naumann Stevens Institute of Technology, USA
Zvonimir Rakamaric University of Utah, USA
Francesco Ranzato University of Padova, Italy
Sukyoung Ryu KAIST, South Korea
Ilya Sergey Yale-NUS College and National University
of Singapore, Singapore
Alexandra Silva University College London, UK
Nikhil Swamy Microsoft Research, USA
Sam Tobin-Hochstadt Indiana University Bloomington, USA
Caterina Urban Inria Paris, France
Viktor Vafeiadis MPI-SWS, Germany

Additional Reviewers

Amtoft, Torben
Arenas, Puri
Balabonski, Thibaut
Bernardy, Jean-Philippe
Bierman, Gavin
Blanchet, Bruno
Bonchi, Filippo
Bonelli, Eduardo
Botbol, Vincent
Bourke, Timothy
Brady, Edwin
Brunet, Paul
Caires, Luís
Charguéraud, Arthur
Chini, Peter
Chudnov, Andrey
Correas Fernández, Jesús
Costea, Andreea
Cousot, Patrick
Crole, Roy
Cusumano-Towner, Marco
Dagand, Pierre-Evariste
Dahlqvist, Fredrik
Dang, Hai
Danielsson, Nils Anders
Das, Ankush
Enea, Constantin
Finkbeiner, Bernd
Fromherz, Aymeric
Fuhs, Carsten
Genaim, Samir
Genitrini, Antoine
Ghica, Dan
Gordillo, Pablo
Gordon, Colin S.
Haas, Thomas
Hage, Jurriaan
He, Shaobo
Heljanko, Keijo
Jourdan, Jacques-Henri
Kahn, David
Kang, Jeehoon
Kuderski, Jakub
Lahav, Ori
Laurent, Olivier
Lee, Dongkwon
Lee, Wonyeol
Lesani, Mohsen
Levy, Paul Blain
Lindley, Sam
Martin-Martin, Enrique
Mohan, Anshuman
Mordido, Andreia
Morris, J. Garrett
Muller, Stefan
Ngo, Minh
Oh, Hakjoo
Ouadjaout, Abdelraouf
Ouederni, Meriem
Palamidessi, Catuscia
Pearlmutter, Barak
Peters, Kirstin
Pham, Long
Poli, Federico
Polikarpova, Nadia
Pottier, François
Rival, Xavier
Román-Díez, Guillermo
Sammartino, Matteo
Sasse, Ralf
Scalas, Alceste
Scherer, Gabriel
Sieczkowski, Filip
Sivaramakrishnan, Kc
Staton, Sam
Stutsman, Ryan
Tan, Yong Kiam
van den Brand, Mark
Vákár, Matthijs
Wang, Di
Wang, Meng
Wehrheim, Heike
Weng, Shu-Chun
Wies, Thomas
Wijesekera, Duminda
Wolff, Sebastian
Zufferey, Damien
Formal Methods for Evolving
Database Applications
(Abstract of Keynote Talk)

Işıl Dillig

University of Texas at Austin, USA



Many database applications undergo significant schema changes during their life cycle
due to performance or maintainability reasons. Examples of such schema changes
include denormalization, splitting a single table into multiple tables, and consolidating
multiple tables into a single table. Even though such schema refactorings are quite
common in practice, programmers need to spend significant time and effort to
re-implement parts of the code base that are affected by the schema change. Further-
more, it is not uncommon to introduce bugs during this code transformation process.
In this talk, I will present our recent work on using formal methods to simplify the
schema refactoring process for evolving database applications. Specifically, I will first
propose a definition of equivalence between database applications that operate over
different schemas. Building on this definition, I will then present a fully automated
technique for proving equivalence between a pair of applications. Our verification
technique is capable of automatically synthesizing bisimulation invariants between two
database applications and uses the inferred bisimulation invariant to automatically
prove equivalence.
In the next part of the talk, I will explain how to leverage this verification technique
to completely automate the code migration process. Specifically, given an original
database application P over schema S and a new schema S′, I will discuss a practical
program synthesis technique that can be used to generate a new program P′ over
schema S′ such that P and P′ are provably equivalent. In particular, I will first present a
method for generating a program sketch of the new version; then, I will describe a
novel synthesis algorithm that efficiently explores the space of all programs that are in
the search space of the generated sketch.
Finally, I will describe experimental results on a suite of schema refactoring
benchmarks, including real-world database applications written in Ruby-on-Rails.
I will also outline remaining challenges in this area and motivate future research
directions relevant to research in programming languages and formal methods.
Contents

Trace-Relating Compiler Correctness and Secure Compilation  1
  Carmine Abate, Roberto Blanco, Ștefan Ciobâcă, Adrien Durier,
  Deepak Garg, Cătălin Hrițcu, Marco Patrignani, Éric Tanter,
  and Jérémy Thibault

Runners in Action  29
  Danel Ahman and Andrej Bauer

On the Versatility of Open Logical Relations: Continuity,
Automatic Differentiation, and a Containment Theorem  56
  Gilles Barthe, Raphaëlle Crubillé, Ugo Dal Lago,
  and Francesco Gavazzo

Constructive Game Logic  84
  Brandon Bohrer and André Platzer

Optimal and Perfectly Parallel Algorithms for On-demand
Data-Flow Analysis  112
  Krishnendu Chatterjee, Amir Kafshdar Goharshady,
  Rasmus Ibsen-Jensen, and Andreas Pavlogiannis

Concise Read-Only Specifications for Better Synthesis of Programs
with Pointers  141
  Andreea Costea, Amy Zhu, Nadia Polikarpova, and Ilya Sergey

Soundness Conditions for Big-Step Semantics  169
  Francesco Dagnino, Viviana Bono, Elena Zucca,
  and Mariangiola Dezani-Ciancaglini

Liberate Abstract Garbage Collection from the Stack by Decomposing
the Heap  197
  Kimball Germane and Michael D. Adams

SMT-Friendly Formalization of the Solidity Memory Model  224
  Ákos Hajdu and Dejan Jovanović

Exploring Type-Level Bisimilarity towards More Expressive Multiparty
Session Types  251
  Sung-Shik Jongmans and Nobuko Yoshida

Verifying Visibility-Based Weak Consistency  280
  Siddharth Krishna, Michael Emmi, Constantin Enea,
  and Dejan Jovanović

Local Reasoning for Global Graph Properties  308
  Siddharth Krishna, Alexander J. Summers, and Thomas Wies

Aneris: A Mechanised Logic for Modular Reasoning
about Distributed Systems  336
  Morten Krogh-Jespersen, Amin Timany, Marit Edna Ohlenbusch,
  Simon Oddershede Gregersen, and Lars Birkedal

Continualization of Probabilistic Programs With Correction  366
  Jacob Laurel and Sasa Misailovic

Semantic Foundations for Deterministic Dataflow and Stream Processing  394
  Konstantinos Mamouras

Connecting Higher-Order Separation Logic to a First-Order
Outside World  428
  William Mansky, Wolf Honoré, and Andrew W. Appel

Modular Inference of Linear Types for Multiplicity-Annotated Arrows  456
  Kazutaka Matsuda

RustHorn: CHC-Based Verification for Rust Programs  484
  Yusuke Matsushita, Takeshi Tsukada, and Naoki Kobayashi

A First-Order Logic with Frames  515
  Adithya Murali, Lucas Peña, Christof Löding, and P. Madhusudan

Proving the Safety of Highly-Available Distributed Objects  544
  Sreeja S. Nair, Gustavo Petri, and Marc Shapiro

Solving Program Sketches with Large Integer Values  572
  Rong Pan, Qinheping Hu, Rishabh Singh, and Loris D’Antoni

Modular Relaxed Dependencies in Weak Memory Concurrency  599
  Marco Paviotti, Simon Cooksey, Anouk Paradis, Daniel Wright,
  Scott Owens, and Mark Batty

ARMv8-A System Semantics: Instruction Fetch in Relaxed Architectures  626
  Ben Simner, Shaked Flur, Christopher Pulte, Alasdair Armstrong,
  Jean Pichon-Pharabod, Luc Maranget, and Peter Sewell

Higher-Ranked Annotation Polymorphic Dependency Analysis  656
  Fabian Thorand and Jurriaan Hage

ConSORT: Context- and Flow-Sensitive Ownership Refinement Types
for Imperative Programs  684
  John Toman, Ren Siqi, Kohei Suenaga, Atsushi Igarashi,
  and Naoki Kobayashi

Mixed Sessions  715
  Vasco T. Vasconcelos, Filipe Casal, Bernardo Almeida,
  and Andreia Mordido

Higher-Order Spreadsheets with Spilled Arrays  743
  Jack Williams, Nima Joharizadeh, Andrew D. Gordon,
  and Advait Sarkar

Author Index  771


Trace-Relating Compiler Correctness
and Secure Compilation
Carmine Abate¹, Roberto Blanco¹, Ștefan Ciobâcă², Adrien Durier¹,
Deepak Garg³, Cătălin Hrițcu¹, Marco Patrignani⁴,⁵, Éric Tanter⁶,¹, and Jérémy Thibault¹

¹ Inria Paris, France  ² UAIC Iaşi, Romania  ³ MPI-SWS, Saarbrücken, Germany
⁴ Stanford University, Stanford, USA  ⁵ CISPA, Saarbrücken, Germany
⁶ University of Chile, Santiago, Chile

Abstract. Compiler correctness is, in its simplest form, defined as the inclusion
of the set of traces of the compiled program into the set of traces of the origi-
nal program, which is equivalent to the preservation of all trace properties. Here
traces collect, for instance, the externally observable events of each execution.
This definition requires, however, the set of traces of the source and target lan-
guages to be exactly the same, which is not the case when the languages are far
apart or when observations are fine-grained. To overcome this issue, we study a
generalized compiler correctness definition, which uses source and target traces
drawn from potentially different sets and connected by an arbitrary relation. We
set out to understand what guarantees this generalized compiler correctness defi-
nition gives us when instantiated with a non-trivial relation on traces. When this
trace relation is not equality, it is no longer possible to preserve the trace prop-
erties of the source program unchanged. Instead, we provide a generic charac-
terization of the target trace property ensured by correctly compiling a program
that satisfies a given source property, and dually, of the source trace property one
is required to show in order to obtain a certain target property for the compiled
code. We show that this view on compiler correctness can naturally account for
undefined behavior, resource exhaustion, different source and target values, side-
channels, and various abstraction mismatches. Finally, we show that the same
generalization also applies to many secure compilation definitions, which char-
acterize the protection of a compiled program against linked adversarial code.

1 Introduction
Compiler correctness is an old idea [37, 40, 41] that has seen a significant revival in re-
cent times. This new wave was started by the creation of the CompCert verified C com-
piler [33] and continued by the proposal of many significant extensions and variants of
CompCert [8, 9, 12, 23, 29, 30, 42, 52, 56, 57, 61] and the success of many other mile-
stone compiler verification projects, including Vellvm [64], Pilsner [45], CakeML [58],
CertiCoq [4], etc. Yet, even for these verified compilers, the precise statement of cor-
rectness matters. Since proof assistants are used to conduct the verification, an external
observer does not have to understand the proofs in order to trust them, but one still has
to deeply understand the statement that was proved. And this is true not just for correct
compilation, but also for secure compilation, which is the more recent idea that our
compilation chains should do more to also ensure security of our programs [3, 26].
Basic Compiler Correctness. The gold standard for compiler correctness is semantic
preservation, which intuitively says that the semantics of a compiled program (in the
target language) is compatible with the semantics of the original program (in the source
© The Author(s) 2020
P. Müller (Ed.): ESOP 2020, LNCS 12075, pp. 1–28, 2020.
https://doi.org/10.1007/978-3-030-44914-8_1

language). For practical verified compilers, such as CompCert [33] and CakeML [58],
semantic preservation is stated extrinsically, by referring to traces. In these two settings,
a trace is an ordered sequence of events—such as inputs from and outputs to an external
environment—that are produced by the execution of a program.
A basic definition of compiler correctness can be given by the set inclusion of the
traces of the compiled program into the traces of the original program. Formally [33]:

Definition 1.1 (Basic Compiler Correctness (CC)). A compiler ↓ is correct iff

∀W. ∀t. W↓ ⇝ t ⇒ W ⇝ t.

This definition says that for any whole1 source program W, if we compile it (denoted
W↓), execute it with respect to the semantics of the target language, and observe a trace
t, then the original W can produce the same trace t with respect to the semantics of
the source language.2 This definition is simple and easy to understand, since it only
references a few familiar concepts: a compiler between a source and a target language,
each equipped with a trace-producing semantics (usually nondeterministic).
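To make the quantifier structure concrete, here is a minimal Coq sketch of this definition; all names (prg_s, sem_s, compile, and so on) are hypothetical placeholders, not the paper's actual development.

Section BasicCC.
  Variables (prg_s prg_t trace : Type).
  Variable sem_s : prg_s -> trace -> Prop.   (* W ⇝ t in the source *)
  Variable sem_t : prg_t -> trace -> Prop.   (* W↓ ⇝ t in the target *)
  Variable compile : prg_s -> prg_t.         (* the compiler ↓ *)

  (* Every target trace of the compiled program is a source trace of W. *)
  Definition CC : Prop :=
    forall (W : prg_s) (t : trace),
      sem_t (compile W) t -> sem_s W t.
End BasicCC.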
Beyond Basic Compiler Correctness. This basic compiler correctness definition as-
sumes that any trace produced by a compiled program can be produced by the source
program. This is a very strict requirement, and in particular implies that the source and
target traces are drawn from the same set and that the same source trace corresponds
to a given target trace. These assumptions are often too strong, and hence in practice
verified compiler efforts use different formulations of compiler correctness:
CompCert [33] The original compiler correctness theorem of CompCert [33] can be
seen as an instance of basic compiler correctness, but it does not provide any guar-
antees for programs that can exhibit undefined behavior [53]. As allowed by the
C standard, such unsafe programs are not even considered to be in the source lan-
guage, so are not quantified over. This has important practical implications, since
undefined behavior often leads to exploitable security vulnerabilities [13, 24, 25]
and serious confusion even among experienced C and C++ developers [32, 53, 59,
60]. As such, since 2010, CompCert provides an additional top-level correctness
theorem3 that better accounts for the presence of unsafe programs by providing
guarantees for them up to the point when they encounter undefined behavior [53].
This new theorem goes beyond the basic correctness definition above, as a target
trace need only correspond to a source trace up to the occurrence of undefined
behavior in the source trace.
CakeML [58] Compiler correctness for CakeML accounts for memory exhaustion in
target executions. Crucially, memory exhaustion events cannot occur in source
traces, only in target traces. Hence, dually to CompCert, compiler correctness only
requires source and target traces to coincide up to the occurrence of a memory
exhaustion event in the target trace.
¹ For simplicity, for now we ignore separate compilation and linking, returning to it in §5.
² Typesetting convention [47]: we use a blue, sans-serif font for source elements, an orange, bold font for target ones, and a black, italic font for elements common to both languages.
³ Stated at the top of the CompCert file driver/Complements.v and discussed by Regehr [53].

Trace-Relating Compiler Correctness. Generalized formalizations of compiler cor-


rectness like the ones above can be naturally expressed as instances of a uniform defini-
tion, which we call trace-relating compiler correctness. This generalizes basic compiler
correctness by (a) considering that source and target traces belong to possibly distinct
sets TraceS and TraceT , and (b) being parameterized by an arbitrary trace relation ∼.
Definition 1.2 (Trace-Relating Compiler Correctness (CC∼ )). A compiler ↓ is cor-
rect with respect to a trace relation ∼ ⊆ TraceS × TraceT iff
∀W. ∀t. W↓ ⇝ t ⇒ ∃s ∼ t. W ⇝ s.
This definition requires that, for any target trace t produced by the compiled program
W↓, there exists a source trace s that can be produced by the original program W and is
related to t according to ∼ (i.e., s ∼ t). By choosing the trace relation appropriately,
one can recover the different notions of compiler correctness presented above:
Basic CC Take s ∼ t to be s = t. Trivially, the basic CC of Definition 1.1 is CC= .
CompCert Undefined behavior is modeled in CompCert as a trace-terminating event
Goes_wrong that can occur in any of its languages (source, target, and all in-
termediate languages), so for a given phase (or composition thereof), we have
TraceS = TraceT . Nevertheless, the relation between source and target traces
with which to instantiate CC∼ to obtain CompCert’s current theorem is:
s ∼ t ≡ s = t ∨ (∃m ≤ t. s = m·Goes_wrong).
A compiler satisfying CC∼ for this trace relation can turn a source trace ending
in undefined behavior m·Goes_wrong (where “·” is concatenation) either into the
same trace in the target (first disjunct), or into a target trace that starts with the
prefix m but then continues arbitrarily (second disjunct, “≤” is the prefix relation).
CakeML Here, target traces are sequences of symbols from an alphabet ΣT that has
a specific trace-terminating event, Resource_limit_hit, which is not available
in the source alphabet ΣS (i.e., ΣT = ΣS ∪ {Resource_limit_hit}). Then, the
compiler correctness theorem of CakeML can be obtained by instantiating CC∼
with the following ∼ relation:
s ∼ t ≡ s = t ∨ (∃m ≤ s. t = m·Resource_limit_hit).
The resulting CC∼ instance relates a target trace ending in Resource_limit_hit
after executing m to a source trace that first produces m and then continues in a
way given by the semantics of the source program.
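Written out once, the generalized definition is short; here is a minimal Coq sketch in the same hypothetical style as the earlier one, with distinct trace types and the trace relation as an explicit parameter:

Section TraceRelatingCC.
  Variables (prg_s prg_t trace_s trace_t : Type).
  Variable sem_s : prg_s -> trace_s -> Prop.
  Variable sem_t : prg_t -> trace_t -> Prop.
  Variable compile : prg_s -> prg_t.
  Variable rel : trace_s -> trace_t -> Prop.  (* the trace relation ∼ *)

  (* Every target trace has some related source trace of W. *)
  Definition CC_rel : Prop :=
    forall (W : prg_s) (t : trace_t),
      sem_t (compile W) t -> exists s, rel s t /\ sem_s W s.
End TraceRelatingCC.

Instantiating rel with equality (when trace_s and trace_t coincide) recovers the basic definition, while the CompCert- and CakeML-style relations above are other instances.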
Beyond undefined behavior and resource exhaustion, there are many other practical
uses for CC∼ : in this paper we show that it also accounts for differences between source
and target values, for a single source output being turned into a series of target outputs,
and for side-channels.
On the flip side, the compiler correctness statement and its implications can be
more difficult to understand for CC∼ than for CC= . The full implications of choosing a
particular ∼ relation can be subtle. In fact, using a bad relation can make the compiler
correctness statement trivial or unexpected. For instance, it should be easy to see that
if one uses the total relation, which relates all source traces to all target ones, the CC∼
property holds for every compiler, yet it might take one a bit more effort to understand
that the same is true even for the following relation:
s ∼ t ≡ ∃W. W ⇝ s ∧ W↓ ⇝ t.

Reasoning About Trace Properties. To understand more about a particular CC∼ in-
stance, we propose to also look at how it preserves trace properties—defined as sets of
allowed traces [31]—from the source to the target. For instance, it is well known that
CC= is equivalent to the preservation of all trace properties (where W |= π reads “W
satisfies π” and stands for ∀t. W ⇝ t ⇒ t ∈ π):
CC= ≡ ∀π ∈ 2^Trace . ∀W. W |= π ⇒ W↓ |= π.
However, to the best of our knowledge, similar results have not been formulated for
trace relations beyond equality, when it is no longer possible to preserve the trace prop-
erties of the source program unchanged. For trace-relating compiler correctness, where
source and target traces can be drawn from different sets and related by an arbitrary
trace relation, there are two crucial questions to ask:
1. For a source trace property πS of a program—established for instance by formal
verification—what is the strongest target property that any CC∼ compiler is guar-
anteed to ensure for the produced target program?
2. For a target trace property πT , what is the weakest source property we need to show
of the original source program to obtain πT for the result of any CC∼ compiler?
Far from being mere hypothetical questions, they can help the developer of a verified
compiler to better understand the compiler correctness theorem they are proving, and
we expect that any user of such a compiler will need to ask either one or the other if they
are to make use of that theorem. In this work we provide a simple and natural answer to
these questions, for any instance of CC∼ . Building upon a bijection between relations
and Galois connections [5, 20, 43], we observe that any trace relation ∼ corresponds
to two property mappings τ̃ and σ̃, which are functions mapping source properties to
target ones (τ̃ standing for “to target”) and target properties to source ones (σ̃ standing
for “to source”):
τ̃ (πS ) = {t | ∃s. s ∼ t ∧ s ∈ πS } ; σ̃(πT ) = {s | ∀t. s ∼ t ⇒ t ∈ πT } .
The existential image of ∼, τ̃ , answers the first question above by mapping a given
source property πS to the target property that contains all target traces for which there
exists a related source trace that satisfies πS . Dually, the universal image of ∼, σ̃, an-
swers the second question by mapping a given target property πT to the source property
that contains all source traces for which all related target traces satisfy πT . We intro-
duce two new correct compilation definitions in terms of trace property preservation
(TP): TPτ̃ quantifies over all source trace properties and uses τ̃ to obtain the corre-
sponding target properties. TPσ̃ quantifies over all target trace properties and uses σ̃
to obtain the corresponding source properties. We prove that these two definitions are
equivalent to CC∼ , yielding a novel trinitarian view of compiler correctness (Figure 1).
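Both mappings, together with the adjunction law that couples them, can be rendered directly in Coq; the following self-contained sketch uses hypothetical names and represents properties as predicates on traces:

Section PropertyMappings.
  Variables (trace_s trace_t : Type).
  Variable rel : trace_s -> trace_t -> Prop.

  (* Existential image: target traces with some related source trace in pi. *)
  Definition tau (pi : trace_s -> Prop) : trace_t -> Prop :=
    fun t => exists s, rel s t /\ pi s.

  (* Universal image: source traces all of whose related targets are in pi. *)
  Definition sigma (pi : trace_t -> Prop) : trace_s -> Prop :=
    fun s => forall t, rel s t -> pi t.

  (* The adjunction law of the induced Galois connection. *)
  Lemma tau_sigma_adjunction : forall pi_s pi_t,
    (forall t, tau pi_s t -> pi_t t) <-> (forall s, pi_s s -> sigma pi_t s).
  Proof. firstorder. Qed.
End PropertyMappings.

That the proof is a one-line firstorder reflects how tightly the two images determine each other.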

CC∼ :  ∀W. ∀t. W↓ ⇝ t ⇒ ∃s ∼ t. W ⇝ s

TPτ̃ :  ∀πS . ∀W. W |= πS ⇒ W↓ |= τ̃ (πS )
TPσ̃ :  ∀πT . ∀W. W |= σ̃(πT ) ⇒ W↓ |= πT

Fig. 1: The equivalent compiler correctness definitions forming our trinitarian view:
TPτ̃ ⇐⇒ CC∼ ⇐⇒ TPσ̃ .

Contributions.
– We propose a new trinitarian view of compiler correctness that accounts for non-trivial
trace relations. While, as discussed above, specific instances of the CC∼ definition have
already been used in practice, we seem to be the first to propose assessing the meaning-
fulness of CC∼ instances in terms of how properties are preserved between the source
and the target, and in particular by looking at the property mappings σ̃ and τ̃ induced
by the trace relation ∼. We prove that CC∼ , TPσ̃ , and TPτ̃ are equivalent for any
trace relation (§2.2), as illustrated in Figure 1. In the opposite direction, we show that
for every trace relation corresponding to a given Galois connection [20], an analogous
equivalence holds. Finally, we extend these results (§2.3) from the preservation of trace
properties to the larger class of subset-closed hyperproperties (e.g., noninterference).
– We use CC∼ compilers of various complexities to illustrate that our view on com-
piler correctness naturally accounts for undefined behavior (§3.1), resource exhaustion
(§3.2), different source and target values (§3.3), and differences in the granularity of
data and observable events (§3.4). We expect these ideas to apply to any other discrep-
ancies between source and target traces. For each compiler we show how to choose
the relation between source and target traces and how the induced property mappings
preserve interesting trace properties and subset-closed hyperproperties. We look at the
way particular σ̃ and τ̃ work on different kinds of properties and how the produced
properties can be expressed for different kinds of traces.
– We analyze the impact of correct compilation on noninterference [22], showing what
can still be preserved (and thus also what is lost) when target observations are finer than
source ones, e.g., side-channel observations (§4). We formalize the guarantee obtained
by correct compilation of a noninterfering program as abstract noninterference [21], a
weakening of target noninterference. Dually, we identify a family of declassifications
of target noninterference for which source reasoning is possible.
– Finally, we show that the trinitarian view also extends to a large class of secure com-
pilation definitions [2], formally characterizing the protection of the compiled program
against linked adversarial code (§5). For each secure compilation definition we again
propose both a property-free characterization in the style of CC∼ , and two character-
izations in terms of preserving a class of source or target properties satisfied against
arbitrary adversarial contexts. The additional quantification over contexts allows for
finer distinctions when considering different property classes, so we study mapping
classes not only of trace properties and hyperproperties, but also of relational hyper-
properties [2]. An example secure compiler accounting for a target that can produce
additional trace events that are not possible in the source illustrates this approach.
The paper closes with discussions of related (§6) and future work (§7). An online ap-
pendix contains omitted technical details: https://arxiv.org/abs/1907.05320.
The traces considered in our examples are structured, usually as sequences of events.
We notice however that unless explicitly mentioned, all our definitions and results are
more general and make no assumption whatsoever about the structure of traces. Most
of the theorems formally or informally mentioned in the paper were mechanized in the
Coq proof assistant and are marked with the Coq logo. This development has around 10k lines of
code, is described in the online appendix, and is available at the following address:
https://github.com/secure-compilation/different_traces.

2 Trace-Relating Compiler Correctness


In this section, we start by generalizing the trace property preservation definitions at
the end of the introduction to TPσ and TPτ , which depend on two arbitrary mappings
σ and τ (§2.1). We prove that, whenever σ and τ form a Galois connection, TPσ and
TPτ are equivalent (Theorem 2.4). We then exploit a bijective correspondence between
trace relations and Galois connections to close the trinitarian view (§2.2), with two main
benefits: first, it helps us assess the meaningfulness of a given trace relation by look-
ing at the property mappings it induces; second, it allows us to construct new compiler
correctness definitions starting from a desired mapping of properties. Finally, we gen-
eralize the classic result that compiler correctness (i.e., CC= ) is enough to preserve not
just trace properties but also all subset-closed hyperproperties [14]. For this, we show
that CC∼ is also equivalent to subset-closed hyperproperty preservation, for which we
also define both a version in terms of σ̃ and a version in terms of τ̃ (§2.3).

2.1 Property Mappings


As explained in §1, trace-relating compiler correctness CC∼ , by itself, lacks a crisp de-
scription of which trace properties are preserved by compilation. Since even the syntax
of traces can differ between source and target, one can either look at trace properties of
the source (but then one needs to interpret them in the target), or at trace properties of
the target (but then one needs to interpret them in the source). Formally we need two
property mappings, τ : 2^TraceS → 2^TraceT and σ : 2^TraceT → 2^TraceS , which lead us
to the following generalization of trace property preservation (TP).

Definition 2.1 (TPσ and TPτ ). Given two property mappings, τ : 2^TraceS → 2^TraceT
and σ : 2^TraceT → 2^TraceS , for a compilation chain ·↓ we define:

TPτ ≡ ∀πS . ∀W. W |= πS ⇒ W↓ |= τ (πS ); TPσ ≡ ∀πT . ∀W. W |= σ(πT ) ⇒ W↓ |= πT .

For an arbitrary source program W, τ interprets a source property πS as the target


guarantee for W↓. Dually, σ defines a source obligation sufficient for the satisfaction
of a target property πT after compilation. Ideally:
– Given πT , the target interpretation of the source obligation σ(πT ) should actually
guarantee that πT holds, i.e., τ (σ(πT )) ⊆ πT ;
– Dually for πS , we would not want the source obligation for τ (πS ) to be harder than
πS itself, i.e., σ(τ (πS )) ⊇ πS .
These requirements are satisfied when the two maps form a Galois connection between
the posets of source and target properties ordered by inclusion. We briefly recall the
definition and the characteristic property of Galois connections [16, 38].

Definition 2.2 (Galois connection). Let (X, ≤) and (Y, ≤) be two posets. A pair of
maps, α : X → Y , γ : Y → X is a Galois connection iff it satisfies the adjunction law:
∀x ∈ X. ∀y ∈ Y. α(x) ≤ y ⇐⇒ x ≤ γ(y). α (resp. γ) is the lower (upper) adjoint
or abstraction (concretization) function and Y (X) the abstract (concrete) domain.

We will often write α : (X, ≤) ⇆ (Y, ≤) : γ to denote a Galois connection, or simply
α : X ⇆ Y : γ, or even α ⇆ γ when the involved posets are clear from context.

Lemma 2.3 (Characteristic property of Galois connections). If α : (X, ≤) ⇆ (Y, ≤) : γ
is a Galois connection, then α, γ are monotone and they satisfy these properties:
i) ∀x ∈ X. x ≤ γ(α(x));  ii) ∀y ∈ Y. α(γ(y)) ≤ y.
If X, Y are complete lattices, then α is continuous, i.e., ∀F ⊆ X. α(⊔F ) = ⊔α(F ).

If two property mappings, τ and σ, form a Galois connection on trace properties ordered
by set inclusion, Lemma 2.3 (with α = τ and γ = σ) tells us that they satisfy the ideal
conditions we discussed above, i.e., τ (σ(πT )) ⊆ πT and σ(τ (πS )) ⊇ πS .4
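As a small sanity check, the adjunction law and property i) of Lemma 2.3 can be stated over abstract orders in a few lines of Coq; leX, leY, alpha, and gamma are hypothetical stand-ins for the data of Definition 2.2:

Section GaloisConnection.
  Variables (X Y : Type).
  Variable leX : X -> X -> Prop.              (* order on X *)
  Variable leY : Y -> Y -> Prop.              (* order on Y *)
  Hypothesis leY_refl : forall y, leY y y.    (* Y is a poset *)
  Variables (alpha : X -> Y) (gamma : Y -> X).

  (* The adjunction law of Definition 2.2. *)
  Definition galois : Prop :=
    forall x y, leY (alpha x) y <-> leX x (gamma y).

  (* Property i) of Lemma 2.3 follows immediately from the adjunction law. *)
  Lemma gc_unit : galois -> forall x, leX x (gamma (alpha x)).
  Proof.
    intros G x. destruct (G x (alpha x)) as [H _]. apply H. apply leY_refl.
  Qed.
End GaloisConnection.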
The two ideal conditions on τ and σ are sufficient to show the equivalence of the
criteria they define, respectively TPτ and TPσ .
Theorem 2.4 (TPτ and TPσ coincide). Let τ : 2^TraceS ⇆ 2^TraceT : σ be a Galois
connection, with τ and σ the lower and upper adjoints (resp.). Then TPτ ⇐⇒ TPσ .
2.2 Trace Relations and Property Mappings
We now investigate the relation between CC∼ , TPτ and TPσ . We show that for a trace
relation and its corresponding Galois connection (Lemma 2.7), the three criteria are
equivalent (Theorem 2.8). This equivalence offers interesting insights for both verifi-
cation and design of a correct compiler. For a CC∼ compiler, the equivalence makes
explicit both the guarantees one has after compilation (τ̃ ) and source proof obligations
to ensure the satisfaction of a given target property (σ̃). On the other hand, a compiler
designer might first determine the target guarantees the compiler itself must provide,
i.e., τ , and then prove an equivalent statement, CC∼ , for which more convenient proof
techniques exist in the literature [7, 58].
Definition 2.5 (Existential and Universal Image [20]). Given any two sets X and Y
and a relation ∼ ⊆ X × Y , define its existential or direct image, τ̃ : 2^X → 2^Y , and its
universal image, σ̃ : 2^Y → 2^X , as follows:
τ̃ = λπ ∈ 2^X . {y | ∃x. x ∼ y ∧ x ∈ π} ;  σ̃ = λπ ∈ 2^Y . {x | ∀y. x ∼ y ⇒ y ∈ π} .
When trace relations are considered, the existential and universal images can be used to
instantiate Definition 2.1 leading to the trinitarian view already mentioned in §1.
Theorem 2.6 (Trinitarian View ). For any trace relation ∼ and its existential and
universal images τ̃ and σ̃, we have: TPτ̃ ⇐⇒ CC∼ ⇐⇒ TPσ̃ .
This result relies both on Theorem 2.4 and on the fact that the existential and universal
images of a trace relation form a Galois connection ( ). Below we further generalize
this result (Theorem 2.8) relying on a bijective correspondence between trace relations
and Galois connections on properties.
Lemma 2.7 (Trace relations ≅ Galois connections on trace properties). The func-
tion ∼ ↦ (τ̃ ⇆ σ̃) that maps a trace relation to its existential and universal images
is a bijection between trace relations 2^(TraceS × TraceT) and Galois connections on trace
properties 2^TraceS ⇆ 2^TraceT . Its inverse is (τ ⇆ σ) ↦ ∼̂, where s ∼̂ t ≡ t ∈ τ ({s}).
⁴ While target traces are often “more concrete” than source ones, trace properties 2^Trace (which
in Coq we represent as the function type Trace → Prop) are contravariant in Trace and thus
target properties correspond to the abstract domain.

Proof. Gardiner et al. [20] show that the existential image is a functor from the category
of sets and relations to the category of predicate transformers, mapping a set X to 2^X
and a relation ∼ ⊆ X × Y to τ̃ : 2^X → 2^Y . They also show that such a functor
is an isomorphism (hence bijective) when one considers only monotonic predicate
transformers that have a (unique) upper adjoint. The universal image of ∼, σ̃, is the
unique adjoint of τ̃ ( ), hence ∼ ↦ (τ̃ ⇆ σ̃) is itself bijective. □


The bijection just introduced allows us to generalize Theorem 2.6 and switch between
the three views of compiler correctness described earlier at will.

Theorem 2.8 (Correspondence of Criteria). For any trace relation ∼ and corre-
sponding Galois connection τ ⇆ σ, we have: TPτ ⇐⇒ CC∼ ⇐⇒ TPσ .

Proof. For a trace relation ∼ and the Galois connection τ̃ ⇆ σ̃, the result follows from
Theorem 2.6. For a Galois connection τ ⇆ σ and ∼̂, use Lemma 2.7 to conclude that
the existential and universal images of ∼̂ coincide with τ and σ, respectively; the goal
then follows from Theorem 2.6. □


We conclude by explicitly noting that sometimes the lifted properties may be trivial:
the target guarantee can be the true property (the set of all traces), or the source obli-
gation the false property (the empty set of traces). This might be the case when source
observations abstract away too much information (§3.2 presents an example).

2.3 Preservation of Subset-Closed Hyperproperties


A CC= compiler ensures the preservation not only of trace properties, but also of all
subset-closed hyperproperties, which are known to be preserved by refinement [14]. An
example of a subset-closed hyperproperty is noninterference [14]; a CC= compiler thus
guarantees that if W is noninterfering with respect to the inputs and outputs in the trace
then so is W↓. To be able to talk about how (hyper)properties such as noninterference
are preserved, in this section we propose another trinitarian view involving CC∼ and
preservation of subset-closed hyperproperties (Theorem 2.11), slightly weakened in that
source and target property mappings will need to be closed under subsets.
First, recall that a program satisfies a hyperproperty when its complete set of traces,
which from now on we will call its behavior, is a member of the hyperproperty [14].

Definition 2.9 (Hyperproperty Satisfaction). A program W satisfies a hyperproperty


H, written W |= H, iff beh(W ) ∈ H, where beh(W ) = {t | W ⇝ t}.
Hyperproperty preservation is a strong requirement in general. Fortunately, many inter-
esting hyperproperties are subset-closed (SCH for short), which simplifies their preser-
vation since it suffices to show that the behaviors of the compiled program refine the
behaviors of the source one, which coincides with the statement of CC= .
To talk about hyperproperty preservation in the trace-relating setting, we need an
interpretation of source hyperproperties into the target and vice versa. The one we con-
sider builds on top of the two trace property mappings τ and σ, which are naturally
lifted to hyperproperty mappings. This way we are able to extract two hyperproperty
mappings from a trace relation similarly to §2.2:

Definition 2.10 (Lifting property mappings to hyperproperty mappings). Let τ :
2^TraceS → 2^TraceT and σ : 2^TraceT → 2^TraceS be arbitrary property mappings. The
images of HS ∈ 2^(2^TraceS) and HT ∈ 2^(2^TraceT) under τ and σ are, respectively:
τ (HS ) = {τ (πS ) | πS ∈ HS } ;  σ(HT ) = {σ(πT ) | πT ∈ HT } .

Formally we are defining two new mappings, this time on hyperproperties, but by a
small abuse of notation we still denote them by τ and σ.
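A possible Coq rendering of this lifting and of the subset closure used below, again with hypothetical names and with membership in the image stated extensionally rather than by equality of properties:

Section HyperpropertyLifting.
  Variables (trace_s trace_t : Type).
  Variable tau : (trace_s -> Prop) -> (trace_t -> Prop).

  (* A hyperproperty over X is a set of properties, i.e., of sets of traces. *)
  Definition hprop (X : Type) := (X -> Prop) -> Prop.

  (* Image of a source hyperproperty under tau (Definition 2.10). *)
  Definition tau_h (Hs : hprop trace_s) : hprop trace_t :=
    fun pi_t => exists pi_s, Hs pi_s /\ forall t, pi_t t <-> tau pi_s t.

  (* Subset closure Cl⊆, as used in Theorem 2.11. *)
  Definition Cl_sub (H : hprop trace_t) : hprop trace_t :=
    fun pi => exists pi', H pi' /\ forall t, pi t -> pi' t.
End HyperpropertyLifting.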
Interestingly, it is not possible to apply the argument used for CC= to show that a
CC∼ compiler guarantees W↓ |= τ̃ (HS ) whenever W |= HS . This is in fact not true
because direct images do not necessarily preserve subset-closure [36, 44]. To fix this
we close the image of τ̃ and σ̃ under subsets (denoted as Cl⊆ ) and obtain:
Theorem 2.11 (Preservation of Subset-Closed Hyperproperties ). For any trace
relation ∼ and its existential and universal images lifted to hyperproperties, τ̃ and σ̃,
and for Cl⊆ (H) = {π | ∃π′ ∈ H. π ⊆ π′ }, we have:
SCHP_Cl⊆∘τ̃ ⇐⇒ CC∼ ⇐⇒ SCHP_Cl⊆∘σ̃ , where
SCHP_Cl⊆∘τ̃ ≡ ∀W. ∀HS ∈ SCHS . W |= HS ⇒ W↓ |= Cl⊆ (τ̃ (HS ));
SCHP_Cl⊆∘σ̃ ≡ ∀W. ∀HT ∈ SCHT . W |= Cl⊆ (σ̃(HT )) ⇒ W↓ |= HT .
Theorem 2.11 makes us aware of the potential loss of precision when interested in
preserving subset-closed hyperproperties through compilation. In §4 we focus on a se-
curity relevant subset-closed hyperproperty, noninterference, and show that such a loss
of precision can be intended as a declassification of noninterference.

3 Instances of Trace-Relating Compiler Correctness


The trace-relating view of compiler correctness above can serve as a unifying frame-
work for studying a range of interesting compilers. This section provides several rep-
resentative instantiations of the framework: source languages with undefined behavior
that compilation can turn into arbitrary target behavior (§3.1), target languages with re-
source exhaustion that cannot happen in the source (§3.2), changes in the representation
of values (§3.3), and differences in the granularity of data and observable events (§3.4).

3.1 Undefined Behavior


We start by expanding upon the discussion of undefined behavior in §1. We first study
the model of CompCert, where source and target alphabets are the same, including the
event for undefined behavior. The trace relation weakens equality by allowing undefined
behavior to be replaced with an arbitrary sequence of events.
Example 3.1 (CompCert-like Undefined Behavior Relation). Source and target traces
are sequences of events drawn from Σ, where Goes_wrong ∈ Σ is a terminal event that
represents an undefined behavior. We then use the trace relation from the introduction:
s ∼ t ≡ s = t ∨ ∃m ≤ t. s = m · Goes_wrong.
Each trace of a target program produced by a CC∼ compiler is either also a trace of the
original source program or it has a finite prefix that the source program also produces,
immediately before encountering undefined behavior. As explained in §1, one of the
correctness theorems in CompCert can be rephrased as this variant of CC∼ .
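The relation itself is easy to write down; here is a hedged Coq sketch with traces modeled as finite lists of events (the paper's traces can also be infinite, which this sketch ignores, and all names are hypothetical):

Require Import List. Import ListNotations.

Section UndefinedBehaviorRelation.
  Variable event : Type.
  Variable Goes_wrong : event.

  (* s ~ t iff s = t, or s ends in Goes_wrong right after a prefix m of t,
     after which the target trace t = m ++ k continues arbitrarily. *)
  Inductive rel_UB : list event -> list event -> Prop :=
  | relUB_eq    : forall t, rel_UB t t
  | relUB_wrong : forall m k, rel_UB (m ++ [Goes_wrong]) (m ++ k).
End UndefinedBehaviorRelation.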

We proved that the property mappings induced by the relation can be written as ( ):
σ̃(πT ) = {s | s ∈ πT ∧ ∀m. s ≠ m·Goes_wrong} ∪ {m·Goes_wrong | ∀t. m ≤ t ⇒ t ∈ πT } ;
τ̃ (πS ) = {t | t ∈ πS } ∪ {t | ∃m ≤ t. m·Goes_wrong ∈ πS } .

These two mappings explain what a CC∼ compiler ensures for the ∼ relation above. The
target-to-source mapping σ̃ states that to prove that a compiled program has a property
πT using source-level reasoning, one has to prove that any trace produced by the source
program must either be a target trace satisfying πT or have undefined behavior, but only
provided that any continuation of the trace substituted for the undefined behavior satis-
fies πT . The source-to-target mapping τ̃ states that by compiling a program satisfying
a property πS we obtain a program that produces traces that satisfy the same property
or that extend a source trace that ends in undefined behavior.
These definitions can help us reason about programs. For instance, σ̃ specifies that,
to prove that an event does not happen in the target, it is not enough to prove that it
does not happen in the source: it is also necessary to prove that the source program
does not have any undefined behavior (second disjunct). Indeed, if it had an undefined
behavior, its continuations could exhibit the unwanted event. 
This relation can be easily generalized to other settings. For instance, consider the
setting in which we compile down to a low-level language like machine code. Target
traces can now contain new events that cannot occur in the source: indeed, in modern
architectures like x86 a compiler typically uses only a fraction of the available instruc-
tion set. Some instructions might even perform dangerous operations, such as writing
to the hard drive. Formally, the source and target do not have the same events any more.
Thus, we consider a source alphabet ΣS = Σ ∪ {Goes_wrong}, and a target alpha-
bet ΣT = Σ ∪ Σ′. The trace relation is defined in the same way and we obtain the
same property mappings as above, except that since target traces now have more events
(some of which may be dangerous), and the arbitrary continuations of target traces get
more interesting. For instance, consider a new event that represents writing data on the
hard drive, and suppose we want to prove that this event cannot happen for a compiled
program. Then, proving this property requires exactly proving that the source program
exhibits no undefined behavior [11]. More generally, what one can prove about target-
only events can only be either that they cannot appear (because there is no undefined
behavior) or that any of them can appear (in the case of undefined behavior).
In §5.2 we study a similar example, showing that even in a safe language linked ad-
versarial contexts can cause dangerous target events that have no source correspondent.

3.2 Resource Exhaustion


Let us return to the discussion about resource exhaustion in §1.

Example 3.2 (Resource Exhaustion). We consider traces made of events drawn from
ΣS in the source, and ΣT = ΣS ∪ {Resource_Limit_Hit} in the target. Recall the
trace relation for resource exhaustion:
s ∼ t ≡ s = t ∨ ∃m ≤ s. t = m · Resource_Limit_Hit.
Formally, this relation is similar to the one for undefined behavior, except this time it is
the target trace that is allowed to end early instead of the source trace.
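A sketch of this relation, dual to the one given earlier for undefined behavior and under the same simplification to finite traces, with hypothetical names:

Require Import List. Import ListNotations.

Section ResourceExhaustionRelation.
  Variable event : Type.
  Variable Resource_Limit_Hit : event.

  (* s ~ t iff s = t, or t stops with Resource_Limit_Hit right after
     a prefix m of the source trace s = m ++ k. *)
  Inductive rel_RE : list event -> list event -> Prop :=
  | relRE_eq    : forall s, rel_RE s s
  | relRE_limit : forall m k, rel_RE (m ++ k) (m ++ [Resource_Limit_Hit]).
End ResourceExhaustionRelation.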

The induced trace property mappings σ̃ and τ̃ are the following ( ):


σ̃(πT ) = {s | s ∈ πT } ∩ {s | ∀m ≤ s. m · Resource_Limit_Hit ∈ πT };
τ̃ (πS ) = πS ∪ {m · Resource_Limit_Hit | ∃s ∈ πS . m ≤ s}.
These capture the following intuitions. The target-to-source mapping σ̃ states that to
prove a property of the compiled program one has to show that the traces of the source
program satisfy two conditions: (1) they must also satisfy the target property; and (2)
the termination of every one of their prefixes by a resource exhaustion error must be
allowed by the target property. This is rather restrictive: any property that prevents re-
source exhaustion cannot be proved using source-level reasoning. Indeed, if πT does
not allow resource exhaustion, then σ̃(πT ) = ∅. This is to be expected since resource
exhaustion is simply not accounted for at the source level. The other mapping τ̃ states
that a compiled program produces traces that either belong to the same properties as the
traces of the source program or end early due to resource exhaustion.
In this example, safety properties [31] are mapped (in both directions) to other safety
properties ( ). This can be desirable for a relation: since safety properties are usually
easier to reason about, one interested only in safety properties at the target can reason
about them using source-level reasoning tools for safety properties.
The compiler correctness theorem in CakeML is an instance of CC∼ for the ∼
relation above. We have also implemented two small compilers that are correct for this
relation. The full details can be found in the Coq development in the supplementary
materials. The first compiler ( ) goes from a simple expression language (similar to the
one in §3.3 but without inputs) to the same language except that execution is bounded by
some amount of fuel: each execution step consumes some amount of fuel and execution
immediately halts when it runs out of fuel. The compiler is the identity.
The second compiler ( ) is more interesting: we proved this CC∼ instance for a
variant of a compiler from a WHILE language to a simple stack machine by Xavier
Leroy [35]. We enriched the two languages with outputs and modified the semantics of
the stack machine so that it falls into an error state if the stack reaches a certain size.
The proof uses a standard forward simulation modified to account for failure. 
We conclude this subsection by noting that the resource exhaustion relation and
the undefined behavior relation from the previous subsection can easily be combined.
Indeed, given a relation ∼UB and a relation ∼RE defined as above on the same sets of
traces, we can build a new relation ∼ that allows both refinement of undefined behavior
and resource exhaustion by taking their union: s ∼ t ≡ s ∼UB t ∨ s ∼RE t. A compiler
that is CC∼UB or CC∼RE is trivially CC∼ , though the converse is not true.
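The union construction can be stated abstractly over any two trace relations; a small self-contained sketch:

Section UnionRelation.
  Variables (trace_s trace_t : Type).
  Variables rel_UB rel_RE : trace_s -> trace_t -> Prop.

  (* The union relation: a compiler correct for either relation alone
     is also correct for the union, but not conversely. *)
  Definition rel_union (s : trace_s) (t : trace_t) : Prop :=
    rel_UB s t \/ rel_RE s t.
End UnionRelation.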

3.3 Different Source and Target Values


We now illustrate trace-relating compilation for a translation mapping source-level
booleans to target-level natural numbers. Given the simplicity of this compiler, most
of the details of the formalization are deferred to the online appendix.
The source language is a pure, statically typed expression language whose expres-
sions e include naturals n, booleans b, conditionals, arithmetic and relational operations,
boolean inputs inb and natural inputs inn . A trace s is a list of inputs is paired with a
result r, which can be a natural, a boolean, or an error. Well-typed programs never
produce error. Types ty are either N (naturals) or B (booleans); typing is standard. The
source language has a standard big-step operational semantics (e ⇝ ⟨is, r⟩), which tells
how an expression e generates a trace ⟨is, r⟩. The target language is analogous, except
that it is untyped, only has naturals n and its only inputs are naturals inn . The semantics
of the target language is also given in big-step style. Since we only have naturals and
all expressions operate on them, no error result is possible in the target.
The compiler is homomorphic, translating a source expression to the same target
expression; the only differences are natural numbers (and conditionals), as noted below.
true↓ = 1        false↓ = 0        inb↓ = inn        inn↓ = inn
(e1 ≤ e2)↓ = if e1↓ ≤ e2↓ then 1 else 0
(if e1 then e2 else e3)↓ = if e1↓ ≤ 0 then e3↓ else e2↓
When compiling an if-then-else the target condition e1 ↓ ≤ 0 is used to check that e1 is
false, and therefore the then and else branches of the source are swapped in the target.
Relating Traces. We relate basic values (naturals and booleans) in a non-injective fash-
ion as noted below. Then, we extend the relation to lists of inputs pointwise (Rules Empty
and Cons) and lift that relation to traces (Rules Nat and Bool).
n ∼ n        true ∼ n if n > 0        false ∼ 0

(Empty)  ∅ ∼ ∅
(Cons)   if i ∼ i′ and is ∼ is′, then i · is ∼ i′ · is′
(Nat)    if is ∼ is′ and n ∼ n, then ⟨is, n⟩ ∼ ⟨is′, n⟩
(Bool)   if is ∼ is′ and b ∼ n, then ⟨is, b⟩ ∼ ⟨is′, n⟩
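As an illustration, the following Haskell sketch spells out this compiler and the relation on result values; the constructor names are hypothetical, and inputs are omitted for brevity.

  -- Source: typed expressions with booleans and naturals (inputs omitted).
  data SExpr = SNat Int | STrue | SFalse
             | SLe SExpr SExpr | SIf SExpr SExpr SExpr

  -- Target: untyped, naturals only; TIfLe g1 g2 e1 e2 runs e1 if g1 <= g2,
  -- and e2 otherwise.
  data TExpr = TNat Int | TIfLe TExpr TExpr TExpr TExpr

  compile :: SExpr -> TExpr
  compile (SNat n)      = TNat n
  compile STrue         = TNat 1
  compile SFalse        = TNat 0
  compile (SLe e1 e2)   = TIfLe (compile e1) (compile e2) (TNat 1) (TNat 0)
  compile (SIf c th el) =
    -- the guard tests c↓ <= 0, i.e. that c is false, hence the branch swap
    TIfLe (compile c) (TNat 0) (compile el) (compile th)

  -- The non-injective relation on result values.
  data SVal = VNat Int | VBool Bool

  vrel :: SVal -> Int -> Bool
  vrel (VNat m)      n = m == n
  vrel (VBool True)  n = n > 0
  vrel (VBool False) n = n == 0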
Property mappings. The property mappings σ̃ and τ̃ induced by the trace relation ∼
defined above capture the intuition behind encoding booleans as naturals:
– the source-to-target mapping allows true to be encoded by any non-zero number;
– the target-to-source mapping requires that 0 be replaceable by both 0 and false.
Compiler correctness. With the relation above, the compiler is proven to satisfy CC∼ .

Theorem 3.3 (·↓ is correct). ·↓ is CC∼.

Simulations with different traces. The difficulty in proving Theorem 3.3 arises from
the trace-relating compilation setting: For compilation chains that have the same source
and target traces, it is customary to prove compiler correctness using a forward simula-
tion (i.e., a simulation between source and target transition system); then, using deter-
minacy [18, 39] of the target language and input totality [19, 63] (aka receptiveness) of
the source, this forward simulation is flipped into a backward simulation (a simulation
between target and source transition system), as described by Beringer et al. [7], Leroy
[34]. This flipping is useful because forward simulations are often much easier to prove
(by induction on the transitions of the source) than backward ones, as it is the case here.
We first give the main idea of the flipping proof, when the inputs are the same in
the source and the target [7, 34]. We only consider inputs, as it is the most interesting
case, since with determinacy, nondeterminism only occurs on inputs. Given a forward
simulation R, and a target program WT that simulates a source program WS , WT is
able to perform an input iff so is WS : otherwise, say for instance that WS performs an
output, by forward simulation WT would also perform an output, which is impossible
because of determinacy. By input totality of the source, WS must be able to perform
the exact same input as WT ; using forward simulation and determinacy, the resulting
programs must be related.

[Diagram: from WS R WT and a target input i1, input totality of the source yields a source input i2; by contradiction, using forward simulation and determinacy, i2 coincides with i1; and by forward simulation and determinacy the resulting programs ∃WS1 and WT1 are again related by R.]
However, our trace relation is not injective (both 0 and false are mapped to 0),
therefore these arguments do not apply: not all possible inputs of target programs are
accounted for in the forward simulation. We thus have to strengthen the forward sim-
ulation assumption, requiring the following additional property to hold, for any source
program WS and target program WT related by the forward simulation R.
[Diagram (flippable simulation): given WS R WT with transitions WS −iS1→ WS1 and WT −iT1→ WT1 such that iS1 ∼ iT1 and WS1 R WT1, for every target input iT2 with iS1 ∼ iT2 there exist a source input iS2 and a program WS2 such that iS2 ∼ iT2 and WS2 R WT2.]
We say that a forward simulation for which this property holds is flippable. For our
example compiler, a flippable forward simulation works as follows: whenever a boolean
input occurs in the source, the target program must perform every strictly positive input
n (and not just 1, as suggested by the compiler). Using this property, determinacy of
the target, input totality of the source, as well as the fact that any target input has an
inverse image through the relation, we can indeed show that the forward simulation can
be turned into a backward one: starting from WS R WT and an input iT2, we show
that there are iS1 and iT1 as in the diagram above, using the same arguments as when the
inputs are the same; because the simulation is flippable, we can close the diagram, and
obtain the existence of an adequate iS2 . From this we obtain CC∼ .
In fact, we have proven a completely general ‘flipping theorem’, with this flippable
hypothesis on the forward simulation. We have also shown that if the relation ∼
defines a bijection between the inputs of the source and the target, then any forward
simulation is flippable, hence reobtaining the usual proof technique [7, 34] as a special
case. This flipping theorem is further discussed in the online appendix.

3.4 Abstraction Mismatches

We now consider how to relate traces where a single source action is compiled to mul-
tiple target ones. To illustrate this, we take a pure, statically-typed source language that
can output (nested) pairs of arbitrary size, and a pure, untyped target language where
sent values have a fixed size. Concretely, the source is analogous to the language of §3.3,
except that it does not have inputs or booleans and it has an expression send e, which
can emit a (nested) pair e of values in a single action. That is, given that e reduces
to a pair, e.g., ⟨v1, ⟨v2, v3⟩⟩, the expression send ⟨v1, ⟨v2, v3⟩⟩ emits the action
⟨v1, ⟨v2, v3⟩⟩. That expression is compiled into a sequence of individual sends in the
target language, send v1; send v2; send v3, since in the target, send e sends the value
that e reduces to, but the language has no pairs.

Due to space constraints we omit the full formalization of these simple languages
and of the homomorphic compiler ((·)↓ : e → e↓). The only interesting bit is the
compilation of the send · expression, which relies on the gensend (·) function below.
That function takes a source expression of a given type and returns a sequence of target
send · instructions that send each element of the expression.

gensend (e : N) = send e↓
gensend (e : τ′ × τ″) = gensend (e.1 : τ′); gensend (e.2 : τ″)
Relating Traces. We start with the trivial relation between numbers: n ∼0 n, i.e., num-
bers are related when they are the same. We cannot build a relation between single ac-
tions since a single source action is related to multiple target ones. Therefore, we define
a relation between a source action M and a target trace t (a list of numbers), inductively
on the structure of M (which is a pair of values, and values are natural numbers or pairs).
(Trace-Rel-N-N)  if n ∼0 n and n′ ∼0 n′, then ⟨n, n′⟩ ∼ n · n′
(Trace-Rel-N-M)  if n ∼0 n and M ∼ t, then ⟨n, M⟩ ∼ n · t
(Trace-Rel-M-N)  if M ∼ t and n ∼0 n, then ⟨M, n⟩ ∼ t · n
(Trace-Rel-M-M)  if M ∼ t and M′ ∼ t′, then ⟨M, M′⟩ ∼ t · t′

A pair of naturals is related to the two actions that send each element of the pair
(Rule Trace-Rel-N-N). If a pair is made of sub-pairs, we require all such sub-pairs to be
related (Rules Trace-Rel-N-M to Trace-Rel-M-M). We build on these rules to define the
relation s ∼ t between source and target traces for which the compiler is correct
(Theorem 3.4). Trivially, traces are related when they are both empty. Alternatively,
given related traces s ∼ t, we can concatenate a source action M and a target trace t′,
provided that they are related (Rule Trace-Rel-Single: if s ∼ t and M ∼ t′, then
s · M ∼ t · t′).
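Specialised to source values, which is all a compiled send ever emits, both gensend and the action relation become very short programs; the following Haskell sketch uses illustrative names and flattens nested pairs left to right, mirroring the rules above.

  -- Source values: naturals and (nested) pairs; target sends carry naturals.
  data Val   = VNat Int | VPair Val Val
  data TExpr = TSend Int | TSeq TExpr TExpr

  -- gensend on values: one send per leaf, in left-to-right order.
  gensend :: Val -> TExpr
  gensend (VNat n)      = TSend n
  gensend (VPair v1 v2) = TSeq (gensend v1) (gensend v2)

  -- The action relation M ~ t: a source action is related exactly to its
  -- flattening into a list of target send actions.
  flatten :: Val -> [Int]
  flatten (VNat n)      = [n]
  flatten (VPair v1 v2) = flatten v1 ++ flatten v2

  actionRel :: Val -> [Int] -> Bool
  actionRel m t = flatten m == t

For example, both VPair (VPair (VNat 4) (VNat 6)) (VPair (VNat 5) (VNat 7)) and VPair (VNat 4) (VPair (VNat 6) (VPair (VNat 5) (VNat 7))) flatten to [4, 6, 5, 7]; this non-injectivity is exactly what the property mappings below exploit.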
Theorem 3.4 ((·)↓ is correct). (·)↓ is CC∼.

With our trace relation, the trace property mappings capture the following intuitions:
– The target-to-source mapping states that a source property can reconstruct target
actions as it sees fit. For example, trace 4 · 6 · 5 · 7 is related to ⟨4, 6⟩ · ⟨5, 7⟩ and
⟨⟨4, 6⟩, ⟨5, 7⟩⟩ (and many more variations). This gives freedom to the source im-
plementation of a target behavior, which follows from the non-injectivity of ∼.⁵
– The source-to-target mapping “forgets” about the way pairs are nested, but is faith-
ful w.r.t. the values vi contained in a message. Notice that source safety properties
are always mapped to target safety properties. For instance, if πS ∈ SafetyS pre-
scribes that some bad number is never sent, then τ̃(πS) prescribes that the same
number is never sent in the target and τ̃(πS) ∈ SafetyT. Of course, if πS ∈ SafetyS
prescribes that a particular nested pairing like ⟨⟨4, 6⟩, ⟨5, 7⟩⟩ never happens, then
τ̃(πS) is still a target safety property, but the trivial one, since τ̃(πS) = ⊤ ∈ SafetyT.

⁵ Making ∼ injective is a matter of adding open and close parenthesis actions in target traces.

4 Trace-Relating Compilation and Noninterference Preservation


When source and target observations are drawn from the same set, a correct compiler
(CC= ) is enough to ensure the preservation of all subset-closed hyperproperties, in par-
ticular of noninterference (NI) [22], as also mentioned at the beginning of §2.3. In the
scenario where target observations are strictly more informative than source observa-
tions, the best guarantee one may expect from a correct trace-relating compiler (CC∼ )
is a weakening (or declassification) of target noninterference that matches the noninter-
ference property satisfied in the source. To formalize this reasoning, this section applies
the trinitarian view of trace-relating compilation to the general framework of abstract
noninterference (ANI) [21].
We first define NI and explain the issue of preserving source NI via a CC∼ compiler.
We then introduce ANI, which allows characterizations of various forms of noninterfer-
ence, and formulate a general theory of ANI preservation via CC∼ . We also study how
to deal with cases such as undefined behavior in the target. Finally, we answer the dual
question, i.e., which source NI should be satisfied to guarantee that compiled programs
are noninterfering with respect to target observers.
Intuitively, NI requires that publicly observable outputs do not reveal information
about private inputs. To define this formally, we need a few additions to our setup. We
indicate the (disjoint) input and output projections of a trace t as t◦ and t•, respectively.⁶
Denote with [t]low the equivalence class of a trace t, obtained using a standard low-
equivalence relation that relates low (public) events only if they are equal, and ignores
any difference between private events. Then, NI for source traces can be defined as:
NIS = {πS | ∀s1 s2 ∈ πS . [s◦1 ]low = [s◦2 ]low ⇒ [s•1 ]low = [s•2 ]low } .
That is, source NI comprises the sets of traces that have equivalent low output projec-
tions as long as their low input projections are equivalent.
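For finite sets of finite traces this definition can be checked directly; the following Haskell sketch makes the quantification explicit, with an assumed, illustrative classification of events into levels and input/output kinds.

  data Level = Low | High deriving (Eq, Show)
  data Event = Input Level Int | Output Level Int deriving (Eq, Show)

  type Trace = [Event]

  -- Low projections of the input and output parts of a trace.
  lowIn, lowOut :: Trace -> [Int]
  lowIn  t = [v | Input  Low v <- t]
  lowOut t = [v | Output Low v <- t]

  -- A (finite) property satisfies NI if low-equivalent inputs force
  -- low-equivalent outputs.
  satisfiesNI :: [Trace] -> Bool
  satisfiesNI prop =
    and [ lowOut t1 == lowOut t2
        | t1 <- prop, t2 <- prop, lowIn t1 == lowIn t2 ]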
Trace-Relating Compilation and Noninterference. When additional observations are
possible in the target, it is unclear whether a noninterfering source program is compiled
to a noninterfering target program or not, and if so, whether the notion of NI in the tar-
get is the expected or desired one. We illustrate this issue considering a scenario where
target traces extend source ones by exposing the execution time. While source noninter-
ference NIS requires that private inputs do not affect public outputs, NIT additionally
requires that the execution time is not affected by private inputs.
To model the scenario described, let TraceS denote the set of traces in the source,
and TraceT = TraceS × Nω be the set of target traces, where Nω ≜ N ∪ {ω}. Tar-
get traces have two components: a source trace, and a natural number that denotes
the time spent to produce the trace (ω if infinite). Notice that if two source traces
s1, s2 are low-equivalent, then {s1, s2} ∈ NIS and {(s1, 42), (s2, 42)} ∈ NIT, but
{(s1, 42), (s2, 43)} ∉ NIT and {(s1, 42), (s2, 42), (s1, 43), (s2, 43)} ∉ NIT.
Consider the following straightforward trace relation, which relates a source trace
to any target trace whose first component is equal to it, irrespective of execution time:
s ∼ t ≡ ∃n. t = (s, n).
A compiler is CC∼ if any trace that can be exhibited in the target can be simulated
in the source in some amount of time. For such a compiler Theorem 2.11 says that
if W satisfies NIS , then W↓ satisfies Cl⊆ ◦ τ̃ (NIS ), which however is strictly weaker
than NIT , as it contains, e.g., {(s1 , 42), (s2 , 42), (s1 , 43), (s2 , 43)}, and one cannot
conclude that W↓ is noninterfering in the target. It is easy to prove that
⁶ Here we only require the projections to be disjoint. Depending on the scenario and the attacker
model the projections might record information such as the ordering of events.

Cl⊆ ◦ τ̃ (NIS ) = Cl⊆ ({ πS × Nω | πS ∈ NIS }) = { πS × I | πS ∈ NIS ∧ I ⊆ Nω } ,


the first equality coming from τ̃ (πS ) = πS × Nω , and the second from NIS being
subset-closed. As we will see, this hyperproperty can be characterized as a form of
NI, which one might call timing-insensitive noninterference, and is ensured only against
attackers that cannot measure execution time. For this characterization, and to describe
different forms of noninterference as well as formally analyze their preservation by a
CC∼ compiler, we rely on the general framework of abstract noninterference [21].
Abstract Noninterference. ANI [21] is a generalization of NI whose formulation re-
lies on abstractions (in the abstract interpretation sense [16]) in order to encompass arbi-
trary variants of NI. ANI is parameterized by an observer abstraction ρ, which denotes
the distinguishing power of the attacker, and a selection abstraction φ, which specifies
when to check NI, and therefore captures a form of declassification [54].⁷ Formally:
ANI^ρ_φ = {π | ∀t1 t2 ∈ π. φ(t1◦) = φ(t2◦) ⇒ ρ(t1•) = ρ(t2•)}.
By picking φ = ρ = [·]low , we recover the standard noninterference defined above,
where NI must hold for all low inputs (i.e., no declassification of private inputs), and
the observational power of the attacker is limited to distinguishing low outputs.
The observational power of the attacker can be weakened by choosing a more liberal
relation for ρ. For instance, one may limit the attacker to observe the parity of output
integer values. Another way to weaken ANI is to use φ to specify that noninterference
is only required to hold for a subset of low inputs.
To be formally precise, φ and ρ are defined over sets of (input and output projections
of) traces, so when we write φ(t) above, this should be understood as a convenient
notation for φ({t}). Likewise, φ = [·]low should be understood as φ = λπ. ⋃t∈π [t]low,
i.e., the powerset lifting of [·]low. Additionally, φ and ρ are required to be upper-closed
operators (uco)—i.e., monotonic, idempotent and extensive—on the poset that is the
powerset of (input and output projections of) traces ordered by inclusion [21].
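Abstracting over φ and ρ gives a direct generalization of the NI check sketched earlier; in this Haskell fragment the two abstractions are modelled simply as functions on traces (their uco structure and the powerset lifting are left implicit).

  -- ANI over a finite set of traces: phi plays the role of the selection
  -- abstraction on input projections, rho that of the observer abstraction
  -- on output projections.
  ani :: (Eq a, Eq b) => (t -> a) -> (t -> b) -> [t] -> Bool
  ani phi rho prop =
    and [ rho t1 == rho t2 | t1 <- prop, t2 <- prop, phi t1 == phi t2 ]

Standard NI is recovered by taking phi and rho to be the low input and low output projections; a weaker, parity-observing attacker would instead take rho to return the low outputs modulo 2.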
Trace-Relating Compilation and ANI for Timing. We can now reformulate our ex-
ample with observable execution times in the target in terms of ANI. We have NIS =
ANI^ρS_φS with φS = ρS = [·]low. In this case, we can formally describe the hyperproperty
that a compiled program W↓ satisfies whenever W satisfies NIS as an instance of ANI:

Cl⊆ ◦ τ̃(NIS) = ANI^ρT_φT,
for φT = φS and ρT(πT) = {(s, n) | ∃(s1, n1) ∈ πT. [s•]low = [s1•]low}.
The definition of φT tells us that the trace relation does not affect the selection abstrac-
tion. The definition of ρT characterizes an observer that cannot distinguish execution
times for noninterfering traces (notice that n1 in the definition of ρT is discarded). For
instance, ρT ({(s, n1 )}) = ρT ({(s, n2 )}), for any s, n1 , n2 . Therefore, in this setting,
we know explicitly through ρT that a CC∼ compiler degrades source noninterference
to target timing-insensitive noninterference.
Trace-Relating Compilation and ANI in General. While the particular φT and ρT
above can be discovered by intuition, we want to know whether there is a systematic
way of obtaining them in general. In other words, for any trace relation ∼ and any
⁷ ANI includes a third parameter η, which describes the maximal input variation that the attacker
may control. Here we omit η (i.e., take it to be the identity) in order to simplify the presentation.

notion of source NI, what property is guaranteed on noninterfering source programs by
any CC∼ compiler?
We can now answer this question generally (Theorem 4.1): any source notion of
noninterference expressible as an instance of ANI is mapped to a corresponding in-
stance of ANI in the target, whenever source traces are an abstraction of target ones
(i.e., when ∼ is a total and surjective map). For this result we consider trace relations
that can be split into input and output trace relations (denoted ∼ = ⟨∼◦, ∼•⟩) such that
s ∼ t ⟺ s◦ ∼◦ t◦ ∧ s• ∼• t•. The trace relation ∼ corresponds to a Galois connection
τ̃ ⊣ σ̃ between the sets of trace properties, as described in §2.2. Similarly, the pair ∼◦
and ∼• corresponds to a pair of Galois connections, τ̃◦ ⊣ σ̃◦ and τ̃• ⊣ σ̃•, between the
sets of input and output properties. In the timing example, time is an output, so we have
∼ = ⟨=, ∼•⟩ and ∼• is defined as s• ∼• t• ≡ ∃n. t• = (s•, n).

Theorem 4.1 (Compiling ANI). Assume traces of source and target languages are
related via ∼ ⊆ TraceS × TraceT with ∼ = ⟨∼◦, ∼•⟩, such that ∼◦ and ∼• are both total
maps from target to source traces, and ∼ is surjective. Assume ·↓ is a CC∼ compiler,
φS ∈ uco(2^TraceS◦), and ρS ∈ uco(2^TraceS•).
If W satisfies ANI^ρS_φS, then W↓ satisfies ANI^ρ#T_φ#T, where φ#T and ρ#T are defined as:

φ#T = g◦ ◦ φS ◦ f◦;   ρ#T = g• ◦ ρS ◦ f•;
f◦(πT◦) = {s◦ | ∃t◦ ∈ πT◦. s◦ ∼◦ t◦};   g◦(πS◦) = {t◦ | ∀s◦. s◦ ∼◦ t◦ ⇒ s◦ ∈ πS◦}

(and both f• and g• are defined analogously).
For the example above we recover the definitions we justified intuitively, i.e., φ#T =
g◦ ◦ φS ◦ f◦ = φT and ρ#T = g• ◦ ρS ◦ f• = ρT. Moreover, we can prove that if ∼• is also
surjective, then ANI^ρ#T_φ#T ⊆ Cl⊆ ◦ τ̃(ANI^ρS_φS). Therefore, the derived guarantee ANI^ρ#T_φ#T is
at least as strong as the one that follows by just knowing that the compiler ·↓ is CC∼.
Noninterference and Undefined Behavior. As stated above, Theorem 4.1 does not
apply to several scenarios from §3 such as undefined behavior (§3.1), as in those cases
the relation ∼ is not a total map. Nevertheless, we can still exploit our framework to
reason about the impact of compilation on noninterference.
Let us consider ∼ = ⟨∼◦, ∼•⟩, where ∼◦ is any total and surjective map from target to
source inputs (e.g., equality) and ∼• is defined as s• ∼• t• ≡ s• = t• ∨ ∃m• ≤ t•. s• =
m• · Goes_wrong. Intuitively, a CC∼ compiler guarantees that no interference can be
observed by a target attacker that cannot exploit undefined behavior to learn private
information. This intuition can be made formal by the following theorem.
Theorem 4.2 (Relaxed Compiling ANI). Relax the assumptions of Theorem 4.1 by
allowing ∼• to be any output trace relation. If W satisfies ANI^ρS_φS, then W↓ satisfies
ANI^ρ#T_φ#T, where φ#T is defined as in Theorem 4.1, and ρ#T is such that:

∀s t. s• ∼• t• ⇒ ρ#T(t•) = ρ#T(τ̃•(ρS(s•))).
Technically, instead of giving us a definition of ρ#T, the theorem gives a property of it.
The property states that, given a target output trace t•, the attacker cannot distinguish it
from any other target output traces produced by other possible compilations (τ̃•) of the
source trace s it relates to, up to the observational power of the source-level attacker ρS.
Therefore, given a source attacker ρS, the theorem characterizes a family of attackers
that cannot observe any interference for a correctly compiled noninterfering program.
Notice that the target attacker ρ#T = λ_. ⊤ satisfies the premise of the theorem, but
defines a trivial hyperproperty, so that we cannot prove in general that ANI^ρ#T_φ#T ⊆ Cl⊆ ◦
τ̃(ANI^ρS_φS). The same ρ#T = λ_. ⊤ shows that the family of attackers described in
Theorem 4.2 is nonempty, and this ensures the existence of a most powerful attacker
among them [21], whose explicit characterization we leave for future work.
From Target NI to Source NI. We now explore the dual question: under what hy-
potheses does trace-relating compiler correctness alone allow target noninterference to
be reduced to source noninterference? This is of practical interest, as one would be able
to protect from target attackers by ensuring noninterference in the source. This task can
be made easier if the source language has some static enforcement mechanism [1, 36].
Let us consider the languages from §3.4 extended with inputting of (pairs of) values.
It is easy to show that the compiler described in §3.4 is still CC∼ . Assume that we want
to satisfy a given notion of target noninterference after compilation, i.e., W↓ |= ANI^ρT_φT.
Recall that the observational power of the target attacker, ρT , is expressed as a property
of sequences of values. To express the same property (or attacker) in the source, we
have to abstract the way pairs of values are nested. For instance, the source attacker
should not distinguish ⟨v1, ⟨v2, v3⟩⟩ and ⟨⟨v1, v2⟩, v3⟩. In general (i.e., when ∼ is not
the identity), this argument is valid only when φT can be represented in the source.
More precisely, φT must consider as equivalent all target inputs that are related to the
same source one, because in the source it is not possible to have a finer distinction of
inputs. This intuitive correspondence can be formalized as follows:
Theorem 4.3 (Target ANI by source ANI). Let φT ∈ uco(2^TraceT◦), ρT ∈ uco(2^TraceT•),
let ∼• be a total and surjective map from source outputs to target ones, and assume that
∀s t. s◦ ∼◦ t◦ ⇒ φT(t◦) = φT(τ̃◦(s◦)).
If ·↓ is a CC∼ compiler and W satisfies ANI^ρ#S_φ#S, then W↓ satisfies ANI^ρT_φT for

φ#S = σ̃◦ ◦ φT ◦ τ̃◦;   ρ#S = σ̃• ◦ ρT ◦ τ̃•.

To wrap up the discussion about noninterference, the results presented in this section
formalize and generalize some intuitive facts about compiler correctness and noninter-
ference. Of course, they all place some restrictions on the shape of the noninterference
instances that can be considered, because compiler correctness alone is in general not a
strong enough criterion for dealing with many security properties [6, 17].

5 Trace-Relating Secure Compilation


So far we have studied compiler correctness criteria for whole, standalone programs.
However, in practice, programs do not exist in isolation, but in a context where they in-
teract with other programs, libraries, etc. In many cases, this context cannot be assumed
to be benign and could instead behave maliciously to try to disrupt a compiled program.
Hence, in this section we consider the following secure compilation scenario: a
source program is compiled and linked with an arbitrary target-level context, i.e., one
that may not be expressible as the compilation of a source context. Compiler correctness
does not address this case, as it does not consider arbitrary target contexts, looking
instead at whole programs (empty context [33]) or well-behaved target contexts that
behave like source ones (as in compositional compiler correctness [27, 30, 45, 57]).
To account for this scenario, Abate et al. [2] describe several secure compilation
criteria based on the preservation of classes of (hyper)properties (e.g., trace properties,
safety, hypersafety, hyperproperties, etc.) against arbitrary target contexts. For each of
these criteria, they give an equivalent “property-free” criterion, analogous to the equiv-
alence between TP and CC= . For instance, their robust trace property preservation cri-
terion (RTP) states that, for any trace property π, if a source partial program P plugged
into any context CS satisfies π, then the compiled program P↓ plugged into any target
context CT satisfies π. Their equivalent criterion to RTP is RTC, which states that for
any trace produced by the compiled program, when linked with any target context, there
is a source context that produces the same trace. Formally (writing C [P ] to mean the
whole program that results from linking partial program P with context C) they define:
RTP ≡ ∀P. ∀π. (∀CS. ∀t. CS[P] ⇝ t ⇒ t ∈ π) ⇒ (∀CT. ∀t. CT[P↓] ⇝ t ⇒ t ∈ π);
RTC ≡ ∀P. ∀CT. ∀t. CT[P↓] ⇝ t ⇒ ∃CS. CS[P] ⇝ t.
In the following we adopt the notation P |=R π to mean “P robustly satisfies π,” i.e., P
satisfies π irrespective of the contexts it is linked with. Thus, we write more compactly:
RTP ≡ ∀π. ∀P. P |=R π ⇒ P↓ |=R π.
All the criteria of Abate et al. [2] share this flavor of stating the existence of some
source context that simulates the behavior of any given target context, with some varia-
tions depending on the class of (hyper)properties under consideration. All these criteria
are stated in a setting where source and target traces are the same. In this section, we ex-
tend their result to our trace-relating setting, obtaining trinitarian views for secure com-
pilation. Despite the similarities with §2, more challenges show up, in particular when
considering the robust preservation of proper sub-classes of trace properties. For exam-
ple, after application of σ̃ or τ̃ , a property may not be safety anymore, a crucial point for
the equivalence with the property-free criterion for safety properties by Abate et al. [2].
We solve this by interpreting the class of safety properties as an abstraction of the class
of all trace properties induced by a closure operator (§5.1). The remaining subsections
provide example compilation chains satisfying our trace-relating secure compilation
criteria for trace properties (§5.2) and for safety and hypersafety properties (§5.3).

5.1 Trace-Relating Secure Compilation: A Spectrum of Trinities


In this subsection we generalize many of the criteria of Abate et al. [2] using the ideas
of §2. Before discussing how we solve the challenges for classes such as safety and
hypersafety, we show the simple generalization of RTC to the trace-relating setting
(RTC∼ ) and its corresponding trinitarian view (Theorem 5.1):

Theorem 5.1 (Trinity for Robust Trace Properties). For any trace relation ∼ and
induced property mappings τ̃ and σ̃, we have: RTPτ̃ ⟺ RTC∼ ⟺ RTPσ̃, where
RTC∼ ≡ ∀P ∀CT ∀t. CT[P↓] ⇝ t ⇒ ∃CS ∃s ∼ t. CS[P] ⇝ s;
RTPτ̃ ≡ ∀P ∀πS ∈ 2^TraceS. P |=R πS ⇒ P↓ |=R τ̃(πS);
RTPσ̃ ≡ ∀P ∀πT ∈ 2^TraceT. P |=R σ̃(πT) ⇒ P↓ |=R πT.


Abate et al. [2] propose many more equivalent pairs of criteria, each preserving different
classes of (hyper)properties, which we briefly recap now. For trace properties, they also
have criteria that preserve safety properties plus their version of liveness properties. For
hyperproperties, they have criteria that preserve hypersafety properties, subset-closed
hyperproperties, and arbitrary hyperproperties. Finally, they define relational hyper-
properties, which are relations between the behaviors of multiple programs for express-
ing, e.g., that a program always runs faster than another. For relational hyperproperties,
they have criteria that preserve arbitrary relational properties, relational safety proper-
ties, relational hyperproperties and relational subset-closed hyperproperties. Roughly
speaking, the security guarantees due to robust preservation of trace properties regard
only protecting the integrity of the program from the context, the guarantees of hyper-
properties also regard data confidentiality, and the guarantees of relational hyperprop-
erties even regard code confidentiality. Naturally, these stronger guarantees are increas-
ingly harder to enforce and prove.
While we have lifted the most significant criteria from Abate et al. [2] to our trini-
tarian view, due to space constraints we provide the formal definitions only for the two
most interesting criteria. We summarize the generalizations of many other criteria in
Figure 2, described at the end. Omitted definitions are available in the online appendix.
Beyond Trace Properties: Robust Safety and Hyperproperty Preservation. We
detail robust preservation of safety properties and of arbitrary hyperproperties since they
are both relevant from a security point of view and their generalization is interesting.
Theorem 5.2 (Trinity for Robust Safety Properties). For any trace relation ∼
and for the induced property mappings τ̃ and σ̃, we have:
RTPSafe◦τ̃ ⟺ RSC∼ ⟺ RSPσ̃, where
RSC∼ ≡ ∀P ∀CT ∀t ∀m ≤ t. CT[P↓] ⇝ t ⇒ ∃CS ∃t′ ≥ m ∃s ∼ t′. CS[P] ⇝ s;
RTPSafe◦τ̃ ≡ ∀P ∀πS ∈ 2^TraceS. P |=R πS ⇒ P↓ |=R (Safe ◦ τ̃)(πS);
RSPσ̃ ≡ ∀P ∀πT ∈ SafetyT. P |=R σ̃(πT) ⇒ P↓ |=R πT.
There is an interesting asymmetry between the last two characterizations above, which
we explain now in more detail. RSPσ̃ quantifies over target safety properties, while
RTPSafe◦τ̃ quantifies over arbitrary source properties, but imposes the composition of
τ̃ with Safe, which maps an arbitrary target property πT to the target safety property
that best over-approximates πT⁸ (an analogous closure was needed for subset-closed
hyperproperties in Theorem 2.11). More precisely, Safe is a closure operator on target
properties, with SafetyT = {Safe(πT) | πT ∈ 2^TraceT}. The mappings

Safe ◦ τ̃ : 2^TraceS ⇆ SafetyT : σ̃

determine a Galois connection between source trace properties and target safety prop-
erties, and ensure the equivalence RTPSafe◦τ̃ ⟺ RSPσ̃. This argument gen-
eralizes to arbitrary closure operators on target properties and on hyperproperties,
as long as the corresponding class is a sub-class of subset-closed hyperproperties, and
explains all but one of the asymmetries in Figure 2, the one that concerns the robust
preservation of arbitrary hyperproperties:

⁸ Safe(πT) = ∩ {ST | πT ⊆ ST ∧ ST ∈ SafetyT} is the topological closure in the topol-
ogy of Clarkson and Schneider [14], where safety properties coincide with the closed sets.

Theorem 5.3 (Weak Trinity for Robust Hyperproperties). For a trace relation
∼ ⊆ TraceS × TraceT and induced property mappings σ̃ and τ̃, RHC∼ is equivalent
to RHPτ̃; moreover, if τ̃ ⊣ σ̃ is a Galois insertion (i.e., τ̃ ◦ σ̃ = id), RHC∼ implies
RHPσ̃, while if σ̃ ⊣ τ̃ is a Galois reflection (i.e., σ̃ ◦ τ̃ = id), RHPσ̃ implies RHC∼,
where RHC∼ ≡ ∀P ∀CT ∃CS ∀t. CT[P↓] ⇝ t ⟺ (∃s ∼ t. CS[P] ⇝ s);
RHPτ̃ ≡ ∀P ∀HS. P |=R HS ⇒ P↓ |=R τ̃(HS);
RHPσ̃ ≡ ∀P ∀HT. P |=R σ̃(HT) ⇒ P↓ |=R HT.

This trinity is weak since extra hypotheses are needed to prove some implications.
While the equivalence RHC∼ ⟺ RHPτ̃ holds unconditionally, the other two im-
plications hold only under distinct, stronger assumptions. For RHPσ̃ it is still possible
and correct to deduce a source obligation for a given target hyperproperty HT when no
information is lost in the composition τ̃ ◦ σ̃ (i.e., the two maps form a Galois inser-
tion). On the other hand, RHPτ̃ is a consequence of RHPσ̃ when no information is lost
in composing in the other direction, σ̃ ◦ τ̃ (i.e., the two maps form a Galois reflection).
Navigating the Diagram. For a given trace relation ∼, Figure 2 orders the generalized
criteria according to their relative strength. If a trinity implies another (denoted by ⇒),
then the former provides stronger security for a compilation chain than the latter.
As mentioned, some property-full criteria regarding proper subclasses (i.e., subset-
closed hyperproperties, safety, hypersafety, 2-relational safety and 2-relational hyper-
properties) quantify over arbitrary (relational) (hyper)properties and compose τ̃ with
an additional operator. We have already presented the Safe operator; other operators
are Cl⊆ , HSafe, and 2rSafe, which approximate the image of τ̃ with a subset-closed
hyperproperty, a hypersafety and 2-relational safety respectively.
As a reading aid, Figure 2 uses a shaded blue background when quantifying over
arbitrary trace properties, red when quantifying over arbitrary subset-closed hyper-
properties, and green for arbitrary 2-relational properties.
We now describe how to interpret the acronyms in Figure 2. All criteria start with R
meaning they refer to robust preservation. Criteria for relational hyperproperties—here
only arity 2 is shown—contain 2r. Next, criteria names spell the class of hyperproperties
they preserve: H for hyperproperties, SCH for subset-closed hyperproperties, HS for
hypersafety, T for trace properties, and S for safety properties. Finally, property-free
criteria end with a C while property-full ones involving σ̃ and τ̃ end with P. Thus,
robust (R) subset-closed hyperproperty-preserving (SCH) compilation (C) is RSCHC∼ ,
robust (R) two-relational (2r) safety-preserving (S) compilation (C) is R2rSC∼ , etc.

5.2 Instance of Trace-Relating Robust Preservation of Trace Properties


This subsection illustrates trace-relating secure compilation when the target language
has strictly more events than the source that target contexts can exploit to break security.
Source and Target Languages. The source and target languages used here are nearly
identical expression languages, borrowing from the syntax of the source language of
§3.3. Both languages add sequencing of expressions, two kinds of output events, and

[Fig. 2: Hierarchy of trinitarian views of secure compilation criteria preserving classes of hyperproperties, ordered by implication from robust 2-relational subset-closed hyperproperty preservation (R2rSCHC∼) at the top down to robust trace and safety property preservation (RTC∼, RSC∼) at the bottom, together with the key to read each acronym: R robust; 2r 2-relational; H hyperproperties; SCH subset-closed hyperproperties; HS hypersafety; T trace properties; S safety properties; P property-full criterion based on σ̃ and τ̃; C property-free criterion. Shorthands ‘Ins.’ and ‘Refl.’ stand for Galois Insertion and Reflection; the original figure also marks the trinities proven in Coq.]

the expressions that generate them: outS n, usable in both source and target, and
outT n, usable only in the target, which is the only difference between
source and target. The extra events in the target model the fact that the target language
has an increased ability to perform certain operations, some of them potentially dan-
gerous (such as writing to the hard drive), which cannot be performed by the source
language, and against which source-level reasoning can therefore offer no protection.
Both languages and compilation chains now deal with partial programs, contexts
and linking of those two to produce whole programs. In this setting, a whole program
is the combination of a main expression to be evaluated and a set of function definitions
(with distinct names) that can refer to their argument symbolically and can be called by
the main expression and by other functions. The set of functions of a whole program
is the union of the functions of a partial program and a context; the latter also contains
the main expression. The extensions of the typing rules and the operational semantics
for whole programs are unsurprising and therefore elided. The trace model also follows
closely that of §3.3: it consists of a list of regular events (including the new outputs)
terminated by a result event. Finally, a partial program and a context can be linked into
a whole program when their functions satisfy the requirements mentioned above.
Relating Traces. In the present model, source and target traces differ only in the fact
that the target draws (regular) events from a strictly larger set than the source, i.e.,
ΣT ⊃ ΣS . A natural relation between source and target traces essentially maps to a
given target trace t the source trace that erases from t those events that exist only at the
target level. Let t|ΣS indicate trace t filtered to retain only those elements included in

alphabet ΣS . We define the trace relation as:


s ∼ t ≡ s = t|ΣS .
In the opposite direction, a source trace s is related to many target ones, as any target-
only events can be inserted at any point in s. The induced mappings for ∼ are:
τ̃ (πS ) = {t | ∃s. s = t|ΣS ∧ s ∈ πS } ; σ̃(πT ) = {s | ∀t. s = t|ΣS ⇒ t ∈ πT } .
That is, the target guarantee of a source property is that the target has the same
source-level behavior, sprinkled with arbitrary target-level behavior. Conversely, the
source-level obligation of a target property is the aggregate of those source traces all of
whose target-level enrichments are in the target property.
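Over finite traces, both the relation and its intuition amount to filtering; a minimal Haskell sketch, with an illustrative two-constructor alphabet standing in for ΣS and ΣT:

  -- OutS events form the source alphabet ΣS; OutT events exist only in the
  -- strictly larger target alphabet ΣT.
  data Event = OutS Int | OutT Int deriving (Eq, Show)

  inSigmaS :: Event -> Bool
  inSigmaS (OutS _) = True
  inSigmaS (OutT _) = False

  -- s ~ t iff s is t with the target-only events erased.
  rel :: [Event] -> [Event] -> Bool
  rel s t = s == filter inSigmaS t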
Since RS and RT are very similar, it is simple to prove that the identity compiler
(·↓) from RS to RT is secure according to the trace relation ∼ defined above.
Theorem 5.4 (·↓ is Secure). ·↓ is RTC∼.
5.3 Instances of Trace-Relating Robust Preservation of Safety and Hypersafety
To provide examples of cross-language trace-relations that preserve safety and hyper-
safety properties, we show how existing secure compilation results can be interpreted in
our framework. This indicates how the more general theory developed here can already
be instantiated to encompass existing results, and that existing proof techniques can be
used in order to achieve the secure compilation criteria we define.
For the preservation of safety, Patrignani and Garg [50] study a compiler from a
typed, concurrent WHILE language to an untyped, concurrent WHILE language with
support for memory capabilities. As in §3.3, their source has bools and nats while
their target only has nats. Additionally, their source has an ML-like memory (where
the domain is locations ) while their target has an assembly-like memory (where the
domain is natural numbers n). Their traces consider context-program interactions and
as such they are concatenations of call and return actions with parameters, which can
include booleans as well as locations. Because of the aforementioned differences, they
need a cross-language relation to relate source and target actions.
Besides defining a relation on traces (i.e., an instance of ∼), they also define a
relation between source and target safety properties. They provide an instantiation of τ
that maps all safe source traces to the related target ones. This ensures that no additional
target trace is introduced in the target property, and source safety properties are mapped
to target safety ones by τ . Their compiler is then proven to generate code that respects
τ , so they achieve a variation of RTPSafe◦τ̃ .
Concerning the preservation of hypersafety, Patrignani and Garg [49] consider com-
pilers in a reactive setting where traces are sequences of input (α?) and output (α!) ac-
tions. In their setting, traces are different between source and target, so they define a
cross-language relation on actions that is total on the source actions and injective. Ad-
ditionally, their set of target output actions is strictly larger than the source one, as it
includes a special action √, which is how compiled code must respond to invalid target
inputs (i.e., receiving a bool when a nat was expected). Starting from the relation on
actions, they define TPC, which is an instance of what we call τ. Informally, given a set
of source traces, TPC generates all target traces that are related (pointwise) to a source
trace. Additionally, it generates all traces with interleavings of undesired inputs α? fol-
lowed by √, as long as removing α?√ leaves a trace that relates to the source trace.

TPC preserves hypersafety across languages, i.e., it is an instance of RSCHPHSafe◦τ̃,
mapping source hypersafety to target hypersafety (and safety to safety).

6 Related Work
We already discussed how our results relate to some existing work in correct compila-
tion [33, 58] and secure compilation [2, 49, 50]. We also already mentioned that most
of our definitions and results make no assumptions about the structure of traces. One
result that relies on the structure of traces is Theorem 5.2, which involves some finite
prefix m, suggesting traces should be some sort of sequences of events (or states), as
customary when one wants to refer to safety properties [14]. It is however sufficient
to fix a topology on properties where safety properties coincide with closed sets [46].
Even for reasoning about safety, hypersafety, or arbitrary hyperproperties, traces can
therefore be values, sequences of program states, or of input output events, or even the
recently proposed interaction trees [62]. In the latter case we believe that the compila-
tion from IMP to ASM proposed by Xia et al. [62] can be seen as an instance of HC∼ ,
for the relation they call “trace equivalence.”
Compilers Where Our Work Could Be Useful. Our work should be broadly applica-
ble to understanding the guarantees provided by many verified compilers. For instance,
Wang et al. [61] recently proposed a CompCert variant that compiles all the way down
to machine code, and it would be interesting to see if the model at the end of §3.1 applies
there too. This and many other verified compilers [12, 29, 42, 56] beyond CakeML [58]
deal with resource exhaustion and it would be interesting to also apply the ideas of §3.2
to them. Hur and Dreyer [27] devised a correct compiler from an ML language to as-
sembly using a cross-language logical relation to state their CC theorem. They do not
have traces, though were one to add them, the logical relation on values would serve as
the basis for the trace relation and therefore their result would attain CC∼ .
Switching to more informative traces capturing the interaction between the program
and the context is often used as a proof technique for secure compilation [2, 28, 48].
Most of these results consider a cross-language relation, so they probably could be
proved to attain one of the criteria from Figure 2.
Generalizations of Compiler Correctness. The compiler correctness definition of
Morris [41] was already general enough to account for trace relations, since it consid-
ered a translation between the semantics of the source program and that of the compiled
program, which he called “decode” in his diagram, reproduced in Figure 3 (left). And
even some of the more recent compiler correctness definitions preserve this kind of flex-
ibility [51]. While CC∼ can be seen as an instance of a definition by Morris [41], we are
not aware of any prior work that investigated the preservation of properties when the
“decode translation” is neither the identity nor a bijection, and source properties need
to be re-interpreted as target ones and vice versa.
Correct Compilation and Galois Connections. Melton et al. [38] and Sabry and
Wadler [55] expressed a strong variant of compiler correctness using the diagram of
Figure 3 (right) [38, 55]. They require that compiled programs parallel the computation
steps of the original source programs, which can be proven showing the existence of a
decompilation map # that makes the diagram commute, or equivalently, the existence
of an adjoint for ·↓ (W ≤ W′# ⟺ W↓ ≤ W′, for the orderings ≤ of both source and target). The

[Fig. 3: Morris’s [41] compiler correctness diagram (left): compile maps the source language to the target language; source semantics and target semantics map each language to its meanings; decode maps target meanings back to source meanings. Melton et al.’s [38] and Sabry and Wadler’s [55] diagram (right): a source computation step from W to Z# is paralleled by a target step from W↓ to Z, with # the decompilation map.]

“parallel” intuition can be formalized as an instance of CC∼ . Take source and target
traces to be finite or infinite sequences of program states (maximal trace semantics
[15]), and relate them exactly like Melton et al. [38] and Sabry and Wadler [55].
Translation Validation. Translation validation is an important alternative to proving
that all runs of a compiler are correct. A variant of CC∼ for translation validation can
simply be obtained by specializing the definition to a particular W, and one can obtain
again the same trinitarian view. Similarly for our other criteria, including our extensions
of the secure compilation criteria of Abate et al. [2], which Busi et al. [10] seem to
already be considering in the context of translation validation.

7 Conclusion and Future Work


We have extended the property preservation view on compiler correctness to arbitrary
trace relations, and believe that this will be useful for understanding the guarantees var-
ious compilers provide. An open question is whether, given a compiler, there exists a
most precise ∼ relation for which this compiler is correct. As mentioned in §1, every
compiler is CC∼ for some ∼, but under which conditions is there a most precise rela-
tion? In practice, more precision may not always be better though, as it may be at odds
with compiler efficiency and may not align with more subjective notions of usefulness,
leading to tradeoffs in the selection of suitable relations. Finally, another interesting
direction for future work is studying whether using the relation to Galois connections
allows to more easily compose trace relations for different purposes, say, for a compiler
whose target language has undefined behavior, resource exhaustion, and side-channels.
In particular, are there ways to obtain complex relations by combining simpler ones in
a way that eases the compiler verification burden?

Acknowledgements. We thank Akram El-Korashy and Amin Timany for participating


in an early discussion about this work and the anonymous reviewers for their valuable
feedback. This work was in part supported by the European Research Council under
ERC Starting Grant SECOMP (715753), by the German Federal Ministry of Education
and Research (BMBF) through funding for the CISPA-Stanford Center for Cybersecu-
rity (FKZ: 13N1S0762), by DARPA grant SSITH/HOPE (FA8650-15-C-7558) and by
UAIC internal grant 07/2018.

Bibliography

[1] M. Abadi, A. Banerjee, N. Heintze, and J. G. Riecke. A core calculus of dependency.
POPL, 1999.
[2] C. Abate, R. Blanco, D. Garg, C. Hriţcu, M. Patrignani, and J. Thibault. Journey beyond
full abstraction: Exploring robust property preservation for secure compilation. CSF, 2019.
[3] A. Ahmed, D. Garg, C. Hriţcu, and F. Piessens. Secure compilation (Dagstuhl Seminar
18201). Dagstuhl Reports, 8(5), 2018.
[4] A. Anand, A. Appel, G. Morrisett, Z. Paraskevopoulou, R. Pollack, O. S. Belanger,
M. Sozeau, and M. Weaver. CertiCoq: A verified compiler for Coq. CoqPL Workshop,
2017.
[5] K. Backhouse and R. Backhouse. Safety of abstract interpretations for free, via logical
relations and Galois connections. Science of Computer Programming, 51(1-2), 2004.
[6] G. Barthe, B. Grégoire, and V. Laporte. Secure compilation of side-channel countermea-
sures: the case of cryptographic “constant-time”. CSF, 2018.
[7] L. Beringer, G. Stewart, R. Dockins, and A. W. Appel. Verified compilation for shared-
memory C. ESOP, 2014.
[8] F. Besson, S. Blazy, and P. Wilke. A verified CompCert front-end for a memory model
supporting pointer arithmetic and uninitialised data. Journal of Automated Reasoning, 62
(4), 2019.
[9] S. Boldo, J. Jourdan, X. Leroy, and G. Melquiond. Verified compilation of floating-point
computations. Journal of Automated Reasoning, 54(2), 2015.
[10] M. Busi, P. Degano, and L. Galletta. Translation validation for security properties. CoRR,
abs/1901.05082, 2019.
[11] Q. Cao, L. Beringer, S. Gruetter, J. Dodds, and A. W. Appel. VST-Floyd: A separation logic
tool to verify correctness of C programs. Journal of Automated Reasoning, 61(1-4), 2018.
[12] Q. Carbonneaux, J. Hoffmann, T. Ramananandro, and Z. Shao. End-to-end verification of
stack-space bounds for C programs. PLDI, 2014.
[13] C. Cimpanu. Microsoft: 70 percent of all security bugs are memory safety issues. ZDNet,
2019.
[14] M. R. Clarkson and F. B. Schneider. Hyperproperties. JCS, 18(6), 2010.
[15] P. Cousot. Constructive design of a hierarchy of semantics of a transition system by abstract
interpretation. TCS, 277(1-2), 2002.
[16] P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static analysis
of programs by construction or approximation of fixpoints. POPL, 1977.
[17] V. D’Silva, M. Payer, and D. X. Song. The correctness-security gap in compiler optimiza-
tion. S&P Workshops, 2015.
[18] J. Engelfriet. Determinacy implies (observation equivalence = trace equivalence). TCS, 36,
1985.
[19] R. Focardi and R. Gorrieri. A taxonomy of security properties for process algebras. JCS, 3
(1), 1995.
[20] P. H. Gardiner, C. E. Martin, and O. De Moor. An algebraic construction of predicate
transformers. Science of Computer Programming, 22(1-2), 1994.
[21] R. Giacobazzi and I. Mastroeni. Abstract non-interference: a unifying framework for weak-
ening information-flow. ACM Transactions on Privacy and Security, 21(2), 2018.
[22] J. A. Goguen and J. Meseguer. Security policies and security models. S&P, 1982.
[23] R. Gu, Z. Shao, J. Kim, X. N. Wu, J. Koenig, V. Sjöberg, H. Chen, D. Costanzo, and T. Ra-
mananandro. Certified concurrent abstraction layers. PLDI, 2018.

[24] I. Haller, Y. Jeon, H. Peng, M. Payer, C. Giuffrida, H. Bos, and E. van der Kouwe. TypeSan:
Practical type confusion detection. CCS, 2016.
[25] Heartbleed. The Heartbleed bug. http://heartbleed.com/, 2014.
[26] C. Hriţcu, D. Chisnall, D. Garg, and M. Payer. Secure compilation. SIGPLAN PL Perspec-
tives Blog, 2019.
[27] C. Hur and D. Dreyer. A Kripke logical relation between ML and assembly. POPL, 2011.
[28] A. Jeffrey and J. Rathke. Java Jr: Fully abstract trace semantics for a core Java language.
ESOP, 2005.
[29] J. Kang, C. Hur, W. Mansky, D. Garbuzov, S. Zdancewic, and V. Vafeiadis. A formal C
memory model supporting integer-pointer casts. PLDI, 2015.
[30] J. Kang, Y. Kim, C.-K. Hur, D. Dreyer, and V. Vafeiadis. Lightweight verification of sepa-
rate compilation. POPL, 2016.
[31] L. Lamport and F. B. Schneider. Formal foundation for specification and verification. In
Distributed Systems: Methods and Tools for Specification, An Advanced Course, 1984.
[32] C. Lattner. What every C programmer should know about undefined behavior #1/3. LLVM
Project Blog, 2011.
[33] X. Leroy. Formal verification of a realistic compiler. CACM, 52(7), 2009.
[34] X. Leroy. A formally verified compiler back-end. JAR, 43(4), 2009.
[35] X. Leroy. The formal verification of compilers (DeepSpec Summer School 2017), 2017.
[36] I. Mastroeni and M. Pasqua. Verifying bounded subset-closed hyperproperties. SAS, 2018.
[37] J. McCarthy and J. Painter. Correctness of a compiler for arithmetic expressions. Mathematical
Aspects of Computer Science 1, volume 19 of Proceedings of Symposia in Applied Mathematics,
1967.
[38] A. Melton, D. A. Schmidt, and G. E. Strecker. Galois connections and computer science
applications. In Proceedings of a Tutorial and Workshop on Category Theory and Computer
Programming, 1986.
[39] R. Milner. A Calculus of Communicating Systems. Springer-Verlag, Berlin, Heidelberg,
1982.
[40] R. Milner and R. Weyhrauch. Proving compiler correctness in a mechanized logic. In Pro-
ceedings of 7th Annual Machine Intelligence Workshop, volume 7 of Machine Intelligence,
1972.
[41] F. L. Morris. Advice on structuring compilers and proving them correct. POPL, 1973.
[42] E. Mullen, D. Zuniga, Z. Tatlock, and D. Grossman. Verified peephole optimizations for
CompCert. PLDI, 2016.
[43] D. A. Naumann. A categorical model for higher order imperative programming. Mathe-
matical Structures in Computer Science, 8(4), 1998.
[44] D. A. Naumann and M. Ngo. Whither specifications as programs. In International Sympo-
sium on Unifying Theories of Programming. Springer, 2019.
[45] G. Neis, C. Hur, J. Kaiser, C. McLaughlin, D. Dreyer, and V. Vafeiadis. Pilsner: a compo-
sitionally verified compiler for a higher-order imperative language. ICFP, 2015.
[46] M. Pasqua and I. Mastroeni. On topologies for (hyper)properties. CEUR, 2017.
[47] M. Patrignani. Why should anyone use colours? or, syntax highlighting beyond code snip-
pets, 2020.
[48] M. Patrignani and D. Clarke. Fully abstract trace semantics for protected module architec-
tures. Computer Languages, Systems & Structures, 42, 2015.
[49] M. Patrignani and D. Garg. Secure compilation and hyperproperty preservation. CSF, 2017.
[50] M. Patrignani and D. Garg. Robustly safe compilation. ESOP, 2019.
[51] D. Patterson and A. Ahmed. The next 700 compiler correctness theorems (functional pearl).
PACMPL, 3(ICFP), 2019.
[52] T. Ramananandro, Z. Shao, S. Weng, J. Koenig, and Y. Fu. A compositional semantics for
verified separate compilation and linking. CPP, 2015.

[53] J. Regehr. A guide to undefined behavior in C and C++, part 3. Embedded in Academia
blog, 2010.
[54] A. Sabelfeld and D. Sands. Dimensions and principles of declassification. CSFW, 2005.
[55] A. Sabry and P. Wadler. A reflection on call-by-value. ACM Transactions on Programming
Languages and Systems, 19(6), 1997.
[56] J. Sevcík, V. Vafeiadis, F. Z. Nardelli, S. Jagannathan, and P. Sewell. CompCertTSO: A
verified compiler for relaxed-memory concurrency. J. ACM, 60(3), 2013.
[57] G. Stewart, L. Beringer, S. Cuellar, and A. W. Appel. Compositional CompCert. POPL,
2015.
[58] Y. K. Tan, M. O. Myreen, R. Kumar, A. Fox, S. Owens, and M. Norrish. The verified
CakeML compiler backend. Journal of Functional Programming, 29, 2019.
[59] X. Wang, H. Chen, A. Cheung, Z. Jia, N. Zeldovich, and M. F. Kaashoek. Undefined
behavior: What happened to my code? APSYS, 2012.
[60] X. Wang, N. Zeldovich, M. F. Kaashoek, and A. Solar-Lezama. Towards optimization-safe
systems: Analyzing the impact of undefined behavior. SOSP, 2013.
[61] Y. Wang, P. Wilke, and Z. Shao. An abstract stack based approach to verified compositional
compilation to machine code. PACMPL, 3(POPL), 2019.
[62] L. Xia, Y. Zakowski, P. He, C. Hur, G. Malecha, B. C. Pierce, and S. Zdancewic. Interaction
trees: representing recursive and impure programs in Coq. PACMPL, 4(POPL), 2020.
[63] A. Zakinthinos and E. S. Lee. A general theory of security properties. S&P, 1997.
[64] J. Zhao, S. Nagarakatte, M. M. K. Martin, and S. Zdancewic. Formalizing the LLVM
intermediate representation for verified program transformations. POPL, 2012.

Runners in action

Danel Ahman and Andrej Bauer

Faculty of Mathematics and Physics


University of Ljubljana, Slovenia

Abstract. Runners of algebraic effects, also known as comodels, pro-
vide a mathematical model of resource management. We show that they
also give rise to a programming concept that models top-level external
resources, as well as allows programmers to modularly define their own
intermediate “virtual machines”. We capture the core ideas of program-
ming with runners in an equational calculus λcoop , which we equip with
a sound and coherent denotational semantics that guarantees the lin-
ear use of resources and execution of finalisation code. We accompany
λcoop with examples of runners in action, provide a prototype language
implementation in OCaml, as well as a Haskell library based on λcoop .

Keywords: Runners, comodels, algebraic effects, resources, finalisation.

1 Introduction

Computational effects, such as exceptions, input-output, state, nondeterminism,
and randomness, are an important component of general-purpose programming
languages, whether they adopt functional, imperative, object-oriented, or other
programming paradigms. Even pure languages exhibit computational effects at
the top level, so to speak, by interacting with their external environment.
In modern languages, computational effects are often structured using mon-
ads [22,23,36], or algebraic effects and handlers [12,28,30]. These mechanisms
excel at implementation of computational effects within the language itself. For
instance, the familiar implementation of mutable state in terms of state-passing
functions requires no native state, and can be implemented either as a monad or
using handlers. One is naturally drawn to using these techniques also for deal-
ing with actual effects, such as manipulation of native memory and access to
hardware. These are represented inside the language as algebraic operations (as
in Eff [4]) or a monad (in the style of Haskell’s IO), but treated specially by
the language’s top-level runtime, which invokes corresponding operating system
functionality. While this approach works in practice, it has some unfortunate
downsides too, namely lack of modularity and linearity, and excessive generality.
Lack of modularity is caused by having the external resources hard-coded into
the top-level runtime. As a result, changing which resources are available and
how they are implemented requires modifications of the language implementa-
tion. Additional complications arise when a language supports several operating
systems and hardware platforms, each providing their own, different feature set.

One wishes that the ingenuity of the language implementors were better sup-
ported by a more flexible methodology with a sound theoretical footing.
Excessive generality is not as easily discerned, because generality of program-
ming concepts makes a language expressive and useful, such as general algebraic
effects and handlers enabling one to implement timeouts, rollbacks, stream redi-
rection [30], async & await [16], and concurrency [9]. However, the flip side of such
expressive freedom is the lack of any guarantees about how external resources
will actually be used. For instance, consider a simple piece of code, written in
Eff-like syntax, which first opens a file, then writes to it, and finally closes it:
let fh = open "hello.txt" in write (fh, "Hello, world."); close fh

What this program actually does depends on how the operations open, write,
and close are handled. For all we know, an enveloping handler may intercept the
write operation and discard its continuation, so that close never happens and
the file is not properly closed. Telling the programmer not to shoot themselves
in the foot by avoiding such handlers is not helpful, because the handler may
encounter an external reason for not being able to continue, say a full disk.
Even worse, external resources may be misused accidentally when we combine
two handlers, each of which works as intended on its own. For example, if we
combine the above code with a non-deterministic choose operation, as in
let fh = open "greeting.txt" in
let b = choose () in
if b then write (fh, "hello") else write (fh, "good bye") ; close fh

and handle it with the standard non-determinism handler

handler { return x → [x], choose () k → return (append (k true) (k false)) }

The resulting program attempts to close the file twice, as well as write to it twice,
because the continuation k is invoked twice when handling choose. Of course,
with enough care all such situations can be dealt with, but that is beside the
point. It is worth sacrificing some amount of the generality of algebraic effects
and monads in exchange for predictable and safe usage of external computational
effects, so long as the vast majority of common use cases are accommodated.

Contributions We address the described issues by showing how to design a
programming language based on runners of algebraic effects. We review runners
in §2 and recast them as a programming construct in §3. In §4, we present λcoop ,
a calculus that captures the core ideas of programming with runners. We provide
a coherent and sound denotational semantics for λcoop in §5, where we also prove
that well-typed code is properly finalised. In §6, we show examples of runners in
action. The paper is accompanied by a prototype language Coop and a Haskell
library Haskell-Coop, based on λcoop , see §7. The relationship between λcoop
and existing work is addressed in §8, and future possibilities discussed in §9.
The paper is also accompanied by an online appendix (https://arxiv.org/
abs/1910.11629) that provides the typing and equational rules we omit in §4.

Runners are modular in that they can be used not only to model the top-
level interaction with the external environment, but programmers can also use
them to define and nest their own intermediate “virtual machines”. Our runners
are effectful: they may handle operations by calling further outer operations,
and raise exceptions and send signals, through which exceptional conditions and
runtime errors are communicated back to user programs in a safe fashion that
preserves linear usage of external resources and ensures their proper finalisation.
We achieve suitable generality for handling of external resources by showing
how runners provide implementations of algebraic operations together with a
natural notion of finalisation, and a strong guarantee that in the absence of
external kill signals the finalisation code is executed exactly once (Thm. 7). We
argue that for most purposes such discipline is well worth having, and giving up
the arbitrariness of effect handlers is an acceptable price to pay. In fact, as will
be apparent in the denotational semantics, runners are simply a restricted form
of handlers, which apply the continuation at most once in a tail call position.
Runners guarantee linear usage of resources not through a linear or unique-
ness type system (such as in the Clean programming language [15]) or a syntac-
tic discipline governing the application of continuations in handlers, but rather
by a design based on the linear state-passing technique studied by Møgelberg
and Staton [21]. In this approach, a computational resource may be implemented
without restrictions, but is then guaranteed to be used linearly by user code.

2 Algebraic effects, handlers, and runners

We begin with a short overview of the theory of algebraic effects and handlers,
as well as runners. To keep focus on how runners give rise to a programming
concept, we work naively in set theory. Nevertheless, we use category-theoretic
language as appropriate, to make it clear that there are no essential obstacles to
extending our work to other settings (we return to this point in §5.1).

2.1 Algebraic effects and handlers

There is by now no lack of material on the algebraic approach to structuring
computational effects. For an introductory treatment we refer to [5], while of
course also recommending the seminal papers by Plotkin and Power [25,28]. The
brief summary given here only recalls the essentials and introduces notation.
An (algebraic) signature is given by a set Σ of operation symbols, and for each
op ∈ Σ its operation signature op : Aop ⇝ Bop, where Aop and Bop are called the
parameter and arity set. A Σ-structure M is given by a carrier set |M|, and
for each operation symbol op ∈ Σ, a map opM : Aop × (Bop ⇒ |M|) → |M|,
where ⇒ is set exponentiation. The free Σ-structure TreeΣ(X) over a set X is
the set of well-founded trees generated inductively by

– return x ∈ TreeΣ(X), for every x ∈ X, and
– op(a, κ) ∈ TreeΣ(X), for every op ∈ Σ, a ∈ Aop, and κ : Bop → TreeΣ(X).

We are abusing notation in a slight but standard way, by using op both as the
name of an operation and a tree-forming constructor. The elements of TreeΣ(X)
are called computation trees: a leaf return x represents a pure computation re-
turning a value x, while op(a, κ) represents an effectful computation that calls
op with parameter a and continuation κ, which expects a result from Bop.
An algebraic theory T = (ΣT, EqT) is given by a signature ΣT and a set of
equations EqT. The equations EqT express computational behaviour via inter-
actions between operations, and are written in a suitable formalism, e.g., [30].
We explain these by way of examples, as the precise details do not matter for
our purposes. Let 0 = { } be the empty set and 1 = {⋆} the standard singleton.

Example 1. Given a set C of possible states, the theory of C-valued state has
two operations, whose somewhat unusual naming will become clear later on,

    getenv : 1 ⇝ C,    setenv : C ⇝ 1,

and the equations (where we elide appearances of ⋆):

    getenv(λc . setenv(c, κ)) = κ,    setenv(c, getenv κ) = setenv(c, κ c),
    setenv(c, setenv(c′, κ)) = setenv(c′, κ).

For example, the second equation states that reading state right after setting it
to c gives precisely c. The third equation states that setenv overwrites the state.

Example 2. Given a set of exceptions E, the algebraic theory of E-many excep-
tions is given by a single operation raise : E ⇝ 0, and no equations.

A T-model, also called a T-algebra, is a ΣT-structure which satisfies the
equations in EqT. The free T-model over a set X is constructed as the quotient

    FreeT(X) ≝ TreeΣT(X) / ∼

by the ΣT-congruence ∼ generated by EqT. Each op ∈ ΣT is interpreted in the
free model as the map (a, κ) ↦ [op(a, κ)], where [−] is the ∼-equivalence class.
FreeT(−) is the functor part of a monad on sets, whose unit at a set X is the
composite of return : X → TreeΣT(X) with the quotient map [−] : TreeΣT(X) ↠ FreeT(X).
The Kleisli extension for this monad is then the operation which lifts any map
f : X → TreeΣT(Y) to the map f† : FreeΣT(X) → FreeΣT(Y), given by

    f†[return x] ≝ f x,    f†[op(a, κ)] ≝ [op(a, f† ∘ κ)].

That is, f† traverses a computation tree and replaces each leaf return x with f x.
The preceding construction of free models and the monad may be retro-
fitted to an algebraic signature Σ, if we construe Σ as an algebraic theory with
no equations. In this case ∼ is just equality, and so we may omit the quotient
and the pesky equivalence classes. Thus the carrier of the free Σ-model is the
set of well-founded trees TreeΣ(X), with the evident monad structure.
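To make this concrete, here is a minimal Haskell sketch of computation trees
for the state signature of Example 1, specialised to C = Int; this is our own
illustration, not part of the paper's accompanying code. The monadic bind is
exactly the Kleisli extension f† described above.

    -- Computation trees over {getenv : 1 ⇝ C, setenv : C ⇝ 1}, with C = Int.
    data Tree x
      = Return x                -- leaf: a pure computation returning x
      | Getenv (Int -> Tree x)  -- call getenv; continue with the received state
      | Setenv Int (Tree x)     -- call setenv with a parameter, then continue

    -- (>>=) is the Kleisli extension f†: it traverses a tree and replaces
    -- each leaf (Return x) by f x.
    instance Functor Tree where
      fmap f t = t >>= (Return . f)
    instance Applicative Tree where
      pure = Return
      tf <*> tx = tf >>= \f -> fmap f tx
    instance Monad Tree where
      Return x   >>= f = f x
      Getenv k   >>= f = Getenv (\c -> k c >>= f)
      Setenv c t >>= f = Setenv c (t >>= f)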
A fundamental insight of Plotkin and Power [25,28] was that many com-
putational effects may be adequately described by algebraic theories, with the
elements of free models corresponding to effectful computations. For example,
the monads induced by the theories from Examples 1 and 2 are respectively
isomorphic to the usual state monad StC X ≝ (C ⇒ X × C) and the exceptions
monad ExcE X ≝ X + E.
Plotkin and Pretnar [30] further observed that the universal property of free
models may be used to model a programming concept known as handlers. Given
a T-model M and a map f : X → |M|, the universal property of the free
T-model gives us a unique T-homomorphism f‡ : FreeT(X) → |M| satisfying

    f‡[return x] = f x,    f‡[op(a, κ)] = opM(a, f‡ ∘ κ).

A handler for a theory T in a language such as Eff amounts to a model M
whose carrier |M| is the carrier FreeT′(Y) of the free model for some other the-
ory T′, while the associated handling construct is the induced T-homomorphism
FreeT(X) → FreeT′(Y). Thus handling transforms computations with effects T
to computations with effects T′. There is however no restriction on how a han-
dler implements an operation, in particular, it may use its continuation in an
arbitrary fashion. We shall put the universal property of free models to good use
as well, while making sure that the continuations are always used affinely.
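In code, such a handler is the fold given by the universal property. Continuing
our Haskell sketch (again our illustration), note that nothing restricts how a
clause uses its continuation; the state-passing clauses below happen to use it
exactly once:

    -- The unique homomorphism out of the free model: replace Return leaves
    -- by `ret` and operation nodes by the given clauses.
    handle :: (x -> r)           -- return clause
           -> ((Int -> r) -> r)  -- getenv clause, receiving the continuation
           -> (Int -> r -> r)    -- setenv clause
           -> Tree x -> r
    handle ret genv senv = go
      where go (Return x)   = ret x
            go (Getenv k)   = genv (go . k)
            go (Setenv c t) = senv c (go t)

    -- Handling into state-passing functions recovers the state monad:
    runState :: Tree x -> Int -> (x, Int)
    runState = handle (\x c -> (x, c)) (\k c -> k c c) (\c' k _ -> k c')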

2.2 Runners
Much like monads, handlers are useful for simulating computational effects, be-
cause they allow us to transform T-computations to T′-computations. However,
eventually there has to be a “top level” where such transformations cease and
actual computational effects happen. For these we need another concept, known
as runners [35]. Runners are equivalent to the concept of comodels [27,31], which
are “just models in the opposite category”, although one has to apply the motto
correctly by using powers and co-powers where seemingly exponentials and prod-
ucts would do. Without getting into the intricacies, let us spell out the definition.
Definition 1. A runner R for a signature Σ is given by a carrier set |R| together
with, for each op ∈ Σ, a co-operation opR : Aop → (|R| ⇒ Bop × |R|).
Runners are usually defined to have co-operations in the equivalent uncurried
form opR : Aop × |R| → Bop × |R|, but that is less convenient for our purposes.
Runners may be defined more generally for theories T, rather than just sig-
natures, by requiring that the co-operations satisfy EqT. We shall have no use
for these, although we expect no obstacles in incorporating them into our work.
A runner tells us what to do when an effectful computation reaches the
top-level runtime environment. Think of |R| as the set of configurations of
the runtime environment. Given the current configuration c ∈ |R|, the opera-
tion op(a, κ) is executed as the corresponding co-operation opR a c whose result
(b, c′) ∈ Bop × |R| gives the result of the operation b and the next runtime
configuration c′. The continuation κ b then proceeds in runtime configuration c′.
It is not too difficult to turn this idea into a mathematical model. For any
X, the co-operations induce a Σ-structure M with |M| ≝ St|R| X = (|R| ⇒ X × |R|)
and operations opM : Aop × (Bop ⇒ St|R| X) → St|R| X given by

    opM(a, κ) ≝ λc . κ (π1(opR a c)) (π2(opR a c)).

We may then use the universal property of the free Σ-model to obtain a Σ-
homomorphism rX : TreeΣ(X) → St|R| X satisfying the equations

    rX(return x) = λc . (x, c),    rX(op(a, κ)) = opM(a, rX ∘ κ).

The map rX precisely captures the idea that a runner runs computations by
transforming (static) computation trees into state-passing maps. Note how in
the above definition of opM, the continuation κ is used in a controlled way, as
it appears precisely once as the head of the outermost application. In terms of
programming, this corresponds to linear use in a tail-call position.
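In our running Haskell sketch, a runner and the induced map rX come out as
follows (our illustration); observe that each continuation is invoked exactly
once, in tail position:

    -- A runner for the state signature: one co-operation per operation, each
    -- mapping a parameter and the current configuration to a result and the
    -- next configuration.
    data Runner r = Runner
      { coGetenv :: r -> (Int, r)
      , coSetenv :: Int -> r -> ((), r) }

    -- r_X: run a computation tree as a state-passing map.
    run :: Runner r -> Tree x -> r -> (x, r)
    run _  (Return x)   c = (x, c)
    run ru (Getenv k)   c = let (b, c') = coGetenv ru c in run ru (k b) c'
    run ru (Setenv v t) c = let (_, c') = coSetenv ru v c in run ru t c'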
Runners are less ad-hoc than they may seem. First, notice that opM is just the
composition of the co-operation opR with the state monad's Kleisli extension of
the continuation κ, and so is the standard way of turning generic effects into Σ-
structures [26]. Second, the map rX is the component at X of a monad morphism
r : TreeΣ(−) → St|R|. Møgelberg & Staton [21], as well as Uustalu [35], showed
that the passage from a runner R to the corresponding monad morphism r forms
a one-to-one correspondence between the former and the latter.
As defined, runners are too restrictive a model of top-level computation,
because the only effect available to co-operations is state, but in practice the
runtime environment may also signal errors and perform other effects, by calling
its own runtime environment. We are led to the following generalisation.

Definition 2. For a signature Σ and monad T, a T-runner R for Σ, or just an
effectful runner, is given by, for each op ∈ Σ, a co-operation opR : Aop → T Bop.
The correspondence between runners and monad morphisms still holds.

Proposition 3. For a signature Σ and a monad T, the monad morphisms
TreeΣ(−) → T are in one-to-one correspondence with T-runners for Σ.

Proof. This is an easy generalisation of the correspondence for ordinary runners.
Let us fix a signature Σ, and a monad T with unit η and Kleisli extension −†.
Let R be a T-runner for Σ. For any set X, R induces a Σ-structure M
with |M| ≝ T X and opM : Aop × (Bop ⇒ T X) → T X defined as opM(a, κ) ≝
κ†(opR a). As before, the universal property of the free model TreeΣ(X) provides
a unique Σ-homomorphism rX : TreeΣ(X) → T X, satisfying the equations

    rX(return x) = ηX(x),    rX(op(a, κ)) = opM(a, rX ∘ κ).

The maps rX collectively give us the desired monad morphism r induced by R.
Conversely, given a monad morphism θ : TreeΣ(−) → T, we may recover a T-
runner R for Σ by defining the co-operations as opR a ≝ θBop(op(a, λb . return b)).
It is not hard to check that we have described a one-to-one correspondence. □
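In code, the generalisation is equally direct: co-operations now land in an
arbitrary monad t, and the induced monad morphism is again a fold (our sketch,
for the state signature used earlier):

    -- An effectful runner: co-operations produce their results in a monad t.
    data TRunner t = TRunner
      { tGetenv :: t Int          -- co-operation for getenv : 1 ⇝ C
      , tSetenv :: Int -> t ()    -- co-operation for setenv : C ⇝ 1
      }

    -- The component at x of the induced monad morphism Tree(−) → t.
    morph :: Monad t => TRunner t -> Tree x -> t x
    morph _  (Return x)   = return x
    morph ru (Getenv k)   = tGetenv ru >>= morph ru . k
    morph ru (Setenv c k) = tSetenv ru c >> morph ru k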



3 Programming with runners


If ordinary runners are not general enough, the effectful ones are too general:
parameterised by arbitrary monads T , they do not combine easily and they lack
a clear notion of resource management. Thus, we now engineer more specific
monads whose associated runners can be turned into a programming concept.
While we give up complete generality, the monads presented below are still quite
versatile, as they are parameterised by arbitrary algebraic signatures Σ, and so
are extensible and support various combinations of effects.

3.1 The user and kernel monads


Effectful source code running inside a runtime environment is just one example
of a more general phenomenon in which effectful computations are enveloped by
a layer that provides a supervised access to external resources: a user process
is controlled by a kernel, a web page by a browser, an operating system by
hardware, or a virtual machine, etc. We shall adopt the parlance of software
systems, and refer to the two layers generically as the user and kernel code.
Since the two kinds of code need not, and will not, use the same effects, each
will be described by its own algebraic theory and compute in its own monad.
We first address the kernel theory. Specifically, we look for an algebraic theory
such that effectful runners for the induced monad satisfy the following desiderata:
1. Runners support management and controlled finalisation of resources.
2. Runners may use further external resources.
3. Runners may signal failure caused by unavoidable circumstances.
The totality of external resources available to user code appears as a stateful
external environment, even though it has no direct access to it. Thus, kernel
computations should carry state. We achieve this by incorporating into the kernel
theory the operations getenv and setenv, and equations for state from Example 1.
Apart from managing state, kernel code should have access to further effects,
which may be true external effects, or some outer layer of runners. In either case,
we should allow the kernel code to call operations from a given signature Σ.
Because kernel computations ought to be able to signal failure, we should
include an exception mechanism. In practice, many programming languages and
systems have two flavours of exceptions, variously called recoverable and fatal,
checked and unchecked, exceptions and errors, etc. One kind, which we call just
exceptions, is raised by kernel code when a situation requires special attention
by user code. The other kind, which we call signals, indicates an unrecoverable
condition that prevents normal execution of user code. These correspond pre-
cisely to the two standard ways of combining exceptions with state, namely the
coproduct and the tensor of algebraic theories [11]. The coproduct simply adjoins
exceptions raise : E ⇝ 0 from Example 2 to the theory of state, while the tensor
extends the theory of state with signals kill : S ⇝ 0, together with equations

    getenv(λc . kill s) = kill s,    setenv(c, kill s) = kill s.    (1)



These equations say that a signal discards state, which makes it unrecoverable.
To summarise, the kernel theory KΣ,E,S,C contains operations from a signa-
ture Σ, as well as state operations getenv : 1 ⇝ C, setenv : C ⇝ 1, exceptions
raise : E ⇝ 0, and signals kill : S ⇝ 0, with equations for state from Example 1,
equations (1) relating state and signals, and for each operation op ∈ Σ, equations

    getenv(λc . op(a, κ c)) = op(a, λb . getenv(λc . κ c b)),
    setenv(c, op(a, κ)) = op(a, λb . setenv(c, κ b)),

expressing that external operations do not interact with kernel state. It is not
difficult to see that KΣ,E,S,C induces, up to isomorphism, the kernel monad

    KΣ,E,S,C X ≝ C ⇒ TreeΣ(((X + E) × C) + S).

How about user code? It can of course call operations from a signature Σ
(not necessarily the same as the kernel code), and because we intend it to handle
exceptions, it might as well have the ability to raise them. However, user code
knows nothing about signals and kernel state. Thus, we choose the user theory
UΣ,E to be the algebraic theory with operations Σ, exceptions raise : E ⇝ 0, and
no equations. This theory induces the user monad UΣ,E X ≝ TreeΣ(X + E).
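Rendered as Haskell types (our sketch, with Either for sums and a generic free
monad in place of TreeΣ), the two monads read:

    -- A generic free monad over a signature functor sig, in the role of TreeΣ.
    data Free sig x = Ret x | Node (sig (Free sig x))

    -- User monad: U_{Σ,E} X = TreeΣ(X + E).
    type User sig e x = Free sig (Either x e)

    -- Kernel monad: K_{Σ,E,S,C} X = C ⇒ TreeΣ(((X + E) × C) + S).
    -- Sending a signal s discards the state, matching equations (1).
    type Kernel sig e s c x = c -> Free sig (Either (Either x e, c) s)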

3.2 Runners as a programming construct


In this section, we turn the ideas presented so far into programming constructs.
We strive for a realistic result, but when faced with several design options, we
prefer simplicity and semantic clarity. We focus here on translating the central
concepts, and postpone various details to §4, where we present a full calculus.
We codify the idea of user and kernel computations by having syntactic
categories for each of them, as well as one for values. We use letters M , N to
indicate user computations, K, L for kernel computations, and V , W for values.
User and kernel code raise exceptions with operation raise, and catch them
with exception handlers based on Benton and Kennedy’s exceptional syntax [7],

    try M with {return x ↦ N, . . . , raise e ↦ Ne, . . .},

and analogously for kernel code. The familiar binding construct let x = M in N
is simply shorthand for try M with {return x ↦ N, . . . , raise e ↦ raise e, . . .}.
As a programming concept, a runner R takes the form

    {(op x ↦ Kop)op∈Σ}C,

where each Kop is a kernel computation, with the variable x bound in Kop, so
that each clause op x ↦ Kop determines a co-operation for the kernel monad.
The subscript C indicates the type of the state used by the kernel code Kop.
The corresponding elimination form is a handling-like construct

using R @ V run M finally F, (2)



which uses the co-operations of runner R “at” initial kernel state V to run user
code M , and finalises its return value, exceptions, and signals with F , see (3)
below. When user code M calls an operation op, the enveloping run construct
runs the corresponding co-operation Kop of R. While doing so, Kop might raise
exceptions. But not every exception makes sense for every operation, and so
we assign to each operation op a set of exceptions Eop which the co-operations
implementing it may raise, by augmenting its operation signature with Eop , as

    op : Aop ⇝ Bop ! Eop.

An exception raised by the co-operation Kop propagates back to the operation
call in the user code. Therefore, an operation call should have not only a contin-
uation x . M receiving a result, but also continuations Ne, one for each e ∈ Eop,

    op(V, (x . M), (Ne)e∈Eop).

If Kop returns a value b ∈ Bop, the execution proceeds as M[b/x], and as Ne if
Kop raises an exception e ∈ Eop. In examples, we use the generic versions of op-
erations [26], written op V, which pass on return values and re-raise exceptions.
One can pass exceptions back to operation calls also in a language with han-
dlers, such as Eff, by changing the signatures of operations to Aop ⇝ Bop + Eop,
and implementing the exception mechanism by hand, so that every operation call
is followed by a case distinction on Bop + Eop. One is reminded of how operating
system calls communicate errors back to user code as exceptional values.
A co-operation Kop may also send a signal, in which case the rest of the user
code M is skipped and the control proceeds directly to the corresponding case
of the finalisation part F of the run construct (2), whose syntactic form is

    {return x @ c ↦ N, . . . , raise e @ c ↦ Ne, . . . , kill s ↦ Ns, . . .}.    (3)

Specifically, if M returns a value v, then N is evaluated with x bound to v and c
to the final kernel state; if M raises an exception e (either directly or indirectly
via a co-operation of R), then Ne is executed, again with c bound to the final
kernel state; and if a co-operation of R sends a signal s, then Ns is executed.
Example 4. In anticipation of setting up the complete calculus we show how one
can work with files. The language implementors can provide an operation open
which opens a file for writing and returns its file handle, an operation close which
closes a file handle, and a runner fileIO that implements writing. Let us further
suppose that fileIO may raise an exception QuotaExceeded if a write exceeds the
user disk quota, and send a signal IOError if an unrecoverable external error
occurs. The following code illustrates how to guarantee proper closing of the file:
using fileIO @ (open "hello.txt") run
  write "Hello, world."
finally {
  return x @ fh → close fh,
  raise QuotaExceeded @ fh → close fh,
  kill IOError → return () }

Notice that the user code does not have direct access to the file handle. Instead,
the runner holds it in its state, where it is available to the co-operation that
implements write. The finalisation block gets access to the file handle upon suc-
cessful completion and raised exception, so it can close the file, but when a signal
happens the finalisation cannot close the file, nor should it attempt to do so.
We also mention that the code “cheats” by placing the call to open in a posi-
tion where a value is expected. We should have let-bound the file handle returned
by open outside the run construct, which would make it clear that opening the
file happens before this construct (and that open is not handled by the finalisa-
tion), but would also expose the file handle. Since there are clear advantages to
keeping the file handle inaccessible, a realistic language should accept the above
code and hoist computations from value positions automatically.

4 A calculus for programming with runners

Inspired by the semantic notion of runners and the ideas of the previous section,
we now present a calculus for programming with co-operations and runners,
called λcoop . It is a low-level fine-grain call-by-value calculus [19], and as such
could inspire an intermediate language that a high-level language is compiled to.

4.1 Types

The types of λcoop are shown in Fig. 1. The ground types contain base types, and
are closed under finite sums and products. These are used in operation signa-
tures and as types of kernel state. (Allowing arbitrary types in either of these
entails substantial complications that can be dealt with but are tangential to
our goals.) Ground types can also come with corresponding constant symbols f,
each associated with a fixed constant signature f : (A1, . . . , An) → B.
We assume a supply of operation symbols O, exception names E, and signal
names S. Each operation symbol op ∈ O is equipped with an operation signature
Aop ⇝ Bop ! Eop, which specifies its parameter type Aop and arity type Bop, and
the exceptions Eop that the corresponding co-operations may raise in runners.
The value types extend ground types with two function types, and a type
of runners. The user function type X → Y ! (Σ, E) classifies functions tak-
ing arguments of type X to computations classified by the user (computa-
tion) type Y ! (Σ, E), i.e., those that return values of type Y, and may call
operations Σ and raise exceptions E. Similarly, the kernel function type X →
Y ‼ (Σ, E, S, C) classifies functions taking arguments of type X to computations
classified by the kernel (computation) type Y ‼ (Σ, E, S, C), i.e., those that return
values of type Y, and may call operations Σ, raise exceptions E, send signals S,
and use state of type C. We note that the ingredients for user and kernel types
correspond precisely to the parameters of the user monad UΣ,E and the kernel
monad KΣ,E,S,C from §3.1. Finally, the runner type Σ ⇒ (Σ′, S, C) classifies run-
ners that implement co-operations for the operations Σ as kernel computations
which use operations Σ′, send signals S, and use state of type C.

Ground type    A, B, C ::= b                base type
                         | unit             unit type
                         | empty            empty type
                         | A × B            product type
                         | A + B            sum type

Constant signature:      f : (A1, . . . , An) → B
Signature                Σ ::= {op1, op2, . . . , opn} ⊆ O
Exception set            E ::= {e1, e2, . . . , en} ⊆ E
Signal set               S ::= {s1, s2, . . . , sn} ⊆ S
Operation signature:     op : Aop ⇝ Bop ! Eop

Value type     X, Y, Z ::= A                ground type
                         | X × Y            product type
                         | X + Y            sum type
                         | X → Y ! U        user function type
                         | X → Y ‼ K        kernel function type
                         | Σ ⇒ (Σ′, S, C)   runner type

User (computation) type:    X ! U   where U = (Σ, E)
Kernel (computation) type:  X ‼ K   where K = (Σ, E, S, C)

Fig. 1. The types of λcoop.

4.2 Values and computations


The syntax of terms is shown in Fig. 2. The usual fine-grain call-by-value strat-
ification of terms into pure values and effectful computations is present, except
that we further distinguish between user and kernel computations.

Values Among the values are variables, constants for ground types, and con-
structors for sums and products. There are two kinds of functions, for abstracting
over user and kernel computations. A runner is a value of the form
    {(op x ↦ Kop)op∈Σ}C.
It implements co-operations for operations op as kernel computations Kop , with
x bound in Kop . The type annotation C specifies the type of the state that Kop
uses. Note that C ranges over ground types, a restriction that allows us to define
a naive set-theoretic semantics. We sometimes omit these type annotations.

User and kernel computations The user and kernel computations both have
pure computations, function application, exception raising and handling, stan-

Values

V, W ::= x                                              variable
       | f(V1, . . . , Vn)                              ground constant
       | ()                                             unit
       | (V, W)                                         pair
       | inlX,Y V | inrX,Y V                            injection
       | fun (x : X) ↦ M                                user function
       | funK (x : X) ↦ K                               kernel function
       | {(op x ↦ Kop)op∈Σ}C                            runner

User computations

M, N ::= return V                                       value
       | V W                                            application
       | try M with {return x ↦ N, (raise e ↦ Ne)e∈E}   exception handler
       | match V with {(x, y) ↦ M}                      product elimination
       | match V with {}X                               empty elimination
       | match V with {inl x ↦ M, inr y ↦ N}            sum elimination
       | opX(V, (x . M), (Ne)e∈Eop)                     operation call
       | raiseX e                                       raise exception
       | using V @ W run M finally F                    running user code
       | kernel K @ W finally F                         switch to kernel mode

F ::= {return x @ c ↦ N, (raise e @ c ↦ Ne)e∈E, (kill s ↦ Ns)s∈S}

Kernel computations

K, L ::= returnC V                                      value
       | V W                                            application
       | try K with {return x ↦ L, (raise e ↦ Le)e∈E}   exception handler
       | match V with {(x, y) ↦ K}                      product elimination
       | match V with {}X@C                             empty elimination
       | match V with {inl x ↦ K, inr y ↦ L}            sum elimination
       | opX(V, (x . K), (Le)e∈Eop)                     operation call
       | raiseX@C e                                     raise exception
       | killX@C s                                      send signal
       | getenvC(c . K)                                 get kernel state
       | setenv(V, K)                                   set kernel state
       | user M with {return x ↦ K, (raise e ↦ Le)e∈E}  switch to user mode

Fig. 2. Values, user computations, and kernel computations of λcoop.



dard elimination forms, and operation calls. Note that the typing annotations
on some of these differ according to their mode. For instance, a user operation
call is annotated with the result type X, whereas the annotation X @ C on a
kernel operation call also specifies the kernel state type C.
The binding construct letX!E x = M in N is not part of the syntax, but is an
abbreviation for try M with {return x ↦ N, (raise e ↦ raiseX e)e∈E}, and there is
an analogous one for kernel computations. We often drop the annotation X ! E.
Some computations are specific to one or the other mode. Only the kernel
mode may send a signal with kill, and manipulate state with getenv and setenv,
but only the user mode has the run construct from §3.2. Finally, each mode has
the ability to “context switch” to the other one. The kernel computation

    user M with {return x ↦ K, (raise e ↦ Le)e∈E}

runs a user computation M and handles the returned value and leftover excep-
tions with kernel computations K and Le . Conversely, the user computation

    kernel K @ W finally {return x @ c ↦ M, (raise e @ c ↦ Ne)e∈E, (kill s ↦ Ns)s∈S}

runs kernel computation K with initial state W , and handles the returned value,
and leftover exceptions and signals with user computations M , Ne , and Ns .

4.3 Type system


We equip λcoop with a type system akin to type and effect systems for algebraic
effects and handlers [3,7,12]. We are experimenting with resource control, so it
makes sense for the type system to tightly control resources. Consequently, our
effect system does not allow effects to be implicitly propagated outwards.
In §4.1, we assumed that each operation op ∈ O is equipped with some fixed
operation signature op : Aop ⇝ Bop ! Eop. We also assumed a fixed constant
signature f : (A1, . . . , An) → B for each ground constant f. We consider this
information to be part of the type system and say no more about it.
Values, user computations, and kernel computations each have a correspond-
ing typing judgement form and a subtyping relation, given by

    Γ ⊢ V : X,    Γ ⊢ M : X ! U,    Γ ⊢ K : X ‼ K,
    X ⊑ Y,    X ! U ⊑ Y ! V,    X ‼ K ⊑ Y ‼ L,

where Γ is a typing context x1 : X1, . . . , xn : Xn. The effect information is an
over-approximation, i.e., M and K employ at most the effects described by U
and K. The complete rules for these judgements are given in the online appendix.
We comment here only on the rules that are peculiar to λcoop, see Fig. 3.
Subtyping of ground types Sub-Ground is trivial, as it relates only equal
types. Subtyping of runners Sub-Runner and kernel computations Sub-Kernel
requires equality of the kernel state types C and C′ because state is used invari-
antly in the kernel monad. We leave it for future work to replace C ≡ C′ with
a lens [10] from C′ to C, i.e., maps C′ → C and C′ × C → C′ satisfying state

Sub-Ground
    ─────
    A ⊑ A

Sub-Runner
    Σ1′ ⊆ Σ1    Σ2 ⊆ Σ2′    S ⊆ S′    C ≡ C′
    ─────────────────────────────────────────
    Σ1 ⇒ (Σ2, S, C) ⊑ Σ1′ ⇒ (Σ2′, S′, C′)

Sub-Kernel
    X ⊑ X′    Σ ⊆ Σ′    E ⊆ E′    S ⊆ S′    C ≡ C′
    ───────────────────────────────────────────────
    X ‼ (Σ, E, S, C) ⊑ X′ ‼ (Σ′, E′, S′, C′)

TyUser-Try
    Γ ⊢ M : X ! (Σ, E)    Γ, x : X ⊢ N : Y ! (Σ, E′)    (Γ ⊢ Ne : Y ! (Σ, E′))e∈E
    ─────────────────────────────────────────────────────────────────────────────
    Γ ⊢ try M with {return x ↦ N, (raise e ↦ Ne)e∈E} : Y ! (Σ, E′)

TyUser-Run
    F ≡ {return x @ c ↦ N, (raise e @ c ↦ Ne)e∈E, (kill s ↦ Ns)s∈S}
    Γ ⊢ V : Σ ⇒ (Σ′, S, C)    Γ ⊢ W : C
    Γ ⊢ M : X ! (Σ, E)    Γ, x : X, c : C ⊢ N : Y ! (Σ′, E′)
    (Γ, c : C ⊢ Ne : Y ! (Σ′, E′))e∈E    (Γ ⊢ Ns : Y ! (Σ′, E′))s∈S
    ────────────────────────────────────────────────────────────────
    Γ ⊢ using V @ W run M finally F : Y ! (Σ′, E′)

TyUser-Op
    U ≡ (Σ, E)    op ∈ Σ    Γ ⊢ V : Aop
    Γ, x : Bop ⊢ M : X ! U    (Γ ⊢ Ne : X ! U)e∈Eop
    ───────────────────────────────────────────────
    Γ ⊢ opX(V, (x . M), (Ne)e∈Eop) : X ! U

TyKernel-Op
    K ≡ (Σ, E, S, C)    op ∈ Σ    Γ ⊢ V : Aop
    Γ, x : Bop ⊢ K : X ‼ K    (Γ ⊢ Le : X ‼ K)e∈Eop
    ───────────────────────────────────────────────
    Γ ⊢ opX(V, (x . K), (Le)e∈Eop) : X ‼ K

TyUser-Kernel
    F ≡ {return x @ c ↦ N, (raise e @ c ↦ Ne)e∈E, (kill s ↦ Ns)s∈S}
    Γ ⊢ K : X ‼ (Σ, E, S, C)    Γ ⊢ W : C    Γ, x : X, c : C ⊢ N : Y ! (Σ, E′)
    (Γ, c : C ⊢ Ne : Y ! (Σ, E′))e∈E    (Γ ⊢ Ns : Y ! (Σ, E′))s∈S
    ──────────────────────────────────────────────────────────────
    Γ ⊢ kernel K @ W finally F : Y ! (Σ, E′)

TyKernel-User
    K ≡ (Σ, E′, S, C)    Γ ⊢ M : X ! (Σ, E)
    Γ, x : X ⊢ K : Y ‼ K    (Γ ⊢ Le : Y ‼ K)e∈E
    ────────────────────────────────────────────
    Γ ⊢ user M with {return x ↦ K, (raise e ↦ Le)e∈E} : Y ‼ K

Fig. 3. Selected typing and subtyping rules.



equations analogous to Example 1. It has been observed [24,31] that such a lens
in fact amounts to an ordinary runner for C-valued state.
The rules TyUser-Op and TyKernel-Op govern operation calls, where we
have a success continuation which receives a value returned by a co-operation,
and exceptional continuations which receive exceptions raised by co-operations.
The rule TyUser-Run requires that the runner V implements all the opera-
tions M can use, meaning that operations are not implicitly propagated outside
a run block (which is different from how handlers are sometimes implemented).
Of course, the co-operations of the runner may call further external operations,
as recorded by the signature Σ′. Similarly, we require the finally block F to in-
tercept all exceptions and signals that might be produced by the co-operations
of V or the user code M . Such strict control is exercised throughout. For ex-
ample, in TyUser-Run, TyUser-Kernel, and TyKernel-User we catch all
the exceptions and signals that the code might produce. One should judiciously
relax these requirements in a language that is presented to the programmer, and
allow re-raising and re-sending clauses to be automatically inserted.

4.4 Equational theory


We present λcoop as an equational calculus, i.e., the interactions between its
components are described by equations. Such a presentation makes it easy to
reason about program equivalence. There are three equality judgements

    Γ ⊢ V ≡ W : X,    Γ ⊢ M ≡ N : X ! U,    Γ ⊢ K ≡ L : X ‼ K.
It is presupposed that we only compare well-typed expressions with the indicated
types. For the most part, the context and the type annotation on judgements
will play no significant role, and so we shall drop them whenever possible.
We comment on the computational equations for constructs characteristic
of λcoop , and refer the reader to the online appendix for other equations. When
read left-to-right, these equations explain the operational meaning of programs.
Of the three equations for run, the first two specify that returned values and
raised exceptions are handled by the corresponding clauses,

    using V @ W run (return V′) finally F ≡ N[V′/x, W/c],
    using V @ W run (raiseX e) finally F ≡ Ne[W/c],

where F ≝ {return x @ c ↦ N, (raise e @ c ↦ Ne)e∈E, (kill s ↦ Ns)s∈S}. The third
equation below relates running an operation op with executing the corresponding
co-operation Kop, where R stands for the runner {(op x ↦ Kop)op∈Σ}C:

    using R @ W run (opX(V, (x . M), (N′e′)e′∈Eop)) finally F ≡
      kernel Kop[V/x] @ W finally {
        return x @ c′ ↦ (using R @ c′ run M finally F),
        (raise e′ @ c′ ↦ (using R @ c′ run N′e′ finally F))e′∈Eop,
        (kill s ↦ Ns)s∈S }

Because Kop is kernel code, it is executed in kernel mode, whose finally clauses
specify what happens afterwards: if Kop returns a value, or raises an exception,
execution continues with a suitable continuation, with R wrapped around it; and
if Kop sends a signal, the corresponding finalisation code from F is evaluated.
The next bundle describes how kernel code is executed within user code:
    kernel (returnC V) @ W finally F ≡ N[V/x, W/c],
    kernel (raiseX@C e) @ W finally F ≡ Ne[W/c],
    kernel (killX@C s) @ W finally F ≡ Ns,
    kernel (getenvC(c . K)) @ W finally F ≡ kernel K[W/c] @ W finally F,
    kernel (setenv(V, K)) @ W finally F ≡ kernel K @ V finally F.
We also have an equation stating that an operation called in kernel mode prop-
agates out to user mode, with its continuations wrapped in kernel mode:

    kernel opX(V, (x . K), (Le′)e′∈E) @ W finally F ≡
      opX(V, (x . kernel K @ W finally F), (kernel Le′ @ W finally F)e′∈E).
Similar equations govern execution of user computations in kernel mode.
The remaining equations include standard βη-equations for exception han-
dling [7], deconstruction of products and sums, algebraicity equations for oper-
ations [33], and the equations of kernel theory from §3.1, describing how getenv
and setenv work, and how they interact with signals and other operations.

5 Denotational semantics
We provide a coherent denotational semantics for λcoop , and prove it sound with
respect to the equational theory given in §4.4. Having eschewed all forms of
recursion, we may afford to work simply over the category of sets and functions,
while noting that there is no obstacle to incorporating recursion at all levels and
switching to domain theory, similarly to the treatment of effect handlers in [3].

5.1 Semantics of types


The meaning of terms is most naturally defined by structural induction on their
typing derivations, which however are not unique in λcoop due to subsumption
rules. Thus we must worry about devising a coherent semantics, i.e., one in which
all derivations of a judgement get the same meaning. We follow prior work on the
semantics of effect systems for handlers [3], and proceed by first giving a skeletal
semantics of λcoop in which derivations are manifestly unique because the effect
information is unrefined. We then use the skeletal semantics as the frame upon
which rests a refinement-style coherent semantics of the effectful types of λcoop .
The skeletal types are like λcoop’s types, but with all effect information erased.
In particular, the ground types A, and hence the kernel state types C, do not
change as they contain no effect information. The skeletal value types are

    P, Q ::= A | unit | empty | P × Q | P + Q | P → Q ! | P → Q ‼ C | runner C.

The skeletal versions of the user and kernel types are P ! and P ‼ C, respec-
tively. It is best to think of the skeletal types as ML-style types which implicitly
over-approximate effect information by “any effect is possible”, an idea which is
mathematically expressed by their semantics, as explained below.
First of all, the semantics of ground types is straightforward. One only needs
to provide sets denoting the base types b, after which the ground types receive
the standard set-theoretic meaning, as given in Fig. 4.
Recall that O, S, and E are the sets of all operations, signals, and exceptions,
and that each op ∈ O has a signature op : Aop ⇝ Bop ! Eop. Let us additionally
assume that there is a distinguished operation 💀 ∈ O with signature 💀 : 1 ⇝ 0 ! 0
(otherwise we adjoin it to O). It ensures that the denotations of skeletal user and
kernel types are pointed sets, while operationally 💀 indicates a runtime error.
Next, we define the skeletal user and kernel monads as

    Uˢ X ≝ UO,E X = TreeO(X + E),
    KˢC X ≝ KO,E,S,C X = (C ⇒ TreeO((X + E) × C + S)),

and Runnerˢ C as the set of all skeletal runners R (with state C), which are fami-
lies of co-operations {opR : ⟦Aop⟧ → KO,Eop,S,C ⟦Bop⟧}op∈O. Note that KO,Eop,S,C
is a coproduct [11] of the monads C ⇒ TreeO(− × C + S) and ExcEop, and thus the
skeletal runners are the effectful runners for the former monad, so long as we
read the effectful signatures op : Aop ⇝ Bop ! Eop as ordinary algebraic ones
op : Aop ⇝ Bop + Eop. While there is no semantic difference between the two
readings, there is one of intention: KO,Eop,S,C ⟦Bop⟧ is a kernel computation that
(apart from using state and sending signals) returns values of type Bop and raises
exceptions Eop, whereas C ⇒ TreeO((⟦Bop⟧ + Eop) × C + S) returns values of
type Bop + Eop and raises no exceptions. We prefer the former, as it reflects our
treatment of exceptions as a control mechanism rather than exceptional values.
These ingredients suffice for the denotation of skeletal types as sets, as given
in Fig. 4. The user and kernel skeletal types are interpreted using the respective
skeletal monads, and hence the two function types as Kleisli exponentials.
We proceed with the semantics of effectful types. The skeleton of a value
type X is the skeletal type Xˢ obtained by removing all effect information, and
similarly for user and kernel types, see Fig. 5. We interpret a value type X as a
subset ⟬X⟭ ⊆ ⟦Xˢ⟧ of the denotation of its skeleton, and similarly for user and
kernel computation types. In other words, we treat the effectful types as refinements
of their skeletons. For this, we define the operation (X0, X1) ⇒ (Y0, Y1), for any
X0 ⊆ X1 and Y0 ⊆ Y1, as the set of maps X1 → Y1 restricted to X0 → Y0:

    (X0, X1) ⇒ (Y0, Y1) ≝ {f : X1 → Y1 | ∀x ∈ X0 . f(x) ∈ Y0}.

Next, observe that the user and the kernel monads preserve subset inclusions, in
the sense that UΣ,E X ⊆ UΣ′,E′ X′ and KΣ,E,S,C X ⊆ KΣ′,E′,S′,C X′ if Σ ⊆ Σ′,
E ⊆ E′, S ⊆ S′, and X ⊆ X′. In particular, we always have UΣ,E X ⊆ Uˢ X
and KΣ,E,S,C X ⊆ KˢC X. Finally, let RunnerΣ,Σ′,S C ⊆ Runnerˢ C be the subset
of those runners R whose co-operations for Σ factor through KΣ′,Eop,S,C, i.e.,
opR : ⟦Aop⟧ → KΣ′,Eop,S,C ⟦Bop⟧ ⊆ KO,Eop,S,C ⟦Bop⟧, for each op ∈ Σ.

Ground types

    ⟦b⟧ ≝ · · ·    ⟦unit⟧ ≝ 1    ⟦empty⟧ ≝ 0

    ⟦A × B⟧ ≝ ⟦A⟧ × ⟦B⟧    ⟦A + B⟧ ≝ ⟦A⟧ + ⟦B⟧

Skeletal types

    ⟦P × Q⟧ ≝ ⟦P⟧ × ⟦Q⟧        ⟦P → Q !⟧ ≝ ⟦P⟧ ⇒ ⟦Q !⟧

    ⟦P + Q⟧ ≝ ⟦P⟧ + ⟦Q⟧        ⟦P → Q ‼ C⟧ ≝ ⟦P⟧ ⇒ ⟦Q ‼ C⟧

    ⟦runner C⟧ ≝ Runnerˢ ⟦C⟧    ⟦P !⟧ ≝ Uˢ ⟦P⟧    ⟦P ‼ C⟧ ≝ Kˢ⟦C⟧ ⟦P⟧

    ⟦x1 : P1, . . . , xn : Pn⟧ ≝ ⟦P1⟧ × · · · × ⟦Pn⟧

Fig. 4. Denotations of ground and skeletal types.

Semantics of effectful types is given in Fig. 5. From a category-theoretic
viewpoint, it assigns meaning in the category Sub(Set) whose objects are subset
inclusions X0 ⊆ X1 and morphisms from X0 ⊆ X1 to Y0 ⊆ Y1 those maps X1 →
Y1 that restrict to X0 → Y0. The interpretations of products, sums, and function
types are precisely the corresponding category-theoretic notions ×, +, and ⇒ in
Sub(Set). Even better, the pairs of submonads UΣ,E ⊆ Uˢ and KΣ,E,S,C ⊆ KˢC
are the “Sub(Set)-variants” of the user and kernel monads. Such an abstract
point of view drives the interpretation of terms, given below, and it additionally
suggests how our semantics can be set up on top of a category other than Set. For
example, if we replace Set with the category Cpo of ω-complete partial orders,
we obtain the domain-theoretic semantics of effect handlers from [3] that models
recursion and operations whose signatures contain arbitrary types.

5.2 Semantics of values and computations


To give semantics to λcoop’s terms, we introduce skeletal typing judgements

    Γ ⊢s V : P,    Γ ⊢s M : P !,    Γ ⊢s K : P ‼ C,

which assign skeletal types to values and computations. In these judgements, Γ
is a skeletal context which assigns skeletal types to variables.
The rules for these judgements are obtained from λcoop’s typing rules, by
excluding subsumption rules and by relaxing restrictions on effects. For example,
the skeletal versions of the rules TyValue-Runner and TyKernel-Kill are

    (Γ, x : Aop ⊢s Kop : Bop ‼ C)op∈Σ                     s ∈ S
    ─────────────────────────────────────    ─────────────────────────
    Γ ⊢s {(op x ↦ Kop)op∈Σ}C : runner C       Γ ⊢s killX@C s : Xˢ ‼ C
The relationship between effectful and skeletal typing is summarised as follows:

Proposition 5. (1) Skeletal typing derivations are unique. (2) If X ⊑ Y, then
Xˢ = Yˢ, and analogously for subtyping of user and kernel types. (3) If Γ ⊢ V : X,
then Γˢ ⊢s V : Xˢ, and analogously for user and kernel computations.

Skeletons

    Aˢ ≝ A        (X × Y)ˢ ≝ Xˢ × Yˢ        (X + Y)ˢ ≝ Xˢ + Yˢ
    (Σ ⇒ (Σ′, S, C))ˢ ≝ runner C
    (X → Y ! U)ˢ ≝ Xˢ → (Y ! U)ˢ        (X → Y ‼ K)ˢ ≝ Xˢ → (Y ‼ K)ˢ
    (X ! U)ˢ ≝ Xˢ !        (X ‼ (Σ, E, S, C))ˢ ≝ Xˢ ‼ C
    (x1 : X1, . . . , xn : Xn)ˢ ≝ (x1 : X1ˢ, . . . , xn : Xnˢ)

Denotations

    ⟬A⟭ ≝ ⟦A⟧        ⟬X × Y⟭ ≝ ⟬X⟭ × ⟬Y⟭        ⟬X + Y⟭ ≝ ⟬X⟭ + ⟬Y⟭
    ⟬Σ ⇒ (Σ′, S, C)⟭ ≝ RunnerΣ,Σ′,S ⟬C⟭
    ⟬X → Y ! U⟭ ≝ (⟬X⟭, ⟦Xˢ⟧) ⇒ (⟬Y ! U⟭, ⟦(Y ! U)ˢ⟧)
    ⟬X → Y ‼ K⟭ ≝ (⟬X⟭, ⟦Xˢ⟧) ⇒ (⟬Y ‼ K⟭, ⟦(Y ‼ K)ˢ⟧)
    ⟬X ! (Σ, E)⟭ ≝ UΣ,E ⟬X⟭        ⟬X ‼ (Σ, E, S, C)⟭ ≝ KΣ,E,S,⟦C⟧ ⟬X⟭
    ⟬x1 : X1, . . . , xn : Xn⟭ ≝ ⟬X1⟭ × · · · × ⟬Xn⟭

Fig. 5. Skeletons and denotations of types.

Proof. We prove (1) by induction on skeletal typing derivations, and (2) by
induction on subtyping derivations. For (1), we further use the occasional type
annotations, and the absence of skeletal subsumption rules. For proving (3),
suppose that D is a derivation of Γ ⊢ V : X. We may translate D to its skeleton
Dˢ deriving Γˢ ⊢s V : Xˢ by replacing typing rules with matching skeletal ones,
skipping subsumption rules due to (2). Computations are treated similarly. □
To ensure semantic coherence, we first define the skeletal semantics of skeletal
typing judgements, ⟦Γ ⊢s V : P⟧ : ⟦Γ⟧ → ⟦P⟧, ⟦Γ ⊢s M : P !⟧ : ⟦Γ⟧ → ⟦P !⟧,
and ⟦Γ ⊢s K : P ‼ C⟧ : ⟦Γ⟧ → ⟦P ‼ C⟧, by induction on their (unique) derivations.
Provided maps ⟦A1⟧ × · · · × ⟦An⟧ → ⟦B⟧ denoting ground constants f, values
are interpreted in a standard way, using the bi-cartesian closed structure of sets,
except for a runner {(op x ↦ Kop)op∈Σ}C, which is interpreted at an environment
γ ∈ ⟦Γ⟧ as the skeletal runner {op : ⟦Aop⟧ → KO,Eop,S,⟦C⟧ ⟦Bop⟧}op∈O, given by

    op a ≝ (if op ∈ Σ then ρ(⟦Γ, x : Aop ⊢s Kop : Bop ‼ C⟧(γ, a)) else 💀).
Here the map ρ : Kˢ⟦C⟧ ⟦Bop⟧ → KO,Eop,S,⟦C⟧ ⟦Bop⟧ is the skeletal kernel theory
homomorphism characterised by the equations

    ρ(return b) = return b,        ρ(op′(a′, κ, (νe)e∈Eop′)) = op′(a′, ρ ∘ κ, (ρ(νe))e∈Eop′),
    ρ(getenv κ) = getenv(ρ ∘ κ),   ρ(raise e) = (if e ∈ Eop then raise e else 💀),
    ρ(setenv(c, κ)) = setenv(c, ρ ∘ κ),    ρ(kill s) = kill s.
The purpose of 💀 in the definition of op is to model a runtime error when the
runner is asked to handle an unexpected operation, while ρ makes sure that op
raises at most the exceptions Eop, as prescribed by the signature of op.

User and kernel computations are interpreted as elements of the correspond-
ing skeletal user and kernel monads. Again, most constructs are interpreted in
a standard way: returns as the units of the monads; the operations raise, kill,
getenv, setenv, and ops as the corresponding algebraic operations; and match
statements as the corresponding semantic elimination forms. The interpretation
of exception handling offers no surprises, e.g., as in [30], as long as we follow the
strategy of treating unexpected situations with the runtime error 💀.
The most interesting part of the interpretation is the semantics of

    Γ ⊢s (using V @ W run M finally F) : Q !,    (4)

where F ≝ {return x @ c ↦ N, (raise e @ c ↦ Ne)e∈E, (kill s ↦ Ns)s∈S}. At an
environment γ ∈ ⟦Γ⟧, V is interpreted as a skeletal runner with state ⟦C⟧, which
induces a monad morphism r : TreeO(−) → (⟦C⟧ ⇒ TreeO(− × ⟦C⟧ + S)), as
in the proof of Prop. 3. Let f : Kˢ⟦C⟧ ⟦P⟧ → (⟦C⟧ ⇒ Uˢ ⟦Q⟧) be the skeletal
kernel theory homomorphism characterised by the equations

    f(return p) = λc . ⟦Γ, x : P, c : C ⊢s N : Q !⟧(γ, p, c),
    f(op(a, κ, (νe)e∈Eop)) = λc . op(a, λb . f(κ b) c, (f(νe) c)e∈Eop),
    f(raise e) = λc . (if e ∈ E then ⟦Γ, c : C ⊢s Ne : Q !⟧(γ, c) else 💀),      (5)
    f(kill s) = λc . (if s ∈ S then ⟦Γ ⊢s Ns : Q !⟧ γ else 💀),
    f(getenv κ) = λc . f(κ c) c,    f(setenv(c′, κ)) = λc . f κ c′.

The interpretation of (4) at γ is f(r⟦P⟧+E(⟦Γ ⊢s M : P !⟧ γ)) (⟦Γ ⊢s W : C⟧ γ),
which reads: map the interpretation of M at γ from the skeletal user monad
to the skeletal kernel monad using r (which models the operations of M by the
co-operations of V), and from there using f to a map ⟦C⟧ ⇒ Uˢ ⟦Q⟧, that is then
applied to the initial kernel state, namely, the interpretation of W at γ.
We interpret the context switch Γ ⊢s kernel K @ W finally F : Q ! at an
environment γ ∈ ⟦Γ⟧ as f(⟦Γ ⊢s K : P ‼ C⟧ γ) (⟦Γ ⊢s W : C⟧ γ), where f is the
map (5). Finally, user context switch is interpreted much like exception handling.
We now define coherent semantics of λcoop’s typing derivations by passing
through the skeletal semantics. Given a derivation D of Γ ⊢ V : X, its skeleton
Dˢ derives Γˢ ⊢s V : Xˢ. We identify the denotation of V with the skeletal one,

    ⟬Γ ⊢ V : X⟭ ≝ ⟦Γˢ ⊢s V : Xˢ⟧ : ⟦Γˢ⟧ → ⟦Xˢ⟧.

All that remains is to check that ⟬Γ ⊢ V : X⟭ restricts to ⟬Γ⟭ → ⟬X⟭. This
is accomplished by induction on D. The only interesting step is subsumption,
which relies on a further observation that X ⊑ Y implies ⟬X⟭ ⊆ ⟬Y⟭. Typing
derivations for user and kernel computations are treated analogously.

5.3 Coherence, soundness, and finalisation theorems


We are now ready to prove a theorem that guarantees execution of finalisation
code. But first, let us record the fact that the semantics is coherent and sound.

Theorem 6 (Coherence and soundness). The denotational semantics of
λcoop is coherent, and it is sound for the equational theory of λcoop from §4.4.

Proof. Coherence is established by construction: any two derivations of the same
typing judgement have the same denotation because they are both (the same)
restriction of skeletal semantics. For proving soundness, one just needs to unfold
the denotations of the left- and right-hand sides of equations from §4.4, and
compare them, where some cases rely on suitable substitution lemmas. □
To set the stage for the finalisation theorem, let us consider the computation
using V @ W run M finally F, well-typed by the rule TyUser-Run from Fig. 3.
At an environment γ ∈ ⟬Γ⟭, the finalisation clauses F are captured semantically
by the finalisation map φγ : (⟬X⟭ + E) × ⟬C⟭ + S → ⟬Y ! (Σ′, E′)⟭, given by

    φγ(ι1(ι1 x, c)) ≝ ⟬Γ, x : X, c : C ⊢ N : Y ! (Σ′, E′)⟭(γ, x, c),
    φγ(ι1(ι2 e, c)) ≝ ⟬Γ, c : C ⊢ Ne : Y ! (Σ′, E′)⟭(γ, c),
    φγ(ι2 s) ≝ ⟬Γ ⊢ Ns : Y ! (Σ′, E′)⟭ γ.

With φ in hand, we may formulate the finalisation theorem for λcoop, stating that
the semantics of using V @ W run M finally F is a computation tree all of whose
branches end with finalisation clauses from F. Thus, unless some enveloping
runner sends a signal, finalisation with F is guaranteed to take place.
Theorem 7 (Finalisation). A well-typed run factors through finalisation:

    ⟬Γ ⊢ (using V @ W run M finally F) : Y ! (Σ′, E′)⟭ γ = φγ† t,

for some t ∈ TreeΣ′((⟬X⟭ + E) × ⟬C⟭ + S).

Proof. We first prove that f u c = φγ†(u c) holds for all u ∈ KΣ′,E,S,⟬C⟭ ⟬X⟭
and c ∈ ⟬C⟭, where f is the map (5). The proof proceeds by computational
induction on u [29]. The finalisation statement is then just the special case with
u ≝ r⟬X⟭+E(⟬Γ ⊢ M : X ! (Σ, E)⟭ γ) and c ≝ ⟬Γ ⊢ W : C⟭ γ. □
6 Runners in action
Let us show examples that demonstrate how runners can be usefully combined
to provide flexible resource management. We implemented these and other ex-
amples in the language Coop and a library Haskell-Coop, see §7.
To make the code more understandable, we do not adhere strictly to the
syntax of λcoop , e.g., we use the generic versions of effects [26], as is customary
in programming, and effectful initialisation of kernel state as discussed in §3.2.
Example 8 (Nesting). In Example 4, we considered a runner fileIO for basic file
operations. Let us suppose that fileIO is implemented by immediate calls to the
operating system. Sometimes, we might prefer to accumulate writes and commit
them all at once, which can be accomplished by interposing between fileIO and
user code the following runner accIO, which accumulates writes in its state:

{ write s' → let s = getenv () in setenv (concat s s') }string

By nesting the runners, and calling the outer write (the one of fileIO) only in the
finalisation code for accIO, the accumulated writes are committed all at once:

using fileIO @ (open "hello.txt") run
  using accIO @ (return "") run
    write "Hello, world."; write "Hello, again."
  finally { return x @ s → write s; return x }
finally { return x @ fh → ... , raise QuotaExceeded @ fh → ... , kill IOError → ... }

Example 9 (Instrumentation). Above, accIO implements the same signature as
fileIO and thus intercepts operations without the user code being aware of it. This
kind of invisibility can be more generally used to implement instrumentation:

using { ..., op x → let c = getenv () in setenv (c+1); op x, ... }int @ (return 0) run
  M
finally { return x @ c → report_cost c; return x, ... }

Here the interposed runner implements all operations of some enveloping runner,
by simply forwarding them, while also measuring computational cost by counting
the total number of operation calls, which is then reported during finalisation.
Example 10 (ML-style references). Continuing with the theme of nested run-
ners, they can also be used to implement abstract and safe interfaces to low-level
resources. For instance, suppose we have a low-level implementation of a mem-
ory heap that potentially allows unsafe memory access, and we would like to
implement ML-style references on top of it. A good first attempt is the runner
{ ref x → let h = getenv () in
          let (r,h') = malloc h x in
          setenv h'; return r,
  get r → let h = getenv () in memread h r,
  put (r, x) → let h = getenv () in memset h r x }_heap

which has the desired interface, but still suffers from three deficiencies that can be
addressed with further language support. First, abstract types would let us hide
the fact that references are just memory locations, so that the user code could
never devise invalid references or otherwise misuse them. Second, our simple
typing discipline forces all references to hold the same type, but in reality we
want them to have different types. This could be achieved through quantification
over types in the low-level implementation of the heap, as we have done in the
Haskell-Coop library using Haskell’s forall. Third, user code could hijack
a reference and misuse it out of the scope of the runner, which is difficult to
prevent. In practice the problem does not occur because, so to speak, the runner
for references is at the very top level, from which user code cannot escape.
Example 11 (Monotonic state). Nested runners can also implement access
restrictions to resources, with applications in security [8]. For example, we can
restrict the references from the previous example to be used monotonically, by
associating a preorder with each reference, which assignments then have to obey.
This idea is similar to how monotonic state is implemented in the F* language [2],
except that we make dynamic checks where F* statically uses dependent types.
While we could simply modify the previous example, it is better to implement
a new runner which is nested inside the previous one, so that we obtain a modular
solution that works with any runner implementing operations ref, get, and put:
{ mref x rel → let r = ref x in
               let m = getenv () in
               setenv (add m (r,rel)); return r,
  mget r → get r,
  mput (r, y) → let x = get r in
                let m = getenv () in
                match (sel m r) with
                | inl rel → if (rel x y) then put (r, y)
                            else raise MonotonicityViolation
                | inr () → kill NoPreorderFound }_map(ref,intRel)

The runner’s state is a map from references to preorders on integers. The co-
operation mref x rel creates a new reference r initialised with x (by calling ref of
the outer runner), and then adds the pair (r, rel) to the map stored in the runner’s
state. Reading is delegated to the outer runner, while assignment first checks that
the new state is larger than the old one, according to the associated preorder. If
the preorder is respected, the runner proceeds with assignment (again delegated
to the outer runner), otherwise it reports a monotonicity violation. We may not
assume that every reference has an associated preorder, because user code could
pass to mput a reference that was created earlier outside the scope of the runner.
If this happens, the runner simply kills the offending user code with a signal.
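
For readers who prefer running code, here is a minimal Haskell sketch of the same dynamic check (our own names, using IORef instead of a runner; there is no preorder map here, so the NoPreorderFound case does not arise): a reference is bundled with its preorder, and an assignment violating it fails.

  import Data.IORef

  -- A reference paired with the preorder its updates must respect.
  data MRef a = MRef (IORef a) (a -> a -> Bool)

  mref :: a -> (a -> a -> Bool) -> IO (MRef a)
  mref x rel = do r <- newIORef x
                  pure (MRef r rel)

  mget :: MRef a -> IO a
  mget (MRef r _) = readIORef r

  -- Write only if the new value is above the old one in the preorder.
  mput :: MRef a -> a -> IO ()
  mput (MRef r rel) y = do
    x <- readIORef r
    if rel x y then writeIORef r y
               else ioError (userError "MonotonicityViolation")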
Example 12 (Pairing). Another form of modularity is achieved by pairing runners.
Given two runners {(op x ↦ K_op)_{op ∈ Σ₁}}_{C₁} and {(op′ x ↦ K_{op′})_{op′ ∈ Σ₂}}_{C₂},
e.g., for state and file operations, we can use them side-by-side by combining
them into a single runner with operations Σ₁ + Σ₂ and kernel state C₁ × C₂, as
follows (the co-operations op′ of the second runner are treated symmetrically):
{ op x → let (c,c') = getenv () in
    user
      kernel (K_op x) @ c finally {
        return y @ c'' → return (inl (inl y, c'')),
        (raise e @ c'' → return (inl (inr e, c'')))_{e ∈ E_op},
        (kill s → return (inr s))_{s ∈ S₁} }
    with {
      return (inl (inl y, c'')) → setenv (c'', c'); return y,
      return (inl (inr e, c'')) → setenv (c'', c'); raise e,
      return (inr s) → kill s},
  op' x → ... , ... }_{C₁ × C₂}

Notice how the inner kernel context switch passes to the co-operation K_op only
its part of the combined state, and how it returns the result of K_op in a reified
form (which requires treating exceptions and signals as values). The outer user
context switch then receives this reified result, updates the combined state, and
forwards the result (return value, exception, or signal) in unreified form.
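
Ignoring exceptions and signals, the pairing construction can be sketched in a few lines of Haskell (a simplification under our own names, not the Haskell-Coop pairing mentioned in §7): a runner is a state-passing interpreter for a signature, and two runners combine into one over the sum of signatures and the product of kernel states.

  {-# LANGUAGE RankNTypes #-}

  -- A runner for signature op with kernel state c: each operation is a
  -- co-operation transforming the state and producing a result.
  type Runner op c = forall a. op a -> c -> (a, c)

  -- Sum of two operation signatures.
  data Sum f g a = InL (f a) | InR (g a)

  -- Pair two runners: each co-operation touches only its own state half.
  pair :: Runner f c1 -> Runner g c2 -> Runner (Sum f g) (c1, c2)
  pair r1 r2 op (c1, c2) = case op of
    InL o -> let (a, c1') = r1 o c1 in (a, (c1', c2))
    InR o -> let (a, c2') = r2 o c2 in (a, (c1, c2'))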

7 Implementation
We accompany the theoretical development with two implementations of λcoop :
a prototype language Coop [6], and a Haskell library Haskell-Coop [1].
Coop, implemented in OCaml, demonstrates what a more fully-featured
language based on λcoop might look like. It implements a bi-directional variant
of λcoop ’s type system, extended with type definitions and algebraic datatypes,
to provide algorithmic typechecking and type inference. The operational seman-
tics is based on the computation rules of the equational theory from §4.4, but
extended with general recursion, pairing of runners from Example 12, and an in-
terface to the OCaml runtime called containers—these are essentially top-level
runners defined directly in OCaml. They are a modular and systematic way of
offering several possible top-level runtime environments to the programmer.
The Haskell-Coop library is a shallow embedding of λcoop in Haskell. The
implementation closely follows the denotational semantics of λcoop . For instance,
user and kernel monads are implemented as corresponding Haskell monads.
Internally, the library uses the Freer monad of Kiselyov [14] to implement free
model monads for given signatures of operations. The library also provides a
means to run user code via Haskell’s top-level monads. For instance, code
that performs input-output operations may be run in Haskell’s IO monad.
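
To give a feel for the internal representation, here is a minimal freer-monad construction in the style of [14] (our own rendering, not the actual Haskell-Coop internals): computations are trees of operations from a signature f, with Pure leaves.

  {-# LANGUAGE GADTs #-}

  -- Free model monad over a signature f: either a value, or an operation
  -- together with a continuation expecting its result.
  data Freer f a where
    Pure   :: a -> Freer f a
    Impure :: f x -> (x -> Freer f a) -> Freer f a

  instance Functor (Freer f) where
    fmap g (Pure a)      = Pure (g a)
    fmap g (Impure op k) = Impure op (fmap g . k)

  instance Applicative (Freer f) where
    pure = Pure
    Pure g      <*> m = fmap g m
    Impure op k <*> m = Impure op (\x -> k x <*> m)

  instance Monad (Freer f) where
    Pure a      >>= g = g a
    Impure op k >>= g = Impure op (\x -> k x >>= g)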
Haskell’s advanced features make it possible to use Haskell-Coop to
implement several extensions to the examples from §6. For instance, we implement
ML-style state that allows references holding arbitrary values (of different types),
and state that uses Haskell’s type system to track which references are alive.
The library also provides pairing of runners from Example 12, e.g., to combine
state and input-output. We also use the library to demonstrate that ambient
functions from the Koka language [18] can be implemented with runners by
treating their binding and application as co-operations. (These are functions
that are bound dynamically but evaluated in the lexical scope of their binding.)

8 Related work
Comodels and (ordinary) runners have been used as a natural model of stateful
top-level behaviour. For instance, Plotkin and Power [27] have given a treatment
of operational semantics using the tensor product of a model and a comodel.
Recently, Katsumata, Rivas, and Uustalu have generalised this interaction of
models and comodels to monads and comonads [13]. An early version of Eff [4]
implemented resources, which were a kind of stateful runner, although they
lacked a satisfactory theory. Uustalu [35] has pointed out that runners are the
additional structure that one has to impose on state to run algebraic effects
statefully. Møgelberg and Staton’s [21] linear-use state-passing translation also
relies on equipping the state with a comodel structure for the effects at hand.
Our runners arise when their setup is specialised to a certain Kleisli adjunction.
Our use of kernel state is analogous to the use of parameters in parameter-
passing handlers [30]: their return clause also provides a form of finalisation, as
the final value of the parameter is available. There is however no guarantee of
finalisation happening because handlers need not use the continuation linearly.
The need to tame the excessive generality of handlers, and willingness to give
it up in exchange for efficiency and predictability, has recently been recognised
by Multicore OCaml’s implementors, who have observed that in practice
most handlers resume continuations precisely once [9]. In exchange for impres-
sive efficiency, they require continuations to be used linearly by default, whereas
discarding and copying must be done explicitly, incurring additional cost. Lei-
jen [17] has extended handlers in Koka with a finally clause, whose semantics
ensures that finalisation happens whenever a handler discards its continuation.
Leijen also added an initially clause to parameter-passing handlers, which is used
to compute the initial value of the parameter before handling, but that gets
executed again every time the handler resumes its continuation.

9 Conclusion and future work


We have shown that effectful runners form a mathematically natural and mod-
ular model of resources, modelling not only the top level external resources, but
allowing programmers to also define their own intermediate “virtual machines”.
Effectful runners give rise to a bona fide programming concept, an idea we have
captured in a small calculus, called λcoop , which we have implemented both as a
language and a library. We have given λcoop an algebraically natural denotational
semantics, and shown how to program with runners through various examples.
We leave combining runners and general effect handlers for future work. As
runners are essentially affine handlers, inspired by Multicore OCaml we also
plan to investigate efficient compilation for runners. On the theoretical side, by
developing semantics in a Sub(Cpo)-enriched setting [32], we plan to support
recursion at all levels, and remove the distinction between ground and arbitrary
types. Finally, by using proof-relevant subtyping [34] and synthesis of lenses [20],
we plan to upgrade subtyping from a simple inclusion to relating types by lenses.

Acknowledgements We thank Daan Leijen for useful discussions about initialisation
and finalisation in Koka, as well as ambient values and ambient functions.
We thank Guillaume Munch-Maccagnoni and Matija Pretnar for discussing re-
sources and potential future directions for λcoop . We are also grateful to the
participants of the NII Shonan Meeting “Programming and reasoning with alge-
braic effects and effect handlers” for feedback on an early version of this work.
This project has received funding from the European Union’s Hori-
zon 2020 research and innovation programme under the Marie
Skłodowska-Curie grant agreement No 834146.
This material is based upon work supported by the Air Force Office of Scientific
Research under award number FA9550-17-1-0326.

References
1. Ahman, D.: Library Haskell-Coop. Available at https://github.com/danelahman/haskell-coop (2019)
2. Ahman, D., Fournet, C., Hritcu, C., Maillard, K., Rastogi, A., Swamy, N.: Recalling
a witness: foundations and applications of monotonic state. PACMPL 2(POPL),
65:1–65:30 (2018)
3. Bauer, A., Pretnar, M.: An effect system for algebraic effects and handlers. Logical
Methods in Computer Science 10(4) (2014)
4. Bauer, A., Pretnar, M.: Programming with algebraic effects and handlers. J. Log.
Algebr. Meth. Program. 84(1), 108–123 (2015)
5. Bauer, A.: What is algebraic about algebraic effects and handlers? CoRR
abs/1807.05923 (2018)
6. Bauer, A.: Programming language coop. Available at https://github.com/andrejbauer/coop (2019)
7. Benton, N., Kennedy, A.: Exceptional syntax. Journal of Functional Programming
11(4), 395–410 (2001)
8. Delignat-Lavaud, A., Fournet, C., Kohlweiss, M., Protzenko, J., Rastogi, A.,
Swamy, N., Zanella-Beguelin, S., Bhargavan, K., Pan, J., Zinzindohoue, J.K.: Im-
plementing and proving the TLS 1.3 record layer. In: 2017 IEEE Symp. on Security
and Privacy (SP). pp. 463–482 (2017)
9. Dolan, S., Eliopoulos, S., Hillerström, D., Madhavapeddy, A., Sivaramakrishnan,
K.C., White, L.: Concurrent system programming with effect handlers. In: Wang,
M., Owens, S. (eds.) Trends in Functional Programming. pp. 98–117. Springer
International Publishing, Cham (2018)
10. Foster, J.N., Greenwald, M.B., Moore, J.T., Pierce, B.C., Schmitt, A.: Combinators
for bidirectional tree transformations: A linguistic approach to the view-update
problem. ACM Trans. Program. Lang. Syst. 29(3) (2007)
11. Hyland, M., Plotkin, G., Power, J.: Combining effects: Sum and tensor. Theor.
Comput. Sci. 357(1–3), 70–99 (2006)
12. Kammar, O., Lindley, S., Oury, N.: Handlers in action. In: Proc. of 18th ACM
SIGPLAN Int. Conf. on Functional Programming, ICFP 2013. ACM (2013)
13. Katsumata, S., Rivas, E., Uustalu, T.: Interaction laws of monads and comonads.
CoRR abs/1912.13477 (2019)
14. Kiselyov, O., Ishii, H.: Freer monads, more extensible effects. In: Proc. of 2015
ACM SIGPLAN Symp. on Haskell. pp. 94–105. Haskell ’15, ACM (2015)
15. Koopman, P., Fokker, J., Smetsers, S., van Eekelen, M., Plasmeijer, R.: Functional
Programming in Clean. University of Nijmegen (1998), draft
16. Leijen, D.: Structured asynchrony with algebraic effects. In: Proceedings of
the 2nd ACM SIGPLAN International Workshop on Type-Driven Development,
TyDe@ICFP 2017, Oxford, UK, September 3, 2017. pp. 16–29. ACM (2017)
17. Leijen, D.: Algebraic effect handlers with resources and deep finalization. Tech.
Rep. MSR-TR-2018-10, Microsoft Research (April 2018)
18. Leijen, D.: Programming with implicit values, functions, and control (or, implicit
functions: Dynamic binding with lexical scoping). Tech. Rep. MSR-TR-2019-7,
Microsoft Research (March 2019)
19. Levy, P.B.: Call-By-Push-Value: A Functional/Imperative Synthesis, Semantics
Structures in Computation, vol. 2. Springer (2004)
20. Miltner, A., Maina, S., Fisher, K., Pierce, B.C., Walker, D., Zdancewic, S.: Synthe-
sizing symmetric lenses. Proc. ACM Program. Lang. 3(ICFP), 95:1–95:28 (2019)
21. Møgelberg, R.E., Staton, S.: Linear usage of state. Logical Methods in Computer
Science 10(1) (2014)
22. Moggi, E.: Computational lambda-calculus and monads. In: Proc. of 4th Ann.
Symp. on Logic in Computer Science, LICS 1989. pp. 14–23. IEEE (1989)
23. Moggi, E.: Notions of computation and monads. Inf. Comput. 93(1), 55–92 (1991)
24. O’Connor, R.: Functor is to lens as applicative is to biplate: Introducing multiplate.
CoRR abs/1103.2841 (2011)
25. Plotkin, G., Power, J.: Semantics for algebraic operations. In: Proc. of 17th Conf. on
the Mathematical Foundations of Programming Semantics, MFPS XVII. ENTCS,
vol. 45, pp. 332–345. Elsevier (2001)
26. Plotkin, G., Power, J.: Algebraic operations and generic effects. Appl. Categor.
Struct. 11(1), 69–94 (2003)
27. Plotkin, G., Power, J.: Tensors of comodels and models for operational semantics.
In: Proc. of 24th Conf. on Mathematical Foundations of Programming Semantics,
MFPS XXIV. ENTCS, vol. 218, pp. 295–311. Elsevier (2008)
28. Plotkin, G.D., Power, J.: Notions of computation determine monads. In: Proc. of
5th Int. Conf. on Foundations of Software Science and Computation Structures,
FOSSACS 2002. LNCS, vol. 2303, pp. 342–356. Springer (2002)
29. Plotkin, G.D., Pretnar, M.: A logic for algebraic effects. In: Proc. of 23th Ann.
IEEE Symp. on Logic in Computer Science, LICS 2008. pp. 118–129. IEEE (2008)
30. Plotkin, G.D., Pretnar, M.: Handling algebraic effects. Logical Methods in Com-
puter Science 9(4:23) (2013)
31. Power, J., Shkaravska, O.: From comodels to coalgebras: State and arrays. Electr.
Notes Theor. Comput. Sci. 106, 297–314 (2004)
32. Power, J.: Enriched Lawvere theories. Theory Appl. Categ. 6(7), 83–93 (1999)
33. Pretnar, M.: The Logic and Handling of Algebraic Effects. Ph.D. thesis, School of
Informatics, University of Edinburgh (2010)
34. Saleh, A.H., Karachalias, G., Pretnar, M., Schrijvers, T.: Explicit effect subtyping.
In: Proc. of 27th European Symposium on Programming, ESOP 2018. pp. 327–354.
LNCS, Springer (2018)
35. Uustalu, T.: Stateful runners of effectful computations. Electr. Notes Theor. Com-
put. Sci. 319, 403–421 (2015)
36. Wadler, P.: The essence of functional programming. In: Sethi, R. (ed.) Proc. of 19th
Ann. ACM SIGPLAN-SIGACT Symp. on Principles of Programming Languages,
POPL 1992. pp. 1–14. ACM (1992)

On the Versatility of Open Logical Relations
Continuity, Automatic Differentiation,
and a Containment Theorem

Gilles Barthe¹⁴, Raphaëlle Crubillé⁴, Ugo Dal Lago²³, and Francesco Gavazzo²³⁴

¹ MPI for Security and Privacy, Bochum, Germany
² University of Bologna, Bologna, Italy
³ INRIA Sophia Antipolis, Sophia Antipolis, France
⁴ IMDEA Software Institute, Madrid, Spain

Abstract. Logical relations are one among the most powerful tech-
niques in the theory of programming languages, and have been used
extensively for proving properties of a variety of higher-order calculi.
However, there are properties that cannot be immediately proved by
means of logical relations, for instance program continuity and differen-
tiability in higher-order languages extended with real-valued functions.
Informally, the problem stems from the fact that these properties are
naturally expressed on terms of non-ground type (or, equivalently, on
open terms of base type), and there is no apparent good definition for
a base case (i.e. for closed terms of ground types). To overcome this is-
sue, we study a generalization of the concept of a logical relation, called
open logical relation, and prove that it can be fruitfully applied in sev-
eral contexts in which the property of interest is about expressions of
first-order type. Our setting is a simply-typed λ-calculus enriched with
real numbers and real-valued first-order functions from a given set, such
as the one of continuous or differentiable functions. We first prove a
containment theorem stating that for any collection of real-valued first-
order functions including projection functions and closed under function
composition, any well-typed term of first-order type denotes a function
belonging to that collection. Then, we show by way of open logical re-
lations the correctness of the core of a recently published algorithm for
forward automatic differentiation. Finally, we define a refinement-based
type system for local continuity in an extension of our calculus with con-
ditionals, and prove the soundness of the type system using open logical
relations.

Keywords: Lambda Calculus · Logical Relations · Continuity Analysis · Automatic Differentiation


The Second and Fourth Authors are supported by the ANR project 16CE250011
REPAS, the ERC Consolidator Grant DIAPASoN – DLV-818616, and the MIUR
PRIN 201784YSZ5 ASPRA.

© The Author(s) 2020
P. Müller (Ed.): ESOP 2020, LNCS 12075, pp. 56–83, 2020.
https://doi.org/10.1007/978-3-030-44914-8_3

1 Introduction

Logical relations have been extremely successful as a way of proving equivalence
between concrete programs as well as correctness of program transformations.
In their “unary” version, they also are a formidable tool to prove termination of
typable programs, through the so-called reducibility technique. The class of pro-
gramming languages in which these techniques have been instantiated includes
not only higher-order calculi with simple types, but also calculi with recursion
[3,2,23], various kinds of effects [14,12,25,36,10,11,34], and concurrency [56,13].
Without any aim to be precise, let us see how reducibility works, in the
setting of a simply typed calculus. The main idea is to define, by induction on
the structure of types, the concept of a well-behaved program, where in the
base case one simply makes reference to the underlying notion of observation
(e.g. being strong normalizing), while the more interesting case is handled by
stipulating that reducible higher-order terms are those which maps reducible
terms to reducible terms, this way exploiting the inductive nature of simple types.
One can even go beyond the basic setting of simple types, and extend reducibility
to, e.g., languages with recursive types [23,2] or even untyped languages [44] by
means of techniques such as step-indexing [3].
The same kind of recipe works in a relational setting, where one wants to
compare programs rather than merely proving properties about them. Again, two
terms are equivalent at base types if they have the same observable behaviour,
while at higher types one wants that equivalent terms are those which map
equivalent arguments to equivalent results.
There are cases, however, in which the property one observes, or the property
on which the underlying notion of program equivalence or correctness is based,
is formulated for types which are not ground (or equivalently, it is formulated
for open expressions). As an example, one could be interested in proving that in
a higher-order type system all first-order expressions compute numerical func-
tions of a specific kind, for example, continuous or derivable ones. We call such
properties first-order properties⁵. As we will describe in Section 3 below, logical
relations do not seem to be applicable off-the-shelf to these cases. Informally,
this is due to the fact that we cannot start by defining a base case for ground
types and then build the relation inductively.
In this paper, we show that logical relations and reducibility can deal with
first-order properties in a compositional way without altering their nature. The
main idea behind the resulting definition, known as open logical relations [59],
consists in parameterizing the set of related terms of a certain type (or the
underlying reducibility set) on a ground environment, this way turning it into a
set of pairs of open terms. As a consequence, one can define the target first-order
property in a natural way.

⁵ To avoid misunderstandings, we emphasize that we use first-order properties to refer
to properties of expressions of first-order types—and not in relation to definability
of properties in first-order predicate logic.

Generalizations of logical relations to open terms have been used by several
authors, and in several (oftentimes unrelated) contexts (see, for instance,
[15,39,47,30,53]). In this paper, we show how open logical relations constitute a
[15,39,47,30,53]). In this paper, we show how open logical relations constitute a
powerful technique to systematically prove first-order properties of programs. In
this respect, the paper’s technical contributions are applications of open logical
relations to three distinct problems.
• In Section 4, we use open logical relations to prove a general Containment
Theorem. Such a theorem serves as a vehicle to introduce open logical re-
lations but is also of independent interest. The theorem states that given a
collection F of real-valued functions including projections and closed under
function composition, any first-order term of a simply-typed λ-calculus en-
dowed with primitives for real numbers and operators computing functions in
F, computes itself a function in F. As an instance of such a result, we see that
any first-order term in a simply-typed λ-calculus extended with primitives
for continuous functions, computes a continuous function. Although the Con-
tainment Theorem can be derived from previous results by Lafont [41] (see
Section 7), our proof is purely syntactical and consists of a straightforward
application of open logical relations.
• In Section 5, we use open logical relations to prove correctness of a core
algorithm for forward automatic differentiation of simply-typed terms. The
algorithm is a fragment of the one presented in [50]. More specifically, any
first-order term is proved to be mapped to another first-order term computing
its derivative, in the usual sense of mathematical analysis. This goes beyond
the Containment Theorem by dealing with relational properties.
• In Section 6, we consider an extended language with an if-then-else con-
struction. When dealing with continuity, the introduction of conditionals in-
validates the Containment Theorem, since conditionals naturally introduce
discontinuities. To overcome this deficiency, we introduce a refinement type
system ensuring that first-order typable terms are continuous functions on
some intended domain, and use open logical relations to prove the soundness
of the type system.
Due to space constraints, many details have to be omitted, but can be found in
an Extended Version of this work [7].

2 The Playground

In order to facilitate the communication of the main ideas behind open logical
relations and their applications, this paper deals with several vehicle calculi. All
such calculi can be seen as derived from a unique calculus, denoted by Λ×,→,R ,
which thus provides the common ground for our inquiry. The calculus Λ×,→,R is
obtained by adding to the simply typed λ-calculus with product and arrow types
(which we denote by Λ×,→ ) a ground type R for real numbers and constants r
of type R, for each real number r.
Given a collection F of real-valued functions, i.e. functions f : Rⁿ → R
(with n ≥ 1), we endow Λ×,→,R with an operator f, for any f ∈ F, whose
intended meaning is that whenever t₁, . . . , tₙ compute real numbers r₁, . . . , rₙ,
then f(t₁, . . . , tₙ) computes f(r₁, . . . , rₙ). We call the resulting calculus Λ×,→,R_F.
Depending on the application we are interested in, we will take as F specific
collections of real-valued functions, such as continuous or differentiable functions.
The syntax and static semantics of Λ×,→,R_F are defined in Figure 1, where
f : Rⁿ → R belongs to F. The static semantics of Λ×,→,R_F is based on judgments
of the form Γ ⊢ t : τ, which have the usual intended meaning. We adopt standard
syntactic conventions as in [6], notably the so-called variable convention. In
particular, we denote by FV(t) the collection of free variables of t and by s[t/x]
the capture-avoiding substitution of the expression t for all free occurrences of
x in s.

  τ ::= R | τ × τ | τ → τ        Γ ::= · | x : τ, Γ

  t ::= x | r | f(t, . . . , t) | λx.t | tt | (t, t) | t.1 | t.2

  Γ, x : τ ⊢ x : τ        Γ ⊢ r : R

  Γ ⊢ t₁ : R  · · ·  Γ ⊢ tₙ : R          Γ, x : τ₁ ⊢ t : τ₂
  -----------------------------          ------------------
     Γ ⊢ f(t₁, . . . , tₙ) : R           Γ ⊢ λx.t : τ₁ → τ₂

  Γ ⊢ s : τ₁ → τ₂   Γ ⊢ t : τ₁     Γ ⊢ t₁ : τ   Γ ⊢ t₂ : σ     Γ ⊢ t : τ₁ × τ₂
  ----------------------------     -----------------------     --------------- (i ∈ {1, 2})
          Γ ⊢ st : τ₂               Γ ⊢ (t₁, t₂) : τ × σ          Γ ⊢ t.i : τᵢ

Fig. 1: Static semantics of Λ×,→,R_F.

We do not confine ourselves to a fixed operational semantics (e.g. a call-by-value
one), but take advantage of the simply-typed nature of Λ×,→,R_F and opt for a
set-theoretic denotational semantics. The category of sets and functions being
cartesian closed, the denotational semantics of Λ×,→,R_F is standard and associates
to any judgment x₁ : τ₁, . . . , xₙ : τₙ ⊢ t : τ a function

  ⟦x₁ : τ₁, . . . , xₙ : τₙ ⊢ t : τ⟧ : ∏ᵢ ⟦τᵢ⟧ → ⟦τ⟧,

where ⟦τ⟧—the semantics of τ—is thus defined:

  ⟦R⟧ = R;    ⟦τ₁ → τ₂⟧ = ⟦τ₂⟧^⟦τ₁⟧;    ⟦τ₁ × τ₂⟧ = ⟦τ₁⟧ × ⟦τ₂⟧.

Due to space constraints, we omit the definition of ⟦Γ ⊢ t : τ⟧ and refer the
reader to any textbook on the subject (such as [43]).

3 A Fundamental Gap

In this section, we will look informally at a problem which, apparently, cannot
be solved using vanilla reducibility or logical relations. This serves both as a
motivating example and as a justification of some of the design choices we had
to make when designing open logical relations.
Consider the simply-typed λ-calculus Λ×,→ , the prototypical example of a
well-behaved higher-order functional programming language. As is well known,
Λ×,→ is strongly normalizing and the technique of logical relations can be applied
on-the-nose. The proof of strong normalization for Λ×,→ is structured around
the definition of a family of reducibility sets of closed terms {Red_τ}_τ, indexed by
types. At any atomic type τ, Red_τ is defined as the set of terms (of type τ) having
the property of interest, i.e. as the collection of strongly normalizing terms. The
set Red_{τ₁→τ₂}, instead, contains those terms which, when applied to a term in
Red_{τ₁}, return a term in Red_{τ₂}. Reducibility sets are afterwards generalised to
open terms, and finally all typable terms are shown to be reducible.
Let us now consider the calculus Λ×,→,R_F, where F contains the addition and
multiplication functions only. This language has already been considered in the
literature, under the name of higher-order polynomials [22,40], which are crucial
tools in higher-order complexity theory and resource analysis. Now, let us ask
ourselves the following question: can we say anything about the nature of those
functions Rⁿ → R which are denoted by (closed) terms of type Rⁿ → R? Of
course, all the polynomials on the real field can be represented, but can we go
beyond, thanks to higher-order constructions? The answer is negative: terms of
type Rⁿ → R represent all and only the polynomials [5,17]. This result is an
instance of the general containment theorem mentioned at the end of Section 1.
Let us now focus on proofs of this containment result. It turns out that proofs
from the literature are not compositional, and rely on "heavyweight" tools,
including strong normalization of Λ×,→ and soundness of the underlying
operational semantics. In fact, proving the result using usual reducibility
arguments would not be immediate, precisely because there is no obvious choice
for the base case. If, for example, we define Red_R as the set of terms strongly
normalizing to a numeral, Red_{Rⁿ→R} as the set of polynomials, and for any
other type as usual, we soon get into trouble: indeed, we would like the two
sets of functions

  Red_{R×R→R};    Red_{R→(R→R)};

to denote essentially the same set of functions, modulo the adjunction between
R² → R and R → (R → R). But this is clearly not the case: just consider the
function f in R → (R → R) thus defined:

  f(x) = λy.y      if x ≥ 0,
  f(x) = λy.y + 1  if x < 0.

Clearly, f turns any fixed real number to a polynomial, but when curried, it
is far from being a polynomial. In other words, reducibility seems apparently
inadequate to capture situations like the one above, in which the “base case” is
not the one of ground types, but rather the one of first-order types.
Before proceeding any further, it is useful to fix the boundaries of our in-
vestigation. We are interested in proving that (the semantics of) programs of
first-order type Rⁿ → R enjoy first-order properties, such as continuity or
differentiability, under their standard interpretation in calculus and real analysis.
More specifically, our results do not cover notions of continuity and differentiabil-
ity studied in fields such as (exact) real-number computation [57] or computable
analysis [58], which have a strong domain-theoretical flavor, and higher-order
generalizations of continuity and differentiability (see, e.g., [26,27,32,29]). We
leave for future work the study of open logical relations in these settings. What
this paper aims to provide is a family of lightweight techniques that can be
used to show that practical properties of interest of real-valued functions are
guaranteed to hold when programs are written taking advantage of higher-order
constructors. We believe that the three case studies we present in this paper are
both a way to point to the practical scenarios we have in mind and of witnessing
the versatility of our methodology.

4 Warming Up: A Containment Theorem


In this section we introduce open logical relations in their unary version (i.e. open
logical predicates). We do so by proving the following Containment Theorem.
Theorem 1 (Containment Theorem). Let F be a collection of real-valued
functions including projections and closed under function composition. Then,
any Λ×,→,R_F term x₁ : R, . . . , xₙ : R ⊢ t : R denotes a function (from Rⁿ to R) in
F. That is, ⟦x₁ : R, . . . , xₙ : R ⊢ t : R⟧ ∈ F.
As already remarked in previous sections, notable instances of Theorem 1
are obtained by taking F as the collection of continuous functions, or as the
collection of polynomials.
Our strategy to prove Theorem 1 consists in defining a logical predicate,
denoted by F, ensuring that the denotation of programs of first-order type is
in F, and hereditarily preserving this property at higher-order types. However, F
being a property of real-valued functions—and the denotation of an open term
of the form x₁ : R, . . . , xₙ : R ⊢ t : R being such a function—we shall work with
open terms with free variables of type R and parametrize the candidate logical
predicate by types and environments Θ containing such variables.
This way, we obtain a family of logical predicates F^Θ_τ acting on terms of the
form Θ ⊢ t : τ. As a consequence, when considering the ground type R and an
environment Θ = x₁ : R, . . . , xₙ : R, we obtain a predicate F^Θ_R on expressions
Θ ⊢ t : R, which naturally correspond to functions from Rⁿ to R, for which
belonging to F is indeed meaningful.
Definition 1 (Open Logical Predicate). Let Θ = x₁ : R, . . . , xₙ : R be a fixed
environment. We define the type-indexed family of predicates F^Θ_τ by induction
on τ as follows:

  t ∈ F^Θ_R        ⟺ (Θ ⊢ t : R ∧ ⟦Θ ⊢ t : R⟧ ∈ F)
  t ∈ F^Θ_{τ₁→τ₂}  ⟺ (Θ ⊢ t : τ₁ → τ₂ ∧ ∀s ∈ F^Θ_{τ₁}. ts ∈ F^Θ_{τ₂})
  t ∈ F^Θ_{τ₁×τ₂}  ⟺ (Θ ⊢ t : τ₁ × τ₂ ∧ ∀i ∈ {1, 2}. t.i ∈ F^Θ_{τᵢ}).
We extend F^Θ_τ to the predicate F^{Γ,Θ}_τ, where Γ ranges over arbitrary
environments (possibly containing variables of type R), as follows:

  t ∈ F^{Γ,Θ}_τ ⟺ (Γ, Θ ⊢ t : τ ∧ ∀γ. γ ∈ F^Γ_Θ ⟹ tγ ∈ F^Θ_τ).

Here, γ ranges over substitutions⁶ and γ ∈ F^Γ_Θ holds if the support of γ is Γ and
γ(x) ∈ F^Θ_τ, for any (x : τ) ∈ Γ.

Notice that Definition 1 ensures that first-order real-valued functions are in F,
and asks for such a property to be hereditarily preserved at higher-order types.
Lemma 1 states that these conditions are indeed sufficient to guarantee that any
Λ×,→,R_F term Θ ⊢ t : R denotes a function in F.

Lemma 1 (Fundamental Lemma). For all environments Γ, Θ as above, and
for any expression Γ, Θ ⊢ t : τ, we have t ∈ F^{Γ,Θ}_τ.

Proof. By induction on t, observing that F^Θ_τ is closed under denotational
semantics: if s ∈ F^Θ_τ and ⟦Θ ⊢ t : τ⟧ = ⟦Θ ⊢ s : τ⟧, then t ∈ F^Θ_τ. The proof
follows the same structure as that of Lemma 3, and thus we omit the details here.

Finally, a straightforward application of Lemma 1 gives the desired result,
namely Theorem 1.

5 Automatic Differentiation

In this section, we show how we can use open logical relations to prove the
correctness of (a fragment of) the automatic differentiation algorithm of [50]
(suitably adapted to our calculus).
Automatic differentiation [8,9,35] (AD, for short) is a family of techniques
to efficiently compute the numerical (as opposed to symbolical ) derivative of
a computer program denoting a real-valued function. Roughly speaking, AD
acts on the code of a program by letting variables incorporate values for their
derivative, and operators propagate derivatives according to the chain rule of
differential calculus [52]. Due to its vast applications in machine learning (back-
propagation [49] being an example of an AD technique) and, most notably, in
deep learning [9], AD is rapidly becoming a topic of interest in the programming
language theory community, as witnessed by the new line of research called dif-
ferentiable programming (see, e.g., [28,50,16,1] for some recent results on AD
and programming language theory developed in the latter field).
AD comes in several modes, the two most important ones being the forward
mode (also called tangent mode) and the backward mode (also called reverse
mode). These can be seen as different ways to compute the chain rule, the former
by traversing the chain rule from inside to outside, while the latter from outside
to inside.
⁶ We write tγ for the result of applying γ to the variables in t.

Here we are concerned with forward mode AD. More specifically, we consider
the forward mode AD algorithm recently proposed in [50]. The latter is based
on a source-to-source program transformation extracting out of a program t a
new program Dt whose evaluation simultaneously gives the result of computing
t and its derivative. This is achieved by augmenting the code of t in such a way
as to handle dual numbers⁷.
The transformation roughly goes as follows: expressions s of type R are
transformed into dual numbers, i.e. expressions of type R × R, whose first
component gives the original value of s, and whose second component gives the
derivative of s. Real-valued function symbols are then extended to handle dual
numbers by applying the chain rule, while the other constructors of the language
are extended pointwise.
The algorithm of [50] has been studied by means of benchmarks and, to the
best of the authors’ knowledge, the only proof of its correctness available in the
literature⁸ has been given, at the time of writing, by Huot et al. in [37]. However,
the latter proof relies on denotational semantics, and no operational proof of
correctness has been given so far. Differentiability being a first-order concept,
open logical relations are thus a perfect candidate for such a job.

An AD Program Transformation. In the rest of this section, given a differentiable
function f : Rⁿ → R, we denote by ∂_x f : Rⁿ → R its partial derivative with
respect to the variable x. Let D be the collection of (real-valued) differentiable
functions, and let us fix a collection F of real-valued functions such that, for any
f ∈ D, both f and ∂_x f belong to F. We also assume F to contain functions for
real-number arithmetic. Notice that since ∂_x f is not necessarily differentiable,
in general ∂_x f ∉ D.
We begin by recalling how the program transformation of [50] works on
Λ×,→,R_D, the extension of Λ×,→,R with operators for functions in D. In order
to define the derivative of a Λ×,→,R_D expression, we first define an intermediate
program transformation D : Λ×,→,R_D → Λ×,→,R_F such that:

  Γ ⊢ t : τ ⟹ DΓ ⊢ Dt : Dτ.

The action of D on types, environments, and expressions is defined in Figure 2.
Notice that t is an expression in Λ×,→,R_D, whereas Dt is an expression in Λ×,→,R_F.
Let us comment on the definition of D, beginning with its action on types.
Following the rationale behind forward-mode AD, the map D associates to the type
⁷ We represent dual numbers [21] as pairs of the form (x, x′), with x, x′ ∈ R. The first
component, namely x, is subject to the usual real-number arithmetic, whereas the
second component, namely x′, obeys first-order differentiation arithmetic. Dual
numbers are usually presented, in analogy with complex numbers, as formal sums
of the form x + x′ε, where ε is an abstract number (an infinitesimal) subject to the
law ε² = 0.
⁸ However, we remark that formal approaches to backward automatic differentiation
for higher-order languages have been recently proposed in [1,16] (see Section 7).
  DR = R × R        D(τ₁ × τ₂) = Dτ₁ × Dτ₂        D(τ₁ → τ₂) = Dτ₁ → Dτ₂

  D(·) = ·        D(x : τ, Γ) = dx : Dτ, DΓ

  Dx = dx        Dr = (r, 0)

  D(f(t₁, . . . , tₙ)) = (f(Dt₁.1, . . . , Dtₙ.1), Σᵢ₌₁ⁿ ∂_{xᵢ}f(Dt₁.1, . . . , Dtₙ.1) ∗ Dtᵢ.2)

  D(λx.t) = λdx.Dt        D(st) = (Ds)(Dt)        D(t.i) = Dt.i        D(t₁, t₂) = (Dt₁, Dt₂)

Fig. 2: Intermediate transformation D

R the product type R × R, the first and second components of its inhabitants
being the original expression and its derivative, respectively. The action of D
on non-basic types is straightforward and is designed so that the automatic
differentiation machinery can handle higher-order expressions in such a way as
to guarantee correctness at real-valued function types.
The action of D on the usual constructors of the λ-calculus is pointwise,
although it is worth noticing that D associates to any variable x of type τ a new
variable, which we denote by dx, of type Dτ . As we are going to see, if τ = R,
then dx acts as a placeholder for a dual number.
More interesting is the action of D on real-valued constructors. To any
numeral r, D associates the pair Dr = (r, 0), the derivative of a constant being
zero. Let us now inspect the action of D on an operator f associated to f : Rⁿ → R
(we treat f as a function in the variables x₁, . . . , xₙ). The interesting part is the
second component of D(f(t₁, . . . , tₙ)), namely

  Σᵢ₌₁ⁿ ∂_{xᵢ}f(Dt₁.1, . . . , Dtₙ.1) ∗ Dtᵢ.2

where Σᵢ₌₁ⁿ and ∗ denote the operators (of Λ×,→,R_F) associated to summation
and (binary) multiplication (for readability we omit the underline notation), and
∂_{xᵢ}f is the operator (of Λ×,→,R_F) associated to the partial derivative ∂_{xᵢ}f of f
in the variable xᵢ. It is not hard to recognize that the above expression is nothing
but an instance of the chain rule.
Finally, we notice that if Γ ⊢ t : τ is a (derivable) judgment in Λ×,→,R_D, then
indeed DΓ ⊢ Dt : Dτ is a (derivable) judgment in Λ×,→,R_F.
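
As a quick illustration, the action of D on a fragment of the calculus with addition and multiplication can be transcribed almost literally into Haskell (a sketch over a hypothetical first-order AST of our own; underlining and typing are elided):

  -- Terms of a tiny fragment of the calculus (our own encoding).
  data Tm = Var String | Lit Double | Add Tm Tm | Mul Tm Tm
          | Lam String Tm | App Tm Tm | Pair Tm Tm | Proj Int Tm

  -- The transformation D of Figure 2: Dx = dx, Dr = (r, 0), and the
  -- operator clauses are instances of the chain rule.
  d :: Tm -> Tm
  d (Var x)    = Var ("d" ++ x)
  d (Lit r)    = Pair (Lit r) (Lit 0)
  d (Add s t)  = Pair (Add (p1 (d s)) (p1 (d t)))
                      (Add (p2 (d s)) (p2 (d t)))        -- (s + t)' = s' + t'
  d (Mul s t)  = Pair (Mul (p1 (d s)) (p1 (d t)))
                      (Add (Mul (p1 (d s)) (p2 (d t)))   -- product rule:
                           (Mul (p2 (d s)) (p1 (d t))))  -- s.1*t.2 + s.2*t.1
  d (Lam x t)  = Lam ("d" ++ x) (d t)
  d (App s t)  = App (d s) (d t)
  d (Pair s t) = Pair (d s) (d t)
  d (Proj i t) = Proj i (d t)

  p1, p2 :: Tm -> Tm
  p1 = Proj 1
  p2 = Proj 2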

Example 1. Let us consider the binary function f(x₁, x₂) = sin(x₁) + cos(x₂).
For readability, we overload the notation, writing f in place of f (and similarly
for ∂_{xᵢ}f). Given expressions t₁, t₂, we compute D(sin(t₁) + cos(t₂)). Recall that
∂_{x₁}f(x₁, x₂) = cos(x₁) and ∂_{x₂}f(x₁, x₂) = −sin(x₂). We have:

  D(sin(t₁) + cos(t₂))
  = (sin(Dt₁.1) + cos(Dt₂.1), ∂_{x₁}f(Dt₁.1, Dt₂.1) ∗ Dt₁.2 + ∂_{x₂}f(Dt₁.1, Dt₂.1) ∗ Dt₂.2)
  = (sin(Dt₁.1) + cos(Dt₂.1), cos(Dt₁.1) ∗ Dt₁.2 − sin(Dt₂.1) ∗ Dt₂.2).

As a consequence, we see that D(λx.λy. sin(x) + cos(y)) is

  λdx.λdy.(sin(dx.1) + cos(dy.1), cos(dx.1) ∗ dx.2 − sin(dy.1) ∗ dy.2).
We now aim to define the derivative of an expression x₁ : R, . . . , xₙ : R ⊢ t : R
with respect to a variable x (of type R). In order to do so, we first associate to
any variable y : R its dual expression dual_x(y) : R × R, defined as:

  dual_x(y) = (y, 1)  if x = y,
  dual_x(y) = (y, 0)  otherwise.

Next, we define for x₁ : R, . . . , xₙ : R ⊢ t : R the derivative deriv(x, t) of t with
respect to x as:

  deriv(x, t) = Dt[dual_x(x₁)/dx₁, . . . , dual_x(xₙ)/dxₙ].2

Let us clarify this passage with a simple example.
Example 2. Let us compute the derivative of x : R, y : R ⊢ t : R, where t = x ∗ y.
We first of all compute Dt, obtaining:

  dx : R × R, dy : R × R ⊢ ((dx.1) ∗ (dy.1), (dx.1) ∗ (dy.2) + (dx.2) ∗ (dy.1)) : R × R.

Observing that dual_x(x) = (x, 1) and dual_x(y) = (y, 0), we indeed obtain the
desired derivative as x : R, y : R ⊢ Dt[dual_x(x)/dx, dual_x(y)/dy].2 : R. For we
have:

  ⟦x : R, y : R ⊢ Dt[dual_x(x)/dx, dual_x(y)/dy].2 : R⟧
  = ⟦x : R, y : R ⊢ (x ∗ y, x ∗ 0 + 1 ∗ y).2 : R⟧
  = ⟦x : R, y : R ⊢ y : R⟧ = ∂_x ⟦x : R, y : R ⊢ x ∗ y : R⟧.
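
The semantic counterpart of this computation is ordinary dual-number arithmetic, which we can cross-check in a few lines of Haskell (our own minimal encoding, independent of the calculus; pr1 and pr2 play the role of .1 and .2):

  -- A value together with its derivative (cf. footnote 7).
  data Dual = Dual { pr1 :: Double, pr2 :: Double } deriving Show

  instance Num Dual where
    Dual x x' + Dual y y' = Dual (x + y) (x' + y')
    Dual x x' * Dual y y' = Dual (x * y) (x * y' + x' * y)  -- product rule
    negate (Dual x x')    = Dual (negate x) (negate x')
    abs    (Dual x x')    = Dual (abs x) (signum x * x')
    signum (Dual x _)     = Dual (signum x) 0
    fromInteger n         = Dual (fromInteger n) 0          -- Dr = (r, 0)

  -- deriv(x, x * y) at the point (x, y) = (2, 5):
  -- dual_x(x) = (2, 1) and dual_x(y) = (5, 0).
  example :: Double
  example = pr2 (Dual 2 1 * Dual 5 0)   -- 5.0, i.e. the value of y, as in Example 2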
Remark 1. For Θ = x₁ : R, . . . , xₙ : R we have Θ ⊢ dual_y(xᵢ) : DR and
Θ ⊢ Ds[dual_y(x₁)/dx₁, . . . , dual_y(xₙ)/dxₙ] : Dτ, for any variable y and Θ ⊢ s : τ.

Open Logical Relations for AD. We have claimed that the operation deriv
performs automatic differentiation of Λ×,→,R_D expressions. By that we mean
that, once applied to expressions of the form x₁ : R, . . . , xₙ : R ⊢ t : R, the
operation deriv can be used to compute the derivative of ⟦x₁ : R, . . . , xₙ : R ⊢ t : R⟧.
We now show how we can prove such a statement using open logical relations,
this way providing a proof of correctness of our AD program transformation.
We begin by defining a logical relation R between Λ×,→,R_D and Λ×,→,R_F
expressions. We design R in such a way that (i) t R Dt and (ii) if t R s and t
inhabits a first-order type, then s indeed corresponds to the derivative of t. While
(ii) essentially holds by definition, (i) requires some effort to be proved.

Definition 2 (Open Logical Relation). Let Θ = x₁ : R, . . . , xₙ : R be a fixed,
arbitrary environment. Define the family of relations (R^Θ_τ)_{Θ,τ} between Λ×,→,R_D
and Λ×,→,R_F expressions by induction on τ as follows:

  t R^Θ_R s ⟺ Θ ⊢ t : R ∧ DΘ ⊢ s : R × R ∧ ∀y : R.
      ⟦Θ ⊢ s[dual_y(x₁)/dx₁, . . . , dual_y(xₙ)/dxₙ].1 : R⟧ = ⟦Θ ⊢ t : R⟧
      ∧ ⟦Θ ⊢ s[dual_y(x₁)/dx₁, . . . , dual_y(xₙ)/dxₙ].2 : R⟧ = ∂_y ⟦Θ ⊢ t : R⟧

  t R^Θ_{τ₁→τ₂} s ⟺ Θ ⊢ t : τ₁ → τ₂ ∧ DΘ ⊢ s : Dτ₁ → Dτ₂
      ∧ ∀p, q. p R^Θ_{τ₁} q ⟹ tp R^Θ_{τ₂} sq

  t R^Θ_{τ₁×τ₂} s ⟺ Θ ⊢ t : τ₁ × τ₂ ∧ DΘ ⊢ s : Dτ₁ × Dτ₂
      ∧ ∀i ∈ {1, 2}. t.i R^Θ_{τᵢ} s.i

We extend R^Θ_τ to the family (R^{Γ,Θ}_τ)_{Γ,Θ,τ}, where Γ ranges over arbitrary
environments (possibly containing variables of type R), as follows:

  t R^{Γ,Θ}_τ s ⟺ (Γ, Θ ⊢ t : τ) ∧ (DΓ, DΘ ⊢ s : Dτ) ∧ (∀γ, δ. γ R^Γ_Θ δ ⟹ tγ R^Θ_τ sδ)

where γ, δ range over substitutions, and:

  γ R^Γ_Θ δ ⟺ (supp(γ) = Γ) ∧ (supp(δ) = DΓ) ∧ (∀(x : τ) ∈ Γ. γ(x) R^Θ_τ δ(dx)).

Obviously, Definition 2 satisfies condition (ii) above. What remains to be
done is to show that it satisfies condition (i) as well. In order to prove such a
result, we first need to show that the logical relation respects the denotational
semantics of Λ×,→,R_D.

Lemma 2. Let Θ = x₁ : R, . . . , xₙ : R. Then, the following hold:

  t R^Θ_τ s ∧ ⟦Θ ⊢ t : τ⟧ = ⟦Θ ⊢ t′ : τ⟧ ⟹ t′ R^Θ_τ s
  t R^Θ_τ s ∧ ⟦DΘ ⊢ s : Dτ⟧ = ⟦DΘ ⊢ s′ : Dτ⟧ ⟹ t R^Θ_τ s′.

Proof. A standard induction on τ.

We are now ready to state and prove the main result of this section.

Lemma 3 (Fundamental Lemma). For all environments Γ, Θ and for any
expression Γ, Θ ⊢ t : τ, we have t R^{Γ,Θ}_τ Dt.

Proof. We prove the following statement, by induction on t:

  ∀t. ∀τ. ∀Γ, Θ. (Γ, Θ ⊢ t : τ ⟹ t R^{Γ,Θ}_τ Dt).

We show only the most relevant cases. Suppose t is a variable x. We distinguish
whether x belongs to Γ or Θ.

1. Suppose (x : R) ∈ Θ. We have to show x R^{Γ,Θ}_R dx, i.e.

     ⟦Θ ⊢ dx[dual_y(x)/dx].1 : R⟧ = ⟦Θ ⊢ x : R⟧
     ⟦Θ ⊢ dx[dual_y(x)/dx].2 : R⟧ = ∂_y ⟦Θ ⊢ x : R⟧

   for any variable y (of type R). The first identity obviously holds, as

     ⟦Θ ⊢ dx[dual_y(x)/dx].1 : R⟧ = ⟦Θ ⊢ dx[(x, b)/dx].1 : R⟧ = ⟦Θ ⊢ x : R⟧,

   where b ∈ {0, 1}. For the second identity we distinguish whether y = x or
   y ≠ x. In the former case we have dual_y(x) = (x, 1), and thus:

     ⟦Θ ⊢ dx[dual_y(x)/dx].2 : R⟧ = ⟦Θ ⊢ 1 : R⟧ = ∂_y ⟦Θ ⊢ y : R⟧.

   In the latter case we have dual_y(x) = (x, 0), and thus:

     ⟦Θ ⊢ dx[dual_y(x)/dx].2 : R⟧ = ⟦Θ ⊢ 0 : R⟧ = ∂_y ⟦Θ ⊢ x : R⟧.

2. Suppose (x : τ) ∈ Γ. We have to show x R^{Γ,Θ}_τ dx, i.e. γ(x) R^Θ_τ δ(dx), for all
   substitutions γ, δ such that γ R^Γ_Θ δ. Since x belongs to Γ, we are trivially
   done.

Suppose t is λx.s, so that we have

  Γ, Θ, x : τ₁ ⊢ s : τ₂
  ----------------------
  Γ, Θ ⊢ λx.s : τ₁ → τ₂

for some types τ₁, τ₂. As x is bound in λx.s, without loss of generality we can
assume (x : τ₁) ∉ Γ ∪ Θ. Let Δ = Γ, x : τ₁, so that we have Δ, Θ ⊢ s : τ₂, and
thus s R^{Δ,Θ}_{τ₂} Ds, by the induction hypothesis. By definition of open logical relation,
we have to prove that for arbitrary γ, δ such that γ R^Γ_Θ δ, we have

  λx.sγ R^Θ_{τ₁→τ₂} λdx.(Ds)δ,

i.e. (λx.sγ)p R^Θ_{τ₂} (λdx.(Ds)δ)q, for all p R^Θ_{τ₁} q. Let us fix a pair (p, q) as above.
By Lemma 2, it is sufficient to show (sγ)[p/x] R^Θ_{τ₂} ((Ds)δ)[q/dx]. Let γ′, δ′ be the
substitutions defined as follows:

  γ′(y) = p if y = x, and γ′(y) = γ(y) otherwise;
  δ′(y) = q if y = dx, and δ′(y) = δ(y) otherwise.

It is easy to see that γ′ R^Δ_Θ δ′, so that by s R^{Δ,Θ}_{τ₂} Ds (recall that the latter follows
from the induction hypothesis) we infer sγ′ R^Θ_{τ₂} (Ds)δ′, by the very definition of open
logical relation. As a consequence, the thesis is proved if we show

  (sγ)[p/x] = sγ′;        ((Ds)δ)[q/dx] = (Ds)δ′.

The above identities hold if x ∉ FV(γ(y)) and dx ∉ FV(δ(dy)), for any (y :
τ) ∈ Γ. This is indeed the case, since γ(y) R^Θ_τ δ(dy) implies Θ ⊢ γ(y) : τ and
DΘ ⊢ δ(dy) : Dτ, and x ∉ Θ (and thus dx ∉ DΘ).

A direct application of Lemma 3 allows us to conclude the correctness of
the program transformation D. In fact, given a first-order term Θ ⊢ t : R, with
Θ = x₁ : R, . . . , xₙ : R, by Lemma 3 we have t R^Θ_R Dt, and thus

  ∂_y ⟦Θ ⊢ t : R⟧ = ⟦Θ ⊢ Dt[dual_y(x₁)/dx₁, . . . , dual_y(xₙ)/dxₙ].2 : R⟧,

for any real-valued variable y, meaning that Dt indeed computes the partial
derivative of t.

Theorem 2. For any term Θ ⊢ t : R as above, the term DΘ ⊢ Dt : DR computes
the partial derivative of t, i.e., for any variable y we have

  ∂_y ⟦Θ ⊢ t : R⟧ = ⟦Θ ⊢ Dt[dual_y(x₁)/dx₁, . . . , dual_y(xₙ)/dxₙ].2 : R⟧.

6 On Refinement Types and Local Continuity

In Section 4, we exploited open logical relations to establish a containment
theorem for the calculus Λ×,→,R_F, i.e. the calculus Λ×,→,R extended with real-valued
functions belonging to a set F including projections and closed under function
composition. Since the collection C of (real-valued) continuous functions satisfies
both constraints, Theorem 1 allows us to conclude that all first-order terms of
Λ×,→,R_C represent continuous functions.
The aim of the present section is the development of a framework to prove
continuity properties of programs in a calculus that goes beyond Λ×,→,R_C. More
specifically, (i) we do not restrict our analysis to calculi having operators
representing continuous real-valued functions only, but consider operators for
arbitrary real-valued functions, and (ii) we add to our calculus an if-then-else
construct whose static semantics is captured by the following rule:

  Γ ⊢ t : R    Γ ⊢ s : τ    Γ ⊢ p : τ
  -----------------------------------
      Γ ⊢ if t then s else p : τ

The intended dynamic semantics of the term if t then s else p is the same as
the one of s whenever t evaluates to any real number r ≠ 0, and the same as the
one of p if it evaluates to 0.

Notice that the crux of the problem we aim to solve is the presence of the
if-then-else construct. Indeed, independently of point (i), such a construct breaks
the global continuity of programs, as illustrated in Figure 3a. As a consequence,
we are forced to look at local continuity properties instead: for instance, we
can say that the program of Figure 3a is continuous both on R<0 and R≥0.
Observe that guaranteeing local continuity allows us (up to a certain point) to
recover the ability to approximate the output of a program by approximating
its input. Indeed, if a program t : R × . . . × R → R is locally continuous on a
subset X of Rⁿ, then the value of t s (for some input s) can be approximated
(a) t = λx.if x < 0 then −x else x + 1        (b) t = λx.if x < 0 then 1 else x + 1

Fig. 3: Simply typed first-order programs with branches (the original shows the
two plots of t(x) against x).

by passing as argument to t a family (sₙ)ₙ∈ℕ of approximations of s, as long as
both s and all the (sₙ)ₙ∈ℕ are indeed elements of X. Notice that the continuity
domains we are interested in are not necessarily open sets: we could for instance
be interested in functions that are continuous on the unit circle, i.e. the points
{(a, b) | a² + b² = 1} ⊆ R². For this reason we will work with the notion
of sequential continuity, instead of the usual topological notion of continuity.
It must be observed, however, that these two notions coincide as soon as the
continuity domain X is actually an open set.
Definition 3 (Sequential Continuity). Let f : Rⁿ → R, and let X be any subset
of Rⁿ. We say that f is (sequentially) continuous on X if for every x ∈ X, and
for every sequence (xₙ)ₙ∈ℕ of elements of X such that limₙ→∞ xₙ = x, it holds
that limₙ→∞ f(xₙ) = f(x).
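
As a quick sanity check (an example we add here, not in the original text), consider the program t = λx.if x < 0 then −x else x + 1 from Figure 3a on X = R: for the sequence xₙ = −1/n we have

  xₙ → 0  but  t(xₙ) = 1/n → 0 ≠ 1 = t(0),

so t is not sequentially continuous on R. On X = R<0 or X = R≥0, however, every convergent sequence stays in a region where t agrees with a continuous function, so the definition is satisfied.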
In [18], Chaudhuri et al. introduced a logical system designed to guarantee
local continuity properties on programs in an imperative (first-order) program-
ming language with conditional branches and loops. In this section, we develop
a similar system in the setting of a higher-order functional language with an
if-then-else construct, and we use open logical relations to prove the sound-
ness of our system. This witnesses, on yet another situation, the versatility of
open logical relations. Compared to [18], we somehow generalize from a result
on programs built from only first-order constructs and primitive functions, to a
containment result for programs built using also higher-order constructs.
We however mention that, although our system is inspired by the work of
Chaudhuri et al., there are significant differences between the two, even at the
first-order level. The consequences these differences have on the expressive power
of our systems are twofold:
• On the one hand, while inferring continuity on some domain X of a program
of the form if t then s else p, we have more flexibility than [18] for the
domains of continuity of s and p. To be more concrete, let us consider the
program λx.(if (x > 0) then 0 else (if x = 4 then 1 else 0)), which is
continuous on R even though the second branch is continuous on R≤0 , but
not on R. We are able to show in our system that this program is indeed
continuous on the whole domain R, while Chaudhuri et al. cannot do the
same in their system for the corresponding imperative program: they ask the
domain of continuity of each of the two branches to coincide with the domain
of continuity of the whole program.
• On the other hand, the system of Chaudhuri et al. allows one to express
continuity along a restricted set of variables, which we cannot do. To illustrate
this, let us look at the program λx, y.if (x = 0) then (3 ∗ y) else (4 ∗ y):
along the variable y, this program is continuous on the whole of R. Chaudhuri
et al. are able to express and prove this statement in their system, while we
can only say that for every real a, this program is continuous on the domain
{a} × R.
For the sake of simplicity, it is useful to slightly simplify our calculus; the ideas
we present here, however, would still be valid in a more general setting, at the
cost of a more involved presentation and proofs. As usual, let F be a collection
of real-valued functions. We consider the restriction of the calculus Λ×,→,R_F
obtained by considering only types of the form

  τ ::= R | ρ;        ρ ::= ρ₁ × · · · × ρₙ × R × · · · × R → τ   (with m occurrences of R).

For the sake of readability, we employ the notation (ρ₁, . . . , ρₙ, R, . . . , R) → τ
in place of ρ₁ × · · · × ρₙ × R × · · · × R → τ. We also overload the notation and
keep indicating the resulting calculus as Λ×,→,R_F. Nonetheless, the reader should
keep in mind that from now on, whenever referring to a Λ×,→,R_F term, we are
tacitly referring to a term typable according to the restricted type system, but
that can indeed contain conditionals.
Since we want to be able to talk about composition properties of locally
continuous programs, we actually need to talk not only about the points where
a program is continuous, but also about the image of this continuity domain.
In higher-order languages, a well-established framework for the latter kind of
specifications is the one of refinement types, which were first introduced
in [31] in the context of ML types: the basic idea is to annotate an existing
type system with logical formulas, with the aim of being more precise about
the underlying program’s behaviors than with simple types. Here, we are going to
adapt this framework by replacing the image annotations provided by standard
refinement types with continuity annotations.

6.1 A Refinement Type System Ensuring Local Continuity


Our refinement type system is developed on top of the simple type system of
Section 2 (actually, on the simplification of that system considered in this
section). We first need to introduce a set of logical formulas which talk about
n-tuples of real numbers, and which we use as annotations in our refinement types.
We consider a set V of logical variables, and we construct formulas as follows:

  ψ, φ ∈ L ::= ⊤ | (e ≤ e) | ψ ∧ φ | ¬ψ,
  e ∈ E ::= α | a | f(e, . . . , e)        with α ∈ V, a ∈ R, f : Rⁿ → R.
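
The assertion language and the satisfaction relation σ |= φ defined next are straightforward to implement; the following Haskell sketch (our own encoding, with assignments as finite maps and total functions for the f symbols) makes the semantics concrete.

  import qualified Data.Map as Map

  -- Expressions and formulas of the assertion language L (our encoding).
  data E = LVar String | Cst Double | Fn ([Double] -> Double) [E]
  data L = Top | Leq E E | Conj L L | Neg L

  type Assignment = Map.Map String Double

  evalE :: Assignment -> E -> Double
  evalE sigma (LVar a)  = sigma Map.! a
  evalE _     (Cst c)   = c
  evalE sigma (Fn f es) = f (map (evalE sigma) es)

  -- sigma |= phi, assuming sigma is defined on all variables of phi.
  sat :: Assignment -> L -> Bool
  sat _     Top         = True
  sat sigma (Leq e1 e2) = evalE sigma e1 <= evalE sigma e2
  sat sigma (Conj p q)  = sat sigma p && sat sigma q
  sat sigma (Neg p)     = not (sat sigma p)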



Recall that with the connectives in our logic we are able to encode logical
disjunction and implication, and as customary, we write φ ⇒ ψ for ¬φ ∨ ψ. A
real assignment is a partial map σ : V → R. When σ has finite support, we
sometimes specify σ by writing (α₁ ↦ σ(α₁), . . . , αₙ ↦ σ(αₙ)). We write σ |= φ
when σ is defined on the variables occurring in φ, and moreover the real formula
obtained by replacing along σ the logical variables of φ is true. We write |= φ
when σ |= φ always holds, independently of σ.
We can associate to every formula the subset of Rⁿ consisting of all points
where this formula holds: more precisely, if φ is a formula, and X = α₁, . . . , αₙ
is a list of logical variables such that Vars(φ) ⊆ X, we call the truth domain of φ
w.r.t. X the set:

  Dom(φ)_X = {(a₁, . . . , aₙ) ∈ Rⁿ | (α₁ ↦ a₁, . . . , αₙ ↦ aₙ) |= φ}.
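
For concreteness, here is a small instance we add for illustration: taking φ = (α₁ ≤ 3) ∧ ¬(α₁ ≤ 0) and X = α₁, we get

  Dom(φ)_X = {a ∈ R | 0 < a ≤ 3} = (0, 3].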

We are now ready to define the language of refinement types, which can be
seen as simple types annotated by logical formulas. The type R is annotated by
logical variables: this way we obtain refinement real types of the form {α ∈ R}.
The crux of our refinement type system consists in the annotations we put on
the arrows. We introduce two distinct refined arrow constructs, depending on
the shape of the target type: more precisely, we annotate the arrow of a type
(T₁, . . . , Tₙ) → R with two logical formulas, while we annotate (T₁, . . . , Tₙ) → H
(where H is a higher-order type) with only one logical formula. This way, we
obtain refined arrow types of the form (T₁, . . . , Tₙ) →^{ψ,φ} {α ∈ R} and
(T₁, . . . , Tₙ) →^ψ H: in both cases the formula ψ specifies the continuity domain,
while the formula φ is an image annotation used only when the target type is
ground. The intuition is as follows: a program of type
(H₁, . . . , Hₙ, {α₁ ∈ R}, . . . , {αₙ ∈ R}) →^{ψ,φ} {α ∈ R} uses its real arguments
continuously on the domain specified by the formula ψ (w.r.t. α₁, . . . , αₙ), and
this domain is sent into the domain specified by the formula φ (w.r.t. α).
Similarly, a program of the type (T₁, . . . , Tₙ) →^ψ H has its real arguments used
in a continuous way on the domain specified by ψ, but it is not possible anymore
to specify an image domain, because H is higher-order.
The general form of our refined types is thus as follows:

T ::= H | F;   F ::= {α ∈ R};
H ::= (H₁, . . . , Hₘ, F₁, . . . , Fₙ) →^ψ H | (H₁, . . . , Hₘ, F₁, . . . , Fₙ) →^{ψ▷φ} F

with n + m > 0, Vars(φ) ⊆ {α}, Vars(ψ) ⊆ {α₁, . . . , αₙ} when F = {α ∈ R}, Fᵢ = {αᵢ ∈ R}, and the (αᵢ)₁≤ᵢ≤ₙ distinct. We take refinement types up to renaming of logical variables. If T is a refinement type, we obtain the underlying simple type by forgetting about the annotations in T.
Example 3. We illustrate in this example the intended meaning of our refinement types.
• We first look at how to refine R → R: these are types of the form {α₁ ∈ R} →^{φ₁▷φ₂} {α₂ ∈ R}. The intended inhabitants of these types are the programs

t : R → R such that i) ⟦t⟧ is continuous on the truth domain of φ₁; and ii) ⟦t⟧ sends the truth domain of φ₁ into the truth domain of φ₂. As an example, φ₁ could be (α₁ < 3), and φ₂ could be (α₂ ≥ 5). An example of a program having this type is t = λx.(5 + f(x)), where f : R → R is defined as f(a) = 1/(3 − a) when a < 3 and f(a) = 0 otherwise, and where we assume that {f, +} ⊆ F.
• We look now at the possible refinements of R → (R → R): these are of the form {α₁ ∈ R} →^{θ₁} ({α₂ ∈ R} →^{θ₂▷θ₃} {α₃ ∈ R}). The intended inhabitants of these types are the programs t : R → (R → R) whose interpretation function (x, y) ∈ R² ↦ ⟦t⟧(x)(y) sends Dom(θ₁)_{α₁} × Dom(θ₂)_{α₂} continuously into Dom(θ₃)_{α₃}. As an example, consider θ₁ = (α₁ < 1), θ₂ = (α₂ ≤ 3), and θ₃ = (α₃ > 0). An example of a program having this type is λx₁.λx₂.f(x₁ ∗ x₂), where we take f as above.
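As a sanity check on the first bullet, the inhabitant t can be written down directly; a sketch with Double standing in for R and names of our own choosing:

```haskell
-- f from Example 3: continuous on (-inf, 3), with a discontinuity at a = 3
f :: Double -> Double
f a = if a < 3 then 1 / (3 - a) else 0

-- t = \x. 5 + f x: on the truth domain of phi1 = (alpha1 < 3), f is
-- continuous and strictly positive, so t is continuous there and maps
-- that domain into the truth domain of phi2 = (alpha2 >= 5)
t :: Double -> Double
t x = 5 + f x
```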

A refined typing context Γ is a list x₁ : T₁, . . . , xₙ : Tₙ, where each Tᵢ is a refinement type. In order to express continuity constraints, we need to annotate typing judgments with logical formulas, in the same way as we do for arrow types. More precisely, we consider two kinds of refined typing judgments, one for terms of higher-order type and one for terms of ground type:

Γ ⊢ᵣ^ψ t : H;   Γ ⊢ᵣ^{ψ▷φ} t : F.

6.2 Basic Typing Rules

We first consider refinement typing rules for the fragment of our language which
excludes conditionals: they are given in Figure 4. We illustrate them by way of
a series of examples.

Example 4. We first look at the typing rule var-F: if θ implies θ′, then the variable x (which, in semantic terms, projects the context Γ onto one of its components) sends continuously the truth domain of θ into the truth domain of θ′. Using this rule we can, for instance, derive the following judgment:

x : {α ∈ R}, y : {β ∈ R} ⊢ᵣ^{(α≥0 ∧ β≥0) ▷ (α≥0)} x : {α ∈ R}.   (1)

Example 5. We now look at the Rf rule, which deals with functions from F. Using this rule, we can show that:

x : {α ∈ R}, y : {β ∈ R} ⊢ᵣ^{(α≥0 ∧ β≥0) ▷ (γ≥0)} min(x, y) : {γ ∈ R}.   (2)

Before giving the refined typing rule for the if-then-else construct, we also illustrate on an example how the rules in Figure 4 allow us to exploit, compositionally, the continuity information we have on functions in F.

var-H: Γ, x : H ⊢ᵣ^ψ x : H.

var-F: if |= θ ⇒ θ′, then Γ, x : {α ∈ R} ⊢ᵣ^{θ▷θ′} x : {α ∈ R}.

Rf: if Γ ⊢ᵣ^{θ▷θᵢ} tᵢ : {αᵢ ∈ R} for every 1 ≤ i ≤ n, f ∈ F is continuous on Dom(θ₁ ∧ . . . ∧ θₙ)_{α₁...αₙ}, and f(Dom(θ₁ ∧ . . . ∧ θₙ)_{α₁...αₙ}) ⊆ Dom(θ′)_β, then Γ ⊢ᵣ^{θ▷θ′} f(t₁, . . . , tₙ) : {β ∈ R}.

abs: if Γ, x₁ : T₁, . . . , xₙ : Tₙ ⊢ᵣ^{ψ(η)} t : T and |= ψ₁ ∧ ψ₂ ⇒ ψ, then Γ ⊢ᵣ^{ψ₂} λ(x₁, . . . , xₙ).t : (T₁, . . . , Tₙ) →^{ψ₁(η)} T.

app: if Γ ⊢ᵣ^φ t : (H₁, . . . , Hₘ, F₁, . . . , Fₙ) →^{θ(η)} T, with Γ ⊢ᵣ^φ sᵢ : Hᵢ for every 1 ≤ i ≤ m, Γ ⊢ᵣ^{φ▷θⱼ} pⱼ : Fⱼ for every 1 ≤ j ≤ n, and |= θ₁ ∧ . . . ∧ θₙ ⇒ θ, then Γ ⊢ᵣ^{φ(η)} t(s₁, . . . , sₘ, p₁, . . . , pₙ) : T.

The formula ψ(η) should be read as ψ when T is a higher-order type, and as ψ ▷ η when T is a ground type.

Fig. 4: Typing Rules


Example 6. Let f : R → R be the function defined as f(x) = −x if x < 0, and f(x) = x + 1 otherwise. Observe that we can actually regard f as represented by the program in Figure 3a, but we consider it as a primitive function in F for the time being, since we have not yet introduced the typing rule for the if-then-else construct. Consider the program:

t = λ(x, y).f(min(x, y)).

We see that t : R² → R is continuous on the set {(x, y) | x ≥ 0 ∧ y ≥ 0} and that, moreover, the image of t on this set is contained in [1, +∞). Using the rules in Figure 4, the fact that f is continuous on R≥0, and the fact that min is continuous on R², we see that our refined type system allows us to prove t to be continuous in the considered domain, i.e.:

⊢ᵣ t : ({α ∈ R}, {β ∈ R}) →^{(α≥0 ∧ β≥0) ▷ (γ≥1)} {γ ∈ R}.
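For a concrete reading of Example 6, the composition can be transcribed directly (again a sketch over Double, with our own names):

```haskell
-- f from Example 6: its only discontinuity is at x = 0
f :: Double -> Double
f x = if x < 0 then negate x else x + 1

-- t = \(x, y). f (min x y): on {x >= 0 && y >= 0}, min x y stays in [0, inf),
-- where f is continuous with image inside [1, inf), matching the refined type
t :: (Double, Double) -> Double
t (x, y) = f (min x y)
```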

6.3 Typing Conditionals

We now look at the rule for the if-then-else construct: as can be seen in the
two programs in Figure 3, the use of conditionals may or may not induce dis-
continuity points. The crux here is the behaviour of the two branches at the
74 G. Barthe et al.

discontinuity points of the guard function. In the two programs represented in Figure 3, we see that the only discontinuity point of the guard is at x = 0. However, in Figure 3b the two branches return the same value at 0, and the resulting program is thus continuous at x = 0, while in Figure 3a the two branches do not coincide at 0, and the resulting program is discontinuous at x = 0. We can generalize this observation: for the program if t then s else p to be continuous, we need the branches s and p to be continuous respectively on the domain where t is 1 and on the domain where t is 0, and moreover we need s and p to be continuous and to coincide on the points where t is not continuous. Similarly to the logical system designed by Chaudhuri et al. [18], the coincidence of the branches at the discontinuity points is expressed as a set of logical rules by way of observational equivalence. It should be observed that such an equivalence check is less problematic for first-order programs than it is for higher-order ones (the authors of [18] are able to actually check observational equivalence through an SMT solver). On the other hand, various notions of equivalence which are included in contextual equivalence and sometimes coincide with it (e.g., applicative bisimilarity, denotational semantics, or logical relations themselves) have been developed for higher-order languages, and this is starting to give rise to actual automatic tools for deciding contextual equivalence [38].
We give in Figure 5 the typing rule for conditionals. The conclusion of the
rule guarantees the continuity of the program if t then s else p on a do-
main specified by a formula θ. The premises of the rule ask for formulas θq for
q ∈ {t, s, p} that specify continuity domains for the programs t, s, p, and ask also
for two additional formulas θ(t,0) and θ(t,1) that specify domains where the value
of the guard t is 0 and 1, respectively. The target formula θ and the formulas (θq)_{q∈{t,s,p,(t,1),(t,0)}} are related by two side-conditions. Side-condition (1) consists of the following four distinct requirements, which must hold for every point a in the truth domain of θ: i) a is in the truth domain of at least one of the two formulas θs, θp; ii) if a is not in θ(t,1) (i.e., we have no guarantee that t will return 1 at point a, meaning that the program p may be executed), then a must be in the continuity domain of p; iii) a condition symmetric to the previous one, replacing 1 by 0 and p by s; iv) all points of possible discontinuity (i.e., the points a such that θt does not hold) must be in the continuity domain of both s and p, and as a consequence both θs and θp must hold there. Side-condition (2) uses typed contextual equivalence ≡ctx between terms to express that the two programs s and p must coincide on all inputs such that θt does not hold, i.e., those that are not in the continuity domain of t. Observe that typed contextual equivalence here is defined with respect to the system of simple types.

Notation 1. We use the following notations in Figure 5. When Γ is a typing environment, we write GΓ and HΓ for the ground and higher-order parts of Γ, respectively. Moreover, suppose we have a ground refined typing environment Θ = x₁ : {α₁ ∈ R}, . . . , xₙ : {αₙ ∈ R}: we say that a logical assignment σ is compatible with Θ when {αᵢ | 1 ≤ i ≤ n} ⊆ supp(σ). When this is the case, we build in a natural way the substitution σ^Θ associated with σ along Θ by taking σ^Θ(xᵢ) = σ(αᵢ).

If: if Γ ⊢ᵣ^{θt ▷ (β=0 ∨ β=1)} t : {β ∈ R}, Γ ⊢ᵣ^{θ(t,0) ▷ (β=0)} t : {β ∈ R}, Γ ⊢ᵣ^{θ(t,1) ▷ (β=1)} t : {β ∈ R}, Γ ⊢ᵣ^{θs(η)} s : T, and Γ ⊢ᵣ^{θp(η)} p : T, with side-conditions (1) and (2), then Γ ⊢ᵣ^{θ(η)} if t then s else p : T.

Again, the formula ψ(η) should be read as ψ when T is a higher-order type, and as ψ ▷ η when T is a ground type. The side-conditions (1), (2) are given as:
1. |= θ ⇒ ((θs ∨ θp) ∧ (θ(t,1) ∨ θp) ∧ (θ(t,0) ∨ θs) ∧ (θt ∨ (θs ∧ θp))).
2. For every logical assignment σ compatible with GΓ, σ |= θ ∧ ¬θt implies HΓ ⊢ sσ^{GΓ} ≡ctx pσ^{GΓ}.

Fig. 5: Typing Rules for the if-then-else construct

Example 7. Using our if-then-else typing rule, we can indeed type the program in Figure 3b as expected:

⊢ᵣ λx.if x < 0 then 1 else x + 1 : {α ∈ R} →^{⊤▷⊤} {β ∈ R}.

6.4 Open Logical Predicates for Refinement Types

Our goal in this section is to show the correctness of our refinement type system, which we state below.

Theorem 3. Let t be any program such that:

x₁ : {α₁ ∈ R}, . . . , xₙ : {αₙ ∈ R} ⊢ᵣ^{θ▷θ′} t : {β ∈ R}.

Then it holds that:

• ⟦t⟧(Dom(θ)_{α₁,...,αₙ}) ⊆ Dom(θ′)_β;
• ⟦t⟧ is sequentially continuous on Dom(θ)_{α₁,...,αₙ}.
As a first step, we show that our if-then-else rule is reasonable, i.e. that it
behaves well with primitive functions in F. More precisely, if we suppose that
the functions f , g0 , g1 are such that the premises of the if-then-else rule hold,
then the program if f (x1 , . . . , xn ) then g1 (x1 , . . . , xn ) else g0 (x1 , . . . , xn ) is
indeed continuous in the domain specified by the conclusion of the rule. This is
precisely what we prove in the following lemma.
Lemma 4. Let f, g₀, g₁ : Rⁿ → R be functions in F, and let Θ = x₁ : {α₁ ∈ R}, . . . , xₙ : {αₙ ∈ R}. We denote by α the list of logical variables α₁, . . . , αₙ. We consider logical formulas θ and θf, θ(f,0), θ(f,1), φg₀, φg₁ that have their logical variables in α, and such that:
1. f is continuous on Dom(θf)_α, with f(Dom(θf)_α) ⊆ {0, 1} and f(Dom(θ(f,b))_α) ⊆ {b} for b ∈ {0, 1};
2. g₀ and g₁ are continuous on Dom(φg₀)_α and Dom(φg₁)_α, respectively, and (α₁ ↦ a₁, . . . , αₙ ↦ aₙ) |= θ ∧ ¬θf implies g₀(a₁, . . . , aₙ) = g₁(a₁, . . . , aₙ);
3. |= θ ⇒ ((φg₁ ∨ φg₀) ∧ (θ(f,0) ∨ φg₁) ∧ (θ(f,1) ∨ φg₀) ∧ (θf ∨ (φg₀ ∧ φg₁))).

Then it holds that

⟦Θ ⊢ if f(x₁, . . . , xₙ) then g₁(x₁, . . . , xₙ) else g₀(x₁, . . . , xₙ) : R⟧

is continuous on Dom(θ)_α.
Proof. The proof can be found in the extended version [7].
Similarly to what we did in Section 4, we are going to prove Theorem 3 by way of a logical predicate. Recall that the logical predicate we defined in Section 4 actually consists of three kinds of predicates, all defined in Definition 1 of Section 4: F^Θ_τ, F^Θ_Γ, and F^{Θ,Γ}_τ, where Θ ranges over ground typing environments, Γ ranges over arbitrary environments, and τ is a type. The first predicate F^Θ_τ contains admissible terms t of type Θ ⊢ t : τ, the second predicate F^Θ_Γ contains admissible substitutions γ that associate with every (x : τ) in Γ a term of type τ under the typing context Θ, and the third predicate F^{Θ,Γ}_τ contains admissible terms t of type Γ, Θ ⊢ t : τ.

Here, we need to adapt the three kinds of logical predicates to a refinement scenario: first, we replace τ and Θ, Γ with refinement types and refined typing contexts, respectively. Moreover, for technical reasons, we also need to generalize our typing contexts, by allowing them to be annotated with arbitrary subsets of Rⁿ instead of restricting ourselves to those subsets generated by logical formulas. Due to this further complexity, we split our definition of logical predicates into two: we first define the counterpart of the ground typing context predicate F^Θ_τ in Definition 4, and then the counterpart of the predicate F^Θ_Γ for substitutions and the counterpart of the predicates F^{Θ,Γ}_τ for higher-order typing environments in Definition 5.
Let us first see how we can adapt the predicates F^Θ_τ to our refinement types setting. Recall that in Section 4 we defined the predicate F^Θ_R as the collection of terms t such that Θ ⊢ t : R, and whose semantics ⟦Θ ⊢ t : R⟧ belongs to F. As we are interested in local continuity properties, we need to build a predicate expressing local continuity constraints. Moreover, in order to be consistent with our two arrow constructs and our two kinds of typing judgments, we actually need to consider two kinds of logical predicates as well, depending on whether the target type we consider is a real type or a higher-order type. We thus introduce the following logical predicates:

C(Θ, X ▷ φ, F);   C(Θ, X, H);

where Θ is a ground typing environment, X is a subset of Rⁿ, φ is a logical formula, and, as usual, F ranges over the real refinement types, while H ranges over the higher-order refinement types. As expected, X and φ are needed to encode continuity constraints inside our logical predicates.

Definition 4. Let Θ be a ground typing context of length n, and let F and H be a refined ground type and a refined higher-order type, respectively. We define families of predicates on terms C(Θ, Y ▷ φ, F) and C(Θ, Y, H), with Y ⊆ Rⁿ and φ a logical formula, as specified in Figure 6.

• For F = {α ∈ R} we take:

C(Θ, Y ▷ ψ, F) := {t | x₁ : R, . . . , xₙ : R ⊢ t : R, ⟦t⟧(Y) ⊆ Dom(ψ)_α ∧ ⟦t⟧ continuous over Y}.

• If H is an arrow type of the form H = (H₁, . . . , Hₘ, {α₁ ∈ R}, . . . , {αₚ ∈ R}) →^{ψ(η)} T:

C(Θ, Y, H) := {t | x₁ : R, . . . , xₙ : R ⊢ t : H, ∀Z, ∀s = (s₁, . . . , sₘ) with sᵢ ∈ C(Θ, Z, Hᵢ), ∀p = (p₁, . . . , pₚ), ∀ψⱼ with |= ψ₁ ∧ . . . ∧ ψₚ ⇒ ψ and pⱼ ∈ C(Θ, Z ▷ ψⱼ, {αⱼ ∈ R}), it holds that t(s, p) ∈ C(Θ, (Y ∩ Z)(η), T)},

where, as usual, we should read ψ(η) = ψ and (Y ∩ Z)(η) = Y ∩ Z when T is higher-order, and ψ(η) = ψ ▷ η and (Y ∩ Z)(η) = (Y ∩ Z) ▷ η when T is an annotated real type.

Fig. 6: Open Logical Predicates for Refinement Types.

Example 8. We illustrate Definition 4 on some examples. We denote by B◦ the open unit ball in R², i.e. B◦ = {(a, b) ∈ R² | a² + b² < 1}. We consider the ground typing context Θ = x₁ : {α₁ ∈ R}, x₂ : {α₂ ∈ R}.

• We look first at the predicate C(Θ, B◦ ▷ (β > 0), {β ∈ R}). It consists of all programs x₁ : R, x₂ : R ⊢ t : R such that ⟦x₁ : R, x₂ : R ⊢ t : R⟧ is continuous on the open unit ball, and takes only strictly positive values there.
• We look now at an example where the target type T is higher-order. We take H = {β₁ ∈ R} →^{(β₁≥0)▷(β₂≥0)} {β₂ ∈ R}, and we look at the logical predicate C(Θ, B◦, H). We are going to show that the latter contains, for instance, the program:

t = λw.f(w, x₁² + x₂²)   where f(w, a) = w/(1 − a) if a < 1, and f(w, a) = 0 otherwise.

Looking at Figure 6, we see that it is enough to check that for any Y ⊆ R² and any s ∈ C(Θ, Y ▷ (β₁ ≥ 0), {β₁ ∈ R}), it holds that:

t s ∈ C(Θ, (B◦ ∩ Y) ▷ (β₂ ≥ 0), {β₂ ∈ R}).

Our overall goal, in order to prove Theorem 3, is to show the counterpart of the Fundamental Lemma from Section 4 (i.e. Lemma 1), which states that the logical predicate F^Θ_R contains all well-typed terms. This lemma only talks about the logical predicates for ground typing contexts, so we can state it as of now, but its proof is based on the fact that we have all three predicates at our disposal. Observe that from there, Theorem 3 follows just from the definition of the logical predicates on base types. Similarly to what we did for Lemma 1 in Section 4, proving it requires us to define the logical predicates for substitutions and higher-order typing contexts. We do this in Definition 5 below. As before, they consist of an adaptation to our refinement types framework of the open logical predicates F^Θ_Γ and F^{Θ,Γ}_τ of Section 4: as usual, we need to add continuity annotations, and distinguish whether the target type is a ground type or a higher-order type.
Notation 2. We first need to introduce the following notation: let Γ, Θ be two ground non-refined typing environments of length m and n respectively, with disjoint supports. Let γ : supp(Γ) → {t | Θ ⊢ t : R} be a substitution. We write ⟦γ⟧ for the function

⟦γ⟧ : Rⁿ → R^{n+m},   a ↦ (a, ⟦γ(x₁)⟧(a), . . . , ⟦γ(xₘ)⟧(a)).
Definition 5. Let Θ be a ground typing environment of length n, and Γ an arbitrary typing environment. We write n and m for the lengths of Θ and GΓ, respectively.

• Let Z ⊆ Rⁿ and W ⊆ R^{n+m}. We define C(Θ, Z ▷ W, Γ) as the set of those substitutions γ : supp(Γ) → {t | Θ ⊢ t : R} such that:
  • ∀(x : H) ∈ HΓ, γ(x) ∈ C(Θ, Z, H);
  • ⟦γ|GΓ⟧ : Rⁿ → R^{n+m} sends Z continuously into W.
• Let W ⊆ R^{n+m}, F = {α ∈ R} an annotated real type, and ψ a logical formula with Vars(ψ) ⊆ {α}. We define:

C((Γ; Θ), W ▷ ψ, F) := {t | Γ, Θ ⊢ t : R ∧ ∀X ⊆ Rⁿ, ∀γ ∈ C(Θ, X ▷ W, Γ), tγ ∈ C(Θ, X ▷ ψ, F)}.

• Let W ⊆ R^{n+m}, and H a higher-order refined type. We define:

C((Γ; Θ), W, H) := {t | Γ, Θ ⊢ t : H ∧ ∀X ⊆ Rⁿ, ∀γ ∈ C(Θ, X ▷ W, Γ), tγ ∈ C(Θ, X, H)}.
Example 9. We illustrate Definition 5 on an example. We consider the same context Θ as in Example 8, i.e. Θ = x₁ : {α₁ ∈ R}, x₂ : {α₂ ∈ R}, and we take Γ = x₃ : {α₃ ∈ R}, z : H, with H = {β₁ ∈ R} →^{(β₁≥0)▷(β₂≥0)} {β₂ ∈ R}. We are interested in the following logical predicate for substitutions:

C(Θ, B◦ ▷ {(v, |v|) | v ∈ B◦}, Γ)

where the norm of the couple (a, b) is taken as |(a, b)| = √(a² + b²). We are going to build a substitution γ : {x₃, z} → Λ^{×,→,R}_F that belongs to this set. We take:

• γ(z) = λw.f(w, x₁² + x₂²), where f(w, a) = w/(1 − a) if a < 1, and f(w, a) = 0 otherwise;
• γ(x₃) = √(x₁² + x₂²).

We can check that the requirements of Definition 5 indeed hold for γ:

• γ(z) ∈ C(Θ, B◦, H) (see Example 8);
• ⟦γ|GΓ⟧ : R × R → R³ is continuous on B◦, and moreover sends B◦ into {(v, |v|) | v ∈ B◦}. Looking at our definition of the semantics of a substitution, we see that ⟦γ|GΓ⟧(a, b) = (a, b, |(a, b)|), thus the requirements above hold.

Lemma 5 (Fundamental Lemma). Let Θ be a ground typing context and Γ an arbitrary typing context; thus Γ can contain both ground-type variables and non-ground-type variables.

• Suppose that Γ, Θ ⊢ᵣ^{θ▷η} t : F. Then t ∈ C((Γ; Θ), Dom(θ) ▷ η, F).
• Suppose that Γ, Θ ⊢ᵣ^θ t : H. Then t ∈ C((Γ; Θ), Dom(θ), H).

Proof Sketch. The proof is by induction on the derivation of the refined typing judgment. Along the way, we need to show that our logical predicates interact well with the underlying denotational semantics, but also with the logic. The details can be found in the extended version [7].

From there, we can finally prove the main result of this section, i.e. Theorem 3, which states the correctness of our refinement type system. Indeed, Theorem 3 follows from Lemma 5 as a corollary: it is enough to unfold the definition of the logical predicate for first-order programs.

7 Related Work

Logical relations are certainly one of the most well-studied concepts in higher-order programming language theory. In their unary version, they were introduced by Tait [54], and further exploited by Girard [33] and by Tait [55] himself in giving strong normalization proofs for second-order type systems. The relational counterpart of realizability, namely logical relations proper, was introduced by Plotkin [48], and further developed along many different axes, in particular towards calculi with fixpoint constructs or recursive types [3,4,2], probabilistic choice [14], and monadic and algebraic effects [34,11]. Without any hope of being comprehensive, we refer to Mitchell's textbook on programming language theory for an account of the earlier, classic definitions [43], and to the aforementioned papers for more recent developments.
Extensions of logical relations to open terms have been introduced by several
authors [39,47,30,53,15] and were explicitly referred to as open logical relations
in [59]. However, to the best of the authors’ knowledge, all the aforementioned
works use open logical relations for specific purposes, and do not investigate
their applicability as a general methodology.

Special cases of our Containment Theorem can be found in many papers, typically as auxiliary results. As already mentioned, an example is that of higher-order polynomials, whose first-order terms have been proved to compute proper polynomials in many ways [40,5], none of them in the style of logical relations. The Containment Theorem itself can be derived from a previous result by Lafont [41] (see also Theorem 4.10.7 in [24]). Contrary to such a result, however, our proof of the Containment Theorem is entirely syntactical and consists of a straightforward application of open logical relations.
Algorithms for automatic differentiation have recently been extended to higher-order programming languages [50,46,51,42,45], and have been investigated from a semantic perspective in [16,1], relying on insights from linear logic and denotational semantics. In particular, the work of Huot et al. [37] provides a denotational proof of correctness of the program transformation of [50] that we have studied in Section 5.

Continuity and robustness analysis of imperative first-order programs by way of program logics is the topic of a series of papers by Chaudhuri and co-authors [19,18,20]. None of them, however, deals with higher-order programs.

8 Conclusion and Future Work

We have shown how a mild variation on the concept of a logical relation can be fruitfully used for proving both predicative and relational properties of higher-order programming languages, when such properties have a first-order, rather than a ground, “flavor”. As such, the added value of this contribution lies not so much in the technique itself, but in showing how extremely useful it is in heterogeneous contexts, this way witnessing the versatility of logical relations.

The three case studies, and in particular the correctness of automatic differentiation and refinement type-based continuity analysis, are given as proofs of concept, but this does not mean they do not deserve to be studied in more depth. An example of an interesting direction for future work is the extension of our correctness proof from Section 5 to backward-propagation differentiation algorithms. Another one consists in adapting the refinement type system of Section 6.1 to deal with differentiability. That would of course require a substantial change in the typing rule for conditionals, which should take care of checking not only continuity, but also differentiability at the critical points. It would also be interesting to implement the refinement type system using standard SMT-based approaches. Finally, the authors plan to investigate extensions of open logical relations to non-normalizing calculi, as well as to non-simply typed calculi (such as calculi with polymorphic or recursive types).

References

1. Abadi, M., Plotkin, G.D.: A simple differentiable programming language. PACMPL 4(POPL), 38:1–38:28 (2020)

2. Ahmed, A.J.: Step-indexed syntactic logical relations for recursive and quantified
types. In: Proc. of ESOP 2006. pp. 69–83 (2006)
3. Appel, A.W., McAllester, D.A.: An indexed model of recursive types for foun-
dational proof-carrying code. ACM Trans. Program. Lang. Syst. 23(5), 657–683
(2001)
4. Appel, A.W., Mellies, P.A., Richards, C.D., Vouillon, J.: A very modal model of
a modern, major, general type system. In: ACM SIGPLAN Notices. vol. 42, pp.
109–122. ACM (2007)
5. Baillot, P., Dal Lago, U.: Higher-order interpretations and program complexity. In:
Proc. of CSL 2012. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik (2012)
6. Barendregt, H.P.: The lambda calculus: its syntax and semantics. North-Holland
(1984)
7. Barthe, G., Crubillé, R., Dal Lago, U., Gavazzo, F.: On the versatility of open logical relations: Continuity, automatic differentiation, and a containment theorem (long version) (2019), available at https://arxiv.org/abs/2002.08489
8. Bartholomew-Biggs, M., Brown, S., Christianson, B., Dixon, L.: Automatic differentiation of algorithms. Journal of Computational and Applied Mathematics 124(1), 171–190 (2000). Numerical Analysis 2000, Vol. IV: Optimization and Nonlinear Equations
9. Baydin, A.G., Pearlmutter, B.A., Radul, A.A., Siskind, J.M.: Automatic differen-
tiation in machine learning: a survey. Journal of Machine Learning Research 18,
153:1–153:43 (2017)
10. Benton, N., Hofmann, M., Nigam, V.: Abstract effects and proof-relevant logical
relations. In: Proc. of POPL 2014. pp. 619–632 (2014)
11. Biernacki, D., Piróg, M., Polesiuk, P., Sieczkowski, F.: Handle with care: rela-
tional interpretation of algebraic effects and handlers. PACMPL 2(POPL), 8:1–
8:30 (2018)
12. Birkedal, L., Jaber, G., Sieczkowski, F., Thamsborg, J.: A kripke logical relation
for effect-based program transformations. Inf. Comput. 249, 160–189 (2016)
13. Birkedal, L., Sieczkowski, F., Thamsborg, J.: A concurrent logical relation. In:
Proc. of CSL 2012. pp. 107–121 (2012)
14. Bizjak, A., Birkedal, L.: Step-indexed logical relations for probability. In: Proc. of
FoSSaCS 2015. pp. 279–294 (2015)
15. Bowman, W.J., Ahmed, A.: Noninterference for free. In: Proc. of ICFP 2015. pp.
101–113 (2015)
16. Brunel, A., Mazza, D., Pagani, M.: Backpropagation in the simply typed lambda-
calculus with linear negation. PACMPL 4(POPL), 64:1–64:27 (2020)
17. Brunel, A., Terui, K.: Church => Scott = Ptime: an application of resource sensitive realizability. In: Proc. of DICE 2010. pp. 31–46 (2010)
18. Chaudhuri, S., Gulwani, S., Lublinerman, R.: Continuity analysis of programs. In:
Proc. of POPL 2010. pp. 57–70 (2010)
19. Chaudhuri, S., Gulwani, S., Lublinerman, R.: Continuity and robustness of pro-
grams. Commun. ACM 55(8), 107–115 (2012)
20. Chaudhuri, S., Gulwani, S., Lublinerman, R., NavidPour, S.: Proving programs
robust. In: Proc. of SIGSOFT/FSE 2011. pp. 102–112 (2011)
21. Clifford: Preliminary Sketch of Biquaternions. Proceedings of the London Mathe-
matical Society s1-4(1), 381–395 (11 1871)
22. Cook, S.A., Kapron, B.M.: Characterizations of the basic feasible functionals of
finite type (extended abstract). In: 30th Annual Symposium on Foundations of
Computer Science, Research Triangle Park, North Carolina, USA, 30 October - 1
November 1989. pp. 154–159 (1989)

23. Crary, K., Harper, R.: Syntactic logical relations for polymorphic and recursive
types. Electr. Notes Theor. Comput. Sci. 172, 259–299 (2007)
24. Crole, R.L.: Categories for Types. Cambridge mathematical textbooks, Cambridge
University Press (1993)
25. Dreyer, D., Neis, G., Birkedal, L.: The impact of higher-order state and control
effects on local relational reasoning. J. Funct. Program. 22(4-5), 477–528 (2012)
26. Edalat, A.: The domain of differentiable functions. Electr. Notes Theor. Comput.
Sci. 40, 144 (2000)
27. Edalat, A., Lieutier, A.: Domain theory and differential calculus (functions of one
variable). In: Proc. of LICS 2002. pp. 277–286 (2002)
28. Elliott, C.: The simple essence of automatic differentiation. PACMPL 2(ICFP),
70:1–70:29 (2018)
29. Escardó, M.H., Ho, W.K.: Operational domain theory and topology of sequential
programming languages. Inf. Comput. 207(3), 411–437 (2009)
30. Fiore, M.P.: Semantic analysis of normalisation by evaluation for typed lambda
calculus. In: Proc. of PPDP 2002. pp. 26–37 (2002)
31. Freeman, T., Pfenning, F.: Refinement types for ML. In: Proceedings of the ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation. pp. 268–277. PLDI '91 (1991)
32. Gianantonio, P.D., Edalat, A.: A language for differentiable functions. In: Proc. of
FOSSACS 2013. pp. 337–352 (2013)
33. Girard, J.Y.: Une extension de l’interpretation de gödel a l’analyse, et son applica-
tion a l’elimination des coupures dans l’analyse et la theorie des types. In: Studies
in Logic and the Foundations of Mathematics, vol. 63, pp. 63–92. Elsevier (1971)
34. Goubault-Larrecq, J., Lasota, S., Nowak, D.: Logical relations for monadic types.
In: International Workshop on Computer Science Logic. pp. 553–568. Springer
(2002)
35. Griewank, A., Walther, A.: Evaluating Derivatives: Principles and Techniques
of Algorithmic Differentiation. Society for Industrial and Applied Mathematics,
Philadelphia, PA, USA, second edn. (2008)
36. Hofmann, M.: Logical relations and nondeterminism. In: Software, Services, and
Systems - Essays Dedicated to Martin Wirsing on the Occasion of His Retirement
from the Chair of Programming and Software Engineering. pp. 62–74 (2015)
37. Huot, M., Staton, S., Vákár, M.: Correctness of automatic differentiation via diffeologies and categorical gluing (2020), to appear in Proc. of ESOP 2020 (long version available at http://arxiv.org/abs/2001.02209)
38. Jaber, G.: Syteci: automating contextual equivalence for higher-order programs
with references. PACMPL 4(POPL), 59:1–59:28 (2020)
39. Jung, A., Tiuryn, J.: A new characterization of lambda definability. In: Proc. of
TLCA 1993. pp. 245–257 (1993)
40. Kapron, B.M., Cook, S.A.: A new characterization of type-2 feasibility. SIAM J.
Comput. 25(1), 117–132 (1996)
41. Lafont, Y.: Logiques, catégories & machines: implantation de langages de pro-
grammation guidée par la logique catégorique. Institut national de recherche en
informatique et en automatique (1988)
42. Manzyuk, O., Pearlmutter, B.A., Radul, A.A., Rush, D.R., Siskind, J.M.: Pertur-
bation confusion in forward automatic differentiation of higher-order functions. J.
Funct. Program. 29, e12 (2019)
43. Mitchell, J.C.: Foundations for programming languages. Foundation of computing
series, MIT Press (1996)

44. Owens, S., Myreen, M.O., Kumar, R., Tan, Y.K.: Functional big-step semantics.
In: Proc. of ESOP 2016. pp. 589–615 (2016)
45. Pearlmutter, B.A., Siskind, J.M.: Lazy multivariate higher-order forward-mode
AD. In: Proc. of POPL 2007. pp. 155–160 (2007)
46. Pearlmutter, B.A., Siskind, J.M.: Reverse-mode AD in a functional framework:
Lambda the ultimate backpropagator. ACM Trans. Program. Lang. Syst. 30(2),
7:1–7:36 (2008)
47. Pitts, A.M., Stark, I.D.B.: Observable properties of higher order functions that
dynamically create local names, or what’s new? In: Proc. of MFCS 1993. pp. 122–
141 (1993)
48. Plotkin, G.: Lambda-definability and logical relations. Edinburgh University (1973)
49. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Neurocomputing: Foundations of
research. chap. Learning Representations by Back-propagating Errors, pp. 696–699.
MIT Press (1988)
50. Shaikhha, A., Fitzgibbon, A., Vytiniotis, D., Peyton Jones, S.: Efficient differen-
tiable programming in a functional array-processing language. PACMPL 3(ICFP),
97:1–97:30 (2019)
51. Siskind, J.M., Pearlmutter, B.A.: Nesting forward-mode AD in a functional frame-
work. Higher-Order and Symbolic Computation 21(4), 361–376 (2008)
52. Spivak, M.: Calculus On Manifolds: A Modern Approach To Classical Theorems
Of Advanced Calculus. Avalon Publishing (1971)
53. Staton, S., Yang, H., Wood, F.D., Heunen, C., Kammar, O.: Semantics for prob-
abilistic programming: higher-order functions, continuous distributions, and soft
constraints. In: Proc. of LICS 2016. pp. 525–534 (2016)
54. Tait, W.W.: Intensional interpretations of functionals of finite type i. Journal of
Symbolic Logic 32(2), 198–212 (1967)
55. Tait, W.W.: A realizability interpretation of the theory of species. In: Logic Col-
loquium. pp. 240–251. Springer, Berlin, Heidelberg (1975)
56. Turon, A.J., Thamsborg, J., Ahmed, A., Birkedal, L., Dreyer, D.: Logical relations
for fine-grained concurrency. In: Proc. of POPL 2013. pp. 343–356 (2013)
57. Vuillemin, J.: Exact real computer arithmetic with continued fractions. IEEE
Trans. Comput. 39(8), 1087–1105 (1990)
58. Weihrauch, K.: Computable Analysis: An Introduction. Texts in Theoretical Com-
puter Science. An EATCS Series, Springer Berlin Heidelberg (2000)
59. Zhao, J., Zhang, Q., Zdancewic, S.: Relational parametricity for a polymorphic
linear lambda calculus. In: Proc. of APLAS 2010. pp. 344–359 (2010)


Constructive Game Logic

Brandon Bohrer¹ and André Platzer¹,²

¹ Computer Science Department, Carnegie Mellon University, Pittsburgh, USA
{bbohrer,aplatzer}@cs.cmu.edu
² Fakultät für Informatik, Technische Universität München, München, Germany

Abstract. Game Logic is an excellent setting to study proofs-about-programs via the interpretation of those proofs as programs, because
constructive proofs for games correspond to effective winning strategies
to follow in response to the opponent’s actions. We thus develop Con-
structive Game Logic, which extends Parikh’s Game Logic (GL) with
constructivity and with first-order programs à la Pratt’s first-order dy-
namic logic (DL). Our major contributions include: 1. a novel realizability
semantics capturing the adversarial dynamics of games, 2. a natural de-
duction calculus and operational semantics describing the computational
meaning of strategies via proof-terms, and 3. theoretical results includ-
ing soundness of the proof calculus w.r.t. realizability semantics, progress
and preservation of the operational semantics of proofs, and Existential
Properties enabling the extraction of computational artifacts from game
proofs. Together, these results provide the most general account of a
Curry-Howard interpretation for any program logic to date, and the
first at all for Game Logic.

Keywords: Game Logic, Constructive Logic, Natural Deduction, Proof Terms

1 Introduction
Two of the most essential tools in the theory of programming languages are program
logics, such as Hoare calculi [29] and dynamic logics [45], and the Curry-Howard
correspondence [17,31], wherein propositions correspond to types, proofs to func-
tional programs, and proof term normalization to program evaluation. Their
intersection, the Curry-Howard interpretation of program logics, has received
surprisingly little study. We undertake such a study in the setting of Game
Logic (GL) [38], because this leads to novel insights, because the Curry-Howard
correspondence can be explained particularly intuitively for games, and because
our first-order GL is a superset of common logics such as first-order Dynamic
Logic (DL).
Constructivity and program verification have met before: Higher-order con-
structive logics [16] obey the Curry-Howard correspondence and are used to
⋆ This research was sponsored by the AFOSR under grant number FA9550-16-1-0288. The authors were also funded by the NDSEG Fellowship and Alexander von Humboldt Foundation, respectively.


develop verified functional programs. Program logics are also often embedded
in constructive proof assistants such as Coq [48], inheriting constructivity from
their metalogic. Both are excellent ways to develop verified software, but we
study something else.
We study the computational content of a program logic itself. Every funda-
mental concept of computation is expected to manifest in all three of logic, type
systems, and category theory [27]. Because dynamic logics (DLs) such as GL
have shown that program execution is a first-class construct in modal logic, the
theorist has an imperative to explore the underlying notion of computation by
developing a constructive GL with a Curry-Howard interpretation.
The computational content of a proof is especially clear in GL, which gen-
eralizes DL to programmatic models of zero-sum, perfect-information games be-
tween two players, traditionally named Angel and Demon. Both normal-play and
misère-play games can be modeled in GL. In classical GL, the diamond modality ⟨α⟩φ and box modality [α]φ say that Angel and Demon respectively have a strategy to ensure φ is true at the end of α, which is a model of a game. The difference between classical GL and CGL is that classical GL allows proofs that appeal to the excluded middle, which correspond to strategies that branch on undecidable conditions. CGL proofs can branch only on decidable properties; thus they correspond to
strategies which are effective and can be executed by computer. Effective strate-
gies are crucial because they enable the synthesis of code that implements a
strategy. Strategy synthesis is itself crucial because even simple games can have
complicated strategies, and synthesis provides assurance that the implementa-
tion correctly solves the game. A GL strategy resolves the choices inherent in a
game: a diamond strategy specifies every move made by the Angel player, while
a box strategy specifies the moves the Demon player will make.
In developing Constructive Game Logic (CGL), adding constructivity is a
deep change. We provide a natural deduction calculus for CGL equipped with
proof terms and an operational semantics on the proofs, demonstrating the mean-
ing of strategies as functional programs and of winning strategies as functional
programs that are guaranteed to achieve their objective no matter what counter-
strategy the opponent follows. While the proof calculus of a constructive logic
is often taken as ground truth, we go a step further and develop a realizability
semantics for CGL as programs performing winning strategies for game proofs,
then prove the calculus sound against it. We adopt realizability semantics in
contrast to the winning-region semantics of classical GL because it enables us
to prove that CGL satisfies novel properties (Section 8). The proof of our Strat-
egy Property (Theorem 2) constitutes an (on-paper) algorithm that computes a
player’s (effective) strategy from a proof that they can win a game. This is the
key test of constructivity for CGL, which would not be possible in classical GL. We
show that CGL proofs have two computational interpretations: the operational
semantics interpret an arbitrary proof (strategy) as a functional program which
reduces to a normal-form proof (strategy), while realizability semantics interpret
Angel strategies as programs which defeat arbitrary Demonic opponents.

While CGL has ample theoretical motivation, the practical motivations from
synthesis are also strong. A notable line of work on dGL extends first-order GL
to hybrid games to verify safety-critical adversarial cyber-physical systems [42].
We have designed CGL to extend smoothly to hybrid games, where synthesis
provides the correctness demanded by safety-critical systems and the synthesis
of correct monitors of the external world [36].

2 Related Work

This work is at the intersection of game logic and constructive modal logics.
Individually, they have a rich literature, but little work has been done at their
intersection. Of these, we are the first for GL and the first with a proofs-as-
programs interpretation for a full first-order program logic.

Games in Logic. Parikh’s propositional GL [38] was followed by coalitional


GL [39]. A first-order version of GL is the basis of differential game logic dGL [42]
for hybrid games. GL’s are unique in their clear delegation of strategy to the proof
language rather than the model language, crucially allowing succinct game spec-
ifications with sophisticated winning strategies. Succinct specifications are im-
portant: specifications are trusted because proving the wrong theorem would not
ensure correctness. Relatives without this separation include Strategy Logic [15],
Alternating-Time Temporal Logic (ATL) [5], CATL [30], Ghosh’s SDGL [24],
Ramanujam’s structured strategies [46], Dynamic-epistemic logics [6,10,49], ev-
idence logics [9], and Angelic Hoare logic [35].

Constructive Modal Logics. A major contribution of CGL is our constructive semantics for games, not to be confused with game semantics [1], which are used
to give programs semantics in terms of games. We draw on work in semantics
for constructive modal logics, of which two main approaches are intuitionistic
Kripke semantics and realizability semantics.
An overview of Intuitionistic Kripke semantics is given by Wijesekera [52].
Intuitionistic Kripke semantics are parameterized over worlds, but in contrast
to classical Kripke semantics, possible worlds represent what is currently known
of the state. Worlds are preordered by w1 ≥ w2 when w1 contains at least
the knowledge in w2 . Kripke semantics were used in Constructive Concurrent
DL [53], where both the world and knowledge of it change during execution. A
key advantage of realizability semantics [37,33] is their explicit interpretation of
constructivity as computability by giving a realizer, a program which witnesses
a fact. Our semantics combine elements of both: Strategies are represented by
realizers, while the game state is a Kripke world. Constructive set theory [2] aids
in understanding which set operations are permissible in constructive semantics.
Modal semantics have also exploited mathematical structures such as: i) Neigh-
borhood models [8], topological models for spatial logics [7], and temporal log-
ics of dynamical systems [20]. ii) Categorical [3], sheaf [28], and pre-sheaf [23]
models. iii) Coalgebraic semantics for classical Propositional Dynamic Logic

(PDL) [19]. While games are known to exhibit algebraic structure [25], such
laws are not essential to this work. Our semantics are also notable for the seam-
less interaction between a constructive Angel and a classical Demon.
CGL is first-order, so we must address the constructivity of operations that
inspect game state. We consider rational numbers so that equality is decidable,
but our work should generalize to constructive reals [11,13].
Intuitionistic modalities also appear in dynamic-epistemic logic (DEL) [21],
but that work is interested primarily in proof-theoretic semantics while we em-
ploy realizability semantics to stay firmly rooted in computation. Intuitionistic
Kripke semantics have also been applied to multimodal System K with itera-
tion [14], a weak fragment of PDL.

Constructivity and Dynamic Logic. With CGL, we bring to fruition several past
efforts to develop constructive dynamic logics. Prior work on PDL [18] sought
an Existential Property for Propositional Dynamic Logic (PDL), but they ques-
tioned the practicality of their own implication introduction rule, whose side
condition is non-syntactic. One of our results is a first-order Existential Prop-
erty, which Degen cited as an open problem beyond the methods of their day [18].
To our knowledge, only one approach [32] considers Curry-Howard or functional
proof terms for a program logic. While their work is a notable precursor to
ours, their logic is a weak fragment of PDL without tests, monotonicity, or un-
bounded iteration, while we support not only PDL but the much more powerful
first-order GL. Lastly, we are preceded by Constructive Concurrent Dynamic Logic [53], which gives a Kripke semantics for Concurrent Dynamic Logic [41], a proper fragment of GL. Their work focuses on an epistemic interpretation of constructivity, algebraic laws, and tableaux. We differ in our use of realizability semantics and natural deduction, which were essential to developing a Curry-Howard interpretation for CGL. In summary, we are justified in claiming to have the first Curry-Howard interpretation with proof terms and Existential Properties for an expressive program logic, the first constructive game logic, and the only one with first-order proof terms.
While constructive natural deduction calculi map most directly to functional
programs, proof terms can be generated for any proof calculus, including a well-
known interpretation of classical logic as continuation-passing style [26]. Proof
terms have been developed [22] for a Hilbert calculus for dL, a dynamic logic
(DL) for hybrid systems. Their work focuses on a provably correct interchange
format for classical dL proofs, not constructive logics.

3 Syntax
We define the language of CGL, consisting of terms, games, and formulas. The
simplest terms are program variables x, y ∈ V where V is the set of variable
identifiers. Globally-scoped mutable program variables contain the state of the
game, also called the position in game-theoretic terminology. All variables and
terms are rational-valued (Q); we also write B for the set of Boolean values {0, 1}
for false and true respectively.

Definition 1 (Terms). A term f, g is a rational-valued computable function over the game state. We give a nonexhaustive grammar of terms, specifically those used in our examples:

f, g ::= · · · | q | x | f + g | f · g | f /g | f mod g

where q ∈ Q is a rational literal, x a program variable, f + g a sum, and f · g a product.

Division-with-remainder is intended for use with integers, but we generalize the standard notion to support rational arguments. The quotient f/g is an integer even when f and g are non-integers, and thus leaves a rational remainder f mod g. Divisors g are assumed to be nonzero.
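As a concrete reading of this generalized division-with-remainder, here is a small sketch (our own code, using Haskell's Rational for Q; the rounding convention is our assumption, as the text fixes only that the quotient is an integer and the remainder rational):

```haskell
import Data.Ratio ((%))

-- Integer-valued quotient of two rationals (g assumed nonzero); we round
-- toward negative infinity, one convention compatible with the text
quotQ :: Rational -> Rational -> Rational
quotQ f g = fromInteger (floor (f / g))

-- Rational remainder, chosen so that f == g * quotQ f g + modQ f g
modQ :: Rational -> Rational -> Rational
modQ f g = f - g * quotQ f g

-- Example: quotQ (7 % 2) (3 % 2) == 2, and modQ (7 % 2) (3 % 2) == 1 % 2
```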

A game in CGL is played between a constructive player named Angel and a classical player named Demon. Our usage of the names Angel and Demon differs
subtly from traditional GL usage for technical reasons. Our Angel and Demon
are asymmetric: Angel is “our” player, who must play constructively, while the
“opponent” Demon is allowed to play classically because our opponent need not
be a computer. At any time some player is active, meaning their strategy resolves
all decisions, and the opposite player is called dormant. Classical GL identifies
Angel with active and Demon with dormant; the notions are distinct in CGL.

Definition 2 (Games). The set of games α, β is defined recursively as such:

α, β ::= ?φ | x := f | x := ∗ | α ∪ β | α; β | α∗ | α^d

In the test game ?φ, the active player wins if they can exhibit a constructive
proof that formula φ currently holds. If they do not exhibit a proof, the dormant
player wins by default and we informally say the active player “broke the rules”.
In deterministic assignment games x := f, neither player makes a choice, but
the program variable x takes on the value of a term f . In nondeterministic
assignment games x := ∗, the active player picks a value for x : Q. In the choice
game α ∪ β, the active player chooses whether to play game α or game β. In
the sequential composition game α; β, game α is played first, then β from the
resulting state. In the repetition game α∗ , the active player chooses after each
repetition of α whether to continue playing, but loses if they repeat α infinitely.
Notably, the exact number of repetitions can depend on the dormant player’s
moves, so the active player need not know, let alone announce, the exact number
of iterations in advance. In the dual game α^d, the active player becomes dormant
and vice-versa, then α is played. We parenthesize games with braces {α} when
necessary. Sequential and nondeterministic composition both associate to the
right, i.e., α ∪ β ∪ γ ≡ {α ∪ {β ∪ γ}}. This does not affect their semantics as both
operators are associative, but aids in reading proof terms.

Definition 3 (CGL Formulas). The set of CGL formulas φ (also ψ, ρ) is given recursively by the grammar:

φ ::= ⟨α⟩φ | [α]φ | f ∼ g



The defining constructs in CGL (and GL) are the modalities ⟨α⟩φ and [α]φ. These mean that the active or dormant Angel (i.e., constructive) player has a constructive strategy to play α and achieve postcondition φ. This paper does not develop the modalities for active and dormant Demon (i.e., classical) players because by definition those cannot be synthesized to executable code. We assume the presence of interpreted comparison predicates ∼ ∈ {≤, <, =, ≠, >, ≥}.
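The full syntax transcribes directly into a mutually recursive datatype; the following is a minimal sketch in Haskell with our own constructor names (terms are truncated to the operations used in our examples):

```haskell
type Var = String

data Term = Q Rational | V Var
          | Plus Term Term | Times Term Term | Minus Term Term

data Rel = Le | Lt | Eql | Neq | Gt | Geq  -- ~ ∈ {≤, <, =, ≠, >, ≥}

-- φ ::= ⟨α⟩φ | [α]φ | f ~ g
data Formula = Dia Game Formula | Box Game Formula | Cmp Rel Term Term

-- α, β ::= ?φ | x := f | x := * | α ∪ β | α; β | α* | α^d
data Game
  = Test Formula      -- ?φ: active player must prove φ
  | Assign Var Term   -- x := f: deterministic assignment
  | AssignAny Var     -- x := *: active player picks a rational
  | Choice Game Game  -- α ∪ β: active player picks a branch
  | Seq Game Game     -- α; β: play α, then β
  | Star Game         -- α*: active player decides when to stop
  | Dual Game         -- α^d: swap active and dormant players
```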
The standard connectives of first-order constructive logic can be derived from games and comparisons. Verum (tt) is defined as 1 > 0 and falsum (ff) as 0 > 1. Conjunction φ ∧ ψ is defined as ⟨?φ⟩ψ, disjunction φ ∨ ψ as ⟨?φ ∪ ?ψ⟩tt, implication φ → ψ as [?φ]ψ, universal quantification ∀x φ as [x := ∗]φ, and existential quantification ∃x φ as ⟨x := ∗⟩φ. As usual in logic, equivalence φ ↔ ψ can also be defined as (φ → ψ) ∧ (ψ → φ). As usual in constructive logics, negation ¬φ is defined as φ → ff, and inequality by f ≠ g ≡ ¬(f = g). We will use the derived constructs freely but present semantics and proof rules only for the core constructs to minimize duplication. Indeed, it will aid in understanding of the proof term language to keep the definitions above in mind, because the proof terms for many first-order programs follow those from first-order constructive logic.
For convenience, we also write derived operators where the dormant player is given control of a single choice before returning control to the active player. The dormant choice α ∩ β, defined as {α^d ∪ β^d}^d, says the dormant player chooses which branch to take, but the active player is in control of the subgames. We write φ^y_x (likewise for α and f) for the renaming of x for y and vice versa in formula φ, and write φ^f_x for the substitution of term f for program variable x in φ, if the substitution is admissible (Def. 9 in Section 6).
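Continuing the AST sketch above, the derived connectives and the dormant choice are then literal one-liners (again, the names are ours):

```haskell
-- Verum and falsum as comparisons: tt = (1 > 0), ff = (0 > 1)
tt, ff :: Formula
tt = Cmp Gt (Q 1) (Q 0)
ff = Cmp Gt (Q 0) (Q 1)

andF, orF, impF :: Formula -> Formula -> Formula
andF p q = Dia (Test p) q                     -- φ ∧ ψ  ≡  ⟨?φ⟩ψ
orF  p q = Dia (Choice (Test p) (Test q)) tt  -- φ ∨ ψ  ≡  ⟨?φ ∪ ?ψ⟩tt
impF p q = Box (Test p) q                     -- φ → ψ  ≡  [?φ]ψ

forallF, existsF :: Var -> Formula -> Formula
forallF x p = Box (AssignAny x) p             -- ∀x φ  ≡  [x := *]φ
existsF x p = Dia (AssignAny x) p             -- ∃x φ  ≡  ⟨x := *⟩φ

notF :: Formula -> Formula
notF p = impF p ff                            -- ¬φ  ≡  φ → ff

-- Dormant choice: α ∩ β  ≡  {α^d ∪ β^d}^d
dormantChoice :: Game -> Game -> Game
dormantChoice a b = Dual (Choice (Dual a) (Dual b))
```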

3.1 Example Games


We demonstrate the meaning and usage of the CGL constructs via examples,
culminating in the two classic games of Nim and cake-cutting.

Nondeterministic Programs. Every (possibly nondeterministic) program is also a one-player game. For example, the program n := 0; {n := n + 1}∗ can nondeterministically set n to any natural number because Angel has a choice whether to continue after every repetition of the loop, but is not allowed to continue forever. Conversely, games are like programs where the environment (Demon) is adversarial, and the program (Angel) strategically resolves nondeterminism to overcome the environment.

Demonic Counter. Angel's choices often must be reactive to Demon's choices. Consider the game c := 10; {c := c − 1 ∩ c := c − 2}∗; ?0 ≤ c ≤ 2, where Demon repeatedly decreases c by 1 or 2, and Angel chooses when to stop. Angel only wins because she can pass the test 0 ≤ c ≤ 2, which she can do by simply repeating the loop until 0 ≤ c ≤ 2 holds. If Angel had to decide the loop duration in advance, Demon could force a rules violation by “guessing” the duration and changing his choices of c := c − 1 vs. c := c − 2.

Coin Toss. Games are perfect-information and do not possess randomness in the
probabilistic sense, only (possibilistic) nondeterminism. This standard limitation
is shown by attempting to express a coin-guessing game:

{coin := 0 ∩ coin := 1}; {guess := 0 ∪ guess := 1}; ?guess = coin

The Demon player sets the value of a tossed coin, but does so adversarially,
not randomly, since strategies in CGL are pure strategies. The Angel player has
perfect knowledge of coin and can set guess equivalently, thus easily passing
the test guess = coin, unlike a real coin toss. Partial information games are
interesting future work that could be implemented by limiting the variables
visible in a strategy.

Nim. Nim is the standard introductory example of a discrete, 2-player, zero-sum, perfect-information game. We consider misère play (last player loses) for a version of Nim that is also known as the subtraction game. The constant Nim defines the game Nim:

Nim = {c := c − 1 ∪ c := c − 2 ∪ c := c − 3}; ?c > 0;
      {{c := c − 1 ∪ c := c − 2 ∪ c := c − 3}; ?c > 0}^d

The game state consists of a single counter c containing a natural number, which each player chooses (∪) to reduce by 1, 2, or 3 (c := c − k). The counter is non-negative, and the game repeats as long as Angel wishes, until some player empties the counter, at which point that player is declared the loser (?c > 0).
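As a concrete instance, the Nim game transcribes into the AST sketch from above as follows (our encoding):

```haskell
-- One half of a round: subtract 1, 2, or 3 from c, then check c > 0
nimMove :: Game
nimMove = Seq (Choice (dec 1) (Choice (dec 2) (dec 3)))
              (Test (Cmp Gt (V "c") (Q 0)))
  where dec k = Assign "c" (Minus (V "c") (Q k))

-- Nim: the active player moves, then (via duality) the dormant player moves
nim :: Game
nim = Seq nimMove (Dual nimMove)

-- The repeated game Nim* used in Propositions 1 and 2
nimStar :: Game
nimStar = Star nim
```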

Proposition 1 (Dormant winning region). Suppose c ≡ 1 (mod 4). Then the dormant player has a strategy to ensure c ≡ 1 (mod 4) as an invariant. That is, the following CGL formula is valid (true in every state):

c > 0 → c mod 4 = 1 → [Nim∗] c mod 4 = 1

This implies that the dormant player wins the game, because the active player violates the rules once c = 1 and no move is valid. We now state the winning region for an active player.

Proposition 2 (Active winning region). Suppose c ∈ {0, 2, 3} (mod 4) initially, and the active player controls the loop duration. Then the active player can achieve c ∈ {2, 3, 4}:

c > 0 → c mod 4 ∈ {0, 2, 3} → ⟨Nim∗⟩ c ∈ {2, 3, 4}

At that point, the active player will win in one move by setting c = 1 which
forces the dormant player to set c = 0 and fail the test ?c > 0.
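The arithmetic underlying both propositions is the classical modulo-4 argument; the decision rules can be stated as one-liners (a sketch over plain integers, not over proof terms):

```haskell
-- Dormant invariant (Proposition 1): if c ≡ 1 (mod 4) and the opponent
-- subtracts k ∈ {1,2,3}, answering with 4 - k restores c ≡ 1 (mod 4).
dormantReply :: Integer -> Integer
dormantReply k = 4 - k

-- Active strategy (Proposition 2): from c ≡ 0, 2, or 3 (mod 4), subtracting
-- (c - 1) `mod` 4 (which lies in {1,2,3} there) hands the opponent a counter
-- with c ≡ 1 (mod 4); from c ≡ 1 (mod 4) no such legal move exists.
activeMove :: Integer -> Integer
activeMove c = (c - 1) `mod` 4
```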

Cake-cutting. Another classic 2-player game, from the study of equitable divi-
sion, is the cake-cutting problem [40]: The active player cuts the cake in two,
then the (initially-)dormant player gets first choice of a piece. This is an optimal
protocol for splitting the cake in the sense that the active player is incentivized
to split the cake evenly, else the dormant player could take the larger piece.
Cake-cutting is also a simple use case for fractional numbers. The constant CC
defines the cake-cutting game. Here x is the relative size (from 0 to 1) of the
first piece, y is the size of the second piece, a is the size of the active player’s
piece, and d is the size of the dormant player's piece.

CC = x := ∗; ?(0 ≤ x ≤ 1); y := 1 − x;
{a := x; d := y ∩ a := y; d := x}

The game is played only once. The active player picks the division of the cake,
which must be a fraction 0 ≤ x ≤ 1. The dormant player then picks which slice
goes to whom.
The active player has a tight strategy to achieve a 0.5 cake share, as stated
in Proposition 3.
Proposition 3 (Active winning region). The following formula is valid:

⟨CC⟩ a ≥ 0.5

The dormant player also has a computable strategy to achieve exactly 0.5
share of the cake (Proposition 4). Division is fair because each player has a
strategy to get their fair 0.5 share.
Proposition 4 (Dormant winning region). The following formula is valid:

[CC] d ≥ 0.5

Computability and Numeric Types. Perfect fair division is only achieved for a, d ∈
Q because rational equality is decidable. Trichotomy (a < 0.5∨a = 0.5∨a > 0.5)
is a tautology, so the dormant player’s strategy can inspect the active player’s
choice of a. Notably, we intend to support constructive reals in future work, for
which exact equality is not decidable and trichotomy is not an axiom. Future
work on real-valued CGL will need to employ approximate comparison techniques
as is typical for constructive reals [11,13,51]. The examples in this section have
been proven [12] using the calculus defined in Section 5.

4 Semantics

We now develop the semantics of CGL. In contrast to classical GL, whose semantics are well-understood [38], the major semantic challenge for CGL is capturing the competition between a constructive Angel and a classical Demon. We base our approach on realizability semantics [37,33], because this approach makes the relationship between constructive proofs and programs particularly clear, and generating programs from CGL proofs is one of our motivations.
Unlike previous applications of realizability, games feature two agents, and
one could imagine a semantics with two realizers, one for each of Angel and
Demon. However, we choose to use only one realizer, for Angel, which captures
the fact that only Angel is restricted to a computable strategy, not Demon.
Moreover, a single realizer makes it clear that Angel cannot inspect Demon’s
strategy, only the game state, and also simplifies notations and proofs. Because
Angel is computable but Demon is classical, our semantics has the flavor both
of realizability semantics and of a traditional Kripke semantics for programs.
The semantic functions employ game states ω ∈ S where we write S for the set of all states. We additionally write ⊤, ⊥ ∉ S (not to be confused with formulas tt and ff) for the pseudo-states ⊤ and ⊥ indicating that Angel or Demon respectively has won the game early by forcing the other to fail a test. Each ω ∈ S maps each x ∈ V to a value ω(x) ∈ Q. We write ω_x^v for the state that agrees with ω except that x is assigned value v where v ∈ Q.
Definition 4 (Arithmetic term semantics). A term f is a computable func-
tion of the state, so the interpretation [[f ]]ω of term f in state ω is f (ω).

4.1 Realizers
To define the semantics of games, we first define realizers, the programs which implement strategies. The language of realizers is a higher-order lambda calculus where variables can range over game states, numbers, or realizers which realize a given proposition φ. Gameplay proceeds in continuation-passing style: invoking a realizer returns another realizer which performs any further moves. We describe the typing constraints for realizers informally, and say a is an ⟨α⟩φ-realizer (a ∈ ⟨α⟩φ Rz) if it provides strategic decisions exactly when ⟨α⟩φ demands them.
Definition 5 (Realizers). The syntax of realizers a, b, c ∈ Rz (where Rz is the set of all realizers) is defined coinductively:

a, b, c ::= x | () | (a, b) | πL(a) | πR(a) | (λω : S. a(ω)) | (Λx : Q. a) | (Λx : φ Rz. a) | a v | a b | a ω | if (f(ω)) a else b

where x is a program (or realizer) variable and f is a term over the state ω. The Roman a, b, c should not be confused with the Greek α, β, γ which range over games. Realizers have access to the game state ω, expressed by lambda realizers (λω : S. a(ω)) which, when applied in a state ν, compute the realizer a with ν substituted for ω. State lambdas λ are distinguished from propositional and first-order lambdas Λ. The unit realizer () makes no choices and is understood as a unit tuple. Units () realize f ∼ g because rational comparisons, in contrast to real comparisons, are decidable. Conditional strategic decisions are realized by if (f(ω)) a else b for computable function f : S → B, and execute a if f returns truth, else b. Realizer (λω : S. f(ω)) is an ⟨α ∪ β⟩φ-realizer if f(ω) ∈ ({0} × ⟨α⟩φ Rz) ∪ ({1} × ⟨β⟩φ Rz) for all ω. The first component determines which branch is taken, while the second component is a continuation which must be able to play the corresponding branch. Realizer (λω : S. f(ω)) can also be an ⟨x := ∗⟩φ-realizer, which requires f(ω) ∈ Q × (φ Rz) for all ω. The first component determines the value of x while the second component demonstrates the postcondition φ. The pair realizer (a, b) realizes both Angelic tests ⟨?φ⟩ψ and dormant choices [α ∪ β]φ. It is identified with a pair of realizers: (a, b) ∈ Rz × Rz. A dormant realizer waits and remembers the active Demon's moves, because they typically inform Angel's strategy once Angel resumes action. The first-order realizer (Λx : Q. b) is a [x := ∗]φ-realizer when b_x^v is a φ-realizer for every v ∈ Q; Demon tells Angel the desired value of x, which informs Angel's continuation b. The higher-order realizer (Λx : φ Rz. b) realizes [?φ]ψ when b_x^c realizes ψ for every φ-realizer c. Demon announces the realizer for φ, which Angel's continuation b may inspect. Tuples are inspected with projections πL(a) and πR(a). A lambda is inspected by applying arguments a ω for state-lambdas, a v for first-order, and a b for higher-order. Realizers for sequential compositions ⟨α; β⟩φ (likewise [α; β]φ) are ⟨α⟩⟨β⟩φ-realizers: first α is played, and in every case the continuation must play β before showing φ. Realizers for repetitions α∗ are streams containing α-realizers, possibly infinite by virtue of coinductive syntax. Active loop realizer ind(x. a) is the least fixed point of the equation b = [b/x]a, i.e., x is a recursive call which must be invoked only in accordance with some well-order. We realize dormant loops with gen(a, x.b, x.c), coinductively generated from initial value a, update b, and post-step c with variable x for the current generator value.

Active loops must terminate, so ⟨α∗⟩φ-realizers are constructed inductively using any well-order on states. Dormant loops must be played as long as the opponent wishes, so [α∗]φ-realizers are constructed coinductively, with the invariant that φ has a realizer at every iteration.
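For intuition, the realizer syntax admits a direct shallow embedding in a lazy functional language. The following sketch is our own illustration with hypothetical constructor names; Haskell's laziness supplies the coinductive reading needed for possibly-infinite dormant-loop realizers:

-- A hypothetical embedding (ours) of Def. 5; not the paper's formal syntax.
-- Laziness gives coinduction: a Rep node may unfold forever, like gen(a, x.b, x.c).
data Rz st
  = UnitRz                          -- () : no strategic choice to make
  | PairRz (Rz st) (Rz st)          -- (a, b) : Angelic tests / dormant choices
  | StateLam (st -> Rz st)          -- λω : S. a(ω) : inspect the game state
  | NumLam (Rational -> Rz st)      -- Λx : Q. a : consume Demon's value for x
  | RzLam (Rz st -> Rz st)          -- Λx : φ Rz. a : consume Demon's evidence
  | Rep (Rz st) (Rz st)             -- loop realizer: this round, then the rest
-- if (f(ω)) a else b is definable as StateLam (\w -> if f w then a else b).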

4.2 Formula and Game Semantics


A state ω paired with a realizer a that continues the game is called a possibility. A region (written X, Y, Z) is a set of possibilities. We write [[φ]] ⊆ φ Rz × S for the region which realizes formula φ. A formula φ is valid iff some a uniformly realizes every state, i.e., {a} × S ⊆ [[φ]]. A sequent Γ ⊢ φ is valid iff the formula ⋀Γ → φ is valid, where ⋀Γ is the conjunction of all assumptions in Γ.

The game semantics are region-oriented, i.e., they process possibilities in bulk, though Angel commits to a strategy from the start. The region X⟨α⟩ : ℘(Rz × S) is the union of all end regions of game α which arise when active Angel commits to an element of X, then Demon plays adversarially. In X[[α]] : ℘(Rz × S) Angel is the dormant player, but it is still Angel who commits to an element of X and Demon who plays adversarially. Recall that pseudo-states ⊤ and ⊥ represent early wins by Angel and Demon, respectively. The definitions below implicitly assume ⊥, ⊤ ∉ X; they extend to the case ⊥ ∈ X (likewise ⊤ ∈ X) using the equations (X ∪ {⊥})[[α]] = X[[α]] ∪ {⊥} and (X ∪ {⊥})⟨α⟩ = X⟨α⟩ ∪ {⊥}. That is, if Demon has already won by forcing an Angel violation initially, any remaining game can be skipped with an immediate Demon victory, and vice-versa. The game semantics exploit the Angelic projections Z⟨0⟩, Z⟨1⟩ and Demonic projections Z[0], Z[1], which represent binary decisions made by a constructive Angel and a classical Demon, respectively. The Angelic projections, which are defined Z⟨0⟩ = {(πR(a), ω) | πL(a)(ω) = 0, (a, ω) ∈ Z} and Z⟨1⟩ = {(πR(a), ω) | πL(a)(ω) = 1, (a, ω) ∈ Z}, filter by which branch Angel chooses with πL(a)(ω) ∈ B, then project the remaining strategy πR(a). The Demonic projections, which are defined Z[0] ≡ {(πL(a), ω) | (a, ω) ∈ Z} and Z[1] ≡ {(πR(a), ω) | (a, ω) ∈ Z}, contain the same states as Z, but project the realizer to tell Angel which branch Demon took.
Definition 6 (Formula semantics). [[φ]] ⊆ Rz × S is defined as:

  ((), ω) ∈ [[f ∼ g]]   iff  [[f]]ω ∼ [[g]]ω
  (a, ω) ∈ [[⟨α⟩φ]]     iff  {(a, ω)}⟨α⟩ ⊆ ([[φ]] ∪ {⊤})
  (a, ω) ∈ [[[α]φ]]     iff  {(a, ω)}[[α]] ⊆ ([[φ]] ∪ {⊤})
Comparisons f ∼ g defer to the term semantics, so the interesting cases are the game modalities. Both [α]φ and ⟨α⟩φ ask whether Angel wins α by following the given strategy, and differ only in whether Demon vs. Angel is the active player; thus in both cases every Demonic choice must satisfy Angel's goal, and early Demon wins are counted as Angel losses.
Definition 7 (Angel game forward semantics). We inductively define the region X⟨α⟩ : ℘(Rz × S) in which α can end when active Angel plays X:

  X⟨?φ⟩      = {(πR(a), ω) | (πL(a), ω) ∈ [[φ]] for some (a, ω) ∈ X}
             ∪ {⊥ | (πL(a), ω) ∉ [[φ]] for all (a, ω) ∈ X}
  X⟨x := f⟩  = {(a, ω_x^{[[f]]ω}) | (a, ω) ∈ X}
  X⟨x := ∗⟩  = {(πR(a), ω_x^{πL(a)(ω)}) | (a, ω) ∈ X}
  X⟨α; β⟩    = (X⟨α⟩)⟨β⟩
  X⟨α ∪ β⟩   = X⟨0⟩⟨α⟩ ∪ X⟨1⟩⟨β⟩
  X⟨α∗⟩      = ⋂{Z⟨0⟩ ⊆ Rz × S | X ∪ (Z⟨1⟩⟨α⟩) ⊆ Z}
  X⟨α^d⟩     = X[[α]]
Definition 8 (Demon game forward semantics). We inductively define the region X[[α]] : ℘(Rz × S) in which α can end when dormant Angel plays X:

  X[[?φ]]     = {(a b, ω) | (a, ω) ∈ X, (b, ω) ∈ [[φ]], some b ∈ Rz}
              ∪ {⊤ | (a, ω) ∈ X, but no (b, ω) ∈ [[φ]]}
  X[[x := f]] = {(a, ω_x^{[[f]]ω}) | (a, ω) ∈ X}
  X[[x := ∗]] = {(a r, ω_x^r) | (a, ω) ∈ X, r ∈ Q}
  X[[α; β]]   = (X[[α]])[[β]]
  X[[α ∪ β]]  = X[0][[α]] ∪ X[1][[β]]
  X[[α∗]]     = ⋂{Z[0] ⊆ Rz × S | X ∪ (Z[1][[α]]) ⊆ Z}
  X[[α^d]]    = X⟨α⟩

Angelic tests ?φ end in the current state ω with remaining realizer πR (a) if
Angel can realize φ with πL (a), else end in ⊥. Angelic deterministic assignments
consume no realizer and simply update the state, then end. Angelic nondeter-
ministic assignments x := ∗ ask the realizer πL (a) to compute a new value for x
from the current state. Angelic compositions α; β first play α, then β from the
resulting state using the resulting continuation. Angelic choice games α ∪ β use
the Angelic projections to decide which branch is taken according to πL (a). The
realizer πR (a) may be reused between α and β, since πR (a) could just invoke
πL (a) if it must decide which branch has been taken. This definition of Angelic
choice (corresponding to constructive disjunction) captures the reality that re-
alizers in CGL, in contrast with most constructive logics, are entitled to observe
a game state, but they must do so in computable fashion.

Repetition Semantics. In any GL, the challenge in defining the semantics of repetition games α∗ is that the number of iterations, while finite, can depend on both players' actions and is thus not known in advance, while the DL-like semantics of α∗ as the finite reflexive, transitive closure of α gives an advance-notice semantics. Classical GL provides the no-advance-notice semantics as a fixed point [38], and we adopt the fixed point semantics as well. The Angelic choice whether to stop (Z⟨0⟩) or iterate the loop (Z⟨1⟩) is analogous to the case for α ∪ β.

Duality Semantics. To play the dual game α^d, the active and dormant players switch roles, then play α. In classical GL, this characterization of duality is interchangeable with the definition of α^d as the game that Angel wins exactly when it is impossible for Angel to lose. The characterizations are not interchangeable in CGL because the Determinacy Axiom (all games have winners) of GL is not valid in CGL:

Remark 1 (Indeterminacy). The classically equivalent determinacy axiom schemata ¬⟨α⟩¬φ → [α]φ and ⟨α⟩¬φ ∨ [α]φ of classical GL are not valid in CGL, because they imply double negation elimination.

Remark 2 (Classical duality). In classical GL, Angelic dual games are characterized by the axiom schema ⟨α^d⟩φ ↔ ¬⟨α⟩¬φ, which is not valid in CGL. It is classically interdefinable with ⟨α^d⟩φ ↔ [α]φ.

The determinacy axiom is not valid in CGL, so we take ⟨α^d⟩φ ↔ [α]φ as primary.

4.3 Demonic Semantics


Demon wins a Demonic test by presenting a realizer b as evidence that the precondition holds. If he cannot present a realizer (i.e., because none exists), then the game ends in ⊤, so Angel wins by default. Else Angel's higher-order realizer a consumes the evidence of the precondition, i.e., Angelic strategies are entitled to depend (computably) on how Demon demonstrated the precondition. Angel can check that Demon passed the test by executing b. The Demonic repetition game α∗ is defined as a fixed point [42] with Demonic projections. Computationally, a winning invariant for the repetition is the witness of its winnability.

The remaining cases are innocuous by comparison. Demonic deterministic assignments x := f deterministically store the value of f in x, just as Angelic assignments do. In Demonic nondeterministic assignment x := ∗, Demon chooses to set x to any value. When Demon plays the choice game α ∪ β, Demon chooses classically between α and β. The dual game α^d is played by Demon becoming dormant and Angel becoming active in α.

Semantics Examples. The realizability semantics of games are subtle on a first read, so we provide examples of realizers. In these examples, the state argument ω is implicit, and we refer to ω(x) simply as x for brevity.
Recall that [?φ]ψ and φ → ψ are equivalent. For any φ, the identity function
(Λx : φ Rz. x) is a φ → φ-realizer: for every φ-realizer x which Demon presents,
Angel can present the same x as evidence of φ. This confirms expected behavior
per propositional constructive logic: the identity function is the proof of self-
implication.
In the example formula ⟨x := ∗^d; {x := x ∪ x := −x}⟩ x ≥ 0, Demon gets to set x, then Angel decides whether to negate x in order to make it nonnegative. It is realized by Λx : Q. ((if (x < 0) 1 else 0), ()): Demon announces the value of x, then Angel's strategy is to check the sign of x, taking the right branch when x is negative. Each branch contains a deterministic assignment which consumes no realizer, then the postcondition x ≥ 0 has the trivial realizer ().
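Rendered as a plain function (a hypothetical gloss of ours, not the formal realizer syntax), the computable branch choice is just a sign test on the value Demon announces:

-- Hypothetical rendering (ours) of Λx : Q. ((if (x < 0) 1 else 0), ()):
-- bit 1 selects the x := -x branch; () trivially realizes x ≥ 0 afterwards.
negateStrategy :: Rational -> (Int, ())
negateStrategy v = (if v < 0 then 1 else 0, ())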

Consider the formula ⟨{x := x + 1}∗⟩ x > y, where Angel's winning strategy is to repeat the loop until x > y, which will occur as x increases. The realizer is ind(w. (if (x > y) (0, ()) else (1, w), ())), which says that Angel stops the loop if x > y and proves the postcondition with a trivial strategy. Else Angel continues the loop, whose body consumes no realizer, and supplies the inductive call w to continue the strategy inductively.
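Simulating this realizer against the loop body x := x + 1 gives a terminating recursion. The sketch below is our own hypothetical gloss; its termination mirrors the well-ordered descent required of inductive realizers:

-- Hypothetical simulation (ours): play the loop body x := x + 1 until the
-- stop condition x > y fires; x increases by 1 each round, so x > y is reached.
playLoop :: Rational -> Rational -> Rational
playLoop x y = if x > y then x else playLoop (x + 1) y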

Consider the formula [?x > 0; {x := x + 1}∗] ∃y (y ≤ x ∧ y > 0) for a subtle example. Our strategy for Angel is to record the initial value of x in y, then maintain a proof that y ≤ x as x increases. This strategy is represented by Λw : (x > 0) Rz. gen((x, ((), w)), z.(πL(z), ((), πR(πR(z)))), z.z). That is, initially Demon announces a proof w of x > 0. Angel specifies the initial element of the realizer stream by witnessing ∃y (y ≤ x ∧ y > 0) with c0 = (x, ((), w)), where the first component instantiates y = x, the trivial second component indicates that y ≤ y trivially, and the third component reuses w as a proof of y > 0. Demon can choose to repeat the loop arbitrarily. When Demon demands the k'th repetition, z is bound to c_{k−1} to compute c_k = (πL(z), ((), πR(πR(z)))), which plays the next iteration. That is, at each iteration Angel witnesses ∃y (y ≤ x ∧ y > 0) by assigning the same value (stored in πL(z)) to y, reproving y ≤ x with (), then reusing the proof (stored in πR(πR(z))) that y > 0.
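Operationally, this gen realizer is an infinite stream of witnesses of which Demon consumes finitely many. A hypothetical Haskell gloss (ours), where laziness keeps the infinite stream harmless:

-- Hypothetical gloss (ours): the k'th element realizes ∃y (y ≤ x ∧ y > 0)
-- at the k'th iteration, always reusing the initial witness and proof w.
witnessStream :: Rational -> p -> [(Rational, ((), p))]
witnessStream x0 w = iterate next (x0, ((), w))
  where next (y, (u, pw)) = (y, (u, pw))  -- c_k computed from c_{k-1}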

5 Proof Calculus
Having settled on the meaning of a game in Section 4, we proceed to develop a
calculus for proving CGL formulas syntactically. The goal is twofold: the practical
motivation, as always, is that when verifying a concrete example, the realizabil-
ity semantics provide a notion of ground truth, but are impractical for proving
large formulas. The theoretical motivation is that we wish to expose the computational interpretation of the modalities ⟨α⟩φ and [α]φ as the types of the players' respective winning strategies for game α that has φ as its goal condition. Since CGL is constructive, such a strategy constructs a proof of the postcondition φ. To study the computational nature of proofs, we write proof terms explicitly: the main proof judgement Γ ⊢ M : φ says proof term M is a proof of φ in context Γ, or equivalently a proof of sequent (Γ ⊢ φ). We write M, N, O (sometimes A, B, C) for arbitrary proof terms, and p, q, ℓ, r, s, g for proof variables, that is, variables that range over proof terms of a given proposition.
assignable program variables, the proof variables are given their meaning by
substitution and are scoped locally, not globally. We adapt propositional proof
terms such as pairing, disjoint union, and lambda-abstraction to our context of
game logic. To support first-order games, we include first-order proof terms and
new terms for features: dual, assignment, and repetition games.
We now develop the calculus by starting with standard constructs and work-
ing toward the novel constructs of CGL. The assumptions p in Γ are named,
so that they may appear as variable proof-terms p. We write Γ_x^y and M_x^y for the renaming of program variable x to y and vice versa in context Γ or proof term M, respectively. Proof rules for state-modifying constructs explicitly perform renamings, which both ensures they are applicable as often as possible and also ensures that references to proof variables support an intuitive notion of lexical scope. Likewise Γ_x^f and M_x^f are the substitutions of term f for program variable x. We use distinct notation to substitute proof terms for proof variables while avoiding capture: [N/p]M substitutes proof term N for proof variable p in proof term M. Some proof terms such as pairs prove both a diamond formula and a box formula. We write ⟨M, N⟩ and [M, N] respectively to distinguish the terms, or ⟨[M, N]⟩ to treat them uniformly. Likewise we abbreviate ⟨[α]⟩φ when the same rule works for both diamond and box modalities, using [⟨α⟩]φ to denote its dual modality. The proof terms ⟨x := f_x^y in p. M⟩ and [x := f_x^y in p. M] introduce an auxiliary ghost variable y for the old value of x, which improves completeness without requiring manual ghost steps.
The propositional proof rules of CGL are in Fig. 1. Formula [?φ]ψ is constructive implication, so rule [?]E with proof term M N eliminates M by supplying an N that proves the test condition. Lambda terms (λp : φ. M) are introduced by rule [?]I by extending the context Γ. While this rule is standard, it is worth emphasizing that here p is a proof variable for which a proof term (like N in [?]E) may be substituted, and that the game state is untouched by [?]I. Constructive disjunction (between the branches ⟨α⟩φ and ⟨β⟩φ) is the choice ⟨α ∪ β⟩φ. The introduction rules for injections are ⟨∪⟩I1 and ⟨∪⟩I2, and case analysis is performed with rule ⟨∪⟩E, with two branches that prove a common consequence

⟨∪⟩E:  Γ ⊢ A : ⟨α ∪ β⟩φ    Γ, ℓ : ⟨α⟩φ ⊢ B : ψ    Γ, r : ⟨β⟩φ ⊢ C : ψ
       ⟹  Γ ⊢ case A of ℓ ⇒ B | r ⇒ C : ψ
⟨∪⟩I1: Γ ⊢ M : ⟨α⟩φ  ⟹  Γ ⊢ ⟨ℓ · M⟩ : ⟨α ∪ β⟩φ
⟨∪⟩I2: Γ ⊢ M : ⟨β⟩φ  ⟹  Γ ⊢ ⟨r · M⟩ : ⟨α ∪ β⟩φ
[∪]E1: Γ ⊢ M : [α ∪ β]φ  ⟹  Γ ⊢ [π1 M] : [α]φ
[∪]E2: Γ ⊢ M : [α ∪ β]φ  ⟹  Γ ⊢ [π2 M] : [β]φ
[∪]I:  Γ ⊢ M : [α]φ    Γ ⊢ N : [β]φ  ⟹  Γ ⊢ [M, N] : [α ∪ β]φ
hyp:   Γ, p : φ ⊢ p : φ
⟨?⟩I:  Γ ⊢ M : φ    Γ ⊢ N : ψ  ⟹  Γ ⊢ ⟨M, N⟩ : ⟨?φ⟩ψ
⟨?⟩E1: Γ ⊢ M : ⟨?φ⟩ψ  ⟹  Γ ⊢ ⟨π1 M⟩ : φ
⟨?⟩E2: Γ ⊢ M : ⟨?φ⟩ψ  ⟹  Γ ⊢ ⟨π2 M⟩ : ψ
[?]I:  Γ, p : φ ⊢ M : ψ  ⟹  Γ ⊢ (λp : φ. M) : [?φ]ψ
[?]E:  Γ ⊢ M : [?φ]ψ    Γ ⊢ N : φ  ⟹  Γ ⊢ (M N) : ψ

Fig. 1. CGL proof calculus: Propositional rules (premisses left of ⟹, conclusion right)

from each disjunct. The cases ⟨?φ⟩ψ and [α ∪ β]φ are conjunctive. Conjunctions are introduced by ⟨?⟩I and [∪]I as pairs, and eliminated by ⟨?⟩E1, ⟨?⟩E2, [∪]E1, and [∪]E2 as projections. Lastly, rule hyp says formulas in the context hold by assumption.
We now begin considering non-propositional rules, starting with the simplest
ones. The majority of the rules in Fig. 2, while thoroughly useful in proofs,

⟨∗⟩C: Γ ⊢ A : ⟨α∗⟩φ    Γ, s : φ ⊢ B : ψ    Γ, g : ⟨α⟩⟨α∗⟩φ ⊢ C : ψ
      ⟹  Γ ⊢ case∗ A of s ⇒ B | g ⇒ C : ψ
M:    Γ ⊢ M : ⟨[α]⟩φ    Γ_BV(α)^y, p : φ ⊢ N : ψ  ⟹  Γ ⊢ M ◦p N : ⟨[α]⟩ψ
[∗]E: Γ ⊢ M : [α∗]φ  ⟹  Γ ⊢ [unroll M] : φ ∧ [α][α∗]φ
[d]I: Γ ⊢ M : ⟨[α]⟩φ  ⟹  Γ ⊢ ⟨[yield M]⟩ : ⟨[α^d]⟩φ
⟨∗⟩S: Γ ⊢ M : φ  ⟹  Γ ⊢ ⟨stop M⟩ : ⟨α∗⟩φ
[∗]R: Γ ⊢ M : φ ∧ [α][α∗]φ  ⟹  Γ ⊢ [roll M] : [α∗]φ
⟨∗⟩G: Γ ⊢ M : φ ∨ ⟨α⟩⟨α∗⟩φ  ⟹  Γ ⊢ ⟨go M⟩ : ⟨α∗⟩φ
[;]I: Γ ⊢ M : ⟨[α]⟩⟨[β]⟩φ  ⟹  Γ ⊢ ⟨[ι M]⟩ : ⟨[α; β]⟩φ

Fig. 2. CGL proof calculus: Some non-propositional rules

are computationally trivial. The repetition rules ([∗]E, [∗]R) fold and unfold the notion of repetition as iteration. The rolling and unrolling terms are named in analogy to the iso-recursive treatment of recursive types [50], where an explicit operation is used to expand and collapse the recursive definition of a type. Rules ⟨∗⟩C, ⟨∗⟩S, ⟨∗⟩G are the destructor and injectors for ⟨α∗⟩φ, which are similar to those for ⟨α ∪ β⟩φ. The duality rules ([d]I) say the dual game is proved by proving the game where roles are reversed. The sequencing rules ([;]I) say a
sequential game is played by playing the first game with the goal of reaching a
state where the second game is winnable.
Among these rules, monotonicity M is especially computationally rich. The notation Γ_BV(α)^y says that in the second premiss, the assumptions in Γ have all bound variables of α (written BV(α)) renamed to fresh variables y for completeness. In practice, Γ usually contains some assumptions on variables that are not bound, which we wish to access without writing them explicitly in φ. Rule M is used to execute programs right-to-left, giving shorter, more efficient proofs. It can also be used to derive the Hoare-logical sequential composition rule, which is frequently used to reduce the number of case splits. Note that like every GL, CGL is subnormal, so the modal modus ponens axiom K and Gödel generalization (or necessitation) rule G are not sound, and M takes over much of the role they usually serve. On the surface, M simply says games are monotonic: a game's goal proposition may freely be replaced with a weaker one. From a computational perspective, Section 7 will show that rule M can be (lazily) eliminated. Moreover, M is an admissible rule, one whose instances can all be derived from existing rules. When proofs are written right-to-left with M, the normalization relation translates them to left-to-right normal proofs. Note also that in checking M ◦p N, the context Γ has the bound variables of α renamed freshly to some y within N, as required to maintain soundness across execution of α.
Next, we consider first-order rules, i.e., those which deal with first-order programs that modify program variables. The first-order rules are given in Fig. 3. In ⟨:∗⟩E, FV(ψ) are the free variables of ψ, the variables which can influence its meaning. Nondeterministic assignment provides quantification over rational-valued program variables.

[:=]I: Γ_x^y, p : (x = f_x^y) ⊢ M : φ  ⟹  Γ ⊢ [x := f_x^y in p. M] : [x := f]φ   (y fresh)
⟨:∗⟩I: Γ_x^y, p : (x = f_x^y) ⊢ M : φ  ⟹  Γ ⊢ ⟨f_x^y :∗ p. M⟩ : ⟨x := ∗⟩φ   (y, p fresh, f computable)
⟨:∗⟩E: Γ ⊢ M : ⟨x := ∗⟩φ    Γ_x^y, p : φ ⊢ N : ψ  ⟹  Γ ⊢ unpack(M, py. N) : ψ   (y fresh, x ∉ FV(ψ))
[:∗]I: Γ_x^y ⊢ M : φ  ⟹  Γ ⊢ (λx : Q. M) : [x := ∗]φ   (y fresh)
[:∗]E: Γ ⊢ M : [x := ∗]φ  ⟹  Γ ⊢ (M f) : φ_x^f   (φ_x^f admissible)

Fig. 3. CGL proof calculus: first-order games

Rule [:∗]I is universal, with proof term (λx : Q. M). While this notation is suggestive, the difference vs. the function proof term (λp : φ. M) is essential: the proof term M is checked (resp. evaluated) in a state where the program variable x has changed from its initial value. For soundness, [:∗]I renames x to fresh program variable y throughout context Γ, written Γ_x^y. This means that M can freely refer to all facts of the full context, but they
now refer to the state as it was before x received a new value. Elimination [:∗]E then allows instantiating x to a term f. Existential quantification is introduced by ⟨:∗⟩I whose proof term ⟨f_x^y :∗ p. M⟩ is like a dependent pair plus bound renaming of x to y. The witness f is an arbitrary computable term, as always. We write ⟨f :∗ M⟩ for short when y is not referenced in M. It is eliminated in ⟨:∗⟩E by unpacking the pair, with side condition x ∉ FV(ψ) for soundness. The assignment rules [:=]I do not quantify, per se, but always update x to the value of the term f, and in doing so introduce an assumption that x and f (suitably renamed) are now equal. In ⟨:∗⟩I and [:=]I, program variable y is fresh.

⟨∗⟩I: Γ ⊢ A : ϕ    p : ϕ, q : (M0 = M ≻ 0) ⊢ B : ⟨α⟩(ϕ ∧ M0 ≻ M)    p : ϕ, q : M = 0 ⊢ C : φ
      ⟹  Γ ⊢ for(p : ϕ(M) = A; q. B; C){α} : ⟨α∗⟩φ   (M0 fresh)
[∗]I: Γ ⊢ M : J    p : J ⊢ N : [α]J    p : J ⊢ O : φ  ⟹  Γ ⊢ (M rep p : J. N in O) : [α∗]φ
FP:   Γ ⊢ A : ⟨α∗⟩φ    s : φ ⊢ B : ψ    g : ⟨α⟩ψ ⊢ C : ψ  ⟹  Γ ⊢ FP(A, s. B, g. C) : ψ
split: Γ ⊢ (split [f, g]) : f ≤ g ∨ f > g

Fig. 4. CGL proof calculus: loops

The looping rules in Fig. 4, especially ⟨∗⟩I, are arguably the most sophisticated in CGL. Rule ⟨∗⟩I provides a strategy to repeat a game α until the postcondition φ holds. This is done by exhibiting a convergence predicate ϕ and termination metric M with terminal value 0 and well-ordering ≻. Proof term A shows ϕ holds initially. Proof term B guarantees M decreases with every iteration, where M0 is a fresh metric variable which is equal to M at the antecedent of B and is never modified. Proof term C allows any postcondition φ which follows from convergence ϕ ∧ M = 0. Proof term for(p : ϕ(M) = A; q. B; C){α} suggests the computational interpretation as a for-loop: proof A shows the convergence predicate holds in the initial state, B shows that each step reduces the termination metric while maintaining the predicate, and C shows that the postcondition follows from the convergence predicate upon termination. The game α repeats until convergence is reached (M = 0). By the assumption that metrics are well-founded, convergence is guaranteed in finitely (but arbitrarily) many iterations.

A naïve, albeit correct, reading of rule ⟨∗⟩I says M is literally some term f. If lexicographic or otherwise non-scalar metrics should be needed, it suffices to interpret ϕ and M0 ≻ M as formulas over several scalar variables.
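As an illustration of rule ⟨∗⟩I (this instantiation is our own, not an example from the paper): to prove ⟨{x := x + 1}∗⟩ x > y, one may take convergence predicate ϕ ≡ tt and termination metric M = max(0, ⌈y − x⌉ + 1), using any computable term for the ceiling. Each iteration decreases M by exactly 1 while M ≻ 0, and M = 0 implies y − x ≤ −1, hence x > y, so proof term C closes the proof.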
Rule FP says ⟨α∗⟩φ is a least pre-fixed-point. That is, if we wish to show a formula ψ holds now, we show that ψ is any pre-fixed-point, then it must hold as it is no lesser than φ. Rule [∗]I is the well-understood induction rule for loops, which applies as well to repeated games. Premiss O ensures [∗]I supports any provable postcondition, which is crucial for eliminating M in Lemma 7. The elimination form for [α∗]φ is simply [∗]E. Like any program logic, reasoning in CGL consists of first applying program-logic rules to decompose a program until the
program has been entirely eliminated, then applying first-order logic principles
at the leaves of the proof. The constructive theory of rationals is undecidable
because it can express the undecidable [47] classical theory of rationals. Thus
facts about rationals require proof in practice. For the sake of space and since our
focus is on program reasoning, we defer an axiomatization of rational arithmetic
to future work. We provide a (non-effective!) rule FO which says valid first-order
formulas are provable.

FO: Γ ⊢ M : ρ  ⟹  Γ ⊢ FO[φ](M) : φ   (there exists a s.t. {a} × S ⊆ [[ρ → φ]], with ρ, φ first-order)

An effective special case of FO is split (Fig. 4), which says all term comparisons are decidable. Rule split can be generalized to decide termination metrics (M = 0 ∨ M ≻ 0). Rule iG says the value of term f can be remembered in fresh ghost variable x:

iG: Γ, p : x = f ⊢ M : φ  ⟹  Γ ⊢ Ghost[x = f](p. M) : φ   (x fresh except free in M, p fresh)

Rule iG can be defined using arithmetic and quantifiers:

Ghost[x = f](p. M) ≡ (λx : Q. (λp : (x = f). M)) f (FO[f = f]())

What’s Novel in the CGL Calculus? CGL extends first-order reasoning with game
reasoning (sequencing [32], assignments, iteration, and duality). The combina-
tion of first-order reasoning with game reasoning is synergistic: for example,
repetition games are known to be more expressive than repetition systems [42].
We give a new natural-deduction formulation of monotonicity. Monotonicity is
admissible and normalization translates monotonicity proofs into monotonicity-
free proofs. In doing so, normalization shows that right-to-left proofs can be
(lazily) rewritten as left-to-right. Additionally, first-order games are rife with
changing state, and soundness requires careful management of the context Γ .
The extended version [12] uses our calculus to prove the example formulas.

6 Theory: Soundness

Full versions of proofs outlined in this paper are given in the extended version [12]. We have introduced a proof calculus for CGL which can prove winning strategies for Nim and CC. For any new proof calculus, it is essential to convince ourselves of its soundness, which can be done within several prominent schools of thought. In proof-theoretic semantics, for example, the proof rules are taken as the ground truth, but are validated by showing the rules obey expected properties such as harmony or, for a sequent calculus, cut-elimination. While we will investigate proof terms separately (Section 8), we are already equipped to show soundness by direct appeal to the realizability semantics (Section 4), which we take as an independent notion of ground truth. We show soundness of CGL
proof rules against the realizability semantics, i.e., that every provable natural-
deduction sequent is valid. An advantage of this approach is that it explicitly
connects the notions of provability and computability! We build up to the proof
of soundness by proving lemmas on structurality, renaming and substitution.

Lemma 1 (Structurality). The structural rules W, X, and C are admissible, i.e., the conclusions are provable whenever the premisses are provable:

W: Γ ⊢ M : φ  ⟹  Γ, p : ψ ⊢ M : φ
X: Γ, p : φ, q : ψ ⊢ M : ρ  ⟹  Γ, q : ψ, p : φ ⊢ M : ρ
C: Γ, p : φ, q : φ ⊢ M : ρ  ⟹  Γ, p : φ ⊢ [p/q]M : ρ

Proof summary. Each rule is proved admissible by induction on M. Observe that the only premisses regarding Γ are of the form Γ(p) = φ, which are preserved
under weakening. Premisses are trivially preserved under exchange because con-
texts are treated as sets, and preserved modulo renaming by contraction as it
suffices to have any assumption of a given formula, regardless its name. The
context Γ is allowed to vary in applications of the inductive hypothesis, e.g., in
rules that bind program variables. Some rules discard Γ in checking the subterms
inductively, in which case the IH need not be applied at all.

Lemma 2 (Uniform renaming). Let M_x^y be the renaming of program variable x to y (and vice-versa) within M, even when neither x nor y is fresh. If Γ ⊢ M : φ then Γ_x^y ⊢ M_x^y : φ_x^y.

Proof summary. Straightforward induction on the structure of M. Renaming within proof terms (whose definition we omit as it is quite tedious) follows the usual homomorphisms, from which the inductive cases follow. In the case that M is a proof variable z, then (Γ_x^y)(z) = (Γ(z))_x^y, from which the case follows. The interesting cases are those which modify program variables, e.g., ⟨z := f_z^w in p. M⟩. The bound variable z is renamed to z_x^y, while the auxiliary variable w is α-varied if necessary to maintain freshness. Renaming then happens recursively in M.

Substitution will use proofs of coincidence and bound effect lemmas.


Lemma 3 (Coincidence). Only the free variables of an expression influence
its semantics.

Lemma 4 (Bound effect). Only the bound variables of a game are modified
by execution.

Summary. By induction on the expression, in analogy to [43].

Definition 9 (Term substitution admissibility). For simplicity, we say φ_x^f (likewise for context Γ, term f, game α, and proof term M) is admissible if φ binds neither x nor free variables of f.

The latter condition can be relaxed in practice [44] to requiring φ does not
mention x under bindings of free variables.
Lemma 5 (Arithmetic-term substitution). If Γ ⊢ M : φ and the substitutions Γ_x^f, M_x^f, and φ_x^f are admissible, then Γ_x^f ⊢ M_x^f : φ_x^f.

Summary. By induction on M. Admissibility holds recursively, and so can be assumed at each step of the induction. For non-atomic M that bind no variables,
the proof follows from the inductive hypotheses. For M that bind variables, we
appeal to Lemma 3 and Lemma 4.

Just as arithmetic terms are substituted for program variables, proof terms
are substituted for proof variables.
Lemma 6 (Proof term substitution). Let [N/p]M substitute N for p in M, avoiding capture. If Γ, p : ψ ⊢ M : φ and Γ ⊢ N : ψ then Γ ⊢ [N/p]M : φ.

Proof. By induction on M, appealing to renaming, coincidence, and bound effect. When substituting N for p into a term that binds program variables such as ⟨z := f_z^y in q. M⟩, we avoid capture by renaming within occurrences of N in the recursive call, i.e., [N/p]⟨z := f_z^y in q. M⟩ = ⟨z := f_z^y in q. [N_z^y/p]M⟩, preserving soundness by Lemma 2.

Soundness of the proof calculus exploits renaming and substitution.


Theorem 1 (Soundness of proof calculus). If Γ ⊢ M : φ then (Γ ⊢ φ) is valid. As a special case for empty context ·, if · ⊢ M : φ, then φ is valid.

Proof summary. By induction on M. Modus ponens case A B reduces to Lemma 6. Cases that bind program variables, such as assignment, hold by Lemma 5 and
Lemma 2. Rule W is employed when substituting under a binder.

We have now shown that the CGL proof calculus is sound, the sine qua non
condition of any proof system. Because soundness was w.r.t. a realizability se-
mantics, we have shown CGL is constructive in the sense that provable formulas
correspond to realizable strategies, i.e., imperative programs executed in an ad-
versarial environment. We will revisit constructivity again in Section 8 from the
perspective of proof terms as functional programs.

7 Operational Semantics

The Curry-Howard interpretation of games is not complete without exploring the interpretation of proof simplification as normalization of functional programs.
To this end, we now introduce a structural operational semantics for CGL proof
terms. This semantics provides a view complementary to the realizability seman-
tics: not only do provable formulas correspond to realizers, but proof terms can
be directly executed as functional programs, resulting in a normal proof term.
The chief subtlety of our operational semantics is that in contrast to realizer exe-
cution, proof simplification is a static operation, and thus does not inspect game
state. Thus the normal form of a proof which branches on the game state is, of
necessity, also a proof which branches on the game state. This static-dynamic
phase separation need not be mysterious: it is analogous to the monadic phase


separation between a functional program which returns an imperative command
vs. the execution of the returned command. While the primary motivation for
our operational semantics is to complete the Curry-Howard interpretation, proof
normalization is also helpful when implementing software tools which process
proof artifacts, since code that consumes a normal proof is in general easier to
implement than code that consumes an arbitrary proof.
The operational semantics consist of two main judgments: M normal says that M is a normal form, while M → M′ says that M reduces to term M′ in one step of evaluation. A normal proof is allowed a case operation at the top level, either case A of ℓ ⇒ B | r ⇒ C or case∗ A of s ⇒ B | g ⇒ C. Normal proofs M without state-casing are called simple, written M simp. The requirement that cases are top-level ensures that proofs which differ only in where the case was applied share a common normal form, and ensures that β-reduction is never blocked by a case interceding between introduction-elimination pairs. Top-level case analyses are analogous to case-tree normal forms in lambda calculi with coproducts [4]. Reduction of proof terms is eager.
Definition 10 (Normal forms). We say M is simple, written M simp, if eliminators occur only under binders. We say M is normal, written M normal, if M simp or M has shape case A of ℓ ⇒ B | r ⇒ C or case∗ A of s ⇒ B | g ⇒ C where A is a term such as (split [f, g] M) that inspects the state. Subterms B and C need not be normal since they occur under the binding of ℓ or r (resp. s or g).
That is, a normal term has no top-level beta-redexes, and state-dependent
cases are top-level. We consider rules [∗]R, [:∗]I, [?]I, and [:=]I binding. Rules
such as ∗I have multiple premisses but bind only one. While [∗]R does not
introduce a proof variable, it is rather considered binding to prevent divergence,
which is in keeping with a coinductive understanding of formula [α∗ ]φ. If we did
not care whether terms diverge, we could have made [∗]R non-binding.
For the sake of space, this section focuses on the β-rules (Fig. 5). The full
calculus, given in the extended version [12], includes structural and commuting-
conversion rules, as well as what we call monotonicity conversion rules: a proof
term M ◦p N is simplified by structural recursion on M . The capture-avoiding
substitution of M for p in N is written [M/p]N (Lemma 6). The propositional
cases λφβ, λβ, caseβL, caseβR, π1β, and π2β are standard reductions for applications, cases, and projections. Projection terms ⟨π1 M⟩ and ⟨π2 M⟩ should not be confused with projection realizers πL(a) and πR(a). Rule unpackβ makes the witness of an existential available in its client as a ghost variable.
Rules FPβ, repβ, and forβ reduce introductions and eliminations of loops. Rule FPβ, which reduces a proof FP(A, s. B, g. C), says that if α∗ has already terminated according to A, then B proves the postcondition. Else the inductive step C applies, but every reference to the IH g is transformed to a recursive application of FP. If A uses only ⟨∗⟩S and ⟨∗⟩G, then FP(A, s. B, g. C) reduces to a simple term; else if A uses ⟨∗⟩I, then FP(A, s. B, g. C) reduces to a case. Rule repβ says loop induction (M rep p : J. N in O) reduces to a delayed pair
λφβ:    (λp : φ. M) N → [N/p]M
λβ:      (λx : Q. M) f → M_x^f
caseβL:  [case [ℓ · A] of ℓ ⇒ B | r ⇒ C] → [A/ℓ]B
caseβR:  [case [r · A] of ℓ ⇒ B | r ⇒ C] → [A/r]C
π1β:     [π1 [M, N]] → M
π2β:     [π2 [M, N]] → N
unrollβ: [unroll [roll M]] → M
unpackβ: unpack(⟨f_x^y :∗ q. M⟩, py. N) → (Ghost[x = f_x^y](q. [M/p]N))_x^y
FPβ:     FP(D, s. B, g. C) → (case∗ D of s ⇒ B | g ⇒ [(g ◦z FP(z, s. B, g. C))/g]C)
repβ:    (M rep p : J. N in O) → [roll [M, ([M/p]N) ◦q (q rep p : J. N in O)]]
forβ:    for(p : ϕ(M) = A; q. B; C){α} →
           case split [M, 0] of
             ℓ ⇒ ⟨stop [(A, ℓ)/(p, q)]C⟩
           | r ⇒ Ghost[M0 = M](rr. ⟨go (([A, (rr, r)/p, q]B) ◦t (for(p : ϕ(M) = π1 t; q. B; C){α}))⟩)

Fig. 5. Operational semantics: β-rules

of the “stop” and “go” cases, where the “go” case first shows [α]J, for loop invariant J, then expands J → [α∗]φ in the postcondition. Note the laziness of [roll] is essential for normalization: when (M rep p : J. N in O) is understood as a coinductive proof, it is clear that normalization would diverge if repβ were applied indefinitely. Rule forβ for for(p : ϕ(M) = A; q. B; C){α} checks whether the termination metric M has reached terminal value 0. If so, the loop stops and A proves it has converged. Else, we remember M's value in a ghost term M0, and go forward, supplying A and r, rr to satisfy the preconditions of inductive step B, then execute the loop for(p : ϕ(M) = π1 t; q. B; C){α} in the postcondition. Rule forβ reflects the fact that the exact number of iterations is state-dependent.
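For intuition, the propositional core of these β-rules can be phrased as a tiny interpreter. The following is our own miniature sketch with hypothetical constructor names, covering only λφβ, π1β, and π2β:

-- Hypothetical miniature (ours) of the propositional β-rules; reduction is
-- eager at the top level, as in the paper's operational semantics.
data Tm = Var String | Lam String Tm | App Tm Tm
        | Pair Tm Tm | Pi1 Tm | Pi2 Tm
  deriving Show

step :: Tm -> Maybe Tm
step (App (Lam p m) n) = Just (subst p n m)  -- λφβ: (λp : φ. M) N → [N/p]M
step (Pi1 (Pair m _))  = Just m              -- π1β
step (Pi2 (Pair _ n))  = Just n              -- π2β
step _                 = Nothing             -- no top-level redex here

-- Naive substitution [N/p]M, adequate when the argument N is closed.
subst :: String -> Tm -> Tm -> Tm
subst p n t = case t of
  Var q    -> if p == q then n else t
  Lam q m  -> if p == q then t else Lam q (subst p n m)
  App a b  -> App (subst p n a) (subst p n b)
  Pair a b -> Pair (subst p n a) (subst p n b)
  Pi1 a    -> Pi1 (subst p n a)
  Pi2 a    -> Pi2 (subst p n a)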
We discuss the structural, commuting conversion, and monotonicity conversion rules for left injections as an example, with the full calculus in [12]. Structural rule ⟨·⟩S evaluates term M under an injector. Commuting conversion rule [·]C normalizes an injection of a case to a case with injectors on each branch. Monotonicity conversion rule [·]◦ simplifies a monotonicity proof of an injection to an injection of a monotonicity proof.

⟨·⟩S:  M → M′  ⟹  [ℓ · M] → [ℓ · M′]
[·]C:  [ℓ · case A of p ⇒ B | q ⇒ C] → case A of p ⇒ [ℓ · B] | q ⇒ [ℓ · C]
[·]◦:  [ℓ · M] ◦p N → [ℓ · (M ◦p N)]

Fig. 6. Operational semantics: structural, commuting conversion, monotonicity rules

8 Theory: Constructivity
We now complete the study of CGL’s constructivity. We validate the operational
semantics on proof terms by proving that progress and preservation hold, and
thus the CGL proof calculus is sound as a type system for the functional pro-
gramming language of CGL proof terms.
Lemma 7 (Progress). If · ⊢ M : φ, then either M is normal or M → M′ for some M′.
Summary. By induction on the proof term M. If M is an introduction rule, by the inductive hypotheses the subterms are well-typed. If they are all simple, then M simp. If some subterm (not under a binder) steps, then M steps by a structural rule. Else some subterm is an irreducible case expression not under a binder, and it lifts by the commuting conversion rule. If M is an elimination rule, structural and commuting conversion rules are applied as above. Else by Def. 10 the subterm is an introduction rule, and M reduces with a β-rule. Lastly, if M has form A ◦x B and A simp, then by Def. 10 A is an introduction form, thus reduced by some monotonicity conversion rule.

Lemma 8 (Preservation). Let →∗ be the reflexive, transitive closure of the → relation. If · ⊢ M : φ and M →∗ M′, then · ⊢ M′ : φ.

Summary. Induct on the derivation M →∗ M′, then induct on M → M′. The β cases follow by Lemma 6 (for base constructs), and Lemma 6 and Lemma 2 (for assignments). C-rules and ◦-rules lift across binders, soundly by W. S-rules are direct by IH.
We gave two understandings of proofs in CGL, as imperative strategies and
as functional programs. We now give a final perspective: CGL proofs support
synthesis in principle, one of our main motivations. Formally, the Existential
Property (EP) and Disjunction Property (DP) justify synthesis [18] for exis-
tentials and disjunctions: whenever an existential or disjunction has a proof,
then we can compute some instance or disjunct that has a proof. We state and
prove an EP and DP for CGL, then introduce a Strategy Property, their coun-
terpart for synthesizing strategies from game modalities. It is important to our
EP that terms are arbitrary computable functions, because more simplistic term
languages are often too weak to witness the existentials they induce.
Example 1 (Rich terms help). Formulas over polynomial terms can have non-
polynomial witnesses.
Let φ ≡ (x = y ∧ x ≥ 0) ∨ (x = −y ∧ x < 0). Then f = |x| witnesses ∃y : Q φ.
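The witness is nonetheless an ordinary computable term. A quick gloss (ours):

-- The non-polynomial witness from Example 1 as a computable term:
-- y = |x| satisfies (x = y ∧ x ≥ 0) ∨ (x = −y ∧ x < 0).
witnessY :: Rational -> Rational
witnessY = abs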

Lemma 9 (Existential Property). If Γ ⊢ M : (∃x : Q φ) then there exist a term f and a realizer b such that for all (a, ω) ∈ [[⋀Γ]], we have (b a, ω_x^{f(ω)}) ∈ [[φ]].


Proof. By Theorem 1, the sequent (Γ ⊢ ∃x : Q φ) is valid. Since (a, ω) ∈ [[⋀Γ]], then by the definition of sequent validity, there exists a common realizer c such that (c a, ω) ∈ [[∃x : Q φ]]. Now let f = πL(c a) and b = πR(c a) and the result is immediate by the semantics of existentials.
Disjunction strategies can depend on the state, so naïve DP does not hold.

Example 2 (Naïve DP). When Γ ⊢ M : (φ ∨ ψ) there need not be N such that Γ ⊢ N : φ or Γ ⊢ N : ψ. Consider φ ≡ x > 0 and ψ ≡ x < 1. Then · ⊢ split [x, 0] () : (φ ∨ ψ), but neither x < 1 nor x > 0 is valid, let alone provable.
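The disjunct choice is nevertheless computable from the state, as Lemma 10 below makes precise. A hypothetical one-line gloss (ours) of the strategy behind split [x, 0]:

-- Hypothetical gloss (ours): pick the disjunct of x > 0 ∨ x < 1 by the
-- decidable comparison of x with 0; x ≤ 0 implies x < 1, so bit 1 is safe.
whichDisjunct :: Rational -> Int
whichDisjunct x = if x > 0 then 0 else 1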

Lemma 10 (Disjunction Property). When Γ ⊢ M : φ ∨ ψ there exist a realizer b and computable f, s.t. for every ω and a such that (a, ω) ∈ [[⋀Γ]], either f(ω) = 0 and (πL(b), ω) ∈ [[φ]], or else f(ω) = 1 and (πR(b), ω) ∈ [[ψ]].

Proof. By Theorem 1, the sequent Γ ⊢ φ ∨ ψ is valid. Since (a, ω) ∈ [[⋀Γ]], then by the definition of sequent validity, there exists a common realizer c such that (c a, ω) ∈ [[φ ∨ ψ]]. Now let f = πL(c a) and b = πR(c a) and the result is immediate by the semantics of disjunction.

Following the same approach, we generalize to a Strategy Property. In CGL,


strategies are represented by realizers, which implement every computation made
throughout the game. Thus, to show provable games have computable winning
strategies, it suffices to exhibit realizers.
Theorem 2 (Active Strategy Property). If Γ ⊢ M : ⟨α⟩φ, then there exists a realizer b such that for all ω and realizers a such that (a, ω) ∈ [[⋀Γ]], we have {(b a, ω)}⟨α⟩ ⊆ [[φ]] ∪ {⊤}.

Theorem 3 (Dormant Strategy Property). If Γ ⊢ M : [α]φ, then there exists a realizer b such that for all ω and realizers a such that (a, ω) ∈ [[⋀Γ]], we have {(b a, ω)}[[α]] ⊆ [[φ]] ∪ {⊤}.

Summary. From proof term M and Theorem 1, we have a realizer for formula ⟨α⟩φ or [α]φ, respectively. We proceed by induction on α: the realizer b a contains all realizers applied in the inductive cases composed with their continuations that prove φ in each base case.

While these proofs, especially EP and DP, are short and direct, we note that
this is by design: the challenge in developing CGL is not so much the proofs of
this section, rather these proofs become simple because we adopted a realizability
semantics. The challenge was in developing the semantics and adapting the proof
calculus and theory to that semantics.

9 Conclusion and Future Work

In this paper, we developed a Constructive Game Logic CGL, from syntax and re-
alizability semantics to a proof calculus and operational semantics on the proof
terms. We developed two understandings of proofs as programs: semantically,
every proof of game winnability corresponds to a realizer which computes the
game’s winning strategy, while the language of proof terms is also a functional
programming language where proofs reduce to their normal forms according to


the operational semantics. We completed the Curry-Howard interpretation for
games by showing Existential, Disjunction, and Strategy properties: programs
can be synthesized that decide which instance, disjunct, or moves are taken in
existentials, disjunctions, and games. In summary, we have developed the most
comprehensive Curry-Howard interpretation of any program logic to date, for a
much more expressive logic than prior work [32]. Because CGL contains construc-
tive Concurrent DL and first-order DL as strict fragments, we have provided a
comprehensive Curry-Howard interpretation for them in one fell swoop. The key
insights behind CGL should apply to the many dynamic and Hoare logics used
in verification today.
Synthesis is the immediate application of CGL. Motivations for synthesis
include security games [40], concurrent programs with demonic schedulers (Con-
current Dynamic Logic), and control software for safety-critical cyber-physical
systems such as cars and planes. In general, any kind of software program which
must operate correctly in an adversarial environment can benefit from game logic
verification. The proofs of Theorem 2 and Theorem 3 constitute an (on-paper)
algorithm which performs synthesis of guaranteed-correct strategies from game
proofs. The first future work is to implement this algorithm in code, providing
much-needed assurance for software which is often mission-critical or safety-
critical. This paper focused on discrete CGL with one numeric type simply be-
cause any further features would distract from the core features. Real applica-
tions come from many domains which add features around this shared core.
The second future work is to extend CGL to hybrid games, which provide
compelling applications from the domain of adversarial cyber-physical systems.
This future work will combine the novel features of CGL with those of the classical
logic dGL. The primary task is to define a constructive semantics for differential
equations and to give constructive interpretations to the differential equation
rules of dGL. Previous work on formalizations of differential equations [34] sug-
gests differential equations can be treated constructively. In principle, existing
proofs in dGL might happen to be constructive, but this does not obviate the
present work. On the contrary, once a game logic proof is shown to fall in the
constructive fragment, our work gives a correct synthesis guarantee for it too!

References
1. Abramsky, S., Jagadeesan, R., Malacaria, P.: Full abstraction for PCF. Inf. Com-
put. 163(2), 409–470 (2000), https://doi.org/10.1006/inco.2000.2930
2. Aczel, P., Gambino, N.: The generalised type-theoretic interpretation of construc-
tive set theory. J. Symb. Log. 71(1), 67–103 (2006), https://doi.org/10.2178/jsl/
1140641163
3. Alechina, N., Mendler, M., de Paiva, V., Ritter, E.: Categorical and Kripke seman-
tics for constructive S4 modal logic. In: Fribourg, L. (ed.) CSL. LNCS, vol. 2142,
pp. 292–307. Springer (2001), https://doi.org/10.1007/3-540-44802-0_21
4. Altenkirch, T., Dybjer, P., Hofmann, M., Scott, P.J.: Normalization by evaluation
for typed lambda calculus with coproducts. In: LICS. pp. 303–310. IEEE Computer
Society (2001), https://doi.org/10.1109/LICS.2001.932506
5. Alur, R., Henzinger, T.A., Kupferman, O.: Alternating-time temporal logic. J.


ACM 49(5), 672–713 (2002), https://doi.org/10.1145/585265.585270
6. van Benthem, J.: Logic of strategies: What and how? In: van Benthem, J., Ghosh,
S., Verbrugge, R. (eds.) Models of Strategic Reasoning - Logics, Games, and Com-
munities, LNCS, vol. 8972, pp. 321–332. Springer (2015), https://doi.org/10.1007/
978-3-662-48540-8_10
7. van Benthem, J., Bezhanishvili, G.: Modal logics of space. In: Aiello, M., Pratt-
Hartmann, I., van Benthem, J. (eds.) Handbook of Spatial Logics, pp. 217–298.
Springer (2007), https://doi.org/10.1007/978-1-4020-5587-4_5
8. van Benthem, J., Bezhanishvili, N., Enqvist, S.: A propositional dynamic logic for
instantial neighborhood models. In: Baltag, A., Seligman, J., Yamada, T. (eds.)
Logic, Rationality, and Interaction - 6th International Workshop, LORI 2017, Sap-
poro, Japan, September 11-14, 2017, Proceedings. LNCS, vol. 10455, pp. 137–150.
Springer (2017), https://doi.org/10.1007/978-3-662-55665-8_10
9. van Benthem, J., Pacuit, E.: Dynamic logics of evidence-based beliefs. Studia Log-
ica 99(1-3), 61–92 (2011), https://doi.org/10.1007/s11225-011-9347-x
10. van Benthem, J., Pacuit, E., Roy, O.: Toward a theory of play: A logical perspective
on games and interaction. Games (2011), https://doi.org/10.3390/g2010052
11. Bishop, E.: Foundations of constructive analysis (1967)
12. Bohrer, B., Platzer, A.: Constructive hybrid games. CoRR abs/2002.02536
(2020), https://arxiv.org/abs/2002.02536
13. Bridges, D.S., Vita, L.S.: Techniques of constructive analysis. Springer (2007)
14. Celani, S.A.: A fragment of intuitionistic dynamic logic. Fundam. Inform. 46(3),
187–197 (2001), http://content.iospress.com/articles/fundamenta-informaticae/fi46-3-01
15. Chatterjee, K., Henzinger, T.A., Piterman, N.: Strategy logic. In: Caires, L., Vas-
concelos, V.T. (eds.) CONCUR. LNCS, Springer (2007), https://doi.org/10.1007/978-3-540-74407-8_5
16. Coquand, T., Huet, G.P.: The calculus of constructions. Inf. Comput. 76(2/3),
95–120 (1988), https://doi.org/10.1016/0890-5401(88)90005-3
17. Curry, H., Feys, R.: Combinatory logic. In: Heyting, A., Robinson, A. (eds.) Studies
in logic and the foundations of mathematics. North-Holland (1958)
18. Degen, J., Werner, J.: Towards intuitionistic dynamic logic. Logic and Logical
Philosophy 15(4), 305–324 (2006). https://doi.org/10.12775/LLP.2006.018
19. Doberkat, E.: Towards a coalgebraic interpretation of propositional dynamic logic.
CoRR abs/1109.3685 (2011), http://arxiv.org/abs/1109.3685
20. Fernández-Duque, D.: The intuitionistic temporal logic of dynamical systems.
Log. Methods in Computer Science 14(3) (2018), https://doi.org/10.23638/LMCS-14(3:3)2018
21. Frittella, S., Greco, G., Kurz, A., Palmigiano, A., Sikimic, V.: A proof-theoretic
semantic analysis of dynamic epistemic logic. J. Log. Comput. 26(6), 1961–2015
(2016), https://doi.org/10.1093/logcom/exu063
22. Fulton, N., Platzer, A.: A logic of proofs for differential dynamic logic:
Toward independently checkable proof certificates for dynamic logics.
In: Avigad, J., Chlipala, A. (eds.) CPP. pp. 110–121. ACM (2016).
https://doi.org/10.1145/2854065.2854078
23. Ghilardi, S.: Presheaf semantics and independence results for some non-classical
first-order logics. Arch. Math. Log. 29(2), 125–136 (1989), https://doi.org/10.
1007/BF01620621
24. Ghosh, S.: Strategies made explicit in dynamic game logic. Workshop on Logic and
Intelligent Interaction at ESSLLI pp. 74 –81 (2008)
25. Goranko, V.: The basic algebra of game equivalences. Studia Logica 75(2), 221–238
(2003), https://doi.org/10.1023/A:1027311011342
26. Griffin, T.: A formulae-as-types notion of control. In: Allen, F.E. (ed.) POPL. pp.
47–58. ACM Press (1990), https://doi.org/10.1145/96709.96714
27. Harper, R.: The holy trinity (2011), https://web.archive.org/web/20170921012554/http://existentialtype.wordpress.com/2011/03/27/the-holy-trinity/
28. Hilken, B.P., Rydeheard, D.E.: A first order modal logic and its sheaf models
29. Hoare, C.A.R.: An axiomatic basis for computer programming. Commun. ACM
12(10), 576–580 (1969). https://doi.org/10.1145/363235.363259
30. van der Hoek, W., Jamroga, W., Wooldridge, M.J.: A logic for strategic reasoning.
In: Dignum, F., Dignum, V., Koenig, S., Kraus, S., Singh, M.P., Wooldridge, M.J.
(eds.) AAMAS. ACM (2005), https://doi.org/10.1145/1082473.1082497
31. Howard, W.A.: The formulae-as-types notion of construction. To HB Curry: essays
on combinatory logic, lambda calculus and formalism 44, 479–490 (1980)
32. Kamide, N.: Strong normalization of program-indexed lambda calculus. Bull. Sect.
Logic Univ. Łódź 39(1-2), 65–78 (2010)
33. Lipton, J.: Constructive Kripke semantics and realizability. In: Moschovakis,
Y. (ed.) Logic from Computer Science. pp. 319–357. Springer (1992).
https://doi.org/10.1007/978-1-4612-2822-6_13
34. Makarov, E., Spitters, B.: The Picard algorithm for ordinary differential equa-
tions in Coq. In: Blazy, S., Paulin-Mohring, C., Pichardie, D. (eds.) ITP. LNCS,
vol. 7998. Springer (2013), https://doi.org/10.1007/978-3-642-39634-2_34
35. Mamouras, K.: Synthesis of strategies using the Hoare logic of angelic and demonic
nondeterminism. Log. Methods Computer Science 12(3) (2016), https://doi.org/10.2168/LMCS-12(3:6)2016
36. Mitsch, S., Platzer, A.: ModelPlex: Verified runtime validation of verified
cyber-physical system models. Form. Methods Syst. Des. 49(1), 33–74 (2016).
https://doi.org/10.1007/s10703-016-0241-z, special issue of selected papers from
RV’14
37. van Oosten, J.: Realizability: A historical essay. Mathematical Structures in Com-
puter Science 12(3), 239–263 (2002), https://doi.org/10.1017/S0960129502003626
38. Parikh, R.: Propositional game logic. In: FOCS. pp. 195–200. IEEE (1983), https://doi.org/10.1109/SFCS.1983.47
39. Pauly, M.: A modal logic for coalitional power in games. J. Log. Comput. 12(1),
149–166 (2002), https://doi.org/10.1093/logcom/12.1.149
40. Pauly, M., Parikh, R.: Game logic - an overview. Studia Logica 75(2), 165–182
(2003), https://doi.org/10.1023/A:1027354826364
41. Peleg, D.: Concurrent dynamic logic. J. ACM 34(2), 450–479 (1987), https://doi.
org/10.1145/23005.23008
42. Platzer, A.: Differential game logic. ACM Trans. Comput. Log. 17(1), 1:1–1:51
(2015). https://doi.org/10.1145/2817824
43. Platzer, A.: A complete uniform substitution calculus for differential dynamic logic.
J. Autom. Reas. 59(2), 219–265 (2017). https://doi.org/10.1007/s10817-016-9385-
1
44. Platzer, A.: Uniform substitution for differential game logic. In: Galmiche, D.,
Schulz, S., Sebastiani, R. (eds.) IJCAR. LNCS, vol. 10900, pp. 211–227. Springer
(2018). https://doi.org/10.1007/978-3-319-94205-6_15
45. Pratt, V.R.: Semantical considerations on floyd-hoare logic. In: FOCS. pp. 109–121.
IEEE (1976). https://doi.org/10.1109/SFCS.1976.27
46. Ramanujam, R., Simon, S.E.: Dynamic logic on games with structured strategies.
In: Brewka, G., Lang, J. (eds.) Knowledge Representation. pp. 49–58. AAAI Press
(2008), http://www.aaai.org/Library/KR/2008/kr08-006.php
47. Robinson, J.: Definability and decision problems in arithmetic. J. Symb. Log. 14(2),
98–114 (1949), https://doi.org/10.2307/2266510
48. The Coq development team: The Coq proof assistant reference manual (2019),
https://coq.inria.fr/
49. Van Benthem, J.: Games in dynamic-epistemic logic. Bulletin of Economic Re-
search 53(4), 219–248 (2001)
50. Vanderwaart, J., Dreyer, D., Petersen, L., Crary, K., Harper, R., Cheng, P.: Typed
compilation of recursive datatypes. In: Shao, Z., Lee, P. (eds.) Proceedings of
TLDI’03: 2003 ACM SIGPLAN International Workshop on Types in Languages
Design and Implementation, New Orleans, Louisiana, USA, January 18, 2003. pp.
98–108. ACM (2003), https://doi.org/10.1145/604174.604187
51. Weihrauch, K.: Computable Analysis - An Introduction. Texts in Theoretical
Computer Science. An EATCS Series, Springer (2000), https://doi.org/10.1007/
978-ν3-ν642-ν56999-ν9
52. Wijesekera, D.: Constructive modal logics I. Ann. Pure Appl. Logic 50(3), 271–301
(1990), https://doi.org/10.1016/0168-ν0072(90)90059-νB
53. Wijesekera, D., Nerode, A.: Tableaux for constructive concurrent dynamic logic.
Ann. Pure Appl. Logic (2005), https://doi.org/10.1016/j.apal.2004.12.001

Optimal and Perfectly Parallel Algorithms for
On-demand Data-flow Analysis∗

Krishnendu Chatterjee¹, Amir Kafshdar Goharshady¹, Rasmus Ibsen-Jensen², and Andreas Pavlogiannis³

¹ IST Austria, Klosterneuburg, Austria
  [krishnendu.chatterjee, amir.goharshady]@ist.ac.at
² University of Liverpool, Liverpool, United Kingdom
  [email protected]
³ Aarhus University, Aarhus, Denmark
  [email protected]
Abstract. Interprocedural data-flow analyses form an expressive and
useful paradigm of numerous static analysis applications, such as live
variables analysis, alias analysis and null pointers analysis. The most
widely-used framework for interprocedural data-flow analysis is IFDS,
which encompasses distributive data-flow functions over a finite domain.
On-demand data-flow analyses restrict the focus of the analysis on spe-
cific program locations and data facts. This setting provides a natural
split between (i) an offline (or preprocessing) phase, where the program
is partially analyzed and analysis summaries are created, and (ii) an on-
line (or query) phase, where analysis queries arrive on demand and the
summaries are used to speed up answering queries.
In this work, we consider on-demand IFDS analyses where the queries
concern program locations of the same procedure (aka same-context
queries). We exploit the fact that flow graphs of programs have low
treewidth to develop faster algorithms that are space and time optimal
for many common data-flow analyses, in both the preprocessing and the
query phase. We also use treewidth to develop query solutions that are
embarrassingly parallelizable, i.e. the total work for answering each query
is split to a number of threads such that each thread performs only a
constant amount of work. Finally, we implement a static analyzer based
on our algorithms, and perform a series of on-demand analysis experi-
ments on standard benchmarks. Our experimental results show a dras-
tic speed-up of the queries after only a lightweight preprocessing phase,
which significantly outperforms existing techniques.

Keywords: Data-flow analysis, IFDS, Treewidth

∗ The research was partly supported by Austrian Science Fund (FWF) Grant
No. NFN S11407-N23 (RiSE/SHiNE), FWF Schrödinger Grant No. J-4220, Vienna
Science and Technology Fund (WWTF) Project ICT15-003, Facebook PhD Fellowship
Program, IBM PhD Fellowship Program, and DOC Fellowship No. 24956 of the
Austrian Academy of Sciences (ÖAW). A longer version of this work is available at [17].
1 Introduction

Static data-flow analysis. Static program analysis is a fundamental approach
for both analyzing program correctness and performing compiler optimizations
[25,39,44,64,30]. Static data-flow analyses associate with each program location
a set of data-flow facts which are guaranteed to hold under all program ex-
ecutions, and these facts are then used to reason about program correctness,
report erroneous behavior, and optimize program execution. Static data-flow
analyses have numerous applications, such as in pointer analysis (e.g., points-
to analysis and detection of null pointer dereferencing) [46,57,61,62,66,67,69], in
detecting privacy and security issues (e.g., taint analysis, SQL injection analysis)
[3,37,31,33,47,40], as well as in compiler optimizations (e.g., constant propaga-
tion, reaching definitions, register allocation) [50,32,55,13,2].
Interprocedural analysis and the IFDS framework. Data-flow analyses fall in two
large classes: intraprocedural and interprocedural. In the former, each procedure
of the program is analyzed in isolation, ignoring the interaction between proce-
dures which occurs due to parameter passing/return. In the latter, all procedures
of the program are analyzed together, accounting for such interactions, which
leads to results of increased precision, and hence is often preferable to intrapro-
cedural analysis [49,54,59,60]. To filter out false results, interprocedural analyses
typically employ call-context sensitivity, which ensures that the underlying exe-
cution paths respect the calling context of procedure invocations. One of the most
widely used frameworks for interprocedural data-flow analysis is the framework
of Interprocedural Finite Distributive Subset (IFDS) problems [50], which offers
a unified formulation of a wide class of interprocedural data-flow analyses as a
reachability problem. This elegant algorithmic formulation of data-flow analysis
has been a topic of active study, allowing various subsequent practical improve-
ments [36,45,8,3,47,56] and implementations in prominent static analysis tools
such as Soot [7] and WALA [1].
On-demand analysis. Exhaustive data-flow analysis is computationally expensive
and often unnecessary. Hence, a topic of great interest in the community is
that of on-demand data-flow analysis [4,27,36,51,48,68,45]. On-demand analyses
have several applications, such as (quoting from [36,48]) (i) narrowing down the
focus to specific points of interest, (ii) narrowing down the focus to specific
data-flow facts of interest, (iii) reducing work in preliminary phases, (iv) side-
stepping incremental updating problems, and (v) offering demand analysis as a
user-level operation. On-demand analysis is also extremely useful for speculative
optimizations in just-in-time compilers [24,43,5,29], where dynamic information
can dramatically increase the precision of the analysis. In this setting, it is crucial
that the on-demand analysis runs fast, to incur as little overhead as possible.

Example 1. As a toy motivating example, consider the partial program shown in
Figure 1, compiled with a just-in-time compiler that uses speculative optimiza-
tions. Whether the compiler must compile the expensive function h depends on
whether x is null in line 6. Performing a null-pointer analysis from the entry of
1  void f(int b){
2    int *x = NULL, *y = NULL;
3    if (b > 1)
4      y = &b;
5    g(x, y);
6    if (x == NULL)
7      h();
8  }

9  void g(int*& x, int* y){
10   x = y;
11 }

12 void h(){
13   // An expensive
14   // function
15 }

Fig. 1: A partial C++ program.

f reveals that x might be null in line 6. Hence, if the decision to compile h relies
only on an offline static analysis, h is always compiled, even when not needed.
Now consider the case where the execution of the program is in line 4, and
at this point the compiler decides on whether to compile h. It is clear that
given this information, x cannot be null in line 6 and thus h does not have
to be compiled. As we have seen above, this decision can not be made based
on offline analysis. On the other hand, an on-demand analysis starting from the
current program location will correctly conclude that x is not null in line 6. Note
however, that this decision is made by the compiler during runtime. Hence, such
an on-demand analysis is useful only if it can be performed extremely fast. It
is also highly desirable that the time for running this analysis is predictable, so
that the compiler can decide whether to run the analysis or simply compile h
proactively.

The techniques we develop in this paper answer the above challenges rigor-
ously. Our approach exploits a key structural property of flow graphs of pro-
grams, called treewidth.
Treewidth of programs. A very well-studied notion in graph theory is the con-
cept of treewidth of a graph, which is a measure of how similar a graph is to
a tree (a graph has treewidth 1 precisely if it is a tree) [52]. On one hand the
treewidth property provides a mathematically elegant way to study graphs, and
on the other hand there are many classes of graphs which arise in practice and
have constant treewidth. The most important example is that the flow graph
for goto-free programs in many classic programming languages have constant
treewidth [63]. The low treewidth of flow graphs has also been confirmed exper-
imentally for programs written in Java [34], C [38], Ada [12] and Solidity [15].
Treewidth has important algorithmic implications, as many graph problems
that are hard to solve in general admit efficient solutions on graphs of low
treewidth. In the context of program analysis, this property has been exploited to
develop improvements for register allocation [63,9] (a technique implemented in
the Small Device C Compiler [28]), cache management [18], on-demand algebraic
path analysis [16], on-demand intraprocedural data-flow analysis of concurrent
programs [20] and data-dependence analysis [14].
Problem statement. We focus on on-demand data-flow analysis in IFDS [50,36,48].
The input consists of a supergraph G of n vertices, a data-fact domain D and a
data-flow transformer function M . Edges of G capture control-flow within each
procedure, as well as procedure invocations and returns. The set D defines the
domain of the analysis, and contains the data facts to be discovered by the anal-
ysis for each program location. The function M associates with every edge (u, v)
of G a data-flow transformer M (u, v) : 2D → 2D . In words, M (u, v) defines the
set of data facts that hold at v in some execution that transitions from u to v,
given the set of data facts that hold at u.
On-demand analysis brings a natural separation between (i) an offline (or
preprocessing) phase, where the program is partially analyzed, and (ii) an online
(or query) phase, where on-demand queries are handled. The task is to preprocess
the input in the offline phase, so that in the online phase, the following types of
on-demand queries are answered efficiently:
1. A pair query has the form (u, d1 , v, d2 ), where u, v are vertices of G in the
same procedure, and d1 , d2 are data facts. The goal is to decide if there exists
an execution that starts in u and ends in v, and given that the data fact d1
held at the beginning of the execution, the data fact d2 holds at the end.
These are known as same-context queries and are very common in data-flow
analysis [23,50,16].
2. A single-source query has the form (u, d1 ), where u is a vertex of G and d1
is a data fact. The goal is to compute for every vertex v that belongs to the
same procedure as u, all the data facts that might hold in v as witnessed by
executions that start in u and assuming that d1 holds at the beginning of
each such execution.
Previous results. The on-demand analysis problem admits a number of solutions
that lie in the preprocessing/query spectrum. On the one end, the preprocessing
phase can be disregarded, and every on-demand query be treated anew. Since
each query starts a separate instance of IFDS, the time to answer it is O(n·|D|3 ),
for both pair and single-source queries [50]. On the other end, all possible queries
can be pre-computed and cached in the preprocessing phase in time O(n2 · |D|3 ),
after which each query costs time proportional to the size of the output, i.e.
O(1) for pair queries and O(n · |D|) for single-source queries. Note that this full
preprocessing also incurs a cost O(n2 · |D|2 ) in space for storing the cache table,
which is often prohibitive. On-demand analysis was more thoroughly studied
in [36]. The main idea is that, instead of pre-computing the answer to all possible
queries, the analysis results obtained by handling each query are memoized to a
cache table, and are used for speeding up the computation of subsequent queries.
This is a heuristic-based approach that often works well in practice, however,
the only guarantee provided is that of same-worst-case-complexity, which states
that in the worst case, the algorithm uses O(n2 · |D|3 ) time and O(n2 · |D|2 )
space, similarly to the complete preprocessing case. This guarantee is inadequate
for runtime applications such as the example of Figure 1, as it would require
either (i) to run a full analysis, or (ii) to run a partial analysis which might
wrongly conclude that h is reachable, and thus compile it. Both cases incur a
large runtime overhead, either because we run a full analysis, or because we
compile an expensive function.
Our contributions. We develop algorithms for on-demand IFDS analyses that
have strong worst-case time complexity guarantees and thus lead to more pre-
dictable performance than mere heuristics. The contributions of this work are
as follows:
1. We develop an algorithm that, given a program represented as a supergraph
of size n and a data fact domain D, solves the on-demand same-context IFDS
problem while spending (i) O(n · |D|3 ) time in the preprocessing phase, and
(ii) O(|D|/ log n) time for a pair query and O(n · |D|2 / log n) time for a
single-source query in the query phase. Observe that when |D| = O(1), the
preprocessing and query times are proportional to the size of the input and
outputs, respectively, and are thus optimal§. In addition, our algorithm uses
O(n · |D|2 ) space at all times, which is proportional to the size of the input,
and is thus space optimal. Hence, our algorithm not only improves upon
previous state-of-the-art solutions, but also ensures optimality in both time
and space.
2. We also show that after our one-time preprocessing, each query is embar-
rassingly parallelizable, i.e., every bit of the output can be produced by a
single thread in O(1) time. This makes our techniques particularly useful to
speculative optimizations, since the analysis is guaranteed to take constant
time and thus incur little runtime overhead. Although the parallelization of
data-flow analysis has been considered before [41,42,53], this is the first
approach that goes beyond heuristics and offers theoretical guarantees.
Moreover, this is a rather surprising result, given that general IFDS is
known to be P-complete.
3. We implement our algorithms on a static analyzer and experimentally eval-
uate their performance on various static analysis clients over a standard set
of benchmarks. Our experimental results show that after only a lightweight
preprocessing, we obtain a significant speedup in the query phase compared
to standard on-demand techniques in the literature. Also, our parallel im-
plementation achieves a speedup close to the theoretical optimal, which il-
lustrates that the perfect parallelization of the problem is realized by our
approach in practice.
Recently, we exploited the low-treewidth property of programs to obtain
faster algorithms for algebraic path analysis [16] and intraprocedural reachabil-
ity [21]. Data-flow analysis can be reduced to these problems. Hence, the algo-
rithms in [16,21] can also be applied to our setting. However, our new approach
has two important advantages: (i) we show how to answer queries in a perfectly
parallel manner, and (ii) reducing the problem to algebraic path properties and
then applying the algorithms in [16,21] yields O(n · |D|3 ) preprocessing time and
O(n · log n · |D|2 ) space, and has pair and single-source query time O(|D|) and
O(n · |D|2 ). Hence, our space usage and query times are better by a factor of

§ Note that we count the input itself as part of the space usage.
log n¶. Moreover, when considering the complexity wrt n, i.e. considering D to
be a constant, these results are optimal wrt both time and space.
further improvement is possible.
Remark. Note that our approach does not apply to arbitrary CFL reachability
in constant treewidth. In addition to the treewidth, our algorithms also exploit
specific structural properties of IFDS. In general, small treewidth alone does not
improve the complexity of CFL reachability [14].

2 Preliminaries

Model of computation. We consider the standard RAM model with word size
W = Θ(log n), where n is the size of our input. In this model, one can store
W bits in one word (aka “word tricks”) and arithmetic and bitwise operations
between pairs of words can be performed in O(1) time. In practice, word size is
a property of the machine and not the analysis. Modern machines have words
of size at least 64. Since the size of real-world input instances never exceeds 2^64,
the assumption of word size W = Θ(log n) is well-realized in practice and no
additional effort is required by the implementer to account for W in the context
of data flow analysis.
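
To make this concrete, the following minimal C++ sketch (our own illustration,
not part of the framework) packs a set over a universe of size m into ⌈m/64⌉
machine words, so that a set union costs one bitwise OR per word rather than
one operation per element; the word tricks of Section 4.2 are variations of this
packing.

    #include <cstdint>
    #include <vector>

    struct PackedSet {
        std::vector<std::uint64_t> words;
        explicit PackedSet(std::size_t m) : words((m + 63) / 64, 0) {}

        void insert(std::size_t i) {
            words[i / 64] |= std::uint64_t(1) << (i % 64);
        }
        bool contains(std::size_t i) const {
            return (words[i / 64] >> (i % 64)) & 1;
        }
        void unionWith(const PackedSet& other) {  // O(m / W) instead of O(m)
            for (std::size_t w = 0; w < words.size(); ++w)
                words[w] |= other.words[w];
        }
    };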
Graphs. We consider directed graphs G = (V, E) where V is a finite set of
vertices and E ⊆ V × V is a set of directed edges. We use the term graph to
refer to directed graphs and will explicitly mention if a graph is undirected.
For two vertices u, v ∈ V, a path P from u to v is a finite sequence of vertices
P = (w_i)_{i=0}^{k} such that w_0 = u, w_k = v and for every i < k, there is an edge from
w_i to w_{i+1} in E. The length |P| of the path P is equal to k. In particular, for
every vertex u, there is a path of length 0 from u to itself. We write P : u ⇝ v to
denote that P is a path from u to v and u ⇝ v to denote the existence of such a
path, i.e. that v is reachable from u. Given a set V′ ⊆ V of vertices, the induced
subgraph of G on V′ is defined as G[V′] = (V′, E ∩ (V′ × V′)). Finally, the graph
G is called bipartite if the set V can be partitioned into two sets V1, V2, so that
every edge has one end in V1 and the other in V2, i.e. E ⊆ (V1 × V2) ∪ (V2 × V1).

2.1 The IFDS Framework


IFDS [50] is a ubiquitous and general framework for interprocedural data-flow
analyses that have finite domains and distributive flow functions. It encompasses
a wide variety of analyses, including truly-live variables, copy constant propa-
gation, possibly-uninitialized variables, secure information-flow, and gen/kill or
bitvector problems such as reaching definitions, available expressions and live
variables [50,7]. IFDS obtains interprocedurally precise solutions. In contrast to
intraprocedural analysis, in which precise denotes “meet-over-all-paths”, inter-
procedurally precise solutions only consider valid paths, i.e. paths in which when

¶ This improvement is due to the differences in the preprocessing phase. Our
algorithms for the query phase are almost identical to our previous work.
a function reaches its end, control returns back to the site of the most recent
call [58].
Flow graphs and supergraphs. In IFDS, a program with k procedures is specified
by a supergraph, i.e. a graph G = (V, E) consisting of k flow graphs G1 , . . . , Gk ,
one for each procedure, and extra edges modeling procedure-calls. Flow graphs
represent procedures in the usual way, i.e. they contain one vertex vi for each
statement i and there is an edge from vi to vj if the statement j may immediately
follow the statement i in an execution of the procedure. The only exception is
that a procedure-call statement i is represented by two vertices, a call vertex
ci and a return-site vertex ri . The vertex ci only has incoming edges, and the
vertex ri only has outgoing edges. There is also a call-to-return-site edge from
ci to ri . The call-to-return-site edges are included for passing intraprocedural
information, such as information about local variables, from ci to ri . Moreover,
each flow graph Gl has a unique start vertex sl and a unique exit vertex el .
The supergraph G also contains the following edges for each procedure-call i
with call vertex ci and return-site vertex ri that calls a procedure l: (i) an inter-
procedural call-to-start edge from ci to the start vertex of the called procedure,
i.e. sl , and (ii) an interprocedural exit-to-return-site edge from the exit vertex
of the called procedure, i.e. el , to ri .
Example 2. Figure 2 shows a simple C++ program on the left and its supergraph
on the right. Each statement i of the program has a corresponding vertex vi in
the supergraph, except for statement 7, which is a procedure-call statement and
hence has a corresponding call vertex c7 and return-site vertex r7 .

1 void f(int*& x, int* y){
2   y = new int(1);
3   y = new int(2);
4 }

5 int main(){
6   int *x, *y;
7   f(x, y);
8   *x += *y;
9 }

[Figure: the supergraph, with vertices smain = v5, v6, c7, r7, v8, emain = v9 for
main and sf = v1, v2, v3, ef = v4 for f; a call-to-start edge from c7 to sf, an
exit-to-return-site edge from ef to r7, and a call-to-return-site edge from c7 to r7.]

Fig. 2: A C++ program (left) and its supergraph (right).
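
As an illustration of the structures just described, one possible in-memory
encoding of a supergraph is sketched below in C++; the enum and field names are
our own choices and are not prescribed by the IFDS framework.

    #include <vector>

    // Each vertex is tagged with its role and owning procedure; each edge
    // with its kind, so that interprocedural call-to-start and
    // exit-to-return-site edges can be told apart from intraprocedural ones.
    enum class VertexKind { Plain, Call, ReturnSite, Start, Exit };
    enum class EdgeKind { Intraprocedural, CallToReturnSite,
                          CallToStart, ExitToReturnSite };

    struct Vertex { VertexKind kind; int procedure; };
    struct Edge   { int from, to; EdgeKind kind; };

    struct Supergraph {
        std::vector<Vertex> vertices;
        std::vector<Edge>   edges;
    };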

Interprocedurally valid paths. Not every path in the supergraph G can potentially
be realized by an execution of the program. Consider a path P in G and let P′
be the sequence of vertices obtained by removing every vi from P, i.e. P′ only
consists of ci's and ri's. Then, P is called a same-context valid path if P′ can be
generated from S in this grammar:

    S → ci S ri S   for a procedure-call statement i
      | ε

Moreover, P is called an interprocedurally valid path or simply valid if P′ can be
generated from the nonterminal S′ in the following grammar:

    S′ → S′ ci S   for a procedure-call statement i
       | S
For any two vertices u, v of the supergraph G, we denote the set of all interproce-
durally valid paths from u to v by IVP(u, v) and the set of all same-context valid
paths from u to v by SCVP(u, v). Informally, a valid path starts from a statement
in a procedure p of the program and goes through a number of procedure-calls
while respecting the rule that whenever a procedure ends, control should return
to the return-site in its parent procedure. A same-context valid path is a valid
path in which every procedure-call ends and hence control returns back to the
initial procedure p in the same context.
IFDS [50]. An IFDS problem instance is a tuple I = (G, D, F, M, ⊓) where:
– G = (V, E) is a supergraph as above.
– D is a finite set, called the domain, and each d ∈ D is called a data flow fact.
– The meet operator ⊓ is either intersection or union.
– F ⊆ 2^D → 2^D is a set of distributive flow functions over ⊓, i.e. for each
  function f ∈ F and every two sets of facts D1, D2 ⊆ D, we have
  f(D1 ⊓ D2) = f(D1) ⊓ f(D2).
– M : E → F is a map that assigns a distributive flow function to each edge
  of the supergraph.
Let P = (w_i)_{i=0}^{k} be a path in G, e_i = (w_{i−1}, w_i) and m_i = M(e_i). In other
words, the e_i's are the edges appearing in P and the m_i's are their corresponding
distributive flow functions. The path function of P is defined as pf_P := m_k ◦
· · · ◦ m_2 ◦ m_1, where ◦ denotes function composition. The solution of I is the
collection of values {MVP_v}_{v∈V}:

    MVP_v := ⨅_{P ∈ IVP(s_main, v)} pf_P(D).

Intuitively, the solution is defined by taking meet-over-all-valid-paths. If the meet


operator is union, then MVPv is the set of data flow facts that may hold at v,
when v is reached in some execution of the program. Conversely, if the meet
operator is intersection, then MVPv consists of data flow facts that must hold
at v in every execution of the program that reaches v. Similarly, we define the
same-context solution of I as the collection of values {MSCP_v}_{v∈V_main} defined as
follows:

    MSCP_v := ⨅_{P ∈ SCVP(s_main, v)} pf_P(D).    (1)

The intuition behind MSCP is similar to that of MVP, except that in MSCPv
we consider meet-over-same-context-paths (corresponding to runs that return to
the same stack state).
Remark 1. We note two points about the IFDS framework:
– As in [50], we only consider IFDS instances in which the meet operator
is union. Instances with intersection can be reduced to union instances by
dualization [50].
– For brevity, we are considering a global domain D, while in many applica-
tions the domain is procedure-specific. This does not affect the generality of
our approach and our algorithms remain correct for the general case where
each procedure has its own dedicated domain. Indeed, our implementation
supports the general case.

Succinct representations. A distributive function f : 2^D → 2^D can be succinctly
represented by a relation Rf ⊆ (D ∪ {0}) × (D ∪ {0}) defined as:

Rf := {(0, 0)}
∪ {(0, b) | b ∈ f (∅)}
∪ {(a, b) | b ∈ f ({a}) − f (∅)}.

Given that f is distributive over union, we have f ({d1 , . . . , dk }) = f ({d1 })∪· · ·∪
f ({dk }). Hence, to specify f it is sufficient to specify f (∅) and f ({d}) for each
d ∈ D. This is exactly what Rf does. In short, we have: f (∅) = {b ∈ D | (0, b) ∈
Rf } and f ({d}) = f (∅) ∪ {b ∈ D | (d, b) ∈ Rf }. Moreover, we can represent the
relation Rf as a bipartite graph Hf in which each part consists of the vertices
D ∪ {0} and Rf is the set of edges. For brevity, we define D∗ := D ∪ {0}.

[Figure: bipartite graphs over D∗ = {0, a, b}, one for each of the distributive
functions λx.{a, b}, λx.(x − {a}) ∪ {b}, λx.x, λx.x ∪ {a}, and the function
mapping x to {a} if x ≠ ∅ and to ∅ otherwise.]

Fig. 3: Succinct representation of several distributive functions.

Example 3. Let D = {a, b}. Figure 3 provides several examples of bipartite
graphs representing distributive functions.
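
A direct C++ rendering of this representation is sketched below (our own
illustration, not the paper's code): a distributive function is stored as its
relation Rf over D∗, with index 0 playing the role of the special element 0,
and f(X) is recovered from f(∅) and the values f({d}) by distributivity.

    #include <vector>

    // Succinct representation R_f of a distributive f : 2^D -> 2^D.
    // Index 0 is the special element "0"; data facts occupy indices 1..|D|.
    struct SuccinctFn {
        int m;                               // |D*| = |D| + 1
        std::vector<std::vector<bool>> rel;  // rel[a][b] <=> (a, b) in R_f

        explicit SuccinctFn(int domSize)
            : m(domSize + 1), rel(m, std::vector<bool>(m, false)) {
            rel[0][0] = true;                // (0, 0) is always present
        }

        // f(X) for X a subset of D, using f(X) = f(empty) union f({d}) over d in X.
        std::vector<bool> apply(const std::vector<bool>& X) const {
            std::vector<bool> out(m, false);
            for (int b = 1; b < m; ++b)
                if (rel[0][b]) out[b] = true;        // b in f(empty)
            for (int a = 1; a < m; ++a)
                if (X[a])
                    for (int b = 1; b < m; ++b)
                        if (rel[a][b]) out[b] = true; // b in f({a})
            return out;
        }
    };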

Bounded Bandwidth Assumption. Following [50], we assume that the bandwidth
in function calls and returns is bounded by a constant. In other words, there is
a small constant b, such that for every edge e that is a call-to-start or exit-to-
return-site edge, every vertex in the graph representation HM(e) has degree b or
less. This is a classical assumption in IFDS [50,7] and models the fact that every
parameter in a called function depends only on a few variables in the caller
(and conversely, every returned value depends only on a few variables in the
called function).
Optimal and Parallel On-demand Data-flow Analysis 121

Composition of distributive functions. Let f and g be distributive functions and
Rf and Rg their succinct representations. It is easy to verify that g ◦ f is also
distributive, hence it has a succinct representation Rg◦f . Moreover, we have
Rg◦f = Rf ; Rg = {(a, b) | ∃c (a, c) ∈ Rf ∧ (c, b) ∈ Rg }.

[Figure: Hf for f = λx.x ∪ {a} and Hg for g mapping x to {a} if x ≠ ∅ and to ∅
otherwise, with the lower part of Hf contracted with the upper part of Hg (left),
and Hg◦f obtained by computing reachability through the contracted graph (right).]

Fig. 4: Obtaining Hg◦f (right) from Hf and Hg (left).

Example 4. In terms of graphs, to compute Hg◦f, we first take Hf and Hg, then
contract corresponding vertices in the lower part of Hf and the upper part of
Hg, and finally compute reachability from the topmost part to the bottommost
part of the resulting graph. Consider f(x) = x ∪ {a}, g(x) = {a} for x ≠ ∅
and g(∅) = ∅; then g ◦ f(x) = {a} for all x ⊆ D. Figure 4 shows contracting
of corresponding vertices in Hf and Hg (left) and using reachability to obtain
Hg◦f (right).
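
Continuing the sketch from Example 3 (and reusing its hypothetical SuccinctFn
type), composition is then a relational join, mirroring the contract-and-compute-
reachability picture of Figure 4; compose(f, g) below returns the representation
of g ◦ f, assuming f and g share the same domain.

    // R_{g o f} = R_f ; R_g: (a, b) is in the composition iff some c has
    // (a, c) in R_f and (c, b) in R_g.
    SuccinctFn compose(const SuccinctFn& f, const SuccinctFn& g) {
        SuccinctFn h(f.m - 1);
        for (int a = 0; a < f.m; ++a)
            for (int c = 0; c < f.m; ++c)
                if (f.rel[a][c])
                    for (int b = 0; b < f.m; ++b)
                        if (g.rel[c][b]) h.rel[a][b] = true;
        return h;
    }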

Exploded supergraph. Given an IFDS instance I = (G, D, F, M, ∪) with super-
graph G = (V, E), its exploded supergraph Ḡ is obtained by taking |D∗| copies of
each vertex in V, one corresponding to each element of D∗, and replacing each
edge e with the graph representation H_{M(e)} of the flow function M(e). Formally,
Ḡ = (V̄, Ē) where V̄ = V × D∗ and

    Ē = {((u, d1), (v, d2)) | e = (u, v) ∈ E ∧ (d1, d2) ∈ R_{M(e)}}.
A path P̄ in Ḡ is (same-context) valid if the path P in G, obtained by ignoring
the second component of every vertex in P̄, is (same-context) valid. As shown
in [50], for a data flow fact d ∈ D and a vertex v ∈ V, we have d ∈ MVP_v iff
there is a valid path in Ḡ from (s_main, d′) to (v, d) for some d′ ∈ D ∪ {0}. Hence,
the IFDS problem is reduced to reachability by valid paths in Ḡ. Similarly, the
same-context IFDS problem is reduced to reachability by same-context valid
paths in Ḡ.
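
Reusing the hypothetical Supergraph and SuccinctFn types sketched earlier, the
construction of Ḡ can be written as a straightforward loop over the edges of G,
where an exploded vertex (u, d) is flattened to the index u · |D∗| + d:

    #include <utility>
    #include <vector>

    // Sketch: build the edge list of the exploded supergraph. M[e] is the
    // succinct representation of the flow function of the e-th edge of G,
    // and m = |D*|.
    std::vector<std::pair<int, int>> explode(const Supergraph& G,
                                             const std::vector<SuccinctFn>& M,
                                             int m) {
        std::vector<std::pair<int, int>> edges;
        for (std::size_t e = 0; e < G.edges.size(); ++e)
            for (int d1 = 0; d1 < m; ++d1)
                for (int d2 = 0; d2 < m; ++d2)
                    if (M[e].rel[d1][d2])
                        edges.push_back({G.edges[e].from * m + d1,
                                         G.edges[e].to   * m + d2});
        return edges;
    }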
Example 5. Consider a null pointer analysis on the program in Figure 2. At each
program point, we want to know which pointers can potentially be null. We first
model this problem as an IFDS instance. Let D = {x̄, ȳ}, where x̄ is the data
flow fact that x might be null and ȳ is defined similarly. Figure 5 shows the same
program and its exploded supergraph.
At point 8, the values of both pointers x and y are used. Hence, if either of
x or y is null at 8, a null pointer error will be raised. However, as evidenced by
the two valid paths shown in red, both x and y might be null at 8. The pointer
y might be null because it is passed to the function f by value (instead of by
reference) and keeps its local value in the transition from c7 to r7 , hence the
edge ((c7, ȳ), (r7, ȳ)) is in Ḡ. On the other hand, the function f only initializes
y, which is its own local variable, and does not change x (which is shared with
main).

1 void f(int*& x, int* y){
2   y = new int(1);
3   y = new int(2);
4 }

5 int main(){
6   int *x, *y;
7   f(x, y);
8   *x += *y;
9 }

[Figure: the exploded supergraph of the program, containing |D∗| = 3 copies
(0, x̄, ȳ) of each supergraph vertex v5, v6, c7, v1, v2, v3, v4, r7, v8, v9.]

Fig. 5: A Program (left) and its Exploded Supergraph (right).

2.2 Trees and Tree Decompositions

Trees. A rooted tree T = (V_T, E_T) is an undirected graph with a distinguished
"root" vertex r ∈ V_T, in which there is a unique path P_v^u between every pair
{u, v} of vertices. We refer to the number of vertices in V_T as the size of T. For
an arbitrary vertex v ∈ V_T, the depth of v, denoted by d_v, is defined as the length
of the unique path P_v^r : r ⇝ v. The depth or height of T is the maximum depth
among its vertices. A vertex u is called an ancestor of v if u appears in P_v^r. In
this case, v is called a descendant of u. In particular, r is an ancestor of every
vertex and each vertex is both an ancestor and a descendant of itself. We denote
the set of ancestors of v by A_v^↑ and its descendants by D_v^↓. It is straightforward
to see that for every 0 ≤ d ≤ d_v, the vertex v has a unique ancestor with depth
d. We denote this ancestor by a_v^d. The ancestor p_v = a_v^{d_v−1} of v at depth d_v − 1
is called the parent of v and v is a child of p_v. The subtree T_v^↓ corresponding to
v is defined as T[D_v^↓] = (D_v^↓, E_T ∩ 2^{D_v^↓}), i.e. the part of T that consists of v and
its descendants. Finally, a vertex v ∈ V_T is called a leaf if it has no children.
Given two vertices u, v ∈ V_T, the lowest common ancestor lca(u, v) of u and v is
defined as argmax_{w ∈ A_u^↑ ∩ A_v^↑} d_w. In other words, lca(u, v) is the common ancestor
of u and v with maximum depth, i.e. the one farthest from the root.
Lemma 1 ([35]). Given a rooted tree T of size n, there is an algorithm that
preprocesses T in O(n) and can then answer lowest common ancestor queries,
i.e. queries that provide two vertices u and v and ask for lca(u, v), in O(1).
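
For illustration, here is a self-contained C++ sketch of LCA queries via an Euler
tour and a sparse table. Note that this standard construction preprocesses in
O(n log n) rather than the O(n) of Lemma 1; the truly linear-time structure
of [35] is more involved, and this common substitute typically suffices in practice.
All names are ours.

    #include <utility>
    #include <vector>

    struct LCA {
        std::vector<int> euler, depth, first, lg;
        std::vector<std::vector<int>> sparse;  // index of min-depth entry

        // children[u] lists the children of u; n is the number of vertices.
        LCA(const std::vector<std::vector<int>>& children, int root, int n)
            : first(n, -1) {
            dfs(children, root, 0);
            build();
        }

        void dfs(const std::vector<std::vector<int>>& ch, int u, int d) {
            first[u] = (int)euler.size();
            euler.push_back(u); depth.push_back(d);
            for (int v : ch[u]) {
                dfs(ch, v, d + 1);
                euler.push_back(u); depth.push_back(d);  // return to u
            }
        }

        void build() {
            int m = (int)euler.size();
            lg.assign(m + 1, 0);
            for (int i = 2; i <= m; ++i) lg[i] = lg[i / 2] + 1;
            int K = lg[m] + 1;
            sparse.assign(K, std::vector<int>(m));
            for (int i = 0; i < m; ++i) sparse[0][i] = i;
            for (int j = 1; j < K; ++j)
                for (int i = 0; i + (1 << j) <= m; ++i) {
                    int a = sparse[j - 1][i];
                    int b = sparse[j - 1][i + (1 << (j - 1))];
                    sparse[j][i] = (depth[a] <= depth[b]) ? a : b;
                }
        }

        // lca(u, v) = min-depth vertex on the Euler tour between first[u], first[v].
        int query(int u, int v) const {
            int l = first[u], r = first[v];
            if (l > r) std::swap(l, r);
            int j = lg[r - l + 1];
            int a = sparse[j][l], b = sparse[j][r - (1 << j) + 1];
            return euler[(depth[a] <= depth[b]) ? a : b];
        }
    };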

Tree decompositions [52]. Given a graph G = (V, E), a tree decomposition of G
is a rooted tree T = (B, ET) such that:
(i) Each vertex b ∈ B of T has an associated subset V(b) ⊆ V of vertices of
    G and ⋃_{b∈B} V(b) = V. For clarity, we call each vertex of T a "bag" and
    reserve the word vertex for G. Informally, each vertex must appear in some
    bag.
(ii) For all (u, v) ∈ E, there exists a bag b ∈ B such that u, v ∈ V(b), i.e. every
    edge should appear in some bag.
(iii) For any pair of bags bi, bj ∈ B and any bag bk that appears in the path
    P : bi ⇝ bj, we have V(bi) ∩ V(bj) ⊆ V(bk), i.e. each vertex should appear
    in a connected subtree of T.
The width of the tree decomposition T = (B, ET ) is defined as the size of its
largest bag minus 1. The treewidth tw(G) of a graph G is the minimal width
among its tree decompositions. A vertex v ∈ V appears in a connected subtree,
so there is a unique bag b with the smallest possible depth such that v ∈ V (b).
We call b the root bag of v and denote it by rb(v).

[Figure: a graph G over vertices v1, ..., v7 (left) and a tree decomposition T of
width 2 (right), with root bag b1 = {v1, v2, v5}, its child b2 = {v2, v3, v5}, and
the children b3 = {v3, v4, v5} and b4 = {v2, v6, v7} of b2.]

Fig. 6: A Graph G (left) and its Tree Decomposition T (right).

It is well-known that flow graphs of programs typically have small treewidth [63].
For example, programs written in Pascal, C, and Solidity have treewidth at most
3, 6 and 9, respectively. This property has also been confirmed experimentally
for programs written in Java [34], C [38] and Ada [12]. The challenge is thus to
exploit treewidth for faster interprocedural on-demand analyses. The first step
in this approach is to compute tree decompositions of graphs. As the follow-
ing lemma states, tree decompositions of low-treewidth graphs can be computed
efficiently.
Lemma 2 ([11]). Given a graph G with constant treewidth t, a binary tree
decomposition with O(n) bags, height O(log n) and width O(t) can be computed
in linear time.

Separators [26]. The key structural property that we exploit in low-treewidth
flow graphs is a separation property. Let A, B ⊆ V. The pair (A, B) is called a
separation of G if (i) A ∪ B = V, and (ii) no edge connects a vertex in A − B
to a vertex in B − A or vice versa. If (A, B) is a separation, the set A ∩ B is
called a separator. The following lemma states such a separation property for
low-treewidth graphs.

Lemma 3 (Cut Property [26]). Let T = (B, ET) be a tree decomposition
of G = (V, E) and e = {b, b′} ∈ ET. If we remove e, the tree T breaks into
two connected components, T^b and T^{b′}, respectively containing b and b′. Let
A = ⋃_{t ∈ T^b} V(t) and B = ⋃_{t ∈ T^{b′}} V(t). Then (A, B) is a separation of G and its
corresponding separator is A ∩ B = V(b) ∩ V(b′).

Example 6. Figure 6 shows a graph and one of its tree decompositions with width
2. In this example, we have rb(v5 ) = b1 , rb(v3 ) = b2 , rb(v4 ) = b3 , and rb(v7 ) = b4 .
For the separator property of Lemma 3, consider the edge {b2 , b4 }. By removing
it, T breaks into two parts, one containing the vertices A = {v1 , v2 , v3 , v4 , v5 }
and the other containing B = {v2 , v6 , v7 }. We have A ∩ B = {v2 } = V (b2 ) ∩
V (b4 ). Also, any path from B − A = {v6 , v7 } to A − B = {v1 , v3 , v4 , v5 } or vice
versa must pass through {v2 }. Hence, (A, B) is a separation of G with separator
V (b2 ) ∩ V (b4 ) = {v2 }.

3 Problem definition
We consider same-context IFDS problems in which the flow graphs Gi have a
treewidth of at most t for a fixed constant t. We extend the classical notion of
same-context IFDS solution in two ways: (i) we allow arbitrary start points for
the analysis, i.e. we do not limit our analyses to same-context valid paths that
start at smain ; and (ii) instead of a one-shot algorithm, we consider a two-phase
process in which the algorithm first preprocesses the input instance and is then
provided with a series of queries to answer. We formalize these points below. We
fix an IFDS instance I = (G, D, F, M, ∪) with exploded supergraph Ḡ = (V̄, Ē).
Meet over same-context valid paths. We extend the definition of MSCP by spec-
ifying a start vertex u and an initial set Δ of data flow facts that hold at u.
Formally, for any vertex v that is in the same flow graph as u, we define:

    MSCP_{u,Δ,v} := ⨅_{P ∈ SCVP(u,v)} pf_P(Δ).    (2)

The only difference between (2) and (1) is that in (1), the start vertex u is fixed
as smain and the initial data-fact set Δ is fixed as D, while in (2), they are free
to be any vertex/set.
Reduction to reachability. As explained in Section 2.1, computing MSCP is re-
duced to reachability via same-context valid paths in the exploded supergraph
Ḡ. This reduction does not depend on the start vertex and initial data flow facts.
Hence, for a data flow fact d ∈ D, we have d ∈ MSCPu,Δ,v iff in the exploded
supergraph Ḡ the vertex (v, d) is reachable via same-context valid paths from
a vertex (u, δ) for some δ ∈ Δ ∪ {0}. Hence, we define the following types of
queries:
Pair query. A pair query provides two vertices (u, d1) and (v, d2) of the exploded
supergraph Ḡ and asks whether they are reachable by a same-context valid path.
Hence, the answer to a pair query is a single bit. Intuitively, if d2 = 0, then
the query is simply asking if v is reachable from u by a same-context valid
path in G. Otherwise, d2 is a data flow fact and the query is asking whether
d2 ∈ MSCPu,{d1 }∩D,v .
Single-source query. A single-source query provides a vertex (u, d1 ) and asks for
all vertices (v, d2 ) that are reachable from (u, d1 ) by a same-context valid path.
Assuming that u is in the flow graph Gi = (Vi , Ei ), the answer to the single source
query is a sequence of |Vi | · |D∗ | bits, one for each (v, d2 ) ∈ Vi × D∗ , signifying
whether it is reachable by same-context valid paths from (u, d1 ). Intuitively, a
single-source query asks for all pairs (v, d2 ) such that (i) v is reachable from u
by a same-context valid path and (ii) d2 ∈ MSCPu,{d1 }∩D,v ∪ {0}.
Intuition. We note the intuition behind such queries. We observe that since the
functions in F are distributive over ∪, we have MSCPu,Δ,v = ∪δ∈Δ MSCPu,{δ},v ,
hence MSCPu,Δ,v can be computed by O(|Δ|) single-source queries.

4 Treewidth-based Data-flow Analysis


4.1 Preprocessing
The original solution to the IFDS problem, as first presented in [50], reduces
the problem to reachability over a newly constructed graph. We follow a sim-
ilar approach, except that we exploit the low-treewidth property of our flow
graphs at every step. Our preprocessing is described below. It starts with com-
puting constant-width tree decompositions for each of the flow graphs. We then
use standard techniques to make sure that our tree decompositions have a nice
form, i.e. that they are balanced and binary. Then comes a reduction to reacha-
bility, which is similar to [50]. Finally, we precompute specific useful reachability
information between vertices in each bag and its ancestors. As it turns out in
the next section, this information is sufficient for computing reachability between
any pair of vertices, and hence for answering IFDS queries.
Overview. Our preprocessing consists of the following steps:
(1) Finding Tree Decompositions. In this step, we compute a tree decom-
position Ti = (Bi , ETi ) of constant width t for each flow graph Gi . This can
either be done by applying the algorithm of [10] directly on Gi , or by using
an algorithm due to Thorup [63] and parsing the program.
(2) Balancing and Binarizing. In this step, we balance the tree decomposi-
tions Ti using the algorithm of Lemma 2 and make them binary using the
standard process of [22].
(3) LCA Preprocessing. We preprocess the Ti ’s for answering lowest common
ancestor queries using Lemma 1.
(4) Reduction to Reachability. In this step, we modify the exploded super-
graph Ḡ = (V̄, Ē) to obtain a new graph Ĝ = (V̄, Ê), such that for every
pair of vertices (u, d1) and (v, d2), there is a path from (u, d1) to (v, d2) in
Ĝ iff there is a same-context valid path from (u, d1) to (v, d2) in Ḡ. So, this
step reduces the problem of reachability via same-context valid paths in Ḡ
to simple reachability in Ĝ.
(5) Local Preprocessing. In this step, for each pair of vertices (u, d1 ) and
(v, d2 ) for which there exists a bag b such that both u and v appear in b, we
compute and cache whether (u, d1) ⇝ (v, d2) in Ĝ. We write (u, d1) ⇝_local
(v, d2) to denote a reachability established in this step.
(6) Ancestors Reachability Preprocessing. In this step, we compute reach-
ability information between each vertex in a bag and vertices appearing in
its ancestors in the tree decomposition. Concretely, for each pair of vertices
(u, d1) and (v, d2) such that u appears in a bag b and v appears in a bag b′
that is an ancestor of b, we establish and remember whether (u, d1) ⇝ (v, d2)
in Ĝ and whether (v, d2) ⇝ (u, d1) in Ĝ. As above, we use the notations
(u, d1) ⇝_anc (v, d2) and (v, d2) ⇝_anc (u, d1).
Steps (1)–(3) above are standard and well-known processes. We now provide
details of steps (4)–(6). To skip the details and read about the query phase, see
Section 4.3 below.

Step (4): Reduction to Reachability

In this step, our goal is to compute a new graph Ĝ from the exploded supergraph
Ḡ such that there is a path from (u, d1) to (v, d2) in Ĝ iff there is a same-context
valid path from (u, d1) to (v, d2) in Ḡ. The idea behind this step is the same as
that of the tabulation algorithm in [50].
Summary edges. Consider a call vertex cl in G and its corresponding return-site
vertex rl. For d1, d2 ∈ D∗, the edge ((cl, d1), (rl, d2)) is called a summary edge
if there is a same-context valid path from (cl, d1) to (rl, d2) in the exploded
supergraph Ḡ. Intuitively, a summary edge summarizes the effects of procedure
calls (same-context interprocedural paths) on the reachability between cl and
rl. From the definition of summary edges, it is straightforward to verify that the
graph Ĝ obtained from Ḡ by adding every summary edge and removing every
interprocedural edge has the desired property, i.e. a pair of vertices are reachable
in Ĝ iff they are reachable by a same-context valid path in Ḡ. Hence, we first
find all summary edges and then compute Ĝ. This is shown in Algorithm 1.
We now describe what Algorithm 1 does. Let sp be the start point of a
procedure p. A shortcut edge is an edge ((sp, d1), (v, d2)) such that v is in the
same procedure p and there is a same-context valid path from (sp, d1) to (v, d2) in
Ḡ. The algorithm creates an empty graph H = (V̄, E′). Note that H is implicitly
represented by only saving E′. It also creates a queue Q of edges to be added to
H (initially Q = Ē) and an empty set S which will store the summary edges.
The goal is to construct H such that it contains (i) intraprocedural edges of Ḡ,
(ii) summary edges, and (iii) shortcut edges.
It constructs H one edge at a time. While there is an unprocessed intrapro-
cedural edge e = ((u, d1 ), (v, d2 )) in Q, it chooses one such e and adds it to H
(lines 5–10). Then, if (u, d1 ) is reachable from (sp , d3 ) via a same-context valid
Algorithm 1: Computing Ĝ in Step (4)

1  Q ← Ē;
2  S ← ∅;
3  E′ ← ∅;
4  while Q ≠ ∅ do
5    Choose e = ((u, d1), (v, d2)) ∈ Q;
6    Q ← Q − {e};
7    if (u, v) is an interprocedural edge, i.e. a call-to-start or
       exit-to-return-site edge then
8      continue;
9    p ← the procedure s.t. u, v ∈ Vp;
10   E′ ← E′ ∪ {e};
11   foreach d3 s.t. ((sp, d3), (u, d1)) ∈ E′ do
12     if ((sp, d3), (v, d2)) ∉ E′ ∪ Q then
13       Q ← Q ∪ {((sp, d3), (v, d2))};
14   if u = sp and v = ep then
15     foreach (cl, d3) s.t. ((cl, d3), (u, d1)) ∈ Ē do
16       foreach d4 s.t. ((v, d2), (rl, d4)) ∈ Ē do
17         if ((cl, d3), (rl, d4)) ∉ E′ ∪ Q then
18           S ← S ∪ {((cl, d3), (rl, d4))};
19           Q ← Q ∪ {((cl, d3), (rl, d4))};
20 Ĝ ← Ḡ;
21 foreach e = ((u, d1), (v, d2)) ∈ Ē do
22   if u and v are not in the same procedure then
23     Ĝ ← Ĝ − {e};
24 Ĝ ← Ĝ ∪ S;

path, then by adding the edge e, the vertex (v, d2 ) also becomes accessible from
(sp , d3 ). Hence, it adds the shortcut edge ((sp , d3 ), (v, d2 )) to Q, so that it is later
added to the graph H. Moreover, if u is the start sp of the procedure p and v is
its end ep , then for every call vertex cl calling the procedure p and its respective
return-site rl , we can add summary edges that summarize the effect of calling p
(lines 14–19). Finally, lines 20–24 compute Ĝ as discussed above.
Correctness. As argued above, every edge that is added to H is either intrapro-
cedural, a summary edge or a shortcut edge. Moreover, all such edges are added
to H, because H is constructed one edge at a time and every time an edge e
is added to H, all the summary/shortcut edges that might occur as a result
of adding e to H are added to the queue Q and hence later to H. Therefore,
Algorithm 1 correctly computes summary edges and the graph Ĝ.
Complexity. Note that the graph H has at most O(|E| · |D∗ |2 ) edges. Addition
of each edge corresponds to one iteration of the while loop at line 4 of Algo-
rithm 1. Moreover, each iteration takes O(|D∗ |) time, because the loop at line
11 iterates over at most |D∗ | possible values for d3 and the loops at lines 15
and 16 have constantly many iterations due to the bounded bandwidth assump-

tion (Section 2.1). Since |D∗ | = O(|D|) and |E| = O(n), the total runtime of
Algorithm 1 is O(|n| · |D|3 ). For a more detailed analysis, see [50, Appendix].

Step (5): Local Preprocessing
In this step, we compute the set Rlocal of local reachability edges, i.e. edges
of the form ((u, d1 ), (v, d2 )) such that u and v appear in the same bag b of a
tree decomposition Ti and (u, d1) ⇝ (v, d2) in Ĝ. We write (u, d1) ⇝_local (v, d2)
to denote ((u, d1 ), (v, d2 )) ∈ Rlocal . Note that Ĝ has no interprocedural edges.
Hence, we can process each Ti separately. We use a divide-and-conquer technique
similar to the kernelization method used in [22] (Algorithm 2).
Algorithm 2 processes each tree decomposition Ti separately. When process-
ing T , it chooses a leaf bag bl of T and computes all-pairs reachability on the
induced subgraph Hl = Ĝ[V (bl ) × D∗ ], consisting of vertices that appear in bl .
Then, for each pair of vertices (u, d1 ) and (v, d2 ) s.t. u and v appear in bl and
(u, d1) ⇝ (v, d2) in Hl, the algorithm adds the edge ((u, d1), (v, d2)) to both
Rlocal and Ĝ (lines 7–9). Note that this does not change reachability relations in
Ĝ, given that the vertices connected by the new edge were reachable by a path
before adding it. Then, if bl is not the only bag in T , the algorithm recursively
calls itself over the tree decomposition T −bl , i.e. the tree decomposition obtained
by removing bl (lines 10–11). Finally, it repeats the reachability computation on
Hl (lines 12–14). The running time of the algorithm is O(n · |D∗ |3 ).

Algorithm 2: Local Preprocessing in Step (5)

1  Rlocal ← ∅;
2  foreach Ti do
3    computeLocalReachability(Ti);
4  Function computeLocalReachability(T)
5    Choose a leaf bag bl of T;
6    bp ← parent of bl;
7    foreach u, v ∈ V(bl), d1, d2 ∈ D∗ s.t. (u, d1) ⇝ (v, d2) in Ĝ[V(bl) × D∗] do
8      Ĝ ← Ĝ ∪ {((u, d1), (v, d2))};
9      Rlocal ← Rlocal ∪ {((u, d1), (v, d2))};
10   if bp ≠ null then
11     computeLocalReachability(T − bl);
12     foreach u, v ∈ V(bl), d1, d2 ∈ D∗ s.t. (u, d1) ⇝ (v, d2) in Ĝ[V(bl) × D∗] do
13       Ĝ ← Ĝ ∪ {((u, d1), (v, d2))};
14       Rlocal ← Rlocal ∪ {((u, d1), (v, d2))};

Example 7. Consider the graph G and tree decomposition T given in Figure 6
and let D∗ = {0}, i.e. let Ĝ and Ḡ be isomorphic to G. Figure 7 illustrates the
steps taken by Algorithm 2. In each step, a bag is chosen and a local all-pairs
reachability computation is performed over the bag. Local reachability edges are
added to Rlocal and to Ĝ (if they are not already in Ĝ).
We now prove the correctness and establish the complexity of Algorithm 2.
Correctness. We prove that when computeLocalReachability(T ) ends, the set Rlocal
contains all the local reachability edges between vertices that appear in the
same bag in T. The proof is by induction on the size of T. If T consists of a
single bag, then the local reachability computation on Hl (lines 7–9) fills Rlocal
correctly. Now assume that T has n bags. Let H−l = Ĝ[⋃_{bi ∈ T, i ≠ l} V(bi) × D∗].
Intuitively, H−l is the part of Ĝ that corresponds to other bags in T , i.e. every
bag except the leaf bag bl . After the local reachability computation at lines 7–
9, (v, d2) is reachable from (u, d1) in H−l iff it is reachable in Ĝ. This is
because (i) the vertices of Hl and H−l form a separation of Ĝ with separator
(V(bl) ∩ V(bp)) × D∗ (Lemma 3) and (ii) all reachability information in Hl is
now replaced by direct edges (line 8). Hence, by induction hypothesis, line 11
finds all the local reachability edges for T − bl and adds them to both Rlocal and
Ĝ. Therefore, after line 11, for every u, v ∈ V(bl), we have (u, d1) ⇝ (v, d2) in
Hl iff (u, d1) ⇝ (v, d2) in Ĝ. Hence, the final all-pairs reachability computation
of lines 12–14 adds all the local edges in bl to Rlocal .
Complexity. Algorithm 2 performs at most two local all-pair reachability com-
putations over the vertices appearing in each bag, i.e. O(t · |D∗ |) vertices. Each
such computation can be performed in O(t3 · |D∗ |3 ) using standard reachabil-
ity algorithms. Given that the Ti ’s have O(n) bags overall, the total runtime of
Algorithm 2 is O(n · t3 · |D∗ |3 ) = O(n · |D∗ |3 ). Note that the treewidth t is a
constant and hence the factor t3 can be removed.

Step (6): Ancestors Reachability Preprocessing
This step aims to find reachability relations between each vertex of a bag and
vertices that appear in the ancestors of that bag. As in the previous case, we
compute a set Ranc and write (u, d1) ⇝_anc (v, d2) if ((u, d1), (v, d2)) ∈ Ranc.
This step is performed by Algorithm 3. For each bag b and vertex (u, d) such
that u ∈ V(b) and each 0 ≤ j ≤ db, we maintain two sets: F(u, d, b, j) and
F′(u, d, b, j), each containing a set of vertices whose first coordinate is in the
ancestor of b at depth j. Intuitively, the vertices in F(u, d, b, j) are reachable
from (u, d). Conversely, (u, d) is reachable from the vertices in F′(u, d, b, j). At
first all F and F′ sets are initialized as ∅. We process each tree decomposition
Ti in a top-down manner and perform the following actions at each bag:
– If a vertex u appears in both b and its parent bp , then the reachability data
computed for (u, d) at bp can also be used in b. So, the algorithm copies this
data (lines 4–7).
– If (u, d1) ⇝_local (v, d2), then this reachability relation is saved in F and F′
(lines 10–11). Also, any vertex that is reachable from (v, d2) is reachable from
(u, d1), too. So, the algorithm adds F(v, d2, b, j) to F(u, d1, b, j) (line 13). The
converse happens to F′ (line 14).
[Figure: snapshots of Algorithm 2 on the graph and tree decomposition of
Figure 6, processing the leaf bags in the order b4, b3, b2, then the remaining
bag b1, and then revisiting b2, b3 and b4 after the recursion; in each step, the
local reachability edges of the chosen bag are added to the graph.]

Fig. 7: Local Preprocessing (Step 5) on the graph and decomposition of Figure 6.

After the execution of Algorithm 3, we have (v, d2) ∈ F(u, d1, b, j) iff (i) (v, d2)
is reachable from (u, d1) and (ii) u ∈ V(b) and v ∈ V(a_b^j), i.e. v appears in the
ancestor of b at depth j. Conversely, (u, d1) ∈ F′(v, d2, b, j) iff (i) (v, d2) is reach-
able from (u, d1) and (ii) v ∈ V(b) and u ∈ V(a_b^j). Algorithm 3 has a runtime of
O(n · |D|3 · log n). See [17] for detailed proofs. In the next section, we show that
this runtime can be reduced to O(n · |D|3 ) using word tricks.

4.2 Word Tricks

We now show how to reduce the time complexity of Algorithm 3 from O(n ·
|D∗|³ · log n) to O(n · |D∗|³) using word tricks. The idea is to pack the F and F′
sets of Algorithm 3 into words, i.e. represent them by a binary sequence.
Algorithm 3: Ancestors Preprocessing in Step (6)

1  foreach Ti = (Bi, ETi) do
2    foreach b ∈ Bi in top-down order do
3      bp ← parent of b;
4      foreach u ∈ V(b) ∩ V(bp), d ∈ D∗ do
5        foreach 0 ≤ j < db do
6          F(u, d, b, j) ← F(u, d, bp, j);
7          F′(u, d, b, j) ← F′(u, d, bp, j);
8      foreach u, v ∈ V(b), d1, d2 ∈ D∗ do
9        if (u, d1) ⇝_local (v, d2) then
10         F(u, d1, b, db) ← F(u, d1, b, db) ∪ {(v, d2)};
11         F′(v, d2, b, db) ← F′(v, d2, b, db) ∪ {(u, d1)};
12         foreach 0 ≤ j < db do
13           F(u, d1, b, j) ← F(u, d1, b, j) ∪ F(v, d2, b, j);
14           F′(v, d2, b, j) ← F′(v, d2, b, j) ∪ F′(u, d1, b, j);
15 Ranc ← {((u, d1), (v, d2)) | ∃b, j : (v, d2) ∈ F(u, d1, b, j) ∨ (u, d1) ∈ F′(v, d2, b, j)};

Given a bag b, we define δb as the sum of sizes of all ancestors of b. The tree
decompositions are balanced, so b has O(log n) ancestors. Moreover, the width
is t, hence δb = O(t · log n) = O(log n) for every bag b. We perform a top-down
pass of each tree decomposition Ti and compute δb for each b.
For every bag b, u ∈ V (b) and d1 ∈ D∗ , we store F (u, d1 , b, −) as a binary
sequence of length δb ·|D∗ |. The first |V (b)|·|D∗ | bits of this sequence correspond
to F (u, d1 , b, db ). The next |V (bp )| · |D∗ | correspond to F (u, d1 , b, db − 1), and so
on. We use a similar encoding for F′. Using this encoding, Algorithm 3 can be
rewritten by word tricks and bitwise operations as follows:
– Lines 5–6 copy F(u, d, bp, −) into F(u, d, b, −). However, we have to shift and
  align the bits, so these lines can be replaced by

      F(u, d, b, −) ← F(u, d, bp, −) ≫ |V(b)| · |D∗|;

– Line 10 sets a single bit to 1.
– Lines 12–13 perform a union, which can be replaced by the bitwise OR
  operation. Hence, these lines can be replaced by

      F(u, d1, b, −) ← F(u, d1, b, −) OR F(v, d2, b, −);

– Computations on F′ can be handled similarly.


Note that we do not need to compute Ranc explicitly given that our queries
can be written in terms of the F and F′ sets. It is easy to verify that using these
word tricks, every W operations in lines 6, 7, 13 and 14 are replaced by one or
two bitwise operations on words. Hence, the overall runtime of Algorithm 3 is
reduced to O(n · |D∗|³ · log n / W) = O(n · |D∗|³).
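
A minimal C++ sketch of this packed encoding is shown below, assuming the
bag-by-bag bit layout described above; the class and all its names are our own
illustration.

    #include <cstdint>
    #include <vector>

    // F(u, d, b, -) as a packed bit sequence. orWith implements the
    // word-level union of lines 12-13; shiftedCopy implements the
    // copy-from-parent of lines 5-6, offset by the |V(b)| * |D*| bit
    // slots reserved for bag b itself.
    struct AncestorBits {
        std::vector<std::uint64_t> w;
        explicit AncestorBits(std::size_t bits) : w((bits + 63) / 64, 0) {}

        void set(std::size_t i) { w[i / 64] |= std::uint64_t(1) << (i % 64); }

        void orWith(const AncestorBits& o) {  // one OR per word, not per bit
            for (std::size_t i = 0; i < o.w.size() && i < w.size(); ++i)
                w[i] |= o.w[i];
        }

        static AncestorBits shiftedCopy(const AncestorBits& parent,
                                        std::size_t offset, std::size_t bits) {
            AncestorBits r(bits);
            std::size_t ws = offset / 64, bs = offset % 64;
            for (std::size_t i = 0; i < parent.w.size(); ++i) {
                if (i + ws < r.w.size())
                    r.w[i + ws] |= parent.w[i] << bs;
                if (bs != 0 && i + ws + 1 < r.w.size())
                    r.w[i + ws + 1] |= parent.w[i] >> (64 - bs);
            }
            return r;
        }
    };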
4.3 Answering Queries

We now describe how to answer pair and single-source queries using the data
saved in the preprocessing phase.
Answering a Pair Query. Our algorithm answers a pair query from a vertex
(u, d1 ) to a vertex (v, d2 ) as follows:
(i) If u and v are not in the same flow graph, return 0 (no).
(ii) Otherwise, let Gi be the flow graph containing both u and v. Let bu = rb(u)
and bv = rb(v) be the root bags of u and v in Ti and let b = lca(bu , bv ).
(iii) If there exists a vertex w ∈ V(b) and d3 ∈ D∗ such that (u, d1) ⇝_anc (w, d3)
and (w, d3) ⇝_anc (v, d2), return 1 (yes), otherwise return 0 (no).
Correctness. If there is a path P : (u, d1) ⇝ (v, d2), then we claim P must pass
through a vertex (w, d3) with w ∈ V(b). If b = bu or b = bv, the claim is obviously
true. Otherwise, consider the path P′ : bu ⇝ bv in the tree decomposition Ti.
This path passes through b (by definition of b). Let e = {b, b′} be an edge of P′.
Applying the cut property (Lemma 3) to e proves that P must pass through a
vertex (w, d3) with w ∈ V(b′) ∩ V(b). Moreover, b is an ancestor of both bu and
bv, hence we have (u, d1) ⇝_anc (w, d3) and (w, d3) ⇝_anc (v, d2).
Complexity. Computing LCA takes O(1) time. Checking all possible vertices
(w, d3) takes O(t · |D∗|) = O(|D|). This runtime can be decreased to O(|D| / log n)
by word tricks.
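
The following C++ sketch puts steps (i)–(iii) together. The std::function
members stand in for the preprocessed structures: lca for the Lemma 1 structure
of step (3), and ancUp/ancDown for the ⇝_anc relations of step (6); all names are
ours, not the paper's.

    #include <functional>
    #include <vector>

    struct SameContextOracle {
        int m;                                     // |D*|
        std::vector<int> rootBag, procOf;          // per program vertex
        std::vector<std::vector<int>> bagVertices; // vertices of each bag
        std::function<int(int, int)> lca;                 // Lemma 1 structure
        std::function<bool(int, int, int, int)> ancUp;    // (u,d1) ~>_anc (w,d3)?
        std::function<bool(int, int, int, int)> ancDown;  // (w,d3) ~>_anc (v,d2)?

        bool pairQuery(int u, int d1, int v, int d2) const {
            if (procOf[u] != procOf[v]) return false;   // step (i)
            int b = lca(rootBag[u], rootBag[v]);        // step (ii)
            for (int w : bagVertices[b])                // step (iii)
                for (int d3 = 0; d3 < m; ++d3)
                    if (ancUp(u, d1, w, d3) && ancDown(w, d3, v, d2))
                        return true;
            return false;
        }
    };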
Answering a Single-source Query. Consider a single-source query from a vertex
(u, d1) with u ∈ Vi. We can answer this query by performing |Vi| × |D∗| pair
queries, i.e. by performing one pair query from (u, d1) to (v, d2) for each v ∈ Vi
and d2 ∈ D∗. Since |D∗| = O(|D|), the total complexity is O(|Vi| · |D| · ⌈|D| / log n⌉)
for answering a single-source query. Using a more involved preprocessing method,
we can slightly improve this time to O(|Vi| · |D|² / log n). See [17] for more details.
Based on the results above, we now present our main theorem:

Theorem 1. Given an IFDS instance I = (G, D, F, M, ∪), our algorithm pre-
processes I in time O(n · |D|³) and can then answer each pair query and single-
source query in time

    O(|D| / log n)   and   O(n · |D|² / log n),   respectively.

4.4 Parallelizability and Optimality

We now turn our attention to parallel versions of our query algorithms, as well
as cases where the algorithms are optimal.
Parallelizability. Assume we have k threads at our disposal.

1. Given a pair query of the form (u, d1, v, d2), let bu (resp. bv) be the root
bag of u (resp. v), and b = lca(bu, bv) the lowest common ancestor of bu and
bv. We partition the set V(b) × D∗ into k subsets {Ai}1≤i≤k. Then, thread
i handles the set Ai, as follows: for every pair (w, d3) ∈ Ai, the thread sets
the output to 1 (yes) iff (u, d1) ⇝anc (w, d3) and (w, d3) ⇝anc (v, d2).
2. Recall that a single source query (u, d1 ) is answered by breaking it down to
|Vi | × |D∗ | pair queries, where Gi is the flow graph containing u. Since all
such pair queries are independent, we parallelize them among k threads, and
further parallelize each pair query as described above.

With word tricks, parallel pair and single-source queries require O(|D| / (k · log n))
and O(n · |D|² / (k · log n)) time, respectively. Hence, for large enough k, each query
requires only O(1) time, and we achieve perfect parallelism.
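For intuition, here is a minimal C sketch of the partitioning scheme of item 1 above, under the assumption of a generic check(i) that stands for the two-sided ancestry test of the i-th candidate pair in V(b) × D∗; each of the k threads scans its own block of indices.

  #include <stdbool.h>

  /* Thread `tid` (0 <= tid < k) checks its share of `total` candidate
     pairs; `found` is the shared output flag, set on the first witness. */
  void pair_query_worker(int total, int k, int tid,
                         bool (*check)(int), volatile bool *found) {
    int lo = (total * tid) / k, hi = (total * (tid + 1)) / k;
    for (int i = lo; i < hi && !*found; i++)
      if (check(i)) *found = true;   /* set the output to 1 (yes) */
  }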
Optimality. Observe that when |D| = O(1), i.e. when the domain is small, our
algorithm is optimal: the preprocessing runs in O(n), which is proportional to
the size of the input, and the pair query and single-source query run in time
O(1) and O(n/ log n), respectively, each case being proportional to the size of
the output. Small domains arise often in practice, e.g. in dead-code elimination
or null-pointer analysis.

5 Experimental Results

We report on an experimental evaluation of our techniques and compare their


performance to standard alternatives in the literature.
Benchmarks. We used 5 classical data-flow analyses in our experiments, including
reachability (for dead-code elimination), possibly-uninitialized variables analy-
sis, simple uninitialized variables analysis, liveness analysis of the variables, and
reaching-definitions analysis. We followed the specifications in [36] for model-
ing the analyses in IFDS. We used real-world Java programs from the DaCapo
benchmark suite [6], obtained their flow graphs using Soot [65] and applied the
JTDec tool [19] for computing balanced tree decompositions. Given that some
of these benchmarks are prohibitively large, we only considered their main Java
packages, i.e. packages containing the starting point of the programs. We ex-
perimented with a total of 22 benchmarks, which, together with the 5 analyses
above, led to a total of 110 instances. Our instance sizes, i.e. number of vertices
and edges in the exploded supergraph, range from 22 to 190,591. See [17] for
details.
Implementation and comparison. We implemented both variants of our approach,
i.e. sequential and parallel, in C++. We also implemented the parts of the clas-
sical IFDS algorithm [50] and its on-demand variant [36] responsible for same-
context queries. All of our implementations closely follow the pseudocodes of
our algorithms and the ones in [50,36], and no additional optimizations are ap-
plied. We compared the performance of the following algorithms for randomly-
generated queries:

– SEQ. The sequential variant of our algorithm.


– PAR. A variant of our algorithm in which the queries are answered using
perfect parallelization and 12 threads.
– NOPP. The classical same-context IFDS algorithm of [50], with no prepro-
cessing. NOPP performs a complete run of the classic IFDS algorithm for
each query.
– CPP. The classical same-context IFDS algorithm of [50], with complete pre-
processing. In this algorithm, all summary edges and reachability information
are precomputed and the queries are simple table lookups.
– OD. The on-demand same-context IFDS algorithm of [36]. This algorithm
does not preprocess the input. However, it remembers the information ob-
tained in each query and uses it to speed-up the following queries.
For each instance, we randomly generated 10,000 pair queries and 100 single-
source queries. In the case of single-source queries, source vertices were chosen uni-
formly at random. For pair queries, we first chose a source vertex uniformly at
random, and then chose a target vertex in the same procedure, again uniformly
at random.

Experimental setting. The results were obtained on Debian using an Intel Xeon
E5-1650 processor (3.2 GHz, 6 cores, 12 threads) with 128GB of RAM. The
parallel results used all 12 threads.

Time limit. We enforced a preprocessing time limit of 5 minutes per instance.


This is in line with the preprocessing times of state-of-the-art tools on bench-
marks of this size, e.g. Soot takes 2-3 minutes to generate all flow graphs for
each benchmark.

Fig. 8: Preprocessing times of CPP and SEQ/PAR (over all instances). A dot
above the 300s line denotes a timeout.

Results. We found that, except for the smallest instances, our algorithm consis-
tently outperforms all previous approaches. Our results were as follows:

Treewidth. The maximum width amongst the obtained tree decompositions


was 9, while the minimum was 1. Hence, our experiments confirm the results
of [34,19] and show that real-world Java programs have small treewidth.
See [17] for more details.
Preprocessing Time. As shown in Figure 8, our preprocessing is more lightweight
and scalable than CPP. Note that CPP preprocessing times out on 25 of
the 110 instances, starting with instances of size < 50,000, whereas our
approach can comfortably handle instances of size 200,000. Although the
theoretical worst-case complexity of CPP preprocessing is O(n2 · |D|3 ), we
observed that its runtime over our benchmarks grows more slowly. We believe
this is because our benchmark programs generally consist of a large number
of small procedures. Hence, the worst-case behavior of CPP preprocessing,
which happens on instances with large procedures, is not captured by the
DaCapo benchmarks. In contrast, our preprocessing time is O(n · |D|3 ) and
having small or large procedures does not matter to our algorithms. Hence,
we expect that our approach would outperform CPP preprocessing more
significantly on instances containing large functions. However, as Figure 8
demonstrates, our approach is faster even on instances with small procedures.
Query Time. As expected, in terms of pair query time, NOPP is the worst per-
former by a large margin, followed by OD, which is in turn far less
efficient than CPP, PAR and SEQ (Figure 9, top). This illustrates the un-
derlying trade-off between preprocessing and query-time performance. Note
that both CPP and our algorithms (SEQ and PAR) answer each pair query
in O(1). They all have pair-query times of less than a millisecond and are
indistinguishable in this case. The same trade-off appears in single-source
queries as well (Figure 9, bottom). Again, NOPP is the worst performer,
followed by OD. SEQ and CPP have very similar runtimes, except that SEQ
outperforms CPP in some cases, due to word tricks. However, PAR is con-
siderably faster, which leads to the next point.
Parallelization. In Figure 9 (bottom right), we also observe that single-source
queries are handled considerably faster by PAR in comparison with SEQ.
Specifically, using 12 threads, the average single-source query time is re-
duced by a factor of 11.3. Hence, our experimental results achieve near-
perfect parallelism and confirm that our algorithm is well-suited for parallel
architectures.

Note that Figure 9 combines the results of all five mentioned data-flow analy-
ses. However, the observations above hold independently for every single analysis,
as well. See [17] for analysis-specific figures.

Fig. 9: Comparison of pair query time (top row) and single source query time
(bottom row) of the algorithms. Each dot represents one of the 110 instances.
Each row starts with a global picture (left) and zooms into smaller time units
(right) to differentiate between the algorithms. The plots above contain results
over all five analyses. However, our observations hold independently for every
single analysis, as well (See [17]).

6 Conclusion

We developed new techniques for on-demand data-flow analyses in IFDS, by
exploiting the treewidth of flow graphs. Our complexity analysis shows that our
techniques (i) have better worst-case complexity, (ii) offer certain optimality
guarantees, and (iii) are embarrassingly parallelizable. Our experiments demon-
strate these improvements in practice: after a lightweight one-time preprocessing,
queries are answered as fast as with the heavyweight complete preprocessing, and
the parallel speedup is close to its theoretical optimum. The main limitation of our
approach is that it only handles same-context queries. Using treewidth to speed up
non-same-context queries is a challenging direction of future work.

References

1. T. J. Watson libraries for analysis (WALA). https://github.com/wala/WALA (2003)
2. Appel, A.W., Palsberg, J.: Modern Compiler Implementation in Java. Cambridge
University Press, 2nd edn. (2003)
3. Arzt, S., Rasthofer, S., Fritz, C., Bodden, E., Bartel, A., Klein, J., Le Traon, Y.,
Octeau, D., McDaniel, P.: FlowDroid: Precise context, flow, field, object-sensitive
and lifecycle-aware taint analysis for android apps. In: PLDI. pp. 259–269 (2014)
4. Babich, W.A., Jazayeri, M.: The method of attributes for data flow analysis. Acta
Informatica 10(3) (1978)
5. Bebenita, M., Brandner, F., Fahndrich, M., Logozzo, F., Schulte, W., Tillmann, N.,
Venter, H.: Spur: A trace-based JIT compiler for CIL. In: OOPSLA. pp. 708–725
(2010)
6. Blackburn, S.M., Garner, R., Hoffman, C., Khan, A.M., McKinley, K.S., Bentzur,
R., Diwan, A., Feinberg, D., Frampton, D., Guyer, S.Z., Hirzel, M., Hosking, A.,
Jump, M., Lee, H., Moss, J.E.B., Phansalkar, A., Stefanović, D., VanDrunen, T.,
von Dincklage, D., Wiedermann, B.: The DaCapo benchmarks: Java benchmarking
development and analysis. In: OOPSLA. pp. 169–190 (2006)
7. Bodden, E.: Inter-procedural data-flow analysis with IFDS/IDE and soot. In:
SOAP. pp. 3–8 (2012)
8. Bodden, E., Tolêdo, T., Ribeiro, M., Brabrand, C., Borba, P., Mezini, M.: Spllift:
Statically analyzing software product lines in minutes instead of years. In: PLDI.
pp. 355–364 (2013)
9. Bodlaender, H., Gustedt, J., Telle, J.A.: Linear-time register allocation for a fixed
number of registers. In: SODA (1998)
10. Bodlaender, H.L.: A linear-time algorithm for finding tree-decompositions of small
treewidth. SIAM Journal on computing 25(6), 1305–1317 (1996)
11. Bodlaender, H.L., Hagerup, T.: Parallel algorithms with optimal speedup for
bounded treewidth. SIAM Journal on Computing 27(6), 1725–1746 (1998)
12. Burgstaller, B., Blieberger, J., Scholz, B.: On the tree width of ada programs. In:
Ada-Europe. pp. 78–90 (2004)
13. Callahan, D., Cooper, K.D., Kennedy, K., Torczon, L.: Interprocedural constant
propagation. In: CC (1986)
14. Chatterjee, K., Choudhary, B., Pavlogiannis, A.: Optimal dyck reachability for
data-dependence and alias analysis. In: POPL. pp. 30:1–30:30 (2017)
15. Chatterjee, K., Goharshady, A., Goharshady, E.: The treewidth of smart contracts.
In: SAC (2019)
16. Chatterjee, K., Goharshady, A.K., Goyal, P., Ibsen-Jensen, R., Pavlogiannis, A.:
Faster algorithms for dynamic algebraic queries in basic RSMs with constant
treewidth. ACM Transactions on Programming Languages and Systems 41(4),
1–46 (2019)
17. Chatterjee, K., Goharshady, A.K., Ibsen-Jensen, R., Pavlogiannis, A.: Optimal
and perfectly parallel algorithms for on-demand data-flow analysis. arXiv preprint
2001.11070 (2020)
18. Chatterjee, K., Goharshady, A.K., Okati, N., Pavlogiannis, A.: Efficient parame-
terized algorithms for data packing. In: POPL. pp. 1–28 (2019)
19. Chatterjee, K., Goharshady, A.K., Pavlogiannis, A.: JTDec: A tool for tree decom-
positions in soot. In: ATVA. pp. 59–66 (2017)

20. Chatterjee, K., Ibsen-Jensen, R., Goharshady, A.K., Pavlogiannis, A.: Algorithms
for algebraic path properties in concurrent systems of constant treewidth com-
ponents. ACM Transactions on Programming Langauges and Systems 40(3), 9
(2018)
21. Chatterjee, K., Ibsen-Jensen, R., Pavlogiannis, A.: Optimal reachability and a
space-time tradeoff for distance queries in constant-treewidth graphs. In: ESA
(2016)
22. Chaudhuri, S., Zaroliagis, C.D.: Shortest paths in digraphs of small treewidth. part
i: Sequential algorithms. Algorithmica 27(3-4), 212–226 (2000)
23. Chaudhuri, S.: Subcubic algorithms for recursive state machines. In: POPL (2008)
24. Chen, T., Lin, J., Dai, X., Hsu, W.C., Yew, P.C.: Data dependence profiling for
speculative optimizations. In: CC. pp. 57–72 (2004)
25. Cousot, P., Cousot, R.: Static determination of dynamic properties of recursive
procedures. In: IFIP Conference on Formal Description of Programming Concepts
(1977)
26. Cygan, M., Fomin, F.V., Kowalik, Ł., Lokshtanov, D., Marx, D., Pilipczuk, M.,
Pilipczuk, M., Saurabh, S.: Parameterized algorithms, vol. 4 (2015)
27. Duesterwald, E., Gupta, R., Soffa, M.L.: Demand-driven computation of interpro-
cedural data flow. POPL (1995)
28. Dutta, S.: Anatomy of a compiler. Circuit Cellar 121, 30–35 (2000)
29. Flückiger, O., Scherer, G., Yee, M.H., Goel, A., Ahmed, A., Vitek, J.: Correctness
of speculative optimizations with dynamic deoptimization. In: POPL. pp. 49:1–
49:28 (2017)
30. Giegerich, R., Möncke, U., Wilhelm, R.: Invariance of approximate semantics with
respect to program transformations. In: ECI (1981)
31. Gould, C., Su, Z., Devanbu, P.: Jdbc checker: A static analysis tool for SQL/JDBC
applications. In: ICSE. pp. 697–698 (2004)
32. Grove, D., Torczon, L.: Interprocedural constant propagation: A study of jump
function implementation. In: PLDI (1993)
33. Guarnieri, S., Pistoia, M., Tripp, O., Dolby, J., Teilhet, S., Berg, R.: Saving the
world wide web from vulnerable javascript. In: ISSTA. pp. 177–187 (2011)
34. Gustedt, J., Mæhle, O.A., Telle, J.A.: The treewidth of java programs. In:
ALENEX. pp. 86–97 (2002)
35. Harel, D., Tarjan, R.E.: Fast algorithms for finding nearest common ancestors.
SIAM Journal on Computing 13(2), 338–355 (1984)
36. Horwitz, S., Reps, T., Sagiv, M.: Demand interprocedural dataflow analysis. ACM
SIGSOFT Software Engineering Notes (1995)
37. Hovemeyer, D., Pugh, W.: Finding bugs is easy. ACM SIGPLAN Notices 39(12),
92–106 (Dec 2004)
38. Klaus Krause, P., Larisch, L., Salfelder, F.: The tree-width of C. Discrete Applied
Mathematics (2019)
39. Knoop, J., Steffen, B.: The interprocedural coincidence theorem. In: CC (1992)
40. Krüger, S., Späth, J., Ali, K., Bodden, E., Mezini, M.: CrySL: An Extensible
Approach to Validating the Correct Usage of Cryptographic APIs. In: ECOOP.
pp. 10:1–10:27 (2018)
41. Lee, Y.f., Marlowe, T.J., Ryder, B.G.: Performing data flow analysis in parallel.
In: ACM/IEEE Supercomputing. pp. 942–951 (1990)
42. Lee, Y.F., Ryder, B.G.: A comprehensive approach to parallel data flow analysis.
In: ICS. pp. 236–247 (1992)

43. Lin, J., Chen, T., Hsu, W.C., Yew, P.C., Ju, R.D.C., Ngai, T.F., Chan, S.: A com-
piler framework for speculative optimizations. ACM Transactions on Architecture
and Code Optimization 1(3), 247–271 (2004)
44. Muchnick, S.S.: Advanced Compiler Design and Implementation. Morgan Kauf-
mann (1997)
45. Naeem, N.A., Lhoták, O., Rodriguez, J.: Practical extensions to the ifds algorithm.
CC (2010)
46. Nanda, M.G., Sinha, S.: Accurate interprocedural null-dereference analysis for java.
In: ICSE. pp. 133–143 (2009)
47. Rapoport, M., Lhoták, O., Tip, F.: Precise data flow analysis in the presence of
correlated method calls. In: SAS. pp. 54–71 (2015)
48. Reps, T.: Program analysis via graph reachability. ILPS (1997)
49. Reps, T.: Undecidability of context-sensitive data-dependence analysis. ACM
Transactions on Programming Languages and Systems 22(1), 162–186 (2000)
50. Reps, T., Horwitz, S., Sagiv, M.: Precise interprocedural dataflow analysis via
graph reachability. In: POPL. pp. 49–61 (1995)
51. Reps, T.: Demand interprocedural program analysis using logic databases. In: Ap-
plications of Logic Databases, vol. 296 (1995)
52. Robertson, N., Seymour, P.D.: Graph minors. iii. planar tree-width. Journal of
Combinatorial Theory, Series B 36(1), 49–64 (1984)
53. Rodriguez, J., Lhoták, O.: Actor-based parallel dataflow analysis. In: CC. pp. 179–
197 (2011)
54. Rountev, A., Kagan, S., Marlowe, T.: Interprocedural dataflow analysis in the
presence of large libraries. In: CC. pp. 2–16 (2006)
55. Sagiv, M., Reps, T., Horwitz, S.: Precise interprocedural dataflow analysis with
applications to constant propagation. Theoretical Computer Science (1996)
56. Schubert, P.D., Hermann, B., Bodden, E.: PhASAR: An inter-procedural static
analysis framework for C/C++. In: TACAS. pp. 393–410 (2019)
57. Shang, L., Xie, X., Xue, J.: On-demand dynamic summary-based points-to analy-
sis. In: CGO. pp. 264–274 (2012)
58. Sharir, M., Pnueli, A.: Two approaches to interprocedural data flow analysis. In:
Program flow analysis: Theory and applications. Prentice-Hall (1981)
59. Smaragdakis, Y., Bravenboer, M., Lhoták, O.: Pick your contexts well: Under-
standing object-sensitivity. In: POPL. pp. 17–30 (2011)
60. Späth, J., Ali, K., Bodden, E.: Context-, flow-, and field-sensitive data-flow analysis
using synchronized pushdown systems. In: POPL. pp. 48:1–48:29 (2019)
61. Sridharan, M., Bodík, R.: Refinement-based context-sensitive points-to analysis for
Java. ACM SIGPLAN Notices 41(6), 387–400 (2006)
62. Sridharan, M., Gopan, D., Shan, L., Bodík, R.: Demand-driven points-to analysis
for Java. In: OOPSLA. pp. 59–76 (2005)
63. Thorup, M.: All structured programs have small tree width and good register
allocation. Information and Computation 142(2), 159–181 (1998)
64. Torczon, L., Cooper, K.: Engineering a Compiler. Morgan Kaufmann, 2nd edn.
(2011)
65. Vallée-Rai, R., Co, P., Gagnon, E., Hendren, L.J., Lam, P., Sundaresan, V.: Soot
- a Java bytecode optimization framework. In: CASCON. p. 13 (1999)
66. Xu, G., Rountev, A., Sridharan, M.: Scaling cfl-reachability-based points-to anal-
ysis using context-sensitive must-not-alias analysis. In: ECOOP (2009)
67. Yan, D., Xu, G., Rountev, A.: Demand-driven context-sensitive alias analysis for
java. In: ISSTA. pp. 155–165 (2011)

68. Yuan, X., Gupta, R., Melhem, R.: Demand-driven data flow analysis for commu-
nication optimization. Parallel Processing Letters 07(04), 359–370 (1997)
69. Zheng, X., Rugina, R.: Demand-driven alias analysis for c. In: POPL. pp. 197–208
(2008)

Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any
medium or format, as long as you give appropriate credit to the original author(s) and
the source, provide a link to the Creative Commons license and indicate if changes
were made.
The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will need
to obtain permission directly from the copyright holder.
Concise Read-Only Specifications for
Better Synthesis of Programs with Pointers

Andreea Costea¹, Amy Zhu²⋆, Nadia Polikarpova³, and Ilya Sergey⁴,¹
¹ School of Computing, National University of Singapore, Singapore
² University of British Columbia, Vancouver, Canada
³ University of California, San Diego, USA
⁴ Yale-NUS College, Singapore
Abstract. In program synthesis there is a well-known trade-off between
concise and strong specifications: if a specification is too verbose, it might
be harder to write than the program; if it is too weak, the synthesised
program might not match the user’s intent. In this work we explore the
use of annotations for restricting memory access permissions in program
synthesis, and show that they can make specifications much stronger
while remaining surprisingly concise. Specifically, we enhance Synthetic
Separation Logic (SSL), a framework for synthesis of heap-manipulating
programs, with the logical mechanism of read-only borrows.
We observe that this minimalistic and conservative SSL extension bene-
fits the synthesis in several ways, making it more (a) expressive (stronger
correctness guarantees are achieved with a modest annotation overhead),
(b) effective (it produces more concise and easier-to-read programs),
(c) efficient (faster synthesis), and (d) robust (synthesis efficiency is
less affected by the choice of the search heuristic). We explain the in-
tuition and provide formal treatment for read-only borrows. We sub-
stantiate the claims (a)–(d) by describing our quantitative evaluation of
the borrowing-aware synthesis implementation on a series of standard
benchmark specifications for various heap-manipulating programs.

1 Introduction
Deductive program synthesis is a prominent approach to the generation of correct-
by-construction programs from their declarative specifications [14, 23, 29, 33].
With this methodology, one can represent searching for a program satisfying the
user-provided constraints as a proof search in a certain logic. Following this idea,
it has been recently observed [34] that the synthesis of correct-by-construction
imperative heap-manipulating programs (in a language similar to C) can be im-
plemented as a proof search in a version of Separation Logic (SL)—a program
logic designed for modular verification of programs with pointers [32, 37].
SL-based deductive program synthesis based on Synthetic Separation Logic
(SSL) [34] requires the programmer to provide a Hoare-style specification for a
program of interest. For instance, given the predicate ls(x, S), which denotes a
symbolic heap corresponding to a linked list starting at a pointer x, ending with
null, and containing elements from the set S, one can specify the behaviour of
the procedure for copying a linked list as follows:
{r → x ∗ ls(x, S)} listcopy(r) {r → y ∗ ls(x, S) ∗ ls(y, S)} (1)
⋆ Work done during an internship at NUS School of Computing in Summer 2019.

© The Author(s) 2020
P. Müller (Ed.): ESOP 2020, LNCS 12075, pp. 141–168, 2020.
https://doi.org/10.1007/978-3-030-44914-8_6

[Figure: the heap described by the precondition of spec (1): the pointer at address r
stores x, the head of ls(x, S); each node stores its payload (v, v′, . . .) at addresses
x, nxt, . . . and its successor at x + 1, nxt + 1, . . ., with the last node pointing to
null; the suffix starting at nxt forms ls(nxt, S′).]

The precondition of specification (1), defining the shape of the initial heap,
is illustrated by the figure above. It requires the heap to contain a pointer r,
which is taken by the procedure as an argument and whose stored value, x, is the
head pointer of the list to be copied. The list itself is described by the symbolic
heap predicate instance ls(x, S), whose footprint is assumed to be disjoint from
the entry r → x, following the standard semantics of the separating conjunction
operator (∗) [32]. The postcondition asserts that the final heap, in addition to
containing the original list ls(x, S), will contain a new list starting from y whose
contents S are the same as those of the original list, and also that the pointer r will now
point to the head y of the list copy. Our specification is incomplete: it allows, for
example, duplicating or rearranging elements. One hopes that such a program
is unlikely to be synthesised. In synthesis, it is common to provide incomplete
specs: writing complete ones can be as hard as writing the program itself.

1.1 Correct Programs that Do Strange Things


Provided the definition of the heap predicate ls and the specification (1), the
SuSLik tool, an implementation of the SSL-based synthesis [34], will produce the
program depicted in Fig. 1:

  1  void listcopy (loc r) {
  2    let x = *r;
  3    if (x == 0) {
  4    } else {
  5      let v = *x;
  6      let nxt = *(x + 1);
  7      *r = nxt;
  8      listcopy(r);
  9      let y1 = *r;
  10     let y = malloc(2);
  11     *(x + 1) = y1;
  12     *r = y;
  13     *(y + 1) = nxt;
  14     *y = v;
  15   } }

Fig. 1: Result program for spec (1) and the shape of its final heap. [Figure: in the
final heap, the original list ls(x, S) keeps its head node at x but continues with y1,
while the copy ls(y, S), starting at the fresh record y, continues with the original
tail nxt.]

It is easy to check that this program satisfies the ascribed spec (1). Moreover, it
correctly duplicates the original list, faithfully preserving its contents and the
ordering. However, an astute reader might notice a certain oddity in the way it
treats the initial list provided for copying. According to the postcondition of (1),
the value of the pointer r stored in a local immutable variable y1 on line 9 is the
head of the copy of the original list's tail. Quite unexpectedly, the pointer y1
becomes the tail of the original list on line 11, while the original list's tail pointer
nxt, once assigned to *(y + 1) on line 13, becomes the tail of the copy!

Indeed, the exercise in tail swapping is totally pointless: not only does it produce
less “natural” and readable code, but the resulting program's locality properties
are unsatisfactory; for instance, this program cannot be plugged into a concurrent
setting where multiple threads rely on ls(x, S) to be unchanged.
The issue with the result in Fig. 1 is caused by specification (1) being too
permissive: it does not prevent the synthesised program from modifying the
structure of the initial list, while creating its copy. Luckily, the SL community has
devised a number of SL extensions that allow one to impose such restrictions, like
declaring a part of the provided symbolic heap as read-only [5, 8, 9, 11, 15, 20, 21],
i.e., forbidden to be modified by the specified code.

1.2 Towards Simple Read-Only Specifications for Synthesis


The main challenge of introducing read-only annotations (commonly also re-
ferred to as permissions)5 into Separation Logic lies in establishing the disci-
pline for performing sound accounting in the presence of mixed read-only and
mutating heap accesses by different components of a program.
As an example, consider a simple symbolic heap x →[M] f ∗ r →[M] h that declares
two mutable (i.e., allowed to be written to) pointers x and r, that point to
unspecified values f and h, correspondingly. With this symbolic heap, is it safe
to call the following function that modifies the contents of r but not of x?

{x →[RO] f ∗ r →[M] h} readX(x, r) {x →[RO] f ∗ r →[M] f}    (2)

The precondition of readX requires a weaker form of access permission for x
(read-only, RO), while the considered heap asserts a stronger write permission
(M). It should be possible to satisfy readX's requirement by providing the nec-
essary read-only permission for x. To do so, we need to agree on a discipline to
“adapt” the caller's write-permission M to the callee's read-only permission RO.
While seemingly trivial, if implemented naïvely, accounting of RO permissions in
SL might compromise either soundness or completeness of the logical reasoning.
A number of proposals for a logically sound interplay between write- and read-
only access permissions in the presence of function calls have been described in
the literature [7–9, 11, 13, 20, 30]. Some of these works manage to maintain the
simplicity of having only mutable/read-only annotations when confined to the
sequential setting [9, 11, 13]. More general (but harder to implement) approaches
rely on fractional permissions [8, 25], an expressive mechanism for permission ac-
counting, with primary applications in concurrent reasoning [7, 28]. We started
this project by attempting to adapt some of those logics [9, 11, 13] as an extension
of SSL, in order to reap the benefits of read-only annotations for the synthesis
of sequential programs. The main obstacle we encountered involved definitions
of inductive heap predicates with mixed permissions. For instance, how can one
specify a program that modifies the contents of a linked list, but not its struc-
ture? Even though it seemed possible to enable this treatment of predicates via
permission multiplication [25], developing support for this machinery on top of
the existing SuSLik infrastructure was a daunting task. Therefore, we had to look
for a technically simpler solution.

⁵ We will be using the words “annotation” and “permission” interchangeably.

1.3 Our Contributions


Theoretical Contributions. Our main conceptual innovation is the idea of in-
strumenting SSL with symbolic read-only borrows to enable faster and more
predictable program synthesis. Borrows are used to annotate symbolic heaps
in specifications, similarly to abstract fractional permissions from deductive
verification tools such as Chalice and VeriFast [20, 21, 27]. They enable simple
but principled lightweight threading of heap access permissions from the callers
to callees and back, while enforcing read-only access whenever it is required. For
basic intuition on read-only borrows, consider the specification below:

{x →[a] f ∗ y →[b] g ∗ r →[M] h} readXY(x, y, r) {x →[a] f ∗ y →[b] g ∗ r →[M] (f + g)}    (3)
The precondition requires a heap with three pointers, x, y, and r, pointing to
unspecified f, g, and h, correspondingly. Both x and y are going to be treated as
read-only, but now, instead of simply annotating them with RO, we add symbolic
borrowing annotations a and b. The semantics of these borrowing annotations
is the same as that of other ghost variables (such as f). In particular, the callee
must behave correctly for any valuation of a and b, which leaves it no choice
but to treat the corresponding heap fragments as read-only (hence preventing
the heap fragments from being written). On the other hand, from the perspec-
tive of the caller, they serve as formal parameters that are substituted with
actuals of the caller's choosing: for instance, when invoked with a caller's symbolic
heap x →[M] 1 ∗ y →[c] 2 ∗ r →[M] 0 (where c denotes a read-only borrow of the caller),
readXY is guaranteed to “restore” the same access permissions in the postcondi-
tion, as per the substitution [M/a, c/b]. The example above demonstrates that

tion, as per the substitution [M/a, c/b]. The example above demonstrates that
read-only borrows are straightforward to compose when reasoning about code
with function calls. They also make it possible to define borrow-polymorphic
inductive heap predicates, e.g., enhancing ls from spec (1) so it can be used in
specifications with mixed access permissions on their components.6 Finally, read-
only borrows make it almost trivial to adapt the existing SSL-based synthesis
to work with read-only access permissions; they reduce the complex permission
accounting to easy-to-implement permission substitution.
Practical Contributions. Our first practical contribution is ROBoSuSLik—an
enhancement of the SuSLik synthesis tool [34] with support for read-only bor-
rows, which required us to modify less than 100 lines of the original code.
Our second practical contribution is the extensive evaluation of synthesis with
read-only permissions, on a standard benchmark suite of specifications for heap-
manipulating programs. We compare the behaviour, performance, and the out-
comes of the synthesis when run with the standard (“all-mutable”) specifications
and their analogues instrumented with read-only permissions wherever reason-
able. By doing so, we substantiate the following claims regarding the practical
impact of using read-only borrows in SSL specifications:
– First, we show that synthesis of read-only specifications is more efficient: it
does less backtracking while searching for a program that satisfies the imposed
constraints, entailing better performance.
⁶ We will present borrow-polymorphic inductive heap predicates in Sec. 2.4.

– Second, we demonstrate that borrowing-aware synthesis is more effective:


specifications with read-only annotations lead to more concise and human-
readable programs, which do not perform redundant operations.
– Third, we observe that read-only borrows increase expressivity of the synthe-
sis: in most of the cases enhanced specifications provide stronger correctness
guarantees for the results, at almost no additional annotation overhead.
– Finally, we show that read-only borrows make the synthesis more robust: its
results and performance are less likely to be affected by the unification order
or the order of the attempted rule applications during the search.

Paper Outline. We start by showcasing the intricacies and the virtues of SSL-
based synthesis with read-only specifications in Sec. 2. We provide the formal
account of read-only borrows and present the modified SSL rules, along with
the soundness argument in Sec. 3. We report on the implementation and evalu-
ation of the enhanced synthesis in Sec. 4. We conclude with a discussion on the
limitations of read-only borrows in Sec. 5 and compare to related work in Sec. 6.

2 Program Synthesis with Read-Only Borrows


We introduce the enhancement of SSL with read-only borrows by walking the
reader through a series of small but characteristic examples of deductive syn-
thesis with separation logic. We provide the necessary background on SSL in
Sec. 2.1; the readers familiar with the logic may want to skip to Sec. 2.2.
2.1 Basics of SSL-based Deductive Program Synthesis
In a deductive Separation Logic-based synthesis, a client provides a specifica-
tion of a function of interest as a pair of pre- and post-conditions, such as
{P} void foo(loc x, int i) {Q}. The precondition P constrains the symbolic
state necessary to run the function safely (i.e., without crashes), while the post-
condition Q constrains the resulting state at the end of the function’s execution.
A function body c satisfying the provided specification is obtained as a result of
deriving the SSL statement, representing the synthesis goal:

{x, i}; {P} ⤳ {Q} | c
In the statement above, x and i are program variables, and they are explicitly
stated in the environment Γ = {x, i}. Variables that appear in {P} and that are
not program variables are called (logical) ghost variables, while the non-program
variables that only appear in {Q} are referred to as (logical) existential ones (EV).
The meaning of the statement Γ; {P} ⤳ {Q} | c is the validity of the Hoare-style
triple {P} c {Q} for all possible values of variables from Γ .7 Both pre- and
postcondition contain a spatial part describing the shape of the symbolic state
(spatial formulae are ranged over via P, Q, and R), and a pure part (ranged over
via φ, ψ, and ξ), which states the relations between variables (both program
and logical). A derivation of an SSL statement is conducted by applying logical
⁷ We often care only about the existence of a program c to be synthesised, not its
specific shape. In those cases we will be using a shorter statement: Γ; {P} ⤳ {Q}.

rules, which reduce the initial goal to a trivial one, so it can be solved by one of
the terminal rules, such as, e.g., the rule Emp shown below:

               φ ⇒ ψ
  ─────────────────────────────── Emp
  Γ; {φ; emp} ⤳ {ψ; emp} | skip

That is, Emp requires (i) that symbolic heaps in both pre- and post-conditions
are empty and (ii) that the pure part φ of the precondition implies the pure
part ψ of the postcondition. As a result, Emp “emits” a trivial program skip.
Some of the SSL rules are aimed at simplifying the goal, bringing it to the shape
that can be solved with Emp. For instance, consider the following rules:
  EV(Γ, P, Q) ∩ Vars(R) = ∅    Γ; {φ; P} ⤳ {ψ; Q} | c
  ──────────────────────────────────────────────── Frame
  Γ; {φ; P ∗ R} ⤳ {ψ; Q ∗ R} | c

  [σ]R′ = R    ∅ ≠ dom(σ) ⊆ EV(Γ, P, Q)    Γ; {φ; P ∗ R} ⤳ [σ]{ψ; Q ∗ R′} | c
  ──────────────────────────────────────────────────────────── UnifyHeaps
  Γ; {φ; P ∗ R} ⤳ {ψ; Q ∗ R′} | c

Neither of the rules Frame and UnifyHeaps “adds” to the program c being
synthesised. However, Frame reduces the goal by removing a matching part R
(a.k.a. frame) from both the pre- and the post-condition. UnifyHeaps non-
deterministically picks a substitution σ, which replaces existential variables in a
sub-heap R′ of the postcondition to match the corresponding symbolic heap R in
the precondition. Both of these rules make choices with regard to what frame R
to remove or which substitution σ to adopt—a point that will be of importance
for the development described in Sec. 2.2.
Finally, the following (simplified) rule for producing a write command is oper-
ational, as it emits a part of the program to be synthesised, while also modifying
the goal accordingly. The resulting program will, thus, consist of the emitted
store ∗x = e of an expression e to the pointer variable x. The remainder is syn-
thesised by solving the sub-goal produced by applying the Write rule.
  Vars(e) ⊆ Γ    e ≠ e′    Γ; {φ; x → e ∗ P} ⤳ {ψ; x → e ∗ Q} | c
  ──────────────────────────────────────────────── Write
  Γ; {φ; x → e′ ∗ P} ⤳ {ψ; x → e ∗ Q} | ∗x = e; c


As it is common with proof search, should no rule apply to an intermediate


goal within one of the derivations, the deductive synthesis back-tracks, possibly
discarding a partially synthesised program fragment, trying alternative deriva-
tion branches. For instance, firing UnifyHeaps to unify wrong sub-heaps might
lead the search down a path to an unsatisfiable goal, eventually making the
synthesis back-track and leading to a longer search. Consider also a misguided
application of Write to a certain location, which can cause the synthesiser to
generate a less intuitive program that “makes up” for the earlier spurious writes.
This is precisely what we are going to fix by introducing read-only annotations.
2.2 Reducing Non-Determinism with Read-Only Annotations
Consider the following example adapted from the original SSL paper [34]. While
the example is intentionally artificial, it captures a frequent synthesis scenario—
non-determinism during synthesis. This specification allows a certain degree of
freedom in how it can be satisfied:

{x → 239 ∗ y → 30} void pick(loc x, loc y) {z ≤ 100; x → z ∗ y → z} (4)


It seems logical for the synthesis to start the program derivation by applying
the rule UnifyHeaps, thus reducing the initial goal to the one of the form
{x, y}; {x → 239 ∗ y → 30} ⤳ {239 ≤ 100; x → 239 ∗ y → 239}
This new goal has been obtained by picking one particular substitution σ =
[239/z] (out of multiple possible ones), which delivers two identical heaplets of the
form x → 239 in pre- and postcondition. It is time for the Write rule to strike to
fix the discrepancy between the symbolic heap in the pre- and postcondition by
emitting the command ∗y = 239 (at last, some executable code!), and resulting in
the following new goal (notice the change of y-related entry in the precondition):
{x, y}; {x → 239 ∗ y → 239} ⤳ {239 ≤ 100; x → 239 ∗ y → 239}
What follows are two applications of the Frame rule to the common symbolic
heaps, leading to the goal: {x, y}; {emp} ⤳ {239 ≤ 100; emp}. At this point, we
are clearly in trouble. The pure part of the precondition is simply true, while the
postcondition’s pure part is 239 ≤ 100, which is unsolvable.
It turns out that our initial pick of the substitution σ = [239/z] was an unfor-
tunate one, and we should discard the series of rule applications that followed
it, back-track and adopt a different substitution, e.g., σ′ = [30/z], which will
indeed result in solving our initial goal.⁸
Let us now consider the same specification for pick that has been enhanced
by explicitly annotating parts of the symbolic heap as mutable and read-only:

{x →[M] 239 ∗ y →[RO] 30} void pick(loc x, loc y) {z ≤ 100; x →[M] z ∗ y →[RO] z}    (5)
In this version of SSL, the effect of rules such as Emp, Frame, and UnifyHeaps
remains the same, while operational rules such as Write become annotation-
aware. Specifically, the rule Write is now replaced by the following one:
   

  Vars(e) ⊆ Γ    e ≠ e′    Γ; {φ; x →[M] e ∗ P} ⤳ {ψ; x →[M] e ∗ Q} | c
  ────────────────────────────────────────────────── WriteRO
  Γ; {φ; x →[M] e′ ∗ P} ⤳ {ψ; x →[M] e ∗ Q} | ∗x = e; c

Notice how in the rule above the heaplets of the form x →[M] e are now anno-
tated with the access permission M, which explicitly indicates that the code may
modify the corresponding heap location.
Continuing with the example specification (5), we can imagine a similar scenario
in which the rule UnifyHeaps picks the substitution σ = [239/z]. Should this be
the case, the next application of the rule WriteRO will not be possible, due to
the read-only annotation RO on the heaplet y →[RO] 239 in the resulting sub-goal:

{x, y}; {x →[M] 239 ∗ y →[RO] 30} ⤳ {239 ≤ 100; x →[M] 239 ∗ y →[RO] 239}

As the RO access permission prevents the synthesised code from modifying the
heaplet y →[RO] 239, the synthesis search is forced to back-track, picking an alterna-
tive substitution σ′ = [30/z] and converging on the desirable program ∗x = 30.
⁸ One might argue that it was possible to detect the unsolvable conjunct 239 ≤ 100 in
the postcondition immediately after performing the substitution, thus sparing the need
to proceed with this derivation further. This is, indeed, a possibility, but it is hard to
predict which rule-application heuristics will work better in general. We defer the
quantitative argument on this matter until Sec. 4.4.

2.3 Composing Read-Only Borrows


Having synthesised the pick function from specification (5), we would like to
use it in future programs. For example, imagine that at some point, while syn-
thesising another program, we see the following as an intermediate goal:
   
{u, v}; {u →[M] 239 ∗ v →[M] 30 ∗ P} ⤳ {w ≤ 200; u →[M] w ∗ v →[M] w ∗ Q}    (6)
It is clear that, modulo the names of the variables, we can synthesise a part of
the desired program by emitting a call pick(u, v), which we can then reduce to
the goal {u, v}; {P} ⤳ {w ≤ 200; Q} via an application of Frame.
Why is emitting such a call to pick() safe? Intuitively, this can be done because
the precondition of the spec (5) is weaker than the one in the goal (6). Indeed, the
precondition of the latter provides the full (mutable) access permission on the
heap portion v →[M] 30, while the pre/postcondition of the former requires a weaker
form of access, namely read-only: y →[RO] 30. Therefore, our logical foundations
should allow temporary “downgrading” of an access permission, e.g., from M to
RO, for the sake of synthesising calls. While allowing this is straightforward and
can be done similarly to up-casting a type in languages like Java, what turns out
to be less trivial is making sure that the caller's initial stronger access permission
(M) is restored once pick(u, v) returns.
Non-solutions. Perhaps the simplest way to allow the call to a function with a
weaker (in terms of access permissions) specification would be to (a) downgrade
the caller's permissions on the corresponding heap fragments to RO, and (b)
recover the permissions as per the callee's specification. This approach signif-
icantly reduces the expressivity of the logic (and, as a consequence, the complete-
ness of the synthesis). For instance, adopting this strategy for using specifica-
tion (5) in the goal (6) would result in the unsolvable sub-goal of the form
{u, v}; {u →[M] 30 ∗ v →[RO] 30 ∗ P} ⤳ {u →[M] 30 ∗ v →[M] 30 ∗ Q}. This is due to the fact that
the postcondition requires the heaplet v →[M] 30 to have the write-permission M,
while the new precondition only provides the RO-access.


Another way to cater for a weaker callee’s specification would be to “chip
out” a RO-permission from a caller’s M-annotation (in the spirit of fractional
permissions), offer it to the callee, and then “merge” it back to the caller’s full-
blown permission upon return. This solution works for simple examples, but not
for heap predicates with mixed permissions (discussion in Sec. 6). Yet another
approach would be to create a “RO clone” of the caller’s M-annotation, introduc-
ing an axiom of the form x → t  x → t ∗ x → t. The created component x → t
M M RO RO

could be provided to the callee and discarded upon return since the caller re-
tained the full permission of the original heap. Several works on RO permissions
have adopted this approach [9, 11, 13]. While discarding such clones works just
fine for sequential program verification, in the case of synthesis guided by pre-
and postconditions, incomplete postconditions could lead to intractable goals.
Our solution. The key to gaining the necessary expressivity wrt. passing/return-
ing access permissions, while maintaining a sound yet simple logic, is treating
access permissions as first-class values. A natural consequence of this treatment
is that immutability annotations can be symbolic (i.e., variables of a special sort
“permission”), and the semantics of such variables is well understood; we refer to
these symbolic annotations as read-only borrows.⁹ For instance, using borrows,
we can represent the specification (5) as an equivalent one:

{x →[M] 239 ∗ y →[a] 30} void pick(loc x, loc y) {z ≤ 100; x →[M] z ∗ y →[a] z}    (7)

The only substantial difference with spec (5) is that now the pointer y's access
permission is given an explicit name a. Such named annotations (a.k.a. borrows)
are treated as RO by the callee, as long as the pure precondition does not con-
strain them to be mutable. However, giving these permissions names achieves
an important goal: performing accurate accounting while composing specifica-
tions with different access permissions. Specifically, we can now emit a call to
pick(u, v) as specified by (7) from the goal (6), keeping in mind the substitution
σ = [u/x, v/y, M/a]. This call now accounts for borrows as well, and makes it
straightforward to restore v's original permission M upon returning.

Following the same idea, borrows can be naturally composed through capture-
avoiding substitutions. For instance, the same specification (7) of pick could be
used to advance the following modified version of the goal (6):

{u, v}; {u →[M] 239 ∗ v →[c] 30 ∗ P} ⤳ {w ≤ 210; u →[M] w ∗ v →[c] w ∗ Q}

by means of taking the substitution σ′ = [u/x, v/y, c/a].


2.4 Borrow-Polymorphic Inductive Predicates
Separation Logic owes its glory to the extensive use of inductive heap predicates—
a compact way to capture the shape and the properties of finite heap fragments
corresponding to recursive linked data structures. Below we provide one of the
most widely-used SL predicates, defining the shape of a heap containing a null-
terminated singly-linked list with elements from a set S:
ls(x, S) ≜ x = 0 ∧ {S = ∅; emp}
         | x ≠ 0 ∧ {S = {v} ∪ S1; [x, 2] ∗ x → v ∗ ⟨x, 1⟩ → nxt ∗ ls(nxt, S1)}    (8)
The predicate contains two clauses describing the corresponding cases of the
list’s shape depending on the value of the head pointer x. If x is zero, the list’s
heap representation is empty, and so is the set of elements S. Alternatively, if x
is not zero, it stores a record with two items (indicated by the block assertion
[x, 2]), such that the payload pointer x contains the value v (where S = {v} ∪ S1
for some set S1), and the pointer corresponding to x + 1 (denoted as ⟨x, 1⟩)
contains the address of the list's tail, nxt.
contains the address of the list’s tail, nxt.
While expressive enough to specify and enable synthesis of various list-traversing
and list-generating recursive functions via SSL, the definition (8) does not allow
one to restrict the access permissions to different components of the list: all of
the involved memory locations can be mutated (which explains the synthesis
issue we described in Sec. 1.1). To remedy this weakness of the traditional SL-
style predicates, we propose to parameterise them with read-only borrows, thus
making them aware of different access permissions to their various components.
For instance, we propose to redefine the linked list predicate as follows:
⁹ In this regard, our symbolic borrows are very similar to abstract fractional permis-
sions in Chalice and VeriFast [21, 27]. We discuss the relation in detail in Sec. 6.

ls(x, S, a, b, c) ≜ x = 0 ∧ {S = ∅; emp}
                  | x ≠ 0 ∧ {S = {v} ∪ S1; [x, 2]^a ∗ x →[b] v ∗ ⟨x, 1⟩ →[c] nxt ∗ ls(nxt, S1, a, b, c)}    (9)
The new definition (9) is similar to the old one (8), but now, in addition to
the standard predicate parameters (i.e., the head pointer x and the set S in this
case), also features three borrow parameters a, b, and c that stand as place-
holders for the access permissions to some particular components of the list.
Specifically, the symbolic borrows b and c control the permissions to manipulate
the pointers x and x + 1, correspondingly. The borrow a, modifying a block-
type heaplet, determines whether the record starting at x can be deallocated
with free(x). All the three borrows are passed in the same configuration to the
recursive instance of the predicate, thereby imposing the same constraints on
the rest of the corresponding list components.
Let us see the borrow-polymorphic inductive predicates in action. Consider
the following specification that asks for a function taking a list of arbitrary values
and replacing all of them with zeroes:10
{ls(x, S, d, M, e)} void reset(loc x) {ls(x, O, d, M, e)} (10)
The spec (10) gives very little freedom to the function that would satisfy it
with regard to permissions to manipulate the contents of the heap, constrained
by the predicate ls(x, S, d, M, e). As the first and the third borrow parameters are
instantiated with read-only borrows (d and e), the desired function is not going
to be able to change the structural pointers or deallocate parts of the list. The
only allowed manipulation is, thus, changing the values of the payload pointers.
This concise specification is pleasantly strong. To wit, in plain SSL, a similar
spec (without read-only annotations) would also admit an implementation that
fully deallocates the list or arbitrarily changes its length. In order to avoid these
outcomes, one would, therefore, need to provide an alternative definition of the
predicate ls, which would incorporate the length property too.
Imagine now that one would like to use the implementation of reset satisfy-
ing specification (10) to generate a function with the following spec, providing
stronger access permissions for the list components:
{ls(y, S, M, M, M)} void call_reset(loc y) {ls(y, O, M, M, M)}
During the synthesis of call_reset, a call to reset is generated. For this
purpose, the access permissions are borrowed and recovered as per spec (10) via
the substitution [y/x, M/d, M/e], in the way described in Sec. 2.3.
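For illustration, here is the kind of program one would expect these specs to admit, written in the same C-like language as Fig. 1. This is our sketch of a plausible result, not necessarily the tool's verbatim output: reset writes only through the payload pointers, whose borrow is instantiated with M, and merely reads the structural pointer guarded by the read-only borrow e; call_reset is then a direct wrapper.

  void reset (loc x) {
    if (x == 0) {
    } else {
      *x = 0;               // payload write: permitted by the M borrow
      let nxt = *(x + 1);   // structural pointer: only read (borrow e)
      reset(nxt);
    }
  }

  void call_reset (loc y) { reset(y); }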
2.5 Putting It All Together
We conclude this overview by explaining how synthesis via SSL enhanced with
read-only borrows avoids the issue with spurious writes outlined in Sec. 1.1.
To begin, we change the specification to the following one, which makes use
of the new list predicate (9) and prevents any modifications in the original list.

{r →[M] x ∗ ls(x, S, a, b, c)} listcopy(r) {r →[M] y ∗ ls(x, S, a, b, c) ∗ ls(y, S, M, M, M)}
We should remark that, contrary to the solution sketched at the end of Sec. 1.1,
which suggested using the predicate instance of the shape ls(x, S)[RO], our con-
crete proposal does not allow us to constrain the entire predicate with a single
¹⁰ We use O as a notation for a multi-set with an arbitrary finite number of zeroes.

Variable x, y        Alpha-numeric identifiers
Size, offset n, ι    Non-negative integers
Expression e ::= 0 | true | x | e = e | e ∧ e | ¬e
Command c ::= let x = ∗(x + ι) | ∗(x + ι) = e | let x = malloc(n) | free(x)
            | err | f(ei) | c; c | if (e) {c} else {c}
Fun. dict. Δ ::= ε | Δ, f(xi) { c }

Fig. 2: Programming language grammar.

Pure term φ, ψ, χ, α ::= 0 | true | M | RO | x | φ = φ | φ ∧ φ | ¬φ
Symbolic heap P, Q, R ::= emp | ⟨e, ι⟩ →[α] e | [e, ι]^α | p(φi) | P ∗ Q
Heap predicate D ::= p(xi) ≜ ⟨ek, {χk; Rk}⟩
Function spec F ::= f(xi) : {P}{Q}        Assertion P, Q ::= {φ; P}
Environment Γ ::= ε | Γ, x                Context Σ ::= ε | Σ, D | Σ, F

Fig. 3: BoSSL assertion syntax.
access permission (e.g., RO). Instead, we allow fine-grained access control to
its particular elementary components by annotating each one with an individ-
ual borrow. The specification above allows the greatest flexibility wrt. access
permissions to the original list by giving them different names (a, b, c).
In the process of synthesising the non-trivial branch of listcopy, the search
at some point will come up with the following intermediate goal:

{x, r, nxt, v, y12} ;
{S = {v} ∪ S1; r →[M] y12 ∗ [x, 2]^a ∗ x →[b] v ∗ ⟨x, 1⟩ →[c] nxt ∗ ls(y12, S1, M, M, M) ∗ . . .}
⤳ {[z, 2]^M ∗ z →[M] v ∗ ⟨z, 1⟩ →[M] y12 ∗ ls(y12, S1, M, M, M) ∗ . . .}
Since the logical variable z in the postcondition is an existential one, the highlighted
part [z, 2]^M ∗ z →[M] v ∗ ⟨z, 1⟩ →[M] y12 of the symbolic heap can be satisfied by either
(a) re-purposing the matching part [x, 2]^a ∗ x →[b] v ∗ ⟨x, 1⟩ →[c] nxt of the precondition
(which is what the implementation in Sec. 1.1 does), or (b) allocating a corresponding
record of two elements (as should be done). With the read-only borrows in place, the
unification of the two fragments in the pre- and postcondition via UnifyHeaps fails,
because the mutable annotation of z →[M] v in the post cannot be matched by the
read-only borrow x →[b] v in the precondition. Therefore, not being able to follow the
derivation path (a), the synthesiser is forced to explore an alternative one, eventually
deriving the version of listcopy without tail-swapping.
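For comparison with Fig. 1, the derivation that takes path (b) yields a program along the following lines. This is our reconstruction in the language of Fig. 1 (the synthesiser's actual output may differ in naming and statement order); it allocates a fresh record for the copy and never writes through x or x + 1.

  void listcopy (loc r) {
    let x = *r;
    if (x == 0) {
    } else {
      let v = *x;
      let nxt = *(x + 1);
      *r = nxt;
      listcopy(r);        // *r now stores the head of the copied tail
      let y1 = *r;
      let y = malloc(2);
      *y = v;
      *(y + 1) = y1;      // the copy points to the copied tail...
      *r = y;             // ...and the original list is left intact
    }
  }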

3 BoSSL: Borrowing Synthetic Separation Logic


We now give a formal presentation of BoSSL—a version of SSL extended with
read-only borrows. Fig. 2 and Fig. 3 present its programming and assertion lan-
guage, respectively. For simplicity, we formalise a core language without theories
(e.g., natural numbers), similar to the one of Smallfoot [6]; the only sorts in
the core language are locations, booleans, and permissions (where permissions
appear only in specifications) and the pure logic only has equality. In contrast,
our implementation supports integers and sets (where the latter also only ap-
pear in specifications), with linear arithmetic and standard set operations. We do

  Vars(e) ⊆ Γ    e ≠ e′    Γ; {φ; ⟨x, ι⟩ →[M] e ∗ P} ⤳ {ψ; ⟨x, ι⟩ →[M] e ∗ Q} | c
  ──────────────────────────────────────────────────────────── Write
  Γ; {φ; ⟨x, ι⟩ →[M] e′ ∗ P} ⤳ {ψ; ⟨x, ι⟩ →[M] e ∗ Q} | ∗(x + ι) = e; c

  R = [z, n]^α ∗ ∗0≤i<n ⟨z, i⟩ →[αi] ei        z ∈ EV(Γ, P, Q)
  ({y} ∪ {ti}) ∩ Vars(Γ, P, Q) = ∅        R′ ≜ [y, n]^M ∗ ∗0≤i<n ⟨y, i⟩ →[M] ti
  Σ; Γ; {φ; P ∗ R′} ⤳ {ψ; Q ∗ R} | c
  ──────────────────────────────────────────────── Alloc
  Σ; Γ; {φ; P} ⤳ {ψ; Q ∗ R} | let y = malloc(n); c

  R = [x, n]^M ∗ ∗0≤i<n ⟨x, i⟩ →[M] ei    Vars({x} ∪ {ei}) ⊆ Γ    Σ; Γ; {φ; P} ⤳ {Q} | c
  ──────────────────────────────────── Free
  Σ; Γ; {φ; P ∗ R} ⤳ {Q} | free(x); c

Fig. 4: BoSSL derivation rules.

not formalise sort-checking of formulae; however, for readability, we will use the
meta-variable α where the intended sort of the pure logic term is “permission”,
and Perm for the set of all permissions. The permission to allocate or deallocate
a memory-block [x, n]^α is controlled by α.
3.1 BoSSL rules
New rules of BoSSL are shown in Fig. 4. The figure contains only 3 rules: this
minimal adjustment is possible thanks to our approach to unification and permis-
sion accounting from first principles. Writing to a memory location requires its
corresponding symbolic heap to be annotated as mutable. Note that for a pre-
condition {a = M; x →[a] 5}, a normalisation rule like SubstLeft would first
transform it into {M = M; x →[M] 5}, at which point the Write rule can be ap-

plied. Note also that Alloc does not require specific permissions on the block
in the postcondition; if they turn out to be RO, the resulting goal is unsolvable.
Unsurprisingly, the rule for accessing a memory cell just for reading purposes
requires no adjustments since any permission allows reading. Moreover, the Call
rule for method invocation does not need adjustments either. Below, we describe
how borrow and return seamlessly operate within a method call:
Call
F  f(xi ) : {φf ; Pf }{ψf ; Qf } ∈ Σ R = [σ]Pf  φ ⇒ [σ]φf ei = [σ]xi
Vars (ei ) ⊆ Γ φ  [σ]ψf R  [σ]Qf Σ; Γ; {φ ∧ φ ; P ∗ R } ; {Q}| c
Σ; Γ; {φ; P ∗ R} ; {Q}| f(ei ); c

The Call rule fires when a sub-heap R in the precondition of the goal can
be unified with the precondition Pf of a function f from context Σ. Some salient
points are worth mentioning here: (1) the annotation borrowing from R to Pf for
those symbolic sub-heaps in Pf which require read-only permissions is handled
by the unification of Pf with R, namely R = [σ]Pf (i.e., substitution accounts for
borrows: α/a); (2) the annotation recovery in the new precondition is implicit
Concise Read-Only Specifications for Better Synthesis 153

via R  [σ]Qf , where the substitution σ was computed during the unification,
that is, while borrowing; (3) finding a substitution σ for R = [σ]Pf fails if R does
not have sufficient accessibility permissions to call f (i.e., substitutions of the
form a/M are disallowed since the domain of σ may only contain existentials).
We reiterate that read-only specifications only manipulate symbolic borrows,
that is to say, RO constants are not expected in the specification.
3.2 Memory Model
We closely follow the standard SL memory model [32,37] and assume Loc ⊂ Val.
(Heap) h ∈ Heaps ::= Loc ⇀ Val        (Stack) s ∈ Stacks ::= Var ⇀ Val
To enable C-like accounting of dynamically-allocated memory blocks, we as-
sume that the heap h also stores sizes of allocated blocks in dedicated locations.
Conceptually, this part of the heap corresponds to the meta-data of the mem-
ory allocator. This accounting ensures that only a previously allocated memory
block can be disposed (as opposed to any set of allocated locations), enabling the
free command to accept a single argument, the address of the block. To model
this meta-data, we introduce a function bl : Loc → Loc, where bl(x) denotes
the location in the heap where the block meta-data for the address x is stored, if
x is the starting address of a block. In an actual language implementation, bl(x)
might be, e.g., x − 1 (i.e., the meta-data is stored right before the block).
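For illustration, here is a toy Python model of this accounting (ours, not part of the formalisation), with bl(x) = x − 1, so that free succeeds only on the starting address of a previously allocated block:

    def bl(x):
        # Meta-data location for a block starting at address x (here: x - 1).
        return x - 1

    class Heap:
        def __init__(self):
            self.cells = {}      # Loc -> Val, including meta-data cells
            self.next = 1        # next fresh address

        def malloc(self, n):
            x = self.next + 1            # leave room for the meta-data cell
            self.cells[bl(x)] = n        # record the block size at bl(x)
            for i in range(n):
                self.cells[x + i] = 0
            self.next = x + n
            return x

        def free(self, x):
            n = self.cells.pop(bl(x), None)  # only a block start has meta-data
            if n is None:
                raise RuntimeError("free of a non-block address")
            for i in range(n):
                del self.cells[x + i]

    h = Heap()
    p = h.malloc(3)
    h.free(p)          # fine: p was returned by malloc
    # h.free(p + 1)    # would fail: p + 1 carries no block meta-data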
Since we have opted for an unsophisticated permission mechanism, where the
heap ownership is not divisible, but some heap locations are restricted to RO,
the definition of the satisfaction relation ⊨I^{Σ,R} for the annotated assertions in a
particular context Σ and given an interpretation I, is parameterised with a fixed
set of read-only locations, R:
– h, s ⊨I^{Σ,R} {φ; emp} iff ⌈φ⌉s = true and dom(h) = ∅.
– h, s ⊨I^{Σ,R} {φ; ⟨e1, ι⟩ ↦α e2} iff ⌈φ⌉s = true and l ≜ ⌈e1⌉s + ι and dom(h) = {l}
  and h(l) = ⌈e2⌉s and l ∈ R ⇔ α = RO.
– h, s ⊨I^{Σ,R} {φ; [e, n]α} iff ⌈φ⌉s = true and l ≜ bl(⌈e⌉s) and dom(h) = {l} and
  h(l) = n and l ∈ R ⇔ α = RO.
– h, s ⊨I^{Σ,R} {φ; P1 ∗ P2} iff ∃h1, h2. h = h1 ⊎ h2 and h1, s ⊨I^{Σ,R} {φ; P1} and
  h2, s ⊨I^{Σ,R} {φ; P2}.
– h, s ⊨I^{Σ,R} {φ; p(ψi)} iff ⌈φ⌉s = true and D ≜ p(xi) ↦ ⟨ek, {χk, Rk}⟩ ∈ Σ and
  ⟨h, ⌈ψi⌉s⟩ ∈ I(D) and ∨k (h, s ⊨I^{Σ,R} [ψi/xi]{φ ∧ ek ∧ χk; Rk}).
There are two non-standard cases: points-to and block, whose permissions
must agree with R. Note that in the definition of satisfaction, we only need to
consider the case where the permission α is a value (i.e., either RO or M).
Although in a specification α can also be a variable, well-formedness guarantees
that this variable must be logical, and hence will be substituted away in the
definition of validity. We stress the fact that a reference that has RO permissions
to a certain symbolic heap still retains the full ownership of that heap, with the
restriction that it is not allowed to update or deallocate it. Note that deallocation
additionally requires a mutable permission for the enclosing block.
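The following Python sketch (ours; a toy encoding of assertions in which the pure part φ is omitted and the block case takes its meta-data location directly rather than computing bl) checks the points-to, block, and separating-conjunction cases of the satisfaction relation against a concrete heap and a fixed read-only set R:

    def sat(h, R, assertion):
        # Assertions: ("emp",), ("pts", l, v, perm), ("blk", l, n, perm),
        #             ("sep", P1, P2).
        kind = assertion[0]
        if kind == "emp":
            return h == {}
        if kind in ("pts", "blk"):
            _, l, v, perm = assertion
            # the permission must agree with R: l is read-only iff perm == "RO"
            return h == {l: v} and ((l in R) == (perm == "RO"))
        if kind == "sep":                  # split h into disjoint h1 and h2
            _, p1, p2 = assertion
            locs = list(h)
            for mask in range(2 ** len(locs)):
                h1 = {l: h[l] for i, l in enumerate(locs) if mask & (1 << i)}
                h2 = {l: h[l] for l in locs if l not in h1}
                if sat(h1, R, p1) and sat(h2, R, p2):
                    return True
            return False
        raise ValueError(kind)

    h = {10: 5, 20: 7}
    phi = ("sep", ("pts", 10, 5, "RO"), ("pts", 20, 7, "M"))
    print(sat(h, {10}, phi))   # True: location 10 is read-only, 20 is mutable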
3.3 Soundness
The BoSSL operational semantics is in the spirit of the traditional SL [38], and
hence is omitted for the sake of saving space (selected rules are available in
the extended version of the paper). The validity definition and the soundness
proofs of SSL are ported to BoSSL without any modifications, since our current
definition of satisfaction implies the one defined for SSL:
Definition 1 (Validity). We say that a well-formed Hoare-style specification
Σ; Γ; {P} c {Q} is valid wrt. the function dictionary Δ iff whenever dom(s) = Γ,
∀σgv = [xi ↦ di]_{xi ∈ GV(Γ,P,Q)} such that h, s ⊨I^Σ [σgv]P, and
Δ ⊢ ⟨h, (c, s)⟩ ⟶* ⟨h', (skip, s')⟩, it is also the case that h', s' ⊨I^Σ [σev ⊎ σgv]Q
for some σev = [yj ↦ dj]_{yj ∈ EV(Γ,P,Q)}.
The following theorem guarantees that, given a program c generated with
BoSSL, a heap model, and a set of read-only locations R that satisfy the pro-
gram’s precondition, executing c does not change those read-only locations:
Theorem 1 (RO Heaps Do Not Change). Given a Hoare-style specification
Σ; Γ; {φ; P} c {Q}, which is valid wrt. the function dictionary Δ, and a set of read-
only memory locations R, if:
(i) h, s ⊨I^{Σ,R} [σ]P, for some h, s and σ, and
(ii) Δ ⊢ ⟨h, (c, s)⟩ ⟶* ⟨h', (c', s')⟩ for some h', s' and c', and
(iii) R ⊆ dom(h),
then R ⊆ dom(h') and ∀l ∈ R. h(l) = h'(l).
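The conclusion of Theorem 1 translates directly into an executable check; a small sketch (ours, over the heaps-as-dictionaries model of Sect. 3.2):

    def ro_heap_unchanged(h, h2, R):
        # Conclusion of Theorem 1: every read-only location is still
        # allocated and holds the same value after the run.
        return all(l in h2 and h2[l] == h[l] for l in R)

    h_pre  = {10: 5, 20: 7}
    h_post = {10: 5, 20: 8, 30: 1}   # a run that mutated 20 and allocated 30
    print(ro_heap_unchanged(h_pre, h_post, R={10}))   # True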
Starting from an abstract state where a spatial heap has a read-only permis-
sion, under no circumstance can this permission be strengthened to M:
Corollary 1 (No Permission Strengthening). Given a valid Hoare-style
specification Σ; Γ; {φ; P} c {ψ; Q} and a permission α, if ψ ⇒ (α = M) then
it is also the case that φ ⇒ (α = M).
As it turns out, permission weakening is possible since postcondition weakening,
though problematic, is sound in general. However, even though this affects
completeness, it does not affect our termination results. For example, given a
synthesised auxiliary function
F ≜ f(x, r) : {x ↦a1 t ∗ r ↦M x}{x ↦a2 t ∗ r ↦M t + 1},
and a synthesis goal Σ, F; Γ; {x ↦M 7 ∗ y ↦M x} ; {x ↦M 7 ∗ y ↦M z} ⊢ c, firing the Call
rule for the candidate function f(x, y) would lead to the unsolvable goal Σ, F; Γ;
{x ↦a2 7 ∗ y ↦M 8} ; {x ↦M 7 ∗ y ↦M z} ⊢ f(x, y); c. Frame may never be fired on this
new goal since the permission of reference x in the goal's precondition has been
permanently weakened. To eliminate such sources of incompleteness we require
the user-provided predicates and specifications to be well-formed:
Definition 2 (Well-Formedness of Spatial Predicates). We say that a
spatial predicate p(xi) ↦ ⟨ek, {χk, Rk}⟩_{k∈1..N} is well-formed iff
(∪_{k=1..N} (Vars(ek) ∪ Vars(χk) ∪ Vars(Rk)) ∩ Perm) ⊆ (xi ∩ Perm).
That is, every accessibility annotation within the predicate’s clause is bound by
the predicate’s parameters.
Definition 3 (Well-Formedness of Specifications). We say that a Hoare-
style specification Σ; Γ ; {P} c {Q} is well-formed iff EV (Γ, P, Q)∩Perm = ∅ and
every predicate instance in P and Q is an instance of a well-formed predicate.
That is, postconditions are not allowed to have existential accessibility annota-
tions in order to avoid permanent weakening of accessibility.
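Both conditions are simple syntactic checks over the annotations; here is a Python sketch (ours, over a toy representation in which a clause is just its set of variables):

    def perm_vars(vs, perms):
        # Variables of permission sort among vs.
        return set(vs) & set(perms)

    def wf_predicate(params, clause_vars, perms):
        # Def. 2: every permission variable occurring in some clause
        # (its e_k, chi_k or R_k) is bound by the formal parameters.
        bound = perm_vars(params, perms)
        return all(perm_vars(vs, perms) <= bound for vs in clause_vars)

    def wf_spec(existentials, perms):
        # Def. 3: no existential variable is permission-sorted.
        return not perm_vars(existentials, perms)

    PERM = {"a1", "a2", "a3", "a4"}
    # lseg(x, y, s, a1, a2, a3) with a clause mentioning a1..a3: well-formed
    print(wf_predicate(["x", "y", "s", "a1", "a2", "a3"],
                       [{"x", "y", "a1", "a2", "a3"}], PERM))   # True
    # a clause mentioning an unbound borrow a4 is rejected:
    print(wf_predicate(["x", "a1"], [{"x", "a4"}], PERM))       # False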
A callee that requires borrows for a symbolic heap always returns to the
caller its original permission for that respective symbolic heap:
Corollary 2 (Borrows Always Return). A heaplet with permission α either
(a) retains the same permission α after a call to a function that is decorated with
well-formed specifications and that requires that heaplet to have read-only
permission, or (b) may be deallocated in case α = M.
4 Implementation and Evaluation
We implemented BoSSL in an enhanced version of the SuSLik tool, which
we refer to as ROBoSuSLik [12].11 The changes to the original SuSLik in-
frastructure affected less than 100 lines of code. The extended synthesis is
backwards-compatible with the original benchmarks. To make this possible, we
treat the original SSL specifications as annotated/instantiated with M permis-
sions, whenever necessary, which is consistent with the treatment of access permis-
sions in BoSSL.
We have conducted an extensive experimental evaluation of ROBoSuSLik,
aiming to answer the following research questions:
1. Do borrowing annotations improve the performance of SSL-based synthesis
when using standard search strategy [34, § 5.2]?
2. Do read-only borrows improve the quality of synthesised programs, in terms of
size and comprehensibility, wrt. their counterparts obtained from regular,
“all-mutable” specifications?
3. Do we obtain stronger correctness guarantees for the programs from the stan-
dard SSL benchmark suite [34, § 6.1] by simply adding, whenever reasonable,
read-only annotations to their specifications?
4. Do borrowing specifications enable more robust synthesis? That is, should we
expect to obtain better programs/synthesis performance on average regardless
of the adopted unification and search strategies?
4.1 Experimental Setup
Benchmark Suite. To tackle the above research questions, we have adopted most
of the heap-manipulating benchmarks from SuSLik suite [34, § 6.1] (with some
variations) into our sets of experiments. In particular we looked at the group
of benchmarks which manipulate singly linked list segments, sorted linked list
segments and binary trees. We did not include the benchmarks concerning binary
search trees (BSTs) for the reasons outlined in the next paragraph.
11 The sources are available at https://github.com/TyGuS/robosuslik.
The Tools. For a fair comparison which accounts for the latest advancements
to SuSLik, we parameterise the synthesis process with a flag that turns the
read-only annotations on and off (off means that they are all treated as mutable).
Results obtained with this flag enabled are marked RO in the experiments, while
those marked Mut ignore the read-only annotations during the synthesis process.
For simplicity, we will refer to the two instances of the tool, namely RO and Mut,
as two different tools. Each tool was set to time out after 2 minutes of attempting
to synthesise a program.
Criteria. In an attempt to quantify our results, we have looked at the size of
the synthesised program (AST size), the absolute time needed to synthesise the
code given its specification, averaged over several runs (Time), the number of
backtrackings in the proof search due to nondeterminism (#Backtr ), the total
number of rule applications that the synthesis fired during the search (#Rules),
including those that lead to unsolvable goals, and the strength of the guarantees
offered by the specifications (Stronger Guarantees).
Variables. Some benchmarks show an improvement over the synthesis process
without the read-only annotations. To emphasise that the read-only annotations'
improvements are not accidental, we varied the inductive definitions of the
corresponding benchmarks to experiment with different properties of the
underlying structure: the shape of the structure (in all the definitions),
the length of the structure (for those benchmarks tagged with len), the values
stored within the structure (val), a combination of all these properties (all),
as well as the sortedness property for the "Sorted list" group of benchmarks.
Experiment Schema. To measure the performance and the quality of the borrowing-
aware synthesis we ran the benchmarks against the two different tools and did
a one-to-one comparison of the results. We ran each tool three times for each
benchmark, and averaged the resulting synthesis times. All the other evaluation
criteria remain constant across the three runs.
To measure the tools' robustness we stressed the synthesis algorithm by alter-
ing the default proof search strategy. We prepared 42 such perturbations which
we ran against the different program variants enumerated above. Each pair of
program variant and proof strategy perturbation was then analysed to measure
the number of rules fired by RO and Mut.
Hardware Setup. The experiments were conducted on a 64-bit machine running
Ubuntu, with an Intel Xeon CPU (6 cores, 2.40GHz) with 32GB RAM.
4.2 Performance and Quality of the Borrowing-Aware Synthesis
Tab. 1 captures the results of running RO and Mut on the considered bench-
marks. It provides empirical evidence that the borrowing-aware synthesis im-
proves the performance of the original SSL-based synthesis, in other words
answering Research Question 1 positively. RO suffers almost no loss in per-
formance (except for a few cases, such as the list segment append, where there
is a negligible increase in time), while the gain is considerable for those synthe-
sis problems with complex pointer manipulation. For example, if we consider
the number of fired rules as the performance criterion, in the worst
Group        Description     AST size    Time (sec)           #Backtr.             #Rules               Stronger
                             RO   Mut    RO   Mut   Mut/RO    RO   Mut    Mut/RO   RO   Mut    Mut/RO   Guarant.
Linked       append          20   20     1.5  1.4   0.9x      8    8      1.0x     77   78     1.0x     YES
List         delete          44   44     1.9  2.1   1.1x      67   67     1.0x     180  180    1.0x     same
Segment      dispose         11   11     0.5  0.5   1.0x      0    0      1.0x     8    8      1.0x     same
             init            13   13     0.7  0.7   1.0x      5    5      1.0x     27   27     1.0x     YES
             lcopy           32   35     1.0  1.0   1.0x      9    14     1.5x     66   82     1.2x     YES
             length          22   22     1.5  1.5   1.0x      2    2      1.0x     38   38     1.0x     YES
             max             28   28     1.4  1.5   1.1x      2    2      1.0x     38   38     1.0x     YES
             min             28   28     1.5  1.5   1.0x      2    2      1.0x     38   38     1.0x     YES
             singleton       11   11     0.5  0.5   1.0x      8    8      1.0x     30   30     1.0x     same
Sorted       ins-sort-all    29   29     3.7  3.8   1.0x      5    5      1.0x     60   60     1.0x     YES
List         ins-sort-len    29   29     3.0  3.0   1.0x      7    8      1.1x     59   60     1.0x     YES
             ins-sort-val    29   29     2.6  2.5   1.0x      5    5      1.0x     57   57     1.0x     YES
             insert          53   53     7.8  8.0   1.0x      35   96     2.7x     214  338    1.6x     YES
             prepend         11   11     0.5  0.6   1.2x      1    1      1.0x     17   17     1.0x     YES
Tree         dispose         16   16     0.4  0.5   1.2x      0    0      1.0x     10   10     1.0x     same
             flatten-acc     35   35     2.1  2.0   1.0x      24   24     1.0x     118  118    1.0x     same
             flatten-app     48   48     1.6  1.7   1.0x      14   14     1.0x     76   76     1.0x     same
             morph           19   19     0.6  0.5   1.0x      1    1      1.0x     24   24     1.0x     YES
             tcopy-all       42   51     1.5  2.2   1.5x      10   88     8.8x     85   296    3.5x     YES
             tcopy-len       36   42     1.3  2.0   1.5x      6    90     15x      72   304    4.2x     YES
             tcopy-val       42   51     1.4  5.3   3.8x      10   1222   122x     82   2673   32x      YES
             tcopy-ptr-all   46   55     1.6  2.4   1.5x      10   88     8.8x     93   303    3.3x     YES
             tcopy-ptr-len   40   46     1.3  2.2   1.7x      6    90     15x      80   311    3.9x     YES
             tcopy-ptr-val   46   55     1.3  5.8   4.5x      10   1222   122x     89   2679   30x      YES
             tsize-all       32   38     1.5  1.4   0.9x      2    4      2.0x     45   51     1.1x     YES
             tsize-len       32   32     1.2  1.1   0.9x      2    2      1.0x     44   46     1.0x     YES
             tsize-ptr-all   36   42     1.6  1.4   0.9x      2    4      2.0x     53   58     1.1x     YES
             tsize-ptr-len   36   36     1.3  1.3   1.0x      2    2      1.0x     52   53     1.0x     YES

Table 1: Benchmarks and comparison between the results for synthesis with read-
only annotations (RO) and without them (Mut). For each case study we measure
the AST size of the synthesised program, the Time needed to synthesise the
benchmark, the number of times that the synthesiser had to discard a derivation
branch (#Backtr.), and the total number of fired rules (#Rules).

case, RO behaves the same as Mut, while in the best scenario it buys us a 32-fold
decrease in the number of applied rules. At the same time, synthesising a few
small examples in the RO case is a bit slower, despite the same or smaller num-
ber of rule applications. This is due to the increased number of logical variables
(because of added borrows) when discharging obligations via the SMT solver.
Fig. 5 offers a statistical view of the numbers in the table, where smaller bars
mark a better performance. The barplots indicate that as the complexity of the
problem increases (approximately from left to right), RO outperforms Mut.
Perhaps the most important take-away from this experiment is that the syn-
thesis with read-only borrows often produces a more concise program (light green
cells in the column AST size of Tab. 1), while retaining the same or better per-
formance wrt. all the evaluated criteria. For instance, RO gets rid of the spurious
write from the motivating example introduced in Sec. 1, reducing the AST size
from 35 nodes down to 32, while at the same time firing fewer rules. This also
secures a positive answer for Research Question 2.

4.3 Stronger Correctness Guarantees


To answer Research Question 3, we have manually compared the guarantees
offered by the specifications annotated with RO permissions against the default
[Figure: two barplots comparing Read-Only vs. Mutable per benchmark — "AST Sizes of Synthesised Programs" (y-axis: number of AST nodes) and "Rules Tried while Searching" (y-axis: log2 of number of tried rules).]
Fig. 5: Statistics for synthesis with and without Read-Only specifications.

ones; the results are summarised in the last column of Tab. 1. For instance, a
specification stating that the shape of a linked-list segment is read-only implies
that the size of that segment remains constant through the program’s execution.
In other words, the length property need not be captured separately in the
segment’s definition. If, in addition to the shape, the payload of the segment is
also read-only, then the set of values and their ordering are also invariant.
Consider the goal {lseg(x, y, s, a1 , a2 , a3 )} ; {lseg(x, y, s, a1 , a2 , a3 )}, where
lseg is an inductive definition of a list segment which ends at y and contains
the set of values s. The borrowing-aware synthesiser will produce a program
which is guaranteed to treat the segment pointed by x and ending with y as
read-only (that is, its shape, length, values and orderings are invariant). At the
same time, for a goal {lseg(x, y, s)} ; {lseg(x, y, s)} , the guarantees are that
the returned segment still ends in y and contains values s. Internal modifications
of the segment, such as reordering and duplicating list elements, may still occur.
The few entries marked with same are programs whose specifications did not
get stronger when instrumented with RO annotations (e.g., delete). These
benchmarks require mutation over the entire data structure, hence the read-only
annotations do not influence the offered guarantees. Overall, our observations
that read-only annotations offer stronger guarantees are in agreement with the
works on SL-based program verification [9, 13], but are promoted here to the
more challenging problem of program synthesis.
4.4 Robustness under Synthesis Perturbations


There is no single search heuristic that works equally well for any given
specification: for a particular fixed search strategy, a synthesiser can exhibit
suboptimal performance on some goals, while converging quickly on others.
By evaluating robustness wrt. the RO and Mut specification methodologies, we are
hoping to show that, provided a large variety of "reasonable" search heuristics,
read-only annotations deliver better synthesis performance "on average".
For this set of experiments, we have focused on four characteristic programs
from our performance benchmarks based on their pointer manipulation com-
plexity: list segment copy (lcopy), insertion into a sorted list segment (insert),
copying a tree (tcopy), and a variation of the tree copy that shares the same
pointer for the input tree and its returned copy (tcopy-ptr).
Exploring Different Unification Orders. Since spatial unification lies at the core
of the synthesis process, we implemented 6 different strategies for choosing a
unification candidate, based on the following criteria: the size of the heaplet chunk
(favour the smallest heap vs. the largest one as the best unification candidate), the
name of the predicate (we considered both an ascending and a descending
priority queue), and a customised ranking function which associates a cost to a
symbolic heap based on its kind: a block is cheaper to unify than a points-to,
which in turn is cheaper than a spatial predicate. A sketch of such a ranking
function is given below.
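A minimal Python sketch of this last strategy (ours; the concrete costs are made up for illustration):

    # Cost of unifying each kind of heaplet: blocks are cheapest,
    # then points-to facts, then (recursive) spatial predicates.
    KIND_COST = {"block": 1, "points_to": 2, "predicate": 4}

    def rank(heaplet):
        kind, size = heaplet              # e.g. ("points_to", 1)
        return KIND_COST[kind] * size

    def pick_candidate(heaplets):
        # Prefer the cheapest unification candidate.
        return min(heaplets, key=rank)

    goal = [("predicate", 1), ("points_to", 1), ("block", 1)]
    print(pick_candidate(goal))           # ('block', 1)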
Exploring Different Search Strategies. We next designed 6 strategies for priori-
tising the rule applications. One of the crux rules in this respect is Write,
whose different priority schemes might make the results seem randomly
generated. In the cases where Write leads to unsolvable goals, one might right-
fully argue that RO has a clear advantage over Mut (fail fast). However, in
the cases where mutation leads to a solution faster, Mut might have an
advantage over RO (solve fast). Because these are just intuitive observations,
and for fairness' sake, we experimented with both the cases where Write has a
high and a low priority in the queue of rule phases [34, § 5.2]. Since most of the
benchmarks involve recursion, we also chose to shuffle the priorities of
the Open and Call rules. Again, we chose between a high and a
low priority for these rules to give a fair chance to both tools.
We considered all combinations of the 6 unification permutations and the 6
rule-application permutations (plus the default one) to obtain 42 different proof
search perturbations. We will use the following notation in the narrative below:
– S is the set comprising the synthesis problems: lcopy, insert, tcopy, tcopy-ptr.
– V is the set of all specification variations: len, val, all.
– K is the set of all 42 possible tool perturbations.
The distributions of the number of rules fired for each tool (RO and Mut)
with the 42 perturbations over the 4 synthesis problems with 3 variants of spec-
ification each, that is 1008 different synthesis runs, are summarised using the
boxplots in Fig. 6. There is a boxplot corresponding to each pair of tool and
synthesis problem. In the ideal case, each boxplot contains 126 data points cor-
responding to a unique combination (v, k) of a specification variation v ∈ V and
a tool perturbation k ∈ K. A boxplot is the distribution of such data based on a
[Figure: boxplots of log2 of the number of tried rules, one per tool (Read-Only vs. Mutable) and synthesis problem (lcopy, insert, tcopy, tcopy-ptr).]
Fig. 6: Boxplots of variations in log2(numbers of applied rules) for synthesis per-
turbations. Numbers of data points for each example are given in parentheses.

six number summary: minimum, first quartile, median, third quartile, maximum,
outliers. For example, the boxplot for tcopy-ptr corresponding to RO and con-
taining 90 data points, reads as follows: “the synthesis processes fired between
64 and 256 rules, with most of the processes firing between 64 and 128 rules.
There are three exceptions where the synthesiser fired more than 256 rules". Note
that the y-axis represents the binary logarithm of the number of fired rules.
Even though we attempted to synthesise each program 126 times for each tool,
some attempts hit the timeout and therefore their corresponding data points had
to be eliminated from the boxplot. It is of note, though, that whenever RO with
configuration (v, k) hit the timeout for the synthesis problem s ∈ S, so did Mut,
hence both the (RO, s, (v, k)) as well as (Mut, s, (v, k)) are omitted from the
boxplots. But the inverse did not hold: RO hit the timeout fewer times than
Mut, hence RO is measured at a disadvantage (i.e., more data points means more
opportunities to show worse results). Since insert collected the highest number
of timeouts, we equalised it to remove non-matched entries across the two tools.
Despite RO's potential measurement disadvantage, the boxplots depict it as a
clear winner. Not only does RO fire fewer rules in all the cases but, with the exception
of insert, it is also more stable under the proof search perturbations: it varies a
few orders of magnitude less than Mut does for the same configurations. Fig. 7
supports this observation by offering a more detailed view on the distributions
of the numbers of fired rules per synthesis configuration. Taller bars show that
more processes fall in the same range (wrt. the number of fired rules). For lcopy,
tcopy, tcopy-ptr it is clear that Mut has a wider distribution of the number
of fired rules, that is, Mut is more sensitive to the perturbations than RO. We
additionally make some further observations:
[Figure: histograms of the frequency of log2 of the number of attempted rule applications, one panel per synthesis problem (lcopy, insert, tcopy, tcopy-ptr), for RO (top row) and Mut (bottom row); the number of data points is reported in each panel.]
Fig. 7: Distributions of log2(number of attempted rule applications).


– Despite a similar distribution wrt. the numbers of fired rules in the case of
insert, RO produces compact ASTs of size 53 for all perturbations, while Mut
fluctuates between producing ASTs of size 53 and 62.
– For all the synthesis tasks, RO produced the same AST irrespective of the
tool’s perturbation. In contrast, there were synthesis problems for which Mut
produced as many as 3 different ASTs for different perturbations, none of
which were as concise as the one produced by RO for the same configuration.
– The outliers of (Mut, lcopy) are ridiculously high, firing close to 40k rules.
– The outliers of (RO, tcopy) are still below the median values of (Mut, tcopy).
– Except for insert, the best performance of Mut, in terms of fired rules, barely
overlaps with the worst performance of RO.
– Except for insert, the medians of RO are closer to the lowest value of the
data distribution, as opposed to Mut, where the tendency is to fire more rules.
– In absolute values, RO hit the 2-minute timeout 102 times compared to Mut,
which hit the timeout 132 times.
We believe that the main take-aways from this set of experiments, along with
the positive answer to the Research Question 4, are as follows:
– RO is more stable wrt. the number of rules fired and the size of the generated
AST for many reasonable proof search perturbations.
– RO produces better programs, which avoid spurious statements, irrespective
of the perturbation and number of rules fired during the search.
5 Limitations and Discussion
Flexible aliasing. Separating conjunction asserts that the heap can be split into
two disjoint parts, or in other words it carries implicit non-aliasing infor-
mation. Specifically, x ↦ − ∗ y ↦ − states that x and y are not aliased. Such
assertions can be used to specify methods as below:
{x ↦ n ∗ y ↦ m ∗ ret ↦ x} sum(x, y, ret) {x ↦ n ∗ y ↦ m ∗ ret ↦ n + m}
Occasionally, enforcing x and y to be non-aliased is too restrictive, rejecting
safe calls such as sum(p, p, q). Approaches that support immutable annotations
permit such calls without compromising safety if both pointers, aliased or not,
are annotated as read-only [9,13]. BoSSL does not support such flexible aliasing.
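To see why such an aliased call would be harmless, consider this Python rendition of sum over the heaps-as-dictionaries model (ours, for illustration): both argument cells are only read, so passing the same location twice cannot violate the specification.

    def sum_cells(h, x, y, ret):
        # Computes *ret = *x + *y over a heap modelled as a dict.
        h[ret] = h[x] + h[y]       # reads x and y, writes only ret

    h = {1: 21, 2: 0}              # p = 1 stores 21; q = 2 receives the result
    sum_cells(h, x=1, y=1, ret=2)  # the aliased call sum(p, p, q)
    print(h)                       # {1: 21, 2: 42}: p itself is never written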
Precondition strengthening. Let us assume that srtl(x, n, lo, hi, α1, α2, α3) is an
inductive predicate that describes a sorted linked list of size n, with lo and
hi being the list's minimum and maximum payload values, respectively. Now,
consider the following synthesis goal:
{x, y} ; {y ↦ x ∗ srtl(x, n, lo, hi, M, M, M)} ; {y ↦ n ∗ srtl(x, n, lo, hi, M, M, M)}.
As stated, the goal clearly requires the program to compute the length n of
the list. Imagine that we already have a function that does precisely that, even
though it is stated in terms of a list predicate that does not enforce sortedness:
{ret ↦ x ∗ ls(x, n, a1, a2, a3)} length(x, ret) {ret ↦ n ∗ ls(x, n, a1, a2, a3)}
To solve the initial goal, the synthesiser could weaken the given precondition
srtl(x, n, lo, hi, M, M, M) to ls(x, n, M, M, M), and then successfully synthesise a
call to the length method. Unfortunately, the resulting goal, obtained after hav-
ing emitted the call to length and applying Frame, is unsolvable:
{x, y} ; {ls(x, n, M, M, M)} ; {srtl(x, n, lo, hi, M, M, M)},
since the logic does not allow strengthening an arbitrary linked list into a sorted
linked list without retaining the prior knowledge. Had we adopted an
alternative approach to read-only annotations [9,13], allowing the caller to retain
the full permission of the sorted list, the postcondition of length would not
contain the list-related part of the heap and would only quantify over the result
pointer {ret ↦ n}, thus leading to the solvable goal below:
{x, y} ; {srtl(x, n, lo, hi, M, M, M)} ; {srtl(x, n, lo, hi, M, M, M)}.
One straightforward way for BoSSL to cope with this limitation is to simply
add a version of length annotated with specifications that cater to srtl.
Overcoming the limitations. While the “caller keeps the permission” kind of ap-
proach would buy us flexible aliasing and calls with weaker specifications, it
would compromise the benefits discussed earlier with respect to the granular-
ity of borrow-polymorphic inductive predicates. One possible solution to gain
the best of both worlds would be to design a permission system which allows
both borrow-polymorphic inductive predicates as well as read-only modalities to
co-exist, where the latter would overwrite the predicate’s mixed permissions. In
other words, the read-only modality enforces a read-only treatment of the pred-
icate irrespective of its permission arguments, while the permission arguments
control the treatment of a mutable predicate. The theoretical implications of
such a design choice are left as part of future work.
Extending read-only specifications to concurrency. Thus far we have only inves-
tigated the synthesis of sequential programs, for which read-only annotations
helped to reduce the synthesis cost. Assuming that the synthesiser has the capa-
bility to synthesise concurrent programs as well, the borrows annotation mecha-
nism in its current form may not be able to cope with general resource sharing.
This is because a callee which requires read-only permissions to a particular
symbolic heap still consumes the entire required symbolic heap from the caller,
despite the read-only requirement; hence, there is no space left for sharing. That
said, the recently proposed alternative approaches to introduce read-only an-
notations [9, 13] have no formal support for heap sharing in the presence of
concurrency either. To address these challenges, we could adopt a more sophis-
ticated approach based on fractional permissions mechanism [7,8,20,25,30], but
this is left as part of future work since it is orthogonal to the current scope.

6 Related Work
Language design. There is a large body of work on integrating access permissions
into practical type systems [5, 16, 42] (see, e.g., the survey by Clarke et al. [10]).
One notable such system, which is the closest in its spirit to our proposal, is
the borrows type system of the Rust programming language [1] proved safe with
RustBelt [22]. Similar to our approach, borrows in Rust are short-lived: in
Rust they share the scope with the owner; in our approach they do not escape
the scope of a method call. In contrast with our work, Rust’s type system care-
fully manages different references to data by imposing strict sharing constraints,
whereas in our approach the treatment of aliasing is taken care of automatically
by building on Separation Logic. Moreover, Rust allows read-only borrows to be
duplicated, while in the sequential setting of BoSSL this is currently not possible.
Somewhat related to our approach, Naden et al. propose a mechanism for
borrowing permissions, albeit integrated as a fundamental part of a type sys-
tem [31]. Their type system comes equipped with change permissions which
enforce the borrowing requirements and describe the effects of the borrowing
upon return. As a result of treating permissions as first-class values, we do not
need to explicitly describe the flow of permissions for each borrow since this is
controlled by a mix of the substitution and unification principles.
Program verification with read-only permissions. Boyland introduced fractional
permissions to statically reason about interference in the presence of shared-
memory concurrency [8]. A permission p denotes full resource ownership (i.e.
read-write access) when p = 1, while p ∈ (0, 1) denotes a partial ownership (i.e.
read-only access). To leverage permissions in practice, a system must support
two key operations: permission splitting and permission borrowing. Permission
splitting (and merging back) follows the split rule: x ↦p a = x ↦p1 a ∗ x ↦p2 a, with
p = p1 + p2 and p, p1, p2 ∈ (0, 1]. Permission borrowing refers to the safe manipulation
of permissions: a callee may remove some permissions from the caller, use them
temporarily, and give them back upon return.
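A sketch of the split rule's arithmetic using Python's exact fractions (ours, for illustration); note that handing a callee the full ownership of a cell would leave the caller with the invalid fraction 0, which is precisely the first issue with reset discussed below:

    from fractions import Fraction

    def split(p, p1):
        # x |->p a  =  x |->p1 a * x |->p2 a  with p = p1 + p2,
        # where all of p, p1, p2 must lie in (0, 1].
        p2 = p - p1
        if not (0 < p1 <= 1 and 0 < p2 <= 1):
            raise ValueError("fractional permissions must lie in (0, 1]")
        return p1, p2

    print(split(Fraction(1), Fraction(1, 2)))   # (Fraction(1, 2), Fraction(1, 2))
    # split(Fraction(1), Fraction(1))           # rejected: caller would keep 0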
Though it exists, tool support for fractional permissions is still scarce. Leino
and Müller introduced a mechanism for storing fractional permissions in data
structures via dedicated access predicates in the Chalice verification tool [27].
To promote generic specifications, Heule et al. advanced Chalice with instan-
tiable abstract permissions, allowing automatic firing of the split rule and symbolic
borrowing [20]. VeriFast [21] is guided by contracts written in Separation Logic
and assumes the existence of lemmas to cater for permission splitting. Viper [30]
is an intermediate language which supports various permission models, includ-
ing abstract fractional permissions [4, 43]. Similar to Chalice, the permissions
are attached to memory locations using an accessibility predicate. To reason
about it, Viper uses permission-aware assertions and assumptions, which corre-
spond in our approach to the unification and the substitution operations, respec-
tively. Like Viper, we enhance the basic memory constructors, that is blocks
and points-to, to account for permissions, but in contrast, the Call rule in our
approach is standard, i.e., not permission-aware.
These tools, along with others [3, 18], offer strong correctness guarantees in
the presence of resource sharing. However, there is a class of problems, namely
those involving predicates with mixed permissions, whose guarantees are weak-
ened due to the general fractional permissions model behind these tools. We next
exemplify this class of problems in a sequential setting. We start by considering
a method which resets the values stored in a linked-list while maintaining its
shape (p < 1 below is to enforce the immutable shape):
{p < 1; ls(x, S)[1, p]} void reset(loc x) {ls(x, {0})[1, p]}.
Assume a call to this method, namely reset(y). The caller has full permission
over the entire list passed as argument, that is ls(y, B)[1, 1]. This attempt leads
to two issues. The first has to do with splitting the payload’s permission (before
the call) such that it matches the callee’s postcondition. To be able to modify the
list’s payload, the callee must get the payload’s full ownership, hence the caller
should retain 0: ls(y, B)[1, 1] = ls(y, B)[0, 1/2] ∗ ls(y, B)[1, 1/2]. But 0 is not a valid
fractional permission. The second issue surfaces while attempting to merge the
permissions after the call: ls(y, B)[0, 1/2]∗ls(y, {0})[1, 1/2] is invalid since the two
instances of ls have incompatible arguments (namely B and {0}). To avoid such
problems, BoSSL abandons the split rule and instead always manipulates full
ownership of resources, hence it does not use fractions. This compromise, along
with the support for symbolic borrows, allows ROBoSuSLik to guarantee read-
only-ness in a sequential setting while avoiding the aforementioned issues. More
investigations are needed in order to lift this result to concurrency reasoning.
Another feature which distinguishes the current work from those based on frac-
tional permissions, is the support for permissions as parameters of the predicate,
which in turn supports the definition of predicates with mixed permissions.
Immutable specifications on top of Separation Logic have also been studied by
David and Chin [13]. Unlike our approach which treats borrows as polymorphic
variables that rely on the basic concept of substitution, their annotation mech-
anism comprises only constants and requires a specially tailored entailment on
top of enhanced proof rules. Since callers retain the heap ownership upon calling
a method with read-only requirements, their machinery supports flexible aliasing
and cut-point preservation—features that we could not find a good use for in the
context of program synthesis. An attempt to extend David and Chin’s work by
adding support for predicates with mixed permissions [11] suffers from significant
annotation overhead. Specifically, it employs a mix of mutable, immutable, and
absent permissions, so that each mutable heaplet in the precondition requires a
corresponding matching heaplet annotated with absent in the postcondition.
Charguéraud and Pottier [9] extended Separation Logic with RO assertions
that can be freely duplicated or discarded. Their approach creates lexically-
scoped copies of the RO permissions before emitting a call, which, in turn, in-
volves discarding the corresponding heap from the postcondition to guarantee a
sound RO-modality. Adapting this modality to program synthesis guided by pre-
and postconditions would require a completely new system of deductive synthesis,
since most of the rules in SSL are not designed to handle the discardable RO-
heaps. In contrast, BoSSL supports permission-parametric predicates (e.g., (9))
requiring only minimal adjustments to its host logic, i.e., SSL.
Program synthesis. BoSSL continues a long line of work on program synthesis
from formal specifications [26, 36, 40, 41, 44] and in particular, deductive synthe-
sis [14, 23, 29, 33, 34], which can be characterised as search in the space of proofs
of program correctness (rather than in the space of programs). Most directly
BoSSL builds upon our prior work on SSL [34] and enhances its specification
language with read-only annotations. In that sense, the present work is also re-
lated to various approaches that use non-functional specifications as input to
synthesis. It is common to use syntactic non-functional specifications, such as
grammars [2], sketches [36, 40], or restrictions on the number of times a compo-
nent can be used [19]. More recent work has explored semantic non-functional
specifications, including type annotations for resource consumption [24] and se-
curity/privacy [17,35,39]. This research direction is promising because (a) anno-
tations often enable the programmer to express a strong specification concisely,
and (b) checking annotations is often more compositional (i.e., fails faster) than
checking functional specifications, which makes synthesis more efficient. In the
present work we have demonstrated that both of these benefits of non-functional
specifications also hold for the read-only annotations of BoSSL.

7 Conclusion
In this work, we have advanced the state of the art in program synthesis by
highlighting the benefits of guiding the synthesis process with information about
memory access permissions. We have designed the logic BoSSL and implemented
the tool ROBoSuSLik, showing that a minimalistic discipline for read-only per-
missions already brings significant improvements wrt. the performance and ro-
bustness of the synthesiser, as well as wrt. the quality of its generated programs.
Acknowledgements. We thank Alexander J. Summers, Cristina David, Olivier
Danvy, and Peter O'Hearn for their comments on the preliminary versions of
the paper. We are very grateful to the ESOP 2020 reviewers for their detailed
feedback, which helped to conduct a more adequate comparison with related
approaches and, thus, better frame the conceptual contributions of this work.
Nadia Polikarpova’s research was supported by NSF grant 1911149. Amy Zhu’s
research internship and stay in Singapore during the Summer 2019 was supported
by Ilya Sergey’s start-up grant at Yale-NUS College, and made possible thanks
to UBC Science Co-op Program.
References
1. The Rust Programming Language: References and Borrowing. https://doc.rust-lang.org/1.8.0/book/references-and-borrowing.html, 2019.
2. Rajeev Alur, Rastislav Bodík, Garvit Juniwal, Milo M. K. Martin, Mukund
Raghothaman, Sanjit A. Seshia, Rishabh Singh, Armando Solar-Lezama, Emina
Torlak, and Abhishek Udupa. Syntax-guided synthesis. In FMCAD, pages 1–8.
IEEE, 2013.
3. Andrew W. Appel. Verified software toolchain - (invited talk). In ESOP, volume
6602 of LNCS, pages 1–17. Springer, 2011.
4. Vytautas Astrauskas, Peter Müller, Federico Poli, and Alexander J. Summers.
Leveraging Rust types for modular specification and verification. PACMPL,
3(OOPSLA):147:1–147:30, 2019.
5. Thibaut Balabonski, François Pottier, and Jonathan Protzenko. The Design and
Formalization of Mezzo, a Permission-Based Programming Language. ACM Trans.
Program. Lang. Syst., 38(4):14:1–14:94, 2016.
6. Josh Berdine, Cristiano Calcagno, and Peter W. O’Hearn. Symbolic execution
with separation logic. In APLAS, volume 3780 of LNCS, pages 52–68. Springer,
2005.
7. Richard Bornat, Cristiano Calcagno, Peter W. O’Hearn, and Matthew J. Parkin-
son. Permission Accounting in Separation Logic. In POPL, pages 259–270. ACM,
2005.
8. John Boyland. Checking Interference with Fractional Permissions. In SAS, volume
2694 of LNCS, pages 55–72. Springer, 2003.
9. Arthur Charguéraud and François Pottier. Temporary Read-Only Permissions for
Separation Logic. In ESOP, volume 10201 of LNCS, pages 260–286. Springer, 2017.
10. Dave Clarke, Johan Östlund, Ilya Sergey, and Tobias Wrigstad. Ownership Types:
A Survey, pages 15–58. Springer Berlin Heidelberg, 2013.
11. Andreea Costea, Asankhaya Sharma, and Cristina David. HIPimm: verifying gran-
ular immutability guarantees. In PEPM, pages 189–194. ACM, 2014.
12. Andreea Costea, Amy Zhu, Nadia Polikarpova, and Ilya Sergey. ROBoSuSLik:
ESOP 2020 Artifact. 2020. DOI: 10.5281/zenodo.3630044.
13. Cristina David and Wei-Ngan Chin. Immutable specifications for more concise and
precise verification. In OOPSLA, pages 359–374. ACM, 2011.
14. Benjamin Delaware, Clément Pit-Claudel, Jason Gross, and Adam Chlipala. Fiat:
Deductive Synthesis of Abstract Data Types in a Proof Assistant. In POPL, pages
689–700. ACM, 2015.
15. Robert Dockins, Aquinas Hobor, and Andrew W. Appel. A fresh look at separation
algebras and share accounting. In APLAS, volume 5904 of LNCS, pages 161–177.
Springer, 2009.
16. Ronald Garcia, Éric Tanter, Roger Wolff, and Jonathan Aldrich. Foundations of
typestate-oriented programming. ACM Trans. Program. Lang. Syst., 36(4):12:1–
12:44, 2014.
17. Adrià Gascón, Ashish Tiwari, Brent Carmer, and Umang Mathur. Look for the
proof to find the program: Decorated-component-based program synthesis. In CAV,
volume 10427 of LNCS, pages 86–103. Springer, 2017.
18. Colin S. Gordon, Matthew J. Parkinson, Jared Parsons, Aleks Bromfield, and Joe
Duffy. Uniqueness and reference immutability for safe parallelism. In OOPSLA,
pages 21–40. ACM, 2012.
19. Sumit Gulwani, Susmit Jha, Ashish Tiwari, and Ramarathnam Venkatesan. Syn-
thesis of loop-free programs. In PLDI, pages 62–73. ACM, 2011.
20. Stefan Heule, K. Rustan M. Leino, Peter Müller, and Alexander J. Summers. Ab-
stract read permissions: Fractional permissions without the fractions. In VMCAI,
volume 7737 of LNCS, pages 315–334. Springer, 2013.
21. Bart Jacobs, Jan Smans, Pieter Philippaerts, Frédéric Vogels, Willem Penninckx,
and Frank Piessens. VeriFast: A Powerful, Sound, Predictable, Fast Verifier for C
and Java. In NASA Formal Methods, volume 6617 of LNCS, pages 41–55. Springer,
2011.
22. Ralf Jung, Jacques-Henri Jourdan, Robbert Krebbers, and Derek Dreyer. Rust-
Belt: Securing the foundations of the Rust programming language. PACMPL,
2(POPL):66, 2017.
23. Etienne Kneuss, Ivan Kuraj, Viktor Kuncak, and Philippe Suter. Synthesis modulo
recursive functions. In OOPSLA, pages 407–426. ACM, 2013.
24. Tristan Knoth, Di Wang, Nadia Polikarpova, and Jan Hoffmann. Resource-guided
program synthesis. In PLDI, pages 253–268. ACM, 2019.
25. Xuan Bach Le and Aquinas Hobor. Logical reasoning for disjoint permissions. In
ESOP, volume 10801 of LNCS, pages 385–414. Springer, 2018.
26. K. Rustan M. Leino and Aleksandar Milicevic. Program Extrapolation with Jen-
nisys. In OOPSLA, pages 411–430. ACM, 2012.
27. K. Rustan M. Leino and Peter Müller. A Basis for Verifying Multi-threaded Pro-
grams. In ESOP, volume 5502 of LNCS, pages 378–393. Springer, 2009.
28. K. Rustan M. Leino, Peter Müller, and Jan Smans. Verification of Concurrent
Programs with Chalice. In Foundations of Security Analysis and Design V, FOSAD
2007/2008/2009 Tutorial Lectures, volume 5705 of LNCS, pages 195–222. Springer,
2009.
29. Zohar Manna and Richard J. Waldinger. A deductive approach to program syn-
thesis. ACM Trans. Program. Lang. Syst., 2(1):90–121, 1980.
30. Peter Müller, Malte Schwerhoff, and Alexander J. Summers. Viper: A Verification
Infrastructure for Permission-Based Reasoning. In VMCAI, volume 9583 of LNCS,
pages 41–62. Springer, 2016.
31. Karl Naden, Robert Bocchino, Jonathan Aldrich, and Kevin Bierhoff. A type
system for borrowing permissions. In POPL, pages 557–570. ACM, 2012.
32. Peter W. O’Hearn, John C. Reynolds, and Hongseok Yang. Local reasoning about
programs that alter data structures. In CSL, volume 2142 of LNCS, pages 1–19.
Springer, 2001.
33. Nadia Polikarpova, Ivan Kuraj, and Armando Solar-Lezama. Program synthesis
from polymorphic refinement types. In PLDI, pages 522–538. ACM, 2016.
34. Nadia Polikarpova and Ilya Sergey. Structuring the Synthesis of Heap-
Manipulating Programs. PACMPL, 3(POPL):72:1–72:30, 2019.
35. Nadia Polikarpova, Jean Yang, Shachar Itzhaky, and Armando Solar-Lezama. En-
forcing information flow policies with type-targeted program synthesis. CoRR,
abs/1607.03445, 2016.
36. Xiaokang Qiu and Armando Solar-Lezama. Natural synthesis of provably-correct
data-structure manipulations. PACMPL, 1(OOPSLA):65:1–65:28, 2017.
37. John C. Reynolds. Separation logic: A logic for shared mutable data structures.
In LICS, pages 55–74. IEEE Computer Society, 2002.
38. Reuben N. S. Rowe and James Brotherston. Automatic cyclic termination proofs
for recursive procedures in separation logic. In CPP, pages 53–65. ACM, 2017.
39. Calvin Smith and Aws Albarghouthi. Synthesizing differentially private programs.
Proc. ACM Program. Lang., 3(ICFP):94:1–94:29, July 2019.
40. Armando Solar-Lezama. Program sketching. STTT, 15(5-6):475–495, 2013.
41. Saurabh Srivastava, Sumit Gulwani, and Jeffrey S. Foster. From program verifica-
tion to program synthesis. In POPL, pages 313–326. ACM, 2010.
42. Sven Stork, Karl Naden, Joshua Sunshine, Manuel Mohr, Alcides Fonseca, Paulo
Marques, and Jonathan Aldrich. Æminium: A Permission-Based Concurrent-by-
Default Programming Language Approach. TOPLAS, 36(1):2:1–2:42, 2014.
43. Alexander J. Summers and Peter Müller. Automating deductive verification for
weak-memory programs. In TACAS, volume 10805 of LNCS, pages 190–209.
Springer, 2018.
44. Emina Torlak and Rastislav Bodík. A lightweight symbolic virtual machine for
solver-aided host languages. In PLDI, pages 530–541. ACM, 2014.

Soundness conditions for big-step semantics

Francesco Dagnino1, Viviana Bono2, Elena Zucca1, and Mariangiola Dezani-Ciancaglini2

1 DIBRIS, University of Genova, Italy
2 Computer Science Department, University of Torino, Italy

Abstract. We propose a general proof technique to show that a pred-
icate is sound, that is, prevents stuck computation, with respect to a
big-step semantics. This result may look surprising, since in big-step se-
mantics there is no difference between non-terminating and stuck com-
putations, hence soundness cannot even be expressed. The key idea is
to define constructions yielding an extended version of a given arbitrary
big-step semantics, where the difference is made explicit. The extended
semantics are exploited in the meta-theory, notably they are necessary
to show that the proof technique works. However, they remain transpar-
ent when using the proof technique, since it consists in checking three
conditions on the original rules only, as we illustrate by several examples.

1 Introduction
The semantics of programming languages or software systems specifies, for each
program/system configuration, its final result, if any. In the case of non-existence
of a final result, there are two possibilities:
– either the computation stops with no final result, and there is no means to
compute further: stuck computation,
– or the computation never stops: non-termination.
There are two main styles to define operationally a semantic relation: the
small-step style [34,35], on top of a reduction relation representing single com-
putation steps, or directly by a set of rules as in the big-step style [28]. Within a
small-step semantics it is straightforward to make the distinction between stuck
and non-terminating computations, while a typical drawback of the big-step style
is that they are not distinguished (no judgement is derived in both cases).
For this reason, even though big-step semantics is generally more abstract,
and sometimes more intuitive to design and therefore to debug and extend, in the
literature much more effort has been devoted to study the meta-theory of small-
step semantics, providing properties, and related proof techniques. Notably, the
soundness of a type system (typing prevents stuck computation) can be proved
by progress and subject reduction (also called type preservation) [40].
Our quest is then to provide a general proof technique to prove the soundness
of a predicate with respect to an arbitrary big-step semantics. How can we
achieve this result, given that in big-step formulation soundness cannot even
be expressed, since non-termination is modelled as the absence of a final result
exactly like stuck computation? The key idea is the following:

1. We define constructions yielding an extended version of a given arbitrary big-
step semantics, where the difference between stuckness and non-termination
is made explicit. In a sense, these constructions show that the distinction
was “hidden” in the original semantics.
2. We provide a general proof technique by identifying three sufficient condi-
tions on the original big-step rules to prove soundness.

Keypoint (2)’s three sufficient conditions are local preservation, ∃-progress,
and ∀-progress. For proving the result that the three conditions actually ensure
soundness, the setting up of the extended semantics from the given one is nec-
essary, since otherwise, as said above, we could not even express the property.
However, the three conditions deal only with the original rules of the given
big-step semantics. This means that, practically, in order to use the technique
there is no need to deal with the extended semantics. This implies, in particular,
that our approach does not increase the original number of rules. Moreover, the
sufficient conditions are checked only on single rules, which makes explicit the
proof fragments typically needed in a proof of soundness. Even though this is
not exploited in this paper, this form of locality means modularity, in the sense
that adding a new rule implies adding the corresponding proof fragment only.
As an important by-product, in order to formally define and prove correct
the keypoints (1) and (2), we propose a formalisation of “what is a big-step
semantics” which captures its essential features. Moreover, we support our ap-
proach by presenting several examples, demonstrating that: on the one hand,
their soundness proof can be easily rephrased in terms of our technique, that
is, by directly reasoning on big-step rules; on the other hand, our technique is
essential when the property to be checked (for instance, the soundness of a type
system) is not preserved by intermediate computation steps, whereas it holds
for the final result. On a side note, our examples concern type systems, but the
meta-theory we present in this work holds for any predicate.
We describe now in more detail the constructions of keypoint (1). Starting
from an arbitrary big-step judgment c ⇒ r that evaluates configurations c into
results r , the first construction produces an enriched judgement c ⇒tr t where t
is a trace, that is, the (finite or infinite) sequence of all the (sub)configurations
encountered during the evaluation. In this way, by interpreting coinductively the
rules of the extended semantics, an infinite trace models divergence (whereas
no result corresponds to stuck computation). The second construction is in a
sense dual. It is the algorithmic version of the well-known technique presented
in Exercise 3.5.16 from the book [33] of adding a special result wrong explicitly
modelling stuck computations (whereas no result corresponds to divergence).
By trace semantics and wrong semantics we can express two flavours of sound-
ness, soundness-may and soundness-must, respectively, and show the correctness
of the corresponding proof technique. This achieves our original aim, and it
should be noted that we define soundness with respect to a big-step semantics
within a big-step formulation, without resorting to a small-step style (indeed,
the two extended semantics are themselves big-step).
Lastly, we consider the issue of justifying on a formal basis that the two
constructions are correct with respect to their expected meaning. For instance,
for the wrong semantics we would like to be sure that all the cases are covered.
To this end, we define a third construction, dubbed pev for “partial evalua-
tion”, which makes explicit the computations of a big-step semantics, intended
as the sequences of execution steps of the naturally associated evaluation algo-
rithm. Formally, we obtain a reduction relation on approximated proof trees,
so termination, non-termination and stuckness can be defined as usual. Then,
the correctness of traces and wrong constructions is proved by showing they are
equivalent to pev for diverging and stuck computations, respectively.
In Sect. 2 we illustrate the meta-theory on a running example. In Sect. 3 we
define the trace and wrong constructions. In Sect. 4 we express soundness in the
must and may flavours, introduce the proof technique, and prove its correctness.
In Sect. 5 we show in detail how to apply the technique to the running example,
and other significant examples. In Sect. 6 we introduce the third construction and
state that the three constructions are equivalent. Finally, in 7 and 8 we discuss
related and further work and summarise our contribution. An extended version
including an additional example, proofs omitted for lack of space, and technical
details on the pev semantics, can be found at http://arxiv.org/abs/2002.08738.

2 A meta-theory for big-step semantics

We introduce a formalisation of “what is a big-step semantics” that captures its
essential features, subsuming a large class of examples (as testified in Sect. 5).
This enables a general formal reasoning on an arbitrary big-step semantics.
A big-step semantics is a triple ⟨C, R, R⟩ where:

– C is a set of configurations c.
– R ⊆ C is a set of results r . We define judgments j ≡ c ⇒ r , meaning that
configuration c evaluates to result r . Set C (j ) = c and R(j ) = r .
– R is a set of rules ρ of shape

      j1 . . . jn   jn+1
      ------------------     also written in inline format: rule(j1 . . . jn, jn+1, c)
        c ⇒ R(jn+1)

  with c ∈ C\R, where j1 . . . jn are the dependencies and jn+1 is the continu-
  ation. Set C(ρ) = c and, for i ∈ 1..n + 1, C(ρ, i) = C(ji) and R(ρ, i) = R(ji).
– For each result r ∈ R, we implicitly assume a single axiom with no premises
  and conclusion r ⇒ r. Hence, the only derivable judgment for r is r ⇒ r,
  which we will call a trivial judgment.

We will use the inline format, more concise and manageable, for the development
of the meta-theory, e.g., in constructions.
A rule corresponds to the following evaluation process for a non-result con-
figuration: first, dependencies are evaluated in the given order, then the contin-
uation is evaluated and its result is returned as result of the entire computation.
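To make the definition concrete, the following is a minimal sketch, in Haskell, of how the triple ⟨C, R, R⟩ and the inline rule format could be represented. All names here (Judgment, Rule, Semantics, and so on) are ours and purely illustrative; they are not part of the paper's formal development, which is set-theoretic.

  -- Minimal illustrative encoding of a big-step semantics <C, R, R>.
  -- The (generally infinite) set of rules is presented as a function
  -- enumerating the rules whose conclusion is a given configuration.
  data Judgment c r = Judgment { config :: c, result :: r }

  -- A rule in inline format rule(j1 ... jn, j(n+1), c): a sequence of
  -- dependencies, a continuation, and a non-result configuration.
  data Rule c r = Rule
    { deps         :: [Judgment c r]   -- j1 ... jn
    , continuation :: Judgment c r     -- j(n+1)
    , conclusion   :: c                -- c, with c in C\R
    }

  data Semantics c r = Semantics
    { intoConfig :: r -> c             -- R is a subset of C
    , asResult   :: c -> Maybe r       -- partial inverse of the inclusion
    , rulesFor   :: c -> [Rule c r]    -- all rules with conclusion c
    }

The implicit axioms r ⇒ r need not be stored: they can be recovered from asResult, exactly as in the definition above.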

e ::= x | v | e1 e2 | succ e | e1 ⊕ e2    expression
v ::= n | λx.e    value

(val)  ─────
       v ⇒ v

(app)  e1 ⇒ λx.e    e2 ⇒ v2    e[v2/x] ⇒ v
       ─────────────────────────────────────
       e1 e2 ⇒ v

(succ)  e ⇒ n
        ────────────────
        succ e ⇒ n + 1

(choice)  ei ⇒ v
          ─────────────   i = 1, 2
          e1 ⊕ e2 ⇒ v

(app)    rule(e1 ⇒ λx.e  e2 ⇒ v2, e[v2/x] ⇒ v, e1 e2)
(succ)   rule(e ⇒ n, n + 1 ⇒ n + 1, succ e)
(choice) rule(ε, ei ⇒ v, e1 ⊕ e2)   i = 1, 2

Fig. 1. Example of big-step semantics

Rules as defined above specify an inference system [1,30], whose inductive interpretation is, as usual, the semantic relation. However, they carry slightly more structure with respect to standard inference rules. Notably, premises are a sequence rather than a set, and the last premise plays a special role. Such additional structure does not affect the semantic relation defined by the rules, but allows abstract reasoning about an arbitrary big-step semantics; in particular, it is relevant for defining the three constructions. In the following, we will write ⊢R c ⇒ r when the judgment c ⇒ r is derivable in R.
As customary, the (infinite) set of rules R is described by a finite set of meta-
rules, each one with a finite number of premises. As a consequence, the number of
premises of rules is not only finite but bounded. Since we have no notion of meta-
rule, we model this feature (relevant in the following) as an explicit assumption:
BP: there exists b ∈ N such that, for each ρ ≡ rule(j1 . . . jn, jn+1, c), n < b.
We end this section illustrating the above definitions and conditions by a simple
example: a λ-calculus with natural constants, successor and non-deterministic
choice shown in Fig. 1. We present this example as an instance of our definition:

– Configurations and results are expressions and values, respectively (in general, configurations may include additional components, see Sect. 5.2).
– To have the set of (meta-)rules in our required shape, abbreviated in inline format in the bottom section of the figure:
  • axiom (val) can be omitted (it is implicitly assumed)
  • in (app) we consider premises as a sequence rather than a set (the third premise is the continuation)
  • in (succ), which has no continuation, we add a dummy continuation
  • on the contrary, in (choice) there is only the continuation (dependencies are the empty sequence, denoted ε in the inline format).

Note that (app) corresponds to the standard left-to-right evaluation order. We could have chosen the right-to-left order instead:

  (app-r) rule(e2 ⇒ v2  e1 ⇒ λx.e, e[v2/x] ⇒ v, e1 e2)
or even opt for a non-deterministic approach by taking both rules (app) and (app-r). As said above, these different choices do not affect the semantic relation
c ⇒ r defined by the inference system, which is always the same. However, they
will affect the way the extended semantics distinguishing stuck computation and
non-termination is constructed. Indeed, if the evaluation of e1 and e2 is stuck
and non-terminating, respectively, we should obtain stuck computation with rule
(app) and non-termination with rule (app-r).
In summary, to see a typical big-step semantics as an instance of our defi-
nition, it is enough to assume an order (or more than one) on premises, make
implicit the axiom for results, and add a dummy continuation when needed. In
the examples (Sect. 5), we will assume a left-to-right order on premises, and
omit dummy continuations to keep a more familiar style. In the technical part
(Sect. 3, Sect. 4 and Sect. 6) we will adopt the inline format.
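To connect this format with an executable reading, here is an evaluator for the calculus of Fig. 1, written directly from the rules; it is our own illustration, not part of the formal development. Premises are evaluated left to right as in (app), the list monad models the non-determinism of (choice), and substitution assumes that only closed values are substituted, so no variable capture can occur.

  -- Illustrative evaluator for the calculus of Fig. 1 (our sketch).
  data Expr = Var String | Nat Integer | Lam String Expr
            | App Expr Expr | Succ Expr | Choice Expr Expr
    deriving Show

  -- Substitution; safe here because only closed values are substituted.
  subst :: String -> Expr -> Expr -> Expr
  subst x v e = case e of
    Var y      -> if x == y then v else Var y
    Nat n      -> Nat n
    Lam y b    -> if x == y then Lam y b else Lam y (subst x v b)
    App a b    -> App (subst x v a) (subst x v b)
    Succ a     -> Succ (subst x v a)
    Choice a b -> Choice (subst x v a) (subst x v b)

  -- The list collects the results of all converging computations.
  eval :: Expr -> [Expr]
  eval (Var _)      = []                    -- stuck: no rule applies
  eval v@(Nat _)    = [v]                   -- (val)
  eval v@(Lam _ _)  = [v]                   -- (val)
  eval (App e1 e2)  = do                    -- (app), left to right
    Lam x b <- eval e1                      -- fails if e1 is not a lambda
    v2      <- eval e2
    eval (subst x v2 b)
  eval (Succ e)     = [ Nat (n + 1) | Nat n <- eval e ]   -- (succ)
  eval (Choice a b) = eval a ++ eval b                    -- (choice), i = 1, 2

For instance, eval (App (Lam "x" (Var "x")) (Nat 2)) yields [Nat 2], while on Ω the function loops without producing anything, and on a stuck program such as Succ (Lam "x" (Var "x")) it returns []. The evaluator thus conflates stuckness and divergence, just like the inductive semantic relation, and this is exactly what the two constructions of the next section disentangle.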

3 Extended semantics
In the following, we assume a big-step semantics ⟨C, R, R⟩ and describe two constructions which make the distinction between non-termination and stuck computation explicit. In both cases, the approach is based on well-known ideas;
the novel contribution is that, thanks to the meta-theory in Sect. 2, we provide
a general construction working on an arbitrary big-step semantics.

3.1 Traces
We denote by C∗, Cω, and C∞ = C∗ ∪ Cω, respectively, the sets of finite, infinite, and possibly infinite traces, that is, sequences of configurations. We write t · t′ for the concatenation of t ∈ C∗ with t′ ∈ C∞.
We derive, from the judgement c ⇒ r, an enriched big-step judgement c ⇒tr t with t ∈ C∞. Intuitively, t keeps track of all the configurations visited during the
evaluation, starting from c itself. To define the trace semantics, we construct,
starting from R, a new set of rules Rtr , which are of two kinds:
trace introduction These rules enrich the standard semantics by finite traces: for each ρ ≡ rule(j1 . . . jn, jn+1, c) in R, and finite traces t1, . . . , tn+1 ∈ C∗, we add the rule

    C(j1) ⇒tr t1 · R(j1)   . . .   C(jn+1) ⇒tr tn+1 · R(jn+1)
    ───────────────────────────────────────────────────────────
    c ⇒tr c · t1 · R(j1) · . . . · tn+1 · R(jn+1)

  We denote this rule by trace(ρ, t1, . . . , tn+1), to highlight the relationship with the original rule ρ. We also add one axiom

    ─────────
    r ⇒tr r

  for each result r. Such rules derive judgements c ⇒tr t with t ∈ C∗, for convergent computations.
divergence propagation These rules propagate divergence, that is, if a (sub)configuration in the premise of a rule diverges, then the subsequent premises are ignored and the configuration in the conclusion diverges as well: for each ρ ≡ rule(j1 . . . jn, jn+1, c) in R, index i ∈ 1..n + 1, finite traces t1, . . . , ti−1 ∈ C∗, and infinite trace t, we add the rule:

    C(j1) ⇒tr t1 · R(j1)   . . .   C(ji−1) ⇒tr ti−1 · R(ji−1)   C(ji) ⇒tr t
    ─────────────────────────────────────────────────────────────────────────
    c ⇒tr c · t1 · R(j1) · . . . · ti−1 · R(ji−1) · t
  We denote this rule by prop(ρ, i, t1, . . . , ti−1, t) to highlight the relationship with the original rule ρ. These rules derive judgements c ⇒tr t with t ∈ Cω, modelling diverging computations.

The inference system Rtr must be interpreted coinductively, to properly model diverging computations. Indeed, since there is no axiom introducing an infinite trace, they can be derived only by an infinite proof tree. We write ⊢Rtr c ⇒tr t when the judgment c ⇒tr t is derivable in Rtr.

We show in Fig. 2 the rules obtained starting from meta-rule (app) of the example (for other meta-rules the outcome is analogous).

(app-trace)   e1 ⇒tr t1 · λx.e    e2 ⇒tr t2 · v2    e[v2/x] ⇒tr t · v
              ──────────────────────────────────────────────────────────   t1, t2, t ∈ C∗
              e1 e2 ⇒tr e1 e2 · t1 · λx.e · t2 · v2 · t · v

(div-app-1)   e1 ⇒tr t
              ───────────────────   t ∈ Cω
              e1 e2 ⇒tr e1 e2 · t

(div-app-2)   e1 ⇒tr t1 · λx.e    e2 ⇒tr t
              ────────────────────────────────   t1 ∈ C∗, t ∈ Cω
              e1 e2 ⇒tr e1 e2 · t1 · λx.e · t

(div-app-3)   e1 ⇒tr t1 · λx.e    e2 ⇒tr t2 · v2    e[v2/x] ⇒tr t
              ─────────────────────────────────────────────────────   t1, t2 ∈ C∗, t ∈ Cω
              e1 e2 ⇒tr e1 e2 · t1 · λx.e · t2 · v2 · t

Fig. 2. Trace semantics for application
For instance, set Ω = ω ω = (λx.x x)(λx.x x), and let tΩ be the infinite trace Ω · ω · ω · Ω · ω · ω · . . .; it is easy to see that the judgment Ω ⇒tr tΩ can be derived by the following infinite tree (equivalent expressions are recalled after ≡ to help the reader):

                                                        ⋮
                                            ──────────────────────────── (div-app-3)
  ω ⇒tr ω (trace-val)   ω ⇒tr ω (trace-val)   ω ω ≡ (x x)[ω/x] ⇒tr tΩ
  ────────────────────────────────────────────────────────────────────── (div-app-3)
  Ω ⇒tr Ω · ω · ω · tΩ ≡ tΩ

Note that only the judgment Ω ⇒tr tΩ can be derived, that is, the trace semantics of Ω is uniquely determined to be tΩ, since the infinite proof tree forces the equation tΩ = Ω · ω · ω · tΩ. This example is a cyclic proof, but there are divergent computations with no circular derivation.
The trace construction is conservative with respect to the original semantics,
that is, converging computations are not affected.
Theorem 1. ⊢Rtr c ⇒tr t · r for some t ∈ C∗ iff ⊢R c ⇒ r.
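Haskell's lazy lists give a faithful operational reading of the trace construction: the trace is produced incrementally, and when a premise diverges, the concatenation below never reaches the later premises, mirroring the divergence propagation rules. The following sketch is ours; it reuses Expr and subst from the evaluator sketch of Sect. 2 and makes the semantics deterministic by always taking the left branch of ⊕.

  -- Trace semantics as a lazily produced list of visited configurations.
  -- The trace is infinite exactly for diverging computations; a stuck
  -- computation raises a runtime error, matching the fact that no trace
  -- judgment is derivable for it.
  evalTr :: Expr -> [Expr]
  evalTr v@(Nat _)   = [v]
  evalTr v@(Lam _ _) = [v]
  evalTr (Var _)     = error "stuck: free variable"
  evalTr e@(App e1 e2) = e : t1 ++ t2 ++ t3
    where
      t1 = evalTr e1                -- ends with R(j1) when finite
      t2 = evalTr e2
      t3 = case last t1 of          -- demanded only after t1 is exhausted
             Lam x b -> evalTr (subst x (last t2) b)
             _       -> error "stuck: applying a non-function"
  evalTr e@(Succ e1) = e : t ++ [res]
    where
      t   = evalTr e1
      res = case last t of
              Nat n -> Nat (n + 1)
              _     -> error "stuck: succ of a non-numeral"
  evalTr e@(Choice a _) = e : evalTr a   -- left branch, for determinism

On Ω, take 6 (evalTr omega), with omega = App w w and w = Lam "x" (App (Var "x") (Var "x")), returns the first six configurations of the infinite trace tΩ, computed productively even though the full trace never ends.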

3.2 Wrong

A well-known technique [33] (Exercise 3.5.16) to distinguish between stuck and diverging computations, in a sense “dual” to the previous one, is to add a special result wrong, so that c ⇒ wrong means that the evaluation of c goes stuck.
In this case, defining an “automatic” version of the construction, starting from ⟨C, R, R⟩, is a non-trivial problem. Our solution is based on defining a relation on rules, modelling equality up to a certain index i, also used for other aims
in the following. Consider ρ ≡ rule(j1 . . . jn, jn+1, c), ρ′ ≡ rule(j′1 . . . j′m, j′m+1, c′), and an index i ∈ 1..min(n + 1, m + 1); then ρ ∼i ρ′ if
– c = c′
– for all k < i, jk = j′k
– C(ji) = C(j′i)
Intuitively, this means that rules ρ and ρ′ model the same computation until the i-th premise. Using this relation, we derive, from the judgment c ⇒ r, an enriched big-step judgement c ⇒ rwr where rwr ∈ R ∪ {wrong}, defined by a set of rules Rwr containing all rules in R and two other kinds of rules:
wrong introduction These rules derive wrong whenever the (sub)configuration in a premise of a rule reduces to a result which is not admitted in such (or any equivalent) rule: for each ρ ≡ rule(j1 . . . jn, jn+1, c) in R, index i ∈ 1..n + 1, and result r ∈ R, if for all rules ρ′ such that ρ ∼i ρ′ we have R(ρ′, i) ≠ r, then we add the rule wrong(ρ, i, r) as follows:

    j1 . . . ji−1   C(ji) ⇒ r
    ──────────────────────────
    c ⇒ wrong

  We also add an axiom

    ──────────
    c ⇒ wrong

  for each configuration c which is not the conclusion of any rule.
wrong propagation These rules propagate wrong analogously to those for divergence propagation: for each ρ ≡ rule(j1 . . . jn, jn+1, c) in R, and index i ∈ 1..n + 1, we add the rule prop(ρ, i, wrong) as follows:

    j1 . . . ji−1   C(ji) ⇒ wrong
    ──────────────────────────────
    c ⇒ wrong
We write ⊢Rwr c ⇒ rwr when the judgment c ⇒ rwr is derivable in Rwr.
We show in Fig. 3 the meta-rules for wrong introduction and propagation
constructed starting from those for application and successor. For instance, rule
(wrong-app) is introduced since in the original semantics there is rule (app) with
e1 e2 in the consequence and e1 in the first premise, but there is no equivalent
rule (that is, with e1 e2 in the consequence and e1 in the first premise) such that
the result in the first premise is n.
The wrong construction is conservative as well.
Theorem 2. ⊢Rwr c ⇒ r iff ⊢R c ⇒ r.

(wrong-app)   e1 ⇒ n
              ───────────────
              e1 e2 ⇒ wrong

(wrong-succ)  e ⇒ λx.e′
              ────────────────
              succ e ⇒ wrong

(prop-app-1)  e1 ⇒ wrong
              ───────────────
              e1 e2 ⇒ wrong

(prop-app-2)  e1 ⇒ λx.e    e2 ⇒ wrong
              ─────────────────────────
              e1 e2 ⇒ wrong

(prop-app-3)  e1 ⇒ λx.e    e2 ⇒ v2    e[v2/x] ⇒ wrong
              ─────────────────────────────────────────
              e1 e2 ⇒ wrong

(prop-succ)   e ⇒ wrong
              ────────────────
              succ e ⇒ wrong

Fig. 3. Semantics with wrong for application and successor
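The wrong construction, too, has a direct executable reading. In the sketch below (ours, reusing Expr and subst from the evaluator sketch of Sect. 2, again with deterministic left choice), a stuck configuration returns Wrong, a converging one returns Val v, and a diverging one makes the evaluator itself loop, so that no result at all is produced: wrong is observable and divergence is not, dually to the trace evaluator.

  -- Wrong construction read as an evaluator (our sketch).
  data Res = Val Expr | Wrong
    deriving Show

  evalW :: Expr -> Res
  evalW v@(Nat _)   = Val v
  evalW v@(Lam _ _) = Val v
  evalW (Var _)     = Wrong                    -- wrong introduction axiom
  evalW (App e1 e2) =
    case evalW e1 of
      Val (Lam x b) ->
        case evalW e2 of
          Val v2 -> evalW (subst x v2 b)       -- may yield Wrong: (prop-app-3)
          Wrong  -> Wrong                      -- (prop-app-2)
      Val _         -> Wrong                   -- (wrong-app)
      Wrong         -> Wrong                   -- (prop-app-1)
  evalW (Succ e) =
    case evalW e of
      Val (Nat n) -> Val (Nat (n + 1))
      Val _       -> Wrong                     -- (wrong-succ)
      Wrong       -> Wrong                     -- (prop-succ)
  evalW (Choice a _) = evalW a                 -- left branch, for determinism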



4 Expressing and proving soundness


A predicate (for instance, a typing judgment) is sound when, informally, a pro-
gram satisfying the predicate (e.g., a well-typed program) cannot go wrong, fol-
lowing Robin Milner’s slogan [31]. In small-step style, as firstly formulated in [40],
this is naturally expressed as follows: well-typed programs never reduce to terms
which neither are values, nor can be further reduced (called stuck terms). The
standard technique to ensure soundness is by subject reduction (well-typedness
is preserved by reduction) and progress (a well-typed term is not stuck).
We discuss how soundness can be expressed for the two approaches previously
presented and we introduce sufficient conditions. In other words, we provide a
proof technique to show the soundness of a predicate with respect to a big-step
semantics. As mentioned in the Introduction, the extended semantics is only
needed to prove the correctness of the technique, whereas to apply the technique
for a given big-step semantics it is enough to reason on the original rules.

4.1 Expressing soundness


In the following, we assume a big-step semantics ⟨C, R, R⟩, and an indexed predicate on configurations, that is, a family Π = (Πι)ι∈I, for I a set of indexes, with Πι ⊆ C. A representative case is that, as in the examples of Sect. 5, the predicate is a typing judgment and the indexes are types; however, the proof technique could be applied to other kinds of predicates. When there is no ambiguity, we also denote by Π the corresponding predicate ⋃ι∈I Πι on C (e.g., to be well-typed with an arbitrary type).
To discuss how to express soundness of Π, first of all note that, in the non-
deterministic case (that is, there is possibly more than one computation for a
configuration), we can distinguish two flavours of soundness [21]:
soundness-must (or simply soundness): no computation can be stuck
soundness-may: at least one computation is not stuck
Soundness-must is the standard soundness in small-step semantics, and can be expressed in the wrong extension as follows:

soundness-must (wrong) If c ∈ Π, then ⊬Rwr c ⇒ wrong (the judgment c ⇒ wrong is not derivable in Rwr)

Instead, soundness-must cannot be expressed in the trace extension. Indeed, stuck computations are not explicitly modelled. Conversely, soundness-may can be expressed in the trace extension as follows:

soundness-may (traces) If c ∈ Π, then there is t such that ⊢Rtr c ⇒tr t

whereas it cannot be expressed in the wrong semantics, since diverging computations are not modelled.
Of course soundness-must and soundness-may coincide in the deterministic
case. Finally, note that indexes (e.g., the specific types of configurations) do
not play any role in the above statements. However, they are relevant in the
notion of strong soundness, introduced by [40]. Strong soundness holds if, for
configurations satisfying Πι (e.g., having a given type), computation cannot be
stuck, and moreover, produces a result satisfying Πι (e.g., of the same type)
if terminating. Note that soundness alone does not even guarantee to obtain a
result satisfying Π (e.g., a well-typed result). The three conditions introduced
in the following section actually ensure strong soundness.
In Sect. 4.2 we provide sufficient conditions for soundness-must, showing that
they actually ensure soundness in the wrong semantics (Theorem 3). Then, in
Sect. 4.3, we provide (weaker) sufficient conditions for soundness-may, and show
that they actually ensure soundness-may in the trace semantics (Theorem 4).

4.2 Conditions ensuring soundness-must


The three conditions which ensure the soundness-must property are local preser-
vation, ∃-progress, and ∀-progress. The names suggest that the former plays the
role of the type preservation (subject reduction) property, and the latter two
of the progress property in small-step semantics. However, as we will see, the
correspondence is only rough, since the reasoning here is different.
Considering the first condition more closely, we use the name preservation
rather than type preservation since, as already mentioned, the proof technique
can be applied to arbitrary predicates. More importantly, local means that the
condition is on single rules rather than on the semantic relation as a whole, as
standard subject reduction. The same holds for the other two conditions.

Definition 1 (S1: Local Preservation). For each ρ≡rule(j1 . . . jn , jn+1 , c), if


c∈Πι , then there exist ι1 , . . . , ιn+1 ∈ I , with ιn+1 =ι, such that, for all k ∈ 1..n + 1:

if, for all h < k, R(jh ) ∈ Πιh , then C (jk ) ∈ Πιk .

Thinking of the paradigmatic case where the indexes are types, for each rule ρ, if the configuration c in the consequence has type ι, we have to find types ι1, . . . , ιn+1 which can be assigned to (the configurations in) the premises, in particular the same type as c for the continuation. More precisely, we start by finding type ι1, and successively find the type ιk for (the configuration in) the k-th premise assuming that the results of all the previous premises have the expected types. Indeed, if all such previous premises are derivable, then the expected type should be preserved by their results; if some premise is not derivable, the considered rule is “useless”. For instance, considering (an instantiation of) meta-rule (app) rule(e1 ⇒ λx.e  e2 ⇒ v2, e[v2/x] ⇒ v, e1 e2) in Sect. 2, we prove that e[v2/x] has the type T of e1 e2 under the assumption that λx.e has type T′ → T, and v2 has type T′ (see the proof example in Sect. 5.1 for more details).
A counter-example to condition S1 is discussed at the beginning of Sect. 5.3.
The following lemma states that local preservation actually implies preser-
vation of the semantic relation as a whole.

Lemma 1 (Preservation). Let R and Π satisfy condition S1. If R


c ⇒ r
and c ∈ Πι , then r ∈ Πι .
178 F. Dagnino et al.

Proof. The proof is by a double induction. We denote by RH and IH the first and the second induction hypothesis, respectively. The first induction is on big-step rules. Axioms have conclusion r ⇒ r, hence the thesis holds since r ∈ Πι by hypothesis. Other rules have shape rule(j1 . . . jn, jn+1, c) with c ∈ Πι. We prove by complete induction on k ∈ 1..n + 1 that C(jk) ∈ Πιk, for all k ∈ 1..n + 1 and for some ι1, . . . , ιn+1 ∈ I. By S1, there are ι1, . . . , ιn+1 ∈ I and C(j1) ∈ Πι1. For k > 1, by IH we know that C(jh) ∈ Πιh, for all h < k. Then, by RH, we get that R(jh) ∈ Πιh. Moreover, by S1, C(jk) ∈ Πιk, as needed. In particular, we have just proved that C(jn+1) ∈ Πιn+1 and, since by S1 ιn+1 = ι, we get C(jn+1) ∈ Πι. Then, by RH, we conclude that r = R(jn+1) ∈ Πι, as needed.

The following proposition is a form of local preservation where indexes (e.g., specific types) are not relevant, simpler to use in the proofs of Theorems 3 and 4.

Proposition 1. Let R and Π satisfy condition S1. For each rule(j1 . . . jn, jn+1, c) and k ∈ 1..n + 1, if c ∈ Π and, for all h < k, ⊢R jh, then C(jk) ∈ Π.

The second condition, named ∃-progress, ensures that, for configurations sat-
isfying the predicate Π (e.g., well-typed), we can start constructing a proof tree.

Definition 2 (S2: ∃-progress). For each c ∈ Π\R, C(ρ) = c for some rule ρ.

The third condition, named ∀-progress, ensures that, for configurations sat-
isfying Π, we can continue constructing the proof tree. This condition uses the
notion of rules equivalent up-to an index introduced at the beginning of Sect. 3.2.

Definition 3 (S3: ∀-progress). For each ρ ≡ rule(j1 . . . jn, jn+1, c), if c ∈ Π, then, for each k ∈ 1..n + 1: if, for all h < k, ⊢R jh and ⊢R C(jk) ⇒ r, for some r ∈ R, then there is a rule ρ′ ∼k ρ such that R(ρ′, k) = r.

We have to check, for each rule ρ, the following: if the configuration c in the consequence satisfies the predicate (e.g., is well-typed), then, for each k, if the configuration in premise k evaluates to some result r (that is, ⊢R C(jk) ⇒ r), then there is a rule (ρ itself or another rule with the same configuration in the consequence and the first k − 1 premises) with such judgment as k-th premise. This check can be done under the assumption that all the previous premises are derivable. For instance, consider again (an instantiation of) the meta-rule (app) rule(e1 ⇒ λx.e  e2 ⇒ v2, e[v2/x] ⇒ v, e1 e2). Assuming that e1 evaluates to some v1, we have to check that there is a rule with first premise e1 ⇒ v1, in practice, that v1 is a λ-abstraction; in general, checking S3 for a (meta-)rule amounts to showing that (sub)configurations in the premises evaluate to results with the required shape (see also the proof example in Sect. 5.1).

Soundness-must in wrong semantics Recall that Rwr is the extension of R with wrong (Sect. 3.2). We prove the claim of soundness-must with respect to Rwr.

Theorem 3. Let R and Π satisfy conditions S1, S2 and S3. If c ∈ Π, then ⊬Rwr c ⇒ wrong.
Proof. To prove the statement, we assume ⊢Rwr c ⇒ wrong and look for a contradiction. The proof is by induction on the derivation of c ⇒ wrong.
If the last applied rule is an axiom, then, by construction, there is no rule ρ ∈ R such that C(ρ) = c, and this violates condition S2, since c ∈ Π.
If the last applied rule is wrong(ρ, i, r), with ρ ≡ rule(j1 . . . jn, jn+1, c), then, by hypothesis, for all k < i, ⊢Rwr jk, and ⊢Rwr C(ji) ⇒ r, and these judgments can also be derived in R by conservativity (Theorem 2). Furthermore, by construction of this rule, we know that there is no other rule ρ′ ∼i ρ such that R(ρ′, i) = r, and this violates condition S3, since c ∈ Π.
If the last applied rule is prop(ρ, i, wrong), with ρ ≡ rule(j1 . . . jn, jn+1, c), then, by hypothesis, for all k < i, ⊢Rwr jk, and these judgments can also be derived in R by conservativity. Then, by Prop. 1 (which requires condition S1), since c ∈ Π, we have C(ji) ∈ Π, hence we get the thesis by induction hypothesis.
Sect. 5.1 ends with examples not satisfying properties S2 and S3.

4.3 Conditions ensuring soundness-may


As discussed in Sect. 4.1, in the trace semantics we can only express a weaker
form of soundness: at least one computation is not stuck (soundness-may). As
the reader can expect, to ensure this property weaker sufficient conditions are
enough: namely, condition S1, and another condition named progress-may and
defined below.
We write ⊬R c ⇒ if c does not converge (there is no r such that ⊢R c ⇒ r).

Definition 4 (S4: progress-may). For each c ∈ Π\R, there is ρ ≡ rule(j1 . . . jn, jn+1, c) such that: if there is a (first) k ∈ 1..n + 1 such that ⊬R jk and, for all h < k, ⊢R jh, then ⊬R C(jk) ⇒.
This condition can be informally understood as follows: we have to show that there is either a finite or an infinite computation for c. If we find a rule where all premises are derivable (no such k), then there is a finite computation. Otherwise, c
does not converge. In this case, we should find a rule where the configuration in
the first non-derivable premise k does not converge as well. Indeed, by coinduc-
tive reasoning (use of Lemma 2 below), we obtain that c diverges. The following
proposition states that this condition is indeed a weakening of S2 and S3.
Proposition 2. Conditions S2 and S3 imply condition S4.

Soundness-may in trace semantics Recall that Rtr is the extension of R with traces, defined in Sect. 3.1, where judgements have shape c ⇒tr t, with t ∈ C∞.
The following lemma provides a proof principle useful to coinductively show
that a property ensures the existence of an infinite trace, in particular to show
Theorem 4. It is a slight variation of an analogous principle presented in [8].
Lemma 2. Let S ⊆ C be a set. If, for all c ∈ S, there are ρ ≡ rule(j1 . . . jn, jn+1, c) and k ∈ 1..n + 1 such that
1. for all h < k, ⊢R jh, and
2. C(jk) ∈ S
then, for all c ∈ S, there is t ∈ Cω such that ⊢Rtr c ⇒tr t.
Theorem 4. Let R and Π satisfy conditions S1 and S4. If c ∈ Π, then there is t such that ⊢Rtr c ⇒tr t.
Proof. First note that, thanks to Theorem 1, the statement is equivalent to the following: if c ∈ Π and ⊬R c ⇒, then there is t ∈ Cω such that ⊢Rtr c ⇒tr t.
Then, the proof follows from Lemma 2. We define S = {c | c ∈ Π and ⊬R c ⇒}, and show that, for all c ∈ S, there are ρ ≡ rule(j1 . . . jn, jn+1, c) and k ∈ 1..n + 1 such that, for all h < k, ⊢R jh, and C(jk) ∈ S.
Consider c ∈ S; then, by S4, there is ρ ≡ rule(j1 . . . jn, jn+1, c). By definition of S, we have ⊬R c ⇒, hence there exists a (first) k ∈ 1..n + 1 such that ⊬R jk, since, otherwise, we would have ⊢R c ⇒ R(jn+1). Then, since k is the first index with such property, for all h < k, we have ⊢R jh, hence, again by condition S4, we have that ⊬R C(jk) ⇒. Finally, since for all h < k we have ⊢R jh, by Prop. 1, we get C(jk) ∈ Π, hence C(jk) ∈ S, as needed.

5 Examples
Sect. 5.1 explains in detail how a typical soundness proof can be rephrased in
terms of our technique, by reasoning directly on big-step rules. Sect. 5.2 shows
a case where this is advantageous, since the property to be checked is not pre-
served by intermediate computation steps, whereas it holds for the final result.
Sect. 5.3 considers a more sophisticated type system, with intersection and union
types. Finally, Sect. 5.4 shows another example where subject reduction is not
preserved, whereas soundness can be proved with our technique. This example
is intended as a preliminary step towards a more challenging case.

5.1 Simply-typed λ-calculus with recursive types


As a first example, we take the λ-calculus with natural constants, successor, and
choice used in Sect. 2 (Fig. 1). We consider a standard simply-typed version with
recursive types, obtained by interpreting the production in Fig. 4 coinductively.
Introducing recursive types makes the calculus non-normalising and permits writing interesting programs such as Ω (see Sect. 3.1).
The typing rules are recalled in Fig. 4. Type environments, written Γ, are finite maps from variables to types, and Γ{T/x} denotes the map which returns T on x and coincides with Γ elsewhere. We write ⊢ e : T for ∅ ⊢ e : T.
Let R1 be the big-step semantics defined in Fig. 1, and let Π1T(e) hold if ⊢ e : T, for T defined in Fig. 4. To prove the three conditions S1, S2 and S3 of Sect. 4.2, we need lemmas of inversion, substitution and canonical forms, as in the standard technique.

T ::= Nat | T1 → T2    type

(t-var)  ──────────   Γ(x) = T          (t-const)  ───────────
         Γ ⊢ x : T                                 Γ ⊢ n : Nat

(t-abs)  Γ{T′/x} ⊢ e : T                (t-app)  Γ ⊢ e1 : T′ → T    Γ ⊢ e2 : T′
         ──────────────────                      ───────────────────────────────
         Γ ⊢ λx.e : T′ → T                       Γ ⊢ e1 e2 : T

(t-succ)  Γ ⊢ e : Nat                   (t-choice)  Γ ⊢ e1 : T    Γ ⊢ e2 : T
          ────────────────                          ─────────────────────────
          Γ ⊢ succ e : Nat                          Γ ⊢ e1 ⊕ e2 : T

Fig. 4. λ-calculus: type system

Lemma 3 (Inversion).

1. If Γ ⊢ x : T, then Γ(x) = T.
2. If Γ ⊢ n : T, then T = Nat.
3. If Γ ⊢ λx.e : T, then T = T1 → T2 and Γ{T1/x} ⊢ e : T2.
4. If Γ ⊢ e1 e2 : T, then Γ ⊢ e1 : T′ → T, and Γ ⊢ e2 : T′.
5. If Γ ⊢ succ e : T, then T = Nat and Γ ⊢ e : Nat.
6. If Γ ⊢ e1 ⊕ e2 : T, then Γ ⊢ ei : T with i ∈ 1, 2.

Lemma 4 (Substitution). If Γ{T′/x} ⊢ e : T and Γ ⊢ e′ : T′, then Γ ⊢ e[e′/x] : T.

Lemma 5 (Canonical Forms).

1. If ⊢ v : T′ → T, then v = λx.e.
2. If ⊢ v : Nat, then v = n.
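Before turning to the conditions, note that the predicate Π1 is effectively checkable on the finite-type fragment. The following checker for the rules of Fig. 4 is our own sketch: it deliberately leaves out recursive types (which would require coinductive type equality), and, to stay syntax-directed, it works on a variant of the syntax where λ-abstractions carry an explicit domain annotation, unlike the calculus itself.

  -- Checker for the finite-type fragment of Fig. 4 (our sketch).
  data Ty = TNat | TArr Ty Ty
    deriving (Eq, Show)

  -- Lambdas are annotated with their domain, so that no inference is needed.
  data TExpr = TVar String | TNum Integer | TLam String Ty TExpr
             | TApp TExpr TExpr | TSucc TExpr | TChoice TExpr TExpr

  type TEnv = [(String, Ty)]

  typeOf :: TEnv -> TExpr -> Maybe Ty
  typeOf g (TVar x)     = lookup x g                        -- (t-var)
  typeOf _ (TNum _)     = Just TNat                         -- (t-const)
  typeOf g (TLam x t b) = TArr t <$> typeOf ((x, t) : g) b  -- (t-abs)
  typeOf g (TApp e1 e2) = do                                -- (t-app)
    TArr t1 t2 <- typeOf g e1
    t1'        <- typeOf g e2
    if t1 == t1' then Just t2 else Nothing
  typeOf g (TSucc e)    = do                                -- (t-succ)
    TNat <- typeOf g e
    Just TNat
  typeOf g (TChoice a b) = do                               -- (t-choice)
    ta <- typeOf g a
    tb <- typeOf g b
    if ta == tb then Just ta else Nothing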

Theorem 5 (Soundness). The big-step semantics R1 and the indexed predicate Π1 satisfy the conditions S1, S2 and S3 of Sect. 4.2.

Since the aim of this first example is to illustrate the proof technique, we
provide a proof where we explain the reasoning in detail.
Proof of S1. We should prove this condition for each (instantiation of meta-)rule.
(app): Assume that ⊢ e1 e2 : T holds. We have to find types for the premises, notably T for the last one. We proceed as follows:

1. First premise: by Lemma 3 (4), ⊢ e1 : T′ → T.
2. Second premise: again by Lemma 3 (4), ⊢ e2 : T′ (without needing the assumption ⊢ λx.e : T′ → T).
3. Third premise: ⊢ e[v2/x] : T should hold (assuming ⊢ λx.e : T′ → T, ⊢ v2 : T′). Since ⊢ λx.e : T′ → T, by Lemma 3 (3) we have x:T′ ⊢ e : T, so by Lemma 4 and ⊢ v2 : T′ we have ⊢ e[v2/x] : T.
(succ): This rule has an implicit continuation n + 1 ⇒ n + 1. Assume that ⊢ succ e : T holds. By Lemma 3 (5), T = Nat, and ⊢ e : Nat, hence we find Nat as type for the first premise. Moreover, ⊢ n + 1 : Nat holds by rule (t-const).
(choice): Assume that ⊢ e1 ⊕ e2 : T holds. By Lemma 3 (6), we have ⊢ ei : T, with i ∈ 1, 2. Hence we find T as type for the premise.

Proof of S2. We should prove that, for each non-result configuration (here, expression e which is not a value) such that ⊢ e : T holds for some T, there is a rule with this configuration in the consequence. The expression e cannot be a variable, since a variable cannot be typed in the empty environment. Application, successor and choice appear as consequences in the reduction rules.

Proof of S3. We should prove this condition for each (instantiation of meta-)rule.
(app): Assuming ⊢ e1 e2 : T, again by Lemma 3 (4) we get ⊢ e1 : T′ → T.
1. First premise: if e1 ⇒ v is derivable, then there should be a rule with e1 e2 in the consequence and e1 ⇒ v as first premise. Since we proved S1, by preservation (Lemma 1) ⊢ v : T′ → T holds. Then, by Lemma 5 (1), v has shape λx.e, hence the required rule exists. As noted in Sect. 4.2, in practice checking S3 for a (meta-)rule amounts to showing that (sub)configurations in the premises evaluate to results which have the required shape (to be a λ-abstraction in this case).
2. Second premise: if e1 ⇒ λx.e, and e2 ⇒ v2, then there should be a rule with e1 e2 in the consequence and e1 ⇒ λx.e, e2 ⇒ v2 as first two premises. This is trivial since the meta-variable v2 can be freely instantiated in the meta-rule.
(succ): Assuming ⊢ succ e : T, again by Lemma 3 (5) we get ⊢ e : Nat. If e ⇒ v is derivable, there should be a rule with succ e in the consequence and e ⇒ v as first premise. Indeed, by preservation (Lemma 1) and Lemma 5 (2), v has shape n. For the second premise, if n + 1 ⇒ v is derivable, then v is necessarily n + 1.
(choice): Trivial since the meta-variable v can be freely instantiated.

An interesting remark is that, differently from the standard approach, there is no induction in the proof: everything is by cases. This is a consequence of the fact that, as discussed in Sect. 4.2, the three conditions are local, that is, they are conditions on single rules. Induction is “hidden” in the proof that those three conditions are sufficient to ensure soundness.
If we drop rule (succ) from Fig. 1, then condition S2 fails, since there is no longer a rule for the well-typed non-result configuration succ n. If we add the (fool) typing rule ⊢ 0 0 : Nat, then condition S3 fails for rule (app), since 0 ⇒ 0 is derivable, but there is no rule with 0 0 in the conclusion and 0 ⇒ 0 as first premise.

5.2 MiniFJ&λ
In this example, the language is a subset of FJ&λ [12], a calculus extending
Featherweight Java (FJ) with λ-abstractions and intersection types, introduced
in Java 8. To keep the example small, we do not consider intersections and focus
on one key typing feature: λ-abstractions can only be typed when occurring in a
context requiring a given type (called the target type). In a small-step semantics,
this poses a problem: reduction can move λ-abstractions into arbitrary contexts,
leading to intermediate terms which would be ill-typed. To maintain subject
reduction, in [12] λ-abstractions are decorated with their initial target type. In
a big-step semantics, there is no need of intermediate terms and annotations.
The syntax is given in the first part of Fig. 5. We assume sets of variables
x , class names C, interface names I, J, field names f, and method names m.
Interfaces which have exactly one method (dubbed functional interfaces) can be
used as target types. Expressions are those of FJ, plus λ-abstractions, and types
are class and interface names. In λxs.e we assume that xs is not empty and e
is not a λ-abstraction. For simplicity, we only consider upcasts, which have no
runtime effect, but are important to allow the programmer to use λ-abstractions,
as exemplified in discussing typing rules.
To be concise, the class table is abstractly modelled by the following functions (a sketch of this interface in code is given after the list):

– fields(C) gives the sequence of field declarations T1 f1;..Tn fn; for class C
– mtype(T, m) gives, for each method m in class or interface T, the pair T1 . . . Tn → T′ consisting of the parameter types and return type
– mbody(C, m) gives, for each method m in class C, the pair ⟨x1 . . . xn, e⟩ consisting of the parameters and body
– <: is the reflexive and transitive closure of the union of the extends and implements relations
– !mtype(I) gives, for each functional interface I, mtype(I, m), where m is the only method of I.
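The sketch announced above, in the same Haskell style as the earlier ones, follows; all names are ours, and the expression type is left abstract, standing for any representation of the MiniFJ&λ syntax of Fig. 5.

  -- Illustrative encoding of the abstract MiniFJ&lambda class table.
  type ClassName = String
  type IfaceName = String
  type FieldName = String
  type MethName  = String

  data Type = TClass ClassName | TIface IfaceName
    deriving (Eq, Show)

  data Expr   -- abstract: any representation of the syntax of Fig. 5

  data ClassTable = ClassTable
    { fields :: ClassName -> [(Type, FieldName)]
      -- field declarations T1 f1; .. Tn fn; of a class
    , mtype  :: Type -> MethName -> Maybe ([Type], Type)
      -- parameter types and return type of a method
    , mbody  :: ClassName -> MethName -> Maybe ([String], Expr)
      -- parameters and body of a method
    , sub    :: Type -> Type -> Bool
      -- <:, the closure of extends and implements
    , fmtype :: IfaceName -> Maybe ([Type], Type)
      -- !mtype: the type of the only method of a functional interface
    }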

The big-step semantics is given in the last part of Fig. 5. MiniFJ&λ shows
an example of instantiation of the framework where configurations include an
auxiliary structure, rather than being just language terms. In this case, the
structure is an environment e (a finite map from variables to values) modelling
the current stack frame. Results are values, which are either objects, of shape
[vs]C , or λ-abstractions.
Rules for FJ constructs are straightforward. Note that, since we only consider
upcasts, casts have no runtime effect. Indeed, they are guaranteed to succeed on
well-typed expressions. Rule (λ-invk) shows that, when the receiver of a method
is a λ-abstraction, the method name is not significant at runtime, and the effect
is that the body of the function is evaluated as in the usual application.
The type system is given in Fig. 6. Method bodies are expected to be well-typed with respect to method types. Formally, mbody(C, m) and mtype(C, m) are either both defined or both undefined: in the first case mbody(C, m) = ⟨x1 . . . xn, e⟩, mtype(C, m) = T1 . . . Tn → T, and x1:T1, . . . , xn:Tn, this:C ⊢ e : T. Moreover, we assume other standard FJ constraints on the class table, such as no field hiding, no method overloading, the same parameter and return types in overriding.
Besides the standard typing features of FJ, the MiniFJ&λ type system en-
sures the following.

e ::= x | e.f | new C(e1, . . . , en) | e.m(e1, . . . , en) | λxs.e | (T)e    expression
xs ::= x1 . . . xn    variable list
T ::= C | I    type
c ::= e, e | v    configuration
v ::= [vs]C | λxs.e    result (value)
vs ::= v1, . . . , vn    value list

(var)  ──────────   e(x) = v
       e, x ⇒ v

(field-access)  e, e ⇒ [v1, . . . , vn]C
                ─────────────────────────   fields(C) = T1 f1; . . . Tn fn;   i ∈ 1..n
                e, e.fi ⇒ vi

(new)  e, ei ⇒ vi   ∀i ∈ 1..n
       ────────────────────────────────────────────
       e, new C(e1, . . . , en) ⇒ [v1, . . . , vn]C

(invk)  e, e0 ⇒ [vs]C    e, ei ⇒ vi ∀i ∈ 1..n    x1:v1, . . . , xn:vn, this:[vs]C, e ⇒ v
        ──────────────────────────────────────────────   mbody(C, m) = ⟨x1 . . . xn, e⟩
        e, e0.m(e1, . . . , en) ⇒ v

(λ-invk)  e, e0 ⇒ λxs.e    e, ei ⇒ vi ∀i ∈ 1..n    x1:v1, . . . , xn:vn, e ⇒ v
          ──────────────────────────────────────────────────────────────────────
          e, e0.m(e1, . . . , en) ⇒ v

(upcast)  e, e ⇒ v
          ─────────────
          e, (T)e ⇒ v

Fig. 5. MiniFJ&λ: syntax and big-step semantics

– A functional interface I can be assigned as type to a λ-abstraction which has the functional type of the method, see rule (t-λ).
– A λ-abstraction should have a target type determined by the context where
the λ-abstraction occurs. More precisely, see [25] page 602, a λ-abstraction
in our calculus can only occur as return expression of a method or argument
of constructor, method call or cast. Then, in some contexts a λ-abstraction
cannot be typed, in our calculus when occurring as receiver in field access or
method invocation, hence these cases should be prevented. This is implicit
in rule (t-field-access), since the type of the receiver should be a class name,
whereas it is explicitly forbidden in rule (t-invk). For the same reason, a λ-
abstraction cannot be the main expression to be evaluated.
– A λ-abstraction with a given target type J should have type exactly J: a
subtype I of J is not enough. Consider, for instance, the following program:

  interface J {}
  interface I extends J { A m(A x); }
  class C {
    C m(I y) { return new C().n(y); }
    C n(J y) { return new C(); }
  }
and the main expression new C().n(λx .x ). Here, the λ-abstraction has tar-
get type J, which is not a functional interface, hence the expression is ill-
typed in Java (the compiler has no functional type against which to type-
check the λ-abstraction). On the other hand, in the body of method m, the
parameter y of type I can be passed, as usual, to method n expecting a su-
pertype. For instance, the main expression new C().m(λx .x ) is well-typed,
since the λ-abstraction has target type I, and can be safely passed to method
n, since it is not used as function there. To formalise this behaviour, it is
forbidden to apply subsumption to λ-abstractions, see rule (t-sub).
– However, λ-abstractions occurring as results rather than in source code (that is, in the environment and as fields of objects) are allowed to have a subtype of the required type, see the explicit side condition in rules (t-conf) and (t-object). For instance, if C is a class with one field J f, the expression new C((I)λx.x) is well-typed, whereas new C(λx.x) is ill-typed, since rule (t-sub) cannot be applied to λ-abstractions. When the expression is evaluated, the result is [λx.x]C, which is well-typed.

(t-conf)  ⊢ vi : Ti′ ∀i ∈ 1..n    x1:T1, . . . , xn:Tn ⊢ e : T
          ──────────────────────────────────────────────────────   Ti′ <: Ti ∀i ∈ 1..n
          ⊢ x1:v1, . . . , xn:vn, e : T

(t-var)  ──────────   Γ(x) = T
         Γ ⊢ x : T

(t-field-access)  Γ ⊢ e : C
                  ──────────────   fields(C) = T1 f1; . . . Tn fn;   i ∈ 1..n
                  Γ ⊢ e.fi : Ti

(t-new)  Γ ⊢ ei : Ti   ∀i ∈ 1..n
         ──────────────────────────────   fields(C) = T1 f1; . . . Tn fn;
         Γ ⊢ new C(e1, . . . , en) : C

(t-invk)  Γ ⊢ ei : Ti   ∀i ∈ 0..n
          ──────────────────────────────   e0 not of shape λxs.e,   mtype(T0, m) = T1 . . . Tn → T
          Γ ⊢ e0.m(e1, . . . , en) : T

(t-λ)  x1:T1, . . . , xn:Tn ⊢ e : T
       ──────────────────────────────   !mtype(I) = T1 . . . Tn → T
       Γ ⊢ λxs.e : I

(t-upcast)  Γ ⊢ e : T
            ─────────────
            Γ ⊢ (T)e : T

(t-object)  Γ ⊢ vi : Ti′   ∀i ∈ 1..n
            ──────────────────────────   fields(C) = T1 f1; . . . Tn fn;   Ti′ <: Ti ∀i ∈ 1..n
            Γ ⊢ [v1, . . . , vn]C : C

(t-sub)  Γ ⊢ e : T
         ────────────   e not of shape λxs.e,   T <: T′
         Γ ⊢ e : T′

Fig. 6. MiniFJ&λ: type system

As mentioned at the beginning, the obvious small-step semantics would produce untypable expressions. In the above example, we get

  new C((I)λx.x) −→ new C(λx.x) −→ [λx.x]C
and new C(λx.x) has no type, while new C((I)λx.x) and [λx.x]C have type C.
We write Γ ⊢ e :<: T as short for Γ ⊢ e : T′ and T′ <: T for some T′. In order to state soundness, let R2 be the big-step semantics defined in Fig. 5, and let Π2T(e, e) hold if ⊢ e, e :<: T, and Π2T(v) if ⊢ v :<: T, for T defined in Fig. 5.

Theorem 6 (Soundness). The big-step semantics R2 and the indexed predicate Π2 satisfy the conditions S1, S2 and S3 of Sect. 4.2.

5.3 Intersection and union types

We enrich the type system of Fig. 4 by adding intersection and union type
constructors and the corresponding typing rules, see Fig. 7. As usual we require
an infinite number of arrows in each infinite path for the trees representing types.
Intersection types for the λ-calculus have been widely studied [11]. Union types
naturally model conditionals [26] and non-deterministic choice [22].

T ::= Nat | T1 → T2 | T1 ∧ T2 | T1 ∨ T2    type

(∧ I)  Γ ⊢ e : T    Γ ⊢ e : S     (∧ E)  Γ ⊢ e : T ∧ S     (∧ E)  Γ ⊢ e : T ∧ S
       ───────────────────────          ───────────────          ───────────────
       Γ ⊢ e : T ∧ S                    Γ ⊢ e : T                Γ ⊢ e : S

(∨ I)  Γ ⊢ e : T           (∨ I)  Γ ⊢ e : S
       ───────────────            ───────────────
       Γ ⊢ e : T ∨ S              Γ ⊢ e : T ∨ S

Fig. 7. Intersection and union types: syntax and typing rules

The typing rules for the introduction and the elimination of intersection and union are standard, except for the absence of the union elimination rule:

  (∨E)  Γ{T/x} ⊢ e : V    Γ{S/x} ⊢ e : V    Γ ⊢ e′ : T ∨ S
        ────────────────────────────────────────────────────
        Γ ⊢ e[e′/x] : V
As a matter of fact, rule (∨E) is unsound for ⊕. For example, let us split the type Nat into Even and Odd and add the expected typings for natural numbers. The prefix addition + has type

  (Even → Even → Even) ∧ (Odd → Odd → Even)

and we derive

                                ⊢ 1 : Odd            ⊢ 2 : Even
                                ─────────────── (∨ I)  ─────────────── (∨ I)
                                ⊢ 1 : Even ∨ Odd      ⊢ 2 : Even ∨ Odd
                                ───────────────────────────────────── (⊕)
  x:Even ⊢ + x x : Even   x:Odd ⊢ + x x : Even   ⊢ (1 ⊕ 2) : Even ∨ Odd
  ───────────────────────────────────────────────────────────────────── (∨ E)
  ⊢ +(1 ⊕ 2)(1 ⊕ 2) : Even
We cannot assign the type Even to 3, which is a possible result, so strong soundness is lost. In the small-step approach, we cannot assign Even to the intermediate term + 1 2, so subject reduction fails. In the big-step approach, there is no such intermediate term; however, condition S1 fails for the reduction rule for +. Indeed, considering the following instantiation of the rule:

  1 ⊕ 2 ⇒ 1    1 ⊕ 2 ⇒ 2    3 ⇒ 3
  ────────────────────────────────── (+)
  +(1 ⊕ 2)(1 ⊕ 2) ⇒ 3

and the type Even for the consequence, we cannot assign this type to the (configuration in the) last premise (continuation).
Intersection types allow us to derive meaningful types also for expressions containing variables applied to themselves; for example, we can derive

  ⊢ λx.x x : (T → S) ∧ T → S

With union types, all non-deterministic choices between typable expressions can be typed too, since we can derive Γ ⊢ e1 ⊕ e2 : T1 ∨ T2 from Γ ⊢ e1 : T1 and Γ ⊢ e2 : T2.
In order to state soundness, let Π3T(e) be ⊢ e : T, for T defined in Fig. 7.

Theorem 7 (Soundness). The big-step semantics R1 and the indexed predicate Π3 satisfy the conditions S1, S2 and S3 of Sect. 4.2.

5.4 MiniFJ&O
A well-known example in which proving soundness with respect to small-step
semantics is extremely challenging is the standard type system with intersection
and union types [10] w.r.t. the pure λ-calculus with full reduction. Indeed, the
standard subject reduction technique fails⁵, since, for instance, we can derive
the type (T → T → V ) ∧ (S → S → V ) → (U → T ∨ S ) → U → V for both
λx.λy.λz.x((λt.t)(y z))((λt.t)(y z)) and λx.λy.λz.x(y z)(y z), but the intermedi-
ate expressions λx.λy.λz.x((λt.t)(y z))(y z) and λx.λy.λz.x(y z)((λt.t)(y z)) do
not have this type.
As the example shows, the key problem is that rule (∨E) can be applied to an expression e where the same subexpression e′ occurs more than once. In the non-deterministic case, as shown by the example in the previous section, this is unsound, since e′ can reduce to different values. In the deterministic case, instead, this is sound, but cannot be proved by subject reduction. Since using big-step semantics there are no intermediate steps to be typed, our approach seems very promising to investigate an alternative proof of soundness. Whereas we leave this challenging problem to future work, here as a first step we describe a (hypothetical) calculus with a much simpler version of the problematic feature.
The calculus is a variant of FJ [27] with intersection and union types. Meth-
ods have intersection types with the same return type and different parameter
types, modelling a form of overloading. Union types enhance typability of condi-
tionals. The more interesting feature is the possibility of replacing an arbitrary
number of parameters with the same expression having an union type. We dub
this calculus MiniFJ&O.
Fig. 8 gives the syntax, big-step semantics and typing rules of MiniFJ&O. We omit the standard big-step rule for conditional, and typing rules for boolean constants.

⁵ For this reason, in [10] soundness is proved by an ad-hoc technique, that is, by considering parallel reduction and an equivalent type system à la Gentzen, which enjoys the cut elimination property.
e ::= x | v | e.f | e.m(e1, . . . , en) | if e then e1 else e2    expression
v ::= new C(v1, . . . , vn) | true | false    value
T ::= C | Bool | ⋁1≤i≤n Ti    expression type
MT ::= ⋀1≤i≤m (C1(i) . . . Cn(i) → D)    method type

(field-access)  e ⇒ new C(v1, . . . , vn)
                ──────────────────────────   fields(C) = T1 f1; . . . Tn fn;   i ∈ 1..n
                e.fi ⇒ vi

(new)  ei ⇒ vi   ∀i ∈ 1..n
       ─────────────────────────────────────────────
       new C(e1, . . . , en) ⇒ new C(v1, . . . , vn)

(invk)  e0 ⇒ new C(vs′)    ei ⇒ vi ∀i ∈ 1..n    e[v1/x1] . . . [vn/xn][new C(vs′)/this] ⇒ v
        ──────────────────────────────────────────   mbody(C, m) = ⟨x1 . . . xn, e⟩
        e0.m(e1, . . . , en) ⇒ v

(t-var)  ──────────   Γ(x) = T
         Γ ⊢ x : T

(t-field-access)  Γ ⊢ e : C
                  ───────────────   fields(C) = T1 f1; . . . Tn fn;   i ∈ 1..n
                  Γ ⊢ e.fi : Ci

(t-new)  Γ ⊢ ei : Ci   ∀i ∈ 1..n
         ──────────────────────────────   fields(C) = T1 f1; . . . Tn fn;
         Γ ⊢ new C(e1, . . . , en) : C

(t-invk)  Γ ⊢ ei : Ci ∀i ∈ 0..n    Γ ⊢ e : ⋁1≤i≤m Di
          ──────────────────────────────────────────────   mtype(C0, m) <: ⋀1≤i≤m (C1 . . . Cn Di . . . Di → C)
          Γ ⊢ e0.m(e1, . . . , en, e, . . . , e) : C        (e and Di each repeated p times)

(t-if)  Γ ⊢ e : Bool    Γ ⊢ e1 : T    Γ ⊢ e2 : T
        ──────────────────────────────────────────
        Γ ⊢ if e then e1 else e2 : T

(t-sub)  Γ ⊢ e : T
         ────────────   T <: T′
         Γ ⊢ e : T′

Fig. 8. MiniFJ&O: syntax, big-step semantics and type system
The subtyping relation <: is the reflexive and transitive closure of the
union of the extends relation and the standard rules for union:
T1 <: T1 ∨ T2 T1 <: T2 ∨ T1
On the other hand, method types (results of the mtype function) are now inter-
section types, and the subtyping relation on them is the reflexive and transitive
closure of the standard rules for intersection:
MT 1 ∧ MT 2 <: MT 1 MT 1 ∧ MT 2 <: MT 2
The functions fields and mbody are defined as for MiniFJ&λ.
Instead, mtype(C, m) gives, for each method m in class C, an intersection type. We assume mbody(C, m) and mtype(C, m) either both defined or both undefined: in the first case mbody(C, m) = ⟨x1 . . . xn, e⟩, mtype(C, m) = ⋀1≤i≤m (C1(i) . . . Cn(i) → D), and x1:C1(i), . . . , xn:Cn(i), this:C ⊢ e : D for i ∈ 1..m.
Clearly rule (t-invk) is inspired by rule (∨E), but the restriction to method calls enables a standard inversion lemma. The subtyping in this rule allows us to choose the types for the method best fitting the types of the arguments. Not surprisingly, subject reduction fails for the expected small-step semantics. For example, let class C have a field point which contains cartesian coordinates and class D have a field point which contains polar coordinates. The method eq takes two objects and compares their point fields, returning a boolean value. A type for this method is (C C → Bool) ∧ (D D → Bool) and we can type eq(e, e), where

  e = if false then new C( . . . ) else new D( . . . )

In fact e has type C ∨ D. Notice that in a standard small-step semantics

  eq(e, e) −→ eq(new D( . . . ), if false then new C( . . . ) else new D( . . . ))

and this last expression cannot be typed.
In order to state soundness, let R4 be the big-step semantics defined in Fig. 8, and let Π4T(e) hold if ⊢ e : T, for T defined in Fig. 8.

Theorem 8 (Soundness). The big-step semantics R4 and the indexed predicate Π4 satisfy the conditions S1, S2 and S3 of Sect. 4.2.

6 The partial evaluation construction

In this section, our aim is to provide a formal justification that the constructions
in Sect. 3 are correct. For instance, for the wrong semantics we would like to be
sure that all the cases are covered. To this end, we define a third construction,
dubbed pev for “partial evaluation”, which makes explicit the computations of
a big-step semantics, intended as the sequences of execution steps of the natu-
rally associated evaluation algorithm. Formally, we obtain a reduction relation
on approximated proof trees, so non-termination and stuck computation are
distinguished, and both soundness-must and soundness-may can be expressed.
To this end, first of all we introduce a special result ?, so that a judgment
c ⇒ ? (called incomplete, whereas a judgment in R is complete) means that the
evaluation of c is not completed yet. Analogously to the previous constructions,
we define an augmented set of rules R? for the judgment extended with ?:

? introduction rules These rules derive ? whenever a rule is partially applied: for each rule ρ ≡ rule(j1 . . . jn, jn+1, c) in R, index i ∈ 1..n + 1, and result r ∈ R, we define the rule intro?(ρ, i, r) as

    j1 . . . ji−1   C(ji) ⇒ r
    ──────────────────────────
    c ⇒ ?

  We also add an axiom

    ──────
    c ⇒ ?

  for each configuration c ∈ C.

? propagation rules These rules propagate ? analogously to those for divergence and wrong propagation: for each ρ ≡ rule(j1 . . . jn, jn+1, c) in R, and index i ∈ 1..n + 1, we add the rule prop(ρ, i, ?) as follows:

    j1 . . . ji−1   C(ji) ⇒ ?
    ──────────────────────────
    c ⇒ ?

Finally, we consider the set T of the (finite) proof trees τ in R?. Each τ can be thought of as a partial proof or partial evaluation of the root configuration. In particular, we say it is complete if it is a proof tree in R (that is, it only contains complete judgments), incomplete otherwise. We define a reduction relation −→R on T such that, starting from the initial proof tree consisting of the single node c ⇒ ?, we derive a sequence where, intuitively, at each step we detail the proof (evaluation). In this way, a sequence ending with a complete tree with root c ⇒ r models a terminating computation, whereas an infinite sequence (tending to an infinite proof tree) models divergence, and a stuck sequence models a stuck computation.

(r?)  ─────   −→R   (r)  ─────
      r ⇒ ?              r ⇒ r

(c?)  ─────   −→R   c′ ⇒ ?
      c ⇒ ?         ──────── (prop(ρ,1,?))                  C(ρ) = c,  C(ρ,1) = c′
                    c ⇒ ?

τ1 . . . τi                        τ1 . . . τi               ρ′ ∼i ρ
──────────── (intro?(ρ,i,r))  −→R  ──────────── (ρ′)         R(ρ′, i) = r
c ⇒ ?                              c ⇒ r                     #ρ′ = i

τ1 . . . τi                        τ1 . . . τi   c′ ⇒ ?      ρ′ ∼i ρ
──────────── (intro?(ρ,i,r))  −→R  ──────────────────────    R(ρ′, i) = r
c ⇒ ?                              c ⇒ ?                     C(ρ′, i+1) = c′
                                   (prop(ρ′,i+1,?))

τ1 . . . τi                        τ1 . . . τi−1  τi′        τi −→R τi′
──────────── (prop(ρ,i,?))    −→R  ─────────────────── (prop(ρ,i,?))
c ⇒ ?                              c ⇒ ?                     R?(r(τi′)) = ?

τ1 . . . τi                        τ1 . . . τi−1  τi′        τi −→R τi′
──────────── (prop(ρ,i,?))    −→R  ─────────────────── (intro?(ρ,i,r))
c ⇒ ?                              c ⇒ ?                     R?(r(τi′)) = r

Fig. 9. Reduction relation on T
The one-step reduction relation −→R on T is inductively defined by the rules in Fig. 9. In this figure, #ρ denotes the number of premises of ρ, and r(τ) the root of τ. We set R?(c ⇒ u) = u where u ∈ R ∪ {?}. Finally, ∼i is the equivalence up-to an index of rules, introduced at the beginning of Sect. 3.2. As said above, each reduction step makes the proof tree “less incomplete”. Notably, reduction
rules apply to nodes with consequence c ⇒ ?, whereas subtrees with root c ⇒ r
represent terminated evaluation. In detail:

– If the last applied rule is an axiom, and the configuration is a result r, then we can evaluate r to itself. Otherwise, we have to find a rule ρ with c in the consequence and start evaluating the first premise of such rule.
– If the last applied rule is intro?(ρ, i, r), then all subtrees are complete, hence, to continue the evaluation, we have to find another rule ρ′, having, for each k ∈ 1..i, as k-th premise the root of τk. Then there are two possibilities: if there is an (i + 1)-th premise, we start evaluating it, otherwise, we propagate to the conclusion the result r of τi.
– If the last applied rule is a propagation rule prop(ρ, i, ?), then we simply propagate the step made by τi.
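The three possible fates of a reduction sequence (completion, divergence, stuckness) can also be observed with a simpler and well-known device, the fuel-based definitional interpreters discussed in Sect. 7 [32,3], where a step budget turns possible divergence into an observable timeout. The sketch below is ours and is not the pev construction itself, only an executable approximation of its observable outcomes; it reuses Expr and subst from the sketch of Sect. 2 and takes the left branch of ⊕.

  -- Fuel-based approximation of the three pev outcomes (our sketch).
  data Outcome = Result Expr   -- a complete proof tree was reached
               | Stuck         -- the sequence is stuck
               | Timeout       -- fuel exhausted: possibly diverging
    deriving Show

  evalFuel :: Int -> Expr -> Outcome
  evalFuel 0 _           = Timeout
  evalFuel _ v@(Nat _)   = Result v
  evalFuel _ v@(Lam _ _) = Result v
  evalFuel _ (Var _)     = Stuck
  evalFuel k (App e1 e2) =
    case evalFuel (k - 1) e1 of
      Result (Lam x b) ->
        case evalFuel (k - 1) e2 of
          Result v2 -> evalFuel (k - 1) (subst x v2 b)
          other     -> other
      Result _         -> Stuck
      other            -> other
  evalFuel k (Succ e) =
    case evalFuel (k - 1) e of
      Result (Nat n) -> Result (Nat (n + 1))
      Result _       -> Stuck
      other          -> other
  evalFuel k (Choice a _) = evalFuel (k - 1) a

A configuration c diverges exactly when evalFuel k c is Timeout for every k; this is how the timeout-based approaches recover divergence on top of an inductive definition, whereas pev represents it directly as an infinite reduction sequence on proof trees.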

In Fig. 10 we report an example of pev reduction.


─────────────   −→R   λx.x ⇒ ?        −→R   λx.x ⇒ λx.x     −→R   λx.x ⇒ λx.x   n ⇒ ?
(λx.x) n ⇒ ?          ─────────────          ─────────────          ──────────────────────
                      (λx.x) n ⇒ ?           (λx.x) n ⇒ ?           (λx.x) n ⇒ ?

−→R   λx.x ⇒ λx.x   n ⇒ n     −→R   λx.x ⇒ λx.x   n ⇒ n   n ⇒ ?
      ──────────────────────         ──────────────────────────────
      (λx.x) n ⇒ ?                   (λx.x) n ⇒ ?

−→R   λx.x ⇒ λx.x   n ⇒ n   n ⇒ n     −→R   λx.x ⇒ λx.x   n ⇒ n   n ⇒ n
      ──────────────────────────────         ──────────────────────────────
      (λx.x) n ⇒ ?                           (λx.x) n ⇒ n

Fig. 10. The evaluation in pev of (λx.x) n.

We end by stating that the three constructions are equivalent to each other, thus providing a coherency result for the approach. In particular, first we show that pev is conservative with respect to R, and this ensures the three constructions are equivalent for finite computations. Then, we prove the traces and wrong constructions to be equivalent to pev for diverging and stuck computations, respectively, and this ensures they cover all possible cases.
Theorem 9.
1. ⊢R c ⇒ r iff the initial tree c ⇒ ? reduces in finitely many −→R steps to a tree τ with r(τ) = c ⇒ r.
2. ⊢Rtr c ⇒tr t for some t ∈ Cω iff there is an infinite −→R-reduction sequence starting from c ⇒ ?.
3. ⊢Rwr c ⇒ wrong iff the initial tree c ⇒ ? reduces in finitely many −→R steps to a tree τ which is stuck.

7 Related work

Modeling divergence The issue of modelling divergence in big-step semantics dates back to [18], where a stratified approach with a separate coinductive judgment for divergence is proposed, also investigated in [30].
In [5] the author models divergence by interpreting coinductively standard
big-step rules and considering also non-well-founded values. In [17] a similar tech-
nique is exploited, by adding a special result modelling divergence. Flag-based
big-step semantics [36] captures divergence by interpreting the same semantic
rules both inductively and coinductively. In all these approaches, spurious judge-
ments can be derived for diverging computations.
Other proposals [32,3] are inspired by the notion of definitional interpreter
[37], where a counter limits the number of steps of a computation. Thus, diver-
gence can be modelled on top of an inductive judgement: a program diverges if
the timeout is raised for any value of the counter, hence it is not directly mod-
elled in the definition. Instead, [20] provides a way to directly model divergence
using definitional interpreters, relying on the coinductive partiality monad [16].
The trace semantics in Sect. 3.1 has been inspired by [29]. Divergence propa-
gation rules are very similar to those used in [8,9] to define a big-step judgment
which directly includes divergence as result. However, this direct definition relies
on a non-standard notion of inference system, allowing corules [7,19], whereas
for the trace semantics presented in this work standard coinduction is enough,
since all rules are productive, that is, they always add an element to the trace.
Differently from all the previously cited papers, which consider specific examples, the work [2] shares with us the aim of providing a generic construction to model non-termination, based on an arbitrary big-step semantics. Ager considers a class of big-step semantics identified by a specific shape of rules, and defines, in a small-step style, a proof-search algorithm which follows the big-step rules; in this way, converging, diverging and stuck computations are distinguished. This approach is somehow similar to our pev semantics, even though the transition system we propose is directly defined on proof trees.
There is an extensive body of work on coalgebraic techniques, where the
difference between semantics can be simply expressed by a change of functor.
In this paper we take a set-theoretic approach, simple and accessible to a large
audience. Furthermore, as far as we know [38], coalgebras abstract several kinds
of transition systems, thus being more similar to a small-step approach. In our
understanding, the coalgebra models a single computation step with possible
effects, and from this it is possible to derive a unique morphism into the final
coalgebra modelling the “whole” semantics. Our trace semantics, being big-step,
seems to roughly correspond to directly get this whole semantics. In other words,
we do not have a coalgebra structure on configurations.

Proving soundness As we have discussed, proving (type) soundness with respect to a big-step semantics is also a challenging task, and some approaches have been proposed in the literature. In [24], to show soundness of large-step semantics, they prove a coverage lemma, which ensures that the rules cover all cases,
including error situations. In [30] the authors prove a soundness property similar
to Theorem 4, but by using a separate judgment to represent divergence, thus
avoiding using traces. In [5] there is a proof of soundness of a coinductive type
system with respect to a coinductive big-step semantics for a Java-like language,
defining a relation between derivations in the type system and in the big-step
semantics. In [8] there is a proof principle, used to show type soundness with
respect to a big-step semantics defined by an inference system with corules [7].
In [4] the proof of type soundness of a calculus formalising path-dependent types
relies on a big-step semantics, while in [3] soundness is shown for the polymor-
phic type systems F<: , and for the DOT calculus, using definitional interpreters
to model the semantics. In both cases they extend the original semantics adding
error and timeout, and adopt inductive proof strategies, as in [39]. A similar
approach is followed by [32] to show type soundness of the Core ML language.
Also [6] proposes an inductive proof of type soundness for the big-step se-
mantics of a Java-like language, but relying on a notion of approximation of
infinite derivation in the big-step semantics.
Pretty big-step semantics [17] aims at providing an efficient representation
of big-step semantics, so that it can be easily extended without duplication of
meta-rules. In order to define and prove soundness, they propose a generic er-
ror rule based on a progress judgment, whose definition can be easily derived
manually from the set of evaluation rules. This is partly similar to our wrong
extension, with two main differences. First, by factorising rules, they introduce
intermediate steps as in small-step semantics, hence there are similar problems
when intermediate steps are ill-typed (as in Sect. 5.2, Sect. 5.4). Second, wrong
introduction is handled by the progress judgment, that is, at the level of side-conditions. Moreover, in [13] there is a formalisation of the pretty-big-step rules for performing generic reasoning on big-step semantics by using abstract inter-
pretation. However, the authors say that they interpret rules inductively, hence
non-terminating computations are not modelled.
Finally, some (but not all) infinite trees of our trace semantics can be seen as
cyclic proof trees, see end of Sect. 3.1. Proof systems supporting cyclic proofs can
be found, e.g., in [14,15] for classical first order logic with inductive definitions.

8 Conclusion and future work


The most important contribution is a general approach for reasoning on sound-
ness with respect to a big-step operational semantics. Conditions can be proven
by a case analysis on the semantic (meta-)rules avoiding small-step-style inter-
mediate configurations. This can be crucial since there are calculi where the
property to be checked is not preserved by such intermediate configurations,
whereas it holds for the final result, as illustrated in Sect. 5.
In future work, we plan to use the meta-theory in Sect. 2 as basis to investi-
gate yet other constructions, notably the approach relying on corules [8,9], and
that, adding a counter, based on timeout [32,3].
We also plan to compare our proof technique for proving soundness with the
standard one for small-step semantics: if a predicate satisfies progress and subject
reduction with respect to a small-step semantics, does it satisfy our soundness
conditions with respect to an equivalent big-step semantics? To formally prove
such a statement, the first step will be to express equivalence between small-step
and big-step semantics. On the other hand, the converse does not hold, as shown
by the examples in Sect. 5.2 and Sect. 5.4.
As for significant applications, we plan to use the approach to prove soundness for the λ-calculus with full reduction and intersection/union types [10]. The interest of this example lies in the failure of subject reduction, as discussed in Sect. 5.4. In another direction, we want to enhance MiniFJ&O with λ-abstractions, allowing intersection and union types everywhere [23]. This will extend the typability of shared expressions. We also plan to apply our approach to the big-step semantics of the statically typed virtual classes calculus developed in [24], also discussing the non-terminating computations not considered there.
With regard to the proofs, which are mainly omitted here and can be found in the extended version at http://arxiv.org/abs/2002.08738, we plan to investigate whether we can simplify them by means of enhanced coinductive techniques.
As a proof-of-concept, we provided a mechanisation6 in Agda of Lemma 1. The mechanisation of the other proofs is similar. However, as future work, we think it would be more interesting to provide a tool for writing big-step definitions and checking that the soundness conditions hold.

Acknowledgments The authors are grateful to the referees: the paper strongly
improved thanks to their useful suggestions and remarks.
6 Available at https://github.com/fdgn/soundness-big-step-semantics.

References
1. Peter Aczel. An introduction to inductive definitions. In Handbook of Mathematical
logic, pages 739–782, Amsterdam, 1977. North Holland.
2. Mads Sig Ager. From natural semantics to abstract machines. In Sandro Etalle,
editor, LOPSTR 2004 - 14th International Symposium on Logic Based Program
Synthesis and Transformation, volume 3573 of Lecture Notes in Computer Science,
pages 245–261, Berlin, 2004. Springer. doi:10.1007/11506676_16.
3. Nada Amin and Tiark Rompf. Type soundness proofs with definitional interpreters.
In Giuseppe Castagna and Andrew D. Gordon, editors, POPL’17 - ACM Symp.
on Principles of Programming Languages, pages 666–679, New York, 2017. ACM
Press. doi:10.1145/3009837.
4. Nada Amin, Tiark Rompf, and Martin Odersky. Foundations of path-dependent
types. In Andrew P. Black and Todd D. Millstein, editors, OOPSLA’14 - ACM
International Conference on Object Oriented Programming Systems Languages and
Applications, pages 233–249, New York, 2014. ACM Press. doi:10.1145/2660193.
2660216.
5. Davide Ancona. Soundness of object-oriented languages with coinductive big-step
semantics. In James Noble, editor, ECOOP’12 - Object-Oriented Programming,
volume 7313 of Lecture Notes in Computer Science, pages 459–483, Berlin, 2012.
Springer. doi:10.1007/978-3-642-31057-7_21.
6. Davide Ancona. How to prove type soundness of Java-like languages without
forgoing big-step semantics. In David J. Pearce, editor, FTfJP’14 - Formal
Techniques for Java-like Programs, pages 1:1–1:6, New York, 2014. ACM Press.
doi:10.1145/2635631.2635846.
7. Davide Ancona, Francesco Dagnino, and Elena Zucca. Generalizing inference sys-
tems by coaxioms. In Hongseok Yang, editor, ESOP 2017 - European Symposium
on Programming, volume 10201 of Lecture Notes in Computer Science, pages 29–
55, Berlin, 2017. Springer. doi:10.1007/978-3-662-54434-1_2.
8. Davide Ancona, Francesco Dagnino, and Elena Zucca. Reasoning on divergent
computations with coaxioms. PACMPL, 1(OOPSLA):81:1–81:26, 2017. doi:10.
1145/3133905.
9. Davide Ancona, Francesco Dagnino, and Elena Zucca. Modeling infinite behaviour
by corules. In Todd D. Millstein, editor, ECOOP’18 - Object-Oriented Program-
ming, volume 109 of LIPIcs, pages 21:1–21:31, Dagstuhl, 2018. Schloss Dagstuhl -
Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ECOOP.2018.21.
10. Franco Barbanera, Mariangiola Dezani-Ciancaglini, and Ugo de’Liguoro. Inter-
section and union types: Syntax and semantics. Information and Computation,
119(2):202–230, 1995. doi:10.1006/inco.1995.1086.
11. Hendrik Pieter Barendregt, Wil Dekkers, and Richard Statman. Lambda Calculus
with Types. Perspectives in logic. Cambridge University Press, Cambridge, 2013.
12. Lorenzo Bettini, Viviana Bono, Mariangiola Dezani-Ciancaglini, Paola Giannini,
and Betti Venneri. Java & Lambda: a Featherweight story. Logical Methods in
Computer Science, 14(3), 2018. doi:10.23638/LMCS-14(3:17)2018.
13. Martin Bodin, Thomas Jensen, and Alan Schmitt. Certified abstract interpretation
with pretty-big-step semantics. In Xavier Leroy and Alwen Tiu, editors, CPP’15 -
Proceedings of the 2015 Conference on Certified Programs and Proofs, pages 29–40,
New York, 2015. ACM. doi:10.1145/2676724.2693174.
14. James Brotherston. Cyclic proofs for first-order logic with inductive definitions.
In Bernhard Beckert, editor, Automated Reasoning with Analytic Tableaux and
Related Methods, International Conference, TABLEAUX 2005, volume 3702 of
Lecture Notes in Computer Science, pages 78–92. Springer, 2005. doi:10.1007/11554554_8.
15. James Brotherston and Alex Simpson. Sequent calculi for induction and infinite
descent. Journal of Logic and Computation, 21(6):1177–1216, 2011. doi:10.1093/
logcom/exq052.
16. Venanzio Capretta. General recursion via coinductive types. Logical Methods in
Computer Science, 1(2), 2005. doi:10.2168/LMCS-1(2:1)2005.
17. Arthur Charguéraud. Pretty-big-step semantics. In Matthias Felleisen and Philippa
Gardner, editors, ESOP 2013 - European Symposium on Programming, volume
7792 of Lecture Notes in Computer Science, pages 41–60, Berlin, 2013. Springer.
doi:10.1007/978-3-642-37036-6_3.
18. Patrick Cousot and Radhia Cousot. Inductive definitions, semantics and abstract
interpretations. In Ravi Sethi, editor, POPL’92 - ACM Symp. on Principles of
Programming Languages, pages 83–94, New York, 1992. ACM Press. doi:10.
1145/143165.143184.
19. Francesco Dagnino. Coaxioms: flexible coinductive definitions by inference systems.
Logical Methods in Computer Science, 15(1), 2019. doi:10.23638/LMCS-15(1:
26)2019.
20. Nils Anders Danielsson. Operational semantics using the partiality monad. In Peter
Thiemann and Robby Bruce Findler, editors, ICFP’12 - International Conference
on Functional Programming 2012, pages 127–138, New York, 2012. ACM Press.
doi:10.1145/2364527.2364546.
21. Rocco De Nicola and Matthew Hennessy. Testing equivalences for processes. Theoretical Computer Science, 34(1):83–133, 1984. doi:10.1016/0304-3975(84)90113-0.
22. Mariangiola Dezani-Ciancaglini, Ugo de'Liguoro, and Adolfo Piperno. A filter
model for concurrent lambda-calculus. SIAM Journal on Computing, 27(5):1376–
1419, 1998. doi:10.1137/S0097539794275860.
23. Mariangiola Dezani-Ciancaglini, Paola Giannini, and Betti Venneri. Intersection
types in Java: Back to the future. In Tiziana Margaria, Susanne Graf, and Kim G.
Larsen, editors, Models, Mindsets, Meta: The What, the How, and the Why Not?
- Essays Dedicated to Bernhard Steffen on the Occasion of His 60th Birthday,
volume 11200 of Lecture Notes in Computer Science, pages 68–86. Springer, 2018.
doi:10.1007/978-3-030-22348-9_6.
24. Erik Ernst, Klaus Ostermann, and William R. Cook. A virtual class calculus.
In J. Gregory Morrisett and Simon L. Peyton Jones, editors, POPL’06 - ACM
Symp. on Principles of Programming Languages, pages 270–282. ACM, 2006. doi:
10.1145/1111037.1111062.
25. James Gosling, Bill Joy, Guy L. Steele, Gilad Bracha, and Alex Buckley. The Java
Language Specification, Java SE 8 Edition. Addison-Wesley Professional, Boston,
1st edition, 2014.
26. Grzegorz Grudzinski. A minimal system of disjunctive properties for strictness
analysis. In José D. P. Rolim, Andrei Z. Broder, Andrea Corradini, Roberto Gor-
rieri, Reiko Heckel, Juraj Hromkovic, Ugo Vaccaro, and J. B. Wells, editors, ICALP
Workshops, pages 305–322, Waterloo, Ontario, Canada, 2000. Carleton Scientific.
27. Atsushi Igarashi, Benjamin C. Pierce, and Philip Wadler. Featherweight Java:
A minimal core calculus for Java and GJ. ACM Transactions on Programming
Languages and Systems, 23(3):396–450, 2001. doi:10.1145/503502.503505.
28. Gilles Kahn. Natural semantics. In Franz-Josef Brandenburg, Guy Vidal-Naquet,
and Martin Wirsing, editors, STACS’87 - Symposium on Theoretical Aspects of
Computer Science, volume 247 of Lecture Notes in Computer Science, pages 22–
39, Berlin, 1987. Springer. doi:10.1007/BFb0039592.
29. Jaroslaw D. M. Kusmierek and Viviana Bono. Big-step operational seman-
tics revisited. Fundamenta Informaticae, 103(1-4):137–172, 2010. doi:10.3233/
FI-2010-323.
30. Xavier Leroy and Hervé Grall. Coinductive big-step operational semantics. Infor-
mation and Computation, 207(2):284–304, 2009. doi:10.1016/j.ic.2007.12.004.
31. Robin Milner. A theory of type polymorphism in programming. Journal of Com-
puter and System Sciences, 17(3):348–375, 1978. doi:10.1016/0022-0000(78)
90014-4.
32. Scott Owens, Magnus O. Myreen, Ramana Kumar, and Yong Kiam Tan. Functional
big-step semantics. In Peter Thiemann, editor, ESOP 2016 - European Symposium
on Programming, volume 9632 of Lecture Notes in Computer Science, pages 589–
615, Berlin, 2016. Springer. doi:10.1007/978-3-662-49498-1_23.
33. Benjamin C. Pierce. Types and programming languages. MIT Press, Cambridge,
Massachusetts, 2002.
34. Gordon D. Plotkin. A structural approach to operational semantics. Technical
report, Aarhus University, 1981.
35. Gordon D. Plotkin. A structural approach to operational semantics. Journal of
Logic and Algebraic Programming, 60-61:17–139, 2004.
36. Casper Bach Poulsen and Peter D. Mosses. Flag-based big-step semantics. Journal
of Logic and Algebraic Methods in Programming, 88:174–190, 2017. doi:10.1016/
j.jlamp.2016.05.001.
37. John C. Reynolds. Definitional interpreters for higher-order programming lan-
guages. Higher-Order and Symbolic Computation, 11(4):363–397, 1998. doi:
10.1023/A:1010027404223.
38. Jan J. M. M. Rutten. Universal coalgebra: a theory of systems. Theoretical Com-
puter Science, 249(1):3–80, 2000. doi:10.1016/S0304-3975(00)00056-6.
39. Jeremy Siek. Type safety in three easy lemmas. 2013. URL: http://siek.blogspot.com/2013/05/type-safety-in-three-easy-lemmas.html.
40. Andrew K. Wright and Matthias Felleisen. A syntactic approach to type soundness. Information and Computation, 115(1):38–94, 1994.

Liberate Abstract Garbage Collection from the Stack by Decomposing the Heap

Kimball Germane¹ and Michael D. Adams²

¹ Brigham Young University, Provo UT, USA [email protected]
² University of Michigan, Ann Arbor MI, USA [email protected]

Abstract. Abstract garbage collection and the use of pushdown systems
each enhance the precision of control-flow analysis (CFA). However, their
respective needs conflict: abstract garbage collection requires the stack
but pushdown systems obscure it. Though several existing techniques
address this conflict, none take full advantage of the underlying interplay.
In this paper, we dissolve this conflict with a technique which exploits
the precision of pushdown systems to decompose the heap across the
continuation. This technique liberates abstract garbage collection from
the stack, increasing its effectiveness and the compositionality of its host
analysis. We generalize our approach to apply compositional treatment to
abstract timestamps which induces the context abstraction of m-CFA, an
abstraction more precise than k-CFA’s for many common programming
patterns.

Keywords: Control-Flow Analysis · Abstract Garbage Collection · Pushdown Systems

1 Introduction
Among the many enhancements available to improve the precision of control-flow
analysis (CFA), abstract garbage collection and pushdown models of control flow
stand out as particularly effective ones. But their combination is non-trivial.
Abstract garbage collection (GC) [10] is the result of applying standard GC—
which calculates the heap data reachable from a root set derived from a given
environment and continuation—to an abstract semantics. Though it operates in
the same way as concrete GC, abstract GC has a different effect on the semantics
to which it’s applied. Concrete GC is semantically irrelevant in that it has no
effect on a program’s observable behavior.3 Abstract GC, on the other hand,
is semantically relevant in that, by eliminating some merging in the abstract
heap, it prevents a utilizing CFA from conflating some distinct heap data. In the
setting of a higher-order language, where data can represent control, this superior
approximation of data translates to a superior approximation of control as well,
manifest by the CFA exploring fewer infeasible execution paths.
Pushdown models of control flow [16, 3] encode the call–return relation of a
program’s flow of execution as precisely as an unbounded control stack would
3 It is irrelevant only if space consumption is unobservable, as is typical.
allow. Consequently, and in contrast to the finite-state models which preceded
them, pushdown models enable a utilizing CFA—a stack-precise CFA—to avoid
relating a given return to any but its originating call. Thus, pushdown models
also induce CFAs which explore fewer infeasible execution paths.
Not only do abstract GC and pushdown systems each enhance the control
precision of CFA, they also appear to do so in complementary ways. Is it possible
for a CFA to use both and gain the benefits of each? This question’s answer is
not immediate, as these techniques have competing requirements: abstract GC
must examine the stack to extract the root set of reachability but the use of
pushdown models obscures the control stack to the abstract semantics.
This question has been addressed by two techniques: The first introspec-
tive technique [4] introduces a primitive operation into the analyzing machine
which introspects the stack and delivers the set of frames which may be live;
this technique has a variety of alternative formulations, some of which alter its
complexity–precision profile [8, 7]. The second technique [1], which modifies the
first to work with definitional interpreters, dictates that the analyzer implement
a set-passing style abstract semantics where each passed set contains the heap
addresses present in the continuation at that point. Each of these techniques
reconciles the competing requirements of abstract GC and pushdown models
of control flow and allows the utilizing CFA to enjoy the precision-enhancing
benefits of both at once.
However, each of these techniques—hereafter referred to collectively as push-
down GC —yields a setting in which abstract GC and pushdown models of con-
trol flow merely coexist. In contrast, this paper prescribes a technique which
exploits the pushdown model of control flow to enable a new mode of garbage
collection—compositional garbage collection—which does not require the ability
to inspect the continuation.
The key observation is that, in a stack-precise CFA, the heap present at the
point of a call is in scope at the point of its return. Thus, the analysis can offload
some of the contents of the callee’s heap to the caller’s—in particular, the data
irrelevant to the callee’s execution. When this offloading is performed, the final
heap of the callee (just as it returns) is incomplete with respect to subsequent
execution. But, since the caller’s heap is in scope at this point, the analysis can
reconstitute the subsequent heap by combining the caller’s heap with the callee’s
final heap.
The data relevant to the callee’s execution is the data reachable from its
local environment and excludes the data reachable from its continuation alone.
Offloading heap data, then, consists of GC-ing each callee’s heap with respect
to its local environment only. When one applies this practice consistently to all
calls, one associates with each active call not a heap but a heap fragment, effec-
tively decomposing the heap across the continuation. As we will show, careful
separation and combination of these heap fragments can perfectly simulate the
presence of the full heap.
This liberation of GC from the continuation has several consequences for the
host CFA.
1. It simplifies both the formalization and implementation of the host CFA,
since it can omit the relatively complex machinery to ensure the continuation-
resident addresses are at hand.
2. It reduces the host CFA’s workload by not requiring it to traverse full heaps.
Earl et al. [4] observe that traversal of large heaps observably increases anal-
ysis time.
3. It recovers context irrelevance in the host CFA’s semantics, a property we
discuss more in Section 3.4 and Section 6.1.
4. It enables purely-local execution summaries, which make memoization much more effective.

In sum, relative to pushdown GC, compositional GC offers quantitative benefits to the host CFA, being strictly more powerful, as well as qualitative ones.

1.1 Examples

Let's look at an example where compositional GC makes memoization more effective. Consider the following Scheme program

(let* ([id (lambda (x) x)]
       [y (id 42)]
       [z (id y)])
  (+ y z))

which calls id twice, each time on 42.


We would hope that a CFA would be able to memoize its analysis of the
first call and, upon recognizing that the second call is semantically-identical, re-
use its results. However, contemporary CFAs will not because each call is made
with a different heap—the second call’s heap includes a binding for y that the
first’s doesn’t. Moreover, this distinction persists even with pushdown GC since
y’s binding is needed to continue execution after the call. Since CFAs have no
means but reachability to determine what is relevant to a given execution point,
and since what is relevant constitutes a memoization key, pushdown GC is too
weak to identify these two calls.
In contrast, a CFA with compositional GC produces a heap fragment for
each call which is closed over only data reachable from the local environment—
for a call, the procedure and argument values themselves. Accordingly, from its
perspective, these two calls are identical and specify a single memoization key.
Now let’s look at an example where compositional GC keeps co-live bindings
of the same variable distinct. Consider the following Scheme program

(letrec ([f (lambda (x)
              (if (prime? x)
                  (let ([y (f (+ x 1))])
                    (+ x y))
                  x))])
  (f 2))
which defines and calls a recursive procedure f.


Concrete evaluation of this program first calls f with 2, then with 3, and then with 4; these calls return 4, then 3 + 4 = 7, and then 2 + 7 = 9. The procedure
f is properly recursive—so these calls are nested—and, after f is called with
4 but before it returns, three distinct bindings of x are live. Moreover, since
each binding of x is needed until its binding call returns, each is continuation-
reachable and therefore not claimed by GC. These facts and limitations translate
to the analysis setting: a CFA will discover multiple co-live bindings of x which
persist in the face of pushdown GC. Consequently, even with pushdown GC, a
CFA will in general join these bindings to some degree, concluding that x can
be 2 whenever it can be 3 and can be 3 whenever it can be 4.
In contrast, just before a CFA with compositional GC performs each call
to f, it GCs with respect to the operator and argument values which, in each
case, consist of the closure of f (which reaches only itself in the heap) and a
number (which doesn’t reach anything). Thus, each binding to x is the first in its
respective heap fragment and doesn’t interfere with the live bindings of x in other
heap fragments. Using a numeric abstraction in which arithmetic operations
propagate but do not introduce approximation [1], a CFA with compositional
GC will produce an exact answer (whereas one with pushdown GC will not).

1.2 Generalizing the Approach

The conventional treatment of the heap by CFA is to thread it through execution,
allowing it to evolve as it goes. In contrast, compositional GC advocates that the
CFA treat the heap with the same discipline that it treats the environment: saved
at the evaluation of a subexpression and restored when its evaluation completes
and its value is delivered. That is, compositional GC is achieved by, in effect,
treating the heap compositionally.
What happens if we impose the same compositional discipline on other
threaded components, such as the timestamp? In that case, we move from the
last-k-call-sites4 context abstraction of k-CFA [14] to the top-m-stack-frames5 context abstraction of m-CFA [11]. This appearance of m-CFA's abstraction in a stack-precise CFA is the first such, to our knowledge.
With compositional treatment of both the heap and timestamp, we arrive
at a stack-precise CFA which treats each of its components compositionally.
This treatment also leads to a CFA closer to being compositional in the sense
that the analysis of a compound expression is a function of the analyses of
its constituent parts. Accordingly, we refer to such a stack-precise CFA as a
compositional control-flow analysis.
The remainder of the paper is as follows. We first introduce the syntax of the
language we will use throughout the paper in Section 2. We then discuss the en-
hancements of perfect stack precision, garbage collection, and their combination
in Section 3. We then proceed through a series of semantics which transition
4 as in, most-recent k call sites
5 as in, youngest m stack frames
from a threaded heap to a compositional, garbage-collected heap in Section 4.
We then abstract the compositional semantics to obtain our CFA in Section 5.
We discuss the ramifications of the compositional treatment of each of the heap
and abstract time in Section 6. We finally discuss related work in Section 7 and
conclusions and future work in Section 8.

Note In the remainder of the paper, we use the standard term store to refer
to the analysis component which models the heap. Thus, we will describe our
technique as, e.g., treating stores compositionally.

2 A-Normal Form λ-Calculus

For presentation, we keep the language small: we use a unary λ-calculus in A-normal form [5], the grammar of which is given below.

Exp ∋ e ::= ce | let x = ce in e
CExp ∋ ce ::= ae | (ae0 ae1) | set! x ae
AExp ∋ ae ::= x | λx.e
Var ∋ x [an infinite set of variables]

A proper expression e is a call expression ce or a let-expression, which binds
a variable to the result of a call expression. (Restricting the bound expression
to a call expression prevents let-expressions from nesting there, a hallmark of
A-normal form.) A call expression ce is an atomic expression ae, an application,
or a set!-expression. An atomic expression ae is a variable reference or a λ
abstraction.
Atomic expressions are trivial [13]. We include set!-expressions to produce
mutative effects that must be threaded through evaluation. (The approach we
present in this paper can also handle more-general forms of mutation, such as
boxes.) For our purposes, we consider a set!-expression “serious” [13] since it has
an effect on the store.
A program is a closed expression; we assume (without loss of generality) that
programs are alphatised—that is, that each bound variable has a distinct name.
Expressions of the form (ae0 ae1) for some ae0 and ae1 constitute the set App; similarly, expressions of the form λx.e for some x and e constitute the set Lam.
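As a concrete point of reference, the grammar admits a direct transcription into datatypes. The following Haskell sketch is ours and purely illustrative; it is not part of the formal development.

type Var = String                                   -- an infinite set of variables

data Exp  = CE CExp | Let Var CExp Exp              -- e  ::= ce | let x = ce in e
  deriving (Eq, Ord, Show)
data CExp = AE AExp | App AExp AExp | Set Var AExp  -- ce ::= ae | (ae0 ae1) | set! x ae
  deriving (Eq, Ord, Show)
data AExp = Ref Var | Lam Var Exp                   -- ae ::= x | λx.e
  deriving (Eq, Ord, Show)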

3 Background

In this section, we review abstract garbage collection and the k-CFA context
abstraction. We begin by introducing a small-step concrete semantics which
defines the ground truth of evaluation.
3.1 Semantic Domains

First, we introduce some semantic components that we will use heavily through-
out the rest of the paper.

v ∈ Val = Lam × Env             ρ ∈ Env = Var ⇀ Time
t ∈ Time = App*                 a ∈ Address = Var × Time
σ ∈ Store = Address ⇀ Val       κ ∈ Cont ::= mt | lt(x, ρ, e, κ)

A value v is a closure, a pair of a λ abstraction and an environment which closes


it. An environment ρ is a finite map from each variable x to a time t; a time t
is a finite sequence of call sites. Let ρ|e denote the restriction of the domain of
the environment ρ to the free variables of e. An address a is a pair of a variable
and time and a store σ is a map from addresses to values. A continuation κ is
either the empty continuation or the continuation of a let binding.
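These domains admit an equally direct transcription; the sketch below builds on the datatypes above and is, again, only our illustration of the definitions.

import qualified Data.Map as Map

type Time    = [CExp]               -- a finite sequence of call sites
type Address = (Var, Time)          -- an address pairs a variable with a time
type Env     = Map.Map Var Time     -- ρ : Var ⇀ Time, a finite map
type Store   = Map.Map Address Val  -- σ : Address ⇀ Val

data Val  = Clo Var Exp Env deriving (Eq, Ord, Show)  -- closures (λx.e, ρ)
data Cont = Mt | Lt Var Env Exp Cont                  -- mt | lt(x, ρ, e, κ)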

3.2 Concrete Semantics

We define our concrete semantics as a small-step relation over abstract machine states. The state space of our machine is given formally as follows.

ς ∈ State = Eval + Apply
ςev ∈ Eval = Exp × Env × Store × Cont × Time
ςap ∈ Apply = Val × Store × Cont × Time

Machine states come in two variants. An Eval machine state represents a point
in execution in which an expression will be evaluated; it contains registers for
an expression e, its closing environment ρ, the store σ (modelling the heap), the
continuation κ (modelling the stack), and the time t. An Apply machine state
represents a point in execution at which a value is in hand and must be delivered
to the continuation; it contains registers for the value v to deliver, the store σ,
the continuation κ, and the time t.
Figure 1 contains the definitions of two relations over machine states, the
union of which constitutes the small-step relation. The →ev relation transitions
an Eval state to its successor. The Let rule pushes a continuation frame to save
the bound variable, environment, and body expression. The resultant Eval state
is poised to evaluate the bound expression ce. The Call rule first uses aeval
defined

aeval(σ, ρ, x) = σ(x, ρ(x))        aeval(σ, ρ, λx.e) = (λx.e, ρ|λx.e)

to obtain values for each of the operator and argument. It then increments
the time, extends the store and environment with the incremented time, and
arranges evaluation of the operator body at the incremented time. The Set! rule
remaps a location in the store designated by a given variable (which is resolved in
the environment) to a value obtained by aeval. It returns the identity function.
Let
    ev(let x = ce in e, ρ, σ, κ, t) →ev ev(ce, ρ, σ, lt(x, ρ, e, κ), t)

Call
    (λx.e, ρ′) = aeval(σ, ρ, ae0)    v = aeval(σ, ρ, ae1)    t′ = (ae0 ae1) :: t
    σ′ = σ[(x, t′) → v]    ρ′′ = ρ′[x → t′]
    ev((ae0 ae1), ρ, σ, κ, t) →ev ev(e, ρ′′, σ′, κ, t′)

Set!
    v = aeval(σ, ρ, ae)    a = (x, ρ(x))    σ′ = σ[a → v]
    ev(set! x ae, ρ, σ, κ, t) →ev ap((λx.x, ⊥), σ′, κ, t)

Atomic
    v = aeval(σ, ρ, ae)
    ev(ae, ρ, σ, κ, t) →ev ap(v, σ, κ, t)

Apply
    ρ′ = ρ[x → t]    σ′ = σ[(x, t) → v]
    ap(v, σ, lt(x, ρ, e, κ), t) →ap ev(e, ρ′, σ′, κ, t)

Fig. 1. Small-step abstract machine semantics

The Atomic rule evaluates an atomic expression. The Apply rule applies a
continuation to a value, extending the environment and store and arranging for
the evaluation of the let body.
We inject a program pr into the initial evaluation state ev(pr, ⊥, ⊥, mt, ε) which arranges evaluation in the empty environment, empty store, halt continuation, and empty time.
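For concreteness, the transition relation can be read as a step function; the rough sketch below covers the Let, Call, Atomic, and Apply rules over the types above. The Set! rule and the restriction ρ|λx.e are elided, and the non-exhaustive patterns are deliberate for a sketch.

data State = Ev Exp Env Store Cont Time | Ap Val Store Cont Time

aeval :: Store -> Env -> AExp -> Val
aeval sigma rho (Ref x)   = sigma Map.! (x, rho Map.! x)
aeval _     rho (Lam x e) = Clo x e rho               -- eliding the restriction ρ|λx.e

step :: State -> State
step (Ev (Let x ce e) rho sigma kappa t) =            -- Let: push a frame for the body
  Ev (CE ce) rho sigma (Lt x rho e kappa) t
step (Ev (CE (App ae0 ae1)) rho sigma kappa t) =      -- Call: bind the argument at t'
  let Clo x e rho' = aeval sigma rho ae0
      v  = aeval sigma rho ae1
      t' = App ae0 ae1 : t                            -- increment the time
  in Ev e (Map.insert x t' rho') (Map.insert (x, t') v sigma) kappa t'
step (Ev (CE (AE ae)) rho sigma kappa t) =            -- Atomic
  Ap (aeval sigma rho ae) sigma kappa t
step (Ap v sigma (Lt x rho e kappa) t) =              -- Apply: bind the let variable at t
  Ev e (Map.insert x t rho) (Map.insert (x, t) v sigma) kappa t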

Adding Garbage Collection At this point, we have a small-step relation
defining execution by abstract machine and are perfectly positioned to apply,
e.g., the Abstracting Abstract Machines (AAM) [15] recipe to abstract the se-
mantics and thereby obtain a sound, computable CFA. Before doing so, however,
we will extend our semantics to garbage-collect the store on each transition. This
extension has no semantic effect in the concrete semantics but, as we will discuss,
greatly increases the precision of the abstracted (or, simply, abstract) semantics.
We extend the semantics by defining two garbage collection transitions, one
which collects an Eval state and one which collects an Apply state. Because our
abstract machine explicitly models local environments, heaps (via stores), and
stacks (via continuations), we can apply a copying collector to perform garbage
collection.
First, we define a family root of metafunctions to extract the reachability
root set from values, environments, and continuations.
rootv(λx.e, ρ) = rootρ(ρ)        rootκ(mt) = ∅
rootρ(ρ) = ρ                     rootκ(lt(x, ρ, e, κ)) = rootρ(ρ|e) ∪ rootκ(κ)
The rootv metafunction extracts the root addresses from a closure by using rootρ
to extract the root addresses from its environment. By the rootρ metafunction,
the root addresses of an environment are simply the variable–time pairs that
define it—that is, the definition of rootρ views its argument ρ extensionally as
a set of addresses. The rootκ metafunction extracts the root addresses from a
continuation. The empty continuation has no root addresses whereas the root
addresses of a non-empty continuation are those of its stored environment (re-
stricted to the free variables of the expression it closes) combined with those of
the continuation it extends.
Next, we define a reachability relation →σ parameterized by a store σ and
over addresses by
a0 →σ a1 ⇔ a1 ∈ rootv (σ(a0 ))
We then define the reachability of a root set with respect to a store

R(σ, A) = {a′ : a ∈ A, a →*σ a′}

where →*σ is the reflexive, transitive closure of →σ. From here, we obtain the
transitions
GC-Eval
    A = rootρ(ρ|e) ∪ rootκ(κ)    σ′ = σ|R(σ,A)
    ev(e, ρ, σ, κ, t) →GC ev(e, ρ, σ′, κ, t)

GC-Apply
    A = rootv(v) ∪ rootκ(κ)    σ′ = σ|R(σ,A)
    ap(v, σ, κ, t) →GC ap(v, σ′, κ, t)
where σ|R(σ,A) is σ restricted to the reachable addresses R(σ, A). We compose
this garbage-collecting transition with each of →ev and →ap . Altogether, the
garbage-collecting semantics are given by →GC ◦[→ev ∪ →ap ].
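The reachability computation R(σ, A) is a standard fixed point over the store; a rough sketch over the earlier types follows (rootsCont elides the restriction ρ|e).

import qualified Data.Set as Set

rootsEnv :: Env -> Set.Set Address
rootsEnv rho = Set.fromList (Map.toList rho)      -- ρ viewed extensionally as addresses

rootsVal :: Val -> Set.Set Address
rootsVal (Clo _ _ rho) = rootsEnv rho

rootsCont :: Cont -> Set.Set Address
rootsCont Mt             = Set.empty
rootsCont (Lt _ rho _ k) = rootsEnv rho `Set.union` rootsCont k

reachable :: Store -> Set.Set Address -> Set.Set Address   -- R(σ, A)
reachable sigma as
  | as' == as = as
  | otherwise = reachable sigma as'
  where as' = Set.union as (Set.unions
                [ maybe Set.empty rootsVal (Map.lookup a sigma) | a <- Set.toList as ])

gcStore :: Set.Set Address -> Store -> Store               -- σ|R(σ,A)
gcStore roots sigma = Map.restrictKeys sigma (reachable sigma roots)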

3.3 Abstracting Abstract Machines with Garbage Collection

Now that we have a small-step abstract machine semantics with GC, we are
ready to apply the AAM recipe to obtain a sound, computable CFA with GC.
We apply the AAM recipe in two steps.
First, we refactor the state space so that all inductively-defined components
are redirected through the store. Practically, this refactoring has the effect of
allocating continuations in the store. For our semantics, this refactoring yields
the state space StateSA defined

StateSA = EvalSA + ApplySA
EvalSA = Exp × Env × StoreSA × ContAddr × Time
ApplySA = StoreSA × ContAddr × Val × Time

in which a continuation address α ∈ ContAddr replaces the continuation drawn from Cont. The space of continuations becomes defined by

κSA ∈ ContSA ::= mt | lt(x, ρ, e, α)


and of stores by

StoreSA = Address + ContAddr ⇀ Val + ContSA

Not reflected in this structure is the typical constraint that an address a will
only ever locate a value and a continuation address α will only ever locate a
continuation.
Second, we finitely partition the unbounded address space of the store and
treat the constituent sets as abstract addresses (via some finite representative).
Practically, this partitioning is achieved by limiting the time t to at most k call
sites where k becomes a parameter of the CFA (leading to the designation k-
CFA). Any addresses which agree on the k-length prefix of their time component
are identified and the finite representative for this set of addresses uses simply
that prefix. Accordingly, we define an abstract time domain T̂ime = Time^≤k
and let it reverberate through the state space definitions, obtaining

Ŝtate = Êval + Âpply
Êval = Exp × Ênv × Ŝtore × ĈontAddr × T̂ime
Âpply = Ŝtore × ĈontAddr × V̂al × T̂ime


(in which we allow the definition of ĈontAddr to depend, directly or not, on that of T̂ime).
Finitization of the address space is key to producing a computable CFA.
Practically, however, it means that some values located previously by distinct
addresses will after be located by the same abstract address. When this conflation
occurs, the CFA must behave as if either access was intended; this behavior is
manifested by non-deterministically choosing the value located by a particular
address. Because our language is higher-order, this non-determinism also affects
the control flows the CFA considers. This effect is evident in the Call rule
defined
Call
    (λx.e, ρ̂′) ∈ âeval(σ̂, ρ̂, ae0)    v̂ = âeval(σ̂, ρ̂, ae1)    t̂′ = ⌊(ae0 ae1) :: t̂⌋k
    σ̂′ = σ̂[(x, t̂′) → v̂]    ρ̂′′ = ρ̂′[x → t̂′]
    ev((ae0 ae1), ρ̂, σ̂, α̂, t̂) →ev ev(e, ρ̂′′, σ̂′, α̂, t̂′)

which is structurally identical to that of the concrete semantics except in two respects:
1. The abstract evaluation of the operator ae 0 may yield multiple closures and
the CFA considers the application of each. Due to the approximation finitiza-
tion introduces, not every abstractly-applied closure will necessarily appear
in a compatible call under the concrete semantics. Such closures, initiating
spurious control paths, waste analysis effort and this waste compounds as
the exploration of spurious paths leads to the discovery of yet more.
2. The abstract time component is limited to length at most k (obtained by ⌊·⌋k).
In short, a finite address space introduces a value approximation and, in a higher-order language such as ours, a control approximation as well.
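To illustrate the finitization, the following sketch (ours) renders abstract times as k-truncated call strings and abstract values as sets of closures, so that allocation at a reused abstract address merges values rather than replacing them.

type AVal   = Set.Set Val            -- abstract values: sets of closures
type AStore = Map.Map Address AVal

truncateTime :: Int -> Time -> Time  -- the ⌊·⌋k truncation
truncateTime = take

allocJoin :: Address -> Val -> AStore -> AStore
allocJoin a v = Map.insertWith Set.union a (Set.singleton v)  -- merge at reused addresses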
While the strategy to store-allocate continuations facilitates the systematic
abstraction process of AAM, it also imposes a similar approximation on the
continuation space as it does the value space. In consequence, a CFA obtained
by AAM approximates not only the value and control flow of the program,
but the return flow as well. Return-flow approximation is manifest as a single
abstract call returning to caller contexts that did not make that call. 6
On the other hand, because the AAM abstraction process preserves the over-
all structure of the state space—in particular, the explicit models of the local
environment, heap, and stack—applying GC to an abstract state is straightfor-
ward. In addition, GC in the abstract semantics improves precision and reduces
the workload of the analyzer [10].
To see how GC improves precision, consider a 0CFA (that is, [k = 0]CFA)
without GC of the Scheme program
(let* ([id (lambda (x) x)]
       [y (id 42)]
       [z (id 35)])
  z)
at the call (id 42). As the abstract call is made, the abstract value 42 is stored at an address a derived from x. Once the call returns, the abstract value 42 still
resides in the heap at a which is now unreachable. However, as the abstract call
(id 35) is made, the address a is derived again (a consequence of the finite
address space), and the abstract value 35 is merged with the abstract value 42
which persists at a. Since the value at a is returned and becomes the result of
the program, the CFA reports that the program results in either 42 or 35.
Now consider a 0CFA with GC of the same program. Once the call (id 42) returns and a becomes unreachable, its heap entry is reaped by GC. The
abstract call (id 35) then allocates the abstract value 35 at a which is, from
the allocator’s perspective, a fresh heap cell. Consequently, the CFA precisely
reports that the program results in 35.
The above example also illustrates how GC reduces the workload of the an-
alyzer. Though we didn’t call it out, when using a naive continuation allocator
without GC, the abstract call (id 35) not only correctly returns to the contin-
uation binding z but also spuriously returns to the continuation binding y. In
this example, this spurious control (return) flow does no more damage to the
precision of 0CFA’s approximation of the final program result, but does cause
it to explore infeasible control flows which damage the precision of the 0CFA’s
approximation of intermediate values. GC prevents the spurious flows in this
example from arising at all; however, in general, it does not prevent all spurious
return flows.
6 P4F [6] uses a particular continuation allocator which is able to avoid return-flow
approximation. However, the P4F technique applies only when the store is globally-
widened and, in such a setting, no data ever becomes unreachable which renders GC
completely ineffective.
3.4 Stack-Precise CFA with Garbage Collection


In contrast to an AAM-derived analysis, a stack-precise CFA does not approx-
imate the return flow of the program. A stack-precise CFA achieves this feat
by modelling control flow with a pushdown system which allows it to precisely
match returns with their corresponding calls. However, to do so, it requires full
control of the continuation which we abide by factoring it out of the state space,
obtaining

State PD = Eval PD + Apply PD
Eval PD = Exp × Env × Store × Time
Apply PD = Val × Store × Time

before we abstract it to produce a CFA. (Some CFAs factor the store out of
machine states to be managed globally, part of widening the store. In a sense,
factoring out the continuation is part of widening the continuation.) Without
a continuation component, an Eval PD state is an evaluation configuration and
an Apply PD state is an evaluation result. Except for the presence of the time
component, State PD exhibits precisely the configuration and result shapes one
finds in many stack-precise CFAs [17, 8, 1, 18].
However, factoring the continuation out and ceding control of it to the anal-
ysis presents an obstacle to abstract GC, which needs to extract the root set of
reachable addresses from it. Earl et al. [4] developed a technique whereby the
analysis could introspect the continuation and extract the root set of reachable
addresses from the continuation. Johnson and Van Horn [8] reformulated this
incomplete technique for an operational setting and offered a complete—albeit
theoretically more-expensive—technique capable of more precision. Johnson et
al. [7] unified these techniques within an expanded framework. Darais et al. [1]
then showed that the Abstracting Definitional Interpreters-approach—currently
the state of the art—is compatible with the complete technique by including the
set of stack root addresses as a component in the evaluation configuration.

Context Irrelevance These techniques indeed reconcile the conflicting needs
of GC and stack-precise control yielding an analysis which enjoys the precision-
enhancing benefits of each. However, the addition of garbage collection causes
the resultant analysis to violate context irrelevance [8], the property that the
evaluation of a configuration is independent of its continuation. In terms of
the concrete semantics of Section 3.2, context irrelevance is the property that
ev(e, ρ, σ, κ, t) →+ ap(σ′, κ, v) if and only if ev(e, ρ, σ, κ′, t) →+ ap(σ′, κ′, v) for any κ and κ′.
The incomplete and complete techniques to achieve stack-precise abstract GC
each violate context irrelevance. Under the incomplete technique, abstract GC
prevents spurious paths from being explored and changes the store yielded by
those that are explored. Thus, the abstract evaluation of a configuration becomes
dependent on (the root set of reachable addresses embedded in) its continuation.
The complete technique, achieved by introducing the set of root addresses as a
component in the evaluation configuration, vacuously restores context irrelevance
by distinguishing otherwise-identical configurations based on the continuation.
That is, the states ev(e, ρ, σ, κ, t) and ev(e, ρ, σ, κ′, t) with identical configurations but distinct continuations become the continuation-less evaluation configurations ev(e, ρ, σ, A, t) and ev(e, ρ, σ, A′, t) with distinct root address sets A and A′. This
address set is a close approximation of the continuation and effectively makes
the control context relevant to evaluation.

3.5 The k-CFA Context Abstraction

In the concrete semantics, the time component t serves two purposes. The first
purpose is to provide the allocator with a source of freshness, so that when the
allocator must furnish a heap cell for a variable bound previously in execution, it
is able to furnish a distinct one. Were freshness the only constraint on t, the Time
domain could simply consist of ℕ. In anticipation of its role in the downstream
CFA, the time component assumes a second purpose which is to capture some
notion of the context in which execution is occurring. The hope is that the notion
of context it captures is semantically meaningful so that, when an unbounded
set of times are identified by the process of abstraction, each address, which
is qualified by such an abstracted time, locates a semantically-coherent set of
values.
To get a better idea of what notion of context our treatment of time cap-
tures, let’s examine how our concrete semantics treats time, as dictated by k-
CFA. Time begins as the empty sequence ε. It is passed unchanged across all
Eval transitions, save one, and the Apply transition. The exception is the Call
transition, which instead passes the (at-most-)k-length prefix of the application
prepended to the incoming time. Hence, the k-CFA context abstraction is the
k-most-recent calls made in execution history.
In Section 6.2, we consider the ramifications of threading the time component
through evaluation and compare it to an alternative treatment.

4 From Threaded to Compositional Stores

In this section, we present a series of four semantics that gradually transition
from a threaded treatment of stores without GC to a compositional treatment of
stores with GC. We define each of these semantics in terms of big-step judgments
of (or close to) the form σ, ρ, t ⊢ e ⇓ (v, σ′). This judgment expresses that the evaluation configuration consisting of the expression e under the store σ, environment ρ, and timestamp t evaluates to the evaluation result consisting of the value v and the store σ′. When discussing the evaluation of e, we will refer to σ as the incoming store and σ′ as the resultant store. We will also refer to
the time component t as the binding context since, in the big-step semantics, its
connection to the history of execution becomes more distant.
Formulating our semantics in big-step style offers two advantages to our set-
ting: First, we can readily express them by big-step definitional interpreters at
which point we can apply systematic abstraction techniques [1, 18] to obtain
corresponding CFAs exhibiting perfect stack precision. Second, they emphasize
the availability of the configuration store at the delivery point of the evalua-
tion result; this availability is crucial to our ability to shift to a compositional
treatment of the store.

4.1 Threaded-Store Semantics

To orient ourselves to the big-step setting, we present the reference semantics for
our language in big-step style in Figure 2. This reference semantics is equivalent
to the reference semantics given in small-step style in Section 3.2 except that
there is no corresponding Apply rule; its responsibility—to deliver a value to
a continuation—is handled implicitly by the big-step formulation. In terms of
big-step semantics, this reference semantics is characterized by the threading of
the store through each rule; the resultant store of evaluation is the configuration
store plus the allocation and mutation incurred during evaluation. Hence, we
refer to this semantics as the threaded-store semantics. We use natural numbers
as store subscripts in each rule to emphasize the store’s monotonic increase.

Let
    σ0, ρ, t ⊢ ce ⇓ (v0, σ1)
    ρ′ = ρ[x → t]    σ2 = σ1[(x, t) → v0]    σ2, ρ′, t ⊢ e ⇓ (v, σ3)
    σ0, ρ, t ⊢ let x = ce in e ⇓ (v, σ3)

Call
    ((λx.e, ρ0), σ1) = aeval(σ0, ρ, ae0)
    (v1, σ2) = aeval(σ1, ρ, ae1)    t′ = (ae0 ae1) :: t
    ρ1 = ρ0[x → t′]    σ3 = σ2[(x, t′) → v1]    σ3, ρ1, t′ ⊢ e ⇓ (v, σ4)
    σ0, ρ, t ⊢ (ae0 ae1) ⇓ (v, σ4)

Set!
    (v, σ1) = aeval(σ0, ρ, ae)    σ2 = σ1[(x, ρ(x)) → v]
    σ0, ρ, t ⊢ set! x ae ⇓ ((λx.x, ⊥), σ2)

Atomic
    σ, ρ, t ⊢ ae ⇓ aeval(σ, ρ, ae)

Fig. 2. The threaded-store semantics

A program pr is evaluated in an initial configuration with an empty store ⊥, an empty environment ⊥, and an empty binding context ε. In such a configuration, pr evaluates to a value v if ⊥, ⊥, ε ⊢ pr ⇓ (v, σ).
The Let rule evaluates the bound call expression ce under the incoming
environment and store. If evaluation results in a value–store pair, this incoming
environment is extended with a binding derived from the bound variable and
incoming binding context.7 The resultant store is extended with mapping from
that binding to the resultant value. The body expression is evaluated under
the extended environment and store and its result becomes that of the overall
expression.
Contrasting the treatment of the environment and the store by the Let rule
is instructive. On the one hand, the environment is treated compositionally: the
incoming environment of evaluation is restored and extended after evaluation of
the bound value. On the other hand, the store is treated non-compositionally:
the store resulting from the evaluation of the bound expression is extended after
it has accumulated the effects of its evaluation.
By this criterion, we classify the treatment of the binding context as compo-
sitional rather than threaded. This compositional treatment departs from typical
practice of CFA and is the first such treatment in a stack-precise CFA to our
knowledge. In Section 6.2, we examine the ramifications of this treatment.
The Call rule evaluates the atomic expressions ae0 and ae1 for the operator and argument, respectively. It then derives a new binding context, extends the
environment and store with a binding using that context, and evaluates the oper-
ator body under the extended environment, store, and derived binding context.
The result of evaluating the body is that of the overall expression.
The Set! rule evaluates the atomic body expression ae and updates the
binding of the referenced variable in the store. Its result is the identity function
paired with the updated store.
The Atomic rule evaluates an atomic expression ae using the aeval atomic
evaluation metafunction. Foreshadowing the succeeding semantics, we define
aeval to return a pair of its calculated value and the given store. In this seman-
tics, the store is passed through unmodified; in forthcoming semantics, it will be
altered according to the calculated value. Atomic evaluation is unchanged from
the small-step semantics:

aeval(σ, ρ, x) = (σ(x, ρ(x)), σ)        aeval(σ, ρ, λx.e) = ((λx.e, ρ|λx.e), σ)
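As an illustration, this judgment can be read as a (partial) evaluation function. The sketch below reuses the earlier types and the single-valued aeval of the small-step sketch, pairing the store explicitly; Set! is elided.

eval :: Store -> Env -> Time -> Exp -> (Val, Store)
eval s rho t (Let x ce e) =                   -- Let: bind at the incoming context t
  let (v0, s1) = evalC s rho t ce
  in eval (Map.insert (x, t) v0 s1) (Map.insert x t rho) t e
eval s rho t (CE ce) = evalC s rho t ce

evalC :: Store -> Env -> Time -> CExp -> (Val, Store)
evalC s rho _ (AE ae) = (aeval s rho ae, s)   -- Atomic: the store passes through
evalC s rho t (App ae0 ae1) =                 -- Call: bind the argument at t'
  let Clo x e rho0 = aeval s rho ae0
      v1 = aeval s rho ae1
      t' = App ae0 ae1 : t
  in eval (Map.insert (x, t') v1 s) (Map.insert x t' rho0) t' e

Note how the store is threaded: the store resulting from the bound expression flows into the evaluation of the body, while the environment and binding context are saved and restored.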

4.2 Threaded-Store Semantics with Effect Log

The second semantics enhances the reference semantics with an effect log ξ which
explicitly records the allocation and mutation that occurs through evaluation.
The effect log is considered part of the evaluation result; accordingly the effect log
semantics are in terms of judgments of the form σ, ρ, t ⊢ e ⇓! (v, σ′), ξ. Figure 3
presents the effect log semantics, identical to the reference semantics except for
(1) the addition of the effect log and (2) the use of the metavariable a to denote
an address (x, t). (This usage persists in all subsequent semantics as well.)
The effect log is represented by a function from store to store. The definition
of each log is given by either a literal identity function, a use of the extendlog
7 Because the program is alphatised, the binding of a let-bound variable in a particular
calling context will not interfere with the binding of any other variable.
Let
    σ0, ρ, t ⊢ ce ⇓! (v0, σ1), ξ0
    ρ′ = ρ[x → t]    σ2 = σ1[(x, t) → v0]    σ2, ρ′, t ⊢ e ⇓! (v, σ3), ξ1
    σ0, ρ, t ⊢ let x = ce in e ⇓! (v, σ3), ξ1 ◦ extendlog((x, t), v0, σ1) ◦ ξ0

Call
    ((λx.e, ρ0), σ1) = aeval(σ0, ρ, ae0)
    (v1, σ2) = aeval(σ1, ρ, ae1)    t′ = (ae0 ae1) :: t
    ρ1 = ρ0[x → t′]    σ3 = σ2[(x, t′) → v1]    σ3, ρ1, t′ ⊢ e ⇓! (v, σ4), ξ
    σ0, ρ, t ⊢ (ae0 ae1) ⇓! (v, σ4), ξ ◦ extendlog((x, t′), v1, σ2)

Set!
    (v, σ1) = aeval(σ0, ρ, ae)
    a = (x, ρ(x))    σ2 = σ1[a → v]
    σ0, ρ, t ⊢ set! x ae ⇓! ((λx.x, ⊥), σ2), extendlog(a, v, σ2)

Atomic
    σ, ρ, t ⊢ ae ⇓! aeval(σ, ρ, ae), λσ.σ

Fig. 3. Threaded-store semantics with an effect log

metafunction, or the composition of effect logs. The extendlog metafunction is defined

extendlog(a, v, σ′) = λσ.σ[a → v] ∪ σ′

where the union of the extended store σ[a → v] and the value-associated store σ′ treats each store extensionally as a set of pairs but the result is always a function—i.e. any given address is paired with at most one value. The effect
log of the Atomic rule is the identity function, reflecting that no allocation or
mutation is performed when evaluating an atomic expression. The effect log of
the Set! rule is constructed by the metafunction extendlog ; the store argument
to extendlog is the store after the mutation has occurred. The use of this store is
necessary to propagate the mutative effect and ensures that its union with the
store on which this log is replayed agrees on all common bindings. The effect log
of the Call rule is composed of the effect log of evaluation of the body and an
entry for the allocation of the bound variable. Finally, the effect log of the Let
rule is composed of the effect logs of evaluation of both the body and binding
expression interposed by an entry for the allocation of the bound variable.
In this semantics (and the next), the bindings in σ′ are redundant: once extendlog applies the mutative or allocative binding to its argument σ, σ already contains all the bindings of σ′. Once we introduce GC to the semantics, however, this will no longer be the case.
The intended role of the effect log is captured by the following lemma, which
states that one may obtain the resultant store by applying the resultant log to
the initial store of evaluation.
Lemma 1. If σ, ρ, t ⊢ e ⇓! (v, σ′), ξ, then σ′ = ξ(σ).

The proof proceeds straightforwardly by induction on the judgment’s derivation.
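Concretely, an effect log can be rendered as a store transformer, with extendlog exactly as in the figure; a sketch under the earlier types follows.

type Log = Store -> Store

emptyLog :: Log                               -- the log of an atomic evaluation
emptyLog = id

extendLog :: Address -> Val -> Store -> Log   -- extendlog(a, v, σ′)
extendLog a v sigma' = \sigma -> Map.insert a v sigma `Map.union` sigma'

The left bias of Map.union is harmless here: σ[a → v] and σ′ agree on all common bindings, as noted above.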

4.3 Compositional-Store Semantics

The third semantics (seen in Figure 4) shifts the previous semantics from thread-
ing the store to treating it compositionally. Under this treatment, evaluation
results still consist of a value, store, and effect log, but the store is associated
directly to the value—at least conceptually—and not treated as a global effect
repository. This alternative role is particularly apparent in the Let rule: the
store resulting from evaluation of the bound expression is not extended to be
used as the initial store of evaluation of the body. Instead, the effect log resulting
from evaluation of the bound expression is applied to the initial store (of the
overall let expression). We emphasize this compositional treatment by no longer
using numeric subscripts, which suggest “evolution” of the store, and instead
using ticks, which suggest distinct (but related) instances.

Let
    σ, ρ, t ⊢ ce ⇓◦ (v′, σv′), ξ′    σ′ = ξ′(σ)
    (ρ′, σ′′) = extend(ρ, σ′, x, t, v′, σv′)
    σ′′, ρ′, t ⊢ e ⇓◦ (v, σv), ξ
    σ, ρ, t ⊢ let x = ce in e ⇓◦ (v, σv), ξ ◦ extendlog((x, t), v′, σv′) ◦ ξ′

Call
    ((λx.e, ρ0), σ0) = aeval(σ, ρ, ae0)    (v1, σ1) = aeval(σ, ρ, ae1)    t′ = (ae0 ae1) :: t
    (ρ′, σ′) = extend(ρ0, σ0, x, t′, v1, σ1)    σ′, ρ′, t′ ⊢ e ⇓◦ (v, σv), ξ
    σ, ρ, t ⊢ (ae0 ae1) ⇓◦ (v, σv), ξ ◦ extendlog((x, t′), v1, σ1)

Set!
    (v, σv) = aeval(σ, ρ, ae)
    a = (x, ρ(x))    σ′ = σv[a → v]
    σ, ρ, t ⊢ set! x ae ⇓◦ ((λx.x, ⊥), σ′), extendlog(a, v, σ′)

Atomic
    σ, ρ, t ⊢ ae ⇓◦ aeval(σ, ρ, ae), λσ.σ

Fig. 4. The compositional-store semantics

We use the extend metafunction to bind a value v (with an associated store σv) to a variable x in a given binding context t within a given environment ρ and store σ, defined

extend(ρ, σ, x, t, v, σv) = (ρ[x → t], σ[(x, t) → v] ∪ σv)


When we extend σ with a mapping for v, we also copy all of the mappings from
σv . This copying will yield a well-formed store since σ[(x, t) → v] and σv agree
on any common bindings.
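Under the earlier types, extend admits a direct transcription (a sketch of ours):

extend :: Env -> Store -> Var -> Time -> Val -> Store -> (Env, Store)
extend rho sigma x t v sigmaV =
  ( Map.insert x t rho
  , Map.insert (x, t) v sigma `Map.union` sigmaV )  -- copy v's closing store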
Although the role of the store has changed, the same lemma holds in this
semantics as does in the previous. We repeat it in terms of this semantics.
Lemma 2. If σ, ρ, t ⊢ e ⇓◦ (v, σv), ξ, then ξ(σ) = σv.
Like the previous lemma, its proof can be obtained by induction on the
judgment’s derivation.

4.4 Compositional-Store Semantics with Garbage Collection

Our final semantics (seen in Figure 5) continues the compositional treatment of
the store but GCs stores to remove irrelevant bindings. Under this compositional
treatment, the role of the store is to model the fragment of the heap which is
reachable from an associated environment: the store of a configuration closes the
associated environment and the store of a result closes the environment of the
associated value. Accordingly, the root set of reachability used by GC includes
the addresses of the closed environment only and, in particular, does not include
addresses from the continuation. We define reachability just as we did for GC in
Section 3.2, using the rootv and rootρ metafunctions to extract a root set from
a value and environment, respectively.
In this semantics, we use a modified atomic evaluation function aevalgc which
garbage-collects the store associated with a value. It is defined

aevalgc(σ, ρ, x) = (v, gc(v, σ))      where v = σ(x, ρ(x))
aevalgc(σ, ρ, λx.e) = (v, gc(v, σ))   where v = (λx.e, ρ|λx.e)

where gc(v, σ) prunes the unreachable bindings from σ with respect to v.


This semantics is careful to ensure that each evaluation is performed under
a store which contains no values unreachable from the environment via frequent
use of the restrict metafunction. For a given expression e, closing environment
ρ, and closing store σ, the restrict metafunction first determines the restriction
ρ|e of ρ to the free variables of e and then the bindings of σ reachable from ρ|e ;
it then garbage-collects the store by pruning unreachable bindings. Formally,
restrict is defined
restrict(e, ρ, σ) = (ρ|e , gc(ρ|e , σ))
where gc(ρ, σ) prunes the unreachable bindings from σ with respect to ρ.
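A sketch of restrict follows, reusing gcStore and rootsEnv from the earlier GC sketch together with a straightforward free-variable function (our own helper; the paper leaves it implicit).

fv :: Exp -> Set.Set Var
fv (CE ce)      = fvC ce
fv (Let x ce e) = fvC ce `Set.union` Set.delete x (fv e)

fvC :: CExp -> Set.Set Var
fvC (AE ae)     = fvA ae
fvC (App a0 a1) = fvA a0 `Set.union` fvA a1
fvC (Set x ae)  = Set.insert x (fvA ae)

fvA :: AExp -> Set.Set Var
fvA (Ref x)   = Set.singleton x
fvA (Lam x e) = Set.delete x (fv e)

restrict :: Exp -> Env -> Store -> (Env, Store)
restrict e rho sigma = (rho', gcStore (rootsEnv rho') sigma)
  where rho' = Map.filterWithKey (\x _ -> Set.member x (fv e)) rho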
The Let rule proceeds by first obtaining the restriction of the environment and store with respect to the bound expression ce, before evaluating ce under that restriction. The evaluation of ce produces a value v′, an associated store σv′ which closes only that value, and an effect log ξ′. The Let rule then replays the effect log ξ′ on the initial store σ thereby accumulating any mutation (and allocation on which it depends) which occurred. After replaying the log, it extends the resultant store σ′ and initial environment ρ with a binding for v′ and copies
Let
    (ρce, σce) = restrict(ce, ρ, σ)
    σce, ρce, t ⊢ ce ⇓gc (v′, σv′), ξ′    σ′ = ξ′(σ)    (ρ′, σ′′) = extend(ρ, σ′, x, t, v′, σv′)
    (ρe, σe) = restrict(e, ρ′, σ′′)    σe, ρe, t ⊢ e ⇓gc (v, σv), ξ
    σ, ρ, t ⊢ let x = ce in e ⇓gc (v, σv), ξ ◦ extendlog((x, t), v′, σv′) ◦ ξ′

Call
    ((λx.e, ρ0), σ0) = aevalgc(σ, ρ, ae0)    (v1, σ1) = aevalgc(σ, ρ, ae1)
    t′ = (ae0 ae1) :: t    (ρ′, σ′) = extend(ρ0, σ0, x, t′, v1, σ1)
    (ρe, σe) = restrict(e, ρ′, σ′)    σe, ρe, t′ ⊢ e ⇓gc (v, σv), ξ
    σ, ρ, t ⊢ (ae0 ae1) ⇓gc (v, σv), ξ ◦ extendlog((x, t′), v1, σ1)

Set!
    (v, σv) = aevalgc(σ, ρ, ae)
    a = (x, ρ(x))    σ′ = σv[a → v]
    σ, ρ, t ⊢ set! x ae ⇓gc ((λx.x, ⊥), ⊥), extendlog(a, v, σ′)

Atomic
    σ, ρ, t ⊢ ae ⇓gc aevalgc(σ, ρ, ae), λσ.σ

Fig. 5. The compositional-store semantics with garbage collection

the bindings of its associated store σv′. Finally, the extended environment and
store are restricted with respect to the body expression e before e’s evaluation
under them.
The Call rule proceeds by first evaluating the atomic operator and argument
expressions. After calculating the new binding context t′, the operator value
environment and store are extended with the new binding. Before evaluation of
the body e commences, the extended environment and store are restricted with
respect to it.
The Set! rule atomically evaluates the expression ae producing the assigned
value. It returns the identity function which, with an empty environment, is
closed by an empty store.
The Atomic rule evaluates an atomic expression with aevalgc .
To connect this semantics to the previous, we show that the addition of GC
has no semantic effect by the following lemma.
Lemma 3. If σ, ρ, t ⊢ e ⇓∘ (v, σv), ξ and σ′ = gc(ρ|e, σ) then σ′, ρ, t ⊢ e ⇓gc (v, σ′v), ξ where σ′v = gc(v, σv).
In prose, this lemma states that two evaluation configurations, identical ex-
cept that one’s store is the other’s with unreachable bindings pruned, will yield
the same evaluation result: their evaluation will produce the same value and,
modulo unreachable bindings, the same closing store.
5 Abstract Compositional-Store Semantics with Garbage Collection
We now abstract the compositional-store semantics with GC—the final seman-
tics of the preceding section. Abstracting the semantics involves (1) defining a
finite counterpart of each component of the evaluation configuration and result
and (2) defining a counterpart of each semantic rule in terms of these finite
components. With each component of the configuration finite, configurations
themselves become finite. Then we show that each abstracted rule simulates its
counterpart—that it admits the full range of its counterpart’s behavior. Doing
this for each rule ensures that the abstract semantics includes every behavior in-
cluded by the exact semantics. Once that’s complete, we can directly implement
our big-step semantics in an abstract definitional interpreter [1, 18] to obtain our
stack-precise CFA with GC.
We begin by abstracting each configuration component.
    v̂ ∈ V̂al = P(Lam × Ênv)          ρ̂ ∈ Ênv = Var → T̂ime
    t̂ ∈ T̂ime = App≤m                 â ∈ Âddress = Var × T̂ime
    σ̂ ∈ Ŝtore = Âddress → V̂al       ξ̂ ∈ L̂og = Âddress → V̂al
Like its concrete counterpart, an abstract store σ̂ maps an abstract address to an abstract value. Abstract addresses remain a pair of a variable and binding
context, only the context is abstract. An abstract value v̂, however, is a set of
abstract closures rather than a single closure. An abstract closure is a λ paired
with an abstract environment ρ̂ which itself is a finite map from variables to
binding contexts. An abstract timestamp t̂ is a sequence of at most m application
sites, where m is a parameter to the analysis.8 An abstract log ξ̂ is an extensional account of the added and modified store mappings relative to the initial store, and takes the same form as an abstract store itself. We define abstract join, composition, and application operators by
    σ̂0 ⊔ σ̂1 = λâ. σ̂0(â) ∪ σ̂1(â)        ξ̂0 ∘̂ ξ̂1 = ξ̂0 ⊔ ξ̂1        ξ̂(σ̂) = σ̂ ⊔ ξ̂
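As a minimal sketch (ours, in Python, with abstract stores and logs as dictionaries from abstract addresses to sets of abstract closures, an unmapped address denoting the empty set), these three operators are:

    def join(s0, s1):
        # Pointwise union: (σ̂0 ⊔ σ̂1)(â) = σ̂0(â) ∪ σ̂1(â)
        return {a: s0.get(a, set()) | s1.get(a, set())
                for a in s0.keys() | s1.keys()}

    compose = join            # ξ̂0 ∘̂ ξ̂1 = ξ̂0 ⊔ ξ̂1, since logs are extensional

    def apply_log(log, s):
        # ξ̂(σ̂) = σ̂ ⊔ ξ̂: replaying a log joins its entries into the store.
        return join(s, log)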
To help show that the abstract semantics simulates the concrete, we make a connection between the state space of the abstract and that of the concrete. We make this connection by means of a polymorphic abstraction function |·|,9 defined for all domains except stores by

    |ρ| = λx.|ρ(x)|    |t| = ⌊t⌋m    |(λx.e, ρ)| = {(λx.e, |ρ|)}    |ξ| = |ξ(⊥)|

and for stores by

    |σ| = λâ. ⋃{ |σ(a)| : |a| = â }
8 The parameter m is used similarly to the parameter k of k-CFA.
9 The abstraction function is typically accompanied by a complementary concretization function to complete a Galois connection. For simplicity here, we leave it incomplete.
Abstracting a store groups entries by their abstracted address, collecting their abstracted values into a set. Abstracting an environment ρ abstracts its range. Abstracting a binding context t takes its at-most-m-length prefix. Abstracting a closure produces a singleton of that closure with an abstracted environment. Finally, abstracting a log ξ produces the abstract store that results from applying the log to the empty store ⊥ and then abstracting.
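A minimal sketch (ours, in Python; the names and the dictionary representation are assumptions) of the abstraction function on stores, grouping concrete entries by abstracted address:

    M = 2  # the analysis parameter m

    def abs_time(t):
        # |t| = the at-most-m-length prefix of t
        return tuple(t)[:M]

    def abs_env(rho):
        # |ρ| abstracts the range of the environment.
        return tuple(sorted((x, abs_time(t)) for x, t in rho.items()))

    def abs_value(v):
        # A closure abstracts to a singleton set.
        lam, rho = v
        return {(lam, abs_env(rho))}

    def abs_store(store):
        # Entries with the same abstracted address |(x, t)| = (x, |t|)
        # are grouped, joining their abstracted values.
        out = {}
        for (x, t), v in store.items():
            a_hat = (x, abs_time(t))
            out[a_hat] = out.get(a_hat, set()) | abs_value(v)
        return out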
Figure 6 defines the abstract compositional-store semantics with garbage
collection. Structurally, nearly every rule is identical to the exact counterpart
that it abstracts; most of the work of abstraction is defining the abstract domains
and metafunctions and connecting them to those of the exact semantics. The
Call rule differs structurally from its exact counterpart in two notable ways:
First, because an abstract value is a set of closures, it applies for each such
closure in the operator set. Second, it defines the new binding context t̂′ to be the prefix of the application site prepended to the previous abstract time t̂, limited to a length of at most m. The abstract âeval metafunction is defined

    âeval(σ̂, ρ̂, x) = (v̂, ĝc(v̂, σ̂))       where v̂ = σ̂(x, ρ̂(x))
    âeval(σ̂, ρ̂, λx.e) = (v̂, ĝc(v̂, σ̂))    where v̂ = {(λx.e, ρ̂|λx.e)}

We omit the straightforward definitions of the abstract variants ĝc, r̂estrict, and êxtend.
Let
    (ρ̂ce, σ̂ce) = r̂estrict(ce, ρ̂, σ̂)
    σ̂ce, ρ̂ce, t̂ ⊢ ce ⇓̂ (v̂′, σ̂′v), ξ̂′    σ̂′ = ξ̂′(σ̂)    (ρ̂′, σ̂″) = êxtend(ρ̂, σ̂′, x, t̂, v̂′, σ̂′v)
    (ρ̂e, σ̂e) = r̂estrict(e, ρ̂′, σ̂″)    σ̂e, ρ̂e, t̂ ⊢ e ⇓̂ (v̂, σ̂v), ξ̂
    ───────────────────────────────────────────────────────────────────────
    σ̂, ρ̂, t̂ ⊢ let x = ce in e ⇓̂ (v̂, σ̂v), ξ̂ ∘̂ ξ̂′

Call
    (v̂0, σ̂0) = âeval(σ̂, ρ̂, ae0)    (λx.e, ρ̂0) ∈ v̂0    (v̂1, σ̂1) = âeval(σ̂, ρ̂, ae1)
    t̂′ = ⌊(ae0 ae1) :: t̂⌋m    (ρ̂′, σ̂′) = êxtend(ρ̂0, σ̂0, x, t̂′, v̂1, σ̂1)
    (ρ̂e, σ̂e) = r̂estrict(e, ρ̂′, σ̂′)    σ̂e, ρ̂e, t̂′ ⊢ e ⇓̂ (v̂, σ̂v), ξ̂
    ───────────────────────────────────────────────────────────────────────
    σ̂, ρ̂, t̂ ⊢ (ae0 ae1) ⇓̂ (v̂, σ̂v), ξ̂

Set!
    (v̂, σ̂v) = âeval(σ̂, ρ̂, ae)    (_, ξ̂) = êxtend(⊥, ⊥, x, ρ̂(x), v̂, σ̂v)
    ───────────────────────────────────────────────────────────────────────
    σ̂, ρ̂, t̂ ⊢ set! x ae ⇓̂ ({(λx.x, ⊥)}, ⊥), ξ̂

Atomic
    ───────────────────────────────────────────────────────────────────────
    σ̂, ρ̂, t̂ ⊢ ae ⇓̂ âeval(σ̂, ρ̂, ae), ⊥

Fig. 6. The abstract compositional-store semantics with garbage collection
As a final step before we establish the simulation relationship, we define an ordering on stores (and logs, extending it in the natural way):

    σ̂0 ⊑ σ̂1 ⇔ ∀â ∈ Âddress. σ̂0(â) ⊆ σ̂1(â)        v̂0 ⊑ v̂1 ⇔ v̂0 ⊆ v̂1
We formally connect this abstract semantics with the concrete compositional-
store semantics given in Section 4.4 by the following abstraction theorem.
Theorem 1. If |σ| ⊑ σ̂ and |ρ| = ρ̂ and |t| = t̂ and σ, ρ, t ⊢ e ⇓gc (v, σv), ξ, then σ̂, ρ̂, t̂ ⊢ e ⇓̂ (v̂, σ̂v), ξ̂, where |v| ⊑ v̂ and |σv| ⊑ σ̂v and |ξ| ⊑ ξ̂.

This theorem states that if the configuration components are related by ab-
straction, then, for any given derivation in the exact semantics, there is an deriva-
tion in the abstract semantics which yields an abstraction of its results. It can
be proved by induction on the derivation.

6 Discussion
Now we examine the ramifications of a compositional treatment of analysis com-
ponents. We do so in turn, first considering the ramifications of treating the store
compositionally and then of treating the time compositionally.

6.1 The Effects of Treating the Store Compositionally


We saw in Section 4.3 that a semantics could treat stores compositionally without
employing GC. In this case, the caller’s store and callee’s final store agreed on
common entries and combining them produced the same store as the threaded-
store semantics. However, the compositional machinery liberates evaluation from
the stack. With evaluation so-liberated, GC need not preserve any heap data
reachable solely from the stack. This relaxation
1. simplifies GC and increases its effectiveness;
2. leads to general yet precise summaries; and
3. restores context irrelevance under GC.
We discuss each of these aspects in more detail.

Simplified and More-Effective Garbage Collection Classical abstract GC
and its succeeding pushdown GC each preserve heap data reachable from both
the local environment and the stack. Once one has determined the root set of
reachable addresses from these two components, it determines the transitive
closure of reachability. When GC is performed with respect to only the local
environment, both the initial root set and its transitive closure are smaller and
it requires less work to calculate them. If the CFA employs incomplete garbage
collection [8], the garbage collector is also freed from calculating the root set
of stack addresses as a fixed point. A smaller transitive closure of reachable
addresses is not only less costly to calculate but also leads to more collected
garbage.

General Yet Precise Summaries A stack-precise CFA without GC will
falsely distinguish abstract evaluations of the same call which are identical mod-
ulo GC-able heap data. In such cases, the addition of pushdown GC will allow the
CFA to identify them. However, even with pushdown GC, a stack-precise CFA
will falsely distinguish abstract evaluations of the same call which are identical
modulo continuation-reachable heap data. On the other hand, compositional GC
soundly disregards such data and thereby identifies such evaluations.
Compositional GC is able to achieve this feat because it calculates the frag-
ments of the heap reachable from the local environment alone. Since this envi-
ronment is restricted to the free variables of the expression it closes, the resultant
heap fragment includes a tight overapproximation of the actually-relevant heap
data. One effect is that evaluation summaries—the association of an evaluation
configuration with its results—are general yet precise. They are general since,
with a minimum of irrelevant heap data, more contexts are consistent with them.
They are precise since, with a minimum of irrelevant heap data, they are less
likely to allocate an entry at an existing address. In fact, the precision of com-
positional GC dominates that of pushdown GC.

Restored Context Irrelevance A semantics determines which parts of a given
configuration are relevant to its evaluation [8]. When the continuation is irrel-
evant to evaluation, the semantics exhibits the property of context irrelevance.
Context irrelevance is an intuitive property: unless our semantics has control
effects or some other explicit dependence, we would be surprised if a configu-
ration’s continuation was relevant to its evaluation. Even a concrete semantics
with GC exhibits context irrelevance since data reachable from the stack alone
will not affect the result of evaluation. In an abstract semantics with GC, how-
ever, where new allocations can occur at old addresses, the presence of data
reachable from the stack alone can affect evaluation. The set of data preserved
by GC, which determines how evaluation is affected, is itself determined by the
continuation. Thus, an abstract semantics in which GC is defined with respect
to the stack violates context irrelevance.
Put this way, it is clear why compositional GC restores context irrelevance to
the semantics: it removes the dependence on the stack from GC itself and allows
all data reachable from the stack alone to be collected. This restoration makes
evaluation easier to reason about and increases the effectiveness of memoization.

6.2 The Effect of Treating the Time Compositionally

The k-CFA context abstraction consists of a sequence of k call sites—for each
point in execution, the last k call sites encountered. In Section 3.5, we discussed
how the last-k-call-sites abstraction arose as a consequence of the semantics
threading the abstract time (i.e. the context) through execution.
In contrast, the big-step, concrete semantics of Section 4 and the big-step,
abstract semantics of Section 5 didn’t thread the abstract time through execution
but treated it compositionally, installing a new time at a call but restoring the

previous time at the corresponding return. This treatment of time induces a
different notion of context than k-CFA; instead of yielding the last-k call sites,
it yields the top-m stack frames.
This top-m-stack-frames context abstraction is not novel and originates with
m-CFA [11], a family of polynomial-time CFAs. However, to our knowledge, its
appearance here is its first in a stack-precise setting: many stack-precise CFAs
encode context using other means than a time component (or don’t use context
in the first place) [16, 3, 1]; still others achieve the last-k-call-sites abstraction,
incidentally or intentionally [4, 18].
Using the top-m stack frames to qualify heap allocation has certain advan-
tages over using the last-k call sites; in particular, its power to distinguish bindings
is not diluted by static call sequences. To see how k-CFA’s and m-CFA’s context
abstractions compare, let’s consider a few examples.
First, consider a [k = 2]CFA of the program

(define (f x) x)
(define (g y) (f y))
(g 42)
(g 35)

the abstract resource 42 is allocated in the heap twice—first when the call to
g is made and second when the call to f is made. At the point of the second
allocation, the two most-recently-encountered call sites in evaluation are (f y)
and (g 42); hence, these call sites are used to qualify the binding of 42 to x in
the heap. The treatment of the abstract resource 35 is similar except its second
allocation is qualified by (f y) and (g 35). For this program, [k = 2]CFA is
able to keep the two allocations distinct.
Next, consider a [k = 2]CFA of the similar program

(define (f x) x)
(define (g y)
  (displayln y)
  (f y))
(g 42)
(g 35)

which includes the call (displayln y) in the body of g. As in the previous
program, the analysis of this program allocates the abstract resources 42 and 35
twice each. However, in this program, the second of each of their allocations is
qualified by (f y) and (displayln y). In fact, every call to f made via g will
occur in that same context. In a sense, the static sequence of (displayln y)
and (f y) eats up the context budget ensuring that the analysis conflates all
bindings made at the call (f y). (Incrementing k would remove the conflation
in this example, but it makes the analysis more expensive and such a strategy
can always be confounded by a longer “static” trace of calls.)
In contrast, consider an [m = 2]CFA of the same program. Because the
context consists of the top two stack frames, the allocation of 42 is qualified by

(f y) and (g 42) and the allocation of 35 is qualified by (f y) and (g 35).
Because the second stack frame of each allocation is distinct, [m = 2]CFA is able
to keep the bindings distinct in the analysis.
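The contrast is visible in how each abstraction evolves the context; the following minimal sketch (ours, in Python, with k = m = 2 and call sites written as strings) replays the program above:

    K = M = 2

    def kcfa_tick(site, t):
        # k-CFA: record every call site encountered, keep the last K.
        return ((site,) + t)[:K]

    t = ()
    t = kcfa_tick('(g 42)', t)         # ('(g 42)',)
    t = kcfa_tick('(displayln y)', t)  # ('(displayln y)', '(g 42)')
    t = kcfa_tick('(f y)', t)          # ('(f y)', '(displayln y)')
    # (g 42) has been evicted: the call (f y) made via (g 35) yields the
    # same context, so the two bindings of x are conflated.

    # m-CFA: the context is the top M stack frames; the frame for
    # (displayln y) was popped when that call returned.
    stack = ['(f y)', '(g 42)']
    print(tuple(stack)[:M])            # ('(f y)', '(g 42)'), distinct from (g 35)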
The top-m-stack-frames context abstraction is itself susceptible to deep nests
of calls which serve only to pass parameters: if the nesting depth exceeds m, then
the analysis will conflate the bindings made by the innermost calls. And, as with
k-CFA, an increased m can always be confounded by a deeper nesting. In spite
of that, the m-CFA context abstraction has been shown to work well relative to
k-CFA in practice in a stack-imprecise setting where variables are aggressively
re-bound [11]. Future work is needed to verify that its advantages carry over to
a stack-precise setting.

7 Related Work

Broadly, this work is an instance of abstract interpretation and, more specifically,
of control-flow analysis (CFA) [9, 14]. It inherits from the Abstracting Abstract
Machines methodology [15] of systematically deriving CFAs from purely opera-
tional specifications. More specifically, this work is an instance of stack-precise
CFA which is preceded by many variations [16, 3, 8, 6, 12, 1, 18].
Might and Shivers [10] first introduced GC to CFA. Reconciling GC with
stack-precise CFAs has been the focus of significant effort. Earl et al. [4] intro-
duced the first technique to do so which approximated the set of frames
that could be on any possible stack at any given control point. Johnson and Van
Horn [8] cast this technique into a more operational framework and considered
a more-precise variant in which a control point splits for each possible stack
with its heap being collected with respect to that stack alone. Johnson et al. [7]
unified these previous two works in one formal framework. Darais et al. [1] show
that the Abstracting Definitional Interpreters approach easily accommodates
abstract GC by introducing a machine component which contains the addresses
embedded in stack frames; this realization of GC amounts essentially to the fully-
precise technique. Our work sidesteps the need for all of this previous effort by
decomposing the heap into continuation-independent fragments.
A significant concept in the work of Johnson and Van Horn [8] is context
irrelevance, the property that the evaluation of a configuration is independent
of its continuation, and they note that the approximate abstract GC technique
introduced by Earl et al. [4] violates context irrelevance. Once again, the in-
dependence of GC from the stack under our technique sidesteps these issues;
evaluation under our technique exhibits context irrelevance effortlessly.
As part of the resolution of an apparent paradox regarding the complexities of
object-oriented k-CFA and functional k-CFA, Might et al. [11] develop m-CFA,
a stack-imprecise, polynomial-time family of CFA that employs the top-m stack
frames as a context abstraction as opposed to the last-k call sites of k-CFA. They
show that this abstraction is more resilient against approximation in the face
of the aggressive rebinding that m-CFA effects. Our treatment of the abstract
time component induces this same top-m-stack-frames context abstraction but
in a stack-precise setting, the first such appearance in the literature, to our
knowledge.
Although not inspired by it, our work surprisingly shares much of the per-
spective and approach of the work of Dillig et al. [2] to verify C and C++
programs. In particular, both works employ a compositional approach to analy-
sis by producing evaluation summaries and decompose the heap to support their
approach. In addition, both works have some notion of propagation of summary
effects: theirs is a summary transfer function; ours is an effect log. In contrast,
our work does not produce summaries in a bottom-up fashion and is targeted to-
ward explicitly higher-order languages with effects. Interesting future work could
explore whether any precision-enhancing techniques of Dillig et al. [2] could be
ported and applied, whether the bottom-up production of summaries is viable,
or whether their general approach can be used for verification in our setting.

8 Conclusion and Future Work

In this paper, we showed that treating the heap compositionally in a stack-precise


CFA removes its dependence on the stack, at once simplifying GC and increasing
its effectiveness. As a result, the analysis produces more compact and precise
evaluation summaries that are more amenable to reuse. We also showed that
treating the time component compositionally induces the top-m-stack-frames
context abstraction of m-CFA. Unlike k-CFA’s last-k-call-sites context abstrac-
tion, m-CFA’s need not devote any precision to static call sequences.
Interestingly, the notion of context shared by k-CFA and m-CFA—calling
context, roughly—seems to be at odds with summary reuse. In a stack-precise
1CFA (which exhibits the same context abstraction whether it is [k = 1]CFA or
[m = 1]CFA), the syntactic call site of the caller is encoded in the summary of
the callee, preventing the summary’s reuse at any other call site. If this tension
is fundamental, it may be worth looking to alternative notions of context—extant
and novel.
The complement to abstract GC is abstract counting [10] which keeps track
of the number of concrete resources that correspond to an abstract resource
and enables certain abstract transitions, such as a strong store update. If an
abstract counting can be applied to heap fragments such that the overlap among
fragments is accounted for correctly, it might be possible to detect opportunities
to perform strong updates to heap bindings which would further increase the
precision of our technique.
Finally, Darais et al. [1] consider a particular value abstraction in which
primitive operations propagate imprecision but do not introduce it. Their ab-
straction suggests a generalization in which each “basic block” is analyzed at full
precision and imprecision occurs only at the join points of control flow. CFA2’s
stack environments capture an aspect of this generalization and it appears our
technique does as well. However, a focused investigation would reveal whether
such a generalization can be more-fully realized.
References
1. Darais, D., Labich, N., Nguyen, P.C., Van Horn, D.: Abstracting definitional in-
terpreters (functional pearl). Proceedings of the ACM on Programming Languages
1(ICFP), 12:1–12:25 (Aug 2017). https://doi.org/10.1145/3110256
2. Dillig, I., Dillig, T., Aiken, A., Sagiv, M.: Precise and compact modular pro-
cedure summaries for heap manipulating programs. In: Proceedings of the
32nd ACM SIGPLAN Conference on Programming Language Design and Im-
plementation. pp. 567–577. PLDI ’11, ACM, New York, NY, USA (2011).
https://doi.org/10.1145/1993498.1993565
3. Earl, C., Might, M., Van Horn, D.: Pushdown control-flow analysis of higher order
programs. Workshop on Scheme and Functional Programming (2010)
4. Earl, C., Sergey, I., Might, M., Van Horn, D.: Introspective pushdown analysis of
higher-order programs. In: Proceedings of the 17th ACM SIGPLAN International
Conference on Functional Programming. pp. 177–188. ICFP ’12, ACM, New York,
NY, USA (Sep 2012). https://doi.org/10.1145/2364527.2364576
5. Flanagan, C., Sabry, A., Duba, B.F., Felleisen, M.: The essence of compiling with
continuations. In: Proceedings of the ACM SIGPLAN 1993 Conference on Pro-
gramming Language Design and Implementation. pp. 237–247. PLDI ’93, ACM,
New York, NY, USA (1993). https://doi.org/10.1145/155090.155113
6. Gilray, T., Lyde, S., Adams, M.D., Might, M., Van Horn, D.: Pushdown control-
flow analysis for free. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT
Symposium on Principles of Programming Languages. pp. 691–704. POPL ’16,
ACM, New York, NY, USA (Jan 2016). https://doi.org/10.1145/2837614.2837631
7. Johnson, J.I., Sergey, I., Earl, C., Might, M., Van Horn, D.: Pushdown flow analysis
with abstract garbage collection. Journal of Functional Programming 24, 218–283
(May 2014). https://doi.org/10.1017/s0956796814000100
8. Johnson, J.I., Van Horn, D.: Abstracting abstract control. In: Proceedings of the
10th ACM Symposium on Dynamic Languages. pp. 11–22. DLS ’14, ACM, New
York, NY, USA (Oct 2014). https://doi.org/10.1145/2661088.2661098
9. Jones, N.D.: Flow analysis of lambda expressions. In: International Colloquium on
Automata, Languages, and Programming. pp. 114–128. Springer (1981)
10. Might, M., Shivers, O.: Improving flow analyses via ΓCFA: abstract garbage collec-
tion and counting. In: Proceedings of the Eleventh ACM SIGPLAN International
Conference on Functional Programming. pp. 13–25. ICFP ’06, ACM, New York,
NY, USA (Sep 2006). https://doi.org/10.1145/1159803.1159807
11. Might, M., Smaragdakis, Y., Van Horn, D.: Resolving and exploiting the k-CFA
paradox: illuminating functional vs. object-oriented program analysis. In: Proceed-
ings of the 31st ACM SIGPLAN Conference on Programming Language Design and
Implementation. pp. 305–315. PLDI ’10, ACM, New York, NY, USA (Jun 2010).
https://doi.org/10.1145/1806596.1806631
12. Peng, F.: h-CFA: A simplified approach for pushdown control flow analysis. Mas-
ter’s thesis, The University of Wisconsin-Milwaukee (2016)
13. Reynolds, J.C.: Definitional interpreters for higher-order programming languages.
Higher-Order and Symbolic Computation 11(4), 363–397 (1998)
14. Shivers, O.: Control-Flow Analysis of Higher-Order Languages. Ph.D. thesis,
Carnegie Mellon University, Pittsburgh, PA, USA (1991)
15. Van Horn, D., Might, M.: Abstracting abstract machines. In: Proceedings
of the 15th ACM SIGPLAN International Conference on Functional Pro-
gramming. pp. 51–62. ICFP ’10, ACM, New York, NY, USA (Sep 2010).
https://doi.org/10.1145/1863543.1863553
16. Vardoulakis, D., Shivers, O.: CFA2: A context-free approach to control-flow anal-
ysis. In: Gordon, A.D. (ed.) Programming Languages and Systems. pp. 570–589.
Springer Berlin Heidelberg, Berlin, Heidelberg (2010)
17. Vardoulakis, D., Shivers, O.: CFA2: a context-free approach to control-
flow analysis. Logical Methods in Computer Science 7(2) (2011).
https://doi.org/10.2168/LMCS-7(2:3)2011
18. Wei, G., Decker, J., Rompf, T.: Refunctionalization of abstract abstract machines:
bridging the gap between abstract abstract machines and abstract definitional in-
terpreters (functional pearl). Proceedings of the ACM on Programming Languages
2(ICFP), 105:1–105:28 (Jul 2018). https://doi.org/10.1145/3236800
SMT-Friendly Formalization of the Solidity Memory Model

Ákos Hajdu1 and Dejan Jovanović2

1 Budapest University of Technology and Economics, Budapest, Hungary
[email protected]
2 SRI International, New York City, USA
[email protected]
Abstract. Solidity is the dominant programming language for Ethereum
smart contracts. This paper presents a high-level formalization of the So-
lidity language with a focus on the memory model. The presented formal-
ization covers all features of the language related to managing state and
memory. In addition, the formalization we provide is effective: all but few
features can be encoded in the quantifier-free fragment of standard SMT
theories. This enables precise and efficient reasoning about the state of
smart contracts written in Solidity. The formalization is implemented in
the solc-verify verifier and we provide an extensive set of tests that
covers the breadth of the required semantics. We also provide an evalu-
ation on the test set that validates the semantics and shows the novelty
of the approach compared to other Solidity-level contract analysis tools.

1 Introduction
Ethereum [32] is a public blockchain platform that provides a novel computing
paradigm for developing decentralized applications. Ethereum allows the deploy-
ment of arbitrary programs (termed smart contracts [31]) that operate over the
blockchain state. The public can interact with the contracts via transactions. It
is currently the most popular public blockchain with smart contract functional-
ity. While the nodes participating in the Ethereum network operate a low-level,
stack-based virtual machine (EVM) that executes the compiled smart contracts,
the contracts themselves are mostly written in a high-level, contract-oriented
programming language called Solidity [30].
Even though smart contracts are generally short, they are no less prone
to errors than software in general. In the Ethereum context, any flaws in the
contract code come with potentially devastating financial consequences (such as
the infamous DAO exploit [17]). This has inspired a great interest in applying
formal verification techniques to Ethereum smart contracts (see e.g., [4] or [14] for
surveys). In order to apply formal verification of any kind, be it static analysis or
⋆ The author was also affiliated with SRI International as an intern during this project. Supported by the ÚNKP-19-3 New National Excellence Program of the Ministry for Innovation and Technology.
model checking, the first step is to formalize the semantics of the programming
language that the smart contracts are written in. Such semantics should not
only remain an exercise in formalization, but should preferably be developed with automation in mind, resulting in precise and automated verification tools.
Early approaches to verification of Ethereum smart contracts focused mostly
on formalizing the low-level virtual machine precisely (see, e.g., [11,19,21,22,2]).
However, the unnecessary details of the EVM execution model make it difficult to
reason about high-level functional properties of contracts (as they were written
by developers) in an effective and automated way. For Solidity-level properties
of smart contracts, Solidity-level semantics are preferred. While some aspects
of Solidity have been studied and formalized [23,10,15,33], the semantics of the
Solidity memory model still lacks a detailed and precise formalization that also
enables automation.
The memory model of Solidity has various unusual and non-trivial behaviors,
providing a fertile ground for potential bugs. Smart contracts have access to two
classes of data storage: a permanent storage that is a part of the global blockchain
state, and a transient local memory used when executing transactions. While the
local memory uses a standard heap of entities with references, the permanent
storage has pure value semantics (although pointers to storage can be declared
locally). This memory model that combines both value and reference semantics,
with all interactions between the two, poses some interesting challenges but
also offers great opportunities for automation. For example, the value semantics
of storage ensures non-aliasing of storage data. This can, if supported by an
appropriate encoding of the semantics, potentially improve both the precision
and effectiveness of reasoning about contract storage.
This paper provides a formalization of the Solidity semantics in terms of a
simple SMT-based intermediate language that covers all features related to man-
aging contract storage and memory. A major contribution of our formalization
is that all but few of its elements can be encoded in the quantifier-free fragment
of standard SMT theories. Additionally, our formalization captures the value se-
mantics of storage with implicit non-aliasing information of storage entities. This
allows precise and effective verification of Solidity smart contracts using modern
SMT solvers. The formalization is implemented in the open-source solc-verify
tool [20], which is a modular verifier for Solidity based on SMT solvers. We val-
idate the formalization and demonstrate its effectiveness by evaluating it on a
comprehensive set of tests that exercise the memory model. We show that our
formalization significantly improves the precision and soundness compared to
existing Solidity-level verifiers, while remarkably outperforming low-level EVM-
based tools in terms of efficiency.

2 Background
2.1 Ethereum
Ethereum [32,3] is a generic blockchain-based distributed computing platform.
The Ethereum ledger is a storage layer for a database of accounts (identified
by addresses) and the data associated with the accounts. Every account has
an associated balance in Ether (the native cryptocurrency of Ethereum). In
addition, an account can also be associated with the executable bytecode of a
contract and the contract state.
Although Ethereum contracts are deployed to the blockchain in the form
of the bytecode of the Ethereum Virtual Machine (EVM) [32], they are gener-
ally written in a high-level programming language called Solidity [30] and then
compiled to EVM bytecode. After deployment, the contract is publicly acces-
sible and its code cannot be modified. An external user, or another contract,
can interact with a contract through its API by invoking its public functions.
This can be done by issuing a transaction that encodes the function to be called
with its arguments, and contains the contract’s address as the recipient. The
Ethereum network then executes the transaction by running the contract code
in the context of the contract instance.
A contract instance has access to two different kinds of memory during its
lifetime: contract storage and memory.3 Contract storage is a dedicated data
store for a contract to store its persistent state. At the level of the EVM, it is
an array of 256-bit storage slots stored on the blockchain. Contract data that
fits into a slot, or can be sliced into fixed number of slots, is usually allocated
starting from slot 0. More complex data types that do not fit into a fixed number
of slots, such as mappings, or dynamic arrays, are not supported directly by the
EVM. Instead, they are implemented by the Solidity compiler using storage as a
hash table where the structured data is distributed in a deterministic collision-
free manner. Contract memory is used during the execution of a transaction on
the contract, and is deleted after the transaction finishes. This is where function
parameters, return values and temporary data can be allocated and stored.
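For concreteness, the slots for such structured data are derived by hashing, as documented for the Solidity compiler; the following sketch (ours, in Python, using pycryptodome's keccak; not part of the formalization below, which abstracts these details away) shows the two common cases:

    from Crypto.Hash import keccak

    def keccak256(data: bytes) -> int:
        return int.from_bytes(keccak.new(digest_bits=256, data=data).digest(), 'big')

    def mapping_value_slot(key: int, base_slot: int) -> int:
        # The value for `key` of a mapping at slot p is stored at keccak256(key . p).
        return keccak256(key.to_bytes(32, 'big') + base_slot.to_bytes(32, 'big'))

    def dynamic_array_elem_slot(base_slot: int, index: int) -> int:
        # A dynamic array at slot p keeps its length at p and its data
        # contiguously from keccak256(p).
        return keccak256(base_slot.to_bytes(32, 'big')) + index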
2.2 Solidity

Solidity [30] is the high-level programming language supporting the develop-
ment of Ethereum smart contracts. It is a full-fledged object-oriented program-
ming language with many features focusing on enabling rapid development of
Ethereum smart contracts. The focus of this paper is the semantics of the Solid-
ity memory model: the Solidity view of contract storage and memory, and the
operations that can modify it. Thus, we restrict the presentation to a generous
fragment of Solidity that is relevant for discussing and formalizing the memory
model. An example contract that illustrates relevant features is shown in Fig-
ure 1, and the abstract syntax of the targeted fragment is presented in Figure 2.
We omit parts of Solidity that are not relevant to the memory model (e.g., in-
heritance, loops, blockchain-specific members). We also omit low-level, unsafe
features that can break the Solidity memory model abstractions (e.g., assembly
and delegatecall).
3 There is an additional data location named calldata that behaves the same as memory, but is used to store parameters of external functions. For simplicity, we omit it in this paper.
contract DataStorage {
    struct Record {
        bool set;
        int[] data;
    }

    mapping(address => Record) private records;

    function append(address at, int d) public {
        Record storage r = records[at];
        r.set = true;
        r.data.push(d);
    }

    function isset(Record storage r) internal view returns (bool s) {
        s = r.set;
    }

    function get(address at) public view returns (int[] memory ret) {
        require(isset(records[at]));
        ret = records[at].data;
    }
}
Fig. 1: An example contract illustrating commonly used features of the Solidity memory model. The contract keeps an association between addresses and data and allows users to query and append to their data.

Contracts. Solidity contracts are similar to classes in object-oriented program-
ming. A contract can define any additional types needed, followed by the dec-
laration of the state variables and contract functions, including an optional sin-
gle constructor function. The contract’s state variables define the only persis-
tent data that the contract instance stores on the blockchain. The constructor
function is only used once, when a new contract instance is deployed to the
blockchain. Other public contract functions can be invoked arbitrarily by exter-
nal users through an Ethereum transaction that encodes the function call data
and designates the contract instance as the recipient of the transaction.

Example 1. The contract DataStorage in Figure 1 defines a struct type Record.
Then it defines the contract storage as a single state variable records. Finally
three contract functions are defined: append(), isset(), and get(). Note that
a constructor is not defined and, in this case, a default constructor is provided
to initialize the contract state to default values.

Solidity supports further concepts from object-oriented programming, such as in-
heritance, function modifiers, and overloading (also covered by our implementa-
tion [20]). However, as these are not relevant for the formalization of the memory
model, we omit them to simplify our presentation.

Types. Solidity is statically typed and provides two classes of types: value types
and reference types. Value types include elementary types such as addresses,
integers, and Booleans that are always passed by value. Reference types, on the
other hand, are passed by reference and include structs, arrays and mappings.
TypeName ::= address | int | uint | bool Value types
| mapping(TypeName => TypeName) Mapping
| TypeName[] | TypeName[n] Arrays
| StructName Struct name
DataLoc ::= storage | memory Data location
lval ::= id Identifier
| expr .id Member access
| expr [expr ] Index access
expr ::= lval Lvalue
| expr ? expr : expr Conditional
| new TypeName[](expr ) New memory array
| StructName(expr ∗ ) New memory struct
stmt ::= TypeName DataLoc? id [= expr ]; Local variable declaration
| (lval)∗ = (expr)∗ ; Assignment (tuples)
| lval .push(expr ); Push
| lval .pop(); Pop
| delete lval ; Delete
StructMem ::= TypeName id ; Struct member
StructDef ::= struct StructName { StructMem ∗ } Struct definition
StateVar ::= TypeName id ; State variable definition
FunPar ::= TypeName DataLoc? id Function parameter
Fun ::= function id (FunPar ∗ ) Function definition
[returns (FunPar ∗ )] { stmt ∗ }
Constr ::= constructor(FunPar ∗ ) { stmt ∗ } Constructor definition
Contract ::= contract id Contract definition
{StructDef ∗ StateVar ∗ Constr ? Fun ∗ }

Fig. 2: Syntax of the targeted Solidity fragment.

A struct consists of a fixed number of members. An array is either fixed-size or
dynamically-sized and besides the elements of the base type, it also includes a
length field holding the number of elements. A mapping is an associative array
mapping keys to values. The important caveat is that the table does not actually
store the keys, so it is not possible to check if a key is defined in the map.

Example 2. The contract in Figure 1 uses the following types. The records
variable is a mapping from addresses to Record structures which, in turn, consist
of a Boolean value and a dynamically-sized integer array. It is a common practice
to define a struct with a Boolean member (set) to indicate that a mapping value
has been set. This is because Solidity mappings do not store keys: any key can
be queried, returning a default value if no value was associated previously.

Data locations for reference types. Data of reference types resides in a data
location that is either storage or memory. Storage is the persistent store used
for state variables of the contract. In contrast, memory is used during execution
of a transaction to store function parameters, return values and local variables,
and it is deleted after the transaction finishes.
Semantics of reference types differ fundamentally depending on the data loca-
tion that they are stored in. Layout of data in the memory data location resem-
bles the memory model common in Java-like programming languages: there is a
heap where reference types are allocated and any entity in the heap can contain
values of value types, and references to other memory entities. In contrast, the
storage data location treats and stores all entities, including those of reference
types, as values with no references involved. Mixing storage and memory is not
possible: the data location of a reference type is propagated to its elements and
members. This means that storage entities cannot have references to memory
entities, and memory entities cannot have reference types as values. Storage of
a contract can be viewed as a single value with no aliasing possible.
(a) contract C {
        struct T {
            int z;
        }
        struct S {
            int x;
            T[] ta;
        }
        T t;
        S s;
        S[] sa;
    }

(b) [diagram: the contract storage represented as a tree of values rooted at the state variables t, s, and sa]

(c) function f(S memory sm1) public {
        T memory tm = sm1.ta[1];
        S memory sm2 = S(0, sm1.ta);
    }

    [diagram: a possible heap layout in which sm1, tm, and sm2 share references to the same T and S entities]

Fig. 3: An example illustrating reference types (structs and arrays) and their layout in storage and memory: (a) a contract defining types and state variables; (b) an abstract representation of the contract storage as values; and, (c) a function using the memory data location and a possible layout of the data in memory.

Example 3. Consider the contract C defined in Figure 3a. The contract defines
two reference struct types S and T, and declares state variables s, t, and sa.
These variables are maintained in storage during the contract lifetime and they
are represented as values with no references within. A potential value of these
variables is shown in Figure 3b. On the other hand, the top of Figure 3c shows a
function with three variables in the memory data location, one as the argument
to the function, and two defined within the function. Because they are in memory,
these variables are references to heap locations. Any data of reference types,
stored within the structures and arrays, is also a reference and can be reallocated
or assigned to point to an existing heap location. This means that the layout of
the data can contain arbitrary graphs with arbitrary aliasing. A potential layout
of these variables is shown at the bottom of Figure 3c.

Functions. Functions are the Solidity equivalent of methods in classes. They
receive data as arguments, perform computations, manipulate state variables
and interact with other Ethereum accounts. Besides accessing the storage of the
contract through its state variables, functions can also define local variables, in-
cluding function arguments and return values. Variables of value types are stored
as values on a stack. Variables of reference types must be explicitly declared with
a data location, and are always pointers to an entity in that data location (stor-
age or memory). A pointer to storage is called a local storage pointer. As the
storage is not memory in the usual sense, but a value instead, one can see storage
pointers as encoding a path to one reference type entity in the storage.

Example 4. Consider the example in Figure 1. The local variable r in function
append() points to the struct at index at of the state variable records (residing
in the contract storage). In contrast, the return value ret of function get() is
a pointer to an integer array in memory.

Statements and expressions. Solidity includes usual programming statements
and control structures. To keep the presentation simple, we focus on the state-
ments that are related to the formalization of the memory model: local variable
declarations, assignments, array manipulation, and the delete statement.4 So-
lidity expressions relevant for the memory model are identifiers, member and
array accesses, conditionals and allocation of new arrays and structs in memory.
If a value is not provided, local variable declarations automatically initialize
the variable to a default value. For reference types in memory, this allocates new
entities on the heap and performs recursive initialization of its members. For
reference types in storage, the local storage pointers must always be explicitly
initialized to point to a storage member. This ensures that no pointer is ever
“null”. Value types are initialized to their simple default value (0, false). Behavior
of assignment in Solidity is complex (see Section 3.5) and depends on the data
location of its arguments (e.g., deep copy or pointer assignment). Dynamically-
sized storage arrays can be extended by pushing an element to their end, or
can be shrunk by popping. The delete statement assigns the default value
(recursively for reference types) to a given entity based on its type.

Example 5. The assignment r.set = true in the append() function of Figure 1
is a simple value assignment. On the other hand, ret = records[at].data in
the get() function allocates a new array on the heap and performs a deep copy
of data from storage to memory.

2.3 SMT-Based Programs

We formalize the semantics of the Solidity fragment by translating it to a simple
programming language that uses SMT semantics [9,12] for the types and data.
The syntax of this language is shown in Figure 4. The syntax is purposefully
4 Our implementation [20] supports a majority of statements, excluding low-level operations (such as inline assembly). Loops are also supported and can be specified with loop invariants.
TypeName ::= int | bool Integer, Boolean
| [TypeName]TypeName SMT array
| DataTypeName SMT datatype
DataTypeDef ::= DataTypeName((id : TypeName)∗ ) Datatype definition
expr ::= id Identifier
| expr [expr ] Array read
| expr [expr ← expr ] Array write
| DataTypeName(expr ∗ ) Datatype constructor
| expr .id Member selector
| ite(expr , expr , expr ) Conditional
| expr + expr | expr − expr Arithmetic expression
VarDecl ::= id : TypeName Variable declaration
stmt ::= id := expr Assignment
| if expr then stmt ∗ else stmt ∗ If-then-else
| assume(expr ) Assumption
Program ::= DataTypeDef ∗ VarDecl ∗ stmt ∗ Program definition

Fig. 4: Syntax of SMT-based programs.

minimal and generic, so that it can be expressed in any modern SMT-based
verification tool (e.g., Boogie [5], Why3 [18] or Dafny [26]).5
The types of SMT-based programs are the SMT types: simple value types
such as Booleans and mathematical integers, and structured types such as ar-
rays [27,16] and inductive datatypes [8]. The expressions of the language are
standard SMT expressions such as identifiers, array reads and writes, datatype
constructors, member selectors, conditionals and basic arithmetic [7]. All vari-
ables are declared at the beginning of a program. The statements of the language
are limited to assignments, the if-then-else statement, and assumption statement.
SMT-based programs are a good fit for modeling of program semantics. For
one, they have clear semantics with no ambiguities. Furthermore, any property
of the program can be checked with SMT solvers: the program can be translated
directly to a SMT formula by a single static assignment (SSA) transformation.
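As a small illustration of the SSA idea (ours, using z3's Python API; not the actual pipeline of the tool), the program x := x + 1; if c then x := 0 else x := x becomes a conjunction over versioned variables:

    from z3 import Int, Bool, If, Solver

    x0, x1, x2 = Int('x0'), Int('x1'), Int('x2')
    c = Bool('c')

    s = Solver()
    s.add(x1 == x0 + 1)        # x := x + 1 introduces version x1
    s.add(x2 == If(c, 0, x1))  # the if-then-else merges into version x2
    s.add(x2 < 0)              # query: can x be negative at the end?
    print(s.check())           # sat, e.g. with c false and x0 = -2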
Note that the syntax requires the left hand side of an assignment to be an
identifier. However, to make our presentation simpler, we will allow array read,
member access and conditional expressions (and their combination) as LHS.
Such constructs can be eliminated iteratively in the following way until only
identifiers appear as LHS in assignments.

– a[i] := e is equivalent to a := a[i ← e].
– d.mj := e is equivalent to d := D(d.m1 , . . . , d.mj−1 , e, d.mj+1 , . . . , d.mn ),
where D is the constructor of a datatype with members m1 , . . . , mn .
– ite(c, t, f ) := e is equivalent to if c then t := e else f := e.
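For instance (our illustration, assuming a is the first member of d's datatype D), a nested assignment d.a[i] := e is eliminated in two steps: the array-write rule first yields d.a := d.a[i ← e], and the datatype rule then yields d := D(d.a[i ← e], d.m2, . . . , d.mn), leaving only an identifier on the left-hand side.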
5 Our current implementation is based on Boogie, but we have plans to introduce a generic intermediate representation that could incorporate alternate backends such as Why3 or Dafny.
3 Formalization

In this section we present our formalization of the Solidity semantics through
a translation that maps Solidity elements to constructs in the SMT-based lan-
guage. The formalization is described top-down in separate subsections for types,
contracts, state variables, functions, statements, and expressions.

3.1 Types

We use T (.) to denote the function that maps a Solidity type to an SMT type.
This function is used in the translation of contract elements and can, as a side
effect, introduce datatype definitions and variable declarations. This is denoted
with [decl ] in the result of the function. To simplify the presentation, we assume
that such side effects are automatically added to the preamble of the SMT pro-
gram. Furthermore, we assume that declarations with the same name are only
added once. We use type(expr) to denote the original (Solidity) type of an ex-
pression (to be used later in the formalization). The definition of T (.) is shown
in Figure 5.
    T(bool) ≐ bool
    T(address) ≐ T(int) ≐ T(uint) ≐ int
    T(mapping(K=>V) storage) ≐ [T(K)]T(V)
    T(mapping(K=>V) storptr) ≐ [int]int
    T(T[n] storage) ≐ T(T[] storage)
    T(T[n] storptr) ≐ T(T[] storptr)
    T(T[n] memory) ≐ T(T[] memory)
    T(T[] storage) ≐ StorArrT      with [StorArrT(arr : [int]T(T), length : int)]
    T(T[] storptr) ≐ [int]int
    T(T[] memory) ≐ int            with [MemArrT(arr : [int]T(T), length : int)]
                                        [arrheapT : [int]MemArrT]
    T(struct S storage) ≐ StorStructS    with [StorStructS(. . . , mi : T(Si), . . .)]
    T(struct S storptr) ≐ [int]int
    T(struct S memory) ≐ int             with [MemStructS(. . . , mi : T(Si), . . .)]
                                              [structheapS : [int]MemStructS]

Fig. 5: Formalization of Solidity types. Members of struct S are denoted as mi with types Si.

Value types. Booleans are mapped to SMT Booleans while other value types
are mapped to SMT integers. Addresses are also mapped to SMT integers so
that arithmetic comparison and conversions between integers and addresses is
supported. For simplicity, we map all integers (signed or unsigned) to SMT
integers.6 Solidity also allows function types to store, pass around, and call
functions, but this is not yet supported by our encoding.

Reference types. The Solidity syntax does not always require the data location
for variable and parameter declarations. However, for reference types it is always
required (enforced by the compiler), except for state variables that are always
implicitly storage. In our formalization, we assume that the data location of
reference types is a part of the type. As discussed before, memory entities are
always accessed through pointers. However, for storage we distinguish whether
it is the storage reference itself (e.g., state variable) or a storage pointer (e.g.,
local variable, function parameter). We denote the former with storage and the
latter with storptr in the type name. Our modeling of reference types relies on
the generalized theory of arrays [16] and the theory of inductive data-types [8],
both of which are supported by modern SMT solvers (e.g., cvc4 [6] and z3 [28]).

Mappings and arrays. For both arrays and mappings, we abstract away the
implementation details of Solidity and model them with the SMT theory of
arrays and inductive datatypes. We formalize Solidity mappings simply as SMT
arrays. Both fixed- and dynamically-sized arrays are translated using the same
SMT type and we only treat them differently in the context of statements and
expressions. Strings and byte arrays are not discussed here, but we support them
as particular instances of the array type. To ensure that array size is properly
modeled we keep track of it in the datatype (length) along with the actual
elements (arr ).
For storage array types with base type T , we introduce an SMT datatype
StorArrT with a constructor that takes two arguments: an inner SMT array (arr )
associating integer indexes and the recursively translated base type (T (T )), and
an integer length. The advantage of this encoding is that the value semantics
of storage data is provided by construction: each array element is a separate
entity (no aliasing) and assigning storage arrays in SMT makes a deep copy.
This encoding also generalizes if the base type is a reference type.
For memory array types with base type T , we introduce a separate datatype
MemArrT (side effect). However, memory arrays are stored with pointer values.
Therefore the memory array type is mapped to integers, and a heap (arrheap T )
is introduced to associate integers (pointers) with the actual memory array
datatypes. Note that mixing data locations within a reference type is not possi-
ble: the element type of the array has the same data location as the array itself.
Therefore, it is enough to introduce two datatypes per element type T : one for
storage and one for memory. In the former case the element type will have value
semantics whereas in the latter case elements will be stored as pointers.
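To see the encoding in action, here is a minimal sketch using z3's Python API (ours; the implementation itself is based on Boogie, cf. footnote 5) of T(int[] storage) and its value semantics:

    from z3 import Datatype, ArraySort, IntSort, Consts, Solver, Store

    StorArrInt = Datatype('StorArrInt')
    StorArrInt.declare('mk', ('arr', ArraySort(IntSort(), IntSort())),
                             ('length', IntSort()))
    StorArrInt = StorArrInt.create()

    a, b = Consts('a b', StorArrInt)
    s = Solver()
    s.add(b == a)  # a storage assignment is a deep copy by value
    # An update producing b2 cannot affect a: non-aliasing by construction.
    b2 = StorArrInt.mk(Store(StorArrInt.arr(b), 0, 42), StorArrInt.length(b))
    s.add(StorArrInt.arr(a)[0] == 0, StorArrInt.arr(b2)[0] == 42)
    print(s.check())  # sat: a keeps its old element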

Structs. For each storage struct type S the translation introduces an inductive
datatype StorStructS , including a constructor for each struct member with types
6 Note that this does not capture the precise machine integer semantics, but this is not relevant from the perspective of the memory model. Precise computation can be provided by relying on SMT bitvectors or modular arithmetic (see, e.g., [20]).
mapped recursively. Similarly to arrays, this ensures the value semantics of stor-
age such as non-aliasing and deep copy assignments. For each memory struct S
we also introduce a datatype MemStructS and a constructor for each member.7
However, the memory struct type itself is mapped to integers (pointer) and a
heap (structheap S ) is introduced to associate the pointers with the actual mem-
ory struct datatypes. Note that if a memory struct has members with reference
types, they are also pointers, which is ensured recursively by our encoding.

3.2 Local Storage Pointers

An interesting aspect of the storage data location is that, although the stored
data has value semantics, it is still possible to define pointers to an entity in
storage within a local context, e.g., with function parameters or local variables.
These pointers are called local storage pointers.

Example 6. In the append() function of Figure 1 the variable r is defined to be
a convenience pointer into the storage map records[at]. Similarly, the isset()
function takes a storage pointer to a Record entity in storage as an argument.

Since our formalization uses SMT datatypes to encode the contract data in stor-
age, it is not possible to encode these pointers directly. A partial solution would
be to substitute each occurrence of the local pointer with the expression that is
assigned to it when it was defined. However, this approach is too simplistic and
has limitations. Local storage pointers can be reassigned, or assigned condition-
ally, or it might not be known at compile time which definition should be used.
Furthermore, local storage pointers can also be passed in as function arguments:
they can point to different storage entities for different calls.
We propose an approach to encode local storage pointers while overcoming
these limitations. Our encoding relies on the fact that storage data of a contract
can be viewed as a finite-depth tree of values. As such, each element of the stored
data can be uniquely identified by a finite path leading to it.8

Example 7. Consider the contract C in Figure 6a. The contract defines structs
T and S, and state variables of these types. If we are interested in all storage
entities of type T, we can consider the sub-tree of the contract storage tree that
has leaves of type T, as depicted in Figure 6b. The root of the tree is the contract
itself, with indexed sub-nodes for state variables, in order. For nodes of struct
type there are indexed sub-nodes leading to its members, in order. For each node
of array type there is a sub-node for the base type. Every pointer to a storage T
entity can be identified by a path in this tree: by fixing the index to each state
7 Mappings in Solidity cannot reside in memory. If a struct defines a mapping member and it is stored in memory, the mapping is simply inaccessible. Such members could be omitted from the constructor.
8 Solidity does support a limited form of recursive data-types. Such types could make the storage a tree of potentially arbitrary depth. We chose not to support such types as recursion is non-existent in Solidity types used in practice.
(a) contract C {
        struct T {
            int z;
        }
        struct S {
            int x;
            T t;
            T[] ts;
        }
        T t1;
        S s1;
        S[] ss;
    }

(b) [diagram: the storage tree of C for type T, with indexed edges
    C → t1 (0); C → s1 (1), then t (0) and ts (1) → (i);
    C → ss (2) → (i), then t (0) and ts (1) → (i)]

(c) unpack(ptr) =
      ite(ptr[0] = 0,
        t1,
        ite(ptr[0] = 1,
          ite(ptr[1] = 0,
            s1.t,
            s1.ts[ptr[2]]),
          ite(ptr[2] = 0,
            ss[ptr[1]].t,
            ss[ptr[1]].ts[ptr[3]])))

Fig. 6: An example of packing and unpacking: (a) contract with struct definitions and state variables; (b) the storage tree of the contract for type T; and (c) the unpacking expression for storage pointers of type T.

By fixing the index to each state variable, member, and array index, as seen in
brackets in Figure 6b, such paths can be encoded as an array of integers. For
example, the state variable t1 can be represented as [0], the member s1.t as
[1, 0], and ss[8].ts[5] as [2, 8, 1, 5].

This idea allows us to encode storage pointer types (pointing to arrays, structs
or mappings) simply as SMT arrays ([int]int). The novelty of our approach is
that storage pointers can be encoded and passed around, while maintaining the
value semantics of storage data, without the need for quantifiers to describe
non-aliasing. To encode storage pointers, we need to address initialization and
dereference of storage pointers, while assignment is simply an assignment of
array values. When a storage pointer is initialized to a concrete expression, we
pack the indexed path to the storage entity (that the expression references) into
an array value. When a storage pointer is dereferenced (e.g., by indexing into or
accessing a member), the array is unpacked into a conditional expression that
will evaluate to a storage entity by decoding paths in the tree.
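To make the path-as-array idea concrete, here is a minimal sketch, assuming the
z3 Python API purely as an illustration vehicle (the paper's encoding targets an
SMT program, not z3's Python bindings specifically): two concrete paths are packed
into constant SMT arrays of sort [int]int and the solver confirms they cannot alias.

from z3 import K, IntSort, IntVal, Store, Solver

def path_to_array(path):
    # Pack a concrete index path into an SMT array [int]int,
    # starting from the constant-0 array (constarr(0)).
    arr = K(IntSort(), IntVal(0))
    for d, idx in enumerate(path):
        arr = Store(arr, d, idx)      # result := result[d <- idx]
    return arr

p1 = path_to_array([2, 8, 1, 5])      # ss[8].ts[5] in Figure 6
p2 = path_to_array([1, 0])            # s1.t in Figure 6

s = Solver()
s.add(p1 == p2)
print(s.check())                      # unsat: distinct paths never alias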

Storage tree. The storage tree for a given type T can be easily obtained by
filtering the AST nodes of the contract definition to only include state variable
declarations and to, further, only include nodes that lead to a sub-node of type
T . We denote the storage tree for type T as tree(T ).9

Packing. Given an expression (such as ss[8].ts[5]), pack(.) uses the storage
tree for the type of the expression and encodes it to an array (e.g., [2, 8, 1, 5]) by
fitting the expression into the tree. Pseudocode for pack(.) is shown in Figure 7.
To start, the expression is decomposed into a list of base sub-expressions. The
base expression of an identifier id is id itself. For an array index e[i] or a member
access e.mi, the base sub-expressions are, recursively, those of e followed by the
expression itself.9
9 In our implementation we do not explicitly compute the storage tree but instead
traverse directly the AST provided by the Solidity compiler.

def packpath(node, subExprs, d, result):
    foreach expr in subExprs do
        if expr = id ∨ expr = e.id then
            find edge node --id (i)--> child;
            result := result[d ← i];
        if expr = e[idx] then
            find edge node --(i)--> child;
            result := result[d ← E(idx)];
        node, d := child, d + 1;
    return result

def pack(expr):
    baseExprs := list of base sub-expressions of expr;
    baseExpr := car(baseExprs);
    if baseExpr is a state variable then
        return packpath(tree(type(expr)), baseExprs, 0, constarr[int]int(0))
    if baseExpr is a storage pointer then
        result := constarr[int]int(0);
        prefix := E(baseExpr);
        foreach path to a leaf in tree(type(baseExpr)) do
            pathResult, pathCond := prefix, true;
            foreach kth edge on the path with label id (i) do
                pathCond := pathCond ∧ prefix[k] = i
            pathResult := packpath(leaf, cdr(baseExprs), len(path), pathResult);
            result := ite(pathCond, pathResult, result);
        return result

Fig. 7: Packing of an expression. It returns a symbolic array expression that,
when evaluated, can identify the path to the storage entity that the expression
references.

We call the first element
of this list (denoted by car) the base expression (the innermost base expression).
The base expression is always either a state variable or a storage pointer, and
we consider these two cases separately.
If the base expression is a state variable, we simply align the expression along
the storage tree with the packpath function. The packpath function takes the
list of base sub-expressions, and the storage tree to use for alignment, and then
processes the expressions in order. If the current expression is an identifier (state
variable or member access), the algorithm finds the outgoing edge annotated with
the identifier (from the current node) and writes the index into the result array.
If the expression is an index access, the algorithm maps and writes the index
expression (symbolically) in the array. The expression mapping function E(.) is
introduced later in Section 3.6.
If the base expression is a storage pointer, the process is more general since
the “start” of the packing must accommodate any point in storage where the base
expression can point to. In this case the algorithm finds all paths to leaves in the

tree of the base pointer, identifies the condition for taking that path and writes
the labels on the path to an array. Then it uses packpath to continue writing
the array with the rest of the expression (denoted by cdr), as before. Finally, a
conditional expression is constructed with all the conditions and packed arrays.
Note that the type of this conditional is still an SMT array of integers, as is
the case for a single path.

Example 8. For the contract in Figure 6a, pack(ss[8].ts[5]) produces [2, 8, 1, 5] by
calling packpath on the base sub-expressions [ss, ss[8], ss[8].ts, ss[8].ts[5]].
First, 2 is added, as ss is the state variable with index 2. Then, ss[8] is an index
access, so 8 is mapped to 8 and added to the result. Next, ss[8].ts is a member
access, with ts having the index 1. Finally, ss[8].ts[5] is an index access, so 5
is mapped to 5 and added.
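The following hypothetical, executable rendering of the packpath case of Figure 7
uses our own tuple encoding of the storage tree of Figure 6b; the node and edge
representations are ours, not part of the paper's implementation:

# Storage tree of Fig. 6b: a node is (name, edges); an edge is (label, index, child);
# index accesses use the pseudo-label "[]" (array edges carry no index in the tree).
T = ("T", [])
TS = ("T[]", [("[]", None, T)])
S = ("S", [("t", 0, T), ("ts", 1, TS)])
tree_T = ("C", [("t1", 0, T), ("s1", 1, S), ("ss", 2, ("S[]", [("[]", None, S)]))])

def packpath(node, sub_exprs, result):
    # sub_exprs: base sub-expressions as (kind, value) pairs, e.g.
    # [("id", "ss"), ("idx", 8), ("id", "ts"), ("idx", 5)] for ss[8].ts[5]
    for kind, value in sub_exprs:
        edges = node[1]
        if kind == "id":              # state variable or member access
            _, i, child = next(e for e in edges if e[0] == value)
            result.append(i)          # write the edge index
        else:                         # index access e[idx]
            _, _, child = next(e for e in edges if e[0] == "[]")
            result.append(value)      # write the (mapped) index expression
        node = child
    return result

print(packpath(tree_T, [("id", "ss"), ("idx", 8), ("id", "ts"), ("idx", 5)], []))
# -> [2, 8, 1, 5], as in Example 8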

def unpack(ptr):
    return unpack(ptr, tree(type(ptr)), empty, 0);

def unpack(ptr, node, expr, d):
    result := empty;
    if node has no outgoing edges then result := expr;
    if node is contract then
        foreach edge node --id (i)--> child do
            result := ite(ptr[d] = i, unpack(ptr, child, id, d + 1), result);
    if node is struct then
        foreach edge node --id (i)--> child do
            result := ite(ptr[d] = i, unpack(ptr, child, expr.id, d + 1), result);
    if node is array/mapping with edge node --(i)--> child then
        result := unpack(ptr, child, expr[ptr[d]], d + 1);
    return result;

Fig. 8: Unpacking of a local storage pointer into a conditional expression.

Unpacking. The opposite of pack() is unpack(), shown in Figure 8. This function
takes a storage pointer (of type [int]int) and produces a conditional expression
that decodes any given path into one of the leaves of the storage tree. The
function recursively traverses the tree starting from the contract node and accu-
mulates the expressions leading to the leaves. The function creates conditionals
when branching, and when a leaf is reached the accumulated expression is sim-
ply returned. For contracts we process edges corresponding to each state variable
by setting the subexpression to be the state variable itself. For structs we pro-
cess edges corresponding to each member by wrapping the subexpression into a
member access. For both contracts and structs, the subexpressions are collected
into a conditional as separate cases. For arrays and mappings we process the

single outgoing edge by wrapping the subexpression into an index access using
the current element (at index d) of the pointer.

Example 9. The conditional expression corresponding to the tree
in Figure 6b can be seen in Figure 6c. Given a pointer ptr, if ptr[0] = 0 then
the conditional evaluates to t1. Otherwise, if ptr[0] = 1 then s1 has to be taken,
where two leaves are possible: if ptr[1] = 0 then the result is s1.t otherwise it is
s1.ts[ptr[2]], and so on. If ptr is [2, 8, 1, 5] then the conditional evaluates exactly
to ss[8].ts[5] from which ptr was packed.10
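A similarly hypothetical rendering of Figure 8, over the same tuple encoding of the
tree as in the earlier packpath sketch, reproduces the conditional of Figure 6c as a
string (the actual encoding builds SMT ite-terms rather than strings; this toy covers
contracts, structs, and arrays, but not mappings):

T = ("T", [])
TS = ("T[]", [("[]", None, T)])
S = ("S", [("t", 0, T), ("ts", 1, TS)])
tree_T = ("C", [("t1", 0, T), ("s1", 1, S), ("ss", 2, ("S[]", [("[]", None, S)]))])

def unpack(node, ptr, expr, d):
    kind, edges = node
    if not edges:                     # leaf: return the accumulated expression
        return expr
    if kind.endswith("[]"):           # array: single edge, wrap in an index access
        child = edges[0][2]
        return unpack(child, ptr, f"{expr}[{ptr}[{d}]]", d + 1)
    # contract or struct node: fold the sub-cases into a conditional;
    # the last edge serves as the default ("else") branch
    branches = [(i, unpack(child, ptr, label if expr == "" else f"{expr}.{label}", d + 1))
                for label, i, child in edges]
    result = branches[-1][1]
    for i, branch in reversed(branches[:-1]):
        result = f"ite({ptr}[{d}] = {i}, {branch}, {result})"
    return result

print(unpack(tree_T, "ptr", "", 0))
# ite(ptr[0] = 0, t1, ite(ptr[0] = 1,
#     ite(ptr[1] = 0, s1.t, s1.ts[ptr[2]]),
#     ite(ptr[2] = 0, ss[ptr[1]].t, ss[ptr[1]].ts[ptr[3]])))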

Note that with inheritance and libraries [30] it is possible that a contract
defines a type T but has no nodes in its storage tree. The contract can still
define functions with storage pointers to T , which can be called by derived
contracts that define state variables of type T . In such cases we declare an array
of type [int]T (T ), called the default context, and unpack storage pointers to T
as if the default context was a state variable. This allows us to reason about
abstract contracts and libraries, modeling that their storage pointers can point
to arbitrary entities not yet declared.

3.3 Contracts, State Variables, Functions

The focus of our discussion is the Solidity memory model and, for presentation
purposes, we assume a minimalist setting where the important aspects of storage
and memory can be presented: we assume a single contract and a single function
to translate. Interactions between multiple functions are handled differently de-
pending on the verification approach. For example, in modular verification func-
tions are checked individually against specifications (pre- and post-conditions)
and function calls are replaced by their specification [20].

State variables. Each state variable si of a contract is mapped to a variable
declaration si : T (type(si )) in the SMT program.11 The data location of state
variables is always storage. As discussed previously, reference types are mapped
using SMT datatypes and arrays, which ensures non-aliasing by construction.
While Solidity optionally allows inline initializer expressions for state variables,
without the loss of generality we can assume that they are initialized in the
constructor using regular assignments.

10 Note that due to the “else” branches, unpack is a non-injective surjective function.
For example, [a, 8, 1, 5] with any a ≥ 2 would evaluate to the same slot. However,
this does not affect our encoding, as pointers cannot be compared and pack always
returns the same (unique) values.
11 Generalizing this to multiple contracts can be done directly by using a separate
one-dimensional heap for each state variable, indexed by a receiver parameter (this :
address) identifying the current contract instance (see, e.g., [20]).

defval(bool) ≐ false
defval(address) ≐ defval(int) ≐ defval(uint) ≐ 0
defval(mapping(K=>V)) ≐ constarr[T(K)]T(V)(defval(V))
defval(T[] storage) ≐ defval(T[0] storage)
defval(T[] memory) ≐ defval(T[0] memory)
defval(T[n] storage) ≐ StorArrT(constarr[int]T(T)(defval(T)), n)
defval(T[n] memory) ≐ [ref : int] (fresh symbol)
                      {ref := refcnt := refcnt + 1}
                      {arrheap T [ref].length := n}
                      {arrheap T [ref].arr[i] := defval(T)} for 0 ≤ i < n
                      ref
defval(struct S storage) ≐ StorStructS(. . . , defval(Si), . . .)
defval(struct S memory) ≐ [ref : int] (fresh symbol)
                          {ref := refcnt := refcnt + 1}
                          {structheap S [ref].mi := defval(Si)} for each mi
                          ref

Fig. 9: Formalization of default values. We denote struct S members as mi with
types Si.

Function calls. From the perspective of the memory model, the only important
aspect of function calls is the way parameters are passed in and how function
return values are treated. Our formalization is general in that it allows us to
treat both of the above as plain assignments (explained later in Section 3.5).
For each parameter pi and return value ri of a function, we add declarations
pi : T (type(pi )) and ri : T (type(ri )) in the SMT program. Note that for reference
types appearing as parameters or return values of the function, their types are
either memory or storage pointers.

Memory allocation. In order to model allocation of new memory entities, while
keeping some non-aliasing information, we introduce an allocation counter refcnt :
int variable in the preamble of the SMT program. This counter is incremented
for each allocation of memory entities and used as the address of the new entity.
For each parameter pi with memory data location we include an assumption
assume(pi ≤ refcnt) as they can be arbitrary pointers, but should not alias with
new allocations within the function. Note that if a parameter of memory pointer
type is a reference type containing other references, such non-aliasing constraints
need to be assumed recursively [25]. This can be done for structs by enumerating
members, but for dynamic arrays it requires quantification, which is nevertheless
still decidable (array property fragment [13]).
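The effect of the non-aliasing assumption can be sketched with the z3 Python API
(used here purely for illustration; the variable names are ours): a parameter
constrained below the allocation counter can never coincide with a fresh allocation.

from z3 import Int, Solver

refcnt = Int('refcnt')        # allocation counter at function entry
p = Int('p')                  # memory pointer passed in as a parameter

s = Solver()
s.add(p <= refcnt)            # assume(p <= refcnt) for the parameter
fresh = refcnt + 1            # a new allocation: ref := refcnt := refcnt + 1
s.add(p == fresh)             # could the parameter alias the fresh entity?
print(s.check())              # unsat: it cannot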

Initialization and default values. If we are translating the constructor function,
each state variable si is first initialized to its default value with a statement
si := defval(type(si )). For regular functions, we set each return value ri to its
default value with a statement ri := defval(type(ri )). We use defval(.), as defined

in Figure 9, to denote the function that maps a Solidity type to its default
value as an SMT expression. Note that, as a side effect, this function can do
allocations for memory entities, introducing extra declarations and statements,
denoted by [decl ] and {stmt}. As expected, the default value is false for Booleans
and 0 for other primitives that map to integers. For mappings from K to V , the
default value is an SMT constant array returning the default value of the value
type V for each key k ∈ K (see, e.g., [16]). The default value of storage arrays
is the corresponding datatype value constructed with a constant array of the
default value for base type T , and a length of n or 0 for fixed- or dynamically-
sized arrays. For storage structs, the default value is the corresponding datatype
value constructed with the default values of each member.
The default value of uninitialized memory pointers is unusual. Since Solidity
doesn’t support “null” pointers, a new entity is automatically allocated in mem-
ory and initialized to default values (which might include additional recursive
initialization). Note, that for fixed-size arrays Solidity enforces that the array
size n must be an integer literal or a compile time constant, so setting each
element to its default value is possible without loops or quantifiers. Similarly
for structs, each member is recursively initialized, which is again possible by
explicitly enumerating each member.
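As a small illustration of how such default values look on the SMT side, here is a
minimal sketch with the z3 Python API (the datatype name StorArrInt and the
instantiation T = int, n = 3 are our assumptions): the default of a storage array
is the datatype value built from a constant array and the length, as in Figure 9.

from z3 import Datatype, ArraySort, IntSort, K, IntVal, simplify

StorArrInt = Datatype('StorArrInt')
StorArrInt.declare('mk', ('arr', ArraySort(IntSort(), IntSort())),
                         ('length', IntSort()))
StorArrInt = StorArrInt.create()

# defval(int[3] storage) = StorArr(constarr(0), 3)
d = StorArrInt.mk(K(IntSort(), IntVal(0)), IntVal(3))
print(simplify(StorArrInt.length(d)))    # 3
print(simplify(StorArrInt.arr(d)[42]))   # 0: every index defaults to 0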

3.4 Statements

We use S⟦.⟧ to denote the function that translates Solidity statements to a list
of statements in the SMT program. It relies on the type mapping function T (.)
(presented previously in Section 3.1) and on the expression mapping function E(.)
(to be introduced in Section 3.6). Furthermore, we define a helper function A(., .)
dedicated to modeling Solidity assignments (to be discussed in Section 3.5).
The definition of S⟦.⟧ is shown in Figure 10. As a side effect, extra declarations
can be introduced to the preamble of the SMT program (denoted by [decl ]).
The Solidity documentation [30] does not precisely state the order of evaluating
subexpressions in statements. It only specifies that subnodes are processed before
the parent node. This problem is independent from the discussion of the memory
model, so we assume that side effects of subexpressions are added in the same
order as it is implemented in the compiler. Furthermore, if a subexpression is
mapped multiple times, we assume that the side effects are only added once.
This makes our presentation simpler by introducing fewer temporary variables.
Local variable declarations introduce a variable declaration with the same
identifier in the SMT program by mapping the type.12 If an initialization ex-
pression is given, it is mapped using E(.) and assigned to the variable. Otherwise,
the default value is used as defined by defval(.) in Figure 9. Delete assigns the
default value for a type, which is simply mapped to an assignment in our formal-
ization. Solidity supports multiple assignments as one statement with a tuple-like
syntax. The documentation [30] does not specify the behavior precisely, but the
12 Without loss of generality we assume that identifiers in Solidity are unique. The
compiler handles scoping and assigns a unique identifier to each declaration.

S⟦T id⟧ ≐ [id : T(T)]; A(id, defval(T))
S⟦T id = expr⟧ ≐ [id : T(T)]; A(id, E(expr))
S⟦delete e⟧ ≐ A(E(e), defval(type(e)))
S⟦l1, . . . , ln = r1, . . . , rn⟧ ≐ [tmpi : T(type(ri))] for 1 ≤ i ≤ n (fresh symbols)
                                    A(tmpi, E(ri)) for 1 ≤ i ≤ n
                                    A(E(li), tmpi) for n ≥ i ≥ 1 (reversed)
S⟦e1.push(e2)⟧ ≐ A(E(e1).arr[E(e1).length], E(e2))
                 E(e1).length := E(e1).length + 1
S⟦e.pop()⟧ ≐ E(e).length := E(e).length − 1
             A(E(e).arr[E(e).length], defval(arrtype(E(e))))

Fig. 10: Formalization of statements.

contract C {
    struct S { int x; }
    S s1, s2, s3;
    function primitiveAssign() {
        s1.x = 1; s2.x = 2; s3.x = 3;
        (s1.x, s3.x, s2.x) = (s3.x, s2.x, s1.x);
        // s1.x == 3, s2.x == 1, s3.x == 2
    }
    function storageAssign() {
        s1.x = 1; s2.x = 2; s3.x = 3;
        (s1, s3, s2) = (s3, s2, s1);
        // s1.x, s2.x, s3.x are all equal to 1
    }
}

Fig. 11: Example illustrating the right-to-left assignment order and the treatment
of reference types in storage in tuple assignment.

contract C {
    struct S { int x; }
    S[] a;
    constructor() {
        a.push(S(1));
        S storage s = a[0];
        a.pop();
        assert(s.x == 1); // Ok
        // Following is error
        // assert(a[0].x == 1);
    }
}

Fig. 12: Example illustrating a dangling pointer to storage.

compiler first evaluates the RHS and LHS tuples (in this order) from left to right
and then assignment is performed component-wise from right to left.

Example 10. Consider the tuple assignment in function primitiveAssign() in
Figure 11. From right to left, s2.x is assigned first with the value of s1.x which
is 1. Afterwards, when s3.x is assigned with s2.x, the already evaluated (old)
value of 2 is used instead of the new value 1. Finally, s1.x gets the old value
of s3.x, i.e., 3. Note however, that storage expressions on the RHS evaluate
to storage pointers. Consider, for example, the function storageAssign() in
Figure 11. From right to left, s2 is assigned first, with a pointer to s1 making
s2.x become 1. However, as opposed to primitive types, when s3 is assigned
next, s2 on the RHS is a storage pointer and thus the new value in the storage
of s2 is assigned to s3 making s3.x become 1. Similarly, s1.x also becomes 1
as the new value behind s3 is used.
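The two behaviours of Example 10 can be mimicked by a small Python sketch,
with a dict standing in for contract storage (our own illustration of the evaluation
order, not the encoding itself):

store = {"s1": 1, "s2": 2, "s3": 3}

# primitiveAssign: the RHS evaluates to *values* before any assignment happens
rhs_vals = [store["s3"], store["s2"], store["s1"]]          # [3, 2, 1]
for lhs, v in reversed(list(zip(["s1", "s3", "s2"], rhs_vals))):
    store[lhs] = v
print(store)   # {'s1': 3, 's2': 1, 's3': 2}

# storageAssign: the RHS evaluates to storage *pointers*; each is
# dereferenced only when its component assignment is performed
store = {"s1": 1, "s2": 2, "s3": 3}
rhs_ptrs = ["s3", "s2", "s1"]
for lhs, ptr in reversed(list(zip(["s1", "s3", "s2"], rhs_ptrs))):
    store[lhs] = store[ptr]                                 # deref at assign time
print(store)   # {'s1': 1, 's2': 1, 's3': 1}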

Array push increases the length and assigns the given expression as the last
element. Array pop decreases the length and sets the removed element to its
default value. While the removed element can no longer be accessed via indexing
into an array (a runtime error occurs), it can still be accessed via local storage
pointers (see Figure 12).13

3.5 Assignments
Assignments between reference types in Solidity can be either pointer assign-
ments or value assignments, involving deep copying and possible new allocations
in the latter case. We use A(lhs, rhs) to denote the function that assigns a rhs
SMT expression to a lhs SMT expression based on their original types and data
locations. The definition of A(., .) is shown in Figure 13. Value type assignments
are simply mapped to an SMT assignment. To make our presentation clearer,
we subdivide the other cases into separate functions for array, struct, and
mapping operands, denoted by AA(., .), AS(., .) and AM(., .), respectively.

Mappings. As discussed previously, Solidity prohibits direct assignment of map-
pings. However, it is possible to declare a storage pointer to a mapping, in which
case the RHS expression is packed. It is also possible to assign two storage point-
ers, which simply assigns pointers. Other cases are a no-op.14

Structs and arrays. For structs and arrays the semantics of assignment is sum-
marized in Figure 14. However, there are some notable details in various cases
that we expand on below.
Assigning anything to a storage LHS always causes a deep copy. If the RHS is
storage, this is simply mapped to a datatype assignment in our encoding (with
an additional unpacking if the RHS is a storage pointer).15 If the RHS is memory,
deep copy for structs can be done member-wise by accessing the heap with the
RHS pointer and performing the assignment recursively (as members can be
reference types themselves). For arrays, we access the datatype corresponding
to the array via the heap and do an assignment, which does a deep copy in
SMT. Note, however, that this only works if the base type of the array is a
value type. For reference types, memory array elements are pointers and would
need to be dereferenced during assignment to storage. As opposed to struct
members, the number of array elements is not known at compile time, so loops or
quantifiers have to be used (as in traditional software analysis). However, this is a
special case, which can be encoded in the decidable array property fragment [13].
13 The current version (0.5.x) of Solidity supports resizing arrays by assigning to
the length member. However, this behavior is dangerous and has since been removed
in the next version (0.6.0) (see https://solidity.readthedocs.io/en/v0.6.0/060-breaking-changes.html).
Therefore, we do not support this in our encoding.
14 This is a consequence of the fact that keys are not stored in mappings, and so the
assignment is impossible to perform.
15 This also causes mappings to be copied, which contradicts the current semantics.
However, we chose to keep the deep copy, as assignment of mappings is planned to
be disallowed in the future (see https://github.com/ethereum/solidity/issues/7739).

A(lhs, rhs) ≐ lhs := rhs          for value type operands
A(lhs, rhs) ≐ AM(lhs, rhs)        for mapping type operands
A(lhs, rhs) ≐ AS(lhs, rhs)        for struct type operands
A(lhs, rhs) ≐ AA(lhs, rhs)        for array type operands

AM(lhs : sp, rhs : s)  ≐ lhs := pack(rhs)
AM(lhs : sp, rhs : sp) ≐ lhs := rhs
AM(lhs, rhs)           ≐ {}       (all other cases)

AS(lhs : s, rhs : s)   ≐ lhs := rhs
AS(lhs : s, rhs : m)   ≐ A(lhs.mi, structheap type(rhs) [rhs].mi) for each mi
AS(lhs : s, rhs : sp)  ≐ AS(lhs, unpack(rhs))
AS(lhs : m, rhs : m)   ≐ lhs := rhs
AS(lhs : m, rhs : s)   ≐ lhs := refcnt := refcnt + 1
                         A(structheap type(lhs) [lhs].mi, rhs.mi) for each mi
AS(lhs : m, rhs : sp)  ≐ AS(lhs, unpack(rhs))
AS(lhs : sp, rhs : s)  ≐ lhs := pack(rhs)
AS(lhs : sp, rhs : sp) ≐ lhs := rhs

AA(lhs : s, rhs : s)   ≐ lhs := rhs
AA(lhs : s, rhs : m)   ≐ lhs := arrheap type(rhs) [rhs]
AA(lhs : s, rhs : sp)  ≐ AA(lhs, unpack(rhs))
AA(lhs : m, rhs : m)   ≐ lhs := rhs
AA(lhs : m, rhs : s)   ≐ lhs := refcnt := refcnt + 1
                         arrheap type(lhs) [lhs] := rhs
AA(lhs : m, rhs : sp)  ≐ AA(lhs, unpack(rhs))
AA(lhs : sp, rhs : s)  ≐ lhs := pack(rhs)
AA(lhs : sp, rhs : sp) ≐ lhs := rhs

Fig. 13: Formalization of assignment based on different type categories and data
locations for the LHS and RHS. We use s, sp and m after the arguments to
denote storage, storage pointer and memory types respectively.

Assigning storage (or a storage pointer) to memory is also a deep copy, but in
the other direction. However, instead of overwriting the existing memory entity, a
new one is allocated (recursively for reference-typed elements or members). We
model this by incrementing the reference counter, storing it in the LHS, and then
accessing the heap for the deep copy using the new pointer.
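For instance, the AS case with a memory LHS and a storage RHS can be sketched
as follows, with Python dicts standing in for structheap and the storage datatype
value (all names here are ours, not the tool's API):

refcnt = 0
structheap = {}                     # pointer -> {member: value}

def assign_mem_from_stor(stor_struct):
    # lhs := refcnt := refcnt + 1, then copy each member via the heap;
    # for nested reference members the real rule recurses via A(., .)
    global refcnt
    refcnt += 1
    lhs = refcnt
    structheap[lhs] = dict(stor_struct)   # member-wise deep copy
    return lhs

s = {"x": 1, "y": 2}                # a storage struct value
p = assign_mem_from_stor(s)
s["x"] = 99                         # later storage updates do not leak
print(structheap[p])                # {'x': 1, 'y': 2}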

3.6 Expressions
We use E(.) to denote the function that translates a Solidity expression to an
SMT expression. As a side effect, declarations and statements might be intro-
duced (denoted by [decl ] and {stmt} respectively). The definition of E(.) is shown
in Figure 15. As discussed in Section 3.4 we assume that side effects are added
from subexpressions in the proper order and only once.
Member access is mapped to an SMT member access by mapping the base
expression and the member name. There is an extra unpacking step for storage

lhs \ rhs     Storage          Memory           Stor. ptr.
Storage       Deep copy        Deep copy        Deep copy
Memory        Deep copy        Pointer assign   Deep copy
Stor. ptr.    Pointer assign   Error            Pointer assign

Fig. 14: Semantics of assignment between array and struct operands based on
their data location.

E(id) ≐ id
E(expr.id) ≐ E(expr).E(id)                      if type(expr) = struct S storage
E(expr.id) ≐ unpack(E(expr)).E(id)              if type(expr) = struct S storptr
E(expr.id) ≐ structheap S [E(expr)].E(id)       if type(expr) = struct S memory
E(expr.id) ≐ E(expr).E(id)                      if type(expr) = T[] storage
E(expr.id) ≐ unpack(E(expr)).E(id)              if type(expr) = T[] storptr
E(expr.id) ≐ arrheap T [E(expr)].E(id)          if type(expr) = T[] memory
E(expr[idx]) ≐ E(expr).arr[E(idx)]              if type(expr) = T[] storage
E(expr[idx]) ≐ unpack(E(expr)).arr[E(idx)]      if type(expr) = T[] storptr
E(expr[idx]) ≐ arrheap T [E(expr)].arr[E(idx)]  if type(expr) = T[] memory
E(expr[idx]) ≐ E(expr)[E(idx)]                  if type(expr) = mapping(K=>V) storage
E(expr[idx]) ≐ unpack(E(expr))[E(idx)]          if type(expr) = mapping(K=>V) storptr

E(cond ? exprT : exprF) ≐ [varT : T(type(cond ? exprT : exprF))] (fresh symbol)
                          [varF : T(type(cond ? exprT : exprF))] (fresh symbol)
                          {A(varT, E(exprT))}
                          {A(varF, E(exprF))}
                          ite(E(cond), varT, varF)

E(new T[](expr)) ≐ [ref : int] (fresh symbol)
                   {ref := refcnt := refcnt + 1}
                   {arrheap T [ref].length := E(expr)}
                   {arrheap T [ref].arr[i] := defval(T)} for 0 ≤ i < E(expr)
                   ref

E(S(. . . , expri, . . .)) ≐ [ref : int] (fresh symbol)
                             {ref := refcnt := refcnt + 1}
                             {structheap S [ref].mi := E(expri)} for each member mi
                             ref

Fig. 15: Formalization of expressions. We denote struct S members as mi with
types Si.

pointers and a heap access for memory. Note that the only valid member for
arrays is length. Index access is mapped to an SMT array read by mapping the
base expression and the index, and adding an extra member access for arrays to
get the inner array arr of elements from the datatype. Furthermore, similarly to
member accesses, an extra unpacking step is needed for storage pointers and a
heap access for memory.

Conditionals in Solidity can be mapped to an SMT conditional in general.
However, data locations can be different for the true and false branches, causing
possible side effects. Therefore, we first introduce fresh variables for the true
and false branch with the common type (of the whole conditional), then make
assignments using A(., .) and finally use the new variables in the conditional. The
documentation [30] does not specify the common type, but the compiler returns
memory if any of the branches is memory, and storage pointer otherwise.
Allocating a new array in memory increments the reference counter, and sets the
length and the default values for each element (recursively). Note that in general
the length might not be a compile-time constant, in which case setting default
values could be encoded with the array property fragment (similarly to deep
copy in assignments) [13]. Allocating a new memory struct also increments the
reference counter and sets each member by translating the provided arguments.
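The allocation of a new memory array can be sketched in the same style as the
assignment sketch above (Python dicts for arrheap, with T = int; our own
rendering of the rule, not the tool's implementation):

refcnt = 0
arrheap = {}                        # pointer -> {'length': n, 'arr': {...}}

def new_int_array(n):
    # ref := refcnt := refcnt + 1; then set length and default elements
    global refcnt
    refcnt += 1
    arrheap[refcnt] = {'length': n, 'arr': {i: 0 for i in range(n)}}
    return refcnt                   # the expression evaluates to the pointer

a = new_int_array(3)
b = new_int_array(3)
print(a != b)                       # True: fresh allocations never alias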

4 Evaluation

The formalization described in this paper serves as the basis of our Solidity
verification tool solc-verify [20].16 In this section we provide an evaluation of
the presented formalization and our implementation by validating it on a set of
relevant test cases. For illustrative purposes we also compare our tool with other
available Solidity analysis tools.17
“Real world” contracts currently deployed on Ethereum (e.g., contracts available
on Etherscan) have limited value for evaluating memory model semantics.
Many such contracts use old compiler versions with constructs that are not sup-
ported anymore, and do not use newer features. There are also many toy and
trivial contracts that are deployed but not used, and popular contracts (e.g.
tokens) are over-represented with many duplicates. Furthermore, the inconsis-
tent usage of assert and require [20] makes evaluation hard. Evaluating the
memory semantics requires contracts that exercise diverse features of the mem-
ory model. There are larger dApps that do use more complex features (e.g.,
Augur or ENS), but these contracts also depend on many other features (e.g.
inheritance, modifiers, loops) that would skew the results.
Therefore we have manually developed a set of tests that try to capture
the interesting behaviors and corner cases of the Solidity memory semantics.
The tests are targeted examples that do not use irrelevant features. The set
is structured so that every target test behavior is represented with a test case
that sets up the state, exercises a specific feature and checks the correctness
of the behavior with assertions. This way a test should only pass if the tool
provides a correct verification result by modeling the targeted feature precisely.
16 solc-verify is open source, available at https://github.com/SRI-CSL/solidity. Besides
certain low-level constructs (such as inline assembly), solc-verify supports a majority
of Solidity features that we omitted from the presentation, including inheritance,
function modifiers, for/while loops and if-then-else.
17 All tests, with a Truffle test harness, a docker container with all the tools, and all
individual results are available at https://github.com/dddejan/solidity-semantics-tests.

The correctness of the tests themselves is determined by running them through
the EVM with no assertion failures. Test cases are expanded to use all reference
types and combinations of reference types. This includes structures, mappings,
dynamic and fixed-size arrays, both single- and multi-dimensional.
The tests are organized into the following classes. Tests in the assignment
class check whether the assign statement is properly modeled. This includes
assignments in the same data location, but also assignments across data locations
that need deep copying, and assignments and re-assignments of memory and
storage pointers. The delete class of tests checks whether the delete statement
is properly modeled. Tests in the init class check whether variable and data
initialization is properly modeled. For variables in storage, we check if they are
properly initialized to default values in the contract constructor. Similarly, we
check whether memory variables are properly initialized to provided values, or
default values when no initializer is provided. The storage class of tests checks
whether storage itself is properly modeled for various reference types, including
for example non-aliasing. Tests in the storageptr class check whether storage
pointers are modeled properly. This includes checking if the model properly
treats storage pointers to various reference types, including nested types. In
addition, the tests check that the storage pointers can be properly passed to
functions and ensure non-aliasing for distinct parts of storage.
For illustrative purposes we include a comparison with the following avail-
able Solidity analysis tools: mythril v0.21.17 [29], verisol v0.1.1-alpha [24],
and smt-checker v0.5.12 [1]. mythril is a Solidity symbolic execution tool
that runs analysis at the level of the EVM bytecode. verisol is similar to
solc-verify in that it uses Boogie to model the Solidity contracts, but takes
the traditional approach to modeling memory and storage with pointers and
quantifiers. smt-checker is an SMT-based analysis module built into the So-
lidity compiler itself. There are other tools that can be found in the literature,
but they are either basic prototypes that cannot handle realistic features we are
considering, or are not available for direct comparison.
We ran the experiments on a machine with Intel Xeon E5-4627 v2 @ 3.30GHz
CPU enforcing a 60s timeout and a memory limit of 64GB. Results are shown in
Table 1. As expected, mythril has the most consistent results on our test set.
This is because mythril models contract semantics at the EVM level and does
not need to model complex Solidity semantics. Nevertheless, the results also in-
dicate that the performance penalty for this precision is significant (8 timeouts).
verisol, as the closest to our approach, still doesn’t support many features and
has a significant number of false reports for features that it does support. Many
false reports are because their model of storage is based on pointers and tries
to ensure storage consistency with the use of quantifiers. smt-checker doesn’t
yet support the majority of the Solidity features that our tests target.
Based on the results, solc-verify performs well on our test set, matching
the precision of mythril at very low computational cost. The few false alarms
we have are either due to Solidity features that we chose to not implement (e.g.,
proper treatment of mapping assignments), or parts of the semantics that we

Table 1: Results of evaluating mythril, verisol, smt-checker, and solc-verify on
our test suite.

assignment (102)    correct  incorrect  unsupported  timeout  time (s)
mythril                  94          0            0        8   1655.14
verisol                  10         61           31        0    175.27
smt-checker               6          9           87        0     15.25
solc-verify              78          8           16        0     62.81

delete (14)         correct  incorrect  unsupported  timeout  time (s)
mythril                  13          1            0        0     47.51
verisol                   3          8            3        0     24.66
smt-checker               0          0           14        0      0.30
solc-verify               7          1            6        0      9.02

init (18)           correct  incorrect  unsupported  timeout  time (s)
mythril                  15          3            0        0     59.67
verisol                   7          8            3        0     28.82
smt-checker               0          0           18        0      0.41
solc-verify              13          5            0        0     11.88

storage (27)        correct  incorrect  unsupported  timeout  time (s)
mythril                  27          0            0        0    310.40
verisol                  12         15            0        0     43.45
smt-checker               2          0           25        0      1.32
solc-verify              27          0            0        0     17.61

storageptr (164)    correct  incorrect  unsupported  timeout  time (s)
mythril                 164          0            0        0   1520.29
verisol                 128         19           17        0    203.93
smt-checker               4         18          142        0     21.93
solc-verify             164          0            0        0     96.92

only implemented partially (such as deep copy of arrays with reference types
and recursively initializing memory objects). There are no technical difficulties
in supporting them and they are planned in the future.

5 Related Work

There is a strong push in the Ethereum community to apply formal methods
to smart contract verification. This includes many attempts to formalize the
semantics of smart contracts, both at the level of EVM and Solidity.

EVM-level semantics. Bhargavan et al. [11] decompile a fragment of EVM to F*,
modeling EVM as a stack-based machine with word and byte arrays for storage
and memory. Grishchenko et al. [19] extend this work by providing a small-step
semantics for EVM. KEVM [21] provides an executable formal semantics of
EVM in the K framework. Hirai [22] formalizes EVM in Lem, a language used by

some interactive theorem provers. Amani et al. [2] extend this work by defining
a program logic to reason about EVM bytecode.

Solidity-level semantics. Jiao et al. [23] formalize the operational semantics of
Solidity in the K framework. Their formalization focuses on the details of bit-
precise sizes of types, alignment and padding in storage. They encode storage
slots, arrays and mappings with the full encoding of hashing. However, the for-
malization does not describe assignments (e.g., deep copy) apart from simple
cases. Furthermore, user defined structs are also not mentioned. In contrast, our
semantics is high-level and abstracts away some details (e.g., hashes, alignments)
to enable efficient verification. Additionally, we provide proper modeling of dif-
ferent cases for assignments between storage and memory. Bartotelli et al. [10]
propose TinySol, a minimal core calculus for a subset of Solidity, required to
model basic features such as asset transfer and reentrancy. Contract data is mod-
eled as a key value store, with no differences in storage and memory, or in value
and reference types. Crafa et al. [15] introduce Featherweight Solidity, a calculus
formalizing core features of the language, with focus on primitive types. Data
locations and reference types are not discussed, only mappings are mentioned
briefly. The main focus is on the type system and type checking. They propose an
improved type system that can statically detect unsafe casts and callbacks. The
closest to our work is the work of Zakrzewski [33], a Coq formalization focusing
on functions, modifiers, and the memory model. The memory model is treated
similarly: storage is a mapping from names to storage objects (values), memory is
a mapping from references to memory objects (containing references recursively)
and storage pointers define a path in storage. Their formalization is also high-
level, without considering alignment, padding or hashing. The formalization is
provided as big step functional semantics in Coq. While the paper presents some
example rules, the formalization does not cover all cases. For example the details
of assignments (e.g., memory to storage), push/pop for arrays, treating memory
aliasing and new expressions. Furthermore, our approach focuses on SMT and
modular verification, which enables automated reasoning.

6 Conclusion
We presented a high-level SMT-based formalization of the Solidity memory
model semantics. Our formalization covers all aspects of the language related to
managing both the persistent contract storage and the transient local memory.
The novel encoding of storage pointers as arrays allows us to precisely model non-
aliasing and deep copy assignments between storage entities without the need
for quantifiers. The memory model forms the basis of our Solidity-level modular
verification tool solc-verify. We developed a suite of test cases exercising all
aspects of memory management with different combinations of reference types.
Results indicate that our memory model outperforms existing Solidity-level tools
in terms of soundness and precision, and is on par with low-level EVM-based
implementations, while having a significantly lower computational cost for dis-
charging verification conditions.

References
1. Alt, L., Reitwiessner, C.: SMT-based verification of Solidity smart contracts. In: ISoLA 2018, LNCS, vol. 11247, pp. 376–388. Springer (2018). https://doi.org/10.1007/978-3-030-03427-6_28
2. Amani, S., Bégel, M., Bortin, M., Staples, M.: Towards verifying Ethereum smart contract bytecode in Isabelle/HOL. In: Proceedings of the 7th ACM SIGPLAN International Conference on Certified Programs and Proofs. pp. 66–77. ACM (2018)
3. Antonopoulos, A., Wood, G.: Mastering Ethereum: Building Smart Contracts and Dapps. O’Reilly Media, Inc. (2018)
4. Atzei, N., Bartoletti, M., Cimoli, T.: A survey of attacks on Ethereum smart contracts. In: POST 2017, LNCS, vol. 10204, pp. 164–186. Springer (2017). https://doi.org/10.1007/978-3-662-54455-6_8
5. Barnett, M., Chang, B.Y.E., DeLine, R., Jacobs, B., Leino, K.R.M.: Boogie: A modular reusable verifier for object-oriented programs. In: FMCO 2005, LNCS, vol. 4111, pp. 364–387. Springer (2006). https://doi.org/10.1007/11804192_17
6. Barrett, C., Conway, C.L., Deters, M., Hadarean, L., Jovanović, D., King, T., Reynolds, A., Tinelli, C.: CVC4. In: CAV 2011, LNCS, vol. 6806, pp. 171–177. Springer (2011). https://doi.org/10.1007/978-3-642-22110-1_14
7. Barrett, C., Fontaine, P., Tinelli, C.: The Satisfiability Modulo Theories Library (SMT-LIB) (2016), www.SMT-LIB.org
8. Barrett, C., Shikanian, I., Tinelli, C.: An abstract decision procedure for satisfiability in the theory of recursive data types. Journal on Satisfiability, Boolean Modeling and Computation 3, 21–46 (2007)
9. Barrett, C., Tinelli, C.: Satisfiability modulo theories. In: Handbook of Model Checking, pp. 305–343. Springer (2018)
10. Bartoletti, M., Galletta, L., Murgia, M.: A minimal core calculus for Solidity contracts. In: DPM 2019, CBT 2019, LNCS, vol. 11737, pp. 233–243. Springer (2019). https://doi.org/10.1007/978-3-030-31500-9_15
11. Bhargavan, K., Delignat-Lavaud, A., Fournet, C., Gollamudi, A., Gonthier, G., Kobeissi, N., Kulatova, N., Rastogi, A., Sibut-Pinote, T., Swamy, N., Zanella-Béguelin, S.: Formal verification of smart contracts: Short paper. In: ACM Workshop on Programming Languages and Analysis for Security. pp. 91–96. ACM (2016)
12. Biere, A., Heule, M., van Maaren, H.: Handbook of Satisfiability. IOS Press (2009)
13. Bradley, A.R., Manna, Z., Sipma, H.B.: What’s decidable about arrays? In: VMCAI 2006, LNCS, vol. 3855, pp. 427–442. Springer (2006). https://doi.org/10.1007/11609773_28
14. Chen, H., Pendleton, M., Njilla, L., Xu, S.: A survey on Ethereum systems security: Vulnerabilities, attacks and defenses (2019), https://arxiv.org/abs/1908.04507
15. Crafa, S., Pirro, M.D., Zucca, E.: Is Solidity solid enough? In: Financial Cryptography Workshops (2019)
16. De Moura, L., Bjørner, N.: Generalized, efficient array decision procedures. In: Formal Methods in Computer-Aided Design. pp. 45–52. IEEE (2009)
17. Dhillon, V., Metcalf, D., Hooper, M.: The DAO hacked. In: Blockchain Enabled Applications, pp. 67–78. Apress (2017)
18. Filliâtre, J.C., Paskevich, A.: Why3 — where programs meet provers. In: ESOP 2013, LNCS, vol. 7792, pp. 125–128. Springer (2013). https://doi.org/10.1007/978-3-642-37036-6_8
19. Grishchenko, I., Maffei, M., Schneidewind, C.: A semantic framework for the security analysis of Ethereum smart contracts. In: POST 2018, LNCS, vol. 10804, pp. 243–269. Springer (2018). https://doi.org/10.1007/978-3-319-89722-6_10
20. Hajdu, Á., Jovanović, D.: solc-verify: A modular verifier for Solidity smart contracts. In: VSTTE 2019, LNCS, vol. 12031. Springer (2019), (In press)
21. Hildenbrandt, E., Saxena, M., Zhu, X., Rodrigues, N., Daian, P., Guth, D., Rosu, G.: KEVM: A complete semantics of the Ethereum virtual machine. Tech. rep., IDEALS (2017)
22. Hirai, Y.: Defining the Ethereum virtual machine for interactive theorem provers. In: FC 2017, LNCS, vol. 10323, pp. 520–535. Springer (2017). https://doi.org/10.1007/978-3-319-70278-0_33
23. Jiao, J., Kan, S., Lin, S., Sanán, D., Liu, Y., Sun, J.: Executable operational semantics of Solidity (2018), http://arxiv.org/abs/1804.01295
24. Lahiri, S.K., Chen, S., Wang, Y., Dillig, I.: Formal specification and verification of smart contracts for Azure blockchain. In: VSTTE 2019, LNCS, vol. 12031. Springer, (In press)
25. Leino, K.R.M.: Ecstatic: An object-oriented programming language with an axiomatic semantics. In: Proceedings of the Fourth International Workshop on Foundations of Object-Oriented Languages (1997)
26. Leino, K.R.M.: Dafny: An automatic program verifier for functional correctness. In: LPAR 2010, LNCS, vol. 6355, pp. 348–370. Springer (2010). https://doi.org/10.1007/978-3-642-17511-4_20
27. McCarthy, J.: Towards a mathematical science of computation. In: IFIP Congress. pp. 21–28 (1962)
28. de Moura, L., Bjørner, N.: Z3: An efficient SMT solver. In: TACAS 2008, LNCS, vol. 4963, pp. 337–340. Springer (2008). https://doi.org/10.1007/978-3-540-78800-3_24
29. Mueller, B.: Smashing Ethereum smart contracts for fun and real profit. In: Proceedings of the 9th Annual HITB Security Conference (HITBSecConf) (2018)
30. Solidity documentation (2019), https://solidity.readthedocs.io/
31. Szabo, N.: Smart contracts (1994)
32. Wood, G.: Ethereum: A secure decentralised generalised transaction ledger (2017), https://ethereum.github.io/yellowpaper/paper.pdf
33. Zakrzewski, J.: Towards verification of Ethereum smart contracts: A formalization of core of Solidity. In: VSTTE 2018, LNCS, vol. 11294, pp. 229–247. Springer (2018). https://doi.org/10.1007/978-3-030-03592-1_13

Exploring Type-Level Bisimilarity towards
More Expressive Multiparty Session Types

Sung-Shik Jongmans 1,2,3 and Nobuko Yoshida 3

1 Department of Computer Science, Open University, Heerlen, the Netherlands
2 CWI, Amsterdam, the Netherlands
3 Department of Computing, Imperial College, London, UK

Abstract. A key open problem with multiparty session types (MPST)
concerns their expressiveness: current MPST have inflexible choice, no
existential quantification over participants, and limited parallel compo-
sition. This precludes many real protocols to be represented by MPST.
To overcome these bottlenecks of MPST, we explore a new technique
using weak bisimilarity between global types and endpoint types, which
guarantees deadlock-freedom and absence of protocol violations. Based
on a process algebraic framework, we present well-formed conditions for
global types that guarantee weak bisimilarity between a global type and
its endpoint types and prove their check is decidable. Our main practical
result, obtained through benchmarks, is that our well-formedness condi-
tions can be checked orders of magnitude faster than directly checking
weak bisimilarity using a state-of-the-art model checker.

1 Introduction
Background. To take advantage of modern parallel and distributed comput-
ing platforms, message-passing concurrency is becoming increasingly important.
Modern programming languages, however, offer insufficiently effective linguistic
support to guide programmers towards safe usage of message-passing abstrac-
tions (e.g., to prevent deadlocks or protocol violations).
Multiparty session types (MPST) [34] constitute a static, correct-by-construction
approach to simplify concurrent programming, by offering a type-based framework
to specify message-passing protocols and ensure deadlock-freedom and protocol
conformance. The idea is to use behavioural types [1,37] to enforce protocols
(i.e., patterns of admissible communications) between roles (e.g., threads,
processes, services) to avoid concurrency bugs. The framework is illustrated in
Fig. 1: first, a global type G (protocol specification; written by the programmer)
is projected onto every role; then, every resulting endpoint type (local type) Li
(role specification) is type-checked with the corresponding process Pi (role
implementation). If every process is well-typed against its local type, then their
parallel composition is guaranteed to be free of deadlocks and protocol violations
relative to the global type. Notably, common concurrency bugs such as sends
without receives, receives without sends, and type mismatches (actual type sent
vs. expected type received) are ruled out statically. The MPST framework is
language-agnostic: in recent years, practical implementations of MPST have been
developed for several programming languages, including Erlang, F#, Go, Java,
and Scala [18,35,36,45,46,50].

Fig. 1: MPST framework (a global type G is projected onto local types
L1, L2, ..., Ln; each process Pi is type-checked against its local type Li)

Three open problems. Many practically relevant protocols cannot be specified
as global types; this limits MPST’s applicability to real-world concurrent
programs. Specifically, while the original work [33] has been extended with sev-
eral advanced features (e.g., time [7,44], security [11,12,13,17], and parametrisa-
tion [18,25,47]), core features still have significant restrictions: inflexible choice,
no existential quantification over participants, and limited parallel composition.
1. Inflexible choice: In the original work [33], if there is a choice between
multiple branches, the sender in the first communication of each branch must be
the same, the receiver must be the same, and the message type must be different
(i.e., no non-determinism). Moreover, each role not involved in the first commu-
nication of each branch, must have the same behaviour in each continuation. For
instance, the following global type specifies a protocol where Client c repeatedly
requests an arithmetic Server s to compute the sum or product of two numbers:
μX. [c → s : Add · s → c : Sum · X] + [c → s : Mul · s → c : Prod · X]

Here, c → s : Add specifies a communication of an Add-message (with two numbers
as payload) from the Client to the Server, while · and + specify sequencing and
branching, and square brackets indicate operator precedence. This is a “good”
global type that satisfies the conditions. In contrast, the following “bad” global
type specifies a protocol where Client c repeatedly requests addition and multi-
plication Servers s1 and s2 via Router r (payload types omitted; r1 → r2 → r3 : t
abbreviates r1 → r2 : t · r2 → r3 : t):

μX. [c → r → s1 : Add · s1 → c : Sum · X] + [c → r → s2 : Mul · s2 → c : Prod · X]

Several improvements to the original work have been proposed: Honda et al.
managed to allow each role r not involved in a choice to have different behaviour
in different branches [15], so long as r is made aware of which branch is chosen in a
timely and unambiguous fashion (e.g., the previous global type is still forbidden),
while Lange et al., Castagna et al., and Hu & Yoshida managed to allow choices
between different receivers [16,23,36,40]. For instance, the following global type
(the Client directly requests the specialised server) is allowed:
μX. [c → s1 : Add · s1 → c : Sum · X] + [c → s2 : Mul · s2 → c : Prod · X]

But, the following global type (two Clients c1 and c2 use Server s) is forbidden:

μX. [ [c1 → s : Add · s → c1 : Sum · X] + [c1 → s : Mul · s → c1 : Prod · X]
    + [c2 → s : Add · s → c2 : Sum · X] + [c2 → s : Mul · s → c2 : Prod · X] ]

None of the existing works allow the above nondeterministic choices between
different senders. We call this the +-problem: how to add a choice constructor,
denoted by +, to specify choices between disjoint sender-receiver-label triples?
2. No existential quantification: Related to the +-problem is the ∃-
problem: how to add an existential role quantifier, denoted by ∃, to specify
the execution of ∃’s body for some role in ∃’s domain? For instance, instead
of writing a separate global type for 2 Clients, 3 Clients, etc., existential role
quantification allows us to write only one global type for any n>1 Clients:

μX. ∃r∈{ci | 1≤i≤n}. [r → s : Add · s → r : Sum · X] + [r → s : Mul · s → r : Prod · X]

The ∃-problem was first formulated by Deniélou & Yoshida [22] as the dual of the
∀-problem (i.e., specify the execution of ∀’s body for each role in ∀’s domain):
the ∀-problem was solved in the same paper, but the ∃-problem “raises many
semantic issues” [22] and has remained open for almost a decade.
3. Limited parallel composition: The third open problem related to
choice is the ∥-problem: how to add a constructor, denoted by ∥, that allows
infinite branching (i.e., non-finite control) through unbounded parallel inter-
leaving? While extensions of the original work with parallel composition exist
(e.g., [16,22,23,43]), none of these works supports unbounded interleaving. For
instance, the following global type allows an unbounded number of requests to
be served by the Server in parallel (instead of sequentializing them):

μX. ∃r∈{ci | 1≤i≤n}. [[r → s : Add · s → r : Sum] ∥ X] + [[r → s : Mul · s → r : Prod] ∥ X]

Contributions. We overcome these three bottlenecks of MPST with an ap-
proach based on three key novelties: first, we have a new definition of projection
that keeps more information in the local types than existing definitions; second,
we exploit this extra information to formulate our well-formedness conditions;
third, we use an unexplored proof method for MPST, namely to prove the op-
erational equivalence between a global type and its projections modulo weak
bisimilarity. This makes the proofs cleaner and ultimately allows for more flex-
ibility (e.g., our approach can be modularly combined with traditional session
type checking, but potentially also with other verification methods, such as model
checking or conformance testing). To summarise the highlights:

– For the first time, we provide solutions to the +-problem, the ∃-problem,
and the ∥-problem, by presenting expressive syntax for global and local types
(formulated as process algebraic terms), a refined notion of projection, and
novel well-formedness conditions.
– Our main theoretical result is operational equivalence: a well-formed global
type behaves the same as the parallel composition of its projections, modulo
weak bisimulation. This implies freedom of deadlocks and freedom of protocol
violations of the projections. Checking this equivalence is decidable.
To our knowledge, we are the first to use (weak) bisimilarity to prove the
correctness of a projection operator from global to local types. By doing so,

Fig. 2: Example executions of the Key-Value Store protocol (message sequence
charts in the original; reconstructed here as message lists).
(a) Valid execution: 1: Lock, 2: Set(“x”, 5), 3: Set(“y”, 7), 4: Unlock (Client 1
and the Server); 5: Lock, 6: Get(“x”), 7: Get(“y”), 8: Value(“x”, 5), 9: Barrier,
10: Value(“y”, 7), 11: Set(“z”, 13), 12: Unlock (Client 2 and the Server).
(b) Invalid execution: Get(“x”), Value(“x”, 5), Set(“x”, 7), Set(“x”, 5+1).
(c) Invalid execution: Lock, Get(“x”), Set(“x”, 42), Value(“x”, 42).

we decouple (a) the act of reasoning about projection and (b) the act of
establishing compliance between local types and process implementations;
until our work, these two concerns have always been conflated.
– Our main practical results are: (1) to provide representative protocols ty-
pable in our approach; and (2) the well-formedness conditions of (1) can be
checked orders of magnitude faster than directly checking weak bisimilarity
using mCRL2 [10,20,29], a state-of-the-art model checker.
In Sect. 2, we present an overview of our contribution through a representative
example protocol that is not supported by previous work. In Sect. 3, we present
the details of our theoretical contribution. In Sect. 4, we present the details of our
practical contribution (implementation and evaluation). In Sect. 5, we discuss
related work. We conclude and discuss future work in Sect. 6.
Detailed formal definitions and proofs of all lemmas and theorems can be
found in our supplement [38].

2 Overview of our Approach

Scenario. To highlight our solutions to the +-problem, ∃-problem, and ∥-
problem, we consider a Key-Value Store protocol, similar to those used in modern
NoSQL databases [21,27]. Specifically, our Key-Value Store protocol is inspired
by the transaction mechanism of the popular Redis database [48,49]. This pro-
tocol is not supported by any of the existing MPST works.
The Key-Value Store protocol consists of n Clients that require access to the
store, represented by role names c1 , ..., cn , and one Server that provides access to

the store, represented by role name s. The store has keys of type Str (strings) and
values of type Nat (numbers). Fig. 2 shows valid and invalid example executions
of the protocol (n=2) as message sequence charts; it works as follows.
First, a Lock-message is communicated from some Client ci (1≤i≤n) to Server
s (Fig. 2a, arrows 1, 5); this grants ci exclusive access to the store. Then, a
sequence of messages to write and/or read values is communicated:
– To write, a Set-message is communicated from ci to s (arrows 2, 3, 11).
– To read, a Get-message is communicated from ci to s (arrows 6, 7). Then,
eventually, a Value-message is communicated from s to ci (arrows 8, 10), but
in the meantime, additional Get-messages can be communicated from ci to
s. In this way, the Client does not need to await the responses of the Server
to perform multiple independent requests. To indicate enough Get-messages
have been sent, a Barrier-message is communicated from ci to s (arrow 9),
which serves as a communication fence: the protocol will only proceed once
all Value-messages for pending Get-messages have been communicated.
The sequence ends with the communication of an Unlock-message from ci to s
(arrow 12). The protocol is then repeated for some Client cj (1≤j≤n); possibly,
but not necessarily, i=j. In this way, the Server atomically processes accesses to
the store between Lock/Unlock-messages.

Global and local types. The corresponding global type and local types, in-
ferred via projection (for some n), are as follows:

G = μX. ∃r∈{ci | 1≤i≤n}. r→s:Lock ·
      μY. ( μZ. ( r→s:Get(Str) · (s→r:Value(Str, Nat) ∥ Z) + r→s:Barrier · Y )
            + r→s:Set(Str, Nat) · Y + r→s:Unlock · X )

LCi = μX. ci s!Lock ·
      μY. ( μZ. ( ci s!Get(Str) · (s ci?Value(Str, Nat) ∥ Z) + ci s!Barrier · Y )
            + ci s!Set(Str, Nat) · Y + ci s!Unlock · X )

LS = μX. ∃r∈{ci | 1≤i≤n}. r s?Lock ·
      μY. ( μZ. ( r s?Get(Str) · (s r!Value(Str, Nat) ∥ Z) + r s?Barrier · Y )
            + r s?Set(Str, Nat) · Y + r s?Unlock · X )

Global type r1→r2:ℓ(t) specifies the communication of a message labelled ℓ with a payload typed t from sender r1 to receiver r2; global type G1 · G2 specifies the sequential composition of global types G1 and G2; global type G1 + G2 specifies the alternative composition (choice) of global types G1 and G2; global type ∃r∈{r1, ..., rn}. G specifies the existential role quantification over domain {r1, ..., rn} (i.e., the alternative composition of G[r1/r] and ... and G[rn/r], where G[ri/r] denotes the substitution of ri for every r in G); global type G1 ∥ G2 specifies the interleaving composition of G1 and G2 (free merge [4]); global type μX. G specifies recursion (i.e., X is bound to μX. G in G).
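For intuition, these operators can be transcribed into a small abstract syntax. The following Python sketch is our own illustration (the constructor names and representation are ours, not part of the formal development); it shows, in particular, how existential role quantification unfolds into an alternative composition:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class One: pass                        # 1 (skip)

@dataclass(frozen=True)
class Comm:                            # r1 -> r2 : label(payload)
    sender: str
    receiver: str
    label: str
    payload: str

@dataclass(frozen=True)
class Seq:                             # G1 . G2 (sequential composition)
    g1: object
    g2: object

@dataclass(frozen=True)
class Alt:                             # G1 + G2 (alternative composition)
    g1: object
    g2: object

@dataclass(frozen=True)
class Par:                             # G1 || G2 (interleaving / free merge)
    g1: object
    g2: object

@dataclass(frozen=True)
class Rec:                             # mu X. G
    var: str
    body: object

@dataclass(frozen=True)
class Var:                             # X
    name: str

def subst_role(g, r, ri):
    """G[ri/r]: substitute concrete role ri for role (variable) r in g."""
    if isinstance(g, Comm):
        return Comm(ri if g.sender == r else g.sender,
                    ri if g.receiver == r else g.receiver,
                    g.label, g.payload)
    if isinstance(g, (Seq, Alt, Par)):
        return type(g)(subst_role(g.g1, r, ri), subst_role(g.g2, r, ri))
    if isinstance(g, Rec):
        return Rec(g.var, subst_role(g.body, r, ri))
    return g                            # One, Var

def exists(r, domain, g):
    """Exists r in {r1,...,rn}. G  ==  G[r1/r] + ... + G[rn/r]."""
    out = subst_role(g, r, domain[0])
    for ri in domain[1:]:
        out = Alt(out, subst_role(g, r, ri))
    return out

# Exists r in {c1,c2}. r->s:Lock  ==  c1->s:Lock + c2->s:Lock:
lock = exists("r", ["c1", "c2"], Comm("r", "s", "Lock", "-"))
```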

Local type r1r2!ℓ(t) specifies the send of an ℓ(t)-message through the channel from r1 to r2; dually, local type r1r2?ℓ(t) specifies a receive. Because every
Client participates in only one branch of the quantification, their local types do
not contain ∃ under the recursion. In contrast, because the Server participates
in all branches, LS does contain ∃ under the recursion.
By Thm. 3, G and the parallel composition of LC1 , ..., LCn , LS are opera-
tionally equivalent (weakly bisimilar), which in turn implies deadlock-freedom
and absence of protocol violations. Note also that our global type for the Key-Value Store protocol indeed relies on solutions to the +-problem (choice between multiple clients that send a Lock-message), the ∃-problem (existential quantification over clients), and the ∥-problem (unbounded interleaving to support asynchronous responses of a statically unknown number of requests).

3 An MPST Theory with +, ∃, and ∥

3.1 Types as Process Algebraic Terms

We define our languages of global and local types as algebras over sets of (global)
communications and (local) sends/receives. This subsection presents preliminar-
ies on the generic algebraic framework we use, based on the existing algebras
PA [3] and TCP+REC [2]; the next subsection presents our specific instantia-
tions for global and local types.
Let A denote a set of actions, ranged over by α, and let {X1 , X2 , . . . , Y, . . .}
denote a set of recursion variables. Then, let Term(A) denote the set of (alge-
braic) terms, ranged over by T , generated by the following grammar:

T ::= 1 | α | T1 + T2 | T1 · T2 | T1 ∥ T2 | X | ⟨Xk | {Xi ↦ Ti}i∈I⟩    (k ∈ I)

Term 1 specifies a skip; the grey background indicates it should not be explicitly written by programmers (it is used only implicitly in the operational semantics). Term α specifies an atomic action from A. Terms T1 + T2, T1 · T2, and T1 ∥ T2 specify the alternative composition, the sequential composition, and the interleaving composition (free merge [4]; a form of parallel composition without interaction between the operands) of T1 and T2. Terms X and ⟨Xk | {Xi ↦ Ti}i∈I⟩ specify recursion, where {Xi ↦ Ti}i∈I is a recursive specification that maps recursion variables to terms, Xk is the initial call (for Tk), and every Xj that occurs in Tk is a subsequent recursive call (for Tj); we write μX. T instead of ⟨X | {X ↦ T}⟩.
Let X ⇀ Term(A) denote the set of all recursive specifications (i.e., every recursive specification is a partial function), ranged over by E, F, and let sub(E, T) denote the simultaneous substitution of term E(X) for each recursion variable X in T. Fig. 3 defines the operational semantics of terms. It consists of two components: relation −→ defines reduction of terms, while relation ↓ defines successful termination of terms. In words, term T1 + T2 is reduced by reducing either T1 or T2; term T1 · T2 is reduced by reducing first T1 and then T2; term

$$\frac{}{\alpha \xrightarrow{\alpha} 1}\quad
\frac{T_1 \xrightarrow{\alpha} T_1'}{T_1 \cdot T_2 \xrightarrow{\alpha} T_1' \cdot T_2}\quad
\frac{T_1\!\downarrow \quad T_2 \xrightarrow{\alpha} T_2'}{T_1 \cdot T_2 \xrightarrow{\alpha} T_2'}\quad
\frac{T_1 \xrightarrow{\alpha} T_1'}{T_1 + T_2 \xrightarrow{\alpha} T_1'}\quad
\frac{T_2 \xrightarrow{\alpha} T_2'}{T_1 + T_2 \xrightarrow{\alpha} T_2'}$$

$$\frac{T_1 \xrightarrow{\alpha} T_1'}{T_1 \parallel T_2 \xrightarrow{\alpha} T_1' \parallel T_2}\quad
\frac{T_2 \xrightarrow{\alpha} T_2'}{T_1 \parallel T_2 \xrightarrow{\alpha} T_1 \parallel T_2'}\quad
\frac{\mathit{sub}(E, E(X)) \xrightarrow{\alpha} T'}{\langle X \mid E\rangle \xrightarrow{\alpha} T'}$$

(a) Reduction

$$\frac{}{1\!\downarrow}\quad
\frac{T_1\!\downarrow}{T_1 + T_2\!\downarrow}\quad
\frac{T_2\!\downarrow}{T_1 + T_2\!\downarrow}\quad
\frac{T_1\!\downarrow \quad T_2\!\downarrow}{T_1 \cdot T_2\!\downarrow}\quad
\frac{T_1\!\downarrow \quad T_2\!\downarrow}{T_1 \parallel T_2\!\downarrow}\quad
\frac{\mathit{sub}(E, E(X))\!\downarrow}{\langle X \mid E\rangle\!\downarrow}$$

(b) Termination

Fig. 3: Operational semantics of terms

T1 ∥ T2 is reduced by reducing T1 and T2 interleaved; and term ⟨X | E⟩ is reduced by reducing the version of E(X) where recursion variables have been substituted.
A term is 1-free if it has no occurrences of 1. A term is closed if it has no occurrences of free recursion variables. A term T is deterministic if (1) for every action α, there exists at most one term T′ such that T can reduce to T′ by performing α, and (2) every term to which T can reduce is deterministic as well. Henceforth, we consider only 1-free, closed, and deterministic terms.
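For intuition, the rules of Fig. 3 can be transcribed almost verbatim into an interpreter. The following Python sketch is ours (it reuses the constructors of the sketch in Sect. 2, handles only single-variable recursion μX. T, and assumes guarded recursion); here the atomic actions are global communications:

```python
def unfold(rec):
    """sub(E, E(X)) for single-variable recursion: substitute mu X. body
    for free occurrences of X in body."""
    def s(t):
        if isinstance(t, Var):
            return rec if t.name == rec.var else t
        if isinstance(t, (Seq, Alt, Par)):
            return type(t)(s(t.g1), s(t.g2))
        if isinstance(t, Rec):
            return t if t.var == rec.var else Rec(t.var, s(t.body))
        return t                        # One, Comm
    return s(rec.body)

def terminated(t):
    """Successful termination, transcribing Fig. 3b. Assumes guarded
    recursion (otherwise the mu-case may not terminate)."""
    if isinstance(t, One):
        return True
    if isinstance(t, Alt):
        return terminated(t.g1) or terminated(t.g2)
    if isinstance(t, (Seq, Par)):
        return terminated(t.g1) and terminated(t.g2)
    if isinstance(t, Rec):
        return terminated(unfold(t))
    return False                        # atomic actions and variables

def steps(t):
    """All one-step reductions t --a--> t', transcribing Fig. 3a."""
    if isinstance(t, Comm):
        return {(t, One())}
    if isinstance(t, Alt):
        return steps(t.g1) | steps(t.g2)
    if isinstance(t, Seq):
        out = {(a, Seq(t1, t.g2)) for a, t1 in steps(t.g1)}
        if terminated(t.g1):            # T1 down: T1 . T2 reduces as T2
            out |= steps(t.g2)
        return out
    if isinstance(t, Par):              # free merge: pure interleaving
        return ({(a, Par(t1, t.g2)) for a, t1 in steps(t.g1)} |
                {(a, Par(t.g1, t2)) for a, t2 in steps(t.g2)})
    if isinstance(t, Rec):
        return steps(unfold(t))
    return set()                        # One, free Var: no reductions

# mu X. (a->b:M . X + a->b:End) can initially do both communications:
g = Rec("X", Alt(Seq(Comm("a", "b", "M", "t"), Var("X")),
                 Comm("a", "b", "End", "t")))
assert len(steps(g)) == 2 and not terminated(g)
```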
We note that ⟨A, +, ·, ∥⟩ is the signature of PA [3], while ⟨1, A, +, ·, ∥, X, ⟨-|-⟩⟩ is a subsignature of TCP+REC [2]. As the operational semantics of terms in Term(A) coincides with the operational semantics of terms in (the corresponding subalgebra of) TCP+REC, our languages of global and local types inherit TCP+REC's sound and complete axiomatisation, used in our tool (Sect. 4.1).

3.2 Global Types and Local Types


Actions. We instantiate Term(A) to obtain languages of global and local types
by defining action sets for (global) communications and for (local) sends/receives.
Let R = {a, b, ...} denote the set of all role names, ranged over by r. Let Lab = {Lock, Get, ...} denote the set of all labels, ranged over by ℓ. Let T = {Nat, Bool, ...} denote the set of all payload types, ranged over by t. Let U = Lab × T denote the set of all message types, ranged over by U; we write ℓ(t) instead of ⟨ℓ, t⟩. Finally, let Ag and Al denote the sets of all (global) communications and (local) sends/receives, ranged over by g and l, generated by:

g ::= r1→r2:U                            (if: r1 ≠ r2)
l ::= r1r2!U | r1r2?U | ε^r_{r1r2}       (if: r1 ≠ r2 and r1 ≠ r ≠ r2)

Global action r1→r2:U specifies the communication of a U-message from sender r1 to receiver r2; we note that communications are synchronous, as actions in the underlying algebra are indivisible [2,3], but asynchrony can be encoded (Exmp. 1, below). Local action r1r2!U specifies the send of a U-message through channel r1r2 (from r1 to r2). Dually, local action r1r2?U specifies a receive. Local

$$\mathit{split}(r,\; r_1{\to}r_2{:}U) =
\begin{cases}
(1,\; r_1{\to}r_2{:}U) & \text{if } r \in \{r_1, r_2\}\\
(r_1{\to}r_2{:}U,\; 1) & \text{otherwise}
\end{cases}$$

$$\mathit{split}(r,\; G_1 \cdot G_2) =
\begin{cases}
(G_1',\; G_1'' \cdot G_2) & \text{if } \mathit{split}(r, G_1) = (G_1', G_1'') \text{ and } G_1'' \neq 1\\
(G_1 \cdot G_2',\; G_2'') & \text{if } \mathit{split}(r, G_1) = (G_1', G_1'') \text{ and } G_1'' = 1 \text{ and}\\
& \quad\;\; \mathit{split}(r, G_2) = (G_2', G_2'') \text{ and } G_2'' \neq 1\\
(G_1 \cdot G_2,\; 1) & \text{otherwise}
\end{cases}$$

$$\frac{M \rightsquigarrow G \qquad \mathit{split}(r_2, G) = (G', G'')}{r_1{\leadsto}r_2{:}U \cdot M \;\rightsquigarrow\; \big(r_1{\to}r_1r_2{:}U \cdot r_1r_2{\to}r_2{:}U \parallel G'\big) \cdot G''}\ \text{(asynchrony)}$$

$$\frac{}{\Sigma\,\emptyset \rightsquigarrow 1}\qquad
\frac{M_k \rightsquigarrow G_k \qquad \Sigma\,\{M_i\}_{i\in I\setminus\{k\}} \rightsquigarrow G \qquad k \in I}{\Sigma\,\{M_i\}_{i\in I} \rightsquigarrow G_k + G}\ \text{(n-ary choice)}$$

$$\frac{}{\mu(X, \ell_c, \ell_e, \emptyset) \rightsquigarrow 1}\ \text{(finite recursion: base)}$$

$$\frac{\begin{array}{c}
M_i = \mu(X, \ell_c, \ell_e, \{\langle r_{1j}, r_{2j}, G_j\rangle\}_{j\in I\setminus\{i\}}) \text{ for all } i \in I\\[2pt]
\Sigma\,\{\; r_{1i}{\to}r_{2i}{:}\ell_c \cdot G_i \cdot X \;+\; r_{1i}{\to}r_{2i}{:}\ell_e \cdot M_i \;\}_{i\in I} \;\rightsquigarrow\; G
\end{array}}{\mu(X, \ell_c, \ell_e, \{\langle r_{1i}, r_{2i}, G_i\rangle\}_{i\in I}) \;\rightsquigarrow\; \mu X.\, G}\ \text{(finite recursion: step)}$$

$$\frac{\Sigma\,\{M[r_i/r]\}_{i\in I} \rightsquigarrow G}{\exists r \in \{r_i\}_{i\in I}.\, M \rightsquigarrow G}\ \text{(existential role quantification)}$$

Fig. 4: Macros

action ε^r_{r1r2} specifies the idling of role r during a communication between roles r1 and r2. The inclusion of such annotated idling actions in local types is novel; we shortly elaborate on its purpose.
We can now define Glob = Term(Ag ) and Loc = Term(Al ) as the sets of
all global and local types, ranged over by G and L.

Macros. As a testimony to the unique expressive power of our language of global


types, we extend it with a number of macros that can be expanded to “normal”
global types in Glob. A macro M is generated by the following grammar:
M ::= G ∈ Glob | r1⇝r2:U · M | Σ{Mi}i∈I | μ(X, ℓc, ℓe, {⟨r1i, r2i, Mi⟩}i∈I) | ∃r∈{ri}i∈I. M

Degenerate "macro" G is a normal global type; it is part of the grammar to nest global types inside macros. Macro r1⇝r2:U · M specifies an asynchronous communication of a U-message from sender r1 to receiver r2. Macro Σ{Mi}i∈I specifies an n-ary choice among |I| alternatives. Macro μ(X, ℓc, ℓe, {⟨r1i, r2i, Mi⟩}i∈I) specifies finite recursion: at the start of each unfolding of recursion variable X, for some i ∈ I, either an ℓc-message is communicated from sender r1i to receiver r2i (in which case they continue their participation in the recursion), or an ℓe-message is communicated (in which case they exit). Macro ∃r∈{ri}i∈I. M specifies existential

role quantification. Macros can be nested. Slightly abusing notation, we allow macros to occur and be expanded freely in "normal" global types.
Fig. 4 defines the macro expansion rules. We note that the left-hand side of ↝ is a macro, while the right-hand side is a normal global type. We demonstrated existential role quantification in Sect. 2; below, we give two more examples to illustrate our encoding of asynchronous communication and finite recursion.
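Among the expansion rules, the auxiliary function split does the real work for asynchrony: it divides the continuation into a prefix that can overlap with the buffered delivery and a remainder. The following Python sketch is ours (it reuses the constructors of the sketch in Sect. 2 and covers only the communication and sequential-composition cases of Fig. 4):

```python
def split(r, g):
    """Fig. 4's split(r, G) = (G', G''): a prefix G' in which role r does
    not participate, and the remainder G''."""
    if isinstance(g, Comm):
        if r in (g.sender, g.receiver):
            return One(), g             # r participates immediately
        return g, One()                 # r is not involved at all
    if isinstance(g, Seq):
        g1_pre, g1_rest = split(r, g.g1)
        if not isinstance(g1_rest, One):        # r first acts inside G1
            return g1_pre, Seq(g1_rest, g.g2)
        g2_pre, g2_rest = split(r, g.g2)
        if not isinstance(g2_rest, One):        # r first acts inside G2
            return Seq(g.g1, g2_pre), g2_rest
        return Seq(g.g1, g.g2), One()           # r does not occur in G
    return One(), g                     # other cases: split off nothing

# split(b, a->ab . ab->b): everything before b's first action can overlap
# with the buffered delivery, so the remainder starts at ab->b.
pre, rest = split("b", Seq(Comm("a", "ab", "M", "t"),
                           Comm("ab", "b", "M", "t")))
assert rest == Comm("ab", "b", "M", "t")
```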

Example 1 (Asynchrony). Although communications are synchronous, we can


encode asynchrony by representing buffered channels (unordered, as in asyn-
chronous π-calculus [32]) explicitly as roles that participate in a protocol. To
this end, assume for all r1 , r2 ∈ R, there exists a role r1 r2 ∈ R as well (to
represent the buffer from r1 to r2 ); alternatively r1 r2 could be any fresh name.
The following global types (message types omitted) specify paradigmatic
cases for protocols with asynchronous communications:

M1 = a⇝b · 1          ↝   G1 = a→ab · ab→b
M2 = a⇝b · a⇝b · 1    ↝   G2 = (a→ab · ab→b ∥ a→ab) · ab→b
M3 = a⇝b · b⇝a · 1    ↝   G3 = a→ab · ab→b · b→ba · ba→a
M4 = a⇝b · a→b        ↝   G4 = a→ab · ab→b · a→b

(For brevity, we omit 1 from the resulting global types; this can be incorporated
in the macro expansion rules, at the expense of a more complex formulation.)
Global type G1 specifies an asynchronous communication from Alice to Bob.
Global type G2 specifies two asynchronous communications from Alice to Bob;
Alice can do the second send already before Bob has done the first receive.
Global type G3 specifies an asynchronous communication from Alice to Bob,
followed by one from Bob to Alice; in contrast to G2 , Bob can send only after
he has received (i.e., this encoding of asynchrony preserves causality of messages
sent and received by the same role). Global type G4 specifies an asynchronous
communication from Alice to Bob, followed by a synchronous communication
from Bob to Alice; it highlights that, unlike existing languages of global types,
ours supports mixing synchrony and asynchrony in a single global type. □

Example 2 (Finite recursion). The Key-Value Store protocol in Sect. 2 does not
terminate: in its global type, the inner recursions (Y and Z) can be exited, but
the outer recursion (X) cannot. A version of this protocol that terminates once
each of the Clients has indicated it has finished using the store (e.g., by sending
an Exit-message) can also be specified.
We illustrate the key idea in a simplified example:

G1 = μX. ( a→c:Con · X + a→c:Exit )        G2 = μX. ( b→c:Con · X + b→c:Exit )

G = μX. ( ( a→c:Con · X + a→c:Exit · G2 ) + ( b→c:Con · X + b→c:Exit · G1 ) )

Global type G1 specifies the communication of either a Con-message (to continue the recursion) or an Exit-message (to break it) from Alice to Carol. Global type G2 is similar. Global type G specifies the communication of a Con-message from

$$\frac{L(r)\!\downarrow\ \text{for all } r \in \operatorname{dom} L}{L\!\downarrow}$$

(a) Termination

$$\frac{L(r_1) \xrightarrow{r_1r_2!U} L_{r_1} \qquad L(r_2) \xrightarrow{r_1r_2?U} L_{r_2}}{L \xrightarrow{r_1 \to r_2 : U} L[r_1 \mapsto L_{r_1},\, r_2 \mapsto L_{r_2}]}\qquad
\frac{L(r) \xrightarrow{\varepsilon^r_{r_1r_2}} L_r}{L \xrightarrow{\varepsilon^r_{r_1r_2}} L[r \mapsto L_r]}$$

(b) Reduction

Fig. 5: Operational semantics of groups of local types

$$G \upharpoonright r = G \quad \text{if } G \in \{1\} \cup \{X_1, X_2, \ldots\}$$

$$(G_1 \ast G_2) \upharpoonright r = (G_1 \upharpoonright r) \ast (G_2 \upharpoonright r) \quad \text{if } \ast \in \{+, \cdot, \parallel\}$$

$$(r_1{\to}r_2{:}U) \upharpoonright r =
\begin{cases}
r_1r_2!U & \text{if } r_1 = r \neq r_2\\
r_1r_2?U & \text{if } r_1 \neq r = r_2\\
\varepsilon^r_{r_1r_2} & \text{if } r_1 \neq r \neq r_2
\end{cases}$$

$$\langle X \mid E\rangle \upharpoonright r = \langle X \mid E \upharpoonright r\rangle \qquad E \upharpoonright r = \{X \mapsto E(X) \upharpoonright r \mid X \in \operatorname{dom} E\}$$

$$G \upharpoonright R = \{r \mapsto G \upharpoonright r \mid r \in R\} \quad \text{if } \mathtt{r}(G) \subseteq R \neq \emptyset$$

Fig. 6: Projection

either Alice or Bob to Carol, or an Exit-message. In the latter case, Carol stops
communicating with a role, while she proceeds communicating with the other
role. Thus, the communications between Alice and Carol, and between Bob and
Carol, are decoupled (i.e., decisions to continue or break recursions are made per
role). Macro μ generalizes this pattern to arbitrary recursion bodies. □

Groups. Finally, let R ⇀ Loc denote the set of all groups of local types (i.e., every group is a partial function from role names to local types), ranged over by L. The idea is that while a global type specifies a protocol among n roles from one global perspective, a group of local types specifies a protocol from the n local perspectives. Fig. 5 defines the operational semantics of groups, built on top of the operational semantics of local types; we use the f[x ↦ y] notation to update function f with entry x ↦ y. In words, group L is reduced either by synchronously reducing the local types of a sender r1 and a receiver r2 (yielding a communication from r1 to r2), or by reducing the local type of an idling role.
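Operationally, the two rules of Fig. 5 amount to matching a send with the corresponding receive, or letting a single role idle. A Python sketch (ours; `lsteps` is any function giving the one-step reductions of a single local type, e.g. an extension of the interpreter in Sect. 3.1 with local actions):

```python
def group_steps(group, lsteps):
    """All reductions of a group of local types (Fig. 5b). `group` maps
    role names to local types; local actions are tuples ('!', r1, r2, U),
    ('?', r1, r2, U), or ('eps', r, r1, r2). For projections, the sender
    role's key equals the channel's first component."""
    out = []
    for r1, l1 in group.items():
        for a1, l1_next in lsteps(l1):
            if a1[0] == 'eps':                      # idling of a single role
                out.append((a1, {**group, r1: l1_next}))
            elif a1[0] == '!':                      # look for the dual receive
                _, s, r2, u = a1
                for a2, l2_next in lsteps(group[r2]):
                    if a2 == ('?', s, r2, u):
                        comm = ('comm', s, r2, u)   # synchronous r1 -> r2 : U
                        out.append((comm, {**group, r1: l1_next, r2: l2_next}))
    return out

def group_terminated(group, lterm):
    """L down iff L(r) down for every r in dom L (Fig. 5a)."""
    return all(lterm(l) for l in group.values())
```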

3.3 End-Point Projection: from Global Types to Local Types


A key part of MPST (Fig. 1) is a projection operator that consumes a global
type G as input and produces a group of local types L as output; it is correct if,
under certain well-formedness conditions, G and L are operationally equivalent.
Let r(G) denote the set of all role names that occur in G. Fig. 6 defines our projection operator. In words, the projection of a communication r1→r2:U onto a role r is a send r1r2!U if the role is sender in the communication, a receive r1r2?U if it is receiver, or an idling action ε^r_{r1r2} if it is not involved; the projections of all other forms of global types onto r are homomorphic; the projection of a global type onto a set of roles R is the corresponding group of

$$\frac{T\!\downarrow}{T\!\Downarrow}\qquad
\frac{T \xrightarrow{\tau} T' \qquad T'\!\Downarrow}{T\!\Downarrow}$$

(a) Termination

$$\frac{T \xrightarrow{\alpha} T'}{T \xRightarrow{\alpha} T'}\qquad
\frac{T \xrightarrow{\tau} T' \xRightarrow{\alpha} T''}{T \xRightarrow{\alpha} T''}\qquad
\frac{T \xRightarrow{\alpha} T' \xrightarrow{\tau} T''}{T \xRightarrow{\alpha} T''}\qquad
\frac{T \xRightarrow{\tau} T'}{T \xRightarrow{\sigma} T'}$$

(b) Reduction

Fig. 7: Weak operational semantics; T, T′, T″ ∈ Glob ∪ Loc ∪ (R ⇀ Loc)

projections, where the side condition implies that the group is nonempty and
contains a local type for at least every role name that occurs in G. Thus, a group
of projections of G is a partial function relative to the set of all roles R, but it
is total relative to the set of roles r(G) ⊆ R that occur in G. (We note that we
also continue to assume global types are 1-free, closed, and deterministic.)
Our projection operator is similar to existing projection operators in the
MPST literature [34], but it also differs on a fundamental account: it produces
local types with annotated idling actions. These idling actions will be instrumen-
tal in the definition of our well-formedness conditions. We note that no idling
actions occur in the local types for the Key-Value Store protocol in Sect. 2. This
is because after the idling actions have been used to establish well-formedness,
they are of no more use and can be eliminated to simplify the local types.
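Transcribed into code, the projection operator of Fig. 6 is a one-pass traversal. The following Python sketch is our own rendering over the constructors of Sect. 2, with Act as the constructor for local atomic actions and action tuples as in the group sketch above (payload types omitted for brevity):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Act:
    a: tuple   # ('!', r1, r2, U), ('?', r1, r2, U), or ('eps', r, r1, r2)

def project(g, r):
    """G |> r (Fig. 6): sends/receives for the roles involved, annotated
    idling actions for everyone else; homomorphic on +, ., ||, and mu."""
    if isinstance(g, (One, Var)):
        return g
    if isinstance(g, Comm):
        if g.sender == r:
            return Act(('!', g.sender, g.receiver, g.label))
        if g.receiver == r:
            return Act(('?', g.sender, g.receiver, g.label))
        return Act(('eps', r, g.sender, g.receiver))   # annotated idling
    if isinstance(g, (Seq, Alt, Par)):
        return type(g)(project(g.g1, r), project(g.g2, r))
    if isinstance(g, Rec):
        return Rec(g.var, project(g.body, r))
    raise ValueError(f"unexpected global type: {g!r}")

def project_group(g, roles):
    """G |> R = {r -> G|>r | r in R}, defined when R covers the roles of G."""
    return {r: project(g, r) for r in roles}
```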
The following lemmas state key properties about termination and reduction
behaviour of global types and their projections: Lem. 1 states projection is sound
and complete for termination; Lem. 2 states the same for reduction.

Lemma 1. [ G↓ implies (G ↾ r)↓ ] and [ (G ↾ r)↓ implies G↓ ]

Proof. By induction on G. □

Lemma 2. [ G −g→ G′ implies (G ↾ r) −(g ↾ r)→ (G′ ↾ r) ]
and [ (G ↾ r) −(g ↾ r)→ L implies G −g→ G′ and L = G′ ↾ r for some G′ ]

Proof. Both conjuncts are proven by induction on the structure of G, also using Lem. 1 (needed because termination plays a role in reduction of ·). □

3.4 Weak Bisimilarity of Global Types, Local Types, and Groups

The idling actions introduced in local types by our projection operator are inter-
nal, because they never compose into communications that emerge between local
types in groups. Therefore, the operational equivalence relation under which we
prove the correctness of projection should be insensitive to idling actions.
First, let Aτ = {ε^r_{r1r2} | r1 ≠ r2 and r1 ≠ r ≠ r2} denote the set of all internal actions, ranged over by τ, σ. Second, Fig. 7 defines an extension of our operational semantics (Fig. 3) with relations that assert weak termination and weak reduction (i.e., versions of termination and reduction that are insensitive to internal actions). Third, Fig. 8 defines weak bisimilarity (≈), in terms of weak similarity (≲), in terms of weak termination and weak reduction; it coincides with the definition found in the literature (e.g., [2]), with the administrative
$$\frac{\begin{array}{c}
T_1\!\downarrow \text{ implies } T_2\!\Downarrow\\[2pt]
\Big[\big(T_1' \lesssim T_2' \text{ and } T_2 \xRightarrow{\alpha} T_2' \text{ for some } T_2'\big) \text{ or } \big(T_1' \lesssim T_2 \text{ and } \alpha \in A_\tau\big)\Big] \text{ for all } T_1 \xrightarrow{\alpha} T_1'
\end{array}}{T_1 \lesssim T_2}\qquad
\frac{T_1 \mathrel{R} T_2 \qquad R, R^{-1} \subseteq\; \lesssim}{T_1 \approx T_2}$$

Fig. 8: Weak operational equivalence; T1, T1′, T2, T2′ ∈ Glob ∪ Loc ∪ (R ⇀ Loc)

exception that we need the fourth rule in Fig. 7b to account for the fact we
have multiple different internal actions. We use a double horizontal line in the
formulation of rules to indicate they should be applied coinductively.
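On finite-state terms, the weak reduction relation of Fig. 7b can be computed by saturating ordinary reductions with internal steps. A Python sketch (ours; `steps` yields one-step reductions, `internal` tests membership of Aτ; the purely internal weak steps, interchangeable among internal actions by the fourth rule, are omitted):

```python
def weak_steps(t, steps, internal):
    """All observable weak reductions t ==a==> t': internal steps may occur
    before and after the single observable step (cf. Fig. 7b). A sketch for
    finite-state terms only."""
    def closure(start):                 # all terms reachable by internal steps
        seen, todo = {start}, [start]
        while todo:
            u = todo.pop()
            for a, v in steps(u):
                if internal(a) and v not in seen:
                    seen.add(v)
                    todo.append(v)
        return seen
    out = set()
    for u in closure(t):
        for a, v in steps(u):
            if not internal(a):
                out |= {(a, w) for w in closure(v)}
    return out
```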
The notion of weak reduction allows us to generalize the soundness and com-
pleteness of projection from roles (Lem. 2) to groups of roles: Lem. 3 states (1)
if G can g-reduce to G and the projection of G is defined, then the group of
projections of G can reduce to the group of projections of G , either directly or
with a trailing weak τ -reduction; (2) conversely, if the group of projections of G
can g-reduce to L , then G can g-reduce to G and either L equals the group of
projections of G , or it can get there with a weak reduction.
Lemma 3. [ G −g→ G′ and G ↾ R is defined ] implies
[ (G ↾ R) −g→ (G′ ↾ R), or (G ↾ R) −g→ L′ =τ⇒ (G′ ↾ R) for some L′ ];

and [ (G ↾ R) −g→ L′ ] implies
[ G −g→ G′ and ( L′ = G′ ↾ R or L′ =τ⇒ (G′ ↾ R) ) for some G′ ]

Proof. Both conjuncts are proven by induction on R, also using Lem. 2. □

3.5 Well-formedness of Global Types


In general, projection does not preserve weak operational semantics.

Example 3 (Bad protocols). The following global types (message types omitted)
specify “bad” protocols that do not permit “good” concurrent implementations:

G1 = a→b + a→c                          G2 = a→b · c→d

G1 ↾ a = ab! + ac!       G1 ↾ b = ab? + ε^b_{ac}       G1 ↾ c = ε^c_{ab} + ac?
G2 ↾ a = ab! · ε^a_{cd}      G2 ↾ b = ab? · ε^b_{cd}      G2 ↾ c = ε^c_{ab} · cd!      G2 ↾ d = ε^d_{ab} · cd?

Global type G1 specifies a communication from Alice to either Bob or Carol,


chosen by Alice. This is a bad protocol, because if Alice chooses Bob, there is no
way for Carol to know (and vice versa): Carol cannot locally distinguish between
whether Alice has not made her choice yet, or whether Alice has chosen Bob.
Formally, this is manifested in the fact that Carol's local type can at any time choose to perform idling action ε^c_{ab} (i.e., local type G1 ↾ c has two reductions, neither one of which has priority), thereby assuming that Alice has chosen Bob. However, Bob can symmetrically assume that Alice has chosen Carol. As a result, the group projection can reduce as follows: G1 ↾ {a, b, c} −ε^c_{ab}→ L1 −ε^b_{ac}→ L2. Now, L2 cannot reduce further, but Alice has not terminated yet. This sequence of reductions cannot be (weakly) simulated by G1.
Global type G2 specifies a communication from Alice to Bob, followed by a
communication from Carol to Dave. This is a bad protocol, because there is no
way for Carol and Dave to know when the communication from Alice to Bob
has occurred. Formally, this is manifested in the fact that Carol’s and Dave’s
local types can at any time choose to perform idling actions, thereby assuming
that the communication from Alice to Bob has occurred. As a result, the group
projection can reduce as follows: G2 ↾ {a, b, c, d} −ε^c_{ab}→ L1 −ε^d_{ab}→ L2 −(c→d)→ L3 −(a→b)→ L4. This sequence cannot be (weakly) simulated by G2. □

Next, we define two well-formedness conditions that invalidate the previous


examples; in Sect. 3.6, we prove that if these conditions are satisfied by a global
type G, it is indeed guaranteed that G and G  R are operationally equivalent
(i.e., weakly bisimilar). Instead of defining the conditions in terms of global types,
we define them in terms of projections (i.e., local types). Informally:

C For every r ∈ R, for every choice that local type G  r has between a weak
l
reduction = ⇒ (where l is a send, a receive, or an idling action) and a com-
τ
pletely unobservable weak reduction =⇒, choosing to perform the former
does not disable the latter, and vice versa. This can be thought of as a form
of commutativity between l and τ .
EC For every r ∈ R, one of the following is true:
l
1. For every every weak reduction = ⇒ that local type G  r can perform
(where l is a send or a receive, but not an idling action), it can perform
l
a reduction − →. That is, if G  r can perform l in the future after idling
actions, it can do l already eagerly in the present.
2. Local type G  r is the start of a causal chain: a sequence of τ -reductions,
followed by a non-τ -reduction, that are “causally related” to each other.
An εrr1 r2 -reduction is causally related to a εrr3 r4 -reduction iff {r1 , r2 } ∩
{r3 , r4 } = ∅. Globally speaking, this means communication between r3
and r4 must be preceded by communication between r1 and r2 .

These conditions must hold coinductively for all local types that G  r can reduce
to. Essentially, these conditions state that by performing idling actions, a local
type can neither decrease its possible behaviour (C), nor increase it (EC-1),
unless it is guaranteed the added behaviour cannot be exercised yet, because it
is causally related to other communications that need to happen first (EC-2).
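To give a feel for what checking condition C involves, the following Python sketch (ours) implements a conservative finite-state approximation in the spirit of the tool's optimisations in Sect. 4.1: weak reductions are replaced by single reductions, and weak bisimilarity by state equality, so True certifies the commutation property while False merely means "not established":

```python
def check_C_approx(states, steps, internal):
    """For every state where an action and an internal action are enabled
    together, try to close the commuting diamond with identical end states.
    Conservative approximation: equality stands in for weak bisimilarity,
    single reductions for weak ones."""
    for s in states:
        for a1, s1 in steps(s):
            for a2, s2 in steps(s):
                if not internal(a2) or (a1, s1) == (a2, s2):
                    continue
                after1 = {t for a, t in steps(s1) if a == a2}  # s1 --a2--> t
                after2 = {t for a, t in steps(s2) if a == a1}  # s2 --a1--> t
                if not (after1 & after2):
                    return False
    return True
```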

Example 4 (Bad protocols, continued). Global type G1 (Exmp. 3) is ill-formed:


its projections onto b and c violate condition C. Global type G2 (Exmp. 3) is
also ill-formed: its projections onto c and d violate condition EC. □

$$\frac{\left[\begin{array}{l}
\big(\Lambda_1' \approx \Lambda_2' \text{ and } \Lambda_1 \xRightarrow{\alpha_2} \Lambda_1' \text{ and } \Lambda_2 \xRightarrow{\alpha_1} \Lambda_2'\big)\ \text{or}\\
\big(\Lambda_1' \approx \Lambda_2 \text{ and } \Lambda_1 \xRightarrow{\alpha_2} \Lambda_1' \text{ and } \alpha_1 \in A_\tau\big)\ \text{or}\\
\big(\Lambda_1 \approx \Lambda_2' \text{ and } \Lambda_2 \xRightarrow{\alpha_1} \Lambda_2' \text{ and } \alpha_2 \in A_\tau\big)\ \text{or}\\
\big(\Lambda_1 \approx \Lambda_2 \text{ and } \alpha_1, \alpha_2 \in A_\tau\big)
\end{array}\right]\ \begin{array}{l}\text{for some } \Lambda_1', \Lambda_2',\\ \text{for all } \Lambda \xRightarrow{\alpha_1} \Lambda_1 \text{ and } \Lambda \xRightarrow{\alpha_2} \Lambda_2\end{array}}{C_{\alpha_1,\alpha_2}(\Lambda)}$$

$$\frac{C_{\alpha,\tau}(\Lambda) \text{ for all } \alpha, \tau \qquad C(\Lambda') \text{ for all } \Lambda \xrightarrow{\alpha} \Lambda'}{C(\Lambda)}$$

$$\frac{\left[\begin{array}{l}
\big(\Lambda' \approx \Lambda^{**} \text{ and } \Lambda \xrightarrow{\alpha_2} \Lambda^{*} \xRightarrow{\alpha_1} \Lambda^{**}\big)\ \text{or}\\
\big(\Lambda' \approx \Lambda^{*} \text{ and } \Lambda \xrightarrow{\alpha_2} \Lambda^{*} \text{ and } \alpha_1 \in A_\tau\big)\ \text{or}\\
\mathit{Chain}\ \Lambda
\end{array}\right]\ \begin{array}{l}\text{for some } \Lambda^{*}, \Lambda^{**},\\ \text{for all } \Lambda \xRightarrow{\alpha_1} \Lambda_0 \xrightarrow{\alpha_2} \Lambda'\end{array}}{EC_{\alpha_1,\alpha_2}(\Lambda)}$$

$$\frac{EC_{\tau,\alpha}(\Lambda) \text{ for all } \alpha \notin A_\tau, \tau \qquad EC(\Lambda') \text{ for all } \Lambda \xrightarrow{\alpha} \Lambda'}{EC(\Lambda)}$$

$$\frac{\big[L_1 = L_2 \text{ and } l_1 = l_2\big] \text{ for all } L \xrightarrow{l_1} L_1,\ L \xrightarrow{l_2} L_2 \qquad \big[\big(\mathtt{r}(\tau) \cap \mathtt{r}(l) \neq \emptyset \text{ and } \mathit{Chain}\ L'\big) \text{ or } l \notin A_\tau\big] \text{ for all } L \xrightarrow{\tau} L' \xrightarrow{l} L''}{\mathit{Chain}\ L}$$

(The rules defining C, EC, and Chain are applied coinductively.)

Fig. 9: Well-formedness conditions; Λ, Λ′, Λ″, Λ1, Λ1′, Λ2, Λ2′ ∈ Loc ∪ (R ⇀ Loc)

Fig. 9 defines C and EC formally. We define C not only for local types, but also
for groups of local types, as this simplifies some notation later on. We prove key
properties of C: Thm. 1 states commutativity of local sends/receives/idling (l) in
local types gets lifted to commutativity of global communications/idling (α) in
groups of local types; Lem. 4 states weak bisimilarity preserves commutativity.

Theorem 1. [ C_{l,τ}(L(r)) for all l, τ, for all r ∈ dom L ] implies [ C_{α,τ}(L) for all α, τ ];
and [ C(L(r)) for all r ∈ dom L ] implies C(L)

Proof. The first conjunct is proven by induction on the rules of =⇒. The second is proven by coinduction on the rule of C, also using the first conjunct. □

Lemma 4. [ C_{α1,α2}(L1) and L1 ≈ L2 ] implies C_{α1,α2}(L2);
and [ C(L1) and L1 ≈ L2 ] implies C(L2)

Proof. The first conjunct is proven by applying the definitions of C and ≈; the second is proven by coinduction on the rule of C, also using the first conjunct. □
second is proven by coinduction on the rule of C, also using the first conjunct. 

We also prove key properties of Chain and EC, both of which work specifically
for groups of projections: Lem. 5 states if the projections of r1 and r2 are both
causal chains, they cannot weakly reduce to local types where they can perform

reciprocal actions (r1 the send; r2 the receive); Thm. 2 states eagerness of lo-
cal sends/receives (not idling) in projections gets lifted to eagerness of global
communications in groups of projections (cf. Thm. 1).
Lemma 5. [ Chain((G ↾ R)(r1)) and (G ↾ R)(r1) =τ1⇒ L′(r1) −r1r2!U→ L″(r1),
and Chain((G ↾ R)(r2)) and (G ↾ R)(r2) =τ2⇒ L′(r2) −r1r2?U→ L″(r2) ] implies false

Proof. By induction on the rules of =⇒. □


Theorem 2. [ EC_{τ,l}((G ↾ R)(r)) for all l ∉ Aτ, τ, for all r ∈ R ] implies [ EC_{τ,α}(G ↾ R) for all α, τ ];
and [ EC(L(r)) for all r ∈ dom L ] implies EC(L)

Proof. The first conjunct is proven by using Lem. 5; the second is proven by coinduction on the rule of EC, also using the first conjunct. □

We note that, in contrast to Lem. 4 for C, we do not have a lemma that states
weak bisimilarity preserves EC. Such a lemma would have been highly useful in
our subsequent proofs, but it is unfortunately false, because weak bisimilarity
does not preserve Chain. A simple counterexample, for local types, is this: L1 = r1r2!U and L2 = ε^{r3}_{r4r5} · r1r2!U, where {r1, r2} ∩ {r3, r4, r5} = ∅. While L1 and L2 are weakly bisimilar, L1 is the start of a unary causal chain, but L2 is not.
The problem here is that Chain depends on the role names associated with idling
actions, whereas weak bisimilarity abstracts those role names away.
We call a global type well-formed if each of its projections satisfies C and EC.

3.6 Correctness of Projection under Well-Formedness


We now prove our main result: if a global type is well-formed, it is weakly bisimilar to the group of its projections. We start by defining a relation ⊒ to relate global types with groups of local types (denoted by R in Fig. 8):

$$\frac{C(G \upharpoonright R) \qquad EC(G \upharpoonright R) \qquad (G \upharpoonright R) \Rrightarrow L' \Lleftarrow L \text{ for some } L' \qquad C(L)}{G \sqsupseteq L}$$

Here, we write L1 ⇛ L2 (and, conversely, L2 ⇚ L1) as an abbreviation for:

[ L1 ≈ L1′ =τ⇒ L2′ ≈ L2 for some L1′, L2′ ] or [ L1 ≈ L2 ]
In words, L1 ⇛ L2 means L1 has a silent reduction (only τ-s) to a term that is weakly bisimilar to L2, or L1 is already weakly bisimilar to L2 (without any reductions). Essentially, if C(G ↾ R) and EC(G ↾ R), then ⊒ relates G to a set of groups S = {L | G ⊒ L} that can roughly be characterised as follows:
– (base) G ↾ R is in S;
– (successors) any group to which G ↾ R can silently reduce, is in S;
– (predecessors) any group that can silently reduce to G ↾ R, is in S;

– (pseudo-predecessors) any group that can silently reduce to a group to which G ↾ R can silently reduce, is in S;
– (closure) S is closed under weak bisimilarity.

The following technical lemma states if a well-formed group of projections G ↾ R can weakly g-reduce to some group L′, then the original global type G can g-reduce to some G′, and L′ and the group of projections of G′ either are weakly bisimilar, or they can silently reduce to a weakly bisimilar group L″.

Lemma 6. [ C(G ↾ R) and EC(G ↾ R) and (G ↾ R) =g⇒ L′ ] implies
[ G −g→ G′ and (G′ ↾ R) ⇛ L″ ⇚ L′ for some L″, for some G′ ]

Proof. By induction on the rules of =⇒, also using Lem. 3. □

The following two lemmas state key properties of ⊒: Lem. 7 states ⊒ preserves termination (as weak termination); Lem. 8 states ⊒ coinductively preserves reduction (as weak reduction). Together, these lemmas imply ⊒ ⊆ ≲ and ⊒⁻¹ ⊆ ≲, which in turn imply ⊒ ⊆ ≈.

Lemma 7. [ G ⊒ L and G↓ ] implies L⇓;
and [ G ⊒ L and L↓ ] implies G⇓

Proof. The first conjunct is proven by induction on the rules of =⇒, also using Lem. 1; the second is proven by contradiction (assume not G↓; derive false; conclude G↓; it implies G⇓). □

Lemma 8. [ G ⊒ L and G −g→ G′ ] implies [ G′ ⊒ L′ and L =g⇒ L′ for some L′ ];
and [ G ⊒ L and L −g→ L′ ] implies [ G′ ⊒ L′ and G −g→ G′ for some G′ ];
and [ G ⊒ L and L −τ→ L′ ] implies G ⊒ L′

Proof. The first and second conjunct are proven by induction on the rules of =⇒, also using Lemmas 3–4; the third is proven by induction on the rules of =⇒. □

Theorem 3. [ C(G ↾ R) and EC(G ↾ R) ] implies G ≈ (G ↾ R)

Proof. By coinduction on the rule of ≲ (Fig. 8), also using Lemmas 7–8. □

A group of local types L enjoys deadlock-freedom if it either has successfully


terminated (L ↓; Fig. 5a) or can make another reduction. A group of local types
L enjoys absence of protocol violations relative to global type G if, coinductively,
every non-τ reduction of L can be simulated by G (i.e., every communication
in the group is “permitted” by G). The following corollary relates Thm. 3 of
operational equivalence to these classical MPST properties:

Corollary 1. If global type G is well-formed, then the group of G’s projections


enjoys deadlock-freedom and absence of protocol violations relative to G.
The key insight to understand this is that global types are by definition free
of deadlocks (they either reduce to 1 , or they never terminate; Fig. 3), while
weak bisimilarity preserves deadlock-freedom of global types in their projections
(notably, weak bisimilarity is sensitive to termination, and a group of local types
terminates only if all individual local types terminate; Fig. 5a). Weak bisimilarity
also directly implies absence of protocol violations.

3.7 Decidability of Checking Well-Formedness

We note our proof of Thm. 3 is non-constructive, in the sense that ⊒ is infinitely large (i.e., for each group of local types, there exist infinitely many weakly bisimilar groups). The following proposition states this is not a problem in practice.

Proposition 1. Checking C(L) and EC(L) is decidable.

The rationale behind this proposition is as follows. First, to check C(L) and
EC(L), by Thm. 1 and Thm. 2, it suffices to check C(L(r)) and EC(L(r)) for
each r ∈ dom L. For each such local type L(r), there are two possibilities.
If local type L(r) has finite control, its state space can be exhaustively ex-
plored in finite time, so checking C(L(r)) and EC(L(r)) is obviously decidable.
In contrast, if L(r) has non-finite control, we make two observations. The first observation is that the only possible source of infinity is the occurrence of recursion variables under parallel composition. The second observation is that
C and EC are true for L1 ∥ L2 if they are true for L1 and L2 separately; this is because C and EC essentially assert a "diamond structure" on the reductions of L1 ∥ L2, which is precisely the operational semantics of ∥ (Fig. 3). Thus, we can check C(L1 ∥ L2) and EC(L1 ∥ L2) by checking C(L1), C(L2), EC(L1), and EC(L2), thereby "avoiding" the possible source of infinity.
We note that splitting the checks for parallel composition in this way not only ensures decidability; it also avoids exponential state explosion (in the number of nested ∥-operators in a single local type) in local types with finite control.
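Concretely, the decomposition can be phrased as a two-line recursion. The sketch below is ours (it splits at the Par constructor of the sketch in Sect. 2 and delegates the ∥-free parts to any finite-state checker):

```python
def check_modular(l, check_finite_control):
    """Check C/EC of a local type by splitting at parallel compositions
    (Sect. 3.7): L1 || L2 is well-formed if L1 and L2 are, so the state
    space of || is never built."""
    if isinstance(l, Par):
        return (check_modular(l.g1, check_finite_control) and
                check_modular(l.g2, check_finite_control))
    return check_finite_control(l)  # exhaustive exploration is decidable here
```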

3.8 Discussion of Challenges

Our use of (weak) bisimilarity, plus the key insight to annotate silent actions with
additional information to keep track of choices, made the problem of proving the
correctness of projection (Thm. 3) feasible. The major technical challenges to
achieve this were defining the right bisimulation relation (Sect. 3.6) and discovering corresponding well-formedness conditions (Sect. 3.5).
A naive weak bisimulation relation, Rnaive , relates every global type only
with its group of projections. Rnaive is sufficient to prove that every reduction
of a global type can be weakly simulated with one non-silent reduction of the
group (sender and receiver), followed by a number of silent reductions (idling

[Fig. 10: Overview of mpstpp: parse a .glob or .scr file into a global type; project onto all roles; check well-formedness of the resulting local types; if well-formed, generate APIs in Java.]

Fig. 10: Overview of mpstpp

processes). In contrast, Rnaive is insufficient to prove that every reduction of the


group can be simulated by its global type, because of silent actions: if global type
G is related to group of projections L by Rnaive , and a silent action subsequently
reduces L to L , the simulation fails, as Rnaive does not relate G to L .
To alleviate this issue, we defined the bisimulation relation in such a way
that it relates every global type G to a group of local types that are not nec-
essarily equal to the projections of G, but every local type can be behind the
corresponding projection (the local type can reach the projection with silent
actions) or ahead (the projection can reach the local type with silent actions).

4 Practical Experience with the Theory

4.1 Implementation

Tool. We implemented a tool, mpstpp, based on the core theoretical contribu-


tions of this paper. Fig. 10 shows a high-level overview of the tool, including the
main components (boxes) and data flows (arrows).
First, mpstpp parses an input .glob-file to a data structure for a global type G (programmer-friendly Scribble-style syntax [35] is also supported as input). Then, it projects G onto all roles that occur in G. Then, it checks each of the resulting local types for well-formedness, depending on settings either sequentially or in parallel: a key advantage of the formulation of our well-formedness conditions is that they can be checked modularly for every role in isolation, enabling us to take advantage of modern multicore hardware. Finally, if the local types are well-formed, idling actions are eliminated and typed communication APIs are generated from the local types to enable MPST++-based programming in Java.

Optimisations. Parsing, computing projections, and generating APIs is rela-


tively inexpensive; instead, the run times of our tool are dominated by checks for
well-formedness. We therefore implemented several optimisations to make these
checks more efficient. Before we present these optimisations, we first note that
the complexity of checking well-formedness of a local type L is polynomial in
the number of successors that can be reached from L (Fig. 9).
(1) Our first optimisation targets local types with parallel composition; local type L1 ∥ L2 is potentially a serious bottleneck, as its number of successors is exponential in the number of nested ∥-operators. Therefore, even with finite state spaces, we check the well-formedness of L1 ∥ L2 by checking the well-formedness of L1 and L2, without explicitly considering the exponentially many successors of L1 ∥ L2, exploiting the same observation as with decidability (Sect. 3.7).
(2) Our second optimisation concerns computation of weak reductions. In
particular, to check whether C and EC are true for a local type L, according to
their definitions (Fig. 9), we need to iterate over each of their weak reductions.
Especially if L has many τ -reductions (Fig. 7), computing the set of weak reduc-
tions can be expensive. To avoid this, mpstpp computes sound (but incomplete)
approximations of C and EC. We implemented two kinds of approximations: (a)
checking versions of C and EC where every occurrence of = ⇒ in the definition is
replaced with → − , and (b) checking L ≈ L for every τ -reduction from L to L .
Approximation (a) is sound for both C and EC (rationale: if individual reductions
can commute, sequences of reductions consisting of those individual reductions
can commute as well), but approximation (b) is sound only for C (rationale:
auxiliary relation Chain of EC is not preserved by weak bisimilarity). To ensure
soundness, thus, mpstpp never uses approximation (b) for EC.
(3) Our third optimisation targets the checks for weak bisimilarity that occur in several places in the definitions of C and EC (Fig. 9). Instead of computing the full reduction relations and running an algorithm to decide their weak bisimilarity
(which would be computationally costly), we take advantage of the fact that our
language of local types is based on existing algebras (Sect. 3.1) that have sound
and complete axiomatisations. Specifically, to check whether two local types are
weakly bisimilar, mpstpp applies the axioms as rewrite rules and compares the
resulting normal forms for structural equality. To ensure rewriting is fast, we
sacrificed completeness (i.e., we use rewriting only to eliminate as many silent
actions as possible in a sound way, but for instance, our rewrite procedure cannot
prove that (L1 · τ ) + L2 and L2 + L1 are weakly bisimilar); however, for the ample
examples we tried (including this paper’s), this optimisation is highly effective.
Optimisations (2) and (3) are conservative: mpstpp may conclude C or EC is
false, even though it is actually true. While this affects completeness, soundness
is guaranteed: if mpstpp concludes a local type is well-formed, it really is.

4.2 Evaluation of the Approach

Setup. In the previous section, we formulated and proved the theoretical cor-
rectness of our well-formedness conditions (Thm. 3). In this section, we demon-
strate the practical usefulness through experimental evaluation in benchmarks.
Specifically, we show that checking our well-formedness conditions is faster and
more scalable than explicitly checking operational equivalence (which currently
seems the only alternative to attain the same level of expressiveness as our work).
In our benchmarks, we compare three approaches to check operational equiv-
alence between a global type and its group of projected local types:

– mpstpp-seq (baseline): In this approach, the mpstpp tool is used to check our
well-formedness conditions (which imply operational equivalence; Thm. 3),
without using any form of parallel processing.

– mpstpp-par: Like mpstpp-seq, except each projected local type is checked


in a separate thread. The fact our well-formedness conditions can be easily
parallelised in this way is an important practical advantage.
– explicit: In this approach, mpstpp is used only for parsing and projecting;
after that, we use the state-of-the-art verification tool set mCRL2 [10,20,29]
to explicitly check operational equivalence (details below).
We identified six example protocols (details below) that can naturally be
scaled in the number of roles N (e.g., the number of Clients in the Key-Value
Store protocol). Using each of the three approaches, for each of the protocols, for each value of N between the minimal number of roles Nmin (e.g., Nmin = 2 in the Key-Value Store protocol: the Server and one Client) and 16, we subsequently checked operational equivalence; varying N in this way yields insights not only in per-case performance, but also scalability. To get statistically reliable results [31],
we repeated executions as many times as was necessary until the 95% confidence
interval was within 5% of our reported means (i.e., there is a 95% probability
that the true mean is within 5% of our reported means).
We ran our benchmarks on a machine with an Intel Xeon 6130 processor (16
cores; no hyper-threading), using Debian 9, Java 13, and mCRL2 201908.0.

Translation to mCRL2. In the explicit approach, we use mCRL2 [10,20,29]


to explicitly check if global type G and its group of projections L are opera-
tionally equivalent. Our choice for mCRL2 is motivated by the fact our languages
of global and local types are based on the same process algebra as mCRL2’s spec-
ification language, so their translation to mCRL2 specifications is direct and
straightforward. Moreover, mCRL2 is mature (e.g., used in industry [5]), and
it uses optimised, state-of-the-art algorithms to check behavioural equivalences
(e.g., [28]), so we are comparing our tool with a serious competitor.
First, we translate global type G to mCRL2 specification G. Then, we use
mCRL2 tools mcrl22lps and lps2lts to normalize G to a linear process spec-
ification (LPS) and generate a corresponding labelled transition system (LTS).
Because of the directness of the translation, the transition labels in the resulting
LTS are all global communication actions of the form r1→r2:U.
Second, we translate group of projections L, consisting of roles r1 , ..., rn , to
mCRL2 specification L. It looks as follows (in formal mCRL2 notation [29]):

∇{ri→rj:U | 1≤i,j≤n, i≠j, U∈U}(
    Γ{(ri rj!U | ri rj?U) → (ri→rj:U) | 1≤i,j≤n, i≠j, U∈U}(L(r1) ∥ ... ∥ L(rn)))

where each L(ri) is a direct translation of local type L(ri) to an mCRL2 specification; ∥ is a form of parallel composition that prescribes both interleaving and synchronisation of operand actions; | is synchronous composition of actions; Γ is the communication operator that replaces synchronised local send/receive actions ri rj!U | ri rj?U with global communication action ri→rj:U; and ∇ is the allow operator that allows only global communication actions to be executed (i.e., unsynchronized, individual send/receive actions cannot be executed).

When translating a local type L(ri) to an mCRL2 specification L(ri), to make mCRL2's subsequent verification easier, we already eliminate as many idling actions ε^r_{r1r2} as possible (modulo branching bisimulation); those that remain are represented as a general τ action, because mCRL2 does not need the additional information provided by ε^r_{r1r2}. Then, we use mcrl22lps and lps2lts to generate an LPS and LTS for L.
Third, we use mCRL2 tool ltscompare to check if the LTS for G is weakly
bisimilar to the LTS for L. We note that normalisation to an LPS using
mcrl22lps is a requirement to use ltscompare.

Protocols. We used the following protocols in our benchmarks:

Key-Value Store (KVS): This protocol is the same protocol as the one pre-
sented in Sect. 2, except each inner parallel composition () is replaced with
sequential composition (·). This is because mcrl22lps does not support nor-
malisation of mCRL2 specifications where  occurs under recursion.
Load Balancer (LB): This protocol consists of a Master and a number of Workers. Iteratively, first, a Request-message is communicated from the Master to one of the Workers; then, a Response-message is communicated from that Worker to the Master. (A global type for this protocol is sketched after this list.)
Work Stealing (WS): This protocol consists of a Master and a number of
Workers. Iteratively, a Job-message is communicated from the Master to one
of the Workers. Meanwhile, Workers can try to “steal” jobs from each other:
at any point, first, a Steal-message can be communicated from one Worker
to another Worker; then, either a Job-message (if the former Worker has a
job to spare) or a None-message (otherwise) is communicated from the latter
Worker to the former Worker.
Map/Reduce (MR): This protocol consists of a Master and a number of Work-
ers. First, in no particular order, a Map-message is communicated from the
Master to each Worker; then, in no particular order, a Reduce-message is
communicated from each Worker to the Master.
Peer-to-Peer (PtP): This protocol consists of a number of Peers. Unordered,
a Msg-message is communicated from each Peer to each other Peer.
Pub/Sub (PS): This protocol consists of a Publisher and a number of Sub-
scribers. In no particular order, a Sub-message can be communicated once
from each Subscriber to the Publisher to gain a subscription. Concurrently,
a Pub-message can be communicated from the Publisher to each Subscriber
with a subscription.
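As an illustration of how such a description maps onto our language of global types, a plausible rendering of the Load Balancer is the following (ours; message payloads omitted, with Master m and Workers w1, ..., wn; the recursion is never exited):

$$G_{\mathrm{LB}} \;=\; \mu X.\; \exists w \in \{w_1, \ldots, w_n\}.\; m{\to}w{:}\mathsf{Request} \cdot w{\to}m{:}\mathsf{Response} \cdot X$$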

The features used in each of these protocols are summarised in a table. [Table: for each protocol (KVS, LB, WS, MR, PtP, PS), which of the features +, ∃, and ∥ its global type uses; the Key-Value Store uses all three.]
For each 1≤n≤15, we instantiated the Key-Value Store, Load Balancer, Work Stealing, and Map/Reduce protocols with 1 Server/Master + n Clients/Workers. For each 2≤n≤16, we instantiated the Peer-to-Peer protocol with n Peers. For

[Fig. 11: Speedups (y-axis; y>1E+0 means faster, y<1E+0 means slower) of explicit relative to mpstpp-seq as the number of roles increases (x-axis). Panels: (a) Key-Value Store, (b) Load Balancer, (c) Work Stealing, (d) Map/Reduce, (e) Peer-to-Peer, (f) Pub/Sub.]

each 2≤n≤7, we instantiated the Pub/Sub protocol with 1 Publisher and n Sub-
scribers; we did not instantiate the Pub/Sub protocol with n>7 Subscribers, as
the resulting global types are too large (their size grows exponentially in n).

Benchmark results. Figures 11–12 show the results of our benchmarks. The x-axis indicates the number of roles; the y-axis indicates relative speed-ups. The baselines are at y=1E+0 and y=1: above the baseline, a competing approach is faster than mpstpp-seq; below it, it is slower. We draw two conclusions.
(1) For each protocol and number of roles, mpstpp-seq outperforms explicit. In the cases of Key-Value Store and Load Balancer, explicit grows towards mpstpp-seq, but the growth levels off as the number of roles increases, and explicit is still about two orders of magnitude slower than mpstpp-seq in the best of circumstances. In the cases of Work Stealing, Peer-to-Peer, and Pub/Sub, the LTSs generated from the translated mCRL2 specifications were too large to be compared (i.e., ltscompare produced an error) beyond 7, 5, and 5 roles; this was no issue for mpstpp-seq. In the case of Map/Reduce, the LTSs were small enough to compare using mCRL2's ltscompare, but after an initial upwards slope for 2≤N≤7 roles, explicit starts to perform progressively worse.
(2) Especially for larger numbers of roles, parallelisation can yield
serious performance improvements. In the cases of Key-Value Store and
Load Balancer, mpstpp-par outperforms mpstpp-seq only with 14–16 roles; for
smaller numbers of roles, parallel execution is slower. In the worst case (Load
Balancer, 2 roles), the slowdown is roughly 10.9μs/3.2μs ≈ 3.4; we hypothesise that be-

[Fig. 12: Speedups (y-axis; y>1 means faster, y<1 means slower) of mpstpp-par relative to mpstpp-seq as the number of roles increases (x-axis). Panels: (a) Key-Value Store, (b) Load Balancer, (c) Work Stealing, (d) Map/Reduce, (e) Peer-to-Peer, (f) Pub/Sub.]

cause of the low absolute execution times, the cost of spawning and synchronising
threads outweighs their benefit. However, the ascending gradient indicates that
as the number of roles increases, relatively more of the total work can be paral-
lelised, yielding progressive rewards. In the cases of Work Stealing, Map/Reduce,
Peer-to-Peer, and Pub/Sub, similar trends can be observed, except y=1 is crossed
sonner. The absolute execution times for these protocols and for small numbers
of roles are higher than for Key-Value Store and Load Balancer.

5 Related Work

Multiparty compatibility. Closest to this paper is existing literature on mul-


tiparty compatibility [6,24,40,42]. The key idea, initially developed by Deniélou
and Yoshida for the original MPST [23,24], is to represent (groups of) local types
operationally as (systems of) communicating finite state machines (CFSM) [8]. A
CFSM M is a state machine where transitions are labelled with sends/receives;
a system of CFSMs S is a parallel composition where CFSMs communicate
through asynchronous buffers. Multiparty compatibility, then, is a condition on
the reachable states and transitions of a system S = (M1 , ..., Mn ): if it is sat-
isfied by S, the system is guaranteed to be safe (no deadlocks; no unmatched
sends/receives) and live (S terminates, assuming at least one Mi can termi-
nate). Multiparty compatibility is a sufficient condition to guarantee safety and
liveness, but not necessary: there exist safe/live systems that are not multiparty

compatible. Therefore, several generalisations have been proposed to cover timed


behaviour [6], undirected choice [40], and non-synchronisability [42].
The main similarities between our method in this paper and the multiparty
compatibility approach are: (1) we also use an operational interpretation of local
types; (2) we guarantee similar liveness/safety properties; (3) and we also neatly
factor out the act of checking conformance of processes to local types (resp. CF-
SMs). In contrast, we support a wider range of behaviours. Moreover, from a
practical/computational perspective, multiparty compatibility is a global condi-
tion that needs to be checked on the whole state space of a system (i.e., parallel
composition of the CFSMs), prone to exponential blow-up; our well-formedness
conditions, in contrast, are completely local and require only polynomial time to
check. The reason we do not require CFSM-like machinery in this paper is that
our operational correspondence (weak bisimilarity) is sensitive to termination:
notably, in Fig. 5a, a group of local types terminates iff every individual lo-
cal type terminates (for multiparty compatibility, proofs are done modulo trace
equivalence [24], which cannot distinguish between successful/abnormal termi-
nation and is therefore in itself too weak to show deadlock-freedom).

Expressiveness of MPST. In the original MPST theory [33], and many of


its descendants (e.g., [14,19,22,24,25,43]), the restrictions on choices are en-
forced through a combination of syntax and additional well-formedness conditions. Notably, in these works, communications in global types are specified as r1→r2:{ℓi · Gi}i∈I, so syntactically, it is impossible to specify choices among senders or receivers. There exist also papers where a seemingly more general binary +-like operator is introduced, particularly those that support choices among receivers [16,23,36,40], but the well-formedness conditions still basically restrict the use of + in these works to r1→r2:{ℓi · Gi}i∈I or r→{ri : ℓi · Gi}i∈I.
This is the first paper where well-formedness conditions do not force the use
of + into one of those two restricted forms. Moreover, our well-formedness con-
ditions are compatible with unbounded interleaving (recursion under parallel),
beyond similar operators in previous work [16,22,23,43]. An alternative approach
is to completely omit statically checked well-formedness conditions (and projec-
tion), and to only dynamically verify communication actions against global types
through monitoring, as recently proposed [30]. The language of global types in
that paper is more expressive than ours in this paper, but all verification happens
at run-time, whereas we provide correctness guarantees already at compile-time.

Session types and model checking. Recently, there has been growing interest
in using model checking to verify properties of (multiparty) session types, similar
to our use of mCRL2 as an alternative to checking well-formedness (Sect. 4.2).
Lange et al. [39] infer behavioural types from Go programs and use mCRL2 to
verify the inferred types, to establish safety properties (combined with another
tool, KITTeL [26], to establish liveness). Hu and Yoshida [36] use a custom model
checker to verify safety and progress properties of local types (represented as
CFSMs) as part of API generation in the Scribble toolchain for MPST [35].

Closest to our use of mCRL2 is the work of Scalas et al. [52,53], where mCRL2
is used to verify properties of local types (e.g., deadlock-freedom), while a form of
dependent type-checking is used to verify conformance of processes against those
types (i.e., actors in Scala); no global types and projection are used, though (pro-
grammers write local types manually). The idea is that properties model-checked
on the types carry over to the processes. Similarly, Scalas and Yoshida [51] use
mCRL2 to model-check session environments, as a more expressive alternative
to the classical consistency condition needed to prove subject reduction. Note
that [51, Theorem 5.15] shows that, in the case that a set of processes is typable
by a single multiparty session (i.e. a single global type), type-level properties
including safety, deadlock-freedom and liveness guarantee the same properties
for multiparty session π-processes. Hence our type-level analysis is directly us-
able to provide decidable procedures to verify session π-calculi with extended
expressiveness [51, Theorem 7.2].

6 Conclusion

A key open problem with multiparty session types (MPST) concerns expressive-
ness: none of the previous languages of global and local types supports arbitrary
choice (e.g., choices between different senders), existential quantification over
roles, and unbounded interleaving of subprotocols (in the same session). In this
paper, we presented the first theory that supports these features. Our main the-
oretical result is operational equivalence under weak bisimilarity: this guarantees
classical MPST properties for groups of local types projected from a global type,
namely freedom of deadlocks and absence of protocol violations. Our main prac-
tical result is that our well-formedness conditions, which guarantee operational
equivalence, can be checked orders of magnitude faster than directly checking
weak bisimilarity, which is demonstrated by our benchmark results.
We identify several interesting avenues for future work. First, it is useful to
extend our theory with parametrisation along the lines of Castro et al. [18] (which
currently works only for restrictive choices); their proof technique for correctness
seems to offer substantial synergy with our bisimilarity-based approach in this
paper. Second, we aim to investigate extensions of our theory with subtyping
(e.g., in terms of weak similarity). Notably, while asynchronous communication
can be encoded in our current theory, asynchronous subtyping is known to be
undecidable [9,41], so the connection between the two is interesting to explore.

Acknowledgments. Funded by the Netherlands Organisation for Scientific Research (NWO): 016.Veni.192.103. This work was carried out on the Dutch na-
tional e-infrastructure with the support of SURF Cooperative. Supported by EP-
SRC projects EP/K034413/1, EP/K011715/1, EP/L00058X/1, EP/N027833/1,
EP/N028201/1, EP/T006544/1.

References
1. Ancona, D., Bono, V., Bravetti, M., Campos, J., Castagna, G., Deniélou, P., Gay,
S.J., Gesbert, N., Giachino, E., Hu, R., Johnsen, E.B., Martins, F., Mascardi, V.,
Montesi, F., Neykova, R., Ng, N., Padovani, L., Vasconcelos, V.T., Yoshida, N.:
Behavioral types in programming languages. Foundations and Trends in Program-
ming Languages 3(2-3), 95–230 (2016)
2. Baeten, J.C.M., Bravetti, M.: A ground-complete axiomatisation of finite-state pro-
cesses in a generic process algebra. Mathematical Structures in Computer Science
18(6), 1057–1089 (2008)
3. Bergstra, J.A., Fokkink, W., Ponse, A.: Chapter 5 - process algebra with recursive
operations. In: Bergstra, J., Ponse, A., Smolka, S. (eds.) Handbook of Process
Algebra, pp. 333 – 389. Elsevier Science (2001)
4. Bergstra, J.A., Klop, J.W.: Process algebra for synchronous communication. In-
formation and Control 60(1-3), 109–137 (1984)
5. van Beusekom, R., Groote, J.F., Hoogendijk, P.F., Howe, R., Wesselink, W.,
Wieringa, R., Willemse, T.A.C.: Formalising the Dezyne modelling language in mCRL2. In: FMICS-AVoCS. Lecture Notes in Computer Science, vol. 10471, pp.
217–233. Springer (2017)
6. Bocchi, L., Lange, J., Yoshida, N.: Meeting deadlines together. In: CONCUR.
LIPIcs, vol. 42, pp. 283–296. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik
(2015)
7. Bocchi, L., Yang, W., Yoshida, N.: Timed multiparty session types. In: CONCUR.
Lecture Notes in Computer Science, vol. 8704, pp. 419–434. Springer (2014)
8. Brand, D., Zafiropulo, P.: On communicating finite-state machines. J. ACM 30(2),
323–342 (1983)
9. Bravetti, M., Carbone, M., Zavattaro, G.: Undecidability of asynchronous session
subtyping. Inf. Comput. 256, 300–320 (2017)
10. Bunte, O., Groote, J.F., Keiren, J.J.A., Laveaux, M., Neele, T., de Vink, E.P., Wes-
selink, W., Wijs, A., Willemse, T.A.C.: The mCRL2 toolset for analysing concurrent
systems - improvements in expressivity and usability. In: TACAS (2). Lecture Notes
in Computer Science, vol. 11428, pp. 21–39. Springer (2019)
11. Capecchi, S., Castellani, I., Dezani-Ciancaglini, M.: Typing access control and se-
cure information flow in sessions. Inf. Comput. 238, 68–105 (2014)
12. Capecchi, S., Castellani, I., Dezani-Ciancaglini, M.: Information flow safety in mul-
tiparty sessions. Mathematical Structures in Computer Science 26(8), 1352–1394
(2016)
13. Capecchi, S., Castellani, I., Dezani-Ciancaglini, M., Rezk, T.: Session types for
access and information flow control. In: CONCUR. Lecture Notes in Computer
Science, vol. 6269, pp. 237–252. Springer (2010)
14. Carbone, M., Montesi, F.: Deadlock-freedom-by-design: multiparty asynchronous
global programming. In: POPL. pp. 263–274. ACM (2013)
15. Carbone, M., Yoshida, N., Honda, K.: Asynchronous session types: Exceptions and
multiparty interactions. In: SFM. Lecture Notes in Computer Science, vol. 5569,
pp. 187–212. Springer (2009)
16. Castagna, G., Dezani-Ciancaglini, M., Padovani, L.: On global types and multi-party sessions. Logical Methods in Computer Science 8(1) (2012)
17. Castellani, I., Dezani-Ciancaglini, M., Pérez, J.A.: Self-adaptation and secure information flow in multiparty communications. Formal Asp. Comput. 28(4), 669–696 (2016)
18. Castro, D., Hu, R., Jongmans, S., Ng, N., Yoshida, N.: Distributed programming using role-parametric session types in Go: statically-typed endpoint APIs for dynamically-instantiated communication structures. PACMPL 3(POPL), 29:1–29:30 (2019)
19. Coppo, M., Dezani-Ciancaglini, M., Yoshida, N., Padovani, L.: Global progress for
dynamically interleaved multiparty sessions. Mathematical Structures in Computer
Science 26(2), 238–302 (2016)
20. Cranen, S., Groote, J.F., Keiren, J.J.A., Stappers, F.P.M., de Vink, E.P., Wesselink, W., Willemse, T.A.C.: An overview of the mCRL2 toolset and its recent advances. In: TACAS. Lecture Notes in Computer Science, vol. 7795, pp. 199–213. Springer (2013)
21. Davoudian, A., Chen, L., Liu, M.: A survey on NoSQL stores. ACM Comput. Surv. 51(2), 40:1–40:43 (2018)
22. Deniélou, P., Yoshida, N.: Dynamic multirole session types. In: POPL. pp. 435–446.
ACM (2011)
23. Deniélou, P., Yoshida, N.: Multiparty session types meet communicating automata.
In: ESOP. Lecture Notes in Computer Science, vol. 7211, pp. 194–213. Springer
(2012)
24. Deniélou, P., Yoshida, N.: Multiparty compatibility in communicating automata:
Characterisation and synthesis of global session types. In: ICALP (2). Lecture
Notes in Computer Science, vol. 7966, pp. 174–186. Springer (2013)
25. Deniélou, P., Yoshida, N., Bejleri, A., Hu, R.: Parameterised multiparty session
types. Logical Methods in Computer Science 8(4) (2012)
26. Falke, S., Kapur, D., Sinz, C.: Termination analysis of imperative programs using
bitvector arithmetic. In: VSTTE. Lecture Notes in Computer Science, vol. 7152,
pp. 261–277. Springer (2012)
27. Gessert, F., Wingerath, W., Friedrich, S., Ritter, N.: NoSQL database systems: a survey and decision guidance. Computer Science – R&D 32(3-4), 353–365 (2017)
28. Groote, J.F., Jansen, D.N., Keiren, J.J.A., Wijs, A.: An O(m log n) algorithm for computing stuttering equivalence and branching bisimulation. ACM Trans. Comput. Log. 18(2), 13:1–13:34 (2017)
29. Groote, J.F., Mousavi, M.R.: Modeling and Analysis of Communicating Systems. MIT Press (2014)
30. Hamers, R., Jongmans, S.S.: Discourje: Runtime verification of communication protocols in Clojure. In: TACAS 2020 (in press)
31. Hoefler, T., Belli, R.: Scientific benchmarking of parallel computing systems: twelve
ways to tell the masses when reporting performance results. In: SC. pp. 73:1–73:12.
ACM (2015)
32. Honda, K., Tokoro, M.: An object calculus for asynchronous communication. In:
ECOOP. Lecture Notes in Computer Science, vol. 512, pp. 133–147. Springer
(1991)
33. Honda, K., Yoshida, N., Carbone, M.: Multiparty asynchronous session types. In:
POPL. pp. 273–284. ACM (2008)
34. Honda, K., Yoshida, N., Carbone, M.: Multiparty asynchronous session types. J.
ACM 63(1), 9:1–9:67 (2016)
35. Hu, R., Yoshida, N.: Hybrid session verification through endpoint API generation.
In: FASE. Lecture Notes in Computer Science, vol. 9633, pp. 401–418. Springer
(2016)
36. Hu, R., Yoshida, N.: Explicit connection actions in multiparty session types. In:
FASE. Lecture Notes in Computer Science, vol. 10202, pp. 116–133. Springer (2017)
37. Hüttel, H., Lanese, I., Vasconcelos, V.T., Caires, L., Carbone, M., Deniélou, P.,
Mostrous, D., Padovani, L., Ravara, A., Tuosto, E., Vieira, H.T., Zavattaro, G.:
Foundations of session types and behavioural contracts. ACM Comput. Surv.
49(1), 3:1–3:36 (2016)
38. Jongmans, S.S., Yoshida, N.: Exploring Type-Level Bisimilarity towards More Ex-
pressive Multiparty Session Types. Tech. Rep. TR-OU-INF-2020-01, Open Univer-
sity of the Netherlands (2020)
39. Lange, J., Ng, N., Toninho, B., Yoshida, N.: A static verification framework for message passing in Go using behavioural types. In: ICSE. pp. 1137–1148. ACM (2018)
40. Lange, J., Tuosto, E., Yoshida, N.: From communicating machines to graphical
choreographies. In: POPL. pp. 221–232. ACM (2015)
41. Lange, J., Yoshida, N.: On the undecidability of asynchronous session subtyping.
In: FoSSaCS. Lecture Notes in Computer Science, vol. 10203, pp. 441–457 (2017)
42. Lange, J., Yoshida, N.: Verifying asynchronous interactions via communicating
session automata. In: CAV (1). Lecture Notes in Computer Science, vol. 11561,
pp. 97–117. Springer (2019)
43. Mostrous, D., Yoshida, N., Honda, K.: Global principal typing in partially commutative asynchronous sessions. In: ESOP. Lecture Notes in Computer Science, vol. 5502, pp. 316–332. Springer (2009)
44. Neykova, R., Bocchi, L., Yoshida, N.: Timed runtime monitoring for multiparty
conversations. Formal Asp. Comput. 29(5), 877–910 (2017)
45. Neykova, R., Hu, R., Yoshida, N., Abdeljallal, F.: A session type provider: compile-time API generation of distributed protocols with refinements in F#. In: CC. pp. 128–138. ACM (2018)
46. Neykova, R., Yoshida, N.: Let it recover: multiparty protocol-induced recovery. In: CC. pp. 98–108. ACM (2017)
47. Ng, N., Yoshida, N.: Pabble: parameterised Scribble. Service Oriented Computing and Applications 9(3-4), 269–284 (2015)
48. Redis Labs: Redis (n.d.), accessed 18 October 2019, https://redis.io
49. Redis Labs: Transactions – Redis (n.d.), accessed 18 October 2019, https://redis.io/topics/transactions
50. Scalas, A., Dardha, O., Hu, R., Yoshida, N.: A linear decomposition of multiparty
sessions for safe distributed programming. In: ECOOP. LIPIcs, vol. 74, pp. 24:1–
24:31. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2017)
51. Scalas, A., Yoshida, N.: Less is more: multiparty session types revisited. PACMPL
3(POPL), 30:1–30:29 (2019)
52. Scalas, A., Yoshida, N., Benussi, E.: Effpi: verified message-passing programs in Dotty. In: SCALA@ECOOP. pp. 27–31. ACM (2019)
53. Scalas, A., Yoshida, N., Benussi, E.: Verifying message-passing programs with dependent behavioural types. In: PLDI. pp. 502–516. ACM (2019)
Verifying Visibility-Based Weak Consistency

Siddharth Krishna¹, Michael Emmi², Constantin Enea³, and Dejan Jovanović²

¹ New York University, New York, NY, USA, [email protected]
² SRI International, New York, NY, USA, [email protected], [email protected]
³ Université de Paris, IRIF, CNRS, F-75013 Paris, France, [email protected]
Abstract. Multithreaded programs generally leverage efficient and thread-safe
concurrent objects like sets, key-value maps, and queues. While some concurrent-
object operations are designed to behave atomically, each witnessing the atomic
effects of predecessors in a linearization order, others forego such strong consis-
tency to avoid complex control and synchronization bottlenecks. For example,
contains (value) methods of key-value maps may iterate through key-value
entries without blocking concurrent updates, to avoid unwanted performance
bottlenecks, and consequently overlook the effects of some linearization-order
predecessors. While such weakly-consistent operations may not be atomic, they
still offer guarantees, e.g., only observing values that have been present.
In this work we develop a methodology for proving that concurrent object
implementations adhere to weak-consistency specifications. In particular, we
consider (forward) simulation-based proofs of implementations against relaxed-
visibility specifications, which allow designated operations to overlook some of
their linearization-order predecessors, i.e., behaving as if they never occurred. Be-
sides annotating implementation code to identify linearization points, i.e., points
at which operations’ logical effects occur, we also annotate code to identify visible
operations, i.e., operations whose effects are observed; in practice this annotation
can be done automatically by tracking the writers to each accessed memory
location. We formalize our methodology over a general notion of transition
systems, agnostic to any particular programming language or memory model,
and demonstrate its application, using automated theorem provers, by verifying
models of Java concurrent object implementations.

1 Introduction
Programming efficient multithreaded programs generally involves carefully organiz-
ing shared memory accesses to facilitate inter-thread communication while avoiding
synchronization bottlenecks. Modern software platforms like Java include reusable
abstractions which encapsulate low-level shared memory accesses and synchronization
into familiar high-level abstract data types (ADTs). These so-called concurrent objects
typically include mutual-exclusion primitives like locks, numeric data types like atomic
integers, as well as collections like sets, key-value maps, and queues; Java’s standard-
edition platform contains many implementations of each. Such objects typically provide
strong consistency guarantees like linearizability [18], ensuring that each operation
appears to happen atomically, witnessing the atomic effects of predecessors according
to some linearization order among concurrently-executing operations.
While such strong consistency guarantees are ideal for logical reasoning about
programs which use concurrent objects, these guarantees are too strong for many oper-
ations, since they preclude simple and/or efficient implementation — over half of Java’s
concurrent collection methods forego atomicity for weak-consistency [13]. On the one
hand, basic operations like the get and put methods of key-value maps typically admit
relatively-simple atomic implementations, since their behaviors essentially depend
upon individual memory cells, e.g., where the relevant key-value mapping is stored.
On the other hand, making aggregate operations like size and contains (value) atomic
would impose synchronization bottlenecks, or otherwise-complex control structures,
since their atomic behavior depends simultaneously upon the values stored across
many memory cells. Interestingly, such implementations are not linearizable even
when their underlying memory operations are sequentially consistent, e.g., as is the
case with Java 8’s concurrent collections, whose memory accesses are data-race free.⁴
For instance, the contains (value) method of Java’s concurrent hash map iterates
through key-value entries without blocking concurrent updates in order to avoid
unreasonable performance bottlenecks. Consequently, in a given execution, a contains-
value-v operation o1 will overlook operation o2 ’s concurrent insertion of k1 → v for a
key k1 it has already traversed. This oversight makes it possible for o1 to conclude that
value v is not present, and can only be explained by o1 being linearized before o2 . In the
case that operation o3 removes k2 → v concurrently before o1 reaches key k2 , but only
after o2 completes, then atomicity is violated since in every possible linearization, either
mapping k2 → v or k1 → v is always present. Nevertheless, such weakly-consistent
operations still offer guarantees, e.g., that values never present are never observed, and
initially-present values not removed are observed.
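To make the violation concrete, the following sketch (our own illustration, in Java; the operation names o1, o2, o3 and the two-key state model are assumptions for exposition) enumerates the linearizations that respect o2 preceding o3 and replays each against an atomic map: every one observes value v, so a false return by o1 cannot be explained atomically.

// Illustrative sketch: o1 = contains-value(v), o2 = put(k1, v), o3 = remove(k2),
// with k2 -> v present initially and o2 linearized before o3.
import java.util.List;

public class NoAtomicExplanation {
  public static void main(String[] args) {
    List<List<String>> lins = List.of(          // the hb-respecting linearizations
        List.of("o1", "o2", "o3"),
        List.of("o2", "o1", "o3"),
        List.of("o2", "o3", "o1"));
    for (List<String> lin : lins) {
      boolean k1HasV = false, k2HasV = true;    // initially only k2 -> v
      boolean o1Observes = false;
      for (String op : lin) {
        switch (op) {
          case "o2" -> k1HasV = true;                 // put(k1, v)
          case "o3" -> k2HasV = false;                // remove(k2)
          case "o1" -> o1Observes = k1HasV || k2HasV; // atomic contains-value(v)
        }
      }
      System.out.println(lin + " -> contains-value(v) = " + o1Observes);
    }
    // all three linearizations print true: the false result observed in the
    // weakly-consistent execution has no atomic explanation
  }
}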
In this work we develop a methodology for proving that concurrent-object imple-
mentations adhere to the guarantees prescribed by their weak-consistency specifica-
tions. The key salient aspects of our approach are the lifting of existing sequential ADT
specifications via visibility relaxation [13], and the harnessing of simple and mechaniz-
able reasoning based on forward simulation [25] by relaxed-visibility ADTs. Effectively,
our methodology extends the predominant forward-simulation based linearizability-
proof methodology to concurrent objects with weakly-consistent operations, and
enables automation for proving weak-consistency guarantees.
To enable the harnessing of existing sequential ADT specifications, we adopt the
recent methodology of visibility relaxation [13]. As in linearizability [18], the return
value of each operation is dictated by the atomic effects of its predecessors in some
(i.e., existentially quantified) linearization order. To allow consistency weakening,
operations are allowed, to a certain extent, to overlook some of their linearization-order
predecessors, behaving as if they had not occurred. Intuitively, this (also existentially
quantified) visibility captures the inability or unwillingness to atomically observe
the values stored across many memory cells. To provide guarantees, the extent of
⁴ Java 8 implementations guarantee data-race freedom by accessing individual shared-memory
cells with atomic operations via volatile variables and compare-and-swap instructions. Starting
with Java 9, the implementations of the concurrent collections use the VarHandle mechanism
to specify shared variable access modes. Java’s official language and API specifications do not
clarify whether these relaxations introduce data races.
visibility relaxation is bounded to varying degrees. Notably, the visibility of an absolute
operation must include all of its linearization-order predecessors, while the visibility
of a monotonic operation must include all happens-before predecessors, along with
all operations visible to them. The majority of Java’s concurrent collection methods
are absolute or monotonic [13]. For instance, in the contains-value example described
above, by considering that operation o2 is not visible to o1 , the conclusion that v is not
present can be justified by the linearization o2 ; o3 ; o1 , in which o1 sees o3 ’s removal
of k2 → v yet not o2 ’s insertion of k1 → v. Ascribing the monotonic visibility to
the contains-value method amounts to a guarantee that initially-present values are
observed unless removed (i.e., concurrently).
While relaxed-visibility specifications provide a means to describing the guar-
antees provided by weakly-consistent concurrent-object operations, systematically
establishing implementations’ adherence requires a strategy for demonstrating simula-
tion [25], i.e., that each step of the implementation is simulated by some step of (an
operational representation of) the specification. The crux of our contribution is thus
threefold: first, to identify the relevant specification-level actions with which to relate
implementation-level transitions; second, to identify implementation-level annotations
relating transitions to specification-level actions; and third, to develop strategies for
devising such annotations systematically. For instance, the existing methodology based
on linearization points [18] essentially amounts to annotating implementation-level
transitions with the points at which its specification-level action, i.e., its atomic effect,
occurs. Relaxed-visibility specifications require not only a witness for the existentially-
quantified linearization order, but also an existentially-quantified visibility relation,
and thus requires a second kind of annotation to resolve operations’ visibilities. We
propose a notion of visibility actions which enable operations to declare their visibility
of others, e.g., specifying the writers of memory cells it has read.
The remainder of our approach amounts to devising a systematic means for con-
structing simulation proofs to enable automated verification. Essentially, we identify a
strategy for systematically annotating implementations with visibility actions, given
linearization-point annotations and visibility bounds (i.e., absolute or monotonic), and
then encode the corresponding simulation check using an off-the-shelf verification
tool. For the latter, we leverage civl [16], a language and verifier for Owicki-Gries style
modular proofs of concurrent programs with arbitrarily-many threads. In principle,
since our approach reduces simulation to safety verification, any safety verifier could
be used, though civl facilitates reasoning for multithreaded programs by capturing
interference at arbitrary program points. Using civl, we have verified monotonicity of
the contains-value and size methods of Java’s concurrent hash-map and concurrent
linked-queue, respectively — and absolute consistency of add and remove operations.
Although our models are written in civl and assume sequentially-consistent memory
accesses, they capture the difficult aspects of weak-consistency in Java, including heap-
based memory access; furthermore, our models are also sound with respect to Java 8’s
memory model, since their Java 8 implementations guarantee data-race freedom.
In summary, we present the first methodology for verifying weakly-consistent op-
erations using sequential specifications and forward simulation. Contributions include:
– the formalization of our methodology over a general notion of transition systems,
agnostic to any particular programming language or memory model (§3);
– the application of our methodology to verifying a weakly-consistent contains-value
method of a key-value map (§4); and
– a mechanization of our methodology used for verifying models of weakly-consistent
Java methods using automated theorem provers (§5).
Aside from the outline above, this article summarizes an existing weak-consistency
specification methodology via visibility relaxation (§2), summarizes related work (§6),
and concludes (§7). Proofs of all theorems and lemmas are listed in Appendix A.

2 Weak Consistency
Our methodology for verifying weakly-consistent concurrent objects relies both on the
precise characterization of weak consistency specifications, as well as a proof technique
for establishing adherence to specifications. In this section we recall and outline a
characterization called visibility relaxation [13], an extension of sequential abstract
data type (ADT) specifications in which the return values of some operations may not
reflect the effects of previously-effectuated operations.
Notationally, in the remainder of this article, ε denotes the empty sequence, ∅ denotes the empty set, _ denotes an unused binding, and ⊤ and ⊥ denote the Boolean values true and false, respectively. We write R(x) to denote the inclusion x ∈ R of a tuple x in the relation R; and R[x → y] to denote the extension R ∪ {⟨x, y⟩} of R to include ⟨x, y⟩; and R | X to denote the projection R ∩ X* of R to set X; and R̄ to denote the complement {x : x ∉ R} of R; and R(x) to denote the image {y : ⟨x, y⟩ ∈ R} of R on x; and R⁻¹(y) to denote the pre-image {x : ⟨x, y⟩ ∈ R} of R on y; whether R(x) refers to inclusion or an image will be clear from its context. Finally, we write xᵢ to refer to the ith element of tuple x = x₀x₁. . ..

2.1 Weak-Visibility Specifications

For a general notion of ADT specifications, we consider fixed sets M and X of method names and argument or return values, respectively. An operation label λ = ⟨m, x, y⟩ is a method name m ∈ M along with argument and return values x, y ∈ X. A read-only predicate is a unary relation R(λ) on operation labels, an operation sequence s = λ₀λ₁. . . is a sequence of operation labels, and a sequential specification S = {s₀, s₁, . . .} is a set of operation sequences. We say that R is compatible with S when S is closed under deletion of read-only operations, i.e., λ₀ . . . λⱼ₋₁λⱼ₊₁ . . . λᵢ ∈ S when λ₀ . . . λᵢ ∈ S and R(λⱼ).

Example 1. The key-value map ADT sequential specification Sm is the prefix-closed set containing all sequences λ₀ . . . λᵢ such that λᵢ is either:

– ⟨put, kv, b⟩, and b = ⊤ iff some ⟨rem, k, _⟩ follows any prior ⟨put, kv′, _⟩;
– ⟨rem, k, b⟩, and b = ⊤ iff no other ⟨rem, k, _⟩ follows some prior ⟨put, kv, _⟩;
– ⟨get, k, v⟩, and no ⟨put, kv′, _⟩ nor ⟨rem, k, _⟩ follows some prior ⟨put, kv, _⟩, and v = ⊥ if no such ⟨put, kv, _⟩ exists; or
– ⟨has, v, b⟩, and b = ⊤ iff no ⟨put, kv′, _⟩ nor ⟨rem, k, _⟩ follows some prior ⟨put, kv, _⟩.

The read-only predicate Rm holds for the following cases:

Rm(⟨put, _, b⟩) if ¬b    Rm(⟨rem, _, b⟩) if ¬b    Rm(⟨get, _, _⟩)    Rm(⟨has, _, _⟩).

This is a simplification of Java’s Map ADT, i.e., with fewer methods.⁵

⁵ For brevity, we abbreviate Java’s remove and contains-value methods by rem and has.
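To make this specification concrete, here is a minimal membership checker (our own sketch, not part of the formal development) that replays a complete label sequence against a reference map; it adopts one reading of the clauses above, namely that put overwrites and returns ⊤ exactly when its key was unmapped.

// Sketch: deciding membership of a label sequence in Sm by replay.
// The Label encoding and the null-for-⊥ convention are our own assumptions.
import java.util.*;

public class SmChecker {
  public record Label(String m, Integer k, Integer v, Object ret) {}

  public static boolean inSm(List<Label> seq) {
    Map<Integer, Integer> state = new HashMap<>();
    for (Label l : seq) {
      switch (l.m()) {
        case "put" -> {                       // b = ⊤ iff key k was unmapped
          boolean b = !state.containsKey(l.k());
          state.put(l.k(), l.v());
          if (!Objects.equals(l.ret(), b)) return false;
        }
        case "rem" -> {                       // b = ⊤ iff key k was mapped
          boolean b = state.remove(l.k()) != null;
          if (!Objects.equals(l.ret(), b)) return false;
        }
        case "get" -> {                       // current value of k, null if absent
          if (!Objects.equals(l.ret(), state.get(l.k()))) return false;
        }
        case "has" -> {                       // ⊤ iff some key currently maps to v
          if (!Objects.equals(l.ret(), state.containsValue(l.v()))) return false;
        }
        default -> { return false; }
      }
    }
    return true;
  }
}

For instance, under this reading the sequence put(1,1) put(1,0) has(1) with return values ⊤, ⊥, ⊥ is accepted, matching the replay of the has visibility in Example 3 below.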

To derive weak specifications from sequential ones, we consider a set V of exactly two visibility labels from prior work [13]: absolute and monotonic.⁶ A visibility annotation V : M → V maps each method m ∈ M to a visibility V(m) ∈ V.

⁶ Previous work refers to absolute visibility as complete, and includes additional visibility labels.
Intuitively, absolute visibility requires operations to observe the effects of all of their
linearization-order predecessors. The weaker monotonic visibility requires operations
to observe the effects of all their happens-before (i.e., program- and synchronization-
order) predecessors, along with the effects already observed by those predecessors,
i.e., so that sets of visible effects are monotonically increasing over happens-before
chains of operations; conversely, operations may ignore effects which have been ignored
by their happens-before predecessors, so long as those effects are not transitively related
by program and synchronization order.

Definition 1. A weak-visibility specification W = ⟨S, R, V⟩ is a sequential specification S with a compatible read-only predicate R and a visibility annotation V.

Example 2. The weakly-consistent contains-value map Wm = ⟨Sm, Rm, Vm⟩ annotates the key-value map ADT methods of Sm from Example 1 with:

Vm(put) = Vm(rem) = Vm(get) = absolute,    Vm(has) = monotonic.

Java’s concurrent hash map appears to be consistent with this specification [13].
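As a trivial rendering (our own encoding), the annotation is just a finite map from method names to visibility labels:

// Sketch: the visibility annotation Vm of Example 2 as a plain map.
import java.util.Map;

class VisibilityAnnotation {
  static final Map<String, String> Vm = Map.of(
      "put", "absolute",
      "rem", "absolute",
      "get", "absolute",
      "has", "monotonic");
}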

We ascribe semantics to specifications by characterizing the values returned by concurrent method invocations, given constraints on invocation order. In practice, the happens-before order among invocations is determined by a program order, i.e., among invocations of the same thread, and a synchronization order, i.e., among invocations of distinct threads accessing the same atomic objects, e.g., locks. A history h = ⟨O, inv, ret, hb⟩ is a set O ⊆ ℕ of numeric operation identifiers, along with an invocation function inv : O → M × X mapping operation identifiers to method names and argument values, a partial return function ret : O ⇀ X mapping operation identifiers to return values, and a (strict) partial happens-before relation hb ⊆ O × O; the empty history h∅ has O = inv = ret = hb = ∅. An operation o ∈ O is complete when ret(o) is defined, and is otherwise incomplete; then h is complete when each operation is. The label of a complete operation o with inv(o) = ⟨m, x⟩ and ret(o) = y is ⟨m, x, y⟩.
To relate operations’ return values in a given history back to sequential specifications, we consider certain sequencings of those operations. A linearization of a history h = ⟨O, _, _, hb⟩ is a total order lin over O which includes hb, and a visibility
projection vis of lin maps each operation o ∈ O to a subset vis(o) ⊆ lin⁻¹(o) of the operations preceding o in lin; note that ⟨o₁, o₂⟩ ∈ vis means o₁ observes o₂. For a given read-only predicate R, we say o’s visibility is monotonic when it includes every happens-before predecessor, and every operation visible to a happens-before predecessor, which is not read-only,⁷ i.e., vis(o) ⊇ (hb⁻¹(o) ∪ vis(hb⁻¹(o))) | R̄. We say o’s visibility is absolute when vis(o) = lin⁻¹(o), and vis is itself absolute when each vis(o) is. An abstract execution e = ⟨h, lin, vis⟩ is a history h along with a linearization of h, and a visibility projection vis of lin. An abstract execution is sequential when hb is total, complete when h is, and absolute when vis is.
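The monotonicity condition is directly checkable; the following sketch (our own encoding of hb as integer pairs and vis as a map to sets) tests it for a single operation o.

// Sketch: check vis(o) ⊇ (hb⁻¹(o) ∪ vis(hb⁻¹(o))) | R̄ for one operation o,
// i.e., o sees every non-read-only hb-predecessor and everything they saw.
import java.util.*;

class MonotonicCheck {
  static boolean monotonicAt(int o,
                             Set<int[]> hb,                   // pairs {p, q}: p happens-before q
                             Map<Integer, Set<Integer>> vis,  // visibility projection
                             Set<Integer> readOnly) {         // operations satisfying R
    Set<Integer> required = new HashSet<>();
    for (int[] edge : hb) {
      if (edge[1] == o) {
        required.add(edge[0]);                                // the predecessor itself
        required.addAll(vis.getOrDefault(edge[0], Set.of())); // and what it saw
      }
    }
    required.removeAll(readOnly);                             // project away read-only ops
    return vis.getOrDefault(o, Set.of()).containsAll(required);
  }
}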

Example 3. An abstract execution can be defined using the linearization⁸

⟨put, ⟨1, 1⟩, ⊤⟩ ⟨get, 1, 1⟩ ⟨put, ⟨0, 1⟩, ⊤⟩ ⟨put, ⟨1, 0⟩, ⊥⟩ ⟨has, 1, ⊥⟩

along with a happens-before order that, compared to the linearization order, keeps ⟨has, 1, ⊥⟩ unordered w.r.t. ⟨put, ⟨0, 1⟩, ⊤⟩ and ⟨put, ⟨1, 0⟩, ⊥⟩, and a visibility projection where the visibility of every put and get includes all the linearization predecessors and the visibility of ⟨has, 1, ⊥⟩ consists of ⟨put, ⟨1, 1⟩, ⊤⟩ and ⟨put, ⟨1, 0⟩, ⊥⟩. Recall that in the argument ⟨k, v⟩ to put operations, the key k precedes value v.

To determine the consistency of individual histories against weak-visibility specifications, we consider adherence of their corresponding abstract executions. Let h = ⟨O, inv, ret, hb⟩ be a history and e = ⟨h, lin, vis⟩ a complete abstract execution. Then e is consistent with a visibility annotation V and read-only predicate R if for each operation o ∈ dom(lin) with inv(o) = ⟨m, _⟩, vis(o) is absolute or monotonic, respectively, according to V(m) and R. The labeling λ₀λ₁. . . of a total order o₀ ≺ o₁ ≺ . . . of complete operations is the sequence of operation labels, i.e., λᵢ is the label of oᵢ. Then e is consistent with a sequential specification S when the labeling⁹ of lin | (vis(o) ∪ {o}) is included in S, for each operation o ∈ dom(lin).¹⁰ Finally, we say e is consistent with a weak-visibility specification ⟨S, R, V⟩ when it is consistent with S, R, and V.
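Operationally, consistency with Sm can be checked by projecting the linearization onto vis(o) ∪ {o} for each operation and replaying it; the sketch below (our own, and it presupposes the SmChecker sketch from Example 1) does exactly that.

// Sketch: e is consistent with Sm iff, for every o, the labeling of
// lin | (vis(o) ∪ {o}) replays successfully through SmChecker.inSm.
import java.util.*;

class SpecConsistency {
  static boolean consistentWithSm(List<Integer> lin,              // linearization order
                                  Map<Integer, Set<Integer>> vis, // visibility projection
                                  Map<Integer, SmChecker.Label> label) {
    for (int o : lin) {
      List<SmChecker.Label> projected = new ArrayList<>();
      for (int p : lin) {
        if (vis.getOrDefault(o, Set.of()).contains(p)) projected.add(label.get(p));
        if (p == o) { projected.add(label.get(o)); break; }     // stop at o itself
      }
      if (!SmChecker.inSm(projected)) return false;
    }
    return true;
  }
}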

Example 4. The execution in Example 3 is consistent with the weakly-consistent contains-value map Wm defined in Example 2.

Remark 1. Consistency models suited for modern software platforms like Java are based on happens-before relations which abstract away from real-time execution order. Since happens-before, unlike real-time, is not necessarily an interval order, the composition of linearizations of two distinct objects in the same execution may be cyclic, i.e., not linearizable. Recovering compositionality in this setting is orthogonal to our work of proving consistency against a given model, and is explored elsewhere [11].

⁷ For convenience we rephrase Emmi and Enea [13]’s notion to ignore read-only predecessors.
⁸ For readability, we list linearization sequences with operation labels in place of identifiers.
⁹ As is standard, adequate labelings of incomplete executions are obtained by completing each linearized yet pending operation with some arbitrarily-chosen return value [18]. It is sufficient that one of these completions be included in the sequential specification.
¹⁰ We consider a simplification from prior work [13]: rather than allowing the observers of a given operation to pretend they see distinct return values, we suppose that all observers agree on return values. While this is more restrictive in principle, it is equivalent for the simple specifications studied in this article.

The abstract executions E(W) of a weak-visibility specification W = ⟨S, R, V⟩ include those complete, sequential, and absolute abstract executions derived from sequences of S, i.e., when s = λ₀ . . . λₙ ∈ S then each eₛ labels each oᵢ by λᵢ, and orders hb(oᵢ, oⱼ) iff i < j. In addition, when E(W) includes an abstract execution ⟨h, lin, vis⟩ with h = ⟨O, inv, ret, hb⟩, then E(W) also includes any:

– execution ⟨h′, lin, vis⟩ such that h′ = ⟨O, inv, ret, hb′⟩ and hb′ ⊆ hb; and
– W-consistent execution ⟨h′, lin, vis′⟩ with h′ = ⟨O, inv, ret′, hb⟩ and vis′ ⊆ vis.

Note that while happens-before weakening hb′ ⊆ hb always yields consistent executions, unguarded visibility weakening vis′ ⊆ vis generally breaks consistency with visibility annotations and sequential specifications: visibilities can become non-monotonic, and return values can change when operations observe fewer operations’ effects.

Lemma 1. The abstract executions E(W ) of a specification W are consistent with W .

Example 5. The abstract executions of Wm include the complete, sequential, and absolute abstract execution defined by the following happens-before order

⟨put, ⟨1, 1⟩, ⊤⟩ ⟨get, 1, 1⟩ ⟨put, ⟨0, 1⟩, ⊤⟩ ⟨put, ⟨1, 0⟩, ⊥⟩ ⟨has, 1, ⊤⟩

which implies that it also includes one in which just the happens-before order is modified such that ⟨has, 1, ⊤⟩ becomes unordered w.r.t. ⟨put, ⟨0, 1⟩, ⊤⟩ and ⟨put, ⟨1, 0⟩, ⊥⟩. Since it includes the latter, it also includes the execution in Example 3 where the visibility of has is weakened, which also modifies its return value from ⊤ to ⊥.

Definition 2. The histories of a weak-visibility specification W are the projections H(W) = {h : ⟨h, _, _⟩ ∈ E(W)} of its abstract executions.

2.2 Consistency against Weak-Visibility Specifications

To define the consistency of implementations against specifications, we leverage a general model of computation to capture the behavior of typical concurrent systems, e.g., including multiprocess and multithreaded systems. A sequence-labeled transition system ⟨Q, A, q̄, →⟩ is a set Q of states, along with a set A of actions, initial state q̄ ∈ Q, and transition relation → ⊆ Q × A* × Q. An execution is an alternating sequence η = q₀a₀q₁a₁ . . . qₙ of states and action sequences starting with q₀ = q̄ such that qᵢ —aᵢ→ qᵢ₊₁ for each 0 ≤ i < n. The trace τ ∈ A* of the execution η is its projection a₀a₁ . . . to individual actions.
To capture the histories admitted by a given implementation, we consider sequence-labeled transition systems (SLTSs) which expose actions corresponding to method call, return, and happens-before constraints. We refer to the actions call(o, m, x), ret(o, y), and hb(o, o′), for o, o′ ∈ ℕ, m ∈ M, and x, y ∈ X, as the history actions, and a history transition system is an SLTS whose actions include the history actions. We say that an
action over operation identifier o is an o-action, and assume that executions are well
formed in the sense that for a given operation identifier o: at most one call o-action
occurs, at most one ret o-action occurs, and no ret nor hb o-actions occur prior to a
call o-action. Furthermore, we assume call o-actions are enabled, so long as no prior
call o-action has occurred. The history of a trace τ is defined inductively by fh(h∅, τ), where h∅ is the empty history, and,

fh(h, ε) = h
fh(h, aτ) = fh(gh(h, a), τ)
fh(h, ãτ) = fh(h, τ)
gh(h, call(o, m, x)) = ⟨O ∪ {o}, inv[o → ⟨m, x⟩], ret, hb⟩
gh(h, ret(o, y)) = ⟨O, inv, ret[o → y], hb⟩
gh(h, hb(o, o′)) = ⟨O, inv, ret, hb ∪ {⟨o, o′⟩}⟩

where h = ⟨O, inv, ret, hb⟩, and a is a call, ret, or hb action, and ã is not. An implementation I is a history transition system, and the histories H(I) of I are those of its traces. Finally, we define consistency against specifications via history containment.
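The fold fh is directly executable; the following sketch uses our own encoding of actions as Object arrays and ignores everything that is not a history action.

// Sketch: folding a trace into a history, mirroring fh/gh. Actions are
// encoded as {"call", o, m, x}, {"ret", o, y}, or {"hb", o1, o2}.
import java.util.*;

class HistoryFold {
  record History(Set<Integer> ops, Map<Integer, String> inv,
                 Map<Integer, Object> ret, Set<List<Integer>> hb) {}

  static History fold(List<Object[]> trace) {
    History h = new History(new HashSet<>(), new HashMap<>(),
                            new HashMap<>(), new HashSet<>());
    for (Object[] a : trace) {
      switch ((String) a[0]) {
        case "call" -> {                                       // gh(h, call(o, m, x))
          h.ops().add((Integer) a[1]);
          h.inv().put((Integer) a[1], a[2] + "(" + a[3] + ")");
        }
        case "ret" -> h.ret().put((Integer) a[1], a[2]);       // gh(h, ret(o, y))
        case "hb" -> h.hb().add(List.of((Integer) a[1], (Integer) a[2]));
        default -> { }                                         // non-history actions: skipped
      }
    }
    return h;
  }
}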
Definition 3. Implementation I is consistent with specification W iff H(I) ⊆ H(W ).

3 Establishing Consistency with Forward Simulation

To obtain a consistency proof strategy, we more closely relate implementations to specifications via their admitted abstract executions. To capture the abstract executions admitted by a given implementation, we consider SLTSs which expose not only history-related actions, but also actions witnessing linearization and visibility. We refer to the actions lin(o) and vis(o, o′) for o, o′ ∈ ℕ, along with the history actions, as the abstract-execution actions, and an abstract-execution transition system (AETS) is an SLTS whose actions include the abstract-execution actions. Extending the corresponding notion from history transition systems, we assume that executions are well formed in the sense that for a given operation identifier o: at most one lin o-action occurs, and no lin or vis o-actions occur prior to a call o-action. The abstract execution of a trace τ is defined inductively by fe(e∅, τ), where e∅ = ⟨h∅, ∅, ∅⟩ is the empty execution, and,

fe(e, ε) = e
fe(e, aτ) = fe(ge(e, a), τ)
fe(e, ãτ) = fe(e, τ)
ge(e, â) = ⟨gh(h, â), lin, vis⟩
ge(e, lin(o)) = ⟨h, lin ∪ {⟨o′, o⟩ : o′ ∈ lin}, vis⟩
ge(e, vis(o, o′)) = ⟨h, lin, vis ∪ {⟨o, o′⟩}⟩

where e = ⟨h, lin, vis⟩, and a is a call, ret, hb, lin, or vis action, ã is not, and â is a call, ret, or hb action. A witnessing implementation I is an abstract-execution transition system, and the abstract executions E(I) of I are those of its traces.
We adopt forward simulation [25] for proving consistency against weak-visibility specifications. Formally, a simulation relation from one system Σ₁ = ⟨Q₁, A₁, χ₁, →₁⟩ to another Σ₂ = ⟨Q₂, A₂, χ₂, →₂⟩ is a binary relation R ⊆ Q₁ × Q₂ such that initial states are related, R(χ₁, χ₂), and: for any pair of related states R(q₁, q₂) and source-system transition q₁ —a₁→₁ q₁′, there exists a target-system transition q₂ —a₂→₂ q₂′ to related states, i.e., R(q₁′, q₂′), over common actions, i.e., (a₁ | A₂) = (a₂ | A₁). We say Σ₂ simulates Σ₁ and write Σ₁ ⪯ Σ₂ when a simulation relation from Σ₁ to Σ₂ exists.

We derive transition systems to model consistency specifications in simulation. The following lemma establishes the soundness and completeness of this substitution, and the subsequent theorem asserts the soundness of the simulation-based proof strategy.
Definition 4. The transition system ⟦W⟧s of a weak-visibility specification W is the AETS whose actions are the abstract-execution actions, whose states are abstract executions, whose initial state is the empty execution, and whose transitions include e₁ —a→ e₂ iff fe(e₁, a) = e₂ and e₂ is consistent with W.
Lemma 2. A weak-visibility spec. and its transition system have identical histories.

Theorem 1. A witnessing implementation I is consistent with a weak-visibility specification W if the transition system ⟦W⟧s of W simulates I.

Our notion of simulation is in some sense complete when the sequential specification S of a weak-consistency specification W = ⟨S, R, V⟩ is return-value deterministic, i.e., there is a single label ⟨m, x, y⟩ such that λ · ⟨m, x, y⟩ ∈ S for any method m, argument-value x, and admitted sequence λ ∈ S. In particular, ⟦W⟧s simulates any witnessing implementation I whose abstract executions E(I) are included in E(⟦W⟧s).¹¹ This completeness, however, extends only to inclusion of abstract executions, and not all the way to consistency, since consistency is defined on histories, and any given operation’s return value is not completely determined by the other operation labels and happens-before relation of a given history: return values generally depend on linearization order and visibility as well. Nevertheless, sequential specifications typically are return-value deterministic, and we have used simulation to prove consistency of Java-inspired weakly-consistent objects.

¹¹ This is a consequence of a generic result stating that the set of traces of an LTS A₁ is included in the set of traces of an LTS A₂ iff A₂ simulates A₁, provided that A₂ is deterministic [25].
Establishing simulation for an implementation is also helpful when reasoning
about clients of a concurrent object. One can use the specification in place of the
implementation and encode the client invariants using the abstract execution of the
specification in order to prove client properties, following Sergey et al.’s approach [35].

3.1 Reducing Consistency to Safety Verification
Proving simulation between an implementation and its specification can generally be
achieved via product construction: complete the transition system of the specification,
replacing non-enabled transitions with error-state transitions; then ensure the synchro-
nized product of implementation and completed-specification transition systems is safe,
i.e., no error state is reachable. Assuming that the individual transition systems are
safe, then the product system is safe iff the specification simulates the implementation.
This reduction to safety verification is also generally applicable to implementation
and specification programs, though we limit our formalization to their underlying
transition systems for simplicity. By the upcoming Corollary 1, such reductions enable
consistency verification with existing safety verification tools.
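The reduction can be sketched over an explicit-state view of transition systems (our own minimal interface; the reduction in the text operates on programs, and this game-style exploration is exact when the specification is deterministic per action, as noted above for return-value deterministic specifications):

// Sketch: explore the synchronized product; the completed specification
// would enter its error state exactly when an implementation action has
// no matching specification transition, so safety coincides with simulation.
import java.util.*;

interface Lts {
  Object init();
  List<Map.Entry<String, Object>> steps(Object state);  // (action, successor)
}

class ProductSafety {
  static boolean safe(Lts impl, Lts spec) {
    Deque<List<Object>> worklist = new ArrayDeque<>();
    Set<List<Object>> visited = new HashSet<>();
    worklist.push(List.of(impl.init(), spec.init()));
    while (!worklist.isEmpty()) {
      List<Object> pair = worklist.pop();
      if (!visited.add(pair)) continue;
      for (Map.Entry<String, Object> t : impl.steps(pair.get(0))) {
        boolean matched = false;
        for (Map.Entry<String, Object> u : spec.steps(pair.get(1))) {
          if (t.getKey().equals(u.getKey())) {
            worklist.push(List.of(t.getValue(), u.getValue()));
            matched = true;
          }
        }
        if (!matched) return false;                      // error state reachable
      }
    }
    return true;
  }
}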

3.2 Verifying Implementations

While Theorem 1 establishes forward simulation as a strategy for proving the consistency of implementations against weak-visibility specifications, its application to
real-world implementations requires program-level mechanisms to signal the underlying AETS lin and vis actions. To apply forward simulation, we thus develop a notion of programs whose commands include such mechanisms.
programs whose commands include such mechanisms.
This section illustrates a toy programming language with AETS semantics which
provides these mechanisms. The key features are the lin and vis program commands,
which emit linearization and visibility actions for the currently-executing operation,
along with load, store, and cas (compare-and-swap) commands, which record and return
the set of operation identifiers having written to each memory cell. Such augmented
memory commands allow programs to obtain handles to the operations whose effects they have observed, in order to signal the corresponding vis actions.
While one can develop similar mechanisms for languages with any underlying
memory model, the toy language presented here assumes a sequentially-consistent
memory. Note that the assumption of sequentially-consistent memory operations is
practically without loss of generality for Java 8’s concurrent collections since they are
designed to be data-race free — their anomalies arise not from weak-memory semantics,
but from non-atomic operations spanning several memory cells.
For generality, we assume abstract notions of commands and memory, using κ, μ, ℓ, and M respectively to denote a program command, memory command, local state, and global memory. So that operations can assert their visibilities, we consider memory which stores, and returns upon access, the identifier(s) of operations which previously accessed a given cell. A program P = ⟨init, cmd, idle, done⟩ consists of an init(m, x) = ℓ function mapping method name m and argument values x to local state ℓ, along with a cmd(ℓ) = κ function mapping local state ℓ to program command κ, and idle(ℓ) and done(ℓ) predicates on local states ℓ. Intuitively, identifying local states with threads, the idle predicate indicates whether a thread is outside of atomic sections, and subject to interference from other threads; meanwhile the done predicate indicates whether a thread has terminated.
The denotation of a memory command μ is a function ⟦μ⟧m from global memory M₁, argument value x, and operation o to a tuple ⟦μ⟧m(M₁, x, o) = ⟨M₂, y⟩ consisting of a global memory M₂, along with a return value y.

Example 6. A sequentially-consistent memory system which records the set of operations to access each location can be captured by mapping addresses x to value and operation-set pairs M(x) = ⟨y, O⟩, along with three memory commands:

⟦load⟧m(M, x, _) = ⟨M, M(x)⟩
⟦store⟧m(M, xy, o) = ⟨M[x → ⟨y, M(x)₁ ∪ {o}⟩], ε⟩
⟦cas⟧m(M, xyz, o) = ⟨M[x → ⟨z, M(x)₁ ∪ {o}⟩], ⟨true, M(x)₁⟩⟩ if M(x)₀ = y
⟦cas⟧m(M, xyz, o) = ⟨M, ⟨false, M(x)₁⟩⟩ if M(x)₀ ≠ y

where the compare-and-swap (CAS) operation stores value z at address x and returns true when y was previously stored, and otherwise returns false.
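A direct rendering of this memory system (our own Java encoding: int addresses and values, with absent cells reading as ⟨0, ∅⟩) looks as follows.

// Sketch: sequentially-consistent memory recording, per cell, the set of
// operation identifiers that wrote it, following Example 6.
import java.util.*;

class RecordingMemory {
  record Cell(int value, Set<Integer> writers) {}
  private final Map<Integer, Cell> cells = new HashMap<>();

  Cell load(int addr) {                        // returns value and writer set
    return cells.getOrDefault(addr, new Cell(0, Set.of()));
  }

  void store(int addr, int value, int op) {    // extends the writer set by op
    Set<Integer> writers = new HashSet<>(load(addr).writers());
    writers.add(op);
    cells.put(addr, new Cell(value, writers));
  }

  // On success, writes update and records op; either way the pre-state
  // writer set is returned alongside the success flag, as in ⟦cas⟧m.
  Map.Entry<Boolean, Set<Integer>> cas(int addr, int expect, int update, int op) {
    Cell c = load(addr);
    if (c.value() != expect) return Map.entry(false, c.writers());
    store(addr, update, op);
    return Map.entry(true, c.writers());
  }
}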

The denotation of a program command κ is a function ⟦κ⟧c from local state ℓ₁ to a tuple ⟦κ⟧c(ℓ₁) = ⟨μ, x, f⟩ consisting of a memory command μ and argument value x, and an update continuation f mapping the memory command’s return value y to a pair f(y) = ⟨ℓ₂, α⟩, where ℓ₂ is an updated local state, and α maps an operation o to an LTS action α(o). We assume the denotation ⟦ret z⟧c(ℓ₁) = ⟨nop, ε, λy.⟨ℓ₂, λo.ret(z)⟩⟩ of the ret command yields a local state ℓ₂ with done(ℓ₂) without executing memory commands, and outputs a corresponding LTS ret action.

Example 7. A simple goto language over variables a, b, . . . for the memory system of Example 6 would include the following commands:

⟦goto a⟧c(ℓ) = ⟨nop, ε, λy.⟨jump(ℓ, ℓ(a)), λo.ε⟩⟩
⟦assume a⟧c(ℓ) = ⟨nop, ε, λy.⟨next(ℓ), λo.ε⟩⟩ if ℓ(a) ≠ 0
⟦b, c = load(a)⟧c(ℓ) = ⟨load, ℓ(a), λ⟨y₁, y₂⟩.⟨next(ℓ[b → y₁][c → y₂]), λo.ε⟩⟩
⟦store(a, b)⟧c(ℓ) = ⟨store, ℓ(a)ℓ(b), λy.⟨next(ℓ), λo.ε⟩⟩
⟦d, e = cas(a, b, c)⟧c(ℓ) = ⟨cas, ℓ(a)ℓ(b)ℓ(c), λ⟨y₁, y₂⟩.⟨next(ℓ[d → y₁][e → y₂]), λo.ε⟩⟩

where the jump and next functions update a program counter, and the load command stores the operation identifiers returned from the corresponding memory commands. Linearization and visibility actions are captured as program commands as follows:

⟦lin⟧c(ℓ) = ⟨nop, ε, λy.⟨next(ℓ), λo.lin(o)⟩⟩
⟦vis(a)⟧c(ℓ) = ⟨nop, ε, λy.⟨next(ℓ), λo.vis(o, ℓ(a))⟩⟩

Atomic sections can be captured with a lock variable and a pair of program commands,

⟦begin⟧c(ℓ) = ⟨nop, ε, λy.⟨next(ℓ[lock → true]), λo.ε⟩⟩
⟦end⟧c(ℓ) = ⟨nop, ε, λy.⟨next(ℓ[lock → false]), λo.ε⟩⟩

such that idle states are identified by not holding the lock, i.e., idle(ℓ) = ¬ℓ(lock), as in the initial state init(m, x)(lock) = false.

Figure 1 lists the semantics ⟦P⟧p of a program P as an abstract-execution transition system. The states ⟨M, L⟩ of ⟦P⟧p include a global memory M, along with a partial function L from operation identifiers o to local states L(o); the initial state is ⟨M∅, ∅⟩, where M∅ is an initial memory state. The transitions for call and hb actions are enabled independently of implementation state, since they are dictated by implementations’ environments. Although we do not explicitly model client programs and platforms here, in reality, client programs dictate call actions, and platforms, driven by client programs, dictate hb actions; for example, a client which acquires the lock released after operation o₁, before invoking operation o₂, is generally ensured by its platform that o₁ happens before o₂. The transitions for all other actions are dictated by implementation commands. While the ret, lin, and vis commands generate their corresponding LTS actions, all other commands generate ε transitions.

Each atomic —a→ step of the AETS underlying a given program is built from a sequence of ⇝ steps for the individual program commands in an atomic section. Individual program commands essentially execute one small ⇝ step from shared memory and local state ⟨M₁, ℓ₁⟩ to ⟨M₂, ℓ₂⟩, invoking memory command μ with
o ∉ dom(L)    ℓ = init(m, x)
⟨M, L⟩ —call(o, m, x)→ ⟨M, L[o → ℓ]⟩

done(L(o₁))    o₂ ∈ dom(L)
⟨M, L⟩ —hb(o₁, o₂)→ ⟨M, L⟩

⟨M₁, ℓ₁, o, ε⟩ ⇝* ⟨M₂, ℓ₂, o, a⟩    idle(ℓ₂)
⟨M₁, L[o → ℓ₁]⟩ —a→ ⟨M₂, L[o → ℓ₂]⟩

cmd(ℓ₁) = κ    ⟦κ⟧c(ℓ₁) = ⟨μ, x, f⟩    ⟦μ⟧m(M₁, x, o) = ⟨M₂, y⟩    f(y) = ⟨ℓ₂, α⟩
⟨M₁, ℓ₁, o, a⟩ ⇝ ⟨M₂, ℓ₂, o, a · α(o)⟩

Fig. 1. The semantics of program P = ⟨init, cmd, idle, done⟩ as an abstract-execution transition system, where each rule derives its conclusion (second line) from its premises (first line), and ⟦·⟧c and ⟦·⟧m are the denotations of program and memory commands, respectively.

argument x, and emitting action α(o). Besides its effect on shared memory, each step uses the result ⟨M₂, y⟩ of memory command μ to update local state and emit an action using the continuation f, i.e., f(y) = ⟨ℓ₂, α⟩. Commands which do not access memory are modeled by a no-op memory command. We define the consistency of programs by reduction to their transition systems.

Definition 5. A program P is consistent with a specification iff its semantics ⟦P⟧p is.

Thus the consistency of P with W amounts to the inclusion of ⟦P⟧p’s histories in W’s. The following corollary of Theorem 1 follows directly by Definition 5, and immediately yields a program verification strategy: validate a simulation relation from the states of ⟦P⟧p to the states of ⟦W⟧s such that each command of P is simulated by a step of ⟦W⟧s.

Corollary 1. A program P is consistent with specification W if ⟦W⟧s simulates ⟦P⟧p.

4 Proof Methodology

In this section we develop a systematic means of annotating concurrent objects for relaxed-visibility simulation proofs. Besides leveraging an auxiliary memory system which tags memory accesses with the operation identifiers which wrote read values (see §3.2), annotations signal linearization points with lin commands, and indicate visibility of other operations with vis commands. As in previous works [3, 37, 2, 18] we
assume linearization points are given, and focus on visibility-related annotations.
As we focus on data-race free implementations (e.g., Java 8’s concurrent collections)
for which sequential consistency is sound, it can be assumed without loss of generality
that the happens-before order is exactly the returns-before order between operations,
which orders two operations o1 and o2 iff the return action of o1 occurs in real-time
before the call action of o2 . This assumption allows us to guarantee that linearizations are
consistent with happens-before just by ensuring that the linearization point of each
operation occurs in between its call and return action (like in standard linearizability).
var table: array of T;

procedure absolute put(k: int, v: T) {
  atomic {
    store(table[k], v);
    vis(getLin());
    lin();
  }
}

procedure absolute get(k: int) {
  atomic {
    v, O = load(table[k]);
    vis(getLin());
    lin();
  }
  return v;
}

procedure monotonic has(v: T)
  vis(getModLin());
{
  store(k, 0);
  while (k < table.length) {
    atomic {
      tv, O = load(table[k]);
      vis(O ∩ getModLin());
    }
    if (tv = v) then {
      lin();
      return true;
    }
    inc(k);
  }
  lin();
  return false;
}

Fig. 2. An implementation Ichm modeling Java’s concurrent hash map. The command inc(k) increments counter k, and commands within atomic {. . .} are collectively atomic.

It is without loss of generality because the clients of such implementations can use
auxiliary variables to impose synchronization order constraints between every two
operations ordered by returns-before, e.g., writing a variable after each operation
returns which is read before each other operation is called (under sequential consistency,
every write happens-before every other read which reads the written value).
We illustrate our methodology with the key-value map implementation Ichm of
Figure 2, which models Java’s concurrent hash map. The lines marked in blue and
red represent linearization/visibility commands added by the instrumentation that
will be described below. Key-value pairs are stored in an array table indexed by keys.
The implementations of put and get are straightforward, while the implementation of has, which returns true iff the input value is associated to some key, consists of a while loop traversing the array and searching for the input value. To simplify the exposition, the
shared memory reads and writes are already adapted to the memory system described
in Section 3.2 (essentially, this consists in adding new variables storing the set of
operation identifiers returned by a shared memory read). While put and get are
obviously linearizable, has is weakly consistent, with monotonic visibility. For instance,
given the two thread program {get(1); has(1)} || {put(1, 1); put(0, 1); put(1, 0)} it
is possible that get(1) returns 1 while has(1) returns false. This is possible in an
interleaving where has reads table[0] before put(0,1) writes into it (observing the
initial value 0), and table[1] after put(1,0) writes into it (observing value 0 as well).
The only abstract execution consistent with the weakly-consistent contains-value map
Wm (Example 2) which justifies these return values is given in Example 3. We show
that this implementation is consistent with a simplification of the contains-value map
Wm , without remove key operations, and where put operations return no value.
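The offending interleaving can be replayed deterministically; in the sketch below (our own model: table as an int array where 0 encodes absence) has(1) reads table[0] too early and table[1] too late, returning false even though get(1) returned 1.

// Sketch: deterministic replay of the interleaving described above.
public class HasAnomalyReplay {
  public static void main(String[] args) {
    int[] table = new int[2];            // keys 0 and 1; value 0 encodes "absent"
    table[1] = 1;                        // put(1, 1)
    int got = table[1];                  // get(1) observes 1
    boolean has = (table[0] == 1);       // has(1) reads table[0] before...
    table[0] = 1;                        // ...put(0, 1) writes it
    table[1] = 0;                        // put(1, 0)
    has = has || (table[1] == 1);        // has(1) reads table[1] after put(1, 0)
    System.out.println("get(1) = " + got + ", has(1) = " + has);  // 1, false
  }
}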
Given an implementation I, let L(I) be an instrumentation of I with program
commands lin() emitting linearization actions. The execution of lin() in the context
of an operation with identifier o emits a linearization action lin(o). We assume that L(I)
leads to well-formed executions (e.g., at most one linearization action per operation).
Example 8. For the implementation in Figure 2, the linearization commands of put


and get are executed atomically with the store to table[k] in put and the load of
table[k] in get, respectively. The linearization command of has is executed at any
point after observing the input value v or after exiting the loop, but before the return.
The two choices correspond to different return values and only one of them will be
executed during an invocation.

Given an instrumentation L(I), a visibility annotation V for I’s methods, and a read-only predicate R, we define a witnessing implementation V(L(I)) according to
a generic heuristic that depends only on V and R. This definition uses a program
command getLin() which returns the set of operations in the current linearization
sequence.¹² The current linearization sequence is stored in a history variable which
is updated with every linearization action by appending the corresponding operation
identifier. For readability, we leave this history variable implicit and omit the corre-
sponding updates. As syntactic sugar, we use a command getModLin() which returns
the set of modifiers (non read-only operations) in the current linearization sequence.
To represent visibility actions, we use program commands vis(A) where A is a set of operation identifiers. The execution of vis(A) in the context of an operation with identifier o emits the set of visibility actions vis(o, o′) for every operation o′ ∈ A.
Therefore, V(L(I)) extends the instrumentation L(I) with commands generating
visibility actions as follows:
– for absolute methods, each linearization command is preceded by vis(getLin())
which ensures that the visibility of an invocation includes all the predecessors in
linearization order. This is executed atomically with lin().
– for monotonic methods, the call action is followed by vis(getModLin()) (and
executed atomically with this command) which ensures that the visibility of each
invocation is monotonic, and every read of a shared variable which has been written
by a set of operations O is preceded by vis(O ∩ getModLin()) (and executed
atomically with this command). The latter is needed so that the visibility of such
an invocation contains enough operations to explain its return value (the visibility
command attached to call actions is enough to ensure monotonic visibilities).
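The two patterns can be summarized in executable form (our own encoding; an actual instrumentation emits the vis actions atomically with the adjacent command, which this sequential sketch elides).

// Sketch: getLin/getModLin over the linearization history variable, plus
// the absolute and monotonic annotation patterns described above.
import java.util.*;

class Instrumentation {
  static final List<Integer> lin = new ArrayList<>();    // linearized ops, in order
  static final Set<Integer> readOnly = new HashSet<>();  // operations satisfying R

  static Set<Integer> getLin() { return new HashSet<>(lin); }

  static Set<Integer> getModLin() {                      // linearized modifiers only
    Set<Integer> mods = getLin();
    mods.removeAll(readOnly);
    return mods;
  }

  // absolute method, at its linearization point: observe all lin-predecessors
  static void absoluteLinPoint(int o, Map<Integer, Set<Integer>> vis) {
    vis.computeIfAbsent(o, x -> new HashSet<>()).addAll(getLin());  // vis(getLin())
    lin.add(o);                                                     // lin()
  }

  // monotonic method, at its call: observe all linearized modifiers
  static void monotonicCall(int o, Map<Integer, Set<Integer>> vis) {
    vis.computeIfAbsent(o, x -> new HashSet<>()).addAll(getModLin());
  }

  // monotonic method, at each shared read with writer set "writers"
  static void monotonicRead(int o, Set<Integer> writers,
                            Map<Integer, Set<Integer>> vis) {
    Set<Integer> seen = new HashSet<>(writers);
    seen.retainAll(getModLin());                         // vis(O ∩ getModLin())
    vis.computeIfAbsent(o, x -> new HashSet<>()).addAll(seen);
  }
}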

Example 9. The blue lines in Figure 2 demonstrate the visibility commands added by
the instrumentation V(·) to the key-value map in Figure 2 (in this case, the modifiers
are put operations). The first visibility command in has precedes the procedure body
to emphasize the fact that it is executed atomically with the procedure call. Also, note
that the read of the array table is the only shared memory read in has.

Theorem 2. The abstract executions of the witnessing implementation V(L(I)) are consistent with V and R.

Proof. Let ⟨h, lin, vis⟩ be the abstract execution of a trace τ of V(L(I)), and let o be an invocation in h of a monotonic method (w.r.t. V). By the definition of V, the call action of o is immediately followed in τ by a sequence of visibility actions vis(o, o′) for every modifier o′ which has been already linearized. Therefore, any operation which has returned before o (i.e., happens-before o) has already been linearized and it will necessarily have a smaller visibility (w.r.t. set inclusion) because the linearization sequence is modified only by appending new operations. The instrumentation of shared memory reads may add more visibility actions vis(o, _) but this preserves the monotonicity status of o’s visibility. The case of absolute methods is obvious.

¹² We rely on retrieving the identifiers of currently-linearized operations. More complex proofs may also require inspecting, e.g., operation labels and happens-before relationships.
The consistency of the abstract executions of V(L(I)) with a given sequential specification S, which completes the proof of consistency with a weak-visibility specification W = ⟨S, R, V⟩, can be proved by showing that the transition system ⟦W⟧s of W simulates V(L(I)) (Theorem 1). Defining a simulation relation between the two systems is in some part implementation specific, and in the following we demonstrate it for the key-value map implementation V(L(Ichm)).

We show that ⟦Wm⟧s simulates implementation Ichm. A state of Ichm in Figure 2 is a valuation of table and the history variable lin storing the current linearization sequence, and a valuation of the local variables for each active operation. Let ops(q) denote the set of operations which are active in an implementation state q. Also, for a has operation o ∈ ops(q), let index(o) be the maximal index k of the array table such that o has already read table[k] and table[k] ≠ v. We assume index(o) = −1 if o did not read any array cell.
Definition 6. Let Rchm be a relation which associates every implementation state q with a state of ⟦Wm⟧s, i.e., an ⟨S, R, V⟩-consistent abstract execution e = ⟨h, lin, vis⟩ with h = ⟨O, inv, ret, hb⟩, such that:

1. O is the set of identifiers occurring in ops(q) or the history variable lin,
2. for each operation o ∈ ops(q), inv(o) is defined according to its local state, ret(o) is undefined, and o is maximal in the happens-before order hb,
3. the value of the history variable lin in q equals the linearization sequence lin,
4. every invocation o ∈ ops(q) of an absolute method (put or get) has absolute visibility if linearized, otherwise, its visibility is empty,
5. table is the array obtained by executing the sequence of operations lin,
6. for every linearized get(k) operation o ∈ ops(q), the put(k,_) operation in vis(o) which occurs last in lin writes v to key k, where v is the local variable of o,
7. for every has operation o ∈ ops(q), vis(o) consists of:
   – all the put operations which returned before o was invoked,
   – for each i ≤ index(o), all the put(i,_) operations from a prefix of lin that wrote a value different from v,
   – all the put(index(o) + 1,_) operations from a prefix of lin that ends with a put(index(o) + 1,v) operation, provided that tv = v.

Above, the linearization prefix associated to an index j₁ < j₂ should be a prefix of the one associated to j₂.

A large part of this definition is applicable to any implementation, only points (5), (6), and (7) being specific to the implementation we consider. The points (6) and (7) ensure that the return values of operations are consistent with S and mimic the effect of the vis commands from Figure 2.
Theorem 3. Rchm is a simulation relation from V(L(Ichm)) to ⟦Wm⟧s.
5 Implementation and Evaluation

In this section we effectuate our methodology by verifying two weakly-consistent concurrent objects: Java’s ConcurrentHashMap and ConcurrentLinkedQueue.¹³ We
use an off-the-shelf deductive verification tool called civl [16], though any concurrent
program verifier could suffice. We chose civl because comparable verifiers either
require a manual encoding of the concurrency reasoning (e.g. Dafny or Viper) which
can be error-prone, or require cumbersome reasoning about interleavings of thread-
local histories (e.g. VerCors). An additional benefit of civl is that it directly proves
simulation, thereby tying the mechanized proofs to our theoretical development. Our
proofs assume no bound on the number of threads or the size of the memory.
Our use of civl imposes two restrictions on the implementations we can verify.
First, civl uses the Owicki-Gries method [29] to verify concurrent programs. This method is unsound for weak memory models [22], so civl, and hence our proofs,
assume a sequentially-consistent memory model. Second, civl’s strategy for building
the simulation relation requires implementations to have statically-known linearization
points because it checks that there exists exactly one atomic section in each code path
where the global state is modified, and this modification is simulated by the specification.
Given these restrictions, we can simplify our proof strategy of forward refinement
by factoring the simulations we construct through an atomic version of the specification
transition system. This atomic specification is obtained from the specification AETS
W^s by restricting the interleavings between its transitions.

Definition 7. The atomic transition system of a specification W is the AETS W^a = ⟨Q, A, q, →_a⟩, where W^s = ⟨Q, A, q, →⟩ is the AETS of W, and e1 −a→_a e2 if and only if e1 −a→ e2 and a ∈ {call(o, m, x)} ∪ {ret(o, y)} ∪ {hb(o, o′)} ∪ {a1 · lin(o) : a1 ∈ {vis(o, _)}∗}.

Note that the language of W^a is included in the language of W^s, and simulation proofs towards W^a apply to W^s as well.
Our civl proofs show that there is a simulation from an implementation to its atomic specification, which is encoded as a program whose state consists of the components of an abstract execution, i.e., ⟨O, inv, ret, hb, lin, vis⟩. These were encoded as maps from operation identifiers to values, sequences of operation identifiers, and maps from operation identifiers to sets of operation identifiers, respectively. Our axiomatization of sequences and sets was adapted from those used by the Dafny verifier [23]. For each method in M, we defined atomic procedures corresponding to call actions, return actions, and combined visibility and linearization actions in order to obtain exactly the atomic transitions of W^a.
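To make this encoding concrete, the following is a minimal sketch (ours, with illustrative names, not the verbatim proof scripts) of how such state can be declared in civl's Boogie-based input language:

type Invoc;       // operation identifiers
type SeqInvoc;    // axiomatized sequences of operation identifiers
type SetInvoc;    // axiomatized sets of operation identifiers

var lin: SeqInvoc;         // current linearization sequence
var vis: [Invoc]SetInvoc;  // visibility set of each operation
var called: [Invoc]bool;   // operations that have been invoked
var returned: [Invoc]bool; // operations that have returned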
It is challenging to encode Java implementations faithfully in civl, as the latter’s
input programming language is a basic imperative language lacking many Java features.
Most notable among these is dynamic memory allocation on the heap, used by almost
all of the concurrent data structure implementations. As civl is a first-order prover,
we needed an encoding of the heap that lets us perform reachability reasoning on the
heap. We adapted the first-order theory of reachability and footprint sets from the
GRASShopper verifier [30] for dynamically allocated data structures. This fragment is
decidable, but relies on local theory extensions [36], which we implemented by using
the trigger mechanism of the underlying SMT solver [27, 15] to ensure that quantified
axioms were only instantiated for program expressions. For instance, here is the “cycle”
axiom that says that if a node x has a field f[x] that points to itself, then any y that
it can reach via that field (encoded using the between predicate Btwn(f, x, y, y))
must be equal to x:
axiom (forall f: [Ref]Ref, x: Ref, y:Ref :: {known(x), known(y)}
f[x] == x && Btwn(f, x, y, y) ==> x == y);

We use the trigger known(x), known(y) (known is a dummy function that maps every
reference to true) and introduce known(t) terms in our programs for every term t of
type Ref (for instance, by adding assert known(t) to the point of the program where
t is introduced). This ensures that the cycle axiom is only instantiated for terms that
appear in the program, and not for terms that are generated by instantiations of axioms
(like f[x] in the cycle axiom). This process was key to keeping the verification time
manageable.
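To illustrate the idiom, here is a small self-contained sketch (ours, not taken from the proof scripts) of how a known(t) term is introduced for a freshly read reference:

type Ref;
function known(x: Ref): bool;
// known is a dummy function that is true everywhere; its only purpose is
// to serve as a trigger that controls quantifier instantiation
axiom (forall z: Ref :: {known(z)} known(z));

var f: [Ref]Ref;

procedure traverse(x: Ref) returns (t: Ref)
{
  t := f[x];
  // introduce the term known(t) where t enters the program, so that
  // axioms with {known(...)} triggers can be instantiated for t
  assert known(t);
}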
Since we consider fine-grained concurrent implementations, we also needed to
reason about interference by other threads and show thread safety. civl provides
Owicki-Gries [29] style thread-modular reasoning, by means of demarcating atomic
blocks and providing preconditions for each block that are checked for stability under
all possible modifications by other threads. One of the consequences of this is that
these annotations can only talk about the local state of a thread and the shared global
state, but not other threads. To encode facts such as distinctness of operation identifiers
and ownership of unreachable nodes (e.g. newly allocated nodes) in the shared heap,
we use civl’s linear type system [40].
For instance, the proof of the push method needs to make assertions about the value
of the newly-allocated node x. These assertions would not be stable under interference
of other threads if we didn’t have a way of specifying that the address of the new node
is known only by the push thread. We encode this knowledge by marking the type of
the variable x as linear – this tells civl that all values of x across all threads are distinct,
which is sufficient for the proof. civl ensures soundness by making sure that linear
variables are not duplicated (for instance, they cannot be passed to another method
and then used afterwards).
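As an illustration, here is a hedged sketch of this pattern (ours; civl's concrete attribute syntax has varied across versions, and alloc and the domain name "node" are hypothetical):

procedure push(k: K)
{
  // x is linear in the "node" domain: civl guarantees that the values of
  // linear variables in this domain are pairwise distinct across all threads
  var {:linear "node"} x: Ref;
  call x := alloc(k);  // hypothetical allocator returning a fresh node
  // assertions about the contents of x are now stable under interference,
  // since no other thread can hold (a duplicate of) x
}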
We evaluate our proof methodology by considering models of two of Java’s weakly-
consistent concurrent objects.

Concurrent Hash Map. One is the ConcurrentHashMap implementation of the Map
ADT, consisting of absolute put and get methods and a monotonic has method that
follows the algorithm given in Figure 2. For simplicity, we assume here that keys are integers and the hash function is identity, but note that the proof of monotonicity of has is not affected by these assumptions.¹⁴
¹⁴ Our civl implementation assumes the hash function is injective, to avoid reasoning about the dynamic bucket-list needed to resolve hash collisions. While such reasoning is possible within civl (see our queue case study), this issue is orthogonal to the weak-consistency reasoning that we study here.
Module                        Code   Proof   Total   Time (s)
Sets and Sequences               -     85      85       -
Executions and Consistency       -     30      30       -
Heap and Reachability            -     35      35       -
Map ADT                         51     34      85       -
Array-map implementation       138    175     313       6
Queue ADT                       50     22      72       -
Linked Queue implementation    280    325     605      13

Fig. 3. Case study detail: for each object we show lines of code, lines of proof, total lines, and
verification time in seconds. We also list common definitions and axiomatizations separately.

civl can construct a simulation relation equivalent to the one defined in Definition 6
automatically, given an inductive invariant that relates the state of the implementation
to the abstract execution. A first attempt at an invariant might be that the value stored
at table[k] for every key k is the same as the value returned by adding a get operation
on k by the specification AETS. This invariant is sufficient for civl to prove that the
return value of the absolute methods (put and get) is consistent with the specification.
However, it is not enough to show that the return value of the monotonic has
method is consistent with its visibility. This is because our proof technique constructs
a visibility set for has by taking the union of the memory tags (the set of operations
that wrote to each memory location) of each table entry it reads, but without additional
invariants this visibility set could entail a different return value. We thus strengthen
the invariant to say that tableTags[k], the memory tags associated with hash table
entry k, is exactly the set of linearized put operations with key k. A consequence of
this is that the abstract state encoded by tableTags[k] has the same value for key k as
the value stored at table[k]. civl can then prove, given the following loop invariant,
that the value returned by has is consistent with its visibility set.
(forall i: int :: 0 <= i && i < k ==> Map.ofVis(my_vis, lin)[i] != v)
This loop invariant says that among the entries scanned thus far, the abstract map
given by the projection of lin to the current operation’s visibility my_vis does not
include value v.
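Schematically (our rendering, not the verbatim proof script; tableLen, tv, and Set_union are assumed names), the invariant sits on the scan loop of has(v) as follows:

k := 0;
while (k < tableLen)
  invariant (forall i: int :: 0 <= i && i < k ==> Map.ofVis(my_vis, lin)[i] != v);
{
  atomic {
    tv := table[k];
    my_vis := Set_union(my_vis, tableTags[k]);  // pick up this entry's memory tags
  }
  if (tv == v) { lin(); return true; }
  k := k + 1;
}
lin();
return false;  // no scanned entry of the witnessed snapshot contains v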

Concurrent Linked Queue. Our second case study is the ConcurrentLinkedQueue
implementation of the Queue ADT, consisting of absolute push and pop methods and
a monotonic size method that traverses the queue from head to tail without any locks
and returns the number of nodes it sees (see Figure 4 for the full code). We again model
the core algorithm (the Michael-Scott queue [26]) and omit some of Java’s optimizations,
for instance to speed up garbage collection by setting the next field of popped nodes
to themselves, or setting the values of nodes to null when popping values.
The invariants needed to verify the absolute methods are a straightforward combi-
nation of structural invariants (e.g. that the queue is composed of a linked list from
the head to null, with the tail being a member of this list) and a relation between the abstract and concrete states.

var head, tail: Ref;
struct Node { var data: K; var next: Ref; }

procedure absolute push(k: K) {
  x = new Node(k, null);
  while (true) {
    t, _ = load(tail);
    tn, _ = load(tail.next);
    if (tn == null) {
      atomic {
        b, _ = cas(t.next, tn, x);
        if (b) { vis(getLin()); lin(); }
      }
      if (b) then break;
    } else {
      b, _ = cas(tail, t, tn);
    }
  }
}

procedure absolute pop() {
  while (true) {
    h, _ = load(head);
    t, _ = load(tail);
    hn, _ = load(h.next);
    if (h != t) {
      k, _ = load(hn.data);
      atomic {
        b, _ = cas(head, h, hn);
        if (b) { vis(getLin()); lin(); }
      }
      if (b) then return k;
    }
  }
}

procedure monotonic size() {
  vis(getModLin());
  store(s, 0);
  c, _ = load(head);
  atomic { cn, O = load(c.next); vis(O ∩ getModLin()); }
  while (cn != null) {
    inc(s);
    c = cn;
    atomic { cn, O = load(c.next); vis(O ∩ getModLin()); }
  }
  lin();
  return s;
}

Fig. 4. The simplified implementation of Java’s ConcurrentLinkedQueue that we verify.

Once again, we need to strengthen this invariant in order
to verify the monotonic size method, because otherwise we cannot prove that the
visibility set we construct (by taking the union of the memory tags of nodes in the list
during traversal) justifies the return value.
The key additional invariant is that the memory tags for the next field of each node
(denoted x.nextTags for each node x) in the queue contain the operation label of the
operation that pushed the next node into the queue (if it exists). Further, the sequence
of push operations in lin are exactly the operations in the nextTags field of nodes in
the queue, and in the order they are present in the queue.
Figure 5 shows a simplified version of the civl encoding of these invariants. In
it, we use the following auxiliary variables in order to avoid quantifier alternation:
nextInvoc maps nodes to the operation label (type Invoc in civl) contained in the
nextTags field; nextRef maps operations to the nodes whose nextTags field contains
them, i.e. it is the inverse of nextInvoc; and absRefs maps the index of the abstract
queue (represented as a mathematical sequence) to the corresponding concrete heap
node. We omit the triggers and known predicates for readability; the full invariant can
be found in the accompanying proof scripts.
Given these invariants, one can show that the return value s computed by size
is consistent with the visibility set it constructs by picking up the memory tags from
each node that it traverses. The loop invariant is more involved, as due to concurrent
updates size could be traversing nodes that have been popped from the queue; see
our civl proofs for more details.

Results. Figure 3 provides a summary of our case studies. We separate the table into
sections, one for each case study, and a common section at the top that contains the
common theories of sets and sequences and our encoding of the heap. In each case study
section, we separate the definitions of the atomic specification of the ADT (which can

// nextTags only contains singleton sets of push operations
(forall y: Ref ::
  (Btwn(next, start, y, null) && y != null && next[y] != null
    ==> nextTags[y] == Set(nextInvoc[y])
        && invoc_m(nextInvoc[y]) == Queue.push))

// nextTags of the last node is the empty set
&& nextTags[absRefs[Queue.stateTail(Queue.ofSeq(lin)) - 1]]
   == Set_empty()

// lin is made up of nextInvoc[y] for y in the queue
&& (forall n: Invoc :: invoc_m(n) == Queue.push
    ==> (Seq_elem(n, lin)
         <==> Btwn(next, start, nextRef[n], null)
              && nextRef[n] != null && next[nextRef[n]] != null))

// lin is ordered by order of nodes in queue
&& (forall n1, n2: Invoc ::
    (invoc_m(n1) == Queue.push && invoc_m(n2) == Queue.push
     && Seq_elem(n1, lin) && Seq_elem(n2, lin)
     ==> (Seq_ord(lin, n1, n2)
          <==> Btwn(next, nextRef[n1], nextRef[n1], nextRef[n2])
               && nextRef[n1] != nextRef[n2])))

Fig. 5. A snippet from the civl invariant for the queue.

be reused for other implementations) from the code and proof of the implementation
we consider. For each resulting module, we list the number of lines of code, lines of
proof, total lines, and civl’s verification time in seconds. Experiments were conducted
on an Intel Core i7-4470 3.4 GHz 8-core machine with 16GB RAM.
Our two case studies are representative of the weakly-consistent behaviors exhibited
by all the Java concurrent objects studied in [13], both those using fixed-size arrays
and those using dynamic memory. As civl does not directly support dynamic memory
and other Java language features, we were forced to make certain simplifications
to the algorithms in our verification effort. However, the assumptions we make are
orthogonal to the reasoning and proof of weak consistency of the monotonic methods.
The underlying algorithm used by, and hence the proof argument for monotonicity
of, hash map’s has method is the same as that in the other monotonic hash map
operations such as elements, entrySet, and toString. Similarly, the argument used
for the queue’s size can be adapted to other monotonic ConcurrentLinkedQueue
and LinkedTransferQueue operations like toArray and toString. Thus, our proofs
carry over to the full versions of the implementations as the key invariants linking the
memory tags and visibility sets to the specification state are the same.
In addition, civl does not currently have any support for inferring the preconditions
of each atomic block, which currently accounts for most of the lines of proof in our case
studies. However, these problems have been studied and solved in other tools [30, 39],
and in theory can be integrated with civl in order to simplify these kinds of proofs.

In conclusion, our case studies show that verifying weakly-consistent operations
introduces little overhead compared to the proofs of the core absolute operations. The
additional invariants needed to prove monotonicity were natural and easy to construct.
We also see that our methodology brings weak-consistency proofs within the scope of
what is provable by off-the-shelf automated concurrent program verifiers in reasonable
time.

6 Related Work

Though linearizability [18] has reigned as the de facto concurrent-object consistency
criterion, several recent works proposed weaker criteria, including quantitative re-
laxation [17], quiescent consistency [10], and local linearizability [14]; these works
effectively permit externally-visible interference among threads by altering objects’ se-
quential specifications, each in their own way. Motivated by the diversity of these
proposals, Sergey et al. [35] proposed the use of Hoare logic for describing a custom
consistency specification for each concurrent object. Raad et al. [31] continued in this
direction by proposing declarative consistency models for concurrent objects atop
weak-memory platforms. One common feature between our paper and this line of
work (see also [21, 9]) is encoding and reasoning directly about the concurrent history.
The notion of visibility relaxation [13] originates from Burckhardt et al.’s axiomatic
specifications [7], and leverages traditional sequential specifications by allowing certain
operations to behave as if they are unaware of concurrently-executed linearization-
order predecessors. The linearization (and visibility) actions of our simulation-proof
methodology are unique to visibility-relaxation based weak-consistency, since they
refer to a global linearization order linking executions with sequential specifications.
Typical methodologies for proving linearizability are based on reductions to safety
verification [8, 5] and forward simulation [3, 37, 2], the latter generally requiring
the annotation of per-operation linearization points, each typically associated with
a single program statement in the given operation, e.g., a shared memory access.
Extensions to this methodology include cooperation [38, 12, 41], i.e., allowing operations’
linearization points to coincide with other operations’ statements, and prophecy [33, 24],
i.e., allowing operation’ linearization points to depend on future events. Such extensions
enable linearizability proofs of objects like the Herlihy-Wing Queue (HWQ). While
prophecy [25], alternatively backward simulation [25], is generally more powerful
than forward simulation alone, Bouajjani et al. [6] described a methodology based on
forward simulation capable of proving seemingly future-dependent objects like HWQ
by considering fixed linearization points only for value removal, and an additional
kind of specification-simulated action, commit points, corresponding to operations’
final shared-memory accesses. Our consideration of specification-simulated visibility
actions follows this line of thinking, enabling the forward-simulation based proof of
weakly-consistent concurrent objects.

7 Conclusion and Future Work

This work develops the first verification methodology for weakly-consistent operations
using sequential specifications and forward simulation, thus reusing existing sequential
ADT specifications and enabling simple reasoning, i.e., without prophecy [1] or back-
ward simulation [25]. This paper demonstrates the application of our methodology to
absolute and monotonic methods on sequentially-consistent memory, as these are the
consistency levels demonstrated in actual Java implementations of which we are aware.
Our formalization is general, and also applicable to the other visibility relaxations,
e.g., the peer and weak visibilities [13], and weaker memory models, e.g., the Java
memory model.
Extrapolating, we speculate that handling other visibilities amounts to adding anno-
tations and auxiliary state which mirrors inter-operation communication. For example,
while monotonic operations on shared-memory implementations observe mutating
linearization-order predecessors – corresponding to a sequence of shared-memory up-
dates – causal operations with message-passing based implementations would observe
operations whose messages have (transitively) propagated. The corresponding anno-
tations may require auxiliary state to track message propagation, similar in spirit to
the getModLin() auxiliary state that tracks mutating linearization-order predecessors
(§4). Since weak memory models essentially alter the mechanics of inter-operation
communication, the corresponding visibility annotations and auxiliary state may simi-
larly reflect this communication. Since this communication is partly captured by the
denotations of memory commands (§3.2), these denotations would be modified, e.g., to
include not one value and tag per memory location, but multiple. While variations are
possible depending on the extent to which the proof of a given implementation relies
on the details of the memory model, in the worst case the auxiliary state could capture
an existing memory model (e.g., operational) semantics exactly.
As with systematic or automated linearizability-proof methodologies, our proof
methodology is susceptible to two potential sources of incompleteness. First, as men-
tioned in Section 3, methodologies like ours based on forward simulation are only
complete when specifications are return-value deterministic. However, data types are
typically designed to be return-value deterministic and this source of incompleteness
does not manifest in practice.
Second, methodologies like ours based on annotating program commands, e.g., with
linearization points, are generally incomplete since the consistency mechanism em-
ployed by any given implementation may not admit characterization according to a
given static annotation scheme; the Herlihy-Wing Queue, whose linearization points
depend on the results of future actions, is a prototypical example [18]. Likewise, our
systematic strategy for annotating implementations with lin and vis commands (§3)
can fail to prove consistency of future-dependent operations. However, we have yet
to observe any practical occurrence of such exotic objects; our strategy is sufficient
for verifying the weakly-consistent algorithms implemented in the Java development
kit. As a theoretical curiosity for future work, investigating the potential for complete
annotation strategies would be interesting, e.g., for restricted classes of data types
and/or implementations.

Finally, while civl's high degree of automation facilitated rapid prototyping of
our simulation proofs, its underlying foundation using Owicki-Gries style proof rules
limits the potential for modular reasoning. In particular, while our weak-consistency
proofs are thread-modular, our invariants and intermediate assertions necessarily talk
about state shared among multiple threads. Since our simulation-based methodology
and annotations are completely orthogonal to the underlying program logic, it would
be interesting future work to apply our methodology using expressive logics like Rely-
Guarantee, e.g. [19, 38], or variations of Concurrent Separation Logic, e.g. [28, 32, 34,
35, 4, 20]. It remains to be seen to what degree increased modularity may sacrifice
automation in the application of our weak-consistency proof methodology.

Acknowledgments. This material is based upon work supported by the National
Science Foundation under Grant No. 1816936, and the European Research Council
(ERC) under the European Union’s Horizon 2020 research and innovation programme
(grant agreement No 678177).

A Appendix: Proofs of Theorems and Lemmas


Lemma 1. The abstract executions E(W ) of a specification W are consistent with W .
Proof. Any complete, sequential, and absolute execution is consistent by definition,
since the labeling of its linearization is taken from the sequential specification. Then,
any happens-before weakening is consistent for exactly the same reason as its source
execution, since its linearization and visibility projection are both identical. Finally, any
visibility weakening is consistent by the condition of W -consistency in its definition.

Lemma 2. A weak-visibility specification and its transition system have identical histo-
ries.
Proof. It follows almost immediately that the abstract executions of W^s are identical to those of W, since W^s's state effectively records the abstract execution of a given AETS execution, and only enables those returns that are consistent with W. Since histories are the projections of abstract executions, the corresponding history sets are also identical.
Theorem 1. A witnessing implementation I is consistent with a weak-visibility specifi-
cation W if the transition system W^s of W simulates I.
Proof. This follows from standard arguments, given that the corresponding SLTSs
include ε transitions to ensure that every move of one system can be matched by
stuttering from the other: since both systems synchronize on the call, ret, hb, lin, and
vis actions, the simulation guarantees that every abstract execution, and thus history,
of I is matched by one of W^s. Then by Lemma 2, the histories of I are included in those of W.
Theorem 3. R_chm is a simulation relation from V(L(I_chm)) to W_m^s.
Proof Sketch. We show that every step of the implementation, i.e., an atomic section or a program command, is simulated by W_m^s. Given ⟨q, e⟩ ∈ R_chm, we consider the different implementation steps which are possible in q.
The case of commands corresponding to procedure calls of put and get is trivial. Executing a procedure call in q leads to a new state q′ which differs only by having a new active operation o. We have that e −call(o,_,_)→ e′ and ⟨q′, e′⟩ ∈ R_chm, where e′ is obtained from e by adding o with an appropriate value of inv(o) and an empty visibility.
The transition corresponding to the atomic section of put is labeled by a sequence of visibility actions (one for each linearized operation) followed by a linearization action. Let σ denote this sequence of actions. This transition leads to a state q′ where the array table may have changed (unless writing the same value), and the history variable lin is extended with the put operation o executing this step. We define an abstract execution e′ from e by changing lin to the new value of lin, and defining an absolute visibility for o. We have that e −σ→ e′ because e′ is consistent with W_m. Also, ⟨q′, e′⟩ ∈ R_chm because the validity of (3), (4), and (5) follow directly from the definition

of e′. The atomic section of get can be handled in a similar way. The simulation of return actions of get operations is a direct consequence of point (6), which ensures consistency with S.
For has, we focus on the atomic sections containing vis commands and the linearization commands (the other internal steps are simulated by ε steps of W_m^s, and the simulation of the return step follows directly from (7), which justifies the consistency of the return value). The atomic section around the procedure call corresponds to a transition labeled by a sequence σ of visibility actions (one for each linearized modifier) and leads to a state q′ with a new active has operation o (compared to q). We have that e −σ→ e′ because e′ is consistent with W_m. Indeed, the visibility of o in e′ is not constrained since o has not been linearized, and the W_m-consistency of e′ follows from the W_m-consistency of e. Also, ⟨q′, e′⟩ ∈ R_chm because index(o) = −1 and (7) is clearly valid. The atomic section around the read of table[k] is simulated by W_m^s in a similar way, noticing that (7) models precisely the effect of the visibility commands inside this atomic section. For the simulation of the linearization commands, it is important to notice that any active has operation in e has a visibility that contains all modifiers which returned before it was called and, as explained above, this visibility is monotonic.

References

[1] Abadi, M., Lamport, L.: The existence of refinement mappings. Theor. Comput.
Sci. 82(2), 253–284 (1991)
[2] Abdulla, P.A., Haziza, F., Holík, L., Jonsson, B., Rezine, A.: An integrated specification and verification technique for highly concurrent data structures. STTT 19(5), 549–563 (2017)
[3] Amit, D., Rinetzky, N., Reps, T.W., Sagiv, M., Yahav, E.: Comparison under abstrac-
tion for verifying linearizability. In: CAV. Lecture Notes in Computer Science,
vol. 4590, pp. 477–490. Springer (2007)
[4] Blom, S., Darabi, S., Huisman, M., Oortwijn, W.: The VerCors tool set: Verification
of parallel and concurrent software. In: IFM. Lecture Notes in Computer Science,
vol. 10510, pp. 102–110. Springer (2017)
[5] Bouajjani, A., Emmi, M., Enea, C., Hamza, J.: On reducing linearizability to state
reachability. Inf. Comput. 261(Part), 383–400 (2018)
[6] Bouajjani, A., Emmi, M., Enea, C., Mutluergil, S.O.: Proving linearizability using
forward simulations. In: CAV (2). Lecture Notes in Computer Science, vol. 10427,
pp. 542–563. Springer (2017)
[7] Burckhardt, S., Gotsman, A., Yang, H., Zawirski, M.: Replicated data types: specifi-
cation, verification, optimality. In: POPL. pp. 271–284. ACM (2014)
[8] Chakraborty, S., Henzinger, T.A., Sezgin, A., Vafeiadis, V.: Aspect-oriented lin-
earizability proofs. Logical Methods in Computer Science 11(1) (2015)

[9] Delbianco, G.A., Sergey, I., Nanevski, A., Banerjee, A.: Concurrent data structures
linked in time. In: ECOOP. LIPIcs, vol. 74, pp. 8:1–8:30. Schloss Dagstuhl - Leibniz-
Zentrum fuer Informatik (2017)
[10] Derrick, J., Dongol, B., Schellhorn, G., Tofan, B., Travkin, O., Wehrheim, H.: Qui-
escent consistency: Defining and verifying relaxed linearizability. In: FM. Lecture
Notes in Computer Science, vol. 8442, pp. 200–214. Springer (2014)
[11] Dongol, B., Jagadeesan, R., Riely, J., Armstrong, A.: On abstraction and composi-
tionality for weak-memory linearisability. In: VMCAI. Lecture Notes in Computer
Science, vol. 10747, pp. 183–204. Springer (2018)
[12] Dragoi, C., Gupta, A., Henzinger, T.A.: Automatic linearizability proofs of con-
current objects with cooperating updates. In: CAV. Lecture Notes in Computer
Science, vol. 8044, pp. 174–190. Springer (2013)
[13] Emmi, M., Enea, C.: Weak-consistency specification via visibility relaxation.
PACMPL 3(POPL), 60:1–60:28 (2019)
[14] Haas, A., Henzinger, T.A., Holzer, A., Kirsch, C.M., Lippautz, M., Payer, H., Sezgin,
A., Sokolova, A., Veith, H.: Local linearizability for concurrent container-type
data structures. In: CONCUR. LIPIcs, vol. 59, pp. 6:1–6:15. Schloss Dagstuhl -
Leibniz-Zentrum fuer Informatik (2016)
[15] Hawblitzel, C., Petrank, E.: Automated verification of practical garbage collectors.
Logical Methods in Computer Science 6(3) (2010)
[16] Hawblitzel, C., Petrank, E., Qadeer, S., Tasiran, S.: Automated and modular refine-
ment reasoning for concurrent programs. In: CAV (2). Lecture Notes in Computer
Science, vol. 9207, pp. 449–465. Springer (2015)
[17] Henzinger, T.A., Kirsch, C.M., Payer, H., Sezgin, A., Sokolova, A.: Quantitative
relaxation of concurrent data structures. In: POPL. pp. 317–328. ACM (2013)
[18] Herlihy, M., Wing, J.M.: Linearizability: A correctness condition for concurrent
objects. ACM Trans. Program. Lang. Syst. 12(3), 463–492 (1990)
[19] Jones, C.B.: Specification and design of (parallel) programs. In: IFIP Congress. pp.
321–332. North-Holland/IFIP (1983)
[20] Jung, R., Krebbers, R., Jourdan, J., Bizjak, A., Birkedal, L., Dreyer, D.: Iris from the
ground up: A modular foundation for higher-order concurrent separation logic. J.
Funct. Program. 28, e20 (2018)
[21] Khyzha, A., Dodds, M., Gotsman, A., Parkinson, M.J.: Proving linearizability using
partial orders. In: ESOP. Lecture Notes in Computer Science, vol. 10201, pp. 639–
667. Springer (2017)
[22] Lahav, O., Vafeiadis, V.: Owicki-Gries reasoning for weak memory models. In:
ICALP (2). Lecture Notes in Computer Science, vol. 9135, pp. 311–323. Springer
(2015)
[23] Leino, K.R.M.: Dafny: An automatic program verifier for functional correctness.
In: LPAR (Dakar). Lecture Notes in Computer Science, vol. 6355, pp. 348–370.
Springer (2010)
[24] Liang, H., Feng, X.: Modular verification of linearizability with non-fixed lineariza-
tion points. In: PLDI. pp. 459–470. ACM (2013)
[25] Lynch, N.A., Vaandrager, F.W.: Forward and backward simulations: I. untimed
systems. Inf. Comput. 121(2), 214–233 (1995)

[26] Michael, M.M., Scott, M.L.: Simple, fast, and practical non-blocking and blocking
concurrent queue algorithms. In: PODC. pp. 267–275. ACM (1996)
[27] Moskal, M., Lopuszanski, J., Kiniry, J.R.: E-matching for fun and profit. Electr.
Notes Theor. Comput. Sci. 198(2), 19–35 (2008)
[28] O’Hearn, P.W.: Resources, concurrency and local reasoning. In: CONCUR. Lecture
Notes in Computer Science, vol. 3170, pp. 49–67. Springer (2004)
[29] Owicki, S.S., Gries, D.: Verifying properties of parallel programs: An axiomatic
approach. Commun. ACM 19(5), 279–285 (1976)
[30] Piskac, R., Wies, T., Zufferey, D.: GRASShopper - complete heap verification with
mixed specifications. In: TACAS. Lecture Notes in Computer Science, vol. 8413,
pp. 124–139. Springer (2014)
[31] Raad, A., Doko, M., Rozic, L., Lahav, O., Vafeiadis, V.: On library correctness under
weak memory consistency: specifying and verifying concurrent libraries under
declarative consistency models. PACMPL 3(POPL), 68:1–68:31 (2019)
[32] Reynolds, J.C.: Separation logic: A logic for shared mutable data structures. In:
LICS. pp. 55–74. IEEE Computer Society (2002)
[33] Schellhorn, G., Wehrheim, H., Derrick, J.: How to prove algorithms linearisable. In:
CAV. Lecture Notes in Computer Science, vol. 7358, pp. 243–259. Springer (2012)
[34] Sergey, I., Nanevski, A., Banerjee, A.: Mechanized verification of fine-grained
concurrent programs. In: PLDI. pp. 77–87. ACM (2015)
[35] Sergey, I., Nanevski, A., Banerjee, A., Delbianco, G.A.: Hoare-style specifications
as correctness conditions for non-linearizable concurrent objects. In: OOPSLA.
pp. 92–110. ACM (2016)
[36] Sofronie-Stokkermans, V.: Hierarchic reasoning in local theory extensions. In:
CADE. Lecture Notes in Computer Science, vol. 3632, pp. 219–234. Springer (2005)
[37] Vafeiadis, V.: Shape-value abstraction for verifying linearizability. In: VMCAI.
Lecture Notes in Computer Science, vol. 5403, pp. 335–348. Springer (2009)
[38] Vafeiadis, V.: Automatically proving linearizability. In: CAV. Lecture Notes in
Computer Science, vol. 6174, pp. 450–464. Springer (2010)
[39] Vafeiadis, V.: RGSep action inference. In: VMCAI. Lecture Notes in Computer
Science, vol. 5944, pp. 345–361. Springer (2010)
[40] Wadler, P.: Linear types can change the world! In: Programming Concepts and
Methods. p. 561. North-Holland (1990)
[41] Zhu, H., Petri, G., Jagannathan, S.: Poling: SMT aided linearizability proofs. In:
CAV (2). Lecture Notes in Computer Science, vol. 9207, pp. 3–19. Springer (2015)

Local Reasoning for Global Graph Properties

Siddharth Krishna¹, Alexander J. Summers², and Thomas Wies¹

¹ New York University, New York, NY, USA, {siddharth,wies}@cs.nyu.edu
² ETH Zürich, Zurich, Switzerland, [email protected]

Abstract. Separation logics are widely used for verifying programs that manipu-
late complex heap-based data structures. These logics build on so-called separation
algebras, which allow expressing properties of heap regions such that modifica-
tions to a region do not invalidate properties stated about the remainder of the heap.
This concept is key to enabling modular reasoning and also extends to concurrency.
While heaps are naturally related to mathematical graphs, many ubiquitous graph
properties are non-local in character, such as reachability between nodes, path
lengths, acyclicity and other structural invariants, as well as data invariants which
combine with these notions. Reasoning modularly about such graph properties
remains notoriously difficult, since a local modification can have side-effects on a
global property that cannot be easily confined to a small region.
In this paper, we address the question: What separation algebra can be used to
avoid proof arguments reverting back to tedious global reasoning in such cases?
To this end, we consider a general class of global graph properties expressed as
fixpoints of algebraic equations over graphs. We present mathematical foundations
for reasoning about this class of properties, imposing minimal requirements on the
underlying theory that allow us to define a suitable separation algebra. Building
on this theory, we develop a general proof technique for modular reasoning about
global graph properties expressed over program heaps, in a way which can be
directly integrated with existing separation logics. To demonstrate our approach,
we present local proofs for two challenging examples: a priority inheritance
protocol and the non-blocking concurrent Harris list.

1 Introduction
Separation logic (SL) [31,37] provides the basis of many successful verification tools that
can verify programs manipulating complex data structures [1, 4, 17, 29]. This success is
due to the logic’s support for reasoning modularly about modifications to heap-based data.
For simple inductive data structures such as lists and trees, much of this reasoning can
be automated [2, 11, 20, 33]. However, these techniques often fail when data structures
are less regular (e.g. multiple overlaid data structures) or provide multiple traversal
patterns (e.g. threaded trees). Such idioms are prevalent in real-world implementations
such as the fine-grained concurrent data structures found in operating systems and
databases. Solutions to these problems have been proposed [14] but remain difficult to
automate. For proofs of general graph algorithms, the situation is even more dire. Despite
substantial improvements in the verification methodology for such algorithms [35, 38],
significant parts of the proof argument still typically need to be carried out using non-
local reasoning [7, 8, 13, 25]. This paper presents a general technique for local reasoning


1  method acquire(p: Node, r: Node) {
2    if (r.next == null) {
3      r.next := p; update(p, -1, r.curr_prio)
4    } else {
5      p.next := r; update(r, -1, p.curr_prio)
6    }
7  }
8  method update(n: Node, from: Int, to: Int) {
9    n.prios := n.prios \ {from}
10   if (to >= 0) n.prios := n.prios ∪ {to}
11   from := n.curr_prio
12   n.curr_prio := max(n.prios ∪ {n.def_prio})
13   to := n.curr_prio;
14   if (from != to && n.next != null) {
15     update(n.next, from, to)
16   }
17 }

[Graph panel omitted: a PIP state with processes p1–p7 (round nodes) and resources r1–r4 (rectangular nodes); each node is labelled with its def_prio, its multiset prios, and its underlined curr_prio, as described in the caption.]

Fig. 1: Pseudocode of the PIP and a state of the protocol data structure. Round nodes
represent processes and rectangular nodes resources. Nodes are marked with their default
priorities def_prio as well as the aggregate priority multiset prios. A node’s current
priority curr_prio is underlined and marked in bold blue.

about global graph properties that can be used within off-the-shelf separation logics.
We demonstrate our technique using two challenging examples for which no fully local
proof existed before, respectively, whose proof required a tailor-made logic.
As a motivating example, we consider an idealized priority inheritance protocol (PIP),
a technique used in process scheduling [39]. The purpose of the protocol is to avoid
priority inversion, i.e. a situation where a low-priority process causes a high-priority
process to be blocked. The protocol maintains a bipartite graph with nodes representing
processes and resources. An example graph is shown in Fig. 1. An edge from a process
p to a resource r indicates that p is waiting for r to be available whereas an edge in
the other direction means that r is currently held by p. Every node has an associated default priority and a current priority; both are natural numbers. The current priority is used for
scheduling processes. When a process attempts to acquire a resource currently held by
another process, the graph is updated to avoid priority inversion. For example, when
process p1 with current priority 3 attempts to acquire the resource r1 held by process
p2 of priority 1, p1 ’s higher priority is propagated to p2 and, transitively, to any other
process that p2 is waiting for (p3 in this case). As a result, all nodes on the created cycle³
will get current priority 3. The protocol maintains the following invariant: the current
priority of each node is the maximum of its default priority and the current priorities of
all its predecessors. Priority propagation is implemented by the method update shown
in Fig 1. The implementation represents graph edges by next pointers and handles both
adding an edge (acquire) and removing one (release - code omitted). To recalculate
the current priority of a node (line 12), each node maintains its default priority def_prio
and a multiset prios which contains the priorities of all its immediate predecessors.
Verifying that the PIP maintains its invariant using established separation logic (SL)
techniques is challenging. In general, SL assertions describe resources and express the
fact that the program has permission to access and manipulate these resources. In what
³ The cycle can be used to detect/handle a deadlock; this is not the concern of this data structure.

follows, we stick to the standard model of SL where resources are memory regions
represented as partial heaps. We sometimes view partial heaps more abstractly as partial
graphs (hereafter, simply graphs). Assertions describing larger regions are built from
smaller ones using separating conjunction, φ1 ∗ φ2 . Semantically, the ∗ operator is tied to
a notion of resource composition defined by an underlying separation algebra [5, 6]. In
the standard model, composition enforces that φ1 and φ2 must describe disjoint regions.
The logic and algebra are set up so that changes to the region φ1 do not affect φ2 (and
vice versa). That is, if φ1 ∗ φ2 holds before the modification and φ1 is changed to φ1′, then φ1′ ∗ φ2 holds afterwards. This so-called frame rule enables modular reasoning
about modifications to the heap and extends well to the concurrent setting when threads
operate on disjoint portions of memory [3, 9, 10, 36]. However, the mere fact that φ2 is
preserved by modifications to φ1 does not guarantee that if a global property such as the
PIP invariant holds for φ1 ∗ φ2, it also still holds for φ1′ ∗ φ2.
For example, consider the PIP scenario depicted in Fig. 1. If φ1 describes the subgraph containing only node p1, φ2 the remainder of the graph, and φ1′ the graph obtained from φ1 by adding the edge from p1 to r1, then the PIP invariant will no longer hold for the new composed graph described by φ1′ ∗ φ2. On the other hand, if φ1 captures p1 and the nodes reachable from r1 (i.e., the set of nodes modified by update), φ2 the remainder of the graph, and we reestablish the PIP invariant locally in φ1 obtaining φ1′ (i.e., run update to completion), then φ1′ ∗ φ2 will also globally satisfy the PIP invariant.
The separating conjunction ∗ is not sufficient to differentiate these two cases; both
describe valid partitions of a possible program heap. As a consequence, prior techniques
have to revert back to non-local reasoning to prove that the invariant is maintained.
A first helpful idea towards a solution to this problem is that of iterated separating
conjunction [30, 44], which describes a graph G consisting of a set of nodes X by a formula Ψ = ∗_{x∈X} N(x), where N(x) is some predicate that holds locally for every
node x ∈ X. Using such node-local conditions one can naturally express non-inductive
properties of graphs (e.g. “G has no outgoing edges” or “G is bipartite”). The advan-
tages of this style of specification are two-fold. First, one can arbitrarily decompose
and recompose Ψ by splitting X into disjoint subsets. For example, if X is partitioned into X1 and X2, then Ψ is equivalent to ∗_{x∈X1} N(x) ∗ ∗_{x∈X2} N(x). Moreover, it is
very easy to prove that Ψ is preserved under modifications of subgraphs. For instance, if a program modifies the subgraph induced by X1 such that ∗_{x∈X1} N(x) is preserved
locally, then the frame rule guarantees that Ψ will be preserved in the new larger graph.
Iterated separating conjunction thus yields a simple proof technique for local reasoning
about graph properties that can be described in terms of node-local conditions. However,
this idea alone does not actually solve our problem because general global graph proper-
ties such as “G is a directed acyclic graph”, “G is an overlay of multiple trees”, or “G
satisfies the PIP invariant” cannot be directly described via node-local conditions.

Solution. The key ingredient of our approach is the concept of a flow of a graph: a
function fl from the nodes of the graph to flow values. For the PIP, the flow maps
each node to the multiset of its incoming priorities. In general, a flow is a fixpoint of
a set of algebraic equations induced by the graph. These equations are defined over a
flow domain, which determines how flow values are propagated along the edges of the
graph and how they are aggregated at each node. In the PIP example, an edge between

nodes (n, n′) propagates the multiset containing max(fl(n), n.def_prio) from n to n′. The multisets arriving at n′ are aggregated with multiset union to obtain fl(n′).
Flows enable capturing global graph properties in terms of node-local conditions. For
example, the PIP invariant can be expressed by the following node-local condition:
n.curr_prio = max(fl (n), n.def_prio). To enable compositional reasoning about
such properties we need an appropriate separation algebra allowing us to prove locally
that modifications to a subgraph do not affect the flow of the remainder of the graph.
To this end, we make the useful observation that a separation algebra induces a notion of an interface of a resource: we say that two resources a and a′ are equivalent if they compose with the same resources. The interface of a resource a could then be defined as a's equivalence class, but more succinct and simpler representations may be possible. In the standard model of SL where resources are graphs and composition is disjoint graph union, the interface of a graph G is the set of all graphs G′ that have the same domain as G; in this model, a graph's domain could be defined to be its interface.
The interfaces of resources described by assertions capture the information that is
implicitly communicated when these assertions are conjoined by separating conjunction.
As we discussed earlier, in the standard model of SL, this information is too weak to
enable local reasoning about global properties of the composed graphs because some
additional information about the subgraphs’ structure other than which nodes they
contain must be communicated. For instance, if the goal is to verify the PIP invariant, the
interfaces must capture information about the multisets of priorities propagated between
the subgraphs. We define a separation algebra achieving exactly this: the induced flow
interface of a graph G in this separation algebra captures how values of the flow domain
must enter and leave G such that, when composed with a compatible graph G′, the
imposed local conditions on the flow of each node are satisfied in the composite graph.
This is the key to enabling SL-style framing for global graph properties. Using iter-
ated separating conjunctions over the new separation algebra, we obtain a compositional
proof technique that yields succinct proofs of programs such as the PIP, whose proofs
with existing techniques would involve non-trivial global reasoning steps.

Contributions. In §2, we present mathematical foundations for flow domains, imposing
the minimal requirements on the underlying algebra that allow us to capture a broad
range of data structure invariants and graph properties and reason locally about them in a
suitable separation algebra. Building on this theory we develop a general proof technique
for modular reasoning about global graph properties that can be integrated with existing
separation logics (§3). We further identify general mathematical conditions that can be
used when desired to guarantee unique flows, and provide local proof arguments to check
the preservation of these conditions (§4). We demonstrate the versatility of our approach
by presenting local proofs for two challenging examples: the PIP and the concurrent
non-blocking list due to Harris [12].

Flows Redesigned. Our work is inspired by the recent flow framework explored by
some of the authors [22], but was redesigned from the ground up. We revisit the core
algebra behind flow reasoning, and derive a different algebraic foundation by analysing
the minimal requirements for general local reasoning; we call our newly-designed
reasoning framework the foundational flow framework. Our new framework makes

several significant improvements over [22] and eliminates its most stark limitations. We
provide a detailed technical comparison with [22] and discuss other related work in §5.

2 The Foundational Flow Framework


In this section, we introduce the foundational flow framework, explaining the motivation
for its design with respect to local reasoning principles. We aim for a general technique
for modularly proving the preservation of recursively-defined invariants over (partial)
graphs, with well-defined decomposition and composition operations.

2.1 Preliminaries and Notation


The term (b ? t1 : t2) denotes t1 if condition b holds and t2 otherwise. We write f : A → B for a function from A to B, and f : A ⇀ B for a partial function from A to B. For a partial function f, we write f(x) = ⊥ if f is undefined at x. We use lambda notation (λx. E) to denote a function that maps x to the expression E (typically containing x). If f is a function from A to B, we write f[x ↦ y] to denote the function from A ∪ {x} to B defined by f[x ↦ y](z) := (z = x ? y : f(z)). We use {x1 ↦ y1, . . . , xn ↦ yn} for pairwise different xi to denote the function ∅[x1 ↦ y1] · · · [xn ↦ yn], where ∅ is the function with an empty domain. Given functions f1 : A1 → B and f2 : A2 → B, we write f1 ⊎ f2 for the function f : A1 ⊎ A2 → B that maps x ∈ A1 to f1(x) and x ∈ A2 to f2(x) (if A1 and A2 are not disjoint sets, f1 ⊎ f2 is undefined).

We write δ_{n=n′} : M → M for the function defined by δ_{n=n′}(m) := m if n = n′ else 0. We also write λ0 := (λm. 0) for the identically zero function, λid := (λm. m) for the identity function, and use e ≡ e′ to denote function equality. For e : M → M and m ∈ M, we write m ▷ e to denote the function application e(m). We write e ◦ e′ to denote function composition, i.e. (e ◦ e′)(m) = e(e′(m)) for m ∈ M, and use superscript notation e^p to denote the function composition of e with itself p times.

For multisets S, we use standard set notation when clear from the context. We write S(x) to denote the number of occurrences of x in S. We write {x1 ↦ i1, . . . , xn ↦ in} for the multiset containing i1 occurrences of x1, i2 occurrences of x2, etc.

A partial monoid is a set M, along with a partial binary operation + : M × M ⇀ M, and a special zero element 0 ∈ M, such that (1) + is associative, i.e., (m1 + m2) + m3 = m1 + (m2 + m3); and (2) 0 is an identity, i.e., m + 0 = 0 + m = m. Here, = means either both sides are defined and equal, or both are undefined. We identify a partial monoid with its support set M. If + is a total function, then we call M a monoid. Let m1, m2, m3 ∈ M be arbitrary elements of the (partial) monoid in the following. We call a (partial) monoid M commutative if + is commutative, i.e., m1 + m2 = m2 + m1. Similarly, a commutative monoid M is cancellative if + is cancellative, i.e., if m1 + m2 = m1 + m3 is defined, then m2 = m3.

A separation algebra [5] is a cancellative, partial, commutative monoid.
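For instance (our illustration): natural numbers under addition and multisets under multiset union are cancellative commutative monoids, whereas (ℕ, max, 0) is commutative but not cancellative, since max(2, 1) = max(2, 2) yet 1 ≠ 2; cancellativity is precisely what Lemma 1 below relies on.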

2.2 Flows
Recursive properties of graphs naturally depend on non-local information; e.g. we cannot
express that a graph is acyclic directly as a conjunction of per-node invariants. Our

foundational flow framework defines flow values at each node that capture non-local
graph properties, and enables local specification and reasoning about such properties.
Flow values are drawn from a flow domain, an algebraic structure which also specifies
the operations used to define a flow via recursive computations over the graph. Our
entire theory is parametric with the choice of a flow domain, whose components will be
explained and motivated in the rest of this section.

Definition 1 (Flow Domain). A flow domain (M, +, 0, E) consists of a commutative cancellative (total) monoid (M, +, 0) and a set of edge functions E ⊆ M → M.

Example 1. The path-counting flow domain is (ℕ, +, 0, {λid, λ0}), consisting of the
monoid of natural numbers under addition and the set of edge functions containing only
the identity function and the zero function. This can be used to define a flow where the
values at each node represent the number of paths to this node from a distinguished node
n. Path-counting provides enough information to express locally per node that e.g. (a)
all nodes are reachable from n (all path counts are non-zero), or (b) that the graph forms
a tree rooted at n (all path counts are exactly 1).

Example 2. We use (ℕ^ℕ, ∪, ∅, {λ0} ∪ {(λm. {max(m ∪ {p})}) | p ∈ ℕ}) as the flow domain for the PIP example (Figure 1).
numbers under multiset union and two kinds of edge functions: λ0 and functions map-
ping a multiset m to the singleton multiset containing the maximum value between m
and a fixed value p (used to represent a node’s default priority). This can define a flow
which locally captures the appropriate current node priorities as the graph is modified.

Further definitions in this section assume a fixed flow domain (M, +, 0, E) and a
(potentially infinite) set of nodes 𝔑. For this section, we abstract heaps using directed
partial graphs; integration of our graph reasoning with direct proofs over program heaps
is explained in §3.

Definition 2 (Graph). A (partial) graph G = (N, e) consists of a finite set of nodes N ⊆ 𝔑 and a mapping from pairs of nodes to edge functions e : N × 𝔑 → E.

Flow Values and Flows. Flow values (taken from M ; the first element of a flow domain)
are used to capture sufficient information to express desired non-local properties of a
graph. In Example 1, flow values are non-negative integers; for the PIP (Example 2)
we instead use multisets of integers, representing relevant non-local information: the
priorities of nodes currently referencing a given node in the graph. Given such flow values,
a node’s correct priority can be defined locally per node in the graph. This definition
requires only the maximum value of these multisets, but as we will see shortly these
multisets enable local recomputation of a correct priority when the graph is changed.
For a graph G = (N, e) we express properties of G in terms of node-local conditions
that may depend on the nodes’ flow. A flow is a function fl : N → M assigning every
node a flow value and must be some fixpoint of the following flow equation:

∀n ∈ N. fl(n) = in(n) + Σ_{n′∈N} fl(n′) ▷ e(n′, n)        (FlowEqn)

Intuitively, one can think of the flow as being obtained by a fold computation over the graph:⁴ the inflow in : N → M defines an initial flow at each node. This initial flow is then updated recursively for each node n: the current flow value at its predecessor nodes n′ is transferred to n via edge functions e(n′, n) : M → M. These flow values are aggregated using the summation operation + of the flow domain to obtain an updated flow of n; a flow for the graph is some fixpoint satisfying this equation at all nodes.⁵
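As a small worked instance (ours), take the path-counting domain of Example 1 on N = {n, n1, n2} with e(n, n1) = e(n, n2) = λid, all other edges λ0, and inflow in = {n ↦ 1, n1 ↦ 0, n2 ↦ 0}. Then (FlowEqn) yields

fl(n) = in(n) = 1,    fl(n1) = 0 + fl(n) ▷ λid = 1,    fl(n2) = 1,

i.e., every node has path count exactly 1, so this graph is a tree rooted at n.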
Definition 3 (Flow Graph). A flow graph H = (N, e, fl ) is a graph (N, e) and function
fl : N → M such that there exists an inflow in : N → M satisfying FlowEqn(in, e, fl ).
We let dom(H) = N , and sometimes identify H and dom(H) to ease notational
burden. For n ∈ H we write Hn for the singleton flow subgraph of H induced by n.

Edge Functions. In any flow graph, the flow value assigned to a node n by a flow is propagated to its neighbours n′ (and transitively) according to the edge function e(n, n′) labelling the edge (n, n′). The edge function maps the flow value at the source node n to the one propagated on this edge to the target node n′. Note that we require such a labelling for all pairs consisting of a source node n inside the graph and a target node n′ ∈ 𝔑 (i.e., possibly outside the graph). The 0 flow value (the third element of our flow domains) is used to represent no flow; the corresponding (constant) zero function λ0 = (λm. 0) is used as the edge function to model the absence of an edge in the graph. A set of edge functions E from which this labelling is chosen can, other than the requirement λ0 ∈ E, be chosen as desired. As we will see in §4.4, restrictions to particular sets of edge functions E can be exploited to further strengthen our overall technique. Edge functions can depend on the local state of the source node (as in the following example); dependencies from elsewhere in the graph must be represented by the node's flow.
Example 3. Consider the graph in Figure 1 and the flow domain from Example 2. We
choose the edge functions to be λ₀ where no edge exists in the PIP structure, and other-
wise (λm. {max(m ∪ {d})}), where d is the default priority of the source of the edge.
For example, in Figure 1, e(r₃, p₂) = λ₀ and e(r₃, p₁) = (λm. {max(m ∪ {0})}).
Since the flow value at r₃ is {1, 2, 2}, the edge (r₃, p₁) propagates the value {2} to p₁,
correctly representing the current priority of r₃.

Flow Aggregation and Inflows. The flow value at a node is defined by the values propagated
to it from each node in the graph via edge functions, together with an additional inflow value
explained here. Since multiple non-zero flow values can be propagated to a node, we
require an aggregation of these values via a binary + operator on flow values: the second
element of our flow domains. The edges from which the aggregated values originate
are unordered; thus, we require + to be commutative and associative, making this
aggregation order-independent. The 0 flow value must act as a unit for +. For example,
in the path-counting flow domain + means addition on natural numbers, while for the
multisets employed for the PIP it means multiset union.
⁴ We note that flows are not generally defined in this manner, as we consider any fixpoint of the
flow equation to be a flow. Nonetheless, the analogy helps to build an initial intuition.
⁵ We discuss questions regarding the existence and uniqueness of such fixpoints in §4.

Each node in a flow graph has an inflow, modelling contributions to its flow value
which do not come from inside the graph. Inflows play two important roles: first, since
our graphs are partial, they model contributions from nodes outside of the graph. Second,
inflow can be artificially added as a means of specialising the computation of flow values
to characterise specific graph properties. For example, in the path-counting domain, we
give an inflow of 1 to the node from which we are counting paths, and 0 to all others.

Example 4. Let the edges in the graph in Figure 1 be labelled as described in Example 3.
If the inflow function in assigns the empty multiset to every node n and we let fl (n) be
the multiset labelling every node in the figure, then FlowEqn(in, e, fl ) holds.

The flow equation (FlowEqn) defines the flow of a node n to be the aggregation of
flow values coming from other nodes n′ inside the graph (as given by the respective edge
functions e(n′, n)) as well as the inflow in(n). Preserving solutions to this equation across
updates to the graph structure is a fundamental goal of our technique. The following
lemma (which relies on the fact that + is required to be cancellative) states that any
correct flow values uniquely determine appropriate inflow values:
Lemma 1. Given a flow graph (N, e, fl ), there exists a unique inflow in such that
FlowEqn(in, e, fl ).
We now turn to how solutions of the flow equation can be preserved or appropriately
updated under changes to the underlying graph.

Graph Updates and Cancellativity. Given a flow graph with known flow and inflow
values, suppose we remove an edge from n₁ to n₂ (replacing the edge function with
λ₀). For the same inflow, such an update will potentially affect the flow at n₂ and at nodes
to which n₂ (transitively) propagates flow. Starting from the simple case that n₂ has
no outgoing edges, we need to recompute a suitable flow at n₂. Knowing the old flow
value (say, m) and the contribution m′ = fl(n₁) ▷ e(n₁, n₂) previously provided along
the removed edge, we know that the correct new flow value is some m″ such that
m″ + m′ = m. This constraint has a unique solution (and thus, we can unambiguously
recompute a new flow value) exactly when the aggregation + is cancellative; we therefore
make cancellativity a requirement on the + of any flow domain.
Cancellativity intuitively enforces that the flow domain carries enough information
to enable adaptation to local updates (in particular, removal of edges⁶). Returning to the
PIP example, cancellativity requires us to carry multisets as flow values rather than only
the maximum priority value: + cannot be the maximum operation, as this would not be
cancellative. The resulting multisets (like the prios fields in the actual code) provide the
information necessary to recompute corrected priority values locally.
For example, in the PIP graph shown in Figure 1, removing the edge from p₆ to
r₄ would not affect the current priority of r₄, whereas if p₇ had current priority 1 instead
of 2, then the current priority of r₄ would have to decrease. In either case, recomputing
the flow value for r₄ is simply a matter of subtraction (removing {2} from the multiset at
r₄); cancellativity guarantees that our flow domains will always provide the information
⁶ As we will show in §2.3, an analogous problem for composition of flow graphs is also directly
solved by this choice to force aggregation to be cancellative.

needed for this recomputation. Without this property, the recomputation of a flow value
for the target node n₂ would, in general, entail recomputing the incoming flow values
from all remaining edges from scratch. Cancellativity is also crucial for Lemma 1 above,
forcing uniqueness of inflows given known flow values in a flow graph. This allows us
to define natural but powerful notions of flow graph decomposition and recomposition.
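The following Python sketch (ours) illustrates this local recomputation in the PIP multiset domain, assuming for illustration that r₄'s flow value is {2, 2} and that the removed edge previously contributed {2}; Counter subtraction computes the unique m″ guaranteed by cancellativity:

    from collections import Counter

    def remove_contribution(m, m_prime):
        # solve m″ + m′ = m; the solution is unique because + is cancellative
        m_dprime = m - m_prime              # multiset subtraction
        assert m_dprime + m_prime == m      # recomposes to the old flow value
        return m_dprime

    fl_r4 = Counter({2: 2})   # assumed flow value at r4: {2, 2}
    m_p6 = Counter({2: 1})    # contribution previously sent along (p6, r4)
    print(remove_contribution(fl_r4, m_p6))    # Counter({2: 1})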

2.3 Flow Graph Composition and Abstraction


Building towards the core of our reasoning technique, we now turn to the question
of decomposition and recomposition of flow graphs. Two flow graphs with disjoint
domains always compose to a graph, but this will be a flow graph only if their flows are
chosen consistently to admit a solution to the resulting flow equation (i.e. the flow graph
composition operator defined below is partial).

Definition 4 (Flow Graph Algebra). The flow graph algebra (FG, ⊙, H∅) for the flow
domain (M, +, 0, E) is defined by

FG := {(N, e, fl) | (N, e, fl) is a flow graph}        H∅ := (∅, e∅, fl∅)

(N₁, e₁, fl₁) ⊙ (N₂, e₂, fl₂) := (N₁ ⊎ N₂, e₁ ⊎ e₂, fl₁ ⊎ fl₂) if this union is in FG,
and ⊥ otherwise,

where e∅ and fl∅ are the edge functions and flow on the empty set of nodes N = ∅.

Intuitively, two flow graphs compose to a flow graph if their contributions to each
others’ flow (along edges from one to the other) are reflected in the corresponding inflow
of the other graph. For example, consider the subgraph from Figure 1 consisting of
the single node p7 (with 0 inflow). This will compose with the remainder of the graph
depicted only if this remainder subgraph has an inflow which, at node r4 , includes at
least the multiset {2}, reflecting the propagated value from p7 .
We use this intuition to extract an abstraction of flow graphs which we call flow
interfaces. Given a flow (sub)graph, its flow interface consists of the node-wise inflow
and outflow (the flow contributions its nodes make to all nodes outside of the graph,
defined below). It is thus an abstraction that hides the flow values and edges that are
wholly inside the flow graph. Flow graphs that have the same flow interface “look the
same” to the external graph, as the same values are propagated inwards and outwards.
Definition 5 (Flow Interface). For a given flow domain M, a flow interface is a pair
I = (in, out) where in : N → M and out : 𝔑 \ N → M for some N ⊆ 𝔑.

We write I.in, I.out for the two components of the interface I = (in, out). We will
again sometimes identify I and dom(I.in) to ease notational burden.
Given a flow graph H ∈ FG, we can compute its interface as follows. Recall that
Lemma 1 implies that any flow graph has a unique inflow. Thus, we can define an inflow
function that maps each flow graph H = (N, e, fl) to the unique inflow inf(H) : H →
M such that FlowEqn(inf(H), e, fl). Dually, we define the outflow of H as the function
outf(H) : 𝔑 \ N → M defined by outf(H)(n) := Σ_{n′∈N} fl(n′) ▷ e(n′, n). The flow
interface of H, written int(H), is the pair (inf(H), outf(H)) consisting of its inflow

and its outflow. Returning to the previous example, if H is the singleton subgraph
consisting of node p₇ from Figure 1, with flow and edges as depicted, then int(H) =
(λn. ∅, λn. (n = r₄ ? {2} : ∅)).
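For the path-counting domain, the interface of a flow graph can be computed directly; the following Python sketch (ours; the helper name interface is illustrative) recovers the unique inflow of Lemma 1 by cancellative subtraction and sums edge contributions to obtain the outflow:

    def interface(nodes, universe, edge, fl):
        # inf(H): unique by Lemma 1, recovered by cancellative subtraction
        inf = {n: fl[n] - sum(edge(np, n)(fl[np]) for np in nodes)
               for n in nodes}
        # outf(H): contributions of nodes inside H to nodes outside it
        outf = {n: sum(edge(np, n)(fl[np]) for np in nodes)
                for n in universe if n not in nodes}
        return inf, outf

    edges = {('a', 'b'), ('b', 'x')}    # x lies outside the subgraph {a, b}
    edge = lambda s, t: (lambda m: m) if (s, t) in edges else (lambda m: 0)
    fl = {'a': 1, 'b': 1}
    print(interface(['a', 'b'], ['a', 'b', 'x'], edge, fl))
    # ({'a': 1, 'b': 0}, {'x': 1})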
This abstraction, while simple, turns out to be powerful enough to build a separation
algebra over our flow graphs, allowing them to be decomposed, locally modified and
recomposed in ways yielding all the local reasoning benefits of separation logics. In
particular, for graph operations within a subgraph with a certain interface, we need to
prove: (a) that the modified subgraph is still a flow graph (by checking that the flow
equation still has a solution locally in the subgraph) and (b) that it satisfies the same
interface (in other words, the effect of the modification on the flow is contained within
the subgraph); the meta-level results for our technique then justify that we can recompose
the modified subgraph with any graph that the original could be composed with.
We define the corresponding flow interface algebra as follows:

Definition 6 (Flow Interface Algebra). For a given flow domain M, the flow interface
algebra over M is defined to be (FI, ⊕, I∅), where:

FI := {I | I is a flow interface}        I∅ := int(H∅)

I₁ ⊕ I₂ := I, where dom(I) = I₁ ⊎ I₂,   if   I₁ ∩ I₂ = ∅
    ∧ ∀i ≠ j ∈ {1, 2}, n ∈ Iᵢ. Iᵢ.in(n) = I.in(n) + Iⱼ.out(n)
    ∧ ∀n ∉ I. I.out(n) = I₁.out(n) + I₂.out(n),
and I₁ ⊕ I₂ := ⊥ otherwise.
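To make ⊕ concrete, the following Python sketch (ours; interfaces are pairs of dictionaries over a numeric, cancellative flow domain, and None stands for ⊥) solves the defining equations above using cancellative subtraction:

    def compose(I1, I2):
        # I₁ ⊕ I₂ per Definition 6; None represents the undefined case ⊥
        (in1, out1), (in2, out2) = I1, I2
        if set(in1) & set(in2):             # domains must be disjoint
            return None
        inn = {}
        for n, v in in1.items():            # I₁.in(n) = I.in(n) + I₂.out(n)
            if v < out2.get(n, 0):
                return None
            inn[n] = v - out2.get(n, 0)
        for n, v in in2.items():
            if v < out1.get(n, 0):
                return None
            inn[n] = v - out1.get(n, 0)
        out = {n: out1.get(n, 0) + out2.get(n, 0)
               for n in (set(out1) | set(out2)) - set(inn)}
        return inn, out

    # Two fragments of a list a -> b -> x recompose into a closed interface:
    print(compose(({'a': 1, 'b': 0}, {'x': 1}), ({'x': 1}, {})))
    # ({'a': 1, 'b': 0, 'x': 0}, {})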

Flow interface composition is well-defined because of the cancellativity of the underlying
flow domain (it is also, exactly like flow graph composition, partial). We next show the
key result for this abstraction: the ability of two flow graphs to compose depends only
on their interfaces; flow interfaces thus implicitly define a congruence relation on flow graphs.

Lemma 2. int(H₁) = I₁ ∧ int(H₂) = I₂ ⇒ int(H₁ ⊙ H₂) = I₁ ⊕ I₂.

Crucially, the following result shows that we can use our flow interfaces as an
abstraction directly compatible with existing separation logics.

Theorem 1. The flow interface algebra (FI, ⊕, I∅ ) is a separation algebra.

This result forms the core of our reasoning technique; it enables us to make modifi-
cations within a chosen subgraph and, by proving preservation of its interface, know that
the result composes with any context exactly as the original did. Flow interfaces capture
precisely the information about a flow graph that is relevant for composition
with other flow graphs.
TR) [23] we provide additional examples of flow domains that demonstrate the range of
data structures and graph properties that can be expressed using flows, including a notion
of universal flow that in a sense provides a completeness result for the expressivity of
the framework. We now turn to constructing proofs atop these new reasoning principles.

3 Proof Technique

This section shows how to integrate flow reasoning into a standard separation logic,
using the priority inheritance protocol (PIP) algorithm to illustrate our proof techniques.
Since flow graphs and flow interfaces form separation algebras, it is possible in
principle to define a separation logic (SL) using these notions as a custom semantic
model (indeed, this is the proof approach taken in [22]). By contrast, we integrate flow
interfaces with a standard separation logic without modifying its semantics. This has
the important technical advantage that our proof technique can be naturally integrated
with existing separation logics and verification tools supporting SL-style reasoning. We
consider a standard sequential SL in this section, but our technique can also be directly
integrated with a concurrent SL such as RGSep (as we show in §4.5) or frameworks such
as Iris [18] supporting (ghost) resources ranging over user-defined separation algebras.

3.1 Encoding Flow-based Proofs in SL

Proofs using our flow framework can employ a combination of specifications enforced
at the node level and in terms of the flow graphs and interfaces corresponding to larger
heap regions such as entire data structures (henceforth, composite graphs and composite
interfaces). At the node level, we write invariants that every node is intended to satisfy,
typically relating the node’s flow value to its local state (fields). For example, in the PIP,
we use node-local invariants to express that a node’s current priority is the maximum of
the node’s default priority and those in its current flow value. We typically express such
specifications in terms of singleton (flow) graphs, and their singleton interfaces.
Specification in terms of composite interfaces has several important purposes. One
is to define custom inflows: e.g. in the path-counting flow domain, specifying that the
inflow of a composite interface is 1 at some designated node r and 0 elsewhere enforces
in any underlying flow graph that each node n’s flow value will be the number of paths
from r to n.7 Composite interfaces can also be used to express that, in two states of
execution, a portion of the heap “looks the same” with respect to composition (it has the
same interface, and so can be composed with the same flow graphs), or to capture by
how much there is an observable difference in inflow or outflow; we employ this idea in
the PIP proof below.
We now define an assertion syntax convenient for capturing both node-level and
composite-level constraints, defined within an SL-style proof system. We assume an intu-
itionistic, garbage-collected SL [6] with standard syntax and semantics:8 see Appendix A
of the TR [23] for more details.

Node Predicates. The basic building block of our flow-based specifications is a node
predicate N(x, H), representing ownership of the fields of a single node x, as well as
⁷ Note that the analogous property cannot be captured at the node level; when considering
singleton interfaces per node in a tree rooted at r, every singleton interface has an inflow of 1.
⁸ As P ∗ φ ≡ P ∧ φ for pure formulas P in garbage-collected SLs, we use ∗ instead of ∧
throughout this paper.

capturing its corresponding singleton flow graph H:

N(x, H) := ∃fs, fl. x ↦ fs ∗ H = ({x}, (λy. edge(x, fs, y)), fl) ∗ γ(x, fs, fl(x))

N is implicitly parameterised by fs, edge and γ; these are explained next and are typically
fixed across any given flow-based proof. The N predicate expresses that we have a heap
cell at location x containing fields fs (a list of field-name/value mappings).⁹ It also
says that H is a singleton flow graph with domain {x} and some flow fl, whose edge
functions are defined by a user-defined abstraction function edge(x, fs, y); this function
allows us to define edges in terms of x's field values. Finally, the node, its fields, and
its flow in this flow graph satisfy the custom predicate γ, which is used to encode node-local
properties such as constraints on the flow values of nodes.

Graph Predicates. The analogous predicate for composite graphs is Gr. It carries owner-
ship of the nodes making up a potentially unbounded graph, using iterated separating
conjunction over a set of nodes X as mentioned in §1:

Gr(X, H) := ∃𝐇. (∗_{x∈X} N(x, 𝐇(x))) ∗ H = ⊙_{x∈X} 𝐇(x)

Gr is also implicitly parameterised by fs, edge and γ. The existentially-quantified 𝐇 is
a logical variable representing a function from nodes in X to corresponding singleton
flow graphs. Gr(X, H) describes a set of nodes X, such that each x ∈ X is an N (in
particular, it satisfies γ), whose singleton flow graphs 𝐇(x) compose back to H. As well as
carrying ownership of the underlying heap locations, Gr's definition allows us to connect
a node-level view of the region X (each 𝐇(x)) with a composite-level view defined by
H, on which we can impose appropriate graph-level properties such as constraints on
the region's inflow.

Lifting to Interfaces. Flow based proofs can often be expressed more elegantly and
abstractly using predicates in terms of node and composite-level interfaces rather than
flow graphs. To this end, we overload both our node and graph predicates with analogues
whose second parameter is a flow interface, defined as follows:

N(x, I) := ∃H. N(x, H) ∗ I = int(H)

Gr(X, I) := ∃H. Gr(X, H) ∗ I = int(H)

We will use these versions in the PIP proof below; interfaces capture all relevant proper-
ties for decomposition and composition of these flow graphs.

Flow Lemmas. We first illustrate our N and Gr predicates (which capture SL ownership
of heap regions and abstract these with flow interfaces) by identifying a number of
lemmas which are generically useful in flow-based proofs. Reasoning at the level of flow
interfaces is entirely in the pure world (mathematics independent of heap-ownership and
⁹ For simplicity, we assume that all fields of a flow graph node are to be handled by our flow-
based technique, and that their ownership (via ↦ points-to predicates) is always carried around
together; lifting these restrictions would be straightforward.

Gr(X₁ ⊎ X₂, H) ⊨ ∃H₁, H₂. Gr(X₁, H₁) ∗ Gr(X₂, H₂) ∗ H₁ ⊙ H₂ = H    (DECOMP)

Gr(X₁, H₁) ∗ Gr(X₂, H₂) ∗ H₁ ⊙ H₂ ≠ ⊥ ⊨ Gr(X₁ ⊎ X₂, H₁ ⊙ H₂)    (COMP)

N(x, H) ≡ Gr({x}, H)    (SING)

emp ⊨ Gr(∅, H∅)    (GREMP)

Gr(X₁, H₁′) ∗ Gr(X₂, H₂) ∗ H = H₁ ⊙ H₂ ∗ int(H₁′) = int(H₁)
    ⊨ Gr(X₁ ⊎ X₂, H₁′ ⊙ H₂) ∗ int(H) = int(H₁′ ⊙ H₂)    (REPL)

Fig. 2: Some useful lemmas for proving entailments between flow-based specifications.

resources) with respect to the underlying SL reasoning; these lemmas are consequences
of our predicate definitions and the foundational flow framework definitions themselves.
Examples of these lemmas are shown in Figure 2. (DECOMP) shows that we can
always decompose a valid flow graph into subgraphs which are themselves flow graphs.
Recomposition (COMP) is possible only if the subgraphs compose. These rules, as well
as (SING) and (GREMP), follow directly from the definition of Gr and standard SL prop-
erties of iterated separating conjunction. The final rule (REPL) is a direct consequence of
rules (COMP) and (DECOMP) and the congruence relation on flow graphs induced by their
interfaces (cf. Lemma 2). Conceptually, it expresses that after decomposing any flow
graph into two parts H₁ and H₂, we can replace H₁ with a new flow graph H₁′ with the
same interface; when recomposing, the overall graph will be a flow graph with the same
overall interface.
Note the connection between rules (COMP)/(DECOMP) and the algebraic laws of
standard inductive predicates such as ls describing a segment of a linked list [2]. For
instance, by combining the definition of Gr with these rules and (SING) we can prove the
following graph analogue of the rule separating a list into its head node and tail:

Gr(X ⊎ {y}, H) ≡ ∃H_y, H′. N(y, H_y) ∗ Gr(X, H′) ∗ H = H_y ⊙ H′    ((UN)FOLD)

However, crucially (and unlike when using general inductive predicates [32]), this rule
is symmetrical for any node x in X; it works analogously for any desired order of
decomposition of the graph, and for any data structure specified using flows.
When working with our overloaded N and Gr predicates, similar steps to those
described by the above lemmas are useful. Given these overloaded predicates, we simply
apply the lemmas above to the existentially quantified flow-graphs in their definitions and
then lift the consequence of the lemma back to the interface level using the congruence
between our flow graph and interface composition notions (Lemma 2).

3.2 Proof of the PIP


We now have all the tools necessary to verify the priority inheritance protocol (PIP).
Figure 3 gives the full algorithm with flow-based specifications; we also include some
intermediate assertions to illustrate the reasoning steps for the acquire method, which

1  // Let δ(m, q₁, q₂) := m \ (q₁ ≥ 0 ? {q₁} : ∅) ∪ (q₂ ≥ 0 ? {q₂} : ∅)
2
3  method update(n: Ref, from: Int, to: Int)
4    requires N(n, Iₙ) ∗ Gr(X \ {n}, I′) ∗ I = Iₙ′ ⊕ I′ ∗ ϕ(I) ∗ n ∈ X
5    requires Iₙ′ = ({n ↦ δ(Iₙ.in(n), from, to)}, Iₙ.out) ∗ from ≠ to
6    ensures Gr(X, I)
7  {
8    n.prios := n.prios \ {from}
9    if (to >= 0) {
10     n.prios := n.prios ∪ {to}
11   }
12   from := n.curr_prio
13   n.curr_prio := max(n.prios ∪ {n.def_prio})
14   to := n.curr_prio
15
16   if (from != to && n.next != null) {
17     update(n.next, from, to)
18   }
19 }
20
21 method acquire(p: Ref, r: Ref)
22   requires Gr(X, I) ∗ ϕ(I) ∗ p ∈ X ∗ r ∈ X ∗ p ≠ r
23   ensures Gr(X, I)
24 {
25   { ∃Iᵣ, Iₚ, I₁. N(r, Iᵣ) ∗ N(p, Iₚ) ∗ Gr(X \ {r, p}, I₁) ∗ I = Iᵣ ⊕ Iₚ ⊕ I₁ ∗ ϕ(I) }
26   if (r.next == null) {
27     r.next := p;
28     // Let qᵣ = r.curr_prio
29     { ∃Iᵣ′, Iᵣ, Iₚ, I₁. N(r, Iᵣ′) ∗ N(p, Iₚ) ∗ Gr(X \ {r, p}, I₁) ∗ I = Iᵣ ⊕ Iₚ ⊕ I₁
         ∗ Iᵣ′ = (Iᵣ.in, {p ↦ {qᵣ}}) ∗ Iᵣ.out = λ₀ ∗ · · · }
30     { ⊨ ∃Iₚ′, Iₚ, I₂. N(p, Iₚ) ∗ Gr(X \ {p}, I₂) ∗ I = Iₚ′ ⊕ I₂
         ∗ Iₚ′ = ({p ↦ δ(Iₚ.in(p), −1, qᵣ)}, Iₚ.out) ∗ · · · }
31     update(p, -1, r.curr_prio)
32     { Gr(X, I) }
33   } else {
34     p.next := r; update(r, -1, p.curr_prio)
35   }
36 }
37
38 method release(p: Ref, r: Ref)
39   requires Gr(X, I) ∗ ϕ(I) ∗ p ∈ X ∗ r ∈ X ∗ p ≠ r
40   ensures Gr(X, I)
41 { r.next := null; update(p, r.curr_prio, -1) }

Fig. 3: Full PIP code and specifications, with proof sketch for acquire. The comments
and coloured annotations (lines 29 to 32) are used to highlight steps in the proof, and are
explained in detail in the text.

we explain in more detail below.¹⁰ We instantiate our framework in order to capture the
PIP invariants as follows:

fs := {next : y, curr_prio : q, def_prio : q₀, prios : Q}
edge(x, fs, z) := ((λm. {max(m ∪ {q₀})}) if z = y ∧ y ≠ null, and λ₀ otherwise)
γ(x, fs, m) := q₀ ≥ 0 ∗ (∀q′ ∈ Q. q′ ≥ 0) ∗ m = Q ∗ q = max(Q ∪ {q₀})
ϕ(I) := I = (λ₀, λ₀)

Each node has the four fields listed in fs. fs also defines variables such as y to denote
field values that are used in the definitions of edge and γ; these variables are bound to the
heap by N. edge abstracts the heap into a flow graph by letting each node have an edge
to its next successor, labelled by a function that passes on the larger of the maximum
incoming priority and the node's default priority. With this definition, one can
see that the flow of every node will be the multiset containing exactly the priorities of
its predecessors. The node-local invariant γ says that all priorities are non-negative, the
flow m of each node is stored in the prios field, and its current priority is the maximum
of its default and incoming priorities. Finally, the constraint ϕ on the global interface
expresses that the graph is closed – it has no inflow or outflow.
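For illustration, the following Python sketch (ours; it reuses the Counter representation of multisets from §2, and the field dictionary is a hypothetical stand-in for a heap node) encodes this instantiation of edge and γ and checks γ for a sample node:

    from collections import Counter

    def edge_fn(fs, z):
        # edge(x, fs, z): non-zero only towards the next successor y
        y, q0 = fs['next'], fs['def_prio']
        if z == y and y is not None:
            return lambda m: Counter({max(set(m.elements()) | {q0}): 1})
        return lambda m: Counter()

    def gamma(fs, m):
        # all priorities non-negative, m stored in prios, and
        # curr_prio = max(prios ∪ {def_prio})
        Q, q, q0 = fs['prios'], fs['curr_prio'], fs['def_prio']
        return (q0 >= 0 and all(p >= 0 for p in Q)
                and m == Q and q == max(set(Q.elements()) | {q0}))

    node = {'next': None, 'curr_prio': 2, 'def_prio': 0,
            'prios': Counter({1: 1, 2: 2})}
    assert gamma(node, Counter({1: 1, 2: 2}))
    succ = dict(node, next='r4')
    print(edge_fn(succ, 'r4')(Counter({1: 1, 2: 2})))    # Counter({2: 1})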

Flows Specifications for the PIP. Our specifications of acquire and release guarantee
that if we start with a valid flow graph (closed, according to ϕ), we return a
valid flow graph with the same interface (i.e. the graph remains closed). For
clarity of exposition, we focus here on proving that the property of being a flow graph that
satisfies the PIP invariant is preserved (as is the composite flow graph's interface).
Extending this specification to one which proves, e.g., that acquire adds the expected
edge is straightforward (see Appendix C of the TR [23]).¹¹
The specification for update is somewhat subtle, and exploits the full flexibility
of flow interfaces as a specification medium. The preconditions of update describe an
update to the graph which is not yet completed. There are three complementary aspects
to this specification. Firstly (as for acquire and release), node-local invariants (γ)
hold for all nodes in the graph (enforced via N and Gr predicates). Secondly, we employ
flow interfaces to express a decomposition of the original top-level interface I into
compatible (primed) sub-interfaces. The key to understanding this specification is that
Iₙ′ is in some sense a fake interface; it does not abstract the current state of the heap node
n. Instead, Iₙ′ expresses the way in which the node n's current inflow has not yet been
accounted for in the heap: if n could adjust its inflow according to the propagated
priority change without changing its outflow, then it would compose back with the rest of
the graph and restore the graph's overall interface. The shorthand δ defines the required
change to n's inflow.
In general (except when n's next field is null, or n's flow value is unchanged), it
is not even possible for n's fields to be updated to satisfy Iₙ′; by updating n's inflow,
¹⁰ In specifications, we implicitly quantify at the top level over free variables such as I. λ₀ denotes
an identically zero function on an unconstrained domain.
¹¹ We also omit acquire's precondition that p.next == null for brevity.

we will necessarily update its outflow. However, we can then construct a corresponding
“fake” interface for the next node in the graph, reflecting the update yet to be accounted
for, and establishing the precondition for the recursive call to update.
The third specification aspect is the connection between heap-level nodes and in-
terfaces. The N(n, Iₙ) predicate connects n with a different interface; Iₙ is the actual
current abstraction of n's state. Conceptually, the key property which is broken at this
point is this connection between the interface-level specification and the heap at node n,
reflected by the decomposition in the specification between X \ {n} and {n}.
We note that the same specification ideas and proof style can be easily adapted to
other data structure implementations with an update-notify style, including well-known
designs such as Subject-Observer patterns, or the Composite pattern [27].

Proof Outline. To illustrate the application of flows reasoning to our PIP specification
ideas more clearly, we examine in detail the first if-branch in the proof of acquire. Our
intermediate proof steps are shown as purple annotations surrounded by braces. The first
step, as shown in the first line inside the method body, is to apply ((UN)FOLD) twice (on
the flow graphs represented by these predicates) and peel off N predicates for each of r
and p. The update to r's next field (line 27) causes the correct singleton interface of r to
change to Iᵣ′: its outflow (previously none, since the next field was null) now propagates
flow to p. We summarise this state in the assertion on line 29 (we omit e.g. repetition
of properties from the function's precondition, focusing on the flow-related steps of
the argument). We now rewrite this state; using the definition of interface composition
(Definition 6) we deduce that although Iᵣ′ and Iₚ do not compose (since the former has
outflow that the latter does not account for as inflow), the alternative “fake” interface
Iₚ′ for p (which artificially accounts for the missing inflow) would do so (cf. line 30).
Essentially, we show Iᵣ ⊕ Iₚ = Iᵣ′ ⊕ Iₚ′, i.e. that the interface of {r, p} would be unchanged
if p could somehow have interface Iₚ′. Now by setting I₂ = Iᵣ′ ⊕ I₁ and using algebraic
properties of interfaces, we assemble the precondition expected by update. After the
call, update's postcondition gives us the desired postcondition.
We focused here on the details of acquire’s proof, but very similar manipulations
are required for reasoning about the recursive call in update’s implementation.12 The
main difference there is that if the if-condition wrapping the recursive call is false then
either the last-modified node has no successor (and so there is no outstanding inflow
change needed), or we have from = to which implies that the “fake” interface is actually
the same as the currently correct one.
Despite the property proved for the PIP example being a rather delicate recursive in-
variant over the (potentially cyclic) graph, the power of our framework enables extremely
succinct specifications for the example, and proofs which require the application of rela-
tively few generic lemmas. The integration with standard separation logic reasoning, and
the complementary separation algebras provided by flow interfaces allow decomposition
and recomposition to be simple proof steps. For this proof, we integrated with standard
sequential separation logic, but in the next section we will show that compatibility with
concurrent SL techniques is similarly straightforward.

¹² We provide further proof outlines in Appendix C of the TR [23].

[Figure: a Harris list whose main (next) list runs mh → −∞ → 3 → 5 → 9 → 10 → 12 → ∞,
and whose free (fnext) list from fh links nodes with keys 2, 6, 1, 7, with ft pointing into it.]

Fig. 4: A potential state of the Harris list with explicit memory management. fnext
pointers are shown with dashed edges, marked nodes are shaded gray, and null pointers
are omitted for clarity.

4 Advanced Flow Reasoning and the Harris List

This section introduces some advanced foundational flow framework theory and demon-
strates its use in the proof of the Harris list. We note that [22] presented a proof of this
data structure in the original flow framework. The proof given here shows that the new
framework eliminates the need for the customized concurrent separation logic defined
in [22]. We start with a recap of Harris’ algorithm adapted from [22].

4.1 The Harris List Algorithm

The power of flow-based reasoning is exhibited in the proof of overlaid data structures
such as the Harris list, a concurrent non-blocking linked list algorithm [12]. This algo-
rithm implements a set data structure as a sorted list, and uses atomic compare-and-swap
(CAS) operations to allow a high degree of parallelism. As with the sequential linked
list, Harris' algorithm inserts a new key k into the list by finding nodes k₁, k₂ such that
k₁ < k < k₂, setting k to point to k₂, and using a CAS to change k₁ to point to k only
if it was still pointing to k₂. However, a similar approach fails for the delete operation.
If we had consecutive nodes k₁, k₂, k₃ and we wanted to delete k₂ from the list (say by
setting k₁ to point to k₃), there is no way to ensure with one CAS that k₂ and k₃ are also
still adjacent (another thread could have inserted/deleted in between them).
Harris' solution is a two-step deletion: first atomically mark k₂ as deleted (by setting
a mark bit on its successor field) and then later remove it from the list using a single
CAS. After a node is marked, no thread can insert or delete to its right; hence a thread
that wanted to insert k′ to the right of k₂ would first remove k₂ from the list and then
insert k′ as the successor of k₁.
In a non-garbage-collected environment, unlinked nodes cannot be immediately freed
as suspended threads might continue to hold a reference to them. A common solution
is to maintain a second “free list” to which marked nodes are added before they are
unlinked from the main list (this is the so-called drain technique). These nodes are then
labelled with a timestamp, which is used by a maintenance thread to free them when it is
safe to do so. This leads to the kind of data structure shown in Figure 4, where each node
has two pointer fields: a next field for the main list and an fnext field for the free list
(the list from fh to ft via dashed edges). Threads that have been suspended while holding

[Figure: three example graphs (a), (b), (c) over the path-counting flow domain, with
nodes n₁–n₅; flows are shown inside nodes and inflows as curved arrows.]

Fig. 5: Examples of graphs that motivate effective acyclicity. All graphs use the path-
counting flow domain, the flow is displayed inside each node, and the inflow is displayed
as curved arrows to the top-left of nodes. (a) shows a graph and inflow that has no
solution to (FlowEqn); (b) has many solutions. (c) shows a modification that preserves
the interface of the modified nodes, yet goes from a graph that has a unique flow to one
that has many solutions to (FlowEqn).

a reference to a node that was added to the free list can simply continue traversing the
next pointers to find their way back to the unmarked nodes of the main list.
Even for seemingly simple properties such as that the Harris list is memory safe and
not leaking memory, the proof will rely on the following non-trivial invariants:

(a) The data structure consists of two (potentially overlapping) lists: a list on next
edges beginning at mh and one on fnext edges beginning at fh.
(b) The two lists are null terminated and next edges from nodes in the free list point to
nodes in the free list or main list.
(c) All nodes in the free list are marked.
(d) ft is an element in the free list (due to concurrency, it’s not always the tail).

Challenges. To prove that Harris’ algorithm maintains the invariants listed above we
must tackle a number of challenges. First, we must construct flow domains that allow us
to describe overlaid data structures, such as the overlapping main and free lists (§4.2).
Second, the flow-based proofs we have seen so far work by showing that the interface of
some modified region is unchanged. However, if we consider a program that allocates
and inserts a new node into a data structure (like the insert method of Harris), then the
interface cannot be the same since the domain has changed (it has increased by the
newly allocated node). We must thus have a means to reason about preservation of flows
by modifications that allocate new nodes (§4.3). The third issue is that in some flow
domains, there exist graphs G and inflows in for which no solutions to the flow equation
(FlowEqn) exist. For instance, consider the path-counting flow domain and the graph
in Figure 5(a). Since we would need to use the path-counting flow in the proof of the
Harris list to encode its structural invariants, this presents a challenge (§4.4).
We will next see how to overcome these three challenges in turn, and then apply
these solutions to the proof of the Harris list in §4.5.

4.2 Product Flows for Reasoning about Overlays


An important fact about flows is that any flow of a graph over a product of two flow
domains is the product of the flows on each flow domain component.
Lemma 3. Given two flow domains (M₁, +₁, 0₁, E₁) and (M₂, +₂, 0₂, E₂), the product
domain (M₁ × M₂, +, (0₁, 0₂), E) is a flow domain, where + and E are the pointwise
liftings of (+₁, +₂) and (E₁, E₂), respectively, to the domain M₁ × M₂.
This lemma greatly simplifies reasoning about overlaid graph structures; we will use
the product of two path-counting flows to describe a structure consisting of two overlaid
lists that make up the Harris list.
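A minimal Python sketch of this product construction (ours), anticipating the edge functions λ(1,0), λ(0,1) and λid used for the Harris list in §4.5, where next edges act on the first path-count component and fnext edges on the second:

    def prod_edge(e1, e2):
        # pointwise lifting of a pair of edge functions to M₁ × M₂
        return lambda m: (e1(m[0]), e2(m[1]))

    ident, zero = (lambda x: x), (lambda x: 0)
    lam_10 = prod_edge(ident, zero)    # next edge only: propagates (m1, 0)
    lam_01 = prod_edge(zero, ident)    # fnext edge only: propagates (0, m2)
    lam_id = prod_edge(ident, ident)   # a single edge on both lists

    print(lam_10((1, 1)), lam_01((1, 1)), lam_id((1, 1)))
    # (1, 0) (0, 1) (1, 1)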

4.3 Contextual Extensions and the Replacement Theorem


In general, when modifying a flow graph H to another flow graph H′, requiring that H′
satisfies precisely the same interface int(H) can be too strong a condition, as it does not
permit allocating new nodes. Instead, we want to allow int(H′) to differ from int(H)
in that the new interface may have a larger domain, as long as the edges from the new
nodes do not change the outflow of the modified region.
Definition 7. An interface I = (in, out) is contextually extended by I′ = (in′, out′),
written I ≼ I′, if and only if the following conditions all hold:
(1) dom(in) ⊆ dom(in′),
(2) ∀n ∈ dom(in). in(n) = in′(n), and
(3) ∀n′ ∉ dom(in′). out(n′) = out′(n′).
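The conditions of Definition 7 are directly checkable; the following Python sketch (ours, with interfaces as pairs of dictionaries over a numeric flow domain) tests I ≼ I′ and illustrates that adding a fresh node with zero inflow and no new outflow is a contextual extension:

    def contextually_extends(I, Ip):
        (inn, out), (innp, outp) = I, Ip
        return (set(inn) <= set(innp)                       # (1)
                and all(innp[n] == inn[n] for n in inn)     # (2)
                and all(out.get(n, 0) == outp.get(n, 0)     # (3)
                        for n in (set(out) | set(outp)) - set(innp)))

    I_old = ({'a': 1}, {'x': 1})
    I_new = ({'a': 1, 'fresh': 0}, {'x': 1})
    assert contextually_extends(I_old, I_new)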
The following theorem states that contextual extension preserves composability and
is itself preserved under interface composition.
Theorem 2 (Replacement Theorem). If I = I₁ ⊕ I₂ and I₁ ≼ I₁′ are all valid
interfaces such that I₁′ ∩ I₂ = ∅ and ∀n ∈ I₁′ \ I₁. I₂.out(n) = 0, then there exists a
valid I′ = I₁′ ⊕ I₂ such that I ≼ I′.
In terms of our flow predicates, this theorem gives rise to the following adaptation of
the (REPL) rule:

Gr(X₁′, H₁′) ∗ Gr(X₂, H₂) ∗ H = H₁ ⊙ H₂ ∗ int(H₁) ≼ int(H₁′)
    ⊨ ∃H′. Gr(X₁′ ⊎ X₂, H′) ∗ H′ = H₁′ ⊙ H₂ ∗ int(H) ≼ int(H′)    (REPL+)

The rule (REPL+) is derived from the Replacement Theorem by instantiating it with
I = int(H), I₁ = int(H₁), I₂ = int(H₂) and I₁′ = int(H₁′). We know I₁ ≼ I₁′;
H = H₁ ⊙ H₂ tells us (by Lemma 2) that I = I₁ ⊕ I₂, and Gr(X₁′, H₁′) ∗ Gr(X₂, H₂)
gives us I₁′ ∩ I₂ = ∅. The final condition of the Replacement Theorem is to prove that
there is no outflow from X₂ to any newly allocated node in X₁′. While we can use
additional ghost state to prove such constraints in our proofs, if we assume that the
memory allocator only allocates fresh addresses, and restrict the abstraction function
edge to only propagate flow along an edge (n, n′) if n has a (non-ghost) field with a
reference to n′, then this condition is always true. For simplicity, and to keep the focus of
this paper on the flow reasoning, we make this assumption in the Harris list proof.

4.4 Existence and Uniqueness of Flows


We typically express global properties of a graph G = (N, e) by fixing a global inflow
in : N → M and then constraining the flow of each node in N using node-local
conditions. However, as we discussed at the beginning of this section, there is no general
guarantee that a flow exists or is unique for a given in and G. The remainder of this
section presents two complementary conditions under which we can prove that our flow
fixpoint equation always has a unique solution. To this end, we say that a flow domain
(M, +, 0, E) has unique flows if for every graph (N, e) over this flow domain and inflow
in : N → M , there exists a unique fl that satisfies the flow equation FlowEqn(in, e, fl ).
But first, we briefly recall some more monoid theory.
We say M is positive if m₁ + m₂ = 0 implies that m₁ = m₂ = 0. For a positive
monoid M, we can define a partial order ≤ on its elements as m₁ ≤ m₂ if and only if
∃m₃. m₁ + m₃ = m₂. This definition implies that every m ∈ M satisfies 0 ≤ m.
For e, e′ : M → M, we write e + e′ for the function that maps m ∈ M to e(m) +
e′(m). We lift this construction to a set of functions E and write it as Σ_{e∈E} e.
Definition 8. A function e : M → M is called an endomorphism on M if for every
m₁, m₂ ∈ M, e(m₁ + m₂) = e(m₁) + e(m₂). We denote the set of all endomorphisms
on M by End(M).
Note that for cancellative M, e(0) = 0 for every endomorphism e ∈ End(M).
Note further that e + e′ ∈ End(M) for any e, e′ ∈ End(M). Similarly, for finite sets
E ⊆ End(M), Σ_{e∈E} e ∈ End(M). We say that a set of endomorphisms E ⊆ End(M)
is closed if for every e, e′ ∈ E, e ∘ e′ ∈ E and e + e′ ∈ E.

Nilpotent Cycles. Let (M, +, 0, E) be a flow domain where every edge function e ∈ E
is an endomorphism on M . In this case, we can show that the flow of a node n is the
sum of the flow as computed along each path in the graph that ends at n. Suppose we
additionally know that the edge functions are defined such that their composition along
any cycle in the graph eventually becomes the identically zero function. We then need
only consider finitely many paths to compute the flow of a node, which means the flow
equation has a unique solution.
Definition 9. A closed set of endomorphisms E ⊆ End(M) is called nilpotent if there
exists p > 1 such that eᵖ ≡ λ₀ for every e ∈ E.
Example 5. The flow domain (ℕ², +, (0, 0), {(λ(x, y). (0, c · x)) | c ∈ ℕ}) contains
nilpotent edge functions that shift the first component of the flow to the second (with
a scaling factor). This domain can be used to express the property that every node in a
graph is reachable from the root r via a single edge (by requiring the flow of every node to
be (0, 1) under the inflow (λn. (n = r ? (1, 0) : (0, 0)))).
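A two-line Python check (ours) of the nilpotency claim: composing any two edge functions of this domain yields the identically zero function, so p = 2 witnesses Definition 9:

    def edge(c):
        # (λ(x, y). (0, c·x)): shift the first component, scaled by c
        return lambda m: (0, c * m[0])

    compose = lambda e1, e2: (lambda m: e2(e1(m)))
    print(compose(edge(3), edge(5))((7, 7)))    # (0, 0): identically zero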
Before we prove that nilpotent endomorphisms lead to unique flows, we present a
useful notion when dealing with endomorphic flow domains.
Definition 10. The capacity of a graph G = (N, e) is cap(G) : N × 𝔑 → (M →
M), defined inductively as cap(G) := cap^|G|(G), where cap⁰(G)(n, n′) := δ_{n=n′} and
capⁱ⁺¹(G)(n, n′) := δ_{n=n′} + Σ_{n″∈G} capⁱ(G)(n, n″) ∘ e(n″, n′).

For a flow graph H = (N, e, fl), we write cap(H)(n, n′) = cap((N, e))(n, n′)
for the capacity of the underlying graph. Intuitively, cap(G)(n, n′) is the function that
summarizes how flow is routed from any source node n in G to any other node n′,
including those outside of G.
We can now show that if all edges of a flow graph are labelled with functions from a
nilpotent set of endomorphisms, then the flow equation has a unique solution:

Lemma 4. If (M, +, 0, E) is a flow domain such that M is a positive monoid and E is
a nilpotent set of endomorphisms, then this flow domain has unique flows.

Effectively Acyclic Flow Graphs. There are some flow domains that compute flows
useful in practice, but which do not guarantee either existence or uniqueness of fixpoints
a priori for all graphs. For example, the path-counting flow from Example 1 is one where
for certain graphs, there exist no solutions to the flow equation (see Figure 5(a)), and for
others, there can exist more than one (in Figure 5(b), the nodes marked with x can have
any path count, as long as they both have the same value).
In such cases, we explore how to restrict the class of graphs we use in our flow-based
proofs such that each graph has a unique fixpoint; the difficulty is that this restriction must
be respected for composition of our graphs. Here, we study the class of flow domains
(M, +, 0, E) such that M is a positive monoid and E is a set of reduced endomorphisms
(defined below). In such domains we can decompose the flow computations into the
various paths in the graph, and achieve unique fixpoints by restricting the kinds of cycles
graphs can have.

Definition 11. A flow graph H = (N, e, fl) is effectively acyclic (EA) if for every 1 ≤ k
and n₁, . . . , nₖ ∈ N,

fl(n₁) ▷ e(n₁, n₂) ▷ · · · ▷ e(nₖ₋₁, nₖ) ▷ e(nₖ, n₁) = 0.

The simplest example of an effectively acyclic graph is one where the edges with
non-zero edge functions form an acyclic graph. However, our semantic condition is
weaker: for example, when reasoning about two overlaid acyclic lists whose union
happens to form a cycle, a product of two path-counting domains will satisfy effective
acyclicity because the composition of different types of edges results in the zero function.
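On small graphs, Definition 11 can be checked by brute force; the following Python sketch (ours; for illustration it enumerates cycles over distinct nodes only) confirms, for instance, that a two-node graph with identity edges in both directions and non-zero flow is not EA:

    from itertools import permutations

    def effectively_acyclic(nodes, edge, fl, zero):
        # every cycle n1 -> ... -> nk -> n1 must route the zero flow value
        for k in range(1, len(nodes) + 1):
            for cyc in permutations(nodes, k):
                m = fl[cyc[0]]
                for i in range(len(cyc)):
                    m = edge(cyc[i], cyc[(i + 1) % len(cyc)])(m)
                if m != zero:
                    return False
        return True

    edge = lambda s, t: (lambda m: m) if s != t else (lambda m: 0)
    assert not effectively_acyclic(['n1', 'n2'], edge, {'n1': 1, 'n2': 1}, 0)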

Lemma 5. Let (M, +, 0, E) be a flow domain such that M is a positive monoid and
E is a closed set of endomorphisms. Given a graph (N, e) over this flow domain and
inflow in : N → M , if there exists a flow graph H = (N, e, fl ) that is effectively acyclic,
then fl is unique.

While the restriction to effectively acyclic flow graphs guarantees us that the flow is
the unique fixpoint of the flow equation, it is not easy to show that modifications to the
graph preserve EA while reasoning locally. Even modifying a subgraph to another with
the same flow interface (which we know guarantees that it will compose with any context)
can inadvertently create a cycle in the larger composite graph. For instance, consider
Figure 5(c), which shows a modification to nodes {n₃, n₄} (the boxed blue region). The
interface of this region is ({n₃ ↦ 1, n₄ ↦ 1}, {n₅ ↦ 1, n₂ ↦ 1}), and so swapping

the edges of n₃ and n₄ preserves this interface. However, the resulting graph, despite
composing with the context to form a valid flow graph, is not EA (in this case, it has
multiple solutions to the flow equation). This shows that flow interfaces are not powerful
enough to preserve effective acyclicity. For a special class of endomorphisms, we show
that a local property of the modified subgraph can be checked which implies that the
modified composite graph continues to be EA.

Definition 12. A closed set of endomorphisms E ⊆ End(M) is called reduced if e ∘ e ≡
λ₀ implies e ≡ λ₀ for every e ∈ E.

Note that if E is reduced, then no e ∈ E can be nilpotent. In that sense, this class of
instantiations is complementary to the nilpotent class.

Example 6. Examples of flow domains that fall into this class include positive semirings
of reduced rings (with the additive monoid of the semiring being the aggregation monoid
of the flow domain and E being any set of functions that multiply their argument with
a constant flow value). Note that any direct product of integral rings is a reduced ring.
Hence, products of the path counting flow domain are a special case.

For reduced endomorphisms, it suffices to check that a modification preserves the
flow routed between every pair of source and sink nodes in order to ensure that it does
not create any new cycles in any composite graph.

Definition 13. A flow graph H′ is a subflow-preserving extension of H, for which we
write H ≼ₛ H′, if the following conditions all hold:

(1) int(H) ≼ int(H′)
(2) ∀n ∈ H, n′ ∉ H′, m. m ≤ inf(H)(n) ⇒ m ▷ cap(H)(n, n′) = m ▷ cap(H′)(n, n′)
(3) ∀n ∈ H′ \ H, n′ ∉ H′, m. m ≤ inf(H′)(n) ⇒ m ▷ cap(H′)(n, n′) = 0

This pairwise check, apart from requiring the interface of the modified region to be
unchanged, also permits allocating new nodes as long as no flow is routed via the new
nodes (condition (3)). We now show that it is sufficient to check that a modification is a
subflow-preserving extension to guarantee composition back to an effectively-acyclic
composite graph:

Theorem 3. Let (M, +, 0, E) be a flow domain such that M is a positive monoid and E
is a reduced set of endomorphisms. If H = H₁ ⊙ H₂ and H₁ ≼ₛ H₁′ are all effectively
acyclic flow graphs such that H₁′ ∩ H₂ = ∅ and ∀n ∈ H₁′ \ H₁. outf(H₂)(n) = 0, then
there exists an effectively acyclic flow graph H′ = H₁′ ⊙ H₂ such that H ≼ₛ H′.

We define effectively acyclic versions of our flow graph predicates, Nₐ(x, H) and
Grₐ(X, H), that additionally constrain H to be effectively acyclic. The above theorem
yields the following variant of the (REPL) rule for EA graphs:

Grₐ(X₁′, H₁′) ∗ Grₐ(X₂, H₂) ∗ H = H₁ ⊙ H₂ ∗ H₁ ≼ₛ H₁′
    ⊨ ∃H′. Grₐ(X₁′ ⊎ X₂, H′) ∗ H′ = H₁′ ⊙ H₂ ∗ H ≼ₛ H′    (REPL EA)

4.5 Proof of the Harris List

We use the techniques seen in this section in the proof of the Harris list. As the data
structure consists of two potentially overlapping lists, we use Lemma 3 to construct a
product flow domain of two path-counting flows: one tracks the path count from the
head of the main list, and one from the head of the free list. We also work under the
effectively acyclic restriction (i.e. we use the Na and Gra predicates), both in order to
obtain the desired interpretation of the flow as well as to ensure existence of flows in this
flow domain.
We instantiate the framework using the following definitions of parameters:

fs := {key : k, next : y, fnext : z}
edge(x, fs, v) := (v = null ? λ₀ :
    (v = y ∧ y ≠ z ? λ(1,0) : (v = z ∧ y ≠ z ? λ(0,1) : (v = y ∧ y = z ? λid : λ₀))))
γ(x, fs, I) := (I.in(x) ∈ {(1, 0), (0, 1), (1, 1)}) ∗ (I.in(x) ≠ (1, 0) ⇒ M(y))
    ∗ (x = ft ⇒ I.in(x) = (_, 1)) ∗ (¬M(y) ⇒ z = null)
ϕ(I) := I = (λ₀[mh ↦ (1, 0)][fh ↦ (0, 1)], λ₀)

Here, edge encodes the edge functions needed to compute the product of two path-
counting flows: the first component tracks path-counts from mh on next edges, and the
second tracks path-counts from fh on fnext edges.¹³ The node-local invariant γ says:
the flow is one of {(1, 0), (0, 1), (1, 1)} (meaning that the node is on at least one of the two lists,
invariant (a)); if the flow is not (1, 0) (i.e. the node is not only on the main list, so it is
on the free list) then the node is marked (indicated by M(y), invariant (c)); and if the
node is ft then it must be on the free list (invariant (d)). The constraint on the global
interface, ϕ, says that the inflow picks out mh and fh as the roots of the two lists, and that there
is no outgoing flow (thus, all non-null edges must stay within the graph, invariant (b)).
Since the Harris list is a concurrent algorithm, we perform the proof in rely-guarantee
separation logic (RGSep) [41]. As in §3, we do not need to modify the semantics of
RGSep in any way; our flow-based predicates can be defined, and reasoning using our
lemmas performed, in the logic out-of-the-box. For space reasons, we defer the full proof
to Appendix D of the TR [23].

5 Related Work

As mentioned in §1, the most closely related work is the flow framework developed by
some of the authors in [22]. We here present a simplified and generalized meta theory of
flows that makes the approach much more broadly applicable. There were a number of
limitations of the prior framework that prevented its application to more general classes
of examples.
First, [22] required flow domains to form a semiring; the analogue of edge functions
are restricted to multiplication with a constant which must come from the same flow
¹³ We use the shorthands λ(1,0) := (λ(m₁, m₂). (m₁, 0)) and λ(0,1) := (λ(m₁, m₂). (0, m₂)),
and denote an anonymous existentially-quantified variable by _.

value set. This restriction made it complex to encode many graph properties of interest.
For example, one could not easily encode the PIP flow, or a simple flow that counts the
number of incoming edges to each node. Our foundational flow framework decouples
the algebraic structure defining how flow is aggregated from the algebraic structure of
the edge functions. In this way, we obtain a more general framework that applies to many
more examples, and with simpler flow domains.
Second, in [22], a flow graph did not uniquely determine its inflow (cf. Lemma 1).
Correspondingly, [22]’s notion of interface included an equivalence class of inflows (all
those that induce the same flow values). Since, in [22], the interface also determines
which modifications are permitted by the framework, [22] could only handle modifica-
tions that preserve the inflow equivalence class. For example, this prevents one from
reasoning locally about the removal of a single edge from a graph in certain cases (in
particular, like release does in the PIP). Our foundational flow framework solves
this problem by requiring that the aggregation operation on flow values is cancellative,
guaranteeing unique inflows.
Cancellativity is fundamentally incompatible with [22], which requires the flow
domain to form an ω-CPO in order to guarantee the existence of unique flows. For
example, in a graph with two nodes n and n′ with identity edges between them and
all other edges zero (in [22], edges labelled with 1 and 0), if we have in(n) = 0
and in(n′) = m for some non-zero m, a solution to the flow equation must satisfy
fl(n) = m + fl(n). [22] forces such solutions to exist, ruling out cancellativity.
this problem, we present a new theory which can optionally guarantee unique flows
when desired and show that requiring cancellativity does not limit expressivity.
Next, the proofs of programs shown in [22] depend on a bespoke program logic. This
logic requires new reasoning primitives that are not supported by the logics implemented
in existing SL-based verification tools. Our general proof technique eliminates the need
for a dedicated program logic and can be implemented on top of standard separation log-
ics and existing SL-based tools. Finally, the underlying separation algebra of the original
framework makes it hard to use equational reasoning, which is a critical prerequisite for
enabling proof automation.
An abundance of SL variants provide complementary mechanisms for modular
reasoning about programs (e.g. [18, 36, 38]). Most are parameterized by the underlying
separation algebra; our flow-based reasoning technique easily integrates with these
existing logics.
The most common approach to reason about irregular graph structures in SL is to
use iterated separating conjunction [30, 44] and describe the graph as a set of nodes each
of which satisfies some local invariant. This approach has the advantage of being able to
naturally describe general graphs. However, it is hard to express non-local properties that
involve some form of fixpoint computation over the graph structure. One approach is to
abstract the program state as a mathematical graph using iterated separating conjunction
and then express non-local invariants in terms of the abstract graph rather than the
underlying program state [14, 35, 38]. However, a proof that a modification to the state
maintains a global invariant of the abstract graph must then often revert back to non-local
and manual reasoning, involving complex inductive arguments about paths, transitive
closure, and so on. Our technique also exploits iterated separating conjunction for the

underlying heap ownership, with the key benefit that flow interfaces exactly capture the
necessary conditions on a modified subgraph in order to compose with any context and
preserve desired non-local invariants.
In recent work, Wang et al. present a Coq-mechanised proof of graph algorithms in
C, based on a substantial library of graph-related lemmas, both for mathematical and
heap-based graphs [42]. They prove rich functional properties, integrated with the VST
tool. In contrast to our work, a substantial suite of lemmas and background properties are
necessary, since these specialise to particular properties such as reachability. We believe
that our foundational flow framework could be used to simplify framing lemmas in a
way which remains parametric in the property in question.
Proofs of a number of graph algorithms have been mechanized in various verification
tools and proof assistants, including Tarjan’s SCC algorithm [8], union-find [7], Kruskal’s
minimum spanning tree algorithm [13], and network flow algorithms [25]. These proofs
generally involve non-local reasoning arguments about mathematical graphs.
An alternative approach to using SL-style reasoning is to commit to global reasoning
but remain within decidable logics to enable automation [16, 21, 24, 28, 43]. However,
such logics are restricted to certain classes of graphs and certain types of properties.
For instance, reasoning about reachability in unbounded graphs with two successors
per node is undecidable [15]. Recent work by Ter-Gabrielyan et al. [40] shows how
to deal with modular framing of pairwise reachability specifications in an imperative
setting. Their framing notion has parallels to our notion of interface composition, but
allows subgraphs to change the paths visible to their context. The work is specific to
a reachability relation, and cannot express the rich variety of custom graph properties
available in our technique.
Dynamic frames [19] (e.g. implemented in Dafny [26]), can be used to explicitly
reason about framing of heap information in a first-order logic. However, by itself, this
theory does not enable modular reasoning about global graph properties. We believe that
the flow framework could in principle be adapted to the dynamic frames setting.

6 Conclusions and Future Work

We have presented the foundational flow framework, enabling local modular reasoning
about recursively-defined properties over general graphs. The core reasoning technique
has been designed to make minimal mathematical requirements, providing great flexi-
bility in terms of potential instantiations and applications. We identified key classes of
these instantiations for which we can provide existence and uniqueness guarantees for
the fixpoint properties our technique addresses and demonstrate our proof technique on
several challenging examples. As future work, we plan to automate flow-based proofs
in our new framework using existing tools that support SL-style reasoning such as
Viper [29] and GRASShopper [34].

Acknowledgments. This work is funded in part by the National Science Foundation
under grants CCF-1618059 and CCF-1815633.

References
1. Appel, A.W.: Verified software toolchain. In: NASA Formal Methods. Lecture Notes in
Computer Science, vol. 7226, p. 2. Springer (2012)
2. Berdine, J., Calcagno, C., O’Hearn, P.W.: A decidable fragment of separation logic. In:
FSTTCS. Lecture Notes in Computer Science, vol. 3328, pp. 97–109. Springer (2004)
3. Brookes, S., O’Hearn, P.W.: Concurrent separation logic. SIGLOG News 3(3), 47–65 (2016)
4. Calcagno, C., Distefano, D., Dubreil, J., Gabi, D., Hooimeijer, P., Luca, M., O’Hearn, P.W.,
Papakonstantinou, I., Purbrick, J., Rodriguez, D.: Moving fast with software verification. In:
NFM. Lecture Notes in Computer Science, vol. 9058, pp. 3–11. Springer (2015)
5. Calcagno, C., O’Hearn, P.W., Yang, H.: Local action and abstract separation logic. In: LICS.
pp. 366–378. IEEE Computer Society (2007)
6. Cao, Q., Cuellar, S., Appel, A.W.: Bringing order to the separation logic jungle. In: APLAS.
Lecture Notes in Computer Science, vol. 10695, pp. 190–211. Springer (2017)
7. Charguéraud, A., Pottier, F.: Verifying the correctness and amortized complexity of a union-
find implementation in separation logic with time credits. J. Autom. Reasoning 62(3), 331–365
(2019)
8. Chen, R., Cohen, C., Lévy, J., Merz, S., Théry, L.: Formal proofs of Tarjan's strongly connected
components algorithm in Why3, Coq and Isabelle. In: ITP. LIPIcs, vol. 141, pp. 13:1–13:19.
Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)
9. Dockins, R., Hobor, A., Appel, A.W.: A fresh look at separation algebras and share accounting.
In: APLAS. Lecture Notes in Computer Science, vol. 5904, pp. 161–177. Springer (2009)
10. Dodds, M., Jagannathan, S., Parkinson, M.J., Svendsen, K., Birkedal, L.: Verifying custom
synchronization constructs using higher-order separation logic. ACM Trans. Program. Lang.
Syst. 38(2), 4:1–4:72 (2016)
11. Enea, C., Lengál, O., Sighireanu, M., Vojnar, T.: SPEN: A solver for separation logic. In:
NFM. Lecture Notes in Computer Science, vol. 10227, pp. 302–309 (2017)
12. Harris, T.L.: A pragmatic implementation of non-blocking linked-lists. In: DISC. Lecture
Notes in Computer Science, vol. 2180, pp. 300–314. Springer (2001)
13. Haslbeck, M.P.L., Lammich, P., Biendarra, J.: Kruskal’s algorithm for minimum spanning
forest. Archive of Formal Proofs 2019 (2019)
14. Hobor, A., Villard, J.: The ramifications of sharing in data structures. In: POPL. pp. 523–536.
ACM (2013)
15. Immerman, N., Rabinovich, A.M., Reps, T.W., Sagiv, S., Yorsh, G.: The boundary between de-
cidability and undecidability for transitive-closure logics. In: CSL. Lecture Notes in Computer
Science, vol. 3210, pp. 160–174. Springer (2004)
16. Itzhaky, S., Banerjee, A., Immerman, N., Nanevski, A., Sagiv, M.: Effectively-propositional
reasoning about reachability in linked data structures. In: CAV. Lecture Notes in Computer
Science, vol. 8044, pp. 756–772. Springer (2013)
17. Jacobs, B., Smans, J., Philippaerts, P., Vogels, F., Penninckx, W., Piessens, F.: VeriFast: A
powerful, sound, predictable, fast verifier for C and Java. In: NASA Formal Methods. Lecture
Notes in Computer Science, vol. 6617, pp. 41–55. Springer (2011)
18. Jung, R., Krebbers, R., Jourdan, J., Bizjak, A., Birkedal, L., Dreyer, D.: Iris from the ground
up: A modular foundation for higher-order concurrent separation logic. J. Funct. Program. 28,
e20 (2018)
19. Kassios, I.T.: Dynamic frames: Support for framing, dependencies and sharing without
restrictions. In: FM. Lecture Notes in Computer Science, vol. 4085, pp. 268–283. Springer
(2006)
20. Katelaan, J., Matheja, C., Zuleger, F.: Effective entailment checking for separation logic with
inductive definitions. In: TACAS (2). Lecture Notes in Computer Science, vol. 11428, pp.
319–336. Springer (2019)

21. Klarlund, N., Schwartzbach, M.I.: Graph types. In: POPL. pp. 196–205. ACM Press (1993)
22. Krishna, S., Shasha, D.E., Wies, T.: Go with the flow: compositional abstractions for concur-
rent data structures. PACMPL 2(POPL), 37:1–37:31 (2018)
23. Krishna, S., Summers, A.J., Wies, T.: Local reasoning for global graph properties. CoRR
abs/1911.08632 (2019)
24. Lahiri, S.K., Qadeer, S.: Back to the future: revisiting precise program verification using SMT
solvers. In: POPL. pp. 171–182. ACM (2008)
25. Lammich, P., Sefidgar, S.R.: Formalizing network flow algorithms: A refinement approach in
Isabelle/HOL. J. Autom. Reasoning 62(2), 261–280 (2019)
26. Leino, K.R.M.: Dafny: An automatic program verifier for functional correctness. In: LPAR
(Dakar). Lecture Notes in Computer Science, vol. 6355, pp. 348–370. Springer (2010)
27. Leino, K.R.M., Moskal, M.: Vacid-0: Verification of ample correctness of invariants of
data-structures, edition 0. Microsoft Research Technical Report (2010)
28. Madhusudan, P., Qiu, X., Stefanescu, A.: Recursive proofs for inductive tree data-structures.
In: POPL. pp. 123–136. ACM (2012)
29. Müller, P., Schwerhoff, M., Summers, A.J.: Viper: A verification infrastructure for permission-
based reasoning. In: Jobstmann, B., Leino, K.R.M. (eds.) Verification, Model Checking, and
Abstract Interpretation (VMCAI). LNCS, vol. 9583, pp. 41–62. Springer-Verlag (2016)
30. Müller, P., Schwerhoff, M., Summers, A.J.: Automatic verification of iterated separating
conjunctions using symbolic execution. In: CAV (1). Lecture Notes in Computer Science,
vol. 9779, pp. 405–425. Springer (2016)
31. O’Hearn, P.W., Reynolds, J.C., Yang, H.: Local reasoning about programs that alter data
structures. In: CSL. Lecture Notes in Computer Science, vol. 2142, pp. 1–19. Springer (2001)
32. Parkinson, M.J., Bierman, G.M.: Separation logic and abstraction. In: Palsberg, J., Abadi, M.
(eds.) Principles of Programming Languages (POPL). pp. 247–258. ACM (2005)
33. Piskac, R., Wies, T., Zufferey, D.: Automating separation logic using SMT. In: CAV. Lecture
Notes in Computer Science, vol. 8044, pp. 773–789. Springer (2013)
34. Piskac, R., Wies, T., Zufferey, D.: GRASShopper: Complete heap verification with mixed
specifications. In: TACAS. Lecture Notes in Computer Science, vol. 8413, pp. 124–139.
Springer (2014)
35. Raad, A., Hobor, A., Villard, J., Gardner, P.: Verifying concurrent graph algorithms. In:
APLAS. Lecture Notes in Computer Science, vol. 10017, pp. 314–334 (2016)
36. Raad, A., Villard, J., Gardner, P.: CoLoSL: Concurrent local subjective logic. In: ESOP. Lecture
Notes in Computer Science, vol. 9032, pp. 710–735. Springer (2015)
37. Reynolds, J.C.: Separation logic: A logic for shared mutable data structures. In: LICS. pp.
55–74. IEEE Computer Society (2002)
38. Sergey, I., Nanevski, A., Banerjee, A.: Mechanized verification of fine-grained concurrent
programs. In: PLDI. pp. 77–87. ACM (2015)
39. Sha, L., Rajkumar, R., Lehoczky, J.P.: Priority inheritance protocols: An approach to real-time
synchronization. IEEE Trans. Computers 39(9), 1175–1185 (1990)
40. Ter-Gabrielyan, A., Summers, A.J., Müller, P.: Modular verification of heap reachability
properties in separation logic. PACMPL 3(OOPSLA), 121:1–121:28 (2019)
41. Vafeiadis, V.: Modular fine-grained concurrency verification. Ph.D. thesis, University of
Cambridge, UK (2008)
42. Wang, S., Cao, Q., Mohan, A., Hobor, A.: Certifying graph-manipulating C programs via
localizations within data structures. PACMPL 3(OOPSLA), 171:1–171:30 (2019)
43. Wies, T., Muñiz, M., Kuncak, V.: An efficient decision procedure for imperative tree data
structures. In: CADE. Lecture Notes in Computer Science, vol. 6803, pp. 476–491. Springer
(2011)
44. Yang, H.: An example of local reasoning in BI pointer logic: the Schorr-Waite graph marking
algorithm. In: Proceedings of the SPACE Workshop (2001)

Aneris: A Mechanised Logic for Modular
Reasoning about Distributed Systems

Morten Krogh-Jespersen, Amin Timany, Marit Edna Ohlenbusch,
Simon Oddershede Gregersen, and Lars Birkedal

Aarhus University, Aarhus, Denmark

Abstract. Building network-connected programs and distributed systems is a powerful way to provide scalability and availability in a digital,
always-connected era. However, with great power comes great complexity.
Reasoning about distributed systems is well-known to be difficult.
In this paper we present Aneris, a novel framework based on separation
logic supporting modular, node-local reasoning about concurrent and
distributed systems. The logic is higher-order, concurrent, with higher-
order store and network sockets, and is fully mechanized in the Coq proof
assistant. We use our framework to verify an implementation of a load
balancer that uses multi-threading to distribute load amongst multiple
servers and an implementation of the two-phase-commit protocol with
a replicated logging service as a client. The two examples demonstrate that
Aneris is well-suited for both horizontal and vertical modular reasoning.

Keywords: Distributed systems · Separation logic · Higher-order logic · Concurrency · Formal verification

1 Introduction

Reasoning about distributed systems is notoriously difficult due to their sheer complexity. This is largely the reason why previous work has traditionally focused
on verification of protocols of core network components. In particular, in the
context of model checking, where safety and liveness assertions [29] are consid-
ered, tools such as SPIN [9], TLA+ [23], and Mace [17] have been developed.
More recently, significant contributions have been made in the field of formal
proofs of implementations of challenging protocols, such as two-phase-commit,
lease-based key-value stores, Paxos, and Raft [7, 25, 30, 35, 40]. All of these
developments define domain specific languages (DSLs) specialized for distributed
systems verification. Protocols and modules proven correct can be compiled to
an executable, often relying on some trusted code-base.
Formal reasoning about distributed systems has often been carried out by
giving an abstract model in the form of a state transition system or flow-chart in
the tradition of Floyd [5], Lamport [21, 22]. A state is normally taken to be a

This research was carried out while Amin Timany was at KU Leuven, working as a
postdoctoral fellow of the Flemish research fund (FWO).


view of the global state and events are observable changes to this state. State
transition systems are quite versatile and have been used in other verification
applications. However, reasoning based on state transition systems often suffers
from a lack of modularity due to its very global nature. As a consequence, separate
nodes or components cannot be verified in isolation and the system has to be
verified as a whole.
IronFleet [7] is the first system that supports node-local reasoning for verifying
the implementation of programs that run on different nodes. In IronFleet, a
distributed system is modeled by a transition system. This transition system
is shown to be refined by the composition of a number of transition systems,
each pertaining to one of the nodes in the system. Each node in the distributed
system is shown to be correct and a refinement of its corresponding transition
system. Nevertheless, IronFleet does not allow you to reason compositionally; a
correctness proof for a distributed system cannot be used to show the correctness
of a larger system.
Higher-order concurrent separation logics (CSLs) [3, 4, 13, 15, 18, 26, 27,
28, 33, 34, 36, 39] simplify reasoning about higher-order imperative concurrent
programs by offering facilities for specifying and proving correctness of programs in
a modular way. Indeed, their support for modular reasoning (a.k.a. compositional
reasoning) is the key reason for their success. Disel [35] is a separation logic
that does support compositional reasoning about distributed systems, allowing
correctness proofs of distributed systems to be used for verifying larger systems.
However, Disel struggles with node-local reasoning in that it cannot hide node-
local usage of mutable state. That is, the use of internal state in nodes must be
exposed in the high-level protocol of the system and changes to the internal state
are only possible upon sending and receiving messages over the network.
Finally, both Disel and IronFleet restrict nodes to run only sequential programs
and no node-level concurrency is supported.
In this paper we present Aneris, a framework for implementing and reasoning
about functional correctness of distributed systems. Aneris is based on concurrent
separation logic and supports modular reasoning with respect to both nodes
(node-local reasoning) and threads within nodes (thread-local reasoning). The
Aneris framework consists of a programming language, AnerisLang, for writing
realistic, real-world distributed systems and a higher-order concurrent separation
logic for reasoning about these systems. AnerisLang is a concurrent ML-like
programming language with higher-order functions, local state, threads, and
network primitives. The operational semantics of the language, naturally, involves
multiple hosts (each with their own heap and multiple threads) running in a
network. The Aneris logic is build on top of the Iris framework [13, 15, 18]
and supports machine-verified formal proofs in the Coq proof assistant about
distributed systems written in AnerisLang.

Networking. There are several ways of adding network primitives to a programming language. One approach is message-passing using first-class communication channels à la the π-calculus or using an implementation of the actor model as done in high-level languages like Erlang, Elixir, Go, and Scala. However, any such implementation is an abstraction built on top of network sockets where all data has to be serialized, data packets may be dropped, and packet reception
may not follow the transmission order. Network sockets are a quintessential
part of building efficient, real-world distributed systems and all major operating
systems provide an application programming interface (API) to them. Likewise,
AnerisLang provides support for datagram-like sockets by directly exposing a
simple API with the core methods necessary for socket-based communication
using the User Datagram Protocol (UDP) with duplicate protection. This allows
for a wide range of real-world systems and protocols to be implemented (and
verified) using the Aneris framework.

Modular Reasoning in Aneris. In general, there are two different ways to support
modular reasoning about distributed systems corresponding to how components
can be composed. Aneris enables both simultaneously:

– Vertical composition: when reasoning about programs within each node, one
is able to compose proofs of different components to prove correctness of the
whole program. For instance, the specification of a verified data structure,
e.g. a concurrent queue, should suffice for verifying programs written against
that data structure, independently of its implementation.
– Horizontal composition: at each node, a verified thread is composable with
other verified threads. Similarly, a verified node is composable with other
verified nodes which potentially engage in different protocols. This naturally
aids implementing and verifying large-scale distributed systems.

Node-local variants of the standard rules of CSLs like, for example, the bind rule
and the frame rule (as explained in Sect. 2) enable vertical reasoning. Sect. 6
showcases vertical reasoning in Aneris using a replicated distributed logging
service that is implemented and verified using a separate implementation and
specification of the two-phase commit protocol.
Horizontal reasoning in Aneris is achieved through the Thread-par-rule and
the Node-par-rule (further explained in Sect. 2) which intuitively says that to
verify a distributed system, it suffices to verify each thread and each node in
isolation. This is analogous to how CSLs allow us to reason about multi-threaded
programs by considering individual threads in isolation; in Aneris we extend
this methodology to include both threads and nodes. Where most variants of
concurrent separation logic use some form of an invariant mechanism to reason
about shared-memory concurrency, we abstract the communication between nodes
over the network through socket protocols that restrict what can be sent and
received on a socket and allow us to share ownership of logical resources among
nodes. Sect. 5 showcases horizontal reasoning in Aneris using an implementation
and a correctness proof for a simple addition service that uses a load balancer to
distribute the workload among several addition servers. Each node is verified in
isolation and composed to form the final distributed system.

Contributions. In summary, we make the following contributions:



– We present AnerisLang, a formalized higher-order functional programming language for writing distributed systems. The language features higher-order
store, node-local concurrency, and network sockets, allowing for dynamic cre-
ation and binding of sockets to addresses with serialization and deserialization
primitives for encoding and parsing messages.
– We define the Aneris logic, the first higher-order concurrent separation logic
with support for network sockets and with support for both node-local and
thread-local reasoning.
– We introduce a simple and novel approach to specifying network protocols;
a mechanism that supports separation-logic-style modular specifications of
distributed systems.
– We conduct two case studies that showcase how our framework aids the
implementation and verification of real-world distributed systems using com-
positional reasoning:
• A replicated logging service that is implemented and verified using a sep-
arate implementation and specification of the two-phase commit protocol,
demonstrating vertical compositional reasoning.
• A load balancer that distributes work on multiple servers by means of
node-local multi-threading. We use this to verify a simple addition service
that uses the load balancer to distribute its requests over multiple servers,
demonstrating horizontal compositional reasoning.
– We have formalized all of the theory and examples on top of Iris in the Coq
proof assistant using the MoSeL framework [19]. The Coq formalization can
be found online at https://iris-project.org/artifacts/2020-esop-aneris.tar.gz.

Outline. We start by describing the core concepts of the Aneris framework in Sec. 2. We then describe the AnerisLang programming language (Sec. 3) before
presenting the Aneris logic proof rules and stating our adequacy theorem, i.e.,
soundness of Aneris, in Sec. 4. Subsequently, we use the logic to verify a load
balancer (Sec. 5) and a two-phase-commit implementation with a replicated
logging client (Sec. 6). We discuss related work in Sec. 7 and conclude in Sec. 8.

2 The Core Concepts of Aneris


In this section we present our methodology for modular verification of distributed
systems. We begin by recalling the ideas of thread-local reasoning and protocols
from concurrent separation logic and explain how we lift those ideas to node-
local reasoning. Finally, we illustrate the Aneris methodology for specifying,
implementing, and verifying distributed systems by developing a simple addition
service and a lock server. The distributed systems are composed of individually
verified concurrently running nodes communicating asynchronously by exchanging
messages that can be reordered or dropped.

2.1 Local and Thread-Local Reasoning


The most important feature of (concurrent) separation logic is, arguably, how
it enables scalable modular reasoning about pointer-manipulating programs.

Separation logic is a resource logic, in the sense that propositions denote not only
facts about the state, but ownership of resources. Originally, separation logic [32]
was introduced for modular reasoning about the heap—i.e. the notion of resource
was fixed to be logical pieces of the heap. The essential idea is that we can give a
local specification {P } e {v.Q} to a program e involving only the footprint of e.
Hence, while verifying e, we need not consider the possibility that another piece
of code in the program might interfere with e; the program e can be verified
without concern for the environment in which e may occur. Local specifications
can then be lifted to more global specifications by framing and binding:

{P} e {v.Q}                         {P} e {v.Q}     ∀v. {Q} K[v] {w.R}
─────────────────────               ───────────────────────────────────
{P ∗ R} e {v.Q ∗ R}                 {P} K[e] {w.R}

where K denotes an evaluation context. The symbol ∗ denotes separating conjunction. Intuitively, P ∗ Q holds for a given resource (in this case a heap) if
it can be divided into two disjoint resources such that P holds for one and Q
holds for the other. Thus, the frame rule essentially says that executing e for
which we know {P } e {x.Q} cannot possibly affect parts of the heap that are
separate from its footprint. Another related separation logic connective is −∗, the
separating implication. Proposition P −∗ Q describes a resource that, combined
with a disjoint resource satisfying P , results in a resource satisfying Q.
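For instance (a standard textbook example, not taken from this paper), from the heap-write triple {ℓ ↦ 3} ℓ ← 4 {ℓ ↦ 4} the frame rule yields {ℓ ↦ 3 ∗ ℓ′ ↦ v} ℓ ← 4 {ℓ ↦ 4 ∗ ℓ′ ↦ v}: the unrelated location ℓ′ is carried through the update untouched, without re-verifying anything about it.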
Since its introduction, separation logic has been extended to resources be-
yond heaps and with more sophisticated mechanisms for modular control of
interference. Concurrent separation logics (CSLs) [28] allow reasoning about
concurrent programs and a preeminent feature of these program logics is again
the support for modular reasoning, in this case with respect to concurrency
through thread-local reasoning. When reasoning about a concurrent program we
consider threads one at a time and need not reason about interleavings of threads
explicitly. In a way, our frame here includes, in addition to the shared fragments
of the heap and other resources, the execution of other threads which can be
interleaved throughout the execution of the thread being verified. This can be
seen from the following disjoint concurrency rule:
Thread-par
{P1} ⟨n; e1⟩ {v.Q1}     {P2} ⟨n; e2⟩ {v.Q2}
──────────────────────────────────────────────────────────────────────
{P1 ∗ P2} ⟨n; e1 || e2⟩ {v. ∃v1, v2. v = (v1, v2) ∗ Q1[v1/v] ∗ Q2[v2/v]}

where e1 || e2 denotes parallel composition of expressions e1 and e2 and we use the notation ⟨n; e⟩ to denote an expression e running on a node with identifier n.¹
Inevitably, at some point threads typically have to communicate with one
another through some kind of shared state, an unavoidable form of interference.
The original CSL used a simple form of resource invariant in which ownership of
a shared resource can be transferred between threads.
¹ In a language with fork-based concurrency, the parallel composition operator is an easily defined construct and the rule is derivable from a more general fork-rule.
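To illustrate the footnote's point, one standard fork-based encoding of e1 || e2 (our own sketch; the paper does not spell it out) runs the left expression in a forked thread and spin-waits on a shared reference for its result:

rec par f g =
  let r = ref NONE in
  fork { r ← SOME (f ()) };
  let v2 = g () in
  (rec wait () =
     match ! r with
       SOME v1 => (v1, v2)
     | NONE => wait ()
     end) ()

with e1 || e2 defined as par (λ_. e1) (λ_. e2).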

A notable program logic in the family of concurrent separation logics is Iris, which is specifically designed for reasoning about programs written in concurrent
higher-order imperative programming languages. Iris has already proven to be
versatile for reasoning about a number of sophisticated properties of programming
languages [12, 16, 37]. In order to support modular reasoning about concurrent
programs Iris features (1) impredicative invariants for expressing protocols on
shared state among multiple threads and (2) allows for encoding of higher-order
ghost state using a form of partial commutative monoids for reasoning about
resources. We will give examples of these features and explain them in more
detail as needed.

2.2 Node-Local Reasoning

Programs written in AnerisLang are higher-order imperative concurrent programs that run on multiple nodes in a distributed system. When reasoning about
distributed systems in Aneris, alongside heap-local and thread-local reasoning,
we also reason node-locally. When proving correctness of AnerisLang programs
we reason about each node of the system in isolation, akin to how we in CSLs
reason about each thread in isolation.
By virtue of building on Iris, reasoning in Aneris is naturally modular with
respect to separation logic frames and with respect to threads. What Aneris
adds on top of this is support for node-local reasoning about programs. This is
expressed by the following rule:

Node-par
{P1 ∗ IsNode(n1) ∗ FreePorts(ip1, P)} ⟨n1; e1⟩ {True}
{P2 ∗ IsNode(n2) ∗ FreePorts(ip2, P)} ⟨n2; e2⟩ {True}
──────────────────────────────────────────────────────────────────────────
{P1 ∗ P2 ∗ FreeIp(ip1) ∗ FreeIp(ip2)} ⟨S; (n1; ip1; e1) ||| (n2; ip2; e2)⟩ {True}

where ||| denotes parallel composition of two nodes with identifiers n1 and n2 running expressions e1 and e2 with IP addresses ip1 and ip2.² The set P = {p | 0 ≤ p ≤ 65535} denotes a finite set of ports.
Note that only a distinguished system node S can start new nodes (as
elaborated on in Sect. 3). In Aneris, the execution of the distributed system
starts with the execution of S as the only node in the system. In order to start
a new node associated with ip address ip one provides the resource FreeIp(ip)
which indicates that ip is not used by other nodes. The node can then rely
on the fact that when it starts, all ports on ip are available. The resource
IsNode(n) indicates that the node n is a node in the system and keeps track of
abstract state related to our modeling of node n’s heap and allocated sockets.
To facilitate modular reasoning, free ports can be split: if A ∩ B = ∅ then FreePorts(ip, A ∪ B) ⊣⊢ FreePorts(ip, A) ∗ FreePorts(ip, B), where ⊣⊢ denotes logical equivalence of Aneris propositions (of type iProp). We will use FreePort(a) as shorthand for FreePorts(ip, {p}) where a = (ip, p).
² In the same way as the parallel composition rule is derived from a more general fork-based rule, this composition rule is also an instance of a more general rule for spawning nodes shown in Sect. 3.
Finally, observe that the node-local postconditions are simply True, in contrast
to the arbitrary thread-local postconditions in the Thread-par-rule that carry
over to the main thread. In the concurrent setting, shared memory provides
reliable communication and synchronization between the child threads and the
main thread; in the rule for parallel composition, the main thread will wait for
the two child processes to finish. In the distributed setting, there are no such
guarantees and nodes are separate entities that cannot synchronize with the
distinguished system node.

Socket Protocols. Similar to how classical CSLs introduce the concept of resource
invariants for expressing protocols on shared state among multiple threads, we
introduce the simple and novel concept of socket protocols for expressing protocols
among multiple nodes. With each socket address—a pair of an IP address and
a port—a protocol is associated, which restricts what can be communicated on
that socket.
A socket protocol is a predicate Φ : Message → iProp on incoming messages
received on a particular socket. One can think of this as a form of rely-guarantee
reasoning since the socket protocol will be used to restrict the distributed en-
vironment’s interference with a node on a particular socket. In Aneris we write
a ⇒ Φ to mean that socket address a is governed by the protocol Φ. In particular, if a ⇒ Φ and a ⇒ Ψ then Φ and Ψ are equivalent.³ Moreover, the proposition is duplicable: a ⇒ Φ ⊣⊢ a ⇒ Φ ∗ a ⇒ Φ.
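As a small illustration of the idea (our own example, not from the paper), a protocol for an echo service could be

Φecho(m) ≜ ∃Ψ. from(m) ⇒ Ψ ∗ (∀m′. body(m′) = body(m) −∗ Ψ(m′))

which obliges the service to answer every request verbatim, and lets the sender's own protocol Ψ be satisfied by exactly such a reply.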
Conceptually, a socket is an abstract representation of a handle for a local
endpoint of some channel. We further restrict channels to use the User Datagram
Protocol (UDP) which is asynchronous, connectionless, and stateless. In accor-
dance with UDP, Aneris provides no guarantee of delivery or ordering although
we assume duplicate protection. We assume duplicate protection to simplify
our examples, as otherwise the code of all of our examples would have to be
adapted to cope with duplication of messages. One can think of sockets in Aneris
as open-ended multi-party communication channels without synchronization.
It is noteworthy that inter-process communication can happen in two ways.
Thread-concurrent programs can communicate both through the shared heap and
by sending messages through sockets. For memory-separated programs running
on different nodes all communication is by message-passing.
In the logic, we consider both static and dynamic socket addresses. This
distinction is entirely abstract and at the level of the logic. Static addresses come
with primordial protocols, agreed upon before starting the distributed system,
whereas dynamic addresses do not. Protocols on static addresses are primarily
intended for addresses pointing to nodes that offer a service.
To distinguish between static and dynamic addresses, we use a resource
Fixed(A) which denotes that the addresses in A are static and should have a fixed
³ The predicate equivalence is under a later modality in order to avoid self-referential paradoxes. We omit it for the sake of presentation as this is an orthogonal issue.

interpretation. This proposition expresses knowledge without asserting ownership of resources and is duplicable: Fixed(A) ⊣⊢ Fixed(A) ∗ Fixed(A).
Corresponding to the two kinds of addresses we have two different rules,
Socketbind-static and Socketbind-dynamic, for binding an address to a socket
as seen below. Both rules consume an instance of Fixed(A) and FreePort(a) as well
as a resource z →n None. The latter keeps track of the address associated with
the socket handle z on node n and ensures that the socket is bound only once as
further explained in Sect. 4. Notice that the protocol Φ in Socketbind-dynamic
can be freely chosen.

Socketbind-static
{Fixed(A) ∗ a ∈ A ∗ FreePort(a) ∗ z →n None}
⟨n; socketbind z a⟩
{x. x = 0 ∗ z →n Some a}

Socketbind-dynamic
{Fixed(A) ∗ a ∉ A ∗ FreePort(a) ∗ z →n None}
⟨n; socketbind z a⟩
{x. x = 0 ∗ z →n Some a ∗ a ⇒ Φ}

In the remainder of the paper we will use the following shorthands in order to
simplify the presentation of our specifications.

Static(a, A, Φ) ≜ Fixed(A) ∗ a ∈ A ∗ FreePort(a) ∗ a ⇒ Φ
Dynamic(a, A) ≜ Fixed(A) ∗ a ∉ A ∗ FreePort(a)

2.3 Example: An Addition Service

To illustrate node-local reasoning, socket protocols, and the Aneris methodology


for specifying, implementing, and verifying distributed systems we develop a
simple addition service that offers to add numbers for clients.
Fig. 1 depicts an implementation of a server and a client written in AnerisLang.
Notice that the programs look as if they were written in a realistic functional
language with sockets like OCaml. Messages are strings to make programming
with sockets easier (similar to send_substring in the Unix module in OCaml).
The server is parameterized over an address on which it will listen for requests.
The server allocates a new socket and binds the address to the socket. Then the
server starts listening for an incoming message on the socket, calling a handler
function on the message, if any. The handler function will deserialize the message,
perform the addition, serialize the result, and return it to the sender before
recursively listening for new messages.
The client is parameterized over two numbers to compute on, a server address,
and a client address. The client allocates a new socket, binds the address to the
socket, and serializes the two numbers. In the end, it sends the serialized message

rec server a =
  let skt = socket () in
  socketbind skt a;
  listen skt (rec handler msg from =
    let m = deserialize msg in
    let res = serialize (π1 m + π2 m) in
    sendto skt res from;
    listen skt handler)

rec client x y srv a =
  let skt = socket () in
  socketbind skt a;
  let m = serialize (x, y) in
  sendto skt m srv;
  let res = listenwait skt in
  deserialize (π1 res)

Fig. 1. An implementation of an addition service and a client written in AnerisLang. listen and listenwait are convenient helper functions to be found in the appendix [20].

to the server address using the socket and waits for a response, projecting out
the result of the addition on arrival and deserializing it.
In order to give the server code a specification we will fix a primordial socket
protocol that will govern the address given to the server. The protocol will spell
out how the server relies on the socket. We will use from(m) and body(m) for
projections of the sender and the message body, respectively, from the message
m. We define Φadd as follows:
Φadd (m)  ∃Ψ, x, y. from(m) ⇒ Ψ ∗ body(m) = serialize(x, y) ∗
∀m , body(m ) = serialize(x + y) −∗ Ψ (m )
Intuitively, the protocol demands that the sender of a message m is governed by
some protocol Ψ and that the message body body(m) must be the serialization
of two numbers x and y. Moreover, the sender’s protocol must be satisfied if the
serialization of x + y is sent as a response.
Using Φadd as the socket protocol, we can give server the specification
{Static(a, A, Φadd) ∗ IsNode(n)} ⟨n; server a⟩ {False}.
The postcondition is allowed to be False as the program does not terminate. The
triple guarantees safety which, among others, means that if the server responds
to communication on address a it does so according to Φadd .
Similarly, using Φadd as a primordial protocol for the server address, we can
also give client a specification
{srv ⇒ Φadd ∗ srv ∈ A ∗ Dynamic(a, A) ∗ IsNode(m)}
⟨m; client x y srv a⟩
{v. v = x + y}
that showcases how the client is able to conclude that the response from the
server is the sum of the numbers it sent to it. In the proof, when binding a to
the socket using Socketbind-dynamic, we introduce the proposition a ⇒ Φclient
where
Φclient(m) ≜ body(m) = serialize(x + y)
and use it to instantiate Ψ when satisfying Φadd . Using the two specifications
and the Node-par-rule it is straightforward to specify and verify a distributed
system composed of, e.g., a server and multiple clients.
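For concreteness, such a system could be bootstrapped from the distinguished system node S roughly as follows; this sketch is ours, and the node names and IP addresses are invented for illustration:

let srv = makeaddress "10.0.0.1" 80 in
start {"server"; "10.0.0.1"; server srv};
start {"client"; "10.0.0.2";
  client 1 2 srv (makeaddress "10.0.0.2" 80)}

Verifying it then amounts to one application of the node rule per start, with Φadd fixed as the primordial protocol of srv.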

2.4 Example: A Lock Server

Mutual exclusion in distributed systems is often a necessity and there are many
different approaches for providing it. The simplest solution is a centralized
algorithm with a single node acting as the coordinator. We will develop this
example to showcase a more interesting protocol that relies on ownership transfer
of spatial resources between nodes to ensure correctness.
The code for a centralized lock server implementation is shown in Fig. 2.

rec lockserver a =
  let lock = ref NONE in
  let skt = socket () in
  socketbind skt a;
  listen skt (rec handler msg from =
    (if (msg = "LOCK") then
       match !lock with
         NONE => lock ← SOME (); sendto skt "YES" from
       | SOME _ => sendto skt "NO" from
       end
     else lock ← NONE; sendto skt "RELEASED" from);
    listen skt handler)

Fig. 2. A lock server in AnerisLang.

The lock server declares a node-local variable lock to keep track of whether
the lock is taken or not. It allocates a socket, binds the input address to the
socket and continuously listens for incoming messages. When a "LOCK" message
arrives and the lock is available, the lock gets taken and the server responds
"YES". If the lock was already taken, the server will respond "NO". Finally, if
the message was not "LOCK", the lock is released and the server responds with
"RELEASED".
Our specification of the lock server will be inspired by how a lock can
be specified in concurrent separation logic. Thus we first recall how such a
specification usually looks like.
Conceptually, a lock can either be unlocked or locked, as described by a
two-state labeled transition system.

[Diagram: two states, unlocked and locked, with transitions between them; the key resource K is associated with holding the lock.]
In concurrent separation logic, the lock specification does not describe this
transition system directly, but instead focuses on the resources needed for the
transitions to take place. In the case of the lock, the resources are simply a
non-duplicable resource K, which is needed in order to call the lock’s release
method. Intuitively, this resource corresponds to the key of the lock.

A typical concurrent separation logic specification for a spin lock module looks roughly like the following:

∃ isLock.
  ∀v, K. isLock(v, K) ⊣⊢ isLock(v, K) ∗ isLock(v, K)
∧ ∀v, K. isLock(v, K) ⊢ K ∗ K ⇒ False
∧ {True} newLock () {v. ∃K. isLock(v, K)}
∧ ∀v. {isLock(v, K)} acquire v {v. K}
∧ ∀v. {isLock(v, K) ∗ K} release v {True}

The intuitive reading of such a specification is:

– Calling newLock will lead to the duplicable knowledge of the return value v
being a lock.
– Knowing that a value is a lock, a thread can try to acquire the lock and when
it eventually succeeds it will get the key K.
– Only a thread holding this key is allowed to call release.

Sharing of the lock among several threads is achieved by the isLock predicate
being duplicable. Mutual exclusion is ensured by the last bullet point together
with the requirement of K being non-duplicable whenever we have isLock(v, K).
For a leisurely introduction to such specifications, the reader may consult Birkedal
and Bizjak [1].
Let us now return to the distributed lock synchronization. To give clients the possibility of interacting with the lock server as they would with such a concurrent lock module, the specification for the lock server looks as follows.

{K ∗ Static(a, A, Φlock)} ⟨n; lockserver a⟩ {False}.

This specification simply states that a lock server should have a primordial
protocol Φlock and that it needs the key resource to begin with. To allow for the
desired interaction with the server, we define the socket protocol Φlock as follows:

acq(m, Ψ) ≜ (body(m) = "LOCK") ∗
            (∀m′. (body(m′) = "NO") ∨ (body(m′) = "YES" ∗ K) −∗ Ψ(m′))
rel(m, Ψ) ≜ (body(m) = "RELEASE") ∗ K ∗
            (∀m′. (body(m′) = "RELEASED") −∗ Ψ(m′))
Φlock(m) ≜ ∃Ψ. from(m) ⇒ Ψ ∗ (acq(m, Ψ) ∨ rel(m, Ψ))

The protocol Φlock demands that a client of the lock has to be bound to some
protocol Ψ and that the server can receive two types of messages fulfilling either
acq(m, Ψ ) or rel(m, Ψ ). These correspond to the module’s two methods acquire
and release respectively. In the case of a "LOCK" message, the server will answer
either "NO" or "YES" along with the key resource. In either case, the answer should
suffice for fulfilling the client protocol Ψ .

Receiving a "RELEASE" request is similar, but the important part is that we
require a client to send the key resource K along with the message, which ensures
that only the current holder can release the lock.
One difference between the distributed and the concurrent specification is
that we allow for the distributed lock to directly deny access. The client can use
a simple loop, asking for the lock until it is acquired, if it wishes to wait until
the lock can be acquired.
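Such a retry loop is straightforward to write against the server; the following sketch is ours (it is not in the paper) and reuses the listenwait helper from the appendix [20]:

rec acquire skt srv =
  sendto skt "LOCK" srv;
  let res = listenwait skt in
  if (π1 res = "YES") then ()
  else acquire skt srv

Under Φlock, each iteration either obtains the key resource K (on "YES") or learns nothing and retries (on "NO").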
There are several interesting observations one can make about the lock server
example: (1) The lock server can allocate, read, and write node-local references
but these are hidden in the specification. (2) There are no channel descriptors
or assertions on the socket in the code. (3) The lock server provides mutual
exclusion by requiring clients to satisfy a sufficient protocol.

3 AnerisLang
AnerisLang is an untyped functional language with higher-order functions, fork-
based concurrency, higher-order mutable references, and primitives for communi-
cating over network sockets. The syntax is as follows:

v ∈ Val ::= () | b | i | s | ℓ | z | rec f x = e | …
e ∈ Expr ::= v | x | rec f x = e | e1 e2 | ref e | ! e | e1 ← e2 | cas e1 e2 e3
           | find e1 e2 e3 | substring e1 e2 e3 | i2s e | s2i e
           | fork {e} | start {n; ip; e} | makeaddress e1 e2
           | socket e | socketbind e1 e2 | sendto e1 e2 e3 | receivefrom e | …

We omit the usual operations on pairs, sums, booleans b ∈ B, and integers i ∈ Z, which are all standard. We introduce the following syntactic sugar: lambda abstractions λx. e defined as rec _ x = e, let-bindings let x = e1 in e2 defined as (λx. e2)(e1), and sequencing e1; e2 defined as let _ = e1 in e2.
We have the usual operations on locations ℓ ∈ Loc in the heap: ref v for allocating a new reference, ! ℓ for dereferencing, and ℓ ← v for assignment. cas ℓ v1 v2 is an atomic compare-and-set operation used to achieve synchronization between threads on a specific memory location ℓ. Operationally, it tests whether ℓ has value v1 and if so, updates the location to v2, returning a boolean indicating whether the swap succeeded or not.
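For example (our own illustration), cas supports the usual lock-free retry pattern, here atomically incrementing a counter stored at l:

rec incr l =
  let v = ! l in
  if cas l v (v + 1) then () else incr l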
The operation find finds the index of a particular substring in a string s ∈
String and substring splits a string at given indices, producing the corresponding
substring. i2s and s2i convert between integers and strings. These operations
are mainly used for serialization and deserialization purposes.
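To make this concrete, a pair serializer in the style of Fig. 1 might look as follows. This sketch is ours: the paper relegates serialize/deserialize to the appendix, and we are assuming a string-append operation (written ^), a length operation strlen, and particular argument orders for find and substring, none of which are fixed by the grammar above:

rec serialize p =
  i2s (π1 p) ^ "_" ^ i2s (π2 p)

rec deserialize s =
  let i = find s "_" 0 in
  let x = s2i (substring s 0 i) in
  let y = s2i (substring s (i + 1) (strlen s)) in
  (x, y)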
The expression fork {e} forks off a new (node-local) thread and start {n; ip; e}
will spawn a new node n ∈ Node with ip address ip ∈ Ip running the program e.
Note that it is only at the bootstrapping phase of a distributed system that a
special system-node S will be able to spawn nodes.
We use z ∈ Handle to range over socket handles created by the socket
operation. makeaddress constructs an address given an ip address and a port,

and the network primitives socketbind, sendto, and receivefrom correspond to the similar BSD-socket API methods.

Operational Semantics. We define the operational semantics of AnerisLang in three stages.
We first define a node-local, thread-local head-step reduction (e, h) ⇝ (e′, h′) for e, e′ ∈ Expr and h, h′ ∈ Loc ⇀fin Val that handles all pure and heap-related node-local reductions. All rules of the relation are standard.
Next, the node-local head-step reduction induces a network-aware head-step reduction ⟨n; e, Σ⟩ → ⟨n; e′, Σ′⟩:

(e, h) ⇝ (e′, h′)
──────────────────────────────────────────────────────────
⟨n; e, (H[n ↦ h], S, P, M)⟩ → ⟨n; e′, (H[n ↦ h′], S, P, M)⟩

Here n ∈ Node denotes a node identifier and Σ, Σ′ ∈ NetworkState the global network state. Elements of NetworkState are tuples (H, S, P, M) tracking heaps H ∈ Node ⇀fin Heap and sockets S ∈ Node ⇀fin Handle ⇀fin Option Address for all nodes, ports in use P ∈ Ip ⇀fin ℘fin(Port), and messages sent M ∈ Id ⇀fin Message.
The induced network-aware reduction is furthermore extended with rules for
the network primitives as seen in Fig. 3. The socket operation allocates a new

z ∉ dom(S(n))    S′ = S[n ↦ S(n)[z ↦ None]]
───────────────────────────────────────────────────
⟨n; socket (), (H, S, P, M)⟩ → ⟨n; z, (H, S′, P, M)⟩

S(n)(z) = None    p ∉ P(ip)
S′ = S[n ↦ S(n)[z ↦ Some (ip, p)]]    P′ = P[ip ↦ P(ip) ∪ {p}]
──────────────────────────────────────────────────────────────
⟨n; socketbind z (ip, p), (H, S, P, M)⟩ → ⟨n; 0, (H, S′, P′, M)⟩

S(n)(z) = Some from    i ∉ dom(M)    M′ = M[i ↦ (from, to, msg, Sent)]
──────────────────────────────────────────────────────────────────────
⟨n; sendto z msg to, (H, S, P, M)⟩ → ⟨n; |msg|, (H, S, P, M′)⟩

S(n)(z) = Some to
M(i) = (from, to, msg, Sent)    M′ = M[i ↦ (from, to, msg, Received)]
──────────────────────────────────────────────────────────────────────
⟨n; receivefrom z, (H, S, P, M)⟩ → ⟨n; Some (msg, from), (H, S, P, M′)⟩

S(n)(z) = Some to
──────────────────────────────────────────────────────────
⟨n; receivefrom z, (H, S, P, M)⟩ → ⟨n; None, (H, S, P, M)⟩

Fig. 3. An excerpt of the rules for network-aware head reduction.

unbound socket using a fresh handle z for a node n, and socketbind binds a socket address a to an unbound socket z if the address and port p are not already in use. Hereafter, the port is no longer available in P′(ip). For bound sockets, sendto sends a message msg to a destination address to from the sender's address from found in the bound socket. The message is assigned a unique identifier and tagged with a status flag Sent indicating that the message has been sent and not received. The operation returns the number of characters sent.
To model possibly dropped or delayed messages we introduce two rules for receiving messages using the receivefrom operation that on a bound socket either returns a previously unreceived message or nothing. If a message is received, the status flag of the message is updated to Received.
Third and finally, using standard call-by-value right-to-left evaluation contexts K ∈ Ectx, we lift the node-local head reduction to a distributed-systems reduction ⇒, shown below. We write ⇒∗ for its reflexive-transitive closure. The distributed-systems relation reduces by picking a thread on any node or forking off a new thread on a node.

⟨n; e, Σ⟩ → ⟨n; e′, Σ′⟩
────────────────────────────────────────────────────────────
(T1 ++ [⟨n; K[e]⟩] ++ T2, Σ) ⇒ (T1 ++ [⟨n; K[e′]⟩] ++ T2, Σ′)

(T1 ++ [⟨n; K[fork {e}]⟩] ++ T2, Σ) ⇒ (T1 ++ [⟨n; K[()]⟩] ++ T2 ++ [⟨n; e⟩], Σ)

4 The Aneris Logic


As a consequence of building on the Iris framework, the Aneris logic features all
the usual connectives and rules of higher-order separation logic, some of which
are shown in the grammar below.4 The full expressiveness of the logic can be
exploited when giving specifications to programs or stating protocols.

P, Q ∈ iProp ::= True | False | P ∧ Q | P ∨ Q | P ⇒ Q |
                 ∀x. P | ∃x. P | P ∗ Q | P −∗ Q | t = u |
                 ℓ →n v | inv(P) | own^γ(a) | {P} ⟨n; e⟩ {x. Q} | …

Note that in Aneris the usual points-to connective about the heap, ℓ →n v, is indexed by a node identifier n ∈ Node, asserting ownership of the singleton heap mapping ℓ to v on node n.
The logic features (impredicative) invariants inv(P) and user-definable ghost state via the proposition own^γ(a), which asserts ownership of a piece of ghost state a at ghost location γ. The logical support for user-defined invariants and ghost state allows one to relate (ghost and physical) resources to each other; this is vital for our specifications as will become evident in Sect. 5 and Sect. 6. We refer to Jung et al. [14] for a more thorough treatment of user-defined ghost state.
To reason about AnerisLang programs, the logic features Hoare triples.⁵ The intuitive reading of the Hoare triple {P} ⟨n; e⟩ {x. Q} is that if the program e on

node n is run in a distributed system s satisfying P, then the computation does not get stuck and, moreover, if it terminates with a value v in a system s′, then s′ satisfies Q[v/x]. In other words, a Hoare triple implies safety and states that all spatial resources that are used by e are contained in the precondition P.
In contrast to spatial propositions that express ownership, e.g., ℓ →n v, propositions like inv(P) and {P} ⟨n; e⟩ {x. Q} express knowledge of properties that, once true, hold true forever. We call this class of propositions persistent. Persistent propositions P can be freely duplicated: P ⊣⊢ P ∗ P.
⁴ To avoid the issue of reentrancy, invariants are annotated with a namespace and Hoare triples with a mask. We omit both for the sake of presentation as they are orthogonal issues.
⁵ In both Iris and Aneris the notion of a Hoare triple is defined in terms of a weakest precondition but this will not be important for the remainder of this paper.

4.1 The Program Logic


The Aneris proof rules include the usual rules of concurrent separation logic for Hoare triples, allowing formal reasoning about node-local pure computations, manipulations of the heap, and forking of threads. Expressions e are annotated with a node identifier n, but the rules are otherwise standard.
To reason about individual nodes in a distributed system in isolation, Aneris introduces the following rule:

Start
{P ∗ IsNode(n) ∗ FreePorts(ip, P)} ⟨n; e⟩ {True}
─────────────────────────────────────────────────
{P ∗ FreeIp(ip)} ⟨S; start {n; ip; e}⟩ {x. x = ()}

where P = {p | 0 ≤ p ≤ 65535}. This rule is the key rule allowing node-local reasoning; the rule expresses exactly that to reason about a distributed system it
suffices to reason about each node in isolation.
As described in Sect. 3, only the distinguished system node S can start new
nodes—this is also reflected in the Start-rule. In order to start a new node
associated with IP address ip, the resource FreeIp(ip) is provided. This indicates
that ip is not used by other nodes. When reasoning about the node n, the proof
can rely on all ports on ip being available. The resource IsNode(n) indicates that
the node n is a valid node in the system and keeps track of abstract state related
to the modeling of node n’s heap and sockets. IsNode(n) is persistent and hence
duplicable.

Network Communication. To reason about network communication in a distributed system, the logic includes a series of rules for reasoning about socket manipulation: allocation of sockets, binding of addresses to sockets, sending via sockets, and receiving from sockets.
To allocate a socket it suffices to prove that the node n is valid by providing the IsNode(n) resource. In return, an unbound socket resource z →n None is given.

Socket
{IsNode(n)} ⟨n; socket ()⟩ {z. z →n None}

The socket resource z →n o keeps track of the address associated with the
socket handle z on node n and takes part in ensuring that the socket is bound

only once. It behaves similarly to the points-to connective for the heap, e.g.,
z →n o ∗ z →n o ⇒ False.
As briefly touched upon in Sect. 2, the logic offers two different rules for
binding an address to a socket depending on whether or not the address has a (at
the level of the logic) primordial, agreed upon protocol. To distinguish between
such static and dynamic addresses, we use a persistent resource Fixed(A) to keep
track of the set of addresses that have a fixed socket protocol.
To reason about a static address binding to a socket z it suffices to show that
the address a being bound has a fixed interpretation (by being in the “fixed” set),
that the port of the address is free, and that the socket is not bound.
Socketbind-static
{Fixed(A) ∗ a ∈ A ∗ FreePort(a) ∗ z →n None}
⟨n; socketbind z a⟩
{x. x = 0 ∗ z →n Some a}

In accordance with the BSD-socket API, the bind operation returns the integer 0 and the socket resource gets updated, reflecting the fact that the binding took place.
The rule for dynamic address binding is similar but the address a should not have a fixed interpretation. Moreover, the user of the logic is free to pick the socket protocol Φ to govern address a.

Socketbind-dynamic
{Fixed(A) ∗ a ∉ A ∗ FreePort(a) ∗ z →n None}
⟨n; socketbind z a⟩
{x. x = 0 ∗ z →n Some a ∗ a ⇒ Φ}

To reason about sending a message on a socket z it suffices to show that z is bound, that the destination of the message is governed by a protocol Φ, and that the message satisfies the protocol.

Sendto
{z →n Some from ∗ to ⇒ Φ ∗ Φ((from, to, msg, Sent))}
⟨n; sendto z msg to⟩
{x. x = |msg| ∗ z →n Some from}

Finally, to reason about receiving a message on a socket z the socket must be bound to an address governed by a protocol Φ.

Receivefrom
{z →n Some to ∗ to ⇒ Φ}
⟨n; receivefrom z⟩
{x. z →n Some to ∗ (x = None ∨ ∃m. x = Some (body(m), from(m)) ∗ Φ(m) ∗ R(m))}

When trying to receive a message on a socket, either a message will be received


or no message is available. This is reflected directly in the logic: if no message
was received, no resources are obtained. If a message m is received, the resources
prescribed by Φ(m) are transferred together with an unmodifiable certificate R(m)
accounting logically for the fact that message m was received. This certificate
can in the logic be used to talk about messages that has actually been received
in contrast to arbitrary messages. In our specification of the two-phase commit
protocol presented in Sect. 6, the notion of a vote denotes not just a message
with the right content but only one that has been sent by a participant and
received by the coordinator.

4.2 Adequacy for Aneris

We now state a formal adequacy theorem, which expresses that Aneris guarantees both safety and that all protocols are adhered to.
To state our theorem we introduce a notion of initial state coherence: a set of addresses A ⊆ Address = Ip × Port and a map P : Ip ⇀fin ℘fin(Port) are said to satisfy initial state coherence if the following hold: (1) if (i, p) ∈ A then i ∈ dom(P), and (2) if i ∈ dom(P) then P(i) = ∅.
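For instance (our own tiny example), A = {(i1, 80)} and P = [i1 ↦ ∅, i2 ↦ ∅] are coherent: the only static address lives on an IP known to P, and no ports are in use initially.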

Theorem 1 (Adequacy). Let ϕ be a first-order predicate over values, i.e., a meta-logic predicate (as opposed to an Iris predicate), let P be a map Ip ⇀fin ℘fin(Port), and A ⊆ Address such that A and P satisfy initial state coherence. Given a primordial socket protocol Φa for each a ∈ A, suppose that the Hoare triple

{Fixed(A) ∗ (∗a∈A a ⇒ Φa) ∗ (∗i∈dom(P) FreeIp(i))} ⟨n1; e⟩ {v. ϕ(v)}

is derivable in Aneris. If we have

(⟨n1; e⟩, (∅, ∅, P, ∅)) ⇒∗ ([⟨n1; e1⟩, ⟨n2; e2⟩, …, ⟨nm; em⟩], Σ)

then the following properties hold:

1. If e1 is a value, then ϕ(e1) holds at the meta-level.
2. Each ei that is not a value can make a node-local, thread-local reduction step.

Given predefined socket protocols for all primordial protocols and the necessary
free IP addresses, this theorem provides the normal adequacy guarantees of Iris-
like logics, namely safety, i.e., that nodes and threads on nodes cannot get stuck
and that the postcondition holds for the resulting value. Notice, however, that
this theorem also implies that all nodes adhere to the agreed upon protocols;
otherwise, a node not adhering to a protocol would be able to cause another
node to get stuck, which the adequacy theorem explicitly guarantees against.

5 Case Study 1: A Load Balancer

AnerisLang supports concurrent execution of threads on nodes through the fork {e} primitive. We will illustrate the benefits of node-local concurrency by presenting an example of server-side load balancing.

[Figure: clients C1, …, Cn communicate with the load balancer's socket z0; node-local threads T1 and T2 run serve on sockets z1 and z2, forwarding requests to servers S1 and S2.]

Fig. 4. The architecture of a distributed system with a load balancer and two servers.

Implementation. In the case of server-side load balancing, the work distribution is implemented by a program listening on a socket that clients send their requests
to. The program forwards the requests to an available server, waits for the
response from the server, and sends the answer back to the client. In order to
handle requests from several clients simultaneously, the load balancer can employ
concurrency by forking off a new thread for every available server in the system
that is capable of handling such requests. Each of these threads will then listen
for and forward requests. The architecture of such a system with two servers and
n clients is illustrated in Fig. 4.
An implementation of a load balancer is shown in Fig. 5. The load balancer is
parameterized over an IP address, a port, and a list of servers. It creates a socket
(corresponding to z0 in Fig. 4), binds the address, and folds a function over the
list of servers. This function forks off a new thread (corresponding to T1 and T2
in Fig. 4) for each server that runs the serve function with the newly-created
socket, the given IP address, a fresh port number, and a server as arguments.
The serve function creates a new socket (corresponding to z1 and z2 in Fig. 4),
binds the given address to the socket, and continuously tries to receive a client
request on the main socket (z0 ) given as input. If a request is received, it forwards
the request to its server and waits for an answer. The answer is passed on to
the client via the main socket. In this way, the entire load balancing process is
transparent to the client, whose view will be the same as if it was communicating
with just a single server handling all requests itself as the load balancer is simply
relaying requests and responses.

Specification and Protocols. To provide a general, reusable specification of the load balancer, we will parameterize its socket protocol by two predicates Pin and Pout that are both predicates on a message m and a meta-language value

rec load_balancer ip port servers =
  let skt = socket () in
  let a = makeaddress ip port in
  socketbind skt a;
  listfold (λ server, acc.
    fork { serve skt ip acc server };
    acc + 1) 1100 servers

rec serve main ip port srv =
  let skt = socket () in
  let a = makeaddress ip port in
  socketbind skt a;
  (rec loop () =
    match receivefrom main with
      SOME m =>
        sendto skt (π1 m) srv;
        let res = π1 (listenwait skt) in
        sendto main res (π2 m); loop ()
    | NONE => loop ()
    end) ()

Fig. 5. An implementation of a load balancer in AnerisLang. listfold and listenwait are convenient helper functions available in the appendix [20].

v. The two predicates are application specific and used to give logical accounts of the client requests and the server responses, respectively. Furthermore, we parameterize the protocol by a predicate Pval on a meta-language value that allows us to maintain ghost state between the request and response, as will become evident in the following.
In our specification, the sockets where the load balancer and the servers
receive requests (the blue sockets in Fig. 4) will all be governed by the same
socket protocol Φrel such that the load balancer may seamlessly relay requests
and responses between the main socket and the servers, without invalidating any
socket protocols. We define the generic relay socket protocol Φrel as follows:
Φrel(Pval, Pin, Pout)(m) ≜ ∃Ψ, v. from(m) ⇒ Ψ ∗ Pin(m, v) ∗ Pval(v) ∗
                                  (∀m′. Pval(v) ∗ Pout(m′, v) −∗ Ψ(m′))
When verifying a request, this protocol demands that the sender (corresponding
to the red sockets in Fig. 4) is governed by some protocol Ψ , that the request
fulfills the Pin and Pval predicates, and that Ψ is satisfied given a response that
maintains Pval and satisfies Pout .
When verifying the load balancer receiving a request m from a client, we
obtain the resources Pin (m, v) and Pval (v) for some v according to Φrel . This
suffices for passing the request along to a server. However, to forward the server’s
response to the client we must know that the server behaves faithfully and
gave us the response to the right request value v. Φrel does not give us this
immediately as the v is existentially quantified. Hence we define a ghost resource LB(π, s, v) that provides fractional ownership for π ∈ (0, 1], which satisfies LB(1, s, v) ⊣⊢ LB(1/2, s, v) ∗ LB(1/2, s, v), and for which v can only be updated if π = 1; in particular, LB(π, s, v) ∗ LB(π, s, v′) ⇒ v = v′ for any π. Using this resource, the server with address s will have PLB(s) as its instantiation of Pval, where

    PLB(s)(v) ≜ LB(1/2, s, v).
When verifying the load balancer, we will update this resource to the request
value v when receiving a request (as we have the full fraction) and transfer

LB(1/2, s, v) to the server with address s handling the request and, according to
Φrel , it will be required to send it back along with the result. Since the server
logically only gets half ownership, the value cannot be changed. Together with
the fact that v is also an argument to Pin and Pout , this ensures that the server
fulfills Pout for the same value as it received Pin for. The socket protocol for the
serve function’s socket (z1 and z2 in Fig. 4) that communicates with a server
with address s can now be stated as follows.

Φserve(s, Pout)(m) ≜ ∃v. LB(1/2, s, v) ∗ Pout(m, v)

Since all calls to the serve function need access to the main socket in order to
receive requests, we will keep the socket resource required in an invariant ILB
which is shared among all the threads:

ILB(n, z, a) ≜ z →n Some a

The specification for the serve function becomes:

{ ILB(n, main, amain) ∗ Dynamic((ip, p), A) ∗ IsNode(n) ∗ LB(1, s, v) ∗
  amain ⇒ Φrel(λ_. True, Pin, Pout) ∗ s ⇒ Φrel(PLB(s), Pin, Pout) }
    ⟨n; serve main ip p s⟩
{ False }

The specification requires the address amain of the socket main to be governed
by Φrel with a trivial instantiation of Pval and the address s of the server to
be governed by Φrel with Pval instantiated by PLB . The specification moreover
expects resources for a dynamic setup, the invariant that owns the resource
needed to verify use of the main socket, and a full instance of the LB(1, s, v)
resource for some arbitrary v.
With this specification in place the complete specification of our load balancer
is immediate (note that it is parameterized by Pin and Pout ):

{ Static((ip, p), A, Φrel(λ_. True, Pin, Pout)) ∗ IsNode(n) ∗
  (∗p′∈ports Dynamic((ip, p′), A)) ∗
  (∗s∈srvs ∃v. LB(1, s, v) ∗ s ⇒ Φrel(PLB(s), Pin, Pout)) }
    ⟨n; load_balancer ip p srvs⟩
{ True }

where ports = [1100, · · · , 1100 + |srvs|]. In addition to the protocol setup for each server as just described, we need, for each port p′ ∈ ports that will become the endpoint for a corresponding server, the resources for a dynamic setup, and we need the resource for a static setup on the main input address (ip, p).

In the accompanying Coq development we provide an implementation of


the addition service from Sect. 2.3, both in the single server case and in a load
balanced case. For this particular proof we let the meta-language value v be a
pair of integers corresponding to the expected arguments. In order to instantiate
the load balancer specification we choose

Pin^add(m, (v1, v2)) ≜ body(m) = serialize(v1, v2)
Pout^add(m, (v1, v2)) ≜ body(m) = serialize(v1 + v2)

with serialize being the same serialization function from Sect. 2.3. We build and
verify two distributed systems, (1) one consisting of two clients and an addition
server and (2) one including two clients, a load balancer and three addition servers.
We prove both of these systems safe and the proofs utilize the specifications we
have given for the individual components. Notice that Φrel(λ_. True, Pin^add, Pout^add) and Φadd from Sect. 2.3 are the same. This is why we can use the same client specification in both system proofs. Hence, we have demonstrated Aneris's support for horizontal composition of the same modules in different systems.
While the load balancer demonstrates the use of node-local concurrency, its
implementation does not involve shared memory concurrency, i.e., synchronization
among the node-local threads. The appendix [20] includes an example of a
distributed system, where clients interact with a server that implements a bag.
The server uses multiple threads to handle client requests concurrently and
the threads use a shared bag data structure governed by a lock. This example
demonstrates Aneris’ ability to support both shared-memory concurrency and
distributed networking.
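A minimal Python sketch of that node-local pattern (an illustration only; the verified bag server is in the appendix [20]): several handler threads share one bag guarded by a lock.

    import threading

    bag, lock = [], threading.Lock()

    def handle(request):
        # the lock plays the role of the bag's governing lock
        with lock:
            if request == "PUT":
                bag.append("item")
            elif request == "GET":
                return bag.pop() if bag else None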

6 Case Study 2: Two-Phase Commit

A typical problem in distributed systems is that of consensus and distributed


commit; an operation should be performed by all participants in a system or none
at all. The two-phase commit protocol (TPC) by Gray [6] is a classic solution
to this problem. We study this protocol in Aneris because (1) it is widely used in the real world, (2) it is a complex network protocol and thus serves as a decent benchmark for reasoning in Aneris, and (3) it shows how an implementation can be given a specification that is usable for a client that abstractly relies on some consensus protocol.
The two-phase commit protocol consists of the following two phases, each involving two steps (a message-level sketch of the coordinator follows the list):

1. (a) The coordinator sends out a vote request to each participant.


(b) A participant that receives a vote request replies with a vote for either
commit or abort.
2. (a) The coordinator collects all votes and determines a result. If all par-
ticipants voted commit, the coordinator sends a global commit to all.
Otherwise, the coordinator sends a global abort to all.

(b) All participants that voted for a commit wait for the final verdict from
the coordinator. If the participant receives a global commit it locally
commits the transaction, otherwise the transaction is locally aborted. All
participants must acknowledge.
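The following Python sketch shows the coordinator's side of these two phases (an illustration only, not the verified Coq/AnerisLang module); send_to and recv_from are hypothetical helpers that exchange messages with a participant's address.

    def coordinator(participants, request):
        # Phase 1: send a vote request, then collect one vote per participant.
        for p in participants:
            send_to(p, ("VOTE_REQUEST", request))
        votes = [recv_from(p) for p in participants]
        # Phase 2: decide globally, broadcast the verdict, collect acks.
        decision = ("GLOBAL_COMMIT" if all(v == "COMMIT" for v in votes)
                    else "GLOBAL_ABORT")
        for p in participants:
            send_to(p, decision)
        for p in participants:
            assert recv_from(p) == "ACK"
        return decision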

Our implementation and specification details can be found in the appendix [20]
and in the accompanying Coq development, but we will emphasize a few key
points.
To provide general, reusable implementations and specifications of the coordinator and participants implementing TPC, we do not define what requests, votes, or decisions look like. We leave it to a user of the module to provide decidable predicates matching the application-specific needs and to define the logical, local
pre- and postconditions, P and Q, of participants for the operation in question.
Our specifications use fractional ghost resources to keep track of coordinator
and participant state w.r.t. the coordinator and participant transition systems
indicated in the protocol description above. Similar to our previous case study, we
exploit partial ownership to limit when transitions can be made. When verifying
a participant, we keep track of their state and the coordinator’s state and require
all participants’ view of the coordinator state to be in agreement through an
invariant.
In short, our specification of TPC

– ensures the participants and coordinator act according to the protocol, i.e.,
• the coordinator decides based on all the participant votes,
• participants act according to the global decision,
• if the decision was to commit, we obtain the resources described by Q
for all participants,
• if the decision was to abort, we still have the resources described by P
for all participants,
– does not require the coordinator to be primordial, so the coordinator could
change from round to round.

6.1 A Replicated Log

In a distributed replicated logging system, a log is stored on several databases


distributed across several nodes where the system ensures consistency among the
logs through a consensus protocol. We have verified such a system implemented
on top of the TPC coordinator and participant modules to showcase vertical
composition of complex protocols in Aneris as illustrated in Fig. 6. The blue
parts of the diagram constitute node-local instantiations of the TPC modules
invoked by the nodes to handle the consensus process. As noted by Sergey et al.
[35], clients of core consensus protocols have not received much focus from other
major verification efforts [7, 30, 40].
Our specification of a replicated logging system draws on the generality of the
TPC specification. In this case, we use fractional ghost state to keep track of two
related pieces of information. The first keeps a logical account of the log l already

[Figure: clients C1, ..., Cn send requests to the coordinator node, which coordinates updates across the databases S1 and S2 over sockets.]
Fig. 6. The architecture of a replicated logging system implemented using the TPC
modules (the blue parts of the diagram) with a coordinator and two databases (S1 and
S2 ) each storing a copy of the log.

stored in the database at a node at address a, LOG(π, a, l). The second one keeps
track of what the log should be updated to, if the pending round of consensus
succeeds. This is a pair of the existing log l and the (pending) change s proposed
in this round, PEND(π, a, (l, s)). We exploit fractional resource ownership by
letting the coordinator, logically, keep half of the pending log resources at all
times. Together with suitable local pre- and postconditions for the databases,
this prevents the databases from doing arbitrary changes to the log. Concretely,
we instantiate P and Q of the TPC module as follows:

Prep(p)(m) ≜ ∃l, s. (m = "REQUEST_" @ s) ∗ LOG(1/2, p, l) ∗ PEND(1/2, p, (l, s))
Qrep(p)(n) ≜ ∃l, s. LOG(1/2, p, l@s) ∗ PEND(1/2, p, (l, s))

where @ denotes string concatenation. Note how the request message specifies the
proposed change (since the string that we would like to add to the log is appended
to the request message) and how we ensure consistency by making sure the two
ghost assertions hold for the same log. Even though l and s are existentially
quantified, we know the logs cannot be inconsistent since the coordinator retains
partial knowledge of the log. Due to the guarantees given by the TPC specification, this implies that if the global decision was to commit a change, this change will have happened locally on all databases, cf. LOG(1/2, p, l@s) in Qrep, and if the decision was to abort, then the log remains unchanged on all databases, cf. LOG(1/2, p, l) in Prep. We refer to the appendix [20] or the Coq development
for further details.

7 Related Work

Verification of distributed systems has received a fair amount of attention. In


order to give a better overview, we have divided related work into four categories.

Model-Checking of Distributed Protocols. Previous work on verification of dis-


tributed systems has mainly focused on verification of protocols or core network
components through model-checking. Frameworks for showing safety and liveness
properties, such as SPIN [9] and TLA+ [23], have had great success. A benefit of using model-checking frameworks is that they allow stating both safety and liveness properties as LTL assertions [29]. Mace [17] provides a suite for building and model-checking distributed systems with asynchronous protocols, including liveness conditions. Chapar [25] allows for model-checking of programs that use causally consistent distributed key-value stores. Neither of these languages provides higher-order functions or thread-based concurrency.

Session Types for Giving Types to Protocols. Session types have been studied for a wide range of process calculi, in particular the typed π-calculus. The idea is to describe two-party communication protocols as a type to ensure communication safety and progress [10]. This has been extended to multi-party asynchronous channels [11], multi-role types [2], which informally model topics of actor-based message-passing, and dependent session types, which allow quantification over messages [38]. Our socket protocol definitions are quite similar to multi-party asynchronous session types, with progress encoded by having suitable ghost assertions and using the magic wand. Actris [8] is a logic for session-type based reasoning about message-passing in actor-based languages.

Hoare Style Reasoning About Distributed Systems. Disel [35] is a Hoare Type
Theory for distributed program verification in Coq with ideas from separation
logic. It provides the novel protocol-tailored rules WithInv and Frame which
allow for modularity of proofs under the condition of an inductive invariant
and distributed systems composition. In Disel, programs can be extracted into
runnable OCaml programs, which is on our agenda for future work.
IronFleet [7] allows for building provably correct distributed systems by
combining TLA-style state-machine refinement with Hoare-logic verification in a
layered approach, all embedded in Dafny [24]. IronFleet also allows for liveness
assertions. For a comparison of Disel and IronFleet to Aneris from a modularity
point of view we refer to the Introduction section.

Other Distributed Verification Efforts. Verdi [40] is a framework for writing and
verifying implementations of distributed algorithms in Coq, providing a novel
approach to network semantics and fault models. To achieve compositionality, the
authors introduced verified system transformers, that is, functions that transform one implementation into another implementation with different assumptions about its environment. This makes vertical composition difficult for clients of proven protocols; in comparison, AnerisLang seems more expressive.
EventML [30, 31] is a functional language in the ML family that can be used
for coding distributed protocols using high-level combinators from the Logic of
Events, and verify them in the Nuprl interactive theorem prover. It is not quite
clear how modular reasoning works, since one works within the model; however, the notion of a central main observer is akin to our distinguished system node.

8 Conclusion
Distributed systems are ubiquitous and hence it is essential to be able to verify
them. In this paper we presented Aneris, a framework for writing and verifying
distributed systems in Coq built on top of the Iris framework. From a programming
point of view, the important aspect of AnerisLang is that it is feature-rich: it is a
concurrent ML-like programming language with network primitives. This allows
individual nodes to internally use higher-order heap and concurrency to write
efficient programs.
The Aneris logic provides node-local reasoning through socket protocols. That
is, we can reason about individual nodes in isolation as we reason about indi-
vidual threads. We demonstrate the versatility of Aneris by studying interesting
distributed systems both implemented and verified within Aneris. The adequacy
theorem of Aneris implies that these programs are safe to run.
Table 1. Sizes of implementations, specifications, and proofs in lines of code. When proving adequacy, the system must be closed. (The 181-line specification is shared between the coordinator and the participant.)

Module                                        Implementation  Specification  Proofs
Load Balancer (Sect. 5)
  Load balancer                                           18             78      95
Addition Service
  Server                                                  11             15      38
  Client                                                   9             14      26
  Adequacy (1 server, 2 clients)                           5             12      62
  Adequacy w. Load Balancing
    (3 servers, 2 clients)                                16             28     175
Two-phase commit (Sect. 6)
  Coordinator                                             18            181     265
  Participant                                             11       (shared)    280
Replicated logging (Sect. 6 + appendix [20])
  Instantiation of TPC                                     -             85       -
  Logger                                                  22             19      95
  Database                                                24             20     190
  Adequacy
    (2 dbs, 1 coordinator, 2 clients)                     13              -     137

Relating the verification sizes of the modules from Table 1 to other formal
verification efforts in Coq indicates that it is easier to specify and verify systems
in Aneris. The total work required to prove two-phase commit with replicated
logging is 1,272 lines which is just half of the lines needed for proving the inductive
invariant for TPC in other works [35]. However, extensive work has gone into
Iris Proof Mode thus it is hard to conclude that Aneris requires less verification
effort and does not just have richer tactics.

Acknowledgments
This work was supported in part by the ModuRes Sapere Aude Advanced Grant
from The Danish Council for Independent Research for the Natural Sciences
(FNU); a Villum Investigator grant (no. 25804), Center for Basic Research in
Program Verification (CPV), from the VILLUM Foundation; and the Flemish
research fund (FWO).

Bibliography

[1] Birkedal, L., Bizjak, A.: Lecture notes on Iris: Higher-order concur-
rent separation logic (2017), URL http://iris-project.org/tutorial-pdfs/
iris-lecture-notes.pdf
[2] Deniélou, P., Yoshida, N.: Dynamic multirole session types. In: Ball,
T., Sagiv, M. (eds.) Proceedings of the 38th ACM SIGPLAN-SIGACT
Symposium on Principles of Programming Languages, POPL 2011,
Austin, TX, USA, January 26-28, 2011, pp. 435–446, ACM (2011),
https://doi.org/10.1145/1926385.1926435
[3] Dinsdale-Young, T., Birkedal, L., Gardner, P., Parkinson, M.J., Yang,
H.: Views: compositional reasoning for concurrent programs. In: Gi-
acobazzi, R., Cousot, R. (eds.) The 40th Annual ACM SIGPLAN-
SIGACT Symposium on Principles of Programming Languages, POPL
’13, Rome, Italy - January 23 - 25, 2013, pp. 287–300, ACM (2013),
https://doi.org/10.1145/2429069.2429104
[4] Dinsdale-Young, T., Dodds, M., Gardner, P., Parkinson, M.J., Vafeiadis, V.:
Concurrent abstract predicates. In: D’Hondt, T. (ed.) ECOOP 2010 - Object-
Oriented Programming, 24th European Conference, Maribor, Slovenia, June
21-25, 2010. Proceedings, Lecture Notes in Computer Science, vol. 6183, pp.
504–528, Springer (2010), https://doi.org/10.1007/978-3-642-14107-2_24
[5] Floyd, R.W.: Assigning meanings to programs. Mathematical aspects of
Computer Science 19, pp. 19–32 (1967)
[6] Gray, J.: Notes on data base operating systems. In: Flynn, M.J., Gray, J.,
Jones, A.K., Lagally, K., Opderbeck, H., Popek, G.J., Randell, B., Saltzer,
J.H., Wiehle, H. (eds.) Operating Systems, An Advanced Course, Lec-
ture Notes in Computer Science, vol. 60, pp. 393–481, Springer (1978),
https://doi.org/10.1007/3-540-08755-9_9
[7] Hawblitzel, C., Howell, J., Kapritsos, M., Lorch, J.R., Parno, B., Roberts,
M.L., Setty, S.T.V., Zill, B.: Ironfleet: proving practical distributed systems
correct. In: Miller, E.L., Hand, S. (eds.) Proceedings of the 25th Symposium
on Operating Systems Principles, SOSP 2015, Monterey, CA, USA, October
4-7, 2015, pp. 1–17, ACM (2015), https://doi.org/10.1145/2815400.2815428
[8] Hinrichsen, J.K., Bengtson, J., Krebbers, R.: Actris: session-type
based reasoning in separation logic. PACMPL 4, 6:1–6:30 (2020),
https://doi.org/10.1145/3371074
[9] Holzmann, G.J.: The model checker SPIN. IEEE Trans. Software Eng. 23(5),
279–295 (1997), https://doi.org/10.1109/32.588521
[10] Honda, K., Vasconcelos, V.T., Kubo, M.: Language primitives and type
discipline for structured communication-based programming. In: Hankin,
C. (ed.) Programming Languages and Systems - ESOP’98, 7th European
Symposium on Programming, Held as Part of the European Joint Conferences
on the Theory and Practice of Software, ETAPS’98, Lisbon, Portugal, March

28 - April 4, 1998, Proceedings, Lecture Notes in Computer Science, vol.


1381, pp. 122–138, Springer (1998), https://doi.org/10.1007/BFb0053567
[11] Honda, K., Yoshida, N., Carbone, M.: Multiparty asynchronous session types.
In: Necula, G.C., Wadler, P. (eds.) Proceedings of the 35th ACM SIGPLAN-
SIGACT Symposium on Principles of Programming Languages, POPL 2008,
San Francisco, California, USA, January 7-12, 2008, pp. 273–284, ACM
(2008), https://doi.org/10.1145/1328438.1328472
[12] Jung, R., Jourdan, J., Krebbers, R., Dreyer, D.: Rustbelt: securing the
foundations of the rust programming language. PACMPL 2(POPL), 66:1–
66:34 (2018), https://doi.org/10.1145/3158154
[13] Jung, R., Krebbers, R., Birkedal, L., Dreyer, D.: Higher-order ghost state.
In: Proceedings of the 21st ACM SIGPLAN International Conference on
Functional Programming, p. 256–269, ICFP 2016, Association for Com-
puting Machinery, New York, NY, USA (2016), ISBN 9781450342193,
https://doi.org/10.1145/2951913.2951943
[14] Jung, R., Krebbers, R., Jourdan, J., Bizjak, A., Birkedal, L., Dreyer,
D.: Iris from the ground up: A modular foundation for higher-
order concurrent separation logic. J. Funct. Program. 28, e20 (2018),
https://doi.org/10.1017/S0956796818000151
[15] Jung, R., Swasey, D., Sieczkowski, F., Svendsen, K., Turon, A., Birkedal, L.,
Dreyer, D.: Iris: Monoids and invariants as an orthogonal basis for concurrent
reasoning. In: Rajamani, S.K., Walker, D. (eds.) Proceedings of the 42nd
Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming
Languages, POPL 2015, Mumbai, India, January 15-17, 2015, pp. 637–650,
ACM (2015), https://doi.org/10.1145/2676726.2676980
[16] Kaiser, J., Dang, H., Dreyer, D., Lahav, O., Vafeiadis, V.: Strong logic
for weak memory: Reasoning about release-acquire consistency in Iris. In:
Müller, P. (ed.) 31st European Conference on Object-Oriented Programming,
ECOOP 2017, June 19-23, 2017, Barcelona, Spain, LIPIcs, vol. 74, pp.
17:1–17:29, Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2017),
https://doi.org/10.4230/LIPIcs.ECOOP.2017.17
[17] Killian, C.E., Anderson, J.W., Braud, R., Jhala, R., Vahdat, A.: Mace:
language support for building distributed systems. In: Ferrante, J.,
McKinley, K.S. (eds.) Proceedings of the ACM SIGPLAN 2007 Con-
ference on Programming Language Design and Implementation, San
Diego, California, USA, June 10-13, 2007, pp. 179–188, ACM (2007),
https://doi.org/10.1145/1250734.1250755
[18] Krebbers, R., Jung, R., Bizjak, A., Jourdan, J., Dreyer, D., Birkedal, L.:
The essence of higher-order concurrent separation logic. In: Yang, H. (ed.)
Programming Languages and Systems - 26th European Symposium on
Programming, ESOP 2017, Held as Part of the European Joint Conferences
on Theory and Practice of Software, ETAPS 2017, Uppsala, Sweden, April
22-29, 2017, Proceedings, Lecture Notes in Computer Science, vol. 10201, pp.
696–723, Springer (2017), https://doi.org/10.1007/978-3-662-54434-1_26
[19] Krebbers, R., Timany, A., Birkedal, L.: Interactive proofs in higher-order
concurrent separation logic. In: Castagna, G., Gordon, A.D. (eds.) Proceed-

ings of the 44th ACM SIGPLAN Symposium on Principles of Programming


Languages, POPL 2017, Paris, France, January 18-20, 2017, pp. 205–217,
ACM (2017)
[20] Krogh-Jespersen, M., Timany, A., Ohlenbusch, M.E., Gregersen, S.O.,
Birkedal, L.: Aneris: A mechanised logic for modular reasoning about dis-
tributed systems - technical appendix (2020), URL https://iris-project.org/
pdfs/2020-esop-aneris-final-appendix.pdf
[21] Lamport, L.: Proving the correctness of multiprocess pro-
grams. IEEE Trans. Software Eng. 3(2), 125–143 (1977),
https://doi.org/10.1109/TSE.1977.229904
[22] Lamport, L.: The implementation of reliable distributed multiprocess sys-
tems. Computer Networks 2, 95–114 (1978), https://doi.org/10.1016/0376-
5075(78)90045-4
[23] Lamport, L.: Hybrid systems in TLA+ . In: Grossman, R.L., Nerode, A.,
Ravn, A.P., Rischel, H. (eds.) Hybrid Systems, Lecture Notes in Computer
Science, vol. 736, pp. 77–102, Springer (1992), https://doi.org/10.1007/3-
540-57318-6_25
[24] Leino, K.R.M.: Dafny: An automatic program verifier for functional cor-
rectness. In: Clarke, E.M., Voronkov, A. (eds.) Logic for Programming,
Artificial Intelligence, and Reasoning - 16th International Conference, LPAR-
16, Dakar, Senegal, April 25-May 1, 2010, Revised Selected Papers, Lec-
ture Notes in Computer Science, vol. 6355, pp. 348–370, Springer (2010),
https://doi.org/10.1007/978-3-642-17511-4_20
[25] Lesani, M., Bell, C.J., Chlipala, A.: Chapar: certified causally consistent dis-
tributed key-value stores. In: Bodík, R., Majumdar, R. (eds.) Proceedings of
the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Pro-
gramming Languages, POPL 2016, St. Petersburg, FL, USA, January 20 - 22,
2016, pp. 357–370, ACM (2016), https://doi.org/10.1145/2837614.2837622
[26] Ley-Wild, R., Nanevski, A.: Subjective auxiliary state for coarse-grained
concurrency. In: Giacobazzi, R., Cousot, R. (eds.) The 40th Annual ACM
SIGPLAN-SIGACT Symposium on Principles of Programming Languages,
POPL ’13, Rome, Italy - January 23 - 25, 2013, pp. 561–574, ACM (2013),
https://doi.org/10.1145/2429069.2429134
[27] Nanevski, A., Ley-Wild, R., Sergey, I., Delbianco, G.A.: Communicating
state transition systems for fine-grained concurrent resources. In: Shao, Z.
(ed.) Programming Languages and Systems - 23rd European Symposium on
Programming, ESOP 2014, Held as Part of the European Joint Conferences
on Theory and Practice of Software, ETAPS 2014, Grenoble, France, April
5-13, 2014, Proceedings, Lecture Notes in Computer Science, vol. 8410, pp.
290–310, Springer (2014), https://doi.org/10.1007/978-3-642-54833-8_16
[28] O’Hearn, P.W.: Resources, concurrency, and local reasoning. Theor. Comput.
Sci. 375(1-3), 271–307 (2007), https://doi.org/10.1016/j.tcs.2006.12.035
[29] Pnueli, A.: The temporal logic of programs. In: 18th Annual Symposium
on Foundations of Computer Science, Providence, Rhode Island, USA, 31
October - 1 November 1977, pp. 46–57, IEEE Computer Society (1977),
https://doi.org/10.1109/SFCS.1977.32

[30] Rahli, V., Guaspari, D., Bickford, M., Constable, R.L.: Formal specification,
verification, and implementation of fault-tolerant systems using EventML.
ECEASST 72 (2015), https://doi.org/10.14279/tuj.eceasst.72.1013
[31] Rahli, V., Guaspari, D., Bickford, M., Constable, R.L.: EventML: Spec-
ification, verification, and implementation of crash-tolerant state ma-
chine replication systems. Sci. Comput. Program. 148, 26–48 (2017),
https://doi.org/10.1016/j.scico.2017.05.009
[32] Reynolds, J.C.: Separation logic: A logic for shared mutable data structures.
In: 17th IEEE Symposium on Logic in Computer Science (LICS 2002), 22-25
July 2002, Copenhagen, Denmark, Proceedings, pp. 55–74, IEEE Computer
Society (2002), https://doi.org/10.1109/LICS.2002.1029817
[33] da Rocha Pinto, P., Dinsdale-Young, T., Gardner, P.: Tada: A logic for time
and data abstraction. In: Jones, R.E. (ed.) ECOOP 2014 - Object-Oriented
Programming - 28th European Conference, Uppsala, Sweden, July 28 -
August 1, 2014. Proceedings, Lecture Notes in Computer Science, vol. 8586,
pp. 207–231, Springer (2014), https://doi.org/10.1007/978-3-662-44202-9_9
[34] Sergey, I., Nanevski, A., Banerjee, A.: Mechanized verification of fine-grained
concurrent programs. In: Grove, D., Blackburn, S. (eds.) Proceedings of the
36th ACM SIGPLAN Conference on Programming Language Design and
Implementation, Portland, OR, USA, June 15-17, 2015, pp. 77–87, ACM
(2015), https://doi.org/10.1145/2737924.2737964
[35] Sergey, I., Wilcox, J.R., Tatlock, Z.: Programming and proving
with distributed protocols. PACMPL 2(POPL), 28:1–28:30 (2018),
https://doi.org/10.1145/3158116
[36] Svendsen, K., Birkedal, L.: Impredicative concurrent abstract predicates.
In: Shao, Z. (ed.) Programming Languages and Systems - 23rd European
Symposium on Programming, ESOP 2014, Held as Part of the European Joint
Conferences on Theory and Practice of Software, ETAPS 2014, Grenoble,
France, April 5-13, 2014, Proceedings, Lecture Notes in Computer Science,
vol. 8410, pp. 149–168, Springer (2014), https://doi.org/10.1007/978-3-642-
54833-8_9
[37] Timany, A., Stefanesco, L., Krogh-Jespersen, M., Birkedal, L.: A logical
relation for monadic encapsulation of state: proving contextual equiva-
lences in the presence of runST. PACMPL 2(POPL), 64:1–64:28 (2018),
https://doi.org/10.1145/3158152
[38] Toninho, B., Caires, L., Pfenning, F.: Dependent session types via intuitionis-
tic linear type theory. In: Schneider-Kamp, P., Hanus, M. (eds.) Proceedings
of the 13th International ACM SIGPLAN Conference on Principles and
Practice of Declarative Programming, July 20-22, 2011, Odense, Denmark,
pp. 161–172, ACM (2011), https://doi.org/10.1145/2003476.2003499
[39] Turon, A., Dreyer, D., Birkedal, L.: Unifying refinement and hoare-style
reasoning in a logic for higher-order concurrency. In: Morrisett, G., Uustalu,
T. (eds.) ACM SIGPLAN International Conference on Functional Program-
ming, ICFP’13, Boston, MA, USA - September 25 - 27, 2013, pp. 377–390,
ACM (2013), https://doi.org/10.1145/2500365.2500600

[40] Wilcox, J.R., Woos, D., Panchekha, P., Tatlock, Z., Wang, X., Ernst, M.D.,
Anderson, T.E.: Verdi: a framework for implementing and formally verifying
distributed systems. In: Grove, D., Blackburn, S. (eds.) Proceedings of the
36th ACM SIGPLAN Conference on Programming Language Design and
Implementation, Portland, OR, USA, June 15-17, 2015, pp. 357–368, ACM
(2015), https://doi.org/10.1145/2737924.2737958

Continualization of Probabilistic Programs
With Correction

Jacob Laurel and Sasa Misailovic

University of Illinois Urbana-Champaign, Department of Computer Science
Urbana, Illinois 61820, USA
{jlaurel2,misailo}@illinois.edu

Abstract. Probabilistic Programming offers a concise way to represent


stochastic models and perform automated statistical inference. However,
many real-world models have discrete or hybrid discrete-continuous dis-
tributions, for which existing tools may suffer non-trivial limitations.
Inference and parameter estimation can be exceedingly slow for these
models because many inference algorithms compute results faster (or
exclusively) when the distributions being inferred are continuous. To
address this discrepancy, this paper presents Leios. Leios is the first ap-
proach for systematically approximating arbitrary probabilistic programs
that have discrete, or hybrid discrete-continuous random variables. The
approximate programs have all their variables fully continualized. We
show that once we have the fully continuous approximate program, we
can perform inference and parameter estimation faster by exploiting the
existing support that many languages offer for continuous distributions.
Furthermore, we show that the estimates obtained when performing in-
ference and parameter estimation on the continuous approximation are
still comparably close to both the true parameter values and the esti-
mates obtained when performing inference on the original model.

Keywords: Probabilistic Programming · Program Transformation · Continuity · Parameter Synthesis · Program Approximation

1 Introduction

Probabilistic programming languages (PPLs) offer an intuitive way to model


uncertainty by representing complex probability models as simple programs [28].
A probabilistic programming system then performs fully automated statistical
inference on this program by conditioning on observed data, to obtain a posterior
distribution, all while hiding the intricate details of this inference process.
Probabilistic inference is a computationally hard task, even for programs
containing only Bernoulli distributions (#P-complete [18]), but prior work has
shown that for many inference algorithms, continuous and smooth distributions
(such as Gaussians) can be significantly easier to handle than the distributions
having discrete components or discontinuities in their densities [15, 53, 52, 9, 56].


Fig. 1: Overview of Leios

However, many popular Bayesian models can have distributions which are
discrete or hybrid discrete-continuous mixtures (denoted simply as “hybrid”)
leading to computationally inefficient inference for much the same reason. Par-
ticularly when the observed variable is a discrete-continuous mixture, inference
may fail altogether [65]. Likewise, even if the observed variable and likelihood are continuous, the prior or important latent variables may be discrete (e.g., Binomial), leading to an equally difficult discrete inference problem [61, 50].
In fact, a number of popular inference algorithms such as Hamiltonian Monte
Carlo [48], NUTS [31, 50], or versions of Variational Inference (VI) [9] only work
for restricted classes of programs (e.g., by requiring each latent variable to be continuous)
to avoid these problems. Furthermore, we cannot always marginalize away the
program’s discrete component since it is often precisely the one we are interested
in. Even if the parameter were one that could be safely marginalized out, doing so may require the programmer to use advanced domain knowledge to analytically derive a new model and rewrite the program completely, which can be well beyond the abilities of the average PPL user.
Problem statement: We address the question of how to accurately approx-
imate the semantics of a probabilistic program P whose prior or likelihood is
either discrete or hybrid, with a new program PC , where all variables follow
continuous distributions, so that we can exploit the aforementioned inference
algorithms to improve inference in an easy, off-the-shelf fashion.
While a programmer could manually rewrite the probabilistic program or
model and apply approximations in an ad hoc manner, such as simply adding
Gaussian noise to each variable, this would be neither sufficient nor wise. For
instance, it has been shown that when a model contains Gaussians, how they
are programmatically written and parametrized can impact the inference time and
quality [29, 5]. Also, by not correcting for continuity in the program’s branch
conditions, one could significantly alter the probability of executing a particular
program branch, and hence alter the overall distribution represented by the
probabilistic program.
Leios: We introduce a fully automated program analysis framework to continu-
alize probabilistic programs for significantly improved inference performance, es-
pecially in cases where inference was originally intractable or prohibitively slow.
An input to Leios is a probabilistic program, which consists of (1) a model
that specifies the prior distributions and how the latent variables are related,

(2) specifications of observable variables, and (3) specifications of data sets. Leios
transforms the model, given the set of the observable variables. This model is
then substituted back into the original program to produce a fully continuous
probabilistic program leading to greatly improved inference. Furthermore the
approximated program can easily be reused with different, unseen data.
Figure 1 presents the main workflow of Leios:
– Distribution transformer and Boolean predicate correction: Leios first finds
individual discrete distribution sample statements to replace with continu-
ous approximations based on known convergence theorems that specifically
match the distributions’ first moments [23]. Leios then performs a dataflow
analysis to identify and then correct Boolean predicates in branches to best
preserve the original program’s probabilistic control flow. To correct Boolean
predicates, we convert the program to a sketch and fill in the predicates with
holes that will then be synthesized with the optimal values. We ensure that
the distribution of the model’s observed variables is fully continuous with
a differentiable density function, by transforming it using an approach that
adapts Smooth Interpretation [14] to probabilistic programs. We describe
the transformations in Section 4.
– Parameter Synthesizer: Leios determines the optimal parameters which min-
imize a numerical approximation of the Wasserstein Distance to fill in the
holes in the program sketch. This step of the algorithm can be thought of as
a “training phase” much like in machine learning, and we need only perform
it once for a given program, regardless of the number of times we will later
perform inference on different data sets. These parameters correspond to
continuity correction factors in classical probability theory [23]. We describe
the synthesizer in Section 5.
Contributions: This paper makes the following main contributions:
– Concept: To the best of our knowledge, Leios is the first technique to auto-
mate program transformations that approximate discrete or hybrid discrete-
continuous probabilistic programs with fully continuous ones to improve in-
ference. It combines insights from probability theory, program analysis, com-
piler autotuning, and machine learning.
– Program Transformation: Leios implements a set of transformations on
distributions and the conditional statements that can produce provably con-
tinuous probabilistic programs that approximate the original ones.
– Parameter Synthesis: We present a synthesis algorithm that corrects the
probabilities of taking specific branches in the probabilistic program and
improves the overall inference accuracy.
– Evaluation: We evaluated Leios on a set of ten benchmarks from existing
literature and two systems, WebPPL (using MCMC sampling) and Pyro
(using stochastic variational inference). The results demonstrate that Leios
can achieve a substantial decrease in inference time compared to the origi-
nal model, while still achieving high inference accuracy. We also show how
a continualized program allows for easy off-the-shelf inference that is not
always readily available to discrete or hybrid models.

(a) Program P:

 1  Data := [12, 8, ...];
 2
 3  Model {
 4    prior = Uniform(20, 50);
 5    Recruiters = Poisson(prior);
 6
 7    perfGPA = 4;
 8    regGPA = 4 * Beta(7, 3);
 9    GPA = Mix(perfGPA, .05, regGPA, .95)
10
11    if (GPA == 4) {
12      Interviews = Bin(Recruiters, .9);
13    } else if (GPA > 3.5) {
14      Interviews = Bin(Recruiters, .6);
15    } else {
16      Interviews = Bin(Recruiters, .5);
17    }
18
19    Offers = Bin(Interviews, 0.4);
20  }
21
22  for d in Data {
23    factor(Offers, d);
24  }
25
26  return prior;

(b) Continualized Model Sketch:

 1  Model {
 2    prior = Uniform(20, 50);
 3    mu_p = prior;
 4    sigma_p = sqrt(prior);
 5    Recruiters = Gaussian(mu_p, sigma_p);
 6
 7    perfGPA = Gaussian(4, β);
 8    regGPA = 4 * Beta(7, 3);
 9    GPA = Mix(perfGPA, .05, regGPA, .95)
10
11    if (4 - θ1 < GPA < 4 + θ2) {
12      mu = Recruiters * 0.9;
13      sigma = sqrt(Recruiters * 0.9 * 0.1);
14      Interviews = Gaussian(mu, sigma);
15    } else if (GPA > 3.5 + θ3) {
16      mu = Recruiters * 0.6;
17      sigma = sqrt(Recruiters * 0.6 * 0.4);
18      Interviews = Gaussian(mu, sigma);
19    } else {
20      mu = Recruiters * 0.5;
21      sigma = sqrt(Recruiters * 0.5 * 0.5);
22      Interviews = Gaussian(mu, sigma);
23    }
24    mu2 = Interviews * 0.4;
25    sigma2 = sqrt(Interviews * 0.4 * 0.6);
26    Offers = Gaussian(mu2, sigma2);
27  }

Fig. 2: (a) Program P and (b) the Continualized Model Sketch

2 Example
Figure 2 (a) presents a program that infers the parameters of the distribution
modeling the number of recruiters coming to a recruiting fair, given the numbers of offers that multiple students receive (line 1). As the number of recruiters
may vary year to year, we model this count as a Poisson distribution (line 5).
However, to accurately quantify how much this count varies year to year, we
want to estimate the unknown parameter of this Poisson variable. We thus place
a uniform prior over this parameter (line 4).
The example represents the student GPAs in lines 7-9: it is either a perfect
4.0 score or any number between 0 and 4. We model the perfect GPA with a dis-
crete distribution that has all the probability mass at 4.0 (line 7). To model the
imperfect GPA, we use a Beta distribution (line 8), scaled by 4 to lie in the range
[0.0, 4.0]. Finally, the distribution of the GPAs is a mixture of these two compo-
nents (line 9). Our mixture assumes that 5% of students obtain perfect GPAs.
Because the GPA impacts the number of interviews a student receives, our
model incorporates control flow where each branch captures the distribution
of interviews received, conditioned on the GPA being in a certain range (lines
11-17). Each student’s resume is available to all recruiters and each recruiter
can request an interview or not, hence all three of the Interviews distributions
follow a Binomial distribution (here denoted Bin) with the same n (number of
recruiters) but with different probabilities (higher probabilities for higher GPAs).
From the factor statement (line 23) we see that the Offers variable governs the

distribution of the observed data, hence it is the observed variable. Furthermore,


given the values of all latent variables, Offers follows a Binomial distribution
(line 19), hence the likelihood function of this program is discrete.
This program poses several challenges for inference. First, it contains dis-
crete latent variables (such as the Binomials), which are expensive to sample
from or rule out certain inference methods [26]. Second, it contains a hybrid
discrete-continuous distribution governing the student GPA, and such hybrid
distributions are challenging for inference algorithms [65]. Third, the model has
complex control flow introduced by the if statements, making the observable
data follow a (potentially multimodal) mixture distribution, which is yet an-
other obstacle to efficient inference [43, 17]. Lastly, the discrete distribution of
the observed data and likelihood also hinder the inference efficiency [61, 50, 59].

2.1 Continualization
Our approach starts from the observation that inference with continuous distri-
butions is often more efficient for several inference algorithms [53, 52, 56]. Leios
first continualizes discrete and hybrid distributions in the original model. Start-
ing in line 5 in Figure 2 (b), we approximate the Poisson variable with a Gaussian
using a classical result [16], hence relaxing the constraint that the number of re-
cruiters be an integer. (For ease of presentation we created new variables mu_p and sigma_p corresponding to the parameters of the approximation; Leios simply inlines these.) We next approximate the discrete component of the GPA
hybrid mixture distribution by a Gaussian centered at 4 and small tunable stan-
dard deviation β (line 7). The GPA is now a mixture of two continuous distri-
butions. We then transform all of the Binomials to Gaussians (lines 14, 18, 22,
and 26) using another classic approximation [23].
Finally, Leios smooths the observed variables by a Gaussian to ensure the
likelihood function is both fully continuous and differentiable. In this example
we see that the approximation of the Binomial already makes the distribution of
Offers (given all latent values) a Gaussian, hence this final step is not needed.
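The Poisson and Binomial replacements above are the classical moment-matching approximations Poisson(λ) ≈ Gaussian(λ, √λ) and Bin(n, p) ≈ Gaussian(np, √(np(1−p))) [16, 23]. The following numpy sketch (an illustration with made-up parameter values, not Leios code) shows both replacements side by side:

    import numpy as np

    rng = np.random.default_rng(0)

    lam = 35.0                                            # illustrative rate
    poisson_draws = rng.poisson(lam, 100_000)             # original: Poisson(prior)
    gauss_draws = rng.normal(lam, np.sqrt(lam), 100_000)  # matched mean/variance

    n, p = 40, 0.9                                        # illustrative Binomial
    binom_draws = rng.binomial(n, p, 100_000)             # original: Bin(n, p)
    gauss_bin = rng.normal(n * p, np.sqrt(n * p * (1 - p)), 100_000)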
After continualization, the GPA cannot be exactly 4.0, thus we need to re-
pair the first conditional branch of the continualized program. In line 11, we re-
place the exact equality predicate with the interval predicate 4-θ1 < GPA < 4+θ2
where each θ is a hole whose value Leios will synthesize. Leios finds all such
branching predicates by tracking transitive data dependencies of all continual-
ized variables.

2.2 Parameter Synthesis


Our continuous approximation should be close enough to the original model
such that upon performing inference on the approximation, the estimations ob-
tained will also be close to the ground-truth values. Hence Leios needs to ensure
that the values synthesized for each θ are such that for every conditional state-
ment, the probability of executing the true branch in the continualized program
roughly matches the original (ensuring similar likelihoods). In probability the-
ory, this value has a natural interpretation as a continuity correction factor as
Continualization of Probabilistic Programs With Correction 371

(a) The fully continualized model:

 1  Model {
 2    prior = Uniform(20, 50);
 3    mu_p = prior;
 4    sigma_p = sqrt(prior);
 5    Recruiters = Gaussian(mu_p, sigma_p);
 6
 7    perfGPA = Gaussian(4, 0.1);
 8    regGPA = 4 * Beta(7, 3);
 9    GPA = Mix(perfGPA, .05, regGPA, .95);
10
11    if (3.99999 < GPA < 4.95208) {
12      mu = Recruiters * 0.9;
13      sigma = sqrt(Recruiters * 0.9 * 0.1);
14      Interviews = Gaussian(mu, sigma);
15    } else if (GPA > 3.500122) {
16      mu = Recruiters * 0.6;
17      sigma = sqrt(Recruiters * 0.6 * 0.4);
18      Interviews = Gaussian(mu, sigma);
19    } else {
20      mu = Recruiters * 0.5;
21      sigma = sqrt(Recruiters * 0.5 * 0.5);
22      Interviews = Gaussian(mu, sigma);
23    }
24
25    mu2 = Interviews * 0.4;
26    sigma2 = sqrt(Interviews * 0.4 * 0.6);
27    Offers = Gaussian(mu2, sigma2);
28  }

(b) [Plot: convergence of the synthesis step for multiple β.]

Fig. 3: (a) the fully continualized model and (b) Convergence of the Synthesis Step for multiple β.

it “corrects’ the probability of a predicate being true after applying continuous


approximations. For the (GPA == 4) condition, we might think about using a
typical continuity correction factor of 0.5 [23], and transform it to 4-0.5 < GPA
< 4+0.5. However, in that case, the second else if (GPA > 3.5) branch would
never execute, thus significantly changing the program’s semantics (and thus the
likelihood function). Experimentally, such an error can lead to highly inaccurate
inference results.
Hence we must synthesize a better continuity correction factor that makes the approximated model "closest" to the original program's distribution with respect to a well-defined distance metric between probability distributions. In this paper, we will
use the common Wasserstein distance, which we describe later in Section 5. The
objective function aims to find the continuity correction factors that minimize
the Wasserstein distance between the original and continualized models.
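For intuition, for a one-dimensional observed variable this objective can be estimated from samples: the empirical Wasserstein-1 distance between two equal-size sample sets is the mean absolute difference of their sorted values. A small sketch (an illustration only; sample_original and sample_cont are hypothetical samplers for the observed variable of the original and continualized programs):

    import numpy as np

    def wasserstein_1(xs, ys):
        # empirical 1-D W1 for equal-size samples: mean distance of sorted draws
        return np.mean(np.abs(np.sort(xs) - np.sort(ys)))

    def objective(thetas, n=10_000):
        # sample_original / sample_cont are hypothetical program samplers
        xs = np.array([sample_original() for _ in range(n)])
        ys = np.array([sample_cont(thetas) for _ in range(n)])
        return wasserstein_1(xs, ys)   # minimized over the correction factors θ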
Figure 3 (a) shows the continualized model. Leios calculated that the optimal
values for the first branch are θ1 = 0.00001 (hence the lower bound is 3.99999)
and θ2 = 0.95208 (hence the upper bound is 4.95208) in line 11, and θ3 = 0.00012
(hence the lower bound is 3.500122) for the branch in line 15. Intuitively the
synthesizer found the upper bound 4.95208 so that any sample larger than 4
(which must have come from the right tail of the continualized perfect GPA)
is consumed by the first branch, instead of accidentally being consumed by the
second branch.

Fig. 4: Visual comparison between the model distribution of the original program, with naive smoothing, and with Leios (both with β = 0.1)

Another part of the synthesis step is to make sure that approximations do


not introduce run-time errors. Since Interviews is now sampled from a Gaussian, there is a small possibility that it could become negative, thus causing
a runtime error (since we later take its square root). By dynamically sampling
the continualized model during the parameter synthesis, as part of a light-weight
auto-tuning step, Leios checks if such an error exists. If it does, Leios can instead
use a Gamma approximation (which is always non-negative).
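A sketch of that check (an illustration with made-up parameter values; not Leios's auto-tuner): sample the Gaussian approximation, and if any draw is negative, fall back to a Gamma whose shape and scale match the same mean and variance.

    import numpy as np

    rng = np.random.default_rng(0)

    def matched_gamma(mean, std, size):
        shape = (mean / std) ** 2        # k = mean^2 / variance
        scale = std ** 2 / mean          # theta = variance / mean
        return rng.gamma(shape, scale, size)

    mean, std = 36.0, np.sqrt(36.0 * 0.9 * 0.1)   # illustrative parameters
    draws = rng.normal(mean, std, 10_000)
    if (draws < 0).any():                # a later sqrt() would fail at run time
        draws = matched_gamma(mean, std, 10_000)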
While continualization incurs additional computational cost, this cost is typi-
cally amortized. In particular, continualization needs to be performed only once.
The continualized model can then be used multiple times for inference on
different data-sets. Further, we experimentally observed that our synthesis step
is fast. In this example, for all the values of β we evaluated, this step required
only a few hundred iterations to converge to the optimal continuity correction
factors, as shown in Figure 3 (b).

2.3 Improving Inference


Upon constructing the continuous approximation of the model, we now wish to
perform inference by conditioning upon the outcomes of 25 sampled students.
To make a fair comparison, we compile both the original and continuous versions
down to WebPPL [26] and run MCMC inference (with 3500 samples and a burn-
in of 700). We also seek to understand how smoothing latent variables improves
inference, thus we also compare against a naively continualized version where
only the observed variable was smoothed using the same β, number of MCMC
samples and burn-in.
Figure 4 presents the distribution of the Offers variable in the original
model, naively smoothed model, and the Leios-optimized model. The continu-
ous approximation achieved by Leios is smooth and unimodal, unlike the naively
smoothed approximation, which is highly multimodal. However, all models have similar means.
Using these three models for inference, Figure 5 (a) presents the posterior
distribution of the variable param for each approach. We finally take the mean as

Metric       Leios   Naive   Original
Accuracy     0.058   0.069   0.090
Runtime (s)  0.604   0.631   0.805

Fig. 5: (a) Posteriors of each method – the true value is equal to 37. (b) Avg. accuracy and inference time; the bars represent accuracy (left Y-axis), the lines represent time (right Y-axis).

the point-estimate, τest, of the parameter's true value τ. Figure 5 (b) presents the run time and the error ratio, |τ − τest|/τ, for each approach (for the given true value of 37). It shows that our continualized version leads to the fastest inference.

3 Syntax and Semantics of Programs


We present the syntax and semantics of the probabilistic programming language on which our analysis is defined.

3.1 Source Language Syntax


Program ::= DataBlock ? ; Model { Stmt + } ; ObserveBlock ? ; return Var ;
Stmt ::= skip | abort | Var := Expr | Var := Dist | CONST Var := Expr
| Stmt ; Stmt | { Stmt } | condition ( BExpr )
| if ( BExpr ) Stmt else Stmt | for i = Int to Int Stmt
| while ( BExpr ) Stmt
Expr ::= Expr ArithOp Expr | f (Expr ) | Real | Int | Var
BExpr ::= BExpr or BExpr | BExpr and BExpr | not BExpr
| Expr RelOp Expr | ( BExpr )
DataBlock ::= Data:= [(Int | Real )∗ ]
ObserveBlock ::= for D in Data { factor(Var,D); }
Dist ::= ContDist | DiscDist
ContDist ∈ {Gaussian, Uniform, etc.}, DiscDist ∈ {Binomial, Bernoulli, etc.}
ArithOp ∈ {+, −, ∗, /, ∗∗}, f ∈ {log, abs, sqrt, exp}, RelOp ∈ {<, ≤, ==}

The syntax is similar to the ones used in [24, 51]. Unlike [51], our syntax does include
exact equality predicates, which introduce difficulties during the approximation. To give
the developer the flexibility in selecting which parts of the program to continualize,
we add the CONST annotation. It indicates that the variable’s distribution should not

be continualized. Until explicitly noted, we will not use this annotation in the rest
of the paper. For simplicity of exposition, we present only a single DataBlock and
ObserveBlock, but our approach naturally extends to the cases with multiple data and
observed variables.

Measure Theory Preliminaries Though various semantics have been proposed


[44, 36, 7], we adapt the sub-probability measure transformer semantics of Dahlqvist et
al. [19]. We will use the terms distribution and measure interchangeably.

Definition 1. A program state σ ∈ S is an n-tuple of real numbers: S = Rn, where the ith tuple element corresponds to the ith program variable's value.

Definition 2. A Σ-algebra on a set X (denoted as ΣX) is a collection of subsets of X such that (1) X ∈ ΣX, (2) Xi ∈ ΣX ⇒ Xi^c ∈ ΣX (closure under complementation), and (3) X1, X2, ... ∈ ΣX ⇒ ∪i Xi ∈ ΣX (closure under countable union). The tuple (X, ΣX) is called a measurable space. Our semantics is defined on the Borel measurable space (Rn, B{Rn}) where B{Rn} is the standard Borel Σ-algebra over Rn.

Definition 3. A measure μ over Rn is a mapping from B{Rn} to [0, +∞) such that μ(∅) = 0 and μ(∪i∈N Xi) = Σi∈N μ(Xi) when all Xi are mutually disjoint. A probability measure is a measure that satisfies μ(Rn) = 1 and a sub-probability measure is one satisfying μ(Rn) ≤ 1. The simplest measure is the Dirac measure, denoted as δai(S) = 1 if ai ∈ S else 0. We denote the set of all sub-probability measures as M(Rn).

Definition 4. Given measures μ1, μ2 ∈ M(R), the product measure μ1 ⊗ μ2 ∈ M(R2) is defined as μ1 ⊗ μ2(B1 × B2) = μ1(B1)μ2(B2) for B1, B2 ∈ B{R}.

Definition 5. Given a measure μ ∈ M(Rn), the marginal measure of a variable xi is defined as μxi(Bi) = μ(R × ⋯ × R × Bi × R × ⋯) for Bi ∈ B{R}.

Definition 6. A kernel is a function κ : S → M(Rn ) mapping states to measures.

Definition 7. The Lebesgue measure on R (denoted Leb) is the measure that maps
any interval to its length, e.g., Leb([a, b]) = b − a. The Lebesgue measure in Rn is
simply the n-fold product measure of n copies of the Lebesgue measure on R.

Definition 8. A measure μ is absolutely continuous with respect to the Lebesgue measure Leb (denoted as μ ≪ Leb, or simply: μ is A.C.) iff for any measurable set S, Leb(S) = 0 ⇒ μ(S) = 0.

3.2 Semantics
Expression Level Semantics Arithmetic expression semantics are standard: they map states σ ∈ Rn to values, i.e., ⟦Expr⟧ : Rn → R. Boolean expression semantics, denoted ⟦BExpr⟧, simply return the set of states Bi ∈ B{Rn} satisfying the Boolean conditional.

⟦c⟧(σ) = c    ⟦xi⟧(σ) = σ[xi]    ⟦t1 op t2⟧(σ) = ⟦t1⟧(σ) op ⟦t2⟧(σ)    ⟦f(t1)⟧(σ) = f(⟦t1⟧(σ))

⟦B1 and B2⟧ = ⟦B1⟧ ∩ ⟦B2⟧    ⟦B1 or B2⟧ = ⟦B1⟧ ∪ ⟦B2⟧    ⟦not B1⟧ = Rn \ ⟦B1⟧

⟦e1 relop e2⟧ = {σ ∈ Rn | ⟦e1⟧(σ) relop ⟦e2⟧(σ)}



Distribution Semantics The interpretation of a distribution is a kernel, κ, map-


ping a state to the measure associated with the specific parametrization of the dis-
tribution in that state. Since measures are set functions we will represent them as λ
abstractions. The signature is Dist : Rn → (B{R} → [0, 1])

κCont (σ) = ContDist(e1 , e2 , ...)(σ) = λS. 1S (v) · fCont (v; e1 (σ), e2 (σ), ...)
v∈R


κDisc (σ) = DiscDist(e1 , e2 , ...)(σ) = λS. fDisc (v; e1 (σ), e2 (σ), ...)
v∈Supp∩S

where f_Cont and f_Disc are the density and mass functions, respectively, of the prim-
itive distribution being sampled from (e.g., f_Gauss(x; μ, σ) = (1/(σ√(2π))) · e^{−(x−μ)²/(2σ²)} · 1_{σ>0})
and Supp is the distribution’s support.
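As a concrete, purely illustrative rendering of this kernel view (ours, not taken from the paper’s implementation), one can represent a measure in Python by the mass it assigns to intervals:

```python
from scipy import stats

# A kernel maps a program state (here: a dict of variable values) to a measure.
# A measure over R is represented by the mass it assigns to an interval [a, b],
# which suffices to illustrate the lambda-abstraction view above.
def gaussian_kernel(mean_expr, std_expr):
    def kappa(state):
        d = stats.norm(mean_expr(state), std_expr(state))
        return lambda a, b: d.cdf(b) - d.cdf(a)  # measure of the set [a, b]
    return kappa

# x2 := Gaussian(x1, 2.0): the parameter expressions are evaluated in the state.
kappa = gaussian_kernel(lambda s: s["x1"], lambda s: 2.0)
mu = kappa({"x1": 3.0})
print(mu(1.0, 5.0))  # mass that the resulting measure assigns to [1, 5]
```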

Statement Level Semantics The statement-level semantics are shown in Figure 6.
We interpret each statement as a (sub-)measure transformer; hence the semantic
signature is ⟦Statement⟧ : M(R^n) → M(R^n). The skip statement returns the original
measure and the abort statement transforms any measure to the 0 sub-measure. The
condition statement removes measure from regions not satisfying the Boolean guard
B. The factor statement can be seen as a “smoothed” version of condition that uses g,
a function of the observed data and its distribution, to re-weight the measure associated
with a set by some real value in [0, 1] (as opposed to strictly 0 or 1). Deterministic
assignment transforms the measure into one which assigns to any set of states S the
same value that the original measure μ would have assigned to all states that end
up in S after executing the assignment statement. Probabilistic Assignment updates
the measure so that xi ’s marginal is the measure associated with Dist, but with the
parameters governed by μ.
An if else statement can be decomposed into the sum of the true branch’s mea-
sure and the false branch’s measure. The while loop semantics are the solution to the
standard least fixed point equation [19], but can also be viewed as a mixture distri-
bution where each mixture component corresponds to going through the loop k times.
A for loop is just syntactic sugar for a sequencing of a fixed number of statements.
We note that the Data block does not affect the measure (it is also syntactic sugar,
and could simply be inlined in the Observe block). The program can be thought of as
starting in some initial input measure μ0 where each variable is undefined (which could
simply mean initialized to some special value or even just zero), and as each variable
gets defined, that variable’s marginal (and hence the joint measure μ) gets updated.
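As a small worked example (ours): let n = 1 and let μ be the uniform probability measure on [0, 1]. Then ⟦condition(x_1 < 0.5)⟧(μ) = λS. μ(S ∩ [0, 0.5)), a sub-probability measure of total mass μ([0, 0.5)) = 0.5; renormalizing it recovers the uniform distribution on [0, 0.5). This matches the intuition that condition removes exactly the mass of the states violating the guard.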

4 Continualizing Probabilistic Programs


Our goal is to synthesize a new continuous approximation of the original program P .
We formally define this via a transformation operator TPβ[•] : Program → Program.
Our approach operates in two main steps:

(1) We first locally approximate the program’s prior and latent variables using a series
of program transformations to best preserve the local structural properties of the
program and then apply smoothing globally to ensure that the likelihood function
is both fully continuous and differentiable.

⟦skip⟧(μ) = μ        ⟦abort⟧(μ) = λS. 0        ⟦P1; P2⟧(μ) = ⟦P2⟧(⟦P1⟧(μ))

⟦condition(B)⟧(μ) = λS. μ(S ∩ ⟦B⟧)        ⟦factor(x_i, t)⟧(μ) = λS. ∫_{R^n} 1_S · g(t, σ) · μ(dσ)

⟦x_i := e⟧(μ) = λS. μ({(x_1, ..., x_n) ∈ R^n | (x_1, ..., x_{i−1}, ⟦e⟧(x_1, ..., x_n), x_{i+1}, ..., x_n) ∈ S})

⟦x_i := Dist(e_1, ..., e_k)⟧(μ) = λS. ∫_{R^n} μ(dσ) · (δ_{x_1} ⊗ ... ⊗ δ_{x_{i−1}} ⊗ ⟦Dist(e_1, ..., e_k)⟧(σ) ⊗ δ_{x_{i+1}} ⊗ ...)(S)

⟦if (B) {P1} else {P2}⟧(μ) = ⟦P1⟧(⟦condition(B)⟧(μ)) + ⟦P2⟧(⟦condition(not B)⟧(μ))

⟦while (B) { P1 }⟧(μ) = Σ_{k=0}^{∞} ⟦(condition(B); P1)^k; condition(not B)⟧(μ)

Fig. 6: Denotational Semantics of Probabilistic Programs

(2) We next synthesize a set of parameters that (approximately) minimize the distance
metric between the distributions of the original and continualized models, and we
use lightweight auto-tuning to ensure the approximations do not introduce runtime
errors.

4.1 Overview of the Algorithm


Algorithm 1 presents the technique for continualizing programs. It takes as input a
program P containing a prior or observed variable that is discrete (or hybrid) and
returns TPβ [P ], a probabilistic program representing a fully continuous random variable
with a differentiable likelihood function. The algorithm uses a tunable hyper-parameter
β ∈ (0, ∞) to control the amount of smoothing (as in [14]). A smaller β leads to less
smoothing, while a larger β leads to more smoothing; however, the smallest β does not
always lead to the best inference, and vice versa, as can be seen in Section 7.
In line 3 of Algorithm 1 Leios constructs a standard control flow graph (CFG)
to represent the program, using a method called GetCFG(). This data structure
forms the basis of Leios’s subsequent analyses. Each CFG node corresponds to a single
statement and contains all relevant attributes of that statement. Leios then uses this
CFG to build a data dependency graph (line 4) which will be used for checking which
variables are tainted by the approximations. In line 5 Leios then applies TPβ [•] to
obtain a continualized sketch, PC . Lastly, Leios synthesizes the optimal continuity
correction parameters (line 7), and in doing so, samples the program to detect if a
runtime error occurred, also returning a Boolean flag success to convey this information.
If a runtime error did occur we find the expression causing it (line 9) and then in
lines 10-12 reapply the safer transformations (e.g., Gamma instead of Gaussian) to all
possible dependencies which could have contributed to the runtime error.

4.2 Distribution and Expression Transformations


To continualize each variable, Leios mutates the individual distributions and expres-
sions assigned to latent variables within the program. We use a transform operator for
expressions and distributions TEβ [•]: Expr ∪Dist → Expr ∪Dist, which we define next.

Algorithm 1: Procedure for Continualizing a Probabilistic Program


1 function Continualize (P, β);
Input : A probabilistic program P containing discrete/hybrid observable
variables and/or priors and a smoothing factor β > 0
Output: A fully continuous probabilistic program PC
2 Acceptable ← False;
3 CF G ← GetCFG(P );
4 DataDepGraph ← ComputeDataFlow(CF G);
5 PC ← TPβ [P ]; /* apply all continuous transformations */
6 while not Acceptable do
7 PC , success ← Synthesize(PC , P );
8 if not success:
9 D ← getInvalidExpression();
10 Deps ← getDependencies(DataDepGraph,D);
11 forall Expression in Deps do
12 PC ← reapplySafeTransformation(PC , Expression);
13 else:
14 Acceptable ← True;
15 end
16 return PC

Transform Operator For Distributions and Expressions We now detail


the full list of continuous probability distribution transformations that TEβ [•] uses.

TEβ[E] =
  Gaussian(λ, √λ)                                    if E = Poisson(λ)
  Gamma(λ, 1)                                        if E = Poisson(λ) and Gaussian fails
  Gaussian(np, np(1−p))                              if E = Binomial(n, p)
  Gamma(n, p)                                        if E = Binomial(n, p) and Gaussian fails
  Uniform(a, b)                                      if E = DiscUniform(a, b)
  Exponential(p)                                     if E = Geometric(p)
  MixOfGauss_β([(1, p), (0, 1−p)])                   if E = Bernoulli(p)
  Beta(β, β(1−p)/p)                                  if E = Bernoulli(p) and MixOfGauss fails
  Mixture([(TEβ[D_1], p_1), ..., (TEβ[D_n], p_n)])   if E = Mixture([(D_1, p_1), ..., (D_n, p_n)])
  Gaussian(c, β)                                     if E = c (a constant)
  E                                                  if E = a·x_i + b (a ≠ 0)
  KDE(β)                                             if E ∈ DiscDist and not covered above
  Gaussian(E, β)                                     otherwise

The rationale for this definition is that these approximations all preserve key struc-
tural properties of the distributions’ shape (e.g., the number of modes) which have been
shown to strongly affect the quality of inference [25, 45, 17]. Second, these continuous
approximations all match the first moment of their corresponding discrete distributions,
which is another important feature that affects the quality of approximation [53]. We
refer the reader to [54] to see that for each distribution on the left, the corresponding
continuous distribution on the right has the same mean. These approximations are best
when certain limit conditions are satisfied, e.g., λ ≥ 10 when approximating a Poisson
distribution with a Gaussian; hence the values in the program itself do affect the overall
approximation accuracy.
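The moment-matching claim is easy to check numerically; the following snippet (our own illustration, using scipy) compares the means of several of the pairs above:

```python
from scipy import stats

lam, n, p = 4.0, 20, 0.3
pairs = [
    ("Poisson vs Gaussian",      stats.poisson(lam).mean(),
                                 stats.norm(lam, lam ** 0.5).mean()),
    ("Poisson vs Gamma",         stats.poisson(lam).mean(),
                                 stats.gamma(a=lam, scale=1.0).mean()),
    ("Binomial vs Gaussian",     stats.binom(n, p).mean(),
                                 stats.norm(n * p, (n * p * (1 - p)) ** 0.5).mean()),
    ("Geometric vs Exponential", stats.geom(p).mean(),
                                 stats.expon(scale=1 / p).mean()),
]
for name, discrete_mean, continuous_mean in pairs:
    print(f"{name}: {discrete_mean:.3f} vs {continuous_mean:.3f}")  # means agree
```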
However, if we are not careful, a statement level transformation could introduce
runtime errors. For example, a Binomial is always non-negative, but its Gaussian ap-
proximation could be negative. This is why TEβ [•] has multiple transformations for the
same distribution. For example, in addition to using a Gaussian to approximate both a
Binomial and a Poisson, we also have a Gamma approximation since a Gamma distri-
bution is always non-negative. Likewise we have a Beta approximation to a Bernoulli
if we require that the approximation also have support in the range [0, 1]. Leios uses
auto-tuning to safeguard against such errors during the synthesis phase: when
sampling the transformed program, if we encounter a runtime error of this nature,
we simply go back and try a safer (but possibly slower) alternative (Algorithm 1, line
12). Since there are only finitely many variables and (safer) transformations to apply,
this process will eventually terminate. For discrete distributions not supported by the
specific approximations, but with fixed parameters, we empirically sample them to get
a set of samples and then use a Kernel Density Estimate (KDE) [62] with a Gaussian
kernel (the KDE bandwidth is precisely β) as the approximation.
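A minimal sketch (ours) of this KDE fallback, using scipy’s Gaussian-kernel estimator with the bandwidth playing the role of β:

```python
import numpy as np
from scipy import stats

beta = 0.1
rng = np.random.default_rng(0)
# Empirically sample an unsupported discrete distribution with fixed parameters...
samples = rng.choice([0, 2, 5], size=2000, p=[0.2, 0.5, 0.3]).astype(float)
# ...and smooth the samples with a Gaussian KDE whose bandwidth is beta.
kde = stats.gaussian_kde(samples, bw_method=beta)
# The result is a fully continuous density that can be evaluated and resampled.
print(kde.evaluate([0.0, 2.0, 5.0]))
print(kde.resample(5))
```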
Lastly, by default all discrete random variables are approximated with continuous
versions; however, we leave the user the option to manually specify CONST in
front of a variable if they do not wish for it to be approximated (in which case we no
longer make any theoretical guarantees about continuity).
4.3 Influence Analysis and Control-Flow Correction of Predicates

Simply changing all instances of discrete distributions in the program to continuous


ones is not enough to closely approximate the semantics of the original program. We
additionally need to ensure that such changes do not introduce control flow errors into
the program, in the sense that quantitative properties such as the probability of taking
a particular branch need to be reasonably preserved.

Avoiding Zero Probability Events A major concern of the approximation is


to ensure that no zero-probability events are introduced, such as when we have an
exact equality “==” predicate in an if, observe or while statement and the vari-
able being checked was transformed from a discrete to a continuous type. For example,
discrete programs commonly have a statement like x := Poisson(1) followed by a con-
ditional such as if (x==4), because the probability that a discrete random variable
is exactly equal to a value can be non-zero. However upon applying our distribution
transformations and transforming the distribution of x from a discrete Poisson to a con-
tinuous Gaussian, the conditional statement “if (x==4)” now corresponds to a zero
probability (or measure zero) event, as the probability that an absolutely continuous
probability measure assigns to the singleton set {4} is by definition zero. Thus, if not
corrected for, we could significantly change the probabilities of taking certain branches
and hence the overall distribution of the program.
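The issue is easy to see numerically (our illustration): a Poisson assigns strictly positive probability to the singleton {4}, while its absolutely continuous Gaussian approximation assigns it measure zero, and only an interval predicate recovers nonzero mass:

```python
from scipy import stats

lam = 1.0
print(stats.poisson(lam).pmf(4))             # > 0: P(x == 4) under Poisson(1)
g = stats.norm(lam, lam ** 0.5)              # continualized Gaussian(lam, sqrt(lam))
print(g.cdf(4.0) - g.cdf(4.0))               # = 0: the singleton {4} has measure zero
theta = 0.5
print(g.cdf(4 + theta) - g.cdf(4 - theta))   # interval predicate restores nonzero mass
```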
The converse can also be true: applying approximations can make a zero proba-
bility event in the original program now have non-zero probability. For example, in
x := DiscUniform(1,5); if (x<3 and x>2) the true branch has probability zero of
executing but this becomes non-zero after approximations are applied. However, the
branch paths like these in the original model could be identified by symbolic analysis
(e.g., [24]) and removed via dead code elimination during pre-processing.

Correcting Control Flow Probabilities via Static Analysis To prevent


zero-probability events and ensure that the branch execution probabilities of the con-
tinualized program closely match the original’s, we use data dependence analysis to
track which if, while or condition statements have logical comparisons with vari-
ables “tainted” by the approximations. A variable v is “tainted” if it has a transitive
data dependence on an approximated variable, and we use reaching definitions analysis
[35] on the program’s CFG to identify these.
As shown in Algorithm 1 line 4, to compute the reaching definitions analysis we use
a method called ComputeDataFlow() as part of a pre-transformation pass whereby for
each program point in the CFG, each variable is marked with all the other variables
on which it has a data-dependence. These annotations are stored in a data structure
called DataDepGraph which maps nodes (program points) to sets of tuples where
each tuple contains a variable, the other variables it depends on (and where they are
assigned), and lastly, whether it will become tainted. Note that in the algorithm this
step is done before the previously discussed expression-level transformations, hence why
ComputeDataFlow() marks which variables will become continualized and which ones
will not (i.e if a variable already defines a continuous random variable or was annotated
with CONST). Furthermore, though we are computing the data dependencies before the
approximations, because the approximations do not re-order or remove statements, all
data dependencies will be the same before and after applying the approximations.

Transform Operator For Boolean Expressions We take all such control


predicates that contain an exact equality “==” comparison with a tainted variable and
transform these predicates from exact equality predicates to interval-style predicates.
Thus if we originally had a predicate of the form if(x==4) we will mutate this into a
predicate of the form if(x>4-θ1 && x<4+θ2 ) where θ are now placeholder values that
will need to be filled with a concrete value during the synthesis phase (Section 5). Hence
checking for exact equality gets relaxed to checking for containment within the interval
(4 − θ1 , 4 + θ2 ). We also need to correct < and <= predicates if one of the variables was
approximated or transitively affected by an approximation.
Hence we also define our transform operator TBβ [•] : BExpr → BExpr at the level
of Boolean expressions:

TBβ[(x == y)] = (y − θ_1 < x) and (x < y + θ_2)   by default
                (x == y)                          if CONST x and CONST y specified

TBβ[(x < y)]  = (x < y + θ)                       if x or y tainted
                (x < y)                           otherwise

TBβ[(x ≤ y)]  = (x ≤ y + θ)                       if x or y tainted
                (x ≤ y)                           otherwise

Because we have already pre-computed DataDepGraph one can check if a variable in


a given statement or expression is tainted (or marked as CONST) in constant time.
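A minimal sketch (ours; the predicate representation and the names tainted, const, and the theta placeholders are hypothetical) of how this Boolean transform can be realized over a small predicate AST:

```python
# A predicate is a tuple (op, lhs, rhs); the theta names are placeholders that
# the synthesis phase (Section 5) later fills with concrete values.
def transform_predicate(pred, tainted, const):
    op, x, y = pred
    if op == "==":
        if x in const and y in const:
            return pred                      # CONST on both sides: keep exact equality
        # relax x == y to membership in the interval (y - theta1, y + theta2)
        return ("and", ("<", f"{y} - theta1", x), ("<", x, f"{y} + theta2"))
    if op in ("<", "<=") and (x in tainted or y in tainted):
        return (op, x, f"{y} + theta")       # shift the bound by a correction factor
    return pred

print(transform_predicate(("==", "x", "4"), tainted={"x"}, const=set()))
```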
This correction has a natural interpretation in classical probability theory. It is
well known that to approximate a discrete distribution X with a continuous one X̂,
we need a continuity correction factor, θ, such that P (X < x) ≈ P (X̂ < x + θ) (hence
why TBβ[•] also corrects < and <= predicates). For simple approximations (e.g., Binomial
to Gaussian), the canonical correction factor is known (θ = 0.5) [23]; however, for the
general case, it is not. Furthermore, it has been shown that in many cases, 0.5 is not
the best correction factor [3].
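The effect of the canonical correction is easy to verify numerically (our illustration):

```python
from scipy import stats

n, p, k = 50, 0.3, 12
exact = stats.binom(n, p).cdf(k)                  # P(X <= 12) under Binomial(50, 0.3)
g = stats.norm(n * p, (n * p * (1 - p)) ** 0.5)   # Gaussian(np, sqrt(np(1-p)))
print(exact)
print(g.cdf(k))          # no correction: noticeably off
print(g.cdf(k + 0.5))    # canonical theta = 0.5: much closer to the exact value
```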

4.4 Bringing it all together: Full Program Transformations


Having defined the transformation for distributions, arithmetic and Boolean expres-
sions, we now define the program transformation operator TPβ[•] : Program → Program
inductively:

TPβ [P1 ; P2 ] = TPβ [P1 ]; TPβ [P2 ]


TPβ [if (B) {P1 } else {P2 }] = if (TBβ [B]) TPβ [P1 ] else TPβ [P2 ]
TPβ [while(B) P1 ] = while(TBβ [B]) TPβ [P1 ]
TPβ [condition(B)] = condition(TBβ [B])
TPβ [x := E] = x := TEβ [E]
TPβ [CONST x := E] = x := E

The abort, factor and skip statements and the DataBlock remain the same after
applying the transformation operator TPβ [•].

Ensuring Smoothness Upon applying the statement-level transformations and


performing both dataflow analysis and predicate mutations, Leios ensures each latent
variable comes from a continuous distribution. However a continuous distribution may
still have jump discontinuities or non-differentiable regions in its density function (such
as a uniform distribution), which can make inference difficult [66]. Furthermore it is
known that performing parameter estimation on data that is distributed according
to a discontinuous or non-smooth density function, or on distributions with non-smooth
likelihoods, can be just as challenging [50, 1, 59]. Thus, to make the program’s
likelihood function and density function of the observed data fully smooth, we need to
apply additional Gaussian smoothing.
Since it would be redundant to apply smoothing if we already knew a variable
came from a smooth distribution (as in the example), we make this simple check
appear in the factor statement).

TPβ[x_o := E] = x_o := E                  if x_o is already smooth
                x_o := Gaussian(E, β)     otherwise
We could perform additional smoothing for every variable to ensure each has a
differentiable density; however, we empirically observed that the added variance
accumulated to the point where inference quality deteriorated, hence we only apply
the additional smoothing to observed variables.
Having defined the statement-level transformations we now state a theorem about
TPβ [•] preserving continuity. As many applications may invoke inference at any point
in the program [46, 60], it is important that absolute continuity of each marginal hold
at every point.

Theorem 1. In the transformed program, TPβ [P ], the marginal sub-probability measure


of each variable, denoted μxi , is absolutely continuous with respect to the Lebesgue
measure (denoted μxi is A.C.) at each program point for which that variable is defined.

Proof. (sketch) To prove the theorem we will show that when any variable xi is initially
defined, it comes from an absolutely continuous distribution and furthermore that the
semantics of each statement in TPβ [P ] preserves the absolute continuity of each marginal
measure (where μxi ≡ μ(R × ... × Bi × R... × R)), equivalently for any statement, any
(already defined) variable xi and any Borel set Bi ∈ B{R}:

μ(R × ... × Bi × R... × R) is A.C. ⇒ statement(μ)(R × ... × Bi × R... × R) is A.C.

Case 1. skip and abort: Since ⟦skip⟧ is the identity measure transformer, if each defined
marginal measure μ_{x_i} was A.C. before, then it will trivially be so afterward,
since it is unchanged. ⟦abort⟧ sends each marginal to the 0 sub-measure (which
is trivially A.C.).

Case 2. condition and factor: Since factor and condition only lose measure, we have
⟦condition(B)⟧(μ)(S) ≤ μ(S) and ⟦factor(x_k, t)⟧(μ)(S) ≤ μ(S) for any Borel set S.
Thus μ(S) = 0 ⇒ ⟦condition(B)⟧(μ)(S) = 0 and μ(S) = 0 ⇒ ⟦factor(x_k, t)⟧(μ)(S) = 0,
since all measures are non-negative. Hence by transitivity, since μ(R × ... × B_i × R × ... × R) is A.C.,
⟦factor(x_k, t)⟧(μ)(R × ... × B_i × R × ... × R) is A.C., and likewise, for similar reasons, we
have that ⟦condition(B)⟧(μ)(R × ... × B_i × R × ... × R) is A.C.

Case 3. Assignment: Probabilistic assignment is straightforward. Since the continu-


alized program only samples from absolutely continuous distributions, the marginal
of the sampled variable xi will be A.C. and all other marginals μxj were A.C. by
assumption. Deterministic assignment has to be handled carefully. In the continual-
ized program the only deterministic assignments will be x_i := a*x_j + b; for a ≠ 0 (all
other assignments are smoothed). The marginal μ_{x_i}(S) is just μ_{x_j}(aS + b), where the
set aS + b ≡ {s ∈ R | a · s + b ∈ S}. However, by the assumption of the A.C. of x_j,
Leb(aS + b) = 0 ⇒ μ_{x_j}(aS + b) = 0, but Leb(S) = 0 ⇔ Leb(aS + b) = 0 [55], hence:
Leb(S) = 0 ⇒ Leb(aS + b) = 0 ⇒ μ_{x_j}(aS + b) = 0. Lastly, by the semantic definition
of x_i, we have that μ_{x_j}(aS + b) = 0 ⇒ μ_{x_i}(S) = 0, hence Leb(S) = 0 ⇒ μ_{x_i}(S) = 0 by
transitivity. All other marginals are unchanged, hence the A.C. of each is preserved.

Case 4. Sequencing, if and while: Intuitively, since the above statements each preserve
the A.C. of each marginal, any sequencing of them does too. Since the sum of two measures
that are both A.C. in each marginal is also A.C. in each marginal, if statements
preserve A.C. of each marginal. For this same reason, while loops also preserve A.C.

5 Synthesis of Continuity Correction Parameters


We now present our procedure for synthesizing optimal continuity correction parame-
ters which covers lines 6 to 15 in Algorithm 1. This can be thought of as a “training”
step which fits the continualized model to the original one. It is important to note that
this step is agnostic to the observed data (it only fits to the Model ), hence it need only
be done once offline, regardless of how many times we perform inference on new data
sets. Furthermore, even if we do not have parameters to synthesize, this step is still
useful for catching runtime errors caused by the approximations, so that we can go
back and apply safer approximations if necessary.

5.1 Optimization Framework


Ideally the posteriors of our approximated program TPβ[P] and the original P should
be reasonably close. However, a specific posterior is induced by the corresponding data
set; if our optimization objective tried to minimize the statistical distance from TPβ[P]
to P, we would simply be over-fitting to the data and we would not be able to re-use
TPβ [P ] for new data sets with different true parameters. Instead our objective is to
minimize the distance between the original model M , which is simply the fragment of
P that does not contain the data or observe block (and hence only defines the prior,
likelihood and latent variables), and the corresponding continualized approximation,
TPβ [M ]. To do so, we need to choose the best possible continuity correction factors,
θ, for TPβ [M ]. Thus we define the “optimal” parameters as those which minimize a
distance metric d between probability measures d : M(Rn ) × M(Rn ) → [0, ∞). We
also need to ensure that the metric can (a) compute the distance between discrete and
continuous distributions and (b) is such that if models or likelihoods are close with
respect to d, the posteriors should be as well.

Wasserstein Distance We choose to use the Wasserstein distance primarily be-


cause (1) it can measure the distance between a continuous and discrete distribution
(unlike KL-Divergence or Total Variation Distance) and (2) prior work has shown that
when performing inference, if using the Wasserstein distance as the chosen metric to
approximate a likelihood, the (approximate) posteriors induced are comparable to the
true posteriors (obtainable if one used the true likelihood) [49]. Additionally, unlike
other metrics, the Wasserstein metric incorporates the underlying difference in geom-
etry of the distributions (which strongly affects inference accuracy [37, 59]).
Let M (μ0 ) represent the renormalized measure associated to the observed vari-
ables of the original model and let TPβ [Mθ ](μ0 ) represent the observed variables of
the continualized model, but where a given continuity correction factor θ has been
substituted in (both measures start in initial distribution μ0 ). Furthermore, let J ⊆
M(R2 ) represent the set of all joint measures with marginal measures M (μ0 ) and
TPβ [Mθ ](μ0 ). Hence we now define the 1-Wasserstein Distance:

W(M(μ_0), TPβ[M_θ](μ_0)) = inf_{J∈𝒥} ∫ ‖x − y‖ dJ(x, y)    (1)

We also provide further justification why the Wasserstein Distance is a sensible


metric to use. It is well known that a mixture of Gaussians can converge in distribution
to any continuous random variable, however existing work has shown that a mixture
of Gaussians can approximate any discrete distribution in the Wasserstein Distance
arbitrarily well [20].
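Concretely (our illustration), the empirical 1-Wasserstein distance between samples of a discrete model and samples of its continualization can be computed directly with scipy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
lam = 4.0
discrete_samples = stats.poisson(lam).rvs(size=500, random_state=rng)
continual_samples = stats.norm(lam, lam ** 0.5).rvs(size=500, random_state=rng)
# Empirical 1-Wasserstein distance between the two sample sets.
print(stats.wasserstein_distance(discrete_samples, continual_samples))
```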

Objective Function We now formulate our optimization approach as follows, where


θ̂ is the parameter vector minimizing the Wasserstein Distance with respect to the
original model M , and d is the number of parameters to synthesize.

θ̂ = argmin_{θ∈(0,1)^d} W(M(μ_0), TPβ[M_θ](μ_0))    (2)

To restrict the search space we follow common practice [23, 3] by requiring each θi ∈
(0, 1). Such an optimization problem lacks a closed-form solution. Symbolically computing
the Wasserstein Distance is intractable, hence we numerically approximate it via the
empirical Wasserstein Distance (EWD) between observed samples of M and TPβ [Mθ ].
Because this step is fully dynamic (we run and sample the model), the samples are
conditioned upon successfully terminating, and hence the model’s sub-measure has
been implicitly renormalized to a full probability measure, thus justifying the use of a
fully renormalized measure in equations (1) and (2).

Algorithm 2: Synthesizing Optimal Continuity Correction Parameters


1 Function Synthesize(P, TPβ[P]);
Input : A program P and a continualized sketch TPβ [P ] with d parameters to
be synthesized
Output: A fully continuous probabilistic program PC and a binary flag
denoting the existence of a runtime error
2 if d==0 then
3 s ←sample(TPβ [P ],n);
4 if s==Error then
5 return TPβ [P ], false
6 end
7 end
8 else
9 M, TPβ [M ] ←getModel(P, TPβ [P ]);
10 for θi ∈ Grid([0, 1]d ) do
11 p, s ← Nelder-Mead(W, θi, M, TPβ[M], η, ε, n);
12 if s==Error then
13 return TPβ [P ], false
14 end
15 if W(p) < W(θ̂) then
16 θ̂ ← p
17 end
18 end
19 end
20 return substitute(TPβ [P ], θ̂), true

Though intuitively we would expect that as we apply less smoothing (i.e. β < 1),
the optimal θi should also be smaller (less need for correction) and the continualized
program should become closer to the original, a simple negative result illustrates this
is not always the case and that the dependence between the smoothing and continuity
correction must be non-linear.

Remark 1. θ̂ cannot be linearly proportional to β.

Proof. Let X be the constant random variable that is 0 with probability 1 and let
X′ ∼ Gaussian(0, β). Furthermore, let I := (X == 0) and I_c := (−cβ ≤ X′ ≤ cβ) be
two indicator random variables. Intuitively we want I_c to have the same probability of
being true as I for any β. However, if c is constant (such as 1), then Pr(−cβ ≤ X′ ≤ cβ)
will always be the same regardless of β (when c = 1, the probability is always ≈ 0.68).
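This can be checked numerically (our illustration): with X′ ∼ Gaussian(0, β), taking β as the standard deviation, Pr(−cβ ≤ X′ ≤ cβ) is independent of β:

```python
from scipy import stats

c = 1.0
for beta in [0.01, 0.1, 1.0, 10.0]:
    g = stats.norm(0.0, beta)
    print(beta, g.cdf(c * beta) - g.cdf(-c * beta))  # always ~0.6827, for any beta
```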

5.2 Optimization Algorithm


Algorithm 2 presents our approximate synthesis algorithm, which is called as a sub-
routine in the main algorithm. As seen in line 2, if there are no parameters to be
synthesized (d == 0) we still sample the continualized program in hopes of uncovering
a possible runtime error (or gaining statistical confidence that one does not occur). We
check for such an error in line 4 and if one exists, we return immediately, with the flag
variable set to false (line 5).
To evaluate the EWD objective function (when there are parameters to synthesize),
Algorithm 2 follows a technique from [14] and uses a Nelder-Mead search (line 11),
due to Nelder-Mead’s well-known success in solving non-convex program synthesis
problems. We first extract the fragment of the programs corresponding to the models,
M and TPβ [M ], respectively in line 9. In each step of the Nelder-Mead search we take
n samples (n ≈ 500) of TPβ [M ], but with a fixed value of θi substituted into TPβ [M ],
to compute the EWD with respect to samples of the original model M (which have
been cached to avoid redundant resampling). The Nelder-Mead search steps through
the parameter space (with step size η > 0), substituting different values of θ into
TPβ [M ]. This process continues until the search converges to a minimizing parameter,
p, that is within the stopping threshold ε > 0, or encounters a runtime error during
the sampling (which is checked in line 12). As before, if we encounter such an error we
immediately return with the flag set to false (line 13). Following [14], we successively
restart the Nelder-Mead search from k evenly spaced grid points in [0, 1]d (hence the
loop in line 10), to find the globally optimal parameter (hence our approach is robust
to local minima), which we update in lines 15-16. If no runtime error was
ever encountered, we substitute in the parameters with the minimum EWD over all
runs, θ̂, to the fully continuous program TPβ [P ] and return (line 20). Though it can be
argued this sampling is potentially as difficult as the original inference, we reiterate
that we need only do this once offline, hence the cost is easily amortized.
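A minimal sketch (ours) of the restarted Nelder-Mead loop; the objective ewd and the toy model inside it are hypothetical stand-ins for sampling M and TPβ[M_θ]:

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
ref = stats.poisson(4.0).rvs(size=500, random_state=rng)   # cached samples of M

def ewd(theta):
    # Stand-in for sampling TP^beta[M_theta]: a continualized model whose correction
    # parameter shifts the Gaussian approximation; samples are redrawn at each
    # evaluation, mirroring the fully dynamic sampling step described above.
    samples = stats.norm(4.0 + theta[0], 2.0).rvs(size=500, random_state=rng)
    return stats.wasserstein_distance(ref, samples)

best = None
for start in np.linspace(0.1, 0.9, 5):                     # k evenly spaced restarts
    res = optimize.minimize(ewd, x0=[start], method="Nelder-Mead",
                            options={"xatol": 1e-3, "fatol": 1e-3})
    if best is None or res.fun < best.fun:
        best = res
print(best.x, best.fun)
```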

6 Methodology
6.1 Benchmarks
Table 1 presents the benchmarks. For each benchmark, Columns 2 and 3 present the
original prior and likelihood type, respectively. Column 4 presents whether the conti-
nuity correction was applied. Column 5 presents the time to continualize the program,
TCont. . As can be seen in Columns 4 and 5 the total continualization time, TCont. ,
depends on whether parameters had to be synthesized. GPAExample had the longest
TCont. at 3.6s, due to the complexity of its multiple predicates; however, these times
are amortized as our synthesis step is done only once.
As our problem has received little attention, no standard benchmark suites exist.
In fact, to make inference tractable, for many models, developers would construct
continuous approximations by hand, in an ad hoc fashion. However, we wanted a
benchmark suite that showcased all 3 inference scenarios that our approach works
for: (1) discrete/hybrid prior and discrete/hybrid likelihood (2) continuous prior but
discrete/hybrid likelihood and (3) discrete/hybrid prior but a continuous likelihood.
Therefore, we obtained the benchmarks in two ways. First, we looked at variations
of the mixed distributions benchmarks previously published in the machine learning
community, e.g., [65, 58], which served as the inspiration for our GPAExample. Sec-
ond, we took existing benchmarks [27, 30] for which designers modeled certain distri-
butions with continuous approximations, and we retro-fitted these models with the
corresponding discrete distributions. This step was done for Election, Fairness,
SVMfairness, SVE, and TrueSkill. These discretizations were only applied where they
made sense, e.g., the Gauss(np,np(1-p)) in the original Election program became dis-
cretized as Binomial(n,p). We also took popular Bayesian models from Cognitive
Science literature which use multiple discrete latent variables [39] and these models

Table 1: Description of Benchmarks


Program            Prior        Likelihood  Correction?  TCont. (s)
GPAExample         Uniform      Discrete    ✓            3.643
Election [27]      DiscUniform  Bernoulli   ✓            1.139
Fairness [2]       DiscUniform  Bernoulli   ✓            1.809
SVMfairness [2]    Binomial     Continuous  ✓            1.578
TrueSkill [30]     Poisson      Bernoulli   ✓            1.149
DiscreteDisease    DiscUniform  Discrete    ×            0.006
SVE [58]           Uniform      Hybrid      ×            0.009
BetaBinomial [39]  Beta         Discrete    ×            0.006
Exam [39]          Uniform      Discrete    ×            0.008
Plankton [10]      DiscUniform  Discrete    ×            0.006

are BetaBinomial and Exam. Lastly, we took population models from the mathematical
biology literature [10, 4] to build benchmarks since populations are by nature discrete.
This was done for Plankton and DiscreteDisease. We present the original programs
in the appendix [38].

Implementation We implemented Leios in Python (∼4.5K LoC). All experiments


were run on an Intel Xeon multi-core desktop running Ubuntu 16.04 with a 3.7 GHz
CPU and with 32GB RAM. All results are obtained from single-core executions.

6.2 Experimental Setup

Continualized Versions As there are no other general tools that automatically


continualize probabilistic programs in mainstream languages, we compare Leios with:

– Original Program: inference done in standard fashion on the original model, and
– Naive Smoothing : inference done on a KDE style model in which Gaussian smooth-
ing is applied only to the observed variable, but no approximations are applied to
the inner latent variables.
We will refer to these as simply “Original” and “Naive” respectively.

Inference Accuracy Comparison using Ground Truth Our experimental


design compares the respective inference estimates with the ground truth. We set the
experiments as follows: For each of the original discrete or hybrid programs P , we
replace the program variable corresponding to the prior distribution with a fixed value
τ (the ground-truth) to obtain P (τ ). We then sample P (τ ) to obtain 25 observed
data points, which will be used to test inference performance on P , PNS , and PLeios
respectively. To test inference performance we then score P (original program), PNS
(naively smoothed program), and PLeios against the observed data points to infer the
posterior over the ground truth parameter τ . Note the programs only have access to
the data samples, but not τ .
For each of the 3 versions, P, PNS, and PLeios, we take the inferred posterior means
as the estimates of the value, and then compare them with the ground-truth value τ to
measure the error ratio E = (τ − τ_est)/τ. This entire procedure is repeated for 10 different
values of τ to get a representative average of inference performance over a wide range
of true parameter values.

Table 2: Inference Times (s) and Error Ratios for each model, β = 0.1
Program Original Original Naive Naive Leios Leios
Time Error Time Error Time Error
GPAExample 0.806 0.090 0.631 0.070 0.605 0.058
Election - - 3.232 0.051 0.616 0.036
Fairness 4.396 0.057 0.563 0.056 0.603 0.093
SVMfairness - - 0.626 0.454 0.980 0.261
TrueSkill 3.668 0.009 0.494 0.059 0.586 0.053
DiscreteDisease 4.944 0.009 1.350 0.013 0.490 0.008
SVE - - 0.522 0.045 0.516 0.091
BetaBinomial 1.224 0.028 0.564 0.024 0.459 0.013
Exam 3.973 0.087 0.504 0.126 0.527 0.133
Plankton 0.570 0.017 0.457 0.080 0.453 0.042
Average 2.797 0.043 0.894 0.098 0.584 0.079

Analyzed Probabilistic Programming Systems. We used two languages in


our development: WebPPL [26] (with MCMC inference) and Pyro [8] (with Varia-
tional inference). Our implementation automatically generates WebPPL code for all
the programs. We used 3500 MCMC samples (with burn-in of 700 samples) in the
simulation. For Pyro, we only wanted to test fully-automatic black-box Variational In-
ference, hence we did not manually marginalize out discrete variables (which is often
not even applicable, as the discrete variables are the ones we wish to estimate).
Inference Time Measurement We measure the time taken for inference for each
version using built-in timers (which exclude file reading and warm-up). A timeout of
10 minutes was used for the inference step. We used this same procedure for both
MCMC-based sampling in WebPPL and Variational Inference in Pyro.

7 Evaluation
We study the following three research questions:
RQ1 Can program continualization make inference faster, while still maintaining a
high degree of accuracy, compared to the original program and naive smoothing?
RQ2 How do performance and accuracy vary for different smoothing factors β?
RQ3 Can program continualization enable running transformed programs with off-
the-shelf inference algorithms that cannot execute the original programs?

7.1 RQ1: Benefits of Continualization


Table 2 presents detailed timing and accuracy errors for a single smoothing factor β
on WebPPL programs. Columns 2 and 3 present the time and error (compared to the
ground truth) for the original program. Columns 4 and 5 present time/error for the
naive smoothing and Columns 6 and 7 present time/error for Leios.
From Table 2 we can see that on average, Leios leads to faster inference than
both the Original (no approximations) and Naive (0.584s vs 2.797s and 0.894s, respec-
tively). The Naive version was also faster than the original, giving more evidence that
continuous models (even when just the observed variable is continualized) yield faster
inference.

Fig. 7: Inference Times and Error Ratios for Leios and Naive for different β ((a) Avg. Inference Time, (b) Avg. Error Ratio)
For accuracy, inference performed via Leios was on average more accurate than
Naive (E = 0.079 vs. 0.098, respectively). Both were slightly less accurate than infer-
ence performed on Original (E = 0.043). This is not unreasonable as Original has no
approximations applied (which are the main source of inference error). However the
Original failed on Election, SVE, and SVMfairness. For Election, a large Binomial
latent led to a timeout, and it also slowed the Naive version relative to Leios (3.23s vs
0.61s). The Original failed on SVE since it is a hybrid discrete-continuous model (which
can make inference intractable [65, 6]). SVMfairness is a non-linear model where many
latent variables have high variances, leading to inference on the Original failing to con-
verge; Leios and Naive had higher error on this benchmark, for much the same reason
(though Leios was still significantly better than Naive, E = 0.261 vs 0.454).
Although Leios was faster than Original in all cases, for TrueSkill and SVMfairness,
Leios was somewhat slower than Naive. This is likely because the discrete latent vari-
ables in these benchmarks had small enough parameters (Binomial with small n). Sim-
ilarly, for Fairness, Leios was slightly less accurate than Naive because the Gaussian
approximation can be less accurate for smaller n.

7.2 RQ2: Impact of Smoothing Factors


Figure 7 presents the average inference times and error ratios for different smoothing factors
β. In both subfigures, the X-axis represents the smoothing factor. The Y-axis of the left subfigure
presents time, and the Y-axis of the right presents the error ratio compared to the ground
truth (less is better).
Figure 7 (a) shows that inference on the programs constructed by Leios is non-
trivially faster than inference done on the naively smoothed version, regardless of the
β used (which has a negligible effect on the inference time for the β we examined).
Figure 7 (b) presents how accuracy directly depends on β. The Error Ratio for Leios
reaches a local minimum when β = 0.1. Because Leios achieves “global” smoothing by
approximating each latent, a larger value for β is not needed (unlike Naive). We also
noticed that for many benchmarks, smaller β led to better continuity correction parameters,
which also leads to better inference. Naive’s performance suffers for smaller β, which
we attribute to small β creating a highly multimodal observed variable distribution
(also presented in Section 2) which hampers inference [37, 59]. Consequently, Naive
performs best when β = 0.5; however, this β introduces non-trivially higher variance,
which may often negatively affect the precision of inference.

Table 3: Variational Inference Times (s) and Error Ratios for selected β
                                               β : 0.25         β : 0.5          β : 0.75
Program          Torg   Eorg   TNS    ENS     TLeios ELeios    TLeios ELeios    TLeios ELeios
GPAExample - - - - 3.111 0.207 3.341 0.241 3.435 0.321
Election - - - - 1.762 0.070 1.755 0.110 1.764 0.064
Fairness - - - - 1.813 0.722 1.827 0.769 1.830 0.753
SVMfairness - - - - 1.800 0.201 1.806 0.293 1.804 0.301
TrueSkill - - - - 1.809 0.119 1.802 0.062 1.790 0.090
DiscreteDisease - - - - 1.734 0.248 1.731 0.471 1.747 0.553
SVE 0.677 0.684 1.478 3.095 1.471 0.587 1.460 0.566 1.448 0.348
BetaBinomial - - - - 1.605 0.834 1.596 0.708 1.587 0.497
Exam - - - - 0.603 0.222 0.602 0.213 0.603 0.285
Plankton - - - - 3.432 0.297 3.427 0.763 3.434 0.530

7.3 RQ3: Extending Results to Other Systems

Table 3 presents the results for running translated programs in Pyro. Columns 2-5
present the inference times and result errors for the original and naively smoothed pro-
gram. These columns are “-” when Pyro cannot successfully perform inference (i.e., the
model contains a discrete variable that is unsupported by the auto guide). Columns 6-11
present Leios’ time and error for each model, for three different smoothing parameters.
Fully-automated Variational Inference failed on all but one of the examples for
both the Original and Naive. This is because in both cases the program still contains
latent or observed discrete random variables. For most of the benchmarks (Election,
GPA, TrueSkill) the program optimized with Leios had errors comparable to those
computed previously with MCMC in WebPPL. For some the error was over 0.5 for all
β (BetaBinomial, Fairness), which is in part a consequence of limitations of automatic
VI, and hence for certain models manual fine-tuning may be unavoidable. These results
illustrate that Leios can be used to create an efficient program in situations when the
original language does not easily support non-continuous distributions.

8 Related Work

Probabilistic Program Synthesis To the best of our knowledge, we are the


first to study program transformations that approximate discrete or hybrid discrete-
continuous probabilistic programs with fully continuous ones to improve inference.
Probabilistic program synthesis tackles the more ambitious task of generating probabilis-
tic programs with certain properties directly from data. For instance, Nori et al. [51]
aim to synthesize a probabilistic program given a program sketch and a data-set to
fit the program to. However, it merely fits the distribution parameters to the sketch.
Furthermore, their language lacks ‘==’ comparisons. Chasins et al. [11] take a similar
approach but only apply continuous approximations to already continuous variables.

Probabilistic Inference with Discrete and Hybrid Distributions Re-


cent work [65, 66] has explored developing languages and semantics to encode discrete-
continuous mixtures; however, these all restrict the types of programs that can be
expressed and require specialized inference algorithms. In contrast, Leios can work
with a variety of off-the-shelf inference algorithms that operate on arbitrary models
and does not need to define its own inference algorithm. In [66] the authors explored
a restricted programming language that can statically detect which parameters the
program’s density is discontinuous in. However they did not address the question of
continuous approximation, rather their approach was to develop a custom inference
scheme and restrict the language so that pathological models cannot be written (they
also disallow ‘==’ predicates). In [65], Wu et al. develop a custom inference method for
discrete-continuous mixtures, but only for models encodable as a Bayesian network;
furthermore, as pointed out by [47], the specialized inference method of Wu et al. is
restrictive since it cannot be composed with other program transformations.
Additionally, Machine Learning researchers have developed other continuous relax-
ation techniques to address the inherent problems of non-differentiable models. One
other popular method is to reparametrize the gradient estimator during Variational
Inference (VI) computation, commonly called the “reparameterization trick” [42, 61].
However, this approach suffers from the fact that not all distributions support such
gradient reparameterizations, and this method is limited to Variational In-
ference. Conversely, our approach allows one to still use any inference scheme. Further,
even though these techniques have been attempted in the probabilistic programming
setting [40], such work still inherits the aforementioned weaknesses.
We also draw upon Kernel Density Estimation (KDE) [62], a common approxima-
tion scheme in statistics. KDE fits a Kernel density to each observed data point, hence
constructing a smooth approximation. Naive Smoothing is essentially a KDE (with
a Gaussian kernel) of the original program, while Leios employs additional continualizations.
Furthermore, our smoothing factor β is analogous to the bandwidth of a KDE.

Program Analysis for Probabilistic Programs Multiple Program Analysis


frameworks and systems have been developed for Probabilistic Programming [57, 33,
63, 32, 22]. Additionally these analyses make use of a rich set of semantics [44, 36, 7,
64, 19], however of particular note is recent work by Lew et al. [41], which provides
a type system for reasoning about variational approximations; however they focus on
continuous approximations of already continuous variables.

Benefits of Continuity in Conventional Programs The idea of smoothing


and working with continuous functions in non-probabilistic programs has found success
in a variety of applications [21, 12, 34, 13]. Our work derives inspiration mainly from
Smooth interpretation [14], which provides a semantics for smoothing deterministic
programs encoding a discontinuous or discrete function.

9 Conclusion

We presented Leios as a method for approximating probabilistic programs with fully


continuous versions. Our approach shows that by continualizing probabilistic programs,
it is possible to achieve substantial speed-ups in inference performance whilst still
preserving a high degree of accuracy. To this end, we combined two key techniques:
statement level program transformations to continualize latent variables and a novel
continuity correction synthesis procedure to correct branch conditions.

Acknowledgements

We would like to thank the anonymous reviewers for their constructive feedback. We
thank Darko Marinov for his helpful feedback during early stages of the work. We thank
Adithya Murali for valuable feedback about the semantics. We thank Zixin Huang and
Saikat Dutta for helpful discussions about the evaluation and Vimuth Fernando and
Keyur Joshi for helpful proofreads. JL is grateful for support from the Alfred P. Sloan
foundation for a Sloan Scholar award used to support much of this work. The research
presented in this paper has been supported in part by NSF, Grant no. CCF-1846354.

References

1. Aigner, D.J., Amemiya, T., Poirier, D.J.: On the estimation of production fron-
tiers: maximum likelihood estimation of the parameters of a discontinuous density
function. International Economic Review pp. 377–396 (1976)
2. Albarghouthi, A., D’Antoni, L., Drews, S., Nori, A.V.: Fairsquare: Probabilistic
verification of program fairness. Proc. ACM Program. Lang. (OOPSLA) (2017)
3. Bar-Lev, S.K., Fuchs, C.: Continuity corrections for discrete distributions under
the edgeworth expansion. Methodology And Computing In Applied Probability
3(4), 347–364 (2001)
4. Becker, N.: A general chain binomial model for infectious diseases. Biometrics
37(2), 251–258 (1981)
5. Betancourt, M., Girolami, M.: Hamiltonian monte carlo for hierarchical models.
Current trends in Bayesian methodology with applications 79, 30 (2015)
6. Bhat, S., Borgström, J., Gordon, A.D., Russo, C.: Deriving probability density
functions from probabilistic functional programs. In: International Conference on
Tools and Algorithms for the Construction and Analysis of Systems. pp. 508–522.
TACAS’13 (2013)
7. Bichsel, B., Gehr, T., Vechev, M.T.: Fine-grained semantics for probabilistic pro-
grams. In: Programming Languages and Systems - 27th European Symposium on
Programming, ESOP. pp. 145–185 (2018)
8. Bingham, E., Chen, J.P., Jankowiak, M., Obermeyer, F., Pradhan, N., Karalet-
sos, T., Singh, R., Szerlip, P., Horsfall, P., Goodman, N.D.: Pyro: Deep Universal
Probabilistic Programming. arXiv preprint arXiv:1810.09538 (2018)
9. Blei, D.M., Kucukelbir, A., McAuliffe, J.D.: Variational inference: A review for
statisticians. Journal of the American Statistical Association 112(518) (2017)
10. Blumenthal, S., Dahiya, R.C.: Estimating the binomial parameter n. Journal of
the American Statistical Association 76(376), 903–909 (1981)
11. Chasins, S., Phothilimthana, P.M.: Data-driven synthesis of full probabilistic pro-
grams. In: CAV (2017)
12. Chaudhuri, S., Clochard, M., Solar-Lezama, A.: Bridging boolean and quantitative
synthesis using smoothed proof search. In: ACM SIGPLAN-SIGACT Symposium
on Principles of Programming Languages. POPL ’14 (2014)
13. Chaudhuri, S., Gulwani, S., Lublinerman, R.: Continuity and robustness of pro-
grams. In: Communications of the ACM, Research Highlights. vol. 55 (2012)
14. Chaudhuri, S., Solar-Lezama, A.: Smooth interpretation. In: Proceedings of the
31st ACM SIGPLAN Conference on Programming Language Design and Imple-
mentation. pp. 279–291. PLDI ’10 (2010)

15. Chen, Y., Ghahramani, Z.: Scalable discrete sampling as a multi-armed bandit
problem. In: Proceedings of the 33rd International Conference on International
Conference on Machine Learning - Volume 48. pp. 2492–2501. ICML’16 (2016)
16. Cheng, T.T.: The normal approximation to the poisson distribution and a proof
of a conjecture of Ramanujan. Bull. Amer. Math. Soc. 55(4), 396–401 (04 1949)
17. Chung, H., Loken, E., Schafer, J.L.: Difficulties in drawing inferences with finite-
mixture models. The American Statistician 58(2), 152–158 (2004)
18. Cooper, G.F.: The computational complexity of probabilistic inference using
bayesian belief networks. Artificial Intelligence 42(2), 393 – 405 (1990)
19. Dahlqvist, F., Kozen, D., Silva, A.: Semantics of probabilistic programming: A
gentle introduction. In: Foundations of Probabilistic Programming (2020)
20. Delon, J., Desolneux, A.: A wasserstein-type distance in the space of gaussian
mixture models. arXiv preprint arXiv:1907.05254 (2019)
21. DeMillo, R.A., Lipton, R.J.: Defining software by continuous, smooth functions.
IEEE Trans. Softw. Eng. 17(4) (Apr 1991)
22. Dutta, S., Zhang, W., Huang, Z., Misailovic, S.: Storm: program reduction for
testing and debugging probabilistic programming systems. In: Proceedings of the
2019 27th ACM Joint Meeting on European Software Engineering Conference and
Symposium on the Foundations of Software Engineering. pp. 729–739 (2019)
23. Feller, W.: On the normal approximation to the binomial distribution. Ann. Math.
Statist. 16(4), 319–329 (12 1945)
24. Gehr, T., Misailovic, S., Vechev, M.T.: PSI: exact symbolic inference for proba-
bilistic programs. In: Computer Aided Verification, CAV. pp. 62–83 (2016)
25. Gelman, A.: Parameterization and bayesian modeling. Journal of the American
Statistical Association 99(466), 537–545 (2004)
26. Goodman, N.D., Stuhlmüller, A.: The Design and Implementation of Probabilistic
Programming Languages (2014)
27. Goodman, N.D., Tenenbaum, J.B., Contributors, T.P.: Probabilistic Models of
Cognition (2016)
28. Gordon, A.D., Henzinger, T.A., Nori, A.V., Rajamani, S.K.: Probabilistic pro-
gramming. In: Proceedings of the on Future of Software Engineering (2014)
29. Gorinova, M.I., Moore, D., Hoffman, M.D.: Automatic reparameterisation in prob-
abilistic programming (2019)
30. Herbrich, R., Minka, T., Graepel, T.: TrueskillTM : A bayesian skill rating system.
In: Proceedings of the 19th International Conference on Neural Information Pro-
cessing Systems. pp. 569–576. NIPS’06 (2006)
31. Hoffman, M.D., Gelman, A.: The no-u-turn sampler: Adaptively setting path
lengths in hamiltonian monte carlo (2011)
32. Huang, Z., Wang, Z., Misailovic, S.: Psense: Automatic sensitivity analysis for
probabilistic programs. In: Automated Technology for Verification and Analysis -
15th International Symposium, ATVA 2018, Los Angeles, California, October 7-10,
2018, Proceedings (2018)
33. Hur, C.K., Nori, A.V., Rajamani, S.K., Samuel, S.: Slicing probabilistic programs.
In: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language
Design and Implementation. pp. 133–144 (2014)
34. Inala, J.P., Gao, S., Kong, S., Solar-Lezama, A.: REAS: combining numerical op-
timization with SAT solving (2018)
35. Kildall, G.A.: A unified approach to global program optimization. In: Proceedings
of the 1st Annual ACM SIGACT-SIGPLAN Symposium on Principles of Program-
ming Languages. pp. 194–206. POPL ’73 (1973)

36. Kozen, D.: Semantics of probabilistic programs. Journal of Computer and System
Sciences 22(3), 328 – 350 (1981)
37. Lan, S., Streets, J., Shahbaba, B.: Wormhole hamiltonian monte carlo. In: Pro-
ceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence. pp.
1953–1959. AAAI’14 (2014)
38. Laurel, J., Misailovic, S.: Continualization of probabilistic programs with correction
(appendix) (2020), https://jsl1994.github.io/papers/ESOP2020_appendix.pdf
39. Lee, M.D., Wagenmakers, E.J.: Bayesian cognitive modeling: A practical course.
Cambridge University Press (2014)
40. Lee, W., Yu, H., Yang, H.: Reparameterization gradient for non-differentiable mod-
els. In: Advances in Neural Information Processing Systems. pp. 5553–5563 (2018)
41. Lew, A.K., Cusumano-Towner, M.F., Sherman, B., Carbin, M., Mansinghka, V.K.:
Trace types and denotational semantics for sound programmable inference in prob-
abilistic languages. Proc. ACM Program. Lang. 4(POPL) (2019)
42. Maddison, C.J., Mnih, A., Teh, Y.W.: The Concrete Distribution: A Continuous
Relaxation of Discrete Random Variables. In: International Conference on Learning
Representations (2017)
43. Marin, J.M., Mengersen, K., Robert, C.P.: Bayesian modelling and inference on
mixtures of distributions. Handbook of statistics 25, 459–507 (2005)
44. Morgan, C., McIver, A., Seidel, K.: Probabilistic predicate transformers. ACM
Trans. Program. Lang. Syst. 18(3), 325–353 (May 1996)
45. Murray, I., Salakhutdinov, R.: Evaluating probabilities under high-dimensional la-
tent variable models. In: Proceedings of the 21st International Conference on Neu-
ral Information Processing Systems. pp. 1137–1144. NIPS’08 (2008)
46. Nandi, C., Grossman, D., Sampson, A., Mytkowicz, T., McKinley, K.S.: Debugging
probabilistic programs. In: Proceedings of the 1st ACM SIGPLAN International
Workshop on Machine Learning and Programming Languages. MAPL 2017 (2017)
47. Narayanan, P., Shan, C.c.: Symbolic disintegration with a variety of base measures
(2019), http://homes.sice.indiana.edu/ccshan/rational/disint2arg.pdf
48. Neal, R.M.: MCMC using hamiltonian dynamics. In: Handbook of Markov Chain
Monte Carlo, chap. 5 (2012)
49. Nguyen, V.A., Abadeh, S.S., Yue, M.C., Kuhn, D., Wiesemann, W.: Optimistic
distributionally robust optimization for nonparametric likelihood approximation.
In: Advances in Neural Information Processing Systems. pp. 15846–15856 (2019)
50. Nishimura, A., Dunson, D., Lu, J.: Discontinuous hamiltonian monte
carlo for discrete parameters and discontinuous likelihoods (2017),
https://arxiv.org/abs/1705.08510
51. Nori, A.V., Ozair, S., Rajamani, S.K., Vijaykeerthy, D.: Efficient synthesis of prob-
abilistic programs. In: Proceedings of the 36th ACM SIGPLAN Conference on Pro-
gramming Language Design and Implementation. pp. 208–217. PLDI ’15 (2015)
52. Opper, M., Archambeau, C.: The variational gaussian approximation revisited.
Neural Computation 21(3), 786–792 (2009)
53. Opper, M., Winther, O.: Expectation consistent approximate inference. J. Mach.
Learn. Res. 6, 2177–2204 (Dec 2005)
54. Ross, S.: A First Course in Probability. Pearson (2010)
55. Rudin, W.: Real and complex analysis. McGraw-Hill Education (2006)
56. Salimans, T., Kingma, D.P., Welling, M.: Markov chain monte carlo and variational
inference: Bridging the gap. In: Proceedings of the 32nd International Conference
on International Conference on Machine Learning. pp. 1218–1226. ICML (2015)

57. Sankaranarayanan, S., Chakarov, A., Gulwani, S.: Static analysis for probabilistic
programs: inferring whole program properties from finitely many paths. In: Pro-
ceedings of the 34th ACM SIGPLAN conference on Programming language design
and implementation. pp. 447–458 (2013)
58. Sanner, S., Abbasnejad, E.: Symbolic variable elimination for discrete and contin-
uous graphical models. In: Proceedings of the Twenty-Sixth AAAI Conference on
Artificial Intelligence. pp. 1954–1960. AAAI’12 (2012)
59. Smith, J., Croft, J.: Bayesian networks for discrete multivariate data: an algebraic
approach to inference. Journal of Multivariate Analysis 84(2), 387 – 402 (2003)
60. Tolpin, D., van de Meent, J.W., Yang, H., Wood, F.: Design and implementa-
tion of probabilistic programming language anglican. In: Proceedings of the 28th
Symposium on the Implementation and Application of Functional Programming
Languages. IFL 2016 (2016)
61. Tucker, G., Mnih, A., Maddison, C.J., Sohl-Dickstein, J.: REBAR : Low-variance,
unbiased gradient estimates for discrete latent variable models. In: Neural Infor-
mation Processing Systems (2017)
62. Wand, M., Jones, M.: Kernel Smoothing (Chapman & Hall/CRC Monographs on
Statistics and Applied Probability) (1995)
63. Wang, D., Hoffmann, J., Reps, T.: Pmaf: An algebraic framework for static analysis
of probabilistic programs. In: Proceedings of the 39th ACM SIGPLAN Conference
on Programming Language Design and Implementation. PLDI 2018 (2018)
64. Wang, D., Hoffmann, J., Reps, T.: A denotational semantics for low-level proba-
bilistic programs with nondeterminism. Electronic Notes in Theoretical Computer
Science 347 (2019), proceedings of the Thirty-Fifth Conference on the Mathemat-
ical Foundations of Programming Semantics
65. Wu, Y., Srivastava, S., Hay, N., Du, S., Russell, S.: Discrete-continuous mixtures
in probabilistic programming: Generalized semantics and inference algorithms. In:
Proceedings of the 35th International Conference on Machine Learning. Proceed-
ings of Machine Learning Research, vol. 80, pp. 5343–5352 (2018)
66. Zhou, Y., Gram-Hansen, B.J., Kohn, T., Rainforth, T., Yang, H., Wood, F.:
LF-PPL: A low-level first order probabilistic programming language for non-
differentiable models. In: The 22nd International Conference on Artificial Intelli-
gence and Statistics, AISTATS. Proceedings of Machine Learning Research, vol. 89,
pp. 148–157 (2019)

Semantic Foundations for Deterministic
Dataflow and Stream Processing

Konstantinos Mamouras

Rice University, Houston TX 77005, USA
[email protected]
Abstract. We propose a denotational semantic framework for deterministic dataflow and stream processing that encompasses a variety of existing streaming models. Our proposal is based on the idea that data streams, stream transformations, and stream-processing programs should be classified using types. The type of a data stream is captured formally by a monoid, an algebraic structure with a distinguished binary operation and a unit. The elements of a monoid model the finite fragments of a stream, the binary operation represents the concatenation of stream fragments, and the unit is the empty fragment. Stream transformations are modeled using monotone functions on streams, which we call stream transductions. These functions can be implemented using abstract machines with a potentially infinite state space, which we call stream transducers. This abstract typed framework of stream transductions and transducers can be used to (1) verify the correctness of streaming computations, that is, that an implementation adheres to the desired behavior, (2) prove the soundness of optimizing transformations, e.g. for parallelization and distribution, and (3) inform the design of programming models and query languages for stream processing. In particular, we show that several useful combinators can be supported by the full class of stream transductions and transducers: serial composition, parallel composition, and feedback composition.

Keywords: Data streams · Denotational semantics · Type system

1 Introduction
Stream processing is the computational paradigm where the input is not pre-
sented in its entirety at the beginning of the computation, but instead it is
given in an incremental fashion as a potentially unbounded sequence of elements
or data items. This paradigm is appropriate in settings where data is created
continually in real-time and has to be processed immediately in order to ex-
tract actionable insights and enable timely decision-making. Examples of such
datasets are streams of business events in an enterprise setting [26], streams
of packets that flow through computer networks [37], time-series data that is
captured by sensors in healthcare applications [33], etc.
Due to the great variety of streaming applications, there are various propos-
als for specialized languages, compilers, and runtime systems that deal with the
processing of streaming data. Relational database systems and SQL-based lan-
guages have been adapted to the streaming setting [1,2,15,16,18,19,32,37,57,91].
Recently, several systems have been developed for the distributed processing of
data streams that are based on the distributed dataflow model of computa-
tion [6, 7, 70, 86, 92, 94, 108, 112, 113]. Languages for detecting complex events
in distributed systems, which draw on the theory of regular expressions and
finite-state automata, have also been proposed [29, 40, 41, 50, 53, 88, 99, 111]. The
synchronous dataflow formalisms [20, 24, 28, 51, 73, 107] are based on Kahn’s
seminal work [59], and they have been used for exposing and exploiting task-
level and pipeline parallelism within streaming computations in the context
of embedded systems. Several formalisms for the runtime verification of re-
active systems have been proposed, many of which are based on variants of
Temporal Logic and its timed/quantitative extensions [39, 43, 52, 74, 105]. Fi-
nally, there is a large collection of languages and systems for reactive program-
ming [34,36,38,46,47,55,68,69,77,89,93,103], which focus on the development of
event-driven and interactive applications such as GUIs and web programming.
The aforementioned languages and systems have been successfully used in the
application domains for which they were developed. However, each one of them
typically introduces a unique variant of the streaming model in terms of: (1) the
form of the input and output data, (2) the class of expressible stream-processing
computations, and (3) the syntax employed to describe these computations.
This has resulted in an enormous proliferation of semantic models for stream
processing that are difficult to compare. For this reason, we are interested in
identifying a semantic unification of several existing streaming models.
This paper introduces a typed semantic framework for reasoning about
languages and systems for stream processing. Three key questions are tackled:
1. How do we model streams and what is the form of the data that they carry?
2. How do we capture mathematically the notion of a stream transformation?
3. What is a general programming model for specifying streaming computations?
The first two questions concern the discovery of an appropriate denotational
model for streaming computation. The third question concerns the design of
programming and query languages, where a key requirement is that the behav-
ior of a streaming program/query admits a precise mathematical description.
Existing works have addressed these questions in the context of specific classes
of applications. Here are examples of various perspectives:
− Transductions of strings [8, 100, 104, 110]: A stream is viewed as an
unbounded sequence of letters, and a stream transformation is a translation
from input sequences to output sequences, which is typically called string/word
transduction. These translations are commonly described using finite-state trans-
ducers, a class of automata that extend acceptors with output.
− The streaming dataflow model of Gilles Kahn [59, 60]: The input
and output consist of multiple independent channels that carry unbounded se-
quences of elements. A transformation is a function from a tuple of input se-
quences to a tuple of output sequences. Such transformations are specified with
dataflow graphs whose nodes describe single-process computations.
− Relational transformations [71]: A stream is an unbounded multiset
(bag) of tuples, and a stream transformation is a monotone operator (w.r.t. mul-
tiset containment) on multisets. This can be generalized to consider more than
one input stream. An interesting subclass of these operators can be described
syntactically using monotone relational algebra.
− Processing of time-varying relations [16, 17]: A stream is a time-
varying finite multiset of tuples, i.e. an unbounded sequence of finite multisets of
tuples. In this setting, a stream transformation processes the input in a way that
preserves the notion of time: after processing t input multisets (i.e., t time units)
the output consists of t output multisets. The query language CQL [16] defines
a class of such computations that involve relational and windowing operators.
− Transformations of continuous-time signals [27]: An input stream
is a continuous-time signal, that is, a function from the real numbers R to an n-
dimensional space R^n. A stream transformation is a mapping from input signals
to output signals that is causal, which means that the value of the output at time
t depends on the values of input signal up to (and including) time t. Systems of
differential equations can be used to describe classes of such transformations.
We are interested here in a unifying framework that encompasses all the
aforementioned concrete instances of streaming models and enables formal rea-
soning about the composition of streaming computations from different models.
In order to achieve this we take an abstract algebraic approach that retains
only the essential aspects of stream processing without any unnecessary special-
ization. The rest of the section outlines our proposal.
At the most fundamental level, stream processing is computation over input
that is not given at the beginning in full, but rather is presented incrementally
as the computation evolves. Since the input is presented piece by piece, the basic
concepts that need to be captured mathematically are: (1) what is a piece or
fragment of the input, and (2) how do we extend the input. The most general
class of algebraic structures that model these notions is the class of monoids,
the collection of algebras that have a distinguished binary associative multi-
plication operation · and an identity element 1 for this operation. A monoid
(A, ·, 1) then constitutes a type of data streams, where the elements of the
monoid are all the possible finite stream fragments, the identity 1 ∈ A is the
empty stream fragment, and the multiplication operation · : A × A → A models
the concatenation of stream fragments. Using monoids, we can organize several
notions of data streams using types that describe the form of the data, as well
any invariants or assumptions about them. Monoids encompass the kinds of data
streams that we mentioned earlier and many more: strings of letters, linear se-
quences of data items, tuples of sequences, multisets (bags) of data items, sets
of data items, time-varying relations/multisets, (potentially disordered) times-
tamped sequences of data items, continuous-time signals, and so on.
Stream transformations can be classified according to the type of their input
and output streams, which we call a transduction type. They are modeled us-
ing monotone functions that map an input stream history (i.e., the fragment of
the input stream that has been received from the beginning of the computation
until now) to an output stream history (i.e., the fragment of the output stream
produced so far). The monotonicity requirement captures the idea that a stream
transformation cannot retract the output that has already been emitted. We
call such functions stream transductions, and we propose them as a deno-
tational semantic model for stream processing. This model encompasses string
transductions, non-diverging Kahn-computable [59] functions on streams, mono-
tone relational transformations [71], the CQL-definable [16] transformations on
time-varying relations, and transformations of continuous-time signals [27].
We also introduce an abstract model of computation for stream processing.
The considered programs or abstract machines are called stream transduc-
ers, and they are organized using transducer types that specify the input and
output stream types. A stream transducer processes the input stream in an in-
cremental fashion, by consuming it fragment by fragment. The consumption of
an input fragment results in the emission of an output fragment. Our algebraic
setting brings in an unavoidable complication compared to the classical theory
of word transducers: not all stream transducers describe a stream transduction.
This phenomenon has to do with the generalization of the input and output data
streams from sequences of atomic data items to elements of arbitrary monoids.
A stream transducer has to respect its input/output type, which means that the
way in which the input stream is fragmented into pieces and fed to the trans-
ducer does not affect the cumulative output. More concisely, this says that the
cumulative output is independent from the fragmentation of the input. In order
to formalize this notion, we say that a factorization of an input history u is a
sequence of stream fragments u1 , u2 , . . . , un whose concatenation is equal to the
input history, i.e. u1 · u2 · · · un = u. Now, the desired restriction can be described
as follows: for every input history w and any two factorizations u1, . . . , um and v1, . . . , vn of w, the cumulative output that the transducer emits when consuming the fragments u1, . . . , um in sequence is equal to the cumulative output when consuming the fragments v1, . . . , vn. Fortunately, this complex property can be
distilled into an equivalent property on the structure of the stream transducer
that we call coherence property. Every stream transducer that is coherent has
a well-defined semantics or denotation in terms of a stream transduction.
We have already outlined the basics of our general framework for streaming
computation, which includes: (1) a classification of streams using monoids as
types, (2) a denotational semantic model that employs monotone functions from
input histories to output histories, and (3) a programming model that general-
izes transducers to compute meaningfully on elements of arbitrary monoids. This
already allows us to address important questions about specific computations:
− Does a streaming program (transducer) behave as intended? This amounts
to checking whether the denotation of the transducer is the desired function.
− Are two streaming programs (transducers) equivalent? This means that their
denotations in terms of stream transductions are the same.
The first question is a correctness property. The second question is relevant for
semantics-preserving program optimization. We will turn now to the issue of how
to modularly specify complex stream transductions and transducers.
One of the most common ways to conceptually organize complex streaming
computations is to view the overall computation as the composition of several
processes that run independently and are connected with directed communi-
cation channels on which streams of data flow. This way of structuring com-
putations is called the dataflow programming model. The simple deterministic
parallel model of Karp and Miller [61] is one of the first variants of dataflow,
and other notable early works on dataflow models include Dennis’s parallel lan-
guage of actors and links [42] and Kahn’s networks [59] of computing stations and
communication lines. We investigate three key dataflow combinators for com-
posing stream transductions (i.e., semantic-level) and stream transducers (i.e.,
program-level): serial composition, parallel composition, and feedback com-
position. Serial composition is useful for describing pipelines of processing stages,
where the output of one stage is streamed as input into the next stage. Parallel
composition describes the independent and concurrent computation of two or
more components. Feedback composition supports computations whose current
output depends on previously produced outputs. We show that our framework
supports all these combinators, which facilitate the modular description of com-
plex computations and expose pipeline and task-based parallelism.

Outline of paper. In Sect. 2 we introduce the idea that data streams can be
classified using monoids as their types, and in Sect. 3 we propose the semantic
model of stream transductions. Sect. 4 is devoted to the description of an ab-
stract model of streaming computation, called stream transducer, and the main
properties that it satisfies. In Sect. 5 we show that our abstract model is closed
under a fundamental set of dataflow combinators: serial, parallel, and feedback
composition. In Sect. 6 we prove the soundness of a streaming optimizing trans-
formation using denotational arguments and algebraic rewriting. Sect. 7 contains
related work, and Sect. 8 concludes with a brief summary of our proposal.

2 Monoids as Types for Streams


Data streams are typically viewed as unbounded linear sequences of data items,
where a data item can be thought of as a small indivisible piece of data. This
viewpoint is sufficient for describing many useful semantic and programming
models, but it is too concrete and unnecessarily restricts the notion of a data
stream. In order to see this, consider a computation where the specific order in
which the data items arrive is not relevant. Counting is a trivial example of such
a computation, and it can be described operationally as follows: every time a
new data item arrives, the counting stream algorithm emits the total number of
items that have been seen so far. This can be described mathematically by the
function β, given by β(⟨x1, x2, . . . , xn⟩) = ⟨1, 2, . . . , n⟩, where ⟨x1, x2, . . . , xn⟩ is the input and ⟨1, 2, . . . , n⟩ is the cumulative output of the computation. For
this computation, the input can be meaningfully viewed as a multiset (or bag)
instead of a sequence, since the ordering of the data items is irrelevant. This
means that multisets can also be viewed as data streams, and in some cases this
viewpoint is preferable to the traditional one of “streams = sequences”.
The example of the previous paragraph raises an obvious question: What
class of mathematical objects can meaningfully serve as data streams? Linear
sequences and multisets should certainly be included, but it would be desirable
to generalize the notion of streams as much as possible. Recent works explore the
idea of generalizing streams to encompass a large class of partial orders [13, 85],
but we will see later that this approach excludes many useful instances. Stream
processing is the computational paradigm where the input is not presented in
full at the beginning of the computation, but instead it is given in an incremental
fashion or piece by piece. For this reason, there are just three notions that need
to be modeled mathematically: (1) a fragment or piece of a data stream, (2)
the extension of data with an additional fragment of data, and (3) the empty
data stream, i.e. the data seen at the very beginning of the computation. This
leads us to consider a kind or type of a data stream as an algebraic structure that
satisfies the following: (1) its elements model data stream fragments, (2) it has a
distinguished associative operation · for the concatenation of stream fragments,
and (3) it has a distinguished element 1 that represents the empty fragment so
that 1 is a unit for concatenation. The class of monoids is the largest class of
algebraic structures that fulfill these requirements.
More formally, a monoid is an algebraic structure (A, ·, 1), where · : A×A →
A is a binary operation called multiplication and 1 ∈ A is a constant called unit,
that satisfies the following two axioms: (I) (x · y) · z = x · (y · z) for all x, y, z ∈ A,
and (II) 1 · x = x · 1 = x for all x ∈ A. The first axiom says that · is associative,
and the second axiom says that 1 is a left and right identity for the · operation.
For brevity, we will sometimes write xy to denote x · y.
Suppose that A is a monoid. We write A∗ for the set of all finite sequences of
elements of A and ε for the empty sequence. The finite multiplication function
π : A∗ → A is given by π(ε) = 1 and π(x̄ · y) = π(x̄) · y for x̄ ∈ A∗ and y ∈ A.
For sequences x̄, ȳ ∈ A∗ , it holds that π(x̄ · ȳ) = π(x̄) · π(ȳ). So, π generalizes
the binary multiplication · to a finite but arbitrary number of arguments.
Let (A, ·A, 1A) and (B, ·B, 1B) be monoids. Their product is the monoid (A × B, ·, 1), where the multiplication operation is given by (x, y) · (x′, y′) = (x ·A x′, y ·B y′) for x, x′ ∈ A and y, y′ ∈ B, and the identity is 1 = (1A, 1B).
A monoid homomorphism from a monoid (A, ·, 1) to a monoid (B, ·, 1)
is a function h : A → B that commutes with the monoid operations, that is,
h(1) = 1 and h(x · y) = h(x) · h(y) for all x, y ∈ A.
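To make the algebraic requirements concrete, here is a minimal Haskell sketch (an illustration of ours, not part of the formal development). Haskell's Monoid class packages exactly the structure used above: (<>) plays the role of the multiplication · and mempty the role of the unit 1; the finite multiplication π corresponds to mconcat, and a homomorphism is any function preserving mempty and (<>). The names piMul and lengthHom below are hypothetical.

    import Data.Monoid (Sum (..))

    -- The finite multiplication pi : A* -> A folds a finite sequence
    -- of stream fragments into a single fragment.
    piMul :: Monoid a => [a] -> a
    piMul = mconcat

    -- A monoid homomorphism h : A -> B satisfies h mempty == mempty and
    -- h (x <> y) == h x <> h y. For example, length is a homomorphism
    -- from the free monoid of strings to the additive monoid (N, +, 0).
    lengthHom :: [a] -> Sum Int
    lengthHom = Sum . length

    main :: IO ()
    main = do
      print (piMul ["ab", "b", "a"])                            -- "abba"
      print (lengthHom "ab" <> lengthHom "ba" == lengthHom "abba")  -- True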
As we discussed earlier, we can think of a monoid as a type of data streams.
The elements of the monoid represent finite stream fragments. The multiplication
operation · models the concatenation of stream fragments, and the unit of the
monoid is the empty stream fragment.
For a monoid (A, ·, 1) we define the binary relation ⪯ as follows: for all x, y ∈ A, we put x ⪯ y if and only if xz = y for some z ∈ A. Since the relation ⪯ is reflexive and transitive, we call it the prefix preorder for the monoid A. The unit 1 is a minimal element w.r.t. the ⪯ relation: 1 · x = x and hence 1 ⪯ x for every x ∈ A. Define the function prefix : A × A → P(A) as follows: prefix(x, y) = {z ∈ A | xz = y} for all x, y ∈ A. This implies that x ⪯ y iff prefix(x, y) ≠ ∅. In other words, prefix(x, y) is the set of all witnesses for x ⪯ y. A partial function ∂ : A × A ⇀ A is said to be a prefix witness function (or simply a witness function) for the monoid A if its domain is equal to ⪯ and it satisfies ∂(x, y) ∈ prefix(x, y) for every x, y ∈ A with x ⪯ y. We can express this equivalently by requiring that the type of the function ∂ is ∏(x,y)∈⪯ prefix(x, y).
We say that a monoid A satisfies the left cancellation property if xy = xz implies y = z for all x, y, z ∈ A. In this case we say that A is left-cancellative. If A is left-cancellative, then it has a unique prefix witness function, because x ⪯ y implies that there is a unique z with xz = y.
Example 1 (Finite Sequences). Consider the algebra (FSeq(A), ·, ε), where
FSeq(A) is the set A∗ of all finite words (strings) over a set A, · is word concate-
nation, and ε is the empty word. This algebra is a monoid. In fact, it is the free
monoid with generators A. For u, v ∈ A∗, u ⪯ v iff the word u is a prefix of the word v. There is a unique prefix witness function, because for every x, y ∈ A∗ with x ⪯ y there is a unique z ∈ A∗ such that xz = y.
Let us consider now a variant of Example 1 in order to clear any misunderstandings regarding the ⪯ order. The set A∗, together with the empty sequence ε and the operation ◦ given by x ◦ y = yx, is a monoid. For the monoid (A∗, ◦, ε), we have that x ⪯ y iff x ◦ z = zx = y for some z ∈ A∗. So, x ⪯ y iff the word x is a suffix of the word y.
Example 2 (Finite Multisets). Consider the algebra (FBag(A), ∪, ∅), where
FBag(A) is the set of all finite multisets (bags) over a set A, ∪ is multiset
union, and ∅ is the empty multiset. This algebra is a monoid. In fact, it is
the free commutative monoid with generators A. It is also left-cancellative. For x, y ∈ FBag(A), x ⪯ y iff x is contained in y. So, we also use the notation ⊆ instead of ⪯. There is a unique prefix witness function, because for every x, y ∈ FBag(A) with x ⊆ y there is a unique z ∈ FBag(A) such that x ∪ z = y.
Example 3 (Finite Sets). Let A be a set. Consider the algebra (FSet(A), ∪, ∅),
where FSet(A) is the set of all finite subsets of A, ∪ is set union, and ∅ is the
empty set. This algebra is a monoid. In fact, it is the free commutative idempotent
monoid with generators A. For x, y ∈ FSet(A), x ⪯ y iff x is contained in y. So, we also use the notation ⊆ instead of ⪯.
For x ⊆ y, define ∂(x, y) = y \ x, where \ is the set difference operation.
Since x ∪ (y \ x) = y for x ⊆ y, ∂ is a prefix witness function. We also define
τ (x, y) = y for x ⊆ y. Since x ∪ y = y for x ⊆ y, τ is a prefix witness function.
So, FSet(A) has several distinct prefix witness functions.
Example 4 (Finite Maps). Let K be a set of keys, and V be a set of values.
Consider the algebra (FMap(K, V ), ·, ∅), where FMap(K, V ) is the set of all par-
tial maps K  V with a finite domain, ∅ is the partial map with empty domain,
and · is defined as follows:

    (f · g)(k) = g(k),        if g(k) is defined
                 f(k),        if g(k) is undefined and f(k) is defined
                 undefined,   otherwise

for every f, g ∈ FMap(K, V) and k ∈ K. We leave it to the reader to check that ∅ · f = f · ∅ = f and (f · g) · h = f · (g · h) for all f, g, h ∈ FMap(K, V). So, the algebra FMap(K, V) is a monoid.
Let f, g ∈ FMap(K, V). We write dom(f) = {k ∈ K | f(k) is defined} for the domain of f. It holds that dom(f · g) = dom(f) ∪ dom(g). Using this property, we see that f ⪯ g iff dom(f) ⊆ dom(g).
Let f, g ∈ FMap(K, V) with f ⪯ g. Define ∂(f, g) = g. Since dom(f) ⊆ dom(g), we have that f · ∂(f, g) = g. It follows that ∂ is a prefix witness function.
Define g \ f ∈ FMap(K, V) as follows:

    (g \ f)(k) = g(k),        if g(k) is defined and f(k) is undefined
                 g(k),        if g(k), f(k) are both defined and g(k) ≠ f(k)
                 undefined,   otherwise

for every k ∈ K. From f ⪯ g we get f · (g \ f) = g. So, the function mapping (f, g) to g \ f is also a prefix witness function. This means that FMap(K, V) has several distinct prefix witness functions.
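As an illustration only, the FMap(K, V) monoid can be sketched in Haskell with Data.Map: since f · g prefers the entries of g, the multiplication is a right-biased union. The FMap wrapper and the name witness below are our own.

    import qualified Data.Map.Strict as Map

    newtype FMap k v = FMap (Map.Map k v) deriving (Eq, Show)

    instance Ord k => Semigroup (FMap k v) where
      FMap f <> FMap g = FMap (Map.union g f)  -- right operand wins, as in f · g

    instance Ord k => Monoid (FMap k v) where
      mempty = FMap Map.empty                  -- the map with empty domain

    -- One of the prefix witnesses from the example: take ∂(f, g) = g itself,
    -- which satisfies f <> witness f g == g whenever dom(f) ⊆ dom(g).
    witness :: FMap k v -> FMap k v -> FMap k v
    witness _ g = g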

Example 5 (Bounded-Domain Continuous-Time Signals). Let A be an arbitrary set, and R be the set of real numbers. A bounded-domain continuous-
time signal with values in A is a function f : [0, u) → A where u ≥ 0 is a real
number and [u, v) = {t ∈ R | u ≤ t < v}. We define the concatenation operation
· for such signals as follows: for f : [0, u) → A and g : [0, v) → A, the concatenation f · g : [0, u + v) → A is given by

    (f · g)(t) = f(t),       if t ∈ [0, u)
                 g(t − u),   if t ∈ [u, u + v)

We write BSig(A) for the set of all these bounded-domain continuous-time sig-
nals. The unit signal is the unique function of type [0, 0) → A, whose domain of
definition is empty. Observe that BSig(A) is a monoid. For signals f : [0, u) → A
and g : [0, v) → A, it holds that f ⪯ g iff u ≤ v and f(t) = g(t) for every t ∈ [0, u). There is a unique prefix witness function, because for every f, g ∈ BSig(A) with f ⪯ g there is a unique h ∈ BSig(A) such that f · h = g.

Example 6 (Timed Finite Sequences). We write N to denote the set of natural numbers (non-negative integers). A timed sequence over A is an alternating sequence s0 a1 s1 a2 . . . an sn, where si ∈ N and ai ∈ A for every i. The occurrences s0, s1, . . . are called time punctuations and indicate the passage of time. So, the set of all timed sequences over A is equal to TFSeq(A) = N · (A · N)∗. We define the fusion product ⋄ of timed sequences as follows:

    (s0 a1 s1 . . . am sm) ⋄ (t0 b1 t1 . . . bn tn) = s0 a1 s1 . . . am (sm + t0) b1 t1 . . . bn tn.

The unit timed sequence is the singleton sequence 0. The algebra (TFSeq(A), ⋄, 0) is easily shown to be a monoid. There is a unique prefix witness function, because for all x, y ∈ TFSeq(A) with x ⪯ y there is a unique z ∈ TFSeq(A) s.t. x ⋄ z = y.
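A hedged Haskell sketch of the timed-sequence monoid: a timed sequence s0 a1 s1 . . . an sn is represented below (our own encoding) as the initial punctuation s0 paired with the list [(a1, s1), . . . , (an, sn)], and the fusion product merges the last punctuation of the left operand with the first punctuation of the right one.

    -- TSeq s0 [(a1,s1),...,(an,sn)] stands for the timed sequence s0 a1 s1 ... an sn.
    data TSeq a = TSeq Int [(a, Int)] deriving (Eq, Show)

    instance Semigroup (TSeq a) where
      TSeq s [] <> TSeq t qs = TSeq (s + t) qs       -- fuse s0 directly with t0
      TSeq s ps <> TSeq t qs =
        let (a, sm) = last ps                        -- last punctuation s_m
        in TSeq s (init ps ++ (a, sm + t) : qs)      -- s_m + t0 in the middle

    instance Monoid (TSeq a) where
      mempty = TSeq 0 []                             -- the unit sequence 0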

Example 7 (Finite Time-Varying Multisets). A finite time-varying multiset over A is a partial function f : N ⇀ FBag(A) whose domain is equal to
[0..n] = {0, . . . , n} for some integer n ≥ 0. We also use the notation f : [0..n] →
FBag(A) to convey this information regarding the domain of f . We define the
concatenation operation · for finite time-varying multisets as follows: for f : [0..m] → FBag(A) and g : [0..n] → FBag(A), the concatenation f · g : [0..m + n] → FBag(A) is given by

    (f · g)(t) = f(t),            if t ∈ [0..m − 1]
                 f(t) ∪ g(0),     if t = m
                 g(t − m),        if t ∈ [m + 1..n]

We write TFBag(A) to denote the set of all finite time-varying multisets over A.
The unit time-varying multiset Id : [0..0] → FBag(A) is given by Id(0) = ∅. It is
easy to see that f · Id = f and that Id · f = f for every f : [0..n] → FBag(A).
We leave it to the reader to also verify that (f · g) · h = f · (g · h) for finite
time-varying multisets f , g and h. So, the set TFBag(A) together with · and Id
is a monoid. It is not difficult to show that it is left-cancellative.
Let us consider now the prefix preorder ⪯ on finite time-varying multisets. For f : [0..m] → FBag(A) and g : [0..n] → FBag(A), it holds that f ⪯ g iff m ≤ n, f(t) = g(t) for every t ∈ [0..m − 1], and f(m) ⊆ g(m).
The examples above highlight the variety of mathematical objects that can
be meaningfully viewed as streams. These streams can be organized elegantly
using the structure of monoids. The sequences of Example 1, the multisets of
Example 2, and the finite time-varying multisets of Example 7 can be described
equivalently in terms of the partial orders of [13, 85], which have also been sug-
gested as an approach to unify notions of streams. Using partial orders it is
also possible to model the timed finite sequences of Example 6, but only with a
non-succinct encoding: every time punctuation t ∈ N is encoded with a sequence
11 . . . 1 of t punctuations, one for each time unit. Partial orders cannot encode
the sets of Example 3, the maps of Example 4, or the signals of Example 5. In-
formally, the reason for this is that partial orders can only encode commutation
equations, which are insufficient for objects such as sets and maps.

3 Stream Transductions
In this section we will introduce stream transductions as semantic denotational
models of stream transformations. At any given point in a streaming computa-
tion, we have seen an input history (the part of the stream from the beginning
of the computation until now) and we have produced an output history (the
cumulative output that has been emitted from the beginning until now). As a
first approximation, a streaming computation can be described mathematically
by a function β : A → B, where A and B are monoids that describe the input
and output type respectively, which maps an input history x ∈ A to an output
history β(x) ∈ B. The function β has to be monotone because the output is
cumulative, which means that it can only be extended with more output items
as the computation proceeds. An equivalent way to understand the monotonicity
property is that it captures the idea that any output that has already been emit-
ted cannot be retracted. Since β takes an entire input history as its argument,
Semantic Foundations for Deterministic Dataflow and Stream Processing 403

it can describe stateful computations, where the output that is emitted at every
step potentially depends on the entire input history.
Definition 8 (Stream Transduction & Incremental Form). Let A and B be monoids. A function β : A → B is said to be monotone (with respect to the prefix preorder) if x ⪯ y implies β(x) ⪯ β(y) for all x, y ∈ A. For a monotone β : A → B, we say that the partial function μ is a monotonicity witness function if it maps elements x, y ∈ A and z ∈ prefix(x, y) witnessing that x ⪯ y to a witness μ(x, y, z) ∈ prefix(β(x), β(y)) for β(x) ⪯ β(y). That is, we require that the type of μ is ∏x,y∈A prefix(x, y) → prefix(β(x), β(y)). So, the defining property of μ is that for all x, y, z ∈ A with xz = y it holds that β(x) · μ(x, y, z) = β(y). For brevity, we will sometimes write μ(x, z) to denote μ(x, xz, z). The defining property of μ is then written as β(x) · μ(x, z) = β(xz) for all x, z ∈ A.
A stream transduction from A to B is a function β : A → B that is monotone with respect to the prefix preorder, together with a monotonicity witness function μ : ∏x,y∈A prefix(x, y) → prefix(β(x), β(y)). We write STrans(A, B) to denote the set of all stream transductions from A to B.
The incremental form of a stream transduction ⟨β, μ⟩ ∈ STrans(A, B) is a function F(β, μ) : A∗ → B∗, which is defined inductively by F(β, μ)(ε) = ⟨β(1)⟩ and F(β, μ)(⟨x1, . . . , xn, xn+1⟩) = F(β, μ)(⟨x1, . . . , xn⟩) · ⟨μ(x1 · · · xn, xn+1)⟩ for every sequence ⟨x1, . . . , xn+1⟩ ∈ A∗.
Consider the stream transduction ⟨β, μ⟩ : STrans(A, B) and the input fragments x, y ∈ A. Notice that μ(x, y) gives the output increment that the streaming computation generates when the input history x is extended into xy. For an arbitrary output monoid B, the output increment μ(x, y) is generally not uniquely determined by β(x) and β(xy). This means that the monotonicity witness function μ generally provides some additional information about the streaming computation that cannot be obtained purely from β. However, if the output monoid B is left-cancellative then there is a unique function μ that witnesses the monotonicity of β.
Suppose that ⟨β, μ⟩ : STrans(A, B) is a stream transduction. The incremental form F(β, μ) of the transduction ⟨β, μ⟩ describes the stream transformation in explicit input/output increments. For example, F(β, μ)(⟨x1⟩) = ⟨β(1), μ(1, x1)⟩ and F(β, μ)(⟨x1, x2⟩) = ⟨β(1), μ(1, x1), μ(x1, x2)⟩. The key property of the incremental form is that π(F(β, μ)(x̄)) = β(π(x̄)) for every x̄ ∈ A∗. For example, π(F(β, μ)(⟨x1, x2, x3⟩)) = β(1) · μ(1, x1) · μ(x1, x2) · μ(x1 x2, x3) = β(x1) · μ(x1, x2) · μ(x1 x2, x3) = β(x1 x2) · μ(x1 x2, x3) = β(x1 x2 x3).
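The incremental form admits a direct functional reading. The following Haskell sketch is our own rendering, using the abbreviated two-argument witness μ(x, z) with β(x) · μ(x, z) = β(xz); it computes F(β, μ) by threading the input history through the witness.

    -- A stream transduction: a monotone beta together with its witness mu,
    -- in the abbreviated form (mu history increment).
    data STrans a b = STrans { beta :: a -> b, mu :: a -> a -> b }

    -- The incremental form F(beta, mu) : A* -> B*, emitting beta mempty
    -- first and then one output increment per input fragment.
    incremental :: Monoid a => STrans a b -> [a] -> [b]
    incremental t = go mempty [beta t mempty]
      where
        go _ acc []       = acc
        go h acc (x : xs) = go (h <> x) (acc ++ [mu t h x]) xs

    -- e.g. incremental t [x1, x2] == [beta t mempty, mu t mempty x1, mu t x1 x2]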
Example 9 (Counting). Let A be an arbitrary set. We will describe a stream-
ing computation whose input type is the monoid FBag(A) and whose output
type is the monoid FSeq(N). The informal operational description is as follows:
there is no initial output, and every time a new data item arrives the compu-
tation emits the total number of items seen so far. The formal description is
given by the stream transduction β : FBag(A) → FSeq(N), defined by β(∅) = ε and β(x) = ⟨1, 2, . . . , |x|⟩ for every non-empty x ∈ FBag(A), where |x| denotes the size of the multiset x. It is easy to see that β is monotone. Since FSeq(N) is left-cancellative, the monotonicity witness function is uniquely determined: μ(x, ∅) = ε and μ(x, y) = ⟨|x| + 1, . . . , |x| + |y|⟩ when y ≠ ∅.
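For illustration, the counting transduction can be sketched in Haskell by representing a finite multiset simply as a list of its elements (the order is irrelevant for this computation); countBeta and countMu are our own names.

    -- beta maps the multiset x to the sequence <1, 2, ..., |x|>.
    countBeta :: [a] -> [Int]
    countBeta x = [1 .. length x]

    -- The unique monotonicity witness: the output increment produced when
    -- the history x is extended by the fragment y.
    countMu :: [a] -> [a] -> [Int]
    countMu x y = [length x + 1 .. length x + length y]

    -- Defining property: countBeta x ++ countMu x y == countBeta (x ++ y).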
Example 10 (Per-Key Aggregation). Let K be a set of keys, and V be
a set of values. The elements of K × V are typically called key-value pairs.
Suppose that op : V × V → V is an associative and commutative operation. So,
op can be generalized to an aggregation operation that takes non-empty finite
multisets over V as input. We will describe a streaming computation whose
input type is the monoid FBag(K × V ) and whose output type is the monoid
FMap(K, V ). Informally, every time an item (k, v) is processed, the output map
is updated so that the k-indexed entry contains the aggregate (using op) of all
values seen so far for the key k. The formal description of this computation is
given by the stream transduction β : FBag(K × V ) → FMap(K, V ), defined by
β(x) = {k ↦ op(x|k) | k appears in x} for every multiset x, where x|k denotes
the multiset that results from x by keeping only the pairs whose key is equal to
k. That is, the domain of β(x) is equal to dom(β(x)) = {k ∈ K | k appears in x}
and β(x)(k) = op(x|k ) for every k that appears in x. The monotonicity witness
function μ is defined as follows: μ(x, y) is equal to the restriction of the map
β(x ∪ y) to the set of all keys that appear in y.
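A small Haskell sketch of per-key aggregation, instantiated with op = (+) and with the input bag again represented as a list of key-value pairs; perKeyBeta and perKeyMu are illustrative names of ours.

    import qualified Data.Map.Strict as Map

    -- beta(x): each key is mapped to the aggregate of all its values in x.
    perKeyBeta :: Ord k => [(k, Int)] -> Map.Map k Int
    perKeyBeta = foldr (\(k, v) -> Map.insertWith (+) k v) Map.empty

    -- mu(x, y): the restriction of beta(x ∪ y) to the keys appearing in y.
    perKeyMu :: Ord k => [(k, Int)] -> [(k, Int)] -> Map.Map k Int
    perKeyMu x y =
      Map.restrictKeys (perKeyBeta (x ++ y)) (Map.keysSet (perKeyBeta y))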
We saw in Sect. 2 that we can form products of monoids: if A and B are
monoids, then so is A × B. Intuitively, we can think of A × B as the data
stream type that involves two parallel and independent channels: one channel
for streams of type A and another channel for streams of type B.
Example 11 (Merging of Multiple Input Channels). Given a set A, we
want to describe a transformation with two input channels of type FBag(A) and
one output channel of type FBag(A). The monotone function β : FBag(A) ×
FBag(A) → FBag(A), given by β(x, y) = x ∪ y for multisets x and y, describes
the merging of the two input substreams. Operationally, whenever a new data
item arrives (regardless of channel) it is propagated to the output channel. Since
FBag(A) is left-cancellative, the monotonicity witness function is uniquely determined: μ((x1, y1), (x2, y2)) = (x2 ∪ y2) \ (x1 ∪ y1) for all x1, y1, x2, y2 ∈ FBag(A).
Example 12 (Flatten). Let A be a monoid. The function β : FSeq(A) → A,
given by β(x̄) = π(x̄) for every x̄ ∈ FSeq(A), describes the flattening of a se-
quence of monoid elements. The function β is monotone, and its monotonicity
witness function μ is given by μ(x̄, ȳ) = π(ȳ) for all x̄ and ȳ. The stream transduction flatten(A) = ⟨β, μ⟩ has type STrans(FSeq(A), A).
Example 13 (Split in Batches). Let Σ = {a, b} be an alphabet of sym-
bols. Suppose that we want to describe the decomposition of an element of
Σ ∗ into batches of size exactly 3. We describe this using two functions r1 :
Σ ∗ → FSeq(Σ ∗ ) and r2 : Σ ∗ → Σ ∗ . Informally, r1 gives the sequence of full
batches of size 3, and r2 gives the remaining incomplete batch. For example,
r1(abbaabba) = ⟨abb, aab⟩ and r2(abbaabba) = ba.
This idea of splitting in batches can be generalized from the monoid Σ ∗ to
an arbitrary monoid A. We say that a splitter for A is a pair r = (r1 , r2 ) of
functions r1 : A → FSeq(A) and r2 : A → A satisfying the following prop-
erties: (1) the equality x = π(r1 (x)) · r2 (x) says that r1 and r2 decompose
x ∈ A, (2) r1 (1A ) = ε says that the unit cannot be decomposed, (3) r1 (x · y) =
r1 (x) · r1 (r2 (x) · y) and (4) r2 (x · y) = r2 (r2 (x) · y) describe how to decom-
pose the concatenation of two monoid elements. The first two properties im-
ply that r2 (1A ) = 1A . The third property implies that r1 is monotone. Define
μ(x, y) = r1(r2(x) · y) for x, y ∈ A and observe that r1(x) · μ(x, y) = r1(xy). It follows that split(r) = ⟨r1, μ⟩ is a stream transduction of type STrans(A, FSeq(A)).
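For the list monoid, the size-3 splitter of Example 13 can be sketched as follows (splitBatches is an auxiliary of ours):

    -- Split a list into complete batches of size n plus a leftover fragment.
    splitBatches :: Int -> [a] -> ([[a]], [a])
    splitBatches n xs
      | length xs < n = ([], xs)
      | otherwise     = let (batch, rest)   = splitAt n xs
                            (batches, left) = splitBatches n rest
                        in (batch : batches, left)

    r1 :: [a] -> [[a]]
    r1 = fst . splitBatches 3    -- the sequence of full batches

    r2 :: [a] -> [a]
    r2 = snd . splitBatches 3    -- the remaining incomplete batch

    -- r1 "abbaabba" == ["abb","aab"] and r2 "abbaabba" == "ba",
    -- and indeed concat (r1 x) ++ r2 x == x for every list x.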

Our denotational model of a stream transformation uses a monotone function whose domain is the monoid of (finite) input histories. We emphasize that such a denotation can also describe the transformation of an infinite stream. To illustrate this point in simple terms, consider a monotone function β : A∗ → B∗, where A (resp., B) is the type of input (resp., output) items. This function extends uniquely to the ω-continuous function β∞ : A∞ → B∞, where A∞ = A∗ ∪ Aω is the set of finite and infinite sequences over A, as follows: β∞(a0 a1 a2 . . .) is equal to the supremum of the chain β(ε) ≤ β(a0) ≤ β(a0 a1) ≤ . . .

4 Model of Computation
We will present an abstract model of computation for stream processing, where
the input and output data streams are elements of monoids A and B respec-
tively. A streaming algorithm is described by a transducer, a kind of automaton
that produces output values. We consider transducers that can have a poten-
tially infinite state space, which we denote by St. The computation starts at a
distinguished initial state init ∈ St, and the initialization triggers some initial
output o ∈ B. The computation then proceeds by consuming the input stream
incrementally, i.e. fragment by fragment. One step of the computation from a
state s ∈ St involves consuming an input fragment x ∈ A, producing an output
increment out(s, x) ∈ B and transitioning to the next state next(s, x) ∈ St.

Definition 14 (Stream Transducer). Let A, B be monoids. A stream transducer with inputs from A and outputs from B is a tuple G = (St, init, o, next, out),
where St is a nonempty set of states, init ∈ St is the initial state, o ∈ B is the ini-
tial output, next : St × A → St is the transition function, and out : St × A → B is
the output function. We write G(A, B) to denote the set of all stream transducers
with inputs from A and outputs from B.
We define the generalized transition function gnext : St × A∗ → St by in-
duction: gnext(s, ε) = s and gnext(s, x · ȳ) = gnext(next(s, x), ȳ) for all s ∈ St,
x ∈ A and ȳ ∈ A∗ . A state s ∈ St is said to be reachable in G if there exists a
sequence x̄ ∈ A∗ such that gnext(init, x̄) = s.
We define the generalized output function gout : St × A∗ → B by induction on the second argument: gout(s, ε) = 1 and gout(s, x · ȳ) = out(s, x) · gout(next(s, x), ȳ) for all s ∈ St, x ∈ A and ȳ ∈ A∗. The extended output function eout : St × A∗ → B∗ is defined similarly: eout(s, ε) = ε and eout(s, x · ȳ) = ⟨out(s, x)⟩ · eout(next(s, x), ȳ) for all s ∈ St, x ∈ A and ȳ ∈ A∗.
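Definition 14 translates almost verbatim into Haskell. The record below is a sketch of ours (the state space is a type parameter, so it may well be infinite), together with the generalized transition, the extended output, and the denotation ⟦G⟧ used later in this section.

    -- A stream transducer G = (St, init, o, next, out) with inputs from the
    -- monoid a and outputs from the monoid b.
    data Transducer s a b = Transducer
      { initSt  :: s            -- init, the initial state
      , initOut :: b            -- o, the initial output
      , next    :: s -> a -> s  -- transition function
      , out     :: s -> a -> b  -- output function
      }

    -- Generalized transition gnext on a sequence of input fragments.
    gnextT :: Transducer s a b -> s -> [a] -> s
    gnextT g = foldl (next g)

    -- Extended output eout: the sequence of output increments.
    eoutT :: Transducer s a b -> s -> [a] -> [b]
    eoutT _ _ []       = []
    eoutT g s (x : xs) = out g s x : eoutT g (next g s x) xs

    -- The denotation [[G]]: initial output followed by the output increments.
    denote :: Transducer s a b -> [a] -> [b]
    denote g xs = initOut g : eoutT g (initSt g) xs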
406 K. Mamouras

Example 15 (Transducer for Counting). Recall the counting streaming
computation that was described in Example 9. We will describe a stream trans-
ducer that implements the counting computation. The input monoid is FBag(A)
and the output monoid is FSeq(N). The state space is St = N, because the
transducer has to maintain a counter that remembers the number of data items
seen so far. The initial state is init = 0 and the initial output is o = ε. The
transition function increments the counter, i.e. next(s, x) = s + |x| for every
s ∈ St and x ∈ FBag(A). The output function is defined by out(s, ∅) = ε and out(s, x) = ⟨s + 1, . . . , s + |x|⟩ for a nonempty multiset x. The type of this
transducer is G(FBag(A), FSeq(N)).
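Phrased with the Transducer sketch above (and with multiset fragments again represented as lists), the counting transducer becomes:

    counting :: Transducer Int [a] [Int]
    counting = Transducer
      { initSt  = 0                                 -- no items seen yet
      , initOut = []                                -- o = empty sequence
      , next    = \s x -> s + length x              -- add |x| to the counter
      , out     = \s x -> [s + 1 .. s + length x]   -- <s+1, ..., s+|x|>
      }

    -- denote counting ["ab", "", "c"] == [[], [1,2], [], [3]]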

Example 16 (Transducer for Merging). We will implement the merging computation of Example 11, where there are two input channels of type FBag(A) and one output channel of type FBag(A). The transducer does not need memory, so St = Unit, where Unit = {⋆} is a singleton set. The initial state is init = ⋆ and the initial output is o = ∅. There is only one possibility for the transition function: next(s, (x, y)) = ⋆. The output function describes the propagation of the input increments of both input channels to the output channel: out(s, (x, y)) = x ∪ y for all multisets x, y. The type of this transducer is G(FBag(A) × FBag(A), FBag(A)).

Example 17 (Flatten). For a monoid A, we define a transducer Flatten(A) = (St, init, o, next, out) : G(FSeq(A), A) that implements the flattening transduction of Example 12. This computation does not require memory, so we define St = Unit and init = ⋆. The initial output is o = 1A, the transition function is uniquely determined by next(s, x) = ⋆, and the output function is given by out(s, ⟨a1, . . . , an⟩) = a1 · · · an.
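Both stateless transducers of Examples 16 and 17 can be written down with the Transducer sketch above; as before, this is an illustration with bags represented as lists.

    -- Example 16: merge two input channels into one output channel.
    merging :: Transducer () ([a], [a]) [a]
    merging = Transducer
      { initSt  = (), initOut = []
      , next    = \_ _ -> ()                 -- no state to maintain
      , out     = \_ (x, y) -> x ++ y        -- propagate both increments
      }

    -- Example 17: flatten a stream of sequences of monoid elements.
    flattenT :: Monoid a => Transducer () [a] a
    flattenT = Transducer
      { initSt  = (), initOut = mempty       -- o = 1_A
      , next    = \_ _ -> ()
      , out     = \_ frags -> mconcat frags  -- a1 · a2 · ... · an
      }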

Example 18 (Split in Batches). For a monoid A and a splitter r = (r1, r2) for A (Example 13), we describe a transducer Split(r) = (St, init, o, next, out) that
implements the transduction split(r) : STrans(A, FSeq(A)). We define St = A,
because the transducer needs to remember the remainder of the cumulative
input that does not yet form a complete batch, and init = 1A . The initial output
o = ε is the empty sequence. The transition and output functions are defined by
next(s, x) = r2 (s · x) and out(s, x) = r1 (s · x).

Definition 14 does not capture a key requirement for streaming computations over monoids, namely that the cumulative output of a transducer G should be
independent of the particular way in which the input history is split into the
fragments that are fed to it. More precisely, suppose that w is an input history
that can be fragmented (factorized) in two different ways: w = u1 · u2 · · · um
and w = v1 · v2 · · · vn . Then, the cumulative output of the transducer G when
consuming the sequence of fragments (factorization) u1 , u2 , . . . , um should be
equal to the cumulative output when consuming v1 , v2 , . . . , vn . In Definition 20
below, we formulate a set of coherence conditions that a transducer must adhere
to in order to satisfy this “factorization independence” requirement.
Semantic Foundations for Deterministic Dataflow and Stream Processing 407

Definition 19 (Bisimulation & Bisimilarity). Let G = (St, init, o, next, out) be a transducer with inputs from A and outputs from B. A relation R ⊆ St × St is a bisimulation for G if for every s, t ∈ St and x ∈ A we have that (s, t) ∈ R implies out(s, x) = out(t, x) and (next(s, x), next(t, x)) ∈ R. We will also use the notation sRt to mean (s, t) ∈ R. We say that the states s, t ∈ St are bisimilar, denoted s ∼ t, if there exists a bisimulation R for G such that sRt. The relation ∼ is called the bisimilarity relation for G.
It is well-known that the bisimilarity relation for G is an equivalence relation
(reflexive, symmetric, and transitive), and for all s, t ∈ St and x ∈ A it satisfies
the following extension property : s ∼ t implies that next(s, x) ∼ next(t, x). It
can then be easily seen that the bisimilarity relation is a bisimulation. In fact,
it is the largest bisimulation for the transducer G.
Definition 20 (Coherence). Suppose G = (St, init, o, next, out) : G(A, B) is a
stream transducer. We say that G is coherent if it satisfies the following:
(N1) next(init, 1) ∼ init.
(N2) next(init, xy) ∼ next(next(init, x), y) for every x, y ∈ A.
(O1) o · out(init, 1) = o.
(O2) o · out(init, xy) = o · out(init, x) · out(next(init, x), y) for every x, y ∈ A.
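The two output conditions can be tested pointwise on concrete input fragments; the bisimilarity conditions (N1) and (N2), by contrast, quantify over all future behaviors and are not directly checkable this way. A hedged property-style sketch, reusing the Transducer record introduced earlier (coherentOut is our own name, suitable as a QuickCheck-style property):

    -- Check (O1) and (O2) for one pair of fragments x, y, assuming the
    -- output monoid has decidable equality.
    coherentOut :: (Monoid a, Monoid b, Eq b)
                => Transducer s a b -> a -> a -> Bool
    coherentOut g x y =
      let s0 = initSt g
          o  = initOut g
      in  o <> out g s0 mempty == o                            -- (O1)
       && o <> out g s0 (x <> y)
            == o <> out g s0 x <> out g (next g s0 x) y        -- (O2)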
The coherence conditions of Definition 20 capture the idea that the trans-
ducer behaves in “essentially the same way” regardless of how the input is split
into fragments. For example, the condition (N2) says that the two-step transi-
tion init →x s1 →y s2 and the single-step transition init →xy t1 end up in states
(s2 and t1 ) that will have exactly the same behavior in the subsequent compu-
tation. In other words, it does not matter whether the input xy was fed to the
transducer as a single fragment xy or as a sequence of two fragments x, y.
Let (A, ·, 1) be a monoid. A factorization of an element x ∈ A is a sequence
x1 , . . . , xn of elements of A such that x = x1 · · · xn . In particular, the empty
sequence ε ∈ A∗ is a factorization of 1. In other words, x̄ ∈ A∗ is a factorization
of x ∈ A if π(x̄) = x.
Theorem 21 (Factorization Independence). Let G = (St, init, o, next, out)
be a stream transducer of type G(A, B). If G is coherent, then for every x ∈ A
and every factorization x̄ ∈ A∗ of x we have that o · gout(init, x̄) = o · out(init, x).
Proof. For clarity, we write ⟨x1, x2, . . . , xn⟩ ∈ A∗ to denote a finite sequence of elements of A. The following properties hold for all s ∈ St, x̄ ∈ A∗ and y ∈ A:

    gnext(s, x̄ · y) = next(gnext(s, x̄), y)                  (1)
    gout(s, x̄ · y) = gout(s, x̄) · out(gnext(s, x̄), y)       (2)
    eout(s, x̄ · y) = eout(s, x̄) · ⟨out(gnext(s, x̄), y)⟩     (3)

Each property shown above can be proved by induction on the sequence x̄.
Consider an arbitrary coherent stream transducer G = (St, init, o, next, out).
We claim that G satisfies the following coherence property:
    gnext(init, ⟨x1, . . . , xn⟩) ∼ next(init, x1 · · · xn) for all ⟨x1, . . . , xn⟩ ∈ A∗.   (N*)
The proof is by induction on the length of the sequence. For the base case, we
have that gnext(init, ε) = init and next(init, 1) are bisimilar because G is coherent
(recall Property (N1) of Definition 20). For the induction step we have:

    gnext(init, x̄ · y) = next(gnext(init, x̄), y)      [Equation (1)]
                       ∼ next(next(init, π(x̄)), y)     [I.H., extension]
                       ∼ next(init, π(x̄) · y),         [coherence (N2)]

which is equal to next(init, π(x̄ · y)). This concludes the proof of the claim (N*).
The proof of the theorem proceeds by induction on x̄ ∈ A∗ . For the base case,
observe that o · gout(init, ε) = o · 1 = o is equal to o · out(init, 1) = o (property
(O1) for G). For the induction step, we have:

    o · gout(init, x̄ · y) = o · gout(init, x̄) · out(gnext(init, x̄), y)      [Eq. (2)]
                          = o · out(init, π(x̄)) · out(gnext(init, x̄), y)     [I.H.]
                          = o · out(init, π(x̄)) · out(next(init, π(x̄)), y)   [Prop. (N*)]
                          = o · out(init, π(x̄) · y)                          [Prop. (O2)]

which is equal to o · out(init, π(x̄ · y)). □
Theorem 21 says that the condition of coherence guarantees a basic correctness property for stream transducers: the output that they produce does not depend on the specific way in which the input was partitioned into fragments.
For a transducer G = (St, init, o, next, out) we define the function ⟦G⟧ : A∗ → B∗ as follows: ⟦G⟧(x̄) = ⟨o⟩ · eout(init, x̄) for every x̄ ∈ A∗. We call ⟦G⟧ the interpretation or denotation of G. The definition of ⟦G⟧ implies that ⟦G⟧(ε) = ⟨o⟩ and the following holds for every x̄ ∈ A∗ and y ∈ A:

    ⟦G⟧(x̄ · y) = ⟦G⟧(x̄) · ⟨out(gnext(init, x̄), y)⟩     (4)

When G is coherent, Theorem 21 says that the denotation gives the same cumulative output for any two factorizations of the input. We say that the transducers G1 and G2 are equivalent if their denotations are equal, i.e. ⟦G1⟧ = ⟦G2⟧.

Definition 22 (The Implementation Relation). Let A, B be monoids, G : G(A, B) be a stream transducer, and ⟨β, μ⟩ : STrans(A, B) be a stream transduction. We say that G implements ⟨β, μ⟩ if ⟦G⟧(x̄) = F(β, μ)(x̄) for every x̄ ∈ A∗.

Theorem 23 (Implementation & Coherence). A stream transducer G : G(A, B) is coherent if and only if it implements some stream transduction.

Proof. Suppose that G = (St, init, o, next, out) : G(A, B) is a coherent transducer.
Define the function β : A → B by β(x) = o · out(init, x) for every x ∈ A, and
the function μ : A × A → B by μ(x, y) = out(next(init, x), y) for all x, y ∈
A. For any x, y ∈ A, we have to establish that β(x) · μ(x, y) = β(xy). This
follows immediately from Part (O2) of the coherence property for G. So, ⟨β, μ⟩ is a stream transduction. It remains to prove that G implements ⟨β, μ⟩, that is, ⟦G⟧(x̄) = F(β, μ)(x̄) for every x̄ ∈ A∗. For the base case, we have ⟦G⟧(ε) = ⟨o⟩ and F(β, μ)(ε) = ⟨β(1)⟩, which are equal because β(1) = o · out(init, 1) = o by (O1). For the step case, we observe that:

    ⟦G⟧(x̄ · y) = ⟦G⟧(x̄) · ⟨out(gnext(init, x̄), y)⟩       [Equation (4)]
    F(β, μ)(x̄ · y) = F(β, μ)(x̄) · ⟨μ(π(x̄), y)⟩           [def. of F(β, μ)]

By the induction hypothesis, it suffices to show that out(gnext(init, x̄), y) is equal to μ(π(x̄), y) = out(next(init, π(x̄)), y). This follows from the fact that gnext(init, x̄) and next(init, π(x̄)) are bisimilar, see Property (N*).
For the converse, suppose that G = (St, init, o, next, out) : G(A, B) is a transducer that implements ⟨β, μ⟩ : STrans(A, B). Define the relation R as:

    R = {(s, t) ∈ St × St | there are x̄, ȳ ∈ A∗ with π(x̄) = π(ȳ) s.t.
         s = gnext(init, x̄) and t = gnext(init, ȳ)}.

We claim that R is a bisimulation. Consider arbitrary states s, t ∈ St with sRt and z ∈ A. It follows that there are x̄, ȳ ∈ A∗ with π(x̄) = π(ȳ) such that s = gnext(init, x̄) and t = gnext(init, ȳ). We have to show that out(s, z) = out(t, z) and next(s, z) R next(t, z). First, notice that:

    ⟦G⟧(x̄ · z) = ⟦G⟧(x̄) · ⟨out(s, z)⟩                [Equation (4), def. of s]
    F(β, μ)(x̄ · z) = F(β, μ)(x̄) · ⟨μ(π(x̄), z)⟩       [def. of F(β, μ)]

Since G implements ⟨β, μ⟩, we have that ⟦G⟧(x̄ · z) = F(β, μ)(x̄ · z) and therefore out(s, z) = μ(π(x̄), z). Similarly, we can obtain that out(t, z) = μ(π(ȳ), z). From π(x̄) = π(ȳ) we get that μ(π(x̄), z) = μ(π(ȳ), z), and therefore out(s, z) = out(t, z). Now, observe that s′ = next(s, z) = next(gnext(init, x̄), z) = gnext(init, x̄ · z) using Property (1). Similarly, we have that t′ = next(t, z) = gnext(init, ȳ · z). From π(x̄ · z) = π(x̄)z = π(ȳ)z = π(ȳ · z) we conclude that s′ R t′. We have thus established that R is a bisimulation.
Now, we are ready to prove that G is coherent. We will only present the cases of Part (N2) and Part (O2), since they are the most interesting ones. Let x, y ∈ A. For Part (N2), we have to show that the states s = next(next(init, x), y) and t = next(init, xy) are bisimilar. Since R (previous paragraph) is a bisimulation, it suffices to show that (s, t) ∈ R. Indeed, this is true because s = gnext(init, ⟨x, y⟩), t = gnext(init, ⟨xy⟩) and π(⟨x, y⟩) = xy = π(⟨xy⟩). For Part (O2), we have that ⟦G⟧(⟨xy⟩) = ⟨o, out(init, xy)⟩ and F(β, μ)(⟨xy⟩) = ⟨β(1), μ(1, xy)⟩, as well as

    ⟦G⟧(⟨x, y⟩) = ⟨o, out(init, x), out(next(init, x), y)⟩ and
    F(β, μ)(⟨x, y⟩) = ⟨β(1), μ(1, x), μ(x, y)⟩,

using the definitions of ⟦G⟧ and F. Since G implements ⟨β, μ⟩, we know that ⟦G⟧(⟨x, y⟩) = F(β, μ)(⟨x, y⟩) and ⟦G⟧(⟨xy⟩) = F(β, μ)(⟨xy⟩). Using all the above, we get that o · out(init, x) · out(next(init, x), y) = β(1) · μ(1, x) · μ(x, y) = β(x) · μ(x, y) = β(xy) and o · out(init, xy) = β(1) · μ(1, xy) = β(xy). So, Part (O2) of the coherence property holds. □

Theorem 23 provides justification for our definition of the coherence property
for stream transducers (recall Definition 20). It says that the definition is ex-
actly appropriate, because it is a necessary and sufficient condition for a stream
transducer to have a stream transduction as its denotation. In other words, the
coherence property characterizes the transducers have a well-defined denota-
tional semantics in terms of transductions. It offers this guarantee of correctness
without limiting their expressive power as implementations of transductions.

Theorem 24 (Expressive Completeness). Let A and B be monoids, and ⟨β, μ⟩ be a stream transduction in STrans(A, B). There exists a coherent stream transducer that implements ⟨β, μ⟩.

Proof. Recall from Definition 8 that the monotonicity witness function μ satisfies the following property: β(x) · μ(x, y) = β(xy) for every x, y ∈ A. Now, we define the transducer G = (St, init, o, next, out) as follows: St = A, init = 1, o = β(1), next(s, x) = s · x and out(s, x) = μ(s, x) for every state s ∈ St and input x ∈ A. The following properties hold for every s ∈ St and ⟨x1, . . . , xn⟩ ∈ A∗:

    gnext(s, ⟨x1, . . . , xn⟩) = s · x1 · · · xn and                     (5)
    ⟨o⟩ · eout(init, ⟨x1, . . . , xn⟩) = F(β, μ)(⟨x1, . . . , xn⟩)       (6)

Both these properties are shown by induction on the sequence ⟨x1, . . . , xn⟩. It follows that ⟦G⟧(x̄) = ⟨o⟩ · eout(init, x̄) = F(β, μ)(x̄) for every x̄ ∈ A∗. So, G implements the transduction ⟨β, μ⟩. Finally, G is coherent by Theorem 23. □


Theorem 24 assures us that the abstract computational model of coherent stream transducers is expressive enough to implement any stream transduction.
For this reason, we will be using stream transducers as the basic programming
model for describing streaming computations.

Example 25 (Correctness of Flatten). Using induction, we will show that the transducer G = Flatten(A) = (Unit, ⋆, 1A, next, out) implements the transduction ⟨π, μ⟩ = flatten(A) for a monoid A (recall Examples 12 and 17). We show by induction that ⟦G⟧(x̄) = F(π, μ)(x̄) for every x̄ ∈ FSeq(A)∗. For the base case, we have that ⟦G⟧(ε) = ⟨1A⟩ and F(π, μ)(ε) = ⟨π(ε)⟩ = ⟨1A⟩. Now,

    ⟦G⟧(x̄ · y) = ⟦G⟧(x̄) · ⟨out(gnext(init, x̄), y)⟩     [def. of ⟦G⟧]
               = F(π, μ)(x̄) · ⟨π(y)⟩                    [I.H. and def. of out]
               = F(π, μ)(x̄) · ⟨μ(π(x̄), y)⟩              [def. of μ]
               = F(π, μ)(x̄ · y)                         [def. of F]

for all x̄ ∈ FSeq(A)∗ and y ∈ FSeq(A). We have thus proved that Flatten(A) is correct: its denotation is equal to the intended semantics.

Example 26 (Correctness of Split). We will establish that the transducer for splitting in batches is correct, namely that G = Split(r) = (A, 1A, ε, next, out) implements ⟨r1, μ⟩ = split(r) for a splitter r = (r1, r2) for the monoid A (recall Examples 13 and 18). Using the properties of splitters and an argument by induction, we obtain that gnext(init, x̄) = r2(π(x̄)) for every x̄ ∈ A∗. We show by induction that ⟦G⟧(x̄) = F(r1, μ)(x̄) for every x̄ ∈ A∗. For the base case, we have that ⟦G⟧(ε) = ⟨ε⟩ and F(r1, μ)(ε) = ⟨r1(1A)⟩ = ⟨ε⟩. Now,

    ⟦G⟧(x̄ · y) = ⟦G⟧(x̄) · ⟨out(gnext(init, x̄), y)⟩      [Equation (4)]
               = F(r1, μ)(x̄) · ⟨out(r2(π(x̄)), y)⟩        [I.H. and previous claim]
               = F(r1, μ)(x̄) · ⟨r1(r2(π(x̄)) · y)⟩        [def. of out]
               = F(r1, μ)(x̄) · ⟨μ(π(x̄), y)⟩              [def. of μ]
               = F(r1, μ)(x̄ · y)                          [def. of F]

for all x̄ ∈ A∗ and y ∈ A. We have thus established that Split(r) is correct: its denotation is equal to the intended semantics.

5 Combinators for Deterministic Dataflow


We consider four dataflow combinators: (1) the lifting of pure morphisms to
streaming computations, (2) serial composition for exposing pipeline parallelism,
(3) parallel composition for exposing task-based parallelism, and (4) feedback
composition for describing computations whose current output depends on pre-
viously produced output. The combinators are defined both for stream transduc-
tions (semantic objects) and for stream transducers (programs). Table 1 shows
the definitions. The lifting of pure morphisms is implemented with a stateless
transducer (i.e., the state space is a singleton set). Both parallel and serial com-
position are implemented using a product construction on transducers. In the
case of parallel composition, each component computes independently. In the
case of serial composition, the output of the first component is passed as input
to the second component. In the case of feedback composition, the computation
proceeds in well-defined rounds in order to prevent divergence.
We prove a precise correspondence between the semantics-level and program-
level combinators for all cases: lifting (Proposition 27), parallel composition
(Proposition 28), serial composition (Proposition 29), and feedback composition
(Proposition 30). These are essentially correctness properties for the imple-
mentations of the combinators Lift, Par, Serial, Loop. They establish that our
typed framework is appropriate for the modular specification of complex stream-
ing computations, as it can support composition constructs that are essential for
parallelization and distribution.
Proposition 27 (Lifting). Let h : A → B be a monoid homomorphism. Then,
Lift(h) is a coherent transducer and it implements the transduction lift(h).
Proposition 28 (Parallel Composition). Let A1, A2, B1, B2 be monoids,
⟨β1, μ1⟩ : STrans(A1, B1) and ⟨β2, μ2⟩ : STrans(A2, B2) be transductions, and
G1 : G(A1, B1) and G2 : G(A2, B2) be transducers.
(1) Implementation: If G1 implements ⟨β1, μ1⟩ and G2 implements ⟨β2, μ2⟩,
then Par(G1, G2) implements ⟨β1, μ1⟩ ⊗ ⟨β2, μ2⟩.

Table 1. Combinators for deterministic dataflow.

Lifting of monoid homomorphisms:
  monoid homomorphism h : A → B
  lift(h) = ⟨β, μ⟩ : STrans(A, B), where β(x) = h(x) and μ(x, y) = h(y)
  Lift(h) = (St, init, o, next, out), where St = Unit, init = ⟨⟩, o = h(1),
    next(s, x) = s, and out(s, x) = h(x)

Parallel composition:
  ⟨β1, μ1⟩ : STrans(A1, B1) and ⟨β2, μ2⟩ : STrans(A2, B2)
  ⟨β1, μ1⟩ ⊗ ⟨β2, μ2⟩ = ⟨β, μ⟩ : STrans(A1 × A2, B1 × B2), where
    β(⟨x1, x2⟩) = ⟨β1(x1), β2(x2)⟩ and μ(⟨x1, x2⟩, ⟨y1, y2⟩) = ⟨μ1(x1, y1), μ2(x2, y2)⟩
  G1 = (St1, init1, o1, next1, out1) and G2 = (St2, init2, o2, next2, out2)
  Par(G1, G2) = (St, init, o, next, out), where St = St1 × St2, init = ⟨init1, init2⟩,
    o = ⟨o1, o2⟩, next(⟨s1, s2⟩, ⟨a, c⟩) = ⟨next1(s1, a), next2(s2, c)⟩, and
    out(⟨s1, s2⟩, ⟨a, c⟩) = ⟨out1(s1, a), out2(s2, c)⟩

Serial composition:
  ⟨β1, μ1⟩ : STrans(A, B) and ⟨β2, μ2⟩ : STrans(B, C)
  ⟨β1, μ1⟩ ≫ ⟨β2, μ2⟩ = ⟨β, μ⟩ : STrans(A, C), where
    β(x) = β2(β1(x)) and μ(x, y) = μ2(β1(x), μ1(x, y))
  G1 = (St1, init1, o1, next1, out1) and G2 = (St2, init2, o2, next2, out2)
  Serial(G1, G2) = (St1 × St2, init, o, next, out), where
    init = ⟨init1, next2(init2, o1)⟩, o = o2 · out2(init2, o1),
    next(⟨s1, s2⟩, a) = ⟨next1(s1, a), next2(s2, out1(s1, a))⟩, and
    out(⟨s1, s2⟩, a) = out2(s2, out1(s1, a))

Feedback composition:
  ⟨β, μ⟩ : STrans(A × B, B)
  loopB(⟨β, μ⟩) = ⟨γ, ν⟩ : STrans(FSeq(A), FSeq(B)), where
    γ(⟨a1, . . . , an⟩) = ⟨b0, b1, . . . , bn⟩ is defined by:
    γ(ε) = ⟨b0⟩, where b0 = β(1A, 1B), and
    γ(⟨a1, . . . , an, an+1⟩) = γ(⟨a1, . . . , an⟩) · ⟨bn+1⟩, where
      bn+1 = μ(⟨a1 · · · an, b0 b1 · · · bn−1⟩, ⟨an+1, bn⟩)
  G = (St, init, o, next, out) : G(A × B, B)
  LoopB(G) = (St′, init′, o′, next′, out′) : G(FSeq(A), FSeq(B)), where
    St′ = St × B (second component: last output batch),
    init′ = ⟨init, o⟩ and o′ = ⟨o⟩,
    next′(⟨s, b⟩, a) = ⟨next(s, ⟨a, b⟩), out(s, ⟨a, b⟩)⟩, and
    out′(⟨s, b⟩, a) = ⟨out(s, ⟨a, b⟩)⟩
  ⟨β, μ⟩ : STrans(A × B, B) and splitter r for A:
    loop(⟨β, μ⟩, r) = split(r) ≫ loopB(⟨β, μ⟩) ≫ flatten(B) : STrans(A, B)
  G : G(A × B, B) and splitter r for A:
    Loop(G, r) = Serial(Split(r), LoopB(G), Flatten(B)) : G(A, B)

(2) Coherence: If G1 and G2 are coherent, then so is Par(G1 , G2 ).


Proof. Notice that Part (2) follows immediately from Part (1) and Theorem 23.
Define f = Par(G1, G2) and ⟨β, μ⟩ = ⟨β1, μ1⟩ ⊗ ⟨β2, μ2⟩. We will show that
f(w̄) = F(⟨β, μ⟩)(w̄) for every w̄ ∈ (A1 × A2)∗. Suppose that fst is the (elementwise)
left projection function. We claim that fst(gnext(s, w̄)) = gnext1(fst(s), fst(w̄))
and fst(eout(s, w̄)) = eout1(fst(s), fst(w̄)) for all s ∈ St and w̄ ∈ (A1 × A2)∗. Both
claims are shown by induction on the length of w̄, and they give fst(f(w̄)) =
G1(fst(w̄)). With similar arguments we can obtain that snd(f(w̄)) = G2(snd(w̄))
for every w̄ ∈ (A1 × A2)∗. It can be shown by induction that fst(F(⟨β, μ⟩)(w̄)) =
F(⟨β1, μ1⟩)(fst(w̄)) and snd(F(⟨β, μ⟩)(w̄)) = F(⟨β2, μ2⟩)(snd(w̄)) for all
w̄ ∈ (A1 × A2)∗. In order to establish that f(w̄) = F(⟨β, μ⟩)(w̄), it suffices to show
that fst(f(w̄)) = fst(F(⟨β, μ⟩)(w̄)) and snd(f(w̄)) = snd(F(⟨β, μ⟩)(w̄)). Given the
claims shown previously, these equalities are equivalent to G1(fst(w̄)) =
F(⟨β1, μ1⟩)(fst(w̄)) and G2(snd(w̄)) = F(⟨β2, μ2⟩)(snd(w̄)) respectively. These
equalities follow from the assumptions that G1 implements ⟨β1, μ1⟩ and G2
implements ⟨β2, μ2⟩. ∎

Proposition 29 (Serial Composition). Let A, B, C be monoids, ⟨β1, μ1⟩ :
STrans(A, B) and ⟨β2, μ2⟩ : STrans(B, C) be transductions, and G1 : G(A, B) and
G2 : G(B, C) be transducers.
(1) Implementation: If G1 implements ⟨β1, μ1⟩ and G2 implements ⟨β2, μ2⟩,
then Serial(G1, G2) implements ⟨β1, μ1⟩ ≫ ⟨β2, μ2⟩.
(2) Coherence: If G1 and G2 are coherent, then so is Serial(G1, G2).
Proof. Part (2) follows easily from Part (1) and Theorem 23. In order to prove
Part (1) we have to first establish a number of preliminary facts. We define the
function M2 : A∗ → A∗ as follows: M2(ε) = ⟨1⟩, M2(⟨x⟩) = ⟨x⟩ for x ∈ A, and
M2(⟨x, y⟩ · z̄) = ⟨xy⟩ · z̄ for x, y ∈ A and z̄ ∈ A∗. We write G to denote Serial(G1, G2).

    fst(gnext(s, x̄)) = gnext1(fst(s), x̄)  for all s ∈ St and x̄ ∈ A∗                     (7)
    snd(gnext(s, x̄)) = gnext2(snd(s), eout1(fst(s), x̄))  for all s ∈ St and x̄ ∈ A∗      (8)
    G(x̄) = M2(G2(G1(x̄)))  for all x̄ ∈ A∗                                                (9)
    F(⟨β, μ⟩)(x̄) = M2(F(⟨β2, μ2⟩)(F(⟨β1, μ1⟩)(x̄)))  for all x̄ ∈ A∗                      (10)

where ⟨β, μ⟩ = ⟨β1, μ1⟩ ≫ ⟨β2, μ2⟩. All four claims above are proved by induction
on the sequence x̄. Equations (7) and (8) are needed to prove Equation (9). Now,
we will establish that G implements ⟨β, μ⟩. Indeed, we have that

    G(x̄) = M2(G2(G1(x̄)))                          [Equation (9)]
         = M2(G2(F(⟨β1, μ1⟩)(x̄)))                 [G1 implements ⟨β1, μ1⟩]
         = M2(F(⟨β2, μ2⟩)(F(⟨β1, μ1⟩)(x̄)))        [G2 implements ⟨β2, μ2⟩]
         = F(⟨β, μ⟩)(x̄)                            [Equation (10)]

for every x̄ ∈ A∗. So, we conclude that G implements ⟨β, μ⟩. ∎



Let us give an example of how to construct complex computations from simpler
ones using the dataflow combinators. Let A, B be sets and op : A → B
be a function. We want to describe a streaming computation with two input
channels, both of type FBag(A), and one output channel of type FBag(B).
The computation transforms both input channels in the same way, namely by
applying the function op to each element. This gives two output substreams,
both of type FBag(B), that are merged into the output stream. The function
op : A → B lifts to a monoid homomorphism op : FBag(A) → FBag(B), given
by op(x) = {op(a) | a ∈ x} for every multiset x. The streaming computation
described previously can be visualized using the dataflow graph shown below.

[Dataflow graph: two input edges of type FBag(A) enter two Lift(op) nodes; their
FBag(B) output edges enter a Merge node, which has a single output edge of type
FBag(B).]

Each edge of the graph represents a communication channel along which a stream
flows, and it is annotated with the type of the stream. The dataflow graph
above represents the transducer G = Serial(Par(Lift(op), Lift(op)), Merge),
where Merge : G(FBag(B) × FBag(B), FBag(B)) is (an instance of) the transducer
of Example 16. From Propositions 27, 28 and 29 we obtain that G implements
the transduction (lift(op) ⊗ lift(op)) ≫ merge, where merge is described in
Example 11.
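For concreteness, here are hedged sketches of Lift, Par, and Serial in the toy Haskell model introduced after Theorem 24 (liftT, parT, serialT, and mergeT are our names; multisets are approximated as lists). Note how serialT threads each output of the first transducer into the second and starts from o = o2 · out2(init2, o1), mirroring Table 1.

```haskell
-- Lifting a homomorphism h: stateless, initial output h(1).
liftT :: Monoid a => (a -> b) -> Transducer () a b
liftT h = Transducer () (h mempty) (\_ _ -> ()) (\_ x -> h x)

-- Parallel composition: a product construction, componentwise.
parT :: Transducer s1 a1 b1 -> Transducer s2 a2 b2
     -> Transducer (s1, s2) (a1, a2) (b1, b2)
parT g1 g2 = Transducer
  (initSt g1, initSt g2)
  (initOut g1, initOut g2)
  (\(s1, s2) (x1, x2) -> (next g1 s1 x1, next g2 s2 x2))
  (\(s1, s2) (x1, x2) -> (out g1 s1 x1, out g2 s2 x2))

-- Serial composition: the second transducer consumes the first one's
-- outputs, including the initial output o1.
serialT :: Monoid c => Transducer s1 a b -> Transducer s2 b c
        -> Transducer (s1, s2) a c
serialT g1 g2 = Transducer
  (initSt g1, next g2 (initSt g2) (initOut g1))
  (initOut g2 <> out g2 (initSt g2) (initOut g1))
  (\(s1, s2) x -> (next g1 s1 x, next g2 s2 (out g1 s1 x)))
  (\(s1, s2) x -> out g2 s2 (out g1 s1 x))

-- Merge (Example 16), with FBag approximated by lists:
mergeT :: Transducer () ([b], [b]) [b]
mergeT = Transducer () [] (\_ _ -> ()) (\_ (x, y) -> x ++ y)

-- The dataflow graph above, as a term (map op :: [a] -> [b] is the
-- lifted homomorphism on list-approximated bags):
-- g op = serialT (parT (liftT (map op)) (liftT (map op))) mergeT
```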
We will now consider the feedback combinator, which introduces cycles in
the dataflow graph. One consequence of cyclic graphs in the style of Kahn-
MacQueen [60] is that divergence can be introduced, that is, a finite amount
of input can cause an operator to enter an infinite loop. For example, consider
the transducer Merge : G(FBag(A) × FBag(A), FBag(A)) of Example 16. The
figure below visualizes the dataflow graph, where the output channel of Merge
is connected to one of its input channels, thus forming a feedback loop.
[Dataflow graph: the output channel of Merge, of type FBag(A), is fed back into
its second input channel; the external input and output channels also have type
FBag(A).]

Suppose that the singleton input {a} is fed to the input of the dataflow graph
above, which corresponds to the first input channel of Merge. This will cause
Merge to emit {a}, which will be sent again to the second input channel of Merge.
Intuitively, this will cause the computation to enter an infinite loop (divergence)
of consuming and emitting {a}. This behavior is undesirable in systems that
process data streams, because divergence can make the system unresponsive. For
this reason, we will consider here a form of feedback that eliminates this problem
by ensuring that the computation of a feedback loop proceeds in a sequence of
rounds. This avoids divergence, because the computation always makes progress
by moving from one round to the next, as dictated by the input data. We describe
this organization in rounds by requiring that the programmer specifies a splitter
(recall Example 18). The splitter decomposes the input stream into batches,
and one round of computation for the feedback loop corresponds to consuming
one batch of data, generating the corresponding output batch, and sending the
output batch along the feedback loop to be available for the next round of
processing. This form of feedback allows flexibility in specifying what constitutes
a single batch (and thus a single round), and therefore generalizes the feedback
combinator of Synchronous Languages such as Lustre [31].
Proposition 30 (Feedback Composition). Let A and B be monoids, ⟨β, μ⟩ :
STrans(A × B, B) be a transduction, G : G(A × B, B) be a transducer, and
r = (r1, r2) be a splitter for A (see Example 13).
(1) Implementation: If G implements ⟨β, μ⟩, then Loop(G, r) implements loop(⟨β, μ⟩, r).
(2) Coherence: If G is coherent, then so is Loop(G, r).
Proof. We leave to the reader the proofs that Split (Example 18) implements
split and that Flatten (Example 17) implements flatten. Given Proposition 29,
it suffices to show that G′ = LoopB(G) implements ⟨γ, ν⟩ = loopB(⟨β, μ⟩). Since
G′ is of type G(FSeq(A), FSeq(B)), it suffices to define the transition and output
functions on singleton sequences (as done in Table 1), because there is a unique
way to extend them so that G′ is coherent. It remains to show that G′(x̄) =
F(⟨γ, ν⟩)(x̄) for every x̄ ∈ FSeq(A)∗. The base case is easy, and for the step case it
suffices to show that out′(gnext′(init′, x̄), y) = ν(π(x̄), y) for every x̄ ∈ FSeq(A)∗
and y ∈ FSeq(A). As we discussed before, gnext′ and out′ can be viewed as being
defined on elements of A rather than sequences of FSeq(A), so we can equivalently
prove that out′(gnext′(init′, ⟨a1, . . . , an⟩), an+1) = ν(⟨a1, . . . , an⟩, an+1) with each
ai an element of A. Given that G implements ⟨β, μ⟩, the key observation to finish
the proof is gnext′(init′, ⟨a1, . . . , an⟩) = ⟨gnext(init, ⟨⟨a1, b0⟩, . . . , ⟨an, bn−1⟩⟩), bn⟩,
where γ(⟨a1, . . . , an⟩) = ⟨b0, b1, . . . , bn⟩. ∎

Example 31. For an example of using the feedback combinator, consider the
transduction ⟨β, μ⟩ which adds two input streams of numbers pointwise. That
is, β : FSeq(N) × FSeq(N) → FSeq(N) is defined by β(x1x2 . . . xm, y1y2 . . . yn) =
0 (x1 + y1)(x2 + y2) . . . (xk + yk), where k = min(m, n). Additionally, consider
the trivial splitter r = (r1, r2) for sequences, where each batch is a singleton:
r1(x1 . . . xn) = ⟨x1, . . . , xn⟩ and r2(x1 . . . xn) = ε. We use this splitter to enforce
that each batch is a single element and that each round of the computation
involves consuming one element. Finally, the transduction loop(⟨β, μ⟩, r) = ⟨γ, ν⟩
describes the running sum, that is, γ(x1 . . . xn) = 0 x1 (x1 + x2) . . . (x1 + · · · + xn).
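As a quick sanity check of Example 31 (our code, checking the denotation rather than implementing the Loop construction itself), the resulting transduction γ is exactly a prefix scan:

```haskell
-- Running sum: emits 0 first and then every prefix sum, matching
-- γ(x1 . . . xn) = 0, x1, x1+x2, ..., x1+···+xn.
runningSum :: [Integer] -> [Integer]
runningSum = scanl (+) 0

-- e.g., runningSum [1, 2, 3] == [0, 1, 3, 6]
```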
The dataflow combinators of this section could form the basis of query lan-
guage design. The StreamQRE language [10,84] and related formalisms [9,11,12,
14] are based on a set of combinators for efficiently processing linearly-ordered
streams (e.g., time series [3, 4]). Extending a language like StreamQRE to the
typed setting of stream transductions is an interesting research direction.

6 Algebraic Reasoning for Optimizing Transformations


Our typed denotational framework can be used to validate optimizing transfor-
mations using algebraic reasoning. This amounts to establishing that the original
transducer is equivalent to the optimized one. A fundamental approach for show-
ing equivalence of composite transducers is to establish algebraic laws between
basic building blocks, and then use algebraic rewriting.

As a concrete example, consider the per-key streaming aggregation of Example 10,
which is described by the transduction reduce(K, op) : STrans(FBag(K × V),
FMap(K, V)), where K is the set of keys, V is the set of values, and op :
V × V → V is an associative and commutative aggregation operation. Let
h : K → {1, . . . , n} be a hash function for the keys, and define Kᵢʰ = h⁻¹(i) =
{k ∈ K | h(k) = i} for every i. Consider two variants of the merging operation of
Example 11: (1) kmerge(h) merges n input streams of types FBag(K₁ʰ × V), . . . ,
FBag(Kₙʰ × V) respectively into an output stream of type FBag(K × V), and (2)
mmerge(h) merges n input streams of types FMap(K₁ʰ, V), . . . , FMap(Kₙʰ, V)
respectively into an output stream of type FMap(K, V). We also consider the
transduction ksplit(h), which partitions an input stream of type FBag(K × V)
into n output substreams of types FBag(K₁ʰ × V), . . . , FBag(Kₙʰ × V) respectively.
Using elementary set-theoretic arguments, the following equalities can be
established: ksplit(h) ≫ kmerge(h) = id and

    kmerge(h) ≫ rd(K, op) = (rd(K₁ʰ, op) ⊗ · · · ⊗ rd(Kₙʰ, op)) ≫ mmerge(h),

where rd abbreviates reduce. Next, we consider the corresponding transducers
KSplit(h), KMerge(h), Id, Reduce(K, op) (abbreviated Rd), and MMerge(h), and
establish that they implement the respective transductions. This can be shown
with induction proofs as shown earlier in Example 25 and Example 26. Using
these facts and the propositions of Sect. 5, the equalities between transductions
shown earlier give the following equations (equivalences) between transducers:
KSplit(h) ≫ KMerge(h) ≡ Id and

    KMerge(h) ≫ Rd(K, op) ≡ (Rd(K₁ʰ, op) ⊗ · · · ⊗ Rd(Kₙʰ, op)) ≫ MMerge(h).

Using these equations, we can establish the following optimizing transformation
for data parallelization, which is useful when processing high-rate data streams:

    Reduce(K, op) ≡ Id ≫ Reduce(K, op)
                  ≡ KSplit(h) ≫ KMerge(h) ≫ Reduce(K, op)
                  ≡ KSplit(h) ≫ (Rd(K₁ʰ, op) ⊗ · · · ⊗ Rd(Kₙʰ, op)) ≫ MMerge(h).

The above equation illustrates our proposed style of reasoning for establishing
the soundness of optimizing streaming transformations: (1) prove equalities be-
tween transductions using elementary set-theoretic arguments, (2) prove that
the transducers (programs) implement the transductions (denotations) using
induction, (3) translate the equalities between transductions into equivalences
between transducers using the results of Sect. 5, and finally (4) use algebraic
reasoning to establish more complex equivalences.
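Step (1) of this recipe can even be prototyped executably. Below is a hedged sketch (reduceKV, ksplitB, and mmergeB are our names; FBag(K × V) is approximated by a list of pairs, FMap(K, V) by Data.Map, and op by (+)); it tests the per-key equality on sample data, which is evidence, not a proof.

```haskell
import qualified Data.Map as M

-- reduce(K, op) on a finished stream: aggregate the values per key.
reduceKV :: (Ord k, Num v) => [(k, v)] -> M.Map k v
reduceKV = M.fromListWith (+)

-- ksplit(h): partition the key-value bag into n sub-bags by hash class.
ksplitB :: Int -> (k -> Int) -> [(k, v)] -> [[(k, v)]]
ksplitB n h kvs = [ [ kv | kv@(k, _) <- kvs, h k == i ] | i <- [1 .. n] ]

-- mmerge(h): the hash classes K_i^h are disjoint, so plain union is safe.
mmergeB :: Ord k => [M.Map k v] -> M.Map k v
mmergeB = M.unions

-- Claimed equality, checkable on any sample input kvs:
--   reduceKV kvs == mmergeB (map reduceKV (ksplitB n h kvs))
```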
The example of this section is simple but illustrates two key points: (1) our
data types for streams (monoids) capture important invariants about the streams
that enable transformations, and (2) useful program transformations can be
established with denotational arguments that require an appropriate notion of
transduction. This approach opens up the possibility of formally verifying the
wealth of optimizing transformations that are used in stream processing systems.
The papers [54, 101] describe several of them, but use informal arguments that
rely on the operational intuition about streaming computations. Our approach
here, on the other hand, relies on rigorous denotational arguments.
The equational axiomatizations of arrows [56] and traced monoidal categories
[58] are relevant to our setting, but would require adaptation. An interesting
question is whether a complete axiomatization can be provided for the basic
dataflow combinators of Sect. 5, similarly to how Kleene Algebra (KA) [62, 63]
and its extensions [49,64,79,83] (as well as other program logics [65,66,78,80–82])
capture properties of imperative programs at the propositional level. We also
leave for future work the development of the coalgebraic approach [96–98] for
reasoning about the equivalence of stream transducers. We have already defined
a notion of bisimulation in Sect. 4, which could give an alternative approach for
proving equivalence using coinduction on the transducers.

7 Related Work

Sect. 1 contains several pointers to related literature for stream processing. In
this section, we will focus on prior work that specifically addresses aspects of
formal semantics for streaming computation.
The seminal work of Gilles Kahn [59] is exemplary in its rigorous treatment
of denotational semantics for a language of deterministic dataflow graphs
of independent processes, which access their input channels using blocking read
statements and the output channels using nonblocking write statements. The lan-
guage Lustre [31] is a synchronous restriction of Kahn’s model, which introduces
the semantic idea of a clock for specifying the rate of a stream. Other notable
synchronous formalisms are the language Signal [21, 72] and Esterel [22, 28], and
the synchronous dataflow graphs of [73] and [24]. These formalisms are all de-
terministic, in the sense that the output is determined purely by the input data.
Nondeterminism creates unavoidable semantic complications [30].
The CQL language [16] is a streaming extension of a relational database
language with additional constructs for time-based windowing. The denotational
semantics of CQL [17] can be reconstructed and greatly simplified within our
framework using the notion of stream described in Example 7 (finite time-varying
multisets). There are several works that deal with the semantics of specific lan-
guage constructs (e.g., windows), notions of time, punctuations and disordered
streams, but do not give a mathematical description of the overall streaming
computation [5, 7, 25, 44, 67, 75, 76, 109].
The literature on Functional Reactive Programming (FRP) [34, 46, 47, 55,
68, 69, 93, 103] is closely related to the deterministic dataflow formalisms men-
tioned earlier. The main abstractions in FRP are signals and event sequences,
which are linearly ordered data. Processing unordered data (e.g., multisets and
maps) and extracting data parallelism (e.g., the per-key aggregation of Sect. 6)
require a data model that goes beyond linear orders. In particular, the axioms
of arrows [56] (often used in FRP) cannot prove the soundness of the optimizing
transformation of Sect. 6, which requires reasoning about multisets.

The idea of using types to classify streams has been recently explored in [85]
(see also [13]), but only for a restricted class of types that correspond to partial
orders. No general abstract model of computation is presented in [85], and many
of the examples in this paper cannot be adequately accommodated.
The mathematical framework of coalgebras [97] has been used to describe
streams [98]. One advantage of this approach is that proofs of equivalence can
be given using the proof principle of coinduction [96], which in many cases offers
a useful alternative to proofs by induction. This line of work mostly focuses on
infinite sequences of elements, whereas here we focus on the transformation of
streams of data that can be of various different forms (not just sequences).
The idea to model the input/output of automata using monoids has appeared
in the algebraic theory of automata and transducers. Monoids (non-free, e.g.
A∗ × B ∗ ) have been used to generalize automata from recognizers of languages
to recognizers of relations [45], which are sometimes called rational transduc-
ers [100]. Our focus here is on (deterministic) functions, as models that recog-
nize relations can give rise to the Brock-Ackerman anomaly [30]. The automata
models (with inputs from a free monoid A∗ ) most closely related to our stream
transducers are deterministic: Mealy machines [87], Moore machines [90], se-
quential transducers [48, 95], and sub-sequential transducers [102]. The concept
of coherence that we introduce here (Definition 20) does not arise in these mod-
els, because they do not operate on input batches. An algebraic generalization
of a deterministic acceptor is provided by a right monoid action δ : St × A → St
(see page 231 of [100]), which satisfies the following properties for all s ∈ St and
x, y ∈ A: (1) δ(s, 1) = s, and (2) δ(δ(s, x), y) = δ(s, xy). These properties look
similar to (N1) and (N2) of Definition 20. They are, however, too restrictive for
our stream transducers, as they would falsify Theorem 23.

8 Conclusion

We have presented a typed semantic framework for stream processing, based
on the idea of abstracting data streams as elements of algebraic structures
called monoids. Data streams are thus classified using monoids as types. Stream
transformations are modeled as monotone functions, which are organized by in-
put/output type. We have adapted the classical model of string transducers to
our setting, and we have developed a general theory of streaming computation
with a formal denotational semantics. The entire technical development in this
paper is constructive, and therefore lends itself well to formalization in a proof
assistant such as Coq [23,35,106]. Our framework can be used for the formaliza-
tion of streaming models, and the validation of subtle optimizations of stream-
ing programs (e.g., Sect. 6), such as the ones described in [54, 101]. We have
restricted our attention in this paper to deterministic streaming computation,
in the sense that the behaviors that we model have predictable and reproducible
results. Nondeterminism causes fundamental semantic difficulties [30], and it is
undesirable in applications where repeatability is important.

References
1. Abadi, D.J., Ahmad, Y., Balazinska, M., Cetintemel, U., Cherniack, M., Hwang,
J.H., Lindner, W., Maskey, A., Rasin, A., Ryvkina, E., Tatbul, N., Xing, Y.,
Zdonik, S.: The design of the Borealis stream processing engine. In: Proceedings
of the 2nd Biennial Conference on Innovative Data Systems Research (CIDR ’05).
pp. 277–289 (2005), http://cidrdb.org/cidr2005/papers/P23.pdf
2. Abadi, D.J., Carney, D., Cetintemel, U., Cherniack, M., Convey, C., Lee, S.,
Stonebraker, M., Tatbul, N., Zdonik, S.: Aurora: A new model and architec-
ture for data stream management. The VLDB Journal 12(2), 120–139 (2003).
https://doi.org/10.1007/s00778-003-0095-z
3. Abbas, H., Alur, R., Mamouras, K., Mangharam, R., Rodionova, A.: Real-time
decision policies with predictable performance. Proceedings of the IEEE, Spe-
cial Issue on Design Automation for Cyber-Physical Systems 106(9), 1593–1615
(2018). https://doi.org/10.1109/JPROC.2018.2853608
4. Abbas, H., Rodionova, A., Mamouras, K., Bartocci, E., Smolka, S.A., Grosu, R.:
Quantitative regular expressions for arrhythmia detection. IEEE/ACM Trans-
actions on Computational Biology and Bioinformatics 16(5), 1586–1597 (2019).
https://doi.org/10.1109/TCBB.2018.2885274
5. Affetti, L., Tommasini, R., Margara, A., Cugola, G., Della Valle, E.: Defining
the execution semantics of stream processing engines. Journal of Big Data 4(1)
(2017). https://doi.org/10.1186/s40537-017-0072-9
6. Akidau, T., Balikov, A., Bekiroğlu, K., Chernyak, S., Haberman, J., Lax, R.,
McVeety, S., Mills, D., Nordstrom, P., Whittle, S.: MillWheel: Fault-tolerant
stream processing at Internet scale. Proceedings of the VLDB Endowment 6(11),
1033–1044 (2013). https://doi.org/10.14778/2536222.2536229
7. Akidau, T., Bradshaw, R., Chambers, C., Chernyak, S., Fernández-Moctezuma,
R.J., Lax, R., McVeety, S., Mills, D., Perry, F., Schmidt, E., Whittle, S.: The
dataflow model: A practical approach to balancing correctness, latency, and cost in
massive-scale, unbounded, out-of-order data processing. Proceedings of the VLDB
Endowment 8(12), 1792–1803 (2015). https://doi.org/10.14778/2824032.2824076
8. Alur, R., Černý, P.: Streaming transducers for algorithmic verification of
single-pass list-processing programs. In: Proceedings of the 38th Annual
ACM SIGPLAN-SIGACT Symposium on Principles of Programming Lan-
guages. pp. 599–610. POPL ’11, ACM, New York, NY, USA (2011).
https://doi.org/10.1145/1926385.1926454
9. Alur, R., Fisman, D., Mamouras, K., Raghothaman, M., Stanford, C.: Stream-
able regular transductions. Theoretical Computer Science 807, 15–41 (2020).
https://doi.org/10.1016/j.tcs.2019.11.018
10. Alur, R., Mamouras, K.: An introduction to the StreamQRE language. Depend-
able Software Systems Engineering 50, 1–24 (2017). https://doi.org/10.3233/978-
1-61499-810-5-1
11. Alur, R., Mamouras, K., Stanford, C.: Automata-based stream processing. In:
Chatzigiannakis, I., Indyk, P., Kuhn, F., Muscholl, A. (eds.) Proceedings of
the 44th International Colloquium on Automata, Languages, and Programming
(ICALP ’17). Leibniz International Proceedings in Informatics (LIPIcs), vol. 80,
pp. 112:1–112:15. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl,
Germany (2017). https://doi.org/10.4230/LIPIcs.ICALP.2017.112
12. Alur, R., Mamouras, K., Stanford, C.: Modular quantitative monitoring. Pro-
ceedings of the ACM on Programming Languages 3(POPL), 50:1–50:31 (2019).
https://doi.org/10.1145/3290363

13. Alur, R., Mamouras, K., Stanford, C., Tannen, V.: Interfaces for stream process-
ing systems. In: Lohstroh, M., Derler, P., Sirjani, M. (eds.) Principles of Modeling:
Essays Dedicated to Edward A. Lee on the Occasion of His 60th Birthday, Lec-
ture Notes in Computer Science, vol. 10760, pp. 38–60. Springer, Cham (2018).
https://doi.org/10.1007/978-3-319-95246-8_3
14. Alur, R., Mamouras, K., Ulus, D.: Derivatives of quantitative regular expressions.
In: Aceto, L., Bacci, G., Bacci, G., Ingólfsdóttir, A., Legay, A., Mardare, R.
(eds.) Models, Algorithms, Logics and Tools: Essays Dedicated to Kim Guldstrand
Larsen on the Occasion of His 60th Birthday, Lecture Notes in Computer Science,
vol. 10460, pp. 75–95. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-
63121-9_4
15. Arasu, A., Babcock, B., Babu, S., Cieslewicz, J., Datar, M., Ito, K., Motwani,
R., Srivastava, U., Widom, J.: STREAM: The Stanford data stream management
system. Tech. Rep. 2004-20, Stanford InfoLab (2004), http://ilpubs.stanford.edu:
8090/641/
16. Arasu, A., Babu, S., Widom, J.: The CQL continuous query language: Seman-
tic foundations and query execution. The VLDB Journal 15(2), 121–142 (2006).
https://doi.org/10.1007/s00778-004-0147-z
17. Arasu, A., Widom, J.: A denotational semantics for continuous queries
over streams and relations. SIGMOD Record 33(3), 6–11 (2004).
https://doi.org/10.1145/1031570.1031572
18. Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models
and issues in data stream systems. In: Proceedings of the Twenty-first
ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database
Systems. pp. 1–16. PODS ’02, ACM, New York, NY, USA (2002).
https://doi.org/10.1145/543613.543615
19. Bai, Y., Thakkar, H., Wang, H., Luo, C., Zaniolo, C.: A data stream
language and system designed for power and extensibility. In: Proceedings
of the 15th ACM International Conference on Information and Knowledge
Management. pp. 337–346. CIKM ’06, ACM, New York, NY, USA (2006).
https://doi.org/10.1145/1183614.1183664
20. Benveniste, A., Caspi, P., Edwards, S.A., Halbwachs, N., Guernic, P.L., de Si-
mone, R.: The synchronous languages 12 years later. Proceedings of the IEEE
91(1), 64–83 (2003). https://doi.org/10.1109/JPROC.2002.805826
21. Benveniste, A., Guernic, P.L., Jacquemot, C.: Synchronous programming with
events and relations: The SIGNAL language and its semantics. Science of
Computer Programming 16(2), 103–149 (1991). https://doi.org/10.1016/0167-
6423(91)90001-E
22. Berry, G., Gonthier, G.: The Esterel synchronous programming language: De-
sign, semantics, implementation. Science of Computer Programming 19(2), 87–
152 (1992). https://doi.org/10.1016/0167-6423(92)90005-V
23. Bertot, Y., Castéran, P.: Interactive Theorem Proving and Program Development.
Springer (2013). https://doi.org/10.1007/978-3-662-07964-5
24. Bilsen, G., Engels, M., Lauwereins, R., Peperstraete, J.: Cyclo-static
dataflow. IEEE Transactions on Signal Processing 44(2), 397–408 (1996).
https://doi.org/10.1109/78.485935
25. Botan, I., Derakhshan, R., Dindar, N., Haas, L., Miller, R.J., Tatbul, N.: SE-
CRET: A model for analysis of the execution semantics of stream process-
ing systems. Proceedings of the VLDB Endowment 3(1-2), 232–243 (2010).
https://doi.org/10.14778/1920841.1920874

26. Bouillet, E., Kothari, R., Kumar, V., Mignet, L., Nathan, S., Ranganathan, A.,
Turaga, D.S., Udrea, O., Verscheure, O.: Processing 6 billion CDRs/day: From
research to production (experience report). In: Proceedings of the 6th ACM In-
ternational Conference on Distributed Event-Based Systems. pp. 264–267. DEBS
’12, ACM, New York, NY, USA (2012). https://doi.org/10.1145/2335484.2335513
27. Bourke, T., Pouzet, M.: Zélus: A synchronous language with ODEs. In: Pro-
ceedings of the 16th International Conference on Hybrid Systems: Computa-
tion and Control. pp. 113–118. HSCC ’13, ACM, New York, NY, USA (2013).
https://doi.org/10.1145/2461328.2461348
28. Boussinot, F., de Simone, R.: The ESTEREL language. Proceedings of the IEEE
79(9), 1293–1304 (1991). https://doi.org/10.1109/5.97299
29. Brenna, L., Demers, A., Gehrke, J., Hong, M., Ossher, J., Panda, B., Riedewald,
M., Thatte, M., White, W.: Cayuga: A high-performance event processing engine.
In: Proceedings of the 2007 ACM SIGMOD International Conference on Manage-
ment of Data. pp. 1100–1102. SIGMOD ’07, ACM, New York, NY, USA (2007).
https://doi.org/10.1145/1247480.1247620
30. Brock, J.D., Ackerman, W.B.: Scenarios: A model of non-determinate computa-
tion. In: Dı́az, J., Ramos, I. (eds.) Proceedings of the International Colloquium
on the Formalization of Programming Concepts (ICFPC ’81). Lecture Notes in
Computer Science, vol. 107, pp. 252–259. Springer, Berlin, Heidelberg (1981).
https://doi.org/10.1007/3-540-10699-5_102
31. Caspi, P., Pilaud, D., Halbwachs, N., Plaice, J.A.: LUSTRE: A declar-
ative language for real-time programming. In: Proceedings of the 14th
ACM SIGACT-SIGPLAN Symposium on Principles of Programming Lan-
guages. pp. 178–188. POPL ’87, ACM, New York, NY, USA (1987).
https://doi.org/10.1145/41625.41641
32. Chandrasekaran, S., Cooper, O., Deshpande, A., Franklin, M.J., Hellerstein, J.M.,
Hong, W., Krishnamurthy, S., Madden, S., Raman, V., Reiss, F., Shah, M.: Tele-
graphCQ: Continuous dataflow processing for an uncertain world. In: Proceedings
of the First Biennial Conference on Innovative Data Systems Research (CIDR ’03)
(2003), http://cidrdb.org/cidr2003/program/p24.pdf
33. Chen, C.M., Agrawal, H., Cochinwala, M., Rosenbluth, D.: Stream query pro-
cessing for healthcare bio-sensor applications. In: Proceedings of the 20th Inter-
national Conference on Data Engineering. pp. 791–794. ICDE ’04, IEEE (2004).
https://doi.org/10.1109/ICDE.2004.1320048
34. Cooper, G.H., Krishnamurthi, S.: Embedding dynamic dataflow in a call-by-value
language. In: Sestoft, P. (ed.) Proceedings of the 15th European Symposium on
Programming (ESOP ’06). Lecture Notes in Computer Science, vol. 3924, pp. 294–
308. Springer, Berlin, Heidelberg (2006). https://doi.org/10.1007/11693024_20
35. Coquand, T., Huet, G.: The calculus of constructions. Information and Compu-
tation 76(2), 95–120 (1988). https://doi.org/10.1016/0890-5401(88)90005-3
36. Courtney, A.: Frappé: Functional reactive programming in Java. In: Ra-
makrishnan, I.V. (ed.) Proceedings of the 3rd International Symposium on
Practical Aspects of Declarative Languages (PADL ’01). Lecture Notes in
Computer Science, vol. 1990, pp. 29–44. Springer, Berlin, Heidelberg (2001).
https://doi.org/10.1007/3-540-45241-9_3
37. Cranor, C., Johnson, T., Spataschek, O., Shkapenyuk, V.: Gigascope: A stream
database for network applications. In: Proceedings of the 2003 ACM SIGMOD
International Conference on Management of Data. pp. 647–651. SIGMOD ’03,
ACM, New York, NY, USA (2003). https://doi.org/10.1145/872757.872838

38. Czaplicki, E., Chong, S.: Asynchronous functional reactive programming for GUIs.
In: Proceedings of the 34th ACM SIGPLAN Conference on Programming Lan-
guage Design and Implementation. pp. 411–422. PLDI ’13, ACM, New York, NY,
USA (2013). https://doi.org/10.1145/2491956.2462161
39. D’Angelo, B., Sankaranarayanan, S., Sanchez, C., Robinson, W., Finkbeiner,
B., Sipma, H.B., Mehrotra, S., Manna, Z.: LOLA: Runtime monitoring of syn-
chronous systems. In: Proceedings of the 12th International Symposium on Tem-
poral Representation and Reasoning (TIME ’05). pp. 166–174. IEEE (2005).
https://doi.org/10.1109/TIME.2005.26
40. Demers, A., Gehrke, J., Hong, M., Riedewald, M., White, W.: Towards expres-
sive publish/subscribe systems. In: Ioannidis, Y., Scholl, M.H., Schmidt, J.W.,
Matthes, F., Hatzopoulos, M., Boehm, K., Kemper, A., Grust, T., Boehm, C.
(eds.) Proceedings of the 10th International Conference on Extending Database
Technology (EDBT ’06). Lecture Notes in Computer Science, vol. 3896, pp. 627–
644. Springer, Berlin, Heidelberg (2006). https://doi.org/10.1007/11687238_38
41. Demers, A., Gehrke, J., Panda, B., Riedewald, M., Sharma, V., White, W.:
Cayuga: A general purpose event monitoring system. In: Proceedings of the 3rd
Biennial Conference on Innovative Data Systems Research (CIDR ’07). pp. 412–
422 (2007), http://cidrdb.org/cidr2007/papers/cidr07p47.pdf
42. Dennis, J.B.: First version of a data flow procedure language. In: Robinet, B.
(ed.) Programming Symposium. Lecture Notes in Computer Science, vol. 19,
pp. 362–376. Springer, Berlin, Heidelberg (1974). https://doi.org/10.1007/3-540-
06859-7_145
43. Deshmukh, J.V., Donzé, A., Ghosh, S., Jin, X., Juniwal, G., Seshia, S.A.: Robust
online monitoring of signal temporal logic. Formal Methods in System Design
51(1), 5–30 (2017). https://doi.org/10.1007/s10703-017-0286-7
44. Dindar, N., Tatbul, N., Miller, R.J., Haas, L.M., Botan, I.: Modeling the execution
semantics of stream processing engines with SECRET. The VLDB Journal 22(4),
421–446 (2013). https://doi.org/10.1007/s00778-012-0297-3
45. Elgot, C.C., Mezei, J.E.: On relations defined by generalized finite au-
tomata. IBM Journal of Research and Development 9(1), 47–68 (1965).
https://doi.org/10.1147/rd.91.0047
46. Elliott, C., Hudak, P.: Functional reactive animation. In: Proceedings of
the Second ACM SIGPLAN International Conference on Functional Pro-
gramming. pp. 263–273. ICFP ’97, ACM, New York, NY, USA (1997).
https://doi.org/10.1145/258948.258973
47. Elliott, C.M.: Push-pull functional reactive programming. In: Proceedings of the
2nd ACM SIGPLAN Symposium on Haskell. pp. 25–36. Haskell ’09, ACM, New
York, NY, USA (2009). https://doi.org/10.1145/1596638.1596643
48. Ginsburg, S., Rose, G.F.: A characterization of machine mappings. Canadian
Journal of Mathematics 18, 381–388 (1966). https://doi.org/10.4153/CJM-
1966-040-3
49. Grathwohl, N.B.B., Kozen, D., Mamouras, K.: KAT + B! In: Proceedings of
the Joint Meeting of the Twenty-Third EACSL Annual Conference on Computer
Science Logic (CSL) and the Twenty-Ninth Annual ACM/IEEE Symposium on
Logic in Computer Science (LICS). pp. 44:1–44:10. CSL-LICS ’14, ACM, New
York, NY, USA (2014). https://doi.org/10.1145/2603088.2603095
50. Gyllstrom, D., Wu, E., Chae, H.J., Diao, Y., Stahlberg, P., Anderson, G.: SASE:
Complex event processing over streams. In: Proceedings of the 3rd Biennial Con-
ference on Innovative Data Systems Research (CIDR ’07). pp. 407–411 (2007),
http://cidrdb.org/cidr2007/papers/cidr07p46.pdf

51. Halbwachs, N., Caspi, P., Raymond, P., Pilaud, D.: The synchronous data flow
programming language LUSTRE. Proceedings of the IEEE 79(9), 1305–1320
(1991). https://doi.org/10.1109/5.97300
52. Havelund, K., Roşu, G.: Efficient monitoring of safety properties. Interna-
tional Journal on Software Tools for Technology Transfer 6(2), 158–173 (2004).
https://doi.org/10.1007/s10009-003-0117-6
53. Hirzel, M.: Partition and compose: Parallel complex event processing. In:
Proceedings of the 6th ACM International Conference on Distributed Event-
Based Systems. pp. 191–200. DEBS ’12, ACM, New York, NY, USA (2012).
https://doi.org/10.1145/2335484.2335506
54. Hirzel, M., Soulé, R., Schneider, S., Gedik, B., Grimm, R.: A catalog of stream
processing optimizations. ACM Computing Surveys (CSUR) 46(4), 46:1–46:34
(2014). https://doi.org/10.1145/2528412
55. Hudak, P., Courtney, A., Nilsson, H., Peterson, J.: Arrows, robots, and functional
reactive programming. In: Jeuring, J., Jones, S.L.P. (eds.) Revised Lectures of
the 4th International School on Advanced Functional Programming: AFP 2002,
Oxford, UK, August 19-24, 2002., Lecture Notes in Computer Science, vol. 2638,
pp. 159–187. Springer, Berlin, Heidelberg (2003). https://doi.org/10.1007/978-3-
540-44833-4_6
56. Hughes, J.: Generalising monads to arrows. Science of Computer Programming
37(1), 67–111 (2000). https://doi.org/10.1016/S0167-6423(99)00023-4
57. Jain, N., Mishra, S., Srinivasan, A., Gehrke, J., Widom, J., Balakrishnan, H.,
Çetintemel, U., Cherniack, M., Tibbetts, R., Zdonik, S.: Towards a streaming
SQL standard. Proceedings of the VLDB Endowment 1(2), 1379–1390 (2008).
https://doi.org/10.14778/1454159.1454179
58. Joyal, A., Street, R., Verity, D.: Traced monoidal categories. Mathematical
Proceedings of the Cambridge Philosophical Society 119(3), 447–468 (1996).
https://doi.org/10.1017/S0305004100074338
59. Kahn, G.: The semantics of a simple language for parallel programming. Infor-
mation Processing 74, 471–475 (1974)
60. Kahn, G., MacQueen, D.B.: Coroutines and networks of parallel processes. Infor-
mation Processing 77, 993–998 (1977)
61. Karp, R.M., Miller, R.E.: Properties of a model for parallel computations: De-
terminacy, termination, queueing. SIAM Journal on Applied Mathematics 14(6),
1390–1411 (1966). https://doi.org/10.1137/0114108
62. Kozen, D.: A completeness theorem for Kleene algebras and the algebra
of regular events. Information and Computation 110(2), 366–390 (1994).
https://doi.org/10.1006/inco.1994.1037
63. Kozen, D.: Kleene algebra with tests. ACM Transactions on Pro-
gramming Languages and Systems (TOPLAS) 19(3), 427–443 (1997).
https://doi.org/10.1145/256167.256195
64. Kozen, D., Mamouras, K.: Kleene algebra with equations. In: Esparza, J., Fraigni-
aud, P., Husfeldt, T., Koutsoupias, E. (eds.) Proceedings of the 41st International
Colloquium on Automata, Languages and Programming (ICALP ’14). Lecture
Notes in Computer Science, vol. 8573, pp. 280–292. Springer, Berlin, Heidelberg
(2014). https://doi.org/10.1007/978-3-662-43951-7_24
65. Kozen, D., Parikh, R.: An elementary proof of the completeness of PDL. The-
oretical Computer Science 14(1), 113–118 (1981). https://doi.org/10.1016/0304-
3975(81)90019-0

66. Kozen, D., Tiuryn, J.: On the completeness of propositional Hoare logic. In-
formation Sciences 139(3—4), 187–195 (2001). https://doi.org/10.1016/S0020-
0255(01)00164-5
67. Krämer, J., Seeger, B.: Semantics and implementation of continuous sliding win-
dow queries over data streams. ACM Transactions on Database Systems (TODS)
34(1), 4:1–4:49 (2009). https://doi.org/10.1145/1508857.1508861
68. Krishnaswami, N.R.: Higher-order functional reactive programming without
spacetime leaks. In: Proceedings of the 18th ACM SIGPLAN International Con-
ference on Functional Programming. pp. 221–232. ICFP ’13, ACM, New York,
NY, USA (2013). https://doi.org/10.1145/2500365.2500588
69. Krishnaswami, N.R., Benton, N.: Ultrametric semantics of reactive programs. In:
Proceedings of the 26th Annual IEEE Symposium on Logic in Computer Science
(LICS ’11). pp. 257–266. IEEE (2011). https://doi.org/10.1109/LICS.2011.38
70. Kulkarni, S., Bhagat, N., Fu, M., Kedigehalli, V., Kellogg, C., Mittal, S., Patel,
J.M., Ramasamy, K., Taneja, S.: Twitter Heron: Stream processing at scale. In:
Proceedings of the 2015 ACM SIGMOD International Conference on Manage-
ment of Data. pp. 239–250. SIGMOD ’15, ACM, New York, NY, USA (2015).
https://doi.org/10.1145/2723372.2742788
71. Law, Y.N., Wang, H., Zaniolo, C.: Relational languages and data
models for continuous queries on sequences and data streams. ACM
Transactions on Database Systems (TODS) 36(2), 8:1–8:32 (2011).
https://doi.org/10.1145/1966385.1966386
72. Le Guernic, P., Benveniste, A., Bournai, P., Gautier, T.: SIGNAL–
a data flow-oriented language for signal processing. IEEE Transactions
on Acoustics, Speech, and Signal Processing 34(2), 362–374 (1986).
https://doi.org/10.1109/TASSP.1986.1164809
73. Lee, E.A., Messerschmitt, D.G.: Synchronous data flow. Proceedings of the IEEE
75(9), 1235–1245 (1987). https://doi.org/10.1109/PROC.1987.13876
74. Leucker, M., Schallhart, C.: A brief account of runtime verification.
The Journal of Logic and Algebraic Programming 78(5), 293–303 (2009).
https://doi.org/10.1016/j.jlap.2008.08.004
75. Li, J., Maier, D., Tufte, K., Papadimos, V., Tucker, P.A.: Semantics and
evaluation techniques for window aggregates in data streams. In: Proceed-
ings of the 2005 ACM SIGMOD International Conference on Management
of Data. pp. 311–322. SIGMOD ’05, ACM, New York, NY, USA (2005).
https://doi.org/10.1145/1066157.1066193
76. Maier, D., Li, J., Tucker, P., Tufte, K., Papadimos, V.: Semantics of data
streams and operators. In: Eiter, T., Libkin, L. (eds.) Proceedings of the 10th
International Conference on Database Theory (ICDT ’05). Lecture Notes in
Computer Science, vol. 3363, pp. 37–52. Springer, Berlin, Heidelberg (2005).
https://doi.org/10.1007/978-3-540-30570-5_3
77. Maier, I., Odersky, M.: Higher-order reactive programming with incremen-
tal lists. In: Castagna, G. (ed.) Proceedings of the 27th European Confer-
ence on Object-Oriented Programming (ECOOP ’13). Lecture Notes in Com-
puter Science, vol. 7920, pp. 707–731. Springer, Berlin, Heidelberg (2013).
https://doi.org/10.1007/978-3-642-39038-8_29
78. Mamouras, K.: On the Hoare theory of monadic recursion schemes. In: Proceed-
ings of the Joint Meeting of the 23rd EACSL Annual Conference on Computer
Science Logic (CSL) and the 29th Annual ACM/IEEE Symposium on Logic in
Computer Science (LICS). pp. 69:1–69:10. CSL-LICS ’14, ACM, New York, NY,
USA (2014). https://doi.org/10.1145/2603088.2603157

79. Mamouras, K.: Extensions of Kleene Algebra for Program Verification. Ph.D. the-
sis, Cornell University, Ithaca, NY (August 2015), http://hdl.handle.net/1813/
40960
80. Mamouras, K.: Synthesis of strategies and the Hoare logic of angelic nondeter-
minism. In: Pitts, A. (ed.) Proceedings of the 18th International Conference on
Foundations of Software Science and Computation Structures (FoSSaCS ’15). Lec-
ture Notes in Computer Science, vol. 9034, pp. 25–40. Springer, Berlin, Heidelberg
(2015). https://doi.org/10.1007/978-3-662-46678-0_2
81. Mamouras, K.: The Hoare logic of deterministic and nondeterministic monadic
recursion schemes. ACM Transactions on Computational Logic (TOCL) 17(2),
13:1–13:30 (2016). https://doi.org/10.1145/2835491
82. Mamouras, K.: Synthesis of strategies using the Hoare logic of angelic and de-
monic nondeterminism. Logical Methods in Computer Science 12(3) (2016).
https://doi.org/10.2168/LMCS-12(3:6)2016
83. Mamouras, K.: Equational theories of abnormal termination based on Kleene al-
gebra. In: Esparza, J., Murawski, A.S. (eds.) Proceedings of the 20th International
Conference on Foundations of Software Science and Computation Structures (FoS-
SaCS ’17). Lecture Notes in Computer Science, vol. 10203, pp. 88–105. Springer,
Berlin, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54458-7_6
84. Mamouras, K., Raghothaman, M., Alur, R., Ives, Z.G., Khanna, S.: StreamQRE:
Modular specification and efficient evaluation of quantitative queries over stream-
ing data. In: Proceedings of the 38th ACM SIGPLAN Conference on Program-
ming Language Design and Implementation. pp. 693–708. PLDI ’17, ACM, New
York, NY, USA (2017). https://doi.org/10.1145/3062341.3062369
85. Mamouras, K., Stanford, C., Alur, R., Ives, Z.G., Tannen, V.: Data-trace
types for distributed stream processing systems. In: Proceedings of the 40th
ACM SIGPLAN Conference on Programming Language Design and Imple-
mentation. pp. 670–685. PLDI ’19, ACM, New York, NY, USA (2019).
https://doi.org/10.1145/3314221.3314580
86. McSherry, F., Murray, D.G., Isaacs, R., Isard, M.: Differential dataflow. In: Pro-
ceedings of the 6th Biennial Conference on Innovative Data Systems Research
(CIDR ’13) (2013), http://cidrdb.org/cidr2013/Papers/CIDR13_Paper111.pdf
87. Mealy, G.H.: A method for synthesizing sequential circuits. The Bell Sys-
tem Technical Journal 34(5), 1045–1079 (1955). https://doi.org/10.1002/j.1538-
7305.1955.tb03788.x
88. Mei, Y., Madden, S.: ZStream: A cost-based query processor for adaptively detect-
ing composite events. In: Proceedings of the 2009 ACM SIGMOD International
Conference on Management of Data. pp. 193–206. SIGMOD ’09, ACM, New York,
NY, USA (2009). https://doi.org/10.1145/1559845.1559867
89. Meyerovich, L.A., Guha, A., Baskin, J., Cooper, G.H., Greenberg, M., Bromfield,
A., Krishnamurthi, S.: Flapjax: A programming language for Ajax applications.
In: Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Pro-
gramming Systems Languages and Applications. pp. 1–20. OOPSLA ’09, ACM,
New York, NY, USA (2009). https://doi.org/10.1145/1640089.1640091
90. Moore, E.F.: Gedanken-Experiments on Sequential Machines, Annals of Mathe-
matics Studies, vol. 34, pp. 129–153. Princeton University Press (1956)
91. Motwani, R., Widom, J., Arasu, A., Babcock, B., Babu, S., Datar, M., Manku,
G.S., Olston, C., Rosenstein, J., Varma, R.: Query processing, approximation,
and resource management in a data stream management system. In: Proceedings
of the First Biennial Conference on Innovative Data Systems Research (CIDR
’03) (2003), http://cidrdb.org/cidr2003/program/p22.pdf

92. Murray, D.G., McSherry, F., Isaacs, R., Isard, M., Barham, P., Abadi, M.: Naiad:
A timely dataflow system. In: Proceedings of the Twenty-Fourth ACM Sympo-
sium on Operating Systems Principles. pp. 439–455. SOSP ’13, ACM, New York,
NY, USA (2013). https://doi.org/10.1145/2517349.2522738
93. Nilsson, H., Courtney, A., Peterson, J.: Functional reactive programming,
continued. In: Proceedings of the 2002 ACM SIGPLAN Workshop on
Haskell. pp. 51–64. Haskell ’02, ACM, New York, NY, USA (2002).
https://doi.org/10.1145/581690.581695
94. Noghabi, S.A., Paramasivam, K., Pan, Y., Ramesh, N., Bringhurst, J.,
Gupta, I., Campbell, R.H.: Samza: Stateful scalable stream processing at
LinkedIn. Proceedings of the VLDB Endowment 10(12), 1634–1645 (2017).
https://doi.org/10.14778/3137765.3137770
95. Raney, G.N.: Sequential functions. Journal of the ACM 5(2), 177–180 (1958).
https://doi.org/10.1145/320924.320930
96. Rutten, J.J.M.M.: Automata and coinduction (an exercise in coalgebra). In:
Sangiorgi, D., de Simone, R. (eds.) Proceedings of the 9th International
Conference on Concurrency Theory (CONCUR ’98). Lecture Notes in Com-
puter Science, vol. 1466, pp. 194–218. Springer, Berlin, Heidelberg (1998).
https://doi.org/10.1007/BFb0055624
97. Rutten, J.J.M.M.: Universal coalgebra: A theory of systems. Theoreti-
cal Computer Science 249(1), 3–80 (2000). https://doi.org/10.1016/S0304-
3975(00)00056-6
98. Rutten, J.J.M.M.: A coinductive calculus of streams. Mathe-
matical Structures in Computer Science 15(1), 93–147 (2005).
https://doi.org/10.1017/S0960129504004517
99. Sadri, R., Zaniolo, C., Zarkesh, A., Adibi, J.: Expressing and optimizing sequence
queries in database systems. ACM Transactions on Database Systems 29(2), 282–
318 (2004). https://doi.org/10.1145/1005566.1005568
100. Sakarovitch, J.: Elements of Automata Theory. Cambridge University Press
(2009)
101. Schneider, S., Hirzel, M., Gedik, B., Wu, K.L.: Safe data parallelism for
general streaming. IEEE Transactions on Computers 64(2), 504–517 (2015).
https://doi.org/10.1109/TC.2013.221
102. Schützenberger, M.P.: Sur une variante des fonctions séquentielles. Theo-
retical Computer Science 4(1), 47–57 (1977). https://doi.org/10.1016/0304-
3975(77)90055-X
103. Sculthorpe, N., Nilsson, H.: Safe functional reactive programming through depen-
dent types. In: Proceedings of the 14th ACM SIGPLAN International Conference
on Functional Programming. pp. 23–34. ICFP ’09, ACM, New York, NY, USA
(2009). https://doi.org/10.1145/1596550.1596558
104. Shivers, O., Might, M.: Continuations and transducer composition. In: Proceed-
ings of the 27th ACM SIGPLAN Conference on Programming Language Design
and Implementation. pp. 295–307. PLDI ’06, ACM, New York, NY, USA (2006).
https://doi.org/10.1145/1133981.1134016
105. Thati, P., Roşu, G.: Monitoring algorithms for metric temporal logic specifica-
tions. Electronic Notes in Theoretical Computer Science 113, 145–162 (2005).
https://doi.org/10.1016/j.entcs.2004.01.029
106. The Coq development team: The Coq proof assistant. https://coq.inria.fr (2020),
[Online; accessed February 22, 2020]

107. Thies, W., Karczmarek, M., Amarasinghe, S.: StreamIt: A language for stream-
ing applications. In: Horspool, R.N. (ed.) Proceedings of the 11th Interna-
tional Conference on Compiler Construction (CC ’02). Lecture Notes in Com-
puter Science, vol. 2304, pp. 179–196. Springer, Berlin, Heidelberg (2002).
https://doi.org/10.1007/3-540-45937-5_14
108. Toshniwal, A., Taneja, S., Shukla, A., Ramasamy, K., Patel, J.M., Kulkarni, S.,
Jackson, J., Gade, K., Fu, M., Donham, J., Bhagat, N., Mittal, S., Ryaboy, D.:
Storm @ Twitter. In: Proceedings of the 2014 ACM SIGMOD International Con-
ference on Management of Data. pp. 147–156. SIGMOD ’14, ACM, New York,
NY, USA (2014). https://doi.org/10.1145/2588555.2595641
109. Tucker, P.A., Maier, D., Sheard, T., Fegaras, L.: Exploiting punctuation semantics
in continuous data streams. IEEE Transactions on Knowledge and Data Engineer-
ing 15(3), 555–568 (2003). https://doi.org/10.1109/TKDE.2003.1198390
110. Veanes, M., Hooimeijer, P., Livshits, B., Molnar, D., Bjorner, N.: Symbolic
finite state transducers: Algorithms and applications. In: Proceedings of the
39th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Program-
ming Languages. pp. 137–150. POPL ’12, ACM, New York, NY, USA (2012).
https://doi.org/10.1145/2103656.2103674
111. Wu, E., Diao, Y., Rizvi, S.: High-performance complex event processing over
streams. In: Proceedings of the 2006 ACM SIGMOD International Conference on
Management of Data. pp. 407–418. SIGMOD ’06, ACM, New York, NY, USA
(2006). https://doi.org/10.1145/1142473.1142520
112. Zaharia, M., Das, T., Li, H., Hunter, T., Shenker, S., Stoica, I.: Dis-
cretized streams: Fault-tolerant streaming computation at scale. In: Pro-
ceedings of the Twenty-Fourth ACM Symposium on Operating Systems
Principles. pp. 423–438. SOSP ’13, ACM, New York, NY, USA (2013).
https://doi.org/10.1145/2517349.2522737
113. Zaharia, M., Xin, R.S., Wendell, P., Das, T., Armbrust, M., Dave, A., Meng, X.,
Rosen, J., Venkataraman, S., Franklin, M.J., Ghodsi, A., Gonzalez, J., Shenker,
S., Stoica, I.: Apache Spark: A unified engine for big data processing. Communi-
cations of the ACM 59(11), 56–65 (2016). https://doi.org/10.1145/2934664

Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will need
to obtain permission directly from the copyright holder.
Connecting Higher-Order Separation Logic to a
First-Order Outside World

William Mansky¹, Wolf Honoré², and Andrew W. Appel³

¹ University of Illinois at Chicago, Chicago, IL, USA
² Yale University, New Haven, CT, USA
³ Princeton University, Princeton, NJ, USA

Abstract. Separation logic is a useful tool for proving the correctness of
programs that manipulate memory, especially when the model of memory
includes higher-order state: Step-indexing, predicates in the heap, and
higher-order ghost state have been used to reason about function point-
ers, data structure invariants, and complex concurrency patterns. On
the other hand, the behavior of system features (e.g., operating systems)
and the external world (e.g., communication between components) is
usually specified using first-order formalisms. In principle, the soundness
theorem of a separation logic is its interface with first-order theorems,
but the soundness theorem may implicitly make assumptions about how
other components are specified, limiting its use. In this paper, we show
how to extend the higher-order separation logic of the Verified Software
Toolchain to interface with a first-order verified operating system, in
this case CertiKOS, that mediates its interaction with the outside world.
The resulting system allows us to prove the correctness of C programs
in separation logic based on the semantics of system calls implemented
in CertiKOS. It also demonstrates that the combination of interaction
trees + CompCert memories serves well as a lingua franca to interface
and compose two quite different styles of program verification.

Keywords: formal verification · verifying communication · modular verification · interaction trees · VST · CertiKOS

1 Introduction

Separation logic allows us to verify programs by stating pre- and postconditions
that describe the memory usage of a program. Modern variants include reasoning
principles for shared-memory concurrency, invariants of locks and shared data
structures, function pointers, rely-guarantee-style reasoning, and various other
interesting features of programming languages. To support these features, the
“memory” that is the subject of their assertions is not just a map from addresses
to values, but something more complex: it may contain “predicates in the heap”
to allow reasoning about invariants attached to dynamically allocated objects
such as semaphores, it may be step-indexed to allow higher-order assertions, and
it may contain various forms of ghost state describing resources that exist only

for the purposes of verification. The soundness proof of the logic then relates
these decorated heaps to the simple address-map view of memory used in the
semantics of the target language.
This works well as long as every piece of the system is verified with re-
spect to decorated heaps, but what if we have multiple verification tools, some
of which provide correctness results in terms of undecorated memory (or, still
worse, memory with a different set of decorations)? To take advantage of the
correctness theorem of a function verified with one of these tools, we will need
to translate our decorated memory into an undecorated one, demonstrate that
it meets the function’s undecorated precondition, and then take the memory
output by the function and use it to reconstruct a decorated memory. In this
paper, we demonstrate a technique to do exactly that, allowing higher-order
separation logics (in this instance, the Verified Software Toolchain) to take ad-
vantage of correctness proofs generated by other tools (in this case, the CertiKOS
verified operating system). This allows us to remove the separation-logic-level
specifications of system calls from our trusted computing base, instead relying
on the operating system’s proofs of its own calls. In particular, we are interested
in functions that do more than just manipulate memory (which is separation
logic’s specialty)—they communicate with the outside world, which may not
know anything about program memory or higher-order state.

int main(void) {
  unsigned int n, d; char c;
  n=0;
  c=getchar();
  while (n<1000) {
    d = ((unsigned)c)-(unsigned)'0';
    if (d>=10) break;
    n+=d;
    print_int(n);
    putchar('\n');
    c=getchar();
  }
  return 0;
}

Fig. 1: A simple communicating program

Consider the program in Figure 1. It repeatedly reads a digit from the console, adds it to the sum of the digits seen so far, and prints the current
sum to the console. Although this is a very simple program, it is not a nat-
ural fit for separation-logic-based verification tools, which model the behavior
of C programs in terms of computation and memory rather than I/O. Sev-
eral approaches have been suggested for reasoning about I/O in separation
logic, for instance by Penninckx et al. [18] and Koh et al. [13]. Using the lat-
ter approach, we might specify the behavior of getchar with the Hoare triple
{ITree(r ← read;; k r)} x = getchar() {ITree(k x)}, relating the function call to

an external read event: the program before the call to getchar must have per-
mission to perform a sequence of operations beginning with a read, and after the
call it has permission to perform the remaining operations (with values that may
depend upon the received value). By adding these specifications as axioms to
VST’s separation logic, we can use standard separation logic techniques to prove
the correctness of programs such as the one above. But when we compile and
run this C program, putchar and getchar are not axiomatized functions; they
are system calls provided by the operating system, which may have an effect
on kernel memory, user memory, and of course the console itself. If we prove
a specification of this C program using the separation logic rules for putchar
and getchar, what does that tell us about the behavior of the program when it
runs? For programs without external calls, we can answer this question with the
soundness proof of the logic. To extend this soundness proof to programs with
external calls, we must relate the pre- and postconditions of the external calls
to both the semantics of C and their implementations in the operating system.
In this paper, we describe a modular approach to proving soundness of a ver-
ification system for communicating programs, including the following elements:
– An extension of VST with support for generic ghost state.
– A generic mechanism for reasoning about external communication in a higher-
order separation logic, built on top of ghost state.
– A technique for relating pre- and postconditions for external functions in
higher-order separation logic to first-order specifications of the same func-
tions in the verified operating system CertiKOS, with a general approach to
“de-step-indexing” a certain class of step-indexed specifications.
– A new notion of correctness of the implementation of external communi-
cation, by relating user-level traces of external behavior to I/O operations
inside the operating system.
The result is the first soundness proof of a separation logic that can be extended
with first-order specifications of system calls. All proofs are formalized in the
Coq proof assistant.
To understand the scope of our results, it is important to clarify exactly
how much of CertiKOS we have brought into our proofs of correctness for C
programs, and how much of a gap remains. The semantics on which we prove
the soundness of our separation logic is the standard CompCert semantics of
C, extended with the specifications of system calls provided by CertiKOS. Our
model does not include the process by which CertiKOS switches from user mode
to kernel mode when executing a system call, but rather assumes that CertiKOS
implements this process so that the user cannot distinguish it from a normal
function call. To prove this assertion rather than assuming it, we would need to
transfer our soundness proof to the whole-system assembly-language semantics
used by CertiKOS, and interface with not just CertiKOS’s system call specifica-
tions but also its top-level correctness theorem. We discuss this last gap further
in Section 7, but in summary, we prove that our client-side programs and OS-side
system calls are correct, while assuming that CertiKOS correctly implements its
transition between user mode and kernel mode.

The rest of the paper proceeds as follows. In Section 2, we describe generic ghost state in separation logic. In Section 3, we show how to encode the state
of the outside world as ghost state that can only be changed through calls to
external functions, allowing us to describe external communication in separation
logic specifications. In Section 4, we use this approach to specify console I/O op-
erations, and demonstrate the verification of a simple communicating program.
In Sections 5 and 6, we describe the process of verifying the implementation of
an external call, by first connecting its VST specification to a first-order speci-
fication on memory and then relating that “dry” specification to the functional
specification of the same call in CertiKOS. This allows us to state our central
theorem, which guarantees that programs verified in VST run correctly given the
CertiKOS system call specifications. In Section 7, we address the relationship
between user-level events and the actual communication performed by the OS.
In Sections 8 and 9, we review related work and summarize our results.

2 Background: Ghost State in Separation Logic


2.1 Ghost Algebras
The fundamental insight behind ghost state is that if a mathematical object
has the same basic properties as a separation logic heap, it can be injected
into separation logic as a resource, even if it is not actually present in program
memory. This insight was discovered independently by many people [4,3,19], and
the “basic properties” required have been characterized in many ways: partial
commutative monoids (PCMs), resource algebras, separation algebras, etc. They
all include the idea that the ghost state must support an operator, often written
as ·, for combining it in the same way heaps are combined by disjoint union,
and they require that operator to have some of the properties of heap union
(associativity, commutativity) but not all (for instance, it may be possible to
combine two identical pieces of ghost state). Crucially, the operator · may be
partial, so that the very existence of one piece of state means that another piece
cannot possibly exist in the same program (just as ownership of one piece of the
heap means that no other thread can hold the same piece). We follow Iris [11]
in also including a validity predicate valid that marks out the elements of an
algebra that represent well-formed ghost state.
Ghost state appears in the logic in a new kind of assertion, which we write
as own, asserting that the current thread owns a certain ghost resource. In the
assertion own g a pp, g is an identifier (analogous to a location in the heap), a is
an element of the underlying algebra, and pp is a predicate, allowing for a limited
form of higher-order ghost state—for instance, we can store separation logic
assertions in ghost state to implement global invariants. The key property of the
own assertion is that separating conjunction on it corresponds to the · operator
of the underlying algebra (see rule own_op in Figure 2). By defining different
algebras with different operators, we can define different sharing protocols for
the ghost state. For instance, if we only want to count the number of times
some shared resource is used, the state may be a number and the operator

              a₁ · a₂ = a₃
  ─────────────────────────────────────────── own_op
  own g a₃ pp ⇔ own g a₁ pp ∗ own g a₂ pp

              fp_update a b
  ─────────────────────────────────────────── own_update
        own g a pp ⇛ own g b pp

  P ⊢ P′    {P′} C {Q′}    Q′ ⊢ Q
  ─────────────────────────────── consequence
           {P} C {Q}

Fig. 2: Key separation logic rules for ghost state

may be addition; if we want to describe the pattern of sharing more precisely, as with ghost variables, the state may be a pair of the variable’s value and a
fraction of ownership, with a guarantee that two fractions are only compatible
if they agree on the value. More complex sharing patterns correspond to more
complicated join operations; for instance, Jung et al. [11] showed that any acyclic
state machine can be encoded as ghost state, with the join operation computing
the closest common successor of two states. The ghost state is not explicitly
referenced by program instructions, but it can be modified at any time via a
frame-preserving update: ghost state a can be replaced with b as long as any
third party’s ghost state c that is consistent with a is also consistent with b,
formally expressed as fp_update a b ≜ ∀c. a · c↓ ⇒ b · c↓, where we write a · b↓ to mean ∃d. a · b = d, i.e., that a and b are compatible pieces of ghost state. This frame-preserving update is embedded into the logic using a view-shift operator ⇛, as shown in rule own_update of Figure 2.
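To make the required structure concrete, the following Coq sketch shows one way to package a ghost algebra together with compatibility and frame-preserving updates; the names (Ghost, join, compatible, fp_update) are illustrative rather than the definitions in our Coq development, and the algebraic laws are elided:

Class Ghost (A : Type) : Type := {
  join : A -> A -> A -> Prop;   (* join a b c holds when a · b = c *)
  valid : A -> Prop
  (* laws such as commutativity and associativity are elided here *)
}.

(* a · b↓: the combination of a and b is defined *)
Definition compatible {A : Type} `{Ghost A} (a b : A) : Prop :=
  exists c, join a b c.

(* frame-preserving update: any frame compatible with a stays compatible with b *)
Definition fp_update {A : Type} `{Ghost A} (a b : A) : Prop :=
  forall c, compatible a c -> compatible b c.

(* the sum ghost algebra of Definition 1 below is one instance *)
Instance sum_ghost : Ghost nat :=
  { join a b c := a + b = c; valid _ := True }.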

x = 0;

acquire(l);      acquire(l);
x++;             x++;
release(l);      release(l);

Fig. 3: The increment example

Figure 3 shows the canonical example of a program where ghost state in-
creases the verification power of separation logic. Using concurrent separation
logic as originally presented by O’Hearn [17], we can prove that the value of x
at the end of the program is at least 0, but we cannot prove that it is exactly 2.
This limitation comes from the fact that we can associate an invariant with the
lock l, but that invariant cannot express progress properties such as a change
in the value of x. We can get around this limitation by adding ghost state that
captures the contribution of each thread to x, and then use the invariant to en-
sure that the value of x is the sum of all contributions. (This approach is due to
Ley-Wild and Nanevski [16].) We begin with ghost state that models the central
operation of the program:

Definition 1. The sum ghost algebra is the algebra (N, +, λn.True) of natural
numbers with addition, in which every number is a valid element.
Intuitively, the lock invariant should remember every addition to x, while each
individual thread only knows its own contribution. This is actually an instance of
a very general pattern: the reference pattern, in which one party holds a complete
and correct “reference” copy of some ghost state, and one or more other parties
hold possibly incomplete “partial” copies. Because the reference copy must al-
ways be completely up to date, the partial copies cannot be modified without
access to the reference copy. When all the partial copies are gathered together,
they are guaranteed to accurately represent the state of the data structure. The
reference ghost algebra is built as follows:

Definition 2. Given a ghost algebra G, we define the positive ghost algebra on G, written pos(G), as an algebra whose carrier set is (Π × G) ∪ {⊥}, where Π is a set of shares.⁴ An element of pos(G) is valid if it has a nonempty share, and the operator · is defined such that (π₁, a₁) · (π₂, a₂) = (π₁ + π₂, a₁ · a₂) and x · ⊥ = x for all x.

The positive ghost algebra contains pairs of a nonempty share and an element
of G, with join defined pointwise, representing partial ownership of an element
of G. Total ownership of the element can be recovered by combining all of the
pieces, obtaining a full share, and combining all of the G elements accordingly.

Definition 3. Given a ghost algebra G, let the reference ghost algebra on G, written ref(G), be the algebra (pos(G) × (G ∪ {⊥}), ·, {(p, r) | r = ⊥ ∨ p ⊑ r}), where (p₁, r) · (p₂, ⊥) = (p₁ · p₂, r), and p ⊑ r ≜ ∃q. p · q = (⊤, r).

An element of the reference ghost algebra is a pair of a positive share of G (partial element) and an optional reference element of G, where the reference
element is unique and indivisible, and the partial element must be completable
to the reference element if one exists. This ensures that when all the shares are
gathered, i.e., when the partial element is (⊤, a), then it exactly matches the reference element, but no changes can be made to the partial element without
the reference element present. To more clearly relate elements of this algebra
to their intended meanings, we write ref r for the reference element (⊥, r) and
part s v for the partial element ((s, v), ⊥).
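As a rough illustration in the style of the sketch from earlier in this section, the pos and ref constructions might be rendered in Coq as follows, with rationals standing in for fractional shares (the Coq development uses tree shares) and the completability condition on validity elided:

Require Import QArith.

Inductive pos (G : Type) : Type :=
| PUnit : pos G                  (* the unit ⊥ *)
| Part : Q -> G -> pos G.        (* a share paired with an element of G *)
Arguments PUnit {G}.
Arguments Part {G} _ _.

Instance pos_ghost (G : Type) `{Ghost G} : Ghost (pos G) :=
  { join a b c :=
      match a, b, c with
      | PUnit, x, y | x, PUnit, y => x = y
      | Part s1 g1, Part s2 g2, Part s3 g3 =>
          (s3 == s1 + s2)%Q /\ join g1 g2 g3
      | _, _, _ => False
      end;
    valid a :=
      match a with
      | PUnit => True
      | Part s _ => ((0#1) < s)%Q /\ (s <= (1#1))%Q  (* nonempty share *)
      end }.

(* ref(G): a partial element plus an optional reference copy *)
Instance ref_ghost (G : Type) `{Ghost G} : Ghost (pos G * option G) :=
  { join a b c :=
      let '(p1, r1) := a in let '(p2, r2) := b in let '(p3, r3) := c in
      join p1 p2 p3 /\
      match r1, r2, r3 with
      | None, r, r' | r, None, r' => r = r'   (* at most one reference copy *)
      | _, _, _ => False
      end;
    valid _ := True  (* "partial completable to reference" elided *) }.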
Now we can formalize our intuition about what each party knows about the
sum. We let the lock invariant for l be ∃v. x → v ∗ own g (ref v), and start each
thread with a partial element part ½ 0. When each thread acquires its lock and
increments x, it also uses the own_update rule to increment its partial ghost state.
At the end of the program, we can combine the two partial elements to obtain part ⊤ 2, which in combination with the lock invariant is sufficient to guarantee
that the value of x is 2. This pattern can be used for a wide range of applications
⁴ We use tree shares [1, Chapter 41] in the Coq proofs, but for simplicity of presentation in this paper we will use fractional shares: ⊥ is the empty share, ½ is a half share, and ⊤ is the full share.

by replacing the sum algebra with one appropriate to the application or data
structure in question. We will also make use of it later to model the state of the
external world as a separation logic resource.

2.2 Semantics of Ghost State


To support the use of ghost state in a separation logic, we need to make two main
changes in the construction of the logic. First, we need to extend the underlying
model of the logic with ghost state: rather than being predicates on the heap,
our assertions are now predicates on the combination of heap and ghost state.
Once ghost state exists in the model, we can give semantics to the own assertion.
Second, we need to change our definition of Hoare triples to allow for the
possibility of frame-preserving updates to ghost state at any point in a program’s
execution. In a ghost-free separation logic, we might define Hoare triples with
respect to an operational semantics for the language as follows:

{P} c {Q} ≜ ∀h. P(h) ⇒ ((c, h) →∗ (done, h′) ⇒ Q(h′))

where (c, h) → (c′, h′) means that the program c executed with starting heap h may take a step to a new program c′ with heap h′. For a step-indexed logic, it
is more convenient to write this definition inductively:
Definition 4 (Safety). A configuration (c, h) is safe for n steps with postcon-
dition Q if:
– n is 0, or
– c has terminated and Q(h) holds to approximation (step-index) n, or
– (c, h) → (c′, h′) and (c′, h′) is safe for n − 1 steps with Q.

We can then define {P } c {Q} (at step-index n) to mean that ∀h. P (h) ⇒ (c, h)
is safe for n steps with Q.
Once we have added ghost state, our heap h is now a pair (h, g) of physical
and ghost state, and between any two steps the ghost state may change. This
leads us to a ghost-augmented version of safety.
Definition 5 (Safety with Ghost State). A configuration (c, h, g) is safe for
n steps with postcondition Q if:
– n is 0, or
– c has terminated and Q(h, g) holds to approximation n, or
– (c, h) → (c′, h′) and ∀gframe. g · gframe↓ ⇒ ∃g′. (g′ · gframe↓ ∧ (c′, h′, g′) is safe for n − 1 steps with Q).

The program must be able to continue executing under any gframe consistent
with its current ghost state, but its choice of new ghost state g  may depend on
the frame. This quantifier alternation captures the essence of ghost state: the
ghost state held by the program constrains any other ghost state held by the
notional “rest of the system”, and may be changed arbitrarily in any way that
does not invalidate that other ghost state.
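For concreteness, Definition 5 can be rendered as an inductive predicate over an abstract step relation, reusing the hypothetical Ghost interface sketched in Section 2.1; this is an illustration, not the definition used in our development:

Section Safety.
  Context {C H G : Type} `{Ghost G}.
  Variable step : C * H -> C * H -> Prop.
  Variable terminated : C -> Prop.
  Variable Post : nat -> H -> G -> Prop.   (* postcondition at a step index *)

  Inductive safe : nat -> C -> H -> G -> Prop :=
  | safe_zero : forall c h g, safe O c h g
  | safe_done : forall n c h g,
      terminated c -> Post n h g -> safe n c h g
  | safe_step : forall n c h g c' h',
      step (c, h) (c', h') ->
      (* any frame compatible with g must stay compatible with some new g' *)
      (forall gf, compatible g gf ->
         exists g', compatible g' gf /\ safe n c' h' g') ->
      safe (S n) c h g.
End Safety.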

3 External State as Ghost State


An I/O-performing program modifies the state of the outside world. We would
like to treat this external state as a kind of ghost state, since it is not in the
program’s memory and yet can be described by separation logic assertions. At
the same time, we would emphatically not like to allow users to make arbitrary
frame-preserving updates to external state: the external environment should have
complete control of the external state, and the program should never be able to
change it except by calling external functions. Furthermore, VST’s semantic
model (used to prove soundness) already includes an external state element⁵, a
black box of arbitrary type that is carried around by the program and passed to
the environment at each external call, allowing the effects of external calls to be
stateful without explicitly representing their state in program memory. While
this external state is present in the operational semantics of VST, prior to the
changes we describe it could not be referred to by separation logic assertions and
was never instantiated with anything other than the singleton type unit. In this
section, we describe how we combine ghost state with the built-in external state
to make the external state visible in the separation logic.
Intuitively, external state is just another kind of shared resource, and we
should be able to model it with a form of ghost state. However, one of the key
features of ghost state is that programs can make arbitrary frame-preserving
updates to it, while programs should never be able to modify external state. We
can accomplish this using the reference ghost algebra of Section 2: the reference
element ref a will be held by the external environment, while the program holds
a partial element part
a. This ensures that the program cannot make any
frame-preserving updates without the reference element, which is only available
when the program passes control to the external environment via an external
call. It then remains to choose the underlying algebra G of the external state.
Different applications may call for external state with different carrier sets and
operations, but in the simplest case, the VST user will not want to split or
combine the local copy of the external state⁶. In this case, they can pick a type
Z and make G the exclusive ghost algebra for Z, which holds only an empty
unit element and an indivisible ownership element, preventing the local copy
from being divided. Then the user program holds an element part ⊤ a that
cannot be divided or modified, but only passed to the external environment,
where a : Z is the current value of the external state. We encapsulate the ghost
state construction in an assertion has_ext a ≜ own 0 (part ⊤ a), where 0 is the identifier reserved for the external ghost state. Now, when verifying a program with external state, the user simply provides the starting state a, and receives in the precondition of the main function the assertion has_ext a, with no need to use or understand the ghost state mechanism.
⁵ Appel et al. [1] call this the external oracle, but we refer to it as simply “external state” to avoid confusion with the environment oracles of CertiKOS.
⁶ One example of a use case that benefits from nontrivial external state structure is a multithreaded web server in which different threads serve different clients simultaneously; in this case, each thread might have its own piece of the external state.
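A minimal Coq sketch of this construction, building on the hypothetical Ghost, pos, and Part pieces sketched in Section 2 (with mpred and own standing in for VST’s assertion type and ghost-ownership assertion), might look like:

Inductive excl (Z : Type) : Type :=
| ExclUnit : excl Z           (* the empty unit element *)
| ExclOwn : Z -> excl Z.      (* indivisible ownership of the state *)
Arguments ExclUnit {Z}.
Arguments ExclOwn {Z} _.

Instance excl_ghost (Z : Type) : Ghost (excl Z) :=
  { join a b c :=
      match a, b, c with
      | ExclUnit, x, y | x, ExclUnit, y => x = y
      | _, _, _ => False      (* two ownership elements never join *)
      end;
    valid _ := True }.

Parameter mpred : Type.       (* separation logic assertions *)
Parameter own : forall (A : Type) (GA : Ghost A), nat -> A -> mpred.
Arguments own {A GA} _ _.

(* has_ext a ≜ own 0 (part ⊤ a), with ⊤ rendered as the full share 1 *)
Definition has_ext {Z : Type} (a : Z) : mpred :=
  own O (Part (1#1)%Q (ExclOwn a)).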

On the back end, we must still modify VST’s semantics to connect the ghost
state a to the actual external state, and to prevent the “ghost steps” of the
semantics from changing the external state. Recall from Section 2 that in order
for a non-terminated configuration (c, h, g) to be safe for a nonzero number
of steps, it must be the case that (c, h) → (c′, h′) and ∀gframe. g · gframe↓ ⇒ ∃g′. g′ · gframe↓ ∧ (c′, h′, g′) is safe. To connect the external ghost state to a real external state z, we simply extend this definition to require that gframe include an element (⊥, z) at identifier 0. This enforces the requirement that the value
of the external ghost state always be the same as the value of the external
state, and ensures that frame-preserving updates cannot change the value of the
external state. Re-proving the separation logic rules of Verifiable C with this new
definition of Hoare triple required only minor changes, since internal program
steps never change the external ghost state.
When the semantics reaches an external call, the call is allowed to make
arbitrary changes to the state consistent with its pre- and postcondition, in-
cluding changing the value of the external ghost state (as well as the actual
external state). We can use has ext assertions in the pre- and postcondition of
an external function to describe how that function affects the external state. For
instance, we might give a console write function the “consuming-style” specifica-
tion {has_ext(write(v);; k)} write(v) {has_ext(k)}, stating that if before calling
write(v) the program has permission to write the value v and then do the opera-
tions in k, then after the call it is left with permission to do k. (We could reverse
the pre- and postcondition for a “trace-style” specification, in which the external
state records the history of operations performed by the program instead of the
future operations allowed.) In this paper, we use interaction trees [13] as a means
of describing a collection of allowed traces of external events. Interaction trees
can be thought of as “abstract traces with binding”; for instance, we can write
x ← read;; write(x + 1);; k x to mean “read a value, call it x, write the value
x + 1, and then continue to do the actions in k using the same value of x.”
In the end, we have a new assertion has_ext on external state that works in
exactly the way we expect: it can hold external state of any type, it cannot be
modified by user code, it can be freely modified by external calls, it always has
exactly the same value as the external state already present in VST’s semantics,
and it exposes no ghost-state functionality to the user. If the user wants more
fine-grained control over external state (for instance, to split it into pieces so
multiple threads can make concurrent calls to external functions), they can define
their own ghost algebra for the state and pass around part elements explicitly,
but for the common case, has ext provides seamless separation-logic reasoning
about C programs that interact with an external environment.

4 Verifying C Programs with I/O in VST

Once we have separation logic specifications for external function calls, verifying
a communicating program is no different from verifying any other program. We
demonstrate this with the example program excerpted in Figure 1, shown in

{ITree(write_list(decimal_rep(i));; k)}
void print_intr(unsigned int i) {
  unsigned int q,r;
  if (i!=0) {
    q=i/10u;
    r=i%10u;
    print_intr(q);
    putchar(r+'0');
  }
}
{ITree(k)}

{ITree(write_list(decimal_rep(i));; k)}
void print_int(unsigned int i) {
  if (i==0)
    putchar('0');
  else print_intr(i);
}
{ITree(k)}

{ITree(c ← read;; main_loop(0, c))}
int main(void) {
  unsigned int n, d; char c;
  n=0;
  c=getchar();
  while (n<1000) {
    d = ((unsigned)c)-(unsigned)'0';
    if (d>=10) break;
    n+=d;
    print_int(n);
    putchar('\n');
    c=getchar();
  }
  return 0;
}
{ITree(done)}

Fig. 4: A simple communicating program, with specifications for each function

full in Figure 4. The print_intr function uses external calls to putchar to print the decimal representation of its argument, as long as that argument is nonzero; print_int handles the zero case as well. The main function repeatedly reads in digits using getchar and then prints the running total of the digits read so far.
The ITree predicate is simply a wrapper around the has_ext predicate of the previous section (i.e., an assertion on the external ghost state), specialized to interaction trees on I/O operations. We can then write simple specifications for getchar and putchar, using interaction trees to represent external state:

{ITree(r ← read;; k r)} x = getchar() {ITree(k x)}

{ITree(write(x);; k)} putchar(x) {ITree(k)}

Next, we annotate each function with separation logic pre- and postcon-
ditions; the program does not manipulate memory, so the specifications only
describe the I/O behavior of each function. The effect of print_intr is to make a series of calls to putchar, printing the digits of the argument i as computed by the meta-level function decimal_rep (where write_list([i₀; i₁; ...; iₙ]) is an abbreviation for the series of outputs write(i₀);; write(i₁);; ...;; write(iₙ)). When the value of i is 0, print_intr assumes that the number has been completely printed, so print_int adds a special case for 0 as the initial input. The specification for
the main loop is a recursive sequence of read and write operations, taking the

running total (which starts at 0) and the most recent input as arguments:

main_loop(n, d) ≜ if n < 1000 then (write_list(decimal_rep(n + d));; c ← read;; main_loop(n + d, c)) else done

Using the specifications for putchar and getchar as axioms, we can easily prove
the specifications of print_intr, print_int, and main. (The following sections show
how we substantiate these axioms.)
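To give a sense of what such a specification looks like formally, here is a sketch of main_loop using the interaction trees library [13]; the event type IOE, its constructors, and decimal_rep are illustrative assumptions rather than the definitions in our development:

From Coq Require Import List. Import ListNotations.
From ITree Require Import ITree.
Import ITreeNotations.
Local Open Scope itree_scope.

Variant IOE : Type -> Type :=
| ERead : IOE nat                (* read a character *)
| EWrite : nat -> IOE unit.      (* write a character *)

Parameter decimal_rep : nat -> list nat.   (* assumed: decimal digits of n *)

Definition write_list (l : list nat) : itree IOE unit :=
  fold_right (fun i k => trigger (EWrite i) ;; k) (Ret tt) l.

(* main_loop(n, d), written with ITree.iter so that the corecursion is
   productive by construction *)
Definition main_loop : nat * nat -> itree IOE unit :=
  ITree.iter (fun '(n, d) =>
    if Nat.ltb n 1000
    then write_list (decimal_rep (n + d)) ;;
         c <- trigger ERead ;;
         Ret (inl (n + d, c))            (* continue with updated state *)
    else Ret (inr tt)).                  (* done *)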

{ITree(ℓ ← read_list(n);; k ℓ) ∗ buf → _}
x = getchars(buf, n)
{∃vs. length(vs) = n ∧ x = n ∧ ITree(k vs) ∗ buf → vs}

{length(vs) = n ∧ ITree(write_list(vs);; k) ∗ buf → vs}
putchars(buf, n)
{ITree(k) ∗ buf → vs}

Fig. 5: Separation logic specifications for I/O calls with memory

More complicated programs may manipulate memory as well as communicating, and we can easily combine the two. For instance, if we want to read or write
several characters in a single call, the standard C idiom is to pass a buffer in
memory as an argument. Figure 5 shows the specifications for functions putchars
and getchars in this style, where each function takes as arguments a buffer to
hold the input/output and a number indicating the size of the buffer⁷. The pre-
and postconditions of these functions now involve both the external state and
a standard points-to assertion for the buffer. (Note that ℓ ← read_list(n) is an abbreviation for the series of inputs ℓ₀ ← read;; ℓ₁ ← read;; ...;; ℓₙ₋₁ ← read.)
Figures 6 and 7 show a variant of the previous program that uses these exter-
nal functions with memory. The print_intr function now populates a buffer with
the characters to be written and returns the length of the decimal representation
of its argument (retval in the postcondition refers to the return value of the func-
tion), while print_int makes a single call to putchars with the populated buffer.
The main function now reads four characters at a time and then processes them
one by one, ultimately producing the same output as the previous program. The
specifications for putchars and getchars describe changes to both external state
and memory, as shown in Figure 5. Proving the specifications for the functions in
this program is not any more difficult than in the memoryless case: we define an
interaction tree main_loop′ capturing the slightly different pattern of interaction
in this program, and then apply the appropriate separation logic rule to each
command. The external calls affect both memory and the ITree predicate, while
all other commands affect only memory and local variables, as usual.
⁷ While these are not standard POSIX I/O functions, they are close to the behavior of POSIX read/write, socket operations, and other common forms of I/O.

{length(decimal_rep(i)) ≤ length(contents) ∧ buf → contents}
int print_intr(unsigned int i, unsigned char *buf) {
  unsigned int q;
  unsigned char r;
  int k = -1;
  if (i!=0) {
    q=i/10u;
    r=i%10u;
    k = print_intr(q, buf);
    buf[k] = r+'0';
  }
  return k + 1;
}
{buf → contents[0...(retval − 1) := decimal_rep(i)]}

{ITree(write_list(decimal_rep(i));; k)}
void print_int(unsigned int i) {
  unsigned char *buf = malloc(5);
  if (!buf) exit(1);
  int k;
  if (i==0){
    buf[0] = '0';
    buf[1] = '\n';
    k = 2;
  }
  else{
    k = print_intr(i, buf);
    buf[k] = '\n';
    k++;
  }
  putchars(buf, k);
  free(buf);
}
{ITree(k)}

Fig. 6: A communicating program with memory (part 1)

5 Soundness of External-State Reasoning

The soundness proof of VST [1] describes the guarantees that the Hoare-logic
proof of correctness for a C program provides about the actual execution of that
program. A C program P is represented as a list P1 , ..., Pn of function definitions
in CompCert Clight, a Coq representation of the abstract syntax of C. The
program is annotated with a collection of function specifications (i.e., separation
logic pre- and postconditions) Γ = Γ1 , ..., Γn , one for each function. We then
prove that each Pi satisfies its specification Γi, which we write as Γ ⊢ Pi : Γi
(note that each function may call on the specification of any function, including
itself). The soundness theorem of VST without external function calls is then:

Theorem 1 (VST Soundness). Let P be a program with specification Γ. Suppose for every function Pi there is a proof Γ ⊢ Pi : Γi that Pi satisfies
its specification. Then the main function of P can run according to the Comp-
Cert Clight semantics for any number of steps without getting stuck, and if it
terminates then it does so in a state that satisfies its postcondition.

Proof. First, make a nonstandard, ownership-annotated, resource-annotated, step-indexed small-step semantics for Clight. Define Verifiable C’s Hoare triple as a
shallowly embedded statement about safe executions in this “juicy” semantics.
Then show that executions in the juicy semantics erase to corresponding safe
executions in Clight’s standard “dry” small-step semantics.

{ITree(cs ← read_list(4);; main_loop′(0, cs))}
int main(void) {
  unsigned int n, d; unsigned char c;
  unsigned char *buf;
  int i, j;

  n=0;
  buf = malloc(4);
  if (!buf) exit(1);
  i = getchars(buf, 4);
  while (n<1000) {
    for(j = 0; j < i; j++){
      c = buf[j];
      d = ((unsigned)c)-(unsigned)'0';
      if (d>=10) { free(buf); return 0; }
      n+=d;
      print_int(n);
    }
    i = getchars(buf, 4);
  }
  free(buf);
  return 0;
}
{ITree(done)}

Fig. 7: A communicating program with memory (part 2)

Corollary 1. Since null pointer dereferences, integer overflows, etc. are all
stuck in CompCert’s small-step semantics, this means that a verified program
will be free of all of these kinds of errors.

This soundness theorem expresses the relationship between the juicy seman-
tics described by VST’s separation logic and the dry semantics under which
C programs actually execute8 . The proof of correctness of a program gives us
enough information to construct a corresponding dry execution for each juicy
execution9 . However, we may not have access to the code of external functions,
and in some cases (e.g., system calls) they may not even be implemented in C. In
this section, we generalize the soundness theorem to include external functions.
⁸ Of course, a C program actually executes by running machine code, but the relationship between the dry C semantics and the semantics of assembly language is already proved in CompCert, as is assembly-to-machine language [20].
⁹ Theorem 1 blurs the line between juicy and dry by saying that a dry execution “terminates in a state that satisfies its postcondition”, where the postcondition is stated in separation logic. In the original proof of soundness [1], this is resolved by assuming that the postcondition of main is always true. The techniques we use in this section can also be applied to more refined specifications of main.

In order to prove correctness of a C program with external calls in our separation logic, we must have a pre- and postcondition Γi for each external function.
At this level these specifications are taken as axioms, since we do not have access
to the code of the external functions. To be able to describe the dry executions
of programs that call these functions, we also need simpler specifications on dry
states. Each dry external specification contains a pre- and postcondition for the
function, which may refer to the memory state, arguments/return values, the
external state, and a witness used to provide logical parameters to the pre- and
postcondition. The core of our approach is to prove the correspondence between
the juicy specification and the dry specification of each external function.
If we can relate every juicy specification to a dry specification, then why
bother with the juicy specifications at all? The answer is, not every function
can be specified “dry.” Higher-order functions in object-oriented patterns, dy-
namically created locks with self-referential resource invariants, and many other
C programming patterns cannot be given simple first-order specifications. But
the external functions that correspond to ordinary input/output can be given
first-order specifications. Therefore, users can write higher-order object-oriented
programs, in which the internal functions have (only) juicy specifications, so long
as the external functions have (also) dry specifications. For instance, consider
the specification of the putchars function from the previous section:

{length(vs) = n ∧ ITree(write_list(vs);; k) ∗ buf → vs} putchars(buf, n)
{ITree(k) ∗ buf → vs}

The pre- and postcondition each make one assertion about memory (that the
buffer buf points to the string of bytes vs) and one assertion about the external
state¹⁰ (that the interaction tree allows write_list(vs) followed by k before the
call, and k afterward). The corresponding first-order specification on dry memory
and external state is:
Pre((vs, k), (buf, n), m, z) ≜ length(vs) = n ∧ z = (write_list(vs);; k) ∧ ∀i < n. m(buf + i) = vs[i]
Post((vs, k), (buf, n), m₀, m, z) ≜ m₀ = m ∧ z = k

where (vs, k) is the witness (i.e., the parameters to the specification), buf and
n are the arguments passed to the function, m is the current memory, z is
the external state, and m₀ in the postcondition is the memory before the call (allowing us to state that memory is unchanged). Of the roughly 210 Linux system calls that are not Linux- or platform-specific, about 140 fall into this pattern: they perform socket, console, or file I/O, allocate memory, or are simpler informational calls like gethostname that do not involve memory.
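The shape of a dry specification can be captured by a simple record; the following Coq sketch uses assumed abstract types for values, memories, and external state, and is not VST’s actual interface:

Parameters val mem ext_state : Type.

(* the shape of a dry (first-order) external specification, where W is the
   type of the logical witness *)
Record dry_spec (W : Type) : Type := {
  (* witness, arguments, memory, external state *)
  dry_pre : W -> list val -> mem -> ext_state -> Prop;
  (* witness, return value, memory before, memory after, new external state *)
  dry_post : W -> option val -> mem -> mem -> ext_state -> Prop
}.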
Once we have a juicy and a dry specification for a given external function,
what is the relationship between them? Intuitively, if the juicy specification for a
function f is {Pj } f (args); {Qj }, the Hoare logic proof for a program that calls
¹⁰ ITree is actually an assertion on the external ghost state, which is connected to the true external state as described in Section 3, and is erased at the dry level.

f guarantees that Pj is satisfied before every call to f, and relies on Qj holding after each such call returns. To know that the program will run without getting
stuck, on the other hand, we must know that the dry precondition Pd is satisfied
before each call, and we can assume that the dry postcondition Qd is satisfied
after each return. So informally, we need to know that Pj implies Pd and that
Qd implies Qj . This cannot be a simple logical implication, however, because
Pj and Qj are predicates on juicy memories, while Pd and Qd are predicates on
dry memories. A juicy memory jm is a dependent triple (m, φ, pf ), where m is
a dry memory, φ is a higher-order, step-indexed memory with ghost state, and
pf is a proof of the relationship between m and φ. We can easily extract the dry
memory m from a juicy memory (we write this as dry(jm)), but there are many
possible φ’s that may correspond to a single m: we need to make decisions about
ownership information and ghost state that is not present at the CompCert level.
In order to relate the juicy and dry specifications, we must erase the juice from
the precondition, Pj ⇒ Pd , and then reconstruct the juice in the postcondition,
Qd ⇒ Qj . The key to this erasure is that, as explained above, the Pj and Qj for
external functions generally make only first-order assertions on memory (memory
buffers passed to system calls don’t contain higher-order objects such as function
pointers and locks). The rest of the memory is implicitly the frame, and will not
be changed by the external call. For first-order predicates, erasure is injective,
and the associated juicy memory can be uniquely reconstructed once the buffer
has been modified. The frame can contain noninjective juice, but we can reuse
the same juice in going from Qd ⇒ Qj that we erased in going from Pj ⇒ Pd ,
since the external function does not modify the frame. In practice, the story is
not quite so simple: the external function might allocate or free memory, the dry
witness (used in Pd and Qd ) must be derived from the juicy witness (used in Pj
and Qj ), and so on. We now formalize the details, culminating in Definition 6,
the formal correspondence between juicy and dry specifications.
First, we address the problem of reconstructing a juicy memory from a dry
memory. While there are many juicy memories that correspond to a given Comp-
Cert memory, it is easy to start with a (precondition) juicy memory and change it
to reflect (postcondition) modifications to the associated dry memory, as long as
those changes fall within certain limits. In particular, a memory location may be
newly allocated or deallocated, or its value may be changed while staying at the
same permission level, but its permissions should not otherwise be changed¹¹. If
a dry specification ensures that memory is changed in only (at most) these ways,
we say that it safely evolves memory. When a user adds a new set of external
functions to VST, this safe evolution property will be one of their proof obliga-
tions. As long as an external function satisfies a specification that safely evolves
memory, we can always reconstruct the juicy memory after the call by modify-
ing the original juicy memory to reflect the changes to the dry memory. This
¹¹ Any function that interacts with memory through the standard interface of load, store, alloc, and free will fall within these limits; concurrency operations, such as acquiring or releasing a lock, may not, and proving that lock operations are correctly implemented is outside the scope of this work.

reconstruction captures the effects of the external call on the program’s memory;
to reflect the changes to the external state, we must also set the external ghost
state of the reconstructed juicy memory to match the external state returned
by the call. We define a reconstruct operation such that reconstruct(jm, m, z) is
a version of the juicy memory jm that has been modified to take into account
the changes in the dry memory m and the external state z.
Second, we need a way to transform a juicy witness into the corresponding
dry witness. When a user adds a new external call to VST, they must provide a
dessicate function that performs this transformation. Fortunately, the dessicate
operation usually follows a simple pattern. Components of the witness that are
not memory objects are generally identical in their juicy and dry versions. The
frame is usually the only memory object in the juicy witness; while it is possible in
VST to write a Hoare triple that quantifies over other memory objects explicitly,
it is very unusual and runs counter to the spirit of separation logic. Similarly, the
postcondition of the dry specification may refer to the memory state before the
call (to express properties such as “this call stored value v at location ℓ”), but
there is rarely a reason to refer to any other memory object. Thus, the dessicate
operation for each function can simply discard the frame (juicy) memory and
replace it with the dry memory from before the call. This standard dessicate
operation works for all external functions shown in this paper.
This leads to the following definition and theorem:
Definition 6 (Juicy-Dry Correspondence). A juicy specification (Pj , Qj )
and a dry specification (Pd , Qd ) for an external function correspond if, for a
suitable dessicate operation:

– for all witnesses w, arguments a, external states z, and juicy memories jm, if Pj(w, a, z, jm), then Pd(dessicate(jm, w), a, z, dry(jm)); and
– for all witnesses w, arguments a, return values r, external states z, initial juicy memories jm₀, initial external states z₀, and dry memories m, if Pd(dessicate(jm₀, w), a, z₀, dry(jm₀)) and Qd(dessicate(jm₀, w), r, z, m), then Qj(w, r, z, reconstruct(jm₀, m, z)).

Theorem 2 (VST Soundness with External Functions). Let P be a program with n functions, calling also upon m external functions. The internal
functions have (juicy) specifications Γ1 . . . Γn and the external functions have
(juicy) specifications Γn+1 . . . Γn+m . Suppose P is proved correct in Verifiable
C—there is a derivation Γ ⊢ P1 : Γ1 , . . . , Pn : Γn . Let Dn+1 , . . . , Dn+m be dry
specifications that safely evolve memory, and that correspond to Γn+1 . . . Γn+m .
Then the main function of P can run according to the CompCert C semantics,
using D as the semantics of external function calls, for any number of steps
without getting stuck, and if it terminates then it satisfies its postcondition.

Proof. We extend the juicy semantics of Theorem 1 with a rule for external
calls that uses their juicy pre- and postconditions, and then prove that execu-
tions in this semantics erase to safe executions in the dry semantics, using the
correspondence to relate juicy and dry behaviors of external calls.

Although this theorem does not explicitly mention external communication, it implies that any I/O operations performed by P conform to the description of
allowed communication in the specification of main. This follows from the fact
that only external calls can change the external state, and only external calls can
communicate with the outside world. Thus, if P performs a sequence of external
function calls f1 , ..., fn , the external communication performed by P must be
consistent with the specifications Df1 , ..., Dfn . In the case of the examples above,
this means that at any point in a program’s execution, its communication so far
will be a prefix of the operations allowed by the initial ITree predicate, as desired.
Proving the correspondence between the juicy and dry specifications is the
primary proof burden for a VST user who wants to use a new external function
in their program. Fortunately, this proof only needs to be done once per external
function rather than once per program (as long as the original specification is
general enough to be usable in many different programs), and soundness (Theo-
rem 2) has been proved once and for all. As a result, a VST user can prove that
their program with external calls runs correctly as follows:
1. For each external function used in the program (that has not already been
specified in VST), write a separation logic specification for that function.
2. Prove correctness of the program in VST as usual using the separation-logic-
level external specifications.
3. For each external function used in the program (again, that has not already
been specified), write a dry specification describing its effects on CompCert
memories, and prove that the dry specification corresponds to the juicy spec-
ification and safely evolves memory.
4. Show immediately that the program runs correctly for any number of steps
by applying Theorem 2.
For instance, we have already seen the VST-level specifications for putchars
and getchars, and used them to prove correctness of a simple program; we can
complete the process with the following lemma.
Lemma 1. The juicy specifications of putchars and getchars correspond to their
dry specifications.
As a result, we now know that the sample program in Figure 7 runs correctly for
any implementation of putchars and getchars that satisfies their dry specifications.

6 Connecting VST to CertiKOS


In the previous section, we showed how to connect a step-indexed separation logic
specification of an external function to a “dry” specification on non-step-indexed
CompCert memories and external state. This gives us a correctness property for
C programs with external functions, but it still treats the dry specifications of
the external functions as axioms. In this section, we show how to discharge these
axioms by connecting dry specifications to implementations of the corresponding
functions in the verified operating system CertiKOS [7].

Definition serial_in (port : Z) (st : OSState) : OSState * Z :=
  ... (* read buffers, compare bits, etc *)
  let new := st.(serial_oracle) st.(serial_trace) in
  match new with
  | SerialRecv data ⇒
      let (st', byte) := ... in (* manipulate data *)
      (st'/[serial_trace := st.(serial_trace) ++ [new]], byte)
  | ... (* handle other events *) end.

Fig. 8: A specification of a serial driver

6.1 CertiKOS Specifications

In order to explain how to connect VST and CertiKOS specifications, we first summarize how their specification styles differ. In VST, a specification is a pre-
and postcondition on the (step-indexed, ghost-state-augmented) memory state
of a program. In CertiKOS, a specification is a function representing a state
transition from the current OS state to a new one with an (optional) return value.
The OS state is a record with fields for each piece of concrete or logical state that
CertiKOS maintains, such as page table maps and console buffers. Specifications
are organized into “Certified Abstraction Layers” [6], which can be independently
proven to refine higher-level abstractions, and then composed with other layers
to build more complex systems. The concrete CertiKOS kernel implementation,
in C and assembly, is verified with respect to high-level specifications using this
layer framework and the CompCert compiler.
Because the specifications are pure, deterministic functions, something more
is needed to model functions with externally visible effects such as I/O. To
handle such functions, CertiKOS parameterizes specifications by “environment
contexts” [8], which act as oracles that take a log of the events up to that point
and return the next steps taken by the environment. Each oracle has a fixed set
of events it can produce, along with a trace well-formedness invariant that it
must preserve. For example, the oracle for modeling the behavior of the serial
device can return events indicating the successful completion of a send or the
arrival of some data, and it is assumed to only receive values that fit in a byte
([0, 255]). Although any particular choice of oracle is a deterministic function, its
implementation is completely opaque to the specification, so that proofs about
the specification’s behavior hold given any oracle and environment state.
As a concrete example, consider the abridged specification of part of the
serial driver in CertiKOS (Figure 8). After some initial work, the specification
needs to know what bits came in from the physical device, so it consults the
oracle and branches based on the next serial event. If the next event is a receive,
it manipulates the received data to extract a byte and returns it along with a
new state in which the trace is updated to include the processed event.

6.2 Relating OS and User State

Definition serial_putc (c : Z) (st : OSState) : option (OSState * Z) :=
  let c' := c mod 256 in
  if st.(ikern) && st.(init) && st.(ihost) then
    if st.(drv_serial).(serial_exists) then
      match st.(com1) with
      | mkDevData (mkSerialState _ true _ _ txbuf nil false) _ ltx _ ⇒
          let cs := if c' =? CHAR_LF then [CHAR_LF;CHAR_CR] else [c'] in
          Some (st/[com1/s/TxBuf := cs,
                    serial_log := st.(serial_log) ++ [IOEvPutc c]], c')
      | _ ⇒ None end
    else Some (st, -1)
  else None.

Pre(k, c, m, z) ≜ (write(c);; k) ≈ z
Post(k, c, m₀, m, z) ≜ m₀ = m ∧ z ≈ k

Fig. 9: The core of the putchar system call vs. its dry specification

User-level programs cannot directly interact with the outside environment, and must instead communicate through the OS using the system call interface
it provides. System calls in CertiKOS are specified just like any other operation,
i.e., as a state transition function. For each system call, we would like to relate its
dry pre- and postcondition (as described in Section 5) to its functional specifica-
tion in CertiKOS. The property we would like to prove is something like: for any
initial state s, if the dry precondition holds for s, then the value v and state s
returned by the functional specification satisfy the dry postcondition. Combined
with the correspondence between juicy and dry specifications, this implies that
the system call specification correctly implements the behavior expected by the
user program (as expressed by its separation logic specification in VST). How-
ever, this property cannot be proven in its current form because the dry pre-
and postconditions are predicates on CompCert memories and external state,
which differ from CertiKOS’s state, much of which is invisible and irrelevant
to the user program, as can be seen in Figure 9. Instead, we must restate the
correctness property in terms of relations between the common elements of the
two state representations. The key components to relate are the return value of
the system call, the representation of the user program’s memory, and the model
of external behaviors. The return value is a CompCert value in both systems,
but the other two require additional work to translate between them.
Although, like VST, the CertiKOS kernel uses the CompCert C semantics
and memory model, user-process memory is represented as a flat physical ad-
dress space rather than a set of disjoint blocks. The OS state also includes page
tables to map virtual to physical addresses and a record of which addresses are
allocated. Fortunately, aside from these differences, the flat memory model is
quite similar to CompCert’s (see Figure 10). We assume the existence of a re-
lation Rmem that maps blocks to virtual addresses. Other than the restriction

Inductive flatmem_val :=
| HUndef
| HByte: byte → flatmem_val.

(* Map from address to value *)
Definition flatmem :=
  ZMap.t flatmem_val.

Inductive memval :=
| Undef: memval
| Byte: byte → memval
| Pointer: block → int → nat → memval.

(* Map from block and offset to value *)
Record mem := mkmem {
  mem_contents: PMap.t (ZMap.t memval);
  ... }.

Fig. 10: A comparison of CertiKOS flat memory and CompCert memory

that blocks fit in the virtual address space and map to nonoverlapping regions,
the exact mapping has no effect on the system call correctness, so it can be com-
pletely arbitrary. To relate a CompCert memory to a CertiKOS one, we define
a relation inj(m, flat(s), ptbl(s)), which states that if a block and offset in the
CompCert memory m is valid, then it contains the same data as the correspond-
ing location (according to Rmem and the page table) in the flat memory of the
OS state s. Note that inj is parameterized by the page table to allow a system
call to alter the address mapping, for example by allocating new memory.
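As a rough Coq sketch, with abstract types and lookup functions standing in for CompCert’s and CertiKOS’s real memory interfaces, inj can be phrased as follows:

Require Import ZArith.
Open Scope Z_scope.

Parameters block cmem flatmem : Type.
Parameter Rmem : block -> Z -> Prop.     (* block b sits at virtual address a *)
Parameter contents : cmem -> block -> Z -> option Z.   (* valid locations only *)
Parameter flat_contents : flatmem -> Z -> Z.

Definition inj (m : cmem) (fm : flatmem) (ptbl : Z -> option Z) : Prop :=
  forall b a ofs v pa,
    Rmem b a ->                      (* b is mapped to virtual address a *)
    contents m b ofs = Some v ->     (* (b, ofs) is valid with contents v *)
    ptbl (a + ofs) = Some pa ->      (* the page table maps virtual to physical *)
    flat_contents fm pa = v.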
At the user level, the precondition contains an interaction tree (or similar
external specification) that specifies the allowed external behaviors, and the
postcondition contains a smaller tree that continues using the return value of
the “consumed” actions. On the other hand, in CertiKOS, specifications begin
with a trace of the events that have already happened and extend it with new
events by querying the external environment. To reconcile these two views, we
can first relate an interaction tree to a (possibly infinite) set of (possibly infinitely
long) traces, each of which intuitively is the result of following one path in the
tree. Then any trace allowed by the output interaction tree should be a suffix of
a trace allowed by the input tree, and the difference between the two should be
exactly the trace of events generated during the system call:

Definition 7. We write consume(T, T′, tr) to mean that, if tr′ is a trace of T′, then tr ++ tr′ (the concatenation of tr and tr′) is a trace of T.
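Definition 7 can be read as the following Coq predicate, where the types of interaction trees and traces are abstract and is_trace is an assumed trace-membership relation:

Parameters itreeT trace : Type.
Parameter is_trace : itreeT -> trace -> Prop.
Parameter tapp : trace -> trace -> trace.   (* trace concatenation, tr ++ tr' *)

Definition consume (T T' : itreeT) (tr : trace) : Prop :=
  forall tr', is_trace T' tr' -> is_trace T (tapp tr tr').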
Equipped with the relations defined above, we can define more precisely what
it means for a system call to satisfy its dry specification.

Definition 8 (Dry-Syscall Correspondence). A system call f with functional specification Of correctly implements a dry specification (Pd, Qd) if for any arguments v, CompCert memory m, interaction tree T, and OS state s, if Pd(v, m, T), inj(m, flat(s), ptbl(s)), and Of(v, s) = (s′, v′, tnew), then for all m′ such that inj(m′, flat(s′), ptbl(s′)), there exists T′ such that consume(T, T′, tnew), and Qd(v, v′, m′, T′).
That is, if f correctly implements a dry specification then for any state that
satisfies the dry precondition Pd , we can inject the relevant piece of memory
into an OS state s, apply the functional specification Of , and then extract a

resulting state that satisfies the dry postcondition Qd . The inj relation may
relate multiple CompCert memories to a given OS state (hence the universal
quantification over the resulting memory m′), but all such memories must agree on the contents of all valid addresses, so the postcondition will usually hold for all m′ if it holds for any m′.
Theorem 3. Putchar and getchar in CertiKOS correctly implement their dry
specifications.
While this correspondence is specific to CertiKOS, we can adapt it to other
verified operating systems by replacing the CertiKOS system call specification,
user memory model, and external event representation with those of the other
OS. For example, in the case of the seL4 microkernel [12], inj could be redefined to
relate a CompCert memory to certain capability slots that represent the virtual
memory, and the system call might send a message to a device driver running
in another process. Despite these changes, most of the theorems in this paper
aside from Theorem 3 would continue to hold with minor or no alterations.

6.3 Soundness of VST + CertiKOS


In Section 5, we described a correspondence between “juicy” separation logic
specifications for external functions and “dry” CompCert-level specifications
that is sufficient to guarantee that verified C programs behave correctly when
run, as long as the external functions actually satisfy their dry specifications.
Now we have seen how to prove that an external function satisfies its dry specifi-
cation, by relating it to its CertiKOS specification. We combine these two proofs
to get a stronger correctness property for programs that use CertiKOS system
calls. This will also allow us to formalize the idea that at each point in a pro-
gram’s execution, it has performed some prefix of the communication operations
specified in its precondition.
First, we define the semantics of programs with respect to the implementation
of external functions:
Definition 9 (OS Safety). Suppose that we have a set of external calls F such that each f ∈ F has a functional specification Of. Then a configuration (c, m, t, T), where c is a C program state, m is a memory, t is a trace of events performed so far, and T is an interaction tree specifying the allowed future events, is safe for n steps with respect to a set of traces 𝒯 if:
– n is 0 and 𝒯 is {ε}, or
– (c, m) → (c′, m′) and (c′, m′, t, T) is safe for n − 1 steps with respect to 𝒯, or
– c is at a call to an external function f with arguments v, and for all s consistent with t such that inj(m, flat(s), ptbl(s)), if Of(v, s) = (s′, v′, tnew), then there is some new interaction tree T′ such that (c′, m′, t ++ tnew, T′) is safe for n − 1 steps with respect to 𝒯′, where c′ is the program state after the call (using the return value v′), inj(m′, flat(s′), ptbl(s′)), and consume(T, T′, tnew), and 𝒯 is the union of {tnew ++ t′ | t′ ∈ 𝒯′} for all such 𝒯′.
The C program has states (c, m), where c holds the values of local variables and the control stack, and m is the memory. Our small-step relation (c, m) → (c′, m′) characterizes internal C execution, and therefore if c is at a call to an external function then (c, m) ̸→ (c′, m′). The operating system has states s that contain the physical memory flat(s) and many other components used internally by the OS (and its proof of correctness), including a trace of past events; we say that s is consistent with t when the trace in s is exactly t.
Definition 9 has several important differences from our original definition of safety in Section 2. First, configurations include the trace t of events performed so far, as well as T, the high-level specification of the allowed communication events (here it is taken to be an interaction tree, but it could easily be defined in another formalism just by changing the definition of consume). Second, our external functions are not simply axiomatized with pre- and postconditions, but implemented by the executable specifications Of provided by the operating system. We use the ideas of the previous section to relate the execution of C programs to the behavior of system calls: we inject the user memory into the OS state, extract the resulting memory from the resulting state, and require that the new interaction tree T′ reflect the communication events tnew performed by the call. Note the quantification over the current OS state s: the details of the OS state, such as the buffer of values received, are unknown to the C program (and may change arbitrarily between steps, for instance, if an interrupt occurs), and so it must be safe under all possible OS states consistent with the events t. The set 𝒯 contains all possible communication traces from the program's execution, so by proving that every trace in 𝒯 is allowed by the initial interaction tree T, we show that the program's communication is always constrained by T.
Lemma 2 (Trace Correctness). If (c, m, T) is safe for n steps with respect to 𝒯, then for all traces t ∈ 𝒯, there exists some interaction tree T′ such that consume(T, T′, t).
Proof. By induction on n. Since the consume relation holds for the trace segment
produced by each external call, it suffices to show that it is transitive, i.e., that
consume(a, b, t1 ) and consume(b, c, t2 ) imply consume(a, c, t1 ++ t2 ).
Theorem 4 (Soundness of VST + CertiKOS). Let P be a program with n functions, calling also upon m external functions. The internal functions have (juicy) specifications Γ1 . . . Γn and the external functions have (juicy) specifications Γn+1 . . . Γn+m. Suppose P is proved correct in Verifiable C with initial interaction tree T. Let Dn+1, . . . , Dn+m be dry specifications that safely evolve memory and that correspond to Γn+1 . . . Γn+m. Further, let each Di be correctly implemented by an OS function fi with executable specification Ofi. Then for all n, the main function of P is safe for n steps with respect to some set of traces 𝒯, and for every trace t ∈ 𝒯, there exists some interaction tree T′ such that consume(T, T′, t).
Proof. By the combination of the soundness of VST with external functions
(Theorem 2), Lemma 2, and a proof relating our previous definition of safety to
the new definition.
This is our main result: by combining the results of the previous sections, we
obtain a soundness theorem down to the operating system’s implementation of
system calls, one that guarantees that the actual communication operations per-
formed by the program are always a prefix of the initial specification of allowed
operations. By instantiating the theorem with a set of verified system calls, we
obtain a strong correctness result for our VST-verified programs, such as:
Theorem 5. Let P be a program that uses the putchar and getchar system calls provided by CertiKOS, such as the one in Figure 4. Suppose P is proved correct with initial interaction tree T. Then for all n, the main function of P is safe for n steps with respect to some set of traces 𝒯, and for every trace t ∈ 𝒯, there exists some interaction tree T′ such that consume(T, T′, t).

7 From Syscall-Level to Hardware-Level Interactions
Thus far, we have assumed that the events in a program’s trace are exactly
the events described in the user-level interaction tree T . In practice, however,
the communication performed by the OS may differ from that observed by the
user. For example, like all operating systems, CertiKOS uses a kernel buffer of
finite size to store characters received from the serial device; if the buffer is
full, incoming characters are discarded without being read. To represent this
distinction, we distinguish between the user-visible events produced by system
calls, and external events, which are generated by the environment oracle and
recorded in the trace at the time that they occur. For the system call events
to be meaningful, they must correspond in some way to the external events,
but this correspondence may not be one-to-one. In the case of console I/O, each
character received by the serial device should be returned by getchar at most
once, and in the order they arrived, but characters may be dropped. This leads us
to the condition that the user events should be a subsequence of the environment
events, which is proved in CertiKOS.

Lemma 3. The getchar system call maintains the invariant that there is an injective map from each system call event with value v in the OS trace to an external event with value v earlier in the trace.
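As an illustration only (this toy model and its names are ours, not part of the CertiKOS proofs), a bounded buffer that drops arrivals when full yields exactly this subsequence invariant:

import Data.List (isSubsequenceOf)

-- A toy executable model of a bounded kernel buffer for console input.
data Event = Recv Char   -- external event: a character arrives at the device
           | Read        -- user event: a getchar system call

-- Returns (characters returned by getchar, characters that arrived).
run :: Int -> [Event] -> (String, String)
run cap = go []
  where
    go _ [] = ("", "")
    go buf (Recv c : es)
      | length buf < cap = let (us, xs) = go (buf ++ [c]) es in (us, c : xs)
      | otherwise        = let (us, xs) = go buf es in (us, c : xs)  -- dropped
    go []        (Read : es) = go [] es      -- empty buffer: nothing returned
    go (b : buf) (Read : es) = let (us, xs) = go buf es in (b : us, xs)

-- Lemma 3's invariant, in this model: user-visible characters form a
-- subsequence of the externally received ones.
prop_subsequence :: Int -> [Event] -> Bool
prop_subsequence cap es =
  let (us, xs) = run cap es in us `isSubsequenceOf` xs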

Corollary 2. Let P be a verified program as described in Theorem 4, in which getchar is the only system call performed. Then for all n, the main function of P is safe for n steps with respect to some set of traces 𝒯, and for every trace t ∈ 𝒯, there exists some interaction tree T′ such that consume(T, T′, t), and the events in t correspond to external events performed as described in Lemma 3.

Unlike Theorem 4, this corollary is specific to a particular system call, but it
gives a stronger correctness property: the events in the user-level interaction tree
are now interpreted in terms of actual bytes received by the OS, in the form of
external events. Note that Lemma 3 does not require that every external event
has a corresponding system call event; if the buffer fills up and characters are
dropped before a getchar call, then there will be external events that do not cor-
respond to anything in the interaction tree, and this is the intended semantics of
buffered communication without flow control. A similar corollary can be proved
for any set of system calls, but the precise correspondence between user events
and external events will depend on the particular system calls involved.
There is one more soundness theorem we might want to prove, asserting
that the combined system of program and operating system executes correctly
according to the assembly-level semantics of the OS. We should be able to obtain
this theorem by connecting Theorem 4 with the soundness theorem of CertiKOS,
which guarantees that the behavior of the operating system running a program
P refines the behavior of a system K
P consisting of the program along
with an abstract model of the operating system. However, this connection is
far from trivial: it involves lowering our soundness result from C to assembly
(using the correctness theorem of CompCert), modeling the switch from user to
kernel mode (including the semantics of the trap instruction), and considering
the effects of other OS features on program behavior (e.g., context switching). We
estimate that we have covered more than half of the distance between VST and
CertiKOS with our current result, but there is still work to be done to complete
the connection. We can now remove the OS’s implementation of each system call
from the trusted computing base; it remains to remove the OS entirely.

8 Related Work
The most comprehensive prior work connecting verified programs to the imple-
mentation of I/O operations is that of Férée et al. [5] in CakeML, a functional
language with I/O connected to a verified compiler and verified hardware. As in
our approach, the language is parameterized by functional specifications for ex-
ternal functions, backed by proofs at a lower level. However, while CakeML does
support a separation logic [9], it is not higher-order, so all of the components are
specified in the same basic style. Our approach could enable higher-order sepa-
ration logic reasoning about CakeML programs. Ironclad Apps [10] also includes
verified communicating code for user-level networking applications running on
the Verve operating system [21]. However, their network stack is implemented
outside of the operating system, so proofs about I/O operations are carried out
within the same framework as the programs that use the operations.
One major category of system calls is file I/O operations. The FSCQ file
system [2] is verified using Crash Hoare Logic, a separation logic which accounts
for possible crashes at any point in a program. File system assertions are similar
to the ordinary points-to assertions of separation logic, but may persist through
crashes while memory is reset. In Crash Hoare Logic, the implementation-level
model of the file state is the same as the user’s model, and the approach does
not obviously generalize to other forms of external communication.
Another related area is the extension of separation logic to distributed sys-
tems, which necessarily involves reasoning about communication with external
entities. The most closely related such logic is Aneris [14], which is built on
Iris, the inspiration for VST’s approach to ghost state. The adequacy theorem
of Aneris proves the connection between higher-order separation logic specifica-
tions of socket operations and a language that includes first-order operational
semantics for those functions. In our approach, this would correspond to directly
adding the “dry” specifications for each operation to the language semantics, and
building the correspondence proof for those particular operations into the sound-
ness theorem of the logic; our more generic style of soundness theorem would
make it easier to plug in new external calls. The bottom half of our approach—
showing that the language-level semantics of the operations are implemented by
an OS such as CertiKOS—could be applied to Aneris more or less as is. Another
interesting feature of Aneris is that the communication allowed on each socket
is specified by a user-provided protocol, an arbitrary separation logic predicate
on messages and resources. In our examples thus far, we have assumed that the
external world does not share any notion of resource with the program, and
so our external state only mentions the messages to be sent and received; how-
ever, the construction of Section 3 does allow the external state to have arbitrary
ghost-state structure, which we could use to define similarly expressive protocols.

9 Conclusion and Future Work
We have now seen how to connect programs verified using higher-order separa-
tion logic to external functions provided by a first-order verified system, effec-
tively importing the results of outside verification (e.g. OS verification) into our
separation logic. The approach consists of two halves: we first relate separation
logic specifications for the external functions to “dry” first-order specifications
on CompCert memories [15] and interaction trees [13], and then relate these dry
specifications to the system that implements the functions (CertiKOS in our
example). In the process, we interpret the C-level communication constraints in
terms of OS-level events that more accurately represent the communication that
occurs in the real world. Our approach works for any type of external commu-
nication, and allows users to extend the system with new external functions as
needed. Each new correspondence proof for an external function modularly ex-
tends the soundness theorem of VST, removing the separation-logic specification
of the function from the trusted computing base.
The combination of CompCert memories with interaction trees has served
as a robust specification interface between two quite different approaches to
verification: VST’s higher-order impredicative concurrent separation logic, and
CertiKOS’s certified concurrent abstraction layers. This strongly suggests that
the combination of CompCert memories and interaction trees can serve as a
lingua franca to interface with other verification systems for client programs
and for operating systems.

References
1. Appel, A.W., Dockins, R., Hobor, A., Beringer, L., Dodds, J., Stewart, G., Blazy, S., Leroy, X.: Program Logics for Certified Compilers. Cambridge University Press (2014), http://www.cambridge.org/de/academic/subjects/computer-science/programming-languages-and-applied-logic/program-logics-certified-compilers?format=HB
2. Chen, H., Ziegler, D., Chajed, T., Chlipala, A., Kaashoek, M.F., Zeldovich, N.:
Using Crash Hoare Logic for certifying the FSCQ file system. In: Proceedings of
the 25th Symposium on Operating Systems Principles. pp. 18–37. SOSP ’15, ACM,
New York, NY, USA (2015). https://doi.org/10.1145/2815400.2815402
3. Dinsdale-Young, T., Birkedal, L., Gardner, P., Parkinson, M.J., Yang, H.: Views:
compositional reasoning for concurrent programs. In: Giacobazzi, R., Cousot, R.
(eds.) The 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of
Programming Languages, POPL ’13, Rome, Italy - January 23 - 25, 2013. pp.
287–300. ACM (2013). https://doi.org/10.1145/2429069.2429104
4. Dinsdale-Young, T., Dodds, M., Gardner, P., Parkinson, M.J., Vafeiadis, V.: Con-
current abstract predicates. In: D’Hondt, T. (ed.) ECOOP 2010 - Object-Oriented
Programming, 24th European Conference, Maribor, Slovenia, June 21-25, 2010.
Proceedings. Lecture Notes in Computer Science, vol. 6183, pp. 504–528. Springer
(2010). https://doi.org/10.1007/978-3-642-14107-2_24
5. Férée, H., Pohjola, J.Å., Kumar, R., Owens, S., Myreen, M.O., Ho, S.: Program
verification in the presence of I/O - semantics, verified library routines, and verified
applications. In: Piskac, R., Rümmer, P. (eds.) Verified Software. Theories, Tools,
and Experiments - 10th International Conference, VSTTE 2018, Oxford, UK, July
18-19, 2018, Revised Selected Papers. Lecture Notes in Computer Science, vol.
11294, pp. 88–111. Springer (2018). https://doi.org/10.1007/978-3-030-03592-1_6
6. Gu, R., Koenig, J., Ramananandro, T., Shao, Z., Wu, X.N., Weng, S.C., Zhang,
H., Guo, Y.: Deep specifications and certified abstraction layers. In: Proceedings
of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Pro-
gramming Languages. pp. 595–608. POPL ’15, ACM, New York, NY, USA (2015).
https://doi.org/10.1145/2676726.2676975
7. Gu, R., Shao, Z., Chen, H., Wu, X.N., Kim, J., Sjöberg, V., Costanzo, D.: CertiKOS:
An extensible architecture for building certified concurrent OS kernels. In: 12th
USENIX Symposium on Operating Systems Design and Implementation, OSDI
2016, Savannah, GA, USA, November 2-4, 2016. pp. 653–669 (2016), https://www.
usenix.org/conference/osdi16/technical-sessions/presentation/gu
8. Gu, R., Shao, Z., Kim, J., Wu, X.N., Koenig, J., Sjöberg, V., Chen, H., Costanzo,
D., Ramananandro, T.: Certified concurrent abstraction layers. In: Proceedings
of the 39th ACM SIGPLAN Conference on Programming Language Design and
Implementation, PLDI 2018, Philadelphia, PA, USA, June 18-22, 2018. pp. 646–
661 (2018). https://doi.org/10.1145/3192366.3192381
9. Guéneau, A., Myreen, M.O., Kumar, R., Norrish, M.: Verified characteristic for-
mulae for CakeML. In: Yang, H. (ed.) Programming Languages and Systems. pp.
584–610. Springer Berlin Heidelberg, Berlin, Heidelberg (2017)
10. Hawblitzel, C., Howell, J., Lorch, J.R., Narayan, A., Parno, B., Zhang, D., Zill, B.:
Ironclad apps: End-to-end security via automated full-system verification. In: 11th
USENIX Symposium on Operating Systems Design and Implementation, OSDI
’14, Broomfield, CO, USA, October 6-8, 2014. pp. 165–181 (2014), https://www.
usenix.org/conference/osdi14/technical-sessions/presentation/hawblitzel
11. Jung, R., Krebbers, R., Birkedal, L., Dreyer, D.: Higher-order ghost state. In:
Proceedings of the 21st ACM SIGPLAN International Conference on Functional
Programming. pp. 256–269. ICFP 2016, ACM, New York, NY, USA (2016).
https://doi.org/10.1145/2951913.2951943
12. Klein, G., Elphinstone, K., Heiser, G., Andronick, J., Cock, D., Derrin, P., Elka-
duwe, D., Engelhardt, K., Kolanski, R., Norrish, M., Sewell, T., Tuch, H., Win-
wood, S.: seL4: Formal verification of an OS kernel. In: Proceedings of the ACM
SIGOPS 22nd Symposium on Operating Systems Principles. pp. 207–220. SOSP
’09, ACM, New York, NY, USA (2009). https://doi.org/10.1145/1629575.1629596
13. Koh, N., Li, Y., Li, Y., Xia, L.y., Beringer, L., Honoré, W., Mansky, W., Pierce,
B.C., Zdancewic, S.: From C to interaction trees: Specifying, verifying, and test-
ing a networked server. In: Proceedings of the 8th ACM SIGPLAN International
Conference on Certified Programs and Proofs. pp. 234–248. CPP 2019, ACM, New
York, NY, USA (2019). https://doi.org/10.1145/3293880.3294106
14. Krogh-Jespersen, M., Timany, A., Ohlenbusch, M.E., Birkedal, L.: Aneris: A
logic for node-local, modular reasoning of distributed systems (2019), https://iris-project.org/pdfs/2019-aneris-submission.pdf, unpublished draft
15. Leroy, X., Appel, A.W., Blazy, S., Stewart, G.: The CompCert memory model. In:
Appel, A.W. (ed.) Program Logics for Certified Compilers, chap. 32. Cambridge
University Press (2014)
16. Ley-Wild, R., Nanevski, A.: Subjective auxiliary state for coarse-grained concur-
rency. In: Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium
on Principles of Programming Languages. pp. 561–574. POPL ’13, ACM, New
York, NY, USA (2013). https://doi.org/10.1145/2429069.2429134
17. O’Hearn, P.W.: Resources, concurrency, and local reasoning. Theor. Comput. Sci.
375(1-3), 271–307 (Apr 2007). https://doi.org/10.1016/j.tcs.2006.12.035
18. Penninckx, W., Jacobs, B., Piessens, F.: Sound, modular and compositional ver-
ification of the input/output behavior of programs. In: Programming Languages
and Systems - 24th European Symposium on Programming, ESOP 2015, Held
as Part of the European Joint Conferences on Theory and Practice of Software,
ETAPS 2015, London, UK, April 11-18, 2015. Proceedings. pp. 158–182 (2015).
https://doi.org/10.1007/978-3-662-46669-8_7
19. Sergey, I., Nanevski, A., Banerjee, A.: Specifying and verifying concurrent algo-
rithms with histories and subjectivity. In: Vitek, J. (ed.) Proceedings of the 24th
European Symposium on Programming (ESOP 2015). Lecture Notes in Computer
Science, vol. 9032, pp. 333–358. Springer (2015). https://doi.org/10.1007/978-3-662-46669-8_14
20. Wang, Y., Wilke, P., Shao, Z.: An abstract stack based approach to verified com-
positional compilation to machine code. Proceedings of the ACM on Programming
Languages 3(POPL), 62 (2019)
21. Yang, J., Hawblitzel, C.: Safe to the last instruction: automated verifica-
tion of a type-safe operating system. In: Proceedings of the 2010 ACM SIG-
PLAN Conference on Programming Language Design and Implementation,
PLDI 2010, Toronto, Ontario, Canada, June 5-10, 2010. pp. 99–110 (2010).
https://doi.org/10.1145/1806596.1806610
Modular Inference of Linear Types for Multiplicity-Annotated Arrows

Kazutaka Matsuda

Graduate School of Information Sciences, Tohoku University, Sendai 980-8579, Japan
[email protected]

Abstract. Bernardy et al. [2018] proposed a linear type system λq→ as a core type system of Linear Haskell. In the system, linearity is represented
by annotated arrow types A →m B, where m denotes the multiplicity
of the argument. Thanks to this representation, existing non-linear code
typechecks as it is, and newly written linear code can be used with
existing non-linear code in many cases. However, little is known about
the type inference of λq→ . Although the Linear Haskell implementation
is equipped with type inference, its algorithm has not been formalized,
and the implementation often fails to infer principal types, especially for
higher-order functions. In this paper, based on OutsideIn(X) [Vytiniotis
et al., 2011], we propose an inference system for a rank 1 qualified-typed
variant of λq→ , which infers principal types. A technical challenge in this
new setting is to deal with ambiguous types inferred by naive qualified
typing. We address this ambiguity issue through quantifier elimination
and demonstrate the effectiveness of the approach with examples.

Keywords: Linear Types · Type Inference · Qualified Typing.

1 Introduction

Linearity is a fundamental concept in computation and has many applications. For example, if a variable is known to be used only once, it can be freely inlined without any performance regression [29]. In a similar manner, destructive updates are safe for such values without the risk of breaking referential transparency [32]. Moreover, linearity is useful for writing transformations on data that cannot be copied or discarded for various reasons, including reversible computation [19, 35] and quantum computation [2, 25]. Another interesting application of linearity is that it helps to bound the complexity of programs [1, 5, 13].
Linear type systems use types to enforce linearity. One way to design a
linear type system is based on the Curry-Howard isomorphism with linear logic. For
example, in Wadler [33]’s type system, functions are linear in the sense that their
arguments are used exactly once, and any exception to this must be marked by
the type operator (!). Such an approach is theoretically elegant but cumbersome
in programming; a program usually contains both linear and unrestricted code,
and many manipulations concerning (!) are required in the latter and around the
interface between the two. Thus, there have been several proposed approaches
for more practical linear type systems [7, 21, 24, 28].
Among these approaches, a system called λq→ , the core type system of Linear
Haskell, stands out for its ability to have linear code in large unrestricted code
bases [7]. With it, existing unrestricted code in Haskell typechecks in Linear
Haskell without modification, and if one desires, some of the unrestricted code
can be replaced with linear code, again without any special programming effort.
For example, one can use the function append in an unrestricted context as
λx.tail (append x x), regardless of whether append is a linear or unrestricted
function. This is made possible by their representation of linearity. Specifically,
they annotate a function type with its argument’s multiplicity (“linearity via
arrows” [7]) as A →m B, where m = 1 means that the function of the type
uses its argument linearly, and m = ω means that there is no restriction in
the use of the argument, which includes all non-linear standard Haskell code.
In this system, linear functions can be used in an unrestricted context if their
arguments are unrestricted. Thus, there is no problem in using append : List A →1
List A →1 List A as above, provided that x is unrestricted. This promotion of
linear expressions to unrestricted ones is difficult in other approaches [21, 24, 28]
(at least in the absence of bounded kind-polymorphism), where linearity is a
property of a type (called “linearity via kinds” in [7]).
However, as far as we are aware, little is known about type inference for
λq→ . It is true that Linear Haskell is implemented as a fork1 of the Glasgow
Haskell Compiler (GHC), which of course comes with type inference. However,
the algorithm has not been formalized and has limitations due to a lack of proper
handling of multiplicity constraints. Indeed, Linear Haskell gives up handling
complex constraints on multiplicities such as those with multiplications p · q; as
a result, Linear Haskell sometimes fails to infer principal types, especially for
higher-order functions.2 This limits the reusability of code. For example, Linear
Haskell cannot infer an appropriate type for function composition to allow it to
compose both linear and unrestricted functions.
A classical approach that combines separate constraint solving, which works well with the usual unification-based typing, with principal typing (for a rank-1 fragment) is qualified typing [15]. In qualified typing, constraints on multiplicities are collected, and then a type is qualified with them to obtain a principal type.
Complex multiplicities are not a problem in unification as they are handled by a
constraint solver. For example, consider app = λf.λx.f x. Suppose that f has
type a →p b, and x has type a (here we focus only on multiplicities). Let us write
the multiplicities of f and x as pf and px , respectively. Since x is passed to f ,
there is a constraint that the multiplicity px of x must be ω if the multiplicity p
of f’s argument also is. In other words, px must be no less than p, which is
represented by inequality p ≤ px under the ordering 1 ≤ ω. (We could represent
the constraint as an equality px = p · px , but using inequality is simpler here.)
1 https://github.com/tweag/ghc/tree/linear-types
2 Confirmed for commit 1c80dcb424e1401f32bf7436290dd698c739d906 on May 14, 2019.
For the multiplicity pf of f , there is no restriction because f is used exactly once;


linear use is always legitimate even when pf = ω. As a result, we obtain the
inferred type ∀p pf px a b. p ≤ px ⇒ (a →p b) →pf a →px b for app. This type is
a principal one; intuitively, this is because only the constraints needed for typing λf.λx.f x are gathered. Having separate constraint solving phases itself
is rather common in the context of linear typing [3, 4, 11, 12, 14, 23, 24, 29, 34].
Qualified typing makes the constraint solving phase local and gives the principal
typing property that makes typing modular. In particular, in the context of
linearity via kinds, qualified typing is proven to be effective [11, 24].
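For comparison, the same function in GHC's -XLinearTypes notation (a sketch of ours, assuming GHC 9.0 or later; the module and pragma choices are our own) cannot carry the constraint p ≤ px, so x's own use and f's argument multiplicity must be conflated into a single variable m:

{-# LANGUAGE LinearTypes, ExplicitForAll, KindSignatures #-}
module App where

import GHC.Types (Multiplicity)

-- GHC reuses the same m for f's arrow and for x's own use, whereas the
-- qualified type above only demands p <= px.
app :: forall (m :: Multiplicity) (n :: Multiplicity) a b.
       (a %m -> b) %n -> a %m -> b
app f x = f x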
As qualified typing is useful in the context of linearity via kinds, one may
expect that it also works well for linearity via arrows such as λq→ . However, naive
qualified typing turns out to be impractical for λq→ because it tends to infer
ambiguous types [15, 27]. As a demonstration, consider a slightly different version
of app defined as app′ = λf.λx.app f x. Standard qualified typing [15, 31] infers the type

∀q qf qx pf px a b. (q ≤ qx ∧ qf ≤ pf ∧ qx ≤ px) ⇒ (a →q b) →pf a →px b

by the following steps:


– The polymorphic type of app is instantiated to (a →q b) →qf a →qx b and
yields a constraint q ≤ qx (again we focus only on multiplicity constraints).
– Since f is used as the first argument of app, f must have type a →q b. Also,
since the multiplicity of app’s first argument is qf , there is a restriction on
the multiplicity of f , say pf , that qf ≤ pf .
– Similarly, since x is used as the second argument of app, x must have type a,
and there is a constraint on the multiplicity of x, say px , that qx ≤ px .
This inference is unsatisfactory, as the inferred type leaks internal details and
is ambiguous [15, 27] in the sense that one cannot determine qf and qx from
an instantiation of (a →q b) →pf a →px b. Due to this ambiguity, the types of
app and app′ are not judged as equivalent; in fact, the standard qualified typing algorithms [15, 31] reject app′ : ∀p pf px a b. p ≤ px ⇒ (a →p b) →pf a →px b. We
conjecture that the issue of inferring ambiguous types is intrinsic to linearity via
arrows because of the separation of multiplicities and types, unlike the case of
linearity via kinds, where multiplicities are always associated with types. Simple
solutions such as rejecting ambiguous types are not desirable as this case appears
very often. Defaulting ambiguous variables (such as qf and qx ) to 1 or ω is not a
solution either because it loses principality in general.
In this paper, we propose a type inference method for a rank 1 qualified-typed
variant of λq→ , in which the ambiguity issue is addressed without compromising
principality. Our type inference system is built on top of OutsideIn(X) [31],
an inference system for qualified types used in GHC, which can handle local
assumptions to support let, existential types, and GADTs. An advantage of using
OutsideIn(X) is that it is parameterized over theory X of constraints. Thus,
applying it to linear typing boils down to choosing an appropriate X. We choose
X carefully so that the representation of constraints is closed under quantifier
elimination, which is the key to addressing the ambiguity issue. Specifically, in this paper:

– We present a qualified-typed variant of a rank-1 fragment of λq→ without local definitions, in which manipulation of multiplicities is separated from
the standard unification-based typing (Sect. 2).
– We give an inference method for the system based on gathering constraints
and solving them afterward (Sect. 3). This step is mostly standard, except
that we solve multiplicity constraints in time polynomial in their sizes.
– We address the ambiguity issue by quantifier elimination under the assumption
that multiplicities do not affect runtime behavior (Sect. 4).
– We extend our technique to local assumptions (Sect. 5), which enables let
and GADTs, by showing that the disambiguation in Sect. 4 is compatible
with OutsideIn(X).
– We report experimental results using our proof-of-concept implementation
(Sect. 6). The experiments show that the system can infer unambiguous
principal types for selected functions from Haskell’s Prelude, and performs
well with acceptable overhead.

Finally, we discuss related work (Sect. 7) and then conclude the paper (Sect. 8).
The prototype implementation is available as part of the reversible programming system Sparcl at https://bitbucket.org/kztk/partially-reversible-lang-impl/. Due to space limitations, we omit some proofs from this paper; they can be found in the full version [20].

2 Qualified-Typed Variant of λq→

In this section, we introduce a qualified-typed [15] variant of λq→ [7] for its rank-1 fragment, on which we base our type inference. Notable differences from the original λq→ include: (1) multiplicity abstractions and multiplicity applications are implicit (as type abstractions and type applications), (2) this variant uses qualified typing [15], (3) conditions on multiplicities are inequality-based [6], which gives better handling of multiplicity variables, and (4) local definitions are excluded, as we postpone their discussion to Sect. 5 due to the issues they raise in the handling of local assumptions in qualified typing [31].

2.1 Syntax of Programs

Programs and expressions, which will be typechecked, are given below.

prog ::= bind1; . . . ; bindn
bind ::= f = e | f : A = e
e ::= x | λx.e | e1 e2 | C ē | case e0 of {Ci x̄i → ei}i

A program is a sequence of bindings with or without type annotations, where


bound variables can appear in following bindings. As mentioned at the beginning
A, B ::= ∀p̄ā.Q ⇒ τ (polytypes)
σ, τ ::= a | D μ̄ τ̄ | σ →μ τ (monotypes)
μ ::= p | 1 | ω (multiplicities)
Q ::= ⋀i φi (constraints)
φ ::= M ≤ M′ (predicates)
M, N ::= ∏i μi (multiplications)

Fig. 1. Types and related notions: a and p are type and multiplicity variables, respectively, and D represents a type constructor.

of this section, we shall postpone the discussion of local bindings (i.e., let) to Sect. 5. Expressions consist of variables x, applications e1 e2, λ-abstractions λx.e, constructor applications C ē, and (shallow) pattern matching case e0 of {Ci x̄i → ei}i. For simplicity, we assume that constructors are fully applied and patterns are shallow. As usual, patterns Ci x̄i must be linear in the sense that the variables in x̄i are pairwise distinct. Programs are assumed to be appropriately α-renamed so that variables newly introduced by λ and patterns are always fresh. We do not require the patterns of a case expression to be exhaustive or non-overlapping, following the original λq→ [7]; the linearity in λq→ concerns only successful computations.
Unlike the original λq→ , we do not annotate λ and case with the multiplicity of
the argument and the scrutinee, respectively.
Constructors play an important role in λq→ . As we will see later, they can be
used to witness unrestrictedness, similarly to ! of !e in a linear type system [33].

2.2 Types
Types and related notations are defined in Fig. 1. Types are separated into monotypes and polytypes (or type schemes). Monotypes consist of (rigid) type variables a, datatypes D μ̄ τ̄, and multiplicity-annotated function types τ1 →μ τ2. Here, a multiplicity μ is either 1 (linear), ω (unrestricted), or a (rigid) multiplicity variable p. Polytypes have the form ∀p̄ā.Q ⇒ τ, where Q is a constraint that is a conjunction of predicates. A predicate φ has the form M ≤ M′, where M and M′ are multiplications of multiplicities. We shall sometimes treat Q as a set of predicates, which means that we shall rewrite Q according to contexts by the idempotent commutative monoid laws of ∧. We call both multiplicity (p) and type (a) variables type-level variables, and write ftv(t̄) for the set of free type-level variables in syntactic objects (such as types and constraints) t̄.
The relation (≤) and operator (·) in predicates denote the corresponding relation and operator on {1, ω}, respectively. On {1, ω}, (≤) is defined as the reflexive closure of 1 ≤ ω; note that ({1, ω}, ≤) forms a total order. Multiplication (·) on {1, ω} is defined by

1 · m = m · 1 = m        ω · m = m · ω = ω.

For simplicity, we shall sometimes omit (·) and write m1 m2 for m1 · m2. Note that, for m1, m2 ∈ {1, ω}, m1 · m2 is the least upper bound of m1 and m2 with respect to ≤. As a result, m1 · m2 ≤ m holds if and only if (m1 ≤ m) ∧ (m2 ≤ m) holds; we will use this property for efficient handling of constraints (Sect. 3.2).
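For reference, the concrete domain {1, ω} and its operations transcribe directly into code (a sketch of ours, not part of the formal development):

data Mult = One | Omega deriving (Eq, Show)

leq :: Mult -> Mult -> Bool          -- the order 1 <= omega
leq One   _ = True
leq Omega m = m == Omega

mult :: Mult -> Mult -> Mult         -- (.), the least upper bound w.r.t. leq
mult One m   = m
mult m   One = m
mult _   _   = Omega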
We assume a fixed set of constructors given beforehand. Each constructor is assigned a type of the form ∀p̄ā. τ1 →μ1 · · · →μn−1 τn →μn D p̄ ā where each τi and μi do not contain free type-level variables other than {p̄ā}, i.e., ⋃i ftv(τi, μi) ⊆ {p̄ā}. For simplicity, we write the above type as ∀p̄ā. τ̄ →μ̄ D p̄ ā.
We assume that types are well-kinded, which effectively means that D is applied
to the same numbers of multiplicity arguments and type arguments among the
constructor types. Usually, it suffices to use constructors of linear function types
as below because they can be used in both linear and unrestricted code.
(−, −) : ∀a b. a →1 b →1 a ⊗ b
Nil : ∀a. List a Cons : ∀a. a →1 List a →1 List a
In general, constructors can encapsulate arguments’ multiplicities as below,
which is useful when a function returns both linear and unrestricted results.
MkUn : ∀a. a →ω Un a MkMany : ∀p a. a →p Many p a
For example, a function that reads a value from a mutable array at a given
index can be given as a primitive of type readMArray : ∀a. MArray a →1 Int →ω
(MArray a ⊗ Un a) [7]. Multiplicity-parameterized constructors become useful
when the multiplicity of contents can vary. For example, the type IOL p a with
the constructor MkIOL : (World →1 (World ⊗ Many p a)) →1 IOL p a can
represent the IO monad [7] with methods return : ∀p a. a →p IOL p a and
(>>=) : ∀p q a b. IOL p a →1 (a →p IOL q b) →1 IOL q b.

2.3 Typing Rules


Our type system uses two sorts of environments. A typing environment maps variables to polytypes (as usual in non-linear calculi), and a multiplicity environment maps variables to multiplications of multiplicities. This separation of the two will be convenient when we discuss type inference. As usual, we write x1 : A1, . . . , xn : An instead of {x1 ↦ A1, . . . , xn ↦ An} for typing environments. For multiplicity environments, we use multiset-like notation as x1^M1, . . . , xn^Mn.
We use the following operations on multiplicity environments:3

(Δ1 + Δ2)(x) = ω       if x ∈ dom(Δ1) ∩ dom(Δ2)
(Δ1 + Δ2)(x) = Δi(x)   if x ∈ dom(Δi) \ dom(Δj) (i ≠ j ∈ {1, 2})

(μΔ)(x) = μ · Δ(x)

(Δ1 ⊔ Δ2)(x) = Δ1(x) · Δ2(x)   if x ∈ dom(Δ1) ∩ dom(Δ2)
(Δ1 ⊔ Δ2)(x) = ω               if x ∈ dom(Δi) \ dom(Δj) (i ≠ j ∈ {1, 2})
3 In these definitions, we implicitly consider multiplicity 0 and regard Δ(x) = 0 if x ∉ dom(Δ). It is natural that 0 + m = m + 0 = m. With 0, multiplication (·), which is extended as 0 · m = m · 0 = 0, no longer computes the least upper bound. Therefore, we use ⊔ for the last definition; in fact, the definition corresponds to the pointwise computation of Δ1(x) ⊔ Δ2(x), where ≤ is extended as 0 ≤ ω but not 0 ≤ 1. This treatment of 0 coincides with that in the Linear Haskell proposal [26].
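These operations can be sketched over finite maps as follows (our code, reusing Mult and mult from the sketch above; absent variables play the role of multiplicity 0, as in footnote 3):

import qualified Data.Map as M

type MEnv = M.Map String Mult

plus :: MEnv -> MEnv -> MEnv            -- Δ1 + Δ2
plus = M.unionWith (\_ _ -> Omega)      -- x in both: used more than once, hence ω

scale :: Mult -> MEnv -> MEnv           -- μΔ
scale mu = M.map (mult mu)

branch :: MEnv -> MEnv -> MEnv          -- Δ1 ⊔ Δ2, for case branches
branch d1 d2 = M.unionWith mult (pad d1 d2) (pad d2 d1)
  where
    -- a variable absent from one branch counts as ω in the result
    pad d other = d `M.union` M.map (const Omega) (other `M.difference` d)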
Q; Γ; Δ′ ⊢ e : τ′    Q |= Δ = Δ′    Q |= τ ∼ τ′
──────────────────────────────────────────── Eq
Q; Γ; Δ ⊢ e : τ

Γ(x) = ∀p̄ā.Q′ ⇒ τ    Q |= Q′[p̄ ↦ μ̄]    Q |= x^1 ≤ Δ
──────────────────────────────────────────── Var
Q; Γ; Δ ⊢ x : τ[p̄ ↦ μ̄, ā ↦ τ̄]

Q; Γ, x : σ; Δ, x^μ ⊢ e : τ
──────────────────────────────────────────── Abs
Q; Γ; Δ ⊢ λx.e : σ →μ τ

Q; Γ; Δ1 ⊢ e1 : σ →μ τ    Q; Γ; Δ2 ⊢ e2 : σ
──────────────────────────────────────────── App
Q; Γ; Δ1 + μΔ2 ⊢ e1 e2 : τ

C : ∀p̄ā. τ̄ →ν̄ D p̄ ā    {Q; Γ; Δi ⊢ ei : τi[p̄ ↦ μ̄, ā ↦ σ̄]}i
──────────────────────────────────────────── Con
Q; Γ; ωΔ0 + Σi νi[p̄ ↦ μ̄]Δi ⊢ C ē : D μ̄ σ̄

Q; Γ; Δ0 ⊢ e0 : D μ̄ σ̄
{Ci : ∀p̄ā. τ̄i →ν̄i D p̄ ā    Q; Γ, x̄i : τ̄i[p̄ ↦ μ̄, ā ↦ σ̄]; Δi, x̄i^(μ0·ν̄i[p̄ ↦ μ̄]) ⊢ ei : τ′}i
──────────────────────────────────────────── Case
Q; Γ; μ0Δ0 + ⊔i Δi ⊢ case e0 of {Ci x̄i → ei}i : τ′

Fig. 2. Typing relation for expressions

Intuitively, Δ(x) represents the number of uses of x. So, in the definition of Δ1 + Δ2, we have (Δ1 + Δ2)(x) = ω if x ∈ dom(Δ1) ∩ dom(Δ2) because this condition means that x is used in two places. Operation Δ1 ⊔ Δ2 is used for case branches. Suppose that a branch e1 uses variables as Δ1 and another branch e2 uses variables as Δ2. Then, putting the branches together, variables are used as Δ1 ⊔ Δ2. The definition says that x is considered to be used linearly in the two branches put together if and only if both branches use x linearly, where non-linear use includes unrestricted use (Δi(x) = ω) and non-use (x ∉ dom(Δi)).
We write Q |= Q′ if Q logically entails Q′. That is, for any valuation θ of multiplicity variables with θ(p) ∈ {1, ω}, Q′θ holds if Qθ does. For example, we have p ≤ r ∧ r ≤ q |= p ≤ q. We extend the notation to multiplicity environments and write Q |= Δ1 ≤ Δ2 if dom(Δ1) ⊆ dom(Δ2) and Q |= ⋀x∈dom(Δ1) Δ1(x) ≤ Δ2(x) ∧ ⋀x∈dom(Δ2)\dom(Δ1) ω ≤ Δ2(x) hold. We also write Q |= Δ1 = Δ2 if both Q |= Δ1 ≤ Δ2 and Q |= Δ2 ≤ Δ1 hold.
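For concrete, variable-free environments, this ordering can be checked pointwise (continuing the sketches above; leqEnv is our name):

import qualified Data.Map as M
import qualified Data.Set as S

leqEnv :: MEnv -> MEnv -> Bool
leqEnv d1 d2 =
  M.keysSet d1 `S.isSubsetOf` M.keysSet d2        -- dom(Δ1) ⊆ dom(Δ2)
  && and [ case M.lookup x d1 of
             Just m1 -> m1 `leq` m2               -- Δ1(x) ≤ Δ2(x)
             Nothing -> m2 == Omega               -- ω ≤ Δ2(x) for x ∉ dom(Δ1)
         | (x, m2) <- M.toList d2 ]

We then have the following properties.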


Lemma 1. Suppose Q |= Δ ≤ Δ′ and Q |= Δ = Δ1 + Δ2. Then, there are some Δ′1 and Δ′2 such that Q |= Δ′ = Δ′1 + Δ′2, Q |= Δ1 ≤ Δ′1 and Q |= Δ2 ≤ Δ′2.

Lemma 2. Q |= μΔ ≤ Δ′ implies Q |= Δ ≤ Δ′.

Lemma 3. Q |= Δ1 ⊔ Δ2 ≤ Δ′ implies Q |= Δ1 ≤ Δ′ and Q |= Δ2 ≤ Δ′.
Constraints Q affect type equality; for example, under Q = (p ≤ q ∧ q ≤ p), σ →p τ and σ →q τ become equivalent. Formally, we write Q |= τ ∼ τ′ if τθ = τ′θ for any valuation θ of multiplicity variables that makes Qθ true.
Now, we are ready to define the typing judgment for expressions, Q; Γ; Δ ⊢ e : τ, which reads that under assumption Q, typing environment Γ, and multiplicity environment Δ, expression e has monotype τ, by the typing rules in Fig. 2. Here, we assume dom(Δ) ⊆ dom(Γ). Having x ∈ dom(Γ) \ dom(Δ) means that the multiplicity of x is essentially 0 in e.
Rule Eq says that we can replace τ and Δ with equivalent ones in typing.
──────── Empty
Γ ⊢ ε

Q; Γ; Δ ⊢ e : τ    p̄ā = ftv(Q, τ)    Γ, f : ∀p̄ā.Q ⇒ τ ⊢ prog
─────────────────────────────────────────────────────────── Bind
Γ ⊢ f = e; prog

Q; Γ; Δ ⊢ e : τ    p̄ā = ftv(Q, τ)    Γ, f : ∀p̄ā.Q ⇒ τ ⊢ prog
─────────────────────────────────────────────────────────── BindA
Γ ⊢ f : (∀p̄ā.Q ⇒ τ) = e; prog

Fig. 3. Typing rules for programs

Rule Var says that x is used once in a variable expression x, but it is safe to regard that the expression uses x more than once and uses other variables ω times. At the same time, the type ∀p̄ā.Q′ ⇒ τ of x is instantiated to τ[p̄ ↦ μ̄, ā ↦ σ̄], yielding constraints Q′[p̄ ↦ μ̄], which must be entailed by Q.
Rule Abs says that λx.e has type σ →μ τ if e has type τ , assuming that
the use of x in e is μ. Unlike the original λq→ [7], in our system, multiplicity
annotations on arrows must be μ, i.e., 1, ω, or a multiplicity variable, instead of
M . This does not limit the expressiveness because such general arrow types can
be represented by type σ →p τ with constraints p ≤ M ∧ M ≤ p.
Rule App sketches an important principle in λq→ ; when an expression with
variable use Δ is used μ-many times, the variable use in the expression becomes
μΔ. Thus, since we pass e2 (with variable use Δ2 ) to e1 , where e1 uses the
argument μ-many times as described in its type σ →μ τ , the use of variables in
e2 of e1 e2 becomes μΔ2 . For example, for (λy.42) x, x is considered to be used
ω times because (λy.42) has type σ →ω Int for any σ.
Rule Con is nothing but a combination of Var and App. The ωΔ0 part is
only useful when C is nullary; otherwise, we can weaken Δ at leaves.
Rule Case is the most complicated rule in this type system. In this rule, μ0
represents how many times the scrutinee e0 is used in the case. If μ0 = ω, the
pattern bound variables can be used unrestrictedly, and if μ0 = 1, the pattern
bound variables can be used according to the multiplicities of the arguments of the
constructor.4 Thus, in the ith branch, variables in x̄i can be used as μ0·ν̄i[p̄ ↦ μ̄], where ν̄i[p̄ ↦ μ̄] represents the multiplicities of the arguments of the constructor Ci. Other than x̄i, each branch body ei can contain free variables used as Δi. Thus, the uses of free variables in the whole branch bodies are summarized as ⊔i Δi. Recall that the case uses the scrutinee μ0 times; thus, the whole uses of variables are estimated as μ0Δ0 + ⊔i Δi.
Then, we define the typing judgment for programs, Γ ⊢ prog, which reads that program prog is well-typed under Γ, by the typing rules in Fig. 3. At this place,
the rules Bind and BindA have no significant differences; their difference will be
clear when we discuss type inference. In the rules Bind and BindA, we assumed
that Γ contains no free type-level variables. Therefore, we can safely generalize
all free type-level variables in Q and τ . We do not check the use Δ in both rules
4 This behavior, inherited from λq→ [7], implies the isomorphism !(A ⊗ B) ≡ !A ⊗ !B, which is not a theorem in the standard linear logic. The isomorphism intuitively means that unrestricted products can (only) be constructed from unrestricted components, as commonly adopted in linearity-via-kind approaches [11, 21, 24, 28, 29].
as bound variables are assumed to be used arbitrarily many times in the rest of the program; that is, the multiplicity of a bound variable is ω and its body uses variables as ωΔ, which maps x ∈ dom(Δ) to ω and has no free type-level variables.

2.4 Metatheories
Lemma 4 is the standard weakening property. Lemma 5 says that we can replace Q with a stronger one, Lemma 6 says that we can replace Δ with a greater one, and Lemma 7 says that we can substitute type-level variables in a term-in-context without violating typeability. These lemmas state some sort of weakening, and the last three lemmas clarify the goal of our inference system discussed in Sect. 3.

Lemma 4. Q; Γ; Δ ⊢ e : τ implies Q; Γ, x : A; Δ ⊢ e : τ.

Lemma 5. Q; Γ; Δ ⊢ e : τ and Q′ |= Q implies Q′; Γ; Δ ⊢ e : τ.

Lemma 6. Q; Γ; Δ ⊢ e : τ and Q |= Δ ≤ Δ′ implies Q; Γ; Δ′ ⊢ e : τ.

Lemma 7. Q; Γ; Δ ⊢ e : τ implies Qθ; Γθ; Δθ ⊢ e : τθ.

We have the following form of the substitution lemma:

Lemma 8 (Substitution). Suppose Q0; Γ, x̄ : σ̄; Δ0, x̄^μ̄ ⊢ e : τ, and Qi; Γ; Δi ⊢ ei : σi for each i. Then, Q0 ∧ ⋀i Qi; Γ; Δ0 + Σi μiΔi ⊢ e[x̄ ↦ ē] : τ.



Subject Reduction. We show the subject reduction property for a simple call-by-name semantics. Consider the standard small-step call-by-name relation e −→ e′ with the following β-reduction rules (we omit the congruence rules):

(λx.e1) e2 −→ e1[x ↦ e2]        case Cj ēj of {Ci x̄i → ei}i −→ ej[x̄j ↦ ēj]

Then, by Lemma 8, we have the following subject reduction property:

Lemma 9 (Subject Reduction). Q; Γ; Δ ⊢ e : τ and e −→ e′ implies Q; Γ; Δ ⊢ e′ : τ.

Lemma 9 holds even for the call-by-value reduction, though with a caveat.
For a program f1 = e1 ; . . . ; fn = en , it can happen that some ei is typed
only under unsatisfiable (i.e., conflicting) Qi . As conflicting Qi means that ei
is essentially ill-typed, evaluating ei may not be safe. However, the standard
call-by-value strategy evaluates ei , even when fi is not used at all and thus the
type system does not reject this unsatisfiability. This issue can be addressed
by the standard witness-passing transformation [15] that converts programs so
that Q ⇒ τ becomes WQ → τ , where WQ represents a set of witnesses of Q.
Nevertheless, it would be reasonable to reject conflicting constraints locally.
We then state the correspondence with the original system [7] (assuming the
modification [6] for the variable case5) to show that the qualified-typed version
5 In the premise of Var, the original [7] uses ∃Δ′. Δ = x^1 + ωΔ′, which is modified to x^1 ≤ Δ in [6]. The difference between the two becomes clear when Δ(x) = p, for which the former one does not hold, as we are not able to choose Δ′ depending on p.
captures linearity as the original does. While the original system assumes call-by-need evaluation, Lemma 9 could be lifted to that case.

Theorem 1. If ⊤; Γ; Δ ⊢ e : τ where Γ contains only monotypes, then e is also well-typed in the original λq→ under some environment.

The main reason for the monotype restriction is that our polytypes are strictly more expressive than their (rank-1) polytypes. This extra expressiveness comes from predicates of the form · · · ≤ M · M′. Indeed, f = λx.case x of {MkMany y → (y, y)} has type ∀p q a. ω ≤ p · q ⇒ Many p a →q a ⊗ a in our system, while it has three incomparable types in the original λq→.

3 Type Inference
In this section, we give a type inference method for the type system in the
previous section. Following [31, Section 3], we adopt the standard two-phase
approach; we first gather constraints on types and then solve them. As mentioned
in Sect. 1, the inference system described here has the issue of ambiguity, which
will be addressed in Sect. 4.

3.1 Inference Algorithm


We first extend types τ and multiplicities μ to include unification variables.

τ ::= · · · | α        μ ::= · · · | π

We call α/π a unification type/multiplicity variable, which will be substituted by a concrete type/multiplicity (including rigid variables) during the inference. Similarly to ftv(t̄), we write fuv(t̄) for the set of unification variables (of both sorts) in t̄, where each ti ranges over any syntactic element (such as τ, Q, Γ, and Δ).
Besides Q, the algorithm will generate equality constraints τ ∼ τ′. Formally, the sets of generated constraints C and generated predicates ψ are given by

ψ ::= φ | τ ∼ τ′        C ::= ⋀i ψi

Then, we define the type inference judgment for expressions, Γ ⊢ e : τ ; Δ; C, which reads that, given Γ and e, the type τ is inferred together with variable use Δ and constraints C, by the rules in Fig. 4. Note that Δ is also synthesized, as well as τ and C, in this step. This difference in the treatment of Γ and Δ is why we separate multiplicity environments Δ from typing environments Γ.
Gathered constraints are solved when we process top-level bindings. Figure 5 defines the type inference judgment for programs, Γ ⊢ prog, which reads that the inference finds prog well-typed under Γ. In the rules, manipulation of constraints is done by the simplification judgment Q ⊢simp C ⇝ Q′; θ, which simplifies C under the assumption Q into the pair (Q′, θ) of a residual constraint Q′ and a substitution θ for unification variables, where (Q′, θ) is expected to be equivalent
Γ(x) = ∀p̄ā.Q ⇒ τ    ᾱ, π̄ : fresh
────────────────────────────────────────────
Γ ⊢ x : τ[p̄ ↦ π̄, ā ↦ ᾱ] ; x^1 ; Q[p̄ ↦ π̄]

Γ, x : α ⊢ e : τ ; Δ, x^M ; C    α, π : fresh
────────────────────────────────────────────
Γ ⊢ λx.e : α →π τ ; Δ ; C ∧ M ≤ π

Γ ⊢ e1 : τ1 ; Δ1 ; C1    Γ ⊢ e2 : τ2 ; Δ2 ; C2    β, π : fresh
────────────────────────────────────────────
Γ ⊢ e1 e2 : β ; Δ1 + πΔ2 ; C1 ∧ C2 ∧ τ1 ∼ (τ2 →π β)

C : ∀p̄ā. σ̄ →ν̄ D p̄ ā    {Γ ⊢ ei : τi ; Δi ; Ci}i    ᾱ, π̄ : fresh
────────────────────────────────────────────
Γ ⊢ C ē : D π̄ ᾱ ; Σi νi[p̄ ↦ π̄]Δi ; ⋀i (Ci ∧ τi ∼ σi[p̄ ↦ π̄, ā ↦ ᾱ])

Γ ⊢ e0 : τ0 ; Δ0 ; C0    π0, π̄i, ᾱi, β : fresh
{Ci : ∀p̄ā. τ̄i →ν̄i D p̄ ā    Γ, x̄i : τ̄i[p̄ ↦ π̄i, ā ↦ ᾱi] ⊢ ei : τ′i ; Δi, x̄i^M̄i ; C′i}i
C′ = C0 ∧ ⋀i (C′i ∧ β ∼ τ′i ∧ (τ0 ∼ D π̄i ᾱi) ∧ ⋀j Mij ≤ π0·νij[p̄ ↦ π̄i])
────────────────────────────────────────────
Γ ⊢ case e0 of {Ci x̄i → ei}i : β ; π0Δ0 + ⊔i Δi ; C′

Fig. 4. Type inference rules for expressions

────────
Γ ⊢ ε

Γ ⊢ e : τ ; Δ; C    ⊤ ⊢simp C ⇝ Q; θ    {π̄ᾱ} = fuv(Q, τθ)    p̄, ā : fresh    Γ, f : ∀p̄ā.(Q ⇒ τθ)[ᾱ ↦ ā, π̄ ↦ p̄] ⊢ prog
────────────────────────────────────────────
Γ ⊢ f = e; prog

Γ ⊢ e : σ ; Δ; C    Q ⊢simp C ∧ τ ∼ σ ⇝ ⊤; θ    Γ, f : ∀p̄ā.Q ⇒ τ ⊢ prog
────────────────────────────────────────────
Γ ⊢ f : (∀p̄ā.Q ⇒ τ) = e; prog

Fig. 5. Type inference rules for programs

in some sense to C under the assumption Q. The idea underlying our simplification is to solve the type equality constraints in C as much as possible and then to remove predicates that are implied by Q. Rules S-Fun, S-Data, S-Uni, and S-Triv are responsible for the former; they decompose type equality constraints and yield substitutions once either of the sides becomes a unification variable. Rules S-Entail and S-Rem are responsible for the latter; they remove predicates implied by Q and then return the residual constraints. Rule S-Entail checks Q |= φ; a concrete method for this check will be discussed in Sect. 3.2.

Example 1 (app). Let us illustrate how the system infers a type for app =
λf.λx.f x. We have the following derivation for its body λf.λx.f x:

f : αf ⊢ f : αf ; f^1 ; ⊤        x : αx ⊢ x : αx ; x^1 ; ⊤
f : αf, x : αx ⊢ f x : β ; f^1, x^π ; αf ∼ (αx →π β)
f : αf ⊢ λx.f x : αx →πx β ; f^1 ; αf ∼ (αx →π β) ∧ πx ≤ π
⊢ λf.λx.f x : αf →πf αx →πx β ; ∅ ; αf ∼ (αx →π β) ∧ πx ≤ π ∧ 1 ≤ πf

The highlights in the above derivation are:


– In the last two steps, f is assigned to type αf and multiplicity πf , and x is
assigned to type αx and multiplicity πx .
Q ⊢simp σ ∼ σ′ ∧ μ ≤ μ′ ∧ μ′ ≤ μ ∧ τ ∼ τ′ ∧ C ⇝ Q′; θ
──────────────────────────────────────────── S-Fun
Q ⊢simp (σ →μ τ) ∼ (σ′ →μ′ τ′) ∧ C ⇝ Q′; θ

Q ⊢simp μ̄ ≤ μ̄′ ∧ μ̄′ ≤ μ̄ ∧ σ̄ ∼ σ̄′ ∧ C ⇝ Q′; θ
──────────────────────────────────────────── S-Data
Q ⊢simp (D μ̄ σ̄) ∼ (D μ̄′ σ̄′) ∧ C ⇝ Q′; θ

α ∉ fuv(τ)    Q ⊢simp C[α ↦ τ] ⇝ Q′; θ
──────────────────────────────────────────── S-Uni
Q ⊢simp α ∼ τ ∧ C ⇝ Q′; θ ∘ [α ↦ τ]

Q ⊢simp C ⇝ Q′; θ
──────────────────────────────────────────── S-Triv
Q ⊢simp τ ∼ τ ∧ C ⇝ Q′; θ

Q ∧ Qw |= φ    Q ⊢simp Qw ∧ C ⇝ Q′; θ
──────────────────────────────────────────── S-Entail
Q ⊢simp φ ∧ Qw ∧ C ⇝ Q′; θ

no other rules can apply
──────────────────────────────────────────── S-Rem
Q ⊢simp Q′ ⇝ Q′; ∅

Fig. 6. Simplification rules (modulo commutativity and associativity of ∧ and commutativity of ∼)

– Then, in the third last step, for f x, the system infers type β with constraint
αf ∼ (αx →π β). At the same time, the variable use in f x is also inferred
as f^1, x^π. Note that the use of x is π because it is passed to f : αx →π β.
– After that, in the last two steps again, the system yields constraints πx ≤ π
and 1 ≤ πf .
As a result, the type τ = αf →πf αx →πx β is inferred with the constraint
C = αf ∼ (αx →π β) ∧ πx ≤ π ∧ 1 ≤ πf .
Then, we try to assign a polytype to app by the rules in Fig. 5. By simplification, we have ⊤ ⊢simp C ⇝ πx ≤ π; [αf ↦ (αx →π β)]. Thus, by generalizing τ[αf ↦ (αx →π β)] = (αx →π β) →πf αx →πx β with πx ≤ π, we obtain the following type for app:

app : ∀p pf px a b. p ≤ px ⇒ (a →p b) →pf a →px b


Correctness. We first prepare some definitions for the correctness discussions. First, we allow substitutions θ to replace unification multiplicity variables as well as unification type variables. Then, we extend the notion of |= and write C |= C′ if C′θ holds when Cθ holds. From now on, we require that substitutions are idempotent, i.e., τθθ = τθ for any τ, which excludes the substitutions [α ↦ List α] and [α ↦ β, β ↦ Int] for example. We write Q |= θ = θ′ if Q |= τθ ∼ τθ′ for any τ. The restriction of a substitution θ to a domain X is written θ|X.
Consider a pair (Qg, Cw), where we call Qg and Cw given and wanted constraints, respectively. Then, a pair (Q, θ) is called a (sound) solution [31] for the pair (Qg, Cw) if Qg ∧ Q |= Cwθ, dom(θ) ∩ fuv(Qg) = ∅, and dom(θ) ∩ fuv(Q) = ∅. A solution is called guess-free [31] if it satisfies Qg ∧ Cw |= Q ∧ ⋀π∈dom(θ) (π = θ(π)) ∧ ⋀α∈dom(θ) (α ∼ θ(α)) in addition. Intuitively, a guess-free solution consists of necessary conditions required for a wanted constraint Cw to hold, assuming a given constraint Qg. For example, for (⊤, α ∼ (β →1 β)), the pair (⊤, [α ↦ (Int →1 Int), β ↦ Int]) is a solution but not guess-free. Very roughly speaking, for (Q, θ) to be a guess-free solution of (Qg, Cw) means that (Q, θ) is equivalent to Cw under the assumption Qg. There can be multiple guess-free solutions; for example, for (⊤, π ≤ 1), both (π ≤ 1, ∅) and (⊤, [π ↦ 1]) are guess-free solutions.
Lemma 10 (Soundness and Principality of Simplification). If Q ⊢simp C ⇝ Q′; θ, then (Q′, θ) is a guess-free solution for (Q, C).

Lemma 11 (Completeness of Simplification). If (Q′, θ) is a solution for (Q, C) where Q′ is satisfiable, then Q ⊢simp C ⇝ Q′′; θ′ for some Q′′ and θ′.

Theorem 2 (Soundness of Inference). Suppose Γ ⊢ e : τ ; Δ; C and there is a solution (Q, θ) for (⊤, C). Then, we have Q; Γθ; Δθ ⊢ e : τθ.

Theorem 3 (Completeness and Principality of Inference). Suppose Γ ⊢ e : τ ; Δ; C. Suppose also that Q′; Γθ′; Δ′ ⊢ e : τ′ for some substitution θ′ on unification variables such that dom(θ′) ⊆ fuv(Γ) and dom(θ′) ∩ fuv(Q′) = ∅. Then, there exists θ such that dom(θ) \ dom(θ′) ⊆ X, (Q′, θ) is a solution for (⊤, C), Q′ |= θ|dom(θ′) = θ′, Q′ |= τθ ∼ τ′, and Q′ |= Δθ ≤ Δ′, where X is the set of unification variables introduced in the derivation.


Note that the constraint generation Γ ⊢ e : τ ; Δ; C always succeeds, whereas the generated constraints may possibly be conflicting. Theorem 3 states that such a case cannot happen when e is well-typed under the rules in Fig. 2.

Incompleteness in Typing Programs. It may sound contradictory to Theorem 3, but the type inference is indeed incomplete for checking type-annotated bindings. Recall that the typing rule for type-annotated bindings requires that the resulting constraint after simplification be ⊤. However, even when there exists a solution of the form (⊤, θ) for (Q, C), there can be no guess-free solution of this form. For example, (⊤, π ≤ π′) has a solution (⊤, [π ↦ π′]), but there are no guess-free solutions of the required form. Also, even when there exists a guess-free solution of the form (⊤, θ), the simplification may not return it, as guess-free solutions are not always unique. For example, for (⊤, π ≤ π′ ∧ π′ ≤ π), (⊤, [π ↦ π′]) is a guess-free solution, whereas we have ⊤ ⊢simp π ≤ π′ ∧ π′ ≤ π ⇝ π ≤ π′ ∧ π′ ≤ π; ∅. The source of the issue is that constraints on multiplicities can (also) be solved by substitutions.
Fortunately, this issue disappears when we consider disambiguation in Sect. 4.
By disambiguation, we can eliminate constraints for internally-introduced multi-
plicity unification variables that are invisible from the outside. As a result, after
processing equality constraints, we essentially need only consider rigid multiplicity
variables when checking entailment for annotated top-level bindings.

Promoting Equalities to Substitutions. The inference can infer the polytypes ∀p. p ≤ 1 ⇒ Int →p Int and ∀p1 p2. (p1 ≤ p2 ∧ p2 ≤ p1) ⇒ Int →p1 Int →p2 Int, while programmers would prefer the simpler types Int →1 Int and ∀p. Int →p Int →p Int; the simplification so far does not yield substitutions on multiplicity unification variables. Adding the following rule remedies the situation:

S-Eq: if π ∉ fuv(Q), π ≠ μ, Q ∧ Qw |= π ≤ μ ∧ μ ≤ π,
      and Q ⊢simp (Qw ∧ C)[π ↦ μ] ⇝ Q′; θ,
      then Q ⊢simp Qw ∧ C ⇝ Q′; θ ∘ [π ↦ μ]

This rule says that if π = μ must hold for Qw ∧ C to hold, the simplification yields the substitution [π ↦ μ]. The condition π ∉ fuv(Q) is required for Lemma 10; a solution cannot substitute variables in Q. Note that this rule essentially finds an improving substitution [16].
Using the rule is optional. Our prototype implementation actually uses S-Eq only for Qw for which we can find μ easily: M ≤ 1, ω ≤ μ, and looping chains μ1 ≤ μ2 ∧ · · · ∧ μn−1 ≤ μn ∧ μn ≤ μ1.
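For the looping-chain case, one cheap realization (our suggestion; the paper does not fix an algorithm) is to view the predicates as a directed graph with an edge per μ ≤ ν and equate all multiplicity variables within a strongly connected component:

import Data.Graph (stronglyConnComp, flattenSCC)
import Data.List (nub)

type MVarName = String

-- Input: edges (μ, ν), one per predicate μ ≤ ν between variables.
-- Output: groups of variables forced equal, each usable by S-Eq.
loopingChains :: [(MVarName, MVarName)] -> [[MVarName]]
loopingChains leqs =
  [ vs | scc <- stronglyConnComp graph
       , let vs = flattenSCC scc
       , length vs > 1 ]
  where
    vars  = nub (map fst leqs ++ map snd leqs)
    graph = [ (v, v, [ w | (u, w) <- leqs, u == v ]) | v <- vars ]

For example, loopingChains [("p","q"),("q","r"),("r","p")] returns the single group {p, q, r}, from which S-Eq may substitute two of the variables by the third.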

3.2 Entailment Checking by Horn SAT Solving

The simplification rules rely on the check of entailment Q |= φ. For the constraints
in this system, we can perform this check in quadratic time at worst but in linear
time for most cases. Specifically, we reduce the checking Q |= φ to satisfiability of
propositional Horn formulas (Horn SAT), which is known to be solved in linear
time in the number of occurrences of literals [10], where the reduction (precisely,
the preprocessing of the reduction) may increase the problem size quadratically.
The idea of using Horn SAT for constraint solving in linear typing can be found
in Mogensen [23].
First, as a preprocessing step, we normalize both given and wanted constraints by the following rules:

– Replace M1 · M2 ≤ M with M1 ≤ M ∧ M2 ≤ M .
– Replace M · 1 and 1 · M with M , and M · ω and ω · M with ω.
– Remove trivial predicates 1 ≤ M and M ≤ ω.

After this, each predicate φ has the form μ ≤ ∏i νi.
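As an illustration, this normalization is directly executable; the following small Haskell rendering uses our own representation (the paper does not fix one), keeping a predicate as a pair of its left- and right-hand products:

data Mul = MOne | MOmega | MVar String deriving (Eq, Show)

-- A raw predicate  m1·…·mk ≤ n1·…·nl, both sides kept as product lists;
-- a normalized predicate has a single atom on the left.
type RawPred  = ([Mul], [Mul])
type NormPred = (Mul, [Mul])

-- Simplify a product: M·1 = 1·M = M and M·ω = ω·M = ω.
prod :: [Mul] -> [Mul]
prod ms
  | MOmega `elem` ms = [MOmega]
  | otherwise        = case filter (/= MOne) ms of [] -> [MOne]; ms' -> ms'

-- Split the left product (M1·M2 ≤ M becomes M1 ≤ M ∧ M2 ≤ M), simplify
-- the right, and drop trivial predicates 1 ≤ M and M ≤ ω.
normalize :: RawPred -> [NormPred]
normalize (lhs, rhs) =
  [ (m, r) | let r = prod rhs
           , r /= [MOmega]        -- drop trivial M ≤ ω
           , m <- prod lhs
           , m /= MOne ]          -- drop trivial 1 ≤ M

For example, normalize ([MVar "p", MVar "q"], [MVar "r"]) yields the two predicates p ≤ r and q ≤ r.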



After the normalization above, we can reduce the entailment checking to satisfiability. Specifically, we use the following property:

    Q |= μ ≤ ∏i νi   iff   Q ∧ ⋀i (νi ≤ 1) ∧ (ω ≤ μ) is unsatisfiable

Here, the constraint Q ∧ ⋀i (νi ≤ 1) ∧ (ω ≤ μ) intuitively asserts that there exists a counterexample to Q |= μ ≤ ∏i νi.
Then, it is straightforward to reduce the satisfiability of Q to Horn SAT;
we just map 1 to true and ω to false and accordingly map ≤ and · to ⇐ and
∧, respectively. Since Horn SAT can be solved in linear time in the number of
occurrences of literals [10], the reduction also shows that the satisfiability of Q is
checked in linear time in the size of Q if Q is normalized.

Corollary 1. Checking Q |= φ is in linear time if Q and φ are normalized.
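To make the reduction concrete, here is a self-contained sketch of the entailment check (our rendering; the actual implementation uses a DPLL-based solver, see Sect. 6.1). A normalized predicate μ ≤ ν1 · · · νl is read as the Horn clause μ ⇐ ν1 ∧ · · · ∧ νl under the mapping 1 ↦ true, ω ↦ false, and satisfiability is decided by unit propagation:

data Mul = MOne | MOmega | MVar String deriving (Eq, Show)
type NormPred = (Mul, [Mul])   -- μ ≤ ν1·…·νl, i.e. clause μ ⇐ ν1 ∧ … ∧ νl

-- Satisfiability by unit propagation: grow the set of variables forced to
-- 1 (true); afterwards, every clause whose body holds must have a head
-- that holds (a head ω with a satisfied body is a contradiction).
sat :: [NormPred] -> Bool
sat ps = go []
  where
    go vs =
      case [ v | (MVar v, body) <- ps, v `notElem` vs, all (true vs) body ] of
        []      -> and [ true vs h | (h, body) <- ps, all (true vs) body ]
        (v : _) -> go (v : vs)
    true _  MOne     = True
    true vs (MVar v) = v `elem` vs
    true _  MOmega   = False     -- ω in a body makes the clause vacuous

-- Q |= μ ≤ ν1·…·νl  iff  Q ∧ ⋀i (νi ≤ 1) ∧ (ω ≤ μ) is unsatisfiable.
entails :: [NormPred] -> NormPred -> Bool
entails q (mu, nus) =
  not (sat (q ++ [ (nu, []) | nu <- nus ]    -- νi ≤ 1 (empty product = 1)
              ++ [(MOmega, [mu])]))          -- ω ≤ μ

For instance, entails [(MVar "p", [MVar "q"])] (MVar "p", [MVar "q", MVar "r"]) returns True: p ≤ q entails p ≤ q · r.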




The normalization of constraints can duplicate the right-hand side M of a predicate · · · ≤ M, and can thus increase the size quadratically in the worst case. Fortunately, the quadratic increase is not common because the size of M is bounded in practice, in many cases by one. Among the rules in Fig. 2, the only rule that introduces a non-singleton M on the right-hand side of ≤ is Case, for a constructor whose arguments' multiplicities are non-constants, such as MkMany : ∀p a. a →p Many p a. However, it often suffices to use non-multiplicity-parameterized constructors, such as Cons : ∀a. a →1 List a →1 List a, because such constructors can be used to construct or deconstruct both linear and unrestricted data.

3.3 Issue: Inference of Ambiguous Types


The inference system so far looks nice; the system is sound and complete, and
infers principal types. However, there still exists an issue to overcome for the
system to be useful: it often infers ambiguous types [15, 27] in which internal
multiplicity variables leak out to reveal internal implementation details.
Consider app′ = λf.λx.app f x for app = λf.λx.f x from Example 1. We would expect that equivalent types are inferred for app′ and app. However, this is not the case for the inference system. In fact, the system infers the following type for app′ (here we reproduce the inferred type of app for comparison):

app  : ∀p pf px a b. (p ≤ px) ⇒ (a →p b) →pf a →px b
app′ : ∀q qf qx pf px a b. (q ≤ qx ∧ qf ≤ pf ∧ qx ≤ px) ⇒ (a →q b) →pf a →px b
We highlight why this type is inferred as follows.
– By abstractions, f is assigned type αf and multiplicity πf, and x is assigned type αx and multiplicity πx.
– By its use, app is instantiated to type (α′ →π′ β′) →π′f α′ →π′x β′ with constraint π′ ≤ π′x.
– For app f, the system infers type β with constraint ((α′ →π′ β′) →π′f α′ →π′x β′) ∼ (αf →π1 β). At the same time, the variable use in the expression is inferred as app^1, f^π1.
– For (app f x), the system infers type γ with constraint β ∼ (αx →π2 γ). At the same time, the variable use in the expression is inferred as app^1, f^π1, x^π2.
– As a result, λf.λx.app f x has type αf →πf αx →πx γ, yielding the constraints π1 ≤ πf ∧ π2 ≤ πx.

Then, for the gathered constraints, by simplification (including S-Eq), we obtain a (guess-free) solution (Q, θ) such that Q = (π′f ≤ πf ∧ π′ ≤ π′x ∧ π′x ≤ πx) and θ = [αf ↦ (α′ →π′ β′), π1 ↦ π′f, β ↦ (α′ →π′x β′), π2 ↦ π′x, γ ↦ β′]. Then, after generalizing (αf →πf αx →πx γ)θ = (α′ →π′ β′) →πf α′ →πx β′, we obtain the inferred type above.
There are two problems with this inference result:
– The type of app′ is ambiguous in the sense that the type-level variables in the constraint cannot be determined only from those that appear in the type [15,27]. Usually, ambiguous types are undesirable, especially when their instantiation affects runtime behavior [15,27,31].
– Due to this ambiguity, the types of app and app′ are not judged equivalent by the inference system. For example, the inference rejects the binding app′ : ∀p pf px a b. (p ≤ px) ⇒ (a →p b) →pf a →px b = app′ because the system does not know how to instantiate the ambiguous type-level variables qf and qx, while the binding is valid in the type system of Sect. 2.

Inference of ambiguous types is common in the system; it is easily caused by using defined variables. Rejecting ambiguous types is not a solution for our case because it would reject many programs. Defaulting such ambiguous type-level variables to 1 or ω is not a solution either because it loses principality in general. However, we would have no other choice than to reject ambiguous types, as long as multiplicities are relevant to runtime behavior.
In the next section, we show how we address the ambiguity issue under the assumption that multiplicities are irrelevant at runtime. Under this assumption, it is no problem to have multiplicity-monomorphic primitives such as array-processing primitives (e.g., readMArray : ∀a. MArray a →1 Int →ω (MArray a ⊗ Un a)) [31]. Note that this assumption does not rule out all multiplicity-polymorphic primitives; it just prohibits the primitives from inspecting multiplicities at runtime.

4 Disambiguation by Quantifier Elimination

In this section, we address the issue of ambiguous and leaky types by using quantifier elimination. The basic idea is simple; we just view the type of app′ as

app′ : ∀q pf px a b. (∃qx qf. q ≤ qx ∧ qf ≤ pf ∧ qx ≤ px) ⇒ (a →q b) →pf a →px b

In this case, the constraint (∃qx qf. q ≤ qx ∧ qf ≤ pf ∧ qx ≤ px) is logically equivalent to q ≤ px, and thus we can infer equivalent types for both app and app′. Fortunately, such quantifier elimination is always possible for our representation of constraints; that is, for ∃p.Q, there always exists a quantifier-free Q′ that is logically equivalent to ∃p.Q. A technical subtlety is that, although we performed quantifier elimination after generalization in the explanation above, we actually perform quantifier elimination just before generalization, or more precisely, as a final step of simplification, for compatibility with the simplification in OutsideIn(X) [31], especially in the treatment of local assumptions.

4.1 Elimination of Existential Quantifiers

The elimination of existential quantifiers is rather easy; we simply use the well-
known fact that a disjunction of a Horn clause and a definite clause can also be
represented as a Horn clause. Regarding our encoding of normalized predicates
(Sect. 3.2) that maps μ ≤ M to a Horn clause, the fact can be rephrased as:
Lemma 12. (μ ≤ M ∨ ω ≤ M′) ≡ μ ≤ M · M′.

Here, we extend constraints to include ∨ and write ≡ for logical equivalence; that is, Q ≡ Q′ if and only if Q |= Q′ and Q′ |= Q.
As a corollary, we obtain the following result:

Corollary 2. There effectively exists a quantifier-free constraint Q′, denoted by elim(∃π.Q), such that Q′ is logically equivalent to ∃π.Q.

Proof. Note that ∃π.Q means Q[π ↦ 1] ∨ Q[π ↦ ω] because π ranges over {1, ω}. We safely assume that Q is normalized (Sect. 3.2) and that Q does not contain a predicate π ≤ M where π appears also in M, because such a predicate trivially holds.
We define Φ1, Φω, and Qrest as Φ1 = {μ ≤ M | (μ ≤ π · M) ∈ Q, μ ≠ π}, Φω = {ω ≤ M | (π ≤ M) ∈ Q, π ∉ fuv(M)}, and Qrest = ⋀{φ | φ ∈ Q, π ∉ fuv(φ)}. Here, we abuse the notation and write φ ∈ Q to mean that Q = ⋀i φi and φ = φi for some i. In the construction of Φ1, we assume the monoid laws of (·); the definition says that we remove π from the right-hand sides, and M becomes 1 if the right-hand side is just π. By construction, Q[π ↦ 1] and Q[π ↦ ω] are equivalent to (⋀Φ1) ∧ Qrest and (⋀Φω) ∧ Qrest, respectively. Thus, by Lemma 12 and the distributivity of ∨ over ∧, it suffices to define Q′ as Q′ = (⋀{μ ≤ M · M′ | (μ ≤ M) ∈ Φ1, (ω ≤ M′) ∈ Φω}) ∧ Qrest.
Example 2. Consider Q = (π′f ≤ πf ∧ π′ ≤ π′x ∧ π′x ≤ πx); this is the constraint obtained from λf.λx.app f x (Sect. 3.3). Since π′f and π′x do not appear in the inferred type (α′ →π′ β′) →πf α′ →πx β′, we want to eliminate them by the above step. There is freedom in choosing which variable is eliminated first; here, we shall choose π′f first.
First, we have elim(∃π′f.Q) = π′ ≤ π′x ∧ π′x ≤ πx because for this case we have Φ1 = ∅, Φω = {ω ≤ πf}, and Qrest = π′ ≤ π′x ∧ π′x ≤ πx. We then have elim(∃π′x. π′ ≤ π′x ∧ π′x ≤ πx) = π′ ≤ πx because for this case we have Φ1 = {π′ ≤ 1}, Φω = {ω ≤ πx}, and Qrest = ⊤.
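The construction of elim(∃π.Q) in the proof of Corollary 2 is also directly executable; the sketch below (again in our own ad-hoc representation, with our names) assumes the input is normalized and contains no predicate with π on both sides:

data Mul = MOne | MOmega | MVar String deriving (Eq, Show)
type NormPred = (Mul, [Mul])     -- μ ≤ ν1·…·νl

prod :: [Mul] -> [Mul]           -- product simplification as in Sect. 3.2
prod ms | MOmega `elem` ms = [MOmega]
        | otherwise = case filter (/= MOne) ms of [] -> [MOne]; ms' -> ms'

-- elim p q: a quantifier-free constraint equivalent to ∃p.q
elim :: String -> [NormPred] -> [NormPred]
elim p q =
  [ (mu, prod (m ++ m')) | (mu, m) <- phi1, (_, m') <- phiOmega ] ++ qrest
  where
    pv = MVar p
    phi1     = [ (mu, prod (filter (/= pv) m))           -- Φ1: p on the right
               | (mu, m) <- q, pv `elem` m, mu /= pv ]
    phiOmega = [ (MOmega, m) | (mu, m) <- q, mu == pv ]  -- Φω: p on the left
    qrest    = [ pr | pr@(mu, m) <- q, mu /= pv, pv `notElem` m ]

-- Replaying Example 2: eliminating π′f and then π′x from
-- π′f ≤ πf ∧ π′ ≤ π′x ∧ π′x ≤ πx leaves π′ ≤ πx.
example :: [NormPred]
example = elim "p'x" (elim "p'f"
  [ (MVar "p'f", [MVar "pf"])
  , (MVar "p'" , [MVar "p'x"])
  , (MVar "p'x", [MVar "px"]) ])
-- example == [(MVar "p'", [MVar "px"])]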


In the worst case, the size of elim(∃π.Q) can be quadratic in that of Q. Thus, repeated elimination can make the constraints exponentially bigger. We believe that such blow-ups rarely happen because usually π occurs only in a few predicates in Q. Also, recall that non-singleton right-hand sides are caused only by multiplicity-parameterized constructors. When each right-hand side of ≤ is a singleton in Q, the same holds in elim(∃π.Q). For such a case, the exponential blow-up cannot happen because the size of constraints of this form is at most quadratic in the number of multiplicity variables.

4.2 Modified Typing Rules

As mentioned at the beginning of this section, we perform quantifier elimination as the last step of simplification. To do so, we define Q ⊢τsimp C ⇝ Q′′; θ as follows:

    if Q ⊢simp C ⇝ Q′; θ, {π} = fuv(Q′) \ fuv(τθ), and Q′′ = elim(∃π.Q′),
    then Q ⊢τsimp C ⇝ Q′′; θ

(where the variables in π are eliminated one after another, as in Example 2).

Here, τ is used to determine which unification variables will be ambiguous after generalization. For simplicity, we simply identify the variables (π above) that are not in τ as ambiguous [15]. This check is indeed conservative with respect to a more general definition of ambiguity [27], in which, for example, ∀p r a. (p ≤ r ∧ r ≤ p) ⇒ a →p a is not judged ambiguous because r is determined by p.

Then, we replace the original simplification with the above-defined version; the parts that change relative to the rules of Sect. 3 are the uses of ⊢τsimp and ⊢σsimp below.

    if Γ ⊢ e : τ ⇝ Δ; C,  ⊤ ⊢τsimp C ⇝ Q; θ,  {π α} = fuv(Q, τθ),  p, a fresh,
    and Γ, f : ∀p a.(Q ⇒ τθ)[α ↦ a, π ↦ p] ⊢ prog,
    then Γ ⊢ f = e; prog

    if Γ ⊢ e : σ ⇝ Δ; C,  Q ⊢σsimp C ∧ τ ∼ σ ⇝ ⊤; θ,  and Γ, f : ∀p a.Q ⇒ τ ⊢ prog,
    then Γ ⊢ f : (∀p a.Q ⇒ τ) = e; prog

Example 3. Consider (Q, θ) from Sect. 3.3 such that Q = (π′f ≤ πf ∧ π′ ≤ π′x ∧ π′x ≤ πx) and θ = [αf ↦ (α′ →π′ β′), π1 ↦ π′f, β ↦ (α′ →π′x β′), π2 ↦ π′x, γ ↦ β′], which is obtained after simplification of the gathered constraint. Following Example 2, eliminating the variables that are not in τθ = (α′ →π′ β′) →πf α′ →πx β′ yields the constraint π′ ≤ πx. As a result, by generalization, we obtain the polytype

∀q pf px a b. (q ≤ px) ⇒ (a →q b) →pf a →px b

for app′, which is equivalent to the inferred type of app.


Note that (Q′, θ) of Q ⊢τsimp C ⇝ Q′; θ is no longer a solution of (Q, C) because C can have eliminated variables. However, it is safe to use this version when generalization takes place, because, for variables q that do not occur in τ, ∀p q a. Q ⇒ τ and ∀p a. Q′ ⇒ τ have the same set of monomorphic instances if ∃q.Q is logically equivalent to Q′. Note that in this type system simplification happens only before (implicit) generalization takes place.

5 Extension to Local Assumptions


In this section, following OutsideIn(X) [31], we extend our system with local
assumptions, which enable us to have lets and GADTs. We focus on the treatment
of lets in this section because type inference for lets involves a linearity-specific
concern: the multiplicity of a let-bound variable.

5.1 “Let Should Not Be Generalized” for Our Case


We first discuss that, even for our case, "let should not be generalized" [31]; that is, generalization of let sometimes results in counter-intuitive typing and conflicts with the discussions so far.
Consider the following program:

h = λf.λk.let y = f (λx.k x) in 0

Suppose for simplicity that f and k have types (a →π1 b) →π2 c and a →π3 b, respectively (here we only focus on the treatment of multiplicity). Then, f (λx.k x) has type c with the constraint π3 ≤ π1. Thus, after generalization, y has type π3 ≤ π1 ⇒ c, where π3 and π1 are neither generalized nor eliminated because they escape from the definition of y. As a result, h has type ∀p1 p2 p3 a b c. ((a →p1 b) →p2 c) →ω (a →p3 b) →ω Int; there is no constraint p3 ≤ p1 because the definition of y does not yield a constraint. This nonexistence of the constraint would be counter-intuitive because users wrote f (λx.k x) while the constraint for the expression is not imposed. In particular, it does not cause an error even when f : (a →1 b) →1 c and k : a →ω b, while f (λx.k x) becomes illegal for this case. Also, if we change 0 to y, the error happens at the use site instead of the definition site. Moreover, the type is fragile, as it depends on whether y occurs or not; for example, if we change 0 to const 0 y where const = λa.λb.a, the type of h changes to ∀p1 p2 p3 a b c. p3 ≤ p1 ⇒ ((a →p1 b) →p2 c) →ω (a →p3 b) →ω Int.
In this discussion, we did not consider type-equality constraints, but there is no legitimate reason why type-equality constraints should be solved on the fly in typing y.
As demonstrated by the above example, "let should not be generalized" [30,31] in our case as well. Thus, we adopt the same principle as OutsideIn(X): a let is generalized only if users write a type annotation for it [31]. This principle is also adopted in GHC (as of 6.12.1, when the language option MonoLocalBinds is turned on), with a slight relaxation to generalize closed bindings.

5.2 Multiplicity of Let-Bound Variables


Another issue with let-generalization, which is specific to linear typing, is that a
generalization result depends on the multiplicity of the let-bound variable. Let
us consider the following program, where we want to generalize the type of y
(even without a type annotation):
g = λx.let y = λf.f x in y not
Suppose for simplicity that not has type Bool →1 Bool and x has type Bool already in typing the let. Then, y's body λf.f x has the monotype (Bool →π r) →π′ r with no constraints (on multiplicity). There are two generalization results depending on the multiplicity πy of y, because the use of x also escapes in the type system.
– If πy = 1, the type is generalized into ∀q r. (Bool →π r) →q r, where π is not generalized because the use of x in y's body is π.
– If πy = ω, the type is generalized into ∀p q r. (Bool →p r) →q r, where π is generalized (to p) because the use of x in y's body is ω.
A difficulty here is that πy needs to be determined at the definition of y, while
the constraint on πy is only obtained from the use of y.
Our design choice is the latter; the multiplicity of a generalizable let-bound
variable is ω in the system. One justification for this choice is that a motivation
of polymorphic typing is to enhance reusability, while reuse is not possible for
variables with multiplicity 1. Another justification is compatibility with recursive
definitions, where recursively-defined variables must have multiplicity ω; it might
be confusing, for example, if the multiplicity of a list-manipulation function
changes after we change its definition from an explicit recursion to foldr .

5.3 Inference Rule for Lets


In summary, the following are our criteria about let generalization:
– Only lets with polymorphic type annotations are generalized.
– Variables introduced by let to be generalized have multiplicity ω.
This idea can be represented by the following typing rule:

    if  Γ ⊢ e1 : τ1 ⇝ Δ1; C1,  {π α} = fuv(τ1, C1) \ fuv(Γ),
        C1′ = ∃π α.(Q |=τ1 C1 ∧ τ ∼ τ1),
    and Γ, x : (∀p a.Q ⇒ τ) ⊢ e2 : τ2 ⇝ Δ2, x^M; C2,
    then Γ ⊢ let x : (∀p a.Q ⇒ τ) = e1 in e2 : τ2 ⇝ ωΔ1 + Δ2; C1′ ∧ C2   (LetA)
(We do not discuss non-generalizable lets because they are typed as (λx.e2) e1.) Constraints like ∃π α.(Q |=τ1 C1 ∧ τ ∼ τ1) above are called implication constraints [31]; such a constraint states that the entailment must hold by instantiating only the unification variables in π α. There are two roles of implication constraints. One is to delay the checking, because τ1 and C1 contain some unification variables that will be made concrete after this point by solving C2. The other is to guard constraints: in the above example, since the constraints C1 ∧ τ ∼ τ1 hold by assuming Q, it is not safe to substitute variables outside π α in solving the constraints, because the equivalence might be a consequence of Q; recall that Q affects type equality. We note that there is a slight deviation from the original approach [31]: an implication constraint in our system is annotated with τ1 to identify for which subset of {π α} the existence of a unique solution is not required and thus quantifier elimination is possible, similarly to Sect. 4.

5.4 Solving Constraints


Now, the set of constraints is extended to include implication constraints.

ψ ::= · · · | ∃π α.(Q |=τ C)          C ::= ⋀i ψi

As we mentioned above, an implication constraint ∃π α.(Q |=τ C) means that Q |= C must hold by substituting π and α with appropriate values, where we do not require uniqueness of solutions for the unification variables that do not appear in τ. That is, Q ⊢τsimp C ⇝ ⊤; θ must hold with dom(θ) ⊆ {π α}.
Then, following OutsideIn(X) [31], we define the solving judgment π α. Q ⊢τsolv C ⇝ Q′; θ, which states that we solve (Q, C) as (Q′, θ) where θ only touches variables in π α; here, τ is used for disambiguation (Sect. 4). Let us write impl(C) for the set of implication constraints in C and simpl(C) for the rest. Then, we can define the inference rule for the judgment simply by recursive simplification, similarly to the original [31]:

    if  π α. Q ⊢τsimpl simpl(C) ⇝ Qr; θ
    and πi αi. Q ∧ Qi ∧ Qr ⊢τi-solv Ci ⇝ ⊤; θi for each (∃πi αi.(Qi |=τi Ci)) ∈ impl(Cθ),
    then π α. Q ⊢τsolv C ⇝ Qr; θ

Here, π α. Q ⊢τsimpl C ⇝ Qr; θ is a simplification relation defined similarly to Q ⊢τsimp C ⇝ Qr; θ except that we are allowed to touch only the variables in π α. We omit the concrete rules for this version of the simplification relation because they are straightforward, except that the unification caused by S-Uni and S-Eq and the quantifier elimination (Sect. 4) are allowed only for variables in {π α}.
Accordingly, we also change the typing rules for bindings to use the solving relation instead of the simplification relation:

    if Γ ⊢ e : τ ⇝ Δ; C,  fuv(C, τ). ⊤ ⊢τsolv C ⇝ Q; θ,  {π α} = fuv(Q, τθ),
    p, a fresh,  and Γ, f : ∀p a.(Q ⇒ τθ)[α ↦ a, π ↦ p] ⊢ prog,
    then Γ ⊢ f = e; prog

    if Γ ⊢ e : σ ⇝ Δ; C,  fuv(C, σ). Q ⊢σsolv C ∧ τ ∼ σ ⇝ ⊤; θ,
    and Γ, f : ∀p a.Q ⇒ τ ⊢ prog,  then Γ ⊢ f : (∀p a.Q ⇒ τ) = e; prog
Above, there are no unification variables other than fuv(C, τ) or fuv(C, σ). The definition of the solving judgment and the updated inference rules for programs are the same as those in the original OutsideIn(X) [31], except for the τ used for disambiguation. This is one of the advantages of basing our system on OutsideIn(X).

6 Implementation and Evaluation


In this section, we evaluate the proposed inference method using our prototype implementation. We first report what types are inferred for functions from the Prelude, to see whether the inferred types are reasonably simple. We then report a performance evaluation that measures the efficiency of type inference and the overhead due to entailment checking and quantifier elimination.

6.1 Implementation

The implementation follows the present paper except for a few points. Following the implementation of OutsideIn(X) in GHC, our type checker keeps a natural number, which we call an implication level, corresponding to the depth of implication constraints, and each unification variable accordingly keeps the implication level at which it was introduced. As usual, we represent unification variables by mutable references. We perform unification on the fly by destructive assignment, while unification of variables that have smaller implication levels than the current level is recorded for later checking of implication constraints; such a variable cannot be in π α of ∃π α.(Q |=τ C). The implementation supports GADTs because they can be implemented rather easily by extending constraints Q to include type equalities, but it does not support type classes because handling them requires another X of OutsideIn(X).
Although we could use a linear-time Horn SAT solving algorithm [10] for checking Q |= φ, the implementation uses a general SAT solver based on DPLL [8,9] because the unit propagation in DPLL works efficiently for Horn formulas. We do not use external solvers, such as Z3, as we conjecture that the sizes of formulas are usually small and the overhead of invoking an external solver would be high.

(◦) : (q ≤ s ∧ q ≤ t ∧ p ≤ t) ⇒ (b →q c) →r (a →p b) →s a →t c
curry : (p ≤ r ∧ p ≤ s) ⇒ ((a ⊗ b) →p c) →q a →r b →s c
uncurry : (p ≤ s ∧ q ≤ s) ⇒ (a →p b →q c) →r (a ⊗ b) →s c
either : (p ≤ r ∧ q ≤ r) ⇒ (a →p c) →ω (b →q c) →ω Either a b →r c
foldr : (q ≤ r ∧ p ≤ s ∧ q ≤ s) ⇒ (a →p b →q b) →ω b →r List a →s b
foldl : (p ≤ r ∧ r ≤ s ∧ q ≤ s) ⇒ (b →p a →q b) →ω b →r List a →s b
map : (p ≤ q) ⇒ (a →p b) →ω List a →q List b
filter : (a →p Bool) →ω List a →ω List a
append : List a →p List a →q List a
reverse : List a →p List a
concat : List (List a) →p List a
concatMap : (p ≤ q) ⇒ (a →p List b) →ω List a →q List b

Fig. 7. Inferred types for selected functions from Prelude (quantifications are omitted)

6.2 Functions from Prelude


We show how our type inference system works for some polymorphic functions from Haskell's Prelude. Since we have not implemented type classes and I/O in our prototype implementation, and since we can define copying or discarding functions for concrete first-order datatypes, we focus on the unqualified polymorphic functions. Also, we do not consider functions that are obviously unrestricted, such as head and scanl, in this examination. In the implementation of the examined functions, we use definitions that are as natural as possible. For example, a linear-time accumulative definition is used for reverse. Some functions can be defined both by explicit recursion and by foldr/foldl; among the examined functions, map, filter, concat, and concatMap can be defined by foldr, and reverse can be defined by foldl. For such cases, both versions are tested; sample renderings are sketched below.
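For concreteness, the following are the kinds of definitions we mean; these particular renderings are our assumption, as the benchmark sources are not included in the paper:

import Prelude hiding (map, reverse)

map :: (a -> b) -> [a] -> [b]         -- the explicit-recursion version
map f []       = []
map f (x : xs) = f x : map f xs

mapF :: (a -> b) -> [a] -> [b]        -- the foldr version tested alongside
mapF f = foldr (\x ys -> f x : ys) []

reverse :: [a] -> [a]                 -- linear-time accumulative reverse
reverse = go []
  where
    go acc []       = acc
    go acc (x : xs) = go (x : acc) xs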
Fig. 7 shows the inferred types for the examined functions. Since the inferred
types coincide for the two variations (by explicit recursions or by folds) of map,
filter , append , reverse, concat, and concatMap, the results do not refer to these
variations. Most of the inferred types look unsurprising, considering the fact that
the constraint p ≤ q is yielded usually when an input that corresponds to q is
used in an argument that corresponds to p. For example, consider foldr f e xs.
The constraint q ≤ r comes from the fact that e (corresponding to r) is passed as
the second argument of f (corresponding to q) via a recursive call. The constraint
p ≤ s comes from the fact that the head of xs (corresponding to s) is used as the
first argument of f (corresponding to p). The constraint q ≤ s comes from the
fact that the tail of xs is used in the second argument of f . A little explanation
is needed for the constraint r ≤ s in the type of foldl , where both r and s are
associated with types with the same polarity. Such constraints usually come from
recursive definitions. Consider the definition of foldl :

foldl = λf.λe.λx.case x of {Nil → e; Cons a y → foldl f (f e a) y}

Here, we find that a, a component of x (corresponding to s), appears in the second argument of foldl (corresponding to r), which yields the constraint r ≤ s.
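For reference, here is the standard foldr definition assumed in the reading of its constraints above (the paper shows only the foldl definition); the comments relate the code to the constraints in Fig. 7:

import Prelude hiding (foldr)

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f e []       = e                   -- e (corresponding to r) returned
foldr f e (x : xs) = f x (foldr f e xs)  -- the head x is f's first argument
                                         -- (p ≤ s); e reaches f's second
                                         -- argument via the recursive call
                                         -- (q ≤ r), as does the tail (q ≤ s)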

Note that the inference results do not contain →1; recall that there is no problem in using unrestricted inputs linearly, and thus the multiplicity of a linear input can be arbitrary. The results also show that the inference algorithm successfully detected that append, reverse, and concat are linear functions.
It is true that these inferred types indeed leak some internal details into their constraints, but those constraints can be understood from the functions' extensional behaviors alone, at least for the examined functions. Thus, we believe that the inferred types are reasonably simple.

6.3 Performance Evaluation

We measured the elapsed time for type checking and the overhead of implication checking and quantifier elimination. The following programs were examined in the experiments: funcs: the functions in Fig. 7; gv: an implementation of simple communication in the session-type system GV [17], taken from [18, Section 4] with some modifications;⁶ app1: a pair of the definitions of app and app′; and app10: a pair of the definitions of app and app10 = λf.λx. app · · · app f x (with 10 occurrences of app). The former two programs are intended to be miniatures of typical programs; the latter two are intended to measure the overhead of quantifier elimination.

Table 1. Experimental results (times are measured in ms)

Program   LOC   Total Elapsed   SAT Elapsed (#)   QE Elapsed (#)
funcs      40   4.3             0.70  (42)        0.086 (15)
gv         53   3.9             0.091 ( 9)        0.14  (17)
app1        4   0.34            0.047 ( 4)        0.012 ( 2)
app10       4   0.84            0.049 ( 4)        0.038 (21)

Although the examined programs are very small, they all involve the ambiguity issues. For example, consider the following fragment of the program gv:
answer : Int = fork prf calculator $ \c -> left c & \c ->
send (MkUn 3) c & \c -> send (MkUn 4) c & \c ->
recv c & \(MkUn z, c) -> wait c & \() -> MkUn z

(Here, we used our paper's syntax instead of that of the actual examined code.) Both $ and & are operator versions of app, where the arguments are flipped in &. Besides the treatment of multiplicities, the disambiguation is crucial for this expression to have type Int.
The experiments were conducted on a MacBook Pro (13-inch, 2017) with
Mac OS 10.14.6, 3.5 GHz Intel Core i7 CPU, and 16 GB memory. GHC 8.6.5
with -O2 was used for compiling our prototype system.
Table 1 lists the experimental results. Each elapsed time is the average of 1,000 executions for the first two programs, and of 10,000 executions for the last two. All columns are self-explanatory except for the # columns, which count the number of executions of the corresponding procedures. We note that the current implementation restricts Qw in S-Entail to be ⊤ and removes redundant constraints afterward; this is why the number of SAT solver calls in app1 is four instead of two. For the artificial programs (app1 and app10), the overhead is not significant; the typing cost grows faster than the SAT/QE costs. In contrast, the results for the former two show that SAT becomes heavy for higher-order programs (funcs) and quantifier elimination becomes heavy for combinator-heavy programs (gv), although we believe that the overhead would still be acceptable. Since we are currently using naive algorithms for both procedures, there is much room to reduce the overhead. For example, if users annotate the most general types, the simplification often invokes trivial checks of the form ⋀i φi |= φj with φj ∈ {φi}i; special treatment of such cases would reduce the overhead.

⁶ We changed the type of fork to Dual s s′ →ω (Ch s →1 Ch End) →1 (Ch s′ →1 Un r) →1 r, as their type Dual s s′ ⇒ (Ch s →1 Ch End) →1 Ch s′ is incorrect for the multiplicity-erasing semantics. A minor difference is that we used a GADT to witness duality because our prototype implementation does not support type classes.

7 Related Work

Borrowing the terminology from Bernardy et al. [7], there are two approaches to
linear typing: linearity via arrows and linearity via kinds. The former approaches
manage how many times an assumption (i.e., a variable) can be used; for example,
in Wadler [33]'s linear λ-calculus, there are two sorts of variables, linear and unrestricted, where the latter can only be obtained by decomposing let !x = e1 in e2. Since the primitive sources of assumptions are arrow types, it is natural to annotate them with arguments' multiplicities [7,12,22]. For multiplicities,
we focused on 1 and ω following Linear Haskell [6, 7, 26]. Although {1, ω} would
already be useful for some domains including reversible computation [19, 35]
and quantum computation [2, 25], handling more general multiplicities, such
as {0, 1, ω} and arbitrary semirings [12], is an interesting future direction. Our
discussions in Sect. 2 and 3, similarly to Linear Haskell [7], could be extended
to more general domains with small modifications. In contrast, we rely on the
particular domains {1, ω} of multiplicities for the crucial points of our inference,
i.e., entailment checking and quantifier elimination. Igarashi and Kobayashi [14]'s linearity analysis for the π-calculus, which assigns input/output usage (multiplicities) to channels, has similarities to linearity via arrows. Multiplicity 0 is important in their analysis to identify input/output-only channels. They solve constraints on multiplicities separately in polynomial time, leveraging the monotonicity of multiplicity operators with respect to the ordering 0 ≤ 1 ≤ ω. Here, 0 ≤ 1 comes from the fact that 1 in their system means "at most once" instead of "exactly once".
The "linearity via kinds" approaches distinguish types whose values are treated linearly from types whose values are not [21,24,28], where the distinction is usually represented by kinds [21,28]. Interestingly, they also have two function types (function types that belong to the linear kind and those that belong to the unrestricted kind) because the kind of a function type cannot be determined solely by the argument and return types. Mazurak et al. [21] use subkinding to avoid explicit conversions from unrestricted values to linear ones. However, due to the variations of the function types, a function can have multiple incompatible types; e.g., the function const can have four incompatible types [24] in their system.

Universal types accompanied by kind abstraction [28] address the issue to some extent; this works well for const but still gives two incomparable types to the function composition (◦) [24]. Morris [24] addresses this issue of principality with qualified typing [15]. Two forms of predicates are considered in the system: Un τ states that τ belongs to the unrestricted kind, and σ ≤ τ states that Un σ implies Un τ. This system is considerably simpler than the previous systems. Turner et al. [29]'s type-based usage analysis has a similarity to linearity via kinds; in that system, each type is annotated by a usage (a multiplicity), as in (List Int^ω)^ω. Wansbrough and Peyton Jones [34] extended the system to include polymorphic types and subtyping with respect to multiplicities, and discussed multiplicity polymorphism. Mogensen [23] is a similar line of work, which reduces constraint solving on multiplicities to Horn SAT. His system concerns multiplicities {0, 1, ω} with ordering 0 ≤ 1 ≤ ω, and his constraints can involve more operations, including additions and multiplications, but only on the left-hand side of ≤.
Morris [24] uses improving substitutions [16] in generalization, which are sometimes effective for removing ambiguity, though he does not show concrete algorithms for finding them. In our system, as well as S-Eq, elim(∃π.Q) can be viewed as a systematic way to find improving substitutions. That is, elim(∃π.Q) improves Q by substituting π with min{Mi | (ω ≤ Mi) ∈ Φω}, i.e., the largest possible candidate for π. Though the largest solution is usually undesirable, when the right-hand sides of ≤ are all singletons, we can also view elim(∃π.Q) as substituting π by ∏_{(μi ≤ 1) ∈ Φ1} μi, i.e., the smallest possible candidate.

8 Conclusion

We designed a type inference system for a rank-1 fragment of λq→ [7] that can infer principal types, based on the qualified typing system OutsideIn(X) [31]. We observed that naive qualified typing often infers ambiguous types, and we addressed the issue based on quantifier elimination. The experiments suggested that the proposed inference system infers principal types effectively and that the overhead compared with unrestricted typing is acceptable, though not negligible.
Since we based our work on the inference algorithm used in GHC, the natural expectation is to implement the system in GHC. A technical challenge in achieving this is to combine the disambiguation technique with other sorts of constraints, especially type classes, and with arbitrarily ranked polymorphism.

Acknowledgments
We thank Meng Wang, Atsushi Igarashi, and the anonymous reviewers of ESOP
2020 for their helpful comments on the preliminary versions of this paper. This
work was partially supported by JSPS KAKENHI Grant Numbers 15H02681
and 19K11892, JSPS Bilateral Program, Grant Number JPJSBP120199913, the
Kayamori Foundation of Informational Science Advancement, and EPSRC Grant
EXHIBIT: Expressive High-Level Languages for Bidirectional Transformations
(EP/T008911/1).
Modular Inference of Linear Types for Multiplicity-Annotated Arrows 481

References

1. Aehlig, K., Berger, U., Hofmann, M., Schwichtenberg, H.: An arithmetic for non-
size-increasing polynomial-time computation. Theor. Comput. Sci. 318(1-2), 3–27
(2004). https://doi.org/10.1016/j.tcs.2003.10.023
2. Altenkirch, T., Grattage, J.: A functional quantum programming language. In:
20th IEEE Symposium on Logic in Computer Science (LICS 2005), 26-29 June
2005, Chicago, IL, USA, Proceedings. pp. 249–258. IEEE Computer Society (2005).
https://doi.org/10.1109/LICS.2005.1
3. Baillot, P., Hofmann, M.: Type inference in intuitionistic linear logic. In: Kut-
sia, T., Schreiner, W., Fernández, M. (eds.) Proceedings of the 12th Interna-
tional ACM SIGPLAN Conference on Principles and Practice of Declarative
Programming, July 26-28, 2010, Hagenberg, Austria. pp. 219–230. ACM (2010).
https://doi.org/10.1145/1836089.1836118
4. Baillot, P., Terui, K.: A feasible algorithm for typing in elementary affine
logic. In: Urzyczyn, P. (ed.) Typed Lambda Calculi and Applications, 7th In-
ternational Conference, TLCA 2005, Nara, Japan, April 21-23, 2005, Proceed-
ings. Lecture Notes in Computer Science, vol. 3461, pp. 55–70. Springer (2005).
https://doi.org/10.1007/11417170_6
5. Baillot, P., Terui, K.: Light types for polynomial time computation in lambda cal-
culus. Inf. Comput. 207(1), 41–62 (2009). https://doi.org/10.1016/j.ic.2008.08.005
6. Bernardy, J.P., Boespflug, M., Newton, R., Jones, S.P., Spiwack, A.: Linear mini-
core. GHC Developers Wiki, https://gitlab.haskell.org/ghc/ghc/wikis/uploads/
ceaedb9ec409555c80ae5a97cc47470e/minicore.pdf, visited Oct. 14, 2019.
7. Bernardy, J., Boespflug, M., Newton, R.R., Peyton Jones, S., Spiwack, A.: Lin-
ear haskell: practical linearity in a higher-order polymorphic language. PACMPL
2(POPL), 5:1–5:29 (2018). https://doi.org/10.1145/3158093
8. Davis, M., Logemann, G., Loveland, D.W.: A machine program for theorem-proving.
Commun. ACM 5(7), 394–397 (1962). https://doi.org/10.1145/368273.368557
9. Davis, M., Putnam, H.: A computing procedure for quantification theory. J. ACM
7(3), 201–215 (1960). https://doi.org/10.1145/321033.321034
10. Dowling, W.F., Gallier, J.H.: Linear-time algorithms for testing the satisfia-
bility of propositional horn formulae. J. Log. Program. 1(3), 267–284 (1984).
https://doi.org/10.1016/0743-1066(84)90014-1
11. Gan, E., Tov, J.A., Morrisett, G.: Type classes for lightweight substructural types.
In: Alves, S., Cervesato, I. (eds.) Proceedings Third International Workshop on
Linearity, LINEARITY 2014, Vienna, Austria, 13th July, 2014. EPTCS, vol. 176,
pp. 34–48 (2014). https://doi.org/10.4204/EPTCS.176.4
12. Ghica, D.R., Smith, A.I.: Bounded linear types in a resource semiring. In: Shao,
Z. (ed.) Programming Languages and Systems - 23rd European Symposium on
Programming, ESOP 2014, Held as Part of the European Joint Conferences on
Theory and Practice of Software, ETAPS 2014, Grenoble, France, April 5-13, 2014,
Proceedings. Lecture Notes in Computer Science, vol. 8410, pp. 331–350. Springer
(2014). https://doi.org/10.1007/978-3-642-54833-8_18
13. Girard, J., Scedrov, A., Scott, P.J.: Bounded linear logic: A modular approach
to polynomial-time computability. Theor. Comput. Sci. 97(1), 1–66 (1992).
https://doi.org/10.1016/0304-3975(92)90386-T
14. Igarashi, A., Kobayashi, N.: Type reconstruction for linear λ-calculus with I/O subtyping. Inf. Comput. 161(1), 1–44 (2000). https://doi.org/10.1006/inco.2000.2872

15. Jones, M.P.: Qualified Types: Theory and Practice. Cambridge University Press,
New York, NY, USA (1995)
16. Jones, M.P.: Simplifying and improving qualified types. In: Williams, J. (ed.)
Proceedings of the seventh international conference on Functional programming
languages and computer architecture, FPCA 1995, La Jolla, California, USA, June
25-28, 1995. pp. 160–169. ACM (1995). https://doi.org/10.1145/224164.224198
17. Lindley, S., Morris, J.G.: A semantics for propositions as sessions. In: Vitek, J.
(ed.) Programming Languages and Systems - 24th European Symposium on Pro-
gramming, ESOP 2015, Held as Part of the European Joint Conferences on Theory
and Practice of Software, ETAPS 2015, London, UK, April 11-18, 2015. Proceed-
ings. Lecture Notes in Computer Science, vol. 9032, pp. 560–584. Springer (2015).
https://doi.org/10.1007/978-3-662-46669-8_23
18. Lindley, S., Morris, J.G.: Embedding session types in haskell. In: Main-
land, G. (ed.) Proceedings of the 9th International Symposium on Haskell,
Haskell 2016, Nara, Japan, September 22-23, 2016. pp. 133–145. ACM (2016).
https://doi.org/10.1145/2976002.2976018
19. Lutz, C.: Janus: a time-reversible language. Letter to R. Landauer. (1986), available
on: http://tetsuo.jp/ref/janus.pdf
20. Matsuda, K.: Modular inference of linear types for multiplicity-annotated arrows
(2020), http://arxiv.org/abs/1911.00268v2
21. Mazurak, K., Zhao, J., Zdancewic, S.: Lightweight linear types in System F◦.
In: TLDI. pp. 77–88. ACM (2010)
22. McBride, C.: I got plenty o’ nuttin’. In: Lindley, S., McBride, C., Trinder, P.W., San-
nella, D. (eds.) A List of Successes That Can Change the World - Essays Dedicated
to Philip Wadler on the Occasion of His 60th Birthday. Lecture Notes in Computer
Science, vol. 9600, pp. 207–233. Springer (2016). https://doi.org/10.1007/978-3-
319-30936-1_12
23. Mogensen, T.Æ.: Types for 0, 1 or many uses. In: Clack, C., Hammond, K.,
Davie, A.J.T. (eds.) Implementation of Functional Languages, 9th International
Workshop, IFL’97, St. Andrews, Scotland, UK, September 10-12, 1997, Selected
Papers. Lecture Notes in Computer Science, vol. 1467, pp. 112–122. Springer (1997).
https://doi.org/10.1007/BFb0055427
24. Morris, J.G.: The best of both worlds: linear functional programming with-
out compromise. In: Garrigue, J., Keller, G., Sumii, E. (eds.) Proceedings of
the 21st ACM SIGPLAN International Conference on Functional Programming,
ICFP 2016, Nara, Japan, September 18-22, 2016. pp. 448–461. ACM (2016).
https://doi.org/10.1145/2951913.2951925
25. Selinger, P., Valiron, B.: A lambda calculus for quantum computation with classical
control. Mathematical Structures in Computer Science 16(3), 527–552 (2006).
https://doi.org/10.1017/S0960129506005238
26. Spiwack, A., Domínguez, F., Boespflug, M., Bernardy, J.P.: Linear types. GHC
Proposals, https://github.com/tweag/ghc-proposals/blob/linear-types2/proposals/
0000-linear-types.rst, visited Sep. 11, 2019.
27. Stuckey, P.J., Sulzmann, M.: A theory of overloading. ACM Trans. Program. Lang.
Syst. 27(6), 1216–1269 (2005). https://doi.org/10.1145/1108970.1108974
28. Tov, J.A., Pucella, R.: Practical affine types. In: Ball, T., Sagiv, M. (eds.) Proceed-
ings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming
Languages, POPL 2011, Austin, TX, USA, January 26-28, 2011. pp. 447–458. ACM
(2011). https://doi.org/10.1145/1926385.1926436

29. Turner, D.N., Wadler, P., Mossin, C.: Once upon a type. In: Williams, J. (ed.)
Proceedings of the seventh international conference on Functional programming
languages and computer architecture, FPCA 1995, La Jolla, California, USA, June
25-28, 1995. pp. 1–11. ACM (1995). https://doi.org/10.1145/224164.224168
30. Vytiniotis, D., Peyton Jones, S.L., Schrijvers, T.: Let should not be gener-
alized. In: Kennedy, A., Benton, N. (eds.) Proceedings of TLDI 2010: 2010
ACM SIGPLAN International Workshop on Types in Languages Design and
Implementation, Madrid, Spain, January 23, 2010. pp. 39–50. ACM (2010).
https://doi.org/10.1145/1708016.1708023
31. Vytiniotis, D., Peyton Jones, S.L., Schrijvers, T., Sulzmann, M.: OutsideIn(X): modular type inference with local assumptions. J. Funct. Program. 21(4-5), 333–
412 (2011). https://doi.org/10.1017/S0956796811000098
32. Wadler, P.: Linear types can change the world! In: Broy, M. (ed.) Programming
concepts and methods: Proceedings of the IFIP Working Group 2.2, 2.3 Working
Conference on Programming Concepts and Methods, Sea of Galilee, Israel, 2-5
April, 1990. p. 561. North-Holland (1990)
33. Wadler, P.: A taste of linear logic. In: Borzyszkowski, A.M., Sokolowski, S. (eds.)
Mathematical Foundations of Computer Science 1993, 18th International Sym-
posium, MFCS’93, Gdansk, Poland, August 30 - September 3, 1993, Proceed-
ings. Lecture Notes in Computer Science, vol. 711, pp. 185–210. Springer (1993).
https://doi.org/10.1007/3-540-57182-5_12
34. Wansbrough, K., Peyton Jones, S.L.: Once upon a polymorphic type. In: Appel,
A.W., Aiken, A. (eds.) POPL ’99, Proceedings of the 26th ACM SIGPLAN-SIGACT
Symposium on Principles of Programming Languages, San Antonio, TX, USA, Jan-
uary 20-22, 1999. pp. 15–28. ACM (1999). https://doi.org/10.1145/292540.292545
35. Yokoyama, T., Axelsen, H.B., Glück, R.: Towards a reversible functional language.
In: Vos, A.D., Wille, R. (eds.) RC. Lecture Notes in Computer Science, vol. 7165,
pp. 14–29. Springer (2011). https://doi.org/10.1007/978-3-642-29517-1_2

RustHorn: CHC-based Verification for Rust Programs

Yusuke Matsushita, Takeshi Tsukada, and Naoki Kobayashi

The University of Tokyo, Tokyo, Japan
{yskm24t,tsukada,koba}@is.s.u-tokyo.ac.jp

Abstract. Reduction to the satisfiability problem for constrained Horn clauses (CHCs) is a widely studied approach to automated program verification. The current CHC-based methods for pointer-manipulating programs, however, are not very scalable. This paper proposes a novel translation of pointer-manipulating Rust programs into CHCs, which clears away pointers and heaps by leveraging ownership. We formalize the translation for a simplified core of Rust and prove its correctness. We have implemented a prototype verifier for a subset of Rust and confirmed the effectiveness of our method.

1 Introduction

Reduction to constrained Horn clauses (CHCs) is a widely studied approach to


automated program verification [22,6]. A CHC is a Horn clause [30] equipped
with constraints, namely a formula of the form ϕ ⇐= ψ0 ∧ · · · ∧ ψk−1 , where ϕ
and ψ0 , . . . , ψk−1 are either an atomic formula of the form f (t0 , . . . , tn−1 ) (f is
a predicate variable and t0 , . . . , tn−1 are terms), or a constraint (e.g. a < b + 1).1
We call a finite set of CHCs a CHC system or sometimes just CHC. CHC solving
is an act of deciding whether a given CHC system S has a model, i.e. a valuation
for predicate variables that makes all the CHCs in S valid. A variety of program
verification problems can be naturally reduced to CHC solving.
For example, let us consider the following C code that defines McCarthy’s
91 function.

int mc91(int n) {
  if (n > 100) return n - 10;
  else return mc91(mc91(n + 11));
}

Suppose that we wish to prove that mc91(n) returns 91 whenever n ≤ 101 (if it terminates). The desired property is equivalent to the satisfiability of the following CHCs, where Mc91(n, r) means that mc91(n) returns r if it terminates.
Mc91 (n, r) ⇐= n > 100 ∧ r = n − 10

⋆ The full version of this paper is available as [47].
¹ Free variables are universally quantified. Terms and variables are governed under sorts (e.g. int, bool), which are made explicit in the formalization of § 3.


Mc91(n, r) ⇐= n ≤ 100 ∧ Mc91(n + 11, res′) ∧ Mc91(res′, r)
r = 91 ⇐= n ≤ 101 ∧ Mc91(n, r)
The property can be verified because this CHC system has a model:
Mc91 (n, r) :⇐⇒ r = 91 ∨ (n > 100 ∧ r = n − 10).
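As a quick sanity check (ours, not spelled out in the paper), one can verify this valuation clause by clause. The first clause holds by the second disjunct of the model. For the second clause, suppose n ≤ 100, Mc91(n + 11, res′), and Mc91(res′, r): either res′ = 91, so Mc91(91, r) forces r = 91 (its second disjunct would need 91 > 100), or n + 11 > 100 and res′ = n + 1 ∈ {91, …, 101}, in which case Mc91(res′, r) again gives r = 91 (the second disjunct would force res′ = 101 and r = 91); either way the head holds by its first disjunct. For the third clause, Mc91(n, r) with n ≤ 101 leaves only r = 91 or n = 101 ∧ r = 101 − 10 = 91, so r = 91 in both cases.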
A CHC solver provides a common infrastructure for a variety of programming
languages and properties to be verified. There have been effective CHC solvers
[40,18,29,12] that can solve instances obtained from actual programs2 and many
program verification tools [23,37,25,28,38,60] use a CHC solver as a backend.
However, the current CHC-based methods do not scale very well for programs
using pointers, as we see in § 1.1. We propose a novel method to tackle this
problem for pointer-manipulating programs under Rust-style ownership, as we
explain in § 1.2.

1.1 Challenges in Verifying Pointer-Manipulating Programs


The standard CHC-based approach [23] for pointer-manipulating programs rep-
resents the memory state as an array, which is passed around as an argument
of each predicate (cf. the store-passing style), and a pointer as an index.
For example, a pointer-manipulating variation of the previous program
void mc91p(int n, int* r) {
  if (n > 100) *r = n - 10;
  else { int s; mc91p(n + 11, &s); mc91p(s, r); }
}
is translated into the following CHCs by the array-based approach:3
Mc91p(n, r, h, h′) ⇐= n > 100 ∧ h′ = h{r ← n − 10}
Mc91p(n, r, h, h′′) ⇐= n ≤ 100 ∧ Mc91p(n + 11, s, h, h′)
    ∧ Mc91p(h′[s], r, h′, h′′)
h′[r] = 91 ⇐= n ≤ 101 ∧ Mc91p(n, r, h, h′)

Mc91p additionally takes two arrays h, h′ representing the (heap) memory states before/after the call of mc91p. The second argument r of Mc91p, which corresponds to the pointer argument r in the original program, is an index into the arrays. Hence, the assignment *r = n - 10 is modeled in the first CHC as an update of the r-th element of the array. This CHC system has a model

Mc91p(n, r, h, h′) :⇐⇒ h′[r] = 91 ∨ (n > 100 ∧ h′[r] = n − 10),

which can be found by some array-supporting CHC solvers, including Spacer [40], thanks to evolving SMT-solving techniques for arrays [62,10].
However, the array-based approach has some shortcomings. Let us consider,
for example, the following innocent-looking code.4
² For example, the above CHC system on Mc91 can be solved instantly by many CHC solvers including Spacer [40] and HoIce [12].
³ h{r ← v} is the array made from h by replacing the value at index r with v. h[r] is the value of array h at index r.
⁴ rand() is a non-deterministic function that can return any integer value.

bool just_rec(int* ma) {
  if (rand() >= 0) return true;
  int old_a = *ma; int b = rand(); just_rec(&b);
  return (old_a == *ma);
}

It can immediately return true; or it recursively calls itself and checks if the
target of ma remains unchanged through the recursive call. In effect this function
does nothing on the allocated memory blocks, although it can possibly modify
some of the unused parts of the memory.
Suppose we wish to verify that just_rec never returns false. The standard
CHC-based verifier for C, SeaHorn [23], generates a CHC system like below:⁵,⁶

JustRec(ma, h, h′, r) ⇐= h′ = h ∧ r = true
JustRec(ma, h, h′′, r) ⇐= mb ≠ ma ∧ h′ = h{mb ← b}
    ∧ JustRec(mb, h′, h′′, r′) ∧ r = (h[ma] == h′′[ma])
r = true ⇐= JustRec(ma, h, h′, r)
Unfortunately, the CHC system above is not satisfiable, and thus SeaHorn issues a false alarm. This is because, in this formulation, mb may not necessarily be completely fresh; it is assumed to be different from the argument ma of the current call, but it may coincide with ma of some deep ancestor call.⁷
The simplest remedy would be to explicitly specify the way of memory allocation. For example, one can represent the memory state as a pair of an array h and an index sp indicating the maximum index that has been allocated so far.

JustRec⁺(ma, h, sp, h′, sp′, r) ⇐= h′ = h ∧ sp′ = sp ∧ r = true
JustRec⁺(ma, h, sp, h′′, sp′′, r) ⇐= mb = sp′ = sp + 1 ∧ h′ = h{mb ← b}
    ∧ JustRec⁺(mb, h′, sp′, h′′, sp′′, r′) ∧ r = (h[ma] == h′′[ma])
r = true ⇐= JustRec⁺(ma, h, sp, h′, sp′, r) ∧ ma ≤ sp

The resulting CHC system now has a model, but it involves quantifiers:

JustRec⁺(ma, h, sp, h′, sp′, r) :⇐⇒ r = true ∧ ∀i ≤ sp. h[i] = h′[i]
Finding quantified invariants is known to be difficult in general despite ac-
tive studies on it [41,2,36,26,19] and most current array-supporting CHC solvers
give up finding quantified invariants. In general, much more complex operations
on pointers can naturally take place, which makes the universally quantified in-
variants highly involved and hard to automatically find. To avoid complexity of
models, CHC-based verification tools [23,24,37] tackle pointers by pointer anal-
ysis [61,43]. Although it does have some effects, the current applicable scope of
pointer analysis is quite limited.

⁵ ==, !=, >=, && denote binary operations that return boolean values.
⁶ We omitted the allocation for old_a for simplicity.
⁷ Precisely speaking, SeaHorn tends to omit even shallow address-freshness checks like mb ≠ ma.

1.2 Our Approach: Leverage Rust’s Ownership System


This paper proposes a novel approach to CHC-based verification of pointer-
manipulating programs, which makes use of ownership information to avoid an
explicit representation of the memory.

Rust-style Ownership. Various styles of ownership/permission/capability have been introduced to control and reason about the usage of pointers in programming language design, program analysis, and verification [13,31,8,9,7,64,63]. In what follows, we focus on ownership in the style of the Rust programming language [46,55].
Roughly speaking, the ownership system guarantees that, for each memory
cell and at each point of program execution, either (i) only one alias has the
update (write & read) permission to the cell, with any other alias having no
permission to it, or (ii) some (or no) aliases have the read permission to the cell,
with no alias having the update permission to it. In summary, when an alias
can read some data (with an update/read permission), any other alias cannot
modify the data.
As a running example, let us consider the program below, which follows Rust's ownership discipline (it is written in the C style; the Rust version is presented in Example 1):
int* take_max(int* ma, int* mb) {
  if (*ma >= *mb) return ma; else return mb;
}
bool inc_max(int a, int b) {
  {
    int* mc = take_max(&a, &b); // borrow a and b
    *mc += 1;
  } // end of borrow
  return (a != b);
}
Figure 1 illustrates which alias has the update permission to the contents of a
and b during the execution of take_max(5,3).
A notable feature is borrow. In the running example, when the pointers &a
and &b are taken for take_max, the update permissions of a and b are temporarily
transferred to the pointers. The original variables, a and b, lose the ability to
access their contents until the end of borrow. The function take_max returns a
pointer having the update permission until the end of borrow, which justifies the
update operation *mc += 1. In this example, the end of borrow is at the end of
the inner block of inc_max. At this point, the permissions are given back to the
original variables a and b, allowing to compute a != b. Note that mc can point
to a and also to b and that this choice is determined dynamically. The values of
a and b after the borrow depend on the behavior of the pointer mc.
The end of each borrow is statically managed by a lifetime. See § 2 for a more
precise explanation of ownership, borrow and lifetimes.

[Fig. 1 is a timeline diagram showing the permission timelines of mc, ma, a, mb, and b over four phases (i)–(iv), whose boundaries are the call of take_max, the return of take_max, and the end of borrowing.]

Fig. 1. Values and aliases of a and b in evaluating inc_max(5,3). Each line shows each variable's permission timeline: a solid line expresses the update permission and a bullet shows a point when the borrowed permission is given back. For example, b has the update permission to its content during (i) and (iv), but not during (ii) and (iii), because the pointer mb, created at the call of take_max, borrows b until the end of (iii).

Key Idea. The key idea of our method is to represent a pointer ma as a pair ⟨a, a◦⟩ of the current target value a and the target value a◦ at the end of borrow.8,9 This representation employs access to future information (it is related to prophecy variables; see § 5). This simple idea turns out to be very powerful.
In our approach, the verification problem “Does inc_max always return true?” is reduced to the satisfiability of the following CHCs:

TakeMax(⟨a, a◦⟩, ⟨b, b◦⟩, r) ⇐= a ≥ b ∧ b◦ = b ∧ r = ⟨a, a◦⟩
TakeMax(⟨a, a◦⟩, ⟨b, b◦⟩, r) ⇐= a < b ∧ a◦ = a ∧ r = ⟨b, b◦⟩
IncMax(a, b, r) ⇐= TakeMax(⟨a, a◦⟩, ⟨b, b◦⟩, ⟨c, c◦⟩) ∧ c′ = c + 1
                    ∧ c◦ = c′ ∧ r = (a◦ != b◦)
r = true ⇐= IncMax(a, b, r).
The mutable reference ma is now represented as ⟨a, a◦⟩, and similarly for mb and mc. The first CHC models the then-clause of take_max: the return value is ma, which is expressed as r = ⟨a, a◦⟩; in contrast, mb is released, which constrains b◦, the value of b at the end of borrow, to the current value b. In the clause on IncMax, mc is represented as a pair ⟨c, c◦⟩. The constraint c′ = c + 1 ∧ c◦ = c′ models the increment of mc (in phase (iii) in Fig. 1). Importantly, the final check a != b is simply expressed as a◦ != b◦; the updated values of a/b are available as a◦/b◦. Clearly, the CHC system above has a simple model.
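For instance, consider the call inc_max(5, 3) (cf. footnote 9): TakeMax is applied to ⟨5, 6⟩ and ⟨3, 3⟩. The first clause fires (5 ≥ 3), forcing b◦ = 3 and r = ⟨5, 6⟩, so ⟨c, c◦⟩ = ⟨5, 6⟩. The IncMax clause then requires c′ = 5 + 1 = 6 and c◦ = c′, consistent with the prophesied final value 6, and finally r = (a◦ != b◦) = (6 != 3) = true.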
Also, the just_rec example in § 1.1 can be encoded as a CHC system

JustRec(⟨a, a◦⟩, r) ⇐= a◦ = a ∧ r = true
JustRec(⟨a, a◦⟩, r) ⇐= mb = ⟨b, b◦⟩ ∧ JustRec(mb, r′)
                        ∧ a◦ = a ∧ r = (a == a)
8 Precisely, this is the representation of a pointer with a borrowed update permission (i.e. mutable reference). Other cases are discussed in § 3.
9 For example, in the case of Fig. 1, when take_max is called, the pointer ma is ⟨5, 6⟩ and mb is ⟨3, 3⟩.
r = true ⇐= JustRec(⟨a, a◦⟩, r).

Now it has a simple model: JustRec(⟨a, a◦⟩, r) :⇐⇒ r = true ∧ a◦ = a. Remarkably, arrays and quantified formulas are not required to express the model, which allows the CHC system to be easily solved by many CHC solvers. More advanced examples are presented in § 3.4, including one with destructive update on a singly linked list.

Contributions. Based on the above idea, we formalize the translation from pro-
grams to CHC systems for a core language of Rust, prove correctness (both
soundness and completeness) of the translation, and confirm the effectiveness
of our approach through preliminary experiments. The core language supports,
among others, recursive types. Remarkably, our approach enables us to automat-
ically verify some properties of a program with destructive updates on recursive
data types such as lists and trees.
The rest of the paper is structured as follows. In § 2, we provide a formalized core language of Rust supporting recursion, lifetime-based ownership and recursive types. In § 3, we formalize our translation from programs to CHCs and prove its correctness. In § 4, we report on the implementation and the experimental results. In § 5 we discuss related work and in § 6 we conclude the paper.

2 Core Language: Calculus of Ownership and Reference


We formalize a core of Rust as the Calculus of Ownership and Reference (COR), whose design is influenced by the safe layer of λRust in the RustBelt paper [32]. It is a typed procedural language with a Rust-like ownership system.

2.1 Syntax
The following is the syntax of COR.

(program) Π ::= F0 · · · Fn−1
(function definition) F ::= fn f Σ {L0: S0 · · · Ln−1: Sn−1}
(function signature) Σ ::= ⟨α0, . . . , αm−1 | αa0 ≤ αb0, . . . , αal−1 ≤ αbl−1⟩(x0: T0, . . . , xn−1: Tn−1) → U
(statement) S ::= I; goto L | return x
    | match ∗x {inj0 ∗y0 → goto L0, inj1 ∗y1 → goto L1}
(instruction) I ::= let y = mutborα x | drop x | immut x | swap(∗x, ∗y)
    | let ∗y = x | let y = ∗x | let ∗y = copy ∗x | x as T
    | let y = f⟨α0, . . . , αm−1⟩(x0, . . . , xn−1)
    | intro α | now α | α ≤ β
    | let ∗y = const | let ∗y = ∗x op ∗x′ | let ∗y = rand()
    | let ∗y = inj_i^{T0+T1} ∗x | let ∗y = (∗x0, ∗x1) | let (∗y0, ∗y1) = ∗x
(type) T, U ::= X | μX.T | P T | T0+T1 | T0×T1 | int | unit
(pointer kind) P ::= own | Rα   (reference kind) R ::= mut | immut

α, β, γ ::= (lifetime variable)   X, Y ::= (type variable)
x, y ::= (variable)   f, g ::= (function name)   L ::= (label)
const ::= n | ()   bool := unit + unit   op ::= op_int | op_bool
op_int ::= + | − | · · ·   op_bool ::= >= | == | != | · · ·

Program, Function and Label. A program (denoted by Π) is a set of function definitions. A function definition (F) consists of a function name, a function signature and a set of labeled statements (L: S). In COR, for simplicity, the input/output types of a function are restricted to pointer types. A function is parametrized over lifetime parameters under constraints; polymorphism on types is not supported for simplicity, just as in λRust. For the lifetime parameter receiver, ⟨α0, · · · |⟩ is often abbreviated to ⟨α0, . . .⟩ and ⟨|⟩ is omitted.
A label (L) is an abstract program point to be jumped to by goto.10 Each label is assigned a whole context by the type system, as we see later. This style, with unstructured control flow, helps the formal description of CHCs in § 3.2. A function should have the label entry (the entry point), and every label in a function should be syntactically reachable from entry by goto jumps.11

Statement and Instruction. A statement (S) performs an instruction with a jump (I; goto L), returns from a function (return x), or branches (match ∗x {· · ·}).
An instruction (I) performs an elementary operation: mutable (re)borrow (let y = mutborα x), releasing a variable (drop x), weakening ownership (immut x),12 swap (swap(∗x, ∗y)), creating/dereferencing a pointer (let ∗y = x, let y = ∗x), copy (let ∗y = copy ∗x),13 type weakening (x as T), function call (let y = f⟨· · ·⟩(· · ·)), lifetime-related ghost operations (intro α, now α, α ≤ β; explained later), getting a constant / operation result / random integer (let ∗y = const / ∗x op ∗x′ / rand()), creating a variant (let ∗y = inj_i^{T0+T1} ∗x), and creating/destructing a pair (let ∗y = (∗x0, ∗x1), let (∗y0, ∗y1) = ∗x). An instruction of the form let ∗y = · · · implicitly allocates new memory cells as y; also, some instructions deallocate memory cells implicitly. For simplicity, every variable is designed to be a pointer and every release of a variable should be explicitly annotated by ‘drop x’. In addition, we provide swap instead of assignment; the usual assignment (of copyable data from ∗x to ∗y) can be expressed by let ∗x′ = copy ∗x; swap(∗y, ∗x′); drop x′.

Type. As a type (T ), we support recursive types (μX.T ), pointer types (P T ),


variant types (T0 + T1 ), pair types (T0 × T1 ) and basic types (int, unit).
A pointer type P T can be an owning pointer own T (Box<T> in Rust), muta-
ble reference mutα T (&'a mut T) or immutable reference immutα T (&'a T). An
10 It is related to a continuation introduced by letcont in λRust.
11 Here ‘syntactically’ means that detailed information such as a branch condition on match or non-termination is ignored.
12 This instruction turns a mutable reference into an immutable reference. Using this, an immutable borrow from x to y can be expressed by let y = mutborα x; immut y.
13 Copying a pointer (an immutable reference) x to y can be expressed by let ∗ox = x; let ∗oy = copy ∗ox; let y = ∗oy.
owning pointer has data in the heap memory, can freely update the data (un-
less it is borrowed), and has the obligation to clean up the data from the heap
memory. In contrast, a mutable/immutable reference (or unique/shared refer-
ence) borrows an update/read permission from an owning pointer or another
reference with the deadline of a lifetime α (introduced later). A mutable ref-
erence cannot be copied, while an immutable reference can be freely copied. A
reference loses the permission at the time when it is released.14
A type T that appears in a program (not just as a substructure of some type) should satisfy the following condition (if it holds we say the type is complete): every type variable X in T is bound by some μ and guarded by a pointer constructor (i.e. given a binding of the form μX.U, every occurrence of X in U is a part of a pointer type, of the form P U′).

Lifetime. A lifetime is an abstract time point in the process of computation,15


which is statically managed by lifetime variables α. A lifetime variable can be a
lifetime parameter that a function takes or a local lifetime variable introduced
within a function. We have three lifetime-related ghost instructions: intro α in-
troduces a new local lifetime variable, now α sets a local lifetime variable to
the current moment and eliminates it, and α ≤ β asserts the ordering on local
lifetime variables.

Expressivity and Limitations. COR can express most borrow patterns in the
core of Rust. The set of moments when a borrow is active forms a continuous
time range, even under non-lexical lifetimes [54].16
A major limitation of COR is that it does not support unsafe code blocks and
also lacks type traits and closures. Still, our idea can be combined with unsafe
code and closures, as discussed in §3.5. Another limitation of COR is that, unlike
Rust and λRust , we cannot directly modify/borrow a fragment of a variable (e.g.
an element of a pair). Still, we can eventually modify/borrow a fragment by
borrowing the whole variable and splitting pointers (e.g. ‘let (∗y0 , ∗y1 ) = ∗x’).
This borrow-and-split strategy, nevertheless, yields a subtle obstacle when we extend the calculus to advanced data types (e.g. get_default in ‘Problem Case #3’ from [54]). As future work, we will pursue a more expressive calculus modeling Rust and extend our verification method to it.
14 In Rust, even after a reference loses the permission and the lifetime ends, its address data can linger in the memory, although dereferencing on the reference is no longer allowed. We simplify the behavior of lifetimes in COR.
15 In the terminology of Rust, a lifetime often means a time range where a borrow is active. To simplify the discussions, however, we in this paper use the term lifetime to refer to the time point when a borrow ends.
16 Strictly speaking, this property is broken by the recently adopted implicit two-phase borrows [59,53]. However, by shallow syntactic reordering, a program with implicit two-phase borrows can be fitted into usual borrow patterns.

Example 1 (COR Program). The following program expresses the functions take_max and inc_max presented in § 1.2. We shorthand sequential executions by ‘;L’ (e.g. L0: I0;L1 I1; goto L2 stands for L0: I0; goto L1  L1: I1; goto L2).17
fn take-max⟨α⟩(ma: mutα int, mb: mutα int) → mutα int {
  entry: let ∗ord = ∗ma >= ∗mb;L1 match ∗ord {inj1 ∗ou → goto L2, inj0 ∗ou → goto L5}
  L2: drop ou;L3 drop mb;L4 return ma   L5: drop ou;L6 drop ma;L7 return mb
}
fn inc-max(oa: own int, ob: own int) → own bool {
  entry: intro α;L1 let ma = mutborα oa;L2 let mb = mutborα ob;L3
    let mc = take-max⟨α⟩(ma, mb);L4 let ∗o1 = 1;L5 let ∗oc′ = ∗mc + ∗o1;L6 drop o1;L7
    swap(mc, oc′);L8 drop oc′;L9 drop mc;L10 now α;L11 let ∗or = ∗oa != ∗ob;L12
    drop oa;L13 drop ob;L14 return or
}

In take-max, conditional branching is performed by match and its goto directions (at L1). In inc-max, increment on the mutable reference mc is performed by calculating the new value (at L4, L5) and updating the data by swap (at L7).
The following is the corresponding Rust program, with ghost annotations
(marked italic and dark green, e.g. drop ma ) on lifetimes and releases of mutable
references.

fn take_max<'a>(ma: &'a mut i32, mb: &'a mut i32) -> &'a mut i32 {
  if *ma >= *mb { drop mb; ma } else { drop ma; mb }
}
fn inc_max(mut a: i32, mut b: i32) -> bool {
  { intro 'a;
    let mc = take_max<'a>(&'a mut a, &'a mut b); *mc += 1;
    drop mc; now 'a; }
  a != b
}

2.2 Type System

The type system of COR assigns to each label a whole context (Γ, A). We define
below the whole context and the typing judgments.

Context. A variable context Γ is a finite set of items of form x:a T , where T


should be a complete pointer type and a (which we call activeness) is of form
‘active’ or ‘†α’ (frozen until lifetime α). We abbreviate x:active T as x: T . A
variable context should not contain two items on the same variable. A lifetime
context A = (A, R) is a finite preordered set of lifetime variables, where A is the
underlying set and R is the preorder. We write |A| and ≤A to refer to A and R.
Finally, a whole context (Γ, A) is a pair of a variable context Γ and a lifetime
context A such that every lifetime variable in Γ is contained in A.
17 The first character of each variable indicates the pointer kind (o/m corresponds to own/mutα). We swap the branches of the match statement in take-max to fit the order of C/Rust’s if.
Notations. The set operation A + B (or more generally Σλ Aλ) denotes the disjoint union, i.e. the union defined only if the arguments are disjoint. The set operation A − B denotes the set difference defined only if A ⊇ B. For a natural number n, [n] denotes the set {0, . . . , n−1}.
Generally, an auxiliary definition for a rule can be presented just below, possibly in a dotted box.

Program and Function. The rules for typing programs and functions are presented below. They assign to each label a whole context (Γ, A). ‘S:Π,f (Γ, A) | (ΓL, AL)L | U’ is explained later.

for any F in Π, F:Π (Γname(F),L, Aname(F),L)L∈LabelF
──────────────────────────────
Π: (Γf,L, Af,L)(f,L)∈FnLabelΠ

name(F): the function name of F   LabelF: the set of labels in F
FnLabelΠ: the set of pairs (f, L) such that a function f in Π has a label L

F = fn f ⟨α0, . . . , αm−1 | αa0 ≤ αb0, . . . , αal−1 ≤ αbl−1⟩(x0: T0, . . . , xn−1: Tn−1) → U {· · ·}
Γentry = {xi: Ti | i ∈ [n]}   A = {αj | j ∈ [m]}   Aentry = (A, (IdA ∪ {(αak, αbk) | k ∈ [l]})+)
for any L: S ∈ LabelStmtF, S:Π,f (ΓL, AL) | (ΓL′, AL′)L′∈LabelF | U
──────────────────────────────
F:Π (ΓL, AL)L∈LabelF

LabelStmtF: the set of labeled statements in F
IdA: the identity relation on A   R+: the transitive closure of R

In the rule for functions, the initial whole context at entry is specified (the second and third preconditions) and the contexts for the other labels are checked (the fourth precondition). The context for each label (in each function) could be determined in order of the distance, in the number of goto jumps, from entry, but that order is not obvious because of the unstructured control flow.

Statement. ‘S:Π,f (Γ, A) | (ΓL, AL)L | U’ means that running the statement S (under Π, f) with the whole context (Γ, A) results in a jump to a label with the whole contexts specified by (ΓL, AL)L or a return of data of type U. Its rules are presented below. ‘I:Π,f (Γ, A) → (Γ′, A′)’ is explained later.

I:Π,f (Γ, A) → (ΓL0, AL0)
──────────────────────────────
I; goto L0 :Π,f (Γ, A) | (ΓL, AL)L | U

Γ = {x: U}   |A| = AexΠ,f
──────────────────────────────
return x :Π,f (Γ, A) | (ΓL, AL)L | U

AexΠ,f: the set of lifetime parameters of f in Π

x: P (T0+T1) ∈ Γ   for i = 0, 1, (ΓLi, ALi) = (Γ − {x: P (T0+T1)} + {yi: P Ti}, A)
──────────────────────────────
match ∗x {inj0 ∗y0 → goto L0, inj1 ∗y1 → goto L1} :Π,f (Γ, A) | (ΓL, AL)L | U

The rule for the return statement ensures that there remain no extra variables and no local lifetime variables.

Instruction. ‘I:Π,f (Γ, A) → (Γ′, A′)’ means that running the instruction I (under Π, f) updates the whole context (Γ, A) into (Γ′, A′). The rules are designed so that, for any I, Π, f, (Γ, A), there exists at most one (Γ′, A′) such that I:Π,f (Γ, A) → (Γ′, A′) holds. Below we present some of the rules; the complete rules are presented in the full paper. The following is the typing rule for mutable (re)borrow.

α ∉ AexΠ,f   P = own, mutα′   for any β ∈ LifetimeP T, α ≤A β
──────────────────────────────
let y = mutborα x :Π,f (Γ + {x: P T}, A) → (Γ + {y: mutα T, x:†α P T}, A)

LifetimeT: the set of lifetime variables occurring in T

After one mutably (re)borrows an owning pointer / mutable reference x until α, x is frozen until α. Here, α should be a local lifetime variable18 (the first precondition) that does not live longer than the data of x (the third precondition). Below are the typing rules for local lifetime variable introduction and elimination.
are the typing rules for local lifetime variable introduction and elimination.
   
α ∉ AexΠ,f
──────────────────────────────
intro α :Π,f (Γ, (A, R)) → (Γ, ({α} + A, {α} × ({α} + AexΠ,f) + R))

──────────────────────────────
now α :Π,f (Γ, ({α} + A, R)) → ({thawα(x:a T) | x:a T ∈ Γ}, (A, {(β, γ) ∈ R | β ≠ α}))

thawα(x:a T) := x: T (if a = †α),   x:a T (otherwise)

On intro α, it just ensures that the new local lifetime variable is earlier than any lifetime parameters (which are given by exterior functions). On now α, the variables frozen with α become active again. Below is the typing rule for dereference of a pointer to a pointer, which may be a bit interesting.

──────────────────────────────
let y = ∗x :Π,f (Γ + {x: P P′ T}, A) → (Γ + {y: (P ◦ P′) T}, A)

P ◦ own = own ◦ P := P   Rα ◦ R′β := R″α, where R″ = mut (if R = R′ = mut) and R″ = immut (otherwise)

The third precondition of the typing rule for mutbor justifies taking just α in the rule ‘Rα ◦ R′β := R″α’.
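For example, dereferencing x: own mutα int yields y: mutα int (since own ◦ mutα = mutα), while dereferencing x: mutα immutβ int yields y: immutα int, because one of the two composed reference kinds is immut.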

Let us interpret ‘Π: (Γf,L, Af,L)(f,L)∈FnLabelΠ’ as “the program Π has the type (Γf,L, Af,L)(f,L)∈FnLabelΠ”. The type system ensures that any program has at most one type (which may be a bit unclear because of the unstructured control flow). Hereinafter, we implicitly assume that a program has a type.

2.3 Concrete Operational Semantics


We introduce for COR concrete operational semantics, which handles a concrete
model of the heap memory.
The basic item, concrete configuration C, is defined as follows.
(concrete configuration) C ::= [f, L] F; S | H   (call stack) S ::= [f, L] x, F; S | end


Here, H is a heap, which maps addresses (represented by integers) to integers


(data). F is a concrete stack frame, which maps variables to addresses. The stack
18 In COR, a reference that lives after the return from the function should be created by splitting a reference (e.g. ‘let (∗y0, ∗y1) = ∗x’) given in the inputs; see also Expressivity and Limitations.
part of C is of the form ‘[f, L] F; [f′, L′] x, F′; · · · ; end’ (we may omit the terminator ‘; end’). [f, L] on each stack frame indicates the program point. ‘x,’ on each non-top stack frame is the receiver of the value returned by the function call.
Concrete operational semantics is characterized by the one-step transition relation C →Π C′ and the termination relation finalΠ(C), which can be defined straightforwardly. Below we show the rules for mutable (re)borrow, swap, function call and return from a function; the complete rules and an example execution are presented in the full paper. SΠ,f,L is the statement for the label L of the function f in Π. TyΠ,f,L(x) is the type of variable x at the label.
SΠ,f,L = let y = mutborα x; goto L′   F(x) = a
──────────────────────────────
[f, L] F; S | H →Π [f, L′] F + {(y, a)}; S | H

SΠ,f,L = swap(∗x, ∗y); goto L′   TyΠ,f,L(x) = P T   F(x) = a   F(y) = b
──────────────────────────────
[f, L] F; S | H + {(a+k, mk) | k ∈ [#T]} + {(b+k, nk) | k ∈ [#T]}
  →Π [f, L′] F; S | H + {(a+k, nk) | k ∈ [#T]} + {(b+k, mk) | k ∈ [#T]}

SΠ,f,L = let y = g⟨· · ·⟩(x0, . . . , xn−1); goto L′   ΣΠ,g = ⟨· · ·⟩(x′0: T0, . . . , x′n−1: Tn−1) → U
──────────────────────────────
[f, L] F + {(xi, ai) | i ∈ [n]}; S | H →Π [g, entry] {(x′i, ai) | i ∈ [n]}; [f, L′] y, F; S | H

SΠ,f,L = return x
──────────────────────────────
[f, L] {(x, a)}; [g, L′] x′, F′; S | H →Π [g, L′] F′ + {(x′, a)}; S | H

SΠ,f,L = return x
──────────────────────────────
finalΠ([f, L] {(x, a)} | H)

Here we introduce ‘#T’, which represents how many memory cells the type T takes (at the outermost level). #T is defined for every complete type T, because every occurrence of type variables in a complete type is guarded by a pointer constructor.

#(T0+T1) := 1 + max{#T0, #T1}   #(T0×T1) := #T0 + #T1
#(μX.T) := #(T[μX.T/X])   #int = #(P T) := 1   #unit := 0
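For instance, for the list type from § 3.4, List = μX. (int × own X) + unit, we get #List = 1 + max{#(int × own List), #unit} = 1 + max{1 + 1, 0} = 3: one cell for the tag and two for the payload of the Cons case.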

3 CHC Representation of COR Programs


To formalize the idea discussed in § 1, we give a translation from COR programs to CHC systems, which precisely characterize the input-output relations of the COR programs. We first define the logic for CHCs (§ 3.1). We then formally describe our translation (§ 3.2) and prove its correctness (§ 3.3). Also, we examine the effectiveness of our approach with advanced examples (§ 3.4) and discuss how our idea can be extended and enhanced (§ 3.5).

3.1 Multi-sorted Logic for Describing CHCs


To begin with, we introduce a first-order multi-sorted logic for describing the
CHC representation of COR programs.
Syntax. The syntax is defined as follows.

(CHC) Φ ::= ∀x0: σ0, . . . , xm−1: σm−1. ϕ̌ ⇐= ψ0 ∧ · · · ∧ ψn−1
⊤ := the nullary conjunction of formulas
(formula) ϕ, ψ ::= f(t0, . . . , tn−1)   (elementary formula) ϕ̌ ::= f(p0, . . . , pn−1)
(term) t ::= x | ⟨t⟩ | ⟨t∗, t◦⟩ | inji t | (t0, t1) | ∗t | ◦t | t.i | const | t op t′
(value) v, w ::= ⟨v⟩ | ⟨v∗, v◦⟩ | inji v | (v0, v1) | const
(pattern) p, q ::= x | ⟨p⟩ | ⟨p∗, p◦⟩ | inji p | (p0, p1) | const
(sort) σ, τ ::= X | μX.σ | C σ | σ0 + σ1 | σ0 × σ1 | int | unit
(container kind) C ::= box | mut   const ::= same as COR   op ::= same as COR
bool := unit + unit   true := inj1()   false := inj0()
X ::= (sort variable)   x, y ::= (variable)   f ::= (predicate variable)

We introduce box σ and mut σ, which correspond to own T/immutα T and mutα T respectively. ⟨t⟩/⟨t∗, t◦⟩ is the constructor for box σ/mut σ. ∗t takes the body/first value of ⟨−⟩/⟨−, −⟩ and ◦t takes the second value of ⟨−, −⟩. We restrict the form of CHCs here to simplify the proofs later. Although the logic does not have a primitive for equality, we can define equality in a CHC system (e.g. by adding ∀x: σ. Eq(x, x) ⇐= ⊤).
A CHC system (Φ, Ξ) is a pair of a finite set of CHCs Φ = {Φ0, . . . , Φn−1} and Ξ, a finite map from predicate variables to tuples of sorts, specifying the sorts of the input values. Unlike the informal description in § 1, we add Ξ to a CHC system.

Sort System. ‘t:Δ σ’ (the term t has the sort σ under Δ) is defined as follows. Here, Δ is a finite map from variables to sorts. σ ∼ τ is the congruence on sorts induced by μX.σ ∼ σ[μX.σ/X].

x:Δ σ if Δ(x) = σ.   ⟨t⟩:Δ box σ if t:Δ σ.   ⟨t∗, t◦⟩:Δ mut σ if t∗, t◦:Δ σ.
inji t:Δ σ0 + σ1 if t:Δ σi.   (t0, t1):Δ σ0 × σ1 if t0:Δ σ0 and t1:Δ σ1.
∗t:Δ σ if t:Δ C σ.   ◦t:Δ σ if t:Δ mut σ.   t.i:Δ σi if t:Δ σ0 × σ1.
const:Δ σconst.   t op t′:Δ σop if t, t′:Δ int.   t:Δ τ if t:Δ σ and σ ∼ τ.

σconst: the sort of const   σop: the output sort of op

‘wellSortedΔ,Ξ(ϕ)’ and ‘wellSortedΞ(Φ)’, the judgments on well-sortedness of formulas and CHCs, are defined as follows.

Ξ(f) = (σ0, . . . , σn−1)   for any i ∈ [n], ti:Δ σi
──────────────────────────────
wellSortedΔ,Ξ(f(t0, . . . , tn−1))

Δ = {(xi, σi) | i ∈ [m]}   wellSortedΔ,Ξ(ϕ̌)   for any j ∈ [n], wellSortedΔ,Ξ(ψj)
──────────────────────────────
wellSortedΞ(∀x0: σ0, . . . , xm−1: σm−1. ϕ̌ ⇐= ψ0 ∧ · · · ∧ ψn−1)

The CHC system (Φ, Ξ) is said to be well-sorted if wellSortedΞ(Φ) holds for any Φ ∈ Φ.

Semantics. ‘[[t]]I’, the interpretation of the term t as a value under I, is defined as follows. Here, I is a finite map from variables to values. Although the definition is partial, the interpretation is defined for all well-sorted terms.

[[x]]I := I(x)   [[⟨t⟩]]I := ⟨[[t]]I⟩   [[⟨t∗, t◦⟩]]I := ⟨[[t∗]]I, [[t◦]]I⟩   [[inji t]]I := inji [[t]]I
[[(t0, t1)]]I := ([[t0]]I, [[t1]]I)
[[∗t]]I := v (if [[t]]I = ⟨v⟩), v∗ (if [[t]]I = ⟨v∗, v◦⟩)   [[◦t]]I := v◦ (if [[t]]I = ⟨v∗, v◦⟩)
[[t.i]]I := vi (if [[t]]I = (v0, v1))   [[const]]I := const   [[t op t′]]I := [[t]]I [[op]] [[t′]]I

[[op]]: the binary operation on values corresponding to op
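For example, under I = {x ↦ ⟨7, 9⟩} (a mut value whose first component is the current value and whose second is the value at the end of borrow), [[∗x]]I = 7, [[◦x]]I = 9 and [[∗x + ◦x]]I = 16.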

A predicate structure M is a finite map from predicate variables to (concrete) predicates on values. M, I |= f(t0, . . . , tn−1) means that M(f)([[t0]]I, . . . , [[tn−1]]I) holds. M |= Φ is defined as follows.

for any I s.t. ∀i ∈ [m]. I(xi):∅ σi, M, I |= ψ0, . . . , ψn−1 implies M, I |= ϕ̌
──────────────────────────────
M |= ∀x0: σ0, . . . , xm−1: σm−1. ϕ̌ ⇐= ψ0 ∧ · · · ∧ ψn−1

Finally, M |= (Φ, Ξ) is defined as follows.

dom M = dom Ξ   for any (f, (σ0, . . . , σn−1)) ∈ Ξ, M(f) is a predicate on values of sorts σ0, . . . , σn−1   for any Φ ∈ Φ, M |= Φ
──────────────────────────────
M |= (Φ, Ξ)

When M |= (Φ, Ξ) holds, we say that M is a model of (Φ, Ξ). Every well-sorted CHC system (Φ, Ξ) has a least model with respect to the point-wise ordering (which can be proved based on the discussions in [16]), which we write as Mleast(Φ,Ξ).

3.2 Translation from COR Programs to CHCs


Now we formalize our translation of Rust programs into CHCs. We define (|Π|), which is a CHC system that represents the input-output relations of the functions in the COR program Π.
Roughly speaking, the least model Mleast(|Π|) for this CHC system should satisfy: for any values v0, . . . , vn−1, w, Mleast(|Π|) |= fentry(v0, . . . , vn−1, w) holds exactly if, in COR, a function call f(v0, . . . , vn−1) can return w. Actually, in concrete operational semantics, such values should be read out from the heap memory. The formal description and proof of this expected property are presented in § 3.3.

Auxiliary Definitions. The sort corresponding to the type T, (|T|), is defined as follows. P̌ is a meta-variable for a non-mutable-reference pointer kind, i.e. own or immutα. Note that the information on lifetimes is all stripped off.

(|X|) := X   (|μX.T|) := μX.(|T|)   (|P̌ T|) := box (|T|)   (|mutα T|) := mut (|T|)
(|int|) := int   (|unit|) := unit   (|T0+T1|) := (|T0|) + (|T1|)   (|T0×T1|) := (|T0|) × (|T1|)

We introduce a special variable res to represent the result of a function.19 For a label L in a function f in a program Π, we define ϕ̌Π,f,L, ΞΠ,f,L and ΔΠ,f,L as follows, when the items in the variable context for the label are enumerated as x0:a0 T0, . . . , xn−1:an−1 Tn−1 and the return type of the function is U.

ϕ̌Π,f,L := fL(x0, . . . , xn−1, res)   ΞΠ,f,L := ((|T0|), . . . , (|Tn−1|), (|U|))
ΔΠ,f,L := {(xi, (|Ti|)) | i ∈ [n]} + {(res, (|U|))}

∀(Δ) stands for ∀x0: σ0, . . . , xn−1: σn−1, where the items in Δ are enumerated as (x0, σ0), . . . , (xn−1, σn−1).

19 For simplicity, we assume that the parameters of each function are sorted respecting some fixed order on variables (with res coming last), and we enumerate various items in this fixed order.

CHC Representation. Now we introduce ‘(|L: S|)Π,f’, the set (in most cases, a singleton) of CHCs modeling the computation performed by the labeled statement L: S in f from Π. Unlike the informal descriptions in § 1, we turn to pattern matching instead of equations, to simplify the proofs. Below we show some of the rules; the complete rules are presented in the full paper. The variables marked green (e.g. x◦) should be fresh. The following is the rule for mutable (re)borrow.

(|L: let y = mutborα x; goto L′|)Π,f :=
  { ∀(ΔΠ,f,L + {(x◦, (|T|))}). ϕ̌Π,f,L ⇐= ϕ̌Π,f,L′[⟨∗x, x◦⟩/y, ⟨x◦⟩/x] }   (TyΠ,f,L(x) = own T)
  { ∀(ΔΠ,f,L + {(x◦, (|T|))}). ϕ̌Π,f,L ⇐= ϕ̌Π,f,L′[⟨∗x, x◦⟩/y, ⟨x◦, ◦x⟩/x] }   (TyΠ,f,L(x) = mutα T)

The value at the end of borrow is represented as a newly introduced variable x◦. Below is the rule for the release of a variable.

(|L: drop x; goto L′|)Π,f :=
  { ∀(ΔΠ,f,L). ϕ̌Π,f,L ⇐= ϕ̌Π,f,L′ }   (TyΠ,f,L(x) = P̌ T)
  { ∀(ΔΠ,f,L − {(x, mut (|T|))} + {(x∗, (|T|))}). ϕ̌Π,f,L[⟨x∗, x∗⟩/x] ⇐= ϕ̌Π,f,L′ }   (TyΠ,f,L(x) = mutα T)

When a variable x of type mutα T is dropped/released, we check the prophesied value at the end of borrow. Below is the rule for a function call.

(|L: let y = g⟨· · ·⟩(x0, . . . , xn−1); goto L′|)Π,f
  := { ∀(ΔΠ,f,L + {(y, (|TyΠ,f,L′(y)|))}). ϕ̌Π,f,L ⇐= gentry(x0, . . . , xn−1, y) ∧ ϕ̌Π,f,L′ }

The body (the right-hand side of ⇐= ) of the CHC contains two formulas, which yields a kind of call stack at the level of CHCs. Below is the rule for a return from a function.

(|L: return x|)Π,f := { ∀(ΔΠ,f,L). ϕ̌Π,f,L[x/res] ⇐= ⊤ }

The variable res is forced to be equal to the returned variable x.
Finally, (|Π|), the CHC system that represents the COR program Π (or the CHC representation of Π), is defined as follows.

(|Π|) := ( ⋃F in Π, L:S ∈ LabelStmtF (|L: S|)Π,name(F), (ΞΠ,f,L)fL s.t. (f,L) ∈ FnLabelΠ )

Example 2 (CHC Representation). We present below the CHC representation of take-max described in § 2.1. We omit the CHCs on inc-max here. We have also omitted the variable binders ‘∀ · · ·’.20

take-maxentry(ma, mb, res) ⇐= take-maxL1(ma, mb, ⟨∗ma >= ∗mb⟩, res)
take-maxL1(ma, mb, ⟨inj1 ∗ou⟩, res) ⇐= take-maxL2(ma, mb, ou, res)
take-maxL1(ma, mb, ⟨inj0 ∗ou⟩, res) ⇐= take-maxL5(ma, mb, ou, res)
take-maxL2(ma, mb, ou, res) ⇐= take-maxL3(ma, mb, res)
take-maxL3(ma, ⟨mb∗, mb∗⟩, res) ⇐= take-maxL4(ma, res)
take-maxL4(ma, ma) ⇐= ⊤
take-maxL5(ma, mb, ou, res) ⇐= take-maxL6(ma, mb, res)
take-maxL6(⟨ma∗, ma∗⟩, mb, res) ⇐= take-maxL7(mb, res)
take-maxL7(mb, mb) ⇐= ⊤

The fifth and eighth CHCs represent the releases of mb and ma respectively. The sixth and ninth CHCs represent the determination of the return value res.

20 The sorts of the variables are as follows: ma, mb, res: mut int; ma∗, mb∗: int; ou: box unit.

3.3 Correctness of the CHC Representation


Now we formally state and prove the correctness of the CHC representation.

Notations. We use {|· · ·|} (instead of {· · ·}) for the intensional description of a multiset. A ⊕ B (or more generally ⊕λ Aλ) denotes the multiset sum (e.g. {|0, 1|} ⊕ {|1|} = {|0, 1, 1|} ≠ {|0, 1|}).

Readout and Safe Readout. We introduce a few judgments to formally describe how data is read out from the heap.
First, the judgment ‘readoutH(∗a :: T | v; M)’ (the data at the address a of type T can be read out from the heap H as the value v, yielding the memory footprint M) is defined as follows.21 Here, a memory footprint M is a finite multiset of addresses, which is employed for monitoring the memory usage.

H(a) = a′   readoutH(∗a′ :: T | v; M)
──────────────────────────────
readoutH(∗a :: own T | ⟨v⟩; M ⊕ {|a|})

readoutH(∗a :: T[μX.T/X] | v; M)
──────────────────────────────
readoutH(∗a :: μX.T | v; M)

H(a) = n
──────────────────────────────
readoutH(∗a :: int | n; {|a|})

──────────────────────────────
readoutH(∗a :: unit | (); ∅)

H(a) = i ∈ [2]   for any k ∈ [(#T1−i − #Ti)≥0], H(a+1+#Ti+k) = 0   readoutH(∗(a+1) :: Ti | v; M)
──────────────────────────────
readoutH(∗a :: T0+T1 | inji v; M ⊕ {|a|} ⊕ {|a+1+#Ti+k | k ∈ [(#T1−i − #Ti)≥0]|})

(n)≥0 := max{n, 0}

readoutH(∗a :: T0 | v0; M0)   readoutH(∗(a+#T0) :: T1 | v1; M1)
──────────────────────────────
readoutH(∗a :: T0×T1 | (v0, v1); M0 ⊕ M1)

For example, ‘readout{(100,7),(101,5)}(∗100 :: int × int | (7, 5); {|100, 101|})’ holds.
Next, ‘readoutH(F :: Γ | F; M)’ (the data of the stack frame F respecting the variable context Γ can be read out from H as F, yielding M) is defined as follows. dom Γ stands for {x | x:a T ∈ Γ}.

dom F = dom Γ   for any x: own T ∈ Γ, readoutH(∗F(x) :: T | vx; Mx)
──────────────────────────────
readoutH(F :: Γ | {(x, vx) | x ∈ dom F}; ⊕x∈dom F Mx)

21 Here we can ignore mutable/immutable references, because we focus on what we call simple functions, as explained later.
Finally, ‘safeH(F :: Γ | F)’ (the data of F respecting Γ can be safely read out from H as F) is defined as follows.

readoutH(F :: Γ | F; M)   M has no duplicate items
──────────────────────────────
safeH(F :: Γ | F)

Here, the ‘no duplicate items’ precondition checks the safety on the ownership.
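For example, if Γ = {x: own int, y: own int} and F maps both x and y to the address 100, the readout yields the footprint M = {|100, 100|}, which contains a duplicate item; safeH therefore fails, reflecting that the ownership of the cell at 100 would be counted twice.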

COS-based Model. Now we introduce the COS-based model (COS stands for concrete operational semantics) fCOSΠ to formally describe the expected input-output relation. Here, for simplicity, f is restricted to one that does not take lifetime parameters (we call such a function simple; the input/output types of a simple function cannot contain references). We define fCOSΠ as the predicate (on values of sorts (|T0|), . . . , (|Tn−1|), (|U|) if f’s input/output types are T0, . . . , Tn−1, U) given by the following rule.

C0 →Π · · · →Π CN   finalΠ(CN)   C0 = [f, entry] F | H   CN = [f, L] F′ | H′
safeH(F :: ΓΠ,f,entry | {(xi, vi) | i ∈ [n]})   safeH′(F′ :: ΓΠ,f,L | {(y, w)})
──────────────────────────────
fCOSΠ(v0, . . . , vn−1, w)

ΓΠ,f,L: the variable context for the label L of f in the program Π

Correctness Theorem. Finally, the correctness (both soundness and completeness) of the CHC representation is simply stated as follows.

Theorem 1 (Correctness of the CHC Representation). For any program Π and simple function f in Π, fCOSΠ is equivalent to Mleast(|Π|)(fentry).

Proof. The details are presented in the full paper. We outline the proof below.
First, we introduce abstract operational semantics, where we get rid of heaps and directly represent each variable in the program simply as a value with abstract variables, which are strongly related to prophecy variables (see § 5). An abstract variable represents the undetermined value of a mutable reference at the end of borrow.
Next, we introduce SLDC resolution for CHC systems and find a bisimulation between abstract operational semantics and SLDC resolution, whereby we show that the AOS-based model, defined analogously to the COS-based model, is equivalent to the least model of the CHC representation. Moreover, we find a bisimulation between concrete and abstract operational semantics and prove that the COS-based model is equivalent to the AOS-based model.
Finally, combining the equivalences, we achieve the proof of the correctness of the CHC representation. ∎


Interestingly, as by-products of the proof, we have also shown the soundness


of the type system in terms of preservation and progression, in both concrete and
abstract operational semantics. Simplification and generalization of the proofs
is left for future work.
3.4 Advanced Examples


We give advanced examples of pointer-manipulating Rust programs and their
CHC representations. For readability, we write programs in Rust (with ghost
annotations) instead of COR. In addition, CHCs are written in an informal style
like § 1, preferring equalities to pattern matching.

Example 3. Consider the following program, a variant of just_rec in § 1.1.

fn choose<'a>(ma: &'a mut i32, mb: &'a mut i32) -> &'a mut i32 {
  if rand() { drop ma; mb } else { drop mb; ma }
}
fn linger_dec<'a>(ma: &'a mut i32) -> bool {
  *ma -= 1; if rand() >= 0 { drop ma; return true; }
  let mut b = rand(); let old_b = b; intro 'b; let mb = &'b mut b;
  let r2 = linger_dec<'b>(choose<'b>(ma, mb)); now 'b;
  r2 && old_b >= b
}

Unlike just_rec, the function linger_dec can modify the local variable of an arbitrarily deep ancestor. Interestingly, each recursive call to linger_dec can introduce a new lifetime 'b, which yields arbitrarily many layers of lifetimes.
Suppose we wish to verify that linger_dec never returns false. If we use, like JustRec+ in § 1.1, a predicate taking the memory states h, h′ and the stack pointer sp, we have to discover the quantified invariant: ∀i ≤ sp. h[i] ≥ h′[i]. In contrast, our approach reduces this verification problem to the following CHCs:

Choose(⟨a, a◦⟩, ⟨b, b◦⟩, r) ⇐= b◦ = b ∧ r = ⟨a, a◦⟩
Choose(⟨a, a◦⟩, ⟨b, b◦⟩, r) ⇐= a◦ = a ∧ r = ⟨b, b◦⟩
LingerDec(⟨a, a◦⟩, r) ⇐= a′ = a − 1 ∧ a◦ = a′ ∧ r = true
LingerDec(⟨a, a◦⟩, r) ⇐= a′ = a − 1 ∧ oldb = b ∧ Choose(⟨a′, a◦⟩, ⟨b, b◦⟩, mc)
                          ∧ LingerDec(mc, r′) ∧ r = (r′ && oldb >= b◦)
r = true ⇐= LingerDec(⟨a, a◦⟩, r).

This can be solved by many solvers since it has a very simple model:

Choose(⟨a, a◦⟩, ⟨b, b◦⟩, r) :⇐⇒ (b◦ = b ∧ r = ⟨a, a◦⟩) ∨ (a◦ = a ∧ r = ⟨b, b◦⟩)
LingerDec(⟨a, a◦⟩, r) :⇐⇒ r = true ∧ a ≥ a◦.

Example 4. Combined with recursive data structures, our method turns out to be more interesting. Let us consider the following Rust code:22

enum List { Cons(i32, Box<List>), Nil } use List::*;
fn take_some<'a>(mxs: &'a mut List) -> &'a mut i32 {
  match mxs {
    Cons(mx, mxs2) => if rand() { drop mxs2; mx }
                      else { drop mx; take_some<'a>(mxs2) }
    Nil => { take_some(mxs) }
  }
}
fn sum(xs: &List) -> i32 {
  match xs { Cons(x, xs2) => x + sum(xs2), Nil => 0 }
}
fn inc_some(mut xs: List) -> bool {
  let n = sum(&xs); intro 'a; let my = take_some<'a>(&'a mut xs);
  *my += 1; drop my; now 'a; let m = sum(&xs); m == n + 1
}

22 In COR, List can be expressed as μX. int × own X + unit.

This is a program that manipulates singly linked integer lists, defined as a re-
cursive data type. take_some takes a mutable reference to a list and returns
a mutable reference to some element of the list. sum calculates the sum of the
elements of a list. inc_some increments some element of a list via a mutable
reference and checks that the sum of the elements of the list has increased by 1.
Suppose we wish to verify that inc_some never returns false. Our method
translates this verification problem into the following CHCs.23

TakeSome([x|xs′], xs◦, r) ⇐= xs◦ = [x◦|xs′◦] ∧ xs′◦ = xs′ ∧ r = ⟨x, x◦⟩
TakeSome([x|xs′], xs◦, r) ⇐= xs◦ = [x◦|xs′◦] ∧ x◦ = x ∧ TakeSome(xs′, xs′◦, r)
TakeSome([], xs◦, r) ⇐= TakeSome([], xs◦, r)
Sum([x|xs′], r) ⇐= Sum(xs′, r′) ∧ r = x + r′
Sum([], r) ⇐= r = 0
IncSome(xs, r) ⇐= Sum(xs, n) ∧ TakeSome(xs, xs◦, ⟨y, y◦⟩) ∧ y◦ = y + 1
                   ∧ Sum(xs◦, m) ∧ r = (m == n+1).

A crucial technique used here is the subdivision of a mutable reference, which is achieved with the constraint xs◦ = [x◦|xs′◦].
We can give this CHC system a very simple model, using an auxiliary function sum (satisfying sum([x|xs′]) := x + sum(xs′), sum([]) := 0):

TakeSome(⟨xs, xs◦⟩, ⟨y, y◦⟩) :⇐⇒ y◦ − y = sum(xs◦) − sum(xs)
Sum(⟨xs⟩, r) :⇐⇒ r = sum(xs)
IncSome(xs, r) :⇐⇒ r = true.

Although the model relies on the function sum, the validity of the model can be checked without induction on sum (i.e. we can check the validity of each CHC just by properly unfolding the definition of sum a few times).
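For instance, for the first TakeSome clause, the body forces xs◦ = [x◦|xs′] and r = ⟨x, x◦⟩; unfolding sum once on each list gives sum(xs◦) − sum([x|xs′]) = (x◦ + sum(xs′)) − (x + sum(xs′)) = x◦ − x, which is exactly the difference y◦ − y required by the model.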
The example can be fully automatically and promptly verified by our approach
using HoIce [12,11] as the back-end CHC solver; see § 4.

3.5 Discussions

We discuss here how our idea can be extended and enhanced.


23 [x|xs] is the cons made of the head x and the tail xs. [] is the nil. In our formal logic, they are expressed as inj0 (x, ⟨xs⟩) and inj1 ().
Applying Various Verification Techniques. Our idea can also be expressed as a translation of a pointer-manipulating Rust program into a program of a stateless functional programming language, which allows us to use various verification techniques not limited to CHCs. Access to future information can be modeled using non-determinism. To express the value a◦ coming at the end of a mutable borrow, we just randomly guess the value with non-determinism. At the time we actually release a mutable reference, we just check a' = a and cut off execution branches that do not pass the check.
For example, take_max/inc_max in § 1.2/Example 1 can be translated into the following OCaml program.

let rec assume b = if b then () else assume b
let take_max (a, a') (b, b') =
  if a >= b then (assume (b' = b); (a, a'))
  else (assume (a' = a); (b, b'))
let inc_max a b =
  let a' = Random.int(0) in let b' = Random.int(0) in
  let (c, c') = take_max (a, a') (b, b') in
  assume (c' = c + 1); not (a' = b')
let main a b = assert (inc_max a b)

‘let a' = Random.int(0)’ expresses a random guess and ‘assume (a' = a)’
expresses a check. The original problem “Does inc_max never return false?”
is reduced to the problem “Does main never fail at assertion?” on the OCaml
program.24
This representation allows us to use various verification techniques, including
model checking (higher-order, temporal, bounded, etc.), semi-automated verifi-
cation (e.g. on Boogie [48]) and verification on proof assistants (e.g. Coq [15]).
The property to be verified can be not only partial correctness, but also total
correctness and liveness. Further investigation is left for future work.

Verifying Higher-order Programs. We have to care about the following points in


modeling closures: (i) A closure that encloses mutable references can be encoded
as a pair of the main function and the ‘drop function’ called when the closure is
released; (ii) A closure that updates enclosed data can be encoded as a function
that returns, with the main return value, the updated version of the closure;
(iii) A closure that updates external data through enclosed mutable references
can also be modeled by combination of (i) and (ii). Further investigation on
verification of higher-order Rust programs is left for future work.
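For illustration, point (ii) can be sketched in Rust as follows (Counter and call_counter are hypothetical names, not from our implementation): a closure that increments an enclosed counter is modeled as a pure function returning the updated closure state along with the main result.

// State enclosed by the closure.
struct Counter { n: i32 }

// Calling the closure: consume the current state and return the main
// result together with the updated "closure".
fn call_counter(c: Counter) -> (i32, Counter) {
    let n = c.n + 1;
    (n, Counter { n })
}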

24 MoCHi [39], a higher-order model checker for OCaml, successfully verified the safety property for the OCaml representation above. It also successfully and instantly verified a similar representation of choose/linger_dec from Example 3.

Libraries with Unsafe Code. Our translation does not use lifetime information; the correctness of our method is guaranteed by the nature of borrows. Whereas lifetimes are used for the static check of the borrow discipline, many libraries in Rust (e.g. RefCell) provide a mechanism for dynamic ownership checks.
We believe that such libraries with unsafe code can be verified for our method
by a separation logic such as Iris [35,33], as RustBelt [32] does. A good news
is that Iris has recently incorporated prophecy variables [34], which seems to fit
well with our approach. This is an interesting topic for future work.
After the libraries are verified, we can turn to our method. For an easy example, Vec [58] can be represented simply as a functional array; a mutable/immutable slice &mut [T]/&[T] can be represented as an array of mutable/immutable references. For another example, to deal with RefCell [56], we pass around an array that maps a RefCell<T> address to data of type T equipped with an ownership counter; RefCell itself is modeled simply as an address.25,26 Importantly, at the very time we take a mutable reference ⟨a, a◦⟩ from a ref-cell, the data in the array should be updated into a◦. Using methods such as pointer analysis [61], we can possibly shrink the array.
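For illustration, the shape of this model can be sketched in Rust as follows (CellHeap, borrow_mut_cell and the counter convention are hypothetical and simplified, not part of our implementation):

use std::collections::HashMap;

type Addr = usize;

// Hypothetical model state: each RefCell<i32> address maps to
// (current data, ownership counter tracking outstanding borrows).
#[derive(Clone)]
struct CellHeap {
    cells: HashMap<Addr, (i32, u32)>,
}

// Taking a mutable reference <a, a_fut> out of the ref-cell at `addr`:
// the dynamic check requires no outstanding borrow, and the stored data
// is immediately updated to the prophesied value `a_fut`.
fn borrow_mut_cell(h: &CellHeap, addr: Addr, a_fut: i32) -> Option<(i32, i32, CellHeap)> {
    let (a, cnt) = *h.cells.get(&addr)?;
    if cnt != 0 { return None; } // dynamic ownership check fails
    let mut h2 = h.clone();
    h2.cells.insert(addr, (a_fut, cnt + 1));
    Some((a, a_fut, h2)) // the mutable reference is the pair (a, a_fut)
}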
Still, our method does not cope well with memory leaks [52] caused, for example, by a combination of RefCell and Rc [57], because they obfuscate the ownership release of mutable references. We think that the use of Rc etc. should rather be restricted for smooth verification. Further investigation is needed.

4 Implementation and Evaluation


We report on the implementation of our verification tool and the preliminary
experiments conducted with small benchmarks to confirm the effectiveness of
our approach.

4.1 Implementation of RustHorn

We implemented a prototype verification tool RustHorn (available at https://github.com/hopv/rust-horn) based on the ideas described above. The tool supports basic features of Rust supported in COR, including, in particular, recursion and recursive types.
The implementation translates the MIR (Mid-level Intermediate Representation) [45,51] of a Rust program into CHCs quite straightforwardly.27 Thanks to the nature of the translation, RustHorn can simply rely on Rust’s borrow check and forget about lifetimes. For efficiency, the predicate variables are constructed at the granularity of the vertices of the control-flow graph in MIR, rather than per label as in § 3.2. Also, assertions in functions are taken into account, unlike the formalization in § 3.2.
25 To borrow a mutable/immutable reference from RefCell, we check and update the counter and take the data out of the array.
26 In Rust, we can use RefCell to naturally encode data types with circular references (e.g. doubly linked lists).
27 In order to use the MIR, RustHorn’s implementation depends on the unstable nightly version of the Rust compiler, which causes a slight portability issue.
4.2 Benchmarks and Experiments

To measure the performance of RustHorn and the existing CHC-based verifier


SeaHorn [23], we conducted preliminary experiments with benchmarks listed in
Table 1. Each benchmark program is designed so that the Rust and C versions
match. Each benchmark instance consists of either one program or a pair of safe
and unsafe programs that are very similar to each other. The benchmarks and
experimental results are accessible at https://github.com/hopv/rust-horn.
The benchmarks in the groups simple and bmc were taken from SeaHorn
(https://github.com/seahorn/seahorn/tree/master/test), with the Rust
versions written by us. They have been chosen based on the following criteria:
they (i) consist of only features supported by core Rust, (ii) follow Rust’s owner-
ship discipline, and (iii) are small enough to be amenable for manual translation
from C to Rust.
The remaining six benchmark groups were built by us and consist of programs featuring mutable references. The groups inc-max, just-rec and linger-dec are based on the examples that have appeared in § 1 and § 3.4. The group swap-dec consists of programs that perform repeated involved updates via mutable references to mutable references. The groups lists and trees feature destructive updates on recursive data structures (lists and trees) via mutable references, one interesting program of which is explained in § 3.4.
We conducted experiments on a commodity laptop (2.6 GHz Intel Core i7 MacBook Pro with 16 GB RAM). First we used RustHorn and SeaHorn (version 0.1.0-rc3) [23] to translate each benchmark program into CHCs in the SMT-LIB 2 format. Both RustHorn and SeaHorn generated CHCs sufficiently fast (about 0.1 second for each program). After that, we measured the time of CHC solving by Spacer [40] in Z3 (version 4.8.7) [69] and HoIce (version 1.8.1) [12,11] for the generated CHCs. SeaHorn’s outputs were not accepted by HoIce, especially because SeaHorn generates CHCs with arrays. We also made modified versions of some of SeaHorn’s CHC outputs, adding constraints on address freshness, to improve the accuracy of the representations and reduce false alarms.28

4.3 Experimental Results

Table 1 shows the results of the experiments.


Interestingly, the combination of RustHorn and HoIce succeeded in verifying many programs with recursive data types (lists and trees), although it failed on difficult programs.29 HoIce, unlike Spacer, can find models defined with primitive recursive functions for recursive data types.30
28 For base/3 and repeat/3 of inc-max, the address-taking parts were already removed, probably by inaccurate pointer analysis.
29 For example, inc-some/2 takes two mutable references in a list and increments on them; inc-all-t destructively increments all elements in a tree.
30 We used the latest version of HoIce, whose algorithm for recursive types is presented in the full paper of [11].

RustHorn SeaHorn w/Spacer


Group Instance Property w/Spacer w/HoIce as is modified
01 safe <0.1 <0.1 <0.1
04-recursive safe 0.5 timeout 0.8
simple 05-recursive unsafe <0.1 <0.1 <0.1
06-loop safe timeout 0.1 timeout
hhk2008 safe timeout 40.5 <0.1
unique-scalar unsafe <0.1 <0.1 <0.1
1 safe 0.2 <0.1 <0.1
unsafe 0.2 <0.1 <0.1
2 safe timeout 0.1 <0.1
unsafe <0.1 <0.1 <0.1
3 safe <0.1 <0.1 <0.1
bmc unsafe <0.1 <0.1 <0.1
safe
diamond-1 unsafe 0.1 <0.1 <0.1
<0.1 <0.1 <0.1
safe
diamond-2 unsafe 0.2 <0.1 <0.1
<0.1 <0.1 <0.1
base safe <0.1 <0.1 false alarm <0.1
unsafe <0.1 <0.1 <0.1 <0.1
base/3 safe <0.1 <0.1 false alarm
unsafe 0.1 <0.1 <0.1
inc-max safe 0.1 timeout false alarm 0.1
repeat unsafe <0.1 0.4 <0.1 <0.1
repeat/3 safe 0.2 timeout <0.1
unsafe <0.1 1.3 <0.1
base safe <0.1 <0.1 false alarm <0.1
unsafe 0.1 timeout <0.1 <0.1
base/3 safe 0.2 timeout false alarm <0.1
unsafe 0.4 0.9 <0.1 0.1
swap-dec safe 0.1 0.5 false alarm timeout
exact unsafe <0.1 26.0 <0.1 <0.1
exact/3 safe timeout timeout false alarm false alarm
unsafe <0.1 0.4 <0.1 <0.1
just-rec base safe <0.1 <0.1 <0.1
unsafe <0.1 0.1 <0.1
base safe <0.1 <0.1 false alarm
unsafe <0.1 0.1 <0.1
base/3 safe <0.1 <0.1 false alarm
unsafe <0.1 7.0 <0.1
linger-dec safe <0.1 <0.1 false alarm
exact unsafe <0.1 0.2 <0.1
exact/3 safe <0.1 <0.1 false alarm
unsafe <0.1 0.6 <0.1
append safe tool error <0.1 false alarm
unsafe tool error 0.2 0.1
inc-all safe tool error <0.1 false alarm
unsafe tool error 0.3 <0.1
lists safe tool error <0.1 false alarm
inc-some unsafe tool error 0.3 0.1
safe
inc-some/2 unsafe tool error timeout false alarm
tool error 0.3 0.4
append-t safe tool error <0.1 timeout
unsafe tool error 0.3 0.1
safe
inc-all-t unsafe tool error timeout timeout
tool error 0.1 <0.1
trees safe tool error timeout timeout
inc-some-t unsafe tool error 0.3 0.1
safe
inc-some/2-t unsafe tool error timeout false alarm
tool error 0.4 0.1

Table 1. Benchmarks and experimental results on RustHorn and SeaHorn, with


Spacer/Z3 and HoIce. “timeout” denotes timeout of 180 seconds; “false alarm” means
reporting ‘unsafe’ for a safe program; “tool error” is a tool error of Spacer, which
currently does not deal with recursive types well.
False alarms of SeaHorn for the last six groups are mainly due to problematic approximation by SeaHorn of pointers and heap memories, as discussed in § 1.1. On the modified CHC outputs of SeaHorn, five false alarms were erased and four of them became successful verifications. For the last four groups, unboundedly many memory cells can be allocated, which imposes a fundamental challenge for SeaHorn’s array-based approach, as discussed in § 1.1.31 The combination of RustHorn and HoIce took a relatively long time or reported timeout for some programs, including unsafe ones, because HoIce is still an unstable tool compared to Spacer; in general, automated CHC solving can be rather unstable.

5 Related Work

CHC-based Verification of Pointer-Manipulating Programs. SeaHorn [23] is a representative existing tool for CHC-based verification of pointer-manipulating programs. It basically represents the heap memory as an array. Although some pointer analyses [24] are used to optimize the array representation of the heap, their approach suffers from the scalability problem discussed in § 1.1, as confirmed by the experiments in § 4. Still, their approach is quite effective for automated verification, given that many real-world pointer-manipulating programs do not follow Rust-style ownership.
Another approach is taken by JayHorn [37,36], which translates Java programs (possibly using object pointers) to CHCs. They represent store invariants using the special predicates pull and push. Although this allows faster reasoning about the heap than the array-based approach, it can suffer from more false alarms. We conducted a small experiment for JayHorn (0.6-alpha) on some of the benchmarks of § 4.2; unexpectedly, JayHorn reported ‘UNKNOWN’ (instead of ‘SAFE’ or ‘UNSAFE’) even for simple programs such as the instance unique-scalar in simple and the instance base in inc-max.

Verification for Rust. Whereas we have presented the first CHC-based (fully automated) verification method specially designed for Rust-style ownership, there have been a number of studies on other types of verification for Rust.
RustBelt [32] aims to formally prove high-level safety properties for Rust libraries with unsafe internal implementations, using manual reasoning on the higher-order concurrent separation logic Iris [35,33] in the Coq Proof Assistant [15]. Although their framework is flexible, automation of reasoning within the framework is hardly discussed. The language design of our COR is influenced by their formal calculus λRust.
Electrolysis [67] translates some subset of Rust into a purely functional pro-
gramming language to manually verify functional correctness on Lean Theorem
Prover [49]. Although it clears out pointers to get simple models like our ap-
proach, Electrolysis’ applicable scope is quite limited, because it deals with mu-
table references by simple static tracking of addresses based on lenses [20], not
31 We also tried JustRec+ (the stack-pointer-based accurate representation of just_rec presented in § 1.1) on Spacer, but got a timeout of 180 seconds.
supporting even basic use cases such as dynamic selection of mutable references
(e.g. take_max in § 1.2) [66], which our method can easily handle. Our approach
covers all usages of pointers of the safe core of Rust as discussed in § 3.
A series of studies [27,3,17] conducts (semi-)automated verification of Rust programs using Viper [50], a verification platform based on separation logic with fractional ownership. This approach can to some extent deal with unsafe code [27] and type traits [17]. Astrauskas et al. [3] conduct semi-automated verification (manually providing pre/post-conditions and loop invariants) on many realistic examples. Because Viper is based on fractional ownership, however, their platforms have to use concrete indexing on the memory for programs like take_max/inc_max. In contrast, our idea leverages borrow-based ownership, and it can be applied also to semi-automated verification as suggested in § 3.5.
Several studies [65,4,44] employ bounded model checking on Rust programs, especially with unsafe code. Our method can be applied to bounded model checking as discussed in § 3.5.

Verification using Ownership. Ownership has been applied to a wide range of verification tasks. It has been used for detecting race conditions in concurrent programs [8,64] and for analyzing the safety of memory allocation [63]. Separation logics based on ownership are also well studied [7,50,35]. Some verification platforms [14,5,21] support simple ownership. However, most prior studies on ownership-based verification are based on fractional or counting ownership. Verification under borrow-based ownership like Rust’s was little studied before our work.

Prophecy Variables. Our idea of taking a future value to represent a mutable reference is linked to the notion of prophecy variables [1,68,34]. Jung et al. [34] propose a new Hoare-style logic with prophecy variables. In their logic, prophecy variables are not copyable, which is analogous to the uncopyability of mutable references in Rust. This logic can probably be used to generalize our idea as suggested in § 3.5.

6 Conclusion

We have proposed a novel method for CHC-based program verification, which


represents a mutable reference as a pair of values, the current value and the
future value at the time of release. We have formalized the method for a core
language of Rust and proved its correctness. We have implemented a proto-
type verification tool for a subset of Rust and confirmed the effectiveness of our
approach. We believe that this study establishes the foundation of verification
leveraging borrow-based ownership.

Acknowledgments. This work was supported by JSPS KAKENHI Grant Numbers JP15H05706 and JP16K16004. We are grateful to the anonymous reviewers for insightful comments.
References

1. Abadi, M., Lamport, L.: The existence of refinement mappings. Theor. Comput.
Sci. 82(2), 253–284 (1991). https://doi.org/10.1016/0304-3975(91)90224-P
2. Alberti, F., Bruttomesso, R., Ghilardi, S., Ranise, S., Sharygina, N.: Lazy ab-
straction with interpolants for arrays. In: Bjørner, N., Voronkov, A. (eds.)
Logic for Programming, Artificial Intelligence, and Reasoning - 18th Interna-
tional Conference, LPAR-18, Mérida, Venezuela, March 11-15, 2012. Proceed-
ings. Lecture Notes in Computer Science, vol. 7180, pp. 46–61. Springer (2012).
https://doi.org/10.1007/978-3-642-28717-6_7
3. Astrauskas, V., Müller, P., Poli, F., Summers, A.J.: Leveraging Rust types
for modular specification and verification (2018). https://doi.org/10.3929/ethz-b-000311092
4. Baranowski, M.S., He, S., Rakamaric, Z.: Verifying Rust programs with SMACK.
In: Lahiri and Wang [42], pp. 528–535. https://doi.org/10.1007/978-3-030-01090-4_32
5. Barnett, M., Fähndrich, M., Leino, K.R.M., Müller, P., Schulte, W., Venter, H.:
Specification and verification: The Spec# experience. Commun. ACM 54(6), 81–91
(2011). https://doi.org/10.1145/1953122.1953145
6. Bjørner, N., Gurfinkel, A., McMillan, K.L., Rybalchenko, A.: Horn clause
solvers for program verification. In: Beklemishev, L.D., Blass, A., Dershowitz,
N., Finkbeiner, B., Schulte, W. (eds.) Fields of Logic and Computation II
- Essays Dedicated to Yuri Gurevich on the Occasion of His 75th Birthday.
Lecture Notes in Computer Science, vol. 9300, pp. 24–51. Springer (2015).
https://doi.org/10.1007/978-3-319-23534-9_2
7. Bornat, R., Calcagno, C., O’Hearn, P.W., Parkinson, M.J.: Permission accounting
in separation logic. In: Palsberg, J., Abadi, M. (eds.) Proceedings of the 32nd
ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages,
POPL 2005, Long Beach, California, USA, January 12-14, 2005. pp. 259–270. ACM
(2005). https://doi.org/10.1145/1040305.1040327
8. Boyapati, C., Lee, R., Rinard, M.C.: Ownership types for safe program-
ming: Preventing data races and deadlocks. In: Ibrahim, M., Matsuoka,
S. (eds.) Proceedings of the 2002 ACM SIGPLAN Conference on Object-
Oriented Programming Systems, Languages and Applications, OOPSLA 2002,
Seattle, Washington, USA, November 4-8, 2002. pp. 211–230. ACM (2002).
https://doi.org/10.1145/582419.582440
9. Boyland, J.: Checking interference with fractional permissions. In: Cousot, R. (ed.)
Static Analysis, 10th International Symposium, SAS 2003, San Diego, CA, USA,
June 11-13, 2003, Proceedings. Lecture Notes in Computer Science, vol. 2694, pp.
55–72. Springer (2003). https://doi.org/10.1007/3-540-44898-5_4
10. Bradley, A.R., Manna, Z., Sipma, H.B.: What’s decidable about arrays? In: Emer-
son, E.A., Namjoshi, K.S. (eds.) Verification, Model Checking, and Abstract In-
terpretation, 7th International Conference, VMCAI 2006, Charleston, SC, USA,
January 8-10, 2006, Proceedings. Lecture Notes in Computer Science, vol. 3855,
pp. 427–442. Springer (2006). https://doi.org/10.1007/11609773_28
11. Champion, A., Chiba, T., Kobayashi, N., Sato, R.: ICE-based refinement type
discovery for higher-order functional programs. In: Beyer, D., Huisman, M. (eds.)
Tools and Algorithms for the Construction and Analysis of Systems - 24th Interna-
tional Conference, TACAS 2018, Held as Part of the European Joint Conferences
on Theory and Practice of Software, ETAPS 2018, Thessaloniki, Greece, April 14-
20, 2018, Proceedings, Part I. Lecture Notes in Computer Science, vol. 10805, pp.
365–384. Springer (2018). https://doi.org/10.1007/978-3-319-89960-2_20
12. Champion, A., Kobayashi, N., Sato, R.: HoIce: An ICE-based non-linear Horn
clause solver. In: Ryu, S. (ed.) Programming Languages and Systems - 16th Asian
Symposium, APLAS 2018, Wellington, New Zealand, December 2-6, 2018, Pro-
ceedings. Lecture Notes in Computer Science, vol. 11275, pp. 146–156. Springer
(2018). https://doi.org/10.1007/978-3-030-02768-1_8
13. Clarke, D.G., Potter, J., Noble, J.: Ownership types for flexible alias protection.
In: Freeman-Benson, B.N., Chambers, C. (eds.) Proceedings of the 1998 ACM
SIGPLAN Conference on Object-Oriented Programming Systems, Languages &
Applications (OOPSLA ’98), Vancouver, British Columbia, Canada, October 18-
22, 1998. pp. 48–64. ACM (1998). https://doi.org/10.1145/286936.286947
14. Cohen, E., Dahlweid, M., Hillebrand, M.A., Leinenbach, D., Moskal, M., Santen,
T., Schulte, W., Tobies, S.: VCC: A practical system for verifying concurrent C. In:
Berghofer, S., Nipkow, T., Urban, C., Wenzel, M. (eds.) Theorem Proving in Higher
Order Logics, 22nd International Conference, TPHOLs 2009, Munich, Germany,
August 17-20, 2009. Proceedings. Lecture Notes in Computer Science, vol. 5674,
pp. 23–42. Springer (2009). https://doi.org/10.1007/978-3-642-03359-9_2
15. Coq Team: The Coq proof assistant (2020), https://coq.inria.fr/
16. van Emden, M.H., Kowalski, R.A.: The semantics of predicate logic as
a programming language. Journal of the ACM 23(4), 733–742 (1976).
https://doi.org/10.1145/321978.321991
17. Erdin, M.: Verification of Rust Generics, Typestates, and Traits. Master’s thesis,
ETH Zürich (2019)
18. Fedyukovich, G., Kaufman, S.J., Bodı́k, R.: Sampling invariants from frequency
distributions. In: Stewart, D., Weissenbacher, G. (eds.) 2017 Formal Methods in
Computer Aided Design, FMCAD 2017, Vienna, Austria, October 2-6, 2017. pp.
100–107. IEEE (2017). https://doi.org/10.23919/FMCAD.2017.8102247
19. Fedyukovich, G., Prabhu, S., Madhukar, K., Gupta, A.: Quantified invariants via
syntax-guided synthesis. In: Dillig, I., Tasiran, S. (eds.) Computer Aided Verifica-
tion - 31st International Conference, CAV 2019, New York City, NY, USA, July
15-18, 2019, Proceedings, Part I. Lecture Notes in Computer Science, vol. 11561,
pp. 259–277. Springer (2019). https://doi.org/10.1007/978-3-030-25540-4_14
20. Foster, J.N., Greenwald, M.B., Moore, J.T., Pierce, B.C., Schmitt, A.: Com-
binators for bidirectional tree transformations: A linguistic approach to the
view-update problem. ACM Trans. Program. Lang. Syst. 29(3), 17 (2007).
https://doi.org/10.1145/1232420.1232424
21. Gondelman, L.: Un système de types pragmatique pour la vérification déductive des
programmes. (A Pragmatic Type System for Deductive Verification). Ph.D. thesis,
University of Paris-Saclay, France (2016), https://tel.archives-ouvertes.fr/tel-01533090
22. Grebenshchikov, S., Lopes, N.P., Popeea, C., Rybalchenko, A.: Synthesizing soft-
ware verifiers from proof rules. In: Vitek, J., Lin, H., Tip, F. (eds.) ACM
SIGPLAN Conference on Programming Language Design and Implementation,
PLDI ’12, Beijing, China - June 11 - 16, 2012. pp. 405–416. ACM (2012).
https://doi.org/10.1145/2254064.2254112
23. Gurfinkel, A., Kahsai, T., Komuravelli, A., Navas, J.A.: The SeaHorn verification
framework. In: Kroening, D., Pasareanu, C.S. (eds.) Computer Aided Verification
- 27th International Conference, CAV 2015, San Francisco, CA, USA, July 18-
24, 2015, Proceedings, Part I. Lecture Notes in Computer Science, vol. 9206, pp.
343–361. Springer (2015). https://doi.org/10.1007/978-3-319-21690-4_20
24. Gurfinkel, A., Navas, J.A.: A context-sensitive memory model for verification of
C/C++ programs. In: Ranzato, F. (ed.) Static Analysis - 24th International Sym-
posium, SAS 2017, New York, NY, USA, August 30 - September 1, 2017, Proceed-
ings. Lecture Notes in Computer Science, vol. 10422, pp. 148–168. Springer (2017).
https://doi.org/10.1007/978-3-319-66706-5_8
25. Gurfinkel, A., Shoham, S., Meshman, Y.: SMT-based verification of parameterized
systems. In: Zimmermann, T., Cleland-Huang, J., Su, Z. (eds.) Proceedings of
the 24th ACM SIGSOFT International Symposium on Foundations of Software
Engineering, FSE 2016, Seattle, WA, USA, November 13-18, 2016. pp. 338–348.
ACM (2016). https://doi.org/10.1145/2950290.2950330
26. Gurfinkel, A., Shoham, S., Vizel, Y.: Quantifiers on demand. In: Lahiri and Wang
[42], pp. 248–266. https://doi.org/10.1007/978-3-030-01090-4_15
27. Hahn, F.: Rust2Viper: Building a Static Verifier for Rust. Master’s thesis, ETH
Zürich (2016). https://doi.org/10.3929/ethz-a-010669150
28. Hoenicke, J., Majumdar, R., Podelski, A.: Thread modularity at many levels: A
pearl in compositional verification. In: Castagna, G., Gordon, A.D. (eds.) Pro-
ceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming
Languages, POPL 2017, Paris, France, January 18-20, 2017. pp. 473–485. ACM
(2017). https://doi.org/10.1145/3009837
29. Hojjat, H., Rümmer, P.: The Eldarica Horn solver. In: Bjørner, N., Gurfinkel,
A. (eds.) 2018 Formal Methods in Computer Aided Design, FMCAD 2018,
Austin, TX, USA, October 30 - November 2, 2018. pp. 1–7. IEEE (2018).
https://doi.org/10.23919/FMCAD.2018.8603013
30. Horn, A.: On sentences which are true of direct unions of algebras. The Journal of
Symbolic Logic 16(1), 14–21 (1951), http://www.jstor.org/stable/2268661
31. Jim, T., Morrisett, J.G., Grossman, D., Hicks, M.W., Cheney, J., Wang, Y.: Cy-
clone: A safe dialect of C. In: Ellis, C.S. (ed.) Proceedings of the General Track:
2002 USENIX Annual Technical Conference, June 10-15, 2002, Monterey, Califor-
nia, USA. pp. 275–288. USENIX (2002), http://www.usenix.org/publications/library/proceedings/usenix02/jim.html
32. Jung, R., Jourdan, J., Krebbers, R., Dreyer, D.: RustBelt: Securing the founda-
tions of the Rust programming language. PACMPL 2(POPL), 66:1–66:34 (2018).
https://doi.org/10.1145/3158154
33. Jung, R., Krebbers, R., Jourdan, J., Bizjak, A., Birkedal, L., Dreyer, D.: Iris from
the ground up: A modular foundation for higher-order concurrent separation logic.
J. Funct. Program. 28, e20 (2018). https://doi.org/10.1017/S0956796818000151
34. Jung, R., Lepigre, R., Parthasarathy, G., Rapoport, M., Timany, A., Dreyer, D.,
Jacobs, B.: The future is ours: Prophecy variables in separation logic. PACMPL
4(POPL), 45:1–45:32 (2020). https://doi.org/10.1145/3371113
35. Jung, R., Swasey, D., Sieczkowski, F., Svendsen, K., Turon, A., Birkedal, L.,
Dreyer, D.: Iris: Monoids and invariants as an orthogonal basis for concurrent
reasoning. In: Rajamani, S.K., Walker, D. (eds.) Proceedings of the 42nd Annual
ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages,
POPL 2015, Mumbai, India, January 15-17, 2015. pp. 637–650. ACM (2015).
https://doi.org/10.1145/2676726.2676980
36. Kahsai, T., Kersten, R., Rümmer, P., Schäf, M.: Quantified heap invariants for
object-oriented programs. In: Eiter, T., Sands, D. (eds.) LPAR-21, 21st Interna-
tional Conference on Logic for Programming, Artificial Intelligence and Reasoning,
Maun, Botswana, May 7-12, 2017. EPiC Series in Computing, vol. 46, pp. 368–384.
EasyChair (2017)
37. Kahsai, T., Rümmer, P., Sanchez, H., Schäf, M.: JayHorn: A framework for ver-
ifying Java programs. In: Chaudhuri, S., Farzan, A. (eds.) Computer Aided Ver-
ification - 28th International Conference, CAV 2016, Toronto, ON, Canada, July
17-23, 2016, Proceedings, Part I. Lecture Notes in Computer Science, vol. 9779,
pp. 352–358. Springer (2016). https://doi.org/10.1007/978-3-319-41528-4_19
38. Kalra, S., Goel, S., Dhawan, M., Sharma, S.: Zeus: Analyzing safety of smart
contracts. In: 25th Annual Network and Distributed System Security Symposium,
NDSS 2018, San Diego, California, USA, February 18-21, 2018. The Internet So-
ciety (2018)
39. Kobayashi, N., Sato, R., Unno, H.: Predicate abstraction and CEGAR for higher-
order model checking. In: Hall, M.W., Padua, D.A. (eds.) Proceedings of the 32nd
ACM SIGPLAN Conference on Programming Language Design and Implementa-
tion, PLDI 2011, San Jose, CA, USA, June 4-8, 2011. pp. 222–233. ACM (2011).
https://doi.org/10.1145/1993498.1993525
40. Komuravelli, A., Gurfinkel, A., Chaki, S.: SMT-based model checking for recursive
programs. In: Biere, A., Bloem, R. (eds.) Computer Aided Verification - 26th Inter-
national Conference, CAV 2014, Held as Part of the Vienna Summer of Logic, VSL
2014, Vienna, Austria, July 18-22, 2014. Proceedings. Lecture Notes in Computer
Science, vol. 8559, pp. 17–34. Springer (2014). https://doi.org/10.1007/978-3-319-08867-9_2
41. Lahiri, S.K., Bryant, R.E.: Constructing quantified invariants via predicate ab-
straction. In: Steffen, B., Levi, G. (eds.) Verification, Model Checking, and Ab-
stract Interpretation, 5th International Conference, VMCAI 2004, Venice, Italy,
January 11-13, 2004, Proceedings. Lecture Notes in Computer Science, vol. 2937,
pp. 267–281. Springer (2004). https://doi.org/10.1007/978-3-540-24622-0_22
42. Lahiri, S.K., Wang, C. (eds.): Automated Technology for Verification and Analysis
- 16th International Symposium, ATVA 2018, Los Angeles, CA, USA, October
7-10, 2018, Proceedings, Lecture Notes in Computer Science, vol. 11138. Springer
(2018). https://doi.org/10.1007/978-3-030-01090-4
43. Lattner, C., Adve, V.S.: Automatic pool allocation: Improving performance by
controlling data structure layout in the heap. In: Sarkar, V., Hall, M.W. (eds.)
Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language
Design and Implementation, Chicago, IL, USA, June 12-15, 2005. pp. 129–142.
ACM (2005). https://doi.org/10.1145/1065010.1065027
44. Lindner, M., Aparicius, J., Lindgren, P.: No panic! Verification of Rust programs
by symbolic execution. In: 16th IEEE International Conference on Industrial Infor-
matics, INDIN 2018, Porto, Portugal, July 18-20, 2018. pp. 108–114. IEEE (2018).
https://doi.org/10.1109/INDIN.2018.8471992
45. Matsakis, N.D.: Introducing MIR (2016), https://blog.rust-lang.org/2016/04/19/MIR.html
46. Matsakis, N.D., Klock II, F.S.: The Rust language. In: Feldman, M., Taft, S.T.
(eds.) Proceedings of the 2014 ACM SIGAda annual conference on High integrity
language technology, HILT 2014, Portland, Oregon, USA, October 18-21, 2014. pp.
103–104. ACM (2014). https://doi.org/10.1145/2663171.2663188
47. Matsushita, Y., Tsukada, T., Kobayashi, N.: RustHorn: CHC-based verification for
Rust programs (full version). CoRR (2020), https://arxiv.org/abs/2002.09002
48. Microsoft: Boogie: An intermediate verification language (2020), https://www.microsoft.com/en-us/research/project/boogie-an-intermediate-verification-language/
49. de Moura, L.M., Kong, S., Avigad, J., van Doorn, F., von Raumer, J.: The
Lean theorem prover (system description). In: Felty, A.P., Middeldorp, A.
(eds.) Automated Deduction - CADE-25 - 25th International Conference on
Automated Deduction, Berlin, Germany, August 1-7, 2015, Proceedings. Lec-
ture Notes in Computer Science, vol. 9195, pp. 378–388. Springer (2015). https://doi.org/10.1007/978-3-319-21401-6_26
50. Müller, P., Schwerhoff, M., Summers, A.J.: Viper: A verification infrastructure
for permission-based reasoning. In: Jobstmann, B., Leino, K.R.M. (eds.) Verifi-
cation, Model Checking, and Abstract Interpretation - 17th International Con-
ference, VMCAI 2016, St. Petersburg, FL, USA, January 17-19, 2016. Proceed-
ings. Lecture Notes in Computer Science, vol. 9583, pp. 41–62. Springer (2016).
https://doi.org/10.1007/978-3-662-49122-5_2
51. Rust Community: The MIR (Mid-level IR) (2020), https://rust-lang.github.io/rustc-guide/mir/index.html
52. Rust Community: Reference cycles can leak memory - the Rust programming language (2020), https://doc.rust-lang.org/book/ch15-06-reference-cycles.html
53. Rust Community: RFC 2025: Nested method calls (2020), https://rust-lang.github.io/rfcs/2025-nested-method-calls.html
54. Rust Community: RFC 2094: Non-lexical lifetimes (2020), https://rust-lang.github.io/rfcs/2094-nll.html
55. Rust Community: Rust programming language (2020), https://www.rust-lang.org/
56. Rust Community: std::cell::RefCell - Rust (2020), https://doc.rust-lang.org/std/cell/struct.RefCell.html
57. Rust Community: std::rc::Rc - Rust (2020), https://doc.rust-lang.org/std/rc/struct.Rc.html
58. Rust Community: std::vec::Vec - Rust (2020), https://doc.rust-lang.org/std/vec/struct.Vec.html
59. Rust Community: Two-phase borrows (2020), https://rust-lang.github.io/rustc-guide/borrow_check/two_phase_borrows.html
60. Sato, R., Iwayama, N., Kobayashi, N.: Combining higher-order model checking with
refinement type inference. In: Hermenegildo, M.V., Igarashi, A. (eds.) Proceedings
of the 2019 ACM SIGPLAN Workshop on Partial Evaluation and Program Manip-
ulation, PEPM@POPL 2019, Cascais, Portugal, January 14-15, 2019. pp. 47–53.
ACM (2019). https://doi.org/10.1145/3294032.3294081
61. Steensgaard, B.: Points-to analysis in almost linear time. In: Boehm, H., Steele Jr., G.L.
(eds.) Conference Record of POPL’96: The 23rd ACM SIGPLAN-SIGACT Sym-
posium on Principles of Programming Languages, Papers Presented at the Sympo-
sium, St. Petersburg Beach, Florida, USA, January 21-24, 1996. pp. 32–41. ACM
Press (1996). https://doi.org/10.1145/237721.237727
62. Stump, A., Barrett, C.W., Dill, D.L., Levitt, J.R.: A decision procedure for an ex-
tensional theory of arrays. In: 16th Annual IEEE Symposium on Logic in Computer
Science, Boston, Massachusetts, USA, June 16-19, 2001, Proceedings. pp. 29–37.
IEEE Computer Society (2001). https://doi.org/10.1109/LICS.2001.932480
63. Suenaga, K., Kobayashi, N.: Fractional ownerships for safe memory dealloca-
tion. In: Hu, Z. (ed.) Programming Languages and Systems, 7th Asian Sym-
posium, APLAS 2009, Seoul, Korea, December 14-16, 2009. Proceedings. Lec-
ture Notes in Computer Science, vol. 5904, pp. 128–143. Springer (2009).
https://doi.org/10.1007/978-3-642-10672-9_11
64. Terauchi, T.: Checking race freedom via linear programming. In: Gupta, R., Ama-
rasinghe, S.P. (eds.) Proceedings of the ACM SIGPLAN 2008 Conference on Pro-
gramming Language Design and Implementation, Tucson, AZ, USA, June 7-13,
2008. pp. 1–10. ACM (2008). https://doi.org/10.1145/1375581.1375583
65. Toman, J., Pernsteiner, S., Torlak, E.: crust: A bounded verifier for Rust.
In: Cohen, M.B., Grunske, L., Whalen, M. (eds.) 30th IEEE/ACM Interna-
tional Conference on Automated Software Engineering, ASE 2015, Lincoln,
NE, USA, November 9-13, 2015. pp. 75–80. IEEE Computer Society (2015).
https://doi.org/10.1109/ASE.2015.77
66. Ullrich, S.: Electrolysis reference (2016), http://kha.github.io/electrolysis/
67. Ullrich, S.: Simple Verification of Rust Programs via Functional Purification. Mas-
ter’s thesis, Karlsruhe Institute of Technology (2016)
68. Vafeiadis, V.: Modular fine-grained concurrency verification. Ph.D. thesis, Univer-
sity of Cambridge, UK (2008), http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.612221
69. Z3 Team: The Z3 theorem prover (2020), https://github.com/Z3Prover/z3

A First-Order Logic with Frames

Adithya Murali†, Lucas Peña†, Christof Löding‡, and P. Madhusudan†



† University of Illinois at Urbana-Champaign, Department of Computer Science,
Urbana, IL, USA, {adithya5, lpena7, madhu}@illinois.edu

‡ RWTH Aachen University, Department of Computer Science, Aachen, Germany,
[email protected]

Abstract. We propose a novel logic, called Frame Logic (FL), that ex-
tends first-order logic (with recursive definitions) using a construct Sp(·)
that captures the implicit supports of formulas— the precise subset of
the universe upon which their meaning depends. Using such supports, we
formulate proof rules that facilitate frame reasoning elegantly when the
underlying model undergoes change. We show that the logic is expressive
by capturing several data-structures and also exhibit a translation from
a precise fragment of separation logic to frame logic. Finally, we design
a program logic based on frame logic for reasoning about programs that
dynamically update heaps; this program logic facilitates local specifications
and frame reasoning, and consists of both localized proof rules and rules
that derive the weakest tightest preconditions in FL.

Keywords: Program Verification, Program Logics, Heap Verification, First-Order Logic, First-Order Logic with Recursive Definitions

1 Introduction

Program logics for expressing and reasoning about programs that dynamically
manipulate heaps are an active area of research. The research on separation logic
has argued convincingly that it is highly desirable to have localized logics that
talk about small states (heaplets rather than the global heap), and the ability
to do frame reasoning. Separation logic achieves this objective by having a tight
heaplet semantics and using special operators, primarily a separating conjunction
operator ∗ and a separating implication operator (the magic wand −∗).
In this paper, we ask a fundamental question: can classical logics (such as
FOL and FOL with recursive definitions) be extended to support localized spec-
ifications and frame reasoning? Can we utilize classical logics for reasoning effec-
tively with programs that dynamically manipulate heaps, with the aid of local
specifications and frame reasoning?
The primary contribution of this paper is to endow a classical logic, namely
first-order logic with recursive definitions (with least fixpoint semantics) with
frames and frame reasoning.

Equal contribution. Corresponding author.

© The Author(s) 2020
P. Müller (Ed.): ESOP 2020, LNCS 12075, pp. 515–543, 2020.
https://doi.org/10.1007/978-3-030-44914-8_19

A formula in first-order logic with recursive definitions (FO-RD) can be naturally associated with a support: the subset of the universe that determines its truth. By using a more careful syntax such as guarded quantification (which continues to have a classical interpretation), we can in fact write specifications in
FO-RD that have very precise supports. For example, we can write the property
that x points to a linked list using a formula list(x) written purely in FO-RD
so that its support is precisely the locations constituting the linked list.
In this paper, we define an extension of FO-RD, called Frame Logic (FL)
where we allow a new operator Sp(α) which, for an FO-RD formula α, evaluates
to the support of α. Logical formulas thus have access to supports and can use
it to separate supports and do frame reasoning. For instance, the logic can now
express that two lists are disjoint by asserting that Sp(list(x)) ∩ Sp(list(y)) = ∅.
It can then reason that in such a program heap configuration, if the program
manipulates only the locations in Sp(list(y)), then list(x) would continue to be
true, using simple frame reasoning.
The addition of the support operator to FO-RD yields a very natural logic
for expressing specifications. First, formulas in FO-RD have the same meaning
when viewed as FL formulae. For example, f (x) = y (written in FO-RD as
well as in FL) is true in any model that has x mapped by f to y, instead of a
specialized “tight heaplet semantics” that demands that f be a partial function
with the domain only consisting of the location x. The fact that the support of
this formula contains only the location x is important, of course, but is made
accessible using the support operator, i.e., Sp(f (x) = y) gives the set containing
the sole element interpreted for x. Second, properties of supports can be naturally
expressed using set operations. To state that the lists pointed to by x and y are
disjoint, we don’t need special operators (such as the ∗ operator in separation
logic) but can express this as Sp(list(x)) ∩ Sp(list(y)) = ∅. Third, when used to
annotate programs, pre/post specifications for programs written in FL can be
made implicitly local by interpreting their supports to be the localized heaplets
accessed and modified by programs, yielding frame reasoning akin to program
logics that use separation logic. Finally, as we show in this paper, the weakest
precondition of specifications across basic loop-free paths can be expressed in
FL, making it an expressive logic for reasoning with programs. Separation logic,
on the other hand, introduces the magic wand operator −∗ (which is inherently
higher-order) in order to add enough expressiveness to be closed under weakest
preconditions [38].
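For contrast, recall the standard heaplet semantics of the separating implication (the textbook definition, not notation introduced in this paper):

  h |= α −∗ β  iff  for every heaplet h′ disjoint from h with h′ |= α, we have h ⊎ h′ |= β.

The quantification over all extension heaplets h′ is what makes the operator inherently higher-order.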
We define frame logic (FL) as an extension of FO with recursive definitions
(FO-RD) that operates over a multi-sorted universe, with a particular foreground
sort (used to model locations on the heap on which pointers can mutate) and
several background sorts that are defined using separate theories. Supports for
formulas are defined with respect to the foreground sort only. A special back-
ground sort of sets of elements of the foreground sort is assumed and is used
to model the supports for formulas. For any formula ϕ in the logic, we have a
special construct Sp(ϕ) that captures its support, a set of locations in the fore-
ground sort, that intuitively corresponds to the precise subdomain of functions
the value of ϕ depends on. We then prove a frame theorem (Theorem 1) that
says that changing a model M by changing the interpretation of the mutable
functions outside the support of ϕ will not affect the truth of the formula ϕ. This theo-
rem then directly supports frame reasoning; if a model satisfies ϕ and the model
is changed so that the changes made are disjoint from the support of ϕ, then
ϕ will continue to hold. We also show that FL formulae can be translated to
vanilla FO-RD logic (without support operators); in other words, the semantics
for the support of a formula can be captured in FO-RD itself. Consequently, we
can use any FO-RD reasoning mechanism (proof systems [19, 20] or heuristic
algorithms such as the natural proof techniques [24, 32, 37, 41]) to reason with
FL formulas.
We illustrate our logic using several examples drawn from program verification; we show how to express various data-structure definitions, the elements they contain, and various measures for them using FL formulas (e.g., linked lists, sorted lists, list segments, binary search trees, AVL trees, lengths of lists, heights of trees, the set of keys stored in a data-structure, etc.).
While the sensibilities of our logic are definitely inspired by separation logic,
there are some fundamental differences beyond the fact that our logic extends
the syntax and semantics of classical logics with a special support operator
and avoids operators such as ∗ and −∗. In separation logic, there can be many
supports of a formula (also called heaplets)— a heaplet for a formula is one that
supports its truth. For example, a formula of the form α ∨ β can have a heaplet
that supports the truth of α or one that supports the truth of β. However,
the philosophy that we follow in our design is to have a single support that
supports the truth value of a formula, whether it be true or false. Consequently,
the support of the formula α ∨ β is the union of the supports of the formulas α
and β.
The above design choice of the support being determined by the formula has
several consequences that lead to a deviation from separation logic. For instance,
the support of the negation of a formula ϕ is the same as the support of ϕ. And
the support of the formula f (x) = y and its negation are the same, namely the
singleton location interpreted for x. In separation logic, the corresponding for-
mula will have the same heaplet but its negation will include all other heaplets.
The choice of having determined supports or heaplets is not new, and there have
been several variants and sublogics of separation logics that have been explored.
For example, the logic Dryad [32, 37] is a separation logic that insists on de-
termined heaplets to support automated reasoning, and the precise fragment of
separation logic studied in the literature [29] defines a sublogic that has (essen-
tially) determined heaplets. The second main contribution in this paper is to
show that this fragment of separation logic (with slight changes for technical
reasons) can be translated to frame logic, such that the unique heaplet that
satisfies a precise separation logic formula is its support of the corresponding
formula in frame logic.
The third main contribution of this paper is a program logic based on frame
logic for a simple while-programming language destructively updating heaps. We
present two kinds of proof rules for reasoning with such programs annotated with
pre- and post-conditions written in frame logic. The first set of rules are local
rules that axiomatically define the semantics of the program, using the small-
est supports for each command. We also give a frame rule that allows arguing
preservation of properties whose supports are disjoint from the heaplet modified
by a program. These rules are similar to analogous rules in separation logic.
The second class of rules work to give a weakest tightest precondition for any
postcondition with respect to non-recursive programs. In separation logic, the
corresponding rules for weakest preconditions are often expressed using separat-
ing implication (the magic-wand operator). Given a small change made to the
heap and a postcondition β, the formula α −∗ β captures all heaplets H where
if a heaplet that satisfies α is joined with H, then β holds. When α describes
the change effected by the program, α −∗ β captures, essentially, the weakest
precondition. However, the magic wand is a very powerful operator that calls for
quantification over heaplets and submodels, and hence involves second-order
quantification. In our logic, we show that we can capture the weakest precondition
with only first-order quantification, and hence first-order frame logic is
closed under weakest preconditions across non-recursive program blocks. This
means that when inductive loop invariants are given also in FL, reasoning with
programs reduces to reasoning with FL. By translating FL to pure FO-RD for-
mulas, we can use FO-RD reasoning techniques to reason with FL, and hence
programs.

In summary, the contributions of this paper are:


– A logic, called frame logic (FL), that extends FO-RD with a support operator
and supports frame reasoning. We illustrate FL with specifications of various
data-structures. We show a translation to equivalent formulas in FO-RD.
– A program logic and proof system based on FL including local rules and rules
for computing the weakest tightest precondition. FL reasoning required for
proving programs is hence reducible to reasoning with FO-RD.
– A separation logic fragment that can generate only precise formulas, and a
translation from this logic to equivalent FL formulas.
The paper is organized as follows. Section 2 sets up first-order logics with
recursive definitions (FO-RD), with a special uninterpreted foreground sort of lo-
cations and several background sorts/theories. Section 3 introduces Frame Logic
(FL), its syntax, its semantics which includes a discussion of design choices for
supports, proves the frame theorem for FL, shows a reduction of FL to FO-RD,
and illustrates the logic by defining several data-structures and their properties
using FL. Section 4 develops a program logic based on FL, illustrating them
with proofs of verification of programs. Section 5 introduces a precise fragment
of separation logic and shows its translation to FL. Section 6 discusses com-
parisons of FL to separation logic, and some existing first-order techniques that
can be used to reason with FL. Section 7 compares our work with the research
literature and Section 8 has concluding remarks.

2 Background: First-Order Logic with Recursive Definitions and Uninterpreted Combinations of Theories

The base logic upon which we build frame logic is a first order logic with recursive
definitions (FO-RD), where we allow a foreground sort and several background
sorts, each with their individual theories (like arithmetic, sets, arrays, etc.). The
foreground sort and functions involving the foreground sort are uninterpreted
(not constrained by theories). This hence can be seen as an uninterpreted com-
bination of theories over disjoint domains. This logic has been defined and used
to model heap verification before [23].
We will build frame logic over such a framework where supports are modeled
as subsets of elements of the foreground sort. When modeling heaps in program
verification using logic, the foreground sort will be used to model locations of the
heap, uninterpreted functions from the foreground sort to foreground sort will
be used to model pointers, and uninterpreted functions from the foreground sort
to the background sort will model data fields. Consequently, supports will be
subsets of locations of the heap, which is appropriate as these are the domains
of pointers that change when a program updates a heap.
We define a signature as Σ = (S; C; F ; R; I), where S is a finite non-empty
set of sorts. C is a set of constant symbols, where each c ∈ C has some sort
τ ∈ S. F is a set of function symbols, where each function f ∈ F has a type of
the form τ1 × . . . × τm → τ for some m, with τi , τ ∈ S. The sets R and I are
(disjoint) sets of relation symbols, where each relation R ∈ R ∪ I has a type of
the form τ1 × . . . × τm . The set I contains those relation symbols for which the
corresponding relations are inductively defined using formulas (details are given
below), while those in R are given by the model.
We assume that the set of sorts contains a designated “foreground sort”
denoted by σf. All the other sorts in S are called background sorts, and for each such background sort σ we allow the constant symbols of type σ, the function symbols of type σⁿ → σ for some n, and the relation symbols of type σᵐ for some m to be constrained using an arbitrary theory Tσ.
A formula in first-order logic with recursive definitions (FO-RD) over such a
signature is of the form (D, α), where D is a set of recursive definitions of the
form R(x) := ρR (x), where R ∈ I and ρR (x) is a first-order logic formula, in
which the relation symbols from I occur only positively. α is also a first-order
logic formula over the signature. We assume D has at most one definition for any
inductively defined relation, and that the formulas ρR and α use only inductive
relations defined in D.
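As a small illustration (our example, not one from this paper), reachability via a unary function next can be written as the FO-RD formula (D, α) with

  D = { reach(x, y) := x = y ∨ reach(next(x), y) }   and   α = reach(a, b),

where reach ∈ I occurs only positively in its own definition and is interpreted as the least relation satisfying the equation.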
The semantics of a formula is standard; the semantics of inductively defined
relations are defined to be the least fixpoint that satisfies the relational equations,
and the semantics of α is the standard one defined using these semantics for
relations. We do not formally define the semantics, but we will formally define
the semantics of frame logic (discussed in the next section and whose semantics
is defined in the Technical Report [25]) which is an extension of FO-RD.

3 Frame Logic

We now define Frame Logic (FL), the central contribution of this paper.

FL formulas:  ϕ ::= tτ = tτ | R(tτ1 , . . . , tτm ) | ϕ ∧ ϕ | ¬ϕ | ite(γ : ϕ, ϕ) | ∃y : γ. ϕ
              where τ ∈ S and R ∈ R ∪ I of type τ1 × · · · × τm
Guards:       γ ::= tτ = tτ | R(tτ1 , . . . , tτm ) | γ ∧ γ | ¬γ | ite(γ : γ, γ) | ∃y : γ. γ
              where τ ∈ S \ {σS(f)} and R ∈ R of type τ1 × · · · × τm
Terms:        tτ ::= c | x | f (tτ1 , . . . , tτm ) | ite(γ : tτ , tτ ) |
                     Sp(ϕ) (if τ = σS(f)) | Sp(tτ′) (if τ = σS(f))
              where τ, τ′ ∈ S, with constants c and variables x of type τ,
              and functions f of type τ1 × · · · × τm → τ
Recursive definitions:  R(x) := ρR(x), with R ∈ I of type τ1 × · · · × τm where
              τi ∈ S \ {σS(f)}, and ρR(x) an FL formula in which all relation
              symbols R′ ∈ I occur only positively or inside a support expression.

Fig. 1. Syntax of frame logic: γ for guards, tτ for terms of sort τ, and general formulas
ϕ. Guards cannot use inductively defined relations or support expressions.

We consider a universe with a foreground sort and several background sorts,
each restricted by individual theories, as described in Section 2. We consider the
elements of the foreground sort to be locations and consider supports as sets
of locations, i.e., sets of elements of the foreground sort. We hence introduce a
background sort σS(f) ; the elements of sort σS(f) model sets of elements of sort σf .
Among the relation symbols in R there is the relation ∈ of type σf × σS(f) that
is interpreted as the usual element relation. The signature includes the standard
operations on sets ∪ and ∩ with the usual meaning, a unary complement function (interpreted as the complement with respect to the set of foreground elements), and the constant ∅.
background theory BσS(f) that is an axiomatization of the theory of sets. We
further assume that the signature does not contain any other function or relation
symbols involving the sort σS(f) .
For reasoning about changes of the structure over the locations, we assume
that there is a subset Fm ⊆ F of function symbols that are declared mutable.
These functions can be used to model mutable pointer fields in the heap that
can be manipulated by a program and thus change. Formally, we require that
each f ∈ Fm has at least one argument of sort σf .
For variables, let Var τ denote the set of variables of sort τ , where τ ∈ S. We
let x abbreviate tuples x1 , . . . , xn of variables.
Our frame logic over uninterpreted combinations of theories is a variant of
first-order logic with recursive definitions that has an additional operator Sp(ϕ)
that assigns to each formula ϕ a set of elements (its support or “heaplet” in the
context of heaps) in the foreground universe. So Sp(ϕ) is a term of sort σS(f) .

The intended semantics of Sp(ϕ) (and of the inductive relations) is defined
formally as a least fixpoint of a set of equations. This semantics is presented
in Section 3.3. In the following, we first define the syntax of the logic, then
discuss informally the various design decisions for the semantics of supports,
before proceeding to a formal definition of the semantics.

3.1 Syntax of Frame Logic (FL)

The syntax of our logic is given in the grammar in Figure 1. This extends FO-RD
with the rule for building support expressions, which are terms of sort σS(f) of
the form Sp(α) for a formula α, or Sp(t) for a term t.
The formulas defined by γ are used as guards in existential quantification and
in the if-then-else-operator, which is denoted by ite. The restriction compared to
general formulas is that guards cannot use inductively defined relations (R ranges
only over R in the rule for γ, and over R ∪ I in the rule for ϕ), nor terms of sort
σS(f) and thus no support expressions (τ ranges over S \ {σS(f) } in the rules for γ
and over S in the rule for ϕ). The requirement that the guard does not use the
inductive relations and support expressions is used later to ensure the existence
of least fixpoints for defining semantics of inductive definitions. The semantics of
an ite-formula ite(γ : α, β) is the same as that of (γ ∧ α) ∨ (¬γ ∧ β); however, the
supports of the two formulas will turn out to be different (i.e., Sp(ite(γ : α, β))
and Sp((γ ∧ α) ∨ (¬γ ∧ β)) are different), as explained in Section 3.2. The same
is true for existential formulas, i.e., ∃y : γ.ϕ has the same semantics as ∃y.γ ∧ ϕ
but, in general, has a different support.
For recursive definitions (throughout the paper, we use the terms recursive
definitions and inductive definitions with the same meaning), we require that
the relation R that is defined does not have arguments of sort σS(f) . This is
another restriction in order to ensure the existence of a least fixpoint model in
the definition of the semantics.¹

3.2 Semantics of Support Expressions: Design Decisions

We discuss the design decisions that go behind the semantics of the support
operator Sp in our logic, and then give an example for the support of an inductive
definition. The formal conditions that the supports should satisfy are stated in
the equations in Figure 2, and are explained in Section 3.3. Here, we start with an
informal discussion.
The first decision is to have every formula uniquely define a support, which
roughly captures the subdomain of mutable functions that a formula ϕ’s truth-
hood depends on, and have Sp(ϕ) evaluate to it.
The choice for supports of atomic formulae are relatively clear. An atomic
formula of the kind f (x)=y, where x is of the foreground sort and f is a mutable
function, has as its support the singleton set containing the location interpreted
¹ It would be sufficient to restrict formulas of the form R(t1, . . . , tn) for inductive relations R to not contain support expressions as subterms.
for x. And atomic formulas that do not involve mutable functions over the fore-
ground have an empty support. Supports for terms can also be similarly defined.
The support of a conjunction α ∧ β should clearly be the union of the supports
of the two formulas.
Remark 1. In traditional separation logic, each pointer field is stored in a sep-
arate location, using integer offsets. However, in our work, we view pointers as
references and disallow pointer arithmetic. A more accurate heaplet for such
references can be obtained by taking the heaplet to be the pair (x, f ) (see [30]), cap-
turing the fact that the formula depends only on the field f of x. Such accurate
heaplets can be captured in FL as well— we can introduce a non-mutable field
lookup pointer Lf and use x.Lf .f in programs instead of x.f .
What should the support of a formula α ∨ β be? The choice we make here is
that its support is the union of the supports of α and β. Note that in a model
where α is true and β is false, we still include the heaplet of β in Sp(α ∨ β). In a
sense, this is an overapproximation of the support as far as frame reasoning goes,
as surely preserving the model’s definitions on the support of α will preserve the
truth of α, and hence of α ∨ β.
However, we prefer the support to be the union of the supports of α and β.
We think of the support as the subdomain of the universe that determines the
meaning of the formula, whether it be true or false. Consequently, we would like
the support of a formula and its negation to be the same. Given that the support
of the negation of a disjunction, being a conjunction, is the union of the frames
of α and β, we would like this to be the support.
Separation logic makes a different design decision. Logical formulas are not
associated with tight supports, but rather, the semantics of the formula is defined
for models with given supports/heaplets, where the idea of a heaplet is whether
it supports the truthhood of a formula (and not its falsehood). For example,
for a model, the various heaplets that satisfy ¬(f (x) = y) in separation logic
would include all heaplets where the location of x is not present, which does
not coincide with the notion we have chosen for supports. However, for positive
formulas, separation logic handles supports more accurately, as it can associate
several supports for a formula, yielding two heaplets for formulas of the form
α ∨ β when they are both true in a model. The decision to have a single support
for a formula compels us to take the union of the supports to be the support of
a disjunction.
There are situations, however, where there are disjunctions α ∨ β, where only
one of the disjuncts can possibly be true, and hence we would like the support
of the formula to be the support of the disjunct that happens to be true. We
therefore introduce a new syntactical form ite(γ : α, β) in frame logic, whose
heaplet is the union of the supports of γ and α, if γ is true, and the supports
of γ and β if γ is false. While the truthhood of ite(γ : α, β) is the same as that
of (γ ∧ α) ∨ (¬γ ∧ β), its supports are potentially smaller, allowing us to write
formulas with tighter supports to support better frame reasoning. Note that the
support of ite(γ : α, β) and its negation ite(γ : ¬α, ¬β) are the same, as we
desired.
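As a concrete instance of this design choice, consider a mutable function f (a worked example of ours, computed from the equations of Figure 2). Since the guard x = nil applies no mutable function, its support is empty, and we obtain

  Sp(ite(x = nil : true, f (x) = y)) = ∅ if x = nil, and {x} otherwise,
  Sp((x = nil ∧ true) ∨ (¬(x = nil) ∧ f (x) = y)) = {x} in every model,

since the support of the disjunction is always the union of the supports of both disjuncts.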

Turning to quantification, the support for a formula of the form ∃x.α is hard
to define, as its truthhood could depend on the entire universe. We hence provide
a mechanism for guarded quantification, in the form ∃x : γ. α. The semantics
of this formula is that there exists some location that satisfies the guard γ, for
which α holds. The support for such a formula includes the support of the guard,
and the supports of α when x is interpreted to be a location that satisfies γ. For
example, ∃x : (x = f (y)). g(x) = z has as its support the locations interpreted
for y and f (y) only.
For a formula R(t) with an inductive relation R defined by R(x) := ρR (x),
the support descends into the definition, changing the variable assignment of the
variables in x from the inductive definition to the terms in t. Furthermore, it
contains the elements to which mutable functions are applied in the terms in t.
Recursive definitions are designed such that the evaluation of the equations
for the support expressions is independent of the interpretation of the inductive
relations. The equations mainly depend on the syntactic structure of formulas
and terms. Only the semantics of guards, and the semantics of subterms under
a mutable function symbol play a role. For this reason, we disallow guards to
contain recursively defined relations or support expressions. We also require that
the only functions involving the sort σS(f) are the standard functions involving
sets. Thus, subterms of mutable functions cannot contain support expressions
(which are of sort σS(f) ) as subterms.
These restrictions ensure that there indeed exists a unique simultaneous least
solution of the equations for the inductive relations and the support expressions.
We end this section with an example.
Example 1. Consider the definition of a predicate tree(x) w.r.t. two unary mutable functions left and right:

tree(x) := ite(x = nil : true, α)   where
α = ∃ℓ, r : (ℓ = left(x) ∧ r = right(x)). tree(ℓ) ∧ tree(r) ∧
      Sp(tree(ℓ)) ∩ Sp(tree(r)) = ∅ ∧ ¬(x ∈ Sp(tree(ℓ)) ∪ Sp(tree(r)))
This inductive definition defines binary trees with pointer fields left and right
for left- and right-pointers, by stating that x points to a tree if either x is equal
to nil (in this case its support is empty), or left(x) and right(x) are trees with
disjoint supports. The last conjunct says that x does not belong to the support
of the left and right subtrees; this condition is, strictly speaking, not required to
define trees (under least fixpoint semantics). Note that the access to the support
of formulas eases defining disjointness of heaplets, like in separation logic. The
support of tree(x) turns out to be precisely the nodes that are reachable from
x using left and right pointers, as one would desire. Consequently, if a pointer
outside this support changes, we would be able to conclude using frame reasoning
that the truth value of tree(x) does not change.

3.3 Formal Semantics of Frame Logic


Before we explain the semantics of the support expressions and inductive defini-
tions, we introduce a semantics that treats support expressions and the symbols
Sp(c)M(ν) = Sp(x)M(ν) = ∅        for a constant c or variable x

Sp(f (t1, . . . , tn))M(ν) = { tiM,ν | ti of sort σf } ∪ Sp(t1)M(ν) ∪ · · · ∪ Sp(tn)M(ν)   if f ∈ Fm
Sp(f (t1, . . . , tn))M(ν) = Sp(t1)M(ν) ∪ · · · ∪ Sp(tn)M(ν)                               if f ∉ Fm

Sp(Sp(ϕ))M(ν) = Sp(ϕ)M(ν)
Sp(Sp(t))M(ν) = Sp(t)M(ν)
Sp(t1 = t2)M(ν) = Sp(t1)M(ν) ∪ Sp(t2)M(ν)
Sp(R(t1, . . . , tn))M(ν) = Sp(t1)M(ν) ∪ · · · ∪ Sp(tn)M(ν)        for R ∈ R
Sp(R(t))M(ν) = Sp(ρR(x))M(ν[x ← tM,ν ]) ∪ Sp(t1)M(ν) ∪ · · · ∪ Sp(tn)M(ν)
               for R ∈ I with definition R(x) := ρR(x), t = (t1, . . . , tn), x = (x1, . . . , xn)

Sp(α ∧ β)M(ν) = Sp(α)M(ν) ∪ Sp(β)M(ν)
Sp(¬ϕ)M(ν) = Sp(ϕ)M(ν)

Sp(ite(γ : α, β))M(ν) = Sp(γ)M(ν) ∪ Sp(α)M(ν)      if M, ν |= γ
Sp(ite(γ : α, β))M(ν) = Sp(γ)M(ν) ∪ Sp(β)M(ν)      otherwise
Sp(ite(γ : t1, t2))M(ν) = Sp(γ)M(ν) ∪ Sp(t1)M(ν)   if M, ν |= γ
Sp(ite(γ : t1, t2))M(ν) = Sp(γ)M(ν) ∪ Sp(t2)M(ν)   otherwise

Sp(∃y : γ. ϕ)M(ν) = ⋃u∈Dy Sp(γ)M(ν[y ← u])  ∪  ⋃u∈Dy, M,ν[y←u]|=γ Sp(ϕ)M(ν[y ← u])

Fig. 2. Equations for support expressions

from I as uninterpreted symbols. We refer to this semantics as uninterpreted semantics. For the formal definition we need to introduce some terminology first.
An occurrence of a variable x in a formula is free if it does not occur under
the scope of a quantifier for x. By renaming variables we can assume that each
variable only occurs freely in a formula or is quantified by exactly one quantifier
in the formula. We write ϕ(x1 , . . . , xk ) to indicate that the free variables of ϕ are
among x1 , . . . , xk . Substitution of a term t for all free occurrences of variable x in
a formula ϕ is denoted ϕ[t/x]. Multiple variables are substituted simultaneously
as ϕ[t1 /x1 , . . . , tn /xn ]. We abbreviate this by ϕ[t/x].
A model is of the form M = (U ; ·M ), where U = (Uσ)σ∈S contains a universe for each sort, and ·M is an interpretation function. The universe for the sort σS(f) is the powerset of the universe for σf.
A variable assignment is a function ν that assigns to each variable a concrete
element from the universe for the sort of the variable. For a variable x, we write
Dx for the universe of the sort of x (the domain of x). For a variable x and an
element u ∈ Dx we write ν[x ← u] for the variable assignment that is obtained
from ν by changing the value assigned for x to u.
The interpretation function ·M maps each constant c of sort σ to an element cM ∈ Uσ, each function symbol f : τ1 × . . . × τm → τ to a concrete function f M : Uτ1 × . . . × Uτm → Uτ, and each relation symbol R ∈ R ∪ I of type τ1 × . . . × τm to a concrete relation RM ⊆ Uτ1 × . . . × Uτm. These interpretations are assumed to satisfy the background theories (see Section 2). Furthermore, the interpretation function maps each expression of the form Sp(ϕ) to a function Sp(ϕ)M that assigns to each variable assignment ν a set Sp(ϕ)M(ν) of foreground elements. The set Sp(ϕ)M(ν) corresponds to the support of the formula when the free variables are interpreted by ν. Similarly, Sp(t)M is a function from variable assignments to sets of foreground elements.
Based on such models, we can define the semantics of terms and formulas in the standard way. The only non-standard constructs in our logic are terms of the form Sp(ϕ), for which the semantics is directly given by the interpretation function. We write tM,ν for the interpretation of a term t in M with variable assignment ν. With this convention, Sp(ϕ)M(ν) denotes the same thing as Sp(ϕ)M,ν. As usual, we write M, ν |= ϕ to indicate that the formula ϕ is true in M with the free variables interpreted by ν, and ϕM denotes the relation defined by the formula ϕ with free variables x.
We refer to the above semantics as the uninterpreted semantics of ϕ because
we do not give a specific meaning to inductive definitions and support expres-
sions.
Now let us define the true semantics for FL. The relation symbols R ∈ I
represent inductively defined relations, which are defined by equations of the
form R(x) := ρR (x) (see Figure 1). In the intended meaning, R is interpreted as
the least relation that satisfies the equation
R(x)M = ρR (x)M .
The usual requirement for the existence of a unique least fixpoint of the equation
is that the definition of R does not negatively depend on R. For this reason, we
require that in ρR (x) each occurrence of an inductive predicate R ∈ I is either
inside a support expression, or it occurs under an even number of negations.²
Every support expression is evaluated on a model to a set of foreground el-
ements (under a given variable assignment ν). Formally, we are interested in
models in which the support expressions are interpreted to be the sets that cor-
respond to the smallest solution of the equations given in Figure 2. The intuition
behind these definitions was explained in Section 3.2.
Example 2. Consider the inductive definition tree(x) from Example 1. To check whether the equations from Figure 2 indeed yield the desired support, note that Sp(x = nil) = Sp(x) = Sp(true) = ∅. Below, we write [u] for a variable assignment that assigns u to the free variable of the formula that we are considering. Then we obtain that Sp(tree(x))[u] = ∅ if u = nil, and Sp(tree(x))[u] = Sp(α)[u] if u ≠ nil. The formula α is existentially quantified with guard ℓ = left(x) ∧ r = right(x). The support of this guard is {u} because mutable functions are applied to x. The support of the remaining part of α is the union of the supports of tree(ℓ)[left(u)] and tree(r)[right(u)] (the assignments for ℓ and r that make the guard true). So we obtain, for the case that u ≠ nil, that the element u enters the support, and the recursion further descends into the subtrees of u, as desired.
² As usual, it would be sufficient to forbid negative occurrences of inductive predicates in mutual recursion.

A frame model is a model in which the interpretation of the inductive relations and of the support expressions corresponds to the least solution of the respective equations (see the Technical Report [25] for a rigorous formalisation).

Proposition 1. For each model M , there is a unique frame model over the
same universe and the same interpretation of the constants, functions, and non-
inductive relations.

3.4 A Frame Theorem

The support of a formula can be used for frame reasoning in the following sense:
if we modify a model M by changing the interpretation of the mutable functions
(e.g., a program modifying pointers), then truth values of formulas do not change
if the change happens outside the support of the formula. This is formalized
below and proven in the Technical Report [25].
Given two models M, M′ over the same universe, we say that M′ is a mutation of M if RM = RM′, cM = cM′, and f M = f M′ for all constants c, relations R ∈ R, and functions f ∈ F \ Fm. In other words, M′ can only be different from M on the interpretations of the mutable functions, the inductive relations, and the support expressions.

Given a subset X ⊆ Uσf of the elements from the foreground universe, we say that the mutation is stable on X if the values of the mutable functions did not change on arguments from X, that is, f M (u1, . . . , un) = f M′ (u1, . . . , un) for all mutable functions f ∈ Fm and all appropriate tuples u1, . . . , un of arguments with {u1, . . . , un} ∩ X ≠ ∅.

Theorem 1 (Frame Theorem). Let M, M′ be frame models such that M′ is a mutation of M that is stable on X ⊆ Uσf, and let ν be a variable assignment. Then M, ν |= α iff M′, ν |= α for all formulas α with Sp(α)M(ν) ⊆ X, and tM,ν = tM′,ν for all terms t with Sp(t)M(ν) ⊆ X.
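As a usage sketch (our instantiation of the theorem, with list as defined in Figure 3): if M, ν |= list(x) and M′ is a mutation of M that changes next only at locations u ∉ Sp(list(x))M(ν), then M′ is stable on X = Sp(list(x))M(ν), and the theorem yields M′, ν |= list(x); the list is preserved by any update outside its support.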

3.5 Reduction from Frame Logic to FO-RD

The only extension of frame logic compared to FO-RD is the operator Sp, which
defines a function from interpretations of free variables to sets of foreground
elements. The semantics of this operator can be captured within FO-RD itself,
so reasoning within frame logic can be reduced to reasoning within FO-RD.
A formula α(y) with y = y1, . . . , ym has one support for each interpretation of the free variables. We capture these supports by an inductively defined relation Spα(y, z) of arity m + 1 such that, for each frame model M, we have (u1, . . . , um, u) ∈ SpαM iff u ∈ Sp(α)M(ν) for the interpretation ν that interprets yi as ui.
Since the semantics of Sp(α) is defined over the structure of α, we introduce
corresponding inductively defined relations Spβ and Spt for all subformulas β
and subterms t of either α or of a formula ρR for R ∈ I.

list(x) := ite(x = nil : true, ∃z : z = next(x). list(z) ∧ x ∉ Sp(list(z)))
                                                              (linked list)
dll(x) := ite(x = nil : true, ite(next(x) = nil : true, ∃z : z = next(x).
    prev(z) = x ∧ dll(z) ∧ x ∉ Sp(dll(z))))                   (doubly linked list)
lseg(x, y) := ite(x = y : true, ∃z : z = next(x). lseg(z, y) ∧ x ∉ Sp(lseg(z, y)))
                                                              (linked list segment)
length(x, n) := ite(x = nil : n = 0, ∃z : z = next(x). length(z, n − 1))
                                                              (length of list)
slist(x) := ite(x = nil : true, ite(next(x) = nil : true, ∃z : z = next(x).
    key(x) ≤ key(z) ∧ slist(z) ∧ x ∉ Sp(slist(z))))           (sorted list)
mkeys(x, M) := ite(x = nil : M = ∅, ∃z, M1 : z = next(x).
    M = M1 ∪m {key(x)} ∧ mkeys(z, M1) ∧ x ∉ Sp(mkeys(z, M1)))
                                                              (multiset of keys in linked list)
btree(x) := ite(x = nil : true, ∃ℓ, r : ℓ = left(x) ∧ r = right(x).
    btree(ℓ) ∧ btree(r) ∧ x ∉ Sp(btree(ℓ)) ∧ x ∉ Sp(btree(r)) ∧
    Sp(btree(ℓ)) ∩ Sp(btree(r)) = ∅)                          (binary tree)
bst(x) := ite(x = nil : true, ite(left(x) = nil ∧ right(x) = nil : true,
    ite(left(x) = nil :
      ∃r : r = right(x). key(x) ≤ key(r) ∧ bst(r) ∧ x ∉ Sp(bst(r)),
    ite(right(x) = nil :
      ∃ℓ : ℓ = left(x). key(ℓ) ≤ key(x) ∧ bst(ℓ) ∧ x ∉ Sp(bst(ℓ)),
      ∃ℓ, r : ℓ = left(x) ∧ r = right(x). key(x) ≤ key(r) ∧ key(ℓ) ≤ key(x) ∧
      bst(ℓ) ∧ bst(r) ∧ x ∉ Sp(bst(ℓ)) ∧ x ∉ Sp(bst(r)) ∧
      Sp(bst(ℓ)) ∩ Sp(bst(r)) = ∅))))                         (binary search tree)
height(x, n) := ite(x = nil : n = 0, ∃ℓ, r, n1, n2 : ℓ = left(x) ∧ r = right(x).
    height(ℓ, n1) ∧ height(r, n2) ∧ ite(n1 > n2 : n = n1 + 1, n = n2 + 1))
                                                              (height of binary tree)
bfac(x, b) := ite(x = nil : b = 0, ∃ℓ, r, n1, n2 : ℓ = left(x) ∧ r = right(x).
    height(ℓ, n1) ∧ height(r, n2) ∧ b = n2 − n1)
                                                              (balance factor (for AVL tree))
avl(x) := ite(x = nil : true, ∃ℓ, r : ℓ = left(x) ∧ r = right(x).
    avl(ℓ) ∧ avl(r) ∧ bfac(x) ∈ {−1, 0, 1} ∧
    x ∉ Sp(avl(ℓ)) ∪ Sp(avl(r)) ∧ Sp(avl(ℓ)) ∩ Sp(avl(r)) = ∅)   (avl tree)
ttree(x) := pttree(x, nil)                                    (threaded tree)
pttree(x, p) := ite(x = nil : true, ∃ℓ, r : ℓ = left(x) ∧ r = right(x).
    ((r = nil ∧ tnext(x) = p) ∨ (r ≠ nil ∧ tnext(x) = r)) ∧
    pttree(ℓ, x) ∧ pttree(r, p) ∧ x ∉ Sp(pttree(ℓ, x)) ∪ Sp(pttree(r, p)) ∧
    Sp(pttree(ℓ, x)) ∩ Sp(pttree(r, p)) = ∅)
                                                              (threaded tree auxiliary definition)

Fig. 3. Example definitions of data-structures and other predicates in Frame Logic



The equations for supports from Figure 2 can be expressed by inductive def-
initions for the relations Spβ . The translations are shown in the Technical Re-
port [25]. It is not hard to see that general frame logic formulas can be translated
to FO-RD formulas that make use of these new inductively defined relations.
Proposition 2. For every frame logic formula there is an equisatisfiable FO-
RD formula with the signature extended by auxiliary predicates for recursive
definitions of supports.
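
For intuition, consider the list definition from Figure 3. Its support relation
would be captured by an inductive definition along the following lines (a sketch;
the precise, general translation is given in the Technical Report [25]):

    Sp_list(x, z) := ite(x = nil, false, z = x ∨ ∃y. y = next(x) ∧ Sp_list(y, z))

That is, z belongs to the support of list(x) iff x is non-nil and z is either x
itself or in the support of list(next(x)).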

3.6 Expressing Data-Structures Properties in FL


We now present the formulation of several data-structures and properties about
them in FL. Figure 3 depicts formulations of singly- and doubly-linked lists,
list segments, lengths of lists, sorted lists, the multiset of keys stored in a list
(assuming a background sort of multisets), binary trees, their heights, and AVL
trees. In all these definitions, the support operator plays a crucial role. We also
present a formulation of single threaded binary trees (adapted from [7]), which are
binary trees where, apart from tree-edges, there is a pointer tnext that connects
every tree node to the inorder successor in the tree; these pointers go from leaves
to ancestors arbitrarily far away in the tree, making it a nontrivial definition.
We believe that FL formulas naturally and succinctly express these data-
structures and their properties, making it an attractive logic for annotating
programs.

4 Programs and Proofs


In this section, we develop a program logic for a while-programming language
that can destructively update heaps. We assume that location variables are de-
noted by variables of the form x and y, whereas variables that denote other
data (which would correspond to the background sorts in our logic) are denoted
by v. We omit the grammar to construct background terms and formulas, and
simply denote such ‘background expressions’ with be and clarify the sort when
it is needed. Finally, we assume that our programs are written in Static Single
Assignment (SSA) form, which means that every variable is assigned to at most
once in the program text. The grammar for our programming language is in
Figure 4.

S ::= x := c | x := y | x := y.f | v := be | x.f := y
    | alloc(x) | free(x) | if be then S else S | while be do S | S ; S

Fig. 4. Grammar of while programs. c is a constant location, f is a field pointer, and
be is a background expression. In our logic, we model every field f as a function f(·)
from locations to the appropriate sort.

4.1 Operational Semantics

A configuration C is of the form (M, H, U ) where M contains interpretations


for the store and the heap. The store is a partial map that interprets variables,
constants, and non-mutable functions (a function from location variables to lo-
cations) and the heap is a total map on the domain of locations that interprets
mutable functions (a function from pointers and locations to locations). H is a
subset of locations denoting the set of allocated locations, and U is a subset of
locations denoting a subset of unallocated locations that can be allocated in the
future. We introduce a special configuration ⊥, to which the program transitions
when it dereferences a location not in H.
A configuration (M, H, U ) is valid if all variables of the location sort map
only to locations not in U , locations in H do not point to any location in U ,
and U is a subset of the complement of H that does not contain nil or the
locations mapped to by the variables. We denote this by valid (M, H, U ). Initial
configurations and reachable configurations of any program will be valid.
The transitions of configurations on the various commands that manipulate the
store and heap are defined in the natural way. Allocation adds a new location
from U into H with pointer-fields defaulting to nil and default data fields. See
the Technical Report [25] for more details.
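
The following Python sketch (ours; the formal rules are in the Technical
Report [25]) illustrates the shape of these transitions for two representative
commands. The representation of configurations is our own choice: M is split
into a store (variables to values) and a heap (allocated location to field to
value); H and U are sets of locations.

    BOTTOM = "bottom"   # the abort configuration, reached on a bad dereference

    def step_lookup(store, heap, H, U, x, y, f):
        # x := y.f : aborts if y holds a location that is not allocated
        loc = store[y]
        if loc not in H:
            return BOTTOM
        new_store = dict(store)
        new_store[x] = heap[loc][f]
        return (new_store, heap, H, U)

    def step_alloc(store, heap, H, U, x, fields):
        # alloc(x) : moves a location from U into H, fields default to nil (None)
        loc = next(iter(U))               # assumes U is non-empty
        new_heap = dict(heap)
        new_heap[loc] = {f: None for f in fields}
        new_store = dict(store)
        new_store[x] = loc
        return (new_store, new_heap, H | {loc}, U - {loc})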

4.2 Triples and Validity

We express specifications of programs using triples of the form {α}S{β} where


α and β are FL formulae and S is a program. The formulae are, however,
restricted: for simplicity, we disallow atomic relations on locations and func-
tions of arity greater than one. We also disallow functions from a background
sort to the foreground sort (see Section 3). Lastly, unrestricted quantified
formulae can have supports as large as the entire heap; our program logic
therefore covers a more practical fragment, without compromising expressivity,
by requiring guards in quantification to be of the form f(z′) = z or z ∈ U
(where z is the quantified variable).
We define a triple to be valid if every valid configuration with heaplet being
precisely the support of α, when acted on by the program, yields a configuration
with heaplet being the support of β. More formally, a triple is valid if for every
valid configuration (M, H, U) such that M |= α and H = Sp(α)^M:

– it is never the case that the abort state ⊥ is encountered in the execution
of S;
– if (M, H, U) transitions to (M′, H′, U′) on S, then M′ |= β and
H′ = Sp(β)^M′.

4.3 Program Logic


First, we define a set of local rules and rules for conditionals, while, sequence,
consequence, and framing:

Assignment:    {true} x := y {x = y}        {true} x := c {x = c}

Lookup:        {f(y) = f(y)} x := y.f {x = f(y)}

Mutation:      {f(x) = f(x)} x.f := y {f(x) = y}

Allocation:    {true} alloc(x) {⋀_{f ∈ F} f(x) = def_f}

Deallocation:  {f(x) = f(x)} free(x) {true}

Conditional:   from {be ∧ α} S {β} and {¬be ∧ α} T {β},
               infer {α} if be then S else T {β}

While:         from {α ∧ be} S {α}, infer {α} while be do S {¬be ∧ α}

Sequence:      from {α} S {β} and {β} T {μ}, infer {α} S ; T {μ}

Consequence:   from {α} S {β}, α′ ⟹ α, β ⟹ β′, Sp(α) = Sp(α′),
               and Sp(β) = Sp(β′), infer {α′} S {β′}

Frame:         from {α} S {β}, Sp(α) ∩ Sp(μ) = ∅, and vars(S) ∩ fv(μ) = ∅,
               infer {α ∧ μ} S {β ∧ μ}

The above rules are intuitively clear and are similar to the local rules in
separation logic [38]. The rules for statements capture their semantics using
minimal/tight heaplets, and the frame rule allows proving triples with larger
heaplets. In the rule for alloc, the postcondition says that the newly allocated
location has default values for all pointer fields and data fields (denoted def_f).
The soundness of the frame rule relies crucially on the frame theorem for FL
(Theorem 1). The full soundness proof can be found in the Technical Report [25].

Theorem 2. The above rules are sound with respect to the operational seman-
tics.
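
As a simple illustration of the frame rule (our own example), start from the
local mutation rule {f(x) = f(x)} x.f := y {f(x) = y}, whose pre- and post-
condition both have support {x}. For a frame μ = list(u), with u distinct from
x and y, the side conditions require Sp(f(x) = f(x)) ∩ Sp(list(u)) = ∅ (that is,
x ∉ Sp(list(u))) and vars(x.f := y) ∩ fv(list(u)) = ∅, and the frame rule yields

    {f(x) = f(x) ∧ list(u)} x.f := y {f(x) = y ∧ list(u)}

so the mutation cannot disturb the list rooted at u.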

4.4 Weakest-Precondition Proof Rules


We now turn to the much more complex problem of designing rules that give
weakest preconditions for arbitrary postconditions, for loop-free programs. In
separation logic, such rules resort to the magic-wand operator −∗ [12, 27,
28, 38], a complex operator whose semantics calls for second-order quantification
over arbitrarily large submodels. In our setting, our
main goal is to show that FL is itself capable of expressing weakest preconditions
of postconditions written in FL.

First, we define a notion of Weakest Tightest Precondition (WTP) of a for-


mula β with respect to each command in our operational semantics. To define
this notion, we first define a preconfiguration, and use that definition to define
weakest tightest preconditions:
Definition 1. The preconfigurations corresponding to a valid configuration
(M, H, U) with respect to a program S are a set of valid configurations of the
form (M_p, H_p, U_p) (with M_p being a model, H_p and U_p subsets of the
locations in M_p, and U_p being unallocated locations) such that when S is
executed on M_p with unallocated set U_p, it dereferences only locations in H_p
and results (using the operational semantics rules) in (M, H, U) or gets stuck
(no transition is available). That is:

preconfigurations((M, H, U), S) =
  {(M_p, H_p, U_p) | valid(M_p, H_p, U_p) and ((M_p, H_p, U_p) ⇒_S (M, H, U)
                     or (M_p, H_p, U_p) gets stuck on S)}

Definition 2. α is a WTP of a formula β with respect to a program S if

{(M_p, H_p, U_p) | M_p |= α, H_p = Sp(α)^M_p, valid(M_p, H_p, U_p)}
  = ⋃ {preconfigurations((M, H, U), S) | M |= β, H = Sp(β)^M, valid(M, H, U)}
With the notion of weakest tightest preconditions, we define global program
logic rules for each command of our language. In contrast to local rules, global
specifications contain heaplets that may be larger than the smallest heap on
which one can execute the command.
Intuitively, a WTP of β for lookup states that β must hold in the precondition
when x is interpreted as x′, where x′ = f(y), and further that the location y
must belong to the support of β. The rules for mutation and allocation are
more complex. For mutation, we define a transformation MW_{x.f:=y}(β) that
evaluates a formula β in the pre-state as though it were evaluated in the post-
state. We similarly define such a transformation MW^v_{alloc(x)} for allocation. We
will define these in detail later. Finally, the deallocation rule ensures x is not in
the support of the postcondition. The conjunct f(x) = f(x) is provided to satisfy
the tightness condition, ensuring that the support of the precondition is the
support of the postcondition with x added. The rules can be seen below, and the
proof of soundness for these global rules can be found in the Technical Report [25].

Assignment-G:   {β[y/x]} x := y {β}        {β[c/x]} x := c {β}

Lookup-G:       {∃x′ : x′ = f(y). (β ∧ y ∈ Sp(β))[x′/x]} x := y.f {β}
                (where x′ does not occur in β)

Mutation-G:     {MW_{x.f:=y}(β ∧ x ∈ Sp(β))} x.f := y {β}

Allocation-G:   {∀v : (v ∈ U). (v ≠ nil ⇒ MW^v_{alloc(x)}(β))} alloc(x) {β}
                (for some fresh variable v)

Deallocation-G: {β ∧ x ∉ Sp(β) ∧ f(x) = f(x)} free(x) {β}
                (where f ∈ F_m is an arbitrary (unary) mutable function)

4.5 Definitions of MW Primitives

Recall that the MW primitives MW_{x.f:=y} and MW^v_{alloc(x)} need to evaluate a
formula β in the pre-state as it would evaluate in the post-state after mutation
and allocation statements, respectively. (The acronym MW is a shout-out to
the magic-wand operator, as these primitives serve a similar function, except
that they are definable in FL itself.) The definition of MW_{x.f:=y} is as follows:

MW_{x.f:=y}(β) = β[λz. ite(z = x : ite(f(x) = f(x) : y, y), f(z)) / f]

The β[λz. ρ(z)/f] notation is shorthand for saying that each occurrence of a
term of the form f(t), where t is a term, is substituted (recursively, from in-
side out) by the term ρ(t). The precondition essentially evaluates β taking into
account f's transformation, but we use the ite expression with the tautological
guard f(x) = f(x) (whose support is the singleton {x}) in order to preserve
the support. The definition of MW^v_{alloc(x)} is similar. Refer to the
Technical Report [25] for details.
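
To illustrate the substitution concretely, here is a small Python sketch (ours)
that performs β[λz. ite(z = x : ite(f(x) = f(x) : y, y), f(z))/f] on a toy term
AST, rewriting applications of f from the inside out. The AST classes and
function names are hypothetical, chosen only for this illustration.

    from dataclasses import dataclass

    @dataclass
    class App:      # f(t): application of a function symbol
        fn: str
        arg: object

    @dataclass
    class Eq:       # t1 = t2
        left: object
        right: object

    @dataclass
    class Ite:      # ite(guard : then, else)
        guard: object
        then: object
        other: object

    def mw_mutation(term, f, x, y):
        # Rewrite term as MW_{x.f:=y} prescribes: each f(t) becomes
        # ite(t = x : ite(f(x) = f(x) : y, y), f(t)), innermost first.
        if isinstance(term, App):
            t = mw_mutation(term.arg, f, x, y)
            if term.fn == f:
                taut = Eq(App(f, x), App(f, x))   # tautological guard, support {x}
                return Ite(Eq(t, x), Ite(taut, y, y), App(f, t))
            return App(term.fn, t)
        if isinstance(term, Eq):
            return Eq(mw_mutation(term.left, f, x, y),
                      mw_mutation(term.right, f, x, y))
        if isinstance(term, Ite):
            return Ite(mw_mutation(term.guard, f, x, y),
                       mw_mutation(term.then, f, x, y),
                       mw_mutation(term.other, f, x, y))
        return term    # variables and constants are left untouched

For instance, mw_mutation(App('f', 'x'), 'f', 'x', 'y') produces
ite(x = x : ite(f(x) = f(x) : y, y), f(x)), which always evaluates to y while
keeping x in the support.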

Theorem 3. The rules above suffixed with -G are sound with respect to the
operational semantics. Moreover, each precondition is the weakest tightest
precondition of β.

4.6 Example

In this section, we will see an example of using our program logic rules that we
described earlier. This will demonstrate the utility of Frame Logic as a logic for
annotating and reasoning with heap manipulating programs, as well as offer some
intuition about how our program logic can be deployed in a practical setting.
The following program performs in-place list reversal:

j := nil ;
while (i != nil) do
  k := i.next ;
  i.next := j ;
  j := i ;
  i := k

For the sake of simplicity, instead of proving that this program reverses a list,
we prove the simpler claim that after executing this program, j is a list. The
recursive definition of list we use for this proof is the one from Figure 3:

list(x) := ite(x = nil, true, ∃z : z = next(x). list(z) ∧ x ∉ Sp(list(z)))

We need to also give an invariant for the while loop, simply stating that i
and j point to disjoint lists: list(i) ∧ list(j) ∧ Sp(list(i)) ∩ Sp(list(j)) = ∅.
We prove below that this is indeed an invariant of the while loop. Our proof
uses a mix of local and global rules from Sections 4.3 and 4.4 to demonstrate
how either type of rule can be used. In several places we also apply the
consequence rule together with a program rule in order to simplify the
presentation. As a result, some detailed analysis is omitted, such as proving
that supports are disjoint in order to use the frame rule.

{list(i) ∧ list(j) ∧ Sp(list(i)) ∩ Sp(list(j)) = ∅ ∧ i ≠ nil}
                                                          (consequence rule)
{list(i) ∧ list(j) ∧ Sp(list(i)) ∩ Sp(list(j)) = ∅ ∧ i ≠ nil ∧ i ∉ Sp(list(j))}
                               (consequence rule: unfolding list definition)
{∃k′ : k′ = next(i). list(k′) ∧ i ∉ Sp(list(k′)) ∧ list(j)
    ∧ i ∉ Sp(list(j)) ∧ Sp(list(k′)) ∩ Sp(list(j)) = ∅}   (consequence rule)
{∃k′ : k′ = next(i). next(i) = next(i) ∧ list(k′) ∧ i ∉ Sp(list(k′)) ∧ list(j)
    ∧ i ∉ Sp(list(j)) ∧ Sp(list(k′)) ∩ Sp(list(j)) = ∅}
k := i.next ;                             (consequence rule, lookup-G rule)
{next(i) = next(i) ∧ list(k) ∧ i ∉ Sp(list(k)) ∧ list(j)
    ∧ i ∉ Sp(list(j)) ∧ Sp(list(k)) ∩ Sp(list(j)) = ∅}
i.next := j ;                                   (mutation rule, frame rule)
{next(i) = j ∧ list(k) ∧ i ∉ Sp(list(k)) ∧ list(j)
    ∧ i ∉ Sp(list(j)) ∧ Sp(list(k)) ∩ Sp(list(j)) = ∅}    (consequence rule)
{list(k) ∧ next(i) = j ∧ i ∉ Sp(list(j)) ∧ list(j) ∧ Sp(list(k)) ∩ Sp(list(j)) = ∅}
                                 (consequence rule: folding list definition)
{list(k) ∧ list(i) ∧ Sp(list(k)) ∩ Sp(list(i)) = ∅}
j := i ; i := k                                        (assignment-G rule)
{list(i) ∧ list(j) ∧ Sp(list(i)) ∩ Sp(list(j)) = ∅}

Armed with this, proving j is a list after executing the full program above is
a trivial application of the assignment, while, and consequence rules, which we
omit for brevity.
Observe that in the above proof we were able to apply the frame rule because
i belongs neither to Sp(list(k)) nor to Sp(list(j)). Such disjointness obligations
can be discharged easily using reasoning about first-order formulae with least-
fixpoint definitions, techniques for which are discussed in Section 6.
Also note that the invariant of the loop is precisely the intended meaning of
list(i) ∗ list(j) in separation logic. In fact, as we will see in Section 6, we can
define a first-order macro Star as Star(ϕ, ψ) = ϕ ∧ ψ ∧ Sp(ϕ) ∩ Sp(ψ) = ∅. We
can use this macro to represent disjoint supports in similar proofs.
These proofs demonstrate what proofs of actual programs look like in our
program logic. They also show that frame logic and our program logic can prove
many results similarly to traditional separation logic. Moreover, by using the
derived operator Star, very little is sacrificed, even in terms of verbosity, in
gaining the flexibility of Frame Logic (see Section 6 for a broader discussion of
the ways in which Frame Logic differs from Separation Logic and, in certain
situations, offers advantages in stating and reasoning with specifications and
invariants).

5 Expressing a Precise Separation Logic


In this section, we show that FL is expressive by capturing a fragment of sep-
aration logic in frame logic; the fragment is a syntactic fragment of separation
logic that defines only precise formulas, i.e., formulas that can be satisfied in at

most one heaplet for any store. The translation also shows that frame logic can
naturally and compactly capture such separation logic formulas.

5.1 A Precise Separation Logic


As discussed in Section 1, a crucial difference between separation logic and
frame logic is that formulas in frame logic have uniquely determined sup-
ports/heaplets, while this is not true in separation logic. However, it is well
known that in verification, determined heaplets are very natural (most uses of
separation logic in fact are precise) and sometimes desirable. For instance, see [8],
where precision is used crucially to give a sound semantics to concurrent separa-
tion logic, and [29], where precise formulas are proposed for verifying modular
programs, as imprecision causes ambiguity in function contracts.
We define a fragment of separation logic that defines precise formulas (more
accurately, we handle a slightly larger class inductively: formulas that when
satisfiable have unique minimal heaplets for any given store). The fragment we
capture is similar to the notion of precise predicates seen in [29]:
Definition 3. PSL Fragment:
– sf: formulas over the stack only (nothing dereferenced). These include
  isatom?(·), m(x) = y for immutable m, true, background formulas, etc.
– x −f→ y
– ite(sf, ϕ1, ϕ2), where sf is from the first bullet
– ϕ1 ∧ ϕ2 and ϕ1 ∗ ϕ2
– I, where I contains all unary inductive definitions I that have unique heaplets
  inductively (list, tree, etc.). In particular, the body ρ_I of I is a formula in
  the PSL fragment (ρ_I[I ← ϕ] is in the PSL fragment provided ϕ is in the
  PSL fragment). Additionally, for all x, if s, h |= I(x) and s, h′ |= I(x), then
  h = h′. (While we only assume unary inductive definitions here, this
  generalizes easily to inductive definitions with multiple parameters.)
– ∃y. (x −f→ y) ∗ ϕ1
Note that in the fragment negation and disjunction are disallowed, but mu-
tually exclusive disjunction using ite is allowed. Existential quantification is only
present when the topmost operator is a ∗ and where one of the formulas guards
the quantified variable uniquely.
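
For example, assuming list ∈ I, the formulas (x −f→ y) ∗ list(y) and
ite(x = nil : true, ∃y. (x −f→ y) ∗ list(y)) belong to the fragment, whereas
(x −f→ y) ∨ list(x) (an unrestricted disjunction) and ¬list(x) (a negation)
do not.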
The semantics of this fragment follows the standard semantics of separation
logic [12, 27, 28, 38], with the heaplet of x −f→ y taken to be {x}. See Remark 1
in Section 3.2 for a discussion of a more accurate heaplet for x −f→ y, namely
the set containing the pair (x, f), and how this can be modeled in the above
semantics by using field-lookups using non-mutable pointers.
Theorem 4 (Minimum Heap). For any formula ϕ in the PSL fragment, if
there are s and h such that s, h |= ϕ, then there is an h_ϕ such that s, h_ϕ |= ϕ
and, for all h′ such that s, h′ |= ϕ, h_ϕ ⊆ h′.

5.2 Translation to Frame Logic


For a separation logic store and heap s, h (respectively), we define the corre-
sponding interpretation M_{s,h} such that variables are interpreted according to s
and values of pointer functions on dom(h) are interpreted according to h. For ϕ
in the PSL fragment, we first define a formula P (ϕ), inductively, that captures
whether ϕ is precise. ϕ is a precise formula iff, when it is satisfiable with a store
s, there is exactly one h such that s, h |= ϕ. The formula P (ϕ) is in separation
logic and will be used in the translation. To see why this formula is needed,
consider the formula ϕ1 ∧ ite(sf , ϕ2 , ϕ3 ). Assume that ϕ1 is imprecise, ϕ2 is pre-
cise, and ϕ3 is imprecise. Under conditions where sf is true, the heaplets for ϕ1
and ϕ2 must align. However, when sf is false, the heaplets for ϕ1 and ϕ3 can
be anything. Because we cannot initially know when sf will be true or false, we
need this separation logic formula P (ϕ) that is true exactly when ϕ is precise.
Definition 4. Precision predicate P:
– P(sf) = ⊥ and P(x −f→ y) = ⊤
– P(ite(sf, ϕ1, ϕ2)) = (sf ∧ P(ϕ1)) ∨ (¬sf ∧ P(ϕ2))
– P(ϕ1 ∧ ϕ2) = P(ϕ1) ∨ P(ϕ2)
– P(ϕ1 ∗ ϕ2) = P(ϕ1) ∧ P(ϕ2)
– P(I) = ⊤, where I ∈ I is an inductive predicate
– P(∃y. (x −f→ y) ∗ ϕ1) = P(ϕ1)
Note that this definition captures precision within our fragment, since stack
formulae are imprecise and pointer formulae are precise. The argument for the
rest of the cases follows by simple structural induction.
Now we define the translation T inductively:
Definition 5. Translation from PSL to Frame Logic:
– T(sf) = sf and T(x −f→ y) = (f(x) = y)
– T(ite(sf, ϕ1, ϕ2)) = ite(T(sf), T(ϕ1), T(ϕ2))
– T(ϕ1 ∧ ϕ2) = T(ϕ1) ∧ T(ϕ2) ∧ (T(P(ϕ1)) ⟹ Sp(T(ϕ2)) ⊆ Sp(T(ϕ1)))
                ∧ (T(P(ϕ2)) ⟹ Sp(T(ϕ1)) ⊆ Sp(T(ϕ2)))
– T(ϕ1 ∗ ϕ2) = T(ϕ1) ∧ T(ϕ2) ∧ Sp(T(ϕ1)) ∩ Sp(T(ϕ2)) = ∅
– T(I) = T(ρ_I), where ρ_I is the definition of the inductive predicate I as in
  Section 3.
– T(∃y. (x −f→ y) ∗ ϕ1) = ∃y : [f(x) = y]. [T(ϕ1) ∧ x ∉ Sp(T(ϕ1))]
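
As a simple illustration (ours), the separation logic formula (x −f→ y) ∗ list(y)
translates to

    T((x −f→ y) ∗ list(y)) = (f(x) = y) ∧ T(list(y)) ∧ Sp(f(x) = y) ∩ Sp(T(list(y))) = ∅

so the separating conjunction becomes a plain conjunction together with a
disjointness constraint on supports, and the support of the translated formula
is {x} ∪ Sp(T(list(y))).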
Finally, recall that any formula ϕ in the PSL fragment has a unique minimal
heap (Theorem 4). With this (and a few auxiliary lemmas that can be found in
the Technical Report [25]), we have the following theorem, which captures the
correctness of the translation:

Theorem 5. For any formula ϕ in the PSL fragment, we have the following
implications:
    s, h |= ϕ ⟹ M_{s,h} |= T(ϕ)
    M_{s,h} |= T(ϕ) ⟹ s, h′ |= ϕ, where h′ ≡ M_{s,h}(Sp(T(ϕ)))
Here, M_{s,h}(Sp(T(ϕ))) is the interpretation of Sp(T(ϕ)) in the model M_{s,h}. Note
that h′ is minimal and is equal to h_ϕ as in Theorem 4.

6 Discussion
Comparison with Separation Logic. The design of frame logic is, in many ways,
inspired by the design choices of separation logic. Separation logic formulas im-
plicitly hold on tight heaplets: models are defined on pairs (s, h), where s is
a store (an interpretation of variables) and h is a heaplet that defines a subset
of the heap as the domain for functions/pointers. In Frame Logic, we choose to
not define satisfiability with respect to heaplets but define it with respect to the
entire heap. However, we give access to the implicitly defined heaplet using the
operator Sp, and give a logic over sets to talk about supports. The separating
conjunction operation ∗ can then be expressed using normal conjunction and a
constraint that says that the support of formulae are disjoint.
We do not allow formulas to have multiple supports, which is crucial as Sp is
a function, and this roughly corresponds to precise fragments of separation logic.
Precise fragments of separation logic have already been proposed and accepted in
the separation logic literature for giving robust handling of modular functions,
concurrency, etc. [8, 29]. Section 5 details a translation of a precise fragment
of separation logic (with ∗ but not magic wand) to frame logic that shows the
natural connection between precise formulas in separation logic and frame logic.
Frame logic, through the support operator, facilitates local reasoning much
in the same way as separation logic does, and the frame rule in frame logic
supports frame reasoning in a similar way as separation logic. The key difference
between frame logic and separation logic is the adherence to a first-order logic
(with recursive definitions), both in terms of syntax and expressiveness.
First and foremost, in separation logic, the magic wand is needed to express
the weakest precondition [38]. Consider for example computing the weakest pre-
condition of the formula list(x) with respect to the code y.n := z. The weakest
precondition should essentially describe the (tight) heaplets such that changing
the n pointer from y to z results in x pointing to a list. In separation logic,
n
this is expressed typically (see [38]) using magic wand as (y −
→ z) −∗ (list(x)).
However, the magic wand operator is inherently a second-order property. The
formula α −∗ β holds on a heaplet h if for any disjoint heaplet that satisfies α,
β will hold on the conjoined heaplet. Expressing this property (for arbitrary α,
whose heaplet can be unbounded ) requires quantifying over unbounded heaplets
satisfying α, which is not first order expressible.
In frame logic, we instead rewrite the recursive definition list(·) to a new
one list′(·) that captures whether x points to a list, assuming that n(y) = z
(see Section 4.4). This property continues to be expressible in frame logic and
can be converted to first-order logic with recursive definitions (see Section 3.5).
Note that we are exploiting the fact that there is only a bounded amount of
change to the heap in straight-line programs in order to express this in FL.
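
Concretely, for the code y.n := z one would expect a definition along the
following lines (a sketch; Section 4.4's MW transformation makes this precise,
including the support-preserving guard):

    list′(x) := ite(x = nil, true, ∃w : w = ite(x = y : z, n(x)).
                list′(w) ∧ x ∉ Sp(list′(w)))

which reads the n pointer through the pending update: at y it returns z, and
elsewhere it returns n(·) unchanged.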
Let us turn to expressiveness and compactness. In separation logic, separa-
tion of structures is expressed using ∗, and in frame logic, such a separation
is expressed using conjunction and an additional constraint that says that the
supports of the two formulas are disjoint. A precise separation logic formula
of the form α1 ∗ α2 ∗ . . . αn is compact and would get translated to a much
A First-Order Logic with Frames 537

larger formula in frame logic as it would have to state that the supports of
each pair of formulas is disjoint. We believe this can be tamed using macros
(Star(α, β) = α ∧ β ∧ Sp(α) ∩ Sp(β) = ∅).
There are, however, several situations where frame logic leads to more com-
pact and natural formulations. For instance, consider expressing the property
that x and y point to lists, which may or may not overlap. In Frame Logic,
we simply write list(x) ∧ list(y). The support of this formula is the union of
the supports of the two lists. In separation logic, we cannot use ∗ to write
this compactly (while capturing the tightest heaplet). Note that the formula
(list(x) ∗ true) ∧ (list(y) ∗ true) is not equivalent, as it is true in heaplets that
are larger than the set of locations of the two lists. The simplest formulation we
know is to write a recursive definition lseg(u, v) for list segments from u to v and
use quantification: (∃z. lseg(x, z) ∗ lseg(y, z) ∗ list(z)) ∨ (list(x) ∗ list(y)) where
the definition of lseg is the following: lseg(u, v) ≡ (u = v ∧ emp) ∨ (∃w. u →
w ∗ lseg(w, v)).
If we wanted to say x1 , . . . , xn all point to lists, that may or may not overlap,
then in FL we can say list(x1) ∧ list(x2) ∧ · · · ∧ list(xn). However, in separation
logic, the simplest way seems to require lseg, a linear number of quantified
variables, and an exponentially-sized formula. Now consider the
property saying x1 , . . . , xn all point to binary trees, with pointers left and right,
and that can overlap arbitrarily. We can write it in FL as tree(x1 )∧. . .∧tree(xn ),
while a formula in (first-order) separation logic that expresses this property
seems very complex.
In summary, we believe that frame logic is a logic that supports frame rea-
soning built on the same principles as separation logic, but is still translatable
to first-order logic (avoiding the magic wand), and makes different choices for
syntax/semantics that lead to expressing certain properties more naturally and
compactly, and others more verbosely.

Reasoning with Frame Logic using First-Order Reasoning Mechanisms. An ad-
vantage of frame logic's translatability to a first-order logic with recursive
definitions is that one can reason with it using first-order theorem-proving
techniques. While we do not present tools for reasoning in this
paper, we note that there are several reasoning schemes that can readily handle
first-order logic with recursive definitions.
The theory of dynamic frames [18] has been proposed for frame reasoning for
heap manipulating programs and has been adopted in verification engines like
Dafny [21] that provide automated reasoning. A key aspect of dynamic frames
is the notion of regions, which are subsets of locations that can be used to
define subsets of the heap that change or do not change when a piece of code
is executed. Program logics such as region logic have been proposed for object-
oriented programs using such regions [1–3]. The supports of formulas in frame
logic are also used to express such regions, but the key difference is that the
definition of regions is given implicitly using supports of formulas, as opposed
to explicitly defining them. Separation logic also defines regions implicitly, and

in fact, the work on implicit dynamic frames [31, 39] provides translations from
separation logic to regions for reasoning using dynamic frames.
Reasoning with regions using set theory in a first-order logic with recursive
definitions has been explored by many works to support automated reasoning.
Tools like Vampire [20] for first-order logic have been extended in recent work to
handle algebraic datatypes [19]; many data-structures in practice can be modeled
as algebraic datatypes and the schemes proposed in [19] are powerful tools to
reason with them using first-order theorem provers.
A second class of tools are those proposed in the work on natural proofs [23,
32, 37]. Natural proofs explicitly work with first order logic with recursive defi-
nitions (FO-RD), implementing validity through a process of unfolding recursive
definitions, uninterpreted abstractions, and proving inductive lemmas using in-
duction schemes. Natural proofs are currently used primarily to reason with
separation logic by first translating verification conditions arising from Hoare
triples with separation logic specifications (without magic wand) to first-order
logic with recursive definitions. Frame logic reasoning can also be done in a very
similar way by translating it first to FO-RD.
The work in [23] considers natural proofs and quantifier instantiation heuris-
tics for FO-RD (using a similar setup of foreground sort for locations and back-
ground sorts), and the work identifies a fragment of FO-RD (called safe fragment)
for which this reasoning is complete (in the sense that a formula is detected as
unsatisfiable by quantifier instantiation iff it is unsatisfiable with the inductive
definitions interpreted as fixpoints and not least fixpoints). Since FL can be
translated to FO-RD, it is possible to deal with FL using the techniques of [23].
The conditions for the safe fragment of FO-RD are that the quantifiers over
the foreground elements are the outermost ones, and that terms of foreground
type do not contain variables of any background type. As argued in [23], these
restrictions are typically satisfied in heap logic reasoning applications.

7 Related Work

The frame problem [13] is an important problem in many different domains of


research. In the broadest form, it concerns representing and reasoning about
the effects of a local action without requiring explicit reasoning regarding static
changes to the global scope. For example, in artificial intelligence one wants a
logic that can seamlessly state that if a door is opened in a lit room, the lights
continue to stay switched on. This issue is present in the domain of verification
as well, specifically with heap-manipulating programs.
There are many solutions that have been proposed to this problem. The most
prominent proposal in the verification context is separation logic [12, 27, 28, 38],
which we discussed in detail in the previous section.
In contrast to separation logic, the work on Dynamic Frames [17, 18] and
similarly inspired approaches such as Region Logic [1–3] allow methods to ex-
plicitly specify the portion of the support that may be modified. This allows
fine-grained control over the modifiable section, and avoids special symbols like

∗ and −∗. However, explicitly writing out frame annotations can become verbose
and tedious.
The work on Implicit Dynamic Frames [22, 39, 40] bridges the worlds of
separation logic (without magic wand) and dynamic frames— it uses separation
logic and fractional permissions to implicitly define frames (reducing annotation
burden), allows annotations to access these frames, and translates them into set
regions for first-order reasoning. Our work is similar in that frame logic also
implicitly defines regions and gives annotations access to these regions, and can
be easily translated to pure FO-RD for first-order reasoning.
One distinction with separation logic involves the non-unique heaplets in
separation logic and the unique heaplets in frame logic. Determined heaplets
have been used [29, 32, 37] as they are more amenable to automated reasoning. In
particular a separation logic fragment with determined heaplets known as precise
predicates is defined in [29], which we capture using frame logic in Section 5.
There is also a rich literature on reasoning with these heap logics for program
verification. Decidability is an important dimension and there is a lot of work on
decidable logics for heaps with separation logic specifications [4–6, 11, 26, 33].
The work based on EPR (Effectively Propositional Reasoning) for specifying
heap properties [14–16] provides decidability, as does some of the work that
translates separation logic specifications into classical logic [34].
Finally, translating separation logic into classical logics and reasoning with
them is another solution pursued in many recent efforts [10, 23, 24, 32, 34–37,
41]. Other techniques, including recent work on cyclic proofs [9, 42], use
heuristics for reasoning about recursive definitions.

8 Conclusions
Our main contribution is to propose Frame Logic, a classical first-order logic
endowed with an explicit operator that recovers the implicit supports of formulas
and supports frame reasoning. We have argued its expressiveness by capturing
several properties of data-structures naturally and succinctly, and by showing that it
can express a precise fragment of separation logic. The program logic built using
frame logic supports local heap reasoning, frame reasoning, and weakest tightest
preconditions across loop-free programs.
We believe that frame logic is an attractive alternative to separation logic,
built using similar principles as separation logic while staying within the first-
order logic world. The first-order nature of the logic makes it potentially amenable
to easier automated reasoning.
The most compelling future work is a practical realization of a tool for verifying
programs in a standard programming language with frame logic annotations,
marrying frame logic with existing automated techniques and tools for
first-order logic (in particular [19, 24, 32, 37, 41]).
Acknowledgements: We thank ESOP’20 reviewers for their comments that
helped improve this paper. This work is based upon research supported by the
National Science Foundation under Grant NSF CCF 1527395.

Bibliography

[1] Banerjee, A., Naumann, D.: Local reasoning for global invariants, Part II:
Dynamic boundaries. Journal of the ACM (JACM) 60 (06 2013)
[2] Banerjee, A., Naumann, D.A., Rosenberg, S.: Regional logic for local rea-
soning about global invariants. In: Vitek, J. (ed.) ECOOP 2008 – Object-
Oriented Programming. pp. 387–411. Springer Berlin Heidelberg, Berlin,
Heidelberg (2008)
[3] Banerjee, A., Naumann, D.A., Rosenberg, S.: Local reasoning for global
invariants, Part I: Region logic. J. ACM 60(3), 18:1–18:56 (Jun 2013),
http://doi.acm.org/10.1145/2485982
[4] Berdine, J., Calcagno, C., O’Hearn, P.W.: A decidable fragment of separa-
tion logic. In: Proceedings of the 24th International Conference on Founda-
tions of Software Technology and Theoretical Computer Science. pp. 97–109.
FSTTCS’04 (2004)
[5] Berdine, J., Calcagno, C., O’Hearn, P.W.: Symbolic execution with separa-
tion logic. In: Proceedings of the Third Asian Conference on Programming
Languages and Systems. pp. 52–68. APLAS’05 (2005)
[6] Berdine, J., Calcagno, C., O’Hearn, P.W.: Smallfoot: Modular automatic
assertion checking with separation logic. In: Proceedings of the 4th In-
ternational Conference on Formal Methods for Components and Ob-
jects. pp. 115–137. FMCO’05, Springer-Verlag, Berlin, Heidelberg (2006).
https://doi.org/10.1007/11804192_6
[7] Brinck, K., Foo, N.Y.: Analysis of algorithms on threaded
trees. The Computer Journal 24(2), 148–155 (01 1981).
https://doi.org/10.1093/comjnl/24.2.148
[8] Brookes, S.: A semantics for concurrent separation logic.
Theor. Comput. Sci. 375(1-3), 227–270 (Apr 2007).
https://doi.org/10.1016/j.tcs.2006.12.034
[9] Brotherston, J., Distefano, D., Petersen, R.L.: Automated cyclic en-
tailment proofs in separation logic. In: Proceedings of the 23rd Inter-
national Conference on Automated Deduction. pp. 131–146. CADE’11,
Springer-Verlag, Berlin, Heidelberg (2011), http://dl.acm.org/citation.cfm?id=2032266.2032278
[10] Chin, W.N., David, C., Nguyen, H.H., Qin, S.: Automated verification of
shape, size and bag properties. In: 12th IEEE International Conference
on Engineering Complex Computer Systems (ICECCS 2007). pp. 307–320
(2007)
[11] Cook, B., Haase, C., Ouaknine, J., Parkinson, M., Worrell, J.: Tractable
reasoning in a fragment of separation logic. In: Proceedings of the 22nd In-
ternational Conference on Concurrency Theory. pp. 235–249. CONCUR’11
(2011)
[12] Demri, S., Deters, M.: Separation logics and modalities: a survey. Journal
of Applied Non-Classical Logics 25, 50–99 (2015)

[13] Hayes, P.J.: The frame problem and related problems in artifi-
cial intelligence. In: Webber, B.L., Nilsson, N.J. (eds.) Readings
in Artificial Intelligence, pp. 223 – 230. Morgan Kaufmann (1981).
https://doi.org/10.1016/B978-0-934613-03-3.50020-9
[14] Itzhaky, S., Banerjee, A., Immerman, N., Lahav, O., Nanevski, A., Sagiv,
M.: Modular reasoning about heap paths via effectively propositional for-
mulas. In: Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium
on Principles of Programming Languages. pp. 385–396. POPL ’14, ACM,
New York, NY, USA (2014). https://doi.org/10.1145/2535838.2535854
[15] Itzhaky, S., Banerjee, A., Immerman, N., Nanevski, A., Sagiv, M.:
Effectively-propositional reasoning about reachability in linked data struc-
tures. In: Proceedings of the 25th International Conference on Computer
Aided Verification. pp. 756–772. CAV’13, Springer-Verlag, Berlin, Heidel-
berg (2013). https://doi.org/10.1007/978-3-642-39799-8_53
[16] Itzhaky, S., Bjørner, N., Reps, T., Sagiv, M., Thakur, A.: Property-directed
shape analysis. In: Proceedings of the 16th International Conference on
Computer Aided Verification. pp. 35–51. CAV’14, Springer-Verlag, Berlin,
Heidelberg (2014). https://doi.org/10.1007/978-3-319-08867-9_3
[17] Kassios, I.T.: The dynamic frames theory. Form. Asp. Comput. 23(3), 267–
288 (May 2011). https://doi.org/10.1007/s00165-010-0152-5
[18] Kassios, I.T.: Dynamic frames: Support for framing, dependencies and shar-
ing without restrictions. In: Misra, J., Nipkow, T., Sekerinski, E. (eds.) FM
2006: Formal Methods. pp. 268–283. Springer-Verlag, Berlin, Heidelberg
(2006)
[19] Kovács, L., Robillard, S., Voronkov, A.: Coming to terms with quantified
reasoning. In: Proceedings of the 44th ACM SIGPLAN Symposium on Prin-
ciples of Programming Languages. pp. 260–270. POPL ’17, ACM, New York,
NY, USA (2017). https://doi.org/10.1145/3009837.3009887
[20] Kovács, L., Voronkov, A.: First-order theorem proving and Vampire. In:
CAV ’13. pp. 1–35 (2013). https://doi.org/10.1007/978-3-642-39799-8_1
[21] Leino, K.R.M.: Dafny: An automatic program verifier for func-
tional correctness. In: Proceedings of the 16th International Confer-
ence on Logic for Programming, Artificial Intelligence, and Reason-
ing. p. 348–370. LPAR’10, Springer-Verlag, Berlin, Heidelberg (2010).
https://doi.org/10.5555/1939141.1939161
[22] Leino, K.R.M., Müller, P.: A basis for verifying multi-threaded pro-
grams. In: Castagna, G. (ed.) Programming Languages and Systems.
pp. 378–393. Springer Berlin Heidelberg, Berlin, Heidelberg (2009).
https://doi.org/10.1007/978-3-642-00590-9_27
[23] Löding, C., Madhusudan, P., Peña, L.: Foundations for natural proofs
and quantifier instantiation. PACMPL 2(POPL), 10:1–10:30 (2018).
https://doi.org/10.1145/3158098
[24] Madhusudan, P., Qiu, X., Ştefănescu, A.: Recursive proofs for induc-
tive tree data-structures. In: Proceedings of the 39th Annual ACM
SIGPLAN-SIGACT Symposium on Principles of Programming Lan-

guages. pp. 123–136. POPL ’12, ACM, New York, NY, USA (2012).
https://doi.org/10.1145/2103656.2103673
[25] Murali, A., Peña, L., Löding, C., Madhusudan, P.: A first order logic with
frames. CoRR (2019), http://arxiv.org/abs/1901.09089
[26] Navarro Pérez, J.A., Rybalchenko, A.: Separation logic + superposition
calculus = heap theorem prover. In: Proceedings of the 32nd ACM SIG-
PLAN Conference on Programming Language Design and Implementation.
pp. 556–566. PLDI ’11, ACM, New York, NY, USA (2011)
[27] O’Hearn, P.W.: A primer on separation logic (and automatic program ver-
ification and analysis). In: Software Safety and Security (2012)
[28] O’Hearn, P.W., Reynolds, J.C., Yang, H.: Local reasoning about programs
that alter data structures. In: Proceedings of the 15th International Work-
shop on Computer Science Logic. pp. 1–19. CSL ’01, Springer-Verlag, Lon-
don, UK, UK (2001), http://dl.acm.org/citation.cfm?id=647851.737404
[29] O’Hearn, P.W., Yang, H., Reynolds, J.C.: Separation and information hid-
ing. In: Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on
Principles of Programming Languages. pp. 268–280. POPL ’04, ACM, New
York, NY, USA (2004). https://doi.org/10.1145/964001.964024
[30] Parkinson, M., Bierman, G.: Separation logic and abstraction. In: Proceed-
ings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of
Programming Languages. pp. 247–258. POPL ’05, ACM, New York, NY,
USA (2005). https://doi.org/10.1145/1040305.1040326
[31] Parkinson, M.J., Summers, A.J.: The relationship between separation logic
and implicit dynamic frames. In: Barthe, G. (ed.) Programming Languages
and Systems. pp. 439–458. Springer Berlin Heidelberg, Berlin, Heidelberg
(2011). https://doi.org/10.1007/978-3-642-19718-5_23
[32] Pek, E., Qiu, X., Madhusudan, P.: Natural proofs for data structure
manipulation in C using separation logic. In: Proceedings of the 35th
ACM SIGPLAN Conference on Programming Language Design and Im-
plementation. pp. 440–451. PLDI ’14, ACM, New York, NY, USA (2014).
https://doi.org/10.1145/2594291.2594325
[33] Pérez, J.A.N., Rybalchenko, A.: Separation logic modulo theories. In: Pro-
gramming Languages and Systems (APLAS). pp. 90–106. Springer Interna-
tional Publishing, Cham (2013)
[34] Piskac, R., Wies, T., Zufferey, D.: Automating separation logic using
SMT. In: Proceedings of the 25th International Conference on Computer
Aided Verification. pp. 773–789. CAV’13, Springer-Verlag, Berlin, Heidel-
berg (2013). https://doi.org/10.1007/978-3-642-39799-8_54
[35] Piskac, R., Wies, T., Zufferey, D.: Automating separation logic with trees
and data. In: Proceedings of the 16th International Conference on Computer
Aided Verification. pp. 711–728. CAV’14, Springer-Verlag, Berlin, Heidel-
berg (2014)
[36] Piskac, R., Wies, T., Zufferey, D.: Grasshopper. In: Ábrahám, E., Havelund,
K. (eds.) Tools and Algorithms for the Construction and Analysis of Sys-
tems. pp. 124–139. Springer Berlin Heidelberg, Berlin, Heidelberg (2014)

[37] Qiu, X., Garg, P., Ştefănescu, A., Madhusudan, P.: Natural proofs for
structure, data, and separation. In: Proceedings of the 34th ACM SIG-
PLAN Conference on Programming Language Design and Implemen-
tation. pp. 231–242. PLDI ’13, ACM, New York, NY, USA (2013).
https://doi.org/10.1145/2491956.2462169
[38] Reynolds, J.C.: Separation logic: A logic for shared mutable data structures.
In: Proceedings of the 17th Annual IEEE Symposium on Logic in Computer
Science. pp. 55–74. LICS ’02 (2002)
[39] Smans, J., Jacobs, B., Piessens, F.: Implicit dynamic frames: Combining dy-
namic frames and separation logic. In: Drossopoulou, S. (ed.) ECOOP 2009
– Object-Oriented Programming. pp. 148–172. Springer Berlin Heidelberg,
Berlin, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03013-0_8
[40] Smans, J., Jacobs, B., Piessens, F.: Implicit dynamic frames.
ACM Trans. Program. Lang. Syst. 34(1), 2:1–2:58 (May 2012).
https://doi.org/10.1145/2160910.2160911
[41] Suter, P., Dotta, M., Kunćak, V.: Decision procedures for algebraic
data types with abstractions. In: Proceedings of the 37th Annual ACM
SIGPLAN-SIGACT Symposium on Principles of Programming Lan-
guages. pp. 199–210. POPL ’10, ACM, New York, NY, USA (2010).
https://doi.org/10.1145/1706299.1706325
[42] Ta, Q.T., Le, T.C., Khoo, S.C., Chin, W.N.: Automated mutual explicit
induction proof in separation logic. In: Fitzgerald, J., Heitmeyer, C., Gnesi,
S., Philippou, A. (eds.) FM 2016: Formal Methods. pp. 659–676. Springer
International Publishing, Cham (2016). https://doi.org/10.1007/978-3-319-48989-6_40

Proving the safety of highly-available distributed
objects

Sreeja S Nair¹, Gustavo Petri², and Marc Shapiro¹

¹ Sorbonne Université, LIP6 & Inria, Paris, France
² ARM Research, Cambridge, UK

Abstract. To provide high availability in distributed systems, object


replicas allow concurrent updates. Although replicas eventually converge,
they may diverge temporarily, for instance when the network fails. This
makes it difficult for the developer to reason about the object’s prop-
erties, and in particular, to prove invariants over its state. For the sub-
class of state-based distributed systems, we propose a proof methodology
for establishing that a given object maintains a given invariant, taking
into account any concurrency control. Our approach allows reasoning
about individual operations separately. We demonstrate that our rules
are sound, and we illustrate their use with some representative examples.
We automate these rules using Boogie, an SMT-based tool.

Keywords: Replicated objects · Consistency · Automatic verification ·


Distributed application design · Tool support

1 Introduction
Many modern applications serve users accessing shared data in different ge-
ographical regions. Examples include social networks, multi-user games, co-
operative engineering, collaborative editors, source-control repositories, or dis-
tributed file systems. One approach would be to store the application’s data
(which we call object) in a single central location, accessed remotely. However,
users far from the central location would suffer long delays and outages.
Instead, the object is replicated to several locations. A user accesses the
closest available replica. To ensure availability, an update must not synchronise
across replicas; otherwise, when a network partition occurs, the system would
block. Thus, a replica executes both queries and updates locally, and propagates
its updates to other replicas asynchronously.
Updates at different locations are concurrent; this may cause replicas to
diverge, at least temporarily. However, if the system ensures Strong Eventual
Consistency (SEC), then replicas that have received the same set of updates
have the same state [25], simplifying the reasoning.
The replicated object may also require to maintain some (application-specific)
invariant, an assertion about the object. We say a state is safe if the invariant
is true in that state; the system is safe if every reachable state is safe. In a se-
quential system, this is straightforward (in principle): if the initial state is safe,


and the final state of every update individually is safe, then the system is safe.
However, these conditions are not sufficient in the replicated case, because con-
current updates at different replicas may interfere with one another. This can be
be fixed by synchronising between some or all types of updates. To maximise
availability and minimise latency, such synchronisation should be kept to a
minimum. In this paper, we propose a proof methodology to ensure that a given
object is system-safe, for a given invariant and a given amount of concurrency
control. In contrast to previous works, we consider state-based objects (as
opposed to operation-based; these terms are defined in Section 2). Indeed, the
specific properties of
state-based propagation enable simple modular reasoning despite concurrency,
thanks to the concept of concurrency invariant. Our proof methodology derives
the concurrency invariant automatically from the sequential specification. Now,
if the initial state is safe, and every update maintains both the application in-
variant and the concurrency invariant, then every reachable state is safe, even
in concurrent executions, regardless of network partitions. We have developed
a tool named Soteria, to automate our proof methodology. Soteria analyses the
specification to detect concurrency bugs and provides counterexamples.
The contributions of this paper are as follows:
– We propose a novel proof system specialised to proving the safety of avail-
able objects that converge by propagating state. This specialisation supports
modular reasoning, and thus it enables automation.
– We demonstrate that this proof system is sound. Moreover, we provide a sim-
ple semantics for state-propagating systems that allows us to ignore network
messages altogether.
– We present Soteria, to the best of our knowledge the first tool support-
ing the verification of program invariants for state-based replicated objects.
When Soteria succeeds it ensures that every execution, whether replicas are
partitioned or concurrent, is safe.
– We present a number of representative case studies, which we run through
Soteria.

2 Background

As a running example, consider a simple auction system (for simplicity, we con-


sider a single auction). An auction object is composed of the following parts:

– Its Status, that can move from initial state INVALID (under preparation) to
ACTIVE (can receive bids) and then to CLOSED (no more bids accepted).
– The Winner of the auction, that is initially ⊥ and can become the bid taking
the highest amount. In case of ties, the bid with the lowest id wins.
– The set of Bids placed, that is initially empty. A bid is a tuple composed of
• BidId: A unique identifier
• Placed: A boolean flag to indicate whether the bid has been placed or
not. Initially, it is FALSE. Once placed, a bid cannot be withdrawn.
• The monetary Amount of the bid; this cannot be modified once the bid
is created.

Fig. 1: Evolution of state of an auction object

Figure 1 illustrates how the auction state evolves over time. The state of the
object is geo-replicated at data centers in Adelaide, Brussels, and Calgary. Users
at different locations can start an auction, place bids, close the auction, declare
a winner, inspect the local replica, and observe if a winner is declared and who
it is. The updates are propagated asynchronously to other replicas. All replicas
will eventually agree on the same auction status, the same set of bids and the
same winner.
There are two basic approaches to propagating updates. The operation-based
approach applies an update to some origin replica, then transmits the operation
itself to be replayed at other replicas. If messages are delivered in causal order,
exactly once, and concurrent operations are commutative, then two replicas that
received the same updates reach the same state (this is the Strong Eventual
Consistency guarantee, or SEC) [25].
The state-based approach applies an update to some origin replica. Occasion-
ally, a replica sends its full state to some other replica, which merges the received
state into its own. If the state space forms a monotonic semi-lattice, an update
is an inflation (its output state is not lesser than the input state), and merge
computes the least-upper-bound of the local and received states, then SEC is
guaranteed [25]. As long as every update eventually reaches every replica, mes-
sages may be dropped, re-ordered or duplicated, and the set of replicas may be
unknown. Due to these relaxed requirements, state-based propagation is widely
used in industry. Figure 1 shows the state-based approach with local operations
and merges. Alternatives exist where only a delta of the state (that is, the
portion of the state not known to be part of the other replicas) is sent as a
message [1]; since this is an optimisation, it is of no consequence to the results
of this paper.
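
As a concrete illustration of these lattice conditions, here is a minimal Python
sketch (ours) of a state-based grow-only set: states are ordered by inclusion,
updates are inflations, and merge is the least upper bound (union), so
duplicated or re-ordered deliveries are harmless.

    class GSet:
        def __init__(self):
            self.elems = frozenset()        # local state

        def add(self, x):                   # update: an inflation (state only grows)
            self.elems = self.elems | {x}

        def merge(self, received):          # least upper bound of the two states
            self.elems = self.elems | received

    a, b = GSet(), GSet()
    a.add(1); b.add(2)
    a.merge(b.elems); a.merge(b.elems)      # duplicate delivery: union is idempotent
    b.merge(a.elems)
    assert a.elems == b.elems == frozenset({1, 2})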


Looking back to Figure 1, we can see that replicas diverge temporarily. This
temporary divergence can lead to an unsafe state, in this case declaring a wrong
winner. This correctness problem has been addressed before; however, previous
works mostly consider the operation-based propagation approach [11, 13, 19, 24].

3 System Model
In this section, we first introduce the object components, explain the underlying
system model informally, and then formalise the operational semantics.

3.1 General Principles


An object consists of a state, a set of operations, a merge function and an in-
variant. Figure 1 illustrates three replicas of an auction object, at three different
locations, represented by the horizontal lines. The object evolves through a set of
states. Each line depicts the evolution of the state of the corresponding replica;
time flows from left to right.

State. A distributed system consists of a number of servers, with disjoint memory


and processing capabilities. The servers might be distributed over geographical
regions. A set of servers at a single location stores the state of the object. This is
called a single replica. The object is replicated at different geographical locations,
each location having a full copy of the state. In the simplest case (for instance at
initialisation) the state at all replicas will be identical. The state of each replica
is called a local state. The global view, comprising all local states is called the
global state.

Operations. Each replica may perform the operations defined for the object.
To support availability, an operation modifies the local state at some arbitrary
replica, the origin replica for that operation, without synchronising with other
replicas (the cost of synchronisation being significant at scale). An operation
might consist of several changes; these are applied to the replica as a single
atomic unit.
Executing an operation on its origin replica has an immediate effect. However,
the state of the other replicas, called remote replicas, remains unaltered at this
point. The remote replicas get updated when the state is eventually propagated.
An immediate consequence of this execution model is that in the presence of
concurrent operations, replicas can reach different states, i.e. they diverge.
Let us illustrate this with our example in Figure 1. Initially, the auction
is yet to start, the winner is not declared and no bids are placed. By de-
fault, a replica can execute any operation (start auction, place bid, and
close auction) locally without synchronising with other replicas. We see that
the local states of replicas occasionally diverge. For example at the point where
operation close auction completes at the Adelaide replica, the Adelaide replica
is aware of only a $100 bid, the Brussels replica has two bids, and the Calgary
replica observes only one bid for $105.
State Propagation. A replica occasionally propagates its state to other replicas
in the system and a replica receiving a remote state merges it into its own.
In Figure 1, the arrows crossing between replicas represent the delivery of a
message containing the state of the source replica, to be merged into the target
replica. A message is labelled with the state propagated. For instance, the first
message delivery at the Brussels replica represents the result of updating the
local state (setting auction status to ACTIVE), with the state originating in the
replica at Adelaide (auction started).
Similar to the operations, a merge is atomic. In Figure 1, Alice closes the
auction at the Adelaide replica. This atomically sets the status of the auction
to CLOSED and declares a winner from the set of bids it is aware of. The up-
dated auction state and winner are transmitted together. Merging is performed
atomically by the Brussels replica.2
We now specify the merge operation for an auction. The receiving replica's local state is denoted σ = (status, winner, Bids), the received state is denoted σ′ = (status′, winner′, Bids′), and the result of merge is denoted σnew = (statusnew , winnernew , Bidsnew ).
merge((status, winner, Bids), (status′, winner′, Bids′)):
    statusnew := max(status, status′)
    winnernew := (winner = ⊥) ? winner′ : winner
    for (b in Bids ∪ Bids′):
        Bidsnew.b.placed := Bids.b.placed ∨ Bids′.b.placed
        Bidsnew.b.amount := max(Bids.b.amount, Bids′.b.amount)
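A direct Python transcription of this merge may help fix ideas. The concrete record layout (a dict of bids keyed by bid identifier, statuses ordered INVALID < ACTIVE < CLOSED, and None for ⊥) is our own illustrative choice:

# Illustrative transcription of the auction merge (not the paper's code).
INVALID, ACTIVE, CLOSED = 0, 1, 2   # ordered so max() merges statuses

def merge_auction(local, remote):
    status, winner, bids = local
    status2, winner2, bids2 = remote
    new_status = max(status, status2)
    # Keep the local winner if defined, otherwise adopt the remote one.
    new_winner = winner if winner is not None else winner2
    new_bids = {}
    absent = {"placed": False, "amount": 0}
    for b in set(bids) | set(bids2):
        l, r = bids.get(b, absent), bids2.get(b, absent)
        new_bids[b] = {"placed": l["placed"] or r["placed"],
                       "amount": max(l["amount"], r["amount"])}
    return (new_status, new_winner, new_bids)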

Furthermore, we require the operations and merge to be defined in a way that
ensures convergence. We discuss the relevant properties later in Section 6.1.

Invariants. An invariant is an assertion that must evaluate to true in every local
state of every replica. Although evaluated locally at each replica, the invariant
is in effect global, since it must be true at all replicas, and replicas eventually
converge. For our running example, the invariant can be stated as follows:

– Only an active auction can receive bids, and


– the highest unique bid wins when the auction closes (breaking ties using bid
identifiers).

This condition must hold true in all possible executions of the object.

3.2 Notations and Assumptions

First, we introduce some notations and assumptions:

– We assume a fixed set of replicas, ranged over with the meta-variable r ∈ R
sampled from the domain of unique replica names R.
– We denote a local state with the meta-variable σ ∈ Σ, ranging over the domain Σ of states of the object.
2 We see that this leads to an unsafe state; we discuss this in detail in Section 4.2.
– The local semantic function ⟦·⟧ takes an operation and a state, and returns the state after applying the operation. We write ⟦op⟧(σ) = σnew for executing operation op on state σ, resulting in a new state σnew .
– Ω denotes a partial function returning the current state of a replica. For
instance Ω(r) = σ means that in global state Ω, replica r is in local state
σ. We will use the notation Ω[r ← σ] to denote the global state resulting
from replacing the local state of replica r with σ. The local state of all other
replicas remains unchanged in the resulting global state.3
– A message propagating states between replicas is denoted ⟨r −σ→ r′⟩. This represents the fact that replica r has sent a message (possibly not yet received) to replica r′, with the state σ as its payload. The meta-variable M denotes the messages in transit in the network.
– In the following subsection, we will utilise a set of states to record the history of the execution. The set of past states will be ranged over by the variable S ∈ P(Σ).
– All replicas are assumed to start in the same initial state σi . Formally, for
each replica r ∈ dom(Ωi ) we have Ωi (r) = σi .

3.3 Operational Semantics

In this and the following subsections we will present two semantics for systems
propagating states. Importantly, while the first semantics takes into account
the effects of the network on the propagation of the states, and is hence an
accurate representation of the execution of systems with state propagation, we
will show in the next subsection that reasoning about the network is unnecessary
in this kind of system. We will demonstrate this claim by presenting a much
simpler semantics in which the network is abstracted away. The importance
of this reduction is that the number of events to be considered, both when
conducting proofs and when reasoning about applications, is greatly reduced.
As informal evidence of this claim, we point at the difference in complexity
between the semantic rules presented in Figure 2 and Figure 3. We postpone the
equivalence argument to Theorem 1.
Figure 2 presents the semantic rules describing what we shall call the precise
semantics (we will later present a more abstract version) defining the transition
relations describing how the state of the object evolves.
The figure defines a semantic judgement of the form (Ω, M) −→ (Ωnew , Mnew )
where (Ω, M) is a configuration where the replica states are given by Ω as shown
above, and M is a set of messages that have been transmitted by different replicas
and are pending to be received by their target replicas.
Rule Operation presents the state transition resulting from a replica r
executing an operation op. The operation queries the state of replica r, evaluates
the semantic function for operation op and updates its state with the result. The
3 This notation of a global state is used only to explain and prove our proof rule. In fact, the rule is based only on the local state of each replica.
Operation:
    Ω(r) = σ    ⟦op⟧(σ) = σnew    Ωnew = Ω[r ← σnew ]
    ─────────────────────────────────────────────
    (Ω, M) −→ (Ωnew , M)

Send:
    Ω(r) = σ    r′ ∈ dom(Ω) \ {r}    Mnew = M ∪ { ⟨r −σ→ r′⟩ }
    ─────────────────────────────────────────────
    (Ω, M) −→ (Ω, Mnew )

Merge:
    Ω(r) = σ    ⟨r′ −σ′→ r⟩ ∈ M    Mnew = M \ { ⟨r′ −σ′→ r⟩ }
    merge(σ, σ′) = σnew    Ωnew = Ω[r ← σnew ]
    ─────────────────────────────────────────────
    (Ω, M) −→ (Ωnew , Mnew )

Op & Broadcast:
    Ω(r) = σ    ⟦op⟧(σ) = σnew    Ωnew = Ω[r ← σnew ]
    Mnew = M ∪ { ⟨r −σnew→ r′⟩ | r′ ∈ dom(Ω) \ {r} }
    ─────────────────────────────────────────────
    (Ω, M) −→ (Ωnew , Mnew )

Merge & Broadcast:
    Ω(r) = σ    ⟨r′ −σ′→ r⟩ ∈ M    M′new = M \ { ⟨r′ −σ′→ r⟩ }
    merge(σ, σ′) = σnew    Ωnew = Ω[r ← σnew ]
    Mnew = M′new ∪ { ⟨r −σnew→ r′′⟩ | r′′ ∈ dom(Ω) \ {r} }
    ─────────────────────────────────────────────
    (Ω, M) −→ (Ωnew , Mnew )

Fig. 2: Precise Operational Semantics: Messages

set of messages M does not change. The second rule, Send, represents the non-deterministic sending of the state of replica r to replica r′. The rule has no other effect than to add a message to the set of pending messages M. The Merge rule picks any message ⟨r′ −σ′→ r⟩ in the set of pending messages M, and applies the merge function to the destination replica with the state in the payload of the message, removing ⟨r′ −σ′→ r⟩ from M.
The final two rules, Op & Broadcast and Merge & Broadcast, represent the specific case where the states are immediately sent to all other replicas. These rules
are not strictly necessary since they are subsumed by the application of either
Operation or Merge followed by one Send per replica. We will, however, use
them to simplify a simulation argument in what follows.
We remark at this point that no assumptions are made about the duplication
of messages or the order in which messages are delivered. This is in contrast to
other works on the verification of properties of replicated objects [11, 13]. The reason this assumption is not a problem in our case is that the least-upper-bound assumption on the merge function, together with the inflation assumptions on the states considered in Item 2 (Section 6.1), means that delayed messages have no effect when they are merged.
Operation:
    Ω(r) = σ    ⟦op⟧(σ) = σnew    Ωnew = Ω[r ← σnew ]
    ─────────────────────────────────────────────
    (Ω, S) −→ (Ωnew , S ∪ {σnew })

Merge:
    Ω(r) = σ    σ′ ∈ S    merge(σ, σ′) = σnew    Ωnew = Ω[r ← σnew ]
    ─────────────────────────────────────────────
    (Ω, S) −→ (Ωnew , S ∪ {σnew })

Fig. 3: Semantic Rules with a History of States

As is customary, we will denote by (Ω, M) −→∗ (Ωnew , Mnew ) the repeated application of the semantic rules zero or more times, from the state (Ω, M) resulting in the state (Ωnew , Mnew ).
It is easy to see how the example in Figure 1 proceeds according to these
rules for the auction.
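The precise semantics also prototypes naturally as a small non-deterministic interpreter. The sketch below is ours, purely for illustration: integer states, a single inflationary operation, and max as merge; it fires the Operation, Send and Merge rules of Figure 2 at random.

import random

def bump(s): return s + 1               # the only operation: an inflation
def merge(s1, s2): return max(s1, s2)   # least upper bound of two states

def step(omega, messages):
    # Non-deterministically fire one of the rules of Figure 2.
    rule = random.choice(["op", "send"] + (["merge"] if messages else []))
    if rule == "op":                                 # Operation
        r = random.choice(list(omega))
        omega[r] = bump(omega[r])
    elif rule == "send":                             # Send
        r = random.choice(list(omega))
        r2 = random.choice([x for x in omega if x != r])
        messages.append((r, omega[r], r2))           # message <r -sigma-> r2>
    else:                                            # Merge
        r, sigma, r2 = messages.pop(random.randrange(len(messages)))
        omega[r2] = merge(omega[r2], sigma)

omega, messages = {"Adelaide": 0, "Brussels": 0, "Calgary": 0}, []
for _ in range(200):
    step(omega, messages)
print(omega, len(messages), "messages still in transit")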
The following lemma,⁴ to be used later, establishes that whenever we use
only the broadcast rules, for any intermediate state in the execution, and for
any replica, when considering the final state of the trace, either the replica
has already observed a fresher version of the state in the execution, or there
is a message pending for it with that state. This is an obvious consequence of
broadcasting.

Lemma 1. If we consider a restriction to the semantics of Figure 2 where, instead of applying the Operation rule, we always apply the Op & Broadcast rule, and, instead of applying the Merge rule, we always apply Merge & Broadcast, we can conclude that, given an execution starting from an initial global state Ωi with

    (Ωi , ∅) −→∗ (Ω, M) −→∗ (Ωnew , Mnew )

for any two replicas r and r′ and a state σ such that Ω(r) = σ, either:

– Ωnew (r′) ≥ σ, or
– ⟨r −σ→ r′⟩ ∈ Mnew .

3.4 Operational Semantics with State History

We now turn our attention to a simpler semantics where we omit messages from
configurations, but instead, we record in a separate set all the states occurring
in any replica throughout the execution.
The semantics in Figure 3 presents a judgement of the form (Ω, S) −→ (Ωnew , Snew ) between configurations of the form (Ω, S) as before, but where the set of messages is replaced by a set of states denoted by the meta-variable S ∈ P(Σ).
4 The proofs for the lemmas are included in the extended version [23].
The rules are simple. Operation executes an operation as before, and adds the resulting new state to the set of observed states. The rule Merge non-deterministically selects a state from the set of recorded states and merges it into a non-deterministically chosen replica. The resulting state is also added to the set of observed states.

Lemma 2. Consider a state (Ω, S) reachable from an initial global state Ωi with the semantics of Figure 3. Formally: (Ωi , {σi }) −→∗ (Ω, S). We can conclude that the set of recorded states in the final configuration S includes all of the states present in any of the replicas:

    ⋃_{r ∈ dom(Ω)} {Ω(r)} ⊆ S

3.5 Correspondence between the semantics


In this section, we show that removing the messages from the semantics and recording states instead yields the same executions. To that end, we
will define the following relation between configurations of the two semantics
which will be later shown to be a bisimulation.

Definition 1 (Bisimulation Relation). We define the relation RΩi between


a configuration (Ω, M) of the semantics of Figure 2 and a configuration (Ω, S) of
the semantics of Figure 3 parameterized by an initial global state Ωi and denoted
by
(Ω, M) RΩi (Ω, S)
when the following conditions are met:

1. (Ωi , ∅) −→∗ (Ω, M), and
2. (Ωi , {σi }) −→∗ (Ω, S), and
3. { σ | ⟨r −σ→ r′⟩ ∈ M } ⊆ S

In other words, two configurations of the two semantics are related if both are reachable from the initial global state and all the states transmitted by the messages (M) are present in the history (S).
We can now show that this relation is indeed a bisimulation. We first show
that the semantics of Figure 3 simulates that of Figure 2. That is, all behaviours
produced by the precise semantics with messages can also be produced by the
semantics with history states. This is illustrated in the commutative diagrams of Figures 4a and 4b, where the dashed arrows represent existentially quantified components that are proven to exist in the theorem.

Lemma 3 (State-semantics simulates Messages-semantics). Consider a reachable state (Ω, M) from the initial state Ωi in the semantics of Figure 2. Consider moreover that according to that semantics there exists a transition of the form

    (Ω, M) −→ (Ωnew , Mnew )
(Commutative diagrams omitted: each square relates a step of one semantics to a step of the other via the relation RΩi .)

(a) Precise to History-preserving Simulation    (b) History-preserving to Precise Simulation

Fig. 4: Simulation Schema

and consider that there exists a state (Ω, S) of the history-preserving semantics of Figure 3 such that they are related by the simulation relation

    (Ω, M) RΩi (Ω, S)

We can conclude that, as illustrated in Figure 4a, there exists a state (Ωnew , Snew ) such that

    (Ω, S) −→ (Ωnew , Snew )   and   (Ωnew , Mnew ) RΩi (Ωnew , Snew )

We will now consider the lemma showing the inverse relation. To that end we
will consider a special case of the semantics of Figure 2 where instead of apply-
ing the Operation rule, we will always apply the Op & Broadcast rule, and
instead of the Merge rule, we will apply Merge & Broadcast. As we men-
tioned before, this is equivalent to the application of the Operation/Merge
rule, followed by a sequence of applications of Send. The reason we will do this
is that we are interested in showing that for any execution of the semantics in
Figure 3 there is an equivalent (simulated) execution of the semantics of Fig-
ure 2. Since all states can be merged in the semantics of Figure 3 we have to
assume that in the semantics of Figure 2 the states have been sent in messages. Fortunately, we can choose how to instantiate the existentially quantified Send messages to apply the rules as necessary, and that justifies this choice.

Lemma 4 (Messages-semantics simulates State-semantics). Consider a reachable state (Ω, S) from the initial state Ωi in the semantics of Figure 3. Consider moreover that according to that semantics there exists a transition of the form

    (Ω, S) −→ (Ωnew , Snew )

and consider that there exists a state (Ω, M) of the precise semantics of Figure 2 such that they are related by the simulation relation

    (Ω, M) RΩi (Ω, S)

We can conclude that there exists a state (Ωnew , Mnew ) such that

    (Ω, M) −→ (Ωnew , Mnew )   and   (Ωnew , Mnew ) RΩi (Ωnew , Snew )
As before, an illustration of this lemma is presented in Figure 4b.


We can now conclude that the two semantics are bisimilar:
Theorem 1 (Bisimulation). The semantics of Figure 2 and Figure 3 are
bisimilar as established by the relation defined in Definition 1.
The theorem above justifies carrying out our proofs with respect to the semantics of Figure 3, which has fewer rules and better aligns with our proof methodology. It also justifies that, when reasoning semantically about state-propagating object systems, we can generally ignore the effects of network delays and messages.
From the standpoint of concurrency, the system model allows the execution of
asynchronous concurrent operations, where each operation is executed atomically
in each replica, and the aggregation of results of different operations is performed
lazily as replicas exchange their state. At this point, we assume the set of states,
along with the operations and merge, forms a monotonic semi-lattice. This is a
sufficient condition for Strong Eventual Consistency [3, 4, 25].
We have seen that even though we eventually achieve convergence, there can be instances, or even long periods of time, during which replicas diverge. We need to ensure that concurrent executions are still safe. In the next section, we discuss how to ensure safety of distributed objects built on top of the system model we described.

4 Proving Invariants

In this section, we report our invariant verification strategy. Specifically, we con-
sider the problem of verifying invariants of highly-available distributed objects.
To support the verification of invariants, we will consider a syntax-driven approach based on program logic. Bailis et al. [2] identifies necessary and sufficient run-time conditions to establish the safety of application invariants for highly-available distributed databases, in a criterion dubbed I-confluence. Moreover, they consider the validity of a number of typical invariants and applications. Our work improves on the I-confluence criterion defined in [2] by providing a static, syntax-driven, and mostly-automatic mechanism to verify the correctness of an invariant for an application. We will address the specific differences in Section 7, the related work.
An important consequence of our verification strategy is that while we are
proving invariants about a concurrent highly-distributed system, our verification
conditions are modular (in the number of API operations), and can be carried
out using standard sequential Hoare-style reasoning. These verification condi-
tions in turn entail stability of the assertions as one would have in a logic like
Rely/Guarantee.
Let us start by assuming that a given initial state for the object is denoted
σi . Initially, all replicas have σi as their local state. As explained earlier, each
replica executes a sequence of state transitions, due either to a local update or
to a merge incorporating remote updates.
Let us call a replica state that satisfies the invariant a safe state. Assuming the current state is safe, any update (local or merge) must result in a safe state.
To ensure this, every update is equipped with a precondition that disallows any
unsafe execution.5 Thus, a local update executes only when, at the origin replica,
the current state is safe and its precondition currently holds.
Formally, an update u (an operation or a merge) mutates the local state σ to a new state σnew = u(σ). To preserve the invariant Inv, we require that the local state respect the precondition of the update, Preu : σ ∈ Preu =⇒ u(σ) ∈ Inv.
To illustrate local preconditions, consider an operation close auction(w: BidId), which sets the auction status to CLOSED and the winner to w (of type BidId). The developer may have written a precondition such as status = ACTIVE, because closing an auction doesn't make sense otherwise. In order to ensure the invariant that the winner has the highest amount, one needs to strengthen it with the clause is_highest(Bids, w), defined as

    ∀ b ∈ Bids, b.placed =⇒ b.amount ≤ w.amount
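In code, the strengthened operation might look as follows (a sketch with our own record layout, assuming the winning bid w is a key into Bids; the assertion plays the role of the strengthened precondition):

def is_highest(bids, w):
    # All placed bids have an amount no greater than the winner's (w in bids assumed).
    return all(not b["placed"] or b["amount"] <= bids[w]["amount"]
               for b in bids.values())

def close_auction(state, w):
    status, winner, bids = state
    # Pre_close_auction: the developer's clause plus the strengthening clause.
    assert status == "ACTIVE" and is_highest(bids, w)
    return ("CLOSED", w, bids)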

Similarly, merge also needs to be safe. To illustrate the merge precondition, let
us use our running example. We wish to maintain the invariant that the highest
bid is the winner. Assume a scenario where the local replica declared a winner
and closed the auction. An incoming state from a remote replica contains a bid
with a higher amount. When the two states are merged, we see that the resulting
state is unsafe. So we must strengthen the merge operation with a precondition.
The strengthened precondition looks like this:
status = CLOSED =⇒ ∀ B ∈ {Bids, Bids′}, is_highest(B, w)
∧ status′ = CLOSED =⇒ ∀ B ∈ {Bids, Bids′}, is_highest(B, w′)

This means that if the status is CLOSED in either of the two states, the winner
should be the highest bid in any state. This condition ensures that when a winner
is declared, it is the highest bid among the set of bids in any state at any replica.
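To see this precondition at work, the following sketch (our encoding of the scenario of Figure 1, not the paper's code) evaluates it on the diverged Brussels and Calgary states; it fails precisely because Brussels closed with Bob's $100 bid while Calgary holds a concurrent $105 bid:

def pre_merge(local, remote):
    status, w, bids = local
    status2, w2, bids2 = remote
    amounts = [b["amount"] for b in list(bids.values()) + list(bids2.values())
               if b["placed"]]
    top = max(amounts, default=0)
    def wins(bs, winner):        # winner must be the highest bid in either state
        return winner in bs and bs[winner]["amount"] >= top
    ok = True
    if status == "CLOSED":
        ok = ok and wins(bids, w)
    if status2 == "CLOSED":
        ok = ok and wins(bids2, w2)
    return ok

brussels = ("CLOSED", "bob", {"bob": {"placed": True, "amount": 100}})
calgary = ("ACTIVE", None, {"charles": {"placed": True, "amount": 105}})
print(pre_merge(brussels, calgary))   # False: merging here would be unsafe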
Since merge can happen at any time, its precondition must always hold, i.e., it constitutes an additional invariant. We call this the concurrency invariant. Now our global invariant consists of two parts: first, the invariant (Inv), and second, the concurrency invariant (Invconc ).

4.1 Invariance Conditions


The verification conditions in Figure 5 ensure that for any reachable local state
of a replica, the global invariant Inv ∧ Invconc , is a valid assertion. We assume
the invariant to be a Hoare-logic style assertion over the state of the object.
In a nutshell, these conditions check (i) that the precondition of each of the operations, and that of the merge operation, upholds the global invariant, and (ii) that the global invariant of the object consists of the invariant and the concurrency invariant (the precondition of merge).
We will develop this intuition in what follows. Let us now consider each of
the rules:
5 Technically, this is at least the weakest precondition of the update for safety. It strengthens any a priori precondition that the developer may have set.
σi ⊨ Inv    (1)

∀ op, σ, σnew :
    (σ ⊨ Preop ∧ σ ⊨ Inv ∧ ⟦op⟧(σ) = σnew ) ⇒ σnew ⊨ Inv    (2)

∀ σ, σ′, σnew :
    ((σ, σ′) ⊨ Premerge ∧ σ ⊨ Inv ∧ σ′ ⊨ Inv ∧ merge(σ, σ′) = σnew ) ⇒ σnew ⊨ Inv    (3)

(σi , σi ) ⊨ Invconc    (4)

∀ op, σ, σ′, σnew :
    (σ ⊨ Preop ∧ (σ, σ′) ⊨ Invconc ∧ ⟦op⟧(σ) = σnew ) ⇒ (σnew , σ′) ⊨ Invconc    (5)

∀ σ, σ′, σnew :
    ((σ, σ′) ⊨ Premerge ∧ (σ, σ′) ⊨ Invconc ∧ merge(σ, σ′) = σnew ) ⇒ (σnew , σ′) ⊨ Invconc    (6)

Fig. 5: Invariant Conditions

– Clearly, the initial state of the object must satisfy the global invariant; this is checked by conditions (1) and (4).

The rest of the rules perform a kind of inductive reasoning. Assuming that we start in a state that satisfies the global invariant, we need to check that any state update preserves the validity of said invariant. Importantly, this reasoning is not circular, since the initial state is known by the rule above to be safe.6
– Condition (2) checks that each of the operations, when executed starting
in a state satisfying its precondition and the invariant, is safe. Notice that
we require that the precondition of the operation be satisfied in the start-
ing state. This is the core of the inductive argument alluded to above, all
operations – which as we mentioned in Section 3 execute atomically w.r.t.
concurrency – preserve the invariant Inv.
Other than the execution of operations, the other source of local state changes
is the execution of the merge function in a replica. It is not true in general that
for any two given states of an object, the merge should compute a safe state.
In particular, it could be the case that the merge function needs a precondition
that is stronger than the conjunction of the invariants in the two states to be
merged. The following rules deal with these cases.
– We require the merge function to be annotated with a precondition strong
enough to guarantee that merge will result in a safe state. Generally, this
6 Indeed, the soundness proofs of program logics such as Rely/Guarantee are typically inductive arguments of this nature.
precondition can be obtained by calculating the weakest precondition [9] of merge w.r.t. the desired invariant. Since merge is the only operation that takes two states as input, its precondition is a predicate over two states. We can then verify that merging two states is safe. This is the purpose of rule (3).
As per the program model of Section 3, any two replicas can exchange their states
at any given point of time and trigger the execution of a merge operation. Thus,
it must be the case that the precondition of the merge function is enabled at all
times between any two replica local states. Since merge is the only point where
a local replica can observe the result of concurrent operations in other replicas,
we call this a concurrency invariant (Invconc ). In other words: the concurrency
invariant is part of the global invariant of the object. This is the main insight
that allows us to reduce the proof of the distributed object to checking that both
the invariant Inv and the concurrency invariant Invconv are global invariants. In
particular, the latter implies the former, but for exposition purposes we shall
preserve the invariant Inv in the rules.
– Just as we did with the operations above, we now need to check that when-
ever we have a pair of states that satisfy the concurrency invariant, if one
of these states changes, the resulting pair still satisfies the concurrency in-
variant. This is exactly the purpose of rule (5) in the case where the state
change originates from an operation execution in one of the replicas of the
pair. This rule is similar to rule (2) above, where the invariant Inv has been
replaced by Invconc , and consequently we have a pair of states.
– Finally, as we did with rule (3), we need to check the case where one of the
states of a pair of states satisfying Invconc is updated because of yet another
merge happening (w.r.t. yet another replica) in one of these states. This is
the purpose of rule (6), which is similar to rule (3), with Inv replaced by
Invconc .
As anticipated at the beginning of this section, reasoning about concurrency is performed in a completely local manner, by carefully choosing the verification conditions, and it avoids the stability blow-up commonly found in
concurrent program logics. The program model, and the verification conditions
allow us to effectively reduce the problem of verifying safety of an asynchronous
concurrent distributed system, to the modular verification of the global invariant
(Inv ∧ Invconc ) as pre and post conditions of all operations and merge.
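For finite toy objects, the conditions of Figure 5 can even be discharged by exhaustive enumeration. The sketch below is ours: a miniature two-replica agreement object whose merge happens to need no extra precondition, so Invconc is trivially true.

from itertools import product

# Tiny object: state = (vote1, vote2, flag); merge is pointwise disjunction.
States = list(product([False, True], repeat=3))

def inv(s):                    # Inv: flag implies both votes
    return (not s[2]) or (s[0] and s[1])

def merge(s, t):
    return tuple(x or y for x, y in zip(s, t))

def inv_conc(s, t):            # Pre_merge: no extra precondition needed here
    return True

ops = {
    "mark1": (lambda s: True, lambda s: (True, s[1], s[2])),
    "mark2": (lambda s: True, lambda s: (s[0], True, s[2])),
    "agree": (lambda s: s[0] and s[1], lambda s: (s[0], s[1], True)),
}

for name, (pre, op) in ops.items():
    for s, t in product(States, States):
        if pre(s) and inv(s):
            assert inv(op(s)), name               # condition (2)
        if pre(s) and inv_conc(s, t):
            assert inv_conc(op(s), t), name       # condition (5)

for s, t in product(States, States):
    if inv_conc(s, t) and inv(s) and inv(t):
        assert inv(merge(s, t))                   # condition (3)
        assert inv_conc(merge(s, t), t)           # condition (6)
print("conditions (2), (3), (5) and (6) hold for the toy object")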
Proposition 1 (Soundness). The proof rules in equations (1)-(6) guarantee
that the implementation is safe.
To conduct an inductive proof of this proposition we need to strengthen the argument to include the set of observed states as given by the semantics of Figure 3.
Lemma 5 (Strengthening of Soundness). Assume that equations (1)-(6) hold for an implementation of a replicated object with initial state Ωi . For any state (Ω, S) reachable from (Ωi , {σi }), that is (Ωi , {σi }) −→∗ (Ω, S), we have that:
1. for all states σ, σ′ ∈ S, (σ, σ′) ⊨ Invconc , and
2. for any state σ ∈ S, σ ⊨ Inv.

Corollary 1. The soundness proposition (Proposition 1) is a direct consequence of Lemma 5.

We remark at this point that there are numerous program logic approaches
to proving invariants of shared-memory concurrent programs, with Rely/Guar-
antee [15] and concurrent separation logic [6] underlying many of them. While
these approaches could be adapted to our use case (propagating-state distributed
systems), this adaptation is not evident. As an indication of this complexity: one
would have to predicate about the different states of the different replicas, re-
state the invariant to talk about these different versions of the state, encode the
non-deterministic behaviour of merge, etc. Instead, we argue that our specialised
rules are much simpler, allowing for a purely sequential and modular verification
that we can mechanise and automate. This reduction in complexity is the main
theoretical contribution of this paper.

4.2 Applying the proof rule

Let us apply the proof methodology to the auction object. Its invariant is the
following conjunction:

1. Only an ACTIVE auction can receive bids, and


2. the highest bid, also unique, wins when the auction is CLOSED.

Computing the weakest precondition of each update operation for this invariant is straightforward. For instance, as discussed earlier, close auction(w) gets the precondition is_highest(Bids, w), because of invariant Item 2 above.
Despite local updates to each replica respecting the invariant Inv, Figure 1 showed that it is susceptible to being violated by merging. This is the case if Bob's $100 bid in Brussels wins, even though Charles concurrently placed a $105 bid in Calgary; this occurred because status became CLOSED in Brussels while still ACTIVE in Calgary. The weakest precondition of merge for safety expresses that, if status in either state is CLOSED, the winner should be the bid with the highest amount in both states. This merge precondition, now called the concurrency invariant, strengthens the global invariant to be safe in concurrent executions.
Let us now consider how this strengthening impacts the local update opera-
tions. Since starting the auction doesn't modify any bids, that operation trivially preserves the concurrency invariant. Placing a bid might violate Invconc if the auction is concurrently
closed in some other replica; conversely, closing the auction could also violate
Invconc , if a higher bid is concurrently placed in a remote replica. Thus, the auc-
tion object is safe when executed sequentially, but it is unsafe when updates are
concurrent. This indicates the specification has a bug, which we now proceed to
fix.
4.3 Concurrency Control for Invariant Preservation


As we discussed earlier, the preconditions of operations and merge are strengthened in order to be sequentially safe. An object must also preserve the concurrency invariant in order to ensure concurrent safety. Violating it indicates the presence of a concurrency bug in the specification. In that case, the operations that fail to preserve the concurrency invariant might need to synchronise. In our model, the developer adds the required concurrency control mechanism as part of the state; the modified state is composed of the original state together with the concurrency control mechanism.
Recall that in the auction example, placing bids and closing the auction did
not preserve the precondition of merge. This requires strengthening the specifi-
cation by adding a concurrency control mechanism to restrict these operations.
We could force them to be strictly sequential, thereby avoiding any concurrency at all, but this would affect the availability of the object.
A concurrency control mechanism is better designed with the workload characteristics in mind. For this particular use case, we know that placing bids is a much more frequent operation than closing an auction. Hence we formulate a concurrency control scheme akin to a readers-writer lock. In order to realise this, we distribute a token to each replica. As long as a replica has its token, it can allow placing bids. Closing the auction requires recalling the tokens from all replicas.
This ensures that there are no concurrent bids placed and thus a winner can
be declared, respecting the invariant. The addition of this concurrency control
also updates the Invconc . Clearly, all operations must respect this modification
for the specification to be considered safe.
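The token part of the state can be sketched as follows (our illustration). Releasing a token is itself an inflation (a token, once released, is never re-acquired), so merge is the pointwise conjunction of the held flags:

class Tokens:
    def __init__(self, replicas):
        self.held = {r: True for r in replicas}   # each replica starts with a token

    def release(self, r):
        self.held[r] = False       # monotonic: held only ever shrinks

    def merge(self, other):
        for r in self.held:
            self.held[r] = self.held[r] and other.held[r]

def may_place_bid(tokens, me):      # a local token suffices to accept bids
    return tokens.held[me]

def may_close(tokens):              # closing needs all tokens recalled
    return not any(tokens.held.values())

t = Tokens(["Adelaide", "Brussels", "Calgary"])
assert may_place_bid(t, "Brussels") and not may_close(t)
for r in ("Adelaide", "Brussels", "Calgary"):
    t.release(r)
assert may_close(t)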
Note that the token model described here restricts availability in order to
ensure safety. Adding efficient synchronization is not a problem to be solved with the application specification alone; it rather requires knowledge of the application dynamics, such as the workload characteristics, and is part of our future work.
Figure 6 shows the evolution of the modified auction object with concurrency control. The keys shown are the tokens distributed to each replica. When a replica wants to close the auction, it can request the tokens from the other replicas. When a replica releases its token, this is indicated by a cross mark on the key. This concurrency control mechanism makes sure that the object is safe during concurrent executions as well. The specification including the concurrency control is given in the extended version [23].

Fig. 6: Evolution of state in an auction object with concurrency control
To summarize, all updates (operations and merge) have to respect the global
invariant (Inv ∧ Invconc ). If an update violates Inv, the developer must strengthen
its precondition. If an update violates Invconc , the developer must add concur-
rency control mechanisms.

5 Case Studies
This section presents three representative examples of the different consistency requirements of distributed applications. The consensus object is an example of a coordination-free design, illustrating a safe object with just eventual
consistency. The next example, a distributed lock, shows a design that maintains a total order, illustrating strong consistency. The final example, a courseware application, shows a mix of concurrent operations and operations with restrained concurrency. This example, similar to our auction example, illustrates applications that might require coordination for some operations to ensure safety.
For each case study, we give an overview of the operational semantics infor-
mally. We then discuss how the design preserves the safety conditions discussed
in Section 4. We also provide pseudocode for better comprehension.

5.1 Consensus application

Consensus is required in distributed systems when all replicas have to agree
upon a single value. We consider the specification of a consensus object with a
fixed number of replicas. We assume that replica failures are solved locally by
redundancy or other means, and all replicas participate.
The state consists of a boolean flag indicating the result of consensus, and
a boolean array indicating the votes from replicas. Each replica agrees on a
proposal by setting its dedicated entry in the boolean array. A replica cannot
withdraw its agreement. A replica sets the consensus flag when it sees all entries
of the boolean array set.
The consistency between the values of the agree flag and the boolean array is
ensured by the invariant. The merge function is the disjunction of the individual
components. In this case study, we can see that the merge ensures safety without
any additional precondition. This means that the object is trivially safe under
concurrent executions.
Initial state:
    ¬B ∧ ¬flag

Comparison function:
    flag ∨ (¬flag′ ∧ (B ∨ ¬B′))

Invariant:
    flag =⇒ B

{ Premerge : True }   # no precondition
merge(B, flag, B′, flag′):
    B := B ∨ B′
    flag := flag ∨ flag′

{ Premark : True }   # no precondition
mark():
    B.me := true

{ Preagree : B }
agree():
    flag := true

Fig. 7: Pseudocode for consensus

Initial state:
    ∃ r, V.r ∧ t = 0

Comparison function:
    t > t′ ∨ (t = t′ ∧ V = V′)

Invariant:
    ∃ r, V.r ∧ ∀ r, r′, (V.r ∧ V.r′) =⇒ r = r′

{ Pretransfer : V.me }
transfer(r′):
    t := t + 1
    V.me := false
    V.r′ := true

{ Premerge : (t = t′ =⇒ V = V′) ∧ (V.me =⇒ t ≥ t′) }
merge((t, V), (t′, V′)):
    t := max(t, t′)
    V := (t′ < t) ? V : V′

Fig. 8: Specification of a distributed lock

The pseudocode of the consensus example is shown in Figure 7. The design for consensus can be relaxed, requiring only a majority of replicas to mark their boxes. The extension for that is trivial.
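Figure 7 translates almost line for line into executable form; the following Python rendering is ours (replica identity is passed explicitly, and the precondition of agree is enforced with an assertion):

class Consensus:
    def __init__(self, replicas, me):
        self.me = me
        self.votes = {r: False for r in replicas}   # the boolean array B
        self.flag = False

    def mark(self):                 # Pre_mark: True
        self.votes[self.me] = True

    def agree(self):                # Pre_agree: B (all votes are in)
        assert all(self.votes.values())
        self.flag = True

    def merge(self, other):         # Pre_merge: True; pointwise disjunction
        for r in self.votes:
            self.votes[r] = self.votes[r] or other.votes[r]
        self.flag = self.flag or other.flag

    def invariant(self):            # flag implies all votes
        return (not self.flag) or all(self.votes.values())

a = Consensus(["r1", "r2"], "r1")
b = Consensus(["r1", "r2"], "r2")
a.mark(); b.mark(); a.merge(b)
a.agree()
assert a.invariant()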

5.2 A replicated concurrency control

We now discuss an object, a distributed lock, that ensures mutual exclusion. We
use an array of boolean values, one entry per replica, to model a lock. If a replica
owns the lock, the corresponding array entry is set to true. The lock is transferred
to any other replica by using the transfer function. The full specification is shown
in Figure 8.
We need to ensure that the lock is owned by exactly one replica at any given
point in time, which is the invariant here. For simplicity, we are not considering
failures. In order to preserve safety, we need to enforce a precondition on the
transfer operation such that the operation can only transfer the ownership of
its origin replica. For state inflation, a timestamp associated with the lock is
incremented during each transfer.
A merge of two states of this distributed lock preserves the state with the highest timestamp. In order for the merge function to be the least upper bound, we must require that if the timestamps of the two states are equal, their corresponding boolean arrays are also equal; also, if the origin replica owns the lock, it has the highest timestamp. The conjunction of these two restrictions, which forms the precondition of merge, Premerge , is the concurrency invariant, Invconc .
Consider the case of three replicas r1 , r2 and r3 sharing a distributed lock. Assume that initially replica r1 owns the lock. Replicas r2 and r3 concurrently place a request for the lock. The current owner, r1 , has to decide on the priority of the requests based on the business logic. r1 calculates a higher priority for r3 and transfers the lock to r3 . Since r1 no longer has the lock, it cannot issue any further transfer operations. We see clearly that the transfer operation is safe. In the new state, r3 is the only replica that can perform a transfer operation. We can also note that this prevents any concurrent transfer operations, which guarantees mutual exclusion and hence ensures safety in a concurrent execution environment.
An interesting property we can observe from this example is total order. Due to the preconditions imposed in order to be safe, the states progress through a total order, ordered by timestamp: the transfer function increases the timestamp and the merge function preserves the highest timestamp.
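The specification of Figure 8 also translates directly; the sketch below is ours (the Premerge clause guaranteeing that equal timestamps imply equal states is what makes keeping the local state on ties sound):

class DLock:
    def __init__(self, replicas, me, owner):
        self.me = me
        self.t = 0                                      # lock timestamp
        self.owns = {r: (r == owner) for r in replicas}

    def transfer(self, dest):    # Pre_transfer: V.me (only the owner transfers)
        assert self.owns[self.me]
        self.t += 1
        self.owns[self.me] = False
        self.owns[dest] = True

    def merge(self, other_t, other_owns):
        # Pre_merge makes ties harmless; otherwise keep the freshest state.
        if other_t > self.t:
            self.t, self.owns = other_t, dict(other_owns)

    def invariant(self):         # exactly one owner at any time
        return sum(self.owns.values()) == 1

r1 = DLock(["r1", "r2", "r3"], "r1", owner="r1")
r1.transfer("r3")
assert r1.invariant() and not r1.owns["r1"] and r1.owns["r3"]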

5.3 Courseware
We now look at an application that allows students to register and enroll in a
course. For space reasons, we elide the pseudocode, which can be found in the extended version [23]. The state consists of a set of students, a set of courses and
enrollments of students for different courses. Students can register and deregister,
courses can be created and deleted, and a student can enroll for a course. The
invariant requires enrolled students and courses to be registered and created
respectively.
The sets of students and courses each consist of two sets: one to track registrations or creations, and another to track deregistrations or deletions. Registration or creation monotonically adds the student or course to the corresponding registered set, and deregistration or deletion monotonically adds them to the unregistered set.
The semantics currently doesn’t support re-registration, but that can be fixed
by using a slightly modified data structure that counts the number of times the
student has been registered/unregistered and decides on the status of registra-
tion. Enrollment adds the student-course pair to the set. Currently, we do not
consider canceling an enrollment, but it is a trivial extension. Merging two states
takes the union of the sets.
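The two-set encoding of students (and, symmetrically, of courses) can be sketched as follows (our illustration); both component sets only grow, so merging by pointwise union is a least upper bound:

class TwoSet:
    def __init__(self):
        self.added = set()       # registrations / creations
        self.removed = set()     # deregistrations / deletions

    def register(self, x):
        self.added.add(x)

    def deregister(self, x):
        self.removed.add(x)      # permanent: re-registration is unsupported

    def is_registered(self, x):
        return x in self.added and x not in self.removed

    def merge(self, other):
        self.added |= other.added
        self.removed |= other.removed

s = TwoSet(); s.register("alice")
t = TwoSet(); t.register("alice"); t.deregister("alice")
s.merge(t)
assert not s.is_registered("alice")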
Let us consider the safety of each operation. The operations to register a
student and create a course are safe without any restrictions. Therefore they do
not need any precondition. The remaining three operations might violate the
invariant in some cases. This leads to strengthening their preconditions. The
precondition of the operation for deregistering a student and deleting a course
requires no existing enrollments for them. For enrollment, both the student and
the course should be registered/created and not unregistered/deleted.
Merge also requires strengthening of its precondition: it requires the set of enrolled students and courses to be registered, and not unregistered, in all the remote states as well. This is the concurrency invariant (Invconc ) for this object.
Running this specification through our tool, which we describe in Section 6, reveals concurrency issues for deregistering a student, deleting a course, and enrollment. This means that we need to add concurrency control to the state.
For this use case, we know that enrolling will be more frequent than deregistering a student or deleting a course. So, we model a concurrency control mechanism as in the case of the auction object discussed earlier. We assign a token to each replica for each student and course, called a student token and a course token respectively. A replica holds a set of student tokens indicating the registered students, and course tokens indicating the created courses. In order to deregister a student or delete a course, all replicas must have released their tokens for that particular student or course. Enroll operations can progress as long as the local replica holds the student token and the course token for that particular enrollment.
This concurrency control mechanism now forms part of the state. The precon-
ditions of operations and merge are recomputed and the concurrency invariant
is updated. The edited specification passes all checks and is deemed safe.

6 Automation
In this section, we present a tool to automate the verification of invariants as
discussed in the previous sections. Our tool, called Soteria, is based on the Boogie verification framework [5]. The input to Soteria is a specification of the object, written as Boogie procedures augmented with a number of domain-specific annotations needed to check the properties described in Section 4.
Let us now consider how a distributed object is specified in Soteria:
– State: We require the programmer to provide a declaration of the state
using the global variables in Boogie. The data types can be either built-in
or user defined.
– Comparison function: Next we require the programmer to provide a com-
parison function. This function determines the partial order on states. Again,
we shall use this comparison function as a basis to check the lattice condi-
tions, and whether each operation is an inflation on the lattice. We use the
keyword @gteq to annotate the comparison function in the tool. This com-
parison function returns true when all the components of the first state are
greater than or equal to the corresponding components in the other state. It
is encoded as a function in Boogie.
– Operations: We require the programmer to provide the implementation of
the operations of the object. Moreover, for each operation op we require the
programmer to provide the precondition Preop . In general, operations are
encoded as Boogie procedures. Alternatively, we could require only a post-condition describing how the operation takes the state from the precondition to the post-condition. Notice that since in our program model operations are atomic, this is an unambiguous encoding of the operations.
A few things are important in this encoding. The specification declares the operations that can modify the contents of the global variables in a modifies clause. Preconditions are annotated with requires clauses, and the postcondition is specified by ensures clauses. The semantics of multiple requires and ensures clauses is conjunction.
– Merge function: We require the special merge operation to be distinguished from other operations. To that end, we use the annotation @merge. As mentioned before, the precondition of merge can be obtained by calculating the weakest precondition that ensures safety; however, the current version of Soteria does not perform this step automatically, relying instead on the developer to provide the precondition. Notice that, as we argued in Section 4.1, Soteria will consider this as the concurrency invariant (Invconc ).
While in Section 3 we mentioned that the merge procedure takes two states
as arguments, in the specification input to Soteria, the procedure merge takes
only one state as the argument. This is because this procedure assumes that
the merge is being applied in a replica, and therefore, the local state of the
replica is captured by the global variables.
– Invariant: Clearly, we require the programmer to provide the invariant to
be verified by the tool. This invariant is simply provided as a Boogie asser-
tion over the state of the object. Once more, we require the invariant to be
annotated with the special keyword @invariant.
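Taken together, these components form a small, fixed interface. The following Python dataclass (our analogy for exposition; Soteria's actual input is a Boogie file with the annotations above) summarises what a specification must supply:

from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

State = Any

@dataclass
class ObjectSpec:
    initial: State                                        # initial state (sigma_i)
    gteq: Callable[[State, State], bool]                  # @gteq comparison function
    operations: Dict[str, Tuple[Callable[[State], bool],  # Pre_op for each op
                                Callable[[State], State]]]
    pre_merge: Callable[[State, State], bool]             # concurrency invariant Inv_conc
    merge: Callable[[State, State], State]                # @merge
    invariant: Callable[[State], bool]                    # @invariant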
While these are the components required by Soteria to check safety, Boogie often requires additional information to verify the procedures. Some of these components are:
– User-defined data types,
– Constants to declare special objects such as the origin replica me, or to
bound the quantifiers,
– We sometimes make recourse to inductively-defined functions over aggregate
data structures, for instance, to obtain the maximum in a set of values. Since
we would like to use these functions in the specifications, we axiomatise their
semantics to enable the SMT solver used by Boogie to discharge our proof
obligations. This is particularly important for list comprehensions, and array
operations. We follow the approach of Leino et al. [18].
– When we iterate over lists, arrays or matrices, we need to provide Boogie
with loop invariants. Loops are part of the programs, and thus, verified by
Boogie.

6.1 Verification passes


The verification of a specification is performed in multiple stages. Let us consider
these in order:
1. Syntax checks
The first, simple checks validate that the specification provided respects Boogie syntax when ignoring the Soteria annotations. The tool also calls Boogie to validate that the types are correct and that the pre/post conditions provided are sound.
Then it checks that the specification provides all the elements necessary for a
complete specification. Specifically, it checks the function signatures marked
by @gteq and @invariant and the procedure marked by @merge.
2. Convergence check
This stage checks the convergence of the specification. Specifically, it checks
whether the specification respects Strong Eventual Consistency. The Strong
Eventual Consistency (SEC) property states that any two replicas that re-
ceived the same set of updates are in the same state. To guarantee this,
objects are designed to have certain sufficient properties in the encoding of
the state [3, 4, 25], which can be summarised as follows:
– The state space is equipped with an ordering operator, comparing two
states.
– The ordering forms a join-semilattice.
– Each individual operation is an inflation in the semilattice.
– The merge operation, composing states from two replicas, computes the
least-upper-bound of the given states in the semilattice.
We present the conditions formally in the extended version[23].
An alternative is to make use of the CALM theorem [12]. This allows non-monotonic operations, but requires them to coordinate. However, our aim is to provide the maximum possible availability with SEC.7
To ensure these conditions of Strong Eventual Consistency, the tool performs
the following checks:
– That each operation is an inflation. In a nutshell, we prove using Boogie
the following Hoare-logic triple:
assume σ ∈ Preop
call σnew := op(σ)
assert σnew ≥ σ

– Merge computes the least upper bound. The verification condition dis-
charged is shown below:
assume (σ, σ′) ∈ Premerge
call σnew := merge(σ, σ′)
assert σnew ≥ σ ∧ σnew ≥ σ′
assert ∀ σ∗, (σ∗ ≥ σ ∧ σ∗ ≥ σ′) =⇒ σ∗ ≥ σnew

3. Safety check
This stage verifies the safety of the specification as discussed
in Section 4. This stage is divided further into two sub-stages:
– Sequential safety: Soteria checks whether each individual operation is
safe. This corresponds to the conditions (2) and (3) in Figure 5. The
verification condition discharged by the tool to ensure sequential safety
of operations is:
7 Convergence of our running example is discussed in the extended version [23].
assume σ ∈ Preop ∧ σ ∈ Inv
call σnew := op(σ)
assert σnew ∈ Inv

The special case of the merge function is verified with the following
verification condition:
assume (σ, σ′) ∈ Premerge ∧ σ ∈ Inv ∧ σ′ ∈ Inv
call σnew := merge(σ, σ′)
assert σnew ∈ Inv

Notice that in this condition we assume that there are two copies of the state: the state of the replica applying the merge, and the primed state representing a state arriving from another replica. In case of failure of the sequential safety check, the designer needs to strengthen the precondition of the operation (or merge) which was unsafe.
– Concurrent safety: Here we check whether each operation upholds the
precondition of merge. This corresponds to the conditions (5) and (6) in
Figure 5. Notice that while this check relates to the concurrent behaviour
of the distributed object, the check itself is completely sequential; it does
not require reasoning about operations performed by other processes. As
shown in Section 4, this ensures safety during concurrent operation.
The verification conditions are:
assume σ ∈ Preop ∧ σ ∈ Inv ∧ (σ, σ′) ∈ Invconc
call σnew := op(σ)
assert (σnew , σ′) ∈ Invconc

to validate each operation op, and

assume (σ, σ′) ∈ Invconc ∧ σ ∈ Inv ∧ σ′ ∈ Inv
call σnew := merge(σ, σ′)
assert (σnew , σ′) ∈ Invconc

to validate a call to merge. If the concurrent safety check fails, the design
of the distributed object needs a replicated concurrency control mecha-
nism embedded as part of the state.
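Each of these checks bottoms out in an SMT query. As a toy illustration (ours, assuming the z3-solver Python package rather than Boogie), the sequential safety of the consensus operation agree() from Section 5.1, i.e. condition (2), can be discharged as follows: the query asks for a counterexample, and unsat means the condition holds.

from z3 import And, Bools, Implies, Not, Solver, unsat

b1, b2, f = Bools("b1 b2 f")          # pre-state: two votes and the flag
b1n, b2n, fn = Bools("b1n b2n fn")    # post-state

inv = lambda x, y, z: Implies(z, And(x, y))   # Inv: flag implies both votes
pre_agree = And(b1, b2)                       # Pre_agree : B

s = Solver()
s.add(pre_agree, inv(b1, b2, f))              # assume Pre_op and Inv
s.add(b1n == b1, b2n == b2, fn == True)       # effect of agree()
s.add(Not(inv(b1n, b2n, fn)))                 # search for a violation of Inv
print("condition (2) holds" if s.check() == unsat else "unsafe")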

When all checks are validated, the tool reports that the specification is safe. Whenever a check fails, Soteria provides a counterexample8 along with a failure message tailored to the type of check. This can help the developer identify issues with the specification and fix them.
Once the invariants and specification of an application are given, Soteria is fully automatic, thanks to Z3, a fully automated SMT solver. The specification of the application includes the state and all the operations, including merge, with their pre and post conditions. In case the invariant cannot be proven, Soteria provides counterexamples. The programmer can leverage these to update the specification with appropriate concurrency control, rerun Soteria, and so on until the application is correct. As far as the proof system is concerned, no programmer involvement is required. Currently, the effort of adding the required synchronization conditions is manual, but as the next step, we are working on
8 Soteria uses the counter model provided by Boogie.
automating the efficient generation of synchronization control, considering the workload characteristics. The tool and the full specifications, in the form of the tool input, are available at Soteria [22].9

7 Related Work

Several works have concentrated on the formalisation and specification of eventually consistent systems, [7, 8, 27] to mention but a few.
A number of works concentrate on the specification and correct implementa-
tion of replicated data types [10, 14]. Unlike these works, we are not concerned
with the correctness of the data type implementation with respect to a specifi-
cation, but rather with proving properties that hold of a distributed object.
Gotsman et al. [11] present a proof methodology for proving invariants of distributed objects. In fact, that work has been extended with a tool called CISE [24] which, similar to Soteria, performs the check using an SMT solver as a backend. Another, more user-friendly tool, named the Correct Eventual Consistency (CEC) tool, was developed by Marcelino et al. [19] based on the principles of CISE. That tool is based on the Boogie verification framework and also proposes sets of tokens that the developer might use. An improved token generation using the counterexamples generated by Boogie is discussed by Nair and Shapiro [20].
Unlike our work, CISE and CEC (and more generally the work of Gots-
man et al.[11]) consider the implementation of operation-based objects. As a
consequence, they assume that the underlying network model ensures causal
consistency, and the proof methodology therein presented requires reasoning
about concurrent behaviours (reflected as stability verification conditions on as-
sertions). We position Soteria as a complementary tool to CISE, since CISE is
not well-adapted to reason about systems that propagate state, and Soteria is
not well-adapted to reason about objects that propagate operations. We con-
sider, as part of our future work, the use of both CISE and Soteria in tandem
to prove properties depending on the implementation of the objects at hand.
Houshmand et al. [13] extends CISE by lowering the causal consistency re-
quirements and generating concurrency control protocols. It still requires rea-
soning about concurrent behaviours.
As anticipated in Section 4, Bailis et al. [2] introduced the concept of I-
confluence based on a similar system model. I-confluence states that for an
invariant to hold in a lattice-based state-propagating distributed application,
the set of reachable valid (i.e. invariant preserving) states must be closed under
operations and merge. This condition is similar to the ones presented in Figure 5.
However, there is a fundamental difference: while Bailis et al. [2] recognises that
one needs to consider only reachable states when checking that the merge opera-
tion satisfies the invariant, they do not provide means to identify these reachable
states. This is indeed a hard problem. In Soteria, we instead over-approximate
the set of reachable states by ignoring whether the states are indeed reachable,
9 Experimental results with verification times are provided in the extended version [23].
but requiring that their merge satisfies the invariant. This is captured in the
concurrency invariant, Invconc , which is synthesised from the user-provided in-
variant. How to obtain this invariant is understandably not addressed in Bailis
et al.[2] since no proof technique is provided. Notice that this is a sound approxi-
mation since it guarantees the invariant is satisfied, and we also verify that every
operation preserves this condition, as shown in Corollary 1. In this sense we say that the precondition of merge for a given invariant I is also an invariant of the system. It is this abstraction step that makes the analysis performed by Soteria syntax-driven, automated, and machine-checked. The fact that Soteria is
an analysis of a program is in contrast with I-confluence [2] where no means
to link a given program text to the semantical model, let alone rules to show
that the syntax implies invariant preservation, are provided. In other words, I-
confluence [2] does not provide a program logic, but rather a meta-theoretical
proof about lattice-based state-propagating systems.
Our previous work [21] provides an informal proof methodology for ensuring safety of Convergent Replicated Data Types (CvRDTs), which are a group of specialised data structures used to ensure convergence in distributed programming. This work builds upon it, formalises the proof rules, and proves them sound. We relax the requirement of CvRDTs by allowing the usage of any data types that together respect the lattice conditions mentioned in Section 3. We also show several case studies which demonstrate the use of the rule.
A final interesting remark is that we can show how our methodology can
aid in the verification of distributed objects mediated by concurrency control.
Some works [16, 17, 26, 27] have considered this problem from the standpoint of
synthesis, or from the point of view of which mechanisms can be used to check
a certain property of the system.

8 Conclusion
We have presented a sound proof rule to verify invariants of state-based distributed objects, i.e., objects that propagate state. We presented the proof obligations guaranteeing that the implementation is safe in concurrent executions, by reducing the problem to checking that each operation of the object satisfies a precondition of the merge function of the state.
We presented Soteria, a tool sitting on top of the Boogie verification frame-
work. This tool can be used to identify the concurrency bugs in the design of
a distributed object. Soteria also checks convergence, by verifying the lattice conditions on the state described by [3]. We have shown multiple compelling case studies showing how Soteria can be leveraged to ensure the correctness of
distributed objects that propagate state. It would be an interesting next step
to look into automatic concurrency control synthesis. The synthesised concur-
rency control can be analysed and adapted dynamically to minimise the cost of
synchronisation.
Acknowledgements. This research is supported in part by the RainbowFS project (Agence Na-
tionale de la Recherche, France, number ANR-16-CE25-0013-01) and by European H2020 project
732 505 LightKone (2017–2020).
Bibliography

[1] Almeida, P.S., Shoker, A., Baquero, C.: Delta state replicated data types.
J. Parallel Distrib. Comput. 111, 162–173 (2018), https://doi.org/10.1016/
j.jpdc.2017.08.003
[2] Bailis, P., Fekete, A., Franklin, M.J., Ghodsi, A., Hellerstein, J.M., Sto-
ica, I.: Coordination avoidance in database systems. Proc. VLDB Endow.
8(3), 185–196 (Nov 2014), http://dx.doi.org/10.14778/2735508.2735509,
Int. Conf. on Very Large Data Bases (VLDB) 2015, Waikoloa, Hawai'i, USA
[3] Baquero, C., Almeida, P.S., Cunha, A., Ferreira, C.: Composition in state-
based replicated data types. Bulletin of the EATCS 123 (2017), http://
eatcs.org/beatcs/index.php/beatcs/article/view/507
[4] Baquero, C., Moura, F.: Using structural characteristics for autonomous
operation. Operating Systems Review 33(4), 90–96 (1999), https://doi.org/
10.1145/334598.334614
[5] Barnett, M., Chang, B.Y.E., DeLine, R., Jacobs, B., Leino, K.R.M.: Boogie:
A modular reusable verifier for object-oriented programs. In: Proceedings
of the 4th International Conference on Formal Methods for Components
and Objects. pp. 364–387. FMCO’05, Springer-Verlag, Berlin, Heidelberg
(2006), http://dx.doi.org/10.1007/11804192_17
[6] Brookes, S., O’Hearn, P.W.: Concurrent separation logic. SIGLOG News
3(3), 47–65 (2016), https://dl.acm.org/citation.cfm?id=2984457
[7] Burckhardt, S.: Principles of eventual consistency. Foundations and Trends
in Programming Languages 1(1-2), 1–150 (2014), https://doi.org/10.1561/
2500000011
[8] Burckhardt, S., Gotsman, A., Yang, H., Zawirski, M.: Replicated data types:
Specification, verification, optimality. In: Symp. on Principles of Prog. Lang.
(POPL). pp. 271–284. San Diego, CA, USA (Jan 2014), http://doi.acm.org/
10.1145/2535838.2535848
[9] Dijkstra, E.: A discipline of programming. Prentice-Hall series in automatic
computation, Prentice-Hall (1976)
[10] Gomes, V.B.F., Kleppmann, M., Mulligan, D.P., Beresford, A.R.: A frame-
work for establishing strong eventual consistency for conflict-free replicated
datatypes. Archive of Formal Proofs 2017 (2017), https://www.isa-afp.org/
entries/CRDT.shtml
[11] Gotsman, A., Yang, H., Ferreira, C., Najafzadeh, M., Shapiro, M.: ’Cause
I’m Strong Enough: Reasoning about consistency choices in distributed sys-
tems. In: Symp. on Principles of Prog. Lang. (POPL). pp. 371–384. St. Pe-
tersburg, FL, USA (2016), http://dx.doi.org/10.1145/2837614.2837625
[12] Hellerstein, J.M., Alvaro, P.: Keeping CALM: when distributed consistency
is easy. CoRR abs/1901.01930 (2019), http://arxiv.org/abs/1901.01930
[13] Houshmand, F., Lesani, M.: Hamsaz: Replication coordination analysis and
synthesis. Proc. ACM Program. Lang. 3(POPL), 74:1–74:32 (Jan 2019),
http://doi.acm.org/10.1145/3290387

[14] Jagadeesan, R., Riely, J.: Eventual consistency for CRDTs. In: Ahmed, A.
(ed.) Programming Languages and Systems - 27th European Symposium
on Programming, ESOP 2018, Held as Part of the European Joint Con-
ferences on Theory and Practice of Software, ETAPS 2018, Thessaloniki,
Greece, April 14-20, 2018, Proceedings. Lecture Notes in Computer Sci-
ence, vol. 10801, pp. 968–995. Springer (2018), https://doi.org/10.1007/978-3-319-89884-1_34
[15] Jones, C.B.: Specification and design of (parallel) programs. In: Mason, R.
(ed.) Information Processing 83. IFIP Congress Series, vol. 9, pp. 321–332.
IFIP, North-Holland/IFIP, Paris, France (Sep 1983)
[16] Kaki, G., Earanky, K., Sivaramakrishnan, K., Jagannathan, S.: Safe repli-
cation through bounded concurrency verification. Proc. ACM Program.
Lang. 2(OOPSLA), 164:1–164:27 (Oct 2018), http://doi.acm.org/10.1145/
3276534
[17] Kaki, G., Nagar, K., Najafzadeh, M., Jagannathan, S.: Alone together:
Compositional reasoning and inference for weak isolation. In: Symp. on
Principles of Prog. Lang. (POPL). Proc. ACM Program. Lang., vol. 2, pp.
27:1–27:34. Assoc. for Computing Machinery, Los Angeles, CA, USA (Dec 2017),
http://doi.acm.org/10.1145/3158115
[18] Leino, K.R.M., Monahan, R.: Reasoning about comprehensions with first-
order SMT solvers. In: Proceedings of the 2009 ACM Symposium on Applied
Computing. pp. 615–622. SAC ’09, ACM, New York, NY, USA (2009),
http://doi.acm.org/10.1145/1529282.1529411
[19] Marcelino, G., Balegas, V., Ferreira, C.: Bringing hybrid consistency closer
to programmers. In: W. on Principles and Practice of Consistency for
Distr. Data (PaPoC). pp. 6:1–6:4. PaPoC ’17, Euro. Conf. on Comp. Sys.
(EuroSys), ACM, Belgrade, Serbia (2017), http://doi.acm.org/10.1145/
3064889.3064896
[20] Nair, S., Shapiro, M.: Improving the “Correct Eventual Consistency” tool.
Rapport de recherche RR-9191, Institut National de la Recherche en Infor-
matique et Automatique (Inria), Paris, France (Jul 2018), https://hal.inria.
fr/hal-01832888
[21] Nair, S.S., Petri, G., Shapiro, M.: Invariant safety for distributed applications.
In: W. on Principles and Practice of Consistency for Distr. Data (PaPoC).
pp. 4:1–4:7. Assoc. for Computing Machinery, Dresden, Germany (Mar 2019),
https://doi.org/10.1145/3301419.3323970
[22] Nair, S.S., Petri, G., Shapiro, M.: Soteria. https://github.com/sreeja/soteria_tool (2019)
[23] Nair, S.S., Petri, G., Shapiro, M.: Proving the safety of highly-available
distributed objects (Extended version). Tech. rep. (Feb 2020), https://hal.
archives-ouvertes.fr/hal-02492599
[24] Najafzadeh, M., Gotsman, A., Yang, H., Ferreira, C., Shapiro, M.: The
CISE tool: Proving weakly-consistent applications correct. In: W. on Principles
and Practice of Consistency for Distr. Data (PaPoC). EuroSys 2016 workshops,
ACM Special Interest Group on Op. Sys. (SIGOPS), Assoc. for Computing
Machinery, London, UK (Apr 2016), http://dx.doi.org/10.1145/2911151.2911160
[25] Shapiro, M., Preguiça, N., Baquero, C., Zawirski, M.: Conflict-free repli-
cated data types. In: Défago, X., Petit, F., Villain, V. (eds.) Int. Symp.
on Stabilization, Safety, and Security of Dist. Sys. (SSS). Lecture Notes in
Comp. Sc., vol. 6976, pp. 386–400. Springer-Verlag, Grenoble, France (Oct
2011)
[26] Shapiro, M., Saeida Ardekani, M., Petri, G.: Consistency in 3D. In: Deshar-
nais, J., Jagadeesan, R. (eds.) Int. Conf. on Concurrency Theory (CON-
CUR). Leibniz Int. Proc. in Informatics (LIPICS), vol. 59, pp. 3:1–3:14.
Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing,
Germany, Québec, Québec, Canada (Aug 2016), http://dx.doi.org/10.4230/
LIPIcs.CONCUR.2016.3
[27] Sivaramakrishnan, K., Kaki, G., Jagannathan, S.: Declarative programming
over eventually consistent data stores. In: ACM SIGPLAN Conf. on Prog. Lang.
Design and Implementation (PLDI). pp. 413–424. PLDI '15, Assoc. for Computing
Machinery, Portland, OR, USA (2015), http://doi.acm.org/10.1145/2737924.2737981

Solving Program Sketches with Large Integer Values

Rong Pan¹, Qinheping Hu², Rishabh Singh³, and Loris D'Antoni²

¹ The University of Texas at Austin, Austin, USA
² University of Wisconsin-Madison, Madison, USA
³ Google, Mountain View, USA

Abstract. Program sketching is a program synthesis paradigm in which
the programmer provides a partial program with holes and assertions.
The goal of the synthesizer is to automatically find integer values for
the holes so that the resulting program satisfies the assertions. The most
popular sketching tool, Sketch, can efficiently solve complex program
sketches, but uses an integer encoding that often performs poorly if the
sketched program manipulates large integer values. In this paper, we
propose a new solving technique that allows Sketch to handle large in-
teger values while retaining its integer encoding. Our technique uses a
result from number theory, the Chinese Remainder Theorem, to rewrite
program sketches to only track the remainders of certain variable values
with respect to several prime numbers. We prove that our transformation
is sound and that the encodings of the resulting programs are exponentially
more succinct than existing Sketch encodings. We evaluate our tech-
nique on a variety of benchmarks manipulating large integer values. Our
technique provides significant speedups over existing Sketch solvers and
can solve benchmarks that existing Sketch solvers cannot handle.

1 Introduction

Program synthesis, the art of automatically generating programs that meet a


user’s intent, promises to increase the productivity of programmers by automat-
ing tedious, error-prone, and time-consuming tasks. Syntax-guided Synthesis
(SyGuS) [2], where the search space of possible programs is defined using a gram-
mar or a domain-specific language, has emerged as a common program synthesis
paradigm for many synthesis domains. One of the earliest and most successful syntax-
guided program synthesis frameworks is program sketching [19], where (i ) the
search space of the synthesis problem is described using a partial program in
which certain integer constants are left unspecified (represented as holes), and
(ii ) the specification is provided as a set of assertions describing the intended be-
havior of the program. The goal of the synthesizer is to automatically replace the
holes in the program with integer values so that the resulting complete program
satisfies all the assertions. Thanks to its simplicity, program sketching has found
wide adoption in applications such as data-structure design [20], personalized
education [18], program repair [7], and many others.


The most popular sketching tool, Sketch [21], can efficiently solve complex
program sketches with hundreds of lines of code. However, Sketch often per-
forms poorly if the sketched program manipulates large integer values. Sketch’s
synthesis is based on an algorithm called counterexample-guided inductive syn-
thesis (Cegis) [21]. The Cegis algorithm iteratively considers a finite set I of
inputs for the program and performs SAT queries to identify values for the holes
so that the resulting program satisfies all the assertions for the inputs in I.
Further SAT queries are then used to verify whether the generated solution is
correct on all the possible inputs of the program. Sketch represents integers
using a unary encoding (a variable for each integer value) so that arithmetic
computations such as addition and multiplication can be represented efficiently
in the SAT formulas as lookup operations. This unary encoding, however, results
in huge formulas when solving sketches with large integer values, as we also observe
in our evaluation. Recently, an SMT-like technique that extends the SAT solver
with native integer variables and integer constraints was proposed to alleviate
this issue in Sketch. It guesses values for the integer variables, propagates
them through the integer constraints, and learns from conflict clauses. However,
this technique does not scale well when the sketches contain complex arithmetic
operations—e.g., non-linear integer arithmetic.
In this paper, we propose a program transformation technique that allows
Sketch to solve program sketches involving large integer values while retain-
ing the unary encoding used by the traditional Sketch solver. Our technique
rewrites a Sketch program into an equivalent one that performs computations
over smaller values. The technique is based on the well-known Chinese Remain-
der Theorem, which states that, given distinct prime numbers p1 , . . . , pn such
that N = p1 · . . . · pn , for every two distinct numbers 0 ≤ k1 , k2 < N , there
exists a pi such that k1 mod pi = k2 mod pi . Intuitively, this theorem states that
tracking the modular values of a number smaller than N for each pi is enough to
uniquely recover the actual value of the number itself. We use this idea to replace
a variable x in the program with n variables xp1 , . . . , xpn , so that for every i,
xpi = x mod pi . Using closure properties of modular arithmetic we show that,
as long as the program uses the operators +, −, ∗, ==, tracking the modular
values of variables and performing the corresponding operations on such values
is enough to ensure correctness. For example, to reflect the variable assignment
x = y + z, we perform the assignment xpi = (ypi + zpi ) mod pi , for every pi . Sim-
ilarly, the Boolean operation x == y will only hold if xpi = ypi , for every pi . To
identify what variables and values in the program can be rewritten, we develop
a data-flow analysis that computes what variables may flow into operations that
are not sound in modular arithmetic—e.g., <, >, ≤, and /.
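To make the rewriting idea concrete, the following Python sketch (ours, not part of the Sketch toolchain) models a variable as its tuple of residues and mirrors +, −, ∗, and == in modular arithmetic; the class name ModInt is illustrative only:

    # Illustrative model of the rewriting idea: a value is tracked only
    # through its remainders modulo a fixed set of distinct primes.
    PRIMES = (2, 3, 5, 7, 11, 13, 17)  # product N = 510,510

    class ModInt:
        def __init__(self, v):
            self.r = tuple(v % p for p in PRIMES) if isinstance(v, int) else tuple(v)

        def __add__(self, o):
            return ModInt([(a + b) % p for a, b, p in zip(self.r, o.r, PRIMES)])

        def __sub__(self, o):
            return ModInt([(a - b) % p for a, b, p in zip(self.r, o.r, PRIMES)])

        def __mul__(self, o):
            return ModInt([(a * b) % p for a, b, p in zip(self.r, o.r, PRIMES)])

        def __eq__(self, o):  # sound only within an interval of size N
            return self.r == o.r

    # x = y + z is mirrored residue-wise, exactly as in the rewritten sketches.
    y, z = ModInt(12345), ModInt(-678)
    assert ModInt(12345 - 678) == y + z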
We provide a comprehensive theoretical analysis of the complexity of the
proposed transformation. First, we derive how many prime numbers are needed
to track values in a certain integer range. Second, we analyze the number of bits
required to encode values in the original and rewritten program and show that,
for the unary encoding used by Sketch, our technique offers an exponential
saving in the number of required bits.

We evaluate our technique on 181 benchmarks from various applications of
program sketching. Our results show that our technique yields significant
speedups over existing Sketch solvers and is able to solve 48 benchmarks on
which Sketch times out.
Contributions. In summary, our contributions are:

– A language IMP-MOD together with a modular semantics that represents in-
teger values using their remainders for a given set of primes, and a proof that
this semantics is equivalent to the standard integer semantics (§ 4).
– A data-flow analysis for detecting variables that can be soundly executed in
the modular semantics and an algorithm for translating IMP programs into
IMP-MOD ones (§ 5).
– A synthesis algorithm for IMP-MOD programs and an incremental synthesis al-
gorithm that lazily increases the number of primes used in the modular
semantics (§ 6).
– A complexity analysis showing that synthesis for IMP-MOD programs re-
quires exponentially smaller SAT queries than synthesis in IMP (§ 7).
– An evaluation of our technique on 181 benchmarks that manipulate large
integer values. Our solver outperforms the default Sketch unary solver, it
can solve 48 new benchmarks that no Sketch solver can solve, and it is 15.9X
faster than the Sketch SMT-like integer solver on the hard benchmarks
that take more than 10 seconds to solve (§ 8).

An extended version containing all proofs and further details has been uploaded
to arXiv as supplementary material.

2 Motivating Example

In this section, we use a simple example to illustrate our technique and its
effectiveness. Consider the Sketch program polyArray presented in Figure 1a.
The goal of this synthesis problem is to synthesize a two-variable quadratic
polynomial (lines 7–8) whose evaluation p on given inputs x and y is equal to a
given expected-output array z (line 9). Solving the problem amounts to finding
non-negative integer values for the holes (??) and sign values, i.e., -1 or 1, for
the holes (??s) such that the assertion becomes true.¹ In this case, a possible
solution is the polynomial:

p[i] = -17*y[i]^2 - 8*x[i]*y[i] - 17*x[i]^2 - 3*x[i];

When attempting to solve this problem, the Sketch synthesizer times out at
300 seconds. To solve it, Sketch creates SAT queries in which the
variables are the holes. Due to the large numbers involved in the computation of
this program, the unary encoding of Sketch ends up with SAT formulas with
approximately 45 million clauses.
¹ In Sketch, holes can only assume positive values. This is why we need the sign holes,
which are implemented using regular holes as follows: if(??) then 1 else -1.

1  // n=4, x=[24,-1,0,-19], y=[-7,11,-3,13]
2  // z=[-9353,-1983,-153,-6977]
3  polyArray(int n, int[n] x, int[n] y, int[n] z) {
4    int[n] p;
5    int i = 0;
6    while (i < n) {
7      p[i] = ??s1*??1*y[i]^2 + ??s2*??2*x[i]^2 + ??s3*??3*x[i]*y[i]
8             + ??s4*??4*y[i] + ??s5*??5*x[i] + ??s6*??6;
9      assert p[i] == z[i];
10     i++; }
11 }

(a) Original sketch program.

1  // n=4, x=[24,-1,0,-19], y=[-7,11,-3,13]
2  // z=[-9353,-1983,-153,-6977]
3  pAPrime(int n, int[n] x, int[n] y, int[n] z) {
4    int[n] x2, x3, x5, x7, x11, x13, x17;
5    while (i < n) { // Initialize modular variables
6      x2[i] = x[i]%2;
7      x3[i] = x[i]%3;
8      ... i++; }
9    int i = 0;
10   int[n] p2, p3, p5, p7, p11, p13, p17;
11   while (i < n) {
12     p2[i] = (??s1*(??1%2)*(y2[i]^2%2)%2
13            + ??s2*(??2%2)*(x2[i]^2%2)%2
14            + ??s3*(??3%2)*(x2[i]%2)*(y2[i]%2)%2
15            + ??s4*(??4%2)*(y2[i]%2)%2
16            + ??s5*(??5%2)*(x2[i]%2)%2
17            + ??s6*(??6%2)%2)%2;
18     ...
19     assert p2[i] == z2[i];
20     assert p3[i] == z3[i];
21     ...
22     i++; }
23 }

(b) Rewritten sketch program.

Fig. 1: Sketch program (a) and rewritten version with values tracked for differ-
ent moduli (b).

Sketch Program with Modular Arithmetic The technique we propose in this paper
has the goal of reducing the complexity of the synthesis problem by transforming
the program into an equivalent one that manipulates smaller integer values and
yields easier SAT queries. Given the Sketch program in Figure 1a, our
technique produces the modified Sketch program pAPrime in Figure 1b. The
new Sketch program has the same control flow graph as the original one, but
instead of computing the actual values of the expressions x[·] and y[·], it tracks
their remainders for the set of prime numbers {2, 3, 5, 7, 11, 13, 17} using new
variables—e.g., x2[i] tracks the remainder of x[i] modulo 2.

The program pAPrime initializes the modular variables with the correspond-
ing modular values (lines 5–8). When rewriting a computation over modular
variables, the same computation is performed modularly (lines 12–17). For ex-
ample, the term ??s1*??1*y[i]^2, when tracked modulo 2, is rewritten as

(??s1*(??1%2)*((y2[i]%2)^2%2))%2

In the rewritten program, the variables i and n are not tracked modularly,
since such a transformation would incorrectly access array indices. Finally, the
assertions for different moduli share the same holes, as the solution to the Sketch
has to be correct for all modular values. In the rest of the paper, we develop a
data-flow analysis that detects when variables can be tracked modularly.
Sketch can solve the rewritten program in less than 2 seconds and produces
hole values that are correct solutions for the original program. This speedup
is due to the small integer values manipulated by the modular computations.
In fact, the intermediate SAT formulas generated by Sketch for the program
pAPrime have approximately 120 thousand clauses instead of the 45 million
clauses for polyArray. Due to the complex arithmetic in the formulas, even if
Sketch uses the SMT-like native integer encoding, it still requires more than
300 seconds to solve this problem.

While this technique is quite powerful, it does have some limitations. In
particular, the solution to the rewritten Sketch is guaranteed to be a correct
solution only for inputs that cause intermediate values of the program to lie in
a range [d1, d2] such that d2 − d1 ≤ 2 × 3 × 5 × 7 × 11 × 13 × 17 = 510,510. We
will prove this result in Section 4.
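As a quick sanity check of the example (our own script, not part of the tool), the synthesized polynomial indeed reproduces the expected-output array z on the given inputs:

    # Check the solution reported for polyArray on the example inputs.
    x = [24, -1, 0, -19]
    y = [-7, 11, -3, 13]
    z = [-9353, -1983, -153, -6977]

    def p(xi, yi):
        # p[i] = -17*y[i]^2 - 8*x[i]*y[i] - 17*x[i]^2 - 3*x[i]
        return -17*yi**2 - 8*xi*yi - 17*xi**2 - 3*xi

    assert all(p(xi, yi) == zi for xi, yi, zi in zip(x, y, z))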

3 Preliminaries

In this section, we describe the IMP language that we will consider through-
out the paper and briefly recall the counter-example guided inductive synthesis
algorithm employed by the Sketch solver.
For simplicity, we consider a simple imperative language IMP with integer
holes for defining the hypothesis space of programs. The syntax and semantics
of IMP are shown in Appendix ??. Without loss of generality, we assume the
program consists of a single function f(v1, · · ·, vn, ??1, . . . , ??m) with n integer
variables and m integer holes. The body of the program f consists of a sequence
of statements, where a statement s can either be a variable assignment, a while
loop statement, an if conditional statement, or an assert statement. The holes
?? denote unknown integer constant values, and the goal of the synthesis
process is to compute these values such that a set of desired program assertions
is satisfied for all possible input values to f.²
² Our implementation also supports for-loops, recursion, arrays, and complex types.

Example 1. An example IMP sketch denoting a partial program is shown below.

triple(n, h, ??) { h = ??; assert h*n == n + n + n; }

The goal of the synthesizer is to compute the value of the hole ?? such that the
assertion is true for all possible input values of n and h. For this example, ?? = 3
is a valid solution.

The Sketch solver uses the counter-example guided inductive synthesis al-
gorithm (Cegis) to find hole values such that the desired assertions hold for all
input values. Formally, the Sketch synthesizer solves the following constraint:

∃?? ≡ (??1, · · ·, ??m) ∈ Z^m. ∀in ∈ I. ⟦f(in, ??)⟧_IMP ≠ ⊥

where Z denotes the domain of all integer values, ?? denotes the list of unknown
hole values (??1, · · ·, ??m) ∈ Z^m, I denotes the domain of all input argument
values to the function f, and ⟦f(in, ??)⟧_IMP ≠ ⊥ denotes that the program satis-
fies all assertions. The synthesis problem is in general undecidable for a language
with complex operations such as the IMP language because of the infinite size of
possible hole and input values. To make the synthesis process more tractable,
Sketch imposes a bound on the sizes of both the input domain (Ib ) and the
domain of hole values (Zb ) to obtain the following constraint:

∃?? ≡ (??1, · · ·, ??m) ∈ Z_b^m. ∀in ∈ I_b. ⟦f(in, ??)⟧_IMP ≠ ⊥

The bounded domains make the synthesis problem decidable, but the second-
order quantified formula results in a search space of hole values that is still huge
for any reasonable bounds. To solve such bounded equations efficiently, Sketch
uses the Cegis algorithm to incrementally add inputs from the domain until
obtaining hole values ?? that satisfy the assertion predicates for all the input
values in the bounded domain. The algorithm solves the second-order formula
by iteratively solving a series of first-order queries. It first encodes the existential
query (synthesis query) over a randomly selected input value in0 to find the hole
values H that satisfy the predicate for in0 using a SAT solver in the backend.

∃?? ≡ (??1, · · ·, ??m) ∈ Z_b^m. ⟦f(in0, ??)⟧_IMP ≠ ⊥

It then encodes another existential query (verification) to now find a counter-


example in1 for which the predicate is not satisfied for the previously found hole
values.
∃in ∈ I_b. ¬(⟦f(in, H)⟧_IMP ≠ ⊥)
If no counter-example input can be found, the hole values are returned as the de-
sired solution. Otherwise, the algorithm computes a new hole value that satisfies
the assertion for all the counter-example inputs found so far. This process contin-
ues iteratively until either a desired hole value is found (i.e. no counter-example
input exists), no satisfiable hole value is found (i.e. the synthesis problem is
infeasible), or the SAT solver times out.
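The following Python sketch (ours; it swaps the SAT backend for brute-force enumeration over the bounded domains) illustrates the Cegis loop on the triple example from Example 1:

    # Toy CEGIS loop for: triple(n, h, ??){ h = ??; assert h*n == n+n+n; }
    # Bounded domains stand in for Sketch's SAT-backed synthesis/verification.
    HOLES = range(0, 32)      # candidate values for ??
    INPUTS = range(-50, 51)   # bounded input domain for n

    def satisfies(hole, n):
        return hole * n == n + n + n

    def cegis():
        examples = [1]  # start from one (arbitrary) input
        while True:
            # Synthesis query: find a hole consistent with all examples so far.
            hole = next((c for c in HOLES
                         if all(satisfies(c, n) for n in examples)), None)
            if hole is None:
                return None  # UNSAT: no hole value works
            # Verification query: search for a counterexample input.
            cex = next((n for n in INPUTS if not satisfies(hole, n)), None)
            if cex is None:
                return hole  # verified on the whole bounded domain
            examples.append(cex)

    print(cegis())  # prints 3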

Integer Encoding The Sketch solver can efficiently solve the synthesis con-
straint in many domains, but it does not scale well for sketches manipulating
large numbers. Sketch uses a unary encoding to represent integers, where the
encoded formula consists of a variable for each integer value. The unary encod-
ing allows for simplifying the representation of complex non-linear arithmetic
operations. For example, a multiplication operation can be represented as sim-
ply a lookup table using this encoding. In practice, the unary encoding results
in orders-of-magnitude faster solving times compared to the logarithmic encoding for
many synthesis problems. However, it also results in huge SAT formulas in
the presence of large integers. Recently, a new SMT-like technique based on extend-
ing the SAT solver with native integer variables and constraints was proposed to
alleviate this issue in Sketch. Similarly to the Boolean variables, this extended
solver guesses integer values and propagates them through the constraints while
also learning from conflict clauses. Note that Sketch uses these SAT extensions
and encodings instead of an SMT solver because SMT solvers do not scale well for
the non-linear constraints typically found in synthesis problems. Our new technique
for handling computations over large numbers still maintains the efficient unary
encoding of integers and of computations over them.

4 Modular Arithmetic Semantics

In this section, we present the language IMP-MOD in which variables can be


tracked using modular arithmetic. We start by recalling the Chinese Remain-
der Theorem, then define both a modular and integer semantics for the IMP-MOD
language, and show that the two semantics are equivalent.

4.1 The Chinese Remainder Theorem

The Chinese Remainder Theorem is a powerful number theory result that shows
the following: given a set of distinct primes P = {p1 , . . . , pk }, any number n in
an interval of size p1 · . . . · pk can be uniquely identified from the remainders
[n mod p1 , · · · , n mod pk ]. In Section 4.2, we will use this idea to define the
semantics of the IMP-MOD language. The main benefit of this idea is that the
remainders could be much smaller than actual program values.

Example 2. For P = [3, 5, 7] and an integer 101, its remainders [2, 1, 3] are much
smaller than 101. However, any number of the form 101 + 105 × n also has
remainders [2, 1, 3] with respect to the same prime set.

In general, one cannot uniquely determine an arbitrary integer value from its
remainders for some set P—i.e., the mapping from a number to its remainders
is an abstraction in the sense of abstract interpretation [6]. However, if we are
interested in a limited range of integer values [L, U), one can choose a set of
primes P = {p1, . . . , pk} such that, for values L ≤ x < U, the map
[r1, · · ·, rk] ↦ x, where x ≡ ri mod pi, is an injection.

Modular Expr   a^P := c^P | v^P | a1^P op_a^P a2^P | toPrime(a)
Modular Op     op_a^P := + | − | ∗
Arith Expr     a := ?? | c | v | a1 op_a a2
Arith Op       op_a := + | − | ∗ | /
Bool Expr      b := not b | a1 op_c a2 | b1 and b2 | b1 or b2 | a1^P == a2^P
Comp Op        op_c := < | > | ≤ | ≥
Stmt           s := v = a | v^P = a^P | s1; s2
                  | while(b) {s} | if(b) s1 else s2 | assert b
Program        P := f(v1, · · ·, vn, v1^P, · · ·, vm^P, ??1, . . . , ??l) {s}

Fig. 2: Syntax of the IMP-MOD language.

Theorem 1 (Chinese Remainder Theorem [4]). Let p1, ..., pk be positive
integers that are pairwise co-prime—i.e., no two numbers share a divisor larger
than 1. Denote N = ∏_{i=1}^{k} pi, and let d, r1, r2, . . . , rk be any integers. Then
there is one and only one integer d ≤ x < d + N such that x ≡ ri mod pi for
every 1 ≤ i ≤ k.
We define the translation function m_P(x) := [x mod p1, · · ·, x mod pk] that
maps an integer to its tuple of remainders with respect to P. When m_P(x) is
bijective on some set R, we denote with m_P^{−1,R} : [0, p1) × · · · × [0, pk) → R its
inverse function.
Example 3. Let x be an integer in the range [0, 105) (note that 105 = 3 × 5 × 7).
If we know that the value of x is congruent to [2, 1, 3] modulo {3, 5, 7}, we can
uniquely identify the value of x to be 101 by observing that 101 ≡ 2 mod 3, 101 ≡
1 mod 5, and 101 ≡ 3 mod 7.
The following lemma shows that the function m_P is closed under addition,
subtraction, and multiplication of integers.

Lemma 1. For every set of primes P, integers x and y, and op ∈ {+, −, ∗}, the
following holds: m_P(x op y) = m_P(x) op m_P(y).
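A small Python sketch (ours) makes Theorem 1, Example 3, and Lemma 1 concrete: it computes m_P, reconstructs x from its residues by brute force over the range (a real implementation would use the extended Euclidean algorithm), and checks the closure property:

    from math import prod

    P = [3, 5, 7]
    N = prod(P)  # 105

    def m(x):
        # m_P(x): residues of x with respect to the primes in P.
        return [x % p for p in P]

    def m_inv(rs, lo=0):
        # Inverse of m_P on the interval [lo, lo+N): brute-force CRT.
        return next(x for x in range(lo, lo + N) if m(x) == rs)

    assert m(101) == [2, 1, 3]          # Example 2
    assert m_inv([2, 1, 3]) == 101      # Example 3: unique in [0, 105)

    # Lemma 1: m_P(x op y) = m_P(x) op m_P(y), computed residue-wise.
    x, y = 64, 38
    for op in (lambda a, b: a + b, lambda a, b: a - b, lambda a, b: a * b):
        assert m(op(x, y)) == [op(a, b) % p for (a, b, p) in zip(m(x), m(y), P)]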

4.2 The IMP-MOD Language


In this section, we define the IMP-MOD language (syntax in Figure 2), a variant
of the IMP language for which the semantics can be defined using modular arith-
metic.3 An IMP-MOD program is parametric on a set P = {p1 , . . . , pk } of distinct
³ We consider the simple subset for a clear presentation of the semantics, but our
framework works for the full IMP language (and for more complex language con-
structs) as we will see in the later sections.

⟦toPrime(a)⟧^P_{σ,σP} := [⟦a⟧_{σ,σP} mod p1, · · ·, ⟦a⟧_{σ,σP} mod pk]

⟦v^P⟧^P_{σ,σP} := σ^P(v)        ⟦c^P⟧^P_{σ,σP} := [c mod p1, · · ·, c mod pk]

⟦a1^P op_a^P a2^P⟧^P_{σ,σP} := [(x11 op_a x21) mod p1, · · ·, (x1k op_a x2k) mod pk]
    where ⟦ai^P⟧^P = [xi1, · · ·, xik]

⟦a1^P == a2^P⟧^P_{σ,σP} := x11 == x21 ∧ · · · ∧ x1k == x2k    where ⟦ai^P⟧^P = [xi1, · · ·, xik]

⟦c⟧_{σ,σP} := c    ⟦v⟧_{σ,σP} := σ(v)    ⟦a1 op_a a2⟧_{σ,σP} := ⟦a1⟧_{σ,σP} op_a ⟦a2⟧_{σ,σP}

⟦v = a⟧_{σ,σP} := (σ[v ← ⟦a⟧_{σ,σP}], σ^P)    ⟦v^P = a^P⟧_{σ,σP} := (σ, σ^P[v^P ← ⟦a^P⟧^P_{σ,σP}])

Fig. 3: Modular semantics.

The structure of an IMP-MOD program is similar to an IMP pro-
gram, but IMP-MOD supports two types of variables and arithmetic expressions:
the regular IMP ones (i.e., v, a, and b), which operate over an integer semantics,
and the modular ones (i.e., v^P, a^P, and b^P), which take as an additional parame-
ter the set of primes P and operate over a modular semantics. The semantics of
some of the key constructs of IMP-MOD is shown in Figure 3.

The key idea of the modular semantics is that the value of each program
variable in v^P and of each arithmetic expression in a^P is denoted by a tuple of val-
ues, one for each prime number pi ∈ P. For example, the value of the con-
stant c^P is represented by the tuple [c mod p1, · · ·, c mod pk], where each in-
dividual value denotes the remainder of c when divided by the prime number
pi ∈ P. Formally, the program f has two sets of variables V^Z = {v1, · · ·, vn}
and V^P = {v1^P, · · ·, vm^P}, which contain all the integer and prime variables re-
spectively, and a set of holes H = {??1, . . . , ??k}. The denotation function uses
two valuation functions: (i) σ : V^Z ∪ H → Z, which maps variables and holes
to integer values, and (ii) σ^P : V^P → [0, p1) × · · · × [0, pk), which maps primed vari-
ables to modular values. The expression toPrime(a) converts the integer value
of an integer expression a to a modular tuple. Arithmetic expressions in a^P are
computed using modular values, with the result being obtained using modular
arithmetic with respect to the corresponding primes in P. Note that the only
comparison operator allowed over modular expressions is == and that the divi-
sion operator cannot be applied to modular expressions. While the syntax does
not directly allow for holes to be represented modularly—i.e., we do not have
expressions of the form ??^P—an expression of the form toPrime(??) effectively
achieves the objective of representing a hole ?? modularly.

4.3 Equivalence between the two Semantics

Next, we provide an alternative integer semantics, which applies the IMP integer
semantics to modular expressions and show that, under some assumptions on
the values manipulated by the program, the modular and integer semantics are
equivalent. We will use this result to build our modified synthesis algorithm.

⟦toPrime(a)⟧_{σ1,σ2} := ⟦a⟧_{σ1,σ2}    ⟦v^P⟧_{σ1,σ2} := σ2(v^P)    ⟦c^P⟧_{σ1,σ2} := c

⟦a1^P op_a^P a2^P⟧_{σ1,σ2} := ⟦a1^P⟧_{σ1,σ2} op_a ⟦a2^P⟧_{σ1,σ2}
⟦a1^P == a2^P⟧_{σ1,σ2} := ⟦a1^P⟧_{σ1,σ2} == ⟦a2^P⟧_{σ1,σ2}

⟦c⟧_{σ1,σ2} := c    ⟦v⟧_{σ1,σ2} := σ1(v)    ⟦a1 op_a a2⟧_{σ1,σ2} := ⟦a1⟧_{σ1,σ2} op_a ⟦a2⟧_{σ1,σ2}

⟦v = a⟧_{σ1,σ2} := (σ1[v ← ⟦a⟧_{σ1,σ2}], σ2)    ⟦v^P = a^P⟧_{σ1,σ2} := (σ1, σ2[v^P ← ⟦a^P⟧_{σ1,σ2}])

Fig. 4: Integer semantics.

Integer Semantics The integer semantics of IMP-MOD is shown in Figure 4 (de-
noted ⟦·⟧_{σ1,σ2}). In this semantics, modular expressions are evaluated as integer
expressions using the same semantics as for IMP—i.e., the values of modular vari-
ables and modular arithmetic expressions are denoted by integer values. There-
fore, in the integer semantics, we use two valuation functions: σ1 : V^Z ∪ H → Z,
mapping variables and holes to integers, and σ2 : V^P → Z, mapping modular
variables to integers.
Relation between the Two Semantics We now show that the modular semantics
is, in some sense, equivalent to the integer semantics. For the rest of this section,
we fix a set of distinct primes P = {p1 , · · · , pk }.
To prove the equivalence of the two program semantics, we will require the
values of modular expressions to lie in some range that is covered by the prime
numbers in P. The following definition captures this restriction.

Definition 1. Given a modular arithmetic expression a^P (resp. Boolean expres-
sion b) and some integers L < U, we say a^P with context (σ1, σ2) is uniformly in
the range R := [L, U)—a^P ∈_{σ1,σ2} R for short—if, under the integer semantics,
all evaluations of modular subexpressions of a^P (resp. b) are in the range R:

– a^P ∈_{σ1,σ2} R, iff ⟦a^P⟧_{σ1,σ2} ∈ R;
– a1^P == a2^P ∈_{σ1,σ2} R, iff a1^P ∈_{σ1,σ2} R and a2^P ∈_{σ1,σ2} R;
– b1 and b2 ∈_{σ1,σ2} R, iff b1 ∈_{σ1,σ2} R and b2 ∈_{σ1,σ2} R;
– b1 or b2 ∈_{σ1,σ2} R, iff b1 ∈_{σ1,σ2} R and b2 ∈_{σ1,σ2} R;
– not b ∈_{σ1,σ2} R, iff b ∈_{σ1,σ2} R;
– a1 op_c a2 ∈_{σ1,σ2} R for any arithmetic expressions a1, a2 and operator op_c.

Given a valuation function σ : V^P → Z, we write m_P ◦ σ to denote the
modular valuation obtained by applying the m_P function to σ—i.e., for every
v^P ∈ V^P, (m_P ◦ σ)(v^P) = m_P(σ(v^P)). Similarly, for a modular valuation function
σ^P : V^P → [0, p1) × · · · × [0, pk), we denote by m_P^{−1,R} ◦ σ^P the integer valuation
from V^P to R such that, for every v^P ∈ V^P, (m_P^{−1,R} ◦ σ^P)(v^P) = m_P^{−1,R}(σ^P(v^P)).
The following lemma shows that, when the values of modular arithmetic expressions
lie in an interval of size N = p1 · . . . · pk, the modular and integer semantics of
modular arithmetic expressions are equivalent.

Lemma 2. Given a set of primes P = {p1, · · ·, pk}, an arithmetic expression
a^P, and two valuation functions σ1 : V^Z ∪ H → Z and σ2 : V^P → Z, we have

m_P(⟦a^P⟧_{σ1,σ2}) = ⟦a^P⟧^P_{σ1, m_P◦σ2}

Moreover, if there exists an interval R of size N = p1 · . . . · pk such that
a^P ∈_{σ1,σ2} R, then

m_P^{−1,R}(⟦a^P⟧^P_{σ1, m_P◦σ2}) = ⟦a^P⟧_{σ1,σ2}.

Similarly, we show that the two semantics are also equivalent for Boolean
expressions.

Lemma 3. Given a set of primes P = {p1, · · ·, pk}, an interval R of size N =
p1 · . . . · pk, a Boolean expression b, and two valuation functions σ1 : V^Z ∪ H → Z
and σ2 : V^P → Z, if b ∈_{σ1,σ2} R, then ⟦b⟧_{σ1,σ2} = ⟦b⟧^P_{σ1, m_P◦σ2}.

We are now ready to show the equivalence between the modular semantics
and the integer semantics for programs P ∈ IMP-MOD. The semantics of a pro-
gram P = f(V^Z, V^P, H) {s} is a map from valuations to valuations: given
a valuation σ1 : V^Z → Z for integer variables, a valuation σ2 : V^P → Z for mod-
ular variables, and a valuation σ^H : H → Z for holes, we have ⟦P⟧(σ1, σ2, σ^H) =
⟦s⟧_{σ1∪σH, σ2} and ⟦P⟧^P(σ1, σ2, σ^H) = ⟦s⟧^P_{σ1∪σH, m_P◦σ2}. Therefore, it is sufficient to
show that the two semantics are equivalent for any statement s.

The two semantics are equivalent for a statement s if, under the same input
valuations, the resulting valuations of the two semantics can be translated to each
other. Formally, given valuations σ1, σ2 and an interval R of size N, we say
⟦s⟧_{σ1,σ2} ≡_P ⟦s⟧^P_{σ1, m_P◦σ2} iff σ1′ = σ1′′, m_P ◦ σ2′ = σ2^P, and σ2′ = m_P^{−1,R} ◦ σ2^P,
where ⟦s⟧_{σ1,σ2} = (σ1′, σ2′) and ⟦s⟧^P_{σ1, m_P◦σ2} = (σ1′′, σ2^P).
We define uniform inclusion for statements.

Definition 2. Given a set of primes P, two integers L < U, and a statement s,
we say s with context (σ1, σ2) is uniformly in the range R := [L, U)—s ∈_{σ1,σ2} R
for short—if, under the integer semantics, all evaluations of modular subexpres-
sions of s are in the range R:

– (v^P = a^P) ∈_{σ1,σ2} R iff a^P ∈_{σ1,σ2} R.
– while(b){s} ∈_{σ1,σ2} R iff s ∈_{σ1,σ2} R and b ∈_{σ1,σ2} R.
– s1; s2 ∈_{σ1,σ2} R iff s1 ∈_{σ1,σ2} R and s2 ∈_{σ1,σ2} R.
– if(b) s1 else s2 ∈_{σ1,σ2} R iff s1 ∈_{σ1,σ2} R, s2 ∈_{σ1,σ2} R, and b ∈_{σ1,σ2} R.
– assert b ∈_{σ1,σ2} R iff b ∈_{σ1,σ2} R.

At last, the two semantics are equivalent for statements.

Theorem 2. Given a set of primes P = {p1, · · ·, pk}, a statement s, and two
valuation functions σ1 : V^Z ∪ H → Z and σ2 : V^P → Z, if there exists an interval
R of size N such that s ∈_{σ1,σ2} R, then ⟦s⟧_{σ1,σ2} ≡_P ⟦s⟧^P_{σ1, m_P◦σ2}.

Algorithm 1: returns variables that should be tracked using modular/integer
semantics.

   /* f: sketched function; V^P: variables to be tracked modularly; V^Z:
      variables to be tracked with integer values */
 1 function DataFlowAnalysis(f)
 2   S ← {/, <, >, ≤, ≥}; V^Z ← ∅
 3   for op ∈ S do
       /* Compute all variables v that may flow into op */
 4     V^Z ← V^Z ∪ Dataflow(op, f)
 5   V^P ← V \ V^Z
 6   return (V^Z, V^P)

5 From IMP to IMP-MOD Programs


In this section, we develop a data flow analysis for detecting variables in IMP
programs for which it is sound to track values modularly. We then use this data
flow analysis to rewrite an IMP program to an equivalent IMP-MOD program.

5.1 Data Flow Analysis


The formalization of IMP-MOD in Section 4.2 made it clear that the modular
semantics is only appropriate when integer values are manipulated using addi-
tion, multiplication, subtraction, and equality. Other operations like division and
less-than comparison cannot be computed soundly in modular arithmetic.

Example 4. Consider an integer variable x with modular value x2 under modulus
2 and x3 under modulus 3, and an integer variable y with modular values y2,
y3 under the corresponding moduli. Then the assignment x = y + y; implies
x2 = (y2 + y2) mod 2; and x3 = (y3 + y3) mod 3. However, x = x/y; does not
imply x2 = (x2/y2) mod 2; and x3 = (x3/y3) mod 3.
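A two-line Python check (ours) shows why division cannot be tracked modularly: integer division does not commute with taking remainders.

    x, y, p = 7, 2, 5
    print((x // y) % p)              # 3: remainder of the true quotient
    print(((x % p) // (y % p)) % p)  # 1: "dividing the residues" disagrees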

We now define a data-flow analysis (shown in Algorithm 1) for computing
which variables in a program must be tracked with the integer semantics (i.e., the
set V^Z) and which variables can be soundly tracked using the modular semantics
(i.e., the set V^P). For each operator op in {/, <, >, ≤, ≥}, the analysis computes
the set of variables that may flow into the operands of an expression of the form
e1 op e2. In practice, this is done via a backward may-analysis, denoted as the
Dataflow procedure in Algorithm 1. The obtained set of variables must be tracked
using the integer semantics. The remaining variables never flow into a problematic
operator and can therefore be tracked using the modular semantics (a small
illustrative sketch of this analysis is given after the remark below).
Implementation Remark Since our implementation also supports arrays and re-
cursion, the data-flow analysis in Algorithm 1 is inter-procedural and the set S
also contains the array indexing operator [ ]—i.e., given an expression arr[a], if
a variable v may flow into a, then v must be tracked using the integer semantics.
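The following Python sketch (ours) shows the flavor of the backward may-analysis on a toy dependency representation; the program encoding and helper names are illustrative assumptions, not the tool's actual data structures:

    # Toy backward may-analysis: find variables that may flow into an
    # operand of a "problematic" operator (/, <, >, <=, >=, array indexing).
    # A statement is (defined_var, used_vars, vars_used_problematically).
    # polyArray-like program: i and n feed the loop guard i < n and indices.
    stmts = [
        ("i", set(), set()),                 # i = 0
        (None, {"i", "n"}, {"i", "n"}),      # while (i < n); i indexes arrays
        ("p", {"x", "y"}, set()),            # p[i] = polynomial over x, y
        (None, {"p", "z"}, set()),           # assert p[i] == z[i]
        ("i", {"i"}, set()),                 # i++
    ]

    def integer_tracked(stmts):
        vz = set().union(*(bad for (_, _, bad) in stmts))
        changed = True
        while changed:  # backward closure over def-use dependencies
            changed = False
            for defined, used, _ in stmts:
                if defined in vz and not used <= vz:
                    vz |= used
                    changed = True
        return vz

    vz = integer_tracked(stmts)
    print(vz)                          # {'i', 'n'}: keep integer semantics
    print({"x", "y", "p", "z"} - vz)   # tracked modularly, as in Example 5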

Ra(a) =
    v^P                       if a ≡ v and v ∈ V^P
    c^P                       if a ≡ c
    Ra(a1) op_a^P Ra(a2)      if a ≡ a1 op_a a2
    toPrime(a)                otherwise

Rb(b) =
    Ra(a1) == Ra(a2)          if b ≡ a1 == a2
    Rb(b1) and Rb(b2)         if b ≡ b1 and b2
    not Rb(b1)                if b ≡ not b1
    b                         otherwise

Rs(s) =
    Rs(s1); Rs(s2)                     if s ≡ s1; s2
    v = a                              if s ≡ v = a and v ∈ V^Z
    v^P = Ra(a)                        if s ≡ v = a and v ∈ V^P
    if(Rb(b)) Rs(s0) else Rs(s1)       if s ≡ if(b) s0 else s1
    while(Rb(b)) {Rs(s)}               if s ≡ while(b) {s}
    assert Rb(b)                       if s ≡ assert b

Fig. 5: Subset of rules for the translation from IMP to IMP-MOD programs. Rules
are parametric in V^Z, V^P with P: Rf(f(V, ??){s}) = f(V^Z, V^P, ??){Rs(s)}.

Furthermore, while our formalization tracks each variable using only one of the
two semantics, our implementation allows a variable to be tracked differently
(using actual values or modular values) at different program points: the same
data-flow analysis computes, for each variable v, the program points at which
the actual value of v is needed. In this case, a variable might initially need to be
tracked using actual values but can later be tracked using modular values.
Example 5. Consider the sketch program polyArray in Figure 1a. For this pro-
gram, Algorithm 1 will return that the variables x and y can be tracked modu-
larly. However, the variables i and n must be tracked using the integer semantics,
since they are used in a < operation and as array indices.

5.2 From IMP to IMP-MOD


Now that we have computed which sets of variables can be tracked modularly, we
can transform the IMP program into an IMP-MOD program. The transformation
Rf that rewrites f into an IMP-MOD program is shown in Figure 5. The key idea
of the program transformation is to use the sets V^Z and V^P to rewrite only those
variables and sub-expressions of f for which modular arithmetic can be
performed soundly.

Once we obtain a solution to the IMP-MOD program as hole values, we obtain
a solution to the IMP program by mapping the holes to the integer values given by
the integer semantics.

Example 6. Consider a program where the dataflow analysis computes V^Z =
{i, n} and V^P = {x}. The statement x = x + i + 1 is rewritten to x^P = x^P +
toPrime(i) + 1^P.

The transformation Rf is sound.

Theorem 3. Given an IMP program f, and sets V^Z and V^P resulting from the
data-flow analysis on f, the program Rf(f) is in the IMP-MOD language. More-
over, ⟦f⟧_IMP = ⟦Rf(f)⟧.
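A compact Python sketch (ours) of the expression-rewriting rule Ra over a toy AST; the node encoding is an illustrative assumption:

    # Expressions as nested tuples: ("var", v), ("const", c), ("op", op, e1, e2).
    # Variables in VP are rewritten to their modular counterparts; any
    # sub-expression over integer-tracked variables is wrapped in toPrime.
    VP = {"x"}  # from the data-flow analysis of Example 6

    def Ra(a):
        kind = a[0]
        if kind == "var":
            return ("pvar", a[1]) if a[1] in VP else ("toPrime", a)
        if kind == "const":
            return ("pconst", a[1])
        if kind == "op" and a[1] in {"+", "-", "*"}:
            return ("pop", a[1], Ra(a[2]), Ra(a[3]))
        return ("toPrime", a)  # e.g., division: keep integer semantics

    # x = x + i + 1  (i is integer-tracked, x is modular)
    rhs = ("op", "+", ("op", "+", ("var", "x"), ("var", "i")), ("const", 1))
    print(Ra(rhs))
    # ('pop', '+', ('pop', '+', ('pvar', 'x'), ('toPrime', ('var', 'i'))),
    #  ('pconst', 1))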

6 Solving IMP-MOD Sketches


In this section, we discuss how synthesis in the modular semantics relates to syn-
thesis in the integer semantics and provide an incremental algorithm for solving
IMP-MOD sketches.

6.1 Synthesis in IMP-MOD


Given a set of integers R, we say that a variable valuation σ is in R (denoted
σ ∈ R) if for every v, we have σ(v) ∈ R. Similarly to what we saw in Sec-
tion 3, we assume that the sketch has to be solved for finite ranges of possible
hole values (R_H) and input values (R_in). Solving an IMP-MOD problem
P = f(V, V^P, H){s} for the integer semantics amounts to solving the following
constraint:

∃σ^H ∈ R_H. ∀σ1, σ2 ∈ R_in. ⟦s⟧_{σ1∪σH, σ2} ≠ ⊥.

According to Theorem 2, given a set of distinct primes P = {p1, · · ·, pk}
and variable valuations σ^H, σ1, and σ2, if there exists a range R of size N =
p1 · . . . · pk such that s ∈_{σ1∪σH, σ2} R, then the modular semantics and the integer
semantics are equivalent to each other. Using this observation, we can define
the set of variable valuations for which the two semantics are guaranteed to be
equivalent:

I_R^P := { (σ1, σ2) | ∀σ^H ∈ R_H. ∃R. |R| = N ∧ s ∈_{σ1∪σH, σ2} R }.

Since for every σ^H ∈ R_H and (σ1, σ2) ∈ I_R^P we have that ⟦s⟧^P_{σ1∪σH, m_P◦σ2} =
⟦s⟧_{σ1∪σH, σ2}, any solution to an IMP-MOD program in the modular semantics is
also a solution to the following formula in the integer semantics:

∃σ^H ∈ R_H. ∀(σ1, σ2) ∈ I_R^P. ⟦s⟧_{σ1∪σH, σ2} ≠ ⊥.

When all valuations (σ1, σ2) ∈ R_in are also elements of I_R^P, any solution to
an IMP-MOD program in the modular semantics is guaranteed to be a correct
solution under the integer semantics.

To summarize, if the synthesizer returns UNSAT for the IMP-MOD program,
the problem is unrealizable and does not admit a solution. When it returns a solu-
tion, the solution is correct if it only produces valuations in the range allowed by
the choice of prime numbers.

Algorithm 2: Incremental synthesis for IMP-MOD.

   /* f: function; 𝒫: full set of primes */
 1 function IncrementalSynthesis(f, 𝒫)
 2   P ← [p1]
 3   fsyn ← Synthesis(f, P)
 4   while ∃pcex ∈ 𝒫 : ¬Verify(fsyn, pcex) do
 5     P ← P ∪ {pcex}
 6     fsyn ← Synthesis(f, P)
 7     if fsyn == UNSAT then return ∅
 8   return fsyn

In practice, one can use a verifier to check the correctness of the synthesized
solution and add more prime numbers to the modular synthesizer if needed. In
fact, this is the main idea behind the counterexample-guided inductive synthesis
algorithm used by Sketch (Section 3).

6.2 Incremental Synthesis Algorithm


In this section, we propose an incremental synthesis algorithm that builds on
the following observation: the set of variable valuations for which the modular and
integer semantics are equivalent increases monotonically with the size of P:

P1 ⊆ P2 ⟹ I_R^{P1} ⊆ I_R^{P2}.    (1)

Algorithm 2 uses Equation 1 to add prime numbers lazily during the synthesis
process. The algorithm first constructs a set P = {p1} with the first prime num-
ber p1 ∈ 𝒫 and synthesizes a solution that is correct for computations modulo
the set P. It then checks if the synthesized solution fsyn satisfies the assertions
with respect to all prime numbers in 𝒫. If yes, fsyn is returned as the solution.
Otherwise, the algorithm finds a prime pcex ∈ 𝒫 for which Verify(fsyn, pcex) does
not hold, adds it to the set P, and continues iterating. Due to
Equation 1, Algorithm 2 is sound and complete with respect to the synthesis
algorithm that considers the full prime set 𝒫 all at once.

In practice, the user could use domain knowledge to estimate a suitable set
of primes or, alternatively, use our incremental algorithm to discover appropriate
prime sets. The set of prime numbers {2, 3, 5, 7, 11, 13, 17} usually instan-
tiates a range R that is large enough for most synthesis tasks based on Sketch.
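The driver below is a minimal Python sketch of Algorithm 2, assuming synthesis and verify are available as callbacks (in our setting they would shell out to Sketch on the rewritten program); both names are placeholders:

    # Minimal driver for Algorithm 2. `synthesis(f, ps)` returns hole values
    # correct modulo the primes in `ps` (or None for UNSAT); `verify(sol, p)`
    # checks the solution's assertions modulo a single prime p.
    FULL_PRIMES = [2, 3, 5, 7, 11, 13, 17]

    def incremental_synthesis(f, synthesis, verify):
        ps = [FULL_PRIMES[0]]            # start with the first prime only
        sol = synthesis(f, ps)
        while True:
            if sol is None:
                return None              # UNSAT under the current primes
            cex = next((p for p in FULL_PRIMES if not verify(sol, p)), None)
            if cex is None:
                return sol               # correct for every prime in the set
            ps.append(cex)               # refine lazily and retry
            sol = synthesis(f, ps)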

7 Complexity of Rewritten Programs


In this section, we analyze how many bits are necessary to encode numbers for
both semantics using unary and binary bit-vector encodings of integers (Sec. 7.1
and 7.2), and show how many prime numbers are necessary in the modular
semantics to cover values up to a certain bound (Sec. 7.3). The following results
build upon several number-theoretic results that the reader can consult in [9, 15].

7.1 Bit-complexity of Binary Encoding


In this section, we analyze how many bits are necessary when representing an
interval of size N in binary in our modular semantics. In the rest of the section,
we consider the set of primes P_n = {p | p < n} = {p1, . . . , pk} containing the
prime numbers smaller than n. We will show in Section 8 that
this choice of prime numbers also yields good performance in practice. Concretely,
we are interested in the magnitude of the number N = p1 · . . . · pk
and in how many bits are used to represent the numbers in P_n.

We start by introducing the notion of primorial.

Definition 3 (Primorial). Given a number n, the primorial n# is defined as
the product of all primes smaller than n—i.e., n# = ∏_{p∈P_n} p.

The primorial captures the size N of the interval covered by the Chinese Re-
mainder Theorem when using prime numbers up to n. The following number
theory result gives us a closed form for the primorial and shows that the number
N has approximately n bits.

n# = e^{(1+o(1))n} = 2^{(1+o(1))n}    (2)
We use another number theory notion to quantify the number of bits in Pn .


Definition 4 (Chebyshev function). Given a number n, the Chebyshev func-
tion ϑ(n) is the 
sum of the logarithms of all the prime numbers smaller than
n—i.e., ϑ(n) = log p.
p∈Pn

The following number theory result relates the primorial to the Chebyshev func-
tion.
ϑ(n) = log(n#) = log 2(1+o(1))n = (1 + o(1))n (3)
Aside from rounding errors, the Chebyshev function captures the number of bits
required to represent the numbers in Pn . To 
obtain a more precise bound on this
number, we need a bound for the formula log p.
p∈Pn
We start by recalling the following fundamental number theory result.

Theorem 4 (Prime number theorem). The set P_n has size approximately
n/log n.

Using Theorem 4, we get the following result.

Σ_{p∈P_n} ⌈log p⌉ ≤ n/log n + Σ_{p∈P_n} log p ≈ (1 + o(1))n    (4)

Representing a number e^n in a classic binary encoding requires log2(e^n) =
(1 + o(1))n bits and, combining Equations 2 and 4, we get the following result.

Theorem 5. Representing a number 2^n in binary requires (1 + o(1))n bits under
both modular and integer semantics.

Hence, representing a number in binary requires the same number of bits in
both semantics.

Example 7. Consider the set P_18 = {2, 3, 5, 7, 11, 13, 17}, which can model an
interval of N = 510,510 integers (i.e., n = 18 in Theorem 5). Representing N in
binary requires 19 bits, while the binary representations of all the primes in P_18
together use 22 bits. Both numbers are close to 18, as predicted by the theorem.

7.2 Bit-complexity of Unary Encoding

As discussed in Sec. 3, the default Sketch solver encodes numbers using a unary
encoding—i.e., Sketch requires 2^n bits to encode the number 2^n. Representing
the same number in unary under the modular semantics requires only prime
numbers smaller than n and therefore Σ_{p∈P_n} p bits. We can then use the following
closed form to approximate this quantity.

Σ_{p∈P_n} p ∼ n²/(2 log n)    (5)

Equation 5 yields the following theorem.

Theorem 6. Representing a number 2^n in unary requires 2^n bits in the integer
semantics and approximately n²/(2 log n) bits in the modular semantics.

These results show that, under a unary encoding, the modular semantics is
exponentially more succinct than the integer semantics.

Example 8. Consider again the prime set P_18 = {2, 3, 5, 7, 11, 13, 17}, which can
model an interval of N = 510,510 integers. Representing N in unary requires
510,510 bits. On the other hand, the sum of the bits in the unary encodings of
the primes in P_18 is 58.
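A few lines of Python (ours) reproduce the counts in Examples 7 and 8:

    import math

    P18 = [2, 3, 5, 7, 11, 13, 17]
    N = math.prod(P18)

    print(N)                                          # 510510
    print(N.bit_length())                             # 19 bits in binary
    print(sum(math.ceil(math.log2(p)) for p in P18))  # 22 bits for the primes
    print(sum(P18))                                   # 58 unary bits vs. 510,510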

7.3 Number of Required Primes

We analyze how many primes are needed to represent a certain number in the
modular semantics. We start by introducing the following alternative version of
the primorial.

Definition 5 (Prime Primorial). For the n-th prime number p_n, the prime
primorial p_n# is defined as the product of the first n primes—i.e., p_n# = ∏_{i=1}^{n} p_i.

The following known number theory result gives us an approximation for the
prime primorial.

p_n# = e^{(1+o(1)) n log n}    (6)

Notice how the approximation of the primorial differs from that of the prime
primorial. This is due to the fact that prime numbers are sparse—i.e., the n-th
prime number is approximately n log n.

Using Equation 6 we obtain the following result.

Theorem 7. Representing numbers in an interval of size N = e^{n log n} in the
modular semantics requires the first n prime numbers.

Since the relation k = n log n does not admit a closed form for n, we cannot
derive exactly how many primes are needed to represent a number 2^k with k
bits. It is however clear from the theorem that the number of required primes
grows slower than k.

8 Evaluation

We implemented a prototype of our technique as a simple compiler in Java. Our


implementation provides a simplified Sketch frontend, which only allows the
limited syntax we support. Given a Sketch file, our tool rewrites it into a differ-
ent Sketch file that operates according to the modular semantics. We will use
Unary to denote the result obtained by running the default version of Sketch
with unary integer encoding on the original Sketch file, Binary to denote the
result obtained by running the version of Sketch using an SMT-like native in-
teger solver based on binary integer encoding, Unary-p to denote the result of
running the default Sketch version on our modified Sketch file, and Unary-
p-inc to denote the result of running the default version of Sketch on the file
generated by the incremental version of our algorithm described in Section 6. As
expected from our theory, the prime technique is not beneficial for the SMT-like
native integer solver and always results in worse runtime. Therefore, we do not
present data for this solver. All experiments were performed on a machine with
a 4.0GHz Intel Core i7 CPU and 16GB RAM, using Sketch-1.7.5 and a
timeout of 300 seconds (we also report out-of-memory errors as timeouts).
Our evaluation answers the following research questions:

Q1 How does the performance of Unary-p compare to Unary and Binary?


Q2 How does the incremental algorithm compare to the non-incremental one?
Q3 Is Unary-p’s performance sensitive to the set of selected prime numbers?
Q4 How many primes are needed by Unary-p to produce correct solutions?
Q5 Does Unary generate larger SAT queries than Unary-p?

8.1 Benchmarks

We perform our evaluation on three families of programs.


Polynomials The first set of benchmarks contains 81 variants of the polynomial
synthesis problem presented in Figure 1. The original version of this benchmark
appears in the Sketch benchmark suite under the name polynomial.sk. For
each benchmark, we generate a random polynomial f, random inputs {x⃗}, and
take the set {(x⃗, f(x⃗))} as specification. Each benchmark in this set has the
following parameters: #Ex ∈ {2, 4, 6} is the number of input-output examples in
the specification, cbits ∈ {5, 6, 7} denotes the number of bits hole values can use,
exIn ∈ {[−10, 10], [−30, 30], [−50, 50]} denotes the range of randomly generated
input examples, and coeff ∈ {[−10, 10], [−30, 30], [−50, 50]} denotes the range of
randomly generated coefficients in the polynomial f.
Invariants The second set of benchmarks contains 46 variants of two invariant-
generation problems obtained from a public set of programs that require poly-
nomial invariants to be verified [8]. We selected the two programs in which at
least one variable could be tracked modularly by our tool (the other programs
involved complex array operations or inequality operators) and turned the verifi-
cation problems into synthesis problems by asking Sketch to find a polynomial
equality (over the program variables) that is an invariant for the loop in the
program. To control the magnitudes of the inputs, we only require
the invariants to hold for a fixed set of input examples.

The first problem, mannadiv, iteratively computes the remainder and the
quotient of two numbers given as input. The invariant required to verify mannadiv
is a polynomial equality of degree 2 involving 5 variables. The Sketch template
required to describe the space of all polynomial equalities has 32 holes and can-
not be handled by any of the Sketch solvers we consider. We therefore simplify
the invariant synthesis problem in two ways. In the first variant, we reduce the
ranges of the hole values in the templates by considering cbits ∈ {2, 3}. In the
second variant, we set cbits ∈ {5, 6, 7}, but reduce the number of missing hole
values to 4 (i.e., we provide part of the invariant). Each benchmark takes two
random inputs and we consider the following input ranges: {[1, 50], [1, 100]}. In
total, we have 10 benchmarks for mannadiv.
The second problem, petter, iteratively computes the sum Σ_{1≤i≤n} i^5 for a
given input n. The invariant required to verify petter is a polynomial equality
of degree 6 involving 3 variables. The Sketch template required to describe all
such polynomial equalities has 56 holes and cannot be handled by any of the
Sketch solvers we consider. We consider the following simplified variants of the
problem: (i) petter_0 computes Σ_{1≤i≤n} 1 and requires a polynomial invariant
of degree one, (ii) petter_x computes Σ_{1≤i≤n} x for a given input variable x
and requires a polynomial invariant of degree two, (iii) petter_1 computes
Σ_{1≤i≤n} i and requires a polynomial invariant of degree two, and (iv) petter_10
computes Σ_{1≤i≤n} (i + 1) and requires a polynomial invariant of degree two. Each
benchmark takes two random inputs and we consider the following input ranges:
{[1, 10], [1, 100], [1, 1000]}. In total, we have 12 variants of petter, each run for
values of cbits ∈ {5, 6, 7}—i.e., a total of 36 benchmarks.
Program Repair The third set of benchmarks contains 54 variants of Sketch
problems from the domain of automatic feedback generation for introductory
programming assignments [7]. Each benchmark corresponds to an incorrect pro-
gram submitted by a student and the goal of the synthesizer is to find a small
variation of the program that behaves correctly on a set of test cases. We select
the 6/11 benchmarks from the tool Qlose [7] for which (i ) our implementation
can support all the features in the program, and (ii ) our data flow analysis
identifies at least one variable that can be tracked modularly. Of the remaining
benchmarks, 3/11 do not contain variables that can be tracked modularly, and
2/11 call auxiliary functions that cannot be translated into Sketch. For each

Table 1: Effectiveness of different solvers. SAT (resp. UNSAT) denotes the num-
ber of benchmarks for which the solver could find a solution (resp. prove that no
solution existed), while TO denotes the number of timeouts.

                            Polynomials        Invariants        Program repair
Solver        Solved     SAT  UNSAT  TO     SAT  UNSAT  TO     SAT  UNSAT  TO
Unary         69/181      12    4    65       5    0    41      48    0     6
Binary       127/181      70    6     5      17    0    29      34    0    20
Unary-p      169/181      73    5     3      41    2     3      48    0     6
Unary-p-inc  172/181      73    6     2      41    2     3      50    0     4

program, we consider the original problem and two variants where the integer
inputs are multiplied by 10 and 100, respectively. Further, for each program
variant, we impose an assertion specifying that the distance between the original
program and the repaired program is within a certain bound. We select three
different bounds for each program: the minimum cost c, c + 100, and c + 200.

8.2 Performance of Unary-p

Table 1 summarizes our comparison. First, we compare the performance of
Unary-p and Unary. We use P = {2, 3, 5, 7, 11, 13, 17}, which is enough for
Unary-p to always find correct solutions (we verify the correctness of a solution
by instantiating the hole values in the original sketch programs). Unary can only
solve 69/181 benchmarks while Unary-p can solve 169/181. Figure 7a shows a
scatter plot (log scale) of the solving times for the two techniques: each point
below the diagonal line denotes a benchmark on which Unary-p was faster than
Unary. Points on the extreme right-hand side of the plot denote timeout for
Unary. When both solvers terminate, Unary-p (avg. 1.7s) is 6.1X (geometric
mean) faster than Unary (avg. 25.0s).
Next, we compare the performance of Unary-p and Binary (Figure 7b). On
the 64 easier benchmarks that Binary can solve in less than 1 second, Binary
(avg. 0.55s) outperforms Unary-p (avg. 2.32s), but Unary-p still has reason-
able performance. On the 49 benchmarks that Binary can solve between 1 and
10 seconds, Unary-p (avg. 3.5s) is on average 1.9X faster than Binary (avg.
6.9s). Most interestingly, for the 14 harder benchmarks for which Binary takes
more than 10 seconds, Unary-p (avg. 5.7s) is on average 15.9X faster than Bi-
nary (avg. 90.9s). Remarkably, Unary-p solved 43 of the benchmarks (in less
than 8s each) for which Binary timed out⁴, and Unary-p only timed out for
two benchmarks that Binary could solve in less than a second and one bench-
mark that Binary could solve in 260s. Finally, we would like to highlight that
for 41/208 benchmarks, even Unary outperforms Binary. As expected from
the discussion throughout the paper, these are benchmarks typically involving
complex operations but not involving overly large numbers.

⁴ During our experiment, we observed that Binary incorrectly reported UNSAT for
10 satisfiable benchmarks. We reported these benchmarks as timeouts and have
contacted the authors of Sketch to address the issue.
We can now answer Q1. First, Unary-p consistently outperforms Unary
across all benchmarks. Second, Unary-p outperforms Binary on hard-to-
solve problems and can solve problems that Binary cannot solve—
e.g., Unary-p solved 28/46 invariant problems that Sketch could not solve.
Unary-p and Binary have similar performance on easy problems.
Comparison to full SMT encoding For completeness, we also compare our ap-
proach to a tool that uses SMT solvers to model the entire synthesis problem.
We choose the state-of-the-art SMT-based synthesizer Rosette [23] for our
comparison. Rosette is a programming language that encodes verification and
synthesis constraints written in a domain-specific language into SMT formulas
that can be solved using SMT solvers.
We only run Rosette on the set of Polynomials because, while Rosette does
support the theories of integers, it does not have native support for loops, so
there is no direct way to encode the Invariants and Program Repair benchmarks.
To our knowledge, Rosette provides a way to specify the number k it uses to
model integers and reals as k-bit words, but the user has no control over how
many bits it uses for unknown holes specifically. So we evaluate 27 instead of
81 variants of the polynomial synthesis problem on Rosette, i.e., we cannot
consider different numbers of cbits.

[Fig. 6: Rosette vs Binary. Log-scale scatter plot of solving times (ms).]
Figure 6 shows the running times (log scale) for Rosette and Binary with
cbits=6. Rosette successfully solved 16/27 benchmarks and it terminates
quickly (avg. 2.9s) when it can find a solution. However, Rosette times out
on 11 benchmarks for which Binary terminates. The timeouts are due to the
fact that Rosette employs full SMT encodings that combine multiple theories
while Binary uses a SAT solver that is only modified to accommodate SMT-like
integer constraints. Since full SMT encodings are neither as general nor as
efficient as the encodings used in Sketch, we will only evaluate the effectiveness
of our technique based on comparison with Binary.
Finally, we tried applying our prime-based technique to Rosette and, as
expected, the technique is not beneficial due to the binary encoding of numbers
in SMT, and causes all benchmarks to time out. To summarize, (i) SMT solvers
cannot efficiently handle the synthesis problems considered in this paper, and
(ii) our technique is better suited for SAT solvers than SMT solvers.

8.3 Performance of Incremental Solving


Our implementation of the incremental solver Unary-p-inc first attempts to
find a solution with the prime set P = {2, 3, 5, 7}. If the solver returns a correct
solution, Unary-p-inc terminates. Otherwise, Unary-p-inc incrementally adds
the next prime to P until it finds a correct solution, it proves there is no solution,
or it times out. Unary-p-inc is 25.2% (geometric mean) slower than Unary-p
(Figure 8 (log scale)). Unary-p-inc can solve three benchmarks for which both
Unary-p and Binary timed out. To answer Q2, Unary-p-inc and Unary-p
have similar performance.

[Fig. 7: Performance of Unary, Binary, and Unary-p. Log-scale scatter plots of
solving times (ms) on the Polynomials, Repair, and Invariants benchmarks:
(a) Unary vs Unary-p, (b) Binary vs Unary-p.]

[Fig. 8: Unary-p-inc vs Unary-p. Log-scale scatter plot of solving times (ms).]
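A minimal sketch of this incremental loop is shown below; solve_mod_primes and
verify stand in for the actual Sketch invocation and the correctness check by
instantiating hole values, and are assumptions of this illustration rather than
the tool's real interface.

    PRIMES = [2, 3, 5, 7, 11, 13, 17]

    def solve_incremental(sketch, solve_mod_primes, verify, timeout):
        """Try the prime set {2,3,5,7} first; if the returned solution is
        incorrect, grow the set one prime at a time until verification
        succeeds, UNSAT is proved, or the budget runs out."""
        k = 4  # start with P = {2, 3, 5, 7}
        while k <= len(PRIMES):
            result = solve_mod_primes(sketch, PRIMES[:k], timeout)
            if result is None:          # the solver proved no solution exists
                return None
            if verify(sketch, result):  # instantiate holes in the original sketch
                return result
            k += 1                      # solution overfits: add the next prime
        raise TimeoutError("no correct solution within the prime budget")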

8.4 Varying the Prime Number Set P


In this experiment, we evaluate how different prime number sets affect Unary-p.
We consider the 5 increasing sets of primes: P5 = {2, 3, 5}, P7 = {2, 3, 5, 7},
P11 = {2, 3, 5, 7, 11}, P13 = {2, 3, 5, 7, 11, 13}, and P17 = {2, 3, 5, 7, 11, 13, 17}.
Figure 9a (log scale) shows the running times for all the polynomial benchmarks
with cbits=7 (showing all benchmarks would clutter the plot). The points where
the lines change from dashed to solid denote the number of primes for which the
algorithm starts yielding correct solutions. As expected, a smaller set of primes
leads to faster solving times as the resulting constraints are smaller and fewer
bits are needed for encoding intermediate values. The runtime on average grows
with the increasing size of the primes. For example, across all benchmarks, using
P17 takes 23% longer on average than using P11 . To answer Q3, Unary-p is
slower when using increasingly large sets of primes.
In terms of correctness, we find that smaller prime sets often yield incorrect
solutions (P5: 37% correct, P7: 70%, P11: 86%, P13: 97%, and P17: 100%)
because there is not enough discriminative power with fewer primes and the
solutions may overfit to the smaller set of intermediate values. It is interesting
to note that even prime sets of intermediate size often lead to correct solutions
in practice, which explains some of the speedups observed in the incremental
synthesis algorithm. To answer Q4, Unary-p is able to synthesize correct
solutions even with intermediate-sized sets of primes.

[Fig. 9: Performance for different sets of prime numbers. Log-scale solving
times (ms): (a) larger sets of primes, (b) larger primes.]
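The overfitting effect is easy to reproduce: two candidates whose outputs differ
by a multiple of the product of the primes are indistinguishable modulo that
prime set. A small illustration (our own, not one of the paper's benchmarks):

    primes_small = [2, 3, 5]                   # product 30
    primes_large = [2, 3, 5, 7, 11, 13, 17]    # product 510510

    f = lambda x: x + 30   # a candidate that overfits the small prime set
    g = lambda x: x        # the intended solution

    x = 4  # on any input, f and g differ by 30, a multiple of 2*3*5
    print(all((f(x) - g(x)) % p == 0 for p in primes_small))  # True
    print(all((f(x) - g(x)) % p == 0 for p in primes_large))  # False: 7 separates them

Once the product of the primes exceeds the range of the values involved,
agreement modulo every prime implies integer equality, by the Chinese
Remainder Theorem.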
Changing Magnitude of Primes We also evaluate the performance of Unary-
p when using primes of different magnitudes. We consider the sets of primes
{11, 17, 19, 23}, {31, 41, 47}, and {251, 263}, which define similar integer ranges,
but pose different trade-offs between the number of used primes and their sizes—
e.g., the set {251, 263} only uses two very large primes. Since the different sets
cover similar integer ranges, they all produce correct solutions. Figure 9b (log
scale) shows the running time of Unary-p for the same benchmarks as Figure 9a.
Larger prime sets of smaller prime values require less time to solve than smaller
prime sets of larger prime values. This result is expected since, in the unary
encoding of numbers, representing larger numbers requires more bits.

8.5 Size of SAT Formulas

In this experiment, we compare the sizes of the intermediate SAT formulas gen-
erated by Unary-p and Unary. Figure 10a shows a scatter plot (log scale) of
the number of clauses of the largest intermediate SAT query generated by the
CEGIS algorithm for the two techniques. We only plot the instances in which
Unary was able to produce at least one SAT formula. Unary produces SAT for-
mulas that are on average 19.3X larger than those produced by Unary-p. To
answer Q5, as predicted by our theory, Unary-p produces significantly
smaller SAT queries than Unary.
Performance vs Size of SAT Queries We also evaluate the correlation between
synthesis time and size of SAT queries. Figure 10b plots the synthesis times of
both solvers against the sizes of the SAT queries. It is clear that the synthesis
time increases with larger SAT queries. The plot illustrates how the solving time
strongly depends on the size of the generated formulas.

[Fig. 10: SAT formula sizes and performance. Log-scale plots over the
Polynomials, Repair, and Invariants benchmarks: (a) size of the largest SAT
formula, Unary-p vs Unary; (b) solving time (ms) against formula size for
Unary and Unary-p.]

9 Related Work

Program Sketching Program sketching was designed to automatically synthesize
efficient bit-vector manipulations from inefficient iterative implementations [21].
The Sketch tool has since been engineered to support complex language fea-
tures and operations [19]. Thanks to its simplicity, sketching has found wide
adoption in applications such as optimizing database queries [3], automated
feedback generation [18], program repair [7], and many others. Our work further
extends the capabilities of Sketch in a new direction by leveraging number
theory results. In particular, our technique allows Sketch to handle sketches
manipulating large integer numbers. To the best of our knowledge, our technique
is the first one that can solve many of the benchmarks presented in this paper.
Uses of Chinese Remainder Theorem The Chinese Remainder Theorem and its
derivative corollaries have found wide application in several branches of Com-
puter Science and, in particular, in Cryptography [11, 26].
The idea of using modular arithmetic to abstract integer values has been
used in program analysis. Since modular fields are finite, they can be used as
an abstract domain for verifying programs manipulating integers [5]—e.g., the
abstract domain can track whether a number is even or odd. Our work extends
this idea to the domain of program synthesis and requires us to solve several
challenges. First, when used for verifying programs, the modular abstraction is
used to overapproximate the set of possible values of the program and does not
need to be precise. In particular, Clarke et al. [5] allow program operations that
are in the IMP language but not in the IMP-MOD language and lose precision when
modeling such operations—e.g., when performing the assignment x = x/2 the
value of x mod 2 can be either 0 or 1. Such imprecision is fine in program analysis
since the abstraction is used to show that a program does not contain a bug—
i.e., even in the abstract domain, the program behaves fine. In our setting, the
problem is the opposite: we use the abstraction to simplify the synthesis problem
and provide a theory for when the modular and integer semantics are equivalent.
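The imprecision mentioned above is easy to check concretely; the following
two-line experiment (our own illustration) shows that x mod 2 does not
determine (x/2) mod 2:

    # Both 4 and 6 are even (x mod 2 = 0), yet their halves differ modulo 2,
    # so a domain tracking only x mod 2 must lose precision on x = x/2.
    for x in (4, 6):
        print(x % 2, (x // 2) % 2)   # prints 0 0 and 0 1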
Pruning Spaces in Program Synthesis Many techniques have been proposed to
prune the large search space of possible programs [14]. Enumerative synthesis tech-
niques [24, 12, 13, 17] enumerate programs in a search space and avoid enumer-
ating syntactically and semantically equivalent terms. Some synthesizers such
as Synquid [16] and Morpheus [10] use refinement types and first-order formu-
las over specifications of DSL constructs to refute inconsistent programs. Re-
cently, Wang et al. [25] proposed a technique based on abstraction refinement
for iteratively refining abstractions to construct synthesis problems of increasing
complexity for incremental search over a large space of programs.
Instead of pruning programs in the syntactic space, our technique uses mod-
ular arithmetic to prune the semantic space—i.e., the complexity of verifying the
correctness of the synthesized solution—while maintaining the syntactic space
of programs. Our approach is related to that of Tiwari et al. [22], who present a
technique for component-based synthesis using dual semantics—where syntactic
symbols in a language are provided two different semantics to capture differ-
ent requirements. Our technique is similar in the sense that we also provide an
additional semantics based on modular arithmetic. However, we formalize our
analysis based on number theory results and develop it in the context of general-
purpose Sketch programs that manipulate integer values, unlike Tiwari et al.’s
work that is developed for straight-line programs composed of components.
Synthesis for Large Integer Values Abate et al. propose a modification of the
Cegis algorithm for solving syntax-guided synthesis (SyGuS) problems with
large constants [1]. SyGuS differs from program sketching in how the synthesis
problem is posed and in the type of programs that can be modeled. In particular,
in SyGuS one can only describe programs representing SMT formulas and the
logical specification for the problem can only relate the input and output of the
program—i.e., there cannot be intermediate assertions within the program. The
problem setup and the solving algorithms proposed in this paper are orthogonal
to those of Abate et al. First, we focus on program sketching, which is orthog-
onal to SyGuS as sketching allows for richer and more generic program spaces
as well as richer specifications. While it is true that certain synthesis problems
can be expressed both as sketches and as SyGuS problems, this is not the case
for our benchmark programs, which use loops, arrays and non-linear integer
arithmetic, all of which are not supported by SyGuS. Second, our technique is
motivated by how Sketch encodes and solves program sketches through SAT
solving. While the traditional Sketch encoding can explode for large constants,
the same encoding allows Sketch to solve program sketches involving complex
arithmetic and complex programming constructs. The algorithm proposed by
Abate et al. iteratively builds SMT (not SAT) formulas that are required to
be in a decidable logical theory. Such an encoding only works for the restricted
programming models used in SyGuS problems.
References

1. A. Abate, C. David, P. Kesseli, D. Kroening, and E. Polgreen. Counterexample
guided inductive synthesis modulo theories. In CAV, Lecture Notes in Computer
Science. Springer, 2018.
2. R. Alur, R. Bodík, G. Juniwal, M. M. K. Martin, M. Raghothaman, S. A. Seshia,
R. Singh, A. Solar-Lezama, E. Torlak, and A. Udupa. Syntax-guided synthesis. In
Formal Methods in Computer-Aided Design, FMCAD 2013, Portland, OR, USA,
October 20-23, 2013, pages 1–8, 2013.
3. A. Cheung, A. Solar-Lezama, and S. Madden. Optimizing database-backed ap-
plications with query synthesis. In Proceedings of the 34th ACM SIGPLAN Con-
ference on Programming Language Design and Implementation, PLDI ’13, pages
3–14, 2013.
4. L. N. Childs, editor. The Chinese Remainder Theorem, pages 253–281. Springer
New York, New York, NY, 2009.
5. E. M. Clarke, O. Grumberg, and D. E. Long. Model checking and abstraction.
ACM Trans. Program. Lang. Syst., 16(5):1512–1542, Sept. 1994.
6. P. Cousot and R. Cousot. Abstract interpretation: A unified lattice model for static
analysis of programs by construction or approximation of fixpoints. In Proceedings
of the 4th ACM SIGACT-SIGPLAN Symposium on Principles of Programming
Languages, POPL ’77, pages 238–252, New York, NY, USA, 1977. ACM.
7. L. D’Antoni, R. Samanta, and R. Singh. Qlose: Program repair with quantitative
objectives. In CAV (2), volume 9780 of Lecture Notes in Computer Science, pages
383–401. Springer, 2016.
8. S. de Oliveira, S. Bensalem, and V. Prevosto. Polynomial invariants by linear
algebra. In C. Artho, A. Legay, and D. Peled, editors, Automated Technology
for Verification and Analysis, pages 479–494, Cham, 2016. Springer International
Publishing.
9. P. Dusart. Estimates of ψ, ϑ for large values of x without the Riemann hypothesis.
Math. Comput., 85(298):875–888, 2016.
10. Y. Feng, R. Martins, J. Van Geffen, I. Dillig, and S. Chaudhuri. Component-
based synthesis of table consolidation and transformation tasks from examples. In
Proceedings of the 38th ACM SIGPLAN Conference on Programming Language
Design and Implementation, PLDI 2017, pages 422–436, New York, NY, USA,
2017. ACM.
11. J. Großschädl. The Chinese remainder theorem and its application in a high-speed
RSA crypto chip. In Proceedings of the 16th Annual Computer Security Applications
Conference, ACSAC ’00, pages 384–, Washington, DC, USA, 2000. IEEE Computer
Society.
12. S. Gulwani. Automating string processing in spreadsheets using input-output ex-
amples. In Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Prin-
ciples of Programming Languages, POPL 2011, Austin, TX, USA, January 26-28,
2011, pages 317–330, 2011.
13. S. Gulwani, W. R. Harris, and R. Singh. Spreadsheet data manipulation using
examples. Commun. ACM, 55(8):97–105, 2012.
14. S. Gulwani, O. Polozov, and R. Singh. Program synthesis. Foundations and Trends
in Programming Languages, 4(1-2):1–119, 2017.
15. G. J. O. Jameson. The Prime Number Theorem. London Mathematical Society
Student Texts. Cambridge University Press, 2003.
16. N. Polikarpova, I. Kuraj, and A. Solar-Lezama. Program synthesis from polymor-
phic refinement types. In Proceedings of the 37th ACM SIGPLAN Conference on
Programming Language Design and Implementation, PLDI 2016, Santa Barbara,
CA, USA, June 13-17, 2016, pages 522–538, 2016.
17. R. Singh and S. Gulwani. Transforming spreadsheet data types using examples.
In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Prin-
ciples of Programming Languages, POPL 2016, St. Petersburg, FL, USA, January
20 - 22, 2016, pages 343–356, 2016.
18. R. Singh, S. Gulwani, and A. Solar-Lezama. Automated feedback generation for
introductory programming assignments. In ACM SIGPLAN Conference on Pro-
gramming Language Design and Implementation, PLDI ’13, Seattle, WA, USA,
June 16-19, 2013, pages 15–26, 2013.
19. A. Solar-Lezama. Program sketching. STTT, 15(5-6):475–495, 2013.
20. A. Solar-Lezama, C. G. Jones, and R. Bodík. Sketching concurrent data structures.
In Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language
Design and Implementation, Tucson, AZ, USA, June 7-13, 2008, pages 136–148,
2008.
21. A. Solar-Lezama, L. Tancau, R. Bodik, S. Seshia, and V. Saraswat. Combinatorial
sketching for finite programs. SIGOPS Oper. Syst. Rev., 40(5):404–415, Oct. 2006.
22. A. Tiwari, A. Gascón, and B. Dutertre. Program synthesis using dual interpre-
tation. In Automated Deduction - CADE-25 - 25th International Conference on
Automated Deduction, Berlin, Germany, August 1-7, 2015, Proceedings, pages 482–
497, 2015.
23. E. Torlak and R. Bodik. A lightweight symbolic virtual machine for solver-aided
host languages. In Proceedings of the 35th ACM SIGPLAN Conference on Pro-
gramming Language Design and Implementation, PLDI ’14, pages 530–541, New
York, NY, USA, 2014. ACM.
24. A. Udupa, A. Raghavan, J. V. Deshmukh, S. Mador-Haim, M. M. Martin, and
R. Alur. Transit: Specifying protocols with concolic snippets. In Proceedings of
the 34th ACM SIGPLAN Conference on Programming Language Design and Im-
plementation, PLDI ’13, pages 287–296, 2013.
25. X. Wang, I. Dillig, and R. Singh. Program synthesis using abstraction refinement.
PACMPL, 2(POPL):63:1–63:30, 2018.
26. S.-M. Yen, S. Kim, S. Lim, and S.-J. Moon. Rsa speedup with chinese remainder
theorem immune against hardware fault cryptanalysis. IEEE Trans. Comput.,
52(4):461–472, Apr. 2003.

Modular Relaxed Dependencies in Weak
Memory Concurrency

Marco Paviotti¹,², Simon Cooksey², Anouk Paradis³, Daniel Wright²,
Scott Owens², and Mark Batty²

¹ Imperial College London, United Kingdom
[email protected]
² University of Kent, Canterbury, United Kingdom
{m.paviotti, sjc205, daw29, S.A.Owens, M.J.Batty}@kent.ac.uk
³ ETH Zurich, Switzerland
[email protected]

Abstract. We present a denotational semantics for weak memory con-
currency that avoids thin-air reads, provides data-race free programs
with sequentially consistent semantics (DRF-SC), and supports a com-
positional refinement relation for validating optimisations. Our semantics
identifies false program dependencies that might be removed by compiler
optimisation, and leaves in place just the dependencies necessary to rule
out thin-air reads. We show that our dependency calculation can be
used to rule out thin-air reads in any axiomatic concurrency model, in
particular C++. We present a tool that automatically evaluates litmus
tests, show that we can augment C++ to fix the thin-air problem, and
we prove that our augmentation is compatible with the previously used
compilation mappings over key processor architectures. We argue that
our dependency calculation offers a practical route to fixing the long-
standing problem of thin-air reads in the C++ specification.

Keywords: Thin-air problem · Weak memory concurrency · Compiler
optimisations · Denotational semantics · Compositionality

1 Introduction
It has been a longstanding problem to define the semantics of programming
languages with shared memory concurrency in a way that does not allow un-
wanted behaviours – especially observing thin-air values [8,7] – and that does
not forbid compiler optimisations that are important in practice, as is the case
with Java and Hotspot [30,29]. Recent attempts [16,11,25,15] have abandoned
the style of axiomatic models, which is the de facto paradigm of industrial spec-
ification [8,2,6]. Axiomatic models comprise rules that allow or forbid individual
program executions. While it is impossible to solve all of the problems in an
⋆ This work was funded by EPSRC Grants EP/M017176/1, EP/R020566/1 and
EP/S028129/1, the Lloyds Register Foundation, and the Royal Academy of
Engineering.

axiomatic setting [7], abandoning it completely casts aside mature tools for au-
tomatic evaluation [3], automatic test generation [32], and model checking [23],
as well as the hard-won refinements embodied in existing specifications like C++,
where problems have been discovered and fixed [8,7,18]. Furthermore, the indus-
trial appetite for fundamental change is limited. In this paper we offer a solution
to the thin-air problem that integrates with existing axiomatic models.
The thin-air problem in C++ stems from a failure to account for dependen-
cies [22]: false dependencies are those that optimisation might remove, and real
dependencies must be left in place to forbid unwanted behaviour [7]. A single
execution is not sufficient to discern real and false dependencies. A key insight
from previous work [14,15] is that event structures [33,34] give us a simultane-
ous overview of all traces at once, allowing us to check whether a write is sure
to happen in every branch of execution. Unfortunately, previous work does not
integrate well with axiomatic models, nor lend itself to automatic evaluation.
To address this, we construct a denotational semantics in which the meaning
of an entire program is constructed by combining the meanings of its subcom-
ponents via a compositional function over the program text. This approach can
be particularly amenable to automatic evaluation, reasoning and compiler certi-
fication [19,24], and fits with the prevailing axiomatic approach.
This paper uses this denotational approach to capturing program dependen-
cies to explore the thin-air problem, resulting in a concrete proposal for fixing
the thin-air problem in the ISO standard for C++.

Contributions. There are two parts to the paper. In the first, we develop a deno-
tational model called “Modular Relaxed Dependencies model” (MRD) and build
metatheory around it. The model uses a relatively simple account of synchronisa-
tion, but it demonstrates separation between the calculation of dependency and
the enforcement of synchronisation. In the second, we evaluate the dependency
calculation by combining it with the fully-featured axiomatic models RC11 [18]
and IMM [26].
The denotational semantics has the following advantages:
1. It is the first thin-air solution to support fork/join (§2.2).
2. It satisfies the DRF-SC property for a compositional model (§5): programs
without data races behave according to sequential consistency.
3. It comes with a refinement relation that validates program transformations,
including the optimisation that makes Hotspot unsound for Java [30,29], and
a list of others from the Java Causality Tests [27] (§7).
4. It is shown to be equivalent to a global semantics that first performs a
dependency calculation and then applies an axiomatic model.
5. An example in Section 10 illustrates a case in which thin-air values are
observable in the current state-of-the-art models but forbidden in ours.
We adopt the dependency calculation from the global semantics of point 4 as
the basis of our C++ model, which we call MRD-C11. We establish the C++
DRF-SC property described in the standard [13] (§9.1) and we provide several
desirable properties for a solution to the thin-air problem in C++:
6. We show that our dependency calculation is the first that can be applied
to any axiomatic model, and in particular the RC11 and IMM models that
cover C++ concurrency (§8).
7. Our augmented IMM model, which we call MRD+IMM, is provably imple-
mentable over x86, Power, ARMv8, ARMv7 and RISC-V, with the compiler
mappings provided by the IMM [26] (§8.1).
8. These augmented models of C++ are the first that solve the thin-air problem
to have a tool that can automatically evaluate litmus tests (§11).

1.1 Modular Relaxed Dependency by example


To simplify things for now, we will attach an Init program to the beginning
of each example to initialise all global variables to zero. Doing this makes the
semantics non-compositional, but it is a natural starting place and aligns well
with previous work in the area. Later, after we have made all of our formal
definitions, we will see why the Init program is not necessary.
For now, consider a simple programming language where all values are booleans,
registers (ranging over r) are thread-local, and variables (ranging over x, y) are
global. Informally, an event structure for a program consists of a directed graph
of events. Events represent the global variable reads and writes that occur on all
possible paths that the program can take. This can be built up over the program
as follows: each write generates a single event, while each read generates two –
one for each possible value that could be read. These read events are put in
conflict with each other to indicate that they cannot both happen in a single
execution, this is indicated with a zig-zag red arrow between the two events.
Additionally, the event structure tracks true dependencies via an additional re-
lation which we call semantic dependencies (dp). These are yellow arrows from
read events to write events.
For example, consider the program
(r1 := x; y := r1 ) (LB1 )
that reads from a variable x and then writes the result to y. The interpretation
of this program is an event structure depicted as follows:

[Event structure for LB1: two conflicting reads, (1: R x 0) and (2: R x 1); each
is followed in program order by a write of the value read, (3: W y 0) resp.
(4: W y 1), with a dp edge from each read to its write.]

Each event has a unique identifier (the number attached to the box). The
straight black arrows represent program order, the curved yellow arrows indicate
a causal dependency between the reads and writes, and the red zigzag represents
a conflict between two events. If two events are in conflict, then their respective
continuations are in conflict too.
If we interpret the program Init; LB1 , as below, we get a program where
the Init event sets the variables to zero.
[Event structure for Init; LB1: an Init event (1) ordered before the conflicting
reads (2: R x 0) and (R x 1), each followed by its dependent write (3: W y 0)
resp. (W y 1); events {1, 2, 3} are highlighted, with an rf edge from 1 to 2.]

In the above event structure, we highlight events {1, 2, 3} to identify an exe-
cution. The green dotted arrow indicates that event 2 reads its value from event
1; we call this relation reads-from (rf). This execution is complete as all of its
reads read from a write and it is closed w.r.t. conflict-free program order.
We interpret the following program similarly,

(r2 := y; x := r2 ) (LB2 )

leading to a symmetrical event structure where the write to x is dependent on
the read from y.
The interpretation of Init; (LB1 ∥ LB2) gives the event structure where
(LB1 ) and (LB2 ) are simply placed alongside one another.

[Event structure for Init; (LB1 ∥ LB2): the Init event followed by the event
structures of LB1 and LB2 side by side; reads of x (events 2, 4) with dependent
writes to y (events 3, 5), and reads of y (events 6, 8) with dependent writes to
x (events 7, 9).]

The interpretation of parallel composition is the union of the event structures
from LB1 and LB2 without any additional conflict edges. When parallel compos-
ing the semantics of two programs, we add all rf-edges that satisfy a coherence
axiom. Here we present an axiom that provides desirable behaviour in this ex-
ample (Section 4 provides our model’s complete axioms).

(dp ∪ rf) is acyclic

The program Init; (LB1 ∥ LB2) allows executions of the following three
shapes.

[Three executions, all reading and writing 0: in each, the reads of x and y take
their value either from Init or from the other thread's write of 0.]
Note that in this example, we are not allowed to read the value 1 – reading
a value that does not appear in the program is one sort of thin-air behaviour, as
described by Batty et al. [7]. For example, the execution {1, 4, 5, 8, 9} does not
satisfy the coherence axiom as 4 −dp→ 5 −rf→ 8 −dp→ 9 −rf→ 4 forms a cycle.
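The coherence axiom is a plain acyclicity check, so it is mechanical to evaluate
on a candidate execution. Below is a small Python sketch (our own encoding of
this example, not the paper's tool) that rejects the execution {1, 4, 5, 8, 9}:

    def acyclic(edges):
        """DFS cycle check over a relation given as a set of (src, dst) pairs."""
        graph = {}
        for a, b in edges:
            graph.setdefault(a, []).append(b)
        WHITE, GREY, BLACK = 0, 1, 2
        colour = {}
        def visit(n):
            colour[n] = GREY
            for m in graph.get(n, []):
                c = colour.get(m, WHITE)
                if c == GREY or (c == WHITE and not visit(m)):
                    return False      # back edge: a cycle exists
            colour[n] = BLACK
            return True
        return all(visit(n) for n in list(graph) if colour.get(n, WHITE) == WHITE)

    # Execution {1, 4, 5, 8, 9} of Init; (LB1 || LB2):
    dp = {(4, 5), (8, 9)}
    rf = {(5, 8), (9, 4)}
    print(acyclic(dp | rf))  # False: 4 -dp-> 5 -rf-> 8 -dp-> 9 -rf-> 4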

We now substitute (LB2 ) with the following code snippet

r1 := y; x := 1 (LB3 )

where the value written to the variable x is a constant. Its generated event
structure is depicted as follows:

[Event structure for LB3: conflicting reads (a: R y 0) and (c: R y 1), each
followed by a write (b: W x 1) resp. (d: W x 1), with no dp edges.]

In this program, for each branch, we can reach a write of value 1 to location
x. Hence, this will happen no matter which branch is chosen: we say b and d
are independent writes and we draw no dependency edges from their preceding
reads.
Consider now the program (LB3 ) in parallel with LB1 introduced earlier in
this section. As usual, we interpret the Init program in sequence with
(LB1 ∥ LB3) as follows:

[Event structure for Init; (LB1 ∥ LB3): Init followed by the LB1 events (reads
of x with dependent writes to y) alongside the LB3 events (reads of y followed
by the independent writes (b: W x 1) and (d: W x 1)).]

The resulting event structure is very similar to that of (LB1 ∥ LB2), but the
executions permitted in this event structure are different. The dependency edges
calculated when adding the read are preserved, and now executions {1, 2, 3, a, b}
and {1, a, b, 4, 5} are allowed. However, this event structure also contains the
execution in which d is independent.
In the execution {d −rf→ 4 −dp→ 5 −rf→ c} there is no rf or dp edge between
d and c that can create a cycle, hence this is a valid complete execution in which
we can observe x = 1, y = 1. Note that the Init is irrelevant to the consistency
of this execution.

[Execution diagram: Init, (R x 1), (W y 1), (R y 1), (W x 1), with rf and dp
edges and no cycle.]

Modularity. It is worthwhile underlining the role that modularity plays here. In
order to compute the behaviour of (LB1 ∥ LB2) and (LB1 ∥ LB3) we did not
have to compute the behaviour of LB1 again. In fact, we computed the semantics
of LB1, LB2 and LB3 in isolation and then we observed the behaviour in parallel
composition.
Thin-air values. The program (LB1 ∥ LB3) is a standard example in the weak
memory literature called load buffering. In the program (LB1 ∥ LB2), if event 5
or 9 were allowed in a complete execution, that would be an undesirable thin-air
behaviour: there is no value 1 in the program text, nor does any operation in the
program compute the value 1. The program (LB1 ∥ LB3) is similar, but now
contains a write of value 1 in the program text, so this is no longer a thin-air
value. Note that the execution given for it is not sequentially consistent, but
nonetheless a weak memory model needs to allow it so that a compiler can, for
example, swap the order of the two commands in LB3, which are completely
independent of each other from its perspective.

2 Event Structures
Event structures will form the semantic domain of our denotational semantics
in Section 5. Our presentation follows the essential ideas of Winskel [33] and is
further influenced by the treatment of shared memory by Jeffrey and Riely [15].

2.1 Background
A partial order (E, ≤) is a set E equipped with a reflexive, transitive and an-
tisymmetric relation ≤. A well-founded partial order is a partial order that has
no infinite decreasing chains of the form · · · ≥ ei−1 ≥ ei ≥ ei+1 ≥ · · · .
A prime event structure is a triple (E, ≤, #). E is a set of events, ≤ is a
well-founded partial order on E and # is a conflict relation on E. # is binary,
symmetric and irreflexive such that, for all c, d, e ∈ E, if c#d ≤ e then c#e. We
write Con(E) for the set of conflict-free subsets of E, i.e. those subsets C ⊆ E
for which there is no c, d ∈ C such that c#d.
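These definitions are directly executable; the following small Python sketch
(our own, under the assumption that the supplied order is already reflexive and
transitive) captures conflict heredity and the Con(E) check, using the four-event
example discussed just below:

    from itertools import combinations

    class PrimeEventStructure:
        """Events with a partial order (a set of <=-pairs, assumed reflexive
        and transitive) and a symmetric, irreflexive conflict relation."""
        def __init__(self, events, order, conflict):
            self.events, self.order = set(events), set(order)
            self.conflict = set(conflict) | {(d, c) for (c, d) in conflict}
            # conflict heredity: c # d <= e implies c # e
            for (c, d) in list(self.conflict):
                for e in self.events:
                    if (d, e) in self.order:
                        self.conflict.add((c, e))
                        self.conflict.add((e, c))

        def conflict_free(self, subset):
            return all((c, d) not in self.conflict
                       for c, d in combinations(subset, 2))

    # 1 <= 2 <= 3 and 1 <= 4, with 2 # 4 (so 3 # 4 by heredity):
    es = PrimeEventStructure(
        events={1, 2, 3, 4},
        order={(a, a) for a in range(1, 5)} | {(1, 2), (2, 3), (1, 3), (1, 4)},
        conflict={(2, 4)})
    print(es.conflict_free({1, 2, 3}))  # True
    print(es.conflict_free({1, 3, 4}))  # False: 3 # 4 by heredity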

Notation. We use E to range over (prime/labelled/memory) event structures,
and also for the event set contained within, when there is no ambiguity.
A labelled event structure (E, ≤, #, λ), over a set of labels Σ, is a prime event
structure together with a function λ : E → Σ which assigns a label to an event.
We make events explicit using the notation {e : σ} for λ(e) = σ. We sometimes
avoid using names and just write the label σ when there is no risk of confusion.
Consider the labelled event structure formed by the set {1, 2, 3, 4}, where the
order relation is defined such that 1 ≤ 2 ≤ 3 and 1 ≤ 4, the conflict relation is
defined such that 2#4 and 3#4, and the labelling function is defined such that
λ(1) = (W x 0), λ(2) = (R x 0), λ(3) = (W y 1) and λ(4) = (R x 1). The event
structure is visualised below (we elide conflict edges that can be inferred from
order).

[Diagram: (1: W x 0) with conflicting successors (2: R x 0) and (4: R x 1), and
(3: W y 1) following 2.]
Given labelled event structures E1 and E2 define the product labelled event
structure E1 × E2 ≜ (E, ≤, #, λ). E is E1 ∪ E2 , assuming E1 and E2 to be
disjoint, ≤ is ≤1 ∪ ≤2 , # is #1 ∪ #2 and λ is λ1 ∪ λ2 .
The coproduct labelled event structure E1 + E2 is the same as the product,
except that the conflict relation # is #1 ∪ #2 ∪ (E1 × E2 ) ∪ (E2 × E1 ). We
can use a similar construction for the co-product of an infinite set of pairwise-
disjoint labelled event structures, indexed by I: we take infinite unions on the
underlying sets and relations, along with extra conflicts for every pair of indices.
Where the Ei are not disjoint, we can make them so by renaming with fresh
event identifiers. In particular, we will need the infinite coproduct Σi∈I E with
as many copies of E as the cardinality of the set I, and all the events between
each copy in conflict. Each of these copies will be referred to as E^i.
For a labelled event structure E0 and an event e, where e ∉ E0 , define the
prefix labelled event structure, e • E0 , as a labelled event structure (E, ≤, #, λ)
where E equals E0 ∪ {e}, ≤ equals ≤0 ∪ ({e} × E), and # equals #0 .

2.2 The fork-join event structure


Our language supports parallel composition nested under sequential composi-
tion, so we will need to model spawning threads and a subsequent wait for their
termination. To support this, we define the fork-join composition of two labelled
event structures, E1 ▷ E2 . First we define the leaves, ↓(E), as the ≤-maximal
elements of E. Let I be the set of maximal conflict-free subsets of ↓(E1 ). Intu-
itively, each event set in I corresponds to the last events⁴ of one way of executing
the concurrent threads in E1 . We then generate a fresh copy of E2 for each of
the executions: E3 = Σi∈I E2 .
Now E1 ▷ E2 ≜ (E, ≤, #, λ) such that E is E1 ∪ E3 , # is #1 ∪ #3 , λ is
λ1 ∪ λ3 , and ≤ is the transitive closure of

≤1 ∪ ≤3 ∪ ⋃i∈I {(e, e′) | e ∈ i ∧ e′ ∈ E2^i}

The set of events, E, is the set E1 plus all the elements from the copies of
E3 . The order, ≤, is constructed by linking every event in the copy E2^i with all
the events in the set i, plus the obvious order from E1 and the order in the local
copy E2^i . Finally, the conflict relation is the union of the conflict in E1 and E3 .

⁴ We assume that there are no infinite increasing ≤-chains in E1 .

3 Coherent event structure


The signature of labels, Σ, is defined as follows:

Σ = ({R, W} × X × V) + {L} + {U}


where (W x v) ∈ Σ and (R x v) ∈ Σ are the usual write and read operations
and L, U are the lock and unlock operations respectively.
A coherent event structure is a tuple (E, S, , ≤) where E is a labeled event
structure. S is a set of partial executions, where each execution is a tuple compris-
ing a maximal conflict-free set of events, together with an intra-thread reads-from
4
We assume that there are no infinite increasing -chains in E1 .
606 M. Paviotti et al.

relation rfi , an extra-thread reads-from rfe , a dependency relation dp, and a


partial order on lock/unlock events lk. The justification relation, , is a relation
between conflict-free sets and events. Finally, the preserved program order, ≤X ,
is a restriction of the program order, , for events on the same variable. ≤L is
the restriction of program order on events related in program order with locks
or unlocks. Finally, we define rf to be rfe ∪ rfi and ≤ to be ≤X ∪ ≤L . For a
partial execution, X ∈ S, we denote its components as lkX , rfX and dpX .
Justification, ⊢, collects dependency information in the program and is used
to calculate dpX . For a conflict-free set C and an event e, we say C justifies
e, or e depends on C, whenever C ⊢ e. We collect dependencies between events
modularly in order to identify the so-called independent writes, which will be
introduced shortly.
For a given partial execution, X, we define the order hbX as the reflexive
transitive closure of (≤ ∪ lkX ). A coherent event structure contains a data race
if there exists an execution X with two events on the same variable x, at least
one of which is a write, that are not ordered by hbX . A coherent event structure
is data-race-free if it does not contain any data race. A racy rfX -edge is one
where two events w and r are racy and w −rfe→X r. Note that rfi edges cannot
ever be racy. We now define a coherent partial execution.
Definition 1 (Coherent Partial Execution). A partial execution X is co-
herent if and only if:

1. (≤L ∪ lkX ∪ dpX ∪ rfeX ) is acyclic, and
2. if (w : W x v) −rf→X (r : R x v), then there is no (e : R x v′) or (e : W x v′)
such that w −hbX→ e −hbX→ r with v ≠ v′.

A complete execution X is an execution where all read events r have a write
w that they read from, i.e. w −rf→X r.

4 Weak memory model


Central to the model is the way it records program dependencies in ⊢ and dp.
Justification, ⊢, records the structure of those dependencies in the program that
may be influenced by further composition. As we shall see, composing programs
may add or remove dependencies from justification: for example, composing a
read may make later writes dependent, or the coproduct mechanism, introduced
shortly, may remove them. In some parts of the program, e.g. inside locked
regions, dependencies do not interact with the context. In this case, we freeze
the justifications, using them to calculate dp. Following a freeze, the justification
relation is redundant and can be forgotten – dp can be used to judge which
executions are coherent.

Freezing. Here we define a function freeze which takes a justification C ⊢ (w :
W x v) and gives the corresponding dependency relation (r : R x v) −dp→ (w :
W x v) iff r ∈ C. We lift freeze to a function on an event structure as follows:


freeze(E1 , S1 , ⊢1 , ≤1 ) ≜ (E1 , S, ∅, ≤1 )    (1)

where S contains all the executions

(X1 , lkX1 , (dpX1 ∪ dp), rfX1 )

where for each write wi ∈ X1 we choose a justification, so that C1 ⊢1 w1 , ..., Cn ⊢1
wn covers all writes in X1 , with dp defined as follows:

dp = ⋃i∈{1,··· ,n} freeze(Ci ⊢ wi )

X1 must be a coherent execution. We prove that for a coherent execution there
always exists a choice of write justifications that freeze into dependencies to
form a coherent execution.
We will illustrate freezing of the program,

r1 := x; r2 := t; if (r1 == 1 ∨ r2 == 1){y := 1}

whose event structure is as follows:

[Event structure: conflicting reads (1: R x 0) and (2: R x 1); under each, a
conflicting pair of reads (R t 0)/(R t 1); a write (W y 1) occurs on each branch
where r1 = 1 or r2 = 1.]

The rules later on in this section will provide us with justifications {(6 : R t 1)} ⊢
(9 : W y 1) and {(2 : R x 1)} ⊢ (9 : W y 1) (but not the independent justification
∅ ⊢ (9 : W y 1)). So in this program there are two minimal justifications of
(9 : W y 1). The result of freezing is to duplicate all partial executions for each
choice of write justifications. In this case, we get an execution containing 2 −dp→ 9
and another one containing 6 −dp→ 9.
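As a small executable illustration of freezing, the Python sketch below (our
own; it elides the coherence check on the resulting executions) enumerates one
frozen execution per choice of write justification:

    from itertools import product

    def freeze(execution_events, dp, justifications):
        """Choose one justification set per write and add dp-edges from each
        read in the chosen set to that write; yield one result per choice."""
        writes = sorted(justifications)
        for choice in product(*(justifications[w] for w in writes)):
            new_dp = set(dp)
            for w, C in zip(writes, choice):
                new_dp |= {(r, w) for r in C if r in execution_events}
            yield new_dp

    # Write 9 above has two minimal justifications, {2} and {6}:
    for dp in freeze({2, 6, 9}, set(), {9: [{2}, {6}]}):
        print(sorted(dp))   # [(2, 9)] in one copy, [(6, 9)] in the other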

4.1 Prepending single events

When prepending loads and stores, we model forwarding optimisations by up-
dating the justification relation: e.g. when prepending a write, (w : W x 0), to
an event structure where {(r : R x 0)} ⊢ w′, write forwarding satisfies the read
of the justification, leaving an independently justified write, w′.
[Diagram: a write (1: W x 0) followed by conflicting reads of x; below each,
further conflicting reads of x, leading to writes (W z 1).]

Forwarding is forbidden if there exists e in E such that w ≤ e ≤ r, as in the
example above. In this example we do not forward 1 to 6. The rules of this
section give us that {1, 3, 6} ⊢ 9: we have preserved program order over the
accesses of x, 1 ≤ 3 ≤ 6, and we do not forward across the intervening read 3.

Read Semantics We now define the semantics of read prepending as follows:

(r : R x v) • (E1 , S1 , ⊢1 , ≤1 ) ≜ ((r : R x v) • E1 , S, ⊢, ≤)    (2)

where the preserved program order ≤ is built straightforwardly out of ≤1 ,
ordering locks, unlocks and same-location accesses, and S is defined as the set
of all (X ∪ {r}, lkX , rfX , dpX ), where X is a partial execution of S1 , and ⊢ is
the smallest relation such that for all C ⊢1 e we have

(C ∪ {r}) \ LF ⊢ e

with LF being the “Load Forwarded” set of reads, i.e. the set of reads consecu-
tively following the matching prepended one:

LF = {(r′ : R x v) ∈ C | ∄e′. r ≤X e′ ≤X r′}

This allows for load forwarding optimisations, and coherence is satisfied by
construction.

Write Semantics The write semantics are then defined as follows:

(w : W x v) • (E1 , S1 , ⊢1 , ≤1 ) ≜ ((w : W x v) • E1 , S, ⊢, ≤)    (3)

where ≤ is built as in the read rule and S contains all coherent executions of
the form

(X ∪ {w}, lkX , (rfX ∪ rfi), dpX )

where X ∈ S1 , and w −rfi→ r for any set of matching reads r in E1 such that
condition (1.2) of coherence is satisfied. Adding rfi edges leaves condition (1.1)
satisfied.
The justification relation ⊢ is the smallest upward-closed relation such that
for all C ⊢1 e:

1. ∅ ⊢ w
2. (C \ SF) ∪ {w} ⊢ e if there exists e′ ∈ C s.t. w ≤X e′
3. C \ SF ⊢ e otherwise
with SF being the Store Forwarding set of reads, i.e. the set of reads that we
remove from the justification sets of later events because they match the write
we are prepending. This is defined as follows:

SF = {(r : R x v) | ∄e. w ≤X e ≤X r}

When prepending a write to an event structure, we add it to justifications
that contain a read of the same variable. Failing to do so would invalidate the
DRF-SC property. We provide an example in Section 6.3, but we first need to
complete the definition of the semantics; in particular, we need to explain how
writes are lifted, which is the subject of the next section (Section 4.2).

4.2 Coproduct semantics


The coproduct mechanism is responsible for making writes independent of prior
reads if they are sure to happen, regardless of the value read. It produces the
independent writes that enabled relaxed behaviour in the example in Section 1.
In the definition of coproduct we use an upward-closure of justification to
enable the lifting of more dependencies. Whenever C ⊢ e we define ↑(C) as the
set of upward-closed justification sets: D ⊢ e if C ⊢ e and D is a conflict-free,
lock-free set with C ⊆ D, such that for all e′ ∈ D, if e′′ is an event with e′′ ≤ e′,
then e′′ ∈ D.
Now we define the coproduct operation. If E1 is a labelled event structure of
the form (r1 : R x v1 ) • E1′ and, similarly, E2 is of the form (r2 : R x v2 ) • E2′ ,
the coproduct of event structures is defined as

(E1 , S1 , ⊢1 , ≤1 ) + (E2 , S2 , ⊢2 , ≤2 ) ≜ (E1 + E2 , S1 ∪ S2 , (⊢1 ∪ ⊢2 ∪ ⊢), ≤)

where whenever {r1 } ∪ C1 ⊢1 (w : W y v) and {r2 } ∪ C2 ⊢2 (w′ : W y v), then if
the following conditions hold, we have D ⊢ w and D′ ⊢ w′:
1. there exists a D ∈ ↑(C1 ) that is isomorphic to a D′ ∈ ↑(C2 ), that is, there
exists f : D → D′ that is a λ-preserving and ≤X -preserving bijection, and
2. there is no event e in D such that r1 ≤X e.
The example of Section 1 illustrates the application of condition (1) of co-
product. Recall the event structures of (LB1 ) and (LB3 ) respectively:

[Diagrams: (LB1 ) has conflicting reads of x with dp edges to the writes
(W y 0) and (W y 1); (LB3 ) has conflicting reads (a: R y 0) and (c: R y 1)
followed by the independent writes (b: W x 1) and (d: W x 1).]

In each case, the event structure is built as the coproduct of the conflicting
events. In (LB3 ), prior to applying coproduct we have {a} ⊢ b and {c} ⊢ d. The
writes have the same label for both read values so, taking C1 and C2 to be
empty, coproduct makes them independent, adding the independent writes
∅ ⊢ b and ∅ ⊢ d.
In contrast, the values of writes 3 and 5 differ in (LB1 ), so the coproduct has
{2} ⊢ 3 and {4} ⊢ 5. When ultimately frozen, the justifications of (LB1 ) will
produce the dependency edges (2, 3) and (4, 5) as described in Section 1.
As for condition (2), if there is an event in the justification set that is ordered
in ≤X with the respective top read, then the top read cannot be erased from the
justification; doing so would break the ≤X link.
When value sets contain more than two values, we use Σv∈V to denote a
simultaneous coproduct (rather than the infinite sum). More precisely, if we
coproduct the event structures E0 , E1 , · · · , En in a pairwise fashion as follows,

(· · · (E0 + E1 ) + · · · ) + En

we would get liftings that are undesirable. To see this, it suffices to consider the
program,

if (r==3){x := 2}{x := 1}

where the write to x of 1 is independent for a coproduct over values 1 and 2, but
not when considering the event structure following (R x 3).

4.3 Lock semantics


When prepending a lock, we order the lock before following events in ≤ and we
freeze the justifications into dependencies. By freezing, we prevent justifications
from events after the lock from interacting with newly appended events. This
disables optimisations across the lock, e.g. store and load forwarding.
We define the semantics of locks as follows,

(l : L) • (E1 , S1 , ⊢1 , ≤1 ) ≜ ((l : L) • E1 , S, ∅, ≤)    (4)

where ≤X remains unchanged, (E1 , S1′ , ∅, ≤1 ) = freeze(E1 , S1 , ⊢1 , ≤1 ), and
S contains all partial executions of the form

(X ∪ {l}, (lkX ∪ lk), dpX , rfX )

where X ∈ S1′ and the lock order lk is such that l −lk→ l′ for every lock or
unlock event l′ ∈ X. Finally, ≤L is ≤L1 extended with the lock ordered before
all events in E1 .
The semantics for the unlock is similar.

4.4 Parallel composition


We define the parallel semantics as follows. Note that this operation freezes the
constituent denotations before combining them, erasing their respective justifi-
cation relations. This choice prevents the optimisation of dependencies across
forks and it makes thread inlining optimisations unsound, as they are in the
Promising Semantics [16] and the Java memory model [21].

(E1 , S1 , ⊢1 , ≤1 ) × (E2 , S2 , ⊢2 , ≤2 ) ≜ (E1 × E2 , S, ∅, ≤1 ∪ ≤2 )

where S are all coherent partial executions of the form

(X1 ∪ X2 , (lkX1 ∪ lkX2 ∪ lk), (dpX1 ∪ dpX2 ), (rfX1 ∪ rfX2 ∪ rfe))

where X1 ∈ S1F , X2 ∈ S2F and

– freeze(E1 , S1 , ⊢1 , ≤1 ) = (E1 , S1F , ∅, ≤1 )
– freeze(E2 , S2 , ⊢2 , ≤2 ) = (E2 , S2F , ∅, ≤2 )

Furthermore, lk is constrained so that (lkX1 ∪ lkX2 ∪ lk) is a total order over
the lock/unlock operations such that no lock/unlock operation is introduced
between a lock and the next unlock on the same thread. Finally, we add all
(w : W x v) −rfe→ (r : R x v) edges such that the execution satisfies condition
(1.1) of coherence¹ and such that w belongs to X1 and r belongs to X2 , or vice
versa.

¹ Note that condition (1.2) does not need to be checked.

4.5 Join Semantics


We define the join composition as follows:

(E1 , S1 , ⊢1 , ≤1 ) ▷ (E2 , S2 , ⊢2 , ≤2 ) ≜ (E1 ▷ E2 , S, ⊢1 , ≤)    (5)

where ≤ is built as in the read rule and S are all executions of the form

(X1 ∪ X2 , (lkX1 ∪ lkX2 ∪ lk), (dpX1 ∪ dpX2 ), (rfX1 ∪ rfX2 ∪ rfi))

where X1 ∈ S1 and X2 ∈ S2 with X1 and X2 conflict-free. Lock order lk orders
all lock/unlock events of X1 before all lock/unlock events of X2 , and we add
w −rfi→ r whenever w ∈ X1 and r ∈ X2 such that the execution is still coherent.

5 Language and Semantics


We consider an imperative language that has sequential and parallel composition,
and mutable shared memory.

Definition 2 (Language).

B ::= M = M | B ∧ B | B ∨ B | ¬B        M ::= n | r
P ::= skip | r := x | x := M | P1 ; P2 | P1 ∥ P2 | if (B){P1 }{P2 }
    | while(B){P } | L | U

We have standard boolean expressions, B, and expressions, M , represented by
natural numbers, n, or registers, r. Finally we have the set of command state-
ments, P , where skip is the command that performs no action, r := x reads
from a global variable and stores the value in r, x := M computes the expression
M and stores its value to the global variable x, P1 ; P2 is sequential composition,
and P1 ∥ P2 is parallel composition. We have standard conditional statements,
while loops, locks and unlocks. Moreover, a program P is lock-well-formed⁵ if on
every thread, every lock is paired with a following unlock instruction and vice
versa, and there is no lock or unlock operation between pairs.
A register environment, R → V, is a function from the set of local registers, R,
to the set of values, V. A continuation is a function taking a register environment,
R → V, to an event structure, E. We write ∅ as a short-hand for λρ.∅, the
continuation returning the empty event structure.
We interpret the syntax defined above into the semantic domain defined in
Section 4. In Figure 1, we define · as a function which takes a step-index n,
a register environment ρ, and a continuation κ, and returns a coherent event
structure.
The interpretation function · is defined first by induction on the step-index
and then by induction on the syntax of the program. When n = 1 the inter-
pretation gives the empty event structure (undefined). Otherwise we proceed by
induction on the structure of the program. skip is just the continuation applied
to the environment. A read is interpreted as a set of conflicting read events for
each value v attached with a continuation applied to the environment where the
register is updated with v.
A write is interpreted as a write with a following continuation. We interpret
sequencing by interpreting the second program and passing it on to the interpre-
tation of the first as a continuation. Parallel composition is the interpretation
of the two programs with empty continuations passed to the × operator. The
conditional statement is interpreted as usual. For interpreting the while-loops
we use the induction hypothesis on the step-index [9].
When parallel composing two threads, we want to forbid any reordering with
events sequenced before or after the composition (as thread inlining would do).
To forbid this local reordering we surround this composition with two lock-unlock
pairs.

5.1 Compositionality
We define the language of contexts inductively in the standard way.
Definition 3 (Context).

C ::= [−] | P ; C | C; P | (C  P ) | (P  C)
| if (B){C}{P } | if (B){P }{C} | while(B){C}

In the base case, the context is a hole, denoted by [−]. The inductive cases follow
the structure of the program syntax. In particular, a context can be a program
P in sequence with a context, a context in sequence with a program P and so
on. For a context C we denote C[P ] by the inductively defined function on the
context C that substitutes the program P in every hole.
⁵ Jeffrey and Riely [15] adopt the same restriction. We conjecture that modelling
blocking locks [4] would not affect the DRF-SC property.
⟦P ⟧1 ρ κ = ∅
⟦skip⟧n ρ κ = κ(ρ)
⟦r := x⟧n ρ κ = Σv∈V ((R x v) • κ(ρ[r → v]))
⟦x := M ⟧n ρ κ = (W x ⟦M ⟧ρ ) • κ(ρ)
⟦L⟧n ρ κ = (L • E1 , ⊢1 )        where (E1 , ⊢1 ) = κ(ρ)
⟦U⟧n ρ κ = (U • E1 , ⊢1 )        where (E1 , ⊢1 ) = κ(ρ)
⟦P1 ; P2 ⟧n ρ κ = ⟦P1 ⟧n ρ (λρ′ .⟦P2 ⟧n ρ′ κ)
⟦P1 ∥ P2 ⟧n ρ κ = ⟦L; U⟧n ρ κ′
    where κ′ = λρ′ .((⟦P1 ⟧n ρ′ ∅) × (⟦P2 ⟧n ρ′ ∅)) ▷ (⟦L; U⟧n ρ′ κ)
⟦if (B){P1 }{P2 }⟧n ρ κ = ⟦P1 ⟧n ρ κ   if ⟦B⟧ρ = T
                          ⟦P2 ⟧n ρ κ   if ⟦B⟧ρ = F
⟦while(B){P }⟧n ρ κ = ⟦P ; while(B){P }⟧(n−1) ρ κ   if ⟦B⟧ρ = T
                      ⟦skip⟧n ρ κ                   if ⟦B⟧ρ = F

Fig. 1: Semantic interpretation
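To make the shape of Figure 1 concrete, here is a small Python sketch of the
step-indexed, continuation-passing interpretation for a fragment of the language.
It is our own simplification: it builds a branching tree of labelled events, uses a
Python predicate for B, elides parallel composition, locks, and the justification
machinery, and uses n = 0 (rather than n = 1) as the base case.

    # Programs as tuples: ("skip",), ("read", r, x), ("write", x, m),
    # ("seq", p1, p2), ("if", b, p1, p2), ("while", b, body)
    VALUES = (0, 1)  # a tiny value domain, as in the boolean examples

    def interp(p, n, rho, kappa):
        """Return a list of (label, subtree) children; distinct children
        of a read are in conflict."""
        if n == 0:
            return []                       # out of fuel: undefined
        tag = p[0]
        if tag == "skip":
            return kappa(rho)
        if tag == "read":                   # one conflicting branch per value
            _, r, x = p
            return [(("R", x, v), kappa({**rho, r: v})) for v in VALUES]
        if tag == "write":                  # evaluate M in rho, emit one event
            _, x, m = p
            v = rho[m] if isinstance(m, str) else m
            return [(("W", x, v), kappa(rho))]
        if tag == "seq":
            _, p1, p2 = p
            return interp(p1, n, rho, lambda rho2: interp(p2, n, rho2, kappa))
        if tag == "if":
            _, b, p1, p2 = p
            return interp(p1 if b(rho) else p2, n, rho, kappa)
        if tag == "while":
            _, b, body = p
            if b(rho):
                return interp(("seq", body, p), n - 1, rho, kappa)
            return interp(("skip",), n, rho, kappa)

    # LB1 from Section 1.1: r1 := x; y := r1
    lb1 = ("seq", ("read", "r1", "x"), ("write", "y", "r1"))
    print(interp(lb1, 10, {}, lambda rho: []))
    # [(('R','x',0), [(('W','y',0), [])]), (('R','x',1), [(('W','y',1), [])])]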

The following lemma shows that the semantics preserves context application.
This falls out from the fact that the semantic interpretation is compositional,
that is, we define every constructor in terms of its subcomponents.

Lemma 1 (Compositionality). For all programs P1 , P2 , if ⟦P1 ⟧ = ⟦P2 ⟧ then
for all contexts C, ⟦C[P1 ]⟧ = ⟦C[P2 ]⟧.

The proof is a straightforward induction on the context C and it follows from
the fact that the semantics is inductively defined on the program syntax. The
attentive reader may note that to prove ⟦P1 ⟧ = ⟦P2 ⟧ in the first place we have
to assume n, ρ and κ and prove ⟦P1 ⟧n ρ κ = ⟦P2 ⟧n ρ κ. It is customary however
in denotational semantics to have programs denoted by functions that are equal
if they are equal at all inputs [31].

5.2 Data Race Freedom

Data race freedom ensures that we forbid optimisations which could lead to
unexpected behaviour even in the absence of data races. We first define the
closed semantics for a program P . For all n, the semantics of P , namely P 
is Init(P )n λx.0 ∅ , where Init(P ) is the program that takes the global vari-
ables in P and initialises them to 0. We now establish that race-free programs
interpreted in the closed semantics have sequentially consistent behaviour.

DRF semantics. Rather than proving DRF-SC directly, we prove that race-free programs behave according to an intermediate semantics ⟦·⟧′. This semantics differs from ⟦·⟧ in only two ways: program order is used in the calculation of coherence instead of preserved program order, and no dependency edges are recorded (as these are subsumed by program order). More precisely, the semantics is calculated as in Figure 1, but we check that (rfe ∪ lk ∪ po) is acyclic.
Note that race-free executions of the intermediate semantics ⟦·⟧′ satisfy the constraints of the model of Boehm and Adve [10], and the definition of race is the same between the two models. Boehm and Adve prove that in the absence of races, their model provides sequential consistency.
The DRF-SC theorem is stated as follows.

Theorem 1. For any program P, if ⟦P⟧ is data race free then every execution D in ⟦P⟧ is a sequentially consistent execution, i.e., D is in ⟦P⟧′.
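The acyclicity side-condition is a plain graph property. The following self-contained sketch (ours, not the paper's tooling) checks it for relations given as edge lists, so that acyclic (rfe ++ lk ++ po) implements the consistency check of the intermediate semantics:

import qualified Data.Map.Strict as M
import qualified Data.Set as S

type Rel a = [(a, a)]

-- acyclic r: True iff r has no directed cycle. A naive DFS, exponential
-- in the worst case, but litmus-test-sized executions are tiny.
acyclic :: Ord a => Rel a -> Bool
acyclic r = all (go S.empty) (M.keys succs)
  where
    succs = M.fromListWith (++) [ (a, [b]) | (a, b) <- r ]
    go path v
      | v `S.member` path = False
      | otherwise =
          all (go (S.insert v path)) (M.findWithDefault [] v succs)

For example, acyclic [(1,2),(2,3)] is True while acyclic [(1,2),(2,1)] is False.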

6 Tests and Examples


In this section, four examples demonstrate aspects of the semantics: the first
recognises a false dependency, the second forbids unintended behaviour allowed
by Jeffrey and Riely [15], the third motivates the choice to add forwarded writes
to justification, and the last shows how we support an optimisation forbidden
by Java but performed by the Hotspot compiler.

6.1 LB+ctrl-double
In the first example, from Batty et al. [7], the compiler collapses conditionals to
transform P1 to P2 .
P1                             P2
r1 := x;
if (r1 == 1){
  y := 1                 −→    r1 := x;
} else {                       y := 1
  y := 1
}

[Event structure elided: conflicting reads R x 0 and R x 1, each followed by a write W y 1; the two writes are labelled b and d.]
Coproduct ensures that the denotations of P1 and P2 are identical, with the event structure above, together with the justifications of the writes b and d. From compositionality (Lemma 1) and equality of the denotations, we have equal behaviour of P1 and P2 in any context, and the optimisation is allowed.

6.2 Jeffrey and Riely’s TC7


The next test is Java TC7. The outcome where r1 , r2 and r3 all have value 1 is
forbidden by Jeffrey and Riely [15, Section 7], but allowed in the Java Causality
Test Cases [27].
T1            T2
r1 := z;      r3 := y;
r2 := x;   ∥  z := r3;          (TC7)
y := r2       x := 1

As noted by Jeffrey and Riely [15], the failure of this test “indicates a failure to
validate the reordering of independent reads”.

[Event structure for T1 elided: conflicting reads of z (events 1 and 6); under each, conflicting reads R x 0/R x 1 (events 2, 3 and 7, 8), each followed by the corresponding write to y (events 4, 5 and 9, 10).]
In the event structure of T1 above, the justification relation is constructed according to Section 5. In particular, the rule for prepending reads (equation (4.1)) gives us {1, 2} ⊢_T1 4 and {1, 3} ⊢_T1 5 on the left-hand side, and {6, 7} ⊢_T1 9 and {6, 8} ⊢_T1 10 on the right. When composing the left and right sides, the coproduct rule (Section 4.2) makes four independent links, namely, {2} ⊢_T1 4, {3} ⊢_T1 5, {7} ⊢_T1 9, and {8} ⊢_T1 10. This is because, at the top level, for both branches, we can choose a write with the same label that is dependent on the same reads (plus the top ones on z). More precisely, on the left-hand side C1 = {1, 2} is such that C1 ⊢_T1 4, and on the right-hand side C2 = {6, 7} is such that C2 ⊢_T1 9. When the top events, 1 and 6 respectively, are removed, these contexts become isomorphic (C1 \ {1} ≅ C2 \ {6}). Hence, {2} ⊢_T1 4 and {7} ⊢_T1 9, and {3} ⊢_T1 5 and {8} ⊢_T1 10.
Now consider the event structure for the thread T2. [Event structure for T2 elided: conflicting reads R y 0/R y 1 (events 11, 12), each followed by a write to z (13: W z 0, 14: W z 1) and a write W x 1 (15, 16).] Here we have two independent writes, namely (15 : W x 1) and (16 : W x 1), arising in the coproduct from justifications {11} ⊢_T2 (15 : W x 1) and {12} ⊢_T2 (16 : W x 1). Notice that by definition (3), we do not add the writes 13 and 14 to the justification sets of any W x 1, and because they write different values to z depending on the value of y, we have the dependencies {11} ⊢_T2 13 and {12} ⊢_T2 14.
When parallel composing, we connect the rf-edges that respect coherence. Thus we obtain the execution 16 →rf 8 →dp 10 →rf 12 →dp 14 →rf 6, which is coherent, allowing the outcome with r1, r2 and r3 all 1, as desired.

6.3 Adding writes to justifications

In the definition of prepending writes (equation (3), condition (2)) we state that, for any given justification, if there is an event in the justification set that is related via ≤X to the write we are prepending, then that write must be in the justification set as well.
To see why we made this choice, consider the following program,

x := 1;
r1 := y;
if (r1 == 0){                                 r3 := z;
  x := 0; r2 := x; if (r2 == 1){z := 1}   ∥   if (z == 1){y := 1}
} else {
  r3 := x; if (r3 == 1){z := 1}
}

and its associated event structure.

[Event structure elided: the write W x 1 at the top; conflicting reads of y; on the r1 = 0 branch, the write x := 0 (event 3) followed by reads of x (including event 7) and a write W z 1 (event 9); on the other branch, a read of x (event 5) and a write W z 1 (event 8); the right-hand thread contributes reads of z and a write W y 1.]

We focus on the interpretation of the left-hand side thread. In equation (3), because {7} ⊢ 9 and 3 ≤X 7, the event (3 : W x 0) gets inserted in the justification set, leading to the justification {3, 7} ⊢ 9. On the other branch, up until the coproduct of the read on y, we have {5} ⊢ 8. At this point, the justifications {7} ⊢ 9 and {5} ⊢ 8 are not lifted, because 9 requires 3 as well. Event 3 may not be removed because of the condition in the write prepending rule. Without this condition, 3 would not be necessary to justify 9, yielding the lifting of the link {5} ⊢ 8. This would also cause the execution 0 →rf 5 →dp 8 →rf 11 →dp 12 →rf 2 to be coherent, due to the lack of a dependency between 2 and 5.
This execution is not sequentially consistent, but under SC, the program is
race free. Without writes in justifications, the model would violate the DRF-SC
property described in Section 5.2.

6.4 Java memory model, Hotspot


Finally, we discuss redundant read-after-read elimination, an optimisation performed by the Hotspot compiler but forbidden by the Java memory model. It is the first optimisation in the following sequence from Ševčík and Aspinall [30, Figure 5], used to demonstrate that the Java memory model is too strict, and unsound with respect to the observable behaviour of Sun's Hotspot compiler.

T3                           T2             T1
r2 := y;
if (r2 == 1)                 r2 := y;       x := 1;
  {r3 := y; x := r3}   −→    x := 1;   −→   r2 := y;
else
  {x := 1}

Consider the event structures of the unoptimised T3 and optimised T1 .


[Event structures elided: for T3, conflicting reads of y, with a further read of y and the writes to x beneath; for T1, the write W x 1 followed by conflicting reads of y.]

The optimisation removes the apparently redundant pair of reads (4, 6), then reorders the now-independent write. This redundancy is represented in justification: when prepending the top read of y to the right-hand side of the event structure, the existing justification 6 ⊢ 7 is replaced by 3 ⊢ 7. When coproduct is applied, this matches with justification 1 ⊢ 2, leading to the independent writes 2 and 7. In a weak memory context, however, a parallel thread could write a value to y between the two reads, thereby changing the value written to x. For this reason, we keep event 4 in the denotation and create the dependency edge 4 →dp 5.

Despite exhibiting the same behaviour here, the denotations of T3 and T2 do not match. We establish that the optimisation is sound in any context in the next section.

7 Refinement

We have shown in Section 5.1 that our semantics enjoys a compositionality property: if we can prove that two programs have the same semantics (w.r.t. set-theoretical equality) then they cannot be distinguished by any context. We also explained how equality is too strict, as it does not allow us to relate all programs that ought to be deemed semantically equivalent. Our Java Hotspot compiler example in Section 6 shows that the program T3 is in practice optimised to T2 and then to T1. However, it is clearly not true that ⟦T1⟧_n ρ κ is a subset of ⟦T2⟧_n ρ κ.
In this section we present a coarser-grained relation, which we call refinement (⊑). This relation permits the optimisations we want, but remains sound w.r.t. the intuitive notion of observational equivalence, and is closed under context application in the same way as equality.

To show soundness we define observational refinement (⊑Obs), which captures the intuitive notion of program equivalence: one program is a permissible optimisation of another if it does not increase the set of observable behaviours, defined here as changes to values of observed variables. The definition identifies related executions and compares the ordering of observable events, recognising that adding happens-before edges restricts behaviour. We then define a refinement relation and show this relation is a subset of observational refinement. This is formally stated in the following lemma:

Lemma 2 (Soundness of Refinement (⊑ ⊆ ⊑Obs)). For all P1 and P2, if ⟦P1⟧ᵀ_n ρ ∅ ⊑ ⟦P2⟧ᵀ_n ρ ∅ then ⟦P1⟧ᵀ_n ρ ∅ ⊑Obs ⟦P2⟧ᵀ_n ρ ∅.

Note that the refinement relation is defined over a tweaked version of the semantics, ⟦·⟧ᵀ, a variant of ⟦·⟧ in which the registers are explicit in the event structure.
Finally we show that ⊑ is compositional:

Theorem 2 (Compositionality of Refinement (⊑)). For all programs P1 and P2, and indexes n, if for all ρ, ⟦P1⟧ᵀ_n ρ ∅ ⊑ ⟦P2⟧ᵀ_n ρ ∅, then for all contexts C, ρ, κ and κ′ such that κ ⊑ κ′, we have that ⟦C[P1]⟧ᵀ_n ρ κ ⊑ ⟦C[P2]⟧ᵀ_n ρ κ′.

8 Showing implementability via IMM

In this section we show that our calculation of relaxed dependencies can easily be reused to solve the thin-air problem in other state-of-the-art axiomatic models, carrying the advantages of those models over to ours. In particular, we augment the IMM and RC11 models of Podkopaev et al. [26]. We adopt their language, given below. It covers C++ atomics, fences, fetch-and-add and compare-and-swap operations, but excludes locks. Note that locks are implementable using compare-and-swap operations.

M ::= n | r
B ::= M = M | B ∧ B | B ∨ B | ¬B
T ::= skip | r :=oR x | x :=oW M | T1 ; T2
    | if (B){P1}{P2} | while(B){P}
    | fence oF | r := FADD^{oR,oW}_{oRMW}(x, M)
    | CAS^{oR,oW}_{oRMW}(x, M, M)
P ::= T1 ∥ · · · ∥ Tn
oR ::= rlx | acq
oW ::= rlx | rel
oF ::= acq | rel | acqrel | sc
oRMW ::= normal | strong
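As a sketch, this grammar transcribes directly into an algebraic datatype (the constructor names are ours):

data OR   = RRlx | RAcq                      -- oR
data OW   = WRlx | WRel                      -- oW
data OF   = FAcq | FRel | FAcqRel | FSc      -- oF
data ORMW = Normal | Strong                  -- oRMW

data M = Num Int | Reg String
data B = Eq M M | And B B | Or B B | Not B

newtype P = Threads [T]                      -- T1 || ... || Tn

data T = TSkip
       | Load  OR String String              -- r :=oR x
       | Store OW String M                   -- x :=oW M
       | TSeq T T
       | TIf B P P                           -- if (B){P1}{P2}
       | TWhile B P                          -- while(B){P}
       | Fence OF                            -- fence oF
       | FAdd ORMW OR OW String String M     -- r := FADD(x, M)
       | Cas  ORMW OR OW String M M          -- CAS(x, M, M)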
First we provide a model, written (for a program P) as ⟦P⟧_{MRD+IMM}, that combines our relaxed dependencies with the axiomatic model of IMM, here written as ⟦P⟧_{IMM}. We will make these definitions precise shortly. We then show that ⟦P⟧_{MRD+IMM} is weaker than ⟦P⟧_{IMM}, making MRD+IMM implementable over hardware architectures like x86-TSO, ARMv7, ARMv8 and Power. Secondly, we relax the RC11 axiomatic model by using our relaxed dependency model MRD to create a new model ⟦P⟧_{MRD-C11}, and show this model weaker than the RC11 model. We argue that the mathematical description of ⟦P⟧_{MRD-C11} is lightweight and close to the C++ standard; it would therefore require minimal work to augment the standard with the ideas presented in this paper.
To prove implementability over hardware architectures, we define a pre-execution semantics, where the relaxed dependency relation dp is calculated along with the data and control dependencies from IMM. To combine our model with IMM, we redefine the ar relation (we refer the reader to the IMM paper [26] for the details on ar) such that it is parametrised by an arbitrary relation which we put in place of the relation (data ∪ ctrl): ar(data ∪ ctrl) equals the original axiom ar, and ar(dp) is the same axiom with dp in place of data ∪ ctrl.
We define the executions in ⟦P⟧_{MRD+IMM} as the maximal conflict-free sets such that ar(dp) is acyclic, and executions in ⟦P⟧_{IMM} as the maximal conflict-free sets such that ar(data ∪ ctrl) is acyclic.
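Schematically, the parametrisation is just a higher-order definition; the names below are invented for illustration, with fixedAr abstracting the unchanged components of ar:

type Rel a = [(a, a)]

-- arWith r: the ar relation with r plugged in where IMM
-- uses (data ∪ ctrl).
arWith :: Rel a -> Rel a -> Rel a
arWith fixedAr r = fixedAr ++ r

-- With the acyclic check from the Section 5.2 sketch:
--   IMM consistency:      acyclic (arWith fixedAr (dataR ++ ctrlR))
--   MRD+IMM consistency:  acyclic (arWith fixedAr dp)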

8.1 Implementability

We can now state and prove that the MRD model is implementable over IMM,
which gives us that MRD is implementable over x86-TSO, ARMv7, ARMv8,
Power and RISC-V by combining our result with the implementability result of
IMM .

Theorem 3 (MRD+IMM is weaker than IMM). For all programs P,

⟦P⟧_{MRD+IMM} ⊇ ⟦P⟧_{IMM}

9 Modular Relaxed Dependencies in RC11: MRD-C11

We refer to the RC11 [18] model, as specified in Podkopaev et al. [26]. We call this model ⟦P⟧_{RC11}. While ⟦P⟧_{RC11} forbids thin-air executions, it is not weak enough: it forbids common compiler optimisations by imposing that (po ∪ rf) is acyclic. We relax this condition by similarly replacing po with our relaxed dependency relation dp, this time calculated on our preserved program order relation (≤). We call this model ⟦P⟧_{MRD-C11}. Mathematically, this is done by imposing that (dp ∪ rf) is acyclic.
At this point, we prove the following lemma:
Lemma 3 (Implementability of MRD-C11). For all programs P,

⟦P⟧_{MRD-C11} ⊇ ⟦P⟧_{RC11}

To show this it suffices to show that dp ⊆ po always holds. This is straightforward by induction on the structure of P, observing that the only place where dependencies go against po is when hoisting a write in the coproduct case. However, in the same construction we always preserve the dependencies coming from the different branches of the structure, which, by the inductive hypothesis, always agree with program order.
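The monotonicity underlying this is easy to state with the earlier sketches: when dp ⊆ po, any cycle in dp ∪ rf is already a cycle in po ∪ rf, so acyclicity of the latter gives acyclicity of the former.

-- With Rel and acyclic from the Section 5.2 sketch: subRel checks
-- edge-wise inclusion of one relation in another.
subRel :: Eq a => Rel a -> Rel a -> Bool
subRel small big = all (`elem` big) small

-- So: subRel dp po && acyclic (po ++ rf) implies acyclic (dp ++ rf).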

9.1 MRD-C11 is DRF-SC


We show that MRD-C11 validates the DRF-SC theorem of the C++ standard [13,
§6.8.2.1 paragraph 20].

Theorem 4 (MRD-C11 is DRF-SC). For a program P whose atomic accesses are all SC-ordered, if there are no SC-consistent executions with a race over non-atomics, then the outcomes of P under MRD-C11 coincide with those under SC.

Sketch proof. In the absence of races and relaxed atomics, the no-thin-air guarantee of RC11 is made redundant by the guarantee of happens-before acyclicity shared by RC11 and MRD-C11. The result follows from this observation, Lemma 3, and Theorem 4 of Lahav et al. [18].

10 On the Promising Semantics and weakestmo


In this section we present examples that differentiate the Promising Semantics
and weakestmo from our MRD and MRD-C11 models.
First, we show that MRD correctly forbids the out-of-thin-air behaviour in the
litmus test Coh-CYC from Chakraborty and Vafeiadis [11]. The test, given below,
differentiates Promising and weakestmo: only the latter avoids the outcome
r1 = 3, r2 = 2 and r3 = 1.
x := 1;
( x := 2;                     r2 := x; // 2
  r1 := x; // 3          ∥    r3 := y; // 1
  if (r1 != 2){y := 1}        if (r3 != 0){x := 3} )
MRD correctly forbids this outcome: it identifies a dependency on the left-
hand thread from the read of 3 from x to the write y := 1, and on the right-hand
thread from the read of 1 from y to the write x := 3. The desired outcome then
has a cycle in dependency and reads-from, and it is forbidden.
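Concretely, reusing the acyclic sketch from Section 5.2 and labelling the four interesting events ourselves, the cycle is immediate:

-- Rx3: read of 3 from x;  Wy1: write y := 1;
-- Ry1: read of 1 from y;  Wx3: write x := 3.
data Ev = Rx3 | Wy1 | Ry1 | Wx3 deriving (Eq, Ord, Show)

dp, rf :: [(Ev, Ev)]
dp = [(Rx3, Wy1), (Ry1, Wx3)]   -- the identified semantic dependencies
rf = [(Wy1, Ry1), (Wx3, Rx3)]   -- the candidate reads-from edges

-- acyclic (dp ++ rf) == False, so the outcome is ruled out.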
Chakraborty and Vafeiadis ascribe the behaviour to “a violation of coherence or a circular dependency”, and add to weakestmo specific machinery that checks for global coherence violations at each step of program execution. These global checks forbid the unwanted outcome.
The Promising Semantics, on the other hand, can make promises that are not
sensitive to coherence order, and therefore allows the above outcome erroneously.
In Coh-CYC, enforcing coherence ordering at each step in weakestmo was
enough to forbid the thin-air behaviour, but it is not adequate in all cases. The
example below features an outcome that Promising and weakestmo allow, and
that MRD-C11 and MRD forbid. It demonstrates that cycles in dependency can
arise without violating coherence in weakestmo.

z := 1  ∥  y := x  ∥  if (z != 0){x := 1}{r0 := y; x := r0; a := r0}



The program is an adaptation⁶ of a Java test, where the unwanted outcome represents a violation of type safety [20]. Observing the thin-air behaviour where a = 1 in the adaptation above is the analogue of the unwanted outcome in
the original test. If in the end a = 1, then the second branch of the conditional
in the rightmost thread must execute. It contains a read of 1 from y, and a
dependent write of x := 1. On the middle thread there is a read of 1 from x, and
a dependent write of y := 1. These dependencies form the archetypal thin-air
shape in the execution where a = 1. MRD correctly identifies these dependencies
and the outcome is prohibited due to its cycle in reads-from and dependency.
The a = 1 outcome is allowed in the Promising Semantics: a promise can be validated against the write of x := 1 in the true branch of the right-hand thread, and later switched to a validation with x := r0 from the false branch, ignoring the dependency on the read of y.
In the previous example, Coh-CYC, a stepwise global coherence check caused
weakestmo to forbid the unwanted behaviour allowed by Promising, but that
machinery does not apply here. weakestmo allows the unwanted outcome, and
we conjecture that this deficiency stems from the structure of the model. De-
pendencies are not represented as a relation at the level of the global axiomatic
constraint, so one cannot check that they are consistent with the dynamic exe-
cution of memory, as represented by the other relations. Adopting a coherence
check in the stepwise generation of the event structure mitigates this concern for
Coh-CYC, but not for the test above.
In contrast, MRD does represent dependencies as a relation, allowing us to
check consistency with the rf relation here. The axiom that requires acyclicity
of (dp ∪ rf) forbids the unwanted outcome, as desired.

11 Evaluating MRD-C11 with the MRD-er tool

MRD-C11 is the first weak memory model to solve the thin-air problem for C++
atomics that has a tool for automatically evaluating litmus tests. Our tool, MRD-
er, evaluates litmus tests under the base model, RC11 augmented with MRD, and
IMM augmented with MRD. It has been used to check the result of every litmus
test in this paper, together with many tests from the literature, including the
Java Causality Test cases [7,11,15,16,18,25,26,27].
When evaluating whether a particular execution is allowed for a given test, a
model that solves the thin-air problem must take other executions of the program
into account. For example, the semantics of Pichon-Pharabod et al., having
explored one execution path, may ultimately backtrack [25]. Jeffrey and Riely
phrase their semantics as a two player game where at each turn, the player
explores all forward executions of the program [15]. At each operational step, the
Promising Semantics [16] has to run forwards in a limited local way to validate that promised writes will be reached. The invisible events of Chakraborty et al. [11] are used to similar effect.

⁶ James Riely, Alan Jeffrey and Radha Jagadeesan provided the precise example presented here [28]. It is based on Fig. 8 of Lochbihler [20], and its problematic execution under Promising was confirmed with the authors of Promising.
In MRD-C11, it is the calculation of justification that draws in information from other executions. This mechanism is localised: it avoids making choices about the execution that prune behaviours, and it does not require backtracking.
MRD-C11 acts in a “bottom-up” fashion, and modularity ensures that justifica-
tions drawn from the continuation need not be recalculated. These properties
have supported the development of MRD-er: automation of the model requires
only a single pass across the program text to construct the denotation.

12 Discussion

Four recent papers have presented models that forbid thin-air values and permit
previously challenging compiler optimisations. The key insight from these papers
is that it is necessary to consider multiple program executions simultaneously.
To do this, three of the four [15,25,11] use event structures, while the Promising
Semantics [16] is a small-step operational semantics that explores future traces
in order to take a step.
Although the Promising Semantics [16] is quite different from MRD, its mech-
anism for promising focuses on future writes, and MRD has parallels in its cal-
culation of independent writes. Note also that both Promising’s certification
mechanism and MRD’s lifting are thread-local.
The previous event-structure-based models are superficially similar to MRD,
but all have a fundamentally different approach from ours: Pichon-Pharabod and
Sewell [25] use event structures as the state of a rewriting system; Jeffrey and
Riely [14,15] build whole-program event structures and then use a global mech-
anism to determine which executions are allowed; and Chakraborty et al. [11]
transform an event structure using an operational semantics. In contrast, we fol-
low a more traditional approach [33] where our event structures are used as the
co-domain of a denotational semantics. Further, Jeffrey and Riely [14,15] and
Pichon-Pharabod and Sewell [25] do not cover a significant subset of C++ relaxed concurrency primitives.
MRD does not suffer from known problems with existing models. As noted
by Kang et al. [16], the Pichon-Pharabod and Sewell model produces behaviour
incompatible with the ARM architecture. The Jeffrey and Riely model forbids
the reordering of independent reads, as demonstrated by Java Causality Test 7
(see Section 6.2). The Promising semantics allows the cyclic coherence ordering
of the problematic Coh-CYC example [11]. weakestmo allows the thin-air out-
come in the Java-inspired test of Section 10. In all four cases MRD provides the
correct behaviour.
MRD is also highly compatible with the existing C++ standard text. The
dp relation generated by MRD can be used directly in the axiomatic model to
forbid thin-air behaviour. We are working on standards text with the ISO C++
committee based on this work, and have a current working paper with them [5].

The notion in C++ that data-race free programs should not exhibit observ-
able weak behaviours goes back to Adve and Hill [1], and formed the basis of
the original proposal for C++ [10]. This was formalised by Batty et al. [8] and
adopted into the ISO standard. Despite the pervasiveness of DRF-SC theorems
for weak memory models, these have remained whole-program theorems that
do not support breaking a program into separate DRF and racy components.
Our DRF theorem for our denotational model demonstrates a limited form of
modularity that merits further exploration.
Other denotational approaches to relaxed concurrency have not tackled the
thin-air problem. Dodds et al. [12] build a denotational model based on an
axiomatic model similar to C++. It forms the basis of a sound refinement relation
and is used to validate data-structures and optimisations. Their context language
is too restrictive to support a compositional semantics, and their compromise
to disallow thin-air executions forbids important optimisations. Kavanagh and
Brookes [17] provide a denotational account of TSO concurrency, but their model
is based on pomsets and suffers from the same limitation as axiomatic models [7]:
it cannot be made to recognise false dependencies.

Future Work. We envisage a generalised theorem that would, on augmentation with MRD, extend an axiomatic DRF-SC proof to a proof that applies to the
augmented model.
The ISO have struggled to define memory_order::consume [13]. It is intended
to provide ordering through dependencies that the compiler will not optimise
away. The semantic dependency relation calculated by MRD identifies just these
dependencies, and may support a better definition.
Finally, where we have used a global semantics to provide a full C++ model,
it would be interesting to extend the denotational semantics to also cover all of
C++, thereby allowing reasoning about C++ code in isolation from its context.

13 Conclusions

We have used the relatively recent insight that, to avoid thin-air problems, a semantics should consider some information about what might happen in other program executions. We codify that into a modular notion of justification, leading to a semantic notion of independent writes, and finally of dependency
(dp). We demonstrate the effectiveness of these concepts in three ways. One,
we define a denotational semantics for a weak memory model, show it supports
DRF-SC, and build a compositional refinement relation strong enough to verify
difficult optimisations. Two, we show how to use dp with other axiomatic models,
supporting the first optimal implementability proof for a thin-air solution via
IMM , and showing how to repair the ISO C++ model. Three, we build a tool
for executing litmus tests allowing us to check a large number of examples.

References

1. Adve, S.V., Hill, M.D.: Weak ordering — a new definition. In: ISCA (1990)
2. Alglave, J., Maranget, L., McKenney, P.E., Parri, A., Stern, A.: Frightening small
children and disconcerting grown-ups: Concurrency in the linux kernel. In: ASP-
LOS (2018)
3. Alglave, J., Maranget, L., Tautschnig, M.: Herding cats: modelling, simulation,
testing, and data-mining for weak memory. In: PLDI (2014)
4. Batty, M.: The C11 and C++11 Concurrency Model. Ph.D. thesis, University of
Cambridge, UK (2015)
5. Batty, M., Cooksey, S., Owens, S., Paradis, A., Paviotti, M., Wright, D.: Modular
Relaxed Dependencies: A new approach to the Out-Of-Thin-Air Problem (2019),
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1780r0.html
6. Batty, M., Donaldson, A.F., Wickerson, J.: Overhauling SC atomics in C11 and
OpenCL. In: POPL (2016)
7. Batty, M., Memarian, K., Nienhuis, K., Pichon-Pharabod, J., Sewell, P.: The prob-
lem of programming language concurrency semantics. In: ESOP (2015)
8. Batty, M., Owens, S., Sarkar, S., Sewell, P., Weber, T.: Mathematizing C++ con-
currency. In: POPL (2011)
9. Benton, N., Hur, C.: Step-indexing: The good, the bad and the ugly. In: Modelling, Controlling and Reasoning About State (2010)
10. Boehm, H.J., Adve, S.V.: Foundations of the C++ concurrency model. In: PLDI
(2008)
11. Chakraborty, S., Vafeiadis, V.: Grounding thin-air reads with event structures. In:
POPL (2019)
12. Dodds, M., Batty, M., Gotsman, A.: Compositional verification of compiler opti-
misations on relaxed memory. In: ESOP (2018)
13. ISO/IEC JTC 1/SC 22 Programming languages, their environments and system
software interfaces: ISO/IEC 14882:2017 Programming languages — C++ (2017)
14. Jeffrey, A., Riely, J.: On thin air reads towards an event structures model of relaxed
memory. In: LICS (2016)
15. Jeffrey, A., Riely, J.: On thin air reads: Towards an event structures model of
relaxed memory. Logical Methods in Computer Science 15(1) (2019)
16. Kang, J., Hur, C.K., Lahav, O., Vafeiadis, V., Dreyer, D.: A promising semantics
for relaxed-memory concurrency. In: POPL (2017)
17. Kavanagh, R., Brookes, S.: A denotational semantics for SPARC TSO. MFPS
(2018)
18. Lahav, O., Vafeiadis, V., Kang, J., Hur, C., Dreyer, D.: Repairing sequential con-
sistency in C/C++11. In: PLDI (2017)
19. Leroy, X., Grall, H.: Coinductive big-step operational semantics. Inf. Comput.
(2009)
20. Lochbihler, A.: Making the Java memory model safe. ACM Trans. Program. Lang.
Syst. (2013)
21. Manson, J., Pugh, W., Adve, S.V.: The Java Memory Model. In: POPL (2005)
22. McKenney, P.E., Jeffrey, A., Sezgin, A., Tye, T.: Out-of-Thin-Air Execution is Vacuous (2016), http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0422r0.html
23. Kokologiannakis, M., Raad, A., Vafeiadis, V.: Model checking for weakly consistent libraries. In: PLDI (2019)

24. Owens, S., Myreen, M.O., Kumar, R., Tan, Y.K.: Functional big-step semantics. In: ESOP (2016)
25. Pichon-Pharabod, J., Sewell, P.: A concurrency semantics for relaxed atomics that
permits optimisation and avoids thin-air executions. In: POPL (2016)
26. Podkopaev, A., Lahav, O., Vafeiadis, V.: Bridging the gap between programming
languages and hardware weak memory models. PACMPL (POPL) (2019)
27. Pugh, W.: Java causality tests. http://www.cs.umd.edu/~pugh/java/memoryModel/CausalityTestCases.html (2004), accessed 2018-11-17
28. Riely, J., Jagadeesan, R., Jeffrey, A.: private correspondence (2020)
29. Ševčík, J.: Program transformations in weak memory models. Ph.D. thesis, University of Edinburgh, UK (2009)
30. Ševčík, J., Aspinall, D.: On validity of program transformations in the Java memory model. In: ECOOP (2008)
31. Streicher, T.: Domain-theoretic foundations of functional programming (2006)
32. Wickerson, J., Batty, M., Sorensen, T., Constantinides, G.A.: Automatically com-
paring memory consistency models. In: POPL (2017)
33. Winskel, G.: Event structures. In: Petri Nets: Central Models and Their Properties,
Advances in Petri Nets 1986, Part II, Proceedings of an Advanced Course, Bad
Honnef, 8.-19. September 1986 (1986)
34. Winskel, G.: An introduction to event structures (1989)

ARMv8-A system semantics: instruction fetch in relaxed architectures

Ben Simner¹, Shaked Flur¹*, Christopher Pulte¹*, Alasdair Armstrong¹, Jean Pichon-Pharabod¹, Luc Maranget², and Peter Sewell¹

¹ University of Cambridge, UK
² INRIA Paris, France
* These authors contributed equally

Abstract. Computing relies on architecture specifications to decouple hardware and software development. Historically these have been prose
documents, with all the problems that entails, but research over the
last ten years has developed rigorous and executable-as-test-oracle spec-
ifications of mainstream architecture instruction sets and “user-mode”
concurrency, clarifying architectures and bringing them into the scope of
programming-language semantics and verification. However, the system
semantics, of instruction-fetch and cache maintenance, exceptions and
interrupts, and address translation, remains obscure, leaving us without
a solid foundation for verification of security-critical systems software.
In this paper we establish a robust model for one aspect of system se-
mantics: instruction fetch and cache maintenance for ARMv8-A. Sys-
tems code relies on executing instructions that were written by data
writes, e.g. in program loading, dynamic linking, JIT compilation, de-
bugging, and OS configuration, but hardware implementations are often
highly optimised, e.g. with instruction caches, linefill buffers, out-of-order
fetching, branch prediction, and instruction prefetching, which can affect
programmer-observable behaviour. It is essential, both for programming
and verification, to abstract from such microarchitectural details as much
as possible, but no more. We explore the key architecture design ques-
tions with a series of examples, discussed in detail with senior Arm staff;
capture the architectural intent in operational and axiomatic seman-
tic models, extending previous work on “user-mode” concurrency; make
these models executable as test oracles for small examples; and experi-
mentally validate them against hardware behaviour (finding a bug in one
hardware device). We thereby bring these subtle issues into the mathe-
matical domain, clarifying the architecture and enabling future work on
system software verification.

1 Introduction
Computing relies on the architectural abstraction: the specification of an en-
velope of allowed hardware behaviour that hardware implementations should
lie within, and that software should assume. These interfaces, defined by hard-
ware vendors and relatively stable over time, notionally decouple hardware and
software development; they are also, in principle, the foundation for software ver-
ification. In practice, however, industrial architectures have accumulated great
complexity and subtlety: the ARMv8-A and Intel architecture reference manuals
are now 7476 and 4922 pages [9,26], and hardware optimisations, including out-
of-order and speculative execution, result in surprising and poorly-understood
programmer-observable behaviour. Architecture specifications have historically
also been entirely informal, describing these complex envelopes of allowed be-
haviour solely in prose and pseudocode. This is problematic in many ways: they do not serve as clear documentation, with the inevitable ambiguity and incompleteness of informal prose leaving major questions unanswered;
that is executable as a test oracle (that can decide whether some observed be-
haviour is allowed or not), hardware validation relies on test suites that must be
manually curated; without an architecturally-complete emulator (that can ex-
hibit all allowed behaviour), it is very hard for software developers to “program to
the specification” – they rely on test-and-debug development, and can only test
above the hardware implementation(s) they have; and without a mathematically
rigorous semantics, formal verification of hardware or software is impossible.
Over the last 10 years, much has been done to put architecture specifications
on a more rigorous footing, so that a single specification can serve all those
purposes. There are three main problems, two of which are now largely solved.
The first is the instruction-set architecture (ISA): the specification of the
sequential behaviour of individual instructions. This is chiefly a problem of scale:
modern industrial architectures such as Arm or x86 have large instruction sets,
and each instruction involves many details, including its behaviour at different
privilege levels, virtual-to-physical address translation, and so on – a single Arm
instruction might involve hundreds of auxiliary functions. Recent work by Reid
et al. within Arm [40,41,42] transitioned their internal ISA description into a
mechanised form, used both for documentation and testing, and with him we
automatically translated this into publicly available Sail definitions and thence
into theorem-prover definitions [11,10]. Other related work is in §7.
The second is the relaxed-memory concurrent behaviour of “user-mode” op-
erations: memory writes and reads, and the mechanisms that architectures pro-
vide to enforce ordering and atomicity (dependencies, memory barriers, load-
linked/store-conditional operations, etc.). In 2008, for ARMv7, IBM POWER,
and x86, this was poorly understood, and the architects regarded even their own
prose specifications as inscrutable. Now, following extensive work by many peo-
ple [36,37,19,18,22,8,31,45,7,46,48,35,6,2,47,13,1], ARMv8-A has a well-defined
and simplified model as part of its specification [9, B2.3], including a prose
transcription of a mathematical model [15], and an equivalence proof between
operational and axiomatic presentations [36,37]; RISC-V has adopted a similar
model [52]; and IBM POWER and x86 have well-established de-facto-standard
models. All of these are experimentally validated against hardware, and sup-
ported by tools for exhaustively running tests [17,4]. The combination of these
models and the ISA semantics above is enough to let one reason about or model-
check concurrent algorithms.

That leaves the third part of the problem: the “system” semantics, of
instruction-fetch and cache maintenance, exceptions and interrupts, and ad-
dress translation and TLB (translation lookaside buffer) maintenance. Just as
for “user-mode” relaxed memory, these are all areas where microarchitectural op-
timisations can have surprising programmer-visible effects, especially in the con-
current context. The mechanisms are relied on by all code, but they are explicitly
managed only by systems code, in just-in-time (JIT) compilers, dynamic loaders,
operating-system (OS) kernels, and hypervisors. This is, of course, exactly the
security-critical computing base, currently trusted but not trustworthy, that is
especially in need of verification – which requires a precise and well-validated
definition of the architectural abstraction. Previous work has scarcely touched
on this: none of seL4 [27], CertiKOS [24,23], Komodo [16], or [25,12], address
realistic architecture concurrency, and they use (at best) idealised models of the
sequential systems architecture. The CakeML [51,28] and CompCert [29] verified
compilers target only sequential user-mode ISA fragments.
In this paper we focus on one aspect of system semantics: instruction fetch
and cache maintenance, for ARMv8-A. The ability to execute code that has
previously been written to data memory is fundamental to computing: fine-
grained self-modifying code is now rare, and (rightly) deprecated, but program
loading, dynamic linking, JIT compilation, debugging, and OS configuration all
rely on executing code from data writes. However, because these are relatively
infrequent operations, hardware designers have been able to optimise by partially
separating the instruction and data paths, e.g. with distinct instruction caching,
which by default may not be coherent with data accesses. This can introduce
programmer-visible behaviour analogous to that of user-mode relaxed-memory
concurrency, and require specific additional synchronisation to correctly pick up
code modifications. Exactly what these are is not entirely clear in the current
ARMv8-A architecture text, just as pre-2018 user-mode concurrency was not.
Our main contribution is to clarify this situation, developing precise abstrac-
tions that bring the instruction-fetch part of ARMv8-A system behaviour into
the domain of rigorous semantics. Arm have stated [private communication]
that they intend to incorporate a version of this into their architecture. We aim
thereby to enable future work on system software verification using the tech-
niques of programming languages research: program analysis, model-checking,
program logics, etc. We begin (§2) by recalling the informal architectural guar-
antees that Arm provide, and the ways in which real-world software systems
such as Linux, JavaScript, and WebAssembly change instruction memory. Then:
(1) We explore the fundamental phenomena and architecture de-
sign questions with a series of examples (§3). We explore the interactions
between instruction fetching, cache maintenance and the ‘usual’ relaxed mem-
ory stores and loads, showing that instruction fetches are more relaxed, and
how even fundamental coherence guarantees for data memory do not apply to
instruction fetches. Most of these questions arose during the development of our
models, in detailed ongoing discussion with the Arm Chief Architect and other
Arm staff. They include questions of several different kinds. Six are clear from
the Arm prose specification. Of the others: two are not implied by the prose but
are natural choices; five involved substantive new choices by Arm that had not
previously been considered and/or documented; for two, either choice could be
reasonable, and Arm chose the simpler (and weaker) option; and for one, Arm
were independently already strengthening the architecture to accommodate ex-
isting software.
(2) We give an operational semantics for Arm instruction fetch
and icache maintenance (§4). This is in an abstract-microarchitectural style
that supports an operational intuition for how hardware actually works, while
abstracting from the mass of detail and the microarchitectural variation of actual
hardware implementations. We do so by extending the Flat model [37] with
simple abstractions of instruction caches and the coherent data cache network,
in a way that captures the architectural intent, defining the entire envelope of
behaviours that implementations should be allowed to exhibit.
(3) We give a more concise presentation of the model in an ax-
iomatic style (§5), extending the “user-mode” axiomatic model from previous
work [37,36,15,9], and intended to be functionally equivalent. We discuss how
this too matches the architectural intent.
(4) We validate all this in two ways: by the extensive discussion with
Arm staff mentioned above, and by experimental testing of hardware behaviour,
on a selection of ARMv8-A cores designed by multiple vendors (§6). We run
tests on hardware with a mild extension of the Litmus tool [5,7]. We make the
operational model executable as a test oracle by integrating it into the RMEM
tool and its web interface [17], introducing optimisations that make it possible
to exhaustively execute the examples. We make the axiomatic model executable
as a test oracle with a new tool that takes litmus tests and uses a Sail [11]
definition of a fragment of the ARMv8-A ISA to generate SMT problems for the
model. We then compare hardware and the two models for the handwritten tests
(modulo two tests not supported by the axiomatic checker), compare hardware
and the operational model on a suite of 1456 tests, automatically generated
with an extension of the diy tool [3], and check the operational and axiomatic
models against sets of previous non-ifetch tests. In all this data our models are
equivalent to each other and consistent with hardware observations, except for
one case where our testing uncovered a hardware bug on a Qualcomm device.
Finally, we discuss other related work (§7) and conclude (§8). We do all this
for ARMv8-A, but other relaxed architectures, e.g. IBM POWER and RISC-V,
face similar issues; our tests and tooling should enable corresponding work there.
The models are too large to include or explain in full here, so we focus
on explaining the motivating examples, the main intuition and style of the
operational model, in a prose rendering of its executable mathematics, and
the definition of the axiomatic model. Appendices provide additional exam-
ples, a complete prose description of the operational model, and additional ex-
planation of the axiomatic model. The complete executable mathematics ver-
sion, the web-interface tool for running it, and our test results are at https://www.cl.cam.ac.uk/~pes20/iflat/.

Caveats and Limitations Our executable models are integrated with a substan-
tial fragment of the Sail ARMv8-A ISA (similar to that used for CakeML), but
not yet with the full ISA model [11,40,41,42]; this is just a matter of additional
engineering. We only handle the 64-bit AArch64 part of ARMv8-A, not AArch32.
We do not handle the interaction between instruction fetch and mixed-size ac-
cesses, or other variants of the cache maintenance instructions, e.g. those used for
interaction with DMA engines, and variants by set or way instead of by virtual
address. Finally, the equivalence between our operational and axiomatic models
is validated experimentally. A proof of this equivalence is essential in the long
term, but would be a major work in itself: the complexity makes mechanisation
essential, but the operational model (in all its scale and complexity) has not yet
been subject to mechanised proof. Without instruction fetch, a non-mechanised
proof was the main result of an entire PhD thesis [36], and we expect the addition
of instruction fetch to require global changes to the argument.

2 Industry Practice and the Existing ARMv8-A Prose

Computer architecture relies on a host of sophisticated techniques, including buffering, caching, prediction, and pipelining, for performance. For the normal
memory reads and writes of “user-mode” concurrency, the programmer-visible
relaxed-memory effects largely arise from store buffering and from out-of-order
and speculative pipeline behaviour, not from the cache hierarchy (though some
IBM POWER phenomena do arise from the interconnect, and from late process-
ing of cache invalidates). All major architectures provide a strong per-location
guarantee of coherence: for each memory location, different threads cannot ob-
serve the writes to that location in different orders. This is implemented in
hardware by coherent cache protocols, ensuring (roughly) that each cache line is
writable by at most one hardware thread at a time, and by additional machinery
restricting store buffer and pipeline behaviour. Then each architecture provides
additional synchronisation mechanisms to let the programmer enforce ordering
properties involving multiple locations.
At first sight, one might expect instruction fetches to act like other memory
reads but, because writes to instruction memory are relatively rare, hardware de-
signers have adopted different caching mechanisms. The Arm architecture care-
fully does not mandate exactly what these must be, to allow a wide range of
possible hardware implementations, but, for example, a high-performance Arm
processor might have per-core separate L1 instruction and data caches, above
a unified per-core L2 cache and an L3 cache shared between cores. There may
also be additional structures, e.g. per-core fetch queues, and caching of decoded
micro-operations. This instruction caching is not necessarily coherent with data
memory accesses: “the architecture does not require the hardware to ensure co-
herency between instruction caches and memory” [9, B2.4.4 (B2-114)]; instead,
programmers must use explicit cache maintenance instructions. The documenta-
tion gives a particular sequence of these: “If software requires coherency between
instruction execution and memory, it must manage this coherency using Context synchronization events and cache maintenance instructions. The following code sequence can be used to allow a processing element (PE) to execute code that the
same PE has written.”
; Coherency example for data and instruction accesses [...]
; Enter this code with <Wt> containing a new 32-bit instruction,
; to be held in Cacheable space at a location pointed to by Xn.
STR Wt, [Xn]; Store new instruction
DC CVAU, Xn ; Clean data cache by virtual address (VA) to PoU
DSB ISH ; Ensure visibility of the data cleaned from cache
IC IVAU, Xn ; Invalidate instruction cache by VA to PoU
DSB ISH ; Ensure completion of the invalidations
ISB ; Synchronize the fetched instruction stream

At first sight, this may be entirely mysterious. The remainder of the paper es-
tablishes precise semantics for each instruction, explaining why each is required,
but as a rough intuition:
1. The DC CVAU,Xn cleans this core’s data cache for address Xn, pushing the new
write far enough down the hierarchy for an instruction fetch that misses in
the instruction cache to be guaranteed to see the new value. This point is the
Point of Unification (PoU) and is usually the point where the instruction
and data caches become unified (L2 for most modern devices).
2. The DSB ISH waits for the clean to have happened before letting the later
instructions execute (without this, the sequence itself can execute out-of-
order, and the clean might not have pushed the write down far enough before
the instruction cache is updated). The ISH makes this specific to the Inner
Shareable Domain: the processor itself, not the system-on-chip. We do not
model shareability domains in this paper, so this is equivalent to a DSB SY.
3. The IC IVAU,Xn invalidates any entry for that address in the instruction
caches for all cores, forcing any future fetch to miss in the instruction cache,
and instead read the new value from the data memory hierarchy; it also
touches some fetch queue machinery.
4. The second DSB ISH ensures the invalidation completes.
5. The final ISB flushes this core’s pipeline, forcing a re-fetch of all program-
order-later instructions.
Some hardware implementations provide extra guarantees, rendering the DC or
IC instructions unnecessary. Arm allow software to discover this in an archi-
tectural way, by reading the CTR_EL0 register’s DIC and IDC bits. Our mod-
elling handles this, but for brevity we only discuss the weakest case, with
CTR_EL0.DIC=CTR_EL0.IDC=0, that requires full cache maintenance.
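For instance, such a discovery check might look as follows (a Haskell sketch; we assume the Arm ARM's bit positions, IDC at bit 28 and DIC at bit 29, for the value read from CTR_EL0):

import Data.Bits (testBit)
import Data.Word (Word64)

-- ctr: the value read from CTR_EL0 (e.g. via MRS). IDC set means the
-- DC step can be skipped; DIC set means the IC step can be skipped.
needsDC, needsIC :: Word64 -> Bool
needsDC ctr = not (testBit ctr 28)   -- IDC, bit 28 (assumed)
needsIC ctr = not (testBit ctr 29)   -- DIC, bit 29 (assumed)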
Arm make clear that instructions can be prefetched (perhaps speculatively):
“How far ahead of the current point of execution instructions are fetched from
is IMPLEMENTATION DEFINED. Such prefetching can be either a fixed or a
dynamically varying number of instructions, and can follow any or all possible
future execution paths. For all types of memory, the PE might have fetched the
instructions from memory at any time since the last Context synchronization
event on that PE.”

Concurrent modification and instruction fetch require the same sequence, with an ISB on each thread that executes the new instructions, and the rest of
the sequence on the modifying thread [9, B2.2.5 (B2-94)]. Concurrent modifica-
tion without synchronisation is restricted to particular instructions (B (branch), BL (branch-and-link), BRK (break), SMC, HVC, SVC (secure monitor, hypervisor, and supervisor calls), ISB, and NOP); otherwise there could be constrained unpredictable behaviour: “any behavior that can be achieved by executing any sequence of instructions that can be executed from the same Exception level”. Concurrent
modification of conditional branches is allowed but can result in the old condition
with the new target address or vice versa.
All this gives some guidance for programmers, but it leaves the exact seman-
tics of instruction fetch and those cache maintenance instructions unclear, and in
practice software typically does not use the above sequence verbatim. For exam-
ple, it may synchronise a range of addresses at once, looping the DC and IC parts,
or the final ISB may be subsumed by instruction synchronisation from exception
entry or return. Linux has many places where it modifies code at runtime: in
boot-time patching of alternatives, modifying kernel code to specialise it to the
particular hardware being run on; when the kernel loads code (e.g. when the user
calls dlopen); and in the ptrace system call, used e.g. by the GDB debugger to
patch arbitrary instructions with breakpoints at runtime. In Google’s Chrome
web browser, its WebAssembly and JavaScript just-in-time (JIT) compilers are
required to both write new code during execution and modify existing code at
runtime. In JavaScript, this modification happens inside a single thread and so is
quite straightforward. The WebAssembly case is more complex, as one thread is
modifying the code of another. A software thread can also be moved (by the OS
or hypervisor) from one hardware thread to another, perhaps while it is in the
middle of some instruction cache maintenance. Moreover, for security reasoning,
we have to be able to bound the possible behaviour of arbitrary code.
All this means that we cannot treat the above sequence as a whole, as an
opaque black box. Instead, we need a precise semantics for each individual in-
struction, but the existing prose documentation does not provide that.
The problem we face is to give such a semantics, that correctly defines be-
haviour in arbitrary concurrent contexts, that captures the Arm architectural
intent, that is strong enough for software, and that abstracts from the variety
of hardware implementations (e.g. with differing cache structures) that the ar-
chitecture intends to allow – but which programmers should not have to think
about.

3 Instruction Fetch Phenomena and Examples

We now describe the main instruction-fetch phenomena and architecture design questions for ARMv8-A, illustrated by handwritten litmus tests, to guide the
following model design.

3.1 Instruction-Fetch Atomicity


The first point, as mentioned in §2, is that concurrent modification and fetch
is only permitted if the original and modified instructions are in a particular
set: various branches, supervisor/hypervisor/secure-monitor calls, the ISB in-
struction synchronisation barrier, and NOP. Otherwise, the architecture permits
constrained unpredictable behaviour, meaning that the resulting machine state
could be anything that would be reachable by arbitrary instructions at the same
exception level. The following W+F test illustrates this.
W+F AArch64
Initial state: 0:W0="SUB X0,X0,#1", 0:X1=l
Thread 0 Thread 1
STR W0,[X1] // modify Thread 1 at l l: ADD X0,X0,#1 // initial code
Allowed: constrained-unpredictable final state

In this test Thread 0 performs a memory store (with the STR instruction)
to the code that Thread 1 is executing, overwriting the ADD X0,X0,#1 instruc-
tion with the 32-bit encoding of the SUB X0,X0,#1 instruction. If the fetch were
atomic, the outcome of this test would be the result of executing either the ADD
or the SUB instruction, but, since at least one of those is not in the set of the
8 atomically-fetchable instructions given previously, Thread 1 has constrained-
unpredictable behaviour and the final state is very loosely constrained. Note,
however, that this is nonetheless much stronger than the C/C++ whole-program
undefined behaviour in the presence of a data race: unlike C/C++, a hardware
architecture has to define a useful envelope of behaviour for arbitrary code, to
provide guarantees for the rest of the system when one user thread has a race.
Conditional Branches For conditional branches, the Arm architecture pro-
vides a specific non-single-copy-atomic fetch guarantee: the execution will be
consistent with either the old or new target, and either the old or new condition.
For example, this W+F+branches test can overwrite a B.EQ g with a B.NE h, and end up executing B.NE g or B.EQ h instead of one of those. Our future examples will only modify NOPs and unconditional branch instructions.

W+F+branches AArch64
Initial state: 0:W0="B.NE h", 0:X1=l
Thread 0          Thread 1
STR W0,[X1]       l: B.EQ g
Allowed: execute "B.NE g"

3.2 Coherence
Data writes and reads are coherent, in Arm and in other major architectures:
in any execution, for each address, the reads of each hardware thread must see
a subsequence of the total coherence order of all writes to that address. The
plain-data CoRR test [46] illustrates one case of this: it is forbidden for a thread
to read a new write of x and then the initial state for x. However, instruction
fetches are not necessarily coherent: one instruction fetch may be inconsistent
with a program-order-previous fetch, and the data and instruction streams can
become out-of-sync with each other. We explore three kinds of coherence:

– Instruction-to-Instruction Coherence: whether fetches of the same location


must observe writes to the same location coherently.
– Data-to-Instruction Coherence: whether fetches and then reads to the same
location must observe writes to the same location coherently.
– Instruction-to-Data Coherence: whether reads and then fetches of the same
location must observe writes to the same location coherently.

Instruction-to-Instruction Coherence Arm explicitly do not guarantee any consistency between fetches of the same location: fetching an instruction does
not mean that a later fetch of that location will not see an older instruction [9,
B2.4.4]. This is illustrated by CoFF, like CoRR but with fetches instead of reads.
CoFF AArch64
Initial state: 0:W0="B l1", 0:X1=f

Thread 0              Thread 1         Common
STR W0,[X1] //a       BL f             f:  B l0
                      MOV X0,X10       l1: MOV X10,#2
                      BL f                 RET
                      MOV X1,X10       l0: MOV X10,#1
                                           RET
Allowed: 1:X0=2, 1:X1=1

[Execution diagram elided: a: write f=B l1 →irf b: fetch f=B l1 →fpo c: fetch f=B l0, where c instruction-reads-from (irf) the initial write of B l0.]

Here Thread 1 makes two calls to address f (BL is branch-and-link), while Thread 0 overwrites the instruction at that address. The interesting potential
execution is that in which the first call to f fetches and executes the newly-
written B l1, but the second call fetches and executes the original B l0. We can
view such executions as graphs, similar to previous axiomatic-model candidate
executions but with new fetch events, one per instruction, and new edges. As
usual, we use po and rf edges for the program-order and reads-from relations,
together with:

– fe (fetch-to-execute), which relates the fetch event of an instruction to all the execution events (memory writes, reads or barriers) of the instruction;
– irf (instruction-read-from), relating a write to all fetches that read from it
(analogous to reads-from, rf); and
– fpo (fetch-program-order), relating fetches of instructions that are in pro-
gram order (analogous to program order, po).

Edges from the initial state are drawn from a small circle. Since we do not modify
the code of most locations, we usually omit the fetch events for those instructions,
showing only a subgraph of the interesting events, e.g. as in the CoFF diagram
above. For Arm, this execution is both architecturally allowed and experimentally observed.
Here, and in future tests, we assume some common code consisting of a
function at address f which always has the same shape: a branch that might
be overwritten, which selects a block that writes a value to register X10 before
returning. This is sometimes duplicated at different addresses (f1, f2, ...) or
extended to g, with three cases. We sometimes elide the common code.
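For concreteness, the common code at f, exactly as it appears in the CoFF test
above (later tests elide it):

f:  B l0           // the branch that may be overwritten
l1: MOV X10,#2
    RET
l0: MOV X10,#1
    RET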
Data-to-Instruction Coherence Fetching from a particular write does imply
that program-order-later reads from the same address will see that write (or a
coherence successor thereof). This is a data-to-instruction coherence property,
illustrated by CoFR below. Here Thread 1 fetches the newly-written B l1 at f
and then, when reading from f with its LDR load instruction, cannot read the
original B l0 instruction (it can only read the new B l1).
CoFR AArch64
Initial state: 0:W0="B l1", 0:X1=f, 1:X2=f
Thread 0      Thread 1      Common
STR W0,[X1]   BL f          f:  B l0
              MOV X0,X10    l1: MOV X10,#2
              LDR X1,[X2]       RET
                            l0: MOV X10,#1
                                RET
Forbidden: 1:X0=2, 1:X1="B l0"

Execution diagram: a:write f=B l1 -irf-> b:fetch f=B l1 -fpo-> c:fetch
LDR X1,[X2] -fe-> d:read f=B l0; initial write of f -rf-> d.

This is not clear in the existing prose specification, but the architectural
intent that emerged during discussion with Arm is that the given execution
should be forbidden, reflecting microarchitectural choices that (1) instructions
decode in order, so the fetch b must occur before the read d, and (2) fetches that
miss in the instruction cache must read from data storage, so the instruction
cache cannot be ahead of the available data. This ensures that fetching from a
write means that all threads are now guaranteed to read from that write (or
another coherence-after it).
Instruction-to-Data Coherence In the other direction, reading from a par-
ticular write to some location does not imply that later fetches of that location
will see that write (or a coherence successor), as in the following CoRF+ctrl-isb.
CoRF+ctrl-isb AArch64
Initial state: 0:W0="B l1", 0:X1=f, 1:X2=f
Thread 0      Thread 1      Common
STR W0,[X1]   LDR X0,[X2]   f:  B l0
              CBNZ X0,l     l1: MOV X10,#2
              l: ISB            RET
              BL f          l0: MOV X10,#1
              MOV X1,X10        RET
Allowed: 1:X0="B l1", 1:X1=1

Execution diagram: a:write f=B l1 -rf-> b:read f=B l1 -ctrl+isb-> c:fetch
f=B l0; initial write of f -irf-> c.

Here Thread 1 has a control dependency and an instruction synchronisation
barrier (the CBNZ conditional branch, dependent on the value read by its LDR
load, and ISB), abbreviated to ctrl+isb, between its load and the fetch from f. If
the latter were a data load, this would ensure the two loads are satisfied in order.
This is not explicit in the existing prose, but it is what one would expect, and it
is observed in practice. Microarchitecturally, it is easily explained by an out-of-
date entry for f in the instruction cache of Thread 1: if Thread 1 had previously
fetched f (perhaps speculatively), and that instruction cache entry has not been
evicted or explicitly invalidated since, then this fetch of f will simply read the
old value from the instruction cache without going out to data memory. The ISB
ensures that f is freshly fetched, but does not ensure that Thread 1’s instruction
cache is up-to-date with respect to data memory.

3.3 Instruction Synchronisation


Instruction fetches come with few guarantees, so explicit synchronisation must be
performed when modifying the instruction stream.
Same-Thread Synchronisation Test SM below shows the simplest self-
modifying code case: without additional synchronisation, a write to program
memory can be ignored by a program-order-later fetch.
SM AArch64
Initial state: 0:W0="B l1", 0:X1=f
Thread 0           Common
STR W0,[X1] //a    f:  B l0
BL f               l1: MOV X10,#2
MOV X0,X10             RET
                   l0: MOV X10,#1
                       RET
Allowed: 0:X0=1

Execution diagram: b:fetch f=B l0 -ifr-> a:write f=B l1; initial write of f
-irf-> b.

In this execution, the fetch b, fetching the instruction at f, fetches a value
from a write coherence-before a, even though b is the fetch of an instruction
program-order after a. We illustrate this with an instruction from-reads (ifr)
edge. This is a derived relation, analogous to the usual from-reads (fr) relation,
that relates each fetch to all writes that are coherence-after the write it read
from; it is defined as ifr = irf⁻¹;co. If the fetch were a data read, this would
be a forbidden coherence shape (CoWR). As it is, it is architecturally allowed,
as described explicitly by Arm [9, B2.4.4], and it is experimentally observed on
all devices we have tested. Microarchitecturally, this too is simply due to fetches
from old instruction cache entries.
Cache Maintenance As we saw in §2, the Arm architecture provides cache
maintenance instructions to synchronise the instruction and data streams: the
DC data-cache clean and IC instruction-cache invalidate instructions. To forbid
the relaxed outcome of SM, by forcing a fetch of the modified code, the specified
sequence of cache maintenance instructions must be inserted, with an ISB.
SM+cachesync-isb AArch64
Initial state: 0:W0="B l1", 0:X1=f
Thread 0
STR W0,[X1]   //overwrite f with branch
DC CVAU,X1    //clean data cache
DSB ISH
IC IVAU,X1    //invalidate instruction cache
DSB ISH
ISB           //flush pipeline
BL f
MOV X0,X10
Forbidden: 0:X0=1

Execution diagram: a:write f=B l1 -cachesync-> b:ISB -isb-> c:fetch f=B l0;
initial write of f -irf-> c.

Now the outcome is forbidden. The cache synchronisation sequence DC CVAU;
DSB ISH; IC IVAU; DSB ISH (which we abbreviate to a single cachesync edge)
ensures that by the time the ISB executes, the instruction and data memory have
been made coherent with each other for f. The ISB then ensures the final fetch
of f is ordered after this sequence. The microarchitectural intuition for this was
in §2; our §4 operational model will describe the semantics of each instruction.
Cross-Thread Synchronisation We now consider modifying code that can be
fetched by other threads, using variants of the standard message-passing shape
MP, which checks whether two writes (to different locations) on one thread can
be seen out-of-order by two reads on another thread; here we replace one or both
of those reads by fetches, and ask what synchronisation is required to ensure
that the relaxed outcome is forbidden.
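For reference, the plain-data MP shape looks as follows (a sketch in the same
style; the concrete registers are our choice):

MP AArch64 (plain data; sketch)
Initial state: 0:X0=1, 0:X1=x, 0:X2=1, 0:X3=y, 1:X1=y, 1:X2=x, [x]=0, [y]=0
Thread 0      Thread 1
STR X0,[X1]   LDR X0,[X1]
STR X2,[X3]   LDR X3,[X2]
Allowed: 1:X0=1, 1:X3=0

Consider first an MP variant where the first write is of a new instruction, and
the second is just a simple data memory flag: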
MP.RF+dmb+ctrl-isb AArch64
Initial state: 0:W0="B l1", 0:X1=f, 0:X2=1, 0:X3=x, 1:X2=x, [x]=0
Thread 0      Thread 1
STR W0,[X1]   LDR X0,[X2]
DMB ISH       CBNZ X0,l
STR X2,[X3]   l: ISB
              BL f
              MOV X1,X10
Allowed: 1:X0=1, 1:X1=1

Execution diagram: a:write f=B l1 -dmb-> b:write x=1 -rf-> c:read x=1 -ctrl->
d:ISB -isb-> e:fetch f=B l0; initial write of f -irf-> e.

This test includes sufficient synchronisation on each thread to enforce thread-
local ordering of data accesses: the DMB in Thread 0 ensures the writes a and b
propagate to memory in program order, and the control-dependency into an ISB
on Thread 1 ensures the read c and the fetch e happen in program order. How-
ever, as we saw in §2, this is not enough to synchronise concurrent modification
and execution of code in ARMv8-A. Thread 0 needs the entire cache synchro-
nization sequence (giving test MP.RF+cachesync+ctrl-isb, not shown), not just
a DMB, to forbid this outcome.
Another variant of this MP-shape test where the message passing itself is
done using modification of code gives a much stronger guarantee, as can be
seen from the following MP.FR+dmb+fpo-fe test. This is not clear from the
architecture manual, but this outcome is already forbidden with only the DMB.

MP.FR+dmb+fpo-fe AArch64
Initial state: 0:X0=1, 0:X1=x, 1:X2=x, [x]=0, 0:W2="B l1", 0:X3=f
Thread 0      Thread 1
STR X0,[X1]   BL f
DMB ISH       MOV X0,X10
STR W2,[X3]   LDR X1,[X2]
Forbidden: 1:X0=2, 1:X1=0

Execution diagram: a:write x=1 -dmb-> b:write f=B l1 -irf-> c:fetch f=B l1
-fpo-> d:fetch LDR X1,[X2] -fe-> e:read x=0.
This is for similar reasons to the above CoFR test: since Thread 1 fetched the
updated value for f, we know that value must have reached at least the data
caches (since that is where the instruction cache reads from) and therefore multi-
copy atomicity guarantees that a normal load instruction will observe it.
The final variant of these MP-shaped tests has both Thread 0 writes be of new
instructions. This idiom is very common in practice; it is currently how Chrome’s
WebAssembly JIT synchronises the modified thread with the new code.
MP.FF+dmb+fpo AArch64
Initial state: 0:W0="B l1", 0:X1=f1, 0:W2="B l1", 0:X3=f2
Thread 0      Thread 1
STR W0,[X1]   BL f2
DMB ISH       MOV X0,X10
STR W2,[X3]   BL f1
              MOV X1,X10
Allowed: 1:X0=2, 1:X1=1

Execution diagram: a:write f1=B l1 -dmb-> b:write f2=B l1 -irf-> c:fetch
f2=B l1 -fpo-> d:fetch f1=B l0; initial write of f1 -irf-> d.

Without the full cachesync sequence on Thread 0, this is an allowed
outcome. Interestingly, adding the cachesync sequence to Thread 0 (Test
MP.FF+cachesync+fpo, not shown) is sufficient to make the outcome forbid-
den, without an ISB in Thread 1, as the cachesync sequence is intended to make
it appear that fetches occur in program order. Microarchitecturally, that could
be ensured in two ways: either by actually fetching in-order, or by making the
IC instruction not only invalidate all the instruction caches (for this address)
but also clean any core’s pre-fetch buffer stale entries (for this address). Archi-
tecturally, this is not clear in the current prose, but, concurrent with this work,
Arm were independently strengthening their definition to make it so.
Incremental Synchronisation The cache synchronisation sequence need not
be contiguous, or even all in the same thread. So long as the sequence in its
entirety has been performed by the time the fetch happens, the instruction
stream will have been made consistent with the data stream for that address.
This is demonstrated by the following test, where Thread 0 performs a write
to f and then only a DC before synchronizing with Thread 1, which performs the
IC, while Thread 2 observes the modified code. This can happen in practice when
a software thread is migrated between hardware threads at runtime, by a hyper-
visor or OS. Thread 0 and Thread 1 may just represent the runtime scheduling
of a single process, beginning execution on hardware Thread 0 but migrated to
hardware Thread 1 between the DC and IC instructions. In the graph, the dcsync
and icsync represent the DC;DSB ISH and DSB ISH;IC;DSB ISH combinations. The
DC does not need a preceding DSB ISH because it is ordered w.r.t. the preceding
store to the same cache line.
Here the IC gets broadcast to all threads [9, B2.2.5p3], and so the fact that
it happens on a different thread to the DC does not affect the outcome. Similarly,
if the DC were to happen on another thread first (to get the test MP.RF+[dc]-
ic+ctrl-isb, not shown), then it would have the effect of ensuring consistency
globally, for all threads.
ISA2.F+dc+ic+ctrl-isb AArch64
Initial state: 0:W0="B l1", 0:X1=f, 0:X2=1, 0:X3=x, [x]=0,
               1:X4=f, 1:X1=x, 1:X2=1, 1:X3=y, [y]=0, 2:X2=y
Thread 0      Thread 1      Thread 2
STR W0,[X1]   LDR X0,[X1]   LDR X0,[X2]
DC CVAU,X1    DSB ISH       CBZ X0,l
DSB ISH       IC IVAU,X4    l: ISB
STR X2,[X3]   DSB ISH       BL f
              STR X2,[X3]   MOV X1,X10
Forbidden: 2:X0=1, 2:X1=1

Execution diagram: a:write f=B l1 -dcsync-> b:write x=1 -rf-> c:read x=1
-icsync-> d:write y=1 -rf-> e:read y=1 -ctrl-> f:ISB -isb-> g:fetch f=B l0;
g -ifr-> a.

3.4 Multi-Copy Atomicity

For data accesses, the question of whether they are multi-copy atomic is a crucial
one for relaxed architectures. IBM POWER, ARMv7, and pre-2018 ARMv8-A
are/were non-multi-copy atomic: two writes to different addresses could become
visible to distinct other threads in different orders. Post-2018 ARMv8-A and
RISC-V are multi-copy atomic (or “other multi-copy-atomic” in Arm terminol-
ogy) [37,36,9]: the programmer can assume there is a single shared memory, with
all relaxed-memory effects due to thread-local out-of-order execution.
However, for fetches, due to the lack of any fetch atomicity guarantee for most
instructions (§3.1), and the lack of coherent fetches for the others (§3.2), the
question of multi-copy atomicity is not particularly interesting. Tests are either
trivially forbidden (by data-to-instruction coherence) or allowed, with only the
full cache synchronisation sequence providing enough guarantees to forbid them;
and (§3.3) that sequence ensures all cores share the same consistent view of memory.

3.5 Strength of the IC Instruction

Multiple Points of Unification Cleaning the data cache, using the DC in-
struction, makes a write visible to instruction memory. It does this by pushing
the write past the Point of Unification. However, there may be multiple Points
of Unification: one for each core, where its own instruction and data memory
become unified, and one for the entire system (or shareability domain) where all
the caches unify. Fetching from a write implies that it has reached the closest
PoU, but does not imply it has reached any others, even if the write originated
from a distant core. Consider the SM.F+ic test below, in which Thread 0 modifies f, Thread 1 fetches the
new value and performs just an IC and DSB, before signalling Thread 0 which
also fetches f. That IC is not strong enough to ensure that the write is pulled
into the instruction cache of Thread 0.
This is not clear in the existing prose, but the architectural intent is that it
be allowed (i.e., that IC is weak in this respect). We have not so far observed it
in practice. The write may have passed the Point of Unification for Thread 1,
but not the shared Point of Unification for both threads. In other words, the
write might reach Thread 1’s instruction cache without being pushed down from
Thread 0’s data cache. Microarchitecturally this can be explained by direct data
SM.F+ic AArch64
Initial state: 0:W0="B l1", 0:X4=f, 0:X3=x, [x]=0, 1:X4=f, 1:X2=1, 1:X3=x
Thread 0      Thread 1
STR W0,[X4]   BL f
LDR X2,[X3]   MOV X0,X10
CBZ X2,l      IC IVAU,X4
l: ISB        DSB ISH
BL f          STR X2,[X3]
MOV X1,X10
Allowed: 1:X0=2, 0:X2=1, 0:X1=1

Execution diagram: a:write f=B l1 -po-> b:read x=1 -ctrl-> c:ISB -isb->
d:fetch f=B l0; a -irf-> e:fetch f=B l1 -icsync-> f:write x=1 -rf-> b;
initial write of f -irf-> d.

intervention (DDI), an optimisation allowing cache lines to be migrated directly
from one thread's (data) cache to another. The line could be migrated from
Thread 0 to Thread 1, then pushed past Thread 1’s Point of Unification, making
it visible to Thread 1’s instruction memory without ever making it visible to
Thread 0’s own instruction memory. The lack of coherence between instruction
and data caches would make this observable, even in multi-copy atomic machines.
Stale Fetches So far, we have only talked about fetching from two distinct
writes. But theoretically there is no limit to how far back we can fetch from,
with insufficient synchronization. The MP.RF+dmb+ctrl-isb test (§3.3) required
the full cachesync sequence to forbid the given behaviour. Below we give a test,
FOW, similar to that MP-shaped test but allowing many consumer threads
to independently and simultaneously see different values in their instruction
memory, even after invalidating their caches.
FOW AArch64
Initial state: 0:W0="B l1", 0:X2=g, 0:W1="B l2", 0:X3=1, 0:X4=x, [x]=0,
               1:X4=x, 2:X4=x
Thread 0      Thread 1      Thread 2      Common
STR W0,[X2]   LDR X0,[X4]   LDR X0,[X4]   g:  B l0
STR W1,[X2]   CBNZ X0,la    CBNZ X0,lb    l2: MOV X10,#3
DSB ISH       la: ISB       lb: ISB           RET
IC IVAU,X2    BL g          BL g          l1: MOV X10,#2
DSB ISH       MOV X1,X10    MOV X1,X10        RET
STR X3,[X4]                               l0: MOV X10,#1
                                              RET
Allowed: 1:X0=1, 1:X1=2, 2:X0=1, 2:X1=1

Execution diagram: a:write g=B l1 -po-> b:write g=B l2 -icsync-> c:write x=1;
c -rf-> d:read x=1 -ctrl+isb-> e:fetch g=B l1; c -rf-> f:read x=1 -ctrl+isb->
g:fetch g=B l0; a -irf-> e; initial write of g -irf-> g.
This is not clear in the existing architecture text. It is a case where the architec-
ture design is not very constrained. On the one hand, it has not been observed,
and it is thought unlikely that hardware will ever exhibit this behaviour: it would
require keeping multiple writes in the coherent part of the data caches, rather
than a single dirty line, which would require more complex cache coherence pro-
tocols. On the other hand, there does not seem to be any benefit to software from
forbidding it. Arm therefore prefer the choice that gives a simpler and weaker
model (here the two happen to coincide), to make it easier to understand and to
provide more flexibility for future microarchitectural optimisations. We therefore
design our models to allow the above behaviour.

3.6 Strength of the DC Instruction


Instruction Cache Depth Test CoFF (§3.2) showed that fetches can see “old”
writes. In principle, there is no limit to the depth of the instruction-cache hier-
archy: there could be many values for a single location cached in the instruction
memory for each core, even if the data cache has been cleaned. The test below
illustrates this, with Thread 1 able to see all three values for g.
MP.RF+dc+ctrl-isb-isb AArch64
Initial state: 0:W0="B l1", 0:X2=g, 0:W1="B l2", 0:X3=1, 0:X4=x, [x]=0, 1:X4=x
Thread 0      Thread 1      Common
STR W0,[X2]   LDR X0,[X4]   g:  B l0
STR W1,[X2]   CBNZ X0,l     l2: MOV X10,#3
DSB ISH       l: ISB            RET
DC CVAU,X2    BL g          l1: MOV X10,#2
DSB ISH       MOV X1,X10        RET
STR X3,[X4]   ISB           l0: MOV X10,#1
              BL g              RET
              MOV X2,X10
              ISB
              BL g
              MOV X3,X10
Allowed: 1:X0=1, 1:X1=3, 1:X2=2, 1:X3=1

Execution diagram: a:write g=B l1 -po-> b:write g=B l2 -dcsync-> c:write x=1
-rf-> d:read x=1 -ctrl+isb-> e:fetch g=B l2 -isb-> f:fetch g=B l1 -isb->
g:fetch g=B l0; b -irf-> e; a -irf-> f; initial write of g -irf-> g.

This is similar to the preceding FOW case: it is thought unlikely that hardware
will exhibit this in practice, but the desire for the simpler and weaker option
means the architectural intent is to allow it, and we follow that in our models.

4 An Operational Semantics for Instruction Fetch


Previous work on operational models for IBM POWER and Arm “user-
mode” concurrency [46,45,22,18,19,37] has shown, surprisingly, that as far as
programmer-visible behaviour is concerned, one can abstract from almost all
hardware implementation details of data memory (store queues, the cache hi-
erarchy, the cache protocol, etc.). For ARMv8-A, following their 2018 shift to
a multicopy-atomic architecture, one can do so completely: the Flat model of
[37] has a shared flat memory, with a per-thread out-of-order thread subsystem,
modelling pipeline effects, responsible for all observable relaxed behaviour. For
instruction-fetch, it is no longer possible to abstract completely from the data
and instruction cache hierarchy, but we can still abstract from much of it.
The Flat Model is a small-step operational semantics for multi-copy atomic
ARMv8-A, including the relaxed behaviours of loads and stores [37]. Its states are
abstract machine states consisting of a tree of instructions for each thread, and
a flat memory subsystem shared by all threads. Each instruction in each thread
corresponds to a sequence of transitions, with some guards and a potential effect
on the shared memory state. The Flat model is made executable in our RMEM
tool, which can exhaustively interleave transitions to enumerate all the possible
behaviours. The tree of instructions for each thread models out-of-order and
speculative execution explicitly. Below we show an example for a thread that is
executing 10 instruction instances.
Some (grey) are finished, no longer
subject to restart; others (pink)
have run some but perhaps not all
of their instruction semantics; in-
structions are not necessarily atomic. Those with multiple children are branch
instructions with multiple potential successors speculated simultaneously.
For each state, the model defines the set of allowed transitions, each of which
steps to a new machine state. Transitions correspond to steps of single instruc-
tions, and individual instructions may give rise to many. Example transitions
include Register Write, Propagate Write to Memory, etc.
iFlat Extension Originally, Flat had a fixed instruction memory, with a single
transition that can speculate the address of any program-order successor of any
instruction in flight, fetch it from the fixed instruction memory, and decode it.
We now remove that fixed instruction memory, so that instructions can be
fetched from data writes, and add the additional structures shown below. These
are all of unbounded size, as is appropriate for an architecture definition.

[Figure: the iFlat storage subsystem. Each Thread decodes instructions from a
per-thread Fetch Queue, which issues new fetch requests to a per-thread
Abstract I$; the Abstract I$ is filled ("add to I$") from a global Abstract D$
sitting above the flat Memory, through which data writes and reads also pass.
Data reads see the most recent write; instruction fetches may see any write in
the Abstract D$.]
Fetch Queues (per-thread) These are ordered buffers of pre-fetched entries,
waiting to be decoded and begin execution. Entries are either a fetched 32-bit
opcode, or an unfetched request. The fetch queues allow the model to speculate
and pre-fetch many instructions ahead of where the thread is currently executing.
The model’s fetch queues abstract from multiple real-hardware structures: in-
struction queues, line-fill buffers, loop buffers, and slots objects. We keep a close
relation to this underlying microarchitecture by allowing out-of-order fetches,
but we believe this is not experimentally observable on real hardware.
Abstract Instruction Caches (per-thread) These are just sets of writes.
When the fetch queue requests a new entry, it gets satisfied from the instruction
cache, either immediately (a hit) or at some later point in time (a miss). The
instruction cache can contain many possible writes for each location (§3.6), and
it can be spontaneously updated with new writes in the system at any time ([9,
B2.4.4]). To manage IC instructions, each thread keeps a list of addresses yet to
be invalidated by in-flight ICs.
Data Cache (global) Above the single shared flat memory for the entire sys-
tem, which sufficed for the multi-copy-atomic ARMv8-A data memory, we insert
a shared buffer which is just a list of writes; abstracting from the many possible
coherent data cache hierarchies. Data reads must be coherent, reading from the
most recent write to the same address in the buffer, but instruction fetches are
allowed to read from any such write in the buffer (§3.2).
Transitions To accommodate instruction fetch and cache maintenance, we in-
troduce new transitions: Fetch Request, Fetch Instruction, Fetch Instruction
(Unpredictable), Fetch Instruction (B.cond), Decode Instruction, Begin IC,
Propagate IC to Thread, Complete IC, Perform DC, and Update Instruction
Cache. We also have to modify some Flat transitions: Commit ISB, Wait for
DSB, Commit DSB, Propagate Memory Write, and Satisfy Read from Memory.
These transitions define the lifecycle of each instruction: a request gets issued
for the fetch, then at some later point the fetch gets satisfied from the instruc-
tion cache, the instruction is then decoded (in program-order) and then handed
to the existing semantics to be executed. To give a flavour, we show just one,
the Propagate IC to Thread transition, which is responsible for invalidation of
the abstract instruction caches. This is a prose rendering of the rule in our exe-
cutable mathematical model, which is expressed in the typed functional subset
of Lem [32].

Propagate IC to Thread An instruction i (with ID iiid) in state
Wait_IC(address, state_cont) can do the relevant invalidate for any thread
tid′, modifying that thread’s instruction cache and fetch queue, if there exists
a pending entry (iiid, address) in that thread’s ic_writes. Action:
1. for any entry in the fetch queue for thread tid′, whose program_loc is
   in the same minimum-size instruction cache line as address, and is in
   Fetched(_) state, set it to the Unfetched state;
2. for the instruction cache of thread tid′, remove any write-slices which are
   in the same instruction cache line of minimum size as address.
This rule can be found under the same name in the full prose description,
and in the handle_ic_ivau and flat_propagate_cache_maintenance functions
in machineDefThreadSubsystem.lem and machineDefFlatStorageSubsystem.lem
in the executable mathematics. Cache maintenance operations work over entire
cache lines, not individual addresses. Each address is associated with at least one
cache line for the data (and unified) caches, and one for the instruction caches.
The cache line of minimum size is the (architected) smallest possible cache line
for each of these.
Example This model correctly explains all the behaviours of §3. We illustrate
this by revisiting the cache synchronization explanation of §2, which can now
be re-interpreted w.r.t. our precise model, and using this to explain the thread
migration case of §3.3. Given the sequence DC Xn; DSB; IC Xn; DSB, we can use
this model to give it meaning (omitting uninteresting transitions). First the DC CVAU
causes a Perform DC transition. This pushes any write that might have been
in the abstract data cache into memory. Now the first DSB’s Commit DSB can
be taken, allowing Begin IC to happen. This creates entries for each thread,
which are discharged by each Propagate IC to Thread (see above). Once all
entries are invalidated, a Complete IC can happen. Now, if any thread decodes
an instruction for that address, it must have been fetched from the write the
DC pushed, or something coherence-after it. If the software thread performing
this sequence is interrupted and migrated (by the OS) to a different hardware
thread, then, so long as the OS includes the DSB to maintain the thread-local DC
ordering, the DC will push the write in an identical way, since it only affects the
global abstract data cache. The IC transitions can all be taken, and the sequence
continues as before, just on a new hardware thread. So when the second DSB
finishes, and the final Commit DSB transition is taken, the effect of the full
sequence will be seen system-wide even if the thread was migrated.
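In summary, the instructions of the sequence line up with the model transitions
just described as follows (our annotation; Xn holds the address of the modified
location):

DC CVAU, Xn   // Perform DC: push any pending write out of the abstract data cache
DSB ISH       // Commit DSB: wait for the DC to take effect
IC IVAU, Xn   // Begin IC; Propagate IC to Thread, once per thread; Complete IC
DSB ISH       // Commit DSB: wait for all the IC invalidations to be performed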

5 An Axiomatic Semantics for Instruction Fetch

Based on the operational model, we develop an axiomatic semantics, as an ex-
tension of the ARMv8 axiomatic reference model [15,37]. Since that does not
have mixed-size support, we do not model the concurrent modification of condi-
tional branches (§3.1), as this would require mixed-size machinery. The existing
axiomatic model is a predicate on candidate executions, hypothetical complete
executions of the given program that satisfy some basic well-formedness condi-
tions, defining the set of valid executions to be those satisfying its axioms. Each
candidate execution abstractly captures a particular concrete execution of the
program in terms of events and relations over them. This model is expressed in
the herd language [8,6,4]. The events of these executions are memory reads (the
set R), memory writes (W), and memory barrier/fence events (F). The relations
are: program order (po), capturing the sequencing of events by the same thread in
the execution’s control-flow unfolding; reads-from (rf), relating a write event w
with any read event r that reads from it; the coherence order (co), recording the
execution’s sequencing of same-address writes in memory; and read-modify-write
(rmw), capturing which load/store exclusive instructions form a successful exclu-
sive pair in the execution. The derived relation from-reads fr = rf⁻¹;co relates
a read r with a write w′ if r reads from a write w coherence-before w′. In addition,
candidate executions also have relations capturing dependencies between events:
address (addr), data (data), and control dependencies (ctrl). The relation loc
relates any two read/write events that are to the same memory address. The
model also has relations suffixed “i” and “e”: rfi/rfe, coi/coe, fri/fre. These
are the restrictions of the relations rf, co, and fr, to same-thread/“internal”
event pairs or different-thread/“external” event pairs. The model is defined in
relational algebra. In herd, R;S stands for sequential composition of relations R
and S, R−1 for the inverse of relation R, R|S and R&S for the union and intersection
of R and S, and [A];R;[B] for the restriction of R to the domain A and range B.
Handling instruction fetch requires extending the notion of candidate ex-
ecution. We add new events: an instruction-fetch (IF) event for each executed
instruction; a DC event for each DC CVAU instruction; an IC event for each IC IVAU
and IC IALLU instruction. We replace po with fetch-program-order (fpo) which
orders the IF event of an instruction before any program-order later IF events.
We add a relation same-cache-line (scl), relating reads, writes, fetches, DC and
IC events to addresses in the same cache line. We add an acyclic transitively
closed relation wco, which extends co with orderings for cache maintenance (DC
or IC) events: it includes an ordering (e, e′) or (e′, e) for any cache maintenance
event e and same-cache-line event e′ if e′ is a write or another cache mainte-
nance event; where co = ([W];wco;[W]) & loc. The loc, addr, and ctrl relations are all
extended to include DC and IC events. We add a fetch-to-execute relation (fe),
relating an IF event to any event generated by the execution of that instruction;
and an instruction-read-from relation (irf), which relates a write to any IF event
that fetches from it. Finally, we add a boolean constrained-unpredictable (CU) to
detect badly behaved programs. Now we derive the following relations: the stan-
dard po relation, as po = fe⁻¹;fpo;fe (two events e and e′ are po-related if their
fetch-events are fpo-related); and instruction-from-reads (ifr), the analogue of
fr for instruction fetches, relating a fetch to all writes coherence-after the one it
fetched from: ifr = irf⁻¹;co.
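In herd’s cat syntax these derived relations are one-liners (a sketch restating
the equations just given, not an excerpt from our model file):

let co  = ([W]; wco; [W]) & loc   (* coherence recovered from wco *)
let po  = fe^-1; fpo; fe          (* execute events ordered as their fetches *)
let ifr = irf^-1; co              (* instruction-from-reads, the analogue of fr *)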
We then make two semantics-preserving rewrites of the existing model to
make adding instruction fetches easier (described in the appendix); and make
the following changes and additions to the model. The full model is shown in
Figure 1, with comments pointing to the relevant locations in the model defini-
tion. For lack of space we only describe the main addition, the iseq relation, in
detail (including its correspondence with the operational model of §4); for the
others we give an overview and refer to the appendix for the full description.
We define the relation iseq, relating some write w to address x to an IC
event completing a cache synchronisation sequence (not necessarily on a single
thread): w is followed by a same-cache line DC event, which is in turn followed
by a same-cache line IC event. In operational model terms, this captures traces
that propagated w to memory, subsequently performed a same-cache-line DC,
and then began an IC (and eagerly propagated the IC to all threads). In any
state after this sequence it is guaranteed that w, or a coherence-newer same-
address write, is in the instruction cache of all threads: performing the DC has
cleared the abstract data cache of writes to x, and the subsequent IC has re-
moved old instructions for location x from the instruction caches, so that any
subsequent updates to the instruction caches have been with w, or co-newer
writes. Adding ifr;iseq to the observed-by relation (obs) (4) relates an instruc-
tion fetch i from location x to an IC event ic if: i fetched from a write w to x, some
write w′ to x is coherence-after w, and ic completes a cache synchronisation se-
quence (iseq) starting from w′. Then the irreflexive ob axiom requires that i
must be ordered-before ic (because it would otherwise have fetched w′).
let iseq = [W]; (wco&scl); [DC];                  (*1*)
           (wco&scl); [IC]

(* Observed-by *)
let obs = rfe | fr | wco                          (*2*)
        | irf | (ifr;iseq)                        (*3, 4*)

(* Fetch-ordered-before *)
let fob = [IF]; fpo; [IF]                         (*5*)
        | [IF]; fe                                (*6*)
        | [ISB]; fe^-1; fpo                       (*7*)

(* Dependency-ordered-before *)
let dob = addr | data
        | ctrl; [W]
        | (ctrl | (addr; po)); [ISB]
        (*| [ISB]; po; [R] *)                     (*8*)
        | addr; po; [W]
        | (addr | data); rfi

(* Atomic-ordered-before *)
let aob = rmw
        | [range(rmw)]; rfi; [A|Q]

(* Barrier-ordered-before *)
let bob = [R|W]; po; [dmb.sy]
        | [dmb.sy]; po; [R|W]
        | [L]; po; [A]
        | [R]; po; [dmb.ld]
        | [dmb.ld]; po; [R|W]
        | [A|Q]; po; [R|W]
        | [W]; po; [dmb.st]
        | [dmb.st]; po; [W]
        | [R|W]; po; [L]
        | [R|W|F|DC|IC]; po; [dsb.ish]            (*9*)
        | [dsb.ish]; po; [R|W|F|DC|IC]            (*10*)
        | [dmb.sy]; po; [DC]                      (*11*)

(* Cache-op-ordered-before *)
let cob = [R|W]; (po&scl); [DC]                   (*12*)
        | [DC]; (po&scl); [DC]                    (*13*)

(* Ordered-before *)
let ob = (obs|fob|dob|aob|bob|cob)+

(* Internal visibility requirement *)
acyclic (po-loc|fr|co|rf) as internal

(* External visibility requirement *)
irreflexive ob as external

(* Atomic *)
empty rmw & (fre; coe) as atomic

(* Constrained unpredictable *)
let cff = ([W];loc;[IF]) \ ob^-1 \ (co;iseq;ob)   (*14*)
cff_bad cff ≡ CU                                  (*15*)

Fig. 1. Axiomatic model

We now briefly overview the other changes made to the axiomatic model and their intuition.
We include irf in obs (3): for an instruction to be fetched from a write, the
write has to have been done before. We add a relation fetch-ordered-before (fob)
(5-7), which is included in ordered-before. The relation fob includes fpo and fe;
including fpo (5) requires fetches to be ordered according to their position in the
control-flow unfolding of the execution. and including the fe (fetch-to-execute)
relation (6) captures the idea that an instruction must be fetched before it can
execute; fetches program-order-after an ISB happen after the ISB (or else are
restarted) (7). For DSB ISH instructions the edge [R|W|F|DC|IC];po;[dsb.ish]
is included in ob (9): DSB ISHs are ordered with all program-order-preceding
non-fetch events. Symmetrically, all non-IF events are ordered after program-
order-preceding dsb.ish events (10). DCs wait for preceding dmb.sy events (11).
We include the relation cache-op-ordered-before (cob) in ob. This relation orders
DC instructions with program-order previous reads/writes and other DCs to the
same cache line (12,13).

Finally, could-fetch-from (cff) (14) captures, for each fetch i, the writes it
could have fetched from (including the one it did fetch from), which we use to
define the constrained unpredictable axiom cff_bad (not given) (15).
6 Validation
To gain confidence in the presented models we validated them against the
Arm architectural intent, against each other, and against real hardware.
Validation against the Architecture To ensure our models correctly cap-
tured the architectural intent we engaged in detailed discussions with Arm, in-
cluding the Arm chief architect. These involved inventing litmus tests (including
those described in §3 and many others) and discussing what the architecture
should allow in each case.
Validating against hardware To run instruction-fetch tests on hardware, we
extended the litmus tool [7]. The most significant extension consists in handling
code that can be modified, and thus has to be restored between experiments. To
that end, code copies are executed, those copies reside in mmap’d memory with
(execute permission granted. Copies are made from “master” copies, in effect
C functions whose contents basically consist of gcc extended inline assembly. Of
course, such code has to be position independent, and explicit code addresses in
test initialisation sections (such as in 0:X1=l in the test of §3.1) are specific to
each copy. The cache handling instructions used in our experiments are all
allowed to execute at exception level 0 (user-mode), and therefore no additional
privilege is needed to run the tests.
To automatically generate families of interesting instruction-fetch tests, we
extended the diy test generation tool [3] to support instruction-fetch reads-
from (irf) and instruction-fetch from-reads (ifr) edges, in both internal (same-
thread) and external (inter-thread) forms, and the cachesync edge. We used this
to generate 1456 tests involving those edges together with po, rf, fr, addr, ctrl,
ctrlisb, and dmb.sy. diy does not currently support bare DC or IC instructions,
locations which are both fetched and read from, or repeated fetches from the
same location.
We then ran the diy-generated test suite on a range of hardware implemen-
tations, to collect a substantial sample of actual hardware behaviour.
Correspondence between the models We experimentally test the equiva-
lence of the operational and axiomatic models on the above hand-written and
diy-generated tests, checking that the models give the same sets of allowed final
states, and that these are consistent with the hardware observations.
Making the models executable as a test oracle To make the operational
model executable as a test oracle, capable of computing the set of all allowed
executions of a litmus test, we must be able to exhaustively enumerate all possible
traces. For the model as presented, doing this naively is infeasible: for each
instruction it is theoretically possible to speculate any of the 2⁶⁴ addresses as
potential next address, and the interleaving of the new fetch transitions with
others leads to an additional combinatorial explosion.
We address these with two new optimisations. First, we extend the fixed-point
optimisation in RMEM (incrementally computing the set of possible branch tar-
gets) [37] to keep track not only of indirect branches but also the successors of
every program location, and only allow speculating from this set of successors.
Additionally, we track during a test which locations were both fetched and mod-
ified during the test, and eagerly take fetch and decode transitions for all other
locations. As before, the search then runs until the set of branch targets and
the set of modified program-locations reaches a fixed point. We also take some
of the transitions eagerly to reduce the search space, in cases where this cannot
remove behaviour: Wait for IC, Complete IC, Fetch Request, and Update
Instruction Cache.

Making the axiomatic model executable as a test oracle The axiomatic
model is expressed in a herd-like form, but the herd tool does not support instruc-
tion fetch and cache maintenance instructions. To make the model executable
as a test oracle, we built a new tool that takes litmus tests and uses a Sail [11]
definition of a fragment of the ARMv8-A ISA to generate SMT problems for the
model. Using the Sail instruction semantics, we generate a Sail program that cor-
responds to each thread within a litmus test. The tool then partially evaluates
these programs using the concrete values for addresses and registers specified in
the litmus file, while allowing memory values and arbitrary addresses to remain
symbolic. Using a Sail to SMT-LIB backend, these are translated into SMT defi-
nitions that include all possible behaviours of each thread as satisfiable solutions.
The rules for the axiomatic model are then applied as assertions restricting the
possible behaviours to just those allowed by the axiomatic model. The tool also
derives the addr and data relations, using the syntactic dependencies within the
instruction semantics to derive the syntactic dependencies between instructions.
For litmus tests, where we can know up-front which instructions may be
modified, we would like to avoid generating IF events for instructions that cannot
be modified. If we naively removed certain IF events, however, we would break
the correspondence between po and fe⁻¹;fpo;fe. This can be worked around
by ensuring that every modifiable instruction generates an event which appears
in po, allowing fpo between the modifiable instructions to instead be derived
as fe;po;fe⁻¹. Branches emit a special branch address announce event for this
purpose, which is also used to derive the ctrl relation. The fpo relation can
then be modified, replacing [ISB];fe⁻¹;fpo with [ISB];po;fe⁻¹ and adding
[ISB];po. The second change ensures that all the transitive edges generated by
[ISB];fe⁻¹;fpo followed by [IF];fe remain within fob and hence ob.
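Concretely, the adjusted fetch-ordered-before would read as follows (a sketch,
applying the rewriting above to the fob of Figure 1; not the tool’s actual
source):

let fob = [IF]; fpo; [IF]
        | [IF]; fe
        | [ISB]; po; fe^-1
        | [ISB]; po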
A limitation of this approach is that it cannot support cases where two threads
both attempt to execute the same possibly-modified instruction, as in the
SM.F+ic and FOW tests.

Validation results First, to check for regressions, we ran the operational model
on all the 8950 non-mixed-size tests used for developing the original Flat model
(without instruction fetch or cache maintenance). The results are identical, ex-
cept for 23 tests which did not terminate within two hours. We used a 160
hardware-thread POWER9 server to run the tests.
We have also run the axiomatic model on the 90 basic two-thread tests that
do not use Arm release/acquire instructions (not supported by the ISA semantics
used for this); the results are all as they should be. This takes around 30 minutes
on 8 cores of a Xeon Gold 6140.
Then, for the key handwritten tests mentioned in this paper, together with
some others (that have also been discussed with Arm), we ran them on various
hardware implementations and in the operational and axiomatic models. The
models’ results are identical to the Arm architectural intent in all cases, except
for two tests which are not currently supported by the axiomatic checker.
Test                                  Arm intent  op. model  ax. model    hardware obs.
CoFF                                  allow       =          =            42.6k/13G
CoFR                                  forbid      =          =            0/13G
CoRF+ctrl-isb                         allow       =          =            3.02G/13G
SM                                    allow       =          =            25.8G/25.9G
SM+cachesync-isb                      forbid      =          =            0/25.9G
MP.RF+dmb+ctrl-isb                    allow       =          =            480M/6.36G
MP.RF+cachesync+ctrl-isb              forbid      =          =            0/13G
MP.FR+dmb+fpo-fe                      forbid      =          =            0/13G
MP.FF+dmb+fpo                         allow       =          =            447M/13G
MP.FF+cachesync+fpo                   forbid      =          =            2.3k/13G (F)
ISA2.F+dc+ic+ctrl-isb                 forbid      =          =            0/6.98G
SM.F+ic                               allow       =          unsupported  0/12.9G (U)
FOW                                   allow       =          unsupported  0/7G (U)
MP.RF+dc+ctrl-isb-isb                 allow       =          =            0/12.94G (U)
MP.R.RF+addr-cachesync+dmb+ctrl-isb   forbid      =          =            0/6.97G
MP.RF+dmb+addr-cachesync              allow       =          =            0/6.34G (U)

[The hardware observations are the sum of testing seven devices: a Snapdragon 810
(4x Arm A53 + 4x Arm A57 cores), Tegra K1 (2x NVIDIA Denver cores), Snapdragon
820 (4x Qualcomm Kryo cores), Exynos 8895 (4x Arm A53 + 4x Samsung Mongoose 2
cores), Snapdragon 425 (4x Arm A53), Amlogic 905 (4x Arm A53 cores), and Amlogic
922X (4x Arm A73 + 2x Arm A53 cores). U: allowed but unobserved. F: forbidden but
observed.]
Our testing revealed a hardware bug in a Snapdragon 820 (4 Qualcomm Kryo
cores). A version of the first cross-thread synchronisation test of §3.3 but with
the full cache synchronisation (MP.RF+cachesync+ctrl-isb) exhibited an illegal
outcome in 84/1.1G runs (not shown in the table), which we have reported. We
have also seen an anomaly for MP.FF+cachesync+fpo, currently under investi-
gation by Arm. Apart from these, the hardware observations are all allowed by
the models. As usual, specific hardware implementations are sometimes stronger.
Finally, we ran the 1456 new instruction-fetch diy tests on a variety of hard-
ware, for around 10M iterations each, and in the operational model. The model
is sound with respect to the observed hardware behaviour except for that same
Snapdragon 820 device.

7 Related Work
To the best of our knowledge, no previous work establishes well-validated rigor-
ous semantics for any systems aspects, of any current production architecture,
in a realistic concurrent setting.
The closest is Raad et al.’s work on non-volatile memory, which models the
required cache maintenance for persistent storage in ARMv8-A [39], as an ex-
tension to the ARMv8-A axiomatic model, and for Intel x86 [38] as an oper-
ational model, but neither are validated against hardware. In the sequential
case, Myreen’s JIT compiler verification [33] models x86 icache behaviour with
an abstract cache that can be arbitrarily updated, cleared on a jmp. For ad-
dress translation, the authoritative Arm-internal ASL model [40,41,42] and the
Sail model derived from it [11] cover this, and other features sufficient to boot an OS
(Linux), as do the handwritten Sail models for RISC-V (Linux and FreeBSD)
and MIPS/CHERI-MIPS (FreeBSD, CheriBSD), but without any cache effects.
Goel et al. [21,20] describe an ACL2 model for much of x86 that covers address
translation; and the Forvis [34] and RISCV-PLV [14] Haskell RISC-V ISA mod-
els are also complete enough to boot Linux. Syeda and Klein [49,50] provide
a somewhat idealised model for ARMv7 address translation and TLB mainte-
nance. Komodo [16] uses a handwritten model for a small part of ARMv7, as
do Guanciale et al. [25,12]. Romanescu et al. [44,43] do discuss address trans-
lation in the concurrent setting, but with respect to idealised models. Lustig et
al. [30] describe a concurrent model for address translation based on the Intel
Sandy Bridge microarchitecture, combined with a synopsis of some of the rele-
vant Linux code, but not an architectural semantics for machine-code programs.

8 Conclusion

The mainstream architectures are the most important programming languages
used in practice, and their systems aspects are fundamental to the security (or
lack thereof) of our computing infrastructure. We have established a robust
semantics for one of those systems aspects, soundly abstracting the hardware
complexities to a manageable model that captures the architectural intent. This
enables future work on reasoning, model-checking, and verification for real sys-
tems code.
Acknowledgements This work would not have been possible without generous
technical assistance from Arm. We thank Richard Grisenthwaite, Will Deacon,
Ian Caulfield, and Dave Martin for this. We also thank Hans Boehm, Stephen
Kell, Jaroslav Ševčík, Ben Titzer, and Andrew Turner, for discussions of how in-
struction cache maintenance is used in practice, and Alastair Reid for comments
on a draft. This work was partially supported by EPSRC grant EP/K008528/1
(REMS), ERC Advanced Grant 789108 (ELVER), an ARM iCASE award, and
ARM donation funding. This work is part of the CIFV project sponsored by
the Defense Advanced Research Projects Agency (DARPA) and the Air Force
Research Laboratory (AFRL), under contract FA8650-18-C-7809. The views,
opinions, and/or findings contained in this paper are those of the authors and
should not be interpreted as representing the official views or policies, either
expressed or implied, of the Department of Defense or the U.S. Government.
References
1. Adir, A., Attiya, H., Shurek, G.: Information-flow models for shared memory with
an application to the PowerPC architecture. IEEE Trans. Parallel Distrib. Syst.
14(5), 502–515 (2003). https://doi.org/10.1109/TPDS.2003.1199067
2. Alglave, J., Fox, A., Ishtiaq, S., Myreen, M.O., Sarkar, S., Sewell, P.,
Zappa Nardelli, F.: The semantics of Power and ARM multiprocessor machine
code. In: Proc. DAMP 2009 (Jan 2009)
3. Alglave, J., Maranget, L.: The diy7 tool. http://diy.inria.fr/ (2019), accessed
2019-07-08
4. Alglave, J., Maranget, L.: The herd7 tool. http://diy.inria.fr/doc/herd.html/
(2019), accessed 2019-07-08
5. Alglave, J., Maranget, L., Deplaix, K., Didier, K., Sarkar, S.: The litmus7 tool.
http://diy.inria.fr/doc/litmus.html/ (2019), accessed 2019-07-08
6. Alglave, J., Maranget, L., Sarkar, S., Sewell, P.: Fences in weak memory models.
In: Proc. CAV (2010)
7. Alglave, J., Maranget, L., Sarkar, S., Sewell, P.: Litmus: running tests against
hardware. In: Proceedings of TACAS 2011: the 17th international conference on
Tools and Algorithms for the Construction and Analysis of Systems. pp. 41–44.
Springer-Verlag, Berlin, Heidelberg (2011), http://dl.acm.org/citation.cfm?id=1987389.1987395
8. Alglave, J., Maranget, L., Tautschnig, M.: Herding Cats: Modelling, Simulation,
Testing, and Data Mining for Weak Memory. ACM TOPLAS 36(2), 7:1–7:74 (Jul
2014). https://doi.org/10.1145/2627752
9. ARM Limited: ARM architecture reference manual. ARMv8, for ARMv8-A archi-
tecture profile (Oct 2018), v8.4. ARM DDI 0487D.a (ID103018)
10. Armstrong, A., Bauereiss, T., Campbell, B., Flur, S., French, J., Gray, K.E.,
Kerneis, G., Krishnaswami, N., Mundkur, P., Norton-Wright, R., Pulte, C., Reid, A.,
Sewell, P., Stark, I., Wassell, M.: Sail. https://www.cl.cam.ac.uk/~pes20/sail/ (2019)
11. Armstrong, A., Bauereiss, T., Campbell, B., Reid, A., Gray, K.E., Norton, R.M.,
Mundkur, P., Wassell, M., French, J., Pulte, C., Flur, S., Stark, I., Krishnaswami,
N., Sewell, P.: ISA semantics for ARMv8-A, RISC-V, and CHERI-MIPS. In: Proc.
46th ACM SIGPLAN Symposium on Principles of Programming Languages (Jan
2019). https://doi.org/10.1145/3290384, proc. ACM Program. Lang. 3, POPL, Ar-
ticle 71
12. Baumann, C., Schwarz, O., Dam, M.: Compositional verification of security prop-
erties for embedded execution platforms. In: PROOFS@CHES 2017, 6th Interna-
tional Workshop on Security Proofs for Embedded Systems, Taipei, Taiwan, Friday
September 29th, 2017. pp. 1–16 (2017), http://www.easychair.org/publications/paper/wkpS
13. Chong, N., Ishtiaq, S.: Reasoning about the ARM weakly consistent memory
model. In: MSPC (2008)
14. Clester, I.J., Bourgeat, T., Wright, A., Gruetter, S., Chlipala, A.: riscv-plv RISC-V
ISA formal specification. https://github.com/mit-plv/riscv-semantics (2019),
accessed 2019-07-01
15. Deacon, W.: The ARMv8 application level memory model. https://github.com/herd/herdtools7/blob/master/herd/libdir/aarch64.cat (2016), accessed 2019-07-01
16. Ferraiuolo, A., Baumann, A., Hawblitzel, C., Parno, B.: Komodo: Using verification
to disentangle secure-enclave hardware from software. In: Proceedings of the 26th
Symposium on Operating Systems Principles, Shanghai, China, October 28-31,
2017. pp. 287–305 (2017). https://doi.org/10.1145/3132747.3132782
17. Flur, S., French, J., Gray, K., Pulte, C., Sarkar, S., Sewell, P.: rmem. www.cl.cam.ac.uk/~pes20/rmem/ (2017)
18. Flur, S., Gray, K.E., Pulte, C., Sarkar, S., Sezgin, A., Maranget, L., Deacon, W.,
Sewell, P.: Modelling the ARMv8 architecture, operationally: Concurrency and
ISA. In: Proceedings of POPL: the 43rd ACM SIGPLAN-SIGACT Symposium on
Principles of Programming Languages (2016)
19. Flur, S., Sarkar, S., Pulte, C., Nienhuis, K., Maranget, L., Gray, K.E., Sezgin,
A., Batty, M., Sewell, P.: Mixed-size concurrency: ARM, POWER, C/C++11,
and SC. In: The 44th Annual ACM SIGPLAN-SIGACT Symposium on Prin-
ciples of Programming Languages, Paris, France. pp. 429–442 (Jan 2017).
https://doi.org/10.1145/3009837.3009839
20. Goel, S.: The x86isa books: Features, usage, and future plans. In: Pro-
ceedings 14th International Workshop on the ACL2 Theorem Prover and
its Applications, Austin, Texas, USA, May 22-23, 2017. pp. 1–17 (2017).
https://doi.org/10.4204/EPTCS.249.1, arXiv version: https://arxiv.org/abs/1705.01225
21. Goel, S., Hunt, W.A., Kaufmann, M., Ghosh, S.: Simulation and formal verifica-
tion of x86 machine-code programs that make system calls. In: Proceedings of the
14th Conference on Formal Methods in Computer-Aided Design. pp. 18:91–18:98.
FMCAD ’14, FMCAD Inc, Austin, TX (2014), http://dl.acm.org/citation.cfm?id=2682923.2682944
22. Gray, K.E., Kerneis, G., Mulligan, D., Pulte, C., Sarkar, S., Sewell, P.: An in-
tegrated concurrency and core-ISA architectural envelope definition, and test or-
acle, for IBM POWER multiprocessors. In: Proc. MICRO-48, the 48th Annual
IEEE/ACM International Symposium on Microarchitecture (Dec 2015)
23. Gu, R., Shao, Z., Chen, H., Wu, X.N., Kim, J., Sjöberg, V., Costanzo, D.: Cer-
tiKOS: An extensible architecture for building certified concurrent OS kernels.
In: 12th USENIX Symposium on Operating Systems Design and Implementation,
OSDI 2016, Savannah, GA, USA, November 2-4, 2016. pp. 653–669 (2016),
https://www.usenix.org/conference/osdi16/technical-sessions/presentation/gu
24. Gu, R., Shao, Z., Kim, J., Wu, X.N., Koenig, J., Sjöberg, V., Chen, H., Costanzo,
D., Ramananandro, T.: Certified concurrent abstraction layers. In: Proceedings
of the 39th ACM SIGPLAN Conference on Programming Language Design and
Implementation, PLDI 2018, Philadelphia, PA, USA, June 18-22, 2018. pp. 646–
661 (2018). https://doi.org/10.1145/3192366.3192381
25. Guanciale, R., Nemati, H., Dam, M., Baumann, C.: Provably secure memory iso-
lation for linux on ARM. Journal of Computer Security 24(6), 793–837 (2016).
https://doi.org/10.3233/JCS-160558
26. Intel Corporation: Intel 64 and ia-32 architectures software developer’s manual
combined volumes: 1, 2a, 2b, 2c, 2d, 3a, 3b, 3c, 3d and 4. https://software.intel.com/en-us/download/intel-64-and-ia-32-architectures-sdm-combined-volumes-1-2a-2b-2c-2d-3a-3b-3c-3d-and-4,
accessed 2019-06-30 (May 2019), 325462-070US
27. Klein, G., Andronick, J., Elphinstone, K., Murray, T., Sewell, T., Kolan-
ski, R., Heiser, G.: Comprehensive formal verification of an OS microker-
nel. ACM Transactions on Computer Systems 32(1), 2:1–2:70 (Feb 2014).
https://doi.org/10.1145/2560537
28. Kumar, R., Myreen, M.O., Norrish, M., Owens, S.: CakeML: a verified imple-
mentation of ML. In: The 41st Annual ACM SIGPLAN-SIGACT Symposium on
Principles of Programming Languages, POPL ’14, San Diego, CA, USA, January
20-21, 2014. pp. 179–192 (2014). https://doi.org/10.1145/2535838.2535841
29. Leroy, X.: A formally verified compiler back-end. J. Autom. Reasoning 43(4), 363–
446 (2009). https://doi.org/10.1007/s10817-009-9155-4
30. Lustig, D., Sethi, G., Martonosi, M., Bhattacharjee, A.: COATCheck: Verifying
memory ordering at the hardware-OS interface. SIGOPS Oper. Syst. Rev. 50(2),
233–247 (Mar 2016). https://doi.org/10.1145/2954680.2872399
31. Maranget, L., Sarkar, S., Sewell, P.: A tutorial introduction to the ARM and
POWER relaxed memory models. Draft available from http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf (2012)
32. Mulligan, D.P., Owens, S., Gray, K.E., Ridge, T., Sewell, P.: Lem: reusable engi-
neering of real-world semantics. In: Proceedings of ICFP 2014: the 19th ACM SIG-
PLAN International Conference on Functional Programming. pp. 175–188 (2014).
https://doi.org/10.1145/2628136.2628143
33. Myreen, M.O.: Verified just-in-time compiler on x86. In: Proceedings of the
37th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Program-
ming Languages. pp. 107–118. POPL ’10, ACM, New York, NY, USA (2010).
https://doi.org/10.1145/1706299.1706313
34. Nikhil, R.S., Sharma, N.N.: Forvis: A formal RISC-V ISA specification. https://github.com/rsnikhil/Forvis_RISCV-ISA-Spec (2019), accessed 2019-07-01
35. Owens, S., Sarkar, S., Sewell, P.: A better x86 memory model: x86-TSO. In: Pro-
ceedings of TPHOLs 2009: Theorem Proving in Higher Order Logics, LNCS 5674.
pp. 391–407 (2009)
36. Pulte, C.: The Semantics of Multicopy Atomic ARMv8 and RISC-V. Ph.D. thesis,
University of Cambridge (2019), https://doi.org/10.17863/CAM.39379
37. Pulte, C., Flur, S., Deacon, W., French, J., Sarkar, S., Sewell, P.: Simplifying ARM
Concurrency: Multicopy-atomic Axiomatic and Operational Models for ARMv8.
In: Proceedings of the 45th ACM SIGPLAN Symposium on Principles of Program-
ming Languages (Jan 2018). https://doi.org/10.1145/3158107
38. Raad, A., Wickerson, J., Neiger, G., Vafeiadis, V.: Persistency seman-
tics of the Intel-x86 architecture. PACMPL 4(POPL), 11:1–11:31 (2020).
https://doi.org/10.1145/3371079
39. Raad, A., Wickerson, J., Vafeiadis, V.: Weak persistency semantics from the
ground up: Formalising the persistency semantics of ARMv8 and transactional
models. Proc. ACM Program. Lang. 3(OOPSLA), 135:1–135:27 (Oct 2019).
https://doi.org/10.1145/3360561
40. Reid, A.: Trustworthy specifications of ARM v8-A and v8-M system level archi-
tecture. In: FMCAD 2016. pp. 161–168 (October 2016), https://alastairreid.github.io/papers/fmcad2016-trustworthy.pdf
41. Reid, A.: ARM releases machine readable architecture specification. https://alastairreid.github.io/ARM-v8a-xml-release/ (Apr 2017)
42. Reid, A., Chen, R., Deligiannis, A., Gilday, D., Hoyes, D., Keen, W., Pathirane,
A., Shepherd, O., Vrabel, P., Zaidi, A.: End-to-end verification of processors with
ISA-Formal. In: Chaudhuri, S., Farzan, A. (eds.) Computer Aided Verification -
28th International Conference, CAV 2016, Toronto, ON, Canada, July 17-23, 2016,
Proceedings, Part II. Lecture Notes in Computer Science, vol. 9780, pp. 42–58.
Springer (2016)
654 B. Simner et al.

43. Romanescu, B., Lebeck, A., Sorin, D.J.: Address translation aware
memory consistency. IEEE Micro 31(1), 109–118 (Jan 2011).
https://doi.org/10.1109/MM.2010.99
44. Romanescu, B.F., Lebeck, A.R., Sorin, D.J.: Specifying and dynamically verifying
address translation-aware memory consistency. In: Proceedings of the Fifteenth
Edition of ASPLOS on Architectural Support for Programming Languages and
Operating Systems. pp. 323–334. ASPLOS XV, ACM, New York, NY, USA (2010).
https://doi.org/10.1145/1736020.1736057
45. Sarkar, S., Memarian, K., Owens, S., Batty, M., Sewell, P., Maranget, L.,
Alglave, J., Williams, D.: Synchronising C/C++ and POWER. In: Pro-
ceedings of PLDI 2012, the 33rd ACM SIGPLAN conference on Program-
ming Language Design and Implementation (Beijing). pp. 311–322 (2012).
https://doi.org/10.1145/2254064.2254102
46. Sarkar, S., Sewell, P., Alglave, J., Maranget, L., Williams, D.: Understanding
POWER multiprocessors. In: Proceedings of PLDI 2011: the 32nd ACM SIGPLAN
conference on Programming Language Design and Implementation. pp. 175–186
(2011). https://doi.org/10.1145/1993498.1993520
47. Sarkar, S., Sewell, P., Zappa Nardelli, F., Owens, S., Ridge, T., Braibant,
T., Myreen, M., Alglave, J.: The semantics of x86-CC multiprocessor machine
code. In: Proceedings of POPL 2009: the 36th annual ACM SIGPLAN-SIGACT
symposium on Principles of Programming Languages. pp. 379–391 (Jan 2009).
https://doi.org/10.1145/1594834.1480929
48. Sewell, P., Sarkar, S., Owens, S., Zappa Nardelli, F., Myreen, M.O.: x86-TSO: A
rigorous and usable programmer’s model for x86 multiprocessors. Communications
of the ACM 53(7), 89–97 (Jul 2010), (Research Highlights)
49. Syeda, H., Klein, G.: Reasoning about translation lookaside buffers. In: LPAR-21,
21st International Conference on Logic for Programming, Artificial Intelligence and
Reasoning, Maun, Botswana, May 7-12, 2017. pp. 490–508 (2017), http://www.ea
sychair.org/publications/paper/340347
50. Syeda, H.T., Klein, G.: Program verification in the presence of cached address
translation. In: Interactive Theorem Proving - 9th International Conference, ITP
2018, Held as Part of the Federated Logic Conference, FloC 2018, Oxford, UK,
July 9-12, 2018, Proceedings. pp. 542–559 (2018). https://doi.org/10.1007/978-3-
319-94821-8_32
51. Tan, Y.K., Myreen, M.O., Kumar, R., Fox, A.C.J., Owens, S., Norrish, M.:
The verified CakeML compiler backend. J. Funct. Program. 29, e2 (2019).
https://doi.org/10.1017/S0956796818000229
52. Waterman, A., Asanović, K. (eds.): The RISC-V Instruction Set Manual Vol-
ume I: Unprivileged ISA (Dec 2018), document Version 20181221-Public-Review-
draft. Contributors: Arvind, Krste Asanović, Rimas Avižienis, Jacob Bachmeyer,
Christopher F. Batten, Allen J. Baum, Alex Bradbury, Scott Beamer, Preston
Briggs, Christopher Celio, Chuanhua Chang, David Chisnall, Paul Clayton, Palmer
Dabbelt, Roger Espasa, Shaked Flur, Stefan Freudenberger, Jan Gray, Michael
Hamburg, John Hauser, David Horner, Bruce Hoult, Alexandre Joannou, Olof
Johansson, Ben Keller, Yunsup Lee, Paul Loewenstein, Daniel Lustig, Yatin Man-
erkar, Luc Maranget, Margaret Martonosi, Joseph Myers, Vijayanand Nagarajan,
Rishiyur Nikhil, Jonas Oberhauser, Stefan O’Rear, Albert Ou, John Ousterhout,
David Patterson, Christopher Pulte, Jose Renau, Colin Schmidt, Peter Sewell,
Susmit Sarkar, Michael Taylor, Wesley Terpstra, Matt Thomas, Tommy Thorn,
Caroline Trippel, Ray VanDeWalker, Muralidaran Vijayaraghavan, Megan Wachs,
ARMv8-A system semantics: instruction fetch in relaxed architectures 655

Andrew Waterman, Robert Watson, Derek Williams, Andrew Wright, Reinoud


Zandijk, and Sizhuo Zhang

Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/
4.0/), which permits use, sharing, adaptation, distribution and reproduction in any
medium or format, as long as you give appropriate credit to the original author(s) and
the source, provide a link to the Creative Commons license and indicate if changes
were made.
The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will need
to obtain permission directly from the copyright holder.
Higher-Ranked Annotation Polymorphic Dependency Analysis

Fabian Thorand and Jurriaan Hage

Dept. of Information and Computing Sciences, Utrecht University, The Netherlands
[email protected], [email protected]

Abstract. The precision of a static analysis can be improved by increasing the context-sensitivity of the analysis. In a type-based formulation of static analysis for functional languages this can be achieved by, e.g., introducing let-polyvariance or subtyping. In this paper we go one step further by defining a higher-ranked polyvariant type system so that even properties of lambda-bound identifiers can be generalized over. We do this for dependency analysis, a generic analysis that can be instantiated to a range of different analyses that in this way all can profit. We prove that our analysis is sound with respect to a call-by-name semantics and that it satisfies a so-called noninterference property. We provide a type reconstruction algorithm that we have proven to be terminating, and sound and complete with respect to its declarative specification. Our principled description can serve as a blueprint for making other analyses higher-ranked.

1 Introduction

The typical compiler for a statically typed functional language will perform a number of analyses for validation, optimisation, or both (e.g., strictness analysis, control-flow analysis, and binding time analysis). These analyses can be specified as a type-based static analysis so that vocabulary, implementation and concepts from the world of type systems can be reused in this setting [19,24]. In that setting the analysis properties are taken from a language of annotations which adorn the types computed for the program during type inference: the analysis is specified as an annotated type system, and the payload of the analysis corresponds to the annotations computed for a given program.

Consider for example binding-time analysis [5,7]. In this case, we have a two-value lattice of annotations containing S for static and D for dynamic (where ⊥ = S ⊑ D = ⊤, so that whenever an expression is annotated with S, it can be soundly changed to D, because that is a strictly weaker property). An expression that is known to be static may be evaluated at compile time, because the analysis has determined that all the values that determine its outcome are in fact available at compile-time, while all other expressions are annotated with D and must be evaluated at run-time; the goal of binding-time analysis is then to (soundly) assign S to as many expressions as possible.
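As a concrete illustration, here is a minimal Haskell sketch of such a two-point lattice; the class and all names below are ours, not part of the formal development, and are reused by later sketches.

    -- A bounded join-semilattice: an associative, commutative,
    -- idempotent join with a least element.
    class JoinSemiLattice a where
      bot  :: a
      join :: a -> a -> a

    -- The two-point binding-time lattice: S (static) below D (dynamic).
    data BindingTime = S | D deriving (Eq, Show)

    instance JoinSemiLattice BindingTime where
      bot      = S       -- S is the least, i.e. strongest, property
      join S S = S
      join _ _ = D       -- anything joined with D is D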
Static analyses may differ in precision, e.g., a monovariant binding-time analysis lacks context-sensitivity for let-bound identifiers (although some of it can be recovered with subtyping). Assuming id to be the identity function, if in the program

let id x = x in ⋯ id s ⋯ id d ⋯

the subexpression s is a statically known integer, which we denote as s : int⟨S⟩, and d : int⟨D⟩ a dynamic integer, then for id we arrive at int⟨D⟩ → int⟨D⟩, so that the property found for id s is that it is a dynamic integer. Clearly, however, if the value of s is known statically then so is that of id s! The fact that values with different properties flow to a function and we have to be (overly) pessimistic for some of these is a phenomenon sometimes called poisoning [28]. Context-sensitivity reduces poisoning; it can be achieved by making the analysis polyvariant. In that case, our type for id may become ∀β.int⟨β⟩ → int⟨β⟩, so that for the first call to id we may instantiate β with S and for the second choose D, essentially mimicking the polymorphic lambda-calculus at the level of annotations.
But what about a function like

foo = λf. (f d, f s)

in which we have two calls to a lambda-bound function argument f? Can we treat these context-sensitively as well, so that we can have the most precise types for both calls, independent of each other? The answer is: yes, we can.

Independence can be achieved by inferring for foo a type that associates with f an annotation-polymorphic type,

∀β1. (∀β0. int⟨β0⟩ → int⟨β1 β0⟩)
Here, β0 ranges over simple annotations (such as S and D), and β1 ranges over annotation-level functions (in the terminology of this paper, these annotations are higher-sorted; see section 3). The annotation variable β0 is a placeholder for the analysis property of the actual argument to f, while β1 represents how that property propagates to the value returned by f. If the identity function ∀β.int⟨β⟩ → int⟨β⟩ is passed to foo, a pair with annotated type int⟨D⟩ × int⟨S⟩ will be returned. This is because the types of f d and f s can be determined independently of each other, because the choice for β0 can be made separately for each call. The "price" we pay is that we have to know how the annotations on the values returned by f can be derived from the annotations on the arguments. This is exactly what β1 represents.

If β0 or β1 were to range over (annotated) types, then the underlying language itself would be higher-ranked, and inference in that case is known to be undecidable [14]. However, as we show in this paper, if they range only over annotations (even higher-sorted ones), then inference may become decidable again. Why is that? Intuitively, this is because the underlying types provide structure for the analysis inference algorithm, while a higher-ranked polymorphic type system does not have this advantage.
In which situations can we expect to benefit from higher-ranked polyvariance? Generally speaking, this is when we have functions of order 2 and higher,
functions that often show up in idiomatic functional code.
Languages like Haskell do support higher-rank types [13]. Decidability is
not problematic then, because the compiler expects the programmer to provide
the higher-rank type signatures where necessary, and the compiler only needs
to verify that the provided types are consistent: type checking is decidable. In
our situation this is typically not acceptable: we cannot expect programmers
to provide explicit control-flow [12] or binding-time information. So we have to
insist on full inference of analysis information, and this paper shows how this
can be done for dependency analysis [1].
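For readers who know GHC, the following type-level analogy may help; the example is ours. To use an argument at two different types, Haskell needs an explicit rank-2 signature, which the compiler can check but will not infer; our setting needs the analogous generalization at the annotation level, but with full inference.

    {-# LANGUAGE RankNTypes #-}

    -- f is used at two different types; without the explicit rank-2
    -- signature, GHC cannot type this function.
    pairApply :: (forall a. a -> a) -> (Int, Bool)
    pairApply f = (f 0, f True)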
Dependency analysis is in fact a family of analyses; instances include binding-
time analysis, exception analysis, secure information flow analysis and static
slicing. The precision of our higher-ranked polyvariant annotated type system
for dependency analysis thereby carries over immediately to the instances, and
metatheoretical properties we prove, like a noninterference theorem [8], need to
be proven only once.
In summary, this paper offers the following contributions. We (1) define a
higher-ranked annotation polymorphic type system for a generic dependency
analysis (section 4) for a call-by-name language that takes its annotations from
a simply typed lambda-calculus enriched with lattice operations (section 3). The
analysis also supports polyvariant recursion [10] to improve precision for certain
recursive functions. Due to the principled way in which the analysis is set-up it
can serve as a blueprint for giving other analyses the same treatment. We (2)
prove our system sound with respect to a call-by-name operational semantics. We
also formulate and prove a noninterference theorem for our system (section 5).
We (3) give a type reconstruction algorithm that is sound and complete with
respect to the type system (section 6) and provide a prototype implementation
(section 7). For reasons of space we omit many details that are available in a
separate document [26].

2 Intuition and motivation

Before we go on to the technical details of this paper, we want to elaborate upon our intuitive description from the introduction. We do this by means of a few small examples, keeping the discussion informal. Formally discussed examples, as generated by our implementation, become big and hard to read pretty quickly; these can be found in section 7.
We start with a few examples in which binding-time analysis is the depen-
dency analysis instance, followed by a few examples that use security flow anal-
ysis; our implementation supports both instances. We note that our implemen-
tation supports a few more language constructs than the formal specification
given in this paper, giving us a bit more flexibility. Neither, however, supports
polymorphism at the type level. This substantially simplifies the technicalities.
For the following example
foo : ((int → int) → int) → int × int
foo = λf : (int → int) → int. (f (λx : int. x), f (λx : int. 0))

our analysis can derive a higher-ranked polyvariant type for f,

∀β1. (∀β2. int⟨β2⟩ → int⟨β1 β2⟩) → int⟨β3 β1⟩
where β1 and β2 can be instantiated independently for each of the two calls to
f in foo, and β3 is universally bound by foo and represents how the argument f
uses its function argument.
Since the argument to f is itself a function, the information that flows out
of, say, the first call to f can be independent of the analysis of the function
that flows into the second call (and vice versa), thereby avoiding unnecessary
poisoning. This means that the binding-time of, say, the second component of
the pair depends only on f and the function λx : int.0, irrespective of f also
receiving λx : int.x as argument to compute the first component.
For the next example, let us consider security flow analysis in which we have
annotations L and H that designate values (call these L-values and H-values)
of low respectively high confidentiality. An important scenario where additional
precision can be achieved is when analyzing Haskell code in which type classes
have been desugared to dictionary-passing functional core. A function like

g x y = (x + y, y + y)

is then transformed into something like g (+) x y = (x + y, y + y). Now, consider the case that we pass an H-value to x and an L-value to y; the operator
(+) produces an L-value if and only if both arguments are L-values. Without
higher-ranked annotations, the annotation on the first argument to (+) has to be
consistent with all uses of (+). Because x is an H-value, that will then also be the
case for the second call to (+), leading to a pair of values of which the components
are both H-values. With higher-ranked annotations, we can instantiate the two
instances independently, and the second component of the pair is analyzed to
produce an L-value. Functions in Haskell that use type classes are extremely
common.
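In Haskell terms, the desugaring alluded to above looks roughly as follows; the explicit-dictionary version gPlus is our own rendering, not taken from the paper.

    -- The overloaded source function ...
    g :: Num a => a -> a -> (a, a)
    g x y = (x + y, y + y)

    -- ... and its dictionary-desugared form: (+) is passed explicitly.
    -- With higher-ranked annotations the two calls to plus may use
    -- independent annotation instantiations, so the second component
    -- can stay an L-value even though x is an H-value.
    gPlus :: (a -> a -> a) -> a -> a -> (a, a)
    gPlus plus x y = (plus x y, plus y y)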

3 The λ⊔-calculus

An essential ingredient of our annotated type system is the language of annotations that we use to decorate our types and to represent the dependencies resulting from evaluating an expression. Indeed, the fact that annotations are in fact "programs" in a lambda calculus is what allows us to make our analysis a higher-ranked polyvariant one. For the purpose of this paper, we generalize the λ∪-calculus of [16] to the λ⊔-calculus (λ⊔ for short), a simply typed lambda calculus extended with a lattice structure.

The syntax of λ⊔ is given in figure 1; from now on, we refer to its types exclusively as sorts. Here, κ ranges over sorts, β over annotation variables, etc.
κ ∈ AnnSort ::= ⋆               (base sort)
              | κ1 ⇒ κ2         (function sort)
β ∈ AnnVar                      (annotation variables)
ξ ∈ AnnTm ::= β                 (variable)
            | λβ :: κ. ξ        (abstraction)
            | ξ1 ξ2             (application)
            | ℓ                 (lattice value, ℓ ∈ L)
            | ξ1 ⊔ ξ2           (lattice join operation)

Fig. 1: The syntax of the λ⊔-calculus, sorts and annotations

In order to avoid confusion with the field of (algebraic) effects, we refer to terms of λ⊔ as dependency terms or dependency annotations. Terms are either of base sort ⋆, representing values in the underlying lattice L, or of function sort κ1 ⇒ κ2. On the term level, we allow arbitrary elements of the underlying lattice and taking binary joins, in addition to the usual variables, function applications and lambda abstractions. Lattice elements are assumed to be taken from a bounded join-semilattice 𝓛, an algebraic structure ⟨L, ⊔⟩ consisting of an underlying set L and an associative, commutative and idempotent binary operation ⊔, called join (we usually write ℓ ∈ 𝓛 for ℓ ∈ L), and a least element ⊥.
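As a minimal sketch, and with constructor names of our choosing, the syntax of figure 1 transcribes into the following Haskell datatypes (reused by later sketches):

    type AnnVar = String

    -- Sorts: the base sort ⋆ and function sorts κ1 ⇒ κ2.
    data Sort = Base | Arrow Sort Sort
      deriving (Eq, Show)

    -- Dependency terms over a lattice element type l.
    data AnnTm l
      = Var AnnVar                 -- annotation variable β
      | Lam AnnVar Sort (AnnTm l)  -- λβ :: κ. ξ
      | App (AnnTm l) (AnnTm l)    -- ξ1 ξ2
      | Lit l                      -- lattice value ℓ
      | Join (AnnTm l) (AnnTm l)   -- ξ1 ⊔ ξ2
      deriving (Eq, Show)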
The sorting rules of λ⊔ are straightforward (see [26]). Values of the underlying lattice are always of sort ⋆, and the join operator is defined on arbitrary terms of the same sort:

    Σ ⊢s ξ1 : κ    Σ ⊢s ξ2 : κ
    ―――――――――――――――――――――――――― [S-Join]
    Σ ⊢s ξ1 ⊔ ξ2 : κ
The sorting rule uses sort environments, denoted by the letter Σ, that map annotation variables β to sorts κ. We denote the set of sort environments by SortEnv. More precisely, a sort environment or sort context Σ is a finite list of bindings from annotation variables β to sorts κ. The empty context is written as ∅ (in code as []), and the context Σ extended with the binding of the variable β to the sort κ is written Σ, β : κ. We denote the set of annotation variables in the context Σ with dom(Σ). When we write Σ(β) = κ, this means that β ∈ dom(Σ) and the rightmost occurrence of β binds it to κ. Moreover, Σ \ B, where B ⊆ AnnVar, denotes the context Σ where all bindings of annotation variables in B have been removed. In the remainder of this paper, we shall overload this notation for all kinds of other environments we shall be needing, including type environments and annotated type environments.

    V⋆      = L
    Vκ1⇒κ2  = {f : Vκ1 → Vκ2 | f monotone}

    ρ : AnnVar →fin ⋃{Vκ | κ ∈ AnnSort}

    ⟦β⟧ρ            = ρ(β)
    ⟦λβ :: κ1. ξ⟧ρ  = λv ∈ Vκ1. ⟦ξ⟧ρ[β↦v]
    ⟦ξ1 ξ2⟧ρ        = ⟦ξ1⟧ρ (⟦ξ2⟧ρ)
    ⟦ℓ⟧ρ            = ℓ
    ⟦ξ1 ⊔ ξ2⟧ρ      = ⟦ξ1⟧ρ ⊔ ⟦ξ2⟧ρ

Fig. 2: The semantics of the λ⊔-calculus
The λ⊔-calculus enjoys a number of properties, many of which are what one might expect; we have put these and their proofs in [26].
A substitution is a map from variables to terms usually denoted by the letter
θ. The application of a substitution θ to a term ξ is written θξ and replaces all
free variables in ξ that are also in the domain of θ with the corresponding terms
they are mapped to. A concrete substitution replacing the variables β1 , . . . , βn
with terms ξ1 , . . . , ξn is written [ξ1 /β1 , . . . , ξn /βn ].
Assuming the usual definitions for the pointwise extension of a lattice L, and for monotone (order-preserving) functions between lattices, Figure 2 shows the denotational semantics of λ⊔, where we employ the pointwise lifting of ⊔ to functions to give semantics to the join of λ⊔. The universe Vκ denotes the lattice that is represented by the sort κ. The base sort ⋆ represents the underlying lattice L and the function sort κ1 ⇒ κ2 represents the lattice constructed by pointwise extension of the lattice Vκ2, restricted to monotone functions.

The denotation function ⟦·⟧ρ is parameterized with an environment ρ of the given type that provides the values of variables. The denotation of a lambda term is simply an element of the corresponding function space. Applications are therefore mapped directly to the underlying function application of the metatheory. This is unlike the λ∪-calculus of [16], where lambda terms are mapped to singleton sets of functions and function application is defined in terms of the union of the results of individually applying each function. The crucial difference is that we have offloaded this complexity into the definition of the pointwise extension of lattices. It is therefore important to note that the join operator used in the denotation of a term ξ1 ⊔ ξ2 depends on the sort κ of this term and belongs to the lattice Vκ.
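As executable intuition for figure 2, here is a small evaluator over the datatypes sketched above; Val and all function names are ours, and for brevity we do not enforce the monotonicity restriction on function denotations.

    import qualified Data.Map as Map

    -- A denotation is a lattice element (base sort) or a function.
    data Val l = Lat l | Fn (Val l -> Val l)

    -- Join of denotations, lifted pointwise at function sorts.
    joinVal :: JoinSemiLattice l => Val l -> Val l -> Val l
    joinVal (Lat a) (Lat b) = Lat (a `join` b)
    joinVal (Fn f)  (Fn g)  = Fn (\v -> joinVal (f v) (g v))
    joinVal _       _       = error "ill-sorted join"

    eval :: JoinSemiLattice l => Map.Map AnnVar (Val l) -> AnnTm l -> Val l
    eval env (Var b)     = env Map.! b
    eval env (Lam b _ e) = Fn (\v -> eval (Map.insert b v env) e)
    eval env (App f a)   = case eval env f of
                             Fn g -> g (eval env a)
                             _    -> error "ill-sorted application"
    eval _   (Lit l)     = Lat l
    eval env (Join a b)  = joinVal (eval env a) (eval env b)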
An environment ρ : AnnVar →fin ⋃{Vκ | κ ∈ AnnSort} and a sort environment Σ are compatible if dom(Σ) = dom(ρ) and for all β ∈ dom(Σ) we have ρ(β) ∈ VΣ(β). Given two dependency terms ξ1 and ξ2 and a sort κ such that Σ ⊢s ξ1 : κ and Σ ⊢s ξ2 : κ, we say that ξ2 subsumes ξ1 under the environment Σ, written Σ ⊢sub ξ1 ⊑ ξ2, if for all environments ρ compatible with Σ, we have ⟦ξ1⟧ρ ⊑ ⟦ξ2⟧ρ. They are semantically equal under Σ, written Σ ⊨ ξ1 ≡ ξ2, if for all environments ρ compatible with Σ, we have ⟦ξ1⟧ρ = ⟦ξ2⟧ρ.

4 The declarative type system

The types and syntax of our source language are given in figure 3. The types
of our source language consist of a unit type, and product, sum and function
types. As mentioned earlier, let-polymorphism at the type level is not part of the
τ ∈ Ty ::= unit                  (unit type)
         | τ1 + τ2               (sum type)
         | τ1 × τ2               (product type)
         | τ1 → τ2               (function type)
t ∈ Tm ::= x                     (variable)
         | ()                    (unit constructor)
         | λx : τ. t             (abstraction)
         | t1 t2                 (application)
         | (t1, t2)              (pair constructor)
         | proji(t)              (pair projections)
         | inlτ2(t) | inrτ1(t)   (sum constructors)
         | case t of {inl(x) → t1; inr(y) → t2}   (sum eliminator)
         | μx : τ. t             (fixpoint)
         | seq t1 t2             (forcing)
         | annℓ(t)               (raise annotation level to ℓ ∈ L)

Fig. 3: The types and terms of the source language

type system. The language itself is then hardly surprising and includes variables, a unit constant, lambda abstraction, function application, projection functions for product types, sum constructors, a sum eliminator (case), fixpoints, seq for explicitly forcing evaluation in our call-by-name language, and, finally, a special operation annℓ(t) that raises the annotation level of t to ℓ. We omit the underlying type system for the source language since it consists mostly of the standard rules (see [26]). A notable exception is the rule for annℓ(t). Such an explicitly annotated term has the same underlying type as t:

    Γ ⊢t t : τ
    ――――――――――――――――― [U-Ann]
    Γ ⊢t annℓ(t) : τ

The annotation ℓ imposed on t only becomes relevant in the annotated type system that we discuss next. In the following, we assume the usual definitions for computing the set of free term variables of a term, ftv(t).
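For reference, a sketch of the source syntax of figure 3 as a Haskell datatype; the constructor names are ours.

    data Ty = TUnit | TSum Ty Ty | TProd Ty Ty | TFun Ty Ty
      deriving (Eq, Show)

    data Tm l
      = TmVar String
      | TmUnit
      | TmLam String Ty (Tm l)
      | TmApp (Tm l) (Tm l)
      | TmPair (Tm l) (Tm l)
      | TmProj Int (Tm l)                  -- proj_i with i in {1,2}
      | TmInl Ty (Tm l) | TmInr Ty (Tm l)  -- the Ty is the other summand
      | TmCase (Tm l) String (Tm l) String (Tm l)
      | TmMu String Ty (Tm l)              -- fixpoint
      | TmSeq (Tm l) (Tm l)
      | TmAnn l (Tm l)                     -- raise annotation level
      deriving (Eq, Show)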

The annotated type system The source language is simply a desugared variant of the functional language a programmer deals with. The target language has the same structure, but adds dependency annotations to the source syntax. These annotations are the payload of the dependency analysis and are computed by the algorithm given in section 6, so that the analysis results can be employed in the back-end of a compiler. In other words, the algorithm elaborates a source-level term into a target term.

The syntax of the target language is shown in figure 4. Annotated types of the target language are denoted by τ̂ and annotated terms are denoted by t̂. The annotations that we put on compound types, as well as their components, are not just there for uniformity. Because of our non-strict semantics and the

τ̂ ∈ T̂y ::= ∀β :: κ. τ̂          (annotation quantification)
         | unit                 (unit type)
         | τ̂1⟨ξ1⟩ + τ̂2⟨ξ2⟩      (sum type)
         | τ̂1⟨ξ1⟩ × τ̂2⟨ξ2⟩      (product type)
         | τ̂1⟨ξ1⟩ → τ̂2⟨ξ2⟩      (function type)

t̂ ∈ T̂m ::= ···
         | λx : τ̂ & ξ. t̂        (abstraction)
         | μx : τ̂ & ξ. t̂        (fixpoint)
         | ···
         | Λβ :: κ. t̂           (dependency abstraction)
         | t̂ ⟨ξ⟩                (dependency application)

Fig. 4: The annotated types and terms of the target language

presence of seq, we can observe the effects on a pair constructor independently of its values, so we have separate annotations to represent these.

On the type level, there is an additional construct ∀β :: κ. τ̂ quantifying over an annotation variable β of sort κ. Furthermore, the recursive occurrences in the sum, product and arrow types now each carry an annotation. On the term level, the explicit type annotations of lambda expressions and fixpoints are now annotated types and also include a dependency annotation. Moreover, dependency abstraction and application have been added to reflect the quantification of dependency variables on the type level. We denote the set of free (term) variables in a target term t̂ by ftv(t̂).

The formal definition of well-formedness for annotated types can be found in [26]. Informally, a type is well-formed only if all annotations are of sort ⋆ and all annotation variables that are used have previously been bound.

Below, we assume the unsurprising recursive definitions for computing the underlying terms ⌊t̂⌋ and underlying types ⌊τ̂⌋ that correspond to annotated terms t̂ and annotated types τ̂. We also straightforwardly extend the definition of free annotation variables to annotated types, and denote these by fav(τ̂).

Subtyping To define subtyping we need an auxiliary relation that says when two annotated types τ̂1 and τ̂2 have the same shape. The unsurprising formal definition is in [26], but essentially they have the same syntactic structure and, in the forall case, quantify over the same annotation variable. It can be quite easily proven that if two types have the same shape, then they have the same underlying type. This is not true the other way around: the annotated types ∀β1.∀β2.int⟨β1⟩ → int⟨β1 ⊔ β2⟩ and ∀β1.int⟨β1⟩ → int⟨β1⟩ have the same underlying type, int → int, but do not have the same shape.

Figure 5 shows the rules defining the subtyping relation on annotated types of the same shape, which allows us to weaken the annotations on a type to a less demanding one. Intuitively, a type τ̂1 is a subtype of τ̂2 under a sort environment
――――――――――――――― [Sub-Refl]
Σ ⊢sub τ̂ ⪯ τ̂

Σ ⊢sub τ̂1 ⪯ τ̂2    Σ ⊢sub τ̂2 ⪯ τ̂3
――――――――――――――――――――――――――――――― [Sub-Trans]
Σ ⊢sub τ̂1 ⪯ τ̂3

Σ, β :: κ ⊢sub τ̂1 ⪯ τ̂2
――――――――――――――――――――――――――――――― [Sub-Forall]
Σ ⊢sub ∀β :: κ. τ̂1 ⪯ ∀β :: κ. τ̂2

Σ ⊢sub τ̂1 ⪯ τ̂1′    Σ ⊢sub τ̂2 ⪯ τ̂2′    Σ ⊢sub ξ1 ⊑ ξ1′    Σ ⊢sub ξ2 ⊑ ξ2′
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― [Sub-Prod]
Σ ⊢sub τ̂1⟨ξ1⟩ × τ̂2⟨ξ2⟩ ⪯ τ̂1′⟨ξ1′⟩ × τ̂2′⟨ξ2′⟩

Σ ⊢sub τ̂1′ ⪯ τ̂1    Σ ⊢sub τ̂2 ⪯ τ̂2′    Σ ⊢sub ξ1′ ⊑ ξ1    Σ ⊢sub ξ2 ⊑ ξ2′
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― [Sub-Arr]
Σ ⊢sub τ̂1⟨ξ1⟩ → τ̂2⟨ξ2⟩ ⪯ τ̂1′⟨ξ1′⟩ → τ̂2′⟨ξ2′⟩

Fig. 5: Subtyping relation (Σ ⊢sub τ̂1 ⪯ τ̂2), [Sub-Sum] is like [Sub-Prod]

Σ, written Σ ⊢sub τ̂1 ⪯ τ̂2, if a value of type τ̂1 can be used in places where a value of type τ̂2 is required. The subtyping relation only relates the annotations inside the types using the subsumption relation Σ ⊢sub ξ1 ⊑ ξ2 between dependency terms. Moreover, the subtyping relation implicitly demands that both types are well-formed under the environment. The [Sub-Forall] rule requires that the quantified variable has the same name in both types. This is not a restriction, as we can simply rename the variables in one or both of the types accordingly in order to make them match and prevent unintentional capturing of previously free variables. Note that [Sub-Arr] is contravariant for argument positions. We omitted [Sub-Sum], which can be derived from [Sub-Prod] by replacing × with +.

The annotated type rules An annotated type environment Γ̂ is defined analogously to sort environments, but instead maps term variables x to pairs of an annotated type τ̂ and a dependency term ξ. We extend the definition of the set of free annotation variables to annotated environments by taking the union of the free annotation variables of all annotated types and dependency terms occurring in the environment, denoted by fav(Γ̂). We denote the set of annotated type environments by AnnTyEnv.

We now have all the definitions in place to define the declarative annotated type system shown in figure 6. It consists of judgments of the form Σ | Γ̂ ⊢te t̂ : τ̂ & ξ expressing that under the sort environment Σ and the annotated type environment Γ̂, the annotated term t̂ has the annotated type τ̂ and the dependency term ξ. The dependency term in this context is also called
the dependency term of t̂.¹ It is implicitly assumed that every type τ̂ is also well-formed under Σ, i.e. Σ ⊢wft τ̂, and that the resulting dependency annotation ξ is of sort ⋆, i.e. Σ ⊢s ξ : ⋆.
We now discuss some of the more interesting rules of figure 6. In [T-Var],
both the annotated type and the dependency annotation are looked up in the
environment. The dependency annotation of the unit value defaults to the least
annotation in [T-Unit]. While we could admit an arbitrary dependency anno-
tation here, the same can be achieved by using the subtyping rule [T-Sub]. We
employ this principle more often, e.g., in [T-Abs], and [T-Pair]. This essentially
means that the context in which such a term is used completely determines the
annotation.
The rule [T-App] may seem overly restrictive by requiring that the types
and dependency annotations of the arguments match, and that the dependency
annotations of the return value and the function itself are the same. However, in
combination with the subtyping rule [T-Sub], this effectively does not restrict
the analysis in any way. We see the same happening in other rules, such as
[T-Case] and [T-Proj]. Note that the dependency annotation of the argument
does not play a role in the resulting dependency annotation of the application.
This is because we are dealing with a call-by-name semantics, which means that
the argument is not necessarily evaluated before the function call. It should be
noted that this does not mean that the dependency annotations of arguments
are ignored completely. If the body of a function makes use of an argument, the
type system makes sure that its dependency annotation is also incorporated into
the result.
When constructing a pair (rule [T-Pair]), the dependency annotations of
the components are stored in the type while the pair itself is assigned the least
dependency annotation. When accessing a component of a pair (rule [T-Proj]),
we require that the dependency annotation of the pair matches the dependency
annotation of the projected component. Again, this is no restriction due to the
subtyping rule.
In [T-Inl/Inr], the argument to the injection constructor only determines
the type and annotation of one component of the sum type while the other
component can be chosen arbitrarily as long as the underlying type matches the
annotation on the constructor. The destruction of sum types happens in a case
statement that is handled by rule [T-Case]. Again, to keep the rule simple and
without loss of precision due to judicious use of rule [T-Sub], we may demand
that the types of both branches match, and that additionally the dependency
annotations of both branches and the scrutinee are equal.
The annotation rule [T-Ann] requires that the dependency annotation of the term being annotated is at least as large as the lattice element ℓ. In the fixpoint rule, [T-Fix], not only the types but also the dependency annotations of the term itself and the bound variable must match. Note that this rule also
¹ Following the literature on type and effect systems we would much like to use the term "effect" at this point, but decided to use a different term to avoid confusion with the literature on effect handlers.

Γ(x) = τ & ξ
[T-Var]
Σ | Γ te x : τ & ξ

[T-Unit]
Σ | Γ te () : unit
 &⊥

Σ | Γ, x : τ1 & ξ1 te t : τ2 & ξ2


[T-Abs]
Σ | Γ te λx : τ1 & ξ1 .t : τ1 ξ1  → τ2 ξ2  & ⊥

Σ | Γ te t1 : τ1 ξ1  → τ2 ξ2  & ξ2 Σ | Γ te t2 : τ1 & ξ1


[T-App]
Σ | Γ te t1 t2 : τ2 & ξ2

Σ | Γ te t1 : τ1 & ξ1 Σ | Γ te t2 : τ2 & ξ2


[T-Pair]
Σ | Γ te (t1 , t2 ) : τ1 ξ1  × τ2 ξ2  & ⊥

Σ | Γ te t : τ1 ξ1  × τ2 ξ2  & ξi


[T-Proj]
Σ | Γ te proj (t) : τi & ξi
i

Σ | Γ te t : τ1 & ξ1
[T-Inl]

Σ | Γ te inlτ2  (t) : τ1 ξ1  + τ2 ξ2  & ⊥

Σ | Γ te t : τ2 & ξ2
[T-Inr]
Σ | Γ te inrτ1  (t) : τ1 ξ1  + τ2 ξ2  & ⊥

Σ | Γ, x : τ1 & ξ1 te t1 : τ & ξ


Σ | Γ te t : τ1 ξ1  + τ2 ξ2  & ξ Σ | Γ, y : τ2 & ξ2 te t2 : τ & ξ
[T-Case]

Σ | Γ te case t of {inl(x ) → t1 ; inr(y) → t2 } : τ & ξ

Σ | Γ te t : τ & ξ Σ sub 
ξ
[T-Ann]

Σ | Γ te ann (t) : τ & ξ

Σ | Γ, x : τ & ξ te t : τ & ξ


[T-Fix]
Σ | Γ te μx : τ & ξ.t : τ & ξ

Σ | Γ te t1 : τ1 & ξ Σ | Γ te t2 : τ2 & ξ


[T-Seq]

Σ | Γ te seq t1 t2 : τ2 & ξ

Σ | Γ te t : τ & ξ  Σ sub τ  τ Σ sub ξ 


ξ
[T-Sub]
Σ | Γ te t : τ & ξ

Σ, β : κ | Γ te t : τ & ξ β ∈ fav(Γ) ∪ fav(ξ)


[T-AnnAbs]

Σ | Γ te Λβ :: κ.t : ∀β :: κ.
τ &ξ

Σ | Γ te t : ∀β :: κ.
τ &ξ Σ s ξ  : κ
[T-AnnApp]
Σ | Γ te t ξ   : [ξ  / β ]
τ &ξ
Fig. 6: Declarative annotated type system (Σ | Γ te 
t : τ & ξ)
v′ ∈ Nf′ ::= λx : τ̂ & ξ. t̂ | Λβ :: κ. t̂ | () | inlτ(t̂) | inrτ(t̂) | (t̂1, t̂2)
v ∈ Nf ::= v′ | annℓ(v′)

Fig. 7: Values in the target language

admits polyvariant recursion [23], since quantification can occur anywhere in an annotated type. Since seq t̂1 t̂2 forces the evaluation of its first argument, it requires that t̂1's dependency annotation is part of the final result. This is justified, because the result depends on the termination behavior of t̂1.

The subtyping rule [T-Sub] allows us to weaken the annotations nested inside a type through the subtyping relation (see figure 5), as well as the dependency annotation itself through the subsumption relation. The rule [T-AnnAbs] introduces an annotation variable β of sort κ in the body t̂ of the abstraction. The second premise ensures that the annotation variable does not escape the scope determined by the quantification on the type level. The annotation application rule [T-AnnApp] allows the instantiation of an annotation variable with an arbitrary well-sorted dependency term.

5 Metatheory

In this section we develop a noninterference proof for our declarative type system, based on a small-step operational call-by-name semantics for the target language. Figure 7 defines the values of the target language, i.e. those terms that cannot be further evaluated. Apart from a technicality related to annotations, they correspond exactly to the weak head normal forms of terms. The distinction Nf′ ⊂ Nf is made to ensure that there is at most one annotation at top level.

The semantics itself is largely straightforward, except for the handling of annotations. These are moved just as far outwards as necessary in order to reach a normal form, thereby computing the least "permission" an evaluator must possess for computing a certain output. Figure 8 shows two rules: a lifting rule (for applications) and the rule for merging adjacent annotations (see the supplemental material for the others).
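As a sketch, the two rules shown in figure 8 below correspond to the following clauses of a one-step reducer; the fragmentary TgtTm type and the helper isNf' are our own simplification, and the JoinSemiLattice class is reused from section 1.

    -- Just enough of the target terms for the two annotation rules.
    data TgtTm l
      = TApp (TgtTm l) (TgtTm l)
      | TAnn l (TgtTm l)
      | TOther                   -- stand-in for all other constructors

    -- Normal forms without a top-level annotation (Nf'); we simplify by
    -- treating every non-annotated term as such.
    isNf' :: TgtTm l -> Bool
    isNf' (TAnn _ _) = False
    isNf' _          = True

    step :: JoinSemiLattice l => TgtTm l -> Maybe (TgtTm l)
    -- [E-LiftApp]: float the function's annotation out of the application
    step (TApp (TAnn l v) t2)  | isNf' v = Just (TAnn l (TApp v t2))
    -- [E-JoinAnn]: merge adjacent annotations by joining their levels
    step (TAnn l1 (TAnn l2 v)) | isNf' v = Just (TAnn (l1 `join` l2) v)
    step _ = Nothing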
v′ ∈ Nf′
――――――――――――――――――――――――――― [E-LiftApp]
(annℓ(v′)) t̂2 → annℓ(v′ t̂2)

v′ ∈ Nf′
――――――――――――――――――――――――――――――――― [E-JoinAnn]
annℓ1(annℓ2(v′)) → annℓ1⊔ℓ2(v′)

Fig. 8: Small-step semantics (t̂ → t̂′) (excerpt)

In the remainder of this section we state the standard progress and subject reduction theorems that ensure that our small-step semantics is compatible with
the annotated type system. The following progress theorem demonstrates that any well-typed term is in normal form, or an evaluation step can be performed.

Theorem 1 (Progress). If ∅ | ∅ ⊢te t̂ : τ̂ & ξ, then either t̂ ∈ Nf or there is a t̂′ such that t̂ → t̂′.

The subject reduction property says that the reduction of a well-typed term results in a term of the same type.

Theorem 2 (Subject Reduction). If ∅ | ∅ ⊢te t̂ : τ̂ & ξ and there is a t̂′ such that t̂ → t̂′, then ∅ | ∅ ⊢te t̂′ : τ̂ & ξ.

As expected, subject reduction extends naturally to a sequence of reductions by induction on the length of the reduction sequence:

Corollary 1. If we have ∅ | ∅ ⊢te t̂ : τ̂ & ξ and t̂ →∗ v, then ∅ | ∅ ⊢te v : τ̂ & ξ.

where, as usual, we write t̂ →∗ v if there is a finite sequence of terms (t̂i)0≤i≤n with t̂0 = t̂ and t̂n = v ∈ Nf and reductions (t̂i → t̂i+1)0≤i<n between them. If there is no such sequence, this is denoted by t̂ ⇑ and t̂ is said to diverge.

Finally, if a term evaluates to an annotated value, this annotation is compatible with the dependency annotation that has been assigned to the term:

Theorem 3 (Semantic Soundness). If we have ∅ | ∅ ⊢te t̂ : τ̂ & ξ and t̂ →∗ annℓ(v′), then ∅ ⊢sub ℓ ⊑ ξ.

The noninterference property An important theorem for the safety of program transformations/optimizations using the results of dependency analysis is noninterference. It guarantees that if there is a target term t̂ depending on some variable x such that ∅ | x : τ̂′ & ξ′ ⊢te t̂ : τ̂ & ξ holds and the dependency annotation ξ′ of the variable is not encompassed by the resulting dependency annotation ξ (i.e. ∅ ⊬sub ξ′ ⊑ ξ), then t̂ will always evaluate to the same normal form, regardless of the value of x.

Since we are in a non-strict setting, our noninterference property only applies to the topmost constructors of values. This is because the dependency annotations derived in the annotated type system only provide information about the evaluation to weak head normal form. Nested terms might possess lower as well as higher classifications. In particular, the subterms with greater dependency annotations than their enclosing constructors prevent us from making a more general statement, because those can still depend on the context whereas the top-level constructor cannot. In the noninterference theorem presented for the SLam calculus, this problem is circumvented by restricting the statement to so-called transparent types, where the annotations of nested components are decreasing when moving further inward [9].

In the following we consider two normal forms v1, v2 ∈ Nf to be similar, denoted v1 ≈ v2, if their top level constructors (and annotations, if present) match (see the supplemental material for the unsurprising definition of ≈). So, v1 ≈ v2 implies that these two values are indistinguishable without further evaluation, which is the property guaranteed by the noninterference theorem.
Theorem 4 (Noninterference). Let t̂ be a target term such that ∅ | x : τ̂′ & ξ′ ⊢te t̂ : τ̂ & ξ and ∅ ⊬sub ξ′ ⊑ ξ. Let v be a value.
If there is a t̂1 with ∅ | ∅ ⊢te t̂1 : τ̂′ & ξ′ such that [t̂1/x]t̂ →∗ v, then there is a t̂′ such that for all t̂2 with ∅ | ∅ ⊢te t̂2 : τ̂′ & ξ′ we have [t̂2/x]t̂ →∗ [t̂2/x]t̂′ and [t̂1/x]t̂′ ≈ [t̂2/x]t̂′.


The noninterference proofs crucially rely on the fact that the source term
is well-typed, and the additional assumption ∅ sub ξ   ξ stating that the
dependency annotation of the variable in the context is not encompassed by the
dependency annotation of the term being evaluated.
By introducing the restriction to transparent types, we can recover the no-
tion of noninterference used for the SLam calculus. For example, if we have a
transparent type τ1 ξ1  × τ2 ξ2  & ξ (i.e. ∅ sub ξ1  ξ and ∅ sub ξ2  ξ) and
∅ sub ξ   ξ holds, then we also know ∅ sub ξ   ξ1 and ∅ sub ξ   ξ2 . Other-
wise, we would get ∅ sub ξ   ξ by transitivity, contradicting the assumption.
This means all prerequisites of the noninterference theorem are still fulfilled.
Hence, it is possible in these cases to apply the noninterference theorem to
the nested (possibly unevaluated) subterms of a constructor in weak head normal
form. As in the work of [1], our noninterference theorem is restricted to deal with
terms depending on exactly one variable.

6 The type reconstruction algorithm

Modularity considerations When designing the type reconstruction algorithm we have two goals: it should be a conservative extension of the underlying type system, and types assigned by the analysis should be as general as possible. Concretely, a function's type must be general enough to be able to adapt to arguments with arbitrary annotations. These two goals give rise to the notion of fully flexible and fully parametric types defined by [12]. [16] calls these types conservative and pattern types, respectively. Informally, an annotated type is a pattern type if it can be instantiated to any conservative type of the same shape, and a conservative type is an analysis of an expression that is able to cope with any arguments it might depend on. These types are conservative in the sense that they make the least assumptions about their arguments and therefore are a conservative estimate compared to other typings with fewer degrees of freedom.

For a pattern type to be instantiable to any conservative type, we first need to make sure that all dependency annotations occurring in it can be instantiated to the corresponding dependency terms in a matching conservative type. This leads to the following definition of a pattern in the λ⊔-calculus. It is based on the similar definition by [16], which in turn is a special case of a pattern in higher-order unification theory [4,21]. A λ⊔-term is a pattern if it is of the form f β1 ⋯ βn where f is a free variable and β1, …, βn are distinct bound variables. A unification problem of the form ∀β1 ⋯ βn. f β1 ⋯ βn = ξ where the left-hand side is a pattern is called pattern unification. A pattern unification problem ∀β1 ⋯ βn. f β1 ⋯ βn = ξ has a unique most general solution, namely the substitution [f ↦ λβ1. ⋯ λβn. ξ] [4].
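Computationally, the most general solution is obtained by abstracting the right-hand side over the bound variables; a minimal sketch over the AnnTm datatype from section 3 (the function name is ours):

    -- Solve  ∀β1⋯βn. f β1 ⋯ βn = ξ  by returning  f ↦ λβ1.⋯λβn. ξ.
    solvePattern :: AnnVar -> [(AnnVar, Sort)] -> AnnTm l -> (AnnVar, AnnTm l)
    solvePattern f binders rhs =
      (f, foldr (\(b, k) body -> Lam b k body) rhs binders)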
β ∉ αi
―――――――――――――――――――――――――――――――――――――――――― [P-Unit]
αi :: καi ⊢p unit & β αi ⇝ β :: καi ⇒ ⋆

αi :: καi ⊢p τ̂1 & ξ1 ⇝ βj :: κβj    αi :: καi ⊢p τ̂2 & ξ2 ⇝ γk :: κγk
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― [P-Prod]
αi :: καi ⊢p τ̂1⟨ξ1⟩ × τ̂2⟨ξ2⟩ & β αi ⇝ β :: καi ⇒ ⋆, βj :: κβj, γk :: κγk

∅ ⊢p τ̂1 & ξ1 ⇝ βj :: κβj    αi :: καi, βj :: κβj ⊢p τ̂2 & ξ2 ⇝ γk :: κγk
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― [P-Arr]
αi :: καi ⊢p ∀βj :: κβj. τ̂1⟨ξ1⟩ → τ̂2⟨ξ2⟩ & β αi ⇝ β :: καi ⇒ ⋆, γk :: κγk

Fig. 9: Pattern types (Σ ⊢p τ̂ & ξ ⇝ Σ′), where β ∉ αi, βj, γk, and [P-Sum] is like [P-Prod]

The definition of a pattern is then extended to annotated types using the rules from figure 9. Our definition is more precise than the one from previous work in that it makes explicit which variables are expected to be bound and which are free. We require that all variables with different names in the definition of these rules are distinct from each other.

An annotated type and dependency pair τ̂ & ξ is a pattern type under the sort environment Σ if the judgment Σ ⊢p τ̂ & ξ ⇝ Σ′ holds for some Σ′. We call the variables in Σ argument variables and the variables in Σ′ pattern variables.

Example 1. A simple pattern type with the pattern variables β :: ⋆ ⇒ ⋆ and β′ :: ⋆ ⇒ ⋆ ⇒ ⋆ is

∀β1 :: ⋆. unit⟨β1⟩ → (∀β2 :: ⋆. unit⟨β2⟩ → unit⟨β′ β1 β2⟩)⟨β β1⟩

Note that since β1 is quantified on the function arrow chain, it is passed on to the second function arrow. However, it is not propagated into the second argument. In general, annotations on the return type may depend on the annotations of all previous arguments while annotations of the arguments may not. This prevents any dependency between the annotations of arguments and guarantees that they are as permissive as possible. This is also why pattern variables in a covariant position are passed on to the next higher level while pattern variables in arguments are quantified in the enclosing function arrow. This allows the caller of a function to instantiate the dependency annotations of the parameters to the actual arguments.

As we stated earlier, a conservative function type makes the least assumptions about its arguments. Formally, this means that arguments of conservative functions are pattern types. We will later see that a pattern type can be instantiated to any conservative type of the same shape. On the other hand, non-functional conservative types are not constrained in their annotations. These characteristics are captured by the following definition, based on conservative types [16] and fully flexible types [12].

An annotated type τ̂ is conservative if
β fresh
――――――――――――――――――――――――――――――――――――――――――――――― [C-Unit]
αi :: καi ⊢c unit : unit & β αi ⇝ β :: καi ⇒ ⋆

αi :: καi ⊢c τ1 : τ̂1 & ξ1 ⇝ βj :: κβj    αi :: καi ⊢c τ2 : τ̂2 & ξ2 ⇝ γk :: κγk
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― [C-Prod]
αi :: καi ⊢c τ1 × τ2 : τ̂1⟨ξ1⟩ × τ̂2⟨ξ2⟩ & β αi ⇝ β :: καi ⇒ ⋆, βj :: κβj, γk :: κγk

∅ ⊢c τ1 : τ̂1 & ξ1 ⇝ βj :: κβj    αi :: καi, βj :: κβj ⊢c τ2 : τ̂2 & ξ2 ⇝ γk :: κγk
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― [C-Arr]
αi :: καi ⊢c τ1 → τ2 : ∀βj :: κβj. τ̂1⟨ξ1⟩ → τ̂2⟨ξ2⟩ & β αi ⇝ β :: καi ⇒ ⋆, γk :: κγk

Fig. 10: Type completion (Σ ⊢c τ : τ̂ & ξ ⇝ Σ′), all β fresh, [C-Sum] is like [C-Prod]

1. τ̂ = unit, or
2. τ̂ = τ̂1⟨ξ1⟩ + τ̂2⟨ξ2⟩ and both τ̂1 and τ̂2 are conservative, or
3. τ̂ = τ̂1⟨ξ1⟩ × τ̂2⟨ξ2⟩ and both τ̂1 and τ̂2 are conservative, or
4. τ̂ = ∀βj :: κj. τ̂1⟨ξ1⟩ → τ̂2⟨ξ2⟩ and both (a) ∅ ⊢p τ̂1 & ξ1 ⇝ βj :: κj and (b) τ̂2 is conservative.

Moreover, an annotated type and dependency pair τ̂ & ξ is conservative if τ̂ is conservative, and an annotated type environment Γ̂ is conservative if for all x ∈ dom(Γ̂), Γ̂(x) is conservative.
The following type signature for the function f is a conservative type that takes the function type from example 1 as an argument.

f : ∀β :: ⋆ ⇒ ⋆. ∀β′ :: ⋆ ⇒ ⋆ ⇒ ⋆. ∀β3 :: ⋆.
      (∀β1 :: ⋆. unit⟨β1⟩ → (∀β2 :: ⋆. unit⟨β2⟩ → unit⟨β′ β1 β2⟩)⟨β β1⟩)⟨β3⟩
      → unit⟨β3 ⊔ β ⊥ ⊔ β′ ⊥ ⊥⟩ & ⊥

Note that the pattern variables of the argument have been bound in the top-level function type. This allows callers of f to instantiate these patterns.

We can extend the previous definition of pattern types to the type completion relation shown in figure 10. It relates every underlying type τ with a pattern type τ̂ such that τ̂ erases to τ. It is defined through judgments Σ ⊢c τ : τ̂ & ξ ⇝ Σ′ with the meaning that under the sort environment Σ, τ is completed to the annotated type τ̂ and the dependency annotation ξ containing the pattern variables Σ′. The completion relation can also be interpreted as a function taking Σ and τ as arguments and returning τ̂, ξ and Σ′.
Lastly, we revisit the examples from the previous sections and show how a pattern type can be mechanically derived from an underlying type.

In example 1 we presented a pattern type for the underlying type unit → unit → unit. Using the type completion relation, we can derive the pattern type

(∀β1. unit⟨β1⟩ → (∀β2. unit⟨β2⟩ → unit⟨β′ β1 β2⟩)⟨β β1⟩) & β3

without having to guess. This is because the components τ̂, ξ and Σ′ in a judgment Σ ⊢c τ : τ̂ & ξ ⇝ Σ′ are uniquely determined by Σ and τ from looking at the syntax alone. The resulting pattern type contains three pattern variables, β :: ⋆ ⇒ ⋆, β′ :: ⋆ ⇒ ⋆ ⇒ ⋆ and β3 :: ⋆. If the initial sort environment is empty, these are also the only free variables of the pattern type.
Based on the type completion relation we can define least type completions. These are conservative types that are subtypes of all other conservative types of the same shape. Therefore, all annotations occurring in positive positions on the top level function arrow chain must also be least. We do not need to consider arguments here because those are by definition equal up to alpha-conversion due to being pattern types. We define the least annotation term of sort κ as

⊥⋆ = ⊥
⊥κ1⇒κ2 = λβ : κ1. ⊥κ2

These least annotation terms correspond to the least elements of our bounded lattice for a given sort κ. This in turn leads us to the definition of the least completion ⊥τ of type τ (see figure 10) by substituting all free variables in the completion with the least annotation of the corresponding sort, i.e.

⊥τ = [⊥κi / βi] τ̂    for ∅ ⊢c τ : τ̂ & ξ ⇝ βi :: κi.
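The least annotation terms transcribe directly into Haskell, reusing the Sort/AnnTm sketches and the JoinSemiLattice class from earlier; bottomAnn is our name, and since the λ-bound variable is never used, a fixed name suffices.

    -- ⊥ at the base sort, the constant-bottom function at function sorts.
    bottomAnn :: JoinSemiLattice l => Sort -> AnnTm l
    bottomAnn Base          = Lit bot
    bottomAnn (Arrow k1 k2) = Lam "_beta" k1 (bottomAnn k2)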

The algorithm We can now move on to the type reconstruction algorithm that performs the actual analysis. At its core lies algorithm R shown in figure 11. The input of the algorithm is a triple (Γ̂, Σ, t) consisting of a well-typed source term t, an annotated type environment Γ̂ providing the types and dependency annotations of the free term variables in t, and a sort environment Σ mapping each free annotation variable in scope to its sort. It returns a triple t̂ : τ̂ & ξ consisting of an elaborated term t̂ in the target language (that erases to the source term t), an annotated type τ̂ and a dependency annotation ξ such that Σ | Γ̂ ⊢te t̂ : τ̂ & ξ holds. In the definition of R, to avoid clutter, we write Γ instead of Γ̂, because we are only dealing with one kind of type environment.

The algorithm relies on the invariant that all types in the type environment and the inferred type must be conservative. In the version of [16], all inferred dependency annotations (including those nested as annotations in types) had to be canonically ordered as well. But this canonically ordered form turned out not to be enough for deciding semantic equality, so we lifted this requirement. We still mark those places in the algorithm where canonicalization would have occurred with ⌈·⌉Σ, but the actual result of this operation does not matter as long as the dependency terms remain equivalent.

The algorithm for computing the least upper bound of types (⊔ in figure 12) requires that both types are conservative, have the same shape and use the same names for bound variables. The latter can be ensured by α-conversion, while the former two requirements are fulfilled by how this function is used in R.

The restriction to conservative types allows us to ignore function arguments, because these are always required to be pattern types, which are unique up to α-equivalence. This alleviates the need for computing a corresponding greatest lower bound of types, because the algorithm only traverses covariant positions.

R : AnnTyEnv × SortEnv × Tm → T̂m × T̂y × AnnTm

R(Γ; Σ; x)  = x : Γ(x)
R(Γ; Σ; ()) = () : unit & ⊥
R(Γ; Σ; annℓ(t)) =
  let t̂ : τ̂ & ξ = R(Γ; Σ; t)
  in annℓ(t̂) : τ̂ & ⌈ξ ⊔ ℓ⌉Σ
R(Γ; Σ; seq t1 t2) =
  let t̂1 : τ̂1 & ξ1 = R(Γ; Σ; t1)
      t̂2 : τ̂2 & ξ2 = R(Γ; Σ; t2)
  in seq t̂1 t̂2 : τ̂2 & ⌈ξ1 ⊔ ξ2⌉Σ
R(Γ; Σ; (t1, t2)) =
  let t̂1 : τ̂1 & ξ1 = R(Γ; Σ; t1)
      t̂2 : τ̂2 & ξ2 = R(Γ; Σ; t2)
  in (t̂1, t̂2) : τ̂1⟨ξ1⟩ × τ̂2⟨ξ2⟩ & ⊥
R(Γ; Σ; inlτ2(t)) =
  let t̂ : τ̂1 & ξ1 = R(Γ; Σ; t)
  in inlτ2(t̂) : τ̂1⟨ξ1⟩ + ⊥τ2⟨⊥⟩ & ⊥
R(Γ; Σ; inrτ1(t)) =
  let t̂ : τ̂2 & ξ2 = R(Γ; Σ; t)
  in inrτ1(t̂) : ⊥τ1⟨⊥⟩ + τ̂2⟨ξ2⟩ & ⊥
R(Γ; Σ; case t1 of {inl(x) → t2; inr(y) → t3}) =
  let t̂1 : τ̂⟨ξ⟩ + τ̂′⟨ξ′⟩ & ξ1 = R(Γ; Σ; t1)
      t̂2 : τ̂2 & ξ2 = R(Γ, x : τ̂ & ξ; Σ; t2)
      t̂3 : τ̂3 & ξ3 = R(Γ, y : τ̂′ & ξ′; Σ; t3)
  in case t̂1 of {inl(x) → t̂2; inr(y) → t̂3}
     : ⌈τ̂2 ⊔ τ̂3⌉Σ & ⌈ξ1 ⊔ ξ2 ⊔ ξ3⌉Σ
R(Γ; Σ; proji(t)) =
  let t̂ : τ̂1⟨ξ1⟩ × τ̂2⟨ξ2⟩ & ξ = R(Γ; Σ; t)
  in proji(t̂) : τ̂i & ⌈ξ ⊔ ξi⌉Σ
R(Γ; Σ; λx : τ1. t) =
  let τ̂1 & β ⇝ βi :: κi = C([]; τ1)
      Γ′ = Γ, x : τ̂1 & β
      Σ′ = Σ, βi :: κi
      t̂ : τ̂2 & ξ2 = R(Γ′; Σ′; t)
  in Λβi :: κi. λx : τ̂1 & β. t̂
     : ∀βi :: κi. τ̂1⟨β⟩ → τ̂2⟨ξ2⟩ & ⊥
R(Γ; Σ; t1 t2) =
  let t̂1 : τ̂1 & ξ1 = R(Γ; Σ; t1)
      t̂2 : τ̂2 & ξ2 = R(Γ; Σ; t2)
      τ̂2′⟨β⟩ → τ̂⟨ξ⟩ ⇝ βi = I(τ̂1)
      θ = [β ↦ ξ2] ∘ M([]; τ̂2′; τ̂2)
  in t̂1 ⟨θβi⟩ t̂2 : ⌈θτ̂⌉Σ & ⌈ξ1 ⊔ θξ⌉Σ
R(Γ; Σ; μx : τ. t) =
  do  i ← 0; τ̂0 & ξ0 ← ⊥τ & ⊥
      repeat t̂i+1 : τ̂i+1 & ξi+1 ← R(Γ, x : τ̂i & ξi; Σ; t)
             i ← i + 1
      until (τ̂i−1 ≡ τ̂i ∧ ξi−1 ≡ ξi)
      return (μx : τ̂i & ξi. t̂i) : τ̂i & ξi

Fig. 11: Type reconstruction algorithm (R)

The handling of λ-abstractions uses the type completion algorithm C of figure 12, which defers its work to the type completion relation defined earlier and which can be interpreted in a functional way (see figure 10). The underlying type of the function argument is completed to a pattern type. The function body is analyzed in the presence of the newly introduced pattern variables. Note that this pattern type is also conservative, thereby preserving the invariant that the context only holds conservative types. The inferred annotated type of the lambda abstraction universally quantifies over all pattern variables, and the quantification is reflected on the term level through annotation abstractions Λβ :: κ. t̂.

In order to analyze function applications, we need two more auxiliary algorithms. The first one is the instantiation procedure I (see figure 12), which instantiates all top-level quantifiers with fresh annotation variables. The second is the matching algorithm M (see figure 12), which instantiates a pattern type

⊔ : T̂y × T̂y → T̂y
unit ⊔ unit = unit
(τ̂1⟨ξ1⟩ × τ̂2⟨ξ2⟩) ⊔ (τ̂1′⟨ξ1′⟩ × τ̂2′⟨ξ2′⟩) = (τ̂1 ⊔ τ̂1′)⟨ξ1 ⊔ ξ1′⟩ × (τ̂2 ⊔ τ̂2′)⟨ξ2 ⊔ ξ2′⟩
(τ̂1⟨β⟩ → τ̂2⟨ξ2⟩) ⊔ (τ̂1⟨β⟩ → τ̂2′⟨ξ2′⟩) = τ̂1⟨β⟩ → (τ̂2 ⊔ τ̂2′)⟨ξ2 ⊔ ξ2′⟩
(∀β :: κ. τ̂) ⊔ (∀β :: κ. τ̂′) = ∀β :: κ. τ̂ ⊔ τ̂′

C : SortEnv × Ty → T̂y × AnnTm × SortEnv
C(Σ; τ) = τ̂ & ξ ⇝ βi :: κi    where Σ ⊢c τ : τ̂ & ξ ⇝ βi :: κi

I : T̂y → T̂y × SortEnv
I(∀β :: κ. τ̂) = let τ̂′ ⇝ Σ = I(τ̂) in [β ↦ β′](τ̂′) ⇝ β′ :: κ, Σ    where β′ fresh
I(τ̂) = τ̂ ⇝ []

M : SortEnv × T̂y × T̂y → AnnSubst
M(Σ; unit; unit) = []
M(Σ; τ̂1⟨β βi⟩ × τ̂2⟨β′ βi⟩; τ̂1′⟨ξ1⟩ × τ̂2′⟨ξ2⟩) =
    [β ↦ λβi :: Σ(βi). ξ1, β′ ↦ λβi :: Σ(βi). ξ2] ∘ M(Σ; τ̂1; τ̂1′) ∘ M(Σ; τ̂2; τ̂2′)
M(Σ; τ̂1⟨β⟩ → τ̂2⟨β′ βi⟩; τ̂1⟨β⟩ → τ̂2′⟨ξ⟩) = [β′ ↦ λβi :: Σ(βi). ξ] ∘ M(Σ; τ̂2; τ̂2′)
M(Σ; ∀β :: κ. τ̂′; ∀β :: κ. τ̂) = M(Σ, β :: κ; τ̂′; τ̂)

Fig. 12: Least upper bound of types (⊔), completion (C), instantiation (I), and matching (M). Rules for · + · in ⊔ and M are like those for · × ·.

with a conservative type of the same shape. It returns a substitution obtained by performing pattern unification on corresponding annotations.

Soundness and Completeness An annotated type environment Γ̂ is well-formed under an environment Σ if Γ̂ is conservative and for all bindings x : τ̂ & ξ in Γ̂ we have Σ ⊢wft τ̂ and Σ ⊢s ξ : ⋆.

In order to demonstrate the correctness of the reconstruction algorithm presented in this section we have to show that for every well-typed underlying term, it produces an analysis (i.e. annotated types and dependency annotations) that can be derived in the annotated type system (see figure 6). That is to say, algorithm R is sound w.r.t. the annotated type system.

Theorem 5. Let t be a source term, Σ a sort environment and Γ̂ an annotated type environment well-formed under Σ such that R(Γ̂; Σ; t) = t̂ : τ̂ & ξ for some t̂, τ̂ and ξ.
Then, Σ | Γ̂ ⊢te t̂ : τ̂ & ξ, Σ ⊢wft τ̂, Σ ⊢s ξ : ⋆ and τ̂ is conservative.

The next step is to show that our analysis succeeds in deriving an annotated type and dependency annotation for any well-typed source term: it is complete.
The crucial part here is the termination of the fixpoint iteration. In order to show the convergence of the fixpoint iteration, we start by defining an equivalence relation on annotated type and dependency pairs.

Our type reconstruction algorithm handles polymorphic recursion through Kleene-Mycroft iteration. Such an algorithm is based on fixpoint iteration and needs a way to decide whether two dependency terms are equal according to the denotational semantics of λ⊔.
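Schematically, and with all names ours, the iteration in the μ-case of figure 11 has the following shape; semEq stands for the semantic-equality test discussed next, and reanalyze for one run of R over the body under an assumed analysis for the recursive variable.

    -- Kleene-Mycroft iteration: start from the least completion and
    -- re-analyze the body until the assumed and inferred analyses agree.
    kleeneMycroft :: (a -> a -> Bool)  -- semEq on analyses (type & annotation)
                  -> (a -> a)          -- reanalyze the body under an assumption
                  -> a                 -- the least completion, e.g. ⊥τ & ⊥
                  -> a
    kleeneMycroft semEq reanalyze = go
      where
        go current
          | semEq current next = next
          | otherwise          = go next
          where next = reanalyze current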
A straightforward way to decide semantic equivalence is to enumerate all possible environments and compare the denotations of the two terms in all of these (possibly after some semantics-preserving normalization). This only works if the dependency lattice L is finite.

For some analyses, e.g., the set of all program locations in a slicing analysis, L = V is finite but large, and deciding equality in this fashion becomes impractical. To alleviate this problem, our prototype implementation applies a partial canonicalization procedure which, while not complete, can serve as an approximation of equality: if two canonicalized dependency terms become syntactically equal, then we can be assured that they are semantically equal, but if they are not, we can still apply the above procedure to the canonicalized dependency terms. We omit formal details from the paper.
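For a finite lattice, the enumeration strategy can be sketched as follows, reusing the eval machinery from section 3. For simplicity we restrict the sketch to terms whose free variables all have base sort (higher sorts would require enumerating monotone function tables), and all names are ours.

    import qualified Data.Map as Map

    -- Decide semantic equality of two base-sort dependency terms by
    -- comparing their denotations under every assignment of lattice
    -- elements to the (base-sorted) free variables.
    semEqBase :: (JoinSemiLattice l, Eq l)
              => [l]        -- every element of the finite lattice
              -> [AnnVar]   -- the free variables, assumed of base sort
              -> AnnTm l -> AnnTm l -> Bool
    semEqBase univ fvs x1 x2 =
      and [ latOf (eval env x1) == latOf (eval env x2) | env <- envs fvs ]
      where
        envs []     = [Map.empty]
        envs (v:vs) = [ Map.insert v (Lat l) e | l <- univ, e <- envs vs ]
        latOf (Lat l) = l
        latOf _       = error "not a base-sort denotation"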
We can now state our completeness results for the type reconstruction al-
gorithm. Here, we write Γ t t : τ to say that term t has type τ under the
environment Γ in the underlying type system.

Theorem 6 (Completeness). Given a source term t, a sort environment Σ,


an annotated type environment Γ well-formed under Σ, and an underlying type
τ such that Γ t t : τ , then there are t , τ and ξ such that R(Γ; Σ; t) = t : τ & ξ
and 
τ  = τ , t  = t.

As a corollary of the foregoing theorems, our analysis is a conservative extension of the underlying type system.

Corollary 2 (Conservative Extension). Let t be a source term, τ be a type and Γ a type environment such that Γ ⊢t t : τ. Then there are Σ, Γ̂, t̂, τ̂, ξ such that Σ | Γ̂ ⊢te t̂ : τ̂ & ξ with ⌊t̂⌋ = t, ⌊τ̂⌋ = τ and ⌊Γ̂⌋ = Γ.

7 Implementation and Examples

Beyond the definition of the annotated system and the development of the associated algorithm and meta-theory we also have a REPL prototype implementation of our analysis in Haskell. Compared to the annotated type system in the paper, the prototype provides support for booleans and integers, including literals and conditionals if c then t1 else t2, for which the type rules can be straightforwardly derived. Concrete lattice implementations are provided only for binding-time analysis and security analysis, but the reconstruction algorithm abstracts away from the choice for a particular lattice, so it is easy to add new instances. The implementation is available at http://www.staff.science.uu.nl/~hage0101/prototype-hrp.zip. Below we walk through a few examples, taking advantage of the slightly extended source language that our implementation supports. More (detailed) examples are discussed in [26].

Construction and Elimination. Whenever something is constructed, be it a product, a sum or a lambda abstraction, the outermost dependency annotation is ⊥. This is because the analysis aims to produce the best possible, and thereby least, annotations for a given source program.
Consider the case of binding-time analysis, and suppose we have a variable of function type f : ∀β. int⟨β⟩ → int⟨β⟩ & D. We can see that it preserves the annotations of its arguments, i.e., if we apply f to a static value, the return annotation is also instantiated to be static. The function itself, however, is dynamic. Therefore, the whole result of the function application must also be dynamic, because we cannot know which particular function has been assigned to f.
Elimination, on the other hand, always introduces a dependency in the program, and this can uncover subtleties arising when functions only differ in their termination behavior. For example, compare λp : int × int. p with λp : int × int. (proj1(p), proj2(p)). In a call-by-value language, these two functions would be (extensionally) equivalent. However, with non-strict evaluation, p might be a non-terminating computation. In that case, applying the former function would diverge, while the latter function at least produces the pair constructor. This is also reflected in the annotated types that are inferred. For the former, we get

∀β0, β1, β2 :: ⋆. (int⟨β0⟩ × int⟨β1⟩)⟨β2⟩ → (int⟨β0⟩ × int⟨β1⟩)⟨β2⟩ & S, and

∀β0, β1, β2 :: ⋆. (int⟨β0⟩ × int⟨β1⟩)⟨β2⟩ → (int⟨β0 ⊔ β2⟩ × int⟨β1 ⊔ β2⟩)⟨S⟩ & S

for the latter. In particular, the annotation of the product in the second type signature is S. Therefore, it cannot depend on the input of the function.

Polymorphic Recursion. One class of functions where the analysis benefits from polymorphic recursion is those that permute their arguments on recursive calls. Our example is a slightly modified version of an example from [5]:

μf : bool → bool → bool. λx : bool. λy : bool. if x then true else f y x

In an analysis with monomorphic recursion, the analysis assigns the same annotation to both parameters, large enough to accommodate both arguments. This is due to the permutation of the arguments in the else branch. An analysis with polymorphic recursion is allowed to use a different instantiation for f in that case. Our algorithm hence infers the following most general type:

∀β1 :: ⋆. bool⟨β1⟩ → (∀β2 :: ⋆. bool⟨β2⟩ → bool⟨β1 ⊔ β2⟩)⟨⊥⟩ & ⊥

We see that the result of the function indeed depends on the annotations of both arguments, as both end up in the condition of the if-expression at some point. Yet, both arguments are completely unrestricted, and unrelated in their annotations. In contrast, a type system with monomorphic recursion would only admit a weaker type, possibly similar to

∀β1 :: ⋆. bool⟨β1⟩ → (bool⟨β1⟩ → bool⟨β1⟩)⟨⊥⟩ & ⊥

A real-world example of this kind is Euclid's algorithm for computing the greatest common divisor (see [26]).

Higher-Ranked Polyvariance. This section discusses several examples for the dependency analysis instance of binding-time analysis, comparing our outcomes with a let-polyvariant analysis [29].
A simple example to start with is a function that applies a function to both components of a pair:²

both : (int → int) → int × int → int × int
both = λf : int → int. λp : int × int. (f (proj1(p)), f (proj2(p)))

Suppose, in the context of binding-time analysis, that both is used to apply a statically known function to a pair whose first component is always computable at compile time, but whose second component is dynamic. For simplicity's sake, the function is the identity on integers.

id : int → int
id = λx : int. x

A non-higher-ranked analysis would assign types to both and id in which the annotation on the function argument to both must be large enough to accommodate both components of the pair as input. If we consider the call both id p for some pair p : int⟨S⟩ × int⟨D⟩ & S, the whole call then has the type int⟨D⟩ × int⟨D⟩.
Our higher-ranked analysis infers the following conservative types for id and both:

id : ∀β :: ⋆. int⟨β⟩ → int⟨β⟩ & ⊥
id = Λβ :: ⋆. λx : int & β. x
both : ∀β1 :: ⋆. ∀β2 :: ⋆ ⇒ ⋆. (∀β :: ⋆. int⟨β⟩ → int⟨β2 β⟩)⟨β1⟩
       → (∀β3, β4, β5 :: ⋆. (int⟨β3⟩ × int⟨β4⟩)⟨β5⟩
       → (int⟨β2 (β3 ⊔ β5) ⊔ β1⟩ × int⟨β2 (β4 ⊔ β5) ⊔ β1⟩)⟨S⟩)⟨S⟩ & S
both = Λβ1 :: ⋆. Λβ2 :: ⋆ ⇒ ⋆. λf : (∀β :: ⋆. int⟨β⟩ → int⟨β2 β⟩).
       Λβ3 :: ⋆. Λβ4 :: ⋆. Λβ5 :: ⋆. λp : int⟨β3⟩ × int⟨β4⟩.
       (f ⟨β3 ⊔ β5⟩ (proj1(p)), f ⟨β4 ⊔ β5⟩ (proj2(p)))

In the case of both, the function parameter f can be instantiated separately for each component, because our analysis assigns it a type that universally quantifies over the annotation of its argument. It is evident from the type signature that the components of the resulting pair only depend on the corresponding components of the input pair, and on the function and the input pair itself. They do not depend on the respective other component of the input.
If we again consider the call both id p, we obtain β2 = λβ :: ⋆. β, β1 = β3 = β5 = S and β4 = D through pattern unification. Normalization of the resulting dependency terms results in the expected return type int⟨S⟩ × int⟨D⟩.

² NB: both is a simplified instance of a traversal ∀f. Applicative f ⇒ (Int → f Int) → (Int, Int) → f (Int, Int), in order to fit the restrictions of the source language [6,15].
The generality provided by the higher-ranked analysis extends to an arbitrarily deep nesting of function arrows. The following example demonstrates this for two levels of arrows. Functions with more than two levels of arrows can arise directly in actual programs, but even more so in desugared code, e.g., when type classes in Haskell are implemented via explicit dictionary passing. Due to limitations of our source language, the examples are syntactically heavily restricted.
Consider the following function that takes a function argument which itself requires a function:

foo : ((int → int) → int) → int × int
foo = λf : (int → int) → int. (f (λx : int. x), f (λx : int. 0))

The higher-ranked analysis infers the following type and target term (where we omit the type in the argument of the lambda term because it essentially repeats what is already visible in the top-level type signature):

foo : ∀β4 :: ⋆. ∀β3 :: ⋆ ⇒ (⋆ ⇒ ⋆) ⇒ ⋆.
      (∀β2 :: ⋆. ∀β1 :: ⋆ ⇒ ⋆. (∀β0 :: ⋆. int⟨β0⟩ → int⟨β1 β0⟩)⟨β2⟩
      → int⟨β3 β2 β1⟩)⟨β4⟩
      → (int⟨β3 S (λβ5 :: ⋆. β5) ⊔ β4⟩ × int⟨β3 S (λβ6 :: ⋆. S) ⊔ β4⟩)⟨S⟩ & S
foo = Λβ4 :: ⋆. Λβ3 :: ⋆ ⇒ (⋆ ⇒ ⋆) ⇒ ⋆. λf : · · · .
      (f ⟨S⟩ ⟨λβ0 :: ⋆. β0⟩ (Λβ5 :: ⋆. λx : int & β5. x)
      , f ⟨S⟩ ⟨λβ0 :: ⋆. S⟩ (Λβ6 :: ⋆. λx : int & β6. 0))

Since the type of f is a pattern type, the argument to f is also a pattern type by definition. Therefore, the analysis of f depends on the analysis of the function passed to it. This gives rise to the higher-order effect operator β3 [12]. Thus, f can be applied to any function with a conservative type of the right shape. As our algorithm always infers conservative types, the type of f is as general as possible. This is reflected in the body of the lambda, where in both cases f is instantiated with the dependency annotation corresponding to the function passed to it. The result of this instantiation can be observed in the returned product type, where β3 is applied to the effect operators λβ0 :: ⋆. β0 and λβ0 :: ⋆. S corresponding to the respective functions used as arguments to f.
Only when we finally apply foo can the resulting annotations be evaluated.

bar : ∀α2 :: ⋆. ∀α1 :: ⋆ ⇒ ⋆. (∀α0 :: ⋆. int⟨α0⟩ → int⟨α1 α0⟩)⟨α2⟩
      → int⟨α1 D ⊔ α2⟩ & S
bar = Λα2 :: ⋆. Λα1 :: ⋆ ⇒ ⋆. λf : · · · . f (annD(0))

For bar we obtain foo bar : int⟨D⟩ × int⟨S⟩ & S. In this case, β3 = λβ2 :: ⋆. λβ1 :: ⋆ ⇒ ⋆. β1 D ⊔ β2, because bar applies its argument to a value with dynamic binding time. This causes the first component of the returned pair to be deemed dynamic as well. On the other hand, in the second component bar is applied to a constant function. Thus, regardless of the argument's dynamic binding time, the resulting binding time is static. In a rank-1 system we would get int⟨D⟩ × int⟨D⟩ instead of int⟨D⟩ × int⟨S⟩.

8 Related Work

The basis for most type systems of functional programming languages is the Hindley-Milner type system [22]. Our algorithm R strongly resembles the well-known type inference algorithm for the Hindley-Milner type system, Algorithm W [3], which we consider a distinct advantage of our approach. The idea to define an annotated type system as a means to design static analyses for higher-order languages is attributed to [19]. The major technical difference compared to a let-polyvariant analysis is that our annotations form a simply typed lambda-calculus.
Full reconstruction for a higher-ranked polyvariant annotated type system was first considered by [12] in the context of a control-flow analysis. However, we found that the (constraint-based) algorithm as presented in [12] generates constraints free of cycles; therefore, it cannot faithfully reflect the constraints necessary for the fixpoint combinator. For the following example, the algorithm incorrectly concludes that only the first and third False terms flow into the condition x, but not the second one:

(fix (λf. λx. λy. λz. if x then True else f z x y)) False False False

We reproduced this mistake with their implementation and verified that the mistake was not a simple bug in that implementation.
Close to our formulation is the (unpublished) work of [16], which deals with exception analysis and uses a simply typed lambda-calculus with sets to represent annotations. We have chosen a more modular approach, in which we offload much of the complexity of dealing with lattice values to the lattice. In [16], terms from the simply typed lambda-calculus with sets are canonicalized and then checked for alpha-equivalence during Kleene-Mycroft iteration. We found, however, that two terms can have different canonical forms even though they are actually semantically equivalent. This causes Koot's reconstruction algorithm to diverge on a particular class of programs, because the inferred annotations continue to grow. The simplest such program we found is the following:

μf : (unit → unit) → unit → unit. λg : unit → unit. λx : unit. g (f g x)

Our solution is to apply canonicalization to simplify terms as much as possible, and then compare the outcomes for all possible inputs.
The Dependency Core Calculus (DCC) was introduced by [1] as a unifying framework for dependency analyses; instances include binding-time analysis (see, e.g., [29]), exception analysis [17,16], secure information flow analysis [9] and static slicing [27]. Each instance of a dependency analysis can be mapped to DCC, which allowed the authors to compare different dependency analyses, uncover problems with existing instance analyses, and simplify proofs of noninterference [8,20]. The instance analyses in [1] were defined as a monovariant type and effect system with subtyping, for a monomorphic call-by-name language. An implicit, let-polymorphic implementation of DCC, FlowCaml, was developed by [25]. It is not higher-ranked.
The difference between DCC and our analysis is to a large extent one of focus: DCC is a calculus defined in such a way that any calculus that elaborates to DCC inherits the noninterference property and any other properties proven for DCC itself. Our analysis, on the other hand, is meant to be implemented in a compiler (with the added precision), and that implementation (and its associated meta-theory) can then be reused inside the compiler for a variety of analyses. Comparably to DCC, we have proven a noninterference property for our generic higher-rank polyvariant dependency analysis, so that all its instances inherit it.
The Haskell community supports an implementation of DCC in which the (security) annotations are lifted to the Haskell type level [2]. Since the GHC compiler supports higher-rank types, code written with this library can in fact model security flows with higher-rank types. Because of the general undecidability of full reconstruction for higher-rank types [14], the programmer must, however, provide explicit type information. In [18], the authors introduce dependent flow types, which allow them to express a large variety of security policies. An essential difference with our work is that our approach is fully automated.
Early on in our research, we observed that the approach of [11] may lead to similar precision gains as higher-ranked annotations do. Since they deal with a different analysis, a direct comparison is impossible to make at this time.

9 Conclusion and Future Work

We have defined a higher-rank annotation polymorphic type system for a generic dependency analysis, established its soundness, and provided a sound and complete reconstruction algorithm. Examples show that we can achieve higher precision than plain let-polyvariance. The analysis we have defined is for a call-by-name language. We expect the results to hold as well for a lazy language, but chose call-by-name to reduce bookkeeping in the proofs. We also believe the analysis can be adapted relatively easily to one for a call-by-value language, by letting the annotation on the argument flow into the effect of the call. However, we would need to re-examine the metatheory.
In future work we want to consider whether we can further refine the canonicalization of dependency terms so that syntactic equality up to alpha-equivalence can completely replace our current approach.

Acknowledgments. We acknowledge the contributions of Ruud Koot in unpublished work that made this work possible.

References
1. Abadi, M., Banerjee, A., Heintze, N., Riecke, J.G.: A core calculus of dependency.
In: Proceedings of the 26th ACM SIGPLAN-SIGACT symposium on Principles of
programming languages - POPL '99. Association for Computing Machinery (ACM)
(1999). https://doi.org/10.1145/292540.292555
2. Algehed, M., Russo, A.: Encoding DCC in Haskell. In: Proceedings of the 2017 Work-
shop on Programming Languages and Analysis for Security. pp. 77–89. PLAS ’17,
ACM, New York, NY, USA (2017). https://doi.org/10.1145/3139337.3139338
3. Damas, L., Milner, R.: Principal type-schemes for functional programs. In: Pro-
ceedings of the 9th ACM SIGPLAN-SIGACT symposium on Principles of pro-
gramming languages - POPL '82. Association for Computing Machinery (ACM)
(1982). https://doi.org/10.1145/582153.582176
4. Dowek, G.: Handbook of automated reasoning. chap. Higher-order Unification and
Matching, pp. 1009–1062. Elsevier Science Publishers B. V., Amsterdam, The
Netherlands (2001), http://dl.acm.org/citation.cfm?id=778522.778525
5. Dussart, D., Henglein, F., Mossin, C.: Polymorphic recursion and subtype qualifi-
cations: Polymorphic binding-time analysis in polynomial time. In: Static Analysis,
pp. 118–135. Springer Nature (1995). https://doi.org/10.1007/3-540-60360-3_36
6. Foster, J.N., Greenwald, M.B., Moore, J.T., Pierce, B.C., Schmitt, A.: Com-
binators for bidirectional tree transformations: A linguistic approach to the
view-update problem. ACM Trans. Program. Lang. Syst. 29(3) (May 2007).
https://doi.org/10.1145/1232420.1232424
7. Glynn, K., Stuckey, P.J., Sulzmann, M., Söndergaard, H.: Boolean constraints for
binding-time analysis. In: PADO ’01: Proceedings of the Second Symposium on
Programs as Data Objects. pp. 39–62. Springer-Verlag, London, UK (2001)
8. Goguen, J.A., Meseguer, J.: Security policies and security models. In:
1982 IEEE Symposium on Security and Privacy. pp. 11–11 (April 1982).
https://doi.org/10.1109/SP.1982.10014
9. Heintze, N., Riecke, J.G.: The SLam calculus. In: Proceedings of the 25th
ACM SIGPLAN-SIGACT symposium on Principles of programming lan-
guages - POPL '98. Association for Computing Machinery (ACM) (1998).
https://doi.org/10.1145/268946.268976
10. Henglein, F.: Type inference with polymorphic recursion. ACM Transac-
tions on Programming Languages and Systems 15(2), 253–289 (4 1993).
https://doi.org/10.1145/169701.169692
11. Hoffmann, J., Das, A., Weng, S.C.: Towards automatic resource bound analysis for
OCaml. In: Proceedings of the 44th ACM SIGPLAN Symposium on Principles of
Programming Languages. pp. 359–373. POPL 2017, ACM, New York, NY, USA
(2017). https://doi.org/10.1145/3009837.3009842
12. Holdermans, S., Hage, J.: Polyvariant flow analysis with higher-ranked
polymorphic types and higher-order effect operators. In: Proceedings of
the 15th ACM SIGPLAN international conference on Functional program-
ming - ICFP '10. Association for Computing Machinery (ACM) (2010).
https://doi.org/10.1145/1863543.1863554
13. Jones, S.P., Vytiniotis, D., Weirich, S., Shields, M.: Practical type inference for
arbitrary-rank types. Journal of Functional Programming 17(1), 1–82 (2007).
https://doi.org/http://dx.doi.org/10.1017/S0956796806006034
14. Kfoury, A., Tiuryn, J.: Type reconstruction in finite rank fragments of the
second-order λ-calculus. Information and Computation 98(2), 228–257 (6 1992).
https://doi.org/10.1016/0890-5401(92)90020-g

15. Kmett, E.: The lens library (2018), http://lens.github.io/, consulted 9/7/2018
16. Koot, R.: Higher-ranked exception types (2015), https://github.com/ruudkoot/phd/tree/master/higher-ranked-exception-types, accessed 2018-03-09
17. Koot, R., Hage, J.: Type-based exception analysis for non-strict higher-
order functional languages with imprecise exception semantics. In: Proceed-
ings of the 2015 Workshop on Partial Evaluation and Program Manipu-
lation - PEPM '15. Association for Computing Machinery (ACM) (2015).
https://doi.org/10.1145/2678015.2682542
18. Lourenço, L., Caires, L.: Dependent information flow types. In: Proceedings of the
42Nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Program-
ming Languages. pp. 317–328. POPL ’15, ACM, New York, NY, USA (2015).
https://doi.org/10.1145/2676726.2676994
19. Lucassen, J.M., Gifford, D.K.: Polymorphic effect systems. In: POPL ’88:
Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles
of programming languages. pp. 47–57. ACM, New York, NY, USA (1988).
https://doi.org/http://doi.acm.org/10.1145/73560.73564
20. McLean, J.: Security Models. Wiley Press (1994).
https://doi.org/10.1002/0471028959
21. Miller, D.: A logic programming language with lambda-abstraction, function vari-
ables, and simple unification. In: Extensions of Logic Programming, pp. 253–281.
Springer Nature (1991). https://doi.org/10.1007/bfb0038698
22. Milner, R.: A theory of type polymorphism in programming. Journal of Computer
and System Sciences 17(3), 348–375 (12 1978). https://doi.org/10.1016/0022-
0000(78)90014-4
23. Mycroft, A.: Polymorphic type schemes and recursive definitions. In: Lec-
ture Notes in Computer Science, pp. 217–228. Springer Nature (1984).
https://doi.org/10.1007/3-540-12925-1_41
24. Nielson, F., Nielson, H., Hankin, C.: Principles of Program Analysis. Springer
Verlag, second printing edn. (2005)
25. Pottier, F., Simonet, V.: Information flow inference for ML. ACM Trans. Program.
Lang. Syst. 25(1), 117–158 (Jan 2003). https://doi.org/10.1145/596980.596983
26. Thorand, F., Hage, J.: Addendum with proofs, definitions and examples for the ESOP 2020 paper, Higher-Ranked Annotation Polymorphic Dependency Analysis, http://www.staff.science.uu.nl/~hage0101/downloads/hrp-addendum.pdf
27. Tip, F.: A survey of program slicing techniques. Tech. rep., Amsterdam, The
Netherlands, The Netherlands (1994)
28. Wansbrough, K., Jones, S.P.: Once upon a polymorphic type. In: Proceedings
of the 26th ACM SIGPLAN-SIGACT symposium on Principles of programming
languages - POPL '99. Association for Computing Machinery (ACM) (1999).
https://doi.org/10.1145/292540.292545
29. Zhang, G.: Binding-Time Analysis: Subtyping versus Subeffecting. MSc thesis (2008), http://people.cs.uu.nl/jur/downloads/guangyuzhang-msc.pdf

ConSORT: Context- and Flow-Sensitive Ownership Refinement Types for Imperative Programs

John Toman¹, Ren Siqi¹, Kohei Suenaga¹, Atsushi Igarashi¹, and Naoki Kobayashi²

¹ Kyoto University, Kyoto, Japan, {jtoman,shiki,ksuenaga,igarashi}@fos.kuis.kyoto-u.ac.jp
² The University of Tokyo, Tokyo, Japan, [email protected]

Abstract. We present ConSORT, a type system for safety verification
in the presence of mutability and aliasing. Mutability requires strong
updates to model changing invariants during program execution, but
aliasing between pointers makes it difficult to determine which invariants
must be updated in response to mutation. Our type system addresses
this difficulty with a novel combination of refinement types and fractional
ownership types. Fractional ownership types provide flow-sensitive and
precise aliasing information for reference variables. ConSORT interprets
this ownership information to soundly handle strong updates of potentially
aliased references. We have proved ConSORT sound and implemented a
prototype, fully automated inference tool. We evaluated our tool and found
it verifies non-trivial programs including data structure implementations.

Keywords: refinement types, mutable references, aliasing, strong updates, fractional ownerships, program verification, type systems

1 Introduction
Driven by the increasing power of automated theorem provers and recent high-
profile software failures, fully automated program verification has seen a surge
of interest in recent years [5, 10, 15, 29, 38, 66]. In particular, refinement types
[9, 21, 24, 65], which refine base types with logical predicates, have been shown to
be a practical approach for program verification that is amenable to (sometimes
full) automation [47, 61, 62, 63]. Despite promising advances [26, 32, 46], the sound
and precise application of refinement types (and program verification in general)
in settings with mutability and aliasing (e.g., Java, Ruby, etc.) remains difficult.
One of the major challenges is how to precisely and soundly support strong
updates for the invariants on memory cells. In a setting with mutability, a single
invariant may not necessarily hold throughout the lifetime of a memory cell; while
the program mutates the memory the invariant may change or evolve. To model
these changes, a program verifier must support different, incompatible invariants
which hold at different points during program execution. Further, precise program
verification requires supporting different invariants on distinct pieces of memory.

1 mk(n) { mkref n }
2
3 let p = mk(3) in
4 let q = mk(5) in
5 p := *p + 1;
6 q := *q + 1;
7 assert(*p = 4);

Fig. 1. Example demonstrating the difficulty of effecting strong updates in the presence of aliasing. The function mk is bound in the program from lines 3 to 7; its body is given within the braces.

1 loop(a, b) {
2   let aold = *a in
3   b := *b + 1;
4   a := *a + 1;
5   assert(*a = aold + 1);
6   if ⋆ then
7     loop(b, mkref ⋆)
8   else
9     loop(b, a)
10 }
11 loop(mkref ⋆, mkref ⋆)

Fig. 2. Example with non-trivial aliasing behavior.

One solution is to use refinement types on the static program names (i.e.,
variables) which point to a memory location. This approach can model evolving
invariants while tracking distinct invariants for each memory cell. For example,
consider the (contrived) example in Figure 1. This program is written in an ML-
like language with mutable references; references are updated with := and allo-
cated with mkref. Variable p can initially be given the type {ν : int | ν = 3} ref ,
indicating it is a reference to the integer 3. Similarly, q can be given the type
{ν : int | ν = 5} ref . We can model the mutation of p’s memory on line 5 by
strongly updating p’s type to {ν : int | ν = 4} ref .
Unfortunately, the precise application of this technique is confounded by the
existence of unrestricted aliasing. In general, updating just the type of the mutated
reference is insufficient: due to aliasing, other variables may point to the mutated
memory and their refinements must be updated as well. However, in the presence
of conditional, may aliasing, it is impossible to strongly update the refinements on
all possible aliases; given the static uncertainty about whether a variable points to
the mutated memory, that variable’s refinement may only be weakly updated. For
example, suppose we used a simple alias analysis that imprecisely (but soundly)
concluded all references allocated at the same program point might alias. Variables
p and q share the allocation site on line 1, so on line 5 we would have to weakly
update q’s type to {ν : int | ν = 4 ∨ ν = 5}, indicating it may hold either 4 or
5. Under this same imprecise aliasing assumption, we would also have to weakly
update p’s type on line 6, preventing the verification of the example program.
Given the precision loss associated with weak updates, it is critical that
verification techniques built upon refinement types use precise aliasing information
and avoid spuriously applied weak updates. Although it is relatively simple to
conclude that p and q do not alias in Figure 1, consider the example in Figure 2.
(In this example, ⋆ represents non-deterministic values.) Verifying this program
requires proving a and b never alias at the writes on lines 3 and 4. In fact, a
and b may point to the same memory location, but only in different invocations
of loop; this pattern may confound even sophisticated symbolic alias analyses.

Additionally, a and b share an allocation site on line 7, so an approach based on


the simple alias analysis described above will also fail on this example. This must-
not alias proof obligation can be discharged with existing techniques [53, 54], but
requires an expensive, on-demand, interprocedural, flow-sensitive alias analysis.
This paper presents ConSORT (CONtext Sensitive Ownership Refinement
Types), a type system for the automated verification of program safety in imper-
ative languages with mutability and aliasing. ConSORT is built upon the novel
combination of refinement types and fractional ownership types [55, 56]. Frac-
tional ownership types extend pointer types with a rational number in the range
[0, 1] called an ownership. These ownerships encapsulate the permission of the
reference; only references with ownership 1 may be used for mutation. Fractional
ownership types also obey the following key invariant: any references with a mu-
table alias must have ownership 0. Thus, any reference with non-zero ownership
cannot be an alias of a reference with ownership 1. In other words, ownerships
encode precise aliasing information in the form of must-not aliasing relationships.
To understand the benefit of this approach, let us return to Figure 1. As mk
returns a freshly allocated reference with no aliases, its type indicates it returns a
reference with ownership 1. Thus, our type system can initially give p and q types
{ν : int | ν = 3} ref 1 and {ν : int | ν = 5} ref 1 respectively. The ownership 1 on
the reference type constructor ref indicates both pointers hold “exclusive” own-
ership of the pointed to reference cell; from the invariant of fractional ownership
types p and q must not alias. The types of both references can be strongly up-
dated without requiring spurious weak updates. As a result, at the assertion state-
ment on line 7, p has type {ν : int | ν = 4} ref 1 expressing the required invariant.
Our type system can also verify the example in Figure 2 without expensive
side analyses. As a and b are both mutated, they must both have ownership 1;
i.e., they cannot alias. This pre-condition is satisfied by all invocations of loop;
on line 7, b has ownership 1 (from the argument type), and the newly allocated
reference must also have ownership 1. Similarly, both arguments on line 9 have
ownership 1 (from the assumed ownership on the argument types).
Ownerships behave linearly; they cannot be duplicated, only split when aliases
are created. This linear behavior preserves the critical ownership invariant. For
example, if we replace line 9 in Figure 2 with loop(b,b), the program becomes
ill-typed; there is no way to divide b’s ownership of 1 into two ownerships of 1.
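The linear arithmetic of ownerships can be made concrete with a small sketch (ours, not part of the ConSORT artifact); splitOk and mayWrite are hypothetical names.

type Ownership = Rational  -- a rational number in [0, 1]

-- Splitting may divide an ownership between a reference and a new alias,
-- but the parts must sum to the original: ownership is never duplicated.
splitOk :: Ownership -> (Ownership, Ownership) -> Bool
splitOk r (r1, r2) = r1 + r2 == r && all inRange [r, r1, r2]
  where inRange o = 0 <= o && o <= 1

-- Only a reference with exclusive ownership may be written.
mayWrite :: Ownership -> Bool
mayWrite = (== 1)

Here splitOk 1 (1, 1) = False, which is exactly why loop(b,b) is ill-typed, while splitOk 1 (0.5, 0.5) = True yields two readable but unwritable aliases.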
Ownerships also obviate the need to update the refinement information of aliases at mutation. ConSORT ensures that only the trivial refinement ⊤ is used in reference types with ownership 0, i.e., mutably-aliased references. When memory is mutated through a reference with ownership 1, ConSORT simply updates the refinement of the mutated reference variable. From the soundness of ownership types, all aliases have ownership 0 and must therefore only contain the ⊤ refinement. Thus, the types of all aliases already soundly describe all possible contents.³
ConSORT is also context-sensitive, and can use different summaries of function behavior at different points in the program. For example, consider the variant of Figure 1 shown in Figure 3. The function get returns the contents of its argument, and is called on lines 5 and 6. To precisely verify this program, on line 5 get must be typed as a function that takes a reference to 3 and returns 3. Similarly, on line 6 get must be typed as a function that takes a reference to 5 and returns 5. Our type system can give get a function type that distinguishes between these two calling contexts and selects the appropriate summary of get's behavior.

1 get(p) { *p }
2
3 let p = mkref 3 in
4 let q = mkref 5 in
5 p := get(p) + 1;
6 q := get(q) + 1;
7 assert(*p = 4);
8 assert(*q = 6);

Fig. 3. Example of context-sensitivity.

We have formalized ConSORT as a type system for a small imperative calculus and proved the system is sound: i.e., a well-typed program never encounters assertion failures during execution. We have implemented a prototype type inference tool targeting this imperative language and found it can automatically verify several non-trivial programs, including sorted lists and an array list data structure.

³ This assumption holds only if updates do not change simple types, a condition our type system enforces.
The rest of this paper is organized as follows. Section 2 defines the imperative
language targeted by ConSORT and its semantics. Section 3 defines our type
system and states our soundness theorem. Section 4 sketches our implementa-
tion’s inference algorithm and its current limitations. Section 5 describes an eval-
uation of our prototype, Section 6 outlines related work, and Section 7 concludes.

2 Target Language
This section describes a simple imperative language with mutable references and
first-order, recursive functions.

2.1 Syntax
We assume a set of variables, ranged over by x, y, z, . . . , a set of function names,
ranged over by f , and a set of labels, ranged over by 1 , 2 , . . . . The grammar of
the language is as follows.

d ::= f → (x1 , ... , xn )e


e ::= x | let x = y in e | let x = n in e | ifz x then e1 else e2
| let x = mkref y in e | let x = ∗y in e | let x = f  (y1 , . . . , yn ) in e
| x : = y ; e | alias(x = y) ; e | alias(x = ∗y) ; e | assert(ϕ) ; e | e1 ; e2
P ::= {d1 , ... , dn }, e

ϕ stands for a formula in propositional first-order logic over variables, integers


and contexts; we discuss these formulas later in Section 3.1.
Variables are introduced by function parameters or let bindings. Like ML, the
variable bindings introduced by let expressions and parameters are immutable.
Mutable variable declarations such as int x = 1; in C are achieved in our lan-
guage with:
let y = 1 in (let x = mkref y in . . .).

As a convenience, we assume all variable names introduced with let bindings and
function parameters are distinct.
Unlike ML (and like C or Java), we do not allow general expressions on the right-hand side of let bindings. The simplest right-hand forms are a variable y or an integer literal n. mkref y creates a reference cell with value y, and ∗y accesses the contents of reference y. For simplicity, we do not include an explicit null value; an extension to support null is discussed in Section 4. Function calls must occur on the right-hand side of a variable binding and take the form fℓ(x1, . . . , xn), where x1, . . . , xn are distinct variables and ℓ is a (unique) label. These labels are used to make our type system context-sensitive, as discussed in Section 3.3.
The single base case for expressions is a single variable. If the variable expression is executed in a tail position of a function, then the value of that variable is the return value of the function; otherwise the value is ignored.
The only intraprocedural control-flow operations in our language are if state-
ments. ifz checks whether the condition variable x equals zero and chooses the
corresponding branch. Loops can be implemented with recursive functions and
we do not include them explicitly in our formalism.
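For instance, a loop that counts a reference cell down to zero can be written as a recursive function; the following countdown function is a hypothetical illustration in the sugared syntax introduced below, not a program from our formalism or benchmarks:

countdown(c) {
  ifz *c then 0
  else {
    c := *c - 1;
    countdown(c)
  }
}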
Our grammar requires that the side-effecting, result-free statements assert(ϕ), alias(x = y), alias(x = ∗y) and assignment x := y are followed by a continuation expression. We impose this requirement for technical reasons, to ease our formal presentation; it does not reduce expressiveness, as dummy continuations can be inserted as needed. The assert(ϕ); e form executes e if the predicate ϕ holds in the current state and aborts the program otherwise. alias(x = y); e and alias(x = ∗y); e assert a must-aliasing relationship between x and y (resp. x and ∗y) and then execute e. alias statements are effectively annotations that our type system exploits to gain added precision. x := y; e updates the contents of the memory cell pointed to by x with the value of y. In addition to the above continuations, our language supports general sequencing with e1; e2.
A program is a pair ⟨D, e⟩, where D = {d1, . . . , dn} is a set of first-order, mutually recursive function definitions, and e is the program entry point. A function definition d maps a function name to a tuple of argument names x1, . . . , xn that are bound within the function body e.

Paper Syntax. In the remainder of the paper, we will write programs that are
technically illegal according to our grammar, but can be easily “de-sugared” into
an equivalent, valid program. For example, we will write
let x = mkref 4 in assert(*x = 4)
as syntactic sugar for:
let f = 4 in let x = mkref f in
let tmp = *x in assert(tmp = 4); let dummy = 0 in dummy

2.2 Operational Semantics


We now introduce the operational semantics for our language. We assume a
finite domain of heap addresses Addr: we denote an arbitrary address with a.

       
(R-Var) ⟨H, R, F : F̄, x⟩ −→D ⟨H, R, F̄, F[x]⟩

(R-Seq) ⟨H, R, F̄, E[x; e]⟩ −→D ⟨H, R, F̄, E[e]⟩

(R-Let) if x′ ∉ dom(R), then ⟨H, R, F̄, E[let x = y in e]⟩ −→D ⟨H, R{x′ → R(y)}, F̄, E[[x′/x]e]⟩

(R-LetInt) if x′ ∉ dom(R), then ⟨H, R, F̄, E[let x = n in e]⟩ −→D ⟨H, R{x′ → n}, F̄, E[[x′/x]e]⟩

(R-IfTrue) if R(x) = 0, then ⟨H, R, F̄, E[ifz x then e1 else e2]⟩ −→D ⟨H, R, F̄, E[e1]⟩

(R-IfFalse) if R(x) ≠ 0, then ⟨H, R, F̄, E[ifz x then e1 else e2]⟩ −→D ⟨H, R, F̄, E[e2]⟩

(R-MkRef) if a ∉ dom(H) and x′ ∉ dom(R), then ⟨H, R, F̄, E[let x = mkref y in e]⟩ −→D ⟨H{a → R(y)}, R{x′ → a}, F̄, E[[x′/x]e]⟩

(R-Deref) if R(y) = a, H(a) = v and x′ ∉ dom(R), then ⟨H, R, F̄, E[let x = ∗y in e]⟩ −→D ⟨H, R{x′ → v}, F̄, E[[x′/x]e]⟩

Fig. 4. Transition Rules (1).

A runtime state is represented by a configuration ⟨H, R, F̄, e⟩, which consists of a heap, a register file, a stack, and the currently reducing expression, respectively. The register file maps variables to runtime values v, which are either integers n or addresses a. The heap maps a finite subset of addresses to runtime values. The runtime stack represents pending function calls as a sequence of return contexts, which we describe below. While the final configuration component is an expression, the rewriting rules are defined in terms of E[e], i.e., an evaluation context E and a redex e, as is standard. The grammar for evaluation contexts is defined by E ::= E; e | [].
Our operational semantics is given in Figures 4 and 5. We write dom(H) to indicate the domain of a function, and H{a → v} where a ∉ dom(H) to denote a map which takes all addresses in dom(H) to their values in H and which additionally takes a to v. We write H{a ← v} where a ∈ dom(H) to denote a map equivalent to H except that a takes the value v. We use similar notation for dom(R) and R{x → v}. We also write ∅ for the empty register file and heap. The step relation −→D is parameterized by a set of function definitions D; a program ⟨D, e⟩ is executed by stepping the initial configuration ⟨∅, ∅, ·, e⟩ according to −→D. The semantics is mostly standard; we highlight some important points below.
Return contexts F take the form E[let y = []ℓ in e]. A return context represents a pending function call with label ℓ, and indicates that y should be bound to the return value of the callee during the execution of e within the larger execution context E. The call stack F̄ is a sequence of these contexts, with the first such return context representing the most recent function call. The stack grows at function calls as described by rule R-Call.

(R-Call) if f → (x1, . . . , xn)e ∈ D, then ⟨H, R, F̄, E[let x = fℓ(y1, . . . , yn) in e′]⟩ −→D ⟨H, R, E[let x = []ℓ in e′] : F̄, [y1/x1] · · · [yn/xn]e⟩

(R-Assign) if R(x) = a and a ∈ dom(H), then ⟨H, R, F̄, E[x := y; e]⟩ −→D ⟨H{a ← R(y)}, R, F̄, E[e]⟩

(R-Alias) if R(x) = R(y), then ⟨H, R, F̄, E[alias(x = y); e]⟩ −→D ⟨H, R, F̄, E[e]⟩

(R-AliasPtr) if R(y) = a and H(a) = R(x), then ⟨H, R, F̄, E[alias(x = ∗y); e]⟩ −→D ⟨H, R, F̄, E[e]⟩

(R-AliasFail) if R(x) ≠ R(y), then ⟨H, R, F̄, E[alias(x = y); e]⟩ −→D AliasFail

(R-AliasPtrFail) if R(x) ≠ H(R(y)), then ⟨H, R, F̄, E[alias(x = ∗y); e]⟩ −→D AliasFail

(R-Assert) if ⊨ [R]ϕ, then ⟨H, R, F̄, E[assert(ϕ); e]⟩ −→D ⟨H, R, F̄, E[e]⟩

(R-AssertFail) if ⊭ [R]ϕ, then ⟨H, R, F̄, E[assert(ϕ); e]⟩ −→D AssertFail

Fig. 5. Transition Rules (2).

For a call E[let x = fℓ(y1, . . . , yn) in e], where f is defined as (x1, . . . , xn)e′, the return context E[let x = []ℓ in e] is prepended onto the stack of the input configuration. The substitution of formal arguments for parameters in e′, denoted by [y1/x1] · · · [yn/xn]e′, becomes the currently reducing expression in the output configuration. Function returns are handled by R-Var. Our semantics returns values by name; when the currently executing function fully reduces to a single variable x, x is substituted into the return context on the top of the stack, denoted by E[let y = []ℓ in e][x].
In rule R-Assert we write ⊨ [R]ϕ to mean that the formula yielded by substituting the concrete values in R for the variables in ϕ is valid within some chosen logic (see Section 3.1); in R-AssertFail we write ⊭ [R]ϕ when the formula is not valid. The substitution operation [R]ϕ is defined inductively as [∅]ϕ = ϕ, [R{x → n}]ϕ = [R][n/x]ϕ, [R{x → a}]ϕ = [R]ϕ. In the case of an assertion failure, the semantics steps to the distinguished configuration AssertFail. The goal of our type system is to show that no execution of a well-typed program may reach this configuration. The alias form checks whether the two references actually alias, i.e., whether the must-alias assertion provided by the programmer is correct. If not, our semantics steps to the distinguished AliasFail configuration. Our type system does not guarantee that AliasFail is unreachable; aliasing assertions are effectively trusted annotations that are assumed to hold.
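As an illustration of [R]ϕ, the following Haskell sketch (ours, with hypothetical names; not taken from the paper's artifact) substitutes integer register values and evaluates the resulting closed formula for a small quantifier-free fragment; as in the definition above, address bindings contribute no substitution.

import qualified Data.Map as M

data Val = IntV Int | AddrV Int
data Phi = Tru | Eq String Int | Not Phi | Or Phi Phi

-- Nothing marks a formula this simple evaluator cannot decide, e.g. when
-- a variable stays free because it is address-valued.
holds :: M.Map String Val -> Phi -> Maybe Bool
holds _ Tru      = Just True
holds r (Eq x n) = case M.lookup x r of
                     Just (IntV m) -> Just (m == n)
                     _             -> Nothing
holds r (Not p)  = not <$> holds r p
holds r (Or p q) = (||) <$> holds r p <*> holds r q

In the semantics proper, validity is checked in a chosen logic with a decision procedure; this sketch only evaluates closed formulas.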
In order to avoid duplicate variable names in our register file due to recursive functions, we refresh the bound variable x in a let expression to x′. Take the expression let x = y in e as an example; we substitute a fresh variable x′ for x in e, then bind x′ to the value of variable y. We assume this refreshing of variables preserves our assumption that all variable bindings introduced with let and function parameters are unique, i.e., x′ does not overlap with variable names that occur in the program.

Types             τ ::= {ν : int | ϕ} | τ ref r
Ownership         r ∈ [0, 1]
Refinements       ϕ ::= ϕ1 ∨ ϕ2 | ¬ϕ | ⊤ | φ(v1, . . . , vn) | v1 = v2 | CP
Ref. Values       v ::= x | n | ν
Function Types    σ ::= ∀λ. ⟨x1 : τ1, . . . , xn : τn⟩ → ⟨x1 : τ1′, . . . , xn : τn′ | τ⟩
Context Variables λ ∈ CVar
Concrete Context  ℓ̄ ::= ϵ | ℓ : ℓ̄
Pred. Context     C ::= ℓ : C | λ | ϵ
Context Query     CP ::= ℓ̄ ⊑ C
Typing Context    L ::= λ | ℓ̄

Fig. 6. Syntax of types, refinements, and contexts.

3 Typing

We now introduce a fractional ownership refinement type system that guarantees well-typed programs do not encounter assertion failures.

3.1 Types and Contexts

The syntax of types is given in Figure 6. Our type system has two type con-
structors: references and integers. τ ref r is the type of a (non-null) reference to a
value of type τ . r is an ownership which is a rational number in the range [0, 1].
An ownership of 0 indicates a reference that cannot be written, and for which
there may exist a mutable alias. By contrast, 1 indicates a pointer with exclusive
ownership that can be read and written. Reference types with ownership values
between these two extremes indicate a pointer that is readable but not writable,
and for which no mutable aliases exist. ConSORT ensures that these invariants
hold while aliases are created and destroyed during execution.
Integers are refined with a predicate ϕ. The language of predicates is built using
the standard logical connectives of first-order logic, with (in)equality between
variables and integers, and atomic predicate symbols φ as the basic atoms. We
include a special “value” variable ν representing the value being refined by the
predicate. For simplicity, we omit the connectives ϕ1 ∧ ϕ2 and ϕ1 =⇒ ϕ2 ; they
can be written as derived forms using the given connectives. We do not fix a
particular theory from which φ are drawn, provided a sound (but not necessarily
complete) decision procedure exists. CP are context predicates, which are used
for context sensitivity as explained below.

Example 1. {ν : int | ν > 0} is the type of strictly positive integers. The type
of immutable references to integers exactly equal to 3 can be expressed by
{ν : int | ν = 3} ref 0.5 .

As is standard, we denote a type environment with Γ , which is a finite map


from variable names to type τ . We write Γ [x : τ ] to denote a type environment
Γ such that Γ (x ) = τ where x ∈ dom(Γ ), Γ, x : τ to indicate the extension of
Γ with the type binding x : τ , and Γ [x ← τ ] to indicate the type environment
Γ with the binding of x updated to τ . We write the empty environment as
•. The treatment of type environments as mappings instead of sequences in a


dependent type system is somewhat non-standard. The standard formulation
based on ordered sequences of bindings and its corresponding well-formedness
condition did not easily admit variables with mutually dependent refinements
as introduced by our function types (see below). We therefore use an unordered
environment and relax well-formedness to ignore variable binding order.

Function Types, Contexts, and Context Polymorphism. Our type system achieves context sensitivity by allowing function types to depend on where a function is called, i.e., on the execution context of the function invocation. Our system represents concrete execution contexts with strings of call site labels (or just “call strings”), defined by ℓ̄ ::= ϵ | ℓ : ℓ̄. As is standard (e.g., [49, 50]), the string ℓ : ℓ̄ abstracts an execution context where the most recent, active function call occurred at call site ℓ, which itself was executed in a context abstracted by ℓ̄; ϵ is the context under which program execution begins. Context variables, drawn from a finite domain CVar and ranged over by λ1, λ2, . . ., represent arbitrary, unknown contexts.
A function type takes the form ∀λ. ⟨x1 : τ1, . . . , xn : τn⟩ → ⟨x1 : τ1′, . . . , xn : τn′ | τ⟩. The arguments of a function are an n-ary tuple of types τi. To model side-effects on arguments, the function type includes the same number of output types τi′. In addition, function types have a direct return type τ. The argument and output types are given names: refinements within the function type may refer to these names. Function types in our language are context polymorphic, expressed by universal quantification “∀λ.” over a context variable. Intuitively, this context variable represents the many different execution contexts under which a function may be called. Argument and return types may depend on this context variable by including context query predicates in their refinements. A context query predicate CP usually takes the form ℓ̄ ⊑ λ, and is true iff ℓ̄ is a prefix of the concrete context represented by λ. Intuitively, a refinement ℓ̄ ⊑ λ =⇒ ϕ states that ϕ holds in any concrete execution context with prefix ℓ̄, and provides no information in any other context. In full generality, a context query predicate may be of the form ℓ̄1 ⊑ ℓ̄2 or ℓ̄ ⊑ ℓ1 : . . . : ℓn : λ; these forms may be immediately simplified to ⊤, ⊥ or ℓ̄′ ⊑ λ.
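A sketch of this prefix semantics (illustrative names, assuming the head of a call string is the most recent call site):

type Label = Int
type CallString = [Label]   -- [] plays the role of the initial context ϵ

-- l̄ ⊑ l̄' holds iff l̄ is a prefix of the concrete context l̄'.
prefixOf :: CallString -> CallString -> Bool
prefixOf []     _        = True
prefixOf _      []       = False
prefixOf (l:ls) (l':ls') = l == l' && prefixOf ls ls'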

Example 2. The type {ν : int | (ℓ1 ⊑ λ =⇒ ν = 3) ∧ (ℓ2 ⊑ λ =⇒ ν = 5)} represents an integer that is 3 if the most recent active function call site is ℓ1, 5 if the most recent call site is ℓ2, and is otherwise unconstrained. This type may be used for the argument of f in, e.g., fℓ1(3) + fℓ2(5).

As types in our type system may contain context variables, our typing judgment (introduced below) includes a typing context L, which is either a single context variable λ or a concrete context ℓ̄. This typing context represents the assumptions about the execution context of the term being typed. If the typing context is a context variable λ, then no assumptions are made about the execution context of the term, although types may depend upon λ with context query predicates. Accordingly, function bodies are typed under the context variable universally quantified over in the corresponding function type; i.e., no assumptions are made about the exact execution context of the function body.

As in parametric polymorphism, consistent substitution of a concrete context ℓ̄ for a context variable λ in a typing derivation yields a valid type derivation under concrete context ℓ̄.

Remark 1. The context-sensitivity scheme described here corresponds to the


standard CFA approach [50] without a priori call-string limiting. We chose this
scheme because it can be easily encoded with equality over integer variables (see
Section 4), but in principle another context-sensitivity strategy could be used
instead. The important feature of our type system is the inclusion of predicates
over contexts, not the specific choice for these predicates.

Function type environments are denoted with Θ and are finite maps from
function names (f ) to function types (σ).

Well Formedness. We impose two well-formedness conditions on types: ownership well-formedness and refinement well-formedness. The ownership condition is purely syntactic: τ is ownership well-formed if τ = τ′ ref 0 implies τ′ = ⊤n for some n, where ⊤i is the “maximal” type of a chain of i references, defined inductively as ⊤0 = {ν : int | ⊤}, ⊤i = ⊤i−1 ref 0.
The ownership well-formedness condition ensures that aliases introduced via heap writes do not violate the invariant of ownership types and that refinements are consistent with updates performed through mutable aliases. Recall that our ownership type invariant ensures all aliases of a mutable reference have 0 ownership. Any mutations through such a mutable alias will therefore be consistent with the “no information” ⊤ refinement required by this well-formedness condition.
Refinement well-formedness, denoted L | Γ ⊢WF ϕ, ensures that free program variables in a refinement ϕ are bound in the type environment Γ and have integer type. It also requires that for a typing context L = λ, only context query predicates over λ are used (no such predicates may be used if L = ℓ̄). Notice this condition forbids refinements that refer to references. Although ownership information can signal when refinements on a mutably-aliased reference must be discarded, our current formulation provides no such information for refinements that mention mutably-aliased references. We therefore conservatively reject such refinements at the cost of some expressiveness in our type system.
We write L | Γ ⊢WF τ to indicate a well-formed type where all refinements are well-formed with respect to L and Γ. We write L ⊢WF Γ for a type environment where all types are well-formed. A function environment is well-formed (written ⊢WF Θ) if, for every σ in Θ, the argument, result, and output types are well-formed with respect to each other and the context variable quantified over in σ. As the formal definition of refinement well-formedness is fairly standard, we omit it for space reasons (the full definition may be found in the full version [60]).

3.2 Intraprocedural Type System

We now introduce the type system for the intraprocedural fragment of our
language. Accordingly, this section focuses on the interplay of mutability and

refinement types.

(T-Var) Θ | L | Γ[x : τ1 + τ2] ⊢ x : τ1 ⇒ Γ[x ← τ2]

(T-Let) if Θ | L | Γ[y ← τ1 ∧y (y =τ1 x)], x : (τ2 ∧x (x =τ2 y)) ⊢ e : τ ⇒ Γ′ and x ∉ dom(Γ′), then
    Θ | L | Γ[y : τ1 + τ2] ⊢ let x = y in e : τ ⇒ Γ′

(T-LetInt) if Θ | L | Γ, x : {ν : int | ν = n} ⊢ e : τ ⇒ Γ′ and x ∉ dom(Γ′), then
    Θ | L | Γ ⊢ let x = n in e : τ ⇒ Γ′

(T-If) if Θ | L | Γ[x ← {ν : int | ϕ ∧ ν = 0}] ⊢ e1 : τ ⇒ Γ′ and Θ | L | Γ[x ← {ν : int | ϕ ∧ ν ≠ 0}] ⊢ e2 : τ ⇒ Γ′, then
    Θ | L | Γ[x : {ν : int | ϕ}] ⊢ ifz x then e1 else e2 : τ ⇒ Γ′

(T-MkRef) if Θ | L | Γ[y ← τ1], x : (τ2 ∧x (x =τ2 y)) ref 1 ⊢ e : τ ⇒ Γ′ and x ∉ dom(Γ′), then
    Θ | L | Γ[y : τ1 + τ2] ⊢ let x = mkref y in e : τ ⇒ Γ′

(T-Seq) if Θ | L | Γ ⊢ e1 : τ′ ⇒ Γ′ and Θ | L | Γ′ ⊢ e2 : τ″ ⇒ Γ″, then
    Θ | L | Γ ⊢ e1; e2 : τ″ ⇒ Γ″

(T-Deref) if τ′ = τ1 ∧y (y =τ1 x) when r > 0 and τ′ = τ1 when r = 0, Θ | L | Γ[y ← τ′ ref r], x : τ2 ⊢ e : τ ⇒ Γ′, and x ∉ dom(Γ′), then
    Θ | L | Γ[y : (τ1 + τ2) ref r] ⊢ let x = ∗y in e : τ ⇒ Γ′

(T-Assert) if Γ ⊨ ϕ, L | Γ ⊢WF ϕ, and Θ | L | Γ ⊢ e : τ ⇒ Γ′, then
    Θ | L | Γ ⊢ assert(ϕ); e : τ ⇒ Γ′

Fig. 7. Expression typing rules.

The typing rules are given in Figures 7 and 8. A typing judgment takes the form Θ | L | Γ ⊢ e : τ ⇒ Γ′, which indicates that e is well-typed under a function type environment Θ, typing context L, and type environment Γ, and evaluates to a value of type τ while modifying the input environment according to Γ′. Any valid typing derivation must have L ⊢WF Γ, L ⊢WF Γ′, and L | Γ′ ⊢WF τ; i.e., the input and output type environments and the result type must be well-formed.
The typing rules in Figure 7 handle the relatively standard features of our language. The rule T-Seq for sequential composition is fairly straightforward, except that the output type environment of e1 is the input type environment of e2. T-LetInt is also straightforward; since x is bound to a constant, it is given type {ν : int | ν = n} to indicate that x is exactly n. The output type environment Γ′ cannot mention x (expressed with x ∉ dom(Γ′)) to prevent x from escaping its scope. This requirement can be met by applying the subtyping rule (see below) to weaken refinements so that they no longer mention x. As in other refinement type systems [47], this requirement is critical for ensuring soundness.
Rule T-Let is crucial to understanding our ownership type system. The body e of the let expression is typechecked under a type environment where the type of y in Γ is linearly split into two types: τ1 for y and τ2 for the newly created binding x. This splitting is expressed using the + operator. If y has a reference type, the split operation distributes some portion of y's ownership information to its new alias x. The split operation also distributes refinement information between the two types. For example, the type {ν : int | ν > 0} ref 1 can be split into (1) {ν : int | ν > 0} ref r and {ν : int | ν > 0} ref (1−r) (for r ∈ (0, 1)), i.e., two immutable references with non-trivial refinement information, or (2) {ν : int | ν > 0} ref 1 and {ν : int | ⊤} ref 0, where one of the aliases is mutable and the other provides no refinement information. How a type is split depends on the usage of x and y in e. Formally, we define the type addition operator as the least commutative partial operation that satisfies the following rules:

{ν : int | ϕ1} + {ν : int | ϕ2} = {ν : int | ϕ1 ∧ ϕ2}   (Tadd-Int)
τ1 ref r1 + τ2 ref r2 = (τ1 + τ2) ref (r1+r2)   (Tadd-Ref)
Viewed another way, type addition describes how to combine two types for the same value such that the combination soundly incorporates all information from the two original types. Critically, the type addition operation cannot create or destroy ownership and refinement information, only combine or divide it between types. Although not explicit in the rules, by ownership well-formedness, if the entirety of a reference's ownership is transferred to another type during a split, all refinements in the remaining type must be ⊤.
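A small sketch of type addition as a partial operation (ours, with refinements left opaque; Nothing models undefinedness when the shapes differ):

data Phi  = Top | And Phi Phi | Atom String
data Type = Int Phi | Ref Type Rational

tadd :: Type -> Type -> Maybe Type
tadd (Int p1)    (Int p2)    = Just (Int (And p1 p2))                -- Tadd-Int
tadd (Ref t1 r1) (Ref t2 r2)
  | r1 + r2 <= 1             = Ref <$> tadd t1 t2 <*> pure (r1 + r2) -- Tadd-Ref
tadd _           _           = Nothing

The guard r1 + r2 <= 1 reflects that ownerships range over [0, 1], so a sum exceeding 1 cannot arise from a legal split.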
The additional conjuncts ∧y (y =τ1 x) and ∧x (x =τ2 y) express the equality between x and y as refinements. We use the strengthening operation τ ∧x ϕ and the typed equality proposition x =τ y, defined respectively as:

{ν : int | ϕ} ∧y ϕ′ = {ν : int | ϕ ∧ [ν/y]ϕ′}        (x ={ν : int | ϕ} y) = (x = y)
τ ref r ∧y ϕ′ = τ ref r                              (x =τ ref r y) = ⊤

We do not track equality between references or between the contents of aliased reference cells, as doing so would violate our refinement well-formedness condition. These operations are also used in other rules that can introduce equality.
Rule T-MkRef is very similar to T-Let, except that x is given a reference
type of ownership 1 pointing to τ2 , which is obtained by splitting the type of y. In
T-Deref, the content type of y is split and distributed to x . The strengthening
is conditionally applied depending on the ownership of the dereferenced pointer,
that is, if r = 0, τ  has to be a maximal type i .
Our type system also tracks path information; in the T-If rule, we update the
refinement on the condition variable within the respective branches to indicate
whether the variable must be zero. By requiring both branches to produce the
same output type environment, we guarantee that these conflicting refinements
are rectified within the type derivations of the two branches.
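For example (a sketch; we write the zero-test conditional as ifz and elide the
surrounding program):

ifz x then    // x : {ν : int | ν = 0}
  assert(x = 0)
else          // x : {ν : int | ¬(ν = 0)}
  assert(x ≠ 0)

Before the branches rejoin, both refinements on x are weakened via subtyping to a
common supertype such as {ν : int | ⊤}.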
The type rule for assert statements has the precondition Γ |= ϕ, which is
defined to be |= ⟦Γ⟧ =⇒ ϕ, i.e., the logical formula ⟦Γ⟧ =⇒ ϕ is valid in the
chosen theory. ⟦Γ⟧ lifts the refinements on the integer valued variables into a
proposition in the logic used for verification. This denotation operation is defined
as:

⟦•⟧ = ⊤    ⟦Γ, x : τ⟧ = ⟦Γ⟧ ∧ ⟦τ⟧x    ⟦{ν : int | ϕ}⟧y = [y/ν]ϕ    ⟦τ ref r⟧y = ⊤

If the formula ⟦Γ⟧ =⇒ ϕ is valid, then in any context and under any valuation
of program variables that satisfy the refinements in Γ , the predicate ϕ must be
true and the assertion must not fail. This intuition forms the foundation of our
soundness claim (Section 3.4).
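As a small worked instance (with a hypothetical environment): for Γ = x : {ν :
int | ν > 0}, z : {ν : int | ν = x + 1}, y : {ν : int | ⊤} ref 0, we have ⟦Γ⟧ =
(x > 0) ∧ (z = x + 1) ∧ ⊤, which entails z > 1; hence Γ |= z > 1 and the statement
assert(z > 1) typechecks under Γ.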
(the shapes of τ′ and τ2 are similar)
Θ | L | Γ[x ← τ1][y ← (τ2 ∧y y =τ2 x) ref 1] ⊢ e : τ ⇒ Γ′
──────────────────────────────────────────────────────────── (T-Assign)
Θ | L | Γ[x : τ1 + τ2][y : τ′ ref 1] ⊢ y := x ; e : τ ⇒ Γ′

(τ1 ref r1 + τ2 ref r2) ≈ (τ1′ ref r1′ + τ2′ ref r2′)
Θ | L | Γ[x ← τ1′ ref r1′][y ← τ2′ ref r2′] ⊢ e : τ ⇒ Γ′
──────────────────────────────────────────────────────────── (T-Alias)
Θ | L | Γ[x : τ1 ref r1][y : τ2 ref r2] ⊢ alias(x = y) ; e : τ ⇒ Γ′

(τ1 ref r1 + τ2 ref r2) ≈ (τ1′ ref r1′ + τ2′ ref r2′)
Θ | L | Γ[x ← τ1′ ref r1′][y ← (τ2′ ref r2′) ref r] ⊢ e : τ ⇒ Γ′
──────────────────────────────────────────────────────────── (T-AliasPtr)
Θ | L | Γ[x : τ1 ref r1][y : (τ2 ref r2) ref r] ⊢ alias(x = ∗y) ; e : τ ⇒ Γ′

Γ ≤ Γ′    Θ | L | Γ′ ⊢ e : τ ⇒ Γ′′    Γ′′, τ ≤ Γ′′′, τ′
──────────────────────────────────────────────────────────── (T-Sub)
Θ | L | Γ ⊢ e : τ′ ⇒ Γ′′′

where τ1 ≈ τ2 iff • ⊢ τ1 ≤ τ2 and • ⊢ τ2 ≤ τ1.

Fig. 8. Pointer manipulation and subtyping

Γ |= ϕ1 =⇒ ϕ2
─────────────────────────────────── (S-Int)
Γ ⊢ {ν : int | ϕ1} ≤ {ν : int | ϕ2}

∀x ∈ dom(Γ′). Γ ⊢ Γ(x) ≤ Γ′(x)
─────────────────────────────────── (S-TyEnv)
Γ ≤ Γ′

r1 ≥ r2    Γ ⊢ τ1 ≤ τ2
─────────────────────────────────── (S-Ref)
Γ ⊢ τ1 ref r1 ≤ τ2 ref r2

Γ, x : τ ≤ Γ′, x : τ′    x ∉ dom(Γ)
─────────────────────────────────── (S-Res)
Γ, τ ≤ Γ′, τ′

Fig. 9. Subtyping rules.

Destructive Updates, Aliasing, and Subtyping. We now discuss the handling


of assignment, aliasing annotations, and subtyping as described in Figure 8.
Although apparently unrelated, all three concern updating the refinements of
(potentially) aliased reference cells.
Like the binding forms discussed above, T-Assign splits the assigned value’s
type into two types via the type addition operator, and distributes these types
between the right hand side of the assignment and the mutated reference contents.
Refinement information in the fresh contents may be inconsistent with any
previous refinement information; only the shapes must be the same. In a system
with unrestricted aliasing, this typing rule would be unsound as it would admit
writes that are inconsistent with refinements on aliases of the left hand side.
However, the assignment rule requires that the updated reference has an ownership
of 1. By the ownership type invariant, all aliases with the updated reference have 0
ownership, and by ownership well-formedness may only contain the  refinement.

Example 3. We can type the program as follows:


let x = mkref 5 in // x : {ν : int | ν = 5} ref 1
let y = x in // x : ⊤¹ , y : {ν : int | ν = 5} ref 1
y := 4; assert(*y = 4) // x : ⊤¹ , y : {ν : int | ν = 4} ref 1
In this and later examples, we include type annotations within comments, writing
⊤¹ for the maximal pointer type {ν : int | ⊤} ref 0. We stress that these annotations
are for expository purposes only; our tool can infer these types automatically with
no manual annotations.
As described thus far, the type system is quite strict: if ownership has been
completely transferred from one reference to another, the refinement information
found in the original reference is effectively useless. Additionally, once a mutable
pointer has been split through an assignment or let expression, there is no
way to recover mutability. The typing rule for must alias assertions, T-Alias
and T-AliasPtr, overcomes this restriction by exploiting the must-aliasing
information to “shuffle” or redistribute ownerships and refinements between two
aliased pointers. The typing rule assigns two fresh types τ1′ ref r1′ and τ2′ ref r2′ to
the two operand pointers. The choice of τ1′, r1′, τ2′, and r2′ is left open provided
that the sum of the new types, (τ1′ ref r1′) + (τ2′ ref r2′), is equivalent (denoted ≈)
to the sum of the original types. Formally, ≈ is defined as in Figure 8; it implies
that any refinements in the two types must be logically equivalent and that
ownerships must also be equal. This redistribution is sound precisely because the
two references are assumed to alias; the total ownership for the single memory
cell pointed to by both references cannot be increased by this shuffling. Further,
any refinements that hold for the contents of one reference must necessarily hold
for contents of the other and vice versa.

Example 4 (Shuffling ownerships and refinements). Let ϕ=n be ν = n.


let x = mkref 5 in // x : {ν : int | ϕ=5 } ref 1
let y = x in // x : ⊤¹ , y : {ν : int | ϕ=5 } ref 1
y := 4; alias(x = y) // x : {ν : int | ϕ=4 } ref 0.5 , y : {ν : int | ϕ=4 } ref 0.5
The final type assignment for x and y is justified by
⊤¹ + {ν : int | ϕ=4 } ref 1 = {ν : int | ⊤ ∧ ϕ=4 } ref 1 ≈
{ν : int | ϕ=4 ∧ ϕ=4 } ref 1 = {ν : int | ϕ=4 } ref 0.5 + {ν : int | ϕ=4 } ref 0.5 .

The aliasing rules give fine-grained control over ownership information. This
flexibility allows mutation through two or more aliased references within the
same scope. Provided sufficient aliasing annotations, the type system may shuffle
ownerships between one or more live references, enabling and disabling mutability
as required. Although the reliance on these annotations appears to decrease the
practicality of our type system, we expect these aliasing annotations can be
inserted by a conservative must-aliasing analysis. Further, empirical experience
from our prior work [56] indicates that only a small number of annotations are
required for larger programs.

Example 5 (Shuffling Mutability). Let ϕ=n again be ν = n. The following


program uses two live, aliased references to mutate the same memory location:
let x = mkref 0 in
let y = x in // x : {ν : int | ϕ=0 } ref 1 , y : ⊤¹
x := 1; alias(x = y); // x : ⊤¹ , y : {ν : int | ϕ=1 } ref 1
y := 2; alias(x = y); // x : {ν : int | ϕ=2 } ref 0.5 , y : {ν : int | ϕ=2 } ref 0.5
assert(*x = 2)
Θ(f ) = ∀λ. ⟨x1 : τ1, . . . , xn : τn⟩ → ⟨x1 : τ1′, . . . , xn : τn′ | τ⟩
σα = [ℓ : L/λ]    σx = [y1/x1] · · · [yn/xn]
Θ | L | Γ[yi ← σα σx τi′], x : σα σx τ ⊢ e : τ′ ⇒ Γ′    x ∉ dom(Γ′)
────────────────────────────────────────────────────────────────── (T-Call)
Θ | L | Γ[yi : σα σx τi] ⊢ let x = f^ℓ(y1, . . . , yn) in e : τ′ ⇒ Γ′

Θ(f ) = ∀λ. ⟨x1 : τ1, . . . , xn : τn⟩ → ⟨x1 : τ1′, . . . , xn : τn′ | τ⟩
Θ | λ | x1 : τ1, . . . , xn : τn ⊢ e : τ ⇒ x1 : τ1′, . . . , xn : τn′
────────────────────────────────────────────────────────────────── (T-FunDef)
Θ ⊢ f → (x1, .. , xn)e

∀f → (x1, .. , xn)e ∈ D. Θ ⊢ f → (x1, .. , xn)e
─────────────────────────────────────────────── (T-Funs)
Θ ⊢ D

WF(Θ)    dom(D) = dom(Θ)    Θ ⊢ D    Θ | ε | • ⊢ e : τ ⇒ Γ
─────────────────────────────────────────────────────────── (T-Prog)
⊢ ⟨D, e⟩

Fig. 10. Program typing rules

After the first aliasing statement the type system shuffles the (exclusive) mutability
between x and y to enable the write to y. After the second aliasing statement
the ownership in y is split with x ; note that transferring all ownership from y to
x would also yield a valid typing.

Finally, we describe the subtyping rule. The rules for subtyping types and
environments are shown in Figure 9. For integer types, the rules require the
refinement of a supertype is a logical consequence of the subtype’s refinement
conjoined with the lifting of Γ . The subtype rule for references is covariant in
the type of reference contents. It is widely known that in a language with un-
restricted aliasing and mutable references such a rule is unsound: after a write
into the coerced pointer, reads from an alias may yield a value disallowed by
the alias’ type [43]. However, as in the assign case, ownership types prevent un-
soundness; a write to the coerced pointer requires the pointer to have ownership
1, which guarantees any aliased pointers have the maximal type and provide no
information about their contents beyond simple types.
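To illustrate (a sketch assembled from the rules above; the weakening step is
marked in the comments):

let x = mkref 5 in  // x : {ν : int | ν = 5} ref 1
let y = x in        // x : ⊤¹ , y : {ν : int | ν = 5} ref 1
                    // weaken y to {ν : int | ν > 0} ref 1 via T-Sub and S-Ref
y := 3;             // y : {ν : int | ν = 3} ref 1
assert(*y > 0)

The coercion on y is harmless because the write through y requires ownership 1,
at which point the only other name x already has the maximal type ⊤¹.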

3.3 Interprocedural Fragment and Context-Sensitivity


We now turn to a discussion of the interprocedural fragment of our language,
and how our type system propagates context information. The remaining typing
rules for our language are shown in Figure 10. These rules concern the typing of
function calls, function bodies, and entire programs.
We first explain the T-Call rule. The rule uses two substitution maps. σx
translates between the parameter names used in the function type and actual
argument names at the call-site. σα instantiates all occurrences of λ in the callee
type with ℓ : L, where ℓ is the label of the call-site and L the typing context of
the call. The types of the arguments yi are required to match the parameter
types (post substitution). The body of the let binding is then checked with
the argument types updated to reflect the changes in the function call (again,
post substitution). This update is well-defined because we require all function
arguments be distinct as described in Section 2.1. Intuitively, the substitution σα
represents incrementally refining the behavior of the callee function with partial
context information. If L is itself a context variable λ′, this substitution effectively
transforms any context prefix query over λ in the argument/return/output
types into a query over ℓ : λ′. In other words, while the exact concrete execution
context of the callee is unknown, the context must at least begin with ℓ, which
can potentially rule out certain behaviors.
Rule T-FunDef type checks a function definition f → (x1 , .. , xn )e against
the function type given in Θ. As a convenience we assume that the parameter
names in the function type match the formal parameters in the function definition.
The rule checks that under an initial environment given by the argument types the
function body produces a value of the return type and transforms the arguments
according to the output types. As mentioned above, functions may be executed
under many different contexts, so type checking the function body is performed
under the context variable λ that occurs in the function type.
Finally, the rule for typing programs (T-Prog) checks that all function
definitions are well typed under a well-formed function type environment, and
that the entry point e is well typed in an empty type environment and the typing
context ε, i.e., the initial context.

Example 6 (1-CFA). Recall the program in Figure 3 in Section 1; assume the


function calls are labeled as follows:
p := get^1(p) + 1;
// ...
q := get^2(q) + 1;
Taking τp to be the type shown in Example 2:

{ν : int | (1 ⊴ λ =⇒ ν = 3) ∧ (2 ⊴ λ =⇒ ν = 5)}

we can give get the type ∀λ. ⟨z : τp ref 1⟩ → ⟨z : τp ref 1 | τp⟩.
Example 7 (2-CFA). To see how context information propagates across multiple


calls, consider the following change to the code considered in Example 6:
get_real(z) { *z }
get(z) { get_real^3(z) }
The type of get remains as in Example 6, and taking τ to be

{ν : int | (3 1 ⊴ λ′ =⇒ ν = 3) ∧ (3 2 ⊴ λ′ =⇒ ν = 5)}

the type of get_real is ∀λ′. ⟨z : τ ref 1⟩ → ⟨z : τ ref 1 | τ⟩.
We focus on the typing of the call to get_real in get; it is typed in context


λ and a type environment where z is given type τp from Example 6.
Applying the substitution [3 : λ/λ′] to the argument type of get_real yields:

{ν : int | (3 1 ⊴ 3 : λ =⇒ ν = 3) ∧ (3 2 ⊴ 3 : λ =⇒ ν = 5)} ref 1 ≈
{ν : int | (1 ⊴ λ =⇒ ν = 3) ∧ (2 ⊴ λ =⇒ ν = 5)} ref 1

which is exactly the type of z. A similar derivation applies to the return type of
get_real and thus get.

3.4 Soundness
We have proven that any program that type checks according to the rules above
will never experience an assertion failure. We formalize this claim with the
following soundness theorem.

Theorem 1 (Soundness). If ⊢ ⟨D, e⟩, then ⟨∅, ∅, ·, e⟩ −→∗D AssertFail does
not hold. Further, any well-typed program either diverges, halts in the configuration
AliasFail, or halts in a configuration ⟨H, R, ·, x⟩ for some H, R and x, i.e.,
evaluation does not get stuck.

Proof (Sketch). By standard progress and preservation lemmas; the full proof
has been omitted for space reasons and can be found in the full version [60].

4 Inference and Extensions


We now briefly describe the inference algorithm implemented in our tool Con-
SORT. We sketch some implemented extensions needed to type more interesting
programs and close with a discussion of current limitations of our prototype.

4.1 Inference
Our tool first runs a standard, simple type inference algorithm to generate type
templates for every function parameter type, return type, and for every live
variable at each program point. For a variable x of simple type τS ::= int | τS ref
at program point p, ConSORT generates a type template ⟨τS⟩x,0,p as follows:

⟨int⟩x,n,p = {ν : int | ϕx,n,p(ν; FVp)}    ⟨τS ref⟩x,n,p = ⟨τS⟩x,n+1,p ref rx,n,p

ϕx,n,p(ν; FVp) denotes a fresh relation symbol applied to ν and the free variables
of simple type int at program point p (denoted FVp). rx,n,p is a fresh ownership
variable. For each function f , there are two synthetic program points, fᵇ and fᵉ,
for the beginning and end of the function respectively. At both points, ConSORT
generates a type template for each argument, where FVfᵇ and FVfᵉ are the names
of integer typed parameters. At fᵉ, ConSORT also generates a type template
for the return value. We write Γp to indicate the type environment at point p,
where every variable is mapped to its corresponding type template. ⟦Γp⟧ is thus
equivalent to ⋀x∈FVp ϕx,0,p(x; FVp).
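For instance, for a variable x of simple type int ref live at a point p with
FVp = {y}, the generated template is ⟨int ref⟩x,0,p = {ν : int | ϕx,1,p(ν; y)} ref rx,0,p,
introducing one fresh relation symbol and one fresh ownership variable.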

When generating these type templates, our implementation also generates own-
ership well-formedness constraints. Specifically, for a type template of the form
{ν : int | ϕx,n+1,p(ν; FVp)} ref rx,n,p, ConSORT emits the constraint rx,n,p =
0 =⇒ ϕx,n+1,p(ν; FVp), and for a type template (τ ref rx,n+1,p) ref rx,n,p Con-
SORT emits the constraint rx,n,p = 0 =⇒ rx,n+1,p = 0.
ConSORT then walks the program, generating constraints between relation
symbols and ownership variables according to the typing rules. These constraints
take three forms: ownership constraints, subtyping constraints, and assertion
constraints. Ownership constraints are simple linear (in)equalities over ownership
variables and constants, according to conditions imposed by the typing rules.
For example, if variable x has the type template τ ref rx,0,p for the expression
x := y ; e at point p, ConSORT generates the constraint rx,0,p = 1.
ConSORT emits subtyping constraints between the relation symbols at
related program points according to the rules of the type system. For example, for
the term let x = y in e at program point p (where e is at program point p′, and x
has simple type int ref ) ConSORT generates the following subtyping constraint:

⟦Γp⟧ ∧ ϕy,1,p(ν; FVp) =⇒ ϕy,1,p′(ν; FVp′) ∧ ϕx,1,p′(ν; FVp′)

in addition to the ownership constraint ry,0,p = ry,0,p′ + rx,0,p′.


Finally, for each assert(ϕ) in the program, ConSORT emits an assertion
constraint of the form ⟦Γp⟧ =⇒ ϕ, which requires that the refinements on integer
typed variables in scope are sufficient to prove ϕ.

Encoding Context Sensitivity. To make inference tractable, we require the user


to fix a priori the maximum length of prefix queries to a constant k (this choice
is easily controlled with a command line parameter to our tool). We supplement
the arguments in every predicate application with a set of integer context vari-
ables c1 , . . . , ck ; these variables do not overlap with any program variables.
ConSORT uses these variables to infer context sensitive refinements as
follows. Consider a function call let x = f^ℓ(y1, . . . , yn) in e at point p where e
is at point p′. ConSORT generates the following constraint for a refinement
ϕyi,n,p(ν, c1, . . . , ck; FVp) which occurs in the type template of yi:

ϕyi,n,p(ν, c1, . . . , ck; FVp) =⇒ σx ϕxi,n,fᵇ(ν, ℓ, c1, . . . , ck−1; FVfᵇ)
σx ϕxi,n,fᵉ(ν, ℓ, c1, . . . , ck−1; FVfᵉ) =⇒ ϕyi,n,p′(ν, c1, . . . , ck; FVp′)
σx = [y1/x1] · · · [yn/xn]

Effectively, we have encoded ℓ1 . . . ℓk ⊴ λ as ⋀0<i≤k ci = ℓi. In the above, the
shift from c1, . . . , ck to ℓ, c1, . . . , ck−1 plays the role of σα in the T-Call rule.
The above constraint serves to determine the value of c1 within the body of the
function f . If f calls another function g, the above rule propagates this value of
c1 to c2 within g and so on. The solver may then instantiate relation symbols
with predicates that are conditional over the values of ci.
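As a concrete instance, with k = 2 the refinement 3 1 ⊴ λ′ =⇒ ν = 3 of
Example 7 becomes a predicate over (ν, c1, c2) equivalent to (c1 = 3 ∧ c2 = 1) =⇒
ν = 3; at the call site labeled 3, the callee's predicate is applied to (ν, 3, c1),
fixing its first context variable to 3 and feeding the caller's c1 into its second.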

Solving Constraints. The results of the above process are two systems of con-
straints: real arithmetic constraints over ownership variables and constrained Horn
clauses (CHC) over the refinement relations. Under certain assumptions about the
simple types in a program, the size of the ownership and subtyping constraints will
be polynomial in the size of the program. These systems are not independent; the
relation constraints may mention the value of ownership variables due to the well-
formedness constraints described above. The ownership constraints are first solved
with Z3 [16]. These constraints are non-linear but Z3 appears particularly well-
engineered to quickly find solutions for the instances generated by ConSORT. We
constrain Z3 to maximize the number of non-zero ownership variables to ensure as
few refinements as possible are constrained to be ⊤ by ownership well-formedness.
The values of ownership variables inferred by Z3 are then substituted into the
constrained Horn clauses, and the resulting system is checked for satisfiability
with an off-the-shelf CHC solver. Our implementation generates constraints in
the industry standard SMT-Lib2 format [8]; any solver that accepts this format
can be used as a backend for ConSORT. Our implementation currently supports
Spacer [37] (part of the Z3 solver [16]), HoICE [13], and Eldarica [48] (adding a
new backend requires only a handful of lines of glue code). We found that different
solvers are better tuned to different problems; we also implemented a parallel mode
which runs all supported solvers in parallel, using the first available result.

4.2 Extensions
Primitive Operations. As defined in Section 2, our language can compare integers
to zero and load and store them from memory, but can perform no meaningful
computation over these numbers. To promote the flexibility of our type system
and simplify our soundness statement, we do not fix a set of primitive operations
and their static semantics. Instead, we assume any set of primitive operations
used in a program are given sound function types in Θ. For example, under the
assumption that + has its usual semantics and the underlying logic supports +, we
can give + the type ∀λ. ⟨x : ⊤⁰, y : ⊤⁰⟩ → ⟨x : ⊤⁰, y : ⊤⁰ | {ν : int | ν = x + y}⟩, where ⊤⁰ = {ν : int | ⊤}.
Interactions with a nondeterministic environment or unknown program inputs
can then be modeled with a primitive that returns integers refined with ⊤.
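For instance, such a primitive rand (a hypothetical name) could be assigned a
type of the form ∀λ. ⟨⟩ → ⟨ | {ν : int | ⊤}⟩, returning an integer about which
nothing is assumed.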

Dependent Tuples. Our implementation supports types of the form (x1 : τ1, . . . ,
xn : τn), where xi can appear within τj (j ≠ i) if τi is an integer type. For
example, (x : {ν : int | ⊤} , y : {ν : int | ν > x }) is the type of tuples whose second
element is strictly greater than the first. We also extend the language with tuple
constructors as a new value form, and let bindings with tuple patterns as the LHS.
The extension to type checking is relatively straightforward; the only signifi-
cant extensions are to the subtyping rules. Specifically, the subtyping check for a
tuple element xi : τi is performed in a type environment elaborated with the types
and names of other tuple elements. The extension to type inference is also straight-
forward; the arguments for a predicate symbol include any enclosing dependent
tuple names and the environment in subtyping constraints is likewise extended.
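For example (a sketch; the term-level tuple syntax mirrors the type-level notation):

let p = (1, 2) in   // p : (x : {ν : int | ν = 1} , y : {ν : int | ν > x })
let (a, b) = p in   // a : {ν : int | ν = 1} , b : {ν : int | ν > a}
assert(b > a)

The type of the second component is checked in an environment where x is in
scope, so {ν : int | ν = 2} ≤ {ν : int | ν > x} holds given x = 1.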

Recursive Types. Our language also supports some unbounded heap structures
via recursive reference types. To keep inference tractable, we forbid nested recur-
sive types, multiple occurrences of the recursive type variable, and additionally
fix the shape of refinements that occur within a recursive type. For recursive re-
finements that fit the above restriction, our approach for refinements is broadly
similar to that in [35], and we use the ownership scheme of [56] for handling
ownership. We first use simple type inference to infer the shape of the recursive
types, and automatically insert fold/unfold annotations into the source program.
As in [35], the refinements within an unfolding of a recursive type may refer to
dependent tuple names bound by the enclosing type. These recursive types can
express, e.g., the invariants of a mutable, sorted list. As in [56], recursive types
are unfolded once before assigning ownership variables; further unfoldings copy
existing ownership variables.
As in Java or C++, our language does not support sum types, and any
instantiation of a recursive type must use a null pointer. Our implementation
supports an ifnull construct in addition to a distinguished null constant. Our
implementation allows any refinement to hold for the null constant, including
⊥. Currently, our implementation does not detect null pointer dereferences, and
all soundness guarantees are made modulo freedom of null dereferences. As ⟦Γ⟧
omits refinements under reference types, null pointer refinements do not affect
the verification of programs without null pointer dereferences.

Arrays. Our implementation supports arrays of integers. Each array is given an


ownership describing the ownership of memory allocated for the entire array. The
array type contains two refinements: the first refines the length of the array itself,
and the second refines the entire array contents. The content refinement may
refer to a symbolic index variable for precise, per-index refinements. At reads
and writes to the array, ConSORT instantiates the refinement’s symbolic index
variable with the concrete index used at the read/write.
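For instance (illustrative only), the array built by the Array-Inv benchmark of
Section 5, which stores i at every index i of a length-n array, can be summarized
by a length refinement equivalent to ν = n together with a content refinement
equivalent to ν = i over the symbolic index i.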
As in [56], our restriction to arrays of integers stems from the difficulty of
ownership inference. Soundly handling pointer arrays requires index-wise tracking
of ownerships which significantly complicates automated inference. We leave
supporting arrays of pointers to future work.

4.3 Limitations
Our current approach is not complete; there are safe programs that will be rejected
by our type system. As mentioned in Section 3.1, our well-formedness condition
forbids refinements that refer to memory locations. As a result, ConSORT
cannot in general express, e.g., that the contents of two references are equal.
Further, due to our reliance on automated theorem provers we are restricted to
logics with sound but potentially incomplete decision procedures. ConSORT
also does not support conditional or context-sensitive ownerships, and therefore
cannot precisely handle conditional mutation or aliasing.

5 Experiments
We now present the results of preliminary experiments performed with the imple-
mentation described in Section 4. The goal of these experiments was to answer the
Table 1. Description of benchmark suite adapted from JayHorn. Java are programs
that test Java-specific features. Inc are tests that cannot be handled by ConSORT,
e.g., null checking. Bug includes a “safe” program we discovered was actually incorrect.

Set Orig. Adapted Java Inc Bug


Safe 41 32 6 2 1
Unsafe 41 26 13 2 0

following questions: i) is the type system (and extensions of Section 4) expressive


enough to type and verify non-trivial programs? and ii) is type inference feasible?
To answer these questions, we evaluated our prototype implementation on two
sets of benchmarks.4 The first set is adapted from JayHorn [32, 33], a verification
tool for Java. This test suite contains a combination of 82 safe and unsafe
programs written in Java. We chose this benchmark suite as, like ConSORT,
JayHorn is concerned with the automated verification of programs in a language
with mutable, aliased memory cells. Further, although some of their benchmark
programs tested Java specific features, most could be adapted into our low-level
language. The tests we could adapt provide a comparison with existing state-of-
the-art verification techniques. A detailed breakdown of the adapted benchmark
suite can be found in Table 1.

Remark 2. The original JayHorn paper includes two additional benchmark sets,
Mine Pump and CBMC. Both our tool and recent JayHorn versions time out on
the Mine Pump benchmark. Further, the CBMC tests were either subsumed by
our own test programs, tested Java specific features, or tested program synthesis
functionality. We therefore omitted both of these benchmarks from our evaluation.

The second benchmark set consists of data structure implementations and


microbenchmarks written directly in our low-level imperative language. We
developed this suite to test the expressive power of our type system and inference.
The programs included in this suite are:

– Array-List Implementation of an unbounded list backed by an array.


– Sorted-List Implementation of a mutable, sorted list maintained with an
in-place insertion sort algorithm.
– Shuffle Multiple live references are used to mutate the same location in
program memory as in Example 5.
– Mut-List Implementation of general linked lists with a clear operation.
– Array-Inv A program which allocates a length n array and writes the value
i at every index i.
– Intro2 The motivating program shown in Figure 2 in Section 1.
4 Our experiments and the ConSORT source code are available at
https://www.fos.kuis.kyoto-u.ac.jp/projects/consort/.
Table 2. Comparison of ConSORT to JayHorn on the benchmark set of [32] (top)


and our custom benchmark suite (bottom). T/O indicates a time out.

              ConSORT          JayHorn
Set     N. Tests Correct T/O   Correct T/O Imp.
Safe    32       29      3     24      5   3
Unsafe  26       26      0     19      0   7

Name        Safe? Time(s) Ann JH     Name            Safe? Time(s) Ann JH
Array-Inv   ✓     10.07   0   T/O    Array-Inv-BUG   X     5.29    0   T/O
Array-List  ✓     16.76   0   T/O    Array-List-BUG  X     1.13    0   T/O
Intro2      ✓     0.08    0   T/O    Intro2-BUG      X     0.02    0   T/O
Mut-List    ✓     1.45    3   T/O    Mut-List-BUG    X     0.41    3   T/O
Shuffle     ✓     0.13    3   ✓      Shuffle-BUG     X     0.07    3   X
Sorted-List ✓     1.90    3   T/O    Sorted-List-BUG X     1.10    3   T/O

We introduced unsafe mutations to these programs to check our tool for unsound-
ness and translated these programs into Java for further comparison with JayHorn.
Our benchmarks and JayHorn’s require a small number of trivially identi-
fied alias annotations. The adapted JayHorn benchmarks contain a total of 6
annotations; the most for any individual test was 3. The number of annotations
required for our benchmark suite are shown in column Ann. of Table 2.
We first ran ConSORT on each program in our benchmark suite and ran
version 0.7 of JayHorn on the corresponding Java version. We recorded the final
verification result for both our tool and JayHorn. We also collected the end-to-end
runtime of ConSORT for each test; we do not give a performance comparison
with JayHorn given the many differences in target languages. For the JayHorn
suite, we first ran our tool on the adapted version of each test program and ran
JayHorn on the original Java version. We also did not collect runtime information
for this set of experiments because our goal is a comparison of tool precision, not
performance. All tests were run on a machine with 16 GB RAM and 4 Intel i5
CPUs at 2GHz and with a timeout of 60 seconds (the same timeout was used in
[32]). We used ConSORT’s parallel backend (Section 4) with Z3 version 4.8.4,
HoICE version 1.8.1, and Eldarica version 2.0.1 and JayHorn’s Eldarica backend.

5.1 Results
The results of our experiments are shown in Table 2. On the JayHorn benchmark
suite ConSORT performs competitively with JayHorn, correctly identifying 29
of the 32 safe programs as such. For all 3 tests on which ConSORT timed out
after 60 seconds, JayHorn also timed out (column T/O). For the unsafe programs,
ConSORT correctly identified all programs as unsafe within 60 seconds; JayHorn
answered Unknown for 7 tests (column Imp.).
On our own benchmark set, ConSORT correctly verifies all safe versions of
the programs within 60 seconds. For the unsafe variants, ConSORT was able to
quickly and definitively determine these programs unsafe. JayHorn times out on
all tests except for Shuffle and Shuffle-BUG (column JH). We investigated the
cause of time outs and discovered that after verification failed with an unbounded
heap model, JayHorn attempts verification on increasingly larger bounded heaps.
In every case, JayHorn exceeded the 60 second timeout before reaching a pre-
configured limit on the heap bound. This result suggests JayHorn struggles in
the presence of per-object invariants and unbounded allocations; the only two
tests JayHorn successfully analyzed contain just a single object allocation.
We do not believe this struggle is indicative of a shortcoming in JayHorn’s
implementation, but stems from the fundamental limitations of JayHorn’s memory
representation. Like many verification tools (see Section 6), JayHorn uses a single,
unchanging invariant for every object allocated at the same syntactic location;
effectively, all objects allocated at the same location are assumed to alias with one
another. This representation cannot, in general, handle programs with different
invariants for distinct objects that evolve over time. We hypothesize other tools
that adopt a similar approach will exhibit the same difficulty.

6 Related Work

The difficulty in handling programs with mutable references and aliasing has been
well-studied. Like JayHorn, many approaches model the heap explicitly at ver-
ification time, approximating concrete heap locations with allocation site labels
[14, 20, 32, 33, 46]; each abstract location is also associated with a refinement. As
abstract locations summarize many concrete locations, this approach does not in
general admit strong updates and flow-sensitivity; in particular, the refinement
associated with an abstract location is fixed for the lifetime of the program. The
techniques cited above include various workarounds for this limitation. For exam-
ple, [14, 46] temporarily allows breaking these invariants through a distinguished
program name as long as the abstract location is not accessed through another
name. The programmer must therefore eventually bring the invariant back in
sync with the summary location. As a result, these systems ultimately cannot
precisely handle programs that require evolving invariants on mutable memory.
A similar approach was taken in CQual [23] by Aiken et al. [2]. They used
an explicit restrict binding for pointers. Strong updates are permitted through
pointers bound with restrict, but the program is forbidden from using any pointers
which share an allocation site while the restrict binding is live.
A related technique used in the field of object-oriented verification is to declare
object invariants at the class level and allow these invariants on object fields to be
broken during a limited period of time [7, 22]. In particular, the work on Spec#
[7] uses an ownership system which tracks whether object a owns object b; like
ConSORT’s ownership system, these ownerships contain the effects of mutation.
However, Spec#’s ownership is quite strict and does not admit references to b
outside of the owning object a.
Viper [30, 42] (and its related projects [31, 39]) uses access annotations (ex-
pressed as permission predicates) to explicitly transfer access/mutation permis-
sions for references between static program names. Like ConSORT, permissions
may be fractionally transferred, allowing temporary shared, immutable access to
a mutable memory cell. However, while ConSORT automatically infers many
ownership transfers, Viper requires extensive annotations for each transfer.
F*, a dependently typed dialect of ML, includes an update/select theory of
heaps and requires explicit annotations summarizing the heap effects of a method
[44, 57, 58]. This approach enables modular reasoning and precise specification of
pre- and post-conditions with respect to the heap, but precludes full automation.
The work on rely–guarantee reference types by Gordon et al. [26, 27] uses re-
finement types in a language with mutable references and aliasing. Their approach ex-
tends reference types with rely/guarantee predicates; the rely predicate describes
possible mutations via aliases, and the guarantee predicate describes the admissi-
ble mutations through the current reference. If two references may alias, then the
guarantee predicate of one reference implies the rely predicate of the other and
vice versa. This invariant is maintained with a splitting operation that is similar
to our + operator. Further, their type system allows strong updates to reference
refinements provided the new refinements are preserved by the rely predicate.
Thus, rely–guarantee refinements support multiple mutable, aliased references
with non-trivial refinement information. Unfortunately this expressiveness comes
at the cost of automated inference and verification; an embedding of this system
into Liquid Haskell [63] described in [27] was forced to sacrifice strong updates.
Work by Degen et al. [17] introduced linear state annotations to Java. To effect
strong updates in the presence of aliasing, like ConSORT, their system requires that
annotated memory locations be mutated only through a distinguished reference.
Further, all aliases of this mutable reference give no information about the state
of the object much like our 0 ownership pointers. However, their system cannot
handle multiple, immutable aliases with non-trivial annotation information; only
the mutable reference may have non-trivial annotation information.
The fractional ownerships in ConSORT and their counterparts in [55, 56]
have a clear relation to linear type systems. Many authors have explored the
use of linear type systems to reason in contexts with aliased mutable references
[18, 19, 52], and in particular with the goal of supporting strong updates [1].
A closely related approach is RustHorn by Matsushita et al. [40]. Much like
ConSORT, RustHorn uses CHC and linear aliasing information for the sound
and—unlike ConSORT—complete verification of programs with aliasing and
mutability. However, their approach depends on Rust’s strict borrowing discipline,
and cannot handle programs where multiple aliased references are used in the
same lexical region. In contrast, ConSORT supports fine-grained, per-statement
changes in mutability and even further control with alias annotations, which
allows it to verify larger classes of programs.
The ownerships of ConSORT also have a connection to separation logic
[45]; the separating conjunction isolates write effects to local subheaps, while
ConSORT’s ownership system isolates effects to local updates of pointer types.
Other researchers have used separation logic to precisely support strong updates
of abstract state. For example, in work by Kloos et al. [36] resources are associated
with static, abstract names; each resource (represented by its static name) may
be owned (and thus, mutated) by exactly one thread. Unlike ConSORT, their
ownership system forbids even temporary immutable, shared ownership, or
transferring ownerships at arbitrary program points. An approach proposed by
Bakst and Jhala [4] uses a similar technique, combining separation logic with
refinement types. Their approach gives allocated memory cells abstract names, and
associates these names with refinements in an abstract heap. Like the approach
of Kloos et al. and ConSORT’s ownership 1 pointers, they ensure these abstract
locations are distinct in all concrete heaps, enabling sound, strong updates.
The idea of using a rational number to express permissions to access a refer-
ence dates back to the type system of fractional permissions by Boyland [12]. His
work used fractional permissions to verify race freedom of a concurrent program
without a may-alias analysis. Later, Terauchi [59] proposed a type-inference algo-
rithm that reduces typing constraints to a set of linear inequalities over rational
numbers. Boyland’s idea also inspired a variant of separation logic for a concurrent
programming language [11] to express sharing of read permissions among several
threads. Our previous work [55, 56], inspired by that in [11, 59], proposed meth-
ods for type-based verification of resource-leak freedom, in which a rational num-
ber expresses an obligation to deallocate a certain resource, not just a permission.
The issue of context-sensitivity (sometimes called polyvariance) is well-studied
in the field of abstract interpretation (e.g., [28, 34, 41, 50, 51], see [25] for a recent
survey). Polyvariance has also been used in type systems to assign different behav-
iors to the same function depending on its call site [3, 6, 64]. In the area of refine-
ment type systems, Zhu and Jagannathan developed a context-sensitive dependent
type system for a functional language [67] that indexed function types by unique
labels attached to call-sites. Our context-sensitivity approach was inspired by this
work. In fact, we could have formalized context-polymorphism within the frame-
work of full dependent types, but chose the current presentation for simplicity.

7 Conclusion
We presented ConSORT, a novel type system for safety verification of imperative
programs with mutability and aliasing. ConSORT is built upon the novel combi-
nation of fractional ownership types and refinement types. Ownership types flow-
sensitively and precisely track the existence of mutable aliases. ConSORT admits
sound strong updates by discarding refinement information on mutably-aliased
references as indicated by ownership types. Our type system is amenable to auto-
matic type inference; we have implemented a prototype of this inference tool and
found it can verify several non-trivial programs and outperforms a state-of-the-art
program verifier. As an area of future work, we plan to investigate using fractional
ownership types to soundly allow refinements that mention memory locations.

Acknowledgments The authors would like to thank the reviewers for their thoughtful feedback
and suggestions, and Yosuke Fukuda and Alex Potanin for their feedback on early drafts.
This work was supported in part by JSPS KAKENHI, grant numbers JP15H05706 and
JP19H04084, and in part by the JST ERATO MMSD Project.
Bibliography

[1] Ahmed, A., Fluet, M., Morrisett, G.: L³: a linear language with locations.
Fundamenta Informaticae 77(4), 397–449 (2007)
[2] Aiken, A., Foster, J.S., Kodumal, J., Terauchi, T.: Checking and
inferring local non-aliasing. In: Conference on Programming Lan-
guage Design and Implementation (PLDI). pp. 129–140 (2003).
https://doi.org/10.1145/781131.781146
[3] Amtoft, T., Turbak, F.: Faithful translations between polyvariant flows and
polymorphic types. In: European Symposium on Programming (ESOP). pp.
26–40. Springer (2000). https://doi.org/10.1007/3-540-46425-5_2
[4] Bakst, A., Jhala, R.: Predicate abstraction for linked data struc-
tures. In: Conference on Verification, Model Checking, and Abstract In-
terpretation (VMCAI). pp. 65–84. Springer Berlin Heidelberg (2016).
https://doi.org/10.1007/978-3-662-49122-5_3
[5] Ball, T., Levin, V., Rajamani, S.K.: A decade of software model check-
ing with SLAM. Communications of the ACM 54(7), 68–76 (2011).
https://doi.org/10.1145/1965724.1965743
[6] Banerjee, A.: A modular, polyvariant and type-based closure analysis. In:
International Conference on Functional Programming (ICFP). pp. 1–10
(1997). https://doi.org/10.1145/258948.258951
[7] Barnett, M., Fähndrich, M., Leino, K.R.M., Müller, P., Schulte, W., Venter,
H.: Specification and verification: the Spec# experience. Communications
of the ACM 54(6), 81–91 (2011). https://doi.org/10.1145/1953122.1953145
[8] Barrett, C., Fontaine, P., Tinelli, C.: The Satisfiability Modulo Theories
Library (SMT-LIB). www.SMT-LIB.org (2016)
[9] Bengtson, J., Bhargavan, K., Fournet, C., Gordon, A.D., Maffeis, S.: Re-
finement types for secure implementations. ACM Transactions on Pro-
gramming Languages and Systems (TOPLAS) 33(2), 8:1–8:45 (2011).
https://doi.org/10.1145/1890028.1890031
[10] Bhargavan, K., Bond, B., Delignat-Lavaud, A., Fournet, C., Hawblitzel,
C., Hriţcu, C., Ishtiaq, S., Kohlweiss, M., Leino, R., Lorch, J., Mail-
lard, K., Pan, J., Parno, B., Protzenko, J., Ramananandro, T., Rane, A.,
Rastogi, A., Swamy, N., Thompson, L., Wang, P., Zanella-Béguelin, S.,
Zinzindohoué, J.K.: Everest: Towards a verified, drop-in replacement of
HTTPS. In: Summit on Advances in Programming Languages (SNAPL
2017). pp. 1:1–1:12. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik (2017).
https://doi.org/10.4230/LIPIcs.SNAPL.2017.1
[11] Bornat, R., Calcagno, C., O’Hearn, P.W., Parkinson, M.J.: Per-
mission accounting in separation logic. In: Symposium on Prin-
ciples of Programming Languages (POPL). pp. 259–270 (2005).
https://doi.org/10.1145/1040305.1040327
[12] Boyland, J.: Checking interference with fractional permissions. In:
Symposion on Static Analysis (SAS). pp. 55–72. Springer (2003).
https://doi.org/10.1007/3-540-44898-5_4
[13] Champion, A., Kobayashi, N., Sato, R.: HoIce: An ICE-based non-linear Horn
clause solver. In: Asian Symposium on Programming Languages and Systems
(APLAS). pp. 146–156. Springer (2018). https://doi.org/10.1007/978-3-030-
02768-1_8
[14] Chugh, R., Herman, D., Jhala, R.: Dependent types for JavaScript. In: Confer-
ence on Object Oriented Programming Systems Languages and Applications
(OOPSLA). pp. 587–606 (2012). https://doi.org/10.1145/2384616.2384659
[15] Cousot, P., Cousot, R., Feret, J., Mauborgne, L., Miné, A., Monniaux, D.,
Rival, X.: The ASTRÉE analyzer. In: European Symposium on Programming
(ESOP). pp. 21–30. Springer (2005). https://doi.org/10.1007/978-3-540-
31987-0_3
[16] De Moura, L., Bjørner, N.: Z3: An efficient SMT solver. In: Conference on
Tools and Algorithms for the Construction and Analysis of Systems (TACAS).
pp. 337–340. Springer (2008). https://doi.org/10.1007/978-3-540-78800-3_24
[17] Degen, M., Thiemann, P., Wehr, S.: Tracking linear and affine resources
with JAVA(X). In: European Conference on Object-Oriented Programming
(ECOOP). pp. 550–574. Springer (2007). https://doi.org/10.1007/978-3-540-
73589-2_26
[18] DeLine, R., Fähndrich, M.: Enforcing high-level protocols in low-level soft-
ware. In: Conference on Programming Language Design and Implementation
(PLDI). pp. 59–69 (2001). https://doi.org/10.1145/378795.378811
[19] Fähndrich, M., DeLine, R.: Adoption and focus: Practical linear
types for imperative programming. In: Conference on Programming
Language Design and Implementation (PLDI). pp. 13–24 (2002).
https://doi.org/10.1145/512529.512532
[20] Fink, S.J., Yahav, E., Dor, N., Ramalingam, G., Geay, E.: Effective type-
state verification in the presence of aliasing. ACM Transactions on Soft-
ware Engineering and Methodology (TOSEM) 17(2), 9:1–9:34 (2008).
https://doi.org/10.1145/1348250.1348255
[21] Flanagan, C.: Hybrid type checking. In: Symposium on Prin-
ciples of Programming Languages (POPL). pp. 245–256 (2006).
https://doi.org/10.1145/1111037.1111059
[22] Flanagan, C., Leino, K.R.M., Lillibridge, M., Nelson, G., Saxe, J.B.,
Stata, R.: Extended static checking for Java. In: Conference on Program-
ming Language Design and Implementation (PLDI). pp. 234–245 (2002).
https://doi.org/10.1145/512529.512558
[23] Foster, J.S., Terauchi, T., Aiken, A.: Flow-sensitive type qualifiers. In:
Conference on Programming Language Design and Implementation (PLDI).
pp. 1–12 (2002). https://doi.org/10.1145/512529.512531
[24] Freeman, T., Pfenning, F.: Refinement types for ML. In: Conference on
Programming Language Design and Implementation (PLDI). pp. 268–277
(1991). https://doi.org/10.1145/113445.113468
[25] Gilray, T., Might, M.: A survey of polyvariance in abstract interpretations.
In: Symposium on Trends in Functional Programming. pp. 134–148. Springer
(2013). https://doi.org/10.1007/978-3-642-45340-3_9
[26] Gordon, C.S., Ernst, M.D., Grossman, D.: Rely–guarantee references for
refinement types over aliased mutable data. In: Conference on Program-
ming Language Design and Implementation (PLDI). pp. 73–84 (2013).
https://doi.org/10.1145/2491956.2462160
[27] Gordon, C.S., Ernst, M.D., Grossman, D., Parkinson, M.J.: Verifying invari-
ants of lock-free data structures with rely–guarantee and refinement types.
ACM Transactions on Programming Languages and Systems (TOPLAS)
39(3), 11:1–11:54 (2017). https://doi.org/10.1145/3064850
[28] Hardekopf, B., Wiedermann, B., Churchill, B., Kashyap, V.: Widening for
control-flow. In: Conference on Verification, Model Checking, and Abstract
Interpretation (VMCAI). pp. 472–491 (2014). https://doi.org/10.1007/978-
3-642-54013-4_26
[29] Hawblitzel, C., Howell, J., Kapritsos, M., Lorch, J.R., Parno, B., Roberts,
M.L., Setty, S., Zill, B.: IronFleet: proving practical distributed systems
correct. In: Symposium on Operating Systems Principles (SOSP). pp. 1–17.
ACM (2015). https://doi.org/10.1145/2815400.2815428
[30] Heule, S., Kassios, I.T., Müller, P., Summers, A.J.: Verification condition gen-
eration for permission logics with abstract predicates and abstraction func-
tions. In: European Conference on Object-Oriented Programming (ECOOP).
pp. 451–476. Springer (2013). https://doi.org/10.1007/978-3-642-39038-8_19
[31] Heule, S., Leino, K.R.M., Müller, P., Summers, A.J.: Abstract read per-
missions: Fractional permissions without the fractions. In: Conference on
Verification, Model Checking, and Abstract Interpretation (VMCAI). pp.
315–334 (2013). https://doi.org/10.1007/978-3-642-35873-9_20
[32] Kahsai, T., Kersten, R., Rümmer, P., Schäf, M.: Quantified heap invariants
for object-oriented programs. In: Conference on Logic for Programming
Artificial Intelligence and Reasoning (LPAR). pp. 368–384 (2017)
[33] Kahsai, T., Rümmer, P., Sanchez, H., Schäf, M.: JayHorn: A framework for
verifying Java programs. In: Conference on Computer Aided Verification
(CAV). pp. 352–358. Springer (2016). https://doi.org/10.1007/978-3-319-
41528-4_19
[34] Kashyap, V., Dewey, K., Kuefner, E.A., Wagner, J., Gibbons, K., Sarracino,
J., Wiedermann, B., Hardekopf, B.: JSAI: a static analysis platform for
JavaScript. In: Conference on Foundations of Software Engineering (FSE).
pp. 121–132 (2014). https://doi.org/10.1145/2635868.2635904
[35] Kawaguchi, M., Rondon, P., Jhala, R.: Type-based data structure verification.
In: Conference on Programming Language Design and Implementation
(PLDI). pp. 304–315 (2009). https://doi.org/10.1145/1542476.1542510
[36] Kloos, J., Majumdar, R., Vafeiadis, V.: Asynchronous liquid separation
types. In: European Conference on Object-Oriented Programming (ECOOP).
pp. 396–420. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik (2015).
https://doi.org/10.4230/LIPIcs.ECOOP.2015.396
[37] Komuravelli, A., Gurfinkel, A., Chaki, S., Clarke, E.M.: Automatic ab-
straction in SMT-based unbounded software model checking. In: Confer-
ence on Computer Aided Verification (CAV). pp. 846–862. Springer (2013).
https://doi.org/10.1007/978-3-642-39799-8_59
[38] Leino, K.R.M.: Dafny: An automatic program verifier for functional correct-
ness. In: Conference on Logic for Programming Artificial Intelligence and Rea-
soning (LPAR). pp. 348–370. Springer (2010). https://doi.org/10.1007/978-
3-642-17511-4_20
[39] Leino, K.R.M., Müller, P., Smans, J.: Deadlock-free channels and locks. In:
European Symposium on Programming (ESOP). pp. 407–426. Springer-
Verlag (2010). https://doi.org/10.1007/978-3-642-11957-6_22
[40] Matsushita, Y., Tsukada, T., Kobayashi, N.: RustHorn: CHC-based verifica-
tion for Rust programs. In: European Symposium on Programming (ESOP).
Springer (2020)
[41] Milanova, A., Rountev, A., Ryder, B.G.: Parameterized object sen-
sitivity for points-to analysis for Java. ACM Transactions on Soft-
ware Engineering and Methodology (TOSEM) 14(1), 1–41 (2005).
https://doi.org/10.1145/1044834.1044835
[42] Müller, P., Schwerhoff, M., Summers, A.J.: Viper: A verification infrastruc-
ture for permission-based reasoning. In: Conference on Verification, Model
Checking, and Abstract Interpretation (VMCAI). pp. 41–62. Springer-Verlag
(2016). https://doi.org/10.1007/978-3-662-49122-5_2
[43] Pierce, B.C.: Types and programming languages. MIT press (2002)
[44] Protzenko, J., Zinzindohoué, J.K., Rastogi, A., Ramananandro, T., Wang,
P., Zanella-Béguelin, S., Delignat-Lavaud, A., Hriţcu, C., Bhargavan, K.,
Fournet, C., Swamy, N.: Verified low-level programming embedded in F*.
Proceedings of the ACM on Programming Languages 1(ICFP), 17:1–17:29
(2017). https://doi.org/10.1145/3110261
[45] Reynolds, J.C.: Separation logic: A logic for shared mutable data structures.
In: Symposium on Logic in Computer Science (LICS). pp. 55–74. IEEE
(2002). https://doi.org/10.1109/LICS.2002.1029817
[46] Rondon, P., Kawaguchi, M., Jhala, R.: Low-level liquid types. In: Symposium
on Principles of Programming Languages (POPL). pp. 131–144 (2010).
https://doi.org/10.1145/1706299.1706316
[47] Rondon, P.M., Kawaguci, M., Jhala, R.: Liquid types. In: Conference on
Programming Language Design and Implementation (PLDI). pp. 159–169
(2008). https://doi.org/10.1145/1375581.1375602
[48] Rümmer, P., Hojjat, H., Kuncak, V.: Disjunctive interpolants for Horn-
clause verification. In: Conference on Computer Aided Verification (CAV).
pp. 347–363. Springer (2013). https://doi.org/10.1007/978-3-642-39799-8_24
[49] Sharir, M., Pnueli, A.: Two approaches to interprocedural data flow analysis.
In: Muchnick, S.S., Jones, N.D. (eds.) Program Flow Analysis: Theory and
Applications, chap. 7, pp. 189–223. Prentice Hall (1981)
[50] Shivers, O.: Control-flow analysis of higher-order languages. Ph.D. thesis,
Carnegie Mellon University (1991)
[51] Smaragdakis, Y., Bravenboer, M., Lhoták, O.: Pick your con-
texts well: Understanding object-sensitivity. In: Symposium on
Principles of Programming Languages (POPL). pp. 17–30 (2011).
https://doi.org/10.1145/1926385.1926390
[52] Smith, F., Walker, D., Morrisett, G.: Alias types. In: European
Symposium on Programming (ESOP). pp. 366–381. Springer (2000).
https://doi.org/10.1007/3-540-46425-5_24
[53] Späth, J., Ali, K., Bodden, E.: Context-, flow-, and field-sensitive
data-flow analysis using synchronized pushdown systems. Proceedings
of the ACM on Programming Languages 3(POPL), 48:1–48:29 (2019).
https://doi.org/10.1145/3290361
[54] Späth, J., Nguyen Quang Do, L., Ali, K., Bodden, E.: Boomerang:
Demand-driven flow-and context-sensitive pointer analysis for Java. In:
European Conference on Object-Oriented Programming (ECOOP). pp.
22:1–22:26. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik (2016).
https://doi.org/10.4230/LIPIcs.ECOOP.2016.22
[55] Suenaga, K., Fukuda, R., Igarashi, A.: Type-based safe resource dealloca-
tion for shared-memory concurrency. In: Conference on Object Oriented
Programming Systems Languages and Applications (OOPSLA). pp. 1–20
(2012). https://doi.org/10.1145/2384616.2384618
[56] Suenaga, K., Kobayashi, N.: Fractional ownerships for safe memory deal-
location. In: Asian Symposium on Programming Languages and Systems
(APLAS). pp. 128–143. Springer (2009). https://doi.org/10.1007/978-3-642-
10672-9_11
[57] Swamy, N., Hriţcu, C., Keller, C., Rastogi, A., Delignat-Lavaud, A., Forest,
S., Bhargavan, K., Fournet, C., Strub, P.Y., Kohlweiss, M., Zinzindohoué,
J.K., Zanella-Béguelin, S.: Dependent types and multi-monadic effects in
F*. In: Symposium on Principles of Programming Languages (POPL). pp.
256–270 (2016). https://doi.org/10.1145/2837614.2837655
[58] Swamy, N., Weinberger, J., Schlesinger, C., Chen, J., Livshits, B.: Verifying
higher-order programs with the Dijkstra monad. In: Conference on Program-
ming Language Design and Implementation (PLDI). pp. 387–398 (2013).
https://doi.org/10.1145/2491956.2491978
[59] Terauchi, T.: Checking race freedom via linear programming. In: Conference
on Programming Language Design and Implementation (PLDI). pp. 1–10
(2008). https://doi.org/10.1145/1375581.1375583
[60] Toman, J., Siqi, R., Suenaga, K., Igarashi, A., Kobayashi, N.: ConSORT:
Context- and flow-sensitive ownership refinement types for imperative pro-
grams. https://arxiv.org/abs/2002.07770 (2020)
[61] Unno, H., Kobayashi, N.: Dependent type inference with interpolants. In:
Conference on Principles and Practice of Declarative Programming (PPDP).
pp. 277–288. ACM (2009). https://doi.org/10.1145/1599410.1599445
[62] Vazou, N., Rondon, P.M., Jhala, R.: Abstract refinement types. In: Euro-
pean Symposium on Programming (ESOP). pp. 209–228. Springer (2013).
https://doi.org/10.1007/978-3-642-37036-6_13
[63] Vazou, N., Seidel, E.L., Jhala, R., Vytiniotis, D., Peyton-Jones, S.: Refine-
ment types for Haskell. In: International Conference on Functional Program-
ming (ICFP). pp. 269–282 (2014). https://doi.org/10.1145/2628136.2628161
[64] Wells, J.B., Dimock, A., Muller, R., Turbak, F.: A calculus with polymorphic
and polyvariant flow types. Journal of Functional Programming 12(3), 183–
227 (2002). https://doi.org/10.1017/S0956796801004245
[65] Xi, H., Pfenning, F.: Dependent types in practical programming. In: Sympo-
sium on Principles of Programming Languages (POPL). pp. 214–227. ACM
(1999). https://doi.org/10.1145/292540.292560
[66] Zave, P.: Using lightweight modeling to understand Chord. ACM
SIGCOMM Computer Communication Review 42(2), 49–57 (2012).
https://doi.org/10.1145/2185376.2185383
[67] Zhu, H., Jagannathan, S.: Compositional and lightweight dependent
type inference for ML. In: Conference on Verification, Model Check-
ing, and Abstract Interpretation (VMCAI). pp. 295–314. Springer (2013).
https://doi.org/10.1007/978-3-642-35873-9_19

Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium or
format, as long as you give appropriate credit to the original author(s) and the source,
provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will need
to obtain permission directly from the copyright holder.
Mixed Sessions

Vasco T. Vasconcelos, Filipe Casal, Bernardo Almeida, and Andreia Mordido

LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal

Abstract. Session types describe patterns of interaction on communicating channels. Traditional session types include a form of choice whereby servers offer a collection of options, of which each client picks exactly one. This sort of choice constitutes a particular case of separated choice: offering on one side, selecting on the other. We introduce mixed choices in the context of session types and argue that they increase the flexibility of program development at the same time that they reduce the number of synchronisation primitives to exactly one. We present a type system incorporating subtyping and prove preservation and absence of runtime errors for well-typed processes. We further show that classical (conventional) sessions can be faithfully and tightly embedded in mixed choices. Finally, we discuss algorithmic type checking and a runtime system built on top of a conventional (choice-less) message-passing architecture.

Keywords: Type Systems · Session Types · Mixed Choice.

1 Introduction

Session types provide for describing series of continuous interactions on communication channels [16,19,43,45,49]. When used in type systems for programming languages, session type systems statically verify that programs follow protocols, and hence that they do not engage in communication mismatches.
In order to motivate mixed sessions, suppose that we want to describe a
process that asks for a fixed but unbounded number of integer values from some
producer. The consumer may be in two states: happy with the values received
so far, or ready to ask the producer for a new value. In the former case it must
notify the producer so that this may stop sending numbers. In the latter case,
the client must ask the producer for another integer, after which it “goes back
to the beginning”. Using classical sessions, and looking from the consumer side,
the communication channel can be described by a (recursive) session type T of
the form
⊕{enough: end, more: ?int.T}

where ⊕ denotes internal choice (the consumer decides), the two branches in the
choice are labelled with enough and more, type end denotes a channel on which
no further interaction is possible, and ?int denotes the reception of an integer

value. Reception is a prefix to a type; the continuation is T (in this case the “goes
back to the beginning” part). The code for the consumer (and the producer as
well) is unnecessarily complex, featuring parts that exchange messages in both
directions: enough and more selections from the consumer to the producer, and int
messages from the producer to the consumer. In particular, the consumer must
first select option more (outgoing) and then receive an integer (incoming).
Using mixed sessions one can invert the direction of the more selection and
write the type of the channel (again as seen from the side of the consumer) as
⊕{enough! unit.end, more? int.T}
The changes seem merely cosmetic, but label/polarity pairs (polarity is ! or ?)
are now indivisible and constitute the keys of the choice type when seen as a
map. The integer value is piggybacked on top of selection more. As a result, the
classical session primitive operations, namely selection and branching (that is, internal
and external choice) and communication (output and input), become a single one: the
mixed session choice. The producer can be safely written as
p (enough? z. 0 + more! n. produce!(p, n+1))
offering a choice on channel end p featuring mixed branches with labels enough?
and more!, where 0 denotes the terminated process and produce(p, n+1) a recur-
sive call to the producer. The example is further developed in Section 2.
Mixed sessions build on Vasconcelos' presentation of session types, which we
call classical sessions [43], by adapting choice and input/output as needed, but
keeping everything else unchanged as much as possible. The result is a language
with
– a single synchronisation/communication primitive: mixed choice on a given
channel that
– allows for duplicated labels in choice processes, leading to non-determinism
in a pure linear setting, and
– replicated output processes arising naturally from replicated mixed choices,
and that
– enjoys preservation and absence of runtime errors for typable processes, and
– provides for embedding classical sessions in a tight type and operational
correspondence.
The rest of the paper is organised as follows: the next section shows mixed ses-
sions in action; Section 3 introduces the technical development of the language,
and Section 4 proves the main results (preservation and absence of runtime
errors for typable processes). Then Section 5 presents the embedding and the
correspondence proofs, Section 6 discusses implementation details, and Section 7
explores related work. Section 8 concludes the paper.

2 There is Room for Mixed Sessions


This section introduces the main ideas of mixed sessions via examples. We ad-
dress mixed choices, duplicated labels in choices, and unrestricted output, in this
order.
2.1 Mixed Choices

Consider the producer-consumer problem where the producer produces only
insofar as requested by the consumer. Here is the code for a producer that
writes on channel end x numbers starting from n.
def produce(x, n) =
  lin x (enough? z. 0 +
         more! n. produce!(x, n+1)
  )

Syntax qx(M+N) introduces a choice between M and N on channel end x. Qualifier
q is either un or lin and controls whether the process is persistent (remains after
reduction) or is ephemeral (is consumed in the reduction process). Each branch
in a choice is composed of a label (enough or more), a polarity mark (input ?
or output ! ), a variable or a value (z or n), and a continuation process (after
the dot). The terminated process is represented by 0; notation def introduces a
recursive process. The def syntax and its encoding in the base language are from
the Pict programming language [36] and were taken up by SePi [12].
A consumer that requests n integer values on channel end y can be written
as follows, where () represents the only value of type unit.
def consume(y, n) =
  if n == 0
  then lin y (enough! (). 0)
  else lin y (more? z. consume!(y, n-1))

Suppose that x and y are two ends of the same channel. When choices on x and
on y get together, a pair of matching label-polarity pairs is selected and a value is
transmitted from the output continuation to the input continuation.
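For concreteness, one possible composition of the two processes (our sketch, not part of the original example) creates a channel and passes one end to each:

(νxy)(produce!(x, 0) | consume!(y, 5))

assuming the definitions of produce and consume above are in scope. The consumer requests five integers (0 up to 4) and then selects enough, at which point both processes terminate.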
Types for the two channel ends ensure that choice synchronisation succeeds.
The type of x is rec a. lin &{enough?unit.end, more!int.a} where the qualifier lin
says that the channel end must be used in exactly one process, & denotes external
choice, and each branch is composed of a label, a polarity mark, the type of the
communication, and that of the continuation. The type end states that no further
interaction is possible at the channel and rec introduces a recursive type. The
type of y is obtained from that of x by inverting views (⊕ and &) and polarities
( ! and ?), yielding rec b. lin ⊕{enough!unit.end, more?int.b}. The choice at x in the
produce process contains all branches in the type and so we select an external
choice view & for x. The choices at y contain only part of the branches, hence
the internal choice view ⊕. This type discipline ensures that processes do not
engage in runtime errors when trying to find a match for two choices at the two
ends of a given channel.
A few type and process abbreviations simplify coding: i) the lin qualifier
can be omitted; ii) the terminated process 0 together with the trailing dot can
be omitted; iii) the terminated type end together with the trailing dot can be
omitted; and iv) we introduce wildcards (_) in variable binding positions (in
input branches).
2.2 Duplicated Labels in Choices for Types and for Processes

Classical session types require distinct identifiers to label distinct branches.
Mixed sessions relax this restriction by allowing duplicated labels whenever
paired with distinct polarities. The next example describes two processes—
countDown and collect —that bidirectionally exchange a fixed number of msg-
labelled messages. The number of messages that flow in each direction is not
fixed a priori, but instead decided by the non-deterministic operational seman-
tics. The type that describes the channel, as seen by process countDown, is rec
a.⊕{msg!unit.a, msg?unit.a, done!unit}, where one can see the msg label in two
distinct branches, but with different polarities.
Process countDown features a parameter n that controls the number of mes-
sages exchanged (sent or received). The end of the interaction (when n reaches
0) is signalled by a done message.
countDown : (rec a.⊕{msg! unit.a, msg? unit.a, done! unit}, int)
def countDown(x, n) =
  if n == 0
  then x (done! ())
  else x (msg! (). countDown!(x, n-1) +
          msg?_. countDown!(x, n-1))

Process collect sees the channel from the dual viewpoint, obtained by ex-
changing ? with ! and ⊕ with &. Parameter n in this case denotes the number
of messages received. When done, the process writes the result on channel end r ,
global to the collect process.
collect : (rec b.&{msg! unit.b, msg? unit.b, done? unit}, int)
def collect(y, n) =
  y (msg! (). collect!(y, n+1) +
     msg?_. collect!(y, n) +
     done?_. r (result! n))
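For illustration, here is one possible run of the two processes composed on a shared channel (our sketch, eliding the unfolding of the def sugar):

(νxy)(countDown!(x, 1) | collect!(y, 0))
  → (νxy)(countDown!(x, 0) | collect!(y, 1))    (collect sends msg, countDown receives)
  → r (result! 1)                               (done flows from countDown to collect)

A different run matches countDown's msg! branch against collect's msg? branch instead, yielding r (result! 0): the operational semantics decides, non-deterministically, how many messages flow in each direction.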

Mixed sessions allow for duplicated message-polarity pairs, permitting a new
form of non-determinism that uses exclusively linear channels. A process of the
form (νxy)P declares a channel with end points x and y to be used in process P.
The process
(νxy)(
  x (msg! ()) |
  y (msg?_. z (m! true) + msg?_. z (m! false))
)

featuring two linear choices may reduce to z(m! true) or to z(m! false). Non-determinism
in the π-calculus without choice (that of Functions as Processes
[27,29], for example) can only be achieved by introducing race conditions on un
channels. For example, the π-calculus process

(νxy)(x!() | y?_.z!true | y?_.z!false)
reduces either to z!true | (νxy)y?_.z!false or to z!false | (νxy)y?_.z!true,
leaving for the runtime the garbage collection of the inert residuals. Also note
that in this case channel y cannot remain linear.
Duplicated message-polarities in choices lead to elegant and concise code. A
random number generator with a given number n of bits can be written with two
processes. The first process sends n messages on channel end x. The contents of
the messages are irrelevant (we use value () of type unit); what is important is
that n more-labelled messages are sent, followed by a done message, followed by silence.
write : (rec a.⊕{done! unit, more! unit.a}, int)
def write(x, n) =
  if n == 0
  then x (done! ())
  else x (more! (). write!(x, n-1))

The reader process reads the more messages in two distinct branches and
interprets messages received on one branch as bit 0, and on the other as 1. Upon
the reception of a done message, the accumulated random number is conveyed
on channel end r , a variable global to the read process.
read : (rec b.&{done? unit, more? unit.b}, int)
def read(y, n) =
  y (done?_. r (result! n) +
     more?_. read!(y, 2*n) +
     more?_. read!(y, 2*n+1)
  )
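For illustration (our usage sketch), a three-bit random number is obtained by composing the two processes on a shared channel:

(νxy)(write!(x, 3) | read!(y, 0))

which eventually conveys on channel end r one of the values 0 to 7, chosen by the non-deterministic matching of the two more? branches of read against the more! messages of write.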

Notice that mixed sessions allow duplicated label-polarity pairs in processes
but not in types. This point is further discussed in Section 3. Also note that
duplicated message labels could be easily added to traditional session types.

2.3 Unrestricted Output

Mixed sessions allow for replicated output processes. The original version of
the π-calculus [30,31] features recursion on arbitrary processes. Subsequent ver-
sions [29] introduce replication but restricted to input processes. When compared
to languages with unrestricted input only, unrestricted output allows for more
concise programs and fewer message exchanges for the same effect. Here is a
process (call it P ) containing a pair of processes that exchange msg-labelled
messages ad-aeternum,
(νxy)(un y (msg! ()) | un x (msg?_))

where x is of type rec a.un&{msg? unit.a}. The un prefix denotes replication: an
un choice survives reduction. Because neither of the two sub-processes features a
continuation, P reduces to P in one step. The behaviour of un y (msg! ()) can be
mimicked by a process without output replication, namely,
(νwz)(w (ℓ! ()) | un z (ℓ?_. y (msg! (). w (ℓ! ()))))
v ::= Values:
      x                          variable
      true | false               boolean values
      ()                         unit value
P ::= Processes:
      qx Σi∈I Mi                 choice
      P | P                      parallel composition
      (νxx)P                     scope restriction
      if v then P else P         conditional
      0                          inaction
M ::= Branches:
      l⋆v.P                      branch
⋆ ::= Polarities:
      ! | ?                      out and in
q ::= Qualifiers:
      lin | un                   linear and unrestricted

Fig. 1: The syntax of processes

Even if unrestricted output can be simulated with unrestricted input, the encod-
ing requires one extra channel (wz) and an extra message exchange (on channel
wz) in order to reestablish the output on channel end y.
It is a fact that unrestricted output can be added to any flavour of the π-
calculus (session-typed or not). In the case of mixed sessions it arises naturally:
there is only one communication primitive—choice—and this can be classified as
lin or un. If an un-choice happens to behave in “output mode”, then we have an un-
output. It is not obvious how to design the language of mixed choices without
allowing unrestricted output, while still allowing unrestricted input (which is
mandatory for unbounded behaviour).

3 The Syntax and Semantics of Mixed Sessions

This section introduces the syntax and the semantics of mixed sessions. Inspired
by Vasconcelos' formulation of session types for the π-calculus [43,45], mixed
sessions replace input and output, selection and branching (internal and external
choice), with a single construct which we call choice.
3.1 Syntax

Figure 1 presents the syntax of values and processes. Let x, y, z range over a
(countable) set of variables, and let l range over a set of labels. Metavariable v
ranges over values. Following the tradition of the π-calculus, set up by Milner
et al. [30,31], variables are used both as placeholders for incoming values in
communication and for channels. Linearity constraints, central to session types
but absent in the π-calculus, dictate that the two ends of a channel must be
syntactically distinguished; we use one variable for each end [43]. Different primitive
values can be used. Here, we pick the boolean values (so that we may have
a conditional process) and unit, which plays its role in the embedding of classical
session types (Section 5).
Metavariables P and Q range over processes. Choices are processes of the
form qx Σi∈I Mi, offering a choice of Mi alternatives on channel end x. Qualifier q
describes how choice behaves with respect to reduction. If q is lin, then the choice
is consumed in reduction; otherwise q must be un, and in this case the choice
persists after reduction. The type system in Figure 8 rejects nullary (empty)
choices. There are two forms of branches: output l! v.P and input l? x.P. An
output branch sends value v and continues as P. An input branch receives a
value and continues as P with the value replacing variable x. The type system
in Figure 8 makes sure that value v in l? v.P is a variable.
The remaining process constructors are standard in the π-calculus. Processes
of the form P | Q denote the parallel composition of processes P and Q. Scope
restriction (νxy)P binds together the two channel ends x and y of a same channel
in process P . The conditional process if v then P else Q behaves as process P if
v is true and as process Q otherwise. Since we do not have nullary choices, we
include 0—called inaction—as primitive to denote the terminated process.

3.2 Operational Semantics

The variable bindings in the language are as follows: variables x and y are bound
in P , in a process of the form (νxy)P ; variable x is bound in P in a choice of
the form l? x.P . The sets of bound and free variables, as well as substitution,
P [v/x], are defined accordingly. We work up to alpha-conversion and follow
Barendregt’s variable convention, whereby all variables in binding occurrences
in any mathematical context are pairwise distinct and distinct from the free
variables [2].
Figure 2 summarises the operational semantics of mixed sessions. Following
the tradition of the π-calculus, a binary relation on processes—structural congru-
ence—rearranges processes when preparing for reduction. Such an arrangement
reduces the number of rules included in the operational semantics. Structural
congruence was introduced by Milner [27,29]. It is defined as the least congru-
ence relation closed under the axioms in Figure 2. The first three rules state that
parallel composition is commutative, associative, and takes inaction as the neu-
tral element. The fourth rule is commonly known as scope extrusion [30,31] and
allows extending the scope of channel ends x, y to process Q. The side-condition
Structural congruence, P ≡ P

P | Q ≡ Q | P        (P | Q) | R ≡ P | (Q | R)        P | 0 ≡ P
(νxy)P | Q ≡ (νxy)(P | Q)        (νxy)0 ≡ 0        (νwx)(νyz)P ≡ (νyz)(νwx)P

Reduction, P → P

if true then P else Q → P   [R-IfT]        if false then P else Q → Q   [R-IfF]

(νxy)(lin x(M + l! v.P + M′) | lin y(N + l? z.Q + N′) | R) →
        (νxy)(P | Q[v/z] | R)   [R-LinLin]

(νxy)(lin x(M + l! v.P + M′) | un y(N + l? z.Q + N′) | R) →
        (νxy)(P | Q[v/z] | un y(N + l? z.Q + N′) | R)   [R-LinUn]

(νxy)(un x(M + l! v.P + M′) | lin y(N + l? z.Q + N′) | R) →
        (νxy)(P | Q[v/z] | un x(M + l! v.P + M′) | R)   [R-UnLin]

(νxy)(un x(M + l! v.P + M′) | un y(N + l? z.Q + N′) | R) →
        (νxy)(P | Q[v/z] | un x(M + l! v.P + M′) | un y(N + l? z.Q + N′) | R)   [R-UnUn]

P → Q                          P → Q                     P ≡ P′   P′ → Q′   Q′ ≡ Q
(νxy)P → (νxy)Q   [R-Res]      P | R → Q | R   [R-Par]   P → Q   [R-Struct]

Fig. 2: Operational semantics

“x and y not free in Q” is redundant in the face of the Barendregt convention. The
fifth rule allows collecting channel bindings no longer in use, and the last rule
allows for rearranging the order of channel bindings in a process.
Reduction includes six axioms, two for the destruction of boolean values (via
a conditional process), and four for communication. The axioms for communi-
cation take processes of a similar nature. The scope restriction (νxy) identifies
the two ends of the channel engaged in communication. Under the scope of the
channel one finds three processes: the first contains an output process on chan-
nel end x, the second contains an input process on channel end y, and the third
(R) is an arbitrary process that may contain other references to x and y (the
witness process). Communication proceeds by identifying a pair of compatible
branches, namely l! v.P and l? z.Q. The result contains the continuation pro-
cess P and the continuation process Q with occurrences of the bound variable z
replaced by value v (together with the witness process). The four axioms differ
in the treatment of the process qualifiers: lin (ephemeral) and un (persistent).
Ephemeral processes are consumed in reduction, persistent processes remain in
the contractum.
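For illustration (our instance of rule [R-LinUn], with empty M, M′, N, N′ and, up to structural congruence, empty R), the process

(νxy)(lin x(l! true.0) | un y(l? z.if z then 0 else 0))

reduces to

(νxy)(if true then 0 else 0 | un y(l? z.if z then 0 else 0))

where the ephemeral output is consumed and the persistent input survives in the contractum.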
Choices apart, rules [R-LinLin] and [R-LinUn] are already present in the
works of Milner and Vasconcelos [29,43]. Rules [R-UnLin] and [R-UnUn] are
absent on the grounds of economy: replicated output can be simulated with a
new channel and a replicated input. In mixed choices these rules cannot be
T ::= Types:
      q♯{Ui}i∈I                  choice
      end                        termination
      unit | bool                unit and boolean
      μa.T                       recursive type
      a                          type variable
U ::= Branches:
      l⋆T.T                      branch
♯ ::= Views:
      ⊕ | &                      internal and external
Γ ::= Contexts:
      ·                          empty
      Γ, x : T                   entry

Fig. 3: The syntax of types

omitted, for there is no distinction between input and output: choice is the only
(symmetrical) communication primitive.
We have designed mixed choices in such a way that labels may be duplicated
in choices; more: label-polarity pairs may also be duplicated. This allows for
non-determinism in a linear context. For example, process

(νxy)(lin x(l! true.0 + l! false.0) | lin y(l? z.lin w(m! z.0)))

reduces in one step to either lin w(m! true.0) or lin w(m! false.0).
The examples in Section 2 take advantage of a def notation, a derived process
construct inspired by the SePi [12] and Pict [36] languages. A process of the
form def x(z) = P in Q is understood as

(νxy)(un y(ℓ? z.P) | Q)

and calls to the recursive procedure, of the form x!v, are interpreted as lin x(ℓ! v),
for ℓ an arbitrarily chosen label. The derived syntax hides channel end y and
simplifies the syntax of calls to the procedure. Procedures with more than one
parameter require tuple passing, a notion that is not primitive to mixed sessions.
Fortunately, tuple passing is easy to encode; see Vasconcelos [43].
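For illustration (our worked instance of the sugar above, with hypothetical names echo and w, and ℓ a fresh label), the process def echo(z) = lin z(msg! ()) in echo!w expands to

(νxy)(un y(ℓ? z.lin z(msg! ())) | lin x(ℓ! w))

where the call lin x(ℓ! w) synchronises with the replicated choice at y via rule [R-LinUn], yielding lin w(msg! ()) together with the persistent definition.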

3.3 Typing
Figure 3 summarises the syntax of types. We rely on an extra set, that of type
variables, a, b, . . . Types describe values, including boolean and unit values, and
Branch subtyping, U <: U

S2 <: S1    T1 <: T2                 S1 <: S2    T1 <: T2
l! S1.T1 <: l! S2.T2                 l? S1.T1 <: l? S2.T2

Subtyping, T <: T

end <: end        unit <: unit        bool <: bool

S[μa.S/a] <: T                       S <: T[μa.T/a]
μa.S <: T                            S <: μa.T

J ⊆ I    Uj <: Vj (j ∈ J)            I ⊆ J    Ui <: Vi (i ∈ I)
q⊕{Ui}i∈I <: q⊕{Vj}j∈J               q&{Ui}i∈I <: q&{Vj}j∈J

Fig. 4: Coinductive subtyping rules

channel ends. A type of the form q♯{Ui}i∈I denotes a channel end. Qualifier q
states the number of processes that may contain references to the channel end:
exactly one for lin, zero or more for un. View ♯ distinguishes internal (⊕) from
external (&) choice. This distinction is not present in processes but is of paramount
importance for typing purposes, as we shall see. The branches are either of
output—l! S.T—or of input—l? S.T—nature. In either case, S denotes the object
of communication and T describes the subsequent behaviour of the channel
end. Type end denotes the channel end on which no more interaction is possible.
Types μa.T and a cater for recursive types.
Types are subject to a few syntactic restrictions: i) choices must have at least
one branch; ii) label-polarity pairs—l⋆—are pairwise distinct in the branches of
a choice type (unlike in processes); iii) recursive types are assumed contractive
(that is, containing no subterm of the form μa1...μan.a1). New variables, new
bindings: type variable a is bound in T in type μa.T. Again the definitions
of bound and free names as well as that of substitution—S[T/a]—are defined
accordingly.
Mixed sessions come equipped with a notion of subtyping. Figure 4 introduces
the rules that allow determining whether a given type is a subtype of another.
The rules must be read coinductively. Base types (end, unit, bool) are subtypes
of themselves. The rules for recursive types are standard. Subtyping behaves
differently in the presence of external or internal choice. For internal choice we
require the branches in the subtype to contain those in the supertype: offering
more choices cannot cause runtime errors. For external choice we require the
opposite: exercising fewer options cannot cause difficulties on the receiving side.
For branches we distinguish output from input: output is contravariant on the
contents of the message, input is covariant. In either case, the continuation is
covariant. Choices, input/output, and recursive types receive no different treatment
than those in classical sessions [15]. We can easily show that the <: relation
is a preorder. Notation S ≡ T abbreviates S <: T and T <: S.
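For instance (our example of the rules above),

lin&{l! bool.end} <: lin&{l! bool.end, m? unit.end}

by the external choice rule: the branch set of the subtype is contained in that of the supertype. For internal choice the inclusion is reversed, as in lin⊕{l! bool.end, m? unit.end} <: lin⊕{l! bool.end}.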
Duality is a notion central to session types. In order for channel communi-
cation to proceed smoothly, the two channel ends must be compatible: if one
end says input, the other must say output; if one end says external choice, the
Polarity duality and view duality, ⋆ ⊥ ⋆ and ♯ ⊥ ♯

! ⊥ ?        ? ⊥ !        ⊕ ⊥ &        & ⊥ ⊕

Type duality, T ⊥ T

                  ♯ ⊥ ♯•    ⋆i ⊥ ⋆i•    Si ≡ Si′    Ti ⊥ Ti′
end ⊥ end         q♯{li⋆iSi.Ti}i∈I ⊥ q♯•{li⋆i•Si′.Ti′}i∈I

S[μa.S/a] ⊥ T                  S ⊥ T[μa.T/a]
μa.S ⊥ T                       S ⊥ μa.T

Fig. 5: Coinductive type duality rules

un and lin predicates, un(T), lin(T)

un(end)    un(unit)    un(bool)    un(un♯{Ui})

un(T)
un(μa.T)        lin(T)

Fig. 6: The un and lin predicates on types

other must say internal choice. In presence of recursive types, the problem of
building the dual of a given type has been elusive, as works by Bernardi and
Hennessy, Bono and Padovani, Lindley and Morris show [5,7,25]. Here we eschew
the problem by working with a duality relation, as in Gay and Hole [15].
The rules in Figure 5 define what we mean for two types to be dual. This
is the coinductive definition of Gay and Hole in rule format (and adapted to
choice). Duality is defined for session types only. Type end is the dual of itself.
The rule for choice types requires dual views (& is the dual of ⊕, and vice-versa)
and dual polarities (? is the dual of !, and vice-versa). Furthermore, the objects
of communication must be equivalent (Si ≡ Si′) and the continuations must be
dual again (Ti ⊥ Ti′). The rules in the second line handle recursion in the exact
same way as in type equivalence. As an example, we can easily show that

μa.lin⊕{l? bool.lin&{m! unit.a}} ⊥ lin&{l! bool.μb.lin⊕{m? unit.lin&{l! bool.b}}}

It can be shown that ⊥ is an involution, that is, if R ⊥ S and S ⊥ T, then
R ≡ T.
The meaning of the un and lin predicates is defined by the rules in Figure 6.
Basic types—unit, bool, end—are unrestricted; un-annotated choices are
unrestricted; μa.T is unrestricted if T is. Contractivity ensures that the predicate
is total. All types are lin, meaning that both lin and non-lin types may be
used in linear contexts.
Before presenting the type system, we need to introduce two notions that
manipulate typing contexts. The rules in Figure 7 define the meaning of context
split and context update. These two relations are taken verbatim from Vasconce-
los [43]; context split is originally from Walker [48] (cf. Kobayashi et al. [22,23]).
Context split is used when type checking processes with two sub-processes. In
Context split, Γ = Γ ◦ Γ

· = · ◦ ·

Γ = Γ1 ◦ Γ2    un(T)
Γ, x : T = (Γ1, x : T) ◦ (Γ2, x : T)

Γ = Γ1 ◦ Γ2                                 Γ = Γ1 ◦ Γ2
Γ, x : lin p = (Γ1, x : lin p) ◦ Γ2         Γ, x : lin p = Γ1 ◦ (Γ2, x : lin p)

Context update, Γ + x : T = Γ

x : U ∉ Γ                      un(T)    T ≡ U
Γ + x : T = Γ, x : T           (Γ, x : T) + x : U = (Γ, x : T)

Fig. 7: Inductive context split and context update rules

this case we split the context in two, by copying unrestricted entries to both
contexts and linear entries to one only. Context update is used to add to a given
context an entry representing the continuation (after a choice operation) of a
channel. If the variable in the entry is not in the context, then we add the entry
to the context. Otherwise we require the entry to be present in the context and
the type to be unrestricted.
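For instance (our illustration of the rules in Figure 7), with lin p a linear pretype, the context x : bool, y : lin p admits the split

x : bool, y : lin p = (x : bool, y : lin p) ◦ (x : bool)

copying the unrestricted entry to both halves and sending the linear entry to the left half only. For the update operation, (x : bool) + y : lin p = x : bool, y : lin p, whereas (x : bool) + x : bool is defined because bool is unrestricted.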
The rules in Figure 8 introduce the typing system for mixed sessions. Here the
un and lin predicates on types are pointwise extended to typing contexts. Notice
that all contexts are linear and only some contexts are unrestricted. We require
all instances of the axioms to be built from unrestricted contexts, thus ensuring
that linear resources (channel ends) are fully consumed in typing derivations.
The typing rules for values should be straightforward: constants have their
own types, the type for a variable is read from the context, and [T-Sub] is the
subsumption rule, allowing a type to be replaced by a supertype.
The rules for branches—[T-Out] and [T-In]—follow those for output and
input in classical session types. To type an output branch we split the context
in two: one part for the value, the other for the continuation process. To type an
input branch we add an entry with the bound variable x to the context under
which we type the continuation process. Rule [T-In] rejects branches of the form
l? v.P when v is not a variable. The continuation type T is used in neither rule;
instead it is incorporated in the type for the channel in Γ (cf. rule [T-Choice]
below).
The rules for inaction, parallel composition, and conditional are from Vas-
concelos [43]. That for scope restriction is adapted from Gay and Hole [15]. Rule
[T-Inact] follows the general pattern for axioms, requiring a un context. Rule
[T-Par] splits the context in two, providing each subprocess with one part. Rule
[T-If] splits the context and uses one part to type guard v. Because v is unre-
stricted, we know that Γ1 contains exactly the un entries in Γ1 ◦ Γ2 and that Γ2
is equal to Γ1 ◦ Γ2 . Context Γ2 is used to type both branches of the conditional,
for only one of them will ever execute. Rule [T-Res] introduces in the typing
context entries for the two channel ends, x and y, at dual types.
Typing rules for values, Γ ⊢ v : T

un(Γ)                  un(Γ)                         un(Γ1, Γ2)
Γ ⊢ () : unit          Γ ⊢ true, false : bool        Γ1, x : T, Γ2 ⊢ x : T
[T-Unit]               [T-True] [T-False]            [T-Var]

Γ ⊢ v : S    S <: T
Γ ⊢ v : T   [T-Sub]

Typing rules for branches, Γ ⊢ M : U

Γ1 ⊢ v : S    Γ2 ⊢ P                       Γ, x : S ⊢ P
Γ1 ◦ Γ2 ⊢ l! v.P : l! S.T   [T-Out]        Γ ⊢ l? x.P : l? S.T   [T-In]

Typing rules for processes, Γ ⊢ P

un(Γ)                  Γ1 ⊢ P    Γ2 ⊢ Q
Γ ⊢ 0   [T-Inact]      Γ1 ◦ Γ2 ⊢ P | Q   [T-Par]

Γ1 ⊢ v : bool    Γ2 ⊢ P    Γ2 ⊢ Q              Γ, x : S, y : T ⊢ P    S ⊥ T
Γ1 ◦ Γ2 ⊢ if v then P else Q   [T-If]          Γ ⊢ (νxy)P   [T-Res]

q1(Γ1 ◦ Γ2)    Γ1 ⊢ x : q2♯{li⋆iSi.Ti}i∈I
Γ2 + x : Tj ⊢ lj⋆jvj.Pj : lj⋆jSj.Tj    {lj⋆j}j∈J = {li⋆i}i∈I
Γ1 ◦ Γ2 ⊢ q1x Σj∈J lj⋆jvj.Pj   [T-Choice]

Fig. 8: Inductive typing rules

The rule for choice is new. The incoming context is split in two: one for the
subject x of the choice, the other for the various branches in the choice. The
qualifier of the process, q1, dictates the nature of the incoming context: un or lin.
This allows for a linear choice to contain channels of an arbitrary nature, but
limits unrestricted choices to unrestricted channels only (for one cannot predict
how many times such choices will be exercised). The second premise extracts a
type q2♯{li⋆iSi.Ti} for x. The third premise types each branch: type Sj is used to
type values vj in the branches and each type Tj is used to type the corresponding
continuation. The rule updates context Γ2 with the continuation type of x: if
q2 is lin, then x is not in Γ2 and the update operation simply adds the entry
to the context. If, on the other hand, q2 is un, then x is in Γ2 and the context
update operation (together with rule [T-Sub]) insists that type Tj is a subtype
of un♯{lj⋆jSj.Tj}, meaning that Tj is a recursive type.
The last premise to rule [T-Choice] insists that the set of labels in the
choice type coincides with that in the choice process. That does not mean that
the label-polarity pairs are in a one-to-one correspondence: label-polarity pairs
are pairwise distinct in types (see the syntactic restrictions in Section 3.3),
but not in processes. For example, process lin x(l? y.0 + l? z.0) can be typed
against context x : lin⊕{l? bool.end}. From the fact that the two sets must
coincide it does not follow that the label-polarity pairs in the context type must
coincide with those in the process. Taking advantage of subtyping, the above
process can still be typed against context x : lin⊕{l? bool.end, m! unit.end} because
lin⊕{l? bool.end, m! unit.end} <: lin⊕{l? bool.end}. The opposite phenomenon happens
with external choice, where one may remove branches by virtue of subtyping.
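For illustration (our instance of rule [T-Choice] for the first typing above), take Γ1 = x : lin⊕{l? bool.end} and Γ2 = ·:

lin(Γ1 ◦ Γ2)    Γ1 ⊢ x : lin⊕{l? bool.end}
· + x : end ⊢ l? y.0 : l? bool.end    · + x : end ⊢ l? z.0 : l? bool.end
Γ1 ◦ Γ2 ⊢ lin x(l? y.0 + l? z.0)

The first premise holds because all types are lin; the branch premises follow from [T-In], since x : end, y : bool is an unrestricted context typing 0.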
We complete this section by discussing examples that illustrate options taken
in the typing system (we postpone the formal justification to Section 4). Suppose
we allow empty choices in the syntax of types. Then the process

(νxy)(x() | y())

would be typable by taking x : ⊕{} and y : &{}, yet the process would not reduce.
We could add an extra reduction rule to that effect,

(νxy)(x() | y() | R) → (νxy)R

which would satisfy preservation (Theorem 2). We decided not to include it in
our reduction rules as we did not want the extra complexity. Including the rule
also does not bring any apparent benefit.
The syntax of processes places no restrictions on the label-polarity pairs
in choices; yet that of types does. What if we relax the restriction that label-polarity
pairs in choice types must be pairwise distinct? Then process

(νxy)(x(l! true + l! ()) | y(l? z.if z then 0 else 0))

could be typed under context x : &{l! bool, l! unit}, y : ⊕{l? bool, l? unit}, yet the
process might reduce to if () then 0 else 0, which is a runtime error.

4 Well-typed Mixed Sessions Do Not Lead to Runtime Errors

This section introduces the main results of mixed choices: absence of runtime
errors and preservation, both for well-typed processes.
We say that a process is a runtime error if it is structurally congruent to:
– a process of the form

    (νx1y1) . . . (νxnyn)(νxy)(qx Σi∈I li⋆ivi.Pi | q′y Σj∈J lj⋆′jwj.Qj | R)

  where {li⋆i•}i∈I ∩ {lj⋆′j}j∈J = ∅, with each ⋆i• obtained by dualising ⋆i, or
– a process of the form qz(M + l? v.P + N) and v is not a variable, or
– a process of the form if v then P else Q and v is neither true nor false.
Examples of processes which are runtime errors include:

(νxy)(lin x(l! true.0) | lin y(l! true.0))
(νxy)(un x(l! true.0) | lin y(m? z.0))
un x(l? false.0)
if () then 0 else 0
Notice that processes of the form (νxy)lin x Σi∈I Mi cannot be classified as
runtime errors for they may be typed. Just think of (νxy)lin x(l? z.lin y(l! true.0)),
typable under the empty context. Unlike the interpretations of session types
in linear logic by Caires, Pfenning and Wadler [8,14,46,47], typable mixed session
processes can easily deadlock. Similarly, processes with more than one lin-choice
on the same channel end can be typed. For example, process lin x(l! true.0) |
lin x(l? z.0) can be typed under context x : μa.un⊕{l! bool.a, l? bool.a}. Recall the
relationship between qualifiers in processes, q1, and those in types, q2, in the
discussion of the rules for choice in Section 3.
Theorem 1 (Well-typed processes are not runtime errors). If · ⊢ P,
then P is not a runtime error.
Proof. In view of a contradiction, assume that · ⊢ P and that P is

(νx1y1) . . . (νxnyn)(q1xn Σi∈I li⋆ivi.Pi | q2yn Σj∈J lj⋆′jwj.Qj | R)

and {li⋆i•}i∈I ∩ {lj⋆′j}j∈J = ∅ with ⋆i ⊥ ⋆i•. From the typing derivation for P,
using [T-Par] and [T-Res], we obtain a context Γ = Γ1 ◦ Γ2 ◦ Γ3 =
x1 : T1, y1 : S1, . . . , xn : Tn, yn : Sn with Ti ⊥ Si for all i = 1, . . . , n, and that
Γ1 ⊢ q1xn Σi∈I li⋆ivi.Pi and Γ2 ⊢ q2yn Σj∈J lj⋆′jwj.Qj and Γ3 ⊢ R. Without loss of
generality, due to the fact that xn and yn have dual types and from the
premises of rule [T-Choice], assume that Γ1 ⊢ xn : q1&{lk⋆kTk′.Tk}k∈K and
Γ2 ⊢ yn : q2⊕{lk⋆k•Sk′.Sk}k∈K, with {li⋆i}i∈I = {lk⋆k}k∈K and {lj⋆′j}j∈J ⊆ {lk⋆k•}k∈K,
where ⋆k ⊥ ⋆k•. This also implies that {li⋆i•}i∈I = {lk⋆k•}k∈K. Thus, a label-polarity pair lj⋆′j from
q2yn Σj∈J lj⋆′jwj.Qj belongs to the set {li⋆i•}i∈I: lj⋆′j ∈ {lk⋆k•}k∈K = {li⋆i•}i∈I,
contradicting {li⋆i•}i∈I ∩ {lj⋆′j}j∈J = ∅.
When P is qz(M + l? v.P + N) and v is not a variable, the contradiction is
with rule [T-In], which can only be applied when the value v is a variable.
When P is if v then P else Q and v is not a boolean value, the contradiction
immediately arises with rule [T-If].

In order to prepare for the preservation result we introduce a few lemmas.
Lemma 1 (Unrestricted weakening). If Γ ⊢ P and un(T), then Γ, x : T ⊢ P.
Proof. The proof goes by mutual induction on the rules for branches and processes,
but we first need to show the result for the value typing rules. We need
to show that if Γ ⊢ v : S and un(R) then Γ, x : R ⊢ v : S. This follows by a simple
case inspection of the rules [T-Unit], [T-True], [T-False], [T-Var], taking into
consideration that un(R). For the rule [T-Sub], use the induction hypothesis to
obtain Γ, x : R ⊢ v : S and conclude, using [T-Sub], that Γ, x : R ⊢ v : T.
For the branch and process typing rules we detail the proof when the last
rule is [T-Out]. Using the result for typing values, we obtain Γ1, x : R ⊢ v : S,
and the induction hypothesis for processes leads to Γ2, x : R ⊢ P. Using the
un context split property, taking into account that un(R), we conclude that
Γ1 ◦ Γ2, x : R ⊢ l! v.P : l! S.T.
For the process rule [T-Inact], the result is a simple consequence of un(T).
For the other rules, the result follows by the induction hypothesis on process and
branch rules, as well as by using the value typing result. We detail the proof for
rule [T-If]. Using the typing values result, we know that Γ1, x : T ⊢ v : bool. By
the induction hypothesis we also obtain Γ2, x : T ⊢ P and Γ2, x : T ⊢ Q. Using
the un context split property, we conclude Γ1 ◦ Γ2, x : T ⊢ if v then P else Q.

Lemma 2 (Preservation for ≡). If Γ ⊢ P and P ≡ Q, then Γ ⊢ Q.

Proof. As in Vasconcelos [43, Lemma 7.4], since we share the structural congruence
axioms.

Lemma 3 (Substitution). If Γ1 ⊢ v : T and Γ2, x : T ⊢ P and Γ = Γ1 ◦ Γ2,
then Γ ⊢ P[v/x].

Proof. The proof follows by mutual induction on the rules for processes and
branches.

Theorem 2 (Preservation). If Γ ⊢ P and P → Q, then Γ ⊢ Q.

Proof. The proof is by rule induction on the reduction, making use of the weakening
and substitution lemmas, and of preservation for structural congruence. We sketch
the cases for [R-LinLin] and [R-LinUn].
When reduction ends with rule [R-LinLin], we know that rule [T-Res] introduces
x : X, y : Y with X ⊥ Y in the context Γ. From there, with applications
of [T-Par] and [T-Choice], Γ = Γ1 ◦ Γ2 ◦ Γ3 and Γ1 ⊢ lin x(M + l! v.P + M′),
Γ2 ⊢ lin y(N + l? z.Q + N′), Γ3 ⊢ R. Furthermore, Γ1 = Γ1′ ◦ Γ1″ with lin(Γ1′),
Γ1′ ⊢ x : lin⊕{M, l! S.T, M′} and Γ1″ + x : T ⊢ l! v.P : l! S.T. From the [T-Out]
rule, Γv ⊢ v : S and Γ4 ⊢ P. For the y side, Γ2′ ⊢ y : lin&{N, l? U.V, N′} and
Γ2″ + y : V ⊢ l? z.Q : l? U.V. From the [T-In] rule, Γz, y : V, z : U ⊢ Q. We also have
that S ≡ U from the duality of x and y. Using the substitution Lemma 3,
Γz, y : V, Γv ⊢ Q[v/z]. Using [T-Par] with the remaining contexts and [T-Res]
types the conclusion of [R-LinLin].
When reduction ends with rule [R-LinUn], we know that rule [T-Res] introduces
x : X, y : Y with X ⊥ Y in the context Γ. From there, with applications
of [T-Par] and [T-Choice], Γ = Γ1 ◦ Γ2 ◦ Γ3 and Γ1 ⊢ lin x(M + l! v.P + M′),
Γ2 ⊢ un y(N + l? z.Q + N′), Γ3 ⊢ R. Furthermore, Γ1 = Γ1′ ◦ Γ1″ with lin(Γ1′) and
Γ1′ ⊢ x : un⊕{M, l! S.T, M′}; here x is un since x and y are dual. We also have
Γ1″ + x : T ⊢ l! v.P : l! S.T, from which follow Γ4 ⊢ v : S and Γ5 ⊢ P by rule
[T-Out]. For the y side, Γ2′ ⊢ y : un&{N, l? U.V, N′} and Γ2″ + y : Y ⊢ l? z.Q : l? U.V,
which gives Γ6, y : V, z : U ⊢ Q by [T-In].
Types S and U are equivalent due to the duality of x and y, and so Γ6, y : V, z : S ⊢
Q. Using the substitution Lemma 3, Γ6 ◦ Γ4, y : V ⊢ Q[v/z]. From Γ5 we also type
the process P. Using [T-Par] with the remaining contexts and [T-Res] types
the conclusion of [R-LinUn].

5 Classical Sessions Were Mixed All Along

This section introduces the syntax and semantics of classical session types and
shows that the language of classical sessions can be embedded in that of mixed
sessions.
The syntax and semantics of classical session types are in Figure 9; we follow
Vasconcelos [43]. The syntax and the rules for the various judgements extend
those of Figures 1 to 8, where we remove choice both from grammar productions
(for processes and types) and from the various judgements (operational seman-
tics, subtyping, duality, and typing). On what concerns the syntax of processes,
the choice construct of Figure 1 is replaced by new process constructors: output,
linear (lin) and replicated (un) input, selection (internal choice) and branching
(external choice). The four reduction axioms in Figure 2 that pertain to choice
([R-LinLin], [R-LinUn], [R-UnLin], [R-UnUn]) are replaced by the three ax-
ioms in Figure 9. Rule [R-LinCom] describes the output against ephemeral-input
interaction, rule [R-UnCom] the output against replicated-input interaction,
and rule [R-Case] selects a label in the menu at the other channel end.
The syntax of types features new constructs—linear or unrestricted input and
output, and linear or unrestricted external and internal choice—replacing the
choice construct in Figure 3. The subtyping rules for the new type constructors
are taken from Gay and Hole [15]. Type duality is such that the objects of communication
must be equivalent and the continuations (both in communication
and choice) must be dual again. We omit the dual rules for q!S.S′ ⊥ q?T.T′ and
q&{li : Si}i∈I ⊥ q⊕{li : Ti}i∈I. The new duality rules are adapted from the
coinductive definition of Gay and Hole [15]. The un predicate on types insists on the
idea that un-annotated types are unrestricted: un(un⋆S.T) and un(un♯{li : Ti}).
The typing rule for choice in Figure 8 is replaced by the four rules in Figure 9;
these are taken verbatim from Vasconcelos [43].
The embedding of classical session types in mixed sessions is defined in Figure 10.
It consists of two maps, one for processes, the other for types. These
maps act as homomorphisms on all process and type constructors not explicitly
shown. For example, ⟦P | Q⟧ = ⟦P⟧ | ⟦Q⟧. We distinguish one label, msg, and
use it to encode input and output (both processes and types). Input and output
processes are encoded in choices with a single msg-labelled branch. The output
process is qualified as lin (it does not survive reduction) and the input process
reads its qualifier q from the incoming process. Choice processes in classical sessions
are encoded in choices in mixed sessions. The value transmitted on the
mixed session is irrelevant: we pick () of type unit for the output side, and a
fresh variable yi on the input side. Both types are linear.
Input and output types are translated into choice types. For output we arbitrarily
pick an internal choice (⊕), and conversely for the input. The label in the
only branch is msg in order to match our pick for processes, and the qualifier is
read from the incoming type. For classical choices, we read the qualifier and the
view from the incoming type. The type of the communication in the branches of
the mixed choice is unit, again so that it matches our pick for processes.
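For illustration (our worked instance of the maps in Figure 10), the classical type of the consumer channel from the introduction translates as

⟦⊕{enough: end, more: ?int.T}⟧ = ⊕{enough! unit.end, more! unit.&{msg? int.⟦T⟧}}

so the selection of more becomes a unit-carrying output branch, and the subsequent integer reception becomes a separate msg-labelled choice.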
Typing correspondence says that the embedding preserves typability.
Classical syntactic forms

P ::= . . .                              Processes:
      x!v.P                              output
      qx?x.P                             input
      x ◁ l.P                            selection
      x ▷ {li : Pi}i∈I                   branching
T ::= . . .                              Types:
      q⋆T.T                              communication
      q♯{li : Ti}i∈I                     choice

Classical reduction rules, P → P (plus [R-Res], [R-Par], [R-Struct] from Figure 2)

(νxy)(x!v.P | lin y?z.Q | R) → (νxy)(P | Q[v/z] | R)   [R-LinCom]

(νxy)(x!v.P | un y?z.Q | R) → (νxy)(P | Q[v/z] | un y?z.Q | R)   [R-UnCom]

j ∈ I
(νxy)(x ◁ lj.P | y ▷ {li : Qi}i∈I | R) → (νxy)(P | Qj | R)   [R-Case]

Classical subtyping rules, T <: T

T <: S    S′ <: T′                        S <: T    S′ <: T′
q!S.S′ <: q!T.T′                          q?S.S′ <: q?T.T′

J ⊆ I    Sj <: Tj                         I ⊆ J    Si <: Ti
q⊕{li : Si}i∈I <: q⊕{lj : Tj}j∈J          q&{li : Si}i∈I <: q&{lj : Tj}j∈J

Classical type duality rules, T ⊥ T

S ≡ T    S′ ⊥ T′                          Si ⊥ Ti
q?S.S′ ⊥ q!T.T′                           q⊕{li : Si}i∈I ⊥ q&{li : Ti}i∈I

Classical typing rules, Γ ⊢ P

Γ1 ⊢ x : q!T.U    Γ2 ⊢ v : T    Γ3 + x : U ⊢ P
Γ1 ◦ Γ2 ◦ Γ3 ⊢ x!v.P   [T-TOut]

q1(Γ1 ◦ Γ2)    Γ1 ⊢ x : q2?T.U    (Γ2 + x : U), y : T ⊢ P
Γ1 ◦ Γ2 ⊢ q1x?y.P   [T-TIn]

Γ1 ⊢ x : q&{li : Ti}i∈I    Γ2 + x : Ti ⊢ Pi (∀i ∈ I)
Γ1 ◦ Γ2 ⊢ x ▷ {li : Pi}i∈I   [T-Branch]

Γ1 ⊢ x : q⊕{li : Ti}i∈I    Γ2 + x : Tj ⊢ P    j ∈ I
Γ1 ◦ Γ2 ⊢ x ◁ lj.P   [T-Sel]

Fig. 9: Classical session types

Theorem 3 (Typing correspondence).

1. If Γ ⊢ v : T, then ⟦Γ⟧ ⊢ v : ⟦T⟧.
2. If Γ ⊢ P, then ⟦Γ⟧ ⊢ ⟦P⟧.
Process translation

⟦x!v.P⟧ = lin x{msg! v.⟦P⟧}
⟦qx?y.P⟧ = q x{msg? y.⟦P⟧}
⟦x ◁ l.P⟧ = lin x{l! ().⟦P⟧}
⟦x ▷ {li : Pi}i∈I⟧ = lin x{li? yi.⟦Pi⟧}i∈I    (yi ∉ fv(Pi))

(Homomorphic for 0, P | Q, (νxy)P, and if v then P else Q)

Type translation

⟦q!S.T⟧ = q⊕{msg! ⟦S⟧.⟦T⟧}
⟦q?S.T⟧ = q&{msg? ⟦S⟧.⟦T⟧}
⟦q⊕{li : Ti}i∈I⟧ = q⊕{li! unit.⟦Ti⟧}i∈I
⟦q&{li : Ti}i∈I⟧ = q&{li? unit.⟦Ti⟧}i∈I

(Homomorphic for end, unit, bool, μa.T, and a)

Fig. 10: Embedding classical session types

Proof. 1. A straightforward rule induction on the hypothesis.
2. By rule induction on the hypothesis. We sketch a few cases.
When the derivation ends with [T-TIn], we use item 1, induction, the fact
that q1(Γ1 ◦ Γ2) implies q1(⟦Γ1⟧ ◦ ⟦Γ2⟧), and the fact that ⟦(Γ2 + x : U), y : T⟧ =
(⟦Γ2⟧, y : ⟦T⟧) + x : ⟦U⟧ because x and y are distinct variables.
When the derivation ends with [T-Branch], we obtain (⟦Γ2⟧ + x : ⟦Ti⟧), yi : unit ⊢
⟦Pi⟧ from the induction hypothesis ⟦Γ2⟧ + x : ⟦Ti⟧ ⊢ ⟦Pi⟧ using weakening (Lemma 1).

We complete this section by proving that the classical-to-mixed translation
meets Gorla's good encoding criteria [17]. The five criteria proposed by Gorla
ensure that the encoding is meaningful. There are two syntactical and three
semantics-related criteria.
Let C range over classical processes and M range over mixed choice processes.
The map ⟦·⟧ : C → M described in Figure 10 is a translation from classical
processes to mixed choice processes. To be in line with the criteria, we add the
process ✓, representing a successfully terminating process, to the syntax of both
the source and the target languages. We denote by ⇒ the reflexive and transitive
closure of the reduction relations, →, in both the source and target languages.
Sometimes we use subscript M to denote the reduction of mixed choice processes
and subscript C for the reduction of classical processes, even though it should
be clear from context.
We say that a process P does not reduce, written P ↛, when it cannot make any
reduction step. We say that a process diverges, P →ω, when P can do an infinite
number of reductions. On the other hand, a process is successful, P ⇓, if P
reduces to a process in parallel with a success ✓, that is, P ⇒ P′ | ✓. Gorla's
criteria view calculi as triples (P, →, ≍), where P is a set of processes, → a
reduction relation (the operational semantics), and ≍ is a behavioral equivalence
on processes.
The behavioral equivalence ≍ we use for mixed sessions coincides with structural
congruence ≡.
The first criterion states that the translation is compositional. For this purpose,
we define a context C([·]1; . . . ; [·]k) as a classical process with k holes.

Theorem 4 (Compositionality). The translation ⟦·⟧ : C → M is compositional,
i.e., for every k-ary operator op of C and for every subset N of channel
ends, there exists a k-ary context C^N_op([·]1; . . . ; [·]k) such that, for all P1, . . . , Pk
with ∪_{i=1}^{k} fv(Pi) = N, ⟦op(P1, . . . , Pk)⟧ = C^N_op(⟦P1⟧; . . . ; ⟦Pk⟧).

Proof. The translation of a process is defined in terms of the translation of its
subterms; see Figure 10.

Following the ideas of Peters et al. [34], the translation from classical to mixed
sessions can be enriched with a renaming policy ϕ⟦·⟧, a map from channel ends
to sequences of channel ends. The following theorem states that the proposed
translation is name invariant.

Theorem 5 (Name invariance). The translation ⟦·⟧ : C → M is name
invariant, i.e., for every classical process P and substitution σ,

⟦Pσ⟧ = ⟦P⟧σ′    if σ is injective
⟦Pσ⟧ ≍ ⟦P⟧σ′    otherwise

where σ′ is such that ϕ⟦·⟧(σ(x)) = σ′(ϕ⟦·⟧(x)), for every channel end x.

Proof. The translation transforms each channel end (x, in Figure 10) into itself.
Thus, any substitution is preserved. See Figure 10.

Operational correspondence states that the embedding preserves and reflects
reduction. In our case the embedding is quite tight: one reduction step in classical
sessions corresponds to one reduction step in mixed sessions. There is no runtime
penalty in running classical sessions on a mixed sessions machine. Further notice
that we do not rely on any equivalence relation on mixed sessions to establish
the result: mixed-sessions images leave no “junk” in the process of simulating
classical sessions.
Theorem 6 (Operational correspondence). Let P, P′ be classical session
processes and Q a mixed sessions process.

1. If P → P′, then ⟦P⟧ → ⟦P′⟧.
2. If ⟦P⟧ → Q, then P → P′ and ⟦P′⟧ = Q, for some P′.

Proof. Straightforward rule induction on the hypotheses, relying on the fact that
⟦P[v/x]⟧ = ⟦P⟧[v/x] and on yi ∉ fv(Pi) in the translation of x ▷ {li : Pi}i∈I.

The following theorems concern the finite and infinite behavior of classical
session processes and their corresponding translations.

Theorem 7 (Divergence Reflection). The translation ⟦·⟧ : C → M reflects
divergence, i.e., if ⟦P⟧ →ωM then P →ωC, for every process P ∈ C.

Proof. Corollary of Theorem 6.

Theorem 8 (Success Sensitivity). The translation ⟦·⟧ : C → M is success
sensitive, i.e., P ⇓C iff ⟦P⟧ ⇓M, for every process P ∈ C.

Proof. Corollary of Theorem 6.

6 What is in the Way of a Compiler?

This section discusses algorithmic type checking and the implementation of
choice in message passing architectures.
We start with type checking and then move to the runtime system. Gay and
Hole present an algorithmic subtyping system for classical sessions [15]. Algorithmic
subtyping for mixed sessions can be obtained by adapting the rules in
Figure 4 along the lines of Gay and Hole. [T-Sub] is the only non-syntax-directed
rule in Figure 8. We delete this rule and distribute subtype checking among all
rules that use, in their premises, sequents Γ ⊢ v : T, as usual. Most of the rules
include a non-deterministic context split operation. Take rule [T-Par], for ex-
ample. Rather than guessing the right split, we take the incoming context and
give it all to process P , later reclaiming the unused part. This outgoing context
is then passed to process Q. The outgoing context of the parallel composition
P | Q is that of Q. See, e.g., Vasconcelos or Walker for details [43,48]. Rule
[T-Res] requires guessing the type of the two channel ends, so that one is dual
to the other. Rather than guessing the type of channel end x, we require the help
of the programmer by working with an explicitly typed syntax—(νxy : T )P —as
in Franco and Vasconcelos [12,43], where T refers to the type of channel end x.
For the type of channel end y, rather than guessing, we build it from type T ;
cf. [4,5,7,25].
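For illustration (our rendering of the context-passing discipline just described, in the style of Vasconcelos and Walker [43,48]), the algorithmic judgement can take the shape Γ ⊢ P ⇒ Γ′, where Γ′ is the unused portion of Γ, and the rule for parallel composition reads

Γ ⊢ P ⇒ Γ′    Γ′ ⊢ Q ⇒ Γ″
Γ ⊢ P | Q ⇒ Γ″

so the split of Figure 7 is never guessed: P consumes what it needs from Γ and Q runs on the leftover Γ′.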
Running mixed sessions on a message passing architecture need not be an
expensive operation. Take one of the communication axioms in Figure 2. We
set up a broker process that receives the label-polarity pairs of both processes
({li⋆i}i∈I and {lj⋆j}j∈J), decides on a matching pair (guaranteed to exist for typed
processes), and communicates the result back to the two processes. The processes
then exchange the appropriate value, and proceed. If the broker is an independent
process, then we exchange five messages per choice synchronisation. This basic
broker is instantiated for two processes P ≜ lin x(l1? z.P1 + l2! v2.P2 + l3! v3.P3) and
Q ≜ lin y(l1! v1.Q1 + l3? w.Q3) in Figure 11a.
We can do better by piggybacking the values in the output choices together
with the label-polarity pairs. The broker passes its decision to the input side
in the form of a label-polarity-value triple, yielding one less message exchanged,
as showcased in Figure 11b.
[Figure 11 shows two message-sequence diagrams. (a) Basic broker: P and Q send their label-polarity menus to the broker (messages 1 and 2: l1?+l2!+l3! from P and l1!+l3? from Q); the broker answers each process with the matching pair l1 (messages 3 and 4); Q then sends value v1 to P (message 5). (b) Values are piggybacked: output branches carry their values in the menus (l1!v1 from Q, l2!v2 and l3!v3 from P), so the broker's answer to the input side already carries v1, saving one message.]

Fig. 11: Broker is an independent process

[Figure 12 shows two message-sequence diagrams. (a) P is the broker: Q sends its menu with piggybacked values (message 1: l1!v1+l3?); P answers with the selected pair (message 2: l1?). (b) Q is the broker: P sends its menu with piggybacked values (message 1: l1?+l2!v2+l3!v3); Q answers with the selected pair together with its value (message 2: l1!v1).]

Fig. 12: Broker is P or Q

Finally, we observe that the broker need not be an independent process; it can
be located at one of the choice processes. This reduces the number of messages
down to two in the general case, as described in Figures 12a and 12b,
where either P or Q is the broker. Even if the value was already
sent by Q in the case that P is the broker, P must still let Q know which choice
was taken, so that Q may proceed with the appropriate branch.
However, in particular cases one message may be enough. Take, for instance,
a process P ≜ un x(l1! v1.P′ + l2! v2.P′). Independently of which branch is taken,
the process proceeds as P′. Thus, if the broker is located in a process Q, then
P need not be informed of the selected choice. The same is true for classical
sessions, where selection is a mixed output choice with a single branch.
Beyond the number of messages exchanged, there are two other aspects to
discuss when implementing mixed sessions on a message passing architecture.
The first is related to the type of broker used and to which values in
a choice are revealed to the other party. In the case of the basic broker, only the chosen option
value is revealed, and never to the broker itself. However, when we piggyback
the values in the second type of broker, all values in the choice branches are
revealed to the broker, even if they are not used in the end. This is even more
striking in the case where one of the processes is the broker: the other party
has access to all the possible values, independently of the choice that is taken.
The second aspect also concerns the values themselves: in order to be
presented in the choice, values must be computed a priori, even if they end up
not being used.
Regarding the privacy of the values, we can choose which type of broker
to use depending on how much we want to reveal to the other party. To
prevent computing values before a branch is chosen, however, one should instead
use classical sessions.

7 Related Work

The origin of choice. Free (completely unrestricted) choice is central to process
algebras, including BPA and CCS [3,26]. Here we usually find processes of the
form P + Q, where P and Q are arbitrary processes. Free choice is also present in
the very first proposal of the π-calculus [30,31], even if Milner later uses guarded
choice [28]. Sangiorgi and Walker's book builds on the π-calculus with guarded
(mixed) choice [38]. Guarded choices in all preceding proposals operate on possibly
distinct channels—x!true.P + y?z.Q—whereas choices on mixed sessions run
on a common channel—x(l!true.P + m?y.Q). Kouzapas and Yoshida introduce
the notion of mixed session in the context of multiparty session types [24]. Mul-
tiparty session types are projected into binary session types, hence the authors
also consider mixed choices for binary sessions. This language is not as concise
as the one we present, probably because it is designed so as to match projection
from multiparty types.
Labelled choices were embedded in the theory of session types by Honda
et al. [18,19,41], where one finds primitives for value passing—x!true.P and
x?y.Q—and, separately, for choice in the form of labelled selection—x  l.P —
and branching—x  {li : Pi }i∈I —see Section 5. Coalescing label selection with
output and branching with input was proposed by Vasconcelos [44] (and later
used by Sangiorgi [37]) as a means to describe concurrent objects. Demangeon
and Honda use a similar language to study embeddings of calculi for functions
and for session-based communication [9]. All these languages offer only separated
(unmixed) choices and only on the input side.

Mixed choices in the Singularity operating system. Concrete syntax apart, the
language of linear mixed choices is quite similar to that of channel contracts in
Sing# [10]. Rather than explicit recursive types, Sing# contracts use named
states (akin to typestates [40]), providing for more legible contracts. In Sing#,
each state in a contract corresponds to a mixed session lin&{li⋆iSi.Ti} (contracts
are always written from the consumer side) where each li denotes a message tag,
⋆i the message direction (! or ?), Si the type of the value in the message, and Ti
the next state.
Stengel and Bultan showed that processes that follow Sing# contracts can
engage in communication errors [39]. They further provide a realizability condi-
tion for contracts that essentially rules out mixed choices. Bono and Padovani
present a calculus and a type system that models Sing# [6,7]. The type system
ensures that well-typed processes are exempt from communication errors, but the
language of types excludes mixed-choices. So it seems that Sing#-like languages
only function properly under separated choice, yet our work survives under mixed
choices. Contradiction? No! Sing# features asynchronous (or buffered) seman-
tics whereas mixed sessions run under synchronous semantics. The operational
semantics makes all the difference in this case.

Synchronicity, asynchronicity, and choice Pierce and Turner identified the prob-
lem: “In an asynchronous language guarded choice should be restricted still fur-
ther since an asynchronous output in a choice is sensitive to buffering” [36] and
Peters et al. state that “a discussion on synchrony versus asynchrony cannot
be separated from a discussion on choice” [34,35]. Based on classical sessions,
mixed sessions are naturally synchronous. The naive introduction of an asyn-
chronous semantics would ruin the main results of the language (see Section 4).
Asynchronous semantics are known to be compatible with classical sessions;
see Honda et al. [20,21] for multiparty asynchronous session types and Fowler
et al. [11] and Gay and Vasconcelos [16] for two examples of functional lan-
guages with session types and asynchronous semantics. So one can ask whether
a language can be designed where mixed choices are handled synchronously and
separated choices asynchronously: a type-guided operational semantics that is
asynchronous by default, reverting to a synchronous semantics in the presence
of mixed choices.

Separation results Palamidessi shows that the π-calculus with mixed choice is
more expressive than its subset with separated choice [32]. Gorla provides a
simpler proof [17] of the same result and Peters and Nestmann analyse the
problem from the perspective of breaking initial symmetries in separated-choice
processes [33]. Unlike choices in the π-calculus, our mixed choices operate
on a common channel and are guided by types. It would be interesting to look
into separation results for classical sessions and mixed sessions. Are mixed
sessions more expressive than classical sessions under some widely accepted
criteria (those of Gorla [17], for example)?

The origin of mixed sessions Mixed sessions dawned on us when looking into
an algorithm to decide the equivalence of context-free session types [1,42]. The
algorithm translates types into (simple) context-free grammars. The decision
procedure runs on arbitrary simple grammars: the right-hand sides of grammar
productions may start with a label-output or a label-input pair for the same
non-terminal symbol at the left of the production. We then decided to explore
mixed sessions and picked the simplest possible language for the purpose: the π-
calculus. It would be interesting to look into mixed context-free session types,
given that decidability of type equivalence is guaranteed.

8 Conclusion

We introduce mixed sessions: session types with mixed choice. Classical session
types feature separated choice; in fact, all the proposals in the literature we are
aware of provide for choice on the input side only, even if we can easily think
of choice on the output side. Mixed sessions increase flexibility in programming
and are easily realisable in conventional message passing architectures.
Mixed choices come with a type system featuring subtyping. Typability is
preserved by reduction. Furthermore, well-typed programs are exempt from run-
time errors. We provide suggestions on how to derive a type checking procedure,
even if we do not formalise it. Classical session types are a particular case of
mixed sessions: we provide an encoding and show typing and operational
correspondences.
We leave open the problem of looking into a typed separation result (or a
proof of inseparability) between classical sessions and mixed sessions. An inter-
esting avenue for further development is a hybrid type-guided semantics,
asynchronous by default, that reverts to synchronous in the presence of an
output choice.

Acknowledgements We thank Simon Gay, Uwe Nestmann, Kirstin Peters, and
Peter Thiemann for comments and discussions. This work was supported by
FCT through the LASIGE Research Unit, ref. UIDB/00408/2020, and by Cost
Action CA15123 EUTypes.

References

1. Almeida, B., Mordido, A., Vasconcelos, V.T.: Checking the equivalence of context-
free session types. In: Tools and Algorithms for the Construction and Analysis of
Systems - 26th International Conference, TACAS 2020. Lecture Notes in Computer
Science, Springer (2020)
2. Barendregt, H.P.: The lambda calculus - its syntax and semantics, Studies in logic
and the foundations of mathematics, vol. 103. North-Holland (1985)
3. Bergstra, J.A., Klop, J.W.: Process theory based on bisimulation semantics. In:
Linear Time, Branching Time and Partial Order in Logics and Models for Concur-
rency. Lecture Notes in Computer Science, vol. 354, pp. 50–122. Springer (1988).
https://doi.org/10.1007/BFb0013021
4. Bernardi, G., Dardha, O., Gay, S.J., Kouzapas, D.: On duality relations for session
types. In: Trustworthy Global Computing. Lecture Notes in Computer Science,
vol. 8902, pp. 51–66. Springer (2014). https://doi.org/10.1007/978-3-662-45917-1_4
5. Bernardi, G., Hennessy, M.: Using higher-order contracts to model
session types. Logical Methods in Computer Science 12(2) (2016).
https://doi.org/10.2168/LMCS-12(2:10)2016
6. Bono, V., Messa, C., Padovani, L.: Typing copyless message passing. In: Program-
ming Languages and Systems. Lecture Notes in Computer Science, vol. 6602, pp.
57–76. Springer (2011). https://doi.org/10.1007/978-3-642-19718-5_4

7. Bono, V., Padovani, L.: Typing copyless message passing. Logical Methods in Com-
puter Science 8(1) (2012). https://doi.org/10.2168/LMCS-8(1:17)2012
8. Caires, L., Pfenning, F., Toninho, B.: Linear logic propositions as session
types. Mathematical Structures in Computer Science 26(3), 367–423 (2016).
https://doi.org/10.1017/S0960129514000218
9. Demangeon, R., Honda, K.: Full abstraction in a subtyped pi-calculus with lin-
ear types. In: CONCUR 2011 - Concurrency Theory. Lecture Notes in Computer
Science, vol. 6901, pp. 280–296. Springer (2011). https://doi.org/10.1007/978-3-642-23217-6_19
10. Fähndrich, M., Aiken, M., Hawblitzel, C., Hodson, O., Hunt, G.C., Larus, J.R.,
Levi, S.: Language support for fast and reliable message-based communication in
singularity OS. In: Proceedings of the 2006 EuroSys Conference. pp. 177–190. ACM
(2006). https://doi.org/10.1145/1217935.1217953
11. Fowler, S., Lindley, S., Morris, J.G., Decova, S.: Exceptional asynchronous ses-
sion types: session types without tiers. PACMPL 3(POPL), 28:1–28:29 (2019).
https://doi.org/10.1145/3290341
12. Franco, J., Vasconcelos, V.T.: A concurrent programming language with re-
fined session types. In: Software Engineering and Formal Methods. Lec-
ture Notes in Computer Science, vol. 8368, pp. 15–28. Springer (2013).
https://doi.org/10.1007/978-3-319-05032-4_2
13. Garrigue, J., Keller, G., Sumii, E. (eds.): Proceedings of the 21st ACM SIGPLAN
International Conference on Functional Programming, ICFP 2016, Nara, Japan,
September 18-22, 2016. ACM (2016). https://doi.org/10.1145/2951913
14. Gastin, P., Laroussinie, F. (eds.): CONCUR 2010 - Concurrency Theory, 21th
International Conference, CONCUR 2010, Paris, France, August 31-September 3,
2010. Proceedings, Lecture Notes in Computer Science, vol. 6269. Springer (2010).
https://doi.org/10.1007/978-3-642-15375-4
15. Gay, S.J., Hole, M.: Subtyping for session types in the pi calculus. Acta Inf. 42(2-3),
191–225 (2005). https://doi.org/10.1007/s00236-005-0177-z
16. Gay, S.J., Vasconcelos, V.T.: Linear type theory for asynchronous session types. J.
Funct. Program. 20(1), 19–50 (2010). https://doi.org/10.1017/S0956796809990268
17. Gorla, D.: Towards a unified approach to encodability and separa-
tion results for process calculi. Inf. Comput. 208(9), 1031–1053 (2010).
https://doi.org/10.1016/j.ic.2010.05.002
18. Honda, K.: Types for dyadic interaction. In: CONCUR ’93, 4th International Con-
ference on Concurrency Theory. Lecture Notes in Computer Science, vol. 715, pp.
509–523. Springer (1993). https://doi.org/10.1007/3-540-57208-2_35
19. Honda, K., Vasconcelos, V.T., Kubo, M.: Language primitives and type discipline
for structured communication-based programming. In: Programming Languages
and Systems. Lecture Notes in Computer Science, vol. 1381, pp. 122–138. Springer
(1998). https://doi.org/10.1007/BFb0053567
20. Honda, K., Yoshida, N., Carbone, M.: Multiparty asynchronous session
types. In: Proceedings of the 35th ACM SIGPLAN-SIGACT Sympo-
sium on Principles of Programming Languages. pp. 273–284. ACM (2008).
https://doi.org/10.1145/1328438.1328472
21. Honda, K., Yoshida, N., Carbone, M.: Multiparty asynchronous session types. J.
ACM 63(1), 9:1–9:67 (2016). https://doi.org/10.1145/2827695
22. Kobayashi, N., Pierce, B.C., Turner, D.N.: Linearity and the pi-calculus.
In: Conference Record of POPL’96. pp. 358–371. ACM Press (1996).
https://doi.org/10.1145/237721.237804

23. Kobayashi, N., Pierce, B.C., Turner, D.N.: Linearity and the pi-
calculus. ACM Trans. Program. Lang. Syst. 21(5), 914–947 (1999).
https://doi.org/10.1145/330249.330251
24. Kouzapas, D., Yoshida, N.: Mixed-choice multiparty session types (2020), unpub-
lished
25. Lindley, S., Morris, J.G.: Talking bananas: structural recursion for session types.
In: Garrigue et al. [13], pp. 434–447. https://doi.org/10.1145/2951913.2951921
26. Milner, R.: A Calculus of Communicating Systems, Lecture Notes in Computer
Science, vol. 92. Springer (1980). https://doi.org/10.1007/3-540-10235-3
27. Milner, R.: Functions as processes. In: Automata, Languages and Programming.
Lecture Notes in Computer Science, vol. 443, pp. 167–180. Springer (1990).
https://doi.org/10.1007/BFb0032030
28. Milner, R.: The polyadic pi-calculus: A tutorial. ECS-LFCS-91-180, Laboratory
for Foundations of Computer Science, Department of Computer Science, University
of Edinburgh (1991), this report was published in F. L. Bauer, W. Brauer and H.
Schwichtenberg, editors, Logic and Algebra of Specification. Springer-Verlag, 1993
29. Milner, R.: Functions as processes. Mathematical Structures in Computer Science
2(2), 119–141 (1992). https://doi.org/10.1017/S0960129500001407
30. Milner, R., Parrow, J., Walker, D.: A calculus of mobile processes, I. Inf. Comput.
100(1), 1–40 (1992). https://doi.org/10.1016/0890-5401(92)90008-4
31. Milner, R., Parrow, J., Walker, D.: A calculus of mobile processes, II. Inf. Comput.
100(1), 41–77 (1992). https://doi.org/10.1016/0890-5401(92)90009-5
32. Palamidessi, C.: Comparing the expressive power of the synchronous and asyn-
chronous pi-calculi. Mathematical Structures in Computer Science 13(5), 685–719
(2003). https://doi.org/10.1017/S0960129503004043
33. Peters, K., Nestmann, U.: Breaking symmetries. Mathemati-
cal Structures in Computer Science 26(6), 1054–1106 (2016).
https://doi.org/10.1017/S0960129514000346
34. Peters, K., Schicke, J., Nestmann, U.: Synchrony vs causality in the asynchronous
pi-calculus. In: Proceedings 18th International Workshop on Expressiveness in Con-
currency. EPTCS, vol. 64, pp. 89–103 (2011). https://doi.org/10.4204/EPTCS.64.7
35. Peters, K., Schicke-Uffmann, J., Goltz, U., Nestmann, U.: Synchrony versus causal-
ity in distributed systems. Mathematical Structures in Computer Science 26(8),
1459–1498 (2016). https://doi.org/10.1017/S0960129514000644
36. Pierce, B.C., Turner, D.N.: Pict: a programming language based on the pi-calculus.
In: Proof, Language, and Interaction, Essays in Honour of Robin Milner. pp. 455–
494. The MIT Press (2000)
37. Sangiorgi, D.: An interpretation of typed objects into typed pi-calculus. Inf. Com-
put. 143(1), 34–73 (1998). https://doi.org/10.1006/inco.1998.2711
38. Sangiorgi, D., Walker, D.: The Pi-Calculus - a theory of mobile processes. Cam-
bridge University Press (2001)
39. Stengel, Z., Bultan, T.: Analyzing singularity channel contracts. In: Proceedings
of the Eighteenth International Symposium on Software Testing and Analysis. pp.
13–24. ACM (2009). https://doi.org/10.1145/1572272.1572275
40. Strom, R.E., Yemini, S.: Typestate: A programming language concept for en-
hancing software reliability. IEEE Trans. Software Eng. 12(1), 157–171 (1986).
https://doi.org/10.1109/TSE.1986.6312929
41. Takeuchi, K., Honda, K., Kubo, M.: An interaction-based language and its
typing system. In: PARLE ’94: Parallel Architectures and Languages Europe.
Lecture Notes in Computer Science, vol. 817, pp. 398–413. Springer (1994).
https://doi.org/10.1007/3-540-58184-7_118

42. Thiemann, P., Vasconcelos, V.T.: Context-free session types. In: Garrigue et al.
[13], pp. 462–475. https://doi.org/10.1145/2951913.2951926
43. Vasconcelos, V.T.: Fundamentals of session types. Inf. Comput. 217, 52–70 (2012).
https://doi.org/10.1016/j.ic.2012.05.002
44. Vasconcelos, V.T.: Typed concurrent objects. In: Object-Oriented Programming.
Lecture Notes in Computer Science, vol. 821, pp. 100–117. Springer (1994).
https://doi.org/10.1007/BFb0052178
45. Vasconcelos, V.T.: Fundamentals of session types. In: Formal Methods for Web
Services. Lecture Notes in Computer Science, vol. 5569, pp. 158–186. Springer
(2009). https://doi.org/10.1007/978-3-642-01918-0_4
46. Wadler, P.: Propositions as sessions. In: ACM SIGPLAN International
Conference on Functional Programming. pp. 273–286. ACM (2012).
https://doi.org/10.1145/2364527.2364568
47. Wadler, P.: Propositions as sessions. J. Funct. Program. 24(2-3), 384–418 (2014).
https://doi.org/10.1017/S095679681400001X
48. Walker, D.: Advanced Topics in Types and Programming Languages, chap. Sub-
structural Type Systems. The MIT Press (2005)
49. Yoshida, N., Vasconcelos, V.T.: Language primitives and type discipline for struc-
tured communication-based programming revisited: Two systems for higher-order
session communication. Electr. Notes Theor. Comput. Sci. 171(4), 73–93 (2007).
https://doi.org/10.1016/j.entcs.2007.02.056

Higher-Order Spreadsheets with Spilled Arrays

Jack Williams¹, Nima Joharizadeh², Andrew D. Gordon¹,³, and Advait Sarkar¹,⁴

¹ Microsoft Research, Cambridge, UK
{t-jowil,adg,advait}@microsoft.com
² University of California, Davis, USA
[email protected]
³ University of Edinburgh, Edinburgh, UK
⁴ University of Cambridge, Cambridge, UK

Abstract. We develop a theory for two recently-proposed spreadsheet
mechanisms: gridlets allow for abstraction and reuse in spreadsheets, and
build on spilled arrays, where an array value spills out of one cell into
nearby cells. We present the first formal calculus of spreadsheets with
spilled arrays. Since spilled arrays may collide, the semantics of spilling
is an iterative process to determine which arrays spill successfully and
which do not. Our first theorem is that this process converges determin-
istically. To model gridlets, we propose the grid calculus, a higher-order
extension of our calculus of spilled arrays with primitives to treat spread-
sheets as values. We define a semantics of gridlets as formulas in the grid
calculus. Our second theorem shows the correctness of a remarkably di-
rect encoding of the Abadi and Cardelli object calculus into the grid cal-
culus. This result is the first rigorous analogy between spreadsheets and
objects; it substantiates the intuition that gridlets are an object-oriented
counterpart to functional programming extensions to spreadsheets, such
as sheet-defined functions.

1 Introduction

Many spreadsheets contain repeated regions that share the same formatting and
formulas, perhaps with minor variations. The typical method for generating each
variation is to apply the operations copy-paste-modify. That is, the user copies
the region they intend to repeat, pastes it into a new location, and makes local
modifications to the newly pasted region such as altering data values, format-
ting, or formulas. A common problem associated with copy-paste-modify is that
updates to a source region will not propagate to a modified copy. A user must
modify each copy manually—a process that is tedious and error-prone.
Gridlets [12] are a high-level abstraction for re-use in spreadsheets based on
the principle of live copy-paste-modify: a pasted region of a spreadsheet can be
locally modified without severing the link to the source region. Changes to the
source region propagate to the copy.

The central idea of this paper is that we can implement gridlets using a
formula operator G. If a cell a contains the formula

G(r, a1, F1, . . . , an, Fn)

then the behaviour is to copy range r, modify cells ai with formulas Fi , and
paste the computed array in cell a where its elements may be displayed in the
cells below and to the right.
Consider the following example:

Source sheet:                           Evaluated sheet:

   A       B           C                   A       B      C
1  “Edge”  “Len.”                       1  “Edge”  “Len.”
2  “a”     3           = B2^2           2  “a”     3      9
3  “b”     4           = B3^2           3  “b”     4      16
4  “c”     = SQRT(C4)  = C2 + C3        4  “c”     5      25


The table computes and displays a Pythagorean triple, with intermediate cal-
culation spread across many cells. To reuse the table a user creates a gridlet by
inserting⁵ a G formula in cell A6 as follows.

Source sheet:                           Evaluated sheet:

   A                           B  C        A       B      C
   ⋮                                       ⋮
6  = G(A1:C4, B2, 7, B3, 24)            6  “Edge”  “Len.”
7                                       7  “a”     7      49
8                                       8  “b”     24     576
9                                       9  “c”     25     625


The formula in A6 is interpreted as: compute the source range A1:C4 with B2
bound to 7, and B3 bound to 24. The result of the formula is an array corre-
sponding to the computed range which then displays in the grid, emulating a
paste action. A consequence of this design is that this single formula controls
the content of a range of cells, below and to the right; we say that it spills into
these cells.
Our overall goal is to explain the semantics of the gridlet operator G using ar-
ray spilling. Spilling is not new in spreadsheets: both Microsoft Excel and Google
Sheets allow a cell to contain a formula that computes an array, and whose com-
puted value then spills into vacant cells below and to the right. While there is a
practical precedent for spilling in spreadsheets, there is no corresponding formal
precedent from which to derive a semantics for G. This paper therefore proceeds
in two parts.
⁵ The user may enter this formula either directly, or indirectly via some grid-based
interface [12]; details of the user experience are beyond the scope of this paper.

First, we make sense of array spilling and its subtleties. Two formulas spilling
into the same cell, or colliding, is one problem. Another problem is a formula
spilling into an area on which it depends, triggering a spill cycle. Both problems
make preserving determinism and acyclicity of spreadsheet evaluation a chal-
lenge. We give a semantics of spilling that exploits iteration to determine which
arrays spill successfully, and which do not. Our solution ensures that there is at
most one array that spills into any address, and that the iteration converges.
Second, we develop three new spreadsheet primitives that implement G when
paired with spilled arrays. We present a higher-order spreadsheet calculus, the
grid calculus, that admits sheets as first-class values and provides operations
that manipulate sheet-values. Previous work has drawn connections between
spreadsheets and object-oriented programming [5,8,9,15,17], but we give the first
direct correspondence by showing that the Abadi and Cardelli object calculus [1]
can be embedded in the grid calculus. Our translation constitutes a precise
analogy between objects and sheets, and between methods and cells.
In our semantics for gridlets, we make three distinct technical contributions:
– We develop the spill calculus, the first formalisation of spilled arrays for
spreadsheets. Our first theorem is that the iterative process of spilling we
present converges deterministically (Section 4). Our formal analysis of spilled
arrays, a feature now available in commercial spreadsheet systems, is a sub-
stantial contribution of this work, independent of our gridlet semantics.
– We develop the grid calculus, an extension of the spill calculus with three
higher-order operators: GRID, VIEW, and UPDATE. These correspond to
copy, paste, and modify, and suffice to encode the operator G (Section 5).
– In the course of developing the grid calculus, we realised a close connection
between gridlets and object-oriented programming. We make this precise by
encoding the Abadi and Cardelli object calculus into the grid calculus. Our
second theorem shows the correctness of this encoding (Section 6).

2 Challenges of Spilling

In this section we describe the challenges of implementing spilled arrays. We de-
scribe core design principles for spreadsheet implementations and then illustrate
how spilled arrays challenge these principles.

2.1 Design Principles for Spreadsheet Evaluation


Spreadsheet implementations rely on the following two properties to be pre-
dictable and efficient.
Determinism Evaluation should produce identical output given identical in-
put; this property is exploited for efficient recalculation.
Acyclicity Evaluation should not be self-referential. The dependency graph of
a spreadsheet should form a directed acyclic graph and no cell should depend
on its own value. Creating self-referential formulas cannot be prevented, but
violations of acyclicity should be observable and not cause divergence.

Both properties are satisfied by standard spreadsheet implementations, if we
exclude a few nondeterministic worksheet functions such as RAND. Through-
out this work we consider only deterministic worksheet functions. Given this
assumption, spreadsheet formulas constitute a purely functional language, and
so evaluation is deterministic. Cell evaluation tracks a calculating state for every
cell and raises a circularity violation for any cell that depends on its own value.
Spilled arrays pose a challenge for preserving determinism and acyclicity
which we illustrate with examples. For the remainder of our technical develop-
ments we drop the leading = from formulas. We begin with core terminology.
Arrays Spreadsheet arrays are finite two-dimensional matrices that use one-
based indexing and are non-empty. We denote an (m, n) array literal as
{V1,1 , . . . , V1,n ; . . . ; Vm,1 , . . . , Vm,n }
where (,) delimits the n columns and (;) delimits the m rows. We use V to
range over values, which are described in Section 3.
Spilling Address ar (i, j)-spills into address at iff the value of ar is an (m, n)
array and at is i − 1 rows below and j − 1 columns right of ar , where i ∈ 1..m
and j ∈ 1..n. In particular, ar (1,1)-spills into itself.
Roots, targets, & areas If ar (i, j)-spills into address at we call ar the spill
root and at a spill target. The spill area of ar is the set of its spill targets.
The value of at is element (i, j) of the array that is the value of ar .
Consider the following example:

Source Sheet:               Evaluated Sheet:

   A         B                 A    B
1  {10, 20}                 1  10   20
2                           2


Address A1 evaluates to a (1, 2) array and is a spill root with spill area {A1, B1}.
Address A1 (1, 1)-spills into A1, and (1, 2)-spills into B1.
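As a concrete reading of these definitions, the following Python sketch computes a spill area; modelling addresses as (row, column) pairs is our own simplification of the A1-style addresses used in the text.

    # spill_area(root, m, n): the addresses that an (m, n) array rooted at `root`
    # (i, j)-spills into, for i in 1..m and j in 1..n; it includes the root itself.
    def spill_area(root, m, n):
        r, c = root
        return {(r + i - 1, c + j - 1)
                for i in range(1, m + 1) for j in range(1, n + 1)}

    # The example above: a (1, 2) array rooted at A1 = (1, 1) spills into A1 and B1.
    assert spill_area((1, 1), 1, 2) == {(1, 1), (1, 2)}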

2.2 Spill Collisions


Spill collisions can be static or dynamic, and may interfere with determinism.

Static Collision Every cell in a spill area should be blank except for the spill
root; a blank cell has no formula. A static collision occurs when a spill root spills
into another non-blank cell, and we say the non-blank cell is an obstruction.
The choice to read the value from the obstruction or the spilled value violates
determinism. We adopt a simple mechanism used by Excel and Sheets to resolve
static spill collisions: the root evaluates to an error value, not an array, and spills
nowhere. The ambiguity between reading the obstructing cell’s value and the
root’s spilled value is resolved by preventing the root from spilling—we always
read the value from the obstructing cell. Consider the following example:

Source Sheet:                 Evaluated Sheet:

   A         B                   A    B
1  {10, 20}  40               1  ERR  40
2            B1 + 2           2       42


The address B1 obstructs spill root A1 and consequently address A1 evaluates
to an error value, address B1 evaluates to 40, and address B2 evaluates to 42.

Dynamic Collisions A dynamic collision occurs when a blank cell is a spill target
for two distinct spill roots. Dynamic collisions can be resolved in different ways.
– The conservative approach is to say no colliding spill root spills and each
root evaluates to an error.
– The liberal approach is to say that every colliding spill root spills. This
approach can be non-deterministic because the spill target obtains its value
by choosing one of the multiple colliding spill roots. Google Sheets takes the
liberal approach.
– An intermediate approach enforces what we call the single-spill policy. One
root from the set of colliding roots is permitted to spill and the rest evaluate
to an error. This approach can be non-deterministic because there is a choice
of which root is permitted to spill. Excel takes the single-spill approach.
Consider the following example that uses the single-spill approach:

Source Sheet:            Root A2 wins:          Root B1 wins:

   A       B                A  B                   A    B
1  B2      {3; 4}        1  2  ERR              1  4    3
2  {1, 2}                2  1  2                2  ERR  4


Addresses A2 and B1 are spill roots: the former evaluates to an array of size
(1, 2) while the latter evaluates to an array of size (2, 1). The value of address A1
depends on which address from the colliding spill roots A2 and B1 are permitted
to spill. Arbitrarily selecting which root is permitted to spill violates determinis-
tic evaluation. Sheets and Excel resolve collisions using an ordering that prefers
newer formulas. While consecutive evaluations of the same spreadsheet will pro-
duce the same result, two syntactically identical spreadsheets constructed in
different ways can produce different results. In Section 4 we give a deterministic
semantics for spilling that uses a total ordering on addresses to select a single
root from a set of colliding roots.

2.3 Spill Cycles


A spill cycle occurs when the value of a spill root depends on an address in its
spill area. Spill cycles violate acyclicity and subtly differ from cell cycles. A cell

cycle occurs when the value of a formula in a cell depends on the value of the
cell itself. We know that it is never legal for a cell to read its own value and
therefore it is possible to eagerly detect cell cycles during evaluation of a cell. In
contrast, a spill cycle only occurs if the cell evaluates to an array that is spilled
into a range the cell depends on, so it is not possible to detect the cycle until
the cell has been evaluated.
We can thus proactively detect cell cycles, but only retroactively detect spill
cycles. To see why, let us consider the following example, wherein we assume
the definition of a conditional operator IF that is lazy in the second and third
arguments, and the function INC that maps over an array and increments every
number and converts ␢ to 0, where ␢ is the value read from a blank cell.

A B
1 42 IF(A1 = 42, SUM(B2:B3), INC(B2:B3))
2
3

The evaluation of address B1 returns the sum of the range B2 : B3. While the
value of B1 depends on the values in the range B2:B3, the sum returns a scalar
and therefore no spilling is required.
Consider the case where the value in A1 is changed to 43. The address B1
will evaluate the formula INC(B2 : B3), first by dereferencing the range B2 : B3
to yield {␢; ␢}, and then by applying INC to yield {0; 0}. The array {0; 0} will
attempt to spill into the range B1:B2—a range just read from by the formula.
The attempt to spill will induce a spill cycle; there is no consistent value that
can be assigned to the addresses B1, B2, and B3.
In Section 4 we give a semantics for spilling that uses dynamic dependency
tracking to ensure that no spill root depends on its own spill area.
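As a sketch of that retroactive check (in Python, with addresses as (row, column) pairs and dependency sets as Python sets, an encoding of ours): a spill cycle is detected exactly when the dependency set gathered while evaluating the root's formula meets the root's would-be spill area.

    def spill_area(root, m, n):
        r, c = root
        return {(r + i, c + j) for i in range(m) for j in range(n)}

    # Retroactive spill-cycle check: only after evaluating the root do we know
    # both the array's size and the addresses the formula depended on.
    def has_spill_cycle(root, m, n, deps):
        return bool(spill_area(root, m, n) & deps)

    # The example above: B1 = (1, 2) evaluates to a (2, 1) array having read
    # B2:B3, so its dependencies {(2, 2), (3, 2)} meet its area {(1, 2), (2, 2)}.
    assert has_spill_cycle((1, 2), 2, 1, {(2, 2), (3, 2)})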

3 Core Calculus for Spreadsheets

In this section we present a core calculus for spreadsheets that serves as the
foundation of our technical developments.

3.1 Syntax

Figure 1 presents the syntax of the core calculus. Let a and b range over A1-style
addresses, written N m, composed from a column name N and row index m. A
column name is a base-26 numeral written using the symbols A..Z. A row index
is a decimal numeral written as usual. Let m and n range over positive natural
numbers which we typically use to denote row or array indices. We assume a
locale in which rows are numbered from top to bottom, and columns from left to
right, so that A1 is the top-left cell of the sheet. We use the terms address and cell
interchangeably. Let r range over ranges that are pairs of addresses that denote
a rectangular region of a grid. Modern spreadsheet systems do not restrict which

A1-style column name  N ::= A | . . . | Z | AA | AB | . . .
                      m, n ∈ ℕ₁
Address               a, b ::= N m
Range                 r ::= a1:a2
Value                 V ::= ␢ | c | ERR | {Vi,j i∈1..m,j∈1..n}
Formula               F ::= V | r | f(F1, . . . , Fn)   (f function name)
Sheet                 S ::= [ai → Fi i∈1..n]   (ai distinct and no Fi = ␢)
Grid                  γ ::= [ai → Vi i∈1..n]   (ai distinct)

Fig. 1. Syntax for Core Calculus

corners of a rectangle are denoted by a range but will automatically normalise the
range to represent the top-left and bottom-right corners. We implicitly assume
that all ranges are written in the normalised form such that range B1:A2 does
not occur; instead, the range is denoted A1:B2.
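To make the base-26 column convention concrete, here is a small Python sketch; the helper names are ours, and the calculus itself never needs this decoding.

    # Column names are base-26 numerals over the digits A..Z, with A = 1, ...,
    # Z = 26, AA = 27, and so on (there is no zero digit).
    def column_index(name):
        idx = 0
        for ch in name:
            idx = idx * 26 + (ord(ch) - ord('A') + 1)
        return idx

    # Split an A1-style address such as 'AB12' into (row, column) indices.
    def parse_address(a):
        letters = ''.join(ch for ch in a if ch.isalpha())
        digits = ''.join(ch for ch in a if ch.isdigit())
        return int(digits), column_index(letters)

    assert parse_address('A1') == (1, 1)      # the top-left cell
    assert parse_address('AA10') == (10, 27)  # AA is the 27th column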
A value V is either the blank value ␢, a constant c, an error ERR, or a
two-dimensional array {Vi,j i∈1..m,j∈1..n }. We write {Vi,j i∈1..m,j∈1..n } as short
for array literal {V1,1 , . . . , V1,n ; . . . ; Vm,1 , . . . , Vm,n }.
Let F range over formulas. A formula is either a value V , a range r, or a
function application f (F1 , . . . , Fn ), where f ranges over names of pre-defined
worksheet functions such as SUM or PRODUCT.
Let S range over sheets, where a sheet is a partial function from addresses
to formulas that has finite domain. We write [] to denote the empty map, and
we write S[a → F ] to denote the extension of S to map address a to formula
F , potentially shadowing an existing mapping. We do not model the maximum
numbers of rows or columns imposed by some implementations. Each finite S
represents an unbounded sheet that is almost everywhere blank: we say a cell a
is blank to mean that a is not in the domain of S.
Let γ range over grids, where a grid is a partial function from addresses to
values that has finite domain. A grid can be viewed as a function that assigns
values to addresses, obtained by evaluating a sheet.

3.2 Operational Semantics

Figure 2 presents the operational semantics of the core calculus. Auxiliary defi-
nitions are present at the top of Figure 2.

Formula Evaluation The relation S ⊢ F ⇓ V means that in sheet S, formula
F evaluates to value V . A value V evaluates to itself. A function application
f(F1, . . . , Fn) evaluates to V if the result of applying ⟦f⟧ to evaluated arguments
is V , where ⟦f⟧ is the underlying semantics of f , a total function on values. A
single cell range a:a evaluates to V if address a dereferences to V . A multiple
cell range a1 :a2 evaluates to an array of the same dimensions, where each value
in the array is obtained by dereferencing the corresponding single cell within the
range. We write size(a1 :a2 ) to denote the operation that returns the dimensions

size(N1m1:N2m2) = (m2 − m1 + 1, N2 − N1 + 1)
N m + (i, j) = (N + j − 1)(m + i − 1)

Formula evaluation: S ⊢ F ⇓ V

                 S ⊢ Fi ⇓ Vi    ⟦f⟧(V1, . . . , Vn) = V        S ⊢ a ! V
  ─────────      ──────────────────────────────────────      ───────────
  S ⊢ V ⇓ V           S ⊢ f(F1, . . . , Fn) ⇓ V               S ⊢ a:a ⇓ V

  a1 ≠ a2    size(a1:a2) = (m, n)    ∀i ∈ 1..m, j ∈ 1..n. S ⊢ (a1 + (i, j)) ! Vi,j
  ────────────────────────────────────────────────────────────────────────────────
                        S ⊢ a1:a2 ⇓ {Vi,j i∈1..m,j∈1..n}

Address dereferencing: S ⊢ a ! V

  S(a) = F    S ⊢ F ⇓ V        a ∉ dom(S)
  ─────────────────────        ──────────
       S ⊢ a ! V               S ⊢ a ! ␢

Sheet evaluation: S ⇓ γ

  S ⇓ γ  ≝  ∀a ∈ dom(S). S ⊢ a ! γ(a)

Fig. 2. Operational Semantics for Core Calculus

of a range written (m, n), where m is the number of rows, and n is the number of
columns. We write a + (i, j) to denote the address offset to the right and below a
by i − 1 rows and j − 1 columns. For example, a + (1, 1) maps to a, and a + (1, 2)
maps to the address immediately to the right of a. Both size(a1:a2 ) and a + (i, j)
are defined in Figure 2.
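Read operationally, size and the offset a + (i, j) are simple arithmetic; a minimal Python sketch over (row, column) pairs, an encoding of ours:

    # size(a1:a2): the dimensions (rows, columns) of a normalised range.
    def size(a1, a2):
        (m1, n1), (m2, n2) = a1, a2
        return (m2 - m1 + 1, n2 - n1 + 1)

    # a + (i, j): the address i-1 rows below and j-1 columns right of a.
    def offset(a, i, j):
        m, n = a
        return (m + i - 1, n + j - 1)

    assert size((1, 1), (2, 2)) == (2, 2)    # A1:B2 is a 2-by-2 range
    assert offset((1, 1), 1, 1) == (1, 1)    # a + (1, 1) maps to a itself
    assert offset((1, 1), 1, 2) == (1, 2)    # immediately to the right of A1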

Address Dereferencing The relation S ⊢ a ! V means that in sheet S, address a
dereferences to V . If address a maps to formula F in sheet S, then dereferencing
a returns V when F evaluates to V . If address a is not in the domain of S then
dereferencing a returns the blank value ␢. We make range evaluation and address
dereferencing distinct relations to aid our presentation in Section 4.

Sheet Evaluation The relation S ⇓ γ means that sheet S evaluates to grid γ,
and the relation is defined by point-wise dereferencing of every address in the
sheet. Recall the spreadsheet design principles of determinism and acyclicity
from Section 2.1. The relations of our semantics are partial functions (as stated
in Appendix A of the extended version [21]). As for acyclicity, if there is a cycle
where S(a) = F and evaluation of formula F must dereference cell a, then we
cannot derive S ⊢ F ⇓ V for any V . Although our calculus could be modified to
model a detection mechanism for cell cycles, we omit any such mechanism for
the sake of simplicity.

Formula         F ::= · · · | a#   (postfix operator)
Dependency set  D ::= {a1, . . . , an}
Grid            γ ::= [ai → (Vi#, Vi!, Di) i∈1..n]   (ai distinct)
Spill permit    p ::= ✓ | ×
Spill oracle    ω ::= [ai → (mi, ni, pi) i∈1..n]   (ai distinct)

Fig. 3. Syntax for Spill Calculus (Extends and modifies Figure 1)

4 Spill Calculus: Core Calculus with Spilled Arrays

The spill calculus, presented in this section, is the first formalism to explain the
semantics of arrays that spill out of cells in spreadsheets. The spill calculus and
its convergence, Theorem 1, is our first main technical contribution.

4.1 Syntax

Figure 3 presents the extensions and modifications to the syntax of Figure 1; we
omit syntax classes that remain unchanged.
Let F range over formulas, extended to include the postfix root operator a#.
The root operator a# evaluates to an array if address a is a spill root. Accessing
an array via the root operator instead of a fixed-size range is more robust to
future edits. For example, consider the sheet [A1 → F, B1 → SUM(A1 : A10)]
where formula F evaluates to a (10, 1) array. If the user modifies F such that
the formula evaluates to an array of size (11, 1) then the summation in B1 still
applies only to the first ten elements that spill from A1, even if the user intends
to sum the whole array. The root operator allows a more robust formulation:
[A1 → F, B1 → SUM(A1#)]. The summation in B1 applies to the entire array
that spills from A1, regardless of its size. Section 4.3 shows the full semantics of
the root operator.
Let D range over dependency sets, which denote a set of addresses that a
formula bound to an address depends on.
Let γ range over grids, which now map addresses to tuples of the form
(V # , V ! , D). If γ(a) = (V # , V ! , D) then V # is the pre-spill value obtained by
applying the root operator # to a, while V ! is the post-spill value obtained
by evaluating a, and D is the dependency set required to dereference a. Each
dereferenced address has both a pre-spill and post-spill value, even if the cell
content does not spill. If the pre-spill value is not an array, it cannot spill, and
the post-spill value equals the pre-spill value.
Let p range over spill permits, where ✓ denotes that a root is permitted to
spill and × denotes that it is not.
Let ω range over spill oracles, which map addresses to tuples of the form
(m, n, p). A spill oracle governs how arrays spill in a sheet.

– If ω(a) = (m, n, p) we expect a to be a spill root for an (m, n) array:
  – If p = ✓ the contents of a can spill with no obstruction.
  – If p = × then a cannot spill because either a formula obstructs the spill
    area, or another spill root will spill into the area.

Oracles track the size of each spilled array so we can find the spill root a of any
spill target, and hence obtain the value for a spill target by dereferencing a.

Let S ≝ [A1 → {7; 8}, B1 → IF(A2 = 8, {9; 10}, 100)]

Round 1: ω1 = []       Round 2: ω2 = [A1 → (2, 1, ✓)]    Round 3: ω3 = [A1 → (2, 1, ✓), B1 → (2, 1, ✓)]

   A        B              A   B                             A   B
1  {7; 8}   100          1  7   {9; 10}                    1  7   9
2                        2  8                              2  8   10

Fig. 4. Example Spill Iteration

4.2 Spill Oracles and Iteration


As discussed in Section 2.2, spill collisions have the potential to introduce non-
determinism if not handled appropriately. Our solution is to evaluate a sheet in a
series of rounds, each determined by a spill oracle. Given a sheet, a grid is induced
by evaluating the sheet and using the oracle to deterministically predict how
each root spills. A discrepancy could be a new spill root the oracle missed, or an
existing spill root with dimensions differing from the oracle. If any discrepancies
are found we compute a new oracle, and start a new round. Iteration halts when
the oracle is consistent with the induced grid. The notion of a consistent oracle
is defined in Section 4.4. We can view the iteration as a sequence of n oracles
where only the final oracle is consistent:

[] = ω1 −→ ω2 −→ · · · −→ ωn and ωn is consistent

Consider the example in Figure 4. At the top we show the bindings of the sheet;
at the bottom we show the oracle and induced grid for each round of spilling.
We define the initial spill oracle as ω1 = [] and in the first round the oracle
is empty. An empty oracle anticipates no spill roots and therefore no roots are
permitted to spill. The array in A1 remains collapsed and B1 evaluates using the
false branch. Once the sheet has been fully evaluated we determine that ω1 was
not a consistent prediction because there is an array in A1 with no corresponding
entry in ω1 . We compute a new oracle that determines that A1 is allowed to spill
because the area is blank. We define the new oracle as ω2 = [A1 → (2, 1, ✓)].
In the second round the root A1 is permitted to spill by the oracle and as a
consequence B1 now evaluates to the array {9; 10}—this array is not anticipated
by the oracle and remains collapsed. Once the sheet has been fully evaluated we
determine that ω2 was not a consistent prediction because there is an array in
B1 with no corresponding entry in ω2. We compute a new oracle that determines
that B1 is allowed to spill because the area is blank in the grid induced by ω2.
We define the third oracle as ω3 = [A1 → (2, 1, ✓), B1 → (2, 1, ✓)].
In the third and final round the root A1 is permitted to spill by the oracle
and B1 evaluates to the array {9; 10}. This time the oracle anticipates the root
in B1 and permits the array to spill. Once the sheet has been fully evaluated we
determine that ω3 is a consistent prediction because the spill roots A1 and B1
are contained in the oracle. The iteration is the sequence of three oracles:

[] −→ [A1 → (2, 1, ✓)] −→ [A1 → (2, 1, ✓), B1 → (2, 1, ✓)]
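The whole process can be read as a small fixpoint loop. The Python sketch below shows its shape; evaluate, consistent, and refine stand for implementations of S, ω ⇓ γ, γ |= ω, and refine(S, ω, γ) (Section 4.4) and are passed as parameters, since they are not part of any fixed API.

    # Drive spill iteration from the empty oracle to a final one. For acyclic
    # sheets this loop terminates (Theorem 1, Section 4.5).
    def final_oracle(sheet, evaluate, consistent, refine):
        oracle = {}                               # round 1: the empty oracle []
        grid = evaluate(sheet, oracle)            # S, ω ⇓ γ
        while not consistent(grid, oracle):       # a discrepancy remains
            oracle = refine(sheet, oracle, grid)  # ω −→S ω′
            grid = evaluate(sheet, oracle)
        return oracle, grid                       # S ⊢ ω final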

Spill Rejection Spill oracles explicitly track the anticipated size of the array
to ensure that spill rejections based on incorrect dimensions can be corrected.
Consider the following example:

   A            B                                   C
1               IF(C2 = 2, {10; 20}, {10; 20; 30})  {1; 2}
2
3  {1, 2, 3}

After the first round using an empty spill oracle there are three spill roots:
A3 = {1, 2, 3}, B1 = {10; 20; 30}, and C1 = {1; 2}. There is sufficient space to
spill C1 but only space to spill one of A3 and B1; the decision is resolved using
the total ordering on addresses. Suppose that we allow A3 to spill such that the
new oracle is: [A3 → (1, 3, ✓), B1 → (3, 1, ×), C1 → (2, 1, ✓)].
After the second round we find that address B1 returns an array of a smaller
size because the root C1 spills into C2. Previously we thought B1 was too big to
spill but with the new oracle we find there is now sufficient room; by explicitly
recording the anticipated size it is possible to identify cases that require further
refinement. We compute the new oracle [A3 → (1, 3, ✓), B1 → (2, 1, ✓), C1 →
(2, 1, ✓)] that is consistent.
An interesting limitation arises if the total ordering places B1 before A3,
which we discuss in Section 4.6.

4.3 Operational Semantics

Figure 5 presents the operational semantics for the spill calculus. The key ad-
ditions to the relations for formula evaluation and address dereferencing are an
oracle ω that is part of the context, and a dependency set D that is part of the
output. We discuss each relation in turn and focus on the extensions and modi-
fications from Figure 2. Auxiliary definitions are present at the top of Figure 5.

Formula Evaluation: S, ω ⊢ F ⇓ V, D The spill oracle ω is not inspected by the
relation but is threaded through the definition. Dependency set D denotes the
transitive dependencies required to evaluate F . Evaluating a value or function
application is as before, except we additionally compute the dependencies of the

owners(ω, a) = {(ar, i, j) | ω(ar) = (m, n, ✓) and ar + (i, j) = a and (i, j) ≤ (m, n)}
area(a, m, n) = { a + (i, j) | i ∈ 1..m, j ∈ 1..n }
size(V) = (m, n) if V = {Vi,j i∈1..m,j∈1..n}, and ⊥ otherwise

Formula evaluation: S, ω ⊢ F ⇓ V, D

                        S, ω ⊢ Fi ⇓ Vi, Di    ⟦f⟧(V1, . . . , Vn) = V
  ───────────────       ─────────────────────────────────────────────
  S, ω ⊢ V ⇓ V, ∅       S, ω ⊢ f(F1, . . . , Fn) ⇓ V, D1 ∪ · · · ∪ Dn

  S, ω ⊢ a ! V#, V!, D              S, ω ⊢ a ! V#, V!, D
  ────────────────────────          ────────────────────────
  S, ω ⊢ a# ⇓ V#, D ∪ {a}           S, ω ⊢ a:a ⇓ V!, D ∪ {a}

  a1 ≠ a2    size(a1:a2) = (m, n)    ∀i ∈ 1..m, j ∈ 1..n. S, ω ⊢ a1 + (i, j) ! V#i,j, V!i,j, Di,j
  ───────────────────────────────────────────────────────────────────────────────────────────────
          S, ω ⊢ a1:a2 ⇓ {V!i,j i∈1..m,j∈1..n}, ⋃i∈1..m,j∈1..n (Di,j ∪ {a1 + (i, j)})

Address dereferencing: S, ω ⊢ a ! V#, V!, D

  owners(ω, a) = ∅    a ∉ dom(ω)    S(a) = F    S, ω ⊢ F ⇓ V, D
  ────────────────────────────────────────────────────────────── (1)
  S, ω ⊢ a ! V, V, D

  owners(ω, a) = ∅    a ∉ dom(ω)    a ∉ dom(S)
  ──────────────────────────────────────────── (2)
  S, ω ⊢ a ! ␢, ␢, ∅

  owners(ω, a) = ∅    ω(a) = (m, n, ×)    S(a) = F    S, ω ⊢ F ⇓ V, D
  ──────────────────────────────────────────────────────────────────── (3)
  S, ω ⊢ a ! V, ERR, D

  (ar, i, j) ∈ owners(ω, a)    ω(ar) = (m, n, ✓)    S(ar) = F
  S, ω\ar ⊢ F ⇓ V, D    size(V) = (m, n)    area(ar, m, n) ∩ D = ∅
  ───────────────────────────────────────────────────────────────── (4)
  S, ω ⊢ a ! (a = ar ? V : ␢), Vi,j, D

  (ar, i, j) ∈ owners(ω, a)    ω(ar) = (m, n, ✓)    S(ar) = F
  S, ω\ar ⊢ F ⇓ V, D    size(V) ≠ (m, n)
  ─────────────────────────────────────────────────────────────── (5)
  S, ω ⊢ a ! (a = ar ? V : ␢), (a = ar ? V : ␢), (a = ar ? D : ∅)

Sheet evaluation: S, ω ⇓ γ

  S, ω ⇓ γ  ≝  ∀a ∈ dom(S). S, ω ⊢ a ! γ(a)

Fig. 5. Operational Semantics for Spill Calculus



formula. The dependency set required to evaluate a value is ∅. The dependency


set required to evaluate a function application is the union of the dependencies
of the arguments. Evaluating a root operation a# dereferences a and returns the
pre-spill value V # . The dependency set required to evaluate a root operation a#
is the dependency set required to dereference a and the address a itself. Evaluat-
ing a single cell range a:a dereferences a and returns the post-spill value V ! . The
dependency set required to evaluate a single cell range a : a is the dependency
set required to dereference a and the address a itself. Evaluating a multiple cell
range a1:a2 returns an array of the same dimensions, where each value in the ar-
ray is obtained by dereferencing the corresponding single cell and extracting the
post-spill value. The dependency set required to evaluate a multiple cell range is
the dependency set required to dereference every address in the range, and the
range itself.

Address dereferencing The relation S, ω ⊢ a ! V#, V!, D means that in sheet S
with oracle ω, address a dereferences to pre-spill value V# and post-spill value
V ! , and depends upon the addresses in D. Five rules govern address dereferenc-
ing, based on spill oracle ω and owners set owners(ω, a).
The set owners(ω, a) is key to the operational semantics and denotes the set of
owners for address a. If a tuple (ar , i, j) is in the set owners(ω, a), we say ar owns
a, meaning that ar is a spill root that we expect to spill into address a, and that a
is offset from ar by i−1 rows and j −1 columns. Hence, to dereference a we must
first compute the root ar and extract the (i, j)th spilled value from the root array.
Our definition allows an address to own itself, denoted (a, 1, 1) ∈ owners(ω, a),
and does not preclude an address having multiple owners, violating the single-
spill policy. We enforce the single-spill policy in our technical results using an
additional well-formedness condition on oracles, defined in Section 4.5.
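As a sketch of how the owners set can be computed (with an oracle as a Python dict from (row, column) roots to (m, n, permit) triples, a representation of ours):

    # owners(oracle, a): the roots the oracle predicts will spill into address a,
    # paired with a's (i, j) offset within the root's array. Only roots with a
    # positive permit own cells; a root owns itself at offset (1, 1).
    def owners(oracle, a):
        result = set()
        for root, (m, n, permit) in oracle.items():
            i, j = a[0] - root[0] + 1, a[1] - root[1] + 1
            if permit and 1 <= i <= m and 1 <= j <= n:
                result.add((root, i, j))
        return result

    # With oracle [A1 -> (2, 1, permitted)], address A2 is owned by root A1.
    assert owners({(1, 1): (2, 1, True)}, (2, 1)) == {((1, 1), 2, 1)}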
Rule (1) applies when the address has no owner, the address is not a spill
root, and the address has a formula binding in S. The pre-spill and post-spill
values are the value obtained by evaluating the bound formula.
Rule (2) applies when the address has no owner, the address is not a spill
root, and the address has no formula binding in S. The pre-spill and post-spill
values are the blank value  and the dependency set is empty. Rules (1) and (2)
correspond to the address dereferencing behaviour described in the core calculus
(Section 3) which is lifted to the new relation.
Rule (3) applies when the address is a spill root and the root is not
permitted to spill. The pre-spill value is the value obtained by evaluating the
bound formula; the post-spill value is an error value. If the address has no bound
formula then the relation is undefined.
Rules (4) and (5) apply when an address with an owner is dereferenced. The
owner ar is omitted from the spill oracle before evaluating the associated formula,
denoted by S, ω\ar ⊢ F ⇓ V, D. This prevents cycles when the oracle incorrectly
expects the root to spill, but the root does not, and instead depends on the
expected spill area. For example, B1 = SUM(B2:B3) and ω = [B1 → (3, 1, ✓)].
The address B1 owns B2 according to ω, therefore dereferencing address B2
requires dereferencing B1, which in turn depends on B2. If we did not remove
B1 from ω when evaluating the formula bound to B1 we would create a cycle. We
remove B1 from ω so that when formula SUM(B2:B3) dereferences B2 a blank
value is returned. Genuine spill cycles are detected post-dereferencing using the
dependency set.
Rule (4) applies when the address has an owner and the formula bound to
the owner evaluates to an array of the expected size according to ω. This rule is
only defined when the intersection of the spill root’s dependencies and its spill
area is empty, preventing spill cycles. The pre-spill value is obtained using the
conditional operator a = ar ? V : ␢. When the dereferenced cell is the root then
the value is the root array, otherwise the value is blank. The post-spill value is
obtained by indexing into the root array at the (i, j)th position.
Rule (5) applies when the address has an owner and the formula bound to the
owner does not evaluate to an array of the expected size according to ω. In this
case there is no attempt to spill as the oracle is incorrect. When the dereferenced
address is the root then the pre-spill and post-spill values are obtained from the
formula, otherwise the pre-spill and post-spill values are blank.

Sheet evaluation: S, ω ⇓ γ Sheet evaluation in the spill calculus accepts a spill
oracle, but is otherwise unchanged from sheet evaluation in the core calculus. The
computed grid only contains the value of addresses with a bound formula, and
does not include the value of any blank cells that are in a spill area. In contrast,
a spreadsheet application would display the value for all addresses, including
those within a spill area. Obtaining this view can be done by dereferencing
every address in the viewport using the sheet and oracle.

4.4 Oracle Refinement


We have shown how to compute a grid given a sheet and oracle, but we have not
considered the accuracy of the predictions provided by the oracle. In Section 4.2
we informally describe an iterative process to refine an oracle from a computed
grid; in this section we give the precise semantics of oracle refinement. Figure 6
presents the full definition of oracle refinement.

Consistency The relation γ |= ω states that grid γ is consistent with oracle ω. A
grid is consistent if every address is consistent, written γ |=a ω. An address a is
consistent in γ and ω if, and only if, the grid and oracle agree on the size of the
value at address a. Consistency tells us that the oracle has correctly predicted
the location and size of every spill root in the grid, and has not predicted any
spurious roots.
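A Python sketch of the consistency check, with grids as dicts from addresses to (pre-spill, post-spill, deps) triples and arrays as non-empty nested lists, both encodings of ours:

    def array_size(v):
        # The size of an array value, or None for scalars, errors, and blanks.
        return (len(v), len(v[0])) if isinstance(v, list) else None

    # γ |= ω: the oracle predicts exactly the spill roots present in the grid,
    # with matching dimensions (the permit plays no role in consistency).
    def consistent(grid, oracle):
        roots = {a: array_size(pre) for a, (pre, post, deps) in grid.items()
                 if array_size(pre) is not None}
        return roots == {a: (m, n) for a, (m, n, p) in oracle.items()}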

Refinement The function refine(S, ω, γ) takes an inconsistent oracle and returns
a new oracle that is refined using the computed grid. The function is defined as
follows. First, start with subset ωok of ω that is consistent with γ. Second, collect
the remaining unresolved spill roots in γ, denoted γr . Finally, recursively select
the smallest address in γr according to a total order on addresses, determining
whether the root is permitted to spill and adding the permit to the accumulating

γ |=a ω  ≝  ∀m, n, p. (ω(a) = (m, n, p)) ⇔ ∃V#, V!, D. (γ(a) = (V#, V!, D) ∧ size(V#) = (m, n))
γ |= ω   ≝  ∀a. γ |=a ω

refine(S, ω, γ) = decide(S, ωok, γr) where
  ωok = {a → (m, n, p) ∈ ω | γ |=a ω}
  γr  = {a → (V#, V!, D) ∈ γ | ∃m, n. size(V#) = (m, n) and a ∉ dom(ωok)}

decide(S, ω, []) = ω
decide(S, ω, γ[a → (V#, V!, D)]) = decide(S, ω[a → (m, n, p)], γ)
  where a is the least element in dom(γ), size(V#) = (m, n), and
  p = ✓ if ∀at ∈ area(a, m, n). a ≠ at ⇒ at ∉ dom(S) and owners(ω, at) = ∅
      × otherwise

Spill iteration: ω −→S ω′                        Final oracle: S ⊢ ω final

  S, ω ⇓ γ    γ ⊭ ω    refine(S, ω, γ) = ω′        S, ω ⇓ γ    γ |= ω
  ─────────────────────────────────────────        ──────────────────
               ω −→S ω′                                S ⊢ ω final

Final sheet evaluation: S ⇓ γ

  S ⇓ γ  ≝  [] −→∗S ω and S ⊢ ω final and S, ω ⇓ γ

Fig. 6. Oracle Refinement

oracle. A root is permitted to spill if the potential spill area is blank (excluding
the root itself) and each address in the spill area has no owner, thereby preserving
the single-spill policy.
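The permit decision inside decide can be read directly as a scan of the candidate spill area; a Python sketch, reusing the (row, column) encoding of earlier sketches, with the sheet as a dict from addresses to formulas and owners as sketched in Section 4.3:

    # A root may spill an (m, n) array iff every other cell of its area is blank
    # (no formula in the sheet) and unowned (preserving the single-spill policy).
    def permit(sheet, oracle, owners, root, m, n):
        r, c = root
        for i in range(m):
            for j in range(n):
                target = (r + i, c + j)
                if target == root:
                    continue
                if target in sheet or owners(oracle, target):
                    return False   # obstruction or competing root: permit ×
        return True                # permit ✓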

Spill iteration The relation ω −→S ω′ denotes a single iteration of oracle refine-
ment. When a computed grid is not consistent with the spill oracle that induced
it, written γ ⊭ ω, a new oracle is produced using function refine(S, ω, γ). We
write −→∗S for the reflexive and transitive closure of −→S .

Final oracle The relation S ⊢ ω final states that oracle ω is final for sheet S,
and is valid when the grid induced by ω is consistent with ω.

Final sheet evaluation The relation S ⇓ γ denotes the evaluation of sheet S to
grid γ, which implicitly refines an oracle to a final state. The process starts with
an empty oracle [] and iterates until a final oracle is found.

4.5 Technical Results

This section presents the main technical result of the spill calculus: that iteration
of oracle refinement converges for well-behaved sheets. We begin with prelimi-
nary definitions and results.
To avoid ambiguous evaluation every spill area must be disjoint and unob-
structed; an oracle is well-formed if it predicts non-blank spill roots, and predicts
disjoint and unobstructed spill areas, defined below:
Definition 1 (Well-formed oracle). We write S ⊢ ω wf if oracle ω is well-
formed for sheet S. An oracle ω is well-formed if for all addresses a the following
conditions are satisfied:

1. If a ∉ dom(S) then a ∉ dom(ω).
2. |owners(ω, a)| ≤ 1.
3. If (ar, i, j) ∈ owners(ω, a) and a ≠ ar then a ∉ dom(S).

The definition of oracle refinement in Figure 6 preserves well-formedness.

Lemma 1. If S ⊢ ω wf and S, ω ⇓ γ then S ⊢ refine(S, ω, γ) wf.
Producing well-formed oracles alone is insufficient to guarantee convergence.
Oracle refinement would never reach a consistent state if the predicted spill areas
were incorrectly sized.
The definition of oracle refinement in Figure 6 predicts spill areas that are
correctly sized with respect to the current grid.
Lemma 2. If S ⊢ ω wf and S, ω ⇓ γ then γ |= refine(S, ω, γ).
Predicting correctly sized spill areas is also insufficient to guarantee con-
vergence. Oracle refinement would never reach a consistent state if it oscillates
between permitting and rejecting the same root to spill. Consider the sheet:
Let S ≝ [A1 → {1; 2}, B1 → IF(A2 = 2, {3; 4}, 0)]

Spill iteration would continue indefinitely if refinement cycled between the
following two well-formed and correctly sized oracles:

[A1 → (2, 1, ✓)] −→ [A1 → (2, 1, ×), B1 → (2, 1, ✓)] −→ · · ·

To avoid oscillating spill iteration the process of oracle refinement should be
permit preserving, defined below:

Definition 2 (Permit preserving extension). We write γ ⊢ ω ⊑ ω′ if
oracle ω′ is a permit preserving extension of ω in context γ. Defined as:

γ ⊢ ω ⊑ ω′  ≝  ∀a, m, n, p. (γ |=a ω ∧ ω(a) = (m, n, p)) ⇒ ω′(a) = (m, n, p)

The definition of oracle refinement in Figure 6 is permit preserving.



Lemma 3. If S ⊢ ω wf and S, ω ⇓ γ then γ ⊢ ω ⊑ refine(S, ω, γ).


Spill iteration should be a converging iteration but this cannot be guaranteed
in general; at any given step in the iteration a sheet can fail to evaluate to a grid.
This can happen because the sheet contains a cell cycle, spill cycle, or diverging
grid calculus term. Instead, we only expect that if the sheet is free from these
divergent scenarios then spill iteration must converge. To allow us to dissect
different forms of divergence and focus on spill iteration we only consider acyclic
sheets, defined below:
Definition 3 (Acyclic). A sheet S is acyclic if for all ω such that S ⊢ ω wf,
there exists some γ such that S, ω ⇓ γ.
For instance, none of the following sheets are acyclic: [A1 → A1] has a
cell cycle, [A1 → B1 : C1] has a spill cycle, and [A1 → Ω] has a formula Ω
that diverges. Divergent terms are not encodable in the spill calculus but are
encodable in the grid calculus, as we show in Section 6.1. An alternative approach
would be to explicitly model divergence in our semantics of sheet evaluation and
show that iteration converges or the sheet diverges. We choose not to pursue
this approach to improve the clarity of our operational semantics, but note that
our semantics can be extended to model cycles.
For any acyclic sheet, spill iteration will converge to a final spill oracle.
Theorem 1 (Convergence). For all acyclic S and ω such that S ⊢ ω wf,
there exists an oracle ω′ such that ω −→∗S ω′ and S ⊢ ω′ final.

Proof (sketch; see Appendix B of the extended version [21] for the full proof). The value of any address with a binding is a function of its dependencies and the oracle prediction for that address. We inductively define an address as fixed if the oracle prediction is consistent for the address and every address in its spill-dependency set (defined in [21]) is fixed. Lemma 3 states that correct predictions are always preserved; therefore a fixed address remains fixed through iteration and its value remains invariant. The dependency graph of the sheet is acyclic, so if there is a non-fixed address then there must be a non-fixed address all of whose dependencies are fixed but whose oracle prediction is inconsistent; we call this a non-fixed source. Lemma 2 states that every new oracle correctly predicts sizes with respect to the previous grid; therefore every non-fixed source becomes fixed in the new oracle. We conclude by observing that the number of fixed addresses in the sheet strictly increases at each step, and when every address is fixed the oracle is final. ⊓⊔
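The convergence argument can be read off a direct implementation of spill iteration. The Python sketch below is written under stated assumptions: eval_sheet returns a grid or None (for a cell cycle, spill cycle, or divergent term), and refine and consistent are hypothetical stand-ins for the paper's refine(S, ω, γ) and γ |= ω; all three are passed in as parameters:

    def spill_iterate(sheet, oracle, eval_sheet, refine, consistent):
        # Repeat: evaluate S, omega to a grid gamma, then refine, until
        # gamma |= omega holds and the oracle is final.
        while True:
            grid = eval_sheet(sheet, oracle)
            if grid is None:
                return None        # sheet is not acyclic at this step
            if consistent(grid, oracle):
                return oracle      # S |- oracle final
            oracle = refine(sheet, oracle, grid)

Theorem 1 guarantees that, for an acyclic sheet, this loop exits through the consistent branch after finitely many refinements.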
4.6 Limitations and Differences with Real Systems
Permit preservation requires that if the size of an array does not change then its permit (which may be ×) is preserved; this property is crucial for our proof of convergence.
Real spreadsheet systems such as Sheets and Excel do not guarantee permit preservation. A root a that is prevented from spilling by a permit × can later be permitted to spill, even if the size of the associated array does not change. This interaction arises when a root that was previously preventing a from spilling changes dimension, freeing a previously occupied spill area. Permitting roots to spill into newly freed regions of the grid is desirable from a user perspective because it reflects the visual aspect of spreadsheet programming, where an array will spill into any unoccupied cells.
A limitation of our formalism, if implemented directly, is that there exist spreadsheets that, when evaluated, prevent an array from spilling despite the potential spill area being blank. Consider the sheet:

[A3 → {1, 2, 3}, C1 → IF(ISERROR(A3), 0, {4; 5; 6})]

When the total ordering used by oracle refinement places A3 before C1, the behaviour is as expected: A3 spills to the right and C1 evaluates to an error value. When the total ordering places C1 before A3, the behaviour appears peculiar: A3 evaluates to an error value and C1 evaluates to 0. The root A3 is prevented from spilling despite there appearing to be room in the grid! The issue is that the array in A3 never changes size, so the permit × assigned to the root is preserved, despite root C1 relinquishing the spill area on subsequent spill iterations.
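For reference, a small sketch of the two final grids for this example, assuming grids are modelled as dictionaries from addresses to values and "ERR" marks the error value; the representation is illustrative only:

    # Refinement considers A3 before C1: A3 spills right, C1 collides.
    grid_a3_first = {"A3": 1, "B3": 2, "C3": 3, "C1": "ERR"}
    # Refinement considers C1 before A3: A3 keeps its permit x and errors,
    # so C1 evaluates to 0 and C1:C3 is left blank.
    grid_c1_first = {"A3": "ERR", "C1": 0}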
The fundamental problem is one of constraint satisfaction. We would like to
find a well-formed oracle that maximizes the number of roots that can spill in
a deterministic manner. The total order on addresses ensures determinism but
restricts the solution space. Our approach could be modified to deterministically permute the ordering until an optimal solution is found; however, such a method would be prohibitively expensive.
Both Sheets and Excel find the best solution to our example sheet. We expect
their implementations do not permute a total order on addresses, but implement
a more efficient algorithm that runs for a bounded time. Finding a more efficient
algorithm that is guaranteed to terminate remains an open challenge.
The limitation we present in our formalism only arises when a spreadsheet
includes dynamic spill collisions and conditional spilling. We anticipate that this
is a rare use case for spilled arrays, and does not arise when using spilled arrays
to implement gridlets for live copy-paste-modify.

5 Grid Calculus: Spill Calculus with Sheets as Values
In this section we present the grid calculus: a higher-order spreadsheet calculus
with sheets as values. The grid calculus extends the spill calculus of Section 4.

5.1 Extending Spreadsheets with Gridlets

The gridlet concept [12] has been proposed but not implemented. Our observation is that spilling a range reference acts much like copy-paste, but lacks local modification. We propose to implement gridlets using spilled arrays, by extending the spill calculus with primitives that implement first-class grid modification.
[Figure: source range A1:C4 (left) and gridlet invocation in A6 (right).
  A1: “Edge”   B1: “Len.”
  A2: “a”      B2: 3           C2: B2^2
  A3: “b”      B3: 4           C3: B3^2
  A4: “c”      B4: SQRT(C4)    C4: C2 + C3
  A6: G(A1:C4, B2, 7, B3, 24)]

Revisiting the example from the introduction, there are four key interactions in the invocation of a gridlet:
– First, select the content in the grid that is to be modified.
– Second, apply the selected modifications or updates.
– Third, calculate the grid using the modified content.
– Fourth and finally, project the calculated content into the grid.
Spreadsheets with spilled arrays support the final step but lack the capabilities to support the first three. We add these capabilities using four new constructs:
– First-class sheet values S.
– An operator GRID that evaluates to the current sheet.
– An operator UPDATE that binds a formula in a sheet-value.
– An operator VIEW that evaluates a given range in a sheet-value to an array.
Using these constructs we can implement gridlets, for example:

G(A1:C4, B2, 7, B3, 24) ≝ VIEW(UPDATE(UPDATE(GRID, B2, 7), B3, 24), A1:C4)

Formatting is a core feature of gridlets, but we omit formatting from the grid calculus for clarity, on the basis that it would be a straightforward addition. We now describe the details of the grid calculus.
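To illustrate the shape of this desugaring, here is a small Python sketch, assuming formulas are modelled as nested tuples tagged by constructor name; this representation is an assumption of the sketch, not a data format from the paper:

    def desugar_gridlet(view_range, *bindings):
        # G(r, a1, V1, ..., an, Vn) becomes
        # VIEW(UPDATE(... UPDATE(GRID, a1, V1) ..., an, Vn), r).
        sheet = ("GRID",)
        for addr, val in zip(bindings[0::2], bindings[1::2]):
            sheet = ("UPDATE", sheet, addr, val)
        return ("VIEW", sheet, view_range)

    # Example: the invocation from the figure above.
    # desugar_gridlet("A1:C4", "B2", 7, "B3", 24) evaluates to
    # ("VIEW", ("UPDATE", ("UPDATE", ("GRID",), "B2", 7), "B3", 24), "A1:C4")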

5.2 Syntax and Operational Semantics
Figure 7 presents the syntax and operational semantics for the grid calculus. The
grid calculus does not require modification of existing rules; we only add formula
evaluation rules for the new constructs, and evaluation relations for views.

Syntax Let x range over formula identifiers. Let F range over formulas, which may additionally be: identifiers x; LET(x, F1, F2), which binds the result of evaluating F1 to x in F2; GRID, which captures the current sheet; UPDATE(F1, a, F2), which updates a formula binding in a sheet-value; and VIEW(F, r), which extracts a dereferenced range from a sheet-value. Let V range over values, which may additionally be a sheet-value S. Let 𝒱 range over views; a view is a sheet with a range, denoted (S, r). A view range r delimits the addresses to be computed in sheet S.
Identifier x ∈ Ident
Formula F ::= · · · | x | LET(x, F1, F2) | GRID | UPDATE(F1, a, F2) | VIEW(F, r)
Value V ::= · · · | S
View 𝒱 ::= (S, r)

Formula evaluation: S, ω ⊢ F ⇓ V, D

  S, ω ⊢ F1 ⇓ V1, D1    S, ω ⊢ F2[x := V1] ⇓ V2, D2
  ──────────────────────────────────────────────────
  S, ω ⊢ LET(x, F1, F2) ⇓ V2, D1 ∪ D2

  ────────────────────
  S, ω ⊢ GRID ⇓ S, ∅

  S, ω ⊢ F1 ⇓ S1, D
  ──────────────────────────────────────────
  S, ω ⊢ UPDATE(F1, a, F2) ⇓ S1[a → F2], D

  S, ω ⊢ F ⇓ S1, D    (S1, r) ⇓ V
  ─────────────────────────────────
  S, ω ⊢ VIEW(F, r) ⇓ V, D

View evaluation: 𝒱, ω ⇓ γ

  (S, r), ω ⇓ γ ≝ ∀a ∈ dom(S) ∩ area(r). S, ω ⊢ a ! γ(a)

Spill iteration: ω −→𝒱 ω′    Final oracle: 𝒱 ⊢ ω final

  (S, r), ω ⇓ γ    γ ⊭ ω    refine(S, ω, γ) = ω′
  ────────────────────────────────────────────────
  ω −→(S,r) ω′

  𝒱, ω ⇓ γ    γ |= ω
  ────────────────────
  𝒱 ⊢ ω final

Final view evaluation: 𝒱 ⇓ V

  (S, r) ⇓ V ≝ [] −→∗(S,r) ω and (S, r) ⊢ ω final and S, ω ⊢ r ⇓ V, D

Fig. 7. Syntax and Operational Semantics for Grid Calculus (extends Figures 3–6)

Formula evaluation: S, ω ⊢ F ⇓ V, D  A formula LET(x, F1, F2) evaluates in the standard way. A formula GRID evaluates to a sheet-value that captures the current sheet. A formula UPDATE(F1, a, F2) updates a formula binding in a sheet-value. If evaluating formula F1 produces sheet-value S1 then UPDATE(F1, a, F2) evaluates to the sheet-value where a is bound to F2 in S1, denoted S1[a → F2]. A formula VIEW(F, r) evaluates a sheet-value and extracts a range. If evaluating formula F produces sheet-value S1 then VIEW(F, r) evaluates to the value obtained by evaluating view (S1, r). View evaluation is defined in Figure 7 and we describe the semantics at the end of the section. Here we address a subtle property of VIEW: evaluating a view (S, r) adds no dependencies to the containing formula. Dependency tracking in our semantics is used to prevent spill cycles and captures dependence between the values of addresses: the value of a spill root should not depend on the value of an address in the spill area. In contrast, sheet-values depend on the formula of an address in the containing sheet, but not the value of an address in the containing sheet. For example:

Let S ≝ [A1 → VIEW(UPDATE(GRID, A1, 10), A2), A2 → A1]

Sheet S evaluates to grid [A1 → 10, A2 → 10]. What are the dependencies of
each address? The value of A2 in the grid depends on the value of A1 in the grid.
In contrast, the value of A1 in the grid does not depend on the value of A2 in the
grid. This is because evaluating the formula in A1 constructs a private grid from
which the value of A2 is obtained. However, A1 does depend on the formula
of A2 in the containing grid. Our semantics considers only value dependence; therefore the dependency set of A1 is ∅: the address has no dependence on values in the containing grid.
Formula dependence is vital for efficient recalculation, though we do not
model that in our semantics and only use dependency tracking to prevent spill
cycles. If an address depends on the value of another address bound in a sheet,
then it also depends on the formula of that address. The converse is not true in
the presence of sheet-values.
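The distinction for the example sheet S can be tabulated directly. The dictionaries below are a sketch, assuming addresses are modelled as strings:

    # Value dependence, as tracked by our semantics to prevent spill cycles:
    value_deps = {"A1": set(), "A2": {"A1"}}
    # Formula dependence, as a recalculation engine would need it: A1 reads
    # the formula of A2 (via GRID), and A2 reads the value of A1.
    formula_deps = {"A1": {"A2"}, "A2": {"A1"}}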

View evaluation: 𝒱, ω ⇓ γ  Evaluation of view (S, r) with oracle ω is defined in a similar manner to evaluation of sheets; however, the induced grid γ is limited to the sheet bindings that intersect the range r. There are two key consequences of limiting the induced grid. First, we only evaluate the bindings in S required to evaluate the bindings in r. Second, only roots that are within range r are permitted to spill; any root that is outside r remains as an address containing a collapsed array. There is a difference between an address that holds a collapsed array and a root that is prevented from spilling an array by permit ×. The former has a pre-spill and post-spill value that is an array; the latter has a pre-spill value that is an array and a post-spill value that is an error.
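A Python sketch of the induced-grid restriction, assuming area(r) enumerates the addresses of range r and eval_addr computes the post-spill value of one address; both are hypothetical parameters standing in for the judgements of Figure 7:

    def eval_view(sheet, oracle, rng, area, eval_addr):
        # (S, r), omega evaluates only the bindings of S inside area(r).
        addrs = set(area(rng))  # dom(S) ∩ area(r)
        return {a: eval_addr(sheet, oracle, a) for a in sheet if a in addrs}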

Spill iteration: ω −→𝒱 ω′  The definition of spill iteration for views is the same as spill iteration for sheets, except that we use view evaluation rather than sheet evaluation.

Final oracle: 𝒱 ⊢ ω final  The definition of a final oracle for views is the same as a final oracle for sheets, except that we use view evaluation rather than sheet evaluation.

Final view evaluation: 𝒱 ⇓ V  Evaluating a view (S, r) computes a final oracle for the view and then evaluates range r in the context of sheet S. Final view evaluation evaluates range r, rather than extracting values from an induced grid, because viewing a range should sample all values in the range, including blank cells. If we extracted values from the induced grid, we could obtain only the values for addresses with a binding in r.
5.3 Formulas for Gridlets
We can encode the G operator using primitives from the grid calculus:

[[G(r, a1, V1, . . . , an, Vn)]] = VIEW([[(a1, V1, . . . , an, Vn)]], r)
[[(a1, V1)]] = UPDATE(GRID, a1, V1)
[[(a1, V1, . . . , an+1, Vn+1)]] = UPDATE([[(a1, V1, . . . , an, Vn)]], an+1, Vn+1)

The G operator translates to the VIEW operator, and any bindings translate to a sequence of UPDATE operations. The initial sheet-value is obtained from the context using the GRID operator.
The translation illustrates that G is not higher-order, because every application returns the value obtained by evaluating a view on a sheet-value. A language that provides only G does not permit sheet-values to escape and be manipulated by formulas. This is acceptable when emulating copy-paste, because a copy is always taken with respect to the top-level sheet; however, it does limit the usefulness of G as an implementation construct. This limitation motivates the design of the grid calculus; as we show in the next section, the grid calculus is capable of encoding other language features.

6 Encoding Objects, Lambdas, and Functions

In this section we give three encodings that target the grid calculus: objects, lambdas, and sheet-defined functions.

6.1 Encoding the Abadi and Cardelli Object Calculus

We introduce the grid calculus to implement gridlets and the concept of live copy-paste. Perhaps surprisingly, the grid calculus can encode object-oriented programming, in particular the untyped object calculus of Abadi and Cardelli [1]. Their calculus is a tiny object-based programming language, akin to a prototype-based language such as Self [6], but capable of representing class-based object-oriented programming via encodings.
We draw a precise analogy between spreadsheets and objects. A sheet is like
an object. A cell is like a method name. A formula in a cell is like a method
implementation. The GRID operator is like the this keyword. Formula update is
like method update.
We assume an isomorphism between method names ℓ and cell addresses a, and use ℓ in both the object calculus and the grid calculus. We define the translation of object calculus terms to grid calculus formulas, denoted [[b]], as follows:
[[x]] = x
[[[ℓi = ς(xi)bi] i∈0..n]] = [ℓi → [[ς(xi)bi]] i∈0..n]
[[b.ℓ]] = VIEW([[b]], ℓ)
[[b1.ℓ ⇐ ς(x)b2]] = UPDATE([[b1]], ℓ, [[ς(x)b2]])
[[ς(x)b]] = LET(x, GRID, [[b]])
The translation makes our analogy concrete. We use the LET formula to lexically capture self identifiers. The grid calculus allows the construction of diverging formulas, as discussed in Section 4.5. We demonstrate this using a diverging object calculus term:

Ω = [[[A1 = ς(x)x.A1].A1]] = VIEW([A1 → LET(x, GRID, VIEW(x, A1))], A1)

The operational semantics are preserved by the translation. We assume a big-step relation for object calculus terms, denoted b ⇓ o. The proof is in Appendix C of the extended version [21].
Theorem 2. If b is closed and b ⇓ o then [], [] ⊢ [[b]] ⇓ [[o]], ∅.
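The translation is easy to mechanise. The Python sketch below assumes object calculus terms and grid calculus formulas are nested tuples tagged by constructor name, as in the earlier sketches; this representation is an assumption of the sketch:

    def method(m):
        # [[sigma(x) b]] = LET(x, GRID, [[b]])
        _, x, body = m
        return ("LET", x, ("GRID",), obj_to_grid(body))

    def obj_to_grid(b):
        tag = b[0]
        if tag == "var":                 # [[x]] = x
            return b
        if tag == "obj":                 # [[ [l_i = sigma(x_i) b_i] ]]
            return ("SHEET", {l: method(m) for l, m in b[1].items()})
        if tag == "invoke":              # [[b.l]] = VIEW([[b]], l)
            return ("VIEW", obj_to_grid(b[1]), b[2])
        if tag == "update":              # [[b1.l <= sigma(x) b2]]
            return ("UPDATE", obj_to_grid(b[1]), b[2], method(b[3]))
        raise ValueError(tag)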

6.2 Encoding the Lambda Calculus

We give an encoding of the lambda calculus that is inspired by the object calculus
embedding of the lambda calculus. We use ARG1 to hold the argument and
VAL1 to hold the result of a lambda. In spreadsheet languages both ARG1 and
VAL1 are legal cell addresses; for example, address ARG1 denotes the cell at
column 1151 and row 1.

[[x]] = x
[[λx.M]] = UPDATE(GRID, VAL1, LET(x, VIEW(GRID, ARG1), [[M]]))
[[M N]] = VIEW(UPDATE([[M]], ARG1, [[N]]), VAL1)
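A Python sketch of this translation, using the same tagged-tuple term representation assumed in the earlier sketches:

    def lam_to_grid(t):
        tag = t[0]
        if tag == "var":                 # [[x]] = x
            return t
        if tag == "lam":                 # [[lambda x. M]]
            _, x, body = t
            return ("UPDATE", ("GRID",), "VAL1",
                    ("LET", x, ("VIEW", ("GRID",), "ARG1"),
                     lam_to_grid(body)))
        if tag == "app":                 # [[M N]]
            _, m, n = t
            return ("VIEW",
                    ("UPDATE", lam_to_grid(m), "ARG1", lam_to_grid(n)),
                    "VAL1")
        raise ValueError(tag)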

6.3 Encoding Sheet-Defined Functions

A sheet-defined function [14, 17, 19, 20] is a mechanism for a user to author a function using a region of a spreadsheet. We can model a sheet-defined function f as a triple (S, (a0, . . . , an), r) consisting of the moat, or sheet-bindings, for the function; the addresses from the moat that denote arguments; and the range from the moat that denotes the result. The application f(V0, . . . , Vn) can be encoded in the grid calculus as follows, where f = (S, (a0, . . . , an), r):

[[f(V0, . . . , Vn)]] = VIEW([[(V0, . . . , Vn)]], r)
[[()]] = S
[[(V0, . . . , Vn+1)]] = UPDATE([[(V0, . . . , Vn)]], an+1, Vn+1)
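A Python sketch of this encoding, assuming an SDF is modelled as a triple of sheet bindings, argument addresses, and a result range, with formulas as tagged tuples as before:

    def apply_sdf(sdf, *args):
        # [[f(V0, ..., Vn)]] = VIEW(UPDATE(... UPDATE(S, a0, V0) ...), r)
        bindings, arg_addrs, result_range = sdf
        sheet = ("SHEET", bindings)      # the moat S as a sheet-value
        for addr, val in zip(arg_addrs, args):
            sheet = ("UPDATE", sheet, addr, val)
        return ("VIEW", sheet, result_range)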

7 Related Work

Formal Semantics of Spreadsheets. Our core calculus is similar to previous formalisms for spreadsheets. Several previous works [3, 7, 14, 19] offer formal semantics for spreadsheet fragments. Mokhov et al. [16] capture the logic of recalculating dependent cells. Finally, Bock et al. [4] provide a cost semantics for evaluation of spreadsheet formulas.
Spilling. Major spreadsheet implementations like Sheets (https://support.google.com/docs/answer/6208276?hl=en) and Excel (https://aka.ms/excel-dynamic-arrays) implement spilled arrays [11], but do not document details of the implementation. The authors of [17] propose a spilling-like mechanism that allows matrix values in cells to spread across a predefined range; this is closely related to “Ctrl+Shift+Enter” formulas in Excel (https://aka.ms/excel-cse-formulas). The proposal in [17] is significantly simpler than spilled arrays because the dimension of the spilled area is fixed and declared ahead of time. Sarkar et al. [18] note that spilled arrays violate Kay’s value principle [13] because a user is unable to edit constituent cells, except for the spill root.

Extending the Spreadsheet Paradigm. Clack and Braine [8] propose a spreadsheet
based on a combination of functional and object-oriented programming. Their
integration is different from our analogy: in their system, a class is a collection
of parameterised worksheets, and a parameterised worksheet corresponds to a
method. In gridlets, the grid corresponds to an object and cells on the grid
correspond to methods of the object.

Similarity Inheritance in Forms/3. Forms/3 [5] is a visual programming language that borrows the key concept of cell from spreadsheets. Instead of a tabular sheet, cells in Forms/3 are arranged on a form: a canvas with no structure. Forms/3 explored an abstraction model called “similarity inheritance” [9] through which a form may borrow cells from another form and optionally modify attributes of certain cells. This resembles substitution in gridlets; however, reusing a portion of the tabular grid and spilling into adjacent cells are primary to gridlets, whereas such notions are absent from Forms/3.

Sheet-defined Functions. Sheet-defined functions [17] (SDFs) allow the user to reuse logic defined using formulas in the grid. The user nominates input cells and an output cell, and gives the function a name. When the function is called, a virtual copy of the workbook is instantiated. Arguments to the function are placed in the input cells, the virtual workbook is calculated, and the result from the output cell is returned.
Elastic SDFs [14] generalize SDFs to handle input arrays of arbitrary size. In [4], the authors provide a precise semantics for SDFs, closures, and array formulas, but not for spilling. Gridlets are more general than SDFs: each gridlet invocation can have a unique set of local substitutions, whereas all calls to an SDF share the same arguments, giving greater flexibility to the user.

Error Prevention and Error Detection. Abraham and Erwig propose type systems for error detection [3] and automatic model inference [2]. Abraham and Erwig [3] provide an operational semantics for sheets that is similar to the core calculus in Section 3, but they do not give a semantics for spilled arrays.
Gencel [10] is a typed “template language” that describes the layout of a desired worksheet along with a set of customized update operations that are specific to the particular template. The type system guarantees that the restricted set of update operations keeps the desired worksheet free from omission, reference, and type errors.
Cheng and Rival [7] use abstract interpretation to detect formula errors due to type mismatches. Their technique also incorporates analysis of associated programs, such as VBA scripts, along with formulas on the grid.

8 Conclusion
Repetition is common in programming, and spreadsheets are no different. The distinguishing property of spreadsheets is that reuse includes formatting and layout, and is not limited to formula logic. Gridlets [12] are a high-level reuse abstraction for spreadsheets. In this work we give the first semantics of gridlets as a formula. Our approach comes in two stages.
First, we make sense of spilled arrays, a feature that is available in major spreadsheet implementations but not previously formalised. The concept is simple, which belies the many subtleties involved in implementing spilled arrays. We present the spill calculus as a concise description of spilling in spreadsheets.
Second, we extend the spill calculus with the tools to implement gridlets. The grid calculus introduces the concept of first-class sheet values, and describes the semantics of three higher-order operators that emulate copy-paste-modify. The composition of these operators gives the semantics for the gridlet operator G.
Spreadsheet programming bears a resemblance to object-oriented programming, as is often alluded to in the literature. We show that the resemblance runs deep by giving an encoding of the object calculus into the grid calculus, with a direct parallel between objects and sheets.

Acknowledgements

Thank you to the Microsoft Excel team for hosting the second author during his
research internship at Microsoft’s Redmond campus. Thank you to Tony Hoare,
Simon Peyton Jones, Ben Zorn, and members of the Microsoft Excel team for
their feedback and assistance with this work.

Cambridge, UK, and Davis, California, USA


Spreadsheet Day, October 17, 2019
References
1. Abadi, M., Cardelli, L.: A Theory of Objects. Monographs in Computer Science, Springer (1996)
2. Abraham, R., Erwig, M.: Inferring templates from spreadsheets. In: Proceedings of the 28th International Conference on Software Engineering. pp. 182–191. ICSE ’06, ACM, New York, NY, USA (2006)
3. Abraham, R., Erwig, M.: Type inference for spreadsheets. In: Proceedings of the 8th ACM SIGPLAN International Conference on Principles and Practice of Declarative Programming. pp. 73–84. PPDP ’06, ACM, New York, NY, USA (2006)
4. Bock, A.A., Bøgholm, T., Sestoft, P., Thomsen, B., Thomsen, L.L.: Concrete and abstract cost semantics for spreadsheets. Tech. Rep. TR–2008–203, IT University of Copenhagen (2018)
5. Burnett, M., Atwood, J., Djang, R.W., Reichwein, J., Gottfried, H., Yang, S.: Forms/3: A first-order visual language to explore the boundaries of the spreadsheet paradigm. Journal of Functional Programming 11(2), 155–206 (2001)
6. Chambers, C., Ungar, D.M.: Customization: Optimizing compiler technology for Self, a dynamically-typed object-oriented programming language. In: PLDI. pp. 146–160. ACM (1989)
7. Cheng, T., Rival, X.: Static analysis of spreadsheet applications for type-unsafe operations detection. In: Vitek, J. (ed.) Programming Languages and Systems. pp. 26–52. Springer Berlin Heidelberg, Berlin, Heidelberg (2015)
8. Clack, C., Braine, L.: Object-oriented functional spreadsheets. In: 10th Glasgow Workshop on Functional Programming. pp. 1–12 (1997)
9. Djang, R.W., Burnett, M.M.: Similarity inheritance: a new model of inheritance for spreadsheet VPLs. In: Proceedings. 1998 IEEE Symposium on Visual Languages (Cat. No. 98TB100254). pp. 134–141. IEEE (1998)
10. Erwig, M., Abraham, R., Cooperstein, I., Kollmansberger, S.: Automatic generation and maintenance of correct spreadsheets. In: Proceedings of the 27th International Conference on Software Engineering. pp. 136–145. ACM (2005)
11. Jelen, B.: Excel Dynamic Arrays Straight to the Point. Holy Macro! Books (2018), see also https://blog-insider.office.com/2019/06/13/dynamic-arrays-and-new-functions-in-excel/
12. Joharizadeh, N., Sarkar, A., Gordon, A.D., Williams, J.: Gridlets: Reusing spreadsheet grids. In: Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems. CHI EA ’20, ACM, New York, NY, USA (2020). https://doi.org/10.1145/3334480.3382806
13. Kay, A.: Computer software. Scientific American 251(3), 52–59 (1984), http://www.jstor.org/stable/24920344
14. McCutchen, M., Borghouts, J., Gordon, A.D., Peyton Jones, S., Sarkar, A.: Elastic sheet-defined functions: Generalising spreadsheet functions to variable-size input arrays (2019), unpublished manuscript available at https://aka.ms/calcintel
15. McCutchen, M., Itzhaky, S., Jackson, D.: Object spreadsheets: A new computational model for end-user development of data-centric web applications. In: Proceedings of the 2016 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software. pp. 112–127. Onward! 2016, ACM, New York, NY, USA (2016)
16. Mokhov, A., Mitchell, N., Peyton Jones, S.: Build systems à la carte. PACMPL 2(ICFP), 79:1–79:29 (2018)
17. Peyton Jones, S.L., Blackwell, A.F., Burnett, M.M.: A user-centred approach to functions in Excel. In: ICFP. pp. 165–176. ACM (2003)
18. Sarkar, A., Gordon, A.D., Peyton Jones, S., Toronto, N.: Calculation view: multiple-representation editing in spreadsheets. In: 2018 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC). pp. 85–93 (Oct 2018). https://doi.org/10.1109/VLHCC.2018.8506584
19. Sestoft, P.: Implementing function spreadsheets. In: Proceedings of the 4th International Workshop on End-User Software Engineering. pp. 91–94. ACM (2008)
20. Sestoft, P., Sørensen, J.Z.: Sheet-defined functions: Implementation and initial evaluation. In: Dittrich, Y., Burnett, M., Mørch, A., Redmiles, D. (eds.) End-User Development. pp. 88–103. Springer Berlin Heidelberg, Berlin, Heidelberg (2013)
21. Williams, J., Joharizadeh, N., Gordon, A.D., Sarkar, A.: Higher-order spreadsheets with spilled arrays (with appendices). Tech. rep., Microsoft Research (2020), https://aka.ms/calcintel
