License: CC BY 4.0
arXiv:2404.04621v1 [cs.PL] 06 Apr 2024

IsoPredict: Dynamic Predictive Analysis for Detecting Unserializable Behaviors in Weakly Isolated Data Store Applications
This extended version of a PLDI 2024 paper adds an appendix with additional material.

Chujun Geng (ORCID 0009-0000-6149-0208), Spyros Blanas (ORCID 0009-0004-2703-7177), Michael D. Bond (ORCID 0000-0002-8971-4944), and Yang Wang (ORCID 0000-0002-9721-4923)
Ohio State University, Columbus, USA
(2024; 2023-11-03; 2024-03-31)

Abstract.

Distributed data stores typically provide weak isolation levels, which are efficient but can lead to unserializable behaviors, which are hard for programmers to understand and often result in errors. This paper presents the first dynamic predictive analysis for data store applications under weak isolation levels, called IsoPredict. Given an observed serializable execution of a data store application, IsoPredict generates and solves SMT constraints to find an unserializable execution that is a feasible execution of the application. IsoPredict introduces novel techniques that handle divergent application behavior; solve mutually recursive sets of constraints; and balance coverage, precision, and performance. An evaluation on four transactional data store benchmarks shows that IsoPredict often predicts unserializable behaviors, 99% of which are feasible.

Keywords: weak isolation levels, dynamic predictive analysis, data stores, transactions
CCS Concepts: Software and its engineering → Software testing and debugging
Copyright: rights retained. DOI: 10.1145/3656391. Journal: PACMPL, volume 8, number PLDI, article 161, June 2024. Submission ID: pldi24main-p59-p.

1. Introduction

Distributed data stores are the foundation of today’s service infrastructure, due to their scalability, fault tolerance, and ease of use (Corbett et al., 2012; Elhemali et al., 2022; Snowflake, 2023; MySQL, 2023b). Many real-world data stores only support weak isolation levels, such as causal consistency (causal) (Ahamad et al., 1995), which is the strongest level that achieves availability under network partitions (Burckhardt, 2014; Gilbert and Lynch, 2002). Another weak isolation level is read committed (rc) (Berenson et al., 1995), which is commonly used by database applications to balance performance and correctness (Crooks et al., 2017; Pavlo, 2017; Cheng et al., 2023; Tang et al., 2022). Under weak isolation, an execution may be unserializable, producing an outcome that is impossible for any serial execution. Unserializable behaviors are poorly understood by most programmers, and often lead to errors and failures in real-world systems (Cheng et al., 2023; Tang et al., 2022; Warszawski and Bailis, 2017).

Prior work has introduced techniques to find unserializable behaviors in data store applications under weak isolation, but has scalability or accuracy limitations. Static analysis can find unserializable behaviors, but its precision scales poorly with program complexity, leading to many false positives (infeasible unserializable behaviors) (Brutschy et al., 2018; Nagar and Jagannathan, 2018; Rahmani et al., 2019). Dynamic analysis can avoid false positives by analyzing only the observed execution (Biswas et al., 2021; Brutschy et al., 2017), or it can extrapolate from an observed execution but report numerous false positives (Gan et al., 2020; Warszawski and Bailis, 2017). §8 discusses prior work in more detail.

Motivating example

Algorithm 1 shows code of a transactional data store application. The $\mathit{DataStore}$ provides a key–value interface. Our execution model requires every $\mathit{get}$ (read) or $\mathit{put}$ (write) operation to execute in a transaction, so an operation starts a new transaction if the current session (i.e., client) is not in a transaction. A $\mathit{commit}$ operation ends the session’s ongoing transaction.

Figure 1 shows two different executions of the application. In each execution, two sessions (i.e., clients) call deposit concurrently on the same empty account to deposit 50 and 60, respectively. Developers would expect that the ending balance will be 110, which is the only serializable outcome. However, under weak isolation levels causal and rc, the ending balance may be 110, 50, or 60.

Algorithm 1. A procedure in a data store application that deposits money in an account.
procedure deposit($\mathit{account}$, $\mathit{amount}$)
    $\mathit{balance} \leftarrow \mathit{DataStore}.\mathit{get}(\mathit{account})$  ▷ Read balance; implicitly starts transaction if not in one
    $\mathit{DataStore}.\mathit{put}(\mathit{account}, \mathit{balance} + \mathit{amount})$  ▷ Update balance
    $\mathit{DataStore}.\mathit{commit}()$  ▷ Commits transaction
(a) The execution in which $t_2$ reads from $t_1$ is causal, rc, and serializable.
(b) The execution in which $t_1$ and $t_2$ both read from the initial state is causal and rc but not serializable.
Figure 1. Different executions of two sessions (clients) calling deposit concurrently on the same account.
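
The two outcomes can be reproduced with a minimal sketch, assuming a deliberately simplified in-memory model in which a get sees the most recent put in a given interleaving. The helper `run`, the key name `"acct"`, and the schedule representation are hypothetical illustrations, not part of IsoPredict or of any data store API.

```python
# Hypothetical toy model of Algorithm 1; this is not IsoPredict's code.
def run(schedule, amounts):
    """Replay the get/put steps of the deposit transactions in schedule order.

    schedule is a list of (session, op) pairs. A get records the value
    currently in the store; a put writes that recorded value plus the
    session's deposit amount.
    """
    store = {"acct": 0}   # initial state, conceptually written by t0
    local = {}            # balance each session obtained from its get
    for session, op in schedule:
        if op == "get":
            local[session] = store["acct"]
        elif op == "put":
            store["acct"] = local[session] + amounts[session]
    return store["acct"]

amounts = {"s1": 50, "s2": 60}

# t2 reads from t1: the serializable outcome.
serial = [("s1", "get"), ("s1", "put"), ("s2", "get"), ("s2", "put")]
# Both transactions read the initial state: one deposit is lost.
lost_update = [("s1", "get"), ("s2", "get"), ("s1", "put"), ("s2", "put")]

print(run(serial, amounts))       # 110
print(run(lost_update, amounts))  # 60
```

Swapping the order of the two puts in `lost_update` leaves 50 as the ending balance, which is the third outcome mentioned above.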

Contributions

This paper introduces IsoPredict, the first predictive analysis for transactional data store applications, and shows that the approach is effective at finding unserializable behaviors. Given a serializable execution such as Figure 1(a) as input, IsoPredict finds an unserializable execution such as Figure 1(b). IsoPredict uses dynamic predictive analysis, which analyzes an observed execution of a program and detects alternative feasible, unserializable executions of the program.

Predictive analysis is powerful because, in essence, it explores many executions at once. To predict an unserializable execution from an observed serializable execution, IsoPredict generates SMT constraints that encode execution feasibility, unserializability, and weak isolation level (causal or rc), and uses an off-the-shelf SMT solver to solve them. We introduce analysis variants that trade coverage for performance, and precision for coverage. To account for the possibility of predicting infeasible executions, IsoPredict can optionally validate a predicted unserializable execution. An evaluation on transactional data store benchmarks shows that IsoPredict is effective at predicting unserializable executions from observed executions under causal and rc. More than 99% of predictions are validated as feasible executions.

While prior work introduces predictive analysis for shared-memory programs (Said et al., 2011; Kini et al., 2017; Roemer et al., 2020; Huang et al., 2014; Sinha et al., 2012; Tunç et al., 2023), to our knowledge IsoPredict is the first predictive analysis approach for transactional data store applications, which present unique challenges (§8). Compared to prior work MonkeyDB (Biswas et al., 2021), IsoPredict is comparably effective at finding unserializable executions of the evaluated programs (§7.3). However, IsoPredict and MonkeyDB use completely different approaches to find erroneous executions. While MonkeyDB uses random exploration to produce an erroneous execution, IsoPredict uses predictive analysis to evaluate an equivalence class of many executions at once. Furthermore, MonkeyDB requires applications to run on its specialized data store, while IsoPredict’s predictive analysis approach is in principle suitable for analyzing executions from any data store, although demonstrating so is outside the scope of this paper.

2. Background

This section introduces this paper’s formalisms for weakly isolated executions of transactional data store applications, which are closely based on the axiomatic framework of Biswas and Enea (Biswas and Enea, 2019). We use this framework because it supports a variety of isolation levels, is well suited to encoding as constraints, and has been employed by recent work (Biswas et al., 2021; Bouajjani et al., 2023).

2.1. Weakly Isolated Execution Histories

A transactional data store is modeled as a distributed store of key–value pairs. A data store application performs read (get) and write (put) operations on keys, all executed in transactions. Non-transactional applications can be handled by treating each read and write operation as a separate transaction. An execution consists of events in committed transactions (aborted transactions are not part of an execution). Each event is read($k$), write($k$), or commit, where $k$ is a key. Other operations, such as insertion into and deletion from a set, can be modeled in terms of reads and writes. Multiple clients may open connections, or sessions, to the data store. If a session is not in a transaction, its next event implicitly starts a new transaction, ensuring every event is in a transaction. The commit event ends the current transaction. Within a session, transactions are ordered by the strict partial order session order ($\mathit{so}$):

$$\mathit{so}(t_1, t_2) \coloneq \text{$t_1$ precedes $t_2$ in the same session}$$

An important property of an execution is which write each read reads from. The strict partial order $\mathit{wr}_k$ (write–read on key $k$) orders transactions if one reads from the other:

$$\mathit{wr}_k(t_1, t_2) \coloneq \text{$t_2$ reads the write of $t_1$ on $k$}$$

If a read reads from a write in the same transaction, the read is not included as an event in the transaction (and thus this write–read ordering is not included in $\mathit{wr}_k$). If a transaction writes $k$ multiple times, only the last write is included as an event in the transaction. Thus a read($k$) event always reads from a write($k$) in another transaction, which is that transaction’s last write to $k$. If a transaction $t$ reads $k$ from the data store’s initial state, then $\mathit{wr}_k(t_0, t)$, where $t_0$ is a special transaction representing the initial state. The union of $\mathit{wr}_k$ over all keys is $\mathit{wr}$, i.e., $\mathit{wr} \coloneq \bigcup_{k \text{ is a key}} \mathit{wr}_k$. The transitive closure of $\mathit{so}$ and $\mathit{wr}$ is happens-before order, i.e., $\mathit{hb} \coloneq (\mathit{so} \cup \mathit{wr})^{+}$. An execution history of a data store application is the set of all committed transactions ($T$), session order ($\mathit{so}$), and write–read order ($\mathit{wr}$), i.e., $\mathit{History} \coloneq \langle T, \mathit{so}, \mathit{wr} \rangle$.
Every history includes the special transaction $t_0$ mentioned above that represents the initial state. $t_0$ implicitly writes the initial value to every key, and $t_0$ is $\mathit{so}$-ordered before all other transactions.
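
To make these definitions concrete, the following minimal sketch represents a history as plain Python sets of transaction-name pairs and computes $\mathit{hb}$ as a transitive closure. The representation, the helper `closure`, and the key name `"acct"` are assumptions made for illustration; the paper does not prescribe a data structure.

```python
def closure(pairs):
    """Return the transitive closure of a set of (a, b) pairs."""
    out = set(pairs)
    while True:
        # Compose every pair (a, b) with every pair (b, d) to get (a, d).
        new = {(a, d) for (a, b) in out for (c, d) in out if b == c} - out
        if not new:
            return out
        out |= new

# History in which t1 reads "acct" from t0 and t2 reads "acct" from t1.
T = {"t0", "t1", "t2"}
so = {("t0", "t1"), ("t0", "t2")}             # t0 precedes every transaction
wr = {"acct": {("t0", "t1"), ("t1", "t2")}}   # wr_k, indexed by key k

wr_all = set().union(*wr.values())            # wr is the union of wr_k over all keys
hb = closure(so | wr_all)                     # hb = (so ∪ wr)+

print(sorted(hb))  # [('t0', 't1'), ('t0', 't2'), ('t1', 't2')]
```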

Example

Figures 2(a) and 3(a) each show an execution history as a graph. Transactions are boxes containing read and write events implicitly concluded by a commit event. $t_1$ and $t_2$ execute in different sessions, and $t_0$ is the initial state transaction. The $\mathit{wr}_k$ edges indicate each read’s writer.

2.2. Serializability

An execution history $\langle T, \mathit{so}, \mathit{wr} \rangle$ is serializable if and only if it could have been produced by a serial execution of the transactions in $T$. (In a serial execution, transactions execute one at a time, and every read to $k$ reads from the most recent write to $k$.) Equivalently, an execution is serializable if and only if there exists a commit order, $\mathit{co}$, with the following constraints: (1) $\mathit{co}$ must be consistent with happens-before ($\mathit{hb}$) order. (2) A transaction that writes to $k$ cannot be $\mathit{co}$-ordered between two transactions ordered by $\mathit{wr}_k$. The second constraint’s ordering is called arbitration order and represented by the strict partial order $\mathit{ww}$, which is defined as follows:

$$\mathit{ww}(t_1, t_2) \coloneq \exists k,\ \text{$t_1$ and $t_2$ write to $k$} \land \exists t_3 \in T,\ \mathit{wr}_k(t_2, t_3) \land \mathit{co}(t_1, t_3) \tag{1}$$

Note the circular dependency between $\mathit{ww}$ and $\mathit{co}$: Commit ordering may imply additional arbitration ordering, which in turn may imply additional commit ordering. This property leads to challenges in encoding SMT constraints that §4 explains and addresses. Thus a history is serializable if and only if there exists a $\mathit{co}$ that is consistent with $\mathit{hb}$ and $\mathit{ww}$:

$$\langle T, \mathit{so}, \mathit{wr} \rangle \text{ is serializable} \iff \exists \mathit{co},\ \mathit{hb} \cup \mathit{ww} \subseteq \mathit{co}$$

Equivalently, the history is serializable if and only if there exists $\mathit{co}$ such that $(\mathit{hb} \cup \mathit{ww} \cup \mathit{co})^{+}$ is acyclic. An execution is unserializable if and only if it is not serializable.
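
For small histories, this definition can be checked by brute force. The sketch below is only an illustration under stated assumptions: it takes $\mathit{co}$ to range over strict total orders on $T$, it requires $t_1 \neq t_2$ in Equation 1 (since $\mathit{ww}$ is strict), and it enumerates permutations rather than solving SMT constraints as IsoPredict does. The names `is_serializable` and `writes` are hypothetical.

```python
from itertools import permutations

def is_serializable(T, so, wr, writes):
    """Try every total order co on T; accept if hb ∪ ww ⊆ co.

    wr maps key -> set of (writer, reader) pairs (wr_k);
    writes maps key -> set of transactions that write that key.
    """
    # A total order is transitive, so so ∪ wr ⊆ co already implies hb ⊆ co.
    base = so | set().union(*wr.values())
    for order in permutations(sorted(T)):
        pos = {t: i for i, t in enumerate(order)}
        co = lambda a, b: pos[a] < pos[b]
        if not all(co(a, b) for a, b in base):
            continue
        # Equation 1: ww(t1, t2) if t1 and t2 write k, some t3 reads k
        # from t2, and co(t1, t3).
        ww = {(t1, t2)
              for k, edges in wr.items()
              for (t2, t3) in edges
              for t1 in writes[k]
              if t1 != t2 and co(t1, t3)}
        if all(co(a, b) for a, b in ww):
            return True
    return False

T = {"t0", "t1", "t2"}
so = {("t0", "t1"), ("t0", "t2")}
writes = {"acct": {"t0", "t1", "t2"}}

# t2 reads from t1: serializable.
print(is_serializable(T, so, {"acct": {("t0", "t1"), ("t1", "t2")}}, writes))  # True
# t1 and t2 both read from t0: unserializable.
print(is_serializable(T, so, {"acct": {("t0", "t1"), ("t0", "t2")}}, writes))  # False
```

Enumeration is factorial in the number of transactions, so this is useful only for checking the definitions on tiny examples.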

Example

Figure 2(a)’s history is serializable because there exists a commit order ($t_0 <_{\mathit{co}} t_1 <_{\mathit{co}} t_2$), shown in Figure 2(b), that is consistent with the serializable axioms. Note that the arbitration rule (Equation 1) never applies in Figure 2(a), and so Figure 2(b) shows no $\mathit{ww}$ edges.

The history in Figure 3(a) is unserializable because there does not exist a commit order that satisfies the serializable axioms. For example, as Figure 3(b) shows, if $\mathit{co}(t_1, t_2)$, then $\mathit{ww}(t_1, t_0)$ by Equation 1, which implies $\mathit{co}(t_1, t_0)$ and thus $\mathit{co}$ is cyclic. Alternatively, if $\mathit{co}(t_2, t_1)$, then $\mathit{ww}(t_2, t_0)$ and thus $\mathit{co}(t_2, t_0)$, and again $\mathit{co}$ is cyclic.

(a) Execution history
(b) A $\mathit{co}$ (dashed arrows) consistent with the serializable axioms.
Figure 2. A causal, serializable history corresponding to Figure 1(a).
(a) Execution history
(b) A $\mathit{co}$ (dashed arrows) inconsistent with the serializable axioms (contradiction shown in red).
Figure 3. A causal, unserializable history corresponding to Figure 1(b).

2.3. Causal Consistency

Causal consistency (causal) is a weak isolation level that preserves the order of operations that are causally related (Ahamad et al., 1995). causal is of theoretical and practical interest because it is the strongest isolation level achievable when a data store requires availability under network partitions (Burckhardt, 2014; Gilbert and Lynch, 2002; Mahajan et al., 2011).

Similar to serializable, causal is defined in terms of whether there exists a commit order that is consistent with happens-before ($\mathit{hb}$) and an arbitration order, which we call $\mathit{ww}_{\mathit{causal}}$ to distinguish it from the arbitration order for serializable ($\mathit{ww}$). Two transactions $t_1$ and $t_2$ are ordered by $\mathit{ww}_{\mathit{causal}}$ if they write the same key and if there is a third transaction $t_3$ that happens-after $t_1$ ($\mathit{hb}(t_1, t_3)$) and reads from $t_2$’s write to the same key ($\mathit{wr}(t_2, t_3)$). More formally,

$$\mathit{ww}_{\mathit{causal}}(t_1, t_2) \coloneq \exists k,\ \text{$t_1$ and $t_2$ write to $k$} \land \exists t_3 \in T,\ \mathit{wr}_k(t_2, t_3) \land \mathit{hb}(t_1, t_3) \tag{2}$$

A history is causal if and only if there exists a commit order consistent with $\mathit{hb}$ and $\mathit{ww}_{\mathit{causal}}$:

$$\langle T, \mathit{so}, \mathit{wr} \rangle \text{ is causal} \iff \exists \mathit{co},\ \mathit{hb} \cup \mathit{ww}_{\mathit{causal}} \subseteq \mathit{co} \tag{3}$$

Equivalently, a history is causal if and only if $(\mathit{hb} \cup \mathit{ww}_{\mathit{causal}})^{+}$ is acyclic (see footnote 1).

Footnote 1. Unlike serializable, causal can be defined in terms of whether $(\mathit{hb} \cup \mathit{ww}_{\mathit{causal}})^{+}$ is acyclic, which implies that a total commit order must exist. In contrast, serializable’s arbitration order ($\mathit{ww}$) is dependent on the commit order, so serializable must be defined in terms of whether $(\mathit{hb} \cup \mathit{ww} \cup \mathit{co})^{+}$ is acyclic.
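
Because $\mathit{ww}_{\mathit{causal}}$ does not depend on $\mathit{co}$, the acyclicity formulation can be checked directly. The following is a minimal sketch under the same assumed set-of-pairs representation as before; `is_causal`, `closure`, and `writes` are hypothetical names, and $t_1 \neq t_2$ is assumed in Equation 2.

```python
def closure(pairs):
    """Return the transitive closure of a set of (a, b) pairs."""
    out = set(pairs)
    while True:
        new = {(a, d) for (a, b) in out for (c, d) in out if b == c} - out
        if not new:
            return out
        out |= new

def is_causal(so, wr, writes):
    """Check that (hb ∪ ww_causal)+ is acyclic.

    wr maps key -> set of (writer, reader) pairs;
    writes maps key -> set of transactions that write that key.
    """
    hb = closure(so | set().union(*wr.values()))
    # Equation 2: ww_causal(t1, t2) if t1 and t2 write k, some t3 reads k
    # from t2, and hb(t1, t3).
    ww_causal = {(t1, t2)
                 for k, edges in wr.items()
                 for (t2, t3) in edges
                 for t1 in writes[k]
                 if t1 != t2 and (t1, t3) in hb}
    # A relation is acyclic iff its transitive closure has no pair (a, a).
    return all(a != b for a, b in closure(hb | ww_causal))

# Both deposits read the initial state: causal (although unserializable).
print(is_causal({("t0", "t1"), ("t0", "t2")},
                {"acct": {("t0", "t1"), ("t0", "t2")}},
                {"acct": {"t0", "t1", "t2"}}))          # True

# Hypothetical stale read: t1 writes x, then t2 in the same session
# reads x from the initial state t0. This violates causal.
print(is_causal({("t0", "t1"), ("t0", "t2"), ("t1", "t2")},
                {"x": {("t0", "t2")}},
                {"x": {"t0", "t1"}}))                   # False
```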

Example

The history in Figure 2(a) is causal because there exists a commit order $t_0 <_{\mathit{co}} t_1 <_{\mathit{co}} t_2$ that is consistent with the causal axioms. (Or, since the history is serializable, which is strictly stronger than causal, the history must be causal.) The history in Figure 3(a) is causal because there exists a commit order, $t_0 <_{\mathit{co}} t_1 <_{\mathit{co}} t_2$ (or $t_0 <_{\mathit{co}} t_2 <_{\mathit{co}} t_1$), that is consistent with the causal axioms.

2.4. Read Committed

Read committed (rc) is a popular weak isolation level because of the balance between performance and consistency it provides (Berenson et al., 1995). Whereas causal requires transactions ordered by happens-before ($\mathit{hb}$) to be viewed by other transactions in the same order, rc’s arbitration order, $\mathit{ww}_{\mathit{rc}}$, only applies to write transactions that are read by multiple read events from the same transaction. More formally, rc is defined based on whether there exists a commit order that is consistent with $\mathit{hb}$ and $\mathit{ww}_{\mathit{rc}}$, which is defined as follows:

$$\mathit{ww}_{\mathit{rc}}(t_1, t_2) \coloneq \exists k,\ \text{$t_1$ and $t_2$ write to $k$} \land \exists \alpha, \beta,\ \mathit{po}(\beta, \alpha) \land \overline{\mathit{wr}}_k(t_2, \alpha) \land \exists k',\ \overline{\mathit{wr}}_{k'}(t_1, \beta) \tag{4}$$

where $\mathit{po}$ is program order, a strict partial order that orders events within a transaction; and $\overline{\mathit{wr}}_k(t, e)$ is true if and only if $e$ is a read event that reads from a write in transaction $t$ (and thus $e \neq t$). Thus $\alpha$ and $\beta$ must be events in the same transaction such that $\alpha$ is a read($k$) event that reads from write($k$) in $t_2$, and $\beta$ is a read event that reads from any write in $t_1$. An execution history is rc if and only if there exists a commit order that is consistent with $\mathit{hb}$ and $\mathit{ww}_{\mathit{rc}}$:

$$\langle T, \mathit{so}, \mathit{wr} \rangle \text{ is rc} \iff \exists \mathit{co},\ \mathit{hb} \cup \mathit{ww}_{\mathit{rc}} \subseteq \mathit{co} \tag{5}$$
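
Equation 4 refers to individual read events, so a sketch of an rc check needs the reads of each transaction in program order. The following illustration assumes a hypothetical representation in which `reads` maps each transaction to its list of (key, writer) read events in program order, and it checks rc as acyclicity of $(\mathit{hb} \cup \mathit{ww}_{\mathit{rc}})^{+}$, by analogy with causal, since $\mathit{ww}_{\mathit{rc}}$ does not depend on $\mathit{co}$. The names `is_rc`, `reads`, and `writes` are not from the paper, and $t_1 \neq t_2$ is assumed.

```python
def closure(pairs):
    """Return the transitive closure of a set of (a, b) pairs."""
    out = set(pairs)
    while True:
        new = {(a, d) for (a, b) in out for (c, d) in out if b == c} - out
        if not new:
            return out
        out |= new

def is_rc(so, reads, writes):
    """reads: transaction -> list of (key, writer) read events in program order.
    writes: key -> set of transactions that write that key."""
    wr = {(w, t) for t, events in reads.items() for (_, w) in events}
    hb = closure(so | wr)
    ww_rc = set()
    for events in reads.values():
        for j, (k, t2) in enumerate(events):   # alpha: read(k) from t2
            for (_, t1) in events[:j]:         # beta: earlier read from t1
                if t1 != t2 and t1 in writes[k]:
                    ww_rc.add((t1, t2))        # Equation 4
    return all(a != b for a, b in closure(hb | ww_rc))

so = {("t0", "t1"), ("t0", "t2")}

# Both deposits read "acct" from the initial state: rc.
print(is_rc(so, {"t1": [("acct", "t0")], "t2": [("acct", "t0")]},
            {"acct": {"t0", "t1", "t2"}}))                      # True

# Hypothetical violation: t1 writes x and y; t2 first reads y from t1,
# then reads x from the initial state t0.
print(is_rc(so, {"t2": [("y", "t1"), ("x", "t0")]},
            {"x": {"t0", "t1"}, "y": {"t0", "t1"}}))            # False
```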

Example

The execution histories in Figures 2(a) and 3(a) are rc because there exist commit orders (in fact, the same commit orders used to establish causal) satisfying the above condition. Or, the histories are rc because they are causal, which is strictly stronger than rc.

3. IsoPredict Overview

IsoPredict consists of two main components, as shown in Figure 4: predictive analysis and validation.



Figure 4. IsoPredict’s components and workflow.

The predictive analysis component takes as input an observed execution history that is recorded at the client application’s backend data store, generates SMT constraints, and uses an SMT solver to find a predicted unserializable execution if one exists. §4 describes IsoPredict’s predictive analysis.

The validation component tries to execute the predicted execution history to determine if it is feasible, and it generates and solves constraints to determine if the resulting execution is unserializable. If so, IsoPredict outputs the validated history alongside a visualization of the validated unserializable execution. §5 describes IsoPredict’s validation component.

Validation is optional; developers may choose to skip it for two reasons. First, it may be overkill—in our experiments, over 99% of predicted unserializable executions are successfully validated. Second, validation may be impractical if the application cannot be replayed easily. Validation is, however, useful to our evaluation to measure how many predicted executions are feasible.

4. Predictive Analysis

IsoPredict’s predictive analysis component takes as input an observed execution history of a data store application. The observed history $\mathit{History} = \langle T, \mathit{so}, \mathit{wr}_{\mathit{obs}} \rangle$ consists of a set of transactions $T$, session order $\mathit{so}$ between transactions, and observed write–read ordering $\mathit{wr}_{\mathit{obs}}$. The goal of IsoPredict is to find a feasible, unserializable execution that is valid under a weak isolation model $M$ (i.e., causal or rc). To find such an execution, IsoPredict encodes and solves the following necessary and sufficient constraints for a predicted execution history, $\mathit{History}' = \langle T', \mathit{so}, \mathit{wr} \rangle$:

  1. $\mathit{History}'$ must be a feasible execution prefix [footnote 2] of the program that produced $\mathit{History}$ (§4.1).

  2. $\mathit{History}'$ must be unserializable (§4.2).

  3. $\mathit{History}'$ must be valid under $M$ (§4.3).

Footnote 2: We allow $T'$ to be a subset of $T$ to exclude transactions that may diverge from the observed execution (§4.5). An execution prefix is sufficient: If $\mathit{History}'$ exists, a full execution history exists that has $\mathit{History}'$ as a prefix and meets the criteria above.

As an example, Figure 1(a) shows a serializable execution history that contains two deposit transactions (Algorithm 1) running concurrently. IsoPredict generates and solves the constraints sketched above, in order to predict the causal and rc but unserializable execution from Figure 2(a).

4.1. Encoding of Feasible Execution

This section describes the constraints that IsoPredict generates to ensure that $\mathit{History}'=\langle T',\mathit{so},\mathit{wr}\rangle$ is a feasible execution of the application that produced $\mathit{History}=\langle T,\mathit{so},\mathit{wr_{obs}}\rangle$.

Session order

The predicted execution must preserve the observed execution’s session order ($\mathit{so}$). IsoPredict generates constraints over a Boolean SMT function $\phi_{\mathit{so}}(t_1,t_2)$ that takes two transactions as input; a transaction is an SMT data type representing the set of all executed transactions $T$. The analysis generates the following constraints to preserve the observed execution’s $\mathit{so}$:

$$\forall t_1,t_2\in T,\ t_1\neq t_2,\quad\begin{cases}\boxed{\phi_{\mathit{so}}(t_1,t_2)} & \text{if }\mathit{so}(t_1,t_2)\\ \boxed{\neg\phi_{\mathit{so}}(t_1,t_2)} & \text{otherwise}\end{cases}$$

For clarity, SMT constraints generated by IsoPredict are boxed throughout the paper. The way to understand the above is that, for every $t_1,t_2\in T$ such that $t_1\neq t_2$ [footnote 3], the analysis generates one constraint, either $\phi_{\mathit{so}}(t_1,t_2)$ or $\neg\phi_{\mathit{so}}(t_1,t_2)$, depending on whether the transactions are ordered by $\mathit{so}$.

Footnote 3: Although the partial and total orders throughout the paper are irreflexive, the analysis never needs to generate irreflexivity constraints (e.g., $\forall t,\ \boxed{\neg\phi_r(t,t)}$ for relation $r$) because it never generates any constraints that use $\phi_r(t,t)$.
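
The following is a minimal sketch of how the per-pair session-order literals could be enumerated. It is not IsoPredict’s implementation: the function name `session_order_literals` and the input format (a dictionary from session id to an ordered list of transaction ids) are assumptions made for illustration, and the sketch assumes that $\mathit{so}$ relates every earlier transaction of a session to every later one.

```python
from itertools import permutations

def session_order_literals(sessions):
    """Map each ordered pair of distinct transactions to the truth value
    that the generated constraint forces on phi_so(t1, t2).

    `sessions` (hypothetical input format) maps a session id to the list of
    its transactions in the order they ran.
    """
    so = set()
    for txns in sessions.values():
        for i, earlier in enumerate(txns):
            for later in txns[i + 1:]:
                so.add((earlier, later))  # so orders all pairs within a session
    all_txns = [t for txns in sessions.values() for t in txns]
    # One literal per ordered pair with t1 != t2; reflexive pairs are never generated.
    return {(t1, t2): (t1, t2) in so for t1, t2 in permutations(all_txns, 2)}

literals = session_order_literals({"s1": ["t1", "t2"], "s2": ["t3"]})
for (t1, t2), value in sorted(literals.items()):
    print(f"{'' if value else 'not '}phi_so({t1}, {t2})")
```

With three transactions the sketch emits six literals, of which only `phi_so(t1, t2)` is positive.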

Write–read order

Each read in the predicted execution can potentially read from any transaction that writes the same key [footnote 4]. To help reason about multiple reads in a transaction to the same key that have different writer transactions (and to help exclude potentially divergent events; §4.5), we introduce the notion of an event’s position: In each session, events are numbered with monotonically increasing integers. To ensure each read has exactly one writer transaction in the predicted execution, IsoPredict introduces an SMT function $\phi_{\mathit{choice}}(s,i)$ that takes as input a session and the position of a read event in the session, and returns the writer transaction that the read reads from. Like transactions, sessions are a finite SMT data type representing the set of all sessions. (Note that $\phi_{\mathit{choice}}(s,i)$ is left undefined if $i$ is not the position of a read event in $s$.) IsoPredict generates the following constraints to ensure that $\phi_{\mathit{choice}}(s,i)$ is equal to some transaction that writes the same key:

Footnote 4: Recall that a read to $k$ can only read from another transaction’s last write to $k$ (§2.1).

$$\forall k\text{ is a key},\ \forall t_2\text{ reads }k,\ \forall i\in\mathit{rdpos_k}(t_2),\quad\boxed{\bigvee_{t_1\neq t_2\text{ writes }k}\phi_{\mathit{choice}}(s_2,i)=t_1}$$

where $s_2$ is $t_2$’s session, and $\mathit{rdpos_k}(t)$ is the set of positions of reads to $k$ in transaction $t$.

IsoPredict encodes $\mathit{wr}_k$ by generating constraints on Boolean SMT functions $\phi_{\mathit{wr}_k}(t_1,t_2)$:

$$\forall k\text{ is a key},\ \forall t_1\text{ writes }k,\ \forall t_2\text{ reads }k,\ t_1\neq t_2,\quad\boxed{\phi_{\mathit{wr}_k}(t_1,t_2)=\bigvee_{i\in\mathit{rdpos_k}(t_2)}\phi_{\mathit{choice}}(s_2,i)=t_1}$$

where $s_2$ is $t_2$’s session.

To encode $\mathit{wr}(t_1,t_2)$, the analysis generates constraints on a Boolean SMT function $\phi_{\mathit{wr}}(t_1,t_2)$ that represents the union of all $\phi_{\mathit{wr}_k}(t_1,t_2)$:

$$\forall t_1,t_2\in T,\ t_1\neq t_2,\quad\boxed{\phi_{\mathit{wr}}(t_1,t_2)=\bigvee_{k\text{ is a key}}\phi_{\mathit{wr}_k}(t_1,t_2)}$$
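
To make the relationship between the choice function and the two write–read relations concrete, the sketch below evaluates them for one fixed assignment of writers to reads, as a solver model would provide. The function `derive_wr`, its input formats, and the initial-state transaction `t0` are hypothetical names introduced for this illustration; the example history is only loosely modeled on two concurrent deposits.

```python
def derive_wr(reads, writers, choice):
    """Evaluate phi_wr_k and phi_wr for one concrete assignment of phi_choice.

    reads:   list of (session, position, reader_txn, key), one per read event
    writers: dict key -> set of transactions that write the key
    choice:  dict (session, position) -> writer transaction picked for that read
    All three formats are assumptions made for this sketch.
    """
    wr_k = {key: set() for key in writers}
    for session, position, reader, key in reads:
        writer = choice[(session, position)]
        # Domain constraint: the writer is a different transaction that writes the key.
        if writer == reader or writer not in writers[key]:
            raise ValueError(f"invalid writer {writer} for read at {(session, position)}")
        wr_k[key].add((writer, reader))  # phi_wr_k(writer, reader) holds
    wr = set().union(*wr_k.values())     # phi_wr is the disjunction over all keys
    return wr_k, wr

# Two deposits on one account; t0 is a hypothetical transaction writing the initial balance.
reads = [("s1", 0, "t1", "acct"), ("s2", 0, "t2", "acct")]
writers = {"acct": {"t0", "t1", "t2"}}
observed = {("s1", 0): "t0", ("s2", 0): "t1"}   # t2 sees t1's deposit
predicted = {("s1", 0): "t0", ("s2", 0): "t0"}  # both deposits read the initial balance
print(derive_wr(reads, writers, predicted)[1])
```

In the actual analysis the solver, not the caller, picks the value of the choice function for each read; the sketch only shows how a given pick determines the write–read edges.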

4.2. Encoding Unserializability

This section describes how the analysis encodes constraints for the predicted execution to be unserializable. The constraints must ensure that all possible commit orders are cyclic. §4.2.1 presents an approach that encodes the needed constraints exactly, resulting in long solving times. §4.2.2 presents an alternative approach that encodes a sufficient condition for unserializability, which has lower solving time than the first approach, but still has high coverage in our experiments.

4.2.1. Constraints that encode an exact condition

To encode that no acyclic $\mathit{co}$ exists for the predicted execution history, IsoPredict generates the following constraint:

$$\boxed{\forall\phi_{\mathit{co}},\ \neg\mathit{IsSerializable}(\phi_{\mathit{co}})}$$

Note that in the constraint above, $\phi_{\mathit{co}}(t)$, which takes a transaction $t$ as input and evaluates to an integer indicating $t$’s position in the $\mathit{co}$ total order, is not an SMT function; it is a bound variable of the quantifier. Function $\mathit{IsSerializable}$ is defined as follows:

$$\begin{aligned}\mathit{IsSerializable}(\phi_{\mathit{co}})\coloneq\;&\mathit{Distinct}(\phi_{\mathit{co}}(t_1),\dots,\phi_{\mathit{co}}(t_n))\;\land\\&\bigwedge_{\forall t_1,t_2\in T,\ t_1\neq t_2}\bigl(\phi_{\mathit{wr}}(t_1,t_2)\lor\phi_{\mathit{so}}(t_1,t_2)\lor\mathit{Arbitration}(t_1,t_2)\bigr)\Rightarrow\phi_{\mathit{co}}(t_1)<\phi_{\mathit{co}}(t_2)\end{aligned}$$

where $t_1,\dots,t_n$ are all transactions in $T$, and $\mathit{Distinct}(v_1,\dots,v_k)$ is a built-in SMT function that requires all input values to be distinct from each other. By mapping $\phi_{\mathit{co}}(t)$ to a unique integer for each $t$, the first line of the equation above ensures that $\mathit{co}$ is a total order.

The second line of the equation ensures that $\mathit{co}$ is consistent with $\mathit{wr}$, $\mathit{so}$, and $\mathit{ww}$, respectively. For simplicity and to reduce the size of the constraints, arbitration constraints are factored out into the $\mathit{Arbitration}$ function, which is defined as follows:

$$\boxed{\mathit{Arbitration}(t_1,t_2)\coloneq\bigvee_{\substack{\forall k,\ t_1\text{ and }t_2\text{ write }k\\ \forall t_3\in T\setminus\{t_1,t_2\},\ t_3\text{ reads }k}}\phi_{\mathit{wr}_k}(t_2,t_3)\land\bigl(\phi_{\mathit{co}}(t_1)<\phi_{\mathit{co}}(t_3)\bigr)}$$

which is a straightforward encoding of the serializable arbitration constraints in Equation 1.
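
The quantified constraint can be read as "every candidate commit order fails the serializability check". The sketch below spells that reading out by brute force over all permutations, which is feasible only for tiny histories; IsoPredict instead hands the quantified formula to an SMT solver. The function names, the data layout, and the three-transaction lost-update history (with a hypothetical initial transaction `t0` and an empty session order) are assumptions made for illustration.

```python
from itertools import permutations

def arbitration(t1, t2, pos, wr_k, writes):
    """Arbitration(t1, t2): some t3 reads key k from t2, t1 also writes k,
    and t1 precedes t3 in the candidate commit order."""
    for key, edges in wr_k.items():
        if t1 in writes[key] and t2 in writes[key]:
            for writer, t3 in edges:
                if writer == t2 and t3 not in (t1, t2) and pos[t1] < pos[t3]:
                    return True
    return False

def is_serializable(pos, txns, so, wr_k, writes):
    """IsSerializable for one candidate commit order, given as txn -> position.
    Positions come from a permutation, so the Distinct conjunct holds by construction."""
    wr = set().union(*wr_k.values())
    for t1, t2 in permutations(txns, 2):
        must_precede = ((t1, t2) in wr or (t1, t2) in so
                        or arbitration(t1, t2, pos, wr_k, writes))
        if must_precede and not pos[t1] < pos[t2]:
            return False
    return True

def is_unserializable(txns, so, wr_k, writes):
    """True iff every total order fails IsSerializable (the quantified constraint)."""
    return not any(
        is_serializable({t: i for i, t in enumerate(order)}, txns, so, wr_k, writes)
        for order in permutations(txns)
    )

txns = ["t0", "t1", "t2"]
writes = {"acct": {"t0", "t1", "t2"}}
so = set()  # each transaction in its own session, for simplicity
observed = {"acct": {("t0", "t1"), ("t1", "t2")}}
predicted = {"acct": {("t0", "t1"), ("t0", "t2")}}
print(is_unserializable(txns, so, observed, writes))   # False
print(is_unserializable(txns, so, predicted, writes))  # True
```

The enumeration visits $n!$ orders for $n$ transactions, which gives a rough sense of why the exact encoding is expensive to solve.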

By using this approach we are pushing all the heavy lifting to the SMT solver. However, SMT solvers are known to be inefficient at solving constraints with universal quantifiers (Leino and Pit-Claudel, 2016)—an issue confirmed by our performance results (§7.2).

4.2.2. Constraints encoding a sufficient but unnecessary condition

Alternatively, the analysis can encode a sufficient, but unnecessary, condition for predicting an unserializable execution. We introduce a partial order, $\mathit{pco}$, that is a subset of every commit order for every valid predicted execution. If there exists a predicted execution for which $\mathit{pco}$ is cyclic, then there cannot exist an acyclic $\mathit{co}$ for the predicted execution, meaning it is unserializable. In theory, this approach has the potential for missing unserializable executions that §4.2.1’s approach finds. But in our experiments, the $\mathit{pco}$-based approach predicts all unserializable executions that §4.2.1’s approach finds (§7.2).

We define $\mathit{pco}$ to include all orders that must be in $\mathit{co}$: session ($\mathit{so}$), write–read ($\mathit{wr}$), and arbitration ($\mathit{ww}$) orders. We also introduce an anti-dependency order ($\mathit{rw}$) that must be in every $\mathit{co}$, which allows adding more edges to $\mathit{pco}$ and thus finding more unserializable executions. A challenge with encoding $\mathit{pco}$ is that the arbitration and anti-dependency orders are both defined in terms of commit order, creating a circular dependency that leads to erroneous self-justifying edges in $\mathit{pco}$. We break both circular dependencies by introducing the notion of rank in the generated constraints. Next we describe anti-dependency order ($\mathit{rw}$), the circular dependency problem and our rank-based solution to it, and finally the constraints that the analysis generates.

Adding anti-dependency order ($\mathit{rw}$) to $\mathit{pco}$

To make $\mathit{pco}$ as large as possible while still being consistent with every valid $\mathit{co}$, we add an anti-dependency ($\mathit{rw}$) order to $\mathit{pco}$. $\mathit{rw}$ must be part of any valid $\mathit{co}$, as we prove in Appendix A. Intuitively, for any write–read relation $\mathit{wr}_k(t_1,t_2)$, anti-dependency prevents future transactions that also write $k$ from being ordered between $t_1$ and $t_2$ in the commit order. More formally, we define $\mathit{rw}(t_1,t_2)$ as follows:

$$\mathit{rw}(t_1,t_2)\coloneq\exists k,\ t_2\text{ writes }k\land\exists t_w,\ \mathit{wr}_k(t_w,t_1)\land\mathit{pco}(t_w,t_2)$$

Figure 5 shows an example in which $\mathit{pco}$ is cyclic only if $\mathit{rw}$ is included.
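
As a hand-worked illustration, consider a hypothetical lost-update history (an assumption for this example, not necessarily the history drawn in the figure): $t_0$ writes $k$, and $t_1$ and $t_2$ each read $k$ from $t_0$ and then write $k$. Applying the definition of $\mathit{rw}$ with $t_w=t_0$ gives:

$$\begin{aligned}\mathit{wr}_k(t_0,t_1)\land\mathit{pco}(t_0,t_2)&\Rightarrow\mathit{rw}(t_1,t_2)\\ \mathit{wr}_k(t_0,t_2)\land\mathit{pco}(t_0,t_1)&\Rightarrow\mathit{rw}(t_2,t_1)\end{aligned}$$

Both premises hold because every $\mathit{wr}$ edge is also a $\mathit{pco}$ edge. The two conclusions order $t_1$ before $t_2$ and $t_2$ before $t_1$, so $\mathit{pco}$ is cyclic. Without $\mathit{rw}$, the only edges in this small example are $\mathit{wr}(t_0,t_1)$ and $\mathit{wr}(t_0,t_2)$, no $\mathit{ww}$ edge can be derived from them, and no cycle arises.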

Figure 5. Including anti-dependency ordering ($\mathit{rw}$; dashed arrows) in $\mathit{pco}$ makes $\mathit{pco}$ cyclic.
Figure 6. An example of circular dependency: $\mathit{ww}(t_1,t_2)$ depends on $\mathit{pco}(t_1,t_3)$, which in turn depends on $\mathit{ww}(t_1,t_2)$.

The partial order $\mathit{pco}$ can now be defined as the union of all orders that must be part of $\mathit{co}$:

$$\mathit{pco}=(\mathit{so}\cup\mathit{wr}\cup\mathit{ww}\cup\mathit{rw})^{+}$$

Adapting Equation 1 to use $\mathit{pco}$ instead of $\mathit{co}$, we define arbitration order, $\mathit{ww}$, as follows:

$$\mathit{ww}(t_1,t_2)\coloneq\exists k,\ t_1\text{ and }t_2\text{ write to }k\land\exists t_3\in T,\ \mathit{wr}_k(t_2,t_3)\land\mathit{pco}(t_1,t_3)$$

Circular dependency and rank

In the definitions above, note the circular dependencies between $\mathit{pco}$ and $\mathit{ww}$ and between $\mathit{pco}$ and $\mathit{rw}$, which seem to permit “self-justifying” edges. As an example, consider Figure 6. According to the definitions, $\mathit{pco}(t_1,t_3)\Rightarrow\mathit{ww}(t_1,t_2)$, and $\mathit{ww}(t_1,t_2)\Rightarrow\mathit{pco}(t_1,t_3)$, allowing us to wrongly conclude $\mathit{ww}(t_1,t_2)$ and $\mathit{pco}(t_1,t_3)$. To avoid such self-justifying edges, $\mathit{pco}$, $\mathit{ww}$, and $\mathit{rw}$ in fact must be defined as the minimal relations that satisfy the above definitions.
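
Outside of an SMT encoding, the minimal relations can be computed as a least fixed point: start from an empty relation and reapply the definitions until nothing changes. The sketch below does this for a fixed write–read relation and contrasts the result with a self-justifying assignment that also satisfies the definitions. The function names and the three-transaction history (in the spirit of the circular-dependency example, with $t_1$ and $t_2$ writing $k$ and $t_3$ reading $k$ from $t_2$) are assumptions made for illustration.

```python
def transitive_closure(edges):
    closure = set(edges)
    while True:
        extra = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not extra:
            return closure
        closure |= extra

def apply_definitions(pco, so, wr_k, writes):
    """Evaluate the definitions of ww, rw, and pco once, reading `pco` on the right-hand side."""
    ww, rw = set(), set()
    for key, edges in wr_k.items():
        for writer, reader in edges:
            for other in writes[key] - {writer, reader}:
                if (other, reader) in pco:   # ww(other, writer)
                    ww.add((other, writer))
                if (writer, other) in pco:   # rw(reader, other)
                    rw.add((reader, other))
    wr = set().union(*wr_k.values())
    return transitive_closure(so | wr | ww | rw)

def minimal_pco(so, wr_k, writes):
    """Least fixed point: iterate from the empty relation until stable."""
    pco = set()
    while True:
        larger = apply_definitions(pco, so, wr_k, writes)
        if larger == pco:
            return pco
        pco = larger

writes = {"k": {"t1", "t2"}}
wr_k = {"k": {("t2", "t3")}}
so = set()

minimal = minimal_pco(so, wr_k, writes)
self_justifying = {("t2", "t3"), ("t1", "t2"), ("t1", "t3")}

print(minimal)  # {('t2', 't3')}
# The larger relation also satisfies the definitions, even though nothing forces its extra edges.
print(apply_definitions(self_justifying, so, wr_k, writes) == self_justifying)  # True
```

Both relations are fixed points of the definitions, but only the first is minimal; the second contains the edges $(t_1,t_2)$ and $(t_1,t_3)$ that justify each other.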

How can we encode this “minimal relation” property in the SMT constraints? If IsoPredict simply encodes the above definitions as SMT constraints, the constraint solver will find self-justifying edges, resulting in spurious cycles and reporting executions that are not actually unserializable. For example, for Figure 6, the SMT solver would choose both $\mathit{ww}(t_1,t_2)$ and $\mathit{pco}(t_1,t_3)$ to be true, finding a cycle and wrongly reporting a predicted execution that is actually serializable.

We address this problem by introducing the notion of rank, which orders $\mathit{pco}$ edges that depend on each other. IsoPredict relies on an integer SMT function $\mathit{rank}(t_1,t_2)$ to enforce the following rule:

For any relations $r$ and $r'$, if $r(t_1,t_2)$ depends on $r'(t'_1,t'_2)$, then $\mathit{rank}(t_1,t_2)>\mathit{rank}(t'_1,t'_2)$.

Note that the rule does not require $t_1\neq t'_1$ or $t_2\neq t'_2$. For Figure 6, rank constraints disallow $\mathit{ww}(t_1,t_2)$ and $\mathit{pco}(t_1,t_3)$, which would require both $\mathit{rank}(t_1,t_2)>\mathit{rank}(t_1,t_3)$ and $\mathit{rank}(t_1,t_3)>\mathit{rank}(t_1,t_2)$.
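
One way to see why the rank rule excludes self-justifying edges: an integer assignment with a strictly larger rank for each dependent edge exists exactly when the "depends on" graph between edges has no cycle. The sketch below checks this with a topological sort from Python’s standard library. The function `assign_ranks` and the dependency maps are hypothetical and only illustrate the rule; in IsoPredict the solver chooses the rank values as part of satisfying the generated constraints.

```python
from graphlib import TopologicalSorter, CycleError

def assign_ranks(depends_on):
    """depends_on maps an edge (t1, t2) to the edges that its justification uses.

    Returns {edge: rank} with rank(e) > rank(d) for every dependency d of e,
    or None when the dependencies are circular and no such ranks exist.
    """
    try:
        # static_order lists every edge after all the edges it depends on.
        order = list(TopologicalSorter(depends_on).static_order())
    except CycleError:
        return None
    rank = {}
    for edge in order:
        rank[edge] = 1 + max((rank[d] for d in depends_on.get(edge, ())), default=0)
    return rank

# ww(t1, t2) uses pco(t1, t3), and pco(t1, t3) uses ww(t1, t2): self-justifying.
self_justifying = {("t1", "t2"): [("t1", "t3")], ("t1", "t3"): [("t1", "t2")]}
# Same ww edge, but pco(t1, t3) is justified independently (for example by so).
well_founded = {("t1", "t2"): [("t1", "t3")], ("t1", "t3"): []}

print(assign_ranks(self_justifying))  # None
print(assign_ranks(well_founded))     # {('t1', 't3'): 1, ('t1', 't2'): 2}
```

The sketch keys ranks by transaction pair only, matching the signature of the rank function described above.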

Generated constraints

IsoPredict generates arbitration and anti-dependency constraints on Boolean SMT functions $\phi_{\mathit{ww}}(t_1,t_2)$ and $\phi_{\mathit{rw}}(t_1,t_2)$:

$$
\begin{aligned}
&\forall t_1, t_2 \in T,\ t_1 \neq t_2, \\
&\boxed{\phi_{\mathit{ww}}(t_1,t_2) = \bigvee_{\substack{\forall k,\ t_1 \text{ and } t_2 \text{ write } k \\ \forall t_3 \in T \setminus \{t_1,t_2\},\ t_3 \text{ reads } k}} \phi_{\mathit{wr}_k}(t_2,t_3) \land \phi_{\mathit{pco}}(t_1,t_3) \land \mathit{rank}(t_1,t_2) > \mathit{rank}(t_1,t_3)} \\
&\boxed{\phi_{\mathit{rw}}(t_1,t_2) = \bigvee_{\substack{\forall k,\ t_1 \text{ reads } k \,\land\, t_2 \text{ writes } k \\ \forall t_3 \in T \setminus \{t_1,t_2\},\ t_3 \text{ writes } k}} \phi_{\mathit{wr}_k}(t_3,t_1) \land \phi_{\mathit{pco}}(t_3,t_2) \land \mathit{rank}(t_1,t_2) > \mathit{rank}(t_3,t_2)}
\end{aligned}
$$

The following constraints ensure that $\mathit{pco}$ is a partial order implied by $\mathit{so}$, $\mathit{wr}$, $\mathit{ww}$, and $\mathit{rw}$:

$$
\begin{aligned}
&\forall t_1, t_2 \in T,\ t_1 \neq t_2, \\
&\phi_{\mathit{pco}}(t_1,t_2) = \phi_{\mathit{so}}(t_1,t_2) \lor \phi_{\mathit{wr}}(t_1,t_2) \lor \phi_{\mathit{ww}}(t_1,t_2) \lor \phi_{\mathit{rw}}(t_1,t_2) \;\lor \\
&\qquad \bigvee_{t \in T \setminus \{t_1,t_2\}} \phi_{\mathit{pco}}(t_1,t) \land \phi_{\mathit{pco}}(t,t_2) \land \mathit{rank}(t_1,t_2) > \mathit{rank}(t_1,t) \land \mathit{rank}(t_1,t_2) > \mathit{rank}(t,t_2)
\end{aligned}
$$

To ensure that $\mathit{pco}$ is cyclic, the analysis generates the following constraint:

$$
\boxed{\bigvee_{\forall t_1, t_2 \in T,\ t_1 \neq t_2} \phi_{\mathit{pco}}(t_1,t_2) \land \phi_{\mathit{pco}}(t_2,t_1)}
$$
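The rules above can be sketched for a single, fixed history as an iterative saturation. This is not IsoPredict's SMT encoding: in the encoding the write-read relation is chosen by the solver, whereas here it is an input. The rank terms are omitted on the assumption that their role is to forbid circular justifications, which an iterative least-fixed-point computation avoids by construction. The function names and the history representation (`so` pairs, `(key, writer, reader)` triples, per-transaction read and write key sets) are illustrative assumptions.

```python
from itertools import product

def saturate_pco(txns, so, wr, reads, writes):
    """Least fixed point of the pco rules for one fixed history.

    txns   : list of transaction ids
    so     : set of (t1, t2) session-order pairs
    wr     : set of (key, writer, reader) triples
    reads  : dict txn -> set of keys read
    writes : dict txn -> set of keys written
    """
    pco = set(so) | {(w, r) for (_, w, r) in wr}
    changed = True
    while changed:
        changed = False
        for t1, t2 in product(txns, repeat=2):
            if t1 == t2 or (t1, t2) in pco:
                continue
            others = [t for t in txns if t not in (t1, t2)]
            # ww: t3 reads k from t2 while t1 is pco-ordered before t3
            ww = any((k, t2, t3) in wr and (t1, t3) in pco
                     for k in writes[t1] & writes[t2]
                     for t3 in others if k in reads[t3])
            # rw: t1 reads k from t3, and t3 is pco-ordered before writer t2
            rw = any((k, t3, t1) in wr and (t3, t2) in pco
                     for k in reads[t1] & writes[t2]
                     for t3 in others if k in writes[t3])
            # transitivity through an intermediate transaction
            trans = any((t1, t) in pco and (t, t2) in pco for t in others)
            if ww or rw or trans:
                pco.add((t1, t2))
                changed = True
    return pco

def has_pco_cycle(pco):
    """Mirror of the cyclicity constraint: some pair is ordered both ways."""
    return any((b, a) in pco for (a, b) in pco)

# Hypothetical three-transaction history: t0 initializes x and y,
# t1 reads x and writes y, t2 reads y and writes x.
txns = ["t0", "t1", "t2"]
reads = {"t0": set(), "t1": {"x"}, "t2": {"y"}}
writes = {"t0": {"x", "y"}, "t1": {"y"}, "t2": {"x"}}
so = {("t0", "t1"), ("t0", "t2")}
skew = {("x", "t0", "t1"), ("y", "t0", "t2")}    # both read the initial state
serial = {("x", "t0", "t1"), ("y", "t1", "t2")}  # t2 observes t1's write

print(has_pco_cycle(saturate_pco(txns, so, skew, reads, writes)))    # True
print(has_pco_cycle(saturate_pco(txns, so, serial, reads, writes)))  # False
```

In the write-skew variant, each of t1 and t2 reads a key from t0 that the other overwrites, so the rw rule orders them in both directions and the cycle check succeeds.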

If the solver finds a satisfying solution, a predicted unserializable execution exists. If the solver reports no satisfying solution, a predicted unserializable execution may or may not exist. In our experiments, a predicted unserializable execution never exists in this case.

We have not been able to come up with an execution for which our $\mathit{pco}$-based approach misses a predicted unserializable execution. We believe that such an execution should exist because otherwise it would imply a polynomial-time algorithm for deciding if an execution history is serializable, a problem that is NP-hard (Biswas and Enea, 2019).

4.3. Encoding Weak Isolation

This section describes the constraints that IsoPredict generates to ensure that the execution conforms to the target weak isolation model (causal or rc).

Regardless of the model, IsoPredict encodes $\mathit{hb}$ as the transitive closure of $\mathit{so}$ and $\mathit{wr}$ (§2.1), by generating constraints on a Boolean SMT function $\phi_{\mathit{hb}}(t_1,t_2)$:

$$
\forall t_1, t_2 \in T,\ t_1 \neq t_2, \quad \boxed{\phi_{\mathit{hb}}(t_1,t_2) = \phi_{\mathit{so}}(t_1,t_2) \lor \phi_{\mathit{wr}}(t_1,t_2) \lor \bigvee_{\forall t \in T \setminus \{t_1,t_2\}} \phi_{\mathit{hb}}(t_1,t) \land \phi_{\mathit{hb}}(t,t_2)}
$$
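For a concrete history, where $\mathit{so}$ and $\mathit{wr}$ are known rather than solver variables, the relation described by this constraint can be computed directly. The following is a minimal sketch using a Warshall-style closure; the function name and the pair-set representation are assumptions made for illustration.

```python
def happens_before(txns, so, wr):
    """Transitive closure of so ∪ wr over distinct transactions.

    txns : list of transaction ids
    so   : set of (t1, t2) session-order pairs
    wr   : set of (writer, reader) pairs
    """
    hb = set(so) | set(wr)
    for t in txns:              # t is the intermediate transaction
        for t1 in txns:
            for t2 in txns:
                if t1 != t2 and (t1, t) in hb and (t, t2) in hb:
                    hb.add((t1, t2))
    return hb

# Hypothetical chain: a precedes b in a session, and c reads from b.
hb = happens_before(["a", "b", "c"], so={("a", "b")}, wr={("b", "c")})
print(("a", "c") in hb)  # True
```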

4.3.1. Causal consistency (causal)

To ensure that the predicted execution is causal, IsoPredict generates constraints that ensure that the transitive closure of causal arbitration order ($\mathit{ww}_{\mathit{causal}}$) and happens-before ($\mathit{hb}$) is acyclic (§2.3). IsoPredict encodes the causal axiom (Equation 2) by generating constraints on a Boolean SMT function $\phi_{\mathit{ww}_{\mathit{causal}}}(t_1,t_2)$ representing $\mathit{ww}_{\mathit{causal}}$:

$$
\forall t_1, t_2 \in T,\ t_1 \neq t_2, \quad \boxed{\phi_{\mathit{ww}_{\mathit{causal}}}(t_1,t_2) = \bigvee_{\substack{\forall k,\ t_1 \text{ and } t_2 \text{ write } k \\ \forall t_3 \in T \setminus \{t_1,t_2\},\ t_3 \text{ reads } k}} \phi_{\mathit{wr}_k}(t_2,t_3) \land \phi_{\mathit{hb}}(t_1,t_3)}
$$

To ensure the execution is causal, there must exist a strict total order that is consistent with $(\mathit{hb} \cup \mathit{ww}_{\mathit{causal}})^{+}$ (Equation 3). IsoPredict generates the constraints on an integer SMT function $\phi_{\mathit{co}_{\mathit{causal}}}(t)$:

$$
\forall t, t_1, t_2 \in T,\ t_1 \neq t_2, \quad \boxed{\phi_{\mathit{hb}}(t_1,t_2) \lor \phi_{\mathit{ww}_{\mathit{causal}}}(t_1,t_2) \;\Rightarrow\; \phi_{\mathit{co}_{\mathit{causal}}}(t_1) < \phi_{\mathit{co}_{\mathit{causal}}}(t_2)}
$$
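The two causal constraints can be illustrated on a fixed history: derive $\mathit{ww}_{\mathit{causal}}$ from $\mathit{wr}_k$ and $\mathit{hb}$, then look for an integer position per transaction that respects every $\mathit{hb}$ and $\mathit{ww}_{\mathit{causal}}$ edge. The sketch below substitutes a topological sort (Python's standard `graphlib`) for the SMT solver; the helper names and the history representation are assumptions, not IsoPredict's implementation.

```python
from graphlib import TopologicalSorter, CycleError

def closure(txns, edges):
    """Transitive closure over distinct transactions."""
    rel = set(edges)
    for t in txns:
        for a in txns:
            for b in txns:
                if a != b and (a, t) in rel and (t, b) in rel:
                    rel.add((a, b))
    return rel

def ww_causal(txns, wr_k, hb, reads, writes):
    """t1 is arbitrated before t2 if some t3 reads k from t2 and hb(t1, t3)."""
    out = set()
    for t1 in txns:
        for t2 in txns:
            if t1 == t2:
                continue
            for k in writes[t1] & writes[t2]:
                for t3 in txns:
                    if t3 in (t1, t2) or k not in reads[t3]:
                        continue
                    if (k, t2, t3) in wr_k and (t1, t3) in hb:
                        out.add((t1, t2))
    return out

def causal_commit_order(txns, so, wr_k, reads, writes):
    """Return dict txn -> integer position, or None if no such order exists."""
    hb = closure(txns, set(so) | {(w, r) for (_, w, r) in wr_k})
    edges = hb | ww_causal(txns, wr_k, hb, reads, writes)
    graph = {t: set() for t in txns}
    for a, b in edges:
        graph[b].add(a)  # TopologicalSorter maps each node to its predecessors
    try:
        order = list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None
    return {t: i for i, t in enumerate(order)}

# Hypothetical history: t0 and t1 both write x, t3 reads x, and so orders t0, t1, t3.
txns = ["t0", "t1", "t3"]
reads = {"t0": set(), "t1": set(), "t3": {"x"}}
writes = {"t0": {"x"}, "t1": {"x"}, "t3": set()}
so = {("t0", "t1"), ("t1", "t3")}

fresh = causal_commit_order(txns, so, {("x", "t1", "t3")}, reads, writes)
stale = causal_commit_order(txns, so, {("x", "t0", "t3")}, reads, writes)
print(fresh)  # {'t0': 0, 't1': 1, 't3': 2}
print(stale)  # None
```

In the second call, t3 reads the overwritten value from t0 even though t1 happens-before t3, which yields $\mathit{ww}_{\mathit{causal}}(t_1,t_0)$ alongside $\mathit{hb}(t_0,t_1)$, so no consistent order exists.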

4.3.2. Read committed (rc)

Similar to causal, IsoPredict generates constraints so that the transitive closure of rc arbitration order ($\mathit{ww}_{\mathit{rc}}$) and happens-before ($\mathit{hb}$) is acyclic (§2.4). IsoPredict encodes the rc axiom (Equation 4) with the help of a Boolean SMT function $\phi_{\mathit{ww}_{\mathit{rc}}}(t_1,t_2)$ that represents $\mathit{ww}_{\mathit{rc}}$:

$$
\forall t_1, t_2 \in T,\ t_1 \neq t_2, \quad \boxed{\phi_{\mathit{ww}_{\mathit{rc}}}(t_1,t_2) = \bigvee_{\substack{\forall k,\ t_1 \text{ and } t_2 \text{ write } k \\ \forall t_3 \in T \setminus \{t_1,t_2\},\ t_3 \text{ reads } k \\ \forall i \in \mathit{rdpos}_{\ast}(t_3),\ \forall j \in \mathit{rdpos}_k(t_3),\ i < j}} \phi_{\mathit{choice}}(s_3,i) = t_1 \land \phi_{\mathit{choice}}(s_3,j) = t_2}
$$

where $\mathit{rdpos}_{\ast}(t)$ is the set of positions of read events in transaction $t$, $\mathit{rdpos}_k(t)$ is the set of positions of reads of $k$ in transaction $t$, and $s_3$ is $t_3$'s session. To ensure there exists a strict total order that is consistent with $(\mathit{hb} \cup \mathit{ww}_{\mathit{rc}})^{+}$ (Equation 5), IsoPredict generates constraints on an integer SMT function $\phi_{\mathit{co}_{\mathit{rc}}}(t)$:

$$
\forall t, t_1, t_2 \in T,\ t_1 \neq t_2, \quad \boxed{\phi_{\mathit{hb}}(t_1,t_2) \lor \phi_{\mathit{ww}_{\mathit{rc}}}(t_1,t_2) \;\Rightarrow\; \phi_{\mathit{co}_{\mathit{rc}}}(t_1) < \phi_{\mathit{co}_{\mathit{rc}}}(t_2)}
$$
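As a rough illustration of the $\mathit{ww}_{\mathit{rc}}$ rule on a fixed history, the sketch below derives an edge whenever a transaction first reads (any key) from $t_1$ and later reads $k$ from $t_2$, with both $t_1$ and $t_2$ writing $k$. The per-transaction list of `(position, key, writer)` triples stands in for the $\phi_{\mathit{choice}}$ function and is an assumed representation, as is the function name.

```python
def ww_rc(txn_reads, writes):
    """Derive rc arbitration edges from the reads of each transaction.

    txn_reads : dict txn -> list of (position, key, writer_txn) for its reads
    writes    : dict txn -> set of keys written
    """
    out = set()
    for t3, rds in txn_reads.items():
        for i, _, t1 in rds:          # an earlier read of any key, from t1
            for j, k, t2 in rds:      # a later read of key k, from t2
                if (i < j and t1 != t2 and t3 not in (t1, t2)
                        and k in writes.get(t1, set())
                        and k in writes.get(t2, set())):
                    out.add((t1, t2))
    return out

# Hypothetical history: t1 writes x and y, t2 writes x.
writes = {"t1": {"x", "y"}, "t2": {"x"}}
later_x = {"t3": [(0, "y", "t1"), (1, "x", "t2")]}    # reads t1, then x from t2
earlier_x = {"t3": [(0, "x", "t2"), (1, "y", "t1")]}  # reads x from t2 first

print(ww_rc(later_x, writes))    # {('t1', 't2')}
print(ww_rc(earlier_x, writes))  # set()
```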

4.4. Prediction Examples

This section shows causal, unserializable behaviors predicted by IsoPredict on programs evaluated in §7. The actual executions consist of dozens of transactions and thousands of events; the figures show only the transactions and events relevant to predicting unserializable behavior.

Figure 6(a) shows an observed execution of the Wikipedia benchmark, and Figure 6(b) shows the causal, unserializable execution predicted by IsoPredict. In contrast, Figure 6(c) shows a different observed execution of Wikipedia, from which no causal, unserializable execution can be predicted. Figure 6(d) serves to illustrate that changing $t_3$'s read of $x$ to read from $t_0$ would lead to a non-causal execution (and thus will not be reported by IsoPredict).

(a) An observed execution of Wikipedia for which a predicted causal, unserializable execution exists.
(b) A causal, unserializable prediction; the $\mathit{pco}$ cycle (including $\mathit{rw}$ edges) shows it is unserializable.
(c) An observed execution of Wikipedia, for which no predicted causal, unserializable execution exists.
(d) The non-causal execution that results if we try to change 6(c) so $t_3$ reads from $t_0$.
Figure 7. Comparison of (relevant subsets of) executions from Wikipedia. Blue edges highlight the differences between observed and predicted executions.

Figure 7(a) shows an observed execution of the Smallbank benchmark, and Figure 7(b) shows the IsoPredict-predicted execution. As Figure 7(b) shows, a causal, unserializable predicted execution exists in which both reads read from the initial state ($t_0$), as demonstrated by the $\mathit{pco}$ cycle $t_1 <_{\mathit{co}} t_3 <_{\mathit{co}} t_2 <_{\mathit{co}} t_4 <_{\mathit{co}} t_1$.

(a) An observed execution for which a causal, unserializable predicted execution exists.
(b) A causal, unserializable predicted execution as shown by the $\mathit{pco}$ cycles including $\mathit{rw}$ edges.
Figure 8. Observed and predicted executions of Smallbank. For simplicity, each history shows a subset of the executed transactions, and each transaction shows a subset of the executed events.

4.5. Handling Divergence in the Predicted Execution

Reading from a different write in the predicted execution than in the observed execution may lead to different application behaviors. Specifically, code in the data store application that is control dependent on a read from a different writer transaction may generate different events. For example, consider the observed execution shown in Figures 8(a) and 8(b), which executes transactions shown in Algorithms 1 and 2. Figure 8(c) shows an unserializable predicted history that IsoPredict would find using the constraints presented so far. However, the predicted execution is infeasible: $t_2$ aborts if it reads from $t_0$, making it impossible for $t_3$ to read from $t_2$, as Figure 8(d) shows. IsoPredict (mostly) avoids making spurious predictions by excluding (much of the) potentially divergent behavior.

(a) One session deposits into an account twice, while another session withdraws once.
(b) The execution history for 8(a), which is serializable. The write–read edge from $t_1$ to $t_2$ (shown in blue) is not present in the predicted execution in 8(c).
(c) A predicted execution history that is unserializable. The write–read edge from $t_0$ to $t_2$ (shown in blue) was not present in the observed execution 8(b).
(d) The validating execution based on the predicted execution in 8(c). It diverges because $t_2$ aborts, and the resulting execution is serializable.
(e) This execution history, consisting of the events from the predicted execution in 8(c) that are within the strict prediction boundary, is serializable.
(f) This execution history, consisting of the events from the predicted execution in 8(c) that are within the relaxed prediction boundary, is unserializable.
Figure 9. Motivation for a prediction boundary (8(a)–8(d)) and illustration of the two kinds of prediction boundaries (8(e)–8(f)). The target weak isolation model is causal. Dashed arrows represent $\mathit{pco}$ edges that are not part of the history.
Algorithm 2 A procedure in a data store application that withdraws money from an account.
procedure withdraw(account, amount)
    balance ← DataStore.get(account)  ▷ Read balance; implicitly starts transaction
    if balance < amount then
         DataStore.rollback()  ▷ Abort transaction
    else
         DataStore.put(account, balance − amount)  ▷ Update balance
         DataStore.commit()  ▷ Commit transaction
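To make the control dependence in Algorithm 2 concrete, here is a small runnable rendering that records the events the procedure generates. The `TracingStore` class is a hypothetical in-memory stand-in for the data store, introduced only for illustration; it applies writes immediately and does not model isolation.

```python
class TracingStore:
    """Hypothetical in-memory store that records the events it receives."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self.events = []

    def get(self, key):
        self.events.append(("read", key))
        return self.balances[key]

    def put(self, key, value):
        self.events.append(("write", key))
        self.balances[key] = value

    def commit(self):
        self.events.append(("commit",))

    def rollback(self):
        self.events.append(("abort",))


def withdraw(store, account, amount):
    balance = store.get(account)   # read balance; implicitly starts transaction
    if balance < amount:
        store.rollback()           # abort transaction
    else:
        store.put(account, balance - amount)
        store.commit()
    return store.events


# Reading the initial balance (0) versus a balance that reflects a deposit (100).
print(withdraw(TracingStore({"acct": 0}), "acct", 50))
# [('read', 'acct'), ('abort',)]
print(withdraw(TracingStore({"acct": 100}), "acct", 50))
# [('read', 'acct'), ('write', 'acct'), ('commit',)]
```

The same procedure produces a different event sequence depending on which write its read observes, which is the kind of divergence a prediction has to account for.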

Divergent behavior

To account for divergent behavior, we make a distinction between the predicted execution, which is generated by IsoPredict based on the observed execution, and what we call the validating execution, which is the execution that actually occurs if one tries to produce the predicted execution using the data store application. Divergent behaviors are behaviors that differ between the predicted and validating executions. We categorize divergent behaviors into two categories:

  • The validating execution reads or writes different keys or omits or adds events from the predicted execution, leading to a different execution history with different properties.

  • A transaction that commits in the predicted execution aborts in the validating execution (e.g., an application might have logic that aborts if a consistency check fails), as Figures 8(c) and 8(d) show.

The problem with divergent behavior is that an unserializable predicted execution can lead to a serializable validating execution. (The validating execution will always be a feasible execution conforming to the weak isolation model because validation ensures these properties; §5.)

Prediction boundary

IsoPredict accounts for divergence by generating prediction boundary constraints that exclude events that may be impacted by divergence: specifically, events that happen-after (i.e., inverse of $\mathit{hb}$) any read event that reads from different writers in the predicted and observed executions. IsoPredict supports a prediction boundary that is strict or relaxed, as shown in Table 1. The strict boundary excludes events that happen-after events that read from a different writer in the predicted execution than in the observed execution. The strict boundary prevents false predictions except when a transaction in the predicted execution aborts in the validating execution. Alternatively, the relaxed boundary excludes events that happen-after transactions that read from a different writer, risking more false predictions but increasing the chances of finding an unserializable predicted execution.

Table 1. Comparison of strict and relaxed prediction boundaries.

| Prediction boundary | Excluded events | Divergent behaviors that can cause false predictions |
|---|---|---|
| Strict | Events that happen-after any read event with a different writer | Abort-related only |
| Relaxed | Events that happen-after any transaction containing a read event with a different writer | Any |

Figures 8(e) and 8(f) show strict and relaxed boundaries, respectively, applied to the prediction in Figure 8(c). The strict boundary excludes all events that happen-after $t_2$'s read (since it has a different writer than in Figure 8(b)); the resulting execution history is serializable. The relaxed boundary excludes all transactions that happen-after $t_2$'s read; the resulting execution history is unserializable. Although the relaxed boundary allows a false prediction in this example, in our evaluation the relaxed boundary results in few false predictions.
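The difference between the two boundaries can be sketched for a single session. The code below assumes that the boundary sits at the first read whose writer differs between the observed and predicted executions (strict), or at the last event of the transaction containing that read (relaxed), and that events up to and including the boundary are kept. The event representation and function name are hypothetical, and the exclusion of happen-after events in other sessions is not modeled.

```python
import math

def session_boundary(events, relaxed=False):
    """Position of the boundary event in one session, or math.inf if none.

    events : the session's events in order; each is a dict with 'txn' and
             'kind' ('read', 'write', 'commit'), and reads also carry the
             'observed' and 'predicted' writer transactions.
    """
    for pos, ev in enumerate(events):
        if ev["kind"] == "read" and ev["observed"] != ev["predicted"]:
            if not relaxed:
                return pos
            # Relaxed: extend to the last event of the enclosing transaction.
            last = pos
            while last + 1 < len(events) and events[last + 1]["txn"] == ev["txn"]:
                last += 1
            return last
    return math.inf

# Hypothetical session: t2 reads from t0 instead of t1, then t4 runs.
session = [
    {"txn": "t2", "kind": "read", "observed": "t1", "predicted": "t0"},
    {"txn": "t2", "kind": "write"},
    {"txn": "t2", "kind": "commit"},
    {"txn": "t4", "kind": "read", "observed": "t2", "predicted": "t2"},
    {"txn": "t4", "kind": "commit"},
]

print(session_boundary(session))                # 0: only t2's read is kept
print(session_boundary(session, relaxed=True))  # 2: all of t2 is kept
print(session_boundary(session[3:]))            # inf: no read changes its writer
```

Under the relaxed boundary, t2's write and commit remain in the prediction even though t2 might abort once its read changes, which is the source of the additional false predictions listed in Table 1.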

Generating prediction boundary constraints

Here we present IsoPredict’s constraints for excluding events using the prediction boundary. We show constraints for the strict prediction boundary, but the constraints for the relaxed prediction boundary are similar except they also constrain every session’s boundary to be the last event of a transaction.

The prediction boundary is delimited by a boundary event in each session, which is either (1) a read event, which reads from a different write in the predicted execution than in the observed execution, or (2) the last event in the session (which will always be a commit event). IsoPredict generates the following constraints on an integer SMT function $\phi_{\mathit{boundary}}(s)$ to ensure that the boundary event for each session is either a read event or the last event (represented by position $\infty$):

$$\forall s \text{ is a session}, \quad \boxed{\Big( \bigvee_{\substack{t \text{ is a transaction in } s \\ i \in \mathit{rdpos}_k(t)}} \phi_{\mathit{boundary}}(s) = i \Big) \lor \phi_{\mathit{boundary}}(s) = \infty}$$

Recall that $\mathit{rdpos}_k(t)$ is the set of positions of reads to $k$ in the transaction $t$.
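As an illustration of what this constraint permits, the set of admissible boundary values for one session can be enumerated directly. This is a plain-Python sketch rather than the SMT encoding; the `boundary_domain` helper and the example positions are hypothetical, and `math.inf` stands in for the position $\infty$.

```python
import math


def boundary_domain(read_positions_by_txn):
    """Allowed values of the boundary for one session.

    read_positions_by_txn maps each transaction in the session to the
    positions of its reads (the union over keys k of rdpos_k(t)).
    """
    domain = {i for positions in read_positions_by_txn.values() for i in positions}
    domain.add(math.inf)  # "last event of the session" is encoded as infinity
    return domain


# Hypothetical session: t1 reads at positions 0 and 2, t2 reads at position 5.
session_reads = {"t1": [0, 2], "t2": [5]}
domain = boundary_domain(session_reads)
```

Any position that is not a read (for example, a write at position 1) is absent from the domain, so the solver can never place the boundary there.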

To ensure that each read that happens-before the prediction boundary reads from the same write as in the observed execution, IsoPredict generates the following constraints, where $\phi_{\mathit{obs}}(s, i)$ is an integer SMT function that represents the last write of each read in the observed execution history (and is thus the analogue of $\phi_{\mathit{choice}}$ for the observed execution):

$$\forall t_1, t_2 \in T,\ t_1 \neq t_2,\ \forall i \in \mathit{rdpos}_k(t_2),\ t_2\text{'s read at pos } i \text{ reads from } t_1 \text{ in } \mathit{wr}_{\mathit{obs}}, \quad \boxed{\phi_{\mathit{obs}}(s_2, i) = t_1}$$

$$\forall k \text{ is a key},\ \forall t_1 \text{ writes } k,\ \forall t_2 \text{ reads } k,\ \forall i \in \mathit{rdpos}_k(t_2), \quad \boxed{i < \phi_{\mathit{boundary}}(s_2) \Rightarrow \phi_{\mathit{choice}}(s_2, i) = \phi_{\mathit{obs}}(s_2, i)}$$

where $s_1$ is $t_1$’s session and $s_2$ is $t_2$’s session.
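The second constraint can be read as a check on a candidate assignment: every read strictly before its session's boundary must keep its observed writer. The sketch below evaluates that implication in plain Python instead of asking a solver to satisfy it; the function name and the dictionary-based representations of $\phi_{\mathit{choice}}$, $\phi_{\mathit{obs}}$, and $\phi_{\mathit{boundary}}$ are assumptions made for illustration.

```python
import math


def reads_before_boundary_unchanged(choice, obs, boundary):
    """Check i < boundary(s) => choice(s, i) == obs(s, i) for every read.

    choice, obs -- dicts mapping (session, read position) to a writer transaction
    boundary    -- dict mapping session to its boundary position (math.inf = last event)
    """
    return all(
        choice[(s, i)] == obs[(s, i)]
        for (s, i) in obs
        if i < boundary[s]
    )


# Hypothetical session s2 with reads at positions 1 and 4.
obs = {("s2", 1): "t1", ("s2", 4): "t1"}
choice = {("s2", 1): "t1", ("s2", 4): "t0"}  # the read at position 4 changes writer

ok_at_4 = reads_before_boundary_unchanged(choice, obs, {"s2": 4})
ok_at_inf = reads_before_boundary_unchanged(choice, obs, {"s2": math.inf})
```

The read at position 4 may change its writer only if the boundary is placed at position 4 itself, because the implication uses a strict inequality.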

A read to $k$ on or before the prediction boundary must read from a write to $k$ that is before the prediction boundary. IsoPredict ensures this property by generating the following constraints:

$$\begin{gathered}
\forall k \text{ is a key},\ \forall t_1 \text{ writes } k,\ \forall t_2 \text{ reads } k,\ \forall i \in \mathit{rdpos}_k(t_2), \\
\boxed{\phi_{\mathit{choice}}(s_2, i) = t_1 \land i \leq \phi_{\mathit{boundary}}(s_2) \implies \mathit{wrpos}_k(t_1) < \phi_{\mathit{boundary}}(s_1)}
\end{gathered}$$

where $s_1$ is $t_1$’s session, $s_2$ is $t_2$’s session, and $\mathit{wrpos}_k(t)$ is the position of $t$’s last write to key $k$.
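This constraint can likewise be evaluated on a concrete assignment. The following plain-Python sketch is illustrative only: the `writers_inside_boundary` helper and its dictionary arguments are hypothetical stand-ins for $\phi_{\mathit{choice}}$, $\phi_{\mathit{boundary}}$, and $\mathit{wrpos}_k$, and the initial-state transaction is not modeled.

```python
def writers_inside_boundary(choice, boundary, session_of, wrpos):
    """Check choice(s2, i) = t1 and i <= boundary(s2) => wrpos_k(t1) < boundary(s1).

    choice     -- dict mapping (session, read position, key) to the chosen writer
    boundary   -- dict mapping session to its boundary position
    session_of -- dict mapping transaction to its session
    wrpos      -- dict mapping (transaction, key) to the position of its last write to key
    """
    for (s2, i, k), t1 in choice.items():
        if i <= boundary[s2]:
            s1 = session_of[t1]
            if not wrpos[(t1, k)] < boundary[s1]:
                return False
    return True


# Hypothetical example: a read of x at position 1 in s2 reads from t1,
# whose last write to x is at position 2 in session s1.
session_of = {"t1": "s1"}
wrpos = {("t1", "x"): 2}
choice = {("s2", 1, "x"): "t1"}

write_kept = writers_inside_boundary(choice, {"s1": 5, "s2": 3}, session_of, wrpos)
write_cut = writers_inside_boundary(choice, {"s1": 2, "s2": 3}, session_of, wrpos)
read_cut = writers_inside_boundary(choice, {"s1": 2, "s2": 0}, session_of, wrpos)
```

The third case shows that the constraint is vacuous for reads that lie beyond their own session's boundary.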

To exclude events after the prediction boundary, IsoPredict generates modified constraints for all arbitration and anti-dependency rules, as detailed in Appendix B.

5. Validation

Even by using the prediction boundary, IsoPredict’s predictive analysis may report unserializable predicted executions for which the corresponding validating execution is serializable. To rule out such predictions, IsoPredict can attempt to validate predicted executions, by executing the data store application based on the predicted execution history, and checking whether the resulting validating execution is unserializable.

Validating execution

Validation produces the validating execution using a query engine that takes the predicted execution as input. At each read($k$) event, the query engine checks that (1) the corresponding read in the predicted execution also read from $k$; (2) the writer transaction $t$ from the predicted execution also wrote to $k$ in the validating execution; and (3) reading from $t$ in the validating execution will satisfy the weak isolation model (causal or rc). If any of these conditions is violated, we categorize the execution as having diverged, and the query engine chooses a different, weak isolation model–conforming writer for the read to read from. Note that it is always possible to keep executing while preserving causal or rc (Bouajjani et al., 2023). Furthermore, the validating execution may still be unserializable, as our evaluation shows.
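The three checks and the fallback can be sketched as a single decision function. This is a hypothetical simplification, not the query engine's code: `choose_writer` and its parameters are invented for illustration, the set of isolation-conforming writers is assumed to be computed elsewhere and to be non-empty, and picking the first legal writer as the fallback is an arbitrary policy chosen for this sketch.

```python
def choose_writer(key, predicted_key, predicted_writer, writers_of_key, legal_writers):
    """Pick the writer for a read(key) during validation.

    predicted_key    -- key read by the corresponding read in the predicted execution
    predicted_writer -- transaction that read read from in the predicted execution
    writers_of_key   -- transactions that wrote `key` in the validating execution so far
    legal_writers    -- ordered list of writers allowed by the isolation model (non-empty)
    Returns (writer, diverged).
    """
    matches = (
        predicted_key == key                      # condition (1)
        and predicted_writer in writers_of_key    # condition (2)
        and predicted_writer in legal_writers     # condition (3)
    )
    if matches:
        return predicted_writer, False
    # Divergence: fall back to some isolation-conforming writer.
    # Taking the first one is an assumption of this sketch.
    return legal_writers[0], True


followed = choose_writer("x", "x", "t1", {"t0", "t1"}, ["t0", "t1"])
wrong_key = choose_writer("x", "y", "t1", {"t0", "t1"}, ["t0", "t1"])
missing_writer = choose_writer("x", "x", "t2", {"t0", "t1"}, ["t0", "t1"])
```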

Recall that the predicted execution history contains events only up to the prediction boundary. To avoid serendipitously introducing unserializable behaviors that were not part of the predicted execution (which could make it tough to measure the effectiveness of IsoPredict’s predictive analysis), validation executes each transaction in full that is on the boundary or that happens-before any transaction on the boundary—and then it terminates the execution. This approach is sufficient: If this execution prefix is unserializable, then so is the full execution.
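The set of transactions that validation executes is the backward closure of the boundary transactions under happens-before. A minimal sketch of that closure follows, assuming a hypothetical adjacency map from each transaction to its direct $\mathit{hb}$ predecessors; the helper name and the example graph are not from the paper.

```python
def transactions_to_validate(boundary_txns, hb_predecessors):
    """Return boundary transactions plus everything that happens-before them.

    boundary_txns   -- set of transactions that lie on the prediction boundary
    hb_predecessors -- dict mapping a transaction to its direct hb predecessors
    """
    keep, stack = set(), list(boundary_txns)
    while stack:
        t = stack.pop()
        if t not in keep:
            keep.add(t)
            stack.extend(hb_predecessors.get(t, ()))
    return keep


# Hypothetical hb graph: t1 -> t2 -> t3 and t1 -> t4 -> t5.
hb_predecessors = {"t2": ["t1"], "t3": ["t2"], "t4": ["t1"], "t5": ["t4"]}
selected = transactions_to_validate({"t2", "t4"}, hb_predecessors)
```

Transactions `t3` and `t5` lie after the boundary and are therefore never executed in this example.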

Note that validation must directly control what transaction each read reads from, i.e., the write–read relation ($\mathit{wr}$). Our evaluation extends MonkeyDB (Biswas et al., 2021) to allow explicit control of $\mathit{wr}$ (§6). In settings where MonkeyDB cannot be used, such as production systems, there are other ways to control $\mathit{wr}$. One is using resource locks (e.g., sp_getapplock in SQL Server) to force specific transaction orders that produce the desired $\mathit{wr}$ relation.

Checking serializability

Validation generates constraints to check whether the validating execution history is serializable. (Serializability can be encoded more efficiently than unserializability, since a serializable history implies that a total commit order exists.) If the solver returns “satisfiable,” IsoPredict reports no prediction. Otherwise (the solver returns “unsatisfiable”), IsoPredict reports the validating execution, which is known to be a feasible, unserializable, weak isolation model–conforming execution.
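The reason serializability is the easier property to encode is that it only asks for the existence of one total commit order. The sketch below makes that concrete by brute-force enumeration instead of SMT solving, so it is exponential and suitable only for tiny histories. The history representation and the `is_serializable` helper are hypothetical, and the sketch assumes each transaction reads a key at most once and never reads its own writes.

```python
from itertools import permutations


def is_serializable(txns, session_order):
    """Brute-force search for a total commit order that explains every read.

    txns          -- dict: name -> {"reads": {key: writer}, "writes": set of keys}
    session_order -- list of (earlier, later) transaction pairs within a session
    """
    for order in permutations(txns):
        pos = {t: n for n, t in enumerate(order)}
        if any(pos[a] > pos[b] for a, b in session_order):
            continue  # this order violates session order
        ok = True
        for t, info in txns.items():
            for key, writer in info["reads"].items():
                # the latest writer of `key` committed before t must be `writer`
                earlier = [w for w in order[:pos[t]] if key in txns[w]["writes"]]
                if not earlier or earlier[-1] != writer:
                    ok = False
        if ok:
            return True   # analogous to the solver answering "satisfiable"
    return False          # analogous to "unsatisfiable": the history is unserializable


# Write skew: t1 and t2 both read the initial state (t0) and write the other's key.
write_skew = {
    "t0": {"reads": {}, "writes": {"x", "y"}},
    "t1": {"reads": {"x": "t0"}, "writes": {"y"}},
    "t2": {"reads": {"y": "t0"}, "writes": {"x"}},
}
# A simple chain: t1 reads from t0, t2 reads from t1.
chain = {
    "t0": {"reads": {}, "writes": {"x"}},
    "t1": {"reads": {"x": "t0"}, "writes": {"y"}},
    "t2": {"reads": {"y": "t1"}, "writes": set()},
}
```

For the write-skew history no permutation satisfies every read, which corresponds to the case in which IsoPredict would report the validating execution.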

6. Implementation

This section describes the implementation of IsoPredict, which is publicly available (Geng et al., 2024b).

Predictive analysis

We implemented IsoPredict’s predictive analysis (§4) as a Python program that uses Z3Py, the Python binding of the Z3 SMT solver (de Moura and Bjørner, 2008). Observed and predicted execution histories are in the form of traces containing read and write events and transaction and session identifiers, including the transaction that each read reads from. If Z3 finds a predicted unserializable execution, it either reports the predicted execution history in both textual and graphical forms, or passes the predicted history to the validation component, depending on how IsoPredict is configured.

To generate observed execution traces, we extended the implementation of MonkeyDB, a transactional key–value data store (Biswas et al., 2021). MonkeyDB handles relational queries by translating them to key–value queries. MonkeyDB executes transactions serially, and we configured it to choose the latest writer at each read, so observed executions are always serializable.

Validation

IsoPredict’s validation component replays the client application on a customized query engine that we also built on top of MonkeyDB. The query engine executes transactions one at a time, in an order dictated by the predicted execution, to ensure that read events always occur after their writers. At each read, the query engine chooses a last writer that satisfies the weak isolation model and, if possible, matches the predicted execution (§5). Validation uses Z3Py to generate and solve SMT constraints to determine if the validating execution history is unserializable, reporting the validating execution to the user in both textual and graphical forms if so.
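Executing transactions so that reads always follow their writers amounts to choosing a topological order of the predicted execution's ordering relation. The following sketch uses Python's standard `graphlib` module (Python 3.9 or later) for that purpose; the `replay_order` helper and the example graph are hypothetical, and the actual query engine may select its order differently.

```python
from graphlib import TopologicalSorter


def replay_order(hb_predecessors):
    """Return one transaction order in which every transaction follows its predecessors.

    hb_predecessors -- dict mapping a transaction to the set of transactions
                       that must execute before it
    """
    return list(TopologicalSorter(hb_predecessors).static_order())


# Hypothetical predicted execution: t1 and t2 read from t0; t3 reads from both.
hb_predecessors = {"t1": {"t0"}, "t2": {"t0"}, "t3": {"t1", "t2"}}
order = replay_order(hb_predecessors)
```

Several orders may be valid (here `t1` and `t2` are unordered with respect to each other); any of them keeps each read after its writer.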

The customized query engine handles transaction aborts by rewinding the predicted execution trace to the beginning of the current transaction. In our experiments, every transaction that aborted during the observed execution also aborts during the validating execution, except in a few cases in which a transaction that aborted in the observed execution, and that immediately precedes a committed transaction on the prediction boundary, actually commits in the validating execution. As with other divergent behavior, the resulting validating execution may or may not be unserializable.

7. Evaluation

This section evaluates how effectively and efficiently IsoPredict predicts unserializable executions under causal and rc, and it compares empirically against prior work MonkeyDB (Biswas et al., 2021).

7.1. Methodology

Prediction strategies

Table 2. IsoPredict prediction strategies.

| Pred. strategy | Encoding precision | Pred. boundary | Divergence $\Rightarrow$ false predictions? |
|---|---|---|---|
| Exact-Strict | Exact encoding | Strict | Only because of aborts |
| Approx-Strict | Approximate encoding | Strict | Only because of aborts |
| Approx-Relaxed | Approximate encoding | Relaxed | Yes |

Table 2 shows the combinations of unserializability constraints and prediction boundaries that we evaluated, which we call prediction strategies. The Exact-Strict prediction strategy uses a precise encoding of unserializability (§4.2.1), while Approx-Strict and Approx-Relaxed encode the sufficient condition for unserializability (§4.2.2). Exact-Strict and Approx-Strict encode the strict prediction boundary, while Approx-Relaxed encodes the relaxed prediction boundary.

Benchmarks

We evaluated IsoPredict and MonkeyDB using transactional workloads from OLTP-Bench, a database testing framework that generates various workloads for benchmarking relational databases (Difallah et al., 2013). Table 3 shows quantitative characteristics of the evaluated benchmarks.

Our experiments used versions of the OLTP-Bench programs that the MonkeyDB authors ported to use simplified SQL queries recognized by MonkeyDB (Biswas et al., 2021). In these versions, each benchmark runs a nondeterministic number of transactions based on a specified time limit. For the purposes of our evaluation, we modified the benchmarks to be more deterministic for two reasons. First, determinism provides a more stable comparison among IsoPredict’s prediction strategies. Second, determinism helps with validation, since the validating execution can run the benchmark with the same random number generator (RNG) seed that the observed execution used. (To use validation in a production setting, one should record and replay the application (Galanis et al., 2008; Li et al., 2023).) We modified the benchmarks to be more deterministic by (1) fixing the number of sessions and transactions per session and (2) adding an RNG seed as a parameter to each benchmark. Although these modifications increase determinism, the benchmarks still execute nondeterministically because the interleaving of transactions is timing dependent. This source of nondeterminism does not hinder validation, which executes transactions in an order consistent with the predicted execution’s $\mathit{hb}$ relation.

Table 3. Average number of events and committed transactions across 10 trials of each OLTP-Bench program.

| Program | Small workload: KV reads | Small workload: KV writes | Small workload: committed txns, total (read-only) | Large workload: KV reads | Large workload: KV writes | Large workload: committed txns, total (read-only) |
|---|---|---|---|---|---|---|
| Smallbank | 669.7 | 14.7 | 11.0 (3.5) | 1271.3 | 30.5 | 20.3 (6.6) |
| Voter | 763.0 | 6.0 | 12.0 (11.0) | 919.0 | 6.0 | 24.0 (23.0) |
| TPC-C | 3297.3 | 763.0 | 11.9 (0.9) | 7025.6 | 1502.4 | 23.8 (1.7) |
| Wikipedia | 1067.7 | 55.1 | 9.9 (8.8) | 2677.1 | 111.1 | 22.8 (20.6) |

Algorithm 3 Code executed by each of Voter’s transactions.
procedure Vote(id)
    votes ← DataStore.get(id)
    if votes < 1 then
        DataStore.put(id, 1)
    DataStore.commit()

We configured each benchmark with both small and large workloads, in which three sessions each execute four or eight transactions, resulting in 12 or 24 attempted transactions, respectively. The number of committed transactions is somewhat fewer because all programs except Voter occasionally abort a transaction based on application-specific logic.

Platform

All experiments ran on an Intel Xeon server at 2.3 GHz with 16 cores, hyperthreading enabled, and 187 GB of RAM, running Linux.

7.2. IsoPredict’s Effectiveness and Performance

Tables 4 and 5 show IsoPredict’s effectiveness and performance at predicting unserializable executions under causal and rc, respectively. For each benchmark and each of IsoPredict’s three prediction strategies, we ran IsoPredict on 10 executions, each of which used one of 10 RNG seeds, which we kept consistent across prediction strategies and isolation levels.

Table 4. IsoPredict effectiveness and performance under causal. “T/O” means the solver did not finish within 24 hours. “Unk” means the solver returned “unknown” without reaching the timeout.

| Program | Prediction strategy | Prediction: Unk | Prediction: Unsat | Prediction: Sat | Validation: Validated (Diverged) | Constraint gen.: # Literals | Constraint gen.: Time | Solving time: Sat | Solving time: Unsat |
|---|---|---|---|---|---|---|---|---|---|
| Smallbank | Exact-Strict | 0 | 6 | 4 | 4 (0) | 140 K | 8.8 s | 13.9 s | 11.3 s |
| Smallbank | Approx-Strict | 0 | 6 | 4 | 4 (1) | 366 K | 22.9 s | 1.0 s | 3.2 s |
| Smallbank | Approx-Relaxed | 0 | 0 | 10 | 9 (1) | 366 K | 22.9 s | 0.6 s | |
| Voter | Exact-Strict | 0 | 10 | 0 | 0 (0) | 687 K | 61.7 s | | 64.5 s |
| Voter | Approx-Strict | 0 | 10 | 0 | 0 (0) | 1,526 K | 131.7 s | | 10.4 s |
| Voter | Approx-Relaxed | 0 | 10 | 0 | 0 (0) | 1,526 K | 132.1 s | | 10.0 s |
| TPC-C | Exact-Strict | 0 | 1 | 9 | 9 (0) | 3,493 K | 220.4 s | 230.4 s | 752.3 s |
| TPC-C | Approx-Strict | 0 | 1 | 9 | 9 (0) | 6,508 K | 425.8 s | 35.1 s | 105.2 s |
| TPC-C | Approx-Relaxed | 0 | 0 | 10 | 10 (0) | 6,508 K | 425.5 s | 22.7 s | |
| Wikipedia | Exact-Strict | 1 | 9 | 0 | 0 (0) | 180 K | 13.9 s | | 24.0 s |
| Wikipedia | Approx-Strict | 0 | 10 | 0 | 0 (0) | 529 K | 36.3 s | | 1.3 s |
| Wikipedia | Approx-Relaxed | 0 | 8 | 2 | 2 (1) | 529 K | 36.3 s | 2.5 s | 1.0 s |

(a) Small workload

| Program | Prediction strategy | Prediction: T/O | Prediction: Unsat | Prediction: Sat | Validation: Validated (Diverged) | Constraint gen.: # Literals | Constraint gen.: Time | Solving time: Sat | Solving time: Unsat |
|---|---|---|---|---|---|---|---|---|---|
| Smallbank | Exact-Strict | 4 | 1 | 5 | 5 (1) | 1,073 K | 55.6 s | 8,618.9 s | 2,366.2 s |
| Smallbank | Approx-Strict | 1 | 0 | 9 | 9 (0) | 2,175 K | 121.0 s | 332.5 s | |
| Smallbank | Approx-Relaxed | 0 | 0 | 10 | 10 (0) | 1,073 K | 118.8 s | 19.3 s | |
| Voter | Exact-Strict | 9 | 1 | 0 | 0 (0) | 2,623 K | 235.1 s | | 5,708.7 s |
| Voter | Approx-Strict | 0 | 10 | 0 | 0 (0) | 5,623 K | 490.5 s | | 47.2 s |
| Voter | Approx-Relaxed | 0 | 10 | 0 | 0 (0) | 5,623 K | 496.1 s | | 47.1 s |
| TPC-C | Exact-Strict | 4 | 3 | 3 | 3 (0) | 36,434 K | 1,914.6 s | 30,413.1 s | 24,281.2 s |
| TPC-C | Approx-Strict | 2 | 0 | 8 | 8 (0) | 60,834 K | 3,416.1 s | 1,210.3 s | |
| TPC-C | Approx-Relaxed | 0 | 0 | 10 | 10 (2) | 60,834 K | 3,332.3 s | 186.2 s | |
| Wikipedia | Exact-Strict | 8 | 1 | 1 | 1 (0) | 1,773 K | 111.9 s | 910.2 s | 1,876.8 s |
| Wikipedia | Approx-Strict | 0 | 9 | 1 | 1 (0) | 4,316 K | 263.7 s | 15.6 s | 30.1 s |
| Wikipedia | Approx-Relaxed | 0 | 8 | 2 | 2 (2) | 4,316 K | 258.3 s | 20.3 s | 25.3 s |

(b) Large workload

Table 5. IsoPredict effectiveness and performance under rc. “T/O” means the solver did not finish within 24 hours. “Unk” means the solver returned “unknown” without reaching the timeout.

| Program | Prediction strategy | Prediction: Unk | Prediction: Unsat | Prediction: Sat | Validation: Validated (Diverged) | Constraint gen.: # Literals | Constraint gen.: Time | Solving time: Sat | Solving time: Unsat |
|---|---|---|---|---|---|---|---|---|---|
| Smallbank | Exact-Strict | 0 | 0 | 10 | 10 (0) | 144 K | 10.0 s | 2.3 s | |
| Smallbank | Approx-Strict | 0 | 0 | 10 | 10 (0) | 370 K | 24.3 s | 0.8 s | |
| Smallbank | Approx-Relaxed | 0 | 0 | 10 | 10 (0) | 370 K | 24.4 s | 0.6 s | |
| Voter | Exact-Strict | 0 | 0 | 10 | 10 (2) | 688 K | 62.5 s | 12.9 s | |
| Voter | Approx-Strict | 0 | 0 | 10 | 10 (7) | 1,527 K | 133.0 s | 12.3 s | |
| Voter | Approx-Relaxed | 0 | 0 | 10 | 10 (10) | 1,527 K | 132.7 s | 12.7 s | |
| TPC-C | Exact-Strict | 0 | 0 | 10 | 10 (0) | 3,855 K | 359.0 s | 52.0 s | |
| TPC-C | Approx-Strict | 0 | 0 | 10 | 10 (0) | 6,869 K | 569.2 s | 27.2 s | |
| TPC-C | Approx-Relaxed | 0 | 0 | 10 | 10 (3) | 6,869 K | 588.9 s | 22.8 s | |
| Wikipedia | Exact-Strict | 2 | 1 | 7 | 7 (2) | 184 K | 15.4 s | 3.9 s | 8.0 s |
| Wikipedia | Approx-Strict | 0 | 3 | 7 | 7 (1) | 533 K | 38.3 s | 2.1 s | 0.5 s |
| Wikipedia | Approx-Relaxed | 0 | 3 | 7 | 7 (7) | 533 K | 38.1 s | 1.7 s | 0.5 s |

(a) Small workload

| Program | Prediction strategy | Prediction: T/O | Prediction: Unsat | Prediction: Sat | Validation: Validated (Diverged) | Constraint gen.: # Literals | Constraint gen.: Time | Solving time: Sat | Solving time: Unsat |
|---|---|---|---|---|---|---|---|---|---|
| Smallbank | Exact-Strict | 0 | 0 | 10 | 9 (1) | 1,085 K | 60.3 s | 624.6 s | |
| Smallbank | Approx-Strict | 0 | 0 | 10 | 10 (1) | 2,187 K | 124.5 s | 19.0 s | |
| Smallbank | Approx-Relaxed | 0 | 0 | 10 | 10 (1) | 2,187 K | 128.7 s | 51.7 s | |
| Voter | Exact-Strict | 0 | 0 | 10 | 10 (2) | 2,625 K | 255.7 s | 212.0 s | |
| Voter | Approx-Strict | 0 | 0 | 10 | 10 (6) | 5,625 K | 491.7 s | 76.2 s | |
| Voter | Approx-Relaxed | 0 | 0 | 10 | 10 (10) | 5,625 K | 495.2 s | 75.4 s | |
| TPC-C | Exact-Strict | 0 | 0 | 10 | 10 (2) | 38,062 K | 2,571.0 s | 898.6 s | |
| TPC-C | Approx-Strict | 0 | 0 | 10 | 10 (2) | 62,462 K | 3,981.9 s | 279.7 s | |
| TPC-C | Approx-Relaxed | 0 | 0 | 10 | 10 (4) | 62,462 K | 4,040.4 s | 201.0 s | |
| Wikipedia | Exact-Strict | 0 | 0 | 10 | 10 (1) | 1,807 K | 124.6 s | 81.4 s | |
| Wikipedia | Approx-Strict | 0 | 0 | 10 | 10 (1) | 4,350 K | 272.8 s | 29.2 s | |
| Wikipedia | Approx-Relaxed | 0 | 0 | 10 | 9 (10) | 4,350 K | 272.9 s | 16.9 s | |

(b) Large workload

Predictive analysis

The Sat column under Prediction reports the number of unserializable executions (out of 10) that IsoPredict found. The Approx-Relaxed prediction strategy generally predicts more than the other strategies because it uses the relaxed boundary. Although Exact-Strict can theoretically predict more executions than Approx-Strict, this never happened in our experiments.

IsoPredict consistently predicts more unserializable executions under rc than under causal, which makes sense because rc is strictly weaker than causal. Voter has the biggest difference: there were no successful predictions under causal. This is because every observed execution of Voter has only one writing (i.e., non-read-only) transaction (see Algorithm 3), which is not sufficient to predict an unserializable execution under causal. (Footnote 5: More specifically, the initial state transaction $t_0$ and the writing transaction $t_w$ constitute the only pair of conflicting writes. If a transaction $t_r$ reads from the initial state, then a commit order with $t_r$ preceding $t_w$ is acyclic. On the other hand, if $t_r$ reads from another transaction, a commit order with $t_r$ following $t_w$ is acyclic.) Similarly, IsoPredict has low prediction rates for Wikipedia, which has few writing transactions. In contrast, under rc, a transaction may legally read both the initial state and the writing transaction, which is why IsoPredict has higher prediction rates for Voter and Wikipedia under rc than under causal. §4.4 and Appendix C present some observed and predicted executions from the evaluated benchmarks.

Validation

We configured IsoPredict to validate every predicted unserializable execution. The Validated column reports the number of validating executions that were unserializable. Across all experiments, all but three predicted executions were successfully validated as unserializable.

The Diverged column shows that, in many cases, the validating execution diverged, i.e., it could not match the predicted execution history (§5). Unsurprisingly, the relaxed boundary experienced significantly more divergence than the strict boundary. However, divergence rarely resulted in failed validation: Among the 81 divergent executions across Tables 4 and 5, only three failed validation (i.e., produced serializable executions). One validation failure was caused by divergent behavior unrelated to aborts (§5), and the other two failures were caused by previously aborted transactions being committed (an implementation issue discussed in §6).
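The totals quoted above can be re-derived from the Sat, Validated, and Diverged entries of Tables 4 and 5. The short script below is a sanity check, assuming the per-row values are transcribed correctly from those tables (rows are listed in table order: Smallbank, Voter, TPC-C, Wikipedia, each with Exact-Strict, Approx-Strict, Approx-Relaxed); the variable names are arbitrary.

```python
# (Sat, Validated, Diverged) per row, transcribed from Tables 4 and 5.
TABLE_4A = [(4, 4, 0), (4, 4, 1), (10, 9, 1), (0, 0, 0), (0, 0, 0), (0, 0, 0),
            (9, 9, 0), (9, 9, 0), (10, 10, 0), (0, 0, 0), (0, 0, 0), (2, 2, 1)]
TABLE_4B = [(5, 5, 1), (9, 9, 0), (10, 10, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0),
            (3, 3, 0), (8, 8, 0), (10, 10, 2), (1, 1, 0), (1, 1, 0), (2, 2, 2)]
TABLE_5A = [(10, 10, 0), (10, 10, 0), (10, 10, 0), (10, 10, 2), (10, 10, 7), (10, 10, 10),
            (10, 10, 0), (10, 10, 0), (10, 10, 3), (7, 7, 2), (7, 7, 1), (7, 7, 7)]
TABLE_5B = [(10, 9, 1), (10, 10, 1), (10, 10, 1), (10, 10, 2), (10, 10, 6), (10, 10, 10),
            (10, 10, 2), (10, 10, 2), (10, 10, 4), (10, 10, 1), (10, 10, 1), (10, 9, 10)]

rows = TABLE_4A + TABLE_4B + TABLE_5A + TABLE_5B
predicted = sum(sat for sat, _, _ in rows)   # predictions that were validated
validated = sum(val for _, val, _ in rows)   # validating executions that were unserializable
diverged = sum(div for _, _, div in rows)    # validating executions that diverged
failed = predicted - validated               # predictions that failed validation
```

Under this transcription the script yields 81 divergent executions and three failed validations, matching the text.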

Performance

The four rightmost columns of each table report the performance of IsoPredict’s predictive analysis, which consists of two components: (1) the time the analysis takes to generate SMT constraints (Constraint gen.) and (2) SMT solving time (Solving time). Each table also reports the size of the generated constraints (# Literals), which correlates with constraint generation time. (Footnote 6: The Approx-Strict and Approx-Relaxed prediction strategies generate different constraints, but they have the same size.) SMT solving is significantly faster for successful prediction (Sat) than for failed prediction (Unsat), so the table reports the two average solving times separately. (Footnote 7: It makes sense that successful prediction, which finds a single satisfying solution, is faster than failed prediction, which requires the solver to prove that no satisfying solution exists.)

Compared to the other prediction strategies, Exact-Strict, which generates a single quantified constraint, spends less time generating constraints but more time solving constraints because its constraints are inherently harder to solve. Approx-Relaxed and Approx-Strict have performance similar to each other, which makes sense since they share the same approximation techniques.

Generating constraints can take a long time, often longer than constraint-solving time. We investigated this issue by using the perf (perf, 2024) and py-spy (Frederickson, 2024) performance analysis tools on the slowest instance of constraint generation: the large workload of TPC-C under rc using the Approx-Relaxed strategy (Table 5). To the best of our understanding, 97% of time is spent in Python code (IsoPredict and Z3Py), and 3% is spent in C code (Z3). Of the time spent in Python, 81% is spent in Z3Py functions, with most time spent in the following Z3Py API functions and their callees: __call__(), And(), and Or(). The __call__() function is part of Z3Py’s implementation of SMT functions, which act as callable objects in Python. The And() and Or() functions create conjunction and disjunction clauses, respectively. Z3Py functions call into Z3 code written in C; an unknown fraction of the time spent in Z3Py is due to making cross-language calls from Z3Py to Z3.

7.3. Comparison with MonkeyDB

MonkeyDB is a transactional key–value data store that aims to produce unusual executions that are legal under a target isolation level (Biswas et al., 2021). MonkeyDB handles each read to a key by returning a randomly chosen value among the set of legal values under the target isolation level.
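MonkeyDB's read handling, as described here, can be approximated in a few lines. This is a hypothetical illustration rather than MonkeyDB's code: it assumes the set of legal values has already been computed and that the choice is uniform, and the function name is invented.

```python
import random


def monkeydb_style_read(legal_values, rng):
    """Return one value chosen at random from the values legal under the isolation level."""
    # Sorting makes the choice reproducible for a given seed, since sets are unordered.
    return rng.choice(sorted(legal_values))


rng = random.Random(42)  # fixed seed so that repeated runs draw the same sequence
observed = {monkeydb_style_read({"v0", "v1", "v2"}, rng) for _ in range(200)}
```

Because each run draws a different sequence of values for a different seed, a single run explores only one execution, which is the contrast with predictive analysis drawn in the next paragraph.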

MonkeyDB and IsoPredict both aim to find erroneous executions under weak isolation, but they use completely different approaches. MonkeyDB relies on a customized query engine that produces a single execution, while IsoPredict uses predictive analysis to analyze an equivalence class of many executions at once. They also differ in how they define and expose unserializable behavior: IsoPredict tries to find an unserializable execution, while MonkeyDB uses programmer-crafted assertions to detect unserializable behaviors.

Tables 6 and 7 compare MonkeyDB and IsoPredict’s effectiveness at predicting unserializable executions. To account for MonkeyDB’s randomized approach, we ran it 100 times for each configuration: 10 times for each of the 10 RNG seeds used as benchmark input (§7.1). The percentage of these executions with an assertion failure is reported in the Fail column.

Table 6. Comparison between MonkeyDB (Biswas et al., 2021) and IsoPredict (Approx-Relaxed strategy) under causal. The numbers report how often a benchmark assertion failed (Fail) or the history was unserializable (Unser).

| Program | MonkeyDB: Fail | MonkeyDB: Unser | IsoPredict: Unser |
|---|---|---|---|
| Smallbank | 70% | 98% | 90% |
| Voter | 70% | 80% | 0% |
| TPC-C | 98% | 100% | 100% |
| Wikipedia | 0% | 11% | 20% |

(a) Small workload

| Program | MonkeyDB: Fail | MonkeyDB: Unser | IsoPredict: Unser |
|---|---|---|---|
| Smallbank | 84% | 100% | 100% |
| Voter | 56% | 80% | 0% |
| TPC-C | 100% | 100% | 100% |
| Wikipedia | 0% | 19% | 20% |

(b) Large workload

Table 7. Comparison between MonkeyDB (Biswas et al., 2021), IsoPredict (Approx-Strict strategy), and regular execution using MySQL under rc. Each number is the percentage of runs in which a benchmark assertion failed (Fail) or the history was unserializable (Unser).

| Program | MonkeyDB: Fail | MonkeyDB: Unser | IsoPredict: Unser | MySQL: Fail |
|---|---|---|---|---|
| Smallbank | 100% | 100% | 100% | 0% |
| Voter | 89% | 100% | 100% | 0% |
| TPC-C | 100% | 100% | 100% | 50% |
| Wikipedia | 54% | 54% | 70% | 0% |

(a) Small workload

| Program | MonkeyDB: Fail | MonkeyDB: Unser | IsoPredict: Unser | MySQL: Fail |
|---|---|---|---|---|
| Smallbank | 100% | 100% | 100% | 0% |
| Voter | 95% | 100% | 100% | 0% |
| TPC-C | 100% | 100% | 100% | 70% |
| Wikipedia | 89% | 89% | 100% | 0% |

(b) Large workload

To compare MonkeyDB and IsoPredict directly, we computed whether each execution produced by MonkeyDB was unserializable, by generating and solving constraints corresponding to the definition of serializable. An assertion failure is a sufficient but not necessary condition for an unserializable execution; hence, for MonkeyDB, the number of executions failing assertions (Fail) never exceeds the number of unserializable executions (Unser).
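The relationship between the two MonkeyDB columns can be checked mechanically against Tables 6 and 7. The snippet below is a sanity check under the assumption that the percentages are transcribed correctly from those tables; the dictionary layout is arbitrary.

```python
# (Fail %, Unser %) for MonkeyDB, transcribed from Tables 6 (causal) and 7 (rc).
MONKEYDB = {
    ("causal", "small"): {"Smallbank": (70, 98), "Voter": (70, 80),
                          "TPC-C": (98, 100), "Wikipedia": (0, 11)},
    ("causal", "large"): {"Smallbank": (84, 100), "Voter": (56, 80),
                          "TPC-C": (100, 100), "Wikipedia": (0, 19)},
    ("rc", "small"): {"Smallbank": (100, 100), "Voter": (89, 100),
                      "TPC-C": (100, 100), "Wikipedia": (54, 54)},
    ("rc", "large"): {"Smallbank": (100, 100), "Voter": (95, 100),
                      "TPC-C": (100, 100), "Wikipedia": (89, 89)},
}

# An assertion failure implies unserializability, so Fail can never exceed Unser.
fail_never_exceeds_unser = all(
    fail <= unser
    for programs in MONKEYDB.values()
    for fail, unser in programs.values()
)

# Configurations in which some unserializable runs triggered no assertion failure.
undetected = sorted(
    (config, program)
    for config, programs in MONKEYDB.items()
    for program, (fail, unser) in programs.items()
    if fail < unser
)
```

The `undetected` list shows where the benchmark assertions miss unserializable behavior, for example Wikipedia under causal.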

The IsoPredict column shows the percentage of executions that led to unserializable predictions that were successfully validated (i.e., same results as the Validation columns in Tables 4 and 5). The tables use the best-performing prediction strategy for each isolation level.

Quantitatively, MonkeyDB and IsoPredict are comparable, finding erroneous executions at similar rates, except for two cases. In one case—Voter under causal—MonkeyDB produces unserializable executions, but IsoPredict never predicts any. Voter issues only one write transaction under serializable (Algorithm 3), from which it is impossible to predict an unserializable execution under causal, because IsoPredict cannot predict events that did not happen in the observed execution. In contrast, since MonkeyDB chooses values on the fly, its choices of reads can lead Voter to perform additional writes, leading to unserializable behavior. In another case—Wikipedia under causal—IsoPredict is able to predict several unserializable executions while MonkeyDB never has assertion failures, since its assertions are not sensitive enough to detect unserializable behaviors.

Qualitatively, the approaches differ in two significant ways. First, IsoPredict does not require programmers to write assertions. Second and more significantly, IsoPredict predicts unserializable executions from observed executions, which in theory could be produced by any data store. In contrast, MonkeyDB’s approach requires its specialized query engine.

Comparison with regular execution

Both MonkeyDB and IsoPredict routinely produce unserializable executions for the evaluated programs, but a natural question is whether executing these programs normally on a real-world data store yields unserializable executions. To evaluate this question, we executed the programs using MySQL (MySQL, 2023a) in rc mode (MySQL does not support causal). As for the MonkeyDB runs, we executed each program 100 times—10 times for each of the 10 RNG seeds used as input to the program—and evaluated the assertions used by MonkeyDB.

Table 7’s MySQL columns show the percentage of runs in which an assertion failed, a sufficient condition for an unserializable history. The results show that Smallbank, Voter, and Wikipedia never experienced an assertion failure under regular execution. (Footnote 8: It is an open question whether MySQL in rc mode can actually produce unserializable executions for these programs. Data store implementations may preclude behaviors that are theoretically possible under the target isolation level.) TPC-C experienced an assertion failure half of the time on the small workload and 70% of the time on the large workload. In contrast, MonkeyDB and IsoPredict often produce assertion-failing, unserializable executions.

Differences between our MonkeyDB results and the MonkeyDB paper’s results

In our experiments, MonkeyDB triggered fewer assertion failures than reported in the MonkeyDB paper (Biswas et al., 2021). These differences exist because we found and fixed a few bugs in the ported benchmarks and their assertions, which eliminated a few spurious failures. We confirmed all of the bugs and fixes with the MonkeyDB authors (Biswas et al., 2023). To be clear, the differences do not impact the MonkeyDB paper’s takeaway: MonkeyDB often produces unserializable, erroneous executions for the evaluated programs.

8. Related Work

The closest existing approaches to IsoPredict are arguably MonkeyDB (Biswas et al., 2021), IsoDiff (Gan et al., 2020), 2AD (Warszawski and Bailis, 2017), and Sinha et al.’s predictive analysis (Sinha et al., 2012). As §7.3 explained, MonkeyDB produces a single execution, which may or may not be unserializable, while IsoPredict predicts unserializable executions from an observed execution. IsoPredict can in theory work with any data store that can generate execution traces, while MonkeyDB requires its specialized query engine.

IsoDiff and 2AD detect unserializable behaviors based on an observed execution (Gan et al., 2020; Warszawski and Bailis, 2017). They build an abstract graph that does not take into account potential dependencies between read values. As the 2AD paper acknowledges, “2AD’s abstract histories are value-agnostic and do not account for control flow within a program; in effect, 2AD’s abstract history construction process assumes that each variable read and written can assume arbitrary values. However, there are often dependencies (e.g., $y = x + 1$) between the values that variables assume” (Warszawski and Bailis, 2017). As a result, 2AD incurs high false positive rates even after using programmer-guided refinement: 37 reported “witnesses” on average per application, but only 22 bugs across 12 applications, or 2 bugs on average per application (Warszawski and Bailis, 2017).

In contrast, IsoPredict accounts for dependencies among read values through its axiomatic encoding of constraints, which permits encoding of potential dependencies using the prediction boundary. IsoPredict may still report false positives, but for narrower reasons: divergent aborts or (only when using the relaxed boundary) intra-transaction dependencies.

Sinha et al.’s analysis predicts atomicity violations in shared-memory multithreaded programs by encoding the conditions for unserializability as SMT constraints (Sinha et al., 2012). A key difference with IsoPredict is that Sinha et al.’s work deals with execution histories of shared-memory programs, in which all pairs of conflicting accesses are fully ordered, while IsoPredict deals with execution histories of distributed data store applications, in which conflicting accesses are unordered in general. As a result, Sinha et al.’s work only needs to encode graph cyclicity, while IsoPredict must encode that every potential commit order is acyclic. Addressing this unique challenge led us to develop IsoPredict’s approximate encoding (§4.2.2). Other differences include the different prediction spaces: Sinha et al.’s analysis predicts different orderings of critical sections on the same lock, while IsoPredict predicts different write–read orders.

Dynamic analysis

Non-predictive dynamic analysis can check if an observed execution satisfies an isolation level. ECRacer checks whether an observed execution is serializable, using a relaxed definition of serializability that accounts for commutative operations (Brutschy et al., 2017). In contrast, IsoPredict finds new executions that violate serializability.

Prior work uses run-time testing and constraint solving to check if a data store provides a stated weak isolation level (Biswas and Enea, 2019; Kingsbury and Alvaro, 2020; Tan et al., 2020; Zhang et al., 2023; Zennou et al., 2022). In contrast, IsoPredict assumes the data store provides the target weak isolation level and predicts feasible unserializable executions.

Model checking explores multiple executions, avoiding exhaustively exploring all possible executions by using techniques such as dynamic partial order reduction (DPOR) (Abdulla et al., 2023; Bouajjani et al., 2023; Ghafoor et al., 2016). Conschecker uses a DPOR-based stateless model checking algorithm to verify distributed shared-memory programs under causal consistency (Abdulla et al., 2023). Bouajjani et al.’s work adapts DPOR-based algorithms to transactional database applications to check them under a range of isolation levels (Bouajjani et al., 2023).

Static analysis

Static analysis can find unserializable behavior, but precision and performance scale poorly with program size. C4 and Nagar and Jagannathan’s analysis detect serializability violations under causal consistency, eventual consistency, and snapshot isolation (Brutschy et al., 2018; Nagar and Jagannathan, 2018). Clotho uses static analysis, model checking, and test generation to detect unserializable executions; it avoids false positives by verifying the feasibility of unserializable behaviors (Rahmani et al., 2019). In contrast, IsoPredict detects unserializable behaviors with high precision by basing its predictions on a single observed execution.

Isolation levels

IsoPredict generates constraints based on isolation levels encoded in Biswas and Enea’s axiomatic framework (Biswas and Enea, 2019). Other prior work besides Biswas and Enea’s has introduced axiomatic encodings of weak isolation levels (Bouajjani et al., 2017; Perrin et al., 2016; Cerone et al., 2015; Kaki et al., 2018).

Adya et al. define various isolation levels with dependency graphs where each level allows certain types of cycles (Adya et al., 2000). Their approach encompasses “classical” database isolation levels such as read committed and snapshot isolation, but not isolation levels typically used in distributed data stores such as causal consistency (Alglave et al., 2014; Bouajjani et al., 2017; Burckhardt, 2014; Hamza, 2015; Perrin et al., 2016) and eventual consistency (Burckhardt, 2014).

IsoPredict currently supports only causal and rc, by encoding axioms from Biswas and Enea’s framework (Biswas and Enea, 2019). We expect extending IsoPredict to more isolation levels from their framework, namely read atomic (a.k.a. repeated reads) and snapshot isolation, to be straightforward. We do not know how difficult it would be to encode other isolation levels (e.g., eventual consistency and monotonic atomic view) into Biswas and Enea’s framework or into IsoPredict.

9. Conclusion

IsoPredict is the first predictive analysis for detecting unserializable behaviors of applications backed by weakly isolated data stores. IsoPredict’s design introduces novel approaches to address challenges involving constraint complexity, constraint encoding, and divergent behaviors. An evaluation shows that, based on observed executions of data store applications, IsoPredict effectively, precisely, and efficiently predicts feasible, unserializable behaviors.

Data-Availability Statement

An artifact reproducing this paper’s results is publicly available (Geng et al., 2024a).

Acknowledgements.
We thank the MonkeyDB authors (Biswas et al., 2021) for making their implementation publicly available and answering our questions about it; Vincent Beardsley and Noah Charlton for helpful discussions; and the anonymous reviewers for valuable feedback. This material is based in part upon work supported by the National Science Foundation under Grant Numbers NSF CCF-2118745, CSR-2106117, and OAC-2112606, and by Oracle America, Inc.

References

  • Abdulla et al. (2023) Parosh Abdulla, Mohamed Faouzi Atig, S. Krishna, Ashutosh Gupta, and Omkar Tuppe. 2023. Optimal Stateless Model Checking for Causal Consistency. In Tools and Algorithms for the Construction and Analysis of Systems, Sriram Sankaranarayanan and Natasha Sharygina (Eds.). Springer Nature Switzerland, Cham, 105–125.
  • Adya et al. (2000) A. Adya, B. Liskov, and P. O’Neil. 2000. Generalized isolation level definitions. In Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073). IEEE Computer Society, Los Alamitos, CA, USA, 67–78. https://doi.org/10.1109/ICDE.2000.839388
  • Ahamad et al. (1995) Mustaque Ahamad, Gil Neiger, James E. Burns, Prince Kohli, and P.W. Hutto. 1995. Causal Memory: Definitions, Implementation and Programming. Distributed Computing 9, 1 (1995), 37–49. https://doi.org/10.1007/BF01784241
  • Alglave et al. (2014) Jade Alglave, Luc Maranget, and Michael Tautschnig. 2014. Herding Cats: Modelling, Simulation, Testing, and Data Mining for Weak Memory. ACM Trans. Program. Lang. Syst. 36, 2, Article 7 (Jul 2014), 74 pages. https://doi.org/10.1145/2627752
  • Berenson et al. (1995) Hal Berenson, Phil Bernstein, Jim Gray, Jim Melton, Elizabeth O’Neil, and Patrick O’Neil. 1995. A Critique of ANSI SQL Isolation Levels. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data (San Jose, California, USA) (SIGMOD ’95). ACM, New York, NY, USA, 1–10. https://doi.org/10.1145/223784.223785
  • Biswas and Enea (2019) Ranadeep Biswas and Constantin Enea. 2019. On the Complexity of Checking Transactional Consistency. Proc. ACM Program. Lang. 3, OOPSLA, Article 165 (Oct 2019), 28 pages. https://doi.org/10.1145/3360591
  • Biswas et al. (2021) Ranadeep Biswas, Diptanshu Kakwani, Jyothi Vedurada, Constantin Enea, and Akash Lal. 2021. MonkeyDB: Effectively Testing Correctness under Weak Isolation Levels. Proc. ACM Program. Lang. 5, OOPSLA, Article 132 (Oct 2021), 27 pages. https://doi.org/10.1145/3485546
  • Biswas et al. (2023) Ranadeep Biswas, Diptanshu Kakwani, Jyothi Vedurada, Constantin Enea, and Akash Lal. 2023. Personal communication.
  • Bouajjani et al. (2017) Ahmed Bouajjani, Constantin Enea, Rachid Guerraoui, and Jad Hamza. 2017. On verifying causal consistency. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages (Paris, France) (POPL ’17). Association for Computing Machinery, New York, NY, USA, 626–638. https://doi.org/10.1145/3009837.3009888
  • Bouajjani et al. (2023) Ahmed Bouajjani, Constantin Enea, and Enrique Román-Calvo. 2023. Dynamic Partial Order Reduction for Checking Correctness against Transaction Isolation Levels. Proc. ACM Program. Lang. 7, PLDI, Article 129 (Jun 2023), 26 pages. https://doi.org/10.1145/3591243
  • Brutschy et al. (2017) Lucas Brutschy, Dimitar Dimitrov, Peter Müller, and Martin Vechev. 2017. Serializability for Eventual Consistency: Criterion, Analysis, and Applications. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages (Paris, France) (POPL ’17). Association for Computing Machinery, New York, NY, USA, 458–472. https://doi.org/10.1145/3009837.3009895
  • Brutschy et al. (2018) Lucas Brutschy, Dimitar Dimitrov, Peter Müller, and Martin Vechev. 2018. Static Serializability Analysis for Causal Consistency. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (Philadelphia, PA, USA) (PLDI 2018). Association for Computing Machinery, New York, NY, USA, 90–104. https://doi.org/10.1145/3192366.3192415
  • Burckhardt (2014) Sebastian Burckhardt. 2014. Principles of Eventual Consistency. Found. Trends Program. Lang. 1, 1–2 (Oct 2014), 1–150. https://doi.org/10.1561/2500000011
  • Cerone et al. (2015) Andrea Cerone, Giovanni Bernardi, and Alexey Gotsman. 2015. A Framework for Transactional Consistency Models with Atomic Visibility. In 26th International Conference on Concurrency Theory (CONCUR 2015) (Leibniz International Proceedings in Informatics (LIPIcs), Vol. 42), Luca Aceto and David de Frutos Escrig (Eds.). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany, 58–71. https://doi.org/10.4230/LIPIcs.CONCUR.2015.58
  • Cheng et al. (2023) Chaoyi Cheng, Mingzhe Han, Nuo Xu, Spyros Blanas, Michael D. Bond, and Yang Wang. 2023. Developer’s Responsibility or Database’s Responsibility? Rethinking Concurrency Control in Databases. In 13th Conference on Innovative Data Systems Research, CIDR 2023, Amsterdam, The Netherlands, January 8-11, 2023. www.cidrdb.org. https://www.cidrdb.org/cidr2023/papers/p30-cheng.pdf
  • Corbett et al. (2012) James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, and Dale Woodford. 2012. Spanner: Google’s Globally-Distributed Database. In 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12). USENIX Association, Hollywood, CA, 261–264. https://www.usenix.org/conference/osdi12/technical-sessions/presentation/corbett
  • Crooks et al. (2017) Natacha Crooks, Youer Pu, Lorenzo Alvisi, and Allen Clement. 2017. Seeing is Believing: A Client-Centric Specification of Database Isolation. In Proceedings of the ACM Symposium on Principles of Distributed Computing (Washington, DC, USA) (PODC ’17). ACM, New York, NY, USA, 73–82. https://doi.org/10.1145/3087801.3087802
  • de Moura and Bjørner (2008) Leonardo de Moura and Nikolaj Bjørner. 2008. Z3: An Efficient SMT Solver. In Tools and Algorithms for the Construction and Analysis of Systems, C. R. Ramakrishnan and Jakob Rehof (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 337–340.
  • Difallah et al. (2013) Djellel Eddine Difallah, Andrew Pavlo, Carlo Curino, and Philippe Cudre-Mauroux. 2013. OLTP-Bench: An Extensible Testbed for Benchmarking Relational Databases. Proc. VLDB Endow. 7, 4 (Dec 2013), 277–288. https://doi.org/10.14778/2732240.2732246
  • Elhemali et al. (2022) Mostafa Elhemali, Niall Gallagher, Nick Gordon, Joseph Idziorek, Richard Krog, Colin Lazier, Erben Mo, Akhilesh Mritunjai, Somasundaram Perianayagam, Tim Rath, Swami Sivasubramanian, James Christopher Sorenson III, Sroaj Sosothikul, Doug Terry, and Akshat Vig. 2022. Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service. In 2022 USENIX Annual Technical Conference (USENIX ATC 22). USENIX Association, Carlsbad, CA, 1037–1048. https://www.usenix.org/conference/atc22/presentation/elhemali
  • Frederickson (2024) Ben Frederickson. 2024. https://github.com/benfred/py-spy
  • Galanis et al. (2008) Leonidas Galanis, Supiti Buranawatanachoke, Romain Colle, Benoît Dageville, Karl Dias, Jonathan Klein, Stratos Papadomanolakis, Leng Leng Tan, Venkateshwaran Venkataramani, Yujun Wang, and Graham Wood. 2008. Oracle Database Replay. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (Vancouver, Canada) (SIGMOD ’08). Association for Computing Machinery, New York, NY, USA, 1159–1170. https://doi.org/10.1145/1376616.1376732
  • Gan et al. (2020) Yifan Gan, Xueyuan Ren, Drew Ripberger, Spyros Blanas, and Yang Wang. 2020. IsoDiff: Debugging Anomalies Caused by Weak Isolation. Proc. VLDB Endow. 13, 12 (Jul 2020), 2773–2786. https://doi.org/10.14778/3407790.3407860
  • Geng et al. (2024a) Chujun Geng, Spyros Blanas, Michael D. Bond, and Yang Wang. 2024a. IsoPredict artifact. https://doi.org/10.5281/zenodo.10802748
  • Geng et al. (2024b) Chujun Geng, Spyros Blanas, Michael D. Bond, and Yang Wang. 2024b. IsoPredict implementation. https://github.com/PLaSSticity/IsoPredict-implementation
  • Ghafoor et al. (2016) M. Ghafoor, M. Mahmood, and J. Siddiqui. 2016. Effective Partial Order Reduction in Model Checking Database Applications. In 2016 IEEE International Conference on Software Testing, Verification and Validation (ICST). IEEE Computer Society, Los Alamitos, CA, USA, 146–156. https://doi.org/10.1109/ICST.2016.25
  • Gilbert and Lynch (2002) Seth Gilbert and Nancy Lynch. 2002. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News 33 (June 2002), 51–59. Issue 2. https://doi.org/10.1145/564585.564601
  • Hamza (2015) Jad Hamza. 2015. Algorithmic Verification of Concurrent and Distributed Data Structures. Ph.D. Dissertation. Université Paris Diderot.
  • Huang et al. (2014) Jeff Huang, Patrick O’Neil Meredith, and Grigore Rosu. 2014. Maximal sound predictive race detection with control flow abstraction. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (Edinburgh, United Kingdom) (PLDI ’14). Association for Computing Machinery, New York, NY, USA, 337–348. https://doi.org/10.1145/2594291.2594315
  • Kaki et al. (2018) Gowtham Kaki, Kapil Earanky, KC Sivaramakrishnan, and Suresh Jagannathan. 2018. Safe replication through bounded concurrency verification. Proc. ACM Program. Lang. 2, OOPSLA, Article 164 (Oct 2018), 27 pages. https://doi.org/10.1145/3276534
  • Kingsbury and Alvaro (2020) Kyle Kingsbury and Peter Alvaro. 2020. Elle: Inferring Isolation Anomalies from Experimental Observations. Proc. VLDB Endow. 14, 3 (Nov 2020), 268–280. https://doi.org/10.14778/3430915.3430918
  • Kini et al. (2017) Dileep Kini, Umang Mathur, and Mahesh Viswanathan. 2017. Dynamic race prediction in linear time. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (Barcelona, Spain) (PLDI 2017). Association for Computing Machinery, New York, NY, USA, 157–170. https://doi.org/10.1145/3062341.3062374
  • Leino and Pit-Claudel (2016) K. R. M. Leino and Clément Pit-Claudel. 2016. Trigger Selection Strategies to Stabilize Program Verifiers. In Computer Aided Verification, Swarat Chaudhuri and Azadeh Farzan (Eds.). Springer International Publishing, Cham, 361–381.
  • Li et al. (2023) Qian Li, Peter Kraft, Michael Cafarella, Çağatay Demiralp, Goetz Graefe, Christos Kozyrakis, Michael Stonebraker, Lalith Suresh, Xiangyao Yu, and Matei Zaharia. 2023. R3: Record-Replay-Retroaction for Database-Backed Applications. Proc. VLDB Endow. 16, 11 (Jul 2023), 3085–3097. https://doi.org/10.14778/3611479.3611510
  • Mahajan et al. (2011) P. Mahajan, L. Alvisi, and M. Dahlin. 2011. Consistency, Availability, Convergence. Technical Report TR-11-22. Computer Science Department, University of Texas at Austin.
  • MySQL (2023a) MySQL 2023a. http://www.mysql.com
  • MySQL (2023b) MySQL 2023b. MySQL Cluster. https://www.mysql.com/products/cluster/
  • Nagar and Jagannathan (2018) Kartik Nagar and Suresh Jagannathan. 2018. Automated Detection of Serializability Violations under Weak Consistency. arXiv:1806.08416 [cs.PL]
  • Pavlo (2017) Andrew Pavlo. 2017. What Are We Doing With Our Lives? Nobody Cares About Our Concurrency Control Research. In Proceedings of the 2017 ACM International Conference on Management of Data (Chicago, Illinois, USA) (SIGMOD ’17). Association for Computing Machinery, New York, NY, USA, 3. https://doi.org/10.1145/3035918.3056096
  • perf (2024) perf 2024. https://perf.wiki.kernel.org/index.php/Main_Page
  • Perrin et al. (2016) Matthieu Perrin, Achour Mostefaoui, and Claude Jard. 2016. Causal Consistency: Beyond Memory. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Barcelona, Spain) (PPoPP ’16). Association for Computing Machinery, New York, NY, USA, Article 26, 12 pages. https://doi.org/10.1145/2851141.2851170
  • Rahmani et al. (2019) Kia Rahmani, Kartik Nagar, Benjamin Delaware, and Suresh Jagannathan. 2019. CLOTHO: Directed Test Generation for Weakly Consistent Database Systems. Proc. ACM Program. Lang. 3, OOPSLA, Article 117 (Oct 2019), 28 pages. https://doi.org/10.1145/3360543
  • Roemer et al. (2020) Jake Roemer, Kaan Genç, and Michael D. Bond. 2020. SmartTrack: efficient predictive race detection. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (London, UK) (PLDI 2020). Association for Computing Machinery, New York, NY, USA, 747–762. https://doi.org/10.1145/3385412.3385993
  • Said et al. (2011) Mahmoud Said, Chao Wang, Zijiang Yang, and Karem Sakallah. 2011. Generating Data Race Witnesses by an SMT-Based Analysis. In NASA Formal Methods, Mihaela Bobaru, Klaus Havelund, Gerard J. Holzmann, and Rajeev Joshi (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 313–327. https://doi.org/10.1007/978-3-642-20398-5_23
  • Sinha et al. (2012) Arnab Sinha, Sharad Malik, Chao Wang, and Aarti Gupta. 2012. Predicting Serializability Violations: SMT-Based Search vs. DPOR-Based Search. In Hardware and Software: Verification and Testing, Kerstin Eder, João Lourenço, and Onn Shehory (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 95–114. https://doi.org/10.1007/978-3-642-34188-5_11
  • Snowflake (2023) Snowflake 2023. Snowflake transactions. https://docs.snowflake.com/en/sql-reference/transactions
  • Tan et al. (2020) Cheng Tan, Changgeng Zhao, Shuai Mu, and Michael Walfish. 2020. COBRA: making transactional key-value stores verifiably serializable. In Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation (OSDI’20). USENIX Association, USA, Article 4, 18 pages. https://www.usenix.org/conference/osdi20/presentation/tan
  • Tang et al. (2022) Chuzhe Tang, Zhaoguo Wang, Xiaodong Zhang, Qianmian Yu, Binyu Zang, Haibing Guan, and Haibo Chen. 2022. Ad Hoc Transactions in Web Applications: The Good, the Bad, and the Ugly. In Proceedings of the 2022 International Conference on Management of Data (Philadelphia, PA, USA) (SIGMOD ’22). Association for Computing Machinery, New York, NY, USA, 4–18. https://doi.org/10.1145/3514221.3526120
  • Tunç et al. (2023) Hünkar Can Tunç, Umang Mathur, Andreas Pavlogiannis, and Mahesh Viswanathan. 2023. Sound Dynamic Deadlock Prediction in Linear Time. Proc. ACM Program. Lang. 7, PLDI, Article 177 (Jun 2023), 26 pages. https://doi.org/10.1145/3591291
  • Warszawski and Bailis (2017) Todd Warszawski and Peter Bailis. 2017. ACIDRain: Concurrency-Related Attacks on Database-Backed Web Applications. In Proceedings of the 2017 ACM International Conference on Management of Data (Chicago, Illinois, USA) (SIGMOD ’17). ACM, New York, NY, USA, 5–20. https://doi.org/10.1145/3035918.3064037
  • Zennou et al. (2022) Rachid Zennou, Ranadeep Biswas, Ahmed Bouajjani, Constantin Enea, and Mohammed Erradi. 2022. Checking Causal Consistency of Distributed Databases. Computing 104, 10 (Oct 2022), 2181–2201. https://doi.org/10.1007/s00607-021-00911-3
  • Zhang et al. (2023) Jian Zhang, Ye Ji, Shuai Mu, and Cheng Tan. 2023. Viper: A Fast Snapshot Isolation Checker. In Proceedings of the Eighteenth European Conference on Computer Systems (Rome, Italy) (EuroSys ’23). Association for Computing Machinery, New York, NY, USA, 654–671. https://doi.org/10.1145/3552326.3567492

Appendix A Proof that Anti-Dependency Implies Commit Order

Here we prove the following claim from §4.2.2: Anti-dependency order must imply commit order, i.e., $\mathit{rw} \subseteq \mathit{co}$ for every valid $\mathit{co}$. The proof proceeds by showing that violating anti-dependency order violates arbitration order:

Proof.

Suppose there exist $t_1, t_2$ such that $\mathit{rw}(t_1, t_2)$ but $\neg\mathit{co}(t_1, t_2)$. By the definition of anti-dependency, let $k$ be a key and $t_w$ be a transaction such that $t_2$ writes $k$, $\mathit{wr}_k(t_w, t_1)$, and $\mathit{co}(t_w, t_2)$. Because $\neg\mathit{co}(t_1, t_2)$ and $\mathit{co}$ is a total order, $\mathit{co}(t_2, t_1)$ holds. Then $\mathit{co}(t_2, t_w)$ according to the arbitration rule (Equation 1). However, $\mathit{co}(t_2, t_w)$ contradicts $\mathit{co}(t_w, t_2)$ since $\mathit{co}$ is a total order. ∎
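The claim can also be sanity-checked by brute force on a tiny history. The sketch below is only an illustration under stated assumptions: the arbitration rule is taken to be "if $r$ reads $k$ from $w$, and $t$ also writes $k$ with $\mathit{co}(t, r)$, then $\mathit{co}(t, w)$" (Equation 1 is not reproduced in this section), commit orders are additionally required to extend write-read order, and the history representation and function names are hypothetical.

```python
from itertools import permutations

def valid_commit_orders(txns, wr, writes):
    """Yield position maps for total orders satisfying the assumed arbitration rule."""
    for order in permutations(txns):
        pos = {t: i for i, t in enumerate(order)}
        # Assumption: the commit order extends write-read order.
        if any(pos[w] >= pos[r] for _, w, r in wr):
            continue
        # Assumed arbitration rule: another writer t of k that commits
        # before reader r must also commit before the observed writer w.
        if all(pos[t] < pos[w]
               for k, w, r in wr
               for t in txns
               if t not in (w, r) and k in writes.get(t, set())
               and pos[t] < pos[r]):
            yield pos

def anti_dependencies(pos, txns, wr, writes):
    """rw(t1, t2): t1 reads k from some tw, t2 writes k, and co(tw, t2)."""
    return {(r, t)
            for k, w, r in wr
            for t in txns
            if t != r and t != w and k in writes.get(t, set())
            and pos[w] < pos[t]}

def check_rw_implies_co(txns, wr, writes):
    """Return (number of valid commit orders, whether rw is a subset of co in all)."""
    count, holds = 0, True
    for pos in valid_commit_orders(txns, wr, writes):
        count += 1
        for t1, t2 in anti_dependencies(pos, txns, wr, writes):
            if not pos[t1] < pos[t2]:
                holds = False
    return count, holds

# t0 writes x, t1 reads x from t0, t2 overwrites x.
print(check_rw_implies_co(["t0", "t1", "t2"],
                          {("x", "t0", "t1")},
                          {"t0": {"x"}, "t2": {"x"}}))
```

A check like this cannot replace the proof, since it only covers the histories it is given, but it makes the roles of $t_w$, $t_1$, and $t_2$ concrete.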

Appendix B IsoPredict’s Full Constraints using the Prediction Boundary

This section shows the constraints generated by IsoPredict’s predictive analysis using the strict prediction boundary. For completeness we show all constraints generated by IsoPredict, including those that are unchanged compared with §4.

B.1. Encoding of Feasible Execution

t1,t2T,t1t2,ϕ𝑠𝑜(t1,t2)\displaystyle\hbox{\multirowsetup$\forall t_{1},t_{2}\in T,t_{1}\neq t_{2},% \quad$}\quad\boxed{\phi_{\mathit{so}}(t_{1},t_{2})}∀ italic_t start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_t start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ∈ italic_T , italic_t start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ≠ italic_t start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , start_ARG italic_ϕ start_POSTSUBSCRIPT italic_so end_POSTSUBSCRIPT ( italic_t start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_t start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) end_ARG if 𝑠𝑜(t1,t2)if 𝑠𝑜subscript𝑡1subscript𝑡2\displaystyle\quad\textnormal{if }\mathit{so}(t_{1},t_{2})if italic_so ( italic_t start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_t start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT )
¬ϕ𝑠𝑜(t1,t2)subscriptitalic-ϕ𝑠𝑜subscript𝑡1subscript𝑡2\displaystyle\boxed{\neg\phi_{\mathit{so}}(t_{1},t_{2})}¬ italic_ϕ start_POSTSUBSCRIPT italic_so end_POSTSUBSCRIPT ( italic_t start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_t start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) otherwise
\[
\forall t_1, t_2 \in T,\ t_1 \neq t_2,\ \forall i \in \mathit{rdpos}_k(t_2),\ t_2\textnormal{'s read at pos } i \textnormal{ reads from } t_1 \textnormal{ in } \mathit{wr}_{\mathit{obs}}, \quad \boxed{\phi_{\mathit{obs}}(s_2, i) = t_1}
\]
\[
\forall k \textnormal{ is a key},\ \forall t_1 \textnormal{ writes } k,\ \forall t_2 \textnormal{ reads } k,\ \forall i \in \mathit{rdpos}_k(t_2), \quad \boxed{i < \phi_{\mathit{boundary}}(s_2) \implies \phi_{\mathit{choice}}(s_2, i) = \phi_{\mathit{obs}}(s_2, i)}
\]

where $s_1$ is $t_1$'s session and $s_2$ is $t_2$'s session.

\[
\begin{aligned}
&\forall k \textnormal{ is a key},\ \forall t_1 \textnormal{ writes } k,\ \forall t_2 \neq t_1 \textnormal{ reads } k,\ \forall i \in \mathit{rdpos}_k(t_2), \\
&\boxed{\phi_{\mathit{choice}}(s_2, i) = t_1 \land i \leq \phi_{\mathit{boundary}}(s_2) \implies \mathit{wrpos}_k(t_1) < \phi_{\mathit{boundary}}(s_1)}
\end{aligned}
\]

where $s_1$ is $t_1$'s session and $s_2$ is $t_2$'s session, and $\mathit{wrpos}_k(t)$ is the position of $t$'s last write to key $k$.

\[
\forall s \textnormal{ is a session}, \quad \boxed{\Big( \bigvee_{\substack{t \textnormal{ is a transaction in } s \\ i \in \mathit{rdpos}_k(t)}} \phi_{\mathit{boundary}}(s) = i \Big) \lor \phi_{\mathit{boundary}}(s) = \infty}
\]

Recall that $\mathit{rdpos}_k(t)$ is the set of positions of reads to $k$ in the transaction $t$.

\[
\begin{aligned}
&\forall k \textnormal{ is a key},\ \forall t_1 \textnormal{ writes } k,\ \forall t_2 \textnormal{ reads } k,\ t_1 \neq t_2, \\
&\boxed{\phi_{\mathit{wr}_k}(t_1, t_2) = \bigvee_{i \in \mathit{rdpos}_k(t_2)} \phi_{\mathit{choice}}(s_2, i) = t_1 \land i \leq \phi_{\mathit{boundary}}(s_2)}
\end{aligned}
\]

where $s_2$ is $t_2$'s session.

\[
\forall t_1, t_2 \in T,\ t_1 \neq t_2, \quad \boxed{\phi_{\mathit{wr}}(t_1, t_2) = \bigvee_{k \textnormal{ is a key}} \phi_{\mathit{wr}_k}(t_1, t_2)}
\]
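To make the roles of $\phi_{\mathit{choice}}$ and $\phi_{\mathit{boundary}}$ concrete, the following minimal Python sketch evaluates $\phi_{\mathit{wr}_k}$ and $\phi_{\mathit{wr}}$ for fixed values instead of solving for them. The dictionary layout (`rdpos`, `choice`, `boundary`, `session_of`) and the three-transaction history are illustrative assumptions, not part of IsoPredict's implementation.

```python
import math

def wr_k(t1, t2, key, rdpos, choice, boundary, session_of):
    """phi_wr_k(t1, t2): some read of `key` in t2, at or before the boundary
    of t2's session, reads from t1."""
    s2 = session_of[t2]
    return any(
        choice.get((s2, i)) == t1 and i <= boundary[s2]
        for i in rdpos.get((t2, key), ())
    )

def wr(t1, t2, keys, rdpos, choice, boundary, session_of):
    """phi_wr(t1, t2): the disjunction of phi_wr_k over all keys."""
    return any(wr_k(t1, t2, k, rdpos, choice, boundary, session_of) for k in keys)

# Hypothetical history: t2 (session "B") reads y at position 1 from an initial
# transaction t0, then reads x at position 3 from t1 (session "A").
session_of = {"t0": "init", "t1": "A", "t2": "B"}
rdpos = {("t2", "y"): [1], ("t2", "x"): [3]}
choice = {("B", 1): "t0", ("B", 3): "t1"}

# Boundary at infinity: every read of session B counts.
no_divergence = {"init": math.inf, "A": math.inf, "B": math.inf}
# Boundary at position 1: the read of x at position 3 lies past it.
cut_at_1 = {"init": math.inf, "A": math.inf, "B": 1}

print(wr("t1", "t2", ["x", "y"], rdpos, choice, no_divergence, session_of))  # True
print(wr("t1", "t2", ["x", "y"], rdpos, choice, cut_at_1, session_of))       # False
```

Moving the boundary of session B from $\infty$ to position 1 removes the $t_1 \to t_2$ edge, because the only read that justified it lies past the boundary and is ignored by the formula.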

B.2. Encoding of Unserializability

B.2.1. Precise encoding

\[
\boxed{\forall \phi_{\mathit{co}},\ \neg \mathit{IsSerializable}(\phi_{\mathit{co}})}
\]

where $\mathit{IsSerializable}$ is defined as follows:

\[
\begin{aligned}
\mathit{IsSerializable}(\phi_{\mathit{co}}) \coloneq\; & \mathit{Distinct}(\phi_{\mathit{co}}(t_1), \dots, \phi_{\mathit{co}}(t_n)) \;\land \\
& \bigwedge_{\forall t_1, t_2 \in T,\ t_1 \neq t_2} \big(\phi_{\mathit{so}}(t_1, t_2) \lor \phi_{\mathit{wr}}(t_1, t_2) \lor \mathit{Arbitration}(t_1, t_2)\big) \Rightarrow \phi_{\mathit{co}}(t_1) < \phi_{\mathit{co}}(t_2)
\end{aligned}
\]

where $t_1, \dots, t_n$ are all transactions in $T$, and $\mathit{Distinct}(v_1, \dots, v_k)$ is a built-in SMT function that requires all input values to be distinct from each other.

\[
\mathit{Arbitration}(t_1, t_2) \coloneq \bigvee_{\substack{\forall k,\ t_1 \textnormal{ and } t_2 \textnormal{ write } k \\ \forall t_3 \in T \setminus \{t_1, t_2\},\ t_3 \textnormal{ reads } k}} \phi_{\mathit{wr}_k}(t_2, t_3) \land \phi_{\mathit{co}}(t_1) < \phi_{\mathit{co}}(t_3) \land \mathit{wrpos}_k(t_1) < \phi_{\mathit{boundary}}(s_1)
\]
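For very small histories, the universally quantified constraint can be checked by brute force, which may help when reading the definition. The sketch below is not the SMT encoding: it enumerates every permutation of the transactions as a candidate commit order (so $\mathit{Distinct}$ holds by construction) and assumes $\phi_{\mathit{boundary}} = \infty$ for every session, so the $\mathit{wrpos}_k(t_1) < \phi_{\mathit{boundary}}(s_1)$ conjunct is always true. The function names and the two example histories are hypothetical.

```python
from itertools import permutations

def is_serializable(co, txns, so, wr_by_key, writers):
    """IsSerializable(co) for a concrete commit order `co` (txn -> int)."""
    wr = set().union(*wr_by_key.values())

    def arbitration(t1, t2):
        # Some t3 reads k from t2 while t1, another writer of k, precedes t3.
        for k, ws in writers.items():
            if t1 in ws and t2 in ws:
                for (w, t3) in wr_by_key.get(k, ()):
                    if w == t2 and t3 not in (t1, t2) and co[t1] < co[t3]:
                        return True
        return False

    for t1 in txns:
        for t2 in txns:
            if t1 == t2:
                continue
            if (t1, t2) in so or (t1, t2) in wr or arbitration(t1, t2):
                if not co[t1] < co[t2]:
                    return False
    return True

def is_unserializable(txns, so, wr_by_key, writers):
    """True iff no commit order satisfies IsSerializable."""
    return not any(
        is_serializable({t: i for i, t in enumerate(p)}, txns, so, wr_by_key, writers)
        for p in permutations(txns)
    )

txns = ["t0", "t1", "t2"]
writers = {"x": {"t0", "t1", "t2"}}
# Lost update: t1 and t2 both read x from t0, and both then write x.
lost_update = {"x": {("t0", "t1"), ("t0", "t2")}}
# Chain: t1 reads x from t0, and t2 reads x from t1.
chain = {"x": {("t0", "t1"), ("t1", "t2")}}

print(is_unserializable(txns, set(), lost_update, writers))  # True
print(is_unserializable(txns, set(), chain, writers))        # False
```

The enumeration is factorial in the number of transactions, so it is only useful as a reading aid for histories with a handful of transactions.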

B.2.2. Approximate encoding

\[
\begin{aligned}
&\forall t_1, t_2 \in T,\ t_1 \neq t_2, \\
&\phi_{\mathit{ww}}(t_1, t_2) = \bigvee_{\substack{\forall k,\ t_1 \textnormal{ and } t_2 \textnormal{ write } k \\ \forall t_3 \in T \setminus \{t_1, t_2\},\ t_3 \textnormal{ reads } k}} \phi_{\mathit{wr}_k}(t_2, t_3) \land \phi_{\mathit{pco}}(t_1, t_3) \land \mathit{rank}(t_1, t_2) > \mathit{rank}(t_1, t_3) \land \mathit{wrpos}_k(t_1) < \phi_{\mathit{boundary}}(s_1) \\
&\phi_{\mathit{rw}}(t_1, t_2) = \bigvee_{\substack{\forall k,\ t_1 \textnormal{ reads } k \land t_2 \textnormal{ writes } k \\ \forall t_3 \in T \setminus \{t_1, t_2\},\ t_3 \textnormal{ writes } k}} \phi_{\mathit{wr}_k}(t_3, t_1) \land \phi_{\mathit{pco}}(t_3, t_2) \land \mathit{rank}(t_1, t_2) > \mathit{rank}(t_3, t_2) \land \mathit{wrpos}_k(t_2) < \phi_{\mathit{boundary}}(s_2) \\
&\phi_{\mathit{pco}}(t_1, t_2) = \phi_{\mathit{so}}(t_1, t_2) \lor \phi_{\mathit{wr}}(t_1, t_2) \lor \phi_{\mathit{ww}}(t_1, t_2) \lor \phi_{\mathit{rw}}(t_1, t_2) \;\lor \\
&\qquad \bigvee_{t \in T \setminus \{t_1, t_2\}} \phi_{\mathit{pco}}(t_1, t) \land \phi_{\mathit{pco}}(t, t_2) \land \mathit{rank}(t_1, t_2) > \mathit{rank}(t_1, t) \land \mathit{rank}(t_1, t_2) > \mathit{rank}(t, t_2)
\end{aligned}
\]
\[
\boxed{\bigvee_{\forall t_1, t_2 \in T,\ t_1 \neq t_2} \phi_{\mathit{pco}}(t_1, t_2) \land \phi_{\mathit{pco}}(t_2, t_1)}
\]
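When $\phi_{\mathit{wr}_k}$ is already fixed, the mutually recursive definitions of $\phi_{\mathit{ww}}$, $\phi_{\mathit{rw}}$, and $\phi_{\mathit{pco}}$ can be read as rules that are applied until nothing new is derived. The Python sketch below computes that least fixed point for a concrete history and then tests the boxed cycle condition. It has no counterpart for $\mathit{rank}$: building the relation upward from $\mathit{so} \cup \mathit{wr}$ only adds an edge justified by edges added earlier, which is one way to interpret the well-foundedness that the rank comparisons enforce (an interpretation, not a statement from the text). It also assumes $\phi_{\mathit{boundary}} = \infty$ everywhere, and all names are hypothetical.

```python
def predict_pco(so, wr_by_key, writers):
    """Least fixed point of the pco rules for a fixed write-read relation."""
    pco = set(so).union(*wr_by_key.values())
    while True:
        new = set()
        for k, pairs in wr_by_key.items():
            for (w, r) in pairs:              # r reads k from w
                for t in writers.get(k, ()):  # t is another writer of k
                    if t in (w, r):
                        continue
                    if (t, r) in pco:         # ww rule: t before r, so t before w
                        new.add((t, w))
                    if (w, t) in pco:         # rw rule: w before t, so r before t
                        new.add((r, t))
        # Transitivity rule, restricted to pairs of distinct transactions.
        new |= {(a, d) for (a, b) in pco for (c, d) in pco if b == c and a != d}
        if new <= pco:
            return pco
        pco |= new

def predicts_unserializable(so, wr_by_key, writers):
    """Boxed condition: some pair is ordered both ways by pco."""
    pco = predict_pco(so, wr_by_key, writers)
    return any((b, a) in pco for (a, b) in pco)

writers = {"x": {"t0", "t1", "t2"}}
# Lost update: t1 and t2 both read x from t0, and both then write x.
lost_update = {"x": {("t0", "t1"), ("t0", "t2")}}
# Chain: t1 reads x from t0, and t2 reads x from t1.
chain = {"x": {("t0", "t1"), ("t1", "t2")}}

print(predicts_unserializable(set(), lost_update, writers))  # True
print(predicts_unserializable(set(), chain, writers))        # False
```

In the lost-update history, the rw rule derives both $t_1 \to t_2$ and $t_2 \to t_1$ from the two reads of $t_0$'s write, which is exactly the two-way ordering the boxed constraint looks for.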

B.3. Encoding of Weak Isolation

B.3.1. Causal consistency

\[
\begin{aligned}
&\forall t_1, t_2 \in T,\ t_1 \neq t_2, \\
&\boxed{\phi_{\mathit{hb}}(t_1, t_2) = \phi_{\mathit{so}}(t_1, t_2) \lor \phi_{\mathit{wr}}(t_1, t_2) \lor \bigvee_{\forall t \in T \setminus \{t_1, t_2\}} \phi_{\mathit{hb}}(t_1, t) \land \phi_{\mathit{hb}}(t, t_2)} \\
&\phi_{\mathit{ww}_{\mathit{causal}}}(t_1, t_2) = \bigvee_{\substack{\forall k,\ t_1 \textnormal{ and } t_2 \textnormal{ write } k \\ \forall t_3 \in T \setminus \{t_1, t_2\},\ t_3 \textnormal{ reads } k}} \phi_{\mathit{wr}_k}(t_2, t_3) \land \phi_{\mathit{hb}}(t_1, t_3) \land \mathit{wrpos}_k(t_1) < \phi_{\mathit{boundary}}(s_1) \\
&\boxed{\phi_{\mathit{hb}}(t_1, t_2) \lor \phi_{\mathit{ww}_{\mathit{causal}}}(t_1, t_2) \;\Rightarrow\; \phi_{\mathit{co}_{\mathit{causal}}}(t_1) < \phi_{\mathit{co}_{\mathit{causal}}}(t_2)}
\end{aligned}
\]
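For a concrete history, the causal constraints are satisfiable exactly when the graph $\mathit{hb} \cup \mathit{ww}_{\mathit{causal}}$ is acyclic, since any topological order of an acyclic graph yields integers that respect every edge. The sketch below builds both relations and asks Python's `graphlib` for such an order. It assumes $\phi_{\mathit{boundary}} = \infty$ for every session; the helper names and example histories are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError

def transitive_closure(edges):
    closure = set(edges)
    while True:
        extra = {(a, d) for (a, b) in closure for (c, d) in closure
                 if b == c and a != d}
        if extra <= closure:
            return closure
        closure |= extra

def causal_commit_order(txns, so, wr_by_key, writers):
    """Return a commit order (txn -> int) witnessing causal consistency,
    or None if hb together with ww_causal contains a cycle."""
    hb = transitive_closure(set(so).union(*wr_by_key.values()))
    # ww_causal(t1, w): r reads k from w, t1 also writes k, and t1 hb r.
    ww = {(t1, w)
          for k, pairs in wr_by_key.items()
          for (w, r) in pairs
          for t1 in writers.get(k, ())
          if t1 not in (w, r) and (t1, r) in hb}
    preds = {t: set() for t in txns}
    for (a, b) in hb | ww:
        preds[b].add(a)
    try:
        order = list(TopologicalSorter(preds).static_order())
    except CycleError:
        return None
    return {t: i for i, t in enumerate(order)}

txns = ["t0", "t1", "t2"]
# Lost update: t1 and t2 both read x from t0, and both then write x.
lost_update = dict(so=set(), wr_by_key={"x": {("t0", "t1"), ("t0", "t2")}},
                   writers={"x": {"t0", "t1", "t2"}})
# Stale read: t1 follows t0 in session order and overwrites x; t2 reads y
# from t1 but still reads x from t0.
stale_read = dict(so={("t0", "t1")},
                  wr_by_key={"y": {("t1", "t2")}, "x": {("t0", "t2")}},
                  writers={"x": {"t0", "t1"}, "y": {"t1"}})

print(causal_commit_order(txns, **lost_update) is not None)  # True
print(causal_commit_order(txns, **stale_read))               # None
```

The lost-update history is admitted here even though it has no serial order, which is the kind of causal yet unserializable execution the analysis searches for.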

B.3.2. Read committed

\[
\begin{aligned}
&\forall t_1, t_2 \in T,\ t_1 \neq t_2, \\
&\phi_{\mathit{ww}_{\mathit{rc}}}(t_1, t_2) = \bigvee_{\substack{\forall k,\ t_1 \textnormal{ and } t_2 \textnormal{ write } k \\ \forall t_3 \in T \setminus \{t_1, t_2\},\ t_3 \textnormal{ reads } k \\ \forall i \in \mathit{rdpos}_{\ast}(t_3),\ \forall j \in \mathit{rdpos}_k(t_3),\ i < j}} \phi_{\mathit{choice}}(s_3, i) = t_1 \land \phi_{\mathit{choice}}(s_3, j) = t_2 \land j \leq \phi_{\mathit{boundary}}(s_3)
\end{aligned}
\]

where $\mathit{rdpos}_{\ast}(t)$ is the set of positions of read events in transaction $t$, $\mathit{rdpos}_k(t)$ is the set of reads to $k$ in transaction $t$, and $s_3$ is $t_3$'s session.

\[
\boxed{\phi_{\mathit{hb}}(t_1, t_2) \lor \phi_{\mathit{ww}_{\mathit{rc}}}(t_1, t_2) \;\Rightarrow\; \phi_{\mathit{co}_{\mathit{rc}}}(t_1) < \phi_{\mathit{co}_{\mathit{rc}}}(t_2)}
\]
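The read committed arbitration edge depends on the order of reads inside a single transaction, which a small concrete evaluation makes visible. The sketch below computes $\phi_{\mathit{ww}_{\mathit{rc}}}(t_1, t_2)$ for fixed values of $\phi_{\mathit{choice}}$ and $\phi_{\mathit{boundary}}$. The `reads` layout (each transaction mapped to a list of `(position, key)` pairs) and the example history are assumptions made for illustration.

```python
import math

def ww_rc(t1, t2, reads, choice, boundary, session_of, writers):
    """phi_ww_rc(t1, t2): some t3 reads from t1 at position i and later, at a
    position j <= boundary, reads a key k from t2 that t1 also writes."""
    for t3, events in reads.items():
        if t3 in (t1, t2):
            continue
        s3 = session_of[t3]
        for (j, k) in events:
            ws = writers.get(k, set())
            if t1 not in ws or t2 not in ws:
                continue
            if choice.get((s3, j)) != t2 or j > boundary[s3]:
                continue
            # Any earlier read of t3 (on any key) that reads from t1.
            if any(i < j and choice.get((s3, i)) == t1 for (i, _) in events):
                return True
    return False

# Hypothetical history: t1 writes x and y, t2 writes x; t3 (session "C")
# reads y from t1 at position 0, then reads x from t2 at position 1.
session_of = {"t1": "A", "t2": "B", "t3": "C"}
writers = {"x": {"t1", "t2"}, "y": {"t1"}}
reads = {"t3": [(0, "y"), (1, "x")]}
choice = {("C", 0): "t1", ("C", 1): "t2"}
unbounded = {"A": math.inf, "B": math.inf, "C": math.inf}
cut_at_0 = {"A": math.inf, "B": math.inf, "C": 0}

print(ww_rc("t1", "t2", reads, choice, unbounded, session_of, writers))  # True
print(ww_rc("t2", "t1", reads, choice, unbounded, session_of, writers))  # False
print(ww_rc("t1", "t2", reads, choice, cut_at_0, session_of, writers))   # False
```

Because $t_3$ has already observed $t_1$ before it reads $x$ from $t_2$, the commit order must place $t_1$ before $t_2$; the edge disappears once the later read falls past the session's boundary.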

Appendix C Patterns of Observed and Predicted Executions

Figure 10 shows several observed executions and their unserializable predictions from our experiments. The actual executions consist of dozens of transactions and thousands of events, but the figures show only the transactions and events relevant to predicting unserializable behavior.

(a) An observed execution of Smallbank.
(b) A predicted execution based on (a).
(c) An observed execution of Smallbank.
(d) A predicted execution based on (c).
(e) An observed execution of TPC-C.
(f) A predicted execution based on (e).
(g) An observed execution of TPC-C.
(h) A predicted execution based on (g).
Figure 10. Observed executions that resulted in causal (and thus rc), unserializable predicted executions.