
Ultraverse: Efficient Retroactive Operation for Attack Recovery in Database Systems and Web Frameworks

Ronny Ko 1,4, Chuan Xiao 2,3, Makoto Onizuka 2, Yihe Huang 1, Zhiqiang Lin 4
1 Harvard University, 2 Osaka University, 3 Nagoya University, 4 Ohio State University

{hrko,yihehuang,mickens}@g.harvard.edu, {chuanx,onizuka}@ist.osaka-u.ac.jp

Abstract

Retroactive operation is an operation that changes a past operation in a series of committed ones (e.g., cancelling the past insertion of '5' into a queue committed at t=3). Retroactive operation has many important security applications such as attack recovery or private data removal (e.g., for GDPR compliance). While prior efforts designed retroactive algorithms for low-level data structures (e.g., queue, set), none explored retroactive operation at higher levels, such as database systems or web applications. This is challenging because: (i) the SQL semantics of database systems is complex; (ii) data records can flow through various web application components, such as HTML's DOM trees, server-side user request handlers, and client-side JavaScript code. We propose Ultraverse, the first retroactive operation framework, comprised of two components: a database system and a web application framework. The former enables users to retroactively change committed SQL queries; the latter does the same for web applications while preserving the correctness of application semantics. Our experimental results show that Ultraverse achieves a 10.5x∼693.3x speedup on retroactive database updates compared to a regular DBMS's flashback & redo.

    1: function SendMoney (sender, receiver, amount):
    2:   var sender_balance = SQL_exec("SELECT balance
    3:                                  FROM Accounts WHERE id = " + sender)
    4:   if (sender_balance >= amount):
    5:     sender_balance -= amount
    6:     SQL_exec("UPDATE Accounts SET balance = "
    7:              + sender_balance + " WHERE id = " + sender)
    8:     SQL_exec("UPDATE Accounts SET balance += "
    9:              + amount + " WHERE id = " + receiver)

Figure 1: An online banking server's money transfer handler.

1 Introduction

Modern web application services are exposed to various remote attacks such as SQL injection, cross-site scripting, session hijacking, or even buffer overflows [27, 31, 41]. To recover from the attack damages, the application code often needs to be reconfigured or patched, and the application's state polluted by the attack has to be retracted. Retroactive operation is particularly important for a service that hosts many users and manages their financial assets, because their financial data may have been affected by the attack and put into an invalid state.

Concretely, Figure 1 shows an example of an online banking web server's user request handler that transfers money from one user's account to another. The server queries the Accounts table to check if the sender has enough balance; if so, the server subtracts the transferred amount from the sender's balance and adds it to the receiver's balance. But suppose that some SendMoney request was initiated by a remote attacker (e.g., via request forgery or session hijacking). This will initially corrupt the server's Accounts table, and as time flows, its tampered data will flow to other tables and put the entire database into a corrupted state. To recover the database to a good state, the most naive and intuitive solution is the following: (1) roll back the database to just before the commit time of the malicious user request's SQL query; (2) either skip the malicious query or sanitize its tampered amount to a benign value; (3) replay all subsequent queries. But this naive solution suffers two problems: efficiency and correctness.

• Efficiency: It is critical for any financial institution to investigate an attack and resume its service in the shortest possible time, because every second of service downtime is financial loss. Given this, performing a naive rollback & replay of all past queries is an inefficient solution. In fact, it would be sufficient to selectively rollback & replay only those queries whose results are affected by the problematic query. Unfortunately, no prior work has proposed such an efficient database rollback-fix-replay technique covering all types of SQL semantics (i.e., a retroactive database system).

• Correctness: Even if there existed an efficient retroactive database system, it would not provide the correctness of the application state. This is because replaying only database queries does not replay and re-reflect what has occurred in the application code, which can lead to a database state that is incorrect from the application semantics. In Figure 1, the SendMoney handler is essentially an application-level transaction comprised of 3 SQL queries: SELECT, UPDATE1, and UPDATE2. Among them, UPDATE2 takes sender_balance as an input dynamically computed by the application code (line 5). If we only rollback-fix-replay the SQL logs of the DB system, this will not capture the application code's new value of the sender_balance variable that has to be recomputed during the replay phase, and will instead use the stale old value of sender_balance recorded in the prior UPDATE1 query's SQL log.

Since there exists no automated system-level technique to retroactively update both a database and application state, most of today's application service developers use a manual approach of hand-crafting compensating transactions [57], whose goal is to bring about the same effect as executing a retroactive operation on an application service. However, as a system's complexity grows, ensuring the correctness of compensating transactions is challenging [46], because the dependencies among SQL queries and data records become non-trivial for developers to comprehend. At the SQL level, although materialized views [11] can reflect changes in their base tables' derived computations, they cannot address complex cases such as: (i) views have circular dependencies; (ii) computations derived from base tables involve timely recurring queries [38]. Indeed, major financial institutions have suffered critical software glitches in compensating transactions [60].
Some traditional database system techniques are partially relevant to addressing these issues: temporal or versioning databases [34,
39] can retrieve past versions of records in tables; database provenance [36] traces the lineage of data records and speculates how a query's output would change if inputs to past queries were different; checkpoint and recovery techniques [61] roll back and replay databases by following logs; retroactive data structures [19] can retroactively add or remove past actions on a data object. Unfortunately, all existing techniques are problematic. First, their retroactive operations are either limited in functionality (i.e., they support only a small set of SQL semantics, without supporting TRIGGER or CONSTRAINT, for example) or inefficient (i.e., exhaustive rollback and replay of all committed queries). Second, no prior art addresses how to perform retroactive operations for web applications, which involve data flows outside SQL queries – this includes data flows in application code (e.g., a webpage's DOM tree, the client's JavaScript code, the server's user request handlers) as well as each client's local persistent storage/databases besides the server's own database.

To address these problems, we propose Ultraverse, which is composed of two components: a database system and a web application framework. The Ultraverse database system (§3) is designed to support efficient retroactive operations for queries with full SQL semantics. To enable this, Ultraverse uses five techniques: (i) it records column-wise read/write dependencies between queries during regular service operations and generates a query dependency graph; (ii) it further runs row-wise dependency analysis to group queries into different clusters such that any two queries from different clusters access different rows of tables, which means the two queries are mutually independent and unaffected; (iii) it uses the dependency graph to roll back and replay only those queries dependent on the target retroactive query both column-wise and row-wise; (iv) during the replay, it runs multiple queries accessing different columns in parallel for fast replay; (v) it uses Hash-jumper, which computes each table's state as a hash (to track its change in state) upon each query commit and uses these hashes to decide whether to skip unnecessary replay during retroactive actions (i.e., when it detects a table's hash match between the regular operation and the retroactive operation). Importantly, Ultraverse is seamlessly deployable to any SQL-based commodity database system, because Ultraverse's query analyzer and replay scheduler run with an unmodified database system.

The Ultraverse web application framework (§4) is designed to retroactively update the state of web applications whose data flows occur both inside and outside the database, with support for all essential functionalities common to modern web application frameworks. The Ultraverse framework provides developers with a uTemplate (Ultraverse template), inside which they define the application's user request handlers as SQL PROCEDUREs. The motivation of the uTemplate is to enforce all data flows of the application to occur only through SQL queries visible to the Ultraverse database. Using uTemplates also fundamentally limits the developers' available persistent storage to be only the Ultraverse database. During regular service operations, each client's user request logs are transparently sent to the server whenever she interacts with the server. The server replays these logs to mirror and maintain a copy of each client's local database, with which the server can retroactively update its server-side database even when all clients are offline. uTemplates are expressive enough to implement various web application logic (e.g., accessing the DOM tree to read user inputs and display results on the browser's screen). Apper (the application code generator) converts each uTemplate into its equivalent application-level user request handler and generates the final application code for service deployment.

We have implemented Ultraverse and compared its speed to MariaDB's rollback & replay after retroactive database update (§5). Ultraverse achieved a 10.5x∼693.3x speedup across various benchmarks. We also evaluated Ultraverse on a popular open-source web service (InvoiceNinja [17]) and machine-learning data analytics (Apache Hivemall [32]), where Ultraverse achieved a 333.5x∼695.6x speedup in retroactive operation. As a general-purpose retroactive database system and web framework, we believe Ultraverse can be used to fix corrupt states or simulate different states (i.e., what-if analysis [21]) of various web applications such as financial services, e-commerce, logistics, social networking, and data analytics.

Contributions. We make the following contributions:
• We designed the first (efficient) retroactive database system.
• We designed the first web application framework that supports retroactive operation while preserving application semantics.
• We developed and evaluated Ultraverse, our prototype of a retroactive database system and web application framework.

2 Overview

Problem Setup: Suppose an application service is comprised of one or more servers sharing the same database, and many clients. An attacker maliciously injected/tampered with an SQL query (or an application-level user request) of the application service, which corrupted the application's database state. All application data to recover are in the same database. We have identified the attacker-controlled SQL query (or user request) committed in the past.

Goal: Our goal is to automatically recover the application's corrupted database state by retroactively removing or sanitizing the attacker-controlled SQL query (or user request) committed in the past. This goal should be achieved: (i) efficiently, by minimizing the recovery delay; and (ii) correctly, not only with respect to the low-level SQL semantics, but also the high-level application semantics.

Retroactive Operation: Consider a database D, a set Q of queries Qi where i represents the query's commit order (i.e., query index), and let Q′τ be the target query to be retroactively added, removed, or changed at commit order τ within {Q1, Q2, ... Qτ ... Q|Q|} = Q. In case of retroactively adding a new query Q′τ, Q′τ is to be inserted (i.e., executed) right before Qτ. In case of retroactively removing the existing query Qτ (i.e., Q′τ = Qτ), Qτ is to be simply removed from the committed query list. In case of retroactively changing the existing query Qτ to Q′τ, Qτ is to be replaced by Q′τ. The retroactive operation on the target query Q′τ is equivalent to transforming D to a new state that matches the one generated by the following procedure:

(1) Rollback Phase: roll back D's state to commit index τ − 1 by rolling back Q|Q|, Q|Q|−1, ... Qτ+1, Qτ.
(2) Replay Phase: do one of the following:
  • To retroactively add Q′τ, execute Q′τ and then replay Qτ, ... Q|Q|.
  • To retroactively remove Q′τ, replay Qτ+1, ... Q|Q|.
  • To retroactively change Qτ to Q′τ, execute Q′τ and then replay Qτ+1, ... Q|Q|.

Instead of exhaustively rolling back and replaying all of Qτ, ... Q|Q|, Ultraverse picks only those queries whose results depend on the retroactive target query Q′τ. To do this, Ultraverse analyzes query dependencies based on the read and write sets of each query.
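The three retroactive modes and their rollback/replay phases can be sketched as follows. This is a toy model of our own (not Ultraverse's API): each query is a pure state-transforming function, and "rollback" is simulated by recomputing forward from the initial state rather than undoing commits.

```python
# A schematic sketch of the retroactive operation defined above.
# queries: list of functions state -> state, in 1-based commit order.
# tau: 1-based commit index of the target; mode: 'add', 'remove', or 'change'.

def retroactive(initial_state, queries, tau, mode, new_query=None):
    # Rollback phase: restore the state as of commit index tau - 1
    # (here simulated by replaying the prefix Q_1 ... Q_{tau-1}).
    state = initial_state
    for q in queries[:tau - 1]:
        state = q(state)
    # Replay phase: pick the tail according to the retroactive mode.
    if mode == "add":            # execute Q'_tau, then replay Q_tau ... Q_|Q|
        tail = [new_query] + queries[tau - 1:]
    elif mode == "remove":       # skip Q_tau, replay Q_{tau+1} ... Q_|Q|
        tail = queries[tau:]
    elif mode == "change":       # execute Q'_tau instead of Q_tau
        tail = [new_query] + queries[tau:]
    for q in tail:
        state = q(state)
    return state

queries = [lambda s: s + 10, lambda s: s * 2, lambda s: s - 5]
# Retroactively remove Q2 (the doubling): 0 + 10 - 5 = 5
assert retroactive(0, queries, 2, "remove") == 5
```

Ultraverse's contribution, described next, is avoiding the exhaustive tail replay that this naive sketch performs.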
[Figure 2 is a diagram of the Ultraverse database system's architecture: an unmodified database system (parser, optimizer, executor) serves clients' regular queries and records them to a query log; Ultraverse's new software components – the query analyzer (with its query analysis log and query dependency graph), Hash Jumper, and Redo Scheduler – silently record dependencies and table hashes during regular operation, and drive the rollback & replay steps of a retroactive operation.]

Figure 2: The Ultraverse database system's architecture.

    Q1: CREATE TABLE Users (uid INT PRIMARY, ssn INT UNIQUE)
        Read Set = { }, Write Set = { Users.* }
    Q2: CREATE TABLE Statements (sid INT PRIMARY AUTO_INCREMENT,
          aid INT FOREIGN KEY Accounts(aid), contents TEXT)
        Read Set = { Accounts.aid }, Write Set = { Statements.* }
    Q3: CREATE TABLE Accounts (aid INT PRIMARY, uid INT, balance INT,
          FOREIGN KEY uid -> Users(uid))
        Read Set = { Users.uid }, Write Set = { Accounts.* }
    Q4: CREATE TABLE Transactions (t_id INT PRIMARY AUTO_INCREMENT,
          sender INT FOREIGN KEY Accounts(aid),
          receiver INT FOREIGN KEY Accounts(aid),
          amount INT, time TIMESTAMP DEFAULT NOW())
        Read Set = { Accounts.aid, Users.uid }, Write Set = { Transactions.* }
    Q5: CREATE TRIGGER BalanceCheck BEFORE INSERT ON Transactions
          IF ((SELECT balance FROM Accounts WHERE aid = 0001) < 100) THEN
            SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Insufficient funds';
        Read Set = { Accounts.balance, Accounts.aid }, Write Set = { }
    Q6: INSERT INTO Users VALUES ('alice', 0204392)
        Read Set = { }, Write Set = { Users.* }
    Q7: INSERT INTO Users VALUES ('bob', 3804593)
        Read Set = { }, Write Set = { Users.* }
    Q8: INSERT INTO Accounts VALUES (0001, 'alice', 100)
        Read Set = { Users.uid }, Write Set = { Accounts.* }
    Q9: INSERT INTO Accounts VALUES (0002, 'bob', 30)
        Read Set = { Users.uid }, Write Set = { Accounts.* }
    Q10: START TRANSACTION;
           INSERT INTO Transactions VALUES (0001, 0002, 100);
           UPDATE Accounts balance -= 100 WHERE aid = 0001;
           UPDATE Accounts balance += 100 WHERE aid = 0002;
        Read Set = { Accounts.aid, Accounts.balance },
        Write Set = { Transactions.*, Accounts.balance }
    Q11: INSERT INTO Users VALUES ('charlie', 0204392)
        Read Set = { }, Write Set = { Users.* }
    Q12: INSERT INTO Accounts VALUES (0003, 'charlie', 20)
        Read Set = { Users.uid }, Write Set = { Accounts.* }
    Q13: START TRANSACTION;
           DECLARE contents TEXT;
           contents = CONCAT(SELECT * FROM Transactions
             WHERE (sender = '0001' OR receiver = '0001')
             AND time > DATE_SUB(NOW(), INTERVAL 1 MONTH));
           INSERT INTO Statements ('0001', contents);
        Read Set = { Transactions.* }, Write Set = { Statements.* }

[In the graph, plain arrows are query dependencies on non-FOREIGN-KEY columns and red arrows are dependencies on FOREIGN KEY columns; Q10 is the target query to retroactively remove, dependent queries are those to rollback & replay, and the trigger (Q5) is included during the replay.]

Figure 3: Online banking service's query dependency graph.

Our proposed granularity for expressing the read and write sets is table columns, the finest database granularity we can obtain from query statements only. Also, we will consider the effect of schema evolution, including TRIGGER creation/deletion, as well as TRANSACTION or PROCEDURE, which bind and execute multiple queries together.

3 Ultraverse's Database System

Figure 2 depicts Ultraverse's database architecture. At a high level, Ultraverse's query analyzer runs with an unmodified database system. While the database system serves a user's regular query requests and records them to the query log, Ultraverse's query analyzer reads the query log in the background and records two additional pieces of information: (a) read-write dependencies among queries, and (b) the hash values of each table updated by each committed query. When a user requests to retroactively add, remove, or change past queries, Ultraverse's query analyzer analyzes the query dependency log and sends to the database system the queries to be rolled back and replayed. Ultraverse's replay phase executes multiple non-dependent queries in parallel to enhance the speed, while guaranteeing the final database state to be strongly serialized (as if all queries were serially committed in the same order as in the past).
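As a toy illustration of this column-wise analysis on Figure 3's workload, the sketch below (ours, not Ultraverse's analyzer) applies the dependency rules formalized later in Table 1 to the recorded read/write sets of Q10–Q13. Table wildcards such as Accounts.* are pre-expanded by hand, and the trigger rule (which additionally pulls in Q5) is omitted for brevity; the analysis selects Q12 and Q13 as dependents of the removed Q10 while skipping Q11.

```python
# Read/write sets (R, W) of Q10..Q13 from Figure 3, wildcards expanded.
RW = {
    10: ({"Accounts.aid", "Accounts.balance"},
         {"Transactions.t_id", "Transactions.sender", "Transactions.receiver",
          "Transactions.amount", "Transactions.time", "Accounts.balance"}),
    11: (set(), {"Users.uid", "Users.ssn"}),
    12: ({"Users.uid"}, {"Accounts.aid", "Accounts.uid", "Accounts.balance"}),
    13: ({"Transactions.t_id", "Transactions.sender", "Transactions.receiver",
          "Transactions.amount", "Transactions.time"},
         {"Statements.sid", "Statements.aid", "Statements.contents"}),
}

def dependents(target):
    """Queries that (transitively) depend on `target`: a later query depends
    on an earlier one if the earlier query writes a column that the later
    query reads or writes."""
    affected = {target}
    for n in sorted(RW):
        if n <= target:
            continue
        r, w = RW[n]
        # Does Q_n read/write a column written by any already-affected query?
        if any((r | w) & RW[m][1] for m in affected):
            affected.add(n)          # transitivity: Q_n joins the affected set
    return affected - {target}

# Removing Q10: Q12 and Q13 must be replayed; Q11 is column-wise independent.
assert dependents(10) == {12, 13}
```

Note that Q12 is only pulled in column-wise (it touches Accounts.balance); §3.5's row-wise clustering later eliminates it, since it touches only Charlie's rows.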
3.1 A Motivating Example

Figure 3 is a motivating example of Ultraverse's query dependency graph for an online banking service, comprised of 4 tables and 1 trigger. The Users table stores each user's ID and social security number. The Accounts table stores each user's account number and its current balance. The Transactions table stores each record of money transfer as a tuple of the sender's & receiver's account numbers and the transfer amount. The Statements table stores each account's monthly transaction history. The BalanceCheck trigger checks whether an account has enough balance before sending out money.

Initially, the service creates the Users, Accounts, Transactions, and Statements tables (Q1∼Q4), as well as the BalanceCheck trigger (Q5). Then, Alice's and Bob's user IDs and accounts are created (Q6∼Q9). Alice's account transfers $100 to Bob's account (Q10). Charlie creates his ID and account (Q11∼Q12). Alice's monthly bank statement is created (Q13). Now, suppose that Q10 turns out to be an attacker-controlled money transfer and the service needs to retroactively remove it. Ultraverse's first optimization goal is to rollback & replay only Q5, Q10, Q12, and Q13, skipping Q11, because the columns that Q11 reads/writes (Users.*) are unaffected by the retroactively removed Q10.

3.2 Column-wise Read/Write Dependency

Table A in §A¹ shows all types of SQL queries that Ultraverse supports for query dependency analysis. In the table, each query has a read set (R) and a write set (W), which are used to deduce query dependencies. A query's R set contains the columns of tables/views that the query reads during execution; its W set contains the columns of tables/views that the query updates during execution. Besides the descriptions in Table A, we add the following remarks:

¹ Appendix URL: https://drive.google.com/file/d/11plMlhaUv1neCm4WTk6BD4G5r5DLEDo_

• The R/W set policy for CREATE/ALTER TABLE is applied in the same manner for creating/altering a CONSTRAINT or INDEX.
• If an ALTER TABLE query dynamically adds a FOREIGN KEY column to some table, then the R/W set policy associated with this newly added FOREIGN KEY column is applied only to those queries committed after this ALTER TABLE query.
• VIEWs are updatable. If a query INSERTs into, UPDATEs, or DELETEs from a view, the original table/view columns this view references are also cascadingly included in the query's W set.
• As for branch conditions, it is difficult to statically predict which direction will be explored during runtime, because the direction may depend on the dynamically evolving state of the database. To resolve this uncertainty, Ultraverse assumes that each conditional branch statement in a PROCEDURE, TRANSACTION, or CREATE TRIGGER query explores both directions (i.e.,

Ultraverse merges the R/W sets of the true and false blocks). This strategy leads to an over-estimation of R/W sets, and thereby the dependency graph's size is potentially larger than optimal. At this cost, we ensure the correctness of retroactive operation.

    Notations
      Qn : the n-th committed query
      τ : the retroactive target query's index
      Tx : the query "CREATE TRIGGER x"
      R(Qn), W(Qn) : Qn's read & write sets
      c : a table's column
      Qn → Qm : Qn depends on Qm
      Qn ⊲ Tx : Qn is a query that triggers Tx
      A ⟹ B : if A is true, then B is true
    Column-wise Query Dependency Rules
      1. ∃c ((c ∈ W(Qm)) ∧ (c ∈ (R(Qn) ∪ W(Qn)))) ∧ (m < n) ⟹ Qn → Qm
      2. (Qn → Qm) ∧ (Qm → Ql) ⟹ Qn → Ql
      3. (Qn → Qτ) ∧ (Qn ⊲ Tx) ⟹ Tx → Qτ

Table 1: Ultraverse's column-wise query dependency rules.

3.3 Query Dependency Graph Generation

Ultraverse's query analyzer records the R/W sets of all committed queries during regular service operations. This information is used for retroactive operation: serving a user's request to retroactively remove, add, or change past queries and updating the database accordingly. Ultraverse accomplishes this by: (i) rolling back only those tables accessed by the user's target query and its dependent queries; (ii) removing, adding, or changing the target query; and (iii) replaying only its dependent queries. To choose the tables to roll back, Ultraverse creates a query dependency graph (Figure 3), in which each node is a query and each arrow is a dependency between queries.

In Ultraverse, if executing one query could affect another query's result, the latter query is said to depend on the former query. Table 1 defines Ultraverse's query dependency rules. Rule 1 states that if Qm writes to a column of a table/view and later Qn reads/writes the same column, then Qn depends on Qm. In the example of Figure 3, Q12→Q11, because Q12 reads the Users.uid (foreign key) column that Q11 wrote to. Note that our query dependency differs from the dependency in conflict graphs [64], which includes read-then-write, write-then-read, and write-then-write cases. In contrast, our rule excludes the read-then-write case, because the prior query's read operation does not affect the values that the later query writes. Rule 2 states that if Qn depends on Qm and Qm depends on Ql, then Qn also depends on Ql (transitivity). In Figure 3, Q12→Q7, because Q12→Q11 and Q11→Q7. Rule 3 handles triggers: if Qn depends on Qτ (the retroactive target query), then we enforce Tx (a trigger linked to Qn) to depend on Qτ, so that Tx gets reactivated and its statement properly executes whenever its linked query (or queries) is executed during the retroactive operation. In Figure 3, the trigger Q5→Q10, because Q5 is linked to Q10 (i.e., INSERT ON Transactions). Note that Ultraverse's dependency graph in Figure 3 omits all queries whose write set is empty (e.g., SELECT queries), since they are read-only queries not affecting the database's state. Also note that Figure 3's red arrows represent column read-write dependencies caused by FOREIGN KEY relationships – if some column's value is retroactively changed, then the foreign key columns of other tables referencing this column can be also affected. The red arrows ensure the rollback and replay of the queries accessing such potentially affected foreign key column(s). We provide a formal analysis of Ultraverse's column-wise retroactive operation in §E¹.

3.4 Efficient Rollback and Replay

Given the query dependency graph, Ultraverse rolls back and replays only the queries dependent on the target query as follows:

(1) Rollback Phase: Roll back each table whose column(s) appears in some read or write set in the query dependency graph. Copy those tables into a temporary database.
(2) Replay Phase: Add, remove, or change the retroactive target query as requested by the user. Then, replay (from the temporary database) all the queries dependent on the target query, as much in parallel as possible without harming the correctness of the final database state (i.e., guaranteeing strongly serialized commits).
(3) Database Update: Lock the original database and reflect the changes of mutated tables (defined later) from the temporary database to the original database. After this is done, unlock the original database and delete the temporary database.

During the above process, each table in the original database is classified as one of the following three: 1) a mutated table, if its column(s) is in the write set of at least one dependent query; 2) a consulted table, if none of its columns is in the write set of any dependent query, but its column(s) is in the read set of at least one dependent query; 3) an irrelevant table, if the table is neither mutated nor consulted. In step 1's rollback phase, Ultraverse rolls back mutated and consulted tables, as well as any of their logical INDEXes, to each of their first-accessed commit times after the retroactive operation's target time, by leveraging the system versioning of temporal databases [44]. The reason Ultraverse needs to roll back consulted tables is that their former states are needed while replaying the dependent queries that update mutated table(s). Affected by this, other non-dependent queries that have those consulted tables in their write set will also be replayed during the replay phase. During replay, the intermediate values of a consulted table will be read by replayed queries; at the end of replay, consulted tables will have the same state as before the rollback.

In step 2's replay phase, the past commit order of dependent queries should be preserved, because otherwise they could lead to an inconsistent final database state – leading to a different universe than the desired state. To ensure this, a naive approach would re-commit each query one by one (i.e., enforce strict serializability) for reproduction. However, serial query execution is slow. Ultraverse solves this problem by leveraging query dependency information and simultaneously executing multiple queries in parallel whose R/W sets do not overlap each other. Such parallel query execution is safe because if two queries access different objects, they do not cause a race condition with each other. This improves the replay speed while guaranteeing the same final database state as strongly serialized commits.

Figure 4 is Ultraverse's replay schedule for Figure 3's retroactive operation scenario. Red arrows are the replay order. A replay arrow from Qn → Qm is created if n < m and the two queries have a conflicting operation [55] (a read-then-write, write-then-read, or write-then-write) on the same column of a table/view. Q12 and Q13 are safely replayed in parallel, because they access different table columns.
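The parallel replay schedule of Figure 4 can be reproduced with a simple stage-by-stage topological sweep. This is our sketch, not Ultraverse's scheduler; it assumes the replay graph is acyclic, which holds because arrows only point from earlier to later commits.

```python
from collections import defaultdict

def replay_stages(queries, edges):
    """Group queries into replay stages: a query runs as soon as all of its
    replay-order predecessors have run; queries in the same stage have no
    conflicting operations and may execute in parallel."""
    preds = defaultdict(set)
    for m, n in edges:               # edge (m, n): Q_m must replay before Q_n
        preds[n].add(m)
    done, stages = set(), []
    while done != queries:
        # All not-yet-replayed queries whose predecessors are all done.
        stage = {q for q in queries - done if preds[q] <= done}
        stages.append(sorted(stage))
        done |= stage
    return stages

# Figure 4's scenario: Q5 first, then Q10, then Q12 and Q13 in parallel.
edges = {(5, 10), (10, 12), (10, 13)}
print(replay_stages({5, 10, 12, 13}, edges))   # [[5], [10], [12, 13]]
```

Each stage corresponds to one round of parallel execution; committing stages in order yields the same final state as a strongly serialized replay.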

[Figure 4 shows the replay schedule Q5 → Q10 → {Q12, Q13}; each red arrow is labeled with the conflict type that orders the pair: (1) read-then-write, (2) write-then-read, (3a/3b) write-then-write.]

Figure 4: The replay schedule for Figure 3.

During the retroactive operation, Ultraverse simultaneously serves regular SQL operations from its clients, so the database system's front-end service stays available. Such simultaneous processing is possible because the retroactive operation's rollback and replay are done on a temporarily created database. Once Ultraverse's rollback & replay phases (steps 1 and 2) are complete, in step 3 Ultraverse temporarily locks the original database, updates the temporary database's mutated table tuples to the original database, and unlocks it. Should there be new regular queries additionally committed during the rollback and replay phases, then before unlocking the database, Ultraverse runs another set of rollback & replay phases for those new queries to reflect the delta change.

Replaying non-determinism: During regular service operations, Ultraverse's query analyzer records the concretized return value of each non-deterministic SQL function (e.g., CURTIME() or RAND()). Then, during the replay phase, Ultraverse enforces each of a query's non-deterministic function calls to return the same value as in the past. This ensures that during the replay phase, non-deterministic functions behave in the same manner as during the regular service operations. If a retroactively added query calls a timing function such as CURTIME(), Ultraverse estimates its return value based on the (past) timestamp value retroactively assigned to the query. A retroactively added/removed INSERT query may access a table whose schema uses AUTO_INCREMENT on some column; §C.5 explains how Ultraverse handles this.

3.5 Row-wise Dependency & Query Clustering

While §3.3 described column-wise dependency analysis, Ultraverse also uses row-wise dependency analysis to further reduce the number of queries to rollback/replay. We use the same motivating example in Figure 3, which retroactively removes Q10. The column-wise dependency analysis found the query dependency {Q5,Q12,Q13}→Q10. However, we could further skip Q12, because Q12 only accesses Charlie's data records, and the actual data affected by the retroactive target query (Q10) is only Alice's and Bob's data. In particular, Charlie had no interaction with Alice and Bob (i.e., no money transfer), thus Charlie's data records are independent from

    Row-wise Query Independency Rule
      Kc(Qn) : a cluster set containing Qn's cluster keys,
               given that c is chosen as the cluster key column
      Qn ↭ Qm : Qn and Qm are in the same cluster
      1. Kc(Qn) ∩ Kc(Qm) ≠ ∅ ⟹ Qn ↭ Qm
      2. (Qm ↭ Qn) ∧ (Qn ↭ Qo) ⟹ Qm ↭ Qo
      3. Qn ̸↭ Qm ⟹ Qn ↮ Qm (mutually independent)
    Choice Rule for the Cluster Key Column
      z : the last committed query's index in Q
      t : the retroactive target query's index
      C : the set of all table columns in D
      c_choice ≔ argmin_{c∈C} Σ_{j=t}^{z} |Kc(Qj)|²

Table 2: Ultraverse's query clustering rules.

For an INSERT query, the rows it accesses are specified in its VALUES clause. For example, if a query's statement contains the "WHERE aid=0001" clause or the "(aid,uid,balance) VALUES (0001,'alice',100)" clause, this query accesses only the rows whose aid value is 0001. We call such a row-deciding column a cluster key column. Ultraverse labels each query with a cluster key (or cluster keys, if the row-deciding column is specified as multiple values or a range). After every query is assigned a cluster key set (K set), Ultraverse groups those queries which have one or more of the same cluster keys (i.e., their accessed rows overlap) into the same cluster. At the end of recursive clustering until saturation, any two queries from different clusters are guaranteed to access different rows in any tables they access, thus their operations are mutually independent and unaffected. Table 2 describes this query clustering algorithm formally as the row-wise query independency rule.

Ultraverse regards two queries to have a dependency only if they are dependent both column-wise and cluster-wise, as illustrated in the accompanying Venn diagram (the final query dependency is the intersection of the column-wise and row-wise query dependencies). Therefore, from §3.2's column-wise query dependency graph, we can further eliminate the queries that are not in the same cluster as the retroactive target query (e.g., Q12 in Figure 3).

The query clustering scheme can be effectively used only if each query in a retroactive operation's commit history window has at least 1 cluster key. However, some queries may not access the cluster key column if they access different tables. For such queries, their other columns can be used as a cluster key under certain cases. Ultraverse
(and unaffected by) Alice and Bob’s data changes. Similarly, later in classifies them into 2 cases. First, if a column is a FOREIGN KEY
the service, all other users who have no interaction with with Alice column that references the cluster key column (e.g., Accounts.uid),
or Bob’s data will be unaffected, thus the queries operating on the we define that column as a foreign cluster key column, whose value
other users’ records can be skipped from rollback & replay. In this itself can be used as a cluster key. This is because the foreign (cluster)
observation, each user’s data boundary is the table row. key directly reflects its origin (cluster) key. Second, if a column is
Inspired from this, we propose the row-wise query clustering in the same table as the cluster key or a foreign cluster key (e.g.,
scheme. Its high-level idea is to classify queries into disjoint clusters Accounts.aid), we define it as an alias cluster key column, whose
according to the table rows they read/write, such that any two queries concrete value specified in a query statement’s WHERE, SET, or VALUES
belonging to different clusters have no overlap in the table rows they clause will be mapped to its same row’s (foreign) cluster key column’s
access (i.e., two queries are row-wise independent). value. For example, in Figure 3, Q12’s alias cluster key Accounts.aid
Ultraverse identifies the rows each query accesses by analyzing creates the mapping ⟨0003→“charlie"⟩, thus Q12’s cluster key is
the query’s statement. The major query types for query cluster- {“charlie"}. The cluster keys of Q10 and Q5 are {“alice", “bob"}. Q13’s
ing is SELECT, INSERT, UPDATE, and DELETE, because they are de- cluster key is {“alice"}. {“alice", “bob"}∩{“alice"}=¬∅, so Q5, Q10, Q13
signed to access only particular row(s) in a table. For SELECT, UP- are merged into the same cluster under the merged key set {“alice",
DATE, and DELETE queries, the rows they access are specified in “bob"}. However, Q12 is not merged, because {“charlie" }∩{“alice",
the WHERE clause (and in the SET clause in case of UPDATE); for “bob" }= ∅. As Q12 is not in the same cluster as Q10 (the retroactive
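The clustering rule just described can be illustrated with a small sketch (a hypothetical Python illustration, not Ultraverse's actual C implementation): queries whose cluster key sets overlap are merged transitively under the union of their keys, reproducing Figure 3's alice/bob/charlie example.

```python
def cluster_queries(key_sets):
    """Merge queries whose cluster key (K) sets overlap, transitively.
    key_sets: dict mapping query name -> set of cluster keys.
    Returns a list of (merged_key_set, query_names) clusters."""
    clusters = []  # list of (key_set, query_names) tuples
    for q, keys in key_sets.items():
        merged_keys, merged_queries = set(keys), [q]
        remaining = []
        for ks, qs in clusters:
            if ks & merged_keys:          # overlapping rows -> same cluster
                merged_keys |= ks
                merged_queries += qs
            else:
                remaining.append((ks, qs))
        clusters = remaining + [(merged_keys, merged_queries)]
    return clusters

# Figure 3's example: Q5/Q10 touch alice+bob, Q13 touches alice,
# Q12 touches only charlie, so Q12 ends up row-wise independent.
clusters = cluster_queries({
    "Q5": {"alice", "bob"},
    "Q10": {"alice", "bob"},
    "Q13": {"alice"},
    "Q12": {"charlie"},
})
```

Running this yields two clusters: {Q5, Q10, Q13} under the merged key set {"alice", "bob"}, and {Q12} under {"charlie"}, matching the text's elimination of Q12.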

[Figure 5: a propagation graph. Users.uid (the cluster key column) propagates to the foreign cluster key columns Accounts.uid, Transactions.sender, and Transactions.receiver; within the Accounts table it propagates to the alias cluster key column Accounts.aid, which in turn propagates to the foreign cluster key column Statements.aid. The ⟨"Accounts.aid"→"Accounts.uid"⟩ cluster key mappings are specified in the committed "INSERT INTO Accounts" query statements.]
Figure 5: Figure 3's cluster key propagation graph.

[Figure 6: the commit history of the Rewards table, annotated with table hashes before (B) and after (A) the retroactive removal of Q15:
  Q14:   CREATE TABLE Rewards (uid CHAR(16) FOREIGN KEY Accounts(uid), type CHAR(16))             H(AdsB(14)) = H(AdsA(14))
  Q15:   INSERT INTO Rewards VALUES ('alice', 'mileage') ON DUPLICATE KEY UPDATE type='mileage'   H(AdsB(15)) ≠ H(AdsA(14))
  Q16:   INSERT INTO Rewards VALUES ('bob', 'movie') ON DUPLICATE KEY UPDATE type='movie'         H(AdsB(16)) ≠ H(AdsA(15))
  ...    (more INSERT|UPDATE|DELETE queries on "Rewards")
  Q99:   INSERT INTO Rewards VALUES ('alice', 'shopping') ON DUPLICATE KEY UPDATE type='shopping' H(AdsB(99)) = H(AdsA(98))
  ...    (more INSERT|UPDATE|DELETE queries on "Rewards")                                         (all hash hits)
  Q1000: INSERT INTO Rewards VALUES ('rex', 'food') ON DUPLICATE KEY UPDATE type='food'           H(AdsB(1000)) = H(AdsA(999))
  AdsB|A    : the table BEFORE|AFTER the retroactive update
  AdsA(n)   : the AdsA table at the n-th query's commit (after the retroactive update)
  H(AdsA(n)): the AdsA table's hash value after the n-th query's commit]
Figure 6: An example of Hash-jump.

Figure 5 shows the online banking example's relationships between the cluster key column (Users.uid), the foreign cluster key columns (Accounts.uid, Transactions.sender, Transactions.receiver, Statements.aid), and an alias cluster key column (Accounts.aid). Arrows represent the order of discovery of each foreign/alias key.

In order for the query clustering to be effective, it is important to choose the optimal column as the cluster key column. Table 2 describes Ultraverse's choice rule for the cluster key column. Informally speaking, the choice rule favors uniformly distributing the cluster keys across all queries (i.e., minimizing the standard deviation), in order to minimize the worst-case number of queries to be replayed for any retroactive operation. If some query Q_i does not specify the value of the (foreign/alias) cluster key column of some table t_j that Q_i accesses, then we force K_c(Q_i) = ¬∅ (all elements), which makes |K_c(Q_i)|² = ∞ and thereby the choice rule excludes c. The K set of a TRANSACTION/PROCEDURE query is the union of the K sets of all its sub-queries. The K set of a CREATE/DROP TRIGGER query is the union of the K sets of all the queries that are linked to the trigger within the retroactive operation's time window. The K set of a [CREATE/DROP/ALTER] [TABLE/VIEW] query is the union of the K sets of all the queries accessing this table/view committed within the retroactive operation's time window. If query clustering is unusable (i.e., the choice rule's computed weight is infinite for all candidate columns in the database), then Ultraverse uses only the column-wise dependency analysis.

In §C.6, we further describe Ultraverse's advanced clustering techniques that enable finer-grained and higher-opportunity clustering: (1) detecting and supporting implicit foreign key columns (undefined in the SQL table schema); (2) detecting and handling variable cluster keys; (3) simultaneously using multiple cluster key columns where possible (i.e., multi-dimensional cluster keys). Our experiment (§5) demonstrates the drastic performance improvement enabled by the advanced clustering techniques in the TATP, Epinions, and SEATS micro-benchmarks, and the Invoice Ninja macro-benchmark. We provide a formal analysis of the query clustering scheme in §E.

3.6 Hash-Jumper

During a retroactive operation, if we can somehow infer that the retroactive operation will not affect the final state of the database, we can terminate the effectless retroactive operation in advance. Figure 6 is a motivating example, which is a continuation of Figure 3's scenario. Suppose the service newly creates the Rewards table (Q14) to give users reward points for their daily expenses. For the rewards type, Alice chooses 'mileage' (Q15), Bob chooses 'movie', and this continues for many future users. In the middle, Alice changes her rewards type to 'shopping' (Q99). Later, the service detects that Q15 (orange) was a problematic query (triggered by a bug or malicious activity) and decides to retroactively remove Q15. However, upon rolling back and replaying Q99, the Rewards table's state becomes the same as it was before the retroactive operation at the same commit time, and all subsequent queries until the end (Q1000) are the same as before. Therefore, we deterministically know that the Rewards table's final state will become the same as before the retroactive operation, thus it is effectless to replay Q99∼Q1000.

Ultraverse's Hash-jumper is designed to capture such cases and immediately terminate the retroactive operation as soon as it realizes that replaying the remaining queries will be effectless. For this, Ultraverse's high-level approach is to compute and log the hash value of the modified table(s) upon each query's commit during regular operations. During a retroactive operation's replay phase, Hash-jumper simultaneously runs in the background (so as not to block the replay of queries) and compares whether the replayed query's output hash value matches its past logged version (before the retroactive operation). During the replay, if hash matches are found for all mutated tables (§3.4), this implies that the mutated tables' final state will become the same as before the retroactive operation, thus Ultraverse terminates the effectless replay and keeps the original table(s).

When computing each table's hash, the efficiency of the hash function is crucial. Ideally, the hash computation time should not be affected by the size of the table; otherwise, replaying each query that writes to a huge table would spend a prohibitively long time computing its hash. Ultraverse designs an efficient table hash algorithm that meets this demand, which works as follows.

An empty table's initial hash value is 0. Once the database system executes a requested query and records the target table's rows to be added/deleted to the query log, Ultraverse's query analyzer reads this log and computes the hash value of each of these added/deleted rows with a collision-resistant hash function (e.g., SHA-256), and then either adds (for an insertion) or subtracts (for a deletion) the hash value from the target table's current hash value modulo p (the size of the collision-resistant hash function's output range, 2^256 for SHA-256). For each query, the table hash computation time is constant with respect to the target table's size, and linear in the number of rows to be inserted/deleted. Given that the collision-resistant hash function's output is uniformly distributed in [0, p−1], Hash-jumper's collision rate for table hashes, regardless of the number of rows in tables, is upper-bounded by 1/p (2^−256 for SHA-256, which is negligibly smaller than the CPU bit error rate). See §F for the proof and discussion of false positives & negatives. Nevertheless, Ultraverse offers the option of literal table comparison upon detecting hash hits.
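The incremental table hash just described can be sketched as follows (a simplified illustration assuming rows are serialized with repr() and hashed with SHA-256, so p = 2^256; not the authors' exact implementation):

```python
import hashlib

P = 2 ** 256  # size of SHA-256's output range

def row_hash(row):
    """Collision-resistant hash of one serialized table row (assumed
    serialization: repr; a real system would use a canonical encoding)."""
    return int.from_bytes(hashlib.sha256(repr(row).encode()).digest(), "big")

class TableHash:
    """Incrementally maintained table hash: an insertion adds the row's
    hash, a deletion subtracts it, all modulo p. An empty table hashes
    to 0, and each update's cost is independent of the table's size."""
    def __init__(self):
        self.h = 0
    def insert(self, row):
        self.h = (self.h + row_hash(row)) % P
    def delete(self, row):
        self.h = (self.h - row_hash(row)) % P
```

Because modular addition is commutative, the hash is independent of insertion order, and a delete exactly reverses an earlier insert; this is what lets Hash-jumper compare a replayed table state against its logged past state in constant time per query.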


4 Ultraverse's Web Application Framework

A realistic web service's state is manipulated not only by the data flows in the SQL queries of its database system, but also by the data flows in its application-level code. While §3 covered the former, this section covers the latter; together they accomplish retroactive updates on a web application's state.

In general, a web application consists of server-side code (i.e., the web server) and client-side code (i.e., the webpage). When a client browser visits a URL, it sends an HTTP GET request to the web server, which in turn loads the requested URL's base HTML template that is common to all users, selectively customizes some of its HTML tags based on the client-specific state (e.g., learns the client's identity from the HTTP request's cookie and rewrites the base HTML template's <h1> tag's value "User Account" to "Alice's Account"), and returns the customized HTML webpage to the client browser. Then, the client navigating this webpage can make additional data-processing requests to the server, such as typing the transfer amount into an <input> tag's textbox and clicking the "Send Money" button. The webpage's JavaScript reads the user's input from the <input> tag's associated DOM node, pre-processes it, and sends it to the web server, which in turn processes it, updates its server-side database, and returns results (e.g., "Success") to the client. Finally, client-side JavaScript post-processes the received result (e.g., displays the result string on the webpage by writing it to its DOM tree's <p> tag's innerHTML). During this whole procedure, JavaScript can store its intermediate data or session state in the browser's local storage, such as a cookie. Given this web application framework, when an application-level retroactive operation is needed (e.g., cancel Alice's money transfer committed at t=9), Ultraverse should track and replay the data flow dependencies that occur not only within SQL queries, but also within client-side & server-side application code, as well as within the customized client-side webpage's DOM tree nodes. This section explains how Ultraverse ensures the correctness of retroactive operation by carefully enforcing the interactions between SQL queries and application code.

4.1 Architecture

The Ultraverse web application framework is comprised of two components: uTemplates (Ultraverse templates) and Apper (an application code generator). Ultraverse divides each of a web application's user request handlers into three stages: PRE_REQUEST, SERVE_REQUEST, and POST_REQUEST. PRE_REQUEST and POST_REQUEST are executed by the client webpage's JavaScript; SERVE_REQUEST is executed by the web server's request handler. For example, when a client types her input value(s) into some DOM node(s) and sends a user request (e.g., a button click event), the client webpage's JavaScript code executes PRE_REQUEST, which reads the user inputs and sends an HTTP request to the server. Once the server receives the client's request, it executes SERVE_REQUEST to process the request and update the server-side database accordingly. After a response is returned to the client browser, its JavaScript executes POST_REQUEST to update, if needed, the client-side local database and the webpage's DOM node(s). Ultraverse requires the developer to fill out the uTemplate, which essentially means implementing each of these three *_REQUEST stages as SQL PROCEDUREs. This framework ensures that all data flows in the web application are captured and logged by the Ultraverse database system as SQL queries during regular service operation, and replayed during retroactive operation. Importantly, the SQL language used in the uTemplate innately provides no built-in interface to access any system storage other than SQL databases. Therefore, using uTemplate fundamentally prevents the application from directly accessing any external persistent storage untrackable by the Ultraverse database (e.g., document.cookie or window.localStorage), while Ultraverse provides a way to access them indirectly via Ultraverse-provided client-side local tables (e.g., BrowserCookie) which Ultraverse can track.

After the developer implements the client-side webpage's user request handlers as SQL PROCEDUREs in the uTemplate, Apper converts them into equivalent application code (e.g., NodeJS). The generated application code essentially passes each of the uTemplate's SQL PROCEDURE statements as a string argument to the application-level SQL database API (e.g., NodeJS's SQL_exec()) so that the executed PROCEDURE is logged by Ultraverse's query analyzer. In a uTemplate, the developer can link the INPUT and OUTPUT of each PROCEDURE to the client webpage's particular DOM nodes or to the HTTP messages exchanged between the client and server. The allowed linkings are described in Table 3. At a high level, a webpage's user request reads a user's input from some DOM node(s) as INPUT arguments to PRE_REQUEST, and writes POST_REQUEST's returned OUTPUT back to some DOM node(s). The INPUT and OUTPUT of SERVE_REQUEST are HTTP messages exchanged between the client and server.

  PROCEDURE       Caller   INPUT             OUTPUT
  PRE_REQUEST     client   DOM node fields   HTTP Req. msg
  SERVE_REQUEST   server   HTTP Req. msg     HTTP Res. msg
  POST_REQUEST    client   HTTP Res. msg     DOM node fields

Table 3: uTemplate's allowed procedures per user request.

Enforcing Data Flow Restriction: By requiring developers to use uTemplate and Apper, Ultraverse enforces that all of the application's data flows are captured at the SQL level by Ultraverse's query analyzer. This way, the inputs to user requests are also captured as INPUTs to PRE_REQUEST. However, once POST_REQUEST writes its output to DOM node(s), Ultraverse's query analyzer cannot track this outgoing flow, because a webpage's DOM resides outside the trackable SQL domain. As a solution, the application code generated by Apper adds application logic which dynamically taints all DOM node(s) that POST_REQUEST writes its output to, and forbids PRE_REQUEST from receiving inputs from tainted DOM node(s). This essentially enforces that any data flowing from SQL to the DOM cannot flow back to SQL (i.e., cannot affect the database's state anymore). Thus, it is safe not to further track/record such flows, and such flows need not be replayed during retroactive operation either. If a webpage needs to implement the logic of repeatedly refreshing some values (e.g., a streamed value) in the same DOM node, the developer can implement the webpage's POST_REQUEST to create a new DOM output node for each update and delete the previously created node, so that each node is transient. Each tainted DOM node gets an additional 1-bit object property named tainted. The 1-bit taints in DOM nodes are reset only when the webpage is refreshed. Note that POST_REQUESTs are allowed to write their outputs to already-tainted DOM nodes or to dynamically created new DOM nodes.
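The taint rule above can be condensed into a minimal sketch (hypothetical Python pseudocode of the client-side logic that Apper would emit in JavaScript; the names are illustrative):

```python
class DOMNode:
    """Stand-in for a webpage DOM node with the 1-bit taint property."""
    def __init__(self, node_id, value=""):
        self.node_id, self.value = node_id, value
        self.tainted = False  # cleared only when the page is refreshed

def post_request_write(node, value):
    # POST_REQUEST may write to any node (even an already-tainted one),
    # but the write taints it: the value flowed from SQL into the DOM.
    node.value = value
    node.tainted = True

def pre_request_read(node):
    # PRE_REQUEST must not read tainted nodes, so SQL-derived data can
    # never flow back into SQL through a later user request.
    if node.tainted:
        raise PermissionError(f"node {node.node_id} is tainted")
    return node.value
```

Untainted nodes (fresh user input) remain readable, while any node a POST_REQUEST has written stays off-limits to subsequent PRE_REQUESTs until the page refresh resets the taints.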

Webpage: https://online-banking.com/account/send_money.html
Base HTML: <html><head></head><body>
  <h1 id="title">User Account</h1>
  <!-- Apper will auto-generate a <script> tag defining "function SendMoney()" -->
  <form onsubmit="SendMoney()">
    <input type="text" id="receiver" value="" >
    <input type="text" id="amount" value="" >
  </form>
</body></html>

Type-1 Request Handler: server=SendMoney_CreateWebpage()
"SERVE_REQUEST"
|- INPUT: [HTTP Request Header's "COOKIE.uname" Field] → @username VARCHAR(32)
|- OUTPUT: @final_html TEXT → [HTTP Response Body]
|- SQL BODY:
   IF username != "" THEN
     DECLARE personalizedTitle TEXT;
     SET @personalizedTitle = username + "'s Account";
     SET @final_html = Ultraverse_UpdateHTML("title", "innerHTML", personalizedTitle);
   END IF;

Type-2 HTTP Method: PUT, client=SendMoney(), server=SendMoney_Process()
"PRE_REQUEST"
|- INPUT: [DOM Node id="receiver", field="value"] → @receiver VARCHAR(16)
          [DOM Node id="amount", field="value"] → @amount VARCHAR(16)
|- OUTPUT: @recv VARCHAR(16), @amt VARCHAR(16) → [HTTP Request Body]
|- SQL BODY: SET @recv = receiver; SET @amt = amount;
"SERVE_REQUEST"
|- INPUT: [HTTP Request Header's "COOKIE.uid" Field] → @sender VARCHAR(32)
          [HTTP Request Body's "receiver" Field] → @receiver VARCHAR(32)
          [HTTP Request Body's "amount" Field] → @amount VARCHAR(32)
|- OUTPUT: @result TEXT → [HTTP Response Body]
|- SQL BODY: DECLARE cur_balance INT;
   SET @cur_balance = SELECT balance FROM Accounts WHERE uid = sender;
   IF (cur_balance >= amount) THEN
     UPDATE Accounts SET balance -= amount WHERE uid = sender;
     UPDATE Accounts SET balance += amount WHERE uid = receiver;
     INSERT INTO Transactions VALUES (sender, receiver, amount);
   END IF
   SET @result = "Successfully sent " + amount + " to " + receiver
"POST_REQUEST"
|- INPUT: [HTTP Response Body's "result" Field] → @result TEXT
|- OUTPUT: @result TEXT → [new <p> id="result", field="innerHTML", appendTo:<body>]
|- SQL BODY: UPDATE BrowserCookie SET last_activity = result;

Figure 7: A uTemplate that designs the Type-1 & Type-2 request handlers of the send_money.html webpage.

4.2 Supported Types of User Requests

Modern web applications are generally designed around three types of user request handlers: (i) a web server creates webpages requested by clients; (ii) a webpage's JavaScript interacts with the web server to remotely process the client's data; (iii) a webpage's JavaScript processes the client's data locally without interacting with the server. The Ultraverse web framework's uTemplate allows developers to implement all three types of user request handlers. Figure 7 is a uTemplate that designs an online banking service's "SendMoney" webpage comprised of Type-1 and Type-2 user requests.

Creation of a client's requested webpage (Type-1): When a client visits https://online-banking.com/send_money.html, its web server returns the client's customized webpage. To implement such a user request handler, the developer first associates a new uTemplate with the above target URL. Then, the developer implements this webpage's base HTML, which is common to all users visiting it. Next, the developer implements SERVE_REQUEST's SQL logic, which customizes the base HTML according to each client's HTTP request (i.e., the "Type-1 Request Handler" box in Figure 7). When a client browser visits the URL, the web server executes this uTemplate's SERVE_REQUEST application code generated by Apper. Note that PRE_REQUEST and POST_REQUEST are unused in Type-1 user request handlers, because the client simply navigates to the URL by using the browser's address bar interface or page navigation API.

Remote data-processing between a client webpage's JavaScript and a server (Type-2): After the client loads the send_money.html webpage, she types the recipient's account ID and the transfer amount into the <input> textboxes and clicks the "submit" button. Then, the client-side JavaScript's PRE_REQUEST logic sends the money transfer request to the web server; the web server's SERVE_REQUEST logic updates the account balances of the sender and receiver in the server-side database accordingly and sends the result to the client; and the client-side JavaScript's POST_REQUEST logic displays the received result on the webpage. It is Apper that generates the client-side JavaScript code and the web server's application code corresponding to each of the 3 SQL PROCEDUREs above.

Local data-processing by a client webpage's JavaScript (Type-3): Although not shown in Figure 7, two examples are as follows: (1) A client webpage's JavaScript dynamically updates its refreshed local time on its webpage; for this request, the developer will implement only POST_REQUEST, which calls CURTIME() and writes its return value to an output DOM node. (2) The server locally runs data analytics on its current database; for this request, the developer will implement only SERVE_REQUEST, not interacting with clients.

4.3 Application-level Retroactive Operation

Logging and Replaying User Requests: During the web application service's regular operation, Ultraverse silently logs all the information required for a retroactive operation on any user request, which includes the following: each called user request's name and timestamp, the calling client's ID, the webpage's customized DOM node IDs used as the user request's arguments, and interactive user inputs used as arguments (if any). To log them, Apper's generated client-side code includes application logic such that whenever the client sends a user request to the server, it piggybacks the client's execution logs of user requests (e.g., INPUT values to PRE_REQUESTs). The server merges these logs into the server's global log and uses it to build the query dependency graph for retroactive operation. The replay phase has to replay each user request's application-specific logic of updating the browser cookie or any persistent application variables which survive across multiple user requests (e.g., JavaScript global/static variables). uTemplate requires the developer to implement such application-specific logic as SQL logic that updates Ultraverse's two specially reserved tables (BrowserCookie and AppVariables) inside PRE_REQUEST, SERVE_REQUEST, and POST_REQUEST. Thus, when Ultraverse's replay phase replays these PROCEDUREs, they replay each webpage's cookie and persistent variables. While replaying them, any customized DOM nodes used as arguments to user requests are also re-computed based on the retroactively updated database state. During the retroactive operation, clients need not be online, because Ultraverse locally replays the user requests of all clients by itself. See §B for in-depth details.

Optimizing Retroactive Operation: Ultraverse treats each type of user request as an application-level transaction, computes its R/W/K sets, and rolls back & replays only dependent application-level transactions, both column-wise and row-wise. Hash-jumps are made at the granularity of application-level transactions.

Replaying Interactive Human Decisions: During Ultraverse's retroactive operation, it is tricky to retroactively replay interactive user inputs, because Ultraverse cannot replay a human mind.
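One way out, elaborated in the two replay options discussed next, is to substitute a programmable decision policy for the recorded human input. A hypothetical sketch of such a simulated decision, mirroring the balance-capped transfer example from this section (names are illustrative, not the framework's API):

```python
def simulated_transfer_amount(current_balance, intended_amount):
    """Simulated human decision for replay: if the retroactively updated
    balance no longer covers the originally intended transfer, transfer
    only what the user currently has (and nothing if overdrawn)."""
    if current_balance >= intended_amount:
        return intended_amount
    return max(current_balance, 0)
```

During replay, such a policy would run in place of the recorded input whenever the retroactively updated state makes the recorded input infeasible.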


Ultraverse provides 2 options for handling this. First, Ultraverse's retroactive replay uses the same interactive human inputs as recorded in the past user request log. For example, suppose there is a transaction where Alice, who initially had $200, sends $100 to Bob. Then, suppose that a retroactive operation changes Alice's initial balance to $50. During the replay, Alice's transaction of sending $100 to Bob will abort due to her insufficient funds. Although this is a correct result in the application's semantics, from a human's viewpoint, Alice might not have tried to send $100 to Bob if her balance had been lower than that. To address this issue, Ultraverse's second option allows the replay phase to change each user request's INPUTs to different (e.g., human-engineer-picked) values to simulate different human decisions. These new values can be either a constant or a return value of PRE_REQUEST_INTERACTIVE, an optional PROCEDURE in the uTemplate which executes only during retroactive operation to generate different INPUTs to PRE_REQUEST (i.e., different user inputs). For example, the developer's PRE_REQUEST_INTERACTIVE can implement logic simulating a human mind such that if the user's current (i.e., retroactively updated) balance is lower than her intended amount of transfer, then she transfers only the amount she currently has.

Other Design Topics: Due to the space limit, see §C for Ultraverse's other features: handling the browser cookie across user requests (§C.1); preserving the secrecy of a client's secret values, such as a password or random seed, during replay (§C.2); supporting client-side code's dynamic registration of event handlers (§C.3); handling malicious clients who hack their downloaded webpage's JavaScript to tamper with their user request logic or send corrupt user request logs to the server (§C.4); handling AUTO_INCREMENT initiated by a user request's PROCEDURE (§C.5); the advanced query clustering scheme (§C.6); virtual two-level table mappings to improve column-wise dependency analysis (§C.7); and column-specific retroactive operation, which allows the user to selectively skip unneeded columns without harming correctness (§C.8).

[Figures 8 and 9: bar charts of the retroactive operation's execution delay (sec) for MariaDB (M), Ultraverse (U), and Ultraverse with Hash-jumper (U(H)) on the TATP, SEATS, Epinions, and RS benchmarks, under Redo and Sync modes.]
Figure 8: Testcases without Hash-jumps.
Figure 9: Testcases where Hash-jumps are possible.

[Figure 10: cluster key propagation graphs. TPCC: the cluster key column Warehouse.W_ID propagates to the foreign cluster key columns New_Order.NO_W_ID, Stock.S_W_ID, Order_Line.OL_W_ID, District.D_W_ID, OOrder.O_W_ID, Customer.C_W_ID, and History.H_C_W_ID. TATP: the cluster key column Subscriber.s_id propagates to the alias cluster key column Subscriber.sub_nbr (sub_nbr→s_id). SEATS: the co-usable cluster key columns Flight.f_id and Customer.c_id propagate to Frequent_Flyer.ff_c_id, Customer.c_id_str, Reservation.r_f_id, and Reservation.r_c_id.]
Figure 10: TATP, TPCC, and SEATS' cluster key propagation graphs generated by the row-wise query clustering technique.

5 Evaluation

Implementation: Ultraverse can be seamlessly deployed on top of any unmodified SQL-based database system. Our prototype's host database system was MariaDB [42]. We implemented the query analyzer (column-wise & row-wise dependency analysis, the replay scheduler, and Hash-jumper) in C. The query analyzer reads MariaDB's binary log [43] to retrieve committed queries and computes each query's R/W/K sets and table hashes. The replay scheduler uses a lockless queue [23] and atomic compare-and-swap instructions [28] to reduce contention among threads simultaneously dequeuing the queries to replay. For database rollback, there are 3 options: (i) sequentially apply an inverse operation to every committed query; (ii) use a temporal database to stage all historical table states; (iii) assume periodic snapshots of backup DBs (e.g., every 3 days or 1 week). Our evaluation chose option 3, as creating system backups is a common practice and this approach incurs no rollback delay (i.e., we can simply load the particular version of the backup DB). For the Ultraverse web framework, we implemented Apper in Python, which reads a user-provided uTemplate and generates web application code for the ExpressJS web framework [52]. The Ultraverse software and installation guideline are available at https://anonymous.4open.science/r/ultraverse-8E1D/README.md.

In this section, we evaluate the Ultraverse database system (§5.1) and web application framework (§5.2), and present case studies (§5.3).

5.1 Database System Evaluation

We evaluated Ultraverse on Digital Ocean's VM with 8 virtual CPUs, 32GB RAM, and a 640GB SSD (NVMe). We compared the speed of retroactive operation of MariaDB (M) and Ultraverse (U). We used five micro-benchmarks in BenchBase [22]: TPC-C, TATP, Epinions, SEATS, and ResourceStresser (RS). For each benchmark, we ran retroactive operations on various sizes of commit history: 1M, 10M, 100M, and 1B queries. For each benchmark, we designed realistic retroactive scenarios which choose a particular transaction to retroactively remove/add and retroactively update the database.
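Figure 10's cluster key columns are the outcome of Table 2's choice rule (§3.5), which the query analyzer's K sets feed into. A minimal sketch of that argmin (hypothetical; K is assumed to be a lookup returning a query's cluster key set for a column, or None if unspecified):

```python
def choose_cluster_key_column(columns, queries, K):
    """Table 2's choice rule (sketch): pick the column c minimizing
    sum over the commit-history window of |K_c(Q_j)|^2. A query that
    leaves c unspecified contributes K = "all rows", excluding c."""
    def weight(column):
        total = 0
        for q in queries:
            key_set = K(column, q)
            if key_set is None:       # unspecified -> |K|^2 treated as infinite
                return float("inf")
            total += len(key_set) ** 2
        return total
    best = min(columns, key=weight)
    # If every candidate's weight is infinite, clustering is unusable.
    return None if weight(best) == float("inf") else best

# Toy K sets for Figure 3's queries (hypothetical):
K_SETS = {
    ("Users.uid", "Q5"): {"alice", "bob"},
    ("Users.uid", "Q10"): {"alice", "bob"},
    ("Users.uid", "Q12"): {"charlie"},
}
lookup = lambda c, q: K_SETS.get((c, q))
```

Returning None corresponds to the fallback described in §3.5, where Ultraverse uses only the column-wise dependency analysis.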


to the space limit, we describe each benchmark's retroactive operation scenarios and Ultraverse's optimization analysis in §D. We ran each testcase 10 times and reported the median.

Both Ultraverse and MariaDB used a backup DB to instantly roll back the database. For the replay, Ultraverse used column-wise & row-wise dependency analysis, parallel replay, and hash-jump described in §3, while MariaDB serially replayed all transactions committed after the retroactive target transaction. For the replay, both Ultraverse and MariaDB skipped read-only transactions (comprised of only SELECT queries) as they do not affect the database's state.

Since this subsection evaluates only the Ultraverse database system (not its web application framework), we modified BenchBase's source code such that any benchmark transaction logic implemented in Java application code is instead implemented in SQL, so that it is recorded and visible in the SQL log and can be properly replayed by both MariaDB and the Ultraverse database system. Each benchmark's default cluster number was set as follows: TPC-C (10 warehouses), TATP (200,000 subscribers), Epinions (2,000 users), ResourceStresser (1,000 employees), and SEATS (100,000 customers).

Figure 8 shows the execution time of retroactive operation scenarios between MariaDB and Ultraverse. Ultraverse's major speedup was from its significantly smaller number of queries to be replayed. Table 4 summarizes Ultraverse's query reduction rate (w/o Hash-jumper), which is between 90% and 99%. For most benchmarks, Ultraverse achieved the query number reduction rate in proportion to the number of clusters (e.g., users, customers, employees, or warehouses).

        TPC-C  TATP  Epinions  SEATS  RS
U       90%    99%   99%       99%    94%
U(H)    96%    99%   99%       99%    97%
Table 4: Query reduction rate vs. MariaDB.

Figure 10 shows Ultraverse's cluster key propagation graphs for TPC-C, TATP, and SEATS. In TPC-C, Warehouse.w_id was the cluster key column. All 8 of TPC-C's tables participating in cluster key propagation have explicit foreign key relationships, which were discovered by the basic query clustering scheme (§3.5). In TATP, Subscriber.s_id was the cluster key column, and the alias cluster key mapping of Subscriber.sub_nbr→Subscriber.s_id was discovered by the advanced query clustering technique (§C.6). SEATS also used the advanced clustering technique to simultaneously use 2 cluster key columns (Flight.f_id and Customer.c_id) with an alias cluster key column (Customer.c_id_str). We present the cluster key propagation graphs and column-wise transaction (query) dependency graphs for all benchmarks in §D.

In both MariaDB and the Ultraverse database system, reading the query log and replaying queries were done in parallel. However, as the sequential disk batch-reading speed of SSD NVMe was faster than serial execution of queries, the critical path of retroactive operation was the replay delay. Ultraverse additionally had a database synchronization delay, because it replays only column-wise & row-wise dependent queries, and thus at the end of its replay, it updates only the affected table rows & columns in the original database. As shown in Figure 8, this synchronization delay was negligibly small (∼1 second).

Figure 9 shows the execution time for different retroactive operation testcases where the Hash-jump optimization is applicable (i.e., a table hash match is found by Hash-jumper). We denote Ultraverse without Hash-jump as U, and Ultraverse with Hash-jump enabled as U(H). Using hash-jump achieved an additional 101%∼185% speedup compared to not using it, by detecting hash matches and terminating the effectless retroactive operation in advance. Our Ultraverse prototype's upper-bound of hash collision rate was approximately 1.16×10^-77.

The column-wise & row-wise query dependency analysis and hash-jump analysis incur additional overhead during regular service operations due to their extra logging activity of R/W/K sets and table hash values for each committed query. We measured this overhead in Table 5, which is between 0.6%∼9.5%. However, this overhead can be almost fully reimbursed in practice if Ultraverse's analyzer runs on a different machine than the database system, as regular query operations and query analysis can be performed asynchronously.

        TPC-C  TATP  Epinions  SEATS  RS
U       7.0%   0.7%  3.9%      3.9%   6.4%
U(H)    9.5%   0.6%  4.9%      4.2%   6.4%
Table 5: Overhead (%) for regular service operations.

Table 6 shows how much a retroactive operation running in the background slows down regular operations running in the foreground when they are executed simultaneously on the same machine. The average overhead varied between 3.3%∼16.5%. However, this overhead can be almost fully reimbursed if the retroactive operation's replay runs on a different machine and only the synchronization at the end runs on the same machine. Table 7 shows Ultraverse's average log size per query, which varies between 12∼110 bytes, depending on the benchmark.

          TPC-C  TATP  Epinions  SEATS  RS
Slowdown  8.2%   7.2%  14.1%     16.5%  3.3%
Table 6: Overhead of simultaneously executing a retroactive operation and regular service operations on the same machine.

                        TPC-C  TATP  Epinions  SEATS  RS
Log Size (bytes)/Query  110b   48b   12b       35b    48b
Table 7: Ultraverse's average log size per query (bytes).

Queries      TPC-C  TATP    Epinions  SEATS   RS
1 Million    10.0x  240.8x  153.2x    112.0x  10.7x
10 Million   10.5x  253.0x  145.4x    111.9x  9.6x
100 Million  10.1x  232.5x  156.8x    119.2x  12.3x
1 Billion    10.8x  241.1x  153.4x    114.4x  11.8x
Table 8: Speedup for various rollback/replay window sizes.

Size Factor  TPC-C    TATP    Epinions  SEATS   RS
1            10.1x    232.5x  156.8x    119.2x  12.3x
10           50.19x   683.9x  651.9x    693.8x  131.4x
100          106.81x  667.4x  693.3x    674.8x  659.2x
Table 9: Speedup for various database sizes.

Table 8 reports Ultraverse's speedup against MariaDB in retroactive operation across different window sizes of commit history: 1M, 10M, 100M, and 1B queries. Regardless of the window size of queries
to be rolled back and replayed, each benchmark's speedup in retroactive operation stayed roughly constant. This is because each benchmark's transaction weights were constant, so Ultraverse's reduction rates of queries to be replayed were consistent regardless of the window size of committed transactions.

Table 9 reports Ultraverse's speedup of retroactive operation against MariaDB across different database sizes (10MB∼10GB), while the window size of commit history is constant (100M queries). Interestingly, Ultraverse's speedup increased roughly in proportion to the database's size factor. This is because a bigger database had a larger number of clusters (e.g., warehouses, customers), which led to finer granularity of row-wise query clustering. When the database size factor was 100, there were too many query clusters, and thus too few queries to replay, despite the large size of the query analysis log. In such cases, Ultraverse's replay speedup was upper-limited by the delay of reading and interpreting the query analysis log.

5.2 Web Application Framework Evaluation
For this evaluation, we re-implemented Invoice Ninja [17] based on Ultraverse's web application framework. Invoice Ninja is a Venmo-like open source web application for invoice management of online users. The application provides 31 types of user requests (e.g., creating or editing a user's profile). The application consists of 6 major server-side data tables (Users, Items, Bills, BillItems, Payments, and Statistics) and 2 client-side data tables (Cookie and Session). Each user's DOM displays a user-specific dashboard such as the invoices to be paid, the current balance, etc. Each user request consists of PRE_REQUEST, SERVE_REQUEST, and POST_REQUEST. An invalid user request gets aborted (e.g., the user's provided credential is invalid, or the user's balance is insufficient). We designed 5 retroactive operation scenarios, each of which retroactively removes an attacker-triggered user request committed in the past. We describe these scenarios and Ultraverse's optimization analysis in §D.6.

[Figure 11: Retroactive operation times for Invoice Ninja. Bar chart of execution delay (sec) for MariaDB (M) vs. Ultraverse (U), split into redo and sync time, for Scenarios 1-5; M's delays ranged between 41,936∼45,974 seconds, while U's ranged between 70∼143 seconds.]

Transactions  Scen1   Scen2   Scen3   Scen4   Scen5
1 Million     422.3x  495.3x  574.5x  333.5x  594.5x
10 Million    415.5x  489.6x  593.2x  343.4x  661.4x
100 Million   424.5x  478.1x  582.4x  321.5x  640.8x
1 Billion     418.3x  483.5x  581.2x  325.6x  581.5x
Table 10: Speedup for various rollback/replay window sizes.

[Figure 12: Invoice Ninja's cluster key propagation graph. The cluster key column Users.u_id propagates through Sessions.u_id, Sessions.session_id, Cookies.session_id, Items.creator_id, Items.item_id, BillItems.item_id, Bills.creator_id, Bills.payer_id, Bills.bill_id, BillItems.bill_id, Payments.creator_id, Payments.recipient_id, and Statistics.user_id, via alias, foreign, and implicit foreign cluster key columns. The "Cookies" and "BillItems" tables have virtual 2-level table mappings.]

Size Factor  Scen1   Scen2   Scen3   Scen4   Scen5
1            424.5x  478.1x  582.4x  321.5x  640.8x
10           663.8x  694.4x  671.9x  687.1x  721.6x
100          648.5x  673.5x  664.3x  695.6x  674.6x
Table 11: Speedup for various database sizes in 5 scenarios.

Figure 12 shows Ultraverse's cluster key propagation graph for Invoice Ninja. All columns propagating the cluster keys had implicit foreign key relationships (i.e., not defined as FOREIGN KEY in the table schema but used as foreign keys in the application semantics), which were discovered by the advanced query clustering scheme (§C.6). Ultraverse clustered all table rows of the Invoice Ninja database by using Users.user_id as the cluster key column.

We first compared the performance of Ultraverse and MariaDB on both retroactive operation and regular service operation for a database with 10,000 users for 100M transactions. For retroactive operations, Ultraverse replayed only dependent user requests both column-wise and row-wise, whereas MariaDB using the naive strategy did so for all past user requests. Also note that MariaDB's retroactive operation does not provide correctness for application semantics. As depicted in Figure 11, Ultraverse's median speedup over MariaDB was 478.1x. Table 10 shows Ultraverse's speedup against MariaDB for various window sizes of transaction history. While achieving the observed speedup, Ultraverse's average reduction rate of the number of replay queries compared to MariaDB was 99.8%.

Table 11 shows Ultraverse's speedup against MariaDB across various database size factors (10,000∼1M users), for the same window size of transaction commit history. The speedup increased with the increasing database size factor, due to the increasing number of clusters (i.e., Users.u_id), and was eventually upper-limited by the delay of reading and interpreting the query analysis log.

Ultraverse's query analysis overhead during regular service operation (when both run on the same machine) was 4.2% on average. Ultraverse's additional storage overhead for the query analysis log was 205 bytes per user request on average.

Data Analytics Evaluation: We evaluated Ultraverse on a popular use case of data analytics (§6), which runs the K-clustering algorithm on 1 million online articles created by 10,000 users and classifies them into 10 categories based on their word relevance, using the LDA algorithm [4]. The stage for data processing and view materialization used Hive [47] and Hivemall [32], a machine-learning programming framework based on JDBC. We ported the application into Ultraverse and compared it to MariaDB's replay. Ultraverse's average overhead during regular service operation was 1.8%, and its speedup in retroactive operation was 41.7x. The speedup came from eliminating independent queries in the article classification stage, where each user ID was the cluster key.

5.3 Case Study
According to cyberattack reports in 2020 [18, 25], financial organizations take 16 hours on average to detect security breaches (while other domains may take ≥ 48 hours). Recent financial statistics [24]
show that every day the U.S. generates on average 108 million credit card transactions, which account for 23% of all types of financial transactions. This implies that every day the U.S. generates roughly 108 million÷23%×100%=470 million financial transactions.

   TPC-C  TATP   Epinions  SEATS   RS      Invoice Ninja
M  84.6h  95.8h  3.6h      135.6h  101.0h  120.2h
U  8.5h   0.4h   0.1h      1.1h    5.1h    0.3h
Table 12: Retroactive operation time for 1B transactions.

Table 12 reports the retroactive operation time of MariaDB and Ultraverse for 1 billion transactions of each benchmark. MariaDB's median retroactive operation time was 95.8 hours, while Ultraverse cut it down to 0.4 hours. Most importantly, a regular database system's SQL-only replay does not provide correctness of the application semantics as Ultraverse's web application framework does. Ultraverse's estimated VM rent cost ($0.41/h) for its retroactive operation is $0.04∼$20.9, which is significantly cheaper than hiring human engineers to handcraft compensating transactions for manual data recovery. Ideally, those human engineers can use Ultraverse as a complementary tool to assist their manual analysis of data recovery in financial domains as well as in other various web service domains.

6 Discussion
Data Analytics: Ultraverse can be used to enforce GDPR compliance in data analytics. Modern data analytics architectures (e.g., Azure [45] or IBM Analytics [35]) generally consist of four stages: (i) sourcing data; (ii) storing them in an SQL database; (iii) retrieving data records from tables and processing them to create materialized views; (iv) using materialized views to run various data analytics. When a user initiates her data deletion, all other data records derived from it during the data analytics should also be accordingly updated. Thus, GDPR-compliant data analytics should replay the third stage (e.g., data processing) to reflect changes to materialized views. However, this stage often involves machine-learning or advanced statistical algorithms, which are complex, computationally heavy, and do not have an efficient incremental deletion algorithm. Ultraverse can be used for such GDPR deletion of user data to efficiently update materialized views. Ultraverse can be easily ported to this use case, because many data analytics frameworks such as Hive [47] or Spark SQL [56] support the SQL language to implement complex data processing logic. We provide our experimental results in §5.2.

What-if Analysis: Ultraverse can be used for what-if analysis [21] at both the database and application levels, with the capability of retroactively adding/removing any SQL queries or application-level transactions. For example, one can use Ultraverse to retroactively add/remove certain queries and test if SQL CONSTRAINT/ABORT conditions are still satisfied.

Cross-App: We currently support retroactive operation within a single Ultraverse web application. Our future work is to enable cross-app retroactive operation (across many databases/services).

7 Related Work
Retroactive Data Structures efficiently change past operations on a data object. Typical retroactive actions are insertion, deletion, and update. Retroactive algorithms have been designed for queues [13], doubly linked queues [26], priority queues [20], and union-find [54].

Temporal and Versioning Databases [37] store each record with a reference of time. The main temporal aspects are valid time and transaction time [15]. The former denotes the beginning and ending time a record is regarded as valid in its application-specific real world; the latter denotes the beginning and ending time a database system regards a record to be stored in its database. A number of database languages support temporal queries [5, 50]. SQL:2011 [40] incorporates temporal features and MariaDB [42] supports its usage. OrpheusDB [34] allows users to efficiently store and load past versions of SQL tables by traversing a version graph. While temporal or versioning databases enable querying a database's past states, they do not support changing committed past operations (which entails updating the entire database accordingly).

Database Recovery/Replication: There are two logging schemes: (i) value logging logs changes on each record; (ii) operation logging logs committed actions (e.g., SQL statements). ARIES [49] is a standard technique that uses value logging (undo and redo logs) to recover inconsistent checkpoints. Some value logging systems leverage multiple cores to speed up database recovery (e.g., SiloR [65]) or replay (e.g., KuaFu [33]). However, value logging is not designed to support retroactive operation: if a query is retroactively added or removed, the value logs recorded prior to that event become stale. Systems that use operation logging [16, 48, 51, 53, 58] are mainly designed to efficiently replicate databases. However, they either inefficiently execute all queries serially or do not support strong serialization [6], while Ultraverse supports efficient strong serialization.

Attack Recovery: CRIU-MR [63] recovers a malware-infected Linux container by selectively removing malware during checkpoint restoration. ChromePic [59] replays browser attacks by transparently logging the user's page navigation activities and reconstructing the logs for forensics. RegexNet [2] identifies malicious DoS attacks that send expensive regular expression requests to the server, and recovers the server by isolating requests containing the identified attack signatures. However, these prior works do not address how to selectively undo the damage on the server's persistent database, both efficiently and correctly with respect to application semantics. Warp [9] and Rail [10] selectively remove problematic user request(s) or patch the server's code, and reconstruct the server's state based on that. However, all these techniques require replaying heavy browsers (∼500MB per instance) during their replay phase, which is not scalable for large services that have more than even 1M users or 1M transactions to replay. On the other hand, Ultraverse's strength lies in its supreme efficiency and scalability: Ultraverse uses pTemplate and Apper to represent each webpage as compact SQL code, and replaying a web service's history only requires replaying these SQL queries, which is faster and lighter-weight. Further, the prior works do not have novel database techniques that reduce the number of replay queries, such as Ultraverse's column-wise query dependency analysis, advanced row-wise query clustering, and hash jumper.

Provenance in Databases: What-if-provenance [21] speculates the output of a query if a hypothetical modification is made to the database. Why-provenance [7] traces the origin tuples of each output tuple by analyzing the lineage [14] of data generation. How-provenance [30] explains the way origin tuples are combined to
generate each output (e.g., ORCHESTRA [29], SPIDER [1]). Where-provenance [7] traces each output's origin tuple and column, annotates each table cell, and propagates labels [3] (e.g., Polygen [62], DBNotes [12]). Mahif [8] is a recent work that answers historical what-if queries: it computes the delta difference in the final database state given a modified past operation. Mahif leverages symbolic execution to safely ignore a subset of transaction history that is provably unaffected by the modified past operation. However, Mahif is not scalable over the transaction history size: its cost of symbolic execution (i.e., the symbolic constraints on tuples that the SMT solver has to solve) grows unbearably large as the history grows. While Mahif was designed and evaluated for only a small transaction history (only up to 200 insert/delete/update operations used in experiments), Ultraverse has been demonstrated to efficiently handle beyond 1 billion transactions. Finally, note that all prior database provenance works do not preserve application semantics like Ultraverse does.

8 Conclusion
Ultraverse efficiently updates a database for retroactive operation. By using its various novel techniques such as column-wise & row-wise query dependency analysis and hash-jump, Ultraverse speeds up retroactive database update by up to two orders of magnitude over a regular rollback and replay. Further, Ultraverse provides a web application framework that retroactively updates the database with awareness of application semantics.

References
[1] B. Alexe, L. Chiticariu, and W.-C. Tan. Spider: A schema mapping debugger. In VLDB, pages 1179–1182, 2006.
[2] Z. Bai, K. Wang, H. Zhu, Y. Cao, and X. Jin. Runtime recovery of web applications under zero-day redos attacks. In 42nd IEEE Symposium on Security and Privacy, SP 2021, San Francisco, CA, USA, 24-27 May 2021, pages 1575–1588. IEEE, 2021.
[3] D. Bhagwat, L. Chiticariu, W. C. Tan, and G. Vijayvargiya. An annotation management system for relational databases. VLDB J., 14(4):373–396, 2005.
[4] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. J. Mach. Learn. Res., 3:993–1022, Mar. 2003.
[5] M. H. Bohlen, C. S. Jensen, and R. T. Snodgrass. Temporal statement modifiers. ACM Trans. Database Syst., 25(4):407–456, 2000.
[6] Y. Breitbart, H. Garcia-Molina, and A. Silberschatz. Overview of multidatabase transaction management. VLDB J., 1(2):181–239, 1992.
[7] P. Buneman, S. Khanna, and W. C. Tan. Why and where: A characterization of data provenance. In ICDT, pages 316–330, 2001.
[8] F. S. Campbell, B. S. Arab, and B. Glavic. Efficient answering of historical what-if queries. In Proceedings of the 2022 International Conference on Management of Data, SIGMOD '22, pages 1556–1569, New York, NY, USA, 2022. Association for Computing Machinery.
[9] R. Chandra, T. Kim, M. Shah, N. Narula, and N. Zeldovich. Intrusion recovery for database-backed web applications. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, SOSP '11, pages 101–114, New York, NY, USA, 2011. Association for Computing Machinery.
[10] H. Chen, T. Kim, X. Wang, N. Zeldovich, and M. F. Kaashoek. Identifying information disclosure in web applications with retroactive auditing. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pages 555–569, Broomfield, CO, Oct. 2014. USENIX Association.
[11] R. Chirkova and J. Yang. Materialized views. Foundations and Trends® in Databases, 4(4):295–405, 2012.
[12] L. Chiticariu, W.-C. Tan, and G. Vijayvargiya. DBNotes: A post-it system for relational databases based on provenance. In SIGMOD, pages 942–944, 2005.
[13] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. MIT Press, 2001.
[14] Y. Cui, J. Widom, and J. L. Wiener. Tracing the lineage of view data in a warehousing environment. ACM Trans. Database Syst., 25(2):179–227, 2000.
[15] C. Date and H. Darwen. Temporal Data and the Relational Model. Morgan Kaufmann Publishers Inc., 2002.
[16] K. Daudjee and K. Salem. Lazy database replication with snapshot isolation. In VLDB, pages 715–726, 2006.
[17] David Bomba. Invoice ninja, 2020. https://github.com/invoiceninja/invoiceninja.
[18] Deep Instinct. Voice of SecOps, 2021. https://www.deepinstinct.com/pdf/voice-of-secops-report-2nd-edition.
[19] E. D. Demaine, J. Iacono, and S. Langerman. Retroactive data structures, 2007.
[20] E. D. Demaine, T. Kaler, Q. Liu, A. Sidford, and A. Yedidia. Polylogarithmic fully retroactive priority queues via hierarchical checkpointing. In WADS, pages 263–275, 2015.
[21] D. Deutch, Z. Ives, T. Milo, and V. Tannen. Caravan: Provisioning for what-if analysis. In CIDR, 2013.
[22] D. E. Difallah, A. Pavlo, C. Curino, and P. Cudre-Mauroux. OLTP-Bench: An extensible testbed for benchmarking relational databases. PVLDB, 7(4):277–288, 2013.
[23] DPDK Project. Ring Library, 2019. https://doc.dpdk.org/guides/prog_guide/ring_lib.html.
[24] Erica Sandberg. The Average Number of Credit Card Transactions Per Day & Year, 2021. https://www.cardrates.com/advice/number-of-credit-card-transactions-per-day-year/.
[25] Fawad Ali. How Long Does It Take to Detect and Respond to Cyberattacks?, 2021. https://www.makeuseof.com/detect-and-respond-to-cyberattacks/.
[26] R. Fleischer. A simple balanced search tree with O(1) worst-case update time. Int. J. Found. Comput. Sci., pages 137–150, 1996.
[27] D. Fu and F. Shi. Buffer overflow exploit and defensive techniques. In 2012 Fourth International Conference on Multimedia Information Networking and Security, pages 87–90, 2012.
[28] GCC GNU. Built-in functions for atomic memory access, 2019. https://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/Atomic-Builtins.html.
[29] T. J. Green, G. Karvounarakis, N. E. Taylor, O. Biton, Z. G. Ives, and V. Tannen. ORCHESTRA: Facilitating collaborative data sharing. In SIGMOD, pages 1131–1133, 2007.
[30] T. J. Green and V. Tannen. The semiring framework for database provenance. In PODS, pages 93–99, 2017.
[31] J. Hasan and A. M. Zeki. Evaluation of web application session security. In 2nd Smart Cities Symposium (SCS 2019), pages 1–4, 2019.
[32] Hivemall. Hivemall documentation. https://hivemall.apache.org/userguide/index.html.
[33] C. Hong, D. Zhou, M. Yang, C. Kuo, L. Zhang, and L. Zhou. KuaFu: Closing the parallelism gap in database replication. In ICDE, pages 1186–1195, 2013.
[34] S. Huang, L. Xu, J. Liu, A. J. Elmore, and A. G. Parameswaran. OrpheusDB: bolt-on versioning for relational databases (extended version). VLDB J., 29(1):509–538, 2020.
[35] IBM. IBM Data Analytics. https://www.ibm.com/analytics/journey-to-ai?p1=Search&p4=43700057304811782&p5=b&cm_mmc=Search_Google-_.
[36] James Cheney, Laura Chiticariu, and Wang-Chiew Tan. Provenance in databases: Why, how, and where. Foundations and Trends in Databases, pages 379–474, 2009.
[37] C. Jensen. Introduction to temporal database research. Temporal database management, pages 1–29, 2000.
[38] A. Jindal, H. Patel, A. Roy, S. Qiao, Z. Yin, R. Sen, and S. Krishnan. Peregrine: Workload optimization for cloud query engines. In SoCC, pages 416–427, 2019.
[39] M. Kaufmann, P. Fischer, N. May, C. Ge, A. Goel, and D. Kossmann. Bi-temporal timeline index: A data structure for processing queries on bi-temporal data. In ICDE, pages 471–482, 2015.
[40] K. Kulkarni and J.-E. Michels. Temporal features in SQL:2011. SIGMOD Rec., 41(3):34–43, 2012.
[41] L. Ma, D. Zhao, Y. Gao, and C. Zhao. Research on sql injection attack and prevention technology based on web. In 2019 International Conference on Computer Network, Electronic and Automation (ICCNEA), pages 176–179, 2019.
[42] MariaDB. MariaDB Source Code, 2019. https://mariadb.com/kb/en/library/getting-the-mariadb-source-code/.
[43] MariaDB. Overview of the Binary Log, 2019. https://mariadb.com/kb/en/library/overview-of-the-binary-log/.
[44] MariaDB. Temporal Data Tables, 2019. https://mariadb.com/kb/en/library/temporal-data-tables/.
[45] Microsoft. Advanced Analytics Architecture. https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/advanced-analytics-on-big-data.
[46] Microsoft. Compensating transaction pattern. https://docs.microsoft.com/en-us/azure/architecture/patterns/compensating-transaction.
[47] Microsoft. What is Apache Hive and HiveQL on Azure HDInsight? https://docs.microsoft.com/en-us/azure/hdinsight/hadoop/hdinsight-use-hive#:~:text=Hive%20enables%20data%20summarization%2C%20querying,knowledge%20of%20Java%20or%20MapReduce.
[48] U. F. Minhas, S. Rajagopalan, B. Cully, A. Aboulnaga, K. Salem, and A. Warfield. Remusdb: transparent high availability for database systems. VLDB J., 22(1):29–45, 2013.
[49] C. Mohan, D. Haderle, B. Lindsay, H. Pirahesh, and P. Schwarz. ARIES: A transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Trans. Database Syst., 17(1):94–162, 1992.
[50] G. Ozsoyoglu and R. T. Snodgrass. Temporal and real-time databases: a survey. IEEE Trans. Knowl. Data Eng., 7(4):513–532, 1995.
[51] C. Plattner and G. Alonso. Ganymed: Scalable replication for transactional web
applications. In Middleware, pages 155–174, 2004.
[52] Rodion Abdurakhimov. ExpressJS Source Code Repository, 2020. https://github.
com/expressjs/express.
[53] K. Salem and H. Garcia-Molina. System M: A transaction processing testbed for
memory resident data. IEEE Trans. Knowl. Data Eng., 2(1):161–172, 1990.
[54] N. Sarnak and R. E. Tarjan. Planar point location using persistent search trees.
Commun. ACM, 29(7):669–679, 1986.
[55] Sonal Srivastava. Conflict serializability in dbms, 2019. https://www.geeksforgeeks.
org/conflict-serializability-in-dbms/.
[56] A. Spark. Spark overview. https://spark.apache.org/sql/.
[57] G. Speegle. Compensating Transactions, pages 405–406. Springer US, Boston, MA,
2009.
[58] A. Thomson and D. J. Abadi. The case for determinism in database systems. PVLDB,
3(1):70–80, 2010.
[59] P. Vadrevu, J. Liu, B. Li, B. Rahbarinia, K. H. Lee, and R. Perdisci. Enabling recon-
struction of attacks on users via efficient browsing snapshots. In 24th Annual
Network and Distributed System Security Symposium, NDSS 2017, San Diego, Cali-
fornia, USA, February 26 - March 1, 2017. The Internet Society, 2017.
[60] Validata Group. Banks Busted by a Software Glitch during 2021,
2021. https://www.validata-software.com/blog-mobi/item/447-banks-busted-by-
a-software-glitch-during-2021.
[61] J. S. M. Verhofstad. Recovery techniques for database systems. ACM Comput. Surv.,
10(2):167–195, 1978.
[62] Y. R. Wang and S. E. Madnick. A polygen model for heterogeneous database
systems: The source tagging perspective. In VLDB, pages 519–538, 1990.
[63] A. Webster, R. Eckenrod, and J. Purtilo. Fast and service-preserving recovery from
malware infections using CRIU. In 27th USENIX Security Symposium (USENIX
Security 18), pages 1199–1211, Baltimore, MD, Aug. 2018. USENIX Association.
[64] G. Weikum and G. Vossen. Transactional Information Systems: Theory, Algorithms,
and the Practice of Concurrency Control and Recovery. Morgan Kaufmann Publishers
Inc., San Francisco, CA, USA, 2001.
[65] W. Zheng, S. Tu, E. Kohler, and B. Liskov. Fast databases with fast durability and
recovery through multicore parallelism. In OSDI, pages 465–477, 2014.


A The Table Columns Each Type of Query Reads/Writes

Query Type — Policy for Classifying Table Columns Into Read (R) & Write (W) Sets

CREATE | ALTER TABLE
  R = { the columns of external tables (or views) referenced by this query's FOREIGN KEYs }
  W = { all columns of the table (or view) to be created or altered }

DROP | TRUNCATE TABLE
  W = { all columns of the target table to be dropped or truncated,
        + all external tables' FOREIGN KEY columns that reference any column of this target table }

CREATE (OR REPLACE) VIEW
  R = { all columns of the original tables (or views) this view references }
  W = { all columns of the target view to be created }

DROP VIEW
  W = { all columns of the view to be dropped }

SELECT
  R = { the columns of the tables (or views) this query's SELECT or WHERE clause accesses,
        + the columns of external tables (or views) if this query uses a FOREIGN KEY referencing them,
        + the union of the R of this query's inner sub-queries }
  W = { }

INSERT
  R = { the union of the R of this query's inner sub-queries,
        + the columns of external tables (or views) if this query uses a FOREIGN KEY referencing them }
  W = { all columns of the target table (or view) this query inserts into }

UPDATE | DELETE
  R = { the union of the R of this query's inner sub-queries,
        + the columns of the target table (or view) this query reads,
        + the columns of external tables (or views) if this query uses a FOREIGN KEY referencing them,
        + the columns of the tables (or views) read in its WHERE clause }
  W = { either the specific updated columns or all deleted columns of the target table (or view),
        + all external tables' FOREIGN KEY columns that reference the target table's updated/deleted columns }

CREATE TRIGGER
  R = { the union of the R of all queries within it }
  W = { the union of the W of all queries within it }
  * These R/W sets are also added to the R/W sets of each query linked to this trigger
    (since the trigger co-executes with it).

DROP TRIGGER
  R/W = the same as the R/W sets of its counterpart CREATE TRIGGER query

TRANSACTION | PROCEDURE
  R = { the union of the R of all queries within this transaction or procedure }
  W = { the union of the W of all queries within this transaction or procedure }
  * Data flows via SQL's DECLARE variables or return values of sub-queries are also tracked.

Table A: Ultraverse's policy for generating a read set (R) and a write set (W) for each type of SQL query.
• In the above table, we intentionally omit all other SQL keywords that are not related to determining R/W
  sets (e.g., JOIN, LIMIT, GROUP BY, FOR/WHILE/CASE, LABEL, CURSOR, or SIGNAL SQLSTATE).
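As a concrete illustration of the policy above, the UPDATE | DELETE row can be read as a set union over four read sources and two write targets. A minimal sketch, assuming a hypothetical pre-parsed query description (the field names below are illustrative, not Ultraverse's actual data structures):

```javascript
// Sketch of Table A's UPDATE | DELETE policy. The input `q` is a hypothetical
// pre-parsed description of the query; each field holds fully-qualified
// "Table.column" names.
function rwSetsForUpdate(q) {
  const R = new Set([
    ...q.subQueryReadSets.flat(),       // union of inner sub-queries' R sets
    ...q.readColumns,                   // target-table columns the query reads
    ...q.foreignKeyReferencedColumns,   // FK-referenced external columns
    ...q.whereColumns,                  // columns read in the WHERE clause
  ]);
  const W = new Set([
    ...q.updatedColumns,                // the specific updated (or deleted) columns
    ...q.referencingForeignKeyColumns,  // external FK columns referencing them
  ]);
  return { R, W };
}

// Example: UPDATE Accounts SET balance = balance - 10 WHERE uid = 'alice',
// where Accounts.uid references Users.uid via a FOREIGN KEY.
const { R, W } = rwSetsForUpdate({
  subQueryReadSets: [],
  readColumns: ["Accounts.balance"],
  foreignKeyReferencedColumns: ["Users.uid"],
  whereColumns: ["Accounts.uid"],
  updatedColumns: ["Accounts.balance"],
  referencingForeignKeyColumns: [],
});
```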

Ronny Ko1,4 , Chuan Xiao2,3 , Makato Onizuka2 , Yihe Huang1 , Zhiqiang Lin4

B Apper-generated Web Application

// Create the "SERVE_REQUEST" PROCEDURE of the Type-1 user request (GET send_money.html)
SQL_Execute('
  CREATE PROCEDURE SendMoney_CreateWebpage
  (IN base__html TEXT, IN username VARCHAR(32), OUT final_html TEXT) AS
  BEGIN
    DECLARE @personalizedTitle AS TEXT;
    IF username != "" THEN
      SET @personalizedTitle = username + " 's Account";
      SET @final_html = Utility__UpdateHTML(base__html, "title", "innerHTML", personalizedTitle);
    END IF;
  END;');

// Create the "SERVE_REQUEST" PROCEDURE of the Type-2 user request (PUT send_money.html)
SQL_Execute('
  CREATE PROCEDURE SendMoney_Process
  (IN sender VARCHAR(32), IN receiver VARCHAR(32), IN amount VARCHAR(32), OUT result TEXT) AS
  BEGIN
    DECLARE cur_balance INT;
    SET @cur_balance = SELECT balance FROM Accounts WHERE uid = sender;
    IF (cur_balance >= amount) THEN
      UPDATE Accounts SET balance -= amount WHERE uid = sender;
      UPDATE Accounts SET balance += amount WHERE uid = receiver;
      INSERT INTO Transactions VALUES (sender, receiver, amount);
    END IF
    SET @result = "result: 'Successfully sent " + amount + " to " + receiver + "'"
  END;');

// Request handler for "HTTP GET https://online-banking.com/send_money.html"
function HttpHandler_SendMoney_CreateWebpage (var cookie)
{
  // Log the Ultraverse web application client's ID and her user request
  SQL_Execute(`INSERT INTO Ultraverse__Log (request_name, Ultraverse__uid, pre_request_args)
               VALUES ("SendMoney_CreateWebpage", ${cookie.Ultraverse__uid}, "")`)

  // Run the server's SERVE_REQUEST in SQL
  var personalized_html = SQL_Execute('CALL SendMoney_CreateWebpage (${base__html}, ${cookie.username})');

  // Return the personalized webpage to the client as an HTTP response
  return personalized_html;
}

// Request handler for "HTTP PUT https://online-banking.com/send_money.html"
function HttpHandler_SendMoney_Process (var http)
{
  // Log the Ultraverse web application client's ID and her user request
  SQL_Execute(`INSERT INTO Ultraverse__Log (request_name, Ultraverse__uid, pre_request_args)
               VALUES ("SendMoney_Process", ${http.cookie.Ultraverse__uid},
                       ${JSON.parse(http.body).preRequest__args.stringify()})`)

  // Run the server's SERVE_REQUEST in SQL
  var serveRequest__args.0 = JSON.parse(http.body).receiver;
  var serveRequest__args.1 = JSON.parse(http.body).amount;
  var httpResponse__body = SQL_Execute(`CALL SendMoney_Process (
      ${http.cookie.uid}, ${JSON.parse(http.body).recv}, ${JSON.parse(http.body).amt})`);

  // Return the response in a JSON format
  return "{" + httpResponse__body + "}";
}

Figure 13: The server-side web application code generated by Apper based on Figure 7's uTemplate.

// The "SendMoney" webpage's base HTML which is auto-generated by Apper
const base__html = '<html> <head></head> <body>
<script>
// Create the "PRE_REQUEST" PROCEDURE of the Type-2 user request (PUT send_money.html)
SQL_Execute(`CREATE PROCEDURE SendMoney_PRE_REQUEST
  (IN receiver VARCHAR(16), IN amount VARCHAR(16),
   OUT recv VARCHAR(16), OUT amt VARCHAR(16)) AS
  BEGIN
    SET @recv = receiver;
    SET @amt = amount;
  END`);
</script>
<h1 id="title">User Account</h1>
<form onsubmit="SendMoney()">
  <input type="text" id="receiver" value="" >
  <input type="text" id="amount" value="" >
</form>
<script>
function SendMoney()
{
  var preRequest__args = {};
  // Ensure no output data from POST_REQUEST may flow back to PRE_REQUEST
  console.assert(!document.getElementById("receiver").taint__bit);
  console.assert(!document.getElementById("amount").taint__bit);

  // Record interactive user inputs & non-interactive DOM values that are used
  // as arguments to PRE_REQUEST. Later, send them to the server for a future
  // retroactive operation to replay the client's state. Only the arguments that
  // are either interactive or personalized need to be refreshed during the replay.
  preRequest__args.0 = {};
  preRequest__args.0.interactive = true;
  preRequest__args.0.personalized = false;
  preRequest__args.0.dom_id = "receiver";
  preRequest__args.0.val = document.getElementById("receiver").value;

  preRequest__args.1 = {};
  preRequest__args.1.interactive = true;
  preRequest__args.1.personalized = false;
  preRequest__args.1.dom_id = "amount";
  preRequest__args.1.val = document.getElementById("amount").value;

  // Call "PRE_REQUEST" of the Type-2 user request (PUT send_money.html)
  var httpRequest__body = SQL_Execute(`CALL SendMoney_PRE_REQUEST
      (${preRequest__args.0.val}, ${preRequest__args.1.val})`);
  httpRequest__body.preRequest__args = preRequest__args;
  var httpResponse__body = http_send( host="online-banking.com", port=443,
      path="send_money.html", method="PUT",
      request_body= "{" + httpRequest__body + "}" ); // HTTP body as JSON format

  // "POST_REQUEST" of the Type-2 user request (PUT send_money.html):
  // print the money transfer result message on the screen and taint
  // the output DOM node to prevent data flow back to PRE_REQUEST
  var postRequest__args = {};
  postRequest__args.result = JSON.parse(httpResponse__body).result;
  var newNode__0 = document.createElement("p");
  newNode__0.id = "result";
  newNode__0.innerHTML = postRequest__args.result;
  newNode__0.taint__bit = true;
  document.getElementsByTagName("body")[0].appendChild(newNode__0);

  // "POST_REQUEST": Update the client's browser cookie
  document.cookie.last_activity = result;
}
</script>
</body></html>';

Figure 14: The base HTML common to all clients generated by Apper based on Figure 7's uTemplate.

Figure 13 and Figure 14 are the JavaScript application code generated by Apper based on Figure 7's uTemplate. String literals are colored in brown, of which SQL queries are colored in purple. Variable names containing double underscores (__) are special variables reserved by Ultraverse, which are generated during Apper's conversion from uTemplate to application code. Bold JavaScript/SQL variables store client-specific values (e.g., interactive user input or a personalized DOM node). These bold variables may change their values during retroactive operations, and thus Ultraverse is designed to recompute their values during retroactive operation.

Logging & Replaying User Requests: Ultraverse reserves one special property in the browser's cookie: cookie.Ultraverse__uid. This property stores each Ultraverse web application client's unique ID, which can also be used as an application-level unique user ID. Whenever the client makes a Type-2 (client-server remote data processing) or Type-3 (client-only local data processing) request, the webpage's client-side JavaScript (generated by Apper) sends the application-specific user request data to the web server. This code also silently piggybacks the following 3 pieces of system-level information to the server: (1) the client's unique ID (i.e., cookie.Ultraverse__uid); (2) the user request's name (e.g., "SendMoney"); and (3) the arguments of the user request's PRE_REQUEST call. These 3 pieces of information are essential for Ultraverse to run an application-level retroactive operation in a multi-client service.

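The 3 pieces of piggybacked system-level information can be sketched as a single log record; the builder below is hypothetical (Ultraverse's actual log is the Ultraverse__Log SQL table shown in Figure 13), with illustrative field names:

```javascript
// Hypothetical sketch of the system-level record a client piggybacks onto a
// Type-2/Type-3 user request. Field names are illustrative; the server-side
// counterpart is the INSERT INTO Ultraverse__Log statement in Figure 13.
function buildRequestLogRecord(cookie, requestName, preRequestArgs) {
  return {
    ultraverseUid: cookie.Ultraverse__uid, // (1) the client's unique ID
    requestName,                           // (2) the user request's name
    preRequestArgs,                        // (3) arguments of the PRE_REQUEST call
  };
}

// Example: the "SendMoney" request carrying one interactive DOM argument.
const record = buildRequestLogRecord(
  { Ultraverse__uid: 42 },
  "SendMoney",
  [{ interactive: true, personalized: false, dom_id: "amount", val: "100" }]
);
```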
Specifically, the arguments used in the PRE_REQUEST call are of one of three types: (i) the DOM node that stores the user's interactive input (e.g., the <input id="amount"> tag stores the transfer amount); (ii) the DOM node that was customized by the web server based on the client's identity during webpage creation (e.g., the <h1 id="title"> tag stores Alice's name); (iii) all other DOM nodes in the webpage that are constant and common to all users (e.g., the <form onsubmit="SendMoney()"> tag). When Ultraverse runs a retroactive operation, its default mode assumes that the state of the DOM nodes which store a user's interactive inputs is the same as in the past. However, Ultraverse assumes that customized DOM nodes may change their values during retroactive operation, and thus it recomputes their values during the replay phase by replaying the webpage's Type-1 webpage-creation request (SERVE_REQUEST). Given the standard user request routine comprised of PRE_REQUEST → SERVE_REQUEST → POST_REQUEST, some user requests may have only SERVE_REQUEST (e.g., a web server's locally invoked internal scheduler routine) or only POST_REQUEST (e.g., a client's local data processing). For any given user request, Ultraverse only needs to log the arguments of the initial PROCEDURE of the user request, because the arguments of the subsequent PROCEDUREs can be deterministically computed based on the prior PROCEDURE's return value. Therefore, the subsequent arguments are recomputed by Ultraverse while replaying the user request.

Logging & Replaying Browser Cookies: Ultraverse also replays the evolution of the client browser's cookie state by replaying the user request's SQL logic of updating Ultraverse's specially reserved table: BrowserCookie. The developer is required to implement each webpage's customized cookie-handling logic as SQL logic of updating the BrowserCookie table in PROCEDUREs in uTemplate. See §C.1 for further details on how Ultraverse handles the browser cookie for each of Type-1, Type-2, and Type-3 user requests.

Logging & Replaying Application Code's Persistent Variables: While a client stays in the same webpage, its Type-2 or Type-3 user requests may write to some JavaScript variables in the webpage whose values persistently survive across multiple user requests (e.g., global or static variables). Ultraverse requires the developer to implement all application logic accessing such persistent application variables via Ultraverse's specially reserved table: AppVariables. During a retroactive operation, Ultraverse replays the state of the variables stored in this table in a similar way to how it replays the browser cookie with the BrowserCookie table. Note that a developer using the Ultraverse web framework's uTemplate fundamentally has no way to directly access the browser cookie or the application's persistent variables, and can access them only through the BrowserCookie and AppVariables tables as SQL logic implemented in PRE_REQUEST, SERVE_REQUEST, and POST_REQUEST.

C Extended Design Features

C.1 Handling the Browser Cookie across User Requests
Modern web frameworks allow a client and server to update the client browser's cookie in one of 3 ways: (i) set an HTTP request's COOKIE field; (ii) set an HTTP response's SET-COOKIE field; (iii) update client-side JavaScript's document.cookie object. During retroactive operation, all such cookie-handling logic should be retroactively replayed to properly replay the evolution of the client's state. To achieve this, Ultraverse enforces the developer to implement the client-side logic of updating cookies in SQL so that it can be captured and replayed by Ultraverse's query analyzer. Also, the Apper-generated JavaScript code silently updates each HTTP message's COOKIE and SET-COOKIE fields before sending it out to the client/server.

As explained in §4.2, Type-1 user request handlers create and return a webpage for a client's requested URL. When retroactively replaying a Type-1 user request, Ultraverse assumes that the requested URL and most of the HTTP header fields are the same as in the past, while the following three elements are subject to change: (a) the client's HTTP GET request's COOKIE field; (b) the HTTP response's SET-COOKIE field; (c) some of the returned webpage's HTML tags customized by SERVE_REQUEST based on the client's retroactively changed COOKIE and the server's retroactively changed database state. During retroactive operation, replaying all three elements is important, because they often determine the state of the returned webpage. In order to replay a client's HTTP request's COOKIE field, all of the same client's (column-wise & row-wise dependent) Type-2 and Type-3 requests executed before it should also be retroactively replayed, because they can affect the client's cookie state in the database. Ultraverse ensures to replay them while retroactively replaying its global log. Ultraverse achieves this by enforcing the developer to implement the update logic of browser cookies as SQL logic of updating Ultraverse's specially reserved BrowserCookie table in PRE_REQUEST, SERVE_REQUEST, and POST_REQUEST. During retroactive operation, each client's BrowserCookie table is replayed, which is essentially the replay of each client's browser cookie. Given the developer's SQL logic of updating the BrowserCookie table, the Apper-generated application code also creates the equivalent application logic that updates JavaScript's document.cookie, mirroring the developer's SQL implementation, so that each user request's HTTP COOKIE field and each HTTP response's SET-COOKIE field will contain the proper value of the browser cookie stored in document.cookie.

Each client webpage manages its own local BrowserCookie table (e.g., BrowserCookie_Alice) comprised of 1 row, whereas the server-side global database has the GlobalBrowserCookie table, which is essentially the union of the rows of all clients' BrowserCookie tables. Ultraverse uses virtual two-level page mappings (§C.7) from BrowserCookie_<username>→GlobalBrowserCookie.

C.2 Preserving Secrecy of Client's Secret Values
During retroactive operation, Ultraverse's replay phase by default uses the same interactive user inputs as in the past as arguments to a Type-2 user request's PRE_REQUEST or a Type-3 user request's POST_REQUEST. However, some user inputs, such as password strings, are privacy-sensitive. In practice, modern web servers store a client's password as a hash value instead of cleartext.

Exposing the client's raw password would break her privacy. To preserve privacy, Ultraverse's uTemplate additionally provides an optional function called PRE_REQUEST_SECRET, which is designed to read values from privacy-sensitive DOM nodes (e.g., <input id="password">) or to do privacy-sensitive computation (e.g., picking a value in an array based on the client's secret random seed). When the client's Type-2 user request is executed during regular operation, PRE_REQUEST_SECRET is first executed and its OUTPUTs (e.g., a hashed password) are pipelined to PRE_REQUEST as INPUTs, which further processes the data and then sends it over to the web server as SERVE_REQUEST's INPUTs. Meanwhile, the Apper-generated application code does not record PRE_REQUEST_SECRET's INPUTs (i.e., the raw password), and the user request execution log sent to the web server includes only PRE_REQUEST's INPUTs (i.e., the hashed password). Thus, the client's privacy is preserved. During retroactive operation, Ultraverse replays the user request by directly replaying PRE_REQUEST using the hashed password as its INPUT argument (without replaying PRE_REQUEST_SECRET).

C.3 Client-side Code's Dynamic Registration of Event Handlers
A web service's developer may want to design a webpage's JavaScript to dynamically register some event handler. For example, in Figure 7, a developer may want to register the SendMoney function to the <form> tag's onsubmit event listener only after the webpage is fully loaded. The motivation for this is to prevent the client from issuing a sensitive money transfer request before the webpage is fully ready for service (to avoid any potential irregular behavior of the webpage). To fulfill this design requirement, Ultraverse's uTemplate used for a Type-2 or Type-3 user request provides an optional section called Event Registration, which is comprised of 4-tuples: ⟨dom_id, event_type, user_request_name, add_or_remove⟩. Once Apper converts this uTemplate into application code, the code runs in such a way that after the Type-2 or Type-3 user request's POST_SERVE is executed, it scans each row of the Event Registration section (if defined) and dynamically adds or removes user_request_name from dom_id's event_type listener. For example, in an alternate version of Figure 7, suppose that its base HTML's <form onsubmit="SendMoney()"> is replaced with <form id="form_send"> and its <body> is replaced with <body onload="RegisterSendMoney()">. In this new version of the webpage, at the end of the pageload, the RegisterSendMoney() function is called, which in turn registers the SendMoney function to the <form id="form_send"> tag's onsubmit event listener, so that the client can issue her money transfer only after the pageload completes. To implement this webpage's logic in the Ultraverse web framework, Figure 7's Type-1 and Type-2 user requests are unchanged, and there will be an additional Type-3 user request defined for client=RegisterSendMoney(), whose POST_REQUEST is empty and whose Event Registration section defines the tuple ⟨"form_send", "onsubmit", "SendMoney()", "add"⟩. Given this uTemplate, the Apper-generated JavaScript will execute the Type-3 user request at the end of the pageload, which will in turn call document.getElementById("form_send").addEventListener("onsubmit", SendMoney, true) to register the Type-2 user request handler "SendMoney()" to the <form id="form_send"> tag's onsubmit listener. To remove an existing event listener, the Event Registration tuple's add_or_remove value should be set to "remove" instead, and then the Apper-generated JavaScript code's Type-3 user request will execute document.getElementById("form_send").removeEventListener("onsubmit", SendMoney, true).

C.4 Clients Tampering with Application Code
A malicious client might tamper with its webpage's JavaScript code to hack its user request logic and send invalid user request data to the web server. Note that a client can do this not only in an Ultraverse web application, but also in any other type of web application. In practice, it is the developer who is responsible for designing his server-side application code to be resistant to such client-side code tampering. However, when it comes to Ultraverse's retroactive operation, such client-side code tampering could result in an inconsistent state at the end of a retroactive operation, because Ultraverse cannot replay the client's tampered code, which is unknown. However, Ultraverse can detect the moment when a client's tampered code causes an inconsistency problem in the server's global database state, because Ultraverse's uTemplate has all the Type-2 and Type-3 user request code that is to be executed by the client's browser. Besides the web application service's regular operations, Ultraverse's offline Verifier can replay each user request recorded in the global Ultraverse logs to detect any mismatch between: (i) the user request call's associated SERVE_REQUEST's INPUT arguments submitted by the client during regular service operations; and (ii) the same user request call's associated PRE_REQUEST's OUTPUT replayed by the offline Verifier. These two values should always match for any benign user request; a mismatch indicates that the client either had sent a contradicting user request execution log, or had tampered with the client-side JavaScript (originally generated by Apper) to produce a mismatching OUTPUT contradicting the one replayed by the server's genuine PRE_REQUEST. The server can detect such contradicting user requests and take countermeasures (e.g., retroactively remove the contradicting user request). This verification is needed only once for each newly committed user request, based on which the server can run any number of retroactive operations. The Verifier needs to do replay verification for at least those user requests which were committed within the desired retroactive operation's time window. For example, most financial institutions detect a cyberattack within 16 hours of the attack time (§5.3), in which case the Verifier must have done (or must do) the replay verification for the user requests committed within the latest 16 hours.

C.5 Handling Retroactive AUTO_INCREMENT
Suppose that each money transfer transaction of an online banking service gets a transaction_id whose value is AUTO_INCREMENTed based on its SQL table schema. And suppose that later, some past user request which inserts a row into the Transaction table is retroactively removed. Then, each subsequently replayed money transfer transaction's transaction_id will be decreased by 1. This phenomenon may or may not be desired, depending on what the application service provider expects. In case the service provider wants to preserve the same transaction_ids of all past committed transactions even after the retroactive operation, Ultraverse provides the developer an option for this.

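A minimal sketch of the id-shift phenomenon and of the id-preserving option, using a simplified in-memory AUTO_INCREMENT counter (an illustrative model, not Ultraverse's actual implementation):

```javascript
// Replay `numInserts` surviving INSERTs against a fresh AUTO_INCREMENT
// counter. `reservedIds` models ids that must not be re-assigned (e.g., ids
// of retroactively removed rows that the service wants to keep reserved).
function replayIds(numInserts, reservedIds) {
  const ids = [];
  let next = 1;
  while (ids.length < numInserts) {
    if (!reservedIds.has(next)) ids.push(next); // assign the next free id
    next++;
  }
  return ids;
}

// Originally 4 inserts got ids 1..4; the insert that got id 2 is removed,
// so 3 inserts remain to be replayed.
const naive = replayIds(3, new Set());     // later ids shift down by 1
const stable = replayIds(3, new Set([2])); // past ids are preserved
```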
When this option is enabled, Ultraverse marks a tombstone on the retroactively removed AUTO_INCREMENT value of the Transaction table to ensure that this specific transaction_id value is not re-used by another query during the retroactive operation. This way, all subsequently replayed transactions get the same transaction_ids as before. Similarly, when a new query is retroactively added, Ultraverse provides the option of applying tombstones to all queries that were committed in the past, so that the retroactively added query will use the next available AUTO_INCREMENT value (e.g., the highest value in the table) as its transaction_id, not conflicting with the transaction_ids of any other transactions committed in the past.

C.6 Advanced Query Clustering

C.6.1 Support for Implicit/Alias Cluster Key Columns
This subsection requires reading §4 first.
When the query clustering technique is used in the Ultraverse web framework, there are 2 new challenges. We explain them based on Figure 3's example.
Challenge 1: A developer's web application code may use a certain table's column as if it were a foreign key column, without defining this column as an SQL-level FOREIGN KEY in the SQL table's schema. For example, if Figure 3 is implemented as a web application, it is possible that the application code's SQL table schema which creates the Accounts table (Q3) does not explicitly define Accounts.uid as a FOREIGN KEY referencing Users.uid, but the application code may use Accounts.uid and Users.uid as if they were in a foreign key relationship (e.g., whenever a new account is created, the application code copies one value from the Users.uid column and inserts it into the Accounts.uid column, and whenever some Users.uid is deleted, all of the Accounts table's rows containing the same Accounts.uid value are co-deleted, which essentially implements SQL's foreign key logic of ON DELETE CASCADE). Ultraverse defines such a foreign key, which is perceivable only from the application semantics, as an implicit foreign key. It is challenging to determine whether application code uses a particular table column as an implicit foreign key, because this requires the understanding and analysis of the application-specific semantics.
Challenge 2: The application code's generated query statement may use a (foreign/alias) cluster key not as a concrete value, but as an SQL variable. Resolving a variable into a concrete cluster key requires careful analysis, because the optimization and correctness of the row-wise query clustering technique is based on the fundamental premise that each committed query's cluster key set is constant, immutable, and independent of any retroactive operations. If this premise is violated, using the query clustering technique could break the correctness of retroactive operation. Thus, during the query clustering analysis, when Ultraverse sees a query that uses an SQL variable as a cluster key, Ultraverse has to identify not only the runtime-concretized value of the SQL variable recorded in the SQL log, but also whether the concretized value is guaranteed to be the same and immutable across any retroactive operations; if not, this query's cluster key set should be treated as ¬∅ (i.e., this column cannot be used as a cluster key).
Solution: To solve the 2 challenges, Ultraverse has to scan the application code to identify implicit foreign key relationships and to determine the immutability (i.e., idempotence) of the concretized value of the SQL-variable cluster keys used in query statements. To do this, Ultraverse runs static data flow analysis on the PROCEDUREs defined in all of the web application's uTemplates. This data flow analysis runs the following two validations:
• Validation of Implicit Foreign Cluster Key Columns: This validates whether a candidate column in the database is a valid implicit foreign key column. For this validation, Ultraverse checks the following for every data flow, in every PROCEDURE in every uTemplate, whose sink is the candidate column: (1) the source of the data flow should be the cluster key column; (2) the data flow between the source and the sink is never modified by any binary/unary logic, throughout all control flow branches (if any) that the data flow undergoes. This is to ensure that each value of the data flow's sink column always immutably (i.e., idempotently) mirrors the same value from the data flow's source cluster key column across any retroactive operation's replay scenarios. Meanwhile, there are cases where a data flow's source is a DOM node used as INPUT to PRE_REQUEST, SERVE_REQUEST, or POST_REQUEST. To handle such cases, Ultraverse allows the developer to optionally mark the DOM node(s) (in the base HTML in uTemplate) which store a cluster key. For example, if Ultraverse's choice rule for the cluster key column (Table 2) reports Users.uid as the optimal cluster key column, then the developer would mark any DOM nodes in uTemplates that store the user's ID (e.g., an <input id='user_id'> tag) as "cluster_key:Users.uid". Note that if the developer mistakenly or purposely omits the marking of cluster key DOM nodes in uTemplate, this only makes Ultraverse potentially lose the opportunity of applying the query clustering optimization, without harming Ultraverse's correctness of retroactive operation.
• Validation of Variable Cluster Keys: This validates whether a runtime-concretized value of an SQL variable (or its expression) used as a cluster key can be considered a valid cluster key. For this validation, Ultraverse checks that for each data flow whose sink is a (foreign/alias) cluster key column and whose value flowing to the sink is an SQL variable (or its expression), the data flow's source(s) should be immutably (i.e., idempotently) replayable. Specifically, Ultraverse checks whether the source(s) is comprised of only the following immutable (i.e., idempotent) elements: (1) a constant literal; (2) a DOM node that stores a constant value across any retroactive operation; (3) another (foreign/alias) cluster key column; (4) an SQL function that is guaranteed to return the same value across any retroactive operation scenario. RAND()/CURTIME() are also valid sources, because their seed values are recorded by Ultraverse during regular operations for idempotent replay (§3.4). This validation is applied to all control flow branches (if any) that the data flow undergoes. A variable cluster key comprised of the above 4 idempotent elements is guaranteed that its runtime-concretized cluster key is also idempotent across any retroactive operation's replay scenarios.
Note that when the developer uses the Ultraverse web framework, Ultraverse only needs to run the above two validations on each uTemplate's PROCEDURE definition merely once before the service deployment, because all runtime user requests are calls of those same PROCEDUREs. Thus, the data flow analysis does not incur runtime overhead.
In our experiment (§5), there are 2 benchmarks that involve variable cluster keys: TATP (Figure 16) and SEATS (Figure 22).

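The second validation above can be sketched as a membership check over the 4 idempotent source kinds; the tags and data structures below are illustrative, not Ultraverse's actual intermediate representation:

```javascript
// The 4 kinds of data-flow sources that are idempotently replayable across
// any retroactive operation (illustrative tags, not Ultraverse's IR).
const IDEMPOTENT_KINDS = new Set([
  "constant_literal",       // (1) a constant literal
  "constant_dom_node",      // (2) a DOM node constant across retroactive ops
  "cluster_key_column",     // (3) another (foreign/alias) cluster key column
  "recorded_seed_function", // (4) e.g., RAND()/CURTIME() with recorded seeds
]);

// A variable cluster key is valid only if every source, on every control-flow
// branch the data flow undergoes, is one of the 4 idempotent kinds.
function isIdempotentClusterKey(dataFlowSources) {
  return dataFlowSources.every(src => IDEMPOTENT_KINDS.has(src.kind));
}

const ok = isIdempotentClusterKey([
  { kind: "cluster_key_column", name: "Customer.c_id" },
  { kind: "constant_literal", value: 7 },
]);
const bad = isIdempotentClusterKey([
  { kind: "mutable_table_column", name: "Reservation.r_seat" },
]);
// If the check fails, the query's cluster key set is treated as ¬∅
// (i.e., the column cannot be used as a cluster key).
```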
And there is 1 benchmark that involves implicit foreign key columns: the Invoice Ninja web service (Figure 25).

Condition for a Valid Cluster Key Column
  c : a candidate cluster key column
  t_c : the table that owns the column c
  t_i → t_j : t_i depends on t_j (i.e., a data flow from t_j to t_i exists in the
              transaction history of the retroactive operation's time window)
  Condition. ∀i, j : ((W(Q_i) ≠ ∅) ∧ (t_j ∈ (R(Q_i) ∪ W(Q_i))) ∧ (t_j → t_c)) ⟹ (K_c(Q_i) ≠ ¬∅)
Table 14: Generalized condition for valid cluster key columns.

C.6.2 Support for Multiple Cluster Key Columns
Ultraverse can further reduce the granularity of query clustering by using multiple cluster key columns simultaneously. Given a database and its transaction history, suppose that the database's tables can be classified into two groups such that each group's tables have data flows only among themselves and there is no data flow between the two table groups. Then there is no data interaction between the two table groups, and thus each table group can independently use its own cluster key column to cluster the table rows in its group.
However, there are many cases in practice where a database's tables cannot be classified into 2 disjoint groups, because the data flows between some tables prevent the formation of disjoint table groups. Yet, even in such cases, there is a way to use multiple cluster key columns simultaneously for finer-grained query clustering. We explain this by example. Figure 22 is the cluster key propagation graph for the SEATS benchmark in BenchBase [22]. In this benchmark, customers create/edit/cancel reservations for their flights. While running this benchmark, its transactions update only 4 tables: Flight, Customer, Reservation, and Frequent_Flyer. In all of SEATS's transactions, there exist only 3 types of data flows among tables: Flight→Reservation, Customer→Reservation, and Customer→Frequent_Flyer. Further, in all of SEATS's transactions that update the database, each query's SELECT, WHERE or VALUE clause specifies a concrete value of Flight.f_id, Customer.c_id, or both. Given SEATS's above transaction characteristics, Ultraverse can use both Flight.f_id and Customer.c_id as cluster key columns simultaneously.
In the query clustering scheme that supports multiple cluster key columns, each cluster key (besides its value) additionally gets type information, where the type is its origin cluster key column. Thus, each cluster key is a 2-dimensional element comprised of ⟨cluster key column, value⟩. In the example of SEATS, the two types for its two cluster key columns are Flight.f_id and Customer.c_id. Each query's K sets contain 2-dimensional cluster keys as elements. When applying the query clustering rule (Table 2), the K sets of two queries are considered to have an intersection only if they have at least 1 common element whose type and value both match.
Table 14 describes Ultraverse's generalized condition for a valid cluster key column, which is general enough to be applied both to the case of using a single cluster key column (§3.5) and to that of multiple cluster key columns. This condition states that for every database-updating query Q_i (or transaction) in the transaction history belonging to the retroactive operation's time window, and for every table t_j that Q_i operates on (i.e., t_j belongs to Q_i's R/W sets), if there exists a data flow from the table that owns the cluster key column c (i.e., t_c) to table t_j, then Q_i's SELECT, WHERE or VALUE clause should specify the (foreign/alias) cluster key associated with c. Otherwise, the rows of t_j that Q_i operates on cannot be clustered based on the values of the c column, and thus c cannot be used as a cluster key column.
Given Table 14's validation of cluster key columns, the SEATS benchmark has 2 satisfying cluster key columns: Flight.f_id and Customer.c_id (Figure 22); the Epinions benchmark also has 2 satisfying cluster key columns: Useracct.u_id and Item.i_id (Figure 18). Our evaluation (§5.1) shows the performance improvement of retroactive operation from simultaneously using the 2 cluster key columns in each benchmark.

C.7 Virtual Two-level Table Mappings
In many web applications, there exists a table equivalent to the Sessions table, where each row stores each user's session information, such as a login credential or last active time. When a web server processes a logged-in user's request, the server first accesses the Sessions table to verify the user's login credential and to update the user's last active time. This service routine is problematic
apply the following row-wise query clustering strategy: for column-wise dependency analysis, because all users’ requests
• The Flight table can be row-wise clustered by using Flight.f_- will end up with mutual column-wise write-write dependency on
id as the cluster key column. the Sessions.last_active_time column, and thus the column-
• The Customer and Frequent_Flyer tables can be row-wise clus- wise dependency analysis alone cannot reduce the number of replay
tered by using Customer.c_id as the cluster key column. queries for retroactive operation. This problem can be solved by co-
• The Reservation table can be row-wise clustered by using both using the row-wise dependency analysis (§C.6) to split the Sessions
Flight.f_id and Customer.c_id as the cluster key columns. table’s rows by using Users.user_id as the cluster key column. Nev-
Note that the Reservation table receives data flows from both ertheless, Ultraverse provides an alternative solution purely based
the Customer and Flight tables, which is why clustering the Reser- on the column-wise dependency analysis.
vation table’s rows requires using both Flight.f_id and Cus- Ultraverse’s uTemplate supports virtual two-level table mappings
tomer.c_id as cluster keys. On the other hand, the Flight table in SQL query statements, which uses the new syntax:
can cluster its rows only by using Flight.f_id, because it does tableName<columnName>. For example, the developer’s query state-
not get data flows from any other tables. Similarly, the Customer ment in uTemplate can use the syntax Sessions<‘alice’> to access
and Frequent_Flyer tables can cluster their rows only by using Alice’s session data. Then, when Ultraverse’s query analyzer ana-
Customer.c_id, because they don’t get data flows from any other lyzes the column-wise dependency of user requests defined in uTem-
tables. As a result, Ultraverse can cluster queries (i.e., transactions) plates, the anlayzer considers Sessions<‘alice’> as a virtual table
of SEATS by carefully using both Flight.f_id and Customer.c_id uniquely dedicated to Alice, named Sessions_alice, comprised of
as the cluster key columns. 1 row. Therefore, if another user request accesses Sessions<‘bob’>
for example, Ultraverse considers these two queries to access differ-
ent tables (i.e., Sessions_alice and Sessions_bob respectively),
Ultraverse: Efficient Retroactive Operation for Attack Recovery in Database Systems and Web Frameworks

and thus the two user requests are column-wise independent from each other. Therefore, when retroactively replaying Alice's user request, we don't need to replay Bob's user request even if we do not use the row-wise dependency analysis (§C.6), because Bob's query is column-wise independent from Alice's. Meanwhile, in the database system, Alice's and Bob's data records are physically stored in the same Sessions table in different rows. Ultraverse's Apper rewrites Sessions<'alice'> in the uTemplate such that it reads/writes Sessions WHERE user_id='alice'. Therefore, the virtualized table is interpreted only by Ultraverse's query analyzer to improve the performance of column-wise dependency analysis. Another motivation for table virtualization is that, for any application, maintaining a small number of physical tables is important, because creating as many tables as the number of users could degrade the database system's performance in searching user records.

In our macro-benchmark evaluation of the Invoice Ninja web service (§5.2), we applied virtual two-level table mappings to the Sessions and Cookies tables (Figure 25).

C.8 Column-Specific Retroactive Update

For a retroactive operation for data analytics, a user may not need the retroactive result of certain columns, such as a bulky debugging table's verbose error message column. Carefully ignoring retroactive updates of such unneeded column(s), say 𝑢𝑐𝑎, can further expedite retroactive operation. Ultraverse supports such an option by safely ignoring the retroactive update of 𝑢𝑐𝑎 if no other columns to be rolled back & replayed depend on the state of 𝑢𝑐𝑎. The algorithm is as follows:
(1) The user chooses columns to ignore. Ultraverse stores their names in the IgnoreColumns set, and stores the names of all other columns of the database in the IncludeColumns set.
(2) Ultraverse generates the query dependency graph based on column-wise and cluster-wise analysis.
(3) For each column 𝑢𝑐𝑎 in IgnoreColumns, Ultraverse moves it to IncludeColumns if the following two conditions are true for some query 𝑄𝑏 in the query dependency graph: (i) 𝑢𝑐𝑎 appears in 𝑄𝑏's read set; (ii) 𝑄𝑏's write set contains some column in the IncludeColumns set. This step runs repeatedly until no more columns are moved from IgnoreColumns to IncludeColumns.
(4) For each query 𝑄𝑐 in the query dependency graph, Ultraverse safely removes 𝑄𝑐 from the graph if its write set includes only columns in IgnoreColumns.
(5) Ultraverse rolls back and replays the query dependency graph.

D Retroactive Attack Recovery Scenarios and Ultraverse's Optimization Analysis

In this section, we explain the attack recovery scenarios based on retroactive operations as evaluated in §5 and explain how Ultraverse's optimization techniques are applied. For each benchmark, we also show the column-wise query (transaction) dependency graph, where each node represents a transaction and its 𝑅/𝑊 sets. Note that we omit read-only transactions from the graphs because they do not affect the database's state during both regular and retroactive operations.

D.1 TATP

TATP's dataset and transactions are designed for mobile network providers to manage their subscribers.

DeleteCallForwarding: R={ Special_Faculty.sf_type, Subscriber.s_id }, W={ Call_Forwarding.* }
InsertCallForwarding: R={ Subscriber[sub_nbr, s_id], Special_Faculty.sf_type }, W={ Call_Forwarding.* } (edge label: on Call_Forwarding.*)
UpdateLocation: R={ Subscriber[s_id, sub_nbr] }, W={ Subscriber.vlr_location }
UpdateSubscriberData: R={ Subscriber.s_id, Special_Faculty.sf_type }, W={ Subscriber.bit_1, Special_Faculty.data_a }

Figure 15: TATP's transaction dependency graph.

[Figure 16: Subscriber.s_id is the cluster key column, and Subscriber.sub_nbr is an alias cluster key column (sub_nbr→s_id); a sample "Subscriber" table with columns s_id and sub_nbr illustrates the mapping.]

Figure 16: TATP's cluster key propagation graph.

D.1.1 Attack Recovery Scenario 1. An attacker initiated a tampered UpdateLocation user request to provide wrong information about the user's location (e.g., GPS or geographic area). After identifying this transaction, Ultraverse retroactively corrected the data in the query by applying the following optimization techniques.

Column-wise Query Dependency: There was no need to roll back and replay its subsequent DeleteCallForwarding, InsertCallForwarding, and UpdateSubscriberData transactions. This was because these transactions are column-wise independent from the UpdateLocation transaction. In particular, these transactions do not contain the UpdateLocation transaction's write set element Subscriber.vlr_location in their read/write sets.

Row-wise Query Clustering: Ultraverse's query clustering scheme clustered all committed transactions into the number of distinct phone subscribers. Therefore, when a transaction belonging to a particular cluster (i.e., subscriber) was retroactively corrected, all the
Ronny Ko1,4 , Chuan Xiao2,3 , Makato Onizuka2 , Yihe Huang1 , Zhiqiang Lin4

other transactions belonging to other clusters (i.e., other subscribers) didn't need to be rolled back and replayed.

D.1.2 Attack Recovery Scenario 2. An attacker initiated several InsertCallForwarding transactions which inserted tampered rows into the CALL_FORWARDING table. We used Ultraverse to retroactively remove them based on the column-wise query dependency analysis and row-wise query clustering, as well as the following optimization technique.

Hash-jump: After committing all the previously existing counterpart DeleteCallForwarding transactions of the retroactively removed InsertCallForwarding transactions, the CALL_FORWARDING table's hash value became the same as before the retroactive operation, and thus Ultraverse returned the Call_Forwarding table's same state stored before the retroactive operation.

D.2 Epinions

Epinions' dataset and transactions are designed to generate recommendation system networks based on online user reviews.

DeleteUpdateItemTitle: R={ Item.i_id }, W={ Item.title }
UpdateTrustRating: R={ Useracct.u_id }, W={ Trust.trust }
UpdateReviewRating: R={ Item.i_id, Useracct.u_id }, W={ Review.rating }
UpdateUserName: R={ Useracct.u_id }, W={ Useracct.name }

Figure 17: Epinions' transaction dependency graph (none).

[Figure 18: Useracct.u_id and Item.i_id are the cluster key columns; Useracct.u_id propagates to the explicit foreign cluster key columns Trust.u_id and Review.u_id, and Item.i_id propagates to Review.i_id. * Co-usable cluster key columns: Useracct.u_id & Item.i_id.]

Figure 18: Epinions' cluster key propagation graph.

D.2.1 Attack Recovery Scenario 1. Several UpdateReviewRating user requests turned out to be initiated by a remote attacker's botnet. We retroactively removed those UpdateReviewRating transactions by using the following optimization technique.

Column-wise Query Dependency: There was no need to roll back and replay its subsequent DeleteUpdateItemTitle, UpdateUserName, and UpdateTrustRating transactions, because they are column-wise independent from UpdateReviewRating. In particular, these transactions do not contain the UpdateReviewRating transaction's write set element Review.rating in their read/write sets.

Row-wise Query Clustering: Ultraverse used multiple cluster key columns (Useracct.u_id and Item.i_id) and only needed to roll back & replay the UpdateReviewRating transactions whose cluster key is the same as that of the retroactively removed UpdateReviewRating transaction.

D.2.2 Attack Recovery Scenario 2. An attacker-controlled transaction changed a victim user's account information by initiating a tampered UpdateUserName request and changed the UserAcct table's state. Later in time, there was the victim user's benign UpdateUserName request that changed his account information to a different state. To correct any potential side effects, we decided to retroactively remove the attacker-initiated UpdateUserName transaction. Ultraverse used the column-wise query dependency analysis and row-wise query clustering, as well as the following optimization technique.

Hash-jump: During the retroactive operation, Ultraverse detected a hash hit upon executing the victim user's benign UpdateUserName request that overwrote the attacker-tampered account information. Thus, Ultraverse returned the UserAcct table's same state stored before the retroactive operation.

D.3 ResourceStresser

ResourceStresser's dataset and transactions manage employment information (e.g., salary), and are technically designed to run stress-testing on a CPU, disk I/O, and locks of a database system.

Contention1: R={ Locktable.empid }, W={ Locktable.salary }
Contention2: R={ Locktable.empid }, W={ Locktable.salary } (edge label: on Locktable.salary)
IO1: R={ Iotable.empid }, W={ Iotable[data1:16] }
IO2: R={ Iotablesmallrow.empid }, W={ Iotablesmallrow.flag1 }

Figure 19: ResourceStresser's transaction dependency graph.

[Figure 20: CPUTable.empid is the cluster key column; it propagates to the explicit foreign cluster key columns IOTable.empid, IOTableSmallRow.empid, and LockTable.empid.]

Figure 20: ResourceStresser's cluster key propagation graph.

D.3.1 Attack Recovery Scenario 1. The attacker injected malformed IO2 transactions. We used Ultraverse to retroactively remove the identified transaction by using the following optimization technique.

Column-wise Query Dependency: There was no need to roll back and replay its subsequent independent Contention1, Contention2, and IO1 transactions, because they are column-wise independent from IO2. In particular, these transactions did not contain the IO2 transaction's write set element Iotablesmallrow.flag1 in their read/write sets.

Row-wise Query Clustering: Ultraverse used CPUTable.empid as the cluster key and only needed to roll back & replay the IO2 transactions whose cluster key is the same as that of the retroactively removed IO2 transaction.

D.3.2 Attack Recovery Scenario 2. The attacker changed the order of certain IO1 transactions which updated the Iotable table's state. Once those transactions were identified, we retroactively corrected their commit order. Ultraverse used the column-wise query dependency analysis with the following optimization technique.

Hash-jump: After committing the last IO1 transaction which was correctly ordered, the Iotable table's evolution of hash values matched the ones before the retroactive operation. Thus, Ultraverse returned the table's same state stored before the retroactive operation.

D.4 SEATS

SEATS's dataset and transactions are designed for an online flight ticket reservation system.
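The Hash-jump optimization used in the scenarios above (D.1.2, D.2.2, D.3.2) can be sketched as follows: during replay, the table's running state hash is compared against the hash recorded at the same commit index in the original history, and on a match the rest of the replay for that table is skipped, because both histories must agree from that point on. The toy table model and queries below are invented for illustration; this is not Ultraverse's actual hashing scheme.

```python
# Sketch of the Hash-jump optimization (illustrative only): a "table"
# is a dict, a query is a function mutating it, and a table state is
# hashed as a frozenset of its items. Replay stops early once the
# running hash matches the hash recorded at the same commit index in
# the original history.

def state_hash(table):
    return hash(frozenset(table.items()))

def replay_with_hash_jump(initial, queries, original_hashes):
    """original_hashes[i] = table hash right after query i in the
    original run. Returns (final_table_or_None, queries_replayed);
    None means: reuse the final state stored before the retroactive
    operation (a hash hit)."""
    table = dict(initial)
    for i, query in enumerate(queries):
        query(table)
        if state_hash(table) == original_hashes[i]:
            return None, i + 1   # hash hit: skip the remaining replay
    return table, len(queries)

# Original history: three updates to a customer's balance.
q1 = lambda t: t.__setitem__("balance", 10)
q2 = lambda t: t.__setitem__("balance", 20)
q3 = lambda t: t.__setitem__("balance", 30)
original = {}
hashes = []
for q in (q1, q2, q3):
    q(original)
    hashes.append(state_hash(original))

# Retroactive change: q1 is replaced, but q2 overwrites the balance
# anyway, so the state re-converges after q2 and q3 is never replayed.
changed_q1 = lambda t: t.__setitem__("balance", 5)
final, replayed = replay_with_hash_jump({}, [changed_q1, q2, q3], hashes)
```

The early return mirrors the scenarios above: once a benign later write overwrites the tampered data, the stored post-operation state can be returned directly instead of replaying the remaining transactions.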
UpdateCustomer: R={ Customer.c_id, Airport.ap_id, Country[co_id], Frequent_Flyer.*, Airline.al_id }, W={ Customer.c_iattr[00,01], Frequent_Flyer.ff_iattr[00:01] } (edge label: on Customer.c_iattr00)
UpdateReservation: R={ Reservation[r_id, r_seat], Flight.f_id, Customer.c_id }, W={ Reservation.r_seat } (edge label: on Reservation.r_seat)
NewReservation: R={ Flight[f_id], Customer[c_id,c_balance,c_sattr00], Airport.ap_id, Reservation[r_id,r_seat], Airline.* }, W={ Reservation.*, Frequent_Flyer.ff_iattr[10:14], Customer.c_iattr[10:15], Flight.seats_left } (edge label: on Reservation.*)
DeleteReservation: R={ Customer[c_id,c_id_str,c_sattr[00,02,04],c_iattr[00,02,04,06]], Airline.al_id, Flight[f_id,f_seats_left], Reservation[r_id,r_seat,r_price,r_iattr00] }, W={ Reservation.*, Flight.f_seats_left, Customer[c_balance,c_iattr[00,10,11]], Frequent_Flyer.ff_iattr10 }

Figure 21: SEATS's transaction dependency graph.

[Figure 22: Flight.f_id and Customer.c_id are the cluster key columns. Flight.f_id propagates to the explicit foreign cluster key column Reservation.r_f_id; Customer.c_id propagates to the explicit foreign cluster key columns Frequent_Flyer.ff_c_id and Reservation.r_c_id, and to the alias cluster key column Customer.c_id_str (same table). * Co-usable cluster key columns: Flight.f_id & Customer.c_id.]

Figure 22: SEATS's cluster key propagation graph.

D.4.1 Attack Recovery Scenario 1. An attacker broke into the flight ticket reservation system and tampered with passengers' reservation information by issuing malicious UpdateReservation user requests. After the problematic transactions were identified, Ultraverse retroactively updated the database by using the following optimization technique.

Row-wise Query Clustering: Ultraverse used multiple cluster key columns (Flight.f_id and Customer.c_id), so it only needed to roll back & replay the transactions which are in the same cluster as the retroactively updated UpdateReservation transaction.

D.4.2 Attack Recovery Scenario 2. The attacker intercepted and swapped the commit order of a client's two UpdateCustomer transactions, both of which updated only the client's Customer.iattr01 metadata. Ultraverse retroactively corrected their commit order based on the column-wise query dependency analysis and row-wise query clustering, as well as the following optimization technique.

Hash-jump: There is no transaction that reads (i.e., depends on) Customer.iattr01, so the other tables' states were unaffected by this attack. After correcting the order of the two UpdateCustomer transactions, there was another benign transaction which overwrote the client's value of the Customer.iattr01 metadata, after which point the Customer table hash for this client's cluster matched the past version. Thus, Ultraverse returned the table's same state stored before the retroactive operation.

D.5 TPC-C

TPC-C's dataset and transactions are designed to manage product orders and shipping for online users in an e-commerce service.

Delivery: R={ District.d_id, Warehouse.w_id, Customer.c_id, OpenOrder[o_id,o_carrier_id,ol_amount] }, W={ OpenOrder.o_carrier_id, OrderLine[ol_delivery_d], NewOrder.*, Customer[c_balance,c_delivery_cnt] } (edge labels: on NewOrder.*, OpenOrder.o_carrier_id; on Customer.c_balance)
Payment: R={ Warehouse[w_id,w_ytd,w_[info]], District[d_id,d_ytd,d_[info]], Customer[c_id,c_[info],c_credit,c_credit_lim,c_discount,c_balance,c_ytd_payment,c_payment_cnt] }, W={ District[d_ytd], History.*, Warehouse[w_ytd], Customer[c_balance,c_ytd_payment,c_payment_cnt,c_data] }
NewOrder: R={ Warehouse[w_id,w_tax], District[d_id,d_tax,d_next_o_id], Customer[c_id,c_discount,c_last,c_credit], Item[i_id,i_price,i_name,i_data], Stock[s_quantity,s_data,s_dist_[00:10]] }, W={ NewOrder.*, OpenOrder.*, District[d_next_o_id], Stock[info] }

Figure 23: TPC-C's transaction dependency graph.

[Figure 24: Warehouse.W_ID is the cluster key column; it propagates to the explicit foreign cluster key columns New_Order.NO_W_ID, Stock.S_W_ID, Order_Line.OL_W_ID, District.D_W_ID, OOrder.O_W_ID, Customer.C_W_ID, and History.H_C_W_ID.]

Figure 24: TPC-C's cluster key propagation graph.

D.5.1 Attack Recovery Scenario 1. We configured the number of warehouses to 10, which corresponded to the number of clusters in Ultraverse's query clustering scheme. An attacker injected a fabricated Payment transaction without an actual payment. After this transaction was identified, Ultraverse retroactively removed the transaction by applying the following optimization technique.

Row-wise Query Clustering: Ultraverse used Warehouse.w_id as the cluster key column, so it only needed to roll back and replay those transactions which had the same cluster key as the retroactively removed Payment transaction.

D.5.2 Attack Recovery Scenario 2. After the attacker fabricated a Payment transaction, the vendor for this product failed to ship it out due to an issue with logistics, so the delivery was cancelled by the vendor and the victimized client's balance received the refund in the Customer table. However, to ensure correctness, we used Ultraverse to retroactively remove the malicious Payment transaction. Ultraverse used row-wise query clustering with the following optimization technique.

Hash-jump: After retroactively removing the malicious Payment transaction, and after the commit time when its associated Delivery
transaction refunded the cost to the customer, the hash value and the subsequent transactions for the Customer table matched the ones before the retroactive operation. Thus, Ultraverse returned the table's same state stored before the retroactive operation.

D.6 Invoice Ninja

Invoice Ninja is an online-banking web application service that manages user accounts and transfer of funds between users. We evaluated 5 scenarios where an attacker controlled the following 5 user requests: "Create a Bill", "Modify an Item in a Bill", "Login", "MakePayment", and "Logout". We used virtual two-level table mappings (§C.7) for the Sessions and Cookies tables.

[Figure 25: Users.u_id is the cluster key column, with propagation edges (alias, explicit/implicit foreign, and alias-key propagation) to Sessions.u_id, Sessions.session_id, Cookies.session_id, Items.creator_id, Items.item_id, BillItems.item_id, Bills.creator_id, Bills.payer_id, Bills.bill_id, BillItems.bill_id, Payments.creator_id, Payments.recipient_id, and Statistics.user_id. ※ The "Cookies" and "BillItems" tables have virtual 2-level table mappings.]

Figure 25: Invoice Ninja's cluster key propagation graph.

D.6.1 Attack Recovery Scenario 1. A user's bill was maliciously created by an attacker-initiated "Create a Bill" user request. After this user request was identified, Ultraverse retroactively removed it by applying the following optimization technique.

Column-wise Query Dependency: Ultraverse rolled back and replayed only the following transactions: "Add a New Bill", "Delete a Bill", "Add an Item to a Bill", "Modify an Item in a Bill", and "Delete an Item in a Bill". Other transactions such as "Sign Up", "Login", "Log Out", "Reset Password", and "Edit My Profile" were not rolled back and replayed because they did not depend on the changed values in the Bills, Items, or BillItems tables.

Row-wise Query Dependency: Ultraverse used Users.u_id as the cluster key column, so it only needed to roll back and replay those transactions that are in the same cluster as the retroactively removed "Create a Bill" transaction.

D.6.2 Attack Recovery Scenario 2. An attacker tampered with the price of an item by issuing a malicious "Modify an Item in a Bill" user request. We retroactively modified the price in the injected "Modify an Item in a Bill" transaction to a correct value. Ultraverse retroactively updated the database by applying the following optimization technique.

Column-wise Query Dependency: Ultraverse replayed only "Create an Item", "Add an Item to a Bill", "Modify an Item in a Bill", and "Delete an Item in a Bill". Other user requests such as "Sign Up", "Login", "Log Out", "Reset Password", "Edit My Profile", "Create a Bill", and "Delete a Bill" were not replayed because they didn't depend on the Items or BillItems tables.

Row-wise Query Dependency: Ultraverse only needed to roll back and replay those transactions that are in the same cluster as the retroactively updated "Modify an Item in a Bill" transaction.

D.6.3 Attack Recovery Scenario 3. An attacker stole a user's credential and logged into the user's account by issuing a "Login" user request. After this user request was identified, Ultraverse retroactively removed it by applying the following optimization technique.

Row-wise Query Clustering: The Users.user_id column was used as the cluster key. When the "Login" user request was removed, Ultraverse needed to roll back and replay only that user's subsequent requests, while the transactions of all other users who had no (direct/indirect) interactions with this user were skipped, as their queries were in different clusters.

Row-wise Query Dependency: Ultraverse only needed to roll back and replay those transactions that are in the same cluster as the retroactively removed "Login" transaction.

D.6.4 Attack Recovery Scenario 4. Ultraverse retroactively removed an attacker-initiated "MakePayment" user request by applying the following optimization technique.

Row-wise Query Clustering: Similar to scenario 3's optimization, Ultraverse only needed to roll back and replay the subsequent requests of that user and of the other users who have had a money flow from this user.

Row-wise Query Dependency: Ultraverse only needed to roll back and replay those transactions that are in the same cluster as the retroactively removed "MakePayment" transaction.

D.6.5 Attack Recovery Scenario 5. A user finished using the web service on a public PC and left the seat without logging out. An attacker took the seat and used the service for another hour by using the user's account. After identifying this event via a surveillance camera in the public area, we used Ultraverse to retroactively move the victimized user's "Logout" request to 1 hour earlier, when he left the seat. Ultraverse's retroactive operation used the column-wise query dependency analysis and row-wise query clustering, as well as the following optimization technique.

Hash-jump: The attacker's activities only affected the state of the "Sessions" table that exclusively belonged to the user. This table's state was refreshed after the user logged in again next time. As of this point, the user's "Sessions" table's state was the same as before the retroactive operation. Therefore, Ultraverse returned the table's same state stored before the retroactive operation.
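The row-wise query clustering used throughout the scenarios in this section can be sketched as a simple replay-set filter: once transactions are tagged with a cluster key (e.g., Warehouse.w_id in TPC-C), only the transactions at or after the retroactive target that share its cluster key are rolled back and replayed. The transaction log below is hypothetical.

```python
# Sketch of replay-set selection under row-wise query clustering:
# only transactions in the retroactive target's cluster (same cluster
# key value) and committed at or after it are rolled back & replayed.
# The log is hypothetical; real cluster keys come from Ultraverse's
# cluster key propagation graph.

def select_replay_set(history, target_idx, key_column):
    """history: list of {name, <key_column>: value} dicts in commit
    order. Returns indices of transactions to roll back and replay."""
    target_key = history[target_idx][key_column]
    return [i for i, txn in enumerate(history)
            if i >= target_idx and txn[key_column] == target_key]

history = [
    {"name": "Payment",  "Warehouse.w_id": 3},  # 0: fabricated (remove)
    {"name": "NewOrder", "Warehouse.w_id": 7},  # 1: other cluster, skipped
    {"name": "Delivery", "Warehouse.w_id": 3},  # 2: same cluster, replayed
    {"name": "Payment",  "Warehouse.w_id": 7},  # 3: other cluster, skipped
]
to_replay = select_replay_set(history, target_idx=0,
                              key_column="Warehouse.w_id")
# Only the warehouse-3 transactions (indices 0 and 2) are touched.
```

With multiple cluster key columns (as in SEATS or Epinions), the same filter would compare sets of ⟨column, value⟩ pairs for a non-empty intersection instead of a single key value.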
E Formal Analysis of Query Dependency and Query Clustering

The formal definition of a retroactive operation is as follows:

Definition 1 (Retroactive Operation). Let D be a database and Q a set of all committed queries 𝑄1, 𝑄2, · · · 𝑄|Q|, where the subscript represents the query's index (i.e., commit order). Let Q⟨𝑖,𝑗⟩ be the subset of Q that contains the i-th to j-th queries in Q, that is, {𝑄𝑖, 𝑄𝑖+1, · · ·, 𝑄𝑗} (where 𝑖 ≤ 𝑗). Let 𝜓 be the last query's commit order in Q (i.e., |Q|). Let M : D, Q → D′ be a function that accepts an input database D and a set of queries Q, executes the queries in Q in ascending order of query indices, and outputs a resulting database D′. Let M−1 : D, Q → D′ be a function that accepts an input database D and a set of queries Q, rolls back the queries in Q in descending order of query indices, and outputs a resulting database D′. Given a database D and a set of committed queries Q, a retroactive operation with a target query 𝑄𝜏′ is defined to be a transformation of D to a new state that matches the one generated by the following procedure:
(1) Roll back D's state to commit index 𝜏 − 1 by computing D := M−1(D, Q⟨𝜏,𝜓⟩).
(2) Depending on the database user's command, do one of the following retroactive operations:
• In case of retroactively adding 𝑄𝜏′, newly execute 𝑄𝜏′ by computing D := M(D, 𝑄𝜏′), and then replay all subsequent queries by computing D := M(D, Q⟨𝜏,𝜓⟩).
• In case of retroactively removing 𝑄𝜏, skip replaying 𝑄𝜏, and replay all subsequent queries by computing D := M(D, Q⟨𝜏+1,𝜓⟩).
• In case of retroactively changing 𝑄𝜏 to 𝑄𝜏′, newly execute 𝑄𝜏′ by computing D := M(D, 𝑄𝜏′), and replay all subsequent queries by computing D := M(D, Q⟨𝜏+1,𝜓⟩).

The goal of Ultraverse's query analysis is to reduce the number of queries to be rolled back and replayed for a retroactive operation, while preserving its correctness.

Setup 1 (Ultraverse's Query Analysis).
Input: D, Q, ⟨𝑄𝜏′, add|remove|change⟩.
Output: A subset of Q to be rolled back and replayed.

Setup 1 describes the input and output of Ultraverse's query analysis. The input is D (a database), Q (a set of all committed queries), 𝑄𝜏′ (a retroactive target query to be added or changed to), and the type of retroactive operation on 𝑄𝜏′ (i.e., add, remove, or change it). Note that in case of retroactive removal of the query at the commit index 𝜏, the retroactive target query 𝑄𝜏′ in the Input is 𝑄𝜏. The output is a subset of Q. Rolling back and replaying the output queries results in a correct retroactive operation.

Ultraverse's query analysis is comprised of two components: query dependency analysis and query clustering analysis. We will first describe query dependency analysis and then extend it to query clustering analysis. To show the correctness of performing retroactive operations using query analysis, we first assume that the retroactive operation is either adding or removing a query, and address the case of retroactively changing a query at the end of this section.

E.1 Column-wise Query Dependency Analysis

Terminology 1 (Query Dependency Analysis).
D : A given database
Q : A set of all committed queries in D
𝑸𝒏 : a query with index 𝑛 in Q
𝝉 : a retroactive target query's index in Q
𝑸𝝉′ : The retroactive target query to add or change
𝑻𝒙 : Query "CREATE TRIGGER x"
𝑻𝒙−1 : Query "DROP TRIGGER x" (𝑻𝒙's counterpart)
𝑹(𝑸𝒏) : 𝑄𝑛's read set
𝑾(𝑸𝒏) : 𝑄𝑛's write set
𝒄 : a table's column
𝑸𝒏 → 𝑸𝒎 : 𝑄𝑛 depends on 𝑄𝑚
𝑸𝒎 ↷ 𝑸𝒏 : 𝑄𝑚 is an influencer of 𝑄𝑛

Definition 2 (Read/Write Set). A query 𝑄𝑖's read set is the set of column(s) that 𝑄𝑖 operates on with read access. 𝑄𝑖's write set is the set of column(s) that 𝑄𝑖 operates on with write access.

For each type of SQL statement, its read & write sets are determined according to the policies described in Table A.

Loosely speaking, given D and Q, we define that 𝑄𝑖 depends on 𝑄𝑗 if some retroactive operation on 𝑄𝑗 could change the result of 𝑄𝑖 (i.e., 𝑄𝑖's return value or the state of the resulting table that 𝑄𝑖 writes to). In this section, when we say query dependency, it always implies column-wise query dependency (discussed in §3.3). We present the formal definition of query dependency in Definition 3.

Definition 3 (Query Dependency). Given a database D and a set of all committed queries Q, one query depends on another if they satisfy Proposition 1, 2, 3, or 4.

Proposition 1. ∃𝑐 ((𝑐 ∈ 𝑊(𝑄𝑚)) ∧ (𝑐 ∈ (𝑅(𝑄𝑛) ∪ 𝑊(𝑄𝑛)))) ∧ (𝑚 < 𝑛) =⇒ 𝑄𝑛 → 𝑄𝑚

Proposition 1 states that 𝑄𝑛 depends on 𝑄𝑚 if 𝑄𝑛 reads or writes a table/view's column after 𝑄𝑚 writes to it. Proposition 1 captures the cases where two queries operate on the same column and retroactively adding or removing the prior query could change the column's state that the later query accesses.

Proposition 2. (𝑄𝑛 → 𝑄𝑚) ∧ (𝑄𝑚 → 𝑄𝑙) =⇒ 𝑄𝑛 → 𝑄𝑙

Proposition 2 states that if 𝑄𝑛 depends on 𝑄𝑚 and 𝑄𝑚 depends on 𝑄𝑙, then 𝑄𝑛 also depends on 𝑄𝑙 (transitivity). Proposition 2 captures the cases where two queries, 𝑄𝑛 and 𝑄𝑙, do not operate on the same column, but there exists some intermediate query 𝑄𝑚 which operates on some same column as each of 𝑄𝑛 and 𝑄𝑙. In such cases, 𝑄𝑚 acts as a data flow bridge between 𝑄𝑙's column and 𝑄𝑛's column, and therefore, a retroactive operation on 𝑄𝑙 could change the column's state that 𝑄𝑛 accesses. Therefore, we regard that 𝑄𝑛 depends on 𝑄𝑙 transitively.

Proposition 3. (∃𝑐 ((𝑐 ∈ 𝑊(𝑄𝑛)) ∧ (𝑐 ∈ (𝑅(𝑄𝑘) ∪ 𝑊(𝑄𝑘))))) ∧ (((𝑄𝑛 = 𝑇𝑥) ∧ ((𝑛 > 𝑘) ∨ ((𝑄𝑚 = 𝑇𝑥−1) ∧ (𝑚 > 𝑘)))) ∨ ((𝑄𝑛 = 𝑇𝑥−1) ∧ (𝑛 > 𝑘))) =⇒ 𝑄𝑘 → 𝑄𝑛

Proposition 4. (∃𝑐 ((𝑐 ∈ 𝑊(𝑄𝑘)) ∧ (𝑐 ∈ (𝑅(𝑄𝑛) ∪ 𝑊(𝑄𝑛))))) ∧ (((𝑄𝑛 = 𝑇𝑥) ∧ ((𝑛 > 𝑘) ∨ ((𝑄𝑚 = 𝑇𝑥−1) ∧ (𝑚 > 𝑘)))) ∨ ((𝑄𝑛 = 𝑇𝑥−1) ∧ (𝑛 > 𝑘))) =⇒ 𝑄𝑛 → 𝑄𝑘

We additionally present Propositions 3 and 4 to handle triggers. At a high level, these two propositions enforce that if a trigger query either depends on or is depended on by (i.e., has an incoming or outgoing dependency arrow to) at least one query that depends on the
25
Ronny Ko1,4 , Chuan Xiao2,3 , Makato Onizuka2 , Yihe Huang1 , Zhiqiang Lin4
retroactive target query 𝑄𝜏′ , then during the replay of the retroactive operation, this trigger is reactivated by replaying its equivalent CREATE TRIGGER query. These two propositions conservatively assume that a trigger’s conditionally executed body is always executed until the trigger is dropped by its equivalent DROP TRIGGER query. Proposition 3 states that if a trigger 𝑇𝑥 was alive at the moment 𝑄𝑘 was committed and 𝑄𝑘 reads or writes some column that 𝑇𝑥 writes, then 𝑄𝑘 depends on 𝑇𝑥 and 𝑇𝑥−1 . Proposition 4 covers the reverse of Proposition 3: 𝑇𝑥 and 𝑇𝑥−1 depend on 𝑄𝑘 if 𝑇𝑥 was alive when 𝑄𝑘 was committed and 𝑇𝑥 reads or writes a column that 𝑄𝑘 writes.

Definition 4 (I). I is an intermediate set of all queries that are selected to be rolled back and replayed for a retroactive target query 𝑄𝜏′ .

We define the I set for three purposes. First, we add the queries dependent on the retroactive target query 𝑄𝜏′ to I, as candidate queries to be rolled back and replayed. Second, we further add more queries that need to be rolled back and replayed in order to replay consulted table(s) (discussed in §3.4). Third, we remove those queries that do not belong to the same cluster as the retroactive target query 𝑄𝜏′ (discussed in §3.5).

Proposition 5. (𝑄𝑖 → 𝑄𝜏′ ) ∧ (𝑊 (𝑄𝑖 ) ≠ ∅) =⇒ 𝑄𝑖 ∈ I.

Proposition 5 states that if 𝑄𝑖 depends on the retroactive target query 𝑄𝜏′ and 𝑄𝑖 ’s write set is not empty, then 𝑄𝑖 is added to I (we do not roll back and replay 𝑄𝑖 if its write set is empty, because a read-only query does not change the database’s state). Proposition 5 serves our first purpose of using I.

To find the queries to be rolled back and replayed in order to replay consulted table(s), we introduce a new term, influencer.

Definition 5 (Influencer). (𝑄𝑖 → 𝑄𝜏′ ) ∧ (∃𝑐, 𝑓 ((𝑐 ∈ (𝑅(𝑄𝑖 ) ∪ 𝑊 (𝑄𝑖 ))) ∧ (𝑓 = argmax 𝑗 ((𝑗 < 𝑖) ∧ (𝑐 ∈ 𝑊 (𝑄 𝑗 )))))) =⇒ 𝑄 𝑓 ↷ 𝑄𝑖

Definition 5 states that if query 𝑄𝑖 depends on the retroactive target query 𝑄𝜏′ and 𝑄𝑖 immediately (back-to-back) depends on 𝑄 𝑓 on some column 𝑐 (i.e., 𝑄 𝑓 is the last query that writes to 𝑐 before 𝑄𝑖 accesses it), then 𝑄 𝑓 is defined to be an influencer of 𝑄𝑖 .

Proposition 6. ∃𝑗, 𝑓 ((𝑄 𝑗 ∈ I) ∧ (𝑄 𝑓 ↷ 𝑄 𝑗 ) ∧ (𝑄𝑖 → 𝑄 𝑓 )) =⇒ 𝑄𝑖 ∈ I

Proposition 6 states that if 𝑄𝑖 depends on some influencer of some query in I, then 𝑄𝑖 is added to I. Note that Propositions 1, 2, 3, 4, 5, and 6 are applied repeatedly to find and add more queries that are required to fully replay all consulted table(s) for the retroactive target query 𝑄𝜏′ . Once Propositions 1, 2, 3, 4, 5, and 6 have been repeated until no more queries are added to I, query dependency analysis is complete.

Theorem 1. For a retroactive operation for adding or removing the target query 𝑄𝜏′ , it is sufficient to do the following: (i) roll back the queries that belong to I; (ii) either execute 𝑄𝜏′ (in case of retroactively adding 𝑄𝜏′ ) or roll back 𝑄𝜏′ (in case of retroactively removing 𝑄𝜏′ ); (iii) replay all queries in I.

Proof. Let the database state after the retroactive operation of adding the target query 𝑄𝜏′ be D ′ = M (M (M −1 (D, Q ⟨𝜏,𝜓 ⟩ ), 𝑄𝜏′ ), Q ⟨𝜏,𝜓 ⟩ ). Let 𝑏𝑖 be the 𝑖-th oldest query index that satisfies the following: (𝑏𝑖 > 𝜏) ∧ (𝑄𝑏𝑖 ∉ I). For example, 𝑄𝑏 1 is the oldest query in Q that does not belong to I, and 𝑄𝑏 2 is the second-oldest query in Q that does not belong to I. Note that every query that does not belong to I also does not depend on any query in I (otherwise, it should have been put into I by Propositions 1, 2, 3, 4, 5, and 6). We prove Theorem 1 by finite induction.

Case 𝑸 𝒃1 : Let D𝑏 1 = M (M (M −1 (D, Q ⟨𝜏,𝜓 ⟩ − {𝑄𝑏 1 }), 𝑄𝜏′ ), Q ⟨𝜏,𝜓 ⟩ − {𝑄𝑏 1 }), which is equivalent to rolling back all queries in Q ⟨𝜏,𝜓 ⟩ except for 𝑄𝑏 1 , executing 𝑄𝜏′ , and replaying all queries in Q ⟨𝜏,𝜓 ⟩ except for 𝑄𝑏 1 .

Lemma 1. D ′ = D𝑏 1 , which is equivalent to:
M (M (M −1 (D, Q ⟨𝜏,𝜓 ⟩ ), 𝑄𝜏′ ), Q ⟨𝜏,𝜓 ⟩ )
= M (M (M −1 (D, Q ⟨𝜏,𝜓 ⟩ − {𝑄𝑏 1 }), 𝑄𝜏′ ), Q ⟨𝜏,𝜓 ⟩ − {𝑄𝑏 1 }).

The definition of 𝑄𝑏 1 implies that in Q ⟨𝜏,𝜓 ⟩ , any query committed before 𝑄𝑏 1 belongs to I. Propositions 2 and 5 guarantee that 𝑄𝑏 1 does not depend on any query in I (because otherwise, 𝑄𝑏 1 would have been put into I). This means that the results of 𝑄𝑏 1 (i.e., the resulting state of its write-set columns) will be the same as before the retroactive operation. Further, in Q, no query committed before 𝑄𝑏 1 depends on the results of 𝑄𝑏 1 (because otherwise, there would have been an influencer for such a query, and as both the influencer and 𝑄𝑏 1 write to the same column(s), 𝑄𝑏 1 would have been put into I according to Proposition 6). Provided that the results of 𝑄𝑏 1 are the same as before the retroactive operation and its results are to be used only by those queries committed after 𝑄𝑏 1 , 𝑄𝑏 1 need not be rolled back & replayed while generating D ′ . Thus, Lemma 1 is true.

Case 𝑸 𝒃2 : Let Q𝑏 1 be the set of rolled back & replayed queries for generating D𝑏 1 . Let D𝑏 2 = M (M (M −1 (D, Q ⟨𝜏,𝜓 ⟩ − {𝑄𝑏 1 , 𝑄𝑏 2 }), 𝑄𝜏′ ), Q ⟨𝜏,𝜓 ⟩ − {𝑄𝑏 1 , 𝑄𝑏 2 }), which is equivalent to rolling back all queries in Q ⟨𝜏,𝜓 ⟩ except for {𝑄𝑏 1 , 𝑄𝑏 2 }, executing 𝑄𝜏′ , and replaying all queries in Q ⟨𝜏,𝜓 ⟩ except for {𝑄𝑏 1 , 𝑄𝑏 2 }.

Lemma 2. D𝑏 1 = D𝑏 2 , which is equivalent to:
M (M (M −1 (D, Q ⟨𝜏,𝜓 ⟩ − {𝑄𝑏 1 }), 𝑄𝜏′ ), Q ⟨𝜏,𝜓 ⟩ − {𝑄𝑏 1 })
= M (M (M −1 (D, Q ⟨𝜏,𝜓 ⟩ − {𝑄𝑏 1 , 𝑄𝑏 2 }), 𝑄𝜏′ ), Q ⟨𝜏,𝜓 ⟩ − {𝑄𝑏 1 , 𝑄𝑏 2 }).

The definition of 𝑄𝑏 2 implies that in Q ⟨𝜏,𝜓 ⟩ , any query committed before 𝑄𝑏 2 belongs to I ∪ {𝑄𝑏 1 }. But in case of D𝑏 1 , Q𝑏 1 does not contain 𝑄𝑏 1 , and thus in Q𝑏 1 , any query committed before 𝑄𝑏 2 belongs to I. Then, based on the same reasoning used for proving Lemma 1, the results of 𝑄𝑏 2 are the same as before the retroactive operation and its results are to be used only by those queries committed after 𝑄𝑏 2 . Thus, 𝑄𝑏 2 need not be rolled back & replayed while generating D𝑏 1 . Thus, Lemma 2 is true.

· · · Case 𝑸 𝒃𝝍−|I| : Let Q𝑏𝜓 −|I|−1 be the set of rolled back & replayed queries for generating D𝑏𝜓 −|I|−1 . Let D𝑏𝜓 −|I| = M (M (M −1 (D, Q ⟨𝜏,𝜓 ⟩ − {𝑄𝑏 1 , 𝑄𝑏 2 , · · · , 𝑄𝑏𝜓 −|I| }), 𝑄𝜏′ ), Q ⟨𝜏,𝜓 ⟩ − {𝑄𝑏 1 , 𝑄𝑏 2 , · · · , 𝑄𝑏𝜓 −|I| }), which is equivalent to rolling back all queries in Q ⟨𝜏,𝜓 ⟩ except for {𝑄𝑏 1 , 𝑄𝑏 2 , · · · , 𝑄𝑏𝜓 −|I| }, executing 𝑄𝜏′ , and replaying all queries in Q ⟨𝜏,𝜓 ⟩ except for {𝑄𝑏 1 , 𝑄𝑏 2 , · · · , 𝑄𝑏𝜓 −|I| }.

Lemma 3. D𝑏𝜓 −|I|−1 = D𝑏𝜓 −|I| , which is equivalent to:
Ultraverse: Efficient Retroactive Operation for Attack Recovery in Database Systems and Web Frameworks
M (M (M −1 (D, Q ⟨𝜏,𝜓 ⟩ − {𝑄𝑏 1 , 𝑄𝑏 2 , · · · , 𝑄𝑏𝜓 −|I|−1 }), 𝑄𝜏′ ), Q ⟨𝜏,𝜓 ⟩ − {𝑄𝑏 1 , 𝑄𝑏 2 , · · · , 𝑄𝑏𝜓 −|I|−1 })
= M (M (M −1 (D, Q ⟨𝜏,𝜓 ⟩ − {𝑄𝑏 1 , 𝑄𝑏 2 , · · · , 𝑄𝑏𝜓 −|I| }), 𝑄𝜏′ ), Q ⟨𝜏,𝜓 ⟩ − {𝑄𝑏 1 , 𝑄𝑏 2 , · · · , 𝑄𝑏𝜓 −|I| })

The definition of 𝑄𝑏𝜓 −|I| implies that in Q ⟨𝜏,𝜓 ⟩ , any query committed before 𝑄𝑏𝜓 −|I| belongs to I ∪ {𝑄𝑏 1 , 𝑄𝑏 2 , · · · , 𝑄𝑏𝜓 −|I| }. But in case of D𝑏𝜓 −|I|−1 , Q𝑏𝜓 −|I|−1 does not contain {𝑄𝑏 1 , 𝑄𝑏 2 , · · · , 𝑄𝑏𝜓 −|I| }, and thus in Q𝑏𝜓 −|I|−1 , any query committed before 𝑄𝑏𝜓 −|I| belongs to I. Then, based on the same reasoning used for proving Lemma 1, the results of 𝑄𝑏𝜓 −|I| are the same as before the retroactive operation and its results are to be used only by those queries committed after 𝑄𝑏𝜓 −|I| . Thus, 𝑄𝑏𝜓 −|I| need not be rolled back & replayed while generating D𝑏𝜓 −|I|−1 . Thus, Lemma 3 is true.

According to Lemmas 1, 2, and 3,
D ′ = D𝑏 1 = D𝑏 2 = · · · = D𝑏𝜓 −|I| = M (M (M −1 (D, I), 𝑄𝜏′ ), I)

Therefore, during the retroactive operation of adding the target query 𝑄𝜏′ , all queries that do not belong to I need not be rolled back & replayed, and the resulting database’s state is still consistent.

If the retroactive operation is removing 𝑄𝜏′ , then the resulting database’s state is D ′ = M (M −1 (D, Q ⟨𝜏,𝜓 ⟩ ), Q ⟨𝜏+1,𝜓 ⟩ ). Then, a similar induction proof to the one used for Lemmas 1, 2, and 3 can be applied to derive the following:
M (M −1 (D, Q ⟨𝜏,𝜓 ⟩ ), Q ⟨𝜏+1,𝜓 ⟩ )
= M (M −1 (D, Q ⟨𝜏,𝜓 ⟩ − {𝑄𝑏 1 }), Q ⟨𝜏+1,𝜓 ⟩ − {𝑄𝑏 1 })
= M (M −1 (D, Q ⟨𝜏,𝜓 ⟩ − {𝑄𝑏 1 , 𝑄𝑏 2 }), Q ⟨𝜏+1,𝜓 ⟩ − {𝑄𝑏 1 , 𝑄𝑏 2 })
· · ·
= M (M −1 (D, Q ⟨𝜏,𝜓 ⟩ − {𝑄𝑏 1 , 𝑄𝑏 2 , · · · , 𝑄𝑏𝜓 −|I| }), Q ⟨𝜏+1,𝜓 ⟩ − {𝑄𝑏 1 , 𝑄𝑏 2 , · · · , 𝑄𝑏𝜓 −|I| })
= M (M −1 (M −1 (D, I), 𝑄𝜏′ ), I) □

E.2 Row-wise Query Clustering Analysis

Next, we describe query clustering analysis to further reduce queries from I. First, we present additional notations, as described in Terminology 2.

Terminology 2 (Query Clustering Analysis).
𝝍 : The last committed query’s index in Q
C : The set of all table columns in D
𝑲 𝒄 (𝑸 𝒏 ) : A cluster set containing 𝑄𝑛 ’s cluster keys, given 𝑐 is chosen as the cluster key column
𝑸 𝒏 ↭ 𝑸 𝒎 : 𝑄𝑛 and 𝑄𝑚 are in the same cluster

The motivation of defining query clusters is to group queries into disjoint sets such that a retroactive operation on any query in one group does not affect the results of queries in other groups.

Definition 6 (Cluster Key Column). A cluster key column is the selected table column in a given database D, based on which queries are clustered into disjoint groups.

Definition 7 (Query Cluster). A query 𝑄𝑖 ’s cluster is the set of values or value ranges of the chosen cluster key column 𝑐 that 𝑄𝑖 performs read-or-write operations on.

Proposition 7. The cluster key column 𝑐𝑘 is considered to be efficient enough to evenly cluster queries if the following is true: 𝑐𝑘 ≔ argmin 𝑐 ∈ C Σ 𝑗=𝜏..𝜓 |𝐾𝑐 (𝑄 𝑗 )|².

Proposition 7 describes how to choose a cluster key column 𝑐𝑘 in D which is efficient enough to evenly cluster queries. Our goal for choosing 𝑐𝑘 is not to minimize the number of queries to be rolled back and replayed for every possible scenario of retroactive operation, but to minimize the variance of the number of those queries across all scenarios, based on the observed heuristics applied to the ordered list of committed queries 𝐻 . |𝐾𝑐 (𝑄 𝑗 )| represents the number of cluster keys that query 𝑄 𝑗 is assigned, and its power of 2 adds a penalty if the query has skewedly many cluster keys. In other words, the formula prefers a column where each query belongs to a finer and more balanced granularity of clusters.

Now, we describe how to cluster queries.

Proposition 8. 𝐾𝑐𝑘 (𝑄𝑛 ) ∩ 𝐾𝑐𝑘 (𝑄𝑚 ) ≠ ∅ =⇒ 𝑄𝑛 ↭ 𝑄𝑚

Proposition 8 states that if two queries have an overlap in their cluster key(s), then the two queries are merged into (↭) the same cluster. This captures the cases in which, if two queries access the same tuples, then retroactively adding or removing either query could potentially affect the result of the other query.

Proposition 9. (𝑄𝑚 ↭ 𝑄𝑛 ) ∧ (𝑄𝑛 ↭ 𝑄𝑜 ) =⇒ 𝑄𝑚 ↭ 𝑄𝑜

Proposition 9 states that clustering is transitive. This further captures the cases in which, even if 𝑄𝑚 and 𝑄𝑜 do not access any same tuples, one could affect the other’s result if there exists a bridging query 𝑄𝑛 between them such that 𝑄𝑛 and 𝑄𝑚 access some same tuple, and 𝑄𝑛 and 𝑄𝑜 access some same tuple. If we repeat transitive clustering until no more clusters are merged together, the final output is a set of mutually disjoint final clusters such that queries in different clusters access mutually disjoint sets of data tuples.

Theorem 2. For a retroactive operation for adding or removing the target query 𝑄𝜏′ , it is sufficient to do the following: (i) roll back the queries that belong to I and are co-clustered with 𝑄𝜏′ ; (ii) either execute 𝑄𝜏′ (in case of retroactively adding 𝑄𝜏′ ) or roll back 𝑄𝜏′ (in case of retroactively removing 𝑄𝜏′ ); (iii) replay all queries that belong to I and are co-clustered with 𝑄𝜏′ .

Proof. In the proof of Theorem 1, we showed that given database D and committed queries Q, a retroactive operation for adding or removing 𝑄𝜏′ does not need to roll back and replay all queries in Q, but only those in I. We further break down I into I𝐾 and I ∅ , where I𝐾 = {𝑄𝑖 | 𝑄𝑖 ↭ 𝑄𝜏′ } and I ∅ = {𝑄𝑖 | 𝑄𝑖 ̸↭ 𝑄𝜏′ }.

Our proof leverages the commutativity rule [64]:
(1) If two transactions are read-then-read, write-then-read, or write-then-write operations on mutually non-overlapping objects, those two transactions are defined to be non-conflicting.
(2) If two non-conflicting transactions are consecutive to each other, then swapping their execution order does not change the resulting database’s state.

Note that each query in I ∅ is non-conflicting with all queries in I𝐾 . Suppose the retroactive operation is to add the retroactive target query 𝑄𝜏′ . Then, the retroactive operation’s result is
M (M (M −1 (D, I), 𝑄𝜏′ ), I)
= M (M (M −1 (D, I𝐾 ), 𝑄𝜏′ ), I𝐾 ).
This is because the commutativity rule allows all queries in I ∅ to be moved to before 𝑄𝜏′ was committed in the query execution history without harming the consistency of the resulting database, and thus they need not be rolled back & replayed.

Suppose the retroactive operation is to remove the retroactive target query 𝑄𝜏′ . Then, the retroactive operation’s result is
M (M −1 (M −1 (D, I), 𝑄𝜏′ ), I)
= M (M −1 (M −1 (D, I𝐾 ), 𝑄𝜏′ ), I𝐾 )

Therefore, for a retroactive operation, it is sufficient to do the following: (i) roll back the queries that belong to I𝐾 ; (ii) either execute 𝑄𝜏′ (in case of retroactively adding 𝑄𝜏′ ) or roll back 𝑄𝜏′ (in case of retroactively removing 𝑄𝜏′ ); (iii) replay all queries that belong to I𝐾 . □

We can extend the query analysis to support multiple retroactive target queries. In this case, we run Setup 1 for each retroactive target query, and then take a union of all outputs, which is the final set of queries to roll back & replay. In case of retroactively changing a target query, the desired database state after the retroactive operation is the same as running Setup 1 twice: retroactively removing 𝑄𝜏′ and then retroactively adding a changed target query 𝑄𝜏′ . Thus, the retroactive changing is equivalent to performing the two retroactive operations. Because performing retroactive operations using the query analysis is correct for both of them, we have the correctness of retroactively changing a query using the query analysis.

F Collision Rate of Ultraverse’s Table Hashes

Ultraverse computes a table’s hash by hashing each of its rows with a collision-resistant hash function and summing them up. By assuming that the collision-resistant hash function’s output is uniformly distributed in [0, 𝑝 − 1], we will prove that given two tables 𝑇1 and 𝑇2 , Ultraverse’s Hash-jumper’s hash collision rate is upper-bounded by 1/𝑝 (i.e., it produces the same hash value for 𝑇1 and 𝑇2 , when 𝑇2 ≠ 𝑇1 , with a probability no more than 1/𝑝).

Suppose the Hash-jumper outputs a hash value ℎ ∈ [0, 𝑝 − 1] for 𝑇1 . Without loss of generality, we assume 𝑇2 has 𝑛 rows. Then we prove by induction.

Case 𝒏 = 1: Because the collision-resistant hash function’s output is uniformly distributed in [0, 𝑝 − 1], it is easy to see that the probability that the Hash-jumper outputs ℎ for any 𝑇2 is 1/𝑝.

Case 𝒏 = 𝒌: For each row of 𝑇2 , let 𝑥𝑖 denote the collision-resistant hash function’s output. Then there are 𝑝^𝑘 possibilities of (𝑥 1 , 𝑥 2 , ..., 𝑥𝑘 ) for the 𝑘 rows. Because the collision-resistant hash function’s output is uniformly distributed in [0, 𝑝 − 1], all these possibilities have the same probability 1/𝑝^𝑘 . Consider the output of the Hash-jumper. For any ℎ𝑘 ∈ [0, 𝑝 − 1], we assume there are 𝑝^(𝑘−1) possibilities of (𝑥 1 , 𝑥 2 , ..., 𝑥𝑘 ) such that Σ 𝑖=1..𝑘 𝑥𝑖 = ℎ𝑘 mod 𝑝. This holds for 𝑘 = 1, as we have seen for Case 𝒏 = 1.

Case 𝒏 = 𝒌 + 1: There are 𝑝^(𝑘+1) possibilities of (𝑥 1 , 𝑥 2 , ..., 𝑥𝑘+1 ) for the (𝑘 + 1) rows. Because the collision-resistant hash function’s output is uniformly distributed in [0, 𝑝 − 1], all these possibilities have the same probability 1/𝑝^(𝑘+1). For any (𝑥 1 , 𝑥 2 , ..., 𝑥𝑘 ) such that Σ 𝑖=1..𝑘 𝑥𝑖 = ℎ𝑘 mod 𝑝, there exists exactly one 𝑥𝑘+1 such that ℎ𝑘 + 𝑥𝑘+1 = ℎ mod 𝑝. By the assumption in Case 𝒏 = 𝒌, for each ℎ𝑘 ∈ [0, 𝑝 − 1], there are exactly 𝑝^(𝑘−1) possibilities of (𝑥 1 , 𝑥 2 , ..., 𝑥𝑘+1 ) such that the output of the Hash-jumper is ℎ. Because there are 𝑝 possible ℎ𝑘 values in [0, 𝑝 − 1], there are 𝑝^𝑘 possibilities of (𝑥 1 , 𝑥 2 , ..., 𝑥𝑘+1 ) such that the output of the Hash-jumper is ℎ. Therefore, the probability that the Hash-jumper outputs ℎ for a table 𝑇2 of (𝑘 + 1) rows is 𝑝^𝑘 / 𝑝^(𝑘+1) = 1/𝑝.
Because the above probability is independent of the number of rows 𝑛, for any 𝑇2 , the Hash-jumper outputs ℎ with a probability of 1/𝑝. Therefore, the hash collision rate is upper-bounded by 1/𝑝 when 𝑇2 ≠ 𝑇1 .
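The hashing scheme analyzed above can be sketched in a few lines. This is only an illustration, not Ultraverse's actual implementation: the choice of SHA-256 as the collision-resistant hash, the tab-separated row encoding, and the Mersenne prime modulus are all our assumptions.

```python
import hashlib

# Assumption: a large prime modulus p (here a Mersenne prime); the
# paper only requires hash outputs uniform in [0, p - 1].
P = (1 << 61) - 1

def row_hash(row):
    # Hash one row with a collision-resistant hash, reduced mod p.
    # Rows are encoded as tab-joined column strings for illustration.
    digest = hashlib.sha256("\t".join(map(str, row)).encode()).digest()
    return int.from_bytes(digest, "big") % P

def table_hash(rows):
    # Summing row hashes mod p makes the table hash independent of row
    # order: two replays producing the same rows in different physical
    # orders still yield a hash hit.
    return sum(row_hash(r) for r in rows) % P

t1 = [(1, "alice", 300), (2, "bob", 200)]
t2 = [(2, "bob", 200), (1, "alice", 300)]   # same rows, different order
t3 = [(1, "alice", 300), (2, "bob", 999)]   # different content

assert table_hash(t1) == table_hash(t2)  # order-insensitive: hash hit
assert table_hash(t1) != table_hash(t3)  # collides only w.p. ~1/p
```

During replay, a table hash like this can be compared against the recorded hash of the original table version at the same commit index; a hit suggests the remaining queries would reproduce the original suffix, so the replay can stop early (the hash-jump).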
False Positives: From the security perspective, there is yet a non-zero chance that a malicious user fabricates two row hash values (𝑥 1′ , 𝑥 2′ ) such that (𝑥 1′ + 𝑥 2′ ) mod 𝑝 = (𝑥 1 + 𝑥 2 ) mod 𝑝 and tricks the database server into believing a false positive on a table hash hit. To address this, whenever a table hash hit is found, Hash-jumper optionally makes a literal table comparison between the two table versions at the same commit time (one evolved during the replay; the other newly rolled back from the original database to this same point in time) and verifies whether they really match. If the literal table comparison returns a true positive before finishing replaying the rest of the queries, Ultraverse still ends up reducing its replay time.
False Negatives: A subtlety occurs when a query uses the LIMIT keyword without ORDER BY, because each replay of this query may return different row(s) of a table in a non-deterministic manner,
which may lead Hash-jumper to making a false negative decision and missing the opportunity of a legit hash-jump. However, note that missing the opportunity of optimization does not affect the correctness of retroactive operations.
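A query of this shape can be flagged ahead of time so that the replay does not rely on a hash hit for it. The token-based check below is our own illustrative heuristic, not part of Ultraverse; a real SQL parser would be needed to handle, e.g., columns named "limit".

```python
def is_replay_nondeterministic(sql: str) -> bool:
    # Heuristic: flag queries containing a LIMIT clause but no ORDER BY.
    # Without a total order, each replay may return different rows, which
    # can cause a false-negative table-hash comparison during hash-jumps.
    s = sql.strip().lower()
    return "limit" in s.split() and "order by" not in s

queries = [
    "SELECT * FROM accounts LIMIT 10",              # non-deterministic
    "SELECT * FROM accounts ORDER BY id LIMIT 10",  # deterministic
    "UPDATE accounts SET balance = 0 WHERE id = 7", # no LIMIT at all
]
flags = [is_replay_nondeterministic(q) for q in queries]
# flags == [True, False, False]
```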