
2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)

Testing Database Systems via Differential Query Execution
Jiansen Song∗†, Wensheng Dou∗†‡§, Ziyu Cui∗†, Qianwang Dai∗†, Wei Wang∗†‡§, Jun Wei∗†‡§, Hua Zhong∗†, Tao Huang∗†
∗ State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences
† University of Chinese Academy of Sciences
‡ University of Chinese Academy of Sciences Nanjing College
§ Nanjing Institute of Software Technology
{songjiansen20, wsdou, cuiziyu20, daiqianwang19, wangwei, wj, zhonghua, tao}@otcaix.iscas.ac.cn

978-1-6654-5701-9/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICSE48619.2023.00175

Abstract—Database Management Systems (DBMSs) provide efficient data retrieval and manipulation for many applications through Structured Query Language (SQL). Incorrect implementations of DBMSs can result in logic bugs, which cause SELECT queries to fetch incorrect results, or UPDATE and DELETE queries to generate incorrect database states. Existing approaches mainly focus on detecting logic bugs in SELECT queries. However, logic bugs in UPDATE and DELETE queries have not been tackled. In this paper, we propose a novel and general approach, which we have termed Differential Query Execution (DQE), to detect logic bugs in SELECT, UPDATE and DELETE queries of DBMSs. The core idea of DQE is that different SQL queries with the same predicate usually access the same rows in a database. For example, a row updated by an UPDATE query with a predicate ϕ should also be fetched by a SELECT query with the same predicate ϕ. If not, a logic bug is revealed in the target DBMS. To evaluate the effectiveness and generality of DQE, we apply DQE on five production-level DBMSs, i.e., MySQL, MariaDB, TiDB, CockroachDB and SQLite. In total, we have detected 50 unique bugs in these DBMSs, 41 of which have been confirmed, and 11 have been fixed. We expect that the simplicity and generality of DQE can greatly improve the reliability of DBMSs.

Index Terms—Database system, DBMS testing, logic bug

I. INTRODUCTION

Database Management Systems (DBMSs) are designed to efficiently retrieve and manipulate data in databases. Relational DBMSs, e.g., MySQL [1], MariaDB [2], TiDB [3], CockroachDB [4] and SQLite [5], adopt Structured Query Language (SQL) [6] as their standard query language, and have become an indispensable component in many business-critical applications [7].

DBMSs suffer from various bugs, e.g., crashes and logic bugs. Specifically, logic bugs can cause a DBMS to return incorrect results for SELECT queries, or generate incorrect database states for UPDATE and DELETE queries. Such logic bugs do not crash the DBMS, and can easily go unnoticed by developers. In this work, we focus on detecting logic bugs in DBMSs.

Recently, researchers have proposed some approaches to detect logic bugs in DBMSs [8]–[11]. RAGS [8] feeds a SELECT query into multiple DBMSs and observes discrepancies in their query results. PQS [9] generates SELECT queries that fetch a pivot row, and checks whether the target DBMS fails to fetch the pivot row. NoREC [10] rewrites a SELECT query as another equivalent one that cannot be optimized by the DBMS, and then detects differences in their query results. TLP [11] decomposes a SELECT query into three partitioning queries, and merges these partitioning queries’ results into a combined result, which is expected to be the same as the original query’s result. However, all these approaches mainly focus on detecting logic bugs in SELECT queries, while the logic bugs in UPDATE and DELETE queries have not been tackled yet, even though they can cause more severe consequences, e.g., incorrect database states.

Logic bugs in DBMSs, especially those in UPDATE and DELETE queries, are difficult to detect automatically. A key challenge in detecting logic bugs is to construct an effective test oracle to determine whether a DBMS behaves correctly for a given query. Existing approaches to construct oracles for SELECT queries, e.g., PQS [9], NoREC [10] and TLP [11], cannot be applied to UPDATE and DELETE queries.

In DBMSs, SELECT, UPDATE and DELETE queries utilize predicates (i.e., WHERE clauses) to specify which rows to retrieve, update or delete, respectively. If they use the same predicate ϕ, they should access the same rows in a database. Ideally, DBMSs can adopt the same implementations for predicate evaluation in SELECT, UPDATE and DELETE queries. However, a DBMS usually adopts different implementations for predicate evaluation in SELECT, UPDATE and DELETE queries due to various optimization choices¹. Inconsistent implementations for predicate evaluation among these queries can cause SELECT, UPDATE and DELETE queries with the same predicate ϕ to access different rows.

Inspired by this key observation, we propose Differential Query Execution (DQE), a novel and general approach to detect logic bugs in SELECT, UPDATE and DELETE queries. DQE solves the test oracle problem by executing SELECT, UPDATE and DELETE queries with the same predicate ϕ, and observing inconsistencies among their execution results. For example, if a row that is updated by an UPDATE query with a predicate ϕ does not appear in the query result of a

Wensheng Dou and Hua Zhong are the corresponding authors.
¹ https://bugs.mysql.com/bug.php?id=106420

SELECT query with the same predicate ϕ, a logic bug is detected in the target DBMS. The key challenge of DQE is to automatically obtain the accessed rows for a given SELECT, UPDATE or DELETE query. To address this challenge, we append two extra columns to each table in a database, to uniquely identify each row and track whether a row has been modified, respectively. We further rewrite SELECT and UPDATE queries to identify their accessed rows.

    CREATE TABLE t1 (c1 INT);
    INSERT INTO t1 VALUES (1); -- r1
    1. SELECT * FROM t1 WHERE '';
       -- Fetch empty result
       -- Warning|1292|Truncated incorrect DOUBLE value ''
    2. UPDATE t1 SET c1=2 WHERE '';
       -- Update r1
    3. DELETE FROM t1 WHERE '';
       -- Delete r1
Listing 1. TiDB#27648. The UPDATE and DELETE queries unexpectedly change the database state.

Listing 1 shows a real-world logic bug, TiDB#27648², detected by DQE. In this bug, the UPDATE and DELETE queries unexpectedly change the database state. Table t1 has a single INT column c1 and one row r1 with value 1. The predicate ϕ is '', i.e., an empty string. TiDB tries to convert ϕ into a boolean value for the three queries at Line 1−3. For the SELECT query, TiDB first truncates ϕ to a DOUBLE value 0, and then converts it to a boolean value FALSE. Therefore, the SELECT query fetches an empty query result and raises a warning. For the UPDATE and DELETE queries, TiDB erroneously evaluates ϕ to TRUE, and changes the database state unexpectedly. We reported this bug to TiDB developers, who have confirmed and fixed it. Existing approaches cannot detect this bug, because this bug occurs in the UPDATE and DELETE queries.

To evaluate DQE’s effectiveness and generality, we implement DQE and perform experiments on five widely-used and production-level DBMSs, i.e., MySQL [1], MariaDB [2], TiDB [3], CockroachDB [4] and SQLite [5]. In total, we have detected 50 unique bugs among these DBMSs, 41 of which have been confirmed as new bugs, and 11 bugs have been fixed. Among the 41 confirmed bugs, 20 bugs occur in UPDATE and DELETE queries. None of our detected bugs can be detected by existing approaches, e.g., PQS [9], NoREC [10] and TLP [11]. Our experimental results indicate that DQE is effective in detecting logic bugs in SELECT, UPDATE and DELETE queries in DBMSs. We have made DQE publicly available at https://github.com/tcse-iscas/dqetool.

Although we have detected many bugs in SELECT, UPDATE and DELETE queries in our target DBMSs, DQE still has some limitations. First, DQE suffers from the same issue as differential testing, in which DQE fails to detect the same bug occurring in all the three SELECT, UPDATE and DELETE queries. Second, DQE only supports common operations and functions in SELECT, UPDATE and DELETE queries, e.g., JOIN, ORDER BY, and LIMIT. DQE does not support operations and functions that are only used in one kind of queries, e.g., DISTINCT, sub-queries, aggregate-based functions, window functions and GROUP BY that are only used in SELECT queries. For these features, DQE cannot compare their execution results in SELECT, UPDATE and DELETE queries. Third, DQE cannot support non-deterministic functions, e.g., RAND function, which returns different values in different queries.

Despite these limitations, the key insight of DQE is widely applicable to other DBMSs that support data manipulation specified by predicates, e.g., find(), update() and remove() in MongoDB. We expect that DQE can be widely adopted to improve the reliability of DBMSs and draw attention to detecting logic bugs in UPDATE and DELETE queries.

In summary, we make the following contributions.
• We propose DQE, a novel and general approach to detect logic bugs in SELECT, UPDATE and DELETE queries in DBMSs. To our knowledge, DQE is the first approach to detect logic bugs in UPDATE and DELETE queries.
• We implement and evaluate DQE on five widely-used DBMSs. DQE has detected 41 previously-unknown bugs in these DBMSs, 20 of which occur in UPDATE and DELETE queries.

II. PRELIMINARIES

We first explain our target DBMSs (Section II-A and Section II-B), and then discuss SQL query execution strategies that are adopted in our target DBMSs (Section II-C).

A. Database Management Systems and SQL

Database Management Systems (DBMSs) are widely used in many applications for effective data retrieval and manipulation. Mainstream DBMSs, e.g., MySQL [1], MariaDB [2], TiDB [3], CockroachDB [4] and SQLite [5], adopt the relational data model [12], which organizes data into relational tables. These DBMSs are so-called relational DBMSs.

Relational DBMSs usually adopt Structured Query Language (SQL) [6] as their query language. In SQL, SELECT, UPDATE and DELETE queries utilize predicates (i.e., WHERE clauses) to determine which rows to retrieve, update or delete, respectively. DBMSs usually adopt sophisticated optimizations to increase the performance of query evaluation [13]–[16]. For the same predicate, DBMSs can apply different optimizations in SELECT, UPDATE and DELETE queries. For example, MySQL developers stated that “all the DML statements have to pass through the optimizing stage,... SELECT and UPDATE queries do not pass through the same optimizing process.”³ However, no matter what optimizations are applied on query evaluation, SELECT, UPDATE and DELETE queries with the same predicate ϕ should access the same rows.

B. Target DBMSs

We focus on five production-level and widely-used DBMSs, i.e., MySQL [1], MariaDB [2], TiDB [3], CockroachDB [4]
² https://github.com/pingcap/tidb/issues/27648
³ https://bugs.mysql.com/bug.php?id=106420

and SQLite [5], as shown in Table I. We choose these DBMSs based on their popularity and database types. The DB-Engines Ranking [17] shows that MySQL, SQLite and MariaDB are among the most popular DBMSs, which are ranked 2nd, 9th and 13th, respectively. MySQL and MariaDB are traditional DBMSs that have been developed for decades. SQLite is an embedded DBMS and the most widely deployed DBMS [18]. According to GitHub Database Topic [19], TiDB and CockroachDB are the top two popular (33.3K and 26.5K stars, respectively) relational DBMSs. CockroachDB and TiDB are distributed NewSQL DBMSs with high scalability.

TABLE I
TARGET DBMSs

DBMS | DB-Engines Ranking | GitHub Stars | Type
MySQL | 2 | 8.7K | Traditional
MariaDB | 13 | 4.7K | Traditional
TiDB | 108 | 33.3K | NewSQL
CockroachDB | 57 | 26.5K | NewSQL
SQLite | 9 | 3.5K | Embedded

C. Query Execution Strategy

Different DBMSs usually adopt different SQL query execution strategies, e.g., how to handle syntax and semantic errors in query evaluation. In this section, we mainly discuss how our target DBMSs handle syntax and semantic errors in query evaluation for SELECT, UPDATE and DELETE queries, which can affect the query execution analysis in DQE.

MySQL, MariaDB and TiDB. These three DBMSs adopt the same query execution strategies. When a syntax or semantic error err occurs in query evaluation, DBMSs can raise warnings or errors according to the severity of err. For example, if an invalid value (e.g., comparing an INTEGER value with a TEXT value) is used, a warning is raised, while if a predicate is syntactically invalid (e.g., a function takes more arguments than necessary), an error is raised. If a warning occurs when evaluating a query, DBMSs can continue to execute the query. For example, when evaluating a predicate ϕ on row r1 raises a warning, DBMSs can continue to evaluate ϕ on the following rows, e.g., row r1+1. If an error occurs when evaluating a query, DBMSs will abort and roll back the query. Specifically, SELECT queries return an empty query result, and all changes made by UPDATE and DELETE queries are rolled back.

These three DBMSs can execute queries in different SQL modes, which can affect the query execution strategies. A SQL mode is a set of configurations, e.g., STRICT_ALL_TABLES and STRICT_TRANS_TABLES. Specifically, there are two SQL modes, i.e., strict mode and non-strict mode, which can affect query execution strategies of SELECT, UPDATE and DELETE queries. Fig. 1 shows how SQL queries handle warnings and errors for a given predicate ϕ in different SQL modes. When strict mode is enabled, DBMSs apply strict validation checks for UPDATE and DELETE queries. Specifically, if a SELECT query with a predicate ϕ raises a warning, the warning is treated as an error (with the same error message) in the UPDATE and DELETE queries with the same predicate ϕ. For non-strict mode, if a SELECT query with a predicate ϕ raises a warning, the UPDATE and DELETE queries with the same predicate ϕ raise the same warning, too. Note that, if a SELECT query with a predicate ϕ raises an error, the UPDATE and DELETE queries with the same predicate ϕ also raise the same error no matter whether strict mode is enabled.

[Figure: for SELECT … FROM t WHERE ϕ versus UPDATE/DELETE … FROM t WHERE ϕ, a warning in SELECT corresponds to a warning in UPDATE/DELETE when strict mode is off and to an error when strict mode is on; an error in SELECT corresponds to an error in UPDATE/DELETE.]
Fig. 1. SQL modes in MySQL, MariaDB and TiDB.

CockroachDB and SQLite. These two DBMSs adopt relatively simple query execution strategies. In these two DBMSs, SELECT, UPDATE and DELETE queries can only raise errors when a syntax or semantic error occurs in predicate evaluation, and do not raise warnings. If an UPDATE or DELETE query raises an error, it will be rolled back, and all changes made by the query will be undone. However, if a SELECT query raises an error, it will return all rows that match its predicate before the error occurs. That said, a SELECT query may return a non-empty query result when it raises an error.

III. APPROACH

We propose Differential Query Execution (DQE) to automatically detect logic bugs in SELECT, UPDATE and DELETE queries. The core idea of DQE is that the SELECT, UPDATE and DELETE queries with the same predicate ϕ should access the same rows. If these queries access different rows, DQE reveals a potential logic bug in the target DBMS.

A. DQE Overview

Fig. 2 shows the workflow of DQE. We first generate a random database (step 1). The generated database contains one or more tables, e.g., t1 and t2. Each table contains some random columns and data, e.g., table t1 has a column c1 with value ‘a’ and ‘b’. We then randomly generate a predicate ϕ, e.g., NOT c1 (step 2). Based on predicate ϕ, we generate a query triple <Qsel, Qup, Qdel>, in which Qsel, Qup and Qdel denote a SELECT query, an UPDATE query and a DELETE query, respectively. Qsel, Qup, and Qdel in the query triple all use ϕ as their predicates (step 3). We then execute Qsel, Qup and Qdel in the query triple on the same database state (step 4), and analyze their execution results, i.e., accessed rows and raised errors (step 5). Specifically, we analyze Qsel’s query result rs, and the modified tables tu and td after executing Qup and Qdel, respectively. If the three queries’ execution results in the query triple are inconsistent, e.g., accessing different rows, DQE reveals a potential logic bug in the target DBMS (step 6).

[Figure: DQE workflow in six steps: (1) generate a random database, (2) generate a random predicate ϕ, (3) generate a query triple <Qsel, Qup, Qdel>, (4) execute queries on the DBMS, (5) obtain execution results, (6) compare execution results. In the example, table t1 has a TEXT column c1 with rows r1 = 'a' and r2 = 'b', table t2 has a TEXT column c2 with rows r3 = 'c' and r4 = 'd', and ϕ is NOT t1.c1. Qsel is SELECT t1.c1 FROM t1 WHERE ϕ, Qup is UPDATE t1 SET t1.c1='c' WHERE ϕ, and Qdel is DELETE FROM t1 WHERE ϕ. The accessed rows are {r1, r2} for Qsel, {} for Qup and {r1, r2} for Qdel.]

Fig. 2. We use MariaDB#27885 [20] to illustrate the workflow of DQE. rs denotes the query result of Qsel, tu denotes t1’s new table state after executing Qup, and td denotes t1’s new table state after executing Qdel.
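A minimal sketch of steps 4 to 6 on SQLite follows, using Python's built-in sqlite3 module rather than the authors' Java tool. The table contents, the fixed row identifiers, the helper names and the use of a rollback to restore the database state are assumptions for illustration. The tracking column is called row_id here because rowid already names SQLite's implicit key column.

```python
import sqlite3


def make_database():
    """Create table t1 with the two tracking columns and a few rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (c1 INT, row_id TEXT, updated INT DEFAULT 0)")
    conn.executemany(
        "INSERT INTO t1 (c1, row_id) VALUES (?, ?)",
        [(0, "r1"), (2, "r2"), (4, "r3"), (None, "r4")],
    )
    conn.commit()
    return conn


def accessed_rows(conn, predicate):
    """Return (row_sel, row_up, row_del) for one query triple sharing `predicate`."""
    all_ids = "SELECT row_id FROM t1"
    # Q_sel: fetch the identifiers of the selected rows.
    row_sel = {r[0] for r in conn.execute(f"{all_ids} WHERE {predicate}")}
    # Q_up: mark the updated rows, read the marks, then restore the state.
    conn.execute(f"UPDATE t1 SET c1 = 1, updated = 1 WHERE {predicate}")
    row_up = {r[0] for r in conn.execute(f"{all_ids} WHERE updated = 1")}
    conn.rollback()
    # Q_del: the deleted rows are those whose identifiers disappear.
    before = {r[0] for r in conn.execute(all_ids)}
    conn.execute(f"DELETE FROM t1 WHERE {predicate}")
    after = {r[0] for r in conn.execute(all_ids)}
    conn.rollback()
    return row_sel, row_up, before - after


def is_consistent(conn, predicate):
    """True when SELECT, UPDATE and DELETE accessed exactly the same rows."""
    row_sel, row_up, row_del = accessed_rows(conn, predicate)
    return row_sel == row_up == row_del


conn = make_database()
print(accessed_rows(conn, "c1 > 2"))
print(is_consistent(conn, "NOT c1"))
```

On a correct DBMS the three sets coincide. A mismatch such as {r1, r2}, {} and {r1, r2} in Fig. 2 is what DQE reports as a potential logic bug.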

The rest of this section is organized as follows. Section III-B describes our database generation. Section III-C describes our strategies to generate SQL queries. Section III-D shows how we obtain a query triple’s execution results. Section III-E shows how we detect logic bugs by comparing the execution results of the three queries in a query triple.

B. Database Generation

Database generation has been widely explored by existing works [21]–[26], and is not a contribution of this work. Our database generation is mainly adopted from SQLancer [27]. We present our database generation only for completeness.

We first use the CREATE TABLE command to create at most maxTable tables. Each table contains at most maxCol columns. We assign each column a random column type, e.g., INT or TEXT, and some column constraints, e.g., PRIMARY KEY and UNIQUE. We then populate random data into each table by executing the INSERT command. Each table contains at most maxInsert rows of data. We further execute at most maxAlter ALTER TABLE and CREATE INDEX commands to modify each initial table, e.g., adding new columns or building indexes on existing columns. Moreover, we configure each table with random options, e.g., setting the starting number of the auto-incrementing column by appending AUTO_INCREMENT=5. Note that maxTable, maxCol, maxInsert and maxAlter are all configurable parameters. We set them as 5, 3, 10, 3 by default in our experiment, respectively.

After generating the initial database, we alter each table in the database by adding columns rowId and updated. Column rowId is used to uniquely identify each row. We assign column rowId as TEXT type and populate it with unique values, e.g., UUID. Column updated is used to track the modifications of each row. We assign column updated as INT type with default value 0. Note that these two newly-added columns are not used in the following query triple generation (Section III-C). We only use them to obtain a query’s accessed rows in Section III-D.

The above database generation is specific to individual DBMSs. Different DBMSs support different column types, column constraints and table options. For example, CockroachDB supports column type INTERVAL, while MySQL does not support it.

C. Query Triple Generation

After database generation, we generate a query triple <Qsel, Qup, Qdel>, in which Qsel, Qup, Qdel use the same predicate ϕ. In the following paragraphs, we first explain how to generate a predicate ϕ, and then explain how to generate Qsel, Qup, and Qdel based on predicate ϕ.

Algorithm 1: Predicate generation.
Input: maxDepth
 1 Function generatePredicate() do
 2     generateAST(0)
 3 Function generateAST(depth) do
 4     nodeTypes ← {CONST, COLUMN}
 5     if depth < maxDepth then
 6         nodeTypes ← {CONST, COLUMN, AND, OR, ...}
 7     nodeType ← random(nodeTypes)
 8     if nodeType = CONST then
 9         return randomConst()
10     else if nodeType = COLUMN then
11         return randomTable().randomColumn()
12     else
13         node ← nodeType.createNode()
14         for i ← 1; i ≤ node.operands.length; i++ do
15             node.child[i] ← generateAST(depth + 1)
16         return node

Predicate generation. We use Algorithm 1 to randomly generate predicates based on Abstract Syntax Trees (ASTs) of SQL. We randomly choose one node type from CONST, COLUMN, and operators supported by the target DBMS (Line 5−7). If the node type is CONST, we randomly

generate a constant value, e.g., ‘a’ (Line 8−9). If the node type is COLUMN, we randomly return a column reference from the tables in our generated database, e.g., t1.c1 (Line 10−11). If the node type is an operator, we iteratively generate its operands (Line 13−15). When the depth of an AST reaches maxDepth, we only generate a constant or a column reference (Line 4−6) and will not expand the AST further. Note that maxDepth is configurable, and we set it as 3 by default in our experiment.

Query triple generation. After generating a predicate ϕ, we further randomly generate Qsel, Qup, and Qdel based on predicate ϕ. Specifically, we first extract the referenced tables from predicate ϕ, which are used in Qsel, Qup, and Qdel. For example, if predicate ϕ is t1.c1 > 2 AND t2.c2 > 1, its referenced tables are t1 and t2. We then generate Qsel’s select field and Qup’s update field. The select field of Qsel is a list of column references, e.g., t1.c1, t2.c2. The update field of Qup is a list of assignments, e.g., t1.c1 = 1, t2.c2 = 2. Finally, we generate optional clauses, e.g., ORDER BY, which are commonly supported by Qsel, Qup, and Qdel. For example, we can generate a query triple as follows.

    Qsel: SELECT t1.c1,t2.c2 FROM t1,t2
          WHERE t1.c1>2 AND t2.c2>1
    Qup:  UPDATE t1,t2 SET t1.c1=1,t2.c2=2
          WHERE t1.c1>2 AND t2.c2>1
    Qdel: DELETE t1,t2 FROM t1,t2
          WHERE t1.c1>2 AND t2.c2>1

During query generation, DQE supports common operations and functions in SELECT, UPDATE and DELETE queries, and does not support operations and functions that are only used in one type of queries, e.g., DISTINCT, aggregate-based functions, window functions and GROUP BY that are only used in SELECT queries. Moreover, DQE cannot support non-deterministic functions, e.g., RAND function that returns a random value. Note that our query generation is specific to DBMSs.

D. Obtaining Execution Results

For a generated query triple <Qsel, Qup, Qdel>, we execute Qsel, Qup and Qdel on the same database state, and obtain their execution results. Note that the three queries’ execution results, i.e., Qsel’s query result and the tables modified by Qup and Qdel, cannot be used directly to compare and find bugs. Instead, we collect two kinds of information in each query, i.e., the rows accessed by a query, and the errors raised by a query if any. This information can be used to compare the queries’ execution results in the query triple in Section III-E. In the following, we first discuss how to obtain the errors raised by a query, and then discuss how to automatically obtain each query’s accessed rows in a query triple in detail.

1) Obtaining the errors raised by a query: A query, e.g., Qsel, Qup or Qdel, can raise errors (sometimes warnings in MySQL, MariaDB and TiDB) when a syntax or semantic error occurs in query evaluation. Specifically, an UPDATE query Qup may violate table constraints, e.g., NOT NULL and UNIQUE, and raise UPDATE-specific errors when updating the referenced tables. Similarly, a DELETE query Qdel may violate table constraints, e.g., FOREIGN KEY, and raise DELETE-specific errors when deleting data in the referenced tables. Table II shows some errors about constraint violations in MySQL and SQLite for UPDATE and DELETE queries. For example, when Qup updates a column c1 with NOT NULL constraint by changing its value to NULL, MySQL raises an error with code 1048 and message “Column ‘c0’ cannot be null”. When Qdel deletes a row that is referenced by another table, SQLite raises an error with message “FOREIGN KEY constraint failed”. Note that Qsel does not raise specific errors that Qup and Qdel cannot raise.

TABLE II
EXAMPLES OF UPDATE OR DELETE-SPECIFIC ERRORS IN MYSQL AND SQLITE

Table Constraint | MySQL | SQLite
NOT NULL | 1048, Column ‘c0’ cannot be null | NOT NULL constraint failed: t1.c1
UNIQUE | 1062, Duplicate entry ‘2’ for key ‘t1.i0’ | UNIQUE constraint failed: t1.c1
Generated Column | 3105, The value specified for generated column ‘c1’ in table ‘t1’ is not allowed | cannot UPDATE generated column “c2”
Foreign Key | 1451, Cannot delete or update a parent row: a foreign key constraint fails | FOREIGN KEY constraint failed

We use diagnostic commands provided by our target DBMSs to obtain the errors raised by a query. Specifically, we use the SHOW WARNINGS command to obtain the raised errors in MySQL, MariaDB and TiDB. The SHOW WARNINGS command returns the error level, code and message when executing a query [28]. CockroachDB and SQLite do not provide such diagnostic commands. Thus, we use SQLException in Java to obtain the raised errors in these two DBMSs.

2) Obtaining the accessed rows by a SELECT query (Qsel): In order to obtain the rows returned by a SELECT query Qsel, we append the select field of Qsel with column rowId of its referenced tables, and form a new SELECT query Qsel′. After executing Qsel′, we fetch column rowId’s values from its result set. Because column rowId’s values can appear more than once when testing multiple tables, we remove the duplicate values of column rowId if necessary.

Fig. 3 shows an example for Qsel. We append Qsel’s select field with t1.rowId, t2.rowId, which is shown in red font in the figure, and form Qsel′. We then execute Qsel′ to get the accessed rows from its query result rs. In this example, the same value of column t1.rowId appears twice, so we remove the duplicate values. We can see that the rows accessed by Qsel are r3, r5, r6.

3) Obtaining the accessed rows by an UPDATE query (Qup): In order to obtain the rows updated by an UPDATE query Qup, we append the update field of Qup with a list of assignments to column updated in its referenced tables, and

form a new UPDATE query Qup′. After executing Qup′, we fetch column rowId’s values from each referenced table with column updated’s value equal to 1, and form the accessed rows by Qup.

[Figure: table t1 holds (c1, rowId) = (0, r1), (2, r2), (4, r3) and table t2 holds (c2, rowId) = (1, r4), (3, r5), (5, r6). The rewritten query SELECT t1.c1,t2.c2, t1.rowId, t2.rowId FROM t1,t2 WHERE t1.c1>2 AND t2.c2>1 returns rs = (4, 3, r3, r5), (4, 5, r3, r6), so the accessed rows are {r3, r5, r6}.]
Fig. 3. Obtaining the accessed rows by a SELECT query.

Fig. 4 shows an example for Qup. We append Qup’s update field with t1.updated = 1, t2.updated = 1, which is shown in red font in the figure, and form Qup′. We then execute Qup′, and obtain its accessed rows by combining column rowId in each modified table (i.e., t1u and t2u) with column updated’s value equal to 1. We can see that the rows accessed by Qup are r3, r5, r6.

[Figure: before the update, t1 holds (rowId, c1, updated) = (r1, 0, 0), (r2, 2, 0), (r3, 4, 0) and t2 holds (r4, 1, 0), (r5, 3, 0), (r6, 5, 0). After executing UPDATE t1,t2 SET t1.c1=1,t2.c2=2 with the appended assignments to column updated, WHERE t1.c1>2 AND t2.c2>1, t1u holds (r1, 0, 0), (r2, 2, 0), (r3, 1, 1) and t2u holds (r4, 1, 0), (r5, 2, 1), (r6, 2, 1). The accessed rows are {r3} in t1 and {r5, r6} in t2, i.e., {r3, r5, r6}.]
Fig. 4. Obtaining the accessed rows by an UPDATE query.

4) Obtaining the accessed rows by a DELETE query (Qdel): In order to obtain the rows deleted by a DELETE query Qdel, we compare column rowId’s values in each referenced table before and after executing it.

Fig. 5 shows an example for Qdel. Before executing Qdel, the column rowId’s values in table t1 are r1, r2, r3, and the column rowId’s values in table t2 are r4, r5, r6. After executing Qdel, the column rowId’s values in the modified table t1d are r1, r2, and the column rowId’s values in the modified table t2d are r4, which means r3 in the table t1 and r5, r6 in the table t2 are deleted. Therefore, the rows accessed by Qdel are r3, r5, r6.

[Figure: DELETE t1,t2 FROM t1,t2 WHERE t1.c1>2 AND t2.c2>1 is executed on the tables t1 and t2 of Fig. 3. Comparing column rowId before and after the deletion gives {r3} for t1 and {r5, r6} for t2, i.e., the accessed rows {r3, r5, r6}.]
Fig. 5. Obtaining the accessed rows by a DELETE query.

Note that, in Fig. 3, Fig. 4, and Fig. 5, we only show the related columns for brevity. In fact, we append these two columns to each table in our database generation.

E. Comparing Execution Results

After obtaining the execution results of Qsel, Qup and Qdel, we analyze and compare them to detect whether a logic bug occurs in the target DBMS. In the following, we use rowsel, rowup and rowdel to represent the set of accessed rows by Qsel, Qup and Qdel, respectively.

As discussed in Section III-D, Qup and Qdel can raise UPDATE and DELETE-specific errors, respectively, while Qsel cannot. Therefore, if Qup raises UPDATE-specific errors, we will not compare Qup’s execution results with those of Qsel and Qdel, i.e., we only compare the execution results of Qsel and Qdel. Similarly, if Qdel raises DELETE-specific errors, we will not compare Qdel’s execution results with those of Qsel and Qup. In these two cases, rowup or rowdel should be empty. In the following discussion, we assume that Qup and Qdel do not raise UPDATE and DELETE-specific errors, respectively.

As discussed in Section II-C, different DBMSs adopt different query execution strategies. Thus, we first discuss how to compare a query triple’s execution results in MySQL, MariaDB and TiDB, and then discuss how to compare a query triple’s execution results in CockroachDB and SQLite.

1) MySQL, MariaDB and TiDB: We apply the following rules to compare the execution results of Qsel, Qup and Qdel in a query triple. If any rule is violated, DQE reports a bug.
• If Qsel raises an error, Qup and Qdel should raise the same error. In this case, rowsel, rowup and rowdel should be empty.
• Under strict mode, if Qsel raises a warning, Qup and Qdel should raise an error. The warning and error should have the same error codes and messages, except for their error levels. In this case, rowup and rowdel should be empty.
• Under non-strict mode, if Qsel raises a warning, Qup and Qdel should raise the same warning. In this case, rowsel, rowup and rowdel should be the same.
• If Qsel does not raise a warning or an error, Qup and Qdel should not raise a warning or an error. In this case, rowsel, rowup and rowdel should be the same.

2) CockroachDB and SQLite: We apply the following rules to compare the execution results of Qsel, Qup and Qdel in a query triple. If any rule is violated, DQE reports a bug.
• If Qsel raises an error, Qup and Qdel should raise the same error. In this case, rowup and rowdel should be empty. However, rowsel may not be empty, as discussed in Section II-C.

TABLE III
BUGS REPORTED BY DQE

             |                  Bug Status                     |   Triggering Query
DBMS         | Submitted  Confirmed  Fixed  Duplicate  Not a bug | SELECT  UPDATE  DELETE
MySQL        |     7          1        1        0          6     |    0       1       1
MariaDB      |     4          2        0        0          0     |    0       0       2
TiDB         |    37         37       10        0          0     |   20      17      17
CockroachDB  |     1          0        0        1          0     |    0       0       0
SQLite       |     1          1        0        0          0     |    1       0       0
Total        |    50         41       11        1          6     |   21      18      20

• If Qsel does not raise a warning or an error, Qup and Qdel should not raise a warning or an error. In this case, rowsel, rowup and rowdel should be the same.

IV. EVALUATION

We implement DQE based on SQLancer [27], which is implemented in Java. We make the following improvements to apply DQE. First, we add UPDATE and DELETE query generation in our target DBMSs, e.g., MySQL, TiDB and SQLite. Second, DQE requires executing the three queries in a query triple on the same database state. In MySQL, MariaDB and TiDB, we use ROLLBACK transactions to roll back all changes made by UPDATE and DELETE queries. In CockroachDB and SQLite, we record the table content before query execution and refill the table with the same content after query execution. In total, we write about 2,600 lines of code to implement DQE on five target DBMSs.

We evaluate the effectiveness of DQE by answering the following two research questions:
• RQ1: What logic bugs can DQE detect in real-world DBMSs?
• RQ2: How many bugs detected by DQE can be found by existing approaches?

A. Experimental Methodology

Experimental setup. We evaluate DQE on five widely-used DBMSs, i.e., MySQL, MariaDB, TiDB, CockroachDB and SQLite. Detailed information about these DBMSs is presented in Section II-B. We test these DBMSs with their latest release versions when we start our experiment, i.e., MySQL 8.0.28, MariaDB 10.8.2, TiDB 5.2.0, CockroachDB 21.2.6 and SQLite 3.39.2. For TiDB, we also test versions 5.3.0 and 5.4.0 after they are released.

We perform our experiment on a CentOS machine with 8 CPU cores and 32GB RAM. We deploy our target DBMSs according to their own deployment requirements. Specifically, we deploy MySQL and MariaDB using Docker containers. We deploy TiDB in a local cluster with a TiKV instance, a TiDB instance and a PD instance. We deploy CockroachDB in a local cluster with three nodes. We embed SQLite within DQE.

Experimental process. We run DQE to find bugs in our target DBMSs. We do not set a timeout for our experiment and continuously run DQE until it finds bugs. The whole experiment takes about one month. When DQE reports a potential bug, we manually execute it through the interactive terminal of the target DBMS to check whether this bug can be reproduced. When successfully reproducing the reported bug, we manually reduce the test case to a smaller size. We apply the following three strategies to perform the test case reduction. First, we remove the unused columns and optional column constraints. Second, we remove data in tables by eliminating all INSERT commands that do not change the bug consequence. Third, we randomly remove some sub-clauses in predicates without changing the bug consequence. We then check whether this bug has been reported in the target DBMS's bug tracking system to avoid submitting duplicate bugs. After reporting a bug, we wait for feedback from developers.

Experimental focus. The developers' response time determines how much effort is spent on testing a DBMS. TiDB developers give us more responsive confirmation than other DBMS developers, which highly increases our confidence to continue our test. Therefore, we spend most of our testing time on TiDB and keep it up-to-date.

B. Overall Detection Results

To answer RQ1, we evaluate DQE on MySQL, MariaDB, TiDB, CockroachDB and SQLite. In total, DQE reports 122 bugs among them. We manually reproduce and minimize the test cases of these 122 reported bugs. If the minimized test cases of some bugs are the same, we only keep one, and consider others as duplicate bugs. Finally, we obtain 50 unique bugs, and submit them to corresponding DBMS developers. Specifically, we submit 7 bugs in MySQL, 4 bugs in MariaDB, 37 bugs in TiDB, 1 bug in CockroachDB and 1 bug in SQLite. Note that, we do not submit the remaining 72 bugs, which are considered as duplicate bugs by us and not false positives.

Table III shows the bug status of our submitted bugs (column 2-6). Among the 50 submitted bugs, 41 bugs have been confirmed as new bugs, in which 11 bugs have been fixed. Among the 41 confirmed bugs, MySQL developers confirm 1 bug, MariaDB developers confirm 2 bugs, TiDB developers confirm 37 bugs and SQLite developers confirm 1 bug. Among the 11 fixed bugs, MySQL developers fix 1 bug and TiDB developers fix 10 bugs. For the 9 bugs that have not been confirmed, 1 bug in CockroachDB is considered a duplicate of an existing bug, 6 bugs are considered as intended behaviors by MySQL developers, and the remaining 2 bugs have not been decided by MariaDB developers yet.
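The core step described in Sections III-D and IV, executing the three queries of a triple on the same database state and collecting the rows each one accesses, can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module, not the Java implementation of DQE. The helper names (make_db, ids, run_triple) and the column names row_id and updated are assumptions made for this sketch; they stand in for the two columns the paper appends to each table. For brevity the sketch undoes changes with ROLLBACK, which the paper uses for MySQL, MariaDB and TiDB, whereas for SQLite the paper records and refills the table content.

```python
import sqlite3

def make_db():
    # isolation_level=None selects autocommit mode, so the explicit
    # BEGIN / ROLLBACK statements below control the transactions.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    # row_id and updated are hypothetical names for the two appended columns.
    conn.execute("CREATE TABLE t1 (c1 INT, row_id TEXT, updated INT DEFAULT 0)")
    conn.executemany("INSERT INTO t1 (c1, row_id) VALUES (?, ?)",
                     [(1, "r1"), (2, "r2"), (3, "r3")])
    return conn

def ids(conn, sql):
    # Collect the first column of every result row into a set.
    return {row[0] for row in conn.execute(sql)}

def run_triple(conn, predicate):
    # SELECT: the accessed rows are the fetched row ids.
    row_sel = ids(conn, f"SELECT row_id FROM t1 WHERE {predicate}")

    # UPDATE: flag the touched rows, read the flags, then undo the change.
    conn.execute("BEGIN")
    conn.execute(f"UPDATE t1 SET c1 = c1 + 1, updated = 1 WHERE {predicate}")
    row_up = ids(conn, "SELECT row_id FROM t1 WHERE updated = 1")
    conn.execute("ROLLBACK")

    # DELETE: diff the row ids before and after, then undo the change.
    before = ids(conn, "SELECT row_id FROM t1")
    conn.execute("BEGIN")
    conn.execute(f"DELETE FROM t1 WHERE {predicate}")
    row_del = before - ids(conn, "SELECT row_id FROM t1")
    conn.execute("ROLLBACK")
    return row_sel, row_up, row_del

conn = make_db()
row_sel, row_up, row_del = run_triple(conn, "c1 >= 2")
# With no warning or error raised, the three sets are expected to be equal;
# a mismatch would be reported as a potential logic bug.
consistent = row_sel == row_up == row_del
```

Because every modification is rolled back, the same connection can be reused for the next predicate without regenerating the database.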

TABLE IV
BUG CONSEQUENCES IN SELECT, UPDATE AND DELETE QUERIES

Consequence                 SELECT  UPDATE  DELETE
Incorrect database state       0      18      18
Duplicate warning              5       0       0
Unexpected warning             6       0       2
Unexpected error               1       0       0
Incorrect warning message      6       0       0
Others                         3       0       0
Total                         21      18      20

TABLE V
COVERAGE INFORMATION (LINE COVERAGE, %)

Tool    MySQL  MariaDB
PQS       19      -
NoREC      -     18
TLP       18      -
DQE       15     21

Among the 41 confirmed bugs, 22 bugs are verified as Major or Moderate. Note that, different DBMSs have different severity levels. We use Major to denote Critical and Serious in MySQL and MariaDB, and Major in TiDB. We use Moderate to denote Moderate in TiDB. Moreover, SQLite developers do not assign a severity level on the confirmed bug, so we do not count it as Major or Moderate.

Table III also shows the triggering queries of the 41 confirmed bugs (column 7-9). 21 bugs are triggered by SELECT queries, 18 bugs are triggered by UPDATE queries and 20 bugs are triggered by DELETE queries. Note that, one bug can be triggered by more than one type of queries, so the total number of triggering queries is more than the total number of confirmed bugs.

Table IV shows the 41 confirmed bugs' bug consequences with the number of their triggering queries. Most SELECT queries cause incorrect warnings, e.g., duplicate warnings, unexpected warnings and incorrect warning messages. The remaining 3 SELECT queries cause the SHOW WARNINGS command to fail to return errors. All UPDATE queries and most DELETE queries cause incorrect database states. The remaining 2 DELETE queries cause unexpected warnings.

Among the 36 queries that lead to incorrect database states, 34 queries occur in TiDB and 2 queries occur in MySQL. All 5 queries that lead to duplicate warnings occur in TiDB. Among the 8 queries that lead to unexpected warnings, 6 queries occur in TiDB and 2 queries occur in MariaDB. One query that leads to unexpected errors occurs in SQLite. All 6 queries that lead to incorrect warning messages occur in TiDB. The remaining 3 queries that lead to the execution failures of the SHOW WARNINGS command occur in TiDB.

C. Comparing with Existing Approaches

To answer RQ2, we perform a qualitative comparison with existing approaches (i.e., PQS [9], NoREC [10] and TLP [11]) that aim to detect logic bugs in DBMSs. These three approaches construct oracles to detect logic bugs in single SELECT queries. Thus, they cannot detect the 20 logic bugs in UPDATE and DELETE queries. Moreover, these three approaches do not consider the normal errors that can be unexpectedly raised by SELECT queries as logic bugs, e.g., the warnings in Listing 3. Unlike crashes caught by these approaches, these normal errors do not crash the DBMS and need a test oracle to validate their correctness. Thus, they cannot detect 18 logic bugs related to this kind of errors in SELECT queries. We further analyze the triggering test cases and bug consequences of the remaining 3 logic bugs in SELECT queries. We find that none of these 3 bugs can be triggered or captured by the oracles in these approaches. Therefore, in theory, none of our reported bugs can be detected by these approaches.

Other DBMS testing approaches, e.g., SQLsmith [29], APOLLO [30], AMOEBA [31], RAGS [8] and SparkFuzz [32], cannot construct oracles to detect logic bugs, or cannot detect logic bugs in a single DBMS because differential testing needs multiple DBMSs. Therefore, we do not compare DQE with these approaches.

D. Other Experimental Statistics

Test efficiency. During testing, before a bug we detect is fixed by the DBMS developers, DQE will generate many test cases that trigger the same bug. In total, DQE reports 122 bugs. After filtering out duplicate bugs, we obtain 50 unique bugs. That is, only 41% (50/122) of the reported bugs are unique, and the duplicate rate is 59% (72/122). Existing works [9]–[11] also face the same problem. There is currently no practical way to automatically filter out duplicate test cases for DBMSs. For discovering these 50 unique bugs, we generate 1,776,124,512 query triples.

Query generation efficiency. We measure the query generation efficiency in DQE during testing. In this experiment, we count every generated query, including those that create the database and those in query triples. In DQE, we generate syntactically valid queries based on Abstract Syntax Trees (ASTs) of SQL. However, SQL in different DBMSs should obey many semantic constraints, which can cause DQE to generate semantically invalid queries. For example, DQE may generate an INSERT command that inserts a duplicate value into a UNIQUE column. Such semantic errors can lower our success rate of query generation. In MySQL, DQE generates 2,885 queries per second with a success rate of 88%. In MariaDB, DQE generates 3,344 queries per second with a success rate of 87%. In TiDB, DQE generates 1,566 queries per second with a success rate of 89%. In CockroachDB, DQE generates 243 queries per second with a success rate of 72%. In SQLite, DQE generates 12,313 queries per second with a success rate of 97%.

Coverage. To demonstrate the sufficiency of our testing, we compare code coverage with existing works, i.e., PQS [9], NoREC [10] and TLP [11]. We run each tool with the same experimental setting for 24 hours on MySQL and MariaDB. (We have not found a suitable way to perform code coverage measurements in TiDB, SQLite and CockroachDB.)
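The effect described under query generation efficiency, where syntactically valid queries still fail on semantic constraints, can be illustrated with a small experiment. The sketch below is an assumption-laden toy, not DQE's generator: the function name insert_success_rate, the value domain of 10 and the 100 attempts are all chosen for illustration. It issues random INSERT commands into a UNIQUE column of an in-memory SQLite table and counts how many succeed.

```python
import random
import sqlite3

def insert_success_rate(attempts=100, domain=10, seed=0):
    """Issue random INSERTs into a UNIQUE column and count the successes."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (c1 INT UNIQUE)")
    succeeded = 0
    for _ in range(attempts):
        try:
            conn.execute("INSERT INTO t1 VALUES (?)", (rng.randrange(domain),))
            succeeded += 1
        except sqlite3.IntegrityError:
            # Duplicate value in the UNIQUE column: the query is
            # syntactically valid but semantically invalid.
            pass
    return succeeded, attempts

succeeded, attempts = insert_success_rate()
rate = succeeded / attempts
```

Since only 10 distinct values exist, at most 10 of the 100 INSERT commands can succeed, which shows how a constraint-unaware generator loses throughput to rejected queries.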

Table V shows our experiment results. PQS achieves 19% line coverage in MySQL. NoREC achieves 18% line coverage in MariaDB. TLP achieves 18% line coverage in MySQL. DQE achieves 15% line coverage in MySQL and 21% line coverage in MariaDB. We can see that DQE obtains coverage similar to that of other works. This is reasonable, since DQE, PQS, NoREC and TLP are all built on SQLancer and share similar query generation. Note that, in SQLancer, NoREC does not support testing MySQL, and PQS and TLP do not support testing MariaDB. Thus, we do not measure their code coverage on those DBMSs.

Although DQE can generate thousands of queries per second in MySQL and MariaDB, the coverage is low. This is expected, because DQE only focuses on query processing in DBMSs. DBMSs also provide many features that we do not test, e.g., user management, configuration, and fault tolerance.

Parameter selection. We use some default parameters, e.g., maxTable=5, maxCol=10, and maxDept=3, to generate databases and queries. These parameters may affect our bug detection effectiveness. However, the impact of these parameters could be low. The reasons are as follows. (1) Logic bugs in DBMSs usually obey the small scope hypothesis [33]. That is, a high proportion of logic bugs can be found by test inputs within some small scopes, e.g., a small number of tables and rows. (2) After minimizing our 50 submitted bugs, we find that all 50 bugs can be detected on a single table, and 47 bugs can be detected with one row. maxTable, maxCol and maxDept for these 50 submitted bugs are 1, 2 and 3, respectively.

E. Selected Bugs

In this section, we present some interesting bugs detected by DQE according to their bug consequences. Table IV shows the overall statistics of bug consequences. In the following discussion, we illustrate each bug consequence using a representative bug.

CREATE TABLE t1 (c1 INT);
INSERT INTO t1 VALUES (1); -- r1
1. SELECT * FROM t1 WHERE 0 ^ '0.5';
   -- Fetch empty result
   -- Warning|1292|Truncated incorrect INTEGER value: '0.5'
2. UPDATE t1 SET c1 = 2 WHERE 0 ^ '0.5';
   -- Update r1
3. DELETE FROM t1 WHERE 0 ^ '0.5';
   -- Delete r1
Listing 2. TiDB#31708. The UPDATE and DELETE queries unexpectedly change the database state.

Incorrect database state. Listing 2 shows a bug TiDB#31708 (https://github.com/pingcap/tidb/issues/31708), in which the UPDATE and DELETE queries unexpectedly change the database states. Table t1 consists of an INT row with value 1. The predicate ϕ is 0 ^ '0.5'. For the SELECT query, TiDB first converts '0.5' into an INT value 0, then calculates 0 ^ 0, which is equal to 0 (FALSE), and finally returns an empty query result. For the UPDATE and DELETE queries, TiDB unexpectedly evaluates the predicate to TRUE and changes the database state without raising any errors.

CREATE TABLE t1 (c1 FLOAT);
INSERT INTO t1 VALUES (0); -- r1
1. SELECT c1 FROM t1 WHERE c1 = 'a';
   -- Fetch r1
   -- Warning|1292|Truncated incorrect DOUBLE value: 'a'
   -- Warning|1292|Truncated incorrect DOUBLE value: 'a'
   -- Warning|1292|Truncated incorrect DOUBLE value: 'a'
2. UPDATE t1 SET c1 = 1 WHERE c1 = 'a';
   -- Update no row
   -- Error|1292|Truncated incorrect INTEGER value: 'a'
3. DELETE FROM t1 WHERE c1 = 'a';
   -- Delete no row
   -- Error|1292|Truncated incorrect INTEGER value: 'a'
Listing 3. TiDB#31711. The SELECT query raises duplicate warnings.

Duplicate warning. Listing 3 shows a bug TiDB#31711 (https://github.com/pingcap/tidb/issues/31711), in which the SELECT query raises three identical warnings on row r1. Table t1 consists of a FLOAT row with value 0. The predicate ϕ is c1 = 'a'. TiDB evaluates it by checking whether column c1's value is equal to constant 'a'. Because there is only one row r1 in table t1, TiDB should evaluate the predicate ϕ only once. Note that column c1 has a different data type from constant 'a', which is a string. Therefore, TiDB requires a type conversion, i.e., converting 'a' to a DOUBLE value 0, when evaluating the predicate ϕ. For the SELECT query, TiDB returns three warnings to indicate such conversions. These warnings will confuse users, because the three identical warnings are raised on the same row r1. Note that the warning message raised by the UPDATE and DELETE queries is also incorrect, because TiDB should convert the constant 'a' to a DOUBLE value according to its reference manual [34], instead of INT type.

CREATE TABLE t1 (c1 BLOB);
INSERT INTO t1 VALUES ('a'); -- r1
1. SELECT * FROM t1 WHERE c1;
   -- Fetch empty result
   -- Warning|1292|Truncated incorrect DOUBLE value: 'a'
2. UPDATE t1 SET c1 = 'b' WHERE c1;
   -- Update no row
   -- Error|1292|Truncated incorrect DOUBLE value: 'a'
3. DELETE FROM t1 WHERE c1;
   -- Delete no row
   -- Warning|1292|Truncated incorrect DOUBLE value: 'a'
Listing 4. MariaDB#28140. The DELETE query raises an unexpected warning.

Unexpected warning. Listing 4 shows a bug MariaDB#28140 (https://jira.mariadb.org/browse/MDEV-28140) in the strict mode, in which the DELETE query raises a warning instead of an error. In this bug, because the SELECT query raises a warning, the same warning should be treated as an error in the DELETE query. However, the DELETE query raises a warning. In this bug, because the predicate ϕ is c1, whose value is 'a', the DELETE query

evaluates ϕ to FALSE. Therefore, the DELETE query does not change the database state. However, such an unexpected warning can lead to a changed database state in some cases. For example, if predicate ϕ is NOT c1, the DELETE query will change the database state to an empty table unexpectedly.

CREATE TABLE t1 (c1 TEXT);
INSERT INTO t1 VALUES ('a'); -- r1
1. SELECT c1 FROM t1 WHERE (NULL == c1) AND json_object(c1, c1);
   -- Fetch no row
   -- Runtime error: json_object() labels must be TEXT
2. UPDATE t1 SET c1 = 'b' WHERE (NULL == c1) AND json_object(c1, c1);
   -- Update no row
3. DELETE FROM t1 WHERE (NULL == c1) AND json_object(c1, c1);
   -- Delete no row
Listing 5. SQLite#12638. The SELECT query raises an unexpected error.

Unexpected error. Listing 5 shows a bug SQLite#12638 (https://sqlite.org/forum/forumpost/12638a0aea0602a8), in which the SELECT query should not raise an error. Table t1 consists of a TEXT row with value 'a'. The predicate ϕ is (NULL == c1) AND json_object(c1, c1). The json_object function accepts a pair of arguments, e.g., (label1, value1), and requires the data type of label1 to be TEXT. For the SELECT query, SQLite returns an error, which states that the json_object function takes labels that must be TEXT type. However, the label in the json_object function is indeed a TEXT column c1. We report this bug to SQLite developers, who explain that the constant propagation optimization causes this problem. SQLite suffers from "premature evaluation" of the json_object function in this bug. Specifically, SQLite transforms predicate ϕ into (NULL == c1) AND json_object(NULL, NULL), and calculates json_object(NULL, NULL) in predicate ϕ without checking (NULL == c1).

CREATE TABLE t1 (c1 TEXT);
INSERT INTO t1 VALUES ('a'); -- r1
1. SELECT c1 FROM t1 WHERE 1 << c1;
   -- Fetch r1
   -- Warning|1292|evaluation failed: Truncated incorrect INTEGER value: 'a'
2. UPDATE t1 SET c1 = 'b' WHERE 1 << c1;
   -- Update no row
   -- Error|1292|Truncated incorrect INTEGER value: 'a'
3. DELETE FROM t1 WHERE 1 << c1;
   -- Delete no row
   -- Error|1292|Truncated incorrect INTEGER value: 'a'
Listing 6. TiDB#31391. The SELECT query raises an incorrect warning message.

Incorrect warning message. Listing 6 shows a bug TiDB#31391 (https://github.com/pingcap/tidb/issues/31391), in which the SELECT query raises a warning with an incorrect warning message. According to the error reference manual (https://dev.mysql.com/doc/mysql-errors/5.7/en/server-error-reference.html), the warning message format for the code 1292 is "Truncated incorrect %s value: '%s'". "evaluation failed: " should not appear in the SELECT query's warning message. This bug illustrates that cross-validating the evaluation of a predicate among SELECT, UPDATE and DELETE queries helps equip DQE with the ability to check the correctness of warnings.

CREATE TABLE t1 (c1 INT);
INSERT INTO t1 VALUES (1); -- r1
1. SELECT * FROM t1 WHERE POW(0, -1);
   -- Fetch empty result
   --
2. UPDATE t1 SET c1=2 WHERE POW(0, -1);
   -- Update no row
   -- Error|1690|DOUBLE value is out of range in 'pow(0,-(1))'
3. DELETE FROM t1 WHERE POW(0, -1);
   -- Delete no row
   -- Error|1690|DOUBLE value is out of range in 'pow(0,-(1))'
Listing 7. TiDB#33292. The SELECT query triggers incorrect functionality in the SHOW WARNINGS command.

Others. Listing 7 shows a bug TiDB#33292 (https://github.com/pingcap/tidb/issues/33292), in which the diagnostic command SHOW WARNINGS fails to return the error raised by the SELECT query. The SHOW WARNINGS command is a diagnostic command that returns warnings or errors resulting from the current query execution [28]. Because we use the SHOW WARNINGS command to obtain a query's raised warning in TiDB, DQE reports this bug due to a missing warning in the SELECT query. DQE finds another two bugs that also happen in the SHOW WARNINGS command.

F. Not A Bug

In this section, we list a representative bug that is classified as not a bug in MySQL.

CREATE TABLE t1 (c1 FLOAT);
INSERT INTO t1 VALUES (1); -- r1
CREATE UNIQUE INDEX i1 ON t1 (c1 DESC);
1. SELECT * FROM t1 WHERE ('a'|1) BETWEEN 0 AND c1;
   -- Fetch r1
   -- Warning|1292|Truncated incorrect INTEGER value: 'a'
   -- Warning|1292|Truncated incorrect INTEGER value: 'a'
   -- Warning|1292|Truncated incorrect INTEGER value: 'a'
2. UPDATE t1 SET c1='b' WHERE ('a'|1) BETWEEN 0 AND c1;
   -- Update no row
   -- Error|1292|Truncated incorrect INTEGER value: 'a'
   -- Warning|1292|Truncated incorrect INTEGER value: 'a'
3. DELETE FROM t1 WHERE ('a'|1) BETWEEN 0 AND c1;
   -- Delete no row
   -- Error|1292|Truncated incorrect INTEGER value: 'a'
   -- Warning|1292|Truncated incorrect INTEGER value: 'a'
Listing 8. MySQL#106407. The SELECT query raises duplicate warnings.
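The comparison rules that DQE applies to MySQL, MariaDB and TiDB (Section III-E) can be sketched as a small checker over recorded query outcomes. The Outcome record, the is_bug function and the choice to compare diagnostics as ordered tuples of (code, message) pairs are illustrative assumptions of this sketch, not the DQE implementation. The sample triple is transcribed from Listing 4, and the clean triple is a made-up case in which no query raises a warning or an error.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

@dataclass(frozen=True)
class Outcome:
    level: Optional[str]                      # None, "Warning" or "Error"
    diags: Tuple[Tuple[int, str], ...] = ()   # (code, message) pairs, in order
    rows: FrozenSet[str] = frozenset()        # ids of the accessed rows

def is_bug(sel, up, dele, strict=True):
    """Return True if a query triple violates the Section III-E rules."""
    mods = (up, dele)
    if sel.level == "Error":
        # Same error everywhere, and no query accesses any row.
        ok = all(m.level == "Error" and m.diags == sel.diags for m in mods)
        ok = ok and not (sel.rows or up.rows or dele.rows)
    elif sel.level == "Warning" and strict:
        # The warning must surface as an error with the same code and message.
        ok = all(m.level == "Error" and m.diags == sel.diags and not m.rows
                 for m in mods)
    elif sel.level == "Warning":
        # Non-strict mode: same warning and same accessed rows.
        ok = all(m.level == "Warning" and m.diags == sel.diags
                 and m.rows == sel.rows for m in mods)
    else:
        # No diagnostics anywhere, and the same accessed rows.
        ok = all(m.level is None and m.rows == sel.rows for m in mods)
    return not ok

msg = ((1292, "Truncated incorrect DOUBLE value: 'a'"),)
# Listing 4 (MariaDB, strict mode): DELETE raises a warning, not an error.
listing4 = (Outcome("Warning", msg), Outcome("Error", msg), Outcome("Warning", msg))
# Hypothetical clean triple: all three queries access row r1 silently.
r1 = frozenset({"r1"})
clean = (Outcome(None, (), r1), Outcome(None, (), r1), Outcome(None, (), r1))

listing4_is_bug = is_bug(*listing4)   # the strict-mode rule is violated
clean_is_bug = is_bug(*clean)         # all rules hold
```

Comparing the whole diagnostic tuple is a deliberately strict choice: it also flags triples in which one query repeats a diagnostic that another raises only once.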

Listing 8 shows a bug MySQL#106407 (https://bugs.mysql.com/bug.php?id=106407), in which the same warning or error appears multiple times on row r1 of an indexed column. Table t1 consists of a FLOAT row with value 1. There is an index built on column c1. The predicate ϕ is ('a' | 1) BETWEEN 0 AND c1. For the SELECT query, MySQL converts 'a' to an INT value 0, then performs a bitwise OR operation with 1, whose result is 1, and finally checks whether the result is in the range between 0 and the value of column c1. Because the value of column c1 is 1, predicate ϕ is evaluated to TRUE. Therefore, the SELECT query returns row r1, but raises three identical warnings. The UPDATE and DELETE queries raise a warning and an error (with the same warning code and message). We think there should be no duplicate warnings or errors because there is only one row. We report it to MySQL developers who confirm this issue, but explain that "duplicate warnings, which are all identical, do not constitute a bug".

V. DISCUSSION

Limitations. DQE has some limitations on detecting logic bugs. First, DQE faces the same problem as differential testing. DQE cannot detect a logic bug that occurs in all the three SELECT, UPDATE and DELETE queries. Second, DQE only supports common operations and functions supported by SELECT, UPDATE and DELETE queries. For each query's specific operations and functions, DQE cannot compare its execution results with other queries' execution results. For example, DQE cannot support aggregate-based functions, window functions and GROUP BY that are only used in SELECT queries. Third, DQE cannot support non-deterministic functions, e.g., the RAND function that returns a random value in a query.

Extend to other DBMSs. The core idea of DQE is simple but rather applicable to other DBMSs, because most DBMSs support data manipulation specified by predicates. We expect that the key insight of DQE could be used in many database systems in the DB-Engines ranking [17]. We list some of them as follows. (1) Graph database systems, e.g., Neo4j [35], Microsoft Azure Cosmos DB [36], and TigerGraph [37]. (2) Key-value stores, e.g., Redis [38], Amazon DynamoDB [39], and Hazelcast [40]. (3) Document stores, e.g., MongoDB [41], CouchDB [42], and Google Cloud Datastore [43].

VI. RELATED WORK

Differential testing of DBMSs. Differential testing [44] is effective to test DBMSs without facing the test oracle problem. The core idea behind differential testing is to feed the same input to many functionally identical systems and compare their outputs to detect bugs. There are many existing works applying differential testing on DBMSs [8], [30]–[32], [45]–[49]. RAGS [8] executes the same SELECT query on different DBMSs and observes discrepancies in their query results. DT2 [46] feeds a group of transactions into multiple DBMSs to detect transaction bugs. Grand [45] and RD2 [49] apply differential testing on graph database systems. APOLLO [30] feeds the same SELECT query into two different versions of the same DBMS to detect performance bugs. SparkFuzz [32] validates a query result with a reference DBMS (e.g., PostgreSQL) or with a different Spark version. We develop a novel differential testing approach, which executes SELECT, UPDATE and DELETE queries with the same predicate in a DBMS to detect logic bugs.

Database and SQL query generation. One key component of automatic testing is an automatic input generator. Database and SQL query generation have been widely explored by existing works [21]–[26], [29], [50]–[57]. SQLsmith [29] is an open source random SQL query generator, which is inspired by Csmith [58]. Go-randgen [54] can generate various SQL queries based on input SQL grammar. SQLRight [55] is a mutation-based SQL query generator, in which an intermediate representation is designed to perform mutations guided by coverage feedback. A better database and SQL query generation might improve the efficiency of our work in detecting logic bugs.

Test oracles of DBMSs. Test oracles are the key to reveal DBMS bugs. ADUSA [26] uses Alloy [59], an open source language and analyzer, to analyze the expected query result of a given SELECT query. PQS [9] synthesizes a SELECT query, which is computed to fetch a randomly-selected pivot row, and checks whether the pivot row is contained in its query result. NoREC [10] rewrites a SELECT query as an equivalent one that the DBMS cannot optimize, and compares their results. TLP [11] leverages the ternary property of predicate evaluation, where the evaluation result is one of TRUE, FALSE and NULL, to partition a SELECT query into three partitioning queries, whose combined query results are equal to the original query's query result. Troc [60] proposes how to build a test oracle for a pair of transactions. Our work proposes a new test oracle for DBMS testing, and is complementary to existing approaches.

VII. CONCLUSION

Logic bugs in UPDATE and DELETE queries can cause more severe consequences, e.g., incorrect database states, and have not been tackled by existing approaches. In this paper, we propose a novel and general approach DQE to effectively detect logic bugs in SELECT, UPDATE and DELETE queries. We evaluate DQE on five widely-used DBMSs, i.e., MySQL, MariaDB, TiDB, CockroachDB and SQLite. In total, we have detected 41 previously-unknown logic bugs in these DBMSs. We expect that the generality of DQE can help improve the reliability of DBMSs.

ACKNOWLEDGMENTS

This work was partially supported by National Key R&D Program of China (2021YFB1716000), National Natural Science Foundation of China (62072444), Frontier Science Project of Chinese Academy of Sciences (QYZDJ-SSW-JSC036), and Youth Innovation Promotion Association at Chinese Academy of Sciences.

REFERENCES

[1] "MySQL homepage," https://www.mysql.com, 2022.
[2] "MariaDB homepage," https://mariadb.org/, 2022.
[3] D. Huang, Q. Liu, Q. Cui, Z. Fang, X. Ma, F. Xu, L. Shen, L. Tang, Y. Zhou, M. Huang, W. Wei, C. Liu, J. Zhang, J. Li, X. Wu, L. Song, R. Sun, S. Yu, L. Zhao, N. Cameron, L. Pei, and X. Tang, "TiDB: A Raft-based HTAP database," Proceedings of the VLDB Endowment (VLDB), vol. 13, no. 12, pp. 3072–3084, 2020.
[4] R. Taft, I. Sharif, A. Matei, N. VanBenschoten, J. Lewis, T. Grieger, K. Niemi, A. Woods, A. Birzin, R. Poss, P. Bardea, A. Ranade, B. Darnell, B. Gruneir, J. Jaffray, L. Zhang, and P. Mattis, "CockroachDB: The resilient Geo-distributed SQL database," in Proceedings of ACM SIGMOD International Conference on Management of Data (SIGMOD), 2020, pp. 1493–1509.
[5] "SQLite homepage," https://www.sqlite.org/index.html, 2022.
[6] D. D. Chamberlin and R. F. Boyce, "SEQUEL: A structured english query language," in Proceedings of ACM SIGFIDET Workshop on Data Description, Access and Control, 1974, pp. 249–264.
[7] "MySQL customers by industry," https://www.mysql.com, 2022.
[8] D. R. Slutz, "Massive stochastic testing of SQL," in Proceedings of International Conference on Very Large Data Bases (VLDB), 1998, pp. 618–622.
[9] M. Rigger and Z. Su, "Testing database engines via pivoted query synthesis," in Proceedings of USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2020, pp. 667–682.
[10] M. Rigger and Z. Su, "Detecting optimization bugs in database engines via non-optimizing reference engine construction," in Proceedings of ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), 2020, pp. 1140–1152.
[11] M. Rigger and Z. Su, "Finding bugs in database systems via query partitioning," in Proceedings of ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA), vol. 4, 2020.
[12] E. F. Codd, "A relational model of data for large shared data banks," Communications of the ACM, vol. 13, no. 6, pp. 377–387, 1970.
[13] B. Ding, S. Das, W. Wu, S. Chaudhuri, and V. Narasayya, "Plan Stitch: Harnessing the best of many plans," Proceedings of the VLDB Endowment (VLDB), vol. 11, no. 10, pp. 1123–1136, 2018.
[14] T. Neumann and B. Radke, "Adaptive optimization of very large join queries," in Proceedings of International Conference on Management of Data (SIGMOD), 2018, pp. 677–692.
[15] C. Wu, A. Jindal, S. Amizadeh, H. Patel, W. Le, S. Qiao, and S. Rao, "Towards a learning optimizer for shared clouds," Proceedings of the VLDB Endowment (VLDB), vol. 12, no. 3, pp. 210–222, 2018.
[16] R. Marcus, P. Negi, H. Mao, C. Zhang, M. Alizadeh, T. Kraska, O. Papaemmanouil, and N. Tatbul, "Neo: A learned query optimizer," Proceedings of the VLDB Endowment (VLDB), vol. 12, no. 11, pp. 1705–1718, 2019.
[17] "DB-Engines ranking," https://db-engines.com/en/ranking, 2023.
[18] "Most widely deployed and used database engine," https://www.sqlite.org/mostdeployed.html, 2022.
[19] "Database topic in GitHub," https://github.com/topics/database, 2023.
[20] "Unexpected delete when data truncation," https://jira.mariadb.org/browse/MDEV-27885, 2022.
[26] S. Abdul Khalek, B. Elkarablieh, Y. O. Laleye, and S. Khurshid, "Query-aware test generation using a relational constraint solver," in Proceedings of IEEE/ACM International Conference on Automated Software Engineering (ASE), 2008, pp. 238–247.
[27] "SQLancer homepage," https://github.com/sqlancer/sqlancer, 2022.
[28] "SHOW WARNINGS statement," https://dev.mysql.com/doc/refman/8.0/en/show-warnings.html, 2022.
[29] "SQLsmith," https://github.com/anse1/sqlsmith, 2015.
[30] J. Jung, H. Hu, J. Arulraj, T. Kim, and W. Kang, "APOLLO: Automatic detection and diagnosis of performance regressions in database systems," Proceedings of the VLDB Endowment (VLDB), vol. 13, no. 1, pp. 57–70, 2019.
[31] X. Liu, Q. Zhou, J. Arulraj, and A. Orso, "Automatic detection of performance bugs in database systems using equivalent queries," in Proceedings of International Conference on Software Engineering (ICSE), 2022, pp. 225–236.
[32] B. Ghit, N. Poggi, J. Rosen, R. Xin, and P. Boncz, "SparkFuzz: Searching correctness regressions in modern query engines," in Proceedings of the Workshop on Testing Database Systems (DBTest), 2020.
[33] A. Andoni, D. Daniliuc, S. Khurshid, and D. Marinov, "Evaluating the "small scope hypothesis"," in Proceedings of ACM Symposium on the Principles of Programming Languages (POPL), vol. 2, 2003.
[34] "Type conversion in expression evaluation," https://dev.mysql.com/doc/refman/5.7/en/type-conversion.html, 2022.
[35] "Neo4j homepage," https://neo4j.com/, 2022.
[36] "Azure cosmos DB," https://azure.microsoft.com/en-us/products/cosmos-db/, 2023.
[37] "TigerGraph," https://www.tigergraph.com/, 2023.
[38] "Redis homepage," https://redis.io/, 2022.
[39] "Amazon DynamoDB," https://aws.amazon.com/cn/dynamodb/, 2023.
[40] "Hazelcast," https://hazelcast.com/, 2023.
[41] "MongoDB," https://www.mongodb.com/, 2022.
[42] "Apache CouchDB," https://couchdb.apache.org/, 2023.
[43] "Datastore," https://cloud.google.com/datastore, 2023.
[44] W. M. McKeeman, "Differential testing for software," Digital Technical Journal, vol. 10, pp. 100–107, 1998.
[45] Y. Zheng, W. Dou, Y. Wang, Z. Qin, L. Tang, Y. Gao, D. Wang, W. Wang, and J. Wei, "Finding bugs in Gremlin-based graph database systems via randomized differential testing," in Proceedings of ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), 2022, pp. 302–313.
[46] Z. Cui, W. Dou, Q. Dai, J. Song, W. Wang, J. Wei, and D. Ye, "Differentially testing database transactions for fun and profit," in Proceedings of IEEE/ACM International Conference on Automated Software Engineering (ASE), 2022.
[47] J. Fu, J. Liang, Z. Wu, M. Wang, and Y. Jiang, "Griffin: Grammar-free DBMS fuzzing," in Proceedings of IEEE/ACM International Conference on Automated Software Engineering (ASE), 2023.
[48] W. Lin, Z. Hua, L. Zhang, and T. Xie, "GDiff: Automated differential performance testing for graph database systems," in Proceedings of International Conference on Software Engineering (ICSE), 2023.
[49] R. Yang, Y. Zheng, L. Tang, W. Dou, W. Wang, and J. Wei, "Randomized differential testing of RDF stores," in Proceedings of International Conference on Software Engineering (ICSE Demo), 2023.
[50] J. Ba and M. Rigger, "Testing database engines via query plan guidance," in Proceedings of International Conference on Software Engineering
[21] A. Neufeld, G. Moerkotte, and P. C. Lockemann, “Generating consistent (ICSE), 2023.
test data: Restricting the search space by a generator formula,” Proceed- [51] Z. Jiang, J. Bai, and Z. Su, “DynSQL: Stateful fuzzing for database
ings of the VLDB Endowment (VLDB), vol. 2, no. 2, pp. 173–214, 1993. management systems with complex and valid SQL query generation,” in
[22] J. Gray, P. Sundaresan, S. Englert, K. Baclawski, and P. J. Weinberger, Proceedings of USENIX Security Symposium (USENIX Security), 2023.
“Quickly generating billion-record synthetic databases,” in Proceedings [52] Z. Hua, W. Lin, L. Ren, Z. Li, L. Zhang, W. Jiao, and T. Xie, “GDsmith:
of ACM SIGMOD International Conference on Management of Data Detecting bugs in Cypher graph database engines,” 2023.
(SIGMOD), 1994, pp. 243–252. [53] M. Kamm, M. Rigger, C. Zhang, and Z. Su, “Testing graph database
[23] N. Bruno and S. Chaudhuri, “Flexible database generators,” in Proceed- engines via query partitioning,” in Proceedings of ACM SIGSOFT
ings of International Conference on Very Large Data Bases (VLDB), International Symposium on Software Testing and Analysis (ISSTA),
2005, pp. 1097–1107. 2023.
[24] K. Houkjær, K. Torp, and R. Wind, “Simple and realistic data genera- [54] “go-randgen,” https://github.com/pingcap/go-randgen, 2020.
tion,” in Proceedings of International Conference on Very Large Data [55] Y. Liang, S. Liu, and H. Hu, “Detecting logical bugs of DBMS
Bases (VLDB), 2006, pp. 1243–1246. with Coverage-based guidance,” in Proceedings of USENIX Security
[25] C. Binnig, D. Kossmann, E. Lo, and M. T. Özsu, “QAGen: Generating Symposium (USENIX Security), 2022, pp. 4309–4326.
query-aware test databases,” in Proceedings of ACM SIGMOD Interna- [56] R. Zhong, Y. Chen, H. Hu, H. Zhang, W. Lee, and D. Wu, “SQUIR-
tional Conference on Management of Data (SIGMOD), 2007, pp. 341– REL: Testing database management systems with language validity and
352. coverage feedback,” in Proceedings of ACM SIGSAC Conference on
Computer and Communications Security (CCS), 2020, pp. 58–71.

[57] M. Wang, Z. Wu, X. Xu, J. Liang, C. Zhou, H. Zhang, and Y. Jiang, “Industry practice of Coverage-guided enterprise-level DBMS fuzzing,” in Proceedings of IEEE/ACM International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), 2021, pp. 328–337.
[58] X. Yang, Y. Chen, E. Eide, and J. Regehr, “Finding and understanding bugs in C compilers,” in Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2011, pp. 283–294.
[59] “Alloy,” https://alloytools.org/, 2022.
[60] W. Dou, Z. Cui, Q. Dai, J. Song, D. Wang, Y. Gao, W. Wang, J. Wei, L. Chen, H. Wang, H. Zhong, and T. Huang, “Detecting isolation bugs via transaction oracle construction,” in Proceedings of International Conference on Software Engineering (ICSE), 2023.
