0% found this document useful (0 votes)
6 views16 pages

DBMS

This document presents a systematic literature review on the performance comparisons of database management systems (DBMS). It highlights that existing performance tests often do not reflect real-world scenarios and lack sufficient detail for replication. The study aims to synthesize findings from various studies, provide recommendations for industry and research, and emphasize the importance of considering factors beyond just performance in DBMS selection.

Uploaded by

sanskargade102
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
6 views16 pages

DBMS

This document presents a systematic literature review on the performance comparisons of database management systems (DBMS). It highlights that existing performance tests often do not reflect real-world scenarios and lack sufficient detail for replication. The study aims to synthesize findings from various studies, provide recommendations for industry and research, and emphasize the importance of considering factors beyond just performance in DBMS selection.

Uploaded by

sanskargade102
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 16

The Journal of Systems and Software 208 (2024) 111872

Contents lists available at ScienceDirect

The Journal of Systems & Software


journal homepage: www.elsevier.com/locate/jss

Database management system performance comparisons: A systematic


literature review✩
Toni Taipalus
Faculty of Information Technology, University of Jyväskylä, P.O. Box 35, FI-40014, Finland

ARTICLE INFO ABSTRACT

Keywords: Efficiency has been a pivotal aspect of the software industry since its inception, as a system that serves the
Database end-user fast, and the service provider cost-efficiently benefits all parties. A database management system
Performance (DBMS) is an integral part of effectively all software systems, and therefore it is logical that different studies
Comparison
have compared the performance of different DBMSs in hopes of finding the most efficient one. This study
Database management system
systematically synthesizes the results and approaches of studies that compare DBMS performance and provides
Relational database
NoSQL
recommendations for industry and research. The results show that performance is usually tested in a way that
NewSQL does not reflect real-world use cases, and that tests are typically reported in insufficient detail for replication
or for drawing conclusions from the stated results.

1. Introduction study, performance is typically tested in very specific contexts which


are not necessarily generalizable, and there are other aspects besides
Efficiency is important in effectively all software systems, whether performance to consider.
efficiency is measured by response times, how many concurrent users This study was inspired by a study by Raasveldt et al. (2018),
the system can serve, or how energy-efficient the system is (Toffola which claimed that ‘‘[...] we will explore the common pitfalls in database
et al., 2018). Despite its importance, many software systems suffer from performance benchmarking that are present in a large number of scientific
efficiency problems (Jin et al., 2012), as optimization has been largely works [...]’’ while consciously refraining from citing example studies.
recognized as a complex task (Toffola et al., 2018; Difallah et al., 2013). While we agree with their claim based on our personal experiences,
The more a system holds and handles data, the more the system’s we wanted to systematically explore whether this phenomenon is com-
performance depends on the database, and the database is often one mon among performance comparisons, and whether such studies show
of the first suspects when a performance issue is detected. The domain performance gains of one DBMS over another in a setting that can be
of database management systems (DBMS) saw rapid advancements in replicated. This study is not an attempt to criticize studies comparing
performance especially in the 1980s and 1990s, as benchmarking com-
DBMS performance, as no scientific study (ours included) is without
petitions between DBMS and hardware vendors led to innovations in
threats to validity. Rather, based on the survey of the literature, the
DBMS technology that significantly improved DBMS performance (De-
primary goals of our study are to propagate information on (i) how
Witt and Levine, 2008). Performance improvements are related to
DBMS performance has been tested, (ii) how performance has been
DBMS aspects such as different supporting data structures (Valduriez,
recommended to be tested, (iii) how the performance comparison re-
1987), and algorithms for sorting (Estivill-Castro and Wood, 1992;
sults should be interpreted, (iv) what other aspects besides performance
Do et al., 2022) and joining (Schneider and DeWitt, 1989; Patel and
DeWitt, 1996). Given that DBMSs are annually a multi-billion dollar should be considered, and (v) what other avenues might be fruitful for
industry, the performance of a DBMS is one of the most crucial aspects DBMS performance testing. Additionally, we provide (vi) a relatively
when a company chooses a DBMS for their product or service (Dietrich accessible background on database system performance, followed by
et al., 1992). As different DBMS performance comparison studies and (vii) a systematic review of literature on DBMS performance compar-
DBMS vendor white-papers highlight the performance gains of one isons, (viii) describing which DBMSs and which types of DBMSs have
DBMS over another, it may seem tempting to either consider choosing been compared with each other, (ix) the outcomes of the performance
the fastest DBMS for a business domain or to migrate from one DBMS to comparisons, and (x) by which benchmarks the DBMSs have been
another for performance gains. However, as we show and argue in this compared.

✩ Editor: Dr. Jacopo Soldani.


E-mail address: [email protected].

https://doi.org/10.1016/j.jss.2023.111872
Received 10 March 2023; Accepted 4 October 2023
Available online 27 October 2023
0164-1212/© 2023 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
T. Taipalus The Journal of Systems & Software 208 (2024) 111872

The rest of this study is structured as follows. In Sections 2 and There are several popular logical data models, some of which are
3, we provide theoretical background for understanding the results inseparable from their underlying physical data models. One of the
and discussion provided by this study. These background sections are most prominent logical data models is the relational data model rooted
deliberately presented by refraining from using unnecessary informa- in set theory (Codd, 1970). Relational DBMSs (RDBMS) follow many of
tion technology-related terms, acronyms, algorithms, or mathematics, the concepts introduced in the relational model. Many of the popular
to cater to the needs of readers from various backgrounds. For readers RDBMSs such as PostgreSQL and Oracle Database have adopted data
more technically inclined or interested, we have provided further read- structures from other logical data models as well (Lu and Holubová,
ing at the end of Sections 2 and 3. Section 4 details how we searched, 2019). What is common for effectively all modern RDBMSs is that
selected, and categorized the DBMS performance comparison studies, they utilize Structured Query Language (SQL) (ISO/IEC, 2016a,b) to
and Section 5 presents a high-level overview of the results, which is define data structures and to retrieve and manipulate data. Typically,
complemented by the Appendix detailing the performance comparison RDBMSs also implement a strong data consistency model which dictates
outcomes. In Section 6, we discuss what these findings mean, how or allows that database operations grouped into a transaction must
they are applicable in industry, and present our recommendations for all succeed or all fail, data must follow defined business logic, suc-
industry and research based on the findings. Section 7 concludes the cessful transactions persist in storage, and concurrent transactions (cf.
study. Bernstein and Goodman, 1981) must result in the same data as if the
transactions were serial. At least the last rule can often be loosened
2. Database systems in modern implementations to various degrees. These constraints are
collectively referred to as the ACID consistency model (Haerder and
2.1. Database system overview Reuter, 1983).
NoSQL is an umbrella term for several data models, typically
A database is a collection of interrelated data, typically stored developed or popularized in the first decade of the 2000s (Grolinger
according to a data model. Typically, the database is used by one or et al., 2013). Contrary to the relational model, the data models within
several software applications via a DBMS. Collectively, the database, NoSQL typically have no formal definitions, and different NoSQL
the DBMS, and the software application are referred to as a database DBMSs implement different data models such as key–value (e.g., Redis),
system (Elmasri and Navathe, 2016, p. 7)(Connolly and Begg, 2015, p. document (e.g., MongoDB), wide-column (e.g., Cassandra) and graph
65). The separation of the database and the DBMS, especially in the (e.g., Neo4J) (Davoudian et al., 2018; Reniers et al., 2017). Further-
realm of relational databases, is typically impossible without exporting more, these DBMSs often have a distinct query language developed
the database in another format. In these situations, the database is often to cater to the particular data structures available in the DBMS’s
unusable by the DBMS, unless the database is imported back to a format implementation of a data model. While RDBMSs have favored data
understood by the DBMS. Possibly due to this inseparability, both the consistency (Chaudhry and Yousaf, 2020) by eliminating redundant
DBMS and the underlying database are often colloquially referred to
data through logical database design, and through a strong consistency
simply as database. It is worth noting, though, that the former is a piece
model, NoSQL DBMSs have generally adopted the opposite approach.
of software that does, while the other is a collection of data that is.
In several NoSQL data models such as key–value pairs and docu-
Fig. 1 shows a simplified example of a system where the compo-
ments, redundant data are stored at the cost of storage space (Hecht
nents crucial for a database system and the scope of this study are
and Jablonski, 2011). This approach enables query languages to be
emphasized. We refer to the components in the figure throughout this
simple (Dey et al., 2014), avoiding complex and potentially slow
study. Several things are worth noting in considering the figure, as
queries. Furthermore, consistency models are typically less strict than
we have traded technical precision and comprehensiveness for ease
in RDBMSs (Stonebraker, 2010), which facilitates higher performance
of presentation by depicting only a single end-user, a single software
demanded by, e.g., web applications with a large number of concurrent
application (some parts typically reside on the end-user’s device, while
users (Ramakrishnan, 2012).
others reside on a separate server), a single DBMS, single hardware
Although NoSQL DBMSs popularized several database-related ap-
components, and a single database. Furthermore, we have not illus-
proaches such as non-strict database structures, data availability over
trated other DBMS components such as access control, data structures
data consistency, and relatively effortless database replication (i.e., data
such as metadata, or outputs such as query execution plans. The figure
is copied over computing nodes) and sharding (i.e., data is divided
also adopts the view that the database resides in persistent storage
between computing nodes) (Grolinger et al., 2013), some industry
— this is not always the case. Additionally, we have depicted merely
leaders such as Google deemed a strong consistency model and an
a centralized database system in which neither the DBMS nor the
expressive query language important enough to design a DBMS which
database has been distributed across multiple nodes. These are willful
incorporates features from both RDBMSs and NoSQL DBMSs (Corbett
omissions given the scope of this study.
et al., 2013). These so-called NewSQL DBMSs use the relational model,
2.2. Data models often with extensions, SQL as their primary query language, and a
distributed database architecture (Pavlo and Aslett, 2016). In addition
Databases follow one or several data models, i.e., definitions of to these three main categories of RDBMS, NoSQL, and NewSQL data
how and what data can be stored, and sometimes, what operations models, others such as object stores (Kulshrestha and Sachdeva, 2014)
are available for data retrieval and manipulation. Data models may be and GPU-intensive (Suh et al., 2022) systems are used in specific
conceptual, logical, or physical. Conceptual models such as the Entity- contexts.
Relationship model (Chen, 1976) do not dictate how data should be
stored, but are rather used to describe the interrelations and char- 2.3. Query execution
acteristics of the data. Logical data models such as the relational
model (Codd, 1970) are related to how data is stored and presented, but The word query typically refers to query language statements that
often without describing how the data is physically stored, e.g., which retrieve some data from the database. However, in this study, we
computing node is responsible for storing the data, where the data is use the word query to refer to any data retrieval and manipulation
located on a disk, and what types of indices (i.e., redundant data struc- statement for brevity. In times it is necessary to differentiate between
tures which facilitate query performance) and physical data retrieval data retrieval and manipulation, we use appropriate terms such as read
operators are available. One DBMS is not limited to using a single data operations for data retrieval, and write operations for data insertion,
model (Forresi et al., 2022). updates, and deletes. In this subsection, we describe how queries are

2
T. Taipalus The Journal of Systems & Software 208 (2024) 111872

Fig. 1. A simplified view of a database system and the end-user with the emphasis on components relevant to this study; the arrows represent the flow of information from the
end-user’s device to the database residing in persistent storage; the flow of information back to the software application is not illustrated here; gray rectangles represent boundaries
of physical devices.

executed, using mainly general (i.e., not specific to a single DBMS) Typically, the business domain dictates what types of anomalies are
literature from the domain of RDBMS query execution. tolerated.
When a user — were it a human actor directly using a terminal, a Finally, as strong consistency models often require that transactions
transaction processing software application, or a database benchmark persist in the database and that all or none of the operations in a trans-
software — submits a query to a DBMS, a multitude of events must take action succeed, locking is typically complemented by transaction logs.
place before the user receives feedback. Illustrated in a general fashion These logs are written before write operations are committed to the
in Fig. 1, the query parser checks, among other things, that the query database, and can be used in reversing earlier write operations if a later
is syntactically valid (Hellerstein et al., 2007). If the query passes these write operation in the same transaction fails. All these considerations
(and other) checks, the query is translated to a lower-level presentation discussed in this section play a significant role from a performance
and passed to the query optimizer. The optimizer generates one or perspective, which is discussed in the next section.
several query execution plans. These plans consist of physical operators Further reading on database systems: for readers interested in
for implementing, e.g., which physical data structures will be utilized in the basics of database systems, either the undergraduate level textbook
executing the query, and in RDBMSs in particular, how tables are joined by Connolly and Begg (2015), or Elmasri and Navathe (2016) are
together (Graefe, 1993). If several plans are generated, the optimizer excellent albeit lengthy introductions covering the topic from several
evaluates which of these plans is the most effective in regards to, points of view and with the focus on RDBMSs. For readers interested
e.g., query execution time (Hellerstein et al., 2007). The accuracy of the in query processing, we point to studies by Chaudhury (1998), and
optimizer relies on aspects such as database metadata (Christodoulakis, Hellerstein, Stonebraker and Hamilton (2007). If you are interested in
logical relational database design, the book by Date (2019) is an in-
1984), statistics of previous query executions, and the indices avail-
depth resource covering both formal and informal approaches. For a
able (Chaudhuri, 1998). Generating effective query execution plans is
survey of literature on NoSQL data models, the study by Davoudian
a complex effort and takes time (Graefe, 1993; Chaudhuri, 1998), but
et al. (2018) is an accessible starting point.
once formulated, the plans can be re-used to a degree.
Next, the query execution engine implements the query execution
3. Performance
plan, using the physical operators therein. Simplified, the data objects
required by the query are typically first searched from a memory area
3.1. Performance measurement
called the buffer pool which is allocated and maintained by the DBMS.
If some or all data is not found, the data is requested from disk. Before In general, performance is a measurement of how efficiently a
accessing the disk, many systems may additionally utilize other areas software system completes its tasks. Performance is typically measured
of memory to avoid disk access (Yang and Lilja, 2018). in response time, throughput (Hellerstein et al., 2007), or in some
Effectively all database systems function in an environment where cases, utilization of computing resources (Cortellessa et al., 2011, p.
multiple concurrent end-users use the database. This concurrency 4). Response time is the time taken for a call in the system to traverse
presents challenges particularly when the users execute write opera- to some other part of the system and back. This is also sometimes called
tions on the same database, e.g., when two or more users withdraw latency (Gunther, 2011, p. 10), and in the context of database systems,
money from the same bank account, concurrently updating the bal- the response time may be measured as the response time to the first or
ance (Bernstein and Goodman, 1981). To guarantee that the write the last result item (Graefe, 1993). In a broad perspective described
operations do not interfere with each other in a way that would cause in Fig. 1, the response time might be the time taken after the end-
the data to not represent the real world, DBMSs typically implement user sends a request to the software application (e.g., an online store),
concurrency control through locking or versioning data. Effectively, which passes the request to a DBMS, which returns a set of data to the
the simpler implementations of locking restrict data objects from be- software application, which finally presents the data to the end-user’s
ing accessed by other operations while the data objects are being device. In database benchmarking, however, response time might be
modified (Hellerstein et al., 2007). These locking mechanisms may be measured by running the benchmark on the same device the DBMS
implemented to ensure that no anomalies happen, or with implementa- and the database reside, effectively eliminating inter-device-induced
tions that theoretically allow some anomalies (Berenson et al., 1995). performance drawbacks such as network latency (Patounas et al., 2020;

3
T. Taipalus The Journal of Systems & Software 208 (2024) 111872

Delis and Roussopoulos, 1993) and firewalls, and mitigating the effects guidelines that strive to minimize redundancy to eliminate potential
of other software running on the devices. Although DBMSs perform data anomalies caused by redundant data (Codd, 1972, 1975), and to
other tasks besides querying, querying is typically what is measured minimize the need for storage space, which in turn typically causes
in DBMS performance testing (Dietrich et al., 1992). While response queries to run slower due to a larger number of table joins. In contrast,
time is perhaps the least arduous performance metric to measure, it is different NoSQL data models — especially key–value, document, and
not often enough for reliable measurement of transaction processing wide-column — follow design guidelines according to which data
environments (Dietrich et al., 1992) (often dubbed online transaction structures are designed to efficiently satisfy predetermined business
processing, OLTP). That is, response time might be a metric better logic queries, with the elimination of redundant data being a secondary
suited for long-running queries in decision support environments (of- concern (Davoudian et al., 2018). It follows that because many NoSQL
ten dubbed online analytical processing, OLAP), but as transaction data structures are designed to serve queries, queries are typically sim-
processing environments often process a large number of concurrent ple (Dey et al., 2014), and their execution requires less computational
transactions, response time alone might not reliably account for the resources than complex queries in relational databases. As discussed in
effects of concurrent transactions, unless response time is measured as Section 2.3, locking data objects (both on disk and in memory, and both
an average of multiple concurrent transactions. primary data structures as well as indices), logging write operations,
Performance can also be measured by throughput, i.e., how many and how memory is managed by the DBMS all play a significant role
transactions the DBMS can execute in a given time frame. Throughput in DBMS performance (Hellerstein et al., 2007; Stonebraker, 2010).
is often expressed as transactions per second (Dietrich et al., 1992) and For example, preventing write operation-induced anomalies is a costly
requires a more sophisticated approach, e.g., benchmarking software. action, and the level of granularity of database locks presents signifi-
Again, throughput may be measured either locally (i.e., using only the cant considerations on write operation performance, which is largely
hardware the DBMS and the database reside on), or over a network dictated by the ratio of read and write operations.
in case the database is distributed. Alternatively, throughput may be Distribution: Write operations in distributed configurations pose non-
measured by connecting the benchmarking software to the software trivial challenges to both performance and data consistency (Delis and
application, which simulates the throughput of the whole database sys- Roussopoulos, 1993). In distributed database systems, effectively all
tem by accounting for, e.g., network and the software application (e.g., transactions must choose either data consistency or data availabil-
Kumar and Grot, 2022; Sundaresan et al., 2013). Such an approach ity (Brewer, 2012; Gilbert and Lynch, 2002). The former guarantees
arguably requires significantly more investment, but provides a holistic that the data the end-user receives are not stale, with the cost of
perspective on the performance of the whole system, also uncovering performance, while the latter guarantees to a degree that the end-user
potential performance issues unrelated to the DBMS and the database. receives data faster, but with no guarantees that the dataset received is
Finally, performance may be measured by resource utilization, either the most recent. The preferred approach is largely dictated by business
CPU time, I/O, memory allocation, or energy consumption (Graefe, logic.
1993) in systems striving for energy-efficiency due to, e.g., limited DBMS and OS parameters: Moving from data models and database
battery power, or due to environmental concerns (Guo et al., 2022). system distribution to lower levels of abstraction, operating system
In summary, we might consider the measurement of throughput (OS) and DBMS parameters and their interrelationships (e.g., page size)
a process that typically requires a simulation of some level, and the can have direct or indirect effects on performance (Dietrich et al.,
measurement of response time as an exact or approximated mathemat- 1992). Additionally, DBMS parameters such as the amount of memory
ical method. The former approach requires relatively high investments the DBMS is allowed to use for data processing is typically closely
into the development of such simulations (Cortellessa et al., 2011, p. related to the amount of memory available. Furthermore, as a query
142), while the latter often relies on a set of assumptions that do not is sent to the optimizer (cf. Fig. 1), it depends on the DBMS internals
necessarily reflect real-world scenarios due to inaccuracies in predicting how efficiently the optimizer can select the most efficient physical
what the real-world scenario ultimately is and how it can change. operations to implement the query, and what physical operations are
available to the optimizer in the first place (Chaudhuri, 1998). For
3.2. Factors affecting performance example, MySQL implemented only one physical operation for table
joins until 2018,1 limiting the number of options the optimizer could
Hardware: An intuitive factor in performance is the power of hard- choose from. Regarding query optimization, the optimizers of RDBMSs
ware (Osterhage, 2013, p. 1), and while it is true that most of the in particular are relatively mature and can spot some unnecessary
local response time is attributed to time taken by CPU processing, complications in queries, while overlooking others (Brass and Goldberg,
memory and disk access, and software waiting for other tasks to com- 2006). Despite the benefits brought by the optimizers, some queries are
plete (Cortellessa et al., 2011, p. 5), first investing in software perfor- inherently slow and can only be optimized through query rewrites.
mance rather than hardware performance is often more cost-effective. Physical database design: Last, but definitely not least, physical
That being said, it is generally accepted that memory access is at least database design plays a key role in DBMS performance. It has been
four orders of magnitude faster than disk access (e.g., Gunther, 2011, p. argued that performance bottlenecks are difficult to find in large
42). That is, if memory access takes minutes (nanoseconds), disk access systems (Ammons et al., 2004), and that efficiency is gained by focusing
takes months (milliseconds). These numbers are largely dependent on on the vital few areas instead of the trivial many (Juran and De Feo,
the speed of memory and the type of disk, but paint a picture of 2010, p. 450). One of the most vital areas in database systems is
how zealously DBMS optimization strives to minimize disk access. physical design. In relational databases, efficient physical design is
Since memory is typically more expensive than disk storage, keeping largely achieved through indices, and in NoSQL databases, typically
the whole database in memory is often unfeasible. Additionally, the through database distribution over computing nodes. In contrast to a
underlying hardware is important, as, e.g., some DBMSs have been holistic system overview, performance bottlenecks may be easier to find
shown to utilize multi-processor or multi-core environments more ef- in queries, since many DBMSs provide detailed information on query
fectively than others (Tu et al., 2013). Intuitively, how well a DBMS can execution (Fig. 2). PostgreSQL (Fig. 2(a)) lists the physical operations
exploit parallelism affects the performance of query execution (Tallent used to execute the query, which of the operations took the most time
and Mellor-Crummey, 2009; Tözün et al., 2013). Ultimately, perfor- units, and which indices, if any, were used. For example, it can be
mance measurement is about gains or losses in percentages, not in, seen in Fig. 2(a) that the sequential scan on line 12 accounted for
e.g., response times.
Data models: Data models described in Section 2.2 have indirect
1
effects on DBMS performance. Relational databases often follow design https://dev.mysql.com/doc/refman/5.6/en/explain-output.html

4
T. Taipalus The Journal of Systems & Software 208 (2024) 111872

Fig. 2. Query execution plans illustrating the physical operators such as hash join and seq scan chosen by the optimizer.

approximately 94% of the execution time of the whole query (178 in an online bank. TPC-B, on the other hand, does not wait and can be
time units out of 189 ms), probably because the query fetched a large used as a precursor for TPC-A in adjusting DBMS parameters (Dietrich
number of records from the database. The query could be optimized et al., 1992). Alternatively to transaction processing, TPC-H benchmark
by, e.g., selecting a smaller number of records, and showing the results measures the performance of a DBMS in decision support (Barata et al.,
to the end-user by paging them, i.e., showing a subset of results first, 2015; Dreseler et al., 2020).
and fetching more later if necessary. In NoSQL systems, the query
In the more general DBMS domain, the Yahoo! Cloud Serving Bench-
optimizer plays a smaller role due to typically less expressive query
mark (YCSB) is a framework for benchmarking transaction processing
languages (cf. Fig. 2(b)). Some NoSQL systems such as Cassandra do
in systems with different data models and architectures (Cooper et al.,
not permit the execution of queries that do not utilize the physical
2010). Due to its extensibility, YCSB can be adapted to different NoSQL
structures effectively.
data models. YCSB contains different workloads, each with a different
3.3. Database performance benchmarks ratio of read and write operations. YCSB and its extensions such as
YCSB+T typically utilize transactions which consist of single operations
There are several database performance benchmarks available, each and do not enforce strong consistency (Qu et al., 2022b; Dey et al.,
typically consisting of a sample database and a workload that simulates 2014). The benchmarks described above are by no means an exhaustive
how the database could be used (Difallah et al., 2013; Qu et al., list but cover the most popular benchmarks (cf. Section 2.1). Other
2022b). The benchmarks usually measure the efficiency of querying benchmarks include LUBM (Guo et al., 2005), OLTP-Bench (Difallah
while taking into account factors such as concurrency but disregarding et al., 2013), and JOB (Leis et al., 2015). Regardless of the data model
other DBMS tasks such as efficiency in data structure definition or bulk and DBMS, transaction processing benchmarks have typically been the
loading (Dietrich et al., 1992). de facto method of comparing different DBMSs and hardware (Tözün
In the domain of relational databases, the Transaction Process- et al., 2013).
ing Council (TPC) benchmarks (e.g., Gray, 1992) are perhaps the
Further reading on performance: for readers interested in phys-
most utilized (Dreseler et al., 2020; Tözün et al., 2013), and test the
ical database operations and query execution from a performance
throughput of the DBMS with various parameters. For example, the
perspective, Graefe (1993) provides an in-depth, DBMS-independent
TPC-A benchmark simulates a database of a bank with four tables and
survey. For more information on physical database design, especially
with one transaction, the TPC-B benchmark a database of a wholesale
supplier with nine tables and with five transactions, and the TPC-E indices and how they work, the book by Lightstone et al. (2010) is a
benchmark a brokerage database with 33 tables and 12 transactions. detailed and descriptive source. For a practical and concise guide on
All these benchmarks have the option for simulating strong consistency, SQL query optimization, we point readers towards Winand’s (2012)
and while TPC-A and TPC-B have transactions typical for transaction book. Regarding NoSQL DBMS optimization, we suggest referring to
processing, TPC-E includes also decision support transactions (Tözün the manual of the DBMS of your choice, and always making sure that
et al., 2013). TPC-A simulates human end-user thinking by waiting the source of information is current, as NoSQL systems tend to evolve
between transactions, as a human arguably would wait between clicks rapidly.

5
T. Taipalus The Journal of Systems & Software 208 (2024) 111872

Fig. 3. The study selection process; the numbers refer to the number of primary studies selected in each stage of the process.

Table 1
Search strings.
Database Search string
ACM DL [Abstract: performance] AND [Abstract: comparison] AND [[Abstract: database] OR [Abstract:
dbms]] AND [Publication Date: (01/01/2000 TO 03/31/2022)]
IEEE Xplore (‘‘Abstract’’:performance AND ‘‘Abstract’’:comparison AND (‘‘Abstract’’:database OR ‘‘Abstract’’:dbms))
ScienceDirect Title, abstract, keywords: performance AND comparison AND (database OR dbms)
Google Scholar database performance comparison

4. Study selection 4.2. Selected studies

4.1. Process and criteria The selected 117 primary studies compared the performance of a
total of 44 different DBMSs. We categorized these DBMSs into three
The DBMSs in this study were selected based on the selected primary top-level types defined and discussed in Section 2.2: RDBMSs, NoSQL
studies. That is, we did not choose, e.g., the most popular DBMSs to systems, and NewSQL systems. Five DBMSs not clearly pertaining to
include, but reported the DBMSs yielded by the primary studies. The any of these three categories were categorized under other systems
results herein may be considered the most popular DBMSs in terms
(Table 3). It is worth noting that these DBMS types are not always
of benchmarking reported in scientific literature. Fig. 3 describes the
clear-cut due to the lack of specificity and changing nature of the
primary study selection process starting from ACM Digital Library,
definitions, and should be interpreted as merely means to compartmen-
IEEE Xplore, and ScienceDirect, complemented by subsequent Google
talize the results of this study into a more readable form. Five selected
Scholar searches. The search strings are detailed in Table 1. To account
for potentially missing relevant studies, we conducted three rounds of primary studies did not report results implying the performance of one
backward snowballing (i.e., following the lists of references in selected DBMS over another (Padhy and Kumaran, 2019; Schmid et al., 2015b;
studies), until snowballing revealed no additional studies. A total of 117 KumarDwivedi et al., 2012; Faraj et al., 2014; Jing et al., 2009).
primary studies comparing DBMS performance were selected. Fig. 4 shows the distribution of publication years and the types of
Table 2 describes our inclusion criteria applied in the primary DBMSs discussed in the selected studies. Although our criteria allowed
study selection. The first four criteria are related to bibliographic for studies from the year 2000, the first studies selected were published
details, while the last three criteria are concerned with article focus in 2008. The figure shows that generally, there is a somewhat constant
and content. Regarding criterion #3, we excluded academic theses and number of DBMS performance comparison studies each year. It is worth
dissertations (e.g., Coates, 2009) due to the fact that they are typically noting that one study may pertain to several types of DBMSs.
not peer-reviewed. We also excluded white and gray literature for the
same reason, and because those studies are often written or published 5. Performance comparison results
by partial parties, e.g., DBMS vendors.
We only selected studies that compared query (i.e., retrieving or The most popular DBMS performance comparisons compared one
modifying data) execution performance, not regarding e.g., database
or several RDBMSs to one or several NoSQL systems, one NoSQL
replication performance (Elnikety et al., 2006) or performance of dif-
system to another NoSQL system, or one RDBMS to another RDBMS,
ferent join operations (Kim and Patel, 2010). We also excluded studies
respectively. A total of 48 studies compared solely read performance,
that compared a single DBMS performance in different configurations
while 6 studies compared solely write performance. The rest of the
such as hardware, replication strategy, database structure, or query
language (Holzschuher and Peinl, 2013) and studies that compared studies compared both read and write performance, with the exception
a DBMS with different data-related platforms (Purbo et al., 2020). of two studies (Cheng et al., 2019; Nepaliya and Gupta, 2015) which
Studies that reported pseudonymized DBMS names were also excluded. were unclear whether they compared write operations. All comparisons
Finally, we only included studies that reported results based on at and their results per DBMS type are summarized in Fig. 5.
least seemingly objective metrics and empirical results. That is, studies Fig. 6 presents an overview of which DBMSs and DBMS types
simply stating the opinions of the authors such as ‘‘based on our the primary studies compared. The figure perhaps conveys how both
experiences, we believe MySQL is faster than SQL Server’’ were not other and NewSQL systems are typically compared within their respec-
considered. tive DBMS type groups, while RDBMS and NoSQL systems are both

6
T. Taipalus The Journal of Systems & Software 208 (2024) 111872

Table 2
Primary study selection criteria.
# Inclusion criterion
1 Article is written in English.
2 Full article can be accessed.
3 Article is published in a scientific journal, or conference or workshop proceedings.
4 Article is published between 2000 and March 2022.
5 Article focus is on query language statement execution performance comparison.
6 Article focus is on comparing the performance of two or more different DBMSs.
7 Article is based on at least seemingly objective metrics.

Table 3
DBMSs discussed in this study divided into four types.
DBMS type DBMSs
RDBMS Access, Azure SQL, Interbase, DB/2, H2, Hive, MariaDB, MySQL
Cluster, MySQL, Oracle Database, PostgreSQL, PostgresXL, SQL
Server, SQLite
NoSQL ArangoDB, Azure Document Database, Cassandra, Couchbase,
CouchDB, Elasticsearch, Firebase, HBase, Hypertable, memcached,
MongoDB, Neo4J, Oracle NoSQL, OrientDB, RavenDB, Redis,
RethinkDB, Riak, Scalaris, Tarantool, Voldemort
NewSQL CockroachDB, MemSQL (now known as SingleStoreDB), NuoDB,
VoltDB
Other BlazingSQL, Caché, Db4o, OmniSciDB, PG-Strom

Fig. 4. The number of publications by publication year and DBMS type; the year 2022 was only considered until March.

Fig. 5. DBMS performance comparisons overview; a directed edge from node a to node b represents the number of studies according to which a system of type a outperformed
a system or systems of type b in (r)ead and (w)rite operations, e.g., a NoSQL system outperformed a NewSQL system in read operations in one study, and in write operations in
one study; thicker edges visualize the most popular comparisons.

7
T. Taipalus The Journal of Systems & Software 208 (2024) 111872

Fig. 6. An overview of read operation performance comparisons between NoSQL systems (green, upper right), NewSQL systems (yellow, lower right), RDBMSs (red, lower left),
and other systems (blue, upper left); a clockwise turning edge from node a to node b depicts node a outperforming node b, and the color of the edge corresponds to the type of
the outperforming node, e.g., Caché outperforms PostgreSQL according to one or several studies; the size of a node represents out-degree, i.e., larger nodes have outperformed
more systems than smaller nodes.

compared within their respective groups as well as with each other. that two of the studies (Oliveira and Bernardino, 2017; Vershinin
Additionally, the size of the nodes such as MongoDB, Redis, Cassandra, and Mustafina, 2021) seemed to have executed the queries of TPC-H,
and MySQL show that these DBMSs typically outperform the DBMSs instead of running the benchmark and accounting for, e.g., the effects
they are compared to. Due to their length, the detailed results from of concurrent transactions. One primary study utilized the OLTP-Bench
the primary study comparisons are presented in the Appendix, which benchmark (Tongkaw and Tongkaw, 2016), one the LUBM bench-
includes tables detailing which DBMSs outperformed which. mark (Franke et al., 2013), and one, in addition to TPC-H, the JOB
Regarding the benchmarks defined in earlier scientific literature, the
benchmark (Suh et al., 2022). Regarding the benchmarks formulated
most popular was YCSB, which was utilized by 15 primary studies (ap-
by the primary study authors, 25 primary studies (21%) reported using
proximately 13%) (Abramova and Bernardino, 2013; Abramova et al.,
ad hoc queries instead of earlier defined benchmarks to compare the
2014a,b; Gandini et al., 2014; Schreiner et al., 2019; Seghier and Kazar,
performance of DBMSs. These queries were defined verbatim in the
2021; Yassien and Desouky, 2016; Abubakar et al., 2014; Kashyap
et al., 2013; Swaminathan and Elmasri, 2016; Tang and Fan, 2016; primary studies. In contrast, 70 of the primary studies (60%) compared
Klein et al., 2015; Araujo et al., 2021; Hendawi et al., 2018; Rabl et al., DBMS performance using undisclosed ad hoc queries, likely formulated
2012). The second most popular benchmark was the TPC-H benchmark by the study authors. In other words, 22 primary studies (19%) used
and its variations, utilized by five primary studies (4%) (Almeida et al., some type of earlier defined database benchmarking suite. The per-
2015; Fotache and Hrubaru, 2016; Oliveira and Bernardino, 2017; Suh formance tests of these 22 primary studies and what aspects of the
et al., 2022; Vershinin and Mustafina, 2021). It is worth noting, though, environment they reported are detailed in Table 4.

8
T. Taipalus The Journal of Systems & Software 208 (2024) 111872

Table 4
An overview of primary studies using previously defined benchmark software and which aspects of the testing environment they explicitly disclosed; performance measurements
abbreviated as ET (execution time) and TP (throughput).
Study Explicitly reported Benchmark Measurement Nodes
DBMS versions Hardware DB structure DBMS parameters
Abramova and Bernardino (2013) yes yes noa no YCSB ET 1
Abramova et al. (2014b) yes yes noa no YCSB ET 1
Abramova et al. (2014a) yes yes noa no YCSB ET 1
Abubakar et al. (2014) no no noa no YCSB ET 1
Almeida et al. (2015) no yes logical only no Star Schema Benchmark ET 1
Araujo et al. (2021) yes yes noa no YCSB ET, TP 2
Fotache and Hrubaru (2016) no yes logical only no TPC-H ET 5
Franke et al. (2013) yes yes no no LUBM-based ET 9
Gandini et al. (2014) no yes noa no YCSB ET, TP 2-9
Hendawi et al. (2018) yes yes noa no YCSB ET, TP 8
Kashyap et al. (2013) yes yes noa no YCSB ET, TP up to 5
Klein et al. (2015) yes no noa no YCSB ET, TP 9
Oliveira and Bernardino (2017) no yes logical only no TPC-H ET 1
Rabl et al. (2012) yes yes noa no YCSB ET, TP 16 and 24
Schreiner et al. (2019) no yes noa yes (default) YCSB, Voter ET, TP 3
Seghier and Kazar (2021) yes yes noa no YCSB ET 1
Suh et al. (2022) yes yes noa yes (default) TPC-H ET 3
Swaminathan and Elmasri (2016) yes yes noa no YCSB TP up to 14
Tang and Fan (2016) yes yes noa no YCSB ET, TP 4
Tongkaw and Tongkaw (2016) yes yes logical only no Sysbench, OLTP-Bench TP 1
Vershinin and Mustafina (2021) no yes logical only no TPC-H ET 1
Yassien and Desouky (2016) yes yes noa no YCSB ET, TP 1
a
The YCSB benchmark defines a single-table with n columns (or loose equivalents in non-relational data models).

6. Discussion interpret the results as if they do not generalize to other environments.


That is, if you are in the process of deciding on a DBMS for your
6.1. General discussion application, or perhaps considering changing one DBMS to another,
consider whether the performance comparison study you are reading
The difficulty of rigorous performance testing is perhaps one of the presents a similar use case. Compare your business domain to that pre-
root causes of why optimization is difficult, and several studies have sented in performance comparison studies, remembering that a single,
highlighted the complexity of performance testing due to, e.g., the sometimes even a seemingly inconsequential parameter (cf. e.g., data
effects of DBMS parameters (Purohith et al., 2017), testing environment types in SQLite in Purohith et al., 2017) may change the results.
settings (Wang et al., 2022), and how well the data in the performance
DeWitt and Levine (2008) aptly describe performance comparisons as
test database reflects the real application data (Qu et al., 2022a).
the maximum potential performance gain of one DBMS over another.
Is it also important whether an impartial actor has carried out the
The performance gain in your particular environment might be less,
performance test, or whether the test results are published e.g., by a
or it might be that the DBMS that performed better in the comparison
DBMS vendor (DeWitt and Levine, 2008). However, this is sometimes
difficult to assess and can be mitigated by simply explicitly reporting performs worse in your environment.
the test so that it can be replicated and verified by others. One important aspect of the environment is the physical setup. Dif-
Despite the fact that we were aware of some DBMS performance ferent hardware has been shown to affect DBMS performance, as some
comparison studies as they have been touched on in previous works, DBMSs exploit parallelism more efficiently than others (Marek and
we were surprised by the extent the few examples presented in the Rahm, 1992; Jiang et al., 2010), effectively meaning that if a test was
previous works (Raasveldt et al., 2018; Wang et al., 2022) generalize performed on one single-core CPU, the results might not generalize to
to so many studies on the subject. For example, in read operations, distributed and multi-CPU environments. Additionally, different hard-
MongoDB outperforms Cassandra according to ten studies, Cassandra ware aspects such as the relative sizes of different CPU memory caches
outperforms Redis according to four studies, and Redis outperforms may significantly affect DBMS performance, making performance com-
MongoDB according to six studies (cf. Appendix), leading to a situation parisons between different hardware a complex task (Ailamaki et al.,
of 𝑀𝑜 > 𝐶𝑎 > 𝑅𝑒 > 𝑀𝑜, where MongoDB is both the best and the worst 1999; Wang et al., 2022). In distributed environments, which were
performing DBMS. Furthermore, as discussed in Section 5, few of the se- rarely tested in the primary studies, it is worth considering whether
lected studies reported the test setting in enough detail for replication. data availability is prioritized over data consistency, as the latter setup
Unfortunately, without sufficient details for replicating an experiment, is typically significantly slower. Benchmarks that simulate concurrent
such experimental results can claim any outcome (Raasveldt et al., users should also be considered separately from performance tests that
2018). One aspect that was typically reported was some details about merely execute queries sequentially. Concurrency introduces several
the hardware the test was run on, i.e., processor make and model, clock challenges, many of which severely affect performance (Wang et al.,
rate, memory size, and disk size. Without other details about the DBMS
2022). For example, SQLite uses database locking on a level of gran-
parameters, parallel execution, etc., these details are inconsequential.
ularity which makes concurrent writes slow, but this has no negative
Despite the importance of the topic of DBMS performance comparisons,
effects on single-user writes (Obradovic et al., 2019). Unfortunately,
with the exception of one study (Rabl et al., 2012), no primary studies
some studies have shown that developers do not widely understand
were published in major data management fora such as ACM SIGMOD
or VLDB. concurrency-related security aspects (Warszawski and Bailis, 2017),
and that concurrency-related performance problems are understud-
6.2. Considerations for industry ied (Yu and Pradel, 2018). Some have even stated that the research
has not been focusing on relevant issues (Pavlo, 2017).
6.2.1. Consider the environments in performance testing studies Intuitively, different business domains have different databases and
If the environment in which the performance testing was carried out they are used in different ways. For example, in some domains, the end-
does not provide sufficient details, whatever the study states, you may users typically read data, while in others, write operations are more

9
T. Taipalus The Journal of Systems & Software 208 (2024) 111872

common. The ratio of read and write operations in a performance test while MongoDB outperformed PostgreSQL/PostGIS in most of the tests,
plays a crucial role, as some DBMSs are specifically designed for specific MongoDB provides only a subset of the geospatial operations provided
workloads (Cooper et al., 2010). The credibility of testing results is by PostGIS. If the rest of the operations needed by the business domain
also related to how well the test database and data therein represent need to be implemented in the software application, it is not realistic
the target environment (Qu et al., 2022a). Furthermore, in business to assume that such task is either trivial to implement, nor trivial to
domains such as online stores, there are typically popular products, and implement in a way that outperforms the solutions offered by existing
thus the data related to them are targets of a relatively large number DBMS features.
of database operations. For generalizable benchmarking results, the Another consideration is the availability of suitable workforce,
benchmark must account for such skewness in database use, rather which is closely related to the DBMS technology and its maturity. It
than, e.g., randomly querying data objects. It is also worth considering is not surprising that as query languages such as SQL have been a topic
how the performance tests have tested performance. For example, is of effectively all information technology-related curricula in higher
your application about inserting 10,000 rows in bulk, but one row education for several years (Joint Task Force on Computing Curricula,
at a time randomly generated by the application? If it is not, you Association for Computing Machinery (ACM) and IEEE Computer Soci-
should not consider this type of benchmark results as an indication of ety, 2013; The Joint Task Force on Computing Curricula, 2015), there
how well one DBMS performs compared to another in your particular is a relatively large number of professionals fluent in SQL, as opposed
business context. It is also worth considering that decision support to new query languages. Some studies have also shown that strong
benchmarks such as TPC-H test performance in environments that can consistency models (Corbett et al., 2013) and the SQL language (Cass,
be fundamentally different from transaction processing environments. 2022) are desired as skills as well as features in a DBMS. That is, it is
Finally, even similar business domains can have a myriad of different worth considering how feasible it is to implement a database system
technical implementations. with each specific technology, and DBMS performance is only one of
We have discussed some of the particulars involved in database the important aspects to consider.
system design in this subsection, and in Sections 2 and 3, from which Finally, as the primary studies typically considered performance
one can infer what has often been repeated in database system research: in terms of response time or throughput, we have approached the
the environments and their optimization is a task so complex (Graefe, topic from a similar viewpoint. However, as discussed in Section 3.1,
1993; Cooper et al., 2010) that DBMS optimization is a whole profes- performance may be measured by the usage of computing resources,
sion (Raasveldt et al., 2018). It follows that there are several threats which can be a goal conflicting with response time (Chaudhuri, 1998).
to rigorous DBMS benchmarking. Even though RDBMS optimization It is typical that increasing parallelism through multiple CPUs lowers
is widely and deeply studied in both academic and industry contexts, response time, but increases the total amount of work due to the par-
RDBMS optimization remains a complex task. In the domain of NoSQL allelism overhead (Osterhage, 2013, p. 13). Finally, it has been shown
DBMSs, there exist far fewer scientific studies simply due to the age that migrating data from one DBMS to another is all but trivial, and
of the NoSQL DBMSs, and the heterogeneity of NoSQL data models. prone to fail due to a lack of clear methodologies (Thalheim and Wang,
Additionally, there are several querying anti-patterns to avoid, such 2013) — especially when the DBMSs differ in data models and query
as performing joins in the software application instead of the DBMS, languages (Kim et al., 2018). Therefore, migrations such as RDBMS
or paging query results by utilizing ordering, limiting and offset. All ⇔ RDBMS or RDBMS ⇔ NewSQL are arguably less complex than
these points considered, a reader of a performance comparison study migrations such as NoSQL ⇔ NoSQL, RDBMS ⇔ NoSQL or NewSQL
must trust that the performance comparison study writers have been ⇔ NoSQL.
able to optimize the database systems to a similar degree for the per-
formance comparison results to be credible. This requires particularly 6.3. Considerations for research
specific, in-depth expertise when DBMSs with more than one data
model are compared. Furthermore, decades of benchmarking software 6.3.1. Consider using existing guidelines for testing and reporting
development by entire councils (e.g., TPC) cannot simply be skipped by Database benchmarking guidelines are not a novel invention in
writing a set of (often arbitrary) queries, running them on two or more database system research and have been described in detail (Gray,
DBMSs in a single-user environment, recording response times, and 1992) and in short (Dietrich et al., 1992) in the early 1990s, and as
consequently stating that one DBMS is faster than another. Although a reader-friendly checklist later (Raasveldt et al., 2018). Additionally,
this was the case in over 80% of the selected primary studies, we do benchmarking pitfalls have been discussed in numerous studies in
not consider this sufficient. respected database systems fora (Wang et al., 2022; Dreseler et al.,
In summary, if it is possible that changing even one of the en- 2020). Based on the primary studies, however, neither of these lines of
vironmental aspects discussed above may affect the performance test research has been widely applied in practice. Database benchmarking
results significantly, it seems reasonable to argue that, no matter how has been argued to be difficult (Raasveldt et al., 2018), as environmen-
many DBMS performance comparison studies state that one DBMS tal parameters such as the nature of data (Qu et al., 2022a), DBMS
outperforms another, these DBMSs were not tested in an environment parameters (Wang et al., 2022), and data types (Purohith et al., 2017)
that is the same as your environment, and thus have little concern in can all have significant impacts on performance testing results. Further-
the decision of which DBMS is performance-wise the best fit for your more, benchmarking tools have received critique (Reniers et al., 2017;
environment. Grolinger et al., 2013) despite the fact that some of the tools have been
under development for decades. Therefore, we urge researchers, at the
6.2.2. Consider other aspects besides performance very least, to consider whether using a performance test suite of one’s
There are other aspects besides response time or throughput to ad hoc queries is credible when well-known performance benchmarks
consider when choosing a DBMS. Performance gains, such as those are freely available.
provided by many NoSQL systems, rely heavily on redundant data As for reporting, Raasveldt et al. (2018) provide a 24-point checklist
to minimize the complexity of queries, thus providing faster response for fair benchmarking. Some of the points are concerned about how
times. Naturally, storing redundant data increases the cost of storage, performance is tested, and others about how the testing is reported. A
and may lead to data inconsistencies. Another comparison perspective performance comparison that cannot be replicated may present what-
is related to the features provided by the DBMSs compared. Intuitively, ever results (Raasveldt et al., 2018). Furthermore, an empirical study
a DBMS that is tailored for a specific purpose outperforms a general- without reproducible evidence should be considered an opinion of
purpose DBMS (Raasveldt et al., 2018; Stonebraker et al., 2007). For the authors, rather than an empirical study. Indeed, at the start of
example, one primary study (Bartoszewski et al., 2019) noted that the NoSQL movement, we have witnessed several studies with high

10
T. Taipalus The Journal of Systems & Software 208 (2024) 111872

praise for the strengths of different NoSQL products, yet with little or 6.4. Limitations and threats to validity
no critical notions addressing the acknowledged shortcoming of such
DBMSs. Therefore, we would caution the reader from inferring from It might be that some relevant studies are missing from this litera-
these results that one DBMS performs better than another. Rather, each ture review. However, it was not our intention to select primary studies
such argument should be carefully scrutinized and interpreted in a to quantitatively demonstrate that one DBMS outperforms another by
specific context, like in the primary study assessing the performance of the number of studies corroborating such an argument. Rather, the
GPU DBMSs (Suh et al., 2022), in which performance between DBMSs results verify previous observations (Raasveldt et al., 2018) according
was compared, but the comparison was merely one aspect of the study. to which many of such comparisons are problematic and should be
interpreted with care, if at all. Nevertheless, we have strived to include
at least most of the primary studies that fit our criteria (Table 2) by
6.3.2. Consider a different approach to DBMS-DBMS testing several rounds of snowballing (Fig. 3) as well as a complementary
Especially for a junior researcher, comparing the performance of literature search. Furthermore, as the DBMS classification (Table 3)
one DBMS to another may seem like a relatively simple research setting and the interpretation of the primary study results (Appendix) involve
to both carry out, and also justify based on the prevalence of the DBMS human judgment, it is possible that another group of researchers may
industry. We hope that the arguments presented in previous studies attain at least slightly different results.
as well as here have highlighted that neither of these points are as
clear-cut. Following the guidelines (e.g., Raasveldt et al., 2018) can 7. Conclusion
make performance testing a time-consuming task, and in many cases,
perhaps overly time-consuming, and given the considerations on the
Several database management system performance comparisons
generalizability of the results, the results may not be of interest in
have been conducted and published as both vendor white-papers as
other environments. Alternatively, not following guidelines introduces
well as in scientific fora. The approaches and reporting in such studies
significant threats to validity. While generalizability is hardly an in-
have been criticized in previous literature. In this study, we system-
trinsic value, concluding that, e.g., MySQL outperforms PostgreSQL atically surveyed 117 DBMS performance comparison studies. What
in ‘‘my webstore’’ but not in others unless they have similar data, seemed to be common among the selected primary studies is that
hardware, number of end-users, etc., does not carry the implication they lack sufficient detail for reproducibility. Scientific, peer-reviewed
of being as scientifically impactful result as saying that, e.g., MySQL research of high external validity concerning database management sys-
will always outperform PostgreSQL. Therefore, we must either perform tem performance comparison is effectively scarce. Based on the review
the performance comparisons with rigor and accept that the results do of literature, we presented several considerations for the industry as
not probably generalize, or perform the comparisons without scientific well as database system researchers. Namely, we argued for considering
rigor and state sophisms. Since the latter is hardly ethically sound, (i) the environments (i.e., business domain, amount of data, amount of
DBMS performance comparisons should be limited to domains where concurrent users, hardware, database distribution, read/write opera-
the goal of a study is not the generalizability of the results, but the tion ratio, etc.) when interpreting the results of DBMS performance
betterment of the very particular domain the study concerns (e.g., comparison tests, and for considering (ii) other aspects besides DBMS
Ameri et al., 2014). performance when choosing a DBMS or changing one DBMS to another,
Given the arguments above, we propose that future studies, if inter- and for researchers to consider (iii) using existing guidelines in per-
DBMS performance must be compared, consider taking a different formance testing and reporting the testing environments transparently,
approach to performance testing. First, using a wide range of database to consider (iv) different approaches to performance testing when one
system optimization experts to ensure that all aspects of the system are DBMS is compared to another, and to consider (v) other use cases for
fairly optimized, and avoiding situations where one system is optimized performance testing besides comparing the performance of one DBMS
beyond diminishing returns, while the other is barely optimized. We to another. The results highlight how rarely benchmarking software is
challenge research teams to explicitly disclose which authors optimized used in performance testing, how often different DBMSs with different
which systems, for authors to further one’s intellectual investments in data models are compared with each other, how often performance
the performance comparison. These solutions should be benchmarked testing results in different studies conflict with each other, and why.
by a party independent of all optimization teams, and fair benchmark- This study was not an attempt to argue the performance gains of
ing guidelines should be utilized. Second, after the benchmarking has one DBMS over another using primary studies. That is, please do not
been carried out, we urge researchers to consider what causes the cite this study by consulting the Appendix and stating that DBMS1
outperforms DBMS2 .
differences in performance, and critically compare those aspects as
well, as gains in performance arguably have root causes such as loos-
ened consistency or increased storage space. Nonetheless, performance Declaration of competing interest
comparisons of two or more DBMS with different data models should
be considered particularly complex. Unfortunately, such comparisons The authors declare that they have no known competing finan-
seem to be the most popular (cf. Fig. 5). cial interests or personal relationships that could have appeared to
influence the work reported in this paper.

6.3.3. Consider other use cases besides DBMS-DBMS testing altogether


Data availability
It is worth noting that benchmarking software has other use cases
besides inter-DBMS performance comparisons. Instead of comparing
Data will be made available on request.
one DBMS to another, researchers might consider testing the per-
formance effects of different hardware (Do et al., 2011), DBMS pa-
rameters (Wang et al., 2022), operating system parameters, query Appendix A. Supplementary data
languages (Holzschuher and Peinl, 2013), physical configurations such
as database distribution, physical structures such as different indices, Supplementary material related to this article can be found online
or different levels of data consistency. at https://doi.org/10.1016/j.jss.2023.111872.

11
T. Taipalus The Journal of Systems & Software 208 (2024) 111872

References Do, Jaeyoung, Zhang, Donghui, Patel, Jignesh M., DeWitt, David J., Naughton, Jef-
frey F., Halverson, Alan, 2011. Turbocharging DBMS buffer pool using SSDs. In:
Ailamaki, Anastassia, DeWitt, David J., Hill, Mark D., Wood, David A., 1999. DBMSs Proceedings of the 2011 ACM SIGMOD International Conference on Management
on a modern processor: Where does time go? In: VLDB’99, Proceedings of of Data. SIGMOD ’11, Association for Computing Machinery, New York, NY, USA,
25th International Conference on Very Large Data Bases, September 7-10, 1999, pp. 1113–1124.
Edinburgh, Scotland, UK, No. CONF. pp. 266–277. Dreseler, Markus, Boissier, Martin, Rabl, Tilmann, Uflacker, Matthias, 2020. Quanti-
Ammons, Glenn, Choi, Jong-Deok, Gupta, Manish, Swamy, Nikhil, 2004. Finding and fying TPC-H choke points and their optimizations. Proc. VLDB Endow. 13 (8),
removing performance bottlenecks in large systems. In: Odersky, Martin (Ed.), 1206–1220.
ECOOP 2004 – Object-Oriented Programming. Springer Berlin Heidelberg, Berlin, Elmasri, Ramez, Navathe, Shamkant B., 2016. Fundamentals of Database Systems,
Heidelberg, pp. 172–196. seventh ed. Pearson.
Barata, Melyssa, Bernardino, Jorge, Furtado, Pedro, 2015. An overview of decision sup- Elnikety, Sameh, Dropsho, Steven, Pedone, Fernando, 2006. Tashkent: Uniting durabil-
port benchmarks: TPC-DS, TPC-H and SSB. In: Rocha, Alvaro, Correia, Ana Maria, ity with transaction ordering for high-performance scalable database replication.
Costanzo, Sandra, Reis, Luis Paulo (Eds.), New Contributions in Information SIGOPS Oper. Syst. Rev. 40 (4), 117–130.
Systems and Technologies. Springer International Publishing, Cham, pp. 619–628. Estivill-Castro, Vladmir, Wood, Derick, 1992. A survey of adaptive sorting algorithms.
Berenson, Hal, Bernstein, Phil, Gray, Jim, Melton, Jim, O’Neil, Elizabeth, ACM Comput. Surv. 24 (4), 441–476.
O’Neil, Patrick, 1995. A critique of ANSI SQL isolation levels. In: Proceedings of Forresi, Chiara, Francia, Matteo, Gallinucci, Enrico, Golfarelli, Matteo, 2022. Cost-based
the 1995 ACM SIGMOD International Conference on Management of Data. SIGMOD optimization of multistore query plans. Inf. Syst. Front. 1–27.
’95, Association for Computing Machinery, New York, NY, USA, pp. 1–10. Gilbert, Seth, Lynch, Nancy, 2002. Brewer’s conjecture and the feasibility of consistent,
Bernstein, Philip A., Goodman, Nathan, 1981. Concurrency control in distributed available, partition-tolerant web services. SIGACT News 33 (2), 51–59.
database systems. ACM Comput. Surv. 13 (2), 185–221. Graefe, Goetz, 1993. Query evaluation techniques for large databases. ACM Comput.
Brass, Stefan, Goldberg, Christian, 2006. Semantic errors in SQL queries: A quite Surv. 25 (2), 73–169.
complete list. J. Syst. Softw. 79 (5), 630–644, Quality Software. Gray, Jim, 1992. Benchmark Handbook: For Database and Transaction Processing
Brewer, Eric, 2012. CAP twelve years later: How the ‘‘rules’’ have changed. Computer Systems. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
45 (2), 23–29. Grolinger, Katarina, Higashino, Wilson A., Tiwari, Abhinav, Capretz, Miriam A.M.,
Cass, Stephen, 2022. SQL should be your second language. IEEE Spectr. 59 (10), 20–21. 2013. Data management in cloud environments: Nosql and NewSQL data stores. J.
Chaudhry, Natalia, Yousaf, Muhammad Murtaza, 2020. Architectural assessment of Cloud Comput.: Adv., Syst. Appl. 2 (1), 22.
NoSQL and NewSQL systems. Distrib. Parallel Databases 38 (4), 881–926. Gunther, Neil J., 2011. Analyzing Computer System Performance with Perl::PDQ.
Chaudhuri, Surajit, 1998. An overview of query optimization in relational systems. Springer.
In: Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Guo, Yuanbo, Pan, Zhengxiang, Heflin, Jeff, 2005. LUBM: A benchmark for OWL
Principles of Database Systems. PODS ’98, Association for Computing Machinery, knowledge base systems. J. Web Semant. 3 (2), 158–182.
New York, NY, USA, pp. 34–43.
Guo, Binglei, Yu, Jiong, Yang, Dexian, Leng, Hongyong, Liao, Bin, 2022.
Chen, Peter Pin-Shan, 1976. The entity-relationship model - toward a unified view of Energy-efficient database systems: A systematic survey. ACM Comput. Surv..
data. ACM Trans. Database Syst. 1 (1), 9–36.
Haerder, Theo, Reuter, Andreas, 1983. Principles of transaction-oriented database
Christodoulakis, S., 1984. Implications of certain assumptions in database performance
recovery. ACM Comput. Surv. 15 (4), 287–317.
evauation. ACM Trans. Database Syst. 9 (2), 163–186.
Hecht, Robin, Jablonski, Stefan, 2011. NoSQL evaluation: A use case oriented survey.
Coates, Sean Steven, 2009. Comparing the Performance of Open Source and Proprietary
In: 2011 International Conference on Cloud and Service Computing. pp. 336–341.
Relational Database Management Systems (Ph.D. thesis). Northcentral University.
Hellerstein, Joseph M., Stonebraker, Michael, Hamilton, James, 2007. Architecture of
Codd, Edgar F., 1970. A relational model of data for large shared data banks. Commun.
a database system. Found. Trends Databases 1 (2), 141–259.
ACM 13 (6), 377–387.
Holzschuher, Florian, Peinl, René, 2013. Performance of graph query languages: Com-
Codd, Edgar F., 1972. Further normalization of the data base relational model. Data
parison of Cypher, Gremlin and native access in Neo4j. In: EDBT ’13. Association
Base Syst. 6, 33–64.
for Computing Machinery, New York, NY, USA, pp. 195–204.
Codd, Edgar F., 1975. Recent investigations in relational data base systems.
ISO/IEC, 2016a. ISO/IEC 9075-1:2016 - SQL - Part 1: Framework. Technical Report,
Connolly, Thomas, Begg, Carolyn, 2015. Database Systems, sixth ed. Pearson.
ISO/IEC.
Cooper, Brian F., Silberstein, Adam, Tam, Erwin, Ramakrishnan, Raghu, Sears, Russell,
ISO/IEC, 2016b. ISO/IEC 9075-2:2016 - SQL - Part 2: Foundation. Technical Report,
2010. Benchmarking cloud serving systems with YCSB. In: Proceedings of the
ISO/IEC.
1st ACM Symposium on Cloud Computing. SoCC ’10, Association for Computing
Jiang, Dawei, Ooi, Beng Chin, Shi, Lei, Wu, Sai, 2010. The performance of MapReduce:
Machinery, New York, NY, USA, pp. 143–154.
An in-depth study. Proc. VLDB Endow. 3 (1–2), 472–483.
Corbett, James C., Dean, Jeffrey, Epstein, Michael, Fikes, Andrew, Frost, Christo-
pher, Furman, J.J., Ghemawat, Sanjay, Gubarev, Andrey, Heiser, Christopher, Jin, Guoliang, Song, Linhai, Shi, Xiaoming, Scherpelz, Joel, Lu, Shan, 2012. Under-
Hochschild, Peter, Hsieh, Wilson, Kanthak, Sebastian, Kogan, Eugene, Li, Hongyi, standing and detecting real-world performance bugs. In: Proceedings of the 33rd
Lloyd, Alexander, Melnik, Sergey, Mwaura, David, Nagle, David, Quinlan, Sean, ACM SIGPLAN Conference on Programming Language Design and Implementation.
Rao, Rajesh, Rolig, Lindsay, Saito, Yasushi, Szymaniak, Michal, Taylor, Christopher, PLDI ’12, Association for Computing Machinery, New York, NY, USA, pp. 77–88.
Wang, Ruth, Woodford, Dale, 2013. Spanner: Google’s globally distributed database. Joint Task Force on Computing Curricula, Association for Computing Machinery (ACM)
ACM Trans. Comput. Syst. 31 (3), 1–22. and IEEE Computer Society, 2013. Computer science curricula 2013: Curriculum
Cortellessa, Vittorio, Di Marco, Antinisca, Inverardi, Paola, 2011. Model-Based Software guidelines for undergraduate degree programs in computer science. Technical
Performance Analysis, Vol. 980. Springer. Report, ACM, New York, NY, USA, 999133.
Date, Chris J., 2019. Database Design and Relational Theory: Normal Forms and All Juran, Joseph M., De Feo, Joseph A., 2010. Juran’s Quality Handbook: The Complete
that Jazz. A Press. Guide To Performance Excellence, sixth ed. McGraw-Hill Education.
Davoudian, Ali, Chen, Liu, Liu, Mengchi, 2018. A survey on NoSQL stores. ACM Kim, Ho-Jun, Ko, Eun-Jeong, Jeon, Young-Ho, Lee, Ki-Hoon, 2018. Migration from
Comput. Surv. 51 (2). RDBMS to column-oriented NoSQL: Lessons learned and open problems. In:
Delis, A., Roussopoulos, N., 1993. Performance comparison of three modern DBMS Lee, Wookey, Choi, Wonik, Jung, Sungwon, Song, Min (Eds.), Proceedings of
architectures. IEEE Trans. Softw. Eng. 19 (2), 120–138. the 7th International Conference on Emerging Databases. Springer Singapore,
DeWitt, David J., Levine, Charles, 2008. Not just correct, but correct and fast: A look Singapore, pp. 25–33.
at one of Jim Gray’s contributions to database system performance. SIGMOD Rec. Kim, You Jung, Patel, Jignesh, 2010. Performance comparison of the R*-Tree and the
37 (2), 45–49. quadtree for kNN and distance join queries. IEEE Trans. Knowl. Data Eng. 22 (7),
Dey, Akon, Fekete, Alan, Nambiar, Raghunath, Röhm, Uwe, 2014. YCSB+T: Benchmark- 1014–1027.
ing web-scale transactional databases. In: 2014 IEEE 30th International Conference Kumar, Rakesh, Grot, Boris, 2022. Shooting down the server front-end bottleneck. ACM
on Data Engineering Workshops. pp. 223–230. Trans. Comput. Syst. 38 (3–4).
Dietrich, Suzanne W., Brown, M., Cortes-Rello, Enrique, Wunderlin, S., 1992. A Leis, Viktor, Gubichev, Andrey, Mirchev, Atanas, Boncz, Peter, Kemper, Alfons, Neu-
practitioner’s introduction to database performance benchmarks and measurements. mann, Thomas, 2015. How good are query optimizers, really? Proc. VLDB Endow.
Comput. J. 35 (4), 322–331. 9 (3), 204–215.
Difallah, Djellel Eddine, Pavlo, Andrew, Curino, Carlo, Cudre-Mauroux, Philippe, 2013. Lightstone, Sam S., Teorey, Toby J., Nadeau, Tom, 2010. Physical Database Design:
OLTP-Bench: An extensible testbed for benchmarking relational databases. Proc. The Database Professional’s Guide to Exploiting Indexes, Views, Storage, and more.
VLDB Endow. 7 (4), 277–288. Morgan Kaufmann.
Do, Thanh, Graefe, Goetz, Naughton, Jeffrey, 2022. Efficient sorting, duplicate removal, Lu, Jiaheng, Holubová, Irena, 2019. Multi-model databases: A new journey to handle
grouping, and aggregation. ACM Trans. Database Syst.. the variety of data. ACM Comput. Surv. 52 (3).

12
T. Taipalus The Journal of Systems & Software 208 (2024) 111872

Marek, Robert, Rahm, Erhard, 1992. Performance evaluation of parallel transaction Valduriez, Patrick, 1987. Join indices. ACM Trans. Database Syst. 12 (2), 218–246.
processing in shared nothing database systems. In: Etiemble, Daniel, Syre, Jean- Wang, Yang, Yu, Miao, Hui, Yujie, Zhou, Fang, Huang, Yuyang, Zhu, Rui, Ren, Xueyuan,
Claude (Eds.), PARLE ’92 Parallel Architectures and Languages Europe. Springer Li, Tianxi, Lu, Xiaoyi, 2022. A study of database performance sensitivity to
Berlin Heidelberg, Berlin, Heidelberg, pp. 295–310. experiment settings.. Proc. VLDB Endow. 15 (7).
Obradovic, Nikola, Kelec, Aleksandar, Dujlovic, Igor, 2019. Performance anal- Warszawski, Todd, Bailis, Peter, 2017. ACIDRain: Concurrency-related attacks on
ysis on Android SQLite database. In: 2019 18th International Symposium database-backed web applications. In: Proceedings of the 2017 ACM International
INFOTEH-JAHORINA. INFOTEH, pp. 1–4. Conference on Management of Data. SIGMOD ’17, Association for Computing
Osterhage, W., 2013. Computer Performance Optimization. Springer. Machinery, New York, NY, USA, pp. 5–20.
Patel, Jignesh M., DeWitt, David J., 1996. Partition based spatial-merge join. In: Winand, Markus, 2012. SQL Performance Explained. Markus Winand.
Proceedings of the 1996 ACM SIGMOD International Conference on Management Yang, Jinfeng, Lilja, David J., 2018. Reducing relational database performance bot-
of Data. SIGMOD ’96, Association for Computing Machinery, New York, NY, USA, tlenecks using 3D XPoint storage technology. In: 2018 17th IEEE International
pp. 259–270. Conference on Trust, Security and Privacy in Computing and Communica-
Patounas, Georgios, Foukas, Xenofon, Elmokashfi, Ahmed, Marina, Mahesh K., 2020. tions/ 12th IEEE International Conference on Big Data Science and Engineering.
Characterization and identification of cloudified mobile network performance TrustCom/BigDataSE, pp. 1804–1808.
bottlenecks. IEEE Trans. Netw. Serv. Manag. 17 (4), 2567–2583. Yu, Tingting, Pradel, Michael, 2018. Pinpointing and repairing performance bottlenecks
Pavlo, Andrew, 2017. What are we doing with our lives? Nobody cares about our in concurrent programs. Empir. Softw. Eng. 23 (5), 3034–3071.
concurrency control research. In: Proceedings of the 2017 ACM International
Conference on Management of Data. SIGMOD ’17, Association for Computing
Machinery, New York, NY, USA, p. 3.
Primary studies
Pavlo, Andrew, Aslett, Matthew, 2016. What’s really new with NewSQL? SIGMOD Rec.
45 (2), 45–55. Abramova, Veronika, Bernardino, Jorge, 2013. NoSQL databases: Mongodb vs Cassan-
Purbo, Onno W., Sriyanto, Sriyanto, Suhendro, Suhendro, Aziz, Rz Abd, Herwanto, Riko, dra. In: Proceedings of the International C* Conference on Computer Science and
2020. Benchmark and comparison between hyperledger and MySQL. TELKOMNIKA Software Engineering. C3S2E ’13, ACM Press, Porto, Portugal, pp. 14–22.
(Telecommun. Comput. Electron. Control) 18 (2), 705–715. Abramova, Veronika, Bernardino, Jorge, Furtado, Pedro, 2014a. Experimental
Purohith, Dhathri, Mohan, Jayashree, Chidambaram, Vijay, 2017. The dangers and evaluation of NoSQL databases. Int. J. Database Manag. Syst. 6 (3), 01–16.
Abramova, Veronika, Bernardino, Jorge, Furtado, Pedro, 2014b. Which NoSQL
complexities of SQLite benchmarking. In: Proceedings of the 8th Asia-Pacific
database? A performance overview. Open J. Databases (OJDB) 1 (2), 17–24.
Workshop on Systems. APSys ’17, Association for Computing Machinery, New York,
Abubakar, Yusuf, Adeyi, Thankgod Sani, Auta, Ibrahim Gambo, 2014. Performance
NY, USA.
evaluation of NoSQL systems using YCSB in a resource austere environment.
Qu, Luyi, Li, Yuming, Zhang, Rong, Chen, Ting, Shu, Ke, Qian, Weining, Zhou, Aoy-
Perform. Eval. 7 (8), 23–27.
ing, 2022a. Application-oriented workload generation for transactional database
Almeida, Rafael, Furtado, Pedro, Bernardino, Jorge, 2015. Performance evaluation
performance evaluation. In: 2022 IEEE 38th International Conference on Data
MySQL InnoDB and microsoft SQL server 2012 for decision support environments.
Engineering. ICDE, pp. 420–432.
In: Proceedings of the Eighth International Conference on Computer Science &
Qu, Luyi, Wang, Qingshuai, Chen, Ting, Li, Keqiang, Zhang, Rong, Zhou, Xuan,
Software Engineering. C3S2E’15, ACM Press.
Xu, Quanqing, Yang, Zhifeng, Yang, Chuanhui, Qian, Weining, Zhou, Aoying, Ameri, Parinaz, Grabowski, Udo, Meyer, Jorg, Streit, Achim, 2014. On the appli-
2022b. Are current benchmarks adequate to evaluate distributed transactional cation and performance of MongoDB for climate satellite data. In: 2014 IEEE
databases? BenchCouncil Trans. Benchmarks, Stand Eval. 2 (1), 100031. 13th International Conference on Trust, Security and Privacy in Computing and
Raasveldt, Mark, Holanda, Pedro, Gubner, Tim, Mühleisen, Hannes, 2018. Fair bench- Communications. IEEE, Beijing, China, pp. 652–659.
marking considered difficult: Common pitfalls in database performance testing. In: Araujo, Jose Maria A., de Moura, Alysson Cristiano E., da Silva, Silvia Laryssa B.,
Proceedings of the Workshop on Testing Database Systems. DBTest ’18, Association Holanda, Maristela, Ribeiro, Edward de Oliveira, da Silva, Gladston Luiz, 2021.
for Computing Machinery, New York, NY, USA. Comparative performance analysis of NoSQL Cassandra and MongoDB databases.
Ramakrishnan, Raghu, 2012. CAP and cloud data management. Computer 45 (2), In: 2021 16th Iberian Conference on Information Systems and Technologies. CISTI,
43–49. IEEE, Chaves, Portugal, pp. 1–6.
Reniers, Vincent, Van Landuyt, Dimitri, Rafique, Ansar, Joosen, Wouter, 2017. On the Bartoszewski, Dominik, Piorkowski, Adam, Lupa, Michal, 2019. The comparison of
state of NoSQL benchmarks. In: Proceedings of the 8th ACM/SPEC on International processing efficiency of spatial data for PostGIS and MongoDB databases. In:
Conference on Performance Engineering Companion. In: ICPE ’17 Companion, Beyond Databases, Architectures and Structures. Paving the Road to Smart Data
Association for Computing Machinery, New York, NY, USA, pp. 107–112. Processing and Analysis. Springer International Publishing, pp. 291–302.
Schneider, Donovan A., DeWitt, David J., 1989. A performance evaluation of four Cheng, Yinyi, Zhou, Kefa, Wang, Jinlin, 2019. Performance analysis of PostgreSQL and
parallel join algorithms in a shared-nothing multiprocessor environment. In: Pro- MongoDB databases for unstructured data. In: Proceedings of the 2019 International
ceedings of the 1989 ACM SIGMOD International Conference on Management of Conference on Mathematics, Big Data Analysis and Simulation and Modeling.
Data. SIGMOD ’89, Association for Computing Machinery, New York, NY, USA, pp. MBDASM 2019, Atlantis Press, Changsha, China.
110–121. Faraj, Azhi, Rashid, Bilal, Shareef, Twana, 2014. Comparative study of relational and
Stonebraker, Michael, 2010. SQL databases v. NoSQL databases. Commun. ACM 53 (4), non-relations database performances using Oracle and MongoDB systems. Int. J.
10–11. Comput. Eng. Technol. (IJCET) 5 (11), 11–22.
Stonebraker, Michael, Bear, Chuck, Çetintemel, Uğur, Cherniack, Mitch, Ge, Tingjian, Fotache, Marin, Hrubaru, Ionuţ, 2016. Performance analysis of two big data tech-
Hachem, Nabil, Harizopoulos, Stavros, Lifter, John, Rogers, Jennie, Zdonik, Stan, nologies on a cloud distributed architecture. Results for non-aggregate queries on
2007. One size fits all? Part 2: Benchmarking results. In: Proc. CIDR. medium-sized data. Sci. Ann. Econ. Bus. 63 (s1), 21–50.
Sundaresan, Srikanth, Magharei, Nazanin, Feamster, Nick, Teixeira, Renata, Craw- Franke, Craig, Morin, Samuel, Chebotko, Artem, Abraham, John, Brazier, Pearl, 2013.
Efficient processing of semantic web queries in HBase and MySQL cluster. IT Prof.
ford, Sam, 2013. Web performance bottlenecks in broadband access networks.
15 (3), 36–43.
SIGMETRICS Perform. Eval. Rev. 41 (1), 383–384.
Gandini, Andrea, Gribaudo, Marco, Knottenbelt, William J., Osman, Rasha, Piaz-
Tallent, Nathan R., Mellor-Crummey, John M., 2009. Identifying performance
zolla, Pietro, 2014. Performance evaluation of NoSQL databases. In: Computer
bottlenecks in work-stealing computations. Computer 42 (12), 44–50.
Performance Engineering. Springer International Publishing, pp. 16–29.
Thalheim, Bernhard, Wang, Qing, 2013. Data migration: A theoretical perspective. Data
Hendawi, Abdeltawab, Gupta, Jayant, Jiayi, Liu, Teredesai, Ankur, Naveen, Ra-
Knowl. Eng. 87, 260–278.
makrishnan, Mohak, Shah, Ali, Mohamed, 2018. Distributed NoSQL data stores:
The Joint Task Force on Computing Curricula, 2015. Curriculum Guidelines for
Performance analysis and a case study. In: 2018 IEEE International Conference on
Undergraduate Degree Programs in Software Engineering. Technical Report, ACM,
Big Data. Big Data, IEEE, Seattle, WA, USA, pp. 1937–1944.
New York, NY, USA.
Jing, Yinan, Zhang, Chunwang, Wang, Xueping, 2009. An empirical study on per-
Toffola, Luca Della, Pradel, Michael, Gross, Thomas R., 2018. Synthesizing programs
formance comparison of Lucene and relational database. In: 2009 International
that expose performance bottlenecks. In: Proceedings of the 2018 International Conference on Communication Software and Networks. IEEE.
Symposium on Code Generation and Optimization. In: CGO 2018, Association for Kashyap, Suman, Zamwar, Shruti, Bhavsar, Tanvi, Singh, Snigdha, 2013. Benchmarking
Computing Machinery, New York, NY, USA, pp. 314–326. and analysis of NoSQL technologies. Int. J. Emerg. Technol. Adv. Eng. 3 (9),
Tözün, Pınar, Pandis, Ippokratis, Kaynak, Cansu, Jevdjic, Djordje, Ailamaki, Anastasia, 422–426.
2013. From A to E: Analyzing TPC’s OLTP benchmarks: The obsolete, the ubiq- Klein, John, Gorton, Ian, Ernst, Neil, Donohoe, Patrick, Pham, Kim, Matser, Chrisjan,
uitous, the unexplored. In: EDBT ’13. Association for Computing Machinery, New 2015. Performance evaluation of NoSQL databases: A case study. In: Proceedings
York, NY, USA, pp. 17–28. of the 1st Workshop on Performance Analysis of Big Data Systems. ACM, Austin
Tu, Stephen, Zheng, Wenting, Kohler, Eddie, Liskov, Barbara, Madden, Samuel, 2013. Texas USA, pp. 5–10.
Speedy transactions in multicore in-memory databases. In: Proceedings of the Kulshrestha, Sudhanshu, Sachdeva, Shelly, 2014. Performance comparison for data
Twenty-Fourth ACM Symposium on Operating Systems Principles. SOSP ’13, storage - Db4o and MySQL databases. In: 2014 Seventh International Conference
Association for Computing Machinery, New York, NY, USA, pp. 18–32. on Contemporary Computing (IC3). IEEE, Noida, India, pp. 166–170.

13
T. Taipalus The Journal of Systems & Software 208 (2024) 111872

KumarDwivedi, Amit, Lamba, C.S., Shukla, Shweta, 2012. Performance analysis of Baruffa, Giuseppe, Femminella, Mauro, Pergolesi, Matteo, Reali, Gianluca, 2020. Com-
column oriented database vs row oriented database. Int. J. Comput. Appl. 50 (14), parison of MongoDB and Cassandra databases for spectrum monitoring as-a-service.
31–34. IEEE Trans. Netw. Serv. Manag. 17 (1), 346–360.
Nepaliya, Prateek, Gupta, Prateek, 2015. Performance analysis of NoSQL databases. Int. Bassil, Youssef, 2012. A comparative study on the performance of the top DBMS
J. Comput. Appl. 127 (12), 36–39. systems. J. Comput. Sci. Res..
Oliveira, João, Bernardino, Jorge, 2017. NewSQL databases - MemSQL and VoltDB Batra, Shalini, Tyagi, Charu, 2012. Comparative analysis of relational and graph
experimental evaluation:. In: Proceedings of the 9th International Joint Conference databases. Int. J. Soft Comput. Eng. (IJSCE) 2 (2), 509–512.
on Knowledge Discovery, Knowledge Engineering and Knowledge Management. Boicea, Alexandru, Radulescu, Florin, Agapin, Laura Ioana, 2012. MongoDB vs Oracle
SCITEPRESS - Science and Technology Publications, Funchal, Madeira, Portugal, – database comparison. In: 2012 Third International Conference on Emerging
pp. 276–281. Intelligent Data and Web Technologies. IEEE.
Padhy, Sarita, Kumaran, G. Mayil Muthu, 2019. A quantitative performance analysis Čerešňák, Roman, Kvet, Michal, 2019. Comparison of query performance in relational
between Mongodb and Oracle NoSQL. In: 2019 6th International Conference on a non-relation databases. Transp. Res. Procedia 40, 170–177.
Computing for Sustainable Global Development. INDIACom, IEEE, pp. 387–391. Chakraborty, Soarov, Paul, Shourav, Azharul Hasan, K.M., 2021. Performance compar-
Rabl, Tilmann, Gómez-Villamor, Sergio, Sadoghi, Mohammad, Muntés-Mulero, Victor, ison for data retrieval from NoSQL and SQL databases: A case study for COVID-19
Jacobsen, Hans-Arno, Mankovskii, Serge, 2012. Solving big data challenges for genome sequence dataset. In: 2021 2nd International Conference on Robotics,
enterprise application performance management. Proc. VLDB Endow. 5 (12), Electrical and Signal Processing Techniques. ICREST, IEEE, DHAKA, Bangladesh,
1724–1735. pp. 324–328.
Schmid, Stephan, Galicz, Eszter, Reinhardt, Wolfgang, 2015b. WMS performance of Chaudhary, Anurag Singh, Singh, Kanika, Kalra, Sanchi, Kaur, Parmeet, 2018. An
selected SQL and NoSQL databases. In: International Conference on Military empirical comparison of MongoDB and hive. In: 2018 4th International Conference
Technologies. ICMT 2015, IEEE. on Computing Communication and Automation. ICCCA, IEEE, Greater Noida, India,
Schreiner, Geomar A., Knob, Ronan, Duarte, Denio, Vilain, Patricia, Mello, Ronaldo pp. 1–4.
dos Santos, 2019. NewSQL through the looking glass. In: Proceedings of the 21st Chickerur, Satyadhyan, Goudar, Anoop, Kinnerkar, Ankita, 2015. Comparison of
International Conference on Information Integration and Web-Based Applications relational database with document-oriented database (MongoDB) for big data appli-
& Services. ACM, Munich Germany, pp. 361–369. cations. In: 2015 8th International Conference on Advanced Software Engineering
Seghier, Nadia Ben, Kazar, Okba, 2021. Performance benchmarking and comparison & Its Applications. ASEA, IEEE, Jeju Island, South Korea, pp. 41–47.
of NoSQL databases: Redis vs MongoDB vs Cassandra using YCSB tool. In: 2021
Chopade, Mrs. Rupali M., Dhavase, Nikhil S., 2017. MongoDB, Couchbase: Perfor-
International Conference on Recent Advances in Mathematics and Informatics.
mance comparison for image dataset. In: 2017 2nd International Conference for
ICRAMI, IEEE, Tebessa, Algeria, pp. 1–6.
Convergence in Technology. I2CT, IEEE, Mumbai, pp. 255–258.
Suh, Young-Kyoon, An, Junyoung, Tak, Byungchul, Na, Gap-Joo, 2022. A comprehen-
Damodaran B, Dipina, Salim, Shirin, Vargese, Surekha Marium, 2016. Performance
sive empirical study of query performance across GPU DBMSes. Proc. ACM Meas.
evaluation of MySQL and MongoDB databases. Int. J. Cybern. Inf. 5 (2), 387–394.
Anal. Comput. Syst. 6 (1), 1–29.
Deari, Raif, Zenuni, Xhemal, Ajdari, Jaumin, Ismaili, Florije, Raufi, Bujar, 2018. Anal-
Swaminathan, Surya Narayanan, Elmasri, Ramez, 2016. Quantitative analysis of scalable
ysis and comparision of document-based databases with relational databases: Mon-
NoSQL databases. In: 2016 IEEE International Congress on Big Data. BigData
godb vs MySQL. In: 2018 International Conference on Information Technologies.
Congress, IEEE, San Francisco, CA, USA, pp. 323–326.
InfoTech, IEEE, Varna, pp. 1–4.
Tang, Enqing, Fan, Yushun, 2016. Performance comparison between five NoSQL
Ding, Haijie, Jin, Yuehui, Cui, Yidong, Yang, Tan, 2012. Distributed storage of network
databases. In: 2016 7th International Conference on Cloud Computing and Big
measurement data on HBase. In: 2012 IEEE 2nd International Conference on Cloud
Data. CCBD, IEEE, Macau, China, pp. 105–109.
Computing and Intelligence Systems. IEEE.
Tongkaw, Sasalak, Tongkaw, Aumnat, 2016. A comparison of database performance
Eyada, Mahmoud Moustafa, Saber, Walaa, Genidy, Mohammed M. El, Amer, Fathy,
of MariaDB and MySQL with OLTP workload. In: 2016 IEEE Conference on Open
2020. Performance evaluation of IoT data management using MongoDB versus
Systems. ICOS, IEEE.
MySQL databases in different cloud environments. IEEE Access 8, 110656–110668.
Vershinin, I.S., Mustafina, A.R., 2021. Performance analysis of PostgreSQL, MySQL,
Fatima, Haleemunnisa, Wasnik, Kumud, 2016. Comparison of SQL, NoSQL and NewSQL
Microsoft SQL server systems based on TPC-h Tests. In: 2021 International
databases for internet of things. In: 2016 IEEE Bombay Section Symposium. IBSS,
Russian Automation Conference. RusAutoCon, IEEE, Sochi, Russian Federation, pp.
IEEE.
683–687.
Yassien, Amal W., Desouky, Amr F., 2016. RDBMS, NoSQL, Hadoop: A performance- Filip, Petr, Cegan, Lukas, 2020. Comparison of MySQL and MongoDB with focus on
based empirical analysis. In: Proceedings of the 2nd Africa and Middle East performance. In: 2020 International Conference on Informatics, Multimedia, Cyber
Conference on Software Engineering - AMECSE ’16. ACM Press, Cairo, Egypt, pp. and Information System. ICIMCIS, IEEE.
52–59. Fioravanti, Sara, Mattolini, Simone, Patara, Fulvio, Vicario, Enrico, 2016. Experimental
performance evaluation of different data models for a reflection software archi-
tecture over NoSQL persistence layers. In: Proceedings of the 7th ACM/SPEC on
Further reading International Conference on Performance Engineering. ACM, Delft The Netherlands,
pp. 297–308.
Abdullah, Ahmad, Zhuge, Qingfeng, 2015. From relational databases to NoSQL Fraczek, Konrad, Plechawska-Wojcik, Malgorzata, 2017. Comparative analysis of re-
databases: Performance evaluation. Res. J. Appl. Sci. Eng. Technol. 11 (4), 434–439. lational and non-relational databases in the context of performance in web
Aboutorabi, Seyyed Hamid, Rezapour, Mehdi, Moradi, Milad, Ghadiri, Nasser, 2015. applications. In: Beyond Databases, Architectures and Structures. Toward Efficient
Performance evaluation of SQL and MongoDB databases for big e-commerce data. Solutions for Data Analysis and Knowledge Representation. Springer International
In: 2015 International Symposium on Computer Science and Software Engineering. Publishing, pp. 153–164.
CSSE, IEEE, Tabriz, Iran, pp. 1–7. Gomes, Augusto, Lopes, Vitor, Ribeiro, Edward, Lima, Jorge, Costa, Wagner, Gar-
Afolabi, A.O., Ajayi, A.O., 2008. Performance evaluation of a database management cia, Luis, Holanda, Maristela, 2021. An empirical performance comparison between
system (A case study of INTERBASE and MySQL). J. Eng. Appl. Sci. 3 (2), 155–160. MySQL and MongoDB on analytical queries in the COMEX database. In: 2021 16th
Agarwal, Sarthak, Rajan, K.S., 2016. Performance analysis of MongoDB versus Post- Iberian Conference on Information Systems and Technologies. CISTI, IEEE, Chaves,
GIS/PostGreSQL databases for line intersection and point containment spatial Portugal, pp. 1–5.
queries. Spatial Inf. Res. 24 (6), 671–677. Gunawan, Rohmat, Rahmatulloh, Alam, Darmawan, Irfan, 2019. Performance evalua-
Aghi, Rajat, Mehta, Sumeet, Chauhan, Rahul, Chaudhary, Siddhant, Bohra, Navdeep, tion of query response time in the document stored NoSQL database. In: 2019 16th
2015. A comprehensive comparison of SQL and MongoDB databases. Int. J. Sci. International Conference on Quality in Research (QIR): International Symposium
Res. Publ. 5 (2), 1–3. on Electrical and Computer Engineering. IEEE, Padang, Indonesia, pp. 1–6.
Ahmed, Nadeem, Ahamed, Shakil, Rafiq, Jahir Ibna, Rahim, Sifatur, 2017. Data pro- Gyorodi, Cornelia A., Dumse-Burescu, Diana V., Zmaranda, Doina R., Gyorodi, Robert s.,
cessing in Hive vs. SQL Server: A comparative analysis in the query performance. Gabor, Gianina A., Pecherle, George D., 2020. Performance analysis of NoSQL and
In: 2017 IEEE 3rd International Conference on Engineering Technologies and Social relational databases with CouchDB and MySQL for application’’’s data storage. Appl.
Sciences. ICETSS, IEEE, Bangkok, pp. 1–5. Sci. 10 (23), 8524.
Almabdy, Soad, 2018. Comparative analysis of relational and graph databases for Gyorodi, Cornelia, Gyorodi, Robert, Pecherle, George, Olah, Andrada, 2015. A com-
social networks. In: 2018 1st International Conference on Computer Applications parative study: Mongodb vs. MySQL. In: 2015 13th International Conference on
& Information Security. ICCAIS, IEEE, pp. 1–4. Engineering of Modern Electric Systems. EMES, IEEE, Oradea, Romania, pp. 1–6.
Andjelic, Svetlana, Obradovic, Slobodan, Gacesa, Branislav, 2008. A performance Hairah, U., Budiman, E., 2021. Inner join query performance: MariaDB vs PostgreSQL.
analysis of the DBMS - MySQL Vs PostgreSQL. Commun. - Sci. Lett. Univ. Zilina J. Phys. Conf. Ser. 1844 (1), 012021.
10 (4), 53–57. Haiyan, Yu, Jingsong, Li, Huan, Chen, Xiaoguang, Zhang, Yu, Tian, Yibing, Yang, 2010.
Baralis, Elena, Dalla Valle, Andrea, Garza, Paolo, Rossi, Claudio, Scullino, Francesco, Performance evaluation of post-relational database in hospital information systems.
2017. SQL versus NoSQL databases for geospatial applications. In: 2017 IEEE In: 2010 Second International Workshop on Education Technology and Computer
International Conference on Big Data. Big Data, IEEE, Boston, MA, pp. 3388–3397. Science. IEEE.

14
T. Taipalus The Journal of Systems & Software 208 (2024) 111872

Hajjaji, Yosra, Farah, Imed Riadh, 2018. Performance investigation of selected NoSQL Murazza, Muh. Rafif, Nurwidyantoro, Arif, 2016. Cassandra and SQL database compar-
databases for massive remote sensing image data storage. In: 2018 4th International ison for near real-time Twitter data warehouse. In: 2016 International Seminar on
Conference on Advanced Technologies for Signal and Image Processing. ATSIP, Intelligent Technology and Its Applications. ISITIA, IEEE, Lombok, Indonesia, pp.
IEEE. 195–200.
Hassan, Mahmudul, Bansal, Srividya K., 2018. Semantic data querying over NoSQL Nyati, Suyog S., Pawar, Shivanand, Ingle, Rajesh, 2013. Performance evaluation
databases with Apache Spark. In: 2018 IEEE International Conference on of unstructured NoSQL data over distributed framework. In: 2013 International
Information Reuse and Integration. IRI, IEEE, Salt Lake City, UT, pp. 364–371. Conference on Advances in Computing, Communications and Informatics. ICACCI,
Ilić, Miloš, Kopanja, Lazar, Zlatković, Dragan, Trajković, Milica, Ćurguz, Dejana, 2021. IEEE, Mysore, pp. 1623–1627.
Microsoft SQL Server and Oracle: Comparative performance analysis. In: Book of Ohyver, Margaretha, Moniaga, Jurike V., Sungkawa, Iwa, Subagyo, Bonifasius Edwin,
Proceedings of the 7th International Conference Knowledge Management. Chandra, Ian Argus, 2019. The comparison firebase realtime database and MySQL
Jaiswal, Garima, 2013. Comparative analysis of relational and graph databases. IOSR database performance using wilcoxon signed-rank test. Procedia Comput. Sci. 157,
J. Eng. 03 (08), 25–27. 396–405.
Jandaeng, Chanankorn, 2015. Comparison of RDBMS and document oriented database Parker, Zachary, Poe, Scott, Vrbsky, Susan V., 2013. Comparing NoSQL MongoDB to
in audit log analysis. In: 2015 7th International Conference on Information an SQL DB. In: Proceedings of the 51st ACM Southeast Conference on - ACMSE
Technology and Electrical Engineering. ICITEE, IEEE, Chiang Mai, Thailand, pp. ’13. ACM Press, Savannah, Georgia, p. 1.
332–336. Patil, Mayur M., Hanni, Akkamahadevi, Tejeshwar, C.H., Patil, Priyadarshini, 2017.
Jose, Benymol, Abraham, Sajimon, 2020. Performance analysis of NoSQL and relational A qualitative analysis of the performance of MongoDB vs MySQL database based
databases with MongoDB and MySQL. Mater. Today: Proc. 24, 2036–2043. on insertion and retriewal operations using a web/android application to explore
Jung, Min-Gyue, Youn, Seon-A., Bae, Jayon, Choi, Yong-Lak, 2015. A study on data load balancing — sharding in MongoDB and its advantages. In: 2017 International
input and output performance comparison of MongoDB and PostgreSQL in the big Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud). I-SMAC, IEEE.
data environment. In: 2015 8th International Conference on Database Theory and Pereira, Diogo Augusto, Ourique de Morais, Wagner, Pignaton de Freitas, Edison,
Application. DTA, IEEE, Jeju Island, South Korea, pp. 14–17. 2018. NoSQL real-time database performance comparison. Int. J. Parallel Emergent
Kabakus, Abdullah Talha, Kara, Resul, 2017. A performance evaluation of in-memory Distrib. Syst. 33 (2), 144–156.
databases. J. King Saud Univ. - Comput. Inf. Sci. 29 (4), 520–525. Poljak, R., Poscic, P., Jaksic, D., 2017. Comparative analysis of the selected relational
Kaur, Karambir, Sachdeva, Monika, 2017. Performance evaluation of NewSQL database management systems. In: 2017 40th International Convention on Infor-
databases. In: 2017 International Conference on Inventive Systems and Control. mation and Communication Technology, Electronics and Microelectronics. MIPRO,
ICISC, IEEE. IEEE, Opatija, Croatia, pp. 1496–1500.
Khan, Wisal, Ahmad, Waqas, Luo, Bin, Ahmed, Ejaz, 2019. SQL Database with physical Puangsaijai, Wittawat, Puntheeranurak, Sutheera, 2017. A comparative study of re-
database tuning technique and NoSQL graph database comparisons. In: 2019 lational database and key-value database for big data applications. In: 2017
IEEE 3rd Information Technology, Networking, Electronic and Automation Control International Electrical Engineering Congress. IEECON, IEEE, Pattaya, Thailand, pp.
Conference. ITNEC, IEEE, Chengdu, China, pp. 110–116. 1–4.
Khan, Wisal, ahmed, Ejaz, Shahzad, Waseem, 2017. Predictive performance comparison
Rafamantanantsoa, Fontaine, Laha, Maherindefo, 2018. Analysis and neural networks
analysis of relational & NoSQL graph databases. Int. J. Adv. Comput. Sci. Appl. 8
modeling of web server performances using MySQL and PostgreSQL. Commun.
(5).
Network 10 (04), 142–151.
Khanna, Deepti, Aggarwal, V.B., Director, J.I.M.S., Dave, India Meenu, 2018. Per-
Rautmare, Sharvari, Bhalerao, D.M., 2016. MySQL and NoSQL database comparison for
formance analysis for select, project and join operations of Oracle, My-SQL and
IoT application. In: 2016 IEEE International Conference on Advances in Computer
microsoft access DBMSS. Int. J. Comput. Eng. Technol. (IJCET).
Applications. ICACA, IEEE, Coimbatore, pp. 235–238.
Kumar, Lokesh, Rajawat, Shalini, Joshi, Krati, 2015. Comparative analysis of NoSQL
Ribeiro, Jardel, Henrique, Jonas, Ribeiro, Rodrigo, Neto, Rosalvo, 2017. NoSQL vs
(MongoDB) with MySQL database. Int. J. Modern Trends Eng. Res. 2 (5), 120–127.
relational database: A comparative study about the generation of the most frequent
Kumar, K.B. Sundhara, Srividya, Mohanavalli, S., 2017. A performance comparison
N-grams. In: 2017 4th International Conference on Systems and Informatics. ICSAI,
of document oriented NoSQL databases. In: 2017 International Conference on
IEEE, Hangzhou, pp. 1568–1572.
Computer, Communication and Signal Processing. ICCCSP, IEEE, Chennai, India,
Roopak, K.E., Rao, K.S. Swati, Ritesh, S., Chickerur, Satyadhyan, 2013. Performance
pp. 1–6.
comparison of relational database with object database (DB4o). In: 2013 5th Inter-
Laksono, Dany, 2018. Testing spatial data deliverance in SQL and NoSQL database
national Conference on Computational Intelligence and Communication Networks.
using nodejs fullstack web app. In: 2018 4th International Conference on Science
IEEE.
and Technology. ICST, IEEE, Yogyakarta, pp. 1–5.
Saikia, Amlanjyoti, Joy, Sherin, Dolma, Dhondup, Mary R, Roseline, 2015. Comparative
Lazarska, Malgorzata, Siedlecka-Lamch, Olga, 2019. Comparative study of relational
performance analysis of MySQL and SQL server relational database management
and graph databases. In: 2019 IEEE 15th International Scientific Conference on
systems in windows environment. IJARCCE 160–164.
Informatics. IEEE, pp. 000363–000370.
Lee, Chao-Hsien, Shih, Zhe-Wei, 2018. A comparison of NoSQL and SQL databases over Samanta, Ashis Kumar, Sarkar, Bidut Biman, Chaki, Nabendu, 2018. Query performance
the hadoop and spark cloud platforms using machine learning algorithms. In: 2018 analysis of NoSQL and big data. In: 2018 Fourth International Conference on
IEEE International Conference on Consumer Electronics-Taiwan. ICCE-TW, IEEE, Research in Computational Intelligence and Communication Networks. ICRCICN,
Taichung, pp. 1–2. IEEE.
Li, Yishan, Manoharan, Sathiamoorthy, 2013. A performance comparison of SQL and Schmid, Stephan, Galicz, Eszter, Reinhardt, Wolfgang, 2015a. Performance investigation
NoSQL databases. In: 2013 IEEE Pacific Rim Conference on Communications, of selected SQL and NoSQL databases. In: Proceedings of the AGILE. pp. 1–5.
Computers and Signal Processing. PACRIM, IEEE. Seda, Pavel, Hosek, Jiri, Masek, Pavel, Pokorny, Jiri, 2018. Performance testing of
Lorincz, Josip, Huljic, Vlatka, Begusic, Dinko, 2020. Transforming product catalog rela- NoSQL and RDBMS for storing big data in e-applications. In: 2018 3rd International
tional into graph database: A performance comparison. In: 2020 43rd International Conference on Intelligent Green Building and Smart Grid. IGBSG, IEEE.
Convention on Information, Communication and Electronic Technology. MIPRO, Sharma, Monika, Sharma, Vishal Deep, Bundele, Mahesh M., 2018. Performance
IEEE, Opatija, Croatia, pp. 523–528. analysis of RDBMS and No SQL databases: Postgresql, MongoDB and Neo4j.
Magdum, Junaid, Barhate, Rahul, 2018. Performance analysis of DML operations on In: 2018 3rd International Conference and Workshops on Recent Advances and
NoSQL databases for streaming data. In: 2018 Fourth International Conference on Innovations in Engineering. ICRAIE, IEEE, Jaipur, India, pp. 1–5.
Computing Communication Control and Automation. ICCUBEA, IEEE, Pune, India, Sholichah, Rahmatian Jayanty, Imrona, Mahmud, Alamsyah, Andry, 2020. Performance
pp. 1–6. analysis of Neo4j and MySQL databases using public policies decision making data.
Mahmood, Khalid, Orsborn, Kjell, Risch, Tore, 2019. Comparison of NoSQL datastores In: 2020 7th International Conference on Information Technology, Computer, and
for large scale data stream log analytics. In: 2019 IEEE International Conference Electrical Engineering. ICITACEE, IEEE, Semarang, Indonesia, pp. 152–157.
on Smart Computing. SMARTCOMP, IEEE, Washington, DC, USA, pp. 478–480. Sirish Shetty, B., Akshay, Kc, 2019. Performance analysis of queries in RDBMS vs
Makris, Antonios, Tserpes, Konstantinos, Spiliopoulos, Giannis, Anagnostopoulos, Di- NoSQL. In: 2019 2nd International Conference on Intelligent Computing, Instru-
mosthenis, 2019. Performance evaluation of MongoDB and PostgreSQL for mentation and Control Technologies. ICICICT, IEEE, Kannur,Kerala, India, pp.
spatio-temporal data. In: EDBT/ICDT Workshops. 1283–1286.
Makris, Antonios, Tserpes, Konstantinos, Spiliopoulos, Giannis, Zissis, Dimitrios, Anag- Stancu-Mara, Sorin, Baumann, Peter, 2008. A comparative benchmark of large objects
nostopoulos, Dimosthenis, 2021. MongoDB Vs PostgreSQL: A comparative study on in relational databases. In: Proceedings of the 2008 International Symposium on
performance aspects. GeoInformatica 25 (2), 243–268. Database Engineering & Applications - IDEAS ’08. ACM Press, Coimbra, Portugal,
Marrero, Luciano, Olsowy, Verena, Tesone, Fernando, Thomas, Pablo, Delia, Lisan- p. 277.
dro, Pesado, Patricia, 2020. Performance analysis in NoSQL databases, relational Truica, Ciprian-Octavian, Radulescu, Florin, Boicea, Alexandru, Bucur, Ion, 2015. Per-
databases and NoSQL databases as a service in the cloud. In: Argentine Congress formance evaluation for CRUD operations in asynchronously replicated document
of Computer Science. Springer, pp. 157–170. oriented database. In: 2015 20th International Conference on Control Systems and
Mavrogiorgos, Konstanitnos, Kiourtis, Athanasios, Mavrogiorgou, Argyro, Kyriazis, Di- Computer Science. IEEE, Bucharest, Romania, pp. 191–196.
mosthenis, 2021. A comparative study of MongoDB, ArangoDB and CouchDB for van der Veen, Jan Sipke, van der Waaij, Bram, Meijer, Robert J., 2012. Sensor data
big data storage. In: 2021 5th International Conference on Cloud and Big Data storage performance: SQL or NoSQL, physical or virtual. In: 2012 IEEE Fifth
Computing. ICCBDC, ACM, Liverpool United Kingdom, pp. 8–14. International Conference on Cloud Computing. IEEE.

15
T. Taipalus The Journal of Systems & Software 208 (2024) 111872

Vicknair, Chad, Macias, Michael, Zhao, Zhendong, Nan, Xiaofei, Chen, Yixin, Xu, Wei, Zhou, Zhonghua, Zhou, Hong, Zhang, Wu, Xie, Jiang, 2014. MongoDB
Wilkins, Dawn, 2010. A comparison of a graph database and a relational database: improves big data analysis performance on electric health record system. In:
A data provenance perspective. In: Proceedings of the 48th Annual Southeast Communications in Computer and Information Science. Springer Berlin Heidelberg,
Regional Conference on - ACM SE ’10. ACM Press, Oxford, Mississippi, p. 1. pp. 350–357.
Wei-ping, Zhu, Ming-xin, Li, Huan, Chen, 2011. Using MongoDB to implement textbook Yinfeng Wang, Guiquan Zhong, Lin Kun, Longxiang Wang, Huang Kai, Fuliang Guo,
management system instead of MySQL. In: 2011 IEEE 3rd International Conference Chengzhe Liu, Xiaoshe Dong, 2015. The performance survey of in memory database.
on Communication Software and Networks. IEEE. In: 2015 IEEE 21st International Conference on Parallel and Distributed Systems.
Wiseso, Linggis Galih, Imrona, Mahmud, Alamsyah, Andry, 2020. Performance analysis ICPADS, IEEE, Melbourne, VIC, pp. 815–820.
of Neo4j, MongoDB, and PostgreSQL on 2019 national election big data manage- Zhou, Zhonghai, Zhou, Bin, Li, Wenwen, Griglak, Brian, Caiseda, Carmen, Huang, Qun-
ment database. In: 2020 6th International Conference on Science in Information ying, 2009. Evaluating query performance on object-relational spatial databases.
Technology. ICSITech, IEEE. In: 2009 2nd IEEE International Conference on Computer Science and Information
Technology. IEEE.

16

You might also like