RAM-Disk vs. In-Memory Database Systems: An Embedded Database Performance Benchmark
Abstract: It stands to reason that accessing data from memory will be faster than from
physical media. A new type of database management system, the in-memory database
system (IMDS), claims breakthrough performance and availability via memory-only
processing. But doesn’t database caching achieve the same result? And if complete
elimination of disk access is the goal, why not deploy a traditional database on a RAM-
disk, which creates a file system in memory?
This paper tests the eXtremeDB in-memory embedded database (an IMDS) against the
db.linux embedded database, with db.linux used in both traditional (disk-based) and
RAM-disk modes, running on Red Hat Linux 6.2. Deployment in RAM boosts db.linux’s
performance by as much as 74 percent. But even then, the traditional database lags the
IMDS. Fundamental architectural differences explain the disparity. Overhead hard-wired
into disk-based databases includes data transfer and duplication, unneeded recovery
functions and, ironically, caching logic intended to avoid disk access.
McObject LLC
22525 SE 64th Place
Suite 302
Issaquah, WA 98027
Phone: 425-888-8505
E-mail: [email protected]
It makes sense that maintaining data in memory, rather than retrieving it from disk, will
improve application performance. After all, disk access is one of the few mechanical (as
opposed to electronic) functions integral to processing, and suffers from the slowness of
moving parts. On the software side, disk access also involves a “system call” that is
relatively expensive in terms of performance. The desire to improve performance by
avoiding disk access is the fundamental rationale for database management system
(DBMS) caching and file system caching methods.
This concept has been extended recently with a new type of DBMS, designed to reside
entirely in memory. Proponents of these in-memory database systems (IMDSs) point to
groundbreaking improvements in database speed and availability, and claim the
technology represents an important step forward in both data management and real-time
systems.
But this raises a seemingly obvious question: since caching is available, why not extend its use to cache entire databases and realize the desired performance gains? In addition, RAM-drive utilities exist to create file systems in memory. Deploying a traditional database on such a RAM-disk eliminates physical disk access entirely. Shouldn't its performance equal that of the main-memory database?
This white paper tests the theory. Two nearly identical database structures and
applications are developed to measure performance in reading and writing 30,000
records. The main difference is that one of the embedded databases used in the test,
eXtremeDB, is an IMDS, while the other, db.linux, is designed for disk storage. The
result: while RAM-drive deployment makes the disk-based database significantly faster,
it cannot approach the in-memory database performance. The sections below present the
comparison and explain how caching, data transfer and other overhead sources inherent
in a disk-based database (even on a RAM-drive) cause the performance disparity.
McObject’s eXtremeDB is the first in-memory database created for the embedded
systems market. This DBMS is similar to disk-based embedded databases, such as
db.linux, BerkeleyDB, Empress, C-tree and others, in that all are intended for use by
application developers to provide database management functionality from within an
application. They are “embedded” in the application, as opposed to being a separately
administered server like Microsoft SQL Server, DB2 or Oracle. Each also has a
relatively small footprint when compared to enterprise class databases, and offers a
navigational API for precise control over database operations.
This paper compares eXtremeDB to db.linux, a disk-based embedded database. The open
source db.linux DBMS was chosen because of its longevity (first released in 1986 under
the name db_VISTA) and wide usage. eXtremeDB and db.linux also have similar
database definition languages.
The tests were performed on a PC running Red Hat Linux 6.2, with a 400MHz Intel Celeron processor and 128 megabytes of RAM.
Database Design
The following simple database schemas were developed to compare the two databases' performance writing 30,000 similar objects to a database and reading them back via a key. First, the eXtremeDB schema:
/**********************************************************
* *
* Copyright (c) 2001 McObject, LLC. All Rights Reserved. *
* *
**********************************************************/
struct stuff {
    int2 a;
};
/* a Measure object represents a single sensor reading */
class Measure {
    uint4 sensor_id;
    uint4 timestamp;
    string spectra;
    stuff thing;
    /* a "sensors" index on (sensor_id, timestamp), declared in the full schema,
       backs the Measure_sensors_find lookups used in the benchmark code below */
};
The equivalent db.linux schema:
struct stuff {
    short a;
};
record Measure
{
    long sensor_id;
    long m_timestamp;
    char spectra[1000];
    struct stuff thing;
    /* the compound "sensors" key on sensor_id and m_timestamp is declared here;
       see Appendix A for the schema's database and file declarations */
}
The only meaningful difference between the two schemas is the field 'spectra'. In the eXtremeDB schema it is defined as a 'string' type, whereas in db.linux it is defined as char[1000]. The db.linux implementation will consume 1000 bytes for the spectra field regardless of how many bytes are actually stored in it; in eXtremeDB, a string is a variable-length field. db.linux has no direct counterpart to the eXtremeDB string type, though db.linux network-model sets can be used to emulate variable-length fields with varying degrees of granularity (trading performance for space efficiency). Doing so, however, would have caused significant differences between the two sets of implementation code, making a side-by-side comparison more difficult. eXtremeDB does have a fixed-length character data type, but the variable-length string was used for the comparison because it is the data type explicitly designed for this task.
(An interesting exercise for the reader may be to alter the eXtremeDB implementation to
use a char[1000] type for spectra, and to alter the db.linux implementation to employ the
variable length field implementation. The pseudo-code for implementing this is shown in
Appendix A).
Benchmark Application
The first half of the test application populates the database with 30,000 instances of the
‘Measure’ class/record.
The eXtremeDB implementation allocates memory for the database, allocates memory for a pool of randomized strings, opens the database, and establishes a connection to it:
if ( !start_mem ) {        /* start_mem: memory allocated for the database (allocation not shown) */
    printf( "\nToo bad ..." );
    exit( 1 );
}
make_strings();            /* build the pool of random 'spectra' strings */
/* the database is then opened and a connection established (calls not shown) */
The db.linux implementation performs the corresponding setup, beginning by opening a task context:
stat = d_opentask(&task);
From this point, both implementations enter two loops: 100 iterations for the outer loop,
300 iterations for the inner loop (total 30,000).
To add a record to eXtremeDB, a write transaction is started and space is reserved for a new object in the database (Measure_new). The sensor_id and timestamp fields are then put to the object, a random string is taken from the pool created earlier and put to the object, and the transaction is committed.
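A minimal sketch of that insert loop follows. It assumes the generated put-accessors mirror the get-accessors used in the read loop shown later, that MCO_READ_WRITE is the write counterpart of the MCO_READ_ONLY transaction type, and that strings[] stands in for the pool built by make_strings(); none of these names should be taken as the literal benchmark source.
/* sketch only: accessor and variable names are assumed, not taken from the benchmark source */
for ( sensor_num = 0; sensor_num < SENSORS; sensor_num++ ) {
    for ( measure_num = 0; measure_num < MEASURES; measure_num++ ) {
        mco_trans_start( db, MCO_READ_WRITE, MCO_TRANS_FOREGROUND, &t );
        rc = Measure_new( t, &measure );                      /* reserve space for a new object */
        rc = Measure_sensor_id_put( &measure, sensor_num );
        rc = Measure_timestamp_put( &measure, sensor_num + measure_num );
        rc = Measure_spectra_put( &measure, strings[measure_num],   /* random string from the pool */
                                  (uint2) strlen( strings[measure_num] ) );
        rc = mco_trans_commit( t );                           /* commit makes the object visible */
    }
}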
A second pair of nested loops is set up to conduct the performance evaluation of reading
the 30,000 objects previously created.
for ( sensor_num = 0; sensor_num < SENSORS; sensor_num++ ) {
    uint2 len;
    for ( measure_num = 0; measure_num < MEASURES; measure_num++ ) {
        mco_trans_start( db, MCO_READ_ONLY, MCO_TRANS_FOREGROUND, &t );
        rc = Measure_sensors_index_cursor( t, &csr );
        rc = Measure_sensors_find( t, &csr, MCO_EQ, sensor_num,
                                   sensor_num + measure_num );
        if ( rc != 0 ) {
            rc = mco_trans_commit( t );
            goto rep2;
        }
        rc = Measure_from_cursor( t, &csr, &measure );
        /* read the spectra */
        rc = Measure_spectra_get( &measure, str, sizeof(str), &len );
        rc = Measure_sensor_id_get( &measure, &id );
        rc = Measure_timestamp_get( &measure, &ts );
        rc = mco_trans_commit( t );
    }
}
The eXtremeDB implementation sets up the loops and, for each iteration, starts a read
transaction, instantiates a cursor, and finds the Measure object by its key fields. Upon
successfully finding the object, an object handle is initialized from the cursor and the
object’s fields are read from the object handle. Lastly, the transaction is completed.
For the db.linux implementation, the two loops are set up and on each iteration, the key
search values are assigned to a structure’s fields. db.linux does not use transactions for
read-only access, but requires that the record-type be explicitly locked. Upon
successfully acquiring the record lock, the structure holding the key lookup values is
passed to the d_keyfind function. If the key values are found, the record is read into the
same structure by d_recread and the record lock is released.
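A condensed sketch of the corresponding db.linux loop body appears below. Only d_keyfind, d_recread and the S_OKAY status code are drawn from the behavior described above; the SENSORS_KEY constant is a hypothetical stand-in for the compound key's generated identifier, task and database parameters are omitted, and the record-type lock calls are indicated only as comments.
struct Measure m;   /* host-language structure generated from the db.linux schema */

/* inside the same nested loops as the eXtremeDB version: */
m.sensor_id   = sensor_num;            /* assign the key lookup values */
m.m_timestamp = sensor_num + measure_num;

/* acquire a read lock on the Measure record type (lock call not shown) */
stat = d_keyfind( SENSORS_KEY, &m );   /* locate the record by its compound key */
if ( stat == S_OKAY )
    stat = d_recread( &m );            /* read the record into the same structure */
/* release the record-type lock (unlock call not shown) */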
As alluded to above, the key implementation differences revolve around transactions and multi-user (multi-threaded) concurrent access. (There is also a philosophical difference between eXtremeDB's object-oriented approach to database access and db.linux's record-oriented approach, but it is unrelated to in-memory versus disk-based operation, so we do not explore it here.)
With eXtremeDB, all concurrency controls are implicit, only requiring that all database
access occur within the scope of a read or write transaction. In contrast, db.linux requires
the application to explicitly acquire read or write record type locks, as appropriate, prior
to attempting to access the record type. Because db.linux requires explicit locking, it
does not require a transaction for read-only access.
The following chart depicts the relative performance of eXtremeDB and db.linux in a multi-threaded, transaction-controlled environment, with db.linux maintaining the database files on disk, as it naturally does.

[Chart: eXtremeDB (main memory) vs. db.linux (disk drive)]
    eXtremeDB read        1
    eXtremeDB write       2.6
    db.linux read        16.25
    db.linux write     3118.25
The next chart shows the performance of the same eXtremeDB implementation used above, alongside db.linux with the database files on a RAM-disk, completely eliminating physical disk access (for details on the implementation of this RAM-disk on Red Hat Linux 6.2, see Appendix B).

[Chart: eXtremeDB (main memory) vs. db.linux (RAM-drive)]
    eXtremeDB read        1
    eXtremeDB write       2.6
    db.linux read         4.2
    db.linux write     1093
Clearly, deployment on the RAM-drive boosts db.linux's performance: its read time falls by roughly 74 percent and its write time by roughly 65 percent. But it is equally obvious that the database fundamentally designed for in-memory use delivers superior performance. The in-memory database still outperforms the RAM-deployed, disk-based database by 420X for database writes, and by more than 4X for database reads. The following sections analyze the reasons for this disparity.
The RAM-drive approach eliminates physical disk access. So why does the disk-based database still lag the in-memory database in performance? The problem is that disk-based databases incorporate processes that are irrelevant for in-memory processing, and the RAM-drive deployment does not change this internal functioning. These processes "go through the motions" even when no longer needed, adding several distinct types of performance overhead.
Caching overhead
Due to the significant performance drain of physical disk access, virtually all disk-based
databases incorporate sophisticated techniques to minimize the need to go to disk.
Foremost among these is database caching, which strives to keep the most frequently
used portions of the database in memory. Caching logic includes cache synchronization,
which makes sure that an image of a database page in cache is consistent with the
physical database page on disk, to prevent the application from reading invalid data.
Another process, cache lookup, determines if data requested by the application is in cache
and, if not, retrieves the page and adds it to the cache for future reference. It also selects
data to be removed from cache, to make room for incoming pages. If the outgoing page
is “dirty” (holds one or more modified records), additional logic is invoked to protect
other applications from seeing the modified data until the transaction is committed.
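To make this bookkeeping concrete, the following is a generic illustration (not db.linux source) of the lookup path a page-caching DBMS runtime executes on every page request, including LRU eviction and dirty-page write-back; the structure and names are assumptions for illustration only.
#include <unistd.h>    /* lseek, read, write */

#define CACHE_PAGES  64
#define DB_PAGE_SIZE 4096

typedef struct {
    long page_no;                 /* database page held in this slot (-1 = empty) */
    int  dirty;                   /* modified since it was read from the file?    */
    long last_used;               /* timestamp for least-recently-used eviction   */
    char data[DB_PAGE_SIZE];
} cache_slot;

static cache_slot cache[CACHE_PAGES];   /* assumed pre-initialized with page_no = -1 */
static long clock_tick;

/* return the slot holding page_no, fetching from the file and evicting as necessary */
static cache_slot *cache_lookup( int fd, long page_no )
{
    int i, victim = 0;

    for ( i = 0; i < CACHE_PAGES; i++ ) {
        if ( cache[i].page_no == page_no ) {          /* cache hit */
            cache[i].last_used = ++clock_tick;
            return &cache[i];
        }
        if ( cache[i].last_used < cache[victim].last_used )
            victim = i;                               /* remember the least recently used slot */
    }

    if ( cache[victim].dirty ) {                      /* write back a dirty page before reuse */
        lseek( fd, cache[victim].page_no * DB_PAGE_SIZE, SEEK_SET );
        write( fd, cache[victim].data, DB_PAGE_SIZE );
    }

    lseek( fd, page_no * DB_PAGE_SIZE, SEEK_SET );    /* cache miss: fetch the requested page */
    read( fd, cache[victim].data, DB_PAGE_SIZE );
    cache[victim].page_no   = page_no;
    cache[victim].dirty     = 0;
    cache[victim].last_used = ++clock_tick;
    return &cache[victim];
}
Even when the file descriptor refers to a RAM-disk, every one of these steps still executes.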
These caching functions impose only minor overhead individually, but significant overhead in aggregate. Each process plays out every time the application makes a function call that reads a record from disk or writes one back (in the case of db.linux, examples are d_recfrst, d_recnext, d_findnm, d_keyfind, d_fillnew, etc.). In the demonstration application above, this amounts to some 90,000 function calls: 30,000 d_fillnew, 30,000 d_keyfind and 30,000 d_recread. In contrast, all records in a main memory database such as eXtremeDB are always in memory, and therefore incur no caching overhead at all.
Recovery overhead
Disk-based databases also carry recovery mechanisms, notably transaction logging, whose cost persists even when the database files reside on a RAM-drive. An in-memory database makes a different assumption about durability and dispenses with this machinery. In the event of catastrophic failure, the in-memory database image is lost, which suits IMDSs' intended applications. If the system is turned off or some other event causes the in-memory image to expire, the database is simply re-provisioned upon restart. Examples of this include a program guide application in a set-top box that is continually downloaded from a satellite or cable head-end, a network switch that discovers network topology on startup, or a wireless access point that is provisioned by a server upstream.
This does not preclude the use of saved local data. The application can open a stream (a
socket, pipe, or a file pointer) and instruct eXtremeDB to read or write a database image
from, or to, the stream. This feature could be used to create and maintain boot-stage data,
i.e. an initial starting point for the database. The other end of the stream can be a pipe to
another process, or a file system pointer (any file system, whether it’s magnetic, optical,
or FLASH). However, eXtremeDB’s transaction processing operates independently from
these capabilities, limiting its scope to main memory processing in order to provide
maximum availability.
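As an illustration of the stream concept only, a save of the database image to a file might be coded along the lines below; the mco_db_save entry point and the writer-callback signature are assumed names for the capability the text describes, not a statement of the actual eXtremeDB API.
#include <stdio.h>

/* hypothetical writer callback: the database runtime would invoke it repeatedly with
 * successive chunks of the in-memory image; the "stream" here is a FILE pointer, but
 * it could just as easily wrap a socket or pipe descriptor                           */
static int file_stream_writer( void *stream_handle, const void *buf, unsigned int nbytes )
{
    return (int) fwrite( buf, 1, nbytes, (FILE *) stream_handle );
}

/* save a boot-stage image of the database; mco_db_save is an assumed function name */
static int save_boot_image( void *db /* database handle */, const char *path )
{
    FILE *f = fopen( path, "wb" );
    int   rc;

    if ( !f )
        return -1;
    rc = mco_db_save( (void *) f, file_stream_writer, db );
    fclose( f );
    return rc;
}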
Data transfer and duplication
With a disk-based embedded database such as db.linux, data is transferred and copied extensively. In fact, the application works with a copy of the data, contained in a program variable, that is several times removed from the database. Consider the "handoffs" required for an application to read a piece of data from the disk-based database, modify it, and write that piece of data back to the database.
1. The application requests the data item from the database runtime through some
database API (e.g. db.linux’s d_recread function).
2. The database runtime instructs the file system to retrieve the data from the
physical media (or memory-based storage location, in the case of a RAM-disk).
3. The file system makes a copy of the data for its cache and passes another copy to
the database.
4. The database keeps one copy in its cache and passes another copy to the
application.
5. The application modifies its copy and passes it back to the database through some
database API (e.g. db.linux’s d_recwrite function).
6. The database runtime copies the modified data item back to database cache.
7. The copy in the database cache is eventually written to the file system, where it is
updated in the file system cache.
8. Finally, the data is written back to the physical media (or RAM-disk).
In this scenario there are 4 copies of the data (application copy, database cache, file
system cache, file system) and 6 transfers to move the data from the file system to the
application and back to the file system. And this simplified scenario doesn’t account for
additional copies and transfers that are required for transaction logging!
In contrast, an in-memory database such as eXtremeDB requires little or no data transfer. The application may make copies of the data in local program variables for its own purposes or convenience, but eXtremeDB does not require it to. Instead, eXtremeDB gives the application a pointer that refers directly to the data item in the database, enabling the application to work with the data directly. The data is still protected, because the pointer is only used through the eXtremeDB-provided API, which ensures that it is used properly.
A RAM-disk database still uses the underlying file system to access data within the database. Therefore, it still relies on the file system function lseek() to locate the data. Differing implementations of lseek() (for disk file systems as well as RAM-disks) will perform better or worse depending on the quality of the implementation, but the DBMS has no knowledge of, or control over, this performance factor. In contrast, eXtremeDB's access methods are entirely under its own control and are highly optimized.
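The contrast can be reduced to a few illustrative lines of C (fd, page_buf, db_base and the page and offset variables are placeholder names, not code from either product):
/* disk-based engine, even on a RAM-disk: every page fetch is a file-system round trip */
lseek( fd, (off_t) page_no * DB_PAGE_SIZE, SEEK_SET );
read( fd, page_buf, DB_PAGE_SIZE );

/* in-memory engine: the record's address is computed directly, with no system calls */
char *rec = db_base + page_no * DB_PAGE_SIZE + offset_in_page;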
Conclusion
While not this paper's primary focus, two other benefits of in-memory database systems emerge from the experiment above. One is database footprint: the absence of caching functions and other unnecessary logic means that memory and storage demands
are correspondingly low. In fact, the eXtremeDB database maintained a total RAM
footprint of 108K in this test and 20.85MB when fully loaded with data (the raw data size
is 16.7MB), compared to db.linux’s footprint of 323K and 31.8MB with data (raw data is
the same, 16.7MB). The second benefit is greater reliability stemming from a less
complex database system architecture. It stands to reason that with fewer interacting
processes, this streamlined database system should result in fewer negative surprises for
end-users and developers.
Further Reading
Information about the eXtremeDB in-memory embedded database:
http://www.mcobject.com/extremedbfamily.shtml
Appendix A – Emulating a variable-length string field in db.linux
To emulate a variable-length string field with db.linux, alter the database schema as follows:
/**********************************************************
* *
* Copyright (c) 2001 McObject, LLC. All Rights Reserved. *
* *
**********************************************************/
struct stuff {
    short a;
};
database mcs
{
    data file "mcs.dat" contains Measure, Text100, Text200, Text300;
    key file "mcs.key" contains sensors;
    record Measure
    {
        long sensor_id;
        long m_timestamp;
        struct stuff thing;
        /* the compound "sensors" key on sensor_id and m_timestamp is declared here, as before */
    }
    /* fixed-size segment records; the sizes shown are illustrative */
    record Text100 { char spectra100[101]; }
    record Text200 { char spectra200[201]; }
    record Text300 { char spectra300[301]; }
    /* multi-member, network-model set linking a Measure to its string segments */
    set SPECTRA { order last; owner Measure; member Text100; member Text200; member Text300; }
}
To store a Measure record and its segmented spectra string, the application then follows pseudo-code along these lines:
d_fillnew( MEASURE )
d_setor( SPECTRA )
char *p = spectra
do {
    if strlen(p) >= sizeof_spectra300
        strncpy( Text300.spectra300, p, sizeof_spectra300 )
        d_fillnew the Text300 record
        d_connect( SPECTRA )
        p += sizeof_spectra300
    else if strlen(p) >= sizeof_spectra200
        strncpy( Text200.spectra200, p, sizeof_spectra200 )
        d_fillnew the Text200 record
        d_connect( SPECTRA )
        p += sizeof_spectra200
    else                            /* remainder fits in the smallest segment */
        strncpy( Text100.spectra100, p, sizeof_spectra100 )
        d_fillnew the Text100 record
        d_connect( SPECTRA )
        p = NULL
} while (p)
Note: the above pseudo-code is greatly simplified and does not cover all of the border
conditions. The general idea is to break off the largest piece of the spectra string possible
and store it in the appropriately sized TextNNN record and create a linked list of these
records with db.linux’s multi-member network model set, named SPECTRA in this
example.
When retrieving the data, the linked list is traversed, concatenating the segmented spectra
string back together into the whole:
d_keyfind( MEASURE )
d_recread( MEASURE )
d_setor( SPECTRA )
char spectra[1000];
spectra[0] = '\0';    /* start with an empty string before concatenating segments */
for ( stat = d_findfm(SPECTRA); stat != S_EOS; stat = d_findnm(SPECTRA) ) {
    d_recread( &text300rec )
    strcat( spectra, text300rec.spectra300 )
}
The code to reassemble the string iterates over the set, reading each member record and concatenating the string segment onto the whole. Again, the pseudo-code is simplified to illustrate the primary logic of the variable-length string technique.
Appendix B – RAM-Disk configuration