VoltDB
7.0.3
March 4, 2021
This document provides information about known issues and limitations to the current release of VoltDB. If you encounter any problems not listed below, please be sure to report them to [email protected]. Thank you.
VoltDB 7.0 was a major release incorporating features from previous point releases plus new capabilities. The major new features in V7.0 and later include:
Multi-cluster Cross-Datacenter Replication (XDCR) — XDCR supports active replication between clusters. XDCR support has been extended from just two clusters to three or more clusters. See the chapter on "Database Replication" in the Using VoltDB manual for details.
Improved configuration for database replication (DR) — The configuration of DR clusters has been consolidated into a single configuration file element, eliminating the need for special SQL statements and start command flags to identify the type of DR cluster being created. The role attribute of the <dr> element (<dr role="{type}">) lets you explicitly identify the type of cluster being started: master, replica, or xdcr.
Simplified software upgrades — The process for upgrading VoltDB software on a single cluster has been simplified, removing the need to re-initialize and manually restore the data. Upgrading from a recent version now consists of performing a shutdown --save, upgrading the software, and restarting the database. See the section on "Upgrading VoltDB Software" in the VoltDB Administrator's Guide for details.
Views on table joins — VoltDB now supports materialized views on the join of two or more tables. See the description of the CREATE VIEW statement for details.
Window functions — VoltDB now supports six window functions: RANK, DENSE_RANK, COUNT, SUM, MAX, and MIN. The window functions allow more selective calculations on the statement results than can be achieved with plain aggregation functions. See the description of the SELECT statement for details.
Increased availability and robustness — Over the past few months extensive work has been done to harden VoltDB against common issues associated with distributed systems. Efforts include working with the Jepsen tests to identify and eliminate the last few edge cases related to data reliability and redesigning the partitioning algorithms to increase the availability beyond K factor guarantees in most cases. All of these changes have occurred "under the covers", providing added reliability and availability to customers with no changes to existing applications.
Additional options for reporting and monitoring — VoltDB continues to add to the statistics and other information available from its servers. The statistics around DR have been expanded and improved to provide better visibility into the current state of the DR clusters. In particular, the @Statistics DRROLE selector provides an overview of the cluster's role and current state. VoltDB also now provides SNMP traps for a number of important cluster events, such as server outages and exceeding resource caps.
VoltDB Management Center improvements — The monitoring tab of the VoltDB Management Center has been rearranged to improve usability. Charts and graphs related to two features — export and database replication (DR) — have been broken out into tabs of their own, simplifying the Monitor tab. The SQL Query tab has also been enhanced to support multiple queries and allow the user to resize the query and results panes.
Most of the new features and capabilities in VoltDB V7.0 do not impact existing applications. However, a few changes that simplify and extend existing functionality do require minor changes to the configuration when upgrading from earlier versions. Also, several deprecated features have now been removed. Existing customers should take note of the following changes:
Supported platforms
Ubuntu 12.04 is no longer a supported production platform for VoltDB. The currently supported operating systems for running production VoltDB databases are:
CentOS V6.6 or later, including CentOS 7.0 and later
Red Hat (RHEL) V6.6 or later, including Red Hat 7.0 and later
Ubuntu 14.04 and 16.04
Changes to DR configuration
The configuration of DR clusters has been improved by aggregating the server settings into one place within the configuration file. In the <dr> element, the role attribute identifies the role of the cluster, obviating the need for the SET DR=ACTIVE statement in the schema and the --replica flag on the start command. These old syntax elements are now deprecated, and VoltDB V7.0 generates a warning when they are encountered, reminding the user to use the new syntax.
For users running passive DR, be sure to add the role="replica" attribute to the <dr> element in the configuration file of your replica cluster. For XDCR users, add the attribute role="xdcr" as a replacement for the SET DR=ACTIVE statement.
Old export syntax removed
The old syntax for declaring and configuring export tables has been removed from the product. This means the EXPORT TABLE statement and <export> elements without explicit <configuration> sub-elements no longer work. If you have not already migrated your export configuration to the new syntax, you must do the following before using VoltDB V7.0:
In the schema, replace CREATE TABLE and EXPORT TABLE statements with CREATE STREAM EXPORT TO...
In the configuration file, replace the single <export> element with an <export> element containing one or more <configuration> sub-elements.
See the chapter on "Importing and Exporting Live Data" in the Using VoltDB manual.
Catalog mode removed
Catalog mode (compiling and loading a separate catalog JAR file) was deprecated more than two years ago. It has now been disabled in the product. It is no longer possible to compile or load a standalone catalog. Instead, please use interactive DDL for the schema and the LOAD CLASSES command in sqlcmd to load stored procedures.
Deprecated Features
The following features are deprecated as of VoltDB V7.0. Although they continue to work in V7.x, we strongly recommend users migrate to the replacement features as the deprecated features will be removed in a future major release.
SHA-1 hashing — VoltDB supports passing credentials using either SHA-1 or SHA-2 hashing. However, SHA-1 hashing is not sufficiently strong for most applications and is therefore deprecated. Use of SHA-2 hashing is recommended.
PARTITION PROCEDURE — The standalone PARTITION PROCEDURE statement is deprecated. Instead, please include the partitioning information as part of the CREATE PROCEDURE statement using the PARTITION ON clause. The combined CREATE PROCEDURE statement is not only more efficient, it also avoids certain edge cases where procedures cannot be declared and partitioned separately and makes the schema of the procedure more self-explanatory.
The process for upgrading from the recent versions of VoltDB is as follows:
Shut down the database, creating a final snapshot (using voltadmin shutdown --save).
Upgrade the VoltDB software.
Restart the database (using voltdb start).
For DR clusters, see the section on "Upgrading VoltDB Software" in the VoltDB Administrator's Guide for more special considerations related to DR upgrades.
Support for upgrading using shutdown --save was only added in V6.8. If you are upgrading from older versions of VoltDB, you will need to save and restore the snapshot manually. The procedure to do that is as follows:
Place the database in admin mode (using voltadmin pause).
Perform a manual snapshot of the database (using voltadmin save --blocking).
Shut down the database (using voltadmin shutdown).
Upgrade the VoltDB software.
Initialize a new database root directory (using the voltdb init --force action).
Start the database in admin mode (using the voltdb start --pause action).
Restore the snapshot created in Step #2 (using voltadmin restore).
Return the database to normal operations (using voltadmin resume).
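The manual procedure above can be sketched as an ordered command list, which is useful for reviewing the sequence before running anything. This is a minimal sketch, not a supported tool: the function name is made up for illustration, and the snapshot directory and snapshot ID arguments passed to voltadmin save and voltadmin restore are assumptions that should be checked against the VoltDB documentation for your version.

```python
from typing import List


def manual_upgrade_commands(snapshot_dir: str, snapshot_id: str) -> List[str]:
    """Return the manual save-and-restore upgrade steps in order.

    snapshot_dir and snapshot_id are hypothetical placeholders identifying
    where the snapshot is written and how it is named.
    """
    return [
        "voltadmin pause",                                          # 1. enter admin mode
        f"voltadmin save --blocking {snapshot_dir} {snapshot_id}",  # 2. manual snapshot
        "voltadmin shutdown",                                       # 3. stop the database
        "# upgrade the VoltDB software on every node",              # 4. not a command
        "voltdb init --force",                                      # 5. new root directory
        "voltdb start --pause",                                     # 6. start in admin mode
        f"voltadmin restore {snapshot_dir} {snapshot_id}",          # 7. restore step 2 snapshot
        "voltadmin resume",                                         # 8. normal operations
    ]


if __name__ == "__main__":
    for step, command in enumerate(manual_upgrade_commands("/tmp/voltdb/backup", "upgrade"), 1):
        print(f"{step}. {command}")
```

The list only prints the steps; it does not execute them, because steps 4 through 6 normally need to be performed on each server in the cluster.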
For customers upgrading from V5.x or earlier releases of VoltDB, please see the V5.0 Upgrade Notes.
For customers upgrading from V4.x or earlier releases of VoltDB, please see the V4.0 Upgrade Notes.
In addition to the new features listed in the section called “What's New in VoltDB V7.0 and Later”, users of previous versions of VoltDB should take note of the following changes that might impact their existing applications.
The following are known limitations to the current release of VoltDB. Workarounds are suggested where applicable. However, it is important to note that these limitations are considered temporary and are likely to be corrected in future releases of the product.
1. Command Logging | |
1.1. | Do not use the subfolder name "segments" for the command log snapshot directory. |
VoltDB reserves the subfolder "segments" under the command log directory for storing the actual command log files. Do not add, remove, or modify any files in this directory. In particular, do not set the command log snapshot directory to a subfolder "segments" of the command log directory, or else the server will hang on startup. | |
2. Database Replication | |
2.1. | Some DR data may not be delivered if master database nodes fail and rejoin in rapid succession. |
Because DR data is buffered on the master database and then delivered asynchronously to the replica, there is always the danger that data does not reach the replica if a master node stops. This situation is mitigated in a K-safe environment by all copies of a partition buffering on the master cluster. Then if a sending node goes down, another node on the master database can take over sending logs to the replica. However, if multiple nodes go down and rejoin in rapid succession, it is possible that some buffered DR data (from transactions when one or more nodes were down) could be lost when another node with the last copy of that buffer also goes down. If this occurs and the replica recognizes that some binary logs are missing, DR stops and must be restarted. To avoid this situation, especially when cycling through nodes for maintenance purposes, the key is to ensure that all buffered DR data is transmitted before stopping the next node in the cycle. You can do this using the @Statistics system procedure to make sure the last ACKed timestamp (using @Statistics DR on the master cluster) is later than the timestamp when the previous node completed its rejoin operation. | |
2.2. | Avoid bulk data operations within a single transaction when using database replication |
Bulk operations, such as large deletes, inserts, or updates, are possible within a single stored procedure. However, if the binary logs generated for DR are larger than 45MB, the operation will fail. To avoid this situation, it is best to break up large bulk operations into multiple, smaller transactions. A general rule of thumb is to multiply the size of the table schema by the number of affected rows. For deletes and inserts, this value should be under 45MB to avoid exceeding the DR binary log size limit. For updates, this number should be under 22.5MB (because the binary log contains both the starting and ending row values for updates). | |
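The rule of thumb above can be turned into a small calculation for choosing a batch size. This is a minimal sketch under stated assumptions: the function names are illustrative, 1MB is taken to be 1024 * 1024 bytes, and row_size_bytes is your own estimate of the size of one row of the table.

```python
from typing import List

# Assumption: the 45MB limit is interpreted as 45 * 1024 * 1024 bytes.
DR_BINARY_LOG_LIMIT_BYTES = 45 * 1024 * 1024


def max_rows_per_transaction(row_size_bytes: int, is_update: bool = False) -> int:
    """Largest row count whose estimated binary log stays under the limit.

    Updates log both the starting and ending row values, so their budget
    is half the limit (22.5MB).
    """
    if row_size_bytes <= 0:
        raise ValueError("row_size_bytes must be positive")
    budget = DR_BINARY_LOG_LIMIT_BYTES // 2 if is_update else DR_BINARY_LOG_LIMIT_BYTES
    # Subtract one byte so the estimate stays strictly under the budget.
    return (budget - 1) // row_size_bytes


def batch_sizes(total_rows: int, row_size_bytes: int, is_update: bool = False) -> List[int]:
    """Split a bulk operation into per-transaction row counts."""
    limit = max_rows_per_transaction(row_size_bytes, is_update)
    if limit == 0:
        raise ValueError("a single row exceeds the binary log budget")
    full, remainder = divmod(total_rows, limit)
    return [limit] * full + ([remainder] if remainder else [])


if __name__ == "__main__":
    print(max_rows_per_transaction(1024))        # inserts or deletes of 1KB rows
    print(max_rows_per_transaction(1024, True))  # updates of 1KB rows
    print(batch_sizes(100_000, 1024))
```

Because the row size is only an estimate, it is reasonable to leave extra headroom rather than batch right up to the computed maximum.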
2.3. | Database replication ignores resource limits |
There are a number of VoltDB features that help manage the database by constraining memory size and resource utilization. These features are extremely useful in avoiding crashes as a result of unexpected or unconstrained growth. However, these features could interfere with the normal operation of DR when passing data from one cluster to another, especially if the two clusters are different sizes. Therefore, as a general rule of thumb, DR overrides these features in favor of maintaining synchronization between the two clusters. Specifically, DR ignores any resource monitor limits defined in the deployment file when applying binary logs on the consumer cluster. DR also ignores any partition row limits defined in the database schema when applying binary logs. This means, for example, if the replica database in passive DR has less memory or fewer unique partitions than the master, it is possible that applying binary logs of transactions that succeeded on the master could cause the replica to run out of memory. Note that these resource monitor and table row limits are still applied to any original transactions local to the cluster (for example, transactions on the master database in passive DR). | |
2.4. | Different cluster sizes can require additional Java heap |
Database Replication (DR) now supports replication across clusters of different sizes. However, if the replica cluster is smaller than the master cluster, it may require a significantly larger Java heap setting. Specifically, if the replica has fewer unique partitions than the master, each partition on the replica must manage the incoming binary logs from more partitions on the master, which places additional pressure on the Java heap. A simple rule of thumb is that the worst case scenario could require an additional P * R * 20MB of space in the Java heap, where P is the number of sites per host on the replica server and R is the ratio of unique partitions on the master to partitions on the replica. For example, if the master cluster is 5 nodes with 10 sites per host and a K factor of 1 (i.e. 25 unique partitions) and the replica cluster is 3 nodes with 8 sites per host and a K factor of 1 (12 unique partitions), the Java heap on the replica cluster may require approximately 320MB of additional space (8 sites per host * a ratio of roughly 2 * 20MB):
Sites-per-host * master/replica ratio * 20MB
An alternative is to reduce the size of the DR buffers on the master cluster by setting the DR_MEM_LIMIT Java property. For example, you can reduce the DR buffer size from the default 10MB to 5MB using the VOLTDB_OPTS environment variable before starting the master cluster:
$ export VOLTDB_OPTS="-DDR_MEM_LIMIT=5"
$ voltdb start
Changing the DR buffer limit on the master from 10MB to 5MB proportionally reduces the additional heap size needed. So in the previous example, the additional heap on the replica is reduced from 320MB to 160MB. | |
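The heap rule of thumb above can be checked with a short calculation. This is a minimal sketch with illustrative function names. It assumes the 20MB figure corresponds to the default 10MB DR buffer and scales linearly with DR_MEM_LIMIT, as the text describes. Note that it uses the exact partition ratio (25/12), so it yields about 333MB where the worked example rounds the ratio to 2 and arrives at 320MB.

```python
def unique_partitions(nodes: int, sites_per_host: int, k_factor: int) -> int:
    """Unique partitions in a cluster: total sites divided by copies per partition."""
    return nodes * sites_per_host // (k_factor + 1)


def extra_replica_heap_mb(
    master_partitions: int,
    replica_partitions: int,
    replica_sites_per_host: int,
    dr_buffer_mb: float = 10.0,
) -> float:
    """Worst-case additional Java heap (MB) on the replica: P * R * 20MB.

    The 20MB factor is scaled in proportion to the DR buffer size,
    relative to the default of 10MB.
    """
    ratio = master_partitions / replica_partitions
    per_site_mb = 20.0 * (dr_buffer_mb / 10.0)
    return replica_sites_per_host * ratio * per_site_mb


if __name__ == "__main__":
    master = unique_partitions(nodes=5, sites_per_host=10, k_factor=1)
    replica = unique_partitions(nodes=3, sites_per_host=8, k_factor=1)
    print(master, replica)
    print(round(extra_replica_heap_mb(master, replica, 8)))
    print(round(extra_replica_heap_mb(master, replica, 8, dr_buffer_mb=5)))
```

Since this is a worst-case estimate, rounding up rather than down is the safer choice when sizing the heap.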
3. Cross Datacenter Replication (XDCR) | |
3.1. | Avoid replicating tables without a unique index. |
Part of the replication process for XDCR is to verify that the record's starting and ending states match on both clusters, otherwise known as conflict resolution. To do that, XDCR must find the record first. Finding uniquely indexed records is efficient; finding non-unique records is not and can impact overall database performance. To make you aware of possible performance impact, VoltDB issues a warning if you declare a table as a DR table and it does not have a unique index. | |
3.2. | When starting XDCR for the first time, only one database can contain data. |
You cannot start XDCR if both databases already have data in the DR tables. Only one of the two participating databases can have preexisting data when DR starts for the first time. | |
3.3. | During the initial synchronization of existing data, the receiving database is paused. |
When starting XDCR for the first time, where one database already contains data, a snapshot of that data is sent to the other database. While receiving and processing that snapshot, the receiving database is paused. That is, it is in read-only mode. Once the snapshot is completed and the two databases are synchronized, the receiving database is automatically unpaused, resuming normal read/write operations. | |
3.4. | A large number of multi-partition write transactions may interfere with the ability to restart XDCR after a cluster stops and recovers. |
Normally, XDCR will automatically restart where it left off after one of the clusters stops and recovers from its command logs (using the voltdb recover command). However, if the workload is predominantly multi-partition write transactions, a failed cluster may not be able to restart XDCR after it recovers. In this case, XDCR must be restarted from scratch, using the content from one of the clusters as the source for synchronizing and recreating the other cluster (using the voltdb create --force command) without any content in the DR tables. | |
3.5. | A TRUNCATE TABLE transaction will be reported as a conflict with any other write operation to the same table. |
When using XDCR, if the binary log from one cluster includes a TRUNCATE TABLE statement and the other cluster performs any write operation to the same table before the binary log is processed, the TRUNCATE TABLE operation will be reported as a conflict. Note that currently DELETE operations always supersede other actions, so the TRUNCATE TABLE will be executed on both clusters. | |
3.6. | Exceeding a LIMIT PARTITION ROWS constraint can generate multiple conflicts |
It is possible to place a limit on the number of rows that any partition can hold for a specific table using the LIMIT PARTITION ROWS clause of the CREATE TABLE statement. When close to the limit, transactions on either or both clusters can exceed the limit simultaneously, resulting in a potentially large number of delete operations that then generate conflicts when the associated binary log reaches the other cluster. | |
3.7. | IDs generated by the VoltProcedure.getUniqueId method are unique within a cluster, not across clusters. |
VoltDB provides a way to generate a deterministically unique ID within a stored procedure using the getUniqueId method. This method guarantees uniqueness within the current cluster. However, the method could generate the same ID on two distinct database clusters. Consequently, when using XDCR, you should combine the return values of VoltProcedure.getUniqueId with VoltProcedure.getClusterId, which returns the current cluster's unique DR ID, to generate IDs that are unique across all clusters in your environment. | |
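The idea of combining the two values can be illustrated independently of the stored procedure API. This is a minimal sketch in Python rather than Java: the function name and the "cluster:id" string format are assumptions made for illustration, and the two arguments stand in for the values a procedure would obtain from VoltProcedure.getClusterId and VoltProcedure.getUniqueId.

```python
def cross_cluster_id(cluster_id: int, unique_id: int) -> str:
    """Combine a cluster's DR ID with a per-cluster unique ID.

    The "cluster:id" format is a hypothetical choice; any encoding that
    keeps both components recoverable would serve the same purpose.
    """
    if cluster_id < 0 or unique_id < 0:
        raise ValueError("IDs must be non-negative")
    return f"{cluster_id}:{unique_id}"


if __name__ == "__main__":
    same_unique_id = 123456789
    # The same per-cluster ID produced on two clusters no longer collides.
    print(cross_cluster_id(1, same_unique_id))
    print(cross_cluster_id(2, same_unique_id))
```

A string key is used here only because it is easy to read; a schema that stores the cluster ID and unique ID in two separate columns of a composite key achieves the same result.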
3.8. | XDCR cannot be used with deprecated export syntax. |
You cannot use cross-datacenter replication (XDCR) with the deprecated export syntax, that is, the EXPORT TABLE statement. To use XDCR with export, you must use the current CREATE STREAM syntax for declaring the source streams for export targets. | |
4. Export | |
4.1. | Synchronous export in Kafka can use up all available file descriptors and crash the database. |
A bug in the Apache Kafka client can result in file descriptors being allocated but not released if the producer.type attribute is set to "sync" (which is the default). The consequence is that the system eventually runs out of file descriptors and the VoltDB server process will crash. Until this bug is fixed, use of synchronous Kafka export is not recommended. The workaround is to set the Kafka producer.type attribute to "async" using the VoltDB export properties. | |
5. Import | |
5.1. | Data may be lost if a Kafka broker stops during import. |
If, while Kafka import is enabled, the Kafka broker that VoltDB is connected to stops (for example, if the server crashes or is taken down for maintenance), some messages may be lost between Kafka and VoltDB. To ensure no data is lost, we recommend you disable VoltDB import before taking down the associated Kafka broker. You can then re-enable import after the Kafka broker comes back online. | |
5.2. | Kafka import may be reset, resulting in duplicate entries. |
There is an issue with Kafka and the VoltDB Kafka importer where the current pointer in the Kafka queue gets reset to zero. The consequence of this event is that items in the queue get imported a second time resulting in duplicate entries. This issue will be addressed in an upcoming release. In the meantime, if you are using the Kafka importer, contact [email protected] for details. | |
6. SQL and Stored Procedures | |
6.1. | Comments containing unmatched single quotes in multi-line statements can produce unexpected results. |
When entering a multi-line statement at the sqlcmd prompt, if a line ends in a comment (indicated by two hyphens) and the comment contains an unmatched single quote character, the following lines of input are not interpreted correctly. Specifically, the comment is incorrectly interpreted as continuing until the next single quote character or a closing semicolon is read. This is most likely to happen when reading in a schema file containing comments. This issue is specific to the sqlcmd utility. A fix for this condition is planned for an upcoming point release. | |
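Until a fix is available, a schema file can be scanned for comments likely to trigger this behavior before it is passed to sqlcmd. This is a minimal sketch with an illustrative function name. It treats a comment as having an unmatched quote when it contains an odd number of single quote characters, which is an assumption about what "unmatched" means here, and it tracks string literals only within a single line.

```python
from typing import List, Tuple


def find_risky_comments(sql_text: str) -> List[Tuple[int, str]]:
    """Return (line number, comment text) for each -- comment that
    contains an odd number of single quote characters."""
    risky = []
    for lineno, line in enumerate(sql_text.splitlines(), start=1):
        in_string = False
        for i, ch in enumerate(line):
            if ch == "'":
                # Toggle on every quote; a doubled '' escape toggles twice.
                in_string = not in_string
            elif not in_string and line.startswith("--", i):
                comment = line[i + 2:]
                if comment.count("'") % 2 == 1:
                    risky.append((lineno, comment.strip()))
                break  # the rest of the line is comment text
    return risky


if __name__ == "__main__":
    schema = (
        "CREATE TABLE t (\n"
        "  id INTEGER NOT NULL, -- the row's key\n"
        "  name VARCHAR(32) -- 'quoted' label\n"
        ");\n"
    )
    for lineno, comment in find_risky_comments(schema):
        print(f"line {lineno}: {comment}")
```

Rewording a flagged comment to avoid the apostrophe (for example, "key of the row") sidesteps the problem without changing the schema.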
6.2. | Do not use assertions in VoltDB stored procedures. |
VoltDB currently intercepts assertions as part of its handling of stored procedures. Attempts to use assertions in stored procedures for debugging or to find programmatic errors will not work as expected. | |
6.3. | The UPPER() and LOWER() functions currently convert ASCII characters only. |
The UPPER() and LOWER() functions return a string converted to all uppercase or all lowercase letters, respectively. However, for the initial release, these functions only operate on characters in the ASCII character set. Other case-sensitive UTF-8 characters in the string are returned unchanged. Support for all case-sensitive UTF-8 characters will be included in a future release. | |
7. Client Interfaces | |
7.1. | Avoid using decimal datatypes with the C++ client interface on 32-bit platforms. |
There is a problem with how the math library used to build the C++ client library handles large decimal values on 32-bit operating systems. As a result, the C++ library cannot serialize and pass Decimal datatypes reliably on these systems. Note that the C++ client interface can send and receive Decimal values properly on 64-bit platforms. | |
8. SNMP | |
8.1. | Enabling SNMP traps can slow down database startup. |
Enabling SNMP can take up to 2 minutes to complete. This delay does not always occur and can vary in length. If SNMP is enabled when the database server starts, the delay occurs after the server logs the message "Initializing SNMP" and before it attempts to connect to the cluster. If you enable SNMP while the database is running, the delay can occur when you issue the voltadmin update command or modify the setting in the VoltDB Management Center Admin tab. This issue results from a Java constraint related to secure random numbers used by the SNMP library. | |
9. VoltDB Management Center | |
9.1. | The VoltDB Management Center currently reports on only one DR connection. |
With VoltDB V7.0, cross-datacenter replication (XDCR) supports multiple clusters in an XDCR network. However, the VoltDB Management Center currently reports on only one such connection per cluster. In the future, the Management Center will provide monitoring and statistics for all connections to the current cluster. |
The following notes provide details concerning how certain VoltDB features operate. The behavior is not considered incorrect. However, this information can be important when using specific components of the VoltDB product.