Using VoltDB

V5.9

The text and illustrations in this document are licensed under the terms of the GNU Affero General Public License Version 3 as published by the Free Software Foundation. See the GNU Affero General Public License (http://www.gnu.org/licenses/) for more details.

Many of the core VoltDB database features described herein are part of the VoltDB Community Edition, which is licensed under the GNU Affero General Public License Version 3 as published by the Free Software Foundation. Other features are specific to the VoltDB Enterprise Edition, which is distributed by VoltDB, Inc. under a commercial license. Your rights to access and use VoltDB features described herein are defined by the license you received when you acquired the software.

Abstract

This book explains how to use VoltDB to design, build, and run high performance applications.


Table of Contents

About This Book
1. Overview
1.1. What is VoltDB?
1.2. Who Should Use VoltDB
1.3. How VoltDB Works
1.3.1. Partitioning
1.3.2. Serialized (Single-Threaded) Processing
1.3.3. Partitioned vs. Replicated Tables
1.3.4. Ease of Scaling to Meet Application Needs
1.4. Working with VoltDB Effectively
2. Installing VoltDB
2.1. Operating System and Software Requirements
2.2. Installing VoltDB
2.2.1. Upgrading From Older Versions
2.2.2. Installing Standard System Packages
2.2.3. Building a New VoltDB Distribution Kit
2.3. Setting Up Your Environment
2.4. What is Included in the VoltDB Distribution
2.5. VoltDB in Action: Running the Sample Applications
3. Starting the Database
3.1. Initializing a VoltDB Database
3.2. Initializing the Database on a Cluster
3.3. Updating Nodes on the Cluster
3.4. Stopping a VoltDB Database
3.5. Restarting a VoltDB Database
3.6. Defining the Cluster Configuration
3.6.1. Determining How Many Sites per Host
3.6.2. Configuring Paths for Runtime Features
3.6.3. Verifying your Hardware Configuration
4. Designing the Database Schema
4.1. How to Enter DDL Statements
4.2. Creating Tables and Primary Keys
4.3. Analyzing Data Volume and Workload
4.4. Partitioning Database Tables
4.4.1. Choosing a Column on which to Partition Table Rows
4.4.2. Specifying Partitioned Tables
4.4.3. Design Rules for Partitioning Tables
4.5. Replicating Database Tables
4.5.1. Choosing Replicated Tables
4.5.2. Specifying Replicated Tables
4.6. Modifying the Schema
4.6.1. Effects of Schema Changes on Data and Clients
4.6.2. Viewing the Schema
4.6.3. Modifying Tables
4.6.4. Adding and Dropping Indexes
4.6.5. Modifying Partitioning for Tables and Stored Procedures
5. Designing Stored Procedures to Access the Database
5.1. How Stored Procedures Work
5.1.1. VoltDB Stored Procedures are Transactional
5.1.2. VoltDB Stored Procedures are Deterministic
5.2. The Anatomy of a VoltDB Stored Procedure
5.2.1. The Structure of the Stored Procedure
5.2.2. Passing Arguments to a Stored Procedure
5.2.3. Creating and Executing SQL Queries in Stored Procedures
5.2.4. Interpreting the Results of SQL Queries
5.2.5. Returning Results from a Stored Procedure
5.2.6. Rolling Back a Transaction
5.3. Installing Stored Procedures into the Database
5.3.1. Compiling, Packaging, and Loading Stored Procedures
5.3.2. Declaring Stored Procedures in the Schema
5.3.3. Partitioning Stored Procedures in the Schema
6. Designing VoltDB Client Applications
6.1. Connecting to the VoltDB Database
6.1.1. Connecting to Multiple Servers
6.1.2. Using an Auto-Reconnecting Client
6.2. Invoking Stored Procedures
6.3. Invoking Stored Procedures Asynchronously
6.4. Closing the Connection
6.5. Handling Errors
6.5.1. Interpreting Execution Errors
6.5.2. Handling Timeouts
6.5.3. Writing a Status Listener to Interpret Other Errors
6.6. Compiling and Running Client Applications
6.6.1. Starting the Client Application
6.6.2. Running Clients from Outside the Cluster
7. Simplifying Application Development
7.1. Using Default Procedures
7.2. Shortcut for Defining Simple Stored Procedures
7.3. Verifying Expected Query Results
7.4. Writing Stored Procedures Inline Using Groovy
8. Using VoltDB with Other Programming Languages
8.1. C++ Client Interface
8.1.1. Writing VoltDB Client Applications in C++
8.1.2. Creating a Connection to the Database Cluster
8.1.3. Invoking Stored Procedures
8.1.4. Invoking Stored Procedures Asynchronously
8.1.5. Interpreting the Results
8.2. JSON HTTP Interface
8.2.1. How the JSON Interface Works
8.2.2. Using the JSON Interface from Client Applications
8.2.3. How Parameters Are Interpreted
8.2.4. Interpreting the JSON Results
8.2.5. Error Handling using the JSON Interface
8.3. JDBC Interface
8.3.1. Using JDBC to Connect to a VoltDB Database
8.3.2. Using JDBC to Query a VoltDB Database
9. Using VoltDB in a Cluster
9.1. Starting a Database Cluster
9.2. Updating the Cluster Configuration
9.2.1. Adding Nodes with Elastic Scaling
9.2.2. Configuring How VoltDB Rebalances New Nodes
10. Availability
10.1. How K-Safety Works
10.2. Enabling K-Safety
10.2.1. What Happens When You Enable K-Safety
10.2.2. Calculating the Appropriate Number of Nodes for K-Safety
10.3. Recovering from System Failures
10.3.1. What Happens When a Node Rejoins the Cluster
10.3.2. Where and When Recovery May Fail
10.4. Avoiding Network Partitions
10.4.1. K-Safety and Network Partitions
10.4.2. Using Network Fault Protection
11. Database Replication
11.1. How Database Replication Works
11.1.1. Starting Database Replication
11.1.2. Database Replication, Availability, and Disaster Recovery
11.1.3. Database Replication and Completeness
11.2. Using Passive Database Replication
11.2.1. Specifying the DR Tables in the Schema
11.2.2. Configuring the Clusters
11.2.3. Starting the Clusters
11.2.4. Loading the Schema and Starting Replication
11.2.5. Stopping Replication
11.2.6. Database Replication and Read-only Clients
11.3. Using Cross Datacenter Replication
11.3.1. Designing Your Schema for Active Replication
11.3.2. Starting the Database Clusters
11.3.3. Loading a Matching Schema and Starting Replication
11.3.4. Stopping Replication
11.3.5. Understanding Conflict Resolution
11.4. Monitoring Database Replication
12. Security
12.1. How Security Works in VoltDB
12.2. Enabling Authentication and Authorization
12.3. Defining Users and Roles
12.4. Assigning Access to Stored Procedures
12.5. Assigning Access by Function (System Procedures, SQL Queries, and Default Procedures)
12.6. Using Default Roles
12.7. Integrating Kerberos Security with VoltDB
12.7.1. Installing and Configuring Kerberos
12.7.2. Installing and Configuring the Java Security Extensions
12.7.3. Configuring the VoltDB Servers and Clients
13. Saving & Restoring a VoltDB Database
13.1. Performing a Manual Save and Restore of a VoltDB Cluster
13.1.1. How to Save the Contents of a VoltDB Database
13.1.2. How to Restore the Contents of a VoltDB Database Manually
13.1.3. Changing the Cluster Configuration Using Save and Restore
13.2. Scheduling Automated Snapshots
13.3. Managing Snapshots
13.4. Special Notes Concerning Save and Restore
14. Command Logging and Recovery
14.1. How Command Logging Works
14.2. Controlling Command Logging
14.3. Configuring Command Logging for Optimal Performance
14.3.1. Log Size
14.3.2. Log Frequency
14.3.3. Synchronous vs. Asynchronous Logging
14.3.4. Hardware Considerations
15. Importing and Exporting Live Data
15.1. Understanding Export
15.2. Planning your Export Strategy
15.3. Identifying Export Tables in the Schema
15.4. Configuring Export in the Deployment File
15.5. How Export Works
15.5.1. Export Overflow
15.5.2. Persistence Across Database Sessions
15.6. The File Connector
15.7. The HTTP Connector
15.7.1. Understanding HTTP Properties
15.7.2. Exporting to Hadoop via WebHDFS
15.7.3. Exporting to Hadoop Using Kerberos Security
15.8. The JDBC Connector
15.9. The Kafka Connector
15.10. The RabbitMQ Connector
15.11. The Elasticsearch Connector
15.12. Understanding Import
15.12.1. One-Time Import Using Data Loading Utilities
15.12.2. Streaming Import Using Built-in Import Features
A. Supported SQL DDL Statements
ALTER TABLE — Modifies an existing table definition.
CREATE INDEX — Creates an index for faster access to a table.
CREATE PROCEDURE AS — Defines a stored procedure composed of a SQL query.
CREATE PROCEDURE FROM CLASS — Defines a stored procedure associated with a Java class.
CREATE ROLE — Defines a role and the permissions associated with that role.
CREATE TABLE — Creates a table in the database.
CREATE VIEW — Creates a view into a table, optimizing access to a summary of its contents.
DR TABLE — Identifies a table as a participant in database replication (DR).
DROP INDEX — Removes an index.
DROP PROCEDURE — Removes the definition of a stored procedure.
DROP ROLE — Removes a role.
DROP TABLE — Removes a table and any data associated with it.
DROP VIEW — Removes a view and any data associated with it.
EXPORT TABLE — Specifies that a table is for export only.
IMPORT CLASS — Specifies additional Java classes to include in the application catalog.
PARTITION PROCEDURE — Specifies that a stored procedure is partitioned.
PARTITION TABLE — Specifies that a table is partitioned and identifies the partitioning column.
SET DR — Enables the use of Cross Datacenter Replication (XDCR).
B. Supported SQL Statements
DELETE — Deletes one or more records from the database.
INSERT — Creates new rows in the database, using the specified values for the columns.
SELECT — Fetches the specified rows and columns from the database.
TRUNCATE TABLE — Deletes all records from the specified table.
UPDATE — Updates the values within the specified columns and rows of the database.
UPSERT — Either inserts new rows or updates existing rows depending on the primary key value.
C. SQL Functions
ABS() — Returns the absolute value of a numeric expression.
APPROX_COUNT_DISTINCT() — Returns an approximate count of the number of distinct values for the specified column expression.
ARRAY_ELEMENT() — Returns the element at the specified location in a JSON array.
ARRAY_LENGTH() — Returns the number of elements in a JSON array.
AVG() — Returns the average of a range of numeric column values.
BIN() — Returns the binary representation of a BIGINT value as a string.
BIT_SHIFT_LEFT() — Shifts the bits of a BIGINT value to the left a specified number of places.
BIT_SHIFT_RIGHT() — Shifts the bits of a BIGINT value to the right a specified number of places.
BITAND() — Returns the mask of bits set in both of two BIGINT values.
BITNOT() — Returns the mask reversing every bit of a BIGINT value.
BITOR() — Returns the mask of bits set in either of two BIGINT values.
BITXOR() — Returns the mask of bits set in one but not both of two BIGINT values.
CAST() — Explicitly converts an expression to the specified datatype.
CEILING() — Returns the smallest integer value greater than or equal to a numeric expression.
CHAR() — Returns a string with a single UTF-8 character associated with the specified character code.
CHAR_LENGTH() — Returns the number of characters in a string.
COALESCE() — Returns the first non-null argument, or null.
CONCAT() — Concatenates two or more strings and returns the result.
COUNT() — Returns the number of rows selected containing the specified column.
CURRENT_TIMESTAMP — Returns the current time as a timestamp value.
DATEADD() — Returns a new timestamp value by adding a specified time interval to an existing timestamp value.
DAY(), DAYOFMONTH() — Returns the day of the month as an integer value.
DAYOFWEEK() — Returns the day of the week as an integer between 1 and 7.
DAYOFYEAR() — Returns the day of the year as an integer between 1 and 366.
DECODE() — Evaluates an expression against one or more alternatives and returns the matching response.
EXP() — Returns the exponential of the specified numeric expression.
EXTRACT() — Returns the value of a selected portion of a timestamp.
FIELD() — Extracts a field value from a JSON-encoded string column.
FLOOR() — Returns the largest integer value less than or equal to a numeric expression.
FORMAT_CURRENCY() — Converts a DECIMAL to a text string as a monetary value.
FROM_UNIXTIME() — Converts a UNIX time value to a VoltDB timestamp.
HEX() — Returns the hexadecimal representation of a BIGINT value as a string.
HOUR() — Returns the hour of the day as an integer value.
LEFT() — Returns a substring from the beginning of a string.
LN(), LOG() — Returns the natural logarithm of a numeric value.
LOWER() — Returns a string converted to all lowercase characters.
MAX() — Returns the maximum value from a range of column values.
MIN() — Returns the minimum value from a range of column values.
MINUTE() — Returns the minute of the hour as an integer value.
MOD() — Returns the result of a modulo operation.
MONTH() — Returns the month of the year as an integer value.
NOW — Returns the current time as a timestamp value.
OCTET_LENGTH() — Returns the number of bytes in a string.
OVERLAY() — Returns a string overwriting a portion of the original string with the specified replacement.
PI() — Returns the value of the mathematical constant pi (π) as a FLOAT value.
POSITION() — Returns the starting position of a substring in another string.
POWER() — Returns the value of the first argument raised to the power of the second argument.
QUARTER() — Returns the quarter of the year as an integer value.
REGEXP_POSITION() — Returns the starting position of a regular expression within a text string.
REPEAT() — Returns a string composed of a substring repeated the specified number of times.
REPLACE() — Returns a string replacing the specified substring of the original string with new text.
RIGHT() — Returns a substring from the end of a string.
SECOND() — Returns the seconds of the minute as a floating point value.
SET_FIELD() — Returns a copy of a JSON-encoded string, replacing the specified field value.
SINCE_EPOCH() — Converts a VoltDB timestamp to an integer number of time units since the POSIX epoch.
SPACE() — Returns a string of spaces of the specified length.
SQRT() — Returns the square root of a numeric expression.
SUBSTRING() — Returns the specified portion of a string expression.
SUM() — Returns the sum of a range of numeric column values.
TO_TIMESTAMP() — Converts an integer value to a VoltDB timestamp based on the time unit specified.
TRIM() — Returns a string with leading and/or trailing spaces removed.
TRUNCATE() — Truncates a VoltDB timestamp to the specified time unit.
UPPER() — Returns a string converted to all uppercase characters.
WEEK(), WEEKOFYEAR() — Returns the week of the year as an integer value.
WEEKDAY() — Returns the day of the week as an integer between 0 and 6.
YEAR() — Returns the year as an integer value.
D. VoltDB CLI Commands
csvloader — Imports the contents of a CSV file and inserts it into a VoltDB table.
jdbcloader — Extracts a table from another database via JDBC and inserts it into a VoltDB table.
kafkaloader — Imports data from a Kafka message queue into the specified database table.
sqlcmd — Starts an interactive command prompt for issuing SQL queries to a running VoltDB database.
voltadmin — Performs administrative functions on a VoltDB database.
voltdb — Performs management tasks on the current server, such as starting and recovering the database.
E. Deployment File (deployment.xml)
E.1. Understanding XML Syntax
E.2. The Structure of the Deployment File
F. VoltDB Datatype Compatibility
F.1. Java and VoltDB Datatype Compatibility
G. System Procedures
@AdHoc — Executes an SQL statement specified at runtime.
@Explain — Returns the execution plan for the specified SQL query.
@ExplainProc — Returns the execution plans for all SQL queries in the specified stored procedure.
@GetPartitionKeys — Returns a list of partition values, one for every partition in the database.
@Pause — Initiates read-only mode on the cluster.
@Promote — Promotes a replica database to normal operation.
@Quiesce — Waits for all queued export data to be written to the connector.
@Resume — Returns a paused database to normal operating mode.
@Shutdown — Shuts down the database.
@SnapshotDelete — Deletes one or more native snapshots.
@SnapshotRestore — Restores a database from disk using a native format snapshot.
@SnapshotSave — Saves the current database contents to disk.
@SnapshotScan — Lists information about existing native snapshots in a given directory path.
@SnapshotStatus — Lists information about the most recent snapshots created from the current database.
@Statistics — Returns statistics about the usage of the VoltDB database.
@StopNode — Stops a VoltDB server process, removing the node from the cluster.
@SystemCatalog — Returns metadata about the database schema.
@SystemInformation — Returns configuration information about VoltDB and the individual nodes of the database cluster.
@UpdateApplicationCatalog — Reconfigures the database by replacing the application catalog and/or deployment configuration.
@UpdateClasses — Adds and removes Java classes from the database.
@UpdateLogging — Changes the logging configuration for a running database.