VoltDB is an in-memory, transaction processing database. It excels at managing large volumes of transactions in real-time.
However, transaction processing is often only one aspect of the larger business context and data needs to transition from system to system as part of the overall solution. The process of moving from one database to another as data moves through the system is often referred to as Extract, Transform, and Load (ETL). VoltDB supports ETL through the ability to selectively export data as it is committed to the database, as well as the ability to import data through multiple standard protocols.
Exporting data differs from save and restore (as described in Chapter 13, Saving & Restoring a VoltDB Database) in several ways:
You only export selected data (as required by the business process)
Export is an ongoing process rather than a one-time event
The outcome of exporting data is that information is used by other business processes, not as a backup copy for restoring the database
The target for exporting data from VoltDB may be another database, a repository (such as a sequential log file), or a process (such as a system monitor or accounting system). No matter what the target, VoltDB helps automate the process for you. This chapter explains how to plan for and implement the exporting of live data using VoltDB.
For import, VoltDB supports both one-time import through data loading utilities and ongoing import as part of the database process. The following sections describe how to use VoltDB export and import in detail.
VoltDB lets you automate the export process by specifying certain tables in the schema as sources for export. At runtime, any data written to the specified tables is sent to the selected export connector, which queues the data for export. Then, asynchronously, the connector sends the queued export data to the selected output target. Which export connector runs depends on the target you choose when configuring export in the deployment file. Currently, VoltDB provides connectors for exporting to files, for exporting to other business processes via a distributed message queue or HTTP, and for exporting to other databases via JDBC. The connector processes are managed by the database servers themselves, helping to distribute the work and ensure maximum throughput.
Figure 15.1, “Overview of the Export Process” illustrates the basic export procedure, where Tables B and D are specified as export tables.
Note that you do not need to modify the schema or the client application to turn exporting of live data on and off. The application's stored procedures insert data into the export-only tables; but it is the deployment file that determines whether export actually occurs at runtime.
When a stored procedure uses an SQL INSERT statement to write data into an export-only table, rather than storing that data in the database, it is handed off to the connector when the stored procedure successfully commits the transaction.[5] Export-only tables have several important characteristics:
Export-only tables let you limit the export to only the data that is required. For example, in the preceding example, Table B may contain a subset of columns from Table A. Whenever a new record is written to Table A, the corresponding columns can be written to Table B for export to the remote database.
Export-only tables let you combine fields from several existing tables into a single exported table. This technique is particularly useful if your VoltDB database and the target of the export have different schema. The export-only table can act as a transformation of VoltDB data to a representation of the target schema.
Export-only tables let you control when data is exported. Again, in the previous example, Table D might be an export-only table that is an exact replica of Table C. However, the records in Table C are updated frequently. The client application can choose to copy records from Table C to Table D only when all of the updates are completed and the data is finalized, significantly reducing the amount of data that must pass through the connector.
Of course, there are restrictions to export-only tables. Since they have no storage associated with them, they are for INSERT only. Any attempt to SELECT, UPDATE, or DELETE data from export-only tables will result in an error.
[5] There is no guarantee on the latency of export between the connector and the export target. The export function is transactionally correct; no export occurs if the stored procedure rolls back and the export data is in the appropriate transaction order. But the flow of export data from the connector to the target is not synchronous with the completion of the transaction. There may be several seconds delay before the export data reaches the target.