15.2. Planning your Export Strategy

Documentation

VoltDB Home » Documentation » Using VoltDB

15.2. Planning your Export Strategy

The important point when planning to export data, is deciding:

  • What data to export

  • When to export the data

  • Where to export data to

It is possible to export all of the data in a VoltDB database. You would do this by creating export-only replicas of all tables in the schema and writing to the export-only table whenever you insert into the normal table. However, this means the same number of transactions and volume of data that is being processed by VoltDB will be exported through the connector. There is a strong likelihood, given a high transaction volume, that the target database will not be able to keep up with the load VoltDB is handling. As a consequence you will usually want to be more selective about what data is exported when.

If you have an existing target database, the question of what data to export is likely decided for you (that is, you need to export the data matching the target's schema). If you are defining both your VoltDB database and your target at the same time, you will need to think about what information is needed "downstream" and create the appropriate export-only tables within VoltDB.

The second consideration is when to export the data. For tables that are not updated frequently, inserting the data to a complementary export-only table whenever data is inserted into the real table is the easiest and most practical approach. For tables that are updated frequently (hundreds or thousands of times a second) you should consider writing a copy of the data to an export-only table at an appropriate milestone.

Using the flight reservation system as an example, one aspect of the workflow not addressed by the application described in Chapter 6, Designing VoltDB Client Applications is the need to archive information about the flights after takeoff. Changes to reservations (additions and cancellations) are important in real time. However, once the flight takes off, all that needs to be recorded (for billing purposes, say) is what reservations were active at the time.

In other words, the archiving database needs information about the customers, the flights, and the final reservations. According to the workload in Table 4.1, “Example Application Workload”, the customer and flight tables change infrequently. So data can be inserted into the export-only tables at the same time as the "live" flight and reservation tables. (It is a good idea to give the export-only copy of the table a meaningful name so its purpose is clear. In this example we identify the export-only tables with the export_ prefix or, in the case of the reservation table which is not an exact copy, the _final suffix.)

The reservation table, on the other hand, is updated frequently. So rather than export all changes to a reservation to the export-only reservation table in real-time, a separate stored procedure is invoked when a flight takes off. This procedure copies the final reservation data to the export-only table and deletes the associated flight and reservation records from the VoltDB database. Figure 15.2, “Flight Schema with Export Table” shows the modified database schema with the added export-only tables, EXPORT_FLIGHT, EXPORT_CUSTOMER, and RESERVATION_FINAL.

Figure 15.2. Flight Schema with Export Table

Flight Schema with Export Table

This design adds a transaction to the VoltDB application, which is executed approximately once a second (when a flight takes off). However, it reduces the number of reservation transactions being exported from 1200 a second to less than 200 a second. These are the sorts of trade offs you need to consider when adding export functionality to your application.

The third decision is where to export the data to. As described in Section 15.4, “Configuring Export in the Deployment File”, you can export the data through multiple different protocols: files, HTTP. JDBC, etc. Your choice of protocol will depend on the ultimate target destination for your exported data.

You can also export to multiple destinations at once. When you declare an export table, you assign it to a specific stream. If you want different tables to be exported to different destinations, you can declare the tables to belong to different streams. Then in the deployment file you can configure each stream to be exported to a different destination.