15.6. The File Connector

The file connector receives the serialized data from the export tables and writes it to disk as text files (either comma-separated or tab-delimited). The connector writes one file per database table, "rolling" over to new files periodically. The filenames of the exported data are constructed from:

  • A unique prefix (specified with the nonce property)

  • A unique value identifying the current version of the database schema

  • The table name

  • A timestamp identifying when the file was started
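
As a rough illustration of the timestamp component, the following Python sketch extracts and parses the timestamp from an export file name. It assumes the default date format listed in Table 15.1 ("yyyyMMddHHmmss", in GMT) and that the timestamp is the final run of 14 digits before the file extension. The exact separators between the name components are not specified here, so the sample file name and the helper `parse_export_timestamp` are hypothetical.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Assumption: with the default dateformat "yyyyMMddHHmmss", the timestamp
# is the 14 digits immediately before the file extension.
_TIMESTAMP_RE = re.compile(r"(\d{14})\.[A-Za-z0-9]+$")


def parse_export_timestamp(filename: str) -> Optional[datetime]:
    """Return the file's start time as a UTC datetime, or None if absent."""
    match = _TIMESTAMP_RE.search(filename)
    if match is None:
        return None
    # "%Y%m%d%H%M%S" is the Python equivalent of Java's "yyyyMMddHHmmss".
    parsed = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    # The default timezone property is GMT, so tag the value as UTC.
    return parsed.replace(tzinfo=timezone.utc)


# Hypothetical file name, used only to exercise the helper.
started = parse_export_timestamp("MyExport-1234-ALERTS-20240115093000.csv")
```

If the dateformat or timezone property is changed, the regular expression and the format string would need to change with it.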

While the file is being written, the file name also contains the prefix "active-". Once the file is complete and a new file is started, the "active-" prefix is removed. Therefore, any export files without the prefix are complete and can be copied, moved, deleted, or post-processed as desired.
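
A post-processing script can use this naming rule to pick up only finished files. The sketch below is one possible approach, not part of VoltDB: the helper name `completed_export_files` is hypothetical, and it assumes that completed file names begin with the nonce and that in-progress file names begin with "active-".

```python
from pathlib import Path
from typing import List

ACTIVE_PREFIX = "active-"  # marks a file that is still being written


def completed_export_files(outdir: str, nonce: str) -> List[Path]:
    """Return the export files in outdir that are complete, sorted by name."""
    return sorted(
        path
        for path in Path(outdir).iterdir()
        if path.is_file()
        # Assumption: completed files start with the nonce prefix.
        and path.name.startswith(nonce)
        # Skip anything the connector is still writing.
        and not path.name.startswith(ACTIVE_PREFIX)
    )
```

Files returned by this helper can then be moved or processed without racing the connector, because the connector no longer writes to them.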

There are two properties that must be set when using the file connector:

  • The type property lets you choose between comma-separated files (csv) or tab-delimited files (tsv).

  • The nonce property specifies a unique prefix to identify all files that the connector writes out for this database instance.

Table 15.1, “File Export Properties” describes the supported properties for the file connector.

Table 15.1. File Export Properties

Property | Allowable Values | Description
--- | --- | ---
type* | csv, tsv | Specifies whether to create comma-separated (CSV) or tab-delimited (TSV) files.
nonce* | string | A unique prefix for the output files.
outdir | directory path | The directory where the files are created. If you do not specify an output path, VoltDB writes the output files to the current default directory.
period | integer | The frequency, in minutes, for "rolling" the output file. The default frequency is 60 minutes.
binaryencoding | hex, base64 | Specifies whether VARBINARY data is encoded in hexadecimal or BASE64 format. The default is hexadecimal.
dateformat | format string | The format of the date used when constructing the output file names. You specify the date format as a Java SimpleDateFormat string. The default format is "yyyyMMddHHmmss".
timezone | string | The time zone to use when formatting the timestamp. Specify the time zone as a Java timezone identifier. The default is GMT.
delimiters | string | Specifies the delimiter characters for CSV output. The text string specifies four characters: the field delimiter, the enclosing character, the escape character, and the record delimiter. To use special or non-printing characters (including the space character), encode the character as an HTML entity. For example, "&lt;" for the "less than" symbol.
batched | true, false | Specifies whether to store the output files in subfolders that are "rolled" according to the frequency specified by the period property. The subfolders are named according to the nonce and the timestamp, with "active-" prefixed to the subfolder currently being written.
skipinternals | true, false | Specifies whether to omit the six columns of VoltDB metadata (such as transaction ID and timestamp) from the output. If you specify skipinternals as "true", the output files contain only the exported table data.
with-schema | true, false | Specifies whether to write a JSON representation of each table's schema as part of the export. The JSON schema files can be used to ensure the appropriate datatype and precision are maintained if and when the output files are imported into another system.

*Required
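
To make the table concrete, the following Python sketch checks a set of file connector properties against the required names, allowable values, and defaults listed above. It is a hypothetical pre-flight check (the function `check_file_export_properties` is not a VoltDB API), it treats values as case-sensitive, which is an assumption, and VoltDB performs its own validation when the configuration is loaded.

```python
from typing import Dict

REQUIRED = ("type", "nonce")

# Allowable values taken from Table 15.1.
ALLOWED_VALUES = {
    "type": {"csv", "tsv"},
    "binaryencoding": {"hex", "base64"},
    "batched": {"true", "false"},
    "skipinternals": {"true", "false"},
    "with-schema": {"true", "false"},
}

# Only the defaults that Table 15.1 states explicitly.
DEFAULTS = {
    "period": "60",
    "binaryencoding": "hex",
    "dateformat": "yyyyMMddHHmmss",
    "timezone": "GMT",
}


def check_file_export_properties(props: Dict[str, str]) -> Dict[str, str]:
    """Validate props and return them merged over the documented defaults."""
    missing = [name for name in REQUIRED if not props.get(name)]
    if missing:
        raise ValueError("missing required properties: " + ", ".join(missing))
    for name, allowed in ALLOWED_VALUES.items():
        if name in props and props[name] not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")
    if "period" in props:
        int(props["period"])  # raises ValueError if not an integer
    return {**DEFAULTS, **props}


effective = check_file_export_properties({"type": "csv", "nonce": "MyExport"})
```

Here `effective` holds the two supplied properties plus the four documented defaults.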


Whatever properties you choose, the order and representation of the content within the output files are the same. The export connector writes a separate line of data for every INSERT it receives, including the following information:

  • Six columns of metadata generated by the export connector, unless the skipinternals property is set to "true". This information includes a transaction ID, a timestamp, a sequence number, the site and partition IDs, as well as an integer indicating the query type.

  • The remaining columns are the columns of the database table, in the same order as they are listed in the database definition (DDL) file.
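
The row layout described above can be sketched in Python as follows. This is a minimal example that assumes CSV output with the default delimiters (comma-separated, double-quoted fields); the helper `split_export_row` is hypothetical and the sample row contains invented values. It treats the first six fields as connector metadata without naming them individually.

```python
import csv
import io
from typing import List, Tuple

METADATA_COLUMNS = 6  # number of metadata columns described above


def split_export_row(
    row: List[str], skipinternals: bool = False
) -> Tuple[List[str], List[str]]:
    """Split one exported row into (metadata, table_data)."""
    if skipinternals:
        # With skipinternals set to "true", the row holds only table data.
        return [], row
    if len(row) < METADATA_COLUMNS:
        raise ValueError("row is shorter than the metadata prefix")
    return row[:METADATA_COLUMNS], row[METADATA_COLUMNS:]


# Invented sample line: six metadata fields followed by two table columns.
sample = '"1001","1700000000000","1","0","0","1","alice","42"\n'
reader = csv.reader(io.StringIO(sample))
metadata, table_data = split_export_row(next(reader))
```

The table columns in `table_data` follow the order of the table's definition in the DDL, so they can be zipped with the column names from the schema.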