15.9. The Kafka Connector

Documentation

VoltDB Home » Documentation » Using VoltDB

15.9. The Kafka Connector

The Kafka connector receives serialized data from the export tables and writes it to a message queue using the Apache Kafka version 0.8.2 protocols. Apache Kafka is a distributed messaging service that lets you set up message queues which are written to and read from by "producers" and "consumers", respectively. In the Apache Kafka model, VoltDB export acts as a "producer".

Before using the Kafka connector, we strongly recommend reading the Kafka documentation and becoming familiar with the software, since you will need to set up a Kafka 0.8.2 service and appropriate "consumer" clients to make use of VoltDB's Kafka export functionality. The instructions in this section assume a working knowledge of Kafka and the Kafka operational model.

When the Kafka connector receives data from the VoltDB export tables, it establishes a connection to the Kafka messaging service as a Kafka producer. It then writes records to Kafka topics based on the VoltDB table name and certain export connector properties.

The majority of the Kafka export properties are identical in both in name and content to the Kafka producer properties listed in the Kafka documentation. All but one of these properties are optional for the Kafka connector and will use the standard Kafka default value. For example, if you do not specify the queue.buffering.max.ms property it defaults to 5000 milliseconds.

The only required property is bootstrap.servers, which lists the Kafka servers that the VoltDB export connector should connect to. You must specify this property so VoltDB knows where to send the export data.

In addition to the standard Kafka producer properties, there are several custom properties specific to VoltDB. The properties binaryencoding, skipinternals, and timezone affect the format of the data. The topic.prefix and topic.key properties affect how the data is written to Kafka.

The topic.prefix property specifies the text that precedes the table name when constructing the Kafka topic. If you do not specify a prefix, it defaults to "voltdbexport". Alternately, you can map individual tables to topics using the topic.key property. In the topic.key property you associate an individual Kafka topic with the corresponding VoltDB export table name as a named pair separated by a colon (:). Multiple named pairs are separated by commas (,). For example:

EmpTopic:Employee,CoTopic:Company,EntTopic:Enterprise

Any table-specific mappings in the topic.key property override the automated topic name specified by topic.prefix.

Note that unless you configure the Kafka brokers with the auto.create.topics.enable property set to true, you must create the topics for every export table manually before starting the export process. Enabling auto-creation of topics when setting up the Kafka brokers is recommended.

When configuring the Kafka export connector, it is important to understand the relationship between synchronous versus asynchronous processing and its effect on database latency. If the export data is sent asynchronously, the impact of export on the database is reduced, since the export connector does not wait for the Kafka infrastructure to respond. However, with asynchronous processing, VoltDB is not able to resend the data if the message fails after it is sent.

If export to Kafka is done synchronously, the export connector waits for acknowledgement of each message sent to Kafka before processing the next packet. This allows the connector to resend any packets that fail. The drawback to synchronous processing is that on a heavily loaded database, the latency it introduces means export may not be able to keep up with the influx of export data and and have to write to overflow.

You specify the level of synchronicity and durability of the connection using the Kafka acks property. Set acks to "0" for asynchronous processing, "1" for synchronous delivery to the Kafka broker, or "all" to ensure durability on the Kafka broker. Use of "all" is not recommended for VoltDB export. See the Kafka documentation for more information.

VoltDB guarantees that at least one copy of all export data is sent by the export connector. But when operating in asynchronous mode, the Kafka connector cannot guarantee that the packet is actually received and accepted by the Kafka broker. By operating in synchronous mode, VoltDB can catch errors returned by the Kafka broker and resend any failed packets. However, you pay the penalty of additional latency and possible export overflow.

Finally, the actual export data is sent to Kafka as a comma-separated values (CSV) formatted string. The message includes six columns of metadata (such as the transaction ID and timestamp) followed by the column values of the export table.

Table 15.4, “Kafka Export Properties” lists the supported properties for the Kafka connector, including the standard Kafka producer properties and the VoltDB unique properties.

Table 15.4. Kafka Export Properties

PropertyAllowable ValuesDescription
bootstrap.servers*string

A comma-separated list of Kafka brokers. You can use metadata.broker.list as a synonym for bootstrap.servers.

acks0, 1, allSpecifies whether export is synchronous (1 or all) or asynchronous (0) and to what extent it ensures delivery. See the Kafka documentation of the producer properties for details.
acks.retry.timeoutintegerSpecifies how long, in milliseconds, the connector will wait for acknowledgment from Kafka for each packet. The retry timeout only applies if acknowledgements are enabled. That is, if the acks property is set greater than zero. The default timeout is 5,000 milliseconds. When the timeout is reached, the connector will resend the packet of messages.
partition.key{table}.{column}[,...]

Specifies which table column value to use as the Kafka partitioning key for each table. Kafka uses the partition key to distribute messages across multiple servers.

By default, the value of the table's partitioning column is used as the Kafka partition key. Using this property you can specify a list of table column names, where the table name and column name are separated by a period and the list of table references is separated by commas. If the table is not partitioned and you do not specify a key, the server partition ID is used as a default.

binaryencodinghex, base64Specifies whether VARBINARY data is encoded in hexadecimal or BASE64 format. The default is hexadecimal.
skipinternalstrue, falseSpecifies whether to include six columns of VoltDB metadata (such as transaction ID and timestamp) in the output. If you specify skipinternals as true, the output contains only the exported table data. The default is false.
timezonestringThe time zone to use when formatting the timestamp. Specify the time zone as a Java timezone identifier. The default is GMT.
topic.keystring

A set of named pairs each identifying a Kafka topic name and the corresponding VoltDB table name that will be written to that topic. Separate the names with a colon (:) and the name pairs with a comma (,).

The specific topic/table mappings declared by topic.key override the automated topic names specified by topic.prefix.

topic.prefixstringThe prefix to use when constructing the topic name. Each row is sent to a topic identified by {prefix}{table-name}. The default prefix is "voltdbexport".
Kafka producer propertiesvarious

You can specify standard Kafka producer properties as export connector properties and their values will be passed to the Kafka interface. However, you cannot modify the property block.on.buffer.full.

*Required