11.4. Monitoring Database Replication

Database replication runs silently in the background. To ensure replication is proceeding effectively, VoltDB provides statistics on both the producer and consumer clusters that help you understand the current state of the DR process. Specifically, the statistics can tell you:

  • The amount of DR data waiting to be sent from the producer

  • The timestamp and unique ID of the last transaction received by the consumer

  • Whether any partitions are "falling behind" in processing DR data

This information is available from the @Statistics system procedure, using the "DR" selector on the producer database and the "DRCONSUMER" selector on the consumer. For one-way (passive) DR, the master database acts as the producer and the replica acts as the consumer. For two-way (cross datacenter) replication, each cluster acts as both producer and consumer and can provide statistics for both roles:

  • On the producer database, the @Statistics DR results include columns for the transaction ID and timestamp of the last queued transaction and of the last transaction ACKed by the consumer. The difference between the two timestamps gives you the approximate latency between the two databases.

  • On the consumer database, the @Statistics DRCONSUMER results include per-partition statistics showing whether each partition has an identified "host" server from the producer cluster "covering" it, that is, providing it with DR logs. The results also include columns listing the ID and timestamp of the last received transaction. If a consumer partition is not covered, it has lost contact with the server on the producer database that was providing its logs (possibly due to a node failure). The partition can recover once the covering server rejoins. In the meantime, the difference between the last received timestamp of that partition and those of the other partitions indicates how long the interruption has persisted and how far behind that partition may be.
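
The two comparisons described above can be sketched in a few lines of Python. This is a minimal illustration, not part of VoltDB: it assumes the statistics rows have already been fetched (for example, with a client library) and converted into dictionaries with datetime values, and that there is one row per partition. The column names used here (PARTITION_ID, LASTQUEUEDTIMESTAMP, LASTACKTIMESTAMP, IS_COVERED, LAST_RECEIVED_TIMESTAMP) and the helper function names are assumptions; check the @Statistics reference for the exact columns in your VoltDB version.

```python
from datetime import datetime, timedelta


def producer_lag(dr_rows):
    """Approximate latency per partition on the producer.

    Each row is assumed to come from the "DR" selector and to hold the
    timestamp of the last queued transaction and of the last transaction
    ACKed by the consumer (column names are assumptions).
    """
    return {
        row["PARTITION_ID"]: row["LASTQUEUEDTIMESTAMP"] - row["LASTACKTIMESTAMP"]
        for row in dr_rows
    }


def uncovered_partitions(consumer_rows):
    """Report consumer partitions that have no covering producer host.

    Returns a mapping of partition ID to how far that partition's last
    received timestamp trails the most recent timestamp seen on any
    partition. Rows are assumed to come from the "DRCONSUMER" selector.
    """
    if not consumer_rows:
        return {}
    newest = max(row["LAST_RECEIVED_TIMESTAMP"] for row in consumer_rows)
    return {
        row["PARTITION_ID"]: newest - row["LAST_RECEIVED_TIMESTAMP"]
        for row in consumer_rows
        # Accept either a boolean or a "true"/"false" string for IS_COVERED.
        if str(row["IS_COVERED"]).lower() != "true"
    }
```

A monitoring script could call these functions periodically and raise an alert when a producer lag exceeds a chosen threshold or when uncovered_partitions returns a non-empty result.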