6.2. Recovering in Safe Mode

After determining what caused the problem, the next step is often to get the database up and running again as soon as possible. When using snapshots or command logs, this is done using the voltdb recover command described in Section 3.4, “Restarting the Database”. However, in unusual cases, the recovery itself may fail.

There are several situations where an attempt to recover a database — either from a snapshot or command logs — may fail. For example, restoring a snapshot where a unique index has been added to a table can result in a constraint violation that causes the restore, and the database, to fail. Similarly, a command log may contain a transaction that originally succeeded but fails and raises an exception during playback.

In both of these situations, VoltDB issues a fatal error and stops the database to avoid corrupting the contents.

Although protecting you from an incomplete recovery is the appropriate default behavior, there may be cases where you want to recover as much data as possible, with full knowledge that the resulting data set does not match the original. VoltDB provides two techniques for performing partial recoveries in case of failure:

  • Logging constraint violations during snapshot restore

  • Performing command log recovery in safe mode

The following sections describe these techniques.

Warning

It is critically important to recognize that the techniques described in this section do not produce a complete copy of the original database or resolve the underlying problem that caused the initial recovery to fail. These techniques should never be attempted without careful consideration and full knowledge and acceptance of the risks associated with partial data recovery.

6.2.1. Logging Constraint Violations

There are several situations that can cause a snapshot restore to fail because of constraint violations. Rather than have the operation fail as a whole, you can request that constraint violations be logged to a file instead. This way you can review the tuples that were excluded and decide whether to ignore or replace their content manually after the restore completes.

To perform a manual restore that logs constraint violations rather than stopping when they occur, use a special JSON form of the @SnapshotRestore system procedure. You specify the path for the log files in the JSON attribute duplicatesPath. For example, the following commands restore the snapshot files in the directory /var/voltdb/snapshots/ with the unique identifier myDB. The restore operation logs constraint violations to the directory /var/voltdb/logs.

$ sqlcmd
1> exec @SnapshotRestore '{ "path":"/var/voltdb/snapshots/", 
                            "nonce":"myDB", 
                            "duplicatesPath":"/var/voltdb/logs/" }';
2> exit
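Because the JSON argument is easy to misquote when typed by hand, it can help to generate the statement and review it before running it. The following is a minimal sketch, not part of the VoltDB tooling: restore_statement is a hypothetical helper name, and it assumes that none of the three values contains a quote character.

```shell
#!/bin/sh
# Hypothetical helper (not a VoltDB command): print the sqlcmd statement
# for a restore that logs constraint violations.
# Arguments: snapshot directory, nonce, directory for violation logs.
# Assumption: the arguments contain no single or double quotes.
restore_statement() {
    printf "exec @SnapshotRestore '{ \"path\":\"%s\", \"nonce\":\"%s\", \"duplicatesPath\":\"%s\" }';\n" "$1" "$2" "$3"
}

# Print the statement for review, using the values from the example above.
restore_statement /var/voltdb/snapshots/ myDB /var/voltdb/logs/
```

Once the printed statement looks correct, it can be pasted at the sqlcmd prompt or, assuming sqlcmd is reading statements from standard input, piped into it.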

Constraint violations are logged as needed, one file per table, to CSV files with the name {table}-duplicates-{timestamp}.csv.
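To get a quick overview of which tables had tuples excluded, you can list the violation files and count the lines in each. The sketch below relies only on the file naming pattern described above; summarize_duplicates is a hypothetical helper name, and the line count is only an approximation of the number of excluded tuples because it does not account for any header line the files might contain.

```shell
#!/bin/sh
# Hypothetical helper: summarize the constraint-violation files in a directory.
# Prints one line per file: table name, line count, file name.
summarize_duplicates() {
    log_dir=$1
    for f in "$log_dir"/*-duplicates-*.csv; do
        [ -e "$f" ] || continue            # skip the unexpanded pattern when nothing matches
        name=$(basename "$f")
        table=${name%%-duplicates-*}       # text before "-duplicates-" is the table name
        lines=$(wc -l < "$f" | tr -d ' ')  # strip padding that some wc versions add
        echo "$table $lines $name"
    done
}

# Default to the log directory used in the example above.
summarize_duplicates "${1:-/var/voltdb/logs}"
```

A table that does not appear in the output had no violations logged to that directory.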

6.2.2. Safe Mode Recovery

On rare occasions, recovering a database from command logs may fail. This can happen, for example, if a stored procedure introduces non-deterministic content. If a recovery fails, the specific error is known. However, there is no way for VoltDB to know the root cause or how to continue. Therefore, the recovery fails and the database stops.

When this happens, VoltDB logs the last successful transaction before the recovery failed. You can then ask VoltDB to recover up to but not including the failing transaction by performing a recovery in safe mode.

You request safe mode by adding the --safemode switch to the command line when starting the recovery operation, like so:

$ voltdb recover --safemode --license ~/license.xml

When VoltDB recovers from command logs in safe mode it enables two distinct behaviors:

  • Snapshots are restored, logging any constraint violations

  • Command logs are replayed up to the last valid transaction

This means that if you are recovering using an automated snapshot (rather than command logs), you can recover some data even if there are constraint violations during the snapshot restore. Also, when recovering from command logs, VoltDB will ignore constraint violations in the command log snapshot and replay all transactions that succeeded in the previous attempt.

It is important to note that to successfully use safe mode with command logs, you must first perform a regular recovery operation and have it fail, so that VoltDB can determine the last valid transaction. Also, if the snapshot and the command logs contain both constraint violations and failed transactions, you may need to run recovery in safe mode twice to recover as much data as possible: once to complete restoration of the snapshot, and a second time to recover the command logs up to a point before the failed transaction.