3.2. Designing the Data Access (Stored Procedures)

As you can see from the previous discussion of designing the database, defining the database schema — and particularly the partitioning plan — goes hand in hand with understanding how the data is accessed. The two must be coordinated to ensure optimum performance.

It doesn't matter whether you design the partitioning first or the data access first, as long as in the end they work together. However, for the sake of example, we will use the schema and partitioning outlined in the preceding sections when discussing how to design the data access.

3.2.1. Writing VoltDB Stored Procedures

The key to designing the data access for VoltDB applications is that complex or performance-sensitive access to the database should be done through stored procedures. It is possible to perform ad hoc queries on a VoltDB database. However, ad hoc queries do not benefit as fully from the performance optimizations VoltDB specializes in and therefore should not be used for frequent, repetitive, or complex transactions.

In VoltDB, a stored procedure and a transaction are one and the same. The stored procedure succeeds or rolls back as a whole. Also, because the transaction is defined in advance as a stored procedure, there is no need for specific BEGIN TRANSACTION or END TRANSACTION commands.[1]

Within the stored procedure, you access the database using standard SQL syntax, with statements such as SELECT, UPDATE, INSERT, and DELETE. You can also include your own code within the stored procedure to perform calculations on the returned values, to evaluate and execute conditional statements, or to perform any other functions your applications need.

3.2.2. VoltDB Stored Procedures and Determinism

To ensure data consistency and durability, VoltDB procedures must be deterministic. That is, given specific input values, the outcome of the procedure is predictable. Determinism is critical because it allows the same stored procedure to run in multiple locations and give the same results. It is determinism that makes it possible to run redundant copies of the database partitions without impacting performance. (See Chapter 11, Availability for more information on redundancy and availability.)

One key to deterministic behavior is avoiding ambiguous SQL queries. Specifically, performing unsorted queries can result in a nondeterministic outcome. VoltDB does not guarantee a consistent order of results unless you use a tree index to scan the records in a specific order or you specify an ORDER BY clause in the query itself. In the worst case, a limiting query, such as SELECT TOP 10 Emp_ID FROM Employees without an index or ORDER BY clause, can return a different set of rows from one execution to the next. However, even a simple query such as SELECT * FROM Employees can return the same rows in a different order.

The problem is that even if a non-deterministic query is read-only, its results might be used as input to an INSERT, UPDATE, or DELETE statement elsewhere in the stored procedure. For clusters with a K-safety value greater than zero, this means unsorted query results returned by two copies of the same partition, which may not match, could be used for separate update queries. If this happens, VoltDB detects the mismatch, reports it as potential data corruption, and shuts down the cluster to protect the database contents.

This is why VoltDB issues a warning for any non-deterministic queries in read-write stored procedures. This is also why use of an ORDER BY clause or a tree index in the WHERE constraint is strongly recommended for all SELECT statements that return multiple rows.
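
The effect of an unsorted limiting query can be illustrated with a minimal sketch in plain Java. It does not use VoltDB: the class ReplicaOrderingSketch and its two hard-coded lists are hypothetical stand-ins for two copies of one partition that hold the same rows in a different physical order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ReplicaOrderingSketch {

    // Hypothetical stand-in for "SELECT TOP n Emp_ID FROM Employees"
    // with no ORDER BY: rows come back in storage order.
    static List<Integer> topUnsorted(List<Integer> storedRows, int n) {
        return new ArrayList<>(storedRows.subList(0, Math.min(n, storedRows.size())));
    }

    // Stand-in for the same query with "ORDER BY Emp_ID": sort, then limit.
    static List<Integer> topSorted(List<Integer> storedRows, int n) {
        List<Integer> sorted = new ArrayList<>(storedRows);
        Collections.sort(sorted);
        return topUnsorted(sorted, n);
    }

    public static void main(String[] args) {
        // Same four rows, different storage order on each copy (made-up data).
        List<Integer> replicaA = Arrays.asList(17, 4, 42, 8);
        List<Integer> replicaB = Arrays.asList(8, 42, 4, 17);

        // Without ordering, the two copies disagree on the "top 2" rows.
        System.out.println(topUnsorted(replicaA, 2).equals(topUnsorted(replicaB, 2)));
        // With ordering, both copies return the same rows in the same order.
        System.out.println(topSorted(replicaA, 2).equals(topSorted(replicaB, 2)));
    }
}
```

If the unsorted result fed an UPDATE or DELETE, each copy would modify different rows, which is the mismatch described above.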

Another key to deterministic behavior is avoiding external functions or procedures that can introduce arbitrary data. External functions include file and network I/O (which should be avoided anyway because they can impact latency), as well as many common system-specific procedures such as Date and Time.

However, this limitation does not mean you cannot use arbitrary data in VoltDB stored procedures. It just means you must either generate the arbitrary data outside the stored procedure and pass it in as input parameters or generate it in a deterministic way.

For example, if you need to load a set of records from a file, you can open the file in your application and pass each row of data to a stored procedure that loads the data into the VoltDB database. This is the best method when retrieving arbitrary data from sources (such as files or network resources) that would impact latency.
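
As a rough sketch of this pattern, the following plain Java code keeps all file I/O in the application and hands each parsed row to a callback. The RowSink interface, the class name, and the "id,name" file layout are assumptions made for illustration; in a real application the callback would invoke a stored procedure that inserts the row.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

final class FileLoadSketch {

    // Hypothetical stand-in for a client-side call to a stored procedure
    // that inserts one row.
    interface RowSink {
        void insertRow(int id, String name);
    }

    // All I/O happens here, in the application, outside any transaction.
    // Returns the number of rows handed to the sink.
    static int load(Reader source, RowSink sink) throws IOException {
        int rows = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) continue;   // skip blank lines
                String[] fields = line.split(",", 2);  // assumed "id,name" layout
                sink.insertRow(Integer.parseInt(fields[0].trim()), fields[1].trim());
                rows++;
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for a FileReader so the sketch runs anywhere.
        List<String> received = new ArrayList<>();
        int n = load(new StringReader("1,Alice\n2,Bob\n"),
                     (id, name) -> received.add(id + ":" + name));
        System.out.println(n + " rows passed on: " + received);
    }
}
```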

The other alternative is to use data that can be generated deterministically. For two of the most common cases, timestamps and random values, VoltDB provides a method for doing this:

  • VoltProcedure.getTransactionTime() returns a timestamp that can be used in place of the Java Date or Time classes.

  • VoltProcedure.getSeededRandomNumberGenerator() returns a pseudo-random number generator (a seeded java.util.Random instance) that can be used in place of a java.util.Random object you create yourself.

These methods use the current transaction ID to generate a deterministic value for the timestamp and the random number.
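
The idea can be illustrated without VoltDB: if every copy of a partition seeds its generator from the same per-transaction value, every copy draws the same sequence. The sketch below uses only java.util.Random. The class name, the made-up transaction identifier, and the seeding scheme are assumptions for illustration and are not VoltDB's internal implementation.

```java
import java.util.Arrays;
import java.util.Random;

final class SeededRandomSketch {

    // Hypothetical: derive a generator from a per-transaction identifier.
    static Random generatorFor(long txnId) {
        return new Random(txnId);
    }

    // Draw 'count' values in [0, bound), as a procedure body might.
    static int[] draw(long txnId, int count, int bound) {
        Random rng = generatorFor(txnId);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = rng.nextInt(bound);
        }
        return values;
    }

    public static void main(String[] args) {
        long txnId = 1234567L; // made-up transaction identifier
        int[] replicaA = draw(txnId, 5, 100);
        int[] replicaB = draw(txnId, 5, 100);
        // Both copies see identical "random" values for the same transaction.
        System.out.println(Arrays.equals(replicaA, replicaB));
    }
}
```

By contrast, a generator created with new Random() inside the procedure would be seeded differently on each copy, so the copies could diverge.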

Finally, even seemingly harmless programming techniques, such as static variables, can introduce unpredictable behavior. VoltDB provides no guarantees concerning the state of the stored procedure class instance across invocations. Any information that you want to persist across invocations must either be stored in the database itself or passed into the stored procedure as a procedure parameter.

3.2.3. The Anatomy of a VoltDB Stored Procedure

The stored procedures themselves are written as Java classes, each procedure being a separate class. Example 3.1, “Components of a VoltDB Stored Procedure” shows the stored procedure that looks up a flight to see if there are any available seats. The callouts identify the key components of a VoltDB stored procedure.

Example 3.1. Components of a VoltDB Stored Procedure

package fadvisor.procedures;

import org.voltdb.*;                                       // 1

public class HowManySeats extends VoltProcedure {          // 2

    public final SQLStmt GetSeatCount = new SQLStmt(       // 3
        "SELECT NumberOfSeats, COUNT(ReserveID) " +
        "FROM Flight AS F, Reservation AS R " +
        "WHERE F.FlightID=R.FlightID AND R.FlightID=? " +
        "GROUP BY NumberOfSeats;");

    public long run(int flightid)
        throws VoltAbortException {                        // 4

        long numofseats;
        long seatsinuse;
        VoltTable[] queryresults;

        voltQueueSQL(GetSeatCount, flightid);              // 5
        queryresults = voltExecuteSQL();                   // 6

        VoltTable result = queryresults[0];                // 7
        if (result.getRowCount() < 1) { return -1; }
        numofseats = result.fetchRow(0).getLong(0);
        seatsinuse = result.fetchRow(0).getLong(1);

        numofseats = numofseats - seatsinuse;              // 8
        return numofseats; // Return available seats
    }
}

1

Stored procedures are written as Java classes. To access the VoltDB classes and methods, be sure to import org.voltdb.*.

2

Each stored procedure extends the generic class VoltProcedure.

3

Within the stored procedure you access the database using a subset of ANSI-standard SQL statements. To do this, you declare the statement as a special Java type called SQLStmt. In the SQL statement, you insert a question mark (?) wherever you want a value to be supplied by a variable at runtime. (See Appendix B, Supported SQL Statements for details on the supported SQL statements.)

4

The bulk of the stored procedure is the run method. Note that the run method is declared as throwing VoltAbortException, which is raised if any exception is not caught. VoltAbortException causes the stored procedure to roll back. (See Section 3.2.3.6, “Rolling Back a Transaction” for more information about rollback.)

5

To perform database queries, you queue SQL statements (specifying both the SQL statement and the variables to use) using the voltQueueSQL method.

6

Once you queue all of the SQL statements you want to perform, use voltExecuteSQL to execute the statements in the queue.

7

Each statement returns its results in a VoltTable structure. Because the queue can contain multiple queries, voltExecuteSQL returns an array of VoltTable structures, one array element for each query.

8

In addition to queueing and executing queries, stored procedures can contain custom code. Note, however, that you should limit the custom code in stored procedures to the processing necessary to complete the transaction, so as not to delay the following transactions in the queue.

The following sections describe these components in more detail.

3.2.3.1. The Structure of the Stored Procedure

VoltDB stored procedures are Java classes. The key points to remember are to:

  • Import the VoltDB classes in org.voltdb.*

  • Include the class definition, which extends the abstract class VoltProcedure

  • Define the method run, which performs the SQL queries and processing that make up the transaction

The following outline illustrates the basic structure of a VoltDB stored procedure.

import org.voltdb.*;

public class Procedure-name extends VoltProcedure {

                 // Declare SQL statements ...

    public datatype run ( arguments ) throws VoltAbortException {


                // Body of the Stored Procedure ...


    }
}

3.2.3.2. Passing Arguments to a Stored Procedure

You specify the number and type of the arguments that the stored procedure accepts in the run() method. For example, the following is the declaration of the run() method for the Initialize stored procedure from the voter sample application. This procedure accepts two arguments: an integer and a string.

public long run(int maxContestants, String contestants) {

VoltDB stored procedures can accept parameters of any of the following Java and VoltDB datatypes:

  • Integer types: byte, short, int, long, Byte, Short, Integer, and Long

  • Floating point types: float, double, Float, and Double

  • Fixed decimal point: BigDecimal

  • Timestamp types: VoltDB timestamp (org.voltdb.types.TimestampType), java.util.Date, java.sql.Date, and java.sql.Timestamp

  • String and binary types: String and byte[]

  • VoltDB types: VoltTable

The arguments can be scalar objects or arrays of any of the preceding types. For example, the following run() method defines three arguments: a scalar long and two arrays, one array of timestamps and one array of Strings:

import org.voltdb.*;
public class LogMessagesByEvent extends VoltProcedure {

     public long run ( 
           long eventType, 
           org.voltdb.types.TimestampType[] eventTimeStamps,
           String[] eventMessages
    ) throws VoltAbortException {

The calling application can use any of the preceding datatypes when invoking the callProcedure() method and, where necessary, VoltDB makes the appropriate type conversions (for example, from int to String or from String to Double). (See Section 3.3.2, “Invoking Stored Procedures” for information on the callProcedure() method.)

3.2.3.3. Creating and Executing SQL Queries in Stored Procedures

The main function of the stored procedure is to perform database queries. In VoltDB this is done in two steps:

  1. Queue the queries using the voltQueueSQL function

  2. Execute the queue and return the results using voltExecuteSQL

The first argument to voltQueueSQL is the SQL statement to be executed. The SQL statement is declared using a special class, SQLStmt, with question marks as placeholders for values that will be inserted at runtime. The remaining arguments to voltQueueSQL are the actual values that VoltDB inserts into the placeholders.

For example, if you want to perform a SELECT of a table using two columns in the WHERE clause, your SQL statement might look something like this:

SELECT CustomerID FROM Customer WHERE FirstName=? AND LastName=?;

At runtime, you want the question marks replaced by values passed in as arguments from the calling application. So the actual voltQueueSQL invocation might look like this:

public final SQLStmt getcustid = new SQLStmt(
                                "SELECT CustomerID FROM Customer " +
                                "WHERE FirstName=? AND LastName=?;");

     ...

voltQueueSQL(getcustid, firstnm, lastnm);

Once you have queued all of the SQL statements you want to execute together, you can then process the queue using the voltExecuteSQL function:

VoltTable[] queryresults = voltExecuteSQL();

Note that you can queue multiple SQL statements before calling voltExecuteSQL. This improves performance when executing multiple SQL queries because it minimizes the amount of network traffic within the cluster.

You can also queue and execute SQL statements as many times as necessary to complete the transaction. For example, if you want to make a flight reservation, you may need to verify that the flight exists before creating the reservation. One way to do this is to look up the flight, verify that a valid row was returned, then insert the reservation, like so:

final String getflight = "SELECT FlightID FROM Flight WHERE FlightID=?;";
final String makeres = "INSERT INTO Reservation VALUES (?,?,?,?,?);";

public final SQLStmt getflightsql = new SQLStmt(getflight);
public final SQLStmt makeressql = new SQLStmt(makeres);

public VoltTable[] run(int reservnum, int flightnum, int customernum)
        throws VoltAbortException {

    // Verify flight exists
    voltQueueSQL(getflightsql, flightnum);
    VoltTable[] queryresults = voltExecuteSQL();

    // If there is no matching record, roll back
    if (queryresults[0].getRowCount() == 0) throw new VoltAbortException();

    // Make reservation (the last two values fill the remaining columns)
    voltQueueSQL(makeressql, reservnum, flightnum, customernum, 0, 0);
    return voltExecuteSQL();
}

3.2.3.4. Interpreting the Results of SQL Queries

When you call voltExecuteSQL, the results of all the queued SQL statements are returned in an array of VoltTable structures. The array contains one VoltTable for each SQL statement in the queue. The VoltTables are returned in the same order as the respective SQL statements in the queue.

The VoltTable itself consists of rows. Each row contains columns. Each column has a label and a value of a fixed datatype. The number of rows and columns per row depends on the specific query.

For example, if you queue two SQL SELECT statements, one looking for the destination of a specific flight and the second looking up the ReserveID and Customer name (first and last) of reservations for that flight, the code for the stored procedure might look like the following:

public final SQLStmt getdestsql = new SQLStmt(
              "SELECT Destination FROM Flight WHERE FlightID=?;");
public final SQLStmt getressql = new SQLStmt(
             "SELECT r.ReserveID, c.FirstName, c.LastName " +
             "FROM Reservation AS r, Customer AS c " +
             "WHERE r.FlightID=? AND r.CustomerID=c.CustomerID;");

         ...

   voltQueueSQL(getdestsql,flightnum);
   voltQueueSQL(getressql,flightnum);
   VoltTable[] results = voltExecuteSQL();

The array returned by voltExecuteSQL will have two elements:

  • The first array element is a VoltTable with one row (FlightID is defined as unique) and one column, because the SELECT statement returns only one value.

  • The second array element is a VoltTable with as many rows as there are reservations for the specific flight, each row containing three columns: ReserveID, FirstName, and LastName.

VoltDB provides a set of convenience routines for accessing the contents of the VoltTable array. Table 3.2, “Methods of the VoltTable Classes” lists some of the most common methods.

Table 3.2. Methods of the VoltTable Classes

Method
    Description

VoltTableRow fetchRow(int index)
    Returns an instance of the VoltTableRow class for the row specified by index.

int getRowCount()
    Returns the number of rows in the table.

int getColumnCount()
    Returns the number of columns for each row in the table.

VoltType getColumnType(int index)
    Returns the datatype of the column at the specified index. VoltType is an enumerated type with the following possible values: BIGINT, DECIMAL, FLOAT, INTEGER, INVALID, NULL, NUMERIC, SMALLINT, STRING, TIMESTAMP, TINYINT, VARBINARY, VOLTTABLE.

String getColumnName(int index)
    Returns the name of the column at the specified index.

double getDouble(int index)
long getLong(int index)
String getString(int index)
BigDecimal getDecimalAsBigDecimal(int index)
double getDecimalAsDouble(int index)
TimestampType getTimestampAsTimestamp(int index)
long getTimestampAsLong(int index)
byte[] getVarbinary(int index)
(methods of VoltTableRow)
    Return the value of the column at the specified index in the appropriate datatype. Because the datatype of the columns varies depending on the SQL query, there is no generic method for returning the value. You must specify what datatype to use when fetching the value.


It is also possible to retrieve the column values by name. You can invoke the same get methods (getLong, getString, and so on), passing a string argument specifying the name of the column rather than the numeric index.

Accessing the columns by name can make code easier to read and less susceptible to errors due to changes in the SQL schema (such as changing the order of the columns). On the other hand, accessing column values by numeric index is potentially more efficient under heavy load conditions.

Example 3.2, “Displaying the Contents of VoltTable Arrays” shows a generic routine for walking through the return results of a stored procedure. In this example, the contents of the VoltTable array are written to standard output.

Example 3.2. Displaying the Contents of VoltTable Arrays

public void displayResults(VoltTable[] results) {
   int table = 1;
   for (VoltTable result : results) {
      System.out.printf("*** Table %d ***\n",table++);
      displayTable(result);
   }
}

public void displayTable(VoltTable t) {

   final int colCount = t.getColumnCount();
   int rowCount = 1;
   t.resetRowPosition();
   while (t.advanceRow()) { 
      System.out.printf("--- Row %d ---\n",rowCount++);
 
      for (int col=0; col<colCount; col++) {
         System.out.printf("%s: ",t.getColumnName(col));
         switch(t.getColumnType(col)) {
            case TINYINT: case SMALLINT: case BIGINT: case INTEGER:
               System.out.printf("%d\n", t.getLong(col));
               break;
            case STRING:
               System.out.printf("%s\n", t.getString(col));
               break;
            case DECIMAL:
               System.out.printf("%f\n", t.getDecimalAsBigDecimal(col));
               break;
            case FLOAT:
               System.out.printf("%f\n", t.getDouble(col));
               break;
         }
      }
   }
}

For further details on interpreting the VoltTable structure, see the Java documentation that is provided in the doc/ subfolder of your VoltDB installation.

3.2.3.5. Returning Results from a Stored Procedure

Stored procedures can return a single VoltTable, an array of VoltTables, or a long integer. You can return all of the query results by returning the VoltTable array, or you can return a scalar value that is the logical result of the transaction. (For example, the stored procedure in Example 3.1, “Components of a VoltDB Stored Procedure” returns a long integer representing the number of remaining seats available in the flight.)

Whatever value the stored procedure returns, make sure the run method includes the appropriate datatype in its definition. For example, the following two definitions specify different return datatypes; the first returns a long integer and the second returns the results of a SQL query as a VoltTable array.

public long run( int flightid)

public VoltTable[] run ( String lastname, String firstname) 

It is important to note that you can interpret the results of SQL queries either in the stored procedure or in the client application. However, for performance reasons, it is best to limit the amount of additional processing done by the stored procedure to ensure it executes quickly and frees the queue for the next stored procedure. So unless the processing is necessary for subsequent SQL queries, it is usually best to return the query results (in other words, the VoltTable array) directly to the calling application and interpret them there.

3.2.3.6. Rolling Back a Transaction

Finally, if a problem arises while a stored procedure is executing, whether the problem is anticipated or unexpected, it is important that the transaction rolls back. Rollback means that any changes made during the transaction are undone and the database is left in the same state it was in before the transaction started.

VoltDB is a fully transactional database, which means that if a transaction (i.e. stored procedure) fails, the transaction is automatically rolled back and the appropriate exception is returned to the calling application. Exceptions that can cause a rollback include the following:

  • Runtime errors in the stored procedure code, such as division by zero or datatype overflow.

  • Violating database constraints in SQL queries, such as inserting a duplicate value into a column defined as unique.

There may also be situations where a logical exception occurs. In other words, there is no programmatic issue that might be caught by Java or VoltDB, but a situation occurs where there is no practical way for the transaction to complete. In these conditions, the stored procedure can force a rollback by explicitly throwing the VoltAbortException exception.

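For example, if a flight ID does not exist, you do not want to create a reservation, so the stored procedure can force a rollback. Assuming queryresults holds the result of a lookup of the flight, as in the reservation example earlier, the check might look like this:

if (queryresults[0].getRowCount() == 0) { throw new VoltAbortException(); }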

See Section 4.4, “Verifying Expected Query Results” for another way to roll back procedures when queries do not meet necessary conditions.
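
The all-or-nothing behavior described in this section can be pictured with a small plain Java sketch. It is only a model of the semantics and not how VoltDB implements rollback: the class RollbackSketch, its AbortException (a stand-in for VoltAbortException), and the in-memory map and list that play the role of database tables are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RollbackSketch {

    // Hypothetical stand-in for VoltAbortException.
    static final class AbortException extends RuntimeException {
        AbortException(String message) { super(message); }
    }

    // Stand-ins for database state: seats reserved per flight, plus an audit log.
    private Map<Integer, Integer> reservedSeats = new HashMap<>();
    private List<String> auditLog = new ArrayList<>();

    void addFlight(int flightId) { reservedSeats.put(flightId, 0); }
    int reserved(int flightId) { return reservedSeats.getOrDefault(flightId, 0); }
    int logSize() { return auditLog.size(); }

    // The "procedure" body: makes two changes and aborts between them
    // if the flight does not exist.
    private static void body(Map<Integer, Integer> seats, List<String> log, int flightId) {
        log.add("reservation attempt for flight " + flightId);
        Integer current = seats.get(flightId);
        if (current == null) {
            throw new AbortException("no such flight: " + flightId);
        }
        seats.put(flightId, current + 1);
    }

    // Runs the body against working copies and installs them only on success.
    boolean makeReservation(int flightId) {
        Map<Integer, Integer> workingSeats = new HashMap<>(reservedSeats);
        List<String> workingLog = new ArrayList<>(auditLog);
        try {
            body(workingSeats, workingLog, flightId);
        } catch (AbortException e) {
            return false; // working copies are dropped, so no change survives
        }
        reservedSeats = workingSeats;
        auditLog = workingLog;
        return true;
    }
}
```

In this model, a call for an unknown flight leaves both the seat counts and the audit log exactly as they were, even though the log entry was written before the abort.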

3.2.4. Partitioning Stored Procedures

To make your stored procedures accessible in the database, you must declare them in the DDL schema using the CREATE PROCEDURE statement. For example, the following statements declare five stored procedures, identifying them by their class name:

CREATE PROCEDURE FROM CLASS procedures.LookupFlight;
CREATE PROCEDURE FROM CLASS procedures.HowManySeats;
CREATE PROCEDURE FROM CLASS procedures.MakeReservation;
CREATE PROCEDURE FROM CLASS procedures.CancelReservation;
CREATE PROCEDURE FROM CLASS procedures.RemoveFlight;

You can also declare whether each stored procedure is single-partitioned. If you do not declare a procedure as single-partitioned, it is assumed to be multi-partitioned by default.

The advantage of multi-partitioned stored procedures is that they have full access to all of the data in the database. However, the real focus of VoltDB, and the way to achieve maximum throughput for your OLTP application, is through the use of single-partitioned stored procedures.

Single-partitioned stored procedures are special because they operate independently of other partitions (which is why they are so fast). At the same time, single-partitioned stored procedures operate on only a subset of the entire data (i.e. only the data within the specified partition). Most important of all, it is the responsibility of the application developer to ensure that the SQL queries within the stored procedure are actually single-partitioned.

When you declare a stored procedure as single-partitioned, you must specify both the partitioning table and column using the PARTITION PROCEDURE statement in the schema DDL. For example, in our sample application the table RESERVATION is partitioned on FLIGHTID. Let's say you create a stored procedure with two arguments, flight_id and reservation_id. You declare the stored procedure as single-partitioned in the DDL schema using the FLIGHTID column as the partitioning column. By default, the first parameter to the procedure, flight_id, is used as the hash value. For example:

PARTITION PROCEDURE MakeReservation ON TABLE Reservation COLUMN FlightID;

At this point, your stored procedure can operate on only those records in the RESERVATION table with FLIGHTID=flight_id. What's more, it can only operate on records in other partitioned tables that are partitioned on the same hash value.

In other words, the following rules apply:

  • Any SELECT, UPDATE, or DELETE queries of the RESERVATION table must use the constraint WHERE FLIGHTID=? (where the question mark is replaced by the value of flight_id).

  • SELECT statements can join the RESERVATION table to replicated tables, as long as the preceding constraint is also applied.

  • SELECT statements can join the RESERVATION table to other partitioned tables as long as the following is true:

    • The two tables are partitioned on the same column (in this case, FLIGHTID).

    • The tables are joined on the shared partitioning column.

    • The preceding constraint (WHERE RESERVATION.FLIGHTID=?) is used.

For example, the RESERVATION table can be joined to the FLIGHT table (which is replicated). However, the RESERVATION table cannot be joined with the CUSTOMER table in a single-partitioned stored procedure because the two tables use different partitioning columns. (CUSTOMER is partitioned on the CUSTOMERID column.)

The following are examples of invalid SQL queries for a single-partitioned stored procedure partitioned on FLIGHTID:

  • INVALID: SELECT * FROM reservation WHERE reserveid=?

  • INVALID: SELECT c.lastname FROM reservation AS r, customer AS c WHERE r.flightid=? AND c.customerid = r.customerid

In the first example, the RESERVATION table is being constrained by a column (RESERVEID) that is not the partitioning column. In the second example, the correct partitioning column is being used in the WHERE clause, but the tables are being joined on a different column. As a result, not all CUSTOMER rows are available to the stored procedure since the CUSTOMER table is partitioned on a different column than RESERVATION.

Warning

It is the application developer's responsibility to ensure that the queries in a single-partitioned stored procedure are truly single-partitioned. VoltDB does not warn you about SELECT or DELETE statements that will return incomplete results. VoltDB does generate a runtime error if you attempt to INSERT a row that does not belong in the current partition.
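
The incomplete-results problem can be pictured with a small plain Java model. Everything in it is an assumption made for illustration: the class PartitionSketch, the partition count of 3, and the modulo placement rule, which stands in for VoltDB's internal hashing. Reservation rows are placed by flight ID, and a "single-partitioned procedure" sees only the partition chosen by its flight ID argument.

```java
import java.util.ArrayList;
import java.util.List;

final class PartitionSketch {

    static final int PARTITIONS = 3; // made-up partition count

    // One reservation row: reservation ID and flight ID.
    static final class Row {
        final int reserveId;
        final int flightId;
        Row(int reserveId, int flightId) {
            this.reserveId = reserveId;
            this.flightId = flightId;
        }
    }

    // Hypothetical placement rule; VoltDB's real hashing is internal.
    static int partitionFor(int flightId) {
        return Math.floorMod(flightId, PARTITIONS);
    }

    private final List<List<Row>> partitions = new ArrayList<>();

    PartitionSketch() {
        for (int i = 0; i < PARTITIONS; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    void insert(int reserveId, int flightId) {
        partitions.get(partitionFor(flightId)).add(new Row(reserveId, flightId));
    }

    // Valid: routed by flightId and constrained on the partitioning column.
    int countByFlight(int flightId) {
        int count = 0;
        for (Row r : partitions.get(partitionFor(flightId))) {
            if (r.flightId == flightId) count++;
        }
        return count;
    }

    // Invalid: routed by flightId but constrained on the reservation ID.
    // Only one partition is searched, so a matching row stored elsewhere is missed.
    boolean findByReserveId(int routingFlightId, int reserveId) {
        for (Row r : partitions.get(partitionFor(routingFlightId))) {
            if (r.reserveId == reserveId) return true;
        }
        return false;
    }
}
```

For instance, after insert(101, 2), the call findByReserveId(1, 101) reports no match, because the row lives in the partition for flight 2 while the procedure was routed to the partition for flight 1. No error is raised; the result is simply incomplete.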

Finally, the PARTITION PROCEDURE statement assumes that the partitioning column value is the first parameter to the procedure. If you wish to partition on a different parameter value, say the third parameter, you must specify the partitioning parameter using the PARAMETER clause and a zero-based index for the parameter position. In other words, the index for the third parameter would be "2" and the PARTITION PROCEDURE statement would read as follows:

PARTITION PROCEDURE GetCustomerDetails 
    ON TABLE Customer COLUMN CustomerID
    PARAMETER 2;


[1] One side effect of transactions being precompiled as stored procedures is that external transaction management frameworks, such as Spring or JEE, are not supported by VoltDB.