We now have a complete database that we can interact with using SQL queries. For example, we can find the least populous county for any given state (California, for example) with the following SQL query:
$ sqlcmd 1> SELECT TOP 1 county, abbreviation, population 2> FROM people, states WHERE people.state_num=6 3> AND people.state_num=states.state_num 4> ORDER BY population ASC;
However, typing in the same query with a different state number over and over again gets tiring very quickly. The situation gets worse as the queries get more complex.
For queries you run frequently only changing the input, you can create a simple stored procedure. Stored procedures let you define the query once and change the input values when you execute the procedure. Stored procedures have an additional benefit; because they are pre-compiled as part of the application catalog, the queries do not need to be planned at runtime, reducing the time it takes for each query to execute.
To create simple stored procedures — that is, procedures consisting of a single SQL query — you can define the entire procedure in your database schema using the CREATE PROCEDURE AS statement. So, for example to turn our previous query into a stored procedure, we can add the following statement to our schema:
CREATE PROCEDURE leastpopulatedAS SELECT TOP 1 county, abbreviation, population FROM people, states WHERE people.state_num=?
AND people.state_num=states.state_num ORDER BY population ASC;
In the CREATE PROCEDURE AS statement:
The label, in this case leastpopulated, is the name given to the stored procedure. | |
Question marks are used as placeholders for values that will be input at runtime. |
In addition to creating the stored procedure, we can also specify if it is single-partitioned or not. When you partition a stored procedure, you associate it with a specific partition based on the table that it accesses. For example, the preceding query accesses the People table and, more importantly, narrows the focus to a specific value of the partitioning column, State_num.
Note that you can access more than one table in a single-partition procedure, as we do in the preceding example. However, all of the data you access must be in that partition. In other words, all the tables you access must be partitioned on the same key value or, for read-only SELECT statements, you can also include replicated tables.
So we can partition our new procedure using the People table by adding a PARTITION PROCEDURE statement:
PARTITION PROCEDURE leastpopulated ON TABLE people COLUMN state_num;
Now when we invoke the stored procedure, it is executed only in the partition where the State_num column matches the first argument to the procedure, leaving the other partitions free to process other requests.
Of course, before we can use the procedure we need to compile it into the catalog. Modifying stored procedures can be done on the fly, like adding and removing tables. So we do not need to restart the database, just recompile the catalog and update the running database.[2]
$ voltdb compile -o towns.jar towns.sql $ voltadmin update towns.jar deployment.xml
Once we update the catalog, the new procedure becomes available. (Although, if you are using interactive commands, as in this tutorial, it is a good idea to exit and reenter sqlcmd to make sure it recognizes the new procedure name.) So we can now execute the query multiple times for different states simply by changing the argument to the procedure:
$ sqlcmd SQL Command :: localhost:21212 1> exec leastpopulated 6; COUNTY ABBREVIATION POPULATION -------------- ------------- ----------- Alpine County CA 1175 (1 row(s) affected) 2> exec leastpopulated 48; COUNTY ABBREVIATION POPULATION -------------- ------------- ----------- Loving County TX 82 (1 row(s) affected)
Simple stored procedures written purely in SQL are very handy as short cuts. However, some procedures are more complex, requiring multiple queries and additional computation based on query results. For more involved procedures, VoltDB supports writing stored procedures in Java.
It isn't necessary to be a Java programming wizard to write VoltDB stored procedures. All VoltDB stored procedures have the same basic structure. For example, the following code reproduces the simple stored procedure leastpopulated we wrote in the previous section using Java:
import org.voltdb.*;public class LeastPopulated extends VoltProcedure {
public final SQLStmt getLeast = new SQLStmt( " SELECT TOP 1 county, abbreviation, population " + " FROM people, states WHERE people.state_num=?" + " AND people.state_num=states.state_num" + " ORDER BY population ASC;" );
public VoltTable[] run(integer state_num)
throws VoltAbortException { voltQueueSQL( getLeast, state_num );
return voltExecuteSQL();
} }
In this example:
We start by importing the necessary VoltDB classes and methods. | |
The procedure itself is defined as a Java class. The Java class name is the name we use at runtime to invoke the procedure. In this case, the procedure name is LeastPopulated. | |
At the beginning of the class, you declare the SQL queries that the stored procedure will use. Here we use the same SQL query from the simple stored procedure, including the use of a question mark as a placeholder. | |
The body of the procedure is a single run method. The arguments to the run method are the arguments that must be provided when invoking the procedure at runtime. | |
Within the run method, the procedure queues one or more queries, specifying the SQL query name, declared in step 3, and the arguments to be used for the placeholders. (Here we only have the one query with one argument, the state number.) | |
Finally, a call executes all of the queued queries and the results of those queries are returned to the calling application. |
Now, writing a Java stored procedure to execute a single SQL query is overkill. But it does illustrate the basic structure of the procedure.
Java stored procedures become important when designing more complex interactions with the database. One of the most important aspects of VoltDB stored procedures is that each stored procedure is executed as a complete unit, a transaction, that either succeeds or fails as a whole. If any errors occur during the transaction, earlier queries in the transaction are rolled back before a response is returned to the calling application, or any further work is done by the partition.
One such transaction might be updating the database. It just so happens that the population data from the U.S. Census Bureau contains both actual census results and estimated population numbers for following years. If we want to update the database to replace the 2010 results with the 2011 estimated statistics (or some future estimates), we would need a procedure to:
Check to see if a record already exists for the specified state and county.
If so, use the SQL UPDATE statement to update the record.
If not, use an INSERT statement to create a new record.
We can do that by extending our original sample Java stored procedure. We can start be giving the Java class a descriptive name, UpdatePeople. Next we include the three SQL statements we will use (SELECT, UPDATE, and INSERT). We also need to add more arguments to the procedure to provide data for all of the columns in the People table. Finally, we add the query invocations and conditional logic needed. Note that we queue and execute the SELECT statement first, then evaluate its results (that is, whether there is at least one record or not) before queuing either the UPDATE or INSERT statement.
The following is the completed stored procedure source code.
import org.voltdb.*; public class UpdatePeople extends VoltProcedure { public final SQLStmt findCurrent = new SQLStmt( " SELECT * FROM people WHERE state_num=? AND county_num=?;"); public final SQLStmt updateExisting = new SQLStmt( " UPDATE people SET population=?" + " WHERE state_num=? AND county_num=?;"); public final SQLStmt addNew = new SQLStmt( " INSERT INTO people VALUES (?,?,?,?);"); public VoltTable[] run(byte state_num, short county_num, String county, long population) throws VoltAbortException { voltQueueSQL( findCurrent, state_num, county_num ); VoltTable[] results = voltExecuteSQL(); if (results[0].getRowCount() > 0) { // found a record voltQueueSQL( updateExisting, population, state_num, county_num ); } else { // no existing record voltQueueSQL( addNew, state_num, county_num, county, population); } return voltExecuteSQL(); } }
Once we write the Java stored procedure, we need to compile it into the catalog the same way we do with simple stored procedures. But first, the Java class itself needs compiling. We use the Java compiler, javac, to compile the procedure the same way we would any other Java program.
When compiling stored procedures, the Java compiler must be able to
find the VoltDB classes and methods imported at the beginning of the
procedure. To do that, we must include the VoltDB libraries in the Java
classpath. The libraries are in the subfolder
/voltdb
where you installed VoltDB. For example, if
you installed VoltDB in the directory /opt/voltdb
,
the command to compile the UpdatePeople procedure is the following:
$ javac -cp "$CLASSPATH:/opt/voltdb/voltdb/*" UpdatePeople.java
Once we compile the source code into a Java class, we can then declare the stored procedure in our schema, in much the same way simple stored procedures are declared. But this time we use the CREATE PROCEDURE FROM CLASS statement, specifying the class name rather than the SQL query. We can also partition the procedure on the People table, since all of the queries are constrained to a specific value of State_num, the partitioning column. Here are the statements we add to the schema.
CREATE PROCEDURE FROM CLASS UpdatePeople; PARTITION PROCEDURE UpdatePeople ON TABLE people COLUMN state_num;
The updated schema definition is included in the tutorial files as
townsupdate.sql
. So, if you are following along,
rename the file to towns.sql
to bring your sample
files up to date.
$ cp townsupdate.sql towns.sql
OK. Now we have a Java class file for the new stored procedure and an updated schema. We are ready to recompile the catalog.
Just as Java needs to know where the VoltDB classes are when it
compiles your stored procedure, VoltDB needs to know where your stored
procedure classes are when it compiles the catalog. You tell VoltDB where
to look by adding the location of the stored procedure class files to the
classpath. You can add them to the classpath using the
-c
or --classpath
flag to the
voltdb compile command.
If you compile your Java classes into a subdirectory, which is common for larger projects with multiple classes, that subdirectory would be the path you should specify. Since our example does not specify an alternate location for the Java classes, the classes are in the current working directory and we can use ./ to identify their location.
So we can use the same commands we have used throughout (with the addition of the --classpath flag) to compile our catalog and update the database:
$ voltdb compile --classpath=./ -o towns.jar towns.sql $ voltadmin update towns.jar deployment.xml
Obviously, we don't want to invoke our new procedure manually for each record in the People table. We could write a program to do it for us. Fortunately, there is a program already available that we can use.
The csvloader command normally uses the default INSERT procedures to load data into a table. However, you can specify an different procedure if you wish. So we can use csvloader to invoke our new procedure to update the database with every record in the data file.
First we must filter the data to the columns we need. We use the
same shell commands we used to create the initial input file, except we
switch to selecting the column with data for the 2011 estimate rather than
the actual census results. We can save this file as
data/people2011.txt
(which is included with the
source files):
$ grep -v "^040," data/CO-EST2011-Alldata.csv \
| cut --delimiter="," --fields=4,5,7,11 > data/people2011.txt
Before we update the database, let's just check to see which are the two counties with the smallest population:
$ sqlcmd SQL Command :: localhost:21212 1> SELECT TOP 2 county, abbreviation, population 2> FROM people,states WHERE people.state_num=states.state_num 3> ORDER BY population ASC; COUNTY ABBREVIATION POPULATION --------------- ------------- ----------- Loving County TX 82 Kalawao County HI 90 (2 row(s) affected)
Now we can run csvloader to update the database,
using the -p
flag indicating that we are specifying a
stored procedure name rather than a table name:
$ csvloader --skip 1 --file data/people2011.txt \
-p UpdatePeople
And finally, we can check to see the results of the update by repeating our earlier query:
$ sqlcmd SQL Command :: localhost:21212 1> SELECT TOP 2 county, abbreviation, population 2> FROM people,states WHERE people.state_num=states.state_num 3> ORDER BY population ASC; COUNTY ABBREVIATION POPULATION --------------- ------------- ----------- Kalawao County HI 90 Loving County TX 94 (2 row(s) affected)
Aha! In fact, the estimates show that Loving County, Texas is growing and is no longer the smallest!
[2] If you are using the VoltDB Community Edition see the section called “Restoring Data to a New Catalog”.