A major part of any relational database is the schema: the structure of the data as defined by the tables and columns. It is possible to change the schema when necessary. However, at any given time, each table has a set number of columns, each with a specific name and datatype.
It is possible to store unstructured data in a relational database as a "blob" using a VARBINARY or VARCHAR column. However, the database has no way to operate on your data effectively beyond simply storing and retrieving it.
Sometimes data is not as strictly organized as a relational database schema requires, but does have structure within it. For example, a table may have a set of properties, each with a different name and matching value. But not all records use the same set of properties.
JSON (JavaScript Object Notation) is a light-weight data interchange format that lets you describe data structures on the fly. JSON-encoded strings are composed of a hierarchy of key-value pairs that can be as simple or as complex as needed. More importantly, the actual structure of the object is determined at run-time and does not need to be predefined.
VoltDB gives you the ability to mix the efficiency of the relational schema with the flexibility of JSON. By using JSON-encoded columns with new VoltDB SQL functions and index capabilities, you can work more naturally with JSON data while maintaining the efficiency and transactional consistency of a relational database.
Let’s assume that you want to implement a single sign-on (SSO) application using VoltDB. You wish to store the login session for a set of different online sites under a common username. Each login session could hold different user state, simple data values or possibly more complex structures. Additionally, future sessions could hold just about anything. Because of the variability of the data, a good strategy would be to JSON-encode it. The VoltDB table schema for this application might look like the following:
CREATE TABLE user_session_table ( username varchar(200) UNIQUE NOT NULL, password varchar(100) NOT NULL, global_session_id varchar(200) UNIQUE NOT NULL, last_accessed TIMESTAMP, json_data varchar(2048) ); PARTITION TABLE user_session_table ON COLUMN username;
Common across all sessions would be the username, password, perhaps a global session ID, and a last accessed timestamp. Because you wish to support millions of simultaneous logins, it is best to partition the table based on the username column.
This schema is from the json-sessions sample application that comes with the VoltDB server software.
Ultimately, the sample inserts the JSON-encoded session into the database using a simple standard SQL statement:
INSERT INTO user_session_table (username, password, global_session_id, last_accessed, json_data) VALUES (?, ?, ?, ?, ?);
The json-sessions sample models each type of session being tracked as a plain old Java object (POJO). To simplify encoding these session types into JSON, the sample uses an open source package from Google called GSON. GSON can convert POJOs to/from JSON, greatly simplifying the JSON processing in the example.
Note that VoltDB does not, at present time, validate that data inserted into a VARCHAR column is properly encoded JSON. Validation of encoding occurs during query time, as described in the next section.