SQLite Visualiser
By:
Paul Batty
Supervisor:
Andrew Scott
March 2016
Date:
Signed:
To get the code and other documents visit the website at:
Abstract
This paper presents a tool designed to visualise the internal workings of a SQLite
database file. It begins with the history of SQLite, its systems and its file format,
finding that the file is a series of fixed-size chunks, or pages, and that each page is
a node in a much larger B-Tree structure. Secondly, it describes the construction of
a model-view-controller style application that can parse and present this data in real
time, while other systems access the file. Thirdly, it shows how TestFX and JUnit
helped build a robust application. Lastly, it considers how the tool could be improved
by polishing up the user interface, with customisation and other minor interactions.
The overall system could be improved with increased performance and some much
needed features. On top of this, a future project could look at the changes made
to the database and turn them back into the original SQL queries.
Keywords: SQLite, databases, SQL, Java, JUnit, TestFX, JavaFX, B-Trees, Merkle
trees, varints, real-time updates, user interface design
Introduction
SQLite is a small, lightweight database engine that can perform many operations
without the need to configure, manipulate, or go through a long-winded install
process. It is simple, flexible, and widely distributed. In fact, SQLite takes pride
in probably being one of the most widely deployed database engines, and one of
the top five most deployed software modules, alongside zlib, libpng and libjpeg.
It finds itself inside all of the top browsers (Firefox, Google Chrome and possibly
Edge), operating systems (Windows 10, iOS and embedded OSs) and the most
unexpected places, such as aircraft.

SQLite's systems and infrastructure enable it to be flexible, fast and simple. The
main focus of this paper, however, is the file format it uses to store the entire
database: how it is put together, how to traverse it, and why it is the way it is, in
addition to the tools available for SQLite. This is covered in the first section of
this paper.

Understanding the file format was just the first stepping stone. This paper then
undertakes a journey to build a tool that can traverse and read the file, while
recording every operation that is performed on the database. This is covered in
the second and third chapters.

While building the tool, it was important to see how it operated from a user's
perspective, and to find the best ways to break it. This was to ensure that the tool
was open to everyone, and would not fall down and crumble. This is covered in
section four.

Once the tool became well developed, the paper looks towards its future: what
could be added to make it ever more useful for developers, researchers and anyone
else using SQLite systems. This is covered in the final sections, five and six.

As this paper covers similar programs, it will show that there is a distinct shortage
of tools that enable users to debug their database, while there is an abundance of
user interfaces, the hex editor excluded. Alongside this, existing tools often do not
provide any insight into how SQLite is updating the database. Lastly, there is
currently no way of logging commands that are executed against the database
outside of your own connections. To address this, the main aim of this paper is to
help you understand the SQLite file format and systems, while providing a useful
tool that can help debug, manipulate and record your own SQLite databases.
1 Background
The following chapter will cover two completely different programs that operate
on SQLite, then look towards SQLite's beginnings, starting back in the spring of
2000, and where it is used today. It finishes with a look at the SQLite file format,
before moving onto the other sections.

Apart from the usual features, such as viewing tables and schemas and modifying
them, the more unique features allowed exporting tables to CSV (comma
separated values), producing SQL dumps, and acting as a sandbox. The sandbox
allowed users to execute commands and see the changes, but nothing was actually
performed until the user committed the changes to the database. In addition to
this, it provided a fully fledged SQL editor with auto-complete, syntax highlighting,
and loading and saving of commands in external files.

The tool itself was made in C/C++ and Qt, with support for SQLite databases
up to version 3.8.2. It is available for all major platforms. The user interface and
features built into the tool were simple and intuitive to navigate. It is very
powerful and can be very useful to anyone working with SQLite who does not
want to use the command line. However, it does not allow an insight into SQLite,
nor the logging of commands produced by external programs.
Figure 1.2: Screen shot of the SQLite fragmentation visualiser. (laysakura, 2012)
The tool runs through the command line, and produces a JSON file with the analysis,
before sending it to the visualiser, which produces an SVG image as output. This
design allows any type of output to be added in. Some of the more advanced
features allow filtering certain pages out or in, and de-fragmentation.

Although it is very powerful, it does not support WAL mode, is slow on larger
databases, and can only cope with UTF-8 text. WAL mode, or write-ahead
logging, is an alternative to a rollback journal, where the changes are written to
a log; the log is then folded back into the database file at each checkpoint. The
rollback journal takes the opposite approach and changes the database directly,
with the journal holding the backup. Overall, this tool provides a very useful
insight into SQLite. On top of this, it clearly shows the page format of the file,
which is very similar to where my tool is going.
Speaking to a colleague in January of 2000, Hipp discussed his idea for a self
contained embedded database. When some free time opened up, on the 29th of
May 2000, SQLite was born. It was not until August 2000 that version 1.0 was
released. Then, in just over a year, on the 28th of November 2001, came version
2.0, which introduced B-Trees and many of the features seen in 3.0 today. During
the next year he was joined by Joe Mistachkin, followed by Dan Kennedy in 2002,
to help work on the 3.0 release. This came a lot later, containing a full rewrite and
improvements over 2.0, with the first public release on the 18th of June 2004. At
the time of writing this paper the current version of SQLite is 3.10.4. After changes
to the naming scheme, version 3 is currently supported to 2050 (Hipp, 2000).

Today, SQLite is open source within the public domain, making it accessible to
everyone. The entire library size is around 350 KiB; with some optional features
omitted it can be reduced to around 300 KiB, making it incredibly small compared
to what it does. In addition to this, the runtime usage is minimal, with 4 KiB of
stack space and 100 KiB of heap, allowing it to run on almost anything. SQLite's
main strength is that the entire database is encoded into a single portable file that
can be read on any system, whether 32 or 64 bit, big- or little-endian. It is often
seen as a replacement for storage files rather than a database system. In fact, Hipp
(2000) stated, "Think of SQLite not as a replacement for Oracle but as a
replacement for fopen()".
1.3 Where is SQLite used
SQLite being a relational database engine, as well as a replacement for fopen()
(Hipp, 2000), allows it to be used for truly anything. Hipp has stated numerous
times that SQLite might be the single most deployed software module currently,
alongside zlib, libpng and libjpeg, with the numbers in the billions and billions
(Hipp, 2000). This large distribution means SQLite can be found anywhere.
Microsoft even approached Hipp and asked for a special version to be made for
use in Windows 10 (Hipp, 2015). In addition to Microsoft: Apple, Google and
Facebook all use SQLite somewhere within their systems. On top of all the big
names, SQLite can be found within almost any other consumer device, such as
phones, cameras and televisions. This wide usage was picked up by Google, and
Hipp was awarded Best Integrator at O'Reilly's 2005 Open Source Convention
(Owens, 2006).
The tokenizer and parser work closely together, taking the SQL string and
validating the syntax before converting it into a hierarchical structure for use by
the code generator. Both systems are custom made for SQLite, with the parser
going under the name of Lemon (Owens, 2006). Lemon is designed to optimise
performance and guard against memory leaks. Once they have successfully
converted the SQL string into a parse tree, the parser passes it onto the code
generator.

The code generator takes the parse tree from the parser and translates it into an
assembly language program. The program is written in an assembly language
that is specifically designed for SQLite, and is run by the virtual machine inside
the core module. Once the assembly program is constructed it is sent to the
virtual machine for execution.

The interface module defines the interface between the virtual machine and the
SQL library. All libraries and external applications use this to communicate with
SQLite.
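For instance, a Java application ultimately reaches this interface through a wrapper
such as a JDBC driver. The following is a minimal sketch of that round trip,
assuming the sqlite-jdbc driver is on the classpath; the file and table names are
illustrative, not part of the tool described in this paper:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SQLiteInterfaceExample {
    public static void main(String[] args) throws Exception {
        // Opening the connection opens (or creates) the database file;
        // every statement below passes through SQLite's interface module.
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:example.db");
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE IF NOT EXISTS t(id INTEGER PRIMARY KEY, name TEXT)");
            stmt.execute("INSERT INTO t(name) VALUES ('hello')");
            try (ResultSet rs = stmt.executeQuery("SELECT id, name FROM t")) {
                while (rs.next()) {
                    System.out.println(rs.getInt("id") + ": " + rs.getString("name"));
                }
            }
        }
    }
}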
With this in mind, the virtual machine takes the SQL input from the interface and
passes it onto the SQL compiler, then collects the resulting program from the
code generator and executes it to perform the original request that was sent in.
This makes it the heart, or core, of SQLite's operations.

Lastly, the pager is the transport truck, going between the B-Tree (the factory)
and the OS interface (storage) to deliver pages at the B-Tree's request. It also
keeps the most commonly used pages in its cache, so it does not have to keep
going through the OS interface to collect pages it already has.
1.5 The SQLite file format
1.5.1 The page system
As mentioned in section 1.4.3, the B-Tree module looks after the pages, including
the organisation and relationships between them, packing them into a tree structure.
This is the same structure that gets written to disk. The B-Tree implementation
is designed to support fast querying. The various B-Tree structures can be found in
Comer's (1979) paper The Ubiquitous B-Tree. SQLite also takes some improvements
seen in Knuth's (1973) book Sorting and Searching (Raymond, 2009).

The basic idea is that the file is made up of pages, each of the same fixed
size. The size of the pages is a power of two between 512 and 65,536 bytes. Pages
are numbered starting with 1 instead of 0. The maximum number of pages that
SQLite can hold is 2,147,483,646; at the largest page size of 65,536 bytes this
works out to around 140 terabytes. The minimum number of pages within a
database is 1. There are five types of pages:

• B-Tree pages
• Freelist pages
• Overflow pages
• Pointer map pages
• Lock byte pages

This paper will not cover the lock byte and pointer map pages.
1.5.2 The Header
The first step in parsing the SQLite file before the different pages is to read in the
SQLite header. This is the first 100 bytes located in page one. The header stores
all the necessary information to read the rest of the file. So reading it correctly is
crucial. Immediately following the header is the root B-Tree which is covered in
the next section. Appendix A shows the full header layout.
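As a minimal illustration, reading the 16-byte magic string at offset 0 and the
big-endian 2-byte page size at offset 16 might look like this in Java; the class and
file names are illustrative, not the tool's actual parser:

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class HeaderReader {
    public static void main(String[] args) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream("example.db"))) {
            byte[] magic = new byte[16];
            in.readFully(magic); // bytes 0-15: "SQLite format 3\0"
            // Bytes 16-17, big-endian; a stored value of 1 means 65,536
            // in newer versions of SQLite.
            int pageSize = in.readUnsignedShort();
            System.out.println(new String(magic, 0, 15, StandardCharsets.US_ASCII));
            System.out.println("Page size: " + pageSize);
        }
    }
}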
1.5.3 B-Trees

SQLite uses both B+-Trees and B-Trees throughout its systems. The B-Tree is
a data structure built for systems that read and write large amounts of data blocks,
such as file systems. In this case, however, the pages represent the data blocks.

B-Trees are made up of internal nodes and leaves. Internal nodes make up the
structure of the tree, connecting each part of the tree down to the leaves, like
branches. The leaves are located at the end of the structure. The B-Tree stores
data in both the internal nodes and the leaves, whereas the B+-Tree only stores
data in the leaves. An example B-Tree can be seen below in figure 1.5.

If the above figure is a B-Tree, all of the items would be valid values, and each
would represent a key. However, in a B+-Tree, only the items in the leaves are
valid values; the rest act as keys to guide a path to the correct value.
When searching the tree for a value N, the value of the current node is compared
against it. If N is smaller than the value of the current node, the path to the
left node is taken; if it is larger, the path to the right. Inside a B+-Tree, since
values are in the leaves only, this is extended to greater than or equal to.
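As an illustration, the descent can be sketched as follows on a simplified node
with a single key and two children; real SQLite pages hold many keys per page,
but the comparison logic is the same, and the code is illustrative rather than the
tool's own:

// Simplified B+-Tree-style search: values live only in the leaves,
// so "greater than or equal" descends right.
class TreeNode {
    long key;
    TreeNode left, right; // both null in a leaf
    Object value;         // set only in leaves

    Object search(long n) {
        if (left == null && right == null) {
            return key == n ? value : null; // reached a leaf
        }
        return (n < key) ? left.search(n) : right.search(n);
    }
}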
Insertion and deletion follow the same pattern, upholding the ordering of the
values. SQLite uses page types and relationships to determine the order in which
the B-Tree nodes are connected; for example, an overflow page can only be
connected to a B-Tree cell, and the data for a specific table can be found attached
to a common parent.

Figure 1.6 below shows an example B-Tree that could be found within SQLite.
One thing to note is how the file is made up mainly of B-Tree pages. This was
briefly mentioned in the last section: pointer map, lock byte and overflow pages
only appear when their requirements are met, and freelist pages only when
enough data has been deleted, leaving only the B-Tree pages.

At the start of each B-Tree page is the B-Tree / page header. Following the
header is an array of pointers to the page's cells. One thing to note is that the
first page also holds the database header, which has to be skipped before reaching
the page header. Figure 1.7 shows the full layout of the SQLite file.
Figure 1.7: SQLite file format, modified from Drinkwater (2011)

When cells are added to a page, they start at the end of the page and work
backwards towards the top. The main difference between each type of B-Tree
page can be seen inside the cells, as they carry the payload for the node.
As mentioned in section 1.5.1, there are four types of B-Trees. These can be split
into two main types, each with two sub-types. The main types are table and index,
both of which use a key-value system to organise themselves. The table B-Trees
use 64-bit integers, also known as the row-id or primary key; these are often what
the user has set inside the database, else SQLite will attach its own. The index
B-Trees use database records as keys.

The sub-types of B-Trees are leaf and interior. The leaves are located at the end
of the tree and contain no children, whereas an interior node will always have at
least one child. In addition to this, all database records / values within the B-Trees
are sorted using the following rules, written by Raymond (2009):

1. If both values are NULL, they are considered equal.

2. If one value is NULL and the other is not, the NULL value is considered
lesser.

3. If both values are integer or real values, they are compared numerically.
4. If one value is a real or integer value, and the other is a text or blob value,
then the numeric value is considered lesser.

5. If both values are text, then the collation function is used to compare them.
The collation function is a property of the index column in which the values
are found.

6. If one value is text and the other a blob, the text value is considered lesser.

7. If both values are blobs, memcmp() is used to determine the result of the
comparison. If one blob is a prefix of the other, the shorter blob is
considered lesser.
Overall, the four types of B-Trees found inside SQLite are:

• Index B-Tree interior
• Index B-Tree leaf
• Table B-Tree interior
• Table B-Tree leaf

In the case of index B-Trees, the interior nodes contain N children and N-1 database
records, where N is two or greater, whereas a leaf will always contain M database
records, where M is one or greater. The database records stored inside an index
B-Tree hold the same number of entries as the associated database table, with the
same fields and columns across the tables and rows. This can be seen in figure 1.8.
Index trees are used by SQLite to keep track of foreign keys and row relationships.
CREATE TABLE t1(a INTEGER PRIMARY KEY, b, c, d, e); -- five columns, matching the INSERTs below
CREATE INDEX i1 ON t1(d, c);

INSERT INTO t1 VALUES(1, 'triangle', 3, 180, 'green');
INSERT INTO t1 VALUES(2, 'square', 4, 360, 'gold');
INSERT INTO t1 VALUES(3, 'pentagon', 5, 540, 'grey');
The table B-Trees are completely different to the index trees. The interior type
contains no data, only pointers to child B-Tree pages, as all the data is stored
on the leaf type. The interior nodes contain at least one pointer, and the leaf
nodes contain at least one record. For each table that exists in the database, one
corresponding table B-Tree also exists, and that B-Tree contains one entry per
row, appearing in the same order as the logical database. Figure 1.9 shows the
physical layout of the table B-Tree.
1.5.4 Varints

A varint is a variable-length integer, one to nine bytes long, used throughout the
file format to save space. The most significant bit of each byte indicates whether
the next byte is part of the same varint.

For example, taking the value 5B in hex and converting it to binary creates the
value 01011011. In this case the most significant bit is not set, leaving the final
value at 91. However, if the value in hex is 84, converting it to binary creates the
value 10000100; the most significant bit is set, therefore the next byte is needed.
In this case the value of the next byte in hex is 60, and converting it to binary
creates 01100000, meaning that this varint is two bytes long. In order to create
the final value, the bytes are concatenated together, leaving out the most
significant bits, creating the value 00001001100000 and giving a final value of
608 in decimal (Drinkwater, 2011). Table 1.1 shows all combinations of varints.
Bytes  Value range (bits)  Bit pattern
1      7                   0xxxxxxx
2      14                  1xxxxxxx 0xxxxxxx
3      21                  1xxxxxxx 1xxxxxxx 0xxxxxxx
4      28                  1xxxxxxx 1xxxxxxx 1xxxxxxx 0xxxxxxx
5      35                  1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 0xxxxxxx
6      42                  1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 0xxxxxxx
7      49                  1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 0xxxxxxx
8      56                  1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 0xxxxxxx
9      64                  1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx xxxxxxxx

Table 1.1: Varint combinations (Raymond, 2009)
Immediately following the header is the array of cell pointers; the number of cells
is read at offset 3 of the page header. Each cell pointer is 2 bytes in size. It is
worth noting at this point that the pointers and offsets are relative to the start of
the page rather than the start of the file, keeping each page self contained.
Therefore, in order to follow the cell pointers or the other offsets, the following
sum is needed to calculate a position in the file:

cellOffset = ((pageNumber - 1) * pageSize) + offset;

The right-most pointer within interior B-Tree pages is the child's page number,
not its offset, therefore to calculate the page offset the following sum is used:

pageOffset = (pageNumber - 1) * pageSize;
Data type       Description
4 byte integer  Page number of child
Varint          Row id

Table 1.4: Table B-Tree interior cell

The leaf type is a little more complex; this can be seen in Table 1.5 below:

If there is an overflow, the following calculation can be used to work out the size
of the record held in this part of the index B-Tree cell before jumping to the
overflow page:
usable-size = page-size - bytes-of-unused-space;

min-local = (usable-size - 12) * min-embedded-fraction / 255 - 23;
max-local = (usable-size - 12) * max-embedded-fraction / 255 - 23;

local-size = min-local + (record-size - min-local) % (usable-size - 4);

if (local-size > max-local) {
    local-size = min-local;
}

Similarly for the table B-Tree, the only difference is in the calculation of the max
local, where the following sum is used instead:

max-local = usable-size - 35;
Figure 1.10: Database record format (Raymond, 2009)
The cell content, as shown in figure 1.10, follows the same order as the layout
given in the record header, with the content size and type specified in table 1.7.
Where the size is 0, there are no bytes to read from the data section and the
column should be skipped.
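As an illustration of how a serial type from table 1.7 maps to a content size, the
following sketch follows the published file-format rules; the method name is
illustrative and this is not the tool's actual code:

// Size in bytes of a record value, given its serial type from the
// record header (per the SQLite file-format documentation).
static long contentSize(long serialType) {
    if (serialType >= 12) {
        // Even types >= 12 are BLOBs, odd types >= 13 are text.
        return (serialType - (serialType % 2 == 0 ? 12 : 13)) / 2;
    }
    switch ((int) serialType) {
        case 0:  return 0; // NULL
        case 1:  return 1; // 8-bit integer
        case 2:  return 2; // 16-bit integer
        case 3:  return 3; // 24-bit integer
        case 4:  return 4; // 32-bit integer
        case 5:  return 6; // 48-bit integer
        case 6:  return 8; // 64-bit integer
        case 7:  return 8; // IEEE 754 floating point
        case 8:  return 0; // constant 0 (schema format 4)
        case 9:  return 0; // constant 1 (schema format 4)
        default: throw new IllegalArgumentException("Reserved serial type");
    }
}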
In some of the test databases the root page was an interior table B-Tree, so going
by page numbers in order to find the schema table is a bad idea. Table 1.8
below shows the payload / record layout of the "sqlite_master" table.

In order to tell if the current page describes a table, the first column should
contain one of the four types mentioned in the above table. The page number in
the fourth column then gives the content for that table. Much like the right child
pointer mentioned in section 1.5.5, this is the page number of the child, not a
pointer.
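The schema table is also reachable through plain SQL, which is how the tool's
table view later retrieves it. A sketch of reading it via JDBC, assuming an open
Connection (the columns of sqlite_master are type, name, tbl_name, rootpage
and sql; class and method names are illustrative):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

class SchemaReader {
    // Lists every schema object and its root page from sqlite_master.
    static void printSchema(Connection conn) throws Exception {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT type, name, rootpage FROM sqlite_master")) {
            while (rs.next()) {
                // type is one of: table, index, view, trigger.
                System.out.println(rs.getString("type") + " "
                        + rs.getString("name") + " -> root page "
                        + rs.getInt("rootpage"));
            }
        }
    }
}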
2 Design
Having looked at SQLite and the current tools, this chapter will cover the
requirements set out for the tool, its features, and a high-level overview of the
architectural design, before going into more depth on each module that makes up
the application, and finishing with a look at how the user interface could look.
2.1 Requirements
In the previous chapter this paper showed that there is a lack of tools available
that allow an insight into SQLite and how it works. The exception to this was
the SQLite fragmentation tool, which did show the file format, though not the
overarching structure. It also had some severe limitations as to the types of
databases it would work with.

The previous chapter also showed how the file is put together, constructing a
large B-Tree structure. As just mentioned, there is no way to view this structure
without using a hex editor and manually following the links.

Lastly, most of the SQLite tools are front end user interfaces for SQLite databases.
While these are useful for working with SQLite databases, they provide no way
to debug or see what is going on.

In order to address the issues listed above, throughout the remaining sections this
paper will design, implement and test a front end user interface tool that can solve
the current situation.

The tool itself must be reliable and support the majority of features in the current
version of SQLite at the time of writing this paper. It must also not modify
the database file in any way, to preserve the database and the data it contains,
excluding parts of the application that are specifically designed to. Lastly, the tool
must be cross platform and operable through a user interface.
2.2 Features
As just mentioned, there are a lot of issues surrounding the current situation with
SQLite tools. The tool this paper will design and implement contains five main
features, on top of a single base feature.

The central feature is the visualiser. The visualiser presents the broken-down
page structure and hierarchy of the SQLite database file: the file broken into
pages, and how they connect to each other in the B-Tree structure. On top of
this, clicking or otherwise interacting with a node shows more information about
it, such as its data, page number and pointers.

In addition to the visualiser, there will be a metadata tab that presents the
header information in the database, alongside other statistics that come from
parsing the database, such as the number of tables, primary keys and version
numbers.

The base feature allows real time updating of this data whenever any command
from any system modifies the database in some way, shape or form. The live
updating will allow stepping through the time line of updates that have occurred
while the application is running. Lastly, it can be paused to inspect a certain
state / place in time.

Whenever an update occurs, all changes that happened are recorded inside a log,
and a "snapshot" of the database is taken. This snapshot is then presented to the
user through the visualiser, metadata and log tabs, creating the time line that can
then be browsed.

Apart from just showing the data, the application should allow SQL commands
to be executed against the database through the SQL executor, and allow viewing
of the schema and tables currently inside the database.

Finally, the tool will come with a friendly user interface through which the user
can interact with each of the features. Each feature above aims to address the
issues mentioned at the start of this chapter.
The controllers in turn contact the model for information, then send the
information back to be presented by the view. The idea is that the view could be
switched or adjusted at any time without breaking the application. An example
of MVC can be seen below in figure 2.1.

With this in mind, the bulk of the work is performed inside the model. However,
like any project, some problems occurred along the way while building the
application, and so the original design had to be adjusted. Figure 2.2 shows the
original design.
The model was going to run in its own thread so it could control, manage and
prepare the data as it came in. This would allow the view to request it whenever
it wanted. The model was also made up of five modules, with the logging stored
in an external file for the view to read when it wanted. In addition to this, the
controllers followed a hierarchical structure, with a master controller controlling
access to the model. However, this design proved unusable and thus changed into
the following, seen in figure 2.3.

Most of the changes are within the model, with the addition of two new modules;
rather than running the whole thing inside a new thread, only the file watcher is.
On top of this, the command logging is no longer written out to file. The final
change is the reduction in the number of controllers. The next section will go
over each of the modules in depth.
2.4.2 The View
The view consists of three parts: the top, middle and right side. This is the most
flexible of the modules and can be adjusted depending on the user interface design.
The view is designed to hold the layout of each section and provide interaction
with the model, as mentioned previously. In this case the view is made up of
three parts: the top, representing the top / menu bar section of the interface, and
the middle and right side sections, showing the user what they are viewing.

The first module, marked as 'model', represents the model interface, through
which all communication from the controllers to the sub-modules goes. It is also
the only class to have direct access to all sub-modules, in order to keep each
module as modular and independent as possible. In addition to this, it provides
a small amount of functionality, such as setting up, closing and opening the
database, since every module will require something from these actions.

The database is the in-program mapping of the SQLite database file. It is made
up of two parts: the data objects, and the interface onto the data objects. The
data objects are the mapping of the SQLite database, containing the B-Tree
system and the data. The interface provides access control to the data objects,
allowing the program to move along the database time line.
The file watcher runs inside its own thread, utilising the observer pattern. The
observer pattern allows any class to register with it and receive a signal when an
event occurs; in this case, the updating of the database file. When a change is
detected, a signal is sent out to the observers, alerting them to the change. A
class that does not register with the observer will not receive database updates.
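A minimal sketch of this arrangement, using Java's built-in Observable and
Observer types (illustrative only; the tool's own interfaces may differ):

import java.util.Observable;
import java.util.Observer;

// The file watcher extends Observable and notifies on a change.
class FileWatcher extends Observable {
    void fileChanged() {
        setChanged();
        notifyObservers(); // signal every registered observer
    }
}

class LiveUpdater implements Observer {
    @Override
    public void update(Observable source, Object arg) {
        // React to the database file changing.
    }
}

// Registration: fileWatcher.addObserver(new LiveUpdater());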
The file parser parses any given valid database file, converting it into the database
object.

The log takes any two database objects and records the changes between them.

The live updater acts like a master controller for the modules, apart from the SQL
executor, file watcher and model interface. It controls the program flow when a
change is detected, and as such is registered with the file watcher's observer. When
an update signal is sent, the first thing it does is contact the file parser for the
updated database object, then it sends that object off to the log to record the
changes, before storing it in the database module and incrementing the current
position on the time line.

The SQL executor controls the SQL connections, and executes SQL commands
against the database.
The main concept behind this design was to keep the interface close to other
programs in use; this opens the application up to a wide range of users, as, stated
towards the start, the application had to be easy to use by a variety of users.
Originally the SQL executor was going to have its own tab, but it worked out
better alongside the content, as the user can then view what the commands are
doing to the database when they are run, or have the information as a reference
when typing up commands. Below, figure 2.5 shows the SQL executor user
interface design.

The user will type commands into the top text box, and then press the button to
run the command. After the command is run, a message will be displayed in the
return text box, and any returned data will be presented in the results table.

The base live updater is controlled via the icons and drop down menus, with the
four other features having their own tab. Since the live updater is hidden in the
background there is no need for it to have a dedicated section. The controls can
be seen above in figure 2.4.

The metadata tab is designed to have different panes, split up by data relevance,
each showcasing different values. For example, one pane will show the size,
another the version numbers, and so on. This will let the user quickly see the
needed and relevant information in one place. This can be seen below in
figure 2.6.
Figure 2.6: Metadata interface design.
The other feature that is closely tied to this information is the visualiser. The
visualiser is made up of two parts: the left hand side, where the data will be shown,
and the central section, showing the visualisation. The visualisation will display
the file's B-Tree as it is inside the file, with the pages represented as nodes. When
a node is selected, its data will be shown inside the data pane. This can be seen
below in figure 2.7.

In the above figure, the glowing node tells the user that this node was changed in
the last update. This makes it easier to see which pages were updated, else the
user would have to find the pages and manually check each one for a change. The
glowing node makes changes easier to spot, though this could be represented in a
variety of ways. Alongside this, a log entry is created.
The log tab, much like the metadata tab, will be made up of small panes, titled
with the date, with collapsible content. The content will store what changed in
that update. This can be seen in figure 2.8.

The idea behind the folding panels is to allow users to hide information that they
do not need, so that a large change does not fill the screen.

The last feature and tab is the table and schema view. This, much like the
visualiser, uses two sections: the left, to display the list of tables and the schema,
with the centre displaying the table data. This can be seen in figure 2.9.
3 Implementation
Having looked at the overall architecture of the application and how it is built
up, this section will go over the tools used during development, how the features
were implemented, and the problems encountered along the way.

Firstly, each view or section can be represented using an FXML file. The FXML
format is heavily based on HTML, including support for CSS styling. Each file
starts with a root node, normally one of the panes, such as border, anchor, or
grid. For this application the majority of FXML files started with anchor panes,
apart from the menu bar, which used a border pane. Following the root pane
are the items attached to it. Each item can be given a unique identifier that allows
it to be controlled via Java and CSS. In addition to this, custom items can be
included for use within JavaFX.
Secondly, the controller is a normal Java class set as the controller for a particular
FXML file. This can be done in two ways. The first way is to inject the controller
into the loading process. This allows the application to call other methods inside
the controller, such as initialisation, before the FXML file is loaded. It also
gives more control over the controller, and over where it is used. The second way
is to specify the controller inside the FXML file, and Java will load the controller
when the file is loaded, but access to the controller object is lost. Inside
the controller, the annotation @FXML allows Java to inject an item from the
view into the controller, and vice versa, giving full control over the FXML file
and the object via the item's unique identifier. Throughout this application
the first method of loading the controllers was used, to allow controller sharing
and manual control over the controllers.
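A sketch of the first (injection) approach with JavaFX's FXMLLoader; the file
path and class name are illustrative, not the tool's actual code:

import javafx.fxml.FXMLLoader;
import javafx.scene.Parent;
import java.io.IOException;

public class ViewLoader {
    public Parent loadView(Object controller) throws IOException {
        FXMLLoader loader = new FXMLLoader(getClass().getResource("/fxml/view.fxml"));
        // Inject the controller before loading, rather than naming it
        // inside the FXML file, so the caller keeps a reference to it.
        loader.setController(controller);
        return loader.load();
    }
}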
The view is made up of four sections: the menu bar, containing the file, edit and
other drop downs, including the icons and tabs; and the other three sections,
representing the left, middle, and right sections of the central pane. This means
that at any one time three different views can be displayed. Each of the sections
has its own FXML file; depending on the situation they may also share a
controller. This can be seen below in figure 3.1.

As mentioned earlier, the menu bar will never change, and should never change.
Using this fact, the menu bar controller also doubles up as a 'master' controller.
It controls what is currently seen in the other three sections, loading and freeing
up the necessary sections for each tab.

The central pane is a split pane with two dividers, allowing each section's size to
be adjusted on the fly to fit the user's needs. If a section is not needed, it is just
a matter of hiding that pane's divider bar.
The controllers for each section extend an abstract controller class. The controller
class enforces a model interface object in the constructor, and implements
observer. The model interface allows each controller to separately contact the
model, as previously mentioned, to collect the data for the view. By implementing
observer, the controller can be registered for the signal sent when the database
is updated within the live updater (section 2.4.4), meaning that the updated
information can be collected as soon as it is ready, without having to wait or
have a manual refresh button.
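A sketch of what such a base class might look like under these constraints; all
type names are illustrative rather than the tool's actual class names:

// Minimal stand-ins for the model interface and the observer
// contract exposed by the file watcher / live updater.
interface ModelInterface { /* access to the model's sub-modules */ }
interface Observer { void databaseUpdated(); }

// Every section controller receives the model interface at
// construction time and listens for database updates.
abstract class AbstractController implements Observer {
    protected final ModelInterface model;

    protected AbstractController(ModelInterface model) {
        this.model = model;
    }
}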
Everything is attached to the model interface, with the live updater being the
exception, holding a copy of the model. In addition to providing access to the
other modules, the model interface has a very small amount of implementation
of its own, used only when every module is affected, as in the case of setting up,
closing and opening a database, which are all calls to the corresponding methods
on the modules' interfaces.
The items are stored within an ArrayList, and an int counter is used to keep
track of where in the array the application currently is.

The storage is made up of database items that contain a "snapshot" of the current
state of the database file. A database item is made of two parts: the metadata and
the B-Tree. The metadata contains all the information in the header, plus a few
other values such as the number of tables, the file name, and the page number.
The B-Tree is a custom implementation, holding the various B-Tree pages as
represented in the file. The database items are filled with information via the file
parser. Out of all the classes, the database items are the most used, being sent to
the view for displaying, and modified by the various other modules during
creation.
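A sketch of this time line storage with a movable cursor; the Database type
stands in for the tool's database item, and all names are illustrative:

import java.util.ArrayList;
import java.util.List;

class Database { /* metadata and B-Tree snapshot for one point in time */ }

// Time line of database snapshots, tracked by an int cursor.
class DatabaseHistory {
    private final List<Database> snapshots = new ArrayList<>();
    private int current = -1; // index of the snapshot being viewed

    void add(Database snapshot)  { snapshots.add(snapshot); }
    Database currentSnapshot()   { return snapshots.get(current); }
    void stepForward()           { if (current < snapshots.size() - 1) current++; }
    void stepBackward()          { if (current > 0) current--; }
    void jumpToLatest()          { current = snapshots.size() - 1; }
}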
Due to the polling nature of the file watcher, it runs inside its own thread and
communicates over an observer pattern that any other class can tune into,
providing it implements the Observer interface. The thread can then process the
updated databases without stopping or slowing down the user interface and other
interactions.

The final implementation effectively and consistently detected all changes
performed on the database. Since it was custom made, it was also a lot lighter
than the alternatives. In addition to this, the loop had no noticeable performance
hit on the computation or the rest of the application.
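A minimal sketch of such a polling loop, assuming the watcher checks the file's
last-modified timestamp; the paper does not specify the exact check or interval,
so both are illustrative:

import java.io.File;
import java.util.Observable;

// Polling watcher: checks the file's last-modified time on its own
// thread and notifies observers when it changes.
class PollingFileWatcher extends Observable implements Runnable {
    private final File file;
    private long lastModified;

    PollingFileWatcher(File file) {
        this.file = file;
        this.lastModified = file.lastModified();
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            long modified = file.lastModified();
            if (modified != lastModified) {
                lastModified = modified;
                setChanged();
                notifyObservers(); // signal the live updater and friends
            }
            try {
                Thread.sleep(50); // poll interval; illustrative
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}

// Usage: new Thread(new PollingFileWatcher(new File("example.db"))).start();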
3.2.5 File parser
The file parser takes a database file and a database object, and converts the
database file into the database object. Parsing the database starts with checking
the magic number, then the header, before moving onto the pages. Reading the
magic number and header information means correctly reading the first 100 bytes
of the file; see Appendix A for the header layout. To parse the pages the
application relies heavily upon recursion.

First it parses the page header, then switches into the method that deals with
that type of page, which in turn calls the original method whenever it reaches a
page number. Each page is represented as a node, with the contents of the node
represented as cells. The only main issue with this design is the size of the stack
on a large database. Below is the pseudocode of the algorithm:
public void parseBTree(stream, database) {
    database.getBTree().setRoot(parsePage(stream, 1, database.getPageSize()));
}

public Node parsePage(stream, pageNumber, pageSize) {
    Node node = new Node();
    PageHeader header = parseHeader(stream, pageNumber, pageSize);

    BTreeCell cell;
    switch (header.getType()) {
        case TABLE_BTREE_LEAF_CELL:
            cell = parseTableBTreeLeafCell(stream, header, node, pageSize);
            break;
        ....
    }
    // Interior pages carry one extra right-most pointer to a child page.
    if (header.getType() == INTERIOR_CELL) {
        node.addChild(parsePage(stream, header.getRightMostPointer(), pageSize));
    }
    node.setData(cell);
    return node;
}

public BTreeCell parseTableBTreeLeafCell(stream, header, node, pageSize) {
    BTreeCell cell = new BTreeCell();

    for (cellPointer : header.getCellPointers()) {
        cell.data = readData(stream, cellPointer);
        // A cell that references another page triggers a recursive parse.
        if (cell has pageNumber) {
            node.addChild(parsePage(stream, cell.pageNumber, pageSize));
        }
    }
    return cell;
}
During the process of parsing the tree, the file parser also needs to decode the
'varints' mentioned in section 1.5.4, especially as they are needed to count the
number of bytes in the record headers. Below is the pseudocode of the algorithm
that decodes them, retrieving both the encoded number and the number of bytes
used:
private long[] decodeVarint(stream) {
    long[] value = new long[2];
    byte[] varint = new byte[9];

    for (i = 0 to 9) {
        varint[i] = stream.readByte();
        if (most significant bit of varint[i] is not set) {
            break;
        }
    }

    if (i == 0) {
        value[0] = varint[0]; // single-byte varint: the byte is the value
        value[1] = 1;
    } else {
        for (j = 0 to i) {
            varint[j] = (varint[j] << 1); // drop each continuation bit
        }
        value[0] = varint.toLong(); // concatenate the remaining 7-bit groups
        value[1] = i + 1;
    }
    return value;
}
The first value returned in the array is the value of the varint, and the second its
size in bytes. The only issue with this algorithm is the use of the two for loops,
increasing its time complexity. However, in operation the loop count was always
less than nine, so this was never a major problem.
The first technique utilised SQLite's triggers. Triggers execute SQL commands
when a delete, insert or update is performed on a table, with an optional where
clause. Using this, Chirico (2004) used three separate triggers to log the time, the
changes before and after, and the type of action performed on the table. The last
part is one of the reasons why this could not be taken any further. Firstly, the
application would need three triggers per table in the database, so N*3 triggers
where N is the number of tables. Secondly, in order to accomplish this, an
additional table would have to be created to store the changes, hence the log file
in the original design, where the application would attach to the database and
write to it. Lastly, the triggers meant altering the database file, which is something
that needed to be avoided as much as possible, so as not to impede the running
of the database.
The second solution was to try and hook into SQLite through its API, more
specifically the sqlite3_trace function. It takes a callback function that is then
called with the SQL commands at various stages as they pass through the system.
Unfortunately, at the current time the JDBC driver for SQLite did not support the
function that was needed. In order to get around this, a couple of functions had
to be written in C that could then be called from Java in order to access the API.
This worked for the most part, except that the method only calls the callback
function for SQL sent from the current connection, when the main aim of the
application needed to collect all the changes, thus making this unusable.

The third way was to write a custom extension to SQLite, or download the source
code and modify it to suit the application's needs. This seemed to be too far from
the original path; if the application used a custom version of SQLite it would
only work on that custom version. As mentioned previously, one of the goals is
to not modify the data if possible, so writing an extension that would have to be
loaded into SQLite and attached to the database, possibly conflicting with any
other extensions the user might have, was out of the question.

The final option, while less sophisticated than the others, worked well enough,
although it does not capture the original requests. It does end up recording the
time, and all changes that happened per command. Since the database storage
contains all of the previous versions, like snapshots of the database, when an
update comes in the application can simply compare the new updated database
to the last database object that passed through.
Comparing the databases required looping through every data value in both trees
and comparing them; not only the data values, but also the added and removed
pages, which could not be detected through any of the other techniques. Clearly,
looping through every single item in a larger database would quickly become a
bottleneck, and slow the application and parsing down. So, in order to speed it
up, two things had to be changed. Firstly, the data array had to be hashed; if the
hashes match then there is no need to loop through the data. Secondly, the custom
B-Tree implementation was adjusted into a modified version of the Merkle tree
by Merkle (1988). The basic idea behind the Merkle tree is that each node in the
tree has a hash of its children's hashes, all the way down to the leaf nodes, whose
hashes are based on the contents of the data. Below, figure 3.3 shows a diagram
of the Merkle tree.
With this, the application can tell if there is any change in the current section of
the tree just by comparing the nodes' hashes, without having to loop over them.
This allows the program to loop over the nodes only when a change is detected,
skipping the parts that have not been changed, unless the entire database is
modified, in which case the application has to revert to looping over everything.
The hash for each node is calculated from the hash of the data inside it and its
children's data. This lets the application detect if a page has been added, removed
or modified.
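A simplified sketch of the idea; a practical implementation would most likely
cache each node's hash rather than recompute it on every comparison, and the
type names here are illustrative:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Merkle-style node: a node's hash combines the hash of its own data
// with its children's hashes, so equal hashes imply identical subtrees.
class MerkleNode {
    byte[] data;
    List<MerkleNode> children = new ArrayList<>();

    int hash() {
        int h = Arrays.hashCode(data);
        for (MerkleNode child : children) {
            h = Objects.hash(h, child.hash());
        }
        return h;
    }

    // Collects changed nodes, descending only into differing subtrees.
    static void diff(MerkleNode oldNode, MerkleNode newNode, List<MerkleNode> changed) {
        if (oldNode.hash() == newNode.hash()) {
            return; // identical subtree: skip it entirely
        }
        changed.add(newNode);
        int shared = Math.min(oldNode.children.size(), newNode.children.size());
        for (int i = 0; i < shared; i++) {
            diff(oldNode.children.get(i), newNode.children.get(i), changed);
        }
        // Children beyond 'shared' correspond to added or removed pages.
    }
}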
When the application detects an update to the data, it marks that page as
modified, with a simple boolean value, recording the string values from the old
page and new page into a log object. If instead it was a removal or addition of
data, it stores the new / removed value. Something similar happens with added
and removed pages. With the addition of pages, the added page is marked as
changed; often when this happens a pointer to the new page will also be placed
somewhere, so this is also recorded. When a page is removed, utilising the data
from the old tree allows the application to see exactly what page has been
removed and record it, but there is nothing to mark as changed, since the page no
longer exists, apart from the pointers in other pages to that page.
However, to collect metadata information about the database, the live updater had
to run SQL against the database, or else it would have to loop through the entire
tree again. So it needed access to the SQL executor, which is covered in the next
section. Then the problems with the logging came up, and while working around
those problems, the logging ended up inside the live updater. As a result, the
module soon became bloated as new features were added, since it was the only
module that had access to all the needed resources.

Rather than fighting against the application design, it was better to dedicate this
as a sort of master module that orchestrates the process when an update signal is
received. This allowed the extra tasks that were piled on to move back into their
correct modules; as a result, it does exactly what was set out in the beginning,
acting as a tailored interface onto the database interface, contacting the file parser,
SQL executor, and log modules to control the parsing of the updated database.

When an update signal is received, the live updater requests a database object
from the file parser, and adds the extra metadata from the SQL commands. It then
requests the previous database object from the database interface and sends both
to the log, before storing the new database object inside the database interface. If
the application is not paused it will then increment along the time line.
The SQL executor was one of the more straightforward modules to implement,
simply calling the corresponding functions on the JDBC API.
The entire user interface is made up of eight FXML files and one CSS file for
styling. As mentioned previously, the FXML files contain the layout for each of
the sections; therefore each of the sections seen by the user represents a single
FXML file. The application takes on a dark colour scheme throughout, due to
personal preference; however, this could easily be swapped out for a lighter
colour scheme.
The menu bar is made up of one FXML file. In addition to the visual elements,
using the JavaFX key code combination class it provides keyboard shortcuts,
such as 'control-o' to open a database and 'control-q' to quit. The controller
for this FXML file contacts the model interface for opening and closing the
database, and the live updater for controlling the time line. This can be seen
below in figure 3.4.

The right section is made up of a single FXML file containing the SQL executor.
As previously mentioned, the SQL executor module enables arbitrary SQL
commands to be run on the database. As such, this section contacts the SQL
executor in order to connect, run commands, and close the connection. The final
interface can be seen below in figure 3.5.
Figure 3.5: SQL executor UI.
The metadata tab is made up of a single FXML file. The outer layer is a scroll
pane with flow pane content. The flow pane provides a dynamic layout that
adjusts to different resolutions, alongside the scroll pane. The panels inside the
flow pane are panes with grid pane content. The grid pane has two columns: the
left or first column representing the description or name, and the right or second
column the value. In order to collect the metadata, the controller contacts the
database interface to collect the current database object. The user interface can
be seen below in figure 3.6.

The table view is made up of two FXML files that share a controller. The table
view allows users to select a table and view the schema and all the data within
it. In order to accomplish this it contacts the SQL executor, and runs a select all
query to collect the data, and a select from 'sqlite_master' to retrieve all tables
and schemas. This interface can be seen below in figure 3.7.
The visualiser is made up of two FXML files, sharing a single controller. The
centre section, containing the visualisation of the database, uses a custom scroll
pane node to enable zooming in addition to scrolling. Each node is represented as
a pane, with a CSS class for the colouring. The node contains the corresponding
B-Tree node from the database object, which is then used to display the data
within the left side pane. In order to draw the structure, the first pass sets the
vertical positions going top down; this pass is also used to load the nodes, via
recursion. After all the nodes are loaded, a second pass calculates the horizontal
positions using a bottom up approach. This interface can be seen below in
figure 3.8.
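A sketch of the two passes on a simplified node type; this is illustrative rather
than the tool's actual layout code:

import java.util.ArrayList;
import java.util.List;

// Two-pass tree layout: depth fixes the vertical position on the way
// down; horizontal positions are assigned bottom up, with each parent
// centred over its children.
class LayoutNode {
    List<LayoutNode> children = new ArrayList<>();
    double x, y;

    void layoutVertical(int depth, double rowHeight) {
        y = depth * rowHeight;
        for (LayoutNode child : children) {
            child.layoutVertical(depth + 1, rowHeight);
        }
    }

    // Returns the next free horizontal slot.
    double layoutHorizontal(double nextSlot, double columnWidth) {
        if (children.isEmpty()) {
            x = nextSlot;
            return nextSlot + columnWidth;
        }
        for (LayoutNode child : children) {
            nextSlot = child.layoutHorizontal(nextSlot, columnWidth);
        }
        // Centre the parent between its first and last child.
        x = (children.get(0).x + children.get(children.size() - 1).x) / 2;
        return nextSlot;
    }
}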
The log is made up of one FXML file. Similar to the metadata tab, it contains a
single scroll pane with a VBox inside, allowing it to hold any number of items.
Each item is made up of a titled pane, where the title is the time and date of the
update, and the content is the changes that were performed in that update. To
collect the data, it contacts the log module and receives a list of log items. This
can be seen below in figure 3.9.
4 Testing
As mentioned at the start of this paper, and during the design section, one of the
aims is to make sure that the tool can be relied on. In order to accomplish this, a
variety of testing methods and tools were used; this section will go over these.

As stated at the start of this paper, SQLite is intended to be used as a file format
rather than a complete database system. Therefore testing the application on a
multi-gigabyte file would be unnecessary, as any database of that size would be
better off in a different system (Hipp, 2000). The situations where this is out of
the developer's control, or where there is no alternative, are classed as edge cases.

The unit tests themselves attempt to test all the possible actions that could be
performed on each exposed method, in an attempt to make sure that each part
of the program is operating as expected, apart from the very few edge cases.
However, some parts of the application are better tested than others, due to the
complex nature of some modules.
In order to test the user interface, a test framework that works alongside JUnit
called TestFX (Nilsson and Strath, 2012) was used; it is specifically designed to
test JavaFX. It takes the root node of the scene to test. Then, in the tests,
command methods such as click, move to and hover are used with either the
identifier or name of the item. It will then automatically control the mouse,
interacting with the user interface.
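A minimal sketch in the style of TestFX's JUnit integration; the ApplicationTest
base class and clickOn method are from the later TestFX 4 line, and the original
2012 API used slightly different names, so treat this as illustrative rather than the
project's actual tests:

import javafx.scene.Scene;
import javafx.scene.control.Button;
import javafx.stage.Stage;
import org.junit.Test;
import org.testfx.framework.junit.ApplicationTest;

import static org.junit.Assert.assertEquals;

public class MenuBarTest extends ApplicationTest {
    private int clicks = 0;

    @Override
    public void start(Stage stage) {
        // The real tests would load the application's FXML scene here.
        Button open = new Button("Open");
        open.setId("openButton");
        open.setOnAction(event -> clicks++);
        stage.setScene(new Scene(open));
        stage.show();
    }

    @Test
    public void clicksTheOpenButton() {
        clickOn("#openButton"); // TestFX drives the real mouse
        assertEquals(1, clicks);
    }
}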
TestFX works very well for testing the navigation paths around the user interface,
and for making sure that the user can get from one part of the application to
another. With that said, more complex and fine grained interactions, such as the
scroll pane within the visualiser and the opening of titled panes in the log, were
hard to test effectively; in such cases manual testing had to be used.

Integration tests were combined with the unit tests. For example, when the
TestFX tests click on the various buttons in the menu bar, the signals are still sent
to the model. This was used effectively to make sure that the various elements
within the program were working as expected.
5 Evaluation
Now that the paper is coming to an end, this section will reflect on the final
results of this project, and whether they have met the overall aims, including the
design principles used throughout the undertaking of the project.

Support for other database systems can be added when needed, just by extending
the interfaces and providing an implementation. In addition to sticking to good
OOP design, TDD (test driven development) was used in order to make sure that
the final application would work under a variety of circumstances. In the end
these aims were achieved, with the exception of the lock byte and pointer map
pages, as stated towards the start of the paper.

Firstly, with the exception of the log, all of the implemented features worked
as planned. The log originally was going to report the SQL commands sent to
the database, but as seen in the last chapter, due to the nature of SQLite this
became very hard to do. In order to get around this, the final system could only
report the changes that have occurred.
Although the application is built in Java, it does not suffer any performance
consequences on smaller databases as a result. Parsing the larger test database,
measured using Java's nanosecond timer, takes on average 154.1 milliseconds.
While this may not seem like a long time, if it is scaled up to a few megabytes
the parsing could potentially become an issue.
In addition to the performance scaling, there were a few features that did not get
added to the final application that would have enhanced it, including a table
exporter (to CSV, XML and JSON), a log exporter, and the ability to modify the
database without having to use SQL, just by editing the values shown in the
interface. In addition to this, the SQL editor could be enhanced by providing
auto-complete and syntax highlighting, to further improve the user experience.
Lastly, the table view should be able to collect its data from the snapshot rather
than the database file. This would have let users browse the data at a specific
time slot more easily, rather than having to inspect the nodes within the visualiser.
Most of these features were left out due to time constraints.

In addition to allowing the user to customise the interface, the application would
also benefit from an improved visualisation, allowing collapsible nodes and
having the nodes better represent what they contain, either as a minimised
preview or icons, rather than using colour coding.

Alongside the visualiser, other aspects of the user interface could be polished up
to make the overall experience more enjoyable: disabling the time line control
buttons when they cannot move along the time line in that direction; zooming
to the mouse pointer position on the visualiser rather than always to the top left;
and allowing the sections to be popped out of the main window frame, so they
can be dragged around separately.

Although these changes do not affect the application drastically, they are
complementary to the entire experience. As stated towards the start of this paper,
the tool should be accessible to everyone. In order to accomplish this, the
application should be easy and enjoyable to use. By implementing the above user
interface enhancements, the user experience and ease of use will increase
alongside it, opening the application up to more users.
5.4 Usage
The final system has the distinct advantage over many other applications of
showing how the SQLite file is laid out. In addition to the visualisation, it can
also show the changes from every single connection and command run against
the database, something previously impossible. This allows the application to be
used in a multitude of different ways.

Firstly, the application can be used for educational purposes, to teach and
understand how SQLite works internally, as the user can see the database grow
and shrink. This can then be generalised further, to show how database systems
use B-Trees.

Secondly, the application could be used as a tool to aid the debugging of SQLite,
and as a log. This would allow users to track changes and spot any anomalies
that occur in the file format, such as missing tables or corrupt databases. The
log would allow users to record the activities within the database, enabling the
detection of tables in need of optimisation, or of malicious activity.

The last usage covered here is as a replacement user interface for SQLite, as
stated towards the start of this paper. With the combined features and the user
interface enhancements mentioned above, this tool could replace the current
applications used for SQLite.
6 Conclusion
The start of this paper defined some very clear goals that this project hoped to
achieve. Firstly, to understand why SQLite is so good, what makes it so prevalent
and how it works, including the file format and its systems. This was achieved
throughout the first chapter of this paper.

The second aim was to take this knowledge and build a tool that could record
all operations performed on the database, while providing the same insight
gathered throughout this project without having to look through a hex editor. It
should also be easy to use and well tested, making it reliable and efficient. This
was successfully achieved through the second to fourth sections.

Lastly, to look at where this project could be taken in the future, and what could
be done to take the application to the next stage. This involved critically
evaluating the final application and what could be changed or added, and was
achieved in the final sections of this paper.

In conclusion, this project has been an overall success: the main aims that were
set out have been reached. However, the performance could still be improved,
and the user interface still needs some polishing in order to make it more user
friendly and to provide a better visualisation of the database. The last stretch is
always the longest, and a lot of time could be spent polishing the interface and
fixing all the edge cases that have not yet made themselves apparent.

In the future, other features could be added, such as providing support for the
other systems, such as extensions, lock byte and pointer map pages, and any other
changes made to SQLite. Another project that would be useful, stemming off of
this one, is to try and recreate the original SQL query sent to the database based
on the changes made, since the tool can currently only list the changes.

With that said, this has been an enjoyable project, and by the end of it all I have
learnt a great deal about SQLite, Java and JavaFX, and some of the more unique
data structures, such as Merkle trees; in addition to producing a useful tool that
can be used in the future whenever working on a SQLite database to discover any
problems.
References
Owens, M. (2006). The Definitive Guide to SQLite. Berkeley, California: Apress.

Hipp, R. (2015). SQLite: The Database at the Edge of the Network. Online video,
Skookum. https://www.youtube.com/watch?v=Jib2AmRb_rk. Last accessed
17th January 2016.
7 Appendix
7.1 Appendix A
Table 7.1 shows the header layout. All multibyte fields are stored in big-endian
format.
Byte offset  Byte size  Description
44           4          Schema format number: either 1, 2, 3 or 4.
                        1. Format supported back to version 3.0.0.
                        2. Varying number of columns within the same table.
                           From version 3.1.3.
                        3. Extra columns can have non-NULL default values.
                           From version 3.1.4.
                        4. Respects DESC keyword and boolean type. From
                           version 3.3.0.

7.2 Appendix B
7.2 Appendix B
Time in milliseconds to parse a database.

Time in milliseconds
210
147
147
163
145
142
141
139
138
169

Table 7.2: Time in milliseconds to parse