Reportnet3 Database Feasibility Study PostgreSQL
Contents
Querying
Exploiting data
Internal relationships
External relationships
Inserting data
Multi-user access, concurrency and transactions
Altering data structures
Copying datasets and snapshots
Creating copies
Performing backups
Security
Spatial data
Binary data
Scalability
Querying
PostgreSQL provides a full SQL syntax, and this document assumes familiarity with it. The
application can rely on this syntax to query the data efficiently. Regular users can also benefit
from the ability to write filters in a rather friendly way, like status='Open' and date>'2018-03-01'.
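As an illustration of how the application could accept such friendly filters safely, the following is a minimal sketch, not the Reportnet implementation. The helper name build_where, the table name reports and the column whitelist are hypothetical, and the %s placeholders assume a psycopg-style driver.

```python
# Hypothetical helper: turns simple user filters into a parameterized WHERE clause.
ALLOWED_OPS = {"=", "<>", "<", "<=", ">", ">="}


def build_where(filters, allowed_columns):
    """Return (sql_fragment, params) for a list of (column, op, value) filters."""
    clauses, params = [], []
    for column, op, value in filters:
        if column not in allowed_columns:
            raise ValueError(f"unknown column: {column}")
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        # Column names come from a whitelist; values travel as query parameters.
        clauses.append(f'"{column}" {op} %s')
        params.append(value)
    return " AND ".join(clauses), params


# The filter from the text: status='Open' and date>'2018-03-01'
where_sql, where_params = build_where(
    [("status", "=", "Open"), ("date", ">", "2018-03-01")],
    allowed_columns={"status", "date"},
)
query = f"SELECT * FROM reports WHERE {where_sql}"  # "reports" is a made-up table
```

The query string and where_params would then be handed together to the driver's execute call, so user-supplied values are never concatenated into the SQL text.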
Bottom line: Widely-known syntax and full-featured capabilities. Friendly filter syntax.
Score: 5
Exploiting data
External exploitation of the data is done by EEA end applications. Currently, the main types are:
Map services that generally use extracts (FGDB) made with FME
Downloadable datasets (CSV, mdb, SQLite, xls,…) also made with FME or custom scripts
Tableau dashboards that access the database directly
Web applications that access the database directly
FME can query PostgreSQL through native readers that support filtering. Complex queries can be
performed via SQLExecutors.
If necessary, map servers (including ArcGIS) can use PostgreSQL directly as a live datasource.
Tableau supports connections to PostgreSQL out of the box, with capabilities similar to those of the
SQL Server connector.
Native drivers exist for all major development platforms, and have been around for quite a long
time, ensuring stability and completeness.
Bottom line: Mature client components for all use cases.
Score: 5
Internal relationships
A single, standard way of encoding all relation types can be easily agreed upon. 1-1, 1-N and N-M
relationships, with their variants (required or not required), map to normalized and efficient
database structures, and the Reportnet application will implement a single algorithm for each.
If custom queries are allowed, relationships will expose a normalized structure.
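One possible shape for that per-relationship algorithm is sketched below. It is only an assumption about how the encoding could look: the function name relation_ddl, the "<parent>_id" column naming and the integer "id" primary keys are all hypothetical.

```python
def relation_ddl(kind, parent, child, required=True):
    """Return DDL statements encoding a relationship between two tables.

    Assumes both tables already exist, each with an integer primary key
    named "id", and that the child table is still empty (a NOT NULL column
    without a default cannot be added to a populated table).
    """
    null = " NOT NULL" if required else ""
    fk_col = f"{parent}_id"
    if kind == "1-N":
        # Many child rows may point at the same parent row.
        return [
            f"ALTER TABLE {child} ADD COLUMN {fk_col} integer{null} "
            f"REFERENCES {parent}(id);"
        ]
    if kind == "1-1":
        # Same as 1-N, but UNIQUE allows at most one child per parent.
        return [
            f"ALTER TABLE {child} ADD COLUMN {fk_col} integer{null} UNIQUE "
            f"REFERENCES {parent}(id);"
        ]
    if kind == "N-M":
        # A link table holds one row per related pair; "required" is ignored.
        return [
            f"CREATE TABLE {parent}_{child} ("
            f"{parent}_id integer NOT NULL REFERENCES {parent}(id), "
            f"{child}_id integer NOT NULL REFERENCES {child}(id), "
            f"PRIMARY KEY ({parent}_id, {child}_id));"
        ]
    raise ValueError(f"unknown relationship kind: {kind}")
```

For example, relation_ddl("1-N", "site", "sample") yields a single ALTER TABLE statement adding a required site_id foreign key to sample.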
Bottom line: need to implement the logic, but underlying structure is standard.
Score: 4
External relationships
An official extension in PostgreSQL (Foreign Data Wrappers) allows querying heterogeneous external
data sources in a manner similar to SQL Server PolyBase. This is also the preferred method to query
other external PostgreSQL databases.
Both databases in external servers and those in the same server can be accessed in this unified way.
One of the limitations is that, when referencing remote tables, the spec (field names and types)
must be defined as a local “proxy” object. This means that, if the remote table changes, the local
proxy needs to be changed accordingly. In order to keep everything in sync, the application needs to
keep metadata and re-create proxies as needed. Mappings must also be defined for users across
servers.
Another drawback is that foreign keys cannot be enabled cross-database.
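To make the proxy maintenance concrete, here is a minimal sketch of how the application might regenerate a proxy from its stored metadata. The function name foreign_table_ddl and all table, server and column names are hypothetical. It assumes the postgres_fdw wrapper, with the server and user mapping already defined through CREATE SERVER and CREATE USER MAPPING.

```python
def foreign_table_ddl(local_name, server, remote_schema, remote_table, columns):
    """Return statements that drop and re-create a local proxy for a remote table.

    `columns` is a list of (name, postgres_type) pairs taken from the
    application's metadata about the remote table.
    """
    cols = ", ".join(f"{name} {pg_type}" for name, pg_type in columns)
    return [
        # Re-creating the proxy is how a changed remote spec gets picked up.
        f"DROP FOREIGN TABLE IF EXISTS {local_name};",
        f"CREATE FOREIGN TABLE {local_name} ({cols}) "
        f"SERVER {server} "
        f"OPTIONS (schema_name '{remote_schema}', table_name '{remote_table}');",
    ]


statements = foreign_table_ddl(
    "remote_sites", "other_db", "public", "sites",
    [("id", "integer"), ("name", "text")],
)
```

Running both statements in one transaction would keep the proxy available to other sessions until the new definition is committed.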
Bottom line: very similar implementation to internal relationships. Visible even without the R3
application. R3 needs to keep track of connections and maintain them.
Score: 3
Inserting data
This process would normally be controlled by the application in all possible input channels:
Forms
REST API
Bulk load from files
Custom applications, commercial products or power users will find a familiar SQL interface.
Bottom line: application wrappers will be easy to make. Familiar interface towards third parties.
Score: 4
Multi-user access, concurrency and transactions
PostgreSQL runs as a service that can be accessed by multiple users simultaneously without any
special configuration.
It provides fully ACID compliant transactions with the standard isolation levels. User code can
employ the usual BEGIN .. COMMIT .. ROLLBACK cycle.
A form of nested transaction is available through SAVEPOINTs.
Concurrent read-write or write-write operations should therefore behave as expected, whether they
are made from the application or directly on the database server.
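The savepoint mechanism can be sketched as follows. So that the example runs without a server, it uses Python's built-in sqlite3 module purely as a stand-in; this is an assumption of the sketch, not part of the study. The BEGIN, SAVEPOINT, ROLLBACK TO SAVEPOINT and COMMIT statements shown are also valid PostgreSQL syntax, and the table and savepoint names are made up.

```python
import sqlite3

# isolation_level=None stops the driver from opening transactions implicitly,
# so the BEGIN .. COMMIT cycle below is fully explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("CREATE TABLE deliveries (id INTEGER PRIMARY KEY, status TEXT)")

cur.execute("BEGIN")
cur.execute("INSERT INTO deliveries (status) VALUES ('kept')")

# Mark a point inside the transaction that we can return to.
cur.execute("SAVEPOINT risky_step")
cur.execute("INSERT INTO deliveries (status) VALUES ('discarded')")

# Undo only the work done after the savepoint; the first insert survives.
cur.execute("ROLLBACK TO SAVEPOINT risky_step")
cur.execute("COMMIT")

rows = [row[0] for row in cur.execute("SELECT status FROM deliveries")]
conn.close()
```

After the commit, only the row inserted before the savepoint remains in the table.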
Bottom line: transactions are available regardless of R3, and behave “as expected”.
Score: 4
Creating copies
Types
Empty: structure + reference data
Full copies of all the data
Scenarios
At design time, to experiment or share with stakeholders (empty or full)
When reporting starts, to create a new “country sandbox” (empty)
During reporting, inside each country’s sandbox, to make a new “release” (empty or full)
The standard command line tools provided by PostgreSQL (pg_dump and pg_restore) allow highly
customizable backups. By combining flags, we can create backups that contain all the table
structures and any combination of table data. Thus, we can copy all the table data, none or just
some. Metadata is needed in order to know which tables are “reference data”.
The syntax is complex and will require extensive testing before the application can safely clone
databases, but once it is in place it is likely to be very reliable. A minor drawback is that most
options require a temporary dump file to be created on disk.
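As a sketch of how an “empty” copy (structure plus reference data) might be driven from the application, the function below builds the two command lines without running them. The function name empty_copy_commands and the database, table and file names are hypothetical. It assumes the target database already exists and that the list of reference tables comes from the application's metadata.

```python
def empty_copy_commands(source_db, target_db, all_tables, reference_tables, dump_file):
    """Build pg_dump / pg_restore argument lists for an "empty" copy.

    Every table definition is dumped, but row data is kept only for the
    tables listed in `reference_tables`.
    """
    dump = ["pg_dump", "--format=custom", f"--file={dump_file}"]
    for table in sorted(set(all_tables) - set(reference_tables)):
        # Keep the table structure but skip its rows.
        dump.append(f"--exclude-table-data={table}")
    dump.append(source_db)

    # Restore the temporary dump file into the (already created) target.
    restore = ["pg_restore", "--no-owner", f"--dbname={target_db}", dump_file]
    return dump, restore


dump_cmd, restore_cmd = empty_copy_commands(
    "r3_design", "r3_sandbox_es",
    all_tables=["countries", "measurements", "sites"],
    reference_tables=["countries"],
    dump_file="/tmp/r3_design.dump",
)
```

Each list can be passed to subprocess.run(..., check=True). A full copy would use the same commands with no --exclude-table-data flags.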
Bottom line: powerful mechanism for creating copies with minor drawbacks. Metadata is needed.
Score: 4
Performing backups
There are multiple ways to perform system backups in PostgreSQL. This section is dedicated to the IT
infrastructure approach to backups, as opposed to the previous one. That is, we will discuss how to
perform regular backups for disaster recovery purposes.
Dump + restore
Using the same command line tools from the previous section, full backups can be created in
different formats. Each format has its distinct advantage, like cross-version compatibility, backup
size or restoring speed.
In any case, they all perform full, consistent backups of the system at the time they are made, and
operations need not be stopped while the backup runs (except altering structure).
Filesystem based
Most of these require the server to be stopped. This can be avoided or reduced by using filesystem-
level snapshot capabilities, but complexity increases. You always have to back up the entire system
and, as in the previous case, incremental backups are not possible.
Continuous archiving
This consists of making full filesystem copies from time to time and continuously archiving the
transaction logs after that point. The filesystem copies need not be totally consistent. This
approach provides point-in-time recovery capabilities.
Full copies are made by pg_basebackup, and do not affect the running system. Transaction logs are
archived by defining a command line that is run whenever one 16 MB log file is complete.
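The setup described above could be sketched as follows. This is an illustration under stated assumptions, not a tested configuration: the helper names and directory paths are hypothetical, the wal_level value assumes PostgreSQL 9.6 or later, and the archive command is a simple copy in the style of the example in the PostgreSQL documentation.

```python
def archiving_settings(archive_dir):
    """Return postgresql.conf settings that enable transaction log archiving."""
    return {
        "wal_level": "replica",
        "archive_mode": "on",
        # %p is the path of the completed log file, %f its file name.
        # The test guards against overwriting a file already archived.
        "archive_command": f"test ! -f {archive_dir}/%f && cp %p {archive_dir}/%f",
    }


def render_conf(settings):
    """Format settings as postgresql.conf lines."""
    return [f"{key} = '{value}'" for key, value in settings.items()]


def base_backup_command(target_dir):
    """Build the pg_basebackup call that takes the periodic full copy."""
    return [
        "pg_basebackup",
        f"--pgdata={target_dir}",
        "--format=tar",
        "--gzip",
        "--checkpoint=fast",
    ]


conf_lines = render_conf(archiving_settings("/mnt/archive"))
backup_cmd = base_backup_command("/mnt/backups/base_2018_03_01")
```

Changing archive_mode requires a server restart, so these settings would normally be put in place once, with only the base backup scheduled to repeat.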
Bottom line: provides out-of-the-box options for full and incremental backups. Minor manual
scripting might be necessary.
Score: 4
Security
PostgreSQL can authenticate users through a wide choice of methods that include LDAP and
integrated Windows authentication. They should cover any conceivable scenario for Reportnet 3.
In PostgreSQL, both users and groups are actually roles. Some roles can belong to other roles (this is
how “users” belong in “groups”). All authorization is based on roles.
Authorization rules can be set as low as the row level.
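A minimal sketch of how roles and a row-level rule might be combined is shown below. The function name country_access_ddl, the role, table and column names, and the session setting app.country_code are all hypothetical, and row-level security assumes PostgreSQL 9.5 or later.

```python
def country_access_ddl(table, group_role, user_role, country_column="country_code"):
    """Return DDL giving a group read access to the rows of a single country.

    The country is read from a custom session setting that the application
    would set on each connection (a hypothetical convention).
    """
    return [
        # A "group" is just a role that cannot log in.
        f"CREATE ROLE {group_role} NOLOGIN;",
        # A "user" is a role that can log in and is a member of the group.
        f"CREATE ROLE {user_role} LOGIN;",
        f"GRANT {group_role} TO {user_role};",
        f"GRANT SELECT ON {table} TO {group_role};",
        # Without a matching policy, rows are hidden once this is enabled.
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        f"CREATE POLICY {table}_by_country ON {table} FOR SELECT TO {group_role} "
        f"USING ({country_column} = current_setting('app.country_code'));",
    ]


ddl = country_access_ddl("deliveries", "reporters", "alice")
```

Table owners and superusers bypass row-level security by default, so the application would need to connect with an ordinary role for the policy to take effect.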
Bottom line: Multiple authentication mechanisms, including LDAP, Windows and user+pass. Fine-
grained control down to row level.
Score: 4
Spatial data
PostGIS is a widely known extension to PostgreSQL that handles spatial data. It is mature, stable and
used around the world. Its feature set covers every conceivable use case we might come across,
including raster support.
Bottom line: PostGIS has everything we might ever need.
Score: 5
Binary data
Two types of binary fields are supported by PostgreSQL, one of them aimed at really large objects.
Both have minor drawbacks, like a different syntax to interact or high memory requirements. Size
limitations, even in the “small” version, are quite generous (1 GB / 4 TB).
Bottom line: Supported, and relatively easy to interact with.
Score: 4
Scalability
The simple alternative described in the MongoDB paper applies here as well, with the same setup
and challenges for the application.
For horizontal scaling, a number of solutions exist, the most promising being Citus Data. Citus
provides sharding over an arbitrary number of nodes, and is an extension to PostgreSQL. It is open
source, with optional paid support. The sharding key problem is the same as in the MongoDB study,
with the only obvious automatic choice being countryCode.
Bottom line: Horizontal scaling available with extensions. Sharding key difficult to assign
automatically. Simple alternative easy to set up.
Score: 3