Preface
This book is the official documentation of
Postgres-XL. It has been written by the
Postgres-XL developers and other volunteers in
parallel to the development of the Postgres-XL
software. It describes all the functionality that the current version of
Postgres-XL officially supports.
Postgres-XL is essentially a collection of multiple
PostgreSQL databases that together provide both read and write
performance scalability. It also provides the same full-featured transaction
consistency as PostgreSQL, with the exception of SSI
(Serializable Snapshot Isolation), whose support is incomplete.
Postgres-XL inherits almost all major features from PostgreSQL.
This document is also based on the PostgreSQL reference manual.
To make the large amount of information about
PostgreSQL manageable, this book has been
organized in several parts. Each part is targeted at a different
class of users, or at users in different stages of their
PostgreSQL experience:
is an informal introduction for new users.
documents the SQL query
language environment, including data types and functions, as well
as user-level performance tuning. Every
PostgreSQL user should read this.
describes the installation and
administration of the server. Everyone who runs a
PostgreSQL server, be it for private
use or for others, should read this part.
describes the programming
interfaces for PostgreSQL client
programs.
contains information for
advanced users about the extensibility capabilities of the
server. Topics include user-defined data types and
functions.
contains reference information about
SQL commands, client and server programs. This part supports
the other parts with structured information sorted by command or
program.
contains assorted information that might be of
use to PostgreSQL developers.
What is PostgreSQL?
PostgreSQL is an object-relational
database management system (ORDBMS) based on
POSTGRES, Version 4.2,
developed at the University of California at Berkeley Computer Science
Department. POSTGRES pioneered many concepts that only became
available in some commercial database systems much later.
PostgreSQL is an open-source descendant
of this original Berkeley code. It supports a large part of the SQL
standard and offers many modern features:
complex queries
foreign keys
triggers
updatable views
transactional integrity
multiversion concurrency control
Also, PostgreSQL can be extended by the
user in many ways, for example by adding new
data types
functions
operators
aggregate functions
index methods
procedural languages
And because of the liberal license,
PostgreSQL can be used, modified, and
distributed by anyone free of charge for any purpose, be it
private, commercial, or academic.
What is Postgres-XL?
In short
Postgres-XL is an open source project to provide both write-scalability
and massively parallel processing transparently to PostgreSQL. It is a
collection of tightly coupled database components which can be installed
on more than one system or virtual machine.
Write-scalable means Postgres-XL can be configured with as many database
servers as you want and handle many more writes (updating SQL statements)
than a single standalone database server could otherwise do. You can have
more than one database server that provides a single database view. Any
database update from any database server is immediately visible to any
other transactions running on different servers. Transparent means you do
not necessarily need to worry about how your data is stored internally
across multiple database servers.
Of course, you should use the information about how tables are stored
internally when you design the database physically, to get the most from
Postgres-XL.
You can configure Postgres-XL to run on more than one machine. It stores
your data in a distributed way, that is, partitioned or replicated
depending on what is chosen for each table.
To distinguish it from PostgreSQL's native partitioning, we refer to this as
"distribution". In distributed database textbooks, this is often
referred to as a "horizontal fragment", and more recently, sharding.
When you issue queries, Postgres-XL determines where the target data is
stored and dispatches corresponding plans to the servers containing the
target data.
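As a rough illustration of this routing, the Python sketch below models a distributed table as one whose rows are assigned to Datanodes by hashing a distribution-key column, and a replicated table as one copied to every Datanode. The node names, the function names, and the use of MD5 are assumptions for illustration only; Postgres-XL uses its own hash functions and catalog metadata to make these decisions.

```python
import hashlib


def node_for_key(key, nodes):
    """Return the Datanode holding the row whose distribution-key value is `key`.

    MD5 is only a stable, illustrative hash; it is NOT the hash Postgres-XL uses.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:4], "big") % len(nodes)]


def nodes_for_insert(kind, nodes, key=None):
    # A replicated table keeps a full copy on every Datanode;
    # a distributed table stores each row on exactly one Datanode.
    if kind == "replicated":
        return list(nodes)
    return [node_for_key(key, nodes)]


def nodes_for_select(kind, nodes, key=None):
    # Any single copy of a replicated table can answer a read
    # (this sketch simply picks the first node).
    if kind == "replicated":
        return [nodes[0]]
    # An equality filter on the distribution key pins the row to one node;
    # without one, every Datanode may hold matching rows.
    if key is not None:
        return [node_for_key(key, nodes)]
    return list(nodes)
```

In this model, a query with an equality condition on the distribution key reaches a single Datanode, while other queries on a distributed table must be sent to all of them.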
In typical web systems, you can add as many web servers or application
servers as you need to handle your transactions. However, you generally
cannot do this for a database server, because all changed data has to be
visible to all transactions. Unlike other database cluster solutions,
Postgres-XL provides this capability. You can install as many database
servers as you like. Each database server provides a uniform view of the
data to your applications. Any database update from any server is immediately
visible to applications connecting to the database through other servers.
This is one of the most important features of Postgres-XL.
The other significant feature of Postgres-XL is MPP parallelism. You can
use Postgres-XL to handle workloads for Business Intelligence, Data
Warehousing, or Big Data. In Postgres-XL, a plan is generated once on a
Coordinator and sent down to the individual Datanodes. The plan is then
executed with the Datanodes communicating directly with one another: each
knows from which nodes it should expect to receive the tuples it needs, and
to which nodes it must ship its own tuples.
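The direct Datanode-to-Datanode exchange can be pictured with a small Python sketch: before a join or aggregation on some column, each Datanode splits its local rows into one batch per destination node, chosen by hashing that column, so that rows with equal key values end up on the same node no matter where they started. The function name and the CRC32 hash are illustrative assumptions, not Postgres-XL internals.

```python
import zlib
from collections import defaultdict


def redistribute(local_rows, key_index, nodes):
    """Split one Datanode's local rows into per-destination batches.

    Each row is sent to the node chosen by hashing the column at key_index,
    so all rows sharing a key value meet on the same node.
    CRC32 is an illustrative stand-in for Postgres-XL's real hash function.
    """
    outgoing = defaultdict(list)
    for row in local_rows:
        h = zlib.crc32(str(row[key_index]).encode("utf-8"))
        outgoing[nodes[h % len(nodes)]].append(row)
    return dict(outgoing)
```

Because every Datanode applies the same deterministic rule, each one can tell in advance both where to send its rows and that matching rows from other nodes will arrive at the same destination.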
Postgres-XL's Goal
The ultimate goal of Postgres-XL is to provide database scalability with
ACID consistency across all types of database workloads. That is,
Postgres-XL should provide the following features:
Postgres-XL should provide multiple servers to accept transactions and
statements from applications, which are known as "Coordinator"
processes.
Any Coordinator should provide a consistent database view to
applications. Any updates from any Coordinator must be visible in real
time, as if such updates were made in a single PostgreSQL server.
Postgres-XL should allow Datanodes to communicate directly with one
another to execute queries in an efficient, parallel manner.
It should be possible to designate each table stored in the database as
either replicated or distributed (known as fragments or partitions).
Replication and distribution should be transparent to applications; that
is, such replicated and distributed tables are seen as single tables and
the location or number of copies of each record/tuple is managed by
Postgres-XL and is not visible to applications.
Postgres-XL should provide a PostgreSQL-compatible API to applications.
Postgres-XL should provide a single, unified view of
underlying PostgreSQL database servers so that SQL statements
do not depend on how the tables are actually stored.
Postgres-XL Key Components
In this section, we will describe the main components of Postgres-XL.
Postgres-XL is composed of three major components: the GTM (Global
Transaction Manager), the Coordinator and the Datanode. Their features are
given in the following sections.
GTM (Global Transaction Manager)
The GTM is a key component of Postgres-XL to provide consistent
transaction management and tuple visibility control.
As described later in this manual, PostgreSQL's
transaction management is based upon MVCC (Multi-Version Concurrency
Control) technology. Postgres-XL extracts this
technology into a separate component, the GTM, so that the transaction
management of every Postgres-XL component
is based on a single global status. Details will be described in .
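To make "single global status" more concrete, here is a toy, single-process Python model of what a GTM hands out: increasing Global Transaction Ids (GXIDs) and snapshots recording which transactions were still in progress. The class and field names are hypothetical; they loosely mirror PostgreSQL's xmin/xmax/xip snapshot fields, and the real GTM is a separate network service with considerably more state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    xmin: int        # every GXID below this had finished when the snapshot was taken
    xmax: int        # first GXID not yet assigned at snapshot time
    xip: frozenset   # GXIDs still in progress at snapshot time


class ToyGTM:
    """Hypothetical in-memory stand-in for the GTM: one counter, one active set."""

    def __init__(self, first_gxid=3):
        # PostgreSQL reserves transaction ids 0-2, so normal ids start at 3.
        self.next_gxid = first_gxid
        self.active = set()

    def begin(self):
        """Assign the next GXID and mark it as in progress."""
        gxid = self.next_gxid
        self.next_gxid += 1
        self.active.add(gxid)
        return gxid

    def finish(self, gxid):
        """Called on commit or abort: the GXID is no longer in progress."""
        self.active.discard(gxid)

    def snapshot(self):
        """Capture the cluster-wide set of running transactions."""
        xmin = min(self.active, default=self.next_gxid)
        return Snapshot(xmin=xmin, xmax=self.next_gxid, xip=frozenset(self.active))
```

Because there is only one counter and one active set, every Coordinator that asks this model for a snapshot at the same moment gets the same answer.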
Coordinator
The Coordinator is an interface to the database for applications. It acts
like a conventional PostgreSQL backend process; however, the Coordinator
does not store any actual data. The actual data is stored by the Datanodes,
as described below. The Coordinator receives SQL statements, obtains a Global
Transaction Id (GXID) and Global Snapshots as needed, determines which
Datanodes are involved, and asks them to execute (part of) each statement.
When a statement is issued to Datanodes, it is associated with the GXID and
Global Snapshot so that Multi-version Concurrency Control (MVCC) properties
extend cluster-wide.
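A minimal Python sketch of the last step described above, assuming the Coordinator has already planned one subplan per involved Datanode and obtained a GXID and snapshot: every outgoing request carries the same GXID and snapshot. The `DatanodeRequest` type and the `dispatch` function are hypothetical illustrations, not Postgres-XL APIs, and subplans are shown as plain strings for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatanodeRequest:
    node: str         # Datanode that will execute this part of the statement
    subplan: str      # fragment of the plan it should run (text here for simplicity)
    gxid: int         # Global Transaction Id obtained for this transaction
    snapshot: tuple   # global snapshot, e.g. (xmin, xmax, in-progress GXIDs)


def dispatch(subplans, gxid, snapshot):
    """Attach the same GXID and global snapshot to every Datanode's subplan.

    subplans maps a Datanode name to the subplan it should run. Sharing one
    GXID and snapshot is what lets all Datanodes make identical visibility
    decisions for this statement.
    """
    return [DatanodeRequest(node, plan, gxid, snapshot)
            for node, plan in sorted(subplans.items())]
```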
Datanode
The Datanode actually stores user data. Tables may be distributed among
Datanodes, or replicated to all the Datanodes. The Datanode does not have
a global view of the whole database; it only takes care of locally stored
data. Incoming statements are examined by the Coordinator, as described
above, and subplans are made. These are then transferred to each Datanode
involved, together with a GXID and Global Snapshot as needed. A Datanode
may receive requests from various Coordinators in separate sessions.
However, because each transaction is identified uniquely and associated
with a consistent (global) snapshot, each Datanode can properly execute
each request in its own transaction and snapshot context.
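The following simplified Python sketch shows why a shared GXID and snapshot let independent Datanodes agree on what a transaction can see: each node applies the same rule to the same inputs. It is modeled loosely on PostgreSQL's MVCC visibility check; the `committed` set standing in for the commit log, and all names here, are assumptions, and details such as subtransactions and tuple deletion are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    xmin: int        # every GXID below this had finished when the snapshot was taken
    xmax: int        # first GXID not yet assigned at snapshot time
    xip: frozenset   # GXIDs still in progress at snapshot time


def xid_visible(xid, snapshot, committed, own_xid=None):
    """Would changes made by transaction `xid` be visible under `snapshot`?

    committed: set of GXIDs known to have committed (a stand-in for the
    commit log). This is a deliberately simplified version of PostgreSQL's rule.
    """
    if xid == own_xid:
        return True          # a transaction sees its own changes
    if xid >= snapshot.xmax:
        return False         # started after the snapshot was taken
    if xid in snapshot.xip:
        return False         # still running when the snapshot was taken
    return xid in committed  # finished before the snapshot: visible only if committed
```

Given the same snapshot and commit status, two Datanodes holding different rows written by the same transaction will reach the same verdict for both rows.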
Postgres-XL Inherits From PostgreSQL
Postgres-XL is an extension to
PostgreSQL and inherits most of its features.
It is an open-source descendant of PostgreSQL
and its original Berkeley code. It supports a large part of the SQL
standard and offers many modern features:
complex queries
foreign keys
Postgres-XL's foreign key usage has some
restrictions. For details, see .
triggers
Postgres-XL does not support triggers in
the current version. This may be supported in future releases.
views
transactional integrity, with the exception of SSI, whose support
is incomplete
multiversion concurrency control
Also, similar to PostgreSQL,
Postgres-XL can be extended by the user in many
ways, for example by adding new
data types
functions
operators
aggregate functions
index methods
procedural languages
Postgres-XL can be used, modified, and
distributed by anyone free of charge for any purpose, be it private,
commercial, or academic, provided it adheres to the PostgreSQL License.
&history;
&notation;
&info;
&problems;