Preface

Preface This book is the official documentation of Postgres-XL. It has been written by the Postgres-XL developers and other volunteers in parallel to the development of the Postgres-XL software. It describes all the functionality that the current version of Postgres-XL officially supports. Postgres-XL is essentially a collection of multiple PostgreSQL databases to provide both read and write performance scalability. It also provides the same full-featured transaction consistency as PostgreSQL provides, at the exception of SSI which is incomplete. Postgres-XL inherits almost all major features from PostgreSQL. This document is also based upon PostgreSQL reference manual. To make the large amount of information about PostgreSQL manageable, this book has been organized in several parts. Each part is targeted at a different class of users, or at users in different stages of their PostgreSQL experience: is an informal introduction for new users. documents the SQL query language environment, including data types and functions, as well as user-level performance tuning. Every PostgreSQL user should read this. describes the installation and administration of the server. Everyone who runs a PostgreSQL server, be it for private use or for others, should read this part. describes the programming interfaces for PostgreSQL client programs. contains information for advanced users about the extensibility capabilities of the server. Topics include user-defined data types and functions. contains reference information about SQL commands, client and server programs. This part supports the other parts with structured information sorted by command or program. contains assorted information that might be of use to PostgreSQL developers. What is <productname>PostgreSQL</productname>? PostgreSQL is an object-relational database management system (ORDBMS) based on POSTGRES, Version 4.2, developed at the University of California at Berkeley Computer Science Department. POSTGRES pioneered many concepts that only became available in some commercial database systems much later. PostgreSQL is an open-source descendant of this original Berkeley code. It supports a large part of the SQL standard and offers many modern features: complex queries foreign keys triggers updatable views transactional integrity multiversion concurrency control Also, PostgreSQL can be extended by the user in many ways, for example by adding new data types functions operators aggregate functions index methods procedural languages And because of the liberal license, PostgreSQL can be used, modified, and distributed by anyone free of charge for any purpose, be it private, commercial, or academic. What is <productname>Postgres-XL</productname>? In short Postgres-XL is an open source project to provide both write-scalability and massively parallel processing transparently to PostgreSQL. It is a collection of tightly coupled database components which can be installed on more than one system or virtual machine. Write-scalable means Postgres-XL can be configured with as many database servers as you want and handle many more writes (updating SQL statements) than a single standalone database server could otherwise do. You can have more than one database server that provides a single database view. Any database update from any database server is immediately visible to any other transactions running on different servers. Transparent means you do not necessarily need to worry about how your data is stored in more than one database servers internally. Of course, you should use the information about how tables are stored internally when you design the database physically to get most from Postgres-XL. You can configure Postgres-XL to run on more than one machine. It stores your data in a distributed way, that is, partitioned or replicated depending on what is chosen for each table. To distinguish from PostgreSQL's native partitioning, we refer to this as "distribution". In distributed database textbooks, this is often referred to as a "horizontal fragment", and more recently, sharding. When you issue queries, Postgres-XL determines where the target data is stored and dispatches corresponding plans to the servers containing the target data. In typical web systems, you can have as many web servers or application servers to handle your transactions. However, you cannot do this for a database server in general because all the changing data have to be visible to all the transactions. Unlike other database cluster solutions, Postgres-XL provides this capability. You can install as many database servers as you like. Each database server provides uniform data view to your applications. Any database update from any server is immediately visible to applications connecting the database from other servers. This is one of the most important features of Postgres-XL. The other significant feature of Postgres-XL is MPP parallelism. You can use Postgres-XL to handle workloads for Business Intelligence, Data Warehousing, or Big Data. In Postgres-XL, a plan is generated once on a coordinator, and sent down to the individual data nodes. This is then executed, with the data nodes communicating directly with one another, where each understands from where it is expected to receive any tuples that it needs to ship, and where it needs to send to others. Postgres-XL's Goal The ultimate goal of Postgres-XL is to provide database scalability with ACID consistency across all types of database workloads. That is, Postgres-XL should provide the following features: Postgres-XL should provide multiple servers to accept transactions and statements from applications, which are known as "Coordinator" processes. Any Coordinator should provide a consistent database view to applications. Any updates from any Coordinator must be visible in real time as if such updates are done in single PostgreSQL server. Postgres-XL should allow Datanodes to communicate directly with one another execute queries in an efficient and parallel manner. Tables should be able to be stored in the database designated as replicated or distributed (known as fragments or partitions). Replication and distribution should be transparent to applications; that is, such replicated and distributed tables are seen as single tables and the location or number of copies of each record/tuple is managed by Postgres-XL and is not visible to applications. Postgres-XL provides compatible PostgreSQL API to applications. Postgres-XL should provide single and unified view of underlying PostgreSQL database servers so that SQL statements do not depend on how the tables are actually stored. Postgres-XL Key Components In this section, we will describe the main components of Postgres-XL. Postgres-XL is composed of three major components: the GTM (Global Transaction Manager), the Coordinator and the Datanode. Their features are given in the following sections. GTM (Global Transaction Manager) The GTM is a key component of Postgres-XL to provide consistent transaction management and tuple visibility control. As described later in this manual, PostgreSQL's transaction management is based upon MVCC (Multi-Version Concurrency Control) technology. Postgres-XL extracts this technology into separate component such as the GTM so that any Postgres-XL component's transaction management is based upon single global status. Details will be described in . Coordinator The Coordinator is an interface to the database for applications. It acts like a conventional PostgreSQL backend process, however the Coordinator does not store any actual data. The actual data is stored by the Datanodes as described below. The Coordinator receives SQL statements, gets Global Transaction Id and Global Snapshots as needed, determines which Datanodes are involved and asks them to execute (a part of) statement. When issuing statement to Datanodes, it is associated with GXID and Global Snapshot so that Multi-version Concurrency Control (MVCC) properties extend cluster-wide. Datanode The Datanode actually stores user data. Tables may be distributed among Datanodes, or replicated to all the Datanodes. The Datanode does not have a global view of the whole database, it just takes care of locally stored data. Incoming statements are examined by the Coordinator as described next, and subplans are made. These are then transferred to each Datanode involved together with a GXID and Global Snapshot as needed. The datanode may receive request from various Coordinators in separate sessions. However, because each transaction is identified uniquely and associated with a consistent (global) snapshot, each Datanode can properly execute in its transaction and snapshot context. <productname>Postgres-XL</productname> Inherits From <productname>PostgreSQL</productname> Postgres-XL is an extension to PostgreSQL and inherits most of its features. It is an open-source descendant of PostgreSQL and its original Berkeley code. It supports a large part of the SQL standard and offers many modern features: complex queries foreign keys Postgres-XL's foreign key usage has some restrictions. For details, see . triggers Postgres-XL does not support triggers in the current version. This may be supported in future releases. views transactional integrity, at the exception of SSI whose support is incomplete multiversion concurrency control Also, similar to PostgreSQL, Postgres-XL can be extended by the user in many ways, for example by adding new data types functions operators aggregate functions index methods procedural languages Postgres-XL can be used, modified, and distributed by anyone free of charge for any purpose, be it private, commercial, or academic, provided it adheres to the PostgreSQL License. &history; ¬ation; &info; &problems;