<!-- doc/src/sgml/intro.sgml -->
<preface id="preface">
<title>Preface</title>
<para>
This book is the official documentation of
<productname>Postgres-XL</productname>. It has been written by the
<productname>Postgres-XL</productname> developers and other volunteers in
parallel to the development of the <productname>Postgres-XL</productname>
software. It describes all the functionality that the current version of
<productname>Postgres-XL</productname> officially supports.
</para>
<para>
<productname>Postgres-XL</> is essentially a collection of multiple
<productname>PostgreSQL</> databases that together provide both read and
write performance scalability. It also provides the same full-featured
transaction consistency as <productname>PostgreSQL</>, with the exception of
SSI (serializable snapshot isolation), whose support is incomplete.
</para>
<para>
<productname>Postgres-XL</> inherits almost all major features from <productname>PostgreSQL</>.
This document is also based upon the <productname>PostgreSQL</> reference manual.
</para>
<para>
To make the large amount of information about
<productname>PostgreSQL</productname> manageable, this book has been
organized in several parts. Each part is targeted at a different
class of users, or at users in different stages of their
<productname>PostgreSQL</productname> experience:
<itemizedlist>
<listitem>
<para>
<xref linkend="tutorial"> is an informal introduction for new users.
</para>
</listitem>
<listitem>
<para>
<xref linkend="sql"> documents the <acronym>SQL</acronym> query
language environment, including data types and functions, as well
as user-level performance tuning. Every
<productname>PostgreSQL</> user should read this.
</para>
</listitem>
<listitem>
<para>
<xref linkend="admin"> describes the installation and
administration of the server. Everyone who runs a
<productname>PostgreSQL</productname> server, be it for private
use or for others, should read this part.
</para>
</listitem>
<listitem>
<para>
<xref linkend="client-interfaces"> describes the programming
interfaces for <productname>PostgreSQL</productname> client
programs.
</para>
</listitem>
<listitem>
<para>
<xref linkend="server-programming"> contains information for
advanced users about the extensibility capabilities of the
server. Topics include user-defined data types and
functions.
</para>
</listitem>
<listitem>
<para>
<xref linkend="reference"> contains reference information about
SQL commands, client and server programs. This part supports
the other parts with structured information sorted by command or
program.
</para>
</listitem>
<listitem>
<para>
<xref linkend="internals"> contains assorted information that might be of
use to <productname>PostgreSQL</> developers.
</para>
</listitem>
</itemizedlist>
</para>
<sect1 id="intro-whatis">
<title> What is <productname>PostgreSQL</productname>?</title>
<para>
<productname>PostgreSQL</productname> is an object-relational
database management system (<acronym>ORDBMS</acronym>) based on
<ulink url="http://db.cs.berkeley.edu/postgres.html">
<productname>POSTGRES, Version 4.2</productname></ulink>,
developed at the University of California at Berkeley Computer Science
Department. POSTGRES pioneered many concepts that only became
available in some commercial database systems much later.
</para>
<para>
<productname>PostgreSQL</productname> is an open-source descendant
of this original Berkeley code. It supports a large part of the SQL
standard and offers many modern features:
<itemizedlist spacing="compact">
<listitem>
<simpara>complex queries</simpara>
</listitem>
<listitem>
<simpara>foreign keys</simpara>
</listitem>
<listitem>
<simpara>triggers</simpara>
</listitem>
<listitem>
<simpara>updatable views</simpara>
</listitem>
<listitem>
<simpara>transactional integrity</simpara>
</listitem>
<listitem>
<simpara>multiversion concurrency control</simpara>
</listitem>
</itemizedlist>
Also, <productname>PostgreSQL</productname> can be extended by the
user in many ways, for example by adding new
<itemizedlist spacing="compact">
<listitem>
<simpara>data types</simpara>
</listitem>
<listitem>
<simpara>functions</simpara>
</listitem>
<listitem>
<simpara>operators</simpara>
</listitem>
<listitem>
<simpara>aggregate functions</simpara>
</listitem>
<listitem>
<simpara>index methods</simpara>
</listitem>
<listitem>
<simpara>procedural languages</simpara>
</listitem>
</itemizedlist>
</para>
<para>
And because of the liberal license,
<productname>PostgreSQL</productname> can be used, modified, and
distributed by anyone free of charge for any purpose, be it
private, commercial, or academic.
</para>
</sect1>
<sect1 id="intro-whatis-postgres-xl">
<title> What is <productname>Postgres-XL</productname>?</title>
<sect2 id="whatis-postgres-xl-in-short">
<title>In short</title>
<para>
Postgres-XL is an open source project to provide both write-scalability
and massively parallel processing transparently to PostgreSQL. It is a
collection of tightly coupled database components which can be installed
on more than one system or virtual machine.
</para>
<para>
Write-scalable means Postgres-XL can be configured with as many database
servers as you want and can handle many more writes (updating SQL
statements) than a single standalone database server could. You can have
more than one database server providing a single database view. Any
database update from any database server is immediately visible to any
other transaction running on a different server. Transparent means you do
not necessarily need to worry about how your data is stored internally
across more than one database server.
<footnote>
<para>
Of course, you should use the information about how tables are stored
internally when you design the database physically, to get the most from
Postgres-XL.
</para>
</footnote>
</para>
<para>
You can configure Postgres-XL to run on more than one machine. It stores
your data in a distributed way, that is, partitioned or replicated
depending on what is chosen for each table.
<footnote>
<para>
To distinguish this from PostgreSQL's native partitioning, we refer to it
as "distribution". In distributed database textbooks, this is often
referred to as a "horizontal fragment" and, more recently, as sharding.
</para>
</footnote>
When you issue queries, Postgres-XL determines where the target data is
stored and dispatches corresponding plans to the servers containing the
target data.
</para>
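<para>
As a minimal sketch of how this per-table choice can be expressed, the
following statements use the <literal>DISTRIBUTE BY</literal> clause of
<command>CREATE TABLE</command>. The table and column names are hypothetical
and chosen only for illustration; see <xref linkend="SQL-CREATETABLE"> for
the authoritative syntax and restrictions.
</para>
<programlisting>
-- Hypothetical tables, for illustration only.

-- Rows are spread across the Datanodes by hashing order_id.
CREATE TABLE orders (
    order_id    bigint PRIMARY KEY,
    customer_id integer NOT NULL,
    currency    char(3) NOT NULL,
    amount      numeric(12,2)
) DISTRIBUTE BY HASH (order_id);

-- Every Datanode keeps a full copy of this small lookup table.
CREATE TABLE currencies (
    code char(3) PRIMARY KEY,
    name text NOT NULL
) DISTRIBUTE BY REPLICATION;
</programlisting>
<para>
With a layout like this, a join between the two tables can typically be
evaluated locally on each Datanode, because every Datanode already holds all
rows of the replicated table.
</para>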
<para>
In typical web systems, you can have as many web servers or application
servers as you need to handle your transactions. However, you cannot
generally do this for a database server, because all changes to the data
have to be visible to all transactions. Unlike other database cluster
solutions, Postgres-XL provides this capability. You can install as many
database servers as you like. Each database server provides a uniform data
view to your applications. Any database update from any server is
immediately visible to applications connecting to the database through
other servers. This is one of the most important features of Postgres-XL.
</para>
<para>
The other significant feature of Postgres-XL is MPP (massively parallel
processing). You can use Postgres-XL to handle workloads for Business
Intelligence, Data Warehousing, or Big Data. In Postgres-XL, a plan is
generated once on a Coordinator and sent down to the individual Datanodes.
The Datanodes then execute it, communicating directly with one another;
each Datanode knows from which nodes it should expect to receive tuples and
to which nodes it needs to ship tuples.
</para>
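<para>
One way to observe this division of work is to ask a Coordinator for the
plan of a query with <command>EXPLAIN</command>. The sketch below assumes a
hypothetical table <literal>orders</literal> that is distributed across
several Datanodes. The output is omitted here because it depends on the
cluster layout and the server version.
</para>
<programlisting>
-- Run on a Coordinator. "orders" is a hypothetical distributed table.
EXPLAIN VERBOSE
SELECT customer_id, sum(amount) AS total
FROM orders
GROUP BY customer_id;
</programlisting>
<para>
In the resulting plan, the parts that run on Datanodes are expected to
appear as remote steps beneath the part that the Coordinator executes
itself.
</para>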
</sect2>
<sect2 id="whatis-xc-goal">
<title>Postgres-XL's Goal</title>
<para>
The ultimate goal of Postgres-XL is to provide database scalability with
ACID consistency across all types of database workloads. That is,
Postgres-XL should provide the following features:
<itemizedlist spacing="compact">
<listitem>
<para>
Postgres-XL should provide multiple servers to accept transactions and
statements from applications, which are known as "Coordinator"
processes.
</para>
</listitem>
<listitem>
<para>
Any Coordinator should provide a consistent database view to
applications. Any update from any Coordinator must be visible in real
time, as if it had been made on a single PostgreSQL server.
</para>
</listitem>
<listitem>
<para>
Postgres-XL should allow Datanodes to communicate directly with one
another to execute queries in an efficient and parallel manner.
</para>
</listitem>
<listitem>
<para>
It should be possible to designate each table stored in the database as
replicated or distributed (distributed tables are also known as fragments
or partitions). Replication and distribution should be transparent to
applications; that is, replicated and distributed tables are seen as single
tables, and the location or number of copies of each record/tuple is
managed by Postgres-XL and is not visible to applications.
</para>
</listitem>
<listitem>
<para>
Postgres-XL should provide a PostgreSQL-compatible API to applications.
</para>
</listitem>
<listitem>
<para>
Postgres-XL should provide a single, unified view of the
underlying PostgreSQL database servers so that SQL statements
do not depend on how the tables are actually stored.
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
<sect2 id="whatis-xc-key-components">
<title>Postgres-XL Key Components</title>
<para>
In this section, we will describe the main components of Postgres-XL.
</para>
<para>
Postgres-XL is composed of three major components: the GTM (Global
Transaction Manager), the Coordinator and the Datanode. Their features are
given in the following sections.
</para>
<sect3>
<title>GTM (Global Transaction Manager)</title>
<para>
The GTM is a key component of Postgres-XL to provide consistent
transaction management and tuple visibility control.
</para>
<para>
As described later in this manual, <productname>PostgreSQL</productname>'s
transaction management is based upon MVCC (Multi-Version Concurrency
Control) technology. <productname>Postgres-XL</productname> extracts this
technology into a separate component, the GTM, so that every
<productname>Postgres-XL</productname> component's transaction management
is based upon a single global status. Details are described in <xref
linkend="overview">.
</para>
</sect3>
<sect3>
<title>Coordinator</title>
<para>
The Coordinator is an interface to the database for applications. It acts
like a conventional PostgreSQL backend process; however, the Coordinator
does not store any actual data. The actual data is stored by the Datanodes
as described below. The Coordinator receives SQL statements, obtains a
Global Transaction ID (GXID) and Global Snapshots as needed, determines
which Datanodes are involved, and asks them to execute the statement or a
part of it. Each statement issued to the Datanodes is associated with a
GXID and a Global Snapshot so that Multi-Version Concurrency Control (MVCC)
properties extend cluster-wide.
</para>
</sect3>
<sect3>
<title>Datanode</title>
<para>
The Datanode actually stores user data. Tables may be distributed among
Datanodes, or replicated to all the Datanodes. The Datanode does not have
a global view of the whole database; it only takes care of locally stored
data. Incoming statements are examined by the Coordinator as described
above, and subplans are made. These are then transferred to each Datanode
involved, together with a GXID and Global Snapshot as needed. The Datanode
may receive requests from various Coordinators in separate sessions.
However, because each transaction is identified uniquely and associated
with a consistent (global) snapshot, each Datanode can properly execute in
its transaction and snapshot context.
</para>
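<para>
To see which Coordinators and Datanodes a given Coordinator knows about,
the cluster node catalog can be queried. The following is only a sketch: it
assumes the <structname>pgxc_node</structname> catalog and the column names
shown, which should be checked against the catalog reference for the
version in use.
</para>
<programlisting>
-- Run on a Coordinator. node_type is expected to distinguish
-- Coordinators from Datanodes.
SELECT node_name, node_type, node_host, node_port
FROM pgxc_node
ORDER BY node_type, node_name;
</programlisting>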
</sect3>
</sect2>
<sect2 id="whatis-xc-inherits-postgresql">
<title><productname>Postgres-XL</productname> Inherits From <productname>PostgreSQL</productname></title>
<para>
<productname>Postgres-XL</productname> is an extension to
<productname>PostgreSQL</productname> and inherits most of its features.
</para>
<para>
It is an open-source descendant of <productname>PostgreSQL</productname>
and its original Berkeley code. It supports a large part of the SQL
standard and offers many modern features:
<itemizedlist spacing="compact">
<listitem>
<simpara>complex queries</simpara>
</listitem>
<listitem>
<simpara>
foreign keys
<footnote>
<para>
<productname>Postgres-XL</productname>'s foreign key usage has some
restrictions. For details, see <xref linkend="SQL-CREATETABLE">.
</para>
</footnote>
</simpara>
</listitem>
<listitem>
<simpara>
triggers
<footnote>
<para>
<productname>Postgres-XL</productname> does not support triggers in
the current version. This may be supported in future releases.
</para>
</footnote>
</simpara>
</listitem>
<listitem>
<simpara>views</simpara>
</listitem>
<listitem>
<simpara>transactional integrity, with the exception of SSI, whose support
is incomplete</simpara>
</listitem>
<listitem>
<simpara>multiversion concurrency control</simpara>
</listitem>
</itemizedlist>
Also, similar to <productname>PostgreSQL</productname>,
<productname>Postgres-XL</productname> can be extended by the user in many
ways, for example by adding new
<itemizedlist spacing="compact">
<listitem>
<simpara>data types</simpara>
</listitem>
<listitem>
<simpara>functions</simpara>
</listitem>
<listitem>
<simpara>operators</simpara>
</listitem>
<listitem>
<simpara>aggregate functions</simpara>
</listitem>
<listitem>
<simpara>index methods</simpara>
</listitem>
<listitem>
<simpara>procedural languages</simpara>
</listitem>
</itemizedlist>
</para>
<para>
<productname>Postgres-XL</productname> can be used, modified, and
distributed by anyone free of charge for any purpose, be it private,
commercial, or academic, provided that such use adheres to the PostgreSQL
License.
</para>
</sect2>
</sect1>
&history;
&notation;
&info;
&problems;
</preface>