<!-- doc/src/sgml/intro.sgml -->

<preface id="preface">
 <title>Preface</title>

  <para>
   This book is the official documentation of
   <productname>Postgres-XL</productname>.  It has been written by the
   <productname>Postgres-XL</productname> developers and other volunteers in
   parallel to the development of the <productname>Postgres-XL</productname>
   software.  It describes all the functionality that the current version of
   <productname>Postgres-XL</productname> officially supports.
  </para>
  <para>
   <productname>Postgres-XL</> is essentially a collection of multiple
   <productname>PostgreSQL</> databases that work together to provide both
   read and write performance scalability. It also provides the same
   full-featured transaction consistency as <productname>PostgreSQL</>, with
   the exception of SSI, whose support is incomplete.
  </para>
  <para>
   <productname>Postgres-XL</> inherits almost all major features from <productname>PostgreSQL</>.
   This document is also based upon the <productname>PostgreSQL</> reference manual.
 </para>

 <para>
  To make the large amount of information about
  <productname>PostgreSQL</productname> manageable, this book has been
  organized in several parts.  Each part is targeted at a different
  class of users, or at users in different stages of their
  <productname>PostgreSQL</productname> experience:

  <itemizedlist>
   <listitem>
    <para>
     <xref linkend="tutorial"> is an informal introduction for new users.
    </para>
   </listitem>

   <listitem>
    <para>
     <xref linkend="sql"> documents the <acronym>SQL</acronym> query
     language environment, including data types and functions, as well
     as user-level performance tuning.  Every
     <productname>PostgreSQL</> user should read this.
    </para>
   </listitem>

   <listitem>
    <para>
     <xref linkend="admin"> describes the installation and
     administration of the server.  Everyone who runs a
     <productname>PostgreSQL</productname> server, be it for private
     use or for others, should read this part.
    </para>
   </listitem>

   <listitem>
    <para>
     <xref linkend="client-interfaces"> describes the programming
     interfaces for <productname>PostgreSQL</productname> client
     programs.
    </para>
   </listitem>


   <listitem>
    <para>
     <xref linkend="server-programming"> contains information for
     advanced users about the extensibility capabilities of the
     server.  Topics include user-defined data types and
     functions.
    </para>
   </listitem>

   <listitem>
    <para>
     <xref linkend="reference"> contains reference information about
     SQL commands, client and server programs.  This part supports
     the other parts with structured information sorted by command or
     program.
    </para>
   </listitem>

   <listitem>
    <para>
     <xref linkend="internals"> contains assorted information that might be of
     use to <productname>PostgreSQL</> developers.
    </para>
   </listitem>
  </itemizedlist>
 </para>

 <sect1 id="intro-whatis">
  <title> What is <productname>PostgreSQL</productname>?</title>

  <para>
   <productname>PostgreSQL</productname> is an object-relational
   database management system (<acronym>ORDBMS</acronym>) based on
   <ulink url="http://db.cs.berkeley.edu/postgres.html">
   <productname>POSTGRES, Version 4.2</productname></ulink>,
   developed at the University of California at Berkeley Computer Science
   Department.  POSTGRES pioneered many concepts that only became
   available in some commercial database systems much later.
  </para>

  <para>
   <productname>PostgreSQL</productname> is an open-source descendant
   of this original Berkeley code.  It supports a large part of the SQL
   standard and offers many modern features:

   <itemizedlist spacing="compact">
    <listitem>
     <simpara>complex queries</simpara>
    </listitem>
    <listitem>
     <simpara>foreign keys</simpara>
    </listitem>
    <listitem>
     <simpara>triggers</simpara>
    </listitem>
    <listitem>
     <simpara>updatable views</simpara>
    </listitem>
    <listitem>
     <simpara>transactional integrity</simpara>
    </listitem>
    <listitem>
     <simpara>multiversion concurrency control</simpara>
    </listitem>
   </itemizedlist>

   Also, <productname>PostgreSQL</productname> can be extended by the
   user in many ways, for example by adding new

   <itemizedlist spacing="compact">
    <listitem>
     <simpara>data types</simpara>
    </listitem>
    <listitem>
     <simpara>functions</simpara>
    </listitem>
    <listitem>
     <simpara>operators</simpara>
    </listitem>
    <listitem>
     <simpara>aggregate functions</simpara>
    </listitem>
    <listitem>
     <simpara>index methods</simpara>
    </listitem>
    <listitem>
     <simpara>procedural languages</simpara>
    </listitem>
   </itemizedlist>
  </para>

  <para>
   And because of the liberal license,
   <productname>PostgreSQL</productname> can be used, modified, and
   distributed by anyone free of charge for any purpose, be it
   private, commercial, or academic.
  </para>

 </sect1>
 <sect1 id="intro-whatis-postgres-xl">
  <title> What is <productname>Postgres-XL</productname>?</title>

  <sect2 id="whatis-postgres-xl-in-short">
   <title>In short</title>

    <para>
     Postgres-XL is an open source project to provide both write-scalability
     and massively parallel processing transparently to PostgreSQL.  It is a
     collection of tightly coupled database components which can be installed
     on more than one system or virtual machine.
    </para>

    <para>
     Write-scalable means Postgres-XL can be configured with as many database
     servers as you want and can handle many more writes (SQL statements that
     update data) than a single standalone database server could. You can have
     more than one database server, all providing a single database view.  Any
     database update from any database server is immediately visible to any
     other transaction running on a different server.  Transparent means you
     do not necessarily need to worry about how your data is stored internally
     across more than one database server.
     <footnote>
      <para>
       Of course, you should use the information about how tables are stored
       internally when you design the physical database layout to get the
       most from Postgres-XL.
      </para>
     </footnote>
    </para>

   <para>
    You can configure Postgres-XL to run on more than one machine. It stores
    your data in a distributed way, that is, partitioned or replicated
    depending on what is chosen for each table.
    <footnote>
     <para>
      To distinguish this from PostgreSQL's native partitioning, we refer to
      it as "distribution".  In distributed database textbooks, this is often
      referred to as "horizontal fragmentation" and, more recently, as
      "sharding".
     </para>
    </footnote>
    When you issue queries, Postgres-XL determines where the target data is
    stored and dispatches corresponding plans to the servers containing the
    target data.
   </para>
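   <para>
    As a minimal sketch of the two storage choices described above, the
    following statements create one distributed table and one replicated
    table.  The table and column names are hypothetical and chosen only for
    illustration; the <literal>DISTRIBUTE BY</literal> clause is assumed to
    behave as documented in <xref linkend="SQL-CREATETABLE">.
<programlisting>
-- Hypothetical table: rows are spread across the Datanodes by hashing
-- customer_id, so each row is stored on exactly one Datanode.
CREATE TABLE orders (
    order_id    bigint,
    customer_id integer,
    amount      numeric
) DISTRIBUTE BY HASH (customer_id);

-- Hypothetical table: every Datanode keeps a full copy of the table.
CREATE TABLE currencies (
    code text,
    name text
) DISTRIBUTE BY REPLICATION;
</programlisting>
    Applications query both tables in the same way; only the physical
    placement of the rows differs.
   </para>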

   <para>
    In typical web systems, you can have as many web servers or application
    servers as you need to handle your transactions. However, you cannot
    generally do this for a database server, because all changes to the data
    have to be visible to all transactions. Unlike other database cluster
    solutions, Postgres-XL provides this capability. You can install as many
    database servers as you like. Each database server provides a uniform
    view of the data to your applications.  Any database update from any
    server is immediately visible to applications connecting to the database
    through other servers. This is one of the most important features of
    Postgres-XL.
   </para>

   <para>
    The other significant feature of Postgres-XL is MPP (massively parallel
    processing).  You can use Postgres-XL to handle workloads for Business
    Intelligence, Data Warehousing, or Big Data. In Postgres-XL, a plan is
    generated once on a Coordinator and sent down to the individual
    Datanodes. The plan is then executed with the Datanodes communicating
    directly with one another: each Datanode knows from where it is expected
    to receive tuples and to where it needs to ship the tuples it produces.
   </para>

  </sect2>

  <sect2 id="whatis-xc-goal">
   <title>Postgres-XL's Goal</title>

   <para>
    The ultimate goal of Postgres-XL is to provide database scalability with
    ACID consistency across all types of database workloads.  That is,
    Postgres-XL should provide the following features:
    <itemizedlist spacing="compact">
     <listitem>
      <para>
       Postgres-XL should provide multiple servers to accept transactions and
       statements from applications, which are known as "Coordinator"
       processes.
      </para>
     </listitem>
     <listitem>
      <para>
       Any Coordinator should provide a consistent database view to
       applications. Any update from any Coordinator must be visible in real
       time, as if the update were made on a single PostgreSQL server.
      </para>
     </listitem>
     <listitem>
      <para>
       Postgres-XL should allow Datanodes to communicate directly with one
       another to execute queries in an efficient and parallel manner.
      </para>
     </listitem>
     <listitem>
      <para>
       It should be possible to designate each table stored in the database
       as replicated or distributed (distributed tables are also known as
       fragmented or partitioned tables).  Replication and distribution should
       be transparent to applications; that is, replicated and distributed
       tables are seen as single tables, and the location or number of copies
       of each record/tuple is managed by Postgres-XL and is not visible to
       applications.
      </para>
     </listitem>
     <listitem>
      <para>
       Postgres-XL should provide a PostgreSQL-compatible API to applications.
      </para>
     </listitem>
     <listitem>
      <para>
       Postgres-XL should provide a single, unified view of the
       underlying PostgreSQL database servers so that SQL statements
       do not depend on how the tables are actually stored.
      </para>
     </listitem>
    </itemizedlist>
   </para>
  </sect2>


  <sect2 id="whatis-xc-key-components">
   <title>Postgres-XL Key Components</title>

   <para>
    In this section, we will describe the main components of Postgres-XL.
   </para>

   <para>
    Postgres-XL is composed of three major components: the GTM (Global
    Transaction Manager), the Coordinator and the Datanode. Their features are
    given in the following sections.
   </para>

   <sect3>
    <title>GTM (Global Transaction Manager)</title>

    <para>
     The GTM is a key component of Postgres-XL that provides consistent
     transaction management and tuple visibility control.
    </para>

    <para>
     As described later in this manual, <productname>PostgreSQL</productname>'s
     transaction management is based upon MVCC (Multi-Version Concurrency
     Control) technology.  <productname>Postgres-XL</productname> extracts this
     technology into a separate component, the GTM, so that every
     <productname>Postgres-XL</productname> component's transaction management
     is based upon a single global status.  Details are described in <xref
     linkend="overview">.
    </para>
   </sect3>

   <sect3>
    <title>Coordinator</title>

    <para>
     The Coordinator is the interface to the database for applications. It
     acts like a conventional PostgreSQL backend process; however, the
     Coordinator does not store any actual data. The actual data is stored by
     the Datanodes as described below. The Coordinator receives SQL
     statements, obtains a Global Transaction ID (GXID) and Global Snapshots
     as needed, determines which Datanodes are involved, and asks them to
     execute the statement or a part of it.  Each statement issued to the
     Datanodes is associated with the GXID and Global Snapshot so that
     Multi-Version Concurrency Control (MVCC) properties extend cluster-wide.
    </para>
   </sect3>

   <sect3>
    <title>Datanode</title>

    <para>
     The Datanode actually stores user data. Tables may be distributed among
     Datanodes, or replicated to all the Datanodes.  The Datanode does not have
     a global view of the whole database; it only takes care of locally stored
     data. Incoming statements are examined by the Coordinator as described
     above, and subplans are made. These are then transferred to each Datanode
     involved, together with a GXID and Global Snapshot as needed. A Datanode
     may receive requests from various Coordinators in separate sessions.
     However, because each transaction is identified uniquely and associated
     with a consistent (global) snapshot, each Datanode can properly execute
     requests in the correct transaction and snapshot context.
    </para>
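    <para>
     As a rough illustration of the division of labor described above, the
     statements below could be issued from a session connected to a
     Coordinator.  They assume that the <literal>pgxc_node</literal> catalog
     and the <command>EXECUTE DIRECT</command> command are available as in
     the Postgres-XL reference pages; the node name
     <literal>datanode1</literal> is hypothetical.
<programlisting>
-- List the Coordinators and Datanodes known to this Coordinator.
SELECT node_name, node_type, node_host, node_port FROM pgxc_node;

-- Run a query on one Datanode only (typically requires superuser rights).
-- The query is evaluated against the data stored locally on that node.
EXECUTE DIRECT ON (datanode1) 'SELECT count(*) FROM pg_class';
</programlisting>
     Ordinary applications do not need either statement; they are shown only
     to make the roles of the components visible.
    </para>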

   </sect3>

  </sect2>

  <sect2 id="whatis-xc-inherits-postgresql">
   <title><productname>Postgres-XL</productname> Inherits From <productname>PostgreSQL</productname></title>

   <para>
    <productname>Postgres-XL</productname> is an extension to
    <productname>PostgreSQL</productname> and inherits most of its features.
   </para>

   <para>
    It is an open-source descendant of <productname>PostgreSQL</productname>
    and its original Berkeley code.  It supports a large part of the SQL
    standard and offers many modern features:

    <itemizedlist spacing="compact">
     <listitem>
      <simpara>complex queries</simpara>
     </listitem>
     <listitem>
      <simpara>
       foreign keys
       <footnote>
        <para>
         <productname>Postgres-XL</productname>'s foreign key usage has some
         restrictions.  For details, see <xref linkend="SQL-CREATETABLE">.
        </para>
       </footnote>
      </simpara>
     </listitem>
     <listitem>
      <simpara>
       triggers
       <footnote>
        <para>
         <productname>Postgres-XL</productname> does not support triggers in
         the current version.  This may be supported in future releases.
        </para>
       </footnote>
      </simpara>
     </listitem>
     <listitem>
      <simpara>views</simpara>
     </listitem>
     <listitem>
      <simpara>transactional integrity, with the exception of SSI, whose
      support is incomplete</simpara>
     </listitem>
     <listitem>
      <simpara>multiversion concurrency control</simpara>
     </listitem>
    </itemizedlist>

    Also, similar to <productname>PostgreSQL</productname>,
    <productname>Postgres-XL</productname> can be extended by the user in many
    ways, for example by adding new

    <itemizedlist spacing="compact">
     <listitem>
      <simpara>data types</simpara>
     </listitem>
     <listitem>
      <simpara>functions</simpara>
     </listitem>
     <listitem>
      <simpara>operators</simpara>
     </listitem>
     <listitem>
      <simpara>aggregate functions</simpara>
     </listitem>
     <listitem>
      <simpara>index methods</simpara>
     </listitem>
     <listitem>
      <simpara>procedural languages</simpara>
     </listitem>
    </itemizedlist>
   </para>

   <para>
    <productname>Postgres-XL</productname> can be used, modified, and
    distributed by anyone free of charge for any purpose, be it private,
    commercial, or academic, provided it adheres to the PostgreSQL License.
   </para>

  </sect2>
 </sect1>

 &history;
 &notation;
 &info;
 &problems;

</preface>