<!-- doc/src/sgml/start.sgml -->

 <chapter id="tutorial-start">
  <title>Getting Started</title>

  <sect1 id="tutorial-install">
   <title>Installation</title>

   <para>
    Before you can use <productname>PostgreSQL</productname> you need
    to install it, of course.  It is possible that
    <productname>PostgreSQL</productname> is already installed at your
    site, either because it was included in your operating system
    distribution or because the system administrator already installed
    it.  If that is the case, you should obtain information from the
    operating system documentation or your system administrator about
    how to access <productname>PostgreSQL</productname>.
   </para>

   <para>
    If you are not sure whether <productname>PostgreSQL</productname>
    is already available or whether you can use it for your
    experimentation then you can install it yourself.  Doing so is not
    hard and it can be a good exercise.
    <productname>PostgreSQL</productname> can be installed by any
    unprivileged user; no superuser (<systemitem>root</systemitem>)
    access is required.
   </para>

   <para>
    If you are installing <productname>PostgreSQL</productname>
    yourself, then refer to <xref linkend="installation">
    for instructions on installation, and return to
    this guide when the installation is complete.  Be sure to follow
    closely the section about setting up the appropriate environment
    variables.
   </para>

   <para>
    If your site administrator has not set things up in the default
    way, you might have some more work to do.  For example, if the
    database server machine is a remote machine, you will need to set
    the <envar>PGHOST</envar> environment variable to the name of the
    database server machine.  The environment variable
    <envar>PGPORT</envar> might also have to be set.  The bottom line is
    this: if you try to start an application program and it complains
    that it cannot connect to the database, you should consult your
    site administrator or, if that is you, the documentation to make
    sure that your environment is properly set up.  If you did not
    understand the preceding paragraph then read the next section.
   </para>
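
   <para>
    For example, if the database server runs on a host named
    <literal>db.example.com</literal> (a placeholder name used here only for
    illustration) and listens on a non-default port, a Bourne-shell user might
    set the environment like this before starting client programs:
<screen>
<prompt>$</prompt> <userinput>export PGHOST=db.example.com</userinput>
<prompt>$</prompt> <userinput>export PGPORT=5433</userinput>
</screen>
    Adjust the host name and port number to whatever your site actually uses.
   </para>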
  </sect1>


  <sect1 id="tutorial-arch">
   <title>Architectural Fundamentals</title>

   <para>
    Before we proceed, you should understand the basic
    <productname>PostgreSQL</productname> system architecture.
    Understanding how the parts of
    <productname>PostgreSQL</productname> interact will make this
    chapter somewhat clearer.
   </para>

   <para>
    <productname>Postgres-XL</>, in short, is a collection
    of <productname>PostgreSQL</> database clusters which act as if the whole
    collection is a single database cluster.  Based on your database design,
    each table is replicated or distributed among member databases.
   </para>

   <para>
    To provide this capability, <productname>Postgres-XL</> is
    composed of three major components: the GTM, the Coordinator, and
    the Datanode.  The GTM is responsible for providing the ACID
    properties of transactions.  The Datanode stores table data and
    handles SQL statements locally.  The Coordinator handles each SQL
    statement from applications, determines which Datanodes are
    involved, and sends plans on to the appropriate Datanodes.
   </para>

   <para>
    You should usually run the GTM on a separate server because the GTM has to
    handle transaction requests from all the Coordinators and
    Datanodes.  To group multiple requests and responses from
    Coordinator and Datanode processes running on the same server, you can
    configure a GTM-Proxy.  The GTM-Proxy reduces the number of interactions
    with, and the amount of data sent to, the GTM.  It also helps handle
    GTM failures.
   </para>

   <para>
    It is often good practice to run a Coordinator and a Datanode on the
    same server, because then you do not have to worry about workload balance
    between the two, and data from replicated tables can often be read locally
    without sending an additional request out on the network.
    You can have any number of servers running these
    two components.  Because both the Coordinator and the Datanode
    are essentially PostgreSQL instances, you should configure them to
    avoid resource conflicts.  In particular, it is very important to assign them
    different working directories and port numbers.
   </para>
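
   <para>
    As a concrete illustration, the <application>pgxc_ctl</application>
    examples later in this chapter keep the two components on a single host
    apart by giving each its own port, pooler port, and data directory,
    roughly like this:
<screen>
coord1:  port 30001, pooler port 30011, directory $dataDirRoot/coord_master.1
dn1:     port 40001, pooler port 40011, directory $dataDirRoot/dn_master.1
</screen>
    Any non-conflicting values will do; these are simply the ones used in the
    tutorial below.
   </para>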

   <para>
    Postgres-XL allows multiple Coordinators to accept statements
    from applications independently but in an integrated way.  Any
    write made through one Coordinator is visible from any other
    Coordinator; together they act as if they were a single database.
    The Coordinator's role is to accept statements, determine which Datanodes are
    involved, send query plans on to the appropriate Datanodes if
    needed, collect the results,
    and return them to the applications.
   </para>

   <para>
    The Coordinator does not store any user data.  It stores only catalog
    data, which it uses to determine how to process statements, where the target
    Datanodes are, and so on.  Therefore, a Coordinator failure is not
    a major concern.  When a Coordinator fails, you
    can simply switch to another one.
   </para>
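
   <para>
    Switching simply means pointing your client at a surviving Coordinator.
    For instance, with the two Coordinators created later in this tutorial
    (listening on ports 30001 and 30002 on the same host), a
    <application>psql</application> session could simply be reopened against
    the other Coordinator:
<screen>
<prompt>$</prompt> <userinput>psql -p 30002 testdb</userinput>
</screen>
    The port number and database name here are the ones used later in this
    tutorial; your values may differ.
   </para>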

   <para>
    The GTM could be a single point of failure (SPOF).  To prevent this, you can
    run another GTM as a GTM-Standby to back up the GTM's status.  When the GTM fails,
    the GTM-Proxy can switch to the standby on the fly.  This will be described in
    detail in the high-availability sections.
   </para>

   <para>
    As described above, the Coordinators and Datanodes
    of <productname>Postgres-XL</> are
    essentially <productname>PostgreSQL</> database servers. In database
    jargon, <productname>PostgreSQL</productname> uses a client/server
    model.  A <productname>PostgreSQL</productname> session consists
    of the following cooperating processes (programs):

    <itemizedlist>
     <listitem>
      <para>
       A server process, which manages the database files, accepts
       connections to the database from client applications, and
       performs database actions on behalf of the clients.  The
       database server program is called
       <filename>postgres</filename>.
       <indexterm><primary>postgres</primary></indexterm>
      </para>
     </listitem>

     <listitem>
      <para>
       The user's client (frontend) application that wants to perform
       database operations.  Client applications can be very diverse
       in nature:  a client could be a text-oriented tool, a graphical
       application, a web server that accesses the database to
       display web pages, or a specialized database maintenance tool.
       Some client applications are supplied with the
       <productname>PostgreSQL</productname> distribution; most are
       developed by users.
      </para>
     </listitem>

    </itemizedlist>
   </para>

   <para>
    As is typical of client/server applications, the client and the
    server can be on different hosts.  In that case they communicate
    over a TCP/IP network connection.  You should keep this in mind,
    because the files that can be accessed on a client machine might
    not be accessible (or might only be accessible using a different
    file name) on the database server machine.
   </para>

   <para>
    The <productname>PostgreSQL</productname> server can handle
    multiple concurrent connections from clients.  To achieve this it
    starts (<quote>forks</quote>) a new process for each connection.
    From that point on, the client and the new server process
    communicate without intervention by the original
    <filename>postgres</filename> process.  Thus, the
    master server process is always running, waiting for
    client connections, whereas client and associated server processes
    come and go.  (All of this is of course invisible to the user.  We
    only mention it here for completeness.)
   </para>
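
   <para>
    On a Unix-like system you can observe this yourself: while a client
    session is open, a generic operating-system command such as the one below
    (not part of <productname>PostgreSQL</productname> itself) will typically
    list the master <filename>postgres</filename> process plus one additional
    backend process for each connected client.
<screen>
<prompt>$</prompt> <userinput>ps auxww | grep postgres</userinput>
</screen>
   </para>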
  </sect1>

  <sect1 id="tutorial-createcluster">
   <title>Creating a Postgres-XL Cluster</title>

   <para>
    As mentioned in the architectural fundamentals, <productname>Postgres-XL</productname>
    is a collection of multiple components. It can be a bit of work to come up with your
    initial working setup. In this tutorial, we will show how one can start with
    an <literal>empty</literal> configuration file and use the <application>pgxc_ctl</application>
    utility to create your <productname>Postgres-XL</productname> cluster from scratch.
   </para>

   <para>
    A few pre-requisites are necessary on each node that is going to be a part of the
    <productname>Postgres-XL</productname> setup.

    <itemizedlist>
     <listitem>
      <para>
       Password-less ssh access is required from the node that is going to run the
       <application>pgxc_ctl</application> utility (a typical setup is sketched just after this list).
      </para>
     </listitem>

     <listitem>
      <para>
      The <envar>PATH</envar> environment variable must include the correct <productname>Postgres-XL</productname>
      binaries on all nodes, including when a command is run non-interactively via ssh.
      </para>
     </listitem>

     <listitem>
      <para>
      The <filename>pg_hba.conf</filename> entries must be updated to allow remote access. Variables
      like <option>coordPgHbaEntries</option> and <option>datanodePgHbaEntries</option>
      in the <filename>pgxc_ctl.conf</filename> configuration file may need appropriate changes.
      </para>
     </listitem>

     <listitem>
      <para>
      Firewalls and iptables may need to be updated to allow access to ports.
      </para>
     </listitem>
    </itemizedlist>
   </para>
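
   <para>
   As an example of the first prerequisite above, password-less ssh access from
   the node that will run <application>pgxc_ctl</application> is usually
   arranged with the standard OpenSSH tools, roughly as follows (the host
   names are placeholders):
<screen>
<prompt>$</prompt> <userinput>ssh-keygen -t rsa</userinput>
<prompt>$</prompt> <userinput>ssh-copy-id postgres@datanode-host-1</userinput>
<prompt>$</prompt> <userinput>ssh-copy-id postgres@coordinator-host-1</userinput>
</screen>
   Repeat this for every host that will be part of the cluster, and confirm
   that an ssh login to each host works without prompting for a password.
   </para>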

  <para>
  The <application>pgxc_ctl</application> utility should be present in your <envar>PATH</envar>. If it is
  not, it can be compiled from the source tree.
<screen>
<prompt>$</prompt> <userinput>cd $XLSRC/contrib/pgxc_ctl</userinput>
<prompt>$</prompt> <userinput>make install</userinput>
</screen>

  We are now ready to prepare our template configuration file. The <application>pgxc_ctl</application>
  utility allows you to create three types of configuration. We will choose the <literal>empty</literal>
  configuration, which will allow us to create our <productname>Postgres-XL</productname> setup from
  scratch. Note that we also need to set the <envar>dataDirRoot</envar> environment
  variable properly for all future invocations of <application>pgxc_ctl</application>.
<screen>
<prompt>$</prompt> <userinput>export dataDirRoot=$HOME/DATA/pgxl/nodes</userinput>
<prompt>$</prompt> <userinput>mkdir $HOME/pgxc_ctl</userinput>
<prompt>$</prompt> <userinput>pgxc_ctl</userinput>
Installing pgxc_ctl_bash script as /Users/postgres/pgxc_ctl/pgxc_ctl_bash.
Installing pgxc_ctl_bash script as /Users/postgres/pgxc_ctl/pgxc_ctl_bash.
Reading configuration using /Users/postgres/pgxc_ctl/pgxc_ctl_bash --home
/Users/postgres/pgxc_ctl --configuration
/Users/postgres/pgxc_ctl/pgxc_ctl.conf
Finished reading configuration.
   ******** PGXC_CTL START ***************

   Current directory: /Users/postgres/pgxc_ctl
<prompt>PGXC$ </prompt> <userinput>prepare config empty</userinput>
<prompt>PGXC$ </prompt> <userinput>exit</userinput>
</screen>

   The <literal>empty</literal> configuration file is now ready. You should now make changes
   to <filename>pgxc_ctl.conf</filename>. At a minimum, <option>pgxcOwner</option>
   should be set correctly. The configuration file refers to the <envar>USER</> and <envar>HOME</>
   environment variables to provide sensible defaults for the current user.
   </para>
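
   <para>
   For instance, after <literal>prepare config empty</literal>, the relevant
   lines in <filename>pgxc_ctl.conf</filename> might be edited to look roughly
   like this (the exact set of variables in the generated file can vary
   between versions):
<screen>
pgxcOwner=$USER         # database user that owns the Postgres-XL cluster
pgxcUser=$pgxcOwner     # operating system user that runs the cluster components
</screen>
   </para>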

   <para>
   The next step is to add the GTM master to the setup. 
<screen>
<prompt>$</prompt> <userinput>pgxc_ctl</userinput>
<prompt>PGXC$ </prompt> <userinput>add gtm master gtm localhost 20001 $dataDirRoot/gtm</userinput>
</screen>

    Use the "monitor" command to check the status of the cluster.
<screen>
<prompt>$</prompt> <userinput>pgxc_ctl</userinput>
<prompt>PGXC$ </prompt> <userinput>monitor all</userinput>
Running: gtm master
</screen>
   </para>


   <para>
   Let us now add a couple of coordinators. When the first coordinator is added, it just
starts up. When another coordinator is added, it connects to any existing coordinator node
to fetch the metadata about objects. 
<screen>
<prompt>PGXC$ </prompt> <userinput>add coordinator master coord1 localhost 30001 30011 $dataDirRoot/coord_master.1 none none</userinput>
<prompt>PGXC$ </prompt> <userinput>monitor all</userinput>
Running: gtm master
Running: coordinator master coord1
<prompt>PGXC$ </prompt> <userinput>add coordinator master coord2 localhost 30002 30012 $dataDirRoot/coord_master.2 none none</userinput>
<prompt>PGXC$ </prompt> <userinput>monitor all</userinput>
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
</screen>
   </para>

   <para>
   Let us now add a couple of datanodes. When the first datanode is added,
it connects to any existing coordinator node to fetch global metadata. When a subsequent
datanode is added, it connects to any existing datanode for the metadata.
<screen>
<prompt>PGXC$ </prompt> <userinput>add datanode master dn1 localhost 40001 40011 $dataDirRoot/dn_master.1 none none none</userinput>
<prompt>PGXC$ </prompt> <userinput>monitor all</userinput>
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
<prompt>PGXC$ </prompt> <userinput>add datanode master dn2 localhost 40002 40012 $dataDirRoot/dn_master.2 none none none</userinput>
<prompt>PGXC$ </prompt> <userinput>monitor all</userinput>
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode master dn2
</screen>
</para>

<para>
  Your <productname>Postgres-XL</productname> setup is ready now and you can move on to the next
  "Getting Started" topic. 
  </para>
  <para>
  Read on only if you want a quick crash course on the various commands you can
  try out with <productname>Postgres-XL</productname>. It is strongly recommended that you go through
  the rest of the documentation for more details on each command that we will touch upon
  below.
  </para>

  <para>
  Connect to one of the coordinators and create a test database.
<screen>
<prompt>$ </prompt> <userinput>psql -p 30001 postgres</userinput>
postgres=# CREATE DATABASE testdb;
CREATE DATABASE
postgres=# \q
</screen>

Look at the pgxc_node catalog. It should show all the configured nodes. It is normal to see
negative node id values; this will be fixed soon.
<screen>
<prompt>$ </prompt> <userinput>psql -p 30001 testdb</userinput>
testdb=# SELECT * FROM pgxc_node;
 node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred |   node_id   
-----------+-----------+-----------+-----------+----------------+------------------+-------------
 coord1    | C         |     30001 | localhost | f              | f                |  1885696643
 coord2    | C         |     30002 | localhost | f              | f                | -1197102633
 dn1       | D         |     40001 | localhost | t              | t                |  -560021589
 dn2       | D         |     40002 | localhost | f              | t                |   352366662
(4 rows)

</screen>

Let us now create a distributed table, distributed on the first column by HASH.

<screen>
testdb=# CREATE TABLE disttab(col1 int, col2 int, col3 text) DISTRIBUTE BY HASH(col1);
CREATE TABLE
testdb=# \d+ disttab
                        Table "public.disttab"
 Column |  Type   | Modifiers | Storage  | Stats target | Description 
--------+---------+-----------+----------+--------------+-------------
 col1   | integer |           | plain    |              | 
 col2   | integer |           | plain    |              | 
 col3   | text    |           | extended |              | 
Has OIDs: no
Distribute By: HASH(col1)
Location Nodes: ALL DATANODES

</screen>

Also create a replicated table.

<screen>
testdb=# CREATE TABLE repltab (col1 int, col2 int) DISTRIBUTE BY
REPLICATION;
CREATE TABLE
testdb=# \d+ repltab
                       Table "public.repltab"
 Column |  Type   | Modifiers | Storage | Stats target | Description 
--------+---------+-----------+---------+--------------+-------------
 col1   | integer |           | plain   |              | 
 col2   | integer |           | plain   |              | 
Has OIDs: no
Distribute By: REPLICATION
Location Nodes: ALL DATANODES

</screen>

Now insert some sample data into these tables.
<screen>
testdb=# INSERT INTO disttab SELECT generate_series(1,100), generate_series(101, 200), 'foo';
INSERT 0 100
testdb=# INSERT INTO repltab SELECT generate_series(1,100), generate_series(101, 200);
INSERT 0 100

</screen>
OK, so the distributed table should now have 100 rows:

<screen>
testdb=# SELECT count(*) FROM disttab;
 count 
-------
   100
(1 row)


</screen>

And the rows should not all be on the same node. <literal>xc_node_id</> is a system
column which shows the originating datanode for each row.

Note that the distribution can be slightly uneven because of the HASH
function:

<screen>
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 -560021589 |    42
  352366662 |    58
(2 rows)


</screen>
For replicated tables, we expect all rows to come from a single
datanode (even though the other node has a copy too).

<screen>
testdb=# SELECT count(*) FROM repltab;
 count 
-------
   100
(1 row)

testdb=# SELECT xc_node_id, count(*) FROM repltab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 -560021589 |   100
(1 row)

</screen>

Now add a new datanode to the cluster.

<screen>
<prompt>PGXC$ </prompt> <userinput>add datanode master dn3 localhost 40003 40013 $dataDirRoot/dn_master.3 none none none</userinput>
<prompt>PGXC$ </prompt> <userinput>monitor all</userinput>
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode master dn2
Running: datanode master dn3
</screen>


Note that during cluster reconfiguration, all outstanding transactions
are aborted and sessions are reset. So you would typically see errors
like these on open sessions:

<screen>
testdb=# SELECT * FROM pgxc_node;
ERROR:  canceling statement due to user request             <==== pgxc_pool_reload() resets all sessions and aborts all open transactions

testdb=# SELECT * FROM pgxc_node;
 node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred |   node_id   
-----------+-----------+-----------+-----------+----------------+------------------+-------------
 coord1    | C         |     30001 | localhost | f              | f                |  1885696643
 coord2    | C         |     30002 | localhost | f              | f                | -1197102633
 dn1       | D         |     40001 | localhost | t              | t                |  -560021589
 dn2       | D         |     40002 | localhost | f              | t                |   352366662
 dn3       | D         |     40003 | localhost | f              | f                |  -700122826
(5 rows)
</screen>

Note that existing tables are not affected by the addition of a new datanode. The distribution information now
explicitly shows only the older datanodes:
<screen>
testdb=# \d+ disttab
                        Table "public.disttab"
 Column |  Type   | Modifiers | Storage  | Stats target | Description 
--------+---------+-----------+----------+--------------+-------------
 col1   | integer |           | plain    |              | 
 col2   | integer |           | plain    |              | 
 col3   | text    |           | extended |              | 
Has OIDs: no
Distribute By: HASH(col1)
Location Nodes: dn1, dn2

testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 -560021589 |    42
  352366662 |    58
(2 rows)

testdb=# \d+ repltab
                       Table "public.repltab"
 Column |  Type   | Modifiers | Storage | Stats target | Description 
--------+---------+-----------+---------+--------------+-------------
 col1   | integer |           | plain   |              | 
 col2   | integer |           | plain   |              | 
Has OIDs: no
Distribute By: REPLICATION
Location Nodes: dn1, dn2
</screen>

Let us now redistribute the tables so that they can take advantage
of the new datanode:

<screen>
testdb=# ALTER TABLE disttab ADD NODE (dn3);
ALTER TABLE
testdb=# \d+ disttab
                        Table "public.disttab"
 Column |  Type   | Modifiers | Storage  | Stats target | Description 
--------+---------+-----------+----------+--------------+-------------
 col1   | integer |           | plain    |              | 
 col2   | integer |           | plain    |              | 
 col3   | text    |           | extended |              | 
Has OIDs: no
Distribute By: HASH(col1)
Location Nodes: ALL DATANODES

testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 -700122826 |    32
  352366662 |    32
 -560021589 |    36
(3 rows)

</screen>

Let us now add a third coordinator.
<screen>
<prompt>PGXC$ </prompt> <userinput>add coordinator master coord3 localhost 30003 30013 $dataDirRoot/coord_master.3 none none</userinput>
<prompt>PGXC$ </prompt> <userinput>monitor all</userinput>
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: coordinator master coord3
Running: datanode master dn1
Running: datanode master dn2
Running: datanode master dn3

testdb=# SELECT * FROM pgxc_node;
ERROR:  canceling statement due to user request
testdb=# SELECT * FROM pgxc_node;
 node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred |   node_id   
-----------+-----------+-----------+-----------+----------------+------------------+-------------
 coord1    | C         |     30001 | localhost | f              | f                |  1885696643
 coord2    | C         |     30002 | localhost | f              | f                | -1197102633
 dn1       | D         |     40001 | localhost | t              | t                |  -560021589
 dn2       | D         |     40002 | localhost | f              | t                |   352366662
 dn3       | D         |     40003 | localhost | f              | f                |  -700122826
 coord3    | C         |     30003 | localhost | f              | f                |  1638403545
(6 rows)

</screen>

We can try a few more ALTER TABLE commands to delete a node from a table's
distribution and then add it back:

<screen>
testdb=# ALTER TABLE disttab DELETE NODE (dn1);
ALTER TABLE
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
  352366662 |    42
 -700122826 |    58
(2 rows)

testdb=# ALTER TABLE disttab ADD NODE (dn1);
ALTER TABLE
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 -700122826 |    32
  352366662 |    32
 -560021589 |    36
(3 rows)
</screen>


You could also alter a replicated table to make it a distributed table.
Note that even though the cluster now has 3 datanodes, the table will continue
to use only the 2 datanodes on which it was originally replicated.

<screen>
testdb=# ALTER TABLE repltab DISTRIBUTE BY HASH(col1);
ALTER TABLE
testdb=# SELECT xc_node_id, count(*) FROM repltab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 -560021589 |    42
  352366662 |    58
(2 rows)

testdb=# ALTER TABLE repltab DISTRIBUTE BY REPLICATION;
ALTER TABLE
testdb=# SELECT xc_node_id, count(*) FROM repltab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 -560021589 |   100
(1 row)
</screen>

Now remove the coordinator that was added previously. You can use the "clean" option
to remove the corresponding data directory as well.

<screen>
<prompt>PGXC$ </prompt> <userinput>remove coordinator master coord3 clean</userinput>
<prompt>PGXC$ </prompt> <userinput>monitor all</userinput>
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode master dn2
Running: datanode master dn3

testdb=# SELECT oid, * FROM pgxc_node;
ERROR:  canceling statement due to user request
testdb=# SELECT oid, * FROM pgxc_node;
  oid  | node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred |   node_id   
-------+-----------+-----------+-----------+-----------+----------------+------------------+-------------
 11197 | coord1    | C         |     30001 | localhost | f              | f                |  1885696643
 16384 | coord2    | C         |     30002 | localhost | f              | f                | -1197102633
 16385 | dn1       | D         |     40001 | localhost | t              | t                |  -560021589
 16386 | dn2       | D         |     40002 | localhost | f              | t                |   352366662
 16397 | dn3       | D         |     40003 | localhost | f              | f                |  -700122826
(5 rows)

</screen>

Let us now try to remove a datanode. NOTE: <productname>Postgres-XL</productname> does not
perform any additional checks to ascertain whether the datanode being dropped has data from tables
that are replicated or distributed. It is the responsibility of the user to ensure that it is
safe to remove a datanode.

You can use the query below to find out whether the datanode being removed has any data on it.
Note that this only shows tables from the current database; you might want to check
the same for all databases before going ahead with the datanode removal. Use the OID of the
datanode that is to be removed in the query:
 
<screen>
testdb=# SELECT * FROM pgxc_class WHERE nodeoids::integer[] @> ARRAY[16397];
 pcrelid | pclocatortype | pcattnum | pchashalgorithm | pchashbuckets |     nodeoids      
---------+---------------+----------+-----------------+---------------+-------------------
   16388 | H             |        1 |               1 |          4096 | 16385 16386 16397
(1 row)


testdb=# ALTER TABLE disttab DELETE NODE (dn3);
ALTER TABLE
testdb=# SELECT * FROM pgxc_class WHERE nodeoids::integer[] @> ARRAY[16397];
 pcrelid | pclocatortype | pcattnum | pchashalgorithm | pchashbuckets | nodeoids 
---------+---------------+----------+-----------------+---------------+----------
(0 rows)
</screen>

OK, it is now safe to remove datanode "dn3".
<screen>
<prompt>PGXC$ </prompt> <userinput>remove datanode master dn3 clean</userinput>
<prompt>PGXC$ </prompt> <userinput>monitor all</userinput>
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode master dn2

testdb=# SELECT oid, * FROM pgxc_node;
ERROR:  canceling statement due to user request
testdb=# SELECT oid, * FROM pgxc_node;
  oid  | node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred |   node_id   
-------+-----------+-----------+-----------+-----------+----------------+------------------+-------------
 11197 | coord1    | C         |     30001 | localhost | f              | f                |  1885696643
 16384 | coord2    | C         |     30002 | localhost | f              | f                | -1197102633
 16385 | dn1       | D         |     40001 | localhost | t              | t                |  -560021589
 16386 | dn2       | D         |     40002 | localhost | f              | t                |   352366662
(4 rows)

</screen>

The <application>pgxc_ctl</application> utility can also help in setting up slaves for
datanodes and coordinators. Let us set up a slave for a datanode and see how failover can
be performed in case the master datanode goes down.
<screen>
<prompt>PGXC$ </prompt> <userinput>add datanode slave dn1 localhost 40101 40111 $dataDirRoot/dn_slave.1 none $dataDirRoot/datanode_archlog.1</userinput>
<prompt>PGXC$ </prompt> <userinput>monitor all</userinput>
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode slave dn1
Running: datanode master dn2

testdb=# EXECUTE DIRECT ON(dn1) 'SELECT client_hostname, state, sync_state FROM pg_stat_replication';
 client_hostname |   state   | sync_state 
-----------------+-----------+------------
                 | streaming | async
(1 row)
</screen>

Add some more rows to test failover now.

<screen>
testdb=# INSERT INTO disttab SELECT generate_series(1001,1100), generate_series(1101, 1200), 'foo';
INSERT 0 100
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 -560021589 |    94
  352366662 |   106
(2 rows)
</screen>

Let us now simulate a datanode failover. We will first stop the datanode master "dn1" for
which we configured a slave above. Note that since the slave is connected to the master,
we will use "immediate" mode to stop it.
<screen>
<prompt>PGXC$ </prompt> <userinput>stop -m immediate datanode master dn1</userinput>
</screen>

Since a datanode is down, queries will fail. A few queries may still work, though, if
the failed node is not required to run them; that is determined by the
distribution of the data and the WHERE clause being used.

<screen>
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
ERROR:  Failed to get pooled connections

testdb=# SELECT xc_node_id, * FROM disttab WHERE col1 = 3;
 xc_node_id | col1 | col2 | col3 
------------+------+------+------
  352366662 |    3 |  103 | foo
(1 row)
</screen>

We will now perform the failover and check that everything works fine afterwards.
<screen>
<prompt>PGXC$ </prompt> <userinput>failover datanode dn1</userinput>

testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
ERROR:  canceling statement due to user request
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 -560021589 |    94
  352366662 |   106
(2 rows)
</screen>


The pgxc_node catalog should now have updated entries. In particular, the
failed-over datanode's node_host and node_port should have been replaced
with the slave's host and port values.

<screen>
testdb=# SELECT oid, * FROM pgxc_node;
  oid  | node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred |   node_id   
-------+-----------+-----------+-----------+-----------+----------------+------------------+-------------
 11197 | coord1    | C         |     30001 | localhost | f              | f                |  1885696643
 16384 | coord2    | C         |     30002 | localhost | f              | f                | -1197102633
 16386 | dn2       | D         |     40002 | localhost | f              | t                |   352366662
 16385 | dn1       | D         |     40101 | localhost | t              | t                |  -560021589
(4 rows)

<prompt>PGXC$ </prompt> <userinput>monitor all</userinput>
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode master dn2
</screen>
</para>

  </sect1>

  <sect1 id="tutorial-createdb">
   <title>Creating a Database</title>

   <indexterm zone="tutorial-createdb">
    <primary>database</primary>
    <secondary>creating</secondary>
   </indexterm>

   <indexterm zone="tutorial-createdb">
    <primary>createdb</primary>
   </indexterm>

   <para>
    The first test to see whether you can access the database server
    is to try to create a database.  A running
    <productname>PostgreSQL</productname> server can manage many
    databases.  Typically, a separate database is used for each
    project or for each user.
   </para>

   <para>
    Possibly, your site administrator has already created a database
    for your use.  In that case you can omit this step and skip ahead
    to the next section.
   </para>

   <para>
    To create a new database, in this example named
    <literal>mydb</literal>, you use the following command:
<screen>
<prompt>$</prompt> <userinput>createdb mydb</userinput>
</screen>
    If this produces no response then this step was successful and you can skip over the
    remainder of this section.
   </para>

   <para>
    If you see a message similar to:
<screen>
createdb: command not found
</screen>
    then <productname>PostgreSQL</> was not installed properly.  Either it was not
    installed at all or your shell's search path was not set to include it.
    Try calling the command with an absolute path instead:
<screen>
<prompt>$</prompt> <userinput>/usr/local/pgsql/bin/createdb mydb</userinput>
</screen>
    The path at your site might be different.  Contact your site
    administrator or check the installation instructions to
    correct the situation.
   </para>

   <para>
    Another response could be this:
<screen>
createdb: could not connect to database postgres: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
</screen>
    This means that the server was not started, or it was not started
    where <command>createdb</command> expected it.  Again, check the
    installation instructions or consult the administrator.
   </para>

   <para>
    Another response could be this:
<screen>
createdb: could not connect to database postgres: FATAL:  role "joe" does not exist
</screen>
    where your own login name is mentioned.  This will happen if the
    administrator has not created a <productname>PostgreSQL</> user account
    for you.  (<productname>PostgreSQL</> user accounts are distinct from
    operating system user accounts.)  If you are the administrator, see
    <xref linkend="user-manag"> for help creating accounts.  You will need to
    become the operating system user under which <productname>PostgreSQL</>
    was installed (usually <literal>postgres</>) to create the first user
    account.  It could also be that you were assigned a
    <productname>PostgreSQL</> user name that is different from your
    operating system user name; in that case you need to use the <option>-U</>
    switch or set the <envar>PGUSER</> environment variable to specify your
    <productname>PostgreSQL</> user name.
   </para>
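
   <para>
    As a purely hypothetical example, if your operating system login is
    <literal>joe</literal> but the administrator created a
    <productname>PostgreSQL</> user named <literal>joe_db</literal> for you,
    either of the following would connect as the correct database user:
<screen>
<prompt>$</prompt> <userinput>createdb -U joe_db mydb</userinput>
<prompt>$</prompt> <userinput>PGUSER=joe_db createdb mydb</userinput>
</screen>
   </para>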

   <para>
    If you have a user account but it does not have the privileges required to
    create a database, you will see the following:
<screen>
createdb: database creation failed: ERROR:  permission denied to create database
</screen>
    Not every user has authorization to create new databases.  If
    <productname>PostgreSQL</productname> refuses to create databases
    for you then the site administrator needs to grant you permission
    to create databases.  Consult your site administrator if this
    occurs.  If you installed <productname>PostgreSQL</productname>
    yourself then you should log in for the purposes of this tutorial
    under the user account that you started the server as.

    <footnote>
     <para>
      As an explanation for why this works:
      <productname>PostgreSQL</productname> user names are separate
      from operating system user accounts.  When you connect to a
      database, you can choose what
      <productname>PostgreSQL</productname> user name to connect as;
      if you don't, it will default to the same name as your current
      operating system account.  As it happens, there will always be a
      <productname>PostgreSQL</productname> user account that has the
      same name as the operating system user that started the server,
      and it also happens that that user always has permission to
      create databases.  Instead of logging in as that user you can
      also specify the <option>-U</option> option everywhere to select
      a <productname>PostgreSQL</productname> user name to connect as.
     </para>
    </footnote>
   </para>

   <para>
    You can also create databases with other names.
    <productname>PostgreSQL</productname> allows you to create any
    number of databases at a given site.  Database names must have an
    alphabetic first character and are limited to 63 bytes in
    length.  A convenient choice is to create a database with the same
    name as your current user name.  Many tools assume that database
    name as the default, so it can save you some typing.  To create
    that database, simply type:
<screen>
<prompt>$</prompt> <userinput>createdb</userinput>
</screen>
   </para>

   <para>
    If you do not want to use your database anymore you can remove it.
    For example, if you are the owner (creator) of the database
    <literal>mydb</literal>, you can destroy it using the following
    command:
<screen>
<prompt>$</prompt> <userinput>dropdb mydb</userinput>
</screen>
    (For this command, the database name does not default to the user
    account name.  You always need to specify it.)  This action
    physically removes all files associated with the database and
    cannot be undone, so this should only be done with a great deal of
    forethought.
   </para>

   <para>
    More about <command>createdb</command> and <command>dropdb</command> can
    be found in <xref linkend="APP-CREATEDB"> and <xref linkend="APP-DROPDB">
    respectively.
   </para>
  </sect1>


  <sect1 id="tutorial-accessdb">
   <title>Accessing a Database</title>

   <indexterm zone="tutorial-accessdb">
    <primary>psql</primary>
   </indexterm>

   <para>
    Once you have created a database, you can access it by:

    <itemizedlist spacing="compact" mark="bullet">
     <listitem>
      <para>
       Running the <productname>PostgreSQL</productname> interactive
       terminal program, called <application><firstterm>psql</></application>, which allows you
       to interactively enter, edit, and execute
       <acronym>SQL</acronym> commands.
      </para>
     </listitem>

     <listitem>
      <para>
       Using an existing graphical frontend tool like
       <application>pgAdmin</application> or an office suite with
       <acronym>ODBC</> or <acronym>JDBC</> support to create and manipulate a
       database.  These possibilities are not covered in this
       tutorial.
      </para>
     </listitem>

     <listitem>
      <para>
       Writing a custom application, using one of the several
       available language bindings.  These possibilities are discussed
       further in <xref linkend="client-interfaces">.
      </para>
     </listitem>
    </itemizedlist>

    You probably want to start up <command>psql</command> to try
    the examples in this tutorial.  It can be activated for the
    <literal>mydb</literal> database by typing the command:
<screen>
<prompt>$</prompt> <userinput>psql mydb</userinput>
</screen>
    If you do not supply the database name then it will default to your
    user account name.  You already discovered this scheme in the
    previous section using <command>createdb</command>.
   </para>

   <para>
    In <command>psql</command>, you will be greeted with the following
    message:
<screen>
psql (&version;)
Type "help" for help.

mydb=&gt;
</screen>
    <indexterm><primary>superuser</primary></indexterm>
    The last line could also be:
<screen>
mydb=#
</screen>
    That would mean you are a database superuser, which is most likely
    the case if you installed the <productname>PostgreSQL</productname> instance
    yourself.  Being a superuser means that you are not subject to
    access controls.  For the purposes of this tutorial that is not
    important.
   </para>

   <para>
    If you encounter problems starting <command>psql</command>
    then go back to the previous section.  The diagnostics of
    <command>createdb</command> and <command>psql</command> are
    similar, and if the former worked the latter should work as well.
   </para>

   <para>
    The last line printed out by <command>psql</command> is the
    prompt, and it indicates that <command>psql</command> is listening
    to you and that you can type <acronym>SQL</acronym> queries into a
    work space maintained by <command>psql</command>.  Try out these
    commands:
    <indexterm><primary>version</primary></indexterm>
<screen>
<prompt>mydb=&gt;</prompt> <userinput>SELECT version();</userinput>
                                         version
------------------------------------------------------------------------------------------
 PostgreSQL &version; on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit
(1 row)

<prompt>mydb=&gt;</prompt> <userinput>SELECT current_date;</userinput>
    date
------------
 2016-01-07
(1 row)

<prompt>mydb=&gt;</prompt> <userinput>SELECT 2 + 2;</userinput>
 ?column?
----------
        4
(1 row)
</screen>
   </para>

   <para>
    The <command>psql</command> program has a number of internal
    commands that are not SQL commands.  They begin with the backslash
    character, <quote><literal>\</literal></quote>.
    For example,
    you can get help on the syntax of various
    <productname>PostgreSQL</productname> <acronym>SQL</acronym>
    commands by typing:
<screen>
<prompt>mydb=&gt;</prompt> <userinput>\h</userinput>
</screen>
   </para>

   <para>
    To get out of <command>psql</command>, type:
<screen>
<prompt>mydb=&gt;</prompt> <userinput>\q</userinput>
</screen>
    and <command>psql</command> will quit and return you to your
    command shell. (For more internal commands, type
    <literal>\?</literal> at the <command>psql</command> prompt.)  The
    full capabilities of <command>psql</command> are documented in
    <xref linkend="app-psql">.  In this tutorial we will not use these
    features explicitly, but you can use them yourself when it is helpful.
   </para>

  </sect1>
 </chapter>