Skytools 3 - cascaded replication
=================================

Keep old design from Skytools 2
-------------------------------

* Worker process connects to only 2 databases; there is no
  everybody-to-everybody communication going on.
* Worker process only pulls data from queue.
  - No pushing with LISTEN/NOTIFY is used for data transport.
  - Administrative work happens in separate process.
  - Can go down anytime, without affecting anything else.
* Relaxed attitude about tables.
  - Tables can be added/removed at any time.
  - Initial data sync happens table-by-table, no attempt is made to keep
    consistent picture between tables during initial copy.
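
The pull model above can be sketched against PgQ's SQL API.  This is a minimal
sketch, not Londiste's actual worker: it assumes a DB-API cursor (for example
one from psycopg2), and the names `pull_one_batch` and `handle_event` are made
up for illustration.  Only the three `pgq.*` function calls come from PgQ.

```python
def pull_one_batch(cur, queue_name, consumer_name, handle_event):
    """Pull and process one batch from a PgQ queue.

    cur          - DB-API cursor on the provider database (assumption)
    handle_event - hypothetical callback taking (ev_id, ev_type, ev_data)

    Returns the number of events handled, or None if no batch is ready.
    The caller is expected to commit the transaction afterwards.
    """
    # Ask the queue for the next batch; nothing is pushed to the worker.
    cur.execute("select pgq.next_batch(%s, %s)", (queue_name, consumer_name))
    batch_id = cur.fetchone()[0]
    if batch_id is None:
        return None

    # Fetch the events that belong to this batch.
    cur.execute(
        "select ev_id, ev_type, ev_data from pgq.get_batch_events(%s)",
        (batch_id,),
    )
    count = 0
    for ev_id, ev_type, ev_data in cur.fetchall():
        handle_event(ev_id, ev_type, ev_data)
        count += 1

    # Mark the batch as done so the next call moves forward.
    cur.execute("select pgq.finish_batch(%s)", (batch_id,))
    return count
```

If the worker dies before `pgq.finish_batch` is committed, the same batch is
simply handed out again on the next pull, so handlers should tolerate seeing
a batch twice.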

New features in Skytools 3
--------------------------

* Cascading is implemented as generic layer on top of PgQ - *Cascaded PgQ*.
  - Its goal is to keep identical copy of queue contents in several nodes.
  - Not replication-specific - can be used for any queue.
  - Advanced admin operations: takeover, change-provider, pause/resume.
  - For terminology and technical details see here: set.notes.txt.

* New Londiste features:
  - Parallel copy - during initial sync several tables can be copied
    at the same time.  In 2.x the copy already happened in separate process,
    making it parallel was just a matter of tuning launching/syncing logic.

  - EXECUTE command, to run arbitrary SQL script on all nodes.  The script is
    executed in single TX on root, and inserted as an event into the queue
    in the same TX.  The goal is to emulate DDL AFTER TRIGGER that way.
    Londiste itself does no locking and no coordination between nodes.
    The assumption is that the DDL commands themselves do enough locking.
    If more locking is needed, it can be added to the script.

  - Automatic table or sequence creation by importing the structure
    from provider node.  Activated with --create switch for add-table, add-seq.
    By default *everything* is copied, including Londiste's own triggers.
    The basic idea is that the triggers may be customized and that way
    we avoid the need to keep track of trigger customizations.

  - Ability to merge replication queues coming from partitioned database.
    The possibility was always there, but now PgQ also keeps track
    of batch positions, which makes it possible to survive loss of the merge point.

  - Londiste now uses the intelligent log-triggers by default.  The triggers
    were introduced in 2.1.x, but were not enabled by default there.

  - Londiste processes events via 'handlers'.  Thus we can do table partitioning
    in Londiste, instead of a custom consumer, which means all Londiste features
    are available in that situation - like proper initial COPY.
    To see the list of available handlers: `londiste3 x.ini show-handlers`.

  - Target table can use a different name (--dest-table).

* New interactive admin console - qadmin.  Because long command lines are
  not very user-friendly, this is an experiment with an interactive console
  that puts heavy emphasis on tab-completion.

* New multi-database ticker: `pgqd`.  It is possible to set up one process that
  maintains all PgQ databases in one PostgreSQL instance.  It will
  auto-detect both databases and whether they have PgQ installed.
  This also makes core PgQ usable without need for Python.
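
The "script plus queue event in a single TX" idea behind the EXECUTE command
can be illustrated with a toy model.  This is only a sketch under loud
assumptions: it uses Python's built-in `sqlite3` as a stand-in for the root
PostgreSQL node, and the table `toy_queue`, the functions `open_root` and
`execute_script`, and the `'EXECUTE'` event label are all made up for
illustration.  It is not how Londiste is implemented.

```python
import sqlite3


def open_root():
    """Open an in-memory stand-in for the root node with a toy event queue."""
    # isolation_level=None: transactions are controlled by explicit BEGIN/COMMIT.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE toy_queue ("
        " ev_id INTEGER PRIMARY KEY, ev_type TEXT, ev_data TEXT, ev_extra1 TEXT)"
    )
    return conn


def execute_script(conn, script_name, statements):
    """Apply the statements and enqueue them as one event, atomically."""
    cur = conn.cursor()
    cur.execute("BEGIN")
    try:
        # Run the script itself on the root node.
        for stmt in statements:
            cur.execute(stmt)
        # Record the script as an event in the same transaction, so other
        # nodes can only ever see scripts that were actually committed on root.
        cur.execute(
            "INSERT INTO toy_queue (ev_type, ev_data, ev_extra1) VALUES (?, ?, ?)",
            ("EXECUTE", ";\n".join(statements), script_name),
        )
        cur.execute("COMMIT")
    except Exception:
        # Any failure undoes both the schema change and the event.
        cur.execute("ROLLBACK")
        raise
```

The point of the design is that a script which fails on root leaves no event
behind, so nothing is replayed on the other nodes.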

Minor improvements
------------------

* sql/pgq: ticks now also store the last sequence position.  This also
  allowed moving most of the ticker functionality into the database.  The
  ticker daemon now just needs to call an SQL function periodically; it does
  not need to keep track of sequence positions.

* sql/pgq: Ability to enforce max number of events that one TX can insert.
  In addition to simply keeping queue healthy, it also gives a way to
  survive bad UPDATE/DELETE statements with buggy or missing WHERE clause.

* sql/pgq: If Postgres has autovacuum turned on, internal vacuuming for
  fast-changing tables is disabled.

* python/pgq: pgq.Consumer does not register consumer automatically,
  cmdline switches --register / --unregister need to be used for that.

* londiste: sequences are now pushed into queue, instead of being pulled
  directly from database.  This reduces load on root
  and also allows in-between nodes that do not have sequences.

* psycopg1 is not supported anymore.

* PgQ does not handle "failed events" anymore.

* Skytools 3 modules are parallel installable with Skytools 2.
  Solved via loader module (like http://faq.pygtk.org/index.py?req=all#2.4[pygtk]).

    import pkgloader
    # select the Skytools 3 modules before the regular import
    pkgloader.require('skytools', '3.0')
    import skytools
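
As a rough illustration of the ticker change in the first item of this list, a
ticker that keeps no state of its own reduces to a loop like the one below.
This is a hedged sketch, not the code of `pgqd` or any shipped ticker: it
assumes a DB-API connection to a database with PgQ installed and that the
periodic SQL call is `pgq.ticker()`.  The function name `run_ticker` and the
`max_iterations` and `sleep` parameters are hypothetical and exist only to
make the loop easy to try out.

```python
import time


def run_ticker(conn, period=1.0, max_iterations=None, sleep=time.sleep):
    """Call pgq.ticker() every `period` seconds.

    conn           - DB-API connection to a PgQ database (assumption)
    max_iterations - hypothetical stop condition; None means loop forever
    sleep          - injectable sleep function, handy for testing

    Returns the number of ticker calls made.
    """
    done = 0
    while max_iterations is None or done < max_iterations:
        cur = conn.cursor()
        # All tick bookkeeping happens inside the database function.
        cur.execute("select pgq.ticker()")
        conn.commit()
        done += 1
        sleep(period)
    return done
```

Because the loop remembers nothing between iterations, it can be stopped and
restarted at any time.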


Further reading
---------------

* http://skytools.projects.postgresql.org/skytools-3.0/[Documentation] for skytools3.