diff options
author | Marko Kreen | 2007-12-05 15:48:41 +0000 |
---|---|---|
committer | Marko Kreen | 2007-12-05 15:48:41 +0000 |
commit | 1aa0c26d091a4e1b4f31202e0d22906d3f5ee71a (patch) | |
tree | ccba39d978aacd28a1ae77eba9cbd242e8be2552 | |
parent | 32210ea033cc45113c08ee7c4629833d709ce47f (diff) |
set notes
-rw-r--r-- | doc/Makefile | 2 | ||||
-rw-r--r-- | doc/set.notes.txt | 217 |
2 files changed, 218 insertions, 1 deletions
diff --git a/doc/Makefile b/doc/Makefile index fad5b438..cc5dc36f 100644 --- a/doc/Makefile +++ b/doc/Makefile @@ -11,7 +11,7 @@ EPYARGS = --no-private --url="http://pgfoundry.org/projects/skytools/" \ HTMLS = londiste.cmdline.html londiste.config.html README.html INSTALL.html \ londiste.ref.html TODO.html pgq-sql.html pgq-admin.html pgq-nodupes.html \ - walmgr.html + walmgr.html set.notes.html all: man diff --git a/doc/set.notes.txt b/doc/set.notes.txt new file mode 100644 index 00000000..cd4757d2 --- /dev/null +++ b/doc/set.notes.txt @@ -0,0 +1,217 @@ + += Cascaded replication = + +== Goals == + +* Main goals: + - Nodes in set are in sync - that means they can change their provider to + any other node in set. + - Combining data from partitioned plproxy db's. + +* Extra goals: + - Queue-only nodes. Cannot be initial providers to other nodes, because initial COPY cannot be done. + - Data-only nodes, without queue - leafs. Cannot be providers to other nodes. + +* Extra-extra goals: + - The node that combines events from partitions can be queue-only. + +== Termins == + +set:: + Group of nodes that synchronise set of tables using same data (same events, batch boundaries and tick_ids). + +node:: + A database that participates in a set. + - contains tables which are updated by replaying events from queue. + - contains queue which has events from. + +provider:: + Node that provides data to another. + +subscriber:: + Node that receives data from another. + +== Node types == + +root:: + current master for set data. + +branch:: + * carries full contents of the queue. + * may subscribe to all/some/none of the tables. + * can be provider for initial copy only if subscribes to table + * attached tags: mirror/subset/queue ??? + +leaf:: + Data-only node. Events are replayed, but no queue, thus cannot be provider to other nodes. + +merge-leaf:: + Exists in per-partition set. + - Does not have it's own queue. + - Initial COPY is done with --skip-truncate, + - Event data is sent to combined queue. + - tick_id for each batch is sent to combined queue. + + - replica from part to combined-failover must lag behind combined set coming + from combined-root + +combined-root:: + - Master for combined set. Received data from several per-partition set's. + - also is merge-leaf in every part-set. + - queue is filled directly from partition sets. + +combined-failover:: + - participates in combined-set, receives events. + - also is queue-only node in every part-set. + - but no processing is done, just tracking + +== Node behaviour == + + Consumer behaviour | ROOT | BRANCH | LEAF | C-ROOT | C-BRANCH | M-LEAF => C-ROOT | M-LEAF => C-BRANCH + -------------------------------+------+--------+------+--------+----------+------------------+-------------------- + read from queue | n | yes | yes | n | yes | yes | yes + event copy to queue | n | yes | n | n | yes | yes, to c-set | n + event processing | n | yes | yes | n | yes | n | n + send tick_id to queue | n | n | n | n | n | yes | n + send global_watermark to queue | yes | n | n | yes | n | n | n + send local watermark upwards | n | yes | yes | n | yes | yes | yes + wait behind combined set | n | n | n | n | n | n | yes + +== Design Notes == + +* Any data duplication should be avoided. +* Only duplicated table is for node_name -> node_connstr mappings. + For others there is always only one node responsible. +* Subscriber gets its own provider url from database, so switching to + another provider does not need config changes. +* Ticks+data can only be deleted if all nodes have already applied it. + - Special consumer registration on all queues - <setname>_watermark. + This avoid PgQ deleting old events. + - Nodes propagate upwards their lowest tick + - Root sends new watermark events to the queue + - When branch/leaf gets new watermark event, it moves the <setname>_watermark registration. + +== Illustrations == + +=== Simple case === + + +-------------+ +---------------+ + | S1 - root |----->| S2 - branch | + +-------------+ +---------------| + | + | + V + +-------------+ +-------------+ + | S3 - branch |----->| S4 - leaf | + +-------------+ +-------------+ + +(S1 - set 'S', node '1') + +On the loss of S1, it should be possible to direct S3 to receive data from S2. +On the loss of S3, it should be possible to direct S4 to receive data from S1/S2. + +=== Complex case - combining partitions === + + +----+ +----+ +----+ +----+ + | A1 | | B1 | | C1 | | D1 | + +----+ +----+ +----+ +----+ + | | | | + | | | | +-----------------------+ + | | | +-->| S1 - combined-root | + | | +----------->| | +--------------+ + | +-------------------->| A2/B2/C2/D2 - |--->| S3 - branch | + +----------------------------->| merge-leaf | +--------------+ + | | | | +-----------------------+ + | | | | | + | | | | V + | | | | +------------------------+ + | | | +-->| S2 - combined-failover | + | | +----------->| | + | +-------------------->| A3/B3/C3/D3 - | + +----------------------------->| merge-leaf | + +------------------------+ + +On the loss of S1, it should be possible to redirect S3 to S2 +and ABCD -> S2 -> S3 must stay in sync. + +== UI spec (incomplete) == + +Here should be end-user interface specced. + + setmgr.py <set.ini> init-master ( <node> --connstr <connstr> | --infoset <infoset> ) + setmgr.py <set.ini> init-node <node> --type <type> --connstr <connstr> --provider <node> + setmgr.py <set.ini> change-provider <node> --provider <node> + setmgr.py <set.ini> set-dead <node> + setmgr.py <set.ini> set-live <node> + setmgr.py <set.ini> pause <node> + setmgr.py <set.ini> resume <node> + setmgr.py <set.ini> master-failover <newmaster> + setmgr.py <set.ini> master-switchover <newmaster> + setmgr.py <set.ini> node-failover <newnode> + setmgr.py <set.ini> node-switchover <newnode> + setmgr.py <set.ini> candidates <node> + + + setmgr.py <set.ini> londiste <node> add-table <tbl> .... + setmgr.py <set.ini> remove-table <node> <tbl> .... + setmgr.py <set.ini> add-seq <node> <seq> .... + setmgr.py <set.ini> remove-seq <node> <seq> .... + setmgr.py <set.ini> tables <node> + setmgr.py <set.ini> seqs <node> + setmgr.py <set.ini> missing <node> --seqs --tables + setmgr.py <set.ini> check-tables + setmgr.py <set.ini> show-fkeys + setmgr.py <set.ini> show-triggers + setmgr.py <set.ini> show-tables + setmgr.py <set.ini> show-seqs + setmgr.py <set.ini> <node> resync-table + + londiste.py <node.ini> add-table <tbl> .... + londiste.py <node.ini> remove-table <tbl> .... + londiste.py <node.ini> add-seq <seq> .... + londiste.py <node.ini> remove-seq <seq> .... + londiste.py <node.ini> tables + londiste.py <node.ini> seqs + londiste.py <node.ini> missing --seqs --tables + londiste.py <node.ini> check + londiste.py <node.ini> fkeys + londiste.py <node.ini> triggers + londiste.py <node.ini> tables + londiste.py <node.ini> seqs + londiste.py <node.ini> resync TBL + londiste.py <node.ini> compare TBL + londiste.py <node.ini> repair TBL + + ** actions must be resumable ** + + setmgr.py <set.ini> init-master ( <node> --connstr <connstr> | --infoset <infoset> ) + - install pgq, pgq_set + - check if db is already in set + - add member to set + - create_node() + setmgr.py <set.ini> init-node <node> --type <type> --connstr <connstr> --provider <node> + - install pgq, pgq_set + - check if db is alreeady in set + - add to node master member list + - fetch full list of member from master + - add full list to node + - subscribe to provider + - create_node + setmgr.py <set.ini> change-provider <node> --provider <node> + - pause node, wait until reacts + - fetch pos from old provider + - subscribe to new provider + - change info in subscriber + - resume script + - unregister in old provider + setmgr.py <set.ini> set-dead <node> + setmgr.py <set.ini> set-live <node> + setmgr.py <set.ini> pause <node> + setmgr.py <set.ini> resume <node> + setmgr.py <set.ini> master-failover <newmaster> + setmgr.py <set.ini> master-switchover <newmaster> + setmgr.py <set.ini> node-failover <newnode> + setmgr.py <set.ini> node-switchover <newnode> + setmgr.py <set.ini> candidates <node> + + |