author    Marko Kreen  2007-12-05 15:48:41 +0000
committer Marko Kreen  2007-12-05 15:48:41 +0000
commit    1aa0c26d091a4e1b4f31202e0d22906d3f5ee71a (patch)
tree      ccba39d978aacd28a1ae77eba9cbd242e8be2552
parent    32210ea033cc45113c08ee7c4629833d709ce47f (diff)
set notes
-rw-r--r--  doc/Makefile         2
-rw-r--r--  doc/set.notes.txt  217
2 files changed, 218 insertions, 1 deletion
diff --git a/doc/Makefile b/doc/Makefile
index fad5b438..cc5dc36f 100644
--- a/doc/Makefile
+++ b/doc/Makefile
@@ -11,7 +11,7 @@ EPYARGS = --no-private --url="http://pgfoundry.org/projects/skytools/" \
HTMLS = londiste.cmdline.html londiste.config.html README.html INSTALL.html \
londiste.ref.html TODO.html pgq-sql.html pgq-admin.html pgq-nodupes.html \
- walmgr.html
+ walmgr.html set.notes.html
all: man
diff --git a/doc/set.notes.txt b/doc/set.notes.txt
new file mode 100644
index 00000000..cd4757d2
--- /dev/null
+++ b/doc/set.notes.txt
@@ -0,0 +1,217 @@
+
+= Cascaded replication =
+
+== Goals ==
+
+* Main goals:
+ - Nodes in a set are in sync - that means they can change their provider to
+   any other node in the set.
+ - Combining data from partitioned plproxy databases.
+
+* Extra goals:
+ - Queue-only nodes.  Cannot be initial providers to other nodes, because the initial COPY cannot be done from them.
+ - Data-only nodes without a queue - leaves.  Cannot be providers to other nodes.
+
+* Extra-extra goals:
+ - The node that combines events from partitions can be queue-only.
+
+== Terminology ==
+
+set::
+ A group of nodes that synchronise a set of tables using the same data (same events, batch boundaries and tick_ids).
+
+node::
+ A database that participates in a set.
+ - contains tables which are updated by replaying events from the queue.
+ - contains the queue those events come from.
+
+provider::
+ Node that provides data to another.
+
+subscriber::
+ Node that receives data from another.
+
+== Node types ==
+
+root::
+ current master for set data.
+
+branch::
+ * carries full contents of the queue.
+ * may subscribe to all/some/none of the tables.
+ * can be a provider for the initial COPY only if it subscribes to the table
+ * attached tags: mirror/subset/queue ???
+
+leaf::
+ Data-only node.  Events are replayed, but there is no queue, so it cannot be a provider to other nodes.
+
+merge-leaf::
+ Exists in a per-partition set.
+ - Does not have its own queue.
+ - Initial COPY is done with --skip-truncate.
+ - Event data is sent to the combined queue (sketched at the end of this section).
+ - The tick_id of each batch is sent to the combined queue.
+
+ - The replica from a partition to the combined-failover must lag behind the
+   combined set coming from the combined-root.
+
+combined-root::
+ - Master for the combined set.  Receives data from several per-partition sets.
+ - Is also a merge-leaf in every part-set.
+ - queue is filled directly from partition sets.
+
+combined-failover::
+ - participates in combined-set, receives events.
+ - Is also a queue-only node in every part-set,
+ - but no processing is done there, just position tracking.
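+
+The merge-leaf behaviour described above is mostly copying events between
+queues.  Below is a minimal sketch of one processing round, assuming plain
+psycopg2 and the generic PgQ SQL functions (pgq.next_batch,
+pgq.get_batch_events, pgq.insert_event, pgq.finish_batch); the connection
+strings, queue/consumer names and the "part_tick" event type are made up for
+illustration only:
+
+  import psycopg2
+
+  PART_DB = "dbname=part_a"           # hypothetical per-partition db
+  COMBINED_DB = "dbname=combined"     # hypothetical combined-root db
+  PART_QUEUE = "part_a_q"
+  COMBINED_QUEUE = "combined_q"
+  CONSUMER = "merge_leaf_a"
+
+  def process_one_batch():
+      src = psycopg2.connect(PART_DB)
+      dst = psycopg2.connect(COMBINED_DB)
+      scur, dcur = src.cursor(), dst.cursor()
+
+      scur.execute("select pgq.next_batch(%s, %s)", [PART_QUEUE, CONSUMER])
+      batch_id = scur.fetchone()[0]
+      if batch_id is None:
+          return False                  # no new batch yet
+
+      # copy the partition's events into the combined queue
+      scur.execute("select ev_type, ev_data from pgq.get_batch_events(%s)",
+                   [batch_id])
+      for ev_type, ev_data in scur.fetchall():
+          dcur.execute("select pgq.insert_event(%s, %s, %s)",
+                       [COMBINED_QUEUE, ev_type, ev_data])
+
+      # tell the combined set which partition batch was merged; real code
+      # would send the batch's tick_id here, batch_id keeps the sketch short
+      dcur.execute("select pgq.insert_event(%s, %s, %s)",
+                   [COMBINED_QUEUE, "part_tick", str(batch_id)])
+
+      dst.commit()
+      scur.execute("select pgq.finish_batch(%s)", [batch_id])
+      src.commit()
+      return True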
+
+== Node behaviour ==
+
+ Consumer behaviour | ROOT | BRANCH | LEAF | C-ROOT | C-BRANCH | M-LEAF => C-ROOT | M-LEAF => C-BRANCH
+ -------------------------------+------+--------+------+--------+----------+------------------+--------------------
+ read from queue | n | yes | yes | n | yes | yes | yes
+ event copy to queue | n | yes | n | n | yes | yes, to c-set | n
+ event processing | n | yes | yes | n | yes | n | n
+ send tick_id to queue | n | n | n | n | n | yes | n
+ send global_watermark to queue | yes | n | n | yes | n | n | n
+ send local watermark upwards | n | yes | yes | n | yes | yes | yes
+ wait behind combined set | n | n | n | n | n | n | yes
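+
+The same matrix can also be written down as a data structure that a consumer
+script could check its role against.  A purely illustrative sketch - the names
+below are not part of any existing module:
+
+  # which actions each node type performs (mirrors the table above)
+  BEHAVIOUR = {
+      'root':                   {'send_global_watermark'},
+      'branch':                 {'read_queue', 'copy_events', 'process_events',
+                                 'send_local_watermark'},
+      'leaf':                   {'read_queue', 'process_events',
+                                 'send_local_watermark'},
+      'combined-root':          {'send_global_watermark'},
+      'combined-branch':        {'read_queue', 'copy_events', 'process_events',
+                                 'send_local_watermark'},
+      'merge-leaf-to-c-root':   {'read_queue', 'copy_events', 'send_tick_id',
+                                 'send_local_watermark'},
+      'merge-leaf-to-c-branch': {'read_queue', 'send_local_watermark',
+                                 'wait_behind_combined'},
+  }
+
+  def node_does(node_type, action):
+      """True if the given node type performs the given action."""
+      return action in BEHAVIOUR[node_type]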
+
+== Design Notes ==
+
+* Any data duplication should be avoided.
+* The only duplicated table is the node_name -> node_connstr mapping.
+ For everything else there is always exactly one node responsible.
+* A subscriber gets its provider url from the database, so switching to
+ another provider does not need config changes.
+* Ticks and data can only be deleted after all nodes have applied them
+ (sketched below):
+ - Special consumer registration on all queues - <setname>_watermark.
+   This keeps PgQ from deleting events that are still needed somewhere.
+ - Nodes propagate their lowest applied tick upwards.
+ - Root sends new watermark events to the queue.
+ - When a branch/leaf gets a new watermark event, it moves its <setname>_watermark registration forward.
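+
+The watermark flow can be sketched in a few lines of Python.  This is only an
+in-memory illustration of the rule above - class and function names are made
+up and no actual PgQ calls are shown:
+
+  class Node:
+      def __init__(self, name, provider=None):
+          self.name = name
+          self.provider = provider          # upstream node, None for root
+          self.subscribers = []
+          self.completed_tick = 0           # last tick applied on this node
+          self.watermark_reg = 0            # <setname>_watermark registration
+          if provider:
+              provider.subscribers.append(self)
+
+      def local_watermark(self):
+          # lowest applied tick in this node's whole subtree,
+          # i.e. what the node reports upwards
+          return min([self.completed_tick] +
+                     [s.local_watermark() for s in self.subscribers])
+
+  def walk(node):
+      yield node
+      for s in node.subscribers:
+          yield from walk(s)
+
+  def publish_global_watermark(root):
+      # root computes the global watermark and sends it down the queue;
+      # every node that sees it moves its <setname>_watermark registration,
+      # which is what finally lets PgQ drop older events
+      wm = root.local_watermark()
+      for node in walk(root):
+          node.watermark_reg = wm
+      return wm
+
+  # example: root -> branch -> leaf, with the leaf lagging behind
+  s1 = Node('S1')
+  s2 = Node('S2', provider=s1)
+  s4 = Node('S4', provider=s2)
+  s1.completed_tick, s2.completed_tick, s4.completed_tick = 10, 10, 7
+  assert publish_global_watermark(s1) == 7   # only events up to tick 7 may go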
+
+== Illustrations ==
+
+=== Simple case ===
+
+ +-------------+ +---------------+
+ | S1 - root |----->| S2 - branch |
+ +-------------+ +---------------|
+ |
+ |
+ V
+ +-------------+ +-------------+
+ | S3 - branch |----->| S4 - leaf |
+ +-------------+ +-------------+
+
+(S1 - set 'S', node '1')
+
+On the loss of S1, it should be possible to direct S3 to receive data from S2.
+On the loss of S3, it should be possible to direct S4 to receive data from S1/S2.
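+
+For example, the first recovery maps directly onto the change-provider verb
+specced below (the ini file and node names here are only placeholders):
+
+  setmgr.py S.ini change-provider S3 --provider S2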
+
+=== Complex case - combining partitions ===
+
+ +----+ +----+ +----+ +----+
+ | A1 | | B1 | | C1 | | D1 |
+ +----+ +----+ +----+ +----+
+ | | | |
+ | | | | +-----------------------+
+ | | | +-->| S1 - combined-root |
+ | | +----------->| | +--------------+
+ | +-------------------->| A2/B2/C2/D2 - |--->| S3 - branch |
+ +----------------------------->| merge-leaf | +--------------+
+ | | | | +-----------------------+
+ | | | | |
+ | | | | V
+ | | | | +------------------------+
+ | | | +-->| S2 - combined-failover |
+ | | +----------->| |
+ | +-------------------->| A3/B3/C3/D3 - |
+ +----------------------------->| merge-leaf |
+ +------------------------+
+
+On the loss of S1, it should be possible to redirect S3 to S2
+and ABCD -> S2 -> S3 must stay in sync.
+
+== UI spec (incomplete) ==
+
+The end-user interface should be specified here.
+
+ setmgr.py <set.ini> init-master ( <node> --connstr <connstr> | --infoset <infoset> )
+ setmgr.py <set.ini> init-node <node> --type <type> --connstr <connstr> --provider <node>
+ setmgr.py <set.ini> change-provider <node> --provider <node>
+ setmgr.py <set.ini> set-dead <node>
+ setmgr.py <set.ini> set-live <node>
+ setmgr.py <set.ini> pause <node>
+ setmgr.py <set.ini> resume <node>
+ setmgr.py <set.ini> master-failover <newmaster>
+ setmgr.py <set.ini> master-switchover <newmaster>
+ setmgr.py <set.ini> node-failover <newnode>
+ setmgr.py <set.ini> node-switchover <newnode>
+ setmgr.py <set.ini> candidates <node>
+
+
+ setmgr.py <set.ini> londiste <node> add-table <tbl> ....
+ setmgr.py <set.ini> remove-table <node> <tbl> ....
+ setmgr.py <set.ini> add-seq <node> <seq> ....
+ setmgr.py <set.ini> remove-seq <node> <seq> ....
+ setmgr.py <set.ini> tables <node>
+ setmgr.py <set.ini> seqs <node>
+ setmgr.py <set.ini> missing <node> --seqs --tables
+ setmgr.py <set.ini> check-tables
+ setmgr.py <set.ini> show-fkeys
+ setmgr.py <set.ini> show-triggers
+ setmgr.py <set.ini> show-tables
+ setmgr.py <set.ini> show-seqs
+ setmgr.py <set.ini> <node> resync-table
+
+ londiste.py <node.ini> add-table <tbl> ....
+ londiste.py <node.ini> remove-table <tbl> ....
+ londiste.py <node.ini> add-seq <seq> ....
+ londiste.py <node.ini> remove-seq <seq> ....
+ londiste.py <node.ini> tables
+ londiste.py <node.ini> seqs
+ londiste.py <node.ini> missing --seqs --tables
+ londiste.py <node.ini> check
+ londiste.py <node.ini> fkeys
+ londiste.py <node.ini> triggers
+ londiste.py <node.ini> tables
+ londiste.py <node.ini> seqs
+ londiste.py <node.ini> resync TBL
+ londiste.py <node.ini> compare TBL
+ londiste.py <node.ini> repair TBL
+
+ ** actions must be resumable ** (see the sketch at the end of this file)
+
+ setmgr.py <set.ini> init-master ( <node> --connstr <connstr> | --infoset <infoset> )
+ - install pgq, pgq_set
+ - check if db is already in set
+ - add member to set
+ - create_node()
+ setmgr.py <set.ini> init-node <node> --type <type> --connstr <connstr> --provider <node>
+ - install pgq, pgq_set
+ - check if db is already in set
+ - add to node master member list
+ - fetch full list of members from master
+ - add full list to node
+ - subscribe to provider
+ - create_node
+ setmgr.py <set.ini> change-provider <node> --provider <node>
+ - pause node, wait until it reacts
+ - fetch pos from old provider
+ - subscribe to new provider
+ - change info in subscriber
+ - resume script
+ - unregister in old provider
+ setmgr.py <set.ini> set-dead <node>
+ setmgr.py <set.ini> set-live <node>
+ setmgr.py <set.ini> pause <node>
+ setmgr.py <set.ini> resume <node>
+ setmgr.py <set.ini> master-failover <newmaster>
+ setmgr.py <set.ini> master-switchover <newmaster>
+ setmgr.py <set.ini> node-failover <newnode>
+ setmgr.py <set.ini> node-switchover <newnode>
+ setmgr.py <set.ini> candidates <node>
+
+
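+The "actions must be resumable" requirement above can be handled by recording
+which step of an action has already finished, so an interrupted run continues
+where it stopped instead of redoing work.  A minimal illustrative sketch with
+no relation to existing skytools code - real state would live in the database,
+not in a local file:
+
+  import json, os
+
+  def run_resumable(action, steps, state_file="setmgr.state"):
+      """Run (name, func) steps in order, skipping ones already done."""
+      done = []
+      if os.path.exists(state_file):
+          with open(state_file) as f:
+              done = json.load(f).get(action, [])
+      for name, func in steps:
+          if name in done:
+              continue                      # finished in an earlier run
+          func()
+          done.append(name)
+          with open(state_file, "w") as f:
+              json.dump({action: done}, f)
+      if os.path.exists(state_file):
+          os.remove(state_file)             # whole action done, forget state
+
+  # example: the init-node steps listed above
+  run_resumable("init-node", [
+      ("install_pgq",        lambda: None),  # install pgq, pgq_set
+      ("check_membership",   lambda: None),  # check if db is already in set
+      ("add_to_master",      lambda: None),  # add to node master member list
+      ("fetch_members",      lambda: None),  # fetch full list of members
+      ("store_members",      lambda: None),  # add full list to node
+      ("subscribe_provider", lambda: None),  # subscribe to provider
+      ("create_node",        lambda: None),  # create_node()
+  ])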