Various doc updates.

author: Marko Kreen 2009-02-13 11:46:37 +0000
committer: Marko Kreen 2009-02-13 13:20:53 +0000
commit: eb973a88e28f33f6f0277386fb176677070349c5 (patch)
tree: 13f469ffaf6c0e1821959ece0fa0bb876e1f761f
parent: 096cb8d354890a8f67a0edb74f934ec6f2ffa826 (diff)
5 files changed, 228 insertions, 101 deletions
diff --git a/doc/Makefile b/doc/Makefile
index a16b89e2..2d03da9f 100644
--- a/doc/Makefile
+++ b/doc/Makefile
@@ -3,7 +3,7 @@ include ../config.mak
 
 wiki = https://developer.skype.com/SkypeGarage/DbProjects/SkyTools
 
-web = [email protected]:/home/pgfoundry.org/groups/skytools/htdocs/
+web = [email protected]:/home/pgfoundry.org/groups/skytools/htdocs/skytools-3.0
 
 EPYDOC = epydoc
 EPYARGS = --no-private --url="http://pgfoundry.org/projects/skytools/" \
@@ -11,7 +11,7 @@ EPYARGS = --no-private --url="http://pgfoundry.org/projects/skytools/" \
 
 HTMLS = londiste.cmdline.html londiste.config.html README.html INSTALL.html \
 	londiste.ref.html TODO.html pgq-sql.html pgq-admin.html pgq-nodupes.html \
-	$(SCRIPT_HTMLS) skytools3.html
+	$(SCRIPT_HTMLS) faq.html set.notes.html
 
 SCRIPT_TXTS = walmgr.txt cube_dispatcher.txt table_dispatcher.txt \
 	      queue_mover.txt queue_splitter.txt bulk_loader.txt \
@@ -67,6 +67,10 @@ apiupload: apidoc
 	rsync -rtlz api/* $(web)/api
 	cd ../sql/pgq && rm -rf docs/html && $(MAKE) dox
 	rsync -rtlz ../sql/pgq/docs/html/* $(web)/pgq/
+	cd ../sql/pgq_node && rm -rf docs/html && $(MAKE) dox
+	rsync -rtlz ../sql/pgq_node/docs/html/* $(web)/pgq_node/
+	cd ../sql/londiste && rm -rf docs/html && $(MAKE) dox
+	rsync -rtlz ../sql/londiste/docs/html/* $(web)/londiste/
 
 clean:
 	rm -rf api *.html
diff --git a/doc/TODO.txt b/doc/TODO.txt
index 0ddd2875..46808463 100644
--- a/doc/TODO.txt
+++ b/doc/TODO.txt
@@ -3,48 +3,87 @@
 
 == Next major release - 3.0 ==
 
+=== Standalone changes ===
+
+ * Make Londiste support table wildcards.
+ * plpgsql trigger for TRUNCATE / support for installing such trigger
+ * Convert dispatcher scripts to new cascading framework
+   - queue_mover - CascadedWorker (cascaded_worker.py?  node_worker.py)
+   - queue_splitter - CascadedConsumer
+   - bulk_loader - CascadedConsumer
+   - cube_dispatcher - CascadedConsumer
+   - table_dispatcher - CascadedConsumer
+ * bulk_loader / cube_dispatcher / table_dispatcher could be merged
+   to one script with 3 backends.
+ * 'Q' event type for londiste, (queue_splitter event), for event
+   that needs to be inserted into queue.  Also trigger flag to
+   create such event.
+   - better to be done as "redirect" - allow events for a table
+     to be redirected to another table or queue.
+ * londiste check
+ * londiste fkeys
+
+=== Core changes ===
+
+ASAP:
+
+ - move leaf event copy away from consumer/worker?
+
+ - logging from db is mess, needs full cleanup
+ - docstring review
+ - dead node handling
+ - failover - may affect db code
+
+ - dbscript: document self.args
+ - dbscript: easier help string setting
+
+Soon:
+
+ - pgq_node.is_root_event() rettype
+ - exec_cmd better name
+ - cleanup of hierarchical remove_event
+
+ - make everything use next_batch_info()
+ - londiste sql: fq names?  glob names?
+ - CascadeAdmin: job_name vs. consumer_name as worker_name
+ - setadm rename-node
+
+No hurry:
+
+ - psycopgwrapper full doc
+ - londiste syncer: get provider db?
+ - sleeps while waiting notices from db
+ - dispatcher scripts: no need to check tables repeatedly
+ - replace "raise Exception" for user errors with something
+   nicer, (UsageError) which also should avoid traceback
+
+ - copy vs. reorg
+ - execute vs. copy - needs wait?
+ - --wait/--nowait switch for execute
+
 === done ===
 
- * drop support of psycopg1
- * parallel copy
+ - Move remaining sql functions away from ret types.
+ - get rid of 'set' in londiste code?
+ - londiste: add-seq refresh from provider
+
+== old todo (not up-to-date) ==
 
-=== todo ===
+ * pgqadm ticker delays, separate retry delay
+ * pgq.drop_queue -> flag to drop consumers
 
  * cascaded replication, switchover, failover [marko]
-   - add --create
-   - root worker:
-     - insert seq pos in queue
-     - seq add/remove events
    - advanced admin commands
-     - switchover
      - failover
-     - pause
-     - resume
      - node-status
      - set-status
      - rename-node
-   - standard msg/error handling for all sql functions
-   - compare/repair
-   - check if table is in other sets? [NAK]
-   - setconsumer/pgq - insert tick with original date (+evid?)
-   - dont trust node name in member_info, set_info is authoritative
-
- * drop support for 8.1 ??
-
- * new londiste sql:
-   - support new trigger flags in 8.3?
-
-== Next stable release - 2.1.7 ==
-
- * sql ident quoting
- * exit single-loop processes with error on exception
+     - on root switch/failover check if all tables are present
 
 == High-prority ==
 
 === Smaller things ===
 
- * pgq: RemoteConsumer/SerialConsumer/pgq_ext sanity, too much duplication [marko]
- * londiste: create tables on subscriber
  * pgqadm: Utility commands:
   - reg-copy que cons1 cons2
   - reg-move que cons1 cons2
@@ -62,10 +101,7 @@
 
 === Larger things ===
 
- * londiste: denytriggers on subscriber
- * londiste: Quote SQL identifiers, keep combined name, rule will be "Split schema as first dot"
  * skylog: publish sample logdb schema, with some tools
- * londiste: allow table redirection on subscriber side
  * londiste/pgqset: support creating slave from master by pg_dump / PITR.
 
 === Smaller things ===
@@ -77,36 +113,16 @@
    rights on subscriber anyway).
  * pgqadm: separate priod for retry queue processing
  * skytools: switch for silence for cron scripts
- * pgq: drop_fkeys.sql for live envs
  * DBScript: failure to write pidfile should be logged (cronscripts)
  * ideas from SlonyI:
-  - force timestamps to ISO - dubious, creates pq pkt
   - when buffering queries, check their size
-  - store event_id with tick, thus making possible to guess the size of
-    upcoming batches.
  * pgqadm: show count of events to be processed [--count switch].
-   it should be relatively cheap with optimized query if consumers are not lagging.
-   - exact method - `SELECT count(*) FROM pgq.event_X WHERE NOT txid_in_snapshot(ev_txid, last_tick_snapshot) AND ev_txid > xmin;`
-     Can be optimized: `(ev_txid > xmax OR ev_txid IN get_snapshot_active())`.
-     Several consumer can be done in one go if spread out to separate columns.
-   - inexact method: take ev_id near xmax and compare with value from sequence
-     * broken by retry events, rollbacked transactions and use of force_tick()
+  - broken by retry events, rollbacked transactions and use of force_tick()
  * automatic "repair" - after reaching sync point, the "replay" must be killed/paused, then fixes can be applied
- * pgq: store txid/event_id_seq with each tick
 
 == Just ideas ==
 
  * skytools: config from database
  * skytools: config-less operation?
- * skytools: partial sql parser for log processing
- * londiste: EXECUTE SCRIPT
  * londiste: somehow automatic sync of table structure/functions/...?
 
-== walmgr ==
-
-(walmgr needs thorough review of pending issues)
-
- * copy master config to slave
- * slave needs to decide which config to use
-
-
diff --git a/doc/faq.txt b/doc/faq.txt
new file mode 100644
index 00000000..ad9e1232
--- /dev/null
+++ b/doc/faq.txt
@@ -0,0 +1,129 @@
+
+= Skytools FAQ =
+
+== Skytools ==
+
+=== What is Skytools? ===
+
+It is bunch of database management tools we use
+and various frameworks / modules they depend on.
+
+Main components are:
+
+==== Python scripts ====
+
+Main tools:
+
+   walmgr:: walshipping manager
+   londiste:: replication on top of pgq
+   pgqadm:: pgq administration and maintenance
+   setadm:: cascaded pgq administration
+ 
+Special scripts:
+ 
+   bulk_loader
+   cube_dispatcher
+   table_dispatcher
+ 
+Queue copy
+
+   queue_mover:: copy queue contents from one db to another
+   queue_splitter:: copy queue contents to another db splitting it into several queues
+ 
+Operate bunch of scripts together
+
+   scriptmgr
+
+==== Python modules ====
+
+   pgq
+   skytools
+   londiste
+ 
+==== SQL modules ====
+   
+   londiste
+   pgq
+   pgq_node
+   pgq_ext
+
+=== Where is the code located? ===
+
+Code layout:
+
+ debian/
+ doc/
+ python/bin/
+ python/londiste/   - Londiste python modules
+ python/modules/    - C extension modules for Python (string handling)
+ python/pgq/        - pgq and cascaded pgq python modules
+ python/skytools/   - python modules for generic database scripting
+ scripts/           - Special purpose python scripts (python)
+ sql/londiste/      - database code for londiste (plpgsql)
+ sql/pgq/           - PgQ database code (C + plpgsql)
+ sql/pgq_ext/       - PgQ event/batch tracking on remote database (plpgsql)
+ sql/pgq_node/      - cascaded pgq support (plpgsql)
+ sql/txid/          - Obsolete txid code for Postgres 8.2 and below
+
+== PgQ - The generic queue ==
+
+=== Why do queue in database?  Transactional overhead? ===
+
+1. PgQ is quite likely the fastest ACID compliant queue,
+   thanks to Postgres being pretty fast despite the
+   "transactional overhead".  Why use anything less robust?
+
+2. We have lot of business logic in database.  Events created
+   by business transactions need to live or die with main transaction.
+
+3. Queue used for replication purposes needs to be transactional.
+
+I think the reason people act surprised when they hear about queue
+in database is not that they don't care about reliability
+of their event transport, but that the reliable data storage
+mechanism - SQL databases - did not have any way to write
+performant queue.  Now thanks to the txid/snapshot technique
+we have a way to write fast _and_ reliable queue,
+so why care about anything less?
+
+=== Could you break dependency on Python? ===
+
+There is no dependency on Python.  PgQ itself is written in C / plpgsql
+and it appears as a bunch of SQL functions under the `pgq` schema.
+Thus it can be used from any language that can execute SQL queries.
+
+There is Python helper framework that makes writing Python consumers easier.
+Such framework could be written for any language.
+
+=== Aren't the internals similar to Slony-I? ===
+
+Yes, PgQ was created by generalizing queueing parts from Slony-I.
+
+=== Dump-restore ===
+
+== Londiste - The replication tool ==
+
+=== What type of replication does it do? ===
+
+Londiste does trigger-based asynchronous single-master replication,
+same as Slony-I.
+
+In Skytools 3.x it will support merging partitions together,
+which could be called shared-nothing multimaster replication.
+
+=== What is the difference between Slony-I and Londiste? ===
+
+Nothing fundamental.  Both do asynchronous replication.
+
+Main difference is that Londiste consists of several
+relatively independent parts, unlike Slony-I where
+code is more tightly tied together.
+
+At the moment Londiste loses to Slony-I featurewise,
+but should be easier to use.  Hopefully we can keep
+the simple UI when we catch up in features.
+
+=== What are the limitations of Londiste? ===
+
+It does not support '.' and ',' in table, schema and column names.
+
diff --git a/doc/set.notes.txt b/doc/set.notes.txt
index cd4757d2..57d2960c 100644
--- a/doc/set.notes.txt
+++ b/doc/set.notes.txt
@@ -40,12 +40,14 @@ branch::
   * carries full contents of the queue.
   * may subscribe to all/some/none of the tables.
   * can be provider for initial copy only if subscribes to table
-  * attached tags: mirror/subset/queue ???
 
 leaf::
   Data-only node.  Events are replayed, but no queue, thus cannot be provider to other nodes.
+  Nodes where sets from partitions are merged together are also tagged 'leaf', because
+  in per-partition set it cannot be provider to other nodes.
 
 merge-leaf::
+  [Does not exist as separate type, detected as 'leaf' that has 'combined_queue' set.]
   Exists in per-partition set.
   - Does not have it's own queue.
   - Initial COPY is done with --skip-truncate,
@@ -56,11 +58,13 @@ merge-leaf::
     from combined-root
   
 combined-root::
+  [Does not exist as separate type, detected as 'root' that has 'leaf's with 'combined_queue' set.]
   - Master for combined set.  Received data from several per-partition set's.
   - also is merge-leaf in every part-set.
   - queue is filled directly from partition sets.
 
 combined-failover::
+  [Does not exist as separate type, detected as 'branch' that has 'leaf's with 'combined_queue' set.]
   - participates in combined-set, receives events.
   - also is queue-only node in every part-set.
   - but no processing is done, just tracking
@@ -85,11 +89,11 @@ combined-failover::
 * Subscriber gets its own provider url from database, so switching to
   another provider does not need config changes.
 * Ticks+data can only be deleted if all nodes have already applied it.
-  - Special consumer registration on all queues - <setname>_watermark.
-    This avoid PgQ deleting old events.
-  - Nodes propagate upwards their lowest tick
-  - Root sends new watermark events to the queue
-  - When branch/leaf gets new watermark event, it moves the <setname>_watermark registration.
+  - Special consumer registration on all queues - ".global_watermark".
+    This prevents PgQ from deleting old events.
+  - Nodes propagate upwards their lowest tick: (local_watermark)
+  - Root sends its local watermark as "pgq.global-watermark" event to the queue.
+  - When branch/leaf gets new watermark event, it moves the ".global_watermark" registration.
 
 == Illustrations ==
 
@@ -134,37 +138,18 @@ On the loss of S3, it should be possible to direct S4 to receive data from S1/S2
 On the loss of S1, it should be possible to redirect S3 to S2
 and ABCD -> S2 -> S3 must stay in sync.
 
-== UI spec (incomplete) ==
+== UI spec (incomplete) [obsolete] ==
 
 Here should be end-user interface specced.
 
- setmgr.py <set.ini> init-master ( <node>  --connstr <connstr> | --infoset <infoset> )
- setmgr.py <set.ini> init-node <node> --type <type> --connstr <connstr> --provider <node>
+ setmgr.py <set.ini> create-node TYPE NODE_NAME  NODE_LOCATION --provider=<provider_connstr>
  setmgr.py <set.ini> change-provider <node> --provider <node>
  setmgr.py <set.ini> set-dead <node>
  setmgr.py <set.ini> set-live <node>
- setmgr.py <set.ini> pause <node>
- setmgr.py <set.ini> resume <node>
- setmgr.py <set.ini> master-failover <newmaster>
- setmgr.py <set.ini> master-switchover <newmaster>
- setmgr.py <set.ini> node-failover <newnode>
- setmgr.py <set.ini> node-switchover <newnode>
- setmgr.py <set.ini> candidates <node>
+ setmgr.py <set.ini> pause --node=<node> [--consumer=]
+ setmgr.py <set.ini> resume --node=<node> [--consumer=]
 
 
- setmgr.py <set.ini> londiste <node> add-table <tbl> ....
- setmgr.py <set.ini> remove-table <node> <tbl> ....
- setmgr.py <set.ini> add-seq <node> <seq> ....
- setmgr.py <set.ini> remove-seq <node> <seq> ....
- setmgr.py <set.ini> tables <node>
- setmgr.py <set.ini> seqs <node>
- setmgr.py <set.ini> missing <node> --seqs --tables
- setmgr.py <set.ini> check-tables
- setmgr.py <set.ini> show-fkeys
- setmgr.py <set.ini> show-triggers
- setmgr.py <set.ini> show-tables
- setmgr.py <set.ini> show-seqs
- setmgr.py <set.ini> <node> resync-table
 
  londiste.py <node.ini> add-table <tbl> ....
  londiste.py <node.ini> remove-table <tbl> ....
diff --git a/doc/skytools3.txt b/doc/skytools3.txt
index 13b92c09..56dd36f0 100644
--- a/doc/skytools3.txt
+++ b/doc/skytools3.txt
@@ -19,14 +19,19 @@ Keep old design from Skytools 2
 New features in Skytools 3
 --------------------------
 
-* Cascading is implemented as generic layer on top of PgQ - *pgq_set*.
+* Cascading is implemented as generic layer on top of PgQ - *Cascaded PgQ*.
   - Its goal is to keep identical copy of queue contents in several nodes.
   - Not replication-specific - can be used for any queue.
 
-* Parallel copy - during inital sync several tables can be
-  copied at the same time.
-  - In 2.1 the copy already happened in separate process,
-    making it parallel was just a matter of tuning launching/syncing logic.
+* New Londiste features:
+  - Parallel copy - during initial sync several tables can be
+    copied at the same time.   In 2.1.x the copy already happened in separate
+    process, making it parallel was just a matter of tuning launching/syncing logic.
+
+  - EXECUTE command, to run arbitrary SQL script on all nodes.
+
+  - Automatic table or sequence creation by importing the structure
+    from provider node.  Activated with --create switch for add-table, add-seq.
 
 * Advanced admin operations:
   - switchover
@@ -35,19 +40,7 @@ New features in Skytools 3
   - pause / resume node
   - rename node
 
-Open Questions
---------------
-
-* terminology
-  - 'set' vs 'cluster' vs ???
-  - (root / branch / leaf) vs. (master / slave) vs. (provider / subscriber)
-  - add <table> vs. attach <table>
-
-* own trigger handling
-  - fullblown custom
-  - minimal custom + wait for 8.3
+Example session
+---------------
 
-* compatibility with v2.1.x (python scripts + modules):
-  - bundle old modules and scripts with different name
-  - rename new modules and scripts
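
Note on the watermark rules changed in doc/set.notes.txt: the four bullets there (nodes propagate their lowest tick upwards, root publishes it, branch/leaf move their ".global_watermark" registration) can be sketched in a few lines.  This is only an illustration under stated assumptions, not the pgq_node implementation: the Node class, the field names and both functions are hypothetical, and ticks are modelled as plain integers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical in-memory stand-in for a cascaded PgQ node.
    name: str
    applied_tick: int                 # last tick this node has applied
    children: List["Node"] = field(default_factory=list)
    global_watermark: int = 0         # position of the ".global_watermark" registration


def local_watermark(node: Node) -> int:
    """Lowest tick applied by this node or anything below it."""
    ticks = [node.applied_tick]
    ticks.extend(local_watermark(child) for child in node.children)
    return min(ticks)


def publish_global_watermark(root: Node) -> int:
    """Root turns its local watermark into the global one; every node
    that receives it moves its watermark registration to that tick."""
    watermark = local_watermark(root)
    pending = [root]
    while pending:
        node = pending.pop()
        node.global_watermark = watermark
        pending.extend(node.children)
    return watermark


# Example topology: root -> branch -> leaf_a, and root -> leaf_b.
leaf_a = Node("leaf_a", applied_tick=95)
branch = Node("branch", applied_tick=110, children=[leaf_a])
leaf_b = Node("leaf_b", applied_tick=105)
root = Node("root", applied_tick=120, children=[branch, leaf_b])

watermark = publish_global_watermark(root)
```

In this model, events at or below the returned watermark are the only ones safe to delete, since every node has already applied them; the slowest node (leaf_a here) holds the watermark back for the whole set.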