author Marko Kreen 2009-02-13 11:46:37 +0000
committer Marko Kreen 2009-02-13 13:20:53 +0000
commit eb973a88e28f33f6f0277386fb176677070349c5
tree 13f469ffaf6c0e1821959ece0fa0bb876e1f761f
parent 096cb8d354890a8f67a0edb74f934ec6f2ffa826
Various doc updates.
-rw-r--r-- doc/Makefile 8
-rw-r--r-- doc/TODO.txt 118
-rw-r--r-- doc/faq.txt 129
-rw-r--r-- doc/set.notes.txt 43
-rw-r--r-- doc/skytools3.txt 31
5 files changed, 228 insertions, 101 deletions
diff --git a/doc/Makefile b/doc/Makefile
index a16b89e2..2d03da9f 100644
--- a/doc/Makefile
+++ b/doc/Makefile
@@ -3,7 +3,7 @@ include ../config.mak
wiki = https://developer.skype.com/SkypeGarage/DbProjects/SkyTools
-web = [email protected]:/home/pgfoundry.org/groups/skytools/htdocs/
+web = [email protected]:/home/pgfoundry.org/groups/skytools/htdocs/skytools-3.0
EPYDOC = epydoc
EPYARGS = --no-private --url="http://pgfoundry.org/projects/skytools/" \
@@ -11,7 +11,7 @@ EPYARGS = --no-private --url="http://pgfoundry.org/projects/skytools/" \
HTMLS = londiste.cmdline.html londiste.config.html README.html INSTALL.html \
londiste.ref.html TODO.html pgq-sql.html pgq-admin.html pgq-nodupes.html \
- $(SCRIPT_HTMLS) skytools3.html
+ $(SCRIPT_HTMLS) faq.html set.notes.html
SCRIPT_TXTS = walmgr.txt cube_dispatcher.txt table_dispatcher.txt \
queue_mover.txt queue_splitter.txt bulk_loader.txt \
@@ -67,6 +67,10 @@ apiupload: apidoc
rsync -rtlz api/* $(web)/api
cd ../sql/pgq && rm -rf docs/html && $(MAKE) dox
rsync -rtlz ../sql/pgq/docs/html/* $(web)/pgq/
+ cd ../sql/pgq_node && rm -rf docs/html && $(MAKE) dox
+ rsync -rtlz ../sql/pgq_node/docs/html/* $(web)/pgq_node/
+ cd ../sql/londiste && rm -rf docs/html && $(MAKE) dox
+ rsync -rtlz ../sql/londiste/docs/html/* $(web)/londiste/
clean:
rm -rf api *.html
diff --git a/doc/TODO.txt b/doc/TODO.txt
index 0ddd2875..46808463 100644
--- a/doc/TODO.txt
+++ b/doc/TODO.txt
@@ -3,48 +3,87 @@
== Next major release - 3.0 ==
+=== Standalone changes ===
+
+ * Make Londiste support table wildcards.
+ * plpgsql trigger for TRUNCATE / support for installing such trigger
+ * Convert dispatcher scripts to new cascading framework
+ - queue_mover - CascadedWorker (cascaded_worker.py? node_worker.py)
+ - queue_splitter - CascadedConsumer
+ - bulk_loader - CascadedConsumer
+ - cube_dispatcher - CascadedConsumer
+ - table_dispatcher - CascadedConsumer
+ * bulk_loader / cube_dispatcher / table_dispatcher could be merged
+ to one script with 3 backends.
+ * 'Q' event type for londiste, (queue_splitter event), for event
+ that needs to be inserted into queue. Also trigger flag to
+ create such event.
+ - better to be done as "redirect" - allow events for a table
+ to be redirected to another table or queue.
+ * londiste check
+ * londiste fkeys
+
+=== Core changes ===
+
+ASAP:
+
+ - move leaf event copy away from consumer/worker?
+
+ - logging from db is mess, needs full cleanup
+ - docstring review
+ - dead node handling
+ - failover - may affect db code
+
+ - dbscript: document self.args
+ - dbscript: easier help string setting
+
+Soon:
+
+ - pgq_node.is_root_event() rettype
+ - exec_cmd better name
+ - cleanup of hierarchical remove_event
+
+ - make everything use next_batch_info()
+ - londiste sql: fq names? glob names?
+ - CascadeAdmin: job_name vs. consumer_name as worker_name
+ - setadm rename-node
+
+No hurry:
+
+ - psycopgwrapper full doc
+ - londiste syncer: get provider db?
+ - sleeps while waiting notices from db
+ - dispatcher scripts: no need to check tables repeatedly
+ - replace "raise Exception" for user errors with something
+ nicer, (UsageError) which also should avoid traceback
+
+ - copy vs. reorg
+ - execute vs. copy - needs wait?
+ - --wait/--nowait switch for execute
+
=== done ===
- * drop support of psycopg1
- * parallel copy
+ - Move remaining sql functions away from ret types.
+ - get rid of 'set' in londiste code?
+ - londiste: add-seq refresh from provider
+
+== old todo (not up-to-date) ==
-=== todo ===
+ * pgqadm ticker delays, separate retry delay
+ * pgq.drop_queue -> flag to drop consumers
* cascaded replication, switchover, failover [marko]
- - add --create
- - root worker:
- - insert seq pos in queue
- - seq add/remove events
- advanced admin commands
- - switchover
- failover
- - pause
- - resume
- node-status
- set-status
- rename-node
- - standard msg/error handling for all sql functions
- - compare/repair
- - check if table is in other sets? [NAK]
- - setconsumer/pgq - insert tick with original date (+evid?)
- - dont trust node name in member_info, set_info is authoritative
-
- * drop support for 8.1 ??
-
- * new londiste sql:
- - support new trigger flags in 8.3?
-
-== Next stable release - 2.1.7 ==
-
- * sql ident quoting
- * exit single-loop processes with error on exception
+ - on root switch/failover check if all tables are present
== High-prority ==
=== Smaller things ===
- * pgq: RemoteConsumer/SerialConsumer/pgq_ext sanity, too much duplication [marko]
- * londiste: create tables on subscriber
* pgqadm: Utility commands:
- reg-copy que cons1 cons2
- reg-move que cons1 cons2
@@ -62,10 +101,7 @@
=== Larger things ===
- * londiste: denytriggers on subscriber
- * londiste: Quote SQL identifiers, keep combined name, rule will be "Split schema as first dot"
* skylog: publish sample logdb schema, with some tools
- * londiste: allow table redirection on subscriber side
* londiste/pgqset: support creating slave from master by pg_dump / PITR.
=== Smaller things ===
@@ -77,36 +113,16 @@
rights on subscriber anyway).
* pgqadm: separate priod for retry queue processing
* skytools: switch for silence for cron scripts
- * pgq: drop_fkeys.sql for live envs
* DBScript: failure to write pidfile should be logged (cronscripts)
* ideas from SlonyI:
- - force timestamps to ISO - dubious, creates pq pkt
- when buffering queries, check their size
- - store event_id with tick, thus making possible to guess the size of
- upcoming batches.
* pgqadm: show count of events to be processed [--count switch].
- it should be relatively cheap with optimized query if consumers are not lagging.
- - exact method - `SELECT count(*) FROM pgq.event_X WHERE NOT txid_in_snapshot(ev_txid, last_tick_snapshot) AND ev_txid > xmin;`
- Can be optimized: `(ev_txid > xmax OR ev_txid IN get_snapshot_active())`.
- Several consumer can be done in one go if spread out to separate columns.
- - inexact method: take ev_id near xmax and compare with value from sequence
- * broken by retry events, rollbacked transactions and use of force_tick()
+ - broken by retry events, rollbacked transactions and use of force_tick()
* automatic "repair" - after reaching sync point, the "replay" must be killed/paused, then fixes can be applied
- * pgq: store txid/event_id_seq with each tick
== Just ideas ==
* skytools: config from database
* skytools: config-less operation?
- * skytools: partial sql parser for log processing
- * londiste: EXECUTE SCRIPT
* londiste: somehow automatic sync of table structure/functions/...?
-== walmgr ==
-
-(walmgr needs thorough review of pending issues)
-
- * copy master config to slave
- * slave needs to decide which config to use
-
-
diff --git a/doc/faq.txt b/doc/faq.txt
new file mode 100644
index 00000000..ad9e1232
--- /dev/null
+++ b/doc/faq.txt
@@ -0,0 +1,129 @@
+
+= Skytools FAQ =
+
+== Skytools ==
+
+=== What is Skytools? ===
+
+It is bunch of database management tools we use
+and various frameworks / modules they depend on.
+
+Main components are:
+
+==== Python scripts ====
+
+Main tools:
+
+ walmgr:: walshipping manager
+ londiste:: replication on top of pgq
+ pgqadm:: pgq administration and maintenance
+ setadm:: cascaded pgq administration
+
+Special scripts:
+
+ bulk_loader
+ cube_dispatcher
+ table_dispatcher
+
+Queue copy
+
+ queue_mover:: copy queue contents from one db to another
+ queue_splitter:: copy queue contents to another db splitting it into several queues
+
+Operate bunch of scripts together
+
+ scriptmgr
+
+==== Python modules ====
+
+ pgq
+ skytools
+ londiste
+
+==== SQL modules ====
+
+ londiste
+ pgq
+ pgq_node
+ pgq_ext
+
+=== Where is the code located? ===
+
+Code layout:
+
+ debian/
+ doc/
+ python/bin/
+ python/londiste/ - Londiste python modules
+ python/modules/ - C extension modules for Python (string handling)
+ python/pgq/ - pgq and cascaded pgq python modules
+ python/skytools/ - python modules for generic database scripting
+ scripts/ - Special purpose python scripts (python)
+ sql/londiste/ - database code for londiste (plpgsql)
+ sql/pgq/ - PgQ database code (C + plpgsql)
+ sql/pgq_ext/ - PgQ event/batch tracking on remote database (plpgsql)
+ sql/pgq_node/ - cascaded pgq support (plpgsql)
+ sql/txid/ - Obsolete txid code for Postgres 8.2 and below
+
+== PgQ - The generic queue ==
+
+=== Why do queue in database? Transactional overhead? ===
+
+1. PgQ is quite likely the fastest ACID compliant queue,
+ thanks to Postgres being pretty fast despite the
+ "transactional overhead". Why use anything less robust?
+
+2. We have lot of business logic in database. Events created
+ by business transactions need to live or die with main transaction.
+
+3. Queue used for replication purposes needs to be transactional.
+
+I think the reason people act surprised when they hear about queue
+in database is not that they don't care about reliability
+of their event transport, but that the reliable data storage
+mechanism - SQL databases - did not have any way to write
+performant queue. Now thanks to the txid/snapshot technique
+we have a way to write fast _and_ reliable queue,
+so why care about anything less?
+
+=== Could you break dependency on Python? ===
+
+There is no dependency on Python. The PgQ itself is written in C / plpgsql
+and it appears as bunch of SQL functions under `pgq` schema.
+Thus it can be used from any language that can execute SQL queries.
+
+There is Python helper framework that makes writing Python consumers easier.
+Such framework could be written for any language.
+
+=== Aren't the internals similar to Slony-I? ===
+
+Yes, PgQ was created by generalizing queueing parts from Slony-I.
+
+=== Dump-restore ===
+
+== Londiste - The replication tool ==
+
+=== What type of replication does it do? ===
+
+Londiste does trigger-based asynchronous single-master replication,
+same as Slony-I.
+
+In Skytools 3.x it will support merging partitions together,
+that could be called shared-nothing multimaster replication.
+
+=== What is the difference between Slony-I and Londiste? ===
+
+Nothing fundamental. Both do asynchronous replication.
+
+Main difference is that Londiste consists of several
+relatively independent parts, unlike Slony-I where
+code is more tightly tied together.
+
+At the moment Londiste loses to Slony-I featurewise,
+but should be easier to use. Hopefully we can keep
+the simple UI when we catch up in features.
+
+=== What are the limitations of Londiste? ===
+
+It does not support '.' and ',' in table, schema and column names.
+
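
Note on doc/faq.txt: the new FAQ says PgQ is a set of SQL functions under the `pgq` schema and can be driven from any language that can run SQL. A minimal sketch of one consumer step follows, assuming a DB-API cursor with `%s` parameter style (as in psycopg2) and the PgQ functions `pgq.next_batch`, `pgq.get_batch_events` and `pgq.finish_batch`. The helper name `consume_one_batch` and its arguments are hypothetical and are not part of Skytools.

```python
def consume_one_batch(cur, queue, consumer, handle_event):
    """Process at most one PgQ batch using plain SQL calls.

    `cur` is any DB-API cursor (assumed psycopg2-style '%s' parameters).
    Returns the number of events handled, or None when no batch is ready.
    """
    # Ask PgQ for the next batch; NULL means there is nothing to do yet.
    cur.execute("select pgq.next_batch(%s, %s)", (queue, consumer))
    batch_id = cur.fetchone()[0]
    if batch_id is None:
        return None

    # Fetch the events that belong to this batch.
    cur.execute(
        "select ev_id, ev_type, ev_data from pgq.get_batch_events(%s)",
        (batch_id,),
    )
    rows = cur.fetchall()
    for ev_id, ev_type, ev_data in rows:
        handle_event(ev_id, ev_type, ev_data)

    # Mark the batch as done so the consumer position moves forward.
    cur.execute("select pgq.finish_batch(%s)", (batch_id,))
    return len(rows)
```

The caller is expected to commit the transaction after a successful return, so that the event handling and the batch completion succeed or fail together.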
diff --git a/doc/set.notes.txt b/doc/set.notes.txt
index cd4757d2..57d2960c 100644
--- a/doc/set.notes.txt
+++ b/doc/set.notes.txt
@@ -40,12 +40,14 @@ branch::
* carries full contents of the queue.
* may subscribe to all/some/none of the tables.
* can be provider for initial copy only if subscribes to table
- * attached tags: mirror/subset/queue ???
leaf::
Data-only node. Events are replayed, but no queue, thus cannot be provider to other nodes.
+ Nodes where sets from partitions are merged together are also tagged 'leaf', because
+ in per-partition set it cannot be provider to other nodes.
merge-leaf::
+ [Does not exist as separate type, detected as 'leaf' that has 'combined_queue' set.]
Exists in per-partition set.
- Does not have it's own queue.
- Initial COPY is done with --skip-truncate,
@@ -56,11 +58,13 @@ merge-leaf::
from combined-root
combined-root::
+ [Does not exist as separate type, detected as 'root' that has 'leaf's with 'combined_queue' set.]
- Master for combined set. Received data from several per-partition set's.
- also is merge-leaf in every part-set.
- queue is filled directly from partition sets.
combined-failover::
+ [Does not exist as separate type, detected as 'branch' that has 'leaf's with 'combined_queue' set.]
- participates in combined-set, receives events.
- also is queue-only node in every part-set.
- but no processing is done, just tracking
@@ -85,11 +89,11 @@ combined-failover::
* Subscriber gets its own provider url from database, so switching to
another provider does not need config changes.
* Ticks+data can only be deleted if all nodes have already applied it.
- - Special consumer registration on all queues - <setname>_watermark.
- This avoid PgQ deleting old events.
- - Nodes propagate upwards their lowest tick
- - Root sends new watermark events to the queue
- - When branch/leaf gets new watermark event, it moves the <setname>_watermark registration.
+ - Special consumer registration on all queues - ".global_watermark".
+ This prevents PgQ from deleting old events.
+ - Nodes propagate upwards their lowest tick: (local_watermark)
+ - Root sends its local watermark as "pgq.global-watermark" event to the queue.
+ - When branch/leaf gets new watermark event, it moves the ".global_watermark" registration.
== Illustrations ==
@@ -134,37 +138,18 @@ On the loss of S3, it should be possible to direct S4 to receive data from S1/S2
On the loss of S1, it should be possible to redirect S3 to S2
and ABCD -> S2 -> S3 must stay in sync.
-== UI spec (incomplete) ==
+== UI spec (incomplete) [obsolete] ==
Here should be end-user interface specced.
- setmgr.py <set.ini> init-master ( <node> --connstr <connstr> | --infoset <infoset> )
- setmgr.py <set.ini> init-node <node> --type <type> --connstr <connstr> --provider <node>
+ setmgr.py <set.ini> create-node TYPE NODE_NAME NODE_LOCATION --provider=<provider_connstr>
setmgr.py <set.ini> change-provider <node> --provider <node>
setmgr.py <set.ini> set-dead <node>
setmgr.py <set.ini> set-live <node>
- setmgr.py <set.ini> pause <node>
- setmgr.py <set.ini> resume <node>
- setmgr.py <set.ini> master-failover <newmaster>
- setmgr.py <set.ini> master-switchover <newmaster>
- setmgr.py <set.ini> node-failover <newnode>
- setmgr.py <set.ini> node-switchover <newnode>
- setmgr.py <set.ini> candidates <node>
+ setmgr.py <set.ini> pause --node=<node> [--consumer=]
+ setmgr.py <set.ini> resume --node=<node> [--consumer=]
- setmgr.py <set.ini> londiste <node> add-table <tbl> ....
- setmgr.py <set.ini> remove-table <node> <tbl> ....
- setmgr.py <set.ini> add-seq <node> <seq> ....
- setmgr.py <set.ini> remove-seq <node> <seq> ....
- setmgr.py <set.ini> tables <node>
- setmgr.py <set.ini> seqs <node>
- setmgr.py <set.ini> missing <node> --seqs --tables
- setmgr.py <set.ini> check-tables
- setmgr.py <set.ini> show-fkeys
- setmgr.py <set.ini> show-triggers
- setmgr.py <set.ini> show-tables
- setmgr.py <set.ini> show-seqs
- setmgr.py <set.ini> <node> resync-table
londiste.py <node.ini> add-table <tbl> ....
londiste.py <node.ini> remove-table <tbl> ....
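
Note on doc/set.notes.txt: the watermark rules in the hunk above say that each node propagates its lowest tick upwards and that the root publishes its local watermark as the global one, so ticks and data are deleted only once every node has applied them. One reading of that rule, as a sketch: a node's local watermark is the lowest tick applied by the node itself or by anything below it. The function name and the dictionary layout are hypothetical illustrations, not the pgq_node implementation.

```python
def local_watermark(node, applied_tick, children):
    """Lowest tick applied by `node` or by any node below it.

    applied_tick: dict mapping node name -> last tick that node has applied
    children:     dict mapping node name -> list of direct subscriber nodes
    """
    marks = [applied_tick[node]]
    for child in children.get(node, ()):
        # Each child reports the lowest tick of its own subtree upwards.
        marks.append(local_watermark(child, applied_tick, children))
    return min(marks)


# Hypothetical cascade: root -> branch1 -> leaf1, and root -> leaf2.
children = {"root": ["branch1", "leaf2"], "branch1": ["leaf1"]}
applied_tick = {"root": 120, "branch1": 110, "leaf1": 95, "leaf2": 105}

# Under this reading, the value the root would publish as the global watermark:
global_watermark = local_watermark("root", applied_tick, children)
```

With these numbers the slowest node is `leaf1`, so events newer than its tick would have to be kept on every queue in the cascade.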
diff --git a/doc/skytools3.txt b/doc/skytools3.txt
index 13b92c09..56dd36f0 100644
--- a/doc/skytools3.txt
+++ b/doc/skytools3.txt
@@ -19,14 +19,19 @@ Keep old design from Skytools 2
New features in Skytools 3
--------------------------
-* Cascading is implemented as generic layer on top of PgQ - *pgq_set*.
+* Cascading is implemented as generic layer on top of PgQ - *Cascaded PgQ*.
- Its goal is to keep identical copy of queue contents in several nodes.
- Not replication-specific - can be used for any queue.
-* Parallel copy - during inital sync several tables can be
- copied at the same time.
- - In 2.1 the copy already happened in separate process,
- making it parallel was just a matter of tuning launching/syncing logic.
+* New Londiste features:
+ - Parallel copy - during initial sync several tables can be
+ copied at the same time. In 2.1.x the copy already happened in separate
+ process, making it parallel was just a matter of tuning launching/syncing logic.
+
+ - EXECUTE command, to run random SQL script on all nodes.
+
+ - Automatic table or sequence creation by importing the structure
+ from provider node. Activated with --create switch for add-table, add-seq.
* Advanced admin operations:
- switchover
@@ -35,19 +40,7 @@ New features in Skytools 3
- pause / resume node
- rename node
-Open Questions
---------------
-
-* terminology
- - 'set' vs 'cluster' vs ???
- - (root / branch / leaf) vs. (master / slave) vs. (provider / subscriber)
- - add <table> vs. attach <table>
-
-* own trigger handling
- - fullblown custom
- - minimal custom + wait for 8.3
+Example session
+---------------
-* compatibility with v2.1.x (python scripts + modules):
- - bundle old modules and scripts with different name
- - rename new modules and scripts