= Londiste Reference =

== Notes ==

=== PgQ daemon ===

Londiste runs as a consumer on PgQ, so `pgqadm.py ticker` must be running
on the provider database.  It is preferable to run the ticker on the same machine
as the database, because it needs low latency, but that is not a requirement.

For monitoring you can use the `pgqadm.py status` command.
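
For example, assuming the ticker configuration file is called `ticker.ini`
(the name is hypothetical), the ticker can be started as a daemon and the
queue state checked like this:

  pgqadm.py ticker.ini ticker -d
  pgqadm.py ticker.ini status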

=== Table Names ===

Londiste internally always uses fully schema-qualified table names.
If a table name is given on the command line without a schema, "public."
is simply prepended to it, without looking at search_path.
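
For example, assuming a config file named `replic.ini` (the name is
hypothetical), the following two commands register the same table:

  londiste.py replic.ini provider add mytable
  londiste.py replic.ini provider add public.mytable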

=== PgQ events used ===

==== Table data change event in SQL format ====

These events are inserted by triggers on the tables.

 * ev_type = 'I' / 'U' / 'D'
 * ev_data = partial SQL statement - the part between `[]` is removed:
   -  `[ INSERT INTO table ]  (column1, column2) values (value1, value2)`
   - `[ UPDATE table SET ]  column2=value2 WHERE pkeycolumn1 = value1`
   - `[ DELETE FROM table WHERE ]  pkeycolumn1 = value1`
 * ev_extra1 = table name with schema

This partial SQL format is used for two reasons - to conserve space
and to make it possible to redirect events to another table.
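
As an illustration, an INSERT of the row (1, 'John') into a hypothetical
table `public.customer` with columns (id, name) would produce an event
roughly like this:

  ev_type   = I
  ev_data   = (id, name) values ('1', 'John')
  ev_extra1 = public.customer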

==== Table data change event in urlencoded format ====

These events are inserted by triggers on the tables.

 * ev_type = 'I' / 'U' / 'D' + ':' + list of pkey columns
   Eg: I:lastname,firstname
 * ev_data = urlencoded values of all columns of the row.
   NULL is signified by omitting '=' after column name.
 * ev_extra1 = table name with schema

Urlencoded events take more space than SQL events, but are more
easily parsed by other scripts.
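
The same INSERT into the hypothetical `public.customer` table, with `id`
as the primary key, would look roughly like this in urlencoded format:

  ev_type   = I:id
  ev_data   = id=1&name=John
  ev_extra1 = public.customer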

==== Table addition event ====

This event is inserted by 'londiste add-table' on the root node.

 * ev_type = 'londiste.add-table'
 * ev_data = table name

All subscribers downstream will also register this table 
as being available on the queue.

==== Table removal event ====

This event is inserted by 'londiste remove-table' on the root node.

 * ev_type = 'londiste.remove-table'
 * ev_data = table name

All subscribers downstream will then unregister this table
as being available on the queue.  If they happen to be subscribed
to this table locally, the table is unsubscribed.

==== SQL script execution event ====

This event is inserted by 'londiste execute' on the root node.
The insert happens in the same transaction in which the actual commands are executed.

 * ev_type = 'EXECUTE'
 * ev_data = script body
 * ev_extra1 = unique id for script (file name?)

A script is identified by its name, which is used to check whether it has
already been applied.  This makes it possible to override scripts in downstream nodes.
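
An illustrative (hypothetical) example of such an event:

  ev_type   = EXECUTE
  ev_data   = ALTER TABLE public.customer ADD COLUMN email text;
  ev_extra1 = 2007_add_email.sql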

== log file ==

Londiste's normal log consists only of statistics log lines - key-value
pairs between `{}`.  Their meaning:

 * count: how many events were in the batch.
 * ignored: how many of them were ignored - table not registered on the subscriber or not yet in sync.
 * duration: how long the batch processing took, in seconds.

Example:

  {count: 110, duration: 0.88}









== Commands for managing provider database ==

=== provider install ===

  londiste.py <config.ini> provider install

Installs code into the provider database and creates the queue.
Equivalent to doing the following by hand:

    CREATE LANGUAGE plpgsql;
    CREATE LANGUAGE plpython;
    \i .../contrib/txid.sql
    \i .../contrib/logtriga.sql
    \i .../contrib/pgq.sql
    \i .../contrib/londiste.sql
    select pgq.create_queue(queue name);

Notes:

 * The schemas/tables are installed under the user Londiste is configured to run as.
   If you prefer to run Londiste under a non-admin user, they should also
   be installed by hand.

=== provider add ===

  londiste.py <config.ini> provider add <table name> ...

Registers the table on the provider database and adds a trigger to the table
that will send events to the queue.
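
For example (config and table names are hypothetical):

  londiste.py replic.ini provider add public.customer public.orders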

=== provider remove ===

  londiste.py <config.ini> provider remove <table name> ...

Unregisters the table on the provider side and removes the trigger from the table.
An event about the table removal is also sent to the queue, so
all subscribers unregister the table on their end as well.

=== provider tables ===

  londiste.py <config.ini> provider tables

Shows registered tables on provider side.

=== provider seqs ===

  londiste.py <config.ini> provider seqs

Shows registered sequences on provider side.

== Commands for managing subscriber database ==

=== subscriber install ===

  londiste.py <config.ini> subscriber install

Installs code into the subscriber database.
Equivalent to doing the following by hand:

    CREATE LANGUAGE plpgsql;
    \i .../contrib/londiste.sql

This will be done under the Londiste user; if the tables should be
owned by someone else, it needs to be done by hand.

=== subscriber add ===

  londiste.py <config.ini> subscriber add <table name> ... [--expect-sync | --skip-truncate | --force]

Registers the table on the subscriber side.

Switches

 --expect-sync:: Table is tagged as in-sync so initial COPY is skipped.
 --skip-truncate:: When doing initial COPY, don't remove old data.
 --force:: Ignore table structure differences.
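
For example, to register a table that is already known to be in sync on the
subscriber (config and table names are hypothetical):

  londiste.py replic.ini subscriber add public.customer --expect-sync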

=== subscriber remove ===

  londiste.py <config.ini> subscriber remove <table name> ...

Unregisters the table from the subscriber.  No events will be applied
to the table anymore.  The actual table will not be touched.

=== subscriber resync ===

  londiste.py <config.ini> subscriber resync <table name> ...

Tags the tables as "not synced".  The replay process will later notice this
and launch a `copy` process to sync the table again.

== Replication commands ==

=== replay ===

The actual replication process.  It should be run as a daemon with the `-d` switch,
because it needs to be running at all times.
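
For example (config name is hypothetical):

  londiste.py replic.ini replay -d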

Its main task is to fetch batches from PgQ and apply each of them in one transaction.

Basic logic:

 * Get a batch from the PgQ queue on the provider.  If it has already been
   applied to the subscriber, skip the batch.
 * Management actions, which may run transactions on the subscriber:
   - Load table state from the subscriber, to be up to date on registrations
     and `copy` processes running in parallel.
   - If a `copy` process wants to hand a table over to the main process,
     wait until the `copy` process catches up.
   - If there is a table that is not synced and no `copy` process
     is already running, launch a new `copy` process.
   - If there are sequences registered on the subscriber, fetch their latest
     state from the provider and apply it to the subscriber.
 * Event replay, all in one transaction on the subscriber:
   - Apply events from the batch, but only for tables that are registered
     on the subscriber and are in sync.
   - Store the tick_id on the subscriber.

=== copy (internal) ===

Internal command for the initial sync.  Launched by `replay` if it notices
that some tables are not in sync.  Table copying is done in a
separate process to avoid blocking the main replay process for a
long time.

Basic logic:

 * Register on the same queue in parallel, under a different consumer name.
 * One transaction on subscriber:
   - Drop constraints and indexes.
   - Truncate table.
   - COPY data in.
   - Restore constraints and indexes.
   - Tag the table as `catching-up`.
 * When catching-up, the `copy` process acts as regular
   `replay` process but just for one table.
 * When it reaches the end of the queue, i.e. no more batches are immediately
   available, it hands the table over to the main `replay` process.

State changes between `replay` and `copy`:

 State                | Owner  | What is done
 ---------------------+--------+--------------------
 NULL                 | replay | Changes state to "in-copy", launches londiste.py copy process, continues with its work
 in-copy              | copy   | drops indexes, truncates, copies data in, restores indexes, changes state to "catching-up"
 catching-up          | copy   | replays events for that table only until no more batches are available (i.e. the current moment),
                      |        | change state to "wanna-sync:<tick_id>" and wait for state to change
 wanna-sync:<tick_id> | replay | catch up to given tick_id, change state to "do-sync:<tick_id>" and wait for state to change
 do-sync:<tick_id>    | copy   | catch up to given tick_id, both replay and copy must now be at same position. change state to "ok" and exit
 ok                   | replay | synced table, events can be applied

These state changes must guarantee that any process can die at any time and, by simply
being restarted, continue where it left off.

"subscriber add" registers table with `NULL` state.  "subscriber add --expect-sync" registers table with `ok` state.

"subscriber resync" sets table state to `NULL`.

== Utility commands ==

=== repair ===

It first tries to get the tables into a state where they should be in sync, then compares
them and writes out SQL statements that would fix the differences.

Syncing happens by locking the provider tables against updates and then waiting
until `replay` has applied all pending changes to the subscriber database.  As this
is a dangerous operation, there is a hardwired limit of 10 seconds for holding the lock.
If the `replay` process does not catch up in that time, the locks are released and the
operation is canceled.

Comparing happens by dumping out the table from both sides, sorting the dumps and
then comparing them line by line.  As this is a CPU- and memory-hungry operation,
it is good practice to run the `repair` command on a third machine, to avoid
consuming resources on either the provider or the subscriber.
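
A repair run might look like this (config and table names are hypothetical):

  londiste.py replic.ini repair public.customer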

=== compare ===

It syncs the tables like `repair`, but just runs SELECT count(*) on both sides,
which is a somewhat cheaper but also less precise way of checking
whether the tables are in sync.

== Config file ==

    [londiste]
    job_name = test_to_subscriber
    
    # source database, where the queue resides
    provider_db = dbname=provider port=6000 host=127.0.0.1
    
    # destination database
    subscriber_db = dbname=subscriber port=6000 host=127.0.0.1
    
    # the queue where to listen on
    pgq_queue_name = londiste.replika
    
    # where to log
    logfile = ~/log/%(job_name)s.log
    
    # pidfile is used for avoiding duplicate processes
    pidfile = ~/pid/%(job_name)s.pid