Re: Postgresql replication - Mailing list pgsql-general
From: William Yu
Subject: Re: Postgresql replication
Msg-id: [email protected]
In response to: Re: Postgresql replication (William Yu <[email protected]>)
Responses: Re: Postgresql replication
List: pgsql-general
Another tidbit I'd like to add. What has helped a lot in implementing
high-latency master-master replication is writing our software with a
business process model in mind where data is not posted directly to the
final tables. Instead, users are generally allowed to enter anything --
it could be incorrect, incomplete, or the user may not have rights --
and the data is still dumped into "pending" tables for people with
rights to fix/review/approve later. Only after that process is the data
posted to the final tables. (Good data entered on the first try still
gets pended -- the validation phase simply assumes the user who entered
the data is also the one who fixed/reviewed/approved it.)

In terms of replication, this model allows users to enter data on any
server. The pending records then get replicated to every server. Each
server then looks at its own set of pendings to post to the final
tables, and the final data is then replicated back to all the
participating servers. There may be a delay for a user working on a
server that doesn't have rights to post his data. However, the
pending->post model gets users used to the idea that (1) all data is
entered in one large swoop and validated/posted afterwards, and (2)
data can/will sit in pending for a period of time until it is acted
upon by somebody/some server with the proper authority. Hence users
aren't expecting results to pop up on the screen the moment they press
the submit button.

William Yu wrote:
> Yes, it requires a lot of foresight to do multi-master replication --
> especially across high-latency connections. I do that now for 2
> different projects. We have servers across the country replicating
> data every X minutes, with custom app logic that resolves conflicting
> data.
>
> Allocation of unique IDs that don't collide across servers is a must.
> For 1 project, instead of using numeric IDs, we use CHAR and prepend
> a unique server code so record #1 on server A is A0000000001 versus
> ?x0000000001 on other servers. For the other project, we were too far
> along in development to change all our numerics into chars, so we
> wrote custom sequence logic to divide our 10-billion ID space into
> 1-Xbillion for server 1, X-Ybillion for server 2, etc.
>
> With this step taken, we then had to isolate (1) transactions that
> could run on any server w/o issue (where we always take the newest
> record), (2) transactions that required an amalgam of all actions,
> and (3) transactions that had to be limited to "home" servers.
> Record-keeping stuff where we keep a running history of all changes
> fell into the first category -- it would have been no different than
> 2 users on the same server updating the same object at different
> times during the day. Updating of summary data fell into category #2
> and required parsing the change history of individual elements.
> Category #3 was financial transactions requiring strict locks; these
> were divided up by client/user space and restricted to the user's
> home server. This case would not allow auto-failover. Instead, it
> would require some prolonged threshold of downtime for a server
> before full financials are allowed on backup servers.
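A minimal sketch of the pending->post flow described above. All names here (the "account" field, the rights set, the function names) are my own illustration, not from the post -- the point is only that entry always succeeds, while posting is gated by each server's authority:

```python
# Hypothetical sketch: entry always lands in "pending"; only a server
# whose approvers have rights over a record posts it to "final".

def submit(pending, record, entered_by):
    """Accept any user entry into the pending queue, valid or not."""
    pending.append({"data": record, "entered_by": entered_by,
                    "status": "pending"})

def post_pending(pending, final, approver_rights):
    """Post only the pendings this server has rights to act on."""
    for item in pending:
        if item["status"] != "pending":
            continue
        if item["data"].get("account") not in approver_rights:
            # leave it pending for a server/user with proper authority
            continue
        item["status"] = "posted"
        final.append(item["data"])
```

Records outside this server's rights simply stay pending until replication carries them to a server that can act on them, which is where the user-visible delay comes from.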
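The two collision-avoidance schemes in the quoted text can be sketched as follows. The function names, the 10-digit width, and the 1-billion slice size are assumptions for illustration; the post only says the numeric space was ~10 billion split into per-server ranges:

```python
def char_id(server_code, seq, width=10):
    """Server-prefixed CHAR id: record #1 on server 'A' -> 'A0000000001'."""
    return f"{server_code}{seq:0{width}d}"

def numeric_range(server_index, slice_size=1_000_000_000):
    """Carve a numeric id space into non-overlapping per-server slices.

    Returns (low, high) inclusive bounds for the server's sequence;
    slice_size of 1 billion is an assumed split of the 10-billion space.
    """
    low = server_index * slice_size + 1
    high = (server_index + 1) * slice_size
    return low, high
```

In PostgreSQL itself the second scheme maps naturally onto per-server sequences created with non-overlapping MINVALUE/MAXVALUE bounds.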