Re: The plan for FDW-based sharding
From:        Konstantin Knizhnik
Subject:     Re: The plan for FDW-based sharding
Date:
Msg-id:      [email protected]
In reply to: Re: The plan for FDW-based sharding (Robert Haas <[email protected]>)
Responses:   Re: The plan for FDW-based sharding
List:        pgsql-hackers
Thank you very much for your comments.

On 01.03.2016 18:19, Robert Haas wrote:
> On Sat, Feb 27, 2016 at 2:29 AM, Konstantin Knizhnik
> <[email protected]> wrote:
>>> How do you prevent clock skew from causing serialization anomalies?
>>
>> If a node receives a message from the "future", it just needs to wait
>> until this future arrives.
>> Practically we just "adjust" the system time in this case, moving it
>> forward (certainly the system time is not actually changed, we just set a
>> correction value which needs to be added to the system time).
>> This approach was discussed in the article:
>> http://research.microsoft.com/en-us/people/samehe/clocksi.srds2013.pdf
>> I hope the algorithm is explained in this article much better than I can
>> do here.
>
> Hmm, the approach in that article is very interesting, but it sounds
> different than what you are describing - they do not, AFAICT, have
> anything like a "correction value"

In the article they use the notion of a "wait":

    if T.SnapshotTime > GetClockTime()
    then wait until T.SnapshotTime < GetClockTime()

Originally we really did sleep here, but then we realized that instead of
sleeping we can just adjust the local time.
Sorry, I do not have a formal proof that it is equivalent, but... at least
we have not encountered any inconsistencies after this fix, and performance
is improved.
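To make the difference concrete, here is a rough sketch in C of both
variants. This is only an illustration, not the actual pg_tsdtm code; names
such as local_time_offset, wait_for_snapshot_time and observe_snapshot_time
are invented for the example. The first function implements the ClockSI
rule and sleeps until the local clock catches up; the second never sleeps
and instead accumulates the observed skew in a correction value which is
added to every subsequent clock reading.

    #include <stdint.h>
    #include <unistd.h>
    #include <sys/time.h>

    typedef uint64_t timestamp_us;          /* microseconds since the epoch */

    static timestamp_us local_time_offset;  /* correction added to the system clock */

    static timestamp_us
    get_system_time(void)
    {
        struct timeval tv;

        gettimeofday(&tv, NULL);
        return (timestamp_us) tv.tv_sec * 1000000 + tv.tv_usec;
    }

    /* Variant 1: the rule from the ClockSI paper - sleep until the clock catches up */
    static void
    wait_for_snapshot_time(timestamp_us snapshot_time)
    {
        while (snapshot_time > get_system_time())
            usleep(10);
    }

    /* Variant 2: never sleep - remember the skew and report an adjusted time */
    static timestamp_us
    get_adjusted_time(void)
    {
        return get_system_time() + local_time_offset;
    }

    static void
    observe_snapshot_time(timestamp_us snapshot_time)
    {
        timestamp_us now = get_adjusted_time();

        if (snapshot_time > now)            /* a timestamp "from the future" */
            local_time_offset += snapshot_time - now;
    }

The second variant never blocks the backend: a timestamp received "from the
future" just pushes the node's notion of "now" forward, which is where the
performance improvement comes from.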
</pre></blockquote><pre wrap=""> Well I still think what I said before is valid. If the code is good, let it be a core submission. If it's not ready yet, submit it to core when it is. If it can't be made good, forget it.</pre></blockquote><br /> I have nothing against committing DTM code incore. But still the best way of integration it is to use a-la-OO approach.<br /> So still need API. Inserting if-s or switchesin existed code is IMHO ugly idea.<br /><br /> Also it is not enough for DTM code to be just "good". It should provideexpected functionality.<br /> But which functionality is expected? From my experience of development different clustersolutions I can say that<br /> different customers have very different requirements. It is very hard if ever possibleto satisfy them all.<br /><br /> Right now I do not feel that I can predict all possible requirements to DTM.<br/> This is why we want to provide some API, propose some implementations of this API, receive feedbecks and get betterunderstanding which functionality is actually needed by customers.<br /><br /><br /><blockquote cite="mid:CA+TgmobUnzf+A_Fk_EYXTnHiwVKZsKV=gRVuzTCoibhPWhq2UA@mail.gmail.com"type="cite"><pre wrap=""> </pre><blockquote type="cite"><blockquote type="cite"><pre wrap="">This seems rather defeatist. If the code is good andreliable, why should it not be committed to core? </pre></blockquote><pre wrap=""> Two reasons: 1. There is no ideal implementation of DTM which will fit all possible needs and be efficient for all clusters. </pre></blockquote><pre wrap=""> Hmm, what is the reasoning behind that statement? I mean, it is certainly true that there are some places where we have decided that one-size-fits-all is not the right approach. Indexing, for example. But there are many other places where we have not chosen to make things pluggable, and that I don't think it should be taken for granted that plugability is always an advantage. I fear that building a DTM that is fully reliable and also well-performing is going to be really hard, and I think it would be far better to have one such DTM that is 100% reliable than two or more implementations each of which are 99% reliable. </pre></blockquote><br /> The question is not about it's reliability, but mostly about its functionality and flexibility.<br/><br /><blockquote cite="mid:CA+TgmobUnzf+A_Fk_EYXTnHiwVKZsKV=gRVuzTCoibhPWhq2UA@mail.gmail.com" type="cite"><prewrap=""> </pre><blockquote type="cite"><pre wrap="">2. Even if such implementation exists, still the right way of it integration is Postgres should use kind of TM API. </pre></blockquote><pre wrap=""> Sure, APIs are generally good, but that doesn't mean *this* API is good.</pre></blockquote><br /> Well, I do not what tosay "better than nothing", but I find this API to be a reasonable compromise between flexibility and minimization of changesin PostgreSQL core. If you have some suggestions how to improve it, I will be glad to receive them.<br /><br /><blockquotecite="mid:CA+TgmobUnzf+A_Fk_EYXTnHiwVKZsKV=gRVuzTCoibhPWhq2UA@mail.gmail.com" type="cite"><pre wrap=""> </pre><blockquote type="cite"><pre wrap="">I hope that everybody will agree that doing it in this way: #ifdef PGXC /* In Postgres-XC, stop timestamp has to follow the timeline of GTM */ xlrec.xact_time = xactStopTimestamp + GTMdeltaTimestamp; #else xlrec.xact_time = xactStopTimestamp; #endif </pre></blockquote><pre wrap=""> PGXC chose that style in order to simplify merging. I wouldn't have picked the same thing, but I don't know why it deserves scorn. 
</pre><blockquote type="cite"><pre wrap="">or in this way: xlrec.xact_time = xactUseGTM ? xactStopTimestamp + GTMdeltaTimestamp : xactStopTimestamp; is very very bad idea. </pre></blockquote><pre wrap=""> I don't know why that is such a bad idea. It's a heck of a lot faster than insisting on calling some out-of-line function. It might be a bad idea, but I think we need to decide that, not assume it. </pre></blockquote> It violates modularity, complicates code, makes it more error prone.<br /> I still prefer to extractall DTM code in separate module.<br /> It should not necessary be an extension.<br /> But from the other side - itis not required to put in in core.<br /> At least at this stage. As i already wrote - not just because code is not goodenough or is not reliable enough,<br /> but because I am not sure that it is fits all (or just most) of use cases.<br/><br /><pre class="moz-signature" cols="72">-- Konstantin Knizhnik Postgres Professional: <a class="moz-txt-link-freetext" href="http://www.postgrespro.com">http://www.postgrespro.com</a> The Russian Postgres Company </pre>
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company