Re: The plan for FDW-based sharding
From:        Konstantin Knizhnik
Subject:     Re: The plan for FDW-based sharding
Date:
Msg-id:      [email protected]
In reply to: Re: The plan for FDW-based sharding (Robert Haas <[email protected]>)
Responses:   Re: The plan for FDW-based sharding
List:        pgsql-hackers
Thank you very much for your comments.

On 01.03.2016 18:19, Robert Haas wrote:
> On Sat, Feb 27, 2016 at 2:29 AM, Konstantin Knizhnik
> <[email protected]> wrote:
>>> How do you prevent clock skew from causing serialization anomalies?
>>
>> If a node receives a message from the "future", it just needs to wait
>> until this future arrives.
>> Practically we just "adjust" the system time in this case, moving it
>> forward (certainly the system time is not actually changed, we just set a
>> correction value which needs to be added to the system time).
>> This approach was discussed in the article:
>> http://research.microsoft.com/en-us/people/samehe/clocksi.srds2013.pdf
>> I hope the algorithm is explained in this article much better than I can
>> do here.
>
> Hmm, the approach in that article is very interesting, but it sounds
> different than what you are describing - they do not, AFAICT, have
> anything like a "correction value"

In the article they use the notion of a "wait":

    if T.SnapshotTime > GetClockTime()
    then wait until T.SnapshotTime < GetClockTime()

Originally we really did sleep here, but then we realized that instead of
sleeping we can just adjust the local time.
Sorry, I do not have a formal proof that it is equivalent, but... at least
we have not encountered any inconsistencies after this fix, and performance
is improved.
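To make the difference concrete, here is a rough sketch in C of both
variants. This is only an illustration, not the actual pg_tsdtm code; names
such as local_time_offset, wait_for_snapshot_time and observe_snapshot_time
are invented for the example. The first function implements the ClockSI
rule and sleeps until the local clock catches up; the second never sleeps
and instead accumulates the observed skew in a correction value which is
added to every subsequent clock reading.

    #include <stdint.h>
    #include <unistd.h>
    #include <sys/time.h>

    typedef uint64_t timestamp_us;          /* microseconds since the epoch */

    static timestamp_us local_time_offset;  /* correction added to the system clock */

    static timestamp_us
    get_system_time(void)
    {
        struct timeval tv;

        gettimeofday(&tv, NULL);
        return (timestamp_us) tv.tv_sec * 1000000 + tv.tv_usec;
    }

    /* Variant 1: the rule from the ClockSI paper - sleep until the clock catches up */
    static void
    wait_for_snapshot_time(timestamp_us snapshot_time)
    {
        while (snapshot_time > get_system_time())
            usleep(10);
    }

    /* Variant 2: never sleep - remember the skew and report an adjusted time */
    static timestamp_us
    get_adjusted_time(void)
    {
        return get_system_time() + local_time_offset;
    }

    static void
    observe_snapshot_time(timestamp_us snapshot_time)
    {
        timestamp_us now = get_adjusted_time();

        if (snapshot_time > now)            /* a timestamp "from the future" */
            local_time_offset += snapshot_time - now;
    }

The second variant never blocks the backend: a timestamp received "from the
future" just pushes the node's notion of "now" forward, which is where the
performance improvement comes from.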
</pre></blockquote><pre wrap=""> Well I still think what I said before is valid. If the code is good, let it be a core submission. If it's not ready yet, submit it to core when it is. If it can't be made good, forget it.</pre></blockquote><br /> I have nothing against committing DTM code incore. But still the best way of integration it is to use a-la-OO approach.<br /> So still need API. Inserting if-s or switchesin existed code is IMHO ugly idea.<br /><br /> Also it is not enough for DTM code to be just "good". It should provideexpected functionality.<br /> But which functionality is expected? From my experience of development different clustersolutions I can say that<br /> different customers have very different requirements. It is very hard if ever possibleto satisfy them all.<br /><br /> Right now I do not feel that I can predict all possible requirements to DTM.<br/> This is why we want to provide some API, propose some implementations of this API, receive feedbecks and get betterunderstanding which functionality is actually needed by customers.<br /><br /><br /><blockquote cite="mid:CA+TgmobUnzf+A_Fk_EYXTnHiwVKZsKV=gRVuzTCoibhPWhq2UA@mail.gmail.com"type="cite"><pre wrap=""> </pre><blockquote type="cite"><blockquote type="cite"><pre wrap="">This seems rather defeatist. If the code is good andreliable, why should it not be committed to core? </pre></blockquote><pre wrap=""> Two reasons: 1. There is no ideal implementation of DTM which will fit all possible needs and be efficient for all clusters. </pre></blockquote><pre wrap=""> Hmm, what is the reasoning behind that statement? I mean, it is certainly true that there are some places where we have decided that one-size-fits-all is not the right approach. Indexing, for example. But there are many other places where we have not chosen to make things pluggable, and that I don't think it should be taken for granted that plugability is always an advantage. I fear that building a DTM that is fully reliable and also well-performing is going to be really hard, and I think it would be far better to have one such DTM that is 100% reliable than two or more implementations each of which are 99% reliable. </pre></blockquote><br /> The question is not about it's reliability, but mostly about its functionality and flexibility.<br/><br /><blockquote cite="mid:CA+TgmobUnzf+A_Fk_EYXTnHiwVKZsKV=gRVuzTCoibhPWhq2UA@mail.gmail.com" type="cite"><prewrap=""> </pre><blockquote type="cite"><pre wrap="">2. Even if such implementation exists, still the right way of it integration is Postgres should use kind of TM API. </pre></blockquote><pre wrap=""> Sure, APIs are generally good, but that doesn't mean *this* API is good.</pre></blockquote><br /> Well, I do not what tosay "better than nothing", but I find this API to be a reasonable compromise between flexibility and minimization of changesin PostgreSQL core. If you have some suggestions how to improve it, I will be glad to receive them.<br /><br /><blockquotecite="mid:CA+TgmobUnzf+A_Fk_EYXTnHiwVKZsKV=gRVuzTCoibhPWhq2UA@mail.gmail.com" type="cite"><pre wrap=""> </pre><blockquote type="cite"><pre wrap="">I hope that everybody will agree that doing it in this way: #ifdef PGXC /* In Postgres-XC, stop timestamp has to follow the timeline of GTM */ xlrec.xact_time = xactStopTimestamp + GTMdeltaTimestamp; #else xlrec.xact_time = xactStopTimestamp; #endif </pre></blockquote><pre wrap=""> PGXC chose that style in order to simplify merging. I wouldn't have picked the same thing, but I don't know why it deserves scorn. 
</pre><blockquote type="cite"><pre wrap="">or in this way: xlrec.xact_time = xactUseGTM ? xactStopTimestamp + GTMdeltaTimestamp : xactStopTimestamp; is very very bad idea. </pre></blockquote><pre wrap=""> I don't know why that is such a bad idea. It's a heck of a lot faster than insisting on calling some out-of-line function. It might be a bad idea, but I think we need to decide that, not assume it. </pre></blockquote> It violates modularity, complicates code, makes it more error prone.<br /> I still prefer to extractall DTM code in separate module.<br /> It should not necessary be an extension.<br /> But from the other side - itis not required to put in in core.<br /> At least at this stage. As i already wrote - not just because code is not goodenough or is not reliable enough,<br /> but because I am not sure that it is fits all (or just most) of use cases.<br/><br /><pre class="moz-signature" cols="72">-- Konstantin Knizhnik Postgres Professional: <a class="moz-txt-link-freetext" href="http://www.postgrespro.com">http://www.postgrespro.com</a> The Russian Postgres Company </pre>
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company