Re: [Postgres-xc-developers] Patch ready for review: online data redistribution

The refactoring patch is good to be committed.

On Wed, Jul 11, 2012 at 10:33 AM, Michael Paquier <mic...@gm...
> wrote:

> Updated patch is attached.
> I attach once again the patch for remote copy, still pending for review.
>
> On Tue, Jul 10, 2012 at 6:41 PM, Ashutosh Bapat <
> ash...@en...> wrote:
>
>> 1. Please name the variables as local_hashalgorithm instead of
>> hashalgorithm_loc (loc is also used to mean the location info, e.g.
>> GetRelationLocInfo()).
>>
> Done.
>
> 2. Please rename IsHashDistributable as IsTypeHashDistributable(). Same
>> case with IsModuloDistributable().
>>
> Done.
>
> 3. In function prologues, add information about input and output
>> variables, esp. in case of GetRelationDistributionItems().
>>
> Done.
>
> 4. This is not a change in this patch, but it would be good if you can
>> accommodate it. At line 1102, there is a switch case, which has an action
>> only for a single case, so it had better be replaced with an "if".
>>
> Done.
>
> 5. SortRelationDistributionNodes had better be a macro, as it's not doing
>> anything but calling qsort().
>>
> I don't agree on that. I feel it is clearer to leave it as an external
> function, as it is used afterwards in a more flexible way by redistribution.
>
>
>> 6. Following comment doesn't make much sense, please remove it. The
>> executor state at the time of table creation and querying can be completely
>> different. There is no connection.
>> 1218          * We should use session data because Executor uses it as
>> well to run
>> 1219          * commands on nodes.
>>
> Done.
>
> 7. In GetRelationDistributionNodes(), there are three places where the node
>> sorting function is called. Instead, you should just build the nodeoid array
>> at these three places and call the sorting function at the end. In case we need to add
>> another if case in that function, to get array of nodes in some other way,
>> one has to remember to add the call to sort the nodes array, which can be
>> avoided if you add the call to sort function at the end.
>> BuildRelationDistributionNodes() sorts the nodeoids inside it, but you can
>> take that call out of this function.
>>
> Done. Simplifies code.
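
For illustration only, point 7 could be sketched roughly as below. This is not the code from the patch: `collect_sorted_nodes`, `compare_oids`, the `Oid` typedef and the use of malloc() in place of palloc() are all assumptions made so that the example compiles on its own. Each branch only picks the source of node OIDs, and the single qsort() call sits at the end, so a new branch cannot forget to sort.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int Oid;   /* stand-in for the PostgreSQL Oid type */

/* qsort() comparator: ascending order of node OIDs */
static int
compare_oids(const void *a, const void *b)
{
    Oid x = *(const Oid *) a;
    Oid y = *(const Oid *) b;

    return (x > y) - (x < y);
}

/*
 * Hypothetical stand-in for GetRelationDistributionNodes(): use the
 * user-given subset of nodes if there is one, otherwise all nodes.
 * The branches only choose the input; sorting happens once, at the end.
 */
static Oid *
collect_sorted_nodes(const Oid *subset, int nsubset,
                     const Oid *all_nodes, int nall, int *numnodes)
{
    const Oid *src;
    int        n;
    Oid       *nodes;

    assert(numnodes != NULL);
    *numnodes = 0;

    if (nsubset > 0)
    {
        src = subset;
        n = nsubset;
    }
    else
    {
        src = all_nodes;
        n = nall;
    }

    if (n <= 0)
        return NULL;

    nodes = malloc(sizeof(Oid) * (size_t) n);
    if (nodes == NULL)
        return NULL;
    memcpy(nodes, src, sizeof(Oid) * (size_t) n);

    /* the only sort call in the function */
    qsort(nodes, (size_t) n, sizeof(Oid), compare_oids);

    *numnodes = n;
    return nodes;
}
```

The caller owns the returned array and has to free() it in this sketch; in the backend the memory context would take care of that.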
>
>
>> 8. Probably not your code, but the function BuildRelationDistributionNodes()
>> does a repalloc() for every new nodeoid it finds. Each repalloc is costly.
>> Instead we can allocate memory large enough to contain all members of the
>> list passed. If there are repeated nodes (which is less likely), we
>> will waste a few bytes, but it won't be as expensive as calling repalloc().
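
A minimal sketch of the allocation strategy suggested in point 8, under stated assumptions: `build_distinct_nodes` is a hypothetical name, a plain array stands in for the list passed to BuildRelationDistributionNodes(), and malloc() stands in for palloc(). The output array is sized once for the worst case (no repeated node), so no reallocation happens inside the loop.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int Oid;   /* stand-in for the PostgreSQL Oid type */

/*
 * Hypothetical stand-in for BuildRelationDistributionNodes(): copy the
 * distinct node OIDs of "input" into a freshly allocated array.
 * One allocation of ninput slots replaces one reallocation per node.
 */
static Oid *
build_distinct_nodes(const Oid *input, int ninput, int *numnodes)
{
    Oid *nodes;
    int  count = 0;
    int  i;
    int  j;

    assert(numnodes != NULL);
    *numnodes = 0;

    if (ninput <= 0)
        return NULL;

    /* worst case: every input node is distinct */
    nodes = malloc(sizeof(Oid) * (size_t) ninput);
    if (nodes == NULL)
        return NULL;

    for (i = 0; i < ninput; i++)
    {
        int found = 0;

        /* linear scan is fine for the small node counts assumed here */
        for (j = 0; j < count; j++)
        {
            if (nodes[j] == input[i])
            {
                found = 1;
                break;
            }
        }

        if (!found)
            nodes[count++] = input[i];
    }

    /* slots beyond count stay unused: the "few wasted bytes" */
    *numnodes = count;
    return nodes;
}
```

With repeated nodes in the input, the array is simply a little larger than needed, which is the trade-off described above.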
>>
>
>
>> 9. All the renamed functions are marked as "extern", do you really need
>> them so? Also, I don't understand why these functions are located in heap.c?
>>
> Yes and yes. Do you remember this patch is a base for redistribution?
> Our code has an essential dependency on a static structure inside heap.c
> classifying the tuple attributes called SysAtt. This dependency is really
> important because thanks to that we can check if a chosen distribution
> column is a system column or not. In case it is a system column, we
> return an error.
> These are sufficient reasons to keep this code inside heap.c and heap.h.
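
To make the dependency concrete, here is a simplified sketch of such a check. It does not reproduce the SysAtt structure from heap.c: the file-local table below only lists the system column names of PostgreSQL releases of that era, and `is_system_column` and `distribution_column_is_valid` are hypothetical names. Because the table is static, the check has to live in the same file, which is the argument made above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* simplified, file-local stand-in for the static SysAtt table in heap.c */
static const char *const sys_att_names[] = {
    "ctid", "oid", "xmin", "cmin", "xmax", "cmax", "tableoid"
};

/* true if colname is the name of a system column */
static bool
is_system_column(const char *colname)
{
    size_t i;

    assert(colname != NULL);

    for (i = 0; i < sizeof(sys_att_names) / sizeof(sys_att_names[0]); i++)
    {
        if (strcmp(colname, sys_att_names[i]) == 0)
            return true;
    }
    return false;
}

/*
 * A distribution column is acceptable only if it is not a system column.
 * The real code would raise an error instead of returning false.
 */
static bool
distribution_column_is_valid(const char *colname)
{
    return !is_system_column(colname);
}
```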
>
> I hope regression is sane.
>
> They are, and testing regressions is one of the first things to do when
> reviewing a patch I believe.
>
> We should honestly move faster on those small reviews, and discuss
> the core of redistribution, which is the real purpose here.
> Thanks.
> --
> Michael Paquier
> http://michael.otacoo.com
>

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company