Commit 9c79e64

Frob numeric.c loop so that clang will auto-vectorize it too.

Experimentation shows that clang will auto-vectorize the critical multiplication loop if the termination condition is written "i2 < limit" rather than "i2 <= limit". This seems unbelievably stupid, but I've reproduced it on both clang 9.0.1 (RHEL8) and 11.0.3 (macOS Catalina). gcc doesn't care, so tweak the code to do it that way.

Discussion: https://postgr.es/m/CAJ3gD9evtA_vBo+WMYMyT-u=keHX7-r8p2w7OSRfXf42LTwCZQ@mail.gmail.com

1 parent 87e6ed7 commit 9c79e64
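
The rewrite only changes how the loop bound is expressed, not how many digits are processed: `Min(var2ndigits - 1, res_ndigits - i1 - 3) + 1` equals `Min(var2ndigits, res_ndigits - i1 - 2)`. A minimal sketch that checks this, assuming a local `Min` macro and hypothetical helper names (`old_loop_iterations`, `new_loop_iterations`, `bounds_agree`) that are not part of numeric.c:

```c
#include <assert.h>

/* Local stand-in for the Min() macro used in the diff (assumption). */
#define Min(x, y) ((x) < (y) ? (x) : (y))

/* Iteration count of the old loop form: inclusive bound, "i2 <= i". */
int
old_loop_iterations(int var2ndigits, int res_ndigits, int i1)
{
    int i = Min(var2ndigits - 1, res_ndigits - i1 - 3);
    int n = 0;

    for (int i2 = 0; i2 <= i; i2++)
        n++;
    return n;
}

/* Iteration count of the new loop form: exclusive bound, "i2 < i2limit". */
int
new_loop_iterations(int var2ndigits, int res_ndigits, int i1)
{
    int i2limit = Min(var2ndigits, res_ndigits - i1 - 2);
    int n = 0;

    for (int i2 = 0; i2 < i2limit; i2++)
        n++;
    return n;
}

/*
 * Hypothetical exhaustive check over a small range of inputs.
 * Returns 1 if both forms always run the same number of iterations.
 */
int
bounds_agree(int max)
{
    assert(max >= 0);
    for (int var2ndigits = 0; var2ndigits <= max; var2ndigits++)
        for (int res_ndigits = 0; res_ndigits <= 2 * max + 2; res_ndigits++)
            for (int i1 = 0; i1 <= max; i1++)
                if (old_loop_iterations(var2ndigits, res_ndigits, i1) !=
                    new_loop_iterations(var2ndigits, res_ndigits, i1))
                    return 0;
    return 1;
}
```

Calling `bounds_agree(16)` from a test driver should return 1, including for inputs where the bound is negative and neither loop runs.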

1 file changed (+8 -6 lines changed)
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -8191,7 +8191,6 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
     int         res_weight;
     int         maxdigits;
     int        *dig;
-    int        *dig_i1_2;
     int         carry;
     int         maxdig;
     int         newdig;
@@ -8327,7 +8326,7 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
          * Add the appropriate multiple of var2 into the accumulator.
          *
          * As above, digits of var2 can be ignored if they don't contribute,
-         * so we only include digits for which i1+i2+2 <= res_ndigits - 1.
+         * so we only include digits for which i1+i2+2 < res_ndigits.
          *
          * This inner loop is the performance bottleneck for multiplication,
          * so we want to keep it simple enough so that it can be
@@ -8336,10 +8335,13 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
          * Since we aren't propagating carries in this loop, the order does
          * not matter.
          */
-        i = Min(var2ndigits - 1, res_ndigits - i1 - 3);
-        dig_i1_2 = &dig[i1 + 2];
-        for (i2 = 0; i2 <= i; i2++)
-            dig_i1_2[i2] += var1digit * var2digits[i2];
+        {
+            int         i2limit = Min(var2ndigits, res_ndigits - i1 - 2);
+            int        *dig_i1_2 = &dig[i1 + 2];
+
+            for (i2 = 0; i2 < i2limit; i2++)
+                dig_i1_2[i2] += var1digit * var2digits[i2];
+        }
     }
 
     /*
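
To try the vectorization behavior outside of PostgreSQL, the inner loop can be pulled into a standalone function. The following is only a sketch under stated assumptions: the function name `accumulate_digits` is hypothetical, and the element types (`int` accumulator, `int16_t` digits) are assumptions, since the diff does not show the declarations of `var1digit` and `var2digits`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical standalone copy of the inner multiplication loop.
 * Parameter names mirror the variables in the diff; the types are assumed.
 * Adds var1digit * var2digits[i2] into dig_i1_2[i2] for 0 <= i2 < i2limit,
 * without propagating carries.
 */
void
accumulate_digits(int *dig_i1_2, const int16_t *var2digits,
                  int var1digit, int i2limit)
{
    assert(dig_i1_2 != NULL && var2digits != NULL);

    /* Exclusive bound, as in the committed form of the loop. */
    for (int i2 = 0; i2 < i2limit; i2++)
        dig_i1_2[i2] += var1digit * var2digits[i2];
}
```

One way to see what the compiler did with such a file is clang's optimization remarks, for example `clang -O2 -c kernel.c -Rpass=loop-vectorize -Rpass-missed=loop-vectorize` (the file name is a placeholder). Whether the loop is reported as vectorized depends on the compiler version, flags, and target, so the result for this sketch need not match what the commit message reports for numeric.c.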
