Hi again Reinhard,

On Fri, Nov 25, 2016 at 09:17:41AM +0100, Reinhard Vicinus wrote:
> Hi Willy,
> 
> if the cost are to high, then I have no problem keeping the known
> behavior. The only thing I would suggest is to document it, because it
> caused me some headache to figure out why the values were always to low
> and I couldn't find any information, that this behavior is a know problem.

I finally found a much more elegant solution by improving the formula to
save one multiply. Not only does this avoid the overflow without changing
the integer size, it's also faster :-)

Now the limit is at 8.4M milliseconds of average time, or around 2h18m,
this should be plenty for most situations! I'm attaching the patch I've
just merged for this.

Best regards,
Willy
>From 3758581e197dd7b390fb3da94c08f34e0d319c07 Mon Sep 17 00:00:00 2001
From: Willy Tarreau <[email protected]>
Date: Fri, 25 Nov 2016 11:55:10 +0100
Subject: BUG/MINOR: freq-ctr: make swrate_add() support larger values
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4

Reinhard Vicinus reported that the reported average response times cannot
be larger than 16s due to the double multiply being performed by
swrate_add() which causes an overflow very quickly. Indeed, with N=512,
the highest average value is 16448.

One solution proposed by Reinhard is to turn to long long, but this
involves 64x64 multiplies and 64->32 divides, which are extremely
expensive on 32-bit platforms.

There is in fact another way to avoid the overflow without using larger
integers, it consists in avoiding the multiply using the fact that
x*(n-1)/N = x-(x/N).

Now it becomes possible to store average values as large as 8.4 millions,
which is around 2h18mn.

Interestingly, this improvement also makes the code cheaper to execute
both on 32 and on 64 bit platforms :

Before :

00000000 <swrate_add>:
   0:   8b 54 24 04             mov    0x4(%esp),%edx
   4:   8b 0a                   mov    (%edx),%ecx
   6:   89 c8                   mov    %ecx,%eax
   8:   c1 e0 09                shl    $0x9,%eax
   b:   29 c8                   sub    %ecx,%eax
   d:   8b 4c 24 0c             mov    0xc(%esp),%ecx
  11:   c1 e8 09                shr    $0x9,%eax
  14:   01 c8                   add    %ecx,%eax
  16:   89 02                   mov    %eax,(%edx)

After :

00000020 <swrate_add>:
  20:   8b 4c 24 04             mov    0x4(%esp),%ecx
  24:   8b 44 24 0c             mov    0xc(%esp),%eax
  28:   8b 11                   mov    (%ecx),%edx
  2a:   01 d0                   add    %edx,%eax
  2c:   81 c2 ff 01 00 00       add    $0x1ff,%edx
  32:   c1 ea 09                shr    $0x9,%edx
  35:   29 d0                   sub    %edx,%eax
  37:   89 01                   mov    %eax,(%ecx)

This fix may be backported to 1.6.
---
 include/proto/freq_ctr.h | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/proto/freq_ctr.h b/include/proto/freq_ctr.h
index 65388b1..70b295e 100644
--- a/include/proto/freq_ctr.h
+++ b/include/proto/freq_ctr.h
@@ -182,7 +182,19 @@ unsigned int freq_ctr_remain_period(struct freq_ctr_period 
*ctr, unsigned int pe
  *
  * So basically by summing values and applying the last result an (N-1)/N 
factor
  * we just get N times the values over the long term, so we can recover the
- * constant value V by dividing by N.
+ * constant value V by dividing by N. In order to limit the impact of integer
+ * overflows, we'll use this equivalence which saves us one multiply :
+ *
+ *               N - 1                   1             x0
+ *    x1 = x0 * -------   =  x0 * ( 1 - --- )  = x0 - ----
+ *                 N                     N              N
+ *
+ * And given that x0 is discrete here we'll have to saturate the values before
+ * performing the divide, so the value insertion will become :
+ *
+ *               x0 + N - 1
+ *    x1 = x0 - ------------
+ *                    N
  *
  * A value added at the entry of the sliding window of N values will thus be
  * reduced to 1/e or 36.7% after N terms have been added. After a second batch,
@@ -220,7 +232,7 @@ unsigned int freq_ctr_remain_period(struct freq_ctr_period 
*ctr, unsigned int pe
  */
 static inline unsigned int swrate_add(unsigned int *sum, unsigned int n, 
unsigned int v)
 {
-       return *sum = *sum * (n - 1) / n + v;
+       return *sum = *sum - (*sum + n - 1) / n + v;
 }
 
 /* Returns the average sample value for the sum <sum> over a sliding window of
-- 
1.7.12.1

Reply via email to