Pairwise summation: Difference between revisions

Revision as of 21:59, 18 March 2010

In numerical analysis, pairwise summation, also called cascade summation, is a technique to sum a sequence of finite-precision floating-point numbers that substantially reduces the accumulated round-off error compared to naively accumulating the sum in sequence.^[1] Although there are other techniques such as Kahan summation that typically have even smaller round-off errors, pairwise summation is nearly as good (differing only by a logarithmic factor) while having much lower computational cost—it can be implemented so as to have nearly the same cost (and exactly the same number of arithmetic operations) as naive summation.

In particular, pairwise summation of a sequence of n numbers x_1, …, x_n works by recursively breaking the sequence into two halves, summing each half, and adding the two sums: a divide and conquer algorithm. Its roundoff errors grow asymptotically as at most O(ε log n), where ε is the machine precision (assuming a fixed condition number, as discussed below).^[1] In comparison, the naive technique of accumulating the sum in sequence (adding each x_i one at a time for i=1,...,n) has roundoff errors that grow at worst as O(εn).^[1] Kahan summation has a worst-case error of roughly O(ε), independent of n, but requires several times more arithmetic operations.^[1] If the roundoff errors are random, and in particular have random signs, then they form a random walk and the error growth is reduced to an average of $O(\varepsilon {\sqrt {\log n}})$ for pairwise summation.^[2]

Precisely the recursive structure of pairwise summation is found in many fast Fourier transform (FFT) algorithms, and is responsible for the same slow roundoff accumulation of those FFTs.^[2]^[3]

The algorithm

In pseudocode, the pairwise summation algorithm for an array x of length n can be written:

s = pairwise(x[1…n])
      if n ≤ N                    base case: naive summation for a sufficiently small array
          s = x[1]
          for i = 2 to n
              s = s + x[i]
      else                        divide and conquer: recursively sum two halves of the array
          m = floor(n / 2)
          s = pairwise(x[1…m]) + pairwise(x[m+1…n])
      endif

For some sufficiently small N, this algorithm switches to a naive loop-based summation as a base case, whose error bound is O(εN). Therefore, the entire sum has a worst-case error that grows asymptotically as O(εN log n) for large n, for a given condition number (see below), and the smallest error bound is attained for N=1.

However, in an algorithm of this sort (as for divide and conquer algorithms in general^[4]) it is desirable to use a larger base case in order to amortize the overhead of the recursion. If N=1, then there is roughly one recursive subroutine call for every input, but more generally there is one recursive call for (roughly) every N inputs. By making N sufficiently large, the overhead of recursion can be made negligible (precisely this technique of a large base case is employed by high-performance FFT implementations^[3]).

Regardless of N, exactly n−1 additions are performed in total, the same as for naive summation, so if the recursion overhead is made negligible then pairwise summation has essentially the same computational cost as for naive summation.
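The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not a tuned implementation: the function name `pairwise_sum`, the parameter name `base` (standing in for N), and its default value of 128 are assumptions made for the example, and the indices are 0-based rather than 1-based.

```python
def pairwise_sum(x, base=128):
    """Sum the sequence x by recursive halving (pairwise summation).

    `base` plays the role of N in the pseudocode: any block of at most
    `base` elements is summed with a plain loop. (Hypothetical name and
    default value, chosen only for this sketch.)
    """
    if base < 1:
        raise ValueError("base must be at least 1")
    if len(x) == 0:
        return 0.0

    def rec(lo, hi):
        # Sum the half-open range x[lo:hi], which is never empty here.
        n = hi - lo
        if n <= base:
            # Base case: naive summation, n - 1 additions.
            s = x[lo]
            for i in range(lo + 1, hi):
                s += x[i]
            return s
        # Divide and conquer: sum the two halves and add the results.
        mid = lo + n // 2
        return rec(lo, mid) + rec(mid, hi)

    return rec(0, len(x))
```

Passing index ranges instead of slicing avoids copying the array at every level, so the only extra cost relative to a naive loop is the function-call overhead, which shrinks as `base` grows.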

Superblock summation

A variation on this idea is to break the sum into b blocks at each recursive stage, sum each block recursively, and then sum the results; its proposers dubbed this a "superblock" algorithm.^[5] The above pairwise algorithm corresponds to b=2 for every stage except for the last stage, which is b=N.

s = superblock_b(x[1…n])
      if n ≤ b                    base case: naive summation for a sufficiently small array
          s = x[1]
          for i = 2 to n
              s = s + x[i]
      else                        divide and conquer: recursively sum b blocks of the array
          m = floor(n / b)
          s = superblock_b(x[1…m])
          for i = 2 to b−1
              s = s + superblock_b(x[m⋅(i−1)+1…m⋅i])
          s = s + superblock_b(x[m⋅(b−1)+1…n])
      endif

For t recursive steps, the block size that minimizes the worst-case error is b = n^(1/t), or equivalently t = log_b n.^[5] The case t=1 corresponds to naive summation and t=log₂n corresponds to pairwise summation (b=2). The worst-case error bound then scales as O(b log_b n).^[5]
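A Python sketch of the superblock recursion might look like the following. The function name `superblock_sum` and the default `b=4` are illustrative assumptions, not taken from the cited paper; the block boundaries follow the pseudocode above, with the last block absorbing any remainder.

```python
def superblock_sum(x, b=4):
    """Sum x by splitting it into b blocks at every recursive stage.

    Hypothetical helper for illustration; b=2 reduces to pairwise
    summation with a base case of two elements.
    """
    if b < 2:
        raise ValueError("b must be at least 2")
    if len(x) == 0:
        return 0.0

    def rec(lo, hi):
        # Sum the half-open range x[lo:hi] (0-based, never empty).
        n = hi - lo
        if n <= b:
            # Base case: naive summation of a short block.
            s = x[lo]
            for i in range(lo + 1, hi):
                s += x[i]
            return s
        m = n // b  # length of each block except possibly the last
        s = rec(lo, lo + m)
        for k in range(1, b - 1):
            s += rec(lo + k * m, lo + (k + 1) * m)
        # The last block runs to the end and absorbs the remainder.
        s += rec(lo + (b - 1) * m, hi)
        return s

    return rec(0, len(x))
```

Because every block handed to the recursion is strictly shorter than its parent and never empty, the recursion terminates for any b ≥ 2.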

Accuracy

Suppose that one is summing n values x_i, for i=1,...,n. The exact sum is:

$$S_{n}=\sum _{i=1}^{n}x_{i}$$

(computed with infinite precision)

With pairwise summation for a base case N=1, one instead obtains $S_{n}+E_{n}$, where the error $E_{n}$ is bounded above by:^[1]

$$|E_{n}|\leq {\frac {\varepsilon \log _{2}n}{1-\varepsilon \log _{2}n}}\sum _{i=1}^{n}|x_{i}|$$

where ε is the machine precision of the arithmetic being employed (e.g. ε≈10⁻¹⁶ for standard double precision floating point). Usually, the quantity of interest is the relative error $|E_{n}|/|S_{n}|$, which is therefore bounded above by:

$${\frac {|E_{n}|}{|S_{n}|}}\leq {\frac {\varepsilon \log _{2}n}{1-\varepsilon \log _{2}n}}\left({\frac {\sum _{i=1}^{n}|x_{i}|}{\left|\sum _{i=1}^{n}x_{i}\right|}}\right).$$

In the expression for the relative error bound, the fraction (Σ|x_i|/|Σx_i|) is the condition number of the summation problem. Essentially, the condition number represents the intrinsic sensitivity of the summation problem to errors, regardless of how it is computed.^[6] The relative error bound of every (backwards stable) summation method by a fixed algorithm in fixed precision (i.e. not those that use arbitrary precision arithmetic, nor algorithms whose memory and time requirements change based on the data), is proportional to this condition number.^[1] An ill-conditioned summation problem is one in which this ratio is large, and in this case even pairwise summation can have a large relative error. For example, if the summands x_i are uncorrelated random numbers with zero mean, the sum is a random walk and the condition number will grow proportional to ${\sqrt {n}}$. On the other hand, for random inputs with nonzero mean the condition number asymptotes to a finite constant as $n\to \infty$. If the inputs are all non-negative, then the condition number is 1.
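As an illustration, the condition number can be estimated numerically along these lines. The helper name `condition_number` is an assumption for this sketch; it relies on the standard-library `math.fsum` so that the two sums in the ratio are themselves accurately rounded.

```python
import math

def condition_number(x):
    """Return sum(|x_i|) / |sum(x_i)| for a sequence of floats.

    Hypothetical helper for illustration. math.fsum is used so that
    rounding in the two sums does not distort the ratio.
    """
    numerator = math.fsum(abs(v) for v in x)
    denominator = abs(math.fsum(x))
    if denominator == 0.0:
        # The sum cancels exactly: the relative error is unbounded.
        return math.inf
    return numerator / denominator
```

For instance, `condition_number([1.0, 2.0, 3.0])` is 1.0 because all inputs are non-negative, while `condition_number([1.0, -0.5])` is 3.0 because of partial cancellation.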

Note that the $1-\varepsilon \log _{2}n$ denominator is effectively 1 in practice, since $\varepsilon \log _{2}n$ is much smaller than 1 until n becomes of order $2^{1/\varepsilon}$, which is roughly $10^{10^{15}}$ in double precision.
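As a rough check of that estimate, assuming the value ε ≈ 10⁻¹⁶ quoted above for double precision:

$$2^{1/\varepsilon }\approx 2^{10^{16}}=10^{10^{16}\log _{10}2}\approx 10^{3\times 10^{15}},$$

so the exponent is of order 10¹⁵, far beyond the length of any sequence that could be stored or summed in practice.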

In comparison, the relative error bound for naive summation (simply adding the numbers in sequence, rounding at each step) grows as $O(\varepsilon n)$ multiplied by the condition number.^[1] In practice, it is much more likely that the rounding errors have a random sign, with zero mean, so that they form a random walk; in this case, naive summation has a root mean square relative error that grows as $O(\varepsilon {\sqrt {n}})$ and pairwise summation has an error that grows as $O(\varepsilon {\sqrt {\log n}})$ on average.^[2]
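The difference in error growth can be observed with a small experiment such as the one below. It is only a sketch under stated assumptions: the input (one million copies of 0.1) is an arbitrary well-conditioned test case, `naive_sum` and `pairwise_sum` are illustrative names, the base case is N=1, and `math.fsum` serves as the reference for the exact sum. An explicit loop is used for the naive sum because the built-in `sum` may apply compensated summation to floats in recent Python versions.

```python
import math

def naive_sum(x):
    # Accumulate in sequence, rounding after every addition.
    s = 0.0
    for v in x:
        s += v
    return s

def pairwise_sum(x, lo=0, hi=None):
    # Pairwise summation with base case N = 1 over x[lo:hi].
    if hi is None:
        hi = len(x)
    n = hi - lo
    if n == 0:
        return 0.0
    if n == 1:
        return x[lo]
    mid = lo + n // 2
    return pairwise_sum(x, lo, mid) + pairwise_sum(x, mid, hi)

x = [0.1] * 10**6
exact = math.fsum(x)  # correctly rounded reference sum

err_naive = abs(naive_sum(x) - exact)
err_pairwise = abs(pairwise_sum(x) - exact)
print("naive error:   ", err_naive)
print("pairwise error:", err_pairwise)
```

Since all inputs are positive, the condition number is 1 and the absolute errors printed here differ from the relative errors only by the constant factor of the sum itself.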

References

[1] Higham, Nicholas J. (1993), "The accuracy of floating point summation", SIAM Journal on Scientific Computing, 14 (4): 783–799, doi:10.1137/0914050
[2] Manfred Tasche and Hansmartin Zeuner, Handbook of Analytic-Computational Methods in Applied Mathematics (Boca Raton, FL: CRC Press, 2000).
[3] S. G. Johnson and M. Frigo, "Implementing FFTs in practice," in Fast Fourier Transforms, edited by C. Sidney Burrus (2008).
[4] Radu Rugina and Martin Rinard, "Recursion unrolling for divide and conquer programs," in Languages and Compilers for Parallel Computing, chapter 3, pp. 34–48. Lecture Notes in Computer Science vol. 2017 (Berlin: Springer, 2001).
[5] Anthony M. Castaldo, R. Clint Whaley, and Anthony T. Chronopoulos, "Reducing floating-point error in dot product using the superblock family of algorithms," SIAM J. Sci. Comput., vol. 32, pp. 1156–1174 (2008).
[6] L. N. Trefethen and D. Bau, Numerical Linear Algebra (SIAM: Philadelphia, 1997).

@@ Line 45: / Line 45: @@
        endif
-For ''t'' recursive steps, the block size that minimizes the worst-case error is ''b''=''n''<sup>1/''t''</sup>.<ref name=Castaldo08/>  The case ''t''=1 corresponds to naive summation and ''t''=log<sub>2</sub>''n'' corresponds to pairwise summation (''b''=2).  The worst-case error bound then scales as ''O''(''t''&nbsp;''b'').<ref name=Castaldo08/>.
+For ''t'' recursive steps, the block size that minimizes the worst-case error is ''b''=''n''<sup>1/''t''</sup>, or equivalently ''t''=log<sub>''b''</sub>''n''.<ref name=Castaldo08/>  The case ''t''=1 corresponds to naive summation and ''t''=log<sub>2</sub>''n'' corresponds to pairwise summation (''b''=2).  The worst-case error bound then scales as ''O''(''b''&nbsp;log<sub>''b''</sub>''n'').<ref name=Castaldo08/>.
 ==Accuracy==