Fredrik Johansson: Division: the sequel (with bonus material)

I have neglected to write a GSoC update for two weeks (where "two" was computed using round-to-floor), so let me first apologize for that.

In my last post, I wrote about how Newton's method (coupled with a fast multiplication algorithm, such as Karatsuba multiplication) can be used for asymptotically fast division of long integers. I have now fixed up the code to compute a correctly rounded quotient. In fact, it now performs the full divmod operation, returning both the quotient and a correct remainder.

The trick is to perform an extra exact multiplication at the end to determine the remainder. By definition, the quotient r of p/q is correct if the remainder m = p - r·q satisfies 0 ≤ m < q. If this inequality does not hold, one need only perform an additional divmod operation (which can be performed using standard long division, since the error will almost certainly fit in a single limb) to correct both the quotient and the remainder.

The extra multiplication causes some slowdown, but it's not too bad. The new idivmod function still breaks even with the builtin divmod somewhere around 1000-2000 digits and is 10 times faster at half a million digits (i.e. when dividing a million digit number by a half-million digit number):



>>> time_division(int, 2.0, 20)
      size     old time     new time       faster  correct
------------------------------------------------------------
        16     0.000008     0.000052     0.155080     True
        32     0.000005     0.000052     0.102151     True
        64     0.000008     0.000059     0.132075     True
       128     0.000013     0.000070     0.190476     True
       256     0.000035     0.000107     0.325521     True
       512     0.000126     0.000215     0.583658     True
      1024     0.000431     0.000532     0.810399     True
      2048     0.001855     0.001552     1.195104     True
      4096     0.007154     0.005050     1.416708     True
      8192     0.028505     0.015449     1.845033     True
     16384     0.111193     0.046938     2.368925     True
     32768     0.443435     0.142551     3.110706     True
     65536     1.778292     0.432412     4.112497     True
    131072     7.110184     1.305771     5.445200     True
    262144    28.596926     3.919399     7.296253     True
    524288   116.069764    11.804032     9.833061     True

As a bonus, a fast divmod also provides a fast way to convert long integers to decimal strings, by recursively splitting n in half using L, R = divmod(n, 10^b/2) where b is the number of digits in n:


>>> time_str(int, 20)
      size     old time     new time       faster  correct
------------------------------------------------------------
        16     0.000005     0.000013     0.413043     True
        32     0.000006     0.000009     0.588235     True
        64     0.000008     0.000012     0.674419     True
       128     0.000020     0.000023     0.865854     True
       256     0.000059     0.000133     0.442105     True
       512     0.000204     0.000333     0.613255     True
      1024     0.000895     0.001194     0.749708     True
      2048     0.003505     0.002252     1.556824     True
      4096     0.013645     0.006600     2.067429     True
      8192     0.052386     0.018334     2.857358     True
     16384     0.209164     0.052233     4.004412     True
     32768     0.834201     0.153238     5.443827     True
     65536     3.339629     0.450897     7.406639     True
    131072    13.372223     1.339044     9.986392     True
    262144    53.547894     3.998352    13.392491     True
    524288   214.847486    11.966933    17.953429     True

So printing an integer with half a million digits takes 12 seconds instead of 3.5 minutes. This can be quite a usability improvement.

The code can be found in this file: div.py.

I have now submitted a request for these algorithms to be implemented in the Python core. Since the pure Python implementation is very simple, I don't think porting it to C would be all that difficult. By "request", I mean that I might eventually do it myself, but there are many others who are much more familiar with both the Python codebase and the C language, and if any of those persons happens to have a few free hours, they could certainly do it both faster and better. If you are interested in helping out with this, please post to the issue tracker.

The division code is just one of several small projects I've been working on lately. Basically, I've found that there are some arithmetic functions that are needed very frequently in all kinds of mathematical code. These include:

Pure convenience functions like sign, product, ...

Bit counting

Radix conversion

Integer square roots

Rational (and complex rational) arithmetic

Integer factorization, multiplicity tests, gcd, etc

Factorials, binomial coefficients, etc

Such utility functions are currently scattered throughout mpmath, SymPy and SympyCore codebases. Many of them are hack-ish, duplicates, and/or don't always work/have very poor worst-case performance. Right now, I'm trying to collect them into a single module and optimizing / strain-hardening them.

The current result of this effort can be found here (be aware that it's very much a work in progress). Among other things, it includes the mentioned fast division code, fast square root code based on my implementation from mpmath, much needed improvements to the nth root and integer factorization code I wrote for SymPy, plus the extremely fast multinomial coefficient code optimized by Pearu and myself for SympyCore (which, if I correctly recall the results of previous benchmarking, makes polynomial powering in SympyCore faster than Singular).

My plan is to split this off to a standalone project, as it could be useful to other people as such, but keeping it a single, self-contained .py file that is easy to include in mpmath and SymPy. There won't be too much feature creep (hopefully); the advanced number theory functions in SymPy won't move, nor will the floating-point functions from mpmath.

Finally, I have done a bit more work on the adaptive numerical evaluation code for SymPy. The main new features are support for converting approximate zeros to exact zeros (thereby preventing the algorithm from hanging when it encounters a nonsimplified zero expression, and overall prettifying output), and support for logarithms and powers/exponentials of complex numbers. See evalf.py and test_evalf.py. I've also done miscellaneous other work on SymPy, such as patching the existing evalf code to use mpmath and getting rid of the global precision.

Fredrik Johansson

Monday, July 28, 2008

Division: the sequel (with bonus material)

No comments:

About Me

Blog Archive

Other blogs