In my last post, I wrote about how Newton's method (coupled with a fast multiplication algorithm, such as Karatsuba multiplication) can be used for asymptotically fast division of long integers. I have now fixed up the code to compute a correctly rounded quotient. In fact, it now performs the full divmod operation, returning both the quotient and a correct remainder.
The trick is to perform an extra exact multiplication at the end to determine the remainder. By definition, the quotient r of p/q is correct if the remainder m = p - r·q satisfies 0 ≤ m < q. If this inequality does not hold, one need only perform an additional divmod operation (which can be performed using standard long division, since the error will almost certainly fit in a single limb) to correct both the quotient and the remainder.
The extra multiplication causes some slowdown, but it's not too bad. The new idivmod function still breaks even with the builtin divmod somewhere around 1000-2000 digits and is 10 times faster at half a million digits (i.e. when dividing a million digit number by a half-million digit number):
>>> time_division(int, 2.0, 20)
size old time new time faster correct
------------------------------------------------------------
16 0.000008 0.000052 0.155080 True
32 0.000005 0.000052 0.102151 True
64 0.000008 0.000059 0.132075 True
128 0.000013 0.000070 0.190476 True
256 0.000035 0.000107 0.325521 True
512 0.000126 0.000215 0.583658 True
1024 0.000431 0.000532 0.810399 True
2048 0.001855 0.001552 1.195104 True
4096 0.007154 0.005050 1.416708 True
8192 0.028505 0.015449 1.845033 True
16384 0.111193 0.046938 2.368925 True
32768 0.443435 0.142551 3.110706 True
65536 1.778292 0.432412 4.112497 True
131072 7.110184 1.305771 5.445200 True
262144 28.596926 3.919399 7.296253 True
524288 116.069764 11.804032 9.833061 True
As a bonus, a fast divmod also provides a fast way to convert long integers to decimal strings, by recursively splitting n in half using L, R = divmod(n, 10b/2) where b is the number of digits in n:
>>> time_str(int, 20)
size old time new time faster correct
------------------------------------------------------------
16 0.000005 0.000013 0.413043 True
32 0.000006 0.000009 0.588235 True
64 0.000008 0.000012 0.674419 True
128 0.000020 0.000023 0.865854 True
256 0.000059 0.000133 0.442105 True
512 0.000204 0.000333 0.613255 True
1024 0.000895 0.001194 0.749708 True
2048 0.003505 0.002252 1.556824 True
4096 0.013645 0.006600 2.067429 True
8192 0.052386 0.018334 2.857358 True
16384 0.209164 0.052233 4.004412 True
32768 0.834201 0.153238 5.443827 True
65536 3.339629 0.450897 7.406639 True
131072 13.372223 1.339044 9.986392 True
262144 53.547894 3.998352 13.392491 True
524288 214.847486 11.966933 17.953429 True
So printing an integer with half a million digits takes 12 seconds instead of 3.5 minutes. This can be quite a usability improvement.
The code can be found in this file: div.py.
I have now submitted a request for these algorithms to be implemented in the Python core. Since the pure Python implementation is very simple, I don't think porting it to C would be all that difficult. By "request", I mean that I might eventually do it myself, but there are many others who are much more familiar with both the Python codebase and the C language, and if any of those persons happens to have a few free hours, they could certainly do it both faster and better. If you are interested in helping out with this, please post to the issue tracker.
The division code is just one of several small projects I've been working on lately. Basically, I've found that there are some arithmetic functions that are needed very frequently in all kinds of mathematical code. These include:
- Pure convenience functions like sign, product, ...
- Bit counting
- Radix conversion
- Integer square roots
- Rational (and complex rational) arithmetic
- Integer factorization, multiplicity tests, gcd, etc
- Factorials, binomial coefficients, etc
Such utility functions are currently scattered throughout mpmath, SymPy and SympyCore codebases. Many of them are hack-ish, duplicates, and/or don't always work/have very poor worst-case performance. Right now, I'm trying to collect them into a single module and optimizing / strain-hardening them.
The current result of this effort can be found here (be aware that it's very much a work in progress). Among other things, it includes the mentioned fast division code, fast square root code based on my implementation from mpmath, much needed improvements to the nth root and integer factorization code I wrote for SymPy, plus the extremely fast multinomial coefficient code optimized by Pearu and myself for SympyCore (which, if I correctly recall the results of previous benchmarking, makes polynomial powering in SympyCore faster than Singular).
My plan is to split this off to a standalone project, as it could be useful to other people as such, but keeping it a single, self-contained .py file that is easy to include in mpmath and SymPy. There won't be too much feature creep (hopefully); the advanced number theory functions in SymPy won't move, nor will the floating-point functions from mpmath.
Finally, I have done a bit more work on the adaptive numerical evaluation code for SymPy. The main new features are support for converting approximate zeros to exact zeros (thereby preventing the algorithm from hanging when it encounters a nonsimplified zero expression, and overall prettifying output), and support for logarithms and powers/exponentials of complex numbers. See evalf.py and test_evalf.py. I've also done miscellaneous other work on SymPy, such as patching the existing evalf code to use mpmath and getting rid of the global precision.