## Monday, July 28, 2008

### Division: the sequel (with bonus material)

I have neglected to write a GSoC update for two weeks (where "two" was computed using round-to-floor), so let me first apologize for that.

In my last post, I wrote about how Newton's method (coupled with a fast multiplication algorithm, such as Karatsuba multiplication) can be used for asymptotically fast division of long integers. I have now fixed up the code to compute a correctly rounded quotient. In fact, it now performs the full divmod operation, returning both the quotient and a correct remainder.

The trick is to perform an extra exact multiplication at the end to determine the remainder. By definition, the quotient r of p/q is correct if the remainder m = p - r·q satisfies 0 ≤ m < q. If this inequality does not hold, one need only perform an additional divmod operation (which can be performed using standard long division, since the error will almost certainly fit in a single limb) to correct both the quotient and the remainder.
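In modern Python syntax, the correction step can be sketched roughly as follows (a simplified model, not the actual idivmod code; `corrected_divmod` is a hypothetical helper name):

```python
def corrected_divmod(p, q, r):
    """Given an approximate quotient r of p // q that may be off by a
    small amount, recover a provably correct (quotient, remainder) pair
    with one exact multiplication."""
    m = p - r*q              # candidate remainder
    if not (0 <= m < q):     # r was slightly off
        d, m = divmod(m, q)  # the error is tiny, so this division is cheap
        r += d
    return r, m
```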

The extra multiplication causes some slowdown, but it's not too bad. The new idivmod function still breaks even with the builtin divmod somewhere around 1000-2000 digits and is 10 times faster at half a million digits (i.e. when dividing a million digit number by a half-million digit number):

```
>>> time_division(int, 2.0, 20)
      size     old time     new time       faster  correct
------------------------------------------------------------
        16     0.000008     0.000052     0.155080     True
        32     0.000005     0.000052     0.102151     True
        64     0.000008     0.000059     0.132075     True
       128     0.000013     0.000070     0.190476     True
       256     0.000035     0.000107     0.325521     True
       512     0.000126     0.000215     0.583658     True
      1024     0.000431     0.000532     0.810399     True
      2048     0.001855     0.001552     1.195104     True
      4096     0.007154     0.005050     1.416708     True
      8192     0.028505     0.015449     1.845033     True
     16384     0.111193     0.046938     2.368925     True
     32768     0.443435     0.142551     3.110706     True
     65536     1.778292     0.432412     4.112497     True
    131072     7.110184     1.305771     5.445200     True
    262144    28.596926     3.919399     7.296253     True
    524288   116.069764    11.804032     9.833061     True
```

As a bonus, a fast divmod also provides a fast way to convert long integers to decimal strings, by recursively splitting n in half using L, R = divmod(n, 10^(b/2)), where b is the number of digits in n:
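A minimal sketch of the idea in modern Python (not the actual code in div.py; a real implementation would derive the digit count from the bit length rather than calling str):

```python
def to_decimal(n):
    """Convert a nonnegative int to a decimal string by recursively
    splitting on a power of 10 near the middle."""
    def conv(n, digits):
        # produce exactly `digits` characters, zero-padded on the left
        if digits <= 18:
            return str(n).rjust(digits, '0')
        half = digits // 2
        L, R = divmod(n, 10**half)
        return conv(L, digits - half) + conv(R, half)
    # cheating: a real implementation would not call str() here
    return conv(n, len(str(n)))
```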

```
>>> time_str(int, 20)
      size     old time     new time       faster  correct
------------------------------------------------------------
        16     0.000005     0.000013     0.413043     True
        32     0.000006     0.000009     0.588235     True
        64     0.000008     0.000012     0.674419     True
       128     0.000020     0.000023     0.865854     True
       256     0.000059     0.000133     0.442105     True
       512     0.000204     0.000333     0.613255     True
      1024     0.000895     0.001194     0.749708     True
      2048     0.003505     0.002252     1.556824     True
      4096     0.013645     0.006600     2.067429     True
      8192     0.052386     0.018334     2.857358     True
     16384     0.209164     0.052233     4.004412     True
     32768     0.834201     0.153238     5.443827     True
     65536     3.339629     0.450897     7.406639     True
    131072    13.372223     1.339044     9.986392     True
    262144    53.547894     3.998352    13.392491     True
    524288   214.847486    11.966933    17.953429     True
```

So printing an integer with half a million digits takes 12 seconds instead of 3.5 minutes. This can be quite a usability improvement.

The code can be found in this file: div.py.

I have now submitted a request for these algorithms to be implemented in the Python core. Since the pure Python implementation is very simple, I don't think porting it to C would be all that difficult. By "request", I mean that I might eventually do it myself, but there are many others who are much more familiar with both the Python codebase and the C language, and if any of those persons happens to have a few free hours, they could certainly do it both faster and better. If you are interested in helping out with this, please post to the issue tracker.

The division code is just one of several small projects I've been working on lately. Basically, I've found that there are some arithmetic functions that are needed very frequently in all kinds of mathematical code. These include:

• Pure convenience functions like sign, product, ...

• Bit counting

• Integer square roots

• Rational (and complex rational) arithmetic

• Integer factorization, multiplicity tests, gcd, etc

• Factorials, binomial coefficients, etc

Such utility functions are currently scattered throughout the mpmath, SymPy and SympyCore codebases. Many of them are hackish, duplicated, don't always work, or have very poor worst-case performance. Right now, I'm collecting them into a single module and optimizing / strain-hardening them.

The current result of this effort can be found here (be aware that it's very much a work in progress). Among other things, it includes the mentioned fast division code, fast square root code based on my implementation from mpmath, much needed improvements to the nth root and integer factorization code I wrote for SymPy, plus the extremely fast multinomial coefficient code optimized by Pearu and myself for SympyCore (which, if I correctly recall the results of previous benchmarking, makes polynomial powering in SympyCore faster than Singular).

My plan is to split this off into a standalone project, since it could be useful to other people in its own right, while keeping it a single, self-contained .py file that is easy to include in mpmath and SymPy. There won't be too much feature creep (hopefully); the advanced number theory functions in SymPy won't move, nor will the floating-point functions from mpmath.

Finally, I have done a bit more work on the adaptive numerical evaluation code for SymPy. The main new features are support for converting approximate zeros to exact zeros (thereby preventing the algorithm from hanging when it encounters a nonsimplified zero expression, and overall prettifying output), and support for logarithms and powers/exponentials of complex numbers. See evalf.py and test_evalf.py. I've also done miscellaneous other work on SymPy, such as patching the existing evalf code to use mpmath and getting rid of the global precision.

## Tuesday, July 8, 2008

### Making division in Python faster

A very useful feature of Python is that it comes with built-in arbitrary-precision integers. The implementation is not the fastest in the world, but it is adequate for many purposes.

Python is clever enough to use the Karatsuba algorithm for multiplication of large integers, which gives an O(n^1.6) asymptotic time complexity for n-digit multiplication. This is a huge improvement over the O(n^2) schoolbook algorithm when multiplying anything larger than a few hundred digits.
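For illustration, here is a toy version of the Karatsuba idea in Python (a sketch of the general technique, not CPython's actual C implementation): three half-size products replace four, which is where the O(n^1.6) exponent comes from.

```python
def karatsuba(x, y, cutoff=2**64):
    """Multiply nonnegative integers x and y with Karatsuba recursion."""
    if x < cutoff or y < cutoff:
        return x * y                       # small enough: multiply directly
    m = max(x.bit_length(), y.bit_length()) // 2
    xh, xl = x >> m, x & ((1 << m) - 1)    # split x = xh*2^m + xl
    yh, yl = y >> m, y & ((1 << m) - 1)    # split y = yh*2^m + yl
    a = karatsuba(xh, yh)
    b = karatsuba(xl, yl)
    c = karatsuba(xh + xl, yh + yl) - a - b  # middle term from one product
    return (a << 2*m) + (c << m) + b
```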

Unfortunately, division in Python is not so well equipped: Python uses a brute-force O(n^2) algorithm for quotients of any size. Many algorithms in mpmath perform several multiplications followed by a single large division; at high precision, the final division can take as much time as all the preceding multiplications together. Division is needed much less often than multiplication, but it is needed nonetheless.

Newton's method to the rescue. If f(x) can be evaluated to n-digit accuracy in time O(q(n)), then O(q(n)) is also the complexity for evaluating f⁻¹(x) using Newton's method (under reasonable assumptions). Thus Newton's method allows one to divide as fast as multiplying, compute logarithms as fast as exponentials, etc. The overhead hidden in the O()-expression is typically a small factor, of the order of 2-5.

To divide with the same asymptotic speed as multiplying, we rewrite p/q as p · (1/q), and use Newton's method to calculate r = 1/q. Newton's method leads to the iteration r_(n+1) = 2·r_n − q·r_n², which converges quadratically (i.e. each iteration doubles the accuracy). It is now important to exploit the self-correcting nature of Newton's method by performing each step with an arithmetic precision equal to the accuracy. This way only a single step has to be performed at full precision. If this optimization is not used, the time complexity is just O((log n) q(n)), not O(q(n)).
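The quadratic convergence is easy to see in ordinary floating point (a toy demonstration, not the long-integer version):

```python
# Each step of r -> 2r - q*r^2 roughly doubles the number of correct
# digits of 1/q; five steps take a 1-digit guess to machine precision.
q = 3.0
r = 0.3                # rough initial guess for 1/q
for _ in range(5):
    r = 2*r - q*r*r
print(r)               # converges to 1/3
```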

Here is my implementation of Newton division in Python:

```python
from mpmath.lib import giant_steps, lshift, rshift
from math import log

START_PREC = 15

def size(x):
    if isinstance(x, (int, long)):
        return int(log(x,2))
    # GMPY support
    return x.numdigits(2)

def newdiv(p, q):
    szp = size(p)
    szq = size(q)
    szr = szp - szq
    if min(szp, szq, szr) < 2*START_PREC:
        return p//q
    r = (1 << (2*START_PREC)) // (q >> (szq - START_PREC))
    last_prec = START_PREC
    for prec in giant_steps(START_PREC, szr):
        a = lshift(r, prec-last_prec+1)
        b = rshift(r**2 * rshift(q, szq-prec), 2*last_prec)
        r = a - b
        last_prec = prec
    return ((p >> szq) * r) >> szr
```

The core algorithm is just a few lines, although those lines took me a few moments to write (getting the shifts right gives me a headache). I imported the functions lshift, rshift (which are like the << and >> operators but allow negative shifts) and giant_steps (which counts up from START_PREC to ... szr/4, szr/2, szr) from mpmath just for convenience; the implementation above requires essentially nothing but Python builtins.

How does it fare? A little benchmarking code:
```python
from time import clock

def timing(INT):
    fmt = "%10s %12f %12f %12f %8s"
    print "%10s %12s %12s %12s %8s" % ("size", "old time", "new time",
        "faster", "error")
    print "-"*78
    for i in range(4,30):
        n = 2**i
        Q = INT(10)**n
        P = Q**2
        t1 = clock()
        R1 = P // Q
        t2 = clock()
        R2 = newdiv(P,Q)
        t3 = clock()
        size, old_time, new_time = n, t2-t1, t3-t2
        faster, error = old_time/new_time, R2-R1
        print fmt % (size, old_time, new_time, faster, error)
```

I chose to benchmark the performance of dividing a 2n-digit integer by an n-digit integer, for a result (and required precision) of n digits. The improvement is likely to be smaller if the denominator and quotient have unequal size, but the balanced case is the most important to optimize.

Now the performance compared to Python's builtin division operator:
```
timing(int)
      size     old time     new time       faster    error
--------------------------------------------------------------
        16     0.000005     0.000101     0.050000       -1
        32     0.000005     0.000044     0.107595       -1
        64     0.000006     0.000052     0.123656        0
       128     0.000013     0.000064     0.197368        0
       256     0.000033     0.000088     0.377778       -1
       512     0.000115     0.000173     0.663430        0
      1024     0.000413     0.000399     1.035689        1
      2048     0.001629     0.001099     1.481576        0
      4096     0.006453     0.004292     1.503352       -1
      8192     0.025596     0.009966     2.568285        0
     16384     0.101964     0.030327     3.362182       -1
     32768     0.408266     0.091811     4.446792       -1
     65536     1.633296     0.278531     5.863967       -1
    131072     6.535185     0.834847     7.828001        0
    262144    26.108710     2.517134    10.372397        0
    524288   104.445635     7.576984    13.784593       -1
   1048576   418.701976    22.760790    18.395758        0
```

The results are excellent. The Newton division breaks even with the builtin division already at 1000 digits (despite the interpreter overhead), and is 10x faster at 260,000 digits.

The implementation in newdiv is not exact; as you can see, the results differ from the (presumably correct!) values computed by Python by a few units in the last place. This is not a big concern, as I intend to use this mainly for numerical algorithms, and it is always possible to simply add a few guard digits. For number theory and other applications that require exact division, I think it is sufficient to multiply a few of the least significant bits in q and r to see if they agree with p, and correct as necessary.

The point of this exercise was to try to give division of Python long ints the advantage of the builtin Karatsuba multiplication. Just for fun, I also tried it with gmpy integers:

```
from gmpy import mpz
timing(mpz)
      size     old time     new time       faster    error
--------------------------------------------------------------
        16     0.000011     0.000068     0.168033       -3
        32     0.000008     0.000045     0.185185       -2
        64     0.000006     0.000047     0.137725       -1
       128     0.000005     0.000053     0.089005       -2
       256     0.000007     0.000070     0.099602        0
       512     0.000015     0.000083     0.184564       -1
      1024     0.000044     0.000156     0.282143        0
      2048     0.000134     0.000279     0.481000       -1
      4096     0.000404     0.000657     0.614960       -2
      8192     0.001184     0.001719     0.688882       -1
     16384     0.003481     0.004585     0.759322       -2
     32768     0.009980     0.012043     0.828671       -1
     65536     0.027762     0.028783     0.964516       -1
    131072     0.072987     0.067773     1.076926       -1
    262144     0.186714     0.151604     1.231588       -1
    524288     0.462057     0.342160     1.350411       -1
   1048576     1.119103     0.788582     1.419134       -2
   2097152     2.726458     1.838570     1.482923       -1
   4194304     6.607204     4.069614     1.623546       -1
   8388608    15.627027     8.980883     1.740032       -1
  16777216    36.581787    19.167144     1.908567       -4
  33554432    83.568330    40.131393     2.082368       -1
  67108864   190.596889    81.412867     2.341115       -1
```

Interestingly, the Newton division breaks even at 130,000 digits and is twice as fast at 30 million digits. So it is clear that either gmpy does not use GMP optimally, or GMP does not use the asymptotically fastest possible division algorithm. I think the latter is the case; this page contains development code for the next version of GMP, including improved division code, and notes that "the new code has complexity O(M(n)) for all operations, while the old mpz code is O(M(n)log(n))". This seems consistent with my timings.

A small notice: I will be away for about a week starting tomorrow. I will bring my laptop to continue working on my GSoC project during the time, but I won't have internet access.

Edit: Fixed a misplaced parenthesis in newdiv that gave the wrong criterion for falling back to p//q. Benchmarks unaffected.

## Monday, July 7, 2008

### Hypergeometric series with SymPy

SymPy 0.6.0 is out, go get it! (It will be required for running the following code.)

Here is a nice example of what SymPy can be used for (I got the idea to play around with it today): automated generation of code for efficient numerical summation of hypergeometric series.

A rational hypergeometric series is a series (generally infinite) where the quotient between successive terms, R(n) = T(n+1)/T(n), is a rational function of n with integer (or equivalently rational) coefficients. The general term of such a series is a product or quotient of polynomials in n, integers raised to the power An+B, factorials (An+B)!, binomial coefficients C(An+B, Cn+D), etc. The Chudnovsky series for π, mentioned previously on this blog, is a beautiful example:
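Written out (this is the standard form of the Chudnovsky series; the (−1)ⁿ sign is carried by the (−640320)³ⁿ factor in the code further below):

```latex
\frac{1}{\pi} = 12 \sum_{n=0}^{\infty}
    \frac{(-1)^n\,(6n)!\,(13591409 + 545140134\,n)}
         {(3n)!\,(n!)^3\,640320^{3n+3/2}}
```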

Although this series converges quickly (adding 14 digits per term), it is not efficient to sum it term by term as written. It is slow to do so because the factorials quickly grow huge; the series converges only because the denominator factorials grow even quicker than the numerator factorials. A much better approach is to take advantage of the fact that each (n+1)'th term can be computed from the n'th by simply evaluating R(n).

Given the expression for the general term T(n), finding R(n) in simplified form is a straightforward but very tedious exercise. This is where SymPy comes in. To demonstrate, let's pick a slightly simpler series than the Chudnovsky series:
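Specifically, the series used in the example below is the sum of reciprocal central binomial coefficients:

```latex
S = \sum_{n=0}^{\infty} \frac{(n!)^2}{(2n)!}
```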

The SymPy function hypersimp calculates R given T (this function, by the way, was implemented by Mateusz Paprocki who did a GSoC project for SymPy last year):
```python
>>> from sympy import hypersimp, var, factorial
>>> var('n')
n
>>> pprint(hypersimp(factorial(n)**2 / factorial(2*n), n))
 1 + n
-------
2 + 4*n
```

So to compute the next term during the summation of this series, we just need to multiply the preceding term by (n+1) and divide it by (4n+2). This is very easy to do using fixed-point math with big integers.
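Using this ratio directly in fixed point looks roughly like this (a minimal sketch of the summation loop; the 200-bit working precision is an arbitrary choice):

```python
# Fixed-point summation of sum((n!)^2 / (2n)!) using only the term
# ratio (n+1)/(4n+2): no factorial is ever computed.
prec = 200
one = 1 << prec
term = one                 # the n = 0 term is 1
s = 0
n = 0
while term:                # terms shrink to 0 in fixed point
    s += term
    term = term * (n + 1) // (4*n + 2)
    n += 1
print(s / 2**prec)         # about 1.7363998587
```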

Now, it is not difficult to write some code to automate this process and perform the summation. Here is a first attempt:
```python
from sympy import hypersimp, lambdify
from sympy.mpmath.lib import MP_BASE, from_man_exp
from sympy.mpmath import mpf, mp

def hypsum(expr, n, start=0):
    """
    Sum a rapidly convergent infinite hypergeometric series with
    given general term, e.g. e = hypsum(1/factorial(n), n). The
    quotient between successive terms must be a quotient of integer
    polynomials.
    """
    expr = expr.subs(n, n+start)
    num, den = hypersimp(expr, n).as_numer_denom()
    func1 = lambdify(n, num)
    func2 = lambdify(n, den)
    prec = mp.prec + 20
    one = MP_BASE(1) << prec
    term = expr.subs(n, 0)
    term = (MP_BASE(term.p) << prec) // term.q
    s = term
    k = 1
    while abs(term) > 5:
        term *= MP_BASE(func1(k-1))
        term //= MP_BASE(func2(k-1))
        s += term
        k += 1
    return mpf(from_man_exp(s, -prec))
```

And now a couple of test cases. First some setup code:
```python
from sympy import factorial, var, Rational, binomial
from sympy.mpmath import sqrt

var('n')
fac = factorial
Q = Rational
mp.dps = 1000  # sum to 1000 digit accuracy
```

Some formulas for e (source):
```python
print hypsum(1/fac(n), n)
print 1/hypsum((1-2*n)/fac(2*n), n)
print hypsum((2*n+1)/fac(2*n), n)
print hypsum((4*n+3)/2**(2*n+1)/fac(2*n+1), n)**2
```

Ramanujan series for π (source):
```python
print 9801/sqrt(8)/hypsum(fac(4*n)*(1103+26390*n)/fac(n)**4/396**(4*n), n)
print 1/hypsum(binomial(2*n,n)**3 * (42*n+5)/2**(12*n+4), n)
```

Machin's formula for π:
```python
print 16*hypsum((-1)**n/(2*n+1)/5**(2*n+1), n) - \
       4*hypsum((-1)**n/(2*n+1)/239**(2*n+1), n)
```

A series for √2 (Taylor series for √(1+x), accelerated with Euler transformation):
```python
print hypsum(fac(2*n+1)/fac(n)**2/2**(3*n+1), n)
```

Catalan's constant:
```python
print 1./64*hypsum((-1)**(n-1)*2**(8*n)*(40*n**2-24*n+3)*fac(2*n)**3*\
    fac(n)**2/n**3/(2*n-1)/fac(4*n)**2, n, start=1)
```

Some formulas for ζ(3) (source):
```python
print hypsum(Q(5,2)*(-1)**(n-1)*fac(n)**2 / n**3 / fac(2*n), n, start=1)
print hypsum(Q(1,4)*(-1)**(n-1)*(56*n**2-32*n+5) / \
    (2*n-1)**2 * fac(n-1)**3 / fac(3*n), n, start=1)
print hypsum((-1)**n * (205*n**2 + 250*n + 77)/64 * \
    fac(n)**10 / fac(2*n+1)**5, n)
P = 126392*n**5 + 412708*n**4 + 531578*n**3 + 336367*n**2 + 104000*n + 12463
print hypsum((-1)**n * P / 24 * (fac(2*n+1)*fac(2*n)*fac(n))**3 / \
    fac(3*n+2) / fac(4*n+3)**3, n)
```

All of these calculations finish in less than a second on my computer (with gmpy installed). The generated code for the Catalan's constant series and the third series for ζ(3) are actually almost equivalent to the code used by mpmath for computing these constants. (If I had written hypsum earlier, I could have saved myself the trouble of implementing them by hand!)

This code was written very quickly and can certainly be improved. For one thing, it should do some error detection (if the input is not actually hypergeometric, or if hypersimp fails). It would also be better to generate code that sums the series using binary splitting rather than repeated division.

To perform binary splitting, one must know the number of terms in advance. Finding out how many terms must be included to obtain an accuracy of p digits can be done generally by numerically solving the equation T(n) = 10^(−p) (for example with mpmath). If the series converges at a purely geometric rate (and this is often the case), the rate of convergence can also be computed symbolically. Returning to the Chudnovsky series, for example, we have

```python
>>> from sympy import *
>>> fac = factorial
>>> var('n')
n
>>> P = fac(6*n)*(13591409+545140134*n)
>>> Q = fac(3*n)*fac(n)**3*(-640320)**(3*n)
>>> R = hypersimp(P/Q,n)
>>> abs(1/limit(R, n, oo))
151931373056000
>>> log(_,10).evalf()
14.1816474627255
```

So the series adds 14.18165 digits per term.
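The number of terms needed for a given precision then follows directly (a rough estimate; `chudnovsky_terms` is a hypothetical helper, and a real implementation would add a few guard terms):

```python
from math import ceil

digits_per_term = 14.1816474627255   # from the limit computed above

def chudnovsky_terms(p):
    """Estimate the number of Chudnovsky terms needed for p digits."""
    return ceil(p / digits_per_term)

print(chudnovsky_terms(1000), chudnovsky_terms(10**6))
```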

With some more work, this should make it into SymPy. The goal is that if you type in a sum and ask for a high-precision value, like this:

```python
>>> S = Sum(2**n/factorial(n), (n, 0, oo))
>>> S.evalf(1000)
```

then Sum.evalf should be able to automatically figure out that the sum is of the rational hypergeometric type and calculate it using the optimal method.

## Thursday, July 3, 2008

### Faster pi, more integrals and complex numbers

Yesterday, I implemented the Chudnovsky algorithm in mpmath for computing π (see commit). This turns out to be about 3x faster than the old algorithm, so mpmath now only needs about 10 seconds to compute 1 million digits of π. To try it out, fetch the SVN version of mpmath, make sure gmpy 1.03 is installed, and run:
```python
>>> from mpmath import mp, pi
>>> mp.dps = 10**6
>>> print pi
```

Right now, computing π with mpmath is actually faster than with SAGE (but knowing the SAGE people, this bug will be fixed soon :-).

As promised in the last post, I will now write in more detail about the most recently added features to my numerical evaluation code for SymPy. For the code itself, see evalf.py and test_evalf.py.

#### New functions

The functions atan (for real input) and log (for positive real input) have been added.
```python
>>> N('log(2)', 50)
'0.69314718055994530941723212145817656807550013436026'
>>> N('16*atan(1/5) - 4*atan(1/239)', 50)
'3.1415926535897932384626433832795028841971693993751'
```

The working precision is increased automatically to evaluate log(1+ε) accurately:
```python
>>> N('log(2**(1/10**20))',15)
'6.93147180559945e-21'
```

#### Integrals

A second important new feature is support for integrals. There are still some bugs to be sorted out with the implementation, but the basics work.
```python
>>> from sympy import *
>>> var('x')
x
>>> gauss = Integral(exp(-x**2), (x, -oo, oo))
>>> N(gauss, 15)
'1.77245385090552'
```

Integrals can be used as part of larger expressions, and adaptive evaluation works as expected:
```python
>>> N(gauss - sqrt(pi) + E*Rational(1,10**20), 15)
'2.71828182845904e-20'
```

For reasonably nice integrands, the integration routine in mpmath can provide several hundred digits fairly quickly. Of course, any numerical integration algorithm can be fooled by pathological input, and the user must assume responsibility for being aware of this fact. In many common situations, however, numerical errors can be detected automatically (and doing this well is something I will look into further).

#### Complex arithmetic

The most important new feature is support for multiplication and addition of complex numbers.

In an earlier post, I posed the question of how to best track accuracy for complex numbers. This turns out not to be such a difficult problem; as soon as I got started with the implementation, I realized that there is only one reasonable solution. I have decided to track the accuracy of the real and imaginary parts separately, but to count the accuracy of a computed result as the accuracy of the number as a whole.

In other words, a computed result z denotes a point in the complex plane and the real and imaginary errors define a rectangular uncertainty region, centered around z. The other option would have been a circular disk, requiring only a single real error value (specifying radius). The rectangular representation is somewhat easier to work with, and much more powerful, because it is very common that the real part is known much more accurately than the imaginary part, and vice versa.

If the half-width and half-height of the error rectangle are defined by the complex number w, then the absolute error can be defined the usual way as |w| and the relative error as |w|/|z|. (For computational purposes, the complex norm can be approximated accurately using the max norm. This is wrong by at most a factor √2, or logarithmically by log₂(√2) = 0.5 bits ≈ 0.15 decimal digits.)

In other words, if |w|/|z| = 10^(−15), the result is considered accurate to 15 digits. This can either mean that both the real and imaginary parts are accurate to 15 digits, or that just one of them is, provided that the other is smaller in magnitude. For example, with a target accuracy of 15 digits: if the real part is fully accurate, and the imaginary part is a factor 10^3 smaller than the real part, then the latter need only be accurate to 15 − 3 = 12 digits. The advantage of this approach is that accuracy is preserved exactly under multiplication, and hence no restarting is required during multiplication.
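As a toy illustration of this error measure (an illustrative model, not the actual evalf code; `relative_accuracy_digits` is a made-up helper):

```python
from math import log10

def relative_accuracy_digits(z_re, z_im, err_re, err_im):
    """Digits of accuracy of z as a whole, with |w| and |z| both
    approximated by the max norm (wrong by at most a factor sqrt(2))."""
    w = max(abs(err_re), abs(err_im))   # half-width/half-height of the box
    z = max(abs(z_re), abs(z_im))
    return -log10(w / z)

# real part ~1 known to 15 digits, imaginary part 1000x smaller:
# the result as a whole still counts as ~15 accurate digits
print(relative_accuracy_digits(1.0, 1e-3, 1e-15, 1e-15))
```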

As an example, consider the following multiplication in which the real parts completely cancel:
```python
>>> N('(1/3+2*I)*(2+1/3*I)', 10)
'.0e-12 + 4.111111111*I'
```

As noted in earlier posts, numerical evaluation cannot detect a quantity being exactly zero. The ".0e-12" is a scaled zero, indicating a real quantity of unknown sign and magnitude at most equal to 1e-12. (To clarify its meaning, it could perhaps be printed with a "±" sign in front.) If we treat it as 10^(−12), its relative accuracy is 0 digits (because 0 nonzero digits are known). But the result as a whole is accurate to 10 digits, due to the imaginary part being more than 10^10 times larger and accurate to 10 digits.

In an earlier post, I speculated about the total accuracy being problematic to use for complex results, because of subsequent additions potentially causing cancellations. This was rather stupid, because the problem already existed for purely real numbers, and was already solved by the existing adaptive addition code. To implement complex addition, I basically just needed to refactor the code to first evaluate all the terms and then add up the real and imaginary parts separately.

Building on the previous example, where the imaginary part held all the accuracy, we can try to subtract the entire imaginary part:

```python
>>> N('(1/3+2*I)*(2+1/3*I) - 37/9*I + pi/10**50', 10)
'3.141592654e-50 + .0e-62*I'
```

The addition routine finds that both the real and the imaginary parts are inaccurate and retries at higher precision until it locates the tiny real part that has been added. Alternatively, the following test also works (the imaginary part is displayed as 1.0e-15 despite having been computed accurately to 10 digits, because mpmath strips trailing zeros — this could perhaps be changed):
```python
>>> N('(1/3+2*I)*(2+1/3*I) - (37/9 - 1/10**15)*I', 10)
'.0e-30 + 1.0e-15*I'
```

The computation
```python
>>> N('(1/3+2*I)*(2+1/3*I) - 37/9*I', 10)
```

hangs, as it should, because I have still not implemented any stopping criterion for apparent total cancellations.

So there is still work to do :-)

## Tuesday, July 1, 2008

### GMPY makes mpmath more speedy

This post will mostly be about recent mpmath development, but first a short update about my GSoC progress. Over the last week, I have improved the support for complex numbers (addition and multiplication of complex numbers works and detects cancellation in the real and imaginary parts), implemented some more functions, and added support for integrals. I still need to clean up and test the code some more. A longer update will be posted later today or tomorrow.

Besides working directly with the GSoC project, I've been busy playing with the results of a patch for mpmath that was submitted a few days ago by casevh. This brilliant patch allows mpmath to use GMPY mpz's instead of Python's built-in long integers.

GMPY, of course, is a Python wrapper for the highly optimized GMP bignum library. The beauty of GMPY is that you can just import it anywhere in your Python code, change x = 1 to x = mpz(1), and then almost any code written under the assumption of x being a Python int will continue to work, the only difference being that it will be much faster if x grows large.

(Mpmath attempts to accomplish something similar, being an intended drop-in replacement for the math and cmath modules and even parts of SciPy. However, mpmath's advantage is only that it gives increased precision; for low-precision number crunching, it is obviously much slower than ordinary floats.)

To try out mpmath with GMPY support, you need to check out the current development version from the mpmath SVN repository. There will be a new release soon, but not in the next few days. Casevh discovered a few bugs in GMPY while implementing this feature, so the patched version 1.03 of GMPY is needed. Mpmath will still work if an older version of GMPY is present; in that case, it will simply ignore it and use Python integers as usual. Just to be clear, mpmath is still a pure-Python library and will continue to work fine with GMPY unavailable (it will just not be as fast).

The improvement is quite significant. At precisions between 1,000 to 10,000 digits, most mpmath functions are of the order of 10 times faster with GMPY enabled. I have on occasion tried out some computations at 100,000 digits with mpmath; although they work fine, I have had to do something else while waiting for the results. With GMPY enabled, computing something like exp(sin(sqrt(2))) to 100,000 digits now only takes a couple of seconds.

SymPy will of course benefit from this feature. N/evalf will be able to handle ill-conditioned input much more efficiently, and the maximum working precision can be set higher, giving improved power and reliability for the same computing time.

The GMPY mode is slightly slower than the Python mode at low precision (10 percent slower below 100 digits of precision, or thereabout). That's something I can live with.

A particularly dramatic improvement can be seen by running the pidigits.py demo script. After I fixed an inefficiency in the number-to-string conversion, computing 1 million decimals of π with mpmath in GMPY-mode takes less than 30 seconds on my laptop:

```
C:\Source\mp\trunk\demo>pidigits.py
Compute digits of pi with mpmath

Which base? (2-36, 10 for decimal)
> 10
How many digits? (enter a big number, say, 10000)
> 1000000
Output to file? (enter a filename, or just press enter
to print directly to the screen)
> pi.txt
Step 1 of 2: calculating binary value...
  iteration 1 (accuracy ~= 0 base-10 digits)
  iteration 2 (accuracy ~= 2 base-10 digits)
  iteration 3 (accuracy ~= 4 base-10 digits)
  iteration 4 (accuracy ~= 10 base-10 digits)
  iteration 5 (accuracy ~= 21 base-10 digits)
  iteration 6 (accuracy ~= 43 base-10 digits)
  iteration 7 (accuracy ~= 86 base-10 digits)
  iteration 8 (accuracy ~= 173 base-10 digits)
  iteration 9 (accuracy ~= 348 base-10 digits)
  iteration 10 (accuracy ~= 697 base-10 digits)
  iteration 11 (accuracy ~= 1396 base-10 digits)
  iteration 12 (accuracy ~= 2793 base-10 digits)
  iteration 13 (accuracy ~= 5587 base-10 digits)
  iteration 14 (accuracy ~= 11176 base-10 digits)
  iteration 15 (accuracy ~= 22353 base-10 digits)
  iteration 16 (accuracy ~= 44707 base-10 digits)
  iteration 17 (accuracy ~= 89414 base-10 digits)
  iteration 18 (accuracy ~= 178830 base-10 digits)
  iteration 19 (accuracy ~= 357662 base-10 digits)
  iteration 20 (accuracy ~= 715325 base-10 digits)
  iteration 21 (accuracy ~= 1000017 base-10 digits)
  final division
Step 2 of 2: converting to specified base...
Writing output...
Finished in 28.540354 seconds (26.498292 calc, 2.042062 convert)
```

The same operation takes around 30 minutes in Python mode.

The speed of computing π with mpmath+GMPY is entirely respectable; some of the specialized π programs out there are actually slower (see Stu's pi page for an overview). The really fast ones run in 2-3 seconds; most require between 30 seconds and a minute.

One reason mpmath+GMPY is still a bit slower at computing π than the fastest programs is that it uses an arithmetic-geometric mean (AGM) based algorithm. The fastest algorithm in practice is the Chudnovsky series (with binary splitting), but its implementation is more complex, and making it really fast would also require eliminating the Python and GMPY overhead. (A highly optimized implementation for GMP has been written by Hanhong Xue.)
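For the curious, here is a minimal sketch of what the Chudnovsky series with binary splitting looks like in pure Python integers. This is the textbook recursion, not Xue's optimized GMP code; the constants are those of the Chudnovsky series, and the guard-digit count is an arbitrary choice:

```python
from math import isqrt

def chudnovsky_pi(digits):
    """Compute floor(pi * 10**digits) via the Chudnovsky series
    with binary splitting, using only integer arithmetic."""
    prec = digits + 10                     # guard digits
    terms = digits // 14 + 2               # each term adds ~14.18 digits

    def bs(a, b):
        # Standard binary-splitting recurrences: P, Q are products of the
        # term numerators/denominators over [a, b); T carries the partial sum.
        if b - a == 1:
            if a == 0:
                P = Q = 1
            else:
                P = (6*a - 5) * (2*a - 1) * (6*a - 1)
                Q = a**3 * 10939058860032000      # 640320**3 // 24
            T = P * (13591409 + 545140134*a)
            if a & 1:
                T = -T                             # alternating signs
            return P, Q, T
        m = (a + b) // 2
        P1, Q1, T1 = bs(a, m)
        P2, Q2, T2 = bs(m, b)
        return P1*P2, Q1*Q2, T1*Q2 + P1*T2

    P, Q, T = bs(0, terms)
    one = 10**prec
    sqrt_c = isqrt(10005 * one**2)         # fixed-point sqrt(10005)
    pi_fixed = (426880 * sqrt_c * Q) // T
    return pi_fixed // 10**10              # drop the guard digits

print(chudnovsky_pi(50))
```

The key point is that all the work happens in a handful of balanced big-integer multiplications, which is exactly where GMP shines.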

The presence of GMPY does change the relative performance of various algorithms, particularly by favoring algorithms that take advantage of asymptotically fast multiplication. To try this out, I implemented binary splitting for computing e in mpmath (see commit). Computing 1 million digits of e with mpmath now takes 3 seconds (converting to a base-10 string takes an additional 2 seconds). It turns out that the binary splitting method for e is faster than the old direct summation method in Python mode as well, but only slightly.
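Binary splitting for e is particularly simple, since the ratio between consecutive terms of the series e = Σ 1/k! is just 1/k. The following is a self-contained sketch in plain Python integers (not the committed mpmath code; the guard-digit count is again arbitrary):

```python
from math import log10

def e_binary_splitting(digits):
    """Compute floor(e * 10**digits) by binary splitting on e = sum 1/k!."""
    prec = digits + 10
    # choose N with N! > 10**prec, so the truncation error is negligible
    N, logfact = 1, 0.0
    while logfact < prec:
        N += 1
        logfact += log10(N)

    def bs(a, b):
        # Return (P, Q) with sum_{k=a}^{b-1} 1/(a*(a+1)*...*k) = P/Q,
        # where Q = a*(a+1)*...*(b-1).
        if b - a == 1:
            return 1, a
        m = (a + b) // 2
        P1, Q1 = bs(a, m)
        P2, Q2 = bs(m, b)
        # S(a,b) = S(a,m) + S(m,b)/Q1  ==>  P = P1*Q2 + P2, Q = Q1*Q2
        return P1*Q2 + P2, Q1*Q2

    P, Q = bs(1, N + 1)                    # P/Q = sum_{k=1}^{N} 1/k!
    e_fixed = 10**prec + (P * 10**prec) // Q   # e = 1 + P/Q, in fixed point
    return e_fixed // 10**10               # drop the guard digits

print(e_binary_splitting(30))
```

The single large division by Q at the end replaces the one-division-per-term cost of direct summation, which is why the method wins even without GMPY.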

Two of the most important functions are exp and log. Mpmath computes exp via its Taylor series, using Brent's trick of replacing x by x/2^(2k) (and squaring the result repeatedly afterwards) to accelerate convergence. Log is computed by inverting exp with Newton's method (in fact, Halley's method is used now, but the principle is the same). This makes log roughly half as fast as exp at low precision, and asymptotically equal in cost. With GMPY, it becomes much faster to compute log at high precision using an AGM-based algorithm. This in turn means that at extremely high precision it becomes faster to compute exp by Newton inversion of log, in an interesting reversal of roles. I have written some preliminary code for this, which I should be able to commit soon.
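To illustrate the argument-reduction idea (this is a toy, not mpmath's actual implementation — the precision and reduction count are arbitrary choices), here is a fixed-point exp: halve the argument a number of times, sum a now rapidly converging Taylor series, and square the result back up:

```python
def fixed_exp(x, prec_bits=200, red=16):
    """Toy exp(x) with Brent-style argument reduction, in binary fixed
    point: internally all values are integers scaled by 2**prec_bits."""
    one = 1 << prec_bits
    # argument reduction: t = x / 2**red, so |t| is tiny and the
    # Taylor series below needs only a handful of terms
    t = int(x * one) >> red
    s = one          # running sum, starts at 1
    term = one       # term_0 = 1
    n = 1
    while term:
        # term_n = term_{n-1} * t / n, in fixed point
        term = (term * t // one) // n
        s += term
        n += 1
    # undo the reduction: exp(x) = exp(x / 2**red) ** (2**red)
    for _ in range(red):
        s = (s * s) // one
    return s / one

print(fixed_exp(1.0))   # approximately e
```

The trade-off Brent's trick exploits is that each halving costs one squaring but roughly halves the number of series terms needed, so there is an optimal reduction count depending on the precision.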

Several others have made great contributions to mpmath recently. Mario Pernici has improved the implementations of various elementary functions, Vinzent Steinberg has submitted some very interesting new root-finding code, and Mike Taschuk sent in a file implementing Jacobi elliptic functions for mpmath.

Felix Richter, who works for the semiconductor theory group at the physics department of the University of Rostock, shared some code for solving linear equation systems in a private correspondence. (I promised I would tidy it up and add it to mpmath, but it appears that I've been too lazy to do that yet.) He had some very interesting things to say about his research and why he wrote the code (I hope he doesn't mind that I quote a part of his mail here):

The range of problems we currently attack can be described as follows: Imagine a light beam incident on some solid body, e.g., a semiconductor crystal some 50 micrometers in length. When the light propagates into the sample it gets damped, and its strength will sooner or later become incredibly small, i.e., it can no longer be held in a double precision float variable.

However, a lot of interesting things can be calculated with the help of these electromagnetic field strengths, such as predictions for the output of semiconductor lasers or signs for the much-sought-after Bose-Einstein condensation of excitons (electron-hole pairs in a semiconductor). I sure shouldn't go much more into detail here, as I suppose you're not a physicist.

Just one example: As a consistency check, the equation

1 - |r|^2 - |t|^2 = \int dx dx' A(x') chi(x,x') A(x)

should be fulfilled for any light frequency. r, t, and A(x) are complex electromagnetic field strengths and just cannot be calculated with machine precision alone, not to mention further calculations based on them. (Chi is the susceptibility and describes the electromagnetic properties of the matter.)

However, due to the magic of math (and mpmath ;-) ), the interplay of these incredibly small numbers yields two perfectly equal numbers between 0 and 1, as it should.

I'm really happy having chosen Python for my work. Mpmath fits in well, and it's so easy to use. I especially like the operator overloading, so that I may add some complex multiprecision numbers with a simple "+". And mpmath's mathematical and numerical function library is so extensive that it can take you a really long way.

It is always fun to hear from people who use software you're working on, but it's especially rewarding when you learn that the software is being used to solve real problems.

Computing a million digits of π, on the other hand, could hardly be called a real-world problem. But it is the sort of thing any competent arbitrary-precision library should be able to do without effort. Sometimes math is done to construct semiconductor lasers; sometimes math is done just because it's fun (and often there is some overlap :-).