## Sunday, November 23, 2008

### A spiralling little tune

One of my most recent achievements in music: spiral.ogg. A fun fact: the second half is essentially identical to the first, but with every measure internally reversed (except for a few melodic corrections); this can be heard quite clearly by playing the waveform backwards.

Like most music I write for myself, it is meant to loop indefinitely (so here it is in raw form: spiral.mid), and any hypnotic effect is deliberate. To appreciate it fully, one has to listen to it without interruption for 8 hours or so while engaging in intense programming, studying, or some similar activity.

## Wednesday, October 15, 2008

### ANN: mpmath 0.10

Mpmath version 0.10 is now available. From the release announcement:

Additions in 0.10 include plotting support, matrices and linear algebra functions, new root-finding and quadrature algorithms, enhanced interval arithmetic, and some new special functions. Many speed improvements have been committed (a few functions are an order of magnitude faster than in 0.9), and as usual various bugs have been fixed. Importantly, this release fixes mpmath to work with Python 2.6.

For a more complete changelog, see: http://mpmath.googlecode.com/svn/trunk/CHANGES

Also be sure to check out the function gallery! It contains pretty colors ;-)

## Saturday, October 4, 2008

### Mpmath's first birthday

A little over a week ago, mpmath became 1 year old! I made the first SVN checkin on September 25, 2007. The source recently reached 10,000 lines of code (14,000 including blanks and comments), so it has grown up quickly.

The last release (version 0.9) happened over a month ago. There have been a lot of changes to the trunk since then, including linear algebra functions (written by Vinzent Steinberg), function plotting (using matplotlib), significantly improved interval arithmetic, polygamma functions, fast computation of Bernoulli numbers, Gauss-Legendre quadrature, and many speed improvements/bugfixes.

As a teaser, here is the Riemann zeta function on the critical line:
```python
from mpmath import *
plot(lambda t: zeta(0.5+t*j), [0, 50])
```

With all these additions, it'd be nice to make a new release soon; especially since 0.9 isn't compatible with Python 2.6. Now, I'm not personally going to switch over to Python 2.6 immediately (since some of the libraries I use aren't available for it), but most likely some users will. An advantage of mpmath's pure Python design is that it should work with minimal changes across Python versions, and the only reason it doesn't work with Python 2.6 is a trivial bug ("as" being used for a variable) which has been fixed in SVN.

I'm currently working on some rather heavy (and long overdue) global code cleanup, and there are too many loose threads to make an immediate release. But hopefully there will be one within 1 or 2 weeks. (The main obstacle, as usual, is that I also have upcoming exams...)

Of course, for anyone who wishes to play with the new features right now, it's easy to check out the current SVN trunk.

In other news, the mpmath website recently vanished from Google's search results. I hate when that happens...

## Thursday, September 25, 2008

### Fun with the digamma function

Let a ≈ 0.46163 be the positive real number that minimizes the factorial, x!. Then Euler's constant γ is given by

γ = Σ_{k≥1} (1/k − 1/(k+a))

and

γ = Σ_{k≥2} (−1)^k ζ(k) a^(k−1),

where ζ(s) is the Riemann zeta function. Both formulas follow trivially from the properties of the digamma function ψ (at the minimum, ψ(1+a) = 0), but if I didn't know that, I'd find them pretty neat.
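As a quick sanity check (my addition, not part of the original post), the identity γ = Σ_{k≥1} (1/k − 1/(k+a)) can be verified numerically in pure Python, without mpmath:

```python
import math

# Locate a, the positive minimizer of x! = Gamma(1+x), by ternary search
# on log-Gamma (which has the same minimizer).
lo, hi = 0.1, 0.9
for _ in range(200):
    m1 = lo + (hi - lo) / 3
    m2 = hi - (hi - lo) / 3
    if math.lgamma(1 + m1) < math.lgamma(1 + m2):
        hi = m2
    else:
        lo = m1
a = (lo + hi) / 2

# gamma = sum_{k>=1} (1/k - 1/(k+a)); the tail after N terms is about
# a/N, so a million terms give roughly 6 correct digits.
approx = sum(1/k - 1/(k + a) for k in range(1, 10**6))
print(round(a, 5), round(approx, 5))  # → 0.46163 0.57722
```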

## Thursday, August 7, 2008

### Wrapping it up

The final deadline of GSoC is approaching quickly. Fortunately, I'm almost finished with my project; the main part of the code was pushed in a few days ago, and in the coming days I only expect to fix bugs and write documentation.

Here are some examples of things that should now work with the hg version of SymPy:

```python
>>> from sympy import *
>>> var('n x')
(n, x)
>>> (log(1+I)/pi).evalf(50)
0.11031780007632579669822821605899884549134487436483 + 0.25*I
>>> (atan(exp(1000))-pi/2).evalf(15, maxprec=1000)
-5.07595889754946e-435
>>> Integral(exp(-x**2), (x, -oo, oo)).evalf(50)
1.7724538509055160272981674833411451827975494561224
>>> Integral(sin(1/x), (x, 0, 1)).transform(x, 1/x).evalf(25, quad='osc')
0.5040670619069283719898561
>>> Sum(1/(9*n**2+4*n+6), (n, 0, oo)).evalf(50)
0.27865709939940301230611638967893239422996461876558
```

Besides standard formula evaluation, numerical integration is supported, as is summation of infinite hypergeometric series.

Numerical differentiation and summation of generic infinite series are also on the way. The following works with my local copy:

```python
>>> Sum(1/n-log(1+1/n), (n, 1, oo)).evalf(50)
0.57721566490153286060651209008240243104215933593992
>>> Derivative(-gamma(x), x).evalf(50, subs={x:1})
0.57721566490153286060651209008240243104215933593992
```

The series is summed using the Euler-Maclaurin formula; in fact, the high-order derivatives are computed symbolically behind the scenes. (Because of the need for high-order derivatives, the Euler-Maclaurin formula makes much more sense in the context of a CAS than in purely numerical code.) This algorithm does have its drawbacks, so at some point pure extrapolation methods (which are already used for slowly convergent hypergeometric series) will be supported as well.

The use of an adaptive algorithm for evalf turns out to be especially useful for numerical differentiation, as one can just directly use a finite difference formula with a tiny step size, without worrying about cancellation error.
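The point can be illustrated with Python's decimal module (an illustration of the principle only, not the SymPy code): at a working precision well above the target, a central difference with a tiny step suffers no harmful cancellation.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60                 # working precision far above target
h = Decimal(10) ** -25                 # tiny step size

def f(x):
    return x.exp()                     # f(x) = e^x, so f'(0) = 1

# Central difference: truncation error ~ h^2/6, and the cancellation of
# leading digits is absorbed by the extra working precision.
deriv = (f(h) - f(-h)) / (2 * h)
print(abs(deriv - 1) < Decimal(10) ** -30)  # True
```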

There is a lot more to be said about the details, but I'm leaving that for the to-be-written documentation.

## Monday, July 28, 2008

### Division: the sequel (with bonus material)

I have neglected to write a GSoC update for two weeks (where "two" was computed using round-to-floor), so let me first apologize for that.

In my last post, I wrote about how Newton's method (coupled with a fast multiplication algorithm, such as Karatsuba multiplication) can be used for asymptotically fast division of long integers. I have now fixed up the code to compute a correctly rounded quotient. In fact, it now performs the full divmod operation, returning both the quotient and a correct remainder.

The trick is to perform an extra exact multiplication at the end to determine the remainder. By definition, the quotient r of p/q is correct if the remainder m = p - r·q satisfies 0 ≤ m < q. If this inequality does not hold, one need only perform an additional divmod operation (which can be performed using standard long division, since the error will almost certainly fit in a single limb) to correct both the quotient and the remainder.
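The correction step can be sketched in a few lines (my own minimal reconstruction with hypothetical names, not the code from div.py): given an approximate quotient that is off by at most a few units, one exact multiplication and at most one tiny divmod repairs it.

```python
def divmod_corrected(p, q, r):
    """Exact (quotient, remainder) of p, q from an approximate quotient r.

    r is assumed to be within a few units of the true quotient,
    as Newton division guarantees.
    """
    m = p - r * q              # one exact multiplication gives the remainder
    if m < 0 or m >= q:        # r was off: repair with one small divmod
        d, m = divmod(m, q)
        r += d
    return r, m

# usage: feed it a deliberately wrong quotient
print(divmod_corrected(10**6, 997, 10**6 // 997 + 2) == divmod(10**6, 997))  # True
```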

The extra multiplication causes some slowdown, but it's not too bad. The new idivmod function still breaks even with the builtin divmod somewhere around 1000-2000 digits and is 10 times faster at half a million digits (i.e. when dividing a million digit number by a half-million digit number):

```
>>> time_division(int, 2.0, 20)
      size     old time     new time       faster  correct
------------------------------------------------------------
        16     0.000008     0.000052     0.155080     True
        32     0.000005     0.000052     0.102151     True
        64     0.000008     0.000059     0.132075     True
       128     0.000013     0.000070     0.190476     True
       256     0.000035     0.000107     0.325521     True
       512     0.000126     0.000215     0.583658     True
      1024     0.000431     0.000532     0.810399     True
      2048     0.001855     0.001552     1.195104     True
      4096     0.007154     0.005050     1.416708     True
      8192     0.028505     0.015449     1.845033     True
     16384     0.111193     0.046938     2.368925     True
     32768     0.443435     0.142551     3.110706     True
     65536     1.778292     0.432412     4.112497     True
    131072     7.110184     1.305771     5.445200     True
    262144    28.596926     3.919399     7.296253     True
    524288   116.069764    11.804032     9.833061     True
```

As a bonus, a fast divmod also provides a fast way to convert long integers to decimal strings, by recursively splitting n in half using L, R = divmod(n, 10**(b//2)), where b is the number of digits in n:

```
>>> time_str(int, 20)
      size     old time     new time       faster  correct
------------------------------------------------------------
        16     0.000005     0.000013     0.413043     True
        32     0.000006     0.000009     0.588235     True
        64     0.000008     0.000012     0.674419     True
       128     0.000020     0.000023     0.865854     True
       256     0.000059     0.000133     0.442105     True
       512     0.000204     0.000333     0.613255     True
      1024     0.000895     0.001194     0.749708     True
      2048     0.003505     0.002252     1.556824     True
      4096     0.013645     0.006600     2.067429     True
      8192     0.052386     0.018334     2.857358     True
     16384     0.209164     0.052233     4.004412     True
     32768     0.834201     0.153238     5.443827     True
     65536     3.339629     0.450897     7.406639     True
    131072    13.372223     1.339044     9.986392     True
    262144    53.547894     3.998352    13.392491     True
    524288   214.847486    11.966933    17.953429     True
```

So printing an integer with half a million digits takes 12 seconds instead of 3.5 minutes. This can be quite a usability improvement.
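The recursive splitting can be sketched as follows (a hedged pure-Python reconstruction of the idea, with names of my choosing; not the code from div.py):

```python
def int_to_str(n):
    """Decimal string of n >= 0 via recursive divmod splitting.

    Splitting at 10**(b//2) halves the problem, so a subquadratic
    divmod makes the whole conversion subquadratic as well.
    """
    def rec(m, b):
        # render m as exactly b digits (leading zeros kept for inner pieces)
        if b <= 18:
            return str(m).zfill(b)
        half = b >> 1
        hi, lo = divmod(m, 10 ** half)   # top b-half digits, bottom half
        return rec(hi, b - half) + rec(lo, half)

    if n == 0:
        return "0"
    # digit count from bit_length (log10(2) ≈ 0.30103), then adjust
    b = n.bit_length() * 30103 // 100000 + 1
    while 10 ** (b - 1) > n:
        b -= 1
    while 10 ** b <= n:
        b += 1
    return rec(n, b)

print(int_to_str(2 ** 64))  # 18446744073709551616
```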

The code can be found in this file: div.py.

I have now submitted a request for these algorithms to be implemented in the Python core. Since the pure Python implementation is very simple, I don't think porting it to C would be all that difficult. By "request", I mean that I might eventually do it myself, but there are many others who are much more familiar with both the Python codebase and the C language, and if any of those persons happens to have a few free hours, they could certainly do it both faster and better. If you are interested in helping out with this, please post to the issue tracker.

The division code is just one of several small projects I've been working on lately. Basically, I've found that there are some arithmetic functions that are needed very frequently in all kinds of mathematical code. These include:

• Pure convenience functions like sign, product, ...

• Bit counting

• Integer square roots

• Rational (and complex rational) arithmetic

• Integer factorization, multiplicity tests, gcd, etc

• Factorials, binomial coefficients, etc

Such utility functions are currently scattered throughout the mpmath, SymPy and SympyCore codebases. Many of them are hackish, duplicated, don't always work, or have very poor worst-case performance. Right now, I'm trying to collect them into a single module and to optimize and strain-harden them.

The current result of this effort can be found here (be aware that it's very much a work in progress). Among other things, it includes the mentioned fast division code, fast square root code based on my implementation from mpmath, much needed improvements to the nth root and integer factorization code I wrote for SymPy, plus the extremely fast multinomial coefficient code optimized by Pearu and myself for SympyCore (which, if I correctly recall the results of previous benchmarking, makes polynomial powering in SympyCore faster than Singular).

My plan is to split this off into a standalone project, as it could be useful to other people on its own, while keeping it a single, self-contained .py file that is easy to include in mpmath and SymPy. There won't be too much feature creep (hopefully); the advanced number theory functions in SymPy won't move, nor will the floating-point functions from mpmath.

Finally, I have done a bit more work on the adaptive numerical evaluation code for SymPy. The main new features are support for converting approximate zeros to exact zeros (thereby preventing the algorithm from hanging when it encounters a nonsimplified zero expression, and overall prettifying output), and support for logarithms and powers/exponentials of complex numbers. See evalf.py and test_evalf.py. I've also done miscellaneous other work on SymPy, such as patching the existing evalf code to use mpmath and getting rid of the global precision.

## Tuesday, July 8, 2008

### Making division in Python faster

A very useful feature of Python is that it comes with built-in arbitrary-precision integers. The implementation is not the fastest in the world, but it is adequate for many purposes.

Python is clever enough to use the Karatsuba algorithm for multiplication of large integers, which gives an O(n^1.6) asymptotic time complexity for n-digit multiplication. This is a huge improvement over the O(n^2) schoolbook algorithm when multiplying anything larger than a few hundred digits.

Unfortunately, division in Python is not so well equipped: Python uses a brute-force O(n^2) algorithm for quotients of any size. Many algorithms in mpmath perform several multiplications followed by a single large division; at high precision, the final division can take as much time as all the preceding multiplications together. Division is needed much less often than multiplication, but it is needed nonetheless.

Newton's method to the rescue. If f(x) can be evaluated to n-digit accuracy in time O(q(n)), then O(q(n)) is also the complexity of evaluating the inverse function f^(-1)(x) using Newton's method (under reasonable assumptions). Thus Newton's method allows one to divide as fast as one can multiply, compute logarithms as fast as exponentials, etc. The overhead hidden in the O()-expression is typically a small factor, on the order of 2-5.

To divide with the same asymptotic speed as multiplying, we rewrite p/q as p · (1/q) and use Newton's method to calculate r = 1/q. Newton's method leads to the iteration r_(n+1) = 2·r_n − q·r_n^2, which converges quadratically (i.e. each iteration doubles the accuracy). It is now important to exploit the self-correcting nature of Newton's method by performing each step with an arithmetic precision equal to the current accuracy; this way only a single step has to be performed at full precision. If this optimization is not used, the time complexity is O((log n) q(n)) rather than O(q(n)).
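The quadratic convergence is easy to see in ordinary floating point (a toy illustration of the iteration, not the fixed-point integer code below):

```python
q = 7.0
r = 0.1                        # rough initial guess for 1/7
for _ in range(6):
    r = r * (2 - q * r)        # Newton iteration: error roughly squares
print(abs(r - 1 / 7) < 1e-15)  # True
```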

Here is my implementation of Newton division in Python:

```python
from mpmath.lib import giant_steps, lshift, rshift
from math import log

START_PREC = 15

def size(x):
    if isinstance(x, (int, long)):
        return int(log(x,2))
    # GMPY support
    return x.numdigits(2)

def newdiv(p, q):
    szp = size(p)
    szq = size(q)
    szr = szp - szq
    if min(szp, szq, szr) < 2*START_PREC:
        return p//q
    r = (1 << (2*START_PREC)) // (q >> (szq - START_PREC))
    last_prec = START_PREC
    for prec in giant_steps(START_PREC, szr):
        a = lshift(r, prec-last_prec+1)
        b = rshift(r**2 * rshift(q, szq-prec), 2*last_prec)
        r = a - b
        last_prec = prec
    return ((p >> szq) * r) >> szr
```

The core algorithm is just a few lines, although those lines took me a few moments to write (getting the shifts right gives me a headache). I imported the functions lshift, rshift (which are like the << and >> operators but allow negative shifts) and giant_steps (which counts up from START_PREC to ... szr/4, szr/2, szr) from mpmath just for convenience; the implementation above requires essentially nothing but Python builtins.

How does it fare? A little benchmarking code:
```python
from time import clock

def timing(INT):
    fmt = "%10s %12f %12f %12f %8s"
    print "%10s %12s %12s %12s %8s" % ("size", "old time", "new time",
        "faster", "error")
    print "-"*78
    for i in range(4,30):
        n = 2**i
        Q = INT(10)**n
        P = Q**2
        t1 = clock()
        R1 = P // Q
        t2 = clock()
        R2 = newdiv(P,Q)
        t3 = clock()
        size, old_time, new_time = n, t2-t1, t3-t2
        faster, error = old_time/new_time, R2-R1
        print fmt % (size, old_time, new_time, faster, error)
```

I chose to benchmark the performance of dividing a 2n-digit integer by an n-digit integer, for a result (and required precision) of n digits. The improvement is likely to be smaller if the denominator and quotient have unequal size, but the balanced case is the most important to optimize.

Now the performance compared to Python's builtin division operator:
```
timing(int)
      size     old time     new time       faster    error
--------------------------------------------------------------
        16     0.000005     0.000101     0.050000       -1
        32     0.000005     0.000044     0.107595       -1
        64     0.000006     0.000052     0.123656        0
       128     0.000013     0.000064     0.197368        0
       256     0.000033     0.000088     0.377778       -1
       512     0.000115     0.000173     0.663430        0
      1024     0.000413     0.000399     1.035689        1
      2048     0.001629     0.001099     1.481576        0
      4096     0.006453     0.004292     1.503352       -1
      8192     0.025596     0.009966     2.568285        0
     16384     0.101964     0.030327     3.362182       -1
     32768     0.408266     0.091811     4.446792       -1
     65536     1.633296     0.278531     5.863967       -1
    131072     6.535185     0.834847     7.828001        0
    262144    26.108710     2.517134    10.372397        0
    524288   104.445635     7.576984    13.784593       -1
   1048576   418.701976    22.760790    18.395758        0
```

The results are excellent. The Newton division breaks even with the builtin division already at 1000 digits (despite the interpreter overhead), and is 10x faster at 260,000 digits.

The implementation in newdiv is not exact; as you can see, the results differ from the (presumably correct!) values computed by Python by a few units in the last place. This is not a big concern, as I intend to use this mainly for numerical algorithms, and it is always possible to simply add a few guard digits. For number theory and other applications that require exact division, I think it is sufficient to multiply a few of the least significant bits in q and r to see if they agree with p, and correct as necessary.

The point of this exercise was to try to give division of Python long ints the advantage of the builtin Karatsuba multiplication. Just for fun, I also tried it with gmpy integers:

```
from gmpy import mpz
timing(mpz)
      size     old time     new time       faster    error
--------------------------------------------------------------
        16     0.000011     0.000068     0.168033       -3
        32     0.000008     0.000045     0.185185       -2
        64     0.000006     0.000047     0.137725       -1
       128     0.000005     0.000053     0.089005       -2
       256     0.000007     0.000070     0.099602        0
       512     0.000015     0.000083     0.184564       -1
      1024     0.000044     0.000156     0.282143        0
      2048     0.000134     0.000279     0.481000       -1
      4096     0.000404     0.000657     0.614960       -2
      8192     0.001184     0.001719     0.688882       -1
     16384     0.003481     0.004585     0.759322       -2
     32768     0.009980     0.012043     0.828671       -1
     65536     0.027762     0.028783     0.964516       -1
    131072     0.072987     0.067773     1.076926       -1
    262144     0.186714     0.151604     1.231588       -1
    524288     0.462057     0.342160     1.350411       -1
   1048576     1.119103     0.788582     1.419134       -2
   2097152     2.726458     1.838570     1.482923       -1
   4194304     6.607204     4.069614     1.623546       -1
   8388608    15.627027     8.980883     1.740032       -1
  16777216    36.581787    19.167144     1.908567       -4
  33554432    83.568330    40.131393     2.082368       -1
  67108864   190.596889    81.412867     2.341115       -1
```

Interestingly, the Newton division breaks even at 130,000 digits and is twice as fast at 30 million digits. So it is clear that either gmpy does not use GMP optimally, or GMP does not use the asymptotically fastest possible division algorithm. I think the latter is the case; this page contains development code for the next version of GMP, including improved division code, and notes that "the new code has complexity O(M(n)) for all operations, while the old mpz code is O(M(n)log(n))". This seems consistent with my timings.

A small notice: I will be away for about a week starting tomorrow. I will bring my laptop to continue working on my GSoC project during the time, but I won't have internet access.

Edit: Fixed a misplaced parenthesis in newdiv that gave the wrong criterion for falling back to p//q. Benchmarks unaffected.

## Monday, July 7, 2008

### Hypergeometric series with SymPy

SymPy 0.6.0 is out, go get it! (It will be required for running the following code.)

Here is a nice example of what SymPy can be used for (I got the idea to play around with it today): automated generation of code for efficient numerical summation of hypergeometric series.

A rational hypergeometric series is a series (generally infinite) where the quotient between successive terms, R(n) = T(n+1)/T(n), is a rational function of n with integer (or equivalently rational) coefficients. The general term of such a series is a product or quotient of polynomials in n, integers raised to the power An+B, factorials (An+B)!, binomial coefficients C(An+B, Cn+D), etc. The Chudnovsky series for π, mentioned previously on this blog, is a beautiful example:

1/π = 12 · Σ_{n≥0} (−1)^n (6n)! (13591409 + 545140134·n) / ((3n)! (n!)^3 640320^(3n+3/2))

Although this series converges quickly (adding 14 digits per term), it is not efficient to sum it term by term as written. It is slow to do so because the factorials quickly grow huge; the series converges only because the denominator factorials grow even quicker than the numerator factorials. A much better approach is to take advantage of the fact that each (n+1)'th term can be computed from the n'th by simply evaluating R(n).

Given the expression for the general term T(n), finding R(n) in simplified form is a straightforward but very tedious exercise. This is where SymPy comes in. To demonstrate, let's pick a slightly simpler series than the Chudnovsky series:

Σ_{n≥0} (n!)^2 / (2n)!

The SymPy function hypersimp calculates R given T (this function, by the way, was implemented by Mateusz Paprocki who did a GSoC project for SymPy last year):
```python
>>> from sympy import hypersimp, var, factorial, pprint
>>> var('n')
n
>>> pprint(hypersimp(factorial(n)**2 / factorial(2*n), n))
 1 + n
-------
2 + 4*n
```

So to compute the next term during the summation of this series, we just need to multiply the preceding term by (n+1) and divide it by (4n+2). This is very easy to do using fixed-point math with big integers.

Now, it is not difficult to write some code to automate this process and perform the summation. Here is a first attempt:
```python
from sympy import hypersimp, lambdify
from sympy.mpmath.lib import MP_BASE, from_man_exp
from sympy.mpmath import mpf, mp

def hypsum(expr, n, start=0):
    """
    Sum a rapidly convergent infinite hypergeometric series with
    given general term, e.g. e = hypsum(1/factorial(n), n). The
    quotient between successive terms must be a quotient of integer
    polynomials.
    """
    expr = expr.subs(n, n+start)
    num, den = hypersimp(expr, n).as_numer_denom()
    func1 = lambdify(n, num)
    func2 = lambdify(n, den)
    prec = mp.prec + 20
    one = MP_BASE(1) << prec
    term = expr.subs(n, 0)
    term = (MP_BASE(term.p) << prec) // term.q
    s = term
    k = 1
    while abs(term) > 5:
        term *= MP_BASE(func1(k-1))
        term //= MP_BASE(func2(k-1))
        s += term
        k += 1
    return mpf(from_man_exp(s, -prec))
```

And now a couple of test cases. First some setup code:
```python
from sympy import factorial, var, Rational, binomial
from sympy.mpmath import sqrt

var('n')
fac = factorial
Q = Rational
mp.dps = 1000  # sum to 1000 digit accuracy
```

Some formulas for e (source):
```python
print hypsum(1/fac(n), n)
print 1/hypsum((1-2*n)/fac(2*n), n)
print hypsum((2*n+1)/fac(2*n), n)
print hypsum((4*n+3)/2**(2*n+1)/fac(2*n+1), n)**2
```

Ramanujan series for π (source):
```python
print 9801/sqrt(8)/hypsum(fac(4*n)*(1103+26390*n)/fac(n)**4/396**(4*n), n)
print 1/hypsum(binomial(2*n,n)**3 * (42*n+5)/2**(12*n+4), n)
```

Machin's formula for π:
```python
print 16*hypsum((-1)**n/(2*n+1)/5**(2*n+1), n) - \
       4*hypsum((-1)**n/(2*n+1)/239**(2*n+1), n)
```

A series for √2 (Taylor series for √(1+x), accelerated with Euler transformation):
```python
print hypsum(fac(2*n+1)/fac(n)**2/2**(3*n+1), n)
```

Catalan's constant:
```python
print 1./64*hypsum((-1)**(n-1)*2**(8*n)*(40*n**2-24*n+3)*fac(2*n)**3*\
    fac(n)**2/n**3/(2*n-1)/fac(4*n)**2, n, start=1)
```

Some formulas for ζ(3) (source):
```python
print hypsum(Q(5,2)*(-1)**(n-1)*fac(n)**2 / n**3 / fac(2*n), n, start=1)
print hypsum(Q(1,4)*(-1)**(n-1)*(56*n**2-32*n+5) / \
    (2*n-1)**2 * fac(n-1)**3 / fac(3*n), n, start=1)
print hypsum((-1)**n * (205*n**2 + 250*n + 77)/64 * \
    fac(n)**10 / fac(2*n+1)**5, n)
P = 126392*n**5 + 412708*n**4 + 531578*n**3 + 336367*n**2 + 104000*n + 12463
print hypsum((-1)**n * P / 24 * (fac(2*n+1)*fac(2*n)*fac(n))**3 / \
    fac(3*n+2) / fac(4*n+3)**3, n)
```

All of these calculations finish in less than a second on my computer (with gmpy installed). The generated code for the Catalan constant series and the third ζ(3) series is actually almost equivalent to the code used by mpmath for computing these constants. (If I had written hypsum earlier, I could have saved myself the trouble of implementing them by hand!)

This code was written very quickly and can certainly be improved. For one thing, it should do some error detection (if the input is not actually hypergeometric, or if hypersimp fails). It would also be better to generate code that sums the series using binary splitting rather than repeated division.

To perform binary splitting, one must know the number of terms in advance. Finding out how many terms must be included to obtain an accuracy of p digits can be done generally by numerically solving the equation T(n) = 10^(−p) (for example with mpmath). If the series converges at a purely geometric rate (and this is often the case), the rate of convergence can also be computed symbolically. Returning to the Chudnovsky series, for example, we have

```python
>>> from sympy import *
>>> fac = factorial
>>> var('n')
n
>>> P = fac(6*n)*(13591409+545140134*n)
>>> Q = fac(3*n)*fac(n)**3*(-640320)**(3*n)
>>> R = hypersimp(P/Q,n)
>>> abs(1/limit(R, n, oo))
151931373056000
>>> log(_,10).evalf()
14.1816474627255
```

So the series adds 14.18165 digits per term.
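Turning that rate into a term count is then a one-liner (a back-of-envelope computation added here for illustration):

```python
import math

digits_per_term = math.log10(151931373056000)   # the limit computed above
p = 1000000                                     # target: one million digits
print(math.ceil(p / digits_per_term))           # about 70514 terms needed
```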

With some more work, this should be able to make it into SymPy. The goal should be that if you type in a sum, and ask for a high precision value, like this:

```python
>>> S = Sum(2**n/factorial(n), (n, 0, oo))
>>> S.evalf(1000)
```

then Sum.evalf should be able to automatically figure out that the sum is of the rational hypergeometric type and calculate it using the optimal method.

## Thursday, July 3, 2008

### Faster pi, more integrals and complex numbers

Yesterday, I implemented the Chudnovsky algorithm in mpmath for computing π (see commit). This turns out to be about 3x faster than the old algorithm, so mpmath now only needs about 10 seconds to compute 1 million digits of π. To try it out, fetch the SVN version of mpmath, make sure gmpy 1.03 is installed, and run:
```python
>>> from mpmath import mp, pi
>>> mp.dps = 10**6
>>> print pi
```

Right now, computing π with mpmath is actually faster than with SAGE (but knowing the SAGE people, this bug will be fixed soon :-).

As promised in the last post, I will now write in more detail about the most recently added features to my numerical evaluation code for SymPy. For the code itself, see evalf.py and test_evalf.py.

#### New functions

The functions atan (for real input) and log (for positive real input) have been added.
```python
>>> N('log(2)', 50)
'0.69314718055994530941723212145817656807550013436026'
>>> N('16*atan(1/5) - 4*atan(1/239)', 50)
'3.1415926535897932384626433832795028841971693993751'
```

The working precision is increased automatically to evaluate log(1+ε) accurately:
```python
>>> N('log(2**(1/10**20))',15)
'6.93147180559945e-21'
```

#### Integrals

A second important new feature is support for integrals. There are still some bugs to be sorted out with the implementation, but the basics work.
```python
>>> from sympy import *
>>> var('x')
x
>>> gauss = Integral(exp(-x**2), (x, -oo, oo))
>>> N(gauss, 15)
'1.77245385090552'
```

Integrals can be used as part of larger expressions, and adaptive evaluation works as expected:
```python
>>> N(gauss - sqrt(pi) + E*Rational(1,10**20), 15)
'2.71828182845904e-20'
```

For reasonably nice integrands, the integration routine in mpmath can provide several hundred digits fairly quickly. Of course, any numerical integration algorithm can be fooled by pathological input, and the user must assume responsibility for being aware of this fact. In many common situations, however, numerical errors can be detected automatically (and doing this well is something I will look into further).

#### Complex arithmetic

The most important new feature is support for multiplication and addition of complex numbers.

In an earlier post, I posed the question of how to best track accuracy for complex numbers. This turns out not to be such a difficult problem; as soon as I got started with the implementation, I realized that there is only one reasonable solution. I have decided to track the accuracy of the real and imaginary parts separately, but to count the accuracy of a computed result as the accuracy of the number as a whole.

In other words, a computed result z denotes a point in the complex plane and the real and imaginary errors define a rectangular uncertainty region, centered around z. The other option would have been a circular disk, requiring only a single real error value (specifying radius). The rectangular representation is somewhat easier to work with, and much more powerful, because it is very common that the real part is known much more accurately than the imaginary part, and vice versa.

If the half-width and half-height of the error rectangle are given by the complex number w, then the absolute error can be defined in the usual way as |w| and the relative error as |w|/|z|. (For computational purposes, the complex norm can be approximated accurately using the max norm. This is wrong by at most a factor √2, or logarithmically by log_2(√2) = 0.5 bits ≈ 0.15 decimal digits.)
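(That factor-of-√2 claim is easy to confirm empirically; this throwaway check is my addition, not from the original post:)

```python
import math, random

# The ratio |z| / max(|re|, |im|) always lies in [1, sqrt(2)].
worst = 1.0
random.seed(1)
for _ in range(10**5):
    re, im = random.uniform(-1, 1), random.uniform(-1, 1)
    worst = max(worst, math.hypot(re, im) / max(abs(re), abs(im)))
print(worst <= math.sqrt(2) + 1e-12)  # True
```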

In other words, if |w|/|z| = 10^(−15), the result is considered accurate to 15 digits. This can either mean that both the real and imaginary parts are accurate to 15 digits, or that just one of them is, provided that the other is smaller in magnitude. For example, with a target accuracy of 15 digits: if the real part is fully accurate, and the imaginary part is a factor 10^3 smaller than the real part, then the latter need only be accurate to 15 − 3 = 12 digits. The advantage of this approach is that accuracy is preserved exactly under multiplication, and hence no restarting is required during multiplication.

As an example, consider the following multiplication in which the real parts completely cancel:
```python
>>> N('(1/3+2*I)*(2+1/3*I)', 10)
'.0e-12 + 4.111111111*I'
```

As noted in earlier posts, numerical evaluation cannot detect a quantity being exactly zero. The ".0e-12" is a scaled zero, indicating a real quantity of unknown sign and magnitude at most equal to 1e-12. (To clarify its meaning, it could perhaps be printed with a "±" sign in front.) If we treat it as 10^(−12), its relative accuracy is 0 digits (because 0 nonzero digits are known). But the result as a whole is accurate to 10 digits, due to the imaginary part being more than 10^10 times larger and accurate to 10 digits.

In an earlier post, I speculated about the total accuracy being problematic to use for complex results, because of subsequent additions potentially causing cancellations. This was rather stupid, because the problem already existed for purely real numbers, and was already solved by the existing adaptive addition code. To implement complex addition, I basically just needed to refactor the code to first evaluate all the terms and then add up the real and imaginary parts separately.

Building on the previous example, where the imaginary part held all the accuracy, we can try to subtract the entire imaginary part:

>>> N('(1/3+2*I)*(2+1/3*I) - 37/9*I + pi/10**50', 10)
'3.141592654e-50 + .0e-62*I'

The addition routine finds that both the real and the imaginary parts are inaccurate and retries at higher precision until it locates the tiny real part that has been added. Alternatively, the following test also works (the imaginary part is displayed as 1.0e-15 despite having been computed accurately to 10 digits, because mpmath strips trailing zeros — this could perhaps be changed):
>>> N('(1/3+2*I)*(2+1/3*I) - (37/9 - 1/10**15)*I', 10)
'.0e-30 + 1.0e-15*I'

The computation

>>> N('(1/3+2*I)*(2+1/3*I) - 37/9*I', 10)

hangs, as it should, because I have still not implemented any stopping criterion for apparent total cancellations.

So there is still work to do :-)

## Tuesday, July 1, 2008

### GMPY makes mpmath more speedy

This post will mostly be about recent mpmath development, but first a short update about my GSoC progress. Over the last week, I have improved the support for complex numbers (addition and multiplication of complex numbers works and detects cancellation in the real and imaginary parts), implemented some more functions, and added support for integrals. I still need to clean up and test the code some more. A longer update will be posted later today or tomorrow.

Besides working directly with the GSoC project, I've been busy playing with the results of a patch for mpmath that was submitted a few days ago by casevh. This brilliant patch allows mpmath to use GMPY mpz's instead of Python's built-in long integers.

GMPY, of course, is a Python wrapper for the highly optimized GMP bignum library. The beauty of GMPY is that you can just import it anywhere in your Python code, change x = 1 to x = mpz(1), and then almost any code written under the assumption of x being a Python int will continue to work, the only difference being that it will be much faster if x grows large.
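A minimal sketch of this drop-in property (using gmpy2, the modern successor of GMPY, and falling back to plain ints if it is unavailable; the snippet is illustrative, not the patch itself):

```python
try:
    from gmpy2 import mpz   # GMP-backed integers (gmpy2 succeeded GMPY)
except ImportError:
    mpz = int               # fallback: the code below runs unchanged with Python ints

x = mpz(1)
for _ in range(200):
    x *= 3                  # identical semantics either way; only speed differs as x grows
print(x == 3**200)          # prints True
```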

(Mpmath attempts to accomplish something similar, being an intended drop-in replacement for the math and cmath modules and even parts of SciPy. However, mpmath's advantage is only that it gives increased precision; for low-precision number crunching, it is obviously much slower than ordinary floats.)

To try out mpmath with GMPY support, you need to check out the current development version from the mpmath SVN repository. There will be a new release soon, but not in the next few days. Casevh discovered a few bugs in GMPY while implementing this feature, so the patched version 1.03 of GMPY is needed. Mpmath will still work if an older version of GMPY is present; in that case, it will simply ignore it and use Python integers as usual. Just to be clear, mpmath is still a pure-Python library and will continue to work fine with GMPY unavailable (it will just not be as fast).

The improvement is quite significant. At precisions between 1,000 and 10,000 digits, most mpmath functions are on the order of 10 times faster with GMPY enabled. I have on occasion tried out some computations at 100,000 digits with mpmath; although they work fine, I have had to do something else while waiting for the results. With GMPY enabled, computing something like exp(sin(sqrt(2))) to 100,000 digits now takes only a couple of seconds.

SymPy will of course benefit from this feature. N/evalf will be able to handle ill-conditioned input much more efficiently, and the maximum working precision can be set higher, giving improved power and reliability for the same computing time.

The GMPY mode is slightly slower than the Python mode at low precision (10 percent slower below 100 digits of precision, or thereabouts). That's something I can live with.

A particularly dramatic improvement can be seen by running the pidigits.py demo script. After I fixed an inefficiency in the number-to-string conversion, computing 1 million decimals of π with mpmath in GMPY-mode takes less than 30 seconds on my laptop:

C:\Source\mp\trunk\demo>pidigits.py
Compute digits of pi with mpmath
Which base? (2-36, 10 for decimal)
> 10
How many digits? (enter a big number, say, 10000)
> 1000000
Output to file? (enter a filename, or just press enter
to print directly to the screen)
> pi.txt
Step 1 of 2: calculating binary value...
  iteration 1 (accuracy ~= 0 base-10 digits)
  iteration 2 (accuracy ~= 2 base-10 digits)
  iteration 3 (accuracy ~= 4 base-10 digits)
  iteration 4 (accuracy ~= 10 base-10 digits)
  iteration 5 (accuracy ~= 21 base-10 digits)
  iteration 6 (accuracy ~= 43 base-10 digits)
  iteration 7 (accuracy ~= 86 base-10 digits)
  iteration 8 (accuracy ~= 173 base-10 digits)
  iteration 9 (accuracy ~= 348 base-10 digits)
  iteration 10 (accuracy ~= 697 base-10 digits)
  iteration 11 (accuracy ~= 1396 base-10 digits)
  iteration 12 (accuracy ~= 2793 base-10 digits)
  iteration 13 (accuracy ~= 5587 base-10 digits)
  iteration 14 (accuracy ~= 11176 base-10 digits)
  iteration 15 (accuracy ~= 22353 base-10 digits)
  iteration 16 (accuracy ~= 44707 base-10 digits)
  iteration 17 (accuracy ~= 89414 base-10 digits)
  iteration 18 (accuracy ~= 178830 base-10 digits)
  iteration 19 (accuracy ~= 357662 base-10 digits)
  iteration 20 (accuracy ~= 715325 base-10 digits)
  iteration 21 (accuracy ~= 1000017 base-10 digits)
  final division
Step 2 of 2: converting to specified base...
Writing output...
Finished in 28.540354 seconds (26.498292 calc, 2.042062 convert)

The same operation takes around 30 minutes in Python mode.

The speed of computing π with mpmath+GMPY is entirely respectable; some of the specialized π programs out there are actually slower (see Stu's pi page for an overview). The really fast ones run in 2-3 seconds; most require between 30 seconds and a minute.

One reason for mpmath+GMPY still being a bit slower for calculating π than the fastest programs is that it uses an arithmetic-geometric mean (AGM) based algorithm. The fastest algorithm in practice is the Chudnovsky series (with binary splitting), but its implementation is more complex, and making it really fast would require eliminating the Python and GMPY overhead as well. (A highly optimized implementation for GMP has been written by Hanhong Xue.)

The presence of GMPY does change the relative performance of various algorithms, particularly by favoring algorithms that take advantage of asymptotically fast multiplication. To try this out, I implemented binary splitting for computing e (see commit) in mpmath. Computing 1 million digits of e with mpmath now takes 3 seconds (converting to a base-10 string takes an additional 2 seconds). It turns out that the binary splitting method for e is faster than the old direct summation method in Python mode as well, but only slightly.
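For the curious, here is a self-contained sketch of binary splitting for e = Σ 1/k! (the idea, not mpmath's actual implementation): the partial sum over a range is kept as an exact fraction P/Q with Q = (a+1)(a+2)···b, and two half-ranges merge with one big multiplication each, which is exactly where asymptotically fast integer multiplication pays off.

```python
import math

def bsplit(a, b):
    # Returns (P, Q) with P/Q = sum_{k=a+1}^{b} 1/((a+1)(a+2)...k)
    # and Q = (a+1)(a+2)...b.
    if b - a == 1:
        return 1, b
    m = (a + b) // 2
    p1, q1 = bsplit(a, m)
    p2, q2 = bsplit(m, b)
    # merge: S(a,b) = S(a,m) + S(m,b)/Q(a,m)
    return p1 * q2 + p2, q1 * q2

def e_digits(n):
    # First n+1 digits of e as an integer, i.e. floor(e * 10**n).
    # Pick N with N! > 10**(n+10) so the truncation error is negligible.
    N, logfac = 1, 0.0
    while logfac < (n + 10) * math.log(10):
        N += 1
        logfac += math.log(N)
    p, q = bsplit(0, N)          # p/q = e - 1, summed exactly to N terms
    return 10**n + p * 10**n // q

print(e_digits(10))  # 27182818284
```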

Two of the most important functions are exp and log. Mpmath computes exp via Taylor series, using Brent's trick of replacing x by x/2^(2k) to accelerate convergence. Log is computed by inverting exp with Newton's method (in fact, Halley's method is used now, but the principle is the same). This makes log roughly half as fast as exp at low precision, but asymptotically as fast as exp. With GMPY, it becomes much faster to compute log at high precision using an AGM-based algorithm. This also implies that it becomes faster to compute exp at extremely high precision using Newton inversion of log, in an interesting reversal of roles. I have written some preliminary code for this, which I should be able to commit soon.
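As a toy illustration of the halving trick (with exact rational arithmetic standing in for mpmath's fixed-point arithmetic; the function name is made up): halve the argument k times, sum a short Taylor series, then square k times.

```python
from fractions import Fraction
import math

def exp_brent(x, terms=30, k=8):
    # exp(x) = exp(x / 2**k) ** (2**k): the halvings make the Taylor
    # series converge much faster, at the cost of k squarings at the end.
    x = Fraction(x) / 2**k
    s = term = Fraction(1)
    for n in range(1, terms):
        term *= x / n          # term = x**n / n!
        s += term
    for _ in range(k):
        s *= s                 # undo the halvings by repeated squaring
    return float(s)

print(exp_brent(1))  # 2.718281828459045
```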

Several others have made great contributions to mpmath recently. Mario Pernici has improved the implementations of various elementary functions, Vinzent Steinberg has submitted some very interesting new root-finding code, and Mike Taschuk sent in a file implementing Jacobi elliptic functions for mpmath.

Felix Richter, who works for the semiconductor theory group at the physics department of the University of Rostock, shared some code for solving linear equation systems in a private correspondence. (I promised I would tidy it up and add it to mpmath, but it appears that I've been too lazy to do that yet.) He had some very interesting things to say about his research and why he wrote the code (I hope he doesn't mind that I quote a part of his mail here):

The range of problems we currently attack can be described as follows: Imagine a light beam incident on some solid body, e.g., a semiconductor crystal of some 50 micrometers length. When the light propagates into the sample it gets damped, and its strength will sooner or later become incredibly small, i.e., it can no longer be held in a double precision float variable.

However, a lot of interesting things can be calculated with the help of these electromagnetic field strength, such as predictions for the output of semiconductor lasers or signs for the much-sought-after Bose-Einstein condensation of excitons (electron-hole pairs in a semiconductor). I sure shouldn't go much more into detail here, as I suppose you're not a physicist.

Just one example: As a consistency check, the equation

1 - |r|^2 - |t|^2 = \int dx dx' A(x') chi(x,x') A(x)

should be fullfilled for any light frequency. r, t, A(x) are complex electromagnetic field strengths and just cannot be calculated with machine precision alone, not to mention further calculations based on them. (Chi is the susceptibility and describes the electromagnetic properties of the matter).

However, due to the magic of math (and mpmath ;-) ), the interplay of these incredibly small numbers yields two perfectly equal numbers between 0 and 1, as it should.

I'm really happy having chosen Python for my work. Mpmath fits in well, its so easy to use. I especially like the operator overloading, so that I may add some complex multiprecision numbers with a simple "+". And mpmath's mathematical and numerical function library is so extensive that it can take you really a long way.

It is always fun to hear from people who use software you're working on, but it's especially rewarding when you learn that the software is being used to solve real problems.

Computing a million digits of π, on the other hand, could hardly be called a real-world problem. But it is the sort of thing any competent arbitrary-precision library should be able to do without effort. Sometimes math is done to construct semiconductor lasers; sometimes math is done just because it's fun (and often there is some overlap :-).

## Sunday, June 22, 2008

### Taking N to the limit

A simple numerical algorithm to compute lim(x→a) f(x) in the case when f(a) is undefined is to evaluate f(a+ε) for some fixed small value ε (or f(1/ε) in case a = ∞). The expression f(a+ε) or f(1/ε) is typically extremely poorly conditioned and hence a challenge for a numerical evaluation routine.

As a stress test for N, I have tried numerically evaluating all the limits in test_demidovich.py, a set of tests for SymPy's symbolic limit function.

I chose to evaluate each limit to 10 accurate digits, using ε = 10^-50. Some simplifications were necessary:

• Since the numerical limit algorithm described above cannot generally detect convergence to 0 or ∞ (giving pseudorandom tiny or huge values instead), I chose to interpret any magnitude outside the range 10^-10 to 10^10 as 0 or ∞.

• SymPy's limit function supports limits containing parameters, such as lim(x→0) (cos(mx)-cos(nx))/x^2 = (n^2-m^2)/2. In all such cases, I replaced the parameters with arbitrary values.

The nlimit function with tests is available in the file limtest.py (requiring evalf.py, mpmath and the hg version of SymPy). A straightforward limit evaluation looks like this:

>>> nlimit(sin(x)/(3*x), x, 0, 10)
'0.3333333333'
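A toy double-precision analogue of this scheme (the real nlimit evaluates adaptively through N with ε = 10^-50; here the name is made up and the thresholds are scaled down to suit ordinary floats) can be written as:

```python
import math

def nlimit_toy(f, a=0.0, eps=1e-8):
    # Evaluate f(a + eps), or f(1/eps) if a is infinite, and interpret
    # extreme magnitudes as convergence to 0 or infinity (cf. the
    # 10^-10 .. 10^10 window used with eps = 10^-50 above).
    x = 1.0 / eps if a == math.inf else a + eps
    v = f(x)
    if abs(v) < 1e-6:
        return 0.0
    if abs(v) > 1e6:
        return math.inf
    return v

print(nlimit_toy(lambda x: math.sin(x) / (3 * x)))  # ≈ 0.3333333333
```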

The results of the tests? After fixing two minor bugs in N that the tests exposed, nlimit passes 50 of the 53 tests. It fails only three tests, all involving the functions log and asin, which are not yet implemented. SymPy's limit function fails 8 of the 53 tests; in each of those cases, nlimit gives the correct value to 10 digits.

nlimit is 100 times faster than limit, processing all the test cases in 0.18 (versus 18) seconds.

Despite only 10 digits being requested, running N with verbose=True shows that upwards of 800 bits of working precision are required for some of the limits, indicating very clearly the need for adaptive numerical evaluation.

The heuristic of using a fixed, finite ε will not work in case a limit converges extremely slowly. And of course, limit gives a nice, symbolic expression instead of an approximation (nlimit could give an exact answer in simple cases by passing its output through number recognition functions in mpmath). Due to these limitations, a numerical limit algorithm is at best a complement to a symbolic algorithm. The point, at this moment, is just to test N, although providing a numerical limit function in SymPy would also be a good idea.

I should note that there are much more sophisticated algorithms for numerical limits than the brute-force method described here. Such algorithms are especially necessary when evaluating limits of indexed sequences where each element is expensive to compute (e.g. for summation of infinite series). A few acceleration methods for sequences and series are available in mpmath.

## Friday, June 20, 2008

### How many digits would you like?

My last post discussed the implementation of a number type that tracks the propagation of initial numerical uncertainties under arithmetic operations. I have now begun implementing a function that in some sense does the reverse; given a fixed formula and desired final accuracy, it produces a numerical value through recursive evaluation. I've named the function N because it behaves much like Mathematica's function with the same name.

The file is available here (the code needs a lot of cleanup at this point, so please be considerate). It contains a small test suite that should pass if you try running it.

The input to N can be a SymPy expression or a string representing one. For simplicity, the returned value is currently just a string. The second argument to N specifies the desired precision as a number of base-10 digits:

>>> from sympy import *
>>> from evalf import N
>>> N(pi,30)
'3.14159265358979323846264338328'
>>> N('355/113',30)
'3.14159292035398230088495575221'

The set of supported expressions is currently somewhat limited; examples of what does work will be given below.

As I have said before, an important motivation for an adaptive algorithm for numerical evaluation is to distinguish integers from non-integers (or more simply, distinguishing nonzero numbers from zero). Numerical evaluation is, as far as I know, the only general method to evaluate functions such as x ≥ y, sign(x), abs(x) and floor(x). Due to the discontinuous nature of these functions, a tiny numerical error can cause a drastically wrong result if undetected, leading to complete nonsense in symbolic simplifications.

There are many known examples of "high-precision fraud", i.e. cases where an expression appears to be identical to another if evaluated to low numerical precision, but where there is in fact a small (and sometimes theoretically important) difference. See for example MathWorld's article, "Almost Integer". Some of these examples are rather conspicuous (e.g. any construction involving the floor function), but others are surprising and even involve elegant mathematical theory. In any case, they are a great way to test that the numerical evaluation works as intended.

#### Some algebraic examples

A neat way to derive almost-integers is based on Binet's formula for the Fibonacci numbers, F(n) = (φ^n - (-φ)^(-n))/√5, where φ is the golden ratio (1+√5)/2. The (-φ)^(-n) term decreases exponentially as n grows, meaning that φ^n/√5 alone is an excellent (although not exact) approximation of F(n). How good? Let's compute F(n) - φ^n/√5 for a few n (we can use SymPy's exact Fibonacci number function fibonacci(n) to make sure no symbolic simplification accidentally occurs):

>>> binet = lambda n: ((1+sqrt(5))/2)**n/sqrt(5)
>>> N(binet(10) - fibonacci(10), 10)
'3.636123247e-3'
>>> N(binet(100) - fibonacci(100), 10)
'5.646131293e-22'
>>> N(binet(1000) - fibonacci(1000), 10)
'4.60123853e-210'
>>> N(binet(10000) - fibonacci(10000), 10)
'5.944461218e-2091'

N works much better than the current fixed-precision evalf in SymPy:

>>> (fibonacci(1000) - binet(1000)).evalf()
-1.46910587887435e+195

With N, we find that the simplified Binet formula not only gives the correct Fibonacci number to the nearest integer; for F(10000), you have to look 2000 digits beyond the decimal point to spot the difference. A more direct approach, of course, is to simply evaluate the (-φ)n term; the beauty of the implementation of N is that it works automatically, and it will still work in case there are a hundred terms contributing to cancel each other out (a much harder situation for a human to analyze).
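In plain double precision, the same phenomenon is what makes the rounded formula a valid way to compute Fibonacci numbers (for moderate n, before the 53-bit mantissa runs out):

```python
import math

phi = (1 + math.sqrt(5)) / 2

def fib(n):
    # rounding removes the exponentially small (-phi)**(-n) term
    return round(phi**n / math.sqrt(5))

print([fib(n) for n in range(1, 11)])  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```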

Another, related, well-known result is that F(n+1)/F(n) is a close approximation of the golden ratio. To see how close, we can just compute the difference:

>>> N(fibonacci(10001)/fibonacci(10000) - (1+sqrt(5))/2, 10)
'3.950754128e-4180'
>>> N(fibonacci(10002)/fibonacci(10001) - (1+sqrt(5))/2, 10)
'-1.509053796e-4180'

The approximation is good to over 4000 digits. Note also the signs; based on the numerical results, we could compute the exact value of the function sign(F(n+1)/F(n) - φ) for any specific value of n (and find that it is positive for odd n and negative for even n). Indeed, I will later implement the sign function (and related functions) in SymPy precisely this way: just try to call N() asking for 10 digits (or 3, it doesn't make much of a difference), and use the sign of the computed result if no error occurs.

Let's also revisit Rump's example of an ill-conditioned function, which was mentioned in my previous blog post. I have given N the ability to substitute numerical values for symbols (this required roughly two lines of code), in effect allowing it to be used for function evaluation. When asked for 15 digits, N gives the correct value right away:

>>> var('x y')
>>> a = 1335*y**6/4+x**2*(11*x**2*y**2-y**6-121*y**4-2) + \
...     11*y**8/2+x/(2*y)
>>> N(a, 15, subs={x:77617, y:33096})
'-0.827396059946821'

With the "verbose" flag set, N shows that it encounters cancellations during the addition and has to restart twice:

>>> N(a, 15, subs={x:77617, y:33096}, verbose=1)
ADD: wanted 54 accurate bits, got -7 -- restarting with prec 115
ADD: wanted 54 accurate bits, got 2 -- restarting with prec 167
'-0.827396059946821'

#### Transcendental functions

N currently supports the constants π and e, and the functions x^y, exp, cos and sin. I refer to my previous post for a discussion of the issues involved in (real) exponentiation. Suffice it to say, N figures out that in order to compute 10 mantissa digits of π to the power of 1 googol, it needs 110 digits of precision:

>>> N(pi ** (10**100), 10)
'4.946362032e+4971498726941338543512682882908988736516783243804424461340534999249471120895526746555473864642912223'

It is also able to cope with cancellation of exponentials close to unity:

>>> N('2**(1/10**50) - 2**(-1/10**50)',15)
'1.38629436111989e-50'

The trigonometric functions are a bit more interesting. Basically, to compute cos(x) or sin(x) to n accurate digits, you need to first evaluate x with an absolute error of at most 10^-n. In order to calculate x to within a given absolute error, the magnitude of x must be known first, so two evaluations are generally required. N avoids the problem by evaluating x to a few extra bits the first time; if it turns out that |x| < C for an appropriate constant (say C = 1000), a second evaluation is not necessary. By appropriately increasing the internal precision, correct evaluations such as the following are possible:

>>> N('sin(exp(1000))',15)
'-0.906874170721915'
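The same evaluation can be reproduced with plain mpmath by raising the working precision by hand, which is exactly the bookkeeping N automates (the precision figure below is a hand-picked overestimate):

```python
from mpmath import mp, mpf, exp, sin

mp.dps = 455          # exp(1000) has ~435 digits before the point; keep a margin
x = exp(mpf(1000))    # the argument is now known to well below 1e-15 absolutely
s = sin(x)
mp.dps = 15
print(+s)             # unary + rounds to the current precision: -0.906874170721915
```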

There is an additional complication associated with evaluating trigonometric functions. If the argument is very close to a root (i.e. a multiple of π for sin, or offset by π/2 for cos), the precision must be increased further. N detects when this is necessary, and is for example able to deduce the following:

>>> N(sin(pi*10**1000 + Rational(1,10**1000), evaluate=False), 10)
'1.0e-1000'

The test shows that there is no difference between evaluating sin(2πn + x) and sin(x), except of course for speed. The evaluate=False was added to prevent SymPy from removing the full-period term π · 10^1000. This automatic simplification is of course a SymPy feature; indeed, it makes the cleverness in N redundant in many cases by automatically reducing the argument to a region close to zero. However, the symbolic simplification is not of much help when x happens to be close to a multiple of 2π without having that explicit symbolic form. To demonstrate, let's combine a trigonometric function with the Fibonacci number approximation from before:

>>> phi = (1+sqrt(5))/2
>>> N(sin(phi**3000 / sqrt(5) * pi), 15)
'1.53018563496763e-627'

Running with verbose=True shows that N sets the working precision to over 6000 bits before it arrives at those 15 digits.

#### The problem with zero

I have so far neglected to mention the issue of zero detection. Although adaptive numerical evaluation can identify a nonzero value in finite time, it cannot detect a zero.

Suppose we try to compute the difference between the explicit Fibonacci number F(n) and the expression for the same in terms of Binet's exact formula (φ^n - (-φ)^(-n))/√5. N(...,10) will in effect attempt to find 10 nonzero digits of 0, and of course fail, getting stuck in an infinite loop:

>>> phi = (1+sqrt(5))/2
>>> binet = lambda n: (phi**n - (-phi)**(-n))/sqrt(5)
>>> N(binet(100) - fibonacci(100), 10, verbose=1)
ADD: wanted 56 accurate bits, got -1 -- restarting with prec 113
ADD: wanted 56 accurate bits, got 0 -- restarting with prec 169
ADD: wanted 56 accurate bits, got -3 -- restarting with prec 228
...
ADD: wanted 56 accurate bits, got 0 -- restarting with prec 524753
ADD: wanted 56 accurate bits, got -1 -- restarting with prec 1049051
...

To deal with this problem, it will be necessary to set a threshold precision or maximum number of iterations, with a reasonable default value. This number should be possible to override by the user, either globally or by providing a keyword argument to N (perhaps both).

There are several possible courses of action in case the threshold is reached. A cheap and practical solution is to simply consider any smaller quantity to be zero. This can be done either silently or with a printed warning message. Rigorists would perhaps find it more satisfactory if an exception were raised. A final possibility is to prompt the user. I think all these options should be available; the question is what to do by default.
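The restart loop with a precision ceiling might be sketched like this (all names are hypothetical; the real N works on SymPy expressions rather than callables):

```python
class ZeroDetectionError(ArithmeticError):
    pass

def adaptive_eval(eval_at, wanted_bits, max_bits=2**20):
    # eval_at(prec) -> (value, accurate_bits); restart at doubled
    # precision until enough bits survive, or give up and report that
    # the quantity may be an exact zero.
    prec = wanted_bits + 10
    while True:
        value, acc = eval_at(prec)
        if acc >= wanted_bits:
            return value
        if prec > max_bits:
            raise ZeroDetectionError("apparent total cancellation")
        prec *= 2

# an evaluation losing a fixed 100 bits to cancellation succeeds quickly
print(adaptive_eval(lambda p: (1.0, p - 100), 53))  # 1.0
```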

Fortunately, in a computer algebra system, most cancellations can be detected and turned into explicit zeros before they reach the N function (1-1, sin(πn), etc). I am not entirely sure about the terminology here, but I think this ability to symbolically detect zeros is the essential difference between computer algebra systems and what is sometimes called "lazy infinite-precision reals" (or something similar).

#### Complex numbers

Currently, N does not know how to deal with complex numbers (I have so far only written some placeholder code for this).

Addition should be relatively easy to implement: just add the real and imaginary parts separately and check the accuracy of each.

Multiplication is the simplest of all operations in the purely real case, because there are no cancellation effects whatsoever; all that is needed is a few guard bits to deal with rounding. In fact, multiplying purely real and purely imaginary quantities already works (this is just a matter of keeping an extra boolean variable around to track whether the product is imaginary; in effect a pseudo-polar representation):

>>> N('3*pi*I',10)
'9.424777961*I'

With general complex numbers, however, multiplication in rectangular form translates into addition, and I think cancellation effects may come into play so that it will be a little more complicated to implement correctly. For multiplication, it would be much nicer to use polar complex numbers, but that makes addition harder. There's just no escape...

One thing I'm wondering about is how to define accuracy for complex numbers. One could either consider the accuracy of the number as a whole, or of the real and imaginary parts separately.

It is very common to encounter sums of complex numbers with conjugate imaginary parts, i.e. (a+bi) + (c-bi). What should N do if it obtains the requested number of digits, say 15, for the real part, but is unable to deduce anything about the imaginary part except that it is smaller than 10^-15 times the real part? By the definition of relative error as z·(1+error), N should arguably be satisfied with that. But matters become more complicated if the number is subsequently passed to a function that is very sensitive to changes in the imaginary part alone (in the extreme case, the imaginary part function Im(z)).

#### Performance

What about speed? In the best case (i.e. no restarts), N seems to be about as fast as direct evaluation with mpmath. This might be surprising, since N both keeps track of errors and manually traverses the expression tree. I had actually expected N to come out faster, since much of the need for instance creation and floating-point normalization is eliminated, but it turns out that a good deal of that is still needed, and the additional error bookkeeping eats up most of the remaining advantage.

There are some potential optimizations that could be exploited. One would be to take advantage of the TERMS/FACTORS representation employed in SympyCore. In a sum like a·x + b·y + ... where the coefficients a, b... are integers or simple fractions (a very common situation), the coefficients can be absorbed into the sum on the fly instead of recursively evaluating each term as a full product. Another optimization would be to save the magnitude (exponent + width of mantissa) and sign of each term in the numerical evaluation of sums. This way, an optimal precision can be chosen for each term in case the evaluation has to be restarted at higher precision.

#### Project status

This completes an important part of my GSoC project, namely to implement a reliable numerical expression evaluation algorithm. What remains now is first of all to add support for complex numbers and more mathematical functions, and to provide alternatives to the infinite loop for dealing with zeros. The code should also work with something like the num class I posted previously; N currently assumes that expressions are exact and can be approximated to any accuracy, but it should also be able to accommodate expressions initially containing approximate numbers. And of course, it all has to be implemented in SymPy and debugged.

Post scriptum: it seems that I forgot to provide an example of applying N to Ramanujan's constant. I'll leave this as an entertaining exercise to the reader.

## Wednesday, June 11, 2008

### Basic implementation of significance arithmetic

This .py file contains a work-in-progress implementation of significance arithmetic, using mpmath as a base. The main class is currently called "num" (sorry, I just haven't come up with a good name yet). An instance can be created from a float or int value. A second number that specifies the accuracy (measured in decimal digits) can be passed; floats are assumed by default to have an accuracy of 53 bits or 15.95 decimal digits. The method .accuracy(b) gives the estimated number of accurate digits in the given base (default b = 10).

>>> num(1.2345678901234567890123456789)
1.23456789012346
>>> _.accuracy()
15.954589770191001
>>> num(1.2345678901234567890123456789, 5)
1.2346
>>> _.accuracy()
5.0
>>> num(1.2345678901234567890123456789, 25)
1.234567890123456690432135
>>> _.accuracy()
25.0

In the last example, the fact that the input float has a limited accuracy as a representation of the entered decimal literal becomes visible. (Support for passing an exact string as input to avoid this problem will be added.)

The accuracy is a measure of relative accuracy (or relative error). A num instance with value u and accuracy a represents the interval u · (1 + ξ·2^-a) for values of ξ between -1 and 1. The relative error, in the traditional numerical analysis meaning of the term, is given by 2^-a and the absolute error is given by |u| · 2^-a. In other words, the accuracy is a logarithmic (base 2) measure of error; in some cases, it is more natural to consider the (logarithmic) error, given by -a.
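In code form, the translation from accuracy to the two error measures is just (a helper sketch, not part of the num class):

```python
def errors(u, a):
    # A num with value u and accuracy a (bits) represents u*(1 + xi*2**-a),
    # so the relative error is 2**-a and the absolute error is |u|*2**-a.
    rel = 2.0 ** -a
    return rel, abs(u) * rel

print(errors(8.0, 3))  # (0.125, 1.0)
```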

It can usually be assumed that a > 0. When this property does not hold, not even the sign of the number it is supposed to represent is known (unless additional assumptions are made). Such extremely inaccurate numbers will need special treatment in various places; when they result from higher operations (such as when the user asks for N digits of an expression in SymPy), they should probably signal errors.

How do errors propagate? The simplest case is that of multiplication. If x = u · (1+e) and y = v · (1+f), then x · y = u · v · (1+e) · (1+f) = (u · v) · (1 + e + f + e·f). Therefore the cumulative error is given by e + f + e·f. Expressed in terms of the logarithmic representation of the errors, 2^-a = e and 2^-b = f, the final accuracy is given by c = −log2(2^-a + 2^-b + 2^(-a-b)). This expression can usually be approximated as c = min(a,b), which is the rule I have implemented so far. (Further testing will show whether it needs to be refined.)
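Numerically, the exact formula and the min(a, b) shortcut agree to within about one bit (the helper name here is made up):

```python
import math

def mul_accuracy(a, b):
    # exact accuracy of x*y when x and y carry a and b accurate bits
    return -math.log2(2**-a + 2**-b + 2**(-a - b))

print(mul_accuracy(53, 53))   # ~52.0 -- one bit below min(a, b)
print(mul_accuracy(20, 53))   # ~20.0 -- min(a, b) is essentially exact
```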

This analysis does not account for the error due to roundoff in the floating-point product u · v. Such errors can be accounted for either by adding an extra term to the error sum or by setting the arithmetic precision slightly higher than the estimated accuracy. I'm currently taking the latter approach.

Division is similar to multiplication.

Addition and subtraction are a bit harder, as these operations need to translate the relative errors of all the terms into a combined absolute error, and then translate that absolute error back into the relative accuracy of the final sum. It is important to note that the accuracy of a sum can be much greater or much smaller than the minimum accuracy of the terms.

Translating between the relative and absolute error associated with a number involves knowing its magnitude, or rather the base-2 logarithm thereof. This is very easy with mpmath numbers, which are represented as tuples x = (sign, man, exp, bc) where bc is the bit size of the mantissa. The exact magnitude is given by log2(|x|) = log2(man) + exp, and this quantity can be approximated closely as exp + bc (although doing so admittedly does not pay off much, since a call to math.log is inexpensive in Python terms).
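For instance (with a made-up tuple, not an actual mpmath number):

```python
import math

def magnitude(sign, man, exp, bc):
    # cheap upper bound on log2(|x|): since 2**(bc-1) <= man < 2**bc,
    # the true log2(man) + exp lies in [exp + bc - 1, exp + bc)
    return exp + bc

x = (0, 5, 10, 3)                      # represents 5 * 2**10 = 5120
print(magnitude(*x), math.log2(5120))  # 13 vs 12.32...
```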

Perhaps the most important type of error that significance arithmetic catches is loss of significance due to cancellation of terms with like magnitude but different sign. For example, 355/113 is an excellent approximation of pi, accurate to more than one part in a million. Subtracting the numbers from each other, with each given an initial accuracy of ~15 digits, leaves a number with only 9 accurate digits:

>>> pi = num(3.1415926535897932)
>>> 355./113 - pi
2.66764189e-7
>>> _.accuracy()
9.0308998699194341

Compare this with what happens when adding the terms:

>>> 355./113 + pi
6.28318557394378
>>> _.accuracy()
15.954589770191001

Adding an inaccurate number to an accurate number does not greatly reduce accuracy if the less accurate number has small magnitude:

>>> pi + num(1e-12,2)
3.1415926535908
>>> _.accuracy()
14.342229822223226

Total cancellation currently raises an exception. Significance arithmetic requires special treatment of numbers with value 0, because the relative error of 0 is meaningless; instead some form of absolute error has to be used.

I have also implemented real-valued exponentiation (x**y, exp(x) and sqrt(x)). These operations are currently a little buggy, but the basics work.

Exponentiation reduces accuracy in proportion to the size of the exponent. For example, exponentiation by 10^10 removes 10 digits of accuracy:

```
>>> from devel.num import *
>>> pi = num(3.1415926535897932)
>>> pi.accuracy()
15.954589770191001
>>> pi ** (10**10)
8.7365e+4971498726
>>> _.accuracy()
5.954589770191002
```

Contrast this with the behavior of mpmath. With the working precision set to 15 digits, it will print 15 (plus two or three guard) digits of mantissa. Re-computing at a higher precision however verifies that only the first five digits were correct:

```
>>> from mpmath import mp, pi
>>> mp.dps = 15
>>> pi ** (10**10)
mpf('8.7365179634758897e+4971498726')
>>> mp.dps = 35
>>> pi ** (10**10)
mpf('8.7365213691213779435688568850099288527e+4971498726')
```
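The digit loss follows a simple first-order rule: the relative error of x**k is about k times that of x, so powering costs roughly log10(k) decimal digits. This is a back-of-the-envelope estimate (my own helper, not the num class's code):

```python
import math

def power_accuracy(acc_digits, k):
    """First-order accuracy of x**k when x is accurate to acc_digits digits."""
    return acc_digits - math.log10(abs(k))

print(power_accuracy(15.95, 10**10))  # about 5.95: ten digits lost
print(power_accuracy(15.95, 2))       # about 15.65: squaring costs almost nothing
```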

One might wonder where all the information in the input operand has gone. Powering is of course implemented using binary exponentiation, so the rounding errors from repeated multiplications are insignificant.

The answer is that exponentiation transfers information between mantissa and exponent (contrast with a single floating-point multiplication, which works by separately multiplying mantissas and adding exponents). So to speak, exponentiation by 10^n moves n digits of the mantissa into the exponent and then fiddles a little with whatever remains of the mantissa. Logarithms do the reverse, moving information from the exponent to the mantissa.

This is not something you have to think of often in ordinary floating-point arithmetic, because the exponent of a number is limited to a few bits and anything larger is an overflow. But when using mpmath numbers, exponents are arbitrary-precision integers, treated as exact. If you compute pi raised to 1 googol, you get:

```
>>> num(3.1415926535897932) ** (10**100)
1.0e+4971498726941338374217231245523794013693982835177254302984571553024909360917306608664397050705368816
>>> _.accuracy()
-84.045410229808965
```

The exponent, although printed as if accurate to 100 digits, is only accurate to 15. Although the accuracy is reported as negative, the number does have a positive "logarithmic accuracy". So in contexts where extremely large numbers are used, some extra care is needed.

A counterintuitive property of arithmetic, that a direct implementation of precision-tracking floating-point arithmetic fails to capture, is that some operations increase accuracy. Significance arithmetic can recognize these cases and automatically adjust the precision to ensure that no information is lost. For example, although each input operand is accurate to only 15 decimal places, the result of the following operation is accurate to 65:

```
>>> num(2.0) ** num(1e-50)
1.0000000000000000000000000000000000000000000000000069314718055995
>>> _.accuracy()
65.783713593702743
```

This permits one to do things like

```
>>> pi = num(3.1415926535897932)
>>> (pi / (exp(pi/10**50) - 1)) / 10**50
1.0
>>> _.accuracy()
15.65355977452702
```

which in ordinary FP or interval arithmetic have a tendency to cause divisions by zero or catastrophic loss of significance, unless the precision is manually set high enough (here 65 digits) from the start.

Automatically increasing the precision is of course a bit dangerous, since a calculation can become unexpectedly slow if the precision is raised to a level much higher than is needed subsequently (e.g. when computing 15 digits of exp(10^-100000) rather than exp(10^-100000)-1). This feature therefore needs to be combined with some user-settable precision limit.
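The same pitfall shows up in ordinary double precision, where math.expm1 plays the role of the rewritten expression (my aside, using only the standard library):

```python
import math

x = 1e-20
# exp(x) rounds to exactly 1.0 in double precision, so subtracting 1 cancels
print(math.exp(x) - 1)   # 0.0: complete loss of significance
# expm1 computes exp(x) - 1 directly, without forming the 1 + tiny sum
print(math.expm1(x))     # 1e-20: correct to full precision
```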

I'll finish this post with a neat example of why significance or interval arithmetic is important for reliable numerical evaluation. The example is due to Siegfried M. Rump and discussed further in the paper by Sofroniou and Spaletta mentioned in an earlier post. The problem is to evaluate the following function for x = 77617 and y = 33096:

```
def f(x,y):
    return 1335*y**6/4 + x**2*(11*x**2*y**2-y**6-121*y**4-2) + \
        11*y**8/2 + x/(2*y)
```

Directly with mpmath, we get (at various levels of precision):

```
>>> from mpmath import mp, mpf, nstr
>>> for i in range(2,51):
...     mp.dps = i
...     print i, nstr(f(mpf(77617),mpf(33096)),50)
...
2 1.171875
3 2596148429267413814265248164610048.0
4 1.172607421875
5 1.172603607177734375
6 1.1726038455963134765625
7 1.1726039350032806396484375
8 -9903520314283042199192993792.0
9 -1237940039285380274899124224.0
10 -154742504910672534362390528.0
11 9671406556917033397649408.0
12 1.17260394005324997124262154102325439453125
13 75557863725914323419136.0
14 -9444732965739290427392.0
15 -1180591620717411303424.0
16 1.1726039400531786394132893747155321761965751647949
17 -18446744073709551616.0
18 1152921504606846977.25
19 144115188075855873.171875
20 1.1726039400531786318594495511016817523852751037339
21 1125899906842625.1726038455963134765625
22 1.1726039400531786318588407461708427240165697469365
23 1.172603940053178631858834128725942299795170775667
24 1.1726039400531786318588349559065548528228456470757
25 1.1726039400531786318588349042077665682586159676126
26 8589934593.1726039400531786255355015669010754209012
27 1073741825.1726039400531786318238741673170011381444
28 1.1726039400531786318588349045106891558634845008907
29 1.1726039400531786318588349045201554867261366425557
30 1.1726039400531786318588349045201554867261366425557
31 1.1726039400531786318588349045201801386294247991746
32 1.172603940053178631858834904520183220117335818752
33 1.1726039400531786318588349045201837978963191349227
34 1.1726039400531786318588349045201837015998219155609
35 1.17260394005317863185883490452018370761835299177
36 -0.82739605994682136814116509547981629200548881596584
37 -0.82739605994682136814116509547981629200548881596584
38 -0.82739605994682136814116509547981629199961134421173
39 -0.82739605994682136814116509547981629199906033123478
40 -0.82739605994682136814116509547981629199903737236074
41 -0.82739605994682136814116509547981629199903306757186
42 -0.82739605994682136814116509547981629199903306757186
43 -0.82739605994682136814116509547981629199903311241341
44 -0.82739605994682136814116509547981629199903311521601
45 -0.82739605994682136814116509547981629199903311574149
46 -0.82739605994682136814116509547981629199903311578528
47 -0.82739605994682136814116509547981629199903311578528
48 -0.82739605994682136814116509547981629199903311578443
49 -0.82739605994682136814116509547981629199903311578439
50 -0.82739605994682136814116509547981629199903311578438
```

Remarkably, the evaluations are not only wildly inaccurate at low precision; up to 35 digits, they seem to be converging to the value 1.1726..., which is wrong!
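For comparison (my own aside, not part of the original experiment), ordinary double precision fails just as silently on this function:

```python
def f(x, y):
    # Rump's function from above, evaluated with plain Python floats
    return (1335*y**6/4 + x**2*(11*x**2*y**2 - y**6 - 121*y**4 - 2)
            + 11*y**8/2 + x/(2*y))

result = f(77617.0, 33096.0)
print(result)  # wrong, with no warning; the true value is about -0.8274
```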

Significance arithmetic saves the day:

```
>>> for i in range(2,51):
...     try:
...         r = f(num(77617,i),num(33096,i))
...         s = str(r)
...         a = r.accuracy()
...         print i, str(s), a
...     except (NotImplementedError):
...         continue
...
2 2.0e+33 -2.51544993496
5 6.0e+29 -2.82677988726
6 -8.0e+28 -2.72986987426
7 -1.0e+28 -2.63295986125
8 -1.0e+27 -2.53604984824
9 -2.0e+26 -2.43913983523
10 1.0e+25 -2.64325981789
12 2.0e+23 -2.44943979187
13 -9.0e+21 -2.65355977453
14 -1.0e+21 -2.55664976152
16 -2.0e+19 -2.3628297355
17 1.0e+18 -2.56694971816
18 -7.0e+16 -2.77106970081
20 1.0e+15 -2.5772496748
21 -1.0e+14 -2.48033966179
25 9.0e+9 -2.69475960109
26 1.0e+9 -2.59784958808
27 -1.0e+8 -2.50093957507
35 -8.0e-1 -2.92977945366
36 -8.0e-1 -1.92977945366
37 -8.0e-1 -0.929779453662
38 -8.0e-1 0.0702205463384
39 -8.0e-1 1.07022054634
40 -8.3e-1 2.07022054634
41 -8.27e-1 3.07022054634
42 -8.274e-1 4.07022054634
43 -8.274e-1 5.07022054634
44 -0.827396 6.07022054634
45 -0.8273961 7.07022054634
46 -0.82739606 8.07022054634
47 -0.82739606 9.07022054634
48 -0.8273960599 10.0702205463
49 -0.82739605995 11.0702205463
50 -0.827396059947 12.0702205463
```

I had to wrap the code in try/except clauses because num(0) is not yet implemented (at many of the low-precision stages, though fortunately for this demonstration not all of them, there is, not unexpectedly, complete cancellation).

The accuracy is reported as negative until the arguments to the function are specified as accurate to over 38 digits, and at that point we see that the printed digits are indeed correct. (The values do not exactly match those of mpmath due to slightly different use of guard digits.)

Interestingly, the num class seems to greatly overestimate the error in this case, and I'm not yet sure if that's inherent to evaluating Rump's particular function with significance arithmetic or due to implementation flaws. It's of course better to overestimate errors than to underestimate them.

Numerical evaluation of a SymPy expression can be seen as converting the expression to a function of its constants (integers and special numbers like pi) and evaluating that function with the constants replaced by significance floating-point numbers. In practice, I will probably implement numerical evaluation using a recursive expression tree walker. This way forward error analysis can be used to efficiently obtain the user-requested level of accuracy at the top of the expression. Subevaluations can be repeated with increased precision until they become accurate enough; precision only needs to be increased for parts of an expression where it actually matters.
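The retry loop I have in mind can be sketched roughly as follows (all names are hypothetical; this is a toy model, not SymPy code):

```python
def evalf_adaptive(eval_at, target_digits, max_prec=10000):
    """eval_at(prec) -> (value, est_accurate_digits); retry until accurate."""
    prec = target_digits + 10           # start with a few guard digits
    while prec <= max_prec:
        value, accurate = eval_at(prec)
        if accurate >= target_digits:
            return value
        prec *= 2                       # too inaccurate: double the precision
    raise ValueError("precision limit exceeded")

# Toy model of a computation that loses 30 digits to cancellation: it only
# becomes accurate to 15 digits once the working precision exceeds 45.
def cancel30(prec):
    return 1.2345, prec - 30

print(evalf_adaptive(cancel30, 15))  # succeeds after raising the precision
```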

My next goal, besides polishing the existing features (and implementing num(0)), is to implement more functions (log, trigonometric functions) and comparison operators, and to write some tests. With trigonometric functions, all the arithmetic operations for complex numbers will be straightforward to implement.

## Tuesday, June 3, 2008

### The significance of arithmetic

One week of GSoC has passed. Since school didn't end until today, I've so far only had time to do a simple edit to SymPy's Real type, replacing decimal with mpmath. Although this wasn't much work, the improvement is substantial. If you download the latest version of SymPy from the hg repository, you will find that for example cos(100).evalf() not only is fast (finishing in milliseconds rather than seconds), but actually returns the right value (!).

Switching to mpmath immediately solves a few problems, but others remain. What is the best way to handle the accuracy of approximate numbers in a computer algebra system? Different CASes use different strategies, each with pros and cons. I would like to arrive at some kind of conclusion by the end of this summer, but this is far from a solved problem. Common approaches roughly fall into three categories, which I will discuss briefly below.

The simplest approach, from the implementation point of view, is to approximate a number by a single (arbitrary-precision) floating-point number. It is up to the user to know whether any results become inaccurate due to cancellations and other errors. SymPy currently works this way (with the floating-point working precision adjusted via a global parameter).

The second approach is interval arithmetic: any exact real number can be approximated by bounding it between a pair of floating-point numbers. If all operations are implemented with proper rounding, interval arithmetic deals rigorously with approximation errors, in fact rigorously enough for formal proof generation.

Interval arithmetic is unfortunately sometimes needlessly conservative, greatly exaggerating the estimate of the true error. It also has some counterintuitive properties (e.g., x·x is not the same as x^2), and there is no unambiguous way to define comparisons. Interval arithmetic also comes with a considerable speed penalty, usually a factor of 2–8.
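The x·x versus x^2 asymmetry is easy to demonstrate with a toy interval product (illustrative only, ignoring directed rounding; not mpmath's interval implementation): multiplying an interval by itself ignores the correlation between the two operands, so x·x can contain negative values while x^2 cannot.

```python
def imul(a, b):
    """Product of intervals a = (lo, hi) and b = (lo, hi)."""
    ps = [a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1]]
    return (min(ps), max(ps))

def isqr(a):
    """Square of an interval, exploiting that both factors are the same x."""
    lo, hi = a
    if lo <= 0 <= hi:
        return (0.0, max(lo*lo, hi*hi))
    return (min(lo*lo, hi*hi), max(lo*lo, hi*hi))

x = (-1.0, 2.0)
print(imul(x, x))   # (-2.0, 4.0): spuriously allows negative values
print(isqr(x))      # (0.0, 4.0): tight
```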

The third approach is what might be called significance arithmetic (although this term has other meanings). It can be viewed as a tradeoff between direct floating-point arithmetic and interval arithmetic. A number is approximated as a single floating-point number, but with an error estimate attached to it. This representation is in principle equivalent to that of interval arithmetic (the error estimate can be seen as the width of an interval), but operations are defined differently. Instead of propagating the estimated error rigorously, one settles for a first-order approximation (this often works well in practice).

The error can be stored as a machine-precision number (representing the logarithm of the absolute or relative error), making arithmetic only a little slower than direct arbitrary-precision floating-point arithmetic. A downside with significance arithmetic is that it is not so straightforward to represent the concept of a "completely inaccurate" number, e.g. a number with unknown sign (the number 0 requires special treatment). It is not entirely clear to me whether it is best to work with the absolute or relative error (or both). As with interval arithmetic, there are still multiple ways to define comparisons.

I'm leaning towards implementing significance arithmetic in SymPy. Mpmath already has some support for interval arithmetic (which I will work on improving later this summer), and this should be accessible in SymPy as well, but I think some looser form of significance arithmetic is the right approach for the common representation of approximate numbers. Right now, I'm studying the implementation of significance arithmetic in Mathematica (Mark Sofroniou & Giulia Spaletta: Precise Numerical Computation), and writing some test code, to get to grips with the details.

## Saturday, May 24, 2008

### First post

Hi all,

I'm starting this blog to document my progress with my Google Summer of Code (GSoC) project for 2008. The aim of the project is to improve SymPy's high-precision numerics. This work will build upon my existing library mpmath, which is already included as a third-party package in SymPy. My first goal will be to make sure the SymPy-mpmath integration works correctly (this is presently not the case at all), and after that I will implement improved algorithms to make numerical evaluation faster and more reliable, as well as entirely new numerical functions (several people have for example expressed interest in doing linear algebra with mpmath, so that area will certainly get some attention). My mentor is Ondrej Certik.

The GSoC coding period officially starts on May 26. I unfortunately have an exam on May 28 and a thesis presentation on June 2 and 3, so I won't be coding full time during the first week. In any case, I have already started poking at some things. I will post a GSoC status update roughly once a week.

It's unfortunate that Saroj's GSoC project for SymPy was withdrawn. SymPy would benefit greatly from improved plotting, and a LaTeX renderer would be useful as well. With some luck, I will be able to spend some time on the plotting; I have some old interactive 2D plotting code lying around, which could be quite usable if converted from pygame to pyglet and with some general cleanup. (Try it out if you like: plot.py -curvedemo shows some standard x-y function plots; plot.py -complexdemo plots the Mandelbrot set. Mouse zooms, arrow keys pan, and keys 1-8 select detail level for the Mandelbrot set.)

I suppose it would be more relevant to show a screenshot of some complicated mathematical expression being converted to a decimal number in a Python shell, but fractals always catch people's attention. Expect more cheap tricks in the future!