Friday, September 24, 2010

Announcing mpmath 0.16

I'm happy to announce the release of mpmath 0.16, which contains the usual bugfixes as well as a slew of new features!

The main focus has been to improve coverage of special functions. Additions include inhomogeneous Bessel functions, Bessel function zeros, incomplete elliptic integrals, and parabolic cylinder functions. As of 0.16, mpmath implements essentially everything listed in the NIST Digital Library of Mathematical Functions chapters 1-20, as well as 21, 24, 27 and 33. (For 25 and 26 -- combinatorial and number-theoretic functions -- see also my post about FLINT 2.)

Another major change is that mpmath 0.16 running in Sage will be much faster thanks to new extension code (currently awaiting review for inclusion in Sage). I've clocked speedups between 1.3x and 2x for various nontrivial pieces of code (such as the mpmath test suite and the torture test programs).

Thanks to William Stein, my work on mpmath during the summer was funded using resources from NSF grant DMS-0757627. This support is gratefully acknowledged.

Most of the new features are described in previous posts on this blog. For convenience, here is a short summary:

Assorted special functions update

  • The documentation now includes plots to illustrate several of the special functions.

  • Airy functions have been rewritten for improved speed and accuracy and to support evaluation of derivatives.

  • Functions airyaizero(), airybizero() for computation of Airy function zeros have been implemented.

  • Inhomogeneous Airy (Scorer) functions scorergi() and scorerhi() have been implemented.

  • Four inhomogeneous Bessel functions have been added (lommels1(), lommels2(), angerj(), webere()).

  • The Lambert W function has been rewritten to fix various bugs and numerical issues.


Incomplete elliptic integrals complete

  • The Legendre and Carlson incomplete elliptic integrals for real and complex arguments have been implemented (ellipf(), ellipe(), ellippi(), elliprf(), elliprc(), elliprj(), elliprd(), elliprg()).
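
To give a feel for what a Carlson integral looks like computationally, here is a minimal double-precision sketch of R_F(x, y, z) using Carlson's duplication theorem. It is an illustration under stated assumptions, not mpmath's elliprf(): the name carlson_rf and the tolerance are made up for this example, and it handles only real non-negative arguments.

```python
import math

def carlson_rf(x, y, z, tol=1e-3):
    """Carlson's symmetric integral R_F(x, y, z) for real x, y, z >= 0
    (at most one of them zero), via the duplication theorem.
    Hypothetical helper for illustration; not mpmath's elliprf()."""
    while True:
        mu = (x + y + z) / 3.0
        # Relative deviations from the mean; they shrink about 4x per step.
        dx, dy, dz = 1 - x / mu, 1 - y / mu, 1 - z / mu
        if max(abs(dx), abs(dy), abs(dz)) < tol:
            break
        lam = math.sqrt(x * y) + math.sqrt(y * z) + math.sqrt(z * x)
        # Duplication step: R_F is unchanged by this substitution.
        x, y, z = (x + lam) / 4, (y + lam) / 4, (z + lam) / 4
    e2 = dx * dy - dz * dz
    e3 = dx * dy * dz
    # Truncated Taylor expansion of R_F about equal arguments.
    series = 1 - e2 / 10 + e3 / 14 + e2 * e2 / 24 - 3 * e2 * e3 / 44
    return series / math.sqrt(mu)

# The complete elliptic integral K(m) equals R_F(0, 1 - m, 1),
# so K(0) = R_F(0, 1, 1) should be very close to pi/2.
k0 = carlson_rf(0.0, 1.0, 1.0)
```

The Legendre forms can be expressed through the Carlson forms, which is why a single robust routine of this kind goes a long way.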


Sage Days 23, and Bessel function zeros

  • Functions besseljzero() and besselyzero() have been implemented for computing the m-th zero of Jν(z), J'ν(z), Yν(z), or Y'ν(z) for any positive integer index m and real order ν ≥ 0.


Post Sage Days 24 report

  • The parabolic cylinder functions pcfd(), pcfu(), pcfv(), pcfw() have been implemented.


Euler-Maclaurin summation of hypergeometric series

  • Hypergeometric functions pFp-1(...; ...; z) now support accurate evaluation close to the singularity at z = 1.

  • A function sumap() has been added for summation of infinite series using the Abel-Plana formula.

  • Functions diffs_prod() and diffs_exp() have been added for generating high-order derivatives of products or exponentials of functions with known derivatives.
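
The idea of building derivatives of a product from known derivatives of the factors can be illustrated with the general Leibniz rule. The following is a small pure-Python sketch for two factors evaluated at a single point; product_derivatives is a hypothetical helper written for this example and says nothing about how mpmath implements this internally.

```python
from math import comb

def product_derivatives(f_derivs, g_derivs):
    """Given [f, f', f'', ...] and [g, g', g'', ...] at one point, return
    [(fg), (fg)', (fg)'', ...] at that point using the Leibniz rule:
    (fg)^(m) = sum_k C(m, k) * f^(k) * g^(m-k).
    Hypothetical helper for illustration only."""
    n = min(len(f_derivs), len(g_derivs))
    return [
        sum(comb(m, k) * f_derivs[k] * g_derivs[m - k] for k in range(m + 1))
        for m in range(n)
    ]

# f(x) = x^2 and g(x) = x^3 at x = 1, derivatives of order 0..3.
f = [1, 2, 2, 0]
g = [1, 3, 6, 6]
# The product is x^5, whose derivatives at x = 1 are 1, 5, 20, 60.
result = product_derivatives(f, g)
```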


Again, mpmath in Sage is about to get faster

  • New Cython extension code has been written for Sage to speed up various operations in mpmath, including elementary functions and hypergeometric series.



There are various other changes as well, such as support for matrix slice indexing (contributed by Ioannis Tziakos -- thanks!). As usual, details are available in the changelog and the Changes page on the Google Code project site.

Wednesday, September 22, 2010

Again, mpmath in Sage is about to get faster

My summer project on special functions in mpmath and Sage, generously supported by William Stein with funds from NSF grant DMS-0757627, is nearing completion. I will soon release mpmath-0.16, which contains lots of new special functions and bugfixes. Sage users will also benefit from ~1500 lines of new Cython code (preliminary patch here) that speeds up various basic operations. Executing mpmath.runtests() in Sage on my laptop now takes 10.47 seconds (8.60 from a warm cache), compared to 14.21 (11.84) seconds with the new extensions disabled -- a global speedup of roughly 30%.

For comparison, pure-Python mpmath with gmpy as the backend takes 21.46 (18.72) seconds to execute the unit tests and pure-Python mpmath with the pure-Python backend takes 52.33 (45.92) seconds.

Specifically, the new extension code implements exp for real and complex arguments, cos, sin and ln for real arguments, complex exponentiation in some cases, and summation of hypergeometric series, entirely in Cython.

Timings before (new extensions disabled):

sage: import mpmath
sage: x = mpmath.mpf(0.37)
sage: y = mpmath.mpf(0.49)
sage: %timeit mpmath.exp(x)
625 loops, best of 3: 14.5 µs per loop
sage: %timeit mpmath.ln(x)
625 loops, best of 3: 23.2 µs per loop
sage: %timeit mpmath.cos(x)
625 loops, best of 3: 17.2 µs per loop
sage: %timeit x ^ y
625 loops, best of 3: 39.9 µs per loop
sage: %timeit mpmath.hyp1f1(2r,3r,4r)
625 loops, best of 3: 90.3 µs per loop
sage: %timeit mpmath.hyp1f1(x,y,x)
625 loops, best of 3: 83.6 µs per loop
sage: %timeit mpmath.hyp1f1(x,y,mpmath.mpc(x,y))
625 loops, best of 3: 136 µs per loop


Timings after (new extensions enabled):

sage: import mpmath
sage: x = mpmath.mpf(0.37)
sage: y = mpmath.mpf(0.49)
sage: %timeit mpmath.exp(x)
625 loops, best of 3: 2.72 µs per loop
sage: %timeit mpmath.ln(x)
625 loops, best of 3: 7.25 µs per loop
sage: %timeit mpmath.cos(x)
625 loops, best of 3: 4.13 µs per loop
sage: %timeit x ^ y
625 loops, best of 3: 10.5 µs per loop
sage: %timeit mpmath.hyp1f1(2r,3r,4r)
625 loops, best of 3: 47.1 µs per loop
sage: %timeit mpmath.hyp1f1(x,y,x)
625 loops, best of 3: 59.4 µs per loop
sage: %timeit mpmath.hyp1f1(x,y,mpmath.mpc(x,y))
625 loops, best of 3: 83.1 µs per loop
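
For a quick side-by-side view, the per-call speedups can be computed directly from the two sessions above. This snippet only does arithmetic on the quoted numbers; the dictionary labels are shorthand for the timed expressions.

```python
# Timings in microseconds, copied from the sessions above: (before, after).
timings = {
    "exp(x)": (14.5, 2.72),
    "ln(x)": (23.2, 7.25),
    "cos(x)": (17.2, 4.13),
    "x ^ y": (39.9, 10.5),
    "hyp1f1(2,3,4)": (90.3, 47.1),
    "hyp1f1(x,y,x)": (83.6, 59.4),
    "hyp1f1(x,y,mpc(x,y))": (136.0, 83.1),
}

# Speedup factor = old time / new time.
speedups = {name: before / after for name, (before, after) in timings.items()}

for name, factor in speedups.items():
    print(f"{name:22s} {factor:.1f}x")
```

The elementary functions come out roughly 3x to 5x faster, while the hypergeometric cases land between about 1.4x and 1.9x.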


The new elementary functions use a combination of custom algorithms and straightforward MPFR wrappers. Why not just wrap MPFR for everything? There are two primary reasons:

Firstly, because MPFR numbers have a limited range, custom code still needs to be used in the overflowing cases, and this is almost as much work as an implementation-from-scratch. (There are also some more minor incompatibilities, like lack of round-away-from-zero in MPFR, that result in a lot of extra work.)

Secondly, MPFR is not always fast (or as fast as it could be), so it pays off to write custom code. In fact, some of the ordinary Python implementations of functions in mpmath are faster than their MPFR counterparts in various cases, although that is rather exceptional (atan is an example). But generally, at low-mid precisions, it is possible to be perhaps 2-4x faster than MPFR with carefully optimized C code (see fastfunlib). This is a longer-term goal.

Already now, with the new extension code, the mpmath exponential function becomes faster than the Sage RealNumber version (based on MPFR) at low precision:

sage: %timeit mpmath.exp(x)
625 loops, best of 3: 2.75 µs per loop
sage: w = RealField(53)(x)
sage: %timeit w.exp()
625 loops, best of 3: 5.57 µs per loop


As the timings above indicate, hypergeometric series have gotten up to 2x faster. The speedup of the actual summation is much larger, but much of that gain is lost in various Python overheads (more work can be done on this). There should be a noticeable speedup for some hypergeometric function computations, while others will not benefit as much, for the moment.

Another benchmark is the extratest_zeta.py script in mpmath, which exercises the mpmath implementation of the Riemann-Siegel formula for evaluation of ζ(s) for complex s with large imaginary part. Such computations largely depend on elementary function performance (cos, sin, exp, log).

Here are the new timings for mpmath in Sage:

fredrik@scv:~/sage$ ./sage /home/fredrik/mp/mpmath/tests/extratest_zeta.py
399999999 156762524.675 ok = True (time = 1.144)
241389216 97490234.2277 ok = True (time = 9.271)
526196239 202950727.691 ok = True (time = 1.671)
542964976 209039046.579 ok = True (time = 1.189)
1048449112 388858885.231 ok = True (time = 1.774)
1048449113 388858885.384 ok = True (time = 1.604)
1048449114 388858886.002 ok = True (time = 2.096)
1048449115 388858886.002 ok = True (time = 2.587)
1048449116 388858886.691 ok = True (time = 1.546)

This is mpmath in Sage with the new extension code disabled:

fredrik@scv:~/sage$ ./sage /home/fredrik/mp/mpmath/tests/extratest_zeta.py
399999999 156762524.675 ok = True (time = 2.352)
241389216 97490234.2277 ok = True (time = 14.088)
526196239 202950727.691 ok = True (time = 3.036)
542964976 209039046.579 ok = True (time = 2.104)
1048449112 388858885.231 ok = True (time = 3.707)
1048449113 388858885.384 ok = True (time = 3.283)
1048449114 388858886.002 ok = True (time = 4.444)
1048449115 388858886.002 ok = True (time = 5.592)
1048449116 388858886.691 ok = True (time = 3.101)

This is mpmath in ordinary Python mode, using gmpy:

fredrik@scv:~/sage$ python /home/fredrik/mp/mpmath/tests/extratest_zeta.py
399999999 156762524.675 ok = True (time = 2.741)
241389216 97490234.2277 ok = True (time = 13.842)
526196239 202950727.691 ok = True (time = 3.124)
542964976 209039046.579 ok = True (time = 2.143)
1048449112 388858885.231 ok = True (time = 3.257)
1048449113 388858885.384 ok = True (time = 2.912)
1048449114 388858886.002 ok = True (time = 3.953)
1048449115 388858886.002 ok = True (time = 4.964)
1048449116 388858886.691 ok = True (time = 2.762)

With the new extension code, it appears that zeta computations are up to about twice as fast. This speedup could be made much larger as there still is a significant amount of Python overhead left to remove -- also a project for the future.

Sunday, September 5, 2010

Fast combinatorial and number-theoretic functions with FLINT 2

Time for a development update! Recently, I've done only a limited amount of work on mpmath (I have some almost-finished Cython code for sage.libs.mpmath and new code for numerical integration in mpmath, both to be committed fairly soon -- within a couple of weeks, hopefully).

The last few weeks, I've mostly been contributing to FLINT 2. For those unfamiliar with it, FLINT is a fast C library for computational number theory developed by Bill Hart and others (the other active developers right now are Sebastian Pancratz and Andy Novocin). In particular, FLINT implements ridiculously fast multiprecision integer vectors and polynomials. It also provides very fast primality testing and factorization for word-size integers (32 or 64 bits), among other things. FLINT 2 is an in-progress rewrite of FLINT 1.x, a current standard component in Sage.

What does this have to do with numerical evaluation of special functions (the usual theme of this blog)? In short, my goal is to add code to FLINT 2 for exact special function computations -- combinatorial and number-theoretic functions, special polynomials and the like. Such functions benefit tremendously from the fast integer and polynomial arithmetic available in FLINT 2.

All my code can be found in my public GitHub repository (the most recent commits as of this writing are in the 'factor' branch).

Functions I've implemented so far include:

  • Möbius μ and Euler φ (totient) functions for word-size and arbitrary-size integers

  • Divisor sum function σk for arbitrary-size integers

  • Ramanujan τ function (Δ-function q-expansion)

  • Harmonic numbers 1 + 1/2 + 1/3 + ... + 1/n

  • Primorials 2 · 3 · 5 · ... · pn

  • Stirling numbers (1st and 2nd kind)
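
For readers who want the definitions in executable form, here is a plain Python reference sketch of three of the functions above, all derived from a prime factorization. It uses naive trial division and is nowhere near the FLINT 2 code in speed; the function names are chosen for this example and are not FLINT's API.

```python
def factorize(n):
    """Prime factorization of n >= 1 as {prime: exponent}, by naive trial
    division (only sensible for small n)."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def moebius(n):
    """Moebius mu: 0 if n has a squared prime factor, else (-1)^(#primes)."""
    f = factorize(n)
    if any(e > 1 for e in f.values()):
        return 0
    return -1 if len(f) % 2 else 1

def euler_phi(n):
    """Euler totient: product of (p - 1) * p^(e - 1) over prime powers p^e."""
    result = 1
    for p, e in factorize(n).items():
        result *= (p - 1) * p ** (e - 1)
    return result

def divisor_sigma(n, k=1):
    """Sum of k-th powers of the divisors of n, using multiplicativity."""
    result = 1
    for p, e in factorize(n).items():
        result *= sum(p ** (k * i) for i in range(e + 1))
    return result
```

All three are multiplicative, so once the factorization is known the rest is cheap; that is why factoring speed dominates for large arguments.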



The versions in FLINT 2 of these functions should now be faster than all other implementations I've tried (GAP, Pari, Mathematica, the Sage library) for all ranges of arguments, except for those requiring factorization of large integers.

Some of these functions depend fundamentally on the ability to factorize integers efficiently. So far I've only implemented trial division for large integers in FLINT 2, with some clever code to extract large powers of small factors quickly. Sufficiently small cofactors are handled by calling Bill Hart's single-word factoring routines. The resulting code is very fast for "artificial" numbers like factorials, and will eventually be complemented with prime and perfect power detection code, plus fast implementations of Brent's algorithm and other methods. Later on the quadratic sieve from FLINT 1 will probably be ported to FLINT 2, so that FLINT 2 will be able to factor any reasonable number reasonably quickly.
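
One generic way to strip a large power of a small prime quickly is to divide by p, p^2, p^4, ... and then walk back down through the same powers, so that an exponent e costs O(log e) big divisions instead of e small ones. The sketch below shows that idea in Python; it is an assumption about the general technique, not a transcription of the FLINT 2 code, and remove_power is a hypothetical name.

```python
from math import factorial

def remove_power(n, p):
    """Return (m, e) with n == m * p**e and p not dividing m,
    for integers n >= 1 and p >= 2. Illustrative helper only."""
    if n < 1 or p < 2:
        raise ValueError("need n >= 1 and p >= 2")
    e = 0
    powers = []  # powers[i] == p ** (2 ** i)
    q = p
    # Ascending phase: divide out p, p^2, p^4, ... while possible.
    while n % q == 0:
        n //= q
        e += 1 << len(powers)
        powers.append(q)
        q = q * q
    # Descending phase: the leftover exponent is smaller than the next
    # power of two, so one greedy pass over the stored powers removes it.
    for i in reversed(range(len(powers))):
        if n % powers[i] == 0:
            n //= powers[i]
            e += 1 << i
    return n, e

# 1000! is a typical "artificial" input with a huge power of 2.
cofactor, exponent = remove_power(factorial(1000), 2)
```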

Below, I've posted some benchmark results. A word of caution: all Mathematica timings were done on a different system, which is faster than my own laptop (typically by 30% or so). So in reality, Mathematica performs slightly worse, relative to the others, than indicated below. Everything else is timed on my laptop. I have not included test code for the FLINT 2 functions (but it's just straightforward C code -- a function call or two between timeit_start and timeit_stop using FLINT 2's profiler module).

Möbius function (the following is basically a raw exercise of the small-integer factoring code):

Pari:
sage: %time pari('sum(n=1,10^6,moebius(n))');
CPU times: user 1.04 s, sys: 0.00 s, total: 1.04 s
Wall time: 1.04 s

Mathematica:
In[1]:= Timing[Sum[MoebiusMu[n], {n,1,10^6}];]
Out[1]= {0.71, Null}

flint2:
650 ms


Divisor sum:

Sage (uses Cython code):
sage: %time sigma(factorial(1000),1000);
CPU times: user 0.47 s, sys: 0.00 s, total: 0.47 s
Wall time: 0.46 s

Mathematica:
In[1]:= Timing[DivisorSigma[1000,1000!];]
Out[1]= {3.01, Null}

flint2:
350 ms


Ramanujan τ function:

Sage (uses FLINT 1):
sage: %time delta_qexp(100000);
CPU times: user 0.42 s, sys: 0.01 s, total: 0.43 s
Wall time: 0.42 s
sage: %time delta_qexp(1000000);
CPU times: user 6.02 s, sys: 0.37 s, total: 6.39 s
Wall time: 6.40 s

flint2:
100000: 230 ms
1000000: 4500 ms
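
As a reference for what is being computed here: the values τ(n) are the coefficients of Δ(q) = q · ∏(1 − q^n)^24. A simple way to get them (a sketch with quadratic-time series multiplication, not necessarily the method used in FLINT 2 or Sage) is to start from Jacobi's identity ∏(1 − q^n)^3 = Σ (−1)^k (2k+1) q^(k(k+1)/2) and square the truncated series three times. The helper names are made up for this example.

```python
def mul_trunc(a, b, n):
    """Product of two coefficient lists, truncated to n terms."""
    c = [0] * n
    for i, ai in enumerate(a):
        if ai:
            for j in range(min(len(b), n - i)):
                c[i + j] += ai * b[j]
    return c

def delta_coefficients(n_terms):
    """Return [tau(0), ..., tau(n_terms - 1)] with tau(0) = 0, i.e. the
    q-expansion of Delta(q) = q * prod_{n>=1} (1 - q^n)^24."""
    n = max(n_terms - 1, 0)
    # Jacobi: prod (1 - q^n)^3 = sum_{k>=0} (-1)^k (2k+1) q^(k(k+1)/2)
    cube = [0] * n
    k = 0
    while k * (k + 1) // 2 < n:
        cube[k * (k + 1) // 2] = (-1) ** k * (2 * k + 1)
        k += 1
    series = cube
    for _ in range(3):  # cube -> 6th -> 12th -> 24th power
        series = mul_trunc(series, series, n)
    # Multiplying by q shifts every coefficient up by one.
    return ([0] + series)[:n_terms]

first_taus = delta_coefficients(8)
```

With fast polynomial arithmetic in place of mul_trunc, the same structure scales to the sizes benchmarked above.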


An isolated value (Mathematica seems to be the only other software that knows how to compute this):

Mathematica:
In[1]:= Timing[RamanujanTau[10000!];]
Out[1]= {8.74, Null}

flint2:
280 ms


Harmonic numbers (again, only Mathematica seems to implement these). See also my old blog post How (not) to compute harmonic numbers. I've included the fastest version from there, harmonic5:


Mathematica:
In[1]:= Timing[HarmonicNumber[100000];]
Out[1]= {0.22, Null}
In[2]:= Timing[HarmonicNumber[1000000];]
Out[2]= {6.25, Null}
In[3]:= Timing[HarmonicNumber[10000000];]
Out[3]= {129.13, Null}

harmonic5:
100000: 0.471 s
1000000: 8.259 s
10000000: 143.639 s

flint2:
100000: 100 ms
1000000: 2560 ms
10000000: 49400 ms


The FLINT 2 function benefits from an improved algorithm that eliminates terms and reduces the size of the temporary numerators and denominators, as well as low-level optimization (the basecase summation directly uses the MPIR mpn interface).
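
For orientation, the baseline that such improvements build on is plain binary splitting: sum the two halves of the range recursively as unreduced fractions and reduce only once at the end. The sketch below shows just that baseline in Python. It does not include the term elimination or the mpn-level basecase mentioned above, and the function names are chosen for this example.

```python
from fractions import Fraction

def _harmonic_split(a, b):
    """Return (p, q) with p/q == sum(1/k for k in range(a, b)), unreduced.
    Requires a < b."""
    if b - a == 1:
        return 1, a
    mid = (a + b) // 2
    p1, q1 = _harmonic_split(a, mid)
    p2, q2 = _harmonic_split(mid, b)
    # Add the two partial sums without reducing intermediate fractions.
    return p1 * q2 + p2 * q1, q1 * q2

def harmonic(n):
    """Exact harmonic number H_n = 1 + 1/2 + ... + 1/n as a Fraction."""
    if n <= 0:
        return Fraction(0)
    p, q = _harmonic_split(1, n + 1)
    return Fraction(p, q)  # Fraction reduces to lowest terms here

h10 = harmonic(10)
```

The recursion keeps the operands balanced in size, which lets fast multiplication do most of the work.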

Isolated Stirling numbers of the first kind:


Mathematica:
In[1]:= Timing[StirlingS1[1000,500];]
Out[1]= {0.24, Null}
In[2]:= Timing[StirlingS1[2000,1000];]
Out[2]= {1.79, Null}
In[3]:= Timing[StirlingS1[3000,1500];]
Out[3]= {5.13, Null}

flint2:
1000,500: 100 ms
2000,1000: 740 ms
3000,1500: 1520 ms


Isolated Stirling numbers of the second kind:

Mathematica:
In[1]:= Timing[StirlingS2[1000,500];]
Out[1]= {0.21, Null}
In[2]:= Timing[StirlingS2[2000,1000];]
Out[2]= {1.54, Null}
In[3]:= Timing[StirlingS2[3000,1500];]
Out[3]= {4.55, Null}
In[4]:= Timing[StirlingS2[5000,2500];]
Out[4]= {29.25, Null}

flint2:
1000,500: 2 ms
2000,1000: 17 ms
3000,1500: 50 ms
5000,2500: 240 ms


In addition, fast functions are provided for computing a whole row or matrix of Stirling numbers. For example, computing the triangular matrix of ~1.1 million Stirling numbers of the first kind up to S(1500,1500) takes only 1.3 seconds. In Mathematica (again, on the faster system):

In[1]:= Timing[Table[StirlingS1[n,k], {n,0,1500}, {k,0,n}];]
Out[1]= {2.13, Null}
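
The triangular matrix in question can be described by the standard recurrences: s(n,k) = s(n−1,k−1) − (n−1)·s(n−1,k) for the signed first kind (the convention of Mathematica's StirlingS1) and S(n,k) = S(n−1,k−1) + k·S(n−1,k) for the second kind. Here is a straightforward Python sketch of that row-by-row construction; it illustrates the definitions only and is not the FLINT 2 algorithm. The name stirling_triangle is hypothetical.

```python
def stirling_triangle(n_max, kind=1):
    """Rows 0..n_max of the Stirling number triangle; row n holds the
    entries for k = 0..n. kind=1 gives signed first-kind numbers,
    kind=2 gives second-kind numbers. Illustrative helper only."""
    rows = [[1]]
    for n in range(1, n_max + 1):
        prev = rows[-1]
        row = [0] * (n + 1)
        for k in range(1, n + 1):
            left = prev[k - 1]
            above = prev[k] if k < n else 0  # entry (n-1, n) is zero
            if kind == 1:
                row[k] = left - (n - 1) * above
            else:
                row[k] = left + k * above
        rows.append(row)
    return rows

first_kind = stirling_triangle(4, kind=1)
second_kind = stirling_triangle(4, kind=2)
```

A triangle up to n = 1500 has 1501 · 1502 / 2 = 1,127,251 entries, which is where the figure of about 1.1 million comes from.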


The benchmarks above mostly demonstrate performance for large inputs. Another nice aspect of the FLINT 2 functions is that there typically is very little overhead for small inputs. The high performance is due to a combination of algorithms, low-level optimization, and (most importantly) the fast underlying arithmetic in FLINT 2. I will perhaps write some more about the algorithms (for e.g. Stirling numbers) in a later post.