Friday, February 12, 2010

Numerical multidimensional infinite series

Another new feature in mpmath: support for multidimensional series in nsum. Some very simple examples (finite and infinite summations can be combined in any desired order):

>>> nsum(lambda i,j: 2**(-i-j), [0,inf], [0,inf])
4.0
>>> nsum(lambda i,j,k: 2**(-i-j-k), [0,inf], [0,inf], [0,inf])
8.0
>>> nsum(lambda i,j,k,l: i/2**j*(i+l)/2**k, [1,2], [0,inf], [1,inf], [2,3])
50.0
>>> nsum((lambda i,j,k,l: 1/(2**(i**2+j**2+k**2+l**2))), *([[-inf,inf]]*4))
20.5423960756379
>>> jtheta(3,0,0.5)**4
20.5423960756379

One could of course also make nested calls to nsum, but having a direct syntax is much more convenient. Nested evaluation is also usually inefficient for convergence acceleration, so this is not what nsum does internally. Instead, it combines all the infinite summations into a single summation over growing hypercubes. The distinction is very important for conditionally convergent series.
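To make the hypercube picture concrete, here is a deliberately crude sketch of shell-by-shell summation over Z^2. The helper shell_sum below is hypothetical and is not how nsum works internally; in particular, nsum also applies convergence acceleration to the sequence of shell partial sums rather than just truncating:

from mpmath import mp

def shell_sum(f, shells=50):
    # Sum f(i,j) over Z^2 by expanding square shells max(|i|,|j|) = r,
    # grouping terms symmetrically around the origin (the grouping is
    # what matters for conditionally convergent series).
    s = f(0, 0)
    for r in range(1, shells + 1):
        for i in range(-r, r + 1):       # top and bottom edges of the shell
            s += f(i, r) + f(i, -r)
        for j in range(-r + 1, r):       # left and right edges, corners excluded
            s += f(r, j) + f(-r, j)
    return s

# Example: the sum of 2**(-|i|-|j|) over Z^2 equals (sum of 2**(-|i|))**2 = 9
print(shell_sum(lambda i, j: mp.mpf(2)**(-abs(i) - abs(j))))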

For example, nsum can now directly evaluate the Madelung constant in three dimensions (passing ignore=True makes nsum skip the singular term at the origin):

>>> nsum(lambda i,j,k: (-1)**(i+j+k)/(i**2+j**2+k**2)**0.5,
... [-inf,inf], [-inf,inf], [-inf,inf], ignore=True)
-1.74756459463318


This evaluation takes several seconds, but it is somewhat remarkable that a precise value can be obtained at all. A better way to compute the Madelung constant is to use the following rapidly convergent 2D series:

>>> f = lambda i,j: -12*pi*sech(0.5*pi*sqrt((2*i+1)**2+(2*j+1)**2))**2
>>> nsum(f, [0,inf], [0,inf])
-1.74756459463318
>>> mp.dps = 100
>>> nsum(f, [0,inf], [0,inf])
-1.74756459463318219063621203554439740348516143662474175815282535076504062353276
117989075836269460789


Another nice application is to evaluate Eisenstein series directly from the definition:

>>> tau = 1j
>>> q = qfrom(tau=tau)
>>> nsum(lambda m,n: (m+n*tau)**(-4), [-inf,inf], [-inf,inf], ignore=True)
(3.1512120021539 + 0.0j)
>>> 0.5*sum(jtheta(n,0,q)**8 for n in [2,3,4])*(2*zeta(4))
(3.1512120021539 + 0.0j)
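As a cross-check, this lattice sum has a classical closed form in terms of gamma(1/4) (the lemniscatic case; the identity is quoted from the literature, not computed in the post):

from mpmath import mp, gamma, pi
mp.dps = 15
# sum over (m,n) != (0,0) of (m + n*i)^(-4) = gamma(1/4)**8 / (960*pi**2)
print(gamma(0.25)**8 / (960*pi**2))   # should agree with 3.1512120021539 above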

Thursday, February 11, 2010

A new gamma function implementation

The gamma function is probably the most important nonelementary special function. The gamma function or ratios of gamma functions appear in everything from the functional equation of the Riemann zeta function to the normalization factors of hypergeometric functions. Even expressions that are usually rational functions (such as binomial coefficients) are often most conveniently expressed (or in software, implemented) in terms of the gamma function since this automatically provides correct asymptotics and extension to noninteger parameters. Needless to say, a well-implemented gamma function is fundamental for a special functions library.
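As a quick illustration of the last point, using mpmath's public functions (the arguments here are arbitrary):

from mpmath import mp, binomial, gamma
mp.dps = 15
# binomial(n,k) = gamma(n+1)/(gamma(k+1)*gamma(n-k+1)) extends to noninteger arguments
print(binomial(0.5, 0.25))
print(gamma(1.5) / (gamma(1.25) * gamma(1.25)))   # the same value, written out via gamma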

The gamma function implementation used in mpmath until now is one of the oldest parts of the code (I wrote it around three years ago), and although it has done its job, it has some flaws. It mainly uses Spouge's approximation, which is fairly efficient but nonoptimal for asymptotically large arguments. The implementation also has some subtle precision issues.
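For reference, Spouge's approximation is simple enough to sketch in a few lines. The following is an illustrative, unoptimized version for real arguments and is not the mpmath code; the parameter a is chosen ad hoc here:

from mpmath import mp, mpf, exp, sqrt, pi, power, factorial

def spouge_gamma(x, a=20):
    # Spouge: gamma(z+1) ~= (z+a)^(z+1/2) * exp(-(z+a)) * (c_0 + sum_k c_k/(z+k))
    z = mp.mpf(x) - 1
    s = sqrt(2*pi)                        # c_0
    for k in range(1, a):
        c_k = (-1)**(k-1) * power(a - k, k - mpf(1)/2) * exp(a - k) / factorial(k - 1)
        s += c_k / (z + k)
    return power(z + a, z + mpf(1)/2) * exp(-(z + a)) * s

At mp.dps = 15, spouge_gamma(5) should reproduce gamma(5) = 24 to nearly full precision. Note that a must grow roughly linearly with the number of digits regardless of the size of the argument, which is one reason the method is nonoptimal for large arguments.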

The new gamma function added in this commit is the result of over a year of intermittent work. I did most of the work during last summer but didn't have time to finish it until recently. The improvements are essentially the following:

  • Removal of various overheads

  • Special-casing half-integers

  • Using Taylor series for small real arguments

  • Using Stirling's series for complex and/or large arguments

  • Optimizing for computing loggamma

  • Handling values near poles and near the real axis extremely carefully


The difference in performance is quite dramatic. The smallest speedup occurs for small complex arguments, but it turns out that even there, the Stirling series is faster than Spouge's approximation (at least with the removal of overheads in the new implementation) despite the need to perform argument transformations. For real arguments, and large complex arguments, the new implementation is consistently faster by a large factor, especially for loggamma.
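To give an idea of what the Stirling-based evaluation involves, here is a bare-bones sketch for real x > 0. It is not the actual implementation: the number of terms N and the shift threshold are fixed ad hoc below, whereas the real code ties them to the working precision and also handles complex arguments, poles and the reflection formula:

from mpmath import mp, mpf, log, pi, bernoulli

def stirling_loggamma(x, N=20, shift=30):
    x = mp.mpf(x)
    m = 0
    while x + m < shift:        # argument raising: make the argument large enough
        m += 1                  # for the asymptotic series to be usable
    w = x + m
    s = (w - mpf(1)/2)*log(w) - w + log(2*pi)/2
    for n in range(1, N + 1):   # Stirling tail: B_(2n) / ((2n)(2n-1) w^(2n-1))
        s += bernoulli(2*n) / ((2*n)*(2*n - 1) * w**(2*n - 1))
    for k in range(m):          # undo the raising: loggamma(x) = loggamma(x+m) - sum log(x+k)
        s -= log(x + k)
    return s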

I'll show some benchmarks. The times are in milliseconds; the first value in each cell is the time for the old implementation and the second is the time for the new one, followed by the speedup. The results are on my laptop, with gmpy (I'll try with Sage later):

Firstly, results for gamma(z):

z                  | dps=15                    | dps=50                    | dps=250                    | dps=1000
3.0                | 0.0086 / 0.0077 (1.13x)   | 0.0089 / 0.0076 (1.16x)   | 0.0087 / 0.0075 (1.16x)    | 0.0085 / 0.0077 (1.12x)
5.27               | 0.1200 / 0.0342 (3.51x)   | 0.2513 / 0.0545 (4.61x)   | 2.5964 / 0.4220 (6.15x)    | 80.7089 / 11.3011 (7.14x)
2.5                | 0.1255 / 0.0158 (7.94x)   | 0.2398 / 0.0138 (17.41x)  | 2.5519 / 0.0144 (177.72x)  | 79.0651 / 0.0149 (5319.87x)
-3.7               | 0.2518 / 0.0457 (5.51x)   | 0.4006 / 0.0615 (6.52x)   | 3.1073 / 0.4283 (7.26x)    | 87.4200 / 11.3981 (7.67x)
(-3.0+4.0j)        | 0.6205 / 0.3942 (1.57x)   | 1.0095 / 0.6093 (1.66x)   | 9.0896 / 5.9099 (1.54x)    | 253.9219 / 191.3358 (1.33x)
(3.25-4.125j)      | 0.3548 / 0.2864 (1.24x)   | 0.7456 / 0.4791 (1.56x)   | 8.6651 / 5.6367 (1.54x)    | 252.7598 / 189.2187 (1.34x)
(15.0+20.0j)       | 0.3980 / 0.1973 (2.02x)   | 0.7526 / 0.4496 (1.67x)   | 8.6583 / 5.4604 (1.59x)    | 250.8646 / 188.5849 (1.33x)
52.1               | 0.1661 / 0.0586 (2.83x)   | 0.3050 / 0.0894 (3.41x)   | 2.9862 / 0.5685 (5.25x)    | 84.8334 / 12.4169 (6.83x)
123.25             | 0.1643 / 0.0638 (2.57x)   | 0.3071 / 0.0963 (3.19x)   | 2.8908 / 0.9194 (3.14x)    | 83.8168 / 14.6588 (5.72x)
-200.7             | 0.3011 / 0.1485 (2.03x)   | 0.4897 / 0.1914 (2.56x)   | 3.5145 / 1.1359 (3.09x)    | 89.8617 / 17.5500 (5.12x)
(100.25+100.0j)    | 0.4238 / 0.1857 (2.28x)   | 0.7641 / 0.2757 (2.77x)   | 8.6807 / 3.1403 (2.76x)    | 249.6731 / 171.4369 (1.46x)
12345678.9         | 0.3032 / 0.0701 (4.32x)   | 0.4382 / 0.0857 (5.11x)   | 3.4078 / 0.3257 (10.46x)   | 88.4082 / 5.2828 (16.74x)
(0.0+12345678.9j)  | 0.7814 / 0.1409 (5.54x)   | 1.1671 / 0.1802 (6.48x)   | 10.3985 / 0.6169 (16.86x)  | 265.6192 / 8.3569 (31.78x)
1.0e+20            | 0.8833 / 0.0798 (11.06x)  | 1.3019 / 0.0899 (14.48x)  | 6.4010 / 0.2939 (21.78x)   | 107.0924 / 3.5141 (30.47x)
(0.0+1.0e+20j)     | 3.5025 / 0.1515 (23.11x)  | 3.6163 / 0.1834 (19.72x)  | 19.5676 / 0.5308 (36.86x)  | 322.9033 / 6.2148 (51.96x)


Secondly, results for loggamma(z):

z                  | dps=15                    | dps=50                    | dps=250                    | dps=1000
3.0                | 0.0700 / 0.0115 (6.07x)   | 0.0677 / 0.0116 (5.84x)   | 0.0683 / 0.0119 (5.73x)    | 0.0683 / 0.0121 (5.65x)
5.27               | 0.2207 / 0.0564 (3.91x)   | 0.3548 / 0.0796 (4.45x)   | 2.8925 / 0.5221 (5.54x)    | 88.3556 / 12.6452 (6.99x)
2.5                | 0.2210 / 0.0319 (6.93x)   | 0.3565 / 0.0374 (9.53x)   | 2.8595 / 0.1011 (28.28x)   | 81.8333 / 1.3546 (60.41x)
-3.7               | 0.4392 / 0.0782 (5.62x)   | 0.6418 / 0.0987 (6.50x)   | 3.5767 / 0.5429 (6.59x)    | 88.8484 / 12.7363 (6.98x)
(-3.0+4.0j)        | 1.1235 / 0.4942 (2.27x)   | 1.5685 / 0.7224 (2.17x)   | 10.0879 / 6.0459 (1.67x)   | 267.2893 / 198.7613 (1.34x)
(3.25-4.125j)      | 0.8789 / 0.2704 (3.25x)   | 1.2326 / 0.4752 (2.59x)   | 9.7501 / 5.4572 (1.79x)    | 264.7649 / 190.4007 (1.39x)
(15.0+20.0j)       | 0.8983 / 0.1248 (7.20x)   | 1.2923 / 0.4221 (3.06x)   | 9.7917 / 5.2766 (1.86x)    | 265.2711 / 187.9775 (1.41x)
52.1               | 0.2604 / 0.0812 (3.21x)   | 0.4186 / 0.1140 (3.67x)   | 3.2493 / 0.6890 (4.72x)    | 87.0111 / 13.9280 (6.25x)
123.25             | 0.2629 / 0.0478 (5.50x)   | 0.4227 / 0.0692 (6.11x)   | 3.1110 / 1.0216 (3.05x)    | 85.9028 / 15.9917 (5.37x)
-200.7             | 0.5439 / 0.1631 (3.34x)   | 0.7465 / 0.2221 (3.36x)   | 3.9305 / 1.2537 (3.14x)    | 92.5380 / 18.8309 (4.91x)
(100.25+100.0j)    | 0.9103 / 0.1179 (7.72x)   | 1.3332 / 0.1849 (7.21x)   | 10.1655 / 3.1083 (3.27x)   | 266.3565 / 170.3099 (1.56x)
12345678.9         | 0.4026 / 0.0491 (8.21x)   | 0.5439 / 0.0525 (10.36x)  | 3.7183 / 0.2051 (18.13x)   | 89.9402 / 4.0044 (22.46x)
(0.0+12345678.9j)  | 1.2628 / 0.0765 (16.52x)  | 1.7085 / 0.0858 (19.91x)  | 11.5023 / 0.2749 (41.84x)  | 275.6174 / 4.4548 (61.87x)
1.0e+20            | 0.9820 / 0.0495 (19.84x)  | 1.3882 / 0.0540 (25.72x)  | 6.6983 / 0.1481 (45.21x)   | 110.1757 / 2.1954 (50.19x)
(0.0+1.0e+20j)     | 4.0385 / 0.0788 (51.24x)  | 4.3758 / 0.0833 (52.54x)  | 20.9478 / 0.1936 (108.20x) | 335.9407 / 2.3911 (140.50x)


All times are from a warm cache. The new implementation has slightly longer precomputation time (for the Taylor series coefficients) at high precision, and the Taylor series therefore isn't used above a certain precision (around 1000 digits) despite being much faster on subsequent calls. I will probably add some user-visible function for controlling this.
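The effect of coefficient caching is easy to mimic at user level. The toy below is not mpmath's internal mechanism, but it shows the pattern: the first call at a given precision pays for the Taylor coefficients (computed here with mpmath's numerical taylor function), and subsequent evaluations reduce to a polynomial evaluation. It is of course only accurate close to center:

from mpmath import mp, taylor, gamma, fsum

_coeffs = {}

def gamma_near(x, center=2, terms=40):
    # Cache coefficients per working precision: computing them is the slow,
    # one-time step; reusing them is a single polynomial evaluation.
    key = (mp.prec, center, terms)
    if key not in _coeffs:
        _coeffs[key] = taylor(gamma, center, terms)
    t = x - center
    return fsum(c * t**k for k, c in enumerate(_coeffs[key]))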

Finally, several further optimizations are still possible. The most important would be to do the implementation in Cython. (Improving the underlying elementary functions will also speed up the gamma function.) Potential algorithmic improvements include faster handling of near-integers (using Taylor series with a reduced number of terms), avoiding the reflection formula for complex arguments that lie in the left half-plane but not too close to the negative real axis, and performing the argument transformations with a reduced number of multiplications. But I doubt I will have time (or interest -- the present code was demanding enough to finish) to work on any of these things, at least any time soon.

The speedups naturally also affect performance of many other functions, e.g. hypergeometric functions; I might post some benchmarks of that later.

Friday, February 5, 2010

Mpmath 0.14 released

I've just released version 0.14 of mpmath! Some highlights in this release have been covered in previous development updates on this blog:

  • mpmath in Sage to become 3x faster -- a Cython extension module will be added to Sage that greatly speeds up mpmath (the Sage patch is currently being reviewed; if all goes to plan, the patch will be accepted soon and the mpmath version in Sage will be updated to 0.14 at the same time).

  • Zeta evaluation with the Riemann-Siegel expansion -- Juan Arias de Reyna contributed code for very efficient and robust evaluation of the Riemann zeta function for arguments with large imaginary part.

  • Various features discussed in this update: 3D surface plotting (wrapping matplotlib), matrix calculus (transcendental functions of a matrix argument), Borel summation of divergent hypergeometric series, and options for using Mathematica's conventions for hypergeometric functions.

  • Improved accuracy for hypergeometric functions with large parameters, and analytic continuation implemented for 3F2, 4F3, and higher hypergeometric functions.

  • Support for using Python floats/complexes for faster low-precision math. This is very handy for plotting, multidimensional integration, and other things requiring many function evaluations.
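As a rough illustration of that last point (the numbers are indicative only and depend on the machine):

import timeit
from mpmath import mp, fp

mp.dps = 15
# The same evaluation in the arbitrary-precision and the float context
print(timeit.timeit(lambda: mp.sin(1.5), number=100000))
print(timeit.timeit(lambda: fp.sin(1.5), number=100000))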

There are many other changes as well. See the CHANGES file and the list of commits for more details. Thanks to Juan Arias de Reyna, Vinzent Steinberg, Jorn Baayen and Chris Smith who contributed patches, and various people who tested and submitted bug reports (thanks also to anyone else I forgot to mention).

I'm sorry that it took several months to finish this release! This was partly due to large internal reorganizations done in order to support floats and the Cython backend in Sage. I've also had to take some time off to focus on my studies and other things.

An example: spherical harmonics


Here I will demonstrate three new features with a single example: 3D plotting, one of the newly added special functions (spherical harmonics), and low-precision evaluation with the fp context. Of course, everything here can be done in arbitrary precision with mp as well; fp is just faster for plotting.

The following code visualizes spherical harmonics using a 3D polar plot. I've chosen to visualize the real part; the code is easily edited to show the absolute value (for example) instead.

from mpmath import fp

def Y(m,n):
    def g(theta,phi):
        R = fp.re(fp.spherharm(m,n,theta,phi))**2
        x = R*fp.cos(phi)*fp.sin(theta)
        y = R*fp.sin(phi)*fp.sin(theta)
        z = R*fp.cos(theta)
        return [x,y,z]
    return g

fp.splot(Y(0,0), [0,fp.pi], [0,2*fp.pi])

The first few spherical harmonics are thus:

[3D polar plots of Y(0,0); Y(1,0), Y(1,1); Y(2,0), Y(2,1), Y(2,2); Y(3,0), Y(3,1), Y(3,2), Y(3,3)]


Some numerical examples



A selection of evaluations that were not implemented, failed, gave inaccurate results or were extremely slow in previous versions of mpmath:


>>> from mpmath import *
>>> mp.dps = 25; mp.pretty = True

# reflection formula for Barnes G-function
>>> barnesg(-100000+10000j)
(-7.332534002219787256775675e+22857579769 +
1.304469872717249495403882e+22857579770j)

# Riemann zeta function, large height
>>> zeta(0.5+100000000j)
(-3.362839487530727943146807 + 1.407234559646447885979583j)

# Accurate evaluation of large-degree Bernoulli polynomial
>>> bernpoly(1000,100)
4.360903799140670486890619e+1996

# Computation of Euler numbers
>>> eulernum(20)
370371188237525.0
>>> eulernum(50, exact=True)
-6053285248188621896314383785111649088103498225146815121L
>>> eulernum(2000000000000000000)
3.19651108713502662532039e+35341231273461153426

# Accurate evaluation of hypergeometric functions with large parameters
>>> hyp2f1(1000,50,25,-1)
-2.886992344761576738138864e-274
>>> legenp(0.5, 300, 0.25)
-1.717508549497252387888262e+578
>>> besseli(2, 10, derivative=100)
821.2386210064833486609255

# Evaluation of 4F3 and 4F2
>>> hyper([1,2.125,3.25,4.5], [5.25,6.5,7], 1+2j)
(1.006737556189825231039221 + 0.3058770612264986390003873j)
>>> hyper([1,2.125,3.25,4.5], [5.25,6.5], 1+2j)
(0.3220085070641791195103201 + 0.5251231752161637627314328j)

# Matrix functions
>>> A = matrix([[2,3,1+j],[1,0,-1],[2,1,5]])
>>> mnorm(A - logm(expm(A)))
9.540212722670391941827867e-25
>>> mnorm(A - expm(logm(A)))
4.375286550134565812513542e-26
>>> nprint(expm(2*A))
[ (-2855.4 + 10733.0j) (-1871.95 + 8597.9j) (-7721.42 + 12906.7j)]
[(-3910.68 - 1749.82j) (-3152.69 - 1272.11j) (-4460.18 - 3558.54j)]
[ (17039.0 + 22621.8j) (14320.3 + 17405.9j) (13587.7 + 35087.8j)]
>>> nprint(expm(A)**2)
[ (-2855.4 + 10733.0j) (-1871.95 + 8597.9j) (-7721.42 + 12906.7j)]
[(-3910.68 - 1749.82j) (-3152.69 - 1272.11j) (-4460.18 - 3558.54j)]
[ (17039.0 + 22621.8j) (14320.3 + 17405.9j) (13587.7 + 35087.8j)]
>>> nprint(chop(sqrtm(A)*sqrtm(A)))
[2.0 3.0 (1.0 + 1.0j)]
[1.0 0.0 -1.0]
[2.0 1.0 5.0]

# A convenience function for precise evaluation of exp(j*pi*x)
>>> expjpi(36); exp(j*pi*36)
(1.0 + 0.0j)
(1.0 + 4.702307256907087875509661e-25j)

# Some fp functions have specialized implementations
# for improved speed and accuracy
>>> fp.erfc(5); float(mp.erfc(5))
1.5374597944280347e-12
1.5374597944280349e-12
>>> fp.erf(5); float(mp.erf(5))
0.99999999999846256
0.99999999999846256
>>> fp.ei(-12-3j); complex(mp.ei(-12-3j))
(4.6085637890261843e-07-3.141592693919709j)
(4.6085637890261901e-07-3.141592693919709j)
>>> fp.zeta(-0.5); float(mp.zeta(-0.5)) # real-argument fp.zeta
-0.20788622497735468
-0.20788622497735468
>>> fp.ci(40); float(mp.ci(40))
0.019020007896208769
0.019020007896208765
>>> fp.gamma(4+5j); complex(mp.gamma(4+5j))
(0.14965532796078551+0.31460331072967695j)
(0.14965532796078521+0.31460331072967596j)


For more examples and images, see the posts linked at the top of this post!

Monday, February 1, 2010

mpmath in Sage to become 3x faster

I've blogged previously about the potential speedups attainable by rewriting core parts of mpmath in Cython (here and here), but all I had back then was some standalone classes and functions that couldn't easily be integrated. Well, I now finally have a fully working, fully integrated Cython implementation of the mpf and mpc types (plus related utility functions) -- and it successfully runs the entire mpmath test suite!

On my computer (Ubuntu 64 bit, AMD Athlon 64 3700+), running import mpmath; mpmath.runtests() in Sage takes this much time:

Before [mpmath (svn trunk) + Sage 4.3.1]: 62.75 seconds

After [mpmath (svn trunk with modifications) + Cython mpmath types + Sage 4.3.1 + a Sage performance patch by Robert Bradshaw]: 19.87 seconds

I have essentially only implemented arithmetic operations and conversions, so core transcendental functions (as well as square roots and powers) still fall back to the Python code, and they account for the majority of the running time. According to cProfile, about 20% of the total time is spent in exp() and gamma() alone. Hypergeometric series evaluation is also still done in Python. Replacing these functions with Cython versions should give a significant boost, and will be very easy to do in the future (in terms of code infrastructure). So realistically, the running time for the unit tests can be cut much further, probably in half.
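The kind of profile cited above can be reproduced with the standard library profiler, along these lines:

import cProfile
import mpmath

# Run the full test suite under the profiler, sorting by cumulative time
cProfile.run("mpmath.runtests()", sort="cumulative")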

The code isn't public yet. I will soon commit the changes to the mpmath svn repository and create a patch for Sage with the Cython extensions. The code just needs a little more cleanup, and there are no doubt a couple of subtle issues (such as corner-case memory leaks) that need fixing... but essentially, it's working, and the tests pass, so I expect it to be releasable quite soon.

Update: after adding just a little more Cythonized wrapper code, the time is now down to 17.64 seconds. I have still not touched exp, log, cos, sin, gamma, or any other core transcendental functions. Expect further improvements.

Update 2 (2010-02-03): preliminary version of the Cython code here