Fredrik Johansson: cython

Saturday, June 6, 2009

Cython mpmath performance, part 2

A followup to the previous post. Turns out I had left a temporary integer variable undeclared, which slowed things down unnecessarily. After adding a simple cdef, the code runs a bit faster: at 53 bits, an addition now takes 850 ns, a multiplication 920 ns; a complex addition takes 1.25 µs and a complex multiplication takes 2.15 µs.

I've also implemented division and square root. To continue the benchmarking started in the previous post:


Real division, x/y

       mpmath    cython    sage

53     23.7 µs   1.49 µs   854 ns
100    24.6 µs   1.84 µs   1.17 µs
333    27 µs     2.69 µs   2.1 µs
3333   70.3 µs   44 µs     42.8 µs

Real square root, x.sqrt()

53     21.1 µs   1.75 µs   1.48 µs
100    22.5 µs   2.23 µs   1.68 µs
333    32 µs     3.81 µs   3.11 µs
3333   65.2 µs   35.3 µs   33.1 µs

Division is a bit slow compared Sage, but it might be possible to improve. As with the other arithmetic operations, the speedup vs ordinary mpmath is still more than a factor 10 at low precision.

Further, here is a benchmark of the new gamma function algorithm (mentioned in my talk at SD15), which is an order of magnitude faster than the algorithms currently used by both mpmath and MPFR. I had already written a pure Python implementation, which was quite fast, but it's insanely fast in Cython -- at really low precision it's even faster than the exponential function (although this fact probably indicates that the exponential function could be streamlined further). It's still half as fast as exp at 333-bit precision:


Real gamma function, x.gamma()   (gamma(x) in mpmath)

       mpmath    cython    sage

53     332 µs    8.6 µs    195 µs
100    434 µs    12.9 µs   257 µs
333    1.63 ms   52 µs     975 µs
3333   90.9 ms   9.95 ms   347 ms

Yes, that is a whopping 20,000 hundred-digit gamma functions per second!

Making the Cython code fully usable as an mpmath backend might require another week or two of work. This estimate includes rewriting utility code in Cython, but not porting transcendental functions. (The main reason why this will take longer is that I don't just want to translate the existing code, but redo the implementations from scratch to make everything uniformly faster.)

I'm not working full time on Cython mpmath backend; in parallel, I'm working on the mpmath wrapper code for Sage (some of which is ready and has been posted to the Sage trac) and on high-level implementations of various special functions. The new functions I have written partial implementations for so far include the generalized exponential integral (E_n), incomplete beta function, Hurwitz zeta function with derivatives, Appell hypergeometric functions of two variables, and matrix sqrt/exp/log/cos/sin. If anyone feels that their favorite special function is missing from Sage and/or mpmath, I'm taking requests :-)

I've also done little bit of work fixing hypergeometric functions for the hard cases, but really haven't made much progress on this so far (this is a substantial project which will take some time).

By the way, there should be a new mpmath release any day now. I mostly need to review the documentation and do some minor cleanup.

Wednesday, May 27, 2009

Cython mpmath performance

I'm presently working on a Cython-based backend for mpmath, with the immediate goal to make mpmath competitive as a component of Sage. The Cython code currently depends on utility functions in the Sage library but should eventually be possible to compile for standalone use or together with SymPy's future Cython backend (linking directly against GMP/MPIR).

So far I've implemented a real type (mpf replacement) and a complex type (mpc replacement); addition, multiplication, and the real exponential function; just enough to do some basic benchmarking. Below is a comparison between standard Python mpmath (running on top of sage.Integer), the new Cython-based types, and Sage's RealNumber / ComplexNumber types. I used the inputs x = sqrt(3)-1, y = sqrt(5)-1, and for the complex cases z = x+yi, w = y+xi. The precision ranges between 53 bits (IEEE double compatible) and 3333 bits (≈1000 decimal digits).


Addition, x+y

       mpmath    cython    sage

53     8.72 µs   979 ns    707 ns
100    8.81 µs   1.01 µs   733 ns
333    10.1 µs   1.14 µs   737 ns
3333   10.6 µs   1.59 µs   1.15 µs

Multiplication, x*y

53     9.04 µs   968 ns    728 ns
100    9.56 µs   1.14 µs   901 ns
333    11.3 µs   1.59 µs   1.2 µs
3333   35.7 µs   25.4 µs   17.8 µs

Real exponential function, exp(x)

53     56.2 µs   9.41 µs   16 µs
100    52.8 µs   11.6 µs   11.3 µs
333    122 µs    25.4 µs   26.9 µs
3333   1.38 ms   831 µs    1.14 ms

Complex addition, z+w

53    14.7 µs   1.45 µs   978 ns
100   14.5 µs   1.7 µs    1 µs
333   16.3 µs   1.85 µs   992 ns
3333  17.7 µs   3.2 µs    1.73 µs

Complex multiplication, z*w

53    35.8 µs   2.72 µs   1.68 µs
100   34.5 µs   2.76 µs   2.08 µs
333   43.2 µs   4.38 µs   3.61 µs
3333  139 µs    98.6 µs   70 µs

A few observations can be made. First off, basic arithmetic with number instances is an order of magnitude faster when fully Cythonized. This isn't surprising because something as simple as an addition of two mpmath.mpf instances has to go through some 50 lines of Python code to check types, check for special cases, and round the result (creating lots of temporary objects, etc).

Reimplemented in Cython, arithmetic comes well within half the speed of Sage's numbers. I believe MPFR (which Sage uses) does arithmetic faster because it works directly with the limb data and uses some tricks to speed up the rounding, whereas my implementation uses the mpz interface straightforwardly. Some difference also comes from the fact that MPFR uses machine-precision integers for exponents, whereas I'm using mpz_t to allow arbitrary-size exponents. In any case, the results are very good.

At very low precision, memory allocations account for probably half the time (for both my Cython-based types and Sage's RealNumber). Thus there is some room for improvement later on by implementing a freelist.

The best news (for special functions) is that my exponential function is as fast as MPFR's, so the same should be true for other functions when I get there. The new exp (which uses a slightly different algorithm from the one currently in mpmath) is actually slightly faster than MPFR, although it should be said that performance for transcendental functions depends heavily on tuning, the inputs, and possibly other factors (I have no idea why Sage's RealNumber.exp in my benchmark is much slower at 53 bits than at 100 bits, for example), so this is not a conclusive result.

The code itself is currently too messy and ad-hoc to share publicly, sorry.

Fredrik Johansson

Saturday, June 6, 2009

Cython mpmath performance, part 2

Wednesday, May 27, 2009

Cython mpmath performance

About Me

Blog Archive

Other blogs