The function torturer takes a function, say `f = airyai` or `f = lambda z: hyp1f1(1.5, -2.25, z)`, and attempts to evaluate it with z = a·10^n as *n* ranges between -20 and 20, with *a* ≈ 1, 1+i, i, -1+i, -1. (The "≈" indicates multiplication by an additional quasi-random factor near unity.)

Thus it tests a wide range of magnitudes (both tiny and huge), with pure positive, negative, imaginary, and complex arguments; i.e. covering most distinct regions of the complex plane. (It doesn't cover the lower half-plane, but that's generally just a waste of time since most function algorithms are agnostic about the sign of the imaginary part -- this should be done for some functions in the future, however.)
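As a rough illustration, the argument generation described above could be sketched like this (the function name, the seed, and the size of the quasi-random perturbation are invented for illustration; this is not mpmath's actual test code):

```python
# Hypothetical sketch of how the torture arguments might be generated.
import random
from mpmath import mpf, mpc

def torture_arguments(seed=1234):
    """Yield z = a * 10**n for n in -20..20 and a near 1, 1+i, i, -1+i, -1,
    each multiplied by a quasi-random factor close to unity."""
    rng = random.Random(seed)
    directions = [mpc(1, 0), mpc(1, 1), mpc(0, 1), mpc(-1, 1), mpc(-1, 0)]
    for n in range(-20, 21):
        for a in directions:
            # perturb slightly so we don't only hit exact powers of ten
            factor = 1 + mpf(rng.uniform(-0.01, 0.01))
            yield a * factor * mpf(10) ** n
```

This yields 41 × 5 = 205 test points per function, covering magnitudes from 10^-20 to 10^20 along five rays in the upper half-plane.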

Each evaluation is also performed at a range of precisions, between 5 and 150 digits by default (for fast functions the maximum precision is set much higher). The results at two successive precisions are compared to verify that they agree to the lesser of the two precisions (with a single-digit discrepancy allowed for roundoff error).
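The successive-precision check can be sketched as follows (a minimal illustration, not mpmath's actual harness; the function name `agrees` and the exact tolerance handling are assumptions):

```python
# Minimal sketch of a successive-precision consistency check.
from mpmath import mp, mpf, fabs

def agrees(f, z, dps1, dps2):
    """True if f(z) computed at dps1 and dps2 digits agrees to roughly
    min(dps1, dps2) digits, allowing one digit of roundoff slack."""
    mp.dps = dps1
    v1 = f(z)
    mp.dps = dps2
    v2 = f(z)
    mp.dps = max(dps1, dps2)
    if v2 == 0:
        return v1 == 0
    # relative difference, measured against the higher-precision value
    rel = fabs(v1 - v2) / fabs(v2)
    return rel < mpf(10) ** (-(min(dps1, dps2) - 1))
```

An inaccurate algorithm at either precision shows up as a relative difference far larger than the tolerance.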

This doesn't guarantee correctness, of course -- I might have implemented the wrong formula for the function -- but it's a strong check that whatever formula has been implemented is being evaluated accurately. It does provide some amount of correctness testing for those functions that use more than one algorithm depending on precision and/or the magnitude of the argument (and this includes the hypergeometric functions). If one algorithm is used at low precision and another at high precision, then an error in either will be revealed when comparing results.

My original intention with the torture tests was just to check the robustness of the asymptotic expansions of hypergeometric functions. Testing with a continuous range of magnitudes ensures that there aren't any regions where convergence gets impossibly slow, a loop fails to terminate, an error estimate is insufficient, etc. (And this needs to be done at several different levels of precision.) I then ended up including many other functions in the list of torture tests as well.

The results? I discovered (and fixed!) some 10-20 bugs in mpmath. Most were of the kind "only 43 accurate digits were returned where 50 were expected" and due to inadequate error estimates and/or insufficient extra precision being allocated (mostly in argument transformations). I actually knew about many of the problems, and had just neglected to fix them because they wouldn't show up for "normal" arguments (and in particular, there weren't any existing tests that failed).

Lesson to be learned: if it isn't tested, it probably doesn't work (fully). Semi-automatic exhaustive or randomized testing (the more "evil" the inputs, the better) is a necessary complement to testing hand-picked values.

There is more to be done:

- Some functions don't yet pass the torture tests
- Rather than just large/small magnitude, the relevant stress measure for some functions is closeness to a negative integer, closeness to the unit circle, etc.
- Functions of more than one argument should be tested with randomly selected values for all parameters
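The last item could be attacked along these lines (a hedged sketch for `hyp1f1` only; the helper names, parameter ranges, and tolerance are invented, and a real harness would cover far more cases):

```python
# Sketch: randomized parameters for a multi-parameter function, checked
# for self-consistency across two precisions.
import random
from mpmath import mp, mpf, fabs, hyp1f1

def random_param(rng):
    # nonzero, spans a few orders of magnitude, avoids exact integers
    return rng.choice([1, -1]) * mpf(10) ** rng.randint(-2, 1) * (1 + rng.random())

def torture_hyp1f1(trials=10, seed=42):
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, z = [random_param(rng) for _ in range(3)]
        mp.dps = 15
        v1 = hyp1f1(a, b, z)
        mp.dps = 30
        v2 = hyp1f1(a, b, z)
        if v2 != 0:
            # results at 15 and 30 digits should agree to ~13 digits
            assert fabs(v1 - v2) / fabs(v2) < mpf(10) ** -12, (a, b, z)
    return True
```

The parameter generator deliberately avoids exact non-positive integers for *b*, where 1F1 has poles; testing closeness to those poles would need separate, targeted sampling.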

But already now, the torture tests have done a good job. Tellingly, functions I implemented recently are much less buggy than functions I implemented long ago (even just three months ago). Periodically rewriting old code as one learns more is bound to improve quality :-)

Speaking of bugs, Mathematica doesn't pass the mpmath torture test suite (far from it). An example of something that will fail:

```
In[47]:= N[HypergeometricPFQ[{1},{-1/4,1/3},10^10],15]

General::ovfl: Overflow occurred in computation.
General::ovfl: Overflow occurred in computation.
General::ovfl: Overflow occurred in computation.
General::stop: Further output of General::ovfl will be suppressed during this calculation.

(... nothing for 5 minutes)

Interrupt> a

Out[47]= $Aborted
```

And mpmath (instantaneously):

```
>>> mp.dps = 15; mp.pretty = True
>>> hyp1f2(1,(-1,4),(1,3),10**10)
-3.53521222026601e+86866
```

By the way, the current set of torture tests takes about 11 minutes to run on my computer with psyco and gmpy (much longer without either). It's possibly a better benchmark than the standard tests.

**Update**: running the torture tests with multiprocessing on sage.math takes only 28 seconds (which is the time for the slowest individual test). Go parallel!
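Since the individual torture tests are independent, parallelizing them is straightforward. A hedged sketch with `multiprocessing.Pool` (here `run_one` is a trivial stand-in for a full per-function torture loop, and the function names are passed as strings so the tasks pickle cleanly):

```python
# Sketch: running independent per-function tests in parallel.
import multiprocessing
import mpmath

def run_one(name):
    # each worker process gets its own mpmath context state
    mpmath.mp.dps = 30
    f = getattr(mpmath, name)
    return (name, mpmath.nstr(f(3), 15))

def run_parallel(names):
    # wall-clock time becomes roughly the time of the slowest single test
    with multiprocessing.Pool() as pool:
        return pool.map(run_one, names)

if __name__ == "__main__":
    print(run_parallel(["airyai", "gamma", "zeta"]))
```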

## 3 comments:

I looked at your code to execute the tests and you didn't parallelize it. Using a process pool (multiprocessing), it is very easy and might save you a lot of time once you run it on several cores! http://stackoverflow.com/questions/884650/python-spawn-parallel-child-processes-on-a-multi-processor-system-use-multipro/884846#884846

Yes, it would be easy to parallelize. But I don't have a multicore system myself, so I didn't find it worthwhile yet. Patches welcome!

Great job :)

It would be fair to send the failure log to Wolfram. It would be great to help correct two pieces of code at the same time ;) (even if Mathematica isn't open source...)
