>>> num(1.2345678901234567890123456789)

1.23456789012346

>>> _.accuracy()

15.954589770191001

>>> num(1.2345678901234567890123456789, 5)

1.2346

>>> _.accuracy()

5.0

>>> num(1.2345678901234567890123456789, 25)

1.234567890123456690432135

>>> _.accuracy()

25.0

In the last example, the fact that the input float has a limited accuracy as a representation of the entered decimal literal becomes visible. (Support for passing an exact string as input to avoid this problem will be added.)

The accuracy is a measure of relative accuracy (or relative error). A num instance with value u and accuracy a represents the interval u · (1 + ξ · 2^(-a)) for values of ξ between -1 and 1. The relative error, in the traditional numerical analysis meaning of the term, is given by 2^(-a) and the absolute error is given by |u| · 2^(-a). In other words, the accuracy is a logarithmic (base 2) measure of error; in some cases, it is more natural to consider the (logarithmic) error, given by -a.
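
As a concrete reading of this definition, here is a minimal sketch (the helper name `interval` is my own for illustration, not part of the num class):

```python
def interval(u, a):
    # A num with value u and accuracy a (in bits) represents
    # u*(1 + xi*2**(-a)) for -1 <= xi <= 1, i.e. u plus or minus
    # the absolute error |u|*2**(-a).
    r = abs(u) * 2.0 ** (-a)
    return (u - r, u + r)

endpoints = interval(2.0, 1.0)   # value 2.0 known to 1 bit: (1.0, 3.0)
```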

It can usually be assumed that a > 0. When this property does not hold, not even the sign of the number it is supposed to represent is known (unless additional assumptions are made). Such extremely inaccurate numbers will need special treatment in various places; when they result from higher operations (such as when the user asks for N digits of an expression in SymPy), they should probably signal errors.

How do errors propagate? The simplest case is that of multiplication. If x = u · (1+e) and y = v · (1+f), then x · y = u · v · (1+e) · (1+f) = (u · v) · (1 + e + f + e·f). Therefore the cumulative error is given by e + f + e·f. Expressed in terms of the logarithmic representation of the errors, 2^(-a) = e and 2^(-b) = f, the final accuracy is given by c = -log_2(2^(-a) + 2^(-b) + 2^(-a-b)). This expression can usually be approximated as c = min(a,b), which is the rule I have implemented so far. (Further testing will show whether it needs to be refined.)
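
The exact rule and the min(a,b) approximation are easy to compare numerically (a sketch; `mul_accuracy` is a hypothetical name, not the actual implementation):

```python
import math

def mul_accuracy(a, b):
    # exact accuracy of a product whose operands have accuracies
    # a and b (in bits): c = -log2(2**-a + 2**-b + 2**-(a+b))
    return -math.log(2.0 ** -a + 2.0 ** -b + 2.0 ** -(a + b), 2)

c1 = mul_accuracy(20, 30)   # just below min(20, 30) = 20
c2 = mul_accuracy(20, 20)   # about 19: when a == b, min(a, b)
                            # overestimates by roughly one bit
```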

This analysis does not account for the error due to roundoff in the floating-point product u · v. Such errors can be accounted for either by adding an extra term to the error sum or by setting the arithmetic precision slightly higher than the estimated accuracy. I'm currently taking the latter approach.

Division is similar to multiplication.

Addition and subtraction are a bit harder, as these operations need to translate the relative errors of all the terms into a combined absolute error, and then translate that absolute error back into a relative accuracy (of the final sum). It is important to note that the accuracy of a sum can be much higher or much lower than the minimum accuracy of all the terms.
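
A sketch of that conversion, working in decimal digits (which is how accuracy() reports its results) and ignoring roundoff in the additions themselves; `sum_accuracy` is a hypothetical helper, not the actual implementation:

```python
import math

def sum_accuracy(terms):
    # terms: (value, accuracy-in-decimal-digits) pairs.
    # Relative errors -> combined absolute error -> relative accuracy.
    s = sum(u for u, a in terms)
    if s == 0:
        raise NotImplementedError("zero needs absolute-error handling")
    abs_err = sum(abs(u) * 10.0 ** -a for u, a in terms)
    return -math.log10(abs_err / abs(s))

d = 15.954589770191001   # accuracy of a double, in decimal digits
cancel = sum_accuracy([(355.0 / 113, d), (-3.141592653589793, d)])  # ~7 digits lost
no_loss = sum_accuracy([(355.0 / 113, d), (3.141592653589793, d)])  # ~16 digits kept
```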

Translating between the relative and absolute error associated with a number involves knowing its magnitude, or rather the base-2 logarithm thereof. This is very easy with mpmath numbers, which are represented as tuples x = (sign, man, exp, bc) where bc is the bit size of the mantissa. The exact magnitude of x is given by log_2 x = log_2 man + exp, and this quantity can be approximated closely as exp + bc (although doing so admittedly does not pay off much, since a call to math.log is inexpensive in Python terms).
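
For illustration, a toy version of this computation (the tuple values here are made up for the example, not pulled from mpmath internals):

```python
import math

# mpmath-style representation: value = (-1)**sign * man * 2**exp,
# where bc is the bit size of the mantissa
man, exp = 355, -7           # 355 * 2**-7 = 2.7734375
bc = man.bit_length()        # 9

exact = math.log(man, 2) + exp   # log2(x) = log2(man) + exp
approx = exp + bc                # cheap estimate, within one bit above
```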

Perhaps the most important type of error that significance arithmetic catches is loss of significance due to cancellation of terms with like magnitude but different sign. For example, 355/113 is an excellent approximation of pi, accurate to more than one part in a million. Subtracting the numbers from each other, with each given an initial accuracy of ~15 digits, leaves a number with only 9 accurate digits:

>>> pi = num(3.1415926535897932)

>>> 355./113 - pi

2.66764189e-7

>>> _.accuracy()

9.0308998699194341

Compare this with what happens when adding the terms:

>>> 355./113 + pi

6.28318557394378

>>> _.accuracy()

15.954589770191001

Adding an inaccurate number to an accurate number does not greatly reduce accuracy if the less accurate number has small magnitude:

>>> pi + num(1e-12,2)

3.1415926535908

>>> _.accuracy()

14.342229822223226

Total cancellation currently raises an exception. Significance arithmetic requires special treatment of numbers with value 0, because the relative error of 0 is meaningless; instead some form of absolute error has to be used.

I have also implemented real-valued exponentiation (x**y, exp(x) and sqrt(x)). These operations are currently a little buggy, but the basics work.

Exponentiation reduces accuracy proportionally to the magnitude of the exponent. For example, exponentiation by 10^10 removes 10 digits of accuracy:

>>> from devel.num import *

>>> pi = num(3.1415926535897932)

>>> pi.accuracy()

15.954589770191001

>>> pi ** (10**10)

8.7365e+4971498726

>>> _.accuracy()

5.954589770191002
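
The ten lost digits match a simple rule: raising to the power n costs roughly log10|n| decimal digits of accuracy. A back-of-the-envelope check (not the class's internal computation):

```python
import math

a_in = 15.954589770191001    # accuracy of the input, in decimal digits
n = 10 ** 10
a_out = a_in - math.log10(n)  # ~5.95, matching the reported accuracy above
```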

Contrast this with the behavior of mpmath. With the working precision set to 15 digits, it will print 15 (plus two or three guard) digits of mantissa. Re-computing at a higher precision, however, verifies that only the first five digits were correct:

>>> from mpmath import mp, pi

>>> mp.dps = 15

>>> pi ** (10**10)

mpf('8.7365179634758897e+4971498726')

>>> mp.dps = 35

>>> pi ** (10**10)

mpf('8.7365213691213779435688568850099288527e+4971498726')

One might wonder where all the information in the input operand has gone. Powering is of course implemented using binary exponentiation, so the rounding errors from repeated multiplications are insignificant.

The answer is that exponentiation transfers information between mantissa and exponent (contrast with a single floating-point multiplication, which works by separately multiplying mantissas and adding exponents). So to speak, exponentiation by 10^n moves n digits of the mantissa into the exponent and then fiddles a little with whatever remains of the mantissa. Logarithms do the reverse, moving information from the exponent to the mantissa.
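
The transfer can be seen directly with ordinary floats: the base-10 exponent of pi^(10^10) is 10^10 · log10(pi), so the first ten digits of log10(pi) end up in the exponent, and the fractional part (which determines the mantissa) survives to only about six digits, mirroring the five or so accurate mantissa digits above:

```python
import math

e = 10 ** 10 * math.log10(math.pi)
exponent = int(e)             # 4971498726, the exponent seen above
mantissa = 10 ** (e - exponent)   # ~8.7365..., accurate to only a few digits
```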

This is not something you have to think of often in ordinary floating-point arithmetic, because the exponent of a number is limited to a few bits and anything larger is an overflow. But when using mpmath numbers, exponents are arbitrary-precision integers, treated as exact. If you compute pi raised to 1 googol, you get:

>>> num(3.1415926535897932) ** (10**100)

1.0e+4971498726941338374217231245523794013693982835177254302984571553024909360917306608664397050705368816

>>> _.accuracy()

-84.045410229808965

The exponent, although printed as if accurate to 100 digits, is only accurate to 15. Although the accuracy is reported as negative, the number does have a positive "logarithmic accuracy". So in contexts where extremely large numbers are used, some extra care is needed.

A counterintuitive property of arithmetic, that a direct implementation of precision-tracking floating-point arithmetic fails to capture, is that some operations *increase* accuracy. Significance arithmetic can recognize these cases and automatically adjust the precision to ensure that no information is lost. For example, although each input operand is accurate to only 15 decimal places, the result of the following operation is accurate to 65:

>>> num(2.0) ** num(1e-50)

1.0000000000000000000000000000000000000000000000000069314718055995

>>> _.accuracy()

65.783713593702743

This permits one to do things like

>>> pi = num(3.1415926535897932)

>>> (pi / (exp(pi/10**50) - 1)) / 10**50

1.0

>>> _.accuracy()

15.65355977452702

which in ordinary FP or interval arithmetic have a tendency to cause divisions by zero or catastrophic loss of significance, unless the precision is manually set high enough (here 65 digits) from the start.
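
The ordinary floating-point failure mode is easy to reproduce. In plain doubles, exp(pi/10^50) rounds to exactly 1.0, so the naive subtraction gives 0 and the division blows up; a standard (but quite different) remedy is math.expm1, which computes exp(x)-1 without the cancellation:

```python
import math

x = math.pi / 10 ** 50
naive = math.exp(x) - 1.0    # 0.0: exp(x) rounds to exactly 1.0
# so math.pi / naive would raise ZeroDivisionError

stable = math.pi / math.expm1(x) / 10 ** 50   # ~1.0
```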

Automatically increasing the precision is of course a bit dangerous, since a calculation can become unexpectedly slow in case the precision is increased to a level much higher than will be needed subsequently (e.g. when computing 15 digits of exp(10^(-100000)) rather than exp(10^(-100000))-1). This feature therefore needs to be combined with some user-settable precision limit.

I'll finish this post with a neat example of why significance or interval arithmetic is important for reliable numerical evaluation. The example is due to Siegfried M. Rump and discussed further in the paper by Sofroniou and Spaletta mentioned in an earlier post. The problem is to evaluate the following function for x = 77617 and y = 33096:

def f(x,y):
    return 1335*y**6/4 + x**2*(11*x**2*y**2-y**6-121*y**4-2) + \
           11*y**8/2 + x/(2*y)
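
As a reference point, the exact value of this expression can be computed with rational arithmetic (a check using the stdlib fractions module, not part of the num code):

```python
from fractions import Fraction

def f_exact(x, y):
    return (Fraction(1335, 4) * y**6
            + x**2 * (11 * x**2 * y**2 - y**6 - 121 * y**4 - 2)
            + Fraction(11, 2) * y**8
            + x / (2 * y))

r = f_exact(Fraction(77617), Fraction(33096))
# r == -54767/66192, about -0.8273960599468214
```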

Directly with mpmath, we get (at various levels of precision):

>>> from mpmath import mp, mpf, nstr

>>> for i in range(2,51):

...     mp.dps = i

...     print i, nstr(f(mpf(77617),mpf(33096)),50)

...

2 1.171875

3 2596148429267413814265248164610048.0

4 1.172607421875

5 1.172603607177734375

6 1.1726038455963134765625

7 1.1726039350032806396484375

8 -9903520314283042199192993792.0

9 -1237940039285380274899124224.0

10 -154742504910672534362390528.0

11 9671406556917033397649408.0

12 1.17260394005324997124262154102325439453125

13 75557863725914323419136.0

14 -9444732965739290427392.0

15 -1180591620717411303424.0

16 1.1726039400531786394132893747155321761965751647949

17 -18446744073709551616.0

18 1152921504606846977.25

19 144115188075855873.171875

20 1.1726039400531786318594495511016817523852751037339

21 1125899906842625.1726038455963134765625

22 1.1726039400531786318588407461708427240165697469365

23 1.172603940053178631858834128725942299795170775667

24 1.1726039400531786318588349559065548528228456470757

25 1.1726039400531786318588349042077665682586159676126

26 8589934593.1726039400531786255355015669010754209012

27 1073741825.1726039400531786318238741673170011381444

28 1.1726039400531786318588349045106891558634845008907

29 1.1726039400531786318588349045201554867261366425557

30 1.1726039400531786318588349045201554867261366425557

31 1.1726039400531786318588349045201801386294247991746

32 1.172603940053178631858834904520183220117335818752

33 1.1726039400531786318588349045201837978963191349227

34 1.1726039400531786318588349045201837015998219155609

35 1.172603940053178631858834904520183707618352991771

36 -0.82739605994682136814116509547981629200548881596584

37 -0.82739605994682136814116509547981629200548881596584

38 -0.82739605994682136814116509547981629199961134421173

39 -0.82739605994682136814116509547981629199906033123478

40 -0.82739605994682136814116509547981629199903737236074

41 -0.82739605994682136814116509547981629199903306757186

42 -0.82739605994682136814116509547981629199903306757186

43 -0.82739605994682136814116509547981629199903311241341

44 -0.82739605994682136814116509547981629199903311521601

45 -0.82739605994682136814116509547981629199903311574149

46 -0.82739605994682136814116509547981629199903311578528

47 -0.82739605994682136814116509547981629199903311578528

48 -0.82739605994682136814116509547981629199903311578443

49 -0.82739605994682136814116509547981629199903311578439

50 -0.82739605994682136814116509547981629199903311578438

Remarkably, the evaluations are not only wildly inaccurate at low precision; up to 35 digits, they seem to be converging to the value 1.1726..., which is wrong!

Significance arithmetic saves the day:

>>> for i in range(2,51):

...     try:

...         r = f(num(77617,i),num(33096,i))

...         s = str(r)

...         a = r.accuracy()

...         print i, str(s), a

...     except (NotImplementedError):

...         continue

...

2 2.0e+33 -2.51544993496

5 6.0e+29 -2.82677988726

6 -8.0e+28 -2.72986987426

7 -1.0e+28 -2.63295986125

8 -1.0e+27 -2.53604984824

9 -2.0e+26 -2.43913983523

10 1.0e+25 -2.64325981789

12 2.0e+23 -2.44943979187

13 -9.0e+21 -2.65355977453

14 -1.0e+21 -2.55664976152

16 -2.0e+19 -2.3628297355

17 1.0e+18 -2.56694971816

18 -7.0e+16 -2.77106970081

20 1.0e+15 -2.5772496748

21 -1.0e+14 -2.48033966179

25 9.0e+9 -2.69475960109

26 1.0e+9 -2.59784958808

27 -1.0e+8 -2.50093957507

35 -8.0e-1 -2.92977945366

36 -8.0e-1 -1.92977945366

37 -8.0e-1 -0.929779453662

38 -8.0e-1 0.0702205463384

39 -8.0e-1 1.07022054634

40 -8.3e-1 2.07022054634

41 -8.27e-1 3.07022054634

42 -8.274e-1 4.07022054634

43 -8.274e-1 5.07022054634

44 -0.827396 6.07022054634

45 -0.8273961 7.07022054634

46 -0.82739606 8.07022054634

47 -0.82739606 9.07022054634

48 -0.8273960599 10.0702205463

49 -0.82739605995 11.0702205463

50 -0.827396059947 12.0702205463

I had to wrap the code in try/except clauses because num(0) is not yet implemented (at many of the low-precision stages, though fortunately for this demonstration not all of them, there is, not unexpectedly, complete cancellation).

The accuracy is reported as negative until the arguments to the function are specified as accurate to over 38 digits, and at that point we see that the printed digits are indeed correct. (The values do not exactly match those of mpmath due to slightly different use of guard digits.)

Interestingly, the num class seems to greatly overestimate the error in this case, and I'm not yet sure if that's inherent to evaluating Rump's particular function with significance arithmetic or due to implementation flaws. It's of course better to overestimate errors than to underestimate them.

Numerical evaluation of a SymPy expression can be seen as converting the expression to a function of its constants (integers and special numbers like pi) and evaluating that function with the constants replaced by significance floating-point numbers. In practice, I will probably implement numerical evaluation using a recursive expression tree walker. This way, forward error analysis can be used to efficiently obtain the user-requested level of accuracy at the top of the expression. Subevaluations can be repeated with increased precision until they become accurate enough; precision only needs to be increased for the parts of an expression where it actually matters.
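
A sketch of such a loop using the stdlib decimal module (a heuristic stand-in, not the planned SymPy implementation): re-evaluate at increasing working precision until two successive results agree to the requested number of digits:

```python
from decimal import Decimal, getcontext

def adaptive_eval(func, digits, max_prec=1000):
    # Repeat the evaluation with more and more working precision until
    # two successive results agree to `digits` significant digits.
    prec = digits + 5                    # a few guard digits to start with
    last = None
    while prec <= max_prec:
        getcontext().prec = prec
        r = +func()                      # unary + rounds to current precision
        if last is not None and abs(r - last) <= abs(r) * Decimal(10) ** -digits:
            return r
        last, prec = r, 2 * prec
    raise ValueError("no convergence below max_prec")

root = adaptive_eval(lambda: Decimal(2).sqrt(), 15)   # 1.414213562373095...
```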

My next goal, besides polishing the existing features (and implementing num(0)), is to implement more functions (log, trigonometric functions) and comparison operators, and to write some tests. With trigonometric functions in place, all the arithmetic operations for complex numbers will be straightforward to implement.

## 4 comments:

But see also:

http://www2.hursley.ibm.com/decimal/decifaq4.html#signif

(and, in particular, Delury's paper referred to there).

Mike Cowlishaw

Thanks for the input, Mike.

Significance arithmetic is certainly bad in some situations. A rather serious problem that I think your FAQ neglects to mention is that it is useless for "self-correcting" numerical algorithms such as Newton's method.

What I'm trying to accomplish right now is mainly reliable conversion of exact mathematical expressions into accurate numerical values, similar to what Mathematica does.

For working with measurements in a more rigorous way, I would like to some day extend SymPy's statistics module so that you can do arithmetic with (arbitrary) statistical distributions.

OK, thanks -- good point re. converging algorithms.

Mike

I'm trying to find the correct formula for "Rump's Example" (and free of typo errors). Please see slide 6 of this: https://hpcrd.lbl.gov/SciDAC08/files/presentations/ARendell.SciDAC.pdf

It doesn't agree with your formula for Rump, and I believe the slide has other problems.

Thanks!

Post a Comment