Saturday, February 21, 2009

How (not) to compute harmonic numbers

The nth harmonic number is the nth partial sum of the divergent harmonic series, Hn = 1 + 1/2 + 1/3 + ... + 1/n. The simplest way to compute this quantity is to add it directly the way it is written: 1, 1+1/2, 1+1/2+1/3, and so on. For n approximately greater than 10 or 100, this is algorithm is not a very good one. Why?

Firstly, in floating-point arithmetic, it is much better to use the asymptotic expansion

which requires fewer terms the larger n is. Combining this with recurrence when n is too small gives a fast numerical algorithm for fixed-precision or arbitrary-precision computation of harmonic numbers (especially large ones). It allows one to determine, for example, that it takes exactly 15092688622113788323693563264538101449859497 terms until the partial sums of the harmonic series exceed 100. (This algorithm is implemented in mpmath, and most other comparable software.)

Secondly, the naive algorithm is especially bad for exact computation of harmonic numbers. This might not be obvious at a glance. We are just adding n numbers: what's to improve? The catch is that rational arithmetic has hidden costs: a single addition requires three multiplications and usually a GCD reduction. In contrast with integer or floating-point addition, rational addition gets very slow when the numerators and denominators get large. Computing harmonic numbers the naive way triggers worst-case behavior of rational arithmetic: the denominators grow like n! (if no GCD is performed, the denominator of Hn literally becomes n!).

It should be well known that computing n! as 1, 1·2, 1·2·3, ... is a poor method when n is large. A much better algorithm, assuming that multiplication is subquadratic, is to recursively split the products in half, i.e. compute (1·2·...·n/2) · ((n/2+1)·...·n) and so on. This algorithm has complexity O(log(n) M(n log(n)) compared to O(n2 log(n)) for the naive algorithm, where M(n) is the complexity of multiplying two n-bit integers. This approach is called binary splitting, and has many other applications.

The example of factorials directly suggests an analogous algorithm for computing harmonic numbers with reduced complexity: just recursively split the summation in half.

A few more variations are also possible. Instead of working with rational numbers, which require a GCD reduction after each addition, one can work with the unreduced numerators and denominators as integers, and form a rational number at the end. Another way to obtain a pure-integer algorithm for Hn = p/q is to compute the denominator q = n! and then use the formula

for the numerator.

Below, I have implemented five different algorithms for computing harmonic numbers based on the preceding remarks. The code works with either Python 2.6, using the Fraction type from the standard library, or with gmpy, by adding the following respective header code:

# python
from math import factorial as fac
from fractions import Fraction as mpq
one = 1
mpz = int

# gmpy
from gmpy import mpz, mpq, fac
one = mpz(1)

Algorithm 1: directly add 1, 1/2, 1/3, ..., 1/n:

def harmonic1(n):
return sum(mpq(1,k) for k in xrange(1,n+1))

Algorithm 2: like algorithm 1, GCD reduction postponed to the end:

def harmonic2(n):
p, q = mpz(1), mpz(1)
for k in xrange(2,n+1):
p, q = p*k+q, k*q
return mpq(p,q)

Algorithm 3: compute denominator; compute numerator series:

def harmonic3(n):
q = fac(n)
p = sum(q//k for k in xrange(1,n+1))
return mpq(p,q)

Algorithm 4: binary splitting:

def _harmonic4(a, b):
if b-a == 1:
return mpq(1,a)
m = (a+b)//2
return _harmonic4(a,m) + _harmonic4(m,b)

def harmonic4(n):
return _harmonic4(1,n+1)

Algorithm 5: binary splitting, GCD reduction postponed to the end:

def _harmonic5(a, b):
if b-a == 1:
return one, mpz(a)
m = (a+b)//2
p, q = _harmonic5(a,m)
r, s = _harmonic5(m,b)
return p*s+q*r, q*s

def harmonic5(n):
return mpq(*_harmonic5(1,n+1))

I did some benchmarking on my laptop (32-bit, 1.6 GHz Intel Celeron M), with n up to 104 for the Python version and 106 for the gmpy version. Here are the results (times in seconds):

n harmonic1 harmonic2 harmonic3 harmonic4 harmonic5
10 0.000159 0.000009 0.000010 0.000160 0.000025
100 0.004060 0.000233 0.000234 0.002211 0.000358
1000 0.889937 0.028822 0.029111 0.048106 0.026434
10000 769.109402 3.813702 3.823027 2.710409 3.475280

n harmonic1 harmonic2 harmonic3 harmonic4 harmonic5
10 0.000028 0.000010 0.000010 0.000033 0.000021
100 0.000284 0.000097 0.000111 0.000365 0.000226
1000 0.003543 0.001870 0.004986 0.004280 0.002651
10000 0.103110 0.142100 0.588723 0.059249 0.052265
100000 7.861546 17.002986 74.712460 1.115333 1.200865
1000000 994.548226 - - 27.927289 24.425421

Algorithm 1 clearly does not do well asymptotically. For the largest n measured, it is 284 times slower than the fastest algorithm with Python arithmetic and 40 times slower with gmpy arithmetic. Interestingly, algorithms 2 and 3 both substantially improve on algorithm 1 with Python arithmetic at least up to n = 10000, but both are worse than algorithm 1 with gmpy. This is presumably because GMP uses a much better GCD algorithm that reduces the relative cost of rational arithmetic.

Algorithms 4 and 5 are the clear winners for large n, but it is hard to tell which is better. The timings seem to fluctuate a bit when repeated, so small differences in the table above are unreliable. I think algorithm 5 could be optimized a bit if implemented in another language, by accumulating temporary values in machine integers.

It is amusing to note that computing the 10000th harmonic number with the recursive algorithm and an optimized implementation of rational arithmetic is 14,700 times faster than the direct algorithm with a non-optimized implementation of rational arithmetic. Choosing a larger n (say, 105) gives an even more sensational ratio, of course, which I'm not patient enough to try. The moral is: Moore's law is not an excuse for doing it wrong.

Algorithm 1 is not all bad, of course. With memoization (or implemented as a generator), it is good enough for sequential generation of the numbers H1, H2, H3, ..., and this tends to be needed much more commonly than isolated large harmonic numbers. SymPy therefore uses the memoized version of algorithm 1.

The algorithms above can be adapted for the task of computing generalized harmonic numbers, i.e. numbers of the form 1 + 1/2k + 1/3k + ... + 1/nk for some integer k. Also worth noting is that algorithm 5 allows for computation of Stirling numbers of the first kind. Since |S(n+1,2)| = n! · Hn, algorithm 5 can be viewed as an efficient algorithm for simultaneous computation of S(n,2) and n!. Expressing Stirling numbers in terms of generalized harmonic numbers, this extends to an algorithm for S(n,k) for any (large) n and (small) k.

A final remark: binary splitting is not the fastest algorithm for computing factorials. The fastest known method is to decompose n! into a product of prime powers and perform balanced multiplication-exponentiation. Can this idea be applied to harmonic numbers and/or Stirling numbers? It's not obvious to me how it would be done. Are there any other, possibly even faster algorithms for harmonic numbers? I've looked around, but been unable to find anything on the subject.


Thomas DA said...
This comment has been removed by the author.
Thomas DA said...

Hi Fredrik, I read your post with interest as I in a rush of quriosity wanted to find an efficient way to compute harmonic numbers.

By time I've grown more used to the idea, that the taylor series is the by far best solution, but after realisng that H(2n) = H(n)/2+U(2n) (where U(n) is the sum of uneven fractions), I wanted to try the primes way.

I did the solution below, which besides using the trick above for all primes, also needs to use some inclusion/exclusion, as H(6) = 1+H(3)/2+H(2)/3+H(1)/5-H(1)/6 and H(30) = 1+H(15)/2+H(10)/3+...-H(5)/6-H(3)/10-...+H(1)/30

The recursion is highly overlapping it can be optimized using memory, but the largest problem is generating the primes and permutations. Probably it would be much more efficient if it was asked to calculate a lot of different H(n) values. (As it could save the prework)

from __future__ import division
import sys
N = int(sys.argv[1])

primes = [2]
for i in xrange(3,N+1):
 for p in primes:
  if i % p == 0:

def genperm(a, size):
 if size == 0:
  yield 1
 for i in xrange(a,len(primes)-size+1):
  for end in genperm(i+1,size-1):
   if primes[i]*end <= N:
    yield primes[i]*end

# Calculate the roof for in/out permutations
toolong = 1
prod = lambda a,b:a*b
while reduce(prod, primes[:toolong]) <= N:
 toolong += 1

ins = primes[:]
outs = []
for size in xrange(2,toolong):
 l = list(genperm(0,size))
 if size % 2 == 0:
  outs += l
 else: ins += l


def H(upper):
 val = 1
 for v in ins:
  if v > upper: break
  val += H(upper//v)/v
 for v in outs:
  if v > upper: break
  val -= H(upper//v)/v
 return val

def H1(upper):
 return sum(1/i for i in xrange(1,upper+1))

print H(N)
print H1(N)