Just evaluating a polynomial: how hard could it be?
The previous post looked at an example of strange floating point behavior taken from the book The End of Error. This post looks at another example.
This example, by Siegfried Rump, asks us to evaluate
333.75 y^6 + x^2 (11 x^2 y^2 - y^6 - 121 y^4 - 2) + 5.5 y^8 + x/(2y)
at x = 77617 and y = 33096.
Here we evaluate Rump's example in single, double, and quadruple precision.
#include <iostream>
#include <quadmath.h>

using namespace std;

template <typename T>
T f(T x, T y) {
    T x2 = x*x;
    T y2 = y*y;
    T y4 = y2*y2;
    T y6 = y2*y4;
    T y8 = y2*y6;
    return 333.75*y6 + x2*(11*x2*y2 - y6 - 121*y4 - 2) + 5.5*y8 + x/(2*y);
}

int main() {
    int x = 77617;
    int y = 33096;

    cout << f<float>(x, y) << endl;
    cout << f<double>(x, y) << endl;
    cout << (double) f<__float128>(x, y) << endl;
}
This gives three answers,
-1.85901e+030
-1.18059e+021
1.1726
none of which is remotely accurate. The exact answer is -54767/66192 = -0.827...
Python gives the same result as C++ with double precision, which isn't surprising since Python floating point numbers are C doubles under the hood.
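If you'd like to verify the exact answer, here's one way that avoids floating point for everything except the final division. Multiplying the polynomial part of the expression by 4 turns the coefficients 333.75 and 5.5 into the integers 1335 and 22, and all the intermediate values happen to fit in a 128-bit integer. The following is a minimal sketch assuming a compiler that supports the __int128 extension (GCC and Clang do).

#include <iostream>
using namespace std;

int main() {
    __int128 x = 77617, y = 33096;
    __int128 x2 = x*x, y2 = y*y;
    __int128 y4 = y2*y2, y6 = y2*y4, y8 = y2*y6;

    // Four times the polynomial part of Rump's expression, computed exactly.
    __int128 p4 = 1335*y6 + 4*x2*(11*x2*y2 - y6 - 121*y4 - 2) + 22*y8;

    // p4 comes out to -8, so the polynomial part is exactly -2
    // and the full expression is -2 + x/(2y) = -54767/66192.
    cout << (long long) p4 << endl;            // -8
    cout << -2 + 77617.0/(2*33096) << endl;    // -0.827396
}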
Where does the problem come from? The usual suspect: subtracting nearly equal numbers. Let's split Rump's expression into two parts.
s = 333.75 y^6 + x^2 (11 x^2 y^2 - y^6 - 121 y^4 - 2)
t = 5.5 y^8 + x/(2y)
and look at them separately. We get
s = -7917111340668961361101134701524942850.00
t =  7917111340668961361101134701524942849.1726...
The values -s and t agree to 36 figures, but quadruple precision has only 34 figures [1]. Subtracting these two numbers results in catastrophic loss of precision.
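You can watch the cancellation happen by computing s and t separately in quad precision. Here's a sketch along the lines of the program above; it uses quadmath_snprintf to print full quad precision, so it needs to be linked with -lquadmath. The printed values are the quad roundings of the exact values above, so the trailing digits are already slightly off, and the sum that comes out bears no resemblance to the true answer.

#include <iostream>
#include <quadmath.h>

using namespace std;

// Compile with g++ and link with -lquadmath for quadmath_snprintf.
int main() {
    __float128 x = 77617, y = 33096;
    __float128 x2 = x*x, y2 = y*y;
    __float128 y4 = y2*y2, y6 = y2*y4, y8 = y2*y6;

    __float128 s = 333.75*y6 + x2*(11*x2*y2 - y6 - 121*y4 - 2);
    __float128 t = 5.5*y8 + x/(2*y);

    char buf[128];
    quadmath_snprintf(buf, sizeof buf, "%.2Qf", s);
    cout << "s     = " << buf << endl;
    quadmath_snprintf(buf, sizeof buf, "%.2Qf", t);
    cout << "t     = " << buf << endl;
    quadmath_snprintf(buf, sizeof buf, "%.2Qf", s + t);
    cout << "s + t = " << buf << endl;   // a small rounding artifact, nothing like -0.827...
}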
Rump went to some effort to conceal what's going on, and the example is contrived to require just a little more than quad precision. However, his example illustrates things that come up in practice. For one, the values of x and y are not that big, and so we could be misled into thinking the terms in the polynomial are not that big. For another, it's not clear that you're subtracting nearly equal numbers. I've been caught off guard by both of these in real applications.
Floating point numbers are a leaky abstraction. They can trip you up if you don't know what you're doing. But the same can be said of everything in computing. We can often ignore finite precision just as we can often ignore finite memory, for example. The proper attitude toward floating point numbers is somewhere between blissful ignorance and unnecessary fear.
[1] A quad precision number has 128 bits: 1 sign bit, 15 for the exponent, and 112 for the fractional part. Because the leading bit is implicit, this gives a quad 113 bits of precision. Since log10(2^113) = 34.01..., this means a quad has 34 decimals of precision.
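If you want to double-check that arithmetic, quadmath.h defines FLT128_MANT_DIG, the number of bits in the quad significand; a quick sketch:

#include <iostream>
#include <cmath>
#include <quadmath.h>   // defines FLT128_MANT_DIG

using namespace std;

int main() {
    cout << FLT128_MANT_DIG << endl;               // 113 bits of precision
    cout << FLT128_MANT_DIG*log10(2.0) << endl;    // about 34 decimal digits
}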