Nobu writes,
>Hi,
>
>At Mon, 21 Apr 2003 09:24:15 +0900,
>Rudi Cilibrasi wrote:
>> Does this make sense?  If not please let me clarify further as this makes
>Yes, thank you.
>[ruby-talk:69752].
>But still I have a question: why restrict to IEEE754?
># CVS head version needs no assumption on float format, and no
># flag bits.

According to my test program, the version currently in CVS still has
problems.  If you try

    5000000.times {
      w = rand(111111111)
      x = rand(222222223)
      y = rand(33333333334)
      z = rand(311414313)
      a = (x.to_f + y.to_f / z.to_f) * Math.exp(w.to_f / (x.to_f + y.to_f / z.to_f))
      ma = Marshal.dump(a)
      b = Marshal.load(ma)
      if a == b
        # puts "Everything is working fine for #{a}"
      else
        puts "PROBLEM: a is #{a}, b is #{b}, and ma is #{ma.dump}"
        puts "w is #{w}, x is #{x}, y is #{y}, z is #{z}"
        exit(0)
      end
    }

For me it produces:

    PROBLEM: a is 422570262.410156, b is 422570262.40625, and ma is "\004\010f\031422570262.4101562\000\000\000"
    w is 108244015, x is 51362626, y is 9466898912, z is 96749751

You will probably see an error printed as well.  I think it must still
suffer from some roundoff problems.  Unfortunately, I do not believe
you can rely on any floating-point function to give exactly correct
results in the least significant bits, as there are still problems with
standardizing accuracy and roundoff rules in most implementations
(though they are supposed to be exact).

I think it is a tricky question whether or not to use a 2-bit "format
flag" to say that it is an IEEE754 or some other type of Marshal'd
Float.  Here are the advantages of each:

1) If no format flag is used:
   * saves space
   * seems cleaner in the Marshal format, because it has fewer
     component parts
2) If a format flag is used:
   * it allows the possibility for other ways of saving the final
     bits of a float in the future; for instance, if we want to save
     Cray or VAX 16-byte floating point numbers, it would be
     possible to add.
     Without a flag, it may be difficult to know how many bytes of
     mantissa follow, or what number to multiply by (instead of
     0x10000) to get correct results.
   * it allows the ability to detect (through some Marshal function
     like Marshal.prevent_rounding() and subsequent Exception
     throwing) at Ruby runtime whether or not roundoff error might
     occur.  In some applications, for instance in interface coding,
     you may want to know whether or not it is safe to read a Float
     into Ruby and then write it back out.  Without a format flag,
     you cannot know whether or not you will lose precision when
     doing this.  This then makes Ruby less viable for persistent
     data storage.

I believe that although your solution involving modf, ldexp, and frexp
appears more portable in that it uses fewer direct byte accesses, in
reality it is unlikely to work on any format but IEEE754, just as mine
will not.  This is because of the constant-width addition of 2 bytes
that is only appropriate for an IEEE754 8-byte double.  And the more
serious problem is that it seems we cannot rely on correct, precise
behavior, as my test above indicates.  Essentially, there are many more
opportunities for something to go wrong in this portable solution than
in the less portable but more often working simple bitmask and copy,
which is also easier on overworked processors.

Another issue is the reliance on these functions being 100% accurate,
even in theory.  Testing shows that something is wrong with them above,
but even if this bug is fixed (and I cannot figure it out myself with
some trying), it seems to me the definitions of these functions
essentially require the FPU to be working in standard m * 2^e style
floating point, because working in, say, an m * 10^e representation, or
any fixed-point representation, would not allow exact answers.
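
To make the comparison concrete, here is a minimal sketch in Ruby of the
two approaches discussed above.  It is not the code in marshal.c; the
helper names copy_bits_round_trip and frexp_round_trip are made up for
illustration, and the 'G' pack directive assumes that Float is an
IEEE754 8-byte double on your platform.

```ruby
# Hypothetical helpers for illustration only; neither is the marshal.c code.

# "Bitmask and copy": write out the raw 8 bytes of the double
# (big-endian) and read them back.  Assumes an IEEE754 8-byte double.
def copy_bits_round_trip(f)
  [f].pack('G').unpack('G').first
end

# Decomposition: split f into mantissa and exponent with frexp, scale
# the mantissa up to an integer of at most 53 bits, then rebuild the
# value with ldexp.  This is exact only if the float is base 2 with at
# most 53 mantissa bits and frexp/ldexp themselves are exact.
def frexp_round_trip(f)
  mantissa, exponent = Math.frexp(f)            # f == mantissa * 2**exponent
  int_mantissa = Math.ldexp(mantissa, 53).to_i  # scale by a power of two
  Math.ldexp(int_mantissa.to_f, exponent - 53)
end

a = 422570262.4101562
puts copy_bits_round_trip(a) == a
puts frexp_round_trip(a) == a
```

On IEEE754 hardware I would expect both checks to print true.  The
second one only holds as long as frexp and ldexp are exact, which is
precisely the assumption in question.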

In the end, though, I think even the multiple-format portability issue
is of only academic interest, as I have heard that essentially all new
hardware is IEEE754 compliant, and there are now special-purpose
software libraries that allow you to calculate with arbitrarily high
precision when necessary, which makes it even less useful to support
alternative formats.

Rudi