```
Nobu writes,
>Hi,
>
>At Mon, 21 Apr 2003 09:24:15 +0900,
>Rudi Cilibrasi wrote:
>> Does this make sense?  If not please let me clarify further as this makes
>Yes, thank you.
>[ruby-talk:69752].
>But still I have a question: why restrict to IEEE754?
># CVS head version needs no assumption on float format, and no
># flag bits.

According to my test program, the version currently in CVS still has
problems. Try this:
5000000.times do
  w = rand(111111111)
  x = rand(222222223)
  y = rand(33333333334)
  z = rand(311414313)
  a = (x.to_f + y.to_f / z.to_f) * Math.exp(w.to_f / (x.to_f + y.to_f / z.to_f))
  ma = Marshal.dump(a)
  b = Marshal.load(ma)  # read the Float back so it can be compared with a
  if a == b
    # puts "Everything is working fine for #{a}"
  else
    puts "PROBLEM: a is #{a}, b is #{b}, and ma is #{ma.dump}"
    puts "w is #{w}, x is #{x}, y is #{y}, z is #{z}"
    exit(1)  # stop at the first mismatch, with a failure status
  end
end
For me, this prints:
PROBLEM: a is 422570262.410156, b is 422570262.40625, and ma is "\004\010f\031422570262.4101562\000\000\000"
w is 108244015, x is 51362626, y is 9466898912, z is 96749751

You will probably see a similar error printed.  I think it must still
suffer from some roundoff problems.  Unfortunately, I do not believe you
can rely on any floating-point function to give exactly correct results
in the least significant bits (even where they are supposed to be exact),
as there are still problems with standardizing accuracy and roundoff
rules in most implementations.

I think it is a tricky question whether or not to use a 2-bit
"format flag" to say whether a Marshal'd Float is IEEE754 or some other
type.  Here are the advantages of each:

1) If no format flag is used:
* saves space
* seems cleaner in the Marshal format, because it has fewer component parts

2) If a format flag is used:
* it leaves room for other ways of saving the final bits of a float in
the future; for instance, if we want to save Cray or VAX 16-byte
floating-point numbers, support for them could be added.  Without a flag,
it may be difficult to know how many bytes of mantissa follow, or what
number to multiply by (instead of 0x10000) to get correct results.
* it makes it possible to detect at Ruby runtime (through some Marshal
function like Marshal.prevent_rounding() and a subsequently raised
exception) whether or not roundoff error might occur.  In some
applications, for instance in interface coding, you may want to know
whether it is safe to read a Float into Ruby and then write it back out.
Without a format flag, you cannot know whether you will lose precision
when doing this, which makes Ruby less viable for persistent data storage.

I believe that although your solution involving modf, ldexp, and frexp
appears more portable in that it uses fewer direct byte accesses, in
reality it is unlikely to work on any format but IEEE754, just as mine
will not.  This is because of the constant-width addition of 2 bytes,
which is only appropriate for the IEEE754 8-byte double.  The more serious
problem is that, as my test above indicates, it seems we cannot rely on
correct, precise behavior.  Essentially, there are many more opportunities
for something to go wrong in this portable solution than in the less
portable but more reliable simple bitmask and copy, which is also easier
on overworked processors.

Another issue is the reliance on these functions being 100% accurate,
even in theory.  My test above shows that something goes wrong with them,
but even if this bug is fixed (I could not track it down myself despite
some trying), it seems to me that the definitions of these functions
essentially require the FPU to work with standard m * 2^e floating point,
because an m * 10^e representation, or any fixed-point representation,
would not allow exact answers.

In the end, though, I think even the multiple-format portability issue is
of only academic interest: as I have heard, essentially all new hardware
is IEEE754 compliant, and special-purpose software libraries now let you
calculate with arbitrarily high precision when necessary, which makes
supporting alternative formats even less useful.

Rudi

```
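The message argues that a plain copy of the IEEE754 bits is more reliable than rebuilding the mantissa with modf, ldexp, and frexp. Below is a minimal Ruby sketch of that "bitmask and copy" idea, assuming the host Float is an IEEE754 binary64 double. `dump_ieee754` and `load_ieee754` are hypothetical helper names, not part of Ruby's Marshal. The sketch re-runs the failing w, x, y, z case from the test output above, then a smaller random test.

```ruby
# Sketch only: store a Float's raw IEEE754 bits instead of a decimal string
# plus extra mantissa bytes. Assumes Float is an IEEE754 binary64 double.
# pack('G') / unpack1('G') = double precision, big-endian (network order).

def dump_ieee754(f)
  [f].pack('G')  # always 8 bytes, independent of host byte order
end

def load_ieee754(bytes)
  unless bytes.bytesize == 8
    raise ArgumentError, "expected 8 bytes, got #{bytes.bytesize}"
  end
  bytes.unpack1('G')
end

# The failing case reported in the message.
w, x, y, z = 108244015, 51362626, 9466898912, 96749751
a = (x.to_f + y.to_f / z.to_f) * Math.exp(w.to_f / (x.to_f + y.to_f / z.to_f))
b = load_ieee754(dump_ieee754(a))
puts(a == b ? "exact round trip" : "PROBLEM: a is #{a}, b is #{b}")
# prints "exact round trip"

# A smaller version of the random test from the message, over many magnitudes.
100_000.times do
  v = rand * 10.0**rand(-300..300)
  raise "lost bits for #{v}" unless load_ieee754(dump_ieee754(v)) == v
end
```

Using `'G'` (big-endian) rather than `'D'` (native byte order) keeps the 8 bytes identical across hosts with different endianness, which matters for a persistent format.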
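To make the format-flag trade-off in the message concrete, here is a hypothetical Ruby encoding with a flag in front of each Float: `I` marks raw IEEE754 bits and `D` marks a lossy decimal fallback. For simplicity it spends a whole byte rather than the 2 bits discussed. The flag letters, `dump_float`, `load_float`, `RoundoffError`, and the `strict:` option (standing in for the proposed Marshal.prevent_rounding()) are all invented for illustration; this is not Ruby's actual Marshal format.

```ruby
# Hypothetical flagged float encoding (NOT Ruby's real Marshal format):
#   "I" + 8 bytes : raw IEEE754 binary64, big-endian (exact)
#   "D" + text    : decimal with 15 significant digits (portable, may round)

class RoundoffError < StandardError; end

FLAG_IEEE754 = "I"
FLAG_DECIMAL = "D"

def dump_float(f, kind: FLAG_IEEE754)
  case kind
  when FLAG_IEEE754 then FLAG_IEEE754.b + [f].pack('G')
  when FLAG_DECIMAL then FLAG_DECIMAL.b + ("%.15g" % f)
  else raise ArgumentError, "unknown kind #{kind.inspect}"
  end
end

# strict: true plays the role of the hypothetical Marshal.prevent_rounding():
# refuse any encoding that is not known to reproduce the exact bits.
def load_float(str, strict: false)
  flag = str.byteslice(0, 1)
  payload = str.byteslice(1, str.bytesize - 1)
  case flag
  when FLAG_IEEE754
    payload.unpack1('G')
  when FLAG_DECIMAL
    raise RoundoffError, "decimal-encoded Float may have been rounded" if strict
    Float(payload)
  else
    raise ArgumentError, "unknown float format flag #{flag.inspect}"
  end
end

sum = 0.1 + 0.2
p load_float(dump_float(sum)) == sum                      # => true
p load_float(dump_float(sum, kind: FLAG_DECIMAL)) == sum  # => false
```

The cost is one extra byte per Float (the space argument for option 1); the benefit is that a reader can tell from the flag alone whether the value it just loaded is exact, and can refuse it in strict mode.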
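The 0x10000 multiplier mentioned in the message corresponds to peeling the mantissa off 16 bits at a time. The Ruby sketch below uses hypothetical `split_float` and `join_float` helpers built on `Math.frexp` and `Math.ldexp`, with `floor` and subtraction standing in for C's modf, and handles positive finite values only. It illustrates the technique and is not the code currently in CVS. On an IEEE754 double the decomposition is exact once enough chunks are kept: 0.1 has a 53-bit significand, so three 16-bit chunks (48 bits) drop its low bits, while four (64 bits) restore it exactly.

```ruby
# Sketch: split a Float's mantissa into 16-bit chunks and rebuild it.
# Assumes IEEE754 binary64; chunk width 16 bits (0x10000) is an assumption.

CHUNK = 0x10000 # 2**16

# Returns [exponent, chunks], each chunk an Integer in 0...CHUNK.
def split_float(f, nchunks)
  raise ArgumentError, "positive finite values only" unless f.finite? && f > 0
  m, e = Math.frexp(f) # f == m * 2**e with 0.5 <= m < 1
  chunks = []
  nchunks.times do
    m *= CHUNK        # exact: scaling by a power of two
    digit = m.floor   # integer part, what modf would split off
    chunks << digit
    m -= digit        # fractional part, also exact
  end
  [e, chunks]
end

# Rebuild from the last chunk backwards; each step is exact when the chunks
# came from split_float.
def join_float(e, chunks)
  m = 0.0
  chunks.reverse_each { |d| m = (m + d) / CHUNK }
  Math.ldexp(m, e)
end

p join_float(*split_float(0.1, 4)) == 0.1 # => true  (64 bits >= 53)
p join_float(*split_float(0.1, 3)) == 0.1 # => false (48 bits < 53)
```

The chunk count is exactly the width assumption criticized above: it is tied to the 53-bit significand of an 8-byte double, so a wider format would need more chunks, and without a flag a reader would not know how many to expect.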