Sorry for the late reply,

Igal Koshevoy wrote:
> If the RubyBuild or RubyChecker programs fail to run with only a
> specific version of Ruby, then that Ruby interpreter is probably insane.
>
> However, if they run all the way through, then we need to manually
> determine how sane the Ruby interpreter is by reviewing the reports
> that the checker produced by running the test suites.

My concern is about the sanity of that report itself.

For instance, back in 2007 Ruby had a bug in its strtod()
implementation that produced wrong double values for some inputs.
The bug was hardly visible to any test suite because String#to_f
and the Ruby parser both used the same strtod(), so any
assert_equal(0.foo, "0.foo".to_f) passed even though both values
were wrong.

% ruby -ve 'printf "%20.0f\n%20.0f\n", 36893488147419108000.0, "36893488147419108000.0".to_f'

# wrong
ruby 1.8.5 (2006-08-25) [x86_64-linux]
36893488147419103232
36893488147419103232

# correct (nearest)
ruby 1.8.6 (2008-07-06 patchlevel 265) [x86_64-linux]
36893488147419111424
36893488147419111424
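
To illustrate a check that does not go through the float parser on
both sides, here is a minimal sketch.  It assumes that Integer literals
and Integer arithmetic do not share code with strtod(), and that
Float#to_i is exact for a value of this size.  The variable names are
mine, and the case of an exact tie between two doubles is not handled
(it does not occur for this input).

```ruby
# The decimal value as an Integer, so no float parsing happens here.
exact = 36893488147419108000

# The value under test, produced by the float parser.
got = "36893488147419108000.0".to_f.to_i

# Doubles in [2**65, 2**66) are spaced 2**(65 - 52) = 8192 apart.
ulp   = 2**(65 - 52)
lower = (exact / ulp) * ulp   # nearest representable value at or below
upper = lower + ulp           # nearest representable value above

# Pick the closer neighbour using Integer arithmetic only.
nearest = (exact - lower) < (upper - exact) ? lower : upper

puts "expected #{nearest}"
puts "got      #{got}"
puts(got == nearest ? "ok" : "MISMATCH")
```

With the two interpreters above, this should report a mismatch on
1.8.5 and "ok" on 1.8.6, because the expected value no longer comes
from the code being tested.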

This kind of concern is not just my own paranoia.  The Bison people,
for instance, write their tests in shell scripts + m4 macros.
http://git.savannah.gnu.org/gitweb/?p=bison.git;a=tree;f=tests;hb=HEAD