Bug #3780: RDoc::Parser.binary? broken for some utf8 files longer than 1024 bytes
http://redmine.ruby-lang.org/issues/show/3780

Author: Stephen Bannasch
Status: Open, Priority: Normal
Category: core
ruby -v: ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-darwin10.4.0]

RDoc reads only the first 1024 bytes of a file when checking whether the file is binary. If that cut falls in the middle of a multi-byte utf8 character, the truncated string has an invalid encoding, and RDoc exits.
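
To illustrate the truncation problem on its own, without RDoc, here is a minimal sketch. The string contents and variable names are made up for the example; it only assumes that a character straddles the 1024-byte boundary.

```ruby
# 1023 ASCII bytes followed by "é", which is 2 bytes in UTF-8 (0xC3 0xA9),
# so the last character straddles the 1024-byte boundary.
text = ("a" * 1023) + "\u00e9"

# Slice by bytes (via a binary copy), then label the result UTF-8 again,
# mimicking a reader that stops after exactly 1024 bytes.
head = text.dup.force_encoding(Encoding::BINARY)[0, 1024]
head.force_encoding(Encoding::UTF_8)

full_valid = text.valid_encoding?   # the whole string
head_valid = head.valid_encoding?   # the truncated copy

puts "full: #{full_valid}, truncated: #{head_valid}"
```

This prints "full: true, truncated: false": the complete text is valid utf8, but the first 1024 bytes taken alone are not.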

I found this problem when running rdoc on the ruby 1.9.2 source.

  $ ruby -v
  ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-darwin10.4.0]
  $ rdoc --version
  rdoc 2.5.11

A fuller description of the bug and a patch with a failing test are in this issue on the RubyForge RDoc tracker:

http://rubyforge.org/tracker/index.php?func=detail&aid=28525&group_id=627&atid=2472

The same issue appears to be in the 1_9 source, see: http://github.com/ruby/ruby/blob/trunk/lib/rdoc/parser.rb#L70

It is not clear to me where RDoc issues should be filed, RubyForge or here, so I've created an issue in both places.

This gist: http://gist.github.com/561350 (possible_fix.rb) shows how I changed RDoc::Parser.binary? locally, but I don't think it is correct to classify as binary every utf8 file that becomes invalid when truncated at 1024 bytes.

That same gist (show_parsing_error.rb) also shows another strategy for solving the invalid encoding issue but there are probably better ways to determine if a file is binary.
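
For discussion, here is one possible direction, as a rough sketch only. It is not the code from the gist and not RDoc's implementation. The method name probably_binary? and its heuristics are hypothetical, and it assumes text files are utf8 (a Latin-1 text file would be misclassified). The idea is to read the sample as raw bytes and tolerate up to 3 dangling bytes at the end of a truncated sample, since a utf8 character is at most 4 bytes long.

```ruby
# Hypothetical helper, not part of RDoc: guess whether a file is binary
# from its first sample_size bytes, assuming text files are UTF-8.
def probably_binary?(path, sample_size = 1024)
  # Read raw bytes so a cut through a multi-byte character cannot raise.
  sample = File.open(path, "rb") { |io| io.read(sample_size) }
  return false if sample.nil? || sample.empty?   # empty file: treat as text

  # A NUL byte is a strong hint that the content is not text.
  return true if sample.include?("\x00")

  # Only a sample that filled the whole buffer may have been cut
  # mid-character; a UTF-8 character is at most 4 bytes, so at most
  # 3 trailing bytes can belong to an incomplete one.
  max_drop = sample.bytesize == sample_size ? 3 : 0

  0.upto(max_drop) do |drop|
    keep = sample.bytesize - drop
    break if keep < 0
    candidate = sample[0, keep].force_encoding(Encoding::UTF_8)
    return false if candidate.valid_encoding?    # valid UTF-8: text
  end

  true   # still invalid after trimming the tail: treat as binary
end
```

With this approach a utf8 file is not flagged as binary merely because the 1024-byte cut lands inside a character, while data that is invalid utf8 throughout the sample still is.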


----------------------------------------
http://redmine.ruby-lang.org