Issue #7135 has been updated by alexdowad (Alex Dowad).


> Does this happen with unmodified Prawn at all?
 
Good question. I haven't spent a lot of time repeatedly running the spec tests for "unmodified" Prawn. Generally when I run some tests, it's because I'm contributing a patch to the gem, and I want to make sure I haven't broken anything.

I can tell you, though, that a few weeks ago, when I was working on a completely unrelated patch to Prawn, I also started getting an intermittent "invalid byte sequence in UTF-8" error when I was testing against Ruby 1.9.2. When I was tracing the error, I found the same thing with String#codepoints returning inconsistent results. I discovered that freezing the String made the problem "go away", and did a little reading of the Ruby source (which revealed that String#codepoints seems to treat frozen strings specially). It never occurred to me at the time that the problem might have anything to do with the GC, and I didn't pursue it further until now.

> I'm not familiar with Prawn, but does any of its dependencies pull in
> extra C extension which may have memory corruption bugs?
 
No. The core team has committed to *never* using binary gems, only pure Ruby.

> Can you share your work-in-progress changes to Prawn?
 
Do you really want to spend a few hours or days of your life helping to track down an obscure memory bug? If so, I'll push the code I am working on to GitHub.

At this point I think I have already got as much information as I can from Ruby-land -- I need to crack Ruby open and drop down into C-land. Right now I'm mainly fishing for information and ideas which will help when I do that...
----------------------------------------
Bug #7135: GC bug in Ruby 1.9.3-p194?
https://bugs.ruby-lang.org/issues/7135#change-30214

Author: alexdowad (Alex Dowad)
Status: Feedback
Priority: Normal
Assignee: 
Category: 
Target version: 
ruby -v: ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-linux]


I'm just doing some refactoring/performance work on a popular Ruby gem called Prawn (it's used for PDF generation). I'm fighting with a strange, intermittent failure on the spec tests, and from my experimentation so far, it seems very, very likely to be a bug in Ruby's garbage collector.

I'll try to keep this as brief as possible, but please be patient...

The code where the intermittent failure comes from measures the width of a string when rendered using a TTF font. This is part of the method body (including my debug print statements):


          # GC.disable
          # string.freeze
          p string.bytes.to_a if $my_debug
          p string.codepoints.to_a if $my_debug
          p scale if $my_debug
          result = string.codepoints.inject(0) do |s,r|
            print r if $my_debug
            print "," if $my_debug
            s + character_width_by_code(r)
          end * scale
          puts if $my_debug
          result

When the tests pass normally (which is about 7/8 of the time), the debug print statements show:

    [104, 101, 108, 108, 111, 194, 173]
    [104, 101, 108, 108, 111, 173]
    0.012
    104,101,108,108,111,173,

...You can see that the "print" calls in the "string.codepoints.inject" loop print the same series of codepoints as "p string.codepoints.to_a". This is what you would expect, because nothing is modifying the string. But about 1/8 of the time I get:

    [104, 101, 108, 108, 111, 194, 173]
    [104, 101, 108, 108, 111, 173]
    0.012
    104,42,0,0,0,0,0,

I have also seen "104,0,0,0,0,0" on occasion. In all cases, "p string.codepoints.to_a" prints the correct sequence of codepoints for the string.
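
For reference, the byte and codepoint lists above are consistent with the string "hello" followed by U+00AD (soft hyphen), which UTF-8 encodes as the two bytes 194, 173. A minimal sketch, assuming that is the test input (the literal is inferred from the debug output, not copied from the Prawn specs):

```ruby
# Assumed test input: "hello" plus U+00AD SOFT HYPHEN (inferred from the bytes above).
SAMPLE = "hello\u00AD"

SAMPLE_BYTES      = SAMPLE.bytes.to_a
SAMPLE_CODEPOINTS = SAMPLE.each_codepoint.to_a

p SAMPLE_BYTES       # [104, 101, 108, 108, 111, 194, 173]
p SAMPLE_CODEPOINTS  # [104, 101, 108, 108, 111, 173]
```

Seven bytes decode to six codepoints, so a correct pass over this string should never yield 42 or 0.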

You might think that something in the "string.codepoints.inject" loop is modifying the string, but it's not. I could show the contents of "character_width_by_code", but it would just be wasting your time, because it basically contains nothing but a couple of hash lookups.

If I uncomment "string.freeze", I can run the tests 100 times or more with no failure. (This proves that the string is not being modified by my code, because it would throw an exception otherwise!) Or, if I change the code to "string.codepoints.to_a.inject", again, the failure never happens. Most revealingly, if I uncomment "GC.disable", I can run the test 100 or more times with no failure. As soon as I comment out "GC.disable", the random failure comes back, for about 1/7 - 1/10 of runs.
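
The three variants described above can be put side by side roughly as follows. This is only a sketch: `DEMO_WIDTHS` and this `character_width_by_code` are hypothetical stand-ins for Prawn's font metrics, and `each_codepoint` is used in place of `codepoints` so that the enumeration stays lazy on later Rubies, where `String#codepoints` returns an Array.

```ruby
# Hypothetical width table standing in for Prawn's TTF metrics.
DEMO_WIDTHS = { 104 => 556, 101 => 556, 108 => 222, 111 => 556, 173 => 333 }

# Stand-in for the real method: nothing but a hash lookup.
def character_width_by_code(code)
  DEMO_WIDTHS.fetch(code, 500)
end

# Variant 1: enumerate lazily over a mutable string (the form that fails intermittently).
def width_lazy(string, scale)
  string.each_codepoint.inject(0) { |sum, code| sum + character_width_by_code(code) } * scale
end

# Variant 2: freeze the string first, then enumerate lazily.
def width_frozen(string, scale)
  string.freeze
  width_lazy(string, scale)
end

# Variant 3: snapshot the codepoints into an Array before summing.
def width_snapshot(string, scale)
  string.each_codepoint.to_a.inject(0) { |sum, code| sum + character_width_by_code(code) } * scale
end

text = "hello\u00AD"
results = [
  width_lazy(text.dup, 0.012),
  width_frozen(text.dup, 0.012),
  width_snapshot(text.dup, 0.012)
]
puts(results.uniq.length == 1 ? "all variants agree" : "variants disagree: #{results.inspect}")
```

On a healthy interpreter all three should return the same value; the bug report is that variant 1 occasionally does not.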

Sometimes I also get another random failure from the same place: an "invalid byte sequence in UTF-8" exception from "string.codepoints.inject". Again, this proves something inside Ruby is corrupting the string, because the call to "string.codepoints" just 2 lines before prints the correct sequence, with no exception raised.

I'd like to boil this down to a smaller example which demonstrates the failure, but it's a hopeless task. When I take pieces of the code and run them in irb, the failure never happens. Even rebooting the computer may make it go away... but then, when I am working on Prawn again, sooner or later it happens again. (I know because similar intermittent failures have happened before.) Once it starts happening, though, it's pretty consistent at about 1/7 - 1/10 of test runs. Another clue is that the corrupted codepoints are *always* zero.
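
One way to attempt a standalone reproduction is to force garbage collection as often as possible while enumerating. The following is a sketch under assumptions, not a known reproducer: `stress_codepoints` is a hypothetical helper, and on a healthy interpreter it should report zero mismatches.

```ruby
# Hypothetical helper: enumerate codepoints lazily under GC.stress and count
# how many passes disagree with a snapshot taken before stress mode is enabled.
def stress_codepoints(string, iterations)
  expected = string.each_codepoint.to_a
  mismatches = 0
  previous = GC.stress
  GC.stress = true # ask the GC to run at every opportunity
  begin
    iterations.times do
      # Same shape as the failing code: Enumerator from the string, then inject.
      seen = string.each_codepoint.inject([]) { |acc, code| acc << code }
      mismatches += 1 unless seen == expected
    end
  ensure
    GC.stress = previous # always restore the previous setting
  end
  mismatches
end

count = stress_codepoints("hello\u00AD", 200)
puts "mismatches: #{count}"
```

GC.stress makes everything much slower, so the iteration count is kept small here; whether it triggers this particular failure is untested.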

I can try to track down the problem, perhaps by adding some logging code to the Ruby interpreter source and recompiling, but I need some guidance on where to look. Can anyone who is familiar with Ruby internals (especially Strings and the GC) give me some ideas how to start?


-- 
http://bugs.ruby-lang.org/