Hello,

Recently, I've been trying to use parts of libxml2 with Ruby/DL.  At
first everything looked fine, but when experimenting with larger
documents I ran into "random" crashes.

Please try the attached test.  If you run it unmodified on a large
XML file, you should get:

$ ruby pull.rb myfile.xml
...
22
23
(eval):5: [BUG] Segmentation fault
ruby 1.8.2 (2004-12-25) [i686-linux]

#0  0x401b2c8d in memmove () from /lib/libc.so.6
#1  0x402f16b5 in xmlBufferAdd () from /usr/lib/libxml2.so
#2  0x402f5890 in xmlParserInputBufferPush () from /usr/lib/libxml2.so
#3  0x402e671c in xmlParseChunk () from /usr/lib/libxml2.so
#4  0x40349bf6 in xmlUCSIsCat () from /usr/lib/libxml2.so
#5  0x4034a227 in xmlUCSIsCat () from /usr/lib/libxml2.so
#6  0x4034a8ef in xmlTextReaderExpand () from /usr/lib/libxml2.so
#7  0x4034a639 in xmlTextReaderRead () from /usr/lib/libxml2.so
#8  0x4029ae91 in rb_dlsym_guardcall (type=73, ret=0xbfffc25c, 
    stack=0xbfffc278, func=0x4034a24c) at sym.c:427
#9  0x4029bad5 in rb_dlsym_call (argc=1, argv=0xbfffc5c4, self=1076214100)
    at sym.c:731
#10 0x4003e1e3 in rb_call0 (klass=1076335220, recv=1076214100, id=5217, 
    oid=5217, argc=1, argv=0xbfffc5c4, body=0x402791c0, nosuper=0)
    at eval.c:5393
#11 0x4003ece0 in rb_call (klass=1076335220, recv=1076214100, mid=5217, 
    argc=1, argv=0xbfffc5c4, scope=0) at eval.c:5743
#12 0x40038e22 in rb_eval (self=1076259940, n=0x4025b468) at eval.c:3229
#13 0x40038206 in rb_eval (self=1076259940, n=0x4025b454) at eval.c:3008
#14 0x400399a4 in rb_eval (self=1076259940, n=0x4025b558) at eval.c:3393
#15 0x4003e81e in rb_call0 (klass=1076259860, recv=1076259940, id=10313, 
    oid=10313, argc=1, argv=0xbfffd9c4, body=0x4025b558, nosuper=0)
    at eval.c:5650
#16 0x4003ece0 in rb_call (klass=1076259860, recv=1076259940, mid=10313, 
    argc=1, argv=0xbfffd9c4, scope=0) at eval.c:5743
#17 0x40038e22 in rb_eval (self=1078001184, n=0x4027a1d8) at eval.c:3229
#18 0x40038c0e in rb_eval (self=1078001184, n=0x4027a188) at eval.c:3223
#19 0x40039ab2 in rb_eval (self=1078001184, n=0x4027a0c0) at eval.c:3419
#20 0x4003e81e in rb_call0 (klass=1076260780, recv=1078001184, id=10297, 
    oid=10297, argc=0, argv=0x0, body=0x4027a0c0, nosuper=0) at eval.c:5650
#21 0x4003ece0 in rb_call (klass=1076260780, recv=1078001184, mid=10297, 
    argc=0, argv=0x0, scope=0) at eval.c:5743
#22 0x40038e22 in rb_eval (self=1076406720, n=0x40279ddc) at eval.c:3229
#23 0x40037a47 in rb_eval (self=1076406720, n=0x4027b0ec) at eval.c:2876
#24 0x40034580 in eval_node (self=1076406720, node=0x4027b0ec) at eval.c:1296
#25 0x40034b4f in ruby_exec_internal () at eval.c:1473
#26 0x40034bc0 in ruby_exec () at eval.c:1493
#27 0x40034bf5 in ruby_run () at eval.c:1503
#28 0x0804866f in main (argc=3, argv=0xbffffb34, envp=0xbffffb44) at main.c:46

If you uncomment the line marked with (A), the program will loop
endlessly (or at least it takes far more iterations than usual).

If you uncomment the line marked with (B), the program will run
fine (21325 iterations, same as my C version).

If you replace line (C) with 'loop do; break unless reader.pull',
it will run fine too.
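For comparison, here are the two loop forms for line (C) written out in full.  This is only a sketch: StubReader, count_with_while and count_with_loop are hypothetical names, and StubReader merely imitates FastPull#pull without touching libxml2.  With it, both loops count the same number of nodes, which is what plain Ruby semantics would lead one to expect.

```ruby
# Hypothetical stand-in for FastPull (no libxml2 involved):
# reports `n` nodes, then signals the end of the document.
class StubReader
  def initialize(n)
    @remaining = n
  end

  # Mirrors FastPull#pull: true while a node was read, false at the end.
  def pull
    return false if @remaining.zero?
    @remaining -= 1
    true
  end
end

# The original form of line (C).
def count_with_while(reader)
  c = 0
  while reader.pull
    c += 1
  end
  c
end

# The replacement form of line (C).
def count_with_loop(reader)
  c = 0
  loop do
    break unless reader.pull
    c += 1
  end
  c
end

puts count_with_while(StubReader.new(89))  # prints 89
puts count_with_loop(StubReader.new(89))   # prints 89
```

Since the two forms behave identically on the stub, the different outcome with the real reader does not come from the loop logic itself.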

This only happens with big files, and the size matters:

File           Size (bytes)  Behavior
Convert.xml    335389        crash after 23 iterations
Type.xml       276346        crash after 21 iterations
String.xml     189177        crash after 15 iterations
Array.xml      170380        crash after 11 iterations
Decimal.xml    123392        endless loop (normally 7202 iterations)
Object.xml      44075        endless loop (normally 1755 iterations)
Activator.xml   17969        endless loop (normally 995 iterations)
UIntPtr.xml     15718        always 947 iterations
Guid.xml        11735        always 805 iterations
Void.xml         1374        always 89 iterations

Note that the behavior changes at the 128k and 16k boundaries; maybe
this is part of the reason.  I can provide these XML files on request.
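To make the size boundaries explicit, the following sketch copies the sizes and outcomes from the table above and checks each file against the two thresholds.  It assumes "16k" and "128k" mean 16 KiB (16384 bytes) and 128 KiB (131072 bytes); the names OBSERVED and band_outcome are hypothetical, chosen for illustration.

```ruby
# Observed behavior per file, copied from the table: [size in bytes, outcome].
# :crash = segfault, :endless = endless loop, :ok = always the same count.
OBSERVED = {
  "Convert.xml"   => [335_389, :crash],
  "Type.xml"      => [276_346, :crash],
  "String.xml"    => [189_177, :crash],
  "Array.xml"     => [170_380, :crash],
  "Decimal.xml"   => [123_392, :endless],
  "Object.xml"    => [44_075,  :endless],
  "Activator.xml" => [17_969,  :endless],
  "UIntPtr.xml"   => [15_718,  :ok],
  "Guid.xml"      => [11_735,  :ok],
  "Void.xml"      => [1_374,   :ok],
}

# Assumption: "16k" and "128k" refer to KiB.
KIB_16  = 16 * 1024   # 16384 bytes
KIB_128 = 128 * 1024  # 131072 bytes

# Outcome one would predict from the size band alone.
def band_outcome(bytes)
  if bytes > KIB_128
    :crash
  elsif bytes > KIB_16
    :endless
  else
    :ok
  end
end

# Collect the files whose observed outcome differs from their size band.
mismatches = OBSERVED.reject { |_, (size, outcome)| band_outcome(size) == outcome }
puts mismatches.empty? ? "all files match their size band" : mismatches.keys.inspect
```

For the files listed, every entry falls into the band matching its observed behavior, so the two thresholds separate the three kinds of outcome exactly.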

I'm totally puzzled and hope this can be fixed easily...

Christian Neukirchen
<chneukirchen / gmail.com>


# pull.rb:

require 'dl'
require 'dl/import'

class FastPull
  module XMLTextReader
    extend DL::Importable
    
    begin
      dlload "libxml2.dylib"
    rescue
      dlload "libxml2.so"
    end

    # Constructor, destructor
    extern "void *xmlReaderForMemory(const char*,int,const char*,const char*,int)"
    extern "void xmlFreeTextReader(void*)"

    # Attributes of the Node
    extern "int xmlTextReaderAttributeCount (void*)"
    extern "int xmlTextReaderDepth (void*)"
    extern "int xmlTextReaderHasAttributes (void*)"
    extern "int xmlTextReaderHasValue(void*)"
    extern "int xmlTextReaderIsDefault (void*)"
    extern "int xmlTextReaderIsEmptyElement (void*)"
    extern "int xmlTextReaderNodeType (void*)"
    extern "int xmlTextReaderQuoteChar (void*)"
    extern "int xmlTextReaderReadState (void*)"
    extern " char *xmlTextReaderBaseUri (void*)"
    extern " char *xmlTextReaderLocalName (void*)"
    # extern " char *xmlTextReaderName (void*)"           # (A)
        
    # Iterator
    extern "int xmlTextReaderRead(void*)"
  end
  
  def initialize(source)
    @reader = XMLTextReader.xmlReaderForMemory(source, source.size, "url", "", 0)
  end

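  # Advances to the next node.  xmlTextReaderRead returns 1 when a node
  # was read, 0 at the end of the document and -1 on error; only 0 sets
  # @finished here.
  def pull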
    unless @finished
      @finished = (XMLTextReader.xmlTextReaderRead(@reader) == 0)
    end

    not @finished
  end
end


reader = FastPull.new File.read(ARGV[0])

begin
  c = 0
  while reader.pull             # (C)
    # GC.start                  # (B)
    p c+=1
  end
end