Hello,
recently I've been trying to use parts of libxml2 with Ruby/DL. At
first everything looked fine, but when experimenting with larger
documents I ran into "random" crashes.
Please try the attached test. If you run it unmodified on a
bigger XML file, you should get:
$ ruby pull.rb myfile.xml
...
22
23
(eval):5: [BUG] Segmentation fault
ruby 1.8.2 (2004-12-25) [i686-linux]
#0 0x401b2c8d in memmove () from /lib/libc.so.6
#1 0x402f16b5 in xmlBufferAdd () from /usr/lib/libxml2.so
#2 0x402f5890 in xmlParserInputBufferPush () from /usr/lib/libxml2.so
#3 0x402e671c in xmlParseChunk () from /usr/lib/libxml2.so
#4 0x40349bf6 in xmlUCSIsCat () from /usr/lib/libxml2.so
#5 0x4034a227 in xmlUCSIsCat () from /usr/lib/libxml2.so
#6 0x4034a8ef in xmlTextReaderExpand () from /usr/lib/libxml2.so
#7 0x4034a639 in xmlTextReaderRead () from /usr/lib/libxml2.so
#8 0x4029ae91 in rb_dlsym_guardcall (type=73, ret=0xbfffc25c,
stack=0xbfffc278, func=0x4034a24c) at sym.c:427
#9 0x4029bad5 in rb_dlsym_call (argc=1, argv=0xbfffc5c4, self=1076214100)
at sym.c:731
#10 0x4003e1e3 in rb_call0 (klass=1076335220, recv=1076214100, id=5217,
oid=5217, argc=1, argv=0xbfffc5c4, body=0x402791c0, nosuper=0)
at eval.c:5393
#11 0x4003ece0 in rb_call (klass=1076335220, recv=1076214100, mid=5217,
argc=1, argv=0xbfffc5c4, scope=0) at eval.c:5743
#12 0x40038e22 in rb_eval (self=1076259940, n=0x4025b468) at eval.c:3229
#13 0x40038206 in rb_eval (self=1076259940, n=0x4025b454) at eval.c:3008
#14 0x400399a4 in rb_eval (self=1076259940, n=0x4025b558) at eval.c:3393
#15 0x4003e81e in rb_call0 (klass=1076259860, recv=1076259940, id=10313,
oid=10313, argc=1, argv=0xbfffd9c4, body=0x4025b558, nosuper=0)
at eval.c:5650
#16 0x4003ece0 in rb_call (klass=1076259860, recv=1076259940, mid=10313,
argc=1, argv=0xbfffd9c4, scope=0) at eval.c:5743
#17 0x40038e22 in rb_eval (self=1078001184, n=0x4027a1d8) at eval.c:3229
#18 0x40038c0e in rb_eval (self=1078001184, n=0x4027a188) at eval.c:3223
#19 0x40039ab2 in rb_eval (self=1078001184, n=0x4027a0c0) at eval.c:3419
#20 0x4003e81e in rb_call0 (klass=1076260780, recv=1078001184, id=10297,
oid=10297, argc=0, argv=0x0, body=0x4027a0c0, nosuper=0) at eval.c:5650
#21 0x4003ece0 in rb_call (klass=1076260780, recv=1078001184, mid=10297,
argc=0, argv=0x0, scope=0) at eval.c:5743
#22 0x40038e22 in rb_eval (self=1076406720, n=0x40279ddc) at eval.c:3229
#23 0x40037a47 in rb_eval (self=1076406720, n=0x4027b0ec) at eval.c:2876
#24 0x40034580 in eval_node (self=1076406720, node=0x4027b0ec) at eval.c:1296
#25 0x40034b4f in ruby_exec_internal () at eval.c:1473
#26 0x40034bc0 in ruby_exec () at eval.c:1493
#27 0x40034bf5 in ruby_run () at eval.c:1503
#28 0x0804866f in main (argc=3, argv=0xbffffb34, envp=0xbffffb44) at main.c:46
If you uncomment the line marked with (A), the program will loop
endlessly (or at least take far more iterations than usual).
If you uncomment the line marked with (B), the program will run
fine (21325 iterations, same as my C version).
If you replace line (C) with 'loop do break unless reader.pull',
it will run fine too.
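For reference, the two loop forms from the (C) variant should be interchangeable in plain Ruby. The sketch below checks this with a hypothetical StubReader (my stand-in, not part of pull.rb and not using DL or libxml2) whose pull mimics FastPull#pull. If they behave the same here, the difference seen above presumably comes from somewhere other than the loop construct itself.

```ruby
# Hypothetical stand-in for FastPull: pull returns true a fixed
# number of times, then false forever (no DL, no libxml2 involved).
class StubReader
  def initialize(nodes)
    @remaining = nodes
    @finished = false
  end

  def pull
    unless @finished
      @finished = (@remaining == 0)
      @remaining -= 1 unless @finished
    end
    !@finished
  end
end

# Loop as written on line (C).
def count_with_while(reader)
  c = 0
  while reader.pull
    c += 1
  end
  c
end

# Replacement suggested for line (C).
def count_with_loop(reader)
  c = 0
  loop do
    break unless reader.pull
    c += 1
  end
  c
end

puts count_with_while(StubReader.new(5))
puts count_with_loop(StubReader.new(5))
```

Both methods count the same number of successful pulls for the stub.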
This only happens with big files, and the size matters:
File            Size (bytes)   Behavior
Convert.xml     335389         23 iterations before crash
Type.xml        276346         21 iterations before crash
String.xml      189177         15 iterations before crash
Array.xml       170380         11 iterations before crash
Decimal.xml     123392         endless loop (7202 usually)
Object.xml       44075         endless loop (1755 usually)
Activator.xml    17969         endless loop (995 usually)
UIntPtr.xml      15718         947 iterations, always
Guid.xml         11735         805 iterations, always
Void.xml          1374         89 iterations, always
Note that the behavior changes at the 128k and 16k boundaries; maybe
this is part of the reason. I can provide these XML files on request.
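To make the boundary observation concrete, here is a small sketch that buckets the sizes listed above. It assumes "128k" and "16k" mean 128 * 1024 and 16 * 1024 bytes, and that a file exactly on a boundary falls in the lower bucket. Both are my assumptions; none of the measured files sits close enough to a boundary to tell.

```ruby
# File sizes (bytes) and observed behavior, copied from the list above.
OBSERVATIONS = {
  "Convert.xml"   => [335389, :crash],
  "Type.xml"      => [276346, :crash],
  "String.xml"    => [189177, :crash],
  "Array.xml"     => [170380, :crash],
  "Decimal.xml"   => [123392, :endless_loop],
  "Object.xml"    => [44075,  :endless_loop],
  "Activator.xml" => [17969,  :endless_loop],
  "UIntPtr.xml"   => [15718,  :ok],
  "Guid.xml"      => [11735,  :ok],
  "Void.xml"      => [1374,   :ok]
}

# Assumed boundaries: 128k and 16k read as powers of two.
THRESHOLD_128K = 128 * 1024
THRESHOLD_16K  = 16 * 1024

# Predict the behavior purely from the file size.
def expected_behavior(size)
  if size > THRESHOLD_128K
    :crash
  elsif size > THRESHOLD_16K
    :endless_loop
  else
    :ok
  end
end

OBSERVATIONS.each do |name, (size, observed)|
  predicted = expected_behavior(size)
  verdict = predicted == observed ? "match" : "MISMATCH"
  puts format("%-14s %7d  observed=%-13s predicted=%-13s %s",
              name, size, observed, predicted, verdict)
end
```

With these assumed thresholds, every file in the list lands in the bucket that matches its observed behavior.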
I'm totally puzzled and hope this can be fixed easily...
Christian Neukirchen
<chneukirchen / gmail.com>
# pull.rb:
require 'dl'
require 'dl/import'

class FastPull
  module XMLTextReader
    extend DL::Importable

    # Try the Mac OS X library name first, then fall back to Linux.
    begin
      dlload "libxml2.dylib"
    rescue
      dlload "libxml2.so"
    end

    # Constructor, Destructor
    extern "void *xmlReaderForMemory(const char*,int,const char*,const char*,int)"
    extern "void xmlFreeTextReader(void*)"

    # Attributes of the Node
    extern "int xmlTextReaderAttributeCount (void*)"
    extern "int xmlTextReaderDepth (void*)"
    extern "int xmlTextReaderHasAttributes (void*)"
    extern "int xmlTextReaderHasValue(void*)"
    extern "int xmlTextReaderIsDefault (void*)"
    extern "int xmlTextReaderIsEmptyElement (void*)"
    extern "int xmlTextReaderNodeType (void*)"
    extern "int xmlTextReaderQuoteChar (void*)"
    extern "int xmlTextReaderReadState (void*)"
    extern " char *xmlTextReaderBaseUri (void*)"
    extern " char *xmlTextReaderLocalName (void*)"
    # extern " char *xmlTextReaderName (void*)" # (A)

    # Iterator
    extern "int xmlTextReaderRead(void*)"
  end

  def initialize(source)
    @reader = XMLTextReader.xmlReaderForMemory(source, source.size, "url", "", 0)
  end

  # Advance to the next node; returns false once the reader is exhausted.
  def pull
    unless @finished
      @finished = (XMLTextReader.xmlTextReaderRead(@reader) == 0)
    end
    not @finished
  end
end

reader = FastPull.new File.read(ARGV[0])
begin
  c = 0
  while reader.pull # (C)
    # GC.start # (B)
    p c+=1
  end
end