Greetings!

There is definitely a bug in Syck's emitter code in the current 1.8.1
branch (and in the release too). It is easily reproducible on GNU/Linux
systems on IA-32 when generating meta-information for 'ri' by running
'rdoc' against the Ruby sources:

$ cd ~/cvs/ruby-1.8
$ gdb ruby
(gdb) run /usr/bin/rdoc --ri
 ... lots of output ...
Generating RI...

Program received signal SIGSEGV, Segmentation fault.
0x002d6d5c in memcpy () from /lib/libc.so.6
(gdb) bt full
#0  0x002d6d5c in memcpy () from /lib/libc.so.6
No symbol table info available.
#1  0x11d6e260 in ?? ()
No symbol table info available.
#2  0x01d8f12b in syck_emitter_simple (e=0x11d6e260, 
    str=0x12110d58 "\"[     [\\\"KeywordSearchRequest\\\",
    \\\"keywordSearchRequest\\\", [       [\\\"in\\\",
    \\\"KeywordSearchRequest\\\",        [::SOAP::SOAPStruct,
    \\\"http://soap.amazon.com\\\", \\\"KeywordRequest\\\"]],
    [\\\"retval\\\", "..., 
    len=9362) at emitter.c:317
No locals.
#3  0x01d9669f in syck_emitter_simple_write (self=1091645092,
	str=1091599272) at rubyext.c:1301
        emitter = (SyckEmitter *) 0x11d6e260
#4  0x0016fa6e in call_cfunc (func=0x1d96652
	<syck_emitter_simple_write>, recv=1091645092, len=1,
	argc=1, argv=0xbffeaac8)
    at eval.c:4938
No locals.

More stack frames are available, but they are in the Ruby code itself and
look fine.

Inspecting the emitter (SyckEmitter *)e at 0x11d6e260 shows that bufpos
(4423) is beyond bufsize (4096), and that marker itself points 0x1492
(5266) bytes past the start of buffer, well outside it:

(gdb) print *((struct _syck_emitter *)0x11d6e260)
$39 = {
  headless = 0, 
  seq_map = 0, 
  use_header = 0, 
  use_version = 0, 
  sort_keys = 0, 
  anchor_format = 0x0, 
  explicit_typing = 0, 
  best_width = 80, 
  block_style = block_arbitrary, 
  stage = doc_processing, 
  level = 3, 
  indent = 2, 
  ignore_id = 4, 
  markers = 0x11d6e310, 
  anchors = 0x0, 
  bufsize = 4096, 
  buffer = 0x12437588 "\"[
	  [\\\"KeywordSearchRequest\\\",
	  \\\"keywordSearchRequest\\\", [
          [\\\"in\\\",
	  \\\"KeywordSearchRequest\\\",
	  [::SOAP::SOAPStruct,
	  \\\"http://soap.amazon.com\\\",
	  \\\"KeywordRequest\\\"]],
	  [\\\"retval\\\", "..., 
  marker = 0x12438a1a
          "ap.amazon.com\\\",
          \\\"http://soap.amazon.com\\\"],
          [\\\"DirectorSearchRequest\\\",
          \\\"directorSearchRequest\\\", [
          [\\\"in\\\",
          \\\"DirectorSearchRequest\\\",
          [::SOAP::SOAPStruct,
          \\\"http://soap.amazon.co"..., 
  bufpos = 4423, 
  handler = 0x1d96396	<rb_syck_output_handler>, 
  bonus = 0x41112f30
}

The GDB output has been slightly reformatted to fit into the mail.
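
The numbers in the dump can be cross-checked with a little pointer
arithmetic. The following is only a sketch: marker_offset() and
marker_in_bounds() are hypothetical helpers written for this mail, not
part of Syck, and the addresses are passed as plain integers so that the
values printed by GDB can be plugged in directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not in Syck): distance of marker from the start
 * of the buffer, in bytes. The caller must pass marker >= buffer. */
static size_t marker_offset(uintptr_t buffer, uintptr_t marker)
{
    assert(marker >= buffer);
    return (size_t)(marker - buffer);
}

/* Hypothetical helper (not in Syck): returns 1 if marker lies within
 * [buffer, buffer + bufsize], 0 otherwise. */
static int marker_in_bounds(uintptr_t buffer, uintptr_t marker,
                            size_t bufsize)
{
    return marker >= buffer && (size_t)(marker - buffer) <= bufsize;
}
```

With buffer = 0x12437588, marker = 0x12438a1a and bufsize = 4096 taken
from the dump, the offset is 5266 bytes and the bounds check fails.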

Syck's emitter.c code looks dangerous to me in these places:
syck_emitter_write() has no protection against the 'rest' variable
becoming negative (which happened in this case because of the buffer
overrun), and syck_emitter_flush()/syck_emitter_start_obj() manipulate
e->marker in ways that can easily lead to a buffer overrun.
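
To illustrate the kind of protection I mean, here is a minimal sketch of
a bounds-checked buffered write. It is not Syck's code and not a patch:
struct toy_emitter, toy_flush() and toy_write() are made-up names, the
buffer sizes are arbitrary, and the 'out' array merely stands in for the
output handler. The point is that 'rest' is computed as an unsigned value
only after the fill level has been validated, so it can never go
negative, and every memcpy() is limited to the space that is really left.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BUFSIZE 16   /* arbitrary, small to force several flushes */
#define TOY_OUTSIZE 256  /* arbitrary size of the stand-in output sink */

/* Hypothetical, simplified emitter state (not Syck's struct). */
struct toy_emitter {
    char   buffer[TOY_BUFSIZE];
    size_t used;              /* bytes in buffer, i.e. marker - buffer */
    char   out[TOY_OUTSIZE];  /* stands in for the output handler */
    size_t out_len;
};

/* Hand the buffered bytes to the sink; returns 0 on success and -1 if
 * the state is corrupted or the sink cannot take the data. */
static int toy_flush(struct toy_emitter *e)
{
    if (e->used > TOY_BUFSIZE || e->out_len > TOY_OUTSIZE ||
        e->used > TOY_OUTSIZE - e->out_len)
        return -1;
    memcpy(e->out + e->out_len, e->buffer, e->used);
    e->out_len += e->used;
    e->used = 0;
    return 0;
}

/* Append len bytes of str, flushing whenever the buffer is full.
 * Returns 0 on success, -1 on corrupted state or a failed flush. */
static int toy_write(struct toy_emitter *e, const char *str, size_t len)
{
    if (e->used > TOY_BUFSIZE)      /* refuse to work on a bad state */
        return -1;
    while (len > 0) {
        size_t rest = TOY_BUFSIZE - e->used;  /* cannot underflow here */
        size_t n;
        if (rest == 0) {
            if (toy_flush(e) != 0)
                return -1;
            continue;
        }
        n = len < rest ? len : rest;          /* copy only what fits */
        memcpy(e->buffer + e->used, str, n);
        e->used += n;
        str += n;
        len -= n;
    }
    assert(e->used <= TOY_BUFSIZE);
    return 0;
}
```

A string longer than the buffer is split across several flushes, and a
state in which the fill level already exceeds the buffer size is rejected
instead of being passed on to memcpy().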

Unfortunately, I have little time to explore those code paths further
before February. I saw another report of the same bug on ruby-talk@
yesterday.
-- 
/ Alexander Bokovoy
Samba Team                      http://www.samba.org/
ALT Linux Team                  http://www.altlinux.org/
Midgard Project Ry              http://www.midgard-project.org/