Bugs item #8548, was opened at 2007-02-09 20:23
You can respond by visiting: 
http://rubyforge.org/tracker/?func=detail&atid=1698&aid=8548&group_id=426

Category: Standard Library
Group: None
Status: Open
Resolution: None
Priority: 3
Submitted By: Riley Lynch (rlynch)
Assigned to: Nobody (None)
Summary: YAML::Omap loses entries, replacing them with random references

Initial Comment:
YAML::Omap#to_yaml may drop arbitrary entries, emitting references (anchors) to other, unrelated entries in their place. The problem also affects YAML::Pairs.

To reproduce:

  require 'yaml'

  N = 1000
  omap = YAML::Omap.new
  N.times { |i| omap["key_#{i}"] = { "value" => i } }
  # No two entries share a value, so any idNNN anchor in the output is spurious.
  puts((omap.to_yaml =~ /id\d+/) ? "bug" : "no bug")

The bug appears only once N exceeds a threshold, and that threshold is itself unstable. On my platform (ruby 1.8.4 (2005-12-24) [i686-darwin8.7.1]) it occurs when N > 194, but adding another line of code changes that figure.

Cause: The YAML emitter identifies each object by its object_id, and therefore assumes that the object being yamlized stays alive for the whole of the yamlization. YAML::Omap#to_yaml breaks that assumption by constructing a temporary hash from each of its associative array members:

  def to_yaml( opts = {} )
      YAML::quick_emit( self.object_id, opts ) do |out|
          out.seq( taguri, to_yaml_style ) do |seq|
              self.each do |v|
                  seq.add( Hash[ *v ] )
              end
          end
      end
  end

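The temporary hashes can be garbage-collected while a large Omap is still being yamlized, and their object_ids recycled. A later temporary hash that receives a previously used object_id then looks to the emitter like an object it has already seen, so it is emitted as a reference instead of as a new entry.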
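
To make the failure mode concrete, here is a toy model of an emitter that tracks nodes by id alone. It is only a sketch: toy_emit is a hypothetical helper, not part of Syck, and the recycled id is simulated by passing the same number twice rather than by relying on the garbage collector.

```ruby
# Toy model only: toy_emit is hypothetical and does not reproduce Syck's
# real output format. It shows why id-only tracking confuses a fresh
# object with a dead one that happened to have the same id.
def toy_emit(nodes)
  seen = {}  # id => anchor name assigned on first sight
  nodes.map do |id, text|
    if seen.key?(id)
      "*#{seen[id]}"                          # id seen before: emit a reference
    else
      seen[id] = format("id%03d", seen.size + 1)
      text                                    # first sight: emit the content
    end
  end
end

# The third node is a new object that was handed the recycled id 100.
puts toy_emit([
  [100, "key_0: {value: 0}"],
  [101, "key_1: {value: 1}"],
  [100, "key_2: {value: 2}"]
])
# prints:
#   key_0: {value: 0}
#   key_1: {value: 1}
#   *id001
```

The content of key_2 is lost in the same way the Omap entries are: the emitter has no way to tell that id 100 now names a different object.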

Workaround: I'm using the following for now. It keeps every temporary hash referenced until #to_yaml returns, but it will fail when yamlizing a structure with multiple Omaps, etc.

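  class YAML::Omap
      def to_yaml( opts = {} )
          # Hold a reference to each temporary hash so that it cannot be
          # collected (and its object_id reused) before this method returns.
          hash_cache = []
          YAML::quick_emit( self.object_id, opts ) do |out|
              out.seq( taguri, to_yaml_style ) do |seq|
                  self.each do |v|
                      hash_cache << (tmp_hash = Hash[ *v ])
                      seq.add( tmp_hash )
                  end
              end
          end
      end
  end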

A better solution might be to build the Omap associative array out of objects that have a #to_yaml which constructs a YAML::Syck::Map like ::Hash#to_yaml.

Bug #3968 suffers from the same root cause (the emitter assumes persistent object_ids), but #3968 is based on a user-defined #to_yaml, whereas this is a bug in library code. YAML's assumption about the persistence of object_ids, though undocumented, is not necessarily a bug -- in any case it should only affect temporary objects allocated within #to_yaml.

