Issue #15968 has been reported by alipman (Aaron Lipman).

----------------------------------------
Bug #15968: Custom marshal_load methods allow object instance variables to "leak" into other objects
https://bugs.ruby-lang.org/issues/15968

* Author: alipman (Aaron Lipman)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: 
* Backport: 2.4: UNKNOWN, 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
While working on a Rails app, I noticed some odd behavior where after marshalling and demarshalling an array of ActiveRecord objects, some elements were replaced with symbols and empty hashes ([original Rails bug report](https://github.com/rails/rails/issues/36522)).

It appears that some of Rails' custom marshallization methods allow an object's previously unset instance variables to be set during marshallization. However, since these instance variables weren't counted when marshallization of the object began, they overflow into subsequent array elements upon demarshallization.
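To make the "counted at the start" point concrete, here is a minimal sketch that reads the instance variable count Marshal writes ahead of a plain object's instance variables. The helpers `decode_small_int` and `declared_ivar_count` and the classes `OneIvar` and `TwoIvars` are hypothetical names used only for illustration, and the byte layout (two version bytes, `o`, the class name as a symbol, then the count) is my reading of the 4.8 format rather than anything authoritative.

```ruby
# Hypothetical helper (not part of Ruby): decode a one-byte Marshal integer.
# Marshal stores 0 as 0 and small positive integers 1..122 as (value + 5).
def decode_small_int(byte)
  return 0 if byte.zero?
  raise ArgumentError, "multi-byte or negative integer" unless (6..127).cover?(byte)
  byte - 5
end

# Hypothetical helper: read the ivar count declared for a dumped plain object.
# Assumed layout: version (2 bytes), 'o', ':', name length, name, ivar count.
def declared_ivar_count(dump)
  bytes = dump.bytes
  unless bytes[2] == 'o'.ord && bytes[3] == ':'.ord
    raise ArgumentError, "not a plain object dump"
  end
  name_len = decode_small_int(bytes[4])
  decode_small_int(bytes[5 + name_len])
end

class OneIvar
  def initialize
    @a = 1
  end
end

class TwoIvars
  def initialize
    @a = 1
    @b = 2
  end
end

one = declared_ivar_count(Marshal.dump(OneIvar.new))
two = declared_ivar_count(Marshal.dump(TwoIvars.new))
puts "OneIvar declares #{one} ivar(s), TwoIvars declares #{two}"
```

The count is written before any instance variable is written, so an instance variable that first appears while the others are being dumped is not reflected in it.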

Here is a test case (written in plain Ruby) demonstrating this behavior:

```
require 'test/unit'

class Foo
  attr_accessor :bar, :baz

  def initialize
    self.bar = Bar.new(self)
  end
end

class Bar
  attr_accessor :foo

  def initialize(foo)
    self.foo = foo
  end

  def marshal_dump
    # Side effect: sets a new ivar on foo while foo itself is being dumped.
    self.foo.baz = :problem
    {foo: self.foo}
  end

  def marshal_load(data)
    self.foo = data[:foo]
  end
end

class BugTest < Test::Unit::TestCase
  def test_marshalization
    foo = Foo.new
    array = [foo, nil]
    marshalled_array = Marshal.dump(array)
    demarshalled_array = Marshal.load(marshalled_array)

    assert_nil demarshalled_array[1]
  end
end
```

I'm not positive this qualifies as a bug - if a programmer writes custom marshal_dump and marshal_load methods, perhaps it's their responsibility to avoid unintended side-effects like those demonstrated in my test case.
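If the responsibility does lie with the programmer, one possible workaround is to make sure the instance variable already exists before dumping starts, so that the side effect changes a value rather than adding a variable. The sketch below is only an assumption about what such a workaround could look like; `SafeFoo` and `SafeBar` are hypothetical variants of the classes in my test case, and I haven't verified this against Rails itself.

```ruby
class SafeFoo
  attr_accessor :bar, :baz

  def initialize
    # Define @baz up front so it is included in the ivar count
    # before any marshal_dump side effect can touch it.
    self.baz = nil
    self.bar = SafeBar.new(self)
  end
end

class SafeBar
  attr_accessor :foo

  def initialize(foo)
    self.foo = foo
  end

  def marshal_dump
    # Same side effect as in the test case, but @baz already exists on foo.
    self.foo.baz = :problem
    { foo: self.foo }
  end

  def marshal_load(data)
    self.foo = data[:foo]
  end
end

restored = Marshal.load(Marshal.dump([SafeFoo.new, nil]))
puts restored[1].inspect
```

Removing the side effect from `marshal_dump` altogether would of course be the simpler fix where that is possible.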

However, I think this issue might be altogether avoided by adding a reserved delimiter character to Ruby's core marshallization functionality (in marshal.c) representing the "end" of a serialized object. For instance, in the above test case, `marshalled_array` comes out to:

```
\x04\b[\ao:\bFoo\x06:\t@barU:\bBar{\x06:\bfoo@\x06:\t@baz:\fproblem0
```

Suppose Ruby used a `z` character to represent the end of a serialized object - in this case, `marshalled_array` would come out to something like:

```
\x04\b[\ao:\bFoo\x06:\t@barU:\bBar{\x06:\bfoo@\x06:\t@baz:\fproblemz0
```

(Note the second-to-last character - `z`.)

This way, when demarshalling an object, even if additional instance variables had somehow snuck in during the marshallization process, the `z` character would mark the end of the serialized object, ensuring that the extra instance variables don't overflow into the next segment of serialized data.
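To illustrate the idea outside of marshal.c, here is a toy sketch comparing a count-prefixed encoding with an end-marker encoding. This is not Marshal's actual format: the token arrays, the `END_MARK` sentinel, and all four helper methods are hypothetical, and `late_pairs` simply models instance variables that show up after the count has already been written.

```ruby
END_MARK = :__end__ # hypothetical sentinel standing in for the `z` character

# Count-prefixed writer: the count is fixed before the pairs are emitted.
def write_counted(pairs, late_pairs = {})
  tokens = [pairs.size]
  pairs.merge(late_pairs).each { |key, value| tokens << key << value }
  tokens
end

# Count-prefixed reader: trusts the declared count and consumes that many pairs.
def read_counted(tokens)
  count = tokens.shift
  Array.new(count) { [tokens.shift, tokens.shift] }.to_h
end

# End-marker writer: emits every pair, then the sentinel.
def write_terminated(pairs, late_pairs = {})
  tokens = []
  pairs.merge(late_pairs).each { |key, value| tokens << key << value }
  tokens << END_MARK
end

# End-marker reader: consumes pairs until it sees the sentinel.
def read_terminated(tokens)
  result = {}
  loop do
    raise ArgumentError, "missing end marker" if tokens.empty?
    key = tokens.shift
    return result if key == END_MARK
    result[key] = tokens.shift
  end
end

# Each stream holds one object followed by a nil, like [foo, nil] in the test case.
counted = write_counted({ bar: 1 }, { baz: :problem }) + [nil]
counted_object = read_counted(counted)

terminated = write_terminated({ bar: 1 }, { baz: :problem }) + [nil]
terminated_object = read_terminated(terminated)

puts "left after counted read:    #{counted.inspect}"
puts "left after terminated read: #{terminated.inspect}"
```

In the count-prefixed stream the late pair is left behind for whatever gets read next, while the end-marker reader consumes it as part of the object. A real format would also need to guarantee that the end marker can't be confused with ordinary data, which this toy version does not attempt.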

I don't write much C, and I haven't fully grokked Ruby's marshal.c - so there may be dozens of reasons why this won't work. But I think a serialization strategy along those lines may help avoid unexpected behavior.



-- 
https://bugs.ruby-lang.org/
