Issue #17685 has been updated by dsisnero (Dominic Sisneros).


On the consumer side, we can marshal those objects the usual way; unmarshalling then gives us a copy of the original object:

b = ZeroCopyByteArray.new("abc".bytes)
data = Marshal.dump(b)
new_b = Marshal.load(data)
puts b == new_b  # true
puts b.equal? new_b  # false: a copy was made

But if we pass a buffer_callback when dumping and then hand the accumulated buffers back when loading, we get the original object back:

b = ZeroCopyByteArray.new("abc".bytes)
buffers = []
data = Marshal.dump(b, buffer_callback: buffers.method(:append))
new_b = Marshal.load(data, buffer: buffers)
puts b == new_b  # true
puts b.equal? new_b  # true: no copy was made
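
The behaviour described above can be approximated in today's Ruby without changing Marshal itself. The following is only a rough sketch under stated assumptions: `OutOfBand`, `BufferRef`, `export`, `resolve` and `ZeroCopyBytes` are hypothetical names invented for illustration, and the `buffer_callback:` / `buffers:` keywords belong to this wrapper, not to `Marshal`. Unlike the proposal, it shares the payload object rather than returning the very same wrapper object.

```ruby
# Hypothetical emulation of out-of-band buffers on top of the existing Marshal.
# None of these names are part of Ruby; module-level state is not thread-safe.
module OutOfBand
  # Placeholder written into the Marshal stream instead of the payload bytes.
  BufferRef = Struct.new(:index)

  @callback = nil
  @buffers = nil
  @count = 0

  def self.dump(obj, buffer_callback: nil)
    @callback = buffer_callback
    @count = 0
    Marshal.dump(obj)
  ensure
    @callback = nil
  end

  def self.load(data, buffers: nil)
    @buffers = buffers
    Marshal.load(data)
  ensure
    @buffers = nil
  end

  # Called from marshal_dump: pass the payload to the callback if there is one,
  # otherwise return it so that it is serialized in band (and thus copied).
  def self.export(payload)
    return payload unless @callback
    @callback.call(payload)
    ref = BufferRef.new(@count)
    @count += 1
    ref
  end

  # Called from marshal_load: turn a placeholder back into the original payload.
  def self.resolve(dumped)
    dumped.is_a?(BufferRef) ? @buffers.fetch(dumped.index) : dumped
  end
end

class ZeroCopyBytes
  attr_reader :payload

  def initialize(payload)
    @payload = payload
  end

  def ==(other)
    other.is_a?(ZeroCopyBytes) && payload == other.payload
  end

  def marshal_dump
    OutOfBand.export(@payload)
  end

  def marshal_load(dumped)
    @payload = OutOfBand.resolve(dumped)
  end
end

b = ZeroCopyBytes.new("abc".b)

# In band: plain Marshal round trip, the payload is copied.
copy = Marshal.load(Marshal.dump(b))
puts copy == b                        # true
puts copy.payload.equal?(b.payload)   # false

# Out of band: the stream only carries a reference, the payload is shared.
buffers = []
data = OutOfBand.dump(b, buffer_callback: buffers.method(:push))
same = OutOfBand.load(data, buffers: buffers)
puts same == b                        # true
puts same.payload.equal?(b.payload)   # true
```

Within one process the shared payload is literally the same String. Across processes or machines, the collected buffers would still need their own transport (shared memory, a separate socket frame, and so on), which this sketch does not cover.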


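class ZeroCopyByteArray < Arrow::Buffer

  # Marshal.protocol and MarshalBuffer are proposed names; neither exists in Ruby today.
  def _dump(level)
    if Marshal.protocol >= 5
      # Hand the buffer over without copying; Marshal passes it on to _load.
      MarshalBuffer.new(self)
    else
      # MarshalBuffer is forbidden with Marshal protocols <= 4, so dump a copy
      # of the bytes (assuming Arrow::Buffer#data exposes them).
      data.to_s
    end
  end

  def self._load(obj)
    m = MemoryView.new(obj)
    # Get a handle on the object that owns the underlying buffer.
    obj = m.obj
    if obj.instance_of?(self)
      # The original buffer object is a ZeroCopyByteArray; return it as is.
      obj
    else
      new(obj)
    end
  end

end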


----------------------------------------
Feature #17685: Marshal format for out of band buffer objects
https://bugs.ruby-lang.org/issues/17685#change-90887

* Author: dsisnero (Dominic Sisneros)
* Status: Open
* Priority: Normal
----------------------------------------
Allow the Marshal protocol to transmit large data (objects) from one process or Ractor to another, on the same machine or across machines, without extra memory copies of the data.

See Python PEP 574 (Pickle protocol 5 with out-of-band data): https://www.python.org/dev/peps/pep-0574/

When marshalling memoryview objects, it would be nice to support zero-copy loads, so that the loader can use the memoryview directly without copying it, if desired.

Add a Marshal::Buffer type in a new version of Marshal to represent a serializable no-copy buffer view.

marshal_dump must be able to emit references to a Marshal::Buffer, indicating that the loader may receive the actual buffer out of band.

marshal_load must be able to accept the Marshal::Buffer for deserialization.

Marshal.load and Marshal.dump should work normally when out-of-band buffers are not used.

```ruby
class Apache::Arrow
  
  def marshal_dump(*)
    # Marshal::Buffer is the proposed type; the current Marshal format is version 4.8.
    if Marshal::MAJOR_VERSION > 4
      Marshal::Buffer.new(self)
    else
      # normal dump
    end
  end
end
```
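
For reference, a version check like the one above can be grounded in what Marshal already exposes: the `Marshal::MAJOR_VERSION` and `Marshal::MINOR_VERSION` constants, and the two version bytes at the start of every dumped stream. The helper name `marshal_stream_version` below is made up for this sketch.

```ruby
# Hypothetical helper: read the format version from the first two bytes
# of a Marshal stream (major, then minor).
def marshal_stream_version(data)
  data.unpack("CC")
end

data = Marshal.dump("abc")
major, minor = marshal_stream_version(data)

puts "stream format: #{major}.#{minor}"
puts "library format: #{Marshal::MAJOR_VERSION}.#{Marshal::MINOR_VERSION}"

# A loader could branch on the major version before expecting out-of-band buffers.
puts major > 4 ? "out-of-band capable (proposed)" : "in-band only"
```

A new format with buffer references would presumably bump the major version, so that older loaders reject the stream instead of misreading it.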



