Issue #16352 has been updated by marcandre (Marc-Andre Lafortune).


If the risk of collision with `SIZEOF_LONG - 1` is deemed too high, then add 64 bits of fixed data afterwards (pick a random value). If the 64 bits after the size match, then it is the extended format. If they don't, then the size really does happen to be `SIZEOF_LONG - 1`. I haven't checked the Marshal format closely enough, but I would not be surprised if some bit sequences following the size were actually invalid. If so, there would be no risk at all.
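
To make the idea concrete, here is a minimal sketch of a length header with a marker value plus 64 bits of fixed data. It is a toy framing, not the real Marshal format: `SENTINEL`, `MAGIC`, `write_length`, and `read_length` are hypothetical names, the marker value is an arbitrary stand-in, and the fixed-width little-endian fields are assumptions chosen only to keep the example short.

```ruby
require "stringio"

# Hypothetical marker value stored in the normal size field.
SENTINEL = 2**31 - 1
# Hypothetical 64 bits of fixed data (an arbitrary value picked once).
MAGIC = ["c35a910e7db246f8"].pack("H*")

# Write a length header: sizes up to SENTINEL use the plain 4-byte field,
# larger sizes use SENTINEL + MAGIC + a 64-bit length.
def write_length(io, n)
  if n <= SENTINEL
    io.write([n].pack("l<"))
  else
    io.write([SENTINEL].pack("l<") + MAGIC + [n].pack("Q<"))
  end
end

# Read a length header. If the size equals SENTINEL, peek at the next
# 64 bits: a match means extended format, a mismatch means the size
# really is SENTINEL and the peeked bytes belong to the payload.
def read_length(io)
  n = io.read(4).unpack1("l<")
  return n unless n == SENTINEL

  pos = io.pos
  if io.read(8) == MAGIC
    io.read(8).unpack1("Q<")
  else
    io.pos = pos # not extended: rewind so the payload is left untouched
    n
  end
end

io = StringIO.new(String.new)
write_length(io, 2**31 + 5)
io.rewind
p read_length(io)
```

In this sketch the only ambiguous case is a payload of exactly `SENTINEL` bytes whose first 8 bytes equal `MAGIC`, which is the residual collision risk discussed above.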

----------------------------------------
Feature #16352: Modify Marshal to dump objects larger than 2 GiB
https://bugs.ruby-lang.org/issues/16352#change-86046

* Author: seoanezonjic (Pedro Seoane)
* Status: Open
* Priority: Normal
----------------------------------------
While using the numo-narray gem to handle matrix operations, I hit the following error when saving a large matrix:

```
in `dump': long too big to dump (TypeError)
```

The GitHub thread is https://github.com/ruby-numo/numo-narray/issues/144. While digging into this with the authors, I found the following code that reproduces the error:

```
ruby -e 'Marshal.dump(" "*2**31)'
```

Executed in:
ruby 2.7.0dev (2019-11-12T12:03:22Z master 3816622fbe) [x86_64-linux]

The Marshal library has a limit based on the constant `SIZEOF_LONG`. The check is performed [here](https://github.com/ruby/ruby/blob/e7ea6e078fecb70fbc91b04878b69f696749afac/marshal.c#L301L321). I don't understand the motivation for this limit. It has a great impact on libraries that need to serialize large objects such as numeric matrices. In that case the 2 GiB limit is reached easily, and it blocks Ruby development. I found another related bug report, #1560, but the Marshal problem was not addressed in it.
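
For readers who do not want to open `marshal.c`, the following is a rough Ruby sketch of how I understand the length encoding and its range check to work. It is a re-implementation for illustration only; `encode_marshal_long` is a hypothetical helper name, not part of Ruby's API, and the details are my reading of the C code rather than a specification.

```ruby
# Sketch of Marshal's variable-width integer encoding (assumed from marshal.c).
# Values outside the signed 32-bit range are rejected, which is where
# "long too big to dump" comes from for strings of 2**31 bytes or more.
def encode_marshal_long(x)
  unless x >= -(2**31) && x < 2**31
    raise TypeError, "long too big to dump"
  end

  # Small values are packed into a single byte with an offset of 5.
  return [0].pack("C") if x.zero?
  return [x + 5].pack("C") if x > 0 && x < 123
  return [(x - 5) & 0xff].pack("C") if x < 0 && x > -124

  # Otherwise: one signed count byte, then 1 to 4 little-endian bytes.
  bytes = []
  1.upto(4) do |i|
    bytes << (x & 0xff)
    x >>= 8
    return ([i] + bytes).pack("C*") if x == 0
    return ([-i & 0xff] + bytes).pack("C*") if x == -1
  end
end

# Compare against the real encoder for a small integer:
# Marshal.dump(n) is 2 version bytes, the type byte "i", then the encoded value.
p encode_marshal_long(300) == Marshal.dump(300).byteslice(3..-1)

begin
  encode_marshal_long(2**31)
rescue TypeError => e
  puts e.message
end
```

The count byte only ever describes 1 to 4 payload bytes, so in this encoding there is no way to express a length of 2**31 or more without changing the format.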



-- 
https://bugs.ruby-lang.org/
