Issue #14900 has been updated by ioquatix (Samuel Williams).


I tested my assumptions here. By far the worst for memory was `slice!`, which, given a 5MB string, allocates 7.5MB. The equivalent sequence of `byteslice` calls as above allocates only 2.5MB.

Here were my comparisons:

```
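# measure_memory is a helper from the example file linked below.
string = nil # declared outside the blocks so that later blocks can see it

measure_memory("Initial allocation") do
	string = "a" * 5*1024*1024
	string.freeze
end # => 5.0 MB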

measure_memory("Byteslice from start to middle") do
	# Why does this need to allocate memory? Surely it can share the original allocation?
	x = string.byteslice(0, string.bytesize / 2)
end # => 2.5 MB

measure_memory("Byteslice from middle to end") do
	string.byteslice(string.bytesize / 2, string.bytesize)
end # => 0.0 MB

measure_memory("Slice! from start to middle") do
	string.dup.slice!(0, string.bytesize / 2) # dup doesn't make any difference to size of allocations
end # => 7.5 MB

measure_memory("Byte slice into two halves") do
	head = string.byteslice(0, string.bytesize / 2)
	remainder = string.byteslice(string.bytesize / 2, string.bytesize)
end # => 2.5 MB
```

(examples are also here: https://github.com/socketry/async-io/blob/master/examples/allocations/byteslice.rb)
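The `measure_memory` helper is not reproduced in this message. As a rough idea of what such a helper could look like, here is a minimal sketch that assumes it compares the total memory held by `String` objects before and after the block, with GC disabled so that temporaries are counted too. This is an assumption for illustration, not the helper from the linked file, and the numbers it reports may differ from those above.

```ruby
require "objspace"

# Hypothetical stand-in for the measure_memory helper used above.
# Reports how much memory held by String objects grew while the block ran.
def measure_memory(annotation = "Memory allocated")
  GC.start   # collect garbage first so the baseline is stable
  GC.disable # keep temporaries alive so they are counted as allocations

  before = ObjectSpace.memsize_of_all(String)
  yield
  after = ObjectSpace.memsize_of_all(String)

  delta = after - before
  puts format("%s: %.1f MB", annotation, delta / (1024.0 * 1024.0))
  delta # growth in bytes
ensure
  GC.enable
end

measure_memory("Initial allocation") do
  "a" * (5 * 1024 * 1024)
end
```

Counting only `String` objects keeps the sketch short, but it misses any memory allocated outside of Ruby strings.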

In the best case, the last example should be able to reuse the source string entirely, but Ruby doesn't seem capable of doing that yet. Perhaps a specific implementation of `byteslice!` could address this use case with zero allocations?
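To make the intended behaviour concrete, here is a sketch of the semantics a `byteslice!` might have, written as a plain helper. The name `consume_bytes!` is hypothetical, and this version only shows the expected result: it still allocates, whereas the point of a native implementation would be to avoid that.

```ruby
# Hypothetical helper showing the semantics a byteslice! could have:
# remove the first `length` bytes from `buffer` and return them.
# This pure-Ruby version still allocates; it only illustrates the behaviour.
def consume_bytes!(buffer, length)
  head = buffer.byteslice(0, length)
  return nil if head.nil? # e.g. a negative length

  # byteslice returns nil when the offset is past the end, so fall back to "".
  remainder = buffer.byteslice(length, buffer.bytesize) || ""
  buffer.replace(remainder) # raises FrozenError if buffer is frozen

  head
end

buffer = "hello world".dup
head = consume_bytes!(buffer, 5)

p head
p buffer
```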

----------------------------------------
Bug #14900: Extra allocation in String#byteslice
https://bugs.ruby-lang.org/issues/14900#change-72893

* Author: janko (Janko Marohnić)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-darwin17]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
When executing `String#byteslice` with a range, I noticed that sometimes the original string is allocated again. When I run the following script:

~~~ ruby
require "objspace"

string = "a" * 100_000

GC.start
GC.disable
generation = GC.count

ObjectSpace.trace_object_allocations do
  string.byteslice(50_000..-1)

  ObjectSpace.each_object(String) do |string|
    p string.bytesize if ObjectSpace.allocation_generation(string) == generation
  end
end
~~~

it outputs

~~~
50000
100000
6
5
~~~

The one with 50000 bytes is the result of `String#byteslice`, but the one with 100000 bytes is the duplicated original string. I expected only the result of `String#byteslice` to be amongst new allocations.

If instead of the last 50000 bytes I slice the *first* 50000 bytes, the extra duplication doesn't occur.

~~~ ruby
# ...
  string.byteslice(0, 50_000)
# ...
~~~

~~~
50000
5
~~~
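Another way to compare the two cases is to look at the reported memory size of each string. The sketch below assumes that `ObjectSpace.memsize_of` reports a smaller size for a string that shares another string's buffer than for one that owns its own copy. The output depends on the Ruby version and implementation, so none is shown here.

```ruby
require "objspace"

string = "a" * 100_000

head = string.byteslice(0, 50_000)   # first half
tail = string.byteslice(50_000..-1)  # second half

# Print the byte length and the memory size reported for each string.
{ "original" => string, "head" => head, "tail" => tail }.each do |name, value|
  puts "#{name}: bytesize=#{value.bytesize} memsize=#{ObjectSpace.memsize_of(value)}"
end
```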

It's definitely ok if `String#byteslice` allocates extra strings as part of its implementation, but it would be nice if they were deallocated before returning the result.

EDIT: It seems that `String#slice` has the same issue.



-- 
https://bugs.ruby-lang.org/
