Issue #14745 has been updated by janko (Janko Marohnić).


> it looks like @stream_cipher.update can take a second destination arg (like IO#read and friends) and maybe that helps... (that appears to be OpenSSL::Cipher#update)

Wow, I didn't know that. Thanks, this allowed me to greatly simplify the `#read` implementation, which now also has lower memory usage than the original.
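
For reference, a minimal sketch of that destination-buffer form (the cipher algorithm, key setup, and data sizes here are arbitrary):

~~~ ruby
require "openssl"

cipher = OpenSSL::Cipher.new("aes-256-ctr")
cipher.encrypt
cipher.random_key
cipher.random_iv

outbuf = String.new # single buffer reused across calls

data = "a" * 16 * 1024
cipher.update(data, outbuf) # ciphertext is written into outbuf instead of a new string
~~~

Reusing one destination buffer this way avoids allocating a fresh string for every chunk.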

> I've sometimes wondered if adding a String#exchange! method might help

That sounds very useful to have in general. In my particular scenario (the PR) I wouldn't have a use for it after all; the `String#clear` vs `String#replace` question was just something I ran into during the optimization, and the current version doesn't need `String#replace` anymore. But I can imagine `String#exchange!` being a useful tool for diligent string management.
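
Purely to illustrate the semantics I'd imagine for it (this is hypothetical: `String#exchange!` doesn't exist, and a pure-Ruby stand-in can only mimic the interface, not the zero-copy buffer swap that would make it worthwhile):

~~~ ruby
# Hypothetical sketch: a real String#exchange! would swap the two strings'
# internal buffers in place, without copying. This stand-in only mimics the
# interface (it still copies via #dup and #replace).
class String
  def exchange!(other)
    tmp = other.dup
    other.replace(self)
    replace(tmp)
    self
  end
end

a = "foo"
b = "barbaz"
a.exchange!(b)
a #=> "barbaz"
b #=> "foo"
~~~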

I'll close this ticket, as this seems to be a necessary consequence of the copy-on-write optimization, and I'm not having problems with it anymore.

----------------------------------------
Bug #14745: High memory usage when using String#replace with IO.copy_stream
https://bugs.ruby-lang.org/issues/14745#change-72123

* Author: janko (Janko Marohnić)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-darwin17]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
I'm using custom IO-like objects that implement #read and are passed as the first argument to IO.copy_stream, and I noticed odd memory behaviour when using String#replace on the output buffer versus String#clear. Here is an example of a "fake IO" object whose #read uses String#clear on the output buffer:

~~~ ruby
GC.disable

require "stringio"

class FakeIO
  def initialize(content)
    @io = StringIO.new(content)
  end

  def read(length, outbuf)
    chunk = @io.read(length)

    if chunk
      # copy the bytes into the caller-provided buffer, then drop the chunk's own buffer
      outbuf.clear
      outbuf << chunk
      chunk.clear
    else
      outbuf.clear
    end

    outbuf unless outbuf.empty?
  end
end

io = FakeIO.new("a" * 50*1024*1024) # 50MB

IO.copy_stream(io, File::NULL)

system "top -pid #{Process.pid}"
~~~

This program shows a memory usage of 50MB at the end, as expected: 50MB was loaded into memory at the beginning, and any new strings are deallocated. However, if I modify the #read implementation to use String#replace instead of String#clear:

~~~ ruby
  def read(length, outbuf)
    chunk = @io.read(length)

    if chunk
      outbuf.replace chunk
      chunk.clear
    else
      outbuf.clear
    end

    outbuf unless outbuf.empty?
  end
~~~

the memory usage doubles to 100MB by the end of the program, indicating that some string bytes weren't deallocated. So it seems that String#replace behaves differently from String#clear + String#<<.
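
(As an aside, in this particular FakeIO the intermediate chunk string can be avoided entirely by forwarding the destination buffer, since StringIO#read accepts the same outbuf argument as IO#read. That sidesteps the clear-vs-replace question, though it doesn't explain the difference. A sketch:)

~~~ ruby
  def read(length, outbuf)
    # StringIO#read (like IO#read) takes a destination buffer as its second
    # argument, so the bytes land directly in outbuf with no intermediate chunk.
    @io.read(length, outbuf)
  end
~~~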

I was *only* able to reproduce this with `IO.copy_stream`; the following program shows 50MB of memory usage regardless of whether the String#clear or the String#replace approach is used:

~~~ ruby
GC.disable

buffer = "a" * 50*1024*1024
chunk  = "b" * 50*1024*1024

if ARGV[0] == "clear"
  buffer.clear
  buffer << chunk
else
  buffer.replace chunk
end

chunk.clear

system "top -pid #{Process.pid}"
~~~

With this program I also noticed one interesting thing. If I remove `chunk.clear`, the "clear" version uses 100MB as expected (since the buffer and chunk strings are each 50MB large), but the "replace" version uses only 50MB, which makes it look as if the `buffer` string doesn't use any memory, when it should use 50MB just like the `chunk` string. I found that odd, and I think it might be a clue to the memory bug with String#replace that I experienced when using `IO.copy_stream`.
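
One way to probe the copy-on-write suspicion outside of `top` is `ObjectSpace.memsize_of`, which on MRI doesn't charge a string for a buffer it merely shares with another string. This is only a rough probe, not proof:

~~~ ruby
require "objspace"

chunk  = "b" * (50 * 1024 * 1024)
buffer = String.new

p ObjectSpace.memsize_of(chunk)  # roughly 50MB: chunk owns its buffer

buffer.replace(chunk)

# If #replace copied the bytes, buffer would now also report roughly 50MB.
# If it set up copy-on-write sharing instead, the sharing string(s) report only
# the small object header, with the 50MB buffer held by a shared root.
p ObjectSpace.memsize_of(buffer)
p ObjectSpace.memsize_of(chunk)
~~~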


