Issue #16278 has been updated by cristiangreco (Cristian Greco).


ko1 (Koichi Sasada) wrote:
> > If an application exercises this pattern very frequently during lifetime and across multiple processes then it's definitely going to bloat memory, at the very least. As a real-world example, this is causing high memory usage for the Prometheus client gem, where such pattern is heavily used when passing around metric labels.
> 
> Could you give us good example with that gem? As mame-san said, your example is not a problem because "This code uses constant memory".

Hi Koichi, thanks for taking the time to look into this!

This example creates a metric of type `counter`, updates it three times using different labels, and prints out an export of the metric registry:

```ruby
# frozen_string_literal: true

require "prometheus/client"
require "prometheus/client/formats/text"

require "prometheus/client/data_stores/synchronized"
# require "prometheus/client/data_stores/single_threaded"
# require "prometheus/client/data_stores/direct_file_store"

Prometheus::Client.config.data_store = Prometheus::Client::DataStores::Synchronized.new
# Prometheus::Client.config.data_store = Prometheus::Client::DataStores::SingleThreaded.new
# Prometheus::Client.config.data_store = Prometheus::Client::DataStores::DirectFileStore.new(dir: ".")

def test
  registry = Prometheus::Client::Registry.new
  counter = registry.counter(
    :counter_metric,
    docstring: "a counter",
    labels: %i[label1 label2],
  )

  labels1 = { label1: "foo", label2: "bar" }
  labels2 = { label1: 1, label2: 2 }
  labels3 = { label1: :a, label2: :b }

  counter.increment(by: 1, labels: labels1)
  counter.increment(by: 1, labels: labels2)
  counter.increment(by: 1, labels: labels3)

  puts Prometheus::Client::Formats::Text.marshal(registry)

  [labels1.object_id, labels2.object_id, labels3.object_id]
end

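def find(id)
  # `each` with a block returns the number of objects visited (always truthy),
  # so use `any?` to check whether a Hash with this object_id is still on the heap.
  ObjectSpace.each_object(Hash).any? { |h| h.object_id == id }
end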

retained_ids = test

10.times do
  GC.start(full_mark: true, immediate_sweep: true)
end

retained_ids.each { |id| puts "found #{id}" if find(id) }
```

With each of the data stores, I found that all three label hashes are retained after garbage collection (all the stores use a similar pattern of storing label hashes within another hash).
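The storage pattern described above can be reduced to a small, hypothetical store. The `LabelCounter` class and its methods are illustrative only and are not the prometheus-client implementation. The sketch shows that, while the store itself is reachable, it keeps a reference to the first label hash used for each distinct label set:

```ruby
# frozen_string_literal: true

# Hypothetical, simplified store (NOT the prometheus-client implementation):
# counts live in a Hash whose keys are the label hashes themselves.
class LabelCounter
  def initialize
    @values = Hash.new(0)
  end

  def increment(labels, by: 1)
    # Equal label hashes hit the same entry; the Hash keeps a reference to
    # whichever label hash object was inserted as the key first.
    @values[labels] += by
  end

  def get(labels)
    @values[labels]
  end

  # Returns the key object actually stored for labels equal to `labels`.
  def stored_key(labels)
    @values.keys.find { |k| k == labels }
  end
end

store = LabelCounter.new
first = { label1: "foo", label2: "bar" }
second = { label1: "foo", label2: "bar" } # equal content, different object

store.increment(first)
store.increment(second)

p store.get(first)                     # prints 2
p store.stored_key(second).equal?(first) # prints true
```

A second, equal label hash does not replace the stored key: lookups only need `eql?` and `hash` to agree, so the first hash object stays referenced for as long as the store is alive.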

What do you think is going on here?

----------------------------------------
Bug #16278: Potential memory leak when an hash is used as a key for another hash
https://bugs.ruby-lang.org/issues/16278#change-82331

* Author: cristiangreco (Cristian Greco)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-darwin18]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
Hi,

I've been hitting what seems to be a memory leak.

When a hash is used as a key for another hash, the key hash is retained even after multiple GC runs.
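For context, Ruby stores a Hash key by reference rather than copying it (unlike an unfrozen String key, which `Hash#[]=` duplicates and freezes), so an outer hash legitimately keeps its key hash alive for as long as the outer hash itself is reachable. A minimal illustration, with illustrative variable names:

```ruby
# frozen_string_literal: true

inner = { a: 1 }
outer = { inner => 2 }

# A Hash used as a key is stored as-is: the outer hash references the very
# same object, not a copy, for as long as the outer hash is reachable.
p outer.keys.first.equal?(inner) # prints true

# By contrast, Hash#[]= duplicates and freezes an unfrozen String key,
# so the hash ends up holding a copy rather than the original string.
name = String.new("label")
by_name = {}
by_name[name] = 1
p by_name.keys.first.equal?(name) # prints false
p by_name.keys.first.frozen?      # prints true
```

This alone does not explain why the key survives after `create` returns, since the outer hash is unreachable by then.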

The following snippet demonstrates how the hash `{:a => 1}` (which is never used outside the scope of `create`) is retained even after 10 GC runs (`find` looks for an object with a given `object_id` on the heap).


```ruby
# frozen_string_literal: true

def create
  h = {{:a => 1} => 2}
  h.keys.first.object_id
end

def find(object_id)
  ObjectSpace.each_object(Hash).any?{|h| h.object_id == object_id} ? 1 : 0
end


leaked = create

10.times do
  GC.start(full_mark: true, immediate_sweep: true)
end

exit find(leaked)
```

This snippet is expected to exit with `0`, but it exits with `1` in my tests. I've reproduced this on multiple recent Ruby versions and operating systems, both locally (OSX with Homebrew) and on different CI services (e.g. [here](https://github.com/cristiangreco/ruby-hash-leak/commit/285e586b7193104989f59b92579fe8f25770141e/checks?check_suite_id=278711566)).
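A single retained object is not the same as a leak, which is the distinction ko1 draws in the reply above ("This code uses constant memory"). One way to check, sketched here with the stdlib `weakref` library (the `create_weak` and `surviving_keys` helpers are hypothetical), is to build many key hashes and count how many are still alive after GC. A small count that stays roughly constant as `n` grows suggests bounded memory use; a count that grows with `n` would point to a real leak.

```ruby
# frozen_string_literal: true

require "weakref"

# Hypothetical helper modelled on `create` above: builds a hash keyed by
# another hash, then returns only a weak reference to the key hash.
def create_weak
  h = { { a: 1 } => 2 }
  WeakRef.new(h.keys.first)
end

# Creates `n` key hashes and counts how many are still alive after GC.
def surviving_keys(n)
  refs = Array.new(n) { create_weak }
  3.times { GC.start(full_mark: true, immediate_sweep: true) }
  refs.count(&:weakref_alive?)
end

[1_000, 10_000].each do |n|
  puts "#{surviving_keys(n)} of #{n} key hashes still alive"
end
```

Tracking objects through `WeakRef` instead of matching on `object_id` is a deliberate choice: on Ruby versions where `object_id` is derived from the object's address (2.6 and earlier), a new object allocated in a freed slot may get the same id.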

Can you please help me understand what's going on here? Thanks!


