Issue #16278 has been updated by alanwu (Alan Wu).


The GC scans the native stack for Ruby's C code to find values to retain.
Pointers to Ruby's heap objects can end up on the native stack for a variety
of reasons, and this is mostly up to the C compiler. Whether a pointer ends up
on the native stack depends on how Ruby is built, inlining decisions, register
allocation and a bunch of other things -- this is not something Ruby's code has
direct control over.

The generated machine code might behave in such a way that certain sequences
of Ruby code overwrite heap pointers on the native stack, letting the GC
collect those objects.

To illustrate, here is a modified version of your prometheus script:


```ruby
# frozen_string_literal: true

require "prometheus/client"
require "prometheus/client/formats/text"

require "prometheus/client/data_stores/synchronized"

Prometheus::Client.config.data_store = Prometheus::Client::DataStores::Synchronized.new

def test
  registry = Prometheus::Client::Registry.new
  counter = registry.counter(
    :counter_metric,
    docstring: "a counter",
    labels: %i[label1 label2],
  )

  labels1 = { label1: "foo", label2: "bar" }
  labels2 = { label1: 1, label2: 2 }
  labels3 = { label1: :a, label2: :b }

  counter.increment(by: 1, labels: labels1)
  counter.increment(by: 1, labels: labels2)
  counter.increment(by: 1, labels: labels3)

  puts Prometheus::Client::Formats::Text.marshal(registry)

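  ret = [labels1.object_id, labels2.object_id, labels3.object_id]
  # Comment out the next line to let `3.itself` run, which overwrites the
  # stack slot that still holds `labels3`.
  return ret
  3.itself
  ret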
end

def find(id)
  ObjectSpace.each_object(Hash).find { |h| id == h.object_id }
end

retained_ids = test

10.times do
  GC.start(full_mark: true, immediate_sweep: true)
end

retained_ids.each { |id|
  obj = find(id)
  puts "found #{id} #{obj}" if obj
}

```

Tracing with GDB, you can find that `labels3` is retained because the compiler
chose to spill the receiver onto the native stack whenever a call without a
block happens:


```
insns.def:
763     calling.block_handler = VM_BLOCK_HANDLER_NONE;
   0x00005555557284ee <+174>:   mov    %r15,%rbx
   0x00005555557284f1 <+177>:   movq   $0x0,0x70(%rsp)

764     vm_search_method(ci, cc, calling.recv = TOPN(calling.argc = ci->orig_argc));
   0x00005555557284fa <+186>:   movslq 0xc(%rcx),%rax
   0x00005555557284fe <+190>:   mov    %eax,0x80(%rsp)
   0x0000555555728505 <+197>:   shl    $0x3,%rax
   0x0000555555728509 <+201>:   sub    %rax,%rdx
   0x000055555572850c <+204>:   mov    -0x8(%rdx),%rax
   0x0000555555728510 <+208>:   mov    %rax,0x78(%rsp) <<<< Stores pointer to Ruby object on the native stack <<<<<<<

./include/ruby/ruby.h:
2058        if (RB_IMMEDIATE_P(obj)) {
=> 0x0000555555728515 <+213>:   test   $0x7,%al
   0x0000555555728517 <+215>:   je     0x55555572b24b <vm_exec_core+11787>

2059        if (RB_FIXNUM_P(obj)) return rb_cInteger;
   0x000055555572851d <+221>:   mov    0x1b56a4(%rip),%r12        # 0x5555558ddbc8 <rb_cInteger>
   0x0000555555728524 <+228>:   test   $0x1,%al
   0x0000555555728526 <+230>:   jne    0x555555728555 <vm_exec_core+277>
```

`labels3` happens to be the last receiver in the method, so its pointer is left
on the stack and the object is retained. We can allow the GC to collect
`labels3` by performing another method call on some other object before the end
of the `test` method. With `return ret` commented out, the VM spills `3` (the
receiver of `3.itself`) onto the same location that `labels3` occupied,
allowing the GC to collect `labels3`.

Again, all this is up to the exact machine code output of your compiler, so
what I'm seeing here on my setup might be completely different from yours.
This kind of object retainment can happen anywhere in Ruby's C code and
can easily be introduced or removed by changes to the VM or the Ruby code in
question.

This kind of extraneous retainment is an inherent property of Ruby's GC.
There is no memory growth since the fixed-size native stack puts a cap on
the number of objects that could be retained in this manner.
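
One rough way to check this bound on your own build is to repeat the allocation
many times and count the survivors. The sketch below assumes MRI, since it
relies on `ObjectSpace.each_object`. `create_key_id`, `count_live` and
`measure` are hypothetical helper names, and the exact counts will vary with
the compiler and the Ruby version. On Ruby 2.6 and earlier, object ids can be
reused, so the count may include a few unrelated hashes.

```ruby
# frozen_string_literal: true

require "set"

# Hypothetical helper: builds a hash keyed by another hash and returns the
# object_id of the inner (key) hash, as in the original report.
def create_key_id
  h = { { a: 1 } => 2 }
  h.keys.first.object_id
end

# Counts how many of the recorded ids still belong to a live Hash.
def count_live(ids)
  ObjectSpace.each_object(Hash).count { |h| ids.include?(h.object_id) }
end

# Allocates `rounds` key hashes, forces a full GC, and reports the survivors.
def measure(rounds)
  ids = Set.new
  rounds.times { ids << create_key_id }
  GC.start(full_mark: true, immediate_sweep: true)
  count_live(ids)
end

small = measure(100)
large = measure(10_000)

puts "retained after 100 rounds:    #{small}"
puts "retained after 10000 rounds:  #{large}"
```

If the retention comes from stale pointers on the native stack, the second
number should stay small rather than scale with the number of rounds.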

This could be the reason for the memory bloat problems in your app, but I think
other explanations are more plausible. I would suggest taking another look at
your heap profile. There should be other, more actionable instances of
object retainment in your app.
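
As a starting point for that, here is a minimal sketch that uses the `objspace`
standard library to group live objects by allocation site. The `retained`
array is only a stand-in for application code, and the grouping key is my own
choice; a real investigation would dump the heap of the running app instead.

```ruby
require "objspace"
require "json"
require "tempfile"

# Record file and line for every object allocated from here on.
ObjectSpace.trace_object_allocations_start

# Stand-in for application code that holds on to objects.
retained = Array.new(1_000) { |i| { id: i } }

GC.start(full_mark: true, immediate_sweep: true)

# Dump the heap as JSON lines and count live objects per allocation site.
counts = Hash.new(0)
Tempfile.create("heap") do |f|
  ObjectSpace.dump_all(output: f)
  f.flush
  f.rewind
  f.each_line do |line|
    obj = JSON.parse(line)
    next unless obj["file"] # objects allocated before tracing have no site
    file, lineno, type = obj.values_at("file", "line", "type")
    counts["#{file}:#{lineno} #{type}"] += 1
  end
end

ObjectSpace.trace_object_allocations_stop

# Print the ten allocation sites with the most live objects.
counts.sort_by { |_, n| -n }.first(10).each do |site, n|
  puts "#{n}\t#{site}"
end
```

Sites whose counts keep growing between two dumps taken some time apart are
better candidates for a real leak than a single object pinned by the stack.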


----------------------------------------
Bug #16278: Potential memory leak when an hash is used as a key for another hash
https://bugs.ruby-lang.org/issues/16278#change-82348

* Author: cristiangreco (Cristian Greco)
* Status: Rejected
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-darwin18]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
Hi,

I've been hitting what seems to be a memory leak.

When a hash is used as a key for another hash, the former object will be retained even after multiple GC runs.

The following code snippet demonstrates how the hash `{:a => 1}` (which is never used outside the scope of `create`) is retained even after 10 GC runs (`find` will look for an object with a given `object_id` on the heap).


```ruby
# frozen_string_literal: true

def create
  h = {{:a => 1} => 2}
  h.keys.first.object_id
end

def find(object_id)
  ObjectSpace.each_object(Hash).any?{|h| h.object_id == object_id} ? 1 : 0
end


leaked = create

10.times do
  GC.start(full_mark: true, immediate_sweep: true)
end

exit find(leaked)
```

This code snippet is expected to exit with `0` while it exits with `1` in my tests. I've tested this on multiple recent ruby versions and OSs, either locally (OSX with homebrew) or in different CIs (e.g. [here](https://github.com/cristiangreco/ruby-hash-leak/commit/285e586b7193104989f59b92579fe8f25770141e/checks?check_suite_id=278711566)).

Can you please help understand what's going on here? Thanks!


