Issue #11098 has been updated by Jason Clark.


allocation_tracer is awesome for debugging, and I've happily used it a number of times. Thank you for building it, Koichi!

While most people certainly wouldn't use this, I do have a production use case. Specifically, I work at New Relic, and I wanted this for the Ruby agent (newrelic_rpm) to read. Being able to pinpoint the specific web requests that are allocation-heavy would be a huge benefit to our users. Allocation behavior in production often differs from other environments, so seeing what's actually happening in prod matters. The current global counters are noisy in the presence of other threads, and since we can't reliably attribute allocations to a specific request, we don't report anything at all.
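
To make that concrete, here is a rough sketch of how the agent could attribute allocations to a single request using the per-thread counter proposed below. The middleware class and `record_metric` are hypothetical stand-ins, and `Thread.current.allocated_objects` assumes this patch is applied:

```
# Hypothetical Rack middleware: attribute allocations to one request by
# diffing the per-thread counter. Work done by other threads during the
# request does not leak into the number.
class RequestAllocationCounter
  def initialize(app)
    @app = app
  end

  def call(env)
    before = Thread.current.allocated_objects      # proposed API
    response = @app.call(env)
    allocated = Thread.current.allocated_objects - before
    record_metric(env['PATH_INFO'], allocated)
    response
  end

  private

  # Stand-in for whatever the agent would actually do with the count.
  def record_metric(path, count)
    warn "#{path}: #{count} objects allocated"
  end
end
```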

Shipping this as a gem has the disadvantages you list, which are real concerns for us. In our experience, few users enable optional features, so we probably wouldn't build on an opt-in approach at all.

If you still feel the overhead outweighs the use case, we can close this out. It would give instrumenters like me great insight into one of the most common causes of Ruby app slowdowns, but I understand the concerns.

----------------------------------------
Feature #11098: Thread-level allocation counting
https://bugs.ruby-lang.org/issues/11098#change-58747

* Author: Jason Clark
* Status: Feedback
* Priority: Normal
* Assignee: 
----------------------------------------
This patch introduces a thread-local allocation count. Today you can get a
global allocation count from `GC.stat`, but in multi-threaded contexts that
can give a muddied picture of the allocation behavior of a particular piece of
code.

Usage looks like this:

```
[2] pry(main)> Thread.new do
[2] pry(main)*   1000.times do
[2] pry(main)*     Object.new
[2] pry(main)*   end
[2] pry(main)*   puts Thread.current.allocated_objects
[2] pry(main)* end
1000
```
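
For contrast, here is a minimal sketch of how the existing global counter gets muddied by an unrelated thread. The `GC.stat` measurement runs on current Rubies; the `Thread.current.allocated_objects` lines assume this patch:

```
require 'securerandom'

# Unrelated background thread that allocates heavily.
noise = Thread.new { loop { SecureRandom.hex(64) } }

# Global counter: picks up every thread's allocations.
before = GC.stat(:total_allocated_objects)
100_000.times { Object.new }
puts GC.stat(:total_allocated_objects) - before
# includes whatever the noise thread allocated in the meantime

# With this patch, the same measurement scoped to the current thread:
before = Thread.current.allocated_objects
100_000.times { Object.new }
puts Thread.current.allocated_objects - before
# ~100_000, regardless of what other threads are doing

noise.kill
```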

This would be of great interest to folks profiling Ruby code in cases where we
can't turn on more detailed object tracing tools. We currently use GC activity
as a proxy for object allocations, but this would let us be way more precise.
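
For reference, the GC-activity proxy mentioned above is roughly this coarse (the specifics here are illustrative, not the agent's actual code): it tells you how many GC runs the measured code triggered, not how many objects it allocated.

```
# Coarse proxy: count GC runs around the code being measured. This is
# still perturbed by other threads and says nothing about object counts.
runs_before = GC.stat(:count)
100_000.times { "x" * 100 }   # stand-in for the work being profiled
puts "GC ran #{GC.stat(:count) - runs_before} time(s)"
```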

Obviously performance is a big concern. Looking at GET_THREAD, this doesn't
appear to add any obviously large overhead. To check, I ran the following
benchmark:

```
require 'benchmark/ips'

Benchmark.ips do |benchmark|
  # The same block is reported three times to gauge run-to-run variance.
  benchmark.report "Object.new" do
    Object.new
  end

  benchmark.report "Object.new" do
    Object.new
  end

  benchmark.report "Object.new" do
    Object.new
  end
end
```

Results from a few run-throughs locally:

Commit 9955bb0 on trunk:

```
Calculating -------------------------------------
          Object.new   105.244k i/100ms
          Object.new   105.814k i/100ms
          Object.new   106.579k i/100ms
-------------------------------------------------
          Object.new      4.886M ( 4.5%) i/s -     24.417M
          Object.new      4.900M ( 1.9%) i/s -     24.549M
          Object.new      4.835M ( 7.4%) i/s -     23.980M
```

With this patch:

```
Calculating -------------------------------------
          Object.new   114.248k i/100ms
          Object.new   114.508k i/100ms
          Object.new   114.472k i/100ms
-------------------------------------------------
          Object.new      4.776M ( 5.1%) i/s -     23.878M
          Object.new      4.767M ( 5.2%) i/s -     23.818M
          Object.new      4.818M ( 1.5%) i/s -     24.154M
```

I don't have a good sense of whether this is an acceptable level of change,
but I figured there was no way to know without writing the code and measuring.
What do you think?

---Files--------------------------------
thread-local.patch (2.04 KB)
thread-local-update.patch (2.05 KB)

