I don't mind this patch and even see it as an opportunity to drop
objspace->total_allocated_objects entirely and rely exclusively on
thread-local counters for GC.

I toyed around with a similar idea last year in [ruby-core:61424] for
malloc accounting but haven't gotten much further.  I might investigate
this again over the summer.
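
To make the "thread-local counters only" idea concrete, here is a rough
sketch of how a process-wide total could still be derived without a
shared counter.  None of these names exist in gc.c; `thread_stats',
`vm_stats', `total_allocated' and `retire_thread' are hypothetical, and
folding a dying thread's count into a per-VM field is just one way it
might be done:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread stats; stands in for a field in rb_thread_t. */
struct thread_stats {
    uint64_t allocated_objects;
};

/* Hypothetical per-VM stats: only holds counts from threads that exited. */
struct vm_stats {
    uint64_t retired_allocated_objects;
};

/* Total = counts of exited threads + counts of all live threads. */
static uint64_t
total_allocated(const struct vm_stats *vm,
                const struct thread_stats *threads, size_t n)
{
    uint64_t total;
    size_t i;

    assert(vm != NULL);
    total = vm->retired_allocated_objects;
    for (i = 0; i < n; i++) {
        total += threads[i].allocated_objects;
    }
    return total;
}

/* Fold a dying thread's count into the VM so the total never goes down. */
static void
retire_thread(struct vm_stats *vm, struct thread_stats *th)
{
    vm->retired_allocated_objects += th->allocated_objects;
    th->allocated_objects = 0;
}
```

The hot path (object allocation) only ever touches the current thread's
counter; the summation cost moves to whoever asks for the total.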

Anyway, some minor nits inline:

> --- a/gc.c
> +++ b/gc.c
> @@ -1741,6 +1741,10 @@ newobj_of(VALUE klass, VALUE flags, VALUE v1, VALUE v2, VALUE v3)
>  #endif
>  
>      objspace->total_allocated_objects++;
> +
> +    rb_thread_t *th = GET_THREAD();
> +    th->allocated_objects++;

That would trip -Werror=declaration-after-statement in GCC.  Declare
`th' at the top of the enclosing block, or avoid the local variable
entirely since you only use it once:

     GET_THREAD()->allocated_objects++;
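
For anyone unfamiliar with that warning, a standalone sketch of the two
acceptable shapes follows.  `struct thread', `main_thread' and
`current_thread()' are made-up stand-ins for rb_thread_t and
GET_THREAD(), not the real definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for rb_thread_t and GET_THREAD(). */
struct thread {
    unsigned long allocated_objects;
};

static struct thread main_thread;

static struct thread *
current_thread(void)
{
    return &main_thread;
}

/* Option 1: the declaration comes before any statement in the block. */
static void
count_with_local(void)
{
    struct thread *th = current_thread();

    assert(th != NULL);
    th->allocated_objects++;
}

/* Option 2: no local at all, which suits a single use. */
static void
count_without_local(void)
{
    current_thread()->allocated_objects++;
}
```

Both compile cleanly with -Wdeclaration-after-statement; the version in
the patch would not, because the declaration follows the
total_allocated_objects++ statement.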

> --- a/thread.c
> +++ b/thread.c
> @@ -2568,6 +2568,14 @@ rb_thread_group(VALUE thread)
>      return group;
>  }
>  
> +VALUE
> +rb_thread_allocated_objects(VALUE thread)
> +{
> +    rb_thread_t *th;
> +    GetThreadPtr(thread, th);
> +    return LONG2NUM(th->allocated_objects);
> +}

> --- a/vm_core.h
> +++ b/vm_core.h
> @@ -598,6 +598,7 @@ typedef struct rb_thread_struct {
>      int safe_level;
>      int raised_flag;
>      VALUE last_status; /* $? */
> +    long allocated_objects;

Use uint64_t here.  This counter never resets, and `long' is only 32
bits wide on 32-bit systems, so it can overflow.  It should also be
unsigned, since a count of allocations can never be negative.
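
A small illustration of the width problem, using uint32_t to model a
32-bit counter (unsigned so the wraparound is well-defined; overflowing
a signed `long' would be undefined behavior).  The struct and function
names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of counters fed the same number of allocations. */
struct counters {
    uint32_t narrow;  /* models a 32-bit wide counter */
    uint64_t wide;    /* the suggested uint64_t */
};

/* Record n allocations in both counters. */
static void
count_allocations(struct counters *c, uint64_t n)
{
    assert(c != 0);
    c->narrow += (uint32_t)n;  /* arithmetic is modulo 2^32, so it wraps */
    c->wide += n;              /* plenty of headroom */
}
```

After 2^32 + 5 allocations, `narrow' reads 5 while `wide' still holds
the real count.  If the field does become uint64_t, the accessor in
thread.c would presumably need something like ULL2NUM rather than
LONG2NUM, but I haven't checked that against every supported platform.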