sam.saffron / gmail.com wrote:
> Feature #15667: Introduce malloc_trim(0) in full gc cycles
> https://bugs.ruby-lang.org/issues/15667

The patch looks OK with the current state of glibc malloc for
multithreaded use cases in Ruby.  However, this seems like it is
papering over an inefficiency in glibc malloc which needs to be
fixed.

Background: glibc uses sbrk for the main-thread arena.
  Sub-thread arenas use mmap by default, but limiting
  MALLOC_ARENA_MAX can cause sub-threads to use the main (sbrk)
  arena.

Some notes:

1.  We need to know how performance is in single-threaded apps
    compared to multi-threaded apps(*), since I'm not sure whether
    single-threaded use was benchmarked.

    The part which concerns me is that the systrim() call only
    affects the main (sbrk) arena...

    https://public-inbox.org/libc-alpha/801ba1f499/s/?b=malloc/malloc.c#n4817

    ...which is likely to cause a performance hit if the main thread
    does more allocations+frees down the line.

    Multi-thread servers like puma do not allocate much in the main thread
    once started, but `require' eats a lot of memory at startup[3].
    So the benefit of reduced memory use noted by Hongli could be coming
    from this.

    The MADV_DONTNEED loop above systrim releases memory back to
    the OS for non-main-thread arenas, too; but should not
    noticeably affect performance.  Maybe MADV_DONTNEED (or
    MADV_FREE) alone is enough for RSS savings?


2.  As for improving glibc...
    Maybe glibc is missing malloc_consolidate calls in some
    places which need them.  Or (more likely) there are places where
    it would be better to call mtrim(av, mp_.top_pad) instead of
    malloc_consolidate(av) to have glibc do trimming.


3.  Once a multi-threaded application is fully-loaded; perhaps a
    single call to "malloc_trim(0)" (via fiddle) is all that's
    necessary?  That would free the memory eaten by `require'
    in the main thread; and further GCs won't be affected.

    If this is successful in reducing memory use, that would
    mean the systrim() call I was concerned about in [1] was
    responsible for the savings.  And perhaps a single call
    is all that is necessary.
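
A rough sketch of what that single call via fiddle could look like,
for anyone who wants to try it.  The helper names trim_malloc and
rss_kb are made up for illustration; this assumes glibc (malloc_trim
is a glibc extension) and Linux (/proc/self/status), and returns nil
where either is missing.  Untested on a real multi-threaded server:

```ruby
require 'fiddle'

# Hypothetical helper: call glibc's malloc_trim(pad) through Fiddle.
# Returns malloc_trim's result (1 if memory was released, 0 if not),
# or nil when the symbol cannot be found (non-glibc platforms).
def trim_malloc(pad = 0)
  addr = Fiddle::Handle::DEFAULT['malloc_trim']
  fn = Fiddle::Function.new(addr, [Fiddle::TYPE_SIZE_T], Fiddle::TYPE_INT)
  fn.call(pad)
rescue Fiddle::DLError
  nil
end

# Hypothetical helper: resident set size in kB, read from /proc.
# Returns nil where /proc/self/status is unavailable.
def rss_kb
  line = File.foreach('/proc/self/status').find { |l| l.start_with?('VmRSS:') }
  line && line.split[1].to_i
rescue SystemCallError
  nil
end

# Call once after the application has finished loading.
before = rss_kb
result = trim_malloc(0)
after = rss_kb
warn "malloc_trim=#{result.inspect} RSS #{before.inspect} kB -> #{after.inspect} kB"
```

Running this once after all `require' calls are done, then comparing
RSS over time against an untrimmed process, should show whether the
one-off trim accounts for most of the savings.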


I no longer have mosh/ssh access to anything which can build or
test large codebases at a decent rate.  So it's not feasible for
me to work on glibc, ruby or any large codebase anymore.