I didn't notice the old code being faster, and I don't see your
benchmark triggering GC. Perhaps the difference is related to memory size
(swapping or CPU cache misses) or to power management
(CPU clock frequency scaling, "turbo boost", etc.).
Modern hardware is tricky to benchmark :/
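If the benchmark is Python code (an assumption; the thread doesn't say), one way to check whether GC actually runs during the timed region is to register a `gc.callbacks` hook. Repeating the measurement and comparing the fastest and slowest runs also gives a rough idea of how much noise comes from frequency scaling or cache effects. This is only a sketch: `workload` and `measure` are hypothetical names, and `workload` stands in for whatever code is being compared.

```python
import gc
import time

collections = []  # generation number of each GC collection seen while measuring

def on_gc(phase, info):
    # gc calls every registered callback with phase "start" and then "stop"
    # around each collection; count each collection once, at its start.
    if phase == "start":
        collections.append(info["generation"])

def workload():
    # Hypothetical stand-in for the code being benchmarked.
    return [str(i) for i in range(100_000)]

def measure(func, repeat=5):
    """Time func() `repeat` times; return the timings and the GC collections seen."""
    collections.clear()
    gc.callbacks.append(on_gc)
    try:
        timings = []
        for _ in range(repeat):
            start = time.perf_counter()
            func()
            timings.append(time.perf_counter() - start)
    finally:
        # Always unregister the hook, even if func() raises.
        gc.callbacks.remove(on_gc)
    return timings, list(collections)

if __name__ == "__main__":
    timings, gcs = measure(workload)
    print(f"min {min(timings):.4f}s  max {max(timings):.4f}s  GC runs: {len(gcs)}")
```

A large gap between min and max with zero GC runs would point away from GC and toward the hardware effects mentioned above, though it doesn't tell you which one is responsible.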