On Mon, Jan 11, 2010 at 8:27 AM, Paul Brannan <pbrannan / atdesk.com> wrote:
> For one application where we deployed both JRuby and Ruby 1.8, JRuby was
> able to significantly outperform Ruby 1.8 in terms of total cpu
> utilization and latency. However, JRuby with the default GC settings
> still periodically became unresponsive for about 1-2 seconds throughout
> the running time of the application. Switching to an incremental GC
> made these gaps disappear, but by that point it was just an experiment
> for my own curiosity; the amount of effort to set up JRuby and tune the
> GC exceeded the time allocated to the project. In the end I went on a
> one-month vacation, and when I came back, the entire application had been
> rewritten in C++.

Some clarification of the pluggable GC stuff in the JVM (at least what I
know from Hotspot/OpenJDK...the other two major JVMs have their own
unique collectors).

* None of the major JVMs ever give heap space back to the system; if
they grow to 200MB, they'll never use less than 200MB. The justification
is that if you've run a process up to a certain size, you're likely to
need that size. It is possible to limit the growth using a few simple
flags that set minimum and maximum heap size.
* The GCs are not *live* swappable; you pick them at startup and that's
what you use for the lifetime of the process.
* There are many tunable settings, but most of those are also not live;
you set them at startup.
* Many of the settings can be left at defaults since Hotspot will try to
pick an appropriate GC and GC settings for your hardware (and will
adjust some GC behavior at runtime depending on how your application
behaves).

There are obviously way more tunables than most people ever need to set,
so generally running with the default "GC ergonomics" will allow Hotspot
to pick the best settings. It's not always perfect, of course, so having
the tunables is nice.
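To make the "you pick them at startup" point concrete, here is a rough sketch of a launcher that fixes the collector and heap bounds before the JVM starts. The helper name `jruby_command`, the `COLLECTOR_FLAGS` table, and the 64m/256m sizes are made up for illustration; the assumptions are that `-Xms`/`-Xmx` are the minimum and maximum heap flags meant above, and that JRuby forwards anything prefixed with `-J` to the JVM.

```ruby
# Hypothetical lookup table: symbolic collector name => HotSpot startup flag.
COLLECTOR_FLAGS = {
  serial:   "-XX:+UseSerialGC",
  parallel: "-XX:+UseParallelGC",
  cms:      "-XX:+UseConcMarkSweepGC"
}.freeze

# Hypothetical helper: builds the argv for a JRuby process whose collector
# and heap bounds are chosen once, at launch. Nothing here can change the
# collector of a JVM that is already running.
def jruby_command(script_args, collector: nil, min_heap: nil, max_heap: nil)
  jvm_flags = []
  jvm_flags << "-Xms#{min_heap}" if min_heap   # initial/minimum heap size
  jvm_flags << "-Xmx#{max_heap}" if max_heap   # hard ceiling on heap growth
  jvm_flags << COLLECTOR_FLAGS.fetch(collector) if collector

  # JRuby passes each -J-prefixed argument through to the JVM.
  ["jruby", *jvm_flags.map { |flag| "-J#{flag}" }, *script_args]
end

cmd = jruby_command(["-e", "puts 1"],
                    collector: :serial, min_heap: "64m", max_heap: "256m")
puts cmd.join(" ")

# To actually launch it (requires jruby on the PATH):
#   system(*cmd)
```

Leaving out all three keyword arguments yields a plain `jruby ...` command, which corresponds to letting the default GC ergonomics choose.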
I thought it would be interesting to get some timings and GC output for
the three main JVM GCs, given the OP's benchmark:

The serial GC is the basic single-threaded collector.

~/projects/jruby time jruby -J-XX:+UseSerialGC -e "(1..7).each{|n| a = ['a']*(10**n); a.inspect;}"

real 0m4.538s
user 0m4.359s
sys 0m0.324s

The Parallel GC uses multiple threads to reduce pauses. My system has 2
cores, so I believe it defaults to 2 GC threads...both this and the
Concurrent Mark/Sweep collector would be more interesting on a system
with more cores.

~/projects/jruby time jruby -J-XX:+UseParallelGC -e "(1..7).each{|n| a = ['a']*(10**n); a.inspect;}"

real 0m5.489s
user 0m5.383s
sys 0m0.485s

~/projects/jruby time jruby -J-XX:+UseParallelGC -J-XX:ParallelGCThreads=1 -e "(1..7).each{|n| a = ['a']*(10**n); a.inspect;}"

real 0m8.984s
user 0m5.793s
sys 0m0.605s

~/projects/jruby time jruby -J-XX:+UseParallelGC -J-XX:ParallelGCThreads=2 -e "(1..7).each{|n| a = ['a']*(10**n); a.inspect;}"

real 0m5.251s
user 0m5.132s
sys 0m0.510s

~/projects/jruby time jruby -J-XX:+UseParallelGC -J-XX:ParallelGCThreads=3 -e "(1..7).each{|n| a = ['a']*(10**n); a.inspect;}"

real 0m5.685s
user 0m5.146s
sys 0m0.658s

The Concurrent Mark-Sweep collector does the mark and sweep phases of GC
concurrent with program execution, but still stops the world for
compacting:

~/projects/jruby time jruby -J-XX:+UseConcMarkSweepGC -e "(1..7).each{|n| a = ['a']*(10**n); a.inspect;}"

real 0m4.860s
user 0m5.077s
sys 0m0.349s

The G1 "Garbage First" collector is a semispace collector that does not
have generations.
It is intended to eventually replace the CMS GC and provide more
predictable pauses (CMS can occasionally cause really long full GC
pauses under high load):

~/projects/jruby time jruby -J-XX:+UnlockExperimentalVMOptions -J-XX:+UseG1GC -e "(1..7).each{|n| a = ['a']*(10**n); a.inspect;}"

real 0m9.098s
user 0m9.709s
sys 0m1.076s

Here's a nice article on tuning GC for Java 5 (which is EOL, but the
article applies well to Java 6+):

http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html

It might help guide potential solutions for MRI's GC.

- Charlie