I need some help with optimizing a set of libraries that I use. They are ffi-rzmq, zmqmachine and rzmq_brokers (all up on github).

I ran a 'latency' test using the highest-level library (rzmq_brokers) and noticed that each network client I add to the test makes every iteration take another 300 microseconds to complete. From my testing, JRuby is far and away the fastest runtime executing my code, so I have used it to gather some profile statistics. Right now I have some local changes in my zmqmachine and rzmq_brokers repositories that haven't been pushed to github, so nobody else can reproduce what I am seeing yet. I am working right now to clean up those commits so that I can push them out.

I did two runs with JRuby using the --server option. The first run was with --profile to produce the flat profile results. The second run was with --profile.graph to produce the graph profile results.

https://gist.github.com/1472608

I would be grateful if another set (or several sets) of eyes could take a look at the profile results and suggest places I should focus on for reducing that 300 usec latency per client. I have already reduced it from 700 usec to 300 usec by replacing a regular Array with a SortedArray (where data is stored sorted so that lookups are fast). I'm hoping to get another 50% reduction.
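For anyone unfamiliar with the idea, here is a minimal sketch of what I mean by keeping the data sorted so lookups can use a binary search. This is only an illustration, not the actual SortedArray class in my repositories; the class name SortedArraySketch and its methods are made up for this post, and it assumes a Ruby that provides Array#bsearch_index (2.3 or later).

```ruby
# Hypothetical illustration only; not the SortedArray used in
# zmqmachine or rzmq_brokers.
class SortedArraySketch
  def initialize
    @items = []
  end

  # Insert a value at the position that keeps @items in ascending order.
  def insert(value)
    # Index of the first element >= value, or the end if there is none.
    index = @items.bsearch_index { |existing| existing >= value } || @items.size
    @items.insert(index, value)
    self
  end

  # Binary search: O(log n) comparisons instead of a linear scan.
  def include?(value)
    index = @items.bsearch_index { |existing| existing >= value }
    !index.nil? && @items[index] == value
  end

  # Return a copy of the contents in sorted order.
  def to_a
    @items.dup
  end
end

timers = SortedArraySketch.new
[500, 100, 300].each { |t| timers.insert(t) }
timers.include?(300)
```

The trade-off is that each insert still has to shift elements to make room, so this only pays off when lookups happen much more often than inserts.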

Thanks!

cr