A couple quick observations.

* I assume it's expected that the majority of time is spent in
zmq_poll, which is an FFI call to zmq. Excluding that from
consideration...

* This is bad:

 16.57        0.76       15.81     3079083  Class#new

Three million new classes created. I'm sure it's singleton classes somewhere,
but that's often a big perf sink...not necessarily because it impacts
straight-line code a lot, but because a class object is a nontrivial
thing to be creating three million of.

* Going down the list, I don't see anything obvious to optimize in
most of the hot methods. Usually when you have normalish-looking code
that's really hot, the next thing to optimize is reducing object
allocation as much as possible. In fact, that more than anything often
improves perf on JRuby...but I'm not sure of the effect on other impls.
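
To make the allocation point concrete, here's a rough sketch of the kind of change I mean. Everything in it is hypothetical (Envelope, checksum_allocating and checksum_reusing are made-up names, not code from ffi-rzmq, zmqmachine or rzmq_brokers); it only contrasts creating an object per message with reusing one object across the loop.

```ruby
# Illustrative only: Envelope and both methods are hypothetical names,
# not code from ffi-rzmq, zmqmachine, or rzmq_brokers.
class Envelope
  attr_reader :sequence, :payload

  def initialize
    reset(0, nil)
  end

  # Overwrite the fields in place so one instance can be reused.
  def reset(sequence, payload)
    @sequence = sequence
    @payload = payload
    self
  end
end

# Allocates one Envelope per message.
def checksum_allocating(payloads)
  total = 0
  payloads.each_with_index do |payload, i|
    envelope = Envelope.new.reset(i, payload)
    total += envelope.sequence + envelope.payload.bytesize
  end
  total
end

# Reuses a single Envelope for the whole loop.
def checksum_reusing(payloads)
  envelope = Envelope.new
  total = 0
  payloads.each_with_index do |payload, i|
    envelope.reset(i, payload)
    total += envelope.sequence + envelope.payload.bytesize
  end
  total
end
```

Both methods compute the same value; the second just creates one Envelope instead of one per message. Reuse like this is only safe when nothing holds on to the object after the iteration that filled it in, so it would need checking against how the real code passes messages around.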

- Charlie

On Tue, Dec 13, 2011 at 10:32 AM, Chuck Remes <cremes.devlist / mac.com> wrote:
> On Dec 13, 2011, at 9:57 AM, Chuck Remes wrote:
>
>> I need some help with optimizing a set of libraries that I use. They are ffi-rzmq, zmqmachine and rzmq_brokers (all up on github).
>>
>> I ran a 'latency' test using the highest-level library (rzmq_brokers) and noticed that, as I add each network client to the test, each iteration takes another 300 microseconds to complete. From my testing, JRuby is far and away the fastest runtime executing my code, so I have used it to gather some profile statistics. Right now I have some local changes in my zmqmachine and rzmq_brokers repositories that haven't been pushed to github, so another party can't currently reproduce what I am seeing. I am working right now to clean up those commits so that I can push them out.
>>
>> I did two runs with JRuby using the --server option. The first run was with --profile to produce flat profile results. The second run was with --profile.graph to produce the graph profile results.
>>
>> https://gist.github.com/1472608
>>
>> I would be grateful if another set (or several sets) of eyes could take a look at the profile results and suggest places I should focus on for reducing that 300 usec latency per client. I have already reduced it from 700 usec to 300 usec by replacing a regular Array with a SortedArray (where data is stored sorted so that lookups are fast). I'm hoping to get another 50% reduction.
>>
>> Thanks!
>
>
>
> In case someone wants to try and duplicate what I am doing, all of the code has been pushed up to github. I'm really hoping someone can look at the profile results and suggest 2-3 places that are ripe for optimization, but in the event you are feeling frisky, you can now reproduce the whole shebang on your own machine.
>
> The aforementioned code repositories are:
>
> git://github.com/chuckremes/ffi-rzmq.git
>
> git://github.com/chuckremes/zmqmachine.git
>
> git://github.com/chuckremes/rzmq_brokers.git
>
> In rzmq_brokers/test, run:
>
> ruby test_consensus.rb <any open port number, e.g. 5556>
>
> The test adds a new network client for each 10_000 iterations and prints the average round-trip latency of those 10k messages. As each client is added, latency goes up about 300 usec (on my machine, YMMV). I would like to reduce that number.
>
> cr
>