On Sat, 3 Sep 2005, Ara.T.Howard wrote:

> On Sat, 3 Sep 2005, Hugh Sasse wrote:
>
>> Wow, lots of good stuff already.  I'm collecting these on the web
>> (hope that's OK, attributed, slightly edited for brevity, no email
>> addresses)... The page, which needs the formatting tweaking some more, is
>> 
>> http://www.eng.cse.dmu.ac.uk/~hgs/ruby/performance/
>
> cool.

Thank you.  I've not tidied it up yet.
>
>>>  - don't write explicit IO : use mmap and let the kernel worry about it
>> 
>> I'm not familiar enough with mmap to understand this one :-)
>
> guy's abstraction is powerful, you just use it like a string:
>
>    harp:~ > cat a.rb
>    require 'mmap'
>    require 'tempfile'
>
         [...]
>    #
>    # modify it in place - letting the kernel do the IO
>    #
>    m = Mmap::new tmp.path, 'rw', Mmap::MAP_SHARED
>    m.gsub! %r/foobar/, 'barfoo'
>    m.msync
>    m.munmap
>
Nice.
        [...]
>
>    harp:~ > ruby a.rb
>    "barfoo\n"
>
> it's extremely powerful when you are changing, say, a scanline of pixels in
> the middle of a 40MB image but nothing else - the kernel will only read/write
> the blocks you need.

That's a great help.  I'll have to look into that, not that I do
much image processing these days.
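
For readers without the mmap extension installed, the "touch only the bytes
you need" idea can be approximated with the standard library. This is a
swapped-in technique, not mmap: it does explicit IO, but only on the region
being changed, and it only works when the replacement has the same length as
the original region. The file contents, offset and "scanline" below are
made-up stand-ins for illustration.

```ruby
require 'tempfile'

# Hypothetical stand-in for a large image file: 1000 filler bytes.
tmp = Tempfile.new('scanline')
tmp.binmode
tmp.write('.' * 1000)
tmp.close

offset   = 400        # assumed start of the region to change
scanline = '#' * 50   # replacement, same length as the region it overwrites

# Open read/write without truncating, jump to the region, overwrite it.
# The rest of the file is never read or rewritten by this script.
File.open(tmp.path, 'r+b') do |f|
  f.seek(offset)
  f.write(scanline)
end

data = File.binread(tmp.path)
puts data.size
puts data[395, 60]
```

Unlike the Mmap version quoted above, this cannot grow or shrink the region
(so no gsub! with a different-length replacement), which is part of what
makes the string-like mmap interface attractive.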
>
>>> 
>>>
>>>  - compose things out of arrays, hashes, and strings whenever possible.
>>>    these objects are pretty optimized in ruby and serialize great as yaml
>>>    meaning you never need to write hand parsers.
>> 
>
>> Do you mean, as opposed to creating new classes and composing with those?
>
> sometimes ;-)  i've just been surprised by the performance of ruby's built-ins
> lately.  for instance, rbtree used to be tons faster for big maps, but hashes
> don't do too badly now.  built-in strings share storage sometimes, etc.

Yes, I'd like to write some test cases for this exploration at some
point, so as future ruby versions appear we can do regression tests
on what is claimed to be quickest.
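
As a small illustration of the "compose from arrays, hashes and strings"
point, here is a minimal sketch of a structure built only from built-ins
making a round trip through YAML with no hand-written parser. The keys and
values are invented for the example.

```ruby
require 'yaml'

# Hypothetical settings, composed only of hashes, arrays, strings and numbers.
config = {
  'name'  => 'scanner',
  'sizes' => [1, 2, 3],
  'opts'  => { 'verbose' => true, 'ratio' => 0.5 }
}

text     = config.to_yaml    # serialize to a YAML string
restored = YAML.load(text)   # parse it back into plain Ruby objects

puts text
puts restored == config
```

A custom class would need extra work (or at least extra care when loading)
to survive the same round trip, which is one argument for sticking to the
built-ins where they fit.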
>
>>>
>>>  - cache anything possible, for instance methods dynamically generated
>>>    from method_missing : permanently add the method to the object instead
>>>    of generating it every time
>> 
>> I've tried to do this and found performance has degraded.  I'm not
>> sure why, but think this post on Redhanded
>> 
>> http://redhanded.hobix.com/inspect/theFullyUpturnedBin.html
>> 
>> (unfortunately mauled by poker spammers, <seethe/>), referencing
>> 
>> http://whytheluckystiff.net/articles/theFullyUpturnedBin.html
>> 
>> has something to do with it, and because of "Grab things in 8MB
>> chunks" in particular.   One case that stood out was caching results
>> of floating point calculations involving roots: caching was slower.
>
> sure.  it's just a guideline - i've run up against similar things too.  can't
> avoid testing in the end ;-)

Agreed.  I'm just wondering if the heuristic should be to cache
things of the order of megabytes, because having about 8 of them
will "trip the switch"?  I've not read the code and don't know
enough about GC to say much that's meaningful.
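
For reference, the method_missing caching tip discussed above can be
sketched roughly as follows. The Record class and its fields are
hypothetical; the point is only that the first call goes through
method_missing and defines a real method on the object, so later calls
bypass method_missing. Whether this is actually faster in a given program
still has to be measured.

```ruby
class Record
  def initialize(fields)
    @fields = fields
  end

  # First call for a known field lands here; we then add a real method
  # to this object so the next call never reaches method_missing.
  def method_missing(name, *args)
    key = name.to_s
    return super unless @fields.key?(key)
    define_singleton_method(name) { @fields[key] }
    @fields[key]
  end

  # Keep respond_to? consistent with what method_missing accepts.
  def respond_to_missing?(name, include_private = false)
    @fields.key?(name.to_s) || super
  end
end

r = Record.new('title' => 'mmap notes')
puts r.singleton_methods.include?(:title)   # not defined yet
puts r.title                                # goes through method_missing
puts r.singleton_methods.include?(:title)   # now cached on the object
```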

>
> -a

         Thank you,
         Hugh