John Pywtorak <jpywtora / calpoly.edu> wrote:

> matt neuburg wrote:
> ..
> > What I do is this:
> > 
> >     the_string.scan(/./).each do |char|
> > 
> > However, do note that, as others have said, in Ruby 1.9 this will no
> > longer be necessary (though it will still work). m.
> > 
> 
> FWIW, since it comes up from time to time, I was curious how much of a
> performance difference there was between using scan, split, and
> each_byte. The results surprised me. From my blog post, here is what I
> found:
> 
> irb(main):026:0> Benchmark.bm do |bm|
> irb(main):027:1* bm.report("split:") { 10000.times do a = "1234567890".split('') end }
> irb(main):028:1> bm.report(" scan:") { 10000.times do a = "1234567890".scan(/./) end }
> irb(main):029:1> bm.report("   eb:") { 10000.times do "1234567890".each_byte { |by| (a ||= []) << by } end }
> irb(main):030:1> end
>       user     system      total        real
> split:  0.320000   0.000000   0.320000 (  0.321568)
>  scan:  0.200000   0.000000   0.200000 (  0.210951)
>    eb:  0.260000   0.030000   0.290000 (  0.345428)
> 
> So, I am surprised that scan was faster. Did you guess that? I wonder if
> pre-compiling the regex will make it even faster.
> 
> irb(main):033:0> Benchmark.bm do |bm|
> irb(main):034:1* bm.report("split:") { 10000.times do a = "1234567890".split('') end }
> irb(main):035:1> bm.report(" scan:") { 10000.times do a = "1234567890".scan(rx) end }
> irb(main):036:1> bm.report("   eb:") { 10000.times do "1234567890".each_byte { |by| (a ||= []) << by } end }
> irb(main):037:1> end
>       user     system      total        real
> split:  0.280000   0.010000   0.290000 (  0.292449)
>  scan:  0.180000   0.000000   0.180000 (  0.180988)
>    eb:  0.280000   0.050000   0.330000 (  0.367461)
>
> It sure did, huh. As an aside, in the second edition of "The Ruby Way",
> Hal uses scan in the Strings chapter without mentioning split, but does
> show each_byte. I also wonder how the size of the string changes the
> benchmark.
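
For anyone who wants to rerun the comparison outside irb, here is a rough
sketch as a standalone script. It assumes that the `rx` in the second run
was simply /./ stored in a variable, since its definition is not shown
above; the method names are made up for illustration. Note that each_byte
yields integer byte values rather than one-character strings, so the third
variant does not build the same array as the other two. The sketch also
allocates that array once per call, outside the block.

```ruby
require 'benchmark'

# Hypothetical helper names; each returns the array it builds so the
# three approaches can be timed and compared separately.
def via_split(str)
  str.split('')
end

def via_scan(str, rx = /./)
  str.scan(rx)
end

def via_each_byte(str)
  bytes = []                          # allocated once per call
  str.each_byte { |by| bytes << by }  # by is an Integer, not a String
  bytes
end

rx  = /./            # assumption: stands in for the `rx` of the second run
str = "1234567890"

Benchmark.bm(6) do |bm|
  bm.report("split:") { 10_000.times { via_split(str) } }
  bm.report("scan:")  { 10_000.times { via_scan(str, rx) } }
  bm.report("eb:")    { 10_000.times { via_each_byte(str) } }
end
```

Changing `str` to something longer is one way to check whether string size
shifts the ranking; the timings will of course differ by machine and Ruby
version.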

I'd be curious to know what happens if you specify that this is a UTF-8
string. (Of course, in that case each_byte can't be used to iterate over
characters at all, since it yields individual bytes.) m.
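
To make the UTF-8 point concrete, here is a small sketch, assuming Ruby 1.9
or later, where strings carry an encoding. The helper names are invented
for the example. It shows that scan and each_char hand back whole
characters while each_byte splits a multibyte character into its bytes.

```ruby
# Hypothetical helpers for comparing character-wise and byte-wise iteration.
def by_scan(str)
  str.scan(/./mu)    # /u: UTF-8 pattern; /m: let . match newlines as well
end

def by_each_char(str)
  chars = []
  str.each_char { |c| chars << c }
  chars
end

def by_each_byte(str)
  bytes = []
  str.each_byte { |b| bytes << b }
  bytes
end

s = "caf\u00e9"      # "cafe" with an acute accent: 4 characters, 5 UTF-8 bytes

p by_scan(s).length        # 4
p by_each_char(s).length   # 4
p by_each_byte(s).length   # 5
p by_each_byte(s)          # [99, 97, 102, 195, 169]
```

The last two bytes, 195 and 169, together encode the single accented
character, which is why byte-wise iteration is the wrong tool here.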

-- 
matt neuburg, phd = matt / tidbits.com, http://www.tidbits.com/matt/