On Thu, 17 Jun 2004, Ruben wrote:

> The difference however is that the above code does the scan over the
> String several times (# of chars to check - 1) in C, while all the other
> code scans the String only once in ruby.
> 
> Probably there is a trade-off... maybe if you would wanna count all
> occurences of [a-zA-Z0-9], then you would have to do the
> 'count'-method 57 times (57 because the above code assumes that only
> the characters you count appear in the String), effectively scanning
> the String 57 times. This might be slower than scanning the String
> only once in ruby-code. (didn't test this though...)

There is a trade-off, but you may need to throw in some kanji or unicode
to get there.  With just [a-zA-Z0-9], a single scan in Ruby is still
slower.  Here are some timings:

              user     system      total        real
array:    0.880000   0.080000   0.960000 (  0.958946)
hash:     1.210000   0.070000   1.280000 (  1.330615)
count:    0.100000   0.000000   0.100000 (  0.091805)
delete:   0.030000   0.020000   0.050000 (  0.051760)

Both 'array' and 'hash' use #each_byte (M iterations in Ruby code),
'count' uses #count repeatedly (62*M iterations in C code), and 'delete'
uses #delete to partition the string into 5 substrings and #delete! to
count characters (5*M + 62*M/5 = about 17*M iterations in C code).

And the benchmark code:

#!/usr/bin/ruby -w

str = ['a'..'z','A'..'Z',0..9].map {|x| x.to_a}.flatten.join * 10000

require 'benchmark'

Benchmark.bm(8) do |bm|
  bm.report('array:') do
    a = Array.new(255,0);
    str.each_byte {|i| a[i] += 1};
    [?0..?9,?a..?z,?A..?Z].map {|x| x.to_a}.flatten.each {|i|
      "#{[i].pack("c")} #{a[i]}"
    };
  end
  bm.report('hash:') do
    c={}
    str.each_byte{|b|c[b]||=0; c[b]+=1}
    c.sort.each {|k,v| "#{k.chr} #{v}"}
  end
  bm.report('count:') do
    ['0'..'9','a'..'z','A'..'Z'].map {|x| x.to_a}.flatten.each do |i|
      "#{i} #{str.count(i)}"
    end
  end
  bm.report('delete:') do
    ['0'..'9','a'..'m','n'..'z','A'..'M','N'..'Z'].each do |range|
      s = str.delete("^#{range.first}-#{range.last}")
      range.each do |i|
        n = s.size
        s.delete!(i)
        "#{i} #{n - s.size}"
      end
    end
  end
end

__END__

-- 
Relm