On 12/28/05, ara.t.howard / noaa.gov <ara.t.howard / noaa.gov> wrote:
> On Thu, 29 Dec 2005, Johannes Friestad wrote:
>
>
> > Symbols are (or can be) quicker for hash lookup, since it is sufficient to
> > compare object identity to find whether two symbols are the same, while
> > strings must be compared character by character. You are unlikely to notice
> > the difference unless your program uses hashes heavily.
>
> i see this claim all the time but never data supporting it, all my test
> programs have shown the opposite to be true.
>

Along with Jim and Mauricio, my tests indicate that symbols are
consistently quicker, even on short strings.

Here's my benchmark:
-------
  def bmark_string_symb
    require 'benchmark'
    strings, symbols=[], []
    n, m=100, 1000
    hash={}
    # build n distinct string keys ("0key", "1key", ...)
    n.times {|x| strings<<x.to_s+"key"}
    strings.each {|s| symbols<<s.to_sym}
    # initialize hash
    strings.each {|s| hash[s]=1}
    symbols.each {|s| hash[s]=1}
    Benchmark.bm(10) do |b|
      b.report("string set") { m.times {|x| strings.each {|s| hash[s]=x}}}
      b.report("symbol set") { m.times {|x| symbols.each {|s| hash[s]=x}}}
      b.report("string get") { m.times {|x| strings.each {|s| hash[s]}}}
      b.report("symbol get") { m.times {|x| symbols.each {|s| hash[s]}}}
    end
  end
-------

and here are some results:
-------
irb(main):080:0> bmark_string_symb
                user     system      total        real
string set  0.219000   0.016000   0.235000 (  0.235000)
symbol set  0.141000   0.000000   0.141000 (  0.141000)
string get  0.078000   0.000000   0.078000 (  0.078000)
symbol get  0.047000   0.000000   0.047000 (  0.047000)
=> true
irb(main):083:0> bmark_string_symb
                user     system      total        real
string set  0.234000   0.000000   0.234000 (  0.235000)
symbol set  0.063000   0.000000   0.063000 (  0.062000)
string get  0.078000   0.000000   0.078000 (  0.078000)
symbol get  0.047000   0.000000   0.047000 (  0.047000)
=> true
-------


There's a fair amount of variation between runs, but symbols behave as
expected (quicker on average). So my guess that symbol lookup in
hashes was done on the basis of their string value was wrong.
I guess I should learn to refrain from speculating until I've checked
more closely :)
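
For what it's worth, the identity argument from the quoted text can be
checked directly. This is only a small sketch, not part of the benchmark
above, and the helper name identity_check is made up for illustration. It
builds two separate String objects with the same contents and compares
them, then compares the symbols they convert to:

```ruby
# Illustrative sketch: identity_check is a hypothetical helper name.
def identity_check(text)
  a = String.new(text)   # two distinct String objects, same contents
  b = String.new(text)
  {
    strings_equal:     a == b,                     # content comparison
    strings_identical: a.equal?(b),                # object identity
    symbols_identical: a.to_sym.equal?(b.to_sym)   # object identity
  }
end

p identity_check("0key")
```

The two strings compare equal by content but are different objects, while
both convert to the very same symbol object. That is consistent with the
timings above, though it says nothing by itself about how the hash
implementation actually compares keys.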

jf