Because I have recently been messing around with a Ruby syntax colorer,
I needed to know more about the performance of String#scan, and whether
there were faster alternatives. Of the three approaches I benchmarked,
String#scan seems to be the fastest.

Maybe this comes in handy for others.

--
Simon Strandgaard


bash-2.05b$ ruby h.rb
                          user     system      total        real
String#scan           0.810000   0.020000   0.830000 (  0.937981)
strscan               1.110000   0.040000   1.150000 (  1.255724)
homemade slicer       2.420000   0.130000   2.550000 (  2.648530)
true
true
bash-2.05b$ expand -t2 h.rb
require 'strscan'
def strscan(string, re)
  tokens = []
  ss = StringScanner.new(string)
  until ss.eos?
    m = ss.scan(re)
    break unless m
    tokens << m
  end
  tokens
end
def slicer(string, re)
  tokens = []
  while string.size > 0
    m = re.match(string)
    break unless m
    token = string.slice!(0, m.end(0))
    tokens << token
  end
  tokens
end
re_src = '\d+|\s+|.'
n = 10000
require 'benchmark'
Benchmark.bm(20) do |b|
  # Exercise String#scan
  re1 = Regexp.new(re_src)
  lines = IO.readlines(__FILE__)
  result1 = []
  GC.disable
  b.report("String#scan") do
    n.times do |i|
      result1 << lines[i%lines.size].scan(re1)
    end
  end
  GC.enable
  # Exercise strscan
  lines = IO.readlines(__FILE__)
  result2 = []
  GC.disable
  b.report("strscan") do
    n.times do |i|
      result2 << strscan(lines[i%lines.size], re1)
    end
  end
  GC.enable
  # Exercise homemade slicer
  re2 = Regexp.new('\A(?:'+re_src+')')
  lines = IO.readlines(__FILE__)
  result3 = []
  GC.disable
  b.report("homemade slicer") do
    n.times do |i|
      result3 << slicer(lines[i%lines.size].clone, re2)
    end
  end
  GC.enable
  # check that output was correct
  p((result1 == result2), (result1 == result3))
end
bash-2.05b$
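The sketch below is not part of the original benchmark. It runs the same
'\d+|\s+|.' pattern through all three tokenizers on one short string, so
you can see what they produce. The `strscan` and `slicer` helpers are
copied from h.rb. The sample string is an arbitrary example I picked.
StringScanner#scan only matches at the current scan pointer, so it can
take the unanchored pattern. The slicer needs the explicit \A anchor.

```ruby
require 'strscan'

# Same helper as in h.rb: StringScanner#scan matches only at the scan pointer.
def strscan(string, re)
  tokens = []
  ss = StringScanner.new(string)
  until ss.eos?
    m = ss.scan(re)
    break unless m
    tokens << m
  end
  tokens
end

# Same helper as in h.rb: cut each matched token off the front of the string.
def slicer(string, re)
  tokens = []
  while string.size > 0
    m = re.match(string)
    break unless m
    tokens << string.slice!(0, m.end(0))
  end
  tokens
end

re_src = '\d+|\s+|.'
re1 = Regexp.new(re_src)
re2 = Regexp.new('\A(?:' + re_src + ')')  # slicer needs the \A anchor

sample = "x = 42 + y\n"  # arbitrary example input (assumption)
a = sample.scan(re1)
b = strscan(sample, re1)
c = slicer(sample.dup, re2)  # dup, because slicer mutates its argument

p a  # prints ["x", " ", "=", " ", "42", " ", "+", " ", "y", "\n"]
p a == b, a == c  # prints true, then true
```

A plausible reason for the timings above: String#scan runs its whole
matching loop in C. The other two approaches return to Ruby once per
token, and the slicer also copies the remaining string with slice! on
every iteration.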