"David A. Black" <dblack / wobblini.net> schrieb im Newsbeitrag news:Pine.LNX.4.61.0412160543050.23221 / wobblini... > On Thu, 16 Dec 2004, Ardanwen wrote: > > > Hi all. > > > > I wrote a small script to count the number of occurrences of one string > > in another string (including somewhat overlapping occurrences). Then I > > found out about the .scan method, which speeded up some things, but > > unfortunately introduced the 'banana problem' -> ("banana".scan("ana") > > returns only one "ana") > > > > > > (This was the script I had before i found out about scan:) > > ------ > > $proteasome.each {|x| > > i = VIRUS #virus length > > begin > > i = virus[0,i + 4].rindex(x) > > count_prot += 1 if i != nil > > end until i == nil > > } > > ------ > > > > proteasome is an array filled with 12 strings ("00010" , "00100" , > > "10110" etc, > > virus is a long string of 1's and 0's (~3000 in total). > > > > > > Initially, I changed the following, which speeded up the whole program > > about 25% but suffered from the banana problem (most of the program's > > energy goes into the above algorithm) > > > > ------ > > $proteasome.each {|x| > > virus.scan(x) { > > count_prot += 1 > > } > > } > > ------ > > > > > > So I changed it into the following, and then optimized a little bit > > (first did virus.scan(/(?=#{x})/ which is a little slower then what I'm > > doing now) > > > > ------ > > $proteasome.each {|x| > > virus.scan(/#{x[0].chr}(?=#{x[1,4]})/) { > > count_prot += 1 > > } > > } > > ------ > > > > The sad thing is, the above is comparable in speed with the script I > > had before I found out about the scan method. Did I miss any obvious > > optimizations in the scanning/regexp method? > > Are you sure about the benchmarks? I got results that suggest that > the plain scan version is much faster, assuming I've adapted your > original code correctly: > > require 'benchmark' > include Benchmark > > str = "bananaaananaaana" * 100 > n = 5000 > > def boris(str) > c = 0 > i = str.size > begin > i = str[0,i + 2].rindex("ana") > c += 1 if i != nil > end until i == nil > c > end > > def david(str) > str.scan(/(?=ana)/).size > end > > bm do |x| > x.report { n.times { boris(str) } } > x.report { n.times { david(str) } } > end > > Results: > > $ ruby ban.rb > user system total real > 27.460000 0.490000 27.950000 ( 28.048449) > 8.790000 0.020000 8.810000 ( 8.800876) > > > David As far as I remember he didn't do the size trick your used. That might be one point. robert