"David A. Black" <dblack / wobblini.net> schrieb im Newsbeitrag
news:Pine.LNX.4.61.0412160543050.23221 / wobblini...
> On Thu, 16 Dec 2004, Ardanwen wrote:
>
> > Hi all.
> >
> > I wrote a small script to count the number of occurrences of one
string
> > in another string (including somewhat overlapping occurrences). Then I
> > found out about the .scan method, which speeded up some things, but
> > unfortunately introduced the 'banana problem' -> ("banana".scan("ana")
> > returns only one "ana")
> >
> >
> > (This was the script I had before i found out about scan:)
> > ------
> > $proteasome.each {|x|
> > i = VIRUS               #virus length
> > begin
> > i = virus[0,i + 4].rindex(x)
> > count_prot += 1 if i != nil
> > end until i == nil
> > }
> > ------
> >
> > proteasome is an array filled with 12 strings ("00010" , "00100" ,
> > "10110" etc,
> > virus is a long string of 1's and 0's (~3000 in total).
> >
> >
> > Initially, I changed the following, which speeded up the whole program
> > about 25% but suffered from the banana problem (most of the program's
> > energy goes into the above algorithm)
> >
> > ------
> > $proteasome.each {|x|
> > virus.scan(x) {
> > count_prot += 1
> > }
> > }
> > ------
> >
> >
> > So I changed it into the following, and then optimized a little bit
> > (first did virus.scan(/(?=#{x})/ which is a little slower then what
I'm
> > doing now)
> >
> > ------
> > $proteasome.each {|x|
> > virus.scan(/#{x[0].chr}(?=#{x[1,4]})/) {
> > count_prot += 1
> > }
> > }
> > ------
> >
> > The sad thing is, the above is comparable in speed with the script I
> > had before I found out about the scan method. Did I miss any obvious
> > optimizations in the scanning/regexp method?
>
> Are you sure about the benchmarks?  I got results that suggest that
> the plain scan version is much faster, assuming I've adapted your
> original code correctly:
>
>    require 'benchmark'
>    include Benchmark
>
>    str = "bananaaananaaana" * 100
>    n = 5000
>
>    def boris(str)
>       c = 0
>       i = str.size
>       begin
>         i = str[0,i + 2].rindex("ana")
>         c += 1 if i != nil
>       end until i == nil
>       c
>    end
>
>    def david(str)
>      str.scan(/(?=ana)/).size
>    end
>
>    bm do |x|
>      x.report { n.times { boris(str) } }
>      x.report { n.times { david(str) } }
>    end
>
> Results:
>
>    $ ruby ban.rb
>          user     system      total        real
>     27.460000   0.490000  27.950000 ( 28.048449)
>      8.790000   0.020000   8.810000 (  8.800876)
>
>
> David

As far as I remember he didn't do the size trick your used.  That might be
one point.

    robert