2010/6/24 Jesús Gabriel y Galán <jgabrielygalan / gmail.com>:
> On Thu, Jun 24, 2010 at 5:04 PM, Danny Challis <dannychallis / gmail.com> wrote:
>> Hello everyone,
>>   I need to count the number of times a substring occurs in a string.
>> I am currently doing this using the scan method, but it is simply too
>> slow.  I feel there should be a faster way to do this since the scan
>> method is really designed for more advanced things than this.  I do not
>> need to do regex matching or to process the matches, just count
>> substrings.  So what I want is something like this:
>>
>> s = "you like to play with your yo-yo"
>> s.magical_count_method("yo") => 4
>>
>> Once again, what I'm really looking for is something fast.  I've tried
>> using external linux commands such as awk, but that was much much
>> slower. Any ideas?
>
> I don't know how slow scan is for you. An implementation using
> String#index and a loop is a little bit faster, but not by much:
>
> require 'benchmark'
>
> TIMES = 100_000
> s = "you like to play with your yo-yo"
>
> Benchmark.bmbm do |x|
>   x.report("scan") do
>     TIMES.times do
>       s.scan("yo").size
>     end
>   end
>   x.report("while") do
>     TIMES.times do
>       index = -1
>       count = 0
>       while (index = s.index("yo", index+1))
>         count += 1
>       end
>       count
>     end
>   end
> end
>
> $ ruby scan_vs_while.rb
> Rehearsal -----------------------------------------
> scan    0.560000   0.020000   0.580000 (  0.585972)
> while   0.440000   0.060000   0.500000 (  0.492969)
> -------------------------------- total: 1.080000sec
>
>             user     system      total        real
> scan    0.510000   0.010000   0.520000 (  0.519078)
> while   0.470000   0.020000   0.490000 (  0.493562)
>
> Don't know if this is enough for you, probably not :-)

I took the liberty of extending the benchmark a bit:

http://gist.github.com/451622

I would have expected regexp to be faster...
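
The gist is not reproduced here. As a rough sketch of the kind of comparison being discussed, something along these lines could be used to put a regexp-based scan next to the string-based scan and the String#index loop quoted above. The method names are hypothetical and are not taken from the gist; timings will differ by Ruby version and machine.

```ruby
require 'benchmark'

# Hypothetical helper names, chosen only for this sketch.

# Count matches with String#scan and a String pattern.
def count_scan_string(s, sub)
  s.scan(sub).size
end

# Count matches with String#scan and a Regexp pattern.
def count_scan_regexp(s, re)
  s.scan(re).size
end

# Count matches with a String#index loop, as in the quoted benchmark.
# Note: this advances one character at a time, so it also counts
# overlapping occurrences, whereas scan counts non-overlapping ones.
def count_index(s, sub)
  count = 0
  index = -1
  while (index = s.index(sub, index + 1))
    count += 1
  end
  count
end

if __FILE__ == $PROGRAM_NAME
  times = 100_000
  s = "you like to play with your yo-yo"

  Benchmark.bmbm do |x|
    x.report("scan string") { times.times { count_scan_string(s, "yo") } }
    x.report("scan regexp") { times.times { count_scan_regexp(s, /yo/) } }
    x.report("index loop")  { times.times { count_index(s, "yo") } }
  end
end
```

For a substring such as "yo" that cannot overlap itself, all three variants return the same count, so only the timings differ.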

Cheers

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/