jweirich / one.net wrote:
>
>I did some timings on the three methods suggested by Aleksi.  Here is
>the code being timed ...
>
>     def try_slow(str)
>       s = str.dup
>       re  = /(\w+|[^\w]+)/
>       while md = re.match(s)
>         s = md.post_match
>       end
>     end
>
>     require 'strscan'  # StringScanner lives in the strscan library
>
>     def try_strscan(str)
>       scanner = StringScanner.new(str)
>       while scanner.scan(/\A(\w+|[^\w]+)/)
>       end
>     end
>
>     def try_scan(str)
>       str.scan(/(\w+|[^\w]+)/) { |match| }
>     end
>
>Here are the timing results ...
>
>str.length   try_slow      try_strscan         try_scan
>----------   --------      -----------         --------
>       8192     0.1608           0.0155           0.0322
>      16384     0.4768           0.0436           0.0798
>      32768     1.7181           0.0786           0.1302
>      65536     6.8811           0.1443           0.2700
>     131072    27.7065           0.2908           0.5257
>     262144        n/a           0.5604           1.0551
>     524288        n/a           1.1240           2.1114
>    1048576        n/a           2.2513           4.2188
>    2097152        n/a           4.4887           8.4449
>    4194304        n/a           9.1853          16.9174
>
>The "slow" method is O(n*n) as expected.  Both the "strscan" and the
>"scan" methods are O(n).

Both of the scan methods are incompatible with context-sensitive
tokenizing.  Guess what I want to do? :-)
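
To make that concrete, here is a rough sketch of the kind of thing I
mean.  The two states and the RULES table are made up for illustration
(they are not from any real grammar): which pattern applies next depends
on the tokens seen so far.  Note that it has the same match/post_match
shape as try_slow above, so I would expect it to share the O(n*n)
behaviour.

```ruby
# Hypothetical two-state tokenizer (illustration only).
# :normal - split into a quote, a word, or a run of other characters.
# :string - everything up to the closing quote is a single token.
RULES = {
  :normal => /\A(?:"|\w+|[^\w"]+)/,
  :string => /\A(?:"|[^"]+)/
}

def tokenize_with_state(str)
  s = str.dup
  state = :normal
  tokens = []
  # The regex is chosen per iteration from the current state.
  while md = RULES[state].match(s)
    tok = md[0]
    tokens << [state, tok]
    # A double quote flips between the two states.
    state = (state == :normal ? :string : :normal) if tok == '"'
    s = md.post_match   # remainder of the input, as in try_slow
  end
  tokens
end
```

Calling tokenize_with_state('say "hi there" now') keeps "hi there" as one
token in the :string state, whereas the single fixed pattern would break
it into "hi", " " and "there".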

Cheers,
Ben