jweirich / one.net wrote:
>
>I did some timings on the three methods suggested by Aleksi.  Here is
>the code being timed ...
>
>  def try_slow(str)
>    s = str.dup
>    re = /(\w+|[^\w]+)/
>    while md = re.match(s)
>      s = md.post_match
>    end
>  end
>
>  def try_strscan(str)
>    scanner = StringScanner.new(str)
>    while scanner.scan(/\A(\w+|[^\w]+)/)
>    end
>  end
>
>  def try_scan(str)
>    str.scan(/(\w+|[^\w]+)/) { |match| }
>  end
>
>Here are the timing results ...
>
>str.length   try_slow   try_strscan   try_scan
>----------   --------   -----------   --------
>      8192     0.1608        0.0155     0.0322
>     16384     0.4768        0.0436     0.0798
>     32768     1.7181        0.0786     0.1302
>     65536     6.8811        0.1443     0.2700
>    131072    27.7065        0.2908     0.5257
>    262144        n/a        0.5604     1.0551
>    524288        n/a        1.1240     2.1114
>   1048576        n/a        2.2513     4.2188
>   2097152        n/a        4.4887     8.4449
>   4194304        n/a        9.1853    16.9174
>
>The "slow" method is O(n*n) as expected.  Both the "strscan" and the
>"scan" methods are O(n).

Both of the scan methods are incompatible with context-sensitive
tokenizing.  Guess what I want to do? :-)

Cheers,
Ben
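
To make "context-sensitive tokenizing" concrete, here is a minimal sketch,
assuming a made-up two-mode grammar (plain text vs. double-quoted strings)
in which the pattern for the next token depends on what the previous token
was.  It uses neither scan method: it calls Regexp#match with a start
offset and a \G anchor, so no tail copy of the string is made at each step.
The names PATTERNS and tokenize_with_context are hypothetical and exist
only for this illustration, and nothing here has been timed.

```ruby
# Hypothetical two-mode tokenizer: the regex used for the next token
# depends on the mode left behind by the previous token.
PATTERNS = {
  :normal => /\G(?:"|\w+|[^\w"]+)/,  # a quote, a word, or a run of other chars
  :string => /\G(?:"|[^"]+)/         # the closing quote, or the string body
}

def tokenize_with_context(str)
  tokens = []
  mode = :normal
  pos = 0
  while pos < str.length
    # \G anchors the match at pos, so nothing before pos is rescanned
    md = PATTERNS[mode].match(str, pos)
    break unless md
    text = md[0]
    if text == '"'
      tokens << [:quote, text]
      # a quote toggles the context for the tokens that follow
      mode = (mode == :normal ? :string : :normal)
    else
      tokens << [(mode == :string ? :string_body : :text), text]
    end
    pos = md.end(0)  # offsets are relative to the whole string
  end
  tokens
end
```

Calling tokenize_with_context('say "hi there" now') keeps "hi there" as a
single :string_body token, whereas a single fixed pattern such as
/(\w+|[^\w]+)/ would split it at the space.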