I did some timings on the three methods suggested by Aleksi. Here is
the code being timed ...
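require 'strscan'

# Re-match against the remainder of the string after every token.
def try_slow(str)
  s = str.dup
  re = /(\w+|[^\w]+)/
  while (md = re.match(s))
    s = md.post_match
  end
end

# Advance a StringScanner over the string one token at a time.
def try_strscan(str)
  scanner = StringScanner.new(str)
  while scanner.scan(/\A(\w+|[^\w]+)/)
  end
end

# Let String#scan walk the whole string, discarding each match.
def try_scan(str)
  str.scan(/(\w+|[^\w]+)/) { |match| }
end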
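Before comparing speeds it is worth checking that the three approaches split a string the same way. The following is only a sketch: the tokens_by_* names and the TOKEN constant are made up for illustration, and the methods collect the tokens instead of discarding them as the timed versions do.

```ruby
require 'strscan'

# Same pattern as in the timed code, without the capture group.
TOKEN = /\w+|[^\w]+/

# Hypothetical variant of try_slow that keeps each token.
def tokens_by_match(str)
  tokens = []
  rest = str
  while (md = TOKEN.match(rest))
    tokens << md[0]
    rest = md.post_match
  end
  tokens
end

# Hypothetical variant of try_strscan that keeps each token.
def tokens_by_strscan(str)
  scanner = StringScanner.new(str)
  tokens = []
  while (tok = scanner.scan(TOKEN))
    tokens << tok
  end
  tokens
end

# Hypothetical variant of try_scan; without a block, String#scan
# returns the matches as an array.
def tokens_by_scan(str)
  str.scan(TOKEN)
end

sample = "foo, bar  baz!"
p tokens_by_match(sample)
p tokens_by_strscan(sample) == tokens_by_match(sample)
p tokens_by_scan(sample) == tokens_by_match(sample)
```

Every character is either a word character or not, so each match starts exactly where the previous one ended and all three loops see the same sequence of tokens.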
Here are the timing results ...
str.length   try_slow   try_strscan   try_scan
----------   --------   -----------   --------
      8192     0.1608        0.0155     0.0322
     16384     0.4768        0.0436     0.0798
     32768     1.7181        0.0786     0.1302
     65536     6.8811        0.1443     0.2700
    131072    27.7065        0.2908     0.5257
    262144        n/a        0.5604     1.0551
    524288        n/a        1.1240     2.1114
   1048576        n/a        2.2513     4.2188
   2097152        n/a        4.4887     8.4449
   4194304        n/a        9.1853    16.9174
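For anyone who wants to collect numbers like these, a harness along the following lines should do. It is a sketch under assumptions, not the script used for the table: sample_text, count_tokens and time_sizes are hypothetical helpers, the test string is just a repeated "word, " unit, and only the strscan approach is timed.

```ruby
require 'benchmark'
require 'strscan'

# Hypothetical helper: build a string of exactly `length` characters
# by repeating a short word-plus-punctuation unit.
def sample_text(length)
  unit = "word, "
  (unit * (length / unit.length + 1))[0, length]
end

# Modeled on try_strscan, but counts the tokens so the result can be checked.
def count_tokens(str)
  scanner = StringScanner.new(str)
  count = 0
  count += 1 while scanner.scan(/\w+|[^\w]+/)
  count
end

# Time the given block once per size; returns [size, seconds] pairs.
def time_sizes(sizes)
  sizes.map do |n|
    str = sample_text(n)
    [n, Benchmark.realtime { yield str }]
  end
end

rows = time_sizes([8192, 16384, 32768]) { |str| count_tokens(str) }
rows.each { |n, secs| printf("%10d   %8.4f\n", n, secs) }
```

A single run per size is noisy, so repeating each measurement and keeping the smallest time would give steadier figures.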
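The "slow" method is O(n*n) as expected: each doubling of the string
length roughly quadruples its time (1.7181, 6.8811, 27.7065),
presumably because every md.post_match copies the rest of the string.
Both the "strscan" and the "scan" methods are O(n): their times roughly
double with each doubling of the length, and "strscan" takes about half
as long as "scan" throughout.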
--
-- Jim Weirich jweirich / one.net http://w3.one.net/~jweirich
---------------------------------------------------------------------
"Beware of bugs in the above code; I have only proved it correct,
not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas)