pat eyler wrote:
> On Fri, Jan 9, 2009 at 6:20 PM, Clifford Heath <no / spam.please.net> wrote:
>> Robert Klemme wrote:
>>> I would have guessed that gsub! is fastest
>> It still might be - the benchmark doesn't run long enough to
>> compare the GC overhead of making dozens of little strings
>> that get used once each.
> Is this better?

No. Before I elaborate, I'm not saying I don't believe the result.
I'm saying your benchmark figure won't accurately represent the cost
in a long-running application.

All versions create an initial million strings - perhaps 40 MB.
Your computer has what, 2GB of memory? At what point does the GC run?

The gsub version creates a million extra strings.
The gsub! version creates perhaps no extra strings, perhaps a million.
The split version creates *six* million extra strings (one per word
and one from join).
The squeeze version creates two million (from squeeze, and from strip).

Now suppose the string in your real-life application is an HTML
document with a thousand white-space runs. How many extra strings
do the respective versions take? split makes a *billion*.

A benchmark environment must consider that the code being tested
will run in the context of an application where there are many
other objects created by all the *other* code - perhaps a thousand
times as many objects. At some point, the garbage collector is
likely to run. That takes time, and the time should be part of
the benchmark.

Try squeezing said HTML document a million times, and run the GC
inside each benchmark timer (after the n.times loop). Then I'll be
happy ;-).

Clifford Heath.
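For what it's worth, the suggestion above could be sketched roughly as follows. This is only an illustration under stated assumptions: the exact code of the four variants in the thread isn't shown here, so the lambdas below are guesses at their likely form, and `DOC`, `VARIANTS` and `run_benchmark` are hypothetical names. The one point it tries to show is the `GC.start` placed inside the timed block, after the `n.times` loop, so that each variant is charged for collecting its own garbage.

```ruby
require 'benchmark'

# Hypothetical stand-in for the HTML document: 1000 words separated by
# runs of three spaces, i.e. roughly a thousand white-space runs.
DOC = Array.new(1000) { |i| "word#{i}" }.join('   ').freeze

# Assumed forms of the four variants discussed in the thread.
# Note they are not strictly equivalent: squeeze(' ') only collapses
# spaces, and only split and squeeze trim leading/trailing whitespace.
VARIANTS = {
  'gsub'    => ->(s) { s.gsub(/\s+/, ' ') },
  'gsub!'   => ->(s) { s.gsub!(/\s+/, ' '); s }, # gsub! may return nil, so return s
  'split'   => ->(s) { s.split.join(' ') },
  'squeeze' => ->(s) { s.squeeze(' ').strip }
}

# Times each variant over n iterations and returns { name => seconds }.
def run_benchmark(n, doc = DOC)
  VARIANTS.each_with_object({}) do |(name, fn), results|
    GC.start # begin each variant from a freshly collected heap (untimed)
    results[name] = Benchmark.realtime do
      n.times { fn.call(doc.dup) } # dup: every version starts from a fresh string
      GC.start                     # timed: cleanup cost belongs to this variant
    end
  end
end

if __FILE__ == $PROGRAM_NAME
  # n is kept small here so the sketch finishes quickly; raise it
  # towards a million for figures closer to what the post asks for.
  run_benchmark(1_000).each do |name, secs|
    puts format('%-8s %.3fs', name, secs)
  end
end
```

A single forced collection at the end of each timer is still only an approximation of a long-running application, where the collector runs whenever the heap fills and has all the *other* code's objects to walk as well.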