Daniel Berger wrote:
> One thing you can do is run your code through the profiler (ruby -r
> profile yourcode.rb) and show us the results.

I've been running lots of tests. Unfortunately, the profiler adds a 
large O(n) component to the runtime, so the nonlinear behavior doesn't 
emerge until larger n. (n is the number of tags in the input XML file.)
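One way to look for the nonlinearity without the profiler's O(n) overhead is to time whole runs at several sizes and compare the time per tag. If that number climbs as n grows, the total is superlinear. Below is a minimal sketch using Ruby's standard Benchmark library. The generated document shape and the helper names make_xml and time_per_tag are hypothetical and are not taken from my actual tests.

```ruby
require 'benchmark'

# Hypothetical test input: a root element holding n identical child tags.
def make_xml(n)
  "<root>" + ("<item>x</item>" * n) + "</root>"
end

# Runs the given block once per size and returns [n, seconds_per_tag] pairs.
# Wall-clock timing only, so no profiler overhead is added.
def time_per_tag(sizes)
  sizes.map do |n|
    input = make_xml(n)
    secs = Benchmark.realtime { yield input }
    [n, secs / n]
  end
end
```

For example: `time_per_tag([1000, 10_000, 50_000]) { |xml| REXML::Document.new(xml) }`, after `require 'rexml/document'`. The profile above shows IOSource in use, so passing an IO object such as a File instead of a String is probably closer to the setup being measured.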

I have a lot of data now and I'll have to organize it a little. The 
method call counts look fine: they are all linear in n (with a small
constant offset). One thing that does jump out, however, is that as n
goes from 1000 to 50000, the time per call for String#strip increases
by a factor of about 10. For n = 1000, String#strip accounted for about
6% of the total runtime; for n = 50000, it's about 30%. (Without the
profiler, the percentages would be even higher.) Since the number of
calls is only linear in n, a growing time per call must mean
String#strip is being called on longer strings for larger n.
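The toy model below shows why longer strip arguments turn into superlinear total time. It assumes, as the numbers above suggest, that a strip call costs time proportional to the length of the string it's called on. It counts the characters handed to strip as a proxy for that cost. ToySource is a hypothetical, deliberately simplified pull source and is NOT REXML's code. Its empty? check strips the whole remaining buffer, and each pull consumes one tag.

```ruby
# Hypothetical toy source (not REXML): empty? strips the entire remaining
# buffer, and pull removes one tag (or one run of text) from the front.
class ToySource
  attr_reader :strip_chars

  def initialize(xml)
    @buffer = xml.dup
    @strip_chars = 0 # running total of characters passed to String#strip
  end

  def empty?
    @strip_chars += @buffer.length
    @buffer.strip.empty?
  end

  def pull
    @buffer.sub!(/\A\s*(<[^>]*>|[^<]+)/, "")
  end
end

# Total strip work for a document of n seven-character tags.
def strip_work(n)
  src = ToySource.new("<item/>" * n)
  src.pull until src.empty?
  src.strip_chars
end
```

Here strip sees buffers of length 7n, 7(n-1), ..., 7, 0, so the total is 7n(n+1)/2, and doubling n roughly quadruples the work. The observed factor of about 10 over a 50x increase in n is smaller than this worst case predicts, so the real buffer probably doesn't hold the whole remaining document. The model only illustrates how per-call strip cost that grows with n makes the total grow faster than n.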

String#strip is called exactly the same number of times as:

REXML::Parsers::BaseParser#pull
REXML::Parsers::BaseParser#has_next?
REXML::Parsers::BaseParser#empty?
REXML::IOSource#empty?
REXML::Source#empty?

IOSource#empty? and Source#empty? both call String#strip directly, so I
suspect that's where the problem is.
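If the emptiness test really is something like `buffer.strip.empty?`, one possible alternative is to ask whether the buffer contains any non-whitespace character at all. An unanchored /\S/ search stops at the first non-whitespace character, so for a buffer that starts with a tag it returns almost immediately, and it never builds a new String. This is a minimal sketch under that assumption. The method name blank_without_strip? is hypothetical, and whether it is a correct drop-in for REXML's empty? methods (which may also check the underlying IO) is untested.

```ruby
# Hypothetical helper: true when str is empty or contains only whitespace,
# without allocating a stripped copy as str.strip.empty? does.
def blank_without_strip?(str)
  str !~ /\S/
end
```

One caveat: some Ruby versions' strip also removes NUL characters, which /\S/ treats as non-whitespace, so the two tests can disagree on strings containing "\0".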

Steve