Hello Steven,

Sunday, February 1, 2004, 6:17:27 PM, you wrote:

SJ> I've repeated all the tests with Ruby 1.8.0 and it sure looks like
SJ> String#strip is the problem.

SJ> To summarize, I have a bare REXML stream parser that reads its input
SJ> file and does nothing. I'm handing it an input file that looks like this

SJ> <?xml version="1.0" ?>
SJ> <multistatus>
SJ> <response>
SJ> <dsref>0</dsref>
SJ> </response>
SJ> <response>
SJ> <dsref>1</dsref>
SJ> </response>
SJ> ...
SJ> </multistatus>

SJ> I'm varying the number of <response> elements in the file from 100 to
SJ> 50000 and calling that n. In Ruby 1.8.0, the runtime is roughly 
SJ> proportional to n, as it should be. In 1.8.1, the total runtime is
SJ> greater (which is bad but not pathological) and worse than linear in n
SJ> (which is pathological).
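
For anyone who wants to reproduce the measurement, here is a minimal
sketch of a harness, not the script Steven used. The names build_input
and seconds_per_element are made up for illustration; the block you pass
in is whatever parser call you want to time.

```ruby
# Build an input document with n <response> elements,
# in the same shape as the file quoted above.
def build_input(n)
  body = (0...n).map { |i| "<response>\n<dsref>#{i}</dsref>\n</response>\n" }.join
  "<?xml version=\"1.0\" ?>\n<multistatus>\n#{body}</multistatus>\n"
end

# Run the given block once per size and return [n, seconds per element] pairs.
# A linear parser should give roughly the same per-element time for every n.
def seconds_per_element(sizes)
  sizes.map do |n|
    xml = build_input(n)
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield xml
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
    [n, elapsed / n]
  end
end
```

To time REXML, the block would call REXML::Document.parse_stream(xml, listener)
with a listener object that includes REXML::StreamListener and overrides nothing.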

SJ> Here's the profile data for strip over all values of n:

SJ>                 time/call     % total runtime
SJ>                min    max     min      max

SJ> Ruby 1.8.0    .01    .02      .82     2.48
SJ> Ruby 1.8.1    .10   1.21     5.08    29.04

SJ> In both cases, the number of calls is proportional to n, although strip
SJ> is called 2.5 times more often in 1.8.0 than 1.8.1. In 1.8.1 (only),
SJ> there is a strong increasing trend in the time per call as n increases.

SJ> I don't suspect the problem is in strip itself. I suspect the problem is
SJ> that REXML creates a string that grows as more input is read and hands
SJ> it repeatedly to strip. I looked briefly at the REXML code and didn't
SJ> see anything obvious. (Other than the fact that calling strip on a
SJ> buffer just to see whether it has any tokens in it is inefficient. That
SJ> just makes it slow, however, not nonlinear--unless the buffer is growing.)
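
The growing-buffer suspicion can be illustrated with a toy model. This is
not REXML's code; it only assumes that each strip call costs time
proportional to the length of the buffer it is given, and it counts
characters instead of measuring time. Both method names are hypothetical.

```ruby
# Model 1: the buffer keeps everything read so far, and strip is called
# on the whole buffer after every chunk (the suspected 1.8.1 behaviour).
def chars_scanned_growing(chunks)
  buffer = String.new
  total = 0
  chunks.each do |chunk|
    buffer << chunk
    total += buffer.length   # assumed cost of one strip call on this buffer
    buffer.strip.empty?      # the "does it hold any tokens?" check
  end
  total
end

# Model 2: the buffer is consumed after each chunk, so strip only ever
# sees the newest chunk.
def chars_scanned_consumed(chunks)
  total = 0
  chunks.each do |chunk|
    buffer = chunk.dup
    total += buffer.length
    buffer.strip.empty?
  end
  total
end

chunks = Array.new(100, "<r/>\n")
puts chars_scanned_growing(chunks)   # 25250, grows with the square of the chunk count
puts chars_scanned_consumed(chunks)  # 500, grows linearly
```

Under this assumption the same number of strip calls gives linear total
work in one case and quadratic in the other, which would match a time per
call that rises with n.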

SJ> I hope this is enough for someone to diagnose and fix the problem. I'll
SJ> send the data to anyone who wants it.

You are the only one talking in this thread. Do you really think that
someone else will do your job? Do it for free? Welcome to the world of
open source software (call it software that is not actively supported).
Write a personal email to the REXML author and hope he responds; otherwise
fix it and hope he adds it to the code; otherwise publish your source
patch and hope more people use it than the original branch; otherwise repeat
the same thing every few months for as long as you need this library.

Yes, no doubt, I'm not an open source fan.

-- 
Best regards,
 Lothar                            mailto:mailinglists / scriptolutions.com