Ari Brown wrote:
> Pattern matching problem. This time, it doesn't print out any thing 
> and just soaks up my CPU. I tried slowly adding more and more for it 
> to do, and it worked great -- until TABLE7. Then it just soaks up my 
> CPU and makes me cry. At first, when nothing was printing, I added 
> $stdout.flush to make it print. But it didn't print! This makes me 
> think that it's something in the when part.
>
> Whats going on?
>
> Help!
>
>
> lines.each do |line|
>   case line
>   when 
> /^"(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)"$/ 
>
There are several ways to optimize the regular expression, but the most 
important thing is not to be greedy. What I mean by this is that each 
(.*) matches everything to the end of the line, and then the regular 
expression engine backtracks to find the next " character the pattern 
requires. It first settles on the " character closest to the end of the 
line, but that is not the one you want, so it backtracks again and 
again. With this many greedy groups, a line that does not fit the 
pattern exactly can force an enormous number of attempts, wasting CPU 
cycles.
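
To see the greedy behaviour on a small scale, here is a rough sketch using a two-field version of your pattern. The constant GREEDY_ROW, the helper greedy_captures and the sample line are made up for illustration; they are not from your script.

```ruby
# Two-field version of the greedy pattern, shortened for illustration.
GREEDY_ROW = /^"(.*)","(.*)"$/

# Hypothetical helper: returns the captured groups, or nil if no match.
def greedy_captures(line)
  m = GREEDY_ROW.match(line)
  m ? m.captures : nil
end

# The sample line has three fields, but the pattern only asks for two.
# The first (.*) runs to the end of the line and backs up only as far
# as the LAST "," separator, so it swallows two fields at once.
first, second = greedy_captures('"a","b","c"')
puts first    # a","b
puts second   # c
```

The match still succeeds here, only with the wrong field boundaries. With many groups and a line that cannot match at all, every group gets to retry every one of those boundaries.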

Instead of being greedy and using "(.*)", your best bet would be to use 
"([^"]*)". This assumes that there are no " characters within each 
field. This stops the regex from getting past the next " character of 
each field and eliminates all that backtracking.
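
Something along these lines should work as a sketch. FIELD_COUNT, QUOTED_FIELD, ROW_PATTERN and parse_row are names I made up, and 32 is only my assumed field count, so set it to whatever your data really has. Building the pattern in code also saves typing the same group over and over.

```ruby
# Assumed number of quoted fields per line; adjust to match your data.
FIELD_COUNT = 32

# One quoted field whose contents cannot run past the closing quote.
# Assumes no " characters appear inside a field.
QUOTED_FIELD = '"([^"]*)"'

# Builds ^"([^"]*)","([^"]*)",...$ with FIELD_COUNT groups.
ROW_PATTERN = Regexp.new('^' + ([QUOTED_FIELD] * FIELD_COUNT).join(',') + '$')

# Hypothetical helper: returns an array of field strings, or nil when
# the line does not have exactly FIELD_COUNT quoted fields.
def parse_row(line)
  m = ROW_PATTERN.match(line)
  m ? m.captures : nil
end

# Made-up sample line: "v1","v2",...,"v32"
sample = (1..FIELD_COUNT).map { |i| %("v#{i}") }.join(',')
fields = parse_row(sample)
puts fields.length
puts fields.first
```

A line with the wrong number of fields now fails quickly, because each group has only one place where it can stop.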

Alternatively, you could look at splitting the line on the comma (see 
http://www.ruby-doc.org/core/classes/String.html#M000818) and end up 
with a nice array to reference each field. You'll still have the quotes 
that you'll need to strip from each item (unless you use the three 
character separator of "," and manually remove the leading " character 
from the first element and the trailing " character from the last 
element). This will likely be the fastest way, since no large regex 
needs to be evaluated. However, you may need to add more logic if some 
lines in the text file, such as comment lines, should not be split.
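
As a rough sketch of the split approach with the three character separator: split_quoted is a hypothetical helper name, and like the regex above it assumes no field contains a " character (so a field can never contain the "," sequence either).

```ruby
# Hypothetical helper: splits a line of the form "a","b","c" into its
# fields without running one large regex over the whole line.
def split_quoted(line)
  # A limit of -1 keeps trailing empty fields instead of dropping them.
  parts = line.chomp.split('","', -1)
  return parts if parts.empty?

  # Remove the leading " from the first element ...
  parts[0] = parts[0][1..-1] if parts[0].start_with?('"')
  # ... and the trailing " from the last element.
  parts[-1] = parts[-1][0...-1] if parts[-1].end_with?('"')
  parts
end

p split_quoted(%("a","","c"\n))
```

In your loop you would call this on each line and skip any lines that should not be split before calling it.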

Regards,
Jim