On Sat, 29 Oct 2005, James Edward Gray II wrote:

> On Oct 28, 2005, at 9:58 AM, Ara.T.Howard wrote:
>
>> do you have a better reference for the re?  or is that the one?
>
> This is a good-sized section of the book that goes into much detail.
> I'm going to try to show the simplest solution he claims is fully functional
> here, and I've trimmed even that.  Just understand that this isn't his
> complete solution.  I'll use whitespace and comments liberally in the hope
> of making it easier to follow:
>
> (?: ^|, )  # fields must be at the beginning of the string, or after a comma
> (?:        # now match either...
>     "      # a double-quoted field...
>         (?: [^"] | "" )*
>     "
> |          # or some non-quote/non-comma text...
>     [^",]*
> )
>
> He goes on to dramatically optimize that, which leads me to a few questions:
>
> 1.  Does Ruby support \G?

i dunno.  but isn't perl's m//g exactly equivalent to ruby's s.scan(%r//) ?
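
For the archive, here is a minimal sketch of driving the quoted pattern with String#scan. The two capture groups and the helper name split_csv_line are assumptions of this sketch, not part of the pattern as posted. It is only meant to show the mechanics on a single line of input and is not a full CSV parser.

```ruby
# The simplified pattern from the quoted message, with two capture groups
# added (an assumption of this sketch) so that scan returns the field text.
CSV_FIELD = /
  (?: ^ | , )                    # start of the string, or after a comma
  (?:
    " ( (?: [^"] | "" )* ) "     # group 1: body of a double-quoted field
  |
    ( [^",]* )                   # group 2: unquoted text
  )
/x

# Hypothetical helper: split one line of CSV into an array of fields.
def split_csv_line(line)
  line.scan(CSV_FIELD).map do |quoted, plain|
    # A doubled quote inside a quoted field stands for one literal quote.
    quoted ? quoted.gsub('""', '"') : plain
  end
end

p split_csv_line('a,"b,c","say ""hi""",,d')
```

One caveat, as far as I can tell: scan steps forward one character after an empty match, so a line whose first field is empty (a leading comma) would not come out right with this pattern as written.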

> 2.  What about atomic grouping like Perl's (?> ... )?

yes.
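
A small sketch of what atomic grouping changes, using a toy pattern that has nothing to do with CSV: once the atomic group has matched, the engine will not backtrack into it.

```ruby
greedy = /a+ab/        # a+ can give back one "a" so that "ab" still matches
atomic = /(?>a+)ab/    # a+ keeps every "a" it took, so "ab" can never match

greedy_pos = ('aaab' =~ greedy)   # index of the match, or nil
atomic_pos = ('aaab' =~ atomic)

p greedy_pos
p atomic_pos
```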

> Obviously, this gets more complicated or possibly even falls apart if you
> start changing the delimiters, which I believe the library does allow.
> Still, maybe we could use this by default in the common case...  if it even
> turns out to be faster.
>
> I hope that answers your questions.

that expression doesn't strip whitespace, which csv parsers are supposed to do
outside of quotes, but that's easily fixed.
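
One way the fix might look, assuming stripping is wanted at all: let the quoted alternative absorb whitespace on either side of the quotes, and strip unquoted fields after the match. The capture groups and the helper name split_csv_line_stripped are assumptions of this sketch.

```ruby
# Variant of the quoted pattern that tolerates whitespace around a
# double-quoted field. Capture groups are an addition of this sketch.
CSV_FIELD_WS = /
  (?: ^ | , )                          # start of the string, or after a comma
  (?:
    \s* " ( (?: [^"] | "" )* ) " \s*   # group 1: quoted body, outer space dropped
  |
    ( [^",]* )                         # group 2: unquoted text, stripped later
  )
/x

# Hypothetical helper: split one line, stripping whitespace outside quotes.
def split_csv_line_stripped(line)
  line.scan(CSV_FIELD_WS).map do |quoted, plain|
    # Whitespace inside the quotes is kept, whitespace outside is removed.
    quoted ? quoted.gsub('""', '"') : plain.strip
  end
end

p split_csv_line_stripped('a , "b c" , d')
```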

can you post the optimized re?

regards.

-a
-- 
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| anything that contradicts experience and logic should be abandoned.
| -- h.h. the 14th dalai lama
===============================================================================