On Fri, 28 Oct 2005, James Edward Gray II wrote:

> On Oct 28, 2005, at 8:53 AM, Ara.T.Howard wrote:
>
>> anyhow - thanks for doing this.  fyi, i've used the following approach many
>> times for loading huge csv files in an attempt to squeeze out speed - it
>> works.  the approach is simple:
>
> I'm wondering if we're going about this all wrong.  Mastering Regular
> Expressions has a single regex that parses CSV, properly handling escapes.
> Has anyone tried porting that?

i haven't looked at it - but isn't that impossible, by definition, since
parsing csv is context-sensitive once quoting is considered?  can you post
the one you are referring to?  i found this one online, which supposedly comes
from the book...

   http://www.unix.org.ua/orelly/perl/cookbook/ch01_16.htm

but it's not even close - failing for embedded commas:

   harp:~ > cat a.rb
   re = %r/
     "([^\"\\]*(?:\\.[^\"\\]*)*)",?
        |  ([^,]+),?
        | ,
   /x

   valid =
     %Q(
       "span
       line", 42
     ),
     %Q(
       "embedded, comma", 42
     ),
     %Q(
       "span
       line with embedded, comma", 42
     )


   valid.each{|line| line.scan(re){|field| p field}; puts}
   harp:~ > ruby a.rb
   [nil, "\n    \"span\n    line\""]
   [nil, " 42\n  "]

   [nil, "\n    \"embedded"]
   [nil, " comma\""]
   [nil, " 42\n  "]

   [nil, "\n    \"span\n    line with embedded"]
   [nil, " comma\""]
   [nil, " 42\n  "]
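for comparison, here is a rough ruby sketch of how the recipe at the url above drives that same pattern: it keeps whichever group matched on each pass (perl's $+) and appends one more empty field when the record ends in a comma.  parse_csv is a hypothetical name, and this is my port, not code from the book.  note that the pattern assumes every field starts right after the previous comma.  in the test strings above each opening quote is preceded by a newline and spaces, so the match falls through to the ([^,]+) branch, which would explain the split fields.

```ruby
# rough ruby port of the perl cookbook's parse_csv loop (assumption: my translation);
# parse_csv is a hypothetical name used for illustration
CSV_FIELD = /
    "([^"\\]*(?:\\.[^"\\]*)*)",?   # quoted field, backslash escapes allowed inside
  | ([^,]+),?                      # unquoted field
  | ,                              # empty field
/x

def parse_csv(text)
  fields = []
  # each match yields [quoted, bare]; at most one is non-nil,
  # and both are nil for an empty field (like perl's $+ being undef)
  text.scan(CSV_FIELD) { |quoted, bare| fields << (quoted || bare) }
  # a trailing comma means one more (empty) field
  fields << nil if text.end_with?(",")
  fields
end

p parse_csv(%Q("embedded, comma",42))                 # => ["embedded, comma", "42"]
p parse_csv(%Q("span\nline with embedded, comma",42)) # => ["span\nline with embedded, comma", "42"]
p parse_csv("a,,b,")                                  # => ["a", nil, "b", nil]
p parse_csv(' "embedded, comma", 42')                 # => [" \"embedded", " comma\"", " 42"]
```

escape sequences inside quotes are left in place (the pattern only skips over them), and leading whitespace is not trimmed, so padded input like the last call still splits on the embedded comma.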


do you have a better reference for the regex?  or is that the one you meant?

-a
-- 
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| anything that contradicts experience and logic should be abandoned.
| -- h.h. the 14th dalai lama
===============================================================================