On Aug 16, 2007, at 7:44 PM, William James wrote:

>
> James Edward Gray II wrote:
>> On Aug 16, 2007, at 2:35 PM, William James wrote:
>>
>>> This is the best I've come up with so far.  It should handle any CSV
>>> record
>>> (i.e., fields may contain commas, double quotes, and newlines).
>>>
>>> class String
>>>   def csv
>>>     if include? '"'
>>>       ary =
>>>         "#{chomp},".scan( /\G"([^"]*(?:""[^"]*)*)",|\G([^,"]*),/ )
>>>       raise "Bad csv record:\n#{self}"  if $' != ""
>>>       ary.map{|a| a[1] || a[0].gsub(/""/,'"') }
>>>     else
>>>       ary = chomp.split( /,/, -1)
>>>       ##   "".csv ought to be [""], not [], just as
>>>       ##   ",".csv is ["",""].
>>>       if [] == ary
>>>         [""]
>>>       else
>>>         ary
>>>       end
>>>     end
>>>   end
>>> end
>>
>> You are pretty much rewriting FasterCSV here.  Why do that when we
>> could just use it instead?
>
>
> That is a dishonest comment.

Not honest?  I guess I'm not sure how you meant that.

FasterCSV's parser uses a very similar regular expression.  Quoting  
from the source:

     # prebuild Regexps for faster parsing
     @parsers = {
       :leading_fields =>
         /\A(?:#{Regexp.escape(@col_sep)})+/,     # for empty leading fields
       :csv_row        =>
         ### The Primary Parser ###
         / \G(?:^|#{Regexp.escape(@col_sep)})     # anchor the match
           (?: "((?>[^"]*)(?>""[^"]*)*)"          # find quoted fields
               |                                  # ... or ...
               ([^"#{Regexp.escape(@col_sep)}]*)  # unquoted fields
               )/x,
         ### End Primary Parser ###
       :line_end       =>
         /#{Regexp.escape(@row_sep)}\z/           # safer than chomp!()
     }

I felt they were similar enough to say you were recreating it.  I
can live with it if you don't agree, though.

> What if someone had said to you when you released "FasterCSV":
> "You are pretty much rewriting CSV here.  Why do that when we
> could just use it instead?"

They did.  I said it was too slow and I didn't care for the
interface, though some do prefer it.  That's pretty much what you
just said to me, so I look forward to using your EvenFasterCSV
library on my next project.

> Parsing CSV isn't very difficult.

Yeah, it's not too tough.
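
The fiddly parts are all in the edge cases, though.  This made-up
record, for example, is one row of exactly two fields:

     "Gray, James","He said ""hi""
     and then left"

The comma inside the quotes, the doubled quotes, and the embedded
newline all have to survive the trip.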

I'm a little bothered by how your solution makes me slurp the data
into a String, though.  Today I was working with a CSV file with
over 35,000 records in it, so I'm not too comfortable with that.
You might consider adding a little code to ease that.
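
For what it's worth, here's roughly how I stream rows with
FasterCSV instead of slurping (an untested sketch; "data.csv" is
just a stand-in path):

     require "rubygems"
     require "faster_csv"

     # hand rows to the block one at a time, instead of loading
     # all 35,000 records into memory at once
     FasterCSV.foreach("data.csv") do |row|
       # row is an Array of fields here
     end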

Also, I really prefer to work with CSV by headers instead of
column indices.  That's easier and more robust, in my opinion.
You might want to add some code for that too.
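
Again, that's a one-option change in FasterCSV (same sketch, same
made-up file; the "name" header is invented too):

     require "rubygems"
     require "faster_csv"

     # :headers => true treats the first row as field names
     FasterCSV.foreach("data.csv", :headers => true) do |row|
       puts row["name"]   # index fields by header, not position
     end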

Of course, then we're just getting closer and closer to FasterCSV, so  
maybe not...

> "FasterCSV" is too slow and far too large.

FasterCSV is mostly interface code to make the user experience as  
nice as possible.  There's also a lot of documentation in there.  The  
core parser is still way smaller than the standard library's parser.

James Edward Gray II