Ryan Davis wrote:
> On Jan 31, 2005, at 9:40 PM, William James wrote:
>
> > A small, fast, and (I think) complete csv parser.
>
> There is test_csv.rb in the ruby tarball. Can you run your new code
> against it to make sure it is complete? With good profile numbers I
> doubt it'd be hard to get the slower code replaced.

Wow. test_csv.rb is beyond my comprehension. I don't know how
to use it.

I did lift a very complex test string from it to use in testing
my program.  One of the fields in that csv string is defective;
I don't know whether that was intentional or not:

"\r\n"\r\nNaHi,

The " in the field isn't doubled, and the field doesn't end
with a quote.
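
To make the defect concrete, here is a small standalone check,
separate from the program below. It assumes that a quoted field is
well-formed only when it begins and ends with a quote and every
quote inside it is doubled. The name well_formed_quoted? is made up
for this sketch.

```ruby
# Hypothetical helper: true if the field starts and ends with a
# quote and every interior quote appears as a doubled pair.
def well_formed_quoted?(field)
  !!(field =~ /\A"(?:[^"]|"")*"\z/)
end

# The field as it appears in the test string.
original  = "\"\r\n\"\r\nNaHi"
# The same field with the inner quote doubled and a closing quote added.
corrected = "\"\r\n\"\"\r\nNaHi\""

puts well_formed_quoted?(original)
puts well_formed_quoted?(corrected)
```

The second string is only my guess at what was intended.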

Incidentally, when my program converts that string to an array
and then back to a csv string, the result is not identical to
the original string, because an empty quoted field ( ,"", ) is
written back as an empty unquoted field ( ,, ).
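
The reason is that a field is quoted on output only when it needs
to be, and an empty string never needs it. A rough standalone
sketch of that rule follows; quote_if_needed is a name invented for
this example, and the separator is assumed to be a comma.

```ruby
# Hypothetical helper: quote a field only if it contains the
# separator, a quote, a newline, or leading/trailing whitespace.
def quote_if_needed(field, sep = ",")
  if field.include?(sep) || field =~ /\A\s|"|\n|\s\z/
    '"' + field.gsub('"', '""') + '"'
  else
    field
  end
end

# An empty field in the middle, as produced by parsing  a,"",b
fields = ["a", "", "b"]
line = fields.map { |f| quote_if_needed(f) }.join(",")
puts line
```

Both ,"", and ,, parse to the same empty string, so the arrays
still match even though the csv strings differ.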

I corrected a minor bug in my code by moving
",".is_fs   if $csv_fs.nil?
to its proper location.

The program conforms to the csv specification at this site:
http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm
and it handles the sample csv records shown there.

All my program can do is read a text file containing csv records,
convert those records (strings) into arrays of strings, and
convert the arrays back into csv strings.  I suppose that the
csv library that comes with Ruby may do more than that.
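
The only subtle part of reading is that a quoted field may contain
newlines, so one record can span several lines of the file. The
program deals with this by appending lines until the record holds
an even number of quote characters. Here is a minimal standalone
sketch of that idea, using StringIO in place of a real file;
read_record is a hypothetical name, not the get_rec defined below.

```ruby
require 'stringio'

# Hypothetical helper: read lines until the accumulated text holds
# an even number of quote characters, then return it without the
# trailing newline. Returns nil at end of input.
def read_record(io)
  buf = String.new
  until io.eof?
    buf << io.gets
    return buf.chomp if buf.count('"').even?
  end
  raise "The csv file is malformed." unless buf.empty?
  nil
end

io = StringIO.new("a,\"two\nlines\",c\nd,e\n")
first  = read_record(io)   # spans two physical lines
second = read_record(io)
third  = read_record(io)   # end of input
p first, second, third
```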


% ## Read, parse, and create csv records.
% ## Has a minor bug fix; discard previous versions.
% ## 2005-02-01.
%
% class Array
%   def to_csv
%     ",".is_fs   if $csv_fs.nil?
%     self.map { |item|
%       str = item.to_s
%       # Quote the string if it contains the field-separator or
%       # a " or a newline, or if it has leading or trailing
%       # whitespace.
%       if str.index($csv_fs) or /^\s|"|\n|\s$/.match(str)
%         str = '"' + str.gsub( /"/, '""' ) + '"'
%       end
%       str
%     }.join($csv_fs)
%   end
%   def unescape
%     self.map{|x| x.gsub( /""/, '"' ) }
%   end
% end
%
% class String
%   # Set regexp for parse_csv.
%   # self is the field-separator, which must be
%   # a single character.
%   def is_fs
%     $csv_fs = self
%     # Escape the separator so that characters such as ^ or ]
%     # are taken literally inside the character class below.
%     fs = Regexp.escape($csv_fs)
%     ## Assumes embedded quotes are escaped as "".
%     $csv_re = \
%       %r{  \s*
%            (?:
%                "(  [^"]*  (?:  "" [^"]*  )*  )"  |
%                 ( .*? )
%            )
%            \s*
%            [#{fs}]
%         }mx
%   end
%
%   def parse_string
%     ",".is_fs   if $csv_fs.nil?
%     (self + $csv_fs).scan( $csv_re ).flatten.compact.unescape
%   end
% end
%
% def get_rec( file )
%   $csv_s = ""
%   begin
%     if file.eof?
%       raise "The csv file is malformed." if $csv_s.size>0
%       return nil
%     end
%     $csv_s += file.gets
%   end  until $csv_s.count( '"' ) % 2 == 0
%   $csv_s.chomp!
%   $csv_s.parse_string
% end
%
%
% # while  rec = get_rec( ARGF )
% #   puts "----------------"
% #   puts $csv_s
% #   p rec
% #   puts rec.to_csv
% # end
%
% ## Here is my breakdown of the test string from test_csv.rb.
% # foo,
% # """foo""",
% # "foo,bar",
% # """""",
% # "",
% # ,
% # "\r",
% # "\r\n""\r\nNaHi",    <---<<  Corrected.
% # """Na""",
% # "Na,Hi",
% # "\r.\n",
% # "\r\n\n",
% # """",
% # "\n",
% # "\r\n"
%
% # Original (overridden by the corrected string below).
% csvStr = ("foo,!!!foo!!!,!foo,bar!,!!!!!!,!!,," +
%          "!\r!,!\r\n!\r\nNaHi,!!!Na!!!,!Na,Hi!," +
%          "!\r.\n!,!\r\n\n!,!!!!,!\n!,!\r\n!").gsub('!', '"')
%
% # Corrected?
% csvStr = ("foo,!!!foo!!!,!foo,bar!,!!!!!!,!!,," +
%          "!\r!,!\r\n!!\r\nNaHi!,!!!Na!!!,!Na,Hi!," +
%          "!\r.\n!,!\r\n\n!,!!!!,!\n!,!\r\n!").gsub('!', '"')
%
% p csvStr
% arry = csvStr.parse_string
% p arry
% newCsvStr = arry.to_csv
% p newCsvStr
% arry2 = newCsvStr.parse_string
% puts "Arrays match."  if arry == arry2