Ryan Davis wrote:
> On Jan 31, 2005, at 9:40 PM, William James wrote:
>
> > A small, fast, and (I think) complete csv parser.
>
> There is test_csv.rb in the ruby tarball. Can you run your new code
> against it to make sure it is complete? With good profile numbers I
> doubt it'd be hard to get the slower code replaced.

Wow. test_csv.rb is beyond my comprehension. I don't know how to use
it. I did lift a very complex test string from it to use in testing my
program. One of the fields in that csv string is defective; I don't
know whether that was intentional or not:

    "\r\n"\r\nNaHi,

The " in the field isn't doubled, and the field doesn't end with a
quote.

Incidentally, when my program converts that string to an array and
then back to a csv string, it's not the same as the original string
because ,"", is shortened to ,, .

I corrected a minor bug in my code by moving

    ",".is_fs if $csv_fs.nil?

to its proper location.

The program conforms to the csv specification at this site:

http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm

and it handles the sample csv records shown there.

All my program can do is read a text file containing csv records,
convert those records (strings) into arrays of strings, and convert
the arrays back into csv strings. I suppose that the csv library that
comes with Ruby may do more than that.

% ## Read, parse, and create csv records.
% ## Has a minor bug fix; discard previous versions.
% ## 2005-02-01.
%
% class Array
%   def to_csv
%     ",".is_fs if $csv_fs.nil?
%     s = ''
%     self.map { |item|
%       str = item.to_s
%       # Quote the string if it contains the field-separator or
%       # a " or a newline, or if it has leading or trailing whitespace.
%       if str.index($csv_fs) or /^\s|"|\n|\s$/.match(str)
%         str = '"' + str.gsub( /"/, '""' ) + '"'
%       end
%       str
%     }.join($csv_fs)
%   end
%   def unescape
%     self.map{|x| x.gsub( /""/, '"' ) }
%   end
% end
%
% class String
%   # Set regexp for parse_csv.
%   # self is the field-separator, which must be
%   # a single character.
%   def is_fs
%     $csv_fs = self
%     # A ^ must be escaped so that it is literal inside [ ].
%     if "^" == $csv_fs
%       fs = "\\^"
%     else
%       fs = $csv_fs
%     end
%     ## Assumes embedded quotes are escaped as "".
%     $csv_re =
%       %r{ \s*
%           (?:
%               "( [^"]* (?: "" [^"]* )* )" |
%               ( .*? )
%           )
%           \s*
%           [#{fs}]
%         }mx
%   end
%
%   def parse_string
%     ",".is_fs if $csv_fs.nil?
%     # A trailing separator is appended so that the last field
%     # is matched by the regexp too.
%     (self + $csv_fs).scan( $csv_re ).flatten.compact.unescape
%   end
% end
%
% # Read lines until the number of quotes is even, i.e. until
% # no quoted field is left open; then parse the record.
% def get_rec( file )
%   $csv_s = ""
%   begin
%     if file.eof?
%       raise "The csv file is malformed." if $csv_s.size>0
%       return nil
%     end
%     $csv_s += file.gets
%   end until $csv_s.count( '"' ) % 2 == 0
%   $csv_s.chomp!
%   $csv_s.parse_string
% end
%
%
% # while rec = get_rec( ARGF )
% #   puts "----------------"
% #   puts $csv_s
% #   p rec
% #   puts rec.to_csv
% # end
%
% ## Here is my breakdown of the test string from test_csv.rb.
% # foo,
% # """foo""",
% # "foo,bar",
% # """""",
% # "",
% # ,
% # "\r",
% # "\r\n""\r\nNaHi",     <---<< Corrected.
% # """Na""",
% # "Na,Hi",
% # "\r.\n",
% # "\r\n\n",
% # """",
% # "\n",
% # "\r\n"
%
% # Original.
% csvStr = ("foo,!!!foo!!!,!foo,bar!,!!!!!!,!!,," +
%   "!\r!,!\r\n!\r\nNaHi,!!!Na!!!,!Na,Hi!," +
%   "!\r.\n!,!\r\n\n!,!!!!,!\n!,!\r\n!").gsub('!', '"')
%
% # Corrected?
% csvStr = ("foo,!!!foo!!!,!foo,bar!,!!!!!!,!!,," +
%   "!\r!,!\r\n!!\r\nNaHi!,!!!Na!!!,!Na,Hi!," +
%   "!\r.\n!,!\r\n\n!,!!!!,!\n!,!\r\n!").gsub('!', '"')
%
% p csvStr
% arry = csvStr.parse_string
% p arry
% newCsvStr = arry.to_csv
% p newCsvStr
% arry2 = newCsvStr.parse_string
% puts "Arrays match." if arry == arry2
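
The record-reading idea in get_rec (keep appending lines until the number of " characters is even, since an odd count means a quoted field is still open) might be sketched without global variables roughly as follows. The name read_record and the StringIO sample data are made up for illustration and are not part of the posted program. Like the original, the sketch assumes that embedded quotes are always doubled.

```ruby
require 'stringio'

# Hypothetical helper, not part of the posted program: read one logical
# csv record from an IO object, joining physical lines until the double
# quotes balance. Returns nil at end of input.
def read_record(io)
  buffer = +""
  until io.eof?
    buffer << io.gets
    # An even number of quote characters means no quoted field is open.
    return buffer.chomp if buffer.count('"').even?
  end
  # Input ended while a quoted field was still open.
  raise "The csv file is malformed." unless buffer.empty?
  nil
end

# Sample input: the first record spans two physical lines.
io = StringIO.new(%Q{a,"line one\nline two",c\nd,e,f\n})
first  = read_record(io)
second = read_record(io)
third  = read_record(io)
p first, second, third
```

Each returned string is one complete record, so it could then be handed to parse_string. The quote-counting test is cheap, but it can only tell that quotes are balanced, not that they are placed correctly.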