On Oct 29, 2005, at 12:11 PM, Ara.T.Howard wrote:

> it may or may not be tricky to get these failing cases working though:

This version passes all of your edge cases:

require "stringio"

module CSV2
   def self::parse_line( data )
     io = if data.is_a?(IO) then data else StringIO.new(data) end
     line = ""

     loop do
       line  += io.gets
       parse = line.dup
       parse.chomp!

       csv = if parse.sub!(/\A,+/, "") then [nil] * $&.length else Array.new end
       parse.gsub!(/\G(?:^|,)(?:"((?>[^"]*)(?>""[^"]*)*)"|([^",]*))/) do
         csv << if $1.nil?
           if $2 == "" then nil else $2 end
         else
           $1.gsub('""', '"')
         end
         ""
       end

       break csv if parse.empty?
     end
   end
end
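
To make it easier to see what the field pattern captures, here is a small sketch that pulls the same regular expression out of parse_line and runs it over one line. The names FIELD, field_captures and normalize are made up for this illustration and are not part of CSV2. It only handles a single complete line, not the multi-line accumulation the loop above does.

```ruby
# Same field pattern as in CSV2.parse_line, pulled out for inspection.
# $1 holds the body of a quoted field, $2 holds an unquoted field.
FIELD = /\G(?:^|,)(?:"((?>[^"]*)(?>""[^"]*)*)"|([^",]*))/

# Hypothetical helper: returns one [quoted, bare] capture pair per field.
# \G makes each match start exactly where the previous one ended.
def field_captures(line)
  line.scan(FIELD)
end

# Hypothetical helper: applies the same post-processing as parse_line.
# Quoted fields get "" collapsed to ", empty unquoted fields become nil.
def normalize(captures)
  captures.map do |quoted, bare|
    if quoted
      quoted.gsub('""', '"')
    elsif bare.empty?
      nil
    else
      bare
    end
  end
end

sample = %q{a,"b,c","say ""hi""",,d}
pairs  = field_captures(sample)
p pairs             # raw [quoted, bare] pairs, one per field
p normalize(pairs)  # fields as parse_line would report them
```

A quoted field shows up in the first slot of its pair and an unquoted one in the second, which is why parse_line checks $1.nil? before deciding how to treat the field.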

Here's how it is holding up speed-wise:

Neo:~/Desktop$ cat bm_csv.rb
#!/usr/local/bin/ruby -w

require "csv"
require "benchmark"
require "stringio"
require "fast_csv"

def parse_csv( data )
   io = if data.is_a?(IO) then data else StringIO.new(data) end
   line = ""

   loop do
     line  += io.gets
     parse = line.dup
     parse.chomp!

     csv = if parse.sub!(/\A,+/, "") then [nil] * $&.length else Array.new end
     parse.gsub!(/\G(?:^|,)(?:"((?>[^"]*)(?>""[^"]*)*)"|([^",]*))/) do
       csv << if $1.nil?
         if $2 == "" then nil else $2 end
       else
         $1.gsub('""', '"')
       end
       ""
     end

     break csv if parse.empty?
   end
end

DATA  = %Q{Ten Thousand,10000, 2710 ,,"10,000","It's ""10 Grand"", baby",10K}
TESTS = 50000

fast = FastCsv.new
Benchmark.bm do |timings|
   timings.report("CSV") { TESTS.times { CSV.parse_line(DATA) } }
   timings.report("FastCsv") { TESTS.times { fast.parse(DATA) } }
   timings.report("Regexp") { TESTS.times { parse_csv(DATA) } }
end
Neo:~/Desktop$ ruby bm_csv.rb
       user     system      total        real
CSV 18.370000   0.060000  18.430000 ( 18.498160)
FastCsv  3.640000   0.010000   3.650000 (  3.671689)
Regexp  3.530000   0.020000   3.550000 (  3.560493)
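
For a rough sense of scale, the "total" column can be turned into speedups relative to the standard library. This is just arithmetic on the numbers printed above from a single run on one machine, so treat the ratios as approximate:

```ruby
# "total" column copied from the benchmark output (seconds for 50,000 parses).
totals = { "CSV" => 18.43, "FastCsv" => 3.65, "Regexp" => 3.55 }

# Speedup of each parser relative to the standard CSV library,
# rounded to one decimal place.
speedups = totals.map { |name, secs| [name, (totals["CSV"] / secs).round(1)] }.to_h

speedups.each { |name, factor| puts "#{name}: #{factor}x" }
```

On these numbers both FastCsv and the regexp version come out at roughly five times faster than CSV.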

FastCsv was posted to Ruby Talk last night, but I'm using the
refactored version by Stefen Lang that was added today.  Note that
FastCsv does not pass all of your tests.

James Edward Gray II