On Mon, 31 Oct 2005, James Edward Gray II wrote:

> On Oct 29, 2005, at 12:11 PM, Ara.T.Howard wrote:
>
>> it may or may not be tricky to get these failing cases working though:
>
> This version passes all of your edge cases:
>
> module CSV2
>  def self::parse_line( data )
>    io = if data.is_a?(IO) then data else StringIO.new(data) end
>    line = ""
>
>    loop do
>      line  += io.gets
>      parse = line.dup
>      parse.chomp!
>
>      csv = if parse.sub!(/\A,+/, "") then [nil] * $&.length else Array.new end
>      parse.gsub!(/\G(?:^|,)(?:"((?>[^"]*)(?>""[^"]*)*)"|([^",]*))/) do
>        csv << if $1.nil?
>          if $2 == "" then nil else $2 end
>        else
>          $1.gsub('""', '"')
>        end
>        ""
>      end
>
>      break csv if parse.empty?
>    end
>  end
> end
>
> Here's how it is holding up speed-wise:
>
> Neo:~/Desktop$ cat bm_csv.rb
> #!/usr/local/bin/ruby -w
>
> require "csv"
> require "benchmark"
> require "stringio"
> require "fast_csv"
>
> def parse_csv( data )
>  io = if data.is_a?(IO) then data else StringIO.new(data) end
>  line = ""
>
>  loop do
>    line  += io.gets
>    parse = line.dup
>    parse.chomp!
>
>    csv = if parse.sub!(/\A,+/, "") then [nil] * $&.length else Array.new end
>    parse.gsub!(/\G(?:^|,)(?:"((?>[^"]*)(?>""[^"]*)*)"|([^",]*))/) do
>      csv << if $1.nil?
>        if $2 == "" then nil else $2 end
>      else
>        $1.gsub('""', '"')
>      end
>      ""
>    end
>
>    break csv if parse.empty?
>  end
> end
>
> DATA  = %Q{Ten Thousand,10000, 2710 ,,"10,000","It's ""10 Grand"", baby",10K}
> TESTS = 50000
>
> fast = FastCsv.new
> Benchmark.bm do |timings|
>  timings.report("CSV") { TESTS.times { CSV.parse_line(DATA) } }
>  timings.report("FastCsv") { TESTS.times { fast.parse(DATA) } }
>  timings.report("Regexp") { TESTS.times { parse_csv(DATA) } }
> end
> Neo:~/Desktop$ ruby bm_csv.rb
>      user     system      total        real
> CSV 18.370000   0.060000  18.430000 ( 18.498160)
> FastCsv  3.640000   0.010000   3.650000 (  3.671689)
> Regexp  3.530000   0.020000   3.550000 (  3.560493)
>
> FastCsv was posted to Ruby Talk last night, but I'm using the refactored 
> version by Stefan Lang that was added today.  It does not pass all of your 
> tests.
>
> James Edward Gray II

wow.  nice work james!  i'd hate to have to fix that expression - but it's
hard to argue with faster code that's much shorter too ;-)  one thing i was
thinking was that this might best be plugged in to the stdlib as

   CSV::fast_parse

which would avoid all compatibility issues and test case issues... it could
just drop right in.
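
a rough sketch of what that might look like, assuming nothing more than
reopening the class and pasting in the body of the quoted parse_csv.
CSV::fast_parse is only the name proposed here, not an existing method. the
respond_to?(:gets) check and the nil return at end of input are my own
assumptions, not part of the quoted code:

```ruby
require "stringio"

# hypothetical: CSV::fast_parse is a proposal from this thread, not part of
# the stdlib.  in real use, require "csv" first so this reopens the stdlib
# class instead of defining a bare one.
class CSV
  def self.fast_parse(data)
    # assumption: accept anything that responds to #gets (IO, StringIO),
    # otherwise wrap the String in a StringIO
    io   = data.respond_to?(:gets) ? data : StringIO.new(data)
    line = ""

    loop do
      chunk = io.gets
      return nil if chunk.nil?  # assumption: give up with nil at end of input
      line += chunk             # keep accumulating for multi-line quoted fields
      parse = line.dup
      parse.chomp!

      # leading commas mean leading empty (nil) fields
      csv = if parse.sub!(/\A,+/, "") then [nil] * $&.length else Array.new end
      # consume one field per match; whatever is left over is unparsed
      parse.gsub!(/\G(?:^|,)(?:"((?>[^"]*)(?>""[^"]*)*)"|([^",]*))/) do
        csv << if $1.nil?
          if $2 == "" then nil else $2 end
        else
          $1.gsub('""', '"')
        end
        ""
      end

      break csv if parse.empty?  # everything consumed: the record is complete
    end
  end
end

p CSV.fast_parse(%q{a,"b ""c""",,d})
```

the existing CSV.parse_line and its tests would be left untouched, so callers
could opt in to the faster path one call site at a time.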

cheers.

-a
-- 
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| anything that contradicts experience and logic should be abandoned.
| -- h.h. the 14th dalai lama
===============================================================================