Hi,

William James wrote:
> I did lift a very complex test string from it to use in testing
> my program.  One of the fields in that csv string is defective;
> I don't know whether that was intentional or not:
> 
> "\r\n"\r\nNaHi,
> 
> The " in the field isn't doubled, and the field doesn't end
> with a quote.

The first \r\n is the content of the quoted field; the second \r\n is a
record separator, so the field is well-formed.  Here is the sample data
from test_csv.rb.

   # sample data
   #
   #  1      2       3         4       5        6      7    8
   # +------+-------+---------+-------+--------+------+----+------+
   # | foo  | "foo" | foo,bar | ""    |(empty) |(null)| \r | \r\n |
   # +------+-------+---------+-------+--------+------+----+------+
   # | NaHi | "Na"  | Na,Hi   | \r.\n | \r\n\n | "    | \n | \r\n |
   # +------+-------+---------+-------+--------+------+----+------+
   #

The table contains 2 records and each record has 8 fields.
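
To illustrate why the second \r\n ends the record while the first does not, here is a minimal sketch of quote-aware record splitting. The helper name split_records is hypothetical and this is not how csv.rb is implemented; it only tracks whether the scanner is inside a quoted field.

```ruby
# Hypothetical helper (not part of csv.rb): split raw CSV text into records,
# treating row_sep as a record separator only when it appears outside quotes.
def split_records(text, row_sep = "\r\n")
  records = []
  current = +""
  in_quotes = false
  i = 0
  while i < text.length
    if text[i] == '"'
      # A doubled "" toggles the flag twice, so the quoted state is preserved.
      in_quotes = !in_quotes
      current << text[i]
      i += 1
    elsif !in_quotes && text[i, row_sep.length] == row_sep
      records << current
      current = +""
      i += row_sep.length
    else
      current << text[i]
      i += 1
    end
  end
  records << current unless current.empty?
  records
end

# Fields 7 and 8 of the first record, then the start of the second record.
sample = "\"\r\",\"\r\n\"\r\nNaHi,\"\"\"Na\"\"\"\r\n"
records = split_records(sample)
p records
```

The quoted \r and \r\n stay inside the first record, and only the unquoted \r\n starts the NaHi record.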

> Incidentally, when my program converts that string to an array
> and then back to a csv string, it's not the same as
> the original string because  ,"", is shortened to ,, .

In csv.rb, the two-character sequence "" (0x22 0x22) means an empty string,
while a field with nothing in it means NULL.  I needed to distinguish the
two when I first wrote it.
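
A short sketch of that distinction, assuming the csv library bundled with current Ruby versions rather than the csv.rb of this thread. As far as I know it keeps the same convention: a quoted empty field becomes "" and a bare empty field becomes nil, in both directions.

```ruby
require 'csv'

# Assumption: the modern csv library, not the older csv.rb discussed here.
# Parsing: "" is an empty string, a bare empty field is nil.
row = CSV.parse_line('foo,"",,bar')
p row

# Generating: the empty string is written as "", nil as nothing at all.
line = CSV.generate_line(['foo', '', nil, 'bar'])
p line
```

Because the two cases are kept apart, a parse followed by a generate preserves ,"", instead of shortening it to ,, .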

Here are some scenarios you may be interested in.

irb(main):001:0> require 'csv'
=> true
irb(main):002:0> CSV.parse('"abc"def')
CSV::IllegalFormatError: CSV::IllegalFormatError
         from /usr/local/lib/ruby/1.9/csv.rb:587:in `get_row'
         from /usr/local/lib/ruby/1.9/csv.rb:536:in `each'
         from /usr/local/lib/ruby/1.9/csv.rb:107:in `collect'
         from /usr/local/lib/ruby/1.9/csv.rb:107:in `parse'
         from (irb):2
irb(main):003:0> CSV.parse('"abc"def', 'def')
=> [["abc", nil]]
irb(main):004:0> CSV.parse('"abc"def"ghi"', 'd', 'f')
=> [["abc", "e"], ["ghi"]]
irb(main):005:0> CSV.parse('aaabaaacaaabaa', 'ab', 'ac')
=> [["aa", "aa"], ["aa", "aa"]]
irb(main):006:0> quit
% echo foo,bar | ruby -rcsv -e 'CSV.parse(STDIN) { |row| p row }'
["foo", "bar"]
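
Judging from the output above, the second and third arguments act as field and record separators, and either may be longer than one character. The 005 example contains no quotes, so its result can be reproduced with plain String#split. A minimal sketch, assuming a hypothetical helper naive_parse that is not part of csv.rb and ignores quoting entirely:

```ruby
# Hypothetical helper: split on arbitrary separator strings, no quote handling.
def naive_parse(text, col_sep, row_sep)
  # The -1 limit keeps trailing empty fields instead of dropping them.
  text.split(row_sep).map { |record| record.split(col_sep, -1) }
end

rows = naive_parse('aaabaaacaaabaa', 'ab', 'ac')
p rows
```

This breaks as soon as a separator appears inside a quoted field, which is the case the 003 and 004 examples exercise.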

Of course, I don't think everyone needs this "complexity" (and slowness).
A regexp-based approach is very useful, too.  (I often do that myself.)
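
As one possible shape of such a regexp-based approach, here is a sketch for a single line with comma separators and no line breaks inside fields. The names FIELD_RE and regexp_split are hypothetical; the nil-for-bare-empty rule just mirrors the convention described above.

```ruby
# Hypothetical: one field is either a quoted run (with "" as an escaped quote)
# or a run of characters without quotes or commas, followed by a comma or the end.
FIELD_RE = /\A(?:"((?:[^"]|"")*)"|([^",]*))(,|\z)/

def regexp_split(line)
  fields = []
  rest = line.chomp
  loop do
    m = FIELD_RE.match(rest)
    raise ArgumentError, "malformed CSV: #{line.inspect}" unless m
    if m[1]
      fields << m[1].gsub('""', '"')          # quoted field: undouble the quotes
    else
      fields << (m[2].empty? ? nil : m[2])    # bare empty field is treated as NULL
    end
    break if m[3].empty?                      # matched end of string, not a comma
    rest = m.post_match
  end
  fields
end

fields = regexp_split('foo,"""foo""","foo,bar","",,bar')
p fields
```

A line such as '"abc"def' matches neither alternative and raises ArgumentError, loosely analogous to the IllegalFormatError in the 002 example.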

Back to the original post of this thread: spending 110 seconds to parse
27000 CSV records seems too slow, even for a parser written in pure Ruby.
Could I have the CSV file?  I would like to profile with that data.
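
Until the real file is available, a rough wall-clock measurement can at least be scripted. This is only a sketch: the 27000 rows below are synthetic stand-ins invented for illustration, it assumes the csv library bundled with current Ruby, and it measures elapsed time rather than producing a profile.

```ruby
require 'csv'

# Synthetic stand-in for the poster's file, which is not available here.
data = Array.new(27_000) { |i| "#{i},foo,\"bar,baz\",\"\"\n" }.join

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
rows = CSV.parse(data)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts format('%d rows in %.3f s', rows.size, elapsed)
```

The numbers only become meaningful once the real data is substituted for the synthetic rows.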

Regards,
// NaHi
