From: "Sean Middleditch" <elanthis / awesomeplay.com>
>
[...]
> escaped.  Also, what about something like
> 
> abc,def,\"abc,"123,456"\,xxy
> 
> The escapes won't work.  I once had a regexp that (in almost all cases)
> properly handled this, but I don't recall what it was.  Unfortunately,
> I'm not talented enough at regexps to figure it out again without
> another hour of work.  ^,^  I don't know if handling the \ escapes is
> important though for this situation (I had some text files at work that
> did need it, though... was a real pain).

Here's one that should handle everything EXCEPT that pesky escaped
comma _outside_ the quoted string.  :-(  Is that for real?  I
take it that it ought to tokenize to
'abc', 'def', '\"abc', '123,456\,xxy' ?

That seems really weird: it gobbles the quotes and yet glues the next
field onto the quoted one (as it were) via that escaped comma after the
closing quote.  I'd have expected the program generating the CSV to
output that field as "\"123,456\",xxy" . . . which the regexp below can
handle, but . . .
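
If that odd reading really is what's wanted, a small hand-rolled scanner
may be easier than a regexp.  A rough sketch only, resting on guesses
about the format: a backslash protects the next character wherever it
appears, the backslash itself stays in the output, and unescaped quotes
are dropped.  The name csv_split_escaped is made up for illustration.

```ruby
# Hypothetical helper: walks the string one character at a time.
# Assumed rules (guesses, not a spec):
#   - backslash + next char are copied verbatim and never act as syntax
#   - an unescaped " toggles quoted mode and is dropped
#   - an unescaped , outside quotes ends the current field
def csv_split_escaped(str)
    fields = []
    field = String.new
    in_quotes = false
    chars = str.chars
    i = 0
    while i < chars.length
        c = chars[i]
        if c == '\\' && i + 1 < chars.length
            # Escaped pair: keep both characters, skip past them.
            field << c << chars[i + 1]
            i += 2
            next
        elsif c == '"'
            in_quotes = !in_quotes
        elsif c == ',' && !in_quotes
            fields << field
            field = String.new
        else
            field << c
        end
        i += 1
    end
    fields << field
end

fields = csv_split_escaped(%q{abc,def,\"abc,"123,456"\,xxy})
# => ['abc', 'def', '\"abc', '123,456\,xxy']
```

Unlike a regexp that skips whitespace before an opening quote, this
sketch keeps whitespace around fields as-is.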

Anyway, for what it's worth  :-)


require 'runit/testcase'
require 'runit/cui/testrunner'

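# Split one CSV line into fields.  Quoted fields may contain commas and
# \" escapes (the backslashes are kept); the surrounding quotes are dropped.
def csv_split(str)
    # Non-bang flatten/compact: flatten! returns nil when scan finds nothing
    # (e.g. an unterminated quote), which would make the chained call raise.
    str.scan(/(?:\A|,)\s*"((?:\\"|[^"])*)"|(?:\A|,)([^",]*|[^",][^,]*)(?=,|\z)/).flatten.compact
end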

class TestCsvSplit < RUNIT::TestCase
    def testCsvSplit
        fields = csv_split(%q{"aaa",,"c,\"d\",",,,"fff",,,})
        assert fields == ['aaa', '', 'c,\"d\",', '', '', 'fff', '', '', '']
        fields = csv_split(%q{"\"",,"a\"\"b","\"c\"",})
        assert fields == ['\"', '', 'a\"\"b', '\"c\"', '']
        fields = csv_split(%q{abc,def,\"abc,"123,456",xxy})
        assert fields == ['abc', 'def', '\"abc', '123,456', 'xxy']
        fields = csv_split(%q{abc,def,\"abc,"\"123,456\",xxy"})
        assert fields == ['abc', 'def', '\"abc', '\"123,456\",xxy']
    end
end


RUNIT::CUI::TestRunner.run(TestCsvSplit.suite)
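
Since the one-liner is hard to read, here is the same pattern spelled
out with the x (extended) flag, as a sketch.  CSV_FIELD and csv_split_x
are names invented here; the pattern is meant to be the same one as in
csv_split, just with whitespace and comments added.

```ruby
# Hypothetical names; the pattern mirrors the one-line regexp in csv_split.
CSV_FIELD = %r{
      (?:\A|,)              # a field starts at the string start or after a comma
      \s*"                  # optional whitespace, then the opening quote
      ((?:\\"|[^"])*)       # group 1: escaped quotes or any non-quote characters
      "                     # the closing quote
    |
      (?:\A|,)              # same field start for the unquoted case
      ([^",]*|[^",][^,]*)   # group 2: a plain field, possibly empty
      (?=,|\z)              # which must run up to a comma or the end
}x

def csv_split_x(str)
    # Each match fills exactly one of the two groups; compact drops the nil.
    str.scan(CSV_FIELD).flatten.compact
end

ok = csv_split_x(%q{abc,def,\"abc,"123,456",xxy})
# => ['abc', 'def', '\"abc', '123,456', 'xxy']

# The known gap: the backslash after the closing quote is silently skipped,
# so the escaped comma still splits the field.
gap = csv_split_x(%q{abc,def,\"abc,"123,456"\,xxy})
# => ['abc', 'def', '\"abc', '123,456', 'xxy']
```

Note that nothing in the quoted branch checks what follows the closing
quote, which is why the stray backslash goes unnoticed.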



Bill