Edwin Fine wrote:

> I am perplexed by CSV.open. In IO and File, open returns something that
> quacks like an IO object. You can then call gets, puts, read, write and
> so on. The CSV open seems to return an array (or gives you a row at a
> time).

/ ...

> I wonder why it was not patterned more closely on IO? Any thoughts?

This is an experience with which I am becoming familiar. Someone requests a
solution to a problem. Someone else offers the option of a library to solve
the problem. Then the original problem fades into the background, replaced
by discussion of the library's problems.

This same pattern has repeated itself about four times in the past
fortnight, in just this one newsgroup.

I can be relied on to suggest a terse code solution. Then someone else can
be relied on to point out, correctly, that a terse code solution may miss
edge cases and, if exposed to enough data, will surely fail. Absolutely
correct.

So, just for variety, I will _not_ say "Have you considered writing your own
code?" just because that's what people expect me to say. I won't do this
because I now realize "code" is a trigger word, just like saying "abortion"
among fundamentalists -- that is something you just don't want to do.

So I will say "Have you considered writing your own library?" It amounts to
the same thing, since libraries are written using code, and all code is
written by mortals, but this way of saying it avoids the trigger word
"code".

Your own code ... er, excuse me, your own library ... will meet your
requirements exactly. It won't cover cases that are not relevant to the
problem at hand, it will usually be much faster than a general-purpose
solution, and you will learn things about Ruby that you would not if you
used someone else's library.

In this specific case, as has been pointed out to me, a CSV field can
contain linefeeds, which means -- if your data exploits this trait -- you
need to parse the entire database using a state machine that knows about
this possibility.
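
As a rough sketch of what such a state machine might look like, assuming
the usual CSV convention that fields containing linefeeds are wrapped in
double quotes, a reader only needs one bit of state: inside quotes or not.
The name read_csv_record and the sample data are hypothetical, made up for
illustration:

```ruby
require "stringio"

# Hypothetical helper: read one logical CSV record from io. A record may
# span several physical lines when a quoted field contains a linefeed.
# Returns nil at end of input.
def read_csv_record(io)
  record = String.new
  in_quotes = false
  while (line = io.gets)
    record << line
    # Each double quote toggles the state. An escaped quote ("") toggles
    # twice, so it leaves the state unchanged.
    line.each_char { |c| in_quotes = !in_quotes if c == '"' }
    break unless in_quotes
  end
  record.empty? ? nil : record
end

# Sample input (an assumption): the second record spans two physical lines.
data = StringIO.new("id,note\n1,\"two\nlines\"\n2,plain\n")
records = []
while (rec = read_csv_record(data))
  records << rec
end
```

This only finds record boundaries; it does not split a record into fields,
and it assumes the quoting in the data is well formed.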

On the other hand, if your data does not exploit this CSV trait (few
real-world CSV databases embed linefeeds), you can scan the data much more
quickly using a simpler solution, but a solution that will certainly fail
if the above assumption turns out to be false. Code like this:

------------------------------------------

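#!/usr/bin/ruby -w

# Split a CSV file into sections of at most max_output_lines lines each
# (header included), repeating the header line at the top of every section.
# Assumes no field contains an embedded linefeed.

max_output_lines = 40000
input_file = "test.txt"
output_base = "output"

n = 0 # index of the current output section

ifile = File.open(input_file,"r")
header = ifile.gets

until(ifile.eof?)
 # Output files are named output000.txt, output001.txt, ...
 ofn = output_base + sprintf("%03d",n) + ".txt"
 ofile = File.open(ofn,"w")
 ofile.write(header)
 line = 2 # the header is line 1 of each section
 until(ifile.eof? || line > max_output_lines)
   ofile.write(ifile.gets)
   line += 1
 end
 ofile.close
 n += 1
end

ifile.close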

------------------------------------------

Note that I meet your requirement to place the original header line at the
top of each database section.

If you will call this a "library", it will pass muster with those who prefer
the word "library" to the word "code". Outside the box, it's all the same.
Inside the box, not at all.

-- 
Paul Lutus
http://www.arachnoid.com