Edwin Fine wrote:

> I am perplexed by CSV.open. In IO and File, open returns something that
> quacks like an IO object. You can then call gets, puts, read, write and
> so on. The CSV open seems to return an array (or gives you a row at a
> time).

/ ...

> I wonder why it was not patterned more closely on IO? Any thoughts?

This is an experience with which I am becoming familiar. Someone requests a solution to a problem. Someone else offers the option of a library to solve the problem. Then the original problem fades into the background, replaced by discussion of the library's problems.

This same pattern has repeated itself about four times in the past fortnight, in just this one newsgroup.

I can be relied on to suggest a terse code solution. Then someone else can be relied on to point out, correctly, that a terse code solution may miss edge cases and, if exposed to enough data, will surely fail. Absolutely correct.

So, just for variety, I will _not_ say "Have you considered writing your own code?" just because that's what people expect me to say. I won't do this because I now realize "code" is a trigger word, just like saying "abortion" among fundamentalists -- that is something you just don't want to do.

So I will say "Have you considered writing your own library?" It amounts to the same thing, since libraries are written using code, and all code is written by mortals, but this way of saying it avoids the trigger word "code".

Your own code ... er, excuse me, your own library ... will meet your requirements exactly, it won't cover cases that are not relevant to the problem at hand, it will be much faster overall than existing solutions, and you will learn things about Ruby that you would not if you used someone else's library.

In this specific case, as has been pointed out to me, a CSV field can contain linefeeds, which means -- if your data exploits this trait -- you need to parse the entire database using a state machine that knows about this possibility.
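For what it's worth, a minimal sketch of such a state machine might look like the following. It assumes RFC 4180-style quoting (a literal quote inside a quoted field is written as two quotes), and the name each_csv_record is made up for this illustration; it is not part of any library. It only groups physical lines into logical records and does not split fields.

```ruby
require "stringio"

# Hypothetical helper (not from any library): yields one logical CSV record
# at a time, joining physical lines while a quoted field is still open.
# Assumption: RFC 4180-style quoting, where an embedded quote is written "".
def each_csv_record(io)
  record = String.new
  in_quotes = false                 # the machine's only state
  io.each_line do |line|
    # Every double quote toggles the state; an escaped pair ("") toggles
    # twice, so only an odd count on this line changes the state.
    in_quotes = !in_quotes if line.count('"').odd?
    record << line
    next if in_quotes               # linefeed was inside a quoted field
    yield record
    record = String.new
  end
  # Input ended inside an open quoted field: hand back what we have.
  yield record unless record.empty?
end

# Usage: the second record spans two physical lines.
sample = StringIO.new("id,note\n1,\"two\nlines\"\n2,plain\n")
each_csv_record(sample) { |rec| p rec }
```

Counting records instead of physical lines in the splitting loop would keep a multi-line record from being cut in half at a file boundary, at the cost of inspecting every line for quotes.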
On the other hand, if your data does not exploit this CSV trait (few real-world CSV databases embed linefeeds), you can scan the data much more quickly using a simpler solution, but a solution that will certainly fail if the above assumption turns out to be false. Code like this:

------------------------------------------
#!/usr/bin/ruby -w

max_output_lines = 40000   # lines per output file, header included
input_file = "test.txt"
output_base = "output"

n = 0                      # output file counter

ifile = File.open(input_file,"r")
header = ifile.gets        # first line, repeated at the top of every output file

until(ifile.eof?)
   ofn = output_base + sprintf("%03d",n) + ".txt"
   ofile = File.open(ofn,"w")
   ofile.write(header)
   line = 2                # the header was line 1
   until(ifile.eof? || line > max_output_lines)
      ofile.write(ifile.gets)
      line += 1
   end
   ofile.close
   n += 1
end

ifile.close
------------------------------------------

Note that I meet your requirement to place the original header line at the top of each database section.

If you will call this a "library", it will pass muster with those who prefer the word "library" to the word "code". Outside the box, it's all the same. Inside the box, not at all.

--
Paul Lutus
http://www.arachnoid.com