Thomas Mueller wrote:
> 2006/11/30, Drew Olson <olsonas / gmail.com>:
>> I'll give FasterCSV a try when I get home from work and out from behind
>> this proxy. Here's another question: in some cases I need to sort the
>> file before splitting it (in this case sorting by the 4th cell in each
>> row). However, the current file I'm trying to sort and split is around
>> 76 MB and ruby fails when trying to store the CSV as an array. The code
>> and output are below. How else can I go about this?

I'm coming to this party really late, so I hope I don't come across as 
shamelessly plugging KirbyBase, but you might want to try it for this.

If you are simply trying to take a large CSV file, sort it by one of its 
fields, and split it up into smaller files that each contain 40,000 
records, I think KirbyBase might work.

Here's some code (not tested, could be incorrect) off the top of my head:


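require 'kirbybase'

db = KirbyBase.new

# Field list abbreviated: add a name/type pair for every CSV column,
# including :field4, which is used for sorting below.
tbl = db.create_table(:foo, :field1, :String, :field2, :Integer,
  :field3, :String  # ...remaining name/type pairs...
)

tbl.import_csv(name_of_csv_file)

chunk_size = 40000
rec_count = tbl.total_recs
last_recno_written_out = 0

while rec_count > 0
  # Grab the next chunk_size records by record number, then sort just
  # that chunk by :field4. Note: each chunk is sorted on its own; this
  # does not sort across the whole file.
  recs = tbl.select { |r|
    r.recno > last_recno_written_out &&
    r.recno <= last_recno_written_out + chunk_size
  }.sort(:field4)

  # ...here is where you put the code to write these recs to a CSV
  # output file...

  # Advance by chunk_size rather than using recs.last.recno: after the
  # sort, the last record is the one with the largest :field4, not the
  # one with the largest recno.
  last_recno_written_out += chunk_size

  rec_count -= chunk_size
end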
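If you'd rather not pull in KirbyBase, the same chunk-then-sort loop can be sketched with Ruby's bundled CSV library (FasterCSV became the standard CSV library in Ruby 1.9). This is only a sketch under assumptions: split_and_sort, its parameters, and the part_N.csv naming are made up for illustration; the 4th cell is index 3; values are compared as strings; and, like the loop above, each 40,000-row file is sorted on its own rather than the whole file being sorted first.

```ruby
require 'csv'

# Hypothetical helper (not part of KirbyBase or FasterCSV): stream rows
# from input_path, take them chunk_size at a time, sort each chunk by
# the given column, and write each chunk to its own numbered file.
# Sorting happens within each chunk only, not across the whole file.
def split_and_sort(input_path, out_prefix, sort_col: 3, chunk_size: 40_000)
  written = []
  CSV.open(input_path) do |csv|
    # CSV includes Enumerable, so each_slice reads chunk_size rows at a
    # time instead of loading the whole file into one big array.
    csv.each_slice(chunk_size).with_index(1) do |rows, n|
      # Compare the sort column as strings; to_s guards against nil
      # cells in short rows.
      sorted = rows.sort_by { |row| row[sort_col].to_s }
      out_path = "#{out_prefix}_#{n}.csv"
      CSV.open(out_path, 'w') do |out|
        sorted.each { |row| out << row }
      end
      written << out_path
    end
  end
  written
end
```

Something like split_and_sort('big.csv', 'part') would then write part_1.csv, part_2.csv, and so on, holding only one chunk of rows in memory at a time.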


KirbyBase will even use FasterCSV for its CSV handling if you have it 
installed.  :-)


Anyway, hope this helps.  If I have totally misunderstood the request, 
feel free to ignore!

Jamey Cribbs