Drew Olson wrote:

>All -
>
>I've written a script to split a .csv file into smaller .csv files of
>40,000 lines each. The intent here is to break the file down enough so
>that excel does not have issues reading each chunk. My code takes a
>filename from the command line and breaks it down as so:
>
>infile -> xyz.csv
>
>output -> xyz_part_1.csv
>          xyz_part_2.csv
>          etc...
>
>My code is working but I don't find it very "rubyish". In particular, I
>hate having my index and counter variables and I don't like that I had to
>declare my header variable outside of the loop. Bear in mind here that I
>can not do something like "rows = CSV.open(infile)" because ruby will
>yell and error as the input file is too big (250 mb). Any advice on
>making the code nicer is appreciated. The current code is as follows:
>
>require 'csv'
>
>infile = ARGV[0] if ARGV[0] != nil
>
>counter = 1
>index = 0
>header = ""
>writer = CSV.open(infile.gsub(/\./,"_part_"+counter.to_s+"."),'w')
>
>CSV.open(infile, 'r') do |row|
>  if(index != 0 && index%40000 == 0)
>    writer.close
>    counter+=1
>    writer = CSV.open(infile.gsub(/\./,"_part_"+counter.to_s+"."),'w')
>    writer << header
>  end
>  if (index == 0)
>    header = row
>  end
>  writer << row
>  index += 1
>end
>
>writer.close()
>
>  
>
I will ignore the CSV issue, not because it isn't important, but simply 
because I'm not familiar with the CSV parser, and a plain line-based 
example sufficiently represents the concept.

For maximum elegance, I would write the code this way.  It uses the 
helper methods (i.e. not in the stdlib) File.write_fresh, File.to_a and 
Enumerable#chunks, all of which I've written at one time or another.  
Mentally sub in the appropriate code as desired.

File.to_a('xyz.csv').chunks(40000).each_with_index do |chunk,i|
  File.write_fresh("xyz_part_#{i+1}.csv",chunk.join("\n"))
end

File.to_a returns an array of the file's lines.
Enumerable#chunks divides an Enumerable into groups of a given size, 
here 40k.  A 100k array would yield two 40k chunks and one 20k chunk.
File.write_fresh creates the file if it doesn't exist, truncates any 
existing file, and writes the 2nd argument to the file.
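
For anyone who wants to sub in actual code, here is a minimal sketch of 
the three helpers.  The method names come from the description above; the 
bodies are only one plausible way to write them, not the originals.  In 
particular, having File.to_a strip the trailing newlines is an assumption 
on my part, made so that chunk.join("\n") does not double them up.

```ruby
# Hypothetical helper definitions; none of these exist in Ruby's stdlib.
class File
  # Read the whole file and return its lines without trailing newlines.
  # (Assumption: newlines are stripped so callers can join with "\n".)
  def self.to_a(path)
    File.readlines(path).map { |line| line.chomp }
  end

  # Create the file if needed, truncate any existing contents, and
  # write the given string to it.
  def self.write_fresh(path, contents)
    File.open(path, 'w') { |f| f.write(contents) }
  end
end

module Enumerable
  # Split into consecutive groups of at most `size` elements.
  # The last group holds whatever is left over.
  def chunks(size)
    each_slice(size).to_a
  end
end
```

Note that File.to_a still reads the entire file into memory, so this 
only helps with readability, not with the 250 mb problem.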

This version is much prettier than the corresponding version without the 
helper methods, and it is also clearer.  It is obvious at a glance what 
it does.  The same can't be said for the version without helper methods.
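
Since the original problem was that the input is too big to load at once, 
here is a rough sketch of the same idea that streams the file instead.  
It is still line-based (so it ignores quoted fields containing newlines, 
which only a real CSV parser handles), and the method name split_csv and 
its parameters are made up for illustration.  It also repeats the header 
line at the top of every part, as the original script did.

```ruby
# Hypothetical helper: split the file at `path` into parts holding at
# most `lines_per_part` data lines each, repeating the header line at
# the top of every part.  Returns the list of part filenames written.
def split_csv(path, lines_per_part = 40_000)
  base = path.sub(/\.csv\z/, '')
  parts = []
  File.open(path) do |input|
    header = input.gets            # first line is the header
    return parts if header.nil?    # empty input: nothing to write
    # each_line without a block gives an Enumerator, so only one
    # slice of lines is held in memory at a time.
    input.each_line.each_slice(lines_per_part).with_index(1) do |chunk, i|
      part = "#{base}_part_#{i}.csv"
      File.open(part, 'w') { |out| out << header << chunk.join }
      parts << part
    end
  end
  parts
end
```

Calling split_csv(ARGV[0]) would then produce xyz_part_1.csv, 
xyz_part_2.csv and so on, with no index or counter variables in sight.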