Drew Olson wrote:
>All -
>
>I've written a script to split a .csv file into smaller .csv files of
>40,000 lines each. The intent here is to break the file down enough so
>that excel does not have issues reading each chunk. My code takes a
>filename from the command line and breaks it down as so:
>
>infile -> xyz.csv
>
>output -> xyz_part_1.csv
>          xyz_part_2.csv
>          etc...
>
>My code is working but I don't find it very "rubyish". In particular, I
>hate having my index and counter counters and I don't like that I had to
>declare my header variable outside of the loop. Bear in mind here that I
>can not do something like "rows = CSV.open(infile)" because ruby will
>yell and error as the input file is too big (250 mb). Any advice on
>making the code nicer is appreciated. The current code is as follows:
>
>require 'csv'
>
>infile = ARGV[0] if ARGV[0] != nil
>
>counter = 1
>index = 0
>header = ""
>writer = CSV.open(infile.gsub(/\./,"_part_"+counter.to_s+"."),'w')
>
>CSV.open(infile, 'r') do |row|
>  if(index != 0 && index%40000 == 0)
>    writer.close
>    counter+=1
>    writer = CSV.open(infile.gsub(/\./,"_part_"+counter.to_s+"."),'w')
>    writer << header
>  end
>  if (index == 0)
>    header = row
>  end
>  writer << row
>  index += 1
>end
>
>writer.close()
>

I will ignore the CSV issue, not because it isn't important, but simply
because I'm not familiar with the csv parser, and this example
sufficiently represents the concept.

For maximum elegance, I would write the code this way. It uses the
helper methods (i.e. not in the stdlib) File#write_fresh, File#to_a and
Enumerable#chunks, all of which I've written at one time or another.
Mentally sub in the appropriate code as desired.

  File.to_a('xyz.csv').chunks(40000).each_with_index do |chunk,i|
    File.write_fresh("xyz_part_#{i+1}.csv",chunk.join("\n"))
  end

File.to_a returns an array of lines.
Enumerable#chunks divides an Enumerable into groups of a given size (40k here). A 100k array would yield two 40k chunks and a 20k chunk.
File#write_fresh creates the file if it doesn't exist, truncates any existing file, and writes the 2nd argument to the file.

This version is not only much prettier than the corresponding version without the helper methods, it is also clearer. It is obvious at a glance what it does. The same can't be said for the version without helper methods.
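
For anyone who wants to try this, here is one possible way the three helpers could be written. This is only a sketch: the method names come from the post, but the bodies are my guesses and not the original implementations. It assumes File.to_a strips the trailing newline from each line (so that chunk.join("\n") does not double them) and that a Ruby with the chomp: option on File.readlines is available.

```ruby
# Hypothetical implementations of the helpers named in the post.
# The bodies are assumptions, not the author's actual code.

module Enumerable
  # Split the receiver into consecutive groups of at most +size+ elements.
  # The last group holds whatever is left over.
  def chunks(size)
    each_slice(size).to_a
  end
end

class File
  # Read the whole file into an array of lines, without trailing newlines.
  # Note: this loads the entire file into memory at once.
  def self.to_a(path)
    File.readlines(path, chomp: true)
  end

  # Create +path+ if it does not exist, truncate it if it does,
  # then write +content+ to it.
  def self.write_fresh(path, content)
    File.open(path, 'w') { |f| f.write(content) }
  end
end
```

With these definitions loaded, the each_with_index loop from the post runs as written. Keep in mind that this sketch reads the whole input into memory and does not repeat the header row in each part, so it shows the shape of the idea and is not a drop-in replacement for the original script.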