Drew Olson wrote:

> All -
> 
> I've written a script to split a .csv file into smaller .csv files of
> 40,000 lines each. The intent here is to break the file down enough so
> that excel does not have issues reading each chunk. My code takes a
> filename from the command line and breaks it down as so:
> 
> infile -> xyz.csv
> 
> output -> xyz_part_1.csv
>           xyz_part_2.csv
>           etc...
> 
> My code is working but I don't find it very "rubyish". In particular, I
> hate having my index and counter counters

Consider that the problem is one of counting input lines. In a case like
this, it is not possible to avoid using a counter. It's in the nature of
the problem to be solved.

> and I don't like that I had to 
> declare my header variable outside of the loop. Bear in mind here that I
> can not do something like "rows = CSV.open(infile)" because ruby will
> yell and error as the input file is too big (250 mb). Any advice on
> making the code nicer is appreciated. The current code is as follows:
> 
> require 'csv'
> 
> infile = ARGV[0] if ARGV[0] != nil
> 
> counter = 1
> index = 0
> header = ""
> writer = CSV.open(infile.gsub(/\./,"_part_"+counter.to_s+"."),'w')
> 
> CSV.open(infile, 'r') do |row|

Why are you using CSV for this? You aren't parsing the lines into fields, so
the fact that they contain CSV content has no bearing on the present task
(assuming no quoted field contains an embedded line break). Your goal is to
split the input file into groups of lines delimited by linefeeds, not fields
delimited by commas.

Why not simply read lines from the input file and write them to a series of
output files, until the input file is exhausted?

------------------------------------------

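#!/usr/bin/ruby -w

# Maximum number of lines per output file, including the header line.
max_output_lines = 1000

input_file = "test.txt"

output_base = "output"

# Sequence number of the current output file (output000.txt, output001.txt, ...).
n = 0

# The block form of File.open closes the file even if an error is raised.
File.open(input_file,"r") do |ifile|
   # The first input line is the header; it is repeated in every output file.
   header = ifile.gets

   until ifile.eof?
      ofn = output_base + sprintf("%03d",n) + ".txt"
      File.open(ofn,"w") do |ofile|
         ofile.write(header)
         # The header counts as line 1, so the data lines start at 2.
         line = 2
         until ifile.eof? || line > max_output_lines
            ofile.write(ifile.gets)
            line += 1
         end
      end
      n += 1
   end
end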

------------------------------------------

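Just change the number for "max_output_lines" to suit your requirement
(40000 in your case). The header counts as one of those lines, so each
output file holds the header plus at most max_output_lines - 1 data lines.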
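
If you want the file name taken from the command line and the parts named
xyz_part_1.csv, xyz_part_2.csv and so on, as in your original post, the same
loop can be wrapped in a method. What follows is only a sketch: the method name
"split_with_header" and its "max_lines" parameter are made up for illustration,
and it assumes the input has a single header line and no line breaks inside
quoted fields.

```ruby
# Hypothetical helper (name and parameters are illustrative only).
# Splits the file at +path+ into numbered part files, each holding the
# header line plus at most (max_lines - 1) data lines.
# Returns the list of file names written.
def split_with_header(path, max_lines = 40_000)
  ext  = File.extname(path)   # e.g. ".csv"
  base = path.chomp(ext)      # e.g. "xyz"
  written = []

  File.open(path, "r") do |input|
    header = input.gets       # first line, repeated in every part

    until input.eof?
      part_name = "#{base}_part_#{written.size + 1}#{ext}"
      File.open(part_name, "w") do |output|
        output.write(header)
        line = 2              # the header counts as line 1
        until input.eof? || line > max_lines
          output.write(input.gets)
          line += 1
        end
      end
      written << part_name
    end
  end

  written
end

# Command-line use: ruby split.rb xyz.csv
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  puts split_with_header(ARGV[0])
end
```

The line counter is still there, for the reason given above, but it is now
local to the loop that needs it, and the part number is derived from the list
of files already written instead of being tracked separately.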

-- 
Paul Lutus
http://www.arachnoid.com