Thanks for all the help. I've learned something from every posting.
Joseph McDonald came up with the fastest script, which clocked in at a
little over 3 seconds versus 14 seconds for the original script.
--------------------------------------------------------------------
class Climo
  DATE_FORMAT = "YYMMDDHH"
  DATE_LEN = DATE_FORMAT.length

  attr_reader :start_date, :end_date, :header

  def initialize(in_file)
    # Read the header line, then the rest of the file as one String.
    # Every record is assumed to have the same length (newline included)
    # as the first data line. The block form closes the file for us.
    File.open(in_file) do |f|
      @header = f.readline
      @data = f.readline
      @rec_len = @data.length
      @data << f.read
    end

    # Map each record's date field to its offset in @data
    @dict = Hash.new()
    start = 0
    @num_recs = (@data.length / @rec_len) - 1
    GC.disable
    0.upto(@num_recs) do
      @dict[@data[start, DATE_LEN].to_i] = start
      start += @rec_len
    end
    GC.enable

    @start_date = getbyline(0)[0, DATE_LEN]
    @end_date = getbyline(@num_recs)[0, DATE_LEN]
    puts "Start Date: #{@start_date}"
    puts "End Date: #{@end_date}"
    puts "By date: #{getbydate(@start_date.to_i)}"
  end

  # Return record number lineno (0-based) using offset arithmetic
  def getbyline(lineno)
    @data[lineno * @rec_len, @rec_len]
  end

  # Return the record whose date field equals date (an Integer)
  def getbydate(date)
    @data[@dict[date], @rec_len]
  end
end

def main()
  in_file = ARGV[0]
  climo_obj = Climo.new(in_file)
end

main()
--------------------------------------------------------------------
After studying Joe's script, I learned that

  for i in 1...@num_lines
    @dict[@lines.at(i)[0, @len_date].to_i] = i
  end

is faster than

  for i in 1...@num_lines
    @dict[@lines[i][0...@len_date].to_i] = i
  end

when modifying the original lines-based script.
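The two loops can be compared with a small benchmark. This is a minimal sketch using Ruby's standard Benchmark module; the sample records (an 8-character YYMMDDHH-style numeric field followed by filler), the record count, and the helper names index_with_range and index_with_start_len are hypothetical. Absolute timings, and even the size of the gap, depend on the Ruby version, so the figures quoted here should not be expected to reproduce exactly.

```ruby
require 'benchmark'

DATE_LEN = 8 # length of "YYMMDDHH"

# Hypothetical sample data: 100_000 records whose first DATE_LEN
# characters form a numeric date field.
lines = Array.new(100_000) { |i| format("%08d rest of record\n", i) }

# Slower form: a Range subscript, str[0...len]
def index_with_range(lines, len)
  dict = {}
  for i in 0...lines.size
    dict[lines[i][0...len].to_i] = i
  end
  dict
end

# Faster form: Array#at plus a (start, length) subscript, str[0, len]
def index_with_start_len(lines, len)
  dict = {}
  for i in 0...lines.size
    dict[lines.at(i)[0, len].to_i] = i
  end
  dict
end

Benchmark.bm(12) do |bm|
  bm.report("range:")     { index_with_range(lines, DATE_LEN) }
  bm.report("start,len:") { index_with_start_len(lines, DATE_LEN) }
end
```

Both helpers build the same date-to-index hash; only the subscript form differs. On newer Ruby versions the two may run at nearly the same speed.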
In my tests, changing the range subscript [0...@len_date] to the
start/length form [0, @len_date] saved approximately 3.5 seconds, and
switching from [] to at() saved another 0.3 seconds. The following
lines-based script now clocks in at around 4 seconds. The remaining
difference between this script and the fastest one can be attributed to
read() being about 1 second faster than readlines() on the file being
tested.
--------------------------------------------------------------------
class Climo
  DATE_FORMAT = "YYMMDDHH"
  DATE_LEN = DATE_FORMAT.length

  def initialize(in_file)
    # Read in data, one String per line
    fp = File.new(in_file, "r")
    @lines = fp.readlines()
    fp.close()
    @header = @lines.shift()
    @num_lines = @lines.size()

    # Store line indices in dictionary based on date field
    @dict = Hash.new()
    GC.disable
    for i in 0...@num_lines
      @dict[@lines.at(i)[0, DATE_LEN].to_i] = i
    end
    GC.enable

    # Look up the first and last records through the date index
    first_date_line = @lines[@dict[@lines[0][0, DATE_LEN].to_i]]
    last_date_line = @lines[@dict[@lines[@num_lines - 1][0, DATE_LEN].to_i]]
    puts "first_date_line: ", first_date_line
    puts "last_date_line: ", last_date_line
  end
end

def main()
  in_file = ARGV[0]
  climo_obj = Climo.new(in_file)
end

main()
--------------------------------------------------------------------
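The read() versus readlines() gap mentioned above can be checked the same way. This sketch writes a hypothetical temporary file (a header line plus 200,000 fixed-length records; the count and layout are assumptions), then times File.read, which returns a single String, against File.readlines, which allocates one String per line. As with any micro-benchmark, results will vary with the Ruby version and the file.

```ruby
require 'benchmark'
require 'tempfile'

# Hypothetical test file: a header line plus 200_000 fixed-length records
tmp = Tempfile.new('climo')
tmp.write("HEADER\n")
200_000.times { |i| tmp.write(format("%08d rest of record\n", i)) }
tmp.close

whole = nil
lines = nil
Benchmark.bm(10) do |bm|
  # One String holding the entire file
  bm.report("read:")      { whole = File.read(tmp.path) }
  # One String object per line, collected in an Array
  bm.report("readlines:") { lines = File.readlines(tmp.path) }
end

tmp.unlink # remove the temporary file
```

Joining the lines back together gives exactly the String that File.read returned; what differs is the number of objects allocated, which is one plausible source of the difference.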