This is a cool little data mining problem.  Unfortunately, it is a fair bit of
work to get a good solution.  Justin Either sent in the only full solution and
even he chose to cut a few corners.  His code does produce almost 200 lines of
interesting feedback when run on the sample data set though, so it's fun to play
around with.  The downside is that it's close to 300 lines of code.  Because of
that, I'm going to try giving a pretty high-level overview of how it works.

Let's work backwards this time and just follow the code:

	# ...
	
	if ARGV.size < 1
	  puts "Usage: vehicle_counter.rb datafile [-avg]"
	else
	  vc = VehicleCounter.new
	  vc.process(ARGV[0], ARGV[1] != nil)
	end

This is the code that actually gets executed when you run the program.  I always
call that the application code.

Here we see a simple branch that either presents a usage message or builds a
VehicleCounter and calls process() on it.  This could actually be just a module,
since it doesn't maintain any state.  It may help you to think of the following
code that way:

	# ...
	
	class VehicleCounter
	  # ...
	  
	  def process(file, report_averages)
	    raw_data = parse(file)
	    for sensor in 0..1 
	      sensor_report = DirectionDataReport.new(raw_data[sensor])
	      sensor_report.report(sensor, report_averages)
	    end
	  end
	  
	  # ...
	end
	
	# ...

This defines the overall process of this program which is just two steps:  read
in the data and build a report.  The parse() method manages the reading step
while reporting is encapsulated in DirectionDataReport objects.  We will begin
with the reading:

	  # ...
	  
	  Dir_A = 0
	  Dir_B = 1
	  
	  def parse(file)
	    times = []
	    dirs = [[], []]
	    
	    f = File.open(file)
	    f.each_line do |line|
	      sensor, time = parse_record(line)
	      times << time
	      
	      if (times.size % 2) == 0
	        if (times.size == 2 and sensor == Dir_A) or
	           (times.size == 4 and sensor == Dir_B)
	           
	           # Remove "B" records from second direction
	           times = [times[0], times[2]] if sensor == Dir_B
	           
	           dirs[sensor] << times
	           times = []
	         elsif (times.size == 4 and sensor != Dir_B)
	          puts "Parse error"
	          times = []
	         end
	      elsif (times.size % 2) == 1 and sensor == Dir_B
	        puts "Parse error - Unexpected B record"
	      end
	    end
	    f.close
	    
	    dirs
	  end
	  
	  def parse_record(data)
	    unpacked = data.unpack("a1a*")
	    return unpacked[0] == "A" ? Dir_A : Dir_B, unpacked[1].to_i
	  end
	  
	  # ...

This parser is what I meant by cutting corners earlier.  The data provided is
quite simple and you don't have to worry about cars going both directions at the
same time when working with it.  Given that, a pair of A times is a northbound
car and a set of A, B, A, B times is a southbound car.  This parse just hunts
for those sets.

Both sets of times are maintained in the Array of Arrays called dirs.  The
constants are used to select the right group based on the indicator found in
unpack()ing the last record of the group.  These two Arrays are the ultimate
results of this method.

Note that the file reading loop could be reduced to a File.foreach() call.

The rest of the code is reporting.  We saw a DirectionDataReport initialized in
the process() method and then witnessed a call to report() on that object. 
These two steps are the primary interface:

	MSecPerMin = 1000 * 60
	MSecPerHour = MSecPerMin * 60
	InchesPerMile = 63360
	
	class DirectionDataReport
	  Verbose = false
	  Sensors = %w(Northbound Southbound)
	  Days = %w(Mon Tue Wed Thu Fri Sat Sun)
	  
	  def initialize(raw_data)
	    @raw_data = raw_data
	  end
	  
	  def report(sensor, report_averages)
	    puts "Direction: #{Sensors[sensor]}"
	    puts "Total Cars: #{self.total_count}"
	    
	    report_time_periods(sensor, report_averages, MSecPerHour * 12)
	    report_time_periods(sensor, report_averages, MSecPerHour)
	    report_time_periods(sensor, report_averages, MSecPerHour / 2)
	    report_time_periods(sensor, report_averages, MSecPerHour / 4)
	    puts ""
	  end
	  
	  def total_count()
	    @raw_data.size
	  end
	  
	  # ...

You can see here that the constructor just squirrels away the data from the
parse.  The report() method, on the other, hand drives the output process. 
After reporting the direction and a total car count, it displays time oriented
breakdowns for various scales:  12 hours, one hour, 30 minutes, and 15 minutes.

Here is the report_time_periods() method and three of the methods it relies on:

	  # ...
	  
	  def report_time_periods(sensor, report_averages, time_period_length)
	    days = create_time_periods(time_period_length)
	    num_time_periods = (MSecPerHour * 24) / time_period_length
	
	    counts = count_per_time_period(days)
	    avg_speeds = speed_per_time_period(days)
	    avg_dists = dist_per_time_period(days)
	    
	    puts("\nTime Interval: #{time_period_length/MSecPerMin} Minutes")
	    if (num_time_periods > 2)
	      peaks = find_peak_times(days)
	      puts("Peak Times")
	      for i in 0...peaks.size
	        printf("#{Days[i]}:")
	        peaks[i].size.times {|p|
	          printf(format_time_interval_index(peaks[i][p][1],
	                                            time_period_length))
	        }
	        puts ""
	      end
	    end
	    
	    puts("Statistics")
	    printf("Data    ")
	    printf("\tDay") if not report_averages
	    
	    num_time_periods.times{|i|
	      printf(format_time_interval_index(i, time_period_length))
	    } 
	    report_column_data(days, num_time_periods, report_averages, counts,
	                       report_averages ? "Avg Count" : "Count    ", "% 5d")
	    report_column_data(days, num_time_periods, report_averages, avg_speeds,
	                       "Avg Speed", "%02.02f")
	    report_column_data(days, num_time_periods, report_averages, avg_dists,
	                       "Avg Dist ", "%02.02f")
	    puts ""
	  end
	  
	  def create_time_periods(time_period_length = MSecPerHour)
	    days = []
	    time_periods = nil
	    prev_start = @raw_data[0][0] + 1
	    
	    for data in @raw_data
	      if prev_start > data[0]
	        days << time_periods if time_periods != nil
	        
	        cur_time_period = 0
	        time_periods = [[]]
	        
	        puts "New day: data=#{data[0]}" if Verbose
	      elsif data[0] > ((cur_time_period + 1) * time_period_length)
	        while data[0] > ((cur_time_period + 1) * time_period_length)
	          cur_time_period += 1
	          time_periods[cur_time_period] = []
	        end
	        
	        puts "New time period: data=#{data[0]}" if Verbose
	      end
	      
	      time_periods[cur_time_period] << data
	      prev_start = data[0]
	    end
	    
	    days << time_periods
	    days
	  end
	  
	  def format_time_interval_index(index, time_period_length)
	    sprintf("\t%02d:%02d", 
	        index * time_period_length / MSecPerHour,
	        (index * time_period_length / MSecPerMin) % 60)
	  end
	  
	  def report_column_data(days, num_time_periods, report_averages, data,
	                         data_label, format_string)
	    if report_averages
	      printf("\n#{data_label}")
	      for time in 0...num_time_periods
	        avg = 0
	        days.size.times {|day| avg += data[day][time] }
	        printf("\t#{format_string}", avg / days.size)          
	      end
	    else
	      for day in 0...days.size
	        printf("\n#{data_label}\t#{Days[day]}")
	        for time in 0...num_time_periods
	          printf("\t#{format_string}", data[day][time])
	        end
	      end
	    end
	  end
	  
	  # ...

There is a lot of code here obviously, but none of it is very complex.  First,
report_time_periods() breaks the data down into time intervals using
create_time_periods().  That method is just a partition tool for the data.  Once
it's divided out, a few utility method we will examine next are used to separate
the interesting information out.

After that, the entire rest of the of the method is printing code for what it
found.  (The peak data section actually calculates, again using another helper,
and prints for appropriate data ranges.)  Most of the print work is delegated to
the bottom two methods and much or their magic is for printing columnar data.

Here are the helpers I glossed over and the helpers they rely on:

	  # ...
	  
	  def find_peak_times(days, num_peaks=4)
	    days.map do |day| 
	      find_daily_peak_times(day, num_peaks)
	    end
	  end
	  
	  def find_daily_peak_times(daily_time_periods, num_peaks)
	    peaks = []
	    daily_time_periods.size.times {|i|
	      peaks << [daily_time_periods[i].size, i]
	    }
	    peaks.sort.reverse.slice(0, num_peaks).sort {|a,b| a[1]<=>b[1]}
	  end
	  
	  def count_per_time_period(days)
	    days.map do |day|
	      day.map {|time_period| time_period.size}
	    end
	  end
	  
	  def speed_per_time_period(days)
	    days.map do |day| 
	      day.map {|time_period| calc_average_speed(time_period) } 
	    end
	  end
	
	  def dist_per_time_period(days)
	    days.map do |day| 
	      day.map {|time_period| calc_average_distance(time_period) } 
	    end
	  end  
	  
	  def calc_average_speed(time_period)
	    return 0 if time_period.size == 0
	    
	    sum = 0    
	    for time in time_period
	      sum += calc_speed(time[0], time[1])
	    end    
	    sum / (time_period.size)
	  end
	
	  def calc_speed(start_time, end_time)
	    return (100.0 / (end_time - start_time)) *
	           MSecPerHour / (InchesPerMile * 1.0) 
	  end  
	  
	  def calc_average_distance(time_period)
	    return 0 if time_period.size <= 1 # Need at least 2 cars
	    sum = 0
	    for i in 0...(time_period.size - 1)
	      sum += calc_distance(time_period[0], time_period[1])
	    end
	    return sum / (time_period.size - 1)
	  end
	
	  def calc_distance(leader_time, follower_time)
	    follower_speed = calc_speed(follower_time[0], follower_time[1])
	    
	    dist = follower_speed * ((follower_time[0] - leader_time[0]) /
	           (MSecPerHour * 1.0))
	    
	    return dist
	  end
	end
	
	# ...

There are three kinds of helpers here.  The peak methods just sort the data and
take so many entries off the top.  The *_per_time_period() method are wrappers
that apply the calc_*() methods over periods of data.  Which means the calc_*()
methods are where the action is.  They figure out car speeds, distances between
cars, and the averages for both using simple math.  This is the primary data
that we see in the report output.

Again it's a lot of code to cover, but Justin has produced a great end result. 
I'm not going to include the output here because it too is large, but go run the
program and take a look at what it builds.

My thanks to Justin for the heroic effort and Gavin for the interesting problem.

The Ruby Quiz began with an encoding problem that came out of a book and
tomorrow we have another one for you...