On 9/14/07, Ruby Quiz <james / grayproductions.net> wrote:
> This week's Ruby Quiz is to write a simple utility.  Your program should accept
> an IP address as a command-line argument and print out the two letter code for
> the country that IP is assigned in.  You can find a database for the matching
> at:
>
>         http://software77.net/cgi-bin/ip-country/geo-ip.pl
>
> To keep the problem interesting though, let's write our programs with a focus on
> speed and memory efficiency.

Hi,

This is my second submission to Ruby Quiz, any tip will be greatly
appreciated. First I downloaded the file and removed all comments,
then tried a straightforward approach. Here it is:

require 'faster_csv'
ip = ARGV[0].split(/\./)
ip = ip[0].to_i * 16777216 + ip[1].to_i * 65536 + ip[2].to_i * 256 + ip[3].to_i
file = ARGV[1] || 'ipdb.csv'

FasterCSV.foreach(file) do |line|
  if (line[0].to_i <= ip && line[1].to_i >= ip)
    puts line[4]
    exit
  end
end

It allows to pass the IP as the first parameter, then transforms it to
the format in the file (the "base" 256). The file can be passed as the
second argument, by default I used my modified file without the
comments.

The method uses FasterCSV to parse the file line by line, checking ip
against the range. The result for the IP in the example was:

time ruby quiz139a.rb 68.97.89.187
US

real    0m0.355s
user    0m0.328s
sys     0m0.024s

So, it was a little bit more than the one presented with the problem.
Unfortunately, things don't go so good when you search for an IP that
matches entries at the end of the file:

time ruby quiz139a.rb 255.0.0.1
ZZ

real    0m5.337s
user    0m5.036s
sys     0m0.292s

More than 5 seconds :-(

So, I went on to build a second version. This time the approach I
took, as the file is ordered is a binary search directly on the file.
Here it is:

require 'ftools'

ip = ARGV[0].split(/\./)
ip = ip[0].to_i * 16777216 + ip[1].to_i * 65536 + ip[2].to_i * 256 + ip[3].to_i
file = ARGV[1] || 'ipdb.csv'

File.open(file) do |f|
  low = 0
  high = f.stat.size
  f.seek(high / 2)
  while true
    while ((a = f.getc) != 10)
      f.seek(-2, IO::SEEK_CUR)
    end
    pos = f.pos
    line = f.readline.split(",")
    low_range = line[0][1..-2].to_i
    high_range = line[1][1..-2].to_i
    if (low_range > ip)
      high = pos
      offset = (f.pos - pos) + ((high - low) / 2)
      f.seek(-offset, IO::SEEK_CUR)
    elsif (high_range < ip)
      low = f.pos
      f.seek((high-low) / 2, IO::SEEK_CUR)
    else
      puts line[4][1..-2]
      exit
    end
  end
end

What the method does is a binary search in the file. It starts a low
and high variables to manage the partition, then positions the cursor
in the file to the middle. Every time, the method reads back until it
finds a 10 (\n), then reads that line to check the IP range. If the
lower range is higher than the IP, the method moves the cursor in the
file, down to half of the current low-high range. If the higher range
is lower than the IP, then it moves up half of the range. The timings
here are much better:

time ruby quiz139b.rb 68.97.89.187
US

real    0m0.077s
user    0m0.048s
sys     0m0.004s

time ruby quiz139b.rb 255.0.0.1
ZZ

real    0m0.069s
user    0m0.060s
sys     0m0.008s

Apart from the general strategy of the binary search, I haven't had
the time to give too much thought to the details, like how to extract
the values after the readline. Any tips regarding that would be
greatly appreciated. Also I don't know if there's a better way of
moving back in the file to the previous line break (the while (a =
f.getc) != 10 part). Tips?

Thanks for this quiz. I hope I had time to work on every quiz.

Regards,

Jesus.