James Edward Gray II schrieb:
> On Sep 18, 2007, at 4:00 AM, Matthias Wächter wrote:
>> Anyway -- i'd like to see a 100000 lookups comparison :) *hehe*
> Great.  Run it for us and let us know how we do.  ;)


Here are the results for the solutions submitted so far, and it looks 
like my solution takes the 100k-lookup performance victory :)

First Table: Compilation (Table Packing) -- times in seconds

              real    user    sys
Adam[*]     0.005   0.002   0.003
Luis        0.655   0.648   0.007
James[**]  21.089  18.142   0.051
Jesse       1.314   1.295   0.020
Matthias    0.718   0.711   0.008

[*]: Adam does not perform real compression; instead he computes two 
boundaries and later searches within the original .csv directly (see 
the sketch right after these notes).
[**]: Upon rebuild, James fetches the .csv sources from the web, which 
makes his solution look slow here. This figure depends heavily on 
your--actually my--ISP speed.
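
For the curious, here is how I read that approach -- a minimal, purely 
hypothetical sketch (my names and my guess at the column layout, not 
Adam's code): binary-search the untouched, sorted .csv by byte offset 
between the two boundaries.

# Hypothetical sketch, not Adam's code. `ip` is the address already
# converted to a single integer; data_start/data_end are the byte
# offsets of the first data record and of the end of the file.
def lookup(ip, csv_path, data_start, data_end)
  File.open(csv_path) do |f|
    f.seek(data_start)                      # the very first record needs a
    fields = f.gets.delete('"').split(',')  # direct check -- the bisection
    if ip.between?(fields[0].to_i, fields[1].to_i)  # below can step over it
      return fields[4]
    end

    lo, hi = data_start, data_end
    while lo < hi
      mid = (lo + hi) / 2
      f.seek(mid)
      f.gets                                # throw away the partial line we hit
      line = f.gets or break
      from, to = line.delete('"').split(',').values_at(0, 1).map { |s| s.to_i }
      if    ip < from then hi = mid
      elsif ip > to   then lo = mid + 1
      else  return line.delete('"').split(',')[4]   # 2-letter country code
      end
    end
  end
  nil                                       # address not covered by any range
end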


Second Table: Run (100_000 Addresses) -- times in seconds

              real    user    sys
Adam       24.943  22.993   1.951
Bill       35.080  33.029   2.051
Luis       16.149  13.706   2.444
Eugene[*]  52.307  48.689   3.620
Eugene     65.790  61.984   3.805
James      14.803  12.449   2.356
Jesse      14.016  12.343   1.673
Jesus_a[**]
Jesus_b[**]
Kevin[***]
Matt_file   6.192   5.332   0.859
Matt_str    3.704   3.699   0.005
Simon      69.417  64.679   4.706
Justin     56.639  53.292   3.345
steve      63.659  54.355   9.294

[*]: Eugene already implements a random address generator. To make 
things fair, I changed his implementation to read the same values from 
$stdin as all the other implementations do. The starred version uses 
his own random generator and runs outside the competition; the starless 
version is my modified one.
[**]: O Jesus :), I can't get your FasterCSV version (a) to run, and in 
the later version you sent, the direct parsing breaks when it comes to 
detecting the commented lines in the first part of the file, so I 
couldn't get that one to run either. Sorry.
[***]: Although I managed to write the missing SQL insertion script and 
even to add separate indexes for the address limits, Kevin's SQLite3 
version simply took too long -- I estimated a run time of over an hour. 
I am willing to rerun the test if someone tells me how to speed things 
up with SQLite3 enough to make it competitive.
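
For completeness, my glue code for Kevin's version looked roughly like 
this -- a hedged sketch only; the table and column names are my own 
choices, not necessarily his schema:

# Rough sketch of my own insertion script, not Kevin's solution.
# Loads IpToCountry.csv into SQLite3 and indexes both address limits.
require 'rubygems'   # needed on Ruby 1.8 for the sqlite3 gem
require 'sqlite3'

db = SQLite3::Database.new('ip2country.db')
db.execute <<-SQL
  CREATE TABLE IF NOT EXISTS ranges (ip_from INTEGER, ip_to INTEGER, country TEXT)
SQL

ins = db.prepare('INSERT INTO ranges VALUES (?, ?, ?)')
db.transaction do                      # one big transaction, or the inserts crawl
  IO.foreach('IpToCountry.csv') do |line|
    next if line =~ /\A\s*(#|$)/       # skip comments and blank lines
    f = line.delete('"').split(',')
    ins.execute(f[0].to_i, f[1].to_i, f[4])
  end
end
ins.close

db.execute('CREATE INDEX IF NOT EXISTS idx_from ON ranges(ip_from)')
db.execute('CREATE INDEX IF NOT EXISTS idx_to   ON ranges(ip_to)')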

Note that I slightly changed all implementations so that they contain a 
loop iterating over $stdin.each instead of reading ARGV or just using 
ARGV[0]. For the test, each script was run only once and was fed all 
addresses in that single run. The test set consisted of 100_000 freshly 
generated random IP addresses written to a file and supplied using the 
following syntax:

$ (time ruby IpToCountry.rb <IP100k > /dev/null) 2>100k.time
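
For reference, the two harness pieces were along these lines -- a 
minimal sketch, with lookup() standing in for whatever each solution 
actually does per address:

# generate the IP100k test file: 100_000 random dotted quads
File.open('IP100k', 'w') do |out|
  100_000.times { out.puts Array.new(4) { rand(256) }.join('.') }
end

# and inside each solution, instead of using ARGV[0] once:
$stdin.each do |line|
  ip = line.chomp
  puts lookup(ip)   # lookup() is a placeholder for the solution's own code
end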

I didn't verify the output of the scripts beyond checking one address 
up front, mainly because all the scripts use different output formats. 
The tests were only meant to measure performance.


Just for Info:

$ uname -a
Linux sabayon2me 2.6.22-sabayon #1 SMP Mon Sep 3 00:33:06 UTC 2007 
x86_64 Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz GenuineIntel GNU/Linux
$ ruby --version
ruby 1.8.6 (2007-03-13 patchlevel 0) [x86_64-linux]
$ cat /etc/sabayon-release
Sabayon Linux x86-64 3.4

- Matthias