Hi all (this is going to comp.lang.ruby and comp.lang.perl.misc),

The other day I wrote a basic program in Perl, and the following day I
rewrote it in Ruby. I'm curious about the differences in runtime of
the two versions, though.

Let me start by describing the program (I'll append the full code for
both versions at the end): it reads a list of alphanumeric codes from a
file (each code has the form [\w\d\S]+_\d{3}, except that in the file
the underscore is written as a comma), then creates a hash with those
codes as keys and empty arrays as values. After the hash is built, the
program walks a given directory and its subdirectories (using
File::Find in Perl and Find.find in Ruby) and checks each file against
the hash of codes (with a few regexps and conditions to avoid a lot of
unnecessary looping), adding the file to a code's array if that code
appears in the filename. Finally, it writes the contents of the hash to
a .csv file, one CODE,PATH line per match.
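The matching step described above can be sketched in Ruby roughly as follows. This is a minimal sketch, not either of the versions below: an in-memory list of paths stands in for the directory walk, the method name `match_files` and the sample data are hypothetical, and each code is tested as a plain substring of the file's basename (an assumption; the originals treat each code as a regexp).

```ruby
# Prefilter shared by both versions: only names that start with CODE_NNN.
FILE_FORMAT = /^[\d\w\S]+_\d{3}/

# Hypothetical helper: map each code to the paths whose basename contains it.
# Assumes codes are literal strings (the originals interpolate them as regexps).
def match_files(codes, paths)
  matches = codes.to_h { |code| [code, []] }   # every code starts with an empty list
  paths.each do |path|
    name = File.basename(path)
    next unless FILE_FORMAT.match?(name)       # skip files not in the expected format
    codes.each do |code|
      next unless name.include?(code)
      matches[code] << path unless matches[code].include?(path)  # avoid duplicates
    end
  end
  matches
end

matches = match_files(
  ["AB12_001", "XY9_042"],
  ["/data/a/AB12_001_rev.txt", "/data/b/AB12_001.pdf",
   "/data/c/notes.txt", "/data/d/XY9_042x.doc"]
)
```

Note the nested loop: every file that passes the prefilter is compared against every code, so the work in this step grows with (files x codes) in both languages.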

Now, if one language were simply -slower- than the other, I wouldn't
be bothering you folks. But here's where it gets a little unusual: the
number of entries in the code list has a noticeable impact on the run
time of the Ruby version, but far less on the Perl version. I ran each
version a few times with code lists of various sizes; both print
start/stop timestamps at the end, so I collected the timings:

Entries | Ruby (s) | Perl (s)
--------+----------+---------
      4 |      153 |      291
     64 |      133 |      258
    256 |      222 |      253
    512 |      327 |      248
   1024 |      562 |      353
   1500 |      683 |      363

Ruby runs faster for small code lists (up to 256 entries), as you can
see, but by 1500 entries Ruby's time has more than quadrupled (153 s to
683 s) while Perl's has gone up by only about a quarter (291 s to 363 s).
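To put numbers on the growth, here is a small worked calculation over the table above. The hash literal just restates the measured timings; the end-point ratio and slope are a rough summary using only the 4-entry and 1500-entry runs, not a fitted model.

```ruby
# Measured timings from the table: entries => [ruby_seconds, perl_seconds]
TIMINGS = {
  4 => [153, 291], 64 => [133, 258], 256 => [222, 253],
  512 => [327, 248], 1024 => [562, 353], 1500 => [683, 363]
}.freeze

ruby_small, perl_small = TIMINGS[4]
ruby_large, perl_large = TIMINGS[1500]
entries_added = 1500 - 4

# Ratio of run time at 1500 entries to run time at 4 entries.
ruby_growth = ruby_large.fdiv(ruby_small)   # about 4.46
perl_growth = perl_large.fdiv(perl_small)   # about 1.25

# Extra seconds per additional code, using only the two end points.
ruby_slope = (ruby_large - ruby_small).fdiv(entries_added)  # about 0.35
perl_slope = (perl_large - perl_small).fdiv(entries_added)  # about 0.05

printf("growth: Ruby %.2fx, Perl %.2fx\n", ruby_growth, perl_growth)
printf("slope:  Ruby %.3f s/code, Perl %.3f s/code\n", ruby_slope, perl_slope)
```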

I've looked over the code for both versions several times, and I don't
see any significant differences. The only important feature the Ruby
version lacks is the sort() before writing the file.
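For comparison, a minimal Ruby sketch of writing the results in sorted key order, as the Perl version does with sort(keys %filecodes), might look like this. It assumes unmatched codes map to an empty array rather than the Ruby version's "empty" sentinel, and `write_results` is a hypothetical name.

```ruby
require "stringio"

# Hypothetical helper: write one CODE,PATH line per match, with codes in
# sorted order (like Perl's sort(keys %filecodes)). Codes with no matches
# get a single "NO FILES FOUND" line, as in both original versions.
def write_results(io, filecodes)
  filecodes.keys.sort.each do |code|
    paths = filecodes[code]
    paths = ["NO FILES FOUND"] if paths.empty?
    paths.each { |path| io.puts "#{code},#{path}" }
  end
end

# Usage: a StringIO stands in for the file opened with File.open(..., "w").
out = StringIO.new
write_results(out, { "ZZ9_002" => [], "AA1_001" => ["/x/AA1_001.txt"] })
print out.string
```

The sort is a single pass over the keys after the search finishes, so leaving it out should, if anything, make the Ruby version slightly faster.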

I'd really appreciate any insight into why Ruby's runtime grows so
readily and Perl's does not.

Code of both versions follows.

Thanks,
Andrew Fallows

Perl:

use File::Find;
use strict;
use warnings;
my $code;
my $type;
my %filecodes = ();
my $start_time = "Started: " . localtime();
$| = 1; #Enables flush on print.
$\ = "\n"; #Automatic newlines on print
open(ITEM_LIST, "(path)") or die "Error";

# This loop builds a hash whose keys are the codes/types from file
# and whose values are references to empty arrays
while(my $item = <ITEM_LIST>)
{
	$item =~ s/,/_/;
	$item =~ s/\n//g;
	print $item;
	my @files = ();
	$filecodes{$item} = \@files;
}
print "Hash built";

# Uses File::Find to iterate over the entire subdirectory
find(\&file_seek, "(path)");

# The searching portion: gets each location from File::Find, then compares it
# to all the targets. If there is a match, prints a message and adds that file
# to the related array.
sub file_seek
{
	my $file = $_;
	# Kicks out if the file in question is not of the necessary format
	if(!(-f $file) || !($file =~ /^[\d\w\S]+_\d{3}/)){ return; }

	foreach my $target (keys(%filecodes))
	{
		# If the file name contains the code sought
		if($file =~ /$target/)
		{
			print "found $file in $File::Find::dir";

			# Returns from file_seek entirely (so any remaining targets are not
			# checked for this file) if the list for this code already contains it.
			for (0..@{$filecodes{$target}})
			{
				if(defined(${$filecodes{$target}}[$_])
				&& $File::Find::name eq ${$filecodes{$target}}[$_]) {return; }
			}
			push(@{$filecodes{$target}}, $File::Find::name);
		}
	}
}

# After the whole directory has been searched, prints each key and all
# values found for it.
open(RESULTS, "> (path)") or die "Error 2";
foreach my $target (sort(keys(%filecodes)))
{
	my @results = @{$filecodes{$target}};
	if(@results == 0) { push(@results, "NO FILES FOUND") }
	print $target;
	foreach (@results)
	{
		print RESULTS "$target,$_";
		print "\t$_";
	}
}
close RESULTS;
print  $start_time;
print "Ended: " . localtime();

Ruby:

class FileSearcher
  $\ = "\n"
  in_file = File.open( "(path)","r")
  start_time = Time.now
  filecodes = Hash.new
  # This loop reads all the item codes in from file and then
  # adds them to a hash, each linked to its own empty array
  while item = in_file.gets
    item = item.gsub(',','_')
    item = item.gsub("\n","")
    files = Array.new
    files.push("empty");
    filecodes[item]= files
  end
  in_file.close

  # The searching portion: looks at each file/location, then compares it
  # to all the targets. If there is a match, prints a message and adds
  # that file to the related array.
  require 'find'
  require 'ftools'
  Find.find("(path)") do |file|
    if !(FileTest.file?(file)) || !(File.basename(file) =~ /^[\d\w\S]+_\d{3}/)
      next
    else
      filecodes.each_key do |target|
        if(file =~ /#{target}/)
          puts "found " + target + " at " + file
          $stdout.flush
          fail = 0
          for i in 0..filecodes[target].size-1 do
            if(filecodes[target][i] != "empty" &&
            File.basename(file) == File.basename(filecodes[target][i]))
              fail = 1
              break
            end
          end
          if fail == 0
            if filecodes[target][0] == "empty"
              filecodes[target][0] = file
            else
              filecodes[target].push(file)
            end
          end
        end
      end
    end
  end

  # After the whole directory has been searched, prints each key and all
  # values found for it to a file called Ruby_results.csv.
  target_file = File.open("(path)","w")
  filecodes.each_key do |target|
    results = filecodes[target]
    if results[0] == "empty"
      results[0] = "NO FILES FOUND"
    end
    puts target
    for i in 0..(results.size-1)
      target_file.puts target + "," + results[i]
    end
  end
  target_file.close
  end_time = Time.now
  puts "Started: " + start_time.to_s
  puts "Ended: " + end_time.to_s
end
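One difference that may be worth isolating when profiling: in the Ruby loop, `/#{target}/` constructs a new Regexp for every file/code pair. The sketch below is a hypothetical micro-benchmark on synthetic names (not the real code list or directory) that compares that against compiling each pattern once up front. It uses only core Ruby, and timings on a current interpreter may differ from whatever version produced the numbers above, so treat it as a way to test the idea rather than a diagnosis.

```ruby
# Synthetic data (hypothetical): 1500 codes and 200 file names, where each
# name contains exactly one of the codes.
codes = (1..1500).map { |i| format("C%04d_%03d", i, i % 1000) }
names = (1..200).map { |i| format("C%04d_%03d_report.txt", i * 7, (i * 7) % 1000) }

# Run a block, returning its result and the elapsed wall-clock seconds.
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

# Variant A: build a fresh pattern for every name/code pair, as in the loop above.
hits_a, secs_a = timed do
  names.sum { |name| codes.count { |code| name =~ /#{code}/ } }
end

# Variant B: compile each pattern once, before walking the names.
patterns = codes.map { |code| Regexp.new(code) }
hits_b, secs_b = timed do
  names.sum { |name| patterns.count { |re| re.match?(name) } }
end

printf("interpolated: %d hits in %.3fs\n", hits_a, secs_a)
printf("precompiled:  %d hits in %.3fs\n", hits_b, secs_b)
```

If variant B is much faster on your machine, building the Regexp objects when the hash is built would be a cheap change to try in the real script; if not, the growth likely comes from somewhere else.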