Ok, I need to preface this by saying that I'm in no way either a C or
ruby guru, so I may have missed something here, but this is what I'm
seeing. The bottle-neck here is in the printing to screen.

I don't see how the OP got the code to run in 5 seconds while using any
stdout writing function (e.g., printf). The program should print like 4
megs of text to the screen (though I couldn't get the C that was posted
to compile--like I said, I'm no C superstar) -- there's no way that is
happening in 5 seconds.

Taking a very simple test case, which just prints a 10 byte char array
15000 times using puts:

--------

#include <stdio.h>
#define SIZE 15000
int main() {
  char str[11]   = "ABCDEFGHIJ";
  int i;
  for (i = 0; i < SIZE; ++i) {
    puts(str);
  }
  printf("Printed %i bytes of data\n\0", SIZE*strlen(str));
  return 0;
}

--------

Compiled with:

gcc -o test test.c

This yields:

time ./test

ABCDEFGHIJ
ABCDEFGHIJ
...
ABCDEFGHIJ
Printed 150000 bytes of data

real    0m25.621s
user    0m0.010s
sys     0m0.029s


Now, let's see how Ruby's stdout writing stacks up:

--------

#!/usr/bin/ruby -w
SIZE = 15000
str = "ABCDEFGHIJ"
1.upto(SIZE) {
  STDOUT.syswrite("#{str}\n")
}
STDOUT.syswrite("Printed #{SIZE*str.length} bytes of data\n")

--------

This yields:

ABCDEFGHIJ
ABCDEFGHIJ
...
ABCDEFGHIJ
Printed 150000 bytes of data

real    0m26.796s
user    0m0.202s
sys     0m0.049s

Pretty comparable there.


Of course, the bottle-neck was *supposedly* the maths and array access
and such, which is where C would excel (I'm not denying that ruby is
sometimes pretty slow, or that C is *much* faster all around, just bare
with me here).


So with one ruby implementation of the program mentioned on here:

--------

#!/usr/bin/ruby -w

Wd = (ARGV.shift || 5).to_i
$board = []

# Generate all possible valid rows.
Rows = (0...Wd ** Wd)\
       .map { |n| n.to_s(Wd)\
       .rjust(Wd,'0') }\
       .reject{ |s| s =~ /(.).*\1/ }\
       .map { |s| s.split(//)\
                  .map { |n| n.to_i + 1 }
       }

def check (ary, n)
  ary[0, n + 1].transpose.all? { |x| x.size == x.uniq.size }
end

def add_a_row (row_num)
  if (Wd == row_num)
    STDOUT.syswrite($board.map { |row| row.join }.join(':'))
  else
    Rows.size.times { |i|
      $board[row_num] = Rows[i]
      if (check($board, row_num))
        add_a_row(row_num + 1)
      end
    }
  end
end

add_a_row(0)

---------

This took like 48 minutes! Ouch! But if that syswrite (or puts in the
original version) is replaced with a file write:

--------

def add_a_row (row_num)
  if Wd == row_num
    $outfile << $board.map { |row| row.join }.join(':')
  else
    Rows.size.times { |i|
      $board[row_num] = Rows[i]
      if (check($board, row_num))
        add_a_row(row_num + 1)
      end
    }
  end
end

$outfile = File.open('latins.dump', 'wb')
add_a_row(0)
$outfile.close

--------

Now we're down to 16 minutes. Much better! But still...


Ok, so what about the implementation using the permutation and set
classes?

--------

#!/usr/bin/ruby -w

require('permutation')
require('set')

$size = (ARGV.shift || 5).to_i

$perms = Permutation.new($size).map { |p| p.value }
$out = $perms.map { |p| p.map { |v| v+1 }.join }
$filter = $perms.map { |p|
  s = SortedSet.new
  $perms.each_with_index { |o, i|
    o.each_with_index { |v, j| s.add(i) if p[j] == v }
  } && s.to_a
}

$latins = []
def search lines, possibs
  return $latins << lines if lines.size == $size
  possibs.each { |p|
    search lines + [p], (possibs -
                         $filter[p]).subtract(lines.last.to_i..p)
  }
end

search [], SortedSet[*(0...$perms.size)]

$latins.each { |latin|
  $perms.each { |perm|
    perm.each { |p| STDOUT.syswrite($out[latin[p]] + "\n") }
    STDOUT.syswrite("\n")
  }
}

--------

26 minutes...that's still gonna leave a mark. But the file IO version
of the same program?

--------

outfile = File.open('latins.dump', 'wb')
$latins.each { |latin|
  $perms.each { |perm|
    perm.each { |p| outfile << $out[latin[p]] << "\n" }
    outfile << "\n"
  }
}
outfile.close

--------

time ruby test.rb

real    0m17.227s
user    0m13.969s
sys     0m1.399s

17 seconds! WOOHOO! Yes, indeedy. That's more like it. Try it if you
don't believe me.



So the moral of the story is twofold:

1.) Don't assume the bottle-neck is in the interpreter and just run off
and start writing everything in C or Java or simplified Klingon or
whatever. Testing and coverage and profiling is the key to discovering
the true cause of your woes. Here the bottle-neck was in the writing 4+
megs to stdout -- and going to C won't help for that, contra the claims
of the OP.

2.) Don't write a crapload of text to stdout. It won't be as fast as
you'd like it to be no matter what language you use.


========

NB: All testing was done on:

Linux 2.6.17-gentoo-r5 #2 PREEMPT Thu Aug 10 14:21:37 CDT 2006 i686 AMD
Athlon(tm) XP 1600+ AuthenticAMD GNU/Linux

gcc version 3.4.6 (Gentoo 3.4.6-r2, ssp-3.4.6-1.0, pie-8.7.9)

GNU C Library development release version 2.4

ruby 1.8.5 (2006-08-18) [i686-linux]

========


Regards,
Jordan