Robert Klemme wrote:

> What about this one?
> 
> def get_file_names_3
>    Dir["{CAL,NCPH,GOH}[0-9][0-9][0-9][0-9][0-9][0-9].xls"]
> end

Robert

  I had not actually tested case 1 in the benchmark (I had dropped the 
digit '1' in the call). It turns out that Find, as you had thought, is 
what causes most of the inefficiency. Without repeating all the code, 
here are the results, including your last suggestion.

# uses Find, and selects files manually
def get_file_names
  fn=[]
  ..
  fn
end

# 1) uses Dir.glob and builds array with a loop
def get_file_names1
  fn=[ ]
  all_files = Dir.glob("*")
  ..
  fn
end

# 2) uses Dir.glob and grep
def get_file_names2
  all_files = Dir.glob("*")
  my_files  = all_files.grep(%r{^ (CAL|NCPH|GOH) \d{6} \.xls $}x)
end

# 3) variation of solution 2
def get_file_names3
   Dir["{CAL,NCPH,GOH}[0-9][0-9][0-9][0-9][0-9][0-9].xls"]
end

require 'benchmark'

# with 30 files
Benchmark.bm(5) do |timer|
  timer.report('get_file_names')  {10_000.times {get_file_names}  }
  timer.report('get_file_names1') {10_000.times {get_file_names1} }
  timer.report('get_file_names2') {10_000.times {get_file_names2} }
  timer.report('get_file_names3') {10_000.times {get_file_names3} }
end

                      user     system      total        real
get_file_names   14.640000   9.080000  23.720000 ( 23.778029)
get_file_names1   1.690000   1.200000   2.890000 (  2.903737)
get_file_names2   1.370000   1.210000   2.580000 (  2.581539)
get_file_names3   1.430000   3.530000   4.960000 (  4.968951)

Solution 2) is, as we saw before, the winner: roughly 9 times faster 
than the original solution (2.58 s vs. 23.78 s real time). But the grep 
only improves on solution 1 by about 10%; the bulk of the gain comes 
from removing Find (as Robert had guessed)!
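
The ratios above are rounded by eye, so here is a small sketch that recomputes them from the 'real' column of the table. The hash keys and variable names are mine, chosen only for illustration; the numbers are copied from the benchmark output.

```ruby
# Real (wall-clock) seconds copied from the benchmark table.
real = {
  find:      23.778029,  # get_file_names  (Find)
  glob_loop:  2.903737,  # get_file_names1 (Dir.glob + loop)
  glob_grep:  2.581539   # get_file_names2 (Dir.glob + grep)
}

# How many times faster solution 2 is than the original.
speedup = real[:find] / real[:glob_grep]

# Fraction of solution 1's time that grep saves.
grep_gain = (real[:glob_loop] - real[:glob_grep]) / real[:glob_loop]

# Fraction of the total time saved that is due to dropping Find.
find_share = (real[:find] - real[:glob_loop]) /
             (real[:find] - real[:glob_grep])

puts format("solution 2 vs original:       %.1fx faster", speedup)
puts format("grep vs loop:                 %.1f%% less time", grep_gain * 100)
puts format("saving due to dropping Find:  %.1f%%", find_share * 100)
```

On these numbers the speedup comes out at roughly 9.2x, and grep trims about 11% off solution 1. Measured as a share of the seconds saved, dropping Find accounts for roughly 98%, so the split is even more lopsided than the round figures suggest.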

Regarding the last one (called solution 3) from you:
>    Dir["{CAL,NCPH,GOH}[0-9][0-9][0-9][0-9][0-9][0-9].xls"]

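This turned out to be about 2 times slower than solution 2. Checking 
whether something is in 0-9 is apparently quite a bit slower than 
checking for a 'digit', although nearly all of the extra time shows up 
in the system column (3.53 vs. 1.21) rather than the user column, so 
the character class may not be the real culprit.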
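
Since solutions 2 and 3 are being compared on speed, it is worth checking that they select the same files. A minimal sketch, run in a throwaway directory; the sample file names and the constant names are made up for the test and are not from the original data.

```ruby
require 'tmpdir'
require 'fileutils'

PATTERN_RE   = %r{^ (CAL|NCPH|GOH) \d{6} \.xls $}x
PATTERN_GLOB = "{CAL,NCPH,GOH}[0-9][0-9][0-9][0-9][0-9][0-9].xls"

# Hypothetical names: the first three should match, the rest should not.
sample = %w[CAL070101.xls NCPH070102.xls GOH070103.xls
            CAL07010.xls XCAL070101.xls GOH070103.xlsx notes.txt]

by_grep = by_glob = nil
Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    FileUtils.touch(sample)
    by_grep = Dir.glob("*").grep(PATTERN_RE).sort  # solution 2
    by_glob = Dir[PATTERN_GLOB].sort               # solution 3
  end
end

puts by_grep.inspect
puts by_grep == by_glob
```

If the two lists ever differ on real data, the timing comparison is not like for like.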

In conclusion:
Solution 2 is the fastest; but the reason is not the grep (as I had 
theorized), which accounts only for 10% of the improvement; the other 
90% comes from removing Find, as Robert had guessed.

One last consideration: as the number of files increases, the weight of 
'grep' in the improvement increases too (but enough benchmarks for 
today :-).
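
For anyone who wants to check the effect of the file count, here is a rough sketch that times the loop and grep variants against directories of growing size. The loop body of solution 1 is not shown above, so `names_with_loop` is only an assumed stand-in; the file counts, the repetition count and the generated file names are arbitrary choices too.

```ruby
require 'benchmark'
require 'tmpdir'
require 'fileutils'

RE = %r{^ (CAL|NCPH|GOH) \d{6} \.xls $}x

# Stand-in for solution 1: the original loop is elided in the post,
# so this each/push version is an assumption.
def names_with_loop
  fn = []
  Dir.glob("*").each { |f| fn << f if f =~ RE }
  fn
end

# Solution 2 as posted.
def names_with_grep
  Dir.glob("*").grep(RE)
end

results = {}
[30, 300, 3000].each do |count|
  Dir.mktmpdir do |dir|
    Dir.chdir(dir) do
      # Every third file matches; the others are noise with another extension.
      names = (0...count).map do |i|
        i % 3 == 0 ? format("CAL%06d.xls", i) : format("other%06d.txt", i)
      end
      FileUtils.touch(names)

      loop_t = Benchmark.realtime { 50.times { names_with_loop } }
      grep_t = Benchmark.realtime { 50.times { names_with_grep } }

      # Keep both result lists so they can be compared afterwards.
      results[count] = [names_with_loop.sort, names_with_grep.sort]
      puts format("%5d files: loop %.4fs  grep %.4fs", count, loop_t, grep_t)
    end
  end
end
```

The timings will vary by machine and filesystem, so the printed numbers are only meaningful relative to each other within one run.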

Regards

Raul

-- 
Posted via http://www.ruby-forum.com/.