On Sat, Jul 06, 2002 at 11:18:47PM +0900, Martin Pirker wrote:
> I have a Ruby speed problem, maybe you have some suggestions:
> 
> given: text files with values, one per line, sorted, e.g.
> 10100
> 10234
> 10292
> ......
> 
> so:
> arr1 = IO.readlines(file1)
> arr2 = IO.readlines(file2)
> 
> arr1 consists of ~40000 lines/elements
> arr2 size is ~10000
> 
> when I want to take the "set difference", arr3 = arr1-arr2, meaning "take
> all elements from arr1 which dont appear in arr2" this takes forever - I
> don't even know how long because I stopped early ;)
> 
> So the built-in operator is too slow - taking advantage of my knowledge
> of the sorting I handcoded a "loop { compare arr1[0] with arr2[0], either
> puts or remove }"
> This gets it down to 5-10s, much better, but still too slow - 1s would be
> a nice target
> 
> As I'm still quite novice in Ruby (always having Hal's book on my lap ;) ..)
> how would you code a speedier solution?

You could also do it without hashes (though that solution is clearly the
simplest).  This algorithm feels to me like the merge stage of a merge
sort, sorta (if you're not familiar with merge sort, google for it):

a1 = [1, 3, 4, 10, 13, 31, 53, 54, 55]
a2 = [3, 10, 10, 11, 13, 54, 55]

i = 0
a3 = []
a1.each { |a|
  i = i + 1 while a2[i] < a

  a3 << a if a2[i] > a
}

a3.each {|a| puts a}


-- 
      Evan Martin
martine / cs.washington.edu
  http://neugierig.org