Hi, I'm working on a librairie to analyse repport of the prelude IDS on a MySQL database. I need set operations (- and intresection) on quite huge liste (several thousand of elements) and i've discovered that the - operator seems to be always in O(n*m) (where length of list are n and m). In fact when the list is already sorted, this operation should be O(n+m) which is much better. I've realised a little Set class which ilustrate this pb, i've code a O(n+m) solution in pure Ruby code and this solution is really faster than the Ruby one when the length is important. This is an example, i've created two list t1 and t2 of 10000 random number, i apply sort! and uniq! and them i test the two algo. (source code is at the end of mail). cedric@lagavulin:~/projet/prelude$ ./set.rb sorting... 0.053775 size t1:9535, t2:9523 diff... 69.653139 my new diff... 0.263852 check if we get the same result with operator <=> ok cedric@lagavulin:~/projet/prelude$ ruby -v ruby 1.6.7 (2002-03-19) [i386-linux] So the hard coded - operation spends 69s and my one 0.26 s. So i propose few solution: 1) In the - method, test if the list are allready sorted (which is a O(n+m) operation), them adapt the algo (ie O(n+m) or O(n*m)). 2) In the - method do a sort first (which is a O(n*ln n+m*ln m) operation) and them do the - with the O(n+m) algo. Because the - operator is a set operator, the order doesn't really make sense... 3) Add a Set class (like my one but hard coded) where the order doesn't make a sense and so what we manipulate is internaly alway sorted. - operation is alway in O(n+m). Sorry for my english, it's not my mother language. Regards remark: Union operation should be in O(n+m) because list are sorted (like in a merge sort). I did a O((n+m) ln (n+m)) because i'm lazzy but it's quite trivial to improve that. #!/usr/bin/ruby class Set def initialize(tab) @set = tab.sort @set.uniq! end def [](i) return @set[i] end def length return @set.length end def -(aSet) res = [] i = 0 for j in 0..aSet.length-1 while i < @set.length and @set[i] < aSet[j] do res << @set[i] i+=1 end i += 1 if @set[i] == aSet[j] end res.concat(@set[i.. / set.length-1]) if i<@set.length return Set.new(res) end def +(aSet) tmp = @set+aSet.to_a tmp.sort! tmp.uniq! return Set.new(tmp) end def &(aSet) return (self-aSet)+(aSet-self) end def to_a return @set end def to_s return "{"+@set.join(', ')+"}" end end t1 = [] t2 = [] for i in 0..10000 t1[i]=rand(100000) t2[i]=rand(100000) end print "sorting... " time1 = Time.new t1.sort! t1.uniq! t2.sort! t2.uniq! time2 = Time.new print time2-time1,"\n" puts "size t1:#{t1.length}, t2:#{t2.length}" print "diff... " tmp1 = t2-t1 time3 = Time.new print time3-time2,"\n" print "my new diff... " tmp2 = (Set.new(t2)-Set.new(t1)).to_a time4 = Time.new print time4-time3,"\n" puts "check if we get the same result with operator <=>" puts "ok" if (tmp1 <=> tmp2) == 0 -- Cedric Foll courriel: cedric dot foll at laposte dot net