On 07.03.2007 08:56, Jon Egil Stand wrote:
> # MAIN QUESTION
> 
> # Is there a nice and fast way to do set operations on arrays of
> # non-trivial objects.
> 
> 
> # BACKGROUND
> 
> # Set operations are beatiful:
> 
> a = [1,2,3,4,5]
> b = [2,4,6,8,10]
> 
> (a & b)
> # => [2,4]
> 
> 
> # When comparing lists, which is more or less what administration of
> # pension schemes is all about, this is very, very nice.
> 
> # I very often use stuff like
> (a - (a&b))
> 
> # => [1,3,5]
> 
> # The array library is blazingly fast, I'm really happy with it.
> 
> 
> # The challenge arises when my lists don't comprise of fixnums, but 
> instead of
> # more specified objects. To simplify:
> 
> class Person
>  def initialize(name, ssid)
>    @name = name
>    @ssid = ssid
>  end
> 
>  attr_reader :name, :ssid
> end
> 
> list = []
> list << Person.new('Peter Zapffe', 1)
> list << Person.new('Peter Pan', 2)
> list << Person.new('Saint Peter', 3)
> 
> # Let's say I want to UNION that to (b = [2,4,6,8,10]) from above, using 
> ssid
> # as key.It should in that case return the 'Peter Pan'-person, since he 
> is the
> # only one with an ssid included i the list.
> 
> (list & b)
> # => []
> 
> # I can do stuff like:
> 
> list_ssid = list.collect{|p| p.ssid}
> # => [1,2,3]
> 
> union = list_ssid & b
> # => [2]
> 
> list.select{|p| union.include? p.ssid}
> # => [#<Person:0x2b8b300 @name="Peter Pan", @ssid=2>]

You can do this simpler:

list.select {|p| b.include? p.ssid}

Note, make b a Set for more efficiency (see below).

> # This works, and correctness is always nice, but it doesn't scale very 
> nice,
> # basicly because  union.include? scans the union-list from scratch 
> every time.
> 
> # I have implemented this before, by sorting both lists and stepping 
> through
> # them one at a time. That's still correct and much faster, but I have
> a feeling
> # it's possible to do it in a much more elegant way, using some sort of
> # set-operations.
> 
> # I would appreciate any hints and or pointers.
> 
> # Thank's for reading.
> # JE

First, when you work with sets you should probably use Set - if your 
sets get large you will notice the speed difference.

irb(main):002:0> a = [1,2,3,4,5].to_set
=> #<Set: {5, 1, 2, 3, 4}>
irb(main):003:0> b = [2,4,6,8,10].to_set
=> #<Set: {6, 2, 8, 4, 10}>
irb(main):004:0> a & b
=> #<Set: {2, 4}>

Secondly, it seems you're basically creating subsets based on some 
criteria.  If you just had a single key, you could define your class's 
equivalence (#hash, #eql?, #==) to reflect that and use the approach you 
tried, i.e. using set unions.

The ad hoc solution is to actually use #select as you did.  But if the 
criteria change, you rather need a structure that resembles an RDBMS 
table which can have any number of indexes.  If you have to do this 
multiple times probably also with varying criteria you might want to 
create a wrapper around a Set that can support arbitrary number of 
indexes (Hashes) and keeps those consistent much like an RDBMS does.

You might find some more information in the archives of this list.

Kind regards

	robert