On 29.04.2007 14:04, Peter Szinek wrote:
> Hello all,
>
> I have been playing with partitioning a set recently and I am stuck with
> an issue. The whole story is here:
>
> http://www.rubyrailways.com/partitioning-sets-in-ruby/
>
> A quick version for those who would not like to read the article:
>
> Consider this input:
>
> a 53 2 3
> b 8 62 1 23
> a 9 0 31
> b 4 45 4 16 7
> b 1 23
> c 3 42 2 31 4 6
> a 1 3 22
> a 7 83 1 23 3
> b 1 14 4 15 16 2
> c 5 16 2 34
>
> the goal is to create a partition based on the character in the first
> column, i.e.:
>
> <Set: <Set: {"a 9 0 31", "a 7 83 1 23 3", "a 53 2 3", "a 1 3 22 "}>,
> <Set: {"b 1 23 ", "b 1 14 4 15 16 2", "b 8 62 1 23", "b 4 45 4 16 7"}>,
> <Set: {"c 5 16 2 34", "c 3 42 2 31 4 6"}>}>
>
> Which is exactly what Set.divide does. However, there is one problem: I
> would like to know if there are duplicate lines. I.e. divide returns the
> same result, no matter that the input is this:
>
> c 5 16 2 34
> c 5 16 2 34
> c 5 16 2 34
>
> or this:
>
> c 5 16 2 34
>
> What I would need is a modified divide which returns also the count of
> the elements in the input set (at least for those elements which are
> more than once in the set). Is this doable or do I have to roll some
> code to do this for me additionally?

Basically you need bags. Since a quick check does not reveal any, you
can roll your own pretty easily with a Hash with default value 0. This
is what I'd do: (see script at end). Of course you could save another
line by inlining "key".

Kind regards

	robert

8<-----------------
#!/usr/bin/ruby

require 'pp'

# Outer hash: first column => bag. Each bag maps a line to its count (default 0).
parts = Hash.new {|h,k| h[k] = Hash.new(0)}

DATA.each do |line|
  line.chomp!
  key = line[/^\w+/]       # leading word of the line, e.g. "a"
  parts[key][line] += 1    # count this occurrence in the matching bag
end

pp parts

__END__
a 53 2 3
b 8 62 1 23
a 9 0 31
b 4 45 4 16 7
b 1 23
c 3 42 2 31 4 6
a 1 3 22
a 7 83 1 23 3
b 1 14 4 15 16 2
c 5 16 2 34
c 5 16 2 34
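
The original question also asked for counts "at least for those elements which are more than once in the set". A minimal sketch of that follow-up step, with the "key" variable inlined as suggested: the method names partition_with_counts and duplicates are made up for illustration and are not part of Set or any library, and the sketch assumes Ruby 1.9 or later, where Hash#select returns a Hash.

```ruby
# Hypothetical helper (name is illustrative): partition lines by their
# leading word and count how often each distinct line occurs per partition.
def partition_with_counts(lines)
  parts = Hash.new { |h, k| h[k] = Hash.new(0) }
  lines.each do |raw|
    line = raw.chomp
    next if line.empty?              # skip blank lines
    parts[line[/^\w+/]][line] += 1   # "key" inlined
  end
  parts
end

# Hypothetical helper: keep only the lines that occur more than once.
# Assumes Hash#select returns a Hash (Ruby 1.9+).
def duplicates(parts)
  dups = {}
  parts.each do |key, bag|
    repeated = bag.select { |_line, count| count > 1 }
    dups[key] = repeated unless repeated.empty?
  end
  dups
end

input = <<EOS
a 53 2 3
c 5 16 2 34
b 1 23
c 5 16 2 34
c 5 16 2 34
EOS

parts = partition_with_counts(input.lines)
p parts["c"]          # the bag for "c": one distinct line, counted three times
p duplicates(parts)   # only the "c" partition survives the filter
```

Keeping the counting and the filtering in separate methods means the full bag stays available if the counts of unique lines are needed later as well.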