On 29.04.2007 14:04, Peter Szinek wrote:
> Hello all,
> 
> I have been playing with partitioning a set recently and I am stuck with 
> an issue. The whole story is here:
> 
> http://www.rubyrailways.com/partitioning-sets-in-ruby/
> 
> A quick version for those who would not like to read the article:
> 
> Consider this input:
> 
> a 53 2 3
> b 8 62 1 23
> a 9 0 31
> b 4 45 4 16 7
> b 1 23
> c 3 42 2 31 4 6
> a 1 3 22
> a 7 83 1 23 3
> b 1 14 4 15 16 2
> c 5 16 2 34
> 
> the goal is to create a partition based on the character in the first 
> column, i.e.:
> 
> <Set: <Set: {"a 9 0 31", "a 7 83 1 23 3", "a 53 2 3", "a 1 3 22 "}>, 
> <Set: {"b 1 23 ", "b 1 14 4 15 16 2", "b 8 62 1 23", "b 4 45 4 16 7"}>, 
> <Set: {"c 5 16 2 34", "c 3 42 2 31 4 6"}>}>
> 
> Which is exactly what Set#divide does. However, there is one problem: I 
> would like to know if there are duplicate lines. divide returns the 
> same result regardless of whether the input is this:
> 
> c 5 16 2 34
> c 5 16 2 34
> c 5 16 2 34
> 
> or this:
> 
> c 5 16 2 34
> 
> What I need is a modified divide that also returns the count of each 
> element in the input (at least for those elements that occur more 
> than once). Is this doable, or do I have to write some additional 
> code to do this?

Basically you need bags (multisets).  A quick check of the standard 
library does not reveal any, but you can roll your own pretty easily 
with a Hash whose default value is 0.  This is what I'd do (see script 
at end).  Of course you could save another line by inlining "key".
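
If you only care about the lines that occur more than once, the same 
idea can be wrapped in a method and filtered afterwards.  This is just a 
rough sketch: the method name partition_with_counts and the sample input 
are made up for illustration, and it assumes each line starts with the 
partition key.

```ruby
# Hypothetical helper (the name is an assumption, not part of any library):
# groups lines by their leading word and counts each distinct line.
def partition_with_counts(lines)
  # Outer hash: key => bag; each bag maps a line to its count (default 0).
  parts = Hash.new { |h, k| h[k] = Hash.new(0) }
  lines.each do |line|
    line = line.chomp
    parts[line[/^\w+/]][line] += 1
  end
  parts
end

# Made-up sample input with one line repeated three times.
input = [
  "a 53 2 3",
  "c 5 16 2 34",
  "a 9 0 31",
  "c 5 16 2 34",
  "c 5 16 2 34",
]

parts = partition_with_counts(input)

# Keep only the lines seen more than once, per partition key.
duplicates = {}
parts.each do |key, counts|
  repeated = counts.select { |_line, n| n > 1 }
  duplicates[key] = repeated unless repeated.empty?
end

p duplicates
```

The keys of each inner hash are the distinct lines, so parts[key].keys 
gives you what divide would have put into that subset, and the values 
are the counts.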

Kind regards

	robert

8<-----------------

#!/usr/bin/ruby

require 'pp'

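# Outer hash: first-column key => bag.
# Each bag maps a line to its count (default 0).
parts = Hash.new {|h,k| h[k] = Hash.new(0)}

DATA.each do |line|
  line.chomp!
  key = line[/^\w+/]       # leading word, e.g. "a"
  parts[key][line] += 1    # count this occurrence of the line
end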

pp parts

__END__
a 53 2 3
b 8 62 1 23
a 9 0 31
b 4 45 4 16 7
b 1 23
c 3 42 2 31 4 6
a 1 3 22
a 7 83 1 23 3
b 1 14 4 15 16 2
c 5 16 2 34
c 5 16 2 34