On 4/2/06, Minkoo Seo <minkoo.seo / gmail.com> wrote:
> Hi group.
>
> I'm writing some scientific applications with Ruby, and found a
> frequent problem that I want to solve with Ruby.
>
> I got tons of instances of NGram whose definition is as follows:
>
> NGram = Struct.new :seq, :prob
>
> I have a list of instances of NGram like:
>
> ....
> #<struct NGram seq=["AO", "S"], prob=-139918.174804688>
> #<struct NGram seq=["AY", "T"], prob=-46389.6875>
> #<struct NGram seq=["HH", "IH"], prob=18983.1796875>
> #<struct NGram seq=["OW", "Z", "AH"], prob=-326323.640625>
> #<struct NGram seq=["OW", "Z", "AH"], prob=-35945.25>
> #<struct NGram seq=["T", "AH", "L"], prob=20778.7421875>
> #<struct NGram seq=["HH", "IH", "S"], prob=37747.3046875>
> #<struct NGram seq=["IH", "S", "T"], prob=-17305.6640625>
> #<struct NGram seq=["IH", "S", "T"], prob=-17477.390625>
> #<struct NGram seq=["IH", "S", "T"], prob=34243.34375>
> #<struct NGram seq=["IH", "S", "T"], prob=-2125.265625>
> #<struct NGram seq=["IH", "S", "T"], prob=-9046.7890625>
> #<struct NGram seq=["IH", "S", "T"], prob=-18200.265625>
> #<struct NGram seq=["K", "L", "AH"], prob=-110206.140625>
> #<struct NGram seq=["K", "L", "AH"], prob=-92664.984375>
> ....
>
> What I want to derive from this data is the list of NGram instances
> each of which is unique with regard to seq. At the same time, the prob
> of each ngram in the list must be that of the highest prob.
>
> For example, from the ngram list I've shown above, I want to derive a
> list like the following:
>
> ....
> #<struct NGram seq=["AO", "S"], prob=-139918.174804688>
> #<struct NGram seq=["AY", "T"], prob=-46389.6875>
> #<struct NGram seq=["HH", "IH"], prob=18983.1796875>
> #<struct NGram seq=["OW", "Z", "AH"], prob=-35945.25>
> #<struct NGram seq=["T", "AH", "L"], prob=20778.7421875>
> #<struct NGram seq=["HH", "IH", "S"], prob=37747.3046875>
> #<struct NGram seq=["K", "L", "AH"], prob=-92664.984375>
> ....
>
> What I've written so far is
>
> # Sort by prob in descending order
> ngrams.sort_by { |ngram|
>
>   # Compare seq
>
>   # Then, compare prob
> }
>
> result = []
>
> # Collect unique ngrams with the highest prob.
> ngrams.inject(nil) { |prev, cur|
>   if prev.nil?
>     result << cur
>     prev = cur
>   elsif prev.seq != cur.seq
>     result << cur
>     prev = cur
>   end
> }
>
> return result
>

ngrams.inject({}) do |highest, ngram|
  seq = ngram.seq
  best_now = highest[seq]
  highest[seq] = ngram unless (best_now && best_now.prob > ngram.prob)
  highest
end.values

/RF
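
For comparison, the same "best ngram per seq" idea can also be sketched by grouping first and then picking the maximum in each group. This is only an illustration, not part of the reply above: it assumes a Ruby where Enumerable#group_by and Enumerable#max_by are available (1.8.7 or later), the helper name best_per_seq is made up for the example, and the sample data is a small subset of the list in the question.

```ruby
NGram = Struct.new(:seq, :prob)

# A few entries copied from the list in the question.
ngrams = [
  NGram.new(%w[OW Z AH], -326323.640625),
  NGram.new(%w[OW Z AH], -35945.25),
  NGram.new(%w[IH S T], -17305.6640625),
  NGram.new(%w[IH S T], 34243.34375),
  NGram.new(%w[K L AH], -110206.140625),
  NGram.new(%w[K L AH], -92664.984375),
]

# Hypothetical helper: group the ngrams by seq, then keep the
# member with the highest prob from each group.
def best_per_seq(ngrams)
  ngrams.group_by(&:seq).map { |_seq, group| group.max_by(&:prob) }
end

best = best_per_seq(ngrams)
best.each { |ng| puts "#{ng.seq.join(' ')}\t#{ng.prob}" }
```

Both versions key on seq itself, which works because Ruby arrays hash by content. The inject version makes a single pass and keeps only one ngram per seq in memory, while the group_by version builds every group before choosing, so the former is probably the better fit for very large lists.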