On 4/2/06, Minkoo Seo <minkoo.seo / gmail.com> wrote:
> Hi group.
>
> I'm writing some scientific applications with Ruby, and found a
> frequent problem that I want to solve with Ruby.
>
> I got tons of instances of NGram whose definition is as follows:
>
> NGram = Struct.new :seq, :prob
>
> I have a list of instances of NGram like:
>
> ....
> #<struct NGram seq=["AO", "S"], prob=-139918.174804688>
> #<struct NGram seq=["AY", "T"], prob=-46389.6875>
> #<struct NGram seq=["HH", "IH"], prob=18983.1796875>
> #<struct NGram seq=["OW", "Z", "AH"], prob=-326323.640625>
> #<struct NGram seq=["OW", "Z", "AH"], prob=-35945.25>
> #<struct NGram seq=["T", "AH", "L"], prob=20778.7421875>
> #<struct NGram seq=["HH", "IH", "S"], prob=37747.3046875>
> #<struct NGram seq=["IH", "S", "T"], prob=-17305.6640625>
> #<struct NGram seq=["IH", "S", "T"], prob=-17477.390625>
> #<struct NGram seq=["IH", "S", "T"], prob=34243.34375>
> #<struct NGram seq=["IH", "S", "T"], prob=-2125.265625>
> #<struct NGram seq=["IH", "S", "T"], prob=-9046.7890625>
> #<struct NGram seq=["IH", "S", "T"], prob=-18200.265625>
> #<struct NGram seq=["K", "L", "AH"], prob=-110206.140625>
> #<struct NGram seq=["K", "L", "AH"], prob=-92664.984375>
> ....
>
> What I want to derive from this data is a list of NGram instances
> in which each seq appears only once. The instance kept for each seq
> must be the one with the highest prob among those sharing that seq.
>
> For example, from the ngram list I've shown above, I want to derive a
> list like the following:
>
> ....
> #<struct NGram seq=["AO", "S"], prob=-139918.174804688>
> #<struct NGram seq=["AY", "T"], prob=-46389.6875>
> #<struct NGram seq=["HH", "IH"], prob=18983.1796875>
> #<struct NGram seq=["OW", "Z", "AH"], prob=-35945.25>
> #<struct NGram seq=["T", "AH", "L"], prob=20778.7421875>
> #<struct NGram seq=["HH", "IH", "S"], prob=37747.3046875>
> #<struct NGram seq=["IH", "S", "T"], prob=34243.34375>
> #<struct NGram seq=["K", "L", "AH"], prob=-92664.984375>
> ....
>
> What I've written so far is
>
> # Sort by seq, then by prob in descending order.
> # sort_by returns a new array, so the result has to be assigned.
> ngrams = ngrams.sort_by { |ngram|
>     [ngram.seq, -ngram.prob]
> }
>
> result = []
>
> # Collect unique ngrams with the highest prob.
> ngrams.inject(nil) { |prev, cur|
>     if prev.nil? || prev.seq != cur.seq
>         result << cur
>         cur     # the block's value becomes prev on the next iteration
>     else
>         prev    # same seq as before: keep prev, skip the lower prob
>     end
> }
>
> return result
>
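# Keep, for each seq, the NGram with the highest prob seen so far.
ngrams.inject({}) do |highest, ngram|
  seq = ngram.seq
  best_now = highest[seq]
  # Replace the stored NGram unless it already has a strictly higher prob.
  highest[seq] = ngram unless (best_now && best_now.prob > ngram.prob)
  highest  # inject needs the accumulator returned from the block
end.values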
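
For anyone who wants to try this, here is a minimal self-contained sketch
of the same inject approach, run against a few of the entries quoted above.
The method name best_per_seq is only an illustrative assumption, not
something from the original post. It relies on Array#hash and Array#eql?
being content-based, so two equal seq arrays land on the same hash key.

```ruby
NGram = Struct.new(:seq, :prob)

# A subset of the sample data quoted in the question.
ngrams = [
  NGram.new(%w[OW Z AH], -326323.640625),
  NGram.new(%w[OW Z AH], -35945.25),
  NGram.new(%w[IH S T], -17305.6640625),
  NGram.new(%w[IH S T], 34243.34375),
  NGram.new(%w[IH S T], -2125.265625),
  NGram.new(%w[K L AH], -110206.140625),
  NGram.new(%w[K L AH], -92664.984375),
]

# Hypothetical helper name: returns one NGram per distinct seq,
# choosing the one with the highest prob.
def best_per_seq(ngrams)
  ngrams.inject({}) do |highest, ngram|
    best_now = highest[ngram.seq]
    # Store this ngram unless the stored one has a strictly higher prob.
    highest[ngram.seq] = ngram unless best_now && best_now.prob > ngram.prob
    highest
  end.values
end

best = best_per_seq(ngrams)
best.each { |ng| puts "#{ng.seq.inspect} #{ng.prob}" }
```

Two things to keep in mind: no sorting is needed beforehand, and the
order of the returned list follows the first appearance of each seq only
on Ruby versions whose Hash preserves insertion order (1.9 and later).
When two entries tie on prob, the later one wins.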

/RF