--------------030707030609070201020205 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit > by Martin DeMello > > In Scrabble parlance, a 'bingo stem' is defined as a set of six > letters that combines with a large fraction of the alphabet to anagram > to valid seven letter words (a 'bingo' is a move using all seven tiles > on your rack). For instance, one of the more prolific stems, SATIRE, > combines with twenty letters, A, B, C, D, E, F, G, H, I, K, L, M, N, > O, P, R, S, T, V, W (forming words like ASTERIA, BAITERS, RACIEST > ...). > > Write a program that, given a word list and a cutoff n, finds all 6 > letter stems that combine with n or more letters, sorted in order of > how many distinct letters they combine with. Here's my solution. My earlier performance problems resulted from several very inefficient attempts at generating unique letter combinations, i.e. list all unique sets of "N letters taken 6 at a time". Reducing the problem to "7 letters taken 6 at a time" made that part trivial, but I kept plugging at the more general problem. The dictionary I used was 2of4brif.txt, available as part of the 12Dicts package at http://prdownloads.sourceforge.net/wordlist/12dicts-4.0.zip On my 2GHz P4, I get 5319 stems with a cutoff of 3 in 4.062 seconds. A version optimized for 7-letter words (not shown) takes barely over 2 seconds. If I swap line 47 in for line 48, so that all words of 7 letters or more are processed, I get 167,227 stems with a cutoff of 3 in 2196.453 seconds (~37 min.). -- Glenn Parker | glenn.parker-AT-comcast.net | <http://www.tetrafoil.com/> --------------030707030609070201020205 Content-Type: text/plain; nameingo1.rb" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filenameingo1.rb" #!/usr/bin/env ruby class Combiner include Enumerable def initialize(pick, from) # How many objects should be returned in each set @pick ick # List of objects from which to produce combinations @from rom end def each unpicked from.length - @pick # Indices of objects in @from to return as a set. selectors 0...@pick).to_a loop do yield selectors.collect {|i| @from[i]} # Advance the selectors array to the next combination. s pick - 1 s - while selectors[s] s + unpicked break if s < 0 value electors[s] s.upto(@pick - 1) do |i| selectors[i] value + ) end end end end class BingoStems STEMLENGTH def initialize(filename, cutoff) @filename ilename @cutoff utoff @stems } end def find start_time ime.now # Collect stems for each word in the word list File.open(@filename) do |file| file.each do |word| word.chomp! # Use the next line alternative to process longer words. #next if word.length < TEMLENGTH next if word.length ! STEMLENGTH + 1) word.downcase! Combiner.new(STEMLENGTH, word.split(//).sort).each do |stem| (@stems[stem.join('')] || })[word] rue end end end # Discard stems with less than @cutoff word combos @stems.delete_if {|k, v| v.length < @cutoff} puts "found #{@stems.length} stems in #{Time.now - start_time} seconds" self end def report(stream stdout, verbose alse) keys stems.keys.sort_by {|a| @stems[a].length} keys.each do |key| stream.puts "#{key}: #{@stems[key].length}" next unless verbose @stems[key].each_key do |word| stream.puts " " + word end end stream.flush end end CUTOFF RGV[0].to_i DICT RGV[1] BingoStems.new(DICT, CUTOFF).find.report(File.open('output.txt', 'w'), true) --------------030707030609070201020205--