Hello Group

This was a short quiz. I could not improve much on my first try (until I read the other solution just now :( ).

Even though it was not too complicated, I again spent too much time on it. If I lived in the States, I'd definitely sue James for stealing my time ;)

The main idea is that, given a list of words, I check whether it contains a banned word. If the whole chunk passes the filter, I can forget about all words in this chunk and return the empty list.

Otherwise I split the list into equally sized chunks and continue with each chunk. If a chunk that contains only one word is tested and fails the filter, I have found a banned word.

Here is the algorithm implemented for splitting into two slices.

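def find_words(filter, words)
  # A clean chunk contains no banned word, so nothing in it needs a closer look.
  return [] if words.empty? || filter.clean?(words.join(' '))
  # A single word that is not clean is a banned word.
  return words if words.length == 1
  # Otherwise search both halves.
  find_words(filter, words[0...words.length / 2]) +
    find_words(filter, words[words.length / 2..-1])
end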
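
To try this without the quiz's real filter, a minimal sketch with a stand-in filter could look like the following. CountingFilter, its call counter and the word list are assumptions made up for illustration; only the clean? call is taken from the code above, and find_words is repeated so the sketch runs on its own.

```ruby
# Hypothetical stand-in for the quiz filter. Only clean?(text) comes from the
# post; the class name and the call counter are assumptions.
class CountingFilter
  attr_reader :calls

  def initialize(banned)
    @banned = banned
    @calls = 0
  end

  # True when none of the whitespace-separated words in text is banned.
  def clean?(text)
    @calls += 1
    (text.split & @banned).empty?
  end
end

# The two-way split, repeated here so the sketch is self-contained.
def find_words(filter, words)
  return [] if words.empty? || filter.clean?(words.join(' '))
  return words if words.length == 1
  find_words(filter, words[0...words.length / 2]) +
    find_words(filter, words[words.length / 2..-1])
end

words  = %w[apple banana cherry date elder fig grape honey]
filter = CountingFilter.new(%w[cherry grape])

p find_words(filter, words)  # => ["cherry", "grape"]
puts filter.calls            # prints 11
```

For these eight words the filter is asked 11 times: once for each chunk that gets tested on the way down to the two banned words.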

The fastest algorithm of this type is the one that splits into n = 3 slices. Here is the generic version:

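def find_words_n(filter, words, n = 2)
  return [] if words.empty? || filter.clean?(words.join(' '))
  return words if words.length == 1
  # Never split into more slices than there are words.
  n = words.length if n > words.length
  # Start index of every slice, plus the end of the list as the last boundary.
  slices = Array.new(n) { |i| i * words.length / n } << words.length
  # Pair each boundary with the next one and search every slice.
  slices[0..-2].zip(slices[1..-1]).inject([]) do |result, (low, high)|
    result + find_words_n(filter, words[low...high], n)
  end
end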
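
The slice boundaries are the only non-obvious part of the generic version, so here they are pulled out on their own. slice_bounds is a hypothetical helper that is not part of my solution; it just repeats the boundary computation from find_words_n.

```ruby
# Hypothetical helper (not in the original solution): returns the [low, high)
# index pairs used to cut a list of `length` words into n slices.
def slice_bounds(length, n)
  n = length if n > length
  starts = Array.new(n) { |i| i * length / n } << length
  starts[0..-2].zip(starts[1..-1])
end

p slice_bounds(10, 3)  # => [[0, 3], [3, 6], [6, 10]]
p slice_bounds(7, 2)   # => [[0, 3], [3, 7]]
p slice_bounds(2, 5)   # => [[0, 1], [1, 2]]
```

Because of the integer division, the slices differ in length by at most one word, and capping n at the list length means no slice is ever empty.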

Then I tried to find the optimal solution on a more formal basis. My musings can be read here:
http://ruby.brian-schroeder.de/quiz/detect_words/content/content.html
http://ruby.brian-schroeder.de/quiz/detect_words/content.ps

This resulted in a huge lookup table that gives the optimal split factor for a given number of words and expected probability.

After reading the solution by Wayne Vucenic I understood that one can save some calls: if we know that there is a match in word list w and there was no match in the first parts of the word list, then the last part necessarily has to match, so it does not need to be tested.

This is implemented here:

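def find_words(filter, words, filter_mask = true)
  # filter_mask == false means this chunk is already known to contain a banned
  # word, so the call to the filter is skipped.
  return [] if words.empty? || (filter_mask && filter.clean?(words.join(' ')))
  return words if words.length == 1
  result = find_words(filter, words[0...words.length / 2])
  # If the first half was clean, the second half must hold the banned word.
  result + find_words(filter, words[words.length / 2..-1], !result.empty?)
end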
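
To see the saving, here is a small sketch that runs both two-way variants against a stand-in filter and counts the calls. CountingFilter, the names find_words_plain and find_words_masked, and the word list are assumptions for illustration only; the two method bodies repeat the first and the improved two-way version.

```ruby
# Hypothetical stand-in filter that counts how often clean? is called.
class CountingFilter
  attr_reader :calls

  def initialize(banned)
    @banned = banned
    @calls = 0
  end

  def clean?(text)
    @calls += 1
    (text.split & @banned).empty?
  end
end

# First version: every chunk is tested.
def find_words_plain(filter, words)
  return [] if words.empty? || filter.clean?(words.join(' '))
  return words if words.length == 1
  find_words_plain(filter, words[0...words.length / 2]) +
    find_words_plain(filter, words[words.length / 2..-1])
end

# Improved version: the test is skipped for a chunk known to contain a match.
def find_words_masked(filter, words, filter_mask = true)
  return [] if words.empty? || (filter_mask && filter.clean?(words.join(' ')))
  return words if words.length == 1
  result = find_words_masked(filter, words[0...words.length / 2])
  result + find_words_masked(filter, words[words.length / 2..-1], !result.empty?)
end

words  = %w[apple banana cherry date elder fig grape honey]
banned = %w[cherry grape]

plain  = CountingFilter.new(banned)
masked = CountingFilter.new(banned)

p find_words_plain(plain, words)    # => ["cherry", "grape"]
p find_words_masked(masked, words)  # => ["cherry", "grape"]
puts "plain: #{plain.calls} calls, masked: #{masked.calls} calls"
# prints "plain: 11 calls, masked: 9 calls"
```

In this example the two skipped calls are the tests of "cherry date" and "grape honey": in both cases the first half of the enclosing chunk was clean, so the second half had to contain the banned word.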

And for n > 2:

def find_words_n(filter, words, n = 2, filter_mask = true)
  return [] if words.empty? || (filter_mask && filter.clean?(words.join(' ')))
  return words if words.length == 1
  n = words.length if n > words.length
  # Boundaries of the n slices as [low, high) index pairs.
  bounds = Array.new(n) { |i| i * words.length / n } << words.length
  slices = bounds[0..-2].zip(bounds[1..-1])
  # Search all slices but the last one.
  result = slices[0..-2].inject([]) do |acc, (low, high)|
    acc + find_words_n(filter, words[low...high], n)
  end
  # If nothing was found so far, the last slice must contain the match,
  # so its test is skipped.
  low, high = slices[-1]
  result + find_words_n(filter, words[low...high], n, !result.empty?)
end

I also implemented a solution that saves calls by using the information that superwords of banned words are also banned. E.g. if sex is banned, sexual is also banned. It turned out that the specs did not include this behaviour, so that solution can be found under the unsuccessful tries.

The full story (all sources, results, etc.) is here:
http://ruby.brian-schroeder.de/quiz/detect_words/

regards,

Brian

-- 
Brian Schröder
http://www.brian-schroeder.de/