Some comments on the quiz:

If we have no information on the banned word list at all,
so every subset of the dictionary would be equally likely,
then we could not do better than brute force, i.e. testing
every word. This is because every query gives us one bit
of information, and for n words in the dictionary, there
are 2^n possible subsets, which have to be distinguished
by n bits of information.

So we will have to use some kind of additional knowledge
on the banned word list. Several variants are possible.
We could get some info on the probability distribution of
the list, or we may have an upper (and maybe a lower) limit
on the number of words in it. Then it is also not clear
what "as few emails as possible" means. This could mean
the maximal number of emails, or the expected number of
emails, or the minimization of some other cost functional.

So, it is possible to choose. Here is the choice I made
and analyzed: I assume that I know the dictionary size
and the probability with which each word is chosen for
the banned word list. Here is the part of ruby code I
use to create the banned word list

   def initialize(size, prob)
     @size = size
     if prob <= 0 or prob >= 1 # sanity check
       throw "illegal probability in LanguageFilter::initialize"
     end

     @badWords = []
     @size.times {|k| @badWords.push(k) if rand < prob}
   end

Note, that I use not words but an index of the words (so I
can use numbers 1...n instead of a word list). My object is
to minimize the expected number of email calls in this
scenario. Now, if I have such a word list with  n  words, I
can send an email of size k. If it does not get blocked, my
problem reduces to the same minimization problem with a\
smaller size  n - k. If the email does get blocked, my
problem reduces to two smaller problems, one of size  k
and one of size  n - k. In the word list of size  k, I do
have slightly more information then before: I know that
there is at least one blocked word in it (this does not
make much of a difference for large k, but a _lot_ of
difference for small k). Now I can find recursive formulas
for the two kinds of expected values (available upon
request).

I coded most of this, but unfortunately RL caught up with
me, and I will need another day or so until I can present a
full solution.

Michael


-- 
Michael Ulm
R&D Team
ISIS Information Systems Austria
tel: +43 2236 27551-219, fax: +43 2236 21081
e-mail: michael.ulm / isis-papyrus.com
Visit our Website: www.isis-papyrus.com