Michael Ulm wrote:
--snip--
> I assume that I know the dictionary size
> and the probability with which each word is chosen for
> the banned word list. Here is the part of ruby code I
> use to create the banned word list
> 
>   def initialize(size, prob)
>     @size = size
>     if prob <= 0 or prob >= 1 # sanity check
>       throw "illegal probability in LanguageFilter::initialize"
>     end
> 
>     @badWords = []
>     @size.times {|k| @badWords.push(k) if rand < prob}
>   end
> 
> Note, that I use not words but an index of the words (so I
> can use numbers 1...n instead of a word list). My object is
> to minimize the expected number of email calls in this
> scenario. Now, if I have such a word list with  n  words, I
> can send an email of size k. If it does not get blocked, my
> problem reduces to the same minimization problem with a\
> smaller size  n - k. If the email does get blocked, my
> problem reduces to two smaller problems, one of size  k
> and one of size  n - k. In the word list of size  k, I do
> have slightly more information then before: I know that
> there is at least one blocked word in it (this does not
> make much of a difference for large k, but a _lot_ of
> difference for small k). Now I can find recursive formulas
> for the two kinds of expected values (available upon
> request).
> 
> I coded most of this, but unfortunately RL caught up with
> me, and I will need another day or so until I can present a
> full solution.
> 
Just used my lunchbreak to implement the last details. I am
afraid the code is not very cleanly written, I totally ignored
speed so far, and I totally cheated in so far that I didn't use
words at all but numbers. Here is the output of two test runs.

6 words of 1000 found after 142 calls (expected 196.20329267521)
11 words of 1000 found after 208 calls (expected 274.947142087926)
27 words of 1000 found after 337 calls (expected 333.688742114315)
23 words of 1000 found after 287 calls (expected 382.652436479999)
51 words of 1000 found after 442 calls (expected 425.491312500002)
64 words of 1000 found after 494 calls (expected 462.720759999999)
73 words of 1000 found after 490 calls (expected 495.858497499998)
64 words of 1000 found after 489 calls (expected 528.357760000002)
104 words of 1000 found after 567 calls (expected 560.076936390001)
101 words of 1000 found after 562 calls (expected 583.697899999996)

11 words of 1000 found after 208 calls (expected 196.20329267521)
24 words of 1000 found after 304 calls (expected 274.947142087926)
23 words of 1000 found after 282 calls (expected 333.688742114315)
38 words of 1000 found after 374 calls (expected 382.652436479999)
48 words of 1000 found after 425 calls (expected 425.491312500002)
63 words of 1000 found after 468 calls (expected 462.720759999999)
76 words of 1000 found after 531 calls (expected 495.858497499998)
87 words of 1000 found after 543 calls (expected 528.357760000002)
94 words of 1000 found after 580 calls (expected 560.076936390001)
91 words of 1000 found after 542 calls (expected 583.697899999996)

================ here is the code, version 0.1 ================
# this holds classes and methods to "solve"
# ruby-talk quiz #9: Banned Words
#
#
# creates the language filter
class LanguageFilter
   def initialize(size, prob)
     @size = size
     if prob <= 0 or prob >= 1 # sanity check
       throw "illegal probability in LanguageFilter::initialize"
     end

     @badWords = []
     @size.times {|k| @badWords.push(k) if rand < prob}
     @count = 0
   end

   def countReset
     @count = 0
   end

   def getBad
     @badWords
   end

   # checks if one of the numbers in the array testText is on the
   # badWords list
   def clean?(testText)
     sortedClean?(testText.sort)
   end

   # checks if one of the numbers in the array testText is on the
   # badWords list testText is assumed to be sorted
   def sortedClean?(testText)
     @count += 1
     binStart, binEnd = 0, @badWords.size
     testText.each do |current|
       binStart = binarySearch(current, binStart, binEnd)
       return false if @badWords[binStart] == current
     end
     return true
   end

   def test(testWords)
     [@count, testWords.sort == @badWords]
   end

   private

   def binarySearch(what, searchStart, searchEnd)
     return searchEnd if what > @badWords[searchEnd - 1]
     while searchEnd - searchStart > 1
       testSearch = (searchStart + searchEnd) / 2
       testWord = @badWords[testSearch]
       return testSearch if testWord == what
       if testWord < what
         searchStart = testSearch
       else
         searchEnd = testSearch
       end
     end
     return searchStart
   end
end

class QueryStrategy
   def initialize(prob)
     @prob = prob
     @qrob = 1.0 - prob
     @eNone = [0, 1] # expected value for interval without additional info
     @eOne = [0, 0] # expected values for interval with at least one bad 
word
     @kNone = [0, 1] # optimal interval query length in interval without 
info
     @kOne = [0, 0] # optimal interval query length in interval with one 
bad word
     @size = 2
   end

   def getStrategy(size, noInfo = true, quick = false)
     if size < @size
       return noInfo ? [@kNone[size], @eNone[size]] : [@kOne[size], 
@eOne[size]]
     end

     qToN = Math::exp(@size * Math::log(@qrob))
     @size.upto(size) do |n|
       # compute F_p(n)
       minExp = n.to_f
       minK = 1
       qToK = 1.0
       1.upto(n - 1) do |k|
         qToK *= @qrob
         thisExp = qToK * @eOne[n - k] + (1 - qToK) * (@eOne[k] + 
@eNone[n - k])
         if thisExp < minExp
           minK, minExp = k, thisExp
         end
       end
       @kOne[n] = minK
       @eOne[n] = 1.0 + minExp / (1 - qToN)

       # compute E_p(n)
       minExp = n.to_f
       minK = 1
       qToK = 1.0
       1.upto(n) do |k|
         qToK *= @qrob
         thisExp = @eNone[n - k] + (1 - qToK) * @eOne[k]
         if thisExp < minExp
           minK, minExp = k, thisExp
         end
       end
       @kNone[n] = minK
       @eNone[n] = 1 + minExp

       qToN *= @qrob
     end

     @size = size + 1;
     return noInfo ? [@kNone[size], @eNone[size]] : [@kOne[size], 
@eOne[size]]
   end

   def findWords(filter, size)
     @myWords = []
     getWordsRecursively(filter, 0...size, true)
     @myWords
   end

private
   def getWordsRecursively(filter, range, noInfo)
     rangesize = range.end - range.begin
     return if rangesize == 0
     if rangesize == 1
       if noInfo
         @myWords.push(range.begin) unless 
filter.sortedClean?([range.begin])
       else
         @myWords.push(range.begin)
       end
     else
       thisStrat = getStrategy(rangesize, noInfo)
       testRange = range.begin...(range.begin + thisStrat[0])
       testArray = []
       testRange.each {|k| testArray.push(k)}
       if filter.sortedClean?(testArray)
         getWordsRecursively(filter, (range.begin + 
thisStrat[0])...range.end, noInfo)
       else
         getWordsRecursively(filter, testRange, false)
         getWordsRecursively(filter, (range.begin + 
thisStrat[0])...range.end, true)
       end
     end
   end
end

# test
testsize = 1000
10.times do |level|
   testprob = 0.01 * (level + 1)

   myfilt = LanguageFilter.new(testsize, testprob)

   strat = QueryStrategy.new(testprob)
   testWords = strat.findWords(myfilt, testsize)
   number, found = myfilt.test(testWords)
   if found
     puts "#{testWords.size} words of #{testsize} found after #{number} 
calls (expected #{strat.getStrategy(testsize)[1]})"
   else
     puts "word list not found after #{number} calls"
   end
end


-- 
Michael Ulm
R&D Team
ISIS Information Systems Austria
tel: +43 2236 27551-219, fax: +43 2236 21081
e-mail: michael.ulm / isis-papyrus.com
Visit our Website: www.isis-papyrus.com