Michael Ulm wrote: --snip-- > I assume that I know the dictionary size > and the probability with which each word is chosen for > the banned word list. Here is the part of ruby code I > use to create the banned word list > > def initialize(size, prob) > @size = size > if prob <= 0 or prob >= 1 # sanity check > throw "illegal probability in LanguageFilter::initialize" > end > > @badWords = [] > @size.times {|k| @badWords.push(k) if rand < prob} > end > > Note, that I use not words but an index of the words (so I > can use numbers 1...n instead of a word list). My object is > to minimize the expected number of email calls in this > scenario. Now, if I have such a word list with n words, I > can send an email of size k. If it does not get blocked, my > problem reduces to the same minimization problem with a\ > smaller size n - k. If the email does get blocked, my > problem reduces to two smaller problems, one of size k > and one of size n - k. In the word list of size k, I do > have slightly more information then before: I know that > there is at least one blocked word in it (this does not > make much of a difference for large k, but a _lot_ of > difference for small k). Now I can find recursive formulas > for the two kinds of expected values (available upon > request). > > I coded most of this, but unfortunately RL caught up with > me, and I will need another day or so until I can present a > full solution. > Just used my lunchbreak to implement the last details. I am afraid the code is not very cleanly written, I totally ignored speed so far, and I totally cheated in so far that I didn't use words at all but numbers. Here is the output of two test runs. 6 words of 1000 found after 142 calls (expected 196.20329267521) 11 words of 1000 found after 208 calls (expected 274.947142087926) 27 words of 1000 found after 337 calls (expected 333.688742114315) 23 words of 1000 found after 287 calls (expected 382.652436479999) 51 words of 1000 found after 442 calls (expected 425.491312500002) 64 words of 1000 found after 494 calls (expected 462.720759999999) 73 words of 1000 found after 490 calls (expected 495.858497499998) 64 words of 1000 found after 489 calls (expected 528.357760000002) 104 words of 1000 found after 567 calls (expected 560.076936390001) 101 words of 1000 found after 562 calls (expected 583.697899999996) 11 words of 1000 found after 208 calls (expected 196.20329267521) 24 words of 1000 found after 304 calls (expected 274.947142087926) 23 words of 1000 found after 282 calls (expected 333.688742114315) 38 words of 1000 found after 374 calls (expected 382.652436479999) 48 words of 1000 found after 425 calls (expected 425.491312500002) 63 words of 1000 found after 468 calls (expected 462.720759999999) 76 words of 1000 found after 531 calls (expected 495.858497499998) 87 words of 1000 found after 543 calls (expected 528.357760000002) 94 words of 1000 found after 580 calls (expected 560.076936390001) 91 words of 1000 found after 542 calls (expected 583.697899999996) ================ here is the code, version 0.1 ================ # this holds classes and methods to "solve" # ruby-talk quiz #9: Banned Words # # # creates the language filter class LanguageFilter def initialize(size, prob) @size = size if prob <= 0 or prob >= 1 # sanity check throw "illegal probability in LanguageFilter::initialize" end @badWords = [] @size.times {|k| @badWords.push(k) if rand < prob} @count = 0 end def countReset @count = 0 end def getBad @badWords end # checks if one of the numbers in the array testText is on the # badWords list def clean?(testText) sortedClean?(testText.sort) end # checks if one of the numbers in the array testText is on the # badWords list testText is assumed to be sorted def sortedClean?(testText) @count += 1 binStart, binEnd = 0, @badWords.size testText.each do |current| binStart = binarySearch(current, binStart, binEnd) return false if @badWords[binStart] == current end return true end def test(testWords) [@count, testWords.sort == @badWords] end private def binarySearch(what, searchStart, searchEnd) return searchEnd if what > @badWords[searchEnd - 1] while searchEnd - searchStart > 1 testSearch = (searchStart + searchEnd) / 2 testWord = @badWords[testSearch] return testSearch if testWord == what if testWord < what searchStart = testSearch else searchEnd = testSearch end end return searchStart end end class QueryStrategy def initialize(prob) @prob = prob @qrob = 1.0 - prob @eNone = [0, 1] # expected value for interval without additional info @eOne = [0, 0] # expected values for interval with at least one bad word @kNone = [0, 1] # optimal interval query length in interval without info @kOne = [0, 0] # optimal interval query length in interval with one bad word @size = 2 end def getStrategy(size, noInfo = true, quick = false) if size < @size return noInfo ? [@kNone[size], @eNone[size]] : [@kOne[size], @eOne[size]] end qToN = Math::exp(@size * Math::log(@qrob)) @size.upto(size) do |n| # compute F_p(n) minExp = n.to_f minK = 1 qToK = 1.0 1.upto(n - 1) do |k| qToK *= @qrob thisExp = qToK * @eOne[n - k] + (1 - qToK) * (@eOne[k] + @eNone[n - k]) if thisExp < minExp minK, minExp = k, thisExp end end @kOne[n] = minK @eOne[n] = 1.0 + minExp / (1 - qToN) # compute E_p(n) minExp = n.to_f minK = 1 qToK = 1.0 1.upto(n) do |k| qToK *= @qrob thisExp = @eNone[n - k] + (1 - qToK) * @eOne[k] if thisExp < minExp minK, minExp = k, thisExp end end @kNone[n] = minK @eNone[n] = 1 + minExp qToN *= @qrob end @size = size + 1; return noInfo ? [@kNone[size], @eNone[size]] : [@kOne[size], @eOne[size]] end def findWords(filter, size) @myWords = [] getWordsRecursively(filter, 0...size, true) @myWords end private def getWordsRecursively(filter, range, noInfo) rangesize = range.end - range.begin return if rangesize == 0 if rangesize == 1 if noInfo @myWords.push(range.begin) unless filter.sortedClean?([range.begin]) else @myWords.push(range.begin) end else thisStrat = getStrategy(rangesize, noInfo) testRange = range.begin...(range.begin + thisStrat[0]) testArray = [] testRange.each {|k| testArray.push(k)} if filter.sortedClean?(testArray) getWordsRecursively(filter, (range.begin + thisStrat[0])...range.end, noInfo) else getWordsRecursively(filter, testRange, false) getWordsRecursively(filter, (range.begin + thisStrat[0])...range.end, true) end end end end # test testsize = 1000 10.times do |level| testprob = 0.01 * (level + 1) myfilt = LanguageFilter.new(testsize, testprob) strat = QueryStrategy.new(testprob) testWords = strat.findWords(myfilt, testsize) number, found = myfilt.test(testWords) if found puts "#{testWords.size} words of #{testsize} found after #{number} calls (expected #{strat.getStrategy(testsize)[1]})" else puts "word list not found after #{number} calls" end end -- Michael Ulm R&D Team ISIS Information Systems Austria tel: +43 2236 27551-219, fax: +43 2236 21081 e-mail: michael.ulm / isis-papyrus.com Visit our Website: www.isis-papyrus.com