Brian Schröäer wrote: > I tried to implement an algorithm that is perfect regarding the chunk size given the percentage information. This is described in > > http://ruby.brian-schroeder.de/quiz/detect_words/content/content.html > or prettier > http://ruby.brian-schroeder.de/quiz/detect_words/content.ps > > The problem is that this algorithm was not significantly better than always dividing into three parts. (Surprisingly it was even slower). Maybe my math is wrong there, if you detect the mind-bug it would be very interesting. I believe that you oversimplified the problem, and therefore your math is wrong (or, rather, a too coarse approximation, as your results prove). From your paper: N... Number of Words. n... Number of banned words. r:=P[word w is banned]=n/N p:=P[slice of size x passes] = (1-r)^x. Your formula for p ((1-r)^x) assumes that the probability r is constant, while in reality it changes with every word you put into your slice. The probability of a slice of size x passing is the same as the probability that none of 100 words you randomly pick is banned. Now, while you pick (you pick only non-banned words, because you immediatly stop if you get a banned one), the ration banned/total for the _remaining_ set constantly changes. You can instantly see that your formula cannot be exact if you try to calculate the probability of a slice of size N passing. Your formulat gives (1-n/N)^N, which is probably quite small, but still greater than zero. The exact probability is obviosly zero (assuming n > 0) Additionally, the probability r changes everytime you find a non-banned slice. If, for example, you test 500 out of 100 words, and the slice passes, the being-banned probability of the rest of the words doubles. greetings, Florian Pflug