On Thu, 2 Dec 2004 04:42:46 +0900 "Florian G. Pflug" <fgp / phlo.org> wrote: > Brian Schröäer wrote: > > I tried to implement an algorithm that is perfect regarding the chunk size > > given the percentage information. This is described in > > > > http://ruby.brian-schroeder.de/quiz/detect_words/content/content.html > > or prettier > > http://ruby.brian-schroeder.de/quiz/detect_words/content.ps > > > > The problem is that this algorithm was not significantly better than always > > dividing into three parts. (Surprisingly it was even slower). Maybe my math > > is wrong there, if you detect the mind-bug it would be very interesting. > > I believe that you oversimplified the problem, and therefore your math > is wrong (or, rather, a too coarse approximation, as your results prove). > > From your paper: > N... Number of Words. > n... Number of banned words. > r:=P[word w is banned]=n/N > p:=P[slice of size x passes] = (1-r)^x. > > Your formula for p ((1-r)^x) assumes that the probability r is constant, > while in reality it changes with every word you put into your slice. > > The probability of a slice of size x passing is the same as the > probability that none of 100 words you randomly pick is banned. > Now, while you pick (you pick only non-banned words, because you > immediatly stop if you get a banned one), the ration banned/total > for the _remaining_ set constantly changes. > > You can instantly see that your formula cannot be exact if you > try to calculate the probability of a slice of size N passing. > Your formulat gives (1-n/N)^N, which is probably quite small, but still > greater than zero. The exact probability is obviosly zero (assuming n > 0) > > Additionally, the probability r changes everytime you find a non-banned > slice. If, for example, you test 500 out of 100 words, and the slice > passes, the being-banned probability of the rest of the words doubles. > > greetings, Florian Pflug > Hello Florian, thank you for taking the time to think about my ramblings. I'd love to take some time to think more about this, so I think I won't have it. But I'm shure to have learned something from you here. best regards, Brian -- Brian Schröäer http://www.brian-schroeder.de/