On Thu, 2 Dec 2004 04:42:46 +0900
"Florian G. Pflug" <fgp / phlo.org> wrote:

> Brian Schröäer wrote:
> > I tried to implement an algorithm that is perfect regarding the chunk size
> > given the percentage information. This is described in
> > 
> > http://ruby.brian-schroeder.de/quiz/detect_words/content/content.html
> > or prettier
> > http://ruby.brian-schroeder.de/quiz/detect_words/content.ps
> > 
> > The problem is that this algorithm was not significantly better than always
> > dividing into three parts. (Surprisingly it was even slower). Maybe my math
> > is wrong there, if you detect the mind-bug it would be very interesting.
> 
> I believe that you oversimplified the problem, and therefore your math 
> is wrong (or, rather, a too coarse approximation, as your results prove).
> 
>  From your paper:
> N... Number of Words.
> n... Number of banned words.
> r:=P[word w is banned]=n/N
> p:=P[slice of size x passes] = (1-r)^x.
> 
> Your formula for p ((1-r)^x) assumes that the probability r is constant,
> while in reality it changes with every word you put into your slice.
> 
> The probability of a slice of size x passing is the same as the 
> probability that none of 100 words you randomly pick is banned.
> Now, while you pick (you pick only non-banned words, because you 
> immediatly stop if you get a banned one), the ration banned/total
> for the _remaining_ set constantly changes.
> 
> You can instantly see that your formula cannot be exact if you
> try to calculate the probability of a slice of size N passing.
> Your formulat gives (1-n/N)^N, which is probably quite small, but still
> greater than zero. The exact probability is obviosly zero (assuming n > 0)
> 
> Additionally, the probability r changes everytime you find a non-banned 
> slice. If, for example, you test 500 out of 100 words, and the slice 
> passes, the being-banned probability of the rest of the words doubles.
> 
> greetings, Florian Pflug
> 

Hello Florian,

thank you for taking the time to think about my ramblings. I'd love to take
some time to think more about this, so I think I won't have it. But I'm shure
to have learned something from you here.

best regards,

Brian

-- 
Brian Schröäer
http://www.brian-schroeder.de/