On 11/29/05, basi <basi_lio / hotmail.com> wrote:
> Yes, I learned this convention when I took a keyboarding (i.e., typing)
> lesson in high school. Some time ago, a style manual for word processing
> appeared, and one piece of advice was to use only one space to separate
> sentences. The reason given is that in a justified format, those two
> spaces can become four spaces, or even more. Anyway, a lot of text now
> has one or two spaces between sentences, so spacing wouldn't be a
> reliable indicator of sentence boundaries.

I too learned the two-spaces-after-a-period convention years ago, and I
also learned recently that with modern fonts and word processors it is
not necessary. Retraining myself was tricky, but I managed it, and I
have used just one space ever since.

So, as you say, spacing isn't a reliable way to find sentence boundaries.

I would recommend following the advice of first filtering out false
positives such as abbreviations (possibly even replacing them with
temporary markers, so Mr. becomes $MISTER$ or similar), then splitting
on sentence-ending punctuation. If you then test on a variety of sample
texts, you should be able to find the false positives you missed.
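For illustration, here is a minimal Python sketch of that mask-split-restore approach. The abbreviation table and the marker names (other than $MISTER$) are hypothetical placeholders, and the split rule (break after ., ! or ? followed by any whitespace) is my own assumption. Because it accepts one space or several, it doesn't depend on the spacing convention at all.

```python
import re

# Hypothetical abbreviation table: each entry maps an abbreviation to a
# temporary marker containing no sentence-ending punctuation. Add entries
# as testing on sample texts turns up more false positives.
ABBREVIATIONS = {
    "Mr.": "$MISTER$",
    "Mrs.": "$MISSUS$",
    "Dr.": "$DOCTOR$",
    "e.g.": "$EG$",
    "i.e.": "$IE$",
}

def split_sentences(text):
    # 1. Mask known abbreviations so their periods aren't seen as sentence ends.
    for abbr, marker in ABBREVIATIONS.items():
        text = text.replace(abbr, marker)
    # 2. Split after ., ! or ? followed by whitespace (one space or more).
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    # 3. Restore the original abbreviations and drop empty pieces.
    sentences = []
    for piece in pieces:
        for abbr, marker in ABBREVIATIONS.items():
            piece = piece.replace(marker, abbr)
        if piece:
            sentences.append(piece)
    return sentences

print(split_sentences("Mr. Smith went home. He slept.  Then he woke up!"))
# ['Mr. Smith went home.', 'He slept.', 'Then he woke up!']
```

Note that plain substring replacement is crude (it would also match "Mr." inside a longer token), which is exactly the kind of thing running it over varied sample texts should expose.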

Ryan