Ryan Leavengood wrote:
> On 11/29/05, basi <basi_lio / hotmail.com> wrote:
>
>> Yes, I learned this convention when I took a keyboarding (i.e.,
>> typing) lesson in high school. Some time ago, a style manual for
>> word processing appeared, and one piece of its advice was to use
>> only one space to separate sentences. The reason given is that in a
>> justified format, those two spaces can become four spaces, or even
>> more. Anyway, a lot of text now has one or two spaces between
>> sentences, and this wouldn't be a reliable indicator of a sentence
>> boundary.
>
> I too learned the two-spaces-after-a-period convention years ago and
> also recently learned that with modern fonts and word processors it
> is not necessary. It was tricky to retrain myself, but I did, and
> have been using just one space ever since.
>
> So like you say, that isn't a reliable way to discern sentences.
>
> I would recommend following the advice of first filtering out false
> positives (possibly even replacing them with temporary markers, Mr.
> becomes $MISTER$ or similar), then splitting on punctuation. If you
> then test on various sample texts you should be able to find more
> false positives that you might have missed.

Which will not help you at all with foreign languages. And don't forget
to put i.e., e.g. and etc. in the list. This is an ongoing problem
(think of the auto-correction 'feature' that capitalizes the first
letter of every sentence in OpenOffice or Word, something I always turn
off because it is so insistent when it's wrong).

Cheers,
V.-
-- 
http://www.braveworld.net/riva
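
For what it's worth, the marker approach quoted above (swap known abbreviations for temporary markers, split on punctuation, then swap them back) could be sketched roughly as follows. This is only an illustration in Python: the `ABBREVIATIONS` list, the `$ABBRn$` marker format and the `split_sentences` name are all made up for the example, and the list is English-only, so the foreign-language objection still stands.

```python
import re

# Hypothetical, English-only list of false positives. It is meant to
# grow as testing on sample texts turns up more of them.
ABBREVIATIONS = ["Mr.", "Mrs.", "Dr.", "i.e.", "e.g.", "etc."]


def split_sentences(text):
    """Split text into sentences, protecting known abbreviations."""
    # Step 1: replace each abbreviation with a marker that has no period.
    protected = text
    for i, abbr in enumerate(ABBREVIATIONS):
        protected = protected.replace(abbr, f"$ABBR{i}$")

    # Step 2: split on whitespace that follows '.', '!' or '?'.
    # One space or several makes no difference here.
    parts = re.split(r"(?<=[.!?])\s+", protected)

    # Step 3: put the original abbreviations back.
    sentences = []
    for part in parts:
        for i, abbr in enumerate(ABBREVIATIONS):
            part = part.replace(f"$ABBR{i}$", abbr)
        part = part.strip()
        if part:
            sentences.append(part)
    return sentences


if __name__ == "__main__":
    sample = "Mr. Smith arrived late. He brought tea, i.e. the good kind!  Was it enough?"
    for sentence in split_sentences(sample):
        print(sentence)
```

Note that an abbreviation which really does end a sentence ("... pears, etc. Then we left.") is protected as well, so no split happens there. The sketch does not try to solve that case.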