Whatever the original reason for putting double spaces at the end of a sentence, the practice continues. In fact, MS Word has an option in its grammar checker to enforce one or two spaces at the end of a sentence.

For a lot of people (like me), it is nothing more than an old habit that is hard to break.

The utility of this method for determining the end of a sentence depends entirely on the purpose of the program. If I were to write a routine to parse text that I wrote myself, it would probably work pretty well, and it would save me several hours of work trying to implement a fancier, more robust routine. The same routine would probably fail horribly for other users or a more generic corpus of text.

As a general rule, I like to use algorithms that are as simple as possible for the job. That, of course, depends a lot on what the job is.

Funny, I never thought something like spacing between sentences would be so controversial. I can almost envision _why making an esoteric remark about the beauty of 'negative space' in text files.

_Kevin

-----Original Message-----
From: Austin Ziegler [mailto:halostatue / gmail.com]
Sent: Wednesday, November 30, 2005 12:40 PM
To: ruby-talk ML
Subject: Re: Splitting a text file into sentences

On 11/30/05, Jeffrey Schwab <jeff / schwabcenter.com> wrote:
> Austin Ziegler wrote:
> > On 11/29/05, Kevin Olbrich <kevin.olbrich / duke.edu> wrote:
> >> Depending on the text you might be able to search for a period (or
> >> other punctuation) followed by two spaces. It's not robust, but if
> >> you know that convention will be followed by the authors, then it
> >> can work.
> > That, in fact, is a very *bad* metric to follow, as the proper
> > spacing after sentence punctuation is a single space. The only
> > reason that two spaces was used in the past is the space used
> > between sentence endings in typeset work is a little wider than that
> > used between words (an em-space vs. an en-space).
> Not true at all. I was always taught to use double spaces after
> sentences in grade-school homework assignments done on plain word
> processors or typewriters.

Then, quite honestly, you were taught wrong. I was taught to use double spaces with a typewriter or when using fixed-pitch fonts (although that was later, since most computers and printers didn't have reliable kerning routines until I was out of university).

Ultimately, the use of double spaces after a period is wrong *even with fixed-pitch fonts*, but it was done to be clearer, since the width of an em-space and an en-space on a typewriter with a Courier-like font is exactly the same. The two spaces *simulate* an em-space in a typeset piece of work. (And that is *fact*, not opinion.)

-austin
--
Austin Ziegler * halostatue / gmail.com
               * Alternate: austin / halostatue.ca
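
For reference, the heuristic debated in this thread (sentence punctuation followed by two spaces) can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions, not code posted by anyone in the thread: the method name `split_sentences` and the sample text are hypothetical, and it assumes the author consistently typed two or more spaces after every sentence and that the text has not been hard-wrapped (a sentence ending at a line break would not be detected).

```ruby
# Hypothetical helper (not from the thread): split text into sentences using
# the "terminal punctuation followed by two or more spaces" convention.
def split_sentences(text)
  # The lookbehind (?<=[.!?]) keeps the punctuation attached to its sentence;
  # the run of two or more spaces is consumed as the separator.
  text.strip.split(/(?<=[.!?]) {2,}/)
end

# Assumed sample text; note the single space after "Dr." is not a boundary.
sample = "Dr. Smith arrived late.  He apologized.  Was anyone upset?  No."
split_sentences(sample).each { |sentence| puts sentence }
```

The appeal is that abbreviations such as "Dr." do not trigger a split, since they are followed by a single space. The weakness is the one raised in the thread: text from authors who use single spacing comes back as one unsplit string.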