Whatever the original reason for putting two spaces at the end of a
sentence, the practice still continues.  
In fact, MS Word has an option in its grammar checker to enforce one or two
spaces at the end of a sentence.  For a lot of people (like me), it is
nothing more than an old habit that is hard to break.  

The utility of this method for determining the end of a sentence depends
entirely on the purpose of the program.  If I were to write a routine to
parse text that I wrote, it would probably work pretty well, and it would
save me several hours of work trying to implement a fancier, more robust
routine.
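
For what it's worth, here is a minimal sketch of the kind of routine I mean.
It assumes the author always puts two or more spaces (or a line break) after
sentence-ending punctuation.  The names split_sentences and SENTENCE_BREAK
are made up for illustration, and it needs a Ruby whose regexps support
lookbehind.

```ruby
# Hypothetical helper, not a robust sentence splitter.
# Assumption: a sentence ends with '.', '!' or '?' followed by either
# two or more spaces or a line break.
SENTENCE_BREAK = /(?<=[.!?])(?: {2,}|[ \t]*\n\s*)/

def split_sentences(text)
  text.strip
      .split(SENTENCE_BREAK)                      # cut at the assumed boundaries
      .map { |s| s.gsub(/\s*\n\s*/, ' ').strip }  # rejoin hard-wrapped lines
      .reject(&:empty?)
end

sample = "It works.  Mr. Smith agrees!  Does\nit wrap?  Yes."
p split_sentences(sample)
# => ["It works.", "Mr. Smith agrees!", "Does it wrap?", "Yes."]
```

Note that "Mr. Smith" survives only because a single space follows the
abbreviation.  Text typed with one space after each sentence comes back as a
single "sentence", and an abbreviation that lands at the end of a wrapped
line is split in the wrong place.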

The same routine would probably fail horribly for other users or a more
generic corpus of text.  

As a general rule, I like to use algorithms that are as simple as possible
for the job.  That, of course, depends a lot on what the job is.

Funny, I never thought something like spacing between sentences would be so
controversial.  I can almost envision _why making an esoteric remark about
the beauty of 'negative space' in text files. 

_Kevin
 

-----Original Message-----
From: Austin Ziegler [mailto:halostatue / gmail.com] 
Sent: Wednesday, November 30, 2005 12:40 PM
To: ruby-talk ML
Subject: Re: Splitting a text file into sentences

On 11/30/05, Jeffrey Schwab <jeff / schwabcenter.com> wrote:
> Austin Ziegler wrote:
> > On 11/29/05, Kevin Olbrich <kevin.olbrich / duke.edu> wrote:
> >>Depending on the text you might be able to search for a period (or
> >>other punctuation) followed by two spaces.  It's not robust, but if you
> >>know that convention will be followed by the authors, then it can work.
> > That, in fact, is a very *bad* metric to follow, as the proper 
> > spacing after sentence punctuation is a single space. The only 
> > reason that two spaces was used in the past is the space used 
> > between sentence endings in typeset work is a little wider than that 
> > used between words (an em-space vs. an en-space).
> Not true at all.  I was always taught to use double spaces after 
> sentences in grade-school homework assignments done on plain word 
> processors or typewriters.

Then, quite honestly, you were taught wrong. I was taught to use double
spaces with a typewriter or when using fixed-pitch fonts (although that was
later, since most computers and printers didn't have reliable kerning
routines until I was out of university).
Ultimately, the use of double spaces after a period is wrong *even with
fixed-pitch fonts*, but it was done to be clearer since the width of the
em-space and an en-space on a typewriter with a Courier-like font is exactly
the same. The two spaces *simulates* an em-space in a typeset piece of work.
(And that is *fact*, not opinion.)

-austin
--
Austin Ziegler * halostatue / gmail.com
               * Alternate: austin / halostatue.ca