basi_lio wrote:
> Looking for ideas on how to split a text file into sentences. I see the
> problem of basing the split on [.!?] -- they're also used in ways other
> than to end a sentence. If I have to do manual pre-processing of the
> text file, what editing might I do? Has anyone had to deal with this
> problem and how did you make life easier for you?
> Thanks for the help.
> basi

A regexp like [.!?]\s+[A-Z] will already catch most sentence boundaries, 
because most abbreviations aren't followed by whitespace and a capital letter.
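
For what it's worth, here is a minimal sketch of that idea in Ruby. The method name split_sentences is just something I made up for illustration, and I'm assuming Ruby 1.9 or later so that lookbehind works. The lookbehind and lookahead keep the punctuation and the capital letter in the output instead of swallowing them:

```ruby
# Split wherever [.!?] is followed by whitespace and a capital letter.
# (?<=...) and (?=...) are zero-width, so only the whitespace is consumed.
def split_sentences(text)
  text.split(/(?<=[.!?])\s+(?=[A-Z])/)
end

text = "The file is 3.5 MB in size. Is that too big? No! It loads fine."
puts split_sentences(text)
```

The "3.5" is left alone because the period there isn't followed by whitespace.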

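One exception to this rule that I can think of is titles: Mr. Name, Mrs. Name. 
But as you can see, these have an uppercase letter followed by only one or two 
lowercase letters. Most sentences have at least five lowercase letters in 
front of the period, so a capital followed by at most four more word 
characters and then a period is more likely an abbreviation:
[A-Z]\w\w?\w?\w?\.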
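
A rough sketch of how the two patterns might be combined, in Ruby. The names SHORT_CAP and split_sentences_skipping_titles are hypothetical, and the \b and \z anchors are my own additions so the short-token pattern only looks at the word right before the period. Treat it as one possible reading of the heuristic, not a finished splitter:

```ruby
# Assumption: a capitalized token of 2 to 5 characters right before the
# period (e.g. "Mr.", "Mrs.") is an abbreviation, not a sentence end.
SHORT_CAP = /\b[A-Z]\w\w?\w?\w?\.\z/

def split_sentences_skipping_titles(text)
  sentences = []
  start = 0
  # Candidate boundaries: [.!?], whitespace, then a capital letter.
  text.scan(/[.!?]\s+(?=[A-Z])/) do
    m = Regexp.last_match
    punct_end = m.begin(0) + 1            # index just past the punctuation
    candidate = text[start...punct_end]
    next if candidate =~ SHORT_CAP        # looks like "Mr." so keep going
    sentences << candidate
    start = m.end(0)                      # resume after the whitespace
  end
  sentences << text[start..-1] if start < text.length
  sentences
end

text = "I met Mr. Smith yesterday. Then Mrs. Jones arrived! We talked for hours."
puts split_sentences_skipping_titles(text)
```

The trade-off is that a short capitalized word that really does end a sentence, as in "I saw Tom. He left.", is also treated as an abbreviation, so that boundary is missed.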

-- 
Posted via http://www.ruby-forum.com/.