basi_lio wrote:
> Looking for ideas on how to split a text file into sentences. I see the
> problem of basing the split on [.!?] -- they're also used in ways other
> than to end a sentence. If I have to do manual pre-processing of the
> text file, what editing might I do? Has anyone had to deal with this
> problem and how did you make life easier for you?
> Thanks for the help.
> basi

If you use a regexp like

  [.!?]\s+[A-Z]

you will already capture most sentence boundaries. Abbreviations normally aren't followed by whitespace and a capital letter.

One exception to this rule that I can think of is Mr. Name, Mrs. Name. But as you can see, these have an uppercase letter followed by only one or two lowercase letters. Most sentences would have at least five non-uppercase characters in front of the period, so a short capitalized token right before the period marks a likely abbreviation:

  [A-Z]\w\w?\w?\w?\.

-- 
Posted via http://www.ruby-forum.com/.
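A minimal Ruby sketch of how the two patterns could be combined, in case it helps. The method name `split_sentences` and the sample text are made up for illustration, and the boundary pattern uses a lookahead so the capital letter stays with the next sentence. It inherits the weakness of the heuristic: a real sentence that ends in a short capitalized word ("I live in Rome.") is treated like an abbreviation and not split.

```ruby
# Sketch only: split_sentences is a hypothetical helper, not a library method.
def split_sentences(text)
  # Sentence-ending punctuation, whitespace, then a capital letter (not consumed).
  boundary = /[.!?]\s+(?=[A-Z])/
  # A capital letter plus one to four word characters and a period, at the very end.
  abbreviation = /\b[A-Z]\w\w?\w?\w?\.\z/

  sentences = []
  start = 0 # where the current sentence begins
  pos = 0   # where the next boundary search begins

  while (m = boundary.match(text, pos))
    candidate = text[start...(m.begin(0) + 1)] # up to and including the punctuation
    pos = m.end(0)
    next if candidate =~ abbreviation          # e.g. "Mr." or "Mrs.": keep scanning
    sentences << candidate.strip
    start = m.end(0)
  end

  rest = text[start..-1].to_s.strip
  sentences << rest unless rest.empty?
  sentences
end

sample = "I met Mr. Smith today. He was late! Did you see Mrs. Jones there? Yes indeed."
puts split_sentences(sample)
# I met Mr. Smith today.
# He was late!
# Did you see Mrs. Jones there?
# Yes indeed.
```

For text with many abbreviations (e.g., i.e., U.S.), an explicit abbreviation list would probably work better than the length heuristic alone.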