On Sat, Jan 12, 2002 at 12:35:49PM +0900, David Alan Black wrote:
> >   # correct ellipsis
> >   [/\.\.\.\.+/, '...'],
> Whoops, I meant to put that the other way around: the four-dot ellipsis
> is correct in cases where there's a sentence ending right before the
> elision.

Never seen that case.  Oh, to be fair I should have added that the
target of this tool is some people I help with putting pages online
(and whose texts I clearly got tired of editing by hand), and the way it
has been put together is ``send me three of your texts and I'll write
a program that cleans them''.  So admittedly not every case is
handled, but it passes my tests. :-)
 
> >   # delete spaces before right side punctuation
> >   [/\s*(#{rp})/, '\1'],
> >   # ensure spaces after right side punctuation
> >   [/(#{rp})(\w)/, '\1 \2'],
> What if your text is:
> 
>   The product cost $10.00.  And 3 + 4 = 7.  @var is an example of a
>   Ruby instance variable.

We usually put the $ after the figure over here; the expression needs
to be 3+4=7 or it will be broken on HTML pages and in wrapped paragraphs;
and (unfortunately) none of those people is going to write texts
containing instance variables.  About the only things containing `@'s
will be email addresses.  But yes, all these could eventually be in
separate replacement lists that could be switched on and off depending
on locale and other settings.
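
A rough sketch of what such switchable lists might look like.  The
group names, the clean_with helper and the value of rp are all
assumptions made up for illustration; only the patterns themselves come
from the rules quoted in this thread.

```ruby
# Hypothetical value for `rp`; the real definition is not shown in this thread.
rp = '[,.;:!?]'

# Hypothetical grouping of the quoted rules into named, switchable lists.
RULE_GROUPS = {
  ellipsis:    [[/\.\.\.\.+/, '...']],
  punctuation: [[/\s*(#{rp})/, '\1'], [/(#{rp})(\w)/, '\1 \2']],
  acronyms:    [[/([A-Z]\.) ([A-Z]\.) /, '\1\2']]
}

# Apply only the enabled groups, in the order given by the caller.
def clean_with(text, enabled)
  result = text.dup
  enabled.each do |group|
    RULE_GROUPS.fetch(group).each { |from, to| result.gsub!(from, to) }
  end
  result
end

puts clean_with("Wait.....what ?", [:ellipsis, :punctuation])  # Wait... what?
puts clean_with("It cost $10.00", [:ellipsis])                 # It cost $10.00
puts clean_with("It cost $10.00", [:punctuation])              # It cost $10. 00
```

The last call shows the breakage David pointed out: with the punctuation
group switched on, the price gets split, so a text with decimal figures
would want that group off (or a locale-specific variant of it).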

> >   # correct acronyms, even if we fucked them up previously

Whooops, *I* should do some editing before sending texts out. <blush>

> >   [/([A-Z]\.) ([A-Z]\.) /, '\1\2'],
> What about:
> 
>   I write programs in C. Do you?

Well?  Looks fine to me.
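
A quick check of the acronym rule in isolation, as a sketch (the second
sample string is made up for illustration):

```ruby
# The acronym rule exactly as quoted above.
acronym_rule = [/([A-Z]\.) ([A-Z]\.) /, '\1\2']

sentence = "I write programs in C. Do you?"
spaced   = "the U. S. A. today"   # hypothetical input with a split acronym

# "C. " is followed by "Do", not by another capital plus period, so no match.
unchanged = sentence.gsub(*acronym_rule)
# "U. S. " matches once; the trailing space is consumed, gluing on "A.".
rejoined  = spaced.gsub(*acronym_rule)

puts unchanged  # I write programs in C. Do you?
puts rejoined   # the U.S.A. today
```

The rule only fires when two capital-plus-period pairs are separated by
a single space and followed by another space, which a lone "C." at the
end of a sentence never is.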

> > substitutions.each {|from, to| text.gsub!(from, to)}
> Hmmmm... looks sort of hash-like.

It is.  I did not use a hash because the substitutions are
order-sensitive.
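
To show why the order matters, here is a minimal sketch with two rules
applied both ways round.  The period-only spacing pattern is a
simplified stand-in (an assumption) for the rp-based rule, and
apply_rules is a hypothetical helper, not the original code.

```ruby
# Simplified stand-in for the "space after punctuation" rule (periods only).
space_after_period = [/(\.)(\w)/, '\1 \2']
# The acronym rule as quoted above.
rejoin_acronyms    = [/([A-Z]\.) ([A-Z]\.) /, '\1\2']

# Apply each [pattern, replacement] pair in array order, without mutating text.
def apply_rules(text, rules)
  rules.inject(text) { |acc, (from, to)| acc.gsub(from, to) }
end

text = "the U.S.A. today"
in_order = apply_rules(text, [space_after_period, rejoin_acronyms])
reversed = apply_rules(text, [rejoin_acronyms, space_after_period])

puts in_order  # the U.S.A. today
puts reversed  # the U. S. A. today
```

With the spacing rule first, the acronym rule gets a chance to repair
the damage; the other way round, the damage comes last and stays.  An
array of pairs makes that ordering explicit.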
 
> Just being pedantic :-)

And I thank you for that.  Being pedantic is why I wrote that to begin
with. ;-)

Massimiliano