Hello --

On Sat, 12 Jan 2002, Massimiliano Mirra wrote:

> On Sat, Jan 12, 2002 at 03:56:15AM +0900, Jack Dempsey wrote:
> > Often I will want to do many regex substitutions: different patterns
> > with different replacements.
>
> Here is a stripped down version of a proofreader tool I wrote.  Half
> of it would be enough to make my point, but somebody might find it
> useful beyond this subject so I'm posting the whole gsub part.

[...]

> text = STDIN.read
>
> # rp, lp and cp stand for right-, left- and center-punctuation
> rp_re = /[!$%&\)+,\.:;=>?@\]^|}~]/
> rp    = rp_re.source
>
> lp_re = /[\(\-<\[{]/
> lp    = lp_re.source
>
> cp_re = /[\/'\-\\]/
> cp    = cp_re.source

[...]

>   # replace single lf's with white space
>   [/\n/, ' '],
>   # delete whitespaces at line beginnings
>   [/^\s*/, ''],
>   # correct ellipsis
>   [/\.\.\.\.+/, '...'],

The three-dot ellipsis is correct when something inside a sentence is
elided.  When the elision comes at the end of a sentence, though, four
dots are correct: the first dot is really the period at the end of the
sentence.
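For instance, running just the quoted ellipsis rule on a made-up
sentence that ends with an elision throws away the sentence's own
period.  A minimal sketch, applying only that one rule:

```ruby
# The ellipsis rule exactly as quoted: four or more dots become three.
from, to = [/\.\.\.\.+/, '...']

# Made-up sample: the elision ends the sentence, so four dots are
# legitimate (the first one is the sentence's period).
sentence = "I never said that.... Or did I?"

result = sentence.gsub(from, to)
puts result   # prints: I never said that... Or did I?
```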

>   # delete spaces before right side punctuation
>   [/\s*(#{rp})/, '\1'],
>   # ensure spaces after right side punctuation
>   [/(#{rp})(\w)/, '\1 \2'],

What if your text is:

  The product cost $10.00.  And 3 + 4 = 7.  @var is an example of a
  Ruby instance variable.
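Here is a rough sketch of what the two right-side-punctuation rules do
to that text when run on their own, in the order quoted.  The rest of
the substitution list is left out, and the line break is assumed to
have already become a space, as the script's first rule would make it:

```ruby
# Right-side punctuation class, copied from the quoted script.
rp = /[!$%&\)+,\.:;=>?@\]^|}~]/.source

# Only the two right-side-punctuation rules, in their original order.
rules = [
  [/\s*(#{rp})/, '\1'],     # delete spaces before right-side punctuation
  [/(#{rp})(\w)/, '\1 \2']  # ensure a space after right-side punctuation
]

# The sample text, with its line break already turned into a space.
text = "The product cost $10.00.  And 3 + 4 = 7.  @var is an example " \
       "of a Ruby instance variable."

# Apply each rule in turn, feeding each result into the next rule.
result = rules.inject(text) { |s, (from, to)| s.gsub(from, to) }
puts result
# prints: The product cost$ 10. 00.  And 3+ 4= 7.@ var is an example of a Ruby instance variable.
```

The price, the arithmetic and the instance variable all come out
mangled.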

>   # delete spaces after left side punctuation
>   [/(#{lp})\s*/, '\1'],
>   # ensure spaces before left side punctuation
>   [/(\w)(#{lp})/, '\1 \2'],
>   # correct punctuation that should be attached to both sides
>   [/\s*(#{cp})\s*/, '\1'],
>   # correct acronyms, even if we fucked them up previously
>   [/([A-Z]\.) ([A-Z]\.) /, '\1\2'],

What about:

  I write programs in C. Do you?

>   # finally, re-expand paragraph breaks (cr) to double lf
>   [/\r/, "\n\n"]
> ]
>
> substitutions.each {|from, to| text.gsub!(from, to)}

Hmmmm... looks sort of hash-like.
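A minimal sketch of the hash-like version, using three of the quoted
rules.  It assumes Ruby 1.9 or later, where a Hash iterates in insertion
order; in the Ruby of early 2002 hash order wasn't guaranteed, and the
order matters here (stripping line-start whitespace before joining lines
would give a different result), so the Array of pairs is the safe choice
there:

```ruby
# Three of the quoted rules as a Hash literal instead of an Array of
# [from, to] pairs.  Regexp literals work fine as Hash keys.
substitutions = {
  /\n/        => ' ',    # replace single lf's with white space
  /^\s*/      => '',     # delete whitespace at line beginnings
  /\.\.\.\.+/ => '...'   # collapse runs of four or more dots
}

text = "  one\ntwo...."

# Hash#each yields [key, value] pairs, so the loop body is unchanged;
# in Ruby 1.9+ the rules run in the order written above.
substitutions.each { |from, to| text.gsub!(from, to) }
puts text   # prints: one two...
```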

Just being pedantic :-)


David

-- 
David Alan Black
home: dblack / candle.superlink.net
work: blackdav / shu.edu
Web:  http://pirate.shu.edu/~blackdav