Joseph McDonald <joe / vpop.net> writes:

> string = "start {{blah {{ }} outside\n{{ {{ }}\n}} bye"

> Is there a way to do such a thing with a regex, or do I need to resort
> to a scanning technique, keeping track of the nested elements?

You can do it with regexps fairly efficiently, but it's a tad
messy. What I do is flag matching delimiters first, using something
like:

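    MAGIC  = "\001"   # ^A, temporary stand-in for '{'
    MAGIC1 = "\002"   # ^B, temporary stand-in for '}'

    count = "0000"

    # Tag the innermost brace pairs first; repeat until none are left.
    1 while @content.gsub!(/\{([^\{\}]*)\}/m) {
      count = count.succ
      "#{MAGIC}:#{count}:#$1#{MAGIC1}:#{count}:"
    }

    # Put real braces back so the result is readable.
    @content.gsub!(/#{MAGIC}/,  '{')
    @content.gsub!(/#{MAGIC1}/, '}')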

The pattern matches paired braces that don't themselves contain
braces, so it works progressively from the deeper nestings out. It
replaces braces with two magic characters (I use ^A and ^B), followed
by a count. It then replaces the magic characters with braces, just
so I can read the string when debugging.


   { hello { dave } }

will become

   {:0002: hello {:0001: dave }:0001: }:0002:
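
For anyone who wants to try the tagging pass on its own, here is a
minimal self-contained sketch. The method name tag_braces and the
constant names are made up for illustration; the logic is the same
loop as above, just working on a copy of its argument.

```ruby
MAGIC_OPEN  = "\001"  # ^A, temporary stand-in for '{'
MAGIC_CLOSE = "\002"  # ^B, temporary stand-in for '}'

# Hypothetical helper: returns a copy of text in which every balanced
# brace pair carries a shared four-digit counter.
def tag_braces(text)
  result = text.dup
  count  = "0000"

  # Each pass tags the pairs that contain no untagged braces, so a
  # pair always gets a lower number than any pair enclosing it.
  1 while result.gsub!(/\{([^\{\}]*)\}/) {
    count = count.succ
    "#{MAGIC_OPEN}:#{count}:#{$1}#{MAGIC_CLOSE}:#{count}:"
  }

  # Restore real braces so the tagged text is readable.
  result.tr(MAGIC_OPEN + MAGIC_CLOSE, "{}")
end

puts tag_braces("{ hello { dave } }")
# prints {:0002: hello {:0001: dave }:0001: }:0002:
```

Unbalanced braces are simply left untagged, which makes them easy to
spot afterwards.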


You can then match specific constructs between balanced braces using:

   txt.gsub(/\{:(\d\d\d\d):.*?\}:\1:/, "stuff")

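The \1 back-reference means the closing brace must carry the same
counter as the opening one, so the match stops at the balancing brace,
and the .*? non-greedy match is simply for efficiency. If a construct
can span lines, add the /m flag so that . also matches newlines.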
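
To see the back-reference at work, here is a small sketch run against
a hand-tagged string (the input is written out by hand for
illustration, in the form the counting pass produces):

```ruby
# Hand-tagged input, in the form the counting pass would produce.
tagged = "x {:0002: hello {:0001: dave }:0001: }:0002: y"

pattern = /\{:(\d\d\d\d):(.*?)\}:\1:/m

# The first match starts at the outer brace; \1 makes it skip the
# inner "}:0001:" and stop only at "}:0002:".
m = pattern.match(tagged)
puts m[1]          # prints 0002
puts m[2].inspect  # prints " hello {:0001: dave }:0001: "

# Replacing works the same way: the whole outer construct goes.
replaced = tagged.gsub(pattern, "stuff")
puts replaced      # prints x stuff y
```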

This sounds scary, but I use it to convert the Ruby book source, which
contains things like:

    \begin{method}{pack}{\self.pack ( \obj{aTemplateString} ) 
        \returns{\obj{aBinaryString}}}{A}\label{ref:arraypack}%


into XML. It seems to be relatively efficient.
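
As a rough illustration of how the two steps combine, here is a
sketch that turns single-argument commands such as \obj{...} into
XML elements. It is not the actual conversion code: the helper name,
the constants, and the rule "command name becomes element name" are
all assumptions, and it ignores multi-argument commands like
\begin{method} as well as XML escaping.

```ruby
OPEN_MARK  = "\001"  # ^A, stands in for '{' while braces are tagged
CLOSE_MARK = "\002"  # ^B, stands in for '}' while braces are tagged

# Hypothetical helper, not the real book-conversion code: rewrites
# single-argument commands such as \obj{...} as <obj>...</obj>.
def commands_to_xml(source)
  text  = source.dup
  count = "0000"

  # Step 1: tag balanced braces from the inside out.
  1 while text.gsub!(/\{([^\{\}]*)\}/) {
    count = count.succ
    "#{OPEN_MARK}:#{count}:#{$1}#{CLOSE_MARK}:#{count}:"
  }

  # Step 2: rewrite each command whose argument sits between a
  # matching pair of tagged braces; \2 ties the closer to its opener.
  # Looping picks up commands nested inside an argument just rewritten.
  1 while text.gsub!(/\\(\w+)#{OPEN_MARK}:(\d\d\d\d):(.*?)#{CLOSE_MARK}:\2:/m) {
    "<#{$1}>#{$3}</#{$1}>"
  }

  text
end

puts commands_to_xml('\returns{\obj{aBinaryString}}')
# prints <returns><obj>aBinaryString</obj></returns>
```

Tagged braces that do not follow a command are left in place, still
carrying the magic characters, so a later pass can deal with them.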


Regards


Dave