Hi,

On Fri, 5 Nov 2004 09:04:30 +0900, Jason Sweat <jason.sweat / gmail.com> wrote:
> I wanted to learn Ruby, so I picked a small task of trying to write a
> command line script to parse PHP classes and shell out some unit test
> cases.  I have it working for the most part, but I ran across a
> problem trying to use Ruby regexp to find a set of matching curly
> braces.
> 
> Please forgive the intrusion of this PHP code onto the list, but I
> wanted to give you the flavor of what I am attempting to do, that can
> be easily done with the recursive regular expressions available in the
> Perl-compatible regexp engine.
> 
> <php>
> $test = <<<EOS
> /* some stuff */
> class foo {
>         public \$var;
>         public function __construct() {}
>         public function bar() {
>                 if (false) {
>                 }
>         }
> 
> }
> // some other stuff
> EOS;
> 
> $re = <<<EOS
> ~(class\s+\w+\s+({((?>[^{}]+)|(?2))*}))~xms
> EOS;
> preg_match($re, $test, $match);
> echo "your class matched:\n", $match[1];
> 
> </php>
> 
> Now it appears the regexp engine in Ruby does not support recursion
> (at least in Ruby ruby 1.8.2 (2004-07-29) [i686-linux] that I am
> working on, and with what I know how to test), thus the only
> workaround I found was very ugly, model the nesting of braces to a
> fixed depth, i.e.
> 
> open = '\{'
> close = '\}'
> other = '[^\{\}]*'
> l1 = other+open+other+close+other
> l2 = other+open+'('+l1 +')+'+other+close+other
> l3 = other+open+'('+l2 +')+'+other+close+other
> l4 = other+open+'('+l3 +')+'+other+close+other
> l5 = other+open+'('+l4 +')+'+other+close+other
> re = Regexp.new('class\s+'+@name+'\s+'+open+'((?:'+l5+')|(?:'+l4+')|(?:'+l3+')|(?:'+l2+')|(?:'+l1+')|(?:'+other+
> '))+'+close, 'ixm')
> 
> This code did work, but sometimes timed out on valid real classes.
> 
> I expect I am probably missing some facet of Ruby that easily allows me
> to nest regexps inside of the Ruby code in some fashion to achieve the
> result I am looking for, but how to do so eludes me.  Can anyone
> provide some insight for me on this situation?

Regular expressions, by all standard definitions, aren't recursive.
Perl's regexen have been extended to allow it, but it really isn't
considered a standard regex feature. You might try using a simple
tokenizer... here's a quick attempt:

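def parse(text)
  chunks = []
  loop do
    # match from the start of the string up to just before the next bracket
    chunks << text.slice!(/\A.*?(?=[{}])/m)
    bracket = text.slice! 0                  # consume the bracket itself
    chunks << parse(text) if bracket == ?{   # recurse into the nested block
    return chunks if bracket == ?}           # end of the current block
    return chunks if text.size == 0          # ran out of input
  end
end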

This returns a nested array that holds all the text chunks around
the brackets. Here's some sample code (can't remember exactly what PHP
looks like ATM) and sample output:

PHP code:

class Foo {
  def bar(){
    if true{
      do_stuff()
    }else{
      do_nothing()
    }
    clean_up()
  }
}


Then, after recursively stripping whitespace from strings, the
pretty-printed array:

["class Foo",
 ["def bar()",
  ["if true", ["do_stuff()"], "else", ["do_nothing()"], "clean_up()"],
  ""],
 nil]
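
The whitespace stripping isn't shown in the function above. A minimal
sketch of one way to do it, assuming the nested array has the shape
parse returns (strings, sub-arrays, and possibly a nil); the name
strip_chunks is made up for illustration:

```ruby
# Hypothetical helper: walk the nested array and strip whitespace from
# every string, recursing into sub-arrays and leaving anything else
# (such as a trailing nil) untouched.
def strip_chunks(chunks)
  chunks.map do |chunk|
    case chunk
    when Array  then strip_chunks(chunk)
    when String then chunk.strip
    else chunk
    end
  end
end

p strip_chunks(["class Foo ", ["\n  def bar()", ["\n    do_stuff()\n  "], "\n"], nil])
```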

Uh, yeah, I know it drops off the last piece of text (the pattern needs
a bracket ahead of it, so once none are left it reads as nil), but I
don't want to figure that out just yet. Dinner calls :)
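
For what it's worth, one possible way around the dropped tail is to scan
with StringScanner from the standard strscan library instead of slicing
the string up. This is only a sketch: parse_braces and parse_scanner are
made-up names, it assumes the braces are balanced, and it knows nothing
about braces inside strings or comments.

```ruby
require 'strscan'

# Sketch of the same nested-array idea, but without modifying the
# input string and without losing the text after the last bracket.
def parse_braces(text)
  parse_scanner(StringScanner.new(text))
end

def parse_scanner(scanner)
  chunks = []
  until scanner.eos?
    # Everything up to the next bracket, or up to the end of the input.
    # The pattern can match the empty string, so this never returns nil.
    chunks << scanner.scan(/[^{}]*/)
    case scanner.getch               # nil once the input is exhausted
    when '{' then chunks << parse_scanner(scanner)  # nested block
    when '}' then return chunks                     # current block ends
    end
  end
  chunks
end

p parse_braces("class Foo { bar() { } } // rest")
```

Since the scanner keeps its own position, the caller's string is left
alone, which the slice! version does not do.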

HTH,
Mark



> 
> Thanks,
> 
> Regards,
> Jason
> --
> http://blog.casey-sweat.us/
> 
>