On Nov 16, 9:26 am, Charles Oliver Nutter <charles.nut... / sun.com>
wrote:
> Markus Liedl wrote:
> > The grammar is hosting language neutral. It must be interpreted or
> > translated to be run, i.e. to parse something. Currently there are two
> > translators, one to Emacs Lisp, the other to C. Both produce recursive
> > descent parsers.
>
> I would be interested in hearing more about the translation process, and
> the possibility of producing RDPs in Ruby and Java.

Have a look at the file cp-gen.el. That's the elisp code that
generates the C parser. Each of the 31 forms I mentioned before
expands almost directly into a C block, with very little analysis done
beforehand. Lots of temporary variables are generated for various
purposes, and it's left to the C compiler to sort out the mess (which
gcc does pretty well).
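
To give a feel for "expands nearly directly", here is a rough sketch of
that style of translation. It is not the code from cp-gen.el: the three
form names (lit, seq, alt), the helper names, and the shape of the emitted
C (an input cursor `const char *p` and an int result variable) are all
assumptions made up for illustration, and it is written in Python rather
than elisp.

```python
import itertools

_counter = itertools.count()

def fresh(prefix):
    """Return a new temporary variable name (hypothetical naming scheme)."""
    return f"{prefix}{next(_counter)}"

def emit(form, ok):
    """Expand one grammar form into a C block that sets the int variable `ok`.

    Assumes the generated C keeps the input cursor in `const char *p`.
    Literals are assumed to be plain ASCII with no characters needing escapes.
    """
    kind = form[0]
    if kind == "lit":                      # match a literal string
        text = form[1]
        n = len(text)
        return (f'{{ {ok} = strncmp(p, "{text}", {n}) == 0; '
                f'if ({ok}) p += {n}; }}')
    if kind == "seq":                      # all parts must match, else backtrack
        save = fresh("save")
        lines = [f"{{ const char *{save} = p; {ok} = 1;"]
        for part in form[1:]:
            t = fresh("t")
            lines.append(f"  if ({ok}) {{ int {t}; {emit(part, t)} {ok} = {t}; }}")
        lines.append(f"  if (!{ok}) p = {save}; }}")
        return "\n".join(lines)
    if kind == "alt":                      # first matching alternative wins
        lines = [f"{{ {ok} = 0;"]
        for part in form[1:]:
            t = fresh("t")
            lines.append(f"  if (!{ok}) {{ int {t}; {emit(part, t)} {ok} = {t}; }}")
        lines.append("}")
        return "\n".join(lines)
    raise ValueError(f"unknown form: {kind}")

# Each nested form gets its own temporaries; no attempt is made to reuse them.
print(emit(("seq", ("lit", "de"), ("lit", "f")), "ok"))
```

Note that the generator does no analysis at all: every sub-form gets a
fresh temporary, and the C compiler is trusted to optimize them away.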

If one doesn't want to use code generation, one might create 31
classes, each doing the work of one form, following the Interpreter
design pattern. The grammar reader would create object configurations
mirroring the rules in the grammar. But then you can't use local
variables for the capture variables and a few others, which would
cost speed again.
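
A minimal sketch of that Interpreter-style alternative, again with
made-up class names (Lit, Seq, Alt, Capture) standing in for a few of the
31 forms; nothing here is taken from the actual grammar. It shows where
the speed goes: a capture can no longer live in a local variable of the
generated function, so it ends up in a shared environment object.

```python
class Lit:
    """Match a literal string."""
    def __init__(self, text):
        self.text = text
    def parse(self, src, pos, env):
        # Return the new position on success, or None on failure.
        return pos + len(self.text) if src.startswith(self.text, pos) else None

class Seq:
    """Match all parts one after another."""
    def __init__(self, *parts):
        self.parts = parts
    def parse(self, src, pos, env):
        for part in self.parts:
            pos = part.parse(src, pos, env)
            if pos is None:
                return None
        return pos

class Alt:
    """Try each choice from the same position; the first match wins."""
    def __init__(self, *choices):
        self.choices = choices
    def parse(self, src, pos, env):
        for choice in self.choices:
            end = choice.parse(src, pos, env)
            if end is not None:
                return end
        return None

class Capture:
    """Store the text matched by `inner` under `name`."""
    def __init__(self, name, inner):
        self.name, self.inner = name, inner
    def parse(self, src, pos, env):
        end = self.inner.parse(src, pos, env)
        if end is not None:
            env[self.name] = src[pos:end]   # shared dict instead of a C local
        return end

# Object configuration mirroring a (hypothetical) rule: "def " followed by a name.
rule = Seq(Lit("def "), Capture("name", Alt(Lit("foo"), Lit("bar"))))
env = {}
end = rule.parse("def bar", 0, env)
print(end, env)
```

Every form costs a method dispatch, and every capture costs a dictionary
write, where the generated C would use a plain local variable.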


> > Without being sure, I'd like to claim the grammar comes close to covering
> > 100% of the Ruby language. It does, for example, parse the Ruby stdlib
> > completely.
>
> I'm not sure how much of a measure that is; a parser could parse every
> token as a literal "1" and it would parse everything, but it wouldn't
> mean it's correct. Perhaps it's possible to roundtrip from the parsed
> result back to Ruby code and see whether the result is roughly the same
> as the original?

hmm? There are many correct parses, as seen in the file tests.el.
There may be many bugs too.

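Round-tripping through an unparser won't catch "interesting" bugs like
wrong operator priority: the regenerated text can match the original
even when the tree is grouped wrongly.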
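
A toy illustration of why a round trip misses priority bugs. This is not
the Ruby grammar or its unparser; parse_flat, unparse and evaluate are
hypothetical helpers for a two-operator arithmetic language, and the
unparser is assumed to emit no parentheses.

```python
import re

def parse_flat(src):
    """Deliberately buggy parser: treats '+' and '*' as equal priority,
    left-associative, so '1+2*3' is grouped as (1+2)*3."""
    tokens = re.findall(r"\d+|[+*]", src)
    tree = int(tokens[0])
    for op, num in zip(tokens[1::2], tokens[2::2]):
        tree = (op, tree, int(num))
    return tree

def unparse(tree):
    """Naive unparser: emits operands and operators in order, no parentheses."""
    if isinstance(tree, int):
        return str(tree)
    op, left, right = tree
    return unparse(left) + op + unparse(right)

def evaluate(tree):
    """Evaluate the tree exactly as it was grouped by the parser."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    l, r = evaluate(left), evaluate(right)
    return l + r if op == "+" else l * r

src = "1+2*3"
tree = parse_flat(src)
print(unparse(tree) == src)   # the round trip succeeds
print(evaluate(tree))         # but the grouping is wrong: 9 instead of 7
```

So a test that checks the shape of the tree, as tests.el does with its
expected parses, is needed to catch this class of bug.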

> > On the downside, parsers using this grammar run slower. Even the
> > faster of the two implementations is many times slower than the MRI
> > parser.
>
> Do you expect this can be improved? If there's a performance hit for
> using this parser it will substantially limit adoption.

Well, I never benchmarked MRI's parser before writing this grammar.
The problem for any other parser is that MRI's is blindingly fast. Still,
I don't believe speed is that important. To give you at least one
number: on my specific hardware, the C parser groks around 3 MBytes
of Ruby code per second. Does this seem fast or slow to you?

With careful work one could make the C parser perhaps twice as fast,
maybe more. I don't think I will spend my time on that.


> What's your goal with this?

I'm going to have another try at building a Ruby VM. I didn't want to
use one of those parse.y adapters. I think it's really nice and useful
to have a portable parser.

>                              At the moment, I don't like that there's
> only two "mostly correct" parsers in existence: Ruby's Bison-based
> parser and JRuby's Jay-based parser. They're both pretty painful to work
> with and evolve.
>
> - Charlie