
On Feb 4, 2008 12:14 PM, Clifford Heath <no / spam.please.net> wrote:

> Eric Mahurin wrote:
> >> compare:
> >>        space = Grammar::Element[Set[?\ ,?\t].duck!(:==,:include?)].discard
> > So, doing 1..3, in the released grammar v0.5, you'd have this instead:
> > space = (E[?\s] | E[?\t]).discard
>
> Definitely an improvement!
>
> >> Nathan's concept is that "grammar" should become a Ruby keyword,
> > I don't think ruby needs any new keywords.  It already has more than it
> > needs in my opinion.  There is enough power with just classes, methods,
> > blocks, and operator overloading.
>
> Part of the point of using a packrat-style parser is that there are
> no true keywords, meaning words that must only be used in their KW
> places. Many of Ruby's words are like that, of course. But my point
> was that once you say "grammar", you're now talking a different
> language, which can use as much and *only* as much of the Ruby
> grammar as it needs. And when inside a rule you say {, you're back
> in Ruby grammar... etc.
>

I definitely need to go learn about packrat/PEG stuff.  It sounds interesting
after looking at Wikipedia.  I still don't really understand LR/LALR parsers.
My main focus has been LL/recursive-descent, and PEG sounds like it is
recursive-descent too.

> The normal lexer/parser split makes that fairly hard to do, as the
> lexer needs to know whether to return the KW meaning or a general
> identifier.


Fortunately in my grammar package the lexer and parser can be specified in
exactly the same way.  Also, you can use anything for tokens.  If you are
using a pattern to match them, it will just use the expression
(pattern===next_token) to see if it is a match.  So, in the lexer, you might
generate a String for any identifier/keyword.  The lexer wouldn't care
whether it is a keyword or not.  In the parser, you then could use this to
match an arbitrary identifier:

E(String)   # (String===next_token) is true if next_token is a String

or if you wanted to match a keyword, you'd have this type of Grammar:

E("begin")  # ("begin"===next_token) is true if next_token=="begin"

You could also have some arbitrary matching function by defining your own
pattern class (with a #== (v0.5) or #=== (dev) method).
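
As a rough illustration of that kind of token matching, here is a minimal
sketch in plain Ruby that does not use the grammar package at all.  The
Matching class and the match_token helper are made-up names for this example;
the only idea taken from the text above is that a pattern is compared against
the next token with === (the dev convention).

```ruby
# Hypothetical custom pattern: matches any token for which the block is true.
class Matching
  def initialize(&test)
    @test = test
  end

  # Invoked as (pattern === next_token), the same hook case/when uses.
  def ===(token)
    @test.call(token) ? true : false
  end
end

# Hypothetical stand-in for what a parser does with a pattern and a token.
def match_token(pattern, next_token)
  pattern === next_token
end

tokens = ["begin", "foo", 42]

any_identifier = String    # String === tok is true for any String token
keyword_begin  = "begin"   # "begin" === tok is true only if tok == "begin"
capitalized    = Matching.new { |t| t.is_a?(String) && t =~ /\A[A-Z]/ }

identifier_hits = tokens.map { |t| match_token(any_identifier, t) }
keyword_hits    = tokens.map { |t| match_token(keyword_begin, t) }
```

The lexer side only has to hand over Strings; whether a given String counts as
a keyword is decided by which pattern the parser compares it against.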

> >> Ruby DSL is far too awkward to express CQL - which is my primary
> >> project at present.
> > Don't know anything about CQL to answer.  I would like to make what I have
> > general enough.
>
> Take a look at the ORM diagrams under the images directory, and
> then at the CQL version of the same models, under
> <http://activefacts.rubyforge.org/svn/examples/>. You'll see a lot
> of what looks like unparsable plain English... but it's not. In
> fact, the language is much harder to parse than any of those examples;
> I don't have any published examples of queries (these are just
> declarations). The CQL parser is in
> <http://activefacts.rubyforge.org/svn/lib/activefacts/cql/> - you
> might be able to see where it's hard. You can be many tokens into
> recognising a fact_clause before realising you're looking at a
> "condition".
>
> It's possible that with your lookahead it would be possible though.


At one time I made a mini English parser, so this seems quite doable.

> >> to see how to do it, whereas it won't ever happen with yours AFAICS.
> > It definitely could happen with mine.  I do ruby code generation now, but
> > could have another Grammar-like class
>
> Cool! I didn't realize you were generating code from this.
>
> > But, if you use the #lookahead method on any
> > grammar, it will give a new grammar that handles the case when the grammar
> > fails in the middle (backtracks to the position where the grammar started).
>
> You mean it backtracks to the start of the file? Or to where you called
> lookahead?


Just back to where that lookahead Grammar started parsing.  For other
Grammars, there is only a one char/token lookahead.  If it matches the
first char/token, it is committed to that and will give an exception if a
mismatch occurs later.  When you apply lookahead to a Grammar, it starts
buffering the input and will rescue these exceptions.  It simply rewinds the
input to the point where this lookahead Grammar started parsing when an
exception is found.  That's all there is to it.
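
The rewind-on-exception idea can be sketched in a few lines of plain Ruby.
This is only an illustration under assumptions, not the grammar package's
code: Cursor, ParseError, sequence and lookahead are hypothetical names, and
the input is an in-memory array, so the buffering a real stream would need is
implicit here.

```ruby
# Raised when a match has committed to a branch and then hits a mismatch.
class ParseError < StandardError; end

# Hypothetical input cursor over an array of chars/tokens.
class Cursor
  attr_accessor :pos

  def initialize(items)
    @items = items
    @pos = 0
  end

  def peek
    @items[@pos]
  end

  def advance
    @pos += 1
  end
end

# Match a fixed sequence with one item of lookahead: a mismatch on the first
# item returns nil without consuming; any later mismatch raises ParseError.
def sequence(cursor, expected)
  return nil unless expected.first === cursor.peek
  expected.each do |pattern|
    raise ParseError, "mismatch at item #{cursor.pos}" unless pattern === cursor.peek
    cursor.advance
  end
  expected
end

# Lookahead wrapper: remember the start, rescue the error, rewind the input.
def lookahead(cursor)
  start = cursor.pos
  yield
rescue ParseError
  cursor.pos = start
  nil
end

cursor = Cursor.new(%w[a b x])
first_try = lookahead(cursor) { sequence(cursor, %w[a b c]) }  # fails at "x"
pos_after_rewind = cursor.pos
second_try = lookahead(cursor) { sequence(cursor, %w[a b x]) } # succeeds
```

Without the lookahead wrapper, the first attempt would simply raise once it
got past "a", which is the committed behaviour described above.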

When a lexer is used, I don't think this backtracking is needed that often.
I guess that is why this memoization is so important with packrat/PEG
parsers - no lexer.

I have to think about whether memoization would apply or not.  It may work
since my stuff carries around very little state (mainly just with regards to
the input and output streams).
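
For reference, a minimal sketch of what packrat-style memoization amounts to,
assuming rule results depend only on the rule and the input position.  The
MemoParser class and its two toy rules are invented for this example and have
nothing to do with the grammar package's internals.

```ruby
# Hypothetical parser with a memo table keyed on [rule name, input position].
class MemoParser
  attr_reader :calls

  def initialize(input)
    @input = input
    @memo = {}
    @calls = Hash.new(0)   # counts real (non-cached) rule evaluations
  end

  # Evaluate the block for `rule` at `pos` once; later calls reuse the result.
  # A result is [value, next_pos] on success or nil on failure.
  def apply(rule, pos)
    key = [rule, pos]
    return @memo[key] if @memo.key?(key)
    @calls[rule] += 1
    @memo[key] = yield
  end

  # number <- [0-9]+
  def number(pos)
    apply(:number, pos) do
      stop = pos
      stop += 1 while stop < @input.size && @input[stop] =~ /\d/
      stop > pos ? [@input[pos...stop], stop] : nil
    end
  end

  # sum <- number "+" number / number   (both alternatives start with number)
  def sum(pos)
    apply(:sum, pos) do
      left = number(pos)
      if left && @input[left[1]] == "+" && (right = number(left[1] + 1))
        [[left[0], "+", right[0]], right[1]]
      else
        number(pos)   # second alternative: answered from the memo table
      end
    end
  end
end

parser = MemoParser.new("12-3")
result = parser.sum(0)
```

When the first alternative of sum fails after reading the number, the second
alternative re-asks for number at the same position and gets the cached
answer, so backtracking does not repeat the work.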
