On 11/3/05, nobu.nokada / softhome.net <nobu.nokada / softhome.net> wrote:
>
> Hi,
>
> At Thu, 3 Nov 2005 22:13:46 +0900,
> Christian Neukirchen wrote in [ruby-talk:163903]:
> > I don't think the real problem is the grammar itself; that part of
> > parse.y looks like the easier one (and rather readable) to me. The
> > problem is the lexer-parser communication: think heredocs, %q[] etc.
> > There is no way to express that in BNF.
>
> Exactly. It's a headache.
>

Too much inherited from Perl :( Heredocs are the worst of all.
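
Part of what makes heredocs awkward is that a heredoc body starts on the line *after* its opener, while the rest of the opener's line still has to be lexed normally (e.g. `foo(<<A, <<B)`, where both bodies follow in order). The sketch below shows that deferral in a hypothetical toy lexer. `lex_with_heredocs` and its token format are made up for illustration and are not how parse.y does it:

```ruby
require "strscan"

# Hypothetical toy lexer: recognizes identifiers, "(", ")", ",", and <<ID
# heredoc openers only. Heredoc bodies are read after the opener's line ends.
def lex_with_heredocs(src)
  lines  = src.lines
  tokens = []
  i = 0
  while i < lines.size
    scanner = StringScanner.new(lines[i])
    pending = [] # heredoc tokens opened on this line whose bodies are unread
    until scanner.eos?
      if scanner.skip(/[ \t]+/) || scanner.skip(/\n/)
        next
      elsif scanner.scan(/<<([A-Z_]+)/)
        tok = [:heredoc, scanner[1], nil] # body filled in later
        tokens << tok
        pending << tok
      elsif (word = scanner.scan(/[a-z_]\w*/))
        tokens << [:ident, word]
      elsif (punct = scanner.scan(/[(),]/))
        tokens << [:punct, punct]
      else
        raise "unexpected input at line #{i + 1}: #{scanner.rest.inspect}"
      end
    end
    i += 1
    # Bodies start on the next line, in the order their openers appeared.
    pending.each do |tok|
      body = []
      while i < lines.size && lines[i].chomp != tok[1]
        body << lines[i]
        i += 1
      end
      raise "unterminated heredoc #{tok[1]}" if i >= lines.size
      tok[2] = body.join
      i += 1 # skip the terminator line
    end
  end
  tokens
end
```

The lexer has to keep a queue of "bodies still owed" across the rest of the line, which is exactly the kind of state a BNF grammar has no way to describe.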

Since I'm writing a Ruby lexer and parser in my Grammar package, I've really
been diving into the details. I'm trying to match the
lex_state/space_seen/etc. way of doing things (from parse.y), but if I were
to start from scratch, I would probably write a lexer-free parser (no
tokens: the parser deals directly with characters). Of course there would be a
performance hit (the parser has to look ahead more), but you wouldn't have to
deal with the parser-driven lexer state stuff.
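
As a rough illustration of the lexer-free idea, here is a toy scannerless recursive-descent parser. The grammar, the class name `ScannerlessParser`, and the AST format are assumptions for this sketch. Because there is no separate token stream, `%` means modulus or the start of a `%(...)` literal depending only on whether the parser is currently looking for an operand or an operator:

```ruby
# Hypothetical scannerless parser: reads characters directly, no tokens.
# Toy grammar (assumed for illustration):
#   expr    := operand ( "%" operand )*
#   operand := integer | "%(" chars ")"
class ScannerlessParser
  def initialize(src)
    @src = src
    @pos = 0
  end

  def parse
    result = parse_expr
    skip_ws
    raise "trailing input at #{@pos}" unless @pos == @src.size
    result
  end

  private

  def skip_ws
    @pos += 1 while @pos < @src.size && @src[@pos] == " "
  end

  def peek
    @src[@pos]
  end

  def parse_expr
    left = parse_operand
    loop do
      skip_ws
      break unless peek == "%"
      @pos += 1 # in operator position, "%" is modulus
      left = [:mod, left, parse_operand]
    end
    left
  end

  def parse_operand
    skip_ws
    if @src[@pos, 2] == "%("
      # in operand position, "%(" opens a string literal
      close = @src.index(")", @pos + 2) or raise "unterminated %() literal"
      str = @src[(@pos + 2)...close]
      @pos = close + 1
      [:str, str]
    elsif (m = /\A\d+/.match(@src[@pos..-1]))
      @pos += m[0].size
      [:int, m[0].to_i]
    else
      raise "expected operand at #{@pos}"
    end
  end
end
```

For example, `ScannerlessParser.new("%(hi) % 2").parse` returns `[:mod, [:str, "hi"], [:int, 2]]`: the same character is disambiguated by grammar position, with no lexer state passed back and forth.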

I would love to see the Ruby syntax refactored and simplified, especially
with regard to the lexer state. The first step would be to get rid of
heredocs, or to allow only whitespace/comments on the rest of the line after
the initial << operator. Secondly, I think it might be possible to reduce the
lexer state to one bit: whether the next operator is unary or binary. For
example: % (string vs. modulus), << (heredoc vs. left shift), and ` (execution
quote vs. method name). I may look at some of this simplification later.
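
A minimal sketch of the one-bit idea, covering only `%` and `<<` (the class `OneBitLexer`, its token names, and the token set are hypothetical). The single flag `@expect_operand` decides whether `%(` and `<<ID` open a literal or are lexed as binary operators:

```ruby
require "strscan"

# Hypothetical lexer whose only state is one bit: @expect_operand is true
# when an operand (or unary/literal opener) must come next, false when a
# binary operator must come next.
class OneBitLexer
  def initialize(src)
    @s = StringScanner.new(src)
    @expect_operand = true # start of input: an operand is expected
  end

  def tokens
    out = []
    until @s.eos?
      next if @s.skip(/\s+/)
      out << next_token
    end
    out
  end

  private

  def next_token
    if @expect_operand && @s.scan(/%\(([^)]*)\)/)
      operand(:string, @s[1])        # %(...) string literal
    elsif @expect_operand && @s.scan(/<<([A-Z_]+)/)
      operand(:heredoc_start, @s[1]) # <<ID heredoc opener
    elsif @s.scan(/\d+/)
      operand(:int, @s.matched.to_i)
    elsif @s.scan(/[A-Za-z_]\w*/)
      operand(:ident, @s.matched)
    elsif @s.scan(/\)/)
      operand(:rparen, ")")
    elsif @s.scan(/<<|%|[-+*(,=]/)
      op = @s.matched
      @expect_operand = true # after an operator or "(", an operand follows
      [op == "(" ? :lparen : :op, op]
    else
      raise "unexpected input: #{@s.rest.inspect}"
    end
  end

  def operand(type, value)
    @expect_operand = false # after an operand, a binary operator follows
    [type, value]
  end
end
```

In this toy an identifier always clears the bit, so a paren-less command call such as `puts <<EOS` would lex `<<` as a left shift; parse.y uses extra information such as lex_state and space_seen to treat that case as a heredoc.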
