> Advisory tokens (which would tell me that I am now entering
> the condition of if and now leaving it and now entering the
> action part of it and so on) might do this.

So you want to match the 'then' with its owning 'if'? That's not
something I've had to do yet, but it shouldn't be hard... How's this
for an interface:
I can add a new method to the Token class, let's call it match_id for
now. Every time there's a token like 'if', '(', 'begin', that starts a
nested context, the match_id of that token will be set to a unique
value. When the corresponding 'end' or ')' comes along, it will have a
match_id with the same value as the corresponding context opening
token. We can easily have 'then' with a match_id corresponding to its
'if' as well. This should make it pretty easy to put the pieces
together again afterward.
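
A rough sketch of the idea, to make it concrete. This is not actual
RubyLexer code: the Token struct, the OPENERS/CLOSERS lists, and the
assign_match_ids helper are all made up for illustration, and it
ignores real-world wrinkles such as modifier 'if' (which has no 'end')
and mismatched closers.

```ruby
# Hypothetical stand-in for a token; the real Token class may differ.
Token = Struct.new(:text, :match_id)

# Assumed (incomplete) lists of context-opening and -closing tokens.
OPENERS = %w[if ( begin].freeze
CLOSERS = %w[end )].freeze

# Give every opener a fresh match_id, and give the corresponding
# closer (or a 'then' directly inside an 'if') the same value.
# Tokens that take no part in nesting get a match_id of nil.
def assign_match_ids(words)
  stack = []     # [opener_text, id] pairs for contexts still open
  next_id = 0
  words.map do |word|
    id =
      if OPENERS.include?(word)
        next_id += 1
        stack.push([word, next_id])
        next_id
      elsif CLOSERS.include?(word)
        entry = stack.pop          # simplification: closer type unchecked
        entry && entry[1]
      elsif word == 'then'
        top = stack.last
        top[1] if top && top[0] == 'if'
      end
    Token.new(word, id)
  end
end

tokens = assign_match_ids(%w[if ( a ) then begin b end end])
tokens.each { |t| puts "#{t.text.ljust(6)} #{t.match_id.inspect}" }
```

With ids in hand, pairing things up again is just a matter of grouping
tokens that share a match_id; here the 'if', its 'then', and the final
'end' would all carry the same value.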

Hmm... but there are tokens besides 'then' that can serve the same
syntactical role: ':', ';', and newline in this case. So the same thing
would have to happen with them, I guess. Do you want to know things
like, this colon is standing in place of a 'then'? What sorts of things
besides 'then' do you want to match to their owners?

There are complications for incremental lexing too, which isn't
something I do now, but I want to. Let me think a little about this.
You might be getting these features in a subclass of RubyLexer.

Heh. I just realized that strings now work the way you wanted
originally, but I'm going to break that in a future version to be the
way I want it.


> In the past I have frequently had trouble
> with the distinction of lexing and parsing in real language
> parsing -- most languages require you to keep some context
> for actually tokenizing them. Ruby, for example, requires that
> your lexer knows about all kinds of quoted Strings and where
> they end and interpolated expressions inside them.

You can say that again. The amount of extra (non-lexical, strictly
speaking) work to get RubyLexer working was phenomenal. You wouldn't
believe all the squirrelly little cases. It makes the language easy to
use, but hard to process programmatically. Given the choice, I'd like to
find a different way next time. If there could be one tool that does
both at once... I don't know what that would look like. Reg might be
able to do both, but in separate stages.

> Nope, not really. I've just used it out of IRB. Integrating it
> ought to be possible, but I'm not sure why that would be
> necessary.

It's necessary because I want to. Because irb's lexer is sometimes
wrong, and freaks like me who use irb to explore the syntax get fooled
sometimes. Because irb could use it to colorize input and output.
(Maybe its current lexer would serve for the last purpose...)

> > Ps: I haven't figured out why this breaks RubyLexer yet, but I
> > will.
>
> Good luck. :)

I got a little way through it... aside from the unique use of
whitespace, my big problem so far is handling the DOS-style newlines. I
handle common cases of it now, but pre is anything but common. Are you
a Windows person, or did you do that just to be more deviant and make
my life difficult? :)