> Advisory tokens (which would tell me that I am now entering
> the condition of if and now leaving it and now entering the
> action part of it and so on) might do this.

So you want to match the 'then' with its owning 'if'? That's not something I've had to do yet, but it shouldn't be hard... How's this for an interface: I can add a new method to the Token class, let's call it match_id for now. Every time there's a token like 'if', '(', or 'begin' that starts a nested context, the match_id of that token will be set to a unique value. When the corresponding 'end' or ')' comes along, it will have a match_id with the same value as the corresponding context-opening token. We can easily have 'then' carry a match_id corresponding to its 'if' as well. This should make it pretty easy to put the pieces together again afterward.

Hmm... but there are tokens besides 'then' that can serve the same syntactical role: ':', ';', and newline in this case. So the same thing would have to happen with them, I guess. Do you want to know things like "this colon is standing in place of a then"? What sorts of things besides 'then' do you want to match to their owners?

There are complications for incremental lexing too, which isn't something I do now, but I want to. Let me think a little about this. You might be getting these features in a subclass of RubyLexer.

Heh. I just realized that strings now work the way you wanted originally, but I'm going to break that in a future version to be the way I want it.

> In the past I have frequently had trouble
> with the distinction of lexing and parsing in real language
> parsing -- most languages require you to keep some context
> for actually tokenizing them. Ruby, for example, requires that
> your lexer knows about all kinds of quoted Strings and where
> they end and interpolated expressions inside them.

You can say that again. The amount of extra (non-lexical, strictly speaking) work to get RubyLexer working was phenomenal.
You wouldn't believe all the squirrelly little cases. It makes the language easy to use, but hard to process programmatically. Given the choice, I'd like to find a different way next time. If there could be one tool that does both at once... I don't know what that would look like. Reg might be able to do both, but in separate stages.

> Nope, not really. I've just used it out of IRB. Integrating it
> ought to be possible, but I'm not sure why that would be
> necessary.

It's necessary because I want to. Because irb's lexer is sometimes wrong, and freaks like me who use irb to explore the syntax get fooled sometimes. Because irb could use it to colorize input and output. (Maybe its current lexer would serve for the last purpose...)

> > Ps: I haven't figured out why this breaks RubyLexer yet, but I
> > will.
>
> Good luck. :)

I got a little way through it... aside from the unique use of whitespace, my big problem so far is handling the DOS-style newlines. I handle common cases of it now, but pre is anything but common. Are you a Windows person, or did you do that just to be more deviant and make my life difficult? :)
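
For the match_id idea described earlier in this message, a minimal sketch might look like the following. It is only an illustration: Token, assign_match_ids, OPENERS, and CLOSERS are hypothetical names, not RubyLexer's actual classes or API, and the input is a list of pre-split words rather than real lexer output. It also handles only a literal 'then'; the ':', ';', and newline stand-ins are left out.

```ruby
# Hypothetical sketch of the match_id idea; not RubyLexer's real Token class.
Token = Struct.new(:text, :match_id)

# Tokens that open a nested context, and which openers each closer may end.
OPENERS = %w[if ( begin].freeze
CLOSERS = { "end" => %w[if begin], ")" => ["("] }.freeze

# Turns a list of words into Tokens, tagging each opener with a fresh id
# and giving its closer (and any 'then' belonging to an 'if') the same id.
def assign_match_ids(words)
  next_id = 0
  stack = [] # open contexts, innermost last, as [opener_text, id] pairs
  words.map do |word|
    tok = Token.new(word, nil)
    opener = stack.last
    if OPENERS.include?(word)
      next_id += 1
      stack.push([word, next_id])
      tok.match_id = next_id
    elsif CLOSERS.key?(word)
      # Only close the innermost context if this closer actually fits it.
      tok.match_id = stack.pop[1] if opener && CLOSERS[word].include?(opener[0])
    elsif word == "then"
      # 'then' shares the id of its owning 'if' but does not close it.
      tok.match_id = opener[1] if opener && opener[0] == "if"
    end
    tok
  end
end

tokens = assign_match_ids(%w[if ( a ) then begin b end end])
tokens.each { |t| puts "#{t.text.ljust(6)} #{t.match_id.inspect}" }
```

With ids assigned this way, a consumer can regroup the pieces of a construct by collecting every token that shares a match_id, which is the "put the pieces together again afterward" step.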