Robert Feldt <feldt / ce.chalmers.se> writes:

> Any comments or ideas? Which solution would you prefer if you'd get to
> choose?

I know next to nothing about parsing and lexing, but it seems to me that
you'd spend an inordinate amount of effort trying to produce a lexer that
could deal with the kind of things that people dream up for tokens.
Identifying regular expressions is a particularly hairy case that comes to
mind: /[[]/ and friends are all special cases.

So, I'd be in favor of providing simple hooks for adding my own code to the
lexer. If this isn't language independent, then have a provision to add
lexer chunks in multiple languages: you can get it all working in Ruby, then
when you want to produce your C-based parser, give a command line option and
it will choose your C-based lexer chunk.

In the code example you showed for this (S2), you had a dedicated call per
symbol. I assume this means that these chunks must be called often on a
trial and error basis, looking for a match. You might be able to make this
more efficient by (a) providing some context as part of the call and/or
(b) allowing the lexing chunk to return the type of symbol found:

  Tokens
    Blank      = /\s+/ :skip:
    calculated = %#{delimited_string}   << i.e. '%' followed by a match by
                                           the delimited_string routine

  Tokenizers
    def delimited_string
      s, cp = @string, @current_position
      type = case s[cp]
             when 'q' then DelimQString
             when 'Q' then DelimIString
             when 'x' then DelimXString
             when 'r' then Regexp
             else return nil
             end
      # .. more stuff
      return type
    end

But, as I said, I know little about the complexities of such a beast.

Dave
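
A minimal sketch of suggestions (a) and (b) above, in plain Ruby: a single hook receives the source string and current position as context, and reports which kind of token it found. Everything here is an assumption made for illustration: the `DELIMITED_TYPES` and `CLOSERS` tables, the symbol names, and the `[type, text, next_position]` return shape are hypothetical and not part of any existing lexer generator. Nested delimiters and escapes are deliberately ignored.

```ruby
# Hypothetical sketch: one lexer hook that inspects the character after '%'
# and returns the kind of token found, rather than one trial call per kind.
# All names below are illustrative assumptions.

DELIMITED_TYPES = {
  'q' => :DelimQString,
  'Q' => :DelimIString,
  'x' => :DelimXString,
  'r' => :Regexp
}.freeze

# Bracket-style openers close with their partner; anything else closes
# with the same character that opened it.
CLOSERS = { '(' => ')', '[' => ']', '{' => '}', '<' => '>' }.freeze

# Returns [type, text, next_position], or nil when no delimited literal
# starts at pos. Does not handle nesting or backslash escapes.
def delimited_string(source, pos)
  return nil unless source[pos] == '%'

  type = DELIMITED_TYPES[source[pos + 1]]
  return nil unless type

  opener = source[pos + 2]
  return nil unless opener

  closer = CLOSERS.fetch(opener, opener)
  finish = source.index(closer, pos + 3)
  return nil unless finish

  [type, source[(pos + 3)...finish], finish + 1]
end
```

With these assumptions, `delimited_string("%q(hello) rest", 0)` returns `[:DelimQString, "hello", 9]`, and the calling lexer can branch on the returned type instead of trying each token kind in turn.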